
Chapter 6 (cont.)
Regression Estimation
Simple Linear Regression: Review of the Least Squares Procedure

Introduction
• We will examine the relationship between quantitative variables x and y via a mathematical equation.
• x: explanatory variable
• y: response variable
• Data: (x1, y1), (x2, y2), …, (xn, yn)

The Model
The model has a deterministic and a probabilistic component.
[Figure: house cost vs. house size, shown as a straight line; annotation: "Most lots sell for $25,000"]

The Model
However, house costs vary even among same-size houses! Since costs behave unpredictably, we add a random component.
[Figure: house cost vs. house size, with points scattered around the line; annotation: "Most lots sell for $25,000"]

The Model
• The first-order linear model:

  y = b0 + b1x + e

  y = response variable
  x = explanatory variable
  b0 = y-intercept
  b1 = slope of the line (rise/run)
  e = error variable

  b0 and b1 are unknown population parameters and are therefore estimated from the data.
Estimating the Coefficients
• The estimates are determined by
  – drawing a sample from the population of interest,
  – calculating sample statistics,
  – producing a straight line that cuts into the data.

Question: What should be considered a good line?
[Scatterplot: sample points with a candidate straight line drawn through them]
The Least Squares (Regression) Line

Data: (x1, y1), (x2, y2), …, (xn, yn)

A good line is one that minimizes the sum of squared differences (errors), (yi − ŷi), between the scatterplot points and the line:

  determine b0 and b1 to minimize SSE = Σ (i = 1 to n) (yi − ŷi)²,  where ŷi = b0 + b1xi
The Least Squares (Regression) Line

Let us compare two lines through the points (1, 2), (2, 4), (3, 1.5), (4, 3.2). The first is the line y = x; the second is the horizontal line y = 2.5.

Sum of squared differences for the first line: (2 − 1)² + (4 − 2)² + (1.5 − 3)² + (3.2 − 4)² = 7.89
Sum of squared differences for the horizontal line: (2 − 2.5)² + (4 − 2.5)² + (1.5 − 2.5)² + (3.2 − 2.5)² = 3.99

The smaller the sum of squared differences, the better the fit of the line to the data.
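The comparison above can be reproduced with a short sketch. It assumes, consistent with the residuals shown, that the first candidate line is y = x and the second is the horizontal line y = 2.5:

```python
# Compare two candidate lines on the four points from the slide by their
# sum of squared differences between observed y and the line's prediction.
points = [(1, 2), (2, 4), (3, 1.5), (4, 3.2)]

def sse(points, predict):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - predict(x)) ** 2 for x, y in points)

sse_line1 = sse(points, lambda x: x)     # the line y = x
sse_line2 = sse(points, lambda x: 2.5)   # the horizontal line y = 2.5

print(round(sse_line1, 2))  # 7.89
print(round(sse_line2, 2))  # 3.99
```

The horizontal line has the smaller SSE of the two, but the least squares line (found on the next slides) does better still.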
The Estimated Coefficients
To calculate the estimates of the slope and intercept of the least squares line, use the formulas:

  b1 = r (sy / sx)
  b0 = ȳ − b1x̄

where r is the correlation coefficient and

  sy = sqrt[ Σ (i = 1 to n) (yi − ȳ)² / (n − 1) ]
  sx = sqrt[ Σ (i = 1 to n) (xi − x̄)² / (n − 1) ]

The least squares prediction equation that estimates the mean value of y for a particular value of x is:

  ŷ = b0 + b1x = (ȳ − b1x̄) + b1x = ȳ + b1(x − x̄)
Simple Linear Regression (example)
• Example: Consumer's Union recently evaluated 26 brands of frozen pizza based on taste.
• We will examine the taste scores (y) and the corresponding fat content (x).

Brand                               Fat   Score
Freschetta 4 Cheese                  15     75
Freschetta stuffed crust             11     56
DiGiorno                             12     71
Amy's organic                        14     81
Safeway                               9     41
Tony's                               12     67
Kroger                                9     55
Tombstone stuffed crust              18     75
Red Baron                            20     73
Boboli                               12     67
Tombstone extra cheese               14     60
Jack's                               13     51
Celeste                              17     59
McCain Ellio's                        9     46
Totino's                             14     68
Freschetta pepperoni                 18     80
DiGiorno pepperoni                   16     78
Tombstone stuffed crust pepperoni    22     80
Tombstone pepperoni                  20     73
Red Baron pepperoni                  23     64
Tony's pepperoni                     26     86
Red Baron deep dish pepperoni        25     77
Stouffer's pepperoni                 14     54
Weight Watchers pepperoni             6     43
Jeno's pepperoni                     20     75
Totino's pepperoni                   20     65
The Simple Linear Regression Line (example, cont.)
• Solution
  – Solving by hand: calculate a number of statistics (n = 26):

    x̄ = 15.73,  sx = 5.23
    ȳ = 66.15,  sy = 12.47
    r = 0.724

    b1 = r (sy / sx) = 0.724 × (12.47 / 5.23) = 1.726
    b0 = ȳ − b1x̄ = 66.15 − (1.726)(15.73) = 39.002

    ŷ = b0 + b1x = 39.002 + 1.726x
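The hand calculation can be checked in a few lines of code; the (fat, score) pairs below are transcribed from the table (a sketch using only the standard library):

```python
import math

# (fat, taste score) for the 26 frozen-pizza brands
data = [(15, 75), (11, 56), (12, 71), (14, 81), (9, 41), (12, 67), (9, 55),
        (18, 75), (20, 73), (12, 67), (14, 60), (13, 51), (17, 59), (9, 46),
        (14, 68), (18, 80), (16, 78), (22, 80), (20, 73), (23, 64), (26, 86),
        (25, 77), (14, 54), (6, 43), (20, 75), (20, 65)]
n = len(data)
xs = [x for x, _ in data]
ys = [y for _, y in data]

xbar, ybar = sum(xs) / n, sum(ys) / n
sxx = sum((x - xbar) ** 2 for x in xs)
syy = sum((y - ybar) ** 2 for y in ys)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))

r = sxy / math.sqrt(sxx * syy)
b1 = sxy / sxx                 # algebraically equal to r * sy / sx
b0 = ybar - b1 * xbar

print(round(r, 3), round(b1, 3), round(b0, 3))  # 0.724 1.726 39.002
```

The rounded results match the by-hand values above.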
The Simple Linear Regression Line (example, cont.)
• Solution – continued
  – Using the computer (Excel):
    1. Scatterplot
    2. Trend function
    3. Data tab > Data Analysis > Regression
The Simple Linear Regression Line (example, cont.)

Regression Statistics
Multiple R           0.723546339
R Square             0.523519305
Adjusted R Square    0.503665943
Standard Error       8.785081398
Observations         26

ŷ = 39.002 + 1.726x

ANOVA
             df    SS            MS          F         Significance F
Regression    1    2035.120891   2035.121    26.3693   2.95293E-05
Residual     24    1852.263724     77.17766
Total        25    3887.384615

            Coefficients   Standard Error   t Stat     P-value    Lower 95%     Upper 95%
Intercept   39.00208322    5.561098219      7.013378   2.99E-07   27.5245406    50.47962583
Fat          1.72602894    0.336123407      5.135105   2.95E-05    1.032304324   2.419753555
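The remaining summary entries follow from the sums of squares in the ANOVA table; as a quick arithmetic check (using the SS values reported in the output):

```python
import math

# Reconstruct R-squared, MSE, and the standard error of estimate
# from the SSE and total SS in the Excel ANOVA table.
sse, ss_total, n = 1852.263724, 3887.384615, 26

r_squared = 1 - sse / ss_total   # coefficient of determination
mse = sse / (n - 2)              # residual mean square
std_error = math.sqrt(mse)       # "Standard Error" in the Excel output

print(round(r_squared, 4))   # 0.5235
print(round(std_error, 4))   # 8.7851
```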
The Simple Linear Regression Line (example, cont.)
[Figure: scatterplot of Pizza Score vs Fat Content (fat roughly 5–30, score roughly 40–90), with fitted trend line y = 1.726x + 39.002 and R² = 0.5235]
Regression Estimator of a Population Mean μy

  μ̂yL = ȳ + b1(μx − x̄),  where b1 = r (sy / sx)

Estimated variance of μ̂yL:

  V̂(μ̂yL) = (1 − n/N) (1/n) [SSE / (n − 2)] = (1 − n/N) (MSE / n)
