Regression Estimation
Simple Linear Regression: review of the least squares procedure
Introduction
We will examine the relationship between quantitative variables x and y via a mathematical equation.
x: explanatory variable
y: response variable
Data: (x1, y1), (x2, y2), ..., (xn, yn)
The Model
The model has a deterministic and a probabilistic component.
[Figure: house cost vs. house size]
The Model
However, house costs vary even among same-size houses! Since costs behave unpredictably, we add a random component.
[Figure: house cost vs. house size]
The Model
The first order linear model:

    y = b0 + b1x + e

y = response variable
x = explanatory variable
b0 = y-intercept
b1 = slope of the line (rise/run)
e = error variable
b0 and b1 are unknown population parameters, and therefore are estimated from the data.
[Figure: line with intercept b0 and slope b1 = rise/run]
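A minimal sketch of what this model says numerically: a line plus random error. The intercept, slope, and error standard deviation below are hypothetical values chosen for illustration, not taken from the slides.

```python
# Simulate the first order linear model y = b0 + b1*x + e:
# a deterministic line plus a random error component.
import random

random.seed(1)                 # reproducible draws
b0, b1 = 40.0, 1.7             # hypothetical intercept and slope
xs = [5, 10, 15, 20, 25]

# each observed y is the line's value plus a N(0, 5) error term e
ys = [b0 + b1 * x + random.gauss(0, 5) for x in xs]
```

With e set to zero the points would fall exactly on the line; the random component is what scatters observed y values around it.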
Estimating the Coefficients
The estimates are determined by
– drawing a sample from the population of interest,
– calculating sample statistics,
– producing a straight line that cuts into the data.
Question: What should be considered a good line?
[Scatterplot: sample points with a candidate line]
The Least Squares (Regression) Line
(x1, y1), (x2, y2), ..., (xn, yn)
A good line is one that minimizes the sum of squared differences (errors) between the scatterplot points and the line:

    SSE = Σ (yi − ŷi)²,  i = 1, ..., n

Determine b0 and b1 to minimize SSE, where ŷi = b0 + b1xi.
The Least Squares (Regression) Line
Let us compare two lines through the points (1, 2), (2, 4), (3, 1.5), (4, 3.2). The second line is the horizontal line y = 2.5.

Line 1: sum of squared differences = (2 − 1)² + (4 − 2)² + (1.5 − 3)² + (3.2 − 4)² = 7.89
Line 2: sum of squared differences = (2 − 2.5)² + (4 − 2.5)² + (1.5 − 2.5)² + (3.2 − 2.5)² = 3.99

The smaller the sum of squared differences, the better the fit of the line to the data.
[Figure: the four points with the two candidate lines]
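The comparison above is easy to reproduce. Judging from the squared terms, the first candidate line appears to be ŷ = x (an inference; the slide shows only the arithmetic):

```python
# Compare two candidate lines by their sum of squared differences (SSE).
points = [(1, 2), (2, 4), (3, 1.5), (4, 3.2)]

def sse(points, predict):
    """Sum of squared vertical distances from the points to a line."""
    return sum((y - predict(x)) ** 2 for x, y in points)

sse_line1 = sse(points, lambda x: x)    # first line, apparently y = x
sse_line2 = sse(points, lambda x: 2.5)  # the horizontal line y = 2.5
```

The horizontal line wins here (3.99 < 7.89), but the least squares line would fit better still.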
The Estimated Coefficients
To calculate the estimates of the slope and intercept of the least squares line, use the formulas:

    b1 = r (sy / sx)
    b0 = ȳ − b1x̄

where r is the correlation coefficient and

    sy = √[ Σ (yi − ȳ)² / (n − 1) ]
    sx = √[ Σ (xi − x̄)² / (n − 1) ]

The least squares prediction equation that estimates the mean value of y for a particular value of x is:

    ŷ = b0 + b1x = (ȳ − b1x̄) + b1x = ȳ + b1(x − x̄)
Simple Linear Regression
• Example: pizza ratings (data from Consumer's Union), where n = 26. A few of the brands:

Brand                      Fat   Score
Freschetta 4 Cheese        15    75
Freschetta stuffed crust   11    56
DiGiorno                   12    71
Amy's organic              14    81
Safeway                     9    41
Tony's                     12    67
Kroger                      9    55

    b1 = r (sy / sx) = 0.724 (12.47 / 5.23) = 1.726
    b0 = ȳ − b1x̄ = 66.15 − (1.726)(15.73) = 39.002
    ŷ = b0 + b1x = 39.002 + 1.726x
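The arithmetic can be checked from the summary statistics alone (the raw 26-row data set is not reproduced on the slide):

```python
# Recompute the slope and intercept from the slide's summary statistics.
r, s_y, s_x = 0.724, 12.47, 5.23    # correlation and standard deviations
y_bar, x_bar = 66.15, 15.73         # sample means of score and fat

b1 = r * s_y / s_x                  # close to 1.726
b0 = y_bar - b1 * x_bar             # close to 39.0 (the slide reports 39.002)
```

Small differences from the slide's 39.002 come from rounding the summary statistics before the calculation.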
The Simple Linear Regression Line (example, cont.)
• Solution – continued
– Using the computer:
1. Scatterplot
2. Trend function
3. Data tab > Data Analysis > Regression
The Simple Linear Regression Line (example, cont.)

Regression Statistics
Multiple R          0.723546339
R Square            0.523519305
Adjusted R Square   0.503665943
Standard Error      8.785081398
Observations        26

    ŷ = 39.002 + 1.726x

ANOVA
             df   SS            MS         F         Significance F
Regression    1   2035.120891   2035.121   26.3693   2.95293E-05
Residual     24   1852.263724   77.17766
Total        25   3887.384615
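The entries in this output are linked by a few identities; a quick check using the sums of squares from the ANOVA table:

```python
# Derive R², MSE, F, and the standard error from the ANOVA sums of squares.
import math

ss_regression = 2035.120891
ss_residual = 1852.263724
ss_total = ss_regression + ss_residual      # 3887.384615
n = 26

r_squared = ss_regression / ss_total        # R Square, about 0.5235
mse = ss_residual / (n - 2)                 # Residual MS, df = 24
f_stat = (ss_regression / 1) / mse          # F, about 26.369
std_error = math.sqrt(mse)                  # Standard Error, about 8.785
```

So about 52% of the variation in score is explained by fat content, and the typical residual is roughly 8.8 score points.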
The Simple Linear Regression Line (example, cont.)
[Scatterplot: Pizza Score vs Fat Content, with fitted line y = 1.726x + 39.002, R² = 0.5235]
Regression Estimator of a Population Mean μy

    μ̂yL = ȳ + b1(μx − x̄),   where b1 = r (sy / sx)

Estimated variance of μ̂yL:

    V̂(μ̂yL) = (1 − n/N) · (1/n) · SSE/(n − 2) = (1 − n/N) · MSE/n