Regression Estimation
Simple Linear Regression: review of the least squares procedure
Introduction
We will examine the relationship between quantitative variables x and y via a mathematical equation.
x: explanatory variable
y: response variable
Data: (x1, y1), (x2, y2), ..., (xn, yn)
The Model
The model has a deterministic and a probabilistic component.
[Figure: house cost vs. house size]
The Model
However, house costs vary even among same-size houses! Since costs behave unpredictably, we add a random component.
[Figure: house cost vs. house size]
The Model
The first order linear model:

    y = b0 + b1x + e

y = response variable
x = explanatory variable
b0 = y-intercept
b1 = slope of the line (rise/run)
e = error variable
b0 and b1 are unknown population parameters, and therefore are estimated from the data.
[Figure: line with intercept b0 and slope b1 = rise/run]
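A minimal sketch of what this model says numerically: a line plus random error. The intercept, slope, and error standard deviation below are hypothetical values chosen for illustration, not taken from the slides.

```python
# Simulate the first order linear model y = b0 + b1*x + e:
# a deterministic line plus a random error component.
import random

random.seed(1)                 # reproducible draws
b0, b1 = 40.0, 1.7             # hypothetical intercept and slope
xs = [5, 10, 15, 20, 25]

# each observed y is the line's value plus a N(0, 5) error term e
ys = [b0 + b1 * x + random.gauss(0, 5) for x in xs]
```

With e set to zero the points would fall exactly on the line; the random component is what scatters observed y values around it.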
Estimating the Coefficients
The estimates are determined by
– drawing a sample from the population of interest,
– calculating sample statistics,
– producing a straight line that cuts into the data.
Question: What should be considered a good line?
[Scatterplot: sample points with a candidate line]
The Least Squares (Regression) Line
(x1, y1), (x2, y2), ..., (xn, yn)
A good line is one that minimizes the sum of squared differences (errors) between the scatterplot points and the line:

    SSE = Σ (yi − ŷi)²,  i = 1, ..., n

Determine b0 and b1 to minimize SSE, where ŷi = b0 + b1xi.
The Least Squares (Regression) Line
Let us compare two lines through the points (1, 2), (2, 4), (3, 1.5), (4, 3.2). The second line is the horizontal line y = 2.5.

Line 1: sum of squared differences = (2 − 1)² + (4 − 2)² + (1.5 − 3)² + (3.2 − 4)² = 7.89
Line 2: sum of squared differences = (2 − 2.5)² + (4 − 2.5)² + (1.5 − 2.5)² + (3.2 − 2.5)² = 3.99

The smaller the sum of squared differences, the better the fit of the line to the data.
[Figure: the four points with the two candidate lines]
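The comparison above is easy to reproduce. Judging from the squared terms, the first candidate line appears to be ŷ = x (an inference; the slide shows only the arithmetic):

```python
# Compare two candidate lines by their sum of squared differences (SSE).
points = [(1, 2), (2, 4), (3, 1.5), (4, 3.2)]

def sse(points, predict):
    """Sum of squared vertical distances from the points to a line."""
    return sum((y - predict(x)) ** 2 for x, y in points)

sse_line1 = sse(points, lambda x: x)    # first line, apparently y = x
sse_line2 = sse(points, lambda x: 2.5)  # the horizontal line y = 2.5
```

The horizontal line wins here (3.99 < 7.89), but the least squares line would fit better still.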
The Estimated Coefficients
To calculate the estimates of the slope and intercept of the least squares line, use the formulas:

    b1 = r (sy / sx)
    b0 = ȳ − b1x̄

where r is the correlation coefficient and

    sy = √[ Σ (yi − ȳ)² / (n − 1) ]
    sx = √[ Σ (xi − x̄)² / (n − 1) ]

The least squares prediction equation that estimates the mean value of y for a particular value of x is:

    ŷ = b0 + b1x = (ȳ − b1x̄) + b1x = ȳ + b1(x − x̄)
Simple Linear Regression
• Example: pizza ratings (data from Consumer's Union), where n = 26. A few of the brands:

Brand                      Fat   Score
Freschetta 4 Cheese        15    75
Freschetta stuffed crust   11    56
DiGiorno                   12    71
Amy's organic              14    81
Safeway                     9    41
Tony's                     12    67
Kroger                      9    55

    b1 = r (sy / sx) = 0.724 (12.47 / 5.23) = 1.726
    b0 = ȳ − b1x̄ = 66.15 − (1.726)(15.73) = 39.002
    ŷ = b0 + b1x = 39.002 + 1.726x
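The arithmetic can be checked from the summary statistics alone (the raw 26-row data set is not reproduced on the slide):

```python
# Recompute the slope and intercept from the slide's summary statistics.
r, s_y, s_x = 0.724, 12.47, 5.23    # correlation and standard deviations
y_bar, x_bar = 66.15, 15.73         # sample means of score and fat

b1 = r * s_y / s_x                  # close to 1.726
b0 = y_bar - b1 * x_bar             # close to 39.0 (the slide reports 39.002)
```

Small differences from the slide's 39.002 come from rounding the summary statistics before the calculation.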
The Simple Linear Regression Line (example, cont.)
• Solution – continued
– Using the computer:
1. Scatterplot
2. Trend function
3. Data tab > Data Analysis > Regression
The Simple Linear Regression Line (example, cont.)

Regression Statistics
Multiple R          0.723546339
R Square            0.523519305
Adjusted R Square   0.503665943
Standard Error      8.785081398
Observations        26

    ŷ = 39.002 + 1.726x

ANOVA
             df   SS            MS         F         Significance F
Regression    1   2035.120891   2035.121   26.3693   2.95293E-05
Residual     24   1852.263724   77.17766
Total        25   3887.384615
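The entries in this output are linked by a few identities; a quick check using the sums of squares from the ANOVA table:

```python
# Derive R², MSE, F, and the standard error from the ANOVA sums of squares.
import math

ss_regression = 2035.120891
ss_residual = 1852.263724
ss_total = ss_regression + ss_residual      # 3887.384615
n = 26

r_squared = ss_regression / ss_total        # R Square, about 0.5235
mse = ss_residual / (n - 2)                 # Residual MS, df = 24
f_stat = (ss_regression / 1) / mse          # F, about 26.369
std_error = math.sqrt(mse)                  # Standard Error, about 8.785
```

So about 52% of the variation in score is explained by fat content, and the typical residual is roughly 8.8 score points.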
The Simple Linear Regression Line (example, cont.)
[Scatterplot: Pizza Score vs Fat Content, with fitted line y = 1.726x + 39.002, R² = 0.5235]
Regression Estimator of a Population Mean μy

    μ̂yL = ȳ + b1(μx − x̄),   where b1 = r (sy / sx)

Estimated variance of μ̂yL:

    V̂(μ̂yL) = (1 − n/N) · (1/n) · SSE/(n − 2) = (1 − n/N) · MSE/n