Decomposition of Effects
Total effect = $y_i - \bar{y}$
Regression effect = $\hat{y}_i - \bar{y}$
Error = $y_i - \hat{y}_i$
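Decomposition of the sum of squares
Per case i (total effect = error effect + regression model effect):
$$(Y_i - \bar{Y}) = (Y_i - \hat{Y}_i) + (\hat{Y}_i - \bar{Y})$$
Squaring a single case leaves a cross-product term, $2(Y_i - \hat{Y}_i)(\hat{Y}_i - \bar{Y})$. For a least-squares fit these cross-products sum to zero over the data set, which gives, for the data set:
$$\sum_{i=1}^{n}(Y_i - \bar{Y})^2 = \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2 + \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2$$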
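Decomposition of the sum of squares
Total SS = model SS + error SS
If we divide each sum of squares by its df (with k explanatory variables), we obtain the variance estimates used in the variance decomposition:
Total variance: $\sum_{i=1}^{n}(Y_i - \bar{Y})^2 / (n - 1)$
Error variance: $\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2 / (n - k - 1)$
Model variance: $\sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2 / k$
The sums of squares add exactly, and so do the df: $n - 1 = (n - k - 1) + k$. The three variance estimates themselves do not add, because each is divided by a different df.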
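As a rough numerical check of the decomposition, the sketch below fits a least-squares line to the produce-store data used in the example later in this deck and compares the total SS with model SS + error SS. The variable names are illustrative assumptions, not part of the original slides.

```python
# Produce-store data from the example later in this deck
sqft = [1726, 1542, 2816, 5555, 1292, 2208, 1313]    # X: square feet
sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]   # Y: annual sales ($000)

n = len(sqft)
x_bar = sum(sqft) / n
y_bar = sum(sales) / n

# Least-squares slope and intercept
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, sales))
sxx = sum((x - x_bar) ** 2 for x in sqft)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in sqft]

sst = sum((y - y_bar) ** 2 for y in sales)                # total SS
sse = sum((y - f) ** 2 for y, f in zip(sales, fitted))    # error SS
ssr = sum((f - y_bar) ** 2 for f in fitted)               # model SS
# Cross-product term that vanishes when summed over the data set
cross = sum((y - f) * (f - y_bar) for y, f in zip(sales, fitted))

print("Total SS:        ", round(sst, 2))
print("Model + error SS:", round(ssr + sse, 2))
print("Cross-product:   ", round(cross, 6))
```

The two printed sums should agree up to floating-point rounding, and the cross-product should be essentially zero.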
Specifying the Model
Derivation of the Intercept
$$y_i = a + b x_i + e_i$$
$$e_i = y_i - a - b x_i$$
$$\sum_{i=1}^{n} e_i = \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} a - b \sum_{i=1}^{n} x_i$$
Because by definition $\sum_{i=1}^{n} e_i = 0$:
$$0 = \sum_{i=1}^{n} y_i - na - b \sum_{i=1}^{n} x_i$$
$$na = \sum_{i=1}^{n} y_i - b \sum_{i=1}^{n} x_i$$
$$a = \bar{y} - b\bar{x}$$
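Derivation of the Regression Coefficient
Given: $y_i = a + b x_i + e_i$
$$e_i = y_i - a - b x_i$$
$$\sum_{i=1}^{n} e_i = \sum_{i=1}^{n} (y_i - a - b x_i)$$
$$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2$$
With $x_i$ and $y_i$ measured as deviations from their means, the intercept is zero ($a = \bar{y} - b\bar{x} = 0$), and differentiating the sum of squared errors with respect to b gives
$$\frac{\partial \sum_{i=1}^{n} e_i^2}{\partial b} = -2 \sum_{i=1}^{n} x_i y_i + 2b \sum_{i=1}^{n} x_i x_i$$
Setting the derivative to zero:
$$0 = -2 \sum_{i=1}^{n} x_i y_i + 2b \sum_{i=1}^{n} x_i x_i$$
$$b = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}$$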
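The result of the two derivations can be sketched in a few lines of Python. The data below are hypothetical values chosen only for illustration, and `fit_line` and `sum_sq_error` are assumed helper names. The sketch computes b from the deviation sums, a from the means, and then checks that nudging the slope in either direction increases the sum of squared errors.

```python
def fit_line(xs, ys):
    """Return (a, b) for y = a + b*x using the deviation-score formulas."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_xx = sum((x - x_bar) ** 2 for x in xs)
    b = sum_xy / sum_xx          # b = sum(x*y) / sum(x^2), in deviations
    a = y_bar - b * x_bar        # a = y_bar - b * x_bar
    return a, b


def sum_sq_error(xs, ys, a, b):
    """Sum of squared errors for the line y = a + b*x."""
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))


# Hypothetical illustration data (not from the source)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(xs, ys)
best = sum_sq_error(xs, ys, a, b)
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

print("a =", round(a, 4), "b =", round(b, 4))
for delta in (-0.1, 0.1):
    b_alt = b + delta
    a_alt = y_bar - b_alt * x_bar   # keep the line through the mean point
    worse = sum_sq_error(xs, ys, a_alt, b_alt) > best
    print("slope", round(b_alt, 4), "has larger SSE:", worse)
```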
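The correlation coefficient is
$$r = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\left(\sum_{i=1}^{n} x_i^2\right)\left(\sum_{i=1}^{n} y_i^2\right)}}$$
where $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$ are deviations from the means. The regression coefficient is
$$b_j = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2} = r \cdot \frac{sd_y}{sd_x}$$
from which it can be seen that the regression coefficient b is a function of r.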
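The relationship between b and r can be checked numerically. The sketch below is a minimal illustration using the produce-store data from the example later in this deck; the variable names are assumptions. It computes b directly from the deviation sums and again as r times the ratio of standard deviations.

```python
import math
import statistics

# Produce-store data from the example later in this deck
sqft = [1726, 1542, 2816, 5555, 1292, 2208, 1313]    # X: square feet
sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]   # Y: annual sales ($000)

x_bar = statistics.mean(sqft)
y_bar = statistics.mean(sales)
x_dev = [x - x_bar for x in sqft]    # x_i = X_i - X_bar
y_dev = [y - y_bar for y in sales]   # y_i = Y_i - Y_bar

sum_xy = sum(x * y for x, y in zip(x_dev, y_dev))
sum_xx = sum(x * x for x in x_dev)
sum_yy = sum(y * y for y in y_dev)

r = sum_xy / math.sqrt(sum_xx * sum_yy)
b_direct = sum_xy / sum_xx
# Sample standard deviations; the (n - 1) divisors cancel in the ratio
b_from_r = r * statistics.stdev(sales) / statistics.stdev(sqft)

print("r        =", round(r, 7))
print("b direct =", round(b_direct, 6))
print("b from r =", round(b_from_r, 6))
```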
Model Specification
Is Based on Theory
Economic, Psychological & business theory
Mathematical theory
Previous research
Common sense
We ASSUME causality flows
from X to Y
Thinking Challenge: Which Is More Logical?
[Figure: four candidate plots relating Sales to Advertising]
Types of Regression Models
Regression Models
  1 Explanatory Variable: Simple
    Linear
    Non-Linear
  2+ Explanatory Variables: Multiple
    Linear
    Non-Linear
Types of Regression Models
Positive Linear Relationship
Negative Linear Relationship
Relationship NOT Linear
No Relationship
Linear Regression Model
Linear Equations
Y = bX + a
b = Slope = (Change in Y) / (Change in X)
a = Y-intercept
The Scatter Diagram
Plot of all $(X_i, Y_i)$ pairs
[Figure: example scatter diagram]
Simple Linear Regression Model
The relationship between the variables is a linear function: the straight line that best fits the data.
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
$Y_i$ = Dependent (Response) Variable
$X_i$ = Independent (Explanatory) Variable
$\beta_0$ = Y intercept (Constant term)
$\beta_1$ = Slope
$\varepsilon_i$ = Random Error
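Population Linear Regression Model
Observed value: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
Population regression line, E(Y): $\mu_{Y|X} = \beta_0 + \beta_1 X_i$
[Figure: observed values scattered around the population regression line in the X-Y plane]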
Sample Linear Regression Model
$$\hat{Y}_i = b_0 + b_1 X_i$$
$\hat{Y}_i$ = Predicted value of Y for observation i
$X_i$ = Value of X for observation i
$b_0$ = Sample Y-intercept, used as an estimate of the population $\beta_0$
$b_1$ = Sample slope, used as an estimate of the population $\beta_1$
Estimating Parameters:
Least Squares Method
Thinking Challenge
How would you draw a line through the
points? How do you determine which line
fits best?
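Least Squares
"Best fit" means the differences between the actual Y values and the predicted Y values are a minimum.
But positive differences offset negative ones, so least squares minimizes the sum of the squared differences, $\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$.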
What should we expect?
If Y and X are not related, then E(Y|X) = E(Y): we should predict the same Y for every value of X.
[Figure: horizontal line at the mean of Y, Y = constant + (0)X = E(Y)]
What should we expect?
If Y and X are related, then E(Y|X) ≠ E(Y): we should predict a different Y for every value of X. Therefore, the slope will not be zero.
[Figure: sloped line through the point (mean of X, mean of Y), B ≠ 0]
What should we
expect?
At the mean of X, we will predict the
mean of Y. When X deviates from its
mean, we expect Y to also deviate from its
mean
Therefore, we can also think about X
explaining deviation of Y from its mean
value.
Simple Linear Regression Equation: Example
You wish to examine the relationship between the square footage of produce stores and their annual sales. Sample data for 7 stores were obtained. Find the equation of the straight line that fits the data best.
Store   Square Feet   Annual Sales ($000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760
Scatter Diagram Example
[Figure: scatter plot of Annual Sales ($000) against Square Feet]
Excel Output
$\bar{X} = 2350$, $\bar{Y} = 5130$
Equation for the Best Straight Line
$$\hat{Y}_i = b_0 + b_1 X_i = 1636.415 + 1.487 X_i$$
From Excel Printout:
               Coefficients
Intercept      1636.414726
X Variable 1   1.486633657
If X = 0, then $\hat{Y}$ = 1636.415. Realistic?
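The coefficients in the printout can be reproduced from the seven data points. The following is a minimal sketch with assumed names (`predict` is hypothetical). It also prints the observed range of X, since X = 0 lies well outside the data and the intercept is therefore an extrapolation.

```python
# Produce-store data from the example
sqft = [1726, 1542, 2816, 5555, 1292, 2208, 1313]    # X: square feet
sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]   # Y: annual sales ($000)

n = len(sqft)
x_bar = sum(sqft) / n
y_bar = sum(sales) / n

b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, sales))
      / sum((x - x_bar) ** 2 for x in sqft))
b0 = y_bar - b1 * x_bar


def predict(x):
    """Predicted annual sales ($000) for a store of x square feet."""
    return b0 + b1 * x


print("b0 =", round(b0, 3), " b1 =", round(b1, 3))
print("Prediction at X = 0:", round(predict(0), 3))
print("Observed X range:", min(sqft), "to", max(sqft))
```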
The F-test can be written in terms of $r^2$.
The F-test is the test that $r^2 = 0$.
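One standard way to write the F statistic in terms of $r^2$ is $F = \frac{r^2 / k}{(1 - r^2)/(n - k - 1)}$, with k explanatory variables. The slides do not spell this form out, so treat it as an assumption of this sketch. The inputs are the R Square value and the number of observations from the Excel output for the produce stores.

```python
r_squared = 0.94198129   # R Square from the Excel regression statistics
n = 7                    # observations
k = 1                    # explanatory variables (simple regression)

# F = (explained share per model df) / (unexplained share per error df)
f_stat = (r_squared / k) / ((1 - r_squared) / (n - k - 1))

print("F =", round(f_stat, 3))
```

The result can be compared with the F value reported in the Excel ANOVA table for the same data.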
Standard Error of Estimate
$$S_{yx} = \sqrt{\frac{SSE}{n - 2}} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - 2}}$$
The standard deviation of the variation of observations around the regression line
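The formula can be applied directly to the produce-store data. This is a small sketch with assumed variable names; the result should be close to the Standard Error line of the Excel regression statistics.

```python
import math

# Produce-store data from the example
sqft = [1726, 1542, 2816, 5555, 1292, 2208, 1313]    # X: square feet
sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]   # Y: annual sales ($000)

n = len(sqft)
x_bar = sum(sqft) / n
y_bar = sum(sales) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, sales))
      / sum((x - x_bar) ** 2 for x in sqft))
b0 = y_bar - b1 * x_bar

# SSE: squared vertical distances of the observations from the fitted line
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(sqft, sales))
s_yx = math.sqrt(sse / (n - 2))

print("SSE  =", round(sse, 3))
print("S_yx =", round(s_yx, 3))
```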
Measures of Variation: Example
Excel Output for Produce Stores
Regression Statistics
Multiple R           0.9705572
R Square             0.94198129
Adjusted R Square    0.93037754
Standard Error       611.751517
Observations         7
$r^2$ = .94 (R Square); $S_{yx}$ is the Standard Error line (611.75).
94% of the variation in annual sales can be explained by the variability in the size of the store as measured by square footage.
Inferences about the Slope: t Test
t Test for a Population Slope
Is there a linear relationship between X and Y?
Null and Alternative Hypotheses
$H_0$: $\beta_1 = 0$ (No Linear Relationship)
$H_1$: $\beta_1 \neq 0$ (Linear Relationship)
Test Statistic:
$$t = \frac{b_1 - \beta_1}{S_{b_1}}$$
where
$$S_{b_1} = \frac{S_{YX}}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$$
and df = n - 2
Example: Produce Stores
Data for 7 stores: the same square footage and annual sales ($000) figures as in the earlier example.
Regression model obtained: $\hat{Y}_i = 1636.415 + 1.487 X_i$
The slope of this model is 1.487.
Is there a linear relationship between the square footage of a store and its annual sales?
Inferences about the Slope: t Test Example
$H_0$: $\beta_1 = 0$
$H_1$: $\beta_1 \neq 0$
$\alpha$ = .05
df = 7 - 2 = 5
Critical Value(s): t = ±2.5706 (.025 in each tail; reject $H_0$ beyond either value)
Test Statistic (From Excel Printout):
               t Stat      P-value
Intercept      3.6244333   0.0151488
X Variable 1   9.009944    0.0002812
Decision: Reject $H_0$
Conclusion: There is evidence of a relationship.
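The test statistic in the printout can be recomputed from the data. The sketch below uses assumed variable names and takes the critical value 2.5706 from the slide rather than computing it; it derives $S_{b_1}$ and the t statistic for $H_0$: $\beta_1 = 0$, then applies the two-tailed decision rule.

```python
import math

# Produce-store data from the example
sqft = [1726, 1542, 2816, 5555, 1292, 2208, 1313]    # X: square feet
sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]   # Y: annual sales ($000)

n = len(sqft)
x_bar = sum(sqft) / n
y_bar = sum(sales) / n
ssx = sum((x - x_bar) ** 2 for x in sqft)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, sales)) / ssx
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(sqft, sales))
s_yx = math.sqrt(sse / (n - 2))     # standard error of estimate
s_b1 = s_yx / math.sqrt(ssx)        # standard error of the slope

beta1_null = 0.0                    # hypothesized slope under H0
t_stat = (b1 - beta1_null) / s_b1
t_crit = 2.5706                     # from the slide: alpha = .05, df = 5, two-tailed

reject_h0 = abs(t_stat) > t_crit
print("S_b1 =", round(s_b1, 6))
print("t    =", round(t_stat, 4))
print("Reject H0:", reject_h0)
```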
Connection of F and t in simple regression
Excel Printout for Produce Stores
ANOVA
             df   SS            F          Significance F
Regression   1    30380456.12   81.17909   0.0002812
Residual     5    1871199.595
Total        6    32251655.71

               Coefficients   Standard Error   P-value     Lower 95%
Intercept      1636.41473     451.4953308      0.0151488   475.810926
X Variable 1   1.48663366     0.164999212      0.0002812   1.06249037
The t test for B = 0 is identical to the F test for $r^2 = 0$ in simple regression. The t statistic is the square root of the F statistic (t = 1.4866/.1650 = 9.01): $F_{1,n-2} = t^2_{n-2}$
Note: Significance F and the P-value for the slope (0.0002812) are identical in simple regression!
Inferences about the Slope: Confidence Interval Example
Confidence Interval Estimate of the Slope: $b_1 \pm t_{n-2} S_{b_1}$
Excel Printout for Produce Stores
               Lower 95%    Upper 95%
Intercept      475.810926   2797.01853
X Variable 1   1.06249037   1.91077694
At the 95% level of confidence, the confidence interval for the slope is (1.062, 1.911), which does not include 0.
Conclusion: There is a significant linear relationship between annual sales and the size of the store.
Slope estimates make line pivot around mean point
Different estimates of B tilt the line around the mean point.
If B is different, this will give small differences in the forecast for Y near the mean, but big differences away from the mean.
[Figure: Sales against Square Footage, showing the regression line Y = 1636 + 1.49X together with the lines for the lower 95% estimate of B (1.06) and the upper 95% estimate of B (1.91)]
Estimation of Predicted Values
Confidence Interval Estimate for $\mu_{Y|X}$, the Mean of Y given a particular $X_i$:
$$\hat{Y}_i \pm t_{n-2} \, S_{yx} \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$$
$t_{n-2}$ = t value from table with df = n - 2
$S_{yx}$ = Standard error of the estimate
The size of the interval varies according to the distance of $X_i$ from the mean, $\bar{X}$.
Estimation of Predicted Values
Confidence Interval Estimate for an Individual Response $Y_i$ at a Particular $X_i$:
$$\hat{Y}_i \pm t_{n-2} \, S_{yx} \sqrt{1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$$
The addition of this 1 under the square root increases the width of the interval relative to that for the mean of Y.
Confidence Bands
Error associated with a forecast has two
components:
Error at the mean (standard error of
estimate)
Error in estimating B
Therefore, the confidence intervals
around forecasts will be larger as we
move away from the mean of X
Interval Estimates for Different Values of X
[Figure: Y against X, showing the confidence interval for the mean of Y and the wider confidence interval for an individual $Y_i$ at a given X, with $\bar{X}$ marked on the X axis]
Example: Produce Stores
Data for 7 stores: the same square footage and annual sales ($000) figures as in the earlier example.
Regression Model Obtained: $\hat{Y}_i = 1636.415 + 1.487 X_i$
Predict the annual sales for a store with 2000 square feet.
Estimation of Predicted Values: Example
Confidence Interval Estimate for $\mu_{Y|X}$
Find the 95% confidence interval for the average annual sales for stores of 2,000 square feet.
Predicted Sales: $\hat{Y}_i = 1636.415 + 1.487 X_i = 4610.45$ ($000)
$\bar{X} = 2350.29$, $S_{YX} = 611.75$, $t_{n-2} = t_5 = 2.5706$
$$\hat{Y}_i \pm t_{n-2} \, S_{yx} \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}} = 4610.45 \pm 980.97$$
Confidence interval for mean Y
Estimation of Predicted Values: Example
Confidence Interval Estimate for Individual Y
Find the 95% confidence interval for the annual sales of one particular store of 2,000 square feet.
Predicted Sales: $\hat{Y}_i = 1636.415 + 1.487 X_i = 4610.45$ ($000)
$\bar{X} = 2350.29$, $S_{YX} = 611.75$, $t_{n-2} = t_5 = 2.5706$
$$\hat{Y}_i \pm t_{n-2} \, S_{yx} \sqrt{1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}} = 4610.45 \pm 1853.45$$
Confidence interval for individual Y
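Both interval formulas can be evaluated in one short sketch. The helper name `interval_half_widths` is hypothetical, and the t value 2.5706 is taken from the slides rather than computed. Run on the seven stores listed in the example, the sketch gives a prediction of about 4,609.7 with unrounded coefficients and half-widths of roughly 613 (mean of Y) and 1,688 (individual store). These do not match the printed ±980.97 and ±1853.45, which may stem from a different data set, so the printed half-widths are worth re-checking.

```python
import math

# Produce-store data from the example
sqft = [1726, 1542, 2816, 5555, 1292, 2208, 1313]    # X: square feet
sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]   # Y: annual sales ($000)

n = len(sqft)
x_bar = sum(sqft) / n
y_bar = sum(sales) / n
ssx = sum((x - x_bar) ** 2 for x in sqft)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, sales)) / ssx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(sqft, sales))
s_yx = math.sqrt(sse / (n - 2))
t_crit = 2.5706   # from the slides: 95% confidence, df = n - 2 = 5


def interval_half_widths(x0):
    """Return (half-width for mean of Y, half-width for individual Y) at x0."""
    distance_term = 1 / n + (x0 - x_bar) ** 2 / ssx
    mean_hw = t_crit * s_yx * math.sqrt(distance_term)
    indiv_hw = t_crit * s_yx * math.sqrt(1 + distance_term)
    return mean_hw, indiv_hw


x0 = 2000
y_hat = b0 + b1 * x0
mean_hw, indiv_hw = interval_half_widths(x0)
print("Predicted sales at 2000 sq ft:", round(y_hat, 2))
print("Mean of Y:    +/-", round(mean_hw, 2))
print("Individual Y: +/-", round(indiv_hw, 2))
```

Calling `interval_half_widths` with an X further from the mean, such as 5000, returns wider intervals, which is the pivoting effect described in the confidence-bands slides.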