Week 10 & 11 -1
Learning Objectives
Positive Relationship
[Figure: scatter plot of pressure (vertical axis, 120 to 150) against age (horizontal axis, 40 to 70), showing a positive relationship]
Negative Relationship
[Figure: scatter plot of final grade (vertical axis, 40 to 90) against number of absences (horizontal axis), showing a negative relationship]
No Relationship
[Figure: scatter plot of y against x, showing no relationship]
(continued)
[Figure: scatter plots of y against x; panels: strong relationships, weak relationships]
Scatter Plot Examples
(continued)
[Figure: scatter plot of y against x, showing no relationship]
Correlation Coefficient
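$$r = \frac{\operatorname{cov}(X, Y)}{S_X S_Y} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \, \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$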
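The correlation coefficient can be sketched in plain Python by computing the deviations about the means directly. The function name `pearson_r` and the sample values are illustrative assumptions, not taken from the slides.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r from deviations about the means."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    # Numerator: sum of cross-products of the deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the two sums of squares
    ssx = sum((x - x_bar) ** 2 for x in xs)
    ssy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(ssx * ssy)

# Made-up data, for illustration only
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # perfectly linear, increasing
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # perfectly linear, decreasing
```

A positive result corresponds to a positive relationship and a negative result to a negative relationship; the function divides by zero if either sample is constant, which this sketch does not guard against.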
[Figure: sign of the deviation products (X − X̄)(Y − Ȳ); positive (+ve) and negative (-ve) cases]
Types of Relationships
(continued)
[Figure: scatter plots of Y against X; panels: strong relationships, weak relationships]
Types of Relationships
(continued)
[Figure: scatter plot of Y against X, showing no relationship]
Simple Linear Regression Model
The population regression model:
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
where
$Y_i$ = dependent variable
$\beta_0$ = population Y intercept
$\beta_1$ = population slope coefficient
$X_i$ = independent variable
$\varepsilon_i$ = random error term
$\beta_0 + \beta_1 X_i$ is the linear component; $\varepsilon_i$ is the random error component.
[Figure: the line Y = β0 + β1X with intercept β0 and slope β1; for a given Xi, the observed value of Y differs from the predicted value of Y by the random error εi for that Xi value]
Simple Linear Regression Equation
The simple linear regression equation provides an estimate of the population regression line:
$$\hat{Y}_i = b_0 + b_1 X_i$$
where
$\hat{Y}_i$ = estimated (or predicted) Y value for observation i
$b_0$ = estimate of the regression intercept
$b_1$ = estimate of the regression slope
$X_i$ = value of X for observation i
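The slides at this point do not show how b0 and b1 are obtained. A minimal sketch, assuming the standard least-squares formulas b1 = SSXY / SSX and b0 = Ȳ − b1·X̄, could look like this; the function name `fit_line` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares estimates (b0, b1) for the line Y-hat = b0 + b1 * X."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-products and sum of squares of X about the mean
    ssxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ssx = sum((x - x_bar) ** 2 for x in xs)
    b1 = ssxy / ssx            # estimated slope
    b0 = y_bar - b1 * x_bar    # estimated intercept
    return b0, b1

# Made-up data lying exactly on Y = 1 + 2X
b0, b1 = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)
```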
[Figure: scatter plot of house price ($1000s, 0 to 350) against square feet (0 to 3000)]
ANOVA
             df   SS           MS           F         Significance F
Regression   1    18934.9348   18934.9348   11.0848   0.01039
Residual     8    13665.5652   1708.1957
Total        9    32600.5000
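The entries of the ANOVA table are linked by simple arithmetic, which the following sketch re-derives from the SS and df columns. It assumes the usual relationships (total SS is the sum of the regression and residual SS, each MS is SS divided by its df, and F is the ratio of the two mean squares); the variable names are illustrative.

```python
# Values copied from the ANOVA table
ssr, df_reg = 18934.9348, 1   # regression sum of squares and its df
sse, df_res = 13665.5652, 8   # residual (error) sum of squares and its df

sst = ssr + sse               # total sum of squares
msr = ssr / df_reg            # regression mean square
mse = sse / df_res            # residual mean square
f_stat = msr / mse            # F statistic

print(round(sst, 4), round(mse, 4), round(f_stat, 4))
```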
[Figure: fitted regression line of house price ($1000s) on square feet; slope = 0.10977, intercept = 98.248]
$$\hat{Y} = 98.25 + 0.1098(2000) = 317.85$$
The predicted price for a house with 2000 square feet is 317.85 ($1,000s) = $317,850.
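The same prediction as a short sketch, using the rounded coefficients from the worked calculation. The function name `predict_price` is a hypothetical helper, not something defined in the slides.

```python
B0 = 98.25    # rounded intercept, in $1000s
B1 = 0.1098   # rounded slope, in $1000s per square foot

def predict_price(square_feet, b0=B0, b1=B1):
    """Predicted house price in $1000s for a given size in square feet."""
    return b0 + b1 * square_feet

price_thousands = predict_price(2000)
print(round(price_thousands, 2))            # price in $1000s
print(round(price_thousands * 1000))        # price in dollars
```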
Interpolation vs. Extrapolation
When using a regression model for prediction,
only predict within the relevant range of data
[Figure: house price ($1000s) against square feet, with the relevant range for interpolation marked. Do not try to extrapolate beyond the range of observed X's.]
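One way to enforce this rule in code is to refuse predictions outside the observed range of X. This is only a sketch: the function name `predict_within_range` is hypothetical, and the bounds `x_min` and `x_max` must come from the actual data; the values in the usage lines are placeholders, not the range from the slides.

```python
def predict_within_range(x, b0, b1, x_min, x_max):
    """Return b0 + b1 * x, but only when x lies inside the observed X range."""
    if not (x_min <= x <= x_max):
        raise ValueError(
            f"x = {x} is outside the observed range [{x_min}, {x_max}]; "
            "refusing to extrapolate"
        )
    return b0 + b1 * x

# Placeholder bounds for illustration only (hypothetical, not from the slides)
print(round(predict_within_range(2000, 98.25, 0.1098, x_min=1000, x_max=2500), 2))
try:
    predict_within_range(5000, 98.25, 0.1098, x_min=1000, x_max=2500)
except ValueError as err:
    print(err)
```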
Measures of Variation
note: $0 \le r^2 \le 1$
[Figure: Y against X with $r^2 = 1$]
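The slides here state only the bounds on r². Assuming the standard definition of the coefficient of determination, r² = SSR / SST (regression sum of squares over total sum of squares), the value for the house price example can be sketched from the ANOVA sums of squares; the variable names are illustrative.

```python
import math

ssr = 18934.9348   # regression sum of squares (from the ANOVA table)
sst = 32600.5000   # total sum of squares (from the ANOVA table)

r_squared = ssr / sst          # share of the variation in Y explained by X
r = math.sqrt(r_squared)       # positive root, since the fitted slope is positive

print(round(r_squared, 4), round(r, 3))
```

The square root agrees with the value r = .762 used in the t test for the correlation coefficient later in the slides.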
Examples of Approximate r² Values
$0 < r^2 < 1$
[Figure: scatter plot of Y against X]
Examples of Approximate r² Values
$r^2 = 0$
No linear relationship between X and Y
[Figure: scatter plot of Y against X]
$$S_{YX} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n-2}}$$
Where
SSE = error sum of squares
n = sample size
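As a quick check of the formula, the following sketch evaluates it for the house price example, taking SSE from the Residual row of the ANOVA table and n = 10 from the t test example (d.f. = 10 − 2). The function name `std_error_of_estimate` is a hypothetical helper.

```python
import math

def std_error_of_estimate(sse, n):
    """Standard error of the estimate: sqrt(SSE / (n - 2))."""
    if n <= 2:
        raise ValueError("n must be greater than 2")
    return math.sqrt(sse / (n - 2))

s_yx = std_error_of_estimate(sse=13665.5652, n=10)
print(round(s_yx, 4))   # in the units of Y, here $1000s
```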
ANOVA
             df   SS           MS           F         Significance F
Regression   1    18934.9348   18934.9348   11.0848   0.01039
[Figure: two scatter plots of Y against X; panels: small $S_{YX}$, large $S_{YX}$]
Normality of Error
Error values (ε) are normally distributed for any given
value of X
Homoscedasticity
The probability distribution of the errors has constant
variance
Independence of Errors
Error values are statistically independent
$$e_i = Y_i - \hat{Y}_i$$
The residual for observation i, ei, is the difference
between its observed and predicted value
Check the assumptions of regression by examining the
residuals
Examine for linearity assumption
Examine for constant variance for all levels of X
(homoscedasticity)
Evaluate normal distribution assumption
Evaluate independence assumption
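A minimal sketch of the residual calculation that these checks start from. The function name `residuals` and the data are made up for illustration; in practice the residuals would then be plotted against X to look for curvature, changing spread, or patterns over time.

```python
def residuals(xs, ys, b0, b1):
    """Residuals e_i = Y_i - Y-hat_i for the fitted line Y-hat = b0 + b1 * X."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Made-up observations and a made-up fitted line Y-hat = 1 + 2X
xs = [1, 2, 3, 4]
ys = [3.5, 4.5, 7.5, 8.5]
for x, e in zip(xs, residuals(xs, ys, b0=1.0, b1=2.0)):
    print(x, e)   # pairs (X_i, e_i) that a residual plot would display
```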
[Figure: plots of Y against x with the corresponding residuals against x; panels: Not Linear, Linear]
Residual Analysis for Homoscedasticity
[Figure: plots of Y against x with the corresponding residuals against x]
[Figure: residuals plotted against X; panels: Not Independent, Independent]
$$S_{b_1} = \frac{S_{YX}}{\sqrt{SSX}} = \frac{S_{YX}}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$$
where:
$S_{b_1}$ = Estimate of the standard error of the least squares slope
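The formula translates directly into code. In this sketch the function name `slope_standard_error` and the input values are illustrative assumptions; S_YX is passed in as the standard error of the estimate.

```python
import math

def slope_standard_error(s_yx, xs):
    """Estimated standard error of the slope: S_YX / sqrt(SSX)."""
    if len(xs) < 2:
        raise ValueError("need at least 2 X values")
    x_bar = sum(xs) / len(xs)
    ssx = sum((x - x_bar) ** 2 for x in xs)   # sum of squares of X about its mean
    return s_yx / math.sqrt(ssx)

# Made-up inputs for illustration: SSX = 2 for X = 1, 2, 3
print(slope_standard_error(2.0, [1, 2, 3]))
```

Because SSX is in the denominator, X values that are more spread out give a smaller standard error for the slope, all else equal.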
$$t = \frac{b_1 - \beta_1}{S_{b_1}}, \qquad \text{d.f.} = n - 2$$
where:
$b_1$ = regression slope coefficient
$\beta_1$ = hypothesized slope
$S_{b_1}$ = standard error of the slope
H0: β1 = 0
H1: β1 ≠ 0
From Excel output:
              Coefficients   Standard Error   t Stat    P-value
Intercept     98.24833       58.03348         1.69296   0.12892
Square Feet   0.10977        0.03297          3.32938   0.01039
$$t = \frac{b_1 - \beta_1}{S_{b_1}} = \frac{0.10977 - 0}{0.03297} = 3.32938$$
d.f. = 10 - 2 = 8
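Decision: Reject H0
Conclusion: There is sufficient evidence that square footage affects house price
[Figure: t distribution with d.f. = 8 and α/2 = .025 in each tail; reject H0 below -tα/2 = -2.3060 or above tα/2 = 2.3060, do not reject H0 in between; the test statistic 3.329 lies above 2.3060]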
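The decision rule can be sketched as follows, using the slope, its standard error, and the critical value 2.3060 quoted in the example. The function name `slope_t_test` is a hypothetical helper; the critical value is passed in rather than computed, so no statistics library is needed.

```python
def slope_t_test(b1, s_b1, t_crit, beta1_0=0.0):
    """Two-tailed t test of H0: beta1 = beta1_0; returns (t statistic, reject H0?)."""
    t_stat = (b1 - beta1_0) / s_b1
    return t_stat, abs(t_stat) > t_crit

# Values from the Excel output; t_crit is the two-tailed value for alpha = .05, d.f. = 8
t_stat, reject_h0 = slope_t_test(b1=0.10977, s_b1=0.03297, t_crit=2.3060)
print(round(t_stat, 3), reject_h0)
```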
Inferences about the Slope: t Test Example
(continued)
H0: β1 = 0
H1: β1 ≠ 0
From Excel output:
              Coefficients   Standard Error   t Stat    P-value
Intercept     98.24833       58.03348         1.69296   0.12892
Square Feet   0.10977        0.03297          3.32938   0.01039
P-value = 0.01039
Test statistic
$$t = \frac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$
(with n – 2 degrees of freedom)
where
$r = +\sqrt{r^2}$ if $b_1 > 0$
$r = -\sqrt{r^2}$ if $b_1 < 0$
Example: House Prices
Is there evidence of a linear relationship between square feet and house price at the .05 level of significance?
$$t = \frac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{.762 - 0}{\sqrt{\dfrac{1 - .762^2}{10 - 2}}} = 3.33$$
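Example: Test Solution
$$t = \frac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{.762 - 0}{\sqrt{\dfrac{1 - .762^2}{10 - 2}}} = 3.33$$
d.f. = 10 - 2 = 8
Decision: Reject H0
Conclusion: There is evidence of a linear association at the 5% level of significance
[Figure: t distribution with d.f. = 8 and α/2 = .025 in each tail; reject H0 below -tα/2 = -2.3060 or above tα/2 = 2.3060, do not reject H0 in between; the test statistic 3.33 lies above 2.3060]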
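The test statistic for the correlation coefficient can be reproduced with a few lines of Python. The function name `correlation_t_stat` is hypothetical; the inputs r = .762, n = 10 and the critical value 2.3060 are the ones quoted in the example.

```python
import math

def correlation_t_stat(r, n, rho0=0.0):
    """t statistic for H0: rho = rho0, with n - 2 degrees of freedom."""
    if n <= 2:
        raise ValueError("n must be greater than 2")
    return (r - rho0) / math.sqrt((1 - r ** 2) / (n - 2))

t_corr = correlation_t_stat(r=0.762, n=10)
reject_corr = abs(t_corr) > 2.3060   # two-tailed critical value, alpha = .05, d.f. = 8
print(round(t_corr, 2), reject_corr)
```

The result matches the t statistic for the slope up to rounding, since both tests address the same question in simple linear regression.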
Pitfalls of Regression Analysis