2
Outline
1. Scatterplots are Pictures of 1→1 association
2. Correlation gives a number for 1→1 association
3. Simple Regression is Better than Correlation
3
Does Having Too Many Students
Per Teacher Lower Test Marks?
4
Scatterplots are Pictures of 1→1
Association
5
Is there a Number for This
Relationship?
6
What about Mean? Variance?
7
Treat this as a Dataset on Student-teacher Ratio (STR), called 'X'
8
Treat this as a Dataset on Student-teacher Ratio (STR), called 'X'
Imagine Falling Rain
9
Collapse onto 'X' (horizontal) axis
10
Ignore 'Y' (vertical)
11
Sample Mean

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
12
Sample Variance

$S_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$
13
Standard error/deviation is the square root of the variance

$S_x = \sqrt{S_x^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$
14
(It is very close to a typical departure of x from the mean:
'standard' = 'typical'; 'deviation/error' = departure from the mean)

$S_x \approx \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$
15
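These three formulas translate directly into code. A minimal sketch in Python with NumPy; the STR values are hypothetical, invented purely for illustration:

```python
import numpy as np

x = np.array([19.3, 20.1, 18.5, 22.0, 17.8])  # hypothetical STR values

x_bar = x.mean()          # sample mean
s2_x = x.var(ddof=1)      # sample variance (ddof=1 gives the n-1 denominator)
s_x = x.std(ddof=1)       # standard deviation = square root of the variance

print(x_bar, s2_x, s_x)
```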
Treat this as a Dataset on Test score, called 'Y'
16
Collapse onto Y axis
17
Calculate Mean $\bar{y}$ and Variance $S_y^2$
18
Is there a Number for This
Relationship? Not Yet
19
Break up All Observations into 4 Quadrants
[Scatterplot split at $\bar{x}$ and $\bar{y}$: quadrants II | I above, III | IV below]
20
Fill In the Signs of Deviations from Means for Different Quadrants

Quadrant I: $x_i - \bar{x} > 0$, $y_i - \bar{y} > 0$
Quadrant II: $x_i - \bar{x} < 0$, $y_i - \bar{y} > 0$
Quadrant III: $x_i - \bar{x} < 0$, $y_i - \bar{y} < 0$
Quadrant IV: $x_i - \bar{x} > 0$, $y_i - \bar{y} < 0$
21
The Products are Positive in I and III

Quadrant I: $(x_i - \bar{x})(y_i - \bar{y}) > 0$
Quadrant III: $(x_i - \bar{x})(y_i - \bar{y}) > 0$
23
The Products are Negative in II and IV

Quadrant II: $(x_i - \bar{x})(y_i - \bar{y}) < 0$
Quadrant IV: $(x_i - \bar{x})(y_i - \bar{y}) < 0$
24
Sample Covariance, $S_{xy}$, describes the Relationship between X and Y

$S_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$

If $S_{xy} > 0$, most data lie in I and III: this concurs with our visual common sense because it looks like a positive relationship.
If $S_{xy} < 0$, most data lie in II and IV: this concurs with our visual common sense because it looks like a negative relationship.
If $S_{xy} = 0$, data are 'evenly spread' across I-IV.
25
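The quadrant logic above is exactly what the covariance formula computes: products of deviations, averaged with an n-1 denominator. A sketch in Python (NumPy assumed; the data are hypothetical):

```python
import numpy as np

x = np.array([19.3, 20.1, 18.5, 22.0, 17.8])        # hypothetical STR
y = np.array([661.0, 655.2, 668.4, 641.0, 672.1])   # hypothetical test scores

# Sample covariance: sum of products of deviations over n-1
s_xy = ((x - x.mean()) * (y - y.mean())).sum() / (len(x) - 1)

# np.cov returns the 2x2 covariance matrix; the off-diagonal entry is S_xy
assert np.isclose(s_xy, np.cov(x, y)[0, 1])
print(s_xy)  # negative here: the products fall in quadrants II and IV
```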
What About Our Data?
[Scatterplot of the data with quadrant lines at $\bar{x}$ and $\bar{y}$: II | I above, III | IV below]
26
Large Negative Sxy
27
Large Positive Sxy
28
Zero Sxy
29
Our Data has a Mild Negative Covariance: $S_{xy} < 0$
[Scatterplot of the data with quadrant lines at $\bar{x}$ and $\bar{y}$]
30
Correlation, $r_{XY}$, is a Measure of Relationship that is Unit-less

$r_{XY} = \frac{S_{XY}}{S_X S_Y}$

It can be proved that it lies between -1 and 1: $-1 \le r_{XY} \le 1$.
It has the same sign as $S_{XY}$, so …
31
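Dividing the covariance by the two standard deviations strips the units, which is why $r_{XY}$ is comparable across datasets. A quick check of the formula (same hypothetical data as before):

```python
import numpy as np

x = np.array([19.3, 20.1, 18.5, 22.0, 17.8])
y = np.array([661.0, 655.2, 668.4, 641.0, 672.1])

r_xy = np.cov(x, y)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))

# np.corrcoef computes the same unit-free quantity, always in [-1, 1]
assert np.isclose(r_xy, np.corrcoef(x, y)[0, 1])
print(r_xy)
```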
Mild Negative Correlation: $r_{XY} = -.2264$
[Scatterplot of the data with quadrant lines at $\bar{x}$ and $\bar{y}$]
32
Outline
1. Scatterplots are Pictures of 1→1 association
2. Correlation gives a number for 1→1 association
3. Simple Regression is Better than Correlation
33
Outline
1. Scatterplots are Pictures of 1→1 association
2. Correlation gives a number for 1→1 association
3. Simple Regression is Better than Correlation
But…
How much does Y change when X changes?
What is a good guess of Y if X =25?
What does correlation = -.2264 mean anyway?
34
Outline
1. Scatterplots are Pictures of 1→1 association
2. Correlation gives a number for 1→1 association
3. Simple Regression is Better than Correlation
35
What is Simple Regression?

$\hat{Y} = b_0 + b_1 X$
36

$\hat{Y} = b_0 + b_1 X$
[Scatterplot with the guessed line annotated: $b_0$ is the intercept, $b_1$ the slope]
37
We Get our Guessed Line Using '(Ordinary) Least Squares' [OLS]
OLS minimises the squared difference between a
regression line and the observations.
We can view these squared differences as squares.
This task then becomes the minimisation of the
area of the squares.
Applet: http://hadm.sph.sc.edu/Courses/J716/demos/LeastSquares/LeastSquaresDemo.html
38
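For simple regression the minimisation has a closed-form answer: $b_1 = S_{xy}/S_x^2$ and $b_0 = \bar{y} - b_1 \bar{x}$. A sketch with hypothetical data; np.polyfit minimises the same sum of squared differences:

```python
import numpy as np

x = np.array([19.3, 20.1, 18.5, 22.0, 17.8])
y = np.array([661.0, 655.2, 668.4, 641.0, 672.1])

b1 = np.cov(x, y)[0, 1] / np.var(x, ddof=1)  # slope: S_xy / S_x^2
b0 = y.mean() - b1 * x.mean()                # intercept: line passes through the means

# np.polyfit(deg=1) minimises the same squared differences
assert np.allclose([b1, b0], np.polyfit(x, y, 1))
```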
Measures of Fit
(Section 4.3)
39
The Standard Error of the Regression (SER)

The SER measures the spread of the distribution of u. The SER is (almost) the sample standard deviation of the OLS residuals:

$SER = \sqrt{\frac{1}{n-2} \sum_{i=1}^{n} (\hat{u}_i - \bar{\hat{u}})^2} = \sqrt{\frac{1}{n-2} \sum_{i=1}^{n} \hat{u}_i^2}$

(the second equality holds because $\bar{\hat{u}} = \frac{1}{n} \sum_{i=1}^{n} \hat{u}_i = 0$).
40
$SER = \sqrt{\frac{1}{n-2} \sum_{i=1}^{n} \hat{u}_i^2}$

The SER:
has the units of u, which are the units of Y
measures the average "size" of the OLS residual (the average "mistake" made by the OLS regression line)
Don't worry about the n-2 (instead of n-1 or n): the reason is too technical, and doesn't matter if n is large.
41
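A sketch of the SER computed from its definition (hypothetical data; note the n-2 denominator):

```python
import numpy as np

x = np.array([19.3, 20.1, 18.5, 22.0, 17.8])
y = np.array([661.0, 655.2, 668.4, 641.0, 672.1])

b1, b0 = np.polyfit(x, y, 1)
u_hat = y - (b0 + b1 * x)        # OLS residuals (they sum to zero by construction)

n = len(y)
ser = np.sqrt((u_hat ** 2).sum() / (n - 2))
print(ser)                       # in the units of y
```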
How the Computer Did it
(SW Key Concept 4.2)
42
The OLS Line has a Small Negative Slope

One of the districts in the data set is Antelope, CA, for which STR = 19.33 and Test Score = 657.8.

predicted value: $\hat{Y}_{Antelope}$ = 698.9 – 2.28 × 19.33 = 654.8
residual: $\hat{u}_{Antelope}$ = 657.8 – 654.8 = 3.0
46
$R^2$ and SER evaluate the Model

$\widehat{TestScore}$ = 698.9 – 2.28 STR
48
Recap
1. Scatterplots are Pictures of 1→1 association
2. Correlation gives a number for 1→1 association
3. Simple Regression is Better than Correlation
But…
How much does Y change when X changes?
What is a good guess of Y if X =25?
What does correlation = -.2264 mean anyway?
49
Recap
1. Scatterplots are Pictures of 1→1 association
2. Correlation gives a number for 1→1 association
3. Simple Regression is Better than Correlation
But…
How much does Y change when X changes? $b_1 \Delta X$
What is a good guess of Y if X = 25? $b_0 + b_1(25)$
What does correlation = -.2264 mean anyway?
50
Outline
1. Scatterplots are Pictures of 1→1 association
2. Correlation gives a number for 1→1 association
3. Simple Regression is Better than Correlation
But…
How much does Y change when X changes? $b_1 \Delta X$
What is a good guess of Y if X = 25? $b_0 + b_1(25)$
What does correlation = -.2264 mean anyway?
Surprise: $R^2 = r_{XY}^2$
51
Chapter 5
54
The Underlying Model (or 'Population Regression Function')

$Y_i = \beta_0 + \beta_1 X_i + u_i$,  i = 1,…, n

X is the STR
Y is the Test score
$\beta_0$ = intercept
$\beta_1 = \frac{\Delta \text{Test score}}{\Delta \text{STR}}$ = change in test score for a unit change in STR

If we also guess $\beta_0$ we can also predict Test score when STR has a particular value.
Clearly, we want good guesses (estimates) of $\beta_0$ and $\beta_1$.
56
A Picture is Worth 1000 Words
57
From Now on we Use 'b' or '$\hat\beta$' to Signify our Guesses, or 'Estimates', of the Slope or Intercept. We never see the True Line.
[Scatterplot: fitted line $b_0 + b_1 x$ with residuals $\hat{u}_1$ and $\hat{u}_2$ marked]
58
From Now on we Use 'b' or '$\hat\beta$' to Signify our Estimates of the Slope or Intercept, and $\hat{u}$ for Guesses of u. We never see the True Line or u's.
[Scatterplot: fitted line $b_0 + b_1 x$ with residuals $\hat{u}_1$ and $\hat{u}_2$ marked]
59
Our Estimators are Really Random

Least squares estimators have a distribution; they are different every time you take a different sample (like an average of 5 heights, or 7 exam marks).
The estimators are Random Variables. A random variable generates numbers with a central measure called a mean and a volatility called a standard error.
Least squares estimators $b_0$ and $b_1$ have means $\beta_0$ and $\beta_1$.
Hypothesis testing:
e.g. How do we test whether the slope $\beta_1$ is zero, or -37?
Confidence intervals:
e.g. What is a reasonable range of guesses for the slope $\beta_1$?
60
Outline
1. OLS Assumptions
2. OLS Sampling Distribution
3. Hypothesis Testing
4. Confidence Intervals
61
Outline
1. OLS Assumptions (Very Technical)
2. OLS Sampling Distribution
3. Hypothesis Testing
4. Confidence Intervals
63
Outline
1. OLS Assumptions (When will OLS be ‘good’?)
2. OLS Sampling Distribution
3. Hypothesis Testing
4. Confidence Intervals
64
Estimator Distributions Depend on Least Squares Assumptions

A key part of the model is the assumptions made about the residuals $u_t$ for t = 1, 2, …, n:
1. $E(u_t) = 0$
2. $E(u_t^2) = \sigma^2 = SER^2$ (note: $\sigma^2$, not $\sigma_t^2$; the variance is invariant over t)
3. $E(u_t u_s) = 0$ for $t \ne s$
4. $Cov(X_t, u_t) = 0$
5. $u_t \sim$ Normal
65
SW has different assumptions;
Use mine for any Discussions
The conditional distribution of u given X has
mean zero, that is, E(u|X = x) = 0. (a combination
of 1. and 4.)
(Xi,Yi), i =1,…,n, are i.i.d. (unnecessary in many
applications)
Large outliers in X and/or Y are rare. (technical
assumption)
66
How reasonable
are these assumptions?
To answer, we need to understand them.
1. $E(u_t) = 0$
2. $E(u_t^2) = \sigma^2 = SER^2$ (note: $\sigma^2$, not $\sigma_t^2$; invariant over t)
3. $E(u_t u_s) = 0$ for $t \ne s$
4. $Cov(X_t, u_t) = 0$
5. $u_t \sim$ Normal
67
It's All About u

1. $E(u_t) = 0$
2. $E(u_t^2) = \sigma^2 = SER^2$
3. $E(u_t u_s) = 0$ for $t \ne s$
4. $Cov(X_t, u_t) = 0$
5. $u_t \sim$ Normal
68
1. $E(u_t) = 0$ is not a big deal

Provided the model has a constant, this is not a restrictive assumption.
If 'all the other influences' don't have a zero mean, the estimated constant will simply adjust to the point where u does have a zero mean.
Really, $\beta_0 + u$ could be thought of as everything else that affects y apart from x.
69
2. $E(u_t^2) = \sigma^2 = SER^2$ is Controversial
70
Hetero Related to X is very Common
[Figure: homoskedastic vs heteroskedastic scatterplots]
71
Our Data Looks OK, but Don't be Complacent
[Figure: homoskedastic vs heteroskedastic scatterplots]
72
3. $E(u_t u_s) = 0$ for $t \ne s$
73
Aside: Hetero and Auto are not a
Disaster
Hetero plagues cross-sectional data, Auto plagues
time series.
Remarkably, Heteroskedasticity and
Autocorrelation do not bias the Least Squares
Estimators.
This is a very strange result!
74
Hetero Doesn't Bias
[Scatterplot: heteroskedastic data around the fitted line]
75
If You Could See the True Line, You'd Realize Hetero is Bad for OLS
76
But OLS is Still Unbiased!

In case (b), OLS is still unbiased because the next draw is just as likely to find the third error above the true line, pulling up the (negative) slope of the least squares line. On average, the true line would be revealed with many samples.

[Figure: two panels, (a) and (b), of y against x around the true line]

But we will make an adjustment to our analysis later: OLS is no longer 'best', which means minimum variance.
77
Conquer Hetero and Auto with Just One Click

SW recommend you correct standard errors for hetero and auto. In EViews you do this by:
Estimate / Options / Heteroskedasticity consistent coefficient covariance. Leave 'White' ticked if only worried about hetero; tick 'Newey-West' to correct for both.
Because OLS is unbiased, the correction only applies to the standard errors.
Sometimes, we will use standard errors corrected in this way.
78
4. $Cov(X_t, u_t) = 0$
82
With OLS Assumptions the CLT Gives Us the Distribution of $\hat\beta_1$

$\hat\beta_1 \sim N(\beta_1, SE(\hat\beta_1)^2)$
83
t-distribution (small n) vs Normal
[Figure: densities from -3 to 3; the t has fatter tails than the standard normal]
84
EViews output gives us $SE(\hat\beta_1)$
87
Outline
1. OLS Assumptions
2. OLS Sampling Distribution
3. Hypothesis Testing
4. Confidence Intervals
88
EViews Output Can be Summarized in Two Lines

Put standard errors in parentheses below the estimated coefficients to which they apply:

$\widehat{TestScore}$ = 698.9 – 2.28 STR,  $R^2$ = .05, SER = 18.6
                (10.4)   (0.52)

The (0.52) under the slope is $SE(\hat\beta_1)$; t = $\hat\beta_1 / SE(\hat\beta_1)$.
Reject at the 5% significance level if |t| > 1.96.
This procedure relies on the large-n approximation; typically n = 30 is large enough for the approximation to be excellent.
94
p-values are another method
96
Example: Test Scores and STR, California data

Estimated regression line: $\widehat{TestScore}$ = 698.9 – 2.28 STR

Regression software reports the standard errors (corrected for heteroskedasticity):
$SE(\hat\beta_0)$ = 10.4, $SE(\hat\beta_1)$ = 0.52

t-statistic testing $\beta_{1,0} = 0$:

$t = \frac{\hat\beta_1 - \beta_{1,0}}{SE(\hat\beta_1)} = \frac{-2.28 - 0}{0.52} = -4.38$

The 1% two-sided critical value is 2.58, so we reject the null at the 1% significance level.
Alternatively, we can compute the p-value…
97
The p-value based on the large-n standard normal approximation to the t-statistic is 0.00001 ($10^{-5}$)
98
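The t-statistic and its large-n p-value are one-liners in code. A sketch using the slide's estimates and SciPy's normal distribution:

```python
from scipy.stats import norm

b1, se_b1 = -2.28, 0.52            # slide estimates (heteroskedasticity-robust SE)
t = (b1 - 0) / se_b1               # test H0: beta1 = 0
p = 2 * norm.cdf(-abs(t))          # two-sided p-value, large-n normal approximation

print(t, p)                        # t ≈ -4.38, p ≈ 1e-05, matching the slide
```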
Hypothesis Testing Can be Tricky
Dependent Variable: TESTSCR
Method: Least Squares
Date: 06/05/08 Time: 22:35
Sample: 1 420
Included observations: 420
99
Try These Hypotheses

(a) H0: β1 = 0, H1: β1 > 0, with α = .05 using the critical-values approach
101
(b) H0: β1 = 0, H1: β1 < 0, with α = .05 using the critical-values approach
102
(c) H0: β1 = 0, H1: β1 ≠ 0, with α = .05 using the critical-values approach
103
(d) H0: β1 = 0, H1: β1 > 0, with α = .05 using the p-value approach
104
(e) H0: β1 = 0, H1: β1 < 0, with α = .05 using the p-value approach
105
(f) H0: β1 = 0, H1: β1 ≠ 0, with α = .05 using the p-value approach
106
(g) H0: β1 = -0.05, H1: β1 < -0.05, with α = .10
107
Outline
1. OLS Assumptions
2. OLS Sampling Distribution
3. Hypothesis Testing
4. Confidence Intervals
108
With OLS Assumptions the CLT Gives Us the Distribution of $\hat\beta_1$

$\hat\beta_1 \sim N(\beta_1, SE(\hat\beta_1)^2)$
109
Confidence Intervals

$\hat\beta_1 \sim N(\beta_1, SE(\hat\beta_1)^2)$

$\hat\beta_1 - \beta_1 \sim N(0, SE(\hat\beta_1)^2)$

$\frac{\hat\beta_1 - \beta_1}{SE(\hat\beta_1)} \sim N(0, 1)$
110
95% Confidence Intervals Catch the True Parameter 95% of the Time

$\Pr\left(-1.96 \le \frac{\hat\beta - \beta}{SE(\hat\beta)} \le 1.96\right) \approx .95$

$\Pr\left(-1.96 \cdot SE(\hat\beta) \le \hat\beta - \beta \le 1.96 \cdot SE(\hat\beta)\right) \approx .95$

$\Pr\left(\hat\beta - 1.96 \cdot SE(\hat\beta) \le \beta \le \hat\beta + 1.96 \cdot SE(\hat\beta)\right) \approx .95$

So, the probability that $\beta$ will be captured by the random interval $\hat\beta \pm 1.96 \cdot SE(\hat\beta)$ is 0.95.
http://bcs.whfreeman.com/bps4e/content/cat_010/applets/confidenceinterval.html
111
Confidence Intervals are Reasonable Ranges

If we cannot reject $H_0: \beta = \beta^*$ in favour of $H_1: \beta \ne \beta^*$ at, say, 5%, it implies

$-1.96 \le \frac{\hat\beta - \beta^*}{SE(\hat\beta)} \le 1.96$

$\hat\beta - 1.96 \cdot SE(\hat\beta) \le \beta^* \le \hat\beta + 1.96 \cdot SE(\hat\beta)$

But this just says $\beta^*$ must lie in a 95% CI.
Going the other way, we can define a $1 - \alpha$ Confidence Interval as the range of values that could not be rejected as nulls in a two-sided test of significance with test size $\alpha$.
112
Confidence interval example: Test Scores and STR
Estimated regression line: TestScore = 698.9 – 2.28 STR
113
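Plugging the slide's estimates into the 95% interval formula (a sketch; the 1.96 comes from the large-n normal approximation):

```python
b1, se_b1 = -2.28, 0.52

ci = (b1 - 1.96 * se_b1, b1 + 1.96 * se_b1)
print(ci)   # about (-3.30, -1.26); zero is outside, consistent with rejecting H0
```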
If You Make 1→1 Associations, Use
Simple Regression, not Correlation
1. OLS Assumptions
2. OLS Sampling Distribution
3. Hypothesis Testing
4. Confidence Intervals
114
Chapter 6
Introduction to
Multiple Regression
115
Outline
1. Omitted variable bias
2. Multiple regression and OLS
3. Measures of fit
4. Sampling distribution of the OLS estimator
116
It's all about u
(SW Section 6.1)
The error u arises because of factors that influence Y but are not
included in the regression function; so, there are always omitted
variables.
117
Outline
1. Omitted variable bias
2. Multiple regression and OLS
3. Measures of fit
4. Sampling distribution of the OLS estimator
118
Omitted variable bias = OVB

The bias in the OLS estimator that occurs as a result of an omitted factor is called omitted variable bias.
Let $y = \beta_0 + \beta_1 x + u$ and let $u = f(Z)$.
Omitted variable bias is a problem if the omitted factor Z is both:
1. a determinant of Y, and
2. correlated with the regressor X.
120
…Doctors or Income or Both?

Simple Linear Regression only allows us to use one of these predictors to estimate life expectancy. But income per capita is correlated with the number of physicians per 1000 people. Suppose the truth is:

$Life = \beta_0 + \beta_1 Income + \beta_2 Doctors + u$, but you run
$Life = \beta_0 + \beta_1 Income + u^*$   ($u^* = \beta_2 Doctors + u$)
121
OVB = 'Double Counting'

$\beta_1$ is the impact of Income on Life, holding everything else constant, including the residual.
But if correlation exists between Doctors (in the residual) and Income ($r_{Inc,Doct} \ne 0$), and if the true impact of Doctors is non-zero ($\beta_2 \ne 0$), then $\hat\beta_1$ counts both effects; it 'double counts'.

$Life = \beta_0 + \beta_1 Income + u^*$   ($u^* = \beta_2 Doctors + u$)
122
Our Test score Reg has OVB
In the test score example:
1. English language deficiency (whether the student is learning
English) plausibly affects standardized test scores: Z is a
determinant of Y.
2. Immigrant communities tend to be less affluent and thus
have smaller school budgets – and higher STR: Z is
correlated with X.
Accordingly, $\hat\beta_1$ is biased.
123
What is the bias? We have a formula

STR is larger in classes with a higher PctEL (both being a feature of poorer areas), so the correlation between STR and PctEL will be positive.
PctEL appears in u with a negative sign in front of it: higher PctEL leads to lower scores. Therefore the correlation between STR and u [which contains minus PctEL] must be negative ($\rho_{Xu} < 0$).
Here is the formula (standard deviations are always positive):

$Bias(\hat\beta_1) = \rho_{Xu} \, \frac{\sigma_u}{\sigma_X}$
126
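The bias formula can be watched in a small simulation. A sketch with made-up coefficients: the regression that omits z recovers a slope well above the true value of 1, in the direction the formula predicts (here $\rho_{Xu} > 0$):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

z = rng.normal(size=n)                        # the omitted variable
x = 0.8 * z + rng.normal(size=n)              # x is correlated with z
y = 1.0 * x + 2.0 * z + rng.normal(size=n)    # true beta1 = 1, beta2 = 2

b1_short = np.polyfit(x, y, 1)[0]             # short regression omitting z
print(b1_short)                               # ~1.98: biased upward, not ~1
```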
The Population Multiple Regression Model (SW Section 6.2)

Consider the case of two regressors:

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$,  i = 1,…,n

$\beta_1 = \frac{\Delta Y}{\Delta X_1}$, holding $X_2$ constant (Ceteris Paribus)
$\beta_2 = \frac{\Delta Y}{\Delta X_2}$, holding $X_1$ constant (Ceteris Paribus)
128
The OLS Estimator in Multiple Regression (SW Section 6.3)

With two regressors, the OLS estimator solves:

$\min_{b_0, b_1, b_2} \sum_{i=1}^{n} [Y_i - (b_0 + b_1 X_{1i} + b_2 X_{2i})]^2$
129
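A sketch of estimating such a two-regressor model in Python with statsmodels (the data are simulated; cov_type="HC1" requests White-robust standard errors, analogous to the EViews option used in these slides):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 420
x1 = rng.normal(20, 2, n)                       # an STR-like regressor
x2 = rng.normal(15, 10, n)                      # a PctEL-like regressor
y = 700 - 1.0 * x1 - 0.65 * x2 + rng.normal(0, 10, n)

X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit(cov_type="HC1")          # heteroskedasticity-robust SEs
print(fit.params)                               # estimates of b0, b1, b2
print(fit.bse)                                  # their standard errors
```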
Multiple regression in EViews
Dependent Variable: TESTSCR
Method: Least Squares
Sample: 1 420
Included observations: 420
White Heteroskedasticity-Consistent Standard Errors & Covariance
TESTSCR=C(1)+C(2)*STR+C(3)*EL_PCT
131
Measures of Fit for Multiple
Regression (SW Section 6.4)
$R^2$ now becomes the square of the correlation coefficient between y and predicted $\hat{y}$.
132
$R^2$ and $\bar{R}^2$

The $R^2$ is the fraction of the variance explained; same definition as in regression with a single regressor:

$R^2 = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS}$,

where $ESS = \sum_{i=1}^{n} (\hat{Y}_i - \overline{\hat{Y}})^2$, $SSR = \sum_{i=1}^{n} \hat{u}_i^2$, and $TSS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$.
133
$R^2$ and $\bar{R}^2$

The $\bar{R}^2$ (the "adjusted $R^2$") corrects this problem by "penalizing" you for including another regressor; the $\bar{R}^2$ does not necessarily increase when you add another regressor.

Adjusted $R^2$:

$\bar{R}^2 = 1 - \frac{n-1}{n-k-1} \cdot \frac{SSR}{TSS} = 1 - \frac{n-1}{n-k-1}(1 - R^2)$

Note that $\bar{R}^2 < R^2$; however, if n is large the two will be very close.
134
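Both measures come straight from TSS and SSR. A sketch of the two definitions side by side (y and the residuals u_hat can come from any fitted model; k is the number of regressors):

```python
import numpy as np

def fit_measures(y, u_hat, k):
    """R^2 and adjusted R^2 computed from their definitions."""
    n = len(y)
    tss = ((y - y.mean()) ** 2).sum()
    ssr = (u_hat ** 2).sum()
    r2 = 1 - ssr / tss
    r2_adj = 1 - (n - 1) / (n - k - 1) * ssr / tss   # penalises extra regressors
    return r2, r2_adj
```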
Measures of fit, ctd.
Test score example:
What, precisely, does this tell you about the fit of regression (2) compared with regression (1)?
Why are the $R^2$ and the $\bar{R}^2$ so close in (2)?
135
Outline
1. Omitted variable bias
2. Multiple regression and OLS
3. Measures of fit
4. Sampling distribution of the OLS estimator
136
Sampling Distribution Depends on Least Squares Assumptions (SW Section 6.5)

$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + u_i$

1. $E(u_t) = 0$
2. $E(u_t^2) = \sigma^2 = SER^2$ (note: $\sigma^2$, not $\sigma_t^2$; invariant over t)
3. $E(u_t u_s) = 0$ for $t \ne s$
4. $Cov(X_t, u_t) = 0$
5. $u_t \sim$ Normal
plus
6. There is no perfect multicollinearity
137
The added assumption (SW's assumption #4): There is no perfect multicollinearity
Perfect multicollinearity is when one of the regressors is an
exact linear function of the other regressors.
138
Perfect multicollinearity is when one of the regressors is an
exact linear function of the other regressors.
In such a regression (where STR is included twice), $\beta_1$ is the effect on TestScore of a unit change in STR, holding STR constant (???)
The Standard Errors become Infinite when perfect
multicollinearity exists
139
OLS Wonder Equation

$SE(b_i) \approx \frac{S_{\hat{u}}}{S_{x_i}} \cdot \frac{1}{\sqrt{n \, (1 - R^2_{x_i \text{ on } X})}}$

Multicollinearity increases $R^2_{x_i \text{ on } X}$ and therefore increases the variance of $b_i$.
Perfect multicollinearity ($R^2 = 1$) makes regression impossible.
Expect higher standard errors the more variables you add to a regression: the more you add, the higher the $R^2$ in the denominator becomes, because it always rises with extra variables.
140
Quality of Slope Estimate ($R^2$ and $S_{\hat{u}}$ fixed)
[Figure: three scatterplots of y on $x_i$; little spread in $x_i$ gives High $SE(b_i)$, wide spread gives Low $SE(b_i)$]
141
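The denominator term is easy to watch in action. A simulated sketch: as the correlation between two regressors rises, $R^2_{x_1 \text{ on } x_2}$ rises and the factor multiplying $SE(b_1)$ blows up:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n)

for rho in (0.0, 0.9, 0.99):
    x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
    r2 = np.corrcoef(x1, x2)[0, 1] ** 2     # R^2 of x1 regressed on x2
    print(rho, 1 / np.sqrt(1 - r2))         # inflation factor for SE(b1)
```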
The Sampling Distribution of the OLS Estimator (SW Section 6.6)

Under the Least Squares Assumptions,
The exact (finite sample) distribution of $\hat\beta_1$ has mean $\beta_1$, and $var(\hat\beta_1)$ is inversely proportional to n; so too for $\hat\beta_2$.
Other than its mean and variance, the exact (finite-n) distribution of $\hat\beta_1$ is very complicated; but for large n…
$\hat\beta_1$ is consistent: $\hat\beta_1 \xrightarrow{p} \beta_1$ (law of large numbers)
$\frac{\hat\beta_1 - \beta_1}{SE(\hat\beta_1)}$ is approximately N(0, 1) (CLT)
143
Perfect multicollinearity, ctd.
Perfect multicollinearity usually reflects a mistake in the
definitions of the regressors, or an oddity in the data
If you have perfect multicollinearity, your statistical software
will let you know – either by crashing or giving an error
message or by “dropping” one of the variables arbitrarily
The solution to perfect multicollinearity is to modify your list
of regressors so that you no longer have perfect
multicollinearity.
144
Imperfect multicollinearity
Imperfect and perfect multicollinearity are quite different despite
the similarity of the names.
145
Imperfect multicollinearity, ctd.
Imperfect multicollinearity implies that one or more of the
regression coefficients will be imprecisely estimated.
Intuition: the coefficient on X1 is the effect of X1 holding X2
constant; but if X1 and X2 are highly correlated, there is very
little variation in X1 once X2 is held constant – so the data are
pretty much uninformative about what happens when X1
changes but X2 doesn‟t, so the variance of the OLS estimator
of the coefficient on X1 will be large.
Imperfect multicollinearity (correctly) results in large
standard errors for one or more of the OLS coefficients as
described by the OLS wonder equation
Next topic: hypothesis tests and confidence intervals…
146
Portion of X that "explains" Y: High $R^2$
[Venn diagram of X and Y; for any two circles, the overlap tells the size of the $R^2$]
147
Portion of X that "explains" Y: Low $R^2$
[Venn diagram of X and Y; for any two circles, the overlap tells the size of the $R^2$]
148
Adding Another X Increases $R^2$
[Venn diagram: circles for $X_1$ and $X_2$ together overlap more of Y]
151
Multiple Coefficients Tests?
152
Multiple Coefficients Tests
$\beta_1 > \beta_2$
154
Multiple Coefficients Tests
$\beta_1 = \beta_2 = 0$
155
Multiple Coefficients Tests
157
Example 2 Solution

$y_t = \beta_0 + \beta_1 x_{1t} + \beta_2 x_{2t} + \dots + \beta_k x_{kt} + e_t$

If $\theta = \beta_1 - \beta_2$ then $\beta_1 = \theta + \beta_2$. Sub this in:

$y_t = \beta_0 + (\theta + \beta_2) x_{1t} + \beta_2 x_{2t} + \dots + e_t$
$= \beta_0 + \theta x_{1t} + \beta_2 (x_{1t} + x_{2t}) + \dots + e_t$

So, to test $\beta_1 > \beta_2$, just run a new regression including $x_1 + x_2$ instead of $x_2$ (everything else is left the same) and do a t-test for $H_0: \theta = 0$ vs. $H_1: \theta > 0$. Naturally, if you accept $H_1: \theta > 0$, this implies $\beta_1 - \beta_2 > 0$, which implies $\beta_1 > \beta_2$.
This technique is called reparameterization.
158
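A sketch of the reparameterization in code (simulated data with true $\beta_1 = 0.8$ and $\beta_2 = 0.3$, so $\theta = 0.5$; statsmodels assumed):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 300
x1, x2 = rng.normal(size=(2, n))
y = 1.0 + 0.8 * x1 + 0.3 * x2 + rng.normal(size=n)

# theta = beta1 - beta2: regress y on x1 and (x1 + x2)
X = sm.add_constant(np.column_stack([x1, x1 + x2]))
fit = sm.OLS(y, X).fit()
print(fit.params[1], fit.tvalues[1])   # theta_hat near 0.5, and its t-statistic
```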
Restricted Regressions
159
Restricted Regression: Example 1
161
Properties of Restricted
Regressions
Imposing a restriction always increases the residual
sum of squares, since you are forcing the estimates to
take the values implied by the restriction, rather than
letting OLS choose the values of the estimates to
minimize the SSR
If the SSR increases a lot, it implies that the restriction is relatively 'unbelievable'. That is, the model fits a lot worse with the restriction imposed.
This last point is the basic intuition of the F-test: impose the restriction and see if the SSR goes up 'too much'.
http://hadm.sph.sc.edu/Courses/J716/demos/LeastSquares/LeastSquaresDemo.html
162
The F-test
To test a restriction we need to run the restricted
regression as well as the unrestricted regression (i.e.
the original regression). Let q be the number of
restrictions.
Intuitively, we want to know if the change in SSR is
big enough to suggest the restriction is wrong
$F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n - k - 1)}$, where

r is restricted and ur is unrestricted
163
The F statistic
The F statistic is always positive, since the SSR from the restricted model can't be less than the SSR from the unrestricted
Essentially the F statistic is measuring the relative
increase in SSR when moving from the unrestricted
to restricted model
q = number of restrictions
164
The F statistic (cont)
To decide if the increase in SSR when we move to
a restricted model is “big enough” to reject the
restrictions, we need to know about the sampling
distribution of our F stat
Not surprisingly, F ~ $F_{q, n-k-1}$, where q is referred to as the numerator degrees of freedom and n – k – 1 as the denominator degrees of freedom
165
The F statistic: Reject $H_0$ at significance level $\alpha$ if F > c
[Figure: F density f(F); values below the critical value c fail to reject, values above c reject]
166
Equivalently, using p-values: Reject $H_0$ if p-value < $\alpha$
[Figure: F density with the rejection region to the right of c]
fail to reject reject
167
The $R^2$ form of the F statistic

Because the SSRs may be large and unwieldy, an alternative form of the formula is useful.
We use the fact that SSR = TSS(1 – $R^2$) for any regression, so we can substitute in for $SSR_r$ and $SSR_{ur}$:

$F = \frac{(R^2_{ur} - R^2_r)/q}{(1 - R^2_{ur})/(n - k - 1)}$, where again

r is restricted and ur is unrestricted
168
Overall Significance (example 1)
$F = \frac{R^2 / k}{(1 - R^2)/(n - k - 1)}$
169
Dependent Variable: TESTSCR
Method: Least Squares
Date: 06/05/08 Time: 15:29
Sample: 1 420
Included observations: 420

$F = \frac{R^2 / k}{(1 - R^2)/(n - k - 1)}$ = [.8053/4] / [(1 – .8053)/(420 – 5)] ≈ 429
170
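Reproducing that arithmetic, plus the p-value from the F distribution (a sketch using SciPy):

```python
from scipy.stats import f

r2, k, n = 0.8053, 4, 420
F = (r2 / k) / ((1 - r2) / (n - k - 1))
p = f.sf(F, k, n - k - 1)      # upper-tail probability of the F distribution
print(F, p)                    # F ≈ 429, p effectively zero
```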
General Linear Restrictions
The basic form of the F statistic will work for any
set of linear restrictions
First estimate the unrestricted model and then
estimate the restricted model
In each case, make note of the SSR
Imposing the restrictions can be tricky – will likely
have to redefine variables again
171
F Statistic Summary
Just as with t statistics, p-values can be calculated
by looking up the percentile in the appropriate F
distribution
If only one exclusion is being tested, then F = t², and the p-values will be the same.
F-tests are done mechanically; you don't have to do the restricted regressions (though you have to understand how to do them for this course).
172
F-tests are Easy in EViews

To test hypotheses like these in EViews, use the Wald test. After you run your regression, choose 'View, Coefficient tests, Wald'.
Try testing a single restriction (which you can use a t-test for) and see that t² = F and that the p-values are the same.
Try testing that all the coefficients except the intercept are zero, and compare it with the F-test automatically calculated in EViews.
SW discusses the shortcomings of F-tests at
length. They crucially depend upon the
assumption of homoskedasticity.
173
Start Big and Go Small
General to Specific Modeling relies upon the fact that
omitted variable bias is a serious problem.
Start with a very big model to avoid OVB
Do t-tests on individual coefficients. Delete the most
insignificant, run the model again, delete the most
insignificant variable, run the model again, and so
on….until every individual coefficient is significant.
Finally, do an F-test on the original model excluding all
the coefficients required to get to your final model at once.
If the null is accepted, you have verified the model.
Test for Hetero, and correct for it if need be.
174
Chapter 8
Nonlinear Regression
Functions
175
'Linear' Regression = Linear in Parameters, Not Nec. Variables
176
Nonlinear Regression Population Regression
Functions – General Ideas (SW Section 8.1)
If a relation between Y and X is nonlinear:
1. Polynomials in X
The population regression function is approximated by a
quadratic, cubic, or higher-degree polynomial
2. Logarithmic transformations
Y and/or X is transformed by taking its logarithm
this gives a “percentages” interpretation that makes sense
in many applications
179
'Linear' Regression = Linear in Parameters, Not Nec. Variables
180
2. Polynomials in X

Approximate the population regression function by a polynomial:

$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \dots + \beta_r X_i^r + u_i$

Quadratic specification:
$TestScore_i = \beta_0 + \beta_1 Income_i + \beta_2 (Income_i)^2 + u_i$

Cubic specification:
$TestScore_i = \beta_0 + \beta_1 Income_i + \beta_2 (Income_i)^2 + \beta_3 (Income_i)^3 + u_i$
182
Estimation of the quadratic
specification in EViews
Dependent Variable: TESTSCR
Method: Least Squares
Sample: 1 420
Included observations: 420
White Heteroskedasticity-Consistent Standard Errors & Covariance
TESTSCR=C(1)+C(2)*AVGINC + C(3)*AVGINC*AVGINC

(The AVGINC*AVGINC term creates the quadratic regressor.)
184
Interpreting the estimated
regression function, ctd:
(b) Compute “effects” for different values of X
186
$\widehat{TestScore}$ = 607.3 + 3.85 $Income_i$ – 0.0423 $(Income_i)^2$
187
Summary: polynomial regression
functions
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \dots + \beta_r X_i^r + u_i$
189
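Fitting a quadratic is still OLS: the squared term is just another regressor. A simulated sketch (coefficients made up to resemble the slide's example); note how the estimated effect of one more unit of income shrinks at higher income:

```python
import numpy as np

rng = np.random.default_rng(4)
income = rng.uniform(5, 55, 420)
score = 607 + 3.9 * income - 0.04 * income**2 + rng.normal(0, 9, 420)

b2, b1, b0 = np.polyfit(income, score, 2)    # highest power first

def pred(inc):
    return b0 + b1 * inc + b2 * inc**2

print(pred(11) - pred(10))   # effect of one more unit at low income
print(pred(41) - pred(40))   # smaller effect at high income
```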
3. Are Polynomials Enough?
We can investigate the appropriateness of a
regression function by graphing the regression
function over the top of the scatterplot.
For some models, we may need to transform the
data
For example, take logs of the response variable
The site below allows us to do this, exploring some
common regression functions
http://www.ruf.rice.edu/%7Elane/stat_sim/transformations/index.html
190
'Linear' Regression = Linear in Parameters, Not Nec. Variables
191
3. Logarithmic functions of Y and/or X
ln(X) = the natural logarithm of X
Logarithmic transforms permit modeling relations in
“percentage” terms (like elasticities), rather than linearly.
Numerically:
ln(1.01) – ln(1) = .00995 – 0 = .00995 (exact proportional change: .01);
ln(40) – ln(45) = 3.6889 – 3.8067 = –.1178 (exact proportional change: –.1111)
192
Three log regression specifications:
1. Linear-log: $Y_i = \beta_0 + \beta_1 \ln(X_i) + u_i$
2. Log-linear: $\ln(Y_i) = \beta_0 + \beta_1 X_i + u_i$
3. Log-log: $\ln(Y_i) = \beta_0 + \beta_1 \ln(X_i) + u_i$
195
Regression when X is Binary (Section 5.3)

So far, $\beta_1$ has been called a "slope," but that doesn't make sense if X is binary.
When $X_i$ = 1: $Y_i = \beta_0 + \beta_1 + u_i$, so the mean of $Y_i$ is $\beta_0 + \beta_1$.
So:
$\beta_1 = E(Y_i | X_i = 1) - E(Y_i | X_i = 0)$
= population difference in group means
197
Interactions Between Independent Variables (SW Section 8.3)

Perhaps a class size reduction is more effective in some circumstances than in others…
Perhaps smaller classes help more if there are many English learners, who need individual attention.
That is, $\frac{\Delta TestScore}{\Delta STR}$ might depend on PctEL.
More generally, $\frac{\Delta Y}{\Delta X_1}$ might depend on $X_2$.
How do we model such "interactions" between $X_1$ and $X_2$?
We first consider binary X's, then continuous X's.
198
(a) Interactions between two binary
variables
$Y_i = \beta_0 + \beta_1 D_{1i} + \beta_2 D_{2i} + u_i$
199
Interpreting the coefficients:
$Y_i = \beta_0 + \beta_1 D_{1i} + \beta_2 D_{2i} + \beta_3 (D_{1i} \times D_{2i}) + u_i$

$\frac{\Delta Y}{\Delta D_1} = \beta_1 + \beta_3 D_2$
200
Example: TestScore, STR, English learners

Let HiSTR = 1 if STR ≥ 20 (0 if STR < 20), and HiEL = 1 if PctEL ≥ 10 (0 if PctEL < 10).

(b) Interactions between a continuous and a binary variable: $D_i$ is binary, X is continuous.
As specified above, the effect on Y of X (holding constant D) = $\beta_1$, which does not depend on D.
To allow the effect of X to depend on D, include the "interaction term" $D_i \times X_i$ as a regressor:
202
Binary-continuous interactions: the
two regression lines
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 D_i + \beta_3 (D_i \times X_i) + u_i$

When $D_i$ = 1: $Y_i = \beta_0 + \beta_1 X_i + \beta_2 + \beta_3 X_i + u_i = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) X_i + u_i$  (the D = 1 regression line)
203
Binary-continuous interactions, ctd.
[Figure: three panels of the D = 0 and D = 1 lines; all $\beta_i$ non-zero (different slopes and intercepts); $\beta_3$ = 0 (parallel lines); $\beta_2$ = 0 (same intercept, different slopes)]
204
Interpreting the coefficients:
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 D_i + \beta_3 (X_i \times D_i) + u_i$

$\frac{\Delta Y}{\Delta X} = \beta_1 + \beta_3 D$
205
Example: TestScore, STR, HiEL (= 1 if PctEL ≥ 10)

$\widehat{TestScore}$ = 682.2 – 0.97 STR + 5.6 HiEL – 1.28 (STR×HiEL)
        (11.9)   (0.59)   (19.5)   (0.97)

When HiEL = 0: $\widehat{TestScore}$ = 682.2 – 0.97 STR
When HiEL = 1: $\widehat{TestScore}$ = 682.2 – 0.97 STR + 5.6 – 1.28 STR = 687.8 – 2.25 STR

Two regression lines: one for each HiEL group.
Class size reduction is estimated to have a larger effect when the percent of English learners is large.
206
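The two lines fall out of the fitted coefficients mechanically. A sketch using the slide's estimates:

```python
b0, b1, b2, b3 = 682.2, -0.97, 5.6, -1.28   # estimates from the slide

for hi_el in (0, 1):
    intercept = b0 + b2 * hi_el
    slope = b1 + b3 * hi_el
    print(f"HiEL={hi_el}: TestScore = {intercept:.1f} {slope:+.2f}*STR")
# HiEL=0: 682.2 - 0.97*STR;  HiEL=1: 687.8 - 2.25*STR
```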
Example, ctd: Testing hypotheses
$\widehat{TestScore}$ = 682.2 – 0.97 STR + 5.6 HiEL – 1.28 (STR×HiEL)
        (11.9)   (0.59)   (19.5)   (0.97)

The two regression lines have the same slope ⟺ the coefficient on STR×HiEL is zero: t = –1.28/0.97 = –1.32
The two regression lines have the same intercept ⟺ the coefficient on HiEL is zero: t = 5.6/19.5 = 0.29
The two regression lines are the same ⟺ the population coefficient on HiEL = 0 and the population coefficient on STR×HiEL = 0: F = 89.94 (p-value < .001)!!
We reject the joint hypothesis but neither individual hypothesis (how can this be?)
207
Summary: Nonlinear Regression
Functions
Using functions of the independent variables such as ln(X)
or X1 X2, allows recasting a large family of nonlinear
regression functions as multiple regression.
Estimation and inference proceed in the same way as in
the linear multiple regression model.
Interpretation of the coefficients is model-specific, but the general rule is to compute effects by comparing different cases (different values of the original X's).
Many nonlinear specifications are possible, so you must use judgment:
What nonlinear effect do you want to analyze?
What makes sense in your application?
208
Chapter 9
Misleading Statistics
209
Statistics Means Description and
Inference
Descriptive Statistics is about describing datasets. Various visual tricks can distort these descriptions.
Inferential Statistics is about statistical inference. You know something about tricks to distort inference (e.g. putting in lots of variables to raise $R^2$, or lowering the confidence level to get in a variable you want).
210
Pitfalls of Analysis
There are several ways that misleading statistics can occur (which affect both inferential and descriptive statistics):
Obtaining flawed data
Not understanding the data
Not choosing appropriate displays of data
Fitting an inappropriate model
Drawing incorrect conclusions from analysis.
211
Poor Displays of Data: Chart
Junk
213
Poor Displays of Data: Axes
Increments of 100,000
214
How to Display Data
• The golden rule for displaying data in a graph is to
keep it simple
• Graphs should not have any chart junk.
– “minimise the ratio of ink to data” - Tufte
• Axes should be chosen so they do not inflate or deflate
the differences between observations
– Where possible, start the Y-axis at 0
– If this is not possible then you should consider graphing the
change in the observation from one period to the next
• Some general tips on how to properly display data can
be found at
http://lilt.ilstu.edu/gmklass/pos138/datadisplay/sections/goodcharts.htm
215
How to Display Data
216
Incorrect Conclusions: Causality
Correlation: 0.848

Excess money supply (%)      Increase in prices two years later (%)
1965   4.7                   1967   2.5
1966   1.9                   1968   4.7
1967   7.8                   1969   5.4
1968   4.0                   1970   6.4
1969   1.3                   1971   9.4
1970   7.8                   1972   7.1
1971  11.4                   1973   9.2
1972  23.4                   1974  16.1
1973  22.2                   1975  24.2
219
Incorrect Conclusions: Causality
Correlation: -0.868

Cases of Dysentery in Scotland ('000)    Increase in prices one year later (%)
1966  4.3                                1967   2.5
1967  4.5                                1968   4.7
1968  3.7                                1969   5.4
1969  5.3                                1970   6.4
1970  3.0                                1971   9.4
1971  4.1                                1972   7.1
1972  3.2                                1973   9.2
1973  1.6                                1974  16.1
1974  1.5                                1975  24.2
Source: Grenville and Macfarlane (1988)
220
A Final Warning
We have to inform you that the correlation coefficient
is -0.868 (which is statistically slightly more significant
than that obtained by Professor Mills). Professor Mills says
that “Until … a fallacy in the figures [can be shown], I
think Mr Rees-Mogg has fully established his point.” By
the same argument, so have we.
Yours faithfully.
G. E. J. LLEWELLYN, R. M. WITCOMB.
Faculty of Economics and Politics,
221