VIOLATING THE ASSUMPTIONS OF CLRM

THE ASSUMPTIONS OF THE CLRM


A1 (Linearity in the parameters) Y is generated as:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + u$$

A2 (Mean Independence) The error term u has zero conditional mean:

$$E[u \mid X_1, X_2, \dots, X_k] = 0$$
A3 (No Perfect Collinearity) The columns of the $(n \times (k+1))$ matrix

$$X = \begin{bmatrix} 1 & X_{11} & X_{12} & \dots & X_{1k} \\ 1 & X_{21} & X_{22} & \dots & X_{2k} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & X_{n1} & X_{n2} & \dots & X_{nk} \end{bmatrix}$$

are not perfectly linearly related, and $n \geq k+1$.

A4 (Conditional Homoskedasticity) The conditional variance of u is constant:

$$Var(u \mid X_1, X_2, \dots, X_k) = \sigma^2$$

where $\sigma^2$ is an unknown positive constant.

A5 (Random Sampling) The n (k+1)-tuples of the sample $(X_{i1}, X_{i2}, \dots, X_{ik}, Y_i)$ are independent and identically distributed across $i = 1, \dots, n$.

If A1 and/or A2 (linearity and/or mean independence) fail, the OLS estimators are no longer unbiased and consistent. This may happen for several reasons:

The functional form of the PRF is not correctly specified. Suppose for example the true population model is

$$E(Y \mid X_1, X_2) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_2^2$$

but we regress Y only on X1 and X2 without including the squared term $X_2^2$. Then the OLS estimators of $\beta_0, \beta_1, \beta_2$ will in general all be biased.
We omit relevant variables from the PRF (model under-specification). Suppose for example the true population model is

$$E(Y \mid X_1, X_2, X_3) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3$$

but we regress Y only on X1 and X2 without including the third variable X3. Then the OLS estimators of $\beta_0, \beta_1, \beta_2$ will in general all be biased. We then say we have omitted variable bias (a simulation sketch follows this list).
The error term is correlated with one or more of the included regressors, which are then known as endogenous regressors. This situation arises, for example, when a regressor (say price) is determined simultaneously with the dependent variable (say quantity). We then say we have simultaneity bias.
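As a concrete illustration of omitted variable bias, here is a minimal simulation sketch in STATA; all variable names and parameter values are hypothetical:

clear
set obs 1000
set seed 12345
gen x1 = rnormal()
gen x2 = 0.5*x1 + rnormal()
* x2 is correlated with x1, and both belong in the true model
gen y = 1 + 2*x1 + 3*x2 + rnormal()
reg y x1 x2
* both slope estimates are close to the true values 2 and 3
reg y x1
* omitting x2 biases the slope on x1 towards 2 + 3*0.5 = 3.5

The second regression illustrates the usual omitted variable bias formula: the bias on the included slope equals the coefficient of the omitted variable times the slope from a regression of the omitted variable on the included one.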
If A3 fails, that is, in the presence of perfect collinearity, OLS fails: we can no longer obtain unique estimates for all regression coefficients.
If A4 fails, that is, in the presence of (conditional) heteroskedasticity, OLS is still unbiased and consistent but no longer has minimum variance among all linear unbiased estimators (i.e. it is no longer BLUE), and our estimates of the standard errors of the estimates will be biased. Because of this, our usual hypothesis-testing routine is unreliable.

If A5 fails, as for example when there is serial correlation, OLS may still be unbiased and consistent but no longer has minimum variance among all linear unbiased estimators (i.e. it is no longer BLUE), and our estimates of the standard errors of the estimates will be biased. Again, our usual hypothesis-testing routine is unreliable.

MULTICOLLINEARITY
Perfect Collinearity vs. Multicollinearity
In the presence of perfect collinearity among the explanatory variables we can no longer compute the OLS estimates. STATA will automatically drop one or more of the variables that are
perfectly linearly related.
What happens if we have a high, but not perfect, degree of linear dependence between two or more regressors? We will call this phenomenon multicollinearity.
In the presence of multicollinearity, all CLRM assumptions still hold and hence OLS is still
BLUE. But the correlations between two or more explanatory variables are high.

Why is Multicollinearity an issue?


1. Large variances and standard errors of the OLS estimators. Recall that for a slope coefficient

$$Var(\hat\beta_j \mid X) = \frac{\sigma^2}{SSR_j}$$

where $SSR_j$ is the sum of squared residuals from an auxiliary regression of $X_j$ on the other explanatory variables. Multicollinearity makes $SSR_j$ small and hence this variance large. Even though the OLS estimates have minimum variance among linear unbiased estimators, that does not mean the variance is small:
Harder to get a precise estimate
Wider confidence intervals
Insignificant t ratios
We can get a high $R^2$ (and hence a highly significant overall F statistic for the regression) but few significant t ratios.
2. OLS estimators and their standard errors become very sensitive to small changes in the data (i.e. they are unstable).
3. Wrong signs for regression coefficients.
4. Difficulty in assessing the individual contributions of the explanatory variables.

How do we detect multicollinearity?


Multicollinearity is a matter of degree. There is no irrefutable test for it, but there are warning signs:
Insignificant t ratios but a significant overall F statistic.
Instability of coefficients across different subsamples.
Examine the bivariate correlations between the explanatory variables or, even better, examine the VIFs (Variance Inflation Factors),

$$VIF_j = \frac{1}{1 - R_j^2}$$

where $R_j^2$ is the $R^2$ of an auxiliary regression of the jth regressor on all the other regressors (including a constant). Use the command estat vif in STATA. VIFs higher than 10 indicate a problem. A sketch of this computation follows below.
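A minimal sketch of the VIF computation done by hand, using STATA's bundled auto dataset purely for illustration:

sysuse auto, clear
quietly reg price weight length mpg
estat vif
quietly reg weight length mpg
* auxiliary regression of one regressor on the others
display "VIF for weight = " 1/(1 - e(r2))

The by-hand number can be compared directly with the corresponding entry in the estat vif output.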
Although multicollinearity is a nuisance, it does not mean we have to "correct" it, i.e. get rid of variables that belong in the regression merely to boost the statistical significance of the remaining variables!

(CONDITIONAL) HETEROSKEDASTICITY
What is Heteroskedasticity?
The conditional variance of $u_i$, or equivalently of $Y_i$, is not constant but may vary with the level of one or more of the $X_i$'s:

$$Var(u_i \mid X_i) = Var(Y_i \mid X_i) = \sigma_i^2$$

Heteroskedasticity may be present when a high value of an independent variable (e.g. high income) is necessary but not sufficient for a high value of the dependent variable (e.g. high vacation expenses).

How do we detect Heteroskedasticity?


1. Priors:
Heteroskedasticity is more likely in cross-sectional data
Heterogeneity across firms (for example, firm size and investment)
Heterogeneity across individuals (earnings and savings, education and earnings)
Measurement errors (some individuals provide more accurate answers than others)
Group heterogeneity (subpopulation differences in error variances)
Model misspecification (e.g. $X^2$ is omitted from the model, Y instead of log Y is used, etc.)

2. Graphical examination of the OLS residuals (cone shapes and hourglass shapes may indicate heteroskedasticity):
Plot $\hat u_i$ or $\hat u_i^2$ against X to see if there is a pattern (rvpplot <variable name> in STATA)
Plot $\hat u_i$ or $\hat u_i^2$ against $\hat Y_i$ if there are multiple X's (rvfplot in STATA)

3. There are a number of tests that involve regressing forms of the residuals on the X's.

White's Test (estat imtest, white in STATA):

(a) Let the model be

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$$

(b) Estimate it with OLS and obtain the residuals $\hat u_i$.

(c) Run the following auxiliary regression,

$$\hat u_i^2 = a_0 + a_1 X_{1i} + a_2 X_{2i} + a_3 X_{1i}^2 + a_4 X_{2i}^2 + a_5 X_{1i} X_{2i} + v_i$$

regressing the squared residual on all X's, their squares, and all interaction terms. Be careful not to square dummies!

(d) Get the $R_A^2$ of the auxiliary regression.

(e) In order to test the null hypothesis that there is no heteroskedasticity (i.e. that all the slope coefficients are equal to zero), White has shown that, under the null,

$$n R_A^2 \sim \chi^2(k_A)$$

with the number of degrees of freedom $k_A$ equal to the number of explanatory variables excluding the intercept in the auxiliary regression. (In the case above, $k_A = 5$.)

(f) A large value of the test statistic is indicative of heteroskedasticity. (A by-hand sketch is given after the Breusch-Pagan test below.)

Breusch-Pagan (or Cook-Weisberg) test (use estat hettest in STATA): involves testing homoskedasticity against the alternative that the error variances are multiplicative functions of one or more of the X variables.
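The following is a minimal by-hand sketch of White's test for the two-regressor model above; the variable names y, x1 and x2 are hypothetical:

quietly reg y x1 x2
predict uhat, resid
gen uhat2 = uhat^2
gen x1sq = x1^2
gen x2sq = x2^2
gen x1x2 = x1*x2
quietly reg uhat2 x1 x2 x1sq x2sq x1x2
display "White statistic n*R2 = " e(N)*e(r2)
display "p-value = " chi2tail(5, e(N)*e(r2))

The built-in commands estat imtest, white and estat hettest (issued after the original regression) implement White's and the Breusch-Pagan tests directly.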

What are the consequences of heteroskedasticity on OLS?


OLS is still unbiased and consistent in general under assumptions A1 and A2 (and hence suitable for estimation), but it does not have minimum variance among all linear unbiased estimators (i.e. it is no longer BLUE), i.e. it is inefficient.

The usual OLS standard errors are incorrect. One has to use heteroskedasticity-robust (or just robust) standard errors, also known as White standard errors, which are obtained as the square roots of

$$\widehat{Var}(\hat\beta_j) = \frac{\sum_{i=1}^{n} \hat r_{ij}^2 \hat u_i^2}{\left( \sum_{i=1}^{n} \hat r_{ij}^2 \right)^2}$$

where $\hat u_i$ is the ith OLS residual and $\hat r_{ij}$ is the ith residual of the auxiliary regression of $X_j$ on all the other independent variables. STATA will calculate robust standard errors with the option robust, i.e. doing reg y x, robust. The robust covariances of the estimates may then be obtained using estat vce. A by-hand sketch of the formula follows below.

Furthermore, the t statistic that uses the usual (incorrect) standard error is not valid. Instead we should use the robust standard error of the estimate and form the so-called robust t statistic, which however is only asymptotically valid (i.e. for large samples) and is asymptotically distributed as $N(0,1)$. Similarly, the F test statistic that we construct using the sums of squared residuals of the restricted and unrestricted regressions is not valid either. There do exist robust versions of the F statistic, which STATA uses when one specifies the robust option.
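As a check on the formula above, here is a minimal by-hand sketch for the robust standard error of one slope coefficient (variable names hypothetical). Note that STATA's robust option additionally applies a small-sample correction of n/(n-k-1), where k is the number of slope coefficients, so the by-hand number will differ slightly:

quietly reg y x1 x2
predict uhat, resid
quietly reg x1 x2
predict rhat, resid
* rhat are the residuals from the auxiliary regression of x1 on x2
gen num_i = rhat^2 * uhat^2
gen den_i = rhat^2
quietly summarize num_i
scalar num = r(sum)
quietly summarize den_i
scalar den = r(sum)^2
display "robust se for the coefficient on x1 = " sqrt(num/den)
reg y x1 x2, robust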
Why is OLS inefficient?

OLS minimizes $\sum \hat u_i^2$. With OLS, we weigh each $\hat u_i^2$ equally, whether it comes from a population with a large variance (i.e. little information) or with a small variance.

Ideally, we would like to give more weight to observations coming from populations with smaller variances, as this would enable us to estimate the PRF more accurately. This leads to the Weighted Least Squares estimator.

Weighted Least Squares (WLS)


Suppose that

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_k X_{ik} + u_i$$

where

$$E[u_i \mid X_{i1}, X_{i2}, \dots, X_{ik}] = 0$$

but

$$Var(u_i \mid X_{i1}, X_{i2}, \dots, X_{ik}) = \sigma_i^2$$

If we know the form of heteroskedasticity, we can estimate the model more efficiently by Weighted Least Squares: this involves weighting each observation $(Y_i, X_{i1}, \dots, X_{ik})$ by the inverse of its standard deviation $(1/\sigma_i)$ and then running OLS on the transformed observations.

It can be shown that WLS is the BLU estimator of $\beta$ in the model above, i.e. it is more efficient than OLS. Weighted Least Squares is a special case of Generalized Least Squares (GLS).

WLS: Case 1: $\sigma_i^2$ Known

If we know $\sigma_i^2$, we can transform the model into something that satisfies our CLRM assumptions:

$$\frac{Y_i}{\sigma_i} = \beta_0 \frac{1}{\sigma_i} + \beta_1 \frac{X_{i1}}{\sigma_i} + \dots + \beta_k \frac{X_{ik}}{\sigma_i} + \frac{u_i}{\sigma_i}$$

where the transformed error term $\nu_i = u_i/\sigma_i$ satisfies conditional homoskedasticity:

$$Var(\nu_i \mid X_i) = E(\nu_i^2 \mid X_i) = E\left( \frac{u_i^2}{\sigma_i^2} \,\Big|\, X_i \right) = \frac{1}{\sigma_i^2} E(u_i^2 \mid X_i) = \frac{\sigma_i^2}{\sigma_i^2} = 1$$

In the transformed model, all the CLRM assumptions are satisfied and OLS on the transformed data will be BLUE.

OLS on the transformed model is what is known as Weighted Least Squares:

Weighted Least Squares = OLS applied to transformed data
i.e. the $\{\hat\beta_{j,WLS}\}_{j=0}^{k}$ solve the following problem:

$$\min_{b_0, b_1, \dots, b_k} \sum_{i=1}^{n} \left( \frac{Y_i}{\sigma_i} - b_0 \frac{1}{\sigma_i} - b_1 \frac{X_{i1}}{\sigma_i} - \dots - b_k \frac{X_{ik}}{\sigma_i} \right)^2$$

In effect, WLS deflates the importance of an observation with a larger variance.

Standard errors for $\hat\beta_{j,WLS}$ are calculated as usual in the transformed model, and all testing procedures are valid. A minimal sketch of the transformed regression in STATA follows.
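This sketch runs WLS by hand as OLS on the transformed data, assuming two regressors and a hypothetical variable sig holding the known conditional standard deviations (all variable names are illustrative):

gen ystar = y/sig
gen const0 = 1/sig
* const0 is the transformed "intercept" regressor
gen x1star = x1/sig
gen x2star = x2/sig
reg ystar const0 x1star x2star, noconstant

Note the noconstant option: the intercept of the original model enters through const0, so the transformed regression has no separate constant term.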

WLS: Case 2: $\sigma_i^2$ Unknown

If the $\sigma_i^2$'s are unknown, we have to make assumptions about the functional form of the heteroskedasticity in order to transform the model.
Example 1: Suppose that we are willing to assume that

$$Var(u_i \mid X_{i1}, \dots, X_{ik}) = E(u_i^2 \mid X_i) = \sigma^2 X_{i1}$$

i.e. each variance is assumed proportional to $X_{i1}$, where $\sigma^2$ is an unknown proportionality constant. We transform the model as follows:

$$\frac{Y_i}{\sqrt{X_{i1}}} = \beta_0 \frac{1}{\sqrt{X_{i1}}} + \beta_1 \sqrt{X_{i1}} + \dots + \beta_k \frac{X_{ik}}{\sqrt{X_{i1}}} + \frac{u_i}{\sqrt{X_{i1}}}$$

where the new error term $\nu_i = u_i/\sqrt{X_{i1}}$ satisfies the CLRM assumptions:

$$Var(\nu_i \mid X_i) = Var\left( \frac{u_i}{\sqrt{X_{i1}}} \,\Big|\, X_i \right) = \frac{1}{X_{i1}} Var(u_i \mid X_i) = \frac{\sigma^2 X_{i1}}{X_{i1}} = \sigma^2$$

NOTE:
The transformed model in this particular case has NO CONSTANT.
We do not need to know the proportionality constant $\sigma^2$ to perform WLS (it drops out of the formulae of the estimators).

In STATA, if we only know the relative scale of the $\sigma_i$'s, we use the command reg y x1 x2 [aweight=1/X1] if X1 is the variable that causes heteroskedasticity of the form described above.


Example 2: Suppose that instead we believe that

$$Var(u_i \mid X_i) = E(u_i^2 \mid X_i) = \sigma^2 X_{i1}^2$$

i.e. now the variance is proportional to $X_{i1}^2$. Then the transformed model is

$$\frac{Y_i}{X_{i1}} = \beta_0 \frac{1}{X_{i1}} + \beta_1 + \beta_2 \frac{X_{i2}}{X_{i1}} + \dots + \beta_k \frac{X_{ik}}{X_{i1}} + \frac{u_i}{X_{i1}}$$

where the new error term $\nu_i = u_i/X_{i1}$ satisfies the CLRM assumptions:

$$Var(\nu_i \mid X_i) = Var\left( \frac{u_i}{X_{i1}} \,\Big|\, X_i \right) = \frac{1}{X_{i1}^2} Var(u_i \mid X_i) = \frac{\sigma^2 X_{i1}^2}{X_{i1}^2} = \sigma^2$$

NOTE: The transformed model estimates $\beta_1$ as its CONSTANT term.

In STATA use the command reg y x1 x2 [aweight=1/X1^2] if X1 is the variable that causes heteroskedasticity of the form described above.


Feasible Weighted Least Squares (FWLS)


Knowing the form of heteroskedasticity (up to a multiplicative constant, as in the examples above) is not the usual case. In practice, $\sigma_i^2$ is unknown and has to be estimated (and consistently, for that matter). For example, we may be willing to assume that

$$\sigma_i^2 = \delta_0 + \delta_1 Z_{i1} + \dots + \delta_p Z_{ip}$$

where the variables $Z_i$ may be a subset or transformations of the original $X_i$'s. It can be shown that the $\delta$'s may be estimated consistently by running an OLS regression of the squared OLS residuals, $\hat u_i = Y_i - \hat\beta_0 - \hat\beta_1 X_{i1} - \dots - \hat\beta_k X_{ik}$ (with the $\hat\beta$'s being the unbiased and consistent OLS estimates of the $\beta$'s), on a constant and the $Z_i$'s. Then

$$\hat\sigma_i^2 = \hat\delta_0 + \hat\delta_1 Z_{i1} + \dots + \hat\delta_p Z_{ip}$$

is a consistent estimator of $\sigma_i^2$. For this we need to have assumed the correct model for the form of heteroskedasticity.

Running the weighted regression with the estimated $\hat\sigma_i$'s,

$$\min_{b_0, b_1, \dots, b_k} \sum_{i=1}^{n} \left( \frac{Y_i}{\hat\sigma_i} - b_0 \frac{1}{\hat\sigma_i} - b_1 \frac{X_{i1}}{\hat\sigma_i} - \dots - b_k \frac{X_{ik}}{\hat\sigma_i} \right)^2$$

is doing Feasible Weighted Least Squares (FWLS).

In STATA, if we know the absolute scale of the $\hat\sigma_i$'s, we use the command vwls y x1 x2, sd(s), where s is the variable in the data set that contains the estimated $\hat\sigma_i$'s. It produces the same point estimates as reg y x1 x2 [aweight=1/s^2], but the latter has in this case incorrect standard errors. A by-hand sketch follows.
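A minimal by-hand sketch of FWLS, assuming (purely for illustration) the variance model $\sigma_i^2 = \delta_0 + \delta_1 Z_{i1} + \delta_2 Z_{i2}$; the variable names y, x1, x2, z1 and z2 are hypothetical:

quietly reg y x1 x2
predict uhat, resid
gen uhat2 = uhat^2
quietly reg uhat2 z1 z2
predict sig2hat, xb
* the fitted values sig2hat are the estimated variances; in practice
* check that they are all positive before taking square roots
gen s = sqrt(sig2hat)
vwls y x1 x2, sd(s)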
Although WLS (i.e. if we use the true and correct $\sigma_i^2$'s) is BLU for $\beta$, FWLS (i.e. if we use the consistent estimates $\hat\sigma_i^2$) is neither unbiased nor best!

However, we may sometimes prefer FWLS to OLS, since FWLS is asymptotically more efficient (i.e. it has smaller asymptotic standard errors). But for the $\sigma_i^2$'s to be estimated consistently, one has to make an assumption about the form of the heteroskedasticity. If this assumption is wrong, then FWLS may yield estimates of $\beta$ that are not only biased but also inconsistent!

If one is not concerned about efficiency, or does not want to run the risk of misspecifying the form of heteroskedasticity, one can just do OLS and calculate heteroskedasticity-robust (White) standard errors for the OLS estimates.


Example: Wage regression


Suppose we want to estimate the following wage regression:

$$\ln wage = \beta_0 + \beta_1\, educ + \beta_2\, exper + \beta_3\, female + u_i$$

Using WAGE1.DTA we find by OLS:

$$\widehat{\ln wage} = \underset{(.10)}{.48} + \underset{(.007)}{.09}\, educ + \underset{(.001)}{.009}\, exper - \underset{(.037)}{.34}\, female$$

where the numbers in parentheses are the usual standard errors (i.e. they are calculated assuming conditional homoskedasticity). Plotting the residuals against the fitted values, we detect a cone shape, which is indicative of heteroskedasticity. Testing for heteroskedasticity with the Breusch-Pagan test statistic (which is distributed $\chi^2(1)$), we find a value of 5.53 for the test statistic and a p-value of .0187. Thus we reject the null of homoskedasticity at the 5% level. Using White's test (which is distributed $\chi^2(8)$), we find a value of 17.03 and a p-value of .0298. Thus we reject at the 5% level using this testing approach as well. It seems reasonable then to correct the standard errors of the OLS estimates above for heteroskedasticity. We obtain:

$$\widehat{\ln wage} = \underset{(.11)}{.48} + \underset{(.008)}{.09}\, educ + \underset{(.001)}{.009}\, exper - \underset{(.037)}{.34}\, female$$

We see that the robust standard errors are increased relative to the usual standard errors.
The following STATA commands have been used to produce the results above:
use WAGE1
reg lwage educ exper female
rvfplot
estat hettest
estat imtest, white
reg lwage educ exper female, robust
