The population model is

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + u$$

and the $(n \times (k+1))$ matrix of regressors is

$$X = \begin{bmatrix} 1 & X_{11} & X_{12} & \cdots & X_{1k} \\ 1 & X_{21} & X_{22} & \cdots & X_{2k} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & X_{n1} & X_{n2} & \cdots & X_{nk} \end{bmatrix}$$
A5 (Random Sampling) The $n$ $(k+1)$-tuples of the sample $(X_{i1}, X_{i2}, \ldots, X_{ik}, Y_i)$ are independent and identically distributed across $i = 1, \ldots, n$.
If A1 and/or A2 (linearity and/or mean independence) fail, the OLS estimators are no longer unbiased or consistent. This may happen for several reasons:
The functional form of the PRF is not correctly specified. Suppose for example the true population model is

$$E(Y \mid X_1, X_2) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_2^2$$

but we regress $Y$ only on $X_1$ and $X_2$ without including the squared term $X_2^2$. Then the OLS estimators of $\beta_0, \beta_1, \beta_2$ will all be biased in general.
We omit relevant variables from the PRF (model under-specification). Suppose for example the true population model is

$$E(Y \mid X_1, X_2, X_3) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3$$

but we regress $Y$ only on $X_1$ and $X_2$ without including the third variable $X_3$. Then the OLS estimators of $\beta_0, \beta_1, \beta_2$ will all be biased in general. We then say we have omitted variable bias.
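The mechanics of omitted variable bias can be illustrated with a short simulation. The following Python sketch is illustrative only (the coefficient values and variable names are invented, and these notes otherwise work in STATA): the short regression picks up part of the omitted variable's effect.

```python
import numpy as np

# True model: Y = 1 + 2*X1 + 3*X2 + u, with X2 correlated with X1.
rng = np.random.default_rng(0)
n = 100_000
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)      # X2 = 0.5*X1 + noise
y = 1 + 2 * x1 + 3 * x2 + rng.normal(size=n)

# Correctly specified regression: Y on a constant, X1 and X2
X_full = np.column_stack([np.ones(n), x1, x2])
b_full = np.linalg.lstsq(X_full, y, rcond=None)[0]

# Misspecified regression omitting X2: Y on a constant and X1 only
X_short = np.column_stack([np.ones(n), x1])
b_short = np.linalg.lstsq(X_short, y, rcond=None)[0]

print(b_full[1])   # close to the true slope 2
print(b_short[1])  # roughly 2 + 3*0.5 = 3.5: biased upward
```

The short regression's slope converges to $\beta_1$ plus $\beta_2$ times the slope of $X_2$ on $X_1$, which is the textbook omitted variable bias formula.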
The error term is correlated with one or more of the included regressors, which are then said to be endogenous regressors. For example, this situation arises when a regressor (say price) is simultaneously determined with the dependent variable (say quantity). We then say we have simultaneity bias.
If A3 fails, that is, in the presence of perfect collinearity, OLS fails: we can no longer obtain unique estimates for all regression coefficients.
If A4 fails, that is, in the presence of (conditional) heteroskedasticity, OLS is still unbiased and consistent but no longer has minimum variance among all linear unbiased estimators (i.e. it is no longer BLUE), and our estimates of the standard errors of the estimates will be biased. Because of this, our usual hypothesis testing routine is unreliable.
If A5 fails, as for example when there is serial correlation, OLS may still be unbiased and consistent but no longer has minimum variance among all linear unbiased estimators (i.e. it is no longer BLUE), and our estimates of the standard errors of the estimates will be biased. Because of this, our usual hypothesis testing routine is unreliable.
MULTICOLLINEARITY
Perfect Collinearity vs. Multicollinearity
In the presence of perfect collinearity among the explanatory variables we can no longer compute the OLS estimates. STATA will automatically drop one or more of the variables that are
perfectly linearly related.
What happens if we have a high degree but not perfect linear dependence among two or more
regressors? We will call this phenomenon multicollinearity.
In the presence of multicollinearity, all CLRM assumptions still hold and hence OLS is still
BLUE. But the correlations between two or more explanatory variables are high.
The conditional variance of the OLS estimator $\hat\beta_j$ can be written as

$$Var(\hat\beta_j \mid X) = \frac{\sigma^2}{SSR_j}$$

where $SSR_j$ is the sum of squared residuals from an auxiliary regression of $X_j$ on the other explanatory variables. High collinearity makes $SSR_j$ small and hence this variance large.
1. Even though the OLS estimates may have minimum variance, this does not mean that variance is small:
- Harder to get a precise estimate
- Wider confidence intervals
- Insignificant t ratios
- Can get a high $R^2$ (and hence a high F statistic of overall significance of the regression) but few significant t ratios.
2. OLS estimators and their standard errors become very sensitive to small changes in the data (i.e. they are unstable).
3. Wrong signs for regression coefficients.
4. Difficulty in assessing the individual contributions of explanatory variables.
The Variance Inflation Factor for regressor $j$ is

$$VIF_j = \frac{1}{1 - R_j^2}$$

where $R_j^2$ is the $R^2$ of an auxiliary regression of the $j$th regressor on all other regressors (including a constant). Use the command estat vif in STATA. VIFs higher than 10 indicate a problem.
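The VIF formula can be computed by hand from the auxiliary regressions. Here is a minimal Python sketch on simulated data (the variables and degree of collinearity are invented; in STATA the same numbers come from estat vif):

```python
import numpy as np

# Simulated regressors: x2 is nearly collinear with x1, x3 is unrelated.
rng = np.random.default_rng(1)
n = 5_000
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    """VIF_j = 1 / (1 - R_j^2), from regressing X_j on the other regressors."""
    n = X.shape[0]
    y = X[:, j]
    others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
    resid = y - others @ np.linalg.lstsq(others, y, rcond=None)[0]
    r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    return 1 / (1 - r2)

print([round(vif(X, j), 1) for j in range(3)])  # x1 and x2 have large VIFs, x3 is near 1
```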
Although multicollinearity is a nuisance, it does not mean we have to "correct" it, i.e. get rid of variables that belong in a regression merely to boost the statistical significance of the remaining variables!
(CONDITIONAL) HETEROSKEDASTICITY
What is Heteroskedasticity?
The conditional variance of $u_i$, or equivalently of $Y_i$, is not constant but may vary with the level of one or more of the $X_i$'s:

$$Var(u_i \mid X_i) = Var(Y_i \mid X_i) = \sigma_i^2$$

Heteroskedasticity may be present when a high value for an independent variable (e.g. high income) is necessary but not sufficient for a high value of the dependent variable (high vacation expenses).
3. There are a number of tests that involve regressing forms of the residuals on Xs.
White's test (estat imtest, white in STATA)
(a) Let the model be

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$$

(b) Estimate the model by OLS and obtain the residuals $\hat u_i$.
(c) Run the auxiliary regression

$$\hat u_i^2 = a_0 + a_1 X_{1i} + a_2 X_{2i} + a_3 X_{1i}^2 + a_4 X_{2i}^2 + a_5 X_{1i} X_{2i} + v_i$$

regressing the squared residual on all Xs, their squares, and all interaction terms. Be careful not to square dummies!
(d) Get the $R_A^2$ of the auxiliary regression.
(e) In order to test the null hypothesis that there is no heteroskedasticity (i.e. that all the slope coefficients are equal to zero), White has shown that, under the null:

$$n R_A^2 \sim \chi^2(k_A)$$

with the number of degrees of freedom equal to the number of explanatory variables excluding the intercept in the auxiliary regression. (In the case above, $k_A = 5$.)
(f) A large value of the test statistic is indicative of heteroskedasticity.
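Steps (a) through (f) can be traced by hand. The Python sketch below is purely illustrative (the data-generating process is invented, with the error variance growing in $X_1$, so the test should reject):

```python
import numpy as np

# Simulated model with heteroskedastic errors: Var(u|X) grows with x1.
rng = np.random.default_rng(2)
n = 2_000
x1 = rng.uniform(1, 5, size=n)
x2 = rng.normal(size=n)
u = rng.normal(size=n) * x1            # heteroskedastic error
y = 1 + 2 * x1 + 3 * x2 + u

# (a)-(b): OLS on the original model, keep the residuals
X = np.column_stack([np.ones(n), x1, x2])
uhat = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

# (c): auxiliary regression of uhat^2 on the Xs, their squares, and the interaction
A = np.column_stack([np.ones(n), x1, x2, x1**2, x2**2, x1 * x2])
resid_aux = uhat**2 - A @ np.linalg.lstsq(A, uhat**2, rcond=None)[0]

# (d)-(e): R^2 of the auxiliary regression and the statistic n * R^2_A
tss = np.sum((uhat**2 - np.mean(uhat**2)) ** 2)
r2_aux = 1 - np.sum(resid_aux**2) / tss
white_stat = n * r2_aux                # compare with chi2(5); 5% critical value is about 11.07

print(round(white_stat, 1))            # large value => reject homoskedasticity
```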
Breusch-Pagan (or Cook-Weisberg) test (use estat hettest in STATA): Involves testing homoskedasticity against the alternative that the error variances are multiplicative
functions of one or more of the X variables.
The heteroskedasticity-robust variance estimate of $\hat\beta_j$ is

$$\widehat{Var}(\hat\beta_j) = \frac{\sum_{i=1}^{n} \hat r_{ij}^2 \hat u_i^2}{SSR_j^2}$$

where $\hat u_i$ is the $i$th OLS residual, and $\hat r_{ij}$ is the $i$th residual of the auxiliary regression of $X_j$ on all the other independent variables. STATA will calculate robust standard errors with the option robust, i.e. doing reg y x, robust. The robust covariances of the estimates may then be obtained using estat vce.
Furthermore, the t statistic that uses the usual (incorrect) standard error is not valid. Instead we should use the robust standard error of the estimate and form the so-called robust t-statistic, which however is only asymptotically valid (i.e. for large samples) and is asymptotically distributed as $N(0,1)$. Similarly, the F test statistic that we construct using the sums of squared residuals of the restricted and unrestricted regressions is not valid either. There do exist robust versions of the F statistic, which are used by STATA when one uses the robust option.
Why is OLS inefficient?
OLS minimizes $\sum_i \hat u_i^2$. With OLS, we weigh each $\hat u_i^2$ equally, whether it comes from a population with a large variance (i.e. little information) or with a small variance.
Ideally, we would like to give more weight to observations coming from populations with smaller variances, as this would enable us to estimate the PRF more accurately. This leads to the Weighted Least Squares estimator.
Consider the model

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_k X_{ik} + u_i$$

where

$$E[u_i \mid X_{i1}, X_{i2}, \ldots, X_{ik}] = 0$$

but

$$Var(u_i \mid X_{i1}, X_{i2}, \ldots, X_{ik}) = \sigma_i^2$$
If we know the form of heteroskedasticity we can estimate the model more efficiently by Weighted Least Squares: this involves weighting each observation $(Y_i, X_{i1}, \ldots, X_{ik})$ by the inverse of its standard deviation ($1/\sigma_i$) and then running OLS using the transformed observations.
It can be shown that WLS is the BLU estimator of $\beta$ in the model above, i.e. it is more efficient than OLS. Weighted Least Squares is a special case of Generalized Least Squares (GLS).
WLS: Case 1: $\sigma_i^2$ Known
If we know $\sigma_i^2$, we can transform the model into something that satisfies our CLRM assumptions:

$$\frac{Y_i}{\sigma_i} = \beta_0 \frac{1}{\sigma_i} + \beta_1 \frac{X_{i1}}{\sigma_i} + \cdots + \beta_k \frac{X_{ik}}{\sigma_i} + \frac{u_i}{\sigma_i}$$

The transformed error $u_i/\sigma_i$ is homoskedastic:

$$E\left[\frac{u_i^2}{\sigma_i^2} \,\Big|\, X_i\right] = \frac{1}{\sigma_i^2}\, E[u_i^2 \mid X_i] = \frac{\sigma_i^2}{\sigma_i^2} = 1$$

In the transformed model, all the CLRM assumptions are satisfied and OLS on the transformed data will be BLUE.
OLS on the transformed model is what is known as Weighted Least Squares:
Weighted Least Squares = OLS applied to transformed data
i.e.

$$\left\{\hat\beta_{j,WLS}\right\}_{j=0}^{k} = \arg\min_{b_0, b_1, \ldots, b_k} \sum_{i=1}^{n} \left( \frac{Y_i}{\sigma_i} - b_0 \frac{1}{\sigma_i} - b_1 \frac{X_{i1}}{\sigma_i} - \cdots - b_k \frac{X_{ik}}{\sigma_i} \right)^2$$
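The equivalence between OLS on transformed data and the weighted normal equations can be checked numerically. The Python sketch below is illustrative (simulated data with a hypothetical, known $\sigma_i$):

```python
import numpy as np

# Simulated model where sigma_i is known (here sigma_i = x_i).
rng = np.random.default_rng(4)
n = 500
x = rng.uniform(1, 5, size=n)
sigma = x
y = 1 + 2 * x + rng.normal(size=n) * sigma

# Route 1: divide Y, the constant, and X by sigma_i, then run plain OLS
Xt = np.column_stack([1 / sigma, x / sigma])
b_wls = np.linalg.lstsq(Xt, y / sigma, rcond=None)[0]

# Route 2: weighted normal equations with weights 1/sigma_i^2
w = 1 / sigma**2
X = np.column_stack([np.ones(n), x])
b_check = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * y))

print(b_wls)                         # close to the true (1, 2)
print(np.allclose(b_wls, b_check))   # both routes give the same estimator
```

Both routes minimize the same weighted sum of squares, so the estimates agree up to numerical precision.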
WLS: Case 2: $\sigma_i^2$ Unknown
If the $\sigma_i^2$'s are unknown, we have to make assumptions about the functional form of heteroskedasticity in order to transform the model.
Example 1: Suppose that we are willing to assume that

$$Var(u_i \mid X_{i1}, \ldots, X_{ik}) = E(u_i^2 \mid X_i) = \sigma^2 X_{i1}$$

i.e. each variance is assumed proportional to $X_{i1}$, where $\sigma^2$ is an unknown proportionality constant. We transform the model as follows:
$$\frac{Y_i}{\sqrt{X_{i1}}} = \beta_0 \frac{1}{\sqrt{X_{i1}}} + \beta_1 \frac{X_{i1}}{\sqrt{X_{i1}}} + \cdots + \beta_k \frac{X_{ik}}{\sqrt{X_{i1}}} + \frac{u_i}{\sqrt{X_{i1}}}$$

The new error term $\varepsilon_i = u_i/\sqrt{X_{i1}}$ is homoskedastic:

$$Var(\varepsilon_i \mid X_i) = Var\left(\frac{u_i}{\sqrt{X_{i1}}} \,\Big|\, X_i\right) = \frac{1}{X_{i1}}\, Var(u_i \mid X_i) = \frac{\sigma^2 X_{i1}}{X_{i1}} = \sigma^2$$
NOTE:
The transformed model in this particular case has NO CONSTANT.
We do not need to know the proportionality constant $\sigma^2$: it does not appear in the formulae of the estimators.
In STATA, if we only know the relative scale of the $\sigma_i$'s, we use the command reg y x1 x2 [aweight=1/X1] if X1 is the variable that causes heteroskedasticity of the form described above.
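Example 1's transformation is easy to verify on simulated data. The Python sketch below (with an invented data-generating process) divides everything by $\sqrt{X_{i1}}$ and recovers the coefficients without ever knowing $\sigma^2$:

```python
import numpy as np

# Simulated model with Var(u|x1) proportional to x1.
rng = np.random.default_rng(5)
n = 500
x1 = rng.uniform(1, 5, size=n)
y = 1 + 2 * x1 + rng.normal(size=n) * np.sqrt(x1)   # Var(u|x1) = sigma^2 * x1, sigma^2 = 1

# Transformed regression. Note there is no ordinary constant column:
# the old intercept becomes the coefficient on 1/sqrt(x1).
s = np.sqrt(x1)
Xt = np.column_stack([1 / s, s])     # columns multiply beta0 and beta1 respectively
b = np.linalg.lstsq(Xt, y / s, rcond=None)[0]
print(b)                             # close to (1, 2); sigma^2 was never needed
```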
Example 2: Suppose instead that

$$Var(u_i \mid X_i) = \sigma^2 X_{i1}^2$$

i.e. now the variance is proportional to $X_{i1}^2$. Then the transformed model is

$$\frac{Y_i}{X_{i1}} = \beta_0 \frac{1}{X_{i1}} + \beta_1 \frac{X_{i1}}{X_{i1}} + \cdots + \beta_k \frac{X_{ik}}{X_{i1}} + \frac{u_i}{X_{i1}}$$

where $\varepsilon_i = u_i/X_{i1}$. The new model satisfies the CLRM assumptions:

$$Var(\varepsilon_i \mid X_i) = Var\left(\frac{u_i}{X_{i1}} \,\Big|\, X_i\right) = \frac{1}{X_{i1}^2}\, Var(u_i \mid X_i) = \frac{\sigma^2 X_{i1}^2}{X_{i1}^2} = \sigma^2$$
More generally, if we model the variance as a function of variables $Z_{i1}, \ldots, Z_{ip}$, e.g.

$$\sigma_i^2 = \delta_0 + \delta_1 Z_{i1} + \cdots + \delta_p Z_{ip}$$

we can estimate this auxiliary equation, form fitted values $\hat\sigma_i$, and then run Feasible Weighted Least Squares:

$$\min_{b_0, b_1, \ldots, b_k} \sum_{i=1}^{n} \left( \frac{Y_i}{\hat\sigma_i} - b_0 \frac{1}{\hat\sigma_i} - b_1 \frac{X_{i1}}{\hat\sigma_i} - \cdots - b_k \frac{X_{ik}}{\hat\sigma_i} \right)^2$$
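A feasible version of this procedure can be sketched in Python as follows (simulated data; the particular variance model below, regressing log squared residuals on a constant and $X$, is one common choice among several, and is an assumption of this sketch):

```python
import numpy as np

# Simulated model whose heteroskedasticity form is unknown to the researcher.
rng = np.random.default_rng(6)
n = 2_000
x = rng.uniform(1, 5, size=n)
y = 1 + 2 * x + rng.normal(size=n) * x          # true Var(u|x) = x^2

# Step 1: OLS on the original model, keep the residuals
X = np.column_stack([np.ones(n), x])
uhat = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

# Step 2: model the variance; here log(uhat^2) is regressed on a constant and x,
# guaranteeing positive fitted variances via the exponential
Z = np.column_stack([np.ones(n), x])
d = np.linalg.lstsq(Z, np.log(uhat**2), rcond=None)[0]
sigma_hat = np.sqrt(np.exp(Z @ d))              # fitted standard deviations

# Step 3: WLS using the estimated weights 1/sigma_hat
Xt = X / sigma_hat[:, None]
b_fwls = np.linalg.lstsq(Xt, y / sigma_hat, rcond=None)[0]
print(b_fwls)                                   # close to the true (1, 2)
```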
Example: Using the WAGE1 data, consider the model

$$\ln(wage) = \beta_0 + \beta_1\, educ + \beta_2\, exper + \beta_3\, female + u_i$$

OLS estimation gives

$$\widehat{\ln(wage)} = \underset{(.10)}{.48} + \underset{(.007)}{.09}\, educ + \underset{(.001)}{.009}\, exper - \underset{(.037)}{.34}\, female$$
where the numbers in parentheses are usual standard errors (i.e. they are calculated assuming conditional homoskedasticity). Plotting the residuals against the fitted values, we detect a cone shape, which is indicative of heteroskedasticity. Testing for heteroskedasticity with the Breusch-Pagan test statistic (which is distributed $\chi^2(1)$), we find a value of 5.53 for the test statistic and a p-value of .0187. Thus we reject the null of homoskedasticity at the 5% level. Using White's test (which is distributed $\chi^2(8)$), we find a value of 17.03 and a p-value of .0298. Thus we reject at the 5% level using this testing approach as well. It seems reasonable then to correct the standard errors of the OLS estimates above for heteroskedasticity. We obtain:
$$\widehat{\ln(wage)} = \underset{(.11)}{.48} + \underset{(.008)}{.09}\, educ + \underset{(.001)}{.009}\, exper - \underset{(.037)}{.34}\, female$$
We see that the robust standard errors are (weakly) larger than the usual standard errors.
The following STATA commands have been used to produce the results above:
use WAGE1
reg lwage educ exper female
rvfplot
estat hettest
estat imtest, white
reg lwage educ exper female, robust