
Econometrics II.

Lecture Notes 1
ESTIMATING LINEAR EQUATIONS BY INSTRUMENTAL VARIABLES
1. Instrumental Variables Estimates
2. Two-Stage Least Squares Estimates
3. Asymptotic properties of 2SLS estimates
4. Specification Tests
(a) Endogeneity
(b) Overidentifying restrictions
(c) Functional form
(d) Heteroskedasticity
Econometrics II-1. VI methods. 2012/13 UC3M. Master in Economic Analysis
1.1 Instrumental Variables Estimates
Consider the linear model
$$y = \beta_1 + \beta_2 x_2 + \cdots + \beta_K x_K + u \tag{1.1}$$
$$E[u] = 0, \quad Cov[x_j, u] = 0, \quad j = 1, 2, \ldots, K-1, \tag{1.2}$$
but $x_K$ might be correlated with $u$, so that it is potentially endogenous, while the other regressors are exogenous. This can be due to omitted regressors that are correlated with $x_K$ but not with the other regressors, to measurement error in either dependent or independent variables, or to simultaneity.
If $Cov[x_K, u] \neq 0$, OLS estimation of all the coefficients in (1.1) typically results in inconsistency. The method of Instrumental Variables (IV) provides a general solution to the problem of an endogenous explanatory variable. To use the IV approach we need a variable $z_1$, not in equation (1.1), that satisfies two conditions.
1. First, $z_1$ must be exogenous in equation (1.1),
$$Cov[z_1, u] = 0. \tag{1.3}$$
2. The second condition involves the relationship between $z_1$ and the endogenous variable $x_K$, in particular the linear projection of $x_K$ on all exogenous variables,
$$x_K = \delta_1 + \delta_2 x_2 + \cdots + \delta_{K-1} x_{K-1} + \theta_1 z_1 + r_K \tag{1.4}$$
where by definition $E[r_K] = 0$ and $r_K$ is uncorrelated with $1, x_2, \ldots, x_{K-1}$ and $z_1$. Equation (1.4) is called a reduced form (RF) equation for the endogenous variable $x_K$, which always exists as a linear projection. The second assumption is that
$$\theta_1 \neq 0, \tag{1.5}$$
i.e. $z_1$ is partially correlated with $x_K$ (after the other exogenous variables $1, x_2, \ldots, x_{K-1}$ are controlled for).
If these two conditions are satisfied then $z_1$ is called an IV or just an instrument for $x_K$. Since $1, x_2, \ldots, x_{K-1}$ are already exogenous, the full list of IVs is the same as the list of exogenous variables. These conditions allow identification of the parameters $\beta_j$ in equation (1.1), so that they can be written in terms of population moments of observable variables.
For that we write
$$y = x'\beta + u$$
where $x = (1, x_2, \ldots, x_K)'$ and we write $z = (1, x_2, \ldots, x_{K-1}, z_1)'$. Assumptions (1.2) and (1.3) imply
$$E[zu] = 0,$$
so that multiplying by $z$ and taking expectations, we obtain
$$E[zx']\beta = E[zy],$$
i.e. a system of $K$ equations with $K$ unknowns, which has a unique solution iff the $K \times K$ matrix $E[zx']$ has full rank,
$$\operatorname{rank}(E[zx']) = K.$$
Condition (1.5) is key for this rank condition (given no multicollinearity among the exogenous regressors). Then
$$\beta = E[zx']^{-1} E[zy],$$
where each expectation can be estimated by its sample equivalent $E_n[\cdot]$ (the average over the $n$ observations),
$$\hat\beta_n^{IV} = E_n[zx']^{-1} E_n[zy].$$
Both conditions (1.3) and (1.5) are important to define valid IVs, but (1.3) cannot be tested because it involves the unobservable $u$. To test (1.5), only a t-test on $\theta_1$ in the linear regression (1.4) is needed.
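The sample-moment formula for the IV estimate can be illustrated with a short simulation. This is only a sketch under assumed values: the data-generating process, the coefficient values and all variable names below are hypothetical and not taken from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical data-generating process (an assumption for illustration):
# x2 is exogenous, xK is endogenous because it shares the shock e with u.
x2 = rng.normal(size=n)
z1 = rng.normal(size=n)             # instrument: moves xK, unrelated to u
e = rng.normal(size=n)
u = e + rng.normal(size=n)
xK = 0.5 * x2 + 1.0 * z1 + e        # theta_1 = 1, so condition (1.5) holds
beta = np.array([1.0, 2.0, -1.0])   # (beta_1, beta_2, beta_K)

X = np.column_stack([np.ones(n), x2, xK])   # rows are x_i'
Z = np.column_stack([np.ones(n), x2, z1])   # rows are z_i'
y = X @ beta + u

# Sample analogue of beta = E[zx']^{-1} E[zy].
beta_iv = np.linalg.solve(Z.T @ X / n, Z.T @ y / n)
# OLS for comparison; it is inconsistent in this design.
beta_ols = np.linalg.solve(X.T @ X / n, X.T @ y / n)

print("IV :", beta_iv.round(3))
print("OLS:", beta_ols.round(3))
```

In this design the OLS coefficient on the endogenous regressor should settle away from the assumed value of -1, while the IV estimate should be close to it.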
1.2 Multiple instruments: Two-Stage Least Squares (2SLS)
Let $z_1, \ldots, z_L$ be variables such that
$$Cov(z_h, u) = 0, \quad h = 1, \ldots, L,$$
so that each variable is exogenous in (1.1). This implies that there are many potential IV estimates (using each $z_h$ or any linear combination of them). We will see that 2SLS is the most efficient IV: 2SLS chooses the linear combination of the exogenous variables $z = (1, x_2, \ldots, x_{K-1}, z_1, \ldots, z_L)'$ that is most correlated with $x_K$. If $x_K$ is exogenous, then we will choose just $x_K$. Writing the RF for $x_K$ as
$$x_K = \delta_1 + \delta_2 x_2 + \cdots + \delta_{K-1} x_{K-1} + \theta_1 z_1 + \cdots + \theta_L z_L + r_K, \tag{1.6}$$
we find that
$$x_K^* = \delta_1 + \delta_2 x_2 + \cdots + \delta_{K-1} x_{K-1} + \theta_1 z_1 + \cdots + \theta_L z_L$$
is uncorrelated with $r_K$ and, as any linear combination of $z$, also with $u$ (i.e. $x_K^*$ is the part of $x_K$ uncorrelated with $u$, so that $x_K$ is endogenous because $r_K$ is correlated with $u$).
The coefficients $\delta_1, \delta_2, \ldots, \delta_{K-1}, \theta_1, \ldots, \theta_L$ can be estimated by OLS to obtain
$$\hat x_K = \hat\delta_1 + \hat\delta_2 x_2 + \cdots + \hat\delta_{K-1} x_{K-1} + \hat\theta_1 z_1 + \cdots + \hat\theta_L z_L,$$
so that for $\hat x = (1, x_2, \ldots, x_{K-1}, \hat x_K)'$ we define
$$\hat\beta_n^{2SLS} = E_n[\hat x x']^{-1} E_n[\hat x y].$$
In fact the 2SLS is an OLS estimate, noting that
$$\hat x = \hat L(x|z) = \hat\Pi_n' z, \qquad \hat\Pi_n = E_n[zz']^{-1} E_n[zx'],$$
so that
$$E_n[\hat x x'] = \hat\Pi_n' E_n[zx'] = E_n[xz'] E_n[zz']^{-1} E_n[zx']$$
$$= E_n[xz'] E_n[zz']^{-1} E_n[zz'] E_n[zz']^{-1} E_n[zx']$$
$$= E_n[\hat\Pi_n' zz' \hat\Pi_n] = E_n[\hat x \hat x'],$$
or in other terms, $x - \hat x$ is orthogonal to $\hat x$.
The name 2SLS comes from these two steps:
1. Regress $x_K$ on $z = (1, x_2, \ldots, x_{K-1}, z_1, \ldots, z_L)'$ and compute $\hat x_K$.
2. Regress $y$ on $\hat x = (1, x_2, \ldots, x_{K-1}, \hat x_K)'$.
However, note that the usual OLS s.e.s of the second step are not correct for the 2SLS estimate, and that the first step cannot be replaced by a regression of $x_K$ on $(1, z_1, \ldots, z_L)'$ alone, omitting the exogenous regressors.
$E[zx']$ has full column rank if at least one of the $\theta_j$ is different from zero in the RF (1.6); otherwise the last element of $x$ could be written as a linear combination of the other ones (plus a term orthogonal to $z$). The values of the $\delta_j$ are not relevant. To test the rank condition,
$$H_0: \theta_1 = \cdots = \theta_L = 0$$
against the alternative that at least one $\theta_j$ is different from zero, we need to use an F or a (robust) Wald statistic.
The model with a single endogenous variable is said to be overidentified when $L > 1$, and there are $L - 1$ overidentifying restrictions.
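The two steps, their agreement with the single-formula version in terms of sample moments, and the first-stage F statistic can be sketched numerically as follows. The simulated design, coefficient values and helper name `ols` are assumptions made for illustration only, and the F statistic uses the homoskedastic form rather than a robust Wald statistic.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Hypothetical design (not from the notes): one endogenous regressor xK,
# one exogenous regressor x2 and L = 2 outside instruments z1, z2.
x2 = rng.normal(size=n)
z1 = rng.normal(size=n)
z2 = rng.normal(size=n)
e = rng.normal(size=n)
u = e + rng.normal(size=n)                  # u is correlated with xK through e
xK = 0.5 * x2 + 0.8 * z1 + 0.6 * z2 + e     # reduced form as in (1.6)
X = np.column_stack([np.ones(n), x2, xK])
Z = np.column_stack([np.ones(n), x2, z1, z2])
y = X @ np.array([1.0, 2.0, -1.0]) + u


def ols(A, b):
    """OLS coefficients of b on the columns of A."""
    return np.linalg.lstsq(A, b, rcond=None)[0]


# Step 1: regress xK on all exogenous variables and keep the fitted values.
xK_hat = Z @ ols(Z, xK)
# Step 2: regress y on (1, x2, xK_hat).
X_hat = np.column_stack([np.ones(n), x2, xK_hat])
beta_two_step = ols(X_hat, y)

# Same estimate written with sample moments:
# (Sxz Szz^{-1} Szx)^{-1} Sxz Szz^{-1} Szy.
Szx, Szz, Szy = Z.T @ X / n, Z.T @ Z / n, Z.T @ y / n
A = Szx.T @ np.linalg.solve(Szz, Szx)
b = Szx.T @ np.linalg.solve(Szz, Szy)
beta_closed = np.linalg.solve(A, b)

# First-stage F statistic for H0: theta_1 = theta_2 = 0.
ssr_ur = np.sum((xK - xK_hat) ** 2)
Z0 = Z[:, :2]                               # restricted model: only (1, x2)
ssr_r = np.sum((xK - Z0 @ ols(Z0, xK)) ** 2)
q = 2                                       # number of restrictions
F = ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - Z.shape[1]))

print(beta_two_step.round(3), beta_closed.round(3), round(F, 1))
```

The point estimates from the two routes coincide up to rounding error; only the standard errors reported by the second-step OLS regression would be wrong.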
1.3 Asymptotic Properties of 2SLS estimates
ASS. 1 (Orthogonality) For some $L \times 1$ vector $z$,
$$E[zu] = 0 \quad (L \times 1).$$
Unless every element of $x$ is exogenous, $z$ will have to contain variables obtained from outside the model.
ASS. 2 (Rank Condition)
(a) $\operatorname{rank}(E[zz']) = L$, where $E[zz']$ is $L \times L$; (b) $\operatorname{rank}(E[zx']) = K$, where $E[zx']$ is $L \times K$.
A necessary condition for ASS. 2(b) to hold is the order condition $L \geq K$. When $z = x$ (exogenous regressors) and $K = L$, this is equivalent to the rank condition for OLS. For testing part (b) it can be checked that in the RF of the endogenous variables at least one element in $z$ not in $x$ is significant (but this is not sufficient).
ASS. 2 identifies $\beta$ since we can define $x^* = L(x|z) = \Pi' z$, where $\Pi = E[zz']^{-1} E[zx']$ and $x = x^* + r$, where $E[zr'] = 0$ and $E[x^* r'] = 0$. Then the 2SLS estimate is an IV estimate with instruments $x^*$, so that
$$E[x^* x']\beta = E[x^* y],$$
and $\beta$ is identified by $\beta = E[x^* x']^{-1} E[x^* y]$ if $E[x^* x']$ has rank $K$, but
$$E[x^* x'] = \Pi' E[zx'] = E[xz'] E[zz']^{-1} E[zx'],$$
which is nonsingular if $E[zx']$ has rank $K$, i.e. ASS. 2(b).
The 2SLS estimate can also be written as
$$\hat\beta_n^{2SLS} = \left( E_n[xz'] E_n[zz']^{-1} E_n[zx'] \right)^{-1} E_n[xz'] E_n[zz']^{-1} E_n[zy],$$
from which we obtain the following results.
THM. 1 (Consistency of 2SLS) Under ASS. 1-2, as $n \to \infty$,
$$\hat\beta_n^{2SLS} \to_p \beta.$$
For the asymptotic normality of the 2SLSE we need a CLT for $n^{1/2} E_n[zu]$, possibly under some conditions on the covariance matrix of $zu$.
ASS. 3 (Homoskedasticity) $E[u^2 zz'] = \sigma^2 E[zz']$, where $\sigma^2 = E[u^2]$.
This is implied by $E[u^2|z] = \sigma^2$, conditional homoskedasticity (with respect to $z$).
THM. 2 (CAN of 2SLS) Under ASS. 1-3,
$$\sqrt{n}\left( \hat\beta_n^{2SLS} - \beta \right) \to_d N\left( 0, \sigma^2 \left( E[xz'] E[zz']^{-1} E[zx'] \right)^{-1} \right).$$
The AVar of the 2SLS estimate can also be written as $\sigma^2 E[x^* x^{*\prime}]^{-1}$. The variance $\sigma^2$ can be estimated from the residuals $\hat u_{ni} = y_i - x_i' \hat\beta_n^{2SLS}$, with $\hat\sigma_n^2 = (n - K)^{-1} \sum_{i=1}^n \hat u_{ni}^2$. Note that these residuals are not the same as those of the second step, $y_i - \hat x_i' \hat\beta_n^{2SLS}$.
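The difference between the two sets of residuals can be checked numerically. The sketch below assumes a hypothetical design with one endogenous regressor and two instruments (all values and names are illustrative) and computes standard errors under ASS. 3 from both residual definitions.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000

# Hypothetical design: Var(u) = 2 by construction.
z1 = rng.normal(size=n)
z2 = rng.normal(size=n)
e = rng.normal(size=n)
u = e + rng.normal(size=n)
xK = 0.8 * z1 + 0.6 * z2 + e
X = np.column_stack([np.ones(n), xK])
Z = np.column_stack([np.ones(n), z1, z2])
y = X @ np.array([1.0, -1.0]) + u
K = X.shape[1]

# 2SLS via fitted regressors X_hat = projection of X on Z.
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]

# Correct residuals use the original regressors x_i.
u_hat = y - X @ beta
sigma2 = u_hat @ u_hat / (n - K)
# Second-step residuals use the fitted regressors and give a different value.
u_second = y - X_hat @ beta
sigma2_second = u_second @ u_second / (n - K)

# Standard errors under ASS. 3: sigma^2 * (X_hat' X_hat)^{-1}.
V_inv = np.linalg.inv(X_hat.T @ X_hat)
se = np.sqrt(sigma2 * np.diag(V_inv))
se_second = np.sqrt(sigma2_second * np.diag(V_inv))

print("sigma2 (2SLS residuals)       :", round(sigma2, 3))
print("sigma2 (second-step residuals):", round(sigma2_second, 3))
print("s.e.:", se.round(4), "vs second-step s.e.:", se_second.round(4))
```

In this particular design the second-step residuals understate the error variance, so the second-step standard errors would be too small; in other designs they can err in the other direction.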
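THM. 3 (Relative Efficiency of 2SLS) Under ASS. 1-3 the 2SLS estimate is efficient in the class of all IV estimates using instruments linear in $z$.
PROOF. We can write any IV estimate using IV $z$ as
$$\tilde\beta_n = E_n[\tilde x x']^{-1} E_n[\tilde x y], \quad \text{where } \tilde x = \Gamma z,$$
and $\Gamma$ is a $K \times L$ matrix, and assume that the rank condition holds for $\tilde x$. Replacing $\Gamma$ and $\Pi$ by consistent estimates would not affect the AVar of the estimates. Then
$$AVar(\tilde\beta_n) = \sigma^2 E[\tilde x x']^{-1} E[\tilde x \tilde x'] E[x \tilde x']^{-1},$$
while $AVar(\hat\beta_n^{2SLS}) = \sigma^2 E[x^* x^{*\prime}]^{-1}$, where $x^* = \Pi' z = L(x|z)$.
To show that $AVar(\tilde\beta_n) - AVar(\hat\beta_n^{2SLS})$ is psd, we have to show that $E[x^* x^{*\prime}] - E[x \tilde x'] E[\tilde x \tilde x']^{-1} E[\tilde x x'] \geq 0$. But $x = x^* + r$, $E[zr'] = 0$ and $E[\tilde x r'] = 0$. Then also $E[\tilde x x'] = E[\tilde x x^{*\prime}]$, so that
$$E[x^* x^{*\prime}] - E[x \tilde x'] E[\tilde x \tilde x']^{-1} E[\tilde x x'] = E[x^* x^{*\prime}] - E[x^* \tilde x'] E[\tilde x \tilde x']^{-1} E[\tilde x x^{*\prime}] = E[s^* s^{*\prime}] \geq 0,$$
where $s^* = x^* - E[x^* \tilde x'] E[\tilde x \tilde x']^{-1} \tilde x = x^* - L(x^*|\tilde x)$.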
Properties of 2SLS estimation
This result is not interesting when $L = K$ (why?). When $x$ is exogenous, $z$ contains $x$ and OLS is the efficient linear estimate because $L(x|z) = x$.
Further, we will always do better by adding more instruments, at least under conditional homoskedasticity (but if $L \gg K$, finite sample properties might be affected).
If ASS. 3 fails there can be more efficient estimates when $L > K$. Even if $x$ is exogenous and included in $z$, if ASS. 1-2 hold but not ASS. 3, OLS is not necessarily asymptotically efficient.
Testing can be done as usual, even F tests based on $SSR_r$ and $SSR_{ur}$ for exclusion and homogeneity restrictions, but these should be computed from the second stage OLS regression, not from the single-step 2SLS procedure.
Heteroskedasticity Robust Inference. ASS. 3 can be restrictive in many contexts, so a general estimate of $E[u^2 x^* x^{*\prime}]$, such as
$$E_n[\hat u_n^2 \hat x \hat x'],$$
should be used instead of $\hat\sigma_n^2 E_n[\hat x \hat x']$.
Problems of 2SLS estimation
2SLS estimation is never unbiased when one explanatory variable is endogenous (and has a reduced number of moments in the Gaussian case).
Even in large samples the IV estimator can be ill-behaved if the instruments are weak. If there is a small correlation between $u$ and $z$ and the correlation between $x$ and $z$ is weak, then the probability limit of $\hat\beta_n^{2SLS}$ could be arbitrarily far from $\beta$, and 2SLS could have worse properties than OLS.
The standard errors of 2SLS are typically large compared, e.g., with OLS s.e.s, since from the second step its AVar depends on the variability of $\hat x_K$ rather than on that of $x_K$ itself,
$$AVar\left( \hat\beta_{nK}^{2SLS} \right) = \frac{\hat\sigma^2}{\widehat{SSR}_K} = \frac{\hat\sigma^2}{\widehat{SST}_K (1 - R_{2S}^2)} = \frac{\hat\sigma^2}{SST_K (1 - R_{2S}^2) R_K^2},$$
with $\widehat{SSR}_K$ from the regression of $\hat x_K$ (with total sum of squares $\widehat{SST}_K$) on $1, x_2, \ldots, x_{K-1}$, and $R_{2S}^2$ the corresponding $R^2$, correcting for collinearity among regressors. Since $\hat x_K$ is a projection of $x_K$ on the instruments, its variability is equal to
$$\widehat{SST}_K = SST_K R_K^2,$$
where $R_K^2$ is obtained from regression (1.6) and $SST_K$ is for the original $x_K$. Then $\widehat{SST}_K$ is small if the first-step regression is not very informative on the variability of $x_K$, i.e. if the instruments are weak.
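The weak-instrument problem described above can be made explicit in the simple model $y = \beta_0 + \beta_1 x + u$ with a single instrument $z$ (the setup of Problem 3). The following is a standard textbook calculation added as an illustration; $\rho$ and $\sigma$ denote population correlations and standard deviations, and here $Cov(z,u)$ is allowed to be slightly different from zero.

```latex
\operatorname{plim}\hat\beta_1^{IV}
  = \beta_1 + \frac{Cov(z,u)}{Cov(z,x)}
  = \beta_1 + \frac{\sigma_u}{\sigma_x}\,\frac{\rho_{zu}}{\rho_{zx}},
\qquad
\operatorname{plim}\hat\beta_1^{OLS}
  = \beta_1 + \frac{Cov(x,u)}{Var(x)}
  = \beta_1 + \frac{\sigma_u}{\sigma_x}\,\rho_{xu}.
```

Hence the IV estimate has the larger asymptotic bias whenever $|\rho_{zu} / \rho_{zx}| > |\rho_{xu}|$, which can happen with a nearly valid instrument if $\rho_{zx}$ is close to zero.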
2SLS solution to the Omitted Variables and Measurement Error Problems
Omitted Variables:
$$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_K x_K + \gamma q + v, \quad E[v|x, q] = 0.$$
The solution would be to put $q$ in the error term, $u = \gamma q + v$, and then find instruments for any element of $x$ that is correlated with $q$. These instruments should satisfy:
1. Be redundant in the structural model $E[y|x, q]$.
2. Be uncorrelated with the omitted variable $q$.
3. Be sufficiently correlated with the endogenous elements of $x$.
Multiple Indicator Solution.
An alternative is using indicators of the unobservable $q$, as when using proxy variables. An indicator $q_1$ can be written as
$$q_1 = \delta_0 + \delta_1 q + a_1, \quad Cov(q, a_1) = 0; \quad Cov(x, a_1) = 0. \tag{1.7}$$
The CEV model appears when $q_1$ is the observed measurement, $\delta_0 = 0$ and $\delta_1 = 1$.
Writing, with $\delta_1 \neq 0$,
$$q = -(\delta_0/\delta_1) + (1/\delta_1) q_1 - (1/\delta_1) a_1,$$
and substituting into the structural equation, the error term is necessarily correlated with $q_1$, so the proxy variable solution is inconsistent under (1.7).
To use this assumption we need more information, such as a second indicator,
$$q_2 = \rho_0 + \rho_1 q + a_2,$$
where $a_2$ satisfies the same assumptions as $a_1$, and
$$Cov(a_1, a_2) = 0.$$
Then we find that
$$y = \alpha_0 + x'\beta + \gamma_1 q_1 + (v - (\gamma/\delta_1) a_1),$$
with $\gamma_1 = \gamma/\delta_1$ and $\alpha_0 = \beta_0 - \gamma\delta_0/\delta_1$, where $q_2$ is uncorrelated with $v$ (because it is redundant in the structural equation) and with $a_1$, by the assumption that the only relationship between $q_1$ and $q_2$ is due to $q$. Thanks to this relation, $q_2$ can be used as an IV for $q_1$.
Note that this is different from leaving $q$ in the error term, which would require finding instruments for all elements of $x$ that are correlated with $q$.
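A small simulation can illustrate the multiple indicator solution. The design below is hypothetical (parameter values, variable names and the single regressor `x` are assumptions for illustration): it compares OLS with $q_1$ plugged in as a proxy against IV using $q_2$ as the instrument for $q_1$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000

# Hypothetical structural model: y = beta0 + beta*x + gamma*q + v.
q = rng.normal(size=n)                 # unobserved
x = 0.7 * q + rng.normal(size=n)       # x is correlated with q
v = rng.normal(size=n)
beta0, beta, gamma = 1.0, 2.0, 1.5
y = beta0 + beta * x + gamma * q + v

# Two indicators of q with mutually uncorrelated errors a1, a2.
a1 = rng.normal(size=n)
a2 = rng.normal(size=n)
q1 = 0.5 + 1.0 * q + a1                # delta_0 = 0.5, delta_1 = 1
q2 = -1.0 + 2.0 * q + a2               # rho_0 = -1, rho_1 = 2

X = np.column_stack([np.ones(n), x, q1])   # regressors: q1 replaces q
Z = np.column_stack([np.ones(n), x, q2])   # instruments: q2 for q1

# OLS using q1 as a proxy: inconsistent because q1 is correlated with a1.
b_proxy = np.linalg.lstsq(X, y, rcond=None)[0]
# IV using q2 as instrument for q1.
b_iv = np.linalg.solve(Z.T @ X, Z.T @ y)

print("proxy OLS:", b_proxy.round(3))
print("IV       :", b_iv.round(3))
```

With these assumed values the IV coefficients on `x` and `q1` should be close to $\beta = 2$ and $\gamma/\delta_1 = 1.5$, while the proxy regression should not.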
1.4 Specification Tests
1.4.1 Endogeneity tests
The 2SLS estimate is less efficient than the OLSE when the explanatory variables are exogenous. Therefore, endogeneity tests are useful to decide whether 2SLS estimation is necessary.
In the model
$$y_1 = z_1'\delta_1 + \alpha_1 y_2 + u_1 \tag{1.8}$$
where $z_1$ are $L_1$ exogenous variables and both $y_1$ and $y_2$ are endogenous, the set of all exogenous variables is denoted by the $L \times 1$ vector $z$, where $z_1$ is a strict subset of $z$,
$$E[zu_1] = 0.$$
We also assume that equation (1.8) is identified, which requires that $L > L_1$ (order condition) and that the extra elements in $z$ with respect to $z_1$ are partially correlated with $y_2$.
Hausman (1978) suggested comparing the OLS and 2SLS estimates of $\beta_1 = (\delta_1', \alpha_1)'$ to build a formal test of endogeneity of $y_2$. If $y_2$ is uncorrelated with $u_1$, then both estimates should differ only by sampling error. Otherwise the OLSE is inconsistent and the two estimates should differ.
Hausman Test for Endogeneity
To check whether the difference OLS-2SLS is significant, the original form of the test is complicated because it involves a singular matrix (and its generalized inverse), but there is an equivalent version based on a simple regression.
For that we use the reduced form of $y_2$,
$$y_2 = \pi_2' z + v_2, \quad E[zv_2] = 0,$$
so that $y_2$ is endogenous iff $v_2$ is correlated with the structural error $u_1$ (because $z$ is uncorrelated with $u_1$ by assumption).
The linear projection of $u_1$ on $v_2$,
$$u_1 = \rho_1 v_2 + e_1 \tag{1.9}$$
with $\rho_1 = Cov(v_2, u_1)/Var(v_2)$, gives $e_1$, uncorrelated with $v_2$ and with zero mean, so that $E[ze_1] = 0$ because $v_2$ and $u_1$ are uncorrelated with $z$.
Then $y_2$ is exogenous iff $u_1$ and $v_2$ are uncorrelated, i.e. iff
$$H_0: \rho_1 = 0.$$
Plugging equation (1.9) into the original equation we obtain
$$y_1 = z_1'\delta_1 + \alpha_1 y_2 + \rho_1 v_2 + e_1,$$
where $E[z_1 e_1] = E[y_2 e_1] = E[v_2 e_1] = 0$ by construction, so that a test of $H_0$ can be done with the usual t-test.
The problem is that $v_2$ is not observed, but $\pi_2$ can be estimated by OLS, and then residuals $\hat v_2$ are obtained, so we run the regression
$$y_1 = z_1'\delta_1 + \alpha_1 y_2 + \rho_1 \hat v_2 + \text{error}. \tag{1.10}$$
Now, although $\hat v_2$ is a generated regressor, the t-test remains valid under $H_0$ (provided homoskedasticity holds).
In fact the estimates of $\beta_1 = (\delta_1', \alpha_1)'$ are identical to the 2SLS estimates (see Problem 1), but the s.e.s are not valid unless $\rho_1 = 0$.
It can also make sense to compare the estimates of $\alpha_1$. Under the equivalent of ASS. 1-3, it can be shown that
$$AVar\left( \hat\alpha_{1,2SLS} - \hat\alpha_{1,OLS} \right) = AVar\left( \hat\alpha_{1,2SLS} \right) - AVar\left( \hat\alpha_{1,OLS} \right),$$
so it is easy to build a t-test using the s.e.s under homoskedasticity.
The extension to multiple endogenous regressors,
$$y_1 = z_1'\delta_1 + y_2'\alpha_1 + u_1,$$
is straightforward and is routinely implemented with heteroskedasticity robust Wald tests on the coefficients of $\hat v_2$ in
$$y_1 = z_1'\delta_1 + y_2'\alpha_1 + \hat v_2'\rho_1 + \text{error}.$$
LM tests can also be implemented, using the OLS residuals $\hat u_1$ from regressing $y_1$ on $z_1, y_2$ in the regression of
$$\hat u_1 \text{ on } z_1, y_2, \hat v_2,$$
where $\hat v_2$ are the RF residuals of $y_2$.
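The regression-based version of the endogeneity test with a single endogenous regressor can be sketched as follows. The data-generating process, coefficient values and helper name `ols` are assumptions for illustration, and the t statistic uses the usual homoskedastic OLS standard error.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5_000

# Hypothetical design: z1 = included exogenous variables (with constant),
# z2 = excluded instruments, y2 endogenous through the shared shock e.
z1 = np.column_stack([np.ones(n), rng.normal(size=n)])
z2 = rng.normal(size=(n, 2))
Z = np.column_stack([z1, z2])
e = rng.normal(size=n)
u1 = 0.8 * e + rng.normal(size=n)
y2 = Z @ np.array([0.5, 0.3, 1.0, -1.0]) + e
y1 = z1 @ np.array([1.0, 2.0]) + 0.5 * y2 + u1


def ols(A, b):
    """Return OLS coefficients and residuals of b on the columns of A."""
    coef = np.linalg.lstsq(A, b, rcond=None)[0]
    return coef, b - A @ coef


# Reduced-form residuals v2_hat from regressing y2 on all of z.
_, v2_hat = ols(Z, y2)

# Augmented regression (1.10): y1 on z1, y2 and v2_hat.
W = np.column_stack([z1, y2, v2_hat])
coef, resid = ols(W, y1)
s2 = resid @ resid / (n - W.shape[1])
se = np.sqrt(s2 * np.diag(np.linalg.inv(W.T @ W)))
t_rho = coef[-1] / se[-1]            # t statistic for H0: rho_1 = 0

# 2SLS for comparison: coefficients on (z1, y2) should coincide.
X = np.column_stack([z1, y2])
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
beta_2sls = np.linalg.lstsq(X_hat, y1, rcond=None)[0]

print("t statistic on v2_hat:", round(t_rho, 2))
print("augmented OLS:", coef[:-1].round(4), " 2SLS:", beta_2sls.round(4))
```

A large absolute t statistic points to endogeneity of $y_2$; the numerical agreement of the first three coefficients with 2SLS is the result stated in Problem 1.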
1.4.2 Endogeneity and over-identification restrictions tests
When there are more instruments than needed, it is possible to test whether the additional instruments are valid, i.e. whether they are orthogonal to the structural error $u_1$,
$$y_1 = z_1'\delta_1 + y_2'\alpha_1 + u_1,$$
where $z_1$ is $L_1 \times 1$ and $y_2$ is $G_1 \times 1$. The vector of all exogenous variables $z$ is $L \times 1$, $z = (z_1', z_2')'$, with $z_2$ being $L_2 \times 1$ and $L = L_1 + L_2$.
The model is overidentified if $L_2 > G_1$ (more extra IVs than endogenous regressors), i.e. if $L > L_1 + G_1$ (more exogenous variables than regressors).
A valid LM test for endogeneity of the instruments can be formed as $nR_u^2$ from the OLS regression of
$$\hat u_1 \text{ on } z,$$
where $\hat u_1$ are the 2SLS residuals using all of the instruments $z$ (assuming they contain a constant). Under the null,
$$H_0: E[zu_1] = 0,$$
and with ASS. 3,
$$J = nR_u^2 \to_d \chi^2_{Q_1},$$
where $Q_1 = L_2 - G_1$ is the number of overidentifying restrictions.
If we reject, then our choice of IVs has to be reconsidered; if not, then we can have some confidence in them, but the test has low power against some endogenous instruments.
This test was proposed by Sargan (1958) for the 2SLS estimator under conditional homoskedasticity. Hansen (1982) extended this test to general GMM estimates.
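The $nR_u^2$ statistic can be computed in a few lines. The sketch below assumes a hypothetical design with one endogenous regressor ($G_1 = 1$) and three excluded instruments ($L_2 = 3$), and contrasts a case where all instruments are valid with one where an instrument enters the structural error; the function name `sargan` and all values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5_000

# Hypothetical design: constant as the only included exogenous variable.
z1 = np.ones((n, 1))
z2 = rng.normal(size=(n, 3))              # L2 = 3 excluded instruments
Z = np.column_stack([z1, z2])
e = rng.normal(size=n)
y2 = z2 @ np.array([1.0, 0.8, 0.6]) + e   # G1 = 1 endogenous regressor
df = z2.shape[1] - 1                      # Q1 = L2 - G1 = 2


def sargan(u1):
    """n * R^2 from regressing the 2SLS residuals on all instruments."""
    y1 = 1.0 + 0.5 * y2 + u1
    X = np.column_stack([z1, y2])
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    beta = np.linalg.lstsq(X_hat, y1, rcond=None)[0]
    u_hat = y1 - X @ beta                 # residuals with the original X
    fitted = Z @ np.linalg.lstsq(Z, u_hat, rcond=None)[0]
    ssr = np.sum((u_hat - fitted) ** 2)
    sst = np.sum((u_hat - u_hat.mean()) ** 2)
    return n * (1.0 - ssr / sst)


w = rng.normal(size=n)
J_valid = sargan(0.8 * e + w)                       # all instruments valid
J_invalid = sargan(0.8 * e + w + 0.5 * z2[:, 0])    # first instrument invalid

# The 5% critical value of a chi-square with 2 degrees of freedom is 5.99.
print("df:", df, " J (valid):", round(J_valid, 2),
      " J (invalid):", round(J_invalid, 2))
```

The statistic only detects violations that push the instruments in different directions; if all instruments were invalid in a way that is proportional to their first-stage coefficients, the test would have no power, which is the low-power caveat mentioned above.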
1.4.3 Tests of Functional Form
The RESET test of Ramsey (1969), designed to test
$$y = x'\beta + u, \quad E[u|x] = 0,$$
i.e. $E[y|x] = x'\beta$, augments the regression with nonlinear functions of $x$ and tests their joint significance.
1.4.4 Tests for Heteroskedasticity
Heteroskedasticity does not affect the consistency of OLS and 2SLS estimates, but does affect the asymptotic variances of the estimates, needed for inference.
We assume first that all regressors are exogenous,
$$y = x'\beta + u, \quad E[u|x] = 0,$$
so the model is well specified. The null is
$$H_0: E[u^2|x] = \sigma^2,$$
while the alternative is that $E[u^2|x]$ depends on $x$. Then it is sensible to study the correlation between $u^2$ and $h(x)$ for some $Q \times 1$ vector function $h$, through regressions
$$u_i^2 = \delta_0 + \delta_1' h_i + v_i,$$
where $h_i = h(x_i)$. Under $H_0$, $E[v_i|x_i] = E[v_i|h_i] = 0$, $\delta_1 = 0$ and $\delta_0 = \sigma^2$. Then to test
$$H_0: \delta_1 = 0$$
we can use an LM or a Wald test (under the assumption that $E[v_i^2|x_i]$ is constant).
However, we cannot test $H_0$ directly because $u_i$ is not observed, so the residuals $\hat u_{ni} = y_i - x_i'\hat\beta_n$ have to be used instead. We then regress
$$\hat u_{ni}^2 \text{ on } 1, h_i$$
and compare $nR_c^2$ to a $\chi^2_Q$ distribution (no adjustment is needed, see Problem 10).
There are different tests based on the choice of $h_i$: Breusch and Pagan (1979) and Koenker (1981) propose $h_i = x_i$, while White (1980) proposes all nonconstant and unique elements of $x_i$ and $x_i x_i'$. Both have degrees of freedom depending on the dimension of $x_i$, which can be quite large in applications. To avoid problems from this, one can take $h_i = (\hat y_{ni}, \hat y_{ni}^2)'$, where $\hat y_{ni}$ are just the OLS fitted values, which are linear functions of $x_i$, leading to an asymptotic test with two degrees of freedom.
If we allow for endogenous regressors, and have exogenous variables $z$, then we have to consider $h_i = h(z_i)$, with no endogenous regressors included in $h$ in any case.
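The $nR_c^2$ test with exogenous regressors can be sketched in code as follows, for both $h_i = x_i$ and $h_i = (\hat y_{ni}, \hat y_{ni}^2)'$. The simulated data, the form of the heteroskedasticity and the function name `lm_het` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5_000

# Hypothetical design with a single nonconstant regressor.
x1 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1])
eps = rng.normal(size=n)
y_homo = 1.0 + 2.0 * x1 + eps                      # constant variance
y_het = 1.0 + 2.0 * x1 + np.exp(0.5 * x1) * eps    # variance depends on x1


def lm_het(y, X, use_fitted=False):
    """Return (n * R^2, Q) from regressing squared OLS residuals on 1, h_i."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    y_hat = X @ beta
    u2 = (y - y_hat) ** 2
    # h_i = (y_hat, y_hat^2) or the nonconstant regressors x_i.
    H = np.column_stack([y_hat, y_hat ** 2]) if use_fitted else X[:, 1:]
    W = np.column_stack([np.ones(len(y)), H])
    fit = W @ np.linalg.lstsq(W, u2, rcond=None)[0]
    r2 = 1.0 - np.sum((u2 - fit) ** 2) / np.sum((u2 - u2.mean()) ** 2)
    return len(y) * r2, H.shape[1]


stat_homo, q_x = lm_het(y_homo, X)
stat_het, _ = lm_het(y_het, X)
stat_het_fitted, q_fitted = lm_het(y_het, X, use_fitted=True)

# Compare with chi-square critical values (3.84 for Q = 1, 5.99 for Q = 2).
print("homoskedastic data, h = x :", round(stat_homo, 2), "Q =", q_x)
print("heteroskedastic, h = x    :", round(stat_het, 2), "Q =", q_x)
print("heteroskedastic, h = yhat :", round(stat_het_fitted, 2), "Q =", q_fitted)
```

With endogenous regressors the same function could be reused by passing functions of the exogenous variables $z_i$ as `H` and 2SLS residuals in place of OLS residuals, which would require a small change to the sketch.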
RECOMMENDED READINGS: Wooldridge (2002, Ch. 5 & 6). Hayashi (2000, Ch. 3).
Ruud (2000, Ch. 20-22). Mittelhammer et al. (2000, Ch. 17).
Problem Set 1
1. Show the equivalence of the 2SLS estimates of $(\delta_1', \alpha_1)'$ in the system
$$y_1 = z_1'\delta_1 + \alpha_1 y_2 + u_1$$
$$y_2 = z'\pi_2 + v_2,$$
where $y_2$ is the suspected endogenous regressor, and $z$ is the set of all exogenous variables, with at least one more element than $z_1$, and the OLS estimate of $(\delta_1', \alpha_1)'$ in
$$y_1 = z_1'\delta_1 + \alpha_1 y_2 + \rho_1 \hat v_2 + \text{error},$$
where $\hat v_2$ are the residuals from the OLS estimation of the reduced form for $y_2$. [Hint: use partitioned regression algebra for OLS estimates and the fact that $z_1$ and $\hat v_2$ are orthogonal in the sample.]
2. Consider the multiple indicator model.
(a) Show that if $q_2$ is uncorrelated with $x_j$, $j = 1, 2, \ldots, K$, then the reduced form of $q_1$ depends only on $q_2$. [Hint: Use the fact that the reduced form of $q_1$ is the linear projection of $q_1$ onto $(1, x_1, x_2, \ldots, x_K, q_2)$ and find the coefficient vector on $x$ using two-step multiple regression algebra.]
(b) What happens if $q_2$ and $x$ are correlated? In this setting, is it realistic to assume that $q_2$ and $x$ are uncorrelated?
3. Consider IV estimation of the simple linear model with a single, possibly endogenous, explanatory variable $x$, and a single instrument $z$:
$$y = \beta_0 + \beta_1 x + u$$
$$E(u) = 0; \quad Cov(z, u) = 0; \quad Cov(z, x) \neq 0, \quad E[u^2|z] = \sigma^2.$$
(a) Under the preceding (standard) assumptions, show that $AVar\left( n^{1/2} (\hat\beta_1 - \beta_1) \right)$ can be expressed as
$$\frac{\sigma^2}{\rho_{zx}^2 \sigma_x^2},$$
where $\sigma_x^2 = Var(x)$ and $\rho_{zx} = Corr(z, x)$. Compare this result with the asymptotic variance of the OLS estimate under $E(u|x) = 0$ and $E(u^2|x) = \sigma^2$.
(b) Comment on how each factor affects the asymptotic variance of the IV estimator. What happens as $\rho_{zx} \to 0$?
4. A model with a single endogenous explanatory variable can be written as
$$y_1 = z_1'\delta_1 + \alpha_1 y_2 + u_1, \quad E[zu_1] = 0,$$
where $z = (z_1', z_2')'$. Consider the following two-step method, intended to mimic 2SLS:
(a) Regress $y_2$ on $z_2$, and obtain fitted values, $\tilde y_2$ (that is, $z_1$ is omitted from the first-stage regression).
(b) Regress $y_1$ on $z_1, \tilde y_2$ to obtain $\tilde\delta_1$ and $\tilde\alpha_1$. Show that $\tilde\delta_1$ and $\tilde\alpha_1$ are generally inconsistent. When would $\tilde\delta_1$ and $\tilde\alpha_1$ be consistent?
5. In the setup of Sections 1.2-1.3 with $x = (x_1, \ldots, x_K)'$ and $z = (x_1, \ldots, x_{K-1}, z_1, \ldots, z_M)'$, with $x_1 = 1$, assume that $E[zz']$ is nonsingular. Prove that $\operatorname{rank}(E[zx']) = K$ iff at least one $\theta_j$ in equation (1.6) is different from zero.
6. Consider the model of the previous exercise.
(a) Find $L(y|z)$ in terms of $\beta_j$, $x_1, \ldots, x_{K-1}$ and $x_K^* = L(x_K|z)$.
(b) Argue that, provided $x_1, \ldots, x_{K-1}, x_K^*$ are not perfectly collinear, an OLS regression of $y$ on $x_1, \ldots, x_{K-1}, x_K^*$ consistently estimates all $\beta_j$.
(c) State a necessary and sufficient condition for $x_K^*$ not to be a perfect linear combination of $x_1, \ldots, x_{K-1}$. What 2SLS assumption is this identical to?
7. Consider a structural linear model with unobserved variable $q$,
$$y = x'\beta + q + v, \quad E[v|x, q] = 0.$$
Suppose, in addition, that $E[q|x] = x'\delta$ for some $K \times 1$ vector $\delta$; thus, $q$ and $x$ are possibly correlated.
(a) Show that $E[y|x]$ is linear in $x$. What consequences does this fact have for tests of functional form to detect the presence of $q$? Does it matter how strongly $q$ and $x$ are correlated?
(b) Now add the assumptions $Var[v|x, q] = \sigma_v^2$, $Var[q|x] = \sigma_q^2$. Show that $Var[y|x]$ is constant. What does this fact imply about using tests for heteroskedasticity to detect omitted variables?
(c) Now write the equation as $y = x'\beta + u$, where $E[ux] = 0$ and $Var[u|x] = \sigma^2$. If $E[u|x] \neq E[u]$, argue that an LM test regressing squares of residuals $\hat u_i^2$ on functions of $x_i$ will detect heteroskedasticity in $u$, at least in large samples.
8. In the linear model $y = x'\beta + u$ assume that ASS. 1-3 hold with $w$ in place of $z$, where $w$ contains all nonredundant elements of $x$ and $z$. Further, assume that the rank conditions hold for OLS and 2SLS. Show that
$$AVar\left( \hat\beta^{2SLS} - \hat\beta^{OLS} \right) = AVar\left( \hat\beta^{2SLS} \right) - AVar\left( \hat\beta^{OLS} \right).$$
9. Show that the degrees of freedom of the overidentifying restrictions test is actually $Q_1 = L_2 - G_1$ (and not e.g. $L_2$).
10. Show that the heteroskedasticity test has $Q$ degrees of freedom (the dimension of $h_i$) despite using residuals $\hat u_{ni}^2$ to build it.