
# Econ 139/239: Introduction to Econometrics

Final Review
Sophia Zhengzi Li
Department of Economics
Duke University

## The final exam will be Saturday, August 14 from 2 PM - 5 PM.

Content
The final will be cumulative, but will be biased toward more recent
material (i.e. Binary Dependent Variables, Panel Data, IV).
Today's slides provide a good indication of the topics that I believe are
important.
Stop me if you have any questions!

## Preparation & Mechanics

Problem sets, quizzes, in-class practice, and discussion sessions are the best
indication of exam content and style.
The exam is closed book, but you will be allowed to use the final cheat
sheet.
You should bring a calculator, since you will be doing several
calculations!


## Before the first midterm, we introduced the univariate regression model

$$Y_i = \beta_0 + \beta_1 X_i + u_i$$

which we estimated using OLS.
In order for OLS to have the properties that we value in an estimator
(unbiasedness, consistency, and asymptotic normality)¹, we needed to
make some assumptions.

¹ To prove efficiency we would need to assume homoskedasticity of the errors
($\mathrm{Var}(u_i \mid X_i) = \sigma^2$) as well. However, we usually won't make this assumption.

## The OLS Assumptions

OLS Assumption 1 (Linearity):
$E(u_i \mid X_i) = 0$
OLS Assumption 2 (Simple random sample):
$(X_i, Y_i)$ are iid draws from their joint distribution
OLS Assumption 3 (No extreme outliers):
$u_i$ and $X_i$ have non-zero and finite fourth moments:

$$0 < E(X_i^4) < \infty \quad \text{and} \quad 0 < E(u_i^4) < \infty$$

Given OLS assumptions 1-3, the OLS estimators are:
Unbiased
Consistent
Asymptotically Normal


## Omitted Variable Bias

However, univariate OLS has a big limitation: if the regressor (Xi ) is
correlated with a variable that has been omitted from the analysis,
but that determines (in part) the dependent variable, then the OLS
estimator will suffer from omitted variable bias (OVB).
OVB occurs when two conditions are true:
The OV is correlated with the included regressor
The OV is a determinant of the dependent variable

## OVB means that OLS Assumption 1 ($E(u_i \mid X_i) = 0$) does not hold.

The error term $u_i$ represents all factors (other than $X_i$) that are
determinants of $Y_i$.
If one of these factors is correlated with $X_i$, then the error term will
be correlated with $X_i$.


## Omitted Variable Bias

Since this violates OLS A1, OLS won't just be biased but also
inconsistent, so OVB is a problem whether the sample size is large or
small.
The magnitude and direction of the bias depend on the correlation
between the regressor and the omitted variable (or more generally, the
error term).
The best solution to the OVB problem is to add (if you can) the
other relevant variables to the regression.
If you can't, you will have to use another method (like Fixed Effects) to
solve the problem.
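
A small simulation can make the direction of the bias concrete. The sketch below assumes a hypothetical data-generating process (the coefficients, the variable names `w`, `x`, `y`, and the seed are illustrative, not from the course): $Y$ depends on $X$ and on an omitted variable $W$ that is positively correlated with $X$, so the slope from the regression that leaves out $W$ settles near $\beta_1 + \beta_2\,Cov(X, W)/Var(X)$ rather than $\beta_1$.

```python
import random

def ols_slope(x, y):
    """Slope from a univariate OLS regression of y on x (with intercept)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx

rng = random.Random(0)
n = 20000
beta1, beta2 = 2.0, 3.0                          # hypothetical true coefficients
w = [rng.gauss(0, 1) for _ in range(n)]          # the omitted variable
x = [0.5 * wi + rng.gauss(0, 1) for wi in w]     # regressor correlated with w
y = [1.0 + beta1 * xi + beta2 * wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]

short_slope = ols_slope(x, y)                    # regression that omits w

# In this design Cov(X, W) = 0.5 and Var(X) = 0.25 + 1 = 1.25,
# so the large-sample value of the short-regression slope is 2 + 3 * 0.4.
predicted = beta1 + beta2 * 0.5 / 1.25
print(round(short_slope, 2), predicted)
```

Increasing `n` does not move `short_slope` toward `beta1`, which is the sense in which OVB is an inconsistency problem and not just a small-sample one.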


## We assume the population regression model is given by

$$Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + u_i$$

$\beta_0$ is the intercept (the mean impact of unobserved factors) and $\beta_k$ is
the slope coefficient of $X_k$.
$\beta_k$ represents the expected change in $Y$ associated with a unit
change in $X_k$, holding all other regressors constant.
We can estimate the parameters of the multiple regression model
using OLS, by minimizing the sum of the squared prediction errors.
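
As an illustration of what "minimizing the sum of squared prediction errors" amounts to computationally, here is a minimal sketch that solves the normal equations $(X'X)b = X'Y$ directly. The helper names `solve` and `ols` and the noise-free toy data are assumptions made for this example; in practice you would rely on your statistics package.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]      # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                       # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(rows, y):
    """OLS coefficients; each row of `rows` is [1, X1, ..., Xk]."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Toy data generated without noise from Y = 1 + 2*X1 - 3*X2,
# so OLS should recover the coefficients (up to rounding error).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [1.0 + 2.0 * a - 3.0 * b for a, b in zip(x1, x2)]
rows = [[1.0, a, b] for a, b in zip(x1, x2)]
coefs = ols(rows, y)
print([round(c, 6) for c in coefs])
```

If `x2` were an exact linear combination of the constant and `x1`, the system would have no unique solution, which is the no-perfect-collinearity requirement in practice.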


## OLS Assumption 1 (Linearity)

$E(u_i \mid X_{1i}, \dots, X_{ki}) = 0$
OLS Assumption 2 (Simple random sample)
$(Y_i, X_{1i}, \dots, X_{ki})$ are iid
OLS Assumption 3 (No extreme outliers)
$X_{1i}, \dots, X_{ki}, u_i$ have non-zero and finite fourth moments
OLS Assumption 4 (No perfect collinearity)
Regressors are not linear combinations of each other
Given OLS A1-A4, OLS is unbiased, consistent, and asymptotically
normal.


Homoskedasticity

## If we want to assume homoskedasticity (in general we won't), we need one more assumption:

OLS Assumption 5 (Homoskedasticity)
$\mathrm{Var}(u_i \mid X_{1i}, \dots, X_{ki}) = \sigma^2$
Adding OLS Assumption 5 makes OLS efficient and allows us to use
homoskedasticity-only (HO) standard errors (but this assumption is often violated in the data).


## Hypothesis Tests and CIs for a Single Coefficient

To test the hypothesis $H_0: \beta_j = \beta_{j,0}$ against the alternative
$H_A: \beta_j \neq \beta_{j,0}$:
Compute the standard error of $\hat\beta_j$, $SE(\hat\beta_j)$
Compute the t-statistic

$$t = \frac{\hat\beta_j - \beta_{j,0}}{SE(\hat\beta_j)}$$

Compute the p-value

$$p\text{-value} = 2\,\Phi\left(-\left|t^{act}\right|\right)$$

where $t^{act}$ is the value of the t-statistic actually computed. Reject $H_0$
at the 5% significance level if the p-value is less than 0.05, or
equivalently, if $|t^{act}| > 1.96$.
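
The three steps above can be sketched in a few lines. The estimate and standard error below are made-up numbers for illustration, and `t_test` is a hypothetical helper name; the normal CDF comes from Python's standard library.

```python
from statistics import NormalDist

def t_test(beta_hat, beta_null, se):
    """Return the t-statistic and the two-sided large-sample p-value."""
    t_act = (beta_hat - beta_null) / se
    p_value = 2 * NormalDist().cdf(-abs(t_act))
    return t_act, p_value

# Hypothetical regression output: estimate -2.28 with standard error 0.52
t_act, p_value = t_test(beta_hat=-2.28, beta_null=0.0, se=0.52)
reject_at_5pct = p_value < 0.05
print(round(t_act, 2), reject_at_5pct)
```

On an exam without software, the same decision comes from comparing $|t^{act}|$ with 1.96.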


## Hypothesis Tests and CIs for a Single Coefficient

When the sample size is large, a 95% confidence interval for $\beta_j$ can
be constructed as

$$\left[\hat\beta_j - 1.96\,SE(\hat\beta_j),\ \ \hat\beta_j + 1.96\,SE(\hat\beta_j)\right]$$

Remember that this confidence interval contains the true value of $\beta_j$
with a 95% probability (i.e. it contains the true value of $\beta_j$ in 95% of
all possible randomly selected samples).
Equivalently, it is also the set of values of $\beta_j$ that cannot be rejected
by a 5% two-sided hypothesis test.
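
The equivalence between the interval and the test can be checked numerically. This is a sketch with hypothetical inputs (the estimate, standard error, and candidate null values are invented for illustration): a null value lies inside the interval exactly when the 5% two-sided test fails to reject it.

```python
def conf_int_95(beta_hat, se):
    """Large-sample 95% confidence interval for a single coefficient."""
    half_width = 1.96 * se
    return beta_hat - half_width, beta_hat + half_width

def rejects_at_5pct(beta_hat, se, beta_null):
    """Two-sided 5% test based on the normal approximation."""
    return abs((beta_hat - beta_null) / se) > 1.96

beta_hat, se = -2.28, 0.52            # hypothetical estimate and standard error
lower, upper = conf_int_95(beta_hat, se)

# For each candidate null value, "inside the interval" and "not rejected" agree.
agreement = []
for beta_null in (-3.5, -3.0, -2.0, -1.0, 0.0):
    inside = lower <= beta_null <= upper
    agreement.append(inside == (not rejects_at_5pct(beta_hat, se, beta_null)))
print(round(lower, 4), round(upper, 4), all(agreement))
```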


## What if you want to test a joint hypothesis about several coefficients?

Why might you want to?
If you think the coefficients are individually insignificant because of
near perfect multicollinearity.

Assuming A1-A4 and a large sample size, you can use the F-statistic.
To do so in practice, you need to:
1. Count the number of restrictions under the null (degrees of
freedom), call this $q$.
2. Compute the F-statistic.
3. Check the table for $F_{q,\infty}$ (or use the p-value, if it is provided).
4. Reject the null if p-value $< \alpha$ or F-stat $> F_{q,\infty}$.


## Summer II, 2010


Goodness of Fit
There are 3 main ways to measure goodness of fit.
1. The standard error of the regression (SER): the SER is a measure of
the spread of the distribution of $Y$ around the regression line, but it
depends on the units of $Y$.
2. $R^2$: the regression $R^2$ is the fraction of the sample variation in $Y_i$
explained by the regressors.

$$R^2 = \frac{\sum_i (\hat Y_i - \bar Y)^2}{\sum_i (Y_i - \bar Y)^2} = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS}$$

However, in multiple regression, the $R^2$ increases whenever a new
regressor is added (unless it's perfectly multicollinear with the original
regressors).
3. $\bar R^2$: $\bar R^2$ adjusts for this by deflating the $R^2$ by a penalty factor:

$$\bar R^2 = 1 - \frac{n-1}{n-k-1}\,\frac{SSR}{TSS} = 1 - \frac{s_{\hat u}^2}{s_Y^2}$$
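
To see how the three measures relate, the sketch below computes them by hand for a tiny made-up dataset (five observations, one regressor; the numbers are purely illustrative).

```python
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n, k = len(y), 1                      # k = number of regressors, excluding the intercept

# Univariate OLS fit
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
y_hat = [b0 + b1 * xi for xi in x]

ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # sum of squared residuals
tss = sum((yi - ybar) ** 2 for yi in y)                 # total sum of squares
ess = sum((fi - ybar) ** 2 for fi in y_hat)             # explained sum of squares

ser = math.sqrt(ssr / (n - k - 1))
r2 = 1 - ssr / tss
adj_r2 = 1 - (n - 1) / (n - k - 1) * ssr / tss
print(round(ser, 4), round(r2, 4), round(adj_r2, 4))
```

Because the penalty factor $(n-1)/(n-k-1)$ exceeds one, $\bar R^2$ is always below $R^2$ when $SSR > 0$.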


## An increase in $R^2$ or $\bar R^2$ does not mean that an added variable is statistically significant or that the regressors are a true cause of the dependent variable.

A high $R^2$ or $\bar R^2$ does not mean that there is no omitted variable bias
or that you have the best possible set of regressors.
Neither $R^2$ nor $\bar R^2$ can prove our model is wrong or right.
You can have a good model but a low $R^2$ and $\bar R^2$ because $\mathrm{Var}(u_i)$ is
large.
You can also have a bad model with $R^2 \approx 1$ (spurious regression).


## A statistical analysis is internally valid if the statistical inferences

about causal effects are valid for the population being studied.
We know that internal validity hinges on two things:
1. The estimator of the causal effect should be consistent (unbiased
would be nice too, but it's not always feasible).
2. Hypothesis tests should have the desired significance level (i.e. you
should be using the correct standard errors).


## Threats to Internal Validity

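Consider the simple univariate regression

$$Y_i = \beta_0 + \beta_1 X_i + u_i$$

We know that:

$$\hat\beta_1 = \beta_1 + \frac{\sum_i (X_i - \bar X)\,u_i}{\sum_i (X_i - \bar X)^2}$$

Since

$$\hat\beta_1 = \beta_1 + \frac{\sum_i (X_i - \bar X)\,u_i}{\sum_i (X_i - \bar X)^2} \xrightarrow{p} \beta_1 + \frac{\sigma_{Xu}}{\sigma_X^2},$$

$\hat\beta_1$ will be inconsistent if $Cov(X_i, u_i) \neq 0$.
Also, if $E[u_i \mid X_i] \neq 0$ then $E[(X_i - \bar X)\,u_i] \neq 0$, so $\hat\beta_1$ will be biased as
well.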
So when might this occur?


## Omitted variables. Is the omitted variable observed?

Yes: Include it! (Multivariate regression analysis)
No:
Use Panel Data (fixed effects).
Use IV.
Design an experiment.

## Wrong functional form

Approximate with a nonlinear functional form like a polynomial
regression.


## Threats to Internal Validity

Simultaneous causality
X causes Y , but Y in turn causes X
Use IV.
Design an experiment.

## Measurement error in the regressor

Get more accurate measurements.
Use Instrumental Variables (IV) or model the form of error.

Sample selection
The availability of data is related to the value of the dependent
variable.
Use a model that corrects for selection bias.


Nonlinearities
Identifying and Modeling Nonlinearities

The basic OLS model assumes that the $X$s are all linearly related to
$Y$ through the population regression line.
What if this is not the case?
We looked at two methods for modeling nonlinearities using OLS:
Allowing the effect on $Y$ of a unit change in $X_1$ to depend on the value
of another independent variable $X_2$ (or perhaps more than one).
This method uses dummy variables and interactions.
Allowing the effect on $Y$ of a unit change in $X_1$ to depend on the value
of $X_1$ itself.
This method uses nonlinear functions of the $X$s like polynomials and
logarithms.


## Dummies and interactions

Through the use of the interaction term $X_i \times D_i$, the population
regression line relating $Y_i$ and the continuous variable $X_i$ can have a
slope or intercept that depends on the binary variable $D_i$. There are
three possibilities:
1. Different intercepts, same slope

$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 D_i + u_i$$

2. Different intercepts and slopes

$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 D_i + \beta_3 (X_i \times D_i) + u_i$$

3. Same intercept, different slopes

$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 (X_i \times D_i) + u_i$$
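
One way to read the coefficients in the "different intercepts and slopes" case: OLS on the fully interacted model reproduces the lines you would get from fitting each group separately, so $\beta_2$ is the gap between the two intercepts and $\beta_3$ the gap between the two slopes. The sketch below illustrates this with invented, noise-free data; `fit_line` is a hypothetical helper.

```python
def fit_line(x, y):
    """Univariate OLS: returns (intercept, slope)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum((xi - xbar) ** 2 for xi in x)
    return ybar - slope * xbar, slope

# Hypothetical data: the D = 0 group follows Y = 1 + 2X, the D = 1 group follows Y = 4 + 0.5X
x_d0 = [1.0, 2.0, 3.0, 4.0]
y_d0 = [1.0 + 2.0 * v for v in x_d0]
x_d1 = [1.0, 2.0, 3.0, 4.0]
y_d1 = [4.0 + 0.5 * v for v in x_d1]

intercept_d0, slope_d0 = fit_line(x_d0, y_d0)
intercept_d1, slope_d1 = fit_line(x_d1, y_d1)

beta0, beta1 = intercept_d0, slope_d0        # baseline (D = 0) line
beta2 = intercept_d1 - intercept_d0          # shift in the intercept when D = 1
beta3 = slope_d1 - slope_d0                  # shift in the slope when D = 1
print(beta0, beta1, beta2, beta3)
```

Testing whether the two groups share the same line is then a joint test of $\beta_2 = \beta_3 = 0$.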


## In other situations, we would like to allow the effect on $Y$ of a unit change in $X$ to depend on the value of $X$ itself, rather than some other variable.

One way to do this is to run a quadratic regression

$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + u_i$$

Note that this regression is linear in the parameters: after creating
the variable $X_i^2$ we can still use OLS to estimate the parameters.


## For the nonlinear models considered in this class

$$Y = f(X_1, X_2, \dots, X_k) + u$$

so the expected effect on $Y$ of a change in $X_i$ is then

$$\Delta Y = f(X_1, \dots, X_i + \Delta X_i, \dots, X_k) - f(X_1, \dots, X_i, \dots, X_k)$$

Notice that this formula applies both to the examples in Chapter 8,
where $f$ is a nonlinear function of the $X$s but a linear function of the
parameters ($\beta$s), and to the examples in Chapter 11, where $f$ can be
a nonlinear function of the parameters as well as the $X$s.


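The estimator of the unknown population difference

$$\Delta Y = f(X_1, \dots, X_i + \Delta X_i, \dots, X_k) - f(X_1, \dots, X_i, \dots, X_k)$$

is just the difference between the predicted values

$$\Delta \hat Y = \hat f(X_1, \dots, X_i + \Delta X_i, \dots, X_k) - \hat f(X_1, \dots, X_i, \dots, X_k)$$

For the quadratic regression $\hat Y_i = \hat\beta_0 + \hat\beta_1 X_i + \hat\beta_2 X_i^2$
we have

$$\begin{aligned}
\Delta \hat Y &= \left[\hat\beta_0 + \hat\beta_1 (X_i + \Delta X_i) + \hat\beta_2 (X_i + \Delta X_i)^2\right] - \left[\hat\beta_0 + \hat\beta_1 X_i + \hat\beta_2 X_i^2\right] \\
&= \hat\beta_1 \Delta X_i + 2 \hat\beta_2 X_i (\Delta X_i) + \hat\beta_2 (\Delta X_i)^2
\end{aligned}$$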

Notice that to compute this you will need to know $\hat\beta_1$ and $\hat\beta_2$, the
initial value of $X_i$, and the size of the change $\Delta X_i$.
To compute the standard error of the effect on $Y$ of changing $X$ in
the quadratic regression, you need to compute

$$SE(\Delta \hat Y) = \frac{|\Delta \hat Y|}{\sqrt{F}}$$

where $F$ is the F-statistic from the null hypothesis that the effect is
zero, which will depend on the coefficients on $X$ and $X^2$, the initial
value of $X_i$, and the size of the change $\Delta X_i$.
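
A short numerical sketch of the quadratic-regression effect and its standard error follows. The coefficient estimates, the starting value, and the F-statistic are hypothetical inputs chosen for illustration; in practice they come from your regression output and from a test of the relevant linear restriction.

```python
import math

def predict(b0, b1, b2, x):
    """Predicted value from the quadratic regression."""
    return b0 + b1 * x + b2 * x ** 2

def delta_y_hat(b1, b2, x, dx):
    """Closed-form change in the predicted value when X moves from x to x + dx."""
    return b1 * dx + 2 * b2 * x * dx + b2 * dx ** 2

b0, b1, b2 = 607.3, 3.85, -0.0423      # hypothetical coefficient estimates
x, dx = 10.0, 1.0                      # initial value of X and size of the change

effect = delta_y_hat(b1, b2, x, dx)
direct = predict(b0, b1, b2, x + dx) - predict(b0, b1, b2, x)   # same number, computed directly

f_stat = 299.94                        # hypothetical F-statistic for H0: effect = 0
se_effect = abs(effect) / math.sqrt(f_stat)
print(round(effect, 4), round(se_effect, 4))
```

Note that the intercept cancels, and that the effect changes with the starting value `x`, which is the whole point of the quadratic specification.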


Polynomial Regression

## The quadratic regression is a special case of a polynomial regression, which extends the quadratic specification to higher order polynomials ($X^3$, $X^4$, etc.).

As in the quadratic regression, calculating the effect of a change in
one regressor requires computing the difference in predicted values.
Since this can be tedious, researchers often prefer to use logarithms to
model nonlinearities.


## Regressions using Logarithms

Logarithms are often useful because they convert changes in variables
into percentage changes:

$$\ln\left(\frac{X + \Delta X}{X}\right) = \ln(X + \Delta X) - \ln(X) \approx \frac{\Delta X}{X} \quad \left(\text{when } \frac{\Delta X}{X} \text{ is small}\right)$$

This approximation makes their coefficients simpler to interpret and
perform tests on than the coefficients in quadratic or polynomial
regressions, which is very convenient.
The disadvantage is that you have to decide on the shape of the
relationship beforehand.
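
How small is "small"? The sketch below compares the log difference with the proportional change for a few relative changes (the starting value 100 and the list of changes are arbitrary choices for illustration).

```python
import math

def log_difference(x, dx):
    """ln(x + dx) - ln(x), the quantity approximated by dx / x."""
    return math.log(x + dx) - math.log(x)

x = 100.0
for rel_change in (0.01, 0.05, 0.10, 0.50):
    exact = log_difference(x, rel_change * x)
    gap = rel_change - exact
    print(f"dX/X = {rel_change:.2f}   log difference = {exact:.4f}   gap = {gap:.4f}")
```

The gap is negligible for changes of a few percent and grows noticeably for large changes, so percentage interpretations of log coefficients are best reserved for modest changes.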


## Regressions using Logarithms

There are three main ways to use logs in regressions:
1. Linear-log model
2. Log-linear model
3. Log-log model

Linear-log model
Assume that the regression has the following shape

$$Y = \beta_0 + \beta_1 \ln X + u$$

When would we want to use this approach?


## Regressions using Logarithms

Log-linear model
What if we apply the log to $Y$ instead of $X$?
Now the regression has the following shape

$$\ln Y = \beta_0 + \beta_1 X + u$$

When would we want to use this approach?
Here, a change in $X$ of one unit ($\Delta X = 1$) is associated with a
$100 \times \beta_1\%$ change in $Y$.

Log-log model
Now the regression has the following shape

$$\ln Y = \beta_0 + \beta_1 \ln X + u$$

When would we want to use this approach?
Here, a 1% change in $X$ is associated with a $\beta_1\%$ change in $Y$.


## One option for modeling discrete dependent variables is to just use OLS.

The key here is to reinterpret the predicted values as probabilities.
Why is this interpretation legitimate?
The population regression function is a conditional expectation
($E(Y \mid X_1, \dots, X_k)$) and here $Y$ is a 0/1 binary variable, so its
expected value is simply the probability that $Y = 1$.
Thus, for a binary variable,

$$E(Y \mid X_1, \dots, X_k) = P(Y = 1 \mid X_1, \dots, X_k)$$


## OLS with a binary dependent variable is called the linear probability model since it models the probability that $Y = 1$ with a straight line.

$$P(Y = 1 \mid X_1, \dots, X_k) = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k$$

$\beta_j$ measures the change in $P(Y = 1)$ due to a unit change in $X_j$, or
if $X_j$ is a dummy variable ($D_j$) it measures the change in $P(Y = 1)$
associated with changing $D_j$ from being equal to 0 to being equal to
1.


## Linear Probability Model (LPM)

Most of the tools we've learned so far carry over to the LPM.
Confidence intervals, hypothesis tests, and interactions are the same.
Only $R^2$ and $\bar R^2$ don't, since the fitted values are always somewhat far
from $Y_i$.

However, the LPM has a serious flaw: you can get predicted
probabilities that are greater than one or less than zero.
For this reason, we introduced two nonlinear specifications (logit and
probit) to correct this flaw.


## Probit and Logit

With the LPM

$$\begin{aligned}
E(Y_i \mid X_{1i}, \dots, X_{ki}) &= P(Y_i = 1 \mid X_{1i}, \dots, X_{ki}) \\
&= f(X_{1i}, \dots, X_{ki}) \\
&= \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki}
\end{aligned}$$

which can lead to predicted probabilities outside the unit interval.
Probit and logit use CDFs to model $f(X_{1i}, \dots, X_{ki})$, which keeps the
predictions inside this interval.
The Probit model uses the standard normal CDF so

$$f(X_1, \dots, X_k) = \Phi(\beta_0 + \beta_1 X_1 + \dots + \beta_k X_k)$$

The Logit uses the standard logistic CDF

$$f(X_1, \dots, X_k) = \frac{e^{\beta_0 + \beta_1 X_1 + \dots + \beta_k X_k}}{1 + e^{\beta_0 + \beta_1 X_1 + \dots + \beta_k X_k}} \equiv F(\beta_0 + \beta_1 X_1 + \dots + \beta_k X_k)$$

## Probit and Logit

Both are essentially fitting an S shaped curve through the data, and
produce pretty similar results, so choosing between them is usually a
matter of preference (i.e. arbitrary).


## Probit and Logit

One drawback relative to the LPM is that the coefficients from the
logit or probit do not have simple interpretations.
Both the predicted values and differences in predicted values are
non-linear functions of the $\beta$s and $X$s.

## The model is best interpreted by computing predicted probabilities

and the effect of a unit change in a regressor (often evaluated at the
mean value of the other regressors).


## For the probit model, the predicted probability that $Y = 1$, given values of $X_1, X_2, \dots, X_k$, is calculated by computing the z-value, $z = \hat\beta_0 + \hat\beta_1 X_1 + \dots + \hat\beta_k X_k$, and then looking up this z-value in the normal distribution table.

For the logit model, the predicted probability that $Y = 1$, given
values of $X_1, X_2, \dots, X_k$, is calculated by computing the value of
$\hat\beta_0 + \hat\beta_1 X_1 + \dots + \hat\beta_k X_k$, and then plugging this value into the logistic
cumulative distribution function

$$\hat f(X_1, \dots, X_k) = \frac{e^{\hat\beta_0 + \hat\beta_1 X_1 + \dots + \hat\beta_k X_k}}{1 + e^{\hat\beta_0 + \hat\beta_1 X_1 + \dots + \hat\beta_k X_k}} \equiv F(\hat\beta_0 + \hat\beta_1 X_1 + \dots + \hat\beta_k X_k)$$


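The effect on the predicted probability of a change in a regressor is calculated by:
1. Computing the predicted probability for the initial value of the
regressors,
2. Computing the predicted probability for the new or changed value of
the regressors, and
3. Taking their difference.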
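
The calculation of a predicted-probability change can be sketched as follows for a single regressor. The coefficient values and the two values of $X$ are hypothetical (they are applied to both models only to keep the example short; a real logit fit would produce different estimates than a probit fit). The standard normal CDF replaces the table lookup.

```python
import math
from statistics import NormalDist

def probit_prob(z):
    """Standard normal CDF evaluated at the index z."""
    return NormalDist().cdf(z)

def logit_prob(z):
    """Standard logistic CDF; equal to e^z / (1 + e^z)."""
    return 1.0 / (1.0 + math.exp(-z))

b0, b1 = -2.19, 2.97                 # hypothetical coefficient estimates
x_initial, x_new = 0.3, 0.4          # regressor before and after the change

z_initial = b0 + b1 * x_initial
z_new = b0 + b1 * x_new

# Predicted probability at the initial value, at the new value, and the difference
probit_effect = probit_prob(z_new) - probit_prob(z_initial)
logit_effect = logit_prob(z_new) - logit_prob(z_initial)
print(round(probit_effect, 3), round(logit_effect, 3))
```

Unlike the LPM, repeating this at a different `x_initial` gives a different effect, since both CDFs are flatter in the tails than in the middle.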


Estimation

## We can't use OLS to estimate the coefficients because these parameters enter both the logit and probit nonlinearly.

In other words, both

$$E(Y_i \mid X_{1i}, \dots, X_{ki}) = \Phi(\beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki})$$

and

$$E(Y_i \mid X_{1i}, \dots, X_{ki}) = F(\beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki})$$

are nonlinear in the coefficients ($\beta$s) so we can't use OLS.
Instead, the coefficients of the probit and logit models are estimated
using maximum likelihood.


## Maximum Likelihood (ML)

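To use ML, we treat the joint probability distribution of the data as a
function of the unknown coefficients.
If we know the distribution of the data as a function of the
parameters, $\hat\beta^{MLE}$ is the parameter value (or values) that maximizes the likelihood of
our data.
For probit, the log-likelihood is

$$\ln L(Y_1, \dots, Y_n \mid X_{1i}, \dots, X_{ki}; \beta_0, \dots, \beta_k) = \sum_{i=1}^{n} \left[ Y_i \ln \Phi(\beta_0 + \dots + \beta_k X_{ki}) + (1 - Y_i) \ln\left(1 - \Phi(\beta_0 + \dots + \beta_k X_{ki})\right) \right]$$

The MLE then solves

$$\max_{\beta_0, \dots, \beta_k} \sum_{i=1}^{n} \left[ Y_i \ln \Phi(\beta_0 + \dots + \beta_k X_{ki}) + (1 - Y_i) \ln\left(1 - \Phi(\beta_0 + \dots + \beta_k X_{ki})\right) \right]$$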


$$\max_{\beta_0, \dots, \beta_k} \sum_{i=1}^{n} \left[ Y_i \ln \Phi(\beta_0 + \dots + \beta_k X_{ki}) + (1 - Y_i) \ln\left(1 - \Phi(\beta_0 + \dots + \beta_k X_{ki})\right) \right]$$

Since this does not have a nice closed form solution, we can't
represent the estimators using simple formulas (like we could with
OLS).
Instead, we must use a computer algorithm to maximize the function
numerically.
But we know that under fairly general conditions, ML estimation is
consistent, asymptotically normal, and efficient.
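
To give a feel for what "maximize numerically" means, here is a minimal sketch of one possible algorithm, Fisher scoring, applied to a one-regressor probit on simulated data. The algorithm choice, the function name `fit_probit`, the true coefficients (0.5 and 1.0), and the seed are all assumptions made for this illustration; statistical packages may use different optimizers.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def fit_probit(x, y, iterations=25):
    """Maximize the one-regressor probit log-likelihood by Fisher scoring."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0                  # score (gradient of the log-likelihood)
        i00 = i01 = i11 = 0.0          # entries of the 2x2 information matrix
        for xi, yi in zip(x, y):
            z = b0 + b1 * xi
            p = min(max(STD_NORMAL.cdf(z), 1e-12), 1 - 1e-12)   # guard against 0 or 1
            d = STD_NORMAL.pdf(z)
            score_w = (yi - p) * d / (p * (1 - p))
            info_w = d * d / (p * (1 - p))
            g0 += score_w
            g1 += score_w * xi
            i00 += info_w
            i01 += info_w * xi
            i11 += info_w * xi * xi
        det = i00 * i11 - i01 * i01
        # Update: b <- b + (information matrix)^(-1) * score
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (i00 * g1 - i01 * g0) / det
    return b0, b1

# Simulated data from a probit model with hypothetical true coefficients 0.5 and 1.0
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(4000)]
y = [1 if 0.5 + 1.0 * xi + rng.gauss(0, 1) > 0 else 0 for xi in x]

b0_hat, b1_hat = fit_probit(x, y)
print(round(b0_hat, 2), round(b1_hat, 2))
```

Each pass evaluates the slope of the log-likelihood at the current guess and steps uphill, which is why no closed-form formula is needed.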


## Inference & Goodness of Fit

Because the MLE is asymptotically normal, statistical inference about
the probit and logit coefficients is carried out in the same manner as
in OLS.
As usual, we can use a t-ratio or F-stat to test hypotheses about one
or more coefficients.
We can't use $R^2$ or $\bar R^2$ though, because the fitted values will still be
somewhat far from $Y_i$, so instead we can use

$$\text{Pseudo-}R^2 = 1 - \frac{\ln\left(L^{max}_{probit}\right)}{\ln\left(L^{max}_{bernoulli}\right)}$$

where $L^{max}_{probit}$ is the value of the maximized probit likelihood and
$L^{max}_{bernoulli}$ is the value of the maximized Bernoulli likelihood.


## Inference & Goodness of Fit

The formula for the logit simply replaces $L^{max}_{probit}$ with $L^{max}_{logit}$:

$$\text{Pseudo-}R^2 = 1 - \frac{\ln\left(L^{max}_{logit}\right)}{\ln\left(L^{max}_{bernoulli}\right)}$$

The Pseudo-$R^2$ tells us how well the probit or logit does relative to a
simple Bernoulli model, so a higher value means that the probit (or
logit) does a better job of explaining the data.
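
A small sketch of the pseudo-$R^2$ calculation follows. It assumes you already have fitted probabilities from a probit or logit; the four observations and fitted values are invented for illustration, and `pseudo_r2` is a hypothetical helper. The Bernoulli benchmark uses the sample fraction of ones as its only parameter.

```python
import math

def pseudo_r2(y, p_fitted):
    """1 - ln(L_model) / ln(L_bernoulli) for a binary outcome y."""
    ll_model = sum(
        yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
        for yi, pi in zip(y, p_fitted)
    )
    p_bar = sum(y) / len(y)            # maximized Bernoulli model: P(Y = 1) = sample mean
    ll_bernoulli = sum(
        yi * math.log(p_bar) + (1 - yi) * math.log(1 - p_bar) for yi in y
    )
    return 1 - ll_model / ll_bernoulli

y = [0, 0, 1, 1]
p_fitted = [0.1, 0.2, 0.8, 0.9]        # hypothetical fitted probabilities
value = pseudo_r2(y, p_fitted)
no_information = pseudo_r2(y, [0.5, 0.5, 0.5, 0.5])
print(round(value, 3), round(no_information, 3))
```

A model whose fitted probabilities are no better than the sample mean gets a pseudo-$R^2$ of zero.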


## Panel data techniques are a powerful method for addressing the

omitted variables problem.
If we are willing to assume that the omitted variables are constant
over time, we can solve the OV problem by collecting panel data from
the same units (e.g. people, firms, states) for several time periods (at
least two).


## Suppose that we have $n$ units each observed for $T$ periods. We can write our regression model as

$$Y_{it} = \beta_0 + \beta_1 X_{1,it} + \dots + \beta_k X_{k,it} + \alpha_i + u_{it} \tag{1}$$

where the error term now includes two components:
A fixed effect $\alpha_i$ which includes all unobserved variables that are
constant over time² for each unit $i$
A second component $u_{it}$ which contains all the remaining (time-unit
specific) error.

² If we believe that there is a third component of the error ($\lambda_t$) that varies over time
but is constant across units, we can also add a time fixed effect.

Random Effects
If the fixed effect $\alpha_i$ is uncorrelated with all the included regressors in
all time periods ($Cov(\alpha_i, X_{j,it}) = 0$) we can still use OLS to estimate

$$Y_{it} = \beta_0 + \beta_1 X_{1,it} + \dots + \beta_k X_{k,it} + \alpha_i + u_{it}$$

but it will be more efficient to use an estimator that accounts for the
fact that the observations are no longer iid (due to the presence of
$\alpha_i$).
We can do so by using a particular form of GLS (Generalized Least
Squares) known as the random effects (RE) estimator.
If the fixed effect $\alpha_i$ is correlated with one or more of the included
regressors ($Cov(\alpha_i, X_{j,it}) \neq 0$), RE will be inconsistent.
In this case, we should use the fixed effects (FE) estimator, which
differences $\alpha_i$ away, allowing us to use OLS.

Fixed Effects
Specifically, by subtracting the average (over time, for each unit) of both sides of

$$Y_{it} = \beta_0 + \beta_1 X_{1,it} + \dots + \beta_k X_{k,it} + \alpha_i + u_{it}$$

from itself, we are left with

$$(Y_{it} - \bar Y_i) = \beta_1 (X_{1,it} - \bar X_{1,i}) + \dots + \beta_k (X_{k,it} - \bar X_{k,i}) + (u_{it} - \bar u_i)$$

which no longer includes the fixed effect $\alpha_i$ and can be estimated
using OLS.
Intuitively, we are exploiting the panel nature of the data to hold the
unobserved effect ($\alpha_i$) constant, even though we can't measure it.
Since we are regressing changes of $Y$ on changes of $X$s, the fixed
effect won't play any role in this regression since, by definition, the
fixed effect did not change over time.
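
The demeaning step can be illustrated with a short simulation. Everything about the design below is an assumption for the example (500 units, 4 periods, a true slope of 2, and a fixed effect that also shifts $X$): pooled OLS on the raw data picks up the fixed effect, while OLS on the demeaned data does not.

```python
import random

def ols_slope(x, y):
    """Slope from a univariate OLS regression of y on x (with intercept)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum((xi - xbar) ** 2 for xi in x)

rng = random.Random(2)
n_units, n_periods = 500, 4
x_all, y_all, x_within, y_within = [], [], [], []

for _ in range(n_units):
    alpha = rng.gauss(0, 1)                                     # unobserved unit fixed effect
    xs = [alpha + rng.gauss(0, 1) for _ in range(n_periods)]    # regressor correlated with alpha
    ys = [2.0 * xv + alpha + rng.gauss(0, 1) for xv in xs]      # hypothetical true slope = 2
    x_mean, y_mean = sum(xs) / n_periods, sum(ys) / n_periods
    x_all += xs
    y_all += ys
    x_within += [xv - x_mean for xv in xs]                      # subtract the unit's time average
    y_within += [yv - y_mean for yv in ys]

pooled = ols_slope(x_all, y_all)             # ignores the fixed effect, so it is biased upward here
fixed_effects = ols_slope(x_within, y_within)
print(round(pooled, 2), round(fixed_effects, 2))
```

In this design $Cov(X, \alpha)/Var(X) = 1/2$, so the pooled slope settles near 2.5 while the within estimate stays near 2.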

## Fixed Effects versus Random Effects

When $Cov(\alpha_i, X_{j,it}) = 0$, both FE and RE are consistent, but RE is
more efficient.
If $Cov(\alpha_i, X_{j,it}) \neq 0$, FE is unbiased and consistent, but RE is not, so
FE is more robust.
Therefore, you should only use RE if $Cov(\alpha_i, X_{j,it}) = 0$.
You can test this condition with a Hausman test.


Hausman Test
Formally, the Hausman test involves constructing a test statistic
which measures the normalized difference of the coefficients estimated
using RE and FE respectively.

This test statistic will be distributed $\chi^2_M$, where $M$ is the number of
coefficients on regressors that vary over time.
Since the null hypothesis of the test statistic is that the coefficients
are the same, a rejection of the null implies that RE is inconsistent
(so we should use FE instead).

Instrumental Variables

## Instrumental variables (IV) techniques are another powerful method

Consider the simple univariate regression

$$Y_i = \beta_0 + \beta_1 X_i + u_i \tag{2}$$

We know that OLS is inconsistent if $X_i$ is correlated with $u_i$ (that is,
if $X_i$ is endogenous).
IV or 2SLS is an estimation technique that can be used instead of
OLS to recover consistent estimates of the parameters.
2SLS can be used when particular variables called instruments are
available.


Instrumental Variables

$$Y_i = \beta_0 + \beta_1 X_i + u_i \tag{2}$$

A valid instrument $Z_i$ must satisfy two conditions:
1. Instrument relevance:
$Cov(Z_i, X_i) \neq 0$ (Usually easy to satisfy)
2. Instrument exogeneity:
$Cov(Z_i, u_i) = 0$ (Usually hard to satisfy)

Instruments allow us to break $X$ into two parts, only one of which is
correlated with $u$, and then use the good (uncorrelated) part alone
to estimate (2).


Instrumental Variables
So how does IV work?
Assume that the relation between the endogenous variable $X_i$ and the
instrument $Z_i$ is described by the following linear model:

$$X_i = \pi_0 + \pi_1 Z_i + v_i$$

where, if $Z_i$ is a valid instrument, $(\pi_0 + \pi_1 Z_i)$ is uncorrelated with
the error term $u_i$ (but $Cov(v_i, u_i) \neq 0$).
2SLS³ estimates the parameter $\beta_1$ in

$$Y_i = \beta_0 + \beta_1 X_i + u_i$$

using only the component of $X_i$ that is uncorrelated with the error.

³ Although this discussion concerns the univariate case with one instrument, the
general case is a simple extension.

Instrumental Variables
This procedure is called 2SLS because it involves two steps:
1. Estimate $(\pi_0 + \pi_1 Z_i)$ by regressing $X_i$ on $Z_i$, using OLS. The
predicted value will then be

$$\hat X_i = \hat E(X_i \mid Z_i) = \hat\pi_0 + \hat\pi_1 Z_i$$

2. Regress $Y_i$ on $\hat X_i$, again using OLS, to get $\hat\beta_0^{2SLS}$
and $\hat\beta_1^{2SLS}$.

In practice, the two steps are performed jointly, which also computes
the correct standard errors.
The formula for this 2SLS estimator is given by

$$\hat\beta_1^{2SLS} = \frac{\sum_{i=1}^{n} (Z_i - \bar Z)(Y_i - \bar Y)}{\sum_{i=1}^{n} (Z_i - \bar Z)(X_i - \bar X)} = \frac{\widehat{Cov}(Z_i, Y_i)}{\widehat{Cov}(Z_i, X_i)}$$
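
The following simulation sketches both routes to the estimator under an assumed design (a true slope of 2, an instrument `z` that is independent of the error, and a regressor whose first-stage error `v` is correlated with `u`; all of these are choices made for illustration). The two-step slope and the covariance ratio coincide, while plain OLS is pulled away from 2.

```python
import random

def mean(a):
    return sum(a) / len(a)

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    abar, bbar = mean(a), mean(b)
    return sum((ai - abar) * (bi - bbar) for ai, bi in zip(a, b)) / (len(a) - 1)

rng = random.Random(3)
n = 20000
z = [rng.gauss(0, 1) for _ in range(n)]             # instrument
v = [rng.gauss(0, 1) for _ in range(n)]             # first-stage error
u = [0.8 * vi + rng.gauss(0, 1) for vi in v]        # structural error, correlated with v
x = [zi + vi for zi, vi in zip(z, v)]               # endogenous regressor
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]   # hypothetical true slope = 2

beta_ols = cov(x, y) / cov(x, x)                    # inconsistent because Cov(X, u) != 0
beta_iv = cov(z, y) / cov(z, x)                     # ratio-of-covariances formula

# Step 1: regress X on Z and form predicted values
pi1 = cov(z, x) / cov(z, z)
pi0 = mean(x) - pi1 * mean(z)
x_hat = [pi0 + pi1 * zi for zi in z]
# Step 2: regress Y on the predicted values
beta_two_step = cov(x_hat, y) / cov(x_hat, x_hat)

print(round(beta_ols, 2), round(beta_iv, 2), round(beta_two_step, 2))
```

The standard errors from a manual second step would be wrong, which is why the slides recommend letting the software perform both steps jointly.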


Inference in 2SLS

## Given the 2SLS assumptions, the 2SLS estimator is consistent and asymptotically normal (CAN).

Specifically, in large samples, $\hat\beta_1^{2SLS}$ has a sampling distribution that is
approximately $N\left(\beta_1, \sigma^2_{\hat\beta_1^{2SLS}}\right)$ where

$$\sigma^2_{\hat\beta_1^{2SLS}} = \frac{1}{n}\,\frac{var\left[(Z_i - \mu_Z)\,u_i\right]}{\left[Cov(Z_i, X_i)\right]^2}$$

which can be estimated by estimating the variance and covariance
terms.
Statistical inference is again straightforward (provided that you use
standard errors that take the two stage procedure into account).


## Multiple Regression 2SLS

So what if you have more than one endogenous variable?
For the general case, the equation of interest is

$$Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + \beta_{k+1} W_{1i} + \dots + \beta_{k+r} W_{ri} + u_i$$

The 2SLS estimator is still computed in two stages:
1. Regress each $X_{ji}$ on the instruments ($Z_{1i}, \dots, Z_{mi}$) and the included
exogenous regressors ($W_{1i}, \dots, W_{ri}$) using OLS. Compute the predicted
values $\hat X_{1i}, \dots, \hat X_{ki}$ from these $k$ regressions.
2. Regress $Y_i$ on the predicted values $\hat X_{1i}, \dots, \hat X_{ki}$ and the included
exogenous regressors ($W_{1i}, \dots, W_{ri}$) using OLS.

In practice, the two steps are done jointly, in order to compute the
correct standard errors.

## So how many instruments do we need?

We need at least as many instruments as endogenous regressors
(otherwise we can't estimate the parameters).
If m = k the equation is exactly identified.
If m > k the equation is over-identified.
If m < k the equation is under-identified.

## If the equation is under-identified, IV/2SLS cannot be used!

However, if we are over-identified, we can test instrument exogeneity.


## Valid instruments must be both exogenous and relevant.

With relevance the issue is not just whether the instrument is
relevant, but how relevant.
The degree of relevance is called strength.
An instrument $Z$ is weak if $Cov(Z_i, X_i) \approx 0$.
Instrument weakness is a problem since

$$\hat\beta_1^{2SLS} = \frac{\widehat{Cov}(Z_i, Y_i)}{\widehat{Cov}(Z_i, X_i)}$$

so if $\widehat{Cov}(Z_i, X_i) \approx 0$ then $\hat\beta_1^{2SLS}$ explodes (and so will $\sigma^2_{\hat\beta_1^{2SLS}}$).


## You can check for weakness (with a single endogenous regressor) by

using the following rule of thumb test
If the F -statistic testing the null hypothesis that the coefficients on the
instruments are all zero in the first stage regression is less than 10, you
have weak instruments.

## If your instruments are weak, you should seriously consider using a

different technique (or getting better instruments).
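
The rule of thumb can be tried out on simulated first stages. The sketch below makes several assumptions: one instrument, homoskedasticity-only standard errors (a robust version would differ), and made-up first-stage slopes of 1.0 for a strong instrument and 0.02 for a weak one. With a single instrument the first-stage F-statistic equals the squared t-statistic on that instrument.

```python
import random

def first_stage_f(z, x):
    """Homoskedasticity-only F-statistic for H0: pi_1 = 0 in X = pi_0 + pi_1 Z + v."""
    n = len(z)
    zbar, xbar = sum(z) / n, sum(x) / n
    szz = sum((zi - zbar) ** 2 for zi in z)
    pi1 = sum((zi - zbar) * (xi - xbar) for zi, xi in zip(z, x)) / szz
    pi0 = xbar - pi1 * zbar
    ssr = sum((xi - pi0 - pi1 * zi) ** 2 for zi, xi in zip(z, x))
    se_pi1 = (ssr / (n - 2) / szz) ** 0.5
    return (pi1 / se_pi1) ** 2         # one restriction, so F = t squared

rng = random.Random(4)
n = 1000
z = [rng.gauss(0, 1) for _ in range(n)]
x_strong = [1.0 * zi + rng.gauss(0, 1) for zi in z]    # hypothetical strong first stage
x_weak = [0.02 * zi + rng.gauss(0, 1) for zi in z]     # hypothetical weak first stage

f_strong = first_stage_f(z, x_strong)
f_weak = first_stage_f(z, x_weak)
print(round(f_strong, 1), round(f_weak, 1))
```

Comparing each value with the threshold of 10 applies the rule stated above.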


Instrument Exogeneity

## If the instruments are not exogenous ($Cov(Z_i, u_i) \neq 0$), then the $\hat X$s will be correlated with $u$ and 2SLS will be inconsistent.

This defeats the purpose of 2SLS since it can't isolate the good part of $X$.
If our system is over-identified, we can test for instrument exogeneity
by using the test of over-identifying restrictions (OIR), which works as
follows.
Exogeneity means $u$ is uncorrelated with the $Z$s.
We don't observe $u_i$ but we can estimate it with the 2SLS coefficients

$$\hat u_i^{2SLS} = Y_i - \hat\beta_0^{2SLS} - \hat\beta_1^{2SLS} X_{1i} - \dots - \hat\beta_{k+r}^{2SLS} W_{ri}$$


Instrument Exogeneity
If we use OLS to estimate the regression coefficients in

$$\hat u_i^{2SLS} = \delta_0 + \delta_1 Z_{1i} + \dots + \delta_m Z_{mi} + \delta_{m+1} W_{1i} + \dots + \delta_{m+r} W_{ri} + e_i$$

we can then use the F-statistic testing the null hypothesis

$$H_0: \delta_1 = \dots = \delta_m = 0$$

to construct the OIR test statistic

$$J = mF \xrightarrow{d} \chi^2_{m-k}$$

where $m$ is the number of instruments and $k$ is the number of
endogenous variables.
Since the null hypothesis of this test is that $u$ is uncorrelated with the
$Z$s, rejecting the null implies that the instruments are not exogenous.