The Big Problems File

Duke University, practice problems for Introduction to Econometrics
October 26, 2009

1 Properties of Expectations, Variances and Covariances, Distributions
1. (5 points) You know that income per head in Italy (denoted by $y$), expressed in Euros, is normally distributed, that is,
$$y \sim N\left(\mu_y, \sigma_y^2\right)$$
You also know that the mean $\mu_y$ is Euros 16,000, and the standard deviation of $y$ is 2,000. What is the distribution of income per head in Italy, in US dollars, if the exchange rate is 1.10 US dollars per one Euro?
Solution: Denote income per head in US dollars as $\tilde{y}$. Then $\tilde{y} = 1.10y$. Given that $y$ is normally distributed and $\tilde{y}$ is just a linear transformation of $y$, $\tilde{y}$ also follows a normal distribution, with mean $\tilde{\mu}_y$:
$$\tilde{\mu}_y = 1.10\mu_y = 1.10 \times 16{,}000 = 17{,}600$$
and variance $\tilde{\sigma}_y^2$:
$$\tilde{\sigma}_y^2 = Var(1.10y) = 1.10^2\, Var(y) = (1.10 \times SD(y))^2 = (1.10 \times 2{,}000)^2 = 4{,}840{,}000$$
Thus:
$$\tilde{y} \sim N(17{,}600,\ 4{,}840{,}000)$$
Here, notice that, by convention, the second argument of $N(\cdot,\cdot)$ is the variance, and not the standard deviation!
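The rescaling rule used in this solution can be checked numerically. The following is a minimal sketch; the helper name `scale_normal` is an illustrative assumption and not part of the original problem set.

```python
import math

def scale_normal(mean, sd, factor):
    """Return (mean, variance) of factor * Y when Y ~ N(mean, sd**2)."""
    return factor * mean, (factor * sd) ** 2

# Income in Euros: mean 16,000 and standard deviation 2,000; 1.10 dollars per Euro.
usd_mean, usd_var = scale_normal(16_000, 2_000, 1.10)
print(usd_mean, usd_var)
```

The second returned value is a variance, matching the convention for the second argument of the normal distribution noted above.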
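2. (6 points) Let $Y$ have a uniform distribution over the interval $(0, \theta)$. Show that $2\bar{Y}$ (twice the sample mean) is an unbiased estimator for $\theta$. (Recall that the pdf of a uniform distribution over the interval $(a, b)$ is given by $f(x) = \frac{1}{b-a}$.)
Answer: Here $Y_i \sim U[0, \theta]$. The pdf of this uniform distribution is $f(y) = \frac{1}{\theta}$. Therefore,
$$E(Y_i) = \int_{-\infty}^{\infty} y f(y)\,dy = \int_0^{\theta} y \frac{1}{\theta}\,dy = \left.\frac{y^2}{2\theta}\right|_0^{\theta} = \frac{\theta^2}{2\theta} = \frac{\theta}{2}$$
Now,
$$E\left(2\bar{Y}\right) = 2E\left(\bar{Y}\right) = 2E\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) = 2\,\frac{1}{n}\sum_{i=1}^{n} E(Y_i) = 2\,\frac{1}{n}\sum_{i=1}^{n}\frac{\theta}{2} = 2\,\frac{\theta}{2} = \theta$$
Since $E\left(2\bar{Y}\right) = \theta$, $2\bar{Y}$ is an unbiased estimator of $\theta$. Note: Several people showed that $E(2Y) = \theta$, which is also true, but not what the question was asking, so you only got partial credit for that.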
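As an informal check of the unbiasedness result, one can simulate many samples and average the estimator. This is only a sketch: the values of theta, the sample size, the number of replications and the seed are arbitrary assumptions, and a simulation illustrates the result rather than proving it.

```python
import random

def mean_of_doubled_sample_means(theta, n, replications, seed=0):
    """Average of 2 * Ybar over many simulated samples of size n from U(0, theta)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(replications):
        sample_mean = sum(rng.uniform(0.0, theta) for _ in range(n)) / n
        total += 2.0 * sample_mean
    return total / replications

# Hypothetical settings: theta = 10, samples of size 5.
estimate = mean_of_doubled_sample_means(theta=10.0, n=5, replications=20_000)
print(estimate)  # should be close to theta
```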
3. (7 points total) Suppose that a mutual fund is investing in three different asset categories. Each asset category includes many different stocks or bonds. Let the variable $X$ represent the asset category, and let $Y$ indicate the one-year expected (predicted) percentage return for a particular asset (one particular bond, or one particular stock). The following table shows the asset allocation of the fund, together with the one-year percentage expected return for each asset category. The expected returns have been calculated using some forecasting model, but how the forecast has been done has no relevance in this problem.

Asset Category (X)             Proportion of assets invested    One-year expected return for
                               in asset category X              assets in asset category X
X = 1, Domestic Stock          0.30                             0.10
X = 2, International Stock     0.20                             0.15
X = 3, Bonds                   0.50                             0.00

(a) (5 points) Calculate the expected return of a dollar invested in the mutual fund. Which property of expectations is useful to solve this problem? Explain.
Answer: $0.10(0.30) + 0.15(0.20) + 0.00(0.50) = 0.06$. The useful property is the Law of Iterated Expectations (L.I.E.). Writing DS, IS and B for domestic stock, international stock and bonds:
$$E(Return) = P(DS)\,E(Return \mid DS) + P(IS)\,E(Return \mid IS) + P(B)\,E(Return \mid B)$$
(b) (2 points) Calculate the predicted one-year percentage return of the fraction of your investment invested in stocks.
Answer: Using the L.I.E., with S denoting stocks:
$$E(Return \mid S) = P(DS \mid S)\,E(Return \mid DS) + P(IS \mid S)\,E(Return \mid IS) = \frac{0.3}{0.5}\,0.1 + \frac{0.2}{0.5}\,0.15 = 0.12$$
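The two L.I.E. calculations are the same weighted average taken over different sets of asset categories. A minimal sketch follows; the dictionary keys and the function name are illustrative assumptions.

```python
import math

# Portfolio shares and expected returns by asset category, taken from the table.
shares = {"domestic_stock": 0.30, "international_stock": 0.20, "bonds": 0.50}
returns = {"domestic_stock": 0.10, "international_stock": 0.15, "bonds": 0.00}

def expected_return(categories):
    """E(Return | asset is in `categories`) via the law of iterated expectations."""
    total_share = sum(shares[c] for c in categories)
    return sum(shares[c] / total_share * returns[c] for c in categories)

fund_return = expected_return(list(shares))  # part (a): all three categories
stock_return = expected_return(["domestic_stock", "international_stock"])  # part (b)
print(fund_return, stock_return)
```

Conditioning on stocks only rescales the two stock shares so that they sum to one, which is where the factors 0.3/0.5 and 0.2/0.5 come from.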
4. You decide to analyze whether or not the presidential candidate for a certain party did better if his party controlled the House. You have data for 34 presidential elections. Think of these data as the population which you want to describe, rather than a sample from which you want to infer behavior of a larger population. You generate the following table:

Joint Distribution of Presidential Party Affiliation and Party Control of House of Representatives, 1860-1996

                                 Dem. control House (Y = 0)    Rep. control House (Y = 1)
Democratic President (X = 0)     0.412                         0.030
Republican President (X = 1)     0.176                         0.382

(a) Compute $E[X]$.
Answer: $E[X] = P(X = 1) = 0.176 + 0.382 = 0.558$.
(b) Compute $E[X \mid Y = 1]$.
Answer:
$$E[X \mid Y = 1] = 1 \cdot \frac{P(X=1, Y=1)}{P(Y=1)} + 0 \cdot \frac{P(X=0, Y=1)}{P(Y=1)} = \frac{0.382}{0.382 + 0.03} = 0.92718$$
(c) If you picked one of the Republican presidents at random, what is the probability that during his term the Democrats had control of the House?
$$P(Y = 0 \mid X = 1) = \frac{0.176}{0.558} = 0.315$$
(d) Are $X$ and $Y$ independent? Justify your answer.
Answer: Certainly not independent. Clearly the two variables are interrelated. Formally, it is sufficient to notice that $E[X] \neq E[X \mid Y = 1]$.
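All four answers can be read off the joint table mechanically. The sketch below assumes the table is stored as a dictionary keyed by (x, y); the variable and function names are illustrative.

```python
import math

# Joint probabilities P(X = x, Y = y): X = 1 is a Republican president,
# Y = 1 is Republican control of the House.
joint = {(0, 0): 0.412, (0, 1): 0.030, (1, 0): 0.176, (1, 1): 0.382}

def marginal_x(x):
    return sum(p for (xi, _), p in joint.items() if xi == x)

def marginal_y(y):
    return sum(p for (_, yi), p in joint.items() if yi == y)

mean_x = marginal_x(1)                               # E[X] = P(X = 1)
mean_x_given_y1 = joint[(1, 1)] / marginal_y(1)      # E[X | Y = 1]
prob_y0_given_x1 = joint[(1, 0)] / marginal_x(1)     # P(Y = 0 | X = 1)

# Independence requires every joint cell to equal the product of its marginals.
independent = all(
    math.isclose(p, marginal_x(x) * marginal_y(y), abs_tol=1e-9)
    for (x, y), p in joint.items()
)
print(mean_x, mean_x_given_y1, prob_y0_given_x1, independent)
```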
5. Let $Y$ be a binary random variable, that is, a random variable that only takes two values, 0 and 1. $Y$ represents unemployment status: $Y = 1$ if you are unemployed at age 30, and $Y = 0$ otherwise. Let $X$ represent years of schooling. You know that the conditional probability of being unemployed at age 30 given years of schooling is described by the following relation:
$$P(Y = 1 \mid X) = \frac{e^{-1-(0.1)X}}{1 + e^{-1-(0.1)X}}$$
(a) Prove that $E(Y \mid X) = P(Y = 1 \mid X)$.
Answer: Remember that $Y$ is either equal to 1 or to zero. So:
$$E(Y \mid X) = 1 \cdot P(Y = 1 \mid X) + 0 \cdot P(Y = 0 \mid X) = P(Y = 1 \mid X)$$
This is obvious once you realize that this conditional random variable has a Bernoulli distribution, with a probability of a one that depends on $X$.
(b) Calculate $Var(Y \mid X)$.
Answer: $Y$ given $X$ has a Bernoulli distribution with probability of a one equal to $\frac{e^{-1-(0.1)X}}{1+e^{-1-(0.1)X}}$, so
$$Var(Y \mid X) = \frac{e^{-1-(0.1)X}}{1 + e^{-1-(0.1)X}}\left(1 - \frac{e^{-1-(0.1)X}}{1 + e^{-1-(0.1)X}}\right) = \frac{e^{-1-(0.1)X}}{\left(1 + e^{-1-(0.1)X}\right)^2}$$
(c) Calculate $E(Y \mid X = 0)$.
$$E(Y \mid X = 0) = P(Y = 1 \mid X = 0) = \frac{e^{-1}}{1 + e^{-1}} = 0.26894$$
(d) Calculate $P(Y = 1 \mid X = 20)$.
$$P(Y = 1 \mid X = 20) = \frac{e^{-1-(0.1)20}}{1 + e^{-1-(0.1)20}} = 0.047426$$
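The numerical answers in parts (c) and (d) follow directly from the logistic formula. A short sketch, with function names chosen here for illustration:

```python
import math

def prob_unemployed(schooling_years):
    """P(Y = 1 | X) = exp(-1 - 0.1 X) / (1 + exp(-1 - 0.1 X))."""
    z = math.exp(-1 - 0.1 * schooling_years)
    return z / (1 + z)

def conditional_variance(schooling_years):
    """Var(Y | X) = p (1 - p) for a Bernoulli variable with success probability p."""
    p = prob_unemployed(schooling_years)
    return p * (1 - p)

print(prob_unemployed(0), prob_unemployed(20), conditional_variance(0))
```

Because the exponent decreases in X, the implied unemployment probability falls as years of schooling rise.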
6. Remember that
$$\sum_{i=1}^{n} X_i = X_1 + X_2 + \dots + X_{n-1} + X_n$$
(a) Prove that if $X_i$ is constant, so that $X_i = a$ for all $i$, then $\sum_{i=1}^{3} X_i = 3a$.
$$\sum_{i=1}^{3} X_i = X_1 + X_2 + X_3 = a + a + a = 3a$$
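(b) Prove that
$$\sum_{i=1}^{3}\left(\frac{X_i}{\sum_{j=1}^{2} Y_j}\right) = \frac{\sum_{i=1}^{3} X_i}{\sum_{j=1}^{2} Y_j}$$
$$\sum_{i=1}^{3}\left(\frac{X_i}{\sum_{j=1}^{2} Y_j}\right) = \frac{X_1}{\sum_{j=1}^{2} Y_j} + \frac{X_2}{\sum_{j=1}^{2} Y_j} + \frac{X_3}{\sum_{j=1}^{2} Y_j} = \frac{X_1 + X_2 + X_3}{\sum_{j=1}^{2} Y_j} = \frac{\sum_{i=1}^{3} X_i}{\sum_{j=1}^{2} Y_j}$$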
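(c) Verify that
$$\sum_{i=1}^{3}\left(\sum_{j=1}^{2} X_i Z_j\right) = \left(\sum_{j=1}^{2} Z_j\right)\left(\sum_{i=1}^{3} X_i\right)$$
$$\sum_{i=1}^{3}\left(\sum_{j=1}^{2} X_i Z_j\right) = \sum_{i=1}^{3} X_i\left(\sum_{j=1}^{2} Z_j\right) = \left(\sum_{j=1}^{2} Z_j\right)\sum_{i=1}^{3} X_i$$
The last step follows because the summation over $j$ does NOT depend on the index $i$.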
(d) Verify that
$$\sum_{i=1}^{3}(a + bX_i) = 3a + b\sum_{i=1}^{3} X_i$$
$$\sum_{i=1}^{3}(a + bX_i) = \sum_{i=1}^{3} a + \sum_{i=1}^{3} bX_i = 3a + b\sum_{i=1}^{3} X_i$$
7. You have $n$ observations, and you calculate the sample mean, which, by definition, is
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
Calculate $\sum_{i=1}^{n}\left(X_i - \bar{X}\right)$ (show your steps!!).
$$\sum_{i=1}^{n}\left(X_i - \bar{X}\right) = \sum_{i=1}^{n} X_i - \sum_{i=1}^{n}\bar{X} = n\bar{X} - n\bar{X} = 0$$
Obviously, the mean deviation from the mean... is zero!
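8. Prove that $\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2$. And once again don't forget to show your steps!!!
$$\begin{aligned}
\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 &= \sum_{i=1}^{n}\left(X_i^2 - 2X_i\bar{X} + \bar{X}^2\right) \\
&= \sum_{i=1}^{n} X_i^2 - \sum_{i=1}^{n} 2X_i\bar{X} + \sum_{i=1}^{n}\bar{X}^2 \\
&= \sum_{i=1}^{n} X_i^2 - 2\bar{X}\sum_{i=1}^{n} X_i + n\bar{X}^2 \\
&= \sum_{i=1}^{n} X_i^2 - 2\bar{X}\,n\bar{X} + n\bar{X}^2 \\
&= \sum_{i=1}^{n} X_i^2 - n\bar{X}^2
\end{aligned}$$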
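Both summation identities (deviations from the mean sum to zero, and the shortcut formula for the sum of squared deviations) are easy to confirm on a small data set. The numbers below are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
import math

def check_deviation_identities(xs):
    """Return (sum of deviations, sum of squared deviations, shortcut formula)."""
    n = len(xs)
    mean = sum(xs) / n
    sum_dev = sum(x - mean for x in xs)
    sum_sq_dev = sum((x - mean) ** 2 for x in xs)
    shortcut = sum(x * x for x in xs) - n * mean ** 2
    return sum_dev, sum_sq_dev, shortcut

# Hypothetical data with mean 5.
sum_dev, sum_sq_dev, shortcut = check_deviation_identities([2.0, 4.0, 4.0, 5.0, 10.0])
print(sum_dev, sum_sq_dev, shortcut)
```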
9. (16 points) A supermarket has two express lines. Let $X$ and $Y$ denote the number of customers in the first and in the second, respectively, at any given time. During nonrush hours, the joint pdf of $X$ and $Y$ is summarized by the following table of joint probabilities:

          X = 0    X = 1    X = 2    X = 3
Y = 0     0.1      0.2      0        0
Y = 1     0.2      0.25     0.05     0
Y = 2     0        0.05     0.05     0.025
Y = 3     0        0        0.025    0.05

(a) (4 points) What are the expectations of $X$ and $Y$?
Solution: To answer this you need the marginal probabilities $P(Y = y)$ and $P(X = x)$, which can be calculated by summing up the rows and columns in the joint distribution matrix:

          X = 0    X = 1    X = 2    X = 3    P(Y = y)
Y = 0     0.1      0.2      0        0        0.3
Y = 1     0.2      0.25     0.05     0        0.5
Y = 2     0        0.05     0.05     0.025    0.125
Y = 3     0        0        0.025    0.05     0.075
P(X = x)  0.3      0.5      0.125    0.075

which can then be used to calculate $E(X) = 0.3 \cdot 0 + 0.5 \cdot 1 + 0.125 \cdot 2 + 0.075 \cdot 3 = 0.975$. Since the matrix is symmetric, the expectation of $Y$ is also 0.975.
(b) (4 points) What is $E(Y \mid X = 3)$?
Solution: First you need to calculate $P(Y = y \mid X = 3)$ for all values of $y$:
$$P(Y = 0 \mid X = 3) = \frac{0}{0.075} = 0$$
$$P(Y = 1 \mid X = 3) = \frac{0}{0.075} = 0$$
$$P(Y = 2 \mid X = 3) = \frac{0.025}{0.075} = \frac{1}{3}$$
$$P(Y = 3 \mid X = 3) = \frac{0.05}{0.075} = \frac{2}{3}$$
so that you can then calculate $E(Y \mid X = 3) = 0 \cdot 0 + 0 \cdot 1 + \frac{1}{3} \cdot 2 + \frac{2}{3} \cdot 3 = 2.667$.
(c) (4 points) Are $X$ and $Y$ independent? Explain.
Solution: If $X$ and $Y$ are independent, then
$$P(X = x, Y = y) = P(X = x)\,P(Y = y)$$
This should hold for all values of $X$ and $Y$, but we can see, for example, that
$$P(X = 0, Y = 0) = 0.1 \neq P(X = 0)\,P(Y = 0) = 0.09$$
so they are not independent. You could have picked many other examples by comparing with what the joint pdf would be if they were independent (you did not need to reproduce this entire matrix!):

          X = 0    X = 1    X = 2       X = 3       P(Y = y)
Y = 0     0.09     0.15     0.0375      0.0225      0.3
Y = 1     0.15     0.25     0.0625      0.0375      0.5
Y = 2     0.0375   0.0625   0.015625    0.009375    0.125
Y = 3     0.0225   0.0375   0.009375    0.005625    0.075
P(X = x)  0.3      0.5      0.125       0.075
(d) (4 points) Find $P(|X - Y| = 1)$, the probability that $X$ and $Y$ differ by exactly 1.
Solution: By definition,
$$\begin{aligned}
P(|X - Y| = 1) &= \sum_{|x - y| = 1} f_{X,Y}(x, y) \\
&= f_{X,Y}(0,1) + f_{X,Y}(1,0) + f_{X,Y}(1,2) + f_{X,Y}(2,1) + f_{X,Y}(2,3) + f_{X,Y}(3,2) \\
&= 0.2 + 0.2 + 0.05 + 0.05 + 0.025 + 0.025 = 0.55
\end{aligned}$$
(You could also have calculated it as one minus the sum of all the joint probabilities for which $|X - Y| \neq 1$.)
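Parts (a) through (d) of the supermarket problem can all be computed from the joint table in a few lines. The sketch below stores the table as a nested list indexed as `joint[y][x]`; that layout and the variable names are assumptions made for illustration.

```python
import math

# joint[y][x] = P(X = x, Y = y), rows and columns as in the problem's table.
joint = [
    [0.1, 0.2, 0.0, 0.0],
    [0.2, 0.25, 0.05, 0.0],
    [0.0, 0.05, 0.05, 0.025],
    [0.0, 0.0, 0.025, 0.05],
]
values = range(4)

p_x = [sum(joint[y][x] for y in values) for x in values]  # column sums
p_y = [sum(joint[y][x] for x in values) for y in values]  # row sums

mean_x = sum(x * p_x[x] for x in values)
mean_y = sum(y * p_y[y] for y in values)
mean_y_given_x3 = sum(y * joint[y][3] / p_x[3] for y in values)

# Independence requires every cell to equal the product of its marginals.
independent = all(
    math.isclose(joint[y][x], p_x[x] * p_y[y], abs_tol=1e-12)
    for x in values for y in values
)
prob_differ_by_one = sum(
    joint[y][x] for x in values for y in values if abs(x - y) == 1
)
print(mean_x, mean_y, mean_y_given_x3, independent, prob_differ_by_one)
```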
10. (6 points overall) Suppose you are estimating a parameter $\theta$, and that your estimator is consistent and normally distributed (everything would work also for asymptotic normality, but let us keep things simple). We know that if we test the null hypothesis that $\theta = \theta_0$, the significance level $\alpha$ is the probability of rejecting the null hypothesis when the null is actually correct. So, if $\alpha = 0.05$,
$$0.05 = \Pr\left(|t| > 1.96 \mid \theta = \theta_0\right)$$
where
$$t = \frac{\hat{\theta} - \theta_0}{SE\left(\hat{\theta}\right)}$$
and notice that the probability is conditional on the null being true. Another important concept in econometrics is that of power. The power of a test is the probability of (correctly) rejecting the null when the null is actually false! The power of a test depends upon which value of $\theta$ is actually the true one. In fact, by definition,
$$power(\theta^*) = \Pr\left(|t| > 1.96 \mid \theta = \theta^*\right)$$
(a) (3 points) Now suppose that the true value of the parameter of interest, $\theta = \theta^*$, is very far away from the null $\theta_0$. Suppose also that you have a very large sample. Do you think that the power of the test will be big or small? Justify your argument.
Solution: Given a large sample, the estimated value of the parameter will be close to its true value, $\theta^*$. Under some regularity conditions, the variance of $\hat{\theta}$ should be relatively low (since we have a lot of observations to estimate $\theta$ accurately). In this case, the t-statistic, which is $\frac{\hat{\theta} - \theta_0}{SE(\hat{\theta})}$, will be large in absolute value. Therefore, we expect the power of the test to be high as well.
(b) (3 points) You are interested in testing a certain null hypothesis. If you could choose between two tests 1 and 2, and if you knew that for every possible true value of the parameter $power(\theta^*)\,|_{\text{test 1}} > power(\theta^*)\,|_{\text{test 2}}$, which test would you choose, 1 or 2? Justify your answer.
Solution: High power means that we reject a false hypothesis more often, whatever the "truth" is. Thus, we would choose the more powerful test, i.e. test 1.
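To make the notion of power concrete, here is a minimal sketch of the power function of the two-sided test described in this problem. It assumes, as a simplification not stated in the original, that the standard error is a known constant, so that the t-statistic is exactly normal with unit variance; the function names and the numerical values are illustrative.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def power(theta_true, theta_null, std_error, critical=1.96):
    """Pr(|t| > critical) when t = (theta_hat - theta_null) / std_error
    and theta_hat ~ N(theta_true, std_error**2), with std_error treated as known."""
    shift = (theta_true - theta_null) / std_error
    return normal_cdf(-critical - shift) + 1.0 - normal_cdf(critical - shift)

size = power(0.0, 0.0, 1.0)            # true value equals the null: significance level
far_small_se = power(2.0, 0.0, 0.1)    # far from the null, precise estimate
near_large_se = power(0.2, 0.0, 1.0)   # close to the null, noisy estimate
print(size, near_large_se, far_small_se)
```

A large sample shrinks the standard error, which increases the shift term and pushes the power toward one, in line with the answer to part (a).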
2 Estimation and OLS Theory
1. (6 points) Assume that the OLS linearity assumption holds. So $Y_i = \beta_0 + \beta_1 X_i + u_i$, and $E[u_i \mid X_i] = 0$. Here you have to prove your answer. Be as specific as you can. The variance of $Y_i$ is given by
(a) $\beta_0^2 + \beta_1^2\, Var(X_i) + Var(u_i)$.
(b) the variance of $u_i$.
(c) $\beta_1^2\, Var(X_i) + Var(u_i)$.
(d) the variance of the residuals.
Answer: C. Use the fact that $E[u_i \mid X_i] = 0$ implies that $u_i$ and $X_i$ are uncorrelated:
$$\begin{aligned}
Var(Y_i) &= Var(\beta_0 + \beta_1 X_i + u_i) \\
&= Var(\beta_0) + Var(\beta_1 X_i) + Var(u_i) \\
&= 0 + \beta_1^2\, Var(X_i) + Var(u_i)
\end{aligned}$$
2. (6 points) We are using an estimator $\hat{\theta}$ to estimate a certain parameter $\theta_0$, and our sample contains 20,000 observations. It turns out that the true value of the parameter is $\theta_0 = 2$, but our point estimate is $\hat{\theta} = 2.5$. What does this tell us about our estimator $\hat{\theta}$? Justify your answer.
(a) $\hat{\theta}$ has a large variance.
(b) $\hat{\theta}$ is biased.
(c) $\hat{\theta}$ is not consistent.
(d) none of the above.
Answer: D. That the point estimate is different from the true parameter value may indicate a large variance, but not necessarily. The definition of bias is $E(\hat{\theta}) \neq \theta_0$. This is not what is given in the question. Therefore, the statement may indicate bias, but not necessarily. Similarly, the information given in the question does not necessarily mean that $\hat{\theta}$ is inconsistent.
3. (21 points) In this problem, we will consider what happens when we run a reverse regression. Suppose that you have two variables $Y$ and $X$ for which the following relationship holds
$$Y_i = \beta_0 + \beta_1 X_i + u_i \qquad (1)$$
Furthermore, suppose that the first three assumptions of OLS hold for this relationship, so that we can obtain unbiased and consistent estimates of $\beta_0$ and $\beta_1$ from an OLS regression of $Y$ on $X$. However, by solving (1) for $X_i$ we can also obtain
$$X_i = -\frac{\beta_0}{\beta_1} + \frac{1}{\beta_1} Y_i - \frac{u_i}{\beta_1} \qquad (2)$$
which we can simply rewrite as
$$X_i = \alpha_0 + \alpha_1 Y_i + \varepsilon_i \qquad (3)$$
where $\alpha_0 = -\frac{\beta_0}{\beta_1}$, $\alpha_1 = \frac{1}{\beta_1}$, and $\varepsilon_i = -\frac{u_i}{\beta_1}$. Since (1) represents a true population relationship, we know that (3) does too (by construction). We are interested in whether we can use a regression based on (3) (i.e. an OLS regression of $X$ on $Y$) to recover a consistent estimate of $\alpha_1$.
(a) (5 points) First, let's establish a couple of preliminary results. Given that we know (from OLS Assumption 1) that $E(u \mid X) = 0$, prove that $Cov(X, u) = \sigma_{Xu} = 0$.
Solution: We already know (from class or using the LIE) that $E(u \mid X) = 0 \Rightarrow \mu_u = E(u) = 0$. Now let $E(X) = \mu_X$. Starting with the definition of covariance and expanding the product we find that:
$$\begin{aligned}
Cov(X, u) &= E\left[(X - \mu_X)(u - \mu_u)\right] = E\left[(X - \mu_X)\,u\right] \\
&= E(Xu) - E(\mu_X u) = E(Xu) - \mu_X E(u) = E(Xu)
\end{aligned}$$
Now applying the Law of Iterated Expectations we have:
$$E(Xu) = E\left(E(Xu \mid X)\right) = E\left(X\,E(u \mid X)\right) = 0$$
So we conclude that $Cov(X, u) = 0$.
(b) (3 points) Now let $Var(u) = \sigma_u^2$. Using the result from part (a), calculate the covariance between $Y$ and $\varepsilon$ in terms of $\sigma_u^2$.
Solution: Since we know in general that $Cov(a + bX + cY, dY) = bd\,Cov(X, Y) + cd\,Cov(Y, Y)$, we can see that
$$Cov(Y, \varepsilon) = Cov\left(\beta_0 + \beta_1 X + u,\ -\frac{u}{\beta_1}\right) = -Cov(X, u) - \frac{1}{\beta_1}Cov(u, u) = 0 - \frac{\sigma_u^2}{\beta_1} = -\frac{\sigma_u^2}{\beta_1}$$
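(c) (2 points) Now let's consider the proposed reverse regression. Show that the OLS estimator of $\alpha_1$ (from a regression of $X$ on $Y$) can be written as
$$\hat{\alpha}_1 = \alpha_1 + \frac{\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)\varepsilon_i}{\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2}$$
Solution: We know that in the usual regression of $Y$ on $X$ the estimator of the slope $\beta_1$ can be written as
$$\hat{\beta}_1 = \beta_1 + \frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)u_i}{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}$$
Simply switching the roles of $Y$ and $X$ and using (3), we see that $\hat{\alpha}_1$ can be written as
$$\hat{\alpha}_1 = \alpha_1 + \frac{\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)\varepsilon_i}{\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2}$$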
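(d) (6 points) Is $\hat{\alpha}_1$ a consistent estimator for $\alpha_1$? Prove your answer.
Solution: We know that the OLS estimator $\hat{\alpha}_1$ can be written as
$$\hat{\alpha}_1 = \alpha_1 + \frac{\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)\varepsilon_i}{\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2} = \alpha_1 + \frac{\frac{1}{n-1}\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)\varepsilon_i}{\frac{1}{n-1}\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2} = \alpha_1 + \frac{s_{\varepsilon Y}}{s_Y^2}$$
We know that $s_{\varepsilon Y} \xrightarrow{p} \sigma_{\varepsilon Y}$ and $s_Y^2 \xrightarrow{p} \sigma_Y^2$, so that $\hat{\alpha}_1 \xrightarrow{p} \alpha_1 + \frac{\sigma_{\varepsilon Y}}{\sigma_Y^2}$. But since $\sigma_{\varepsilon Y} = -\frac{\sigma_u^2}{\beta_1}$, the second term does not converge to zero, so $\hat{\alpha}_1$ is not a consistent estimator for $\alpha_1$.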
(e) (5 points) Suppose you were able to find a consistent estimator of $\alpha_1$. Let's call this estimator $\tilde{\alpha}_1$. Can you use $\tilde{\alpha}_1$ to construct a consistent estimator of $\beta_1$? If yes, do so. If not, explain why you can't.
Solution: Since $\tilde{\alpha}_1$ is a consistent estimator of $\alpha_1$, we know that $\tilde{\alpha}_1 \xrightarrow{p} \alpha_1 = \frac{1}{\beta_1}$. Therefore, by the continuous mapping theorem, we know that $\frac{1}{\tilde{\alpha}_1} \xrightarrow{p} \frac{1}{\alpha_1} = \beta_1$ (assuming that $\alpha_1 \neq 0$). So $\frac{1}{\tilde{\alpha}_1}$ is a consistent estimator of $\beta_1$.
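The inconsistency of the reverse regression can be illustrated with a small simulation. This is a sketch under assumed values (intercept 1, slope 2, standard normal regressor and error, a fixed seed); none of these numbers come from the problem set. With these choices the probability limit of the reverse-regression slope is 1/2 + (-1/2)/5 = 0.4 rather than 1/2.

```python
import random

def ols_slope(regressor, outcome):
    """Slope from an OLS regression of `outcome` on `regressor` (with intercept)."""
    n = len(regressor)
    mean_r = sum(regressor) / n
    mean_o = sum(outcome) / n
    cov = sum((r - mean_r) * (o - mean_o) for r, o in zip(regressor, outcome))
    var = sum((r - mean_r) ** 2 for r in regressor)
    return cov / var

random.seed(0)
beta0, beta1, n = 1.0, 2.0, 100_000      # hypothetical parameter values
x = [random.gauss(0.0, 1.0) for _ in range(n)]
u = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

forward = ols_slope(x, y)    # regression of Y on X: consistent for beta1
reverse = ols_slope(y, x)    # regression of X on Y: not consistent for 1 / beta1

# Limit implied by part (d): alpha1 + sigma_eY / sigma_Y^2,
# with sigma_eY = -sigma_u^2 / beta1 and sigma_Y^2 = beta1^2 * 1 + 1.
plim_reverse = 1.0 / beta1 + (-1.0 / beta1) / (beta1 ** 2 + 1.0)
print(forward, reverse, plim_reverse)
```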
4. You want to calculate the mean proportion of the total budget that individuals devote to health expenditure in the US. You have an iid sample of $n$ individuals from a large household survey, where total expenditure ($Y_i$) and health expenditure ($H_i$) are recorded for each individual. Let $W_i$ be the budget share for health, that is,
$$W_i = \frac{H_i}{Y_i}$$
So, you would like to estimate $E(W_i) = \mu_W$.
(a) (8 points) Suggest a consistent estimator for $E(W_i)$. Write down the proposed estimator, and explain why it is consistent (this is very simple, so please do not look for difficult or long answers).
Solution: We estimate expectations with their respective sample equivalent, that is, the sample mean. If I want to estimate the expectation of $W$, my estimator will be
$$\hat{\mu}_W = \frac{1}{n}\sum_{i=1}^{n} W_i$$
This is consistent by the usual LLN.
(b) (8 points) A researcher suggests that one may estimate $\mu_W$ by calculating the ratio of total health expenditure for all individuals in the sample over total expenditure for all individuals in the sample. That is, the suggested estimator is
$$\hat{\mu}_W = \frac{\sum_{i=1}^{n} H_i}{\sum_{i=1}^{n} Y_i}$$
Is $\hat{\mu}_W$ a consistent estimator for $\mu_W$? Prove your answer.
Solution: No, it is not consistent, because it will converge to the ratio of the two expectations, and this is different from the expectation of the ratio!!
$$\frac{\sum_{i=1}^{n} H_i}{\sum_{i=1}^{n} Y_i} = \frac{\frac{1}{n}\sum_{i=1}^{n} H_i}{\frac{1}{n}\sum_{i=1}^{n} Y_i} \xrightarrow{p} \frac{E[H]}{E[Y]} \neq E\left[\frac{H}{Y}\right]$$
In practice, you can actually prove that this will converge to a weighted average of budget shares, with more weight given to richer households!
$$\frac{\sum_{i=1}^{n} H_i}{\sum_{i=1}^{n} Y_i} = \frac{\sum_{i=1}^{n} \frac{H_i}{Y_i} Y_i}{\sum_{i=1}^{n} Y_i} = \frac{\sum_{i=1}^{n} Y_i\left(\frac{H_i}{Y_i}\right)}{\sum_{i=1}^{n} Y_i} = \sum_{i=1}^{n}\left(\frac{Y_i}{\sum_{j=1}^{n} Y_j}\right)\left(\frac{H_i}{Y_i}\right)$$
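The difference between the mean of the ratios and the ratio of the totals shows up even in a tiny data set. The three households below are hypothetical numbers invented for illustration; they are not taken from any survey.

```python
import math

# Hypothetical households: (total expenditure Y_i, health expenditure H_i).
households = [(10_000.0, 2_000.0), (50_000.0, 5_000.0), (200_000.0, 10_000.0)]

shares = [h / y for y, h in households]
mean_of_shares = sum(shares) / len(shares)      # sample analogue of E(H / Y)

total_y = sum(y for y, _ in households)
ratio_of_totals = sum(h for _, h in households) / total_y

# The ratio of totals equals an expenditure-weighted average of the shares.
weighted_shares = sum((y / total_y) * (h / y) for y, h in households)
print(mean_of_shares, ratio_of_totals, weighted_shares)
```

In this example richer households have lower budget shares, so weighting by expenditure pulls the ratio of totals below the simple mean of shares.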
5. (12 points total) You want to calculate the average share of income that people donate to charity in the U.S. You have an iid sample of $n$ individuals from a large government survey, where total charitable donations ($D_i$) and total income ($Y_i$) are recorded for each individual. Let $C_i$ be the share of income donated to charity:
$$C_i = \frac{D_i}{Y_i}$$
You would like to estimate $E(C_i) = \mu_C$.
(a) (5 points) Suggest a consistent estimator for $E(C_i)$. Write down the proposed estimator, and explain why it is consistent.
Solution: The sample mean $\bar{C} = \frac{1}{n}\sum_{i=1}^{n} C_i = \frac{1}{n}\sum_{i=1}^{n}\frac{D_i}{Y_i}$ is a consistent estimator. We know it's consistent by the law of large numbers, which states that the sample mean is a consistent estimator of the true mean (for an iid sample with finite variance).
(b) (7 points) A colleague suggests that you could estimate $\mu_C$ by calculating the ratio of total charitable donations for all individuals in the sample over total income for all individuals in the sample:
$$\hat{\mu}_C = \frac{\sum_{i=1}^{n} D_i}{\sum_{i=1}^{n} Y_i}$$
Is $\hat{\mu}_C$ a consistent estimator for $\mu_C$? Prove your answer.
Solution: This is not a consistent estimator for $\mu_C = E\left[\frac{D}{Y}\right]$. The probability limit of $\hat{\mu}_C$ is $\frac{E(D)}{E(Y)} = \frac{\mu_D}{\mu_Y}$, because
$$\frac{\frac{1}{n}\sum_{i=1}^{n} D_i}{\frac{1}{n}\sum_{i=1}^{n} Y_i} \xrightarrow{p} \frac{\mu_D}{\mu_Y} = \frac{E(D)}{E(Y)}$$
and we know that, in general, $\frac{E(D)}{E(Y)} \neq E\left[\frac{D}{Y}\right] = E(C) = \mu_C$, which is what we want to estimate here.
6. You have $n$ observations, and you calculate the sample mean, which, by definition, is
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
(a) (4 points) Calculate $\sum_{i=1}^{n}\left(X_i - \bar{X}\right)$ (show your steps!!).
Solution:
$$\sum_{i=1}^{n}\left(X_i - \bar{X}\right) = \sum_{i=1}^{n} X_i - \sum_{i=1}^{n}\bar{X} = n\bar{X} - n\bar{X} = 0$$
Obviously, the mean deviation from the mean... is zero!
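(b) (5 points) Prove that $\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2$. And once again don't forget to show your steps!!!
Solution:
$$\begin{aligned}
\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 &= \sum_{i=1}^{n}\left(X_i^2 - 2X_i\bar{X} + \bar{X}^2\right) \\
&= \sum_{i=1}^{n} X_i^2 - \sum_{i=1}^{n} 2X_i\bar{X} + \sum_{i=1}^{n}\bar{X}^2 \\
&= \sum_{i=1}^{n} X_i^2 - 2\bar{X}\sum_{i=1}^{n} X_i + n\bar{X}^2 \\
&= \sum_{i=1}^{n} X_i^2 - 2\bar{X}\,n\bar{X} + n\bar{X}^2 \\
&= \sum_{i=1}^{n} X_i^2 - n\bar{X}^2
\end{aligned}$$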
7. (10 points) This is a simple exercise with an important hidden lesson. Suppose that grade $G$ in your econometrics class depends on the difficulty $D$ of an exam. Your crazy instructor does not like students arriving late in class, because he finds it distracting. So, chances are that if students keep arriving late, he will be subconsciously upset while writing exams, with the consequence that the likelihood of having hard exam questions will increase. Let $L$ denote a binary variable equal to one if some students keep arriving late, and equal to zero otherwise. Let $D$ be a binary variable equal to 1 if the exam is difficult, and equal to zero otherwise. You know that $\Pr(D = 1 \mid L = 1) = .5$, and $\Pr(D = 1 \mid L = 0) = .2$. You also know that conditional on exam difficulty, $G$ and $L$ are independent, that is, once you know if $D$ is equal to one or zero, knowing whether students used to arrive late carries no information whatsoever about $G$. Exams are graded on a 0 to 100 scale, and based on data from previous years you know that $E[G \mid D = 1] = 60$ and $E[G \mid D = 0] = 75$. Calculate $E[G \mid L = 0]$ and $E[G \mid L = 1]$. Based on your results, would you say that arriving late in class is a good or a bad strategy if one wants to maximize the grade? You do not need to turn in answers to the last question, which is just meant to be rhetorical.
SOLUTION: We know that exams can be either difficult or not difficult. So we can write
$$E[G \mid L = 0] = E[G \mid L = 0, D = 1]\,P(D = 1 \mid L = 0) + E[G \mid L = 0, D = 0]\,P(D = 0 \mid L = 0)$$
We know that once you know if $D$ is equal to one or zero, knowing whether students used to arrive late carries no information whatsoever about $G$. So we can be sure that
$$E[G \mid L = 0, D] = E[G \mid D]$$
But then we have all elements to calculate the conditional expectations we are after, because
$$\begin{aligned}
E[G \mid L = 0] &= E[G \mid D = 1]\,P(D = 1 \mid L = 0) + E[G \mid D = 0]\,P(D = 0 \mid L = 0) \\
&= 60(.2) + 75(.8) = 72
\end{aligned}$$
Similarly, we have
$$\begin{aligned}
E[G \mid L = 1] &= E[G \mid D = 1]\,P(D = 1 \mid L = 1) + E[G \mid D = 0]\,P(D = 0 \mid L = 1) \\
&= 60(.5) + 75(.5) = 67.5 < 72
\end{aligned}$$
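The two conditional expectations are the same formula evaluated at different probabilities of a difficult exam. A minimal sketch, with dictionary and function names chosen here for illustration:

```python
import math

grade_given_difficulty = {1: 60.0, 0: 75.0}     # E[G | D = d]
prob_difficult_given_late = {1: 0.5, 0: 0.2}    # Pr(D = 1 | L = l)

def expected_grade(late):
    """E[G | L = late], using the assumption E[G | L, D] = E[G | D]."""
    p_hard = prob_difficult_given_late[late]
    return (grade_given_difficulty[1] * p_hard
            + grade_given_difficulty[0] * (1.0 - p_hard))

print(expected_grade(0), expected_grade(1))
```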
8. You have an i.i.d. sample of $n$ observations drawn from a random variable (RV) $X$. You can assume that all i.i.d. RVs in this problem have finite variance. By definition, the variance of $X$ is $\sigma_X^2 = E\left[(X - \mu_X)^2\right]$. The variance is just an expected value, and we know that expected values can usually be estimated using their sample equivalent, that is, a sample mean. So, if $\mu_X$ is known, one can estimate the variance with
$$\hat{\sigma}_X^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \mu_X)^2 \qquad (1)$$
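(a) Show that $\hat{\sigma}_X^2$ is an unbiased estimator of $\sigma_X^2$, that is, show that $E\left[\hat{\sigma}_X^2\right] = \sigma_X^2$.
$$E\left[\hat{\sigma}_X^2\right] = E\left[\frac{1}{n}\sum_{i=1}^{n}(X_i - \mu_X)^2\right] = \frac{1}{n}\sum_{i=1}^{n} E\left[(X_i - \mu_X)^2\right] = \frac{1}{n}\sum_{i=1}^{n}\sigma_X^2 = \sigma_X^2$$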
(b) Show that $\hat{\sigma}_X^2 \xrightarrow{p} \sigma_X^2$ (that is, $\hat{\sigma}_X^2$ converges in probability to the true variance, so that $\hat{\sigma}_X^2$ consistently estimates the variance).
We know that $(X_i - \mu_X)^2$ is iid, because it is a deterministic function of an iid RV. The problem tells us that all iid variables in this problem have finite variance, so we can use the LLN and conclude that
$$\frac{1}{n}\sum_{i=1}^{n}(X_i - \mu_X)^2 \xrightarrow{p} E\left[(X_i - \mu_X)^2\right] = \sigma_X^2$$
(c) To use (1) above, you need to know $\mu_X$, which in general is not known. So now suppose that you have to estimate $\mu_X$ as well. Of course we will use a sample mean to estimate this parameter too. Then the estimator for the variance becomes a two-step estimator. Let us call it $\tilde{\sigma}_X^2$. In a first step you estimate $\mu_X$ by using $\bar{X}$, then you plug this estimate into (1) to obtain $\tilde{\sigma}_X^2$. So, if the mean is unknown, we have
$$\tilde{\sigma}_X^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$$
We want to see if this estimator is unbiased and consistent for $\sigma_X^2$. We can do this through a sequence of small steps. First, if you add and subtract $\mu_X$ in the expression in parentheses, what you get is
$$\tilde{\sigma}_X^2 = \frac{1}{n}\sum_{i=1}^{n}\left[(X_i - \mu_X) - \left(\bar{X} - \mu_X\right)\right]^2 \qquad (2)$$
(d) Let $\sigma_{\bar{X}}^2$ represent the variance of the sample mean. Prove that $\sigma_{\bar{X}}^2 = E\left[\bar{X}^2\right] - \mu_X^2$.
We know that $\bar{X}$ is a random variable with mean $\mu_X$ and variance $\frac{\sigma_X^2}{n}$. We also know that in general the variance of a RV can be written as the difference between the expected value of its square and the square of its expected value. Hence the result follows.
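(e) Now, prove that $\frac{1}{n}\sum_{i=1}^{n}(X_i - \mu_X)\left(\bar{X} - \mu_X\right) = \bar{X}^2 - 2\mu_X\bar{X} + \mu_X^2$.
$$\begin{aligned}
\frac{1}{n}\sum_{i=1}^{n}(X_i - \mu_X)\left(\bar{X} - \mu_X\right) &= \frac{1}{n}\sum_{i=1}^{n}\left(X_i\bar{X} - X_i\mu_X - \bar{X}\mu_X + \mu_X^2\right) \\
&= \bar{X}\,\frac{1}{n}\sum_{i=1}^{n} X_i - \mu_X\,\frac{1}{n}\sum_{i=1}^{n} X_i - \frac{1}{n}\bar{X}\mu_X\sum_{i=1}^{n} 1 + \frac{1}{n}\sum_{i=1}^{n}\mu_X^2 \\
&= \bar{X}^2 - \mu_X\bar{X} - \bar{X}\mu_X + \mu_X^2 \\
&= \bar{X}^2 - 2\mu_X\bar{X} + \mu_X^2
\end{aligned}$$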
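(f) Show that $E\left[\bar{X}^2 - 2\mu_X\bar{X} + \mu_X^2\right] = \frac{\sigma_X^2}{n}$.
$$E\left[\bar{X}^2 - 2\mu_X\bar{X} + \mu_X^2\right] = E\left[\bar{X}^2\right] - 2\mu_X^2 + \mu_X^2 = E\left[\bar{X}^2\right] - \mu_X^2 = \frac{\sigma_X^2}{n}$$
which is equal to the variance of the sample mean, and then is equal to $\frac{\sigma_X^2}{n}$.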
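(g) Show that $E\left[\frac{1}{n}\sum_{i=1}^{n}\left(\bar{X} - \mu_X\right)^2\right] = \frac{\sigma_X^2}{n}$.
$$E\left[\frac{1}{n}\sum_{i=1}^{n}\left(\bar{X} - \mu_X\right)^2\right] = \frac{1}{n}\sum_{i=1}^{n} E\left[\left(\bar{X} - \mu_X\right)^2\right] = \frac{1}{n}\sum_{i=1}^{n} Var\left(\bar{X}\right) = Var\left(\bar{X}\right) = \frac{\sigma_X^2}{n}$$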
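(h) Expanding the square in equation (2), and using the results in parts (a), (f), and (g), show that $E\left[\tilde{\sigma}_X^2\right] = \frac{n-1}{n}\sigma_X^2$.
$$\begin{aligned}
E\left[\tilde{\sigma}_X^2\right] &= E\left[\frac{1}{n}\sum_{i=1}^{n}\left[(X_i - \mu_X) - \left(\bar{X} - \mu_X\right)\right]^2\right] \\
&= \underbrace{E\left[\frac{1}{n}\sum_{i=1}^{n}(X_i - \mu_X)^2\right]}_{\text{from (a)}} + \underbrace{E\left[\frac{1}{n}\sum_{i=1}^{n}\left(\bar{X} - \mu_X\right)^2\right]}_{\text{from (g)}} - 2\underbrace{E\left[\frac{1}{n}\sum_{i=1}^{n}(X_i - \mu_X)\left(\bar{X} - \mu_X\right)\right]}_{\text{from (f) (and (e))}} \\
&= \sigma_X^2 + \frac{\sigma_X^2}{n} - 2\frac{\sigma_X^2}{n} = \sigma_X^2 - \frac{\sigma_X^2}{n} = \left(\frac{n-1}{n}\right)\sigma_X^2
\end{aligned}$$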
(i) So, $\tilde{\sigma}_X^2$ is a biased estimator of the variance! What happens to the bias when $n \to \infty$?
Our estimator is biased, but the bias goes to zero when $n$ grows large. So, we should expect this estimator to be biased but consistent.
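(j) Given the results above, find an unbiased estimator for the variance $\sigma_X^2$ for the case where $\mu_X$ (as in this case) is not known.
We know that
$$E\left[\tilde{\sigma}_X^2\right] = \left(\frac{n-1}{n}\right)\sigma_X^2$$
so that an unbiased estimator can be obtained simply by multiplying $\tilde{\sigma}_X^2$ by the inverse of $\frac{n-1}{n}$. Let
$$s^2 = \frac{n}{n-1}\tilde{\sigma}_X^2$$
then
$$E\left[s^2\right] = \frac{n}{n-1}E\left[\tilde{\sigma}_X^2\right] = \frac{n}{n-1}\left(\frac{n-1}{n}\right)\sigma_X^2 = \sigma_X^2$$
So, an unbiased estimator of the variance is (see equation 3.7 in Stock & Watson)
$$s^2 = \frac{n}{n-1}\tilde{\sigma}_X^2 = \frac{n}{n-1}\,\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$$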
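The bias factor (n - 1)/n can be verified exactly, without simulation, by enumerating every possible sample from a small discrete distribution. The sketch below assumes a hypothetical X that is uniform on {0, 1, 2} and samples of size 3; exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction
from itertools import product

support = [0, 1, 2]    # hypothetical X, uniform on three values
n = 3

mu = Fraction(sum(support), len(support))
sigma2 = sum((Fraction(x) - mu) ** 2 for x in support) / len(support)

def plug_in_variance(sample):
    """(1/n) * sum of squared deviations from the sample mean."""
    mean = Fraction(sum(sample), len(sample))
    return sum((x - mean) ** 2 for x in sample) / len(sample)

# All 27 iid samples of size 3 are equally likely, so the expectation is a plain average.
samples = list(product(support, repeat=n))
expected_plug_in = sum(plug_in_variance(s) for s in samples) / len(samples)
expected_corrected = Fraction(n, n - 1) * expected_plug_in
print(sigma2, expected_plug_in, expected_corrected)
```

The average of the plug-in estimator falls short of the true variance by exactly the factor (n - 1)/n, and rescaling by n/(n - 1) removes the bias.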
9. (6 points) Suppose you are given the following model:
$$y_i = \alpha + u_i$$
where $\alpha$ is an unknown constant.
(a) (2 points) What is the OLS estimator for $\alpha$? What other estimator is computed in the same way?
Answer: The OLS estimator solves the minimization problem
$$\min_{\alpha}\sum \hat{u}_i^2 = \min_{\alpha}\sum (y_i - \hat{y}_i)^2 = \min_{\alpha}\sum (y_i - \alpha)^2$$
$$\frac{\partial \sum (y_i - \alpha)^2}{\partial \alpha} = 0 \implies \sum (y_i - \hat{\alpha}) = 0 \implies \hat{\alpha} = \frac{1}{n}\sum y_i = \bar{y}$$
which is just the sample mean of $y$. This corresponds to the univariate regression where all the $X_i$'s (and therefore $\bar{X}$) equal zero. So $\beta_1$ (and $\hat{\beta}_1$) equal zero and $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X} = \bar{Y}$.
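(b) (2 points) Compute $\frac{1}{n-1}\sum \hat{u}_i^2$. What other estimator is computed in the same way?
Answer:
$$\frac{1}{n-1}\sum \hat{u}_i^2 = \frac{1}{n-1}\sum (y_i - \hat{y}_i)^2 = \frac{1}{n-1}\sum (y_i - \hat{\alpha})^2 = \frac{1}{n-1}\sum (y_i - \bar{y})^2$$
which is just the sample variance of $y$.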
(c) (2 points) Interpret your findings from (a) and (b).
Answer: As we showed in class, $\bar{Y}$ is the least squares estimator of $\mu_Y$ (it solves $\min_{\mu}\sum (y_i - \mu)^2$), so part (a) should not be surprising. In part (b), you are calculating a measure of the variance that you are not explaining, which in this case is all of the variation around the mean of $Y$ (i.e. the variance of $Y$).
10. You have a sample of $n$ i.i.d. observations from two independent random variables $e_i$ and $u_i$, and you know that these two random variables both have mean zero. You also know that $Var(e_i) = \sigma_e^2 < \infty$ and $Var(u_i) = \sigma_u^2 < \infty$. Let $\bar{e}$ and $\bar{u}$ denote the sample means of the two random variables.
(a) Using the Central Limit Theorem (CLT), prove that $\sqrt{n}\,\bar{e} \xrightarrow{d} N(0, \sigma_e^2)$, that is, prove that the asymptotic distribution of $\sqrt{n}\,\bar{e}$ is a normal with mean zero and variance $\sigma_e^2$.
Solution: From the CLT (and using the fact that we have iid observations with finite variance) we know that
$$\frac{\bar{e} - \mu_e}{\sigma_e/\sqrt{n}} = \frac{\sqrt{n}\,(\bar{e} - \mu_e)}{\sigma_e} \xrightarrow{d} N(0, 1) \implies \sqrt{n}\,(\bar{e} - \mu_e) \xrightarrow{d} N\left(0, \sigma_e^2\right)$$
Then the conclusion simply follows noting that $\mu_e = 0$ by assumption.
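(b) Using again the CLT, determine the asymptotic distribution of $\sqrt{n}\,\bar{u}$.
Solution: This is just the same as in the previous point.
$$\frac{\bar{u} - \mu_u}{\sigma_u/\sqrt{n}} = \frac{\sqrt{n}\,(\bar{u} - 0)}{\sigma_u} \xrightarrow{d} N(0, 1) \implies \sqrt{n}\,\bar{u} \xrightarrow{d} N\left(0, \sigma_u^2\right)$$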
(c) Using the results from the previous two parts, and assuming that the sample size $n$ is very large (but not $\infty$), what is the distribution of $(\bar{e} + \bar{u})$ approximately equal to?
Solution: We know that the sum of two independent normal random variables is normal, and we know that $u$ and $e$ are independent, hence uncorrelated with zero covariance. Then in large samples
$$\sqrt{n}\,(\bar{e} + \bar{u}) \approx N\left(0, \sigma_e^2 + \sigma_u^2\right)$$
But then it will also be the case that
$$(\bar{e} + \bar{u}) \approx N\left(0, \frac{\sigma_e^2 + \sigma_u^2}{n}\right)$$
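11. Let the following definitions hold:
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
$$\hat{\sigma}^2_{XY} = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)$$
(a) Prove that
$$\hat{\sigma}^2_{XY} = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)Y_i = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)X_i$$
Proof:
$$\begin{aligned}
\hat{\sigma}^2_{XY} &= \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right) \\
&= \frac{1}{n}\sum_{i=1}^{n}\left[\left(X_i - \bar{X}\right)Y_i - \left(X_i - \bar{X}\right)\bar{Y}\right] \\
&= \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)Y_i - \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)\bar{Y} \\
&= \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)Y_i - \frac{1}{n}\bar{Y}\underbrace{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)}_{=0} \\
&= \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)Y_i
\end{aligned}$$
Similarly:
$$\begin{aligned}
\hat{\sigma}^2_{XY} &= \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right) \\
&= \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)X_i - \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)\bar{X} \\
&= \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)X_i - \frac{1}{n}\bar{X}\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right) \\
&= \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)X_i
\end{aligned}$$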
12. You know that companies in a certain economic sector produce an item $y$ using labor ($L$) and capital ($K$). However, different companies use slightly different technologies. The technology for the $i$-th firm can be described by the following relation
$$y_i = \alpha L_i^{\beta} K_i^{1-\beta} u_i \qquad (1)$$
where $\alpha$ and $\beta$ are just two constants, and $u_i$ is an error or residual term which represents the fact that different firms use partly different technologies (note that the functional form and the parameters are common to all firms). (Incidentally, you might remember from other ECON courses that (1) is a Cobb-Douglas production function, which here has a random component.) You also know that the following assumption holds:
$$E(u_i \mid L_i, K_i) = 1$$
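(a) Prove that $E(y_i \mid L_i, K_i) = \alpha L_i^{\beta} K_i^{1-\beta}$.
$$\begin{aligned}
E(y_i \mid L_i, K_i) &= E\left[\alpha L_i^{\beta} K_i^{1-\beta} u_i \mid L_i, K_i\right] \\
&= \alpha L_i^{\beta} K_i^{1-\beta}\,E(u_i \mid L_i, K_i) \\
&= \alpha L_i^{\beta} K_i^{1-\beta}
\end{aligned}$$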
(b) Using the information provided, can you conclude that the following equality holds?
$$E(\ln y_i \mid L_i, K_i) = \ln\alpha + \beta\ln L_i + (1 - \beta)\ln K_i$$
No! In fact:
$$\begin{aligned}
E(\ln y_i \mid L_i, K_i) &= E\left[\ln\alpha + \beta\ln L_i + (1 - \beta)\ln K_i + \ln u_i \mid L_i, K_i\right] \\
&= \ln\alpha + \beta\ln L_i + (1 - \beta)\ln K_i + E\left[\ln u_i \mid L_i, K_i\right]
\end{aligned}$$
but even if we know that $E[u_i \mid L_i, K_i] = 1$, we CANNOT conclude that $E[\ln u_i \mid L_i, K_i] = \ln(1) = 0$. The logarithm is not a linear operation, and in general
$$E(g(x)) \neq g(E(x))$$
13. You want to estimate an expected value $\mu_X$, and you want to compare the performance of two estimators $\hat{\theta}_1$ and $\hat{\theta}_2$. You have an iid sample of 3 observations $X_1, X_2, X_3$. The two estimators are defined as follows
$$\hat{\theta}_1 = \frac{1}{3}\sum_{i=1}^{3} X_i$$
$$\hat{\theta}_2 = \frac{1}{6}X_1 + \frac{2}{3}X_2 + \frac{1}{6}X_3$$
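(a) Are $\hat{\theta}_1$ and $\hat{\theta}_2$ unbiased?
Yes, both estimators are unbiased, as
$$E\left[\hat{\theta}_1\right] = E\left[\frac{1}{3}\sum_{i=1}^{3} X_i\right] = \frac{1}{3}\cdot 3E[X] = E[X]$$
$$E\left[\hat{\theta}_2\right] = E\left[\frac{1}{6}X_1 + \frac{2}{3}X_2 + \frac{1}{6}X_3\right] = \left(\frac{1}{6} + \frac{2}{3} + \frac{1}{6}\right)E[X] = E[X]$$
Here we used the fact that the sample draws are identically distributed.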
(b) Which estimator, between $\hat{\theta}_1$ and $\hat{\theta}_2$, is more efficient?
Estimator 1, $\hat{\theta}_1$, is more efficient, as $Var[\hat{\theta}_1] < Var[\hat{\theta}_2]$. To show this, look at the variances, using the i.i.d. property of the draws in our sample:
$$Var\left[\hat{\theta}_1\right] = \frac{1}{3}Var[X], \quad\text{and}\quad Var\left[\hat{\theta}_2\right] = \left(\frac{1}{36} + \frac{4}{9} + \frac{1}{36}\right)Var[X] = \frac{1}{2}Var[X]$$
(c) Which estimator has the smallest MSE?
Estimator 1, $\hat{\theta}_1$, has the smallest MSE, as it is more efficient and both estimators are unbiased.
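For any linear estimator with fixed weights on iid draws, unbiasedness depends on the sum of the weights and the variance depends on the sum of the squared weights. The helper below is an illustrative sketch of that bookkeeping; its name is an assumption.

```python
from fractions import Fraction

def weight_summary(weights):
    """For an estimator sum_i w_i X_i built from iid draws, return
    (sum of weights, sum of squared weights), so that
    E[estimator] = (sum of weights) * E[X] and
    Var[estimator] = (sum of squared weights) * Var[X]."""
    return sum(weights), sum(w * w for w in weights)

theta1 = weight_summary([Fraction(1, 3)] * 3)
theta2 = weight_summary([Fraction(1, 6), Fraction(2, 3), Fraction(1, 6)])
print(theta1, theta2)
```

Both weight vectors sum to one, so both estimators are unbiased, but the equal weights give the smaller sum of squares and hence the smaller variance.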
14. Here we see what can happen when we use assumptions that are not correct. Suppose that you have an iid sample, and you correctly assume that the conditional expectation of $y$ given $x$ is linear, so that $E(u_i \mid X_i) = 0$. However, suppose also that you incorrectly assume that the intercept is zero. So the truth is that the regression is the usual
$$Y_i = \beta_0 + \beta_1 X_i + u_i \qquad (2)$$
with $\beta_0 \neq 0$, but instead you assume that
$$Y_i = \beta_1 X_i + u_i. \qquad (3)$$
If you think that (3) is correct (which it isn't), the OLS estimator for the slope is (make sure you understand why) the solution $\hat{\beta}_1$ of the following problem (you minimize the sum of squared errors)
$$\min_{b_1}\sum_{i=1}^{n}(Y_i - b_1 X_i)^2$$
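(a) Using steps analogous to those we saw in class for the standard OLS estimators, show that now the OLS estimator for the slope is
$$\hat{\beta}_1^{wrong} = \frac{\sum_{i=1}^{n} X_i Y_i}{\sum_{i=1}^{n} X_i^2}$$
Solution:
$$\min_{\beta_1}\sum (Y_i - \beta_1 X_i)^2 = \min_{\beta_1}\sum\left(Y_i^2 - 2\beta_1 Y_i X_i + \beta_1^2 X_i^2\right)$$
The first-order condition is
$$\sum\left(-2Y_i X_i + 2\hat{\beta}_1^{wrong} X_i^2\right) = 0 \implies 2\sum Y_i X_i = 2\hat{\beta}_1^{wrong}\sum X_i^2 \implies \hat{\beta}_1^{wrong} = \frac{\sum Y_i X_i}{\sum X_i^2}$$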
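(b) We want to see what happens when we use this estimator when the assumption that the intercept is zero is incorrect. Using what you know to be the truth about $Y_i$, show that
$$\hat{\beta}_1^{wrong} = \beta_1 + \beta_0\frac{\sum_{i=1}^{n} X_i}{\sum_{i=1}^{n} X_i^2} + \frac{\sum_{i=1}^{n} X_i u_i}{\sum_{i=1}^{n} X_i^2}.$$
Solution:
$$\hat{\beta}_1^{wrong} = \frac{\sum(\beta_0 + \beta_1 X_i + u_i)X_i}{\sum X_i^2} = \beta_0\frac{\sum X_i}{\sum X_i^2} + \beta_1\frac{\sum X_i^2}{\sum X_i^2} + \frac{\sum X_i u_i}{\sum X_i^2}$$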
(c) What does $\frac{1}{n}\sum_{i=1}^{n} X_i^2$ converge in probability to? That is, calculate the probability limit of $\frac{1}{n}\sum_{i=1}^{n} X_i^2$. (This is really simple, and does not require any long calculation!! The two following points should be even simpler.)
Solution: Just use the LLN and the fact that the observations are iid. Then
$$\frac{1}{n}\sum_{i=1}^{n} X_i^2 \xrightarrow{p} E\left[X_i^2\right]$$
(d) Calculate the probability limit of $\frac{1}{n}\sum_{i=1}^{n} X_i u_i$.
Solution: Like before, just use iid-ness and the LLN. Then use the LIE and the assumption $E(u_i \mid X_i) = 0$.
$$\frac{1}{n}\sum_{i=1}^{n} X_i u_i \xrightarrow{p} E[X_i u_i] = E\left[E(X_i u_i \mid X_i)\right] = E\left[X_i E(u_i \mid X_i)\right] = E[X_i \cdot 0] = 0$$
(e) Calculate the probability limit of $\frac{1}{n}\sum_{i=1}^{n} X_i$. Solution: like before, just use iid-ness and the LLN, so that $\frac{1}{n}\sum_{i=1}^{n} X_i \xrightarrow{p} E[X_i]$.
(f) Using the previous results, show whether (or not) $\hat{\beta}_1^{wrong}$ is a consistent estimator for the slope $\beta_1$.
Solution: Putting all pieces together:
$$\hat{\beta}_1^{wrong} = \beta_0 \frac{\frac{1}{n}\sum X_i}{\frac{1}{n}\sum X_i^2} + \beta_1 + \frac{\frac{1}{n}\sum X_i u_i}{\frac{1}{n}\sum X_i^2} \xrightarrow{p} \beta_1 + \beta_0 \frac{E[X_i]}{E\left[X_i^2\right]} \neq \beta_1$$
unless $\beta_0 = 0$ (which of course means that the wrong model is right after all) and/or $E[X_i] = 0$, in which case the bias term vanishes even though the true intercept is not zero.
(g) Give an intuition for your results in part (f). (Hint: draw a scatterplot of points, with $X_i$ on the x-axis and $Y_i$ on the y-axis, more or less around a line, and do it in such a way that the line should NOT have an intercept equal to zero. Then think about the consequences of minimizing the sum of squared residuals from that scatterplot, but using a line that MUST pass through the origin of your graph. How would you end up drawing it? What would be the relation between the slope estimated without the assumption of $\beta_0 = 0$, and the one estimated with it?) Solution: Using $\hat{\beta}_1^{wrong}$ leads to inconsistency because it forces the regression line through the origin. Assuming that the $X_i$'s are non-negative, if $\beta_0$ is negative, this will make $\hat{\beta}_1^{wrong}$ smaller than it should be. This means we would under-estimate the impact of $X$ on $Y$. Conversely, if $\beta_0$ is positive, this will make $\hat{\beta}_1^{wrong}$ larger than it should be. In other words, we would over-estimate the impact of $X$ on $Y$.
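The inconsistency derived in this problem can be illustrated with a small simulation. This is only a sketch under assumed values: the regressor is drawn uniformly on [0, 10], the error is standard normal, and the parameter values and function names are hypothetical choices made for the illustration.

```python
import random

def slope_through_origin(xs, ys):
    """OLS slope when the line is forced through the origin: sum(x*y) / sum(x^2)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def simulate(beta0, beta1, n=100_000, seed=0):
    """Draw (X, Y) from Y = beta0 + beta1*X + u and fit the no-intercept model."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]  # non-negative regressor
    ys = [beta0 + beta1 * x + rng.gauss(0.0, 1.0) for x in xs]
    return slope_through_origin(xs, ys)

beta0, beta1 = 2.0, 1.0  # assumed true intercept and slope
estimate = simulate(beta0, beta1)

# For X ~ U(0, 10): E[X] = 5 and E[X^2] = 100/3, so the limit from part (f)
# is beta1 + beta0 * E[X] / E[X^2].
predicted_plim = beta1 + beta0 * 5.0 / (100.0 / 3.0)
print(estimate, predicted_plim)
```

With a positive intercept and a non-negative regressor, the estimate settles above the true slope, in line with the intuition in part (g).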
15. You are interested in the relation between wages and education. For the purpose of this problem, you may disregard the issues of omitted variable bias that we have frequently mentioned. Suppose that you know that the correct model is the following:
$$\ln \text{wage}_i = \beta_0 + \beta_{1i}\,\text{educ}_i + u_i$$
where $\mathrm{Cov}(\text{educ}_i, u_i) = 0$, and where you should notice that the slope changes for different individuals. Then, we can treat the individual-specific slopes as random variables themselves. You have a sample of iid observations, and you have a single observation for each individual $i$. Clearly, you cannot estimate the individual-specific slope, as you have only one observation per individual. However, you may want to estimate the mean slope in the population, that is, the mean percentage increase in wages associated with one more year of education. Let this parameter be
$$\beta_1 = E(\beta_{1i})$$
You also know that the value of the slope is statistically independent from education. You want to understand if an OLS regression of $\ln \text{wage}_i$ on $\text{educ}_i$ allows you to estimate $\beta_1$ consistently. As usual, then, the starting point is the OLS estimate for the slope, which in this case becomes
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} \left(\text{educ}_i - \overline{\text{educ}}\right) \ln \text{wage}_i}{\sum_{i=1}^{n} \left(\text{educ}_i - \overline{\text{educ}}\right)^2}$$
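(a) (4 points) Prove that
$$\hat{\beta}_1 = \frac{\frac{1}{n}\sum_{i=1}^{n} \left(\text{educ}_i - \overline{\text{educ}}\right) \beta_{1i}\,\text{educ}_i}{\frac{1}{n}\sum_{i=1}^{n} \left(\text{educ}_i - \overline{\text{educ}}\right)^2} + \frac{\frac{1}{n}\sum_{i=1}^{n} \left(\text{educ}_i - \overline{\text{educ}}\right) u_i}{\frac{1}{n}\sum_{i=1}^{n} \left(\text{educ}_i - \overline{\text{educ}}\right)^2}$$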
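Substituting the expression for $\ln \text{wage}_i$ in the formula for $\hat{\beta}_1$:
$$\hat{\beta}_1 = \frac{\frac{1}{n}\sum_{i=1}^{n} (\text{educ}_i - \overline{\text{educ}})(\beta_0 + \beta_{1i}\,\text{educ}_i + u_i)}{\frac{1}{n}\sum_{i=1}^{n} (\text{educ}_i - \overline{\text{educ}})^2}$$
$$= \beta_0 \frac{\frac{1}{n}\sum_{i=1}^{n} (\text{educ}_i - \overline{\text{educ}})}{\frac{1}{n}\sum_{i=1}^{n} (\text{educ}_i - \overline{\text{educ}})^2} + \frac{\frac{1}{n}\sum_{i=1}^{n} (\text{educ}_i - \overline{\text{educ}})\beta_{1i}\,\text{educ}_i}{\frac{1}{n}\sum_{i=1}^{n} (\text{educ}_i - \overline{\text{educ}})^2} + \frac{\frac{1}{n}\sum_{i=1}^{n} (\text{educ}_i - \overline{\text{educ}})u_i}{\frac{1}{n}\sum_{i=1}^{n} (\text{educ}_i - \overline{\text{educ}})^2}$$
$$= 0 + \frac{\frac{1}{n}\sum_{i=1}^{n} (\text{educ}_i - \overline{\text{educ}})\beta_{1i}\,\text{educ}_i}{\frac{1}{n}\sum_{i=1}^{n} (\text{educ}_i - \overline{\text{educ}})^2} + \frac{\frac{1}{n}\sum_{i=1}^{n} (\text{educ}_i - \overline{\text{educ}})u_i}{\frac{1}{n}\sum_{i=1}^{n} (\text{educ}_i - \overline{\text{educ}})^2}$$
The first term is zero because deviations from the sample mean sum to zero.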
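(b) (5 points) We know that under the usual regularity conditions
$$\operatorname{plim} \frac{1}{n}\sum_{i=1}^{n} \left(\text{educ}_i - \overline{\text{educ}}\right) u_i = \mathrm{Cov}(\text{educ}_i, u_i)$$
$$\operatorname{plim} \frac{1}{n}\sum_{i=1}^{n} \left(\text{educ}_i - \overline{\text{educ}}\right)^2 = \sigma^2_{educ}$$
What is the probability limit of $\frac{1}{n}\sum_{i=1}^{n} \left(\text{educ}_i - \overline{\text{educ}}\right) \beta_{1i}\,\text{educ}_i$?
$$\frac{1}{n}\sum_{i=1}^{n} (\text{educ}_i - \overline{\text{educ}})\beta_{1i}\,\text{educ}_i = \frac{1}{n}\sum_{i=1}^{n} (\text{educ}_i - \overline{\text{educ}})(\beta_{1i}\,\text{educ}_i - \overline{\beta_1 \text{educ}})$$
When taking the probability limit, we know that we can switch averages with expected values:
$$\operatorname{plim} \frac{1}{n}\sum_{i=1}^{n} (\text{educ}_i - \overline{\text{educ}})\beta_{1i}\,\text{educ}_i = E[(\text{educ}_i - E(\text{educ}_i))(\beta_{1i}\,\text{educ}_i - E(\beta_{1i}\,\text{educ}_i))]$$
$$= \mathrm{Cov}(\text{educ}_i, \beta_{1i}\,\text{educ}_i)$$
$$= E[\text{educ}_i\,\beta_{1i}\,\text{educ}_i] - E[\text{educ}_i]\,E[\beta_{1i}\,\text{educ}_i]$$
$$= E[\beta_{1i}\,\text{educ}_i^2] - E[\text{educ}_i]\,E[\beta_{1i}\,\text{educ}_i]$$
$$= E[\beta_{1i}]\,E[\text{educ}_i^2] - E[\text{educ}_i]^2\,E[\beta_{1i}] \quad \text{(independence of } \beta_{1i} \text{ and } \text{educ}_i\text{)}$$
$$= E[\beta_{1i}]\left(E[\text{educ}_i^2] - E[\text{educ}_i]^2\right)$$
$$= E[\beta_{1i}]\,\sigma^2_{educ}$$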
(c) (2 points) Using the results from the previous steps, do you conclude that $\hat{\beta}_1$ is a consistent estimate of the mean slope $\beta_1$? Explain.
$$\operatorname{plim} \hat{\beta}_1 = \frac{E[\beta_{1i}]\,\sigma^2_{educ}}{\sigma^2_{educ}} + \frac{\mathrm{Cov}(\text{educ}_i, u_i)}{\sigma^2_{educ}} = E[\beta_{1i}] + 0$$
Therefore, the answer is yes: the estimator $\hat{\beta}_1$ is a consistent estimator of the mean slope $\beta_1$.
(d) (2 points) Do you think your result would change if you knew that the value of the slope is correlated with the level of education? Provide a brief intuition. No formal proofs are necessary here.
Yes, results would change. If $\beta_{1i}$ and $\text{educ}_i$ were not independent, we would not be able to write
$$\mathrm{Cov}(\text{educ}_i, \beta_{1i}\,\text{educ}_i) = E[\beta_{1i}]\,\sigma^2_{educ}$$
and therefore we would have that $\operatorname{plim} \hat{\beta}_1 \neq E[\beta_{1i}]$.
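A short simulation can make parts (c) and (d) concrete. It is a sketch only: the distributions, the mean slope of 0.08, and the way the slope is made to depend on education in the correlated case are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import random

def ols_slope(xs, ys):
    """Standard OLS slope: sum((x - xbar) * y) / sum((x - xbar)^2)."""
    n = len(xs)
    xbar = sum(xs) / n
    numerator = sum((x - xbar) * y for x, y in zip(xs, ys))
    denominator = sum((x - xbar) ** 2 for x in xs)
    return numerator / denominator

def simulate_random_slope(correlated, n=100_000, seed=1):
    """Simulate ln(wage) = 1 + b_i * educ + u with an individual-specific slope b_i."""
    rng = random.Random(seed)
    educ, lnwage = [], []
    for _ in range(n):
        e = rng.uniform(8.0, 18.0)               # years of education, mean 13
        b = 0.08 + rng.gauss(0.0, 0.02)          # slope with mean 0.08
        if correlated:
            b += 0.005 * (e - 13.0)              # slope rises with education, mean unchanged
        educ.append(e)
        lnwage.append(1.0 + b * e + rng.gauss(0.0, 0.3))
    return ols_slope(educ, lnwage)

independent_estimate = simulate_random_slope(correlated=False)
correlated_estimate = simulate_random_slope(correlated=True)
print(independent_estimate, correlated_estimate)
```

In the independent case the estimate should be close to the assumed mean slope, while in the correlated case it moves away from it even though the mean slope is the same in both designs.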
16. (15 points overall). Suppose that you have a dataset of iid observations $(Y_i^*, X_i)$ where $Y_i^*$ is the dependent variable measured with error. In particular, you know that the error has additive form, and that it is uncorrelated with the error in the true regression, which is linear in $X_i$ (without an intercept, to keep things simpler). $X_i$ is measured without error. So, while the true model is the following
$$Y_i = \beta X_i + u_i \qquad (1)$$
$$0 = E(u_i \mid X_i)$$
you cannot estimate it, since you only observe $Y_i^* = Y_i + \varepsilon_i$, where $E(\varepsilon_i) = 0$ and $\mathrm{cov}(\varepsilon_i, X_i) = 0$. Since you observe $Y_i^*$ and not $Y_i$, your estimator for the slope will be
$$\hat{\beta} = \frac{\sum_{i=1}^{n} X_i Y_i^*}{\sum_{i=1}^{n} X_i^2}$$
(a) (2 points) Prove that $E(X_i u_i) = 0$.
Solution: $E(X_i u_i) = E(X_i E(u_i \mid X_i)) = 0$ by the L.I.E.
(b) (2 points) Prove that $E(X_i \varepsilon_i) = 0$.
Solution: $\mathrm{cov}(\varepsilon_i, X_i) = 0 = E(X_i \varepsilon_i) - E(X_i)E(\varepsilon_i) = E(X_i \varepsilon_i)$, since $E(\varepsilon_i) = 0$. Therefore, $E(X_i \varepsilon_i) = 0$.
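(c) (2 points) Prove that
$$\hat{\beta} = \beta + \frac{\frac{1}{n}\sum X_i u_i}{\frac{1}{n}\sum X_i^2} + \frac{\frac{1}{n}\sum X_i \varepsilon_i}{\frac{1}{n}\sum X_i^2}$$
Solution:
$$\hat{\beta} = \frac{\sum X_i(\beta X_i + u_i + \varepsilon_i)}{\sum X_i^2} = \beta \frac{\sum X_i^2}{\sum X_i^2} + \frac{\sum X_i u_i}{\sum X_i^2} + \frac{\sum X_i \varepsilon_i}{\sum X_i^2} = \beta + \frac{\frac{1}{n}\sum X_i u_i}{\frac{1}{n}\sum X_i^2} + \frac{\frac{1}{n}\sum X_i \varepsilon_i}{\frac{1}{n}\sum X_i^2}$$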
(d) (2 points) Assume that you can use the Law of Large Numbers for all the above averages, so that each average will converge in probability to the corresponding expectation when $n$ grows large. What does $\hat{\beta}$ converge in probability to?
Solution: We know that $\hat{\beta} \xrightarrow{p} \beta$, because by the L.L.N., $\frac{1}{n}\sum X_i^2 \xrightarrow{p} E\left[X_i^2\right]$, $\frac{1}{n}\sum X_i u_i \xrightarrow{p} 0$ and $\frac{1}{n}\sum X_i \varepsilon_i \xrightarrow{p} 0$.
(e) (2 points) Is $\hat{\beta}$ a consistent estimator for $\beta$? How does this result compare with the case where you have measurement error in the regressor?
Solution: As shown in the last question, $\hat{\beta}$ is a consistent estimator of $\beta$. That is, as $n$ becomes large enough, the estimator approaches its true value, even with the measurement error. This is in contrast with the case where the measurement error is in the regressor, in which the estimator is inconsistent. While measurement error in $X$ induces correlation between the regressor and the error, this is not the case when $Y$ is measured with error and the error has the structure defined in the problem.
(f) (2 points) What is the intuition behind this result? (Hint: it might be useful to prove first that the regression we are estimating is not (1) above, but rather $Y_i^* = \beta X_i + v_i$, where $v_i = (u_i + \varepsilon_i)$ is the true error of the regression we are actually estimating.)
Solution:
$$Y_i^* = \beta X_i + v_i, \qquad v_i = (u_i + \varepsilon_i)$$
$$\hat{\beta} = \beta + \frac{\frac{1}{n}\sum X_i u_i}{\frac{1}{n}\sum X_i^2} + \frac{\frac{1}{n}\sum X_i \varepsilon_i}{\frac{1}{n}\sum X_i^2} = \beta + \frac{\frac{1}{n}\sum X_i v_i}{\frac{1}{n}\sum X_i^2}$$
Since $E(v_i \mid X_i) = E(u_i + \varepsilon_i \mid X_i) = 0$, we see that the OLS assumptions still hold.
17. You want to estimate the parameters of a production function, and you have a sample of $n$ factories. Let $Y_i$ be output for the $i$-th factory, let $L_i$ be labor, and let $K_i$ be capital. You know that
$$Y_i = \alpha L_i^{\beta} K_i^{\gamma} + u_i \qquad (1)$$
and you also know that $E[u_i \mid L_i, K_i] = 0$.
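(a) (3 points) Prove that $E[Y_i \mid L_i, K_i] = \alpha L_i^{\beta} K_i^{\gamma}$.
Solution:
$$E[Y_i \mid L_i, K_i] = E[\alpha L_i^{\beta} K_i^{\gamma} \mid L_i, K_i] + E[u_i \mid L_i, K_i]$$
$$= E[\alpha L_i^{\beta} K_i^{\gamma} \mid L_i, K_i] + 0$$
$$= E[\alpha L_i^{\beta} K_i^{\gamma} \mid L_i, K_i]$$
$$= \alpha L_i^{\beta} K_i^{\gamma}$$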
(b) (4 points) Suppose that you have estimated the model in (1), and you found $\hat{\alpha} = \hat{\beta} = 1$, and $\hat{\gamma} = 0.5$. What is the predicted change in output associated with an increase in capital from 10 to 11, if labor is equal to 10?
Solution:
$$\Delta \hat{Y}_i = 1 \times 10 \times 11^{0.5} - 1 \times 10 \times 10^{0.5} \approx 1.54$$
(c) (5 points) Can you estimate the parameters $\alpha$, $\beta$, and $\gamma$ in model (1) using OLS? If your answer is yes, explain why. If your answer is no, explain why, and propose an alternative estimator that you may use.
Solution: No, the model is nonlinear in the parameters and cannot be estimated with OLS. It can be estimated using NLLS, since $E(u_i \mid L_i, K_i) = 0$:
$$\min_{\alpha, \beta, \gamma} \sum_{i=1}^{n} \left(Y_i - \alpha L_i^{\beta} K_i^{\gamma}\right)^2$$
Take the first order conditions (FOC) and solve numerically for the parameters $\alpha$, $\beta$, $\gamma$.
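The arithmetic in part (b) can be reproduced with a few lines of code. This is a minimal sketch; `predicted_output` is a hypothetical helper whose defaults are the estimates quoted in part (b).

```python
def predicted_output(labor, capital, alpha=1.0, beta=1.0, gamma=0.5):
    """Fitted value of the production function: alpha * L^beta * K^gamma."""
    return alpha * labor ** beta * capital ** gamma

# Capital rises from 10 to 11 while labor stays at 10.
change = predicted_output(10, 11) - predicted_output(10, 10)
print(round(change, 2))  # 1.54
```

Because the function is nonlinear in capital, the same one-unit increase starting from a different capital level would give a different predicted change.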
18. You want to study the expected effect of years of schooling (denoted by $S$) on income (denoted by $Y$), but income is measured with error. You have $n$ i.i.d. observations $(Y_i^*, S_i)$, where $Y_i^*$ is observed income, measured with error. You also know that the relation between observed income $Y_i^*$ and true income $Y_i$ is described by the following relation
$$Y_i^* = Y_i + \varepsilon_i$$
where $\varepsilon_i$ is an i.i.d. zero-mean reporting error. The relation between true income and schooling is described by the following equation:
$$Y_i = \beta_0 + \beta_1 S_i + u_i$$
$$E[u_i \mid S_i] = 0 \implies \mathrm{Cov}(S_i, u_i) = 0$$
However, you also know that reporting errors are not the same for individuals with very different schooling, so that the error is correlated with the level of schooling. That is,
$$\mathrm{Cov}(S_i, \varepsilon_i) = \sigma^2_{S\varepsilon} \neq 0$$
You want to estimate $\beta_1$ using OLS, but since you do not observe true income $Y_i$, your OLS estimator $\tilde{\beta}$ will be
$$\tilde{\beta} = \frac{\sum_{i=1}^{n} \left(S_i - \bar{S}\right) Y_i^*}{\sum_{i=1}^{n} \left(S_i - \bar{S}\right)^2}$$
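(a) (4 points) Show that the OLS estimator for the slope can be rewritten as
$$\tilde{\beta} = \beta_1 + \frac{\frac{1}{n}\sum_{i=1}^{n} \left(S_i - \bar{S}\right) u_i}{\frac{1}{n}\sum_{i=1}^{n} \left(S_i - \bar{S}\right)^2} + \frac{\frac{1}{n}\sum_{i=1}^{n} \left(S_i - \bar{S}\right) \varepsilon_i}{\frac{1}{n}\sum_{i=1}^{n} \left(S_i - \bar{S}\right)^2} \qquad (1)$$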
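Answer:
$$\tilde{\beta} = \frac{\sum (S_i - \bar{S})(Y_i + \varepsilon_i)}{\sum (S_i - \bar{S})^2} = \frac{\sum (S_i - \bar{S})(\beta_0 + \beta_1 S_i + u_i + \varepsilon_i)}{\sum (S_i - \bar{S})^2}$$
$$= \frac{\beta_1 \sum (S_i - \bar{S}) S_i + \sum (S_i - \bar{S}) u_i + \sum (S_i - \bar{S}) \varepsilon_i}{\sum (S_i - \bar{S})^2}$$
$$= \frac{\beta_1 \sum (S_i - \bar{S})^2 + \sum (S_i - \bar{S}) u_i + \sum (S_i - \bar{S}) \varepsilon_i}{\sum (S_i - \bar{S})^2}$$
$$= \beta_1 + \frac{\frac{1}{n}\sum (S_i - \bar{S}) u_i + \frac{1}{n}\sum (S_i - \bar{S}) \varepsilon_i}{\frac{1}{n}\sum (S_i - \bar{S})^2}.$$
The $\beta_0$ term drops out because $\sum (S_i - \bar{S}) = 0$, and $\sum (S_i - \bar{S}) S_i = \sum (S_i - \bar{S})^2$ for the same reason.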
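(b) (2 points) Now assume that all necessary regularity conditions hold, so that each sample mean in equation (1) converges in probability to its corresponding expectation. So, for example,
$$\frac{1}{n}\sum_{i=1}^{n} \left(S_i - \bar{S}\right)^2 \xrightarrow{p} E\left[(S_i - \mu_S)^2\right] = \sigma^2_S$$
What does $\frac{1}{n}\sum_{i=1}^{n} \left(S_i - \bar{S}\right) u_i$ converge in probability to?
Answer:
$$\frac{1}{n}\sum (S_i - \bar{S}) u_i \xrightarrow{p} \mathrm{Cov}[S, u] = 0,$$
where the last equality holds by assumption of the model.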
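(c) (2 points) What does $\frac{1}{n}\sum_{i=1}^{n} \left(S_i - \bar{S}\right) \varepsilon_i$ converge in probability to?
Answer:
$$\frac{1}{n}\sum (S_i - \bar{S}) \varepsilon_i \xrightarrow{p} \mathrm{Cov}[S, \varepsilon] \neq 0,$$
where again, the fact that the covariance is not zero is an assumption of the model.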
(d) (3 points) Using the results in the previous parts, what does $\tilde{\beta}$ converge in probability to?
Answer:
$$\tilde{\beta} \xrightarrow{p} \beta_1 + \frac{\mathrm{Cov}[S, \varepsilon]}{\mathrm{Var}[S]},$$
where the last term is not zero.
(e) (4 points) Is $\tilde{\beta}$ a consistent estimator for the true slope $\beta_1$? Is your conclusion the same as the one you would get if the measurement error in the dependent variable were uncorrelated with the regressors? Why?
Answer: As can be clearly seen in part (d), the OLS estimator is inconsistent. The reason is the correlation of the regressor with the composite error term, which includes the measurement error. If the measurement error were uncorrelated with the regressor, the extra term would vanish and the estimator would be consistent.
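The contrast between reporting error that is uncorrelated with schooling and reporting error that is correlated with it can be sketched in a simulation. Everything numerical here is an assumption made for illustration: the reporting error is built as `rho * (S - 13)` plus noise, so that its covariance with schooling is `rho * Var(S)` and the limit in part (d) becomes `beta1 + rho`.

```python
import random

def ols_slope(xs, ys):
    """Standard OLS slope: sum((x - xbar) * y) / sum((x - xbar)^2)."""
    n = len(xs)
    xbar = sum(xs) / n
    numerator = sum((x - xbar) * y for x, y in zip(xs, ys))
    denominator = sum((x - xbar) ** 2 for x in xs)
    return numerator / denominator

def simulate_reporting_error(rho, beta0=5.0, beta1=2.0, n=100_000, seed=2):
    """Regress observed income Y* = Y + eps on schooling S.

    eps = rho * (S - 13) + noise, so Cov(S, eps) = rho * Var(S) (assumed design).
    """
    rng = random.Random(seed)
    schooling, observed_income = [], []
    for _ in range(n):
        s = rng.uniform(8.0, 18.0)                      # schooling with mean 13
        u = rng.gauss(0.0, 1.0)                         # regression error
        eps = rho * (s - 13.0) + rng.gauss(0.0, 1.0)    # zero-mean reporting error
        schooling.append(s)
        observed_income.append(beta0 + beta1 * s + u + eps)
    return ols_slope(schooling, observed_income)

uncorrelated_estimate = simulate_reporting_error(rho=0.0)
correlated_estimate = simulate_reporting_error(rho=0.5)
print(uncorrelated_estimate, correlated_estimate)
```

With `rho = 0` the estimate should sit near the assumed slope of 2, while with `rho = 0.5` it should sit near 2.5, which matches the extra covariance term in part (d).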
19. (10 points) Consider the multiple regression model with three regressors, where OLS Assumptions 1-3 are satisfied:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u$$
You would like to test the null hypothesis $H_0: \beta_1 - 3\beta_2 = 1$.
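(a) (3 points) Let $\hat{\beta}_1$ and $\hat{\beta}_2$ denote the OLS estimators of $\beta_1$ and $\beta_2$. Find $\mathrm{Var}\left(\hat{\beta}_1 - 3\hat{\beta}_2\right)$ in terms of the variances of $\hat{\beta}_1$ and $\hat{\beta}_2$ and the covariance between them. What is the standard error of $\hat{\beta}_1 - 3\hat{\beta}_2$?
Solution: From formula 2.31 in Stock and Watson, we know that
$$\mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y) + 2ab\,\mathrm{Cov}(X, Y)$$
Applied here, $\mathrm{Var}\left(\hat{\beta}_1 - 3\hat{\beta}_2\right) = \mathrm{Var}(\hat{\beta}_1) + 9\,\mathrm{Var}(\hat{\beta}_2) - 6\,\mathrm{Cov}(\hat{\beta}_1, \hat{\beta}_2)$. The standard error is simply
$$SE\left(\hat{\beta}_1 - 3\hat{\beta}_2\right) = \sqrt{\mathrm{Var}(\hat{\beta}_1) + 9\,\mathrm{Var}(\hat{\beta}_2) - 6\,\mathrm{Cov}(\hat{\beta}_1, \hat{\beta}_2)}$$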
(b) (2 points) What is the t-statistic for testing $H_0: \beta_1 - 3\beta_2 = 1$? (Just write the formula, since you are not given any values to plug into it.)
Solution:
$$t = \frac{\hat{\beta}_1 - 3\hat{\beta}_2 - 1}{SE\left(\hat{\beta}_1 - 3\hat{\beta}_2\right)}$$
(c) (5 points) Define $\theta_1 = \beta_1 - 3\beta_2$ and $\hat{\theta}_1 = \hat{\beta}_1 - 3\hat{\beta}_2$. Write a regression equation involving $\beta_0$, $\beta_2$, $\beta_3$, and $\theta_1$ that allows you to directly obtain $\hat{\theta}_1$ and its standard error, and describe how you would use it to test the null hypothesis in part (b).
Solution: Because $\theta_1 = \beta_1 - 3\beta_2$, we can write $\beta_1 = \theta_1 + 3\beta_2$. Plugging this into the population model gives
$$y = \beta_0 + (\theta_1 + 3\beta_2) x_1 + \beta_2 x_2 + \beta_3 x_3 + u$$
$$= \beta_0 + \theta_1 x_1 + \beta_2 (3x_1 + x_2) + \beta_3 x_3 + u$$
This last equation is what we would estimate by regressing $y$ on $x_1$, $(3x_1 + x_2)$, and $x_3$. The coefficient and standard error on $x_1$ are what we want. Specifically, we can test the null hypothesis $H_0: \beta_1 - 3\beta_2 = 1$ with the following t-statistic:
$$t = \frac{\hat{\theta}_1 - 1}{SE(\hat{\theta}_1)}$$
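The formulas in parts (a) and (b) can be written as two small functions. This is a sketch only: the problem gives no numbers, so the variances, covariance, and coefficient estimates below are hypothetical values used purely to exercise the formulas.

```python
import math

def se_linear_combination(var_b1, var_b2, cov_b1_b2, a=1.0, b=-3.0):
    """Standard error of a*b1_hat + b*b2_hat using Var(aX + bY)."""
    variance = a * a * var_b1 + b * b * var_b2 + 2.0 * a * b * cov_b1_b2
    return math.sqrt(variance)

def t_statistic(b1_hat, b2_hat, se, null_value=1.0):
    """t-statistic for H0: beta1 - 3*beta2 = null_value."""
    return (b1_hat - 3.0 * b2_hat - null_value) / se

# Hypothetical inputs, not taken from the problem.
se = se_linear_combination(var_b1=0.04, var_b2=0.01, cov_b1_b2=0.005)
t = t_statistic(b1_hat=2.5, b2_hat=0.4, se=se)
print(se, t)
```

The reparametrized regression in part (c) avoids this calculation altogether, since the standard error of the coefficient on $x_1$ is reported directly by the regression software.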
3 Empirical (or Empirical + Theory) Univariate
1. (25 points overall) You have obtained a sample of 1744 individuals and are interested in the relationship between weekly earnings and age. We assume that the usual OLS assumptions hold. Linearity then implies that
$$\text{earnings}_i = \beta_0 + \beta_1\,\text{age}_i + u_i$$
The OLS regression, using heteroskedasticity-robust standard errors, yielded the following result (standard errors in parentheses):
$$\widehat{\text{earnings}} = \underset{(20.24)}{239.16} + \underset{(0.57)}{5.20}\,\text{Age}, \qquad R^2 = 0.05 \qquad (4)$$
where Earnings and Age are measured in dollars and years respectively.
(a) (5 points) Is the relationship between Age and Earnings statistically significant at the 1% significance level?
$$H_0: \beta_1 = 0, \qquad H_1: \beta_1 \neq 0$$
$$t = \frac{5.20 - 0}{0.57} = 9.123 > 2.58$$
Therefore, reject the null hypothesis.
(b) (5 points) Suppose that you want to test the null hypothesis that becoming one year older increases your expected weekly earnings by 4 dollars, versus a two-sided alternative. State clearly $H_0$ and $H_1$, and compute the p-value. Would you reject the null at the 5% significance level? And at the 1% significance level?
$$H_0: \beta_1 = 4, \qquad H_1: \beta_1 \neq 4$$
$$t = \frac{5.20 - 4}{0.57} = 2.105$$
$$\text{p-value} = 2\Phi(-|t|) = 2 \times 0.0179 = 0.0358$$
Therefore, reject the null hypothesis at 5%, but fail to reject at 1%.
(c) (5 points) Compute a 95% confidence interval for the effect on expected earnings of increasing a worker's age from 35 to 40 years. Do you think the expected change looks important, in economic terms? Solution:
$$5 \times \left(\hat{\beta}_1 \pm 1.96 \times SE(\hat{\beta}_1)\right)$$
$$5 \times (5.20 \pm 1.96 \times 0.57) = (26 \pm 5.586) = (20.414, 31.586)$$
(d) (5 points) Give an economic interpretation of the estimates in regression (4) above: why should age matter in the determination of earnings? Do you think that growing old, by itself, causes an increase in your expected earnings? Solution: Age is a proxy for experience. Aging by itself does not increase income.
(e) (5 points) Do you think that assuming that the relationship between age and earnings is linear was a good idea? Discuss briefly. Solution: No. The slope probably begins to decrease and turns negative as a person approaches retirement.
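The test statistics and the confidence interval in parts (a) to (c) can be recomputed directly from the reported coefficient and standard error. This is a sketch: `normal_cdf` is a helper built on the standard library's `math.erfc`, and it evaluates the normal CDF at the unrounded t-statistic, so the p-value differs slightly from the table-based 0.0358 above.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, Phi(z), via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def two_sided_p_value(t):
    """Two-sided p-value under the large-sample normal approximation."""
    return 2.0 * normal_cdf(-abs(t))

beta1_hat, se_beta1 = 5.20, 0.57  # slope and robust standard error from the regression

t_zero = (beta1_hat - 0.0) / se_beta1   # part (a): H0 slope = 0
t_four = (beta1_hat - 4.0) / se_beta1   # part (b): H0 slope = 4
p_four = two_sided_p_value(t_four)

# Part (c): effect of five extra years, scaled point estimate and margin.
half_width = 5.0 * 1.96 * se_beta1
interval = (5.0 * beta1_hat - half_width, 5.0 * beta1_hat + half_width)
print(round(t_zero, 3), round(t_four, 3), p_four, interval)
```

The p-value lies between 0.01 and 0.05, which reproduces the conclusion of part (b): reject at the 5% level but not at the 1% level.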
2. (7 points) You are interested in comparing the effect of increasing total monthly expenditure per person (call it $MEP$) on the fraction of the budget spent on food (the food budget share, call it $FBS$) between two Indian States, Andhra Pradesh (AP) and Uttar Pradesh (UP). You decide to draw two independent samples, one from AP, and one from UP, and to run two separate regressions using OLS, using heteroskedasticity-robust standard errors. Assume that all the necessary OLS assumptions hold, so that the usual asymptotic results are correct. Your OLS regression for AP turns out to be
$$\widehat{FBS}_{AP} = \underset{(0.024)}{1.587} - \underset{(0.0038)}{0.153}\,MEP_{AP}$$
while your estimated OLS regression for UP is
$$\widehat{FBS}_{UP} = \underset{(0.018)}{1.218} - \underset{(0.0029)}{0.0990}\,MEP_{UP}$$
Using these estimates, test the hypothesis that the effect on $FBS$ of a unit increase in $MEP$ is the same across the two different Indian states, using a 1% significance level. Solution: Since the two draws for the two equations are independent, we can just use a t-test with the two $\beta$ parameters we're interested in. Define $\delta = \beta_{2,AP} - \beta_{2,UP}$.
$$H_0: \delta = 0, \qquad H_1: \delta \neq 0$$
Now, run the usual t-test.
$$t = \frac{\hat{\delta} - 0}{SE(\hat{\delta})} = \frac{-0.153 + 0.0990}{\sqrt{0.0038^2 + 0.0029^2}} = \frac{-0.054}{0.0048} = -11.3$$
Therefore, reject at the 1% level the null hypothesis that the effect of a unit increase in MEP is equal in the two states.
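The t-statistic for the difference between the two slopes can be recomputed from the reported estimates. This is a minimal sketch that relies on the independence of the two samples, so that the variance of the difference is the sum of the two squared standard errors; the variable names are illustrative.

```python
import math

slope_ap, se_ap = -0.153, 0.0038    # Andhra Pradesh slope and standard error
slope_up, se_up = -0.0990, 0.0029   # Uttar Pradesh slope and standard error

difference = slope_ap - slope_up
# Independent samples: Var(difference) = se_ap^2 + se_up^2.
se_difference = math.hypot(se_ap, se_up)
t_stat = difference / se_difference

critical_value_1pct = 2.58  # two-sided 1% normal critical value
reject_at_1pct = abs(t_stat) > critical_value_1pct
print(round(t_stat, 1), reject_at_1pct)
```

If the two samples were not independent, a covariance term would have to be subtracted inside the square root, and this shortcut would no longer apply.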
3. (24 points) You have collected data on births from a random sample of women in the United States. Two variables of interest are the dependent variable, infant birth weight in ounces ($bwght$), and an explanatory variable ($cigdum$), which is a dummy (0,1) variable equal to 1 if the mother smoked during pregnancy and equal to 0 if she did not. The following simple regression was estimated using data on $n = 1388$ births:
$$bwght = \beta_0 + \beta_1\,cigdum + u$$
(a) (3 points) What is the interpretation of $\beta_0$ and $\beta_1$ in this example? What do the parameter estimates imply about the relationship between birth weight and smoking? (You do not have to discuss statistical significance.) Answer: $\beta_0$ is the population mean birth weight for babies with mothers who did not smoke during pregnancy. Similarly, $\beta_0 + \beta_1$ is the population mean birth weight for babies with mothers who did smoke during pregnancy (you did not need to note this part on the exam). $\beta_1$ is then the difference between the population mean birth weight for babies with mothers who smoked during pregnancy and the population mean birth weight for babies with mothers who did not. The negative coefficient on $CIGDUM$ implies that expected birth weights are lower among mothers who smoked during pregnancy.
(b) (4 points) Formulate and conduct a test at the 5% level of the null hypothesis that birth weights are not affected by cigarette smoking (use a two-sided alternative hypothesis). What do you conclude? Do you think a two-sided alternative hypothesis is sensible in this context? Answer: The hypothesis test should be set up as follows: $H_0: \beta_1 = 0$, $H_1: \beta_1 \neq 0$. The t-stat, which is also reported in the Eviews output, is $t = \frac{-8.91 - 0}{1.44} = -6.18$. The corresponding p-value $= 2\Phi(-6.18) \approx 0$. Given that the p-value $< .05$ (our significance level), we can reject the null that the mean birth weights are the same. Note: You could have also used a confidence interval or acceptance region to answer this. A two-sided alternative seems appropriate here since we don't necessarily have a reason to believe that babies born to mothers who smoked will have a lower birth weight than babies born to mothers who did not smoke. They could also be heavier.
(c) (4 points) How much of the variation in birth weight is explained by whether or not a mother smokes during pregnancy? Does this mean that cigarette smoking does not have a significant impact on birth weight? Explain. Answer: According to the $R^2$ reported in the regression output, about 2.5% of the variation in birth weight is explained by whether or not a mother smokes during pregnancy. This $R^2$ is clearly quite small, suggesting that there are undoubtedly many other factors that impact birth weight (such as the mother's nutrition, race, and the length of pregnancy). However, the low value of the $R^2$ does not mean that smoking during pregnancy has an insignificant impact on birth weight. This requires a test of the statistical significance of $\beta_1$, as was conducted in part (b) (where we found that it was significant).
(d) (4 points) How many of the women in the sample smoked while pregnant? Justify your results. Answer: We can see in the regression output that Mean dependent variable $= \overline{BWGHT} = 118.6996$. Since the OLS formula for $\hat{\beta}_0$ tells us that $\bar{Y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{X}$, we can solve for
$$\overline{CIGDUM} = \frac{\overline{BWGHT} - \hat{\beta}_0}{\hat{\beta}_1} = \frac{118.6996 - 120.0612}{-8.914998} = .153$$
The number of smokers is then $\overline{CIGDUM} \times n = .153 \times 1388 = 212$.
(e) (3 points) Suppose that instead of using the dummy variable $cigdum$ as the explanatory variable, you use the average number of cigarettes the mother smoked per day during pregnancy ($cigs$).
$$bwght = \beta_0 + \beta_1\,cigs + u$$
Why might you prefer this regression to the one reported above? Answer: Employing a continuous measure of cigarette consumption instead of a binary one allows us to quantify the impact per cigarette on birth weight. In particular, we can now calculate the predicted effect on birth weight of smoking only one cigarette per day during pregnancy (or the effect of smoking 20).
(f) (3 points) Here are the results of the regression described in part (e).
What is the predicted birth weight when $cigs = 0$? How about when $cigs = 20$? Answer: We know that the prediction equation for the regression specified here is given by $\widehat{BWGHT} = \hat{\beta}_0 + \hat{\beta}_1\,CIGS$. At $CIGS = 0$: $\widehat{BWGHT} = 119.77 - .514 \times 0 = 119.77$. So the predicted birth weight when $CIGS = 0$ is 119.8 ounces. At $CIGS = 20$: $\widehat{BWGHT} = 119.77 - .514 \times 20 = 109.49$. So the predicted birth weight when $CIGS = 20$ is 109.5 ounces.
(g) (3 points) Does this regression necessarily capture a causal relationship between the child's birth weight and the mother's smoking habits? Explain. Answer: Not necessarily. If there are other unobservable factors such as parents' income or nutritional habits that impact birth weight, and if cigarette smoking is related to those factors, then we may only be uncovering a spurious correlation between smoking and birth weight. If the assumptions of OLS hold for certain, then we are on somewhat safer ground (of course, there is no way to know for certain that they do). Causation in econometrics is a subtle issue that we will return to later in the course.
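The calculations in parts (d) and (f) of this problem are simple enough to verify in code. This sketch uses only the figures quoted in the answers above; `predicted_bwght` is a hypothetical helper name.

```python
# Figures quoted from the regression output in parts (d) and (f).
mean_bwght = 118.6996
beta0_dummy, beta1_dummy = 120.0612, -8.914998
n_births = 1388

# Part (d): the fitted line passes through the sample means, so
# mean(bwght) = beta0 + beta1 * mean(cigdum).
share_smokers = (mean_bwght - beta0_dummy) / beta1_dummy
number_smokers = round(share_smokers * n_births)

def predicted_bwght(cigs, intercept=119.77, slope=-0.514):
    """Part (f): predicted birth weight in ounces for a given number of cigarettes per day."""
    return intercept + slope * cigs

print(round(share_smokers, 3), number_smokers)
print(round(predicted_bwght(0), 2), round(predicted_bwght(20), 2))
```

Because the regressor in part (d) is a dummy, its sample mean is exactly the fraction of mothers who smoked, which is what makes this back-calculation possible.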
4. (22 points) You have collected data on the prices of diamonds at an online retailer (you can assume it's a random sample). In particular, you have the price ($price$) in U.S. dollars and the weight in carats ($carats$) for 380 diamonds which are of similar clarity and cut (2 measures of diamond quality). Here is a scatterplot of the data.
(a) (4 points) Would you be comfortable using homoskedasticity-only standard errors to construct a confidence interval for $\beta_1$ in the following regression
$$price = \beta_0 + \beta_1\,carats + u$$
Why or why not? Explain. Answer: No. Looking at the scatterplot, there is reason to believe that the dispersion in $price$ increases with $carats$: there appears to be more volatility in $price$ when $carats$ is larger. This suggests that the errors are likely to be heteroskedastic. Given this evidence, it would not be wise to assume homoskedasticity since, if the errors are heteroskedastic, the homoskedasticity-only standard errors are incorrect and could well lead to incorrect conclusions.
(b) (3 points) The following simple regression was estimated
$$price = \beta_0 + \beta_1\,carats + u$$
What is the interpretation of $\beta_0$ here? Does the parameter estimate make sense to you? Explain. Answer: $\beta_0$ is the predicted value of the price of a diamond when $carats = 0$ (i.e. the value of the population regression line when $x = 0$). Intuitively, we expect that this should be equal to 0, since a diamond with no weight (i.e. no diamond) should not have any value. The negative parameter estimate is clearly strange, given the interpretation of $\beta_0$. However, a look at the scatterplot reveals that we do not in fact observe diamonds with $carats = 0$ (we can't really) or even any diamonds with weights close to 0. Since we are then extrapolating (i.e. predicting out of sample), we should not try to interpret this coefficient estimate.
(c) (3 points) Using the results reported above, construct a 95% confidence interval for the predicted effect on price of a .2 carat increase in weight. Answer: The general format for a confidence interval for a change in $X$ of $\Delta X$ is $\hat{\beta}_1 \Delta X \pm z_{\alpha/2} \times SE(\hat{\beta}_1) \times \Delta X$. Here we have
$$\hat{\beta}_1 \Delta X \pm 1.96 \times SE(\hat{\beta}_1) \times \Delta X = 5573.34 \times .2 \pm 1.96 \times 187.14 \times .2 = 1114.67 \pm 73.36 = (1041.31, 1188.03)$$
(d) (4 points) Calculate the p-value for the null hypothesis that $\beta_1 = 5200$ against a two-sided alternative. Can you reject the null when $\alpha = .05$? Answer: The t-stat, which is not reported in the Eviews output, is $t = \frac{5573.35 - 5200}{187.14} = 1.995$. The corresponding p-value $= 2\Phi(-1.995) = .046$. Since the p-value $= .046 < \alpha = .05$, we can reject the null in this case.
(e) (4 points) What is the interpretation of $R^2$ in this example? Why do you think it is so different from what we found in the birth weight regressions in the previous question? Answer: The $R^2$ is the percentage of the total variation in price explained by the regression. In this case it is about 65%, which is quite high. This should not be surprising, given the high degree of correlation apparent in the scatterplot. Also, intuitively, it seems likely that weight would be a prominent factor in explaining the price of diamonds, especially given that these are diamonds of similar cut and clarity$^1$ (so I have effectively held at least two measures of quality constant, something I was not able to do in the smoking example). In other words, there are arguably fewer outside factors impacting the dependent variable (price) in this case than in the smoking example.
(f) (4 points) Suppose you also have data on the color of the stone (note that people strongly prefer diamonds with as little color as possible). Let the variable ($color$) be a dummy variable equal to 1 if the diamond is near colorless and equal to 0 if it is faintly yellow (none of the diamonds in this dataset are truly colorless). The following simple regression was estimated
$$price = \beta_0 + \beta_1\,color + u$$
Is there a significant price premium for near colorless stones? (To answer this question, set up a hypothesis test and calculate the p-value for the null hypothesis against a one-sided alternative, using a 1% significance level.) Is a one-sided alternative sensible in this context?
$^1$ I noted this in the set-up of the problem.
Answer: The hypothesis test should be set up as follows: $H_0: \beta_1 = 0$, $H_1: \beta_1 > 0$. The t-stat, which is also reported in the Eviews output, is $t = \frac{649.95 - 0}{100.08} = 6.49$. The corresponding p-value $= \Phi(-6.49) \approx 0$. Given that the p-value $< .01$ (our significance level), we can reject the null that there is no price premium. Here, a one-sided test does seem appropriate, since the problem states that people have strong preferences for diamonds with less color.
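The confidence interval in part (c) and the p-value in part (d) of this problem can be recomputed from the quoted estimates. This is a sketch: `normal_cdf` is a helper built on `math.erfc`, and the coefficient values are copied as they appear in the answers, including the slightly different slope values used in parts (c) and (d).

```python
import math

def normal_cdf(z):
    """Standard normal CDF, Phi(z), via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

se_slope = 187.14

# Part (c): 95% interval for the effect of a 0.2 carat increase.
delta_carats = 0.2
point_effect = 5573.34 * delta_carats
margin = 1.96 * se_slope * delta_carats
interval = (point_effect - margin, point_effect + margin)

# Part (d): two-sided test of H0: beta1 = 5200.
t_stat = (5573.35 - 5200.0) / se_slope
p_value = 2.0 * normal_cdf(-abs(t_stat))

print(round(interval[0], 2), round(interval[1], 2))
print(round(t_stat, 3), p_value)
```

The p-value is only slightly below 0.05, so the rejection in part (d) is a borderline one, and it would not hold at the 1% level.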
5. You are interested in studying the factors that influence a person's decision of whether to go to college. Therefore, you have collected data from 3796 high-school graduates, 6 years after they graduated from high school. You can assume you have an iid sample. In particular, you observe their total years of education ($yrsed$), which ranges from 12 to 18, and whether or not at least one of their parents graduated from college.
(a) (4 points) Out of the 3796 people in your dataset, 954 have at least one parent who graduated from college. The average years of education ($\overline{yrsed}_c$) for this group is 14.8 years with a sample standard deviation ($s_c$) of 1.74. The remaining 2842 people in the sample have parents who did not graduate from college. In this group, the average years of education ($\overline{yrsed}_{nc}$) is 13.5 years with a sample standard deviation ($s_{nc}$) of 1.72. Using this information, construct 95% confidence intervals for the population means of years of education for each group. Solution: For the people with at least one parent who went to college, a confidence interval for $\mu_{yrsed,c}$ is given by $\overline{yrsed}_c \pm 1.96 \times SE\left(\overline{yrsed}_c\right)$. We know that $\overline{yrsed}_c = 14.8$ and $SE\left(\overline{yrsed}_c\right) = \frac{s_c}{\sqrt{n_c}} = \frac{1.74}{\sqrt{954}} = .056$. Therefore the confidence interval is $\overline{yrsed}_c \pm 1.96 \times SE\left(\overline{yrsed}_c\right) = 14.8 \pm 1.96 \times .056 = (14.69, 14.91)$. For the people for whom neither parent went to college, a confidence interval for $\mu_{yrsed,nc}$ is given by $\overline{yrsed}_{nc} \pm 1.96 \times SE\left(\overline{yrsed}_{nc}\right)$. In this case, $\overline{yrsed}_{nc} = 13.5$ and $SE\left(\overline{yrsed}_{nc}\right) = \frac{s_{nc}}{\sqrt{n_{nc}}} = \frac{1.72}{\sqrt{2842}} = .032$. Therefore the confidence interval is $\overline{yrsed}_{nc} \pm 1.96 \times SE\left(\overline{yrsed}_{nc}\right) = 13.5 \pm 1.96 \times .032 = (13.44, 13.56)$.
(b) (5 points) Using a 1% significance level and the information in the setup of part (a), formulate and conduct a test of the null hypothesis that there is no difference in the mean of $yrsed$ between the two groups of people. (You may assume that the two population variances are equal.) Solution: The null and alternative hypotheses are written: $H_0: \mu_c - \mu_{nc} = 0$, $H_1: \mu_c - \mu_{nc} \neq 0$. To test this hypothesis we can either construct a confidence interval (using $z_{\alpha/2} = 2.58$) or simply calculate the p-value. For either one, we need to calculate
$$SE\left(\overline{yrsed}_c - \overline{yrsed}_{nc}\right) = \sqrt{\frac{s_c^2}{n_c} + \frac{s_{nc}^2}{n_{nc}}} = \sqrt{\frac{(1.74)^2}{954} + \frac{(1.72)^2}{2842}} = .065$$
Using the information in part (a), a 99% CI is simply:
$$\overline{yrsed}_c - \overline{yrsed}_{nc} \pm 2.58 \times SE\left(\overline{yrsed}_c - \overline{yrsed}_{nc}\right) = 14.8 - 13.5 \pm 2.58 \times .065 = (1.13, 1.47)$$
Since this CI does not contain zero, we can reject the null hypothesis at the 1% level. To calculate the p-value, we first need the t-stat, which is
$$t = \frac{\overline{yrsed}_c - \overline{yrsed}_{nc} - 0}{SE\left(\overline{yrsed}_c - \overline{yrsed}_{nc}\right)} = \frac{1.30}{.065} = 20$$
The p-value is then $2\Phi(-20) \approx 0$. This small p-value leads us to reject the null at the 1% level, just as we concluded with the confidence interval.
(c) (3 points) Now, again using a 1% level of significance, repeat the exercise in part (b) using a one-sided test (where the alternative is that the mean of $yrsed$ is greater for people for whom a parent graduated from college). What do you conclude now? Solution: The null and alternative hypotheses should now be written as
$$H_0: \mu_c - \mu_{nc} = 0, \qquad H_1: \mu_c - \mu_{nc} > 0$$
The t-stat is now
$$t = \frac{\overline{yrsed}_c - \overline{yrsed}_{nc} - 0}{SE\left(\overline{yrsed}_c - \overline{yrsed}_{nc}\right)} = \frac{1.30}{.065} = 20$$
and the p-value is now $1 - \Phi(20) \approx 0$. So, we definitely reject the null hypothesis at the 1% level.
(d) (3 points) Now, let's analyze the data using a univariate regression. Using the same dataset, you construct a dummy variable ($parcol$), which is equal to 1 if at least one of the person's parents graduated from college and 0 if not. The following simple regression was estimated using data on all $n = 3796$ people:
$$yrsed = \beta_0 + \beta_1\,parcol + u$$
$$\widehat{yrsed} = \underset{(.032)}{13.5} + \underset{(.065)}{1.30}\,parcol, \qquad R^2 = .095$$
What is the interpretation of the constant in this regression? Does it have a meaningful interpretation in this model? Solution: $\beta_0$ is the average years of education for people with no parents who went to college. This does have a meaningful interpretation in this setting: it's simply a group mean. Furthermore, as it should, our estimate, $\hat{\beta}_0$, matches what we were told about the sample estimate of this group mean ($\overline{yrsed}_{nc}$) in the setup of part (a).
(e) (3 points) What is the interpretation of the slope in this regression? Using the regression results, what is the predicted mean of $yrsed$ for people with at least one parent who graduated from college? Solution: $\beta_1$ is the difference in mean years of education between people with at least one parent who went to college and people with no parents who went to college, or $\mu_c - \mu_{nc}$ above. Our estimate, $\hat{\beta}_1$, matches the estimate $\left(\overline{yrsed}_c - \overline{yrsed}_{nc}\right)$ that we calculated in part (b). The predicted mean is $\hat{\beta}_0 + \hat{\beta}_1 = 13.5 + 1.3 = 14.8$, which matches what we were told in the setup of part (a).
(f) (4 points) Repeat the hypothesis test in part (b) using the regression results. Do your conclusions change? Should they? Solution: The null and alternative hypotheses are now:
$$H_0: \beta_1 = 0, \qquad H_1: \beta_1 \neq 0$$
As always, to test this hypothesis we can either construct a confidence interval (using $z_{\alpha/2} = 2.58$) or simply calculate the p-value. Using the regression results: $\hat{\beta}_1 \pm 2.58 \times SE(\hat{\beta}_1) = 1.3 \pm 2.58 \times .065 = (1.13, 1.47)$. To calculate the p-value, we first need the t-stat, which is $t = \frac{\hat{\beta}_1 - 0}{SE(\hat{\beta}_1)} = \frac{1.30}{.065} = 20$. The p-value is then $2\Phi(-20) \approx 0$. These are the same results we found in part (b), which is not surprising since this procedure is equivalent to a difference in means analysis.
(g) (6 points) In addition to the information on parents' college status, you have also collected information
on the distance to the nearest college ($dist$) in 10s of miles ($dist$ has a range of 0 to 16). You decide to
run another regression, this time using $dist$ as the regressor. Here are the results:
$\widehat{yrsed} = \underset{(.038)}{13.9} - \underset{(.014)}{.073}\, dist, \quad R^2 = .01$
What is the interpretation of the constant in this regression? What is the interpretation of the slope?
Is the slope statistically significant at the 5% level? Solution: The intercept here is the expected years
of education for a person who lives 0 miles from a college (probably a current student). The slope is the
expected change in years of education associated with a 10 mile increase in the distance to the nearest
college. To address the issue of significance, we need to test the following null hypothesis:
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
Again, to test this hypothesis we can either construct a confidence interval (using $z_{\alpha/2} = 1.96$) or simply
calculate the p-value. The confidence interval is:
$\hat{\beta}_1 \pm 1.96 \times SE(\hat{\beta}_1) = -.073 \pm 1.96 \times .014 = (-.10, -.05)$
To calculate the p-value, we first need the t-stat, which is
$t = \frac{\hat{\beta}_1 - 0}{SE(\hat{\beta}_1)} = \frac{-.073 - 0}{.014} = -5.21$.
The p-value is then p-value $= 2\Phi(-5.21) \approx 0$. Either way, we can reject the null, so $dist$ does have a
significant (and negative) impact on $yrsed$.
(h) (3 points) What is the interpretation of $R^2$ in this regression? Does the low value of $R^2$ imply that
the coefficient on $dist$ is not statistically significant at the 1% level? Solution: The $R^2$ represents the
fraction of the variation in the dependent variable ($yrsed$) that is explained by the regressor ($dist$). Here we
see that we are explaining only about 1% of the variation in $yrsed$ with $dist$. While this low $R^2$
does tell us that only a small part of the variation in $yrsed$ is explained by the distance to the nearest
college, the question of statistical significance can only be answered with a formal test of significance, i.e.
by calculating a p-value or a t-stat. (Here $|t| = 5.21 > 2.58$, so the coefficient is in fact significant at the
1% level despite the low $R^2$.)
(i) (3 points) Using the regression results, what is the predicted mean years of education for a person who
lives 13 miles from the nearest college? How about a person who lives 100 miles from the nearest college?
Solution: The best predicted value for a person who lives 13 miles away ($dist = 1.3$) is simply
$\widehat{yrsed} = 13.9 - .073 \times dist = 13.9 - .073 \times 1.3 = 13.81$
For a person who lives 100 miles away ($dist = 10$) we have
$\widehat{yrsed} = 13.9 - .073 \times dist = 13.9 - .073 \times 10 = 13.17$
(j) (3 points) Construct a 95% confidence interval for the expected decrease in mean years of education
associated with moving 20 miles farther away from the nearest college. Solution: This is asking for a
confidence interval for $2\beta_1$, which is constructed as:
$CI = 2\hat{\beta}_1 \pm 1.96 \times 2 \times SE(\hat{\beta}_1) = 2 \times (-.073) \pm 1.96 \times 2 \times .014 = (-.20, -.09)$
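The interval in part (j) relies on the fact that scaling a coefficient by a constant scales its standard error by the absolute value of that constant. A small sketch of that calculation follows; the function name `scaled_ci` is a hypothetical helper introduced only for illustration.

```python
def scaled_ci(beta_hat, se, scale, z_crit=1.96):
    """Confidence interval for scale * beta, given an estimate and its SE."""
    center = scale * beta_hat
    half_width = z_crit * abs(scale) * se
    return center - half_width, center + half_width

# Part (j): effect of moving 20 miles (dist rises by 2) using slope -.073, SE .014
lo, hi = scaled_ci(-0.073, 0.014, scale=2)
print(lo, hi)
```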
6. (12 points) The director of marketing for the Durham Bulls baseball team is interested in the number of
games that Duke undergraduates attend. Each undergraduate at Duke was classified according to their year in
school (freshman, sophomore, junior, or senior) and according to the number of times they attended a Durham
Bulls baseball game that year (never, once, or more than once). The proportions of students in the various
classifications are given in the following table:
              Never    Once    More than once
Freshmen      0.08     0.10    0.04
Sophomores    0.04     0.10    0.04
Juniors       0.04     ____    0.09
Seniors       0.02     0.15    0.10
(a) (2 points) What is the value that belongs in the missing space? Explain. Solution: The missing value
is 0.20 since the probabilities (here proportions) must all sum to 1.
(b) (2 points) If there are 6,200 undergraduates, how many of them are freshmen? Solution: To answer
this, we need to know the proportion (probability) of students in the freshman class. In particular, we need
to calculate $P(Year = Freshman)$ using the definition of marginal probability:
$P(Y = y) = \sum_{i=1}^{n} P(X = x_i, Y = y)$
We see that $P(Year = Freshman) = .08 + .10 + .04 = .22$. So the number of freshmen is
$.22 \times 6200 = 1364$.
(c) (4 points) If a student selected at random from the Duke undergraduate population is a junior, what
is the probability that the student has never attended a Bulls game?
Solution: We are interested in finding $P(Games = Never \mid Year = Junior)$. Since
$P(Games = Never \mid Year = Junior) = \frac{P(Year = Junior, Games = Never)}{P(Year = Junior)}$
we need to calculate $P(Year = Junior)$ using the definition of marginal probability:
$P(Y = y) = \sum_{i=1}^{n} P(X = x_i, Y = y)$
We see that $P(Year = Junior) = .04 + .20 + .09 = .33$. Therefore,
$P(Games = Never \mid Year = Junior) = \frac{.04}{.33} = .12$
(d) (4 points) If a student selected at random from the undergraduate population has attended one game,
what is the probability that the student is a senior? Solution: Now we are interested in finding
$P(Year = Senior \mid Games = Once)$. Since
$P(Year = Senior \mid Games = Once) = \frac{P(Games = Once, Year = Senior)}{P(Games = Once)}$
we need to calculate $P(Games = Once)$ using the definition of marginal probability:
$P(Y = y) = \sum_{i=1}^{n} P(X = x_i, Y = y)$
We see that $P(Games = Once) = .1 + .1 + .2 + .15 = .55$. Therefore,
$P(Year = Senior \mid Games = Once) = \frac{.15}{.55} = .27$.
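The marginal and conditional probabilities in parts (b) through (d) can be reproduced from the joint table. This is a minimal sketch, assuming the missing Junior/Once cell has been filled in with 0.20 as found in part (a); the dictionary layout and function names are illustrative choices.

```python
# Joint proportions P(Year, Games); the Junior/Once cell uses the 0.20 from part (a).
joint = {
    "Freshman":  {"Never": 0.08, "Once": 0.10, "More": 0.04},
    "Sophomore": {"Never": 0.04, "Once": 0.10, "More": 0.04},
    "Junior":    {"Never": 0.04, "Once": 0.20, "More": 0.09},
    "Senior":    {"Never": 0.02, "Once": 0.15, "More": 0.10},
}

def marginal_year(year):
    """P(Year = year): sum the joint probabilities across the row."""
    return sum(joint[year].values())

def marginal_games(games):
    """P(Games = games): sum the joint probabilities down the column."""
    return sum(row[games] for row in joint.values())

def games_given_year(games, year):
    """P(Games = games | Year = year)."""
    return joint[year][games] / marginal_year(year)

def year_given_games(year, games):
    """P(Year = year | Games = games)."""
    return joint[year][games] / marginal_games(games)

n_freshmen = round(marginal_year("Freshman") * 6200)
print(n_freshmen, games_given_year("Never", "Junior"), year_given_games("Senior", "Once"))
```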
7. You have a random sample of 5911 individuals from the US Current Population Survey (CPS) and are interested
in the relationship between hourly earnings and education. You estimate the following model:
$Wage_i = \beta_0 + \beta_1 C_i + \varepsilon_i$. (1)
where $Wage_i$ is average hourly earnings for the $i^{th}$ individual and $C_i$ is a dummy equal to one if the $i^{th}$
individual graduated from college and zero if not. The results from a univariate regression of $Wage$ on this
college dummy variable are as follows (homoskedasticity-only standard errors in parentheses):
$\widehat{Wage}_i = \underset{(.11)}{11.7} + \underset{(.17)}{5.09}\, C_i, \quad R^2 = .13$ (2)
where $\widehat{Wage}_i$ is the OLS predicted (fitted) value.
(a) (2 points) What is the interpretation of the intercept in this regression? Solution: The intercept in
this regression represents average hourly earnings for an individual who has not graduated from college.
There is no problem with interpretation since we are not extrapolating in this case. Here, we find that
non-college graduates earn $11.70 per hour on average.
(b) (2 points) Calculate the predicted average hourly earnings for an individual who has graduated from
college. Solution: This is just $11.7 + 5.09 \times (1) = 16.79$, or $16.79 per hour.
(c) (3 points) Calculate a 90% confidence interval for $\beta_0$. Solution:
$11.7 \pm 1.645 \times (.11) = (11.52, 11.88)$
(d) (5 points) Calculate a 95% confidence interval for the average hourly earnings of a college graduate,
given that $\widehat{Cov}(\hat{\beta}_0, \hat{\beta}_1) = -0.01$. Solution: To answer this question, you have to
use the covariance that is provided above. The parameter you are estimating is $\beta_0 + \beta_1$, so that the
confidence interval is
$\hat{\beta}_0 + \hat{\beta}_1 \pm 1.96 \times SE(\hat{\beta}_0 + \hat{\beta}_1)$
where
$SE(\hat{\beta}_0 + \hat{\beta}_1) = \sqrt{\widehat{Var}(\hat{\beta}_0 + \hat{\beta}_1)} = \sqrt{\widehat{Var}(\hat{\beta}_0) + \widehat{Var}(\hat{\beta}_1) + 2\widehat{Cov}(\hat{\beta}_0, \hat{\beta}_1)} = \sqrt{.11^2 + .17^2 + 2(-.01)} = .145$
so the CI is
$11.7 + 5.09 \pm 1.96 \times .145 = 16.79 \pm .284 = (16.5, 17.1)$
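The standard error of a sum of two coefficients depends on their covariance, which is why part (d) supplies it. A short sketch of the calculation follows, taking the covariance to be negative (the sign that reproduces the reported standard error of .145); `se_of_sum` is a hypothetical helper name.

```python
import math

def se_of_sum(se_a, se_b, cov_ab):
    """SE of (a + b): sqrt(Var(a) + Var(b) + 2 Cov(a, b))."""
    return math.sqrt(se_a ** 2 + se_b ** 2 + 2.0 * cov_ab)

# Part (d): SEs .11 and .17, covariance of the two estimates -0.01
se_sum = se_of_sum(0.11, 0.17, -0.01)
point = 11.7 + 5.09
ci = (point - 1.96 * se_sum, point + 1.96 * se_sum)
print(se_sum, ci)
```

Ignoring the covariance term would give a standard error of about .20 instead, and so an interval that is too wide.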
(e) (4 points) Is the difference in mean $Wage$ between college graduates and non-college graduates
statistically significant at the 1% level? Solution: We can test this with the following t-statistic:
$t = \frac{5.09 - 0}{.17} = 29.94 > 2.58$
So the answer is yes.
(f) (3 points) Is the difference in mean $Wage$ between college graduates and non-college graduates large, in
economic terms? Solution: It is very large. In fact, according to this regression, college graduates earn
about 44% more per hour than non-college graduates ($5.09 / 11.7 \approx .44$). This is a substantial difference,
especially once you take into account how many working hours there are in a year!
(g) (2 points) How much of the variation in average hourly earnings is our regression explaining? Is this
surprising? Solution: According to the $R^2$, about 13%. This doesn't seem like a lot, but shouldn't
be too surprising since this is cross-sectional data and there are undoubtedly lots of other factors that
determine wages, such as ability, race, gender, profession, experience, etc.
(h) (4 points) Consider a two-sided test where the null hypothesis is that $\beta_1 = 5$. Calculate the p-value for
this test. Can you reject the null at the 10% level? Solution: First, let's calculate the t-statistic:
$t = \frac{5.09 - 5}{.17} = .53$
Then the p-value is simply
$2\Phi(-|.53|) = 2 \times .298 = .596$
With this large p-value ($> .10$) we clearly can't reject the null.
(i) (4 points) The standard errors in the estimated regression (2) were calculated assuming homoskedasticity.
Now you re-estimate the model using the same data, but allowing for heteroskedasticity, and your results
are as follows:
$\widehat{Wage}_i = \underset{(.115)}{11.7} + \underset{(.175)}{5.09}\, C_i$ (3)
Do the results suggest that the standard errors from the first model you have estimated (i.e. (2)) are
reliable? Solution: Since the difference is quite small, both in absolute and in relative terms,
heteroskedasticity does not seem to be a problem in this setting.
(j) (4 points) The point estimates for $\beta_0$ and $\beta_1$ in (2) and (3) are exactly the same. Does this make
sense? Why or why not? Solution: Absolutely. The adjustment in the standard errors that allows
for heteroskedasticity has nothing whatsoever to do with the point estimates, which remain numerically
identical. As an aside: There are other estimators (apart from OLS) that do change the point estimates
because they use information about the variance of the error when they calculate the point estimates
(these are called GLS, or Generalized Least Squares, of which WLS is a practical form), but this does not
happen with OLS, so the answer is yes. The point estimates wouldn't change even if the data were very
heteroskedastic.
(k) (4 points) A classmate sees your results and concludes that there is clear evidence that graduating
from college increases an individual's expected earnings. Do you agree with this conclusion (based on the
results above)? Why, or why not? Solution: You shouldn't agree with this conclusion. While there is
ample reason to believe that there's a strong and positive return to education, this simple comparison is
not enough to establish this empirically. As I mentioned in part g (and also discussed in class), there are
lots of other things that determine a person's wage (like ability, for example) that may also be correlated
with the included regressor. To establish a causal effect, we would need to control for these other factors.
(l) (5 points) Suppose you want to estimate the $Wage$ ratio between college graduates and non-college
graduates. Let $\theta$ denote the true value for this parameter. Another classmate suggests using the following
estimator:
$\hat{\theta} = \frac{\hat{\beta}_0}{\hat{\beta}_1}$
Would you accept your classmate's suggestion? In particular, is $\hat{\theta}$ a consistent estimator for $\theta$? If so,
prove that it is. If not, prove that it isn't and propose an alternative but consistent estimator. Solution:
You shouldn't. The ratio between $Wage$ for college and non-college graduates is $\frac{\beta_0 + \beta_1}{\beta_0}$,
not $\frac{\beta_0}{\beta_1}$, so that the proposed estimator is definitely not a consistent estimator of
$\theta = \frac{\beta_0 + \beta_1}{\beta_0}$. In fact, from OLS Assumption 1, we know that
$\hat{\beta}_0 \xrightarrow{p} \beta_0$ and $\hat{\beta}_1 \xrightarrow{p} \beta_1$, so that
$\hat{\theta} = \frac{\hat{\beta}_0}{\hat{\beta}_1} \xrightarrow{p} \frac{\beta_0}{\beta_1} \neq \theta$
Instead, a consistent estimator of the ratio we are actually interested in is
$\hat{\theta}^{+} = \frac{\hat{\beta}_0 + \hat{\beta}_1}{\hat{\beta}_0} \xrightarrow{p} \frac{\beta_0 + \beta_1}{\beta_0} = \theta$
(m) (4 points) Suppose that you happen to know that $\hat{\beta}_0$ and $\hat{\beta}_1$ are unbiased estimators for
$\beta_0$ and $\beta_1$ respectively. Does this imply that $\frac{\hat{\beta}_0}{\hat{\beta}_1}$ will be an unbiased
estimator for $\frac{\beta_0}{\beta_1}$? Why or why not? Solution:
Although expectation is a linear operator, a ratio is NOT a linear function, and thus it is not generally
true that
$E\left(\frac{X}{Y}\right) = \frac{E(X)}{E(Y)}$
so while it is true that
$\frac{E(\hat{\beta}_0)}{E(\hat{\beta}_1)} = \frac{\beta_0}{\beta_1}$
it is not generally true that
$E\left(\frac{\hat{\beta}_0}{\hat{\beta}_1}\right) = \frac{\beta_0}{\beta_1}$
In other words, unbiasedness of the $\hat{\beta}$s does not imply unbiasedness of the ratio. While it is certainly
true that
$\mathrm{plim}\, \frac{\hat{\beta}_0}{\hat{\beta}_1} = \frac{\mathrm{plim}\, \hat{\beta}_0}{\mathrm{plim}\, \hat{\beta}_1} = \frac{\beta_0}{\beta_1}$
this has nothing to do with unbiasedness.
8. (4 points) Suppose now that you know that the true value of the parameter $\beta_1$ is 5. Does this imply that
$\hat{\beta}_1$ (the OLS estimator for the parameter $\beta_1$) is a biased estimator for $\beta_1$? Justify your answer.
Solution: Definitely not. An estimator $\hat{\beta}_1$ is biased for $\beta_1$ if $E(\hat{\beta}_1) \neq \beta_1$, and we
already know from OLS Assumption 1 that $E(\hat{\beta}_1) = \beta_1$. We are told that $\beta_1 = 5$, while our
regression provides an estimate $\hat{\beta}_1 = 5.09$. Knowing the true value of $\beta_1$ doesn't change the fact that
$E(\hat{\beta}_1) = \beta_1$ if OLS Assumption 1 holds. Given that samples vary, it will virtually never be the case
that our estimate $\hat{\beta}_1$ hits the true value of $\beta_1$ exactly. It would make no sense at all to evaluate how
good an estimator is based on whether we get the right value of the parameter every time.
4 Empirical Multivariate
1. Using a sample of 534 i.i.d. observations from the 1985 CPS (Current Population Survey), you have estimated
that the relation between wages $W_i$ and years of schooling $S_i$ is described by the following relation:
$\ln W_i = \hat{\beta}_0 + \hat{\beta}_1 S_i = \underset{(.10743)}{1.06} + \underset{(.00809)}{0.077}\, S_i$
where heteroskedasticity-robust standard errors are reported in parentheses, and $\ln W_i$ is the natural logarithm
of the hourly wage. You can assume that all the usual OLS assumptions hold.
(a) (4 points) What is the economic interpretation of the estimated slope in the above regression? Solution:
This is a log-linear regression, so one more year of education is expected to increase wages by about 8%.
(b) (3 points) Does the relation between schooling and wage appear important, in economic terms? Solution:
Yes, it does; an 8% increase in wages is associated with only one additional year of schooling. No
argument on the statistical significance was required here!
(c) (3 points) What is the expected level of $\ln W_i$ for an individual with 15 years of schooling? Solution:
$\widehat{\ln(W_i)} = 1.06 + 0.077 \times 15 = 2.215$
(d) (4 points) In the above regression, is the slope significantly different from zero, using a 1% significance
level? Solution: Form hypotheses: $H_0: \beta_1 = 0$, $H_1: \beta_1 \neq 0$. The test statistic is constructed as follows:
$t = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)} = \frac{0.077}{0.00809} = 9.52 > t_{1\%\ \text{critical value}} = 2.58$
$\Rightarrow$ Reject the null that the slope coefficient is equal to zero.
(e) (5 points) You want to test the null hypothesis that $\beta_1 = 0.1$, against a two-sided alternative. Calculate
the p-value for this test. Solution: Form test statistic:
$t = \frac{\hat{\beta}_1 - 0.1}{SE(\hat{\beta}_1)} = \frac{0.077 - 0.1}{0.00809} = -2.84$.
The p-value can then be computed as follows:
p-value $= 2\Phi(-|t|) = 2 \times 0.0023 = 0.0046 \approx 0.5\%$.
(f) (5 points) Calculate a 95% confidence interval for the predicted effect on $\ln W_i$ of increasing schooling
by 3 years. Solution:
$CI_{95\%} = 3 \times [\hat{\beta}_1 \pm t_{2.5\%\ \text{crit. value}} \times SE(\hat{\beta}_1)] = 3 \times [0.077 \pm 1.96 \times 0.00809] = [0.183; 0.279]$
(g) (4 points) Would you expect the point estimates of the coefficients $\beta_0$ and $\beta_1$ to change if you did not
assume homoskedasticity? Why, or why not? Solution: No, OLS point estimates are not affected by
assumptions on $Var(u_i \mid X_i)$.
(h) (4 points) You calculate the standard errors assuming homoskedasticity, and the estimated standard errors
for $\hat{\beta}_0$ and $\hat{\beta}_1$ are respectively 0.05986 and 0.00807. Do you think heteroskedasticity is a serious
concern in this regression? Solution: Probably yes. The SEs look almost identical for the slope but the SE for
the intercept is now about half as large as it was before.
(i) Now you complicate the model, since you want to study whether the relation between wage and schooling
changes if the individual is a Trade Union member. You estimate the following regression
$\ln W_i = \underset{(0.13)}{0.91} + \underset{(0.25)}{0.75}\, U_i + \underset{(0.01)}{0.08}\, S_i - \underset{(0.019)}{0.035}\, U_i S_i$
where $U_i$ is a dummy variable equal to 1 if the worker is a Trade Union member, and zero otherwise.
(5 points) What is the predicted value of $\ln W_i$ for a unionized individual with 12 years of schooling?
Solution:
$\widehat{\ln(W_i)} = 0.91 + 0.75 + 0.08 \times 12 - 0.035 \times 12 = 2.2$
(j) (5 points) Does the effect of schooling on (logarithm of) wages differ importantly, in economic terms,
between unionized and non-unionized individuals? Solution: Yes, it does. The effect of schooling is
measured by the slope. For unionized workers the estimated slope coefficient is 0.045 (0.08 - 0.035), while
for non-unionized workers it is equal to 0.08. Thus, the increase in log wages from additional schooling for
unionized workers is on average about 44% (!) less than that for non-unionized individuals
($0.035 / 0.08 \approx 0.44$).
(k) (5 points) Test the null hypothesis that the predicted effect of increasing the level of schooling on $\ln W_i$
is the same for unionized and non-unionized individuals, against a two-sided hypothesis, with a 10%
significance level. Solution: Here we have to test that the slope coefficients are the same for two
categories of workers. The difference in the slopes is captured by the coefficient on the cross-product
variable, $U_i S_i$. Thus, our hypotheses are formulated as follows: $H_0$: coef. on $U_i S_i = 0$,
$H_1$: coef. on $U_i S_i \neq 0$. The t-statistic is calculated in the usual way:
$t\text{-stat} = \frac{-0.035}{0.019} = -1.84$.
Since the absolute value of the t-statistic is greater than 1.64 (the critical value for the given
level of significance), we reject the null.
2. (25 points overall) Earnings functions attempt to find the determinants of earnings, using both continuous
and binary variables. One of the central questions analyzed in this relationship is the returns to education.
Your estimated regression looks like the following:
$\ln(Earn) = \underset{(0.16)}{0.01} + \underset{(0.012)}{0.101}\, Educ + \underset{(0.006)}{0.033}\, Exper - \underset{(0.0001)}{0.0005}\, Exper^2 + e$
where $Earn$ is average hourly earnings, $Educ$ is years of education, $Exper$ is years of experience, and $e$ is the
(estimated) error.
(a) (4 points) What is the effect of an additional year of experience for a person who has worked for 20
years? What is the effect for a person who has worked for 30 years?
Solution: For a person going from 20 to 21 years of experience,
$\Delta \ln(Earn) = 0.033 - (0.0005 \times 21^2 - 0.0005 \times 20^2) = 0.0125$
Or a 1.25% increase in earnings. For a person going from 30 to 31 years of experience,
$\Delta \ln(Earn) = 0.033 - (0.0005 \times 31^2 - 0.0005 \times 30^2) = 0.0025$
Or a 0.25% increase in earnings.
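Because experience enters quadratically, the effect of one more year depends on the starting level. The sketch below evaluates the change in predicted log earnings directly from the estimated coefficients; the function name `exper_effect` is an illustrative assumption.

```python
def exper_effect(exper, b1=0.033, b2=-0.0005):
    """Change in predicted ln(Earn) when experience rises from exper to exper + 1."""
    after = b1 * (exper + 1) + b2 * (exper + 1) ** 2
    before = b1 * exper + b2 * exper ** 2
    return after - before

effect_20 = exper_effect(20)  # going from 20 to 21 years
effect_30 = exper_effect(30)  # going from 30 to 31 years
print(effect_20, effect_30)
```

Education and the intercept drop out of the difference, so they are not needed here.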
You want to find the effect of introducing two variables, gender and marital status. Accordingly you
specify a binary variable that takes on the value of one for females and is zero otherwise (Female), and
another binary variable that is one if the worker is married but is zero otherwise (Married). Adding these
variables to the regressors results in:
$\ln(Earn) = \underset{(0.16)}{0.21} + \underset{(0.012)}{0.093}\, Educ + \underset{(0.006)}{0.032}\, Exper - \underset{(0.0001)}{0.0005}\, Exper^2 - \underset{(0.049)}{0.289}\, Female + \underset{(0.056)}{0.062}\, Married + e$
(b) (4 points) Are the coefficients of the two added binary variables individually statistically significant?
Solution: Female is significant at the 1% level, but Married is not significant.
$H_0: \beta = 0$
$H_1: \beta \neq 0$
$t_{Female} = \frac{-0.289 - 0}{0.049} = -5.9$
$t_{Married} = \frac{0.062 - 0}{0.056} = 1.1$
(c) (4 points) In percentage terms, how much less do females earn per hour, controlling for education and
experience? Is the difference economically important?
Solution: Females earn approximately 29% less than their male counterparts. This is very economically
significant.
(d) (4 points) In percentage terms, how much more do married people make? Is the difference economically
important?
Solution: Married people make 6.2% more than singles. This is also economically significant.
(e) (4 points) In your final specification, you allow for the binary variables to interact. The results are as
follows:
$\ln(Earn) = \underset{(0.16)}{0.14} + \underset{(0.011)}{0.093}\, Educ + \underset{(0.006)}{0.032}\, Exper - \underset{(0.001)}{0.0005}\, Exper^2 - \underset{(0.075)}{0.158}\, Female + \underset{(0.080)}{0.173}\, Married - \underset{(0.097)}{0.218}\, Female \times Married + e$
In percentage terms, how much less do single females earn per hour, when compared with single males,
keeping education and experience constant?
Solution: Single females earn about 15.8% less than single males with comparable education and
experience.
(f) (5 points) In percentage terms, how much less do married females earn per hour, when compared with
married males, keeping education and experience constant?
Solution: Married women earn about 37.6% less than the baseline of married males
($-0.158 - 0.218 = -0.376$).
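The gender gaps in parts (e) and (f) come from differencing the dummy terms of the interacted regression. The following sketch does that bookkeeping explicitly; the dictionary keys and the helper `dummy_part` are illustrative names. It also reports the exact percentage implied by a log difference, $e^{gap} - 1$, since reading a log gap of this size as a percentage is only an approximation.

```python
import math

# Coefficients on the binary variables in the interacted specification
coef = {"Female": -0.158, "Married": 0.173, "Female_x_Married": -0.218}

def dummy_part(female, married):
    """Contribution of the binary variables to predicted ln(Earn)."""
    return (coef["Female"] * female
            + coef["Married"] * married
            + coef["Female_x_Married"] * female * married)

# Female minus male, holding marital status (and education, experience) fixed
gap_single = dummy_part(1, 0) - dummy_part(0, 0)
gap_married = dummy_part(1, 1) - dummy_part(0, 1)

# Exact proportional difference implied by a log gap
exact_married = math.exp(gap_married) - 1.0
print(gap_single, gap_married, exact_married)
```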
3. (12 points overall) Suppose that, in the population, the budget share spent on education related goods in
family $i$ is accurately described by the following linear model
$share_i = -0.7 + 0.1 \ln PCE_i + 0.03\, schooling_i + u_i$
where $share_i$ is the budget share for education related goods (that is, the proportion of the total budget spent
on those goods), $PCE_i$ is total expenditure per person, and $schooling_i$ is average years of schooling in the
family.
(a) What is the predicted budget share spent on education related goods for a family with total expenditures
per person equal to 600, and with average schooling equal to 12?
Solution:
$-0.7 + 0.1 \ln(600) + 0.03 \times 12 \approx 0.30$
Or about 30% of the budget.
(b) (2 points) Suppose that you estimate the above model omitting the variable $schooling_i$. Describe the
condition necessary for the OLS estimator of the effect of $\ln PCE_i$ on $share_i$ to be still unbiased. Do you
think these conditions are satisfied here? If they are not, do you think that the coefficient of the included
regressor will be biased upwards or downwards?
Solution: It would have to be that $\ln(PCE_i)$ and $schooling_i$ are uncorrelated. This is unlikely to hold
true. We would expect total expenditure per head to be positively correlated with schooling, so if the
schooling variable is omitted, we'd expect the coefficient in front of $\ln(PCE_i)$ to have upward bias.
(c) (2 points) You are still omitting the variable $schooling_i$ from the estimated regression. Suppose that
$cov(\ln PCE_i, schooling_i) = 0.7$ and $var(\ln PCE_i) = 0.36$. Compute the value of the asymptotic bias of
the OLS estimate for the effect of $\ln PCE_i$ on $share_i$.
Solution: Using the definition of omitted variable bias, we know that:
$\hat{\beta}_1 \xrightarrow{p} \beta_1 + \beta_2 \frac{\sigma_{\ln(PCE_i),\, schooling_i}}{\sigma^2_{\ln(PCE_i)}}$
Therefore, the bias is $\beta_2 \times \frac{0.7}{0.36} = 0.03 \times \frac{0.7}{0.36} = 0.0583$.
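The asymptotic bias in part (c) is the coefficient on the omitted variable times the ratio of the covariance to the variance of the included regressor. A minimal sketch of that arithmetic follows, with `omitted_variable_bias` as a hypothetical helper name.

```python
def omitted_variable_bias(beta_omitted, cov_xz, var_x):
    """Asymptotic bias of the included slope: beta_omitted * Cov(x, z) / Var(x)."""
    return beta_omitted * cov_xz / var_x

# Part (c): schooling coefficient 0.03, covariance 0.7, variance 0.36
bias = omitted_variable_bias(0.03, 0.7, 0.36)

# Probability limit of the short-regression slope on ln(PCE): true slope plus bias
plim_slope = 0.1 + bias
print(bias, plim_slope)
```

The bias is positive because both the omitted coefficient and the covariance are positive, which agrees with the direction argued in part (b).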
4. (15 points) The following data, taken from Forbes Magazine's 1996 survey of CEO (chief executive officer)
compensation, contains information on CEO compensation at 770 publicly traded firms. Each firm in the
dataset had only one CEO in 1996. For each of the 770 firms in the dataset, we observe:
Salbon - The CEO's salary plus bonus (in 1000s of dollars)
Logsalbon - The natural log of the CEO's salary plus bonus (in 1000s of dollars)
Logsales - The natural log of the firm's sales in 1996 (in millions of dollars)
Fiveret - The firm's five year average total return (in percentage)
Age - The CEO's age in 1996 (in years)
Grad - A dummy variable equal to 1 if the CEO attended a post-graduate program (e.g. MBA), 0 if not
Computer - A dummy variable equal to 1 if the firm is in the computer industry, 0 if not
Financial - A dummy variable equal to 1 if the firm is in the financial industry, 0 if not
Here are the results of a regression of Logsalbon on the covariates:
$\widehat{Logsalbon} = \underset{(.25)}{3.65} + \underset{(.019)}{.31}\, Logsales + \underset{(.0012)}{.0016}\, Fiveret + \underset{(.003)}{.014}\, Age - \underset{(.039)}{.037}\, Grad - \underset{(.064)}{.0037}\, Computer + \underset{(.049)}{.156}\, Financial$
$R^2 = .32$
Here are some tests of a few joint hypotheses:
$H_0: \beta_2 = \beta_4 = 0$, F-statistic = 1.423
$H_0: \beta_2 = \beta_5 = 0$, F-statistic = 1.026
$H_0: \beta_4 = \beta_5 = 0$, F-statistic = 0.473
$H_0: \beta_2 = \beta_4 = \beta_5 = 0$, F-statistic = 0.959
(a) (4 points) People frequently complain about the high salaries and bonuses earned by CEOs. Some
suggest that their compensation is almost totally disconnected from the performance of their firms. Using
the company's five year return ($Fiveret$) as a measure of a firm's performance, do you find evidence that
CEOs are rewarded for good performance? What effect do you find? Justify your answer.
Solution: Here the RHS variable is measured in levels, and the LHS in logs, so the relationship is log-linear
for $Fiveret$. Thus, the coefficient of .0016 on $Fiveret$ implies that a one percentage point increase in
$Fiveret$ is expected to increase $Salbon$ by .16% (a pretty small effect). Moreover, the t-statistic here is
$t = \frac{.0016}{.0012} = 1.33$, which is less than 1.96, so the effect is not statistically significantly
different from zero.
(b) (4 points) Do CEOs at larger firms earn higher salaries? If so, how much? Justify your answer using
$Logsales$ as your measure of firm size.
Solution: Here the RHS variable is measured in logs as well, so the relationship is log-log for $Logsales$.
Thus, the coefficient of .31 on $Logsales$ implies that a 1% increase in sales is expected to increase
$Salbon$ by .31%. Moreover, the t-statistic here is $t = \frac{.31}{.019} = 16.3$, which is a lot bigger than 1.96,
so the effect is statistically significantly different from zero.
(c) (4 points) Do the data provide any evidence that CEOs receive a premium (i.e. higher earnings) for
having attended a post-graduate program? What effect do you find? Explain.
Solution: Here the RHS variable is a dummy, so we will have to talk about the effect on $Logsalbon$
directly. Thus, the coefficient of $-.037$ means that having a post graduate degree actually lowers expected
$Logsalbon$ by .037 (roughly 3.7% lower $Salbon$). However, the t-statistic here is
$t = \frac{-.037}{.039} = -.948$, which is less than 1.96 in absolute value, so the
effect is not statistically significantly different from zero.
(d) (3 points) Suppose I re-run the regression using $Salbon$ instead of $Logsalbon$ as the dependent variable
and find that $R^2 = .34$. On the basis of this evidence, should I conclude that it's better to run this new
regression instead of the earlier one? Explain.
Solution: No, you can't compare $R^2$s when the LHS variables are different, so you don't have the
information to make a reasonable comparison. (If you wanted to make a meaningful comparison, you
would need information I didn't give you, like the output of the new regression, and scatter plots of
the two dependent variables against some of the independent variables. Then you could use the method
outlined on page 205 for identifying nonlinearities. You would also need to think carefully about whether
the relationship should involve logs, i.e. have a theory in mind.)
5. (24 points) An avid basketball fan, you have collected data on the average points-per-game ($Points$), years
in the league (Exper), age (Age), and the number of years played in college (College) for 269 basketball players
in the NBA. Using this dataset, you estimate the following regression:
$\widehat{Points} = \underset{(7.44)}{35.2} + \underset{(.399)}{2.36}\, Exper - \underset{(.026)}{.077}\, (Exper)^2 - \underset{(.314)}{1.07}\, Age - \underset{(.437)}{1.29}\, College, \quad R^2 = .129$
(a) (4 points) Is it reasonable to include $(Exper)^2$ in the above regression? Why or why not? On the basis
of the estimation results, would you recommend dropping it? Why or why not?
Solution: Including $(Exper)^2$ in the regression allows for a non-linear relationship between points scored
and experience. We might reasonably think that a player will benefit from experience playing in the pros,
but that the returns to that experience (in terms of points per game) might fall over time. I would not
recommend dropping it from the regression because it is significantly different from zero at the 1% level
(P-value $= 2\Phi(-|t^{act}|) = 2\Phi\left(-\frac{.077}{.026}\right) = 2\Phi(-2.96) \approx 0.003 < .01$).
(b) (5 points) Holding college years and age fixed, what is the expected increase in points-per-game
associated with increasing years of experience from 7 to 8? Again, holding college years and age fixed, at what
value of experience does the next year of experience actually reduce points-per-game? Does this make
sense? Explain.
Solution: For a quadratic regression (like $Y = \beta_0 + \beta_1 X + \beta_2 X^2 + u$), the predicted effect on $Y$ of a
one-unit change in $X$ is given by
$\Delta \hat{Y} = \left[\hat{\beta}_0 + \hat{\beta}_1 (X + 1) + \hat{\beta}_2 (X + 1)^2\right] - \left[\hat{\beta}_0 + \hat{\beta}_1 X + \hat{\beta}_2 X^2\right] = \hat{\beta}_1 + 2\hat{\beta}_2 X + \hat{\beta}_2$
(Hint: You could have just used this formula without deriving it.) In our case,
$\Delta \widehat{Points} = \hat{\beta}_1 + 2\hat{\beta}_2 Exper + \hat{\beta}_2 = 2.36 - 2 \times .077 \times 7 - .077 \approx 1.21$
so we expect points to increase by 1.21 on average. To answer the second question,
you need to find where the quadratic function for experience hits its peak and starts to decline. To do
this, set the derivative of $\beta_1 X + \beta_2 X^2$ equal to zero and solve for $X$. We then have
$X = -\frac{\beta_1}{2\beta_2}$. In our case, $Exper_0 = \frac{2.36}{2 \times .077} \approx 15.32$, so the increase
from 15 to 16 years of experience would actually reduce
points per game. This is a very high level of experience, and few players last this long in the NBA, so
we can essentially ignore this prediction. (It may be picking up some of the effects of age that are not
captured by including it only linearly).
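The turning point of the experience profile can be located both with the derivative formula and by checking the one-year effects directly. The sketch below does both under the estimated coefficients; `one_year_effect`, `turning_point` and `first_decline` are illustrative names.

```python
B1, B2 = 2.36, -0.077  # coefficients on Exper and Exper^2

def one_year_effect(exper):
    """Predicted change in points when experience rises from exper to exper + 1."""
    return B1 + 2.0 * B2 * exper + B2

# Peak of the quadratic b1*X + b2*X^2, from setting the derivative to zero
turning_point = -B1 / (2.0 * B2)

# First whole year of experience at which one more year lowers predicted points
first_decline = next(x for x in range(0, 40) if one_year_effect(x) < 0)
print(one_year_effect(7), turning_point, first_decline)
```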
(c) (4 points) What is the interpretation of the coefficient on college? Is it significantly different from zero
at the 5% level? Does this make sense? Explain. (Hint: NBA players can enter the NBA before finishing
college and some, like Kevin Garnett, start playing in the NBA right after high school.)
Solution: The coefficient on college implies that an additional year of college is expected to decrease
points per game by 1.29. Since the P-value
$= 2\Phi(-|t^{act}|) = 2\Phi\left(-\frac{1.29}{.437}\right) = 2\Phi(-2.95) \approx 0.003 < .05$,
it is significantly different from zero at the 5% (or even 1%) level. Many of the most promising players
leave college early, or, in some cases, forego college altogether, to play in the NBA. These top players
tend to score the most points. It is not that more college hurts scoring, but that less college is indicative of
super-star potential.
(d) (3 points) You also have data on the position played by each player in the sample (either $Center$, $Guard$,
or $Forward$). Suppose you now decide to include dummy variables for the position played by each player
in the regression. Here are the regression results:
$\widehat{Points} = \underset{(7.73)}{33.2} + \underset{(.410)}{2.28}\, Exper - \underset{(.026)}{.072}\, (Exper)^2 - \underset{(.323)}{1.04}\, Age - \underset{(.418)}{1.34}\, College + \underset{(1.21)}{2.30}\, Guard + \underset{(1.23)}{1.47}\, Forward$
$R^2 = .140$
Why have I not included a dummy variable for Center? Will this bias the coefficients on the remaining
dummy variables (i.e. $Guard$ and $Forward$)? Why or why not?
Solution: Including all three position dummy variables would be redundant, and result in perfect
multicollinearity (as you showed in the last problem set). Each player falls into one of the three categories, and
the overall intercept in this regression is the intercept for centers. The choice of which dummy variable
(or $\beta_0$) to drop has no impact on bias; it merely changes the interpretation of the coefficients.
(e) (4 points) Holding everything else fixed, does a guard score more points than a center? How much
more? Is the difference statistically significant?
Solution: Yes, a guard is estimated to score about 2.3 points more per game on average than a center,
holding all other regressors fixed. The P-value
$= 2\Phi(-|t^{act}|) = 2\Phi\left(-\frac{2.30}{1.21}\right) = 2\Phi(-1.90) \approx .057 > .05$
so the difference is not statistically different from zero at the 5% level (although it is close).
(f) (4 points) Holding everything else fixed, does a guard score more points than a forward? How much
more? Is the difference statistically significant? Can you answer this question with the regression output
provided? Why or why not?
Solution: Yes, a guard is estimated to score about .83 (= 2.30 - 1.47) points more per game on average
than a forward, holding all other regressors fixed. We cannot test the statistical significance of this without
more information, such as the covariance between $\hat{\beta}_5$ and $\hat{\beta}_6$. (The easiest way to test this
would be to make $Forward$ the omitted category instead of $Center$).
6. (25 points) Your best friend is applying to medical school this year. To assess her chances of being accepted
to the school of her choice, you have collected data on applicants to a major west coast medical school. For 120
applicants you observe whether they were accepted ($accept$), the applicant's age ($age$), whether the applicant
is male ($male$), the applicant's grade point average ($gpa$), and the applicant's MCAT score ($mcat$). An LPM
regression of $accept$ on the above explanatory variables yields:
$\widehat{accept} = \underset{(.321)}{-.786} - \underset{(.012)}{.002}\, age + \underset{(.076)}{.113}\, male + \underset{(.086)}{.016}\, gpa + \underset{(.006)}{.048}\, mcat, \quad R^2 = .35$
A Logit of $accept$ on the same explanatory variables yields:
$\widehat{\Pr}(accept = 1 \mid Xs) = F\left(\underset{(2.32)}{-9.90} + \underset{(.071)}{.002}\, age + \underset{(.491)}{.805}\, male - \underset{(.648)}{.111}\, gpa + \underset{(.076)}{.379}\, mcat\right)$
(a) (4 points) Is the eect of :cat signicant at the 5% level in each model?
Solution: For the LPM, P-value = 2
_
t
oct
_
= 2
_
.048
.006
_
= 2 (8) - 0, which is signicant at
the 5% (or any) level. For the Logit, P-value = 2
_
t
oct
_
= 2
_
.379
.076
_
= 2 (4.99) - 0, which
is also signicant at the 5% (or any) level. Aside: This is not surprising since the MCAT would seem to
be a major factor in determining admissions.
(b) (5 points) Focusing on the LPM output, holding everything else constant, what is the expected increase
in the probability of acceptance associated with a one point increase in the applicant's MCAT score? Can
you perform the same calculation for the Logit model? If yes, calculate the expected increase. If not,
explain why you cannot.
Solution: For the LPM, the coefficient on mcat implies that a one unit increase in an applicant's
MCAT score is expected to increase the probability of admission by .048, that is, by 4.8 percentage points.
Because the Logit model is nonlinear, the effect of a one point increase depends on where we start, so we
cannot perform the same calculation from the Logit output alone. We would need
to know the initial MCAT score as well as the applicant's age, gender and GPA.
(c) (4 points) Focusing again on the LPM, what is the predicted probability of acceptance for a 22 year old,
female applicant, with a GPA of 1.2 and an MCAT of 16? Does this make sense? If not, what do you
think is going wrong?
Solution: According to the LPM results, the predicted probability of acceptance for a 22 year old, female
applicant, with a GPA of 1.2 and an MCAT of 16 is
$$\Pr(accept = 1 \mid Xs) = -.786 - .002 \times 22 + .113 \times 0 + .016 \times 1.2 + .048 \times 16 \approx -.04$$
This, of course, does not make sense because probabilities must be between 0 and 1. This is the well
known problem that the LPM fits a straight line (or more generally, a linear function) through the data,
so it can make predictions outside the (0, 1) interval.
(d) (4 points) Focusing now on the Logit output, what is the predicted probability of acceptance for the
same applicant considered in part c? Does this make sense? Do the predictions from part c and d differ?
If so, why?
Solution: According to the Logit results, the predicted probability of acceptance for a 22 year old, female
applicant, with a GPA of 1.2 and an MCAT of 16 is
$$\Pr(accept = 1 \mid Xs) = F(-9.90 + .002 \times 22 + .805 \times 0 - .111 \times 1.2 + .379 \times 16) = F(-3.93) = \frac{e^{-3.93}}{1 + e^{-3.93}} \approx .02$$
Thus, the applicant has about a 2% chance of getting accepted, which makes a lot more sense. The
predictions are different because the logit is fitting a nonlinear function (which is forced to lie in the
(0, 1) interval) through the data, while the LPM is simply linear.
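The two predictions in parts c and d can be reproduced with a few lines of Python. This is only a sketch: the function and variable names are assumptions made for illustration, and the coefficients are the rounded values from the regression output above.

```python
import math

def logistic_cdf(z: float) -> float:
    """Logistic CDF F(z) = 1 / (1 + exp(-z)), which always lies in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Applicant from parts c and d: 22 years old, female, GPA 1.2, MCAT 16
age, male, gpa, mcat = 22, 0, 1.2, 16

# Linear probability model: the fitted value is read directly as a probability
lpm_prediction = -0.786 - 0.002 * age + 0.113 * male + 0.016 * gpa + 0.048 * mcat

# Logit: build the linear index first, then pass it through the logistic CDF
logit_index = -9.90 + 0.002 * age + 0.805 * male - 0.111 * gpa + 0.379 * mcat
logit_prediction = logistic_cdf(logit_index)

print(f"LPM prediction:   {lpm_prediction:.4f}")
print(f"Logit index:      {logit_index:.4f}")
print(f"Logit prediction: {logit_prediction:.4f}")
```

The LPM value falls below zero, while the logit value stays inside the unit interval, which is the contrast the solution describes.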
(e) (4 points) Focusing again on the Logit output, is the effect of GPA significantly different from zero at
the 5% level? What do the sign and significance of this coefficient imply about the impact of GPA on
admissions? Does this make sense to you? Explain.
Solution: For the coefficient on GPA in the Logit, P-value $= 2\Phi(-|t^{act}|) = 2\Phi(-|{-.111}/.648|) = 2\Phi(-.171) \approx .86$,
which is insignificant at the 5% level (or any other reasonable level as well). The
coefficient is also negative, which taken at face value means that having a higher GPA lowers the applicant's probability
of admission. However, it is important to emphasize that we cannot reject that the coefficient is equal
to zero at any reasonable level of significance, so we should really conclude that there is no evidence that GPA affects
the probability of admission. Although this might at first be surprising, it is possible that admissions
committees ignore GPA because the MCAT score already tells them what they need to know about the
applicant's ability (it's a pretty substantial test). Alternatively, GPA might vary so much across schools
and majors (due to grade inflation for example) that we cannot pick up its effect without controlling for
these other factors.
(f) (4 points) Focusing again on the Logit output, you want to test the null hypothesis that all of the slope
coefficients (excluding mcat) are equal to zero. The value of the F-stat is 1.01. Can you reject the null
using a 10% significance level? Is this surprising? Explain.
Solution: The value of the F-test, 1.01, is smaller than the critical value $F_{3,\infty} = 2.08$ (there are three
restrictions here), so we cannot reject the null that all those slope coefficients are equal to zero. Only GPA
is at all surprising, although the arguments in part e explain why it might be reasonable. We would hope
that schools are not discriminating on the basis of age or gender, so the fact that the other coefficients
are not significant is not surprising (and somewhat comforting).
7. (29 points) You are interested in understanding some of the determinants of the variation in grade point
averages (GPAs) across college students. As such, you have collected data on 4137 students at a large
midwestern research university that supports a Division 1 athletics program. Specifically, your dataset includes
the following variables:
colgpa - the student's cumulative GPA, measured on a four point scale (mean: 2.65)
hsize - the size of the student's graduating high school class, in 100s (mean: 2.80)
hsperc - the student's academic percentile (see footnote 2) in their high school graduating class (mean: 19.2)
sat - the student's combined SAT score (mean: 1030)
female - a dummy variable equal to one if the student is female (mean: .45)
athlete - a dummy variable equal to one if the student is a student-athlete (mean: .05)
You estimate the following regression using OLS (HR Standard Errors in parentheses):
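$$colgpa = \underset{(.080)}{1.24} - \underset{(.017)}{.057}\,hsize + \underset{(.002)}{.0047}\,hsize^2 - \underset{(.0006)}{.013}\,hsperc + \underset{(.00007)}{.0016}\,sat + \underset{(.018)}{.155}\,female + \underset{(.037)}{.169}\,athlete, \quad R^2 = .29$$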
(a) (4 points) Consider the coefficient on the variable hsperc. Does it make sense that this coefficient is
negative? Explain why or why not. Is this coefficient statistically significant at the 5% level?
Solution: The variable hsperc is defined so that the larger it is, the lower the student's standing in
high school. The negative coefficient means that the worse the student did in high school, the lower their
college GPA will be. This seems pretty sensible. Finally, since the t-stat $= -.013/.0006 = -21.7$ is larger in
absolute value than the critical value 1.96, it is indeed significant at the 5% level.
Footnote 2: Percentile is defined so that, for example, hsperc = 5 means the top five percent of the class.
(b) (5 points) What is the expected change in cumulative GPA associated with increasing the size of a student's
graduating high school class from 100 to 110 students? How about increasing the size from 200 to 250
students?
Solution: To answer this, we need to recall the formula for calculating the effect of a change in X on the
predicted value of Y in a quadratic regression. That formula is simply:
$$\Delta\hat{Y} = \hat{\beta}_1 \Delta X_i + 2\hat{\beta}_2 X_i (\Delta X_i) + \hat{\beta}_2 (\Delta X_i)^2$$
or, using the variable names we have here,
$$\Delta\widehat{colgpa} = \hat{\beta}_1 \Delta hsize_i + 2\hat{\beta}_2\, hsize_i (\Delta hsize_i) + \hat{\beta}_2 (\Delta hsize_i)^2$$
So, for the change from 100 to 110 we have $hsize_i = 1$ and $\Delta hsize_i = .1$, so
$$\Delta\widehat{colgpa} = \hat{\beta}_1 \times .1 + 2\hat{\beta}_2 \times 1 \times (.1) + \hat{\beta}_2 (.1)^2 = -.057 \times .1 + 2 \times .0047 \times 1 \times .1 + .0047 \times (.1)^2 \approx -.005$$
For the change from 200 to 250 we have $hsize_i = 2$ and $\Delta hsize_i = .5$, so
$$\Delta\widehat{colgpa} = \hat{\beta}_1 \times .5 + 2\hat{\beta}_2 \times 2 \times (.5) + \hat{\beta}_2 (.5)^2 = -.057 \times .5 + 2 \times .0047 \times 2 \times .5 + .0047 \times (.5)^2 \approx -.02$$
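The quadratic change formula is easy to mistype by hand, so here is a small Python sketch of it. The function name quadratic_change is an assumption for illustration; the coefficients are the rounded estimates on hsize and hsize squared from the regression above.

```python
def quadratic_change(b1: float, b2: float, x: float, dx: float) -> float:
    """Change in the fitted value of b1*x + b2*x**2 when x moves to x + dx.

    Algebraically equal to b1*dx + 2*b2*x*dx + b2*dx**2.
    """
    return b1 * dx + 2.0 * b2 * x * dx + b2 * dx ** 2

b_hsize, b_hsize_sq = -0.057, 0.0047  # hsize is measured in hundreds of students

change_100_to_110 = quadratic_change(b_hsize, b_hsize_sq, x=1.0, dx=0.1)
change_200_to_250 = quadratic_change(b_hsize, b_hsize_sq, x=2.0, dx=0.5)

print(f"100 -> 110 students: {change_100_to_110:.4f}")
print(f"200 -> 250 students: {change_200_to_250:.4f}")
```

Both changes are negative and small, matching the rounded values of about -.005 and -.02 in the solution.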
(c) (6 points) Given that hsize ranges from .1 to 6 in the data, what do the signs on hsize and $hsize^2$
imply about the relationship between high school size and cumulative GPA? Suppose you were to replace
both hsize and $hsize^2$ with the single variable ln(size). What sign would you expect the coefficient on
this new variable to have? Why?
Solution: The negative sign on hsize and positive sign on $hsize^2$ mean that the relationship between
colgpa and hsize is convex (or bowl shaped, as the figure below illustrates). Moreover, over the range
of hsize observed in the data, colgpa is never increasing in hsize (which it could have been since this
quadratic does turn upward at some point). This means that colgpa is decreasing in hsize but at a
decreasing rate. We can verify this explicitly by finding the minimum value of colgpa with respect to
hsize, which occurs at the value of hsize at which the derivative of $-.057\,hsize + .0047\,hsize^2$ with
respect to hsize is equal to zero. This occurs when $-.057 + 2 \times .0047 \times hsize = 0$, that is when hsize = 6.06,
which is outside the observed range of hsize. Given that colgpa is decreasing in hsize at a decreasing
rate, the negative of the log function should be able to fit this relationship quite well (since it should look
something like the quadratic before it bottoms out), so you should expect the coefficient on ln(size) to
have a negative sign.
[Figure: predicted colgpa (vertical axis, from 1.06719 to 1.24415) plotted against hsize (horizontal axis, from 0 to 12).]
(d) (3 points) What is the estimated GPA differential between females and males? Is it statistically
significant at the 1% level?
Solution: The coefficient on female implies that, holding all other variables constant, we expect the
GPAs of female students to be about .155 higher than those of males. Since the t-stat $= .155/.018 = 8.61$ is
greater than 2.58, the estimated differential is statistically significant at the 1% level.
(e) (3 points) What is the estimated GPA differential between athletes and non-athletes? Is it statistically
significant at the 5% level?
Solution: The coefficient on athlete implies that, holding all other variables constant, we expect the GPAs
of student-athletes to be about .169 higher than those of non-athletes. Since the t-stat $= .169/.037 = 4.57$
is greater than 1.96, the estimated differential is statistically significant at the 5% level (or even the 1%
level).
(f) (5 points) If we drop the variable sat from the model and re-estimate it, we get the following result
$$colgpa = \underset{(.034)}{3.05} - \underset{(.018)}{.053}\,hsize + \underset{(.002)}{.0053}\,hsize^2 - \underset{(.0006)}{.017}\,hsperc + \underset{(.019)}{.058}\,female + \underset{(.039)}{.005}\,athlete, \quad R^2 = .19$$
What is the estimated GPA differential between athletes and non-athletes now? Is it statistically
significant at the 5% level? Discuss why the estimate of the coefficient on athlete might be different from what
you found in part e).
Solution: Now the coefficient on athlete implies that, holding all other variables constant, we expect
the GPAs of student-athletes to be about .005 higher than those of non-athletes. This is a pretty small
effect. Moreover, since the t-stat $= .005/.039 = .128$ is much smaller than 1.96, the estimated differential is
not statistically different from zero at the 5% level. We are now omitting SAT from the regression,
and this is likely to cause omitted variable bias. In particular, if athletes have lower SAT scores on
average, then because SAT has a positive effect on college GPA, omitting SAT from the analysis will lead to
a downward bias when we estimate the effect of athlete.
(g) (3 points) Explain how you might go about testing whether the effect of sat on colgpa differs by gender.
Solution: You would simply add an interaction term like $satfem = sat \times female$ to the regression
estimated above and test whether its coefficient is significantly different from zero.
8. (16 points) You are asked to analyze student housing demand at a mid-sized southeastern university.
You have gathered data from 32 students (don't worry about the somewhat small sample) on rent per person
(RentPer - which is the total apartment rent divided by the number of roommates), whether the student is
male or female (Female), the number of rooms per person in the apartment (RoomPer - which is the number
of rooms in the apartment divided by the number of roommates), and the distance from campus (Dist). You
then run the following regressions:
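$$RentPer = \underset{(4.37)}{8.98} + \underset{(5.20)}{5.01}\,Female + \underset{(7.93)}{29.5}\,RoomPer - \underset{(0.15)}{0.20}\,Dist, \quad R^2 = .36,\ \bar{R}^2 = .29$$
$$RentPer = \underset{(4.39)}{8.17} + \underset{(6.21)}{33.0}\,RoomPer - \underset{(0.11)}{0.26}\,Dist, \quad R^2 = .33,\ \bar{R}^2 = .28$$
Denote the coefficients of Female, RoomPer, and Dist by $\beta_1$, $\beta_2$, and $\beta_3$ respectively.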
(a) (5 points) What is the interpretation of $\beta_2$? Using the first regression, test the hypothesis that $\beta_2 = 0$
(as opposed to $\beta_2 > 0$). Is this what you would expect?
Solution: $\beta_2$ represents how much more students are willing to pay (per person) for additional rooms
per person (i.e. to not have to share). This is a one-sided hypothesis test of the form
$$H_0: \beta_2 = 0 \qquad H_1: \beta_2 > 0$$
To calculate the relevant P-value we evaluate P-value $= \Phi(-t^{act}) = \Phi(-29.5/7.93) = \Phi(-3.72) \approx 0$, so we
reject the null at any significance level. It makes sense that students would be willing to pay more to
have their own room.
(b) (5 points) What is the interpretation of $\beta_3$? Using the first regression, test the hypothesis that $\beta_3 = 0$
(as opposed to $\beta_3 < 0$). Is this what you would expect?
Solution: $\beta_3$ represents how much less students are willing to pay for apartments that are farther from
campus. Again, this is a one-sided hypothesis test of the form
$$H_0: \beta_3 = 0 \qquad H_1: \beta_3 < 0$$
To calculate the relevant P-value we evaluate P-value $= \Phi(t^{act}) = \Phi(-0.2/0.15) = \Phi(-1.29) \approx 0.0985$, so
we fail to reject the null at the 5% significance level. This is a bit surprising since we would think that
students would be willing to pay significantly more to have an apartment that's close to campus (maybe
the campus isn't in so great an area!).
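The one-sided P-values in parts a and b can be checked with the short Python sketch below (helper and variable names are assumptions for illustration). It takes the t-statistics as reported in the solutions. Note that dividing the rounded coefficient by the rounded standard error in part b gives about -1.33 rather than -1.29; presumably the solution worked from unrounded estimates, and the conclusion is the same either way.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

# Part a, right-tailed test (H1: beta_2 > 0): P-value = 1 - Phi(t) = Phi(-t)
t_roomper = 3.72
p_roomper = normal_cdf(-t_roomper)

# Part b, left-tailed test (H1: beta_3 < 0): P-value = Phi(t)
t_dist = -1.29
p_dist = normal_cdf(t_dist)

# Same test using the rounded coefficient and standard error from the output
t_dist_rounded = -0.20 / 0.15
p_dist_rounded = normal_cdf(t_dist_rounded)

print(f"RoomPer: t = {t_roomper:.2f}, one-sided P-value = {p_roomper:.5f}")
print(f"Dist:    t = {t_dist:.2f}, one-sided P-value = {p_dist:.4f}")
print(f"Dist (rounded inputs): t = {t_dist_rounded:.2f}, P-value = {p_dist_rounded:.4f}")
```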
(c) (6 points) Using the first regression, test the hypothesis that $\beta_1 = 0$ (as opposed to $\beta_1 \neq 0$). Using the
information provided above, there is another way to test this hypothesis. What do you need to assume
for this method to be valid? What do you conclude?
Solution: Now we have a two-sided hypothesis of the form
$$H_0: \beta_1 = 0 \qquad H_1: \beta_1 \neq 0$$
To calculate the relevant P-value we evaluate P-value $= 2\Phi(-|t^{act}|) = 2\Phi(-|5.01/5.20|) = 2\Phi(-0.998) \approx 0.32$,
so we fail to reject the null at the 5% significance level. Since we have regression output both with
and without the variable Female, we can use the rule of thumb F-statistic discussed in the appendix of
chapter 5. To use this test, we have to assume that the errors are homoskedastic, which is often
unrealistic. The formula for the rule of thumb F-statistic is
$$F = \frac{(R^2_U - R^2_R)/q}{(1 - R^2_U)/(n - k_U - 1)}$$
which is distributed as $F_{q,\,n-k_U-1}$, where U and R denote the unrestricted and restricted regressions,
q is the number of restrictions, and $k_U$ is the number of regressors in the unrestricted regression. Here we have
$$F = \frac{(.36 - .33)/1}{(1 - .36)/(32 - 3 - 1)} = 1.31$$
which is less than the 5% critical value $F_{1,28} = 4.2$ so again, we fail to reject the null. Note, if you had
been given homoskedasticity-only standard errors in the first regression, this statistic would be identical
to the one you'd get by looking at $t^2$.
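The rule of thumb F-statistic above takes only a few lines to compute. The sketch below is illustrative: the function name and argument names are assumptions, and the 5% critical value 4.2 for $F_{1,28}$ is taken from the solution rather than computed.

```python
def rule_of_thumb_f(r2_unrestricted: float, r2_restricted: float,
                    q: int, n: int, k_unrestricted: int) -> float:
    """Homoskedasticity-only F-statistic built from the two R-squared values."""
    numerator = (r2_unrestricted - r2_restricted) / q
    denominator = (1.0 - r2_unrestricted) / (n - k_unrestricted - 1)
    return numerator / denominator

# Regression with Female (unrestricted) versus without it (restricted)
f_stat = rule_of_thumb_f(r2_unrestricted=0.36, r2_restricted=0.33,
                         q=1, n=32, k_unrestricted=3)

critical_value_5pct = 4.2  # F(1, 28) critical value quoted in the solution
reject_null = f_stat > critical_value_5pct

print(f"F = {f_stat:.2f}, reject H0 at 5%: {reject_null}")
```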
9. (16 points) Two authors published a study in 1992 of the effect of minimum wages on teenage employment
using a U.S. state panel. The paper used annual observations for the years 1977-1989 and included all 50 states
plus the District of Columbia. The estimated equation is of the following type
$$E_{it} = \beta_0 + \beta_1 \frac{M_{it}}{W_{it}} + \beta_2\, Teen_{it} + \beta_3\, Uram_{it} + \text{State FEs} + \text{Time FEs} + u_{it}$$
where E is the employment to population ratio of teenagers, M is the nominal minimum wage and W is average
wage in the state (so $M/W$ measures the relative minimum wage), Uram is the prime-age male unemployment
rate, and Teen is the teenage population share.
(a) (4 points) Briefly discuss the advantage of using panel data in this situation rather than pure
cross-sections or time series.
Solution: There are likely to be omitted variables in the above regression. One way to deal with some
of these is to introduce state and time effects. State effects will capture the influence of omitted variables
that are state specific and do not vary over time, while time effects capture those of countrywide variables
that are common to all states at a point in time. Furthermore, there are more observations when using
panel data, resulting in more variation.
(b) (4 points) Estimating the model by OLS but including only time fixed effects results in the following
output.
$$E_{it} = \hat{\beta}_0 - \underset{(0.08)}{0.33}\,\frac{M_{it}}{W_{it}} + \underset{(0.28)}{0.35}\,Teen_{it} - \underset{(0.13)}{1.53}\,Uram_{it}, \quad R^2 = .20$$
Coefficients for the time fixed effects and the constant term are not reported. Comment on the above
results (i.e. interpret the results). Are the coefficients statistically significant?
Solution: There is a negative relationship between minimum wages and the employment to population
ratio. Increases in the share of teenagers in the population result in a higher employment to population
ratio, and increases in the prime-age male unemployment rate lower the employment to population ratio.
The regression explains 20 percent of the variation in the teenage employment to population ratio. The
relative minimum wage and the prime-age male unemployment rate are significant using a 1% significance
level, while the proportion of teenagers in the population is not.
(c) (4 points) Adding state fixed effects changes the above equation as follows:
$$E_{it} = \hat{\beta}_0 + \underset{(0.10)}{0.07}\,\frac{M_{it}}{W_{it}} - \underset{(0.22)}{0.19}\,Teen_{it} - \underset{(0.11)}{0.54}\,Uram_{it}, \quad R^2 = .69$$
Compare the two results. Why would the inclusion of state fixed effects change the coefficients in this
way?
Solution: The parameter of interest here is the coefficient on the relative minimum wage. While it was
highly significant in the previous regression, it now has changed signs and is statistically insignificant.
The explanatory power of the equation has increased substantially. The size of the other two coefficients
has also decreased. The results suggest that omitted variables, which are now captured by state fixed
effects, were correlated with the regressors and caused omitted variable bias.
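To see how an omitted state effect can flip the sign of a coefficient, consider the following Python sketch. The six data points are made up purely for illustration (they are not from the study), and the helper names are assumptions. Within each state y falls by one unit when x rises by one unit, but state B has both higher x and a much higher state effect, so the pooled regression slope comes out positive. Time effects are left out to keep the example short.

```python
def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx

def demean_by_group(values, groups):
    """Subtract each group's mean from its observations (the within transformation)."""
    totals, counts = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]

# Hypothetical panel: two states, three years each
state = ["A", "A", "A", "B", "B", "B"]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [9.0, 8.0, 7.0, 16.0, 15.0, 14.0]

pooled_slope = ols_slope(x, y)  # ignores the state effects
within_slope = ols_slope(demean_by_group(x, state), demean_by_group(y, state))

print(f"Pooled OLS slope:           {pooled_slope:.3f}")
print(f"State fixed effects slope:  {within_slope:.3f}")
```

Demeaning by state is numerically equivalent to including a dummy variable for each state, which is why the within slope recovers the relationship that holds inside each state.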
(d) (4 points) The significance of each coefficient decreased, yet $R^2$ increased. How is that possible? What
does this result tell you about testing the hypothesis that all of the state fixed effects can be restricted to
have the same coefficient? How would you test for such a hypothesis (just describe what you would do,
no calculations!)?
Solution: The influence of the state effects is large, which is reflected in the dramatic increase in
$R^2$. Omitted variable bias is almost certainly causing the changes in the coefficients and their degree of
significance. The state effects are bound to be jointly statistically significant, and the hypothesis that restricts
their coefficients to zero is bound to be rejected. Since these are linear hypotheses that are supposed to hold
simultaneously, an F-test is appropriate here.
10. (20 points) You want to study the trade-off between time spent sleeping and time spent working, and also
look at other factors affecting sleep. You decide to estimate the following relationship
$$sleep = \beta_0 + \beta_1\, totwrk + \beta_2\, educ + \beta_3\, age + u$$
where sleep and totwrk (total work) are measured in minutes per week and educ and age are measured in
years.
(a) (3 points) If adults trade off sleeping for work, what is the sign of $\beta_1$? What signs do you think $\beta_2$ and
$\beta_3$ will have? Why?
Solution: If adults trade off sleep for work, more work implies less sleep (other things equal), so $\beta_1 < 0$.
The signs of $\beta_2$ and $\beta_3$ are not so obvious. One could argue that more educated people like to get more
out of life, and so, other things equal, they sleep less ($\beta_2 < 0$). The relationship between sleeping and
age is probably more complicated than this model suggests, but elderly people do seem to sleep less.
(b) (5 points) After collecting data from 706 adults in the United States, you estimate the following regression
$$sleep = \underset{(115.1)}{3638.2} - \underset{(.019)}{.148}\,totwrk - \underset{(5.78)}{11.13}\,educ + \underset{(1.44)}{2.20}\,age, \quad R^2 = .11$$
If someone works five more hours per week, by how many minutes is sleep expected to fall? Construct
a 95% confidence interval for this prediction.
Solution: Since totwrk is in minutes, we must convert five hours into minutes: $\Delta totwrk = 5(60) = 300$.
Then sleep is predicted to fall by $.148(300) = 44.4$ minutes. For a week, 45 minutes less sleep is not an
overwhelming change. A 95% CI for the predicted fall, $-300\beta_1$, is
$300 \times .148 \pm 1.96 \times 300 \times SE(\hat{\beta}_1) = 300 \times .148 \pm 1.96 \times 300 \times .019 = (33.2, 55.6)$
(Of course, you could also have written it as $(-55.6, -33.2)$, the interval for the change in sleep).
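The interval can be verified with the Python sketch below (variable names are assumptions for illustration). Because the predicted change is just 300 times the coefficient, its standard error is 300 times the standard error of the coefficient, and the usual 1.96 large-sample critical value applies.

```python
# Coefficient on totwrk and its standard error, from the regression output
beta_totwrk = -0.148
se_totwrk = 0.019

delta_totwrk = 5 * 60  # five more hours of work per week, in minutes

# Predicted change in sleep and its standard error (linear rescaling)
predicted_change = delta_totwrk * beta_totwrk
se_change = delta_totwrk * se_totwrk

ci_lower = predicted_change - 1.96 * se_change
ci_upper = predicted_change + 1.96 * se_change

print(f"Predicted change in sleep: {predicted_change:.1f} minutes per week")
print(f"95% CI for the change:     ({ci_lower:.1f}, {ci_upper:.1f})")
print(f"95% CI for the fall:       ({-ci_upper:.1f}, {-ci_lower:.1f})")
```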
(c) (4 points) Do totwrk, educ, and age explain much of the variation in sleep? What other factors might
affect the time spent sleeping? Are these likely to be correlated with totwrk?
Solution: Not surprisingly, the three explanatory variables explain only about 11.3% of the variation in
sleep. One important factor in the error term is general health. Another is marital status, and whether
the person has children. Health (however we measure that), marital status, and number and ages of
children would generally be correlated with totwrk. (For example, less healthy people would tend to work
less.)
(d) (4 points) To investigate whether gender has an impact on sleeping habits you estimate the following
regression
$$sleep = \underset{(250.4)}{3840.8} - \underset{(.021)}{.163}\,totwrk - \underset{(5.75)}{11.71}\,educ - \underset{(11.49)}{8.70}\,age + \underset{(.134)}{.129}\,age^2 + \underset{(35.46)}{87.75}\,male, \quad R^2 = .12$$
All other factors being equal, is there evidence that men sleep more than women? How strong is this
evidence?
Solution: The coefficient on male is 87.75, so a man is estimated to sleep almost one and one-half hours
more per week than a comparable woman. Furthermore, with a P-value $= 2\Phi(-|t^{act}|) = 2\Phi(-|87.75/35.46|) = 2\Phi(-2.47) \approx .014$,
the coefficient is close to being significant at the 1% level. Thus, the evidence for a gender differential
is fairly strong.
(e) (4 points) You want to test the null hypothesis that the coefficients on both age and $age^2$ are equal to zero.
The value of the F-test is 1.75. Can you reject the null using a 5% significance level?
Solution: The value of the F-test, 1.75, is smaller than the critical value $F_{2,\infty} = 3$, so we cannot reject
the null that both age coefficients are equal to zero.
5 Limited Dependent Variable Models & Maximum Likelihood
1. (20 points overall) A researcher wants to study whether there is discrimination against female applicants
for mortgages. She uses the data collected in the Boston area in 1990 that Stock & Watson mention in the
textbook, but she uses only observations related to white applicants with no missed mortgage payments in
their credit history. This leads to a sample of 1916 observations. The researcher wants to use different models
to see if the choice of model makes a difference. The dependent variable is a dummy equal to one if
the mortgage is approved. The results are the following (the first column reports the average in the sample
of the corresponding variable). Heteroskedasticity-robust standard errors are in parentheses.
Regressor    Sample Mean   LPM               Logit             Probit
constant                   1.16   (0.073)    5.32   (1.02)     2.81   (0.496)
Female       0.18          0.021  (0.017)    0.26   (0.23)     0.12   (0.11)
NoHistory    0.65          -0.040 (0.013)    -0.59  (0.19)     -0.29  (0.092)
P/I ratio    32.56         -0.006 (0.001)    -0.053 (0.012)    -0.026 (0.005)
Log(loan)    4.82          -0.01  (0.014)    -0.18  (0.19)     -0.086 (0.095)
where Female is a dummy equal to one when the applicant is a female, NoHistory is a dummy equal to one
if the applicant has no credit history, P/I ratio is total monthly payments for the mortgage as a proportion
of monthly income, and Log(loan) is the logarithm of the loan amount.
(a) (5 points) LPM, Logit, and Probit provide very different point estimates for all coefficients. Does this
mean that their predictions will be very different? Why?
Solution: No, the predictions will be similar. While the LPM's coefficients translate directly into marginal
changes in probability, probit and logit coefficients enter through a nonlinear function, so on their own
they do not say much about the magnitude of the increase or decrease in probability.
(b) (5 points) How do you interpret the coefficient for Female in the Linear Probability Model? Is the
interpretation the same in the logit and the probit models? Is there evidence of discrimination against
women applicants in any of these models?
Solution: The Female coefficient in the LPM is interpreted as a 2.1 percentage point increase in the probability of being
approved if the applicant is a woman, all else equal. The interpretation is not the same in the other two models. All that we can
say from the probit and logit Female coefficients alone is that the probability of being
approved is higher if the applicant is a woman. In all three models the coefficient is not statistically significant, so there
is no evidence of discrimination against women. If anything, the point estimates slightly favor women, but
the difference is not statistically significant.
(c) (5 points) Give an economic interpretation to the signs of the coefficients in the three models. Do you
think the signs make sense? Do they differ across different models?
Solution: The economic interpretation is that a person is more likely to get a loan if: i. the applicant is
female ii. the applicant has a credit history iii. the applicant has a low P/I ratio. The signs are consistent
across all three models, and they make sense, except the Female parameter. There's no clear reason why
being female should confer an advantage. We note that the coefficient in front of Log(loan) is insignificant,
which seems to indicate that the size of the loan is not a major factor in approval or denial of the loan.
(d) (5 points) For the three different models, compute the predicted effect on the probability of being
granted the mortgage of having no credit history, for a male, evaluating all other regressors at their
sample averages. Is the estimated effect very different for the three models?
Solution:
LPM: $1.16 - 0.04 - 0.006 \times 32.56 - 0.01 \times 4.82 = 0.87644 \approx 87.6\%$
Logit: $F(5.32 - 0.59 - 0.053 \times 32.56 - 0.18 \times 4.82) = \frac{1}{1 + e^{-2.13672}} = 0.894421 \approx 89.4\%$
Probit: $\Phi(2.81 - 0.29 - 0.026 \times 32.56 - 0.086 \times 4.82) = \Phi(1.25892) \approx 0.896 = 89.6\%$
The estimated probabilities for all three models are fairly close.
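The three predicted probabilities can be reproduced with the Python sketch below, which uses only the standard library. The dictionary layout and function names are assumptions made for illustration; the coefficients and sample means come from the table. The sketch also reports the difference between NoHistory = 1 and NoHistory = 0 for each model, which is one way to read "the predicted effect" of having no credit history.

```python
import math

def logistic_cdf(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z: float) -> float:
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

# (constant, Female, NoHistory, P/I ratio, Log(loan)) from the table
coefs = {
    "LPM":    (1.16, 0.021, -0.040, -0.006, -0.01),
    "Logit":  (5.32, 0.26, -0.59, -0.053, -0.18),
    "Probit": (2.81, 0.12, -0.29, -0.026, -0.086),
}
links = {"LPM": lambda z: z, "Logit": logistic_cdf, "Probit": normal_cdf}

pi_ratio_mean, log_loan_mean = 32.56, 4.82  # sample means

def predicted_probability(model: str, female: int, no_history: int) -> float:
    """Predicted approval probability with P/I ratio and Log(loan) at their means."""
    b0, b_fem, b_nohist, b_pi, b_loan = coefs[model]
    index = (b0 + b_fem * female + b_nohist * no_history
             + b_pi * pi_ratio_mean + b_loan * log_loan_mean)
    return links[model](index)

# Male applicant: probability with no credit history, and the effect of NoHistory
prob_no_history = {m: predicted_probability(m, 0, 1) for m in coefs}
effect_no_history = {m: prob_no_history[m] - predicted_probability(m, 0, 0) for m in coefs}

for m in coefs:
    print(f"{m:6s} P(approve | NoHistory=1) = {prob_no_history[m]:.4f}, "
          f"effect of NoHistory = {effect_no_history[m]:.4f}")
```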
2. (21 points total) The Poisson distribution, whose name comes from the French mathematician Simeon
Poisson, is often used to model count data (that is, non-negative integer valued random variables that represent
the number of times something happens). The pf of the Poisson distribution is given by
$$p(y \mid \lambda) = \frac{e^{-\lambda}\lambda^{y}}{y!} \quad \text{for } y = 0, 1, \ldots \qquad (1)$$
where the parameter $\lambda$ also happens to be the mean of the distribution (see footnote 3). In one of its earliest empirical
applications, the Poisson distribution was used to model the number of Prussian cavalry (i.e. horse-riding)
soldiers who were killed by kicks from their horses. We are going to repeat that analysis here. In particular,
we have data on 10 cavalry units over a period of 20 years, yielding a total of 200 unit/year level observations.
Letting d = the number of deaths, we observe the following data
Observed Horse-Kick Fatalities
d = number of deaths    Number of unit/years in which d deaths occurred
0                       109
1                       65
2                       22
3                       3
4                       1
so that, for example, no deaths occurred in 109 of the 200 unit/years and 4 deaths occurred in 1 unit/year.
We are going to estimate the mean ($\lambda$) using Maximum Likelihood and then see how well our estimator fits
the data. We will assume that our sample $(y_1, \ldots, y_{200})$ is iid.
(a) (5 points) Treating the observations $(y_1, \ldots, y_n)$ as known, show that the likelihood function $L_n(\lambda)$ can
be written as
$$L_n(\lambda) = \frac{e^{-n\lambda}\,\lambda^{\sum_{i=1}^{n} y_i}}{\prod_{i=1}^{n} y_i!} \qquad (2)$$
Solution: From the formula for the Poisson pf, we know that the pf of a single observation is just
$$f_i(y) = \frac{e^{-\lambda}\lambda^{y_i}}{y_i!}$$
The likelihood is simply the product of these pfs, or
$$\prod_{i=1}^{n} f_i(y) = \prod_{i=1}^{n} \frac{e^{-\lambda}\lambda^{y_i}}{y_i!} = \frac{\prod_{i=1}^{n} e^{-\lambda}\lambda^{y_i}}{\prod_{i=1}^{n} y_i!} = \frac{e^{-n\lambda}\,\lambda^{\sum_{i=1}^{n} y_i}}{\prod_{i=1}^{n} y_i!}$$
Footnote 3: Recall that the factorial n! is defined for a positive integer n as $n! = n \times (n-1) \times \cdots \times 2 \times 1$, and the special case 0! is defined to have value
0! = 1.
(b) (4 points) Show that the value of $\lambda$ that maximizes (2) is in fact the sample mean $\frac{1}{n}\sum_{i=1}^{n} y_i$. Hint: it will
be much easier to work with the log likelihood here.
Solution: Taking logs of both sides of (2) yields
$$\log L_n(\lambda) = -n\lambda + \left(\sum_{i=1}^{n} y_i\right)\log\lambda - \log\left(\prod_{i=1}^{n} y_i!\right)$$
We can find the value of $\lambda$ that maximizes the log likelihood by taking the derivative with respect to $\lambda$
and setting it equal to zero. Because the third term does not depend on $\lambda$, the derivative is simply
$$\frac{\partial \log L_n(\lambda)}{\partial \lambda} = -n + \frac{1}{\lambda}\sum_{i=1}^{n} y_i$$
Setting this equal to zero and solving for $\lambda$ yields
$$\hat{\lambda} = \frac{1}{n}\sum_{i=1}^{n} y_i$$
which is the sample mean.
(c) (3 points) How do we know that we have maximized and not minimized the likelihood in part b?
Solution: To assure that we are finding the maximum, it is sufficient to establish that the function we
are maximizing is globally concave (i.e. shaped like a hill, rather than a bowl). One way to do this is to
graph the function
$$\log L_n(\lambda) = -n\lambda + \left(\sum_{i=1}^{n} y_i\right)\log\lambda - \log\left(\prod_{i=1}^{n} y_i!\right)$$
for some particular values of n and $\sum_{i=1}^{n} y_i$ (the last term is a constant in this function, so it doesn't
really matter here). Here's what it looks like for our sample values (n = 200 and $\sum_{i=1}^{n} y_i = 122$).
[Figure: ln L (vertical axis, from -1685.49 to -182.304) plotted against lambda (horizontal axis, from 0 to 3), with .61 marked on the horizontal axis.]
The function appears to hit its peak around $\lambda = .6$. While you could have traced out a figure like this on your
exam, a much easier way to proceed is to calculate the second derivative of ln L, which is $-\frac{1}{\lambda^2}\sum_{i=1}^{n} y_i$. Since it
is negative for all $\lambda > 0$, we know that the likelihood function is globally concave, so the spot where the
first derivative is 0 will be a maximum.
(d) (3 points) Now, using the data in Table 1, find the maximum likelihood estimate of $\lambda$.
Solution: All we need to do here is calculate the sample mean, which is simply
$$\bar{y} = \frac{1}{200}(109 \times 0 + 65 \times 1 + 22 \times 2 + 3 \times 3 + 1 \times 4) = \frac{1}{200}(65 + 44 + 9 + 4) = \frac{122}{200} = .61, \text{ so } \hat{\lambda}_{MLE} = .61$$
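As a numerical cross-check on parts b through d, the Python sketch below computes the sample mean from the grouped data and confirms on a grid that it maximizes the log likelihood. The variable names and the grid (steps of .01 between .01 and 3) are assumptions chosen for illustration.

```python
import math

# Observed data: number of deaths -> number of unit/years with that many deaths
counts = {0: 109, 1: 65, 2: 22, 3: 3, 4: 1}

n = sum(counts.values())
total_deaths = sum(d * c for d, c in counts.items())

# Maximum likelihood estimate: the sample mean
lam_hat = total_deaths / n

def log_likelihood(lam: float) -> float:
    """Poisson log likelihood for the grouped data, including the factorial term."""
    log_factorials = sum(c * math.lgamma(d + 1) for d, c in counts.items())
    return -n * lam + total_deaths * math.log(lam) - log_factorials

# Grid search over lambda as a sanity check on the calculus
grid = [k / 100 for k in range(1, 301)]
best_on_grid = max(grid, key=log_likelihood)

print(f"n = {n}, total deaths = {total_deaths}")
print(f"MLE (sample mean):  {lam_hat:.2f}")
print(f"Best grid value:    {best_on_grid:.2f}")
```

Because the log likelihood is globally concave, the grid maximum lands on the same value as the analytical solution.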
(e) (6 points) Using your maximum likelihood estimator (i.e. (1) with $\lambda = \hat{\lambda}$), fill in the expected frequencies
in the third column of the following table:
Horse-Kick Fatalities
d = number of deaths    Observed Frequency    Expected Frequency
0                       109
1                       65
2                       22
3                       3
4                       1
How well did your estimator do?
Solution: To fill in the table, we need to calculate the expected frequencies using our ML estimator
$$\hat{p}(y) = \frac{e^{-0.61}(0.61)^{y}}{y!} \quad \text{for } y = 0, 1, \ldots$$
and then multiply each one by 200. So we have,
$\hat{p}(0) = \frac{e^{-0.61}(0.61)^0}{0!} = \frac{.543 \times 1}{1} = .543$, and $.543 \times 200 = 108.7$
$\hat{p}(1) = \frac{e^{-0.61}(0.61)^1}{1!} = \frac{.543 \times .61}{1} = .331$, and $.331 \times 200 = 66.2$
$\hat{p}(2) = \frac{e^{-0.61}(0.61)^2}{2!} = \frac{.543 \times .372}{2} = .101$, and $.101 \times 200 = 20.2$
$\hat{p}(3) = \frac{e^{-0.61}(0.61)^3}{3!} = \frac{.543 \times .227}{6} = .020$, and $.020 \times 200 = 4$
$\hat{p}(4) = \frac{e^{-0.61}(0.61)^4}{4!} = \frac{.543 \times .138}{24} = .003$, and $.003 \times 200 = 0.6$
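Filling in the table, we have:
Horse-Kick Fatalities
d = number of deaths    Observed Frequency    Expected Frequency
0                       109                   108.7
1                       65                    66.2
2                       22                    20.2
3                       3                     4
4                       1                     0.6
The expected frequencies match the observed frequencies quite closely!
3. (15 points) Your company bottles and distributes soft drinks. One product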
comes in both regular and diet. The company would like to know if customers' diet/regular choice can be
predicted from the kind of data they can reasonably expect to obtain (e.g., customer purchases and demographic
information), or whether this choice is driven by more difficult-to-measure preference factors. This issue is
interesting because they are considering an expansion that involves increased distribution expenses, and only
makes sense if the mix of sales can be shifted towards the diet product, which has a higher margin. To explore
this question, you have been given data on 465 customers who purchased either the diet or regular version of
your product last weekend. The data include the following variables:
DIET a dummy variable equal to one if the customer purchased the diet product;
AGE the customer's age, in years;
FEMALE a dummy variable equal to one if the customer is female; and
INCOME customer's family income in $1,000s.
To begin with, you start by specifying a probit model in which the dependent variable is DIET, and the
independent variables are AGE, FEMALE and INCOME. The probit output is as follows.
$$\Pr(diet = 1 \mid Xs) = \Phi\left(\underset{(.389)}{3.68} + \underset{(.129)}{.476}\,Female - \underset{(.012)}{.111}\,Age + \underset{(.0025)}{.0047}\,Income\right)$$
(a) (3 points) Give an economic interpretation to the signs of the coefficients in this model. Are the effects
of female and age significantly different from zero? Justify your answer.
Solution: The probability that a consumer purchases a diet soft drink is higher for females, increases
with income, and decreases with age. In other words, females are more likely to choose diet soft drinks
and so are wealthier people. Older people are less likely to choose them. To discuss significance you need
to calculate t-statistics: For Female, $t = .476/.129 = 3.69$ and for Age, $t = -.111/.012 = -9.25$. Both are greater than
1.96 in absolute value so they are statistically significantly different from zero.
(b) (5 points) What is the change in the predicted probability of purchasing a diet soft drink when age
increases from 33 to 34 for a female with a family income of $33,000? How about when age increases from
20 to 21 for this "average" consumer? Do the answers differ? If so, why?
Solution: The effect of increasing Age from 33 to 34 (on the predicted probability of purchasing diet)
for a female with a family income of $33,000 is
$\Phi(3.68 + .476 - .111 \times 34 + .0047 \times 33) - \Phi(3.68 + .476 - .111 \times 33 + .0047 \times 33)$
$= \Phi(.537) - \Phi(.648) = -.037 \Longrightarrow 100 \times (-.037) = -3.7\%$
(i.e. it decreases the predicted probability by 3.7 percentage points). Similarly, the effect of increasing Age from 20 to
21 (on the predicted probability of purchasing diet) for a female with a family income of $33,000 is
$\Phi(3.68 + .476 - .111 \times 21 + .0047 \times 33) - \Phi(3.68 + .476 - .111 \times 20 + .0047 \times 33)$
$= \Phi(1.98) - \Phi(2.09) = -.006 \Longrightarrow 100 \times (-.006) = -.6\%$
(i.e. it decreases the predicted probability by .6 percentage points). The marginal effect depends on the level of age since
probit implies a nonlinear relationship.
(c) (3 points) What is the effect of gender (i.e. female) on the probability of purchasing a diet soft drink?
(You should evaluate the effect for an "average" individual who is 34 years old with a family income of
$33,000)
Solution: The effect of gender (on the predicted probability of purchasing diet) for a 34 year old with a
family income of $33,000 is
$\Phi(3.68 + .476 - .111 \times 34 + .0047 \times 33) - \Phi(3.68 - .111 \times 34 + .0047 \times 33)$
$= \Phi(.537) - \Phi(.061) = .18 \Longrightarrow 100 \times .18 = 18\%$
(i.e. being female increases the predicted probability by 18 percentage points)
(d) (4 points) After some experimentation, you decide to run a probit model that also includes an Age x
Female interaction variable (FemAge). The probit output is as follows.
\[ \Phi\Big( \underset{(.500)}{3.26} + \underset{(.758)}{1.29}\,\mathit{Female} - \underset{(.015)}{.099}\,\mathit{Age} + \underset{(.0025)}{.0046}\,\mathit{Income} - \underset{(.021)}{.023}\,\mathit{FemAge} \Big) \]
In addition, a test of the null hypothesis \( H_0 : \beta_1 = \beta_4 = 0 \) yields an F-statistic = 7.38.
What does this new output tell you about the effect of gender on the probability of purchasing a diet soft
drink? Be complete.
Solution: Here we find that Female and FemAge are individually insignificant (they have t-statistics
of 1.70 and -1.10 respectively). However, the F-statistic for their joint significance is 7.38, which is greater
than the critical value of the \( F_{2,\infty} \) distribution at even the 1% level (4.61), so they are jointly significant.
This tells us that females are more likely to choose diet soft drinks, but that this positive effect decreases
with age. This should not be surprising given the results in (b).
which matches the observed frequencies quite closely!
4. (15 points) Your company bottles and distributes soft drinks. One product comes in both regular and
diet. The company would like to know if customers' diet/regular choice can be predicted from the kind of
data they can reasonably expect to obtain (e.g., customer purchases and demographic information), or whether
this choice is driven by more difficult-to-measure preference factors. This issue is interesting because they
are considering an expansion that involves increased distribution expenses, and only makes sense if the mix
of sales can be shifted towards the diet product, which has a higher margin. To explore this question, you
have been given data on 465 customers who purchased either the diet or regular version of your product last
weekend. The data include the following variables:
DIET - a dummy variable equal to one if the customer purchased the diet product;
AGE - the customer's age, in years;
FEMALE - a dummy variable equal to one if the customer is female; and
INCOME - the customer's family income in $1,000s.
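To begin with, you start by specifying a probit model in which the dependent variable is DIET, and the
independent variables are AGE, FEMALE and INCOME. The probit output is as follows (standard errors in parentheses).
\[ \Phi\Big( \underset{(.389)}{3.68} + \underset{(.129)}{.476}\,\mathit{Female} - \underset{(.012)}{.111}\,\mathit{Age} + \underset{(.0025)}{.0047}\,\mathit{Income} \Big) \]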
(a) (3 points) Give an economic interpretation to the signs of the coefficients in this model. Are the effects
of female and age significantly different from zero? Justify your answer.
Solution: The probability that a consumer purchases a diet soft drink is higher for females, increases
with income, and decreases with age. In other words, females and wealthier people are more likely to
choose diet soft drinks, while older people are less likely to choose them. To discuss significance you need
to calculate t-statistics: for Female, \( t = .476/.129 = 3.69 \), and for Age, \( t = -.111/.012 = -9.25 \). Both are
greater than 1.96 in absolute value, so they are statistically significantly different from zero.
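The t-statistics above can be checked with a few lines of Python. This is only a minimal sketch: the dictionary `estimates` and the helper `t_stat` are illustrative names, and the numbers are the coefficients and standard errors as read from the probit output.

```python
# Coefficients and standard errors as read from the probit output above.
estimates = {
    "Female": (0.476, 0.129),
    "Age": (-0.111, 0.012),
    "Income": (0.0047, 0.0025),
}

def t_stat(coef, se):
    """t-statistic for the null hypothesis that the coefficient is zero."""
    return coef / se

t_stats = {name: t_stat(b, se) for name, (b, se) in estimates.items()}
# Two-sided test at the 5% level: compare |t| with 1.96.
significant = {name: abs(t) > 1.96 for name, t in t_stats.items()}

for name in estimates:
    print(f"{name}: t = {t_stats[name]:.2f}, significant at 5%: {significant[name]}")
```

The same comparison shows that Income, with a t-statistic just below 1.96, would not pass the 5% threshold.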
(b) (5 points) What is the change in the predicted probability of purchasing a diet soft drink when age
increases from 33 to 34 for a female with a family income of $33,000? How about when age increases from
20 to 21 for this "average" consumer? Do the answers differ? If so, why?
Solution: The effect of increasing Age from 33 to 34 (on the predicted probability of purchasing diet)
for a female with a family income of $33,000 is
\[ \Phi(3.68 + .476 - .111 \times 34 + .0047 \times 33) - \Phi(3.68 + .476 - .111 \times 33 + .0047 \times 33) \]
\[ = \Phi(.537) - \Phi(.648) = -.037 \implies 100 \times (-.037) = -3.7\% \]
(i.e. it decreases the predicted probability by 3.7 percentage points). Similarly, the effect of increasing Age from 20 to
21 (on the predicted probability of purchasing diet) for a female with a family income of $33,000 is
\[ \Phi(3.68 + .476 - .111 \times 21 + .0047 \times 33) - \Phi(3.68 + .476 - .111 \times 20 + .0047 \times 33) \]
\[ = \Phi(1.98) - \Phi(2.09) = -.006 \implies 100 \times (-.006) = -.6\% \]
(i.e. it decreases the predicted probability by .6 percentage points). The marginal effect depends on the level of age since
the probit implies a nonlinear relationship between age and the predicted probability.
(c) (3 points) What is the effect of gender (i.e. female) on the probability of purchasing a diet soft drink?
(You should evaluate the effect for an "average" individual who is 34 years old with a family income of
$33,000.)
Solution: The effect of gender (on the predicted probability of purchasing diet) for a 34-year-old with a
family income of $33,000 is
\[ \Phi(3.68 + .476 - .111 \times 34 + .0047 \times 33) - \Phi(3.68 - .111 \times 34 + .0047 \times 33) \]
\[ = \Phi(.537) - \Phi(.061) = .18 \implies 100 \times .18 = 18\% \]
(i.e. being female increases the predicted probability by 18 percentage points).
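The probability changes in parts (b) and (c) can be reproduced numerically. The sketch below assumes the probit index is 3.68 + .476 Female - .111 Age + .0047 Income (income in $1,000s); the function names `norm_cdf` and `diet_prob` are illustrative, and the normal cdf is built from the standard library's error function.

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal cdf, Phi(z), computed from the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def diet_prob(age, female=1, income=33):
    """Predicted Pr(DIET = 1) from the first probit; income is in $1,000s."""
    index = 3.68 + 0.476 * female - 0.111 * age + 0.0047 * income
    return norm_cdf(index)

# Part (b): one extra year of age, evaluated at two different starting ages.
change_33_34 = diet_prob(34) - diet_prob(33)
change_20_21 = diet_prob(21) - diet_prob(20)

# Part (c): female versus male at age 34 with the same income.
gender_gap_34 = diet_prob(34, female=1) - diet_prob(34, female=0)

print(round(change_33_34, 3), round(change_20_21, 3), round(gender_gap_34, 3))
```

Because the normal cdf is flat in its tails, the same one-year change in age moves the probability much less at age 20, where the index is near 2, than at age 33.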
(d) (4 points) After some experimentation, you decide to run a probit model that also includes an Age x
Female interaction variable (FemAge). The probit output is as follows.
\[ \Phi\Big( \underset{(.500)}{3.26} + \underset{(.758)}{1.29}\,\mathit{Female} - \underset{(.015)}{.099}\,\mathit{Age} + \underset{(.0025)}{.0046}\,\mathit{Income} - \underset{(.021)}{.023}\,\mathit{FemAge} \Big) \]
In addition, a test of the null hypothesis \( H_0 : \beta_1 = \beta_4 = 0 \) yields an F-statistic = 7.38.
What does this new output tell you about the effect of gender on the probability of purchasing a diet soft
drink? Be complete.
Solution: Here we find that Female and FemAge are individually insignificant (they have t-statistics
of 1.70 and -1.10 respectively). However, the F-statistic for their joint significance is 7.38, which is greater
than the critical value of the \( F_{2,\infty} \) distribution at even the 1% level (4.61), so they are jointly significant.
This tells us that females are more likely to choose diet soft drinks, but that this positive effect decreases
with age. This should not be surprising given the results in (b).
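The critical values used for the joint test can be computed directly rather than looked up. The sketch below relies on the fact that an F(2, infinity) variable is a chi-squared(2) variable divided by 2, which is exponential with mean 1, so its upper-tail critical value at level alpha is -ln(alpha). The function name `f_crit_2_inf` is an illustrative choice, and the shortcut applies only to two restrictions.

```python
from math import log

def f_crit_2_inf(alpha):
    """Critical value of F(2, infinity) at significance level alpha.

    F(2, inf) equals chi-squared(2)/2, which is exponential with mean 1,
    so P(F > c) = exp(-c) and the critical value is -ln(alpha).
    """
    return -log(alpha)

f_stat = 7.38  # joint test of Female and FemAge reported in the problem
crit = {alpha: f_crit_2_inf(alpha) for alpha in (0.10, 0.05, 0.01)}
reject_at_1pct = f_stat > crit[0.01]

for alpha, c in crit.items():
    print(f"alpha = {alpha:.2f}: critical value = {c:.2f}")
```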
5. (15 points) On April 15th, 1912, the ocean liner Titanic collided with an iceberg and sank in the Atlantic
Ocean. The following dataset contains information on the passengers who were on board the Titanic and
whether or not they survived. In particular, for each of the 2201 people who boarded the ship, we observe the
following:
Survived - A dummy variable equal to 1 if the passenger survived, 0 if they perished
Female - A dummy variable equal to 1 if the passenger was a female, 0 if male
Child - A dummy variable equal to 1 if the passenger was a child, 0 if the passenger was an adult
First - A dummy variable equal to 1 if the passenger was traveling in first class, 0 if not
Second - A dummy variable equal to 1 if the passenger was traveling in second class, 0 if not
Third - A dummy variable equal to 1 if the passenger was traveling in third class, 0 if not
Crew - A dummy variable equal to 1 if the passenger was a member of the Titanic's crew, 0 if not
Note: The variables First, Second, Third and Crew are mutually exclusive. All passengers belonged to one
and only one of these categories.
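An LPM regression of Survived on various covariates yields
\[ \widehat{\mathit{Survived}} = \underset{(.016)}{.09} + \underset{(.028)}{.31}\,\mathit{First} + \underset{(.026)}{.12}\,\mathit{Second} + \underset{(.021)}{.13}\,\mathit{Crew} + \underset{(.024)}{.49}\,\mathit{Female} + \underset{(.048)}{.18}\,\mathit{Child}, \quad R^2 = .25 \]
A probit of Survived on the same covariates yields
\[ \Pr(\mathit{Survived} = 1 \mid X) = \Phi\Big( -\underset{(.07)}{1.24} + \underset{(.095)}{1.03}\,\mathit{First} + \underset{(.095)}{.398}\,\mathit{Second} + \underset{(.085)}{.487}\,\mathit{Crew} + \underset{(.078)}{1.45}\,\mathit{Female} + \underset{(.151)}{.580}\,\mathit{Child} \Big) \]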
(a) (4 points) Using each model, calculate the predicted probability of survival for a female child in first
class accommodations. Do these predictions make intuitive sense?
Solution: For the LPM, the predicted probability of survival for a female child in first class accommodations is
\[ \widehat{\Pr}(\mathit{Survived} = 1 \mid X) = .09 + .31 + .49 + .18 = 1.07 \]
For the probit, the predicted probability of survival for a female child in first class accommodations is
\[ \widehat{\Pr}(\mathit{Survived} = 1 \mid X) = \Phi(-1.24 + 1.03 + 1.45 + .58) = \Phi(1.82) = .966 \]
We certainly might expect that the probability of survival would be high for a female child ("women
and children first!"), but the predicted probability from the LPM is greater than one, which is of course
nonsense. This is a result of the assumed linearity of the LPM, which is why we prefer the probit (or
logit) specification.
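The contrast between the two predictions can be seen in a short Python sketch. The dictionaries `LPM` and `PROBIT` hold the coefficients as read from the two outputs, and the helper names `norm_cdf` and `index` are illustrative choices rather than anything from the problem.

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal cdf, Phi(z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

LPM = {"const": 0.09, "First": 0.31, "Second": 0.12,
       "Crew": 0.13, "Female": 0.49, "Child": 0.18}
PROBIT = {"const": -1.24, "First": 1.03, "Second": 0.398,
          "Crew": 0.487, "Female": 1.45, "Child": 0.580}

def index(coefs, **dummies):
    """Constant plus the coefficients of every dummy that is switched on."""
    return coefs["const"] + sum(coefs[k] for k, v in dummies.items() if v)

# Female child travelling in first class.
lpm_pred = index(LPM, First=1, Female=1, Child=1)
probit_pred = norm_cdf(index(PROBIT, First=1, Female=1, Child=1))

print(f"LPM: {lpm_pred:.2f}, probit: {probit_pred:.3f}")
```

The LPM value is simply the sum of coefficients and is unbounded, whereas passing the index through the normal cdf keeps the probit prediction between 0 and 1.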
(b) (4 points) Maritime code (the code of the sea) dictates that women and children be saved before adult
males (hence the saying "women and children first"). Using each output, discuss whether (and by how
much) being a child increased the probability of survival (for the probit, you should perform a comparison
for a male child in third class accommodations). Again using each output, discuss whether (and by how
much) being female increased the probability of survival (for the probit, you should perform a comparison
for a female adult in third class accommodations).
Solution: For the LPM, the effect of child (on the predicted probability of survival) is
\( \hat\beta_5 = .18 \implies 100 \times .18 = 18\% \) (i.e. it increases the predicted probability by 18 percentage points).
For the probit, the effect of child (on the predicted probability of survival) for a male in third class accommodations is
\[ \Phi(-1.24 + .58) - \Phi(-1.24) = \Phi(-.66) - \Phi(-1.24) = .147 \implies 100 \times .147 = 14.7\% \]
(i.e. it increases the predicted probability by 14.7 percentage points). For the LPM, the effect of female (on the predicted
probability of survival) is
\[ \hat\beta_4 = .49 \implies 100 \times .49 = 49\% \]
(i.e. it increases the predicted probability by 49 percentage points). For the probit, the effect of female (on the predicted
probability of survival) for an adult in third class accommodations is
\[ \Phi(-1.24 + 1.45) - \Phi(-1.24) = \Phi(.21) - \Phi(-1.24) = .476 \implies 100 \times .476 = 47.6\% \]
(i.e. it increases the predicted probability by 47.6 percentage points).
(c) (4 points) Maritime code also dictates that the captain go down (perish) along with the ship. Indeed,
the Titanic's captain did not survive the voyage. Using the LPM output only, discuss whether the data
provide any evidence that the practice of "going down with the ship" was followed by the rest of the
crew. How would your answer change if you had used the probit output instead? Justify your answers.
Solution: For the LPM, the effect of crew (on the predicted probability of survival) is
\( \hat\beta_3 = .13 \implies 100 \times .13 = 13\% \) (i.e. relative to the omitted category, third class, it increases the predicted
probability by 13 percentage points). For the probit, the effect of crew
depends on who you are comparing them to (crew is like class here since they are mutually exclusive
categories). Compared to an adult male in third class, for example, the effect (on the predicted probability
of survival) of being a male crewman is
\[ \Phi(-1.24 + .487) - \Phi(-1.24) = \Phi(-.753) - \Phi(-1.24) = .118 \implies 100 \times .118 = 11.8\% \]
But compared to an adult male in first class, the effect of being a male crewman is
\[ \Phi(-1.24 + .487) - \Phi(-1.24 + 1.03) = \Phi(-.753) - \Phi(-.21) = -.191 \implies 100 \times (-.191) = -19.1\% \]
So the answer depends on who you are comparing the crew member to (and whether each person is male
or female as well). To completely answer this question (which I did not expect) you would also perhaps
want to know something about where the crew was quartered and how many were males and females, so
you would have an idea of who they should be compared to. Of course, interpreted literally, since some
of the crew survived, they didn't all go down with the ship.
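The probit comparisons in parts (b) and (c) all follow one pattern: evaluate the normal cdf at two indices and take the difference. A minimal sketch, assuming the probit coefficients as read from the output (the constant names and `norm_cdf` are illustrative):

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal cdf, Phi(z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Probit coefficients as read from the output; third class is the omitted category.
CONST, FIRST, CREW, FEMALE, CHILD = -1.24, 1.03, 0.487, 1.45, 0.580

third_adult_male = norm_cdf(CONST)          # baseline person
crew_male = norm_cdf(CONST + CREW)
first_adult_male = norm_cdf(CONST + FIRST)

child_effect = norm_cdf(CONST + CHILD) - third_adult_male    # part (b)
female_effect = norm_cdf(CONST + FEMALE) - third_adult_male  # part (b)
crew_vs_third = crew_male - third_adult_male                 # part (c)
crew_vs_first = crew_male - first_adult_male                 # part (c)

print(round(child_effect, 3), round(female_effect, 3),
      round(crew_vs_third, 3), round(crew_vs_first, 3))
```

The sign of the crew effect flips with the reference group, which is exactly why the probit answer depends on the comparison chosen.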
(d) (3 points) Several historians have argued that the practice of saving women and children first, even if
followed by the passengers of the Titanic, did not extend to children in third (the lowest) class accommodations.
What variable could you add to the probit model above to test this hypothesis? Assuming that
the historians' hypothesis is correct, what would you expect to find?
Solution: You could add a Child x Third interaction variable and test whether the coefficient is negative
(using a one-sided test).
6. You are studying child nutrition in India. You use data for a sample of about 27,000 children under 3 years
old, collected in 1998-99 from the Indian National Family and Health Survey. The dependent variable
you are interested in is a binary variable, \( m_i \), which is equal to one when the child is malnourished, and zero
otherwise. You want to study the determinants of child malnutrition, and you obtain the following estimates.
Robust standard errors are given in parentheses.

Dependent variable: m_i, dummy = 1 if child is malnourished
                              Sample   LPM       LPM       LPM       Logit      Probit
                              mean     (1)       (2)       (3)       (4)        (5)
Constant                               0.130     0.120     0.132     -1.893     -1.121
                                       (0.007)   (0.005)   (0.005)   (0.044)    (0.024)
Female (dummy=1 if female)             -0.020
                                       (0.010)
Mother Illiterate             0.53     0.017     0.021     0.004     0.026      0.015
                                       (0.008)   (0.006)   (0.006)   (0.050)    (0.027)
(Mother Illiterate)*Female             0.009
                                       (0.011)
Father Illiterate             0.28     0.023     0.027     0.019     0.133      0.074
                                       (0.010)   (0.007)   (0.007)   (0.050)    (0.028)
(Father Illiterate)*Female             0.009
                                       (0.014)
Birth Order                   2.78     0.001     0.002     0.001     0.006      0.003
                                       (0.002)   (0.001)   (0.001)   (0.011)    (0.006)
(Birth Order)*Female                   0.002
                                       (0.003)
Wealth                        -0.05                        -0.013    -0.126     -0.066
                                                           (0.001)   (0.015)    (0.008)
Log-Likelihood                                                       -11081.7   -11082.4

Wealth is a measure of the family wealth (this measure of wealth can be negative, given the way it is
constructed, but the exact construction of this variable is not relevant here), Mother (Father) Illiterate are
(self-explanatory) dummies, and Birth Order represents the birth order of the child (so, older children have a
lower birth order). PLEASE NOTE: this table is also reported in the very last page of the exam.
You can take that page off the exam book, and use it for your convenience.
(a) (2 points) You first estimate the model in Column (1), using a Linear Probability Model (LPM).
Calculate the predicted probability that a boy is malnourished, when both parents are literate, and birth
order is 1.
Answer: The predicted probability in this case is 0.130 + 0.001 x 1 = 0.131.
(b) (4 points) Calculate a 95% confidence interval for the predicted probability of being malnourished for a
child with the characteristics described in part (a). The covariance between the estimated constant and
the slope for birth order is -.00001.
Answer: The 95% confidence interval for the predicted probability is
\[ 0.131 \pm 1.96 \times SE(\hat\beta_0 + 1 \cdot \hat\beta_7) = 0.131 \pm 1.96 \sqrt{ SE(\hat\beta_0)^2 + SE(\hat\beta_7)^2 + 2\,Cov(\hat\beta_0, \hat\beta_7) } = [0.1197, 0.1423], \]
where \( \hat\beta_7 \) denotes the estimated slope for birth order.
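The interval can be verified step by step. This is a sketch using the column (1) estimates and the covariance given in the question; the variable names are illustrative. The variance of the sum of two estimates is the sum of their variances plus twice their covariance, and the birth-order slope enters with weight 1 because birth order equals 1.

```python
from math import sqrt

b0, b_birth = 0.130, 0.001        # constant and birth-order slope, column (1)
se_b0, se_birth = 0.007, 0.002    # their robust standard errors
cov_b0_birth = -0.00001           # covariance given in the question

# Prediction for a boy with literate parents and birth order 1.
prediction = b0 + 1 * b_birth

# Var(b0 + b_birth) = Var(b0) + Var(b_birth) + 2 Cov(b0, b_birth)
se_prediction = sqrt(se_b0**2 + se_birth**2 + 2 * cov_b0_birth)

ci = (prediction - 1.96 * se_prediction, prediction + 1.96 * se_prediction)
print(f"[{ci[0]:.4f}, {ci[1]:.4f}]")
```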
(c) (4 points) You want to see if results are different for boys and girls, so you test the null hypothesis that
all coefficients related to the dummy Female are equal to zero. The value of the F-statistic is 1.35. Can
you reject the null using a 10% significance level?
Answer: We cannot reject the null that all coefficients related to the variable Female are zero, as the
F-statistic of 1.35 is below 1.94, the 10% critical value for the \( F(4, \infty) \) distribution.
(d) (5 points) You re-estimate the model without the variables related to the dummy Female. The results
are in Column (2). Consider two children, both third born (that is, Birth Order = 3), but one whose
parents are both illiterate, and one with both parents literate. Calculate the difference in the predicted
probability of being malnourished between these two children.
Answer: In this version of the LPM the difference in the predicted probability of being malnourished
when parents are illiterate vs. not is 0.021 + 0.027 = 0.048.
(e) (6 points) You are interested in how parental illiteracy affects child malnutrition. However, you suspect
that the results in Column (2) are affected by omitted variable bias, as you are not including a measure
of household wealth. Therefore, you estimate a new LPM model, reported in Column (3). Compare the
estimated slope for Mother Illiterate and Father Illiterate in Columns (2) and (3). Do the slopes change
in the expected direction?
Answer: The slopes become much smaller. They have changed in the expected direction, as we expect
wealth to negatively affect the probability of malnutrition and to be negatively correlated with
illiteracy. Thus, the coefficients in (2) are upward biased, and go down as we include wealth.
(f) (6 points) Now you re-estimate the last model using logit and probit. The results are reported in
Columns (4) and (5) respectively. Consider two children, both fourth born (that is, Birth Order = 4),
but one whose parents are both illiterate, and one with both parents literate. Calculate the difference
in the predicted probability of being malnourished between these two children, for both logit and probit,
assuming that wealth is equal to its mean value in the sample. Do probit and logit produce very different
answers?
Answer: The changes in predicted probability in logit and probit are close, both around 2 percentage points (i.e. the
predicted probability of being malnourished when both parents are illiterate is 2 percentage points higher than when they
are both literate for a fourth born child, with a family of average wealth).
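The answer to part (f) can be reproduced with the coefficients in Columns (4) and (5). The following is a sketch under the assumption that the table values are as reconstructed above (Wealth mean of -0.05, Birth Order = 4); the helper names `logistic_cdf`, `norm_cdf` and `predicted` are illustrative.

```python
from math import erf, exp, sqrt

def norm_cdf(z):
    """Standard normal cdf, used by the probit."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def logistic_cdf(z):
    """Logistic cdf, used by the logit."""
    return 1.0 / (1.0 + exp(-z))

WEALTH_MEAN, BIRTH_ORDER = -0.05, 4
LOGIT = dict(const=-1.893, mother=0.026, father=0.133, birth=0.006, wealth=-0.126)
PROBIT = dict(const=-1.121, mother=0.015, father=0.074, birth=0.003, wealth=-0.066)

def predicted(coefs, cdf, illiterate):
    """Predicted probability; illiterate = 1 means both parents are illiterate."""
    index = (coefs["const"]
             + illiterate * (coefs["mother"] + coefs["father"])
             + coefs["birth"] * BIRTH_ORDER
             + coefs["wealth"] * WEALTH_MEAN)
    return cdf(index)

logit_diff = predicted(LOGIT, logistic_cdf, 1) - predicted(LOGIT, logistic_cdf, 0)
probit_diff = predicted(PROBIT, norm_cdf, 1) - predicted(PROBIT, norm_cdf, 0)
print(f"logit: {logit_diff:.4f}, probit: {probit_diff:.4f}")
```

Although the two sets of coefficients differ by a factor of almost two, the implied probability differences are nearly the same.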
(g) (4 points) We know that when we estimate the same regression using probit and logit, the estimated
coefficients are always very different, because logit and probit make a different assumption as to which is
the correct likelihood. However, the results in the table show that the two models produce values for the
log-likelihood which are almost identical. Give an intuition as to why this is the case.
Answer: The log-likelihood values are close, as both logit and probit predict similar probabilities, in
spite of different functional forms and coefficients.
7. You have a sample of 63,168 individuals from rural India, and you want to see if illiteracy is associated with the
probability that an individual works as an agricultural wage laborer. Let \( Y_i = 1 \) if the individual is working as
an agricultural wage laborer at the time of the interview, and 0 otherwise. Let \( \mathit{ILLIT}_i \) be another dummy = 1
if the individual is illiterate, and 0 if instead he/she has some education. The following table contains results
from two different regressions. All results have been estimated using a linear probability model (that is, using
OLS), and heteroskedasticity-robust standard errors are listed to the right of each coefficient.

Table 1 - LPM - Dependent variable is Y_i
                 Model (1)                Model (2)
                 Coefficient   (s.e.)     Coefficient   (s.e.)
Age              -0.0052       0.0001     -0.0038       0.0001
log(income)                               -0.2305       0.0033
ILLIT            0.2252        0.0037     0.1542        0.0038
ILLIT*Female     0.0974        0.0148     0.0617        0.0144
Female           -0.0374       0.0114     0.0142        0.0110
Owns_land        -0.0416       0.0073     -0.0819       0.0070
Fall             0.0054        0.0049     0.0096        0.0047
Winter           0.0014        0.0048     0.0024        0.0047
Spring           0.0046        0.0048     0.0096        0.0047
Intercept        0.4416        0.0092     1.8771        0.0232
R^2              0.0822                   0.1412

where Female is a dummy equal to one if the individual is a woman, Owns_land is a dummy = 1 if the
individual owns agricultural land, and Fall, Winter, and Spring are dummies = 1 if the data have been
collected during the Fall, Winter, or Spring respectively.
(a) (6 points) Using the results for model (2), interpret the coefficients corresponding to the variables
log(income), age, and ILLIT. Are they statistically significant, using a 1% significance level?
Solution: For log(income) the t-statistic is \( -0.2305/0.0033 \approx -70 \), so the coefficient is significant at the 1%
level. A 1% increase in income is associated with a 0.0023 (= 0.01 x 0.2305) decrease in the probability of
working as a wage laborer. This result makes sense since richer individuals are unlikely to be agricultural
wage laborers.
The coefficient of Age is also significant, as its t-statistic is equal to \( -0.0038/0.0001 \approx -38 \). Keeping everything
else constant, one more year of age reduces the probability by 0.0038: older individuals may have more experience.
ILLIT indicates that an illiterate man is about 15 percentage points more likely to work as an agricultural wage laborer
than a literate man. Its coefficient is highly significant (\( t = 0.1542/0.0038 \approx 41 \)), which makes sense as education leads
to better opportunities.
(b) (5 points) Using again the results from model (2), calculate the difference between men and women in
the predicted impact of being illiterate on the probability of working as an agricultural wage laborer. Is
the difference large in economic terms? Is it statistically significant, using a 5% level?
Solution: The difference is measured by the coefficient on ILLIT*Female, 0.0617. The t-statistic is
\( 0.0617/0.0144 \approx 4.28 \), which is significant
at 5%. The difference is fairly large: being illiterate increases the probability of working as an agricultural wage laborer by 6
percentage points more if you are a woman than if you are a man.
(c) (4 points) Compare the coefficients for ILLIT in model (1) and (2). How would you explain the difference,
in terms of omitted variable bias?
Solution: The coefficient for ILLIT (and also the interaction with Female) goes down when log(income)
is included. This makes sense as the coefficient of the omitted variable is negative and one would expect
that \( Cov(\mathit{ILLIT}, \log(\mathit{income})) < 0 \). Therefore the result in model (1) will have positive omitted variable
bias [(-) x (-) = (+)].
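To make the sign rule explicit, the textbook omitted variable bias formula can be written out. This is a sketch for the simplest case of one included regressor \( x_1 \) and one omitted regressor \( x_2 \); the symbols \( \tilde\beta_1 \), \( \beta_1 \) and \( \beta_2 \) are notation assumed here rather than taken from the problem, and with several included regressors the covariance term is replaced by its partialled-out analogue.
\[ \tilde\beta_1 \;\xrightarrow{p}\; \beta_1 + \beta_2 \, \frac{Cov(x_1, x_2)}{Var(x_1)} \]
With \( x_1 = \mathit{ILLIT} \) and \( x_2 = \log(\mathit{income}) \), we have \( \beta_2 < 0 \) (the coefficient on log(income) is -0.2305 in model (2)) and an expected negative covariance, so the bias term is the product of two negative quantities and is therefore positive.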
(d) (4 points) Compare the coefficients for Age in model (1) and (2). How would you explain the difference,
in terms of omitted variable bias, and taking into account that older individuals, in this sample, usually
have higher income?
Solution: In model (2) the point estimate is larger (-0.0038 versus -0.0052). The explanation follows the one above: the
coefficient of the omitted variable is negative, \( Cov(\mathit{Age}, \log(\mathit{income})) > 0 \), and therefore the sign of the
omitted variable bias will be negative [(-) x (+) = (-)].
(e) (4 points) Use the results from model (2). Is there evidence of important seasonality in the probability
of working as an agricultural wage laborer? That is, does it look like the probability is very different
across seasons, keeping everything else constant?
Solution: There is no evidence of an important seasonality effect, as all the coefficients for the season
dummies are below 0.01, that is, very small. Keeping everything else constant, \( \Pr(Y_i = 1) \) remains more or
less the same.
(f) (5 points) You test formally the hypothesis that the probability of working as an agricultural wage
laborer does not depend on the season (keeping everything else constant). State carefully the null and
the alternative hypothesis for this test. The value of the F test is 2.25. What do you conclude?
Solution:
\[ H_0 : \beta_1 = \beta_2 = \beta_3 = 0 \]
\[ H_1 : \text{at least one of } \beta_1, \beta_2, \beta_3 \neq 0 \]
where \( \beta_1, \beta_2, \beta_3 \) are the coefficients for Fall, Winter, and Spring.
The critical values of an \( F_{3,\infty} \) for a significance level of 10% and 5% are respectively 2.08 and 2.60.
Hence, one would reject the null if testing at 10% and not reject if testing at 5%. The evidence is
somewhat mixed.
8. (21 points) You are interested in the factors that influence whether a person chooses to smoke cigarettes.
To analyze this, you have collected a random sample from 807 adults living throughout the United States.
Your sample includes the following variables:
smoke - a dummy variable equal to one if the person is a smoker
cigprice - the per pack price of cigarettes (in cents)
income - the person's annual income (in dollars)
educ - the person's total years of schooling (in years)
age - the person's age (in years)
restaurant - a dummy variable equal to one if the person lives in a state where smoking is banned in
restaurants.
white - a dummy equal to one if the person's race is white.
You estimate the following models using LPM, Logit, and Probit (heteroskedasticity-robust standard errors in parentheses):

Dependent variable: smoke, a dummy = 1 if the person smokes
                 Sample   LPM        LPM        Logit     Probit    Probit
                 mean     (1)        (2)        (3)       (4)       (5)
Constant                  .656       .449       -.360     -.199     -.295
                          (.864)     (.119)     (.575)    (.350)    (.045)
ln(cigprice)     4.09     -.068
                          (.208)
ln(income)       9.69     .012
                          (.026)
education        12.5     -.029      -.028      -.132     -.082
                          (.005)     (.005)     (.027)    (.016)
age              41.2     .020       .020       0.106     .064
                          (.005)     (.005)     (.027)    (.016)
age^2            1990     -.0003     -.0003     -.001     -.0008
                          (.00006)   (.00005)   (.0003)   (.0002)
restaurant                -.101      -.099      -.452     -.282
                          (.038)     (.037)     (.176)    (.107)
white                     -.026
                          (.051)
Log-Likelihood                                  -510.5    -510.2    -537.5
(a) (3 points) Focusing on LPM (1), a test of the joint significance of the coefficients on ln(cigprice),
ln(income), and white yields an F-statistic of .19. What do you conclude about the joint significance of
these three variables?
Solution: The joint test has three restrictions and even the 10% critical value of the \( F_{3,\infty} \) distribution
(2.08) is larger than .19, so we can't reject the null hypothesis that these three coefficients are jointly
equal to zero at even the 10% level.
(b) (4 points) Focusing on LPM (2), what is the expected effect of a restaurant smoking ban on the predicted
probability that someone smokes? Is it significant at the 1% level? Construct a 95% confidence interval
for this expected effect.
Solution: For LPM (2), the effect is given by the coefficient on restaurant. The estimated coefficient
implies that the smoking ban leads to about a 10 percentage point reduction in the probability that someone smokes.
Since the t-stat \( = -.099/.037 \approx -2.68 \) is bigger in absolute value than 2.58, this differential is significantly
different from 0 at the 1% level. A 95% confidence interval for \( \beta_{rest} \) is given by
\[ \hat\beta_{rest} \pm 1.96 \times SE(\hat\beta_{rest}) = -.099 \pm 1.96 \times .037 = (-.17, -.026) \]
(c) (6 points) Evaluated at the mean values of the other regressors, what is the expected effect of a restaurant
smoking ban on the predicted probability that someone smokes using Logit (3) and Probit (4) respectively?
Solution: For Logit (3), we need to calculate the difference in predicted probabilities:
\[ F(-.360 - (.132 \times 12.5) + (.106 \times 41.2) - (.001 \times 1990) - (1 \times .452)) - F(-.360 - (.132 \times 12.5) + (.106 \times 41.2) - (.001 \times 1990)) \]
\[ = F(-.085) - F(.367) = \frac{e^{-.085}}{1 + e^{-.085}} - \frac{e^{.367}}{1 + e^{.367}} = -.11 \]
For Probit (4), we need to calculate the difference in predicted probabilities:
\[ \Phi(-.199 - (.082 \times 12.5) + (.064 \times 41.2) - (.0008 \times 1990) - (1 \times .282)) - \Phi(-.199 - (.082 \times 12.5) + (.064 \times 41.2) - (.0008 \times 1990)) \]
\[ = \Phi(-.463) - \Phi(-.181) = -.106 \]
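The two differences can be recomputed from the table. The sketch below assumes the coefficient signs as reconstructed (negative on education, age squared and restaurant) and evaluates the index at the sample means of education, age and age squared; `MEANS`, `index_at_means` and the cdf helpers are illustrative names. Small discrepancies in the third decimal relative to the hand calculation come from rounding of the index.

```python
from math import erf, exp, sqrt

def norm_cdf(z):
    """Standard normal cdf, used by the probit."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def logistic_cdf(z):
    """Logistic cdf, used by the logit."""
    return 1.0 / (1.0 + exp(-z))

MEANS = {"educ": 12.5, "age": 41.2, "age_sq": 1990}
LOGIT = {"const": -0.360, "educ": -0.132, "age": 0.106,
         "age_sq": -0.001, "restaurant": -0.452}
PROBIT = {"const": -0.199, "educ": -0.082, "age": 0.064,
          "age_sq": -0.0008, "restaurant": -0.282}

def index_at_means(coefs, restaurant):
    """Linear index at the sample means, with the ban dummy set to 0 or 1."""
    base = coefs["const"] + sum(coefs[k] * MEANS[k] for k in MEANS)
    return base + coefs["restaurant"] * restaurant

logit_effect = (logistic_cdf(index_at_means(LOGIT, 1))
                - logistic_cdf(index_at_means(LOGIT, 0)))
probit_effect = (norm_cdf(index_at_means(PROBIT, 1))
                 - norm_cdf(index_at_means(PROBIT, 0)))
print(f"logit: {logit_effect:.3f}, probit: {probit_effect:.3f}")
```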
(d) (4 points) How many of the people in the sample are smokers? Explain.
Solution: The specification in column (5) is a probit with no right hand side variables, apart from a
constant. As we saw in class, estimating a probit (or logit) with no regressors is equivalent to estimating
a Bernoulli model. In particular, \( \hat p = \widehat{\Pr}(\mathit{smoke} = 1) = \Phi(\hat\beta_0) \). Once we have estimated this probability,
we can recover the number of smokers in the sample simply by multiplying the probability by the size of
the sample. In this case, \( \hat p = \widehat{\Pr}(\mathit{smoke} = 1) = \Phi(\hat\beta_0) = \Phi(-.295) = .384 \) and \( .384 \times 807 \approx 310 \), so there
are 310 smokers in the sample.
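A short check of this calculation, as a sketch: the constant -.295 is the rounded estimate from column (5), so the implied count is approximate, and the names `p_hat` and `n_smokers` are illustrative.

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal cdf, Phi(z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

N = 807                      # sample size
p_hat = norm_cdf(-0.295)     # constant-only probit: Phi(constant) is the sample share
n_smokers = round(p_hat * N)

print(f"share of smokers: {p_hat:.3f}, implied count: {n_smokers}")
```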
(e) (4 points) Using the output provided, calculate the Pseudo-\( R^2 \) for both logit (3) and probit (4). Based
on these calculations, is there a strong reason to prefer one model over the other in this case?
Solution: The formula for calculating the Pseudo-\( R^2 \) for a probit is
\( 1 - \ln(L^{max}_{probit}) / \ln(L^{max}_{bernoulli}) \), while for a logit it's
\( 1 - \ln(L^{max}_{logit}) / \ln(L^{max}_{bernoulli}) \). As explained in part (d), the probit in column (5) is equivalent to estimating the Bernoulli
model using MLE, so its maximized log likelihood is equal to \( \ln(L^{max}_{bernoulli}) \). For the logit we have
\[ \text{Pseudo-}R^2 = 1 - \frac{\ln(L^{max}_{logit})}{\ln(L^{max}_{bernoulli})} = 1 - \frac{-510.5}{-537.5} = 1 - .950 = .050 \]
For the probit we have
\[ \text{Pseudo-}R^2 = 1 - \frac{\ln(L^{max}_{probit})}{\ln(L^{max}_{bernoulli})} = 1 - \frac{-510.2}{-537.5} = 1 - .949 = .051 \]
The results imply that the two models fit the data almost equally well, so there is no strong reason to prefer one
over the other.
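The two Pseudo-R-squared values follow from a single formula, sketched below with the log-likelihoods from the table. The function name `pseudo_r2` is an illustrative choice; the constant-only probit in column (5) supplies the Bernoulli (null) log-likelihood.

```python
def pseudo_r2(loglik_model, loglik_null):
    """Pseudo-R^2: one minus the ratio of model to null log-likelihood."""
    return 1.0 - loglik_model / loglik_null

LL_NULL = -537.5                       # constant-only model, column (5)
r2_logit = pseudo_r2(-510.5, LL_NULL)  # column (3)
r2_probit = pseudo_r2(-510.2, LL_NULL) # column (4)

print(f"logit: {r2_logit:.3f}, probit: {r2_probit:.3f}")
```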
9. (20 points) We are going to show how to estimate the mean and variance of a (Normally distributed)
population using Maximum Likelihood. Recall that the pdf of a Normal distribution is given by
\[ \phi(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right\}, \quad -\infty < x < \infty \]
Assume you observe a sample \( (x_1, \ldots, x_n) \) of size \( n \) from a normal distribution, but you do not know \( \mu \) or \( \sigma^2 \)
and would like to estimate them with ML using this sample.
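(a) (4 points) Treating the observations \( (x_1, \ldots, x_n) \) as known, show that the likelihood function \( L_n(\mu, \sigma^2) \)
can be written as
\[ L_n(\mu, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 \right\} \qquad (1) \]
Solution: From the formula for the Normal pdf, we know that the pdf of a single observation is just
\[ f_i(x) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left\{ -\frac{1}{2} \left( \frac{x_i - \mu}{\sigma} \right)^2 \right\} \]
The likelihood is simply the product of these pdfs, or
\[ \prod_{i=1}^{n} f_i(x) = \prod_{i=1}^{n} \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left\{ -\frac{1}{2} \left( \frac{x_i - \mu}{\sigma} \right)^2 \right\} \]
Since the product of exponentials is the exponential of the sum of the exponents (and \( \sigma \) is not indexed by \( i \)), we have
\[ \prod_{i=1}^{n} \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left\{ -\frac{1}{2} \left( \frac{x_i - \mu}{\sigma} \right)^2 \right\} = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 \right\} = L_n(\mu, \sigma^2) \]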
(b) (6 points) Treating the value of \( \sigma^2 \) as known, show that the value of \( \mu \) that maximizes (1) is in fact the
sample mean \( \frac{1}{n} \sum_{i=1}^{n} x_i \). Hint: it may be easier to work with the log likelihood here.
Solution: We can work with either the likelihood function or log likelihood, but it's a bit easier to work
with the log likelihood. Taking logs of both sides of (1) yields
\[ \log L_n(\mu, \sigma^2) = -\frac{n}{2} \log(2\pi) - \frac{n}{2} \log \sigma^2 - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 \]
We can find the value of \( \mu \) that maximizes the log likelihood by taking the derivative with respect to \( \mu \)
and setting it equal to zero. The derivative is simply
\[ \frac{\partial \log L_n(\mu, \sigma^2)}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu) \]
Setting this equal to zero yields
\[ \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu) = 0 \implies \sum_{i=1}^{n} (x_i - \mu) = 0 \implies \sum_{i=1}^{n} x_i - n\mu = 0 \implies \mu = \frac{1}{n} \sum_{i=1}^{n} x_i \]
(c) (4 points) Remember that we are interested in estimating both \( \mu \) and \( \sigma^2 \). In part (b), we showed that
the MLE of \( \mu \) is \( \frac{1}{n} \sum_{i=1}^{n} x_i \); now let's find the MLE of \( \sigma^2 \). Treating the value of \( \mu \) as both given and equal
to \( \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \), re-write the likelihood function (1) as a function of \( \sigma^2 \) (and \( \bar{x} \)). Hint: this is very easy!
Solution: We simply replace \( \mu \) in (1) with \( \bar{x} \), yielding
\[ L_n(\sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \bar{x})^2 \right\} \qquad (2) \]
(d) (6 points) Find the value of \( \sigma^2 \) that maximizes the likelihood function you derived in part (c). Hint: it
will definitely be easier to work with the log likelihood here.
Solution: In this case it is much easier to work with the log likelihood. Taking logs of both sides of (2)
yields
\[ \log L_n(\sigma^2) = -\frac{n}{2} \log(2\pi) - \frac{n}{2} \log \sigma^2 - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]
Again we can find the value of \( \sigma^2 \) that maximizes the log likelihood by taking the derivative with respect
to \( \sigma^2 \) and setting it equal to zero. The derivative is simply
\[ \frac{\partial \log L_n(\sigma^2)}{\partial \sigma^2} = -\frac{n}{2} \frac{1}{\sigma^2} + \frac{1}{2(\sigma^2)^2} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]
Setting this equal to zero yields
\[ -\frac{n}{2} \frac{1}{\sigma^2} + \frac{1}{2(\sigma^2)^2} \sum_{i=1}^{n} (x_i - \bar{x})^2 = 0 \implies \frac{1}{2(\sigma^2)^2} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{n}{2} \frac{1}{\sigma^2} \implies \sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]
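The two closed-form estimators can be sanity-checked numerically. The sketch below uses a small made-up sample (purely illustrative, not from the problem) and confirms that the log likelihood at the MLE is higher than at nearby parameter values; the function names `log_likelihood` and `normal_mle` are assumptions of this example.

```python
from math import log, pi

def log_likelihood(xs, mu, sigma2):
    """Normal log likelihood: the log of equation (1)."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return -0.5 * n * log(2 * pi) - 0.5 * n * log(sigma2) - ss / (2 * sigma2)

def normal_mle(xs):
    """Closed-form MLEs derived in parts (b) and (d)."""
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n  # divides by n, not n - 1
    return mu_hat, sigma2_hat

sample = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]  # made-up data for illustration only
mu_hat, sigma2_hat = normal_mle(sample)
best = log_likelihood(sample, mu_hat, sigma2_hat)

print(mu_hat, sigma2_hat, round(best, 3))
```

Note that the variance estimator divides by n rather than n - 1, so it differs from the usual unbiased sample variance.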
10. (28 points total) The website for the Dave Matthews Band (DMB) is not only a great source of information
about the band, but also sells CDs, posters, DVDs, clothing, concert tickets, and so forth. Having witnessed
the great success that companies like Amazon.com have had in increasing sales by making personal
recommendations to their customers, the person in charge of marketing for DMB has asked you for help. In order to
purchase from the site, one must become a member; this allows demographic and purchase history information
to be gathered. You have data on recent orders that were followed by some sort of recommendation from the
site. DMB has tried several approaches for making recommendations of other products:
1. Recommending another product frequently ordered by purchasers of the product just purchased (e.g. "People
who bought Dave's latest album often purchase this new poster too").
2. Recommending another product often purchased by members demographically similar to the purchaser (e.g.
to younger members: "Do you have a copy of Dave's first album?").
3. Recommending another product based on the member's history (e.g. to purchasers of DMB's first live CD:
"Check out Dave's second live CD").
4. Recommending a randomly selected product.
For each of 1500 orders (see footnote 4), you have information on:
Age - the member's age (in years)
and five dummy variables:
Purchase - equal to one if the recommendation resulted in an additional purchase
Male - equal to one for men
Others - equal to one if strategy 1 was used
LikeYou - equal to one if strategy 2 was used
PastPurch - equal to one if strategy 3 was used
Note that one and only one recommendation was used for each order, so Others = LikeYou = PastPurch = 0
corresponds to the use of strategy 4.
You estimate the following models using LPM, Probit, and Logit models:

Dependent variable: Purchase, dummy = 1 if person makes an additional purchase
                 Sample   LPM       LPM       Probit    Logit     Logit
                 mean     (1)       (2)       (3)       (4)       (5)
Constant                  .086      .287      .153      .343      -1.62
                          (.013)    (.035)    (.230)    (.430)    (.069)
Others           .222     .091      .091      .440      .821
                          (.024)    (.024)    (.118)    (.220)
LikeYou          .209     .192      .189      .763      1.39
                          (.028)    (.027)    (.114)    (.209)
PastPurch        .235     .073      .073      .369      .690
                          (.023)    (.023)    (.118)    (.221)
Age              22                 .012      .077      .135
                                    (.001)    (.009)    (.018)
Male             .80                .094      .476      .935
                                    (.020)    (.118)    (.229)
Log-Likelihood                                -594.5    -595.3    -670.9
(a) (2 points) How many of the 1500 recommendations made in this dataset were for a randomly selected
product?
Solution: The sample means in the table give us the proportions of people who received recommendations
of the first three types. Summing them up we get .222 + .209 + .235 = .666, so the remaining fraction
(.334) must have received the random recommendation, yielding a total of .334 x 1500 = 501 people.
[4] Although the data refer to orders, not members, no member appears more than once in these data.
(b) (4 points) How many of the 1500 recommendations resulted in an additional purchase?
Solution: To answer this question, you need to recover the sample proportion of people for whom
Purchase = 1, which you can get from the logit with no regressors in column 5. Since
$$\hat{E}(Purchase) = \widehat{\Pr}(Purchase = 1) = F(\hat\beta_0) = \frac{1}{1 + e^{1.62}} = .165$$
we know that 1500 × .165 ≈ 247 people made an additional purchase. Note: it is also possible to use either
LPM 1 or 2 to do this calculation, since we know that the OLS regression line goes through the means,
and you have all their values so, for example,
$$\hat{E}(Purchase) = \widehat{\Pr}(Purchase = 1) = .086 + 0.091 \times 0.222 + 0.192 \times 0.209 + 0.073 \times 0.235 = .163$$
and 1500 × .163 ≈ 245. It was fine if you did it that way instead (it just would have taken longer).
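The two routes to the purchase share can be checked numerically. The following is a minimal sketch, assuming the column 5 intercept is -1.62 (the minus sign is inferred from the reported probability of .165); the variable names are illustrative only.

```python
import math

N_ORDERS = 1500

def logistic(index):
    """Logit link: F(index) = 1 / (1 + exp(-index))."""
    return 1.0 / (1.0 + math.exp(-index))

# Route 1: intercept-only logit (column 5); assumes the intercept is -1.62.
share_logit = logistic(-1.62)

# Route 2: LPM (column 1) evaluated at the sample means of the three dummies.
share_lpm = 0.086 + 0.091 * 0.222 + 0.192 * 0.209 + 0.073 * 0.235

# The solution rounds each share to three decimals before multiplying by 1500.
for label, share in (("logit", share_logit), ("LPM", share_lpm)):
    print(label, round(share, 3), N_ORDERS * round(share, 3))
```

Both shares agree to about two decimal places, which is why either route earned full credit.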
(c) (4 points) Focusing first on the LPM in column 1, what is the interpretation of the coefficient on
LikeYou? Is it significantly different from 0 at the 1% level?
Solution: The coefficient of .192 on LikeYou implies that receiving a recommendation based on strategy
2 increases the probability of making a purchase by 19.2 percentage points, relative to receiving a
recommendation for a randomly selected product. Since the t-statistic (t = .192/.028 = 6.86) is larger in
absolute value than 2.58, it is significant at the 1% level.
(d) (4 points) Because you were worried that ignoring the demographic variables (age, gender) might lead
to omitted variable bias, you added the Male and Age variables to the LPM model in column 2. Based
on the results in column 2, do your concerns appear to have been justified?
Solution: Although the coefficients on Male and Age (in column 2) are statistically significant at any
reasonable level, the coefficients on the three strategy variables do not change much at all. Thus, it seems
that the two demographic variables only satisfy one of the conditions for omitted variable bias and your
concerns appear to have been misplaced (in this case). Nonetheless, it is probably a good idea to include
them in the analysis anyway, since they are highly significant.
(4 points) For the LPM, Probit and Logit models in columns 2-4, compute the predicted probability of
making an additional purchase for a 22 year old male who receives a recommendation based on strategy
1. Are the predicted probabilities very different across the three models?
Solution: For the LPM model, the calculation is simply .287 + .091 − .012 × 22 + .094 = .208 or 20.8%.
For the probit model, you must calculate $\Phi(-.153 + .44 - .077 \times 22 + .476) = \Phi(-.931) = .176$ or 17.6%.
For the logit model, you must calculate $F(-.343 + .821 - .135 \times 22 + .935) = \frac{1}{1 + e^{1.557}} = .174$ or 17.4%.
The results are pretty similar across the models, which is not surprising given what we learned in class.
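The three predicted probabilities can be reproduced with a short script. This is a sketch under stated assumptions: the negative signs on the probit and logit constants and on the Age coefficients are inferred from the worked arithmetic, and the dictionary layout and function names are illustrative, not part of the problem.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

# Columns 2-4 of the table; only the coefficients needed for strategy 1 are listed.
COEFS = {
    "lpm":    {"const": 0.287,  "others": 0.091, "age": -0.012, "male": 0.094},
    "probit": {"const": -0.153, "others": 0.440, "age": -0.077, "male": 0.476},
    "logit":  {"const": -0.343, "others": 0.821, "age": -0.135, "male": 0.935},
}
LINKS = {"lpm": lambda index: index, "probit": norm_cdf, "logit": logistic}

def predict(model, age, male, others):
    """Predicted purchase probability: link function applied to the linear index."""
    b = COEFS[model]
    index = b["const"] + b["others"] * others + b["age"] * age + b["male"] * male
    return LINKS[model](index)

for model in ("lpm", "probit", "logit"):
    print(model, round(predict(model, age=22, male=1, others=1), 3))
```

The only difference between the three models is the link applied to the linear index, which is why the predictions are close when the index is in the middle range.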
(e) (4 points) For the LPM, Probit and Logit models in columns 2-4, compute the change in the predicted
probability of making an additional purchase when age increases from 22 to 24, for a male who receives
a random recommendation.
Solution: For the LPM model, the calculation is simply $2\hat\beta_4 = 2 \times (-.012) = -.024$, a fall of 2.4
percentage points. For the probit model, you must calculate
$\Phi(-.153 - .077 \times 24 + .476) - \Phi(-.153 - .077 \times 22 + .476) = \Phi(-1.525) - \Phi(-1.371) = -.022$,
a fall of 2.2 percentage points. For the logit model, you must calculate
$F(-.343 - .135 \times 24 + .935) - F(-.343 - .135 \times 22 + .935) = F(-2.648) - F(-2.378) = -.019$,
a fall of 1.9 percentage points.
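Unlike the LPM, the probit and logit changes depend on where the index starts, so they have to be computed as a difference of two predictions. A minimal sketch, again assuming the negative signs inferred from the worked solution; the helper names are hypothetical.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def index(const, b_age, b_male, age):
    """Linear index for a male with a random recommendation (all strategy dummies are 0)."""
    return const + b_age * age + b_male

# LPM: the change is linear in age, so it is just 2 * coefficient.
change_lpm = 2 * -0.012

# Probit and logit: difference of the link evaluated at age 24 and at age 22.
change_probit = (norm_cdf(index(-0.153, -0.077, 0.476, 24))
                 - norm_cdf(index(-0.153, -0.077, 0.476, 22)))
change_logit = (logistic(index(-0.343, -0.135, 0.935, 24))
                - logistic(index(-0.343, -0.135, 0.935, 22)))

print(round(change_lpm, 3), round(change_probit, 3), round(change_logit, 3))
```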
(f) (3 points) Focusing on the Logit model in column 4, you want to test whether the coefficients on the
three included strategy variables are equal. Your joint test yields an F-statistic of 7.36. What do you
conclude?
Solution: Since the null hypothesis we are testing is $H_0: \beta_1 = \beta_2 = \beta_3$, there are 2 restrictions, so we
need to compare our F-statistic to the critical values of the $F_{2,\infty}$ distribution. Since the 1% critical value
for $F_{2,\infty}$ is 4.61, we can reject the null hypothesis that the three coefficients are equal at the 1% level.
(g) (3 points) Calculate the Pseudo-$R^2$s for the models in columns 3 and 4.
Solution: Using the formula from the book, the Pseudo-$R^2$ for the probit is simply
$$\text{Pseudo-}R^2 = 1 - \frac{\ln\left(f^{\max}_{probit}\right)}{\ln\left(f^{\max}_{Bernoulli}\right)} = 1 - \frac{-594.5}{-670.9} = 1 - .886 = .114$$
and the Pseudo-$R^2$ for the logit is simply
$$\text{Pseudo-}R^2 = 1 - \frac{\ln\left(f^{\max}_{logit}\right)}{\ln\left(f^{\max}_{Bernoulli}\right)} = 1 - \frac{-595.3}{-670.9} = 1 - .887 = .113$$
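The same ratio can be computed directly from the log-likelihood row of the table. A small sketch, assuming the reported log-likelihoods are negative (as log-likelihoods of binary models are) and taking the intercept-only logit in column 5 as the Bernoulli benchmark; `pseudo_r2` is an illustrative helper name.

```python
def pseudo_r2(loglik_model, loglik_bernoulli):
    """Pseudo-R^2 = 1 - ln(f_max_model) / ln(f_max_Bernoulli)."""
    return 1.0 - loglik_model / loglik_bernoulli

LOGLIK_BERNOULLI = -670.9  # column 5: logit with a constant only

r2_probit = pseudo_r2(-594.5, LOGLIK_BERNOULLI)
r2_logit = pseudo_r2(-595.3, LOGLIK_BERNOULLI)
print(round(r2_probit, 3), round(r2_logit, 3))
```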
2. (5 points overall) Some researchers want to estimate a binary dependent variable model, and they want to
decide if logit or probit is more appropriate.
(a) (3 points) Now suppose they have no idea about the true functional form of the conditional probability
$P(Y_i = 1 \mid X_{1i}, \ldots, X_{ki})$. One of them suggests using a Hausman test (based, as usual, on the normalized
distance between the two sets of coefficients estimated with the two different estimators) to decide between
logit and probit. Do you think this test would make sense? Why?
Solution: No, the Hausman test does not allow us to test for functional misspecification. Since both models
can be misspecified, the distance between the two sets of estimated coefficients will be uninformative.
Besides, recall that Logit and Probit always lead to very different point estimates. This just reinforces
the uselessness of the Hausman test.
(b) (2 points) Now suppose that the researchers know for sure that either logit or probit is the correct model.
Would a Hausman test make sense as a tool to decide which model is correct? Why?
Solution: No, the reasoning is the same as in part (a).
6 Instrumental Variables & Simultaneous Equation Models (SEM)
1. You want to estimate $\beta_1$ in the following regression model:
$$y_i = \beta_0 + \beta_1 x_i + u_i, \qquad cov(x, u) = \sigma_{xu} \neq 0.$$
We know that in this case OLS will be inconsistent because the error is correlated with the regressor. However,
suppose that there is another variable $z$ such that $E(u_i \mid z_i) = 0$ and $cov(x, z) = \sigma_{xz} \neq 0$. In midterm 1, we
proved that a consistent estimator for $\beta_1$ can be obtained as:
$$\hat\beta_1^* = \frac{\frac{1}{n}\sum_{i=1}^n (z_i - \bar z)\, y_i}{\frac{1}{n}\sum_{i=1}^n (z_i - \bar z)\, x_i} = \beta_1 + \frac{\frac{1}{n}\sum_{i=1}^n (z_i - \bar z)\, u_i}{\frac{1}{n}\sum_{i=1}^n (z_i - \bar z)\, x_i}. \qquad (5)$$
In this problem, you can assume that all regularity conditions needed for the validity of the Law of Large
Numbers and for the Central Limit Theorem hold. We assume that we have a large sample, so that $\bar z$ can be
approximated with its true value $\mu_z$. Using this approximation, one can show that (5) can be re-written as:
$$\sqrt{n}\left(\hat\beta_1^* - \beta_1\right) = \frac{\left(\frac{\sqrt{n}\,\bar v}{\sigma_v}\right)\sigma_v}{\frac{1}{n}\sum_{i=1}^n (z_i - \mu_z)\, x_i}$$
where $v_i = (z_i - \mu_z)\, u_i$ and $\sigma_v^2 = Var(v) = Var[(z_i - \mu_z)\, u_i]$.
(a) (4 points) What does $\frac{1}{n}\sum_{i=1}^n (z_i - \mu_z)\, x_i$ converge in probability to? Justify your answer.
All regularity conditions for the LLN hold (so observations are also iid). Then, we know that
$$\frac{1}{n}\sum_{i=1}^n (z_i - \mu_z)\, x_i \xrightarrow{p} E[(z_i - \mu_z)\, x_i] = E[(z_i - \mu_z)(x_i - \mu_x)] = \sigma_{xz}$$
(b) (4 points) Prove that $E(v_i) = 0$.
Just use the LIE:
$$E(v_i) = E[(z_i - \mu_z)\, u_i] = E\big[E[(z_i - \mu_z)\, u_i \mid z_i]\big] = E\big[(z_i - \mu_z)\underbrace{E(u_i \mid z_i)}_{=0}\big] = 0$$
(c) (3 points) Suppose that you know that $X_n \xrightarrow{d} N(0, 1)$, that is, you know that a certain random variable
$X_n$ converges in distribution to a standard normal when the sample size $n$ goes to infinity. Also, let $a$ be
a constant. What does $aX_n$ converge in distribution to?
$$aX_n \xrightarrow{d} N(a \cdot 0,\ a^2) = N(0, a^2)$$
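(d) (3 points) Using the Central Limit Theorem (CLT), prove that $\frac{\sqrt{n}\,\bar v}{\sigma_v} \xrightarrow{d} N(0, 1)$ (that is,
$\frac{\sqrt{n}\,\bar v}{\sigma_v}$ converges in distribution to a standard normal).
We just have to apply the CLT. According to the CLT, if we have iid variables $v_i$ with finite variance:
$$\frac{\bar v - \mu_v}{\sigma_v / \sqrt{n}} \xrightarrow{d} N(0, 1),$$
but this is exactly what we have here, as we proved in part b) that $\mu_v = 0$, and so
$$\frac{\sqrt{n}\,\bar v}{\sigma_v} = \frac{\bar v - \mu_v}{\sigma_v / \sqrt{n}} \xrightarrow{d} N(0, 1)$$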
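(e) (4 points) Using the results so far, prove that
$$\sqrt{n}\left(\hat\beta_1^* - \beta_1\right) \xrightarrow{d} N\left(0, \frac{\sigma_v^2}{\sigma_{xz}^2}\right).$$
At this point we know that
$$\frac{\sqrt{n}\,\bar v}{\sigma_v} \xrightarrow{d} N(0, 1), \qquad \frac{1}{n}\sum_{i=1}^n (z_i - \mu_z)\, x_i \xrightarrow{p} \sigma_{xz}$$
and we know that
$$\sqrt{n}\left(\hat\beta_1^* - \beta_1\right) = \frac{\left(\frac{\sqrt{n}\,\bar v}{\sigma_v}\right)\sigma_v}{\frac{1}{n}\sum_{i=1}^n (z_i - \mu_z)\, x_i}.$$
Putting together all the pieces, we get (rigorously speaking, because of the Slutsky Theorem,
but it was not necessary to mention this)
$$\sqrt{n}\left(\hat\beta_1^* - \beta_1\right) = \frac{\overbrace{\left(\frac{\sqrt{n}\,\bar v}{\sigma_v}\right)}^{\xrightarrow{d}\ N(0,1)}\sigma_v}{\underbrace{\frac{1}{n}\sum_{i=1}^n (z_i - \mu_z)\, x_i}_{\xrightarrow{p}\ \sigma_{xz}}} \xrightarrow{d} \frac{\sigma_v}{\sigma_{xz}}\, N(0, 1) = N\left(0, \frac{\sigma_v^2}{\sigma_{xz}^2}\right)$$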
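(f) (3 points) Based on the results above, how would you construct a 95% confidence interval for $\beta_1$?
$$\sqrt{n}\left(\hat\beta_1^* - \beta_1\right) \xrightarrow{d} N\left(0, \frac{\sigma_v^2}{\sigma_{xz}^2}\right)$$
so that in large samples $\hat\beta_1^*$ is approximately distributed as $N\left(\beta_1, \frac{1}{n}\frac{\sigma_v^2}{\sigma_{xz}^2}\right)$.
Hence, a confidence interval can be constructed as
$$\hat\beta_1^* \pm 1.96\sqrt{\frac{1}{n}\frac{\hat\sigma_v^2}{\hat\sigma_{xz}^2}}$$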
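The estimator in (5) and the interval above can be tried on simulated data. The sketch below is illustrative only: the data-generating process (coefficients 0.8 and 0.5, true slope 2, sample size 20000) is an assumption made for the example, and `iv_estimate` is a hypothetical helper that plugs sample analogues into the formulas of parts (e) and (f).

```python
import math
import random

def mean(values):
    return sum(values) / len(values)

def iv_estimate(z, x, y):
    """Return (beta_hat, standard_error) for the simple IV estimator in (5)."""
    n = len(z)
    z_bar, x_bar, y_bar = mean(z), mean(x), mean(y)
    s_zy = mean([(zi - z_bar) * yi for zi, yi in zip(z, y)])
    s_zx = mean([(zi - z_bar) * xi for zi, xi in zip(z, x)])  # estimates sigma_xz
    beta_hat = s_zy / s_zx
    alpha_hat = y_bar - beta_hat * x_bar
    # v_i = (z_i - z_bar) * residual_i is the sample analogue of (z_i - mu_z) * u_i
    v = [(zi - z_bar) * (yi - alpha_hat - beta_hat * xi) for zi, xi, yi in zip(z, x, y)]
    var_v = mean([vi * vi for vi in v])  # E(v) = 0, so the mean square estimates sigma_v^2
    return beta_hat, math.sqrt(var_v / s_zx ** 2 / n)

# Assumed data-generating process: x depends on the instrument z and on the error u.
random.seed(0)
n, beta0, beta1 = 20000, 1.0, 2.0
z = [random.gauss(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
x = [0.8 * zi + 0.5 * ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

beta_hat, se = iv_estimate(z, x, y)
ci = (beta_hat - 1.96 * se, beta_hat + 1.96 * se)

# OLS slope for comparison; it is inconsistent here because cov(x, u) != 0.
x_bar, y_bar = mean(x), mean(y)
beta_ols = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
            / sum((xi - x_bar) ** 2 for xi in x))
print(beta_ols, beta_hat, ci)
```

Under these assumptions the OLS slope settles above the true value, while the IV estimate stays close to it and the interval width shrinks at rate $1/\sqrt{n}$.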
2. You have data on the performance of n mutual fund managers. The performance is measured by a variable
$Y$, whose mean, in the sample, is equal to 10. You also have data on whether the manager reads a certain
financial newsletter. You estimate the following regression using OLS (robust standard errors in parentheses).
$$Y_i = \underset{(0.1)}{5} + \underset{(0.15)}{1.5}\, R_i + \underset{(0.03)}{0.1}\, e_i + \underset{(0.001)}{0.005}\, e_i^2 \qquad (6)$$
where $R_i$ is a dummy = 1 if the i-th manager reads the newsletter, and $e_i$ is years of experience.
(a) (5 points) Suppose that the model as specified in (6) is correct. Does it appear that reading the newsletter
improves the manager's performance? Evaluate both the statistical and the economic significance of the
results.
Solution: Yes, reading the newsletter improves the performance by 1.5, which is 15% of the mean, and
seems to be economically significant. This improvement is also statistically significant: the t-statistic,
1.5/0.15 = 10, exceeds the 1% critical value (2.58).
(b) You suspect that the above results are unreliable, as the decision to read the newsletter might be
correlated with many unobserved characteristics of the manager. For example, suppose that better managers
typically read the newsletter. Then you re-estimate the above model using IV, using as instrument for $R_i$
a dummy equal to one if the manager received an offer of a free subscription. You know that the offer was
sent to a randomly selected group of managers. Do you think the dummy is a valid instrument? Why, or
why not? Solution:
Relevant? Yes, a free offer probably does affect readership: people are more likely to read a newsletter
when they are offered a free subscription.
Exogenous? Yes, since the offer has been made randomly. There is no reason to believe that the free
offer would have an additional direct effect on $Y_i$ beyond its influence through $R_i$.
Since this variable is both relevant and exogenous, it looks like a valid instrument.
(c) Suppose that the instrument described in the previous part is valid. Would you expect the IV estimate
of the newsletter benefit to be larger or smaller than the OLS estimate? Justify your answer!
Solution: Results in (6) are likely to be biased upward as readership will absorb the effect of managers'
quality on $Y$. Hence, we expect the IV estimate to be smaller. The variable that controls for the quality
of managers (call it $Q$, and let $\beta_Q$ be the corresponding slope in the regression) is omitted from the
regression. This leads to an upward bias in the OLS estimate, since $Cov(Q_i, R_i) > 0$ and $\beta_Q > 0$.
(d) Suppose again that the instrument described previously is valid. You find another two variables that
might be valid instruments, but you are not sure. Since you have more instruments than endogenous
variables, you can calculate the J-statistic, which turns out to be equal to 2.5. What do you conclude?
Solution: There are three instruments and one endogenous variable, hence there are 2 overidentifying
restrictions and the J-statistic follows a $\chi^2$ distribution with 2 degrees of freedom. Critical values for $\chi^2_2$ at
different levels of significance are:
10% 4.61
5% 5.99
1% 9.21
Since the J-statistic is smaller than any of the critical values, we fail to reject the null hypothesis that all the
instruments are exogenous.
(e) Now suppose that you also suspect that the population of managers is heterogeneous. Specifically, suppose
that now the model is
$$Y_i = \beta_0 + \beta_{1i} R_i + \beta_e e_i + \beta_{e2} e_i^2 + u_i,$$
and the first stage will be
$$R_i = \pi_0 + \pi_{i1} Z_i + \pi_e e_i + \pi_{e2} e_i^2 + v_i,$$
where $Z_i$ is a dummy equal to one if manager i receives the offer of a free subscription. Notice that now
there is heterogeneity in $\beta_{1i}$ (the slope for readership) as well as $\pi_{i1}$ (the impact that the offer has
on readership). You would like to estimate the ATE (Average Treatment Effect) of readership, that is,
$E[\beta_{1i}]$. Do you think that the IV estimator will be consistent for the ATE? Why? If your answer is no,
do you think IV will generally lead to a result that is lower or higher than the ATE? Why? (You do not need
to use formal proofs here, just use a reasoning along the lines of what we discussed in class for cases where
you use IV with heterogeneity.)
Solution: Using the results we saw in class, with heterogeneity in this population we would expect $\hat\beta_1^{2SLS}$
to converge in probability to a weighted average of the individual-specific $\beta_{1i}$ (the slopes for readership),
with weights that will depend on how much individual i's decision to read the newsletter is affected by
the offer of a free subscription (that is, on $\pi_{i1}$). Here we suspect that better managers have higher $\beta_{1i}$
(they benefit more) but they also always read the newsletter, so the subscription offer does not change
their decision (hence for these managers $\pi_{i1}$ is close to zero). This means that the probability limit will
attach relatively larger weights to the slopes of the worst managers, and hence we should expect an
estimate LOWER than the true ATE.
3. (15 points overall) Here are some examples of the instrumental variables regression model.
In each case you are given the number of instruments and the J-statistic. We follow Stock & Watson's notation,
so the $X$ variables are endogenous variables, and the $W$ variables are exogenous variables. For parts b and c
find the relevant value from the $\chi^2_{m-k}$ distribution, using a 1% and 5% significance level, and make a decision
whether or not to reject the null hypothesis.
(a) (3 points) What is the null hypothesis of the J-statistic?
Solution: If we let $\hat u_i^{TSLS}$ be the residuals from TSLS estimation, and set up an OLS estimation:
$$\hat u_i^{TSLS} = \delta_0 + \delta_1 Z_{1i} + \ldots + \delta_m Z_{mi} + \delta_{m+1} W_{1i} + \ldots + \delta_{m+r} W_{ri} + e_i$$
where the $Z$s are instruments, the $W$s are exogenous variables, and the $e_i$ are regression error terms. The null
hypothesis is that $\delta_1 = \ldots = \delta_m = 0$.
(b) (6 points) $Y_i = \beta_0 + \beta_1 X_{1i} + u_i$, $i = 1, \ldots, n$; $Z_{1i}, Z_{2i}$ are valid instruments, J = 2.58.
Solution: Note that there is one degree of freedom. The $\chi^2_1$ value is 6.63 at the 1% level and 3.84 at the
5% level. Therefore, we cannot reject the null hypothesis that all instruments are exogenous.
(c) (6 points) $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 W_{1i} + u_i$, $i = 1, \ldots, n$; $Z_{1i}, Z_{2i}, Z_{3i}, Z_{4i}$ are valid instruments,
J = 9.63.
Solution: Note that there are two degrees of freedom. The $\chi^2_2$ value is 9.21 at the 1% level and 5.99 at
the 5% level. Therefore, we can reject the null hypothesis that all instruments are exogenous.
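The decision rule in parts (b) and (c) can be sketched in a few lines. This is an illustration, not a general implementation: it uses the closed-form upper-tail probabilities of the chi-squared distribution that exist for 1 and 2 degrees of freedom only, and `chi2_sf` and `j_test` are hypothetical helper names.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable (closed forms for df = 1, 2 only)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 or 2 are handled in this sketch")

def j_test(j_stat, n_instruments, n_endogenous, alpha):
    """Return (degrees of freedom, p-value, reject?) for the overidentification test."""
    df = n_instruments - n_endogenous  # m - k overidentifying restrictions
    p_value = chi2_sf(j_stat, df)
    return df, p_value, p_value < alpha

for alpha in (0.05, 0.01):
    print("part (b)", alpha, j_test(2.58, n_instruments=2, n_endogenous=1, alpha=alpha))
    print("part (c)", alpha, j_test(9.63, n_instruments=4, n_endogenous=2, alpha=alpha))
```

Comparing the p-value with the significance level is equivalent to comparing J with the tabulated critical value, so the conclusions match the solutions above.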
4. (16 points) Earnings functions, whereby the log of earnings is regressed on years of education, years of
on-the-job training, and individual characteristics, have been studied for a variety of reasons. Some studies
have focused on the returns to education, others on discrimination, union and non-union differentials, etc. For
all these studies, a major concern has been the fact that ability should enter as a determinant of earnings,
but that it is close to impossible to measure and therefore represents an omitted variable. Assume that the
coefficient on years of education is the parameter of interest. Given that education is positively correlated with
ability, since, for example, more able students attract scholarships and hence receive more years of education,
the OLS estimator for the returns to education could be upward-biased. To overcome this problem, various
authors have used instrumental variables estimation techniques. For each of the potential instruments listed
below, briefly discuss instrument validity.
(a) (4 points) The individual's postal zip code.
Solution: Instrument validity has two components, instrument relevance
$$Cov(Z_i, X_i) \neq 0$$
and instrument exogeneity
$$Cov(Z_i, u_i) = 0$$
The individual's postal zip code is somewhat likely to be relevant, because the ZIP code indicates, after
all, where the individual lives, and different people sort across ZIP codes based on their characteristics.
But for the same reason the ZIP code is also likely to be ENDOGENOUS, because where the individual
lives is likely correlated with lots of unobservable variables (such as school quality, job opportunities,
crime levels etc.) which are also going to have a direct impact on earnings.
(b) An alternative argument: the ZIP code will almost certainly be uncorrelated with the omitted variable,
ability, even though some zip codes may attract more able (or, more likely, richer) individuals, so there is
no problem with exogeneity. However, under this view it is an example of a weak (not relevant) instrument,
since it is also almost certainly uncorrelated with years of education.
(c) (4 points) The individual's IQ or test-score on a work-related exam.
Solution: There is instrument relevance in this case, since, on average, individuals who do well in
intelligence scores or other work-related test scores will have more years of education. Unfortunately
there is bound to be a high correlation with the omitted variable ability, since this is what these tests are
supposed to measure, so there is a problem with exogeneity.
(d) (4 points) Years of education for the individual's mother or father.
Solution: A non-zero correlation between the mother's or father's years of education and the individual's
years of education can be expected. Hence this is a relevant instrument. However, it is not clear that
the parents' years of education are uncorrelated with the parents' ability, which in turn can be a major
determinant of the individual's ability. If this is the case, then years of education of the mother or father
is not a valid instrument, due to a failure of exogeneity.
(e) (4 points) Number of siblings the individual has.
Solution: There is some evidence that the larger the number of siblings of an individual, the fewer
years of education the individual receives. This makes sense as education is expensive (at least
college is). Hence the number of siblings is a relevant instrument. It could also be argued that the number
of siblings is uncorrelated with an individual's ability. In that case it also represents an exogenous
instrument. However, there is the possibility that ability depends on the attention an individual receives
from parents, and this attention is shared with other siblings, so it's a little ambiguous.
5. SEM (10 points) Consider the following model of demand and supply of coffee:
Demand: $Q_i^{Coffee} = \beta_1 P_i^{Coffee} + \beta_2 P_i^{Tea} + u_i$
Supply: $Q_i^{Coffee} = \beta_3 P_i^{Coffee} + \beta_4 P_i^{Tea} + \beta_5 Weather + v_i$
(Variables are measured in deviations from means, so that the constant is omitted. Don't worry about this.)
What are the expected signs of the various coefficients in this model? Explain. Assume that the price of tea
($P^{Tea}$) and $Weather$ are exogenous variables. Are the coefficients in the supply equation identified? Are the
coefficients in the demand equation identified? Are they overidentified? Is this result surprising given that
there are more exogenous regressors in the second equation?
Solution: Our intuition about supply and demand curves applies here. We expect:
$\beta_1 < 0$: increasing price drives demand down.
$\beta_2 > 0$: coffee and tea are substitutes, so increases in the price of tea should increase demand for coffee
(as consumers substitute toward it).
$\beta_3 > 0$: increasing price causes producers to produce more.
$\beta_4 > 0$: again, coffee and tea are substitutes, so increases in the price of tea should increase the supply of
coffee (there's more potential for profit). You could also have argued that $\beta_4 = 0$ since the price of other
goods shouldn't be in the supply equation in a static setting.
$\beta_5 < 0$: assuming $Weather$ means bad weather. If you assumed it meant good weather (which is fine),
the expected sign would be the opposite.
Changes in $Weather$ will shift the supply equation and thereby trace out the demand equation. Hence
the coefficients of the demand equation are exactly identified since the number of instruments equals the
number of endogenous regressors. However, the coefficients of the supply equation are underidentified
since there is no instrumental variable available for estimation. The result is not surprising, since it is
not the number of exogenous regressors in the equation that matters when determining whether or not
the coefficients are identified. Instead what matters is the number of instruments available relative to
the number of endogenous regressors. It is possible that the regression coefficients can be (over)identified
even if there are no exogenous regressors present in the equation.
If you argued that $\beta_4 = 0$, then both equations are exactly identified.
6. (60 points overall) You have a sample of n schools from a very poor country, and you carry out an experiment
to evaluate the effect of a program of free meals on average test scores for young kids. Let $T_i$ be the average
test score in school i (after the program) and let $X_i = 1$ for the treatment group (schools where the
program is implemented), and equal to zero for the control group (where the program is not implemented). All
schools comply with the rules of the experiment, and randomization is done carefully, so there is no systematic
difference between the treatment and the control group before the experiment. Therefore one can recover the
treatment effect by estimating the following regression:
$$T_i = \beta_0 + \beta_1 X_i + u_i \qquad (1)$$
You took care of collecting data on test scores, so that $T_i$ is measured without error. Unfortunately, your two
research assistants (RA hereafter) have messed up their data, and they made mistakes in reporting which schools
belong to the treatment and the control group. Both RAs recorded information on all schools. That is, you
do not observe $X_i$, but you observe $X_i^A$ and $X_i^B$, where
$$X_i^A = X_i + e_i^A$$
$$X_i^B = X_i + e_i^B$$
where $e_i^A$ and $e_i^B$ are the errors made by RA A and RA B respectively. Assume that these errors are totally
random, so that they have zero expectation, and that they are uncorrelated with all other random variables.
(a) (6 points) Suppose that you regress $T_i$ on $X_i^A$. Will you get a consistent estimate of the treatment effect
$\beta_1$? Justify your answer.
Solution: No, the estimator will be inconsistent; the measurement error introduces bias in the OLS
estimator of $\beta_1$:
$$\hat\beta_1 \xrightarrow{p} \beta_1 \frac{\sigma_X^2}{\sigma_X^2 + \sigma_{e^A}^2}$$
Since the ratio $\frac{\sigma_X^2}{\sigma_X^2 + \sigma_{e^A}^2}$ is less than one, $\hat\beta_1$ will be biased towards zero. This result is proved in Stock & Watson's
chapter on assessing studies based on multiple regression. You can also prove this result in the usual
way. You start from the OLS formula, you plug in the truth and you solve. Because you do not observe
$X_i$, all you can do is to regress $T_i$ on the mismeasured $X_i^A$. Then your OLS estimator is
$$\hat\beta_1 = \frac{\frac{1}{n}\sum_{i=1}^n \left(X_i^A - \bar X^A\right) T_i}{\frac{1}{n}\sum_{i=1}^n \left(X_i^A - \bar X^A\right)^2}.$$
But then using the usual LLNs we have
$$\hat\beta_1 \xrightarrow{p} \frac{cov\left(X_i^A, T_i\right)}{var\left(X_i^A\right)}$$
where (using the fact that $e_i^A$ is uncorrelated with everything else and $X_i$ is exogenous)
$$var\left(X_i^A\right) = \sigma_X^2 + \sigma_{e^A}^2$$
$$cov\left(X_i^A, T_i\right) = cov\left(X_i + e_i^A,\ \beta_0 + \beta_1 X_i + u_i\right) = cov(X_i, \beta_1 X_i) = \beta_1 \sigma_X^2$$
and the result follows.
(b) (5 points) Let $\bar X_i$ be the mean of the two RAs' reports. So $\bar X_i = \frac{X_i^A + X_i^B}{2}$. Are $X_i^A$, $X_i^B$, and $\bar X_i$
unbiased estimators for the true treatment status $X_i$?
Solution: Yes, all three estimators are unbiased estimators of the true treatment status:
$E[X_i^A] = E[X_i + e_i^A] = X_i + E[e_i^A] = X_i$. The same is true for $X_i^B$. Since both $X_i^A$ and $X_i^B$ are unbiased
estimators, their linear combination (with weights that sum to 1!) is an unbiased estimator as well. Note: $X_i$ is
NOT a random variable here, it is the "parameter" you want to estimate!
(c) (6 points) You still want to estimate $X_i$ using $X_i^A$, $X_i^B$, or $\bar X_i$. Let $\sigma_A^2 = Var\left(e_i^A\right)$ and $\sigma_B^2 = Var\left(e_i^B\right)$.
Calculate the variance of the three estimators of the true treatment status $X_i$.
Solution:
$$Var(X_i^A) = Var(X_i + e_i^A) = Var(e_i^A) = \sigma_A^2$$
$$Var(X_i^B) = Var(X_i + e_i^B) = Var(e_i^B) = \sigma_B^2$$
$$Var(\bar X_i) = Var\left(\frac{X_i^A + X_i^B}{2}\right) = \frac{\sigma_A^2 + \sigma_B^2 + 2\,Cov(e_i^A, e_i^B)}{4} = \frac{\sigma_A^2 + \sigma_B^2}{4}$$
where the last equality holds since $Cov(e_i^A, e_i^B) = 0$ by assumption.
(d) (6 points) Suppose that you have reasons to believe that RA A is more accurate than RA B. In particular,
you believe that $\sigma_B^2 = 2\sigma_A^2$. If you wanted to predict the value of $X_i$, which estimator would you choose,
$X_i^A$, $X_i^B$, or $\bar X_i$?
Solution: Since all estimators are unbiased, we want to choose the most efficient one, i.e. the one with
the smallest variance. Compare the variances of the estimators in hand:
$$Var(X_i^A) = \sigma_A^2$$
$$Var(X_i^B) = \sigma_B^2 = 2\sigma_A^2$$
$$Var(\bar X_i) = \frac{\sigma_A^2 + \sigma_B^2}{4} = \frac{\sigma_A^2 + 2\sigma_A^2}{4} = \frac{3}{4}\sigma_A^2$$
Since the last estimator has the smallest variance among the three, we would choose $\bar X_i$ to estimate $X_i$.
(e) (6 points) You want to estimate equation (1) using OLS, and (as in part d) you think that $\sigma_B^2 = 2\sigma_A^2$.
Which variable would you choose as regressor, $X_i^A$, $X_i^B$, or $\bar X_i$?
Solution: Since the magnitude of the bias depends on the variance of the measurement error:
$$\hat\beta_1 \xrightarrow{p} \beta_1 \frac{\sigma_X^2}{\sigma_X^2 + \sigma_{meas.\,error}^2}$$
we would choose $\bar X_i$ since it has the smallest variance among the estimators that are proposed. This
will minimize the bias induced by the measurement error.
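The ranking in parts (d) and (e) can be made concrete with exact arithmetic. In this sketch the numerical values (an error variance of 1 for RA A, and a variance of 1/4 for true treatment status, as for a 50/50 randomized dummy) are assumptions chosen for illustration; the function names are hypothetical.

```python
from fractions import Fraction

def error_variances(var_a, ratio_b=2):
    """Measurement-error variance of each candidate, assuming uncorrelated errors."""
    var_b = ratio_b * var_a
    return {"A": var_a, "B": var_b, "mean": (var_a + var_b) / 4}

def attenuation(var_x, var_error):
    """Factor multiplying beta_1 in the probability limit of the OLS slope."""
    return var_x / (var_x + var_error)

variances = error_variances(Fraction(1))  # assumed: Var(e_A) = 1
factors = {name: attenuation(Fraction(1, 4), v) for name, v in variances.items()}

for name in ("A", "B", "mean"):
    print(name, variances[name], factors[name])
```

The candidate with the smallest error variance also has the attenuation factor closest to one, which is the argument used in part (e).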
(f) (5 points) Now you entertain the idea of estimating $\beta_1$ using instrumental variables. You decide to use $X_i^A$
as the endogenous variable, and $X_i^B$ as the instrument. Briefly describe the two steps of the estimation
procedure (just mention what you regress on what in each stage).
Solution: First stage: regress $X_i^A$ on $X_i^B$ and calculate the fitted values $\hat X_i^A$. Second stage: regress $T_i$ on $\hat X_i^A$.
(g) (6 points) Now you have to understand whether $X_i^B$ is a valid instrument. Is it relevant as a predictor of
$X_i^A$?
Solution: Obviously $X_i^B$ is a relevant instrument: since both $X_i^A$ and $X_i^B$ measure the same thing ($X_i$),
$X_i^B$ is a good predictor of $X_i^A$. You can show that $Cov(X_i^A, X_i^B) \neq 0$.
(h) (6 points) Prove that $Cov\left(X_i^B,\ u_i - \beta_1 e_i^A\right) = 0$.
Solution:
$$Cov\left(X_i^B,\ u_i - \beta_1 e_i^A\right) = Cov(X_i + e_i^B,\ u_i - \beta_1 e_i^A)$$
$$= Cov(X_i, u_i) - \beta_1 Cov(X_i, e_i^A) + Cov(e_i^B, u_i) - \beta_1 Cov(e_i^B, e_i^A) = 0$$
All the covariance terms are equal to zero by assumption.
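(i) (4 points) Prove that (1) can be rewritten as
$$T_i = \beta_0 + \beta_1 X_i^A + \left(u_i - \beta_1 e_i^A\right)$$
Solution:
$$T_i = \beta_0 + \beta_1 X_i + u_i$$
$$= \beta_0 + \beta_1 X_i + \beta_1 e_i^A - \beta_1 e_i^A + u_i$$
$$= \beta_0 + \beta_1 (X_i + e_i^A) + u_i - \beta_1 e_i^A$$
$$= \beta_0 + \beta_1 X_i^A + \left(u_i - \beta_1 e_i^A\right)$$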
(j) (5 points) Does the instrument $X_i^B$ satisfy the exogeneity condition? That is, is $X_i^B$ uncorrelated with
the error of the regression you are estimating?
Solution: Yes, given the results in (h) and (i): $Cov(X_i^B,\ u_i - \beta_1 e_i^A) = 0$.
(k) (5 points) To wrap up, is the proposed 2SLS estimator a consistent estimator of the true treatment effect
$\beta_1$?
Solution: Since the instrument is both relevant and exogenous, 2SLS will give a consistent estimator of
the true treatment effect.
$$\hat\beta_1^{2SLS} \xrightarrow{p} \frac{Cov(X_i^B, T_i)}{Cov(X_i^B, X_i^A)} = \frac{Cov(X_i + e_i^B,\ \beta_0 + \beta_1 X_i + u_i)}{Cov(X_i + e_i^B,\ X_i + e_i^A)} = \frac{\beta_1 \sigma_X^2}{\sigma_X^2} = \beta_1$$
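A small simulation can illustrate the contrast between OLS on one noisy report and IV using the other report as the instrument. Everything numerical here is an assumption made for the illustration (a 50/50 treatment split, a true effect of 5, normally distributed reporting errors with Var(e_B) = 2 Var(e_A)); the covariance-ratio form of the IV estimator is the one used in the probability limit above.

```python
import random

def cov(a, b):
    """Sample covariance (dividing by n) of two equal-length lists."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / n

random.seed(2)
n, beta0, beta1 = 50000, 60.0, 5.0  # assumed true parameters

x = [1.0 if random.random() < 0.5 else 0.0 for _ in range(n)]  # true treatment status
x_a = [xi + random.gauss(0, 0.5) for xi in x]                  # RA A's report
x_b = [xi + random.gauss(0, 0.5 * 2 ** 0.5) for xi in x]       # RA B's report, twice the error variance
t = [beta0 + beta1 * xi + random.gauss(0, 2.0) for xi in x]    # test scores, measured without error

beta_ols = cov(x_a, t) / cov(x_a, x_a)  # attenuated towards zero
beta_iv = cov(x_b, t) / cov(x_b, x_a)   # instrument: the other RA's report
print(beta_ols, beta_iv)
```

With these assumed variances the OLS slope should settle near half the true effect, since $\sigma_X^2 = \sigma_{e^A}^2 = 0.25$, while the IV estimate stays close to the true value.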
7. SEM (24 points overall) You are studying demand and supply of labor in a certain manufacturing sector. You
think demand and supply are well represented by the following system of two structural equations
$$H_t = \alpha_1 - \alpha_2 W_t + \alpha_3 P_t + u_{1t} \qquad (1)$$
$$H_t = \alpha_4 + \alpha_5 W_t + u_{2t} \qquad (2)$$
where $H_t$ is hours worked, $W_t$ is the wage, and $P_t$ is the price of raw material. Here, all $\alpha$s are positive. So,
equation (1) represents demand for labor, while equation (2) represents supply. Assume that $P_t$ is exogenous.
(a) (5 points) Briefly explain why one cannot estimate equations (1) and (2) using OLS.
Solution: In equilibrium $H$ and $W$ are simultaneously determined, so they are both endogenous. This
introduces a simultaneous causality bias (since the error term is correlated with the regressor) and makes
OLS estimators inconsistent.
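(b) (6 points) From (1) and (2), and with some algebra, one can derive the reduced form system, which is:
$$H_t = \pi_1 + \pi_2 P_t + v_{Ht} \qquad (3)$$
$$W_t = \pi_3 + \pi_4 P_t + v_{Wt} \qquad (4)$$
where
$$\pi_1 = \frac{\alpha_4 \alpha_2 + \alpha_5 \alpha_1}{\alpha_2 + \alpha_5}, \qquad \pi_2 = \frac{\alpha_5 \alpha_3}{\alpha_2 + \alpha_5}, \qquad v_{Ht} = \frac{\alpha_2 u_{2t} + \alpha_5 u_{1t}}{\alpha_2 + \alpha_5}$$
You do not have to prove the above result, just take it as given. Prove instead that
$$\pi_3 = \frac{\alpha_1 - \alpha_4}{\alpha_2 + \alpha_5}, \qquad \pi_4 = \frac{\alpha_3}{\alpha_2 + \alpha_5}, \qquad v_{Wt} = \frac{u_{1t} - u_{2t}}{\alpha_2 + \alpha_5}$$
Solution: Subtract (2) from (1): $0 = \alpha_1 - \alpha_4 - (\alpha_2 + \alpha_5) W_t + \alpha_3 P_t + u_{1t} - u_{2t}$. Rearranging, obtain:
$$W_t = \frac{\alpha_1 - \alpha_4}{\alpha_2 + \alpha_5} + \frac{\alpha_3}{\alpha_2 + \alpha_5} P_t + \frac{u_{1t} - u_{2t}}{\alpha_2 + \alpha_5}$$
It follows then that:
$$\pi_3 = \frac{\alpha_1 - \alpha_4}{\alpha_2 + \alpha_5}, \qquad \pi_4 = \frac{\alpha_3}{\alpha_2 + \alpha_5}, \qquad v_{Wt} = \frac{u_{1t} - u_{2t}}{\alpha_2 + \alpha_5}$$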
(c) (5 points) You estimate the reduced form system of equations (3) and (4), so that now you have estimates
$\hat\pi_1, \hat\pi_2, \hat\pi_3, \hat\pi_4$. From these point estimates, can you recover an estimate for the slope of the supply function,
$\alpha_5$?
Solution: Yes,
$$\hat\alpha_5 = \frac{\hat\pi_2}{\hat\pi_4}$$
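The mapping from structural to reduced-form parameters, and back, can be verified with exact fractions. The structural values below are arbitrary positive numbers chosen for illustration; recovering the supply intercept as $\pi_1 - \alpha_5 \pi_3$ is an extra step that follows from the reduced-form expressions and is not part of the original solution.

```python
from fractions import Fraction

def reduced_form(a1, a2, a3, a4, a5):
    """Reduced-form coefficients implied by structural equations (1) and (2)."""
    d = a2 + a5
    pi1 = (a4 * a2 + a5 * a1) / d
    pi2 = a5 * a3 / d
    pi3 = (a1 - a4) / d
    pi4 = a3 / d
    return pi1, pi2, pi3, pi4

# Assumed structural parameters (alpha_1, ..., alpha_5), all positive.
alphas = [Fraction(v) for v in (10, 2, 1, 4, 3)]
pi1, pi2, pi3, pi4 = reduced_form(*alphas)

alpha5_recovered = pi2 / pi4                     # slope of the supply function
alpha4_recovered = pi1 - alpha5_recovered * pi3  # supply intercept (extra step)
print(alpha5_recovered, alpha4_recovered)
```

No similar ratio isolates the demand parameters, which is consistent with the demand equation having no excluded exogenous variable.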
(d) (4 points) Are the coefficients $\alpha_4$ and $\alpha_5$ of the supply function identified? Why? (NO algebra is necessary
here!)
Solution: Yes, they are identified since the necessary condition for identification is satisfied: there is
only one included endogenous variable and one excluded exogenous variable.
(e) (4 points) Are the coefficients $\alpha_1$, $\alpha_2$ and $\alpha_3$ of the demand function identified? Explain. (NO algebra is
necessary here!)
Solution: No, the parameters in the demand equation are not identified: there is one included endogenous
variable but no excluded exogenous variables.
8. SEM (7 points) Consider the following three-equation system:
$$Y_1 = \alpha_1 + \alpha_2 Y_2 + \alpha_3 X_1 + \alpha_4 X_2 + u_1$$
$$Y_2 = \beta_1 + \beta_2 Y_3 + \beta_3 X_2 + u_2$$
$$Y_3 = \gamma_1 + \gamma_2 Y_2 + u_3$$
(a) (3 points) Which of the above equations (if any) can be (consistently) estimated using OLS? Explain.
Solution: Estimating equations 2 and 3 by OLS will lead to simultaneity bias, since there are endogenous
variables on the right hand side of these equations. Equation 1 is more complicated, because $Y_1$ does not
appear on the right hand side of equations 2 or 3. Therefore, if $u_1$ is uncorrelated with $u_2$, then equation
1 can be consistently estimated using OLS.
(b) (4 points) Which of the above equations (if any) are under-identified? Exactly identified?
Over-identified? Explain your answer and be precise.
Solution: If $u_1$ is uncorrelated with $u_2$, then equation 1 can be estimated by OLS, so identification is
not an issue (if $u_1$ is correlated with $u_2$, then equation 1 is under-identified, since there are no available
exogenous instruments). The second equation is unidentified because there are no available instruments
in the third equation which are excluded from the second. The third is exactly identified (provided that
$\beta_3 \neq 0$) since there is a single instrument ($X_2$) available in the second equation for the endogenous variable
$Y_2$.
9. SEM (30 points) You are interested in the relationship between cigarette smoking and income.
(a) (3 points) A model to estimate the effects of smoking on annual income (perhaps through lost work
days due to illness, or productivity effects) is
$$\ln(income) = \beta_0 + \beta_1 cigs + \beta_2 educ + \beta_3 age + \beta_4 age^2 + u$$
where cigs is the number of cigarettes smoked per day, on average. How do you interpret $\beta_1$?
Solution: Assuming the structural equation represents a causal relationship, $100 \cdot \beta_1$ is the approximate
percentage change in income if a person smokes one more cigarette per day. Given effects on productivity,
we expect it to be negative (or maybe zero).
(b) (4 points) To reflect the fact that cigarette consumption might be jointly determined with income, a
demand for cigarettes equation is
$$cigs = \gamma_0 + \gamma_1 \ln(income) + \gamma_2 educ + \gamma_3 age + \gamma_4 age^2 + \gamma_5 \ln(cigpric) + \gamma_6 restaurn + v$$
where cigpric is the price of a pack of cigarettes (in cents), and restaurn is a binary variable equal to one
if the person lives in a state with restaurant smoking restrictions. Assuming these are exogenous to the
individual, what signs would you expect for $\gamma_5$ and $\gamma_6$? Explain your answers.
Solution: Since consumption and price are negatively related, we expect $\gamma_5 < 0$. Similarly, everything
else equal, restaurant smoking restrictions should reduce cigarette smoking (since the benefits and
opportunities are decreased), so $\gamma_6 < 0$.
(c) (4 points) Under what circumstances is the income equation from part a) identified? How about the demand equation in part b)?
Solution: We need either $\gamma_5$ or $\gamma_6$ (or both) to be different from zero. That is, we need at least one exogenous variable in the cigs equation that is not also in the ln(income) equation. The demand equation is not identified because there are no exogenous variables in the ln(income) equation that are not also in the cigs equation.
(d) (5 points) Estimating the income equation by OLS yields the following output
$$\ln(\text{income}) = \underset{(0.17)}{7.80} + \underset{(.0017)}{.0017}\,\text{cigs} + \underset{(.008)}{.060}\,\text{educ} + \underset{(.008)}{.058}\,\text{age} - \underset{(.0008)}{.00063}\,\text{age}^2, \qquad \bar{R}^2 = .165$$
Discuss the estimate of $\beta_1$ (remember to discuss the size and significance). Does this make sense? If not, what do you think is going wrong?
Solution: The coefficient on cigs implies that cigarette smoking causes income to increase. In particular, smoking one additional cigarette per day is expected to increase income by $100 \times .0017 = .17$ percent. However, the coefficient is not statistically different from zero (t-stat = 1) at even the 10% level. This result does not make sense, but since OLS ignores potential simultaneity between income and cigarette smoking, we probably do not have an unbiased or consistent estimate of the true effect.
(e) (3 points) The first stage (or reduced form) estimate for the income equation is
$$\text{cigs} = \underset{(23.70)}{1.58} - \underset{(.162)}{.450}\,\text{educ} + \underset{(.154)}{.823}\,\text{age} - \underset{(.0017)}{.0096}\,\text{age}^2 - \underset{(5.766)}{.351}\,\ln(\text{cigpric}) - \underset{(1.11)}{2.74}\,\text{restaurn}, \qquad \bar{R}^2 = .051$$
and the F-stat for the joint significance of ln(cigpric) and restaurn is 3.13. What do these results imply about the instruments available to identify the income equation? Explain.
Solution: While ln(cigpric) is very insignificant, restaurn has the expected negative sign and a t-statistic of about $-2.47$. (People living in states with restaurant smoking restrictions smoke almost three fewer cigarettes, on average, given education and age.) Moreover, the F-test of joint significance (3.13) is greater than the 5% critical value $F_{2,\infty} = 3$, so we can reject the null that both coefficients are equal to zero. However, the F-stat is nowhere near the cutoff of 10 suggested in S&W, implying that we should be worried about weak instruments.
(f) (4 points) Estimating the income equation by 2SLS yields
$$\ln(\text{income}) = \underset{(0.23)}{7.78} - \underset{(.026)}{.042}\,\text{cigs} + \underset{(.016)}{.040}\,\text{educ} + \underset{(.023)}{.094}\,\text{age} - \underset{(.00027)}{.00105}\,\text{age}^2$$
How does the 2SLS estimate of $\beta_1$ compare with the OLS estimate (be sure to discuss the size and significance of the coefficient)? Construct a 95% confidence interval for $\beta_1$ and discuss its implications.
Solution: Now the coefficient on cigs is negative, but not quite significant at the 10% level ($t = -1.615$). However, the estimated effect is very large: each additional cigarette someone smokes lowers predicted income by about 4.2%. Nonetheless, the 95% CI for $\beta_1$ is very wide:
$$\hat{\beta}_1 \pm 1.96\, SE(\hat{\beta}_1) = -.042 \pm 1.96 \times .026 = (-.093,\ .009)$$
as is the CI for the estimated impact (in percent) of smoking one additional cigarette per day:
$$100\,\hat{\beta}_1 \pm 1.96 \times 100\, SE(\hat{\beta}_1) = 100 \times (-.042) \pm 1.96 \times 100 \times .026 = (-9.3,\ .9).$$
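The interval arithmetic in part (f) can be checked with a few lines of Python. This is only a sketch: the helper name `conf_int` is made up for illustration, and the inputs are the 2SLS coefficient and standard error reported above.

```python
def conf_int(estimate, std_err, z=1.96):
    """Return the (lower, upper) large-sample confidence interval."""
    half_width = z * std_err
    return estimate - half_width, estimate + half_width

# 2SLS coefficient on cigs and its standard error, taken from part (f).
lo, hi = conf_int(-0.042, 0.026)
print(f"beta_1 interval:        ({lo:.3f}, {hi:.3f})")
print(f"percent-effect interval: ({100 * lo:.1f}, {100 * hi:.1f})")

# The interval contains zero exactly when the two-sided 5% test fails to reject.
print("contains zero:", lo < 0 < hi)
```

Because the interval straddles zero, it agrees with the t-statistic falling short of conventional critical values.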
(g) (6 points) Do you think that cigarette prices and restaurant smoking restrictions are likely to be exogenous in the income equation? Explain why or why not. The J-stat from the 2SLS estimation is 6.26. What do you conclude?
Solution: Assuming that state level cigarette prices and restaurant smoking restrictions are exogenous in the income equation seems problematic. Incomes are known to vary by region, as do restaurant smoking restrictions. It could be that in states where income is lower (after controlling for education and age), restaurant smoking restrictions are less likely to be in place. Also, cigarette prices (or taxes) may well be lower in low income states. Since there are two instruments and one endogenous regressor, the J-stat is distributed as $\chi^2_1$. The $\chi^2_1$ critical value is 6.63 at the 1% level and 3.84 at the 5% level. Therefore, we can reject the null hypothesis that both instruments are exogenous at the 5% but not the 1% level.
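As a cross-check on the critical-value comparison, the p-value of a $\chi^2_1$ statistic can be computed with the standard library alone, because a $\chi^2_1$ variable is the square of a standard normal. The function name below is an illustrative choice, not something from the course materials.

```python
import math

def chi2_1df_pvalue(stat):
    """Upper-tail probability of a chi-squared(1) variable.

    A chi-squared(1) draw is the square of a standard normal Z, so
    P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(stat / 2.0))

j_stat = 6.26  # J-statistic reported in part (g)
p_value = chi2_1df_pvalue(j_stat)
print(f"p-value = {p_value:.4f}")
print("reject at 5%:", p_value < 0.05)
print("reject at 1%:", p_value < 0.01)
```

The shortcut only works for one degree of freedom; with more over-identifying restrictions a general chi-squared routine would be needed.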
10. (18 points) Consider the univariate regression model
$$Y_i = \beta_0 + \beta_1 X_i + u_i$$
and let the correlation between $X_i$ and $u_i$ be $\mathrm{corr}(X_i, u_i) = \rho_{Xu}$. Suppose that the second and third OLS assumptions hold, but the first does not, because $\rho_{Xu}$ is nonzero. Also, let $\sigma_u$ be the standard deviation of $u$ and $\sigma_X$ be the standard deviation of $X$.
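(a) (5 points) Recalling that the OLS estimator of $\beta_1$ can be written as
$$\hat{\beta}_1 = \beta_1 + \frac{\frac{1}{n}\sum (X_i - \bar{X})\, u_i}{\frac{1}{n}\sum (X_i - \bar{X})^2}$$
prove that
$$\hat{\beta}_1 \xrightarrow{p} \beta_1 + \rho_{Xu}\frac{\sigma_u}{\sigma_X}$$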
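Solution: Given that the OLS estimator of $\beta_1$ can be written as
$$\hat{\beta}_1 = \beta_1 + \frac{\frac{1}{n}\sum (X_i - \bar{X})\, u_i}{\frac{1}{n}\sum (X_i - \bar{X})^2}$$
we can use our summation trick to re-write this as
$$\hat{\beta}_1 = \beta_1 + \frac{\frac{1}{n}\sum (X_i - \bar{X})(u_i - \bar{u})}{\frac{1}{n}\sum (X_i - \bar{X})^2}$$
Since we know
$$\frac{1}{n}\sum (X_i - \bar{X})^2 \xrightarrow{p} \sigma_X^2 \quad\text{and}\quad \frac{1}{n}\sum (X_i - \bar{X})\, u_i \xrightarrow{p} \mathrm{cov}(u_i, X_i) = \rho_{Xu}\,\sigma_u\,\sigma_X,$$
substituting this into the above equation yields the desired result
$$\hat{\beta}_1 \xrightarrow{p} \beta_1 + \frac{\rho_{Xu}\,\sigma_u\,\sigma_X}{\sigma_X^2} = \beta_1 + \rho_{Xu}\frac{\sigma_u}{\sigma_X}$$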
(b) (5 points) Suppose you now have a potential instrument $Z$ for $X$. Let the correlation between $Z_i$ and $u_i$ be $\mathrm{corr}(Z_i, u_i) = \rho_{Zu}$ and the correlation between $Z_i$ and $X_i$ be $\mathrm{corr}(Z_i, X_i) = \rho_{ZX}$. Recalling that the 2SLS estimator for $\beta_1$ can be written as
$$\hat{\beta}_1^{2SLS} = \beta_1 + \frac{\frac{1}{n}\sum (Z_i - \bar{Z})\, u_i}{\hat{\pi}_1 \frac{1}{n}\sum (Z_i - \bar{Z})^2}$$
where $\hat{\pi}_1$ is the OLS estimator for the slope in the regression of $X$ on $Z$, prove that
$$\hat{\beta}_1^{2SLS} \xrightarrow{p} \beta_1 + \frac{\rho_{Zu}}{\rho_{ZX}}\frac{\sigma_u}{\sigma_X}$$
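Solution: Given that the 2SLS estimator for $\beta_1$ can be written as
$$\hat{\beta}_1^{2SLS} = \beta_1 + \frac{\frac{1}{n}\sum (Z_i - \bar{Z})\, u_i}{\hat{\pi}_1 \frac{1}{n}\sum (Z_i - \bar{Z})^2}$$
we again use our summation trick to re-write this as
$$\hat{\beta}_1^{2SLS} = \beta_1 + \frac{\frac{1}{n}\sum (Z_i - \bar{Z})(u_i - \bar{u})}{\hat{\pi}_1 \frac{1}{n}\sum (Z_i - \bar{Z})^2}$$
Now, since $\hat{\pi}_1$ is the OLS estimator for the regression of $X$ on $Z$, we know that
$$\hat{\pi}_1 = \frac{\sum (Z_i - \bar{Z})\, X_i}{\sum (Z_i - \bar{Z})^2}$$
so we can re-write the 2SLS estimator as
$$\hat{\beta}_1^{2SLS} = \beta_1 + \frac{\frac{1}{n}\sum (Z_i - \bar{Z})\, u_i}{\frac{1}{n}\sum (Z_i - \bar{Z})\, X_i}$$
Now, since
$$\frac{1}{n}\sum (Z_i - \bar{Z})\, u_i \xrightarrow{p} \mathrm{cov}(u_i, Z_i) = \rho_{Zu}\,\sigma_u\,\sigma_Z \quad\text{and}\quad \frac{1}{n}\sum (Z_i - \bar{Z})\, X_i \xrightarrow{p} \mathrm{cov}(X_i, Z_i) = \rho_{ZX}\,\sigma_X\,\sigma_Z$$
substitution will yield the desired result:
$$\hat{\beta}_1^{2SLS} \xrightarrow{p} \beta_1 + \frac{\rho_{Zu}}{\rho_{ZX}}\frac{\sigma_u}{\sigma_X}$$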
(c) (4 points) Assume that $\sigma_u = \sigma_X$, so that the population variation in the error term is the same as it is in $X$. Suppose that the instrumental variable $Z$ is slightly correlated with $u$: $\mathrm{corr}(Z, u) = .1$. Suppose also that $Z$ and $X$ have a somewhat stronger correlation: $\mathrm{corr}(Z, X) = .2$. What is the asymptotic bias in the IV estimator?
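Solution: Using $\hat{\beta}_1^{2SLS} \xrightarrow{p} \beta_1 + \frac{\rho_{Zu}}{\rho_{ZX}}\frac{\sigma_u}{\sigma_X}$ with $\sigma_u = \sigma_X$, we get $\hat{\beta}_1^{2SLS} \xrightarrow{p} \beta_1 + \frac{.1}{.2} = \beta_1 + .5$. So the asymptotic bias is .5.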
(d) (4 points) How much correlation would have to exist between $X$ and $u$ before OLS has more asymptotic bias than 2SLS?
Solution: Using $\hat{\beta}_1 \xrightarrow{p} \beta_1 + \rho_{Xu}\frac{\sigma_u}{\sigma_X}$ with $\sigma_u = \sigma_X$, we get $\hat{\beta}_1 \xrightarrow{p} \beta_1 + \rho_{Xu}$. So we would need to have $\rho_{Xu} > .5$ before the asymptotic bias in OLS exceeds that of IV.
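The two probability limits can be illustrated with a small Monte Carlo sketch. Everything about the design below is an assumption made for the illustration (normal draws, unit variances, $\beta_1 = 1$, the sample size and the seed); only the correlations .1 and .2 come from part (c).

```python
import math
import random

def simulate_bias(rho_xu, rho_zu=0.1, rho_zx=0.2, beta1=1.0, n=100_000, seed=0):
    """Draw (Z, X, u) with unit variances and the given correlations,
    then return (OLS slope - beta1, IV slope - beta1)."""
    rng = random.Random(seed)
    # Cholesky-style construction from independent standard normals e1, e2, e3.
    x2 = math.sqrt(1 - rho_zx ** 2)
    u2 = (rho_xu - rho_zu * rho_zx) / x2
    u3 = math.sqrt(1 - rho_zu ** 2 - u2 ** 2)
    zs, xs, ys = [], [], []
    for _ in range(n):
        e1, e2, e3 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        z = e1
        x = rho_zx * e1 + x2 * e2
        u = rho_zu * e1 + u2 * e2 + u3 * e3
        zs.append(z)
        xs.append(x)
        ys.append(beta1 * x + u)

    def cov(a, b):
        ma, mb = sum(a) / n, sum(b) / n
        return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / n

    ols = cov(xs, ys) / cov(xs, xs)   # slope from regressing Y on X
    iv = cov(zs, ys) / cov(zs, xs)    # single-instrument IV (2SLS) slope
    return ols - beta1, iv - beta1

results = {}
for rho in (0.3, 0.7):
    results[rho] = simulate_bias(rho)
    ols_bias, iv_bias = results[rho]
    print(f"rho_Xu = {rho}: OLS bias ~ {ols_bias:.3f}, IV bias ~ {iv_bias:.3f}")
```

With $\rho_{Xu}$ below .5 the simulated OLS error should be the smaller of the two, and above .5 the larger, in line with part (d).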
11. (30 points) You are interested in estimating a demand equation for the fish sold at Manhattan's Fulton Fish market, so you go to the market and collect daily price and quantity observations for 97 consecutive days (since the market is closed on the weekends, you collect data for Monday through Friday). Specifically, you have data on the following variables:
totqty - the total quantity of fish sold that day
avgprc - the average price of fish sold that day
mon - a dummy for whether the day is a Monday
tues - a dummy for whether the day is a Tuesday
wed - a dummy for whether the day is a Wednesday
thurs - a dummy for whether the day is a Thursday
wave2 - the average maximum wave height over the two days prior to the price and quantity data
wave3 - the average maximum wave height three and four days prior to the price and quantity data
Note: even though we use time subscripts throughout this question, we will not be using time series methods here. We will also maintain the assumption that all errors are homoskedastic.
1. (a) (3 points) Assume the demand equation can be written for each time period as
$$\ln(\text{totqty}_t) = \beta_0 + \beta_1 \ln(\text{avgprc}_t) + \beta_2\,\text{mon}_t + \beta_3\,\text{tues}_t + \beta_4\,\text{wed}_t + \beta_5\,\text{thurs}_t + u_t$$
so that demand is allowed to differ across days of the week. Why is it inappropriate to use OLS to estimate this demand equation? What additional information do we need to consistently estimate the demand equation parameters?
Solution: Since price and quantity are determined in equilibrium (as the intersection of the supply and demand curves), $\ln(\text{avgprc}_t)$ will be endogenous in the demand equation above (i.e. correlated with the error $u_t$). To estimate the demand equation, we will need at least one exogenous variable that appears in the supply equation but not in the demand equation.
(b) (4 points) The variables $\text{wave2}_t$ and $\text{wave3}_t$ are measures of ocean wave heights over the past several days. What two assumptions do we need to make in order to use $\text{wave2}_t$ and $\text{wave3}_t$ as instruments for $\ln(\text{avgprc}_t)$ in estimating the demand equation? Be sure to discuss how these assumptions relate to the demand and supply equations.
Solution: For $\text{wave2}_t$ and $\text{wave3}_t$ to be valid IVs for $\ln(\text{avgprc}_t)$, we need two assumptions. The first is that these can be properly excluded from the demand equation (i.e. that they are exogenous). (This may not be entirely reasonable, as wave heights are determined partly by weather, and demand at a local fish market could depend on weather, but it's probably pretty reasonable to treat them as exogenous.) The second assumption is that at least one of $\text{wave2}_t$ and $\text{wave3}_t$ appears in the supply equation (i.e. that they are relevant). It seems reasonable that supply would depend on the conditions at sea.
(c) (5 points) The first stage of a 2SLS regression yields
$$\ln(\text{avgprc}_t) = \underset{(0.14)}{1.02} - \underset{(.114)}{.012}\,\text{mon}_t - \underset{(.1119)}{.0090}\,\text{tues}_t + \underset{(.112)}{.051}\,\text{wed}_t + \underset{(.111)}{.124}\,\text{thurs}_t + \underset{(.021)}{.094}\,\text{wave2}_t + \underset{(.020)}{.053}\,\text{wave3}_t, \qquad R^2 = .165$$
and a test of the joint significance of $\text{wave2}_t$ and $\text{wave3}_t$ yields an F-stat = 19.1, while a test of the joint significance of the day dummies yields an F-stat = .53. Are $\text{wave2}_t$ and $\text{wave3}_t$ individually significant at the 1% level? What do the results of this first stage regression reveal about our instruments?
Solution: In the first stage, we are primarily concerned with whether or not our instruments are weak. In this case, our instruments $\text{wave2}_t$ and $\text{wave3}_t$ are both individually and jointly significant. Their t-stats are $.094/.021 = 4.48$ and $.053/.020 = 2.65$, which are both greater than 2.58, so they are each individually significant at the 1% level. Since the F-stat = 19.1 > 10, we conclude that our instruments are not weak (using our Rule of Thumb).
(d) (4 points) Estimating the demand equation by 2SLS yields
$$\ln(\text{totqty}_t) = \underset{(.18)}{8.16} - \underset{(.327)}{.816}\,\ln(\text{avgprc}_t) - \underset{(.229)}{.307}\,\text{mon}_t - \underset{(.226)}{.685}\,\text{tues}_t - \underset{(.224)}{.521}\,\text{wed}_t + \underset{(.225)}{.095}\,\text{thurs}_t$$
where the standard errors in parentheses are the correct ones (i.e. take the two-stage procedure into account). What is the interpretation of the coefficient on $\ln(\text{avgprc}_t)$? Does its magnitude seem reasonable? Construct a 95% confidence interval for this coefficient.
Solution: Since this is a ln-ln specification, the coefficient on $\ln(\text{avgprc}_t)$ represents the elasticity of demand. The estimated coefficient implies that, holding all other variables constant, we expect a 1% increase in price to reduce quantity demanded by about .82% (or equivalently, a 10% increase in price to reduce quantity demanded by about 8.2%). This seems like a reasonable magnitude. A 95% confidence interval for this coefficient is given by $-.816 \pm 1.96 \times .327 = [-1.46,\ -.175]$.
(e) (3 points) Since we have two instruments and one endogenous variable, the demand equation is over-identified. The test of over-identifying restrictions yields an F-statistic of .013. What do you conclude?
Solution: We need to construct the J-statistic, which we know is given by $J = mF \sim \chi^2_{m-k}$, where $m$ is the number of instruments and $k$ is the number of endogenous variables. In this case $m = 2$ and $k = 1$, so $J = 2 \times .013 = .026$. The 10% critical value of the $\chi^2_1$ distribution is 2.71, so we cannot reject the null hypothesis that our instruments are exogenous (which is, of course, a good thing).
(f) (3 points) Given that the (unspecified) supply equation evidently depends on the wave variables, what two assumptions would we need to make in order to estimate the price elasticity of supply?
Solution: To estimate the supply elasticity, we would have to assume that the day-of-the-week dummies do not appear in the supply equation, but they do appear in the demand equation. Part d) provides evidence that there are day-of-the-week effects in the demand function. But we cannot know about the supply function.
(g) (4 points) Here are the results from the 2SLS estimation of a possible supply equation for this industry
$$\ln(\text{totqty}_t) = \underset{(2.23)}{10.82} + \underset{(2.24)}{2.13}\,\ln(\text{avgprc}_t) - \underset{(.212)}{.267}\,\text{wave2}_t - \underset{(.139)}{.169}\,\text{wave3}_t$$
What is the interpretation of the coefficient on $\ln(\text{avgprc}_t)$? Is it statistically significant at the 10% level?
Solution: Since this is also a ln-ln specification, the coefficient on $\ln(\text{avgprc}_t)$ in this estimation represents the elasticity of supply. The estimated coefficient implies that, holding all other variables constant, we expect a 1% increase in price to increase quantity supplied by about 2.13% (or equivalently, a 10% increase in price to increase quantity supplied by about 21.3%). This seems like a pretty big effect, maybe too big, but I didn't expect you to say this. Also, the t-stat on $\ln(\text{avgprc}_t)$ is $2.13/2.24 = .95$, meaning that we cannot reject that there is no effect of price on supply, even at the 10% level. These results do not instill confidence in our supply equation estimates.
(h) (4 points) Consider again the results in part c. Do they provide any explanation for what you found in part g? Why or why not?
Solution: The regression in part c is also the first stage of the 2SLS estimation of the supply curve estimated in part g. None of the day of week dummies show up as individually significant in that regression. Moreover, these four variables are jointly insignificant $\left(F = .53 < 1.94 = F^{10\%}_{4,\infty}\right)$ and the F-stat is nowhere near our Rule of Thumb cutoff of 10. This means that we have very weak instruments in our supply regression, which can cause both our coefficient estimates and standard errors to explode. This looks a lot like what we found in part g.
2. SEM. 4. (26 points total) Consider the two equation structural model given by
$$y_1 = \alpha_1 y_2 + \beta_1 z_1 + u_1 \qquad (1)$$
$$y_2 = \alpha_2 y_1 + \beta_2 z_2 + u_2 \qquad (2)$$
where $\alpha_1 < 0$ and $\alpha_2 > 0$, $z_1$ and $z_2$ are each uncorrelated with both $u_1$ and $u_2$, and $u_1$ is uncorrelated with $u_2$ by assumption.
(a) (5 points) The reduced form equation for $y_2$ takes the form
$$y_2 = \pi_{21} z_1 + \pi_{22} z_2 + \varepsilon_2 \qquad (3)$$
where $\pi_{21}$, $\pi_{22}$ and $\varepsilon_2$ are functions of the structural parameters ($\alpha$s, $\beta$s, and $u$s) in equations (1) and (2). Using equations (1) and (2), solve for this reduced form equation (in terms of the structural parameters).
Solution: If we plug the right hand side of (1) into (2) we get
$$y_2 = \alpha_2(\alpha_1 y_2 + \beta_1 z_1 + u_1) + \beta_2 z_2 + u_2$$
or
$$(1 - \alpha_2\alpha_1)\, y_2 = \alpha_2\beta_1 z_1 + \beta_2 z_2 + \alpha_2 u_1 + u_2$$
Since $\alpha_2\alpha_1 \neq 1$ (due to the sign restrictions imposed in the set-up), we can divide through by $(1 - \alpha_2\alpha_1)$, yielding
$$y_2 = \frac{\alpha_2\beta_1}{1 - \alpha_2\alpha_1} z_1 + \frac{\beta_2}{1 - \alpha_2\alpha_1} z_2 + \frac{\alpha_2 u_1 + u_2}{1 - \alpha_2\alpha_1}$$
so $\pi_{21} = \frac{\alpha_2\beta_1}{1 - \alpha_2\alpha_1}$, $\pi_{22} = \frac{\beta_2}{1 - \alpha_2\alpha_1}$, and $\varepsilon_2 = \frac{\alpha_2 u_1 + u_2}{1 - \alpha_2\alpha_1}$.
(b) (6 points) Using the reduced form equation you solved for in part a, explain why using OLS to estimate equation (1) will yield an inconsistent estimate of $\alpha_1$. What is the direction of the inconsistency (i.e. will OLS over-estimate or under-estimate the true effect)?
Solution: Our reduced form equation
$$y_2 = \frac{\alpha_2\beta_1}{1 - \alpha_2\alpha_1} z_1 + \frac{\beta_2}{1 - \alpha_2\alpha_1} z_2 + \frac{\alpha_2 u_1 + u_2}{1 - \alpha_2\alpha_1}$$
clearly illustrates the problem of simultaneity bias: $y_2$ is a function of the structural error ($u_1$) from equation (1), which violates OLS assumption 1. We can find the sign of the bias by calculating the covariance between $y_2$ and $u_1$ (since we know that the sign of the bias will coincide with the sign of their correlation).
$$\mathrm{cov}(y_2, u_1) = \mathrm{cov}\!\left(\frac{\alpha_2\beta_1}{1 - \alpha_2\alpha_1} z_1 + \frac{\beta_2}{1 - \alpha_2\alpha_1} z_2 + \frac{\alpha_2 u_1 + u_2}{1 - \alpha_2\alpha_1},\ u_1\right)$$
$$= \mathrm{cov}\!\left(\frac{\alpha_2 u_1 + u_2}{1 - \alpha_2\alpha_1},\ u_1\right) = \frac{\alpha_2}{1 - \alpha_2\alpha_1}\,\mathrm{cov}(u_1, u_1) = \frac{\alpha_2}{1 - \alpha_2\alpha_1}\,\sigma^2_{u_1}$$
Since $\alpha_1\alpha_2 < 0$ and $\alpha_2 > 0$, the (asymptotic) bias is positive and OLS will over-estimate the true effect.
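(c) (5 points) The reduced form equation for $y_1$ takes the form
$$y_1 = \pi_{11} z_1 + \pi_{12} z_2 + \varepsilon_1 \qquad (4)$$
where $\pi_{11}$, $\pi_{12}$ and $\varepsilon_1$ are functions of the structural parameters ($\alpha$s, $\beta$s, and $u$s) in equations (1) and (2). Using equations (1) and (2), solve for this reduced form equation.
Solution: Following similar steps to part (a) yields
$$y_1 = \frac{\beta_1}{1 - \alpha_2\alpha_1} z_1 + \frac{\alpha_1\beta_2}{1 - \alpha_2\alpha_1} z_2 + \frac{\alpha_1 u_2 + u_1}{1 - \alpha_2\alpha_1}$$
so
$$\pi_{11} = \frac{\beta_1}{1 - \alpha_2\alpha_1}, \quad \pi_{12} = \frac{\alpha_1\beta_2}{1 - \alpha_2\alpha_1}, \quad\text{and}\quad \varepsilon_1 = \frac{\alpha_1 u_2 + u_1}{1 - \alpha_2\alpha_1}$$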
(d) (4 points) Can we obtain consistent estimates of the parameters ($\pi$s) in equations (3) and (4) using OLS? Why or why not?
Solution: Yes. Both $z_1$ and $z_2$ are uncorrelated with both $u_1$ and $u_2$ by assumption. Since the errors (the $\varepsilon$s) in the reduced form equations (3) and (4) are functions of the $u$s, the $z$s are also uncorrelated with these reduced form errors, implying that OLS Assumption 1 is satisfied for equations (3) and (4). This means that estimating (3) and (4) by OLS will yield consistent estimates of the $\pi$s.
(e) (6 points) Estimation of the reduced form equations (3) and (4) by OLS yields
$$\hat{y}_2 = 1.42\, z_1 + 1.72\, z_2$$
$$\hat{y}_1 = 1.29\, z_1 - 2.07\, z_2$$
where we will ignore the issue of calculating standard errors and focus only on parameter estimates. Can you use these estimates to construct consistent estimates of $\alpha_1$ and $\beta_1$? If yes, do so. If not, explain why you can't. Hint: think about using ratios of the estimated parameters...
Solution: Using the reduced form equations, we see that we can construct a consistent estimate of $\alpha_1$ using the ratio
$$\frac{\pi_{12}}{\pi_{22}} = \frac{\alpha_1\beta_2}{1 - \alpha_2\alpha_1} \cdot \frac{1 - \alpha_2\alpha_1}{\beta_2} = \alpha_1, \qquad \frac{\hat{\pi}_{12}}{\hat{\pi}_{22}} = \frac{-2.07}{1.72} = -1.20$$
Similarly, we can construct a consistent estimate of $\alpha_2$ using
$$\frac{\hat{\pi}_{21}}{\hat{\pi}_{11}} = \frac{1.42}{1.29} = 1.10$$
Finally, since a consistent estimate of $1 - \alpha_2\alpha_1$ is $1 - (1.10 \times (-1.20)) = 2.32$, a consistent estimate of $\beta_1$ is then
$$(1 - \hat{\alpha}_2\hat{\alpha}_1)\,\hat{\pi}_{11} = 2.32 \times 1.29 = 3$$
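The ratio calculations in part (e) amount to indirect least squares. The sketch below packages them in a helper whose name and dictionary layout are purely illustrative; the $\beta_2$ line is not asked for in the problem and simply follows from $\pi_{22} = \beta_2/(1 - \alpha_2\alpha_1)$.

```python
def indirect_least_squares(pi11, pi12, pi21, pi22):
    """Map reduced-form slopes into structural parameters for the
    two-equation model in (1)-(2):
        alpha1 = pi12 / pi22,  alpha2 = pi21 / pi11,
        beta1  = (1 - alpha1 * alpha2) * pi11,
        beta2  = (1 - alpha1 * alpha2) * pi22.
    """
    alpha1 = pi12 / pi22
    alpha2 = pi21 / pi11
    scale = 1 - alpha1 * alpha2
    return {"alpha1": alpha1, "alpha2": alpha2,
            "beta1": scale * pi11, "beta2": scale * pi22}

# Reduced-form estimates reported in part (e).
est = indirect_least_squares(pi11=1.29, pi12=-2.07, pi21=1.42, pi22=1.72)
for name, value in est.items():
    print(f"{name} = {value:.2f}")
```

Working with unrounded ratios gives essentially the same answer as the hand calculation, which rounds $\hat{\alpha}_1$ and $\hat{\alpha}_2$ to two decimals first.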
3. (31 points total) You are interested in the impact children have on a woman's choice of whether to work (i.e. how fertility impacts labor supply). Specifically, you want to know whether (and by how much) a woman's labor supply falls when she has an additional child. You have collected data from the 1980 U.S. Census on 250,000 married women aged 21-35 with two or more children. Your dataset contains the following variables:
weeks - total weeks the woman worked in 1979
morekids - dummy equal to 1 if the woman has three or more kids
age - age of the woman
samesex - dummy equal to 1 if the two oldest children are of the same sex (i.e. boy-boy or girl-girl)
1. (a) (4 points) A simple OLS regression of weeks on morekids yields the following output
$$\text{weeks} = \underset{(.056)}{21.07} - \underset{(.087)}{5.39}\,\text{morekids}$$
On average, do women with more than two children work less than women with two children? How much less? Is the difference statistically significant at the 1% level?
Solution: The OLS regression shows that women with three or more kids work 5.39 fewer weeks (per year) than women with two kids. The difference is statistically significant at the 1% level since $|t| = 5.39/.087 = 61.96 > 2.58$.
(b) (4 points) Is there a good reason to think that OLS is an inappropriate technique for estimating the causal effect of fertility (morekids) on labor supply (weeks)? Explain why or why not.
Solution: There is a good reason to believe OLS is inappropriate. Both the LHS and RHS variables here are choice variables of the woman and are probably influenced by the same underlying unobservables. In particular, the type of unobserved variable that would lead a woman to work a greater than average number of weeks would likely also lead her to have fewer children (meaning that morekids is likely negatively correlated with the regression error, causing $\hat{\beta}_1$ to be negatively biased).
(c) (4 points) You decide to examine whether the decision to have more than two children is influenced by the gender of the first two children. To examine this issue, you regress the variable morekids on samesex (using a LPM), yielding the following results
$$\text{morekids} = \underset{(.001)}{.346} + \underset{(.002)}{.068}\,\text{samesex}$$
Are couples whose first two children are of the same sex more likely to have a third child? Is the effect large? Is it statistically significant at the 1% level?
Solution: The regression reveals that couples whose first two children are the same sex are indeed more likely to have a third child. Specifically, they are 6.8 percentage points more likely, which is a pretty big effect. It is statistically significant at the 1% level since $t = .068/.002 = 34 > 2.58$.
(d) (4 points) Do you think samesex is a valid instrument for an IV regression of weeks on morekids? Why or why not? Is samesex a weak instrument?
Solution: Since people don't/can't really choose the gender of their children (yet!), it would seem to be a nice source of random variation that pushes around how many kids people choose to have (i.e. it's exogenous). We have already established relevance in part c), since the coefficient on samesex was positive and highly significant. Moreover, using the F-statistic based rule of thumb, it is also quite strong, since $F = t^2 = 1156 > 10$.
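With a single instrument, the first-stage F-statistic is just the squared t-statistic on that instrument, so the rule-of-thumb check takes a couple of lines. The function and constant names below are illustrative only.

```python
def first_stage_f(coef, std_err):
    """With a single instrument, the first-stage F-statistic equals t squared."""
    t_stat = coef / std_err
    return t_stat ** 2

RULE_OF_THUMB = 10  # weak-instrument cutoff used in these solutions

# First-stage coefficient on samesex and its standard error, from part (c).
f_samesex = first_stage_f(0.068, 0.002)
print(f"first-stage F = {f_samesex:.0f}")
print("weak by the rule of thumb:", f_samesex < RULE_OF_THUMB)
```

The shortcut $F = t^2$ applies only when one restriction is tested under homoskedasticity-only or matching robust standard errors; with several instruments the joint F-statistic must be computed directly.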
(e) (4 points) You now estimate the 2SLS regression of weeks on morekids using samesex as the instrument. Here are the results:
$$\text{weeks} = \underset{(.487)}{21.42} - \underset{(1.27)}{6.31}\,\text{morekids}$$
How large is the fertility effect on labor supply now? Is it statistically significant?
Solution: The 2SLS regression shows that women with three or more kids work 6.31 fewer weeks (per year) than women with two kids. The difference is statistically significant at the 1% level since $|t| = 6.31/1.27 = 4.97 > 2.58$.
(f) (4 points) Would the woman's level of educational attainment be a valid instrument? Why or why not? How about the woman's age?
Solution: Neither of these candidates seems like a good instrument. While both are likely to be relevant (more educated women might have fewer kids, and older women likely have more kids, just because they've had more time to have them!), neither is likely to be exogenous. Education and age probably have strong direct effects on weeks worked!
(g) (4 points) We don't observe the level of education in this dataset, but we do have age. Here's what we get when we use age (as well as samesex) as an instrument in the 2SLS regression of weeks on morekids
$$\text{weeks} = \underset{(.352)}{6.97} + \underset{(.921)}{31.67}\,\text{morekids}$$
What is the fertility effect on labor supply now? Is it statistically significant? Do you think you have correctly recovered the causal effect now? Why or why not?
Solution: Now we find the opposite effect: women with three or more kids work 31.67 more weeks per year than women with two kids. The result is also highly significant, since $t = 31.67/.921 = 34.4$. However, you should not be comfortable concluding that you have recovered a causal effect; this is an implausible outcome, likely driven by the fact that we have included an invalid instrument in our 2SLS regression! (We will confirm this below.)
(h) (3 points) The J-statistic from the regression in part g) is 791.8. What do you conclude?
Solution: Since we have two instruments and one endogenous variable, the J-statistic will have a $\chi^2_1$ distribution. The 1% critical value for the $\chi^2_1$ distribution is 6.63, so with a J-statistic of 791.8 we reject the null at any level of significance, meaning that at least one of our instruments is likely endogenous (not surprising, given the results in part g)).
2. (24 points total) Never tiring of the same old subject, you decide to once again look at the returns to education. To do so, you collected data from the NLSY on 1230 white males who are currently in the workforce. You have data on the following variables:
ln(wage) - log of the worker's hourly wage (in dollars)
educ - highest grade of school completed
exper - years of experience in the workforce
ctuit - average level of college tuition (in $1000s) in the state where the person resides
1. (a) (4 points) To start with, you simply regress ln(wage) on educ using OLS. Here's what you find
$$\ln(\text{wage}) = \underset{(.099)}{1.09} + \underset{(.007)}{.101}\,\text{educ}$$
What is the estimated effect of an additional year of schooling? Construct a 95% confidence interval for this effect.
Solution: Since this is a log-level regression, the coefficient on educ implies that an additional year of schooling is associated with a 10.1% increase in hourly wages. The 95% confidence interval is simply
$.101 \pm 1.96 \times .007 = .101 \pm .01372 = (.087,\ .115)$
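The "10.1%" reading is the usual approximation $100 \cdot \hat{\beta}$ for a log-level model. As a side check (not part of the original solution), the exact percentage change implied by the same coefficient is $100\,(e^{\hat{\beta}} - 1)$; the helper name below is illustrative.

```python
import math

def exact_pct_change(coef, delta=1.0):
    """Exact percentage change in the level of y implied by a log-level
    coefficient when the regressor rises by `delta` units."""
    return 100 * (math.exp(coef * delta) - 1)

b_educ = 0.101  # OLS coefficient on educ from part (a)
print(f"approximate: {100 * b_educ:.1f}%")
print(f"exact:       {exact_pct_change(b_educ):.1f}%")
```

The gap between the two is small for coefficients near .1 and grows as the coefficient gets larger.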
(b) (4 points) In search of a valid instrument, you try regressing educ on ctuit. Here's what you find
$$\text{educ} = \underset{(.067)}{13.04} - \underset{(.079)}{.049}\,\text{ctuit}$$
What do you conclude about the potential usefulness of ctuit as an instrument for educ?
Solution: We find (perhaps not all that surprisingly) that the average college tuition in the state is negatively correlated with a person's highest grade of school completed. However, it is not particularly significant: $t = -.049/.079 = -.62$. So even though college tuition would seem to be exogenous (probably not related to a given person's wage) and potentially relevant (people consume less of things that are more expensive, including education), we don't really have the best tuition measure here (the average tuition in the state where the person resides is probably not the relevant price). So it's unclear exactly what we are capturing here. More importantly, $F = t^2 = .384 \ll 10$, so it's very weak. Overall, it doesn't look like a good instrument.
(c) (4 points) You decide to include some additional regressors in your original OLS regression. In particular, you add exper, exper$^2$, and some dummy variables for various geographic regions. You find the following results
$$\ln(\text{wage}) = \underset{(.241)}{.507} + \underset{(.009)}{.137}\,\text{educ} + \underset{(.027)}{.112}\,\text{exper} - \underset{(.001)}{.003}\,\text{exper}^2$$
where I have omitted the coefficients on the regional dummies for brevity. What is the estimated effect of an additional year of schooling now? Construct a 95% confidence interval for this effect.
Solution: Since this is still a log-level regression, the coefficient on educ implies that an additional year of schooling is associated with a 13.7% increase in hourly wages. This is a pretty big effect, likely biased upward by omitted variable bias. The 95% confidence interval is simply $.137 \pm 1.96 \times .009 = .137 \pm .01764 = (.119,\ .155)$
(d) (3 points) Based on the regression results in part c), how does experience impact earnings? (Just describe in words, no calculations are necessary here).
Solution: Experience is modelled here as a quadratic, but wages are in logs. Given the signs (one positive, the other negative), we are finding that the log of wages increases with experience, but at a decreasing rate (i.e. it's concave). Having the quadratic term is quite important here since, without it, wages (the level, not the log) would have a convex relationship to experience, which doesn't make much sense (unlike education, whose impact on earnings is likely to be convex, we probably expect the returns to experience to hit diminishing returns at some point).
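Although the question asks only for a verbal answer, the concavity can be made concrete with the part (c) coefficients. The sketch below uses illustrative helper names and assumes the coefficient on the squared term is negative, as the solution states.

```python
def turning_point(b_linear, b_squared):
    """Value of x at which b_linear * x + b_squared * x**2 peaks
    (requires b_squared < 0)."""
    if b_squared >= 0:
        raise ValueError("no interior maximum unless the squared term is negative")
    return -b_linear / (2 * b_squared)

def marginal_effect(b_linear, b_squared, x):
    """Derivative of the quadratic at x: approximate proportional wage change
    from one more year of experience."""
    return b_linear + 2 * b_squared * x

b_exper, b_exper2 = 0.112, -0.003  # coefficients from part (c)
peak = turning_point(b_exper, b_exper2)
print(f"log wage peaks at about {peak:.1f} years of experience")
for years in (0, 10, 20):
    effect = 100 * marginal_effect(b_exper, b_exper2, years)
    print(f"at {years} years: about {effect:.1f}% per extra year")
```

The marginal return shrinks steadily with experience and turns slightly negative past the peak, which is the diminishing-returns pattern described in the solution.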
(e) (4 points) Using your new covariates (including the regional dummies), you re-estimate the effect of ctuit on educ using OLS. Now you find a coefficient of -.165 with a standard error of .075. How do you feel about ctuit as an instrument now?
Solution: The t-statistic on ctuit is now $t = -.165/.075 = -2.2$. While it is statistically significant (at the 5% level, but not the 1% level), it is still weak, since $F = t^2 = 4.84 < 10$. It still looks like a poor choice of instrument!
(f) (5 points) You decide to re-estimate the regression in part c) using 2SLS, with ctuit as an instrument. Here's what you find
$$\ln(\text{wage}) = \underset{(2.57)}{2.89} + \underset{(.121)}{.250}\,\text{educ} + \underset{(.108)}{.209}\,\text{exper} - \underset{(.002)}{.005}\,\text{exper}^2$$
What does this regression tell you about the returns to education? Do you have faith in these results? Why or why not?
Solution: Contrary to what we should expect (i.e. that OLS should be biased upwards), we are now finding an even larger impact of education. In particular, since this is still a log-level regression, the coefficient on educ implies that an additional year of schooling is associated with a 25% increase in hourly wages. This implausible result is almost certainly driven by our weak instrument, which we expect to bias the coefficient upward.
2. (24 points total) You are interested in estimating treatment effects for a heterogeneous population. We know that, since the population is heterogeneous, the population regression equation can be written as
$$Y_i = \beta_{0i} + \beta_{1i} X_i + u_i$$
In the setting you are interested in, treatment is only partially randomly determined, but you do have a valid instrument $Z_i$ with which to perform IV. However, there is also heterogeneity in the effect of $Z_i$ on $X_i$. In particular, $X_i$ is related to $Z_i$ by the linear model
$$X_i = \pi_{0i} + \pi_{1i} Z_i + v_i$$
Suppose you know that $\pi_{0i}$, $\pi_{1i}$, $\beta_{0i}$, and $\beta_{1i}$ are distributed independently of $u_i$, $v_i$, and $Z_i$, that $E(u_i \mid Z_i) = E(v_i \mid Z_i) = 0$, and that $E(\pi_{1i}) \neq 0$. We are going to show that
$$\hat{\beta}_1^{2SLS} = \frac{\sum_{i=1}^{n} (Z_i - \bar{Z})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (Z_i - \bar{Z})(X_i - \bar{X})} \xrightarrow{p} \frac{E(\beta_{1i}\pi_{1i})}{E(\pi_{1i})}$$
(Note: we showed this equation in class, but did not prove it. Now, we will.)
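(a) (3 points) Let's start with the first step. Assuming you have an iid sample, show that
$$\hat{\beta}_1^{2SLS} \xrightarrow{p} \frac{\sigma_{ZY}}{\sigma_{ZX}}$$
Solution:
$$\hat{\beta}_1^{2SLS} = \frac{\frac{1}{n-1}\sum_{i=1}^{n} (Z_i - \bar{Z})(Y_i - \bar{Y})}{\frac{1}{n-1}\sum_{i=1}^{n} (Z_i - \bar{Z})(X_i - \bar{X})} = \frac{s_{ZY}}{s_{ZX}} \xrightarrow{p} \frac{\sigma_{ZY}}{\sigma_{ZX}}$$
by the Law of Large Numbers.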
(b) (4 points) Let's focus first on the denominator of $\sigma_{ZY}/\sigma_{ZX}$. Using the definition of covariance, show that
$$\sigma_{ZX} = E[(Z_i - \mu_Z)\,\pi_{0i}] + E[\pi_{1i} Z_i (Z_i - \mu_Z)] + E[(Z_i - \mu_Z)\, v_i]$$
Solution: By the definition of covariance
$$\sigma_{ZX} = E[(Z_i - \mu_Z)(X_i - \mu_X)] = E[(Z_i - \mu_Z)\, X_i] = E[(Z_i - \mu_Z)(\pi_{0i} + \pi_{1i} Z_i + v_i)]$$
Breaking up this last term yields
$$E[(Z_i - \mu_Z)(\pi_{0i} + \pi_{1i} Z_i + v_i)] = E[(Z_i - \mu_Z)\,\pi_{0i}] + E[\pi_{1i} Z_i (Z_i - \mu_Z)] + E[(Z_i - \mu_Z)\, v_i]$$
(c) (4 points) Focusing on the first term of the equation from part b), show that $E[(Z_i - \mu_Z)\,\pi_{0i}] = 0$
Solution:
$$E[(Z_i - \mu_Z)\,\pi_{0i}] = E(Z_i - \mu_Z)\, E(\pi_{0i}) \quad \text{(since } Z_i \text{ is distributed independently of } \pi_{0i}\text{)}$$
$$= (E(Z_i) - \mu_Z)\, E(\pi_{0i}) = (\mu_Z - \mu_Z)\, E(\pi_{0i}) = 0$$
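(d) (4 points) Looking now at the second term, show that $E[\pi_{1i} Z_i (Z_i - \mu_Z)] = \sigma_Z^2\, E(\pi_{1i})$
Solution:
$$E[\pi_{1i} Z_i (Z_i - \mu_Z)] = E(\pi_{1i})\, E[Z_i (Z_i - \mu_Z)] \quad \text{(since } Z_i \text{ is distributed independently of } \pi_{1i}\text{)}$$
$$= E(\pi_{1i})\, E[(Z_i - \mu_Z)(Z_i - \mu_Z)] \quad \text{(since } E[\mu_Z (Z_i - \mu_Z)] = 0\text{)}$$
$$= \sigma_Z^2\, E(\pi_{1i})$$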
(e) (3 points) Finally, looking now at the third term, show that $E[(Z_i - \mu_Z)\, v_i] = 0$
Solution:
$$E[(Z_i - \mu_Z)\, v_i] = E[(Z_i - \mu_Z)(v_i - \mu_v)] = \mathrm{cov}(Z_i, v_i) = 0 \quad \text{by assumption}$$
(f) (3 points) Now, putting it all together, show that $\sigma_{ZX} = \sigma_Z^2\, E(\pi_{1i})$
Solution: Very simple! You just combine the results of the previous steps.
$$\sigma_{ZX} = E[(Z_i - \mu_Z)\,\pi_{0i}] + E[\pi_{1i} Z_i (Z_i - \mu_Z)] + E[(Z_i - \mu_Z)\, v_i]$$
$$= 0 + \sigma_Z^2\, E(\pi_{1i}) + 0 = \sigma_Z^2\, E(\pi_{1i})$$
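(g) (3 points) In the interest of time, I won't have you prove that $\sigma_{ZY} = \sigma_Z^2\, E(\beta_{1i}\pi_{1i})$. Instead, taking this result as given, use the preceding results to show that
$$\hat{\beta}_1^{2SLS} = \frac{s_{ZY}}{s_{ZX}} \xrightarrow{p} \frac{\sigma_{ZY}}{\sigma_{ZX}} = \frac{E(\beta_{1i}\pi_{1i})}{E(\pi_{1i})}$$
Solution: From part a) we have $\hat{\beta}_1^{2SLS} = \frac{s_{ZY}}{s_{ZX}} \xrightarrow{p} \frac{\sigma_{ZY}}{\sigma_{ZX}}$, from part f) we have $\sigma_{ZX} = \sigma_Z^2\, E(\pi_{1i})$, and finally, we are given that $\sigma_{ZY} = \sigma_Z^2\, E(\beta_{1i}\pi_{1i})$. Therefore
$$\hat{\beta}_1^{2SLS} = \frac{s_{ZY}}{s_{ZX}} \xrightarrow{p} \frac{\sigma_{ZY}}{\sigma_{ZX}} = \frac{\sigma_Z^2\, E(\beta_{1i}\pi_{1i})}{\sigma_Z^2\, E(\pi_{1i})} = \frac{E(\beta_{1i}\pi_{1i})}{E(\pi_{1i})}.$$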
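The limit just derived can be illustrated by simulation. The design below is entirely hypothetical: two equally likely types with $(\beta_{1i}, \pi_{1i})$ equal to $(1, 0.5)$ and $(3, 1.5)$, zero intercepts, normal draws, and an arbitrary seed. Under these assumptions $E(\beta_{1i}\pi_{1i})/E(\pi_{1i}) = 2.5$ while the average effect $E(\beta_{1i})$ is 2.

```python
import random

# Hypothetical (beta_1i, pi_1i) pairs; each type is equally likely.
TYPES = [(1.0, 0.5), (3.0, 1.5)]

def simulate_iv(n=100_000, seed=1):
    """Return the IV estimate s_ZY / s_ZX from one simulated sample."""
    rng = random.Random(seed)
    zs, xs, ys = [], [], []
    for _ in range(n):
        beta1, pi1 = rng.choice(TYPES)
        z = rng.gauss(0, 1)
        v = rng.gauss(0, 1)
        u = 0.5 * v + rng.gauss(0, 1)  # u is correlated with v, so X is endogenous
        x = pi1 * z + v
        y = beta1 * x + u
        zs.append(z)
        xs.append(x)
        ys.append(y)

    def cov(a, b):
        ma, mb = sum(a) / n, sum(b) / n
        return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / n

    return cov(zs, ys) / cov(zs, xs)

iv_estimate = simulate_iv()
weighted = sum(b * p for b, p in TYPES) / sum(p for _, p in TYPES)
average = sum(b for b, _ in TYPES) / len(TYPES)
print(f"IV estimate:                 {iv_estimate:.3f}")
print(f"E(beta*pi)/E(pi) (weighted): {weighted:.3f}")
print(f"E(beta) (simple average):    {average:.3f}")
```

The IV estimate should sit near the weighted average rather than the simple average, because individuals whose treatment responds more strongly to the instrument receive more weight.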
7 Estimation with Panel Data
1. 25 Points overall. A researcher investigating the determinants of crime in the United Kingdom has data for 42 police regions over 22 years. She estimates by OLS the following regression
$$\ln(\text{cmrt})_{it} = \alpha_i + \phi_t + \beta_1\,\text{unrtm}_{it} + \beta_2\,\text{proyth}_{it} + \beta_3 \ln(\text{pp})_{it} + u_{it}; \quad i = 1, \ldots, 42,\ t = 1, \ldots, 22$$
where cmrt is the crime rate per head in the population, unrtm is the unemployment rate of males, proyth is the proportion of youths, and pp is the probability of punishment measured as (number of convictions)/(number of crimes reported). $\alpha$ and $\phi$ are area and year fixed effects, where $\alpha_i$ equals one for area $i$ and is zero otherwise for all $i$, and $\phi_t$ is one in year $t$ and zero for all other years for $t = 2, \ldots, 22$. $\phi_1$ is not included.
(a) (4 points) What is the purpose of excluding $\phi_1$?
Solution: We leave out $\phi_1$ to avoid perfect multicollinearity. If you included $\phi_1$, the sum of the $\phi$s
would always equal one, which is also the sum of the $\alpha$s.
(b) (4 points) Briefly discuss the advantages of using panel data for this type of investigation.
Solution: Using panel data, we can control for group-specific, time-invariant effects, such as attitude
towards crime in urban vs. rural areas, and we can also control for time-specific, group-invariant effects,
such as macroeconomic shocks.
(c) (4 points) Estimation by OLS using heteroskedasticity-robust standard errors results in the following
output, where the coefficients of the fixed effects are not reported for brevity:
$\ln(cmrt)_{it} = \underset{(0.109)}{0.063}\,unrtm_{it} + \underset{(0.179)}{3.739}\,proyth_{it} - \underset{(0.024)}{0.588}\,\ln(pp)_{it}$; $R^2 = 0.904$
Briefly interpret the signs of the coefficients. Do the coefficients have the expected signs? Justify your
answer.
Solution: A higher male unemployment rate and a higher proportion of youths increase the crime rate,
while a higher probability of punishment decreases the crime rate. The coefficients on the probability of
punishment and the proportion of youths are statistically significant, while the coefficient on the male
unemployment rate is not ($t = 0.063/0.109 \approx 0.58$). The regression explains roughly 90 percent of the
variation in crime rates in the sample.
(d) (4 points) Using the results above, what is the effect of a ten percent increase in the probability of
punishment?
Solution: Both variables are in logs, so the coefficient is an elasticity. A ten percent increase in the
number of convictions over the number of crimes reported decreases the crime rate by about
$0.588 \times 10 = 5.88$ percent.
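The 5.88 percent figure uses the usual small-change approximation for a log-log coefficient. As a hedged illustration, the snippet below compares that approximation with the exact change implied by the estimated coefficient of -0.588 for a 10 percent increase; the function name is a hypothetical helper, not something from the course.

```python
def pct_change_in_y(elasticity, pct_change_in_x):
    """Exact percent change in Y implied by a log-log coefficient."""
    return 100.0 * ((1.0 + pct_change_in_x / 100.0) ** elasticity - 1.0)


elasticity = -0.588                      # estimated coefficient on ln(pp)
approx = elasticity * 10                 # linear approximation: -5.88 percent
exact = pct_change_in_y(elasticity, 10)  # roughly -5.45 percent
print(f"approximate: {approx:.2f}%, exact: {exact:.2f}%")
```

The two numbers differ by less than half a percentage point, which is why the approximation is acceptable for a change of this size.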
(e) (4 points) You want to test for the relevance of the year fixed effects, and the relevant F-statistic is 1.7.
Using a 1% significance level, what do you conclude?
Solution: Since there are 22 years in the sample, there are 21 restrictions imposed by eliminating the
year fixed effects and adding a constant. The critical value is about 1.85 at the 1% level. Since 1.7 < 1.85,
we cannot reject the restriction.
(f) (5 points) You would like to use a Random Effects estimator (RE), but you are afraid that there might
be correlation between the regressors and the police region fixed effect. You therefore run a Hausman
test, and the result is 21.5. What do you conclude?
Solution: The Hausman test statistic is 21.5 and there are 24 degrees of freedom (3 variables plus 21
time dummies). The critical value of the $\chi^2_{24}$ distribution is 42.98 at the 1% level and 36.41 at the
5% level. Therefore, we cannot reject the hypothesis that there is no correlation between the regressors
and the region effects.
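For readers without a $\chi^2$ table at hand, the p-value of the Hausman statistic can be computed directly. The sketch below uses the closed-form survival function that holds when the degrees of freedom are even, $P(\chi^2_{2m} > x) = e^{-x/2}\sum_{j=0}^{m-1} (x/2)^j / j!$; the function name is a hypothetical helper and the approach does not cover odd degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """P(chi-square with `df` degrees of freedom > x), valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = 1.0      # (x/2)^j / j! for j = 0
    total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


p_value = chi2_sf_even_df(21.5, 24)   # Hausman statistic with 24 df
print(f"p-value = {p_value:.3f}")
```

The p-value is far above 0.05, in line with the conclusion that the null cannot be rejected.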
2. (16 points) Recall the example from class, where we had data on traffic fatalities[5] and the real tax[6] on a
case of beer in 48 U.S. states
$FatalityRate_i = \beta_0 + \beta_1 BeerTax_i + \beta_2 CultFactors_i + u_i$
but we don't observe (among other things) cultural attitudes toward drinking and driving. We addressed this
problem by specifying the model
$FatalityRate_{it} = \beta_1 BeerTax_{1,it} + \alpha_i + \lambda_t + u_{it}$
which we estimated using state and year fixed effects as
$FatalityRate = \underset{(0.25)}{-0.64}\,BeerTax + StateFEs + YearFEs$
(a) (3 points) Construct a 99% confidence interval for the effect of a 50¢ increase in the $BeerTax$ on the
fatality rate.
Solution: A 99% confidence interval for the effect of a 50¢ increase in the $BeerTax$ on the fatality rate
is the same as a 99% CI for $\frac{1}{2}\beta_1$ (since $BeerTax$ is measured in dollars):
99% CI for $\frac{1}{2}\beta_1 = \frac{1}{2}\hat{\beta}_1 \pm 2.58 \cdot \frac{1}{2}\,SE(\hat{\beta}_1) = \frac{1}{2}(-.64) \pm 2.58 \cdot \frac{1}{2} \cdot .25 = (-.643, .003)$
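The interval arithmetic above can be reproduced in a few lines. This is a minimal sketch: the helper name `scaled_ci` is hypothetical, and 2.58 is the large-sample 99% critical value used in the solution.

```python
def scaled_ci(beta_hat, se, scale, z=2.58):
    """Confidence interval for scale * beta, given beta_hat and its standard error."""
    center = scale * beta_hat
    half_width = z * abs(scale) * se
    return center - half_width, center + half_width


# Effect of a 50-cent (0.5 dollar) increase in the beer tax.
lower, upper = scaled_ci(beta_hat=-0.64, se=0.25, scale=0.5)
print(f"99% CI: ({lower:.4f}, {upper:.4f})")
```

Rounded to three decimals this gives the interval $(-.643, .003)$ reported in the solution.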
(b) (4 points) Using the confidence interval constructed above, can you reject the null hypothesis that a 50¢
increase in the $BeerTax$ will decrease the fatality rate by 6 deaths per 100,000?
Solution: This is the same as asking if you can reject the null hypothesis that a 50¢ increase in the
$BeerTax$ will decrease the fatality rate by .6 deaths per 10,000. You cannot reject this null hypothesis
because -.6 lies inside the confidence interval constructed in part a).
(c) (6 points) This estimated relationship between beer tax and the fatality rate is immune to omitted
variable bias from variables that are constant either over time or across states. But many important
determinants of traffic deaths do not fall into this category. Identify two such factors and describe why
omitting them from the analysis could lead to omitted variable bias.
Solution: Alcohol taxes are only one way to discourage drinking and driving. States also differ in their
punishments for drunk driving, and a state that cracks down on drunk driving could do so across the
board by toughening laws as well as raising taxes. If so, omitting these laws could produce omitted variable
bias in the OLS estimator of the effect of real beer taxes on traffic fatalities, even after including fixed
effects. Examples of such laws include the legal drinking age, mandatory jail sentences, and mandatory
community service. In addition, because vehicle use depends in part on whether drivers have jobs and
because tax changes can reflect economic conditions, omitting state economic conditions also could result
in omitted variable bias. Examples include the unemployment rate and measures of average income.
[5] The number of traffic fatalities per 10,000 people in the state. The average value of this variable in this dataset is about 2 (so 2
fatalities per 10,000).
[6] In 1988 dollars. The average tax is about 50¢ a case.
(d) (3 points) Your friend Daniel, who has not taken this class, suggests handling the issues raised in part c)
by replacing the year and state fixed effects with a year/state fixed effect, that is, a fixed effect for every
year-state pair. What, if anything, is wrong with his suggestion? Explain.
Solution: This is like having a dummy variable for every observation, which, in addition to the $BeerTax$
regressor, would give you more variables than observations, making the regression infeasible.
3. (23 points total) Traffic crashes are the leading cause of death for Americans between the ages of 5 and 32.
Through various spending policies, the federal government has encouraged states to institute mandatory seat
belt laws to reduce the number of fatalities and serious injuries. You are interested in examining how effective
these laws are in reducing fatalities. You have collected a panel dataset from 50 U.S. states (plus the District
of Columbia) for the years 1983-1997. Your dataset includes the following variables:
fatalityrate is the number of fatalities per thousand of traffic miles
sb_usage is the seat belt usage rate
speed65 is a dummy =1 if 65 mile per hour speed limit, =0 otherwise
speed70 is a dummy =1 if 70 or higher mile per hour speed limit, =0 otherwise
ba08 is a dummy =1 if blood alcohol limit ≤ .08%, =0 otherwise
drinkage21 is a dummy =1 if age 21 drinking age, =0 otherwise
income is per capita income
age is mean age
state is a set of state dummies
year is a set of year dummies
The table below contains the results of several regressions (pooled OLS, OLS with state fixed effects, GLS
with state random effects, and OLS with state and year fixed effects). The following questions are based
on these results.
                    1          2          3          4
sb_usage          4.07      -5.77      -4.50      -3.72
                 (1.22)     (1.15)     (1.12)     (1.13)
speed65           0.148      0.425      0.341      0.783
                 (0.403)    (0.334)    (0.337)    (0.424)
speed70           2.40       1.23       1.34       0.804
                 (0.511)    (0.329)    (0.328)    (0.340)
ba08              1.92       1.38       1.36       0.822
                 (0.445)    (0.373)    (0.367)    (0.352)
drinkage21        0.079      0.745      0.767      1.13
                 (0.876)    (0.507)    (0.510)    (0.535)
ln(inc)          18.1       13.5       12.6        6.26
                 (0.931)    (1.42)     (1.14)     (3.86)
age               0.007      0.979      0.232      1.32
                 (0.109)    (0.382)    (0.239)    (0.383)
constant        196.5                 137.9
                 (8.22)                (8.92)
State Effects     None       FE         RE         FE
Year Effects      None       None       None       FE
R^2               0.544      0.874      0.683      0.897
(a) (4 points) Focusing on the results from the pooled OLS regression in column 1, does the estimated
regression suggest that increased seat belt use significantly reduces fatalities? Be complete. Does this
result make sense? If so, explain why. If not, explain what you think is going on here.
Solution: According to the pooled OLS regression 1, higher seat belt usage rates are actually associated
with a higher fatality rate. Moreover, this positive result is in fact significant at the 1% level since
$t = \frac{4.07}{1.22} = 3.34 > 2.58$. This is a very suspicious result, suggesting that we may be suffering from omitted
variable bias. In particular, it seems likely that in places with the most dangerous driving conditions,
people might be more likely to wear seat belts.
(b) (4 points) How do the results regarding the impact of the seat belt usage rate on traffic fatalities change
when we add state fixed effects (column 2)? Provide an intuitive explanation for why the results changed.
Solution: Once we control for state fixed effects, the coefficient on sb_usage switches signs. We now find
a negative and significant impact on traffic fatalities of increased seat belt usage ($t = \frac{-5.77}{1.15} = -5.02 < -2.58$).
This seems much more reasonable. If the omitted variables associated with dangerous road
conditions are constant over time, but vary across states, we have now controlled for them, allowing us
to isolate the true impact of sb_usage.
(c) (4 points) The specification in column 3 replaces the state fixed effects with state random effects. Does
this also seem like a reasonable solution to the problem you identified in part a?
Solution: No. The RE specification simply handles the fact that repeated observations from the same
entity are not i.i.d. It does not address the omitted variables problem, since it assumes that the state
effect $\alpha_i$ is uncorrelated with the included regressors. It is interesting that the coefficients are all
pretty similar to those of the FE regression, but that similarity does not make random effects a remedy
for the OVB problem.
(d) (4 points) Here are the results of a Hausman test between the regressions in columns 2 and 3
What do you conclude from these results?
Solution: The null hypothesis of the Hausman test is that both the RE and FE models are consistent.
The Hausman test statistic is distributed $\chi^2_7$ here since there are 7 time-varying regressors. The 1%
critical value for the $\chi^2_7$ distribution is 18.48, so we can reject the null at the 1% level, implying that the
RE model is inconsistent (despite the apparent similarity in the coefficients). Note: you could have also
used the reported p-value of .0004, which is < .01 here.
(e) (4 points) The model in column 4 adds year fixed effects to the model in column 2. An F-test of the
joint significance of these 14 year dummies yields an F-statistic of 8.85. What do you conclude about
the joint significance of the year dummies? Do the results regarding the impact of seat belt usage change
now that we have added year fixed effects?
Solution: The 1% critical value for the $F_{14,\infty}$ distribution is 2.08, so the time dummies are significant
at the 1% level. The results do change, as the coefficient on sb_usage is now -3.72.
(f) (3 points) Which regression specification (1, 2, 3, or 4) is most reliable? Why?
Solution: The pooled OLS regression clearly suffered from omitted variable bias, so we need to do
something about that. RE does not solve this problem, so that specification is inappropriate as well
(you also rejected it with the Hausman test). Between the two FE specifications, the one with both time
and state fixed effects certainly controls for the most omitted stuff, and since the time fixed effects are
significant (and the coefficients change a bit too), the specification in column 4 is the best choice.
2. You are using data collected from 545 men who worked every year between 1980 and 1987. The included
variables are as follows
variable    description
id          person identifier
year        1980 to 1987
lwage       log(wage)
le          log(labor mkt experience)
black       Dummy, =1 if black
hisp        Dummy, =1 if Hispanic
married     Dummy, =1 if married
educ        years of schooling
union       Dummy, =1 if worker belongs to a union
d81         Dummy, =1 if year = 1981
d82         etc.
. . .
d87
In what follows we will assume that the error terms in our regressions are homoskedastic. Indeed, it turns out
that the Hausman test that we saw in class is valid only when there is homoskedasticity (there are ways to
modify it to make it robust to heteroskedasticity, but we won't see them). Here we want to study the relation
between log(wage), education, and experience, controlling for some other variables. We assume that the true
regression is
$\ln(wage_{it}) = \beta_0 + \beta_1 educ_{it} + \beta_2 le_{it} + \sum_{Y=81}^{87} \delta_Y dY + \text{other controls} + \alpha_i + u_{it}$ (1)
where $\alpha_i$ is the unobserved fixed effect. Note that the model includes year-specific dummies.
(a) The sample includes data from 1980 to 1987, but equation (1) includes only dummies for years 1981 to
1987. Why? Solution If we included the dummy for 1980 we would have a multi-collinearity problem.
(b) Before dealing with panel data, we have typically assumed that observations are i.i.d. You know that
independent observations are uncorrelated, so if you can show that two observations are correlated, you
have shown that they cannot be independent, and hence cannot be i.i.d. Here the error term for worker $i$
in year $t$ is $\varepsilon_{it} = \alpha_i + u_{it}$, while in year $s$ the error is $\varepsilon_{is} = \alpha_i + u_{is}$. Assume that both the fixed effect $\alpha_i$
and the other error component $u_{is}$ have zero expected value, and you can assume that $\mathrm{cov}(u_{it}, u_{js}) = 0$
for every $i, j, t, s$ (unless, of course, $i = j$ and $t = s$!). Assume also that the fixed effects are uncorrelated
with all the $u$s. Calculate $\mathrm{cov}(\varepsilon_{it}, \varepsilon_{is})$. Is it equal to zero? Do you think that in panel data the
assumption that observations are i.i.d. is a good one? Solution
$\mathrm{Cov}(\varepsilon_{it}, \varepsilon_{is}) = \mathrm{Cov}(\alpha_i + u_{it}, \alpha_i + u_{is})$
$= \mathrm{Cov}(\alpha_i, \alpha_i) + \mathrm{Cov}(\alpha_i, u_{it}) + \mathrm{Cov}(\alpha_i, u_{is}) + \mathrm{Cov}(u_{it}, u_{is})$
$= \mathrm{Cov}(\alpha_i, \alpha_i)$
Since $\mathrm{Cov}(\varepsilon_{it}, \varepsilon_{is}) = \mathrm{Cov}(\alpha_i, \alpha_i) = \mathrm{Var}(\alpha_i) \neq 0$, it is not valid to assume that observations in panel data
sets are i.i.d.
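A short simulation makes the result concrete. The sketch below is an illustration under assumed normal distributions (the problem does not specify any): it draws a worker effect shared by two years plus independent idiosyncratic errors, and checks that the sample covariance of the two composite errors is close to the variance of the worker effect rather than zero. The function name and parameter values are hypothetical.

```python
import random


def composite_error_cov(n_workers=50_000, sd_alpha=1.0, sd_u=1.0, seed=0):
    """Sample covariance between (alpha_i + u_it) and (alpha_i + u_is), t != s."""
    rng = random.Random(seed)
    e_t, e_s = [], []
    for _ in range(n_workers):
        alpha = rng.gauss(0.0, sd_alpha)           # worker effect, same in both years
        e_t.append(alpha + rng.gauss(0.0, sd_u))   # composite error in year t
        e_s.append(alpha + rng.gauss(0.0, sd_u))   # composite error in year s
    mean_t = sum(e_t) / n_workers
    mean_s = sum(e_s) / n_workers
    cross = sum((a - mean_t) * (b - mean_s) for a, b in zip(e_t, e_s))
    return cross / (n_workers - 1)


cov_ts = composite_error_cov()
print(f"sample covariance = {cov_ts:.3f} (Var(alpha) = 1.0 by construction)")
```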
You estimate equation (1) using RE (Random Effects), and FE (Fixed Effects) and you get the following
results (standard errors are shown to the right of the corresponding coefficient):
              (1) Random Effect      (2) Fixed Effect
Variable      Coeff.     s.e.        Coeff.     s.e.
educ           0.104     0.024
le             0.301     0.092       0.176      0.116
black         -0.125     0.139
hisp           0.104     0.162
married        0.067     0.038       0.003      0.042
union          0.050     0.041       0.015      0.044
d81            0.039     0.058       0.088      0.063
d82           -0.020     0.074       0.068      0.086
d83           -0.031     0.089       0.084      0.106
d84           -0.055     0.102       0.083      0.124
d85           -0.019     0.113       0.140      0.139
d86           -0.010     0.124       0.168      0.153
d87            0.081     0.133       0.276      0.165
constant      -0.123     0.316
(c) Using the RE results in column (1), interpret the coefficients on the variables educ and le. Solution
The coefficient on educ means that increasing education by one year will increase the wage by about 10%
(the estimate is 0.104). The coefficient on le means that increasing experience by 1% will increase wages
by about 0.3%.
(d) Using again RE, you run a test for the joint significance of all the year-specific dummies, and the F-test
is equal to 1.576. What do you conclude? Solution The critical value for an F-test with 7 degrees of
freedom is 1.72 at the 10% level. Therefore, we cannot reject the null hypothesis that all of the time
dummies are jointly equal to zero.
(e) Now you turn to the estimates obtained estimating equation (1) using FE (Fixed Effects). How do you
justify the fact that the FE estimator did not estimate the coefficients for educ, black, and hisp? Solution
The race and education level of the individuals in this sample do not vary across time. (Note that everyone
was working during these years and not in school.) Therefore, the effect of these variables is captured by
the fixed effect coefficient.
(f) The estimated coefficients using FE and RE look quite different, so you suspect that the fixed effect
might be correlated with one or more included regressors. Therefore, you decide to run a Hausman
test and the result is 17.2. Do you reject the null using a 5% significance level? And what about using
a 10% significance level? What does the result of the test suggest about the presence of correlation
between the fixed effect $\alpha_i$ and the included regressors? Based on the result of the Hausman test, is there
evidence that the RE estimators are not consistent? (Note for the calculation of the number of degrees
of freedom: this test is based on the comparison of coefficients estimated by using FE and RE, so you
can only compare coefficients that are estimated in both models!) Solution The null hypothesis under
the Hausman test is that the individual effects are uncorrelated with the other regressors. The critical
values for a $\chi^2$ distribution with 10 degrees of freedom are 16.0 for 10% and 18.3 for 5%. (Ten degrees of
freedom because there are 10 coefficients that are estimated in both models.) Therefore, we can reject the
null hypothesis at the 10% but not the 5% level. This means that $\alpha_i$ may be correlated with the other
regressors, which would cause a random effects model to be inconsistent.
8 Empirical Exercises with a bit of everything
1. SEM 30 Points overall. Consider the following system of simultaneous equations
$P^B_i = \beta_0 + \beta_1 Q^B_i + \beta_2 P^W_i + u_{1i}$ (1)
$Q^B_i = \beta_3 + \beta_4 P^B_i + u_{2i}$ (2)
where $P^B_i$ is the average price of a generic bottle of beer in market $i$ and $Q^B_i$ is the quantity of this beer sold
in market $i$. $P^W_i$ is the price of a generic bottle of wine in market $i$.
(a) (5 points) Which is the supply equation and which is the demand equation? Justify your answer.
Solution: Equation (1) is demand and equation (2) is supply. We know (1) must be demand because
the price of generic wine would not affect supply, but it must surely affect demand, since generic wine is
a substitute for generic beer.
(b) (5 points) What are the expected signs of $\beta_1$, $\beta_2$ and $\beta_4$?
Solution:
$\beta_1 < 0$
$\beta_2 > 0$
$\beta_4 > 0$
(c) (5 points) Very briefly explain why estimating the first equation by OLS would result in biased estimates
of $(\beta_0, \beta_1, \beta_2)$.
Solution: Price and quantity of beer are jointly determined by both demand and supply. This is the
classic identification problem.
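(d) (5 points) Is equation (1) identified? Justify your answer. If the answer is yes, briefly mention how you
would estimate the corresponding parameters.
Solution: Equation (1) is not identified. The supply equation (2) contains no exogenous variable that is
excluded from equation (1), so there is no instrument available for $Q^B_i$.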
(e) (5 points) Is equation (2) identified? Justify your answer. If the answer is yes, briefly mention how you
would estimate the corresponding parameters.
Solution: Equation (2) is identified. Using the price of wine as an instrument, run TSLS. There is 1 excluded
exogenous variable. To solve, we would use the market-clearing mechanism from introductory economics
and note that quantity demanded = quantity supplied. We can re-write equation (1) as:
$Q^B_i = -\frac{\beta_0}{\beta_1} + \frac{1}{\beta_1} P^B_i - \frac{\beta_2}{\beta_1} P^W_i - \frac{u_{1i}}{\beta_1}$
And now set equation (1) equal to equation (2), and solve for price. Substituting price into the demand
or supply function, we can write quantity in the same manner. Then, we can run OLS to find the
transformed parameters, and use algebra to isolate the true parameter values from supply.
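The identification argument can be illustrated with simulated data. The sketch below is only an illustration with made-up parameter values: it generates equilibrium prices and quantities from equations (1) and (2), then compares the OLS slope of quantity on price with the instrumental-variable estimate of $\beta_4$ that uses the wine price as the instrument. With a single instrument, this IV ratio coincides with TSLS. All names and numbers in the code are hypothetical.

```python
import random


def sample_cov(a, b):
    """Unbiased sample covariance of two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


def simulate_beer_market(n=50_000, seed=1):
    """Return (OLS slope, IV slope, true beta4) for the supply equation."""
    b0, b1, b2 = 10.0, -0.5, 1.0   # demand (1): P = b0 + b1*Q + b2*PW + u1
    b3, b4 = 2.0, 1.5              # supply (2): Q = b3 + b4*P + u2
    rng = random.Random(seed)
    price, quantity, wine = [], [], []
    for _ in range(n):
        pw = rng.gauss(5.0, 1.0)   # exogenous wine price
        u1 = rng.gauss(0.0, 1.0)
        u2 = rng.gauss(0.0, 1.0)
        # Equilibrium price: substitute (2) into (1) and solve for P.
        p = (b0 + b1 * b3 + b2 * pw + u1 + b1 * u2) / (1.0 - b1 * b4)
        q = b3 + b4 * p + u2
        price.append(p)
        quantity.append(q)
        wine.append(pw)
    ols = sample_cov(price, quantity) / sample_cov(price, price)
    iv = sample_cov(wine, quantity) / sample_cov(wine, price)
    return ols, iv, b4


ols, iv, true_b4 = simulate_beer_market()
print(f"OLS = {ols:.3f}, IV = {iv:.3f}, true beta4 = {true_b4}")
```

In this setup the OLS slope falls well below the true supply slope because price is correlated with the supply shock, while the IV estimate is close to it.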
(f) (5 points) What should change in the above system of equations in order for the first equation to be
overidentified with 2 overidentification restrictions?
Solution: We would need to add three additional excluded exogenous variables to the supply function.
2. This year, you are teaching ECON 139. Suppose you have a very large class so that, in all that follows, you
can use asymptotic results. You decide to have two midterms and one final. Both midterms are worth 75
points. Midterm 1 has just been graded, and you are not very happy with the performance of the class. You
want to analyze if one extra section of econometrics per week leads to higher grades. To do this, you create
a new section, held on Saturday morning, so that there will be no schedule conflicts. Participation in this
extra section is on a voluntary basis. When midterm 2 comes, and grades have been assigned, you estimate
the following regression
$gradeM2_i = \underset{(2.9)}{52.3} + \underset{(0.05)}{0.20}\,gradeM1_i + \underset{(1.08)}{8.37}\,Extra_i$ (1)
where $gradeMj$ denotes grade in midterm $j$, $j = 1, 2$, and $Extra_i$ is a dummy equal to one if the student
attended the extra sections.
(a) Using the results in equation (1), does it look like the students who attended the extra sections performed
relatively better than the others? Evaluate both the statistical significance and the magnitude of the
results. Solution
Yes, the students who attended the extra sections did better. The average gain was 8.37, over 10% of the
total points possible on the test. And with a t-statistic of 7.75, this difference is statistically significant
as well.
(b) Using again the data that you have collected after Midterm 2, now you estimate the following logit model:
$\Pr(Extra_i = 1 \mid gradeM1_i) = F\left(\underset{(2.5)}{-6.27} + \underset{(0.043)}{0.11}\,gradeM1_i\right)$ (2)
where $F$ is the logit CDF. Does it look like students who performed better in Midterm 1 are statistically
significantly more likely to participate in the extra sections? Solution
$t = \frac{0.11}{0.043} = 2.56 > 1.96$
The coefficient on midterm 1 is statistically significant at the 5% level. Therefore, we do conclude that
students who do better on midterm 1 are more likely to attend the extra sections.
(c) Calculate the estimated difference in the probability of attending the extra sections for a student who got
50 in Midterm 1, and another student who instead got 70. Solution
$\frac{1}{1 + e^{(6.27 - 0.11 \times 70)}} - \frac{1}{1 + e^{(6.27 - 0.11 \times 50)}} = 0.81 - 0.32 = 0.49$
A student with a midterm grade of 70 is 49 percentage points more likely to attend the extra sections than
a student who scored 50.
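The two predicted probabilities can be reproduced from the estimated logit coefficients. This is a minimal sketch using only the numbers reported in the logit model; the function names are hypothetical.

```python
import math


def logistic(z):
    """Logit CDF F(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def prob_extra(grade_m1):
    """Predicted probability of attending the extra sections."""
    return logistic(-6.27 + 0.11 * grade_m1)


p70 = prob_extra(70)
p50 = prob_extra(50)
print(f"P(Extra | 70) = {p70:.2f}, P(Extra | 50) = {p50:.2f}, difference = {p70 - p50:.2f}")
```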
(d) Ideally, you would like to estimate the causal impact of the extra sections on the grade in midterm 2. Do
the results you obtained from the logit model in (2) suggest that the coefficient for $Extra$ in equation (1)
can or cannot be interpreted in a causal way? Explain. Solution
The logit results show that we cannot interpret the coefficient from equation (1) in a causal way. The
logit model shows that there is significant self-selection. Good students are more likely to go to section
and to do well on the second midterm.
(e) What does your answer to the previous question suggest in terms of the reliability of the estimated
coefficient for $Extra_i$ in equation (1)? Is 8.37 likely to be upward biased? Downward biased? Or is it likely
to be close to the true causal impact? Explain. Solution
It is likely that the coefficient on $Extra$ will be upwardly biased. We have seen from equation (2) that
there is significant self-selection among those who attend the extra section. If this selection is driven only
by the grade on midterm 1, we can control for this. However, it is likely that there are unobserved variables,
such as motivation, that also matter. Motivation would have a positive correlation with midterm scores
and with attendance. From the omitted variable bias formula, we know this will bias the $Extra$ coefficient
upwards.
After Midterm 2, you carry out the following experiment. You generate a dummy variable $D_i$, and you set
the dummy variable = 1 for student $i$ if tossing a coin you get a head, and you assign instead $D_i = 0$ if the
result is tails. Then you tell students who got $D_i = 1$ that if they attend the extra sections (attendance
will be checked every day by the TA), they will get a $100 gift card to be used at a local grocery store.
However, attendance of the extra sections is still voluntary. Before the final, you collect from your TA all
information on attendance, and you come up with the following 2 × 2 table, which represents the joint
probability of $D_i$ (=1 if you got the offer) and $Extra_i$ (=1 if you attended the post-midterm 2 sections).
Please note that from now on the data on section attendance before midterm 2 are no longer used.
              Extra_i = 1    Extra_i = 0
D_i = 0           0.3            0.2
D_i = 1           0.4            0.1
(f) Calculate $\Pr(Extra_i = 1 \mid D_i = 1)$, $\Pr(Extra_i = 1 \mid D_i = 0)$, and $\Pr(Extra_i = 1)$. Solution
$\Pr(Extra_i = 1 \mid D_i = 1) = \frac{0.4}{0.5} = 0.8$
$\Pr(Extra_i = 1 \mid D_i = 0) = \frac{0.3}{0.5} = 0.6$
$\Pr(Extra_i = 1) = 0.7$
(g) Are $Extra_i$ and $D_i$ independent? Explain. Solution
No, $Extra_i$ and $D_i$ are not independent. They are positively correlated: receiving the offer ($D_i = 1$)
raises the probability of attending section from 0.6 to 0.8.
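The conditional and marginal probabilities, and the independence check, follow mechanically from the joint table. The sketch below encodes the four cells as a dictionary keyed by (offer, extra); the variable and function names are hypothetical labels for the offer dummy and the attendance dummy.

```python
# Joint probabilities from the 2 x 2 table, keyed by (offer, extra).
joint = {(0, 1): 0.3, (0, 0): 0.2, (1, 1): 0.4, (1, 0): 0.1}


def p_offer(d):
    """Marginal probability that the offer dummy equals d."""
    return sum(p for (offer, _), p in joint.items() if offer == d)


def p_extra(e):
    """Marginal probability that the attendance dummy equals e."""
    return sum(p for (_, extra), p in joint.items() if extra == e)


def p_extra_given_offer(d):
    """P(Extra = 1 | offer = d)."""
    return joint[(d, 1)] / p_offer(d)


# Independence would require every cell to equal the product of its marginals.
independent = all(
    abs(p - p_offer(offer) * p_extra(extra)) < 1e-12
    for (offer, extra), p in joint.items()
)

print(p_extra_given_offer(1), p_extra_given_offer(0), p_extra(1), independent)
```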
(h) You think of using $D_i$ as an instrument for $Extra_i$. Do you think this would be a valid instrument?
Would it be exogenous? Would it be relevant? Explain. Solution
Yes, it should be a valid instrument. As shown above, it is relevant because it increases the probability
of attending the extra section. Furthermore, it is exogenous to the unobserved portion of the selection
equation because it was assigned randomly.
Now you want to estimate the following model using 2SLS, using $D_i$ as an instrument.
$gradeF_i = \beta_0 + \beta_1\,gradeM1_i + \beta_2\,gradeM2_i + \beta_3\,Extra_i + u_i$
where $gradeF_i$ is the grade in the Final. The grade in the final is out of a total of 180 points.
(i) After estimating the first stage, you want to test for the relevance of the instrument. The F-statistic
is equal to 13. Describe the regression estimated in the first stage, then carefully describe the null
hypothesis that is being tested by the F-test. Finally, explain whether the result of the test suggests that
your instrument is weak. Solution
The first stage is a regression of $Extra_i$ on the instrument and the exogenous variables. The null hypothesis
that is being tested by the F-test is the hypothesis that the coefficient on the instrument, $D_i$, is zero. An
instrument is considered weak if the F-statistic is under 10. Therefore, we would conclude that $D_i$ is not a
weak instrument.
(j) You estimate the second stage, and the results are as follows:
$gradeF_i = \underset{(46.9)}{74} + \underset{(0.27)}{0.48}\,gradeM1_i + \underset{(0.69)}{0.68}\,gradeM2_i + \underset{(10.5)}{3.23}\,Extra_i$
When you estimate the same model using OLS, the results are instead the following:
$gradeF_i = \underset{(26.1)}{105.9} + \underset{(0.22)}{0.36}\,gradeM1_i + \underset{(0.42)}{0.24}\,gradeM2_i + \underset{(5.8)}{13.1}\,Extra_i$
Comment on the change in the coefficient for $Extra$ when you go from OLS to 2SLS. Is the change
consistent with what you expected? Why? What do you conclude about the utility of the extra sections?
Solution
When we move from OLS to 2SLS, the coefficient on $Extra$ goes down and is no longer statistically
significant. This is not unexpected because we saw that there was self-selection of better students into
the extra section and that this would bias the coefficient on $Extra$ upwards. From the 2SLS, we conclude
that the extra section does not help very much and the evidence that it helps at all is somewhat weak.
(k) One of your students suggests that you should try to add other instruments to your 2SLS estimator, so
you will be able to test for instrument exogeneity. In particular, he suggests that you should use GPA in
other ECON courses, and GPA in non-ECON courses as two added instruments. Should you accept the
student's suggestion? Explain. Solution
Adding these instruments is probably not a good idea. Since $D_i$ was drawn randomly, we are confident
that it is exogenous to the error term in equation (1). However, the GPA variables are not likely to be
exogenous. For example, motivation might influence both the student's test scores in Econ 139 and her GPA
in other classes.
(l) You show your student what happens if you estimate the model using 2SLS, using, as instruments, the
two GPAs described before, as well as the dummy $D_i$. The J-statistic turns out to be equal to 25. What
should your student conclude? Solution
The J-statistic will be distributed $\chi^2$ with $m - k$ degrees of freedom, where $m$ is the number of instruments
and $k$ is the number of endogenous regressors. Here, the critical value of $\chi^2$ with 2 degrees of freedom is
9.21 at the 1% level. Therefore, with a value of 25, we can reject the null hypothesis that the instruments
are exogenous. Most likely it is the GPA variables that are the problem.
3. 40 Points overall. You teach a course in econometrics, and you want to know if 2 extra hours of econometric
classes per week help your students or not. After the midterm has been graded, you assign randomly the
treatment $X$ to your students. Students assigned to the treatment group (for which $X = 1$) are asked to
attend one extra 2-hour lecture per week, while students in the control group (for which $X = 0$) are asked not
to attend the extra lectures. Let $G$ be the grade in the final exam (assume that the maximum grade is 100).
Your estimates are as follows (heteroskedasticity-robust standard errors in parentheses)
$\hat{G}_i = \underset{(3.23)}{80.6} + \underset{(3.53)}{10.03}\,X_i$, $R^2 = 0.18$ (1)
(a) (4 points) Interpret the results. Judging solely on the above regression, is there evidence that more
econometric classes help improve your grade? Is the effect statistically significant? Does it look important?
Solution: There seems to be evidence that more econometric classes help to improve grades. The effect
is statistically significant ($t = 10.03/3.53 = 2.84 > 2.58$), and it looks important: about 10 points out
of a maximum of 100.
(b) (4 points) Briefly explain why it might be a good idea to re-estimate the above regression including, as
regressor, the grade obtained in the midterm.
Solution: If treatment is randomly assigned, the OLS estimator of interest is more efficient using multiple
regressors than a single regressor.
You re-estimate the above model including the student's grade in the midterm ($M_i$) as regressor, and
your estimates are as follows (heteroskedasticity-robust standard errors in parentheses):
$\hat{G}_i = \underset{(14.44)}{39.1} + \underset{(0.18)}{0.54}\,M_i + \underset{(3.14)}{10.5}\,X_i$, $R^2 = 0.336$ (2)
Compare the estimates in (2) to those in the previous regression (1) and briefly comment on the following:
(c) (4 points) Note that in (2) the constant is much smaller than in (1). How do you explain this?
Solution: In equation (1) the constant was measuring the average grade for people in the control group,
but now it doesn't, since you also include the midterm grade: in (2) the constant is the predicted final
grade for a control-group student with a midterm grade of zero.
(d) (4 points) The standard error for the coefficient of X in model (2) is smaller than the corresponding
standard error in model (1). Was this to be expected? Why?
Solution: Yes, we expect smaller standard errors, since the added regressor (the midterm grade) explains
part of the variation in final grades and therefore reduces the variance of the error term.
(e) (4 points) How do you explain the large increase in the $R^2$ when you move from model (1) to model (2)?
Solution: Similarly as in part (d), the addition of midterm grade as a regressor helps to explain variation
in the data. Adding a regressor never lowers the $R^2$, and here the midterm grade has substantial
explanatory power.
(f) (4 points) Is the coefficient for M significant? Does your answer suggest that our estimator for the effect
of the program in model (1) is not consistent? Justify your answer.
Solution: $t = \frac{0.540}{0.18} = 3$. Since the critical values are about 1.99 and 2.63 at 5% and 1%, respectively,
we can reject the null that midterm grades do not help to predict final grades. This does not indicate
an inconsistent estimator in equation (1), since truly randomly assigned treatment will make the OLS
estimator consistent.
(g) (4 points) Comparing your estimates from models (1) and (2), is there evidence that the randomization
across students was done incorrectly, or that partial compliance is a serious issue, here? Justify your
answer.
Solution: Because the OLS estimates for the effect of the additional econometric classes are fairly close
for the two equations, we can be reasonably sure that randomization was done correctly.
Now you want to estimate what is the effect of attending extra lectures on the probability of improving
on the midterm grade in the final exam. $Improved_i$ is a binary variable equal to one if the $i$-th student's
grade in the final is above his/her grade in the midterm. You estimate the following probit model.
$\widehat{\Pr}(Improved_i = 1 \mid X_i, M_i) = \Phi\left(\underset{(1.72)}{4.1} - \underset{(0.03)}{0.05}\,M_i + \underset{(0.46)}{1.05}\,X_i\right)$ (3)
(h) (4 points) What is the predicted effect of the program on the probability of improving the grade? Does
it look important? In your answer, use the fact that the average midterm grade was equal to 76.
Solution: $\Pr(Improved \mid classes) - \Pr(Improved \mid no\ classes) = \Phi(4.1 - 0.05 \times 76 + 1.05 \times 1) - \Phi(4.1 - 0.05 \times 76 + 1.05 \times 0)$
$= \Phi(1.35) - \Phi(0.30) = 0.9115 - 0.6179 = 0.2936$. Since probit coefficients estimated
by MLE are consistent and normally distributed in large samples, we can use the usual t-statistic:
$t = 1.050/0.46 = 2.28$. Therefore, the coefficient is statistically significant at 5% but not at 1%. The effect does seem to be
important, since attending these sessions seems to increase the chance of getting a better grade by about 30
percentage points.
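The arithmetic in part (h) can be checked with a short script. This is only a sketch: the helper name `norm_cdf` is ours, and the coefficient signs are the ones implied by the worked solution above.

```python
import math

def norm_cdf(z: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Probit coefficients from model (3); signs taken from the worked solution.
b0, b_midterm, b_program = 4.1, -0.05, 1.05
midterm = 76  # average midterm grade given in the question

p_classes = norm_cdf(b0 + b_midterm * midterm + b_program * 1)
p_no_classes = norm_cdf(b0 + b_midterm * midterm + b_program * 0)
effect = p_classes - p_no_classes

print(round(p_classes, 4), round(p_no_classes, 4), round(effect, 4))
```

The difference is evaluated at the average midterm grade, so it is the effect for a student with that grade, not an average effect over all students.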
(i) (4 points) You re-estimate the above model using a Linear Probability Model, and the results are as
follows
$$\widehat{\Pr}(Improved_i = 1 \mid X_i, M_i) = \underset{(0.4)}{1.47} - \underset{(0.005)}{0.01}\, M_i + \underset{(0.111)}{0.273}\, X_i \qquad (4)$$
What is the predicted effect of the program on the probability of improving the grade? Is it very different
from the one estimated using a probit model? Did you expect it to be very different or not, and why?
Solution: The predicted effect is a 27.3 percentage point increase in the likelihood of receiving an improved grade.
The estimate is in line with the result from the probit model. We expect the results to be similar: the LPM
often provides answers close to probit or logit if there are not too many extreme values of the regressors.
(j) (4 points) In both models (3) and (4) the coefficient related to the midterm grade is negative (forget
about the statistical significance, here). Can you think of a reason why this might be the case?
Solution: If a student received a very high grade on his/her midterm, it is more difficult to improve upon
that in the final grade. Therefore, it makes sense that a high midterm score would negatively affect the
probability of the final exam grade improving over the midterm grade.
4. Problem (16) 25 points overall. Consider the following population regression model relating the dependent
variable $Y_i$ and regressor $X_i$,
$$Y_i = \beta_0 + \beta_1 X_i + u_i, \quad i = 1, \dots, n. \qquad (1)$$
$$X_i = Y_i + Z_i \qquad (2)$$
where $Z$ is a valid instrument for $X$.
(a) (4 points) Explain why you should not use OLS to estimate $\beta_1$.
Solution: Substitution of the first equation into the identity shows that $X$ is correlated with the error
term. Hence estimation with OLS results in an inconsistent estimator.
(b) (4 points) To generate a consistent estimator for $\beta_1$, what should you do? (Just briefly describe the
estimation procedure you would follow. No computation or proof is necessary here!).
Solution: The instrumental variable estimator is consistent and in this case is
$$\hat\beta_1^{TSLS} = \frac{\sum (Z_i - \bar Z)\, Y_i}{\sum (Z_i - \bar Z)\, X_i}.$$
(c) (5 points) The two structural equations (1) and (2) above make up a system of equations in two
unknowns. Specify the two reduced form equations in terms of the original coefficients. (Hint: Substitute
the identity into the first equation and solve for $Y$. Similarly, substitute $Y$ into the identity and solve for
$X$.)
Solution:
$$Y_i = \beta_0 + \beta_1 (Y_i + Z_i) + u_i$$
$$X_i = (\beta_0 + \beta_1 X_i + u_i) + Z_i$$
or
$$(1 - \beta_1)\, Y_i = \beta_0 + \beta_1 Z_i + u_i$$
$$(1 - \beta_1)\, X_i = \beta_0 + Z_i + u_i$$
Hence
$$Y_i = \pi_0 + \pi_2 Z_i + v_{1i}$$
$$X_i = \pi_3 + \pi_4 Z_i + v_{2i}$$
where $\pi_0 = \pi_3 = \dfrac{\beta_0}{1 - \beta_1}$, $\pi_2 = \dfrac{\beta_1}{1 - \beta_1}$, $\pi_4 = \dfrac{1}{1 - \beta_1}$, and $v_{1i} = v_{2i} = \dfrac{1}{1 - \beta_1}\, u_i$.
(d) (4 points) Can you estimate consistently the two reduced form equations using OLS? Why?
Solution: Since $Z$ is a valid instrument by assumption, it must be uncorrelated with the error term.
Hence using OLS results in a consistent estimator.
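(e) (4 points) What is the ratio of the two estimated slopes? (This estimator is called "indirect least
squares.")
Solution: Writing $\hat\pi_2$ and $\hat\pi_4$ for the OLS slopes on $Z$ in the reduced form equations for $Y$ and $X$,
$$\frac{\hat\pi_2}{\hat\pi_4} = \frac{\sum (Z_i - \bar Z)\, Y_i \,\big/ \sum (Z_i - \bar Z)^2}{\sum (Z_i - \bar Z)\, X_i \,\big/ \sum (Z_i - \bar Z)^2} = \frac{\sum (Z_i - \bar Z)\, Y_i}{\sum (Z_i - \bar Z)\, X_i}$$
(f) (4 points) How does it compare to the TSLS in this example?
Solution: This indirect least squares estimator is identical to the TSLS estimator.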
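A small simulation can illustrate parts (a) to (f). It is a sketch under assumed values: the true coefficients, the standard normal distributions for $Z$ and $u$, and the sample size are all hypothetical choices made for illustration. It generates data from the system, then compares OLS, the ratio of the two reduced-form slopes (indirect least squares), and the single-instrument TSLS formula.

```python
import random

random.seed(0)
n = 20000
beta0, beta1 = 1.0, 0.5  # hypothetical true values, chosen for illustration

z = [random.gauss(0, 1) for _ in range(n)]  # instrument
u = [random.gauss(0, 1) for _ in range(n)]  # structural error

# Solve Y = beta0 + beta1*X + u together with X = Y + Z for Y, then build X.
y = [(beta0 + beta1 * zi + ui) / (1 - beta1) for zi, ui in zip(z, u)]
x = [yi + zi for yi, zi in zip(y, z)]

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance (dividing by n; the divisor cancels in every ratio below)."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

ols = cov(x, y) / cov(x, x)           # slope from regressing Y on X
slope_y_on_z = cov(z, y) / cov(z, z)  # reduced-form slope for Y
slope_x_on_z = cov(z, x) / cov(z, z)  # reduced-form slope for X
ils = slope_y_on_z / slope_x_on_z     # indirect least squares
tsls = cov(z, y) / cov(z, x)          # TSLS with a single instrument

print(round(ols, 3), round(ils, 3), round(tsls, 3))
```

Under these assumed values the OLS slope settles near 0.75 rather than the true 0.5, while the ILS and TSLS estimates coincide up to floating-point error and are close to 0.5.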
5. (45 points overall) This is (loosely) based on Goldin & Rouse (2000), "Orchestrating impartiality: the impact
of 'blind' auditions on female musicians". Sex-biased hiring has been alleged for many occupations but is
extremely difficult to prove. A change in the audition procedures of symphony orchestras (the adoption of "blind"
auditions with a "screen" to conceal the candidate's identity from the jury) provides a test for sex-biased hiring.
(a) (6 points) Suppose that the identity of a candidate is known, and you regress a dummy = 1 if a job
candidate is given a position (Job) on a dummy equal to one if the candidate is female (Female). Your
estimated regression is
$$\widehat{Job}_i = \underset{(0.01)}{0.4} - \underset{(0.04)}{0.1}\, Female_i$$
Explain why you cannot interpret the results as necessarily indicating discrimination.
Solution: Because there are a lot of relevant variables that are omitted from the regression. For example,
we do not control for the quality of candidates. It may well be the case that in this particular
sample female candidates are less qualified. Then the coefficient on Female is downward biased since
$Cov(Job, Quality) > 0$ and $Cov(Female, Quality) < 0$.
The authors want to study whether the existing composition of an orchestra appears to affect the probability
that blind auditions will be chosen. They estimate the following regression, using Probit (actually, the
constant is not reported in the paper, so I had to make it up...). The dependent variable is a dummy
equal to 1 if in a given year a screen was adopted for the auditions.
PROBIT estimates (robust standard errors in parentheses)
Regressor                                  Coefficient       Estimate    (Std. error)
Constant                                   $\hat\beta_0$       2.5       (0.5)
Proportion females in previous year        $\hat\beta_1$       0.490     (1.163)
Prop. orchestra with tenure < 6 years      $\hat\beta_2$      -9.467     (2.787)
Pseudo-$R^2$                                                   0.05
(b) (4 points) Using a 10% significance level, can you reject the hypothesis that $\beta_1$ is equal to zero?
Solution: No, we fail to reject the hypothesis: $t\text{-stat} = 0.49/1.163 = 0.42 < 1.645$.
(c) (4 points) Using a 1% significance level, can you reject the hypothesis that $\beta_2$ is equal to zero?
Solution: Yes, we do reject the hypothesis: $|t\text{-stat}| = 9.467/2.787 = 3.4 > 2.58$.
(d) (5 points) Suppose that the proportion of women in an orchestra in the previous year increases from 0 to
0.4, and that the proportion of the orchestra with tenure below six years remains equal to 0.2. What is
the predicted effect on the probability that a screen will be adopted this year?
Solution:
$$\Delta \widehat{Prob} = \Phi(2.5 + 0.49 \times 0.4 - 9.467 \times 0.2) - \Phi(2.5 + 0.49 \times 0 - 9.467 \times 0.2)$$
$$= \Phi(0.803) - \Phi(0.607) = 0.788 - 0.729 = 0.059$$
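The calculation in part (d) can be reproduced numerically. This is a sketch only: `probit_prob` is a hypothetical helper name, and the coefficient signs are those implied by the index values 0.803 and 0.607 in the solution. Exact CDF values differ slightly in the third decimal from the table lookups used above.

```python
import math

def norm_cdf(z: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_prob(prop_female: float, prop_low_tenure: float) -> float:
    """Predicted probability that a screen is adopted (hypothetical helper)."""
    index = 2.5 + 0.490 * prop_female - 9.467 * prop_low_tenure
    return norm_cdf(index)

p_after = probit_prob(0.4, 0.2)
p_before = probit_prob(0.0, 0.2)
effect = p_after - p_before

print(round(p_after, 3), round(p_before, 3), round(effect, 3))
```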
(e) (6 points) Suppose that you add another regressor to the above regression, a dummy equal to one if the
orchestra is one of the so-called "big five" (Boston, Chicago, Philadelphia, NY, Cleveland). The new
variable is not statistically significant using standard significance levels. Is it possible for the Pseudo-$R^2$
for this new regression to be below 0.05? Briefly justify your answer.
Solution: No. Because the MLE maximizes the likelihood function, adding another regressor to a model
cannot decrease the value of the maximized likelihood, just as adding a regressor cannot increase the sum of
squared OLS residuals in linear regression. In other words, without the additional regressor, the objective
function is maximized under the restriction that the coefficient on the omitted variable is zero. Compare
the two Pseudo-$R^2$s. For the unrestricted model:
$$\text{Pseudo-}R^2_u = 1 - \frac{\log L_u}{\log L_0}$$
For the restricted model:
$$\text{Pseudo-}R^2_r = 1 - \frac{\log L_r}{\log L_0}$$
where $\log L_u$ and $\log L_r$ are the values of the maximized probit loglikelihoods for
the unrestricted and restricted models respectively, and $\log L_0$ is the value of the maximized loglikelihood
excluding all the regressors. The value of the constrained maximized likelihood is never greater than the
value of its unconstrained counterpart, i.e. $\log L_r \le \log L_u$. It is important here to recall that the
loglikelihood is always negative (i.e. $\log L_u$, $\log L_r$, $\log L_0$ are all less than zero). Thus, the ratio
in the formula for the Pseudo-$R^2$ goes DOWN (or stays the same) as we go from the restricted to the unrestricted model (i.e. a
model with an additional regressor), and therefore $\text{Pseudo-}R^2_r \le \text{Pseudo-}R^2_u$. In this example the
Pseudo-$R^2$ for the restricted model is equal to 0.05, so as we add another regressor (and therefore relax
the constraint), the new Pseudo-$R^2$ will be at least this value.
You have observations for a large number of individuals who participated in at least one blind and at
least one non-blind audition. You use a Linear Probability Model in a regression where the dependent
variable is a dummy = 1 if the individual advanced to the next stage of the audition. Blind is a dummy
variable equal to one if a screen was used during the audition (that is, if the audition was blind). The
results are the following (robust standard errors in parentheses).
                                   Fixed Effects      No Fixed Effects
                                   (1)                (2)
Blind            $\hat\beta_0$     0.399 (0.027)      0.103 (0.018)
Blind × Woman    $\hat\beta_1$     0.041 (0.039)     -0.069 (0.022)
Woman            $\hat\beta_2$                        0.005 (0.019)
In column (1), Fixed Effects estimates are obtained including individual fixed effects. The coefficients
corresponding to the individual-specific dummies (the fixed effects) are not reported for brevity.
(f) (5 points) The results in column (1) do not include an estimate $\hat\beta_2$. Why?
Solution: The individual fixed effects absorb any individual-specific time-invariant
characteristics, and gender is one of them. Including Woman as well would create perfect multicollinearity.
(g) (5 points) Consider a female musician. Using the Fixed Effects results (column 1) calculate the difference
in the predicted probability of advancing to the next stage between a blind and a non-blind audition.
Solution: $\Delta \hat P = \hat P_{blind,f} - \hat P_{nonblind,f} = 0.399 + 0.041 - 0 = 0.44$
(h) (5 points) Consider a female musician. Using the OLS results (column 2) calculate the difference in the
predicted probability of advancing to the next stage between a blind and a non-blind audition.
Solution: $\Delta \hat P = \hat P_{blind,f} - \hat P_{nonblind,f} = 0.103 - 0.069 - 0 = 0.034$
(i) (5 points) You test the null hypothesis that both $\beta_0$ and $\beta_1$ are equal to zero, using Fixed Effects results.
The result of the F-test is 8. Can you reject the null using a 1% significance level?
Solution: Since the F-stat is greater than 4.61 (the 1% critical value for the $F_{2,\infty}$ distribution), we reject the
hypothesis.
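The critical value 4.61 can be recovered without tables. The sketch below relies on two standard facts: an $F_{q,\infty}$ variable is a $\chi^2_q$ variable divided by $q$, and the $\chi^2_2$ survival function is $e^{-x/2}$. Variable names are ours.

```python
import math

alpha = 0.01  # significance level
q = 2         # number of restrictions being tested
f_stat = 8.0  # F-statistic given in the question

# For 2 degrees of freedom, P(chi2 > x) = exp(-x/2), so the critical value is -2*ln(alpha).
chi2_crit = -2.0 * math.log(alpha)
f_crit = chi2_crit / q  # the F(q, infinity) critical value is chi2_q / q

reject = f_stat > f_crit
print(round(f_crit, 2), reject)
```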
6. (36 points) You are interested in understanding whether student athletes' grade point averages (GPAs) suffer
during the semester their sport is in season. As such, you have collected data on 366 student-athletes from a
large midwestern research university that supports a Division 1 athletics program. You have observations for
the same students in both the fall and spring semesters (i.e. a two period panel). Your dataset includes the
following variables:
termgpa - the student's GPA for that term, measured on a four point scale (mean: 2.33)
spring - a dummy for whether the semester is the spring semester
sat - the student's combined SAT score
hsperc - the student's academic percentile in their high school graduating class
female - a dummy variable equal to one if the student is female
crsgpa - a weighted average of the overall GPA (among all students) in courses taken by this student
season - a dummy variable equal to one if the student's sport is in season
We will maintain the assumption that all errors are homoskedastic throughout this question. You estimate
the following regression using OLS (pooling observations from both semesters):
$$\widehat{termgpa} = \underset{(.311)}{-2.15} - \underset{(.045)}{.057}\, spring + \underset{(.0001)}{.002}\, sat - \underset{(.001)}{.008}\, hsperc + \underset{(.051)}{.366}\, female + \underset{(.096)}{1.06}\, crsgpa - \underset{(.049)}{.035}\, season$$
1. (a) (3 points) Based on this regression, do student-athletes' GPAs suffer when their sport is in season?
Justify your answer.
Solution: The coefficient on season implies that, holding all other variables constant, we expect the
GPAs of student-athletes to be .035 points lower when their sport is in season. Although negative, this
is a very small effect (only 1.5% of the mean). Moreover, since the t-stat $= -.035/.049 = -.71$ is smaller in
magnitude than even the 10% critical value (1.64), we conclude that the differential is not statistically
significant.
(b) (3 points) What is the estimated GPA differential between females and males? Is it statistically significant
at the 1% level?
Solution: The coefficient on female implies that, holding all other variables constant, we expect the
GPAs of female student-athletes to be .366 higher than those of males. Since the t-stat $= .366/.051 = 7.18$ is
greater than 2.58, the estimated differential is significant at the 1% level.
(c) (3 points) Since you have panel data (two semesters for each student athlete), you decide to estimate
the model using Random Effects as well. Here is the output you obtain:
$$\widehat{termgpa} = \underset{(.313)}{-2.22} - \underset{(.034)}{.063}\, spring + \underset{(.0001)}{.002}\, sat - \underset{(.001)}{.008}\, hsperc + \underset{(.061)}{.369}\, female + \underset{(.092)}{1.09}\, crsgpa - \underset{(.039)}{.047}\, season$$
Based on this RE regression, do student-athletes' GPAs suffer when their sport is in season? Justify your
answer.
Solution: The coefficient on season implies that, holding all other variables constant, we expect the
GPAs of student-athletes to be .047 points lower when their sport is in season. Again, while negative in
sign, this is a very small effect. Moreover, since the t-stat $= -.047/.039 = -1.21$ is still smaller in magnitude
than the 10% critical value (1.64), we conclude that the differential is not statistically significant.
(d) (4 points) Write out the population regression model (including the errors) for the RE estimator used
above. What must be true about the error terms for the RE estimator to yield consistent estimates of
the coefficients?
Solution: The population regression model is given by
$$termgpa = \beta_0 + \beta_1 spring + \beta_2 sat + \beta_3 hsperc + \beta_4 female + \beta_5 crsgpa + \beta_6 season + \alpha_i + u_{it}$$
In order for the RE estimator to yield consistent estimates, both the individual effect $\alpha_i$ and the idiosyncratic
error $u_{it}$ must be uncorrelated with the regressors.
(e) (5 points) Most of the athletes who play their sport only in the fall are football players. Suppose the
ability levels of football players differ systematically from those of other athletes (i.e. they have lower
ability). If ability is not fully captured by SAT score and high school percentile, do you think the OLS
estimate of the coefficient on season will be biased? What do you think the sign of that bias will be?
Why?
Solution: If omitted ability is correlated with season then, as we know from Chapter 5, OLS is biased
and inconsistent, so the coefficient on season will be biased. However, the sign of the bias is difficult to
determine since we are pooling across semesters. First, suppose we used only the fall term, when football
is in season. Then the error term and season would be negatively correlated, which produces a downward
bias in the OLS estimator of $\beta_6$. Because $\beta_6$ is hypothesized to be negative, an OLS regression using only
the fall data will overstate the magnitude of the negative effect. However, if we use just the spring semester, the
bias is in the opposite direction because ability and season would be positively correlated (more academically
able athletes are in season in the spring). When we pool the two semesters we cannot, without a much more
detailed analysis, determine which bias will dominate.
(f) (3 points) Since you are concerned with unobserved ability, you decide to use a fixed effects model to
estimate the impact of season on termgpa. How will the fixed effects estimator allow you to control for
unobserved ability?
Solution: Using the fixed effects estimator will allow us to estimate a separate $\alpha_i$ for each student
athlete. This allows us to control for the unobserved aspects of each individual that do not change over
time. If we assume that innate ability is constant over time, then we will be controlling for ability by
including this fixed effect.
(g) (5 points) When you use the FE estimator, you obtain the following results:
Dependent variable: termgpa
Regressor     Coefficient    (Std. error)
spring        -.069          (.034)
crsgpa        1.14           (.118)
season        -.057          (.041)
Fixed Effects Included
Standard errors in parentheses
Notice that the variables sat, hsperc, and female have been dropped from the estimation. Why? Why
weren't these variables dropped from the RE regression?
Solution: The variables sat, hsperc, and female must be dropped from the FE regression since they do
not vary over time for a particular student. Therefore, the same differencing procedure that purges $\alpha_i$
will get rid of them as well. Alternatively, if you are estimating the FE regression with student dummies,
including these three variables would lead to perfect multicollinearity. Since the RE estimator does not
explicitly include student specific intercepts (i.e. dummies), there is no collinearity problem. Additional
note: Unlike the FE estimator, the goal of the RE estimator isn't to eliminate $\alpha_i$, but to use its presence
to improve efficiency. In particular, the RE estimator simply requires that $\alpha_i$ be uncorrelated with the
included regressors, but does not require that those regressors vary over time.
(h) (3 points) Based on the FE regression in part g, do student-athletes' GPAs suffer when their sport is in
season? Justify your answer.
Solution: The coefficient on season implies that, holding all other variables constant, we expect the
GPAs of student-athletes to be .057 points lower when their sport is in season. Once again, although
negative, this is a very small effect. Also, since the t-stat $= -.057/.041 = -1.39$ is still smaller in magnitude
than the 10% critical value (1.64), we conclude that the differential is not statistically significant.
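The three t-tests on the season coefficient (pooled OLS, RE, FE) follow the same recipe, so they can be checked together. This is a sketch: the dictionary layout is our own, while the coefficients and standard errors are the ones reported in parts (a), (c) and (g).

```python
# (coefficient, standard error) for season under each estimator
estimates = {
    "OLS": (-0.035, 0.049),
    "RE": (-0.047, 0.039),
    "FE": (-0.057, 0.041),
}
crit_10 = 1.64  # two-sided 10% critical value used in the solutions

t_stats = {name: coef / se for name, (coef, se) in estimates.items()}
significant = {name: abs(t) > crit_10 for name, t in t_stats.items()}

for name, t in t_stats.items():
    print(name, round(t, 2), significant[name])
```

In all three cases the estimate is negative but not statistically significant at the 10% level.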
(i) (3 points) A Hausman test comparing the FE and RE estimators above produces a test statistic equal
to .88. What do you conclude about the validity of the RE assumptions in this setting?
Solution: The Hausman test statistic is distributed $\chi^2_M$, where $M$ is the number of coefficients (excluding
those on regressors that do not vary over time). $M$ here is equal to 3 and the 10% critical value of the $\chi^2_3$ is 6.25,
so we cannot reject the null hypothesis that the coefficient estimates are the same across specifications.
Therefore, we conclude that it is appropriate to use the RE estimator in this case.
(j) (4 points) Suppose you decide to drop the three variables sat, hsperc, and female and re-estimate the
RE model using the remaining variables. Here is the output you obtain.
$$\widehat{termgpa} = \underset{(.286)}{-.520} - \underset{(.034)}{.046}\, spring + \underset{(.102)}{1.04}\, crsgpa - \underset{(.040)}{.009}\, season$$
The Hausman test comparing this RE estimator to the FE estimator from part g yields a test statistic
equal to 32.13. What do you conclude about the validity of the RE assumptions in this new setting? Do
you reach the same conclusion as in part i? Explain why you do or do not.
Solution: The Hausman test statistic is distributed $\chi^2_M$, where $M$ is the number of coefficients (excluding
those on regressors that do not vary over time). $M$ here is again equal to 3 and the 1% critical value of the $\chi^2_3$
is 11.34, so we can easily reject the null hypothesis that the coefficient estimates are the same across
specifications. This is strong evidence that the RE assumptions are invalid. This is the opposite of what
we concluded in part i. Note that we are no longer controlling for variation in sat, hsperc, and female.
This variation is now part of the error and may well be correlated with the remaining regressors. The
coefficient estimates on the remaining regressors seem quite different from those of the FE estimator, so
we should not be surprised by the results of the Hausman test.
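The two Hausman tests in parts (i) and (j) can be compared through their p-values. The sketch below uses the closed-form CDF of a chi-square variable with 3 degrees of freedom, $F(x) = \mathrm{erf}(\sqrt{x/2}) - \sqrt{2x/\pi}\, e^{-x/2}$; the function name `chi2_3_sf` is ours.

```python
import math

def chi2_3_sf(x: float) -> float:
    """P(chi-square with 3 df > x), using the closed-form CDF for 3 degrees of freedom."""
    cdf = math.erf(math.sqrt(x / 2.0)) - math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    return 1.0 - cdf

p_part_i = chi2_3_sf(0.88)   # RE model with sat, hsperc, female included
p_part_j = chi2_3_sf(32.13)  # RE model after dropping those three variables

# Sanity check against the critical values quoted in the solutions.
print(round(chi2_3_sf(6.25), 3), round(chi2_3_sf(11.34), 3))
print(round(p_part_i, 3), p_part_j < 0.01)
```

The first p-value is far above 0.10, so the RE assumptions are not rejected in part (i); the second is far below 0.01, matching the rejection in part (j).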
2. (27 points) Consider the following two-equation system:
$$Y_1 = \alpha_0 + \alpha_1 X_1 + u_1 \qquad (9)$$
$$Y_2 = \beta_0 + \beta_1 Y_1 + u_2 \qquad (10)$$
(a) (3 points) Under what condition(s) will simple OLS yield consistent estimates for the parameters $\alpha_0$ &
$\alpha_1$ in equation (9)?
Solution: For equation (9), we simply need OLS Assumption 1 to hold: $E(u_{1i} \mid X_{1i}) = 0$.
In fact, even the weaker condition $Cov(X_{1i}, u_{1i}) = 0$ is sufficient.
(b) (3 points) Under what condition(s) will simple OLS yield consistent estimates for the parameters $\beta_0$ &
$\beta_1$ in equation (10)? Does this place any restrictions on the relationship between $u_2$, $u_1$, and $X_1$?
Solution: For equation (10), we again need OLS Assumption 1 to hold:
$E(u_{2i} \mid Y_{1i}) = 0$. This alone is a sufficient condition for OLS to yield consistent estimates. However,
because $Y_1$ appears in the other equation, it does place some restrictions on the variables in that equation,
namely that $Cov(u_{2i}, u_{1i}) = 0$ and $Cov(u_{2i}, X_{1i}) = 0$.
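(c) (10 points) Suppose that, under the conditions you described above for equation (10), a researcher
decides instead to use 2SLS to estimate (10). That is, you first regress $Y_1$ on $X_1$ using simple OLS and
then regress $Y_2$ on the fitted values $\hat Y_1$ from the first step. Will this procedure yield a consistent estimate
of $\beta_1$? Justify your answer. (Hint: you should follow the same steps we used to establish consistency of
the 2SLS estimator in class).
Solution: Recall the trick we used in class: Since $\hat Y_{1i} = \hat\alpha_0 + \hat\alpha_1 X_{1i}$ and $\bar{\hat Y}_1 = \hat\alpha_0 + \hat\alpha_1 \bar X_1$, we know
$\hat Y_{1i} - \bar{\hat Y}_1 = \hat\alpha_1 (X_{1i} - \bar X_1)$. Now, using OLS to estimate equation (9) yields:
$$\hat\alpha_1 = \frac{\sum (X_{1i} - \bar X_1)(Y_{1i} - \bar Y_1)}{\sum (X_{1i} - \bar X_1)^2}$$
The second stage estimator is given by
$$\hat\beta_1 = \frac{\sum (\hat Y_{1i} - \bar{\hat Y}_1)(Y_{2i} - \bar Y_2)}{\sum (\hat Y_{1i} - \bar{\hat Y}_1)^2} = \frac{\sum (\hat Y_{1i} - \bar{\hat Y}_1)\, Y_{2i}}{\sum (\hat Y_{1i} - \bar{\hat Y}_1)^2} = \frac{\sum (\hat Y_{1i} - \bar{\hat Y}_1)(\beta_0 + \beta_1 Y_{1i} + u_{2i})}{\sum (\hat Y_{1i} - \bar{\hat Y}_1)^2}$$
$$= \frac{\hat\alpha_1 \sum (X_{1i} - \bar X_1)(\beta_0 + \beta_1 Y_{1i} + u_{2i})}{\hat\alpha_1^2 \sum (X_{1i} - \bar X_1)^2}$$
$$= \frac{\beta_0 \sum (X_{1i} - \bar X_1)}{\hat\alpha_1 \sum (X_{1i} - \bar X_1)^2} + \frac{\beta_1 \sum (X_{1i} - \bar X_1)\, Y_{1i}}{\hat\alpha_1 \sum (X_{1i} - \bar X_1)^2} + \frac{\sum (X_{1i} - \bar X_1)\, u_{2i}}{\hat\alpha_1 \sum (X_{1i} - \bar X_1)^2}$$
$$= \beta_1 + \frac{\sum (X_{1i} - \bar X_1)\, u_{2i}}{\hat\alpha_1 \sum (X_{1i} - \bar X_1)^2} \xrightarrow{p} \beta_1 + \frac{Cov(X_1, u_2)}{\alpha_1 Var(X_1)} = \beta_1$$
(as long as $\alpha_1 \neq 0$, since $Cov(X_1, u_2) = 0$ by assumption). The first term vanishes because $\sum (X_{1i} - \bar X_1) = 0$, and the
second term equals $\beta_1$ because $\sum (X_{1i} - \bar X_1)\, Y_{1i} = \hat\alpha_1 \sum (X_{1i} - \bar X_1)^2$.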
(d) (8 points) Someone who has taken Econ 139 in the past suggests that yet another way to obtain a
consistent estimator of $\beta_1$ is to use indirect least squares (ILS). This person suggests first regressing $Y_1$ on
$X_1$, then regressing $Y_2$ on $X_1$, and then using the results from both regressions to construct an estimate
of $\beta_1$. Show how you can use these two regressions to obtain a consistent estimate of $\beta_1$. Justify your
answer.
Solution: Plugging equation (9) into equation (10) yields
$$Y_2 = \beta_0 + \beta_1 Y_1 + u_2$$
$$\Rightarrow Y_2 = \beta_0 + \beta_1 (\alpha_0 + \alpha_1 X_1 + u_1) + u_2$$
$$\Rightarrow Y_2 = \beta_0 + \beta_1 \alpha_0 + \alpha_1 \beta_1 X_1 + \beta_1 u_1 + u_2$$
So regressing $Y_2$ on $X_1$ yields a consistent estimate of $\alpha_1 \beta_1$ (provided $E(\beta_1 u_1 + u_2 \mid X_1) = 0$). While you
can show that the estimator is consistent by constructing the ratio of the two OLS formulas themselves
and proceeding as in the previous question, it is far easier to use what we already know about each OLS
estimator and the known properties of convergence in probability. Since a regression of $Y_1$ on $X_1$ yields
a consistent estimate of $\alpha_1$ (provided $E(u_1 \mid X_1) = 0$), we can recover an estimate of $\beta_1$ by constructing
the ratio of the two slope coefficients. Since $\hat\alpha_1 \xrightarrow{p} \alpha_1$ and $\widehat{\alpha_1 \beta_1} \xrightarrow{p} \alpha_1 \beta_1$,
$$\frac{\widehat{\alpha_1 \beta_1}}{\hat\alpha_1} \xrightarrow{p} \frac{\alpha_1 \beta_1}{\alpha_1} = \beta_1 \quad (\text{provided } \alpha_1 \neq 0).$$
(e) (3 points) Is there any reason to prefer simple OLS to these two alternative procedures?
Solution: If the OLS assumptions are satisfied, OLS will be more efficient than 2SLS (since, in this
case, we are adding unnecessary noise by using $\hat Y_1$ instead of $Y_1$). Also, 2SLS is generally biased in small
samples, while OLS is not.
3. (22 points total) Workers in the U.S. have several options for saving money for their retirement. Two primary
options are contributing to an individual retirement account (IRA) and participating in a 401(k) plan. While
everyone is eligible to contribute savings to an IRA, you can only participate in a 401(k) plan if your employer
offers one (that is, only if you work for a firm that chooses to offer one as part of the overall compensation
package). The goal of this exercise is to test whether there is a trade-off between participating in a 401(k) plan
and having an IRA (some economists have claimed that 401(k) plans "crowd out" IRAs). In particular, we are
interested in whether participating in a 401(k) plan makes a person less likely to participate in an IRA.
To explore this issue, you propose estimating the following linear probability model (LPM)
$$pira = \beta_0 + \beta_1\, p401k + \beta_2\, inc + \beta_3\, inc^2 + \beta_4\, age + \beta_5\, age^2 + u \qquad (1)$$
where pira is a dummy variable indicating that a worker contributes to an IRA, p401k is a dummy variable
indicating whether a worker participates in a 401(k) plan, inc is the worker's annual income, and age is the
worker's age. We are primarily interested in the coefficient $\beta_1$.
(a) (3 points) OLS estimation of this LPM yields
$$\widehat{pira} = \underset{(.069)}{-.198} + \underset{(.010)}{.054}\, p401k + \underset{(.0005)}{.0087}\, inc - \underset{(.000004)}{.000023}\, inc^2 - \underset{(.0033)}{.0016}\, age + \underset{(.00004)}{.00012}\, age^2$$
What is the estimated effect of p401k? Is it statistically significant at the 1% level?
Solution: The coefficient on p401k implies that participation in a 401(k) plan is associated with a .054
higher probability of having an IRA, holding income and age fixed. We can conclude that it is statistically
significant at the 1% level since $t = .054/.010 = 5.4 > 2.58$.
(b) (4 points) What, if anything, is wrong with using OLS to estimate (1)? Hint: think about the exogeneity
of p401k.
Solution: While regression (1) controls for income and age, it does not account for the fact that different
people have different "taste" for savings, even within given income and age categories. People that tend
to be savers will tend to have both a 401(k) plan and an IRA. (This means that the error term, $u$, is
positively correlated with p401k.) What we would like to know is, for a given person, if that person
participates in a 401(k) does it make it less likely or more likely that the person also has an IRA. This
question is difficult to answer using OLS without having many more controls for the taste for saving.
(c) (4 points) The variable e401k is a binary variable equal to one if a worker is eligible to participate in a
401(k) plan. Explain what is required for e401k to be a valid instrumental variable (IV) for p401k. Do
these assumptions seem reasonable?
Solution: To be a valid IV for p401k, e401k must be both relevant and exogenous. For relevance, we
need e401k to be correlated with p401k; not surprisingly, this is not an issue, as being eligible for a
401(k) plan is, by definition, necessary for participation. (The regression in part (d) verifies that they are
strongly positively correlated.) The more difficult issue is whether e401k can be taken as exogenous in
the structural model. In other words, is being eligible for a 401(k) correlated with unobserved taste for
saving? If we think workers that like to save for retirement will match up with employers that provide
vehicles for retirement saving, then $u$ and e401k would be positively correlated. Certainly we think that
e401k is less correlated with $u$ than is p401k.
(d) (7 points) Estimating the reduced form (first stage) equation for p401k by OLS yields
$$\widehat{p401k} = \underset{(.049)}{.059} + \underset{(.008)}{.689}\, e401k + \underset{(.0003)}{.0011}\, inc - \underset{(.0000027)}{.0000018}\, inc^2 - \underset{(.0022)}{.0047}\, age + \underset{(.000026)}{.000052}\, age^2$$
What is the estimated impact of e401k on p401k? Is it statistically significant at the 1% level? What do
you conclude about the relevance of e401k as an instrument for p401k? Is it exogenous?
Solution: The coefficient estimate for e401k implies that, holding income and age fixed, eligibility for a
401(k) plan increases the probability of participating in a 401(k) by .69. The t-statistic $t = .689/.008 = 86.1$
establishes that it is significant at the 1% level (it is well above 2.58). Moreover, since the F-stat
$F = t^2 = (.689/.008)^2 \approx 7418$ is well above 10 (our rule of thumb cut-off), our instrument is very strong.
Clearly, e401k passes one of the two requirements as an IV for p401k. Unfortunately, we cannot test
the exogeneity of e401k because the model is exactly identified.
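With a single instrument, the first-stage F-statistic is just the square of the instrument's t-statistic, so the relevance check reduces to a few lines. The sketch below uses the coefficient and standard error reported in the first-stage regression; variable names are ours.

```python
coef, se = 0.689, 0.008  # first-stage coefficient and standard error on the eligibility dummy
rule_of_thumb = 10.0     # cut-off for a strong instrument used in the solutions

t_stat = coef / se
f_stat = t_stat ** 2     # with one instrument, F = t^2

strong = f_stat > rule_of_thumb
print(round(t_stat, 1), round(f_stat), strong)
```

Squaring the unrounded t-statistic gives roughly 7418; squaring the rounded value 86.1 gives roughly 7413, which is why the two can differ slightly.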
(e) (4 points) We now estimate equation (1) by 2SLS and obtain the following results
$$\widehat{pira} = \underset{(.065)}{-.207} + \underset{(.013)}{.021}\, p401k + \underset{(.0005)}{.0090}\, inc - \underset{(.000004)}{.000024}\, inc^2 - \underset{(.0032)}{.0011}\, age + \underset{(.00004)}{.00011}\, age^2$$
What do you conclude about the trade-off between participating in a 401(k) plan and participating in an
IRA? Did your conclusions change from part a?
Solution: The IV estimate of $\beta_{p401k}$ is less than half as large as the OLS estimate, and the IV estimate
has a t-statistic $t = .021/.013 = 1.62$, so it isn't significant at the standard levels. A reduction in $\hat\beta_{p401k}$ is
what we expect given the unobserved taste for saving argument made in part (b). But we still do not
estimate a trade-off between participating in a 401(k) plan and participating in an IRA. This conclusion
has prompted some in the economics literature to claim that 401(k) saving is additional saving; it does
not simply crowd out saving in other plans.
4. (24 points total) Suppose the (inverse) supply function for the monthly growth in cement price (gprc) as a
function of growth in quantity (gcem) is given by
$$gprc_t = \beta_0 + \beta_1\, gcem_t + \beta_2\, gprcpet_t + \beta_3\, feb_t + \dots + \beta_{13}\, dec_t + u_t \qquad (1)$$
where gprcpet, the growth in the price of petroleum (a key input to the production of cement), is assumed to
be exogenous and feb, ..., dec are dummy variables for the months of the year.
(a) (2 points) Why is there no January dummy in equation (1)?
Solution: It would lead to perfect multicollinearity.
(b) (2 points) What signs do you expect for $\beta_1$ and $\beta_2$?
Solution: $\beta_1 > 0$ (supply curves should be upward sloping) and $\beta_2 > 0$ (more expensive inputs should
increase the price of outputs).
(c) (6 points) Estimation of equation (1) by OLS yields
$$\widehat{gprc} = \underset{(.0058)}{.0144} - \underset{(.0127)}{.0443}\, gcem + \underset{(.0256)}{.0628}\, gprcpet$$
where the month dummies have been suppressed for brevity. What does the estimate of $\beta_1$ imply about
the supply curve? Is this surprising? What, if anything, is wrong with using OLS to estimate equation
(1)?
Solution: The estimated supply curve slopes down, not up, and the coefficient on $gcem_t$ is statistically
significant at the 1% level ($t = -.0443/.0127 \approx -3.49$). This contradicts standard economic theory. However,
since this is a supply curve and prices and quantities are determined in equilibrium, it is very likely that
our results are tainted by simultaneity bias.
(d) (3 points) The variable grdefs is the monthly growth in real defense spending in the United States.
Estimation of the reduced form (first stage) equation yields
$$\widehat{gcem} = \underset{(.0296)}{.2482} - \underset{(3.255)}{1.054}\, grdefs + \underset{(.0909)}{.0670}\, gprcpet$$
where the month dummies have been suppressed for brevity. What does this regression tell you about
the value of grdefs as an instrument for gcem?
Solution: The coefficient on grdefs is $-1.054$ with t-statistic $t = -1.054/3.255 = -0.32$. We cannot reject
$H_0: \beta_{grdefs} = 0$ at any reasonable significance level and $F = t^2 = (-0.32)^2 = .102$ is well under our rule
of thumb of 10. We conclude that grdefs is not a useful IV for gcem (even if grdefs is exogenous in the
supply equation).
(e) (3 points) Two additional instruments available to us are the growth in output of residential (grres) and
nonresidential (grnon) construction. These are demand shifters that should be roughly uncorrelated with the supply
error. Estimation of the reduced form (first stage) equation yields
$$\widehat{gcem} = \underset{(.0267)}{.2437} + \underset{(.1280)}{.1361}\, grres + \underset{(.2887)}{1.145}\, grnon + \underset{(.0938)}{.0369}\, gprcpet$$
where the month dummies have been suppressed for brevity and an F-test of the joint significance of the two IVs
yields an F-statistic of 16.9. What do you conclude about the relevance of these instruments?
Solution:
Our F-statistic satisfies $F > 10$, so it passes our rule of thumb cut-off. Our instruments appear to be both relevant and
reasonably strong.
(f) (3 points) You are concerned about the exogeneity of the two instruments proposed in part e, so you perform a
standard test of the over-identifying restrictions, which yields a J-statistic $J = 0.178$. What do you conclude?
Solution:
Here we have two instruments (grres, grnon) and one endogenous variable (gcem) so (in the notation of Stock
and Watson) $m = 2$ and $k = 1$, meaning that the J-statistic has a $\chi^2_{m-k} = \chi^2_1$ distribution. Since the 10% critical
value for the $\chi^2_1$ distribution is 2.71, we are unable to reject the null hypothesis that the instruments are exogenous.
This is, of course, a good thing.
(g) (5 points) Using the two instruments above (grres, grnon) to estimate equation (1) by 2SLS yields
$$\widehat{gprc} = \underset{(.0073)}{.0228} - \underset{(.0277)}{.0106}\, gcem + \underset{(.0157)}{.0605}\, gprcpet$$
where the month dummies have been suppressed for brevity. Construct a 95% confidence interval for $\beta_1$. What do
you conclude about the supply curve now?
Solution:
A 95% confidence interval for $\beta_1$ is given by
$$\hat\beta_1 \pm 1.96 \times SE(\hat\beta_1) = -.0106 \pm 1.96 \times .0277 = [-.065, .044]$$
While the coefficient on $gcem_t$ is still negative, it is only about one-fourth the size of the OLS coefficient, and
it is now very insignificant ($t = -.0106/.0277 = -.38$). At this point we would conclude that the static supply function is
horizontal (with gprc on the vertical axis, the standard convention).
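The confidence interval and t-statistic in part (g) can be verified directly. This is a minimal sketch using the 2SLS coefficient and standard error reported above; variable names are ours.

```python
coef, se = -0.0106, 0.0277  # 2SLS estimate and standard error for the quantity-growth coefficient
z_95 = 1.96                 # two-sided 95% normal critical value

lower = coef - z_95 * se
upper = coef + z_95 * se
t_stat = coef / se

print(round(lower, 3), round(upper, 3), round(t_stat, 2))
```

Because the interval contains zero, a flat supply curve cannot be rejected at the 5% level.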
9 Program Evaluation - Differences-in-Differences
1. 15 points overall. In 1992, there was an increase in the (state) minimum wage in New Jersey, but not in
a neighboring location (eastern Pennsylvania). To calculate the $\hat\beta_1^{diffs-in-diffs}$ you need the change in the
treatment group and the change in the control group. To do this, the study provides you with the following
information,
                     Pennsylvania    New Jersey
Employment before    23.33           20.44
Employment after     21.17           21.03
The numbers are average employment per restaurant.
(a) (3 points) Calculate the change in the treatment group.
Solution: $21.03 - 20.44 = +0.59$ (employees per restaurant)
(b) (3 points) Calculate the change in the control group.
Solution: $21.17 - 23.33 = -2.16$ (employees per restaurant)
(c) (3 points) Calculate the differences-in-differences estimator $\hat\beta_1^{\text{diffs-in-diffs}}$.
Solution: $\hat\beta_1^{\text{diffs-in-diffs}} = 0.59 - (-2.16) = 2.75$
(d) (3 points) Since minimum wages represent a price floor, did you expect $\hat\beta_1^{\text{diffs-in-diffs}}$ to be positive or
negative? How do your expectations compare with the above results?
Solution: According to standard economic theory, we expect $\hat\beta_1^{\text{diffs-in-diffs}}$ to be negative. Because
the minimum wage is supposed to act as a price floor, we expect employment in the treatment group to decrease
after a minimum wage hike relative to the control group. The estimate of $+2.75$ has the opposite sign.
(e) (3 points) The standard error for $\hat\beta_1^{\text{diffs-in-diffs}}$ is 1.36. Test whether or not the coefficient is statistically
significant, given that there are 410 observations.
Solution: $t = 2.750/1.36 = 2.02 > 1.96$, so $\hat\beta_1^{\text{diffs-in-diffs}}$ is significant at the 5% level.
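As a quick check of parts (a) through (e), the following Python sketch recomputes the two changes, the differences-in-differences estimate and its t-ratio from the four group means in the table. The function name `diff_in_diff` is just an illustrative helper, not something taken from the original study.

```python
def diff_in_diff(treat_before, treat_after, control_before, control_after):
    """Return (change in treatment, change in control, diff-in-diff estimate)."""
    d_treat = treat_after - treat_before
    d_control = control_after - control_before
    return d_treat, d_control, d_treat - d_control

# New Jersey is the treatment group, Pennsylvania the control group.
d_nj, d_pa, did = diff_in_diff(20.44, 21.03, 23.33, 21.17)

se = 1.36                  # standard error given in part (e)
t_stat = did / se
significant_5pct = abs(t_stat) > 1.96

print(round(d_nj, 2), round(d_pa, 2), round(did, 2), round(t_stat, 2))
```

With 410 observations the normal critical value 1.96 is a reasonable approximation for the 5% two-sided test.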
2. (13 points total) On the last problem set, you analyzed the effect of a minimum wage increase using a
quasi-experiment for two adjacent states: New Jersey and Pennsylvania. In particular, you calculated a
Diffs-in-Diffs estimate by comparing average employment changes per restaurant between a treatment group (New
Jersey) and a control group (Pennsylvania). However, the authors of the original study also provided data on
the employment changes between low wage restaurants and high wage restaurants in New Jersey only.
A restaurant was classified as low wage if the starting wage in the first wave of surveys was at the then
prevailing minimum wage of $4.25. A high wage restaurant was a restaurant with a starting wage close to
or above the $5.25 minimum wage after the increase.
(a) (4 points) Explain why employment changes of the high wage and low wage restaurants might
constitute a quasi-experiment. Which is the treatment group and which the control group?
Solution: In the above example, the increase in wages (treatment) occurs not because of changes in
the demand or supply of labor, but because of an external event, namely the raising of the minimum wage
in New Jersey. This is therefore a good example of a natural experiment. The treatment group is the
low wage restaurants, since the wages there are actually changed. The high wage restaurants are the
control group.
(b) (6 points) The following information is provided
                    Low Wage   High Wage
Employment before   19.56      22.25
Employment after    20.88      20.21
where the numbers are average employment per restaurant. Calculate the change in the treatment group,
the change in the control group, and finally $\hat\beta_1^{\text{Diffs-in-Diffs}}$. Since minimum wages represent a price
floor, did you expect $\hat\beta_1^{\text{Diffs-in-Diffs}}$ to be positive or negative?
Solution: The change in the treatment group is $20.88 - 19.56 = +1.32$, the change in the control group is $20.21 - 22.25 = -2.04$, so
$$\hat\beta_1^{\text{Diffs-in-Diffs}} = 1.32 - (-2.04) = 3.36$$
According to standard economic theory, we expect $\hat\beta_1^{\text{Diffs-in-Diffs}}$ to be negative (higher minimum
wages should reduce employment since firms now have higher costs).
(c) (3 points) The standard error for $\hat\beta_1^{\text{Diffs-in-Diffs}}$ is 1.48. Test whether or not this is statistically
significant at the 5% level, given that there are 174 observations.
Solution: The t-statistic is $3.36/1.48 = 2.27 > 1.96$, so the coefficient is statistically significant at the 5% level.
10 Time Series
1. Consider a time series $Y_t$. You think that this time series may be described either by Model 1 or by Model 2.
The two models are as follows
Model 1: $Y_t = \phi Y_{t-1} + u_t$, $u_t \sim$ iid with $E(u_t) = 0$, $Var(Y_t) = \sigma^2_Y$ for each $t$,
$Cov(u_t, u_{t-j}) = 0$ unless $j = 0$, $0 < \phi < 1$
Model 2: $Y_t = \varepsilon_t + \theta\varepsilon_{t-1}$, $\varepsilon_t \sim$ iid with $E(\varepsilon_t) = 0$, $Var(\varepsilon_t) = \sigma^2_\varepsilon$ for each $t$,
$Cov(\varepsilon_t, \varepsilon_{t-j}) = 0$ unless $j = 0$
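(a) (6 points) Prove that, for Model 1, $Y_t = \phi^2 Y_{t-2} + \phi u_{t-1} + u_t = \phi^3 Y_{t-3} + \phi^2 u_{t-2} + \phi u_{t-1} + u_t$.
Solution Note that the model implies $Y_{t-1} = \phi Y_{t-2} + u_{t-1}$, and $Y_{t-2} = \phi Y_{t-3} + u_{t-2}$. Substituting repeatedly,
$$Y_t = \phi Y_{t-1} + u_t = \phi(\phi Y_{t-2} + u_{t-1}) + u_t = \phi^2 Y_{t-2} + \phi u_{t-1} + u_t$$
$$= \phi(\phi(\phi Y_{t-3} + u_{t-2}) + u_{t-1}) + u_t = \phi^3 Y_{t-3} + \phi^2 u_{t-2} + \phi u_{t-1} + u_t$$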
(b) (6 points) The population autocorrelation of order $j$ of a time series $Y_t$ can be calculated as
$$\rho_j = \frac{cov(Y_t, Y_{t-j})}{Var(Y_t)}.$$
Calculate the autocorrelation of order 1, 2, and 3 for the time series $Y_t$ if Model 1 is correct. (Hint: use
the result from part (a), and remember that the problem tells you that for each $t$, $Var(Y_t) = \sigma^2_Y$)
Solution In Model 1, $Cov(Y_t, Y_{t-1}) = Cov(\phi Y_{t-1} + u_t, Y_{t-1}) = \phi Var(Y_{t-1})$, because $u_t$ is uncorrelated with $Y_{t-1}$. From the previous part,
$Cov(Y_t, Y_{t-2}) = Cov(\phi^2 Y_{t-2} + \phi u_{t-1} + u_t, Y_{t-2}) = \phi^2 Var(Y_{t-2})$, and, analogously, $Cov(Y_t, Y_{t-3}) = \phi^3 Var(Y_{t-3})$.
Since $Y_t$ is a covariance stationary process, $Var(Y_{t-i}) = Var(Y_{t-j})$ for all $i, j$. So,
the autocorrelations are
$$\rho_1 = \phi, \quad \rho_2 = \phi^2, \quad \rho_3 = \phi^3$$
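(c) (6 points) Calculate the autocorrelation of order 1, 2, and 3 for the time series $Y_t$ if Model 2 is correct.
(Hint: write down each $Y$ in terms of the corresponding iid components $\varepsilon$).
Solution In Model 2, $Cov(Y_t, Y_{t-1}) = Cov(\varepsilon_t + \theta\varepsilon_{t-1}, \varepsilon_{t-1} + \theta\varepsilon_{t-2}) = \theta\sigma^2_\varepsilon$, as $\varepsilon_t$ is iid. Higher order
covariances are zero, since for example $Cov(Y_t, Y_{t-2}) = Cov(\varepsilon_t + \theta\varepsilon_{t-1}, \varepsilon_{t-2} + \theta\varepsilon_{t-3}) = 0$ by independence
of $\varepsilon_t$. Also, $Var(Y_t) = Var(\varepsilon_t + \theta\varepsilon_{t-1}) = (1 + \theta^2)\sigma^2_\varepsilon$. So, the autocorrelations are
$$\rho_1 = \theta/(1 + \theta^2), \quad \rho_2 = 0, \quad \rho_3 = 0$$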
(d) (5 points) For both Model 1 and Model 2, draw two separate graphs with $j$ on the horizontal axis, with
$j = 1, 2, 3$, and $\rho_j$ on the vertical axis. Comment on the difference between the two graphs.
Solution The graph for Model 1 looks like a slowly decreasing positive function of $j$, since $\phi \in (0, 1)$.
The graph for Model 2 has a spike at $j = 1$ and is zero at all higher lags.
(e) (5 points) Explain in words how you can use your results to choose between Model 1 and Model 2.
Solution The models imply different distribution properties of the process $Y_t$. Model 1 implies some
linear dependence of the process at any lag, while Model 2 implies independence of $Y_t$ beyond the
first lag. When we take real data, we calculate autocorrelations, and if there are significant non-zero
correlation coefficients beyond lag 1, we choose not to use Model 2.
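The contrast between the two autocorrelation patterns can be made concrete with a short Python sketch. It evaluates the formulas derived in parts (b) and (c) for illustrative parameter values; the AR coefficient and the MA coefficient are both set to 0.5 purely as an assumption for the example, and the function names are hypothetical.

```python
def acf_ar1(phi, max_lag):
    """Autocorrelations of Model 1 (AR(1)): rho_j = phi**j."""
    return [phi ** j for j in range(1, max_lag + 1)]

def acf_ma1(theta, max_lag):
    """Autocorrelations of Model 2 (MA(1)): theta/(1+theta^2) at lag 1, zero after."""
    return [theta / (1 + theta ** 2) if j == 1 else 0.0
            for j in range(1, max_lag + 1)]

# Illustrative parameter values (assumptions, not estimates).
rho_model1 = acf_ar1(0.5, 3)
rho_model2 = acf_ma1(0.5, 3)

for j, (r1, r2) in enumerate(zip(rho_model1, rho_model2), start=1):
    print(f"lag {j}: Model 1 rho = {r1:.3f}, Model 2 rho = {r2:.3f}")
```

In a sample, one would compare these shapes with the estimated autocorrelations: a gradual decay points toward Model 1, a single spike at lag 1 toward Model 2.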
2. You have monthly data on the New York Stock Exchange index (NYSE hereafter), from January 1991 to
December 1998, for a total of 96 observations. Let $Y_t$ denote the NYSE index, and let $\Delta Y_t$ denote, as usual,
the first difference, that is $\Delta Y_t = Y_t - Y_{t-1}$. You estimate the following AR(1) model
$$\widehat{\Delta Y}_t = \underset{(1.12)}{2.86} + \underset{(0.09)}{0.24}\,\Delta Y_{t-1}, \qquad \bar{R}^2 = 0.057$$
(a) (5 points) Does it look like an AR(1) model does a good job at predicting monthly changes in the index?
Explain.
Solution The results suggest that the change in the index is only weakly predictable from its past value. As
can be seen from the $\bar{R}^2$, only about 6% of its variation is explained by the variation of the index change in the
previous month.
(b) (6 points) You know that the index was equal to 576 in December 1998, and equal to 564 in November
1998. Calculate a prediction for the value of the NYSE index in January 1999.
Solution In this case $\Delta Y_{t-1} = 576 - 564 = 12$, which implies $\widehat{\Delta Y}_t = 2.86 + 0.24 \times 12 = 5.74$, so the value in January is
$\hat{Y}_t = Y_{t-1} + \widehat{\Delta Y}_t = 576 + 5.74 = 581.74$.
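The one-step-ahead forecast in part (b) is simple arithmetic, sketched below in Python. The variable names are illustrative, and the coefficients are those reported in the estimated AR(1) model for the first difference.

```python
# Estimated AR(1) for the first difference of the index.
intercept, slope = 2.86, 0.24

y_dec = 576.0   # index level in the most recent month (December 1998)
y_nov = 564.0   # index level one month earlier

delta_prev = y_dec - y_nov                       # last observed monthly change
delta_forecast = intercept + slope * delta_prev  # predicted change for January
y_jan_forecast = y_dec + delta_forecast          # predicted level for January

print(f"predicted change: {delta_forecast:.2f}")
print(f"predicted level:  {y_jan_forecast:.2f}")
```

Note that the model is estimated in differences, so the predicted change must be added back to the last observed level.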
(c) (5 points) You want to select the optimal number of lags, and you use a BIC criterion for this purpose.
The following table reports the results of your calculations, for $p = 1, 2, ..., 12$.
p    BIC(p)   $\bar{R}^2$
1    4.93     0.04
2    4.92     0.08
3    4.96     0.07
4    5.00     0.07
5    5.01     0.09
6    5.02     0.12
7    4.98     0.18
8    4.99     0.20
9    4.79     0.37
10   4.84     0.36
11   4.88     0.35
12   4.89     0.37
Based on the results of this table, which model should you estimate? Explain.
Solution Based on these results, we would estimate an AR(9) model, as the BIC is minimized when the
number of lags in the model is 9.
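The lag selection rule can be written as a one-line minimization. The Python sketch below stores the BIC values from the table in a dictionary (the name `bic` is illustrative) and picks the lag length with the smallest value.

```python
# BIC values from the table, keyed by the number of lags p.
bic = {
    1: 4.93, 2: 4.92, 3: 4.96, 4: 5.00, 5: 5.01, 6: 5.02,
    7: 4.98, 8: 4.99, 9: 4.79, 10: 4.84, 11: 4.88, 12: 4.89,
}

# The BIC rule selects the lag length with the smallest criterion value.
best_p = min(bic, key=bic.get)

print(f"selected lag length: {best_p} (BIC = {bic[best_p]})")
```

The adjusted R-squared column is not used by the rule: it rises with p almost mechanically here, while the BIC penalizes the extra parameters.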
11 Complete Past Exams
11.1 Midterm 1 in Fall 2007
This problem is based on actual data for a sample of 55,360 women of age 15 to 49 from rural India. For simplicity,
let us assume these data are iid. The dataset includes the following variables recorded for all women in the sample:
age       Age in years
s         Completed years of schooling
malaria   Binary variable = 1 if the woman said she had malaria in the 3 months before the interview
hg        Hemoglobin level (grams per deciliter of blood)
lowhg     Binary variable = 1 if hemoglobin level is < 10
Hemoglobin is the molecule, in our red blood cells, which among other things has the role of transporting oxygen
from the lungs to the rest of the body. Low levels of $hg$ are usually associated with poor health, and can be caused
by different factors, such as malnutrition, disease, etc. The following table shows the joint distribution of the two
binary variables $malaria$ and $lowhg$ in the sample
           Malaria=0   Malaria=1
lowhg=0    0.796       0.038
lowhg=1    0.155       0.011
1. (3 points): What is the number of women with low hemoglobin level in this sample?
Solution: The number of women is the total times the fraction for whom $lowhg = 1$, that is $55360 \times (.155 + 0.011) = 9190$
(rounded to the nearest unit)
2. (3 points): What is the mean value of $lowhg$ in this sample?
Solution:
$$E(lowhg_i) = 1 \times P(lowhg_i = 1) + 0 \times P(lowhg_i = 0) = P(lowhg_i = 1)$$
From the previous question, $.155 + 0.011 = 0.166$
3. (3 points): In this sample, what is the fraction of women with low hemoglobin level (that is, with $lowhg = 1$)
conditional on having had malaria recently (that is among women with $malaria = 1$)?
Solution:
$$P(lowhg = 1 \mid malaria = 1) = \frac{P(lowhg = 1, malaria = 1)}{P(malaria = 1)} = \frac{0.011}{0.038 + 0.011} = 0.224$$
4. (3 points): What is the fraction of women with low hemoglobin level among those who did not have malaria
recently?
Solution:
$$P(lowhg = 1 \mid malaria = 0) = \frac{P(lowhg = 1, malaria = 0)}{P(malaria = 0)} = \frac{0.155}{0.155 + 0.796} = 0.163$$
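Questions 1 to 4 all read numbers off the same joint distribution, which the following Python sketch makes explicit. The dictionary layout and the helper name `p_lowhg_given` are illustrative choices, not part of the original problem.

```python
n = 55360  # sample size

# Joint distribution from the table, keyed by (lowhg, malaria).
joint = {
    (0, 0): 0.796, (0, 1): 0.038,
    (1, 0): 0.155, (1, 1): 0.011,
}

# Marginal probability of low hemoglobin (also the sample mean of lowhg).
p_lowhg = joint[(1, 0)] + joint[(1, 1)]
count_lowhg = round(n * p_lowhg)

def p_lowhg_given(malaria):
    """P(lowhg = 1 | malaria): joint probability over the malaria marginal."""
    return joint[(1, malaria)] / (joint[(0, malaria)] + joint[(1, malaria)])

print(count_lowhg, round(p_lowhg, 3))
print(round(p_lowhg_given(1), 3), round(p_lowhg_given(0), 3))
```

The conditional fraction is higher among women reporting malaria, which is the comparison the next question asks about.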
5. (3 points): Malaria is often associated with low hemoglobin levels (because the malaria parasite destroys red
blood cells). Are your results from the previous two questions overall consistent with this finding from the
medical literature? Explain.
Solution: Yes, the fraction of women with low hg levels is higher among women who report malarial episodes
recently. Note that this does not indicate causation (it is possible that factors other than malaria explain this
correlation), but at least this correlation is consistent with the findings in the medical literature.
Now you want to study the relationship between hemoglobin level and years of schooling. In this sample the
mean and standard deviation of $hg$ are 11.6 and 1.9 respectively. The results of the OLS regression of $hg$
on $s$ are the following, where the index $i$ refers to the woman (heteroskedasticity-robust standard errors in
parentheses):
$$\widehat{hg}_i = \underset{(0.01)}{11.5} + \underset{(0.0019)}{0.048}\, s_i. \qquad (11)$$
6. (3 points): What is the interpretation of the intercept? Is it a meaningful parameter in this context?
Solution: It indicates the predicted value of $hg$ for women with no schooling, and the parameter is meaningful
because there certainly are women with no schooling.
7. (3 points): What is the interpretation of the slope?
Solution: The slope indicates that one more year of schooling is associated with a 0.048 increase in the
predicted level of hemoglobin.
8. (4 points): Construct a 99% confidence interval for the slope.
Solution:
$$0.048 \pm 2.57 \times 0.0019 = [.043, .053]$$
9. (3 points): What is the predicted hemoglobin level for a woman with no schooling?
Solution:
$$11.5 + .048 \times (0) = 11.5$$
10. (3 points): What is the predicted hemoglobin level for a woman with 10 years of schooling?
Solution:
$$11.5 + .048 \times (10) = 11.98$$
11. (4 points): Does the difference between the prediction in part 9 and that in part 10 look large?
Solution: The difference is about .5. It does not look very large. Recall that the mean value of $hg$ in the
sample is 11.6, and the standard deviation is 1.9. So, 10 more years of schooling only increase $hg$ by about
1/4 of a standard deviation, which is not very large in relative terms.
12. (4 points): Can you reject the null hypothesis that the slope is equal to 0.05, using a 10 percent significance
level?
Solution: The test is two-sided (because the question does not indicate otherwise) so
$$\left|\frac{0.048 - 0.05}{0.0019}\right| = 1.0526 < 1.645$$
so we cannot reject the null.
13. (6 points): Now you re-estimate the model separately for two different Indian states. In the state of Punjab,
the slope is now 0.018, with a standard error equal to 0.01. In the state of Rajasthan, the slope is 0.003, and
the standard error is again 0.01. Note that the two estimates use independent samples. Can you reject the
null hypothesis that the slope is the same in the two states, using a 1% significance level?
Solution: Here the null and alternative hypotheses are
$$H_0: \beta_1^{Punjab} - \beta_1^{Rajasthan} = 0$$
$$H_1: \beta_1^{Punjab} - \beta_1^{Rajasthan} \neq 0$$
so the test is
$$\frac{\left(\hat\beta_1^{Punjab} - \hat\beta_1^{Rajasthan}\right) - 0}{s.e.\left(\hat\beta_1^{Punjab} - \hat\beta_1^{Rajasthan}\right)} = \frac{\left(\hat\beta_1^{Punjab} - \hat\beta_1^{Rajasthan}\right) - 0}{\sqrt{\widehat{Var}\left(\hat\beta_1^{Punjab} - \hat\beta_1^{Rajasthan}\right)}} = \frac{\left(\hat\beta_1^{Punjab} - \hat\beta_1^{Rajasthan}\right) - 0}{\sqrt{\widehat{Var}\left(\hat\beta_1^{Punjab}\right) + \widehat{Var}\left(\hat\beta_1^{Rajasthan}\right)}}$$
where the fact that there is no covariance term depends on the independence of the two samples. So
$$\frac{\left(\hat\beta_1^{Punjab} - \hat\beta_1^{Rajasthan}\right) - 0}{\sqrt{s.e.\left(\hat\beta_1^{Punjab}\right)^2 + s.e.\left(\hat\beta_1^{Rajasthan}\right)^2}} = \frac{.018 - .003}{\sqrt{.01^2 + .01^2}} = 1.0607 < 2.57$$
so that you cannot reject the null that $\beta_1^{Punjab} - \beta_1^{Rajasthan} = 0$.
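The test in question 13 can be reproduced with the short Python sketch below. It relies on the independence of the two samples, so the variance of the difference is the sum of the two squared standard errors; the variable names are illustrative.

```python
import math

# Slope estimates and standard errors for the two states.
b_punjab, se_punjab = 0.018, 0.01
b_rajasthan, se_rajasthan = 0.003, 0.01

# Independent samples: Var(b1 - b2) = Var(b1) + Var(b2), no covariance term.
se_diff = math.sqrt(se_punjab ** 2 + se_rajasthan ** 2)
t_stat = (b_punjab - b_rajasthan) / se_diff

crit_1pct = 2.57  # two-sided 1% critical value used in the solution
reject = abs(t_stat) > crit_1pct

print(f"t = {t_stat:.4f}, reject at 1%: {reject}")
```

If the two estimates came from overlapping samples, the covariance between them would have to be subtracted (twice) inside the square root.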
14. (5 points): All the results point to a positive correlation between years of schooling and hemoglobin level. Do
you think these results should be interpreted to mean that more education causes improvements in hemoglobin
levels? Explain.
Solution: Not at all. There are many other factors which could be driving this correlation. On the one hand,
better education could lead to better ability to take care of one's health, so the result could be causal in part.
But better education is also likely to be associated with higher income, better nutrition, a better epidemiological
environment, etc. Without keeping all these (and probably many other) factors constant, it is impossible
to argue that the estimated correlation actually indicates causality.
Suppose now that the standard OLS assumptions hold for the following model:
$$hg_i = \beta_0 + \beta_1 s_i + u_i. \qquad (12)$$
Recall that the OLS assumptions will also imply that $\sigma_{u,s} = Cov(u_i, s_i) = 0$. You want to estimate $\beta_1$.
Unfortunately, there were problems with the blood tests necessary to measure hemoglobin, so that you do not
observe the true hemoglobin level $hg_i$, but you only observe a value $hg^*_i = hg_i + e_i$, where $e_i$ is measurement
error. In this problem we want to study the consequences of such measurement error for the estimation of the
slope $\beta_1$.
Let $\sigma_{e,s}$ denote $Cov(e_i, s_i)$, that is, the covariance between the education level and measurement error, and let
$\sigma^2_s$ denote the variance of years of completed schooling.
First, note that because your data set only includes $hg^*_i$, and not $hg_i$, your OLS estimator for the slope $\beta_1$ will
be
$$\hat\beta_1 = \frac{\sum_{i=1}^n (s_i - \bar s)\, hg^*_i}{\sum_{i=1}^n (s_i - \bar s)^2}$$
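15. (6 points): Prove that the estimator can be rewritten as
$$\hat\beta_1 = \beta_1 + \frac{\frac{1}{n}\sum_{i=1}^n (s_i - \bar s)(u_i + e_i)}{\frac{1}{n}\sum_{i=1}^n (s_i - \bar s)^2}$$
Solution:
$$\hat\beta_1 = \frac{\sum_{i=1}^n (s_i - \bar s)\, hg^*_i}{\sum_{i=1}^n (s_i - \bar s)^2} = \frac{\sum_{i=1}^n (s_i - \bar s)(hg_i + e_i)}{\sum_{i=1}^n (s_i - \bar s)^2} = \frac{\sum_{i=1}^n (s_i - \bar s)(\beta_0 + \beta_1 s_i + u_i + e_i)}{\sum_{i=1}^n (s_i - \bar s)^2}$$
$$= \beta_0 \frac{\overbrace{\sum_{i=1}^n (s_i - \bar s)}^{=0}}{\sum_{i=1}^n (s_i - \bar s)^2} + \beta_1 \underbrace{\left(\frac{\sum_{i=1}^n (s_i - \bar s)\, s_i}{\sum_{i=1}^n (s_i - \bar s)^2}\right)}_{=1} + \frac{\sum_{i=1}^n (s_i - \bar s)(u_i + e_i)}{\sum_{i=1}^n (s_i - \bar s)^2}$$
$$= \beta_1 + \frac{\frac{1}{n}\sum_{i=1}^n (s_i - \bar s)(u_i + e_i)}{\frac{1}{n}\sum_{i=1}^n (s_i - \bar s)^2}$$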
16. (3 points): What is the probability limit of $\frac{1}{n}\sum_{i=1}^n (s_i - \bar s)(u_i + e_i)$?
Solution: First, note that in large samples, $\bar s \approx \mu_s$, so when $n \to \infty$
$$\frac{1}{n}\sum_{i=1}^n (s_i - \bar s)(u_i + e_i) \approx \frac{1}{n}\sum_{i=1}^n (s_i - \mu_s)(u_i + e_i)$$
but then this is just a mean of $n$ iid random variables, which will converge to $E[(s_i - \mu_s)(u_i + e_i)] =
E[(s_i - \mu_s)(u_i + e_i - \mu_u - \mu_e)] = cov(s_i, u_i + e_i)$. Using the properties of covariances, this in turn can be
written as
$$cov(s_i, u_i + e_i) = cov(s_i, u_i) + cov(s_i, e_i).$$
We know that $cov(s_i, u_i) = 0$ by assumption, so finally
$$\operatorname{plim} \frac{1}{n}\sum_{i=1}^n (s_i - \bar s)(u_i + e_i) = cov(s_i, e_i) = \sigma_{e,s}$$
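17. (4 points): Prove that
$$\operatorname{plim} \hat\beta_1 = \beta_1 + \frac{\sigma_{e,s}}{\sigma^2_s}$$
Solution: We saw many times in class that under the usual assumptions
$$\operatorname{plim} \frac{1}{n}\sum_{i=1}^n (s_i - \bar s)^2 = \sigma^2_s,$$
and in the previous point we proved that
$$\operatorname{plim} \frac{1}{n}\sum_{i=1}^n (s_i - \bar s)(u_i + e_i) = \sigma_{e,s},$$
so, putting things together and using the properties of plim we have the result
$$\operatorname{plim} \hat\beta_1 = \operatorname{plim}\left(\beta_1 + \frac{\overbrace{\frac{1}{n}\sum_{i=1}^n (s_i - \bar s)(u_i + e_i)}^{\xrightarrow{p}\ \sigma_{e,s}}}{\underbrace{\frac{1}{n}\sum_{i=1}^n (s_i - \bar s)^2}_{\xrightarrow{p}\ \sigma^2_s}}\right) = \beta_1 + \frac{\sigma_{e,s}}{\sigma^2_s}$$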
18. (5 points): Suppose that better educated women have less time to be tested, so that for these women, on
average, $hg_i$ has to be measured in a rush, and that this usually leads to measurements which are below the true
value. In this case, is $\hat\beta_1$ a consistent estimator of $\beta_1$? Explain.
Solution: If women with better education, on average, get readings which are lower than the true value, then
we should expect that when $s$ is above average, the measurement error $e$ will be more likely to be negative and
below average. Hence the covariance between the two will be negative, and then
$$\operatorname{plim} \hat\beta_1 = \beta_1 + \frac{\sigma_{e,s}}{\sigma^2_s} < \beta_1$$
so that $\hat\beta_1$ will NOT be consistent. Intuitively, the estimates will be systematically lower than the true value,
because for women with better schooling we observe values of the dependent variable which are lower than the
true values.
Let $\sigma^2_u$ and $\sigma^2_e$ denote the variance of the regression error and measurement error respectively, and let $\sigma_{e,u}$
denote the covariance between the two errors.
19. (3 points): Calculate $Var(u_i + e_i)$.
Solution:
$$Var(u_i + e_i) = \sigma^2_u + \sigma^2_e + 2\sigma_{e,u}$$
20. (4 points): Assume now that $\sigma_{e,s} = \sigma_{e,u} = 0$ and assume also for simplicity that the regression error $u_i$ is
homoskedastic with variance $\sigma^2_u$, and the variance of the measurement error is constant and equal to $\sigma^2_e$. What
is the asymptotic variance of $\sqrt{n}(\hat\beta_1 - \beta_1)$? Interpret your results.
Solution: This is a VERY simple question, which does not require any lengthy calculation. First, note that
$Var(u_i + e_i) = \sigma^2_u + \sigma^2_e$ if the two errors are uncorrelated. Then note that with no measurement error and
homoskedasticity, the asymptotic distribution would be
$$\sqrt{n}(\hat\beta_1 - \beta_1) \xrightarrow{d} N\left(0, \frac{\sigma^2_u}{\sigma^2_s}\right)$$
This is proved starting from (again, when there is NO measurement error)
$$\hat\beta_1 = \beta_1 + \frac{\frac{1}{n}\sum_{i=1}^n (s_i - \bar s)\, u_i}{\frac{1}{n}\sum_{i=1}^n (s_i - \bar s)^2}$$
But in this case we have
$$\hat\beta_1 = \beta_1 + \frac{\frac{1}{n}\sum_{i=1}^n (s_i - \bar s)(u_i + e_i)}{\frac{1}{n}\sum_{i=1}^n (s_i - \bar s)^2}.$$
But then notice that the only difference between the usual case and this special case with measurement error is
that now the new error has become $u_i + e_i$. So, IF measurement error in the dependent variable is uncorrelated
with the regressor, $\hat\beta_1$ IS consistent, and the only consequence for our estimator is that it will end up having
more noise, because the variance increases. Formally, we will have
$$\sqrt{n}(\hat\beta_1 - \beta_1) \xrightarrow{d} N\left(0, \frac{\sigma^2_u + \sigma^2_e}{\sigma^2_s}\right)$$
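The two measurement-error results (inconsistency when the error is correlated with schooling, consistency with extra noise when it is not) can be illustrated with a small simulation. The Python sketch below is only an illustration under assumed values: the true coefficients, the variances, the normal distributions and the parameter `gamma` that ties the measurement error to schooling are all hypothetical choices, not quantities from the exam.

```python
import random

def ols_slope(x, y):
    """OLS slope: sum((x - xbar) * y) / sum((x - xbar)^2)."""
    x_bar = sum(x) / len(x)
    num = sum((xi - x_bar) * yi for xi, yi in zip(x, y))
    den = sum((xi - x_bar) ** 2 for xi in x)
    return num / den

random.seed(0)
n = 200_000
beta0, beta1 = 11.5, 0.05   # hypothetical true parameters
gamma = -0.02               # hypothetical: cov(e, s) = gamma * var(s)

s = [random.gauss(5.0, 3.0) for _ in range(n)]   # schooling (continuous here)
u = [random.gauss(0.0, 1.9) for _ in range(n)]   # regression error
hg = [beta0 + beta1 * si + ui for si, ui in zip(s, u)]

# Case 1: measurement error uncorrelated with schooling.
e_indep = [random.gauss(0.0, 1.0) for _ in range(n)]
# Case 2: measurement error negatively related to schooling.
e_corr = [gamma * (si - 5.0) + random.gauss(0.0, 1.0) for si in s]

slope_indep = ols_slope(s, [h + e for h, e in zip(hg, e_indep)])
slope_corr = ols_slope(s, [h + e for h, e in zip(hg, e_corr)])

print(f"uncorrelated error: {slope_indep:.4f} (true slope {beta1})")
print(f"correlated error:   {slope_corr:.4f} (plim {beta1 + gamma})")
```

In the second case the covariance between error and schooling divided by the variance of schooling equals `gamma`, so the slope should settle near the true slope plus `gamma`, in line with the plim formula of question 17.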
11.2 Midterm 2, Fall 2007
You have data from a random sample of 19,451 children aged zero to 3 years from rural India, and you have estimated
the following regressions, where the dependent variable is $\log weight_i$ (heteroskedasticity-robust standard errors in
parentheses)
                                                         Model (1)        Model (2)
$\hat\beta_1$ - $\log height_i$                          1.93 (.0108)     1.924 (.0110)
$\hat\beta_2$ - $FS_i$ (Father's years of schooling)     .002 (.0003)     .00076 (.0003)
$\hat\beta_3$ - $SLI_i$ (Standard of Living Index)                        .0017 (.0002)
$\hat\beta_4$ - constant                                 -6.18 (.0466)    -6.18 (.0466)
$R^2$                                                    0.7396           0.7407
The Standard of Living Index ($SLI_i$) is a measure of asset ownership, and is constructed in a way that households
with more assets have larger values of $SLI_i$.
1. (3 points): Interpret the estimated slope $\hat\beta_1$ (the coefficient corresponding to $\log height_i$) in Model (1).
This is a log-log model, so $\hat\beta_1$ is the elasticity of weight with respect to height (that is, a 1% increase in height
is associated with a 1.93% increase in weight)
2. (3 points): Using again Model (1), can you reject the null hypothesis $\beta_1 = 2$ using a 1% significance level?
The alternative is (by default) two-sided, so
$$t = \frac{1.93 - 2}{0.0108} = -6.48$$
so we definitely reject the null hypothesis using a 1% level, since $|t| > 2.57$.
3. (5 points): In Model (2), the asset index $SLI_i$ is added to the regression estimated in Model (1). Interpret
the change in $\hat\beta_2$ in terms of omitted variable bias.
We should certainly expect $SLI$ (wealth) to be positively correlated with father's schooling. We also know
that $\hat\beta_3 > 0$, so we should expect
$$sign(OVB) = \underbrace{sign\left(\hat\beta_3\right)}_{>0} \times \underbrace{sign(\sigma_{SLI,FS})}_{>0} > 0.$$
Then, the omission in Model (1) of $SLI$ should lead to an upward bias in $\hat\beta_2$, which is indeed what we see,
because when we do include $SLI$ in the regression $\hat\beta_2$ decreases from 0.002 to 0.00076 (and the t-ratio decreases
too).
Now let $STU_i$ be a binary variable equal to 1 if child $i$ is stunted (that is, his/her weight is low relative
to his/her height) and equal to zero otherwise. You use $STU_i$ as dependent variable and you estimate the
following Linear Probability Model (LPM), where heteroskedasticity-robust standard errors are in parentheses.
                            Robust
STU             Coef.    Std. Err.       t    P>|t|    [95% Conf. Interval]
Mtr_Illit    .0344392    .0073581     4.68    0.000     .0200168    .0488617
Ftr_Illit    .0145057    .0079548     1.82    0.068    -.0010864    .0300978
age          .0097866    .0012474     7.85    0.000     .0073416    .0122317
age2          -.00027    .0000348    -7.76    0.000    -.0003382   -.0002019
Female       .0117461    .0136431     0.86    0.389    -.0149955    .0384876
SLI         -.0045558     .000602    -7.57    0.000    -.0057357   -.0033759
FxSLI       -.0004805    .0008231    -0.58    0.559    -.0020939    .0011329
_cons        .2583318    .0143801    17.96    0.000     .2301454    .2865181
where $Mtr\_Illit$ and $Ftr\_Illit$ are binary variables indicating respectively whether the child's mother or father
is illiterate, $age$ is the child's age in months, $age2$ is its square, $Female$ is a dummy variable equal to one if child
$i$ is a girl and $FxSLI$ is $Female \times SLI$. In what follows, let also $\beta_x$ denote the slope corresponding to a given
regressor $x$ (so that, for instance, $\hat\beta_{age} = 0.0097866$).
4. (4 points) Interpret the estimated coefficients $\hat\beta_{Mtr\_Illit}$ and $\hat\beta_{Ftr\_Illit}$.
$\hat\beta_{Mtr\_Illit}$ indicates that having an illiterate mother increases the probability of stunting by 3.4 percentage
points, keeping everything else constant. Similarly, $\hat\beta_{Ftr\_Illit}$ indicates that having an illiterate father increases
the probability of stunting by 1.5 percentage points, keeping everything else constant.
5. (4 points) Do the results above suggest that father's and mother's illiteracy have a very different importance
for child nutritional status, as measured by stunting?
Based on the results from question 4, there does seem to be a large difference. Keeping everything else constant,
the impact of the mother's illiteracy on the predicted probability is more than twice as large as the impact of
the father's illiteracy. Given that we are talking about the probability of stunting for young children, a difference
in predicted probabilities of about 1.9 percentage points is quite large!
6. (5 points) Suppose that you want to test the null hypothesis that father's and mother's illiteracy affect equally
child nutritional status as measured by stunting. State clearly the null and the alternative hypothesis, and
state whether you can reject the null hypothesis, knowing that the value of the $F$ test statistic is 3.23.
$$H_0: \beta_{Mtr\_Illit} = \beta_{Ftr\_Illit}$$
$$H_1: \beta_{Mtr\_Illit} \neq \beta_{Ftr\_Illit}$$
This is a test with a single restriction (note that you cannot use a t-ratio test, because the problem does not tell
you what the estimated covariance between the two coefficients is). So we have to use an $F_{1,\infty}$ test. Looking
at the appropriate row in the tables (critical values 2.71, 3.84 and 6.63 at the 10%, 5% and 1% levels), we can see
that we cannot reject the null using a 5% or 1% significance level, while we can reject the null using a 10% level.
7. (3 points) Using a 1% significance level, can you reject the null hypothesis that the conditional expectation
of $STU_i$ is a linear function of $age_i$ keeping all other regressors constant?
The conditional expectation is linear in $age$ if $age^2$ does not enter into the regression. So we just have to look
at the p-value of $\hat\beta_{age2}$, which is approximately zero. Hence, we can definitely reject the null of linearity.
8. (5 points) Now you want to test the null hypothesis that the regression is the same for boys and girls. State
clearly the null and the alternative hypothesis, and state if you can reject the null, knowing that the value of
the $F$ test statistic is 0.44.
$$H_0: \beta_{Female} = \beta_{FxSLI} = 0$$
$$H_1: \beta_{Female} \neq 0 \text{ and/or } \beta_{FxSLI} \neq 0$$
This is a joint hypothesis with two restrictions, hence we have to use an $F_{2,\infty}$ test. Looking at the tables, we
see that we cannot reject the null hypothesis that the regression is the same for boys and girls.
9. (4 points) Calculate the probability of being stunted for a newborn boy ($age = 0$), born from parents
who have no assets ($SLI_i = 0$) and who are both illiterate.
$$.2583318 + .0344392 + .0145057 = .3072767$$
Now you re-estimate the same model as before, but using probit. The results are the following:
Probit regression                      Number of obs = 19451
                                       LR chi2(7)    = 295.13
                                       Prob > chi2   = 0.0000
Log likelihood = -11442.85             Pseudo R2     = 0.0127
STU             Coef.    Std. Err.       z    P>|z|    [95% Conf. Interval]
Mtr_Illit     .104285    .0226281     4.61    0.000     .0599346    .1486353
Ftr_Illit    .0389355    .0229746     1.69    0.090    -.0060938    .0839648
age          .0291546    .0037847     7.70    0.000     .0217368    .0365725
age2        -.0008046     .000105    -7.66    0.000    -.0010104   -.0005988
Female       .0347082    .0401499     0.86    0.387    -.0439842    .1134007
SLI         -.0140907    .0019095    -7.38    0.000    -.0178333   -.0103481
FxSLI       -.0014958    .0026327    -0.57    0.570    -.0066557    .0036642
_cons       -.6473693    .0447236   -14.47    0.000    -.7350259   -.5597127
10. (4 points) Re-calculate the probability of being stunted for a newborn boy ($age = 0$), born from parents
who have no assets ($SLI_i = 0$) and who are both illiterate. Is the result very different from the one you
estimated using LPM?
$$\Phi(-.6473693 + .104285 + .0389355) = \Phi(-.5041488) = .30707841,$$
which is almost identical to the one we obtained using the LPM!! As usual, the model we use does not seem
to matter much....
11. (4 points) Calculate the marginal effect of $SLI$ (that is, of an increase in the asset index), for a newborn boy
whose parents are both literate and whose $SLI_i$ is equal to 20. (note: writing down the correct expression for
the marginal effect is sufficient to get full credit in this question, you do not have to do the actual calculations)
Remember that the marginal effect in a non-linear regression model is the partial derivative of the regression
with respect to a given regressor, for given initial values of all regressors. Also, we saw in class that the
derivative of the cdf is the density, so
$$ME = \frac{\partial \Phi(.)}{\partial SLI_i} = \phi(-.6473693 - .0140907 \times 20) \times (-.0140907)$$
$$= (-.0140907)\, \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(-.6473693 - .0140907 \times 20)^2}$$
where the last step follows from using the formula for the density of a normal distribution. So
$$ME = -.00365$$
which is almost zero.
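The probit calculations in questions 10 and 11 only require the standard normal cdf and pdf, which can be written with the Python standard library. The sketch below uses the coefficients from the probit output; the helper names `norm_cdf` and `norm_pdf` are illustrative.

```python
import math

def norm_cdf(z):
    """Standard normal cdf, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

# Probit coefficients from the output above.
b_cons, b_mtr, b_ftr, b_sli = -0.6473693, 0.104285, 0.0389355, -0.0140907

# Question 10: newborn boy, SLI = 0, both parents illiterate.
index_q10 = b_cons + b_mtr + b_ftr
p_stunted = norm_cdf(index_q10)

# Question 11: newborn boy, both parents literate, SLI = 20.
index_q11 = b_cons + b_sli * 20
me_sli = norm_pdf(index_q11) * b_sli   # derivative of the cdf times the slope

print(f"P(stunted) = {p_stunted:.4f}")
print(f"marginal effect of SLI = {me_sli:.5f}")
```

The marginal effect depends on where the index is evaluated, which is why the question has to specify the values of all regressors.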
12. (5 points) You re-estimate the model above omitting $Ftr\_Illit$, $Female$ and $FxSLI$. The log-likelihood of
the new model is $-11444.731$. Can you reject the null hypothesis that $\beta_{Ftr\_Illit} = \beta_{Female} = \beta_{FxSLI} = 0$?
$$LR = 2\,[\ln L_U - \ln L_R] = 2\,[-11442.85 - (-11444.731)] = 3.762$$
There are three restrictions, so that we know that under the null hypothesis
$$LR = 2\,[\ln L_U - \ln L_R] \xrightarrow{d} \chi^2_3.$$
Looking at the table, it is clear that we cannot reject the null hypothesis (the critical value using a 10% level
is 6.25).
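The likelihood ratio statistic is just twice the gap between the two log-likelihoods, as the following Python sketch shows. The critical value is the 10% value for a chi-squared distribution with 3 degrees of freedom quoted in the solution; the variable names are illustrative.

```python
# Log-likelihoods of the unrestricted and restricted probit models.
ll_unrestricted = -11442.85
ll_restricted = -11444.731

# LR statistic: twice the difference in log-likelihoods.
lr_stat = 2 * (ll_unrestricted - ll_restricted)

n_restrictions = 3   # three coefficients set to zero under the null
crit_10pct = 6.25    # 10% critical value of chi-squared(3), from the solution
reject = lr_stat > crit_10pct

print(f"LR = {lr_stat:.3f}, reject at 10%: {reject}")
```

Since the restricted model can never fit better than the unrestricted one, the statistic is always non-negative.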
Now you want to evaluate the impact of a child nutrition supplement program on stunting. Let $D_i$ be a binary
variable equal to 1 if child $i$ participates in the program, and zero otherwise. Participation in the program is
voluntary, and decided by the children's parents. Suppose that you estimate the following model using OLS
(remember that $STU_i$ is a dummy equal to one if child $i$ is stunted):
$$STU_i = c_0 + c_1 D_i + u_i, \qquad (13)$$
where $E(u_i) = 0$.
13. (4 points) Do you think your estimate $\hat c_1$ could be interpreted in a causal sense, that is, will $\hat c_1$ measure the
causal impact of program participation on the probability of stunting? Explain.
Certainly not. Participation is voluntary, hence $D_i$ will most likely be correlated with other observable or
unobservable variables which are also likely to be important determinants of $STU_i$ (for instance how much
parents care about children's health, how far they live from the place where the program is offered, how well
they understand the importance of the program to improve child nutrition, etc.). Hence, omitted variable
bias is likely to be a problem.
Suppose now that program participation is still voluntary, but that you also make sure that a random subsample
of families is supplied ample information about the existence and the utility of the program. As a consequence,
you anticipate that such a random subsample of families will be relatively more likely to participate in the
program. Let $Z_i$ be a binary variable equal to one if child $i$ lives in a family that has been exposed to this
advertisement campaign about the program, and zero otherwise.
14. (4 points) Prove that $E(STU_i \mid Z_i) = c_0 + c_1 E(D_i \mid Z_i)$.
$$E(STU_i \mid Z_i) = E(c_0 + c_1 D_i + u_i \mid Z_i) = c_0 + c_1 E(D_i \mid Z_i) + E(u_i \mid Z_i)$$
The conclusion follows noting that the last term is zero, because $Z_i$ has been determined completely at
random, and hence will be uncorrelated with the error. Formally, $Z_i$ and $u_i$ are statistically independent, so
that $E(u_i \mid Z_i) = E(u_i) = 0$.
15. (3 points) Prove that $E(D_i Z_i) = P(D_i = 1, Z_i = 1)$. Justify your argument.
Both $D_i$ and $Z_i$ are binary variables, so their product will always be zero, unless both variables are equal to
one. Hence
$$E(D_i Z_i) = 0 + 1 \times P(D_i = 1, Z_i = 1) = P(D_i = 1, Z_i = 1)$$
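16. (5 points) Recalling that both $D_i$ and $Z_i$ are binary variables, prove that
$$\operatorname*{plim}_{n\to\infty}\left(\frac{\sum_{i=1}^n D_i Z_i}{\sum_{i=1}^n Z_i}\right) = E(D_i \mid Z_i = 1)$$
$$\operatorname*{plim}_{n\to\infty}\left(\frac{\sum_{i=1}^n D_i Z_i}{\sum_{i=1}^n Z_i}\right) = \operatorname*{plim}_{n\to\infty}\left(\frac{\frac{1}{n}\sum_{i=1}^n D_i Z_i}{\frac{1}{n}\sum_{i=1}^n Z_i}\right)$$
$$\text{(by Slutsky)} = \frac{\operatorname{plim} \frac{1}{n}\sum_{i=1}^n D_i Z_i}{\operatorname{plim} \frac{1}{n}\sum_{i=1}^n Z_i}$$
$$\text{(by LLN)} = \frac{E(D_i Z_i)}{E(Z_i)}$$
$$= \frac{P(D_i = 1, Z_i = 1)}{P(Z_i = 1)} \quad \text{(because } D_i \text{ and } Z_i \text{ are binary variables)}$$
$$= P(D_i = 1 \mid Z_i = 1) = E(D_i \mid Z_i = 1) \quad \text{(because } D_i \text{ is binary)}$$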
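17. (6 points) Recalling that both $STU_i$ and $Z_i$ are binary variables, prove that
$$\operatorname*{plim}_{n\to\infty}\left(\frac{\sum_{i=1}^n STU_i (1 - Z_i)}{\sum_{i=1}^n (1 - Z_i)}\right) = E(STU_i \mid Z_i = 0)$$
$$\operatorname*{plim}_{n\to\infty}\left(\frac{\sum_{i=1}^n STU_i (1 - Z_i)}{\sum_{i=1}^n (1 - Z_i)}\right) = \frac{\operatorname{plim} \frac{1}{n}\sum_{i=1}^n STU_i (1 - Z_i)}{\operatorname{plim} \frac{1}{n}\sum_{i=1}^n (1 - Z_i)} \quad \text{(by Slutsky)}$$
$$= \frac{E[STU_i (1 - Z_i)]}{E(1 - Z_i)} \quad \text{(by LLN)}$$
But $STU_i$ and $Z_i$ are both binary, so $STU_i (1 - Z_i)$ is binary too, and is equal to one only if both $STU_i = 1$
and $Z_i = 0$. Hence:
$$= \frac{P(STU_i = 1, 1 - Z_i = 1)}{P(1 - Z_i = 1)} = \frac{P(STU_i = 1, Z_i = 0)}{P(Z_i = 0)} = P(STU_i = 1 \mid Z_i = 0) = E(STU_i \mid Z_i = 0)$$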
18. (4 points) Based on the results from questions 14, 16 and 17, describe a consistent estimator for $c_1$, that
is, write down the estimator and prove that it is consistent. Justify your steps. (hint: start by writing down
the conditional expectation in 14 for $Z_i = 0$ and for $Z_i = 1$). This question is worth few points given its
difficulty, so plan accordingly.
From 14 we know that
$$E(STU_i \mid Z_i = 1) = c_0 + c_1 E(D_i \mid Z_i = 1)$$
$$E(STU_i \mid Z_i = 0) = c_0 + c_1 E(D_i \mid Z_i = 0)$$
So $c_1$ can be obtained through some simple manipulations. Specifically, if both the above expressions are true,
then the difference of the left-hand sides is equal to the difference of the right-hand sides. Hence
$$E(STU_i \mid Z_i = 1) - E(STU_i \mid Z_i = 0) = [c_0 + c_1 E(D_i \mid Z_i = 1)] - [c_0 + c_1 E(D_i \mid Z_i = 0)].$$
Solving for $c_1$:
$$c_1 = \frac{E(STU_i \mid Z_i = 1) - E(STU_i \mid Z_i = 0)}{E(D_i \mid Z_i = 1) - E(D_i \mid Z_i = 0)}$$
Then a consistent estimator can be obtained using the sample analogue!
$$\hat c_1 = \frac{\hat E(STU_i \mid Z_i = 1) - \hat E(STU_i \mid Z_i = 0)}{\hat E(D_i \mid Z_i = 1) - \hat E(D_i \mid Z_i = 0)}$$
From the responses to 16 and 17 we already know how to consistently estimate each term (the same argument applies with $STU_i$ in place of $D_i$, and with $Z_i$ in place of $1 - Z_i$):
$$\hat c_1 = \frac{\overbrace{\frac{\sum_{i=1}^n STU_i Z_i}{\sum_{i=1}^n Z_i}}^{\xrightarrow{p}\ E(STU_i \mid Z_i = 1)} - \overbrace{\frac{\sum_{i=1}^n STU_i (1 - Z_i)}{\sum_{i=1}^n (1 - Z_i)}}^{\xrightarrow{p}\ E(STU_i \mid Z_i = 0)}}{\underbrace{\frac{\sum_{i=1}^n D_i Z_i}{\sum_{i=1}^n Z_i}}_{\xrightarrow{p}\ E(D_i \mid Z_i = 1)} - \underbrace{\frac{\sum_{i=1}^n D_i (1 - Z_i)}{\sum_{i=1}^n (1 - Z_i)}}_{\xrightarrow{p}\ E(D_i \mid Z_i = 0)}} \xrightarrow{p} \frac{E(STU_i \mid Z_i = 1) - E(STU_i \mid Z_i = 0)}{E(D_i \mid Z_i = 1) - E(D_i \mid Z_i = 0)} = c_1$$
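The estimator derived in question 18 is a ratio of differences in group means, which is easy to compute directly. The Python sketch below is a minimal illustration: the function name `wald_estimator`, the argument names and the eight-observation toy data set are all hypothetical and chosen only so the arithmetic can be followed by hand.

```python
def wald_estimator(outcome, treated, encouraged):
    """Ratio of the difference in mean outcomes to the difference in mean
    participation, comparing encouraged (1) and non-encouraged (0) units."""
    n1 = sum(encouraged)
    n0 = len(encouraged) - n1
    mean_y1 = sum(y * z for y, z in zip(outcome, encouraged)) / n1
    mean_y0 = sum(y * (1 - z) for y, z in zip(outcome, encouraged)) / n0
    mean_d1 = sum(d * z for d, z in zip(treated, encouraged)) / n1
    mean_d0 = sum(d * (1 - z) for d, z in zip(treated, encouraged)) / n0
    return (mean_y1 - mean_y0) / (mean_d1 - mean_d0)

# Hypothetical toy data: 8 children, first 4 exposed to the campaign.
encouraged = [1, 1, 1, 1, 0, 0, 0, 0]
treated    = [1, 1, 1, 0, 1, 0, 0, 0]   # participation: 3/4 vs 1/4
stunted    = [0, 0, 1, 1, 0, 1, 1, 1]   # stunting rate: 2/4 vs 3/4

c1_hat = wald_estimator(stunted, treated, encouraged)
print(f"estimated effect of participation on stunting: {c1_hat:.2f}")
```

In the toy data the campaign lowers the stunting rate by 0.25 and raises participation by 0.5, so the ratio is -0.5. The estimator breaks down if the campaign does not change participation, because the denominator is then zero.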
11.3 Final, Fall 2006
1. You have collected information on seat belt use and other traffic-related variables from the 50 U.S. States plus
the District of Columbia, for the year 1993. Let $f_i$ be the number of fatalities per million of traffic miles in
state $i$, and let $SBL_i$ be a binary variable equal to one if state $i$ enforces seat belt laws. You estimate the
following regression (heteroskedasticity-robust standard errors in parentheses):
$$\hat f_i = \underset{(.00091)}{.0169} + \underset{(.0012)}{.0018}\, SBL_i \qquad (14)$$
(a) (4 points) Determine if $SBL_i$ is statistically significant, using a 10% level.
$$t = \frac{.0018}{.0012} = 1.5 < 1.645$$
so this is not significant at the 10% level.
(b) (4 points) Now you re-estimate the model adding, as a regressor, the logarithm of per capita income
(\(\log inc_i\)). The result is the following:
\[ \hat{f}_i = \underset{(.0337)}{.0181} + \underset{(.0009)}{.0016}\, SENF_i - \underset{(.0034)}{.0017}\, \log inc_i. \qquad (15) \]
Interpret the coefficient for \(\log inc_i\).
A one percent increase in income is associated with a decrease in the fatality rate equal to \((0.01) \times 0.0017 = 0.000017\)
deaths per million traffic miles.
(c) (5 points) What does the comparison of the results in (14) and (15) suggest about the correlation
between \(\log inc_i\) and \(SENF_i\)? Should it be positive or negative? Explain.
Here we observe that adding log(income) to the regression leads to a (small) decrease in the coefficient
for \(SENF\), from .0018 to .0016. Hence, the coefficient in (14) appears to be upward biased. Because the sign of the coefficient
for log(income) is negative, an upward bias results if the correlation (or covariance) between \(\log inc_i\) and
\(SENF_i\) is negative as well. The negative correlation is also confirmed by the following logit and probit
estimates.
Now you estimate a regression of \(SENF_i\) on \(\log inc_i\) using logit and probit, and you obtain the following
results (standard errors in parentheses):

                    logit        probit
log inc            -.3955       -.2345
                   (2.00)       (1.20)
constant            4.8          2.9
                   (19.8)       (11.9)
log-likelihood    -30.8761     -30.8765
(d) (5 points) Using logit, estimate the difference in the estimated probability of having seat belt enforcement
between a state with income per head equal to $18,000 and one with income per head equal to $25,000.
\[ \frac{1}{1 + e^{-(4.8 - .3955 \ln(25000))}} - \frac{1}{1 + e^{-(4.8 - .3955 \ln(18000))}} = -0.027141 \]
(e) (5 points) Estimate the same difference using probit. Is the result very different from the one obtained
with logit? Is this what you expected? Explain.
\[ \Phi\left(2.9 - .2345 \ln(25000)\right) - \Phi\left(2.9 - .2345 \ln(18000)\right) = -.02621. \]
As expected, logit and probit give very similar results for the estimated change in the predicted probability.
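The two differences in (d) and (e) can be checked numerically. The following is a minimal Python sketch, assuming only the coefficients reported in the logit/probit table; the function and variable names are illustrative and do not come from the original problem set.

```python
import math

def logistic(z):
    """Logistic CDF used by the logit model."""
    return 1.0 / (1.0 + math.exp(-z))

def std_normal_cdf(z):
    """Standard normal CDF used by the probit model (via the error function)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# (constant, slope on log income), copied from the table of estimates
LOGIT = (4.8, -0.3955)
PROBIT = (2.9, -0.2345)

def index(coefs, income):
    """Linear index: constant + slope * ln(income)."""
    const, slope = coefs
    return const + slope * math.log(income)

logit_diff = logistic(index(LOGIT, 25000)) - logistic(index(LOGIT, 18000))
probit_diff = (std_normal_cdf(index(PROBIT, 25000))
               - std_normal_cdf(index(PROBIT, 18000)))
print(round(logit_diff, 4), round(probit_diff, 4))
```

Both differences come out negative and close to each other, even though the underlying coefficients differ by a wide margin.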
(f) (5 points) The coefficients estimated using logit and probit are very different. However, the two
log-likelihoods are almost identical. Is this surprising, or was it to be expected? Explain.
As pointed out in the previous point, logit and probit usually produce very similar results when it comes
to the predicted probabilities. Because the log-likelihood is a function of such predicted probabilities, the
two log-likelihoods will usually be very similar.
Now you want to explore the topic further, and you collect data for all 50 states + DC for all years
between 1983 and 1997. First, you estimate equation (15) with OLS again, but adding time dummies.
The results are as follows (the time dummies are omitted for brevity).
\[ \hat{f}_{it} = \underset{(.011)}{.195} + \underset{(.0003)}{.00009}\, SENF_{it} - \underset{(.0011)}{.0018}\, \log inc_{it} + TimeFixedEffects. \qquad (16) \]
(g) (4 points) How many time fixed effects should be included in regression (16)? Explain.
You should include 97 - 83 = 14 time fixed effects. There are 15 years in our dataset, but because we are also
estimating a constant, we need to omit one year dummy.
(h) (4 points) You test the null hypothesis that the time fixed effects are not statistically significant. The
value of the F-test is 1.91. What do you conclude?
There are fourteen time fixed effects, so we have to compare 1.91 with the critical value of an \(F_{14,\infty}\)
distribution. The critical values are 1.5 (10%), 1.69 (5%) and 2.08 (1%), so we cannot reject the null
using a 1% significance level, but we reject using a 10% or a 5% level.
(i) (5 points) Using the results in (16), calculate the p-value of the test of statistical significance of \(\beta_{SENF_{it}}\).
This is a two-sided test, so the p-value is
\[ 2\,\Phi\left(-\frac{.00009}{.0003}\right) = 2\,\Phi(-0.3) = .7642 \]
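As a quick check of the p-value above, here is a small Python sketch of a two-sided p-value under the large-sample normal approximation. The helper names are hypothetical; only the estimate and standard error are taken from equation (16).

```python
import math

def std_normal_cdf(z):
    """Standard normal CDF written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_sided_p_value(estimate, std_error, null_value=0.0):
    """p-value of H0: coefficient = null_value against a two-sided alternative."""
    t_stat = (estimate - null_value) / std_error
    return 2.0 * std_normal_cdf(-abs(t_stat))

p_value = two_sided_p_value(0.00009, 0.0003)
print(round(p_value, 4))
```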
(j) (6 points) Because you suspect that \(SENF_{it}\) is endogenous, you would like to use IV to estimate
consistently its impact on fatality rates. A fellow researcher suggests using, as an instrument, another
binary variable (\(PENF_{it}\)) equal to one if state \(i\) has primary enforcement in place at time \(t\) (primary
enforcement gives police officers more power in enforcing seat belt laws). Do you think this
instrument is likely to be valid? Explain.
This is probably a bad idea. Surely, \(PENF\) is likely to be relevant. Factors that are associated with
the probability of having secondary enforcement are also likely to be associated with having primary
enforcement, so the two variables are likely to be strongly related (positively or negatively, depending
on whether \(SENF\) and \(PENF\) are complements or substitutes from a policy perspective).
However, if we suspect that there are unobserved factors (such as road conditions, political views in the
state etc.) that are related to \(SENF\) AND to \(f\) through channels different from \(SENF\) (so that
\(SENF\) is correlated with the error term and hence endogenous), it is not clear why one should think
that the same factors will not also be correlated with \(PENF\). So, \(PENF\) is likely to be relevant but not
exogenous, hence not a valid instrument.
(k) (6 points) You want to go ahead with the idea of using \(PENF_{it}\) as an instrument, and you decide
to also use \(age_{it}\) (mean age of drivers in state \(i\) in year \(t\)) as a further instrumental variable in estimating
equation (16). Explain how you should carry out the first stage of the 2SLS procedure.
You should regress the endogenous variable (\(SENF\)) on all instruments (\(PENF\) and \(age\)) and all the
exogenous variables included in (16), that is, the constant, \(\log inc_{it}\), and the time fixed effects.
(l) (6 points) You estimate the first stage of 2SLS, and you want to check if the instruments are likely to
be weak. The F-test is 382.11. Explain what the null hypothesis of this F-test is, and what you conclude
in terms of instrument weakness.
The null hypothesis is that all coefficients that multiply the instruments are equal to zero. We conclude
that the instruments are weak if the F-statistic for this joint test of statistical significance is below 10. Here,
the statistic is clearly above the threshold, so we conclude (as expected) that the instruments are relevant,
and not weak.
You estimate the second stage of 2SLS, and this is what you get:
\[ \hat{f}_{it} = \underset{(.010)}{.195} - \underset{(.00047)}{.0007}\, SENF_{it} - \underset{(.0011)}{.0018}\, \log inc_{it} + TimeFixedEffects. \qquad (17) \]
(m) (5 points) Now the coefficient for \(SENF_{it}\) has turned negative. Calculate the power of a 5% test of
statistical significance of this coefficient, when the true value of the coefficient is zero.
The power is the probability of rejecting \(H_0: \beta_{SENF} = 0\) when the truth is that \(\beta_{SENF}\) equals some given value. But
if the true coefficient is zero, then that value is 0, so this is just the probability of rejecting the null when
the null is true! If we use a 5% significance level, this probability is .05 by construction.
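(n) (6 points) Calculate the power of a 5% test of statistical significance of this coefficient, when the true
value of the coefficient is -0.0005.
\[
\begin{aligned}
power_{\alpha=0.05}(-0.0005) &= 1 - \Pr\left(\text{do not reject null at 5\%} \;\middle|\; \beta_{SENF} = -0.0005\right) \\
&= 1 - \Pr\left(-1.96 \le \frac{\hat{\beta}_{SENF} - 0}{.00047} \le 1.96 \;\middle|\; \beta_{SENF} = -0.0005\right) \\
&= 1 - \Pr\left(-1.96 - \frac{-0.0005}{.00047} \le \frac{\hat{\beta}_{SENF} - (-0.0005)}{.00047} \le 1.96 - \frac{-0.0005}{.00047} \;\middle|\; \beta_{SENF} = -0.0005\right) \\
&= 1 - \Pr\Big(-.89617021 \le \underbrace{\frac{\hat{\beta}_{SENF} - (-0.0005)}{.00047}}_{\sim N(0,1)} \le 3.0238298 \;\Big|\; \beta_{SENF} = -0.0005\Big) \\
&= 1 - \Phi(3.0238298) + \Phi(-.89617021) = .18632892
\end{aligned}
\]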
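The power calculations in (m) and (n) follow the same recipe, which can be sketched in Python as below. This is only an illustration under the normal approximation used in the solution; the function names are made up for the example, and the standard error .00047 is the one reported in equation (17).

```python
import math

def std_normal_cdf(z):
    """Standard normal CDF written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def power_two_sided(true_value, std_error, null_value=0.0, critical=1.96):
    """Probability of rejecting H0: beta = null_value when beta = true_value."""
    shift = (true_value - null_value) / std_error
    # Non-rejection region rewritten for Z = (beta_hat - true_value) / se ~ N(0, 1)
    lower = -critical - shift
    upper = critical - shift
    return 1.0 - (std_normal_cdf(upper) - std_normal_cdf(lower))

size = power_two_sided(0.0, 0.00047)        # part (m): power at the null equals the size
power = power_two_sided(-0.0005, 0.00047)   # part (n)
print(round(size, 4), round(power, 4))
```

Because the normal distribution is symmetric, the power at -0.0005 and at +0.0005 is the same.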
(o) (6 points) You still suspect that your instruments may not be exogenous, and because you have more
instruments than necessary, you can perform a test of overidentifying restrictions. The value of the test
is 7.07. Explain how this test should be carried out, and what you conclude about the validity of your
instruments.
First, after estimating the model with 2SLS, you should calculate the residuals as
\[ \hat{u}_{it} = f_{it} - .195 + .0007\, SENF_{it} + .0018\, \log inc_{it} - TimeFixedEffects. \]
Then you should regress \(\hat{u}_{it}\) on all instruments, \(\log inc_{it}\) and the \(TimeFixedEffects\), and finally you
should test the joint null that the coefficients on both instruments (\(PENF\) and \(age\)) are equal to zero. Here the value
of the test is 7.07, which has to be compared with the critical value of a \(\chi^2_1\) distribution (there is one
overidentifying restriction). Clearly, we reject at any standard significance level. This suggests that at
least one of the two instruments is not valid, and hence our results are probably not that meaningful, as
expected from the poor choice of instruments.
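To see how far into the tail 7.07 falls, one can compute the p-value of a chi-square statistic with one degree of freedom. The sketch below relies on the fact that a chi-square with 1 df is the square of a standard normal; the function name is an illustrative choice.

```python
import math

def chi2_sf_1df(x):
    """P(chi-square with 1 df > x), which equals P(|Z| > sqrt(x)) for standard normal Z."""
    return math.erfc(math.sqrt(x / 2.0))

p_value = chi2_sf_1df(7.07)
print(round(p_value, 4))
```

The p-value is below .01, consistent with rejecting at the 10%, 5% and 1% levels.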
Now you finally turn to exploiting the panel structure of your data, and you estimate the usual equation
using Fixed Effects (FE) and Random Effects (RE). The results are as follows (year dummies are included
but omitted from the table for brevity):

                        FE          RE
SENF                  -.00005     -.00019
                      (.0003)     (.0003)
log inc                .019        .0037
                      (.0026)     (.0021)
Time Fixed Effects     Yes         Yes
Constant               No          Yes
(p) (4 points) Using the FE results, what is the predicted decrease in the number of fatalities associated
with a state having seat belt enforcement?
Seat belt enforcement is associated with a decline in fatalities equal to .00005 deaths per million
miles.
(q) (4 points) The results using the FE and RE models are quite different. You perform a Hausman test,
and the result of the test is 104.18. What do you conclude?
The value of the test is very large. There are 16 degrees of freedom (14 time dummies + the two regressors in
the table), and the test is asymptotically distributed as a \(\chi^2_{16}\). The critical value with a 1% significance
level is 32, so we reject the null that the state fixed effects are uncorrelated with the included regressors
at any standard significance level. We conclude that RE is inconsistent, and so we should use FE (with
TWO grains of salt: it is not clear that using FE will solve all endogeneity problems...)
2. You wish to estimate the impact of mothers' smoking on child birth weight. Low birth weight is often associated
with poorer outcomes for the child, both in infancy and later in life. You know that the true model is the
following:
\[ \log(birthw)_i = \beta_{0i} + \beta_{1i}\, packsday_i + u_i, \qquad (18) \]
where notice that there is heterogeneity in the coefficients, which can therefore be treated as random variables
themselves. You can also assume that the coefficients are independent of all the other variables. You would
like to estimate the Average Treatment Effect (ATE), that is, \(E[\beta_{1i}]\).
(a) (6 points) Suppose for the moment that \(E[u_i \mid packsday_i] = 0\), and assume also that there are two
types of mothers: half of the population is composed of Type I mothers, for whom \(\beta_{1i} = -0.12\), while
the other half of the population is composed of Type II women, for whom \(\beta_{1i} = -0.02\). If you estimate
equation (18) with OLS using i.i.d. data, what is the probability limit of \(\hat{\beta}_1\)?
We proved in class that if the regressor is exogenous, then
\[ \hat{\beta}_1^{OLS} \xrightarrow{p} E[\beta_{1i}] \]
so that OLS estimates the ATE consistently. In our case, because the two types occur with
the same probability, we have
\[ \hat{\beta}_1^{OLS} \xrightarrow{p} E[\beta_{1i}] = .5\,(-.12) + .5\,(-.02) = -0.07 \]
In reality, \(E[u_i \mid packsday_i] \neq 0\), as there are several omitted factors that are likely to lead to a negative
correlation between \(birthw_i\) and \(packsday_i\). Hence, you resort to using IV, using cigarette prices (\(p_i\)) as
an instrument. You can assume that \(p_i\) is a valid instrument, in the sense that it is both relevant and
exogenous. However, there is also heterogeneity in the first stage of IV, which is then as follows:
\[ packsday_i = \pi_{0i} + \pi_{1i}\, p_i + v_i, \qquad (19) \]
where you know that \(\pi_{1i} = -0.05\) for Type I women, but \(\pi_{1i} = 0\) for Type II women.
(b) (6 points) If you now estimate equation (18) with IV using i.i.d. data and \(p_i\) as an instrument, what is
the probability limit of \(\hat{\beta}_1^{IV}\)?
No, \(\hat{\beta}_1^{IV}\) is not a consistent estimator for the ATE. We know that in fact
\[ \hat{\beta}_1^{IV} \xrightarrow{p} E\left[\beta_{1i}\, \frac{\pi_{1i}}{E(\pi_{1i})}\right] \]
In this case
\[
\begin{aligned}
E\left[\beta_{1i}\, \frac{\pi_{1i}}{E(\pi_{1i})}\right] &= .5\,(-.12)\left[\frac{-0.05}{E(\pi_{1i})}\right] + .5\,(-.02)\left[\frac{0}{E(\pi_{1i})}\right] \\
&= .5\,(-0.05)\,(-.12)\, \frac{1}{(.5)(-0.05) + (0.5)\,0} = -.12
\end{aligned}
\]
(c) (5 points) Is \(\hat{\beta}_1^{IV}\) a consistent estimator for the ATE? Briefly interpret your results.
Because Type II women's smoking habits are not at all affected by cigarette prices, their beta
will receive zero weight. Hence, IV will converge in probability to the beta of women of Type I only,
hence severely overestimating the potential harm of smoking on birth weight.
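The contrast between the OLS and IV probability limits can be made concrete with a short Python sketch. It assumes the two-type population described above, with effects taken as -0.12 and -0.02 (harmful, so negative: an assumption about signs lost in the text) and first-stage slopes of -0.05 and 0; the dictionary keys are illustrative names only.

```python
# Two equally likely types; beta1 is the effect of smoking, pi1 the first-stage slope.
# The negative signs are an assumption (smoking lowers birth weight, prices lower smoking).
types = [
    {"share": 0.5, "beta1": -0.12, "pi1": -0.05},  # Type I: responds to prices
    {"share": 0.5, "beta1": -0.02, "pi1": 0.0},    # Type II: does not respond
]

# ATE = E[beta_1i]: what OLS converges to when the regressor is exogenous
ate = sum(t["share"] * t["beta1"] for t in types)

# IV limit = E[beta_1i * pi_1i] / E[pi_1i]: a weighted average with weights pi_1i / E[pi_1i]
mean_pi = sum(t["share"] * t["pi1"] for t in types)
iv_plim = sum(t["share"] * t["beta1"] * t["pi1"] for t in types) / mean_pi

print(round(ate, 4), round(iv_plim, 4))
```

The IV limit coincides with the Type I effect because Type II mothers get zero weight, whatever the sign of the Type I first-stage slope.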
3. You have an i.i.d. sample of 1000 observations on yearly income \(y_1, ..., y_{1000}\). Let \(z\) be a poverty line, so
that an individual is categorized as poor if her/his yearly income is below \(z\). Let \(p_i\) denote a binary variable
equal to one if an individual is poor, that is, if \(y_i < z\). Let \(H\) denote the poverty head count ratio, that is,
the proportion of individuals who are poor in the population. You want to estimate \(H = P(y_i < z)\). You can
assume that all the usual regularity conditions hold, so that you can apply LLN and CLT when appropriate.
(a) (4 points) Prove that \(H = E(p_i)\).
\[ E(p_i) = 1 \cdot \Pr(y_i < z) + 0 \cdot \Pr(y_i \ge z) = \Pr(y_i < z) = H \]
Note that
\[ E(p_i) \neq \frac{1}{n} \sum_{i=1}^{n} p_i \]
Sample mean and expectation are not the same thing!!!
(b) (6 points) Write down the log-likelihood of \((p_1, p_2, ..., p_{1000})\).
This is really easy once you realize that this is the usual, standard log-likelihood of an i.i.d. sample from
a Bernoulli distribution with parameter \(H\). Then:
\[ \ln L(H) = \sum_{i=1}^{1000} p_i \ln H + \left(1000 - \sum_{i=1}^{1000} p_i\right) \ln(1 - H) \]
(c) (6 points) What is the maximum likelihood estimator of the poverty headcount ratio \(H\)? To get full
credit, you can either prove your answer by maximizing the likelihood, or refer clearly to results we have
seen in class.
As we saw in class, the MLE of the single parameter that identifies a Bernoulli distribution, with an i.i.d.
sample, is just the sample mean, so that:
\[ \hat{H} = \frac{1}{1000} \sum_{i=1}^{1000} p_i \]
This, of course, could also be derived by solving the maximization problem
\[
\begin{aligned}
& \max_{H} \ln L(H) = \max_{H} \left[ \sum_{i=1}^{1000} p_i \ln H + \left(1000 - \sum_{i=1}^{1000} p_i\right) \ln(1 - H) \right] \\
\Rightarrow\; & 0 = \left(\sum_{i=1}^{1000} p_i\right) \frac{1}{\hat{H}} + \left(1000 - \sum_{i=1}^{1000} p_i\right) \frac{1}{1 - \hat{H}}\,(-1) \\
\Rightarrow\; & \hat{H} = \frac{1}{1000} \sum_{i=1}^{1000} p_i
\end{aligned}
\]
(d) (6 points) Suppose that 10% of individuals in your sample are poor. Using a 10% significance level, can
you reject the null hypothesis that \(H = .15\) using a likelihood ratio test?
\[
\begin{aligned}
\ln L_{Unrestricted} &= \sum_{i=1}^{1000} p_i \ln(.1) + \left(1000 - \sum_{i=1}^{1000} p_i\right) \ln(1 - .1) \\
&= 100 \ln(.1) + 900 \ln(.9) = -325.08 \\
\ln L_{Restricted} &= \sum_{i=1}^{1000} p_i \ln(.15) + \left(1000 - \sum_{i=1}^{1000} p_i\right) \ln(1 - .15) \\
&= 100 \ln(.15) + 900 \ln(.85) = -335.98
\end{aligned}
\]
so that
\[ LR = 2\left[\ln L_{Unrestricted} - \ln L_{Restricted}\right] = 2\left[-325.08 + 335.98\right] = 21.8 > 2.71 \]
This LR test has an asymptotic \(\chi^2_1\) distribution, as here we have only one restriction. So we
definitely reject the null at any standard significance level, including 10% (the critical value for 10%
is 2.71). The most common mistake here has been to confuse the value of the parameter with the
value of the likelihood.
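The likelihood ratio calculation can be reproduced in a few lines of Python. This is a sketch with hypothetical function names; the inputs (1000 observations, 100 of them poor, null value .15) are the ones given in the question.

```python
import math

def bernoulli_loglik(h, successes, n):
    """Log-likelihood of an i.i.d. Bernoulli(h) sample with the given number of ones."""
    return successes * math.log(h) + (n - successes) * math.log(1.0 - h)

n, poor = 1000, 100
loglik_unrestricted = bernoulli_loglik(poor / n, poor, n)  # evaluated at the MLE, .10
loglik_restricted = bernoulli_loglik(0.15, poor, n)        # evaluated at the null, .15
lr_stat = 2.0 * (loglik_unrestricted - loglik_restricted)

print(round(loglik_unrestricted, 2), round(loglik_restricted, 2), round(lr_stat, 2))
```

Note that both log-likelihoods are evaluated on the same data; only the parameter value changes.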
(e) (6 points) Suppose now that you are estimating the proportion of individuals who are poor using the
sample mean, that is
\[ \hat{H} = \bar{p} = \frac{1}{n} \sum_{i=1}^{n} p_i, \]
where in this case \(n = 1000\), and \(\bar{p} = .10\). Calculate a 95% confidence interval for \(H\).
Because this is a sample from a Bernoulli distribution, \(Var(\hat{H}) = \frac{(1 - \hat{H})\hat{H}}{1000} = \frac{(.1)(.9)}{1000}\), so the confidence
interval is
\[ \left[ .10 - 1.96 \sqrt{\frac{(.1)(.9)}{1000}}, \;\; .10 + 1.96 \sqrt{\frac{(.1)(.9)}{1000}} \right] = \left[ 0.081406, \;\; 0.11859 \right] \]
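The interval above can be verified with a minimal Python sketch of the usual large-sample confidence interval for a proportion. The function name is an illustrative choice, and 1.96 is the 97.5th percentile of the standard normal used in the solution.

```python
import math

def proportion_ci(p_hat, n, critical=1.96):
    """Large-sample confidence interval for a Bernoulli proportion."""
    std_error = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - critical * std_error, p_hat + critical * std_error

lower, upper = proportion_ci(0.10, 1000)
print(round(lower, 6), round(upper, 6))
```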

(f) (5 points) Is \(\hat{H}\) a consistent estimator for the true headcount poverty ratio \(H\)? Prove your answer.
It is certainly consistent, as it is just a sample mean of i.i.d. random variables. The sample mean, under the usual
regularity conditions, will converge in probability to the expectation of an arbitrary term, so that:
\[ \hat{H} = \bar{p} = \frac{1}{n} \sum_{i=1}^{n} p_i \xrightarrow{p} E(p_i) = H \]
(g) (4 points) Now you want to check if \(\hat{H}\) is asymptotically normally distributed. Let \(p_i = H + u_i\), where
\(H\) is the true value of the headcount ratio, and \(u_i\) is a residual. First prove that \(E(u_i) = 0\).
Just using the definition of \(u_i\):
\[ E(u_i) = E(p_i - H) = E(p_i) - H = H - H = 0 \]
(h) (4 points) Prove that \(Var(u_i) = H(1 - H)\).
It is sufficient to notice that \(u_i = p_i - H\) differs from \(p_i\) only by a constant, so its variance is just the variance of a Bernoulli random variable with parameter \(H\).
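(i) (4 points) Let \(\bar{u}\) be the sample mean of \(u_i\). Prove that
\[ \sqrt{n}\left(\hat{H} - H\right) = \frac{\bar{u}}{\sqrt{\frac{H(1-H)}{n}}} \sqrt{H(1-H)} \]
This follows from a few manipulations:
\[
\begin{aligned}
\hat{H} = \frac{1}{n} \sum_{i=1}^{n} p_i = \frac{1}{n} \sum_{i=1}^{n} (H + u_i) = H + \frac{1}{n} \sum_{i=1}^{n} u_i
\;&\Rightarrow\; \hat{H} - H = \frac{1}{n} \sum_{i=1}^{n} u_i \\
&\Rightarrow\; \left(\hat{H} - H\right) = \bar{u} \\
&\Rightarrow\; \sqrt{n}\left(\hat{H} - H\right) = \sqrt{n}\,\bar{u} = \sqrt{n}\,\bar{u}\, \frac{\sqrt{H(1-H)}}{\sqrt{H(1-H)}} = \frac{\bar{u}}{\sqrt{\frac{H(1-H)}{n}}} \sqrt{H(1-H)}
\end{aligned}
\]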
(j) (7 points) Using the Central Limit Theorem (CLT), prove that
\[ \sqrt{n}\left(\hat{H} - H\right) \xrightarrow{d} N\left(0, H(1-H)\right) \]
This is just as in Midterm 2. First, because we have i.i.d. random variables, \(E(u_i) = 0\), and
\(\sqrt{H(1-H)/n}\) is the standard deviation of \(\bar{u}\), we have
\[ \frac{\bar{u}}{\sqrt{\frac{H(1-H)}{n}}} = \frac{\bar{u} - 0}{\sqrt{\frac{H(1-H)}{n}}} \xrightarrow{d} N(0, 1). \]
But \(\sqrt{n}\left(\hat{H} - H\right)\) is just \(\frac{\bar{u}}{\sqrt{H(1-H)/n}}\) multiplied by a constant equal to \(\sqrt{H(1-H)}\), so its asymptotic
distribution will be
\[ N\left(0, H(1-H)\right) \]
(k) (4 points) Using the previous results, how would you construct a 95% confidence interval for the
headcount ratio \(H\)?
Analogously to what we saw in Midterm 2:
\[ \hat{H} \pm 1.96\, \frac{\sqrt{\hat{H}\left(1 - \hat{H}\right)}}{\sqrt{n}} \]
Now you would like to study the relation between poverty and voting behavior. You are using data from a
country with two political parties, \(A\) and \(B\). You would like to have an estimate of the poverty head count
ratio among voters of party \(A\), that is, you would like to know \(H_A = P(y_i \le z \mid A_i = 1) = E(p_i \mid A_i = 1)\),
where \(y\) is income, \(z\) is the poverty line, and \(A_i\) is a binary variable equal to one if individual \(i\) votes for party \(A\),
and equal to zero if individual \(i\) votes for party \(B\). Let \(p_A\), \(p_B\) be the proportion of individuals who vote
for party \(A\) and \(B\) respectively. Similarly, let \(H_B = P(y_i \le z \mid A_i = 0)\) denote the poverty ratio among
voters of party \(B\). Suppose that you have a dataset which includes \(n\) i.i.d. observations \((p_i, A_i)\).
(l) (6 points) Prove that \(E[p_i A_i] = H_A\, p_A\).
Both \(p\) and \(A\) are binary variables, so the only instance where their product is non-zero is if both are
equal to 1. Then
\[ E[p_i A_i] = \Pr(p_i = 1, A_i = 1) = \underbrace{\Pr(p_i = 1 \mid A_i = 1)}_{H_A}\; \underbrace{\Pr(A_i = 1)}_{p_A} = H_A\, p_A \]
(m) (5 points) Is the following estimator unbiased for \(H_A\)?
\[ \hat{H}_A = \frac{\sum_{i=1}^{n} p_i A_i}{\sum_{i=1}^{n} A_i}. \qquad (20) \]
This estimator is NOT unbiased. To be unbiased, we would need \(E\left(\hat{H}_A\right) = H_A\); however, the expectation
of a ratio of two random variables is NOT equal to the ratio of the expectations of the
two random variables, so that
\[ E\left(\hat{H}_A\right) = E\left(\frac{\sum_{i=1}^{n} p_i A_i}{\sum_{i=1}^{n} A_i}\right) \neq \frac{E\left(\sum_{i=1}^{n} p_i A_i\right)}{E\left(\sum_{i=1}^{n} A_i\right)} = H_A \]
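(n) (6 points) Is \(\hat{H}_A\) a consistent estimator of \(H_A\)?
It is consistent. Observations are i.i.d. and we assume that the usual regularity conditions hold, so
\[ \hat{H}_A = \frac{\overbrace{\frac{1}{n} \sum_{i=1}^{n} p_i A_i}^{\xrightarrow{p}\; E(p_i A_i) \,=\, H_A p_A}}{\underbrace{\frac{1}{n} \sum_{i=1}^{n} A_i}_{\xrightarrow{p}\; E(A_i) \,=\, p_A}} \xrightarrow{p} \frac{H_A\, p_A}{p_A} = H_A \]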
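Consistency of the ratio estimator in (20) can also be illustrated by simulation. The sketch below is only an illustration: the population values (\(H_A = .20\), \(H_B = .05\), \(p_A = .40\)), the seed and the function name are all hypothetical choices made for the example, not values from the problem.

```python
import random

def simulate_ratio_estimator(n, h_a=0.20, h_b=0.05, p_a=0.40, seed=0):
    """Draw n i.i.d. (p_i, A_i) pairs and return sum(p_i * A_i) / sum(A_i)."""
    rng = random.Random(seed)
    sum_p_times_a = 0
    sum_a = 0
    for _ in range(n):
        votes_a = rng.random() < p_a                      # A_i
        poor = rng.random() < (h_a if votes_a else h_b)   # p_i given A_i
        sum_p_times_a += int(poor and votes_a)
        sum_a += int(votes_a)
    return sum_p_times_a / sum_a

# The estimate settles near the assumed H_A = 0.20 as n grows
for n in (100, 10_000, 200_000):
    print(n, round(simulate_ratio_estimator(n), 4))
```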
(o) (2 points) Now suppose that, unfortunately, you do not have a dataset that includes information on both
voting behavior and income. However, suppose that you know for certain that 10% of the population of
voters is poor, and you also know that 40% of the population voted for party \(A\). We want to see if this
information can be used to infer something about \(H_A\). Prove that
\[ H_A = \frac{.1 - H_B\,(.6)}{.4}. \qquad (21) \]
\[
\begin{aligned}
.10 = \Pr(p_i = 1) &= \Pr(p_i = 1, A_i = 1) + \Pr(p_i = 1, A_i = 0) \\
&= \Pr(p_i = 1 \mid A_i = 1) \Pr(A_i = 1) + \Pr(p_i = 1 \mid A_i = 0) \Pr(A_i = 0) \\
&= H_A\, p_A + H_B\, p_B = H_A\,(.40) + H_B\,(.60) \\
\Rightarrow\; H_A &= \frac{.1 - H_B\,(.6)}{.4}
\end{aligned}
\]
(p) (2 points) Given the information at hand, calculate the lower bound for \(H_A\), that is, calculate
the smallest possible value of \(H_A\) that is consistent with equation (21). Does your conclusion provide
useful information about \(H_A\)? Hint: Remember that \(H_B\) is the fraction of voters who vote for \(B\) who
are poor, so that this fraction is certainly a number \(\ge 0\) and \(\le 1\).
We know that \(H_B\), being a probability, must be between zero and one. It enters the expression for \(H_A\) with a
negative sign, so the minimum value of \(H_A\) is achieved when \(H_B\) is equal to one, its maximum value,
so
\[ H_A \ge \frac{.1 - .6}{.4} = -1.25 \]
which is not very useful information. We already knew that \(H_A\) must be between zero and one, because
it is a probability/fraction!!
(q) (2 points) Given the information at hand, calculate the upper bound for \(H_A\), that is, calculate
the largest possible value of \(H_A\) that is consistent with equation (21). Does your conclusion provide
useful information about \(H_A\)?
In this case the information IS useful. The maximum value of \(H_A\) is achieved (by an argument similar to
the one in the previous point) when \(H_B\) achieves its minimum value, that is, zero. In this case then we
do learn that
\[ H_A \le \frac{.1}{.4} = .25 \]
So even if the information we have is not sufficient to calculate \(H_A\) exactly, we can at least say that no
more than one quarter of A-voters are poor.
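The bounding argument in (p) and (q) amounts to evaluating equation (21) at the two extreme values of \(H_B\) and then intersecting with the unit interval. A small Python sketch, with hypothetical function and variable names, makes this explicit:

```python
def bounds_for_h_a(poor_share, share_a):
    """Bounds on H_A implied by H_A * p_A + H_B * p_B = poor_share with 0 <= H_B <= 1."""
    share_b = 1.0 - share_a
    raw_lower = (poor_share - 1.0 * share_b) / share_a  # H_B at its maximum, 1
    raw_upper = (poor_share - 0.0 * share_b) / share_a  # H_B at its minimum, 0
    # H_A is itself a fraction, so intersect with [0, 1]
    return raw_lower, raw_upper, max(0.0, raw_lower), min(1.0, raw_upper)

raw_lower, raw_upper, lower, upper = bounds_for_h_a(0.10, 0.40)
print(round(raw_lower, 4), round(raw_upper, 4), lower, round(upper, 4))
```

The raw lower bound is negative and therefore uninformative, while the upper bound of one quarter does restrict \(H_A\).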
11.4 Final - Fall 2007
1. Malaria affects millions of people worldwide and kills many thousands every year. The public health literature
has shown that one of the most effective preventive measures against malaria is the regular use of insecticide
treated nets (ITNs) while sleeping at night. You want to study this topic by using data from an i.i.d. random
sample of 2569 individuals in rural Orissa (India). Let \(ITN_i\) denote a binary variable equal to one if individual
\(i\) sleeps regularly under an ITN, and let \(M_i\) denote a binary variable equal to one if individual \(i\) has malaria
at the time of the survey.
(a) (5 points) Suppose that you know that 58 percent of individuals regularly use a net, and you also know
that the fraction of individuals in your sample with malaria is .14 among those with \(ITN = 0\), and .10
among those with \(ITN = 1\). Calculate \(\bar{M}\) (that is, the fraction of individuals with malaria in the sample).
This just requires the calculation of a weighted mean (the sample equivalent of the LIE). So
\[ \bar{M} = .58\,(.10) + (1 - .58)\,(.14) = 0.1168 \]
You would like to estimate the impact of using ITNs on malaria. You estimate a simple linear
regression of \(M_i\) on \(ITN_i\), and the results are the following (standard errors are heteroskedasticity-robust):

                            Robust
        M |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
----------+------------------------------------------------------------------
      ITN |  -.0326259     .013054   -2.50    0.013    -.0582234   -.0070285
    _cons |   .1353105    .0104173   12.99    0.000     .1148834    .1557376

Throughout the problem, let \(\beta_X\) denote the coefficient corresponding to a given regressor \(X\), so for
instance, in this case \(\hat{\beta}_{ITN} = -0.0326259\).
(b) (6 points) Interpret the estimated coefficient \(\hat{\beta}_{ITN}\) and indicate whether it is statistically significant at
the standard significance levels.
The fraction of individuals with malaria is 0.033 lower among individuals who sleep regularly under a net
than among those who don't. The t-ratio is
\[ t = \frac{-.033}{.013054} = -2.5280 \]
which is significant at the 5% and 10% levels, but not at the 1% level (even if almost so).
(c) (5 points) Calculate the p-value of a two-sided test where \(H_0: \beta_{ITN} = -0.05\).
\[ p\text{-value} = 2\,\Phi\left(-\left|\frac{-.0326259 - (-0.05)}{.013054}\right|\right) = 2\,\Phi(-1.33) = .1835 \]
(d) (6 points) Do you think the results in this regression can be interpreted in a causal sense? Why, or why
not?
Definitely not. Individuals choose whether to sleep under a net. This regression most likely suffers from
omitted variable bias. For instance, richer persons, or persons that care more about health, are more
likely to use an ITN (which induces a correlation between omitted variable and regressor), but these same
characteristics are also likely to affect malaria prevalence directly, through pathways different from the
use of ITNs. Hence, both conditions for the existence of OVB hold and OLS will not be consistent for
the causal effect.
(e) (4 points) In this model, the dependent variable is binary. Do you think a logit or a probit model would
have been more appropriate? Explain.
Not really. First, we know that results with logit and probit are usually very similar to those obtained
using the LPM. But (and this is the most important reason) in this case the single included regressor is binary,
so logit and probit will actually lead to identical results, by construction! Intuitively, we know that with
no regressors the three models will just estimate the probability of a success (that is, a 1). If the
only regressor is a binary variable, all models will estimate the probability of a success for each of the two
possible values of the regressor. This will lead to identical (not just similar) results across models.
Now you estimate the following model, again using OLS:

                            Robust
          |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
----------+------------------------------------------------------------------
      ITN |  -.0285609    .0130424   -2.19    0.029    -.0541356   -.0029863
     lpce |  -.0348944    .0104168   -3.35    0.001    -.0553207   -.0144681
    _cons |   .3268167    .0593594    5.51    0.000     .2104195    .4432139

where \(lpce\) denotes the logarithm of per capita (per head) expenditure (in Indian Rupees) in individual
\(i\)'s family.
(f) (5 points) Calculate the fitted value \(\hat{M}_i\) for an individual who sleeps regularly under a net, and who
lives in a household where expenditure per head is 300 Rupees.
\[ \hat{M}_i = .3268167 - .0285609 - .0348944\,(\ln 300) = 0.099 \]
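The fitted value can be reproduced directly from the OLS output. The Python sketch below hard-codes the three reported coefficients; the function name and the second calculation (the effect of raising expenditure by 1%, from 300 to 303 Rupees) are illustrative additions.

```python
import math

# Coefficients copied from the OLS output above
COEFS = {"_cons": 0.3268167, "ITN": -0.0285609, "lpce": -0.0348944}

def fitted_malaria_probability(uses_net, pce):
    """Fitted value of the linear probability model for net use and per capita expenditure."""
    return (COEFS["_cons"]
            + COEFS["ITN"] * int(uses_net)
            + COEFS["lpce"] * math.log(pce))

fitted = fitted_malaria_probability(True, 300)

# Effect of a 1% increase in expenditure, roughly 0.01 * beta_lpce
one_percent_effect = fitted_malaria_probability(True, 303) - fitted

print(round(fitted, 4), round(one_percent_effect, 6))
```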
(g) (5 points) Interpret the estimated coefficient \(\hat{\beta}_{lpce}\) (you do not need to comment on the significance).
A 1% increase in \(pce\) decreases the predicted probability of having malaria by \(0.01 \times 0.035 = 0.00035\).
(h) (6 points) Interpret the change in \(\hat{\beta}_{ITN}\) with respect to the value estimated using the previous model
(on page 2) in terms of omitted variable bias.
When \(\ln pce\) is included in the regression, \(\hat{\beta}_{ITN}\) decreases by about 10% in absolute value. So, it looks
like the first model was suffering from downward bias (the coefficients are negative). We know that the
sign of the asymptotic bias is determined by the sign of the product between the sign of the coefficient
of the omitted variable (in this case \(\beta_{\ln pce}\), which is < 0) and the sign of the correlation between the
included and the excluded regressors (which in this case should be expected to be positive, because richer
households are more likely to use nets). So, the sign of the product is negative, and indeed the bias is
downwards. In other words, if we do not control for income/wealth the coefficient for ITN will incorporate
not only the reduced malaria risk due to nets, but also the fact that when we look at folks who use nets we
are looking at individuals who are richer, and hence are likely to have lower malaria burdens for reasons
other than ITN use (health care, quality of housing etc).
You are still worried that the results may be inconsistent because of omitted variable bias. It so happens
that you have collected data for several individuals in different households, so you can use your dataset
as if it were a panel where the unit is the household and instead of time you have individuals. So,
the model you estimate is the following
\[ M_{hi} = \beta_{ITN}\, ITN_{hi} + \beta_{lpce}\, lpce_{h} + \alpha_h + u_{hi} \qquad (22) \]
where the index \(h\) denotes the household, and the index \(i\) denotes the individual. You estimate equation
(22) using fixed effects (FE), and the results are the following:

                            Robust
          |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
----------+------------------------------------------------------------------
      ITN |   .0361111    .0750633    0.48    0.631    -.1112186    .1834408
     lpce |  (dropped)
    _cons |   .0954435    .0440625    2.17    0.031     .0089603    .1819267
(i) (4 points) Why did Stata drop \(\hat{\beta}_{lpce}\) from the estimation?
Because \(\ln(pce)\) is constant for all individuals in a household, hence it is dropped when you use FE,
together with anything that does not vary within the group (in this case, the household).
(j) (5 points) Suppose that equation (22) suffers from omitted variable bias where the variables omitted are
individual-specific (for instance, you could think that individuals who care more about health are more
likely to use ITNs and that these individuals are also less likely to have malaria for reasons other than
ITN use). In such case, do you think that the FE estimator will be consistent? Explain.
The FE estimator would not be consistent, because if this were the case we would still have correlation
between \(ITN\) and the individual-specific error \(u_{hi}\). Indeed this is likely to be the case, because even within
a family people decide whether they want to use a net or not. Someone who does not use one may care
less about illness (or maybe have some sort of immunity and not be very worried about malaria).
(k) (6 points) A colleague suggests that instead of estimating the impact of regular ITN use on malaria
using FE, it may be better to use 2SLS. He proposes to use, as an instrument, a binary variable \(LN_i\)
equal to one if individual \(i\) slept under a net the night before the interview. Would you expect \(LN\)
to be a valid instrument for \(ITN\) (which denotes regular use of a net while sleeping)? Why, or why not?
This is a really silly idea. If \(ITN\) is endogenous, \(LN\) will be endogenous as well for the same reasons! This
instrument will most likely be very relevant (if you sleep regularly under a net you are also going to be
more likely to have slept under a net the night before the interview), but exogeneity will most certainly
not hold.
You go ahead with your colleague's plan, but instead of using only \(LN\) as an instrument for \(ITN\), you also
use, as instruments, \(hhsize\) (number of family members) and \(age\). The result of the first-stage regression
is the following:

                            Robust
      ITN |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
----------+------------------------------------------------------------------
       LN |   .4673849    .0134301   34.80    0.000     .4410499    .4937198
      age |  -.0019503    .0005715   -3.41    0.001    -.0030709   -.0008297
   hhsize |   .0086553    .0036473    2.37    0.018     .0015032    .0158073
     lpce |   .0940303    .0152509    6.17    0.000     .0641249    .1239356
    _cons |  -.0154055      .09324   -0.17    0.869    -.1982389    .1674278
(l) (4 points) You want to check if the instruments you have chosen are weak. Describe the null hypothesis
of the test that you should carry out to establish if the instruments are weak.
The null hypothesis is that \(\beta_{LN} = \beta_{hhsize} = \beta_{age} = 0\).
(m) (4 points) The value of the test statistic calculated to test for instrument weakness is 418.65. What do
you conclude?
As expected, the instruments (which make no sense...) are strongly correlated with the endogenous
variable. The value of the F test is way above 10, hence our instruments are really bad, but very strong!
You carry out the estimation with 2SLS and the results are the following:

                            Robust
          |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
----------+------------------------------------------------------------------
      ITN |  -.0423915    .0349156   -1.21    0.225    -.1108571     .026074
     lpce |  -.0337712    .0110664   -3.05    0.002    -.0554713   -.0120712
    _cons |   .3285983    .0590766    5.56    0.000     .2127557     .444441
(n) (4 points) Can you reject the null that \(\beta_{ITN} = 0\) at the 10% significance level?
\[ t = \frac{-.0424}{0.035} = -1.21 \]
so you cannot reject the null at the 10 percent level.
(o) (5 points) Now you carry out an overidentification test to test for exogeneity, and the value of the F-test
for the joint statistical significance of the instruments is 2.11. What do you conclude?
There are three instruments and one endogenous regressor, so there are two overidentifying restrictions.
Hence
\[ mF = 3 \times 2.11 = 6.33. \]
The J-statistic has a chi-square distribution with (in this case) two degrees of freedom, so you can
reject the null at the 5% or 10% level, but not at the 1% level. Given that we are certain that this was a stupid instrument, this
result confirms that one has to be careful when arguing in favor of exogeneity simply based on a test
for exogeneity. You have to use common sense as well!
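The overidentification test above can be sketched in Python. The sketch assumes the J-statistic is formed as the number of instruments times the F-statistic, as in the solution, and uses the closed-form tail probability of a chi-square with two degrees of freedom; the function names are illustrative.

```python
import math

def j_statistic(num_instruments, f_stat):
    """J = m * F, where F tests the instruments jointly in the residual regression."""
    return num_instruments * f_stat

def chi2_sf_2df(x):
    """P(chi-square with 2 df > x); for 2 df the tail probability is exp(-x / 2)."""
    return math.exp(-x / 2.0)

j_stat = j_statistic(3, 2.11)
p_value = chi2_sf_2df(j_stat)
print(round(j_stat, 2), round(p_value, 4))
```

A p-value between .01 and .05 matches the conclusion that the null is rejected at the 5% and 10% levels but not at the 1% level.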
2. You want to try a completely different approach, and you carry out the following randomized experiment, in
an area where no one uses ITNs. You give a free ITN to a random sample of individuals, and after
a few months you return to the same area and you collect information on ITN use and malaria prevalence.
Let 1
i
be a binary variable equal to one if individual i received a free ITN, and zero otherwise. Like in the
previous problem, let '
i
denote a binary variable equal to one if individual i has malaria at the time of the
survey. Finally, let o
i
denote the fraction of nights individual i has slept under a net since you completed your
distribution of free nets.
(a) (5 points) Suppose that you estimate the following regression '
i
= c
0
+c
1
o
i
+n
i
. Do you think that
c
1
could be interpreted as the causal impact of ITN use on malaria prevalence? Explain.
Not really. Even if nets have been distributed at random in the population, what you have done is
distributing the nets for free. You are not randomly assigning o
i
, which is still CHOSEN by individuals.
Hence, a regression of ' on o will not estimate a causal impact.
(b) (5 points) Suppose that you estimate instead the following regression $M_i = \beta_0 + \beta_1 Z_i + \varepsilon_i$. What is the interpretation of $\hat\beta_1$ in this regression?
This measures the causal impact on malaria of OFFERING A NET to an individual. It does NOT measure the impact of actually using it, because you do not know whether these individuals use the nets or not. While actual use is not randomly determined, the offer is, and hence you can interpret the results in a causal sense.
(c) (6 points) Suppose that you estimate instead the following regression, where $pce_i$ indicates expenditure per head in individual $i$'s household: $M_i = \beta'_0 + \beta'_1 Z_i + \beta_2\, pce_i + \varepsilon'_i$. Would you expect $\hat\beta'_1$ to be very different from $\hat\beta_1$ estimated using the model in part (b)? Explain.
No, you should not expect a significant change. That is because $Z_i$ has been assigned at random, hence it will be uncorrelated with any other determinant of malaria. So, one of the two conditions for OVB (correlation between included and excluded regressor) does not hold, and even if we should expect $pce$ to matter in the regression, the coefficient for $Z_i$ should not be expected to change (as long as, of course, you have done the randomization well!)
(d) (4 points) From now on, consider the model in part (a), that is, $M_i = \alpha_0 + \alpha_1 S_i + u_i$, and assume that $E(u_i) = 0$. Now you consider the idea of estimating this model using instrumental variables. Specifically, you consider the idea of using $Z_i$ as an instrument for $S_i$. Do you think this is going to be a valid instrument? Why?
Yes, you should expect this one to be a valid instrument. It is exogenous, because it has been randomly determined, and it should be expected to be relevant, because individuals should be more likely to sleep often under a net if they got one for free.
Assume now that $Z_i$ is a valid instrument for $S_i$. In Midterm 2 we proved that $\alpha_1$ can be calculated as
$$\alpha_1 = \frac{E(M_i \mid Z_i = 1) - E(M_i \mid Z_i = 0)}{E(S_i \mid Z_i = 1) - E(S_i \mid Z_i = 0)}$$
Now we want to see how this estimator compares with a more standard 2SLS procedure. As usual, we proceed in steps.
(e) (5 points) Prove that $E(M_i Z_i) = E(M_i \mid Z_i = 1)\, E(Z_i)$.
$$E(M_i Z_i) = P(M_i = 1, Z_i = 1) = P(M_i = 1 \mid Z_i = 1)\, P(Z_i = 1) = E(M_i \mid Z_i = 1)\, E(Z_i)$$
where all the changes from $E$ to $P$ and vice versa follow easily because both random variables are binary (which is also the reason why $E(M_i Z_i)$ is equal to the probability that both are equal to one).
(f) (6 points) Suppose that you have two random variables, $X_i$ and $Y_i$. You know that $X_i$ is a binary random variable, and let $P(X_i = 1) = E(X_i) = p_X$. Prove that
$$Cov(Y_i, X_i) = p_X (1 - p_X) \left[ E(Y_i \mid X_i = 1) - E(Y_i \mid X_i = 0) \right]$$
Start from the definition of covariance (I omit the index $i$ for simplicity)
$$Cov(X, Y) = E(XY) - E(X)\, E(Y)$$
Now we know that $X$ is binary, so
$$E(XY) = E(Y \mid X = 1)\, p_X + 0$$
$$E(Y) = E(Y \mid X = 1)\, p_X + E(Y \mid X = 0)\,(1 - p_X)$$
Hence
$$\begin{aligned}
Cov(X, Y) &= E(Y \mid X = 1)\, p_X - E(Y \mid X = 1)\, p_X^2 - E(Y \mid X = 0)\, p_X (1 - p_X) \\
&= E(Y \mid X = 1) \left( p_X - p_X^2 \right) - E(Y \mid X = 0)\, p_X (1 - p_X) \\
&= E(Y \mid X = 1)\, p_X (1 - p_X) - E(Y \mid X = 0)\, p_X (1 - p_X) \\
&= p_X (1 - p_X) \left[ E(Y_i \mid X_i = 1) - E(Y_i \mid X_i = 0) \right]
\end{aligned}$$
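The identity in part (f) can be checked exactly on a small example. The sketch below uses a made-up joint distribution (the probabilities and the values of Y are purely illustrative assumptions, not data from the problem) and exact rational arithmetic, so both sides of the identity can be compared without rounding.

```python
from fractions import Fraction

# Hypothetical joint distribution of (X, Y): X is binary, Y takes a few values.
# Keys are (x, y) pairs; values are probabilities that sum to one.
joint = {
    (0, 1): Fraction(1, 10), (0, 4): Fraction(3, 10),
    (1, 2): Fraction(2, 10), (1, 7): Fraction(4, 10),
}

def expect(g):
    """Expected value of g(x, y) under the joint distribution."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

p_x = expect(lambda x, y: x)  # P(X = 1) = E(X) because X is binary
cov_xy = expect(lambda x, y: x * y) - p_x * expect(lambda x, y: y)

# Conditional means E(Y | X = 1) and E(Y | X = 0)
ey_x1 = expect(lambda x, y: y * x) / p_x
ey_x0 = expect(lambda x, y: y * (1 - x)) / (1 - p_x)

# Right-hand side of the identity proved in part (f)
rhs = p_x * (1 - p_x) * (ey_x1 - ey_x0)
print(cov_xy, rhs)
```

Both printed values coincide, as the proof requires for any distribution in which X is binary.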
(g) (5 points) Putting together the results in the previous steps, prove that
$$\alpha_1 = \frac{E(M_i \mid Z_i = 1) - E(M_i \mid Z_i = 0)}{E(S_i \mid Z_i = 1) - E(S_i \mid Z_i = 0)} = \frac{Cov(M_i, Z_i)}{Cov(S_i, Z_i)} \qquad (23)$$
Based on the results in the previous step, and writing $p_Z = P(Z_i = 1)$, we have
$$\frac{Cov(M_i, Z_i)}{Cov(S_i, Z_i)} = \frac{p_Z (1 - p_Z) \left[ E(M_i \mid Z_i = 1) - E(M_i \mid Z_i = 0) \right]}{p_Z (1 - p_Z) \left[ E(S_i \mid Z_i = 1) - E(S_i \mid Z_i = 0) \right]} = \frac{E(M_i \mid Z_i = 1) - E(M_i \mid Z_i = 0)}{E(S_i \mid Z_i = 1) - E(S_i \mid Z_i = 0)}$$
(h) (5 points) From Midterm 2, it follows that $\alpha_1$ can be estimated consistently by using
$$\hat\alpha_1^{Wald} = \frac{\hat E(M_i \mid Z_i = 1) - \hat E(M_i \mid Z_i = 0)}{\hat E(S_i \mid Z_i = 1) - \hat E(S_i \mid Z_i = 0)}$$
that is, by calculating the sample analogue of the middle expression in (23). This estimator is sometimes called the Wald Estimator. Based on the results in part (g), what is the relationship between the Wald Estimator and the 2SLS estimator where the binary variable $Z_i$ is used as instrument? Explain.
They are identical! The Wald estimator is the sample analogue of the middle expression in (23), while 2SLS is the sample analogue of the last term in (23), but the two expressions are identical, so the estimators are effectively the same.
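A small numerical sketch of this equivalence follows. The eight observations are invented for illustration only, and the single-instrument 2SLS estimator is computed in its ratio-of-sample-covariances form, which is the sample analogue of the last term in (23). The variable names `z`, `s` and `m` are labels chosen here for the instrument, net use and malaria.

```python
from fractions import Fraction

# Hypothetical sample: z = binary instrument (free net offered),
# s = fraction of nights slept under a net, m = malaria indicator.
z = [1, 1, 1, 0, 0, 0, 1, 0]
s = [Fraction(9, 10), Fraction(1, 2), Fraction(4, 5), Fraction(1, 10),
     Fraction(0), Fraction(3, 10), Fraction(1), Fraction(1, 5)]
m = [0, 1, 0, 1, 1, 0, 0, 1]

def mean(values):
    return sum(Fraction(v) for v in values) / len(values)

def cov(a, b):
    """Sample covariance (dividing by n; the divisor cancels in the ratio)."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((Fraction(x) - mean_a) * (Fraction(y) - mean_b)
               for x, y in zip(a, b)) / len(a)

def cond_mean(values, instrument, level):
    """Sample mean of `values` among observations with instrument == level."""
    return mean([v for v, zi in zip(values, instrument) if zi == level])

# Wald estimator: ratio of differences in conditional sample means
wald = ((cond_mean(m, z, 1) - cond_mean(m, z, 0))
        / (cond_mean(s, z, 1) - cond_mean(s, z, 0)))

# IV / 2SLS estimator with one binary instrument: ratio of sample covariances
iv = cov(m, z) / cov(s, z)
print(wald, iv)
```

The two numbers agree exactly, because the factor $\hat p (1 - \hat p)$ cancels in the sample just as it does in the population.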
(i) (6 points) You proceed with the 2SLS estimation and the results are the following:
$$\hat M_i = \underset{(0.01)}{0.10} - \underset{(0.015)}{0.05}\, S_i, \quad R^2 = 0.01, \quad cov\left(\hat\beta_0, \hat\beta_S\right) = -0.0001 \qquad (24)$$
Do these results suggest that it is important to sleep regularly under an ITN to reduce the risk of contracting malaria? Comment on both the statistical and substantive significance of the results.
Yes they do. The probability of having malaria without ever sleeping under a net ($S_i = 0$) is 10 percent, while the probability is only half as large if you always sleep under a net ($S_i = 1$). So the difference is certainly substantively important (also consider that malaria can kill...). The difference is also statistically significant at all standard significance levels, because
$$t = \frac{-0.05}{0.015} = -3.33$$
whose absolute value exceeds the 1% critical value of 2.58.
(j) (4 points) Suppose that the true value of the slope $\alpha_1$ is equal to $-0.075$. Based on the estimates in (24), would you argue that the 2SLS estimator is inconsistent? Justify your answer.
Certainly not! An estimator is inconsistent when its probability limit is different from the true value, not when the point estimate is different from the true value! Here, indeed, we should expect the estimator to be consistent, because we are using 2SLS and a valid instrument. The fact that the point estimate is not identical to the true value is just a result of the fact that we have a finite sample. The point estimate is essentially always different from the true value!
(k) (6 points) Show how to construct a 95% confidence interval for the predicted probability of having malaria for an individual who always sleeps under a net (note: you do not need to complete the calculation. Setting it up correctly will give you full credit).
$$0.05 \pm 1.96 \sqrt{0.01^2 + 0.015^2 - 2\,(1)\,(0.0001)}$$
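For readers who want to see the number, the interval in part (k) can be completed as in the sketch below. It assumes that the covariance between intercept and slope in (24) is negative ($-0.0001$), which is how the subtraction under the square root is read here; the extracted text has lost its minus signs, so treat that sign as an assumption.

```python
import math

# Estimates from (24); the slope sign and covariance sign are assumptions
b0, b_s = 0.10, -0.05        # intercept and coefficient on net use
se0, se_s = 0.01, 0.015      # standard errors
cov_0s = -0.0001             # assumed covariance between the two estimates

s_value = 1                  # individual who always sleeps under a net
pred = b0 + b_s * s_value    # predicted probability of malaria

# Var(b0 + b_s * s) = Var(b0) + s^2 Var(b_s) + 2 s Cov(b0, b_s)
var_pred = se0 ** 2 + (s_value ** 2) * se_s ** 2 + 2 * s_value * cov_0s
half_width = 1.96 * math.sqrt(var_pred)
ci = (pred - half_width, pred + half_width)
print(round(pred, 3), round(ci[0], 4), round(ci[1], 4))
```

Under these assumptions the interval is roughly 0.028 to 0.072.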
Suppose again that the instrument is valid, but that the population you are studying is heterogeneous, that is
$$M_i = \alpha_{i0} + \alpha_{i1} S_i + u_i$$
$$S_i = \pi_{i0} + \pi_{i1} Z_i + \nu_i$$
so that now the regression coefficients are individual-specific. You can also assume that o
1
= 0, and that
the individual-specific coefficients are independent of $S_i$ and $Z_i$. You would like to estimate the Average Treatment Effect, that is, $E(\alpha_{i1})$.
(l) (5 points) Let $\hat\alpha_1^{2SLS}$ be the 2SLS estimator that you obtain if you use $Z_i$ as an instrument, and if you ignore the existence of heterogeneity. Prove that
$$plim\, \hat\alpha_1^{2SLS} = E(\alpha_{i1}) + \frac{cov(\alpha_{i1}, \pi_{i1})}{E(\pi_{i1})}.$$
From the chichi, we know that
$$plim\, \hat\alpha_1^{2SLS} = \frac{E(\alpha_{i1} \pi_{i1})}{E(\pi_{i1})}$$
so that using the definition of covariance we get
$$plim\, \hat\alpha_1^{2SLS} = \frac{cov(\alpha_{i1}, \pi_{i1}) + E(\alpha_{i1})\, E(\pi_{i1})}{E(\pi_{i1})} = E(\alpha_{i1}) + \frac{cov(\alpha_{i1}, \pi_{i1})}{E(\pi_{i1})}$$
(m) (7 points) Suppose that the poorest and least literate individuals, on average, do not sleep regularly under the nets, even when they receive them for free (for instance, because they do not believe that malaria is transmitted by mosquitoes). However, suppose also that these same individuals are those that would benefit more from sleeping regularly under a net (for instance, because they live in poorly constructed huts that mosquitoes can easily enter at night). If so, will $\hat\alpha_1^{2SLS}$ consistently estimate the ATE? Justify your answer, and if you find that the answer is no, explain whether 2SLS will over or underestimate the true benefits from using ITNs regularly.
The question clearly indicates that more negative (that is, smaller) values of $\alpha_{i1}$ correspond, on average, to smaller values of $\pi_{i1}$ (which should be expected to be positive, because offering a net for free should, if anything, increase the number of nights you sleep under one of them). In other words, the covariance will be positive. Hence, we should expect
$$plim\, \hat\alpha_1^{2SLS} = E(\alpha_{i1}) + \frac{\overbrace{cov(\alpha_{i1}, \pi_{i1})}^{>0}}{\underbrace{E(\pi_{i1})}_{>0}} > E(\alpha_{i1})$$
Hence, the 2SLS estimator will UNDERestimate the benefit of ITNs, because the result will be a negative number which is not negative enough!
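The direction of the bias can be illustrated with a two-type population. The numbers below are entirely hypothetical: one type benefits a lot from nets but barely responds to the free offer, the other benefits less but responds strongly. The sketch evaluates the probability limit from part (l) exactly.

```python
from fractions import Fraction

# Hypothetical population: (alpha_i1, pi_i1) for two equally likely types.
# Type A: large benefit (very negative alpha), weak response to the offer.
# Type B: small benefit, strong response to the offer.
types = [
    (Fraction(-3, 10), Fraction(1, 10)),
    (Fraction(-1, 10), Fraction(1, 2)),
]
prob = Fraction(1, 2)

e_alpha = sum(prob * a for a, _ in types)          # ATE = E(alpha_i1)
e_pi = sum(prob * p for _, p in types)             # E(pi_i1)
e_alpha_pi = sum(prob * a * p for a, p in types)   # E(alpha_i1 * pi_i1)
cov_ap = e_alpha_pi - e_alpha * e_pi               # cov(alpha_i1, pi_i1)

# Probability limit of 2SLS from part (l), written both ways
plim_2sls = e_alpha_pi / e_pi
plim_decomposed = e_alpha + cov_ap / e_pi
print(e_alpha, plim_2sls, cov_ap)
```

With these made-up values the ATE is $-1/5$ while the 2SLS limit is $-2/15$: still negative, but closer to zero, which is the underestimation described in the answer.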
Now you estimate the following model, where $C_i$ is a binary variable equal to one if individual $i$ is a child
$$M_i = \alpha_0 + \alpha_1 S_i + \alpha_2 C_i + \alpha_3 (S_i \times C_i) + u_i \qquad (25)$$
(n) (5 points) Suppose you want to test the null hypothesis that sleeping under an ITN affects the risk of contracting malaria in the same way for children as for adults. Describe the null and the alternative hypothesis in terms of the coefficients.
$$H_0: \alpha_3 = 0$$
$$H_1: \alpha_3 \neq 0$$
(o) (4 points) You carry out an F-test for the null described in the previous step, and the value of the test
is 6.5. What do you conclude?
The threshold for a 1% test and one degree of freedom is 6.63, and for a 5% level is 3.84, so you reject at
5% level but not at 1%.
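The two thresholds quoted in part (o) can be reproduced without tables. With one restriction, the large-sample F statistic is distributed as a chi-square with one degree of freedom, which is the square of a standard normal, so its critical values are squared two-sided normal critical values. The sketch below uses only the Python standard library; the helper name `chi2_1df_critical` is made up for this illustration.

```python
from statistics import NormalDist

def chi2_1df_critical(alpha):
    """Critical value of a chi-square(1) at level alpha.

    A chi-square(1) variable is the square of a standard normal, so the
    critical value is the squared two-sided normal critical value.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * z

crit_1pct = chi2_1df_critical(0.01)
crit_5pct = chi2_1df_critical(0.05)

f_stat = 6.5  # value of the test reported in part (o)
reject_5pct = f_stat > crit_5pct
reject_1pct = f_stat > crit_1pct
print(round(crit_1pct, 2), round(crit_5pct, 2), reject_5pct, reject_1pct)
```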
(p) (5 points) Based again on model (25), how would you test the null hypothesis that ITNs are not useful in protecting adults from malaria?
$$H_0: \alpha_1 = 0$$
$$H_1: \alpha_1 \neq 0$$
3. Suppose now that you have the two binary variables $I_i$ and $N_i$, where $I_i = 1$ if individual $i$ sleeps regularly under an ITN, while $N_i = 1$ if the individual sleeps regularly under an untreated net (that is, a net not treated with insecticide). Let $p_I = \Pr(I_i = 1)$ and $p_N = \Pr(N_i = 1)$. Suppose that you want to estimate $p_I$ and $p_N$ using Maximum Likelihood, and suppose that you have an iid sample of $n$ observations from a malaria-prone area.
(a) (6 points) Write down the log-likelihood of your sample.
There are three possible outcomes (ITN, N, or no net at all). So, the likelihood for one observation is
$$L(I_i, N_i) = p_I^{I_i}\, p_N^{N_i}\, (1 - p_N - p_I)^{1 - I_i - N_i}$$
so the log-likelihood for the whole sample will be (using, as with a Bernoulli, the properties of exponentials and then taking logs)
$$\ln L(p_I, p_N) = \ln p_I \sum_{i=1}^{n} I_i + \ln p_N \sum_{i=1}^{n} N_i + \ln(1 - p_I - p_N) \left( n - \sum_{i=1}^{n} I_i - \sum_{i=1}^{n} N_i \right)$$
(b) (5 points) You estimate the two parameters above using MLE, and the resulting log-likelihood is equal to $-12450$. The log-likelihood evaluated at $p_I = 0.02$ and $p_N = 0.15$ is equal to $-12455$. Can you reject the null hypothesis that $p_I = 0.02$ and $p_N = 0.15$?
$$2 \left[ \ln L_U - \ln L_R \right] = 2\,[5] = 10$$
so you certainly reject at any standard significance level (compare with a chi-square distribution with two degrees of freedom. The threshold for a 1% level is 9.21).
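The likelihood ratio test in part (b) is easy to reproduce. The sketch below assumes the two log-likelihood values are negative ($-12450$ unrestricted and $-12455$ restricted), which is the reading consistent with the statistic of 10. It also uses the fact that a chi-square with two degrees of freedom has survival function $e^{-x/2}$, so no statistical library is needed.

```python
import math

# Log-likelihood values from part (b); the negative signs are an assumption
loglik_unrestricted = -12450.0
loglik_restricted = -12455.0

# Likelihood ratio statistic
lr = 2 * (loglik_unrestricted - loglik_restricted)

# Chi-square(2): P(X > x) = exp(-x / 2)
p_value = math.exp(-lr / 2)
crit_1pct = -2 * math.log(0.01)   # solves exp(-x / 2) = 0.01

print(lr, round(crit_1pct, 2), round(p_value, 4))
```

The statistic exceeds the 1% threshold and the p-value is below 0.01, matching the conclusion in the text.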
Suppose now that you would like to estimate the fraction of individuals who do not sleep regularly under any net. Let $p_{NO}$ denote the true value of such fraction. Let
$$\hat p_{NO} = \frac{1}{n} \sum_{i=1}^{n} (1 - I_i - N_i).$$
(c) (6 points) Prove that
$$plim_{n \to \infty}\, (\hat p_{NO}) = p_{NO}$$
This is just the usual LLN with iid observations. The mean converges in probability to the expected value, so
$$plim_{n \to \infty}\, (\hat p_{NO}) = E(1 - I_i - N_i) = P(I_i = 0, N_i = 0) = p_{NO}$$
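(d) (6 points) Prove that
$$\sqrt{n}\, (\hat p_{NO} - p_{NO}) \xrightarrow{d} N\left(0,\, p_{NO} (1 - p_{NO})\right)$$
This is very simple once you realize that, by the CLT,
$$\frac{\bar X - E(X_i)}{\sqrt{Var(X_i) / n}} \xrightarrow{d} N(0, 1)$$
or
$$\sqrt{n} \left( \bar X - E(X_i) \right) \xrightarrow{d} N\left(0, Var(X_i)\right)$$
In this case, $\hat p_{NO}$ is a sample mean of iid variables equal to $(1 - I_i - N_i)$, and $p_{NO}$ is the corresponding expected value, so
$$\sqrt{n}\, (\hat p_{NO} - p_{NO}) \xrightarrow{d} N\left(0, Var(1 - I_i - N_i)\right)$$
where, because $1 - I_i - N_i$ is binary,
$$Var(1 - I_i - N_i) = E(1 - I_i - N_i) \left[ 1 - E(1 - I_i - N_i) \right] = p_{NO} (1 - p_{NO})$$
by the definition of $p_{NO}$.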
This problem is based on actual data collected from a group of villages in rural areas of the Indian state of Orissa. A group of researchers is studying new mechanisms to increase the proportion of households who regularly use insecticide-treated bednets (ITNs). The public health literature has amply demonstrated that the use of ITNs sharply reduces the prevalence of malaria, a debilitating and potentially fatal disease spread by mosquitoes. Many poor households in rural Orissa do not own ITNs. The researchers have teamed up with a micro-finance institution to evaluate if poor households can be induced to purchase ITNs on credit rather than with cash. A household survey has been completed on a random sample of respondents within the villages where the ITNs have been sold on credit.
Let $N_i$ denote a dummy (binary) variable equal to one when the household decides to purchase at least one net. Let $S_i$ denote a dummy variable equal to one if at least one person in the household was sick with malaria before the survey. The following table displays the joint distribution of $N_i$ and $S_i$ in the sample.

              N_i = 0    N_i = 1
  S_i = 0      0.29       0.19
  S_i = 1      0.25       0.27

1. (3 points) Calculate $E(S_i)$ using the figures shown in the distribution.
$$E(S_i) = P(S_i = 1) = 0.25 + 0.27 = 0.52$$
2. (4 points) Calculate $E(N_i \mid S_i = 0)$ and $E(N_i \mid S_i = 1)$.
$$E(N_i \mid S_i) = P(N_i = 1 \mid S_i) = \frac{P(N_i = 1, S_i)}{P(S_i)},$$
so
$$E(N_i \mid S_i = 1) = \frac{0.27}{0.52} = 0.51923$$
$$E(N_i \mid S_i = 0) = \frac{0.19}{0.29 + 0.19} = 0.39583$$
3. (3 points) Are $S_i$ and $N_i$ statistically independent? Justify your answer.
The conditional expectation of $N_i$ depends on $S_i$, so the two random variables cannot be independent. This makes sense: we do expect purchase to be more likely if someone in the household had recent episodes of malaria.
4. (5 points) Calculate $Cov(S_i, N_i)$.
We know that $Cov(S_i, N_i) = E(S_i N_i) - E(S_i)\, E(N_i)$. We also know that both $S_i$ and $N_i$ are binary, so $E(S_i) = P(S_i = 1)$, $E(N_i) = P(N_i = 1)$ and $E(S_i N_i) = P(S_i = 1, N_i = 1)$. So
$$Cov(S_i, N_i) = 0.27 - 0.52\, (0.19 + 0.27) = 0.0308.$$
As expected, the two variables are positively correlated.
5. (4 points) Suppose now that you estimate an OLS regression of $N_i$ on $S_i$ using the same data used to produce the joint distribution in the previous page. What would the slope of the regression be equal to? Justify your answer.
We know that when we regress a variable on a binary (dummy) variable, the slope $\beta_1$ measures the impact on the dependent variable of changing the binary variable from zero to one. So the estimated slope will be $E(N_i \mid S_i = 1) - E(N_i \mid S_i = 0) = 0.51923 - 0.39583 = 0.1234$. The same could be calculated using the OLS formula
$$\hat\beta_1 = \frac{\widehat{cov}(S_i, N_i)}{\widehat{var}(S_i)}.$$
Because $S_i$ is binary we know that $\widehat{var}(S_i) = \hat P(S_i = 1) \left[ 1 - \hat P(S_i = 1) \right]$, so that
$$\hat\beta_1 = \frac{0.0308}{0.52\, (1 - 0.52)} = 0.1234$$
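The calculations in questions 1 to 5 all follow from the four cells of the joint distribution, and can be reproduced as in the sketch below. The labels `s` (malaria in the household) and `n` (net purchased) are names chosen here for the two dummies; exact fractions are used so that the two ways of computing the slope can be compared without rounding.

```python
from fractions import Fraction as F

# joint[(s, n)] = P(S_i = s, N_i = n), taken from the table
joint = {
    (0, 0): F(29, 100), (0, 1): F(19, 100),
    (1, 0): F(25, 100), (1, 1): F(27, 100),
}

p_s = joint[(1, 0)] + joint[(1, 1)]        # E(S) = P(S = 1)
p_n = joint[(0, 1)] + joint[(1, 1)]        # E(N) = P(N = 1)

e_n_given_s1 = joint[(1, 1)] / p_s         # E(N | S = 1)
e_n_given_s0 = joint[(0, 1)] / (1 - p_s)   # E(N | S = 0)

cov_sn = joint[(1, 1)] - p_s * p_n         # E(SN) - E(S)E(N)

# OLS slope of N on S, computed in two equivalent ways
slope_from_means = e_n_given_s1 - e_n_given_s0
slope_from_cov = cov_sn / (p_s * (1 - p_s))

print(float(p_s), float(e_n_given_s1), float(e_n_given_s0),
      float(cov_sn), float(slope_from_cov))
```

The difference in conditional means and the covariance-over-variance formula give exactly the same slope, about 0.1234.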
Now you estimate the regression model mentioned in the previous question using OLS, and the result is the following (heteroskedasticity-robust standard errors in parentheses):
$$\hat N_i = \underset{(0.028)}{0.39} + \underset{(0.040)}{0.12}\, S_i. \qquad (26)$$
You have also estimated that the covariance between the intercept and the slope is equal to $-0.0008$.
6. (3 points) Can you reject the null hypothesis that recent malaria episodes have no impact on the probability of purchasing ITNs, using a 1% level?
The test statistic is
$$\frac{0.12}{0.040} = 3 > 2.58$$
so you should reject the null at the 1% level.
7. (4 points) From a substantive point of view, do you think the results in model (26) show that past malaria episodes are an important predictor of ITN purchase? Is this what you expected, and why?
Yes, they do. Past malaria episodes increase the probability of purchase by 12 percentage points in our sample. This is a large increase. Note also that 0.12 represents a 30 percent increase (0.12/0.39) in the probability of purchasing relative to the group where no one had malaria before the survey.
8. (4 points) Do you think the results in model (26) can be interpreted in a causal way? Why?
While it makes sense to expect that past malaria episodes will cause individuals to be more willing to purchase ITNs, which protect from malaria, it is hard to interpret the result in a causal sense. There are many other factors which could bias this relationship. For instance, poor households may be more at risk for malaria but they may also be less likely to purchase the nets. If so, a high $S_i$ could also proxy for poverty, which would be associated with lower purchases. This would end up biasing the OLS estimates downwards. Or $S_i$ could proxy for the frequency of malaria in a given area (regardless of actual incidence within a household). This would bias the OLS estimates upwards.
9. (5 points) Calculate a 95% confidence interval for the probability that a household purchases ITNs when there was at least one malaria episode in the six months before the purchase.
What you want is a confidence interval for $\beta_0 + \beta_1$, which measures the probability of a purchase among those with at least one malaria episode. So
$$CI = \left( \hat\beta_0 + \hat\beta_1 \right) \pm 1.96 \sqrt{\hat\sigma^2_{\hat\beta_0} + \hat\sigma^2_{\hat\beta_1} + 2 \hat\sigma_{\hat\beta_0, \hat\beta_1}}$$
$$= 0.39 + 0.12 \pm 1.96 \sqrt{0.028^2 + 0.04^2 + 2\,(-0.0008)}$$
$$= 0.51 \pm 0.05488 = [0.45512,\ 0.56488]$$
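The interval can be reproduced numerically as in the sketch below. It takes the covariance between intercept and slope to be negative ($-0.0008$): that sign is an inference, made because it is the only value that reproduces the half-width of 0.05488 reported in the answer.

```python
import math

# Estimates from model (26)
b0, b1 = 0.39, 0.12
se0, se1 = 0.028, 0.040
cov01 = -0.0008   # assumed negative; see the note above

point = b0 + b1   # predicted purchase probability when S_i = 1

# Var(b0 + b1) = Var(b0) + Var(b1) + 2 Cov(b0, b1)
variance = se0 ** 2 + se1 ** 2 + 2 * cov01
half_width = 1.96 * math.sqrt(variance)
ci = (point - half_width, point + half_width)
print(round(half_width, 5), round(ci[0], 5), round(ci[1], 5))
```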
Your data have been collected from a list of villages in different parts of the state of Orissa. You know that in Kandhamal district malaria prevalence is very high, while prevalence is relatively low in Sambalpur district. You then re-estimate the model in (26) using district-specific samples. Keep in mind that the two samples are statistically independent. The results for the two regressions are the following:
$$\text{Kandhamal:} \quad \hat N_i = \underset{(0.137)}{0.86} + \underset{(0.137)}{0.14}\, S_i,$$
$$\text{Sambalpur:} \quad \hat N_i = \underset{(0.065)}{0.33} + \underset{(0.096)}{0.087}\, S_i,$$
10. (6 points) Test the null hypothesis that the probability of purchasing at least one net among households where no one has been sick with malaria in the six months before the survey in Kandhamal is twice as large as in Sambalpur.
In this case the null hypothesis is that
$$H_0: \beta_0^{Kandhamal} - 2 \beta_0^{Sambalpur} = 0$$
$$H_1: \beta_0^{Kandhamal} - 2 \beta_0^{Sambalpur} \neq 0.$$
So the first step to proceed with the test is the calculation of the correct standard error to use in the denominator. Using the fact that the two samples are independent we have
$$se\left( \hat\beta_0^{Kandhamal} - 2 \hat\beta_0^{Sambalpur} \right) = \sqrt{0.137^2 + 4 \times 0.065^2} = 0.18886.$$
So the test is
$$\frac{0.86 - 2\,(0.33)}{0.18886} = 1.059$$
Since $1.059 < 1.645$, you cannot reject the null even at the 10% level.
11. (3 points) Calculate the p-value for a test of statistical significance of $S_i$ in the regression estimated using data from Sambalpur.
The p-value is
$$p = 2\, \Phi\left( -\left| \frac{0.087}{0.096} \right| \right) = 2\, \Phi\left( -\frac{0.087}{0.096} \right) = 0.3648$$
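The p-value can be checked with the standard normal CDF from the Python standard library, as in this short sketch (the variable names are chosen here for illustration).

```python
from statistics import NormalDist

# Sambalpur regression: slope estimate and its standard error
slope, se = 0.087, 0.096

t_stat = slope / se
# Two-sided p-value under the large-sample normal approximation
p_value = 2 * NormalDist().cdf(-abs(t_stat))
print(round(t_stat, 3), round(p_value, 4))
```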
Now you would like to find out if in your study area it is true that the regular use of bednets reduces the probability of falling sick with malaria. You have found some useful data collected before your sale-on-credit program. Your data include $P_i$, the number of nights individual $i$ slept protected by an ITN last year. You also observe $S_i$, a dummy variable equal to one if the individual had malaria last year. Suppose now that the following model holds:
$$S_i = \beta_0 + \beta_1 P_i + \beta_2 M_i + u_i, \qquad (27)$$
where $M_i$ is an indicator of the density of malaria-transmitting mosquitoes in the area where individual $i$ lives (higher values of $M_i$ indicate that there are more mosquitoes in the area). You also know that the error is uncorrelated with both $P_i$ and $M_i$.
12. (4 points) What signs do you expect $\beta_1$ and $\beta_2$ to have? Justify your answer.
You should expect $\beta_1 < 0$ (lower probability of falling sick with malaria if you sleep regularly under a net, keeping everything else constant) and $\beta_2 > 0$ (higher probability of falling sick with malaria if you live in a place with a lot of malaria-transmitting mosquitoes).
Suppose that your dataset only includes $P_i$ and $S_i$. You do not observe $M_i$, so instead of estimating model (27) you estimate a regression of $S_i$ on $P_i$. The usual OLS estimator is therefore:
$$\hat\beta_1 = \frac{\sum_{i=1}^{n} \left( P_i - \bar P \right) S_i}{\sum_{i=1}^{n} \left( P_i - \bar P \right)^2}. \qquad (28)$$
Now you want to study if this estimator is consistent for the true value $\beta_1$ despite the fact that you are ignoring the fact that $M_i$ belongs to the regression too. In what follows, also let $\sigma_P^2 = var(P_i)$ and $\sigma_{PM} = cov(P_i, M_i)$.
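13. (6 points) Prove that
$$\hat\beta_1 = \beta_1 + \beta_2 \frac{\sum_{i=1}^{n} \left( P_i - \bar P \right) M_i}{\sum_{i=1}^{n} \left( P_i - \bar P \right)^2} + \frac{\sum_{i=1}^{n} \left( P_i - \bar P \right) u_i}{\sum_{i=1}^{n} \left( P_i - \bar P \right)^2} \qquad (29)$$
Substituting the right-hand side of (27) into (28) we have
$$\hat\beta_1 = \frac{\sum_{i=1}^{n} \left( P_i - \bar P \right) \left( \beta_0 + \beta_1 P_i + \beta_2 M_i + u_i \right)}{\sum_{i=1}^{n} \left( P_i - \bar P \right)^2}$$
$$= \beta_0 \frac{\overbrace{\sum_{i=1}^{n} \left( P_i - \bar P \right)}^{=0}}{\sum_{i=1}^{n} \left( P_i - \bar P \right)^2} + \beta_1 \underbrace{\left( \frac{\sum_{i=1}^{n} \left( P_i - \bar P \right) P_i}{\sum_{i=1}^{n} \left( P_i - \bar P \right)^2} \right)}_{=1} + \beta_2 \frac{\sum_{i=1}^{n} \left( P_i - \bar P \right) M_i}{\sum_{i=1}^{n} \left( P_i - \bar P \right)^2} + \frac{\sum_{i=1}^{n} \left( P_i - \bar P \right) u_i}{\sum_{i=1}^{n} \left( P_i - \bar P \right)^2}$$
$$= \beta_1 + \beta_2 \frac{\sum_{i=1}^{n} \left( P_i - \bar P \right) M_i}{\sum_{i=1}^{n} \left( P_i - \bar P \right)^2} + \frac{\sum_{i=1}^{n} \left( P_i - \bar P \right) u_i}{\sum_{i=1}^{n} \left( P_i - \bar P \right)^2}$$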
14. (3 points) What is the probability limit of $\frac{1}{n} \sum_{i=1}^{n} \left( P_i - \bar P \right) M_i$? Justify your steps.
We have iid observations, so we can use the usual LLN and conclude that
$$plim\, \frac{1}{n} \sum_{i=1}^{n} \left( P_i - \bar P \right) M_i = plim\, \frac{1}{n} \sum_{i=1}^{n} \left( P_i - E(P_i) \right) M_i = E\left[ (P_i - E(P_i))\, M_i \right] = E\left[ (P_i - E(P_i)) (M_i - E(M_i)) \right]$$
$$= cov(P_i, M_i) = \sigma_{PM}$$
15. (4 points) What is the probability limit of $\frac{1}{n} \sum_{i=1}^{n} \left( P_i - \bar P \right) u_i$? Justify your steps.
Like in the previous step, we can use the usual LLN to argue that
$$plim\, \frac{1}{n} \sum_{i=1}^{n} \left( P_i - \bar P \right) u_i = plim\, \frac{1}{n} \sum_{i=1}^{n} \left( P_i - E(P_i) \right) u_i = E\left[ (P_i - E(P_i))\, u_i \right] = E\left[ (P_i - E(P_i)) (u_i - E(u_i)) \right]$$
$$= cov(P_i, u_i) = 0$$
because the error term is uncorrelated with $P_i$ by assumption.
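16. Prove that
$$plim\, \hat\beta_1 = \beta_1 + \beta_2 \frac{\sigma_{PM}}{\sigma_P^2} \qquad (30)$$
We have to use again the usual LLN for the denominators, then use the results from the last two points, and finally use the properties of probability limits (so that the plim of a ratio is the ratio of the plims). Then:
$$\hat\beta_1 = \beta_1 + \beta_2 \frac{\overbrace{\frac{1}{n} \sum_{i=1}^{n} \left( P_i - \bar P \right) M_i}^{\xrightarrow{p}\ \sigma_{PM}}}{\underbrace{\frac{1}{n} \sum_{i=1}^{n} \left( P_i - \bar P \right)^2}_{\xrightarrow{p}\ \sigma_P^2}} + \frac{\overbrace{\frac{1}{n} \sum_{i=1}^{n} \left( P_i - \bar P \right) u_i}^{\xrightarrow{p}\ 0}}{\underbrace{\frac{1}{n} \sum_{i=1}^{n} \left( P_i - \bar P \right)^2}_{\xrightarrow{p}\ \sigma_P^2}}$$
so
$$plim\, \hat\beta_1 = \beta_1 + \beta_2 \frac{\sigma_{PM}}{\sigma_P^2}$$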
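The decomposition behind this result holds exactly in any sample, which makes it easy to check numerically. In the sketch below the data and the coefficient values are invented, the error term is set to zero so that the last term of the decomposition vanishes, and the outcome is not forced to be binary: the point is only the algebra of the omitted variable, not a realistic malaria dataset.

```python
from fractions import Fraction as F

# Hypothetical "true" coefficients of the long regression
beta0, beta1, beta2 = F(1, 2), F(-1, 100), F(1, 10)

# Hypothetical data: P = nights under a net, M = mosquito density index
P = [0, 50, 100, 200, 300, 365]
M = [1, 2, 2, 4, 3, 5]

# Outcome generated with no error term (u_i = 0 for every i)
S = [beta0 + beta1 * p + beta2 * m for p, m in zip(P, M)]

p_bar = F(sum(P), len(P))
ss_p = sum((p - p_bar) ** 2 for p in P)   # sum of squared deviations of P

# Short regression of S on P only (M omitted)
slope_short = sum((p - p_bar) * s for p, s in zip(P, S)) / ss_p

# beta1 + beta2 * [sum (P - Pbar) M] / [sum (P - Pbar)^2]
ratio_pm = sum((p - p_bar) * m for p, m in zip(P, M)) / ss_p
slope_decomposed = beta1 + beta2 * ratio_pm

print(float(slope_short), float(slope_decomposed), float(beta1))
```

Because `P` and `M` are positively related in this made-up sample and `beta2` is positive, the short-regression slope lies above `beta1`.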
17. (4 points) Based on the result in (30), and given that $\beta_2 \neq 0$, we have shown that $\hat\beta_1$ is a consistent estimator for $\beta_1$ only if $\sigma_{PM} = 0$. Would you expect $\sigma_{PM}$ to be equal to, less than or greater than zero, and why?
Certainly not equal to zero. $\sigma_{PM}$ is the covariance between a variable which measures how regularly an individual sleeps under a net and the density of malaria-transmitting mosquitoes in the place where s/he lives. We expect this covariance to be positive. Keeping everything else constant, we would expect individuals to be more likely to use a bednet regularly if they know there is a lot of malaria around (note that this model is clearly too simple. In practice it is likely that areas with more malaria are also poorer, for instance, but in our model we are not taking income into account. However, here we are dealing with a simplified case).
18. (5 points) Given your response to the previous question, if the true model is (27) but you estimate $\beta_1$ using the wrong estimator in equation (28) (that is, if you use $\hat\beta_1 = \sum_{i=1}^{n} \left( P_i - \bar P \right) S_i \,/\, \sum_{i=1}^{n} \left( P_i - \bar P \right)^2$) will you end up overestimating or underestimating the actual benefit of sleeping under a bednet? Justify your argument both algebraically and intuitively.
We have argued that $\sigma_{PM}$ is likely positive, and $\beta_2$ is likely positive as well. So
$$plim\, \hat\beta_1 = \beta_1 + \beta_2 \frac{\sigma_{PM}}{\sigma_P^2} > \beta_1$$
and hence if we estimate the wrong model we end up overestimating $\beta_1$. Because $\beta_1 < 0$ (the coefficient measures how sleeping under a net reduces the probability of getting malaria), this means that we are underestimating the benefits from sleeping under a bednet. Intuitively, if we do not control for how much malaria there is in a given area, we may end up finding that ITNs do not protect from malaria because when in our regression we look at people who use ITNs we are looking at people who live in areas with more malaria. If we want to isolate the causal impact of using nets on the probability of falling sick with malaria we need to keep other confounding factors constant.
You want to study the relationship between hemoglobin level (Hb) and malaria in a sample of 2,651 individuals from rural Orissa (India). Hemoglobin is a protein that has the essential task of transporting oxygen in red blood cells. Low Hb values are associated with anemia, which if severe can have very serious health consequences. Given that the malaria parasite destroys red blood cells, malaria is usually associated with anemia. Table 1 reports the results of several OLS regressions. Each column represents a different model. The second row (denoted A) indicates the dependent variable. Malaria is a binary variable equal to 1 if the individual tests positive for malaria. Male is a dummy equal to one if the individual is a male, while Age indicates the individual's age in years. The second to last row (denoted B) indicates if the standard errors are heteroskedasticity-robust. All standard errors are indicated in brackets below the corresponding coefficient. In what follows let $\beta_X$ denote the regression coefficient for variable $X$, and let hats denote as usual estimates. So, for instance, in Column (1), $\hat\beta_{Malaria} = -0.216$.
1. (3 points) Using the results in Column (2), what is the interpretation of $\hat\beta_{malaria}$?
Individuals with malaria are predicted to have a hemoglobin level 0.216 lower than individuals who are not sick with malaria.
2. (3 points) Now compare the results in Columns (1) and (2). Does the comparison suggest that the regression residual is heteroskedastic? Explain.
Not really. The standard errors are almost identical, so in this regression heteroskedasticity is unlikely to be an important issue.
3. (3 points) Do you think the results in model (2) can be interpreted in a causal way? Justify your answer.
Certainly not. We are not controlling for a lot of factors which are likely to affect hemoglobin levels and which are also likely to be correlated with the included regressor, that is, with $Malaria$. The answer to the following question represents one example of such a factor.
4. (4 points) We know that anemia is a common health problem among the poor, both because of poor nutrition and because of infections. Suppose that poor individuals are also more likely to have malaria (for instance, because they are more likely to live in malarious areas, or they lack the means to protect themselves from mosquitoes). Under these conditions, how would the inclusion of income as regressor in model (2) affect $\hat\beta_{malaria}$? Explain.
According to this argument, if we added income to the regression its coefficient would be positive, while $cov(income, malaria) < 0$. Hence, the exclusion of income from the regression would lead to downward bias, because the sign of the omitted variable bias (OVB) would be negative. Hence, the inclusion of income in the regression should be expected to lead to a higher value of $\hat\beta_{malaria}$, that is, to a value closer to zero (but most likely still negative).
5. (5 points) Now compare the results in Columns (2) and (3). Would you conclude that in this sample males are more or less likely to have malaria than females? Explain.
Here we have once again to use the formula for the OVB. Here we see that the exclusion of $Male$ from the regression leads to downward bias in $\beta_{malaria}$ (because the coefficient for $Malaria$ increases when $Male$ is included). Because $\beta_{male} > 0$, the only way the exclusion of $Male$ can lead to a downward bias in $\beta_{malaria}$ is if $cov(malaria, male) < 0$. Both $Malaria$ and $Male$ are binary variables, so this means that $Malaria$ is more likely to be equal to one when $Male$ is equal to zero. In other words, females are more likely to have malaria than males in this sample.
6. (3 points) What is the interpretation of $\hat\beta_{malaria}$ in the model estimated in Column (4)?
In this model the dependent variable is in logarithms. So the result indicates that, conditional on gender, being sick with malaria predicts a 1.3% decline in hemoglobin levels.
7. (3 points) In the model estimated in Column (5), can you reject the null hypothesis that the regression is linear in age, using a 1% significance level?
The t-ratio is
$$t = \frac{0.002}{0.0001} = 20$$
so we certainly reject, even at the 1% level.
8. What is the interpretation of $\hat\beta_{Male \times \log(age)}$ in the regression estimated in Column (6)?
The coefficient indicates that a 1% increase in age, keeping all other regressors constant, increases the predicted value of Hb by $0.01 \times 0.635 = 0.00635$ more among males than among females. In other words, older individuals appear to have higher hemoglobin levels (keeping everything else constant) than younger ones, but this is more so among males than among females.
9. (5 points) You want to test the null hypothesis that the regression in Column (6) is the same for males and females. The result of the test is $F = 161.8$. Write down the null and the alternative hypothesis and determine whether you can reject the null hypothesis.
$$H_0: \beta_{(Malaria=1) \times (Male=1)} = \beta_{(Male=1)} = \beta_{(Male=1) \times \log(age)} = 0$$
$$H_1: \text{at least one of the above coefficients} \neq 0$$
This is a joint hypothesis with 3 restrictions. The critical value of an $F_{3,\infty}$ test using a 1% level is 3.78, so we certainly reject the null at any standard level of significance.
10. (5 points) Using again the results in model (6), calculate a 95% confidence interval for the predicted change in Hb associated with having malaria, for a male. The estimated covariance between $\hat\beta_{Malaria \times Male}$ and $\hat\beta_{Malaria}$ is equal to .012. You do not need to complete the final calculation to get full credit.
Here the confidence interval should be calculated as
$$\hat\beta_{Malaria \times Male} + \hat\beta_{Malaria} \pm 1.96 \sqrt{var\left( \hat\beta_{Malaria \times Male} + \hat\beta_{Malaria} \right)},$$
$$CI = (.186 + .038) \pm 1.96 \sqrt{0.108^2 + 0.235^2 + 2\,(.012)}$$
In all the models above, the standard errors have been estimated assuming that observations are iid. However, suppose that in your sample there are several cases where blood tests have been completed for more than one person within the same family. Suppose that the model is the following (for this question it does not matter what the regressor is):
$$y_{fi} = \beta_0 + \beta_1 X_{fi} + u_{fi}, \qquad (31)$$
$$\text{where } u_{fi} = \alpha_f + \varepsilon_{fi}.$$
So, in this model the error term $u_{fi}$ is the sum of two components: $\alpha_f$, which is an error term common to everyone in the same family, but uncorrelated across different families; and $\varepsilon_{fi}$, which is the usual iid error term component. You can assume that the iid errors $\varepsilon_{fi}$ are also uncorrelated with all the $\alpha_f$.
11. (3 points) Calculate $Cov(u_{fi}, u_{gj})$, where $fi$ and $gj$ denote two individuals that belong to two different families.
$$Cov(u_{fi}, u_{gj}) = cov\left( \alpha_f + \varepsilon_{fi},\ \alpha_g + \varepsilon_{gj} \right) = 0$$
because all elements are uncorrelated with each other by assumption.
12. (4 points) Calculate $Cov(u_{fi}, u_{fj})$, where $fi$ and $fj$ denote two individuals that belong to the same family.
$$Cov(u_{fi}, u_{fj}) = cov\left( \alpha_f + \varepsilon_{fi},\ \alpha_f + \varepsilon_{fj} \right) = cov(\alpha_f, \alpha_f) = var(\alpha_f) > 0.$$
All other elements are uncorrelated by assumption.
13. (3 points) In model (31), do you think the assumption that observations in your sample are iid is correct? Explain.
Definitely not. We have just shown that there is correlation between the residuals that belong to observations from the same family, so the sample is not an iid sample. Intuitively, information about someone from a given family $f$ should be informative about someone else in our sample that belongs to the same family.
Now let $A_i$ denote a binary variable equal to one if individual $i$ is anemic, that is, if the individual's hemoglobin level is low. Table 2 reports the results of different regressions, estimated using OLS (the "Linear Probability Model", LPM, in Column 1), logit (in Column 2) or probit (Columns 3 and 4).
14. (6 points) You need an estimate of the predicted increase in the probability of being anemic associated with having malaria, for an individual for whom log(income per head) is equal to the sample mean (the mean is equal to 6.35). Calculate the predicted change using the three models in columns (1), (2) and (3) of Table 2.
For the LPM model, the predicted increase is simply 0.03. For the logit model, we need to calculate the change as
$$\frac{1}{1 + e^{-1.807 - 0.137 + 0.422\,(6.35)}} - \frac{1}{1 + e^{-1.807 + 0.422\,(6.35)}} = 0.029$$
while for probit
$$\Phi\left( 1.055 + 0.083 - 0.251\,(6.35) \right) - \Phi\left( 1.055 - 0.251\,(6.35) \right) = 0.029.$$
Not surprisingly, the predicted impacts are almost identical across the different models, as is usually the case when the value of the regressors is chosen to be close to their means.
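The logit and probit calculations can be reproduced as in the sketch below. Table 2 is not reproduced here, so the coefficient signs used (positive intercept and malaria coefficient, negative coefficient on log income) are an assumption, chosen because they are the ones that reproduce the reported answer of 0.029 in both models.

```python
import math
from statistics import NormalDist

log_income = 6.35   # sample mean of log(income per head)

def logistic(v):
    """Logistic CDF, used for logit predicted probabilities."""
    return 1.0 / (1.0 + math.exp(-v))

# Logit: assumed coefficients (intercept, malaria, log income)
a0, a_mal, a_inc = 1.807, 0.137, -0.422
logit_effect = (logistic(a0 + a_mal + a_inc * log_income)
                - logistic(a0 + a_inc * log_income))

# Probit: assumed coefficients (intercept, malaria, log income)
b0, b_mal, b_inc = 1.055, 0.083, -0.251
phi = NormalDist().cdf
probit_effect = (phi(b0 + b_mal + b_inc * log_income)
                 - phi(b0 + b_inc * log_income))

lpm_effect = 0.03   # LPM: the coefficient is the predicted change itself
print(round(lpm_effect, 3), round(logit_effect, 3), round(probit_effect, 3))
```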
15. (4 points) Using the results in columns (3) and (4), test the null hypothesis that the coefficients for log(income per head)$^2$ and log(income per head)$^3$ are equal to zero, using a 5% level.
With the information at hand, the test can be performed using a likelihood ratio test. The unrestricted model is the one with the square and the cube, while the restricted model is the one without. So
$$LR = 2 \left[ \log L_U - \log L_R \right] = 2\,(-1601.31 + 1602.35) = 2.08$$
The null hypothesis imposes two restrictions, so we have to compare this value with the critical value of a $\chi^2(2)$ distribution, which is 5.99. Because $2.08 < 5.99$ we do not reject the null.
Now you have collected data on the prevalence of anemia from a large number $V$ of villages. Let $x$ denote the
number of individuals in the village who are not anemic (let us call them "healthy"). Based on preliminary
analysis, you know that the true density function of $x$ is well approximated by the following distribution:
$$f(x) = \lambda e^{-\lambda x}, \quad 0 \le x < \infty, \qquad (32)$$
where, however, you do not know the value of the parameter $\lambda$.
16. (4 points) Let $x_v$ denote the number of healthy individuals in village $v$. Write down the log-likelihood function
of your sample of villages, keeping in mind that you have an iid sample of villages. Explain your steps.
The density for one observation is $\lambda e^{-\lambda x_v}$, and given that observations are iid, the likelihood of the sample
can be written as the product of the village-specific likelihoods. Hence
$$L = \prod_{v=1}^{V} \lambda e^{-\lambda x_v}$$
and the log-likelihood is
$$\ln L(\lambda) = \sum_{v=1}^{V} \ln\left(\lambda e^{-\lambda x_v}\right) = V \ln\lambda - \lambda \sum_{v=1}^{V} x_v = V\left(\ln\lambda - \lambda \bar{x}\right).$$
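17. Show that $\hat{\lambda}_{MLE} = 1/\bar{x}$, that is, the MLE of $\lambda$ is equal to $1/\bar{x}$, where $\bar{x}$ is the sample mean of $x_v$ across
all villages in your sample.
To find the MLE we maximize the log-likelihood with respect to the parameter $\lambda$. So, we calculate the FOC and we solve:
$$\frac{\partial \ln L(\lambda)}{\partial \lambda} = V\left(\frac{1}{\lambda} - \bar{x}\right) = 0 \implies \hat{\lambda}_{MLE} = \frac{1}{\bar{x}}.$$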
18. (4 points) Calculate $\text{plim}\,\hat{\lambda}_{MLE}$. Justify your steps.
This is just a straightforward application of the LLN. We have iid observations so we know that
$$\text{plim}\,\bar{x} = E(x_v).$$
Then we also know that (by the properties of probability limits)
$$\text{plim}\,\hat{\lambda}_{MLE} = \text{plim}\,\frac{1}{\bar{x}} = \frac{1}{\text{plim}\,\bar{x}} = \frac{1}{E(x_v)}.$$
19. (4 points) What is the relationship between $\lambda$ (the true value) and the expected value of $x_v$? Justify your
answer.
We have just shown that $\text{plim}\,\hat{\lambda}_{MLE} = \frac{1}{E(x_v)}$. We also know that $\hat{\lambda}_{MLE}$ is consistent (because it is a maximum
likelihood estimator and we know that we are using the right likelihood). Hence, it must be that
$$\text{plim}\,\hat{\lambda}_{MLE} = \frac{1}{E(x_v)} = \lambda \implies E(x_v) = \frac{1}{\lambda}.$$
You could have also calculated the expected value directly from (32) above, by using the definition of
expectation, that is, by solving the integral
$$E(x_v) = \int_0^\infty x \lambda e^{-\lambda x}\,dx.$$
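The chain of results in questions 16 to 19 (the MLE is the inverse of the sample mean, it converges to the true parameter, and the mean of the distribution is the inverse of the parameter) can be illustrated with a small simulation. This is only a sketch: the true parameter value, the seed and the sample size below are arbitrary choices made for illustration, not values from the problem.

```python
import random

def exponential_mle(sample):
    """MLE of lambda for the density lambda * exp(-lambda * x): 1 / sample mean."""
    return len(sample) / sum(sample)

random.seed(0)
true_lambda = 0.5          # hypothetical true parameter
n_villages = 200_000       # hypothetical (large) number of villages

# Draw an iid sample from the density in (32).
sample = [random.expovariate(true_lambda) for _ in range(n_villages)]

x_bar = sum(sample) / len(sample)
lambda_hat = exponential_mle(sample)

# With a large sample, x_bar should be close to 1/lambda and lambda_hat close to lambda.
print(round(x_bar, 3), round(lambda_hat, 3))
```

With this many draws the sample mean lands near 1/0.5 = 2 and the estimate near 0.5, as the LLN argument predicts.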
Table 1
(1) (2) (3) (4) (5) (6)
(A) Dependent Variable Hb Hb Hb log(Hb) Hb Hb
Regressors
Malaria -0.216 -0.216 -0.145 -0.013 -0.189 -0.186
[0.114] [0.111] [0.105] [0.010] [0.113] [0.108]
Male 1.225 0.106 1.561 -0.23
[0.080] [0.007] [0.076] [0.141]
Malaria × Male 0.001 0.038
[0.236] [0.235]
Age 0.128
[0.007]
Age^2 -0.002
[0.0001]
log(age) 0.215
[0.033]
Male × log(age) 0.635
[0.052]
Constant 10.916 10.916 10.468 2.335 8.937 9.857
[0.039] [0.040] [0.041] [0.004] [0.079] [0.097]
(B) Heteroskedasticity-robust st. errors No Yes Yes Yes Yes Yes
R-squared 0.00 0.00 0.10 0.08 0.23 0.25
Table 2
                          (1)       (2)        (3)        (4)
                          LPM       Logit      Probit     Probit
malaria                   0.03      0.137      0.083      0.079
                          [0.027]   [0.129]    [0.079]    [0.079]
log(income per head)      -0.086    -0.422     -0.251     -1.504
                          [0.016]   [0.079]    [0.047]    [3.022]
log(income per head)^2                                    0.13
                                                          [0.449]
log(income per head)^3                                    -0.003
                                                          [0.022]
Constant                  0.841     1.807      1.055      -0.003
                          [0.102]   [0.504]    [0.300]    [0.022]
Log-likelihood                      -1602.18   -1602.35   -1601.31
11.7 Final, Fall 2008
This problem uses data from a sample of 22,445 zero- to 3-year-old Indian children. For simplicity, you can assume
that the data are iid throughout the problem, unless specified otherwise. Among other things, the dataset
includes information on the weight and height of each child, as well as measures of nutritional status which evaluate
the growth performance of the child relative to a reference population of healthy, well-fed children. Let $UW_i$ denote
a binary variable equal to one if child $i$ is underweight, that is, if the child's weight is very low relative to normal
standards for children of the same age and gender. Let also $ST_i$ denote a binary variable equal to one if a child
is stunted, that is, if the child's height is very low relative to normal standards for children of the same age and
gender. The following table shows the distribution of $UW_i$ and $ST_i$ in your sample.
              ST_i = 0   ST_i = 1
UW_i = 0      0.46       0.11
UW_i = 1      0.11       0.32
1. (4 points) Estimate the fraction of stunted children in the sample.
Solution:
$$\hat{E}(ST_i) = \hat{P}(ST_i = 1) = 0.11 + 0.32 = 0.43$$
2. (4 points) Calculate an estimate of $P(UW_i = 1 \mid ST_i = 1)$.
Solution:
$$\hat{P}(UW_i = 1 \mid ST_i = 1) = \frac{\hat{P}(UW_i = 1, ST_i = 1)}{\hat{P}(ST_i = 1)} = \frac{0.32}{0.43} = 0.74419$$
3. (5 points) Estimate the variance of $UW_i$.
Solution: This is a binary variable, so we can use the formula for the variance of a Bernoulli.
$$\widehat{Var}(UW_i) = \hat{P}(UW_i = 1)\left[1 - \hat{P}(UW_i = 1)\right] = 0.43(1 - 0.43) = 0.2451$$
4. (6 points) Estimate the covariance between $UW_i$ and $ST_i$.
Solution: we know that $cov(UW_i, ST_i) = E(UW_i\,ST_i) - E(UW_i)E(ST_i)$. Also, note that $UW_i\,ST_i$ is
different from zero only if both variables are different from zero. Hence
$$\widehat{cov}(UW_i, ST_i) = \hat{E}(UW_i\,ST_i) - \hat{E}(UW_i)\hat{E}(ST_i)$$
$$= \hat{P}(ST_i = 1, UW_i = 1) - \hat{P}(ST_i = 1)\hat{P}(UW_i = 1)$$
$$= 0.32 - 0.43(0.43) = 0.1351.$$
5. (6 points) Calculate a 95% confidence interval for $E(UW_i)$.
Solution: First, recall that we estimate expectations using sample means, so that
$$\hat{E}(UW_i) = \overline{UW} = \frac{1}{n}\sum_{i=1}^{n} UW_i.$$
We also know that the variance of the sample mean $\overline{UW}$ is $\frac{Var(UW_i)}{n}$. Hence the confidence interval is constructed as
$$\overline{UW} \pm 1.96\sqrt{\frac{\widehat{Var}(UW_i)}{n}} = 0.43 \pm 1.96\sqrt{\frac{0.2451}{22445}} = [0.42352, 0.43648]$$
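The answers to questions 1 to 5 all follow mechanically from the 2x2 table of joint frequencies. The sketch below recomputes them; the dictionary layout (keys ordered as underweight indicator, then stunting indicator) and the variable names are illustrative choices, not notation from the exam.

```python
import math

# Joint sample frequencies, keyed by (underweight, stunted).
joint = {(0, 0): 0.46, (0, 1): 0.11,
         (1, 0): 0.11, (1, 1): 0.32}
n = 22445  # sample size

p_stunted = joint[(0, 1)] + joint[(1, 1)]        # question 1
p_underweight = joint[(1, 0)] + joint[(1, 1)]
p_uw_given_st = joint[(1, 1)] / p_stunted        # question 2

var_uw = p_underweight * (1.0 - p_underweight)   # question 3 (Bernoulli variance)
cov_uw_st = joint[(1, 1)] - p_underweight * p_stunted  # question 4

# Question 5: 95% confidence interval for the mean of a binary variable.
half_width = 1.96 * math.sqrt(var_uw / n)
ci = (p_underweight - half_width, p_underweight + half_width)

print(round(p_stunted, 2), round(p_uw_given_st, 5), round(var_uw, 4),
      round(cov_uw_st, 4), tuple(round(b, 5) for b in ci))
```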
Now we want to see if the prevalence of undernutrition changes with age, and we estimate different regression
models where the dependent variable is $UW_i$. The results of the different regressions are listed in Table 1. As
usual, let $\beta_X$ denote the slope of a regression with respect to regressor $X$.
6. (5 points) Using the OLS model (1), calculate the predicted probability of being underweight for a girl, when
age is 12, 18 or 36 months. How does prevalence of underweight change with age?
$$\widehat{UW}_i(age = 12) = -0.011 + 0.02401(1) + 0.04724(12) - 0.00097(12^2) = 0.44021$$
$$\widehat{UW}_i(age = 18) = -0.011 + 0.02401(1) + 0.04724(18) - 0.00097(18^2) = 0.54905$$
$$\widehat{UW}_i(age = 36) = -0.011 + 0.02401(1) + 0.04724(36) - 0.00097(36^2) = 0.45653$$
According to these estimates, the shape of the relationship between $UW$ and age is parabolic and concave. The
prevalence of undernutrition first increases with age and then decreases.
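The three predictions, and the age at which the fitted parabola peaks, can be reproduced with a few lines. This is a sketch using the model (1) coefficients from Table 1; the function name `predicted_uw` and the peak-age calculation are illustrative additions.

```python
def predicted_uw(age_months, female=1):
    """Predicted probability of underweight from OLS model (1) in Table 1."""
    return (-0.011 + 0.02401 * female
            + 0.04724 * age_months - 0.00097 * age_months ** 2)

for age in (12, 18, 36):
    print(age, round(predicted_uw(age), 5))

# A concave quadratic b1*age + b2*age^2 peaks where the derivative is zero:
# age* = -b1 / (2*b2).
peak_age = 0.04724 / (2 * 0.00097)
print(round(peak_age, 1))
```

The turning point falls at roughly 24 months, which is why the prediction at 18 months exceeds those at both 12 and 36 months.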
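7. (5 points) Using again the OLS model (1), test the null hypothesis that the regression is linear in age, using
a 1% significance level.
The null hypothesis is $H_0: \beta_{age^2} = 0$. So the t-ratio is
$$t = \frac{\hat{\beta}_{age^2}}{se\left(\hat{\beta}_{age^2}\right)} = \frac{-0.00097}{0.00003} = -32.333.$$
The null is certainly rejected, since $|t|$ is far above the 1% critical value of 2.58.
8. What is the interpretation of $\hat{\beta}_{\log(age)}$ in model (2)?
A 1% increase in age in months increases the predicted prevalence of underweight by 0.01(0.1775) = 0.001775, that is, by about 0.18 percentage points.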
9. (5 points) Using model (3), test the null hypothesis that age affects boys' and girls' predicted
probability of being underweight in the same way.
The null hypothesis is $H_0: \beta_{Female \times \log(age)} = 0$. So the t-ratio is
$$t = \frac{\hat{\beta}_{Female \times \log(age)}}{se\left(\hat{\beta}_{Female \times \log(age)}\right)} = \frac{0.0085}{0.0059} = 1.4407.$$
So the null cannot be rejected at any standard significance level.
10. (6 points) Using both regression models (3) and (4), calculate the difference in the predicted probability of
being underweight between a 12 and an 18-month old boy.
Using the linear model (3), we have
$$-0.0292 + 0.1735\ln(18) - \left[-0.0292 + 0.1735\ln(12)\right] = 0.1735\ln\left(\frac{18}{12}\right) = 0.070348.$$
Using the probit model (4) we have
$$\Phi(-1.6206 + 0.5332\ln 18) - \Phi(-1.6206 + 0.5332\ln 12) = \Phi(-0.079454) - \Phi(-0.29565)$$
$$= 0.46833576 - 0.38374869 = 0.08458707.$$
11. (4 points) Are the results similar between the two models? Is this what you expected?
The results are quite close, which is what we usually expect. OLS and probit usually give very similar results,
as long as the predictions are being calculated for values of the regressors that are not too extreme.
12. (6 points) Using both regression models (3) and (4), calculate the predicted probability of being underweight
for a 1-month old boy. Are the results very different from each other? Is this what you expected?
Using the linear model (3), we have
$$-0.0292 + 0.1735\ln(1) = -0.0292.$$
Using the probit model (4) we have
$$\Phi(-1.6206 + 0.5332\ln 1) = \Phi(-1.6206) = 0.05255173.$$
The results are quite different! The OLS prediction is even negative. This is perhaps not expected but we
should not be too surprised. Here the prediction is being made for a fairly extreme value of the regressor.
Recall that children in the sample are 1-35 months old, so choosing children of age 1 means that we are choosing
values of the regressors which are close to the boundaries. For such values we know that the choice of model
may matter, and here it does!
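To see how the linear and probit models agree in the middle of the age range but diverge at the boundary, the predictions from questions 10 and 12 can be recomputed side by side. The sketch below uses the Table 1 coefficients for models (3) and (4); the function names are illustrative.

```python
import math

def normal_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def linear_prob(age, female=0):
    """Predicted probability from the linear model (3) in Table 1."""
    log_age = math.log(age)
    return -0.0292 + 0.0024 * female + 0.1735 * log_age + 0.0085 * female * log_age

def probit_prob(age, female=0):
    """Predicted probability from the probit model (4) in Table 1."""
    log_age = math.log(age)
    index = -1.6206 + 0.052 * female + 0.5332 * log_age + 0.0066 * female * log_age
    return normal_cdf(index)

# Question 10: change between 12 and 18 months for a boy (female = 0).
linear_diff = linear_prob(18) - linear_prob(12)
probit_diff = probit_prob(18) - probit_prob(12)

# Question 12: level at 1 month, where the linear model leaves the unit interval.
linear_at_1 = linear_prob(1)
probit_at_1 = probit_prob(1)

print(round(linear_diff, 4), round(probit_diff, 4))
print(round(linear_at_1, 4), round(probit_at_1, 4))
```

The linear prediction at 1 month is negative, while the probit prediction stays inside (0, 1) by construction.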
13. (5 points) Let $R_i$ denote a dummy variable equal to one if child $i$ lives in a rural area. How should you
modify model (2) if you wanted to test the null hypothesis that gender differences in underweight are different
in rural vs. urban areas, keeping age constant?
You should estimate a regression of $UW_i$ on $Female_i$, $\ln(age_i)$ and the interaction between $R_i$ and $Female_i$.
The null hypothesis would be that $\beta_{Female \times R} = 0$.
Now you want to study how the prevalence of underweight is related to episodes of intestinal disease. Let $D_i$
be a binary variable equal to one if the child had an intestinal disease (such as diarrhea) in the 3 months before
the survey. Table 2 reports the results of different regressions, all estimated using OLS. The "Asset index" is
an indicator of wealth (a larger number indicates more wealth).
14. (6 points) Looking at the results in model (5), does it look like $D_i$ significantly changes the probability that
a child is underweight? Evaluate both the statistical significance and the magnitude of the coefficient.
$D_i$ is certainly statistically significant at any standard significance level (the t-ratio is 7.28). The results
indicate that the probability of being underweight is about 3 percentage points higher for children who recently
had intestinal diseases. We know that overall 43 percent of children were underweight, so a 3 percentage point difference
is not huge (it's about 7 percent of the mean) but it's not negligible either.
15. (5 points) Do you think that the results in model (5) can be interpreted as suggesting that intestinal disease
is a cause of underweight? Explain.
Not at all. There are many other factors which are likely to affect $UW_i$ and which are also correlated with
$D_i$. For instance, we would expect children that live in poorer areas to suffer more from malnutrition and also
to suffer more from intestinal diseases.
16. (5 points) Compare the results in models (5) and (6). Does $\hat{\beta}_D$ change in the expected direction when you
include the asset index in the regression?
Yes it does, although the change is very small, and perhaps smaller in magnitude than we could have
expected. First, note that in model (6) $\hat{\beta}_{AssetIndex} < 0$ (wealth reduces underweight, conditional on $D_i$). Also,
we would expect children from richer families to suffer less from intestinal diseases, so we would expect
$cov(D_i, AssetIndex_i) < 0$. Hence, the OVB that results from excluding $AssetIndex$ from the regression should
be expected to be positive, and indeed with the inclusion of $AssetIndex$, $\hat{\beta}_D$ becomes smaller, as expected.
17. (5 points) Compare the results in models (6) and (7). Does $\hat{\beta}_{AssetIndex}$ change in the expected direction when
you include in the regression a binary variable equal to one when the child's mother is illiterate?
Yes it does. First, note that in model (7) $\hat{\beta}_{MotherIlliterate} > 0$. Also, we would expect households
where the mother is illiterate to be less wealthy, so we would expect $cov(MotherIlliterate_i, AssetIndex_i) < 0$.
Hence, the OVB that results from excluding $MotherIlliterate$ from the regression should be expected to be
negative, and indeed with the inclusion of $MotherIlliterate$, $\hat{\beta}_{AssetIndex}$ becomes larger (closer to zero), as
expected.
18. (4 points) Using the results in model (8), construct a test for the null hypothesis that mother's and father's
illiteracy predict equal increases in the probability of child underweight. You have estimated that
$$\widehat{Cov}\left(\hat{\beta}_{FatherIlliterate}, \hat{\beta}_{MotherIlliterate}\right) = -0.00002189.$$
You do not have to complete the calculation to get full credit.
Solution:
$$t = \frac{\hat{\beta}_{FatherIlliterate} - \hat{\beta}_{MotherIlliterate}}{\sqrt{se^2\left(\hat{\beta}_{FatherIlliterate}\right) + se^2\left(\hat{\beta}_{MotherIlliterate}\right) - 2(-0.00002189)}}$$
$$= \frac{0.04368 - 0.09974}{\sqrt{0.00846^2 + 0.00780^2 - 2(-0.00002189)}} = -4.2234$$
so the null is certainly rejected at any standard significance level (the final calculation was not necessary to
get full credit).
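The test statistic for the equality of two coefficients can be checked numerically. In the sketch below the covariance is entered with a negative sign: that sign is an assumption, chosen because it is the one that reproduces the reported magnitude of 4.2234 (minus signs are easily lost in this document's typesetting). Variable names are illustrative.

```python
import math

# Estimates and standard errors for model (8), as quoted in the solution.
b_father, se_father = 0.04368, 0.00846
b_mother, se_mother = 0.09974, 0.00780
cov_father_mother = -0.00002189  # assumed negative; see the note above

# Var(b1 - b2) = Var(b1) + Var(b2) - 2 Cov(b1, b2)
var_diff = se_father ** 2 + se_mother ** 2 - 2.0 * cov_father_mother
t_stat = (b_father - b_mother) / math.sqrt(var_diff)

print(round(t_stat, 4))
```

The absolute value is far above 2.58, so the null of equal coefficients is rejected even at the 1% level.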
You are worried that your estimates of the effect of $D_i$ on $UW_i$ may be biased by unobserved omitted factors
at the household level. For this reason, you re-estimate the model using Fixed Effects and Random Effects,
including only households where there are at least two children below 3 years of age. In this estimation, the
"group" is then represented by a household, while the "within-group" observations are the different children
within the same household. Keep in mind that a household may include more than one family, and that the
Asset index is household-specific. The following table shows the results.
(FE) (RE)
Had diarrhea recently 0.02934 0.02941
[0.01128] [0.00792]
Mother is Illiterate 0.0702 0.09801
[0.04061] [0.01447]
Asset index -0.03515
[0.00382]
Constant 0.35586
[0.01036]
Observations 5722 5722
Number of households 2731 2731
Standard errors in brackets
19. (4 points) The FE model has not estimated the slope $\beta_{AssetIndex}$. Why?
Solution: The problem states that the asset index has been calculated at the household level, so when we
estimate the FE model, everything that is invariant within the household will be absorbed by the fixed effects and the
corresponding slopes cannot be estimated.
20. (6 points) You perform a Hausman test, and the value of the test is 0.54 (note: this is the value of the test,
not the p-value). Based on this result, which model should you use, and why?
We know that the null of the Hausman test is that the fixed effects are not correlated with the regressors.
The test is distributed as a $\chi^2_m$, where $m$ is the number of slopes estimated in both FE and RE models. Here,
$m = 2$. The critical value for a $\chi^2_2$ test if we use a 10 percent significance level is 4.61, and our test is 0.54.
Therefore, we do not reject the null hypothesis at standard significance levels (the critical values would be
even larger with smaller significance levels). This suggests that in this case we should use RE, which not only
is consistent but will also be more efficient than FE.
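The Hausman decision can be verified in a couple of lines. As a sketch, this uses the closed form for a chi-square with 2 degrees of freedom, whose survival function is exp(-x/2), so the 10% critical value is -2 ln(0.10); variable names are illustrative.

```python
import math

hausman_stat = 0.54
degrees_of_freedom = 2  # slopes estimated in both FE and RE

# For 2 degrees of freedom, P(X > x) = exp(-x/2).
critical_10pct = -2.0 * math.log(0.10)
p_value = math.exp(-hausman_stat / 2.0)

reject_null = hausman_stat > critical_10pct
print(round(critical_10pct, 2), round(p_value, 3), reject_null)
```

The p-value is about 0.76, so there is no evidence against the RE assumption in this sample.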
21. (6 points) As a further robustness check, you would like to re-estimate the effect of $D_i$ on underweight using
instrumental variable estimation. A colleague suggests using, as instruments, the two binary variables $C_i$ and
$F_i$, where $C_i = 1$ if the child had recent episodes of cough or other respiratory ailments, while $F_i = 1$ if
the child recently had fever. Do you think these two variables satisfy the requirements for valid instruments?
Explain.
Both variables are likely to be relevant but not exogenous. Both are likely to be relevant, because they are
all measures of child health, and so we expect them to be strongly correlated with each other. But both are
unlikely to be exogenous. If we think that the prevalence of intestinal disease is likely correlated with omitted
variables (such as income, availability of health care and sanitation, quality of housing etc.) we should
expect these omitted factors to be correlated with $C_i$ and $F_i$ as well. So, these instruments are not likely to be valid.
22. (6 points) Consider now the simple OLS estimates in column (5), Table 2. If you were worried that this
model suffers from omitted variable bias, and you assume that $C_i$ and $F_i$ are valid instruments, would you
expect $\hat{\beta}_D$ to increase or decrease, if you estimate model (5) using 2SLS? Explain your argument carefully.
Based on the answers in the previous points, we mostly expect $\hat{\beta}_D$ to be biased upwards, because $D_i$ is likely
absorbing the impact on $UW_i$ of factors associated with poverty. Such factors will generally increase $UW$ and
be positively correlated with $D_i$. So, we would expect OLS estimates to be biased upwards. Hence, if the
instruments were valid, we would expect $\hat{\beta}^{IV}_D$ to be smaller than $\hat{\beta}^{OLS}_D$.
23. (5 points) You go ahead with your colleague's idea and you re-estimate model (5), Table 2, using $C_i$ and $F_i$
as instruments. You test the hypothesis that instruments are weak, and the result of the test is $F = 600.4$.
Explain how the test is performed, and whether you conclude that the instruments are weak.
Solution: As expected, the instruments are very strong. 600.4 is way larger than the rule-of-thumb
threshold of 10. The test for instrument weakness is performed after the first stage, by calculating, in our case, the
value of the F test of the null hypothesis that all instruments are not significant in a regression of $D_i$ on $C_i$
and $F_i$.
24. (6 points) Another colleague is not persuaded that the instruments are exogenous, so you also perform a test
of exogeneity. The result of the F-test is 5.21. Explain how the test is performed, and whether you conclude
that the instruments are exogenous, using a 5 percent level.
Solution: The test is performed after the second stage of 2SLS. First, the residuals are calculated as
$$\hat{u}_i = UW_i - \hat{\beta}^{2SLS}_0 - \hat{\beta}^{2SLS}_1 D_i.$$
Then $\hat{u}_i$ is regressed on $C_i$ and $F_i$, and the F test for the null hypothesis that both variables are not
significant is calculated. Here the result of the F-test is 5.21. Then we have to compare $mF$ with the critical
value for a $\chi^2_{m-k}$, where $m = 2$ is the number of instruments, and $m - k$ is the number of overidentifying
restrictions. Because 2(5.21) = 10.42, we reject the null hypothesis that the instruments are exogenous (as
expected!) at any standard significance level (the critical value for a $\chi^2_1$ for the most conservative 1% test is 6.63).
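The overidentification test boils down to comparing the number of instruments times the F statistic with a chi-square critical value. The sketch below does this with the standard library only, using the identity that a chi-square with 1 degree of freedom has survival function erfc(sqrt(x/2)). Taking one endogenous regressor (so one overidentifying restriction) follows the solution's use of one degree of freedom; variable names are illustrative.

```python
import math

def chi2_sf_1df(x):
    """P(X > x) for a chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

n_instruments = 2   # cough and fever indicators
n_endogenous = 1    # the intestinal disease indicator
f_stat = 5.21       # F test from the regression of 2SLS residuals on the instruments

j_stat = n_instruments * f_stat
overid_restrictions = n_instruments - n_endogenous
p_value = chi2_sf_1df(j_stat)

print(round(j_stat, 2), overid_restrictions, p_value < 0.01)
```

The statistic (10.42) exceeds the 1% critical value of 6.63, so exogeneity of the instruments is rejected.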
25. (5 points) Suppose that you know that the true value $\beta_D = 0.01$. In your sample, the 2SLS estimate
$\hat{\beta}_D = 0.02$. Does this imply that your estimator for $\beta_D$ is biased? Does it imply that your estimator is not
consistent? Explain.
Solution: The point estimate has nothing to do with either bias or consistency. Both are properties of
the estimator which do not depend on the actual point estimate we get. Indeed, our point estimate is pretty
much never equal to the true value, but still there are plenty of unbiased and consistent estimators out there
(just think of the sample mean).
Now you want to evaluate the height performance of children in your study population. Let $H_i$ denote a
measure of the height performance of a child relative to a reference of healthy and well fed children. You have
reasons to believe that $H_i$ is normally distributed. Recall that if a random variable $H$ is distributed normally
with expected value $\mu_H$ and variance $\sigma^2$, its density is
$$f(H) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{H - \mu_H}{\sigma}\right)^2}.$$
You already know that the (true) expected value of $H_i$ in the population is $\mu_H$, but you want to estimate the
variance using Maximum Likelihood. Recall that you can assume that the observations are iid.
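26. (4 points) Prove that the log-likelihood of your sample can be written as
$$\ln L\left(H_1 \ldots H_n \mid \sigma^2\right) = -n\ln\sqrt{2\pi} - \frac{n}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(H_i - \mu_H)^2$$
Solution: the density for a single observation is the one indicated above, so, given that we can assume iid
observations:
$$L\left(H_1 \ldots H_n \mid \sigma^2\right) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{H_i - \mu_H}{\sigma}\right)^2}$$
and taking logs
$$\ln L\left(H_1 \ldots H_n \mid \sigma^2\right) = \sum_{i=1}^{n} \ln\left(\frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{H_i - \mu_H}{\sigma}\right)^2}\right)$$
$$= \sum_{i=1}^{n}\left[-\ln\left(\sqrt{2\pi}\,\sigma\right) - \frac{1}{2}\left(\frac{H_i - \mu_H}{\sigma}\right)^2\right]$$
$$= -n\ln\sqrt{2\pi} - n\ln\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(H_i - \mu_H)^2$$
$$= -n\ln\sqrt{2\pi} - \frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(H_i - \mu_H)^2$$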
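27. (6 points) Prove that the MLE of the variance is $\hat{\sigma}^2_{MLE} = \frac{1}{n}\sum_{i=1}^{n}(H_i - \mu_H)^2$.
Solution: we have to take the first order condition with respect to $\sigma^2$ and solve
$$\frac{\partial \ln L\left(H_1 \ldots H_n \mid \sigma^2\right)}{\partial \sigma^2} = 0 \iff -\frac{n}{2\sigma^2} + \frac{1}{2(\sigma^2)^2}\sum_{i=1}^{n}(H_i - \mu_H)^2 = 0$$
$$\implies -n + \frac{1}{\hat{\sigma}^2_{MLE}}\sum_{i=1}^{n}(H_i - \mu_H)^2 = 0$$
$$\implies \hat{\sigma}^2_{MLE} = \frac{1}{n}\sum_{i=1}^{n}(H_i - \mu_H)^2$$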
28. (6 points) Is $\hat{\sigma}^2_{MLE}$ a consistent estimator of the true variance $\sigma^2$?
Solution: $\hat{\sigma}^2_{MLE}$ is consistent. We have iid observations, and by the LLN we know that (if the usual conditions
hold) the mean of iid observations converges in probability to the expectation of any one observation. Hence
$$\hat{\sigma}^2_{MLE} = \frac{1}{n}\sum_{i=1}^{n}(H_i - \mu_H)^2 \xrightarrow{p} E\left[(H_i - \mu_H)^2\right] = \sigma^2.$$
29. (5 points) Suppose that $\hat{\sigma}^2_{MLE} = 2.75$. Construct a test for the null hypothesis that $\sigma^2 = 3$. (This question
is relatively hard. You do not need to complete the final calculations to get full credit).
Solution: with the information at hand, we can use a likelihood ratio test. Recall that
$$LR = 2\left[\ln L_U - \ln L_R\right],$$
and that
$$\ln L\left(H_1 \ldots H_n \mid \sigma^2\right) = -n\ln\sqrt{2\pi} - \frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(H_i - \mu_H)^2.$$
Also, we have proved that the unrestricted MLE of the variance is $\hat{\sigma}^2_{MLE} = \frac{1}{n}\sum_{i=1}^{n}(H_i - \mu_H)^2$, so we can
write
$$\ln L\left(H_1 \ldots H_n \mid \sigma^2\right) = -n\ln\sqrt{2\pi} - \frac{n}{2}\ln\sigma^2 - \frac{n}{2\sigma^2}\left[\frac{1}{n}\sum_{i=1}^{n}(H_i - \mu_H)^2\right]$$
$$= -n\ln\sqrt{2\pi} - \frac{n}{2}\ln\sigma^2 - \frac{n}{2\sigma^2}\hat{\sigma}^2_{MLE}.$$
So
$$LR = 2\left\{\left[-n\ln\sqrt{2\pi} - \frac{n}{2}\ln\hat{\sigma}^2_{MLE} - \frac{n}{2\hat{\sigma}^2_{MLE}}\hat{\sigma}^2_{MLE}\right] - \left[-n\ln\sqrt{2\pi} - \frac{n}{2}\ln 3 - \frac{n}{2(3)}\hat{\sigma}^2_{MLE}\right]\right\}$$
$$= 2\left\{-\frac{22445}{2}\ln 2.75 - \frac{22445}{2} - \left[-\frac{22445}{2}\ln 3 - \frac{22445}{2(3)}\,2.75\right]\right\}$$
Note that in principle one could use the formula from the very last question (n. 33) to construct a test.
However, such an expression (which still would yield partial credit) was not acceptable as an answer to this
question, because the expression in (33) requires knowledge of $\frac{1}{n}\sum_{i=1}^{n}\left[(H_i - \mu_H)^2 - \hat{\sigma}^2_{MLE}\right]^2$, which you do not
observe here. As you can see above, the LR test can instead be calculated with the data at hand.
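The final LR calculation, which the solution leaves unevaluated, can be completed numerically. The sketch below writes the normal log-likelihood as a function of a candidate variance and of the unrestricted MLE, which is all the data it needs; the function name and the comparison with the 5% chi-square critical value for one restriction (3.84) are additions for illustration.

```python
import math

def normal_loglik(sigma2, sigma2_mle, n):
    """Normal log-likelihood with known mean, evaluated at variance sigma2.

    Uses sum((H_i - mu)^2) = n * sigma2_mle, so only the MLE is required.
    """
    return (-n * math.log(math.sqrt(2.0 * math.pi))
            - 0.5 * n * math.log(sigma2)
            - n * sigma2_mle / (2.0 * sigma2))

n = 22445
sigma2_mle = 2.75   # unrestricted estimate
sigma2_null = 3.0   # value under the null hypothesis

lr_stat = 2.0 * (normal_loglik(sigma2_mle, sigma2_mle, n)
                 - normal_loglik(sigma2_null, sigma2_mle, n))

CRITICAL_5PCT_1DF = 3.84  # chi-square(1) critical value, one restriction
print(round(lr_stat, 1), lr_stat > CRITICAL_5PCT_1DF)
```

With these inputs the statistic comes out at roughly 82.6, far above 3.84, so the null that the variance equals 3 would be rejected.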
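30. (5 points) Now we need to derive the asymptotic distribution of $\hat{\sigma}^2_{MLE}$. First prove that
$$\sqrt{n}\left(\hat{\sigma}^2_{MLE} - \sigma^2\right) = \sqrt{n}\,\bar{v},$$
where $\bar{v}$ is the sample mean of $v_i = (H_i - \mu_H)^2 - \sigma^2$.
Solution:
$$\hat{\sigma}^2_{MLE} = \frac{1}{n}\sum_{i=1}^{n}(H_i - \mu_H)^2$$
$$\implies \hat{\sigma}^2_{MLE} - \sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left[(H_i - \mu_H)^2\right] - \sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left[(H_i - \mu_H)^2\right] - \frac{n}{n}\sigma^2$$
$$\implies \hat{\sigma}^2_{MLE} - \sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left[(H_i - \mu_H)^2 - \sigma^2\right]$$
$$\implies \sqrt{n}\left(\hat{\sigma}^2_{MLE} - \sigma^2\right) = \sqrt{n}\,\frac{1}{n}\sum_{i=1}^{n}\left[(H_i - \mu_H)^2 - \sigma^2\right] = \sqrt{n}\,\bar{v}$$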
31. (5 points) Prove that $E(v_i) = 0$.
Solution:
$$E(v_i) = E\left[(H_i - \mu_H)^2 - \sigma^2\right] = \underbrace{E\left[(H_i - \mu_H)^2\right]}_{=\sigma^2} - \sigma^2 = 0$$
32. (5 points) Prove that $\sigma^2_v = Var(v_i) = E\left\{\left[(H_i - \mu_H)^2 - \sigma^2\right]^2\right\}$.
Solution: We have just proved that $E(v_i) = 0$, so it follows that
$$Var(v_i) = E\left(v_i^2\right) - \Big[\underbrace{E(v_i)}_{=0}\Big]^2 = E\left(v_i^2\right) = E\left\{\left[(H_i - \mu_H)^2 - \sigma^2\right]^2\right\}$$
by the definition of $v_i$.
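33. (5 points) Using the results in the previous steps, prove that
$$\frac{\sqrt{n}\left(\hat{\sigma}^2_{MLE} - \sigma^2\right)}{\sigma_v} = \frac{\bar{v} - E(v_i)}{\sigma_v / \sqrt{n}}$$
Solution: We have already proved that $\sqrt{n}\left(\hat{\sigma}^2_{MLE} - \sigma^2\right) = \sqrt{n}\,\bar{v}$ and that $E(v_i) = 0$. But then
$$\frac{\sqrt{n}\left(\hat{\sigma}^2_{MLE} - \sigma^2\right)}{\sigma_v} = \frac{\sqrt{n}\,\bar{v}}{\sigma_v} = \frac{\sqrt{n}\,(\bar{v} - 0)}{\sigma_v} = \frac{\sqrt{n}\,(\bar{v} - E(v_i))}{\sigma_v} = \frac{\bar{v} - E(v_i)}{\sigma_v / \sqrt{n}}$$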
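34. Prove that $\sqrt{n}\left(\hat{\sigma}^2_{MLE} - \sigma^2\right) \xrightarrow{d} N\left(0, \sigma^2_v\right)$. Justify your steps!
Solution: By the Central Limit Theorem, we know that if we have a sample of iid random variables with
finite mean and variance it follows that
$$\frac{\bar{v} - E(v_i)}{\sigma_v / \sqrt{n}} \xrightarrow{d} N(0, 1).$$
But then, using the properties of variances, we have that
$$\sqrt{n}\left(\hat{\sigma}^2_{MLE} - \sigma^2\right) = \underbrace{\frac{\bar{v} - E(v_i)}{\sigma_v / \sqrt{n}}}_{\xrightarrow{d}\, N(0,1)}\,\sigma_v \xrightarrow{d} \sigma_v\, N(0, 1) = N\left(0, \sigma^2_v\right)$$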
35. (5 points) Based on the result in the previous step, and using the definition of $\sigma^2_v$, how would you calculate a
95% confidence interval for the variance $\sigma^2$? (Note: this question is worth fewer points than it should be
based on its difficulty, so plan accordingly. Hint: think about what the approximate value of the variance of
$\hat{\sigma}^2_{MLE}$ is in large samples, and think about how you would estimate all the elements that are unknown, that
is, those elements that need to be estimated)
Solution: We know that $\sqrt{n}\left(\hat{\sigma}^2_{MLE} - \sigma^2\right) \xrightarrow{d} N\left(0, \sigma^2_v\right)$. Therefore, in large but finite samples we also have
that the following is approximately true
$$\sqrt{n}\left(\hat{\sigma}^2_{MLE} - \sigma^2\right) \approx N\left(0, \sigma^2_v\right).$$
Using the properties of the normal distribution we have therefore
$$\hat{\sigma}^2_{MLE} - \sigma^2 \approx \frac{1}{\sqrt{n}}\, N\left(0, \sigma^2_v\right) = N\left(0, \frac{1}{n}\sigma^2_v\right)$$
and
$$\hat{\sigma}^2_{MLE} \approx N\left(0, \frac{1}{n}\sigma^2_v\right) + \sigma^2 = N\left(\sigma^2, \frac{1}{n}\sigma^2_v\right).$$
So, using the definition of $v_i$ and of $\sigma^2_v$, a 95% confidence interval will be
$$\hat{\sigma}^2_{MLE} \pm 1.96\sqrt{\frac{1}{n}\sigma^2_v} = \hat{\sigma}^2_{MLE} \pm 1.96\sqrt{\frac{1}{n} E\left\{\left[(H_i - \mu_H)^2 - \sigma^2\right]^2\right\}}.$$
However, in this expression there are several elements which are unknown. We have been assuming that we
know $\mu_H$, so this is not a problem. We do not know $\sigma^2$, but we have an estimate $\hat{\sigma}^2_{MLE}$. Finally, we can always
estimate an expectation with a sample mean. So
$$\hat{\sigma}^2_{MLE} \pm 1.96\sqrt{\frac{1}{n}\hat{\sigma}^2_v} = \hat{\sigma}^2_{MLE} \pm 1.96\sqrt{\frac{1}{n}\left\{\frac{1}{n}\sum_{i=1}^{n}\left[(H_i - \mu_H)^2 - \hat{\sigma}^2_{MLE}\right]^2\right\}}$$
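The plug-in confidence interval derived in this last question is straightforward to compute once individual observations are available. Since the exam does not provide them, the sketch below applies the formula to simulated data: the mean, standard deviation, seed and sample size are arbitrary illustrative choices, and `variance_ci` is a hypothetical helper name.

```python
import math
import random

def variance_ci(heights, mu, z=1.96):
    """MLE of the variance (known mean mu) with its large-sample 95% interval.

    The variance of the estimator is estimated by the sample mean of
    ((H_i - mu)^2 - sigma2_hat)^2, divided by n.
    """
    n = len(heights)
    squared_dev = [(h - mu) ** 2 for h in heights]
    sigma2_hat = sum(squared_dev) / n
    sigma2_v_hat = sum((d - sigma2_hat) ** 2 for d in squared_dev) / n
    half_width = z * math.sqrt(sigma2_v_hat / n)
    return sigma2_hat, (sigma2_hat - half_width, sigma2_hat + half_width)

random.seed(1)
mu_true, sigma_true = 0.0, 1.5   # hypothetical population values
heights = [random.gauss(mu_true, sigma_true) for _ in range(50_000)]

sigma2_hat, (lower, upper) = variance_ci(heights, mu_true)
print(round(sigma2_hat, 3), round(lower, 3), round(upper, 3))
```

For normal data the variance of the squared deviations is twice the fourth power of the standard deviation, so the interval width here should be close to 2(1.96) times the square root of 2(1.5^4)/50000, about 0.056.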
Table 1: Dependent variable: UW_i (=1 if child i is underweight)
                      (1)         (2)        (3)        (4)
                      OLS         OLS        OLS        Probit
Female                0.02401     0.0244     0.0024     0.052
                      [0.00628]   [0.0063]   [0.0148]   [0.0627]
Age                   0.04724
                      [0.00114]
Age^2                 -0.00097
                      [0.00003]
log(Age)                          0.1775     0.1735     0.5332
                                  [0.0029]   [0.0039]   [0.0153]
Female × log(Age)                            0.0085     0.0066
                                             [0.0059]   [0.0226]
Constant              -0.011      -0.0396    -0.0292    -1.6206
                      [0.00776]   [0.0078]   [0.0097]   [0.0425]
R-squared             0.10187     0.0958     0.0958
Log-likelihood                                          -14140.528
Robust standard errors in brackets
You are using a sample of 985 households who live in Delhi, India. The data set includes information on their total
monthly expenditure per head ($pce$) and their expenditure on food. Let $P_i$ be a binary variable equal to one if the
household is poor, where here a household is considered to be poor if its $pce$ is below the sample mean. Let also
$F_i$ denote a binary variable equal to one if the household spends more than 50% of its total budget on food. The
following table shows the joint distribution of $P_i$ and $F_i$ in your sample:
             F_i = 0    F_i = 1
P_i = 0      0.1878     0.1350
P_i = 1      0.1066     0.5706
1. (3 points) Estimate the fraction of households in your sample who spend more than 50% of their total budget
on food, that is, calculate $\widehat{\Pr}(F_i = 1)$.
Solution: This is just
$$\widehat{\Pr}(F_i = 1, P_i = 0) + \widehat{\Pr}(F_i = 1, P_i = 1) = 0.135 + 0.5706 = 0.7056.$$
2. (3 points) Estimate $E(P_i)$.
Solution: $P_i$ is a binary variable, so
$$\hat{E}(P_i) = \widehat{\Pr}(P_i = 1) = 0.1066 + 0.5706 = 0.6772.$$
3. (4 points) Estimate $E(F_i \mid P_i = 0)$ and $E(F_i \mid P_i = 1)$.
Solution: again it helps to note that $F$ is a binary variable. So
$$\hat{E}(F_i \mid P_i = 0) = \widehat{\Pr}(F_i = 1 \mid P_i = 0) = \frac{\widehat{\Pr}(F_i = 1, P_i = 0)}{\widehat{\Pr}(P_i = 0)} = \frac{0.135}{1 - 0.6772} = 0.41822.$$
Similarly
$$\hat{E}(F_i \mid P_i = 1) = \widehat{\Pr}(F_i = 1 \mid P_i = 1) = \frac{\widehat{\Pr}(F_i = 1, P_i = 1)}{\widehat{\Pr}(P_i = 1)} = \frac{0.5706}{0.6772} = 0.84259.$$
4. (3 points) Interpret the result in the previous point. For instance, what does it mean from an economic point
of view? Is it what you expected?
Solution: The relative magnitude of the two conditional expectations indicates that more than 80% of the
poor in this sample spend more than half of their total budget on food, while only about 40 percent of the
non-poor do. This was to be expected. This being a sample from a poor country, we would expect the poor
(because of their low income) to spend a very large fraction of their total outlay on food, which is a necessity.
5. (5 points) Estimate $\rho_{PF}$, that is, the correlation coefficient between $P_i$ and $F_i$.
Solution: Once again it helps to use the binary nature of the two random variables, which implies that the
expectation of their product is the probability that both are equal to one. Hence
$$\hat{\sigma}_{PF} = \hat{E}(P_i F_i) - \hat{E}(P_i)\hat{E}(F_i) = \widehat{\Pr}(P_i = 1, F_i = 1) - \widehat{\Pr}(P_i = 1)\widehat{\Pr}(F_i = 1)$$
$$= 0.5706 - 0.6772(0.7056) = 0.092768.$$
Also, we know that the variance of a binary variable with probability $p$ is $p(1 - p)$, so
$$\hat{\rho}_{PF} = \frac{\hat{\sigma}_{PF}}{\hat{\sigma}_P\,\hat{\sigma}_F} = \frac{0.092768}{\sqrt{0.6772(1 - 0.6772)\,0.7056(1 - 0.7056)}} = 0.43534.$$
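The probabilities, conditional expectations, covariance and correlation in questions 1 to 5 can all be recomputed from the joint table. In the sketch below the two indicators are labelled poor and food (keys ordered as poor, then food); these labels and the variable names are illustrative choices.

```python
import math

# Joint sample frequencies, keyed by (poor, spends more than 50% on food).
joint = {(0, 0): 0.1878, (0, 1): 0.1350,
         (1, 0): 0.1066, (1, 1): 0.5706}

p_food = joint[(0, 1)] + joint[(1, 1)]     # question 1
p_poor = joint[(1, 0)] + joint[(1, 1)]     # question 2

# Question 3: conditional expectations of a binary variable are conditional probabilities.
e_food_nonpoor = joint[(0, 1)] / (1.0 - p_poor)
e_food_poor = joint[(1, 1)] / p_poor

# Question 5: covariance and correlation of two binary variables.
cov_pf = joint[(1, 1)] - p_poor * p_food
corr_pf = cov_pf / math.sqrt(p_poor * (1.0 - p_poor) * p_food * (1.0 - p_food))

print(round(p_food, 4), round(p_poor, 4))
print(round(e_food_nonpoor, 5), round(e_food_poor, 5))
print(round(cov_pf, 6), round(corr_pf, 5))
```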
6. (5 points) Construct a 95% confidence interval for the fraction of households who are poor in the population.
Solution: Recall that this is just a confidence interval for a sample mean. Hence the confidence interval is
$$\widehat{\Pr}(P_i = 1) \pm 1.96\; s.e.\left[\widehat{\Pr}(P_i = 1)\right],$$
where
$$s.e.\left[\widehat{\Pr}(P_i = 1)\right] = \sqrt{\frac{\widehat{\Pr}(P_i = 1)\left[1 - \widehat{\Pr}(P_i = 1)\right]}{n}} = \sqrt{\frac{0.6772(1 - 0.6772)}{985}} = 0.014897$$
so that the CI is
$$[0.6772 - 1.96(0.014897),\; 0.6772 + 1.96(0.014897)] = [0.6480, 0.7064]$$
7. (4 points) Suppose that you know that mean $pce$ among the poor is 579 Rupees per person per month (this
is about 35 USD taking into account the difference in purchasing power between US and India), while you
know that among the non-poor the mean is 1832. Estimate $E(pce)$ in the population.
Solution: This is an application of the LIE, because we know that
$$E(pce) = E_P\left[E(pce \mid P)\right] = 0.6772(579) + (1 - 0.6772)1832 = 983.47$$
Let $FS_i$ denote the budget share spent on food by household $i$. That is
$$FS_i = 100 \times \frac{\text{Total monthly expenditure per head in food}_i}{pce_i}.$$
You estimate a linear regression of $FS_i$ on $pce_i$ (in 100 Rupees) and this is the result:
$$\widehat{FS}_i = \underset{(0.986)}{59.7} - \underset{(0.097)}{0.194}\; pce_i.$$
8. (3 points) What is the interpretation of the slope in this regression?
Solution: The result indicates that an increase in $pce$ of 100 Rs predicts a decrease of slightly less than 0.2
percentage points in the share of the budget spent on food.
9. (3 points) Does the intercept have a meaningful interpretation in this regression?
Solution: No. In this regression, the intercept indicates the food budget share for a household with zero
expenditure, which is not a very meaningful quantity to estimate.
10. (4 points) Is $pce_i$ significant at the 1, 5 and 10 percent level?
Solution: The t-ratio is
$$\frac{-0.194}{0.097} = -2,$$
so we reject the null that the slope is zero using a 10 or 5 percent level (barely, in this case) but we cannot
reject the null if we use a 1% level.
11. (4 points) Calculate the p-value for the two-sided test in the previous point.
Solution:
$$p = 2\Phi(-2) \approx 0.0455$$
12. (4 points) Estimate the budget share spent on food for a household whose per capita expenditure is Rs 5000
(recall that in the regression estimated above $pce_i$ denotes expenditure divided by 100).
Solution: $\hat{E}(FS_i \mid pce_i = 5000/100) = 59.7 - 0.194(50) = 50.0$.
13. (5 points) Estimate a 95% confidence interval for the prediction estimated in the previous question, taking into
account that the covariance between the estimated slope and intercept is $-0.085$. You do not need to complete
the calculation to get full credit. Just write down the correct formula and plug in the correct estimates.
Solution: $50.0 \pm 1.96\sqrt{0.986^2 + (50^2)\,0.097^2 + 2(50)(-0.085)} = 50 \pm 7.8387$.
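The prediction and its confidence interval can be reproduced as follows. This is a sketch: the covariance between intercept and slope is entered as negative, an assumption made because that sign is the one consistent with the reported half-width of 7.8387. Variable names are illustrative.

```python
import math

b0, se_b0 = 59.7, 0.986      # intercept and its standard error
b1, se_b1 = -0.194, 0.097    # slope on pce (in 100 Rupees) and its standard error
cov_b0_b1 = -0.085           # assumed negative; see the note above

x = 5000 / 100               # Rs 5000 expressed in hundreds of Rupees
prediction = b0 + b1 * x

# Var(b0 + x*b1) = Var(b0) + x^2 Var(b1) + 2 x Cov(b0, b1)
var_prediction = se_b0 ** 2 + x ** 2 * se_b1 ** 2 + 2.0 * x * cov_b0_b1
half_width = 1.96 * math.sqrt(var_prediction)

print(round(prediction, 1), round(half_width, 4))
print(round(prediction - half_width, 4), round(prediction + half_width, 4))
```

Ignoring the covariance term would overstate the uncertainty considerably, since it offsets more than a third of the two variance terms here.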
Now you estimate the following model,
$$pce_i = \beta_0 + \beta_s s_i + u_i, \qquad (33)$$
where the dependent variable is $pce_i$ (in hundreds of Rupees) and $s_i$ is household size, that is, the number of members
in household $i$. You estimate model (33) with OLS, and the results are the following:
$$\widehat{pce}_i = \underset{(0.64)}{12.3} - \underset{(0.29)}{0.60}\; s_i. \qquad (34)$$
14. (5 points) Calculate a 95% confidence interval for the difference in predicted $pce_i$ (in hundreds of Rupees) between a
household with 3 members and a household with 6 members. You do not need to complete the calculation to
get full credit. Setting up the problem correctly is sufficient.
Solution: First, note that the point estimate of the quantity for which we need a confidence interval is
$$-0.60(6 - 3) = -1.8.$$
The confidence interval is then
$$-1.8 \pm 1.96(3)(0.29) = -1.8 \pm 1.7052 = [-3.5052, -0.0948]$$
15. (4 points) A colleague sees the results of your regression and concludes that increased use of contraception,
by leading to lower family sizes, is very likely to be an effective development policy, because based on your
results reduced family size will certainly lead to increased household expenditure. Do you think this is a sound
argument? Justify your answer.
Solution: The argument is nonsense. The OLS results only document the existence of a negative correlation
between $pce$ and household size, but tell us nothing about the causal relationship between the two. Indeed,
many economists argue that the causal pathway goes from poverty to large family size, and not vice-versa.
16. (5 points) The OLS standard errors in equation equation (34) have been estimated using the formula we saw in
class. Specically, the variance of the slope has been estimated using the following large sample approximation:
\ ar
_
^
,
c
_
-
1
:
ar [(:
i
j
c
) n
i
]
(o
2
c
)
2
,
where j
c
= 1 (:
i
) and o
2
c
= \ ar (:
i
) . Assume that the error term n
i
satises the usual zero conditional mean
assumption, that is, assume that 1 (n
i
[:
i
) = 0. Prove that the following is true:
ar [(:
i
j
c
) n
i
] = 1
_
(:
i
j
c
)
2
n
2
i
_
.
Solution: First, recall that the variance of a random variable $Y$ can be written as $E(Y^2) - [E(Y)]^2$. Then
\[ \mathrm{Var}[(s_i - \mu_s) u_i] = E[(s_i - \mu_s)^2 u_i^2] - \{E[(s_i - \mu_s) u_i]\}^2. \]
Now consider the term in curly brackets and use the law of iterated expectations (LIE):
\[ E[(s_i - \mu_s) u_i] = E\{E[\underbrace{(s_i - \mu_s)}_{\text{constant, given } s_i} u_i \mid s_i]\} = E\{(s_i - \mu_s) \underbrace{E[u_i \mid s_i]}_{=0}\} = 0. \]
Then
\[ \mathrm{Var}[(s_i - \mu_s) u_i] = E[(s_i - \mu_s)^2 u_i^2] - 0^2 = E[(s_i - \mu_s)^2 u_i^2]. \quad \text{QED} \]
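The identity can also be checked numerically. The following sketch is not part of the original solution: it assumes a hypothetical design in which $s_i$ is uniform on {1, ..., 8} and $u_i = s_i e_i$ with $e_i$ standard normal, so that $E(u_i \mid s_i) = 0$ holds even though the error variance depends on $s_i$.

```python
import random

random.seed(0)
N = 100_000
MU_S = 4.5  # E(s) when s is uniform on {1, ..., 8} (assumed design)

# Heteroskedastic errors: E(u | s) = 0 but Var(u | s) = s**2.
s = [random.randint(1, 8) for _ in range(N)]
u = [si * random.gauss(0.0, 1.0) for si in s]

# z_i = (s_i - mu_s) * u_i, the term whose variance appears in the formula.
z = [(si - MU_S) * ui for si, ui in zip(s, u)]

mean_z = sum(z) / N                       # sample analogue of E[z]
mean_z2 = sum(zi * zi for zi in z) / N    # sample analogue of E[z^2]
var_z = mean_z2 - mean_z ** 2             # sample analogue of Var(z)

print(f"mean of z      : {mean_z:.4f}")
print(f"Var(z)         : {var_z:.4f}")
print(f"E[(s-mu)^2 u^2]: {mean_z2:.4f}")
```

The two last quantities differ only by the squared sample mean of $z_i$, which shrinks toward zero as the sample grows. Note that the check does not require the error variance to be constant.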
17. (3 points) Prove that $\mathrm{Var}(u_i \mid s_i) = E(u_i^2 \mid s_i)$.
Solution: Let us rewrite the variance using a formula analogous to the one we just used,
\[ \mathrm{Var}(u_i \mid s_i) = E(u_i^2 \mid s_i) - [E(u_i \mid s_i)]^2 \]
(recall that here all expectations need to be conditional). But then $\mathrm{Var}(u_i \mid s_i) = E(u_i^2 \mid s_i)$, because we know that $E(u_i \mid s_i) = 0$ by assumption.
18. (4 points) Suppose now that the variance of the error term $u_i$ does not depend on the regressor $s_i$, that is
\[ \mathrm{Var}(u_i \mid s_i) = E(u_i^2 \mid s_i) = E(u_i^2) = \sigma_u^2. \]
Prove that in this case we have
\[ \mathrm{Var}[(s_i - \mu_s) u_i] = \sigma_u^2 \sigma_s^2. \]
Solution: We already know that $\mathrm{Var}[(s_i - \mu_s) u_i] = E[(s_i - \mu_s)^2 u_i^2]$. Then, by using LIE again we have
\[ E[(s_i - \mu_s)^2 u_i^2] = E\{E[(s_i - \mu_s)^2 u_i^2 \mid s_i]\} = E\{(s_i - \mu_s)^2 \underbrace{E[u_i^2 \mid s_i]}_{=\sigma_u^2 \text{ by assumption}}\} = E[\sigma_u^2 (s_i - \mu_s)^2] = \sigma_u^2 \underbrace{E[(s_i - \mu_s)^2]}_{=\sigma_s^2} = \sigma_u^2 \sigma_s^2. \quad \text{QED} \]
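To see why the constant-variance assumption matters, the two sides can be compared exactly on a small discrete example. This is a sketch under assumed distributions that do not come from the problem: $s_i$ is uniform on {1, ..., 8}, and $u_i$ equals plus or minus a scale with equal probability, so $E(u_i \mid s_i) = 0$ in both cases. The helper `moments` is hypothetical.

```python
from fractions import Fraction

SIZES = range(1, 9)        # assumed support of s, each value with probability 1/8
P_S = Fraction(1, 8)
MU_S = sum(P_S * s for s in SIZES)                 # E(s) = 9/2
VAR_S = sum(P_S * (s - MU_S) ** 2 for s in SIZES)  # Var(s) = 21/4


def moments(scale):
    """Return (Var[(s - mu_s) u], sigma_u^2 * sigma_s^2) for u = +/- scale(s)."""
    # E(u | s) = 0 by symmetry, so Var[(s - mu_s) u] = E[(s - mu_s)^2 u^2],
    # and u^2 = scale(s)^2 with probability one given s.
    var_z = sum(P_S * (s - MU_S) ** 2 * Fraction(scale(s)) ** 2 for s in SIZES)
    var_u = sum(P_S * Fraction(scale(s)) ** 2 for s in SIZES)  # unconditional E(u^2)
    return var_z, var_u * VAR_S


homo = moments(lambda s: 2)    # Var(u | s) = 4 for every s
hetero = moments(lambda s: s)  # Var(u | s) = s^2, which depends on s

print("homoskedastic  :", homo)
print("heteroskedastic:", hetero)
```

In the first case the two numbers coincide, as the proof requires. In the second case they differ, because $E(u_i^2 \mid s_i)$ can no longer be pulled out of the outer expectation as a constant.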
19. (3 points) Using the results in the previous points, prove that, under the assumption that the variance of the error does not depend on the regressor (that is, under the assumption that $\mathrm{Var}(u_i \mid s_i) = \sigma_u^2$), the variance of the slope in your OLS regression can be obtained using the following large-sample approximation:
\[ \mathrm{Var}(\hat{\beta}_s) \approx \frac{1}{n} \frac{\sigma_u^2}{\sigma_s^2}. \]
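Solution: The general large-sample approximation is
\[ \mathrm{Var}(\hat{\beta}_s) \approx \frac{1}{n} \frac{\mathrm{Var}[(s_i - \mu_s) u_i]}{(\sigma_s^2)^2}. \]
But in question 18 we proved that if $\mathrm{Var}(u_i \mid s_i) = \sigma_u^2$ then $\mathrm{Var}[(s_i - \mu_s) u_i] = \sigma_u^2 \sigma_s^2$, so, substituting in, we have
\[ \mathrm{Var}(\hat{\beta}_s) \approx \frac{1}{n} \frac{\sigma_u^2 \sigma_s^2}{(\sigma_s^2)^2} = \frac{1}{n} \frac{\sigma_u^2}{\sigma_s^2}. \quad \text{QED} \]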
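A small Monte Carlo experiment gives a rough check of this approximation. The sketch below makes several assumptions that are not in the problem: $s_i$ is uniform on {1, ..., 8}, the errors are normal with standard deviation 2 and independent of $s_i$, and the intercept and slope are set to the point estimates in equation (34). The sample size and number of replications are arbitrary.

```python
import random
from statistics import pvariance

random.seed(1)
N_OBS, N_REPS = 200, 2000
SIGMA_U = 2.0   # assumed error standard deviation
VAR_S = 5.25    # Var(s) when s is uniform on {1, ..., 8}


def ols_slope(x, y):
    """OLS slope: sample covariance of (x, y) over sample variance of x."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


slopes = []
for _ in range(N_REPS):
    s = [random.randint(1, 8) for _ in range(N_OBS)]
    # Homoskedastic errors: Var(u | s) = SIGMA_U**2 for every s.
    pce = [12.3 - 0.60 * si + random.gauss(0.0, SIGMA_U) for si in s]
    slopes.append(ols_slope(s, pce))

simulated = pvariance(slopes)              # variance of the slope across samples
approx = SIGMA_U ** 2 / (N_OBS * VAR_S)    # (1/n) * sigma_u^2 / sigma_s^2

print(f"simulated variance of slope: {simulated:.6f}")
print(f"large-sample approximation : {approx:.6f}")
```

The two numbers should be close but not identical, since the formula is a large-sample approximation and the simulation has its own sampling noise.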
Table 2: Dependent variable: $UW_i$ (=1 if child i is underweight)

                                        (5)         (6)         (7)         (8)
D (had intestinal disease recently)   0.02992     0.02644     0.02582     0.02572
                                     [0.00411]   [0.00401]   [0.00400]   [0.00399]
Asset index                                      -0.05615    -0.04288    -0.04114
                                                 [0.00149]   [0.00171]   [0.00173]
Mother is illiterate                                          0.1127      0.09974
                                                             [0.00736]   [0.00780]
Father is illiterate                                                      0.04368
                                                                         [0.00846]
Constant                              0.41864     0.42263     0.36539     0.36065
                                     [0.00370]   [0.00362]   [0.00513]   [0.00520]
Observations                          22445       22445       22445       22445
R-squared                             0.00239     0.05087     0.0611      0.06229

Robust standard errors in brackets