
Economic Studies 2008–2009

60052 Cross Section Econometrics

This document provides answers to the May Examinations for 2007 and later. Answers
for ES5052 May 2006 and before are in a separate document.

Martyn Andrews and Mette Christensen, xx/05/2009.

Answers to May 2009 Examination

1. (a) Compute differences and 2 more columns:


t  i   ∆y   ∆x1  ∆x2  ∆x3  ∆y∆x1  (∆x1)²
1  1    .     .    .    .      .       .
2  1   -6     1    0    2     -6       1
1  2    .     .    .    .      .       .
2  2    4     1    0    2      4       1
1  3    .     .    .    .      .       .
2  3    0     0    0    2      0       0
1  4    .     .    .    .      .       .
2  4   -2    -1    0    2      2       1
1  5    .     .    .    .      .       .
2  5    2     0    0    2      0       0
Σ                              0       3
The FD estimator is given by:
β̂ = Σ ∆y∆x1 / Σ (∆x1)² = 0/3 = 0.

[4]. Dropping i = 3 makes no difference to either the numerator or the denominator,
ie β̂ = 0 [2]. [6 marks]
(b) ∆x3 = 2 for i = 1, . . . , 5 and ∆d2t = 1 for i = 1, . . . , 5.[2] Adding both
won't work because they are collinear with each other.[1] (Each on its own will
work because there is no constant.) [3 marks]
(c) Construct the following (see Exercise C):
∆y     ∆x1   ∆y∆x1   (∆x1)²
∆yi     0      0        0      (stayers)
 ..     ..     ..       ..
∆yi     0      0        0
∆yi    -1    -∆yi       1      (nq quitters)
 ..     ..     ..       ..
∆yi    -1    -∆yi       1
∆yi     1     ∆yi       1      (nj joiners)
 ..     ..     ..       ..
∆yi     1     ∆yi       1
Hence we can write:
β̂ = Σi ∆yi ∆x1i / Σi (∆x1i)² = ( Σ_{i∈j} ∆yi − Σ_{i∈q} ∆yi + 0 ) / ( nq + nj + 0 ).

By ignoring the observations for whom ∆x1i = 0, we can see that the top block (the
stayers) contributes nothing to β̂. [7 marks]
(d) Now construct mean-deviations:
t  i   y   x1  x2   x3    ÿ    ẍ1    ẍ2   ÿẍ1
1  1  10    0   1   24    3   -1/2    0   -3/2
2  1   4    1   1   26   -3    1/2    0   -3/2
1  2   3    0   1   34   -2   -1/2    0     1
2  2   7    1   1   36    2    1/2    0     1
1  3   1    0   0   51    0     0     0     0
2  3   1    0   0   53    0     0     0     0
1  4   5    1   0   44    1    1/2    0    1/2
2  4   3    0   0   46   -1   -1/2    0    1/2
1  5   2    1   0   17   -1     0     0     0
2  5   4    1   0   19    1     0     0     0
The FE estimator is given by:
β̂ = Σ ÿẍ1 / Σ (ẍ1)² = 0/(3/2) = 0.

[5]. This demonstrates that, for T = 2, FD and FE are identical.[2] One
cannot add x2 to this regression because ẍ2 = 0 for everyone. It is a male
dummy, possibly.[2] [9 marks]
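As a quick cross-check of the arithmetic, a minimal Python sketch (numpy only) that computes both estimators from the y and x1 values in the tables above; for T = 2 they coincide, as stated:

```python
import numpy as np

# Panel data from the question: 5 individuals, T = 2 periods.
y  = np.array([[10, 4], [3, 7], [1, 1], [5, 3], [2, 4]], dtype=float)
x1 = np.array([[0, 1], [0, 1], [0, 0], [1, 0], [1, 1]], dtype=float)

# First-difference (FD) estimator: regress Delta-y on Delta-x1, no constant.
dy, dx1 = y[:, 1] - y[:, 0], x1[:, 1] - x1[:, 0]
beta_fd = (dy @ dx1) / (dx1 @ dx1)                    # = 0/3 = 0

# Fixed-effects (FE) estimator: within (mean-deviated) regression.
y_dd  = y  - y.mean(axis=1, keepdims=True)            # y double-dot
x1_dd = x1 - x1.mean(axis=1, keepdims=True)           # x1 double-dot
beta_fe = (y_dd * x1_dd).sum() / (x1_dd ** 2).sum()   # = 0/(3/2) = 0

print(beta_fd, beta_fe)   # identical when T = 2
```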
2. (a) Just state that the Robust Covariance Matrix can be written as:
à n !
db X
Avar(β) = (X0 X)−1 b2 x0 xi (X0 X)−1
u i i
i=1

Also, because
  
b21 0 . . .
u 0 x1
n
X £ ¤ 0 u b22   x2 
b2i x0i xi =
u x01 x02 . . . x0n 
 ... ...   .  = X0 ΛX,
  .. 
i=1
0 b2n
u xn
this can be written:
db
Avar(β) = (X0 X)−1 X0 ΛX(X0 X)−1 .

Note that X0 is k × n, Λ = diag{b


u2i } is n × n, and X is n × k. [4 for either
formula]
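For illustration, a minimal numpy sketch of this sandwich formula (HC0, ie no degrees-of-freedom correction); the data are simulated, not taken from the exam:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
u = rng.normal(size=n) * (1 + np.abs(x))          # heteroskedastic errors
y = 1 + 2 * x + u

X = np.column_stack([np.ones(n), x])              # n x k design matrix
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)      # OLS coefficients
uhat = y - X @ beta_hat                           # residuals

XtX_inv = np.linalg.inv(X.T @ X)
meat = X.T @ (uhat[:, None] ** 2 * X)             # X' Lambda X, Lambda = diag(uhat^2)
avar_robust = XtX_inv @ meat @ XtX_inv            # (X'X)^-1 X'LX (X'X)^-1
se_robust = np.sqrt(np.diag(avar_robust))

# Traditional (homoskedastic) standard errors, for comparison.
sigma2 = uhat @ uhat / (n - X.shape[1])
se_trad = np.sqrt(np.diag(sigma2 * XtX_inv))
print(se_robust, se_trad)
```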
(b) Write the model as
yi = β0 + β1 xi + ui ,
and then write in mean deviation form

yi − ȳ = β1 (xi − x̄) + ui .

Now X′ is a 1 × n matrix:

X′ = [ x1 − x̄, x2 − x̄, . . . , xn − x̄ ],

which means that

X′X = Σ_{i=1}^n (xi − x̄)² = SSTx

and

(X′X)⁻¹ = 1 / Σ_{i=1}^n (xi − x̄)² = 1/SSTx.

Next,

X′ΛX = [ x1 − x̄, . . . , xn − x̄ ] diag{û1², . . . , ûn²} [ x1 − x̄ ; . . . ; xn − x̄ ] = Σ_{i=1}^n (xi − x̄)² ûi².

Finally,

(X′X)⁻¹ (X′ΛX) (X′X)⁻¹ = (1/SSTx) [ Σ_{i=1}^n (xi − x̄)² ûi² ] (1/SSTx).

QED. [8 marks]
(c) Write the model as:
y = β1 + β2 d2 + · · · + β5 d5 + u,
where the dk represent the age dummies. The traditional standard errors are
approximately 0.022 for all 4 dummies, whereas the robust equivalents are 0.018,
which is about 20% lower. This suggests that there is some heteroskedasticity.[4]
A formal test assumes that any heteroskedasticity is a linear function of the
dummies; the null of homoskedasticity is:

Var(u|d1, . . . , d5) = E(u²|d1, . . . , d5) = σ².

To implement the test, run the following regression:

û² = δ1 + δ2 d2 + · · · + δ5 d5 + v,

and test H0: δ2 = · · · = δ5 = 0 (4 restrictions) using an F-test. The observed
F statistic is 15.9 (p-value = 0.0000), and so there is considerable evidence of
heteroskedasticity.[4]
The regression also shows that there is considerable variation in the sample
variance of y across the 5 age categories: 0.17, 0.24, 0.32, 0.35, and 0.34.[2]
The point here is that, because there is heteroskedasticity, one cannot use this
F-test. One has to use the Robust Covariance Matrix and test the restrictions
jointly (a Wald test; the test command in Stata).[3] [I gave one mark if students note that
all the estimates are significant in the Robust regression.]
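The mechanics of the test can be sketched as follows (Python, simulated data with five age categories; all names and numbers here are illustrative, not the exam dataset):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
age_cat = rng.integers(0, 5, size=n)              # five age categories
D = np.column_stack([np.ones(n)] +
                    [(age_cat == j).astype(float) for j in range(1, 5)])
u = rng.normal(size=n) * np.sqrt(0.17 + 0.05 * age_cat)   # variance rises with age
y = D @ np.array([1.0, 0.1, 0.2, 0.3, 0.4]) + u

# Step 1: OLS of y on the dummies, keep the squared residuals.
b = np.linalg.solve(D.T @ D, D.T @ y)
u2 = (y - D @ b) ** 2

# Step 2: regress u^2 on the dummies and F-test the 4 dummy coefficients.
g = np.linalg.solve(D.T @ D, D.T @ u2)
ssr_u = ((u2 - D @ g) ** 2).sum()                 # unrestricted SSR
ssr_r = ((u2 - u2.mean()) ** 2).sum()             # restricted SSR (constant only)
q, df = 4, n - D.shape[1]
F = ((ssr_r - ssr_u) / q) / (ssr_u / df)
print(F)                                          # large F => reject homoskedasticity
```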

3. (a) The IV estimator is

β̂1,IV = Σ (zi − z̄)(yi − ȳ) / Σ (zi − z̄)(xi − x̄)     (1)

[1] and its estimated standard error is

se(β̂1,IV) = σ̂ / √(SSTx R²x,z)     (2)

[1], where R²x,z is the R² from regressing x on 1 and z [1], σ̂ is the estimated standard
error of u [1], and SSTx is the total sum of squares of x [1]. [5 marks]
(b) Replace z by x in the expression for β̂1,IV:

β̂1,OLS = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)².     (3)

[1] If x acts as its own IV, then z = x, so x and z are perfectly collinear and R²x,z = 1 [1].
Substituting,

se(β̂1,OLS) = σ̂ / √SSTx     (4)

[1] [3 marks]
(c) The ratio of estimated standard errors is:

se(β̂1,IV) / se(β̂1,OLS) = 1 / √R²x,z

[2]. As R²x,z ≤ 1, IV standard errors are always bigger [1]. (When R²x,z = 1, it is
OLS.) [3 marks]
(d) The OLS estimate of the return to education is 0.101 (0.0066), which is stan-
dard [1]. However, because u contains ability, which causes E(u|x) > 0, OLS
is biased upwards [1].
Both instruments are ‘family background’ variables. For either to be a valid IV,
i. E(u|z) = 0 [1], ie family background should be uncorrelated with ability.
This is the standard assumption [1].
ii. z does not belong in the model being estimated [1]. Employers do not give
extra pay because an individual has more/fewer siblings, or because they
come from a broken home [1].
iii. E(x|z) ≠ 0 [1]. In the RF regression of educ on the 2 IVs, they are
significant (t-stats −5.1 and −7.6), but the R² is worryingly low [1].
For the broken-home variable, the IV return to education is 0.221 (0.0474)
[1]. This moves the wrong way [1], and the IV se is 8 times bigger! [1]. For number
of siblings, the IV estimate doesn't move, ie it is 0.110 (0.0300), and the IV se
is 5 times bigger [1].
The effects of the 2 IVs on the IV return to education differ [1], but one cannot
choose between them because both are ‘weak’ and neither moves the IV estimate in
the ‘right’ direction [1].

(At this point one can get about 14 marks. 2 extra are awarded if one notes
that the model could be estimated using both IVs at the same time because
the model is then over-identified [1]. One would use GIVE/2SLS [1].)
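For reference, a simulated illustration of parts (a)-(c): OLS is biased upwards when the error contains 'ability', IV is consistent, and the IV standard error is inflated by roughly 1/√R²x,z (Python, numpy only; the data-generating values are made up for the sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
ability = rng.normal(size=n)                        # unobserved, ends up in u
z = rng.normal(size=n)                              # instrument: exogenous, shifts x only
x = 1 + 0.5 * z + 0.8 * ability + rng.normal(size=n)
y = 2 + 0.1 * x + 0.6 * ability + rng.normal(size=n)

# Simple IV and OLS slopes, formulas (1) and (3).
beta_iv  = ((z - z.mean()) @ (y - y.mean())) / ((z - z.mean()) @ (x - x.mean()))
beta_ols = ((x - x.mean()) @ (y - y.mean())) / ((x - x.mean()) ** 2).sum()

# Standard errors via (2) and (4): se = sigma_hat / sqrt(SST_x * R2_xz).
sst_x = ((x - x.mean()) ** 2).sum()
r2_xz = np.corrcoef(x, z)[0, 1] ** 2                # R^2 from regressing x on 1 and z
uhat_iv  = (y - y.mean()) - beta_iv  * (x - x.mean())
uhat_ols = (y - y.mean()) - beta_ols * (x - x.mean())
se_iv  = np.sqrt(uhat_iv  @ uhat_iv  / (n - 2)) / np.sqrt(sst_x * r2_xz)
se_ols = np.sqrt(uhat_ols @ uhat_ols / (n - 2)) / np.sqrt(sst_x)

print(beta_ols, beta_iv)   # OLS biased upwards, IV close to the true 0.1
print(se_iv / se_ols)      # roughly 1/sqrt(R2_xz) > 1
```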
4. Binary choice model compared with linear probability model.

(a) Men are more likely to have an affair than women, since the estimated coef-
ficient on male is positive. This can be seen from both the linear probability
model and from the logit model. The effect is not significant, though, since
the p-values are larger than 5% (0.081). The difference in the probability of having
an affair between men and women according to the linear probability model
is β̂male(OLS) = 0.0626079, i.e. men are 6.26% more likely to have an affair than
women. According to the logit model (other variables at their sample means):

P(y = 1 | male = 1, x̄) = F(β̂0 + β̂1 yrsmarr + β̂2 relig + β̂3 + β̂4 age)
= F(−.0198 + .1282(8.1777) + (−.3464)(3.1165) + .3521 + (−.039967)(32.4875))
= exp(−.9973)/(1 + exp(−.9973)) = .269473

P(y = 1 | male = 0, x̄) = F(β̂0 + β̂1 yrsmarr + β̂2 relig + β̂4 age)
= exp(−1.3494)/(1 + exp(−1.3494)) = .205969,

i.e. men are 6.35% more likely than women to have an affair.
(b) The effect of years of being married on the probability of having an affair for
25-year-old males:

∂P(y = 1 | male = 1, x̄, age = 25)/∂yrsmarr = β̂1 f(β̂0 + β̂1 yrsmarr + β̂2 relig + β̂3 + 25β̂4)
= (.1282) exp(−.6980)/(1 + exp(−.6980))² = .0284

And for 40-year-old females:

∂P(y = 1 | male = 0, x̄, age = 40)/∂yrsmarr = β̂1 f(β̂0 + β̂1 yrsmarr + β̂2 relig + 40β̂4)
= (.1282) exp(−1.64956)/(1 + exp(−1.64956))² = .0173

(c) The effect of years of being married on the probability of having an affair for
a ”very religious” person (coded 5):

∂P(y = 1 | relig = 5, x̄)/∂yrsmarr = β̂1 f(β̂0 + β̂1 yrsmarr + 5β̂2 + β̂3 male + β̂4 age)
= .1282 f(−.0197 + .1282(8.177) + (−.346)(5) + .3521(.4759) + (−.03997)(32.488))
= .1282 × exp(−1.8331)/(1 + exp(−1.8331))²
= .0152

The effect of years of being married on the probability of having an affair for
an ”anti-religious” person (coded 1):

∂P(y = 1 | relig = 1, x̄)/∂yrsmarr = β̂1 f(β̂0 + β̂1 yrsmarr + β̂2 + β̂3 male + β̂4 age)
= .1282 × exp(−.449)/(1 + exp(−.449))²
= .0305

(d) According to the LPM, all four effects are the same and equal to the coefficient on yrsmarr, β̂ = .02268.
(e) In the LPM, the marginal effects are constant, whereas in the logit model
they vary with the other observables. This is where the logit model becomes
richer than the LPM. From our logit estimates we can see that the effect of
yrsmarr on the probability of having an affair is very different for different
age, gender and religiosity groups (.0284, .0173, .0152 or .0305, depending
on age, gender and relig). These effects would all be .022 in the LPM.
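The probability and marginal-effect calculations in (a)-(c) can be reproduced mechanically; a small Python sketch, taking the logit coefficients and sample means as quoted above:

```python
import numpy as np

def logit_cdf(z):                  # F(z) = e^z / (1 + e^z)
    return 1.0 / (1.0 + np.exp(-z))

def logit_pdf(z):                  # f(z) = F(z) * (1 - F(z))
    p = logit_cdf(z)
    return p * (1.0 - p)

# Coefficients and sample means as reported in the answer above.
b0, b_yrs, b_relig, b_male, b_age = -0.0198, 0.1282, -0.3464, 0.3521, -0.039967
yrsmarr_bar, relig_bar, age_bar = 8.1777, 3.1165, 32.4875

idx_male   = b0 + b_yrs * yrsmarr_bar + b_relig * relig_bar + b_male + b_age * age_bar
idx_female = idx_male - b_male
print(logit_cdf(idx_male) - logit_cdf(idx_female))    # approx 0.0635

# Marginal effect of yrsmarr for a 25-year-old male, other variables at their means.
idx_25m = b0 + b_yrs * yrsmarr_bar + b_relig * relig_bar + b_male + b_age * 25
print(b_yrs * logit_pdf(idx_25m))                     # approx 0.0284
```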

5. (a) The appropriate sample selection model:

yi* = x1i′β1 + ε1     (structural equation)
hi* = x2i′β2 + ε2     (selection equation)
yi = yi*            if hi* > 0
yi not observed     if hi* ≤ 0

(ε1, ε2)′ ~ N( (0, 0)′ , Σ ),   where Σ = [ σ²11  σ12 ; σ12  1 ].

(b) The OLS conditional mean is E(y|x) = x1′β1. The conditional mean, conditional
on working, is

E(y|x1; h = 1) = . . . bookwork . . . = x1′β1 + σ12 φ(x2′β2)/Φ(x2′β2).

(c) No. The model in the output has x1 = x2, which means that σ12 is identified
only off the non-linearity of λ(·). But from the regression output in the
appendix, we can see that a regression of the estimated λ's on x1 = x2 has
an R² = .9696, i.e. λ(·) is almost linear. This means that identification is
very weak. A preferred model would have some extra identifying variables in
the selection equation which do not affect wages. One such variable could be
kidlt6. Another possibility is age.
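For reference, a stylised two-step sketch on simulated data (Python, scipy for φ and Φ). This is not the specification in the appendix output, and for brevity the first-step probit index is treated as known rather than estimated:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
n = 3000
x1 = rng.normal(size=n)                      # wage-equation regressor
x2 = rng.normal(size=n)                      # extra selection-equation regressor (exclusion)
e2 = rng.normal(size=n)
e1 = 0.5 * e2 + rng.normal(size=n)           # sigma_12 != 0, so OLS on workers is biased
observed = (0.5 + 0.8 * x1 + 1.0 * x2 + e2 > 0)   # wage seen only for these observations
y = 1.0 + 0.7 * x1 + e1

# Step 1 (simplified): inverse Mills ratio lambda = phi(index)/Phi(index).
index = 0.5 + 0.8 * x1 + 1.0 * x2            # in practice this comes from a probit
lam = norm.pdf(index) / norm.cdf(index)

# Step 2: OLS of y on (1, x1, lambda) using the selected sample only.
X = np.column_stack([np.ones(n), x1, lam])[observed]
b = np.linalg.lstsq(X, y[observed], rcond=None)[0]
print(b)   # slope near 0.7; coefficient on lambda estimates sigma_12 (about 0.5)
```

If the exclusion variable x2 were dropped so that x1 = x2, the constructed λ would be nearly a linear function of x1 and the second-step regression would be close to collinear, which is exactly the weak-identification point made above.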
6. (a) The kernel estimator:
f̂(x) = (1/nh) Σ_{i=1}^n k( (xi − x)/h ),

where h is the bandwidth and k(·) is the kernel function.
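A minimal implementation of this estimator (Python, Gaussian kernel; the data and bandwidth below are purely illustrative):

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)

def kernel_density(x_grid, data, h, kernel=gaussian_kernel):
    """f_hat(x) = (1/(n*h)) * sum_i k((x_i - x)/h), evaluated at each grid point."""
    u = (data[None, :] - x_grid[:, None]) / h     # (grid, n) matrix of scaled distances
    return kernel(u).sum(axis=1) / (len(data) * h)

data = np.array([0.65, 0.7, 0.8, 0.95, 1.05, 1.2, 1.35])   # illustrative sample
grid = np.linspace(0.4, 1.6, 7)
print(kernel_density(grid, data, h=0.1))
```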


(b) Bookwork. Students are not expected to derive the bias and variance, but they must know
the trade-off between bias and variance as a function of the bandwidth, and
possibly also provide the expressions for the bias and variance.
(c)

f̂(x) = (10/6) ×
    0,    x < .6
    .5,   .6 < x < .9
    1,    .9 < x < 1
    1,    1 < x < 1.3
    .5,   1.3 < x < 1.4
    0,    x > 1.4

Answers to May 2008 Examination

1. [Comment: I reallocated the marks after I realised that nobody was answering part
(f) as I expected. (f) is 2 less; (e) 2 more.]

(a) When either δ = 1 (trivial) or when xit and dit are (exactly) uncorrelated in the
sample.[3 marks] [Comment: this extremely basic idea has been emphasised
repeatedly in the course.] Basic OLS algebra applied to (2) gives

β̂ = ∆ȳ1 − ∆ȳ2,

where “1” denotes migrants and “2” denotes non-migrants.[3 marks] [This
algebra comes straight from the lecture notes. Some tried applying OLS
to (1) or the first part of (2).] An alternative answer that ignores the fact that xi2 is
a dummy is [3]:

β̂ = Σi (∆yi − ∆ȳ)(xi2 − x̄2) / Σi (xi2 − x̄2)².

[6 marks]
(b) Denote the estimators as Pooled OLS and FD respectively. For FD we need:

E(∆xi∆ui) = E[(xi2 − xi1)(ui2 − ui1)] = 0.

Four sufficient conditions are:

E(xi1 ui1) = 0
E(xi2 ui2) = 0
E(xi1 ui2) = 0
E(xi2 ui1) = 0.

Pooled OLS doesn't need the last two, and therefore rests on the weaker assumption.
[This algebra comes straight from the lecture notes on FD.]
[3 marks]
(c) The first Stata regression is (1) and the second is (2). Iow, the same model
using the same data is being estimated by OLS and FD resp. Because the
second model removes 3275 fixed effects ai, FD has 3275 fewer obs.[2 marks]
The F-statistic is

( (1551.63 − 195.23)/195.23 ) × ( (3275 − 2)/3274 ) = 6.946,

and so reject H0: ai = a.[2 marks]
[4 marks]
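A two-line check of this arithmetic, reproducing the numbers quoted above:

```python
# F test for H0: all the a_i are equal, from the two residual sums of squares.
ssr_pooled, ssr_fd = 1551.63, 195.23
F = ((ssr_pooled - ssr_fd) / 3274) / (ssr_fd / (3275 - 2))
print(F)   # approx 6.95
```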
(d) When the ai are not controlled for (Pooled OLS), β̂ = 0.200 and is almost
significant; when the ai are controlled for (FD), β̂ = −0.004 and is insignificant.
This is heterogeneity bias, because ai and xit are strongly correlated.[3 marks.
I gave 2 marks if they just said that the ai are jointly significant.]

(e) For a particular variable to be a valid IV, one needs to examine three condi-
tions. First, should nokid actually belong in the wage equation? Probably not;
employers don't give extra money for having kids.[2] Second, is nokid likely to
be correlated with 'motivation'? Again, probably not.[2] Thus far, nokid looks
like a valid IV.
Third, are nokid and move correlated? In a regression of move on nokid, there
is no evidence that having kids makes one more or less likely to migrate (t-
stat = 0.78) (even though a priori one might expect kids to act as a constraint
on migration). Also, the R-squared is 0.0035.[1] This is a weak instrument
[1], and it means that the standard error on move in the IV regression is 86.9,
compared with the OLS standard error of 0.104! [1]
[7 marks]
(f) Here the investigator is dealing with ai by differencing, but still believes that
E(∆xi∆ui) ≠ 0. As in (e), ∆nokid is a weak instrument for ∆move.[1] [1]
more mark is available for mentioning any of the following.
If 'motivation' can plausibly be modelled as a time-invariant effect, then it is very
important.
To deal with either E(xit uit) ≠ 0 or E(∆xi∆ui) ≠ 0, we learn nothing using
nokid as an IV.
There are only 22 migrants!
[2 marks]
[General comment: this question was badly answered, although all the material is
very familiar. Perhaps examining two major parts of the course, FD and IV, in
the same question confuses students.]
2. (a) The Traditional Covariance Matrix is given by [3 marks]:
σ²(X′X)⁻¹ = σ² [ n1  0 ; 0  n2 ]⁻¹ = σ² [ 1/n1  0 ; 0  1/n2 ]

The Robust Covariance Matrix is given by [3 marks]:

X′ΛX = [ 1 1 . . . 1 0 0 . . . 0 ; 0 0 . . . 0 1 1 . . . 1 ] diag{û1², . . . , ûn²} X = [ Σ_{i∈1} ûi²  0 ; 0  Σ_{i∈2} ûi² ],

where the rows of X are (1, 0) for group-1 observations and (0, 1) for group-2 observations.
Hence [3 marks]:

(X′X)⁻¹ X′ΛX (X′X)⁻¹ = [ 1/n1  0 ; 0  1/n2 ] [ Σ_{i∈1} ûi²  0 ; 0  Σ_{i∈2} ûi² ] [ 1/n1  0 ; 0  1/n2 ] ≈ [ σ̂1²/n1  0 ; 0  σ̂2²/n2 ].

With homoskedasticity, σ1² = σ2² = σ², which delivers the Traditional Covari-
ance matrix above.[1]
[Comment: this question was taken straight from Exercise B. It appears that
students just don’t engage with matrices anymore - maybe a downside of
Wooldridge’s book?]
[10 marks]
(b) There are two bits of evidence. First, to estimate Var(u|x1 , . . . , xk ), take the
squared residuals from the model and regress them on covariates. In this
particular case, the investigator is estimating:
Var(u|male) = γ0 + γ1 male.
The estimate on male has a p-value of 0.123, and so one does not reject H0:
Var(u|male) = Var(u|female).[3] Also, in the raw data, one can see that
se(logwahehr) is very similar for males and females (0.459 and 0.462) resp.[1]
Second, the Robust and Traditional standard errors are quite similar (percentage difference in parentheses):

male     0.0160       0.0160       (0%)
age      0.00520      0.00474      (9.7%)
agesq    0.0000626    0.0000583    (7.4%)

Iow, the heteroskedasticity is quite weak.[3]
[7 marks]
(c) i. False. The LPM always exhibits heteroskedasticity, but heteroskedasticity
does not affect unbiasedness or consistency in linear models.[3]
ii. True.[2]
iii. False. All inference is affected by heteroskedasticity (including Chow
tests).[3]
[8 marks]
3. [Most of this is taken verbatim from the slides.]
(a) We observe, for each individual, the observed outcome Yi , not both Yi (0) and
Yi (1):
Yi = Wi Yi (1) + (1 − Wi )Yi (0) (5)
[3 marks]
(b) The population average treatment effect (PATE) and the population
average treatment effect for the treated (PATT) are resp:

τP = E[Yi(1) − Yi(0)]        τP,T = E[Yi(1) − Yi(0) | Wi = 1]

The sample average treatment effect (SATE) and the sample average
treatment effect for the treated (SATT) are resp:

τS = (1/N) Σ_{i=1}^N [Yi(1) − Yi(0)]        τS,T = (1/NT) Σ_{i:Wi=1} [Yi(1) − Yi(0)],

where NT = Σ_{i=1}^N Wi is the number of treated individuals.

[6 marks]
(c) Assumption 1 (Unconfoundedness)
[Yi (1), Yi (0)] ⊥ Wi |Xi
This is called the conditional independence assumption or selection on
observables, or missing at random. Conditional on Xi , both Yi (1) and
Yi (0) are independent of Wi . Iow, for each value of Xi , treatment is assumed
to be randomly assigned. To see its link to standard exogeneity assumptions,
assume a constant effect and linear regression (see the slides).
Assumption 2 (Overlap)
0 < Pr(Wi = 1|Xi ) < 1
For any given Xi , there are both treated and untreated individuals in the
population. The slides illustrate this.
[5 marks]
(d) Define the PATE for given values of Xi = x as τ (x) ≡ E[Yi (1) − Yi (0)|Xi = x].
Then
τ (x) ≡ E[Yi (1) − Yi (0)|Xi = x]
= E[Yi (1)|Xi = x] − E[Yi (0)|Xi = x]
= E[Yi (1)|Xi = x, Wi = 1] − E[Yi (0)|Xi = x, Wi = 0] (a)
= E(Yi |Xi = x, Wi = 1) − E(Yi |Xi = x, Wi = 0) (b)
Unconfoundedness is used in line (a). Line (b) is definitional, given (5). Now
τ (x) can be estimated.
[5 marks]
(e) Suppose Xi comprises a single binary covariate, eg gender. Then, for men:
τ (1) = E(Yi |Xi = 1, Wi = 1) − E(Yi |Xi = 1, Wi = 0)
This is estimated by ȳ11 − ȳ01 where ȳ11 is the sample average Y for treated
men, and ȳ01 is . . . for untreated men.
This is a simple example of a matching estimator. We observe a sample
of men with treatment, with average (earnings, say) ȳ11 . As it is only gender
that is different between individuals, we compare ȳ11 with average earnings of
men without treatment, ȳ01 .
Every treated man is matched with an untreated man, because, within the
two groups, everybody is the same.
For women, τ(0) is estimated by ȳ10 − ȳ00.
Averaging over τ(1) and τ(0) gives the SATE:

τS = (N.1/N)(ȳ11 − ȳ01) + (N.0/N)(ȳ10 − ȳ00),

where N.1 and N.0 are the number of men and women in the sample resp, and
ȳ10, ȳ00 refer to treated and untreated women resp. Also, N = N.1 + N.0.

[6 marks]
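A sketch of this matching-by-group-means estimator on simulated data (Python; unconfoundedness holds by construction because treatment depends only on the observed gender dummy):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4000
male = rng.integers(0, 2, size=n)                     # X_i: a single binary covariate
w = rng.binomial(1, 0.3 + 0.4 * male)                 # treatment more common among men
y = 1.0 + 0.5 * male + 2.0 * w + rng.normal(size=n)   # constant treatment effect = 2

def mean_y(sel):
    return y[sel].mean()

tau_men   = mean_y((male == 1) & (w == 1)) - mean_y((male == 1) & (w == 0))
tau_women = mean_y((male == 0) & (w == 1)) - mean_y((male == 0) & (w == 0))

n_men, n_women = (male == 1).sum(), (male == 0).sum()
sate = (n_men / n) * tau_men + (n_women / n) * tau_women
print(tau_men, tau_women, sate)     # all close to 2
```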

4. (a) The three selection models are constructed such that the first model is the best
one: it satisfies the requirement that the explanatory variables in the wage equation are a strict
subset of the explanatory variables in the selection equation, so the selection is
identified off these extra variables and not just off the non-linearity of the
inverse Mills ratio. The second model also includes nwifeinc and kidlt6 in the wage
equation (hence violating the requirement that, preferably, the selection equation should contain
variables that are not in the wage equation). The third model does not
include the variables from the wage equation in the selection equation (hence
violating the requirement that the set of explanatory variables in the wage equation should be
a subset of the set of explanatory variables in the selection equation).
(b) Since the t test on the inverse Mills ratio comes out insignificant (t = 1.76),
we cannot reject the null hypothesis of no sample selection bias. Hence, the
potential problem of bias from simple OLS estimation is not a problem after
all.
(c) Comparing the estimated return to education from the OLS with the estimated
return to education from the selection model: β̂OLS = 0.09915, β̂Heckman =
0.1032796. Hence, almost no difference at all, as to be expected when the test
for sample selection bias comes out with the result that there is no bias.
(d) The potential problem with simply running an OLS of log wages on education
and other explanatory variables is that a substantial proportion of the women in
the data set (41.68%, nearly half!) do not work, hence we do not observe
their wages. This can potentially cause the OLS estimates to be biased. The
conditional mean estimated by OLS is

E(w|x) = x1′β1,

whereas the conditional mean when taking the selection properly into account
is

E(w|x, h = 1) = x1′β1 + σ12 λ(x2′β2),

where λ is the inverse Mills ratio. This shows that if σ12 ≠ 0, OLS will be
biased on the selected sample.
5. (a) The logit model models the probability of the baby being underweight, P(dw =
1|·). Since the coefficient on cigs is positive and cigs is significant (t = 2.22 or
t = 2.02), smoking during pregnancy does significantly increase the probability
of the baby being underweight.
(b) The difference in the probability of the baby being underweight between
not smoking and smoking a pack of cigarettes per day (using the sample means
of the other variables):

Pr(dw = 1 | 20 cigs, ·) − Pr(dw = 1 | 0 cigs, ·)
= F(β̂0 + β̂1 · 20 + β̂2 drink + β̂3 mage + β̂4 meduc + β̂5 npvis + β̂6 wh)
  − F(β̂0 + β̂2 drink + β̂3 mage + β̂4 meduc + β̂5 npvis + β̂6 wh)
= F(−.3316 + .0281 · 20 + .115 · .0816 − .004 · 29.5579 − .0234 · 13.7181 − .0304 · 11.1889 − .2618 · .8772)
  − F(−.3316 + .115 · .0816 − .004 · 29.5579 − .0234 · 13.7181 − .0304 · 11.1889 − .2618 · .8772)
= F(−.7765) − F(−1.3385)
= e^(−.7765)/(1 + e^(−.7765)) − e^(−1.3385)/(1 + e^(−1.3385))
= .3151 − .2078 = .1073

Between smoking 10 cigarettes and 20 cigarettes per day (using the sample
means of the other variables):

Pr(dw = 1 | 20 cigs, ·) − Pr(dw = 1 | 10 cigs, ·)
= F(−.7765) − F(−1.3385 + .0281 · 10)
= F(−.7765) − F(−1.0575)
= e^(−.7765)/(1 + e^(−.7765)) − e^(−1.0575)/(1 + e^(−1.0575))
= .3151 − .2578 = .0573

(c) The marginal effect of the number of prenatal visits on the probability of having
an underweight baby when the mother did not smoke during pregnancy
(using the sample means of the other variables):

∂P(dw = 1 | cigs = 0, x)/∂npvis = (−.0304) e^(−1.3385)/(1 + e^(−1.3385))² = (−.0304)(.1646) = −.005

And when the mother smoked a pack of cigarettes per day during pregnancy
(using the sample means of the other variables):

∂P(dw = 1 | cigs = 20, x)/∂npvis = (−.0304) e^(−.7765)/(1 + e^(−.7765))² = (−.0304)(.2158) = −.0066

[? MARKS]
(d) The same effects as in (b) and (c), but now from the linear regression model:

Pr(dw = 1 | cigs = 20, ·) − Pr(dw = 1 | cigs = 0, ·) = .0054 · 20 = .1080
Pr(dw = 1 | cigs = 20, ·) − Pr(dw = 1 | cigs = 10, ·) = .0054 · 10 = .054
∂P(dw = 1 | cigs = 0, x)/∂npvis = ∂P(dw = 1 | cigs = 20, x)/∂npvis = −.0049

The differences between the logit and linear probability models are small when
calculating the change in the probability of having an underweight baby between
not smoking and smoking a pack per day. However, there is rather a big difference
in the marginal effects of the prenatal visits between non-smokers and smokers
smoking a pack per day in the logit model, which is not captured by the linear
probability model. Thus, if the mother does not smoke, an extra prenatal visit
lowers the probability of her baby being underweight by 0.5% (.49%) according
to both the logit and LP models, whereas if the mother smokes a pack per day,
the logit model shows a larger effect of an extra prenatal visit (the probability
of having an underweight baby decreases by 0.66%) than the LPM (the probability
of having an underweight baby decreases by 0.49%). Thus, the logit model captures
that perhaps the prenatal visits have a larger effect (are more important) for
smokers than for non-smokers. This effect is not captured by the LPM.
6. Lecture notes on nonparametric kernel estimation: Write down the estimator. Ex-
plain intuitively what it does (done in lectures using the uniform kernel). Derive
(or state) the bias and variance of the estimator and show that the choice of bandwidth
introduces a trade-off: a small bandwidth gives small bias but large variance. The
choice of bandwidth is thus not trivial.

Answers to May 2007 Examination

1. (a)

β̂ = Txy/Txx ≡ Σi Σt (xit − x̄)(yit − ȳ) / Σi Σt (xit − x̄)²

β̂w = Wxy/Wxx ≡ Σi Σt (xit − x̄i)(yit − ȳi) / Σi Σt (xit − x̄i)²

β̂b = Bxy/Bxx ≡ Σi (x̄i − x̄)(ȳi − ȳ) / Σi (x̄i − x̄)²

where ȳi ≡ Σt yit/T, i = 1, . . . , N, and ȳ ≡ Σi Σt yit/NT.
[3 marks each]
(b) The first 2 by 2 matrix gives Tyy, Txy, and Txx; the second 2 by 2 matrix gives
Byy, Bxy, and Bxx; and the third 2 by 2 matrix gives Wyy, Wxy, and Wxx. This
is obvious from looking at the first 20 rows of the data (as listed).
Hence

β̂ = Txy/Txx = 0.071367/0.233713 = 0.3053.

β̂w = Wxy/Wxx = 0/0, ie, is not defined.

β̂b = Bxy/Bxx = 0.071367/0.233713 = 0.3053.

[2 marks each] Because there is no within variation, (a) β̂w is not defined, and
(b) β̂ = β̂b. [2] Why?

Txy = Bxy + Wxy = Bxy as Wxy = 0

Txx = Bxx + Wxx = Bxx as Wxx = 0.

[8 marks]
(c) The following formula helps, as well as using the equation in the question:

β̂g = (Wxy + θBxy) / (Wxx + θBxx),     θ = σu²/(σu² + Tσa²).

When λ = 1, θ = 0, and the RE estimator becomes FE. Why?

β̂g = Wxy/Wxx

and

yit − ȳi = β(xit − x̄i) + (vit − v̄i).

When λ = 0, θ = 1, and the RE estimator becomes OLS. Why?

β̂g = (Wxy + Bxy)/(Wxx + Bxx) = Txy/Txx

and

yit = β0 + βxit + vit.

[3 marks each]. Extra marks were given for noting that the output gives the
OLS/Between estimate of 0.3052 computed in part (b). Why? Because Wxy = Wxx = 0,

β̂g = θBxy/(θBxx) = Bxy/Bxx.

Also, an estimate of θ for the RE model can be computed:

θ̂ = 0.34825²/(0.34825² + 2.0 × 0.37330²) = 0.303,

with λ̂ = 1 − √θ̂ = 0.449. This doesn't help much.
[8 marks]
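A tiny numerical illustration of how θ moves the GLS/RE estimator between the FE and OLS limits, using the between moments reported above (the within moments are zero in these data):

```python
# Between and within cross-moments from the output quoted above.
B_xy, B_xx = 0.071367, 0.233713
W_xy, W_xx = 0.0, 0.0               # no within variation in these data

def beta_re(theta):
    """RE/GLS slope: (W_xy + theta*B_xy) / (W_xx + theta*B_xx)."""
    return (W_xy + theta * B_xy) / (W_xx + theta * B_xx)

theta_hat = 0.34825 ** 2 / (0.34825 ** 2 + 2.0 * 0.37330 ** 2)
print(theta_hat)                    # approx 0.303
for theta in (1.0, theta_hat):      # theta = 1 is OLS; any theta > 0 gives the same here
    print(beta_re(theta))           # 0.3053 in both cases
# theta = 0 (the FE limit) is not defined here: 0/0, as in part (b).
```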

2. (a) In order for a variable, z, to serve as a valid instrument for y2, the following
must be true:
i. The instrument must be exogenous:

Cov(z, u) = 0     (15.4)

ii. The instrument must be correlated with the endogenous variable y2:

Cov(z, y2) ≠ 0     (15.5)

iii. z does not belong in (15.2), the model being estimated.
The reduced form for y2 is

y2 = π0 + π1 z + v;

we sometimes refer to this regression as the first-stage regression (condition ii requires π1 ≠ 0).
Consider (15.2). Multiply by 1 and by z, and take expectations:

1:  E(y1) = β0 + β1 E(y2)     as E(u) = 0
z:  E(y1 z) = β0 E(z) + β1 E(y2 z) + E(uz)

Now subtract the first times E(z) from the second:

E(y1 z) − E(z)E(y1) = β1 [E(y2 z) − E(z)E(y2)] + E(uz)

or

Cov(z, y1) = β1 Cov(z, y2) + Cov(z, u).

Given our assumptions, then

β1 = Cov(z, y1)/Cov(z, y2).     (15.9)

Thus the IV estimator for β1 is

β̂1 = Σ (zi − z̄)(y1i − ȳ1) / Σ (zi − z̄)(y2i − ȳ2).     (15.10)

[7 marks]
(b) The correlate command suggests both IVs are weakly correlated with educa-
tion. However, one should regress education on both IVs. Now husband’s age
is not significant, and regional unemployment is. [1 mark for each var.]
Need to discuss whether either variable belongs in the wage equation. The
answer is ‘no’ and ‘no’, but some noted that in the Wage Curve literature
regional unemployment is in the wage equation. [1 mark for each var.]
Need to discuss whether either variable is correlated with unobserved abil-
ity/motivation. Again, the answer is ‘no’ and ‘no’, but one might argue that
some ‘clever’ women marry older men! [1 mark for each var.]
[6 marks]

(c) The OLS estimate of 0.108 is absolutely standard (an increase of one year's
schooling increases wages by 11%).[1 mark]
But it is probably biased upwards (too big) [1], because unobserved ability is
positively correlated with education [1].
If the two IVs satisfy the three conditions (part (a)), then the IV estimate
is consistent, and lower (0.033), as expected. However, the standard error
on education is 7 times bigger [1]. The 95% confidence interval (−0.17, 0.23)
contains all reasonable values, so we learn nothing from this regression [1]. This
is because the IVs are "weak" [1].
Bonus if students mention that one might prefer IV if

Corr(z, u)/Corr(z, x) < Corr(x, u).

As the IV gets weaker, Corr(z, x) → 0. Might prefer OLS when Corr(z, u) ≠ 0.[1]
(d) Taken straight from the lecture notes:
• If we do not have endogeneity, both OLS and IV are consistent
• However, OLS is preferred to IV if we do not have an endogeneity problem,
because the standard errors will be smaller.
• A test for endogeneity is rather simple. Idea of Hausman test is to see if
the estimates from OLS and IV are different
• Consider
y1 = β0 + β1 y2 + β2 z1 + β3 z2 + u1 , (15.49)
with reduced-form
y2 = π0 + π1 z1 + π2 z2 + π3 z3 + π4 z4 + v2 (15.50)
• If y2 is endogenous, ie E(y2 u1) ≠ 0, then v2 (from the reduced-form equation) and
u1 from the structural model will be correlated:

E(y2 u1) = π0 E(u1) + π1 E(z1 u1) + · · · + π4 E(z4 u1) + E(v2 u1)
= E(v2 u1)

• Save the residuals from the first stage and include them in the structural equa-
tion (which of course has y2 in it):

y1 = β0 + β1 y2 + β2 z1 + β3 z2 + δ1 v̂2 + u1,     (15.51)

• If the coefficient on the residual is statistically different from zero, reject
the null of exogeneity. Iow, H0: δ1 = 0 tests E(y2 u1) = E(v2 u1) = 0.

The regressions match these notes exactly, ie y1 is lwage, y2 is educ, z1 is exper
and z2 is expersq. The IV estimate is obtained twice (0.082) and is very similar
to OLS.
The crucial equation is (15.51): students need to explain why adding v̂2
tests for endogeneity and why it also generates the IV estimates. Then they need
to state that v̂2 is insignificant, so exogeneity is not rejected, which is why
OLS and IV are similar. However, this conclusion is again based on weak IVs.
[6 marks]
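A sketch of this regression-based (control function) test on simulated data (Python, numpy only; the names mirror the notation in (15.49)-(15.51), not the dataset in the appendix):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4000
z1, z2, z3, z4 = (rng.normal(size=n) for _ in range(4))
v2 = rng.normal(size=n)
u1 = 0.5 * v2 + rng.normal(size=n)                 # E(v2*u1) != 0, so y2 is endogenous
y2 = 0.2 + z1 + z2 + z3 + z4 + v2                  # reduced form (15.50)
y1 = 1.0 + 0.8 * y2 + 0.3 * z1 - 0.2 * z2 + u1     # structural equation (15.49)

def ols(X, y):
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    var_b = (e @ e / (len(y) - X.shape[1])) * np.linalg.inv(X.T @ X)
    return b, np.sqrt(np.diag(var_b))

# First stage: regress y2 on all the exogenous variables, save residuals v2_hat.
Z = np.column_stack([np.ones(n), z1, z2, z3, z4])
pi_hat, _ = ols(Z, y2)
v2_hat = y2 - Z @ pi_hat

# Add v2_hat to the structural equation and t-test its coefficient (H0: delta_1 = 0).
X = np.column_stack([np.ones(n), y2, z1, z2, v2_hat])
b, se = ols(X, y1)
print(b[-1] / se[-1])   # large |t| => reject exogeneity of y2
```

With a single endogenous regressor, the coefficient on y2 in this augmented regression equals the 2SLS/IV estimate, which is why the answer notes that the IV estimate is obtained twice in the output.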

3. Taken straight from the lecture notes. Policy analysis is discussed in both Pooled
Cross Sections and Panel Data parts.
4. (a) An expression for the log likelihood function of the logit model:
L(β) = Πi [F(xi′β)]^yi [1 − F(xi′β)]^(1−yi)

l(β) = Σi yi log F(xi′β) + Σi (1 − yi) log(1 − F(xi′β))
     = Σ_{yi=1} log F(xi′β) + Σ_{yi=0} log(1 − F(xi′β)),

so for the logit, with F(xi′β) = e^(xi′β)/(1 + e^(xi′β)):

l(β) = Σ_{yi=1} log[ e^(xi′β)/(1 + e^(xi′β)) ] + Σ_{yi=0} log[ 1/(1 + e^(xi′β)) ]
     = Σ_{yi=1} (xi′β) − Σ_{yi=1} log(1 + e^(xi′β)) − Σ_{yi=0} log(1 + e^(xi′β))
     = Σ_{yi=1} (xi′β) − Σ_{i=1}^n log(1 + e^(xi′β)).
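The final expression translates directly into code; a short sketch (numpy, with logaddexp for a numerically stable log(1 + e^z); the data below are made up):

```python
import numpy as np

def logit_loglik(beta, X, y):
    """l(beta) = sum_{y_i=1} x_i'beta - sum_i log(1 + exp(x_i'beta))."""
    xb = X @ beta
    return xb[y == 1].sum() - np.logaddexp(0.0, xb).sum()   # log(1+e^z) = logaddexp(0, z)

# Illustrative call: an intercept plus one regressor.
X = np.column_stack([np.ones(5), np.array([-1.0, 0.0, 0.5, 1.0, 2.0])])
y = np.array([0, 0, 1, 1, 1])
print(logit_loglik(np.array([0.1, 0.8]), X, y))
```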

(b) Interpreting the coefficient estimates from the logit and probit: The effect of
age is negative, ie the probability of being unemployed decreases with age.
Males are more likely to be unemployed than females. Having a university
degree lowers the probability of being unemployed.
(c) The probability of being unemployed when male:

p̂male = P (y = 1 | x1i = 1, x2i = age, x3i = educ)


= F (β̂0 + β̂1 + β̂2 x̄2 + β̂3 x̄3 )
= F (−1.5741 + .2149 + (−.0362)(38.8491) + (−.9473)(.1552))
= F (−2.9125)
e−2.9125
=
1 + e−2.9125
= .0515.

The probability of being unemployed when female:

p̂f emale = P (y = 1 | x1i = 0, x2i = age, x3i = educ)


= F (β̂0 + β̂2 x̄2 + β̂3 x̄3 )
= F (−2.9125 − .2149)
= F (−3.1274)
e−3.1274
=
1 + e−3.1274
= .0420.

The difference between genders is thus .0515 − .0420 = .010, ie women are 1%
less likely to be unemployed than men.

The effect of age on the probability of being unemployed:

∂P(y = 1 | x)/∂age = β̂2 f(β̂0 + β̂1 male + β̂2 age + β̂3 educ)
= β̂2 f(−1.5741 + .2149(.5216) + (−.0362)(38.8491) + (−.9473)(.1552))
= β̂2 f(−3.0145)
= β̂2 e^(−3.0145)/(1 + e^(−3.0145))²
= (−.0362)(.0446)
= −.00161,

i.e. one additional year of age decreases the probability of being unemployed
by 0.16%.
The difference between having a university degree or not for males:

P(y = 1 | x1 = 1, x2 = age, x3 = 1) − P(y = 1 | x1 = 1, x2 = age, x3 = 0)
= F(β̂0 + β̂1 + β̂2 age + β̂3) − F(β̂0 + β̂1 + β̂2 age)
= F(−3.7120) − F(−2.7647)
= e^(−3.7120)/(1 + e^(−3.7120)) − e^(−2.7647)/(1 + e^(−2.7647))
= −.0354;

similarly for women we find: P(y = 1 | x1 = 0, x2 = age, x3 = 1) − P(y =
1 | x1 = 0, x2 = age, x3 = 0) = −.0291, i.e. the effect is larger for men.
5. (a) The income elasticity e for tobacco resulting from OLS:

w = β0 + β1 ln x + β2 (children) + β3 (adults) + β4 (age) + u,

so

∂w/∂ ln x = β1,

i.e.

e = (1/w) ∂w/∂ ln x + 1 = β1/w + 1.

When calculating this, use the average budget share:

e = −.0142/.0322 + 1 = .5591.
(b) From the output it can be seen that there is a large proportion of zeroes (62%),
hence it is likely that OLS estimates are biased.

(c) A tobit model would be appropriate because of the large number of zeroes and
because, when the budget share is actually positive, we can regard it as a
continuous variable. The appropriate tobit model:

yi* = xi′β + εi,     εi ~ N(0, σ²)

yi = yi*  if yi* ≥ 0
yi = 0    if yi* < 0

(d) The income elasticity resulting from the tobit model: e_TOBIT = .2358.
This is smaller than the elasticity resulting from OLS, suggesting that in fact
tobacco is not that income-sensitive. The difference highlights the importance
of using the tobit model instead of OLS; OLS overestimates the income
elasticity.
6. Estimating a linear regression of wages on education levels, years of labour market
experience and age by OLS may not be appropriate because the sample of workers
may not be a random one: people select themselves into employment, and it is not
random who chooses to take a job/who can get a job, or not. Hence OLS may be
biased. The appropriate sample selection model:

hi* = x2i′β2 + ε2i     (selection equation)
wi* = x1i′β1 + ε1i     (wage equation)
wi = wi*, hi = 1              if hi* ≥ 0
wi not observed and hi = 0    if hi* < 0

One can show that the conditional mean is E(w|h = 1) = x1′β1 + σ12 λ(x2′β2) (average
wage, conditional on being in the labour force), which is different from the OLS
mean, which would be E(w) = x1′β1. Explanation of how Heckman's Two-Step
Estimator works: in particular, emphasize that if one can find no variables that
enter the selection equation but not the wage equation, identification will happen
off functional form only, which is not as desirable as having ”real” identification.
See also slides on Heckman Two-Step.
