Econometrics II, Solutions to PS4, Summer 2010.

Problem 1. The results from estimating a binary outcome model are as follows:

$$\widehat{\mathrm{Prob}}(y_i = 1 \mid x_i) = F(x_i\hat\beta) = F(0.30 + 0.60\,x_{1i} - 0.40\,x_{2i})$$

Denote by $p = \mathrm{Prob}(y_i = 1 \mid x_i)$.

Problem 1(a)

i Evaluating partial effects.


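A Then $\partial p/\partial x_1 = 0.60\,f(X\beta)$ and $\partial p/\partial x_2 = -0.40\,f(X\beta)$, where $X\beta = 0.30 + 0.60x_1 - 0.40x_2$ and $f = F'$ is the density. When $p = 0.80$, knowing that the estimates are from the probit model, we get (see Stata code for the details) $X\beta = \Phi^{-1}(0.80) = 0.8416$, so $f(X\beta) = \phi(0.8416) = 0.2800$ and thus:

$$\frac{\partial p}{\partial x_1} = 0.1680 \quad\text{and}\quad \frac{\partial p}{\partial x_2} = -0.1120.$$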

B When $x_1 = 1$ and $x_2 = 2$, then $X\beta = 0.1$, thus $f = \phi(0.1) = 0.3970$ and it follows that:

$$\frac{\partial p}{\partial x_1} = 0.2382 \quad\text{and}\quad \frac{\partial p}{\partial x_2} = -0.1588.$$

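C The same exercise can be done with $x_1 = 2$ and $x_2 = 2$, where $X\beta = 0.7$ and $f = \phi(0.7) = 0.3123$, giving the results:

$$\frac{\partial p}{\partial x_1} = 0.1874 \quad\text{and}\quad \frac{\partial p}{\partial x_2} = -0.1249.$$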

ii Evaluating $\mathrm{Prob}(y_i = 1)$ at $x_1 = 1$ and $x_2 = 2$ gives $p_1 = \Phi(0.1) = 0.5398$, and at $x_1 = 2$ and $x_2 = 2$ it gives $p_2 = \Phi(0.7) = 0.7580$. Thus $p_2 - p_1 = 0.2182$ is the change in p when $x_1$ increases by one unit. This lies between 0.1874 and 0.2382, the partial effects of $x_1$ on p evaluated at the end point and at the starting point, respectively.
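
The probit numbers in Problem 1(a) can be cross-checked outside Stata. The following is a minimal Python sketch using only the standard library; the helper names `index` and `probit_partial_effects` are illustrative and not part of the original solution.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: pdf plays the role of f, cdf of F


def index(x1, x2):
    """Estimated linear index X*beta from Problem 1."""
    return 0.30 + 0.60 * x1 - 0.40 * x2


def probit_partial_effects(xb):
    """Partial effects of x1 and x2 on p at a given index value."""
    density = std_normal.pdf(xb)
    return 0.60 * density, -0.40 * density


# Case A: p = 0.80 pins down the index through the inverse normal CDF.
xb_a = std_normal.inv_cdf(0.80)
effects_a = probit_partial_effects(xb_a)

# Cases B and C: indices at (x1, x2) = (1, 2) and (2, 2).
effects_b = probit_partial_effects(index(1, 2))
effects_c = probit_partial_effects(index(2, 2))

# Part ii: discrete change in p when x1 goes from 1 to 2 with x2 = 2.
discrete_change = std_normal.cdf(index(2, 2)) - std_normal.cdf(index(1, 2))

for label, value in [("A", effects_a), ("B", effects_b), ("C", effects_c)]:
    print(label, round(value[0], 4), round(value[1], 4))
print("discrete change", round(discrete_change, 4))
```

The discrete change is computed from the CDF directly, which is why it falls between the two derivative-based approximations.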

Problem 1(b). Now we assume that the results are from the logit model. We should evaluate the three terms at $x_1 = 1$ and $x_2 = 2$.

i $\partial p/\partial x_1 = 0.60\,p(1-p) = 0.60\,\lambda(X\beta)$ and $\partial p/\partial x_2 = -0.40\,p(1-p) = -0.40\,\lambda(X\beta)$. Thus

$$\lambda(X\beta) = \frac{\exp(X\beta)}{[1 + \exp(X\beta)]^2} = \frac{\exp(0.30 + 0.60(1) - 0.40(2))}{[1 + \exp(0.30 + 0.60(1) - 0.40(2))]^2} = 0.2494$$

Then we get $\partial p/\partial x_1 = 0.1496$ and $\partial p/\partial x_2 = -0.0998$.

ii $\partial(1-p)/\partial x_1 = -0.1496$ and $\partial(1-p)/\partial x_2 = 0.0998$.

iii $l = \log\left(\frac{p}{1-p}\right) = 0.30 + 0.60x_1 - 0.40x_2$, so $\partial l/\partial x_1 = 0.60$ and $\partial l/\partial x_2 = -0.40$.
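
The three logit quantities in Problem 1(b) can be reproduced with a few lines of Python. This is only a sketch with illustrative names (`logistic_density`, `dp_dx1` and so on are not from the original solution).

```python
import math


def logistic_density(xb):
    """lambda(xb) = exp(xb) / (1 + exp(xb))^2, which equals p * (1 - p)."""
    return math.exp(xb) / (1.0 + math.exp(xb)) ** 2


xb = 0.30 + 0.60 * 1 - 0.40 * 2  # index at x1 = 1, x2 = 2
lam = logistic_density(xb)

# i: effects on p
dp_dx1 = 0.60 * lam
dp_dx2 = -0.40 * lam

# ii: effects on 1 - p are the mirror image of those on p
dq_dx1 = -dp_dx1
dq_dx2 = -dp_dx2

# iii: the log-odds are linear in x, so the effects are the coefficients
dl_dx1, dl_dx2 = 0.60, -0.40

print(round(lam, 4), round(dp_dx1, 4), round(dp_dx2, 4))
```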

Problem 2. (Intro Wooldridge 17.2) The estimated model (inside the logit function):

$$X\hat\beta = -1.17 + 0.24 \cdot hsGPA + 0.00058 \cdot SAT + 0.073 \cdot study$$

In order to evaluate the effect of an increase in study by 5 hours, we need to compute the estimated probability at hsGPA = 3, SAT = 1200 and study = 5 and subtract it from the estimated probability at hsGPA = 3, SAT = 1200 and study = 10. We evaluate

$$\Lambda(hsGPA = 3, SAT = 1200, study = 10) = \Lambda(0.976) = \frac{\exp(0.976)}{1 + \exp(0.976)} \approx 0.726.$$

This is the estimated probability that a student-athlete with the given characteristics and study = 10 graduates in five years. Further, we evaluate the probability of graduating in five years for the same student but with study = 5,

$$\Lambda(hsGPA = 3, SAT = 1200, study = 5) = \Lambda(0.611) = \frac{\exp(0.611)}{1 + \exp(0.611)} \approx 0.648.$$

The difference in estimated probabilities is 0.726 − 0.648 = 0.078, or about 0.08.
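
The calculation can be sketched in Python as follows. The function names are illustrative, and the intercept of −1.17 is the value that reproduces the index values 0.976 and 0.611 quoted in the text.

```python
import math


def logistic_cdf(z):
    """Lambda(z) = exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + math.exp(-z))


def grad_index(hs_gpa, sat, study, intercept=-1.17):
    """Linear index inside the logit function (intercept is an assumption)."""
    return intercept + 0.24 * hs_gpa + 0.00058 * sat + 0.073 * study


p_study10 = logistic_cdf(grad_index(3.0, 1200, 10))
p_study5 = logistic_cdf(grad_index(3.0, 1200, 5))
effect = p_study10 - p_study5

print(round(p_study10, 3), round(p_study5, 3), round(effect, 3))
```

Because the logit function is nonlinear, the same 5-hour change in study would give a different effect at other values of hsGPA and SAT.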

Problem 2 Extra Question (a) In order to estimate the desired difference, we need to compute the estimated graduation probability at hsGPA = 3.0, SAT = 1000 and study = 10 and subtract this from the estimated probability at hsGPA = 3.5, SAT = 1000 and study = 10. We evaluate Λ(hsGPA = 3.0, SAT = 1000, study = 10) = 0.7027 and Λ(hsGPA = 3.5, SAT = 1000, study = 10) = 0.7271. The difference in estimated probabilities is then 0.0244.

Problem 2 Extra Question (b) To keep the graduation probability unchanged, we need to preserve the exact numerical value of the argument of the logit function, i.e. we need to solve the following equation:

$$-1.17 + 0.24 \cdot 2.0 + 0.00058 \cdot 1200 + 0.073 \cdot 5 = -1.17 + 0.24 \cdot hsGPA + 0.00058 \cdot 1200$$

The intercept and the SAT term cancel, so hsGPA = 2.0 + 0.365/0.24 ≈ 3.52. The individual's hsGPA would have to increase from 2.0 to about 3.52 to compensate for the drop in study from 5 hours to zero.
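
A small Python sketch of the same calculation is given below. The helper `compensating_gpa` is hypothetical and only rearranges the equation above; the intercept and the SAT term are omitted because they appear on both sides.

```python
def compensating_gpa(gpa_before, study_before, study_after,
                     b_gpa=0.24, b_study=0.073):
    """hsGPA that keeps the logit index unchanged after a change in study.

    Solves b_gpa * gpa_before + b_study * study_before
         = b_gpa * gpa_after  + b_study * study_after   for gpa_after.
    """
    return gpa_before + b_study * (study_before - study_after) / b_gpa


new_gpa = compensating_gpa(gpa_before=2.0, study_before=5, study_after=0)
print(round(new_gpa, 2))
```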

Problem 3. (Intro Wooldridge 17.15)

Problem 3(i). 166 out of 401 participated in the job training program.
The highest number of months a man participated was 24.

Problem 3(ii). The F-statistic for the joint significance of the explanatory variables is F(7, 393) = 1.40 with a p-value of 0.2027. Hence the explanatory variables are jointly insignificant even at the 20 per cent level.

Problem 3(iii). The likelihood ratio test statistic for the joint significance of the explanatory variables in the probit model is equal to 10.03 with a p-value of 0.1866, which is quite similar to the result obtained in part (ii).

Problem 3(iv). Since training eligibility appears to be unrelated to the demographic and pre-training variables considered above, the evidence is consistent with eligibility having been randomly assigned among the participants. One may therefore conclude that training eligibility can indeed be treated as exogenous for explaining the 1978 unemployment status.

Problem 3(v). The LPM results are as follows (standard errors in parentheses):

$$\widehat{unem78} = \underset{(0.0298)}{0.3447} - \underset{(0.0464)}{0.1037} \cdot train$$

As one can see, participating in the job training program lowers the estimated probability of being unemployed in 1978 by more than 10 percentage points: the probability of being unemployed without participation is 0.3447, and the training program reduces it to 0.2410. The difference is statistically significant at the 5 per cent level (t = −0.1037/0.0464 ≈ −2.23).

Problem 3(vi). The estimated probit model is (standard errors in parentheses):

$$P(unem78 = 1 \mid train) = \Phi\big(\underset{(0.0842)}{-0.4000} - \underset{(0.1358)}{0.3035} \cdot train\big)$$

Since the two models use different functional forms for the probability, the coefficient on train in the probit model cannot be compared directly with the LPM estimate above.

Problem 3(vii). The fitted probabilities for both functional forms are identical and equal to 0.3447 when train = 0 and 0.2409 when train = 1. This has to be the case because train is the only regressor and it is binary, so either method simply delivers the cell frequencies as the estimated probabilities. Hence it does not matter which functional form is used to measure the effect and statistical significance of the job training program. The LPM coefficient estimates are considerably more straightforward to interpret, though.
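
As a quick numerical check of this argument, the Python sketch below plugs the reported LPM and probit coefficients into their respective functional forms. The variable names are illustrative, and the coefficients are the rounded estimates quoted in parts (v) and (vi), so the two sets of fitted probabilities agree only up to rounding.

```python
from statistics import NormalDist

normal_cdf = NormalDist().cdf

# LPM: fitted probability is the intercept, plus the slope if train = 1.
lpm_untrained = 0.3447
lpm_trained = 0.3447 - 0.1037

# Probit: fitted probability is Phi(intercept), or Phi(intercept + slope).
probit_untrained = normal_cdf(-0.4000)
probit_trained = normal_cdf(-0.4000 - 0.3035)

print(round(lpm_untrained, 4), round(probit_untrained, 4))
print(round(lpm_trained, 4), round(probit_trained, 4))
```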

Problem 3(viii). The fitted values are no longer identical because the model is not saturated, that is, the explanatory variables are not an exhaustive, mutually exclusive set of dummy variables. But since the other explanatory variables are insignificant, the fitted values are highly correlated: the LPM and probit fitted values have a correlation of 0.9922.

Problem 4. We have an iid sample with N observations on $y_i$, a variable that describes the duration of unemployment (with $y_i > 0$ for all i). Assume that $y_i$ is distributed exponentially, with $f(y \mid \gamma) = \gamma \exp(-\gamma y)$. Derive the ML estimator of γ.

The likelihood for the sample of size N is obtained as follows:

$$L(y \mid \gamma) = \prod_{i=1}^{N} \gamma \exp(-\gamma y_i)$$

The log-likelihood is then equal to:

$$\ln L(y \mid \gamma) = \sum_{i=1}^{N} \ln[\gamma \exp(-\gamma y_i)] = \sum_{i=1}^{N} \ln\gamma - \gamma \sum_{i=1}^{N} y_i = N \ln\gamma - \gamma \sum_{i=1}^{N} y_i$$

Maximizing this log-likelihood w.r.t. γ yields:

$$\frac{\partial \ln L}{\partial \gamma} = \frac{N}{\gamma} - \sum_{i=1}^{N} y_i = 0$$

Hence

$$\hat\gamma_{MLE} = \frac{N}{\sum_{i=1}^{N} y_i} = \frac{1}{\bar{y}}$$
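
The closed-form estimator can be checked numerically. The Python sketch below uses a small made-up sample of durations (purely hypothetical data) and verifies that the score is zero at the estimate and that the log-likelihood is lower at nearby values of γ.

```python
import math


def exp_loglik(gamma, ys):
    """Log-likelihood N*ln(gamma) - gamma*sum(y) of an iid exponential sample."""
    return len(ys) * math.log(gamma) - gamma * sum(ys)


def exp_mle(ys):
    """Closed-form ML estimator: N / sum(y), i.e. 1 / sample mean."""
    return len(ys) / sum(ys)


durations = [0.5, 1.5, 2.0, 4.0]  # hypothetical unemployment spells, mean 2.0
gamma_hat = exp_mle(durations)

# First-order condition N/gamma - sum(y) evaluated at the estimate.
score = len(durations) / gamma_hat - sum(durations)

print(gamma_hat, score)
```

The second derivative, $-N/\gamma^2$, is negative for every γ > 0, so the stationary point is a maximum.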

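Problem 5. A latent variable $y_i^*$ is generated by

$$y_i^* = x_i\beta + \varepsilon_i$$

where $\varepsilon_i$ is $N(0, \sigma_i^2)$ and $\sigma_i^2 = \exp(z_i\gamma)$. We observe $y_i$ defined as $y_i = 1(y_i^* > 0)$, and the variables in the vectors $x_i$ and $z_i$. Write down the log-likelihood function of the model in terms of the parameters γ and β.

$$\mathrm{Prob}(y_i = 1 \mid x_i, z_i) = \mathrm{Prob}(y_i^* > 0 \mid x_i, z_i) = \mathrm{Prob}(\varepsilon_i > -x_i\beta) = 1 - \Phi\left(\frac{-x_i\beta}{\sigma_i}\right) = \Phi\left(\frac{x_i\beta}{\sigma_i}\right) = \Phi\left(\frac{x_i\beta}{\sqrt{\exp(z_i\gamma)}}\right),$$

where Φ is the standard normal CDF. In a similar way,

$$\mathrm{Prob}(y_i = 0 \mid x_i, z_i) = \mathrm{Prob}(y_i^* \le 0 \mid x_i, z_i) = \mathrm{Prob}(\varepsilon_i \le -x_i\beta) = \Phi\left(\frac{-x_i\beta}{\sigma_i}\right) = \Phi\left(\frac{-x_i\beta}{\sqrt{\exp(z_i\gamma)}}\right).$$

The log-likelihood function is then

$$\log L = \sum_{i=1}^{N} \Big\{ y_i \ln[\mathrm{Prob}(y_i = 1)] + (1 - y_i) \ln[\mathrm{Prob}(y_i = 0)] \Big\} = \sum_{i=1}^{N} \left\{ y_i \ln\Phi\left(\frac{x_i\beta}{\sqrt{\exp(z_i\gamma)}}\right) + (1 - y_i) \ln\Phi\left(\frac{-x_i\beta}{\sqrt{\exp(z_i\gamma)}}\right) \right\}.$$

Afterwards, we maximize log L with respect to the parameters γ and β.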
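
To make the objective function concrete, here is a minimal Python sketch that evaluates this log-likelihood for given parameter values. The function `het_probit_loglik`, the helper `dot` and the tiny data set are all hypothetical; the sketch only evaluates log L and leaves the maximization to a numerical optimizer of the reader's choice.

```python
import math
from statistics import NormalDist

normal_cdf = NormalDist().cdf


def dot(a, b):
    """Inner product of two equally long sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def het_probit_loglik(beta, gamma, y, X, Z):
    """Log-likelihood of the probit with Var(eps_i) = exp(z_i * gamma)."""
    total = 0.0
    for yi, xi, zi in zip(y, X, Z):
        sigma_i = math.sqrt(math.exp(dot(zi, gamma)))
        scaled_index = dot(xi, beta) / sigma_i
        # P(y = 1) = Phi(scaled_index), P(y = 0) = Phi(-scaled_index)
        prob = normal_cdf(scaled_index) if yi == 1 else normal_cdf(-scaled_index)
        total += math.log(prob)
    return total


# Hypothetical data: two observations, intercept plus one regressor in x,
# one variance regressor in z.
y = [1, 0]
X = [[1.0, 0.5], [1.0, -1.0]]
Z = [[2.0], [0.0]]
print(het_probit_loglik([0.2, 0.4], [0.1], y, X, Z))
```

Setting gamma to zero makes every $\sigma_i$ equal to one, so the function then returns the ordinary probit log-likelihood.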


Problem 6.

Problem 6(a).
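i We have the model $P(y_i = 1 \mid x_i, d_i) = \Phi(\alpha + \beta x_i + \gamma x_i^2 + \delta d_i)$. In order to find the partial effects of $x_i$ and $d_i$, we compute:

$$\frac{\partial P(y = 1 \mid x_i, d_i)}{\partial x_i} = \phi(\alpha + \beta x_i + \gamma x_i^2 + \delta d_i)(\beta + 2\gamma x_i),$$

$$\frac{\Delta P(y = 1 \mid x_i, d_i)}{\Delta d_i} = P(y = 1 \mid x_i, d_i = 1) - P(y = 1 \mid x_i, d_i = 0) = \Phi(\alpha + \beta x_i + \gamma x_i^2 + \delta) - \Phi(\alpha + \beta x_i + \gamma x_i^2).$$

ii By the same token, for $P(y_i = 1 \mid x_i, d_i) = \Phi(\alpha + \beta x_i + \gamma d_i + \delta x_i d_i)$, we get:

$$\frac{\partial P(y = 1 \mid x_i, d_i)}{\partial x_i} = \phi(\alpha + \beta x_i + \gamma d_i + \delta x_i d_i)(\beta + \delta d_i),$$

$$\frac{\Delta P(y = 1 \mid x_i, d_i)}{\Delta d_i} = P(y = 1 \mid x_i, d_i = 1) - P(y = 1 \mid x_i, d_i = 0) = \Phi(\alpha + \beta x_i + \gamma + \delta x_i) - \Phi(\alpha + \beta x_i).$$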

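Problem 6(b). To estimate either of the above partial effects, one would estimate a probit of $y_i$ on the regressors of the respective model ($x_i$, $x_i^2$ and $d_i$ in (i); $x_i$, $d_i$ and $x_i d_i$ in (ii)) to get $\hat\alpha$, $\hat\beta$, $\hat\gamma$ and $\hat\delta$. After that, one would evaluate the corresponding expressions at the chosen values of x and d (for example, their sample means) and at the estimated values of the model parameters.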

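Problem 6(c). We want to obtain the standard errors of the estimated partial effects from part (b). As the partial effects are non-linear functions of the estimated parameters, we have to use the delta method:

$$\text{if } \hat\beta \to N(\beta, var(\hat\beta)), \text{ then } g(\hat\beta) \to N\big(g(\beta),\; G'(\hat\beta)\, var(\hat\beta)\, G(\hat\beta)\big),$$

where $G(\hat\beta) = \partial g(\hat\beta)/\partial\beta$. For example, let $\theta = (\alpha, \beta, \gamma, \delta)'$. Then for the first partial effect from part (a):

$$\frac{\partial P(y = 1 \mid x_i, d_i)}{\partial x_i} \to N\big(g(\theta),\; G'(\hat\theta)\, var(\hat\theta)\, G(\hat\theta)\big),$$

where $g(\theta) = \phi(\alpha + \beta x_i + \gamma x_i^2 + \delta d_i)(\beta + 2\gamma x_i)$ and $G'(\hat\theta) = \left[\frac{\partial g}{\partial\alpha}, \frac{\partial g}{\partial\beta}, \frac{\partial g}{\partial\gamma}, \frac{\partial g}{\partial\delta}\right]$, and the standard error of $\partial P(y = 1 \mid x_i, d_i)/\partial x_i$ is just $\sqrt{G'(\hat\theta)\, var(\hat\theta)\, G(\hat\theta)}$. For the other three partial effects, we proceed analogously.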
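
The delta-method recipe for the first partial effect can be sketched in Python as follows. The parameter estimates and the covariance matrix are invented purely for illustration, and the function names are hypothetical. The gradient uses the identity $\phi'(u) = -u\,\phi(u)$.

```python
import math
from statistics import NormalDist

normal_pdf = NormalDist().pdf


def partial_effect_x(theta, x, d):
    """g(theta) = phi(a + b*x + c*x^2 + e*d) * (b + 2*c*x) for model (i)."""
    a, b, c, e = theta
    return normal_pdf(a + b * x + c * x * x + e * d) * (b + 2 * c * x)


def gradient(theta, x, d):
    """Analytical G = dg/dtheta, using phi'(u) = -u * phi(u)."""
    a, b, c, e = theta
    u = a + b * x + c * x * x + e * d
    m = b + 2 * c * x
    through_index = -u * normal_pdf(u) * m  # derivative via the index u
    return [through_index,
            through_index * x + normal_pdf(u),
            through_index * x * x + 2 * x * normal_pdf(u),
            through_index * d]


def delta_method_se(theta, cov, x, d):
    """sqrt(G' V G), with V the estimated covariance matrix of theta."""
    g = gradient(theta, x, d)
    n = len(g)
    quad = sum(g[i] * cov[i][j] * g[j] for i in range(n) for j in range(n))
    return math.sqrt(quad)


# Hypothetical estimates (alpha, beta, gamma, delta) and covariance matrix.
theta_hat = [0.1, 0.5, -0.05, 0.3]
cov_hat = [[0.04, 0.0, 0.0, 0.0],
           [0.0, 0.01, 0.0, 0.0],
           [0.0, 0.0, 0.0004, 0.0],
           [0.0, 0.0, 0.0, 0.02]]

se = delta_method_se(theta_hat, cov_hat, x=1.0, d=1)
print(partial_effect_x(theta_hat, 1.0, 1), se)
```

In practice the covariance matrix would come from the probit estimation output rather than being typed in by hand.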

Problem 7. (Advanced Wooldridge 15.7)

Problem 7(a). The results of estimated equation follow:

Table 1: Results of LPM estimations


arr86 coef. st.err. het.st.err.
const .361 .016 .017
pcnv -.154 .021 .019
avgsen .004 .006 .006
tottime -.002 .005 .004
ptime86 -.022 .004 .003
inc86 -.001 .000 .000
black .162 .024 .026
hispan .089 .021 .021
born60 .003 .017 .017
R2 0.08
obs. 2725

The estimated effect of increasing pcnv from 0.25 to 0.75 is −0.154 · (0.5) = −0.077, so the probability of arrest falls by about 7.7 percentage points. There are no important differences between the usual and the heteroskedasticity-robust standard errors.

Problem 7(b). We want to test the joint significance of avgsen and tottime using a robust and a non-robust test. The robust statistic and its p-value are obtained by running the test command after a regression estimated with the robust option. In both cases we do not reject H0 that avgsen and tottime are jointly insignificant.

Problem 7(c). The results of probit regression are in Table 2.


Table 2: Results of probit estimations


arr86 coef. st.err.
const -.314 .051
pcnv -.553 .072
avgsen .013 .021
tottime -.008 .019
ptime86 -.081 .018
inc86 -.005 .000
black .467 .072
hispan .291 .065
born60 .011 .056
Pseudo R2 0.08
obs. 2725

First, we obtain the sample averages of the variables avgsen, tottime, ptime86 and inc86:

variable mean
avgsen 0.632
tottime 0.839
ptime86 0.387
inc86 54.967

Now we compute the difference in the normal cdf at the two values of pcnv (0.25 and 0.75), setting black = 1, hispan = 0 and born60 = 1 and holding the remaining variables at their average values. The probability falls by about 0.10, which is larger in magnitude than the 0.077 obtained from the LPM.
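
This calculation can be reproduced approximately from the rounded coefficients in Table 2 and the reported averages. The Python sketch below is illustrative only (the names `coef` and `arrest_prob` are not from the original solution), and because the inputs are rounded, the result matches the Stata output only to about two decimals.

```python
from statistics import NormalDist

normal_cdf = NormalDist().cdf

# Rounded probit coefficients from Table 2.
coef = {"const": -0.314, "pcnv": -0.553, "avgsen": 0.013, "tottime": -0.008,
        "ptime86": -0.081, "inc86": -0.005, "black": 0.467, "hispan": 0.291,
        "born60": 0.011}


def arrest_prob(pcnv):
    """Fitted arrest probability for black = 1, hispan = 0, born60 = 1,
    with the remaining variables at their reported sample averages."""
    values = {"const": 1.0, "pcnv": pcnv, "avgsen": 0.632, "tottime": 0.839,
              "ptime86": 0.387, "inc86": 54.967, "black": 1.0, "hispan": 0.0,
              "born60": 1.0}
    index = sum(coef[name] * values[name] for name in coef)
    return normal_cdf(index)


change = arrest_prob(0.75) - arrest_prob(0.25)
print(round(arrest_prob(0.25), 3), round(arrest_prob(0.75), 3), round(change, 3))
```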

Problem 7(d). We want to obtain the percentage of correctly predicted outcomes. We first generate the predicted probabilities with the command predict phat, then create the predicted outcome with gen arr86h = phat > 0.5, and finally run tabulate arr86h arr86:

arr86h   arr86=0   arr86=1   Total
0           1903       677    2580
1             67        78     145
Total       1970       755    2725

Thus the share of correct predictions for persons who were not arrested is 1903/1970 = 96.6%, while the share of correct predictions for persons who were arrested is only 78/755 = 10.3%. The overall percent correctly predicted, (1903 + 78)/2725 = 72.7%, is quite high, but we cannot predict very well the outcome we would most like to predict.
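
The percentages follow mechanically from the four cells of the cross-table. A minimal Python sketch, with a hypothetical helper `classification_rates`, is:

```python
def classification_rates(true_neg, false_neg, false_pos, true_pos):
    """Shares correctly predicted from a 2x2 table of predictions vs outcomes.

    true_neg:  predicted 0, actual 0     false_neg: predicted 0, actual 1
    false_pos: predicted 1, actual 0     true_pos:  predicted 1, actual 1
    """
    actual_0 = true_neg + false_pos
    actual_1 = false_neg + true_pos
    return {
        "not_arrested": true_neg / actual_0,
        "arrested": true_pos / actual_1,
        "overall": (true_neg + true_pos) / (actual_0 + actual_1),
    }


# Cells of the probit cross-table from part (d).
rates = classification_rates(true_neg=1903, false_neg=677,
                             false_pos=67, true_pos=78)
for name, share in rates.items():
    print(name, round(100 * share, 1))
```

The same helper applies to any other 2x2 table of predicted versus actual outcomes, for instance cross-tables produced for other models.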

Problem 7(e). Now we add the quadratic terms pcnv^2, ptime86^2 and inc86^2 to the model. We want to test whether these regressors are individually and jointly significant. The results of the probit regression are as follows:

Table 3: Results of probit estimations

arr86 coef. st.err.
const -.337 .056
pcnv .217 .260
avgsen .014 .024
tottime -.012 .020
ptime86 .745 .144
inc86 -.006 .000
black .437 .073
hispan .266 .067
born60 -.015 .057
pcnv^2 -.857 .271
ptime86^2 -.104 .022
inc86^2 .000 .000
Pseudo R2 0.10
obs. 2725

The quadratic terms are individually and jointly significant.

Problem 7(f). The extrema of the fitted probabilities for all three models considered are shown in Table 4. The three maxima are quite similar, while the minima differ substantially. The minimum for the LPM is even negative, which illustrates one of the major disadvantages of that model: its fitted probabilities can fall outside the unit interval.

Table 4: Model Extrema Estimated


Model Min Max
LPM -.411208 .5665109
probit .0006766 .6207836
logit .003771 .6293383

Problem 7(g). Using the same method as in part (d), the following cross-tables were obtained for the LPM and logit (Table 5 and Table 6, respectively).

Table 5: LPM Fit Test

arr86h   arr86=0   arr86=1   Total
0           1919       697    2616
1             51        58     109
Total       1970       755    2725

Table 6: Logit Fit Test

arr86h   arr86=0   arr86=1   Total
0           1869       665    2534
1            101        90     191
Total       1970       755    2725

The share of correct predictions for persons who were not arrested is 1919/1970 = 97% and 1869/1970 = 95% for the LPM and logit models, respectively. The share of correct predictions for persons who were arrested is 58/755 = 8% and 90/755 = 12% for the LPM and logit models, respectively. As was the case with the probit model in part (d), the overall percent correctly predicted is quite high, but we cannot predict very well the outcome we would most like to predict.

STATA CODES

* =================================================================
* Problem 1
* =================================================================
* a)
* =================================================================
scalar a = invnormal(0.8) /* X*beta */
scalar b = normalden(a) * .6 /* derivative w.r.t. x_1 */
scalar c = normalden(a) * -.4 /* derivative w.r.t. x_2 */
scalar d = 0.30 + 0.60 - 0.40*2 /* X*beta */
scalar e = normalden(d) * .6 /* derivative w.r.t. x_1 */
scalar f = normalden(d) * -.4 /* derivative w.r.t. x_2 */
scalar g = 0.30 + 0.60*2 - 0.40*2 /* X*beta */
scalar h = normalden(g) * .6 /* derivative w.r.t. x_1 */
scalar i = normalden(g) * -.4 /* derivative w.r.t. x_2 */
scalar j = normal(g) - normal(d) /* discrete change in Prob(y=1) */
scalar list /* displaying the results */

* =================================================================
* b)
* =================================================================
scalar k = exp(d) / (1 + exp(d))^2 * .6 /* derivative w.r.t. x_1 */
scalar l = exp(d) / (1 + exp(d))^2 * -.4 /* derivative w.r.t. x_2 */
scalar list k l /* displaying the results */

* =================================================================
* Problem 3 (baby-Wooldridge 17.15)
* =================================================================
set seed 123 /* setting initial value of random-number seed */
use jtrain2 /* inputting data file */
sample 90 /* sample 90% of all data */

* =================================================================
* i)
* =================================================================
count if train == 1 /* counting # of participants */
tab mostrn /* determining max participation length */

* =================================================================
* ii)
* =================================================================
reg train unem74 unem75 age educ black hisp married /* LPM */
test unem74 unem75 age educ black hisp married /* F-test */

* =================================================================
* iii)
* =================================================================
probit train unem74 unem75 age educ black hisp married /* Probit */
estimates store probit_unr /* storing estimation results [UNR] */
probit train /* Probit [RESTRICTED MODEL] */
estimates store probit_r /* storing estimation results [R] */
lrtest probit_unr probit_r /* likelihood ratio test */

* =================================================================
* v)
* =================================================================
reg unem78 train /* LPM */

* =================================================================
* vi)
* =================================================================
probit unem78 train /* Probit */

* =================================================================
* vii)
* =================================================================
reg unem78 train /* LPM */
predict fit_lpm /* generating fitted values */
probit unem78 train /* Probit */
predict fit_probit /* generating fitted values */
tab fit_lpm fit_probit /* comparing fitted values */

* =================================================================
* viii)
* =================================================================
reg unem78 train unem74 unem75 age educ black hisp married
predict fit_lpm_all /* generating fitted values */
probit unem78 train unem74 unem75 age educ black hisp married
predict fit_probit_all /* generating fitted values */
sum fit_lpm_all fit_probit_all /* comparing fitted values */
correlate fit_lpm_all fit_probit_all /* calculating correl. */

* =================================================================
* Problem 7 (adult-Wooldridge 15.7)
* =================================================================
infile narr86 nfarr86 nparr86 pcnv avgsen tottime ptime86 ///
qemp86 inc86 durat black hispan born60 pcnvsq pt86sq ///
inc86sq ///
using CRIME.raw, clear /* input the raw file */

* =================================================================
* a) LPM
* =================================================================
g arr86=(narr86>0) /* define =1 if narr86>0, =0 otherwise */
reg arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60
reg arr86 pcnv avgsen tottime ptime86 inc86 black hispan ///
born60, robust
/* computing the effect of change in pcnv 0.25 -> 0.75 */
scalar pcnv_025_075_lpm=_b[pcnv]*(0.75-0.25)
di "Effect of increasing pcnv from 0.25 to 0.75 (LPM): " ///
pcnv_025_075_lpm

* =================================================================
* b) Testing joint significance of avgsen and tottime
* =================================================================
/* we suppress regression output with command "quietly" */
qui reg arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60
test avgsen tottime /* non-robust test */
qui reg arr86 pcnv avgsen tottime ptime86 inc86 black hispan ///
born60, robust
test avgsen tottime /* robust test */

* =================================================================
* c) Probit
* =================================================================
probit arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60
/* compute the effect of increasing pcnv from 0.25 to 0.75 */
mfx, at(pcnv=0.25 black=1 hispan=0 born60=1)
scalar pcnv025 = e(Xmfx_y)
mfx, at(pcnv=0.75 black=1 hispan=0 born60=1)
scalar pcnv075 = e(Xmfx_y)
scalar pcnv_025_075_probit = pcnv075 - pcnv025
di "Effect of increasing pcnv from 0.25 to 0.75 (Probit): " ///
pcnv_025_075_probit
di "Effect of increasing pcnv from 0.25 to 0.75 (LPM): " ///
pcnv_025_075_lpm

* =================================================================
* d) % correctly predicted by the model
* =================================================================
estat class /* the same table can be computed by hand: */
* predict prob_hat, pr /* predict Pr. of '+' outcome for arr86 */
* g arr86_hat = (prob_hat > 0.5) /* predict arr86 */
* tab arr86_hat arr86 /* tabulate predictions */
* di "Correct prediction for not arrested men" 1903/1970
* di "Correct prediction for arrested men" 78/755

* =================================================================
* e) Probit with additional variables
* =================================================================
probit arr86 pcnv avgsen tottime ptime86 inc86 black hispan ///
born60 pcnvsq pt86sq inc86sq
test pcnvsq pt86sq inc86sq /* test for joint significance */
scalar pcnv_max = -_b[pcnv]/(2*_b[pcnvsq]) /* turning point */
di "pcnv turning point: " pcnv_max /* display the point */

* =================================================================
* f) predicted probability extrema comparison
* =================================================================
reg arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60
predict prob_hat_lpm, xb /* LPM fitted values for arr86 */
logit arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60
predict prob_hat_logit, pr /* predict Pr. of '+' outcome for arr86 */
* prob_hat is the probit prediction from part d); if it does not exist yet,
* run "predict prob_hat, pr" after the probit in part c) first
sum prob_hat prob_hat_logit prob_hat_lpm

* =================================================================
* g) model goodness-of-fit comparison
* =================================================================
g arr86_hat_lpm =(prob_hat_lpm > 0.5) /* predict arr86 */
g arr86_hat_logit =(prob_hat_logit > 0.5) /* predict arr86 */
tab arr86 arr86_hat_lpm /* goodness-of-fit cross-table */
display 1919 / 1970 /* calculating percentage predicted */
display 58 / 755 /* calculating percentage predicted */
tab arr86 arr86_hat_logit /* goodness-of-fit cross-table */
display 1869 / 1970 /* calculating percentage predicted */
display 90 / 755 /* calculating percentage predicted */
