Nonlinear Panel Data

Whitney Newey
Fall 2007
Cite as: Whitney Newey, course materials for 14.385 Nonlinear Econometric Analysis, Fall 2007. MIT OpenCourseWare (http://ocw.mit.edu), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY].
Panel data allow us to control for individual effects that are correlated with the regressors.

It is well known how to do this in linear models with additive effects.
Nonlinear models are harder.
General setup:
Data: $Y_i = [Y_{i1}, ..., Y_{iT}]'$, $X_i = [X_{i1}, ..., X_{iT}]'$, $(i = 1, ..., n)$.
A linear model:
$$Y_{it} = X_{it}'\beta + \alpha_i + \varepsilon_{it}, \quad E[\varepsilon_{it}|X_i, \alpha_i] = 0.$$
Alternative, equivalent formulation:
$$E[Y_{it}|X_i, \alpha_i] = X_{it}'\beta + \alpha_i.$$
This specifies the conditional mean of $Y_i$ given $X_i$, $\alpha_i$, and $\beta$. A likelihood specifies the conditional pdf $f(y|x, \theta, \alpha)$ of $Y_i$ given $X_i$, $\alpha_i$ and parameter vector $\theta$.
Example: Normal linear model: For $e_T$ a $T \times 1$ vector of 1's, $Y_i|(X_i, \alpha_i) \sim N(X_i\beta + \alpha_i e_T, \sigma^2 I_T)$. This is a distributional version of the linear model.
Binary choice model: $Y_{it} \in \{0, 1\}$; e.g. labor force participation. $Y_{it}$, $(t = 1, ..., T)$ independent, $\Pr(Y_{it} = 1|X_i, \alpha_i) = G(X_{it}'\beta + \alpha_i)$.
Count data: $Y_{i1}, ..., Y_{iT}$ independent, $Y_{it}|X_i, \alpha_i$ Poisson with mean $\exp(X_{it}'\beta + \alpha_i)$.
The linear model method is to transform the data so that $\alpha_i$ drops out. Differencing gives
$$E[Y_{it} - Y_{i,t-1}|X_i] = X_{it}'\beta + E[\alpha_i|X_i] - (X_{i,t-1}'\beta + E[\alpha_i|X_i]) = (X_{it} - X_{i,t-1})'\beta.$$
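The differencing argument can be checked on simulated data. The sketch below is only an illustration under assumed settings that are not in the notes: a scalar regressor $x_{it} = \alpha_i + v_{it}$ (so it is correlated with the effect), standard normal draws, and the hypothetical function name `first_difference_demo`. Pooled least squares that ignores $\alpha_i$ should be biased, while least squares on first differences should be close to the true slope.

```python
import random

def first_difference_demo(n=2000, T=3, beta=1.5, seed=0):
    """Return (pooled OLS slope, first-difference OLS slope) on simulated data."""
    rng = random.Random(seed)
    sxy_pool = sxx_pool = sxy_fd = sxx_fd = 0.0
    for _ in range(n):
        alpha = rng.gauss(0.0, 1.0)                            # individual effect
        x = [alpha + rng.gauss(0.0, 1.0) for _ in range(T)]    # regressor correlated with alpha
        y = [beta * x[t] + alpha + rng.gauss(0.0, 1.0) for t in range(T)]
        for t in range(T):                                     # pooled OLS ignores alpha
            sxy_pool += x[t] * y[t]
            sxx_pool += x[t] * x[t]
        for t in range(1, T):                                  # differencing removes alpha
            dx, dy = x[t] - x[t - 1], y[t] - y[t - 1]
            sxy_fd += dx * dy
            sxx_fd += dx * dx
    return sxy_pool / sxx_pool, sxy_fd / sxx_fd

pooled, fd = first_difference_demo()
print("pooled OLS:", round(pooled, 3), " first differences:", round(fd, 3))
```

In this assumed design the pooled slope has probability limit $\beta + 1/2$, so the gap between the two printed numbers is the effect of ignoring $\alpha_i$.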
In a nonlinear model, $\alpha_i$ does not drop out when we difference.

Binary choice example (What about linear probability model?):
$$E[Y_{it} - Y_{i,t-1}|X_i] = E[G(X_{it}'\beta + \alpha_i) - G(X_{i,t-1}'\beta + \alpha_i)|X_i].$$

Fixed Effects and the Incidental Parameters Problem
Fixed effects is maximizing the log-likelihood over each $\alpha_i$ as well as $\theta$.

Fixed effects is generally inconsistent in nonlinear models as n grows with T fixed.
In a linear model, least squares treating $\alpha_i$ as a parameter to be estimated is consistent for the slope coefficients.
Maximum likelihood treating $\alpha_i$ as a parameter to be estimated is generally not.
This is known as the incidental parameters problem.
It is caused by only having T observations to estimate each $\alpha_i$, so that as n grows the estimate of $\alpha_i$ remains random.
In linear models this randomness gets "averaged out." In nonlinear models it does not.
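Limit of the fixed effects estimator as n grows with T fixed. The estimator is
$$\hat\theta = \arg\max_{\theta, \alpha_1, ..., \alpha_n} \frac{1}{n}\sum_{i=1}^n \ln f(Y_i|X_i, \theta, \alpha_i).$$
Concentrate out $\alpha_i$: for fixed $\theta$, each fixed effect is given by $\hat\alpha_i(\theta) = \arg\max_\alpha \ln f(Y_i|X_i, \theta, \alpha)$.
Substituting in and maximizing over $\theta$ gives
$$\hat\theta = \arg\max_\theta \frac{1}{n}\sum_{i=1}^n \ln f(Y_i|X_i, \theta, \hat\alpha_i(\theta)).$$
By the usual extremum estimator results, as n grows for fixed T the estimator has plim
$$\theta_T = \arg\max_\theta E[\ln f(Y_i|X_i, \theta, \hat\alpha_i(\theta))].$$
Randomness in $\hat\alpha_i(\theta)$ leads to inconsistency of $\hat\theta$.
If $\hat\alpha_i(\theta)$ were replaced by $\alpha_i(\theta) = \arg\max_\alpha E[\ln f(Y_i|X_i, \theta, \alpha)|X_i, \alpha_i]$, we would get consistency. This is like measurement error in a nonlinear model.
Example: Binary logit, $Y_{it} \in \{0, 1\}$, $G(u) = e^u/(1 + e^u)$. It is known that with $T = 2$ the fixed effects estimator $\hat\beta_{FE}$ satisfies
$$\hat\beta_{FE} \to_p 2\beta_0.$$
Bias in the fixed effects estimator can be severe. It is not so severe in the Tobit model.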
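The doubling result for the logit with $T = 2$ can be verified numerically in a special case. The sketch below assumes, for illustration only, a scalar regressor with $x_{i1} = 0$ and $x_{i2} = 1$ and standard normal $\alpha_i$; the function names are hypothetical. In that case the fixed effect of a "switcher" ($Y_{i1} \neq Y_{i2}$) concentrates out at $\hat\alpha_i(\beta) = -\beta/2$, "stayers" contribute nothing, and the concentrated log-likelihood is $2[n_{01}\ln G(\beta/2) + n_{10}\ln G(-\beta/2)]$, while the conditional logit estimate is $\ln(n_{01}/n_{10})$.

```python
import math
import random

def logistic(u):
    return 1.0 / (1.0 + math.exp(-u))

def switcher_counts(n=20000, beta0=1.0, seed=1):
    """Count (Y1,Y2)=(0,1) and (1,0) outcomes when x_i1 = 0, x_i2 = 1."""
    rng = random.Random(seed)
    n01 = n10 = 0
    for _ in range(n):
        alpha = rng.gauss(0.0, 1.0)
        y1 = rng.random() < logistic(alpha)            # period 1, x = 0
        y2 = rng.random() < logistic(beta0 + alpha)    # period 2, x = 1
        if y2 and not y1:
            n01 += 1
        elif y1 and not y2:
            n10 += 1
    return n01, n10

def concentrated_loglik(beta, n01, n10):
    # Switchers have alpha_hat_i(beta) = -beta/2; stayers contribute zero.
    return 2.0 * (n01 * math.log(logistic(beta / 2.0))
                  + n10 * math.log(logistic(-beta / 2.0)))

def golden_section_max(f, lo=-10.0, hi=10.0, tol=1e-8):
    """Maximize a unimodal function on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if f(c) > f(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)

n01, n10 = switcher_counts()
cmle = math.log(n01 / n10)                                   # conditional logit estimate
fe = golden_section_max(lambda b: concentrated_loglik(b, n01, n10))
print("conditional MLE:", round(cmle, 3), " fixed effects MLE:", round(fe, 3))
```

The fixed effects maximizer comes out as twice the conditional estimate in every sample, not just in the limit, which is why the inconsistency does not disappear as n grows.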

Example: Gaussian linear model, FE estimator of $\sigma^2$ converges to

$$\sigma_T^2 = \frac{T-1}{T}\sigma_0^2.$$
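A quick simulation shows this limit. The sketch assumes the simplest Gaussian case with no regressors, $z_{it} = \alpha_i + \varepsilon_{it}$, and an arbitrary distribution for $\alpha_i$; the function name `fe_variance` and all numerical settings are illustrative assumptions.

```python
import random

def fe_variance(n=5000, T=2, sigma2=1.0, seed=2):
    """Fixed effects (MLE) variance estimate: residuals around individual means, divided by nT."""
    rng = random.Random(seed)
    sd = sigma2 ** 0.5
    total = 0.0
    for _ in range(n):
        alpha = rng.gauss(0.0, 3.0)                     # arbitrary individual effect
        z = [alpha + rng.gauss(0.0, sd) for _ in range(T)]
        zbar = sum(z) / T                               # estimated fixed effect
        total += sum((v - zbar) ** 2 for v in z)
    return total / (n * T)

for T in (2, 4, 8):
    print(T, round(fe_variance(T=T), 3), "limit:", (T - 1) / T)
```

With $\sigma_0^2 = 1$ the estimates should sit near $(T-1)/T$ rather than near 1, however large n is.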
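Bias in estimates of marginal effects is less severe. In binary choice, the marginal effect of changing $X$ to $\tilde X$ is
$$\int [G(\tilde X'\beta_0 + \alpha) - G(X'\beta_0 + \alpha)]F_\alpha(d\alpha).$$
The fixed effects estimator is
$$\sum_{i=1}^n [G(\tilde X'\hat\beta + \hat\alpha_i) - G(X'\hat\beta + \hat\alpha_i)]/n.$$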
Hahn and Newey (2004) show quite small biases for probit.
We return to this below.
We now discuss how to get consistent estimators.
Conditional Maximum Likelihood

Occasionally there is a statistic $S_i$ such that $\alpha_i$ drops out of the conditional likelihood of $Y_i$ given $X_i$ and $S_i$. That is, $f(Y_i|X_i, S_i, \theta, \alpha_i) = f(Y_i|X_i, S_i, \theta)$. Conditional MLE (CMLE):
$$\hat\theta = \arg\max_\theta \sum_{i=1}^n \ln f(Y_i|X_i, S_i, \theta).$$
It is consistent and asymptotically normal, and asymptotically efficient when the distribution of $\alpha_i$ conditional on $X_i$ is unrestricted. The problem is that $S_i$ only exists in a few cases, including the Gaussian linear model, logit binary choice, the Poisson model for count data, and the proportional hazards model. In most other models there is no such $S_i$, so conditional MLE has limited usefulness.
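For the binary logit with $T = 2$, the statistic is $S_i = Y_{i1} + Y_{i2}$: given $S_i = 1$, $\Pr(Y_{i2} = 1|X_i, S_i = 1) = G((X_{i2} - X_{i1})'\beta)$, which does not involve $\alpha_i$. The following is a minimal sketch of the resulting CMLE with a scalar regressor, solved by Newton's method. The data generating process (regressor equal to $\alpha_i$ plus noise) and the function names are assumptions for illustration.

```python
import math
import random

def logistic(u):
    return 1.0 / (1.0 + math.exp(-u))

def simulate_switchers(n=20000, beta0=1.0, seed=3):
    """Return (x2 - x1, y2) for individuals with y1 + y2 = 1."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        alpha = rng.gauss(0.0, 1.0)
        x1 = alpha + rng.gauss(0.0, 1.0)       # regressors correlated with alpha
        x2 = alpha + rng.gauss(0.0, 1.0)
        y1 = rng.random() < logistic(x1 * beta0 + alpha)
        y2 = rng.random() < logistic(x2 * beta0 + alpha)
        if y1 != y2:                           # only switchers enter the conditional likelihood
            data.append((x2 - x1, 1.0 if y2 else 0.0))
    return data

def conditional_logit(data, iters=15):
    """Newton iterations on the conditional log-likelihood (globally concave)."""
    beta = 0.0
    for _ in range(iters):
        score = info = 0.0
        for dx, d in data:
            p = logistic(dx * beta)
            score += dx * (d - p)
            info += dx * dx * p * (1.0 - p)
        beta += score / info
    return beta

beta_hat = conditional_logit(simulate_switchers())
print("conditional logit estimate:", round(beta_hat, 3))
```

Individuals with $Y_{i1} = Y_{i2}$ are dropped, which is the price paid for eliminating $\alpha_i$.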
Identification Issue:
$\theta_0$ may not be identified in the semiparametric model where the conditional pdf of $Y_i$ given $X_i$, $\alpha_i$ is specified as $f(y|x, \theta, \alpha)$ and the conditional pdf of $\alpha_i$ given $X_i$ is unspecified. Chamberlain (1992): $T = 2$; $\Pr(Y_{it} = 1|X_i, \alpha_i) = G(\tau_0 d_{it} + x_{it}'\beta_0 + \alpha_i)$, $d_{i1} = 0$, $d_{i2} = 1$, $G'(u) > 0$ everywhere, other regularity conditions. If $X_i$ is bounded then $\beta_0$ is not identified if $G(u)$ is not logistic. Also can show that $\beta_0$ is not identified for $T = 2$, $\Pr(Y_{it} = 1|X_i, \alpha_i) = \Phi(\beta_0 X_{it} + \alpha_i)$, $X_{it} \in \{0, 1\}$. See following graph.
The extent of nonidentification (e.g. for censored models) is not clear.

There is no consistent estimator in nonidentified cases.
One could directly estimate the identified set.
Recent progress: Honoré and Tamer (2006) and other work.
This is difficult when $X_{it}$ takes on many values.
Other approaches are a) restrict the distribution of $\alpha_i$ given $X_i$; b) find clever estimators for identified models; c) large T fixed effects bias corrections.
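Correlated Random Effects:

Restricts the conditional distribution of $\alpha_i$ given $X_i$.
Here we consider parametric models; there are nonparametric and semiparametric versions.
Let $g(\alpha|X, \gamma)$ be the conditional pdf of $\alpha$ given $X$.
The likelihood of $Y$ given $X$ integrates out $\alpha$, as in
$$f(Y|X, \theta, \gamma) = \int f(Y|X, \theta, \alpha)g(\alpha|X, \gamma)d\alpha.$$
The MLE is given by
$$(\hat\theta, \hat\gamma) = \arg\max_{\theta, \gamma} \frac{1}{n}\sum_{i=1}^n \ln f(Y_i|X_i, \theta, \gamma) = \arg\max_{\theta, \gamma} \frac{1}{n}\sum_{i=1}^n \ln \int f(Y_i|X_i, \theta, \alpha)g(\alpha|X_i, \gamma)d\alpha.$$
Consistency of $\hat\theta$ depends on $g(\alpha|X, \gamma)$ being correctly specified.

It may be difficult to calculate the integral.
Also, it is hard to form $g(\alpha|X, \gamma)$ in a time consistent fashion.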
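Example: Correlated random effects probit.

$Y_{it} = 1(Y_{it}^* > 0)$ where, conditional on $(X_i, \alpha_i)$, $Y_{i1}^*, ..., Y_{iT}^*$ are independent and $Y_{it}^*$ has distribution $N(X_{it}'\beta_0 + \alpha_i, \sigma_t^2)$. Let $x_i = vec(X_i')$ be the vector of all observations across t on the regressors. Suppose also that the conditional distribution of $\alpha_i$ given $X_i$ is $N(x_i'\lambda_0, \sigma_\alpha^2)$. Note that conditional on $X_i$, $Y_{it}^* \sim N(X_{it}'\beta_0 + x_i'\lambda_0, \sigma_t^2 + \sigma_\alpha^2)$.
Then for $\delta = (\beta', \lambda', \sigma_1^2, ..., \sigma_T^2, \sigma_\alpha^2)'$ and $e_t$ the t-th $T \times 1$ unit vector,

$$\Pr(Y_{it} = 1|X_i, \delta) = \int \Phi\left(\frac{X_{it}'\beta + \alpha}{\sigma_t}\right)\frac{1}{\sigma_\alpha}\phi\left(\frac{\alpha - x_i'\lambda}{\sigma_\alpha}\right)d\alpha = \Phi\left(\frac{X_{it}'\beta + x_i'\lambda}{\sqrt{\sigma_t^2 + \sigma_\alpha^2}}\right) = \Phi(x_i'\pi_t), \quad \pi_t = \frac{e_t \otimes \beta + \lambda}{\sqrt{\sigma_t^2 + \sigma_\alpha^2}}.$$
This is a marginal likelihood for $Y_{it}$. The joint likelihood is very complicated: $Y_{i1}, ..., Y_{iT}$ are not independent conditional on $X_i$. This is generally true in models where we integrate out $\alpha_i$.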
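The closed form for the marginal probability can be checked by numerical integration. The sketch below treats the index $X_{it}'\beta$ and the conditional mean $x_i'\lambda$ as given scalars (`index` and `m`); those names, the trapezoid rule, and the numerical values are illustrative assumptions.

```python
import math

def Phi(u):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def phi(u):
    """Standard normal pdf."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def mixed_probability(index, m, sigma_t, sigma_a, grid=4001, width=10.0):
    """Trapezoid-rule integral of Phi((index + a)/sigma_t) against the N(m, sigma_a^2) density."""
    lo, hi = m - width * sigma_a, m + width * sigma_a
    h = (hi - lo) / (grid - 1)
    total = 0.0
    for k in range(grid):
        a = lo + k * h
        w = 0.5 if k in (0, grid - 1) else 1.0
        total += w * Phi((index + a) / sigma_t) * phi((a - m) / sigma_a) / sigma_a
    return total * h

index, m, sigma_t, sigma_a = 0.3, -0.2, 1.5, 0.8
numeric = mixed_probability(index, m, sigma_t, sigma_a)
closed = Phi((index + m) / math.sqrt(sigma_t ** 2 + sigma_a ** 2))
print(numeric, closed)
```

The two printed numbers should agree to many decimal places, which is just the statement that $\alpha_i + \varepsilon_{it}$ is normal with variance $\sigma_t^2 + \sigma_\alpha^2$ given $X_i$.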
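Estimation: Do marginal likelihood (probit) for each t to get $\hat\pi_1, ..., \hat\pi_T$. Let $\kappa_t = 1/\sqrt{\sigma_t^2 + \sigma_\alpha^2}$, $(t = 1, ..., T)$, where we normalize $\kappa_1 = 1$, so that $\pi_t = \kappa_t(e_t \otimes \beta + \lambda)$. Reparameterize so that $\delta = (\beta', \lambda', \kappa_2, ..., \kappa_T)'$ and for $\pi = (\pi_1', ..., \pi_T')'$ let
$$h(\pi, \delta) = \begin{pmatrix} \kappa_1^{-1}\pi_1 - (e_1 \otimes \beta + \lambda) \\ \vdots \\ \kappa_T^{-1}\pi_T - (e_T \otimes \beta + \lambda) \end{pmatrix}.$$
We can then do minimum distance, using $\hat\pi = (\hat\pi_1', ..., \hat\pi_T')'$ mentioned above:
$$\hat\delta = \arg\min_\delta h(\hat\pi, \delta)'\hat W h(\hat\pi, \delta).$$
$h(\hat\pi, \delta)$ is linear in $\beta$, $\lambda$ and the $\kappa_t^{-1}$, so this is easy to do. Efficient two-step estimator: For $\hat V$ an estimator of the joint asymptotic variance of $\hat\pi$, let $\tilde\delta = \arg\min_\delta h(\hat\pi, \delta)'\hat V^{-1}h(\hat\pi, \delta)$. Then let $\tilde D = diag(I, \tilde\kappa_2^{-1}I, ..., \tilde\kappa_T^{-1}I)$ where $I$ is an identity matrix with the same dimension as $\pi_t$. Then $\tilde D\hat V\tilde D$ is an estimator of the asymptotic variance of $\sqrt{n}h(\hat\pi, \delta_0)$, so optimal minimum distance is
$$\hat\delta = \arg\min_\delta h(\hat\pi, \delta)'(\tilde D\hat V\tilde D)^{-1}h(\hat\pi, \delta).$$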
Empirical example from Chamberlain (1984).

Labor force participation, with n = 924 and T = 4, four years: 1968, 1970, 1972, 1974.
Two $X_{it}$: number of children under 6 and number of children. Here are the results:

Probit   -.121    -.058        Logit   -.573    -.336
         (.046)   (.029)               (.115)   (.120)

Quite different estimates; ratios are similar. Correlated random effects depends on T in an essential way.
Many coefficients. A more parsimonious model is
$$\alpha_i|X_i \sim N(\bar X_i'\lambda, \sigma_\alpha^2), \quad \bar X_i = \sum_{t=1}^T X_{it}/T.$$
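Marginal Effects
The marginal effect for a change in X from $X$ to $\tilde X$ is $\mu_t(\tilde X) - \mu_t(X)$ where, for $F_\alpha$ the CDF of $\alpha$,
$$\mu_t(X) = \int \Phi((X'\beta_0 + \alpha)/\sigma_t)F_\alpha(d\alpha).$$
By iterated expectations, holding $X$ fixed, with $\kappa_t = 1/\sqrt{\sigma_t^2 + \sigma_\alpha^2}$,
$$\mu_t(X) = E[1(X'\beta_0 + \alpha_i + \varepsilon_{it} > 0)] = E[E[1(X'\beta_0 + \alpha_i + \varepsilon_{it} > 0)|X_i]] = E[\Phi(\kappa_t(X'\beta_0 + x_i'\lambda_0))].$$
This object can be estimated by
$$\hat\mu_t(X) = \sum_{i=1}^n \Phi(\hat\kappa_t(X'\hat\beta + x_i'\hat\lambda))/n.$$
It would be interesting to compare this estimator with the fixed effects marginal effect in the empirical example.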
Some Semiparametric Results

Some distribution-free results that are useful.
Poisson model: Conditional on $(X_i, \alpha_i)$, $Y_{it}$ is independent over time and Poisson with mean $\exp(X_{it}'\beta_0 + \alpha_i)$. Good model for patents; see Hausman, Hall, Griliches (1984). Wooldridge showed that consistency of the CMLE only requires $E[Y_{it}|X_i, \alpha_i] = \exp(X_{it}'\beta_0 + \alpha_i)$.
Binary choice: Manski maximum score estimator; conditions for consistency include infinite support.
Tobit: Honoré.
Manski and Honoré require homoskedasticity over time.
This does not hold in linear model applications.
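Large T Fixed Effects Bias Correction
Let $\theta_T$ denote the plim of the fixed effects estimator.

As T grows, $\lim_{T \to \infty}\theta_T = \theta_0$.
Under smoothness,
$$\theta_T = \theta_0 + \frac{B}{T} + O\left(\frac{1}{T^2}\right).$$
Example: Gaussian linear model
$$\sigma_T^2 = \frac{T-1}{T}\sigma_0^2 = \sigma_0^2 - \frac{\sigma_0^2}{T} = \sigma_0^2 + \frac{B}{T}, \quad B = -\sigma_0^2.$$
Also, as n and T grow, we should have

$$(nT)^{1/2}(\hat\theta - \theta_T) \to_d N(0, \Omega).$$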
As a way to think about how bad fixed effects bias can be, consider $n/T \to \rho$.
$$(nT)^{1/2}(\hat\theta - \theta_0) = (nT)^{1/2}(\hat\theta - \theta_T) + (nT)^{1/2}(\theta_T - \theta_0) = (nT)^{1/2}(\hat\theta - \theta_T) + (nT)^{1/2}\frac{B}{T} + O((nT)^{1/2}/T^2) \to_d N(B\rho^{1/2}, \Omega).$$
Here there is asymptotic bias.

Consequently, the usual asymptotic confidence intervals are incorrect.
Asymptotic normality of $\hat\theta$, centered at its probability limit, is like the misspecification result (e.g. White, 1982).
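Analytical Bias Correction

Find a formula for $B$, construct an estimator $\hat B$. The bias corrected estimator is $\hat\theta^1 = \hat\theta - \hat B/T$. To show when this works, suppose
$$(nT)^{1/2}(\hat B - B)/T \to_p 0.$$
For example, if $\hat B$ itself has $(nT)^{1/2}(\hat B - B)$ asymptotically normal then this holds. Plugging in as before we get,
$$(nT)^{1/2}(\hat\theta^1 - \theta_0) = (nT)^{1/2}(\hat\theta - \theta_T) + (nT)^{1/2}(\theta_T - \theta_0 - \hat B/T) = (nT)^{1/2}(\hat\theta - \theta_T) + (nT)^{1/2}(B - \hat B)/T + O((nT)^{1/2}/T^2) \to_d N(0, \Omega).$$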
Iterated Analytical Correction
Often the bias formula will depend on $\theta$, so that $B = B(\theta)$.

Can iterate the bias correction:
$$\hat\theta^j = \hat\theta - B(\hat\theta^{j-1})/T.$$
Iterating to convergence would give $\hat\theta^\infty = \hat\theta - B(\hat\theta^\infty)/T$. This does not improve asymptotic properties. It can improve small sample properties.
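Jackknife Bias Correction
Use how $\hat\theta$ changes with T to form an implicit bias correction.

Does not require a formula for $B$.
Let $\hat\theta_{(t)}$ denote the fixed effects estimator not using the t-th time period.
The jackknife estimator is
$$\tilde\theta = T\hat\theta - (T-1)\sum_{t=1}^T \hat\theta_{(t)}/T.$$
Explain with the expansion
$$\theta_T = \theta_0 + \frac{B}{T} + \frac{D}{T^2} + O\left(\frac{1}{T^3}\right).$$
The limit of $\tilde\theta$ for fixed T and how it changes with T shows the bias correction:
$$\tilde\theta \to_p T\theta_T - (T-1)\theta_{T-1} = \theta_0 + \left(\frac{1}{T} - \frac{1}{T-1}\right)D + O\left(\frac{1}{T^2}\right) = \theta_0 + O\left(\frac{1}{T^2}\right).$$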
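Example: Variance estimation in Gaussian linear model (Neyman and Scott, 1948):
$z_{it}$ is i.i.d. with distribution $N(\alpha_i, \theta_0)$.

Here $\theta_T = \frac{T-1}{T}\theta_0 = \theta_0 - \frac{\theta_0}{T}$.
Thus $B = -\theta_0$. Analytical correction:

$$\hat\theta^1 = \hat\theta + \hat\theta/T \to_p \frac{T+1}{T}\cdot\frac{T-1}{T}\theta_0 = \frac{T^2-1}{T^2}\theta_0.$$
This is not consistent for fixed T. Iterating the analytical correction gives $\hat\theta^\infty = \hat\theta + \hat\theta^\infty/T$, that is, $\hat\theta^\infty = \frac{T}{T-1}\hat\theta$. Can also show that this is the jackknife. Here $\hat\theta^\infty$ is consistent for fixed T.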
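The three corrections in this example can be compared on one simulated panel. This is a sketch under assumed settings (n = 500, T = 4, $\theta_0 = 1$, hypothetical function names); it uses $B(\theta) = -\theta$ from the example and takes the fixed effects estimator to be the sum of squared deviations from individual means divided by nT.

```python
import random

def fe_var(panel):
    """Fixed effects variance estimator: deviations from individual means over nT."""
    n, T = len(panel), len(panel[0])
    total = 0.0
    for z in panel:
        zbar = sum(z) / T
        total += sum((v - zbar) ** 2 for v in z)
    return total / (n * T)

def analytic_once(panel):
    """One-step correction theta_hat - B_hat/T with B(theta) = -theta."""
    T, s = len(panel[0]), fe_var(panel)
    return s + s / T

def analytic_iterated(panel, iters=200):
    """Iterate theta_j = theta_hat + theta_{j-1}/T to (numerical) convergence."""
    T, s = len(panel[0]), fe_var(panel)
    cur = s
    for _ in range(iters):
        cur = s + cur / T
    return cur

def jackknife(panel):
    """T*theta_hat - (T-1) * average of leave-one-period-out estimators."""
    T, s = len(panel[0]), fe_var(panel)
    loo = [fe_var([z[:t] + z[t + 1:] for z in panel]) for t in range(T)]
    return T * s - (T - 1) * sum(loo) / T

rng = random.Random(5)
panel = []
for _ in range(500):
    a = rng.gauss(0.0, 2.0)                      # individual mean alpha_i
    panel.append([a + rng.gauss(0.0, 1.0) for _ in range(4)])

T = 4
raw, once, it, jk = fe_var(panel), analytic_once(panel), analytic_iterated(panel), jackknife(panel)
print(raw, once, it, jk, raw * T / (T - 1))
```

The iterated correction and the jackknife both reproduce $T\hat\theta/(T-1)$ up to rounding, while the one-step correction stays strictly below it.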
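Monte Carlo Example: Like Heckman (1981). Design is:

$y_{it} = 1(x_{it}\theta_0 + \alpha_i + \varepsilon_{it} > 0)$, $\alpha_i \sim N(0, 1)$, $\varepsilon_{it} \sim N(0, 1)$, $x_{it} = t/10 + x_{i,t-1}/2 + u_{it}$, $x_{i0} = u_{i0}$, $u_{it} \sim U(-1/2, 1/2)$. $N = 100$, $T = 8$; $\theta_0 = 1$. The marginal effect is the average derivative of $\Phi(x\theta_0 + \alpha)$, $\mu = \theta_0 E[\phi(x\theta_0 + \alpha_i)]$. The fixed effects estimator of this object is
$$\hat\mu = \hat\theta\sum_{i=1}^n \phi(x\hat\theta + \hat\alpha_i)/n.$$
Consider analytical and jackknife bias corrections.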
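One draw from this design can be generated as follows. This is a sketch only: the function name `simulate_design` is hypothetical, the uniform innovations are read as $U(-1/2, 1/2)$, and $\theta_0 = 1$ is assumed as the default.

```python
import random

def simulate_design(n=100, T=8, theta0=1.0, seed=0):
    """Return (X, Y): lists of n individual histories, each of length T."""
    rng = random.Random(seed)
    X, Y = [], []
    for _ in range(n):
        alpha = rng.gauss(0.0, 1.0)               # alpha_i ~ N(0, 1)
        x_prev = rng.uniform(-0.5, 0.5)           # x_i0 = u_i0
        xi, yi = [], []
        for t in range(1, T + 1):
            x = t / 10.0 + x_prev / 2.0 + rng.uniform(-0.5, 0.5)
            eps = rng.gauss(0.0, 1.0)             # eps_it ~ N(0, 1)
            xi.append(x)
            yi.append(1 if x * theta0 + alpha + eps > 0 else 0)
            x_prev = x
        X.append(xi)
        Y.append(yi)
    return X, Y

X, Y = simulate_design()
print(len(X), len(X[0]), sum(map(sum, Y)) / (len(Y) * len(Y[0])))
```

Repeating this over many seeds and applying an estimator to each draw would give Monte Carlo summaries of the kind reported in the tables.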
Table Three: Properties of $\hat\theta$, T = 8.
Estimator of $\theta_0$   Mean   Med.   SD     p; .05   p; .10
MLE                       1.18   1.17   .151   .267     .370
Jackknife                 .953   .950   .119   .056     .102
Analytic                  1.05   1.05   .134   .062     .135
Analytic-M                1.05   1.05   .132   .060     .126

Table Five: Properties of $\hat\theta$, T = 4.
Estimator of $\theta_0$   Mean   Med.   SD     p; .05   p; .10
MLE                       1.42   1.41   .397   .269     .373
Jackknife                 .752   .743   .262   .100     .177
Analytic                  1.12   1.11   .306   .055     .101
Analytic-M                1.21   1.20   .335   .102     .172

Table Four: Properties of $\hat\mu$, T = 8.
Estimator of $\mu/\mu_0$  Mean   Med.   SD     p; .05   p; .10
MLE                       1.02   1.02   .131   .078     .140
Jackknife                 1.00   .992   .130   .086     .159
Analytic                  1.02   1.02   .133   .090     .153
Analytic-M                1.02   1.02   .131   .087     .154

Table Six: Properties of $\hat\mu$, T = 4.
Estimator of $\mu/\mu_0$  Mean   Med.   SD     p; .05   p; .10
MLE                       1.00   1.00   .257   .103     .168
Jackknife                 1.06   1.05   .307   .159     .224
Analytic                  .996   .994   .265   .113     .178
Analytic-M                1.05   1.05   .266   .117     .185
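Bounds for Marginal Effects:
Assume $X_{it} \in \{0, 1\}$. $\Pr(Y_{it} = 1|X_i, \alpha_i) = \Phi(\beta_0 X_{it} + \alpha_i)$.

Object of interest
$$\mu_0 = \int [\Phi(\beta_0 + \alpha) - \Phi(\alpha)]F_0(d\alpha).$$
This is the average change in the probability of $Y_{it} = 1$.

Let $\bar 0$ and $\bar 1$ denote $T \times 1$ vectors of 0's and 1's respectively.
Define
$$\bar\mu = \int [\Phi(\beta_0 + \alpha) - \Phi(\alpha)]F_0(d\alpha|X_i \notin \{\bar 0, \bar 1\}).$$
Then $\bar\mu$ is identified.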
Proof: Consider $X \notin \{\bar 0, \bar 1\}$. Then there is $t(X)$ such that $x_{t(X)} = 1$ and $s(X)$ such that $x_{s(X)} = 0$. Then we have
$$E[y_{i,t(X)} - y_{i,s(X)}|X_i = X] = E[E[y_{i,t(X)} - y_{i,s(X)}|X_i = X, \alpha_i]|X_i = X] = \int [\Phi(\beta_0 + \alpha) - \Phi(\alpha)]F_0(d\alpha|X_i = X).$$
Let $P(X) = \Pr(X_i = X)$. Then
$$\bar\mu = \frac{\sum_{X \notin \{\bar 0, \bar 1\}} P(X)E[y_{i,t(X)} - y_{i,s(X)}|X_i = X]}{\sum_{X \notin \{\bar 0, \bar 1\}} P(X)}.$$
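Cannot identify $\int [\Phi(\beta_0 + \alpha) - \Phi(\alpha)]F_0(d\alpha|x)$ for $x \in \{\bar 0, \bar 1\}$. $\bar\mu$ is overidentified for $T > 2$. Simple estimator:

Let $\bar n = \#\{i : X_i \notin \{\bar 0, \bar 1\}\}$.
$$\hat\mu = \frac{1}{\bar n}\sum_{X \notin \{\bar 0, \bar 1\}}\sum_{\{i|X_i = X\}} (y_{i,t(X)} - y_{i,s(X)}).$$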
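The simple estimator is easy to code. The sketch below takes $t(X)$ and $s(X)$ to be the first periods with $x = 1$ and $x = 0$ (one arbitrary choice among those allowed), and checks it on simulated data. The simulation assumes, purely so that the target has a closed form, that $\alpha_i \sim N(0, 1)$ is independent of $X_i$ and that $X_{it}$ is i.i.d. Bernoulli(1/2); then the target equals $\Phi(\beta_0/\sqrt 2) - 1/2$. Function names are hypothetical.

```python
import math
import random

def Phi(u):
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def simulate_binary_panel(n=20000, T=3, beta0=1.0, seed=4):
    rng = random.Random(seed)
    X, Y = [], []
    for _ in range(n):
        alpha = rng.gauss(0.0, 1.0)
        x = [1 if rng.random() < 0.5 else 0 for _ in range(T)]
        y = [1 if beta0 * x[t] + alpha + rng.gauss(0.0, 1.0) > 0 else 0
             for t in range(T)]
        X.append(x)
        Y.append(y)
    return X, Y

def movers_estimator(X, Y):
    """Average of y_{i,t(X)} - y_{i,s(X)} over individuals whose x is not all 0 or all 1."""
    total, n_movers = 0.0, 0
    for x, y in zip(X, Y):
        if 0 < sum(x) < len(x):
            t, s = x.index(1), x.index(0)     # first period with x = 1, first with x = 0
            total += y[t] - y[s]
            n_movers += 1
    return total / n_movers, n_movers

X, Y = simulate_binary_panel()
mu_hat, n_movers = movers_estimator(X, Y)
target = Phi(1.0 / math.sqrt(2.0)) - 0.5
print(round(mu_hat, 3), round(target, 3), n_movers)
```

When $T > 2$ other choices of $t(X)$ and $s(X)$ give other consistent estimators, which is the overidentification mentioned above.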
Bounds for $\mu_0$.
Let $D = 1(\bar\mu > 0)$. Let $P = P(\bar 0) + P(\bar 1)$. Then
$$\bar\mu(1 - P) - (1 - D)P \leq \mu_0 \leq \bar\mu(1 - P) + DP.$$
Tight bounds use the form $\Phi(\beta_0 + \alpha) - \Phi(\alpha)$.
Bounds shrink to a point exponentially fast as T grows.
There are $2^T$ possible $X$ so $P(\bar 0) + P(\bar 1)$ will shrink like $C2^{-T}$ for some constant C. This fast shrinkage rate might be conjectured from the bias corrections. In smooth models (all derivatives existing) one can form a bias correction that approaches the truth at rate $T^{-J}$ for any integer J.
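The bounds and their shrinkage can be illustrated with a few lines of code. The value 0.2 for the identified effect among movers and the i.i.d. Bernoulli(1/2) regressor (so that the probability of an all-0 or all-1 history is $2 \cdot 2^{-T}$) are hypothetical inputs chosen only for illustration; `effect_bounds` is a hypothetical name.

```python
def effect_bounds(mu_bar, P):
    """Bounds on the average effect from the movers' effect mu_bar and stayer probability P."""
    D = 1 if mu_bar > 0 else 0          # sign of the effect is identified from mu_bar
    lower = mu_bar * (1 - P) - (1 - D) * P
    upper = mu_bar * (1 - P) + D * P
    return lower, upper

mu_bar = 0.2                            # hypothetical identified effect among movers
for T in (2, 4, 8, 16):
    P = 2.0 * 2.0 ** (-T)               # Pr(all zeros) + Pr(all ones) under Bernoulli(1/2)
    lo, hi = effect_bounds(mu_bar, P)
    print(T, round(lo, 5), round(hi, 5), "width:", round(hi - lo, 5))
```

The width of the interval equals $P$, so in this illustration it halves with every additional time period.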
