Panel Data Analysis

Copyright 2006 Pearson Addison-Wesley. All rights reserved.
Lecture 24:
Panel Data
(Chapters 16.1–16.2, 16.4)
Agenda
Panel Data (Chapter 16.1)
Example: Cross-Industry Wage Effects
Panel Data DGPs (Chapter 16.1)
Fixed Effects (Chapter 16.2)
Random Effects (Chapter 16.2)
Example: Production Functions (Chapter 16.2)
The Hausman Test (Chapter 16.4)
Panel Data
Potential unobserved heterogeneity is a form
of omitted variables bias.
Unobserved heterogeneity refers to omitted
variables that are fixed for an individual (at
least over a long period of time).
A person's upbringing, family characteristics,
innate ability, and demographics (except age)
do not change.
Panel Data (cont.)
With cross-sectional data, there is no
particular reason to differentiate
between omitted variables that are
fixed over time and omitted variables
that are changing.
However, when an omitted variable is
fixed over time, panel data offers
another tool for eliminating the bias.
Panel Data (cont.)
Panel Data is data in which we
observe repeated cross-sections of the
same individuals.
Examples:
Annual unemployment rates of each state over
several years
Quarterly sales of individual stores over
several quarters
Wages for the same worker, working at several
different jobs
Panel Data (cont.)
By far the leading type of panel data is
repeated cross-sections over time.
The key feature of panel data is that we
observe the same individual in more
than one condition.
Omitted variables that are fixed will take
on the same values each time we
observe the same individual.
Panel Data (cont.)
Some of the most valuable data sets in
economics are panel data sets.
Longitudinal surveys return year after
year to the same individuals, tracking
them over time.
Panel Data (cont.)
For example, the National Longitudinal
Survey of Youth (NLSY) tracks
labor market outcomes for thousands
of individuals, beginning in their
teenage years.
The Panel Study of Income Dynamics
provides high-quality income data.
Example:
Cross-Industry Wage Disparities
A great puzzle in labor economics is the
presence of cross-industry wage disparities.
Workers of seemingly equivalent ability, in
seemingly equivalent occupations, receive
different wages in different industries.
Do high-wage industries actually pay higher
wages, or do they attract workers of
unobservably higher quality?
Cross-Industry Wage Differentials (cont.)
Gibbons and Katz (Review of Economic
Studies 1992) exploited panel data to explore
these differentials.
They observed workers in 1984 and 1986.
They focused on workers who lost their
1984 jobs because of plant closings (on the
grounds that plant closings are unlikely to be
correlated with an individual worker's
abilities). They looked only at workers who
were re-employed by 1986.
Gibbons and Katz estimated wages as
$$\ln(w_{it}) = \delta_0 + \delta_1 X_{1it} + \dots + \delta_k X_{kit} + \delta_{\text{Industry}_1} D_{\text{Industry}_1,it} + \dots + \delta_{\text{Industry}_m} D_{\text{Industry}_m,it} + \varepsilon_{it}$$
where the $X_{kit}$ are demographic variables, and the $D_{it}$ are a set of dummy variables for being employed in different industries.
Cross-Industry Wage Differentials
Estimating with simple OLS, Gibbons and
Katz estimate $\delta$s that are very similar to other
estimates of cross-industry wage differentials.
$$\ln(w_{it}) = \delta_0 + \delta_1 X_{1it} + \dots + \delta_k X_{kit} + \delta_{\text{Industry}_1} D_{\text{Industry}_1,it} + \dots + \delta_{\text{Industry}_m} D_{\text{Industry}_m,it} + \varepsilon_{it}$$
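Gibbons and Katz speculated that any
unmeasured ability is fixed over time and
equally rewarded in all industries, so it enters
as a worker-specific term $v_i$:
$$\ln(w_{it}) = \delta_0 + \delta_1 X_{1it} + \dots + \delta_k X_{kit} + \delta_{\text{Industry}_1} D_{\text{Industry}_1,it} + \dots + \delta_{\text{Industry}_m} D_{\text{Industry}_m,it} + v_i + \varepsilon_{it}$$
Differencing the 1986 and 1984 observations
eliminates the $v_i$:
$$\ln(w_{i1986}) = \delta_0 + \delta_1 X_{1i1986} + \dots + \delta_k X_{ki1986} + \delta_{\text{Industry}_1} D_{\text{Industry}_1,i1986} + \dots + \delta_{\text{Industry}_m} D_{\text{Industry}_m,i1986} + v_i + \varepsilon_{i1986}$$
$$\ln(w_{i1984}) = \delta_0 + \delta_1 X_{1i1984} + \dots + \delta_k X_{ki1984} + \delta_{\text{Industry}_1} D_{\text{Industry}_1,i1984} + \dots + \delta_{\text{Industry}_m} D_{\text{Industry}_m,i1984} + v_i + \varepsilon_{i1984}$$
$$\Delta\ln(w_i) = \delta_1 \Delta X_{1i} + \dots + \delta_k \Delta X_{ki} + \delta_{\text{Industry}_1} \Delta D_{\text{Industry}_1,i} + \dots + \delta_{\text{Industry}_m} \Delta D_{\text{Industry}_m,i} + \Delta\varepsilon_i$$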
The estimated industry coefficients from
the differenced equation are about 80% of
the estimated industry coefficients from the
levels equation.
Unobserved worker ability appears to
explain relatively little of the cross-industry
wage differentials.
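A small numerical sketch makes the differencing logic concrete. Every number in it is a made-up assumption, not Gibbons and Katz's data. There are four hypothetical workers observed in 1984 and 1986 and a single industry dummy with an assumed premium of 0.2. The ability term $v_i$ is larger for workers employed in the high-wage industry, and there is no idiosyncratic noise. The helper names `ols_slope` and `first_difference_slope` are hypothetical.

```python
def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def first_difference_slope(x_early, x_late, y_early, y_late):
    """Slope from the differenced equation. delta_0 and v_i have cancelled,
    so the regression runs through the origin."""
    dx = [b - a for a, b in zip(x_early, x_late)]
    dy = [b - a for a, b in zip(y_early, y_late)]
    return sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)


# Hypothetical data: ability v_i and industry dummies in 1984 and 1986.
ability = [0.0, 0.5, 1.0, 1.5]
d1984 = [0, 0, 1, 1]
d1986 = [0, 1, 1, 0]
premium = 0.2  # assumed true industry wage premium

# ln(w_it) = 1.0 + premium * D_it + v_i (noise-free for clarity)
lnw1984 = [1.0 + premium * d + v for d, v in zip(d1984, ability)]
lnw1986 = [1.0 + premium * d + v for d, v in zip(d1986, ability)]

levels = ols_slope(d1984 + d1986, lnw1984 + lnw1986)
differenced = first_difference_slope(d1984, d1986, lnw1984, lnw1986)
print(f"levels OLS: {levels:.3f}, differenced: {differenced:.3f}")
```

Run as written, the pooled levels regression attributes part of the ability gap to the industry and returns a slope of 0.7. The differenced regression, in which $\delta_0$ and $v_i$ have cancelled, recovers the assumed 0.2.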
A Panel Data DGP
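$$Y_{it} = \beta_{0i} + \beta_1 X_{1it} + \beta_2 X_{2i} + \beta_3 X_{3t} + \dots + \beta_K X_{Kit} + \varepsilon_{it}, \quad i = 1,\dots,n;\ t = 1,\dots,T$$
$$E(\varepsilon_{it}) = 0$$
$$\mathrm{Var}(\varepsilon_{it}) = \sigma^2$$
$$E(\varepsilon_{it}\varepsilon_{i't'}) = 0 \text{ if } i \neq i' \text{ OR } t \neq t'$$
$$E(X_{jit}\varepsilon_{it}) = 0 \text{ for all } j, i, t$$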
Panel Data DGPs
Notice that when we have panel data,
we index observations with both i and t.
Pay close attention to the subscripts
on variables.
Some variables vary only across time or
only across individuals.
Panel Data DGPs (cont.)
$$Y_{it} = \beta_{0i} + \beta_1 X_{1it} + \beta_2 X_{2i} + \beta_3 X_{3t} + \dots + \beta_K X_{Kit} + \varepsilon_{it}, \quad i = 1,\dots,n;\ t = 1,\dots,T$$
For example, $X_{2i}$ varies only by individual, and
is fixed over time. $X_{2i}$ might be a variable such
as race or gender.
$X_{3t}$ varies only by time, and is fixed across $i$.
$X_{3t}$ might be national unemployment.
$X_{1it}$ varies across BOTH individual and time.
For example, $X_{1it}$ might refer to wages.
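A minimal Python sketch shows how these subscripts appear in data. It assumes a long-format panel stored as a list of dicts with hypothetical keys `i`, `t`, `wage`, `female`, and `unemp`. For each variable, it checks which case applies:

- constant within each individual (an $X_i$ variable);
- constant within each period (an $X_t$ variable);
- neither (an $X_{it}$ variable).

```python
from collections import defaultdict


def constant_within(rows, var, key):
    """True if `var` takes a single value inside every group defined by `key`."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[key]].add(row[var])
    return all(len(values) == 1 for values in seen.values())


def classify(rows, var):
    """Label a variable as X_i, X_t, or X_it from how it varies in a long panel."""
    fixed_over_time = constant_within(rows, var, "i")  # one value per individual
    fixed_across_i = constant_within(rows, var, "t")   # one value per period
    if fixed_over_time and fixed_across_i:
        return "constant"
    if fixed_over_time:
        return "X_i"
    if fixed_across_i:
        return "X_t"
    return "X_it"


# Hypothetical long-format panel: 2 individuals x 2 periods.
panel = [
    {"i": 1, "t": 1, "wage": 10.0, "female": 0, "unemp": 5.0},
    {"i": 1, "t": 2, "wage": 11.0, "female": 0, "unemp": 6.0},
    {"i": 2, "t": 1, "wage": 12.0, "female": 1, "unemp": 5.0},
    {"i": 2, "t": 2, "wage": 12.5, "female": 1, "unemp": 6.0},
]
for name in ("wage", "female", "unemp"):
    print(name, classify(panel, name))
```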
$$Y_{it} = \beta_{0i} + \beta_1 X_{1it} + \beta_2 X_{2i} + \beta_3 X_{3t} + \dots + \beta_K X_{Kit} + \varepsilon_{it}, \quad i = 1,\dots,n;\ t = 1,\dots,T$$
One of the key features of the DGP is that we
allow each individual to have a distinct
intercept, $\beta_{0i}$. This intercept includes ALL
aspects of unobserved heterogeneity that
are fixed over the length of the panel.
In this DGP, the $\beta_{0i}$ are fixed across samples.
The unmeasured heterogeneity is the same
in every sample.
This DGP is called the Distinct
Intercepts DGP.
It is suitable for panels of states or countries,
where the same individuals would be selected
in each sample.
With longitudinal data on individual
workers or consumers, we draw a
different set of individuals from the
population each time we collect
a sample.
Each individual has his/her own set of
fixed omitted variables.
We cannot fix each individual intercept.
Another Panel Data DGP
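$$Y_{it} = \beta_0 + \beta_1 X_{1it} + \beta_2 X_{2i} + \beta_3 X_{3t} + \dots + \beta_K X_{Kit} + v_i + \varepsilon_{it}, \quad i = 1,\dots,n;\ t = 1,\dots,T$$
$$E(\varepsilon_{it}) = 0, \quad \mathrm{Var}(\varepsilon_{it}) = \sigma_\varepsilon^2$$
$$E(\varepsilon_{it}\varepsilon_{i't'}) = 0 \text{ if } i \neq i' \text{ OR } t \neq t'$$
$$E(v_i) = 0, \quad \mathrm{Var}(v_i) = \sigma_v^2$$
$$E(v_i v_{i'}) = 0 \text{ for } i \neq i'$$
$$E(v_i \varepsilon_{i't}) = 0 \text{ for all } i, i', t$$
$$E(X_{jit}\varepsilon_{it}) = 0 \text{ for all } j, i, t$$
EITHER $E(X_{jit} v_i) = 0$ for all $j, i, t$
OR $E(X_{jit} v_i) \neq 0$ for at least some $j, i, t$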
Panel Data DGPs
In this DGP, we return to a model with a
single intercept for all data points, $\beta_0$.
However, we break the error term into
two components:
$$\mu_{it} = v_i + \varepsilon_{it}$$
When we draw an individual i, we draw
one $v_i$ that is fixed for that individual in
all time periods.
$v_i$ includes all fixed omitted variables.
In the Distinct Intercepts DGP, the
unobserved heterogeneity is absorbed
into the individual-specific intercept $\beta_{0i}$.
In the second DGP, the unobserved
heterogeneity is absorbed into the
individual fixed component of the error
term, $v_i$.
This DGP is an Error Components
Model.
The Error Components DGP comes in
two flavors, depending on $E(X_{jit} v_i)$.
If $E(X_{jit} v_i) = 0$, then the unobserved
heterogeneity is uncorrelated with
the explanators.
OLS is unbiased and consistent.
If $E(X_{jit} v_i) \neq 0$, then the unobserved
heterogeneity IS correlated with
the explanators.
OLS is BIASED and INCONSISTENT.
Panel data is most useful in the second
Error Components case.
When $E(X_{jit} v_i) \neq 0$, OLS is inconsistent.
Using panel data, we can create a
consistent estimator: Fixed Effects.
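A small simulation sketch illustrates the two flavors. For illustration only, it assumes a single regressor with $\beta_1 = 2$ and standard normal $v_i$ and $\varepsilon_{it}$. It also assumes $X_{it} = c \cdot v_i + e_{it}$, so that $E(X_{it} v_i) = c$. The function names are hypothetical.

- With $c = 0$, pooled OLS should land near 2.
- With $c = 1$, its probability limit is $2 + \mathrm{Cov}(X_{it}, v_i)/\mathrm{Var}(X_{it}) = 2 + 1/2 = 2.5$, however many individuals we draw.

```python
import random


def simulate_error_components(n, T, beta, corr, seed=0):
    """Draw (x, y) from Y_it = 1 + beta*X_it + v_i + eps_it.

    X_it = corr*v_i + e_it, so E(X_it v_i) = corr * Var(v_i).
    v_i, e_it and eps_it are independent standard normals (an assumption).
    """
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        v = rng.gauss(0.0, 1.0)  # fixed for individual i in all periods
        for _ in range(T):
            x_it = corr * v + rng.gauss(0.0, 1.0)
            x.append(x_it)
            y.append(1.0 + beta * x_it + v + rng.gauss(0.0, 1.0))
    return x, y


def pooled_ols_slope(x, y):
    """OLS slope of y on x with an intercept, ignoring the panel structure."""
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


b_uncorr = pooled_ols_slope(*simulate_error_components(4000, 5, 2.0, corr=0.0))
b_corr = pooled_ols_slope(*simulate_error_components(4000, 5, 2.0, corr=1.0))
print(f"E(Xv) = 0:  {b_uncorr:.3f}")  # probability limit is 2.0
print(f"E(Xv) != 0: {b_corr:.3f}")    # probability limit is 2.5
```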
Fixed Effects (Chapter 16.2)
The Fixed Effects Estimator
Used with EITHER the distinct intercepts
DGP OR the error components DGP
with $E(X_{jit} v_i) \neq 0$
Basic Idea: estimate a separate intercept
for each individual
Fixed Effects (cont.)
The simplest way to estimate separate
intercepts for each individual is to use
dummy variables.
This method is called the least squares
dummy variable estimator.
We have already seen that we can use
dummy variables to estimate separate
intercepts for different groups.
With panel data, we have multiple
observations for each individual. We
can group these observations.
Least Squares Dummy Variable
Estimator:
1. Create a set of n dummy variables,
$D_j$, such that $D_j = 1$ if $i = j$.
2. Regress $Y_{it}$ against all the dummies,
$X_t$, and $X_{it}$ variables (you must omit
$X_i$ variables and the constant).
The LSDV estimator is conceptually
quite simple.
In practice, the tricky parts are:
1. Creating the dummy variables
2. Entering the regression into the computer
3. Reporting results
Suppose we have a longitudinal dataset
with 300 workers over 10 years.
n = 300
We must create 300 dummy variables
and then specify a regression with
300+ explanators.
How do we do this in our software
package?
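One way the dummy-variable bookkeeping might look in code is the pure-Python sketch below, which assumes no statistics package. It builds one dummy column per individual, drops the constant, and solves the normal equations by Gaussian elimination. The data are a hypothetical noise-free panel with three individuals and an assumed slope of 1.5, and the function names are illustrative.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    m = len(a)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, m))
        x[r] = (aug[r][m] - tail) / aug[r][r]
    return x


def lsdv(ids, x, y):
    """Regress y on one dummy per individual plus x, with no constant."""
    people = sorted(set(ids))
    # Columns: D_j = 1 if i == j (one per individual), then X_it.
    design = [[1.0 if i == j else 0.0 for j in people] + [xv]
              for i, xv in zip(ids, x)]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)]
           for p in range(k)]
    xty = [sum(row[p] * yv for row, yv in zip(design, y)) for p in range(k)]
    coef = solve(xtx, xty)
    return dict(zip(people, coef[:-1])), coef[-1]


# Hypothetical noise-free panel: Y_it = beta_0i + 1.5 * X_it.
ids = [1, 1, 1, 2, 2, 2, 3, 3, 3]
x = [1.0, 2.0, 3.0, 2.0, 4.0, 5.0, 0.0, 1.0, 3.0]
true_intercepts = {1: 0.5, 2: -1.0, 3: 2.0}
y = [true_intercepts[i] + 1.5 * xv for i, xv in zip(ids, x)]

intercepts, slope = lsdv(ids, x, y)
print(intercepts, slope)
```

With 300 workers, the same code would build 300 dummy columns, and the normal-equation matrix would be (300 + k) by (300 + k). In practice, a statistics package or numerical library would do this solve.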
Our regression output includes
300 intercepts.
Usually, we are not interested in the
intercepts themselves.
We include the dummy variables to
control for heterogeneity.
In reporting your regression output,
it is preferable to note that you have
included individual fixed effects.
Then omit the dummy variable
coefficients from your table of results.
At some point, n becomes too large
for the computer to handle easily.
Modern computers can implement LSDV
for ever larger data sets, but eventually
LSDV becomes computationally
intractable.
A computationally convenient alternative
is called the Fixed Effects Estimator.
Technically, only this strategy is
Fixed Effects; using dummy variables
is LSDV.
In practice, econometricians tend to
refer to either method as Fixed Effects.
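The initial insight for the Fixed Effects
estimator: if we DIFFERENCE
observations for the same individual, the
$v_i$ cancels out.
$$Y_{it} = \beta_0 + \beta_1 X_{1it} + \beta_2 X_{2i} + v_i + \varepsilon_{it}$$
$$Y_{it'} = \beta_0 + \beta_1 X_{1it'} + \beta_2 X_{2i} + v_i + \varepsilon_{it'}$$
$$Y_{it} - Y_{it'} = \underbrace{(\beta_0 - \beta_0)}_{=0} + \beta_1 (X_{1it} - X_{1it'}) + \underbrace{\beta_2 (X_{2i} - X_{2i})}_{=0} + \underbrace{(v_i - v_i)}_{=0} + (\varepsilon_{it} - \varepsilon_{it'})$$
$$Y_{it} - Y_{it'} = \beta_1 (X_{1it} - X_{1it'}) + (\varepsilon_{it} - \varepsilon_{it'})$$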
When we difference, the heterogeneity
term $v_i$ drops out.
(In the distinct intercepts model, the $\beta_{0i}$
would drop out.)
By assumption, the $\varepsilon_{it}$ are uncorrelated
with the $X_{it}$, so OLS on the differenced data
would be a consistent estimator of $\beta_1$.
If T = 2, then we have only 2
observations for each individual
(as in the Gibbons and Katz example).
Differencing the 2 observations
is efficient.
If T > 2, then differencing any 2
observations ignores valuable
information in the other observations
for each individual.
We can use all the observations
for each individual if we subtract
the individual-specific mean from
each observation.
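$$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 X_i + v_i + \varepsilon_{it}$$
$$\bar{Y}_i = \beta_0 + \beta_1 \bar{X}_i + \beta_2 X_i + v_i + \bar{\varepsilon}_i$$
$$Y_{it} - \bar{Y}_i = \underbrace{(\beta_0 - \beta_0)}_{=0} + \beta_1 (X_{it} - \bar{X}_i) + \underbrace{\beta_2 (X_i - X_i)}_{=0} + \underbrace{(v_i - v_i)}_{=0} + (\varepsilon_{it} - \bar{\varepsilon}_i)$$
where $\bar{Y}_i = \frac{1}{T}\sum_{t=1}^{T} Y_{it}$.
Note: $\bar{v}_i = \frac{1}{T}\sum_{t=1}^{T} v_i = v_i$, because $v_i$ does not change over time.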
Fixed Effects:
1) Construct
$$y_{it}^{FE} = Y_{it} - \bar{Y}_i, \qquad x_{it}^{FE} = X_{it} - \bar{X}_i$$
2) Regress
$$y_{it}^{FE} = \beta_1 x_{it}^{FE} + (\varepsilon_{it} - \bar{\varepsilon}_i)$$
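The two Fixed Effects steps translate almost line for line into code. The pure-Python sketch below uses a hypothetical panel (three individuals, three periods, an assumed slope of 1.5, small made-up disturbances) and hypothetical function names. It also recovers each individual's intercept afterwards as $\bar{Y}_i - \hat{\beta}_1 \bar{X}_i$, which reproduces the LSDV intercepts from the within estimate.

```python
from collections import defaultdict


def individual_means(ids, values):
    """Return {i: mean of values for individual i}."""
    totals, counts = defaultdict(float), defaultdict(int)
    for i, v in zip(ids, values):
        totals[i] += v
        counts[i] += 1
    return {i: totals[i] / counts[i] for i in totals}


def fixed_effects(ids, x, y):
    """Within estimator: demean x and y by individual, then OLS without a constant."""
    x_bar, y_bar = individual_means(ids, x), individual_means(ids, y)
    x_fe = [xv - x_bar[i] for i, xv in zip(ids, x)]  # step 1: x_it^FE
    y_fe = [yv - y_bar[i] for i, yv in zip(ids, y)]  # step 1: y_it^FE
    # Step 2: OLS through the origin on the demeaned data.
    beta = sum(a * b for a, b in zip(x_fe, y_fe)) / sum(a * a for a in x_fe)
    # Optional: individual intercepts, Ybar_i - beta * Xbar_i.
    intercepts = {i: y_bar[i] - beta * x_bar[i] for i in y_bar}
    return beta, intercepts


# Hypothetical panel with small made-up disturbances.
ids = [1, 1, 1, 2, 2, 2, 3, 3, 3]
x = [1.0, 2.0, 3.0, 2.0, 4.0, 5.0, 0.0, 1.0, 3.0]
noise = [0.1, -0.2, 0.1, 0.0, 0.05, -0.05, -0.1, 0.2, -0.1]
true_intercepts = {1: 0.5, 2: -1.0, 3: 2.0}
y = [true_intercepts[i] + 1.5 * xv + e for i, xv, e in zip(ids, x, noise)]

beta_fe, alpha_fe = fixed_effects(ids, x, y)
print(beta_fe, alpha_fe)
```

For these made-up numbers the slope comes out at $1.5 - 0.45/34 \approx 1.487$. The disturbances move it slightly away from 1.5, but the individual intercepts never enter the calculation.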
The Fixed Effects and LSDV estimators
provide identical estimates.
OLS on the demeaned data computes e.s.e.s
with nT - k degrees of freedom; for Fixed Effects
this must be adjusted to nT - n - k, to account
for the extra n degrees of freedom used up by
the individual means.
The computer can make this adjustment.
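A quick arithmetic sketch shows the size of the adjustment. It uses the earlier example of 300 workers over 10 years and assumes one slope coefficient (k = 1, an assumption for illustration).

- OLS on the demeaned data divides the sum of squared residuals by nT - k = 2999.
- Fixed Effects should divide by nT - n - k = 2699.

The function name is hypothetical.

```python
import math


def fe_se_correction(n, T, k):
    """Factor by which to multiply e.s.e.s from OLS on demeaned data.

    OLS on the demeaned data divides the SSR by nT - k; the Fixed Effects
    variance should divide by nT - n - k because n individual means were used.
    """
    naive_df = n * T - k
    correct_df = n * T - n - k
    return math.sqrt(naive_df / correct_df)


# 300 workers over 10 years, assuming k = 1 slope coefficient.
factor = fe_se_correction(300, 10, 1)
print(f"{factor:.4f}")
```

The factor is about 1.054, so the uncorrected e.s.e.s would understate the correct ones by roughly 5%.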
Demeaning each observation by the
individual-specific mean eliminates the
need to create n dummy variables.
FE is computationally much simpler.
Fixed Effects (however estimated)
discards all variation between
individuals.
Fixed Effects uses only variation over
time within an individual.
FE is sometimes called the within
estimator.
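To make "discards all variation between individuals" concrete, the sketch below splits the total sum of squares of a regressor into two parts:

- a within part: deviations from each individual's own mean, the only part FE uses;
- a between part: deviations of the individual means from the grand mean.

The two-person panel and the function name are hypothetical.

```python
from collections import defaultdict


def within_between(ids, x):
    """Split the total sum of squares of x into within- and between-individual parts."""
    groups = defaultdict(list)
    for i, v in zip(ids, x):
        groups[i].append(v)
    grand = sum(x) / len(x)
    within = sum(sum((v - sum(g) / len(g)) ** 2 for v in g) for g in groups.values())
    between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values())
    return within, between


ids = [1, 1, 2, 2]
wage = [10.0, 11.0, 12.0, 12.5]  # varies within and between individuals
female = [0.0, 0.0, 1.0, 1.0]    # an X_i variable: no within variation at all

w_wage, b_wage = within_between(ids, wage)
w_fem, b_fem = within_between(ids, female)
print(w_wage, b_wage, w_fem, b_fem)
```

For the hypothetical wage series, the total sum of squares is 3.6875, of which only 0.625 is within variation. The time-invariant `female` variable has no within variation at all, which is why FE cannot estimate its coefficient.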
Fixed Effects discards a great deal of
variation in the explanators (all variation
between individuals).
Fixed Effects uses up n degrees
of freedom.
Fixed Effects is not efficient if $E(X_{it} v_i) = 0$.
Could we use OLS?
Checking Understanding (cont.)
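$$Y_{it} = \beta_0 + \beta_1 X_{it} + v_i + \varepsilon_{it}$$
$$E(\varepsilon_{it}) = 0, \quad \mathrm{Var}(\varepsilon_{it}) = \sigma_\varepsilon^2$$
$$E(\varepsilon_{it}\varepsilon_{i't'}) = 0 \text{ if } i \neq i' \text{ OR } t \neq t'$$
$$E(v_i) = 0, \quad \mathrm{Var}(v_i) = \sigma_v^2$$
$$E(v_i v_{i'}) = 0 \text{ for } i \neq i', \quad E(X_{it} v_i) = 0 \text{ for all } i, t$$
$$E(v_i \varepsilon_{i't}) = 0 \text{ for all } i, i', t, \quad E(X_{it}\varepsilon_{it}) = 0 \text{ for all } i, t$$
Question: is OLS consistent and efficient?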
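Because X is uncorrelated with both
$v_i$ and $\varepsilon_{it}$, OLS is consistent in the
uncorrelated version of the error
components DGP.
The composite error terms, $\mu_{it} = v_i + \varepsilon_{it}$,
are homoskedastic:
$$\mathrm{Var}(\mu_{it}) = \mathrm{Var}(v_i + \varepsilon_{it}) = \sigma_v^2 + \sigma_\varepsilon^2$$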
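However, the covariance between
disturbances for a given individual is
$$\mathrm{Cov}(\mu_{it}, \mu_{it'}) = E\big([v_i + \varepsilon_{it}][v_i + \varepsilon_{it'}]\big) = E(v_i^2) + E(v_i\varepsilon_{it'}) + E(v_i\varepsilon_{it}) + E(\varepsilon_{it}\varepsilon_{it'}) = E(v_i^2) = \sigma_v^2 \quad (t \neq t')$$
$$\mathrm{Corr}(\mu_{it}, \mu_{it'}) = \frac{\mathrm{Cov}(\mu_{it}, \mu_{it'})}{\mathrm{Var}(\mu_{it})} = \frac{\sigma_v^2}{\sigma_v^2 + \sigma_\varepsilon^2}$$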
In the presence of serial correlation,
OLS is inefficient.
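A simulation can check the correlation formula. This sketch assumes normal $v_i$ and $\varepsilon_{it}$ with illustrative values $\sigma_v^2 = 1$ and $\sigma_\varepsilon^2 = 3$, and draws two periods per individual. It then compares the sample correlation of the composite errors $v_i + \varepsilon_{it}$ with $\sigma_v^2/(\sigma_v^2 + \sigma_\varepsilon^2) = 0.25$. The function names are hypothetical.

```python
import random


def composite_error_corr(sigma_v2, sigma_e2):
    """Corr(v_i + eps_it, v_i + eps_it') for t != t'."""
    return sigma_v2 / (sigma_v2 + sigma_e2)


def simulated_corr(sigma_v2, sigma_e2, n, seed=0):
    """Draw two periods of v_i + eps_it per individual; return their sample correlation."""
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n):
        v = rng.gauss(0.0, sigma_v2 ** 0.5)  # shared by both periods
        first.append(v + rng.gauss(0.0, sigma_e2 ** 0.5))
        second.append(v + rng.gauss(0.0, sigma_e2 ** 0.5))
    m1, m2 = sum(first) / n, sum(second) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(first, second))
    var1 = sum((a - m1) ** 2 for a in first)
    var2 = sum((b - m2) ** 2 for b in second)
    return cov / (var1 * var2) ** 0.5


theory = composite_error_corr(1.0, 3.0)
sample = simulated_corr(1.0, 3.0, n=40000)
print(theory, round(sample, 3))
```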
Random Effects
When unobserved heterogeneity is
uncorrelated with explanators, panel
data techniques are not needed to
produce a consistent estimator.
However, we do need to correct for
serial correlation between observations
of the same individual.
Random Effects (cont.)
When $E(X_{it} v_i) \neq 0$, panel data provides
a valuable tool for eliminating omitted
variables bias. We use Fixed Effects to
gain the benefits of panel data.
When $E(X_{it} v_i) = 0$, panel data does not
offer special benefits. We use Random
Effects to overcome the serial
correlation of panel data.
The key idea of random effects:
Estimate $\sigma_v^2$ and $\sigma_\varepsilon^2$.
Use these estimates to construct efficient
weights of panel data observations:
1) Estimate the regression using Fixed Effects.
2) Construct Fixed Effects residuals, $u_{it}$.
3) Estimate $\sigma_\varepsilon^2$:
$$s_\varepsilon^2 = \frac{1}{n(T-1) - k}\sum_{i=1}^{n}\sum_{t=1}^{T} u_{it}^2$$
4) Estimate the regression using OLS.
5) Estimate $\sigma^2 = \sigma_v^2 + \sigma_\varepsilon^2$: $s^2$ from the OLS residuals, as usual.
6) Because $\sigma^2 = \sigma_v^2 + \sigma_\varepsilon^2$:
$$s_v^2 = s^2 - s_\varepsilon^2$$
Once we have estimates of $\sigma_v^2$ and $\sigma_\varepsilon^2$,
we can re-weight the observations
optimally.
These calculations are complicated,
but most computer packages can
implement them.
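The six steps, followed by a re-weighting, can be sketched in pure Python for a balanced panel with one regressor. Two choices here are assumptions that the steps above do not specify. First, a negative $s_v^2$ is truncated at zero. Second, the re-weighting uses the standard balanced-panel GLS transformation: it quasi-demeans each variable by $\theta = 1 - \sqrt{s_\varepsilon^2 / (s_\varepsilon^2 + T s_v^2)}$ and then runs OLS. The simulated data (true $\beta_1 = 2$, $\sigma_v^2 = \sigma_\varepsilon^2 = 1$, $X_{it}$ independent of $v_i$) and all function names are hypothetical.

```python
import math
import random
from collections import defaultdict


def simulate(n, T, beta, seed=0):
    """Uncorrelated error components DGP: Y = 1 + beta*X + v_i + eps, X independent of v_i."""
    rng = random.Random(seed)
    ids, x, y = [], [], []
    for i in range(n):
        v = rng.gauss(0.0, 1.0)
        for _ in range(T):
            x_it = rng.gauss(0.0, 1.0)
            ids.append(i)
            x.append(x_it)
            y.append(1.0 + beta * x_it + v + rng.gauss(0.0, 1.0))
    return ids, x, y


def means_by(ids, values):
    """Return {i: mean of values for individual i}."""
    totals, counts = defaultdict(float), defaultdict(int)
    for i, v in zip(ids, values):
        totals[i] += v
        counts[i] += 1
    return {i: totals[i] / counts[i] for i in totals}


def ols(x, y):
    """Intercept, slope, and SSR from OLS of y on x with a constant."""
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    ssr = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    return intercept, slope, ssr


def random_effects(ids, x, y, T, k=1):
    """Feasible GLS random effects for a balanced panel with one regressor."""
    n = len(set(ids))
    x_bar, y_bar = means_by(ids, x), means_by(ids, y)
    # Steps 1-2: Fixed Effects on demeaned data, then its residuals u_it.
    x_fe = [a - x_bar[i] for i, a in zip(ids, x)]
    y_fe = [b - y_bar[i] for i, b in zip(ids, y)]
    b_fe = sum(a * b for a, b in zip(x_fe, y_fe)) / sum(a * a for a in x_fe)
    u = [b - b_fe * a for a, b in zip(x_fe, y_fe)]
    # Step 3: s_eps^2 with n(T - 1) - k degrees of freedom.
    s2_eps = sum(r * r for r in u) / (n * (T - 1) - k)
    # Steps 4-5: pooled OLS and its usual s^2 (k slopes plus a constant).
    _, _, ssr = ols(x, y)
    s2 = ssr / (n * T - k - 1)
    # Step 6: s_v^2 = s^2 - s_eps^2, truncated at zero (an assumed convention).
    s2_v = max(s2 - s2_eps, 0.0)
    # Re-weighting (assumed standard GLS form): quasi-demean by theta, then OLS.
    theta = 1.0 - math.sqrt(s2_eps / (s2_eps + T * s2_v))
    x_star = [a - theta * x_bar[i] for i, a in zip(ids, x)]
    y_star = [b - theta * y_bar[i] for i, b in zip(ids, y)]
    _, b_re, _ = ols(x_star, y_star)
    return b_re, theta, s2_v, s2_eps


ids, x, y = simulate(n=2000, T=5, beta=2.0)
b_re, theta, s2_v, s2_eps = random_effects(ids, x, y, T=5)
print(round(b_re, 3), round(theta, 3), round(s2_v, 3), round(s2_eps, 3))
```

In this transformation, θ = 0 reproduces pooled OLS and θ = 1 reproduces Fixed Effects, so Random Effects sits between the two.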
Example: Production Functions
(Chapter 16.2)
We have data from 625 French firms from
16 countries for 8 years.
We wish to estimate a Cobb-Douglas
production function:
$$Q_i = \beta_0 L_i^{\beta_1} K_i^{\beta_2} \varepsilon_i$$
Taking logs:
$$\ln(Q_i) = \ln(\beta_0) + \beta_1 \ln(L_i) + \beta_2 \ln(K_i) + \ln(\varepsilon_i)$$
We estimate using random effects.
TABLE 16.1 Random Effects Estimation
of a Cobb-Douglas Production Function
for a Sample of Manufacturing Firms
The estimated coefficients of 0.30 for
capital and 0.69 for labor are similar to
estimates using US data.
We also get similar results using fixed
effects estimation.
TABLE 16.2 Fixed Effects Estimation of a
Cobb-Douglas Production Function for a
Sample of Manufacturing Firms
We arrive at similar estimates using
either random effects or fixed effects.
Because only fixed effects controls
for unobserved heterogeneity that is
correlated with the explanators, the
similarity between the two estimates
suggests that unobserved heterogeneity
is not creating a large bias in
this sample.
Example: Production Functions (cont.)
The fixed effects estimator discards all
variation between firms, and must use
624 more degrees of freedom than
random effects.
Moving from RE to FE increases the e.s.e.
on capital from 0.0116 to 0.0145.
The e.s.e. on labor moves from 0.0118
to 0.0132.
Example: Production Functions (cont.)
The RE estimator provides more
precise estimates.
We would prefer to use RE instead of FE.
However, RE might be inconsistent if $E(X_{it} v_i) \neq 0$.
We need a test to help determine whether
it is safe to use RE.
The Hausman Test (Chapter 16.4)
Hausman's specification test for error
components DGPs provides guidance
on whether $E(X_{it} v_i) \neq 0$.
The key idea: if $E(X_{it} v_i) \neq 0$, then the
inconsistent RE estimator and the
consistent FE estimator converge to
different estimates.
The Hausman Test (cont.)
If $E(X_{it} v_i) = 0$, then the unobserved
heterogeneity is uncorrelated with X
and does not create a bias.
RE and FE are both consistent.
For two consistent estimators to provide
significantly different estimates would be
surprising.
We know the FE estimator is consistent
even when $E(X_{it} v_i) \neq 0$.
The problem with FE is its inefficiency.
FE is not as precise as RE.
Although FE is imprecise, it may provide
a good enough estimate to detect a
large bias in RE.
If FE is very imprecise, then the
Hausman test has very weak power and
cannot rule out even large biases.
If FE is very precise, then the Hausman
test has very good power, but we gain
little benefit from switching to the more
efficient RE.
If FE is somewhat precise, then the Hausman
test can warn us away from using RE in the
presence of a large bias, but there is still
room for substantial efficiency gains in
switching to RE.
With the French manufacturing firms, FE is
precise enough to reject the null even though
the two estimates are fairly close.
TABLE 16.4 Hausman Specification Test
of French Firms Error Components Correlation
with Explanators
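For a single coefficient, the Hausman statistic is $(b_{FE} - b_{RE})^2 / (\mathrm{Var}(b_{FE}) - \mathrm{Var}(b_{RE}))$, compared against a chi-square distribution with one degree of freedom. The test behind Table 16.4 may instead use several coefficients jointly with their full covariance matrix. The sketch below (hypothetical function names) uses the e.s.e.s reported above for capital and labor. It asks how large an FE-RE gap this one-coefficient version would need before rejecting at the 5% level. It does not use the actual FE estimates, which are not reproduced here.

```python
import math

CHI2_1DF_5PCT = 3.841  # 5% critical value of a chi-square with 1 degree of freedom


def hausman_one_coef(b_fe, b_re, se_fe, se_re):
    """Single-coefficient Hausman statistic: (b_FE - b_RE)^2 / (Var_FE - Var_RE)."""
    var_diff = se_fe ** 2 - se_re ** 2
    if var_diff <= 0:
        raise ValueError("expected Var(b_FE) > Var(b_RE)")
    return (b_fe - b_re) ** 2 / var_diff


def smallest_rejected_gap(se_fe, se_re, crit=CHI2_1DF_5PCT):
    """Smallest |b_FE - b_RE| that this one-coefficient test would reject."""
    return math.sqrt(crit * (se_fe ** 2 - se_re ** 2))


# e.s.e.s from the production-function example: (FE, RE).
gap_capital = smallest_rejected_gap(0.0145, 0.0116)
gap_labor = smallest_rejected_gap(0.0132, 0.0118)
print(f"capital: {gap_capital:.4f}, labor: {gap_labor:.4f}")
```

The answer is roughly 0.017 for capital and 0.012 for labor. Gaps of only a couple of hundredths between the FE and RE estimates would therefore already be significant.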
The Hausman Test
Warning: fixed effects exacerbates
measurement error bias.
There is likely to be less variation
in X within the experience of a single
individual than across several individuals.
Small measurement errors can become
large relative to the within-variation in X.
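A back-of-the-envelope sketch shows why demeaning hurts. Under classical measurement error, OLS is attenuated by the factor $\lambda = \sigma_{X^*}^2/(\sigma_{X^*}^2 + \sigma_u^2)$. Here $\sigma_{X^*}^2$ is the variance of the true regressor and $\sigma_u^2$ the variance of the measurement error. The variances below are hypothetical. The sketch simplifies by treating the measurement-error variance as the same before and after demeaning; with errors independent over time, demeaning shrinks it only by the factor $(T-1)/T$.

```python
def attenuation(signal_var, noise_var):
    """Classical errors-in-variables attenuation factor: plim(b_hat) / beta."""
    return signal_var / (signal_var + noise_var)


# Hypothetical variances of the true regressor and its measurement error.
total_var_x = 1.0   # variation of true X across all observations
within_var_x = 0.1  # variation of true X over time within individuals
noise_var = 0.05    # measurement-error variance, assumed the same in both cases

pooled = attenuation(total_var_x, noise_var)
within = attenuation(within_var_x, noise_var)
print(f"pooled: {pooled:.3f}, within: {within:.3f}")
```

With these made-up numbers, the pooled slope keeps about 95% of the true effect, but the within slope keeps only about 67%.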
When the Hausman test warns us that RE and
FE provide significantly different estimates,
the difference could arise because of
omitted variables bias in RE, caused by
$E(X_{it} v_i) \neq 0$.
This difference could ALSO arise because
of measurement error biases in FE.
Review
Potential unobserved heterogeneity is a form
of omitted variables bias.
Unobserved heterogeneity refers to omitted
variables that are fixed for an individual
(at least over a long period of time).
A person's upbringing, family characteristics,
innate ability, and demographics (except age)
do not change.
Review (cont.)
Panel Data is data in which we
observe repeated cross-sections of the
same individuals.
By far the leading type of panel data is
repeated cross-sections over time.
Review (cont.)
The key feature of panel data is that we
observe the same individual in more
than one condition.
Omitted variables that are fixed will take
on the same values each time we
observe the same individual.
Review (cont.)
We learned 3 different DGPs for
panel data.
In the distinct intercept DGP, across
samples we would observe the same
individuals with the same unobserved
heterogeneity.
Each i has its own intercept, $\beta_{0i}$, that is
fixed across samples.
A Panel Data DGP
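$$Y_{it} = \beta_{0i} + \beta_1 X_{1it} + \dots + \beta_K X_{Kit} + \varepsilon_{it}, \quad i = 1,\dots,n;\ t = 1,\dots,T$$
$$E(\varepsilon_{it}) = 0$$
$$\mathrm{Var}(\varepsilon_{it}) = \sigma^2$$
$$E(\varepsilon_{it}\varepsilon_{i't'}) = 0 \text{ if } i \neq i' \text{ OR } t \neq t'$$
$$E(X_{jit}\varepsilon_{it}) = 0 \text{ for all } j, i, t$$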
Review
Error components DGPs are suitable
when we would draw different
individuals across samples.
When each i is drawn, its unobserved
heterogeneity is captured in a $v_i$ term.
We learned two error components
DGPs, depending on whether the $v_i$ is
correlated with the $X_{kit}$s.
Another Panel Data DGP
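$$Y_{it} = \beta_0 + \beta_1 X_{1it} + \beta_2 X_{2i} + \beta_3 X_{3t} + \dots + \beta_K X_{Kit} + v_i + \varepsilon_{it}, \quad i = 1,\dots,n;\ t = 1,\dots,T$$
$$E(\varepsilon_{it}) = 0, \quad \mathrm{Var}(\varepsilon_{it}) = \sigma_\varepsilon^2$$
$$E(\varepsilon_{it}\varepsilon_{i't'}) = 0 \text{ if } i \neq i' \text{ OR } t \neq t'$$
$$E(v_i) = 0, \quad \mathrm{Var}(v_i) = \sigma_v^2$$
$$E(v_i v_{i'}) = 0 \text{ for } i \neq i'$$
$$E(v_i \varepsilon_{i't}) = 0 \text{ for all } i, i', t$$
$$E(X_{jit}\varepsilon_{it}) = 0 \text{ for all } j, i, t$$
EITHER $E(X_{jit} v_i) = 0$ for all $j, i, t$
OR $E(X_{jit} v_i) \neq 0$ for at least some $j, i, t$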
Review
If $E(X_{it} v_i) \neq 0$, OLS would be inconsistent.
By estimating a separate intercept for
each individual, we can control for the $v_i$.
We learned two equivalent strategies:
LSDV and FE.
Review (cont.)
The simplest way to estimate separate
intercepts for each individual is to use
dummy variables.
This method is called the least squares
dummy variable estimator.
Review (cont.)
Least Squares Dummy Variable
Estimator:
1. Create a set of n dummy variables,
$D_j$, such that $D_j = 1$ if $i = j$.
2. Regress $Y_{it}$ against all the dummies,
$X_t$, and $X_{it}$ variables (you must omit $X_i$
variables and the constant).
Review (cont.)
Fixed Effects:
1) Construct
$$y_{it}^{FE} = Y_{it} - \bar{Y}_i, \qquad x_{it}^{FE} = X_{it} - \bar{X}_i$$
2) Regress
$$y_{it}^{FE} = \beta_1 x_{it}^{FE} + (\varepsilon_{it} - \bar{\varepsilon}_i)$$
Review (cont.)
Demeaning each observation by the
individual-specific mean eliminates the
need to create n dummy variables.
FE is computationally much simpler.
Review (cont.)
Fixed Effects (however estimated)
discards all variation between
individuals.
Fixed Effects uses only variation over
time within an individual.
FE is sometimes called the within
estimator.
Review (cont.)
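Because X is uncorrelated with both
$v_i$ and $\varepsilon_{it}$, OLS is consistent in the
uncorrelated version of the error
components DGP.
The OLS disturbances, $\mu_{it} = v_i + \varepsilon_{it}$, are homoskedastic:
$$\mathrm{Var}(\mu_{it}) = \mathrm{Var}(v_i + \varepsilon_{it}) = \sigma_v^2 + \sigma_\varepsilon^2$$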
Review (cont.)
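However, the covariance between
disturbances for a given individual is
$$\mathrm{Cov}(\mu_{it}, \mu_{it'}) = E\big([v_i + \varepsilon_{it}][v_i + \varepsilon_{it'}]\big) = E(v_i^2) + E(v_i\varepsilon_{it'}) + E(v_i\varepsilon_{it}) + E(\varepsilon_{it}\varepsilon_{it'}) = E(v_i^2) = \sigma_v^2 \quad (t \neq t')$$
$$\mathrm{Corr}(\mu_{it}, \mu_{it'}) = \frac{\mathrm{Cov}(\mu_{it}, \mu_{it'})}{\mathrm{Var}(\mu_{it})} = \frac{\sigma_v^2}{\sigma_v^2 + \sigma_\varepsilon^2}$$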
Review (cont.)
When unobserved heterogeneity is
uncorrelated with explanators, panel
data techniques are not needed to
produce a consistent estimator.
However, we do need to correct for
serial correlation between observations
of the same individual.
Review (cont.)
When $E(X_{it} v_i) \neq 0$, panel data provides
a valuable tool for eliminating omitted
variables bias. We use Fixed Effects to
gain the benefits of panel data.
Review (cont.)
When $E(X_{it} v_i) = 0$, panel data is less
convenient than an equal-sized
cross-sectional data set. We use
Random Effects to overcome the serial
correlation of panel data.
Review (cont.)
The key idea of random effects:
Estimate $\sigma_v^2$ and $\sigma_\varepsilon^2$.
Use these estimates to construct efficient
weights of panel data observations.