
Basic Econometrics

Lecture 12: Advanced Panel Data Methods


Måns Söderbom, Department of Economics, University of Gothenburg
Mans.Soderbom@economics.gu.se

Introduction
Reference: Wooldridge, Chapter 14.1-2 (14.3 optional)

In the previous lecture (and in Chapter 13) it was discussed how first differencing of the data can be used to estimate the unobserved effects panel data model. In this chapter we cover two other methods that can be used for this purpose:
The fixed effects estimator
The random effects estimator

Throughout we assume we have panel data, i.e. data on the same individuals (or whatever) over time.

Important features of these methods


The fixed effects (FE) estimator:
Attractive, because the unobserved effect ai can be correlated with the explanatory variables.
But a disadvantage is that coefficients on time-constant explanatory variables cannot be directly estimated (a time-constant xi would be wiped out along with ai).

The random effects (RE) estimator:


We need to assume that the unobserved effect ai is uncorrelated with the explanatory variables.
Unlike the FE model, with RE we can estimate coefficients on time-constant explanatory variables directly.

14.1 Fixed effects estimation


Recall the simple 2-period fixed effects model (Chapter 13):

y_{it} = \beta_0 + \delta_0 d2_t + \beta_1 x_{it} + a_i + u_{it},   t = 1, 2.

Because a_i is unobserved, we can't include it in the regression. This may create omitted variable bias if a_i is correlated with x_{it}. Possible exam question: (a) Why? (b) Please provide an example in which this might be a problem.

Solution: Eliminate a_i from the equation and then estimate the parameters of interest! As we have seen, one way of eliminating a_i is to take first differences of the data:

\Delta y_i = \delta_0 + \beta_1 \Delta x_i + \Delta u_i.

But first differencing is just one of many ways of eliminating a_i. An alternative method, which often works better in practice, is called the fixed effects transformation.

The fixed effects transformation


Consider a model with a single explanatory variable: for each individual i,

y_{it} = \beta_1 x_{it} + a_i + u_{it},   t = 1, 2, ..., T.   (14.1)

Now, for each individual i, average this equation over time. We get:

\bar{y}_i = \beta_1 \bar{x}_i + a_i + \bar{u}_i,   (14.2)

where \bar{y}_i = T^{-1} \sum_{t=1}^{T} y_{it}, and so on.

Because a_i is fixed (constant) over time, it appears in both (14.1) and (14.2). Now subtract (14.2) from (14.1):

y_{it} - \bar{y}_i = \beta_1 (x_{it} - \bar{x}_i) + (u_{it} - \bar{u}_i),

which we shall write as

\ddot{y}_{it} = \beta_1 \ddot{x}_{it} + \ddot{u}_{it},   (14.3)

where \ddot{y}_{it} = y_{it} - \bar{y}_i is the time-demeaned data on y (and similarly for \ddot{x}_{it} and \ddot{u}_{it}).

This is also called the within transformation. The important thing: the unobserved effect a_i has disappeared.

We can now estimate (14.3) using OLS.


This procedure (first, do the within transformation of the data, then estimate by OLS) is called the fixed effects estimator (or the within estimator).
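A minimal Stata sketch of this two-step procedure, using hypothetical variable names (y, x, and a panel identifier id); in practice xtreg y x, fe does all of this in one command:

* step 1: time-demean y and x within each individual
bysort id: egen ybar = mean(y)
bysort id: egen xbar = mean(x)
gen ydd = y - ybar
gen xdd = x - xbar
* step 2: estimate the transformed equation (14.3) by OLS
reg ydd xdd

(Note that the standard errors from this manual version do not adjust the degrees of freedom for the N individual means estimated in step 1; xtreg, fe makes that adjustment automatically.)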

The general unobserved effects model


Straightforward to add more explanatory variables:

y_{it} = \beta_1 x_{it1} + \beta_2 x_{it2} + ... + \beta_k x_{itk} + a_i + u_{it},   t = 1, 2, ..., T.   (14.4)

Simply use time demeaning on each explanatory variable, and then estimate the following equation using OLS:

\ddot{y}_{it} = \beta_1 \ddot{x}_{it1} + \beta_2 \ddot{x}_{it2} + ... + \beta_k \ddot{x}_{itk} + \ddot{u}_{it}.   (14.5)

(Again, note the absence of a_i in this, the transformed, equation.)

What assumptions do we need to make?


So our equation to be estimated is now the time-demeaned equation (14.5):

\ddot{y}_{it} = \beta_1 \ddot{x}_{it1} + ... + \beta_k \ddot{x}_{itk} + \ddot{u}_{it}.

We use the OLS method to estimate the parameters here. What do we need to assume for this approach to result in unbiased estimates of the \beta-parameters? Hint: Revisit MLR.1-MLR.4, Chapter 3.

Key requirements for the FE estimator to work:


The explanatory variables are strictly exogenous, meaning the idiosyncratic error u_{it} is uncorrelated with each explanatory variable across all time periods.
Otherwise \ddot{x}_{itj} will be correlated with \ddot{u}_{it}, which would lead to bias.

The x_{it} variables must vary over time.
Otherwise \ddot{x}_{itj} is always equal to zero (yes?), i.e. there's no variation in this variable => can't estimate its coefficient.

If we also assume that uit is homoskedastic and serially uncorrelated, we can use the usual formula for computing standard errors.

Example 14.1: Effect of job training on scrap rates


Data: JTRAIN.dta. Now use three years of data: 1987, 1988, 1989. N = 54 firms. T = 3. NT = 162 (total number of observations). The Stata command xtreg can be used to obtain fixed effects estimates; you must, however, remember to specify that you want the FE model (otherwise you'll get RE).

First, make sure the data have been declared to be panel data; we do this by using tsset:

. tsset fcode year
       panel variable:  fcode (strongly balanced)
        time variable:  year, 1987 to 1989
                delta:  1 unit

(Actually this has already been done for this data set.)
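In more recent versions of Stata, the equivalent panel declaration can also be made with xtset:

. xtset fcode year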

Next I obtain fixed effects results as follows:


. xtreg lscrap d88 d89 grant grant_1, fe

Fixed-effects (within) regression               Number of obs      =       162
Group variable: fcode                           Number of groups   =        54
R-sq:  within  = 0.2010                         Obs per group: min =         3
       between = 0.0079                                        avg =       3.0
       overall = 0.0068                                        max =         3
                                                F(4,104)           =      6.54
corr(u_i, Xb)  = -0.0714                        Prob > F           =    0.0001

      lscrap |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         d88 |  -.0802157   .1094751    -0.73   0.465     -.297309    .1368776
         d89 |  -.2472028   .1332183    -1.86   0.066    -.5113797    .0169741
       grant |  -.2523149    .150629    -1.68   0.097    -.5510178    .0463881
     grant_1 |  -.4215895      .2102    -2.01   0.047    -.8384239   -.0047551
       _cons |   .5974341   .0677344     8.82   0.000     .4631142    .7317539
-------------+----------------------------------------------------------------
     sigma_u |   1.438982
     sigma_e |  .49774421
         rho |  .89313867   (fraction of variance due to u_i)

F test that all u_i=0:  F(53, 104) = 24.66              Prob > F = 0.0000

(The estimated equation underlying this output is the time-demeaned equation (14.5).)


Additional points
While time-constant variables cannot be included by themselves in FE models, they can be interacted with variables that change over time. Can we use FE to estimate
The return to education?

Changes in the return to education over time?

When we include a full set of year dummies (one for each year except the base year), we cannot also include a variable whose change over time is constant (e.g. age). Why?
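Hint (a sketch of the argument): for each person, age increases by exactly one every year, so

\text{age}_{it} = \text{age}_{i0} + t \quad\Rightarrow\quad \text{age}_{it} - \overline{\text{age}}_i = t - \bar{t},

which is the same for every individual and is an exact linear combination of the time-demeaned year dummies. The demeaned age variable is therefore perfectly collinear with the year dummies and would have to be dropped.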

Example 14.2: Changes in the return to education over time


Data: WAGEPAN.dta. 545 men observed every year from 1980 through 1987. Education is constant over time (makes sense, right?) so we can't use FE to estimate the return to education. Let's investigate whether the return to education has changed over time.

xi: xtreg lwage i.year*educ married union, fe


Fixed-effects (within) regression               Number of obs      =      4360
Group variable: nr                              Number of groups   =       545
R-sq:  within  = 0.1708                         Obs per group: min =         8
       between = 0.1900                                        avg =       8.0
       overall = 0.1325                                        max =         8
                                                F(16,3799)         =     48.91
corr(u_i, Xb)  = 0.0991                         Prob > F           =    0.0000

          lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
    _Iyear_1981 |  -.0224158   .1458885    -0.15   0.878    -.3084431    .2636114
    _Iyear_1982 |  -.0057611   .1458558    -0.04   0.968    -.2917243    .2802021
    _Iyear_1983 |   .0104297   .1458579     0.07   0.943    -.2755377    .2963971
    _Iyear_1984 |   .0843743   .1458518     0.58   0.563    -.2015811    .3703297
    _Iyear_1985 |   .0497253   .1458602     0.34   0.733    -.2362465    .3356971
    _Iyear_1986 |   .0656064   .1458917     0.45   0.653    -.2204273    .3516401
    _Iyear_1987 |   .0904448   .1458505     0.62   0.535     -.195508    .3763977
           educ |          0  (omitted)
 _IyeaXedu_1981 |   .0115854   .0122625     0.94   0.345    -.0124562    .0356271
 _IyeaXedu_1982 |   .0147905   .0122635     1.21   0.228    -.0092533    .0388342
 _IyeaXedu_1983 |   .0171182   .0122633     1.40   0.163    -.0069251    .0411615
 _IyeaXedu_1984 |   .0165839   .0122657     1.35   0.176     -.007464    .0406319
 _IyeaXedu_1985 |   .0237085   .0122738     1.93   0.053    -.0003554    .0477725
 _IyeaXedu_1986 |   .0274123    .012274     2.23   0.026     .0033481    .0514765
 _IyeaXedu_1987 |   .0304332   .0122723     2.48   0.013     .0063722    .0544942
        married |   .0548205   .0184126     2.98   0.003      .018721      .09092
          union |   .0829785   .0194461     4.27   0.000     .0448527    .1211042
          _cons |   1.362459   .0162385    83.90   0.000     1.330622    1.394296
----------------+----------------------------------------------------------------
        sigma_u |  .37264193
        sigma_e |  .35335713
            rho |  .52654439   (fraction of variance due to u_i)

F test that all u_i=0:  F(544, 3799) = 8.09              Prob > F = 0.0000

Note: xi expands terms containing categorical variables into dummy variable sets.

The dummy variable regression


As it happens, the FE results are numerically identical to what you would get if you estimated the original, untransformed equation but included a dummy for each individual i (except one; the base group) in the data. This approach of explicitly including N-1 dummy variables is of course not very practical if N is large. But it possibly helps with the interpretation of the FE model: we are more used to interpreting regressions with dummy variables than regressions based on transformed equations.

Example: The training scrap rate regression with firm dummies


. drop if lscrap==.
. xi: reg lscrap d88 d89 grant grant_1 i.fcode

      Source |       SS       df       MS              Number of obs =     162
-------------+------------------------------           F( 57,  104)  =   23.37
       Model |  329.979162    57   5.7891081           Prob > F      =  0.0000
    Residual |  25.7659272   104    .2477493           R-squared     =  0.9276
-------------+------------------------------           Adj R-squared =  0.8879
       Total |  355.745089   161  2.20959682           Root MSE      =  .49774

          lscrap |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
             d88 |  -.0802157   .1094751    -0.73   0.465     -.297309    .1368776
             d89 |  -.2472028   .1332183    -1.86   0.066    -.5113797    .0169741
           grant |  -.2523149    .150629    -1.68   0.097    -.5510178    .0463881
         grant_1 |  -.4215895      .2102    -2.01   0.047    -.8384239   -.0047551
  _Ifcode_410538 |   3.905259   .4064064     9.61   0.000      3.09934    4.711178
  _Ifcode_410563 |   4.717328   .4064064    11.61   0.000     3.911408    5.523247
  _Ifcode_410565 |   4.443668   .4064064    10.93   0.000     3.637748    5.249587
  _Ifcode_410566 |   4.621434   .4064064    11.37   0.000     3.815514    5.427353
  _Ifcode_410567 |   2.279588   .4064064     5.61   0.000     1.473668    3.085507
  _Ifcode_410577 |   3.423147   .4064064     8.42   0.000     2.617228    4.229066
  _Ifcode_410592 |    6.12662   .4064064    15.08   0.000       5.3207    6.932539
  _Ifcode_410593 |   2.934958   .4064064     7.22   0.000     2.129039    3.740878
  _Ifcode_410596 |   4.761838   .4064064    11.72   0.000     3.955919    5.567757

Verify that the coefficients on the year dummies & grant variables are identical to the FE estimates shown earlier.

(additional coefficients omitted from the slide)
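A convenient Stata shortcut (an aside, not required for the course) is the areg command, which "absorbs" the firm dummies instead of listing them and reproduces the same estimates for the listed regressors:

. areg lscrap d88 d89 grant grant_1, absorb(fcode)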

Estimating the ai
Sometimes we are interested in the estimated a_i. These can be computed as follows:

\hat{a}_i = \bar{y}_i - \hat{\beta}_1 \bar{x}_{i1} - \cdots - \hat{\beta}_k \bar{x}_{ik},   i = 1, ..., N.

Stata can do this for us: after your xtreg regression, use predict ahati, u. If you are using a dummy variable regression, the estimated coefficients on the N-1 dummies are your estimates of a_i.

Fixed effects or first differencing?


If your panel data cover only two time periods (T=2), then FE and first differencing (FD) are identical. With T>2, they will be different. How do we choose?
If u_it is serially uncorrelated, FE is better (since the FD residual will be serially correlated).
If u_it follows a random walk, FD is better (since the differenced error \Delta u_it will be serially uncorrelated).

Testing for serial correlation after FE estimation is difficult; methods are beyond the scope of the course. Advice: If N is large and T is not too large (say, less than 20), then FE usually works better in practice. A good approach in practice: obtain results for both estimators and compare; if the results do not differ very much, this is reassuring.
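A minimal Stata sketch of this comparison, with hypothetical variable names (y, x1, x2); the data must have been declared with tsset (or xtset) so that the difference operator d. is available:

xtreg y x1 x2, fe      // fixed effects (within) estimates
reg d.y d.x1 d.x2      // first-difference estimates

(In practice you would normally also include period dummies in both specifications; note that the constant in the differenced regression corresponds to a linear time trend in levels.)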

Unbalanced panels
So far we have focused on the case where we have the same number of time series observations for each firm (T is the same for all i). Balanced panel. The main reason is that it makes the exposition above a bit more user friendly. But, in practice, T often varies across the individuals in which case we have an unbalanced panel. Estimating the FE model with an unbalanced panel is just as easy as for a balanced panel.

However, you may want to think about why the panel is unbalanced, and whether this may lead to bias. For example, if you are analyzing the determinants of firm-level profitability, it may be a problem if the least profitable firms go out of business during the sampling period (attrition). Basically this creates a non-random sample. Addressing this particular problem is difficult and we will not go into further detail.

Indeed, I think it's fair to say that the vast majority of researchers using a panel data approach assume attrition will not bias the estimates.

14.2 Random effects models


Model:

y_{it} = \beta_0 + \beta_1 x_{it1} + ... + \beta_k x_{itk} + a_i + u_{it}.

Now assume that a_i is uncorrelated with each explanatory variable:

Cov(x_{itj}, a_i) = 0,   t = 1, ..., T;  j = 1, ..., k.

We continue to assume that u_{it} is uncorrelated with each explanatory variable across all time periods (strict exogeneity). How can we estimate the model? Suppose we use pooled OLS: would there be a problem?

Re-write the model above as follows:

y_{it} = \beta_0 + \beta_1 x_{it1} + ... + \beta_k x_{itk} + v_{it},

where v_{it} = a_i + u_{it} is the composite error term. Now, because a_i is in the composite error term in each time period, the v_{it} will be serially correlated. It can be shown that:

Corr(v_{it}, v_{is}) = \sigma_a^2 / (\sigma_a^2 + \sigma_u^2),   t \neq s,

where \sigma_a^2 = Var(a_i) and \sigma_u^2 = Var(u_{it}).

The following GLS transformation eliminates the serial correlation:

Define

\lambda = 1 - [ \sigma_u^2 / (\sigma_u^2 + T \sigma_a^2) ]^{1/2}.

The transformed equation:

y_{it} - \lambda \bar{y}_i = \beta_0 (1 - \lambda) + \beta_1 (x_{it1} - \lambda \bar{x}_{i1}) + ... + \beta_k (x_{itk} - \lambda \bar{x}_{ik}) + (v_{it} - \lambda \bar{v}_i).

Estimating this transformed equation using OLS gives you the random effects estimator.
The residual of this equation is serially uncorrelated.
OK to have time-constant explanatory variables.
The parameter \lambda is unknown and has to be estimated.
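In Stata, xtreg, re estimates \lambda as part of the feasible GLS procedure (it is reported as "theta"); adding the theta option displays the estimate, e.g. (hypothetical variable names):

. xtreg y x1 x2, re theta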

Example 14.4: Wage Equation & Panel Data


Data: WAGEPAN.dta (introduced above). To get RE estimates, use the same Stata syntax as for FE, but omit the fe option:
xi: xtreg lwage educ black hisp exper expersq married union i.year

(note that I am including a full set of year dummies)

Random-effects GLS regression                   Number of obs      =      4360
Group variable: nr                              Number of groups   =       545
R-sq:  within  = 0.1799                         Obs per group: min =         8
       between = 0.1860                                        avg =       8.0
       overall = 0.1830                                        max =         8
                                                Wald chi2(14)      =    957.77
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000

       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0918763   .0106597     8.62   0.000     .0709836    .1127689
       black |  -.1393767   .0477228    -2.92   0.003    -.2329117   -.0458417
        hisp |   .0217317   .0426063     0.51   0.610    -.0617751    .1052385
       exper |   .1057545   .0153668     6.88   0.000     .0756361    .1358729
     expersq |  -.0047239   .0006895    -6.85   0.000    -.0060753   -.0033726
     married |    .063986   .0167742     3.81   0.000     .0311091    .0968629
       union |   .1061344   .0178539     5.94   0.000     .0711415    .1411273
 _Iyear_1981 |    .040462   .0246946     1.64   0.101    -.0079385    .0888626
 _Iyear_1982 |   .0309212   .0323416     0.96   0.339    -.0324672    .0943096
 _Iyear_1983 |   .0202806    .041582     0.49   0.626    -.0612186    .1017798
 _Iyear_1984 |   .0431187   .0513163     0.84   0.401    -.0574595    .1436969
 _Iyear_1985 |   .0578155   .0612323     0.94   0.345    -.0621977    .1778286
 _Iyear_1986 |   .0919476   .0712293     1.29   0.197    -.0476592    .2315544
 _Iyear_1987 |   .1349289   .0813135     1.66   0.097    -.0244427    .2943005
       _cons |   .0235864   .1506683     0.16   0.876     -.271718    .3188907
-------------+----------------------------------------------------------------
     sigma_u |  .32460315
     sigma_e |  .35099001
         rho |  .46100216   (fraction of variance due to u_i)

Compare to FE estimates:
Fixed-effects (within) regression               Number of obs      =      4360
Group variable: nr                              Number of groups   =       545
R-sq:  within  = 0.1806                         Obs per group: min =         8
       between = 0.0005                                        avg =       8.0
       overall = 0.0635                                        max =         8
                                                F(10,3805)         =     83.85
corr(u_i, Xb)  = -0.1212                        Prob > F           =    0.0000

       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |          0  (omitted)
       black |          0  (omitted)
        hisp |          0  (omitted)
       exper |   .1321464   .0098247    13.45   0.000     .1128842    .1514087
     expersq |  -.0051855   .0007044    -7.36   0.000    -.0065666   -.0038044
     married |   .0466804   .0183104     2.55   0.011     .0107811    .0825796
       union |   .0800019   .0193103     4.14   0.000     .0421423    .1178614
 _Iyear_1981 |   .0190448   .0203626     0.94   0.350    -.0208779    .0589674
 _Iyear_1982 |   -.011322   .0202275    -0.56   0.576    -.0509798    .0283359
 _Iyear_1983 |  -.0419955   .0203205    -2.07   0.039    -.0818357   -.0021553
 _Iyear_1984 |  -.0384709   .0203144    -1.89   0.058    -.0782991    .0013573
 _Iyear_1985 |  -.0432498   .0202458    -2.14   0.033    -.0829434   -.0035562
 _Iyear_1986 |  -.0273819   .0203863    -1.34   0.179    -.0673511    .0125872
 _Iyear_1987 |          0  (omitted)
       _cons |    1.02764   .0299499    34.31   0.000     .9689201    1.086359
-------------+----------------------------------------------------------------
     sigma_u |   .4009279
     sigma_e |  .35099001
         rho |  .56612236   (fraction of variance due to u_i)

F test that all u_i=0:  F(544, 3805) = 9.16              Prob > F = 0.0000

Random Effects or Fixed Effects?


Because FE allows for an arbitrary correlation between a_i and x_it, while RE does not, the FE estimator is typically thought to be the more convincing approach. But if a key explanatory variable is constant over time, the FE approach is less useful. Many researchers consider results for both methods. Some would argue that if RE and FE give results that are not significantly different (this can be tested by means of a Hausman test; see p. 495), one should use the RE estimates.
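A minimal Stata sketch of such a Hausman test, using the wage equation from Example 14.4 (the names fe_res and re_res are just labels for the stored results):

. xi: xtreg lwage educ black hisp exper expersq married union i.year, fe
. estimates store fe_res
. xi: xtreg lwage educ black hisp exper expersq married union i.year, re
. estimates store re_res
. hausman fe_res re_res

(hausman compares the coefficients that appear in both models, i.e. the time-varying regressors here.)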

Some nice problems in the book


Chapter 14: Questions 14.1, 14.2, 14.3 in the chapter (pages 482, 483, 492); and 14.1 in the set of problems.
