
The Spatial Economics and Econometrics Centre (SEEC)

DISCUSSION PAPER SERIES


No 8 / July, 2016

xtdcce: Estimating Dynamic Common Correlated Effects in Stata

Jan Ditzen

Please see the SEEC website (http://seec.hw.ac.uk/) for more information.


xtdcce: Estimating Dynamic Common
Correlated Effects in Stata
Jan Ditzen∗

Spatial Economics and Econometrics Centre (SEEC)


Heriot-Watt University, Edinburgh, UK

July 7, 2016

Abstract
This article introduces a new Stata command, xtdcce, to estimate a dynamic common correlated effects model with heterogeneous coefficients. The estimation procedure mainly follows Chudik and Pesaran (2015b); in addition, the common correlated effects estimator (Pesaran, 2006), the mean group estimator (Pesaran and Smith, 1995) and the pooled mean group estimator (Shin et al., 1999) are supported. Coefficients are allowed to be heterogeneous or homogeneous. Instrumental variable regressions and unbalanced panels are supported as well. The cross sectional dependence test (CD test) is calculated automatically and presented in the estimation output. Small sample time series bias can be corrected by jackknife correction or recursive mean adjustment. Examples of empirical applications are given for all of the estimation methods mentioned above.

Keywords: xtdcce, parameter heterogeneity, dynamic panels, cross section dependence, common correlated effects, pooled mean-group estimator, mean-group estimator, instrumental variables, ivreg2

Correspondence: Jan Ditzen, PhD Student, Spatial Economics and Econometrics
Centre (SEEC), School of Management & Languages, Heriot-Watt University, Scotland,
United Kingdom, EH14 4AS; Email: jd219@hw.ac.uk, Web: www.jan.ditzen.net

1 Introduction
Estimating panels with heterogeneous coefficients in a large N and T setting has become standard in recent years, thanks to seminal work in theoretical econometrics (Pesaran and Smith, 1995; Shin et al., 1999) and the availability of data and computing power. Allowing for heterogeneous slopes enables the researcher to identify effects at a much more detailed, and thus local, level. At the same time, the theoretical literature on how to account for unobserved dependence between cross sectional units has evolved (Pesaran, 2006; Chudik and Pesaran, 2015b). In this literature, cross sectional means are added to the regression to account for the unobserved dependence between units such as countries.
This paper introduces a new Stata program that combines these two strands of the literature. xtdcce allows for (pooled) mean group estimation in a dynamic panel with dependence between countries. It controls for this dependence by adding cross sectional means and their lags, as proposed by Pesaran (2006) and Chudik and Pesaran (2015b).1 Furthermore, it tests for cross sectional dependence in the error terms and allows for instrumental variable estimation. Additionally, xtdcce can correct for small sample time series bias with the jackknife correction method or the recursive mean adjustment, as proposed by Chudik and Pesaran (2015b).
xtdcce differs in several ways from the existing estimation procedures for common correlated effects in a heterogeneous panel. In comparison to xtmg, it allows the consistent estimation of a dynamic panel by adding lags of the cross sectional means. Moreover, coefficients may be constrained to be homogeneous across all units, and unbalanced panels are supported. Compared to xtpmg, xtdcce avoids maximum likelihood estimation, which makes it possible to estimate models with endogenous independent variables. Relative to xtpmg and xtmg, the main novelties are therefore a test for cross sectional dependence, small T bias correction methods and support for instrumental variable (IV) regressions. IV regressions are carried out with the ivreg2 package. Possible applications of IV estimation are endogenous spatial lags, which are instrumented by exogenous measures such as distance, other variables or higher order spatial lags. Furthermore, adding cross sectional means accounts for unobserved heterogeneity across units.
The xtdcce package includes xtcd2, which tests for cross sectional dependence (henceforth CD test) as proposed by Pesaran (2015) and Chudik and Pesaran (2015a). Two other programs, xtcd (by Markus Eberhardt) and xtcsd by De Hoyos and Sarafidis (2006), already made the CD test available in Stata. The novelties of xtcd2 are the support of unbalanced panels, the possibility to test any variable for cross sectional dependence and the option to plot the cross correlations as a histogram.

Footnote 1: Chudik and Pesaran (2015a) give a comprehensive overview of the literature on (dynamic) common correlated effects, while Chudik and Pesaran (2015b) focus on dynamic common correlated effects. In the following, "common correlated effects" cites Pesaran (2006), while "dynamic common correlated effects" cites Chudik and Pesaran (2015b), even though both are found in Chudik and Pesaran (2015a).
The remainder of the paper is structured as follows: the next two sections give a brief introduction to dynamic common correlated effects and to testing for cross sectional dependence. Then the syntax, options and saved values of xtdcce and xtcd2 are explained. The paper closes with examples of empirical applications and a comparison of regression results obtained by xtdcce with results from estimation procedures already available in Stata.

2 Common Correlated Effects Estimators


Assume the following equation with heterogeneous coefficients (Pesaran, 2006):

$$y_{i,t} = \alpha_i + \beta_i x_{i,t} + u_{i,t} \qquad (1)$$
$$u_{i,t} = \gamma_i' f_t + e_{i,t}$$


where $f_t$ is an unobserved common factor and $\gamma_i$ a heterogeneous factor loading.2 The heterogeneous coefficients are randomly distributed around a common mean, such that $\beta_i = \beta + v_i$, $v_i \sim IID(0, \Omega_v)$ (Pesaran and Smith, 1995). Pesaran (2006) shows that equation 1 can be consistently estimated by approximating the unobserved common factors with the cross sectional means $\bar{y}_t$ and $\bar{x}_t$ under strict exogeneity of $x_{i,t}$. This estimator is commonly known as the Common Correlated Effects estimator (henceforth CCE). It has also been shown to be consistent under a variety of further assumptions on the error term (Chudik et al., 2011; Kapetanios et al., 2011). In empirical applications the estimator has been used, for example, by Eberhardt et al. (2012), Bond and Eberhardt (2013) and McNabb and LeMay-Boucher (2014). The CCE estimator was made available in Stata by Markus Eberhardt's xtmg command (Eberhardt, 2012).

Footnote 2: Unlike in Pesaran (2006), the intercept is kept and not partialled out. See the discussion in Section 4.5.

The CCE estimator is, however, consistent only in non-dynamic panels (Chudik and Pesaran, 2015b; Everaert and De Groote, 2015). Consider a dynamic panel

$$y_{i,t} = \alpha_i + \lambda_i y_{i,t-1} + \beta_i x_{i,t} + u_{i,t},$$

where the idiosyncratic errors $u_{i,t}$ are cross sectionally weakly dependent and $E(\lambda_i) = \lambda$. The lagged dependent variable is no longer strictly exogenous and therefore the estimator becomes inconsistent. Chudik and Pesaran (2015b) show that the estimator regains consistency if $\sqrt[3]{T}$ lags of the cross sectional means are added. The equation to be estimated is then
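$$y_{i,t} = \alpha_i + \lambda_i y_{i,t-1} + \beta_i x_{i,t} + \sum_{l=0}^{p_T} \delta_{i,l}' \bar{z}_{t-l} + e_{i,t}, \qquad (2)$$

where $\bar{z}_t = (\bar{y}_{t-1}, \bar{x}_t)$ and $p_T$ is the number of lags. $\lambda_i$ and $\beta_i$ are stacked into $\pi_i = (\lambda_i, \beta_i)$. The mean group estimates are then

$$\hat{\pi}_{MG} = \frac{1}{N} \sum_{i=1}^{N} \hat{\pi}_i.$$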

$\hat{\pi}_i$ and $\hat{\pi}_{MG}$ are consistent, with $\hat{\pi}_{MG}$ converging at rate $\sqrt{N}$, if $(N, T, p_T) \to \infty$ and the factor loadings have full rank (Chudik and Pesaran, 2015a,b). The asymptotic variance can be consistently estimated by

$$\widehat{Var}(\hat{\pi}_{MG}) = N^{-1} \hat{\Sigma}_{\pi} = \frac{1}{N(N-1)} \sum_{i=1}^{N} (\hat{\pi}_i - \hat{\pi}_{MG})(\hat{\pi}_i - \hat{\pi}_{MG})'.$$

The mean group estimates have the following asymptotic distribution (Chudik and Pesaran, 2015b):

$$\sqrt{N}(\hat{\pi}_{MG} - \pi) \xrightarrow{d} N(0, \Sigma_{MG}).$$
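The two mean group formulas above can be illustrated with a minimal sketch, written in Python rather than Stata purely to show the arithmetic. It assumes the unit-level estimates $\hat{\pi}_i$ have already been obtained; the function name `mean_group` and the numbers are hypothetical and do not come from xtdcce.

```python
import math


def mean_group(pi_hat):
    """Mean group estimate and its variance from unit-level estimates.

    pi_hat: one list of coefficients per unit, e.g. [lambda_i, beta_i].
    Returns (pi_mg, var_mg) with
      pi_mg  = (1/N) * sum_i pi_i
      var_mg = 1/(N(N-1)) * sum_i (pi_i - pi_mg)(pi_i - pi_mg)'
    """
    n = len(pi_hat)
    k = len(pi_hat[0])
    pi_mg = [sum(row[j] for row in pi_hat) / n for j in range(k)]
    var_mg = [[0.0] * k for _ in range(k)]
    for row in pi_hat:
        dev = [row[j] - pi_mg[j] for j in range(k)]
        for a in range(k):  # accumulate the outer product dev * dev'
            for b in range(k):
                var_mg[a][b] += dev[a] * dev[b]
    scale = 1.0 / (n * (n - 1))
    var_mg = [[v * scale for v in r] for r in var_mg]
    return pi_mg, var_mg


# Hypothetical estimates (lambda_i, beta_i) for N = 4 units.
pi_hat = [[0.5, 1.0], [0.7, 0.8], [0.6, 1.2], [0.4, 1.0]]
pi_mg, var_mg = mean_group(pi_hat)
std_err = [math.sqrt(var_mg[j][j]) for j in range(len(pi_mg))]
print(pi_mg, std_err)
```

The same function applies to the short run coefficients $\hat{\delta}_i$ of the pooled mean group estimator below, whose variance formula has the same form.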

The pooled mean group estimator (Shin et al., 1999) can be seen as an intermediate case between a pure pooled estimation (homogeneous coefficients) and a mean group estimation (heterogeneous coefficients). The pooled mean group (PMG) estimator assumes that the regressors have a homogeneous long run and a heterogeneous short run effect on the dependent variable. Equation 1 is transformed into an error correction model, such that

$$\Delta y_{i,t} = \phi_i (y_{i,t-1} - \theta_{1,i} x_{i,t}) + \delta_{0,i} + \delta_{1,i} \Delta x_{i,t} + \epsilon_{i,t}.$$

$\phi_i$ is the error correction speed of adjustment parameter and is expected to be negative. $\theta$ are the long run coefficients and are assumed to be homogeneous, while $\delta$ capture the short run dynamics and are heterogeneous across units. Shin et al. (1999) propose to estimate the long run coefficients by maximum likelihood and the short run coefficients by OLS. The estimator is consistent as long as the disturbances are independently distributed across all individuals and time periods, with zero mean and a variance strictly larger than zero.
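The mean group estimate and the variance of the short run coefficients are:

$$\hat{\delta}_{MG} = \frac{1}{N} \sum_{i=1}^{N} \hat{\delta}_i, \qquad \widehat{Var}(\hat{\delta}_{MG}) = \frac{1}{N(N-1)} \sum_{i=1}^{N} \left(\hat{\delta}_i - \hat{\delta}_{MG}\right)^2$$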
The mean group and the pooled mean group estimators, in both their static and their dynamic versions, rely on large N and T. The literature on small sample time series bias corrections in dynamic heterogeneous panels is rather scarce, which is why Chudik and Pesaran (2015b) focus on "half-panel" jackknife and recursive mean adjustment bias correction methods. Neither requires any knowledge of the error factor structure, and both can be applied to the mean group estimates.3 The mean group estimate of the jackknife bias-corrected CCE estimator is

$$\tilde{\pi}_{MG} = 2\hat{\pi}_{MG} - \frac{1}{2}\left(\hat{\pi}^{a}_{MG} + \hat{\pi}^{b}_{MG}\right),$$

where $\hat{\pi}^{a}_{MG}$ is the mean group estimate from the first half ($t = 1, \ldots, T/2$) of the panel and $\hat{\pi}^{b}_{MG}$ the estimate from the second half ($t = T/2 + 1, \ldots, T$).
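The half-panel jackknife can be sketched as follows. To keep the example self-contained, it replaces the full dynamic CCE regression by a plain mean group AR(1) estimate without cross sectional averages, and for odd T it gives the first half floor(T/2) periods; both are simplifying assumptions for illustration, not a description of how xtdcce implements the correction. All names and data are hypothetical.

```python
def ar1_slope(series):
    """OLS slope of y_t on a constant and y_{t-1} for one unit."""
    y, y_lag = series[1:], series[:-1]
    mean_y = sum(y) / len(y)
    mean_lag = sum(y_lag) / len(y_lag)
    num = sum((a - mean_y) * (b - mean_lag) for a, b in zip(y, y_lag))
    den = sum((b - mean_lag) ** 2 for b in y_lag)
    return num / den


def mg_ar1(panel):
    """Mean group estimate: average of the unit-specific AR(1) slopes."""
    return sum(ar1_slope(s) for s in panel) / len(panel)


def half_panel_jackknife(panel, estimator):
    """pi_tilde = 2 * pi_MG - (pi_MG^a + pi_MG^b) / 2.

    The first half keeps t = 1, ..., floor(T/2), the second half the rest.
    Each half is re-estimated on its own, so one period per half is used
    only as a lag.
    """
    half = len(panel[0]) // 2
    first = [s[:half] for s in panel]
    second = [s[half:] for s in panel]
    return 2.0 * estimator(panel) - 0.5 * (estimator(first) + estimator(second))


# Hypothetical balanced panel: 2 units, T = 8.
panel = [
    [1.0, 0.8, 1.1, 0.7, 0.9, 1.2, 0.6, 1.0],
    [2.0, 1.5, 1.9, 1.2, 1.6, 1.1, 1.4, 1.3],
]
print(mg_ar1(panel), half_panel_jackknife(panel, mg_ar1))
```

Passing the estimator as a function keeps the correction generic: any routine that maps a panel to a mean group estimate can be plugged in.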
The recursive mean adjustment removes the partial mean from all variables:

$$\tilde{\omega}_{it} = \omega_{it} - \frac{1}{t-1} \sum_{s=1}^{t-1} \omega_{is},$$

where $\omega_{it} = (y_{it}, x_{it})$ or any other variable except the constant. In line with Chudik and Pesaran (2015b), the partial mean is lagged by one period to prevent it from being influenced by contemporaneous observations.
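The recursive mean adjustment can be sketched for a single series of one unit. The example assumes a plain Python list without missing values; since $t = 1$ has no earlier observations, that period is marked as None. The function name is hypothetical.

```python
def recursive_mean_adjust(series):
    """omega_tilde_t = omega_t - mean(omega_1, ..., omega_{t-1}), for t >= 2.

    The partial mean only uses earlier periods, so the first period has no
    adjusted value and is returned as None.
    """
    adjusted = [None]
    running_sum = 0.0
    for t in range(1, len(series)):
        running_sum += series[t - 1]  # sum of the t earlier observations
        adjusted.append(series[t] - running_sum / t)
    return adjusted


# Applied unit by unit to y and x (not to the constant); hypothetical data.
y_unit = [2.0, 4.0, 6.0, 8.0]
print(recursive_mean_adjust(y_unit))  # [None, 2.0, 3.0, 4.0]
```

Keeping a running sum makes each adjustment O(1), so a unit's series is processed in a single pass.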

3 Testing for cross sectional dependence


If the cross sectional means are not included in the equation or do not account for all dependence between units, the error term will contain cross sectional dependence. The errors are then no longer iid and OLS becomes inconsistent (Everaert and De Groote, 2015). Pesaran (2015) and Chudik and Pesaran (2015a) develop a procedure to test for cross sectional dependence. Under the null hypothesis, the error terms are weakly cross sectionally dependent. More formally, if the error term for unit i in period t is $u_{i,t}$, then the hypothesis is

$$H_0: E(u_{i,t} u_{j,t}) = 0, \quad \forall\, t \text{ and } i \neq j,$$

and the test statistic is

$$CD = \sqrt{\frac{2T}{N(N-1)}} \left( \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \hat{\rho}_{ij} \right), \qquad \hat{\rho}_{ij} = \hat{\rho}_{ji} = \frac{\sum_{t=1}^{T} \hat{u}_{it} \hat{u}_{jt}}{\left(\sum_{t=1}^{T} \hat{u}_{it}^2\right)^{1/2} \left(\sum_{t=1}^{T} \hat{u}_{jt}^2\right)^{1/2}},$$

where $\hat{\rho}_{ij}$ is the correlation coefficient. In the case of an unbalanced panel, the correlation coefficient is calculated over the common sample of units i and j:

$$\hat{\rho}_{ij} = \hat{\rho}_{ji} = \frac{\sum_{t \in T_i \cap T_j} (\hat{u}_{it} - \bar{\hat{u}}_i)(\hat{u}_{jt} - \bar{\hat{u}}_j)}{\left[\sum_{t \in T_i \cap T_j} (\hat{u}_{it} - \bar{\hat{u}}_i)^2\right]^{1/2} \left[\sum_{t \in T_i \cap T_j} (\hat{u}_{jt} - \bar{\hat{u}}_j)^2\right]^{1/2}},$$

where

$$\bar{\hat{u}}_i = \frac{\sum_{t \in T_i \cap T_j} \hat{u}_{it}}{T_{ij}}, \qquad T_{ij} = \#(T_i \cap T_j),$$

and the CD test statistic becomes

$$CD = \sqrt{\frac{2}{N(N-1)}} \left( \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \sqrt{T_{ij}}\, \hat{\rho}_{ij} \right).$$

Under the null hypothesis, the CD test statistic is asymptotically standard normally distributed:

$$CD \sim N(0, 1).$$

For a more in-depth discussion see Pesaran (2015) and Chudik and Pesaran (2015a).

Footnote 3: For a further discussion see Chudik and Pesaran (2015b) or Everaert and De Vos (2016).
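A minimal Python sketch of the CD statistic in its unbalanced form follows. It assumes residuals are stored as one list per unit, with None marking periods in which a unit is not observed; this data layout and the choice to skip pairs with fewer than two common periods are assumptions of the sketch, not a description of xtcd2. For a balanced panel whose residuals have mean zero for each unit, the demeaned correlation coincides with the balanced formula.

```python
import math


def pairwise_rho(u_i, u_j):
    """Correlation over the common sample T_i ∩ T_j; None marks a missing period.

    Returns (rho_ij, T_ij); rho_ij is None if fewer than two common periods.
    Assumes the residuals vary over the common sample (non-zero denominator).
    """
    common = [(a, b) for a, b in zip(u_i, u_j) if a is not None and b is not None]
    t_ij = len(common)
    if t_ij < 2:
        return None, t_ij
    mean_i = sum(a for a, _ in common) / t_ij
    mean_j = sum(b for _, b in common) / t_ij
    num = sum((a - mean_i) * (b - mean_j) for a, b in common)
    den = math.sqrt(sum((a - mean_i) ** 2 for a, _ in common)
                    * sum((b - mean_j) ** 2 for _, b in common))
    return num / den, t_ij


def cd_test(residuals):
    """CD = sqrt(2 / (N (N - 1))) * sum_{i<j} sqrt(T_ij) * rho_ij, plus p-value."""
    n = len(residuals)
    total = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            rho, t_ij = pairwise_rho(residuals[i], residuals[j])
            if rho is not None:
                total += math.sqrt(t_ij) * rho
    cd = math.sqrt(2.0 / (n * (n - 1))) * total
    p_value = math.erfc(abs(cd) / math.sqrt(2.0))  # 2 * (1 - Phi(|CD|))
    return cd, p_value


# Hypothetical residuals for N = 3 units and 5 periods (unbalanced).
residuals = [
    [0.5, -0.2, 0.1, None, -0.4],
    [0.3, -0.1, None, 0.2, -0.5],
    [-0.2, 0.4, 0.0, -0.1, None],
]
cd, p = cd_test(residuals)
print(cd, p)
```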

4 The xtdcce command


4.1 Syntax
  
xtdcce depvar indepvars [if] [, pooled(varlist) crosssectional(varlist) nocrosssectional cr_lags(#) exogenous_vars(varlist) endogenous_vars(varlist) ivreg2options(string) lr(varlist) lr_options(string) pooledconstant reportconstant trend pooledtrend residuals(string) jackknife recursive nocd cluster(string) noomit full lists noisily post_full]

Data must be [TS] tsset before using xtdcce. varlist may contain time-series operators; see [TS] tsvarlist. xtdcce requires the moremata package by Jann (2005).

4.2 Options
pooled(varlist) specifies homogeneous coefficients. For these variables the estimated coefficients are constrained to be equal across all units ($\beta_i = \beta\ \forall i$). Variables may occur in indepvars. Variables in exogenous_vars(), endogenous_vars() and lr() may be pooled as well.
crosssectional(varlist) defines the variables which are included in $z_t$ and added as cross sectional averages ($\bar{z}_{t-l}$) to the equation. Variables in crosssectional() may be included in pooled(), exogenous_vars(), endogenous_vars() and lr(). By default, all variables from depvar, indepvars and endogenous_vars() are included in $z_t$. The cross sectional averages are partialled out; their coefficients are neither estimated nor reported.
cr_lags(#) specifies the number of lags of the cross sectional averages. If it is not defined but crosssectional() contains a varlist, only contemporaneous cross sectional averages are added, without lags. cr_lags(0) is equivalent to omitting the option.
nocrosssectional prevents adding cross sectional averages. Results are then equivalent to the Pesaran and Smith (1995) Mean Group estimator or, if lr(varlist) is specified, to the Shin et al. (1999) Pooled Mean Group estimator.
xtdcce supports instrumental variable regression using ivreg2 by Baum et al. (2003, 2007). Endogenous and exogenous variables are set as follows:
endogenous_vars(varlist) specifies the endogenous variables and exogenous_vars(varlist) the exogenous variables. See ivreg2 for a further description.
ivreg2options(string) passes further options on to ivreg2. See ivreg2, options, for more information.
fulliv posts all available results from ivreg2 in e() with the prefix ivreg2_.
noisily shows the output of the wrapped ivreg2 regression command.
lr(varlist) specifies the variables to be included in the long run cointegration vector. The first variable is the error correcting speed of adjustment term.
lr_options(string) sets options for the long run coefficients. Options may be:
nodivide: coefficients are not divided by the error correction speed of adjustment coefficient (i.e. equation 4 is estimated).
xtpmgnames: coefficient names in e(b_p_mg) and e(V_p_mg) match the naming convention of xtpmg.
noconstant suppresses the constant term.
pooledconstant restricts the constant to be the same across all groups ($\beta_{0,i} = \beta_0\ \forall i$).
reportconstant reports the constant. If it is not specified, the constant is treated as part of the cross sectional averages.
trend adds a linear unit specific trend. May not be combined with pooledtrend.
pooledtrend adds a linear common trend. May not be combined with trend.
jackknife applies the jackknife bias correction for small sample time series bias. May not be combined with recursive.
recursive applies the recursive mean adjustment method to correct for small sample time series bias. May not be combined with jackknife.
residuals(varname) saves the residuals as a new variable.
nocd suppresses the calculation of the CD test statistic.
cluster(varname) requests clustered standard errors, where varname is the cluster identifier.
noomit suppresses checks for collinearity.
full reports the individual (unit specific) estimates in the output.
lists shows all variable names and lags of the cross sectional means.
post_full requests that the individual estimates, rather than the mean group estimates, are saved in e(b) and e(V). The mean group estimates are then saved in e(b_p_mg) and e(V_p_mg).

4.3 Saved results


xtdcce saves the following in e():

Scalars
    e(N)            number of observations
    e(N_g)          number of groups
    e(T)            number of time periods
    e(K)            number of regressors
    e(N_partial)    number of variables partialled out
    e(N_omitted)    number of omitted variables
    e(N_pooled)     number of pooled variables
    e(mss)          model sum of squares
    e(rss)          residual sum of squares
    e(F)            F statistic
    e(ll)           log-likelihood (only IV)
    e(rmse)         root mean squared error
    e(df_m)         model degrees of freedom
    e(df_r)         residual degrees of freedom
    e(r2)           R-squared
    e(r2_a)         adjusted R-squared
    e(cd)           CD test statistic
    e(cdp)          p-value of CD test statistic
Scalars (unbalanced panel)
    e(minT)         minimum number of time periods
    e(maxT)         maximum number of time periods
    e(avgT)         average number of time periods
Macros
    e(tvar)         name of time variable
    e(idvar)        name of unit variable
    e(depvar)       name of dependent variable
    e(indepvar)     names of independent variables
    e(omitted)      names of omitted variables
    e(lr)           long run variables
    e(pooled)       names of pooled variables
    e(cmd)          command line
    e(cmd_full)     command line including options
Macros (IV-specific)
    e(insts)        instruments (exogenous variables)
    e(instd)        instrumented (endogenous) variables
Matrices
    e(b)            coefficient vector (mean group or individual)
    e(V)            variance-covariance matrix (mean group or individual)
    e(b_p_mg)       coefficient vector (mean group and pooled)
    e(V_p_mg)       variance-covariance matrix (mean group and pooled)
    e(b_full)       coefficient vector (individual and pooled)
    e(V_full)       variance-covariance matrix (individual and pooled)
Functions
    e(sample)       marks estimation sample

4.4 xtcd2
Included in the xtdcce package is the xtcd2 command, which tests for weak cross sectional dependence. The command supports balanced as well as unbalanced panels. For a discussion of the test statistic see Section 3.

4.4.1 Syntax
 
xtcd2 [varname(max=1)] [, noestimation rho histogram name(string)]
varname is the name of the residuals or of the variable to be tested; it is optional if the command is run after an estimation (postestimation).

4.4.2 Options
noestimation specifies that xtcd2 is not run as a postestimation command, so that e(sample) does not need to be set. This option allows any variable to be tested. If noestimation is not set, xtcd2 uses either the variable specified in varname or predicts the residuals using predict, residuals. In both cases the sample is restricted to e(sample).
histogram plots a histogram of the cross correlations. The number of observations, the mean, percentiles, minimum and maximum of the cross correlations are reported. If name(string) is set, the histogram is saved and not drawn.
rho saves the matrix with the cross correlations in r(rho).

4.4.3 Saved results


Scalars
    r(CD)     value of the CD statistic
    r(p)      p-value
Matrices
    r(rho)    cross correlations matrix, if requested

4.5 The Constant in xtdcce


xtdcce can treat the constant ($\alpha_i$) in several ways.4 In Pesaran (2006) and Chudik and Pesaran (2015a) the constant is part of the matrix which includes the cross sectional averages and is partialled out. In empirical work, however, it can be useful to obtain estimates of the constant.
Therefore xtdcce estimates and reports the constant if the option reportconstant is used. Otherwise it is partialled out or removed from the model and not reported. Additionally, xtdcce allows the constant to be the same across all units if the option pooledconstant is specified.5 Finally, the constant can be removed from the model completely with the noconstant option.
The constant is removed from the model automatically if all parameters, including the constant, are constrained to be homogeneous, the cross sectional means include all variables and the dataset is strongly balanced. Loosely speaking, by adding the time averages of the dependent variable and all independent variables together with a homogeneous constant, the data are demeaned and the constant becomes zero. xtdcce therefore removes the constant from the model to improve the estimation. If the option reportconstant is used, the constant is still estimated and reported in the output.

Footnote 4: In order to stick to Stata notation, the intercept is called the constant.
Footnote 5: If pooledconstant is used but not reportconstant, the constant is calculated internally but not displayed.
5 Empirical Examples
In the following, three empirical examples demonstrate the use of the xtdcce command. As a first exercise, an augmented Solow model with dynamic common correlated effects is estimated. This part includes IV regressions and introduces the novelties of xtdcce. Afterwards, mean group and common correlated effects estimations and, finally, pooled mean group estimations are shown. These last two parts compare xtdcce to the existing Stata commands xtmg and xtpmg.

5.1 Dynamic Common Correlated Effects and testing for cross sectional dependence
As a first exercise, the augmented Solow model in the style of Mankiw et al. (1992), Islam (1995) and Lee et al. (1997) is estimated by regressing the difference in log GDP per capita on lagged log GDP per capita, human and physical capital and the population growth rate. The latest version of the Penn World Tables (Feenstra et al., 2015) is used. The dataset is restricted to the years 1960 to 2007, so 48 years are available initially. After tsset, the estimation is run with the dependent variable and all four independent variables added as cross sectional means, set by crosssectional().6 The number of lags of the cross sectional means, specified by cr_lags(), is set to 3.
. xtdcce d.log_rgdpo L.log_rgdpo log_hc log_ck log_ngd , /*
> */ cr(d.log_rgdpo L.log_rgdpo log_hc log_ck log_ngd) /*
> */ reportc cr_lags(3) res(residuals)
Dynamic Common Correlated Effects - Mean Group
Panel Variable (i): id Number of obs = 4092
Time Variable (t): year Number of groups = 93
Obs per group (T) = 44
F( 465, 1767)= 0.81
Prob > F = 1.00
R-squared = 0.52
Adj. R-squared = 0.52
Root MSE = 0.06
CD Statistic = 1.37
p-value = 0.1716

D.log_rgdpo Coef. Std. Err. z P>|z| [95% Conf. Interval]

Mean Group Estimates:


L.log_rgdpo -.636661 .030397 -20.94 0.000 -.6962375 -.5770841
log_hc -1.3089 .396571 -3.30 0.001 -2.086161 -.5316326
log_ck .220947 .051488 4.29 0.000 .1200325 .3218616
log_ngd .041286 .105526 0.39 0.696 -.1655399 .2481125
_cons -1.87339 1.83475 -1.02 0.307 -5.469432 1.722646

Mean Group Variables: L.log_rgdpo log_hc log_ck log_ngd _cons
Cross Sectional Averaged Variables: D.log_rgdpo L.log_rgdpo log_hc log_ck log_ngd
Degrees of freedom per country:
in mean group estimation = 39
with cross-sectional averages = 24
Number of
cross sectional lags = 3
variables in mean group regression = 2325
variables partialled out = 1860

Footnote 6: For better readability, the commands are abbreviated in the outputs, but not in the text.
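The counts reported at the bottom of the output above can be reproduced with a few lines of arithmetic. The Python sketch uses only numbers shown in the output; reading "variables in mean group regression" as N × (K + averages per unit) is our interpretation, chosen because it reproduces the reported values, not documented xtdcce behaviour. The sketch also computes the integer part of $\sqrt[3]{T}$, the lag rule of Chudik and Pesaran (2015b) from Section 2, which gives 3 for both T = 48 and T = 44 and so happens to coincide with cr_lags(3).

```python
def cube_root_lags(t_periods):
    """Integer part of T^(1/3), computed without floating point rounding issues."""
    p = int(round(t_periods ** (1.0 / 3.0)))
    while p ** 3 > t_periods:
        p -= 1
    while (p + 1) ** 3 <= t_periods:
        p += 1
    return p


# Numbers taken from the mean group output above.
n_units = 93      # Number of groups
t_periods = 44    # Obs per group (T)
k_vars = 5        # L.log_rgdpo, log_hc, log_ck, log_ngd, _cons
n_csa_vars = 5    # variables listed in cr()
cr_lags = 3       # cr_lags(3)

csa_per_unit = n_csa_vars * (cr_lags + 1)       # 5 * (3 + 1) = 20
df_per_unit = t_periods - k_vars                # 44 - 5 = 39
partialled_out = n_units * csa_per_unit         # 93 * 20 = 1860
vars_in_mg = n_units * (k_vars + csa_per_unit)  # 93 * 25 = 2325
n_obs = n_units * t_periods                     # 93 * 44 = 4092
model_df = n_units * k_vars                     # 465, numerator of F(465, 1767)
residual_df = n_obs - vars_in_mg                # 1767, denominator of F(465, 1767)
vars_pooled_example = partialled_out + k_vars   # 1865 in the pooled example

print(cube_root_lags(48), cube_root_lags(t_periods))  # 3 3
```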

On the lower right of the upper panel, the output shows a CD test statistic of 1.37 with a p-value of 0.17, so the null hypothesis of weak cross sectional dependence cannot be rejected. Below the coefficient estimates, xtdcce displays the names of the 4 mean group variables and the constant, and of the 5 cross sectionally averaged variables. As a regression without any pooled variables is essentially a separate regression for each country, the degrees of freedom per country are shown as well; they equal the number of time periods (T = 44) minus the number of variables (5). The degrees of freedom per country with cross-sectional averages indicate the potential degrees of freedom of a regression which includes the cross sectional averages. They equal the number of time periods (44) minus the number of variables (K = 5), minus the number of cross sectional averages times the number of lags plus one for the contemporaneous averages (5 × (3 + 1) = 20). At the end of the table, the number of lags of the cross sectional means is displayed, together with the number of variables in the mean group regression and the number of variables partialled out, which equals the number of cross sectional means over all countries (93 × 20 = 1860). As the cross sectional means are treated purely as controls and have no interpretation, no information is lost by partialling them out. Partialling out means that the dependent variable and each explanatory variable of interest are regressed on the cross sectional means and the residuals are collected. The residuals are then used as the new dependent and explanatory variables. The partialling out is performed in Mata. The variables to be partialled out (the cross sectional means and, if requested, the heterogeneous intercept) are stacked in a block diagonal matrix with zeros on the off diagonals. For a large number of units the matrix becomes sparse, and calculating and inverting the cross product becomes computationally intensive and hence time consuming. To improve speed, the partialling out is done sequentially, unit by unit, which is possible as long as the coefficients on the cross sectional means, $\delta_{i,l}$, are heterogeneous.7 Within this process, the program checks whether the factor loadings have full rank.8 If the check fails, xtdcce aborts with error code 506, see Section 6.

While all available data are used to compute the cross sectional means, the partialling out is restricted to the observations used in the regression. For the example above this means the following: 1 period is lost to create L.log_rgdpo and a further 3 periods are lost due to the lags of the cross sectional means. The cross sectional means of all variables except the lagged level of the dependent variable are computed using the full available sample. The average of L.log_rgdpo is based on the years 1961 to 2007, as the value for 1960 is missing. The regression and the partialling out are then applied to the restricted sample. In total 4 periods are lost, so the regression covers the years 1964 to 2007, making the time dimension T = 44.

Footnote 7: Any difference in precision is of a negligible order of magnitude and is offset by the improvement in speed. A simulation supporting these results is available upon request from the author. The standard solver for the inverse of the cross product of the factor loadings is cholsolve. cholsolve cannot solve matrices that are not positive definite, such as singular matrices; in this case qrsolve is used. Thanks to Mark Schaffer for providing the code for this routine.
The regression results are not in favour of the Solow model: the coefficient on human capital is negative and significant, and the coefficient on population growth (log_ngd) is positive, although not significant; only physical capital has the expected positive and significant coefficient. For a more detailed discussion of the Solow model in growth empirics see Mankiw et al. (1992), Islam (1995), Lee et al. (1997), Durlauf et al. (2005) or Jones (2015); for a focus on slope heterogeneity see Islam (1998) and Lee et al. (1998).
As the residuals are saved as a new variable called residuals, the test on
cross sectional dependence can be done by hand to confirm the result from
above.
. xtcd2 residuals
Chudik and Pesaran (2015) test for cross sectional dependence
Postestimation.
H0: errors are weakly cross sectional dependent.
CD = 1.3669791
p_value = .17163187

Using the option noestimation leads to the same result, as long as the observations omitted in the estimation are missing in the variable residuals. The advantage of noestimation is that it allows any variable to be tested for cross sectional dependence. For example, testing the variable log_d_rgdpo for cross sectional dependence reads:
. xtcd2 log_d_rgdpo , noest
Chudik and Pesaran (2015) test for cross sectional dependence
H0: errors are weakly cross sectional dependent.
CD = 72.243906
p_value = 0

In the next step, all 5 coefficients are constrained to be the same across countries, $\beta_{i,k} = \beta_k\ \forall i = 1, \ldots, N,\ k = 0, \ldots, 4$, by specifying the pooled() option, pooling the constant with the pooledconstant option and forcing xtdcce to display the constant with reportconstant.9

Footnote 8: The full rank condition is checked on the unit specific matrices containing the cross sectional means ($\bar{z}_i = (\bar{y}_i, \bar{x}_i)$, where $\bar{y}_i$ and $\bar{x}_i$ are $T \times 1$ and $T \times K$ matrices containing the cross sectional means). This is possible because the matrix over all units is block diagonal, so its rank equals the sum of the ranks of the blocks.
. xtdcce d.log_rgdpo L.log_rgdpo log_hc log_ck log_ngd , /*
> */ p(L.log_rgdpo log_hc log_ck log_ngd) reportc /*
> */ cr(d.log_rgdpo L.log_rgdpo log_hc log_ck log_ngd) cr_lags(3) pooledc
Dynamic Common Correlated Effects - Pooled Mean Group
Panel Variable (i): id Number of obs = 4092
Time Variable (t): year Number of groups = 93
Obs per group (T) = 44
F( 5, 2227)= 0.23
Prob > F = 0.95
R-squared = 0.16
Adj. R-squared = 0.16
Root MSE = 0.07
CD Statistic = -0.92
p-value = 0.3567

D.log_rgdpo Coef. Std. Err. z P>|z| [95% Conf. Interval]

Pooled Variables:
L.log_rgdpo -.291652 .014082 -20.71 0.000 -.3192522 -.2640528
log_hc .061267 .096776 0.63 0.527 -.1284114 .2509449
log_ck .147058 .013429 10.95 0.000 .1207373 .1733795
log_ngd .003515 .022033 0.16 0.873 -.0396682 .0466986
_cons -1.4e-15 .469827 -0.00 1.000 -.9208433 .9208433

Pooled Variables: L.log_rgdpo log_hc log_ck log_ngd _cons


Cross Sectional Averaged Variables: D.log_rgdpo L.log_rgdpo log_hc log_ck log_ngd
Degrees of freedom per country:
in mean group estimation = 39
with cross-sectional averages = 24
Number of
cross sectional lags = 3
variables in mean group regression = 1865
variables partialled out = 1860

The estimate for the constant is essentially zero (-1.4e-15). As outlined in Section 4.5, this result is expected because all coefficients are pooled, the panel is balanced and all variables are added as cross sectional means.

Footnote 9: Pesaran (2006) discusses the mean group and the pooled version of the CCE estimator. A pooled version in the dynamic setting is not mentioned by Chudik and Pesaran (2015b). However, Everaert and De Groote (2015) show that the unrestricted CCEP estimator is consistent as long as $N, T \to \infty$.

As a final exercise, assume that investment in physical capital is endogenous and is instrumented by its lagged value. The endogenous variable log_ck is specified in endogenous_vars() and the exogenous variable L.log_ck in exogenous_vars().
. xtdcce d.log_rgdpo L.log_rgdpo log_hc log_ngd , /*
> */ cr(d.log_rgdpo L.log_rgdpo log_hc log_ck log_ngd) reportc /*
> */ endo(log_ck) exo(L.log_ck) cr_lags(3) ivreg2options(nocollin noid)
Dynamic Common Correlated Effects - Mean Group IV

Panel Variable (i): id Number of obs = 4092
Time Variable (t): year Number of groups = 93
Obs per group (T) = 44
F( 2325, 1767)= 3.55
Prob > F = 0.00
R-squared = 0.48
Adj. R-squared = -0.21
Root MSE = 0.04
CD Statistic = 1.16
p-value = 0.2474

D.log_rgdpo Coef. Std. Err. z P>|z| [95% Conf. Interval]

Mean Group Estimates:


log_ck -.085406 .073381 -1.16 0.244 -.2292295 .0584181
L.log_rgdpo -.557558 .034126 -16.34 0.000 -.6244428 -.4906724
log_hc -1.04877 .403293 -2.60 0.009 -1.839213 -.2583344
log_ngd .127136 .167222 0.76 0.447 -.2006135 .4548857
_cons -1.51382 1.92312 -0.79 0.431 -5.283056 2.255424

Mean Group Variables: L.log_rgdpo log_hc log_ngd _cons


Cross Sectional Averaged Variables: D.log_rgdpo L.log_rgdpo log_hc log_ck log_ngd
Endogenous Variables: log_ck
Exogenous Variables: L.log_ck
Degrees of freedom per country:
in mean group estimation = 39
with cross-sectional averages = 24
Number of
cross sectional lags = 3
variables in mean group regression = 2325
variables partialled out = 1860

5.2 Pooled Mean Group


In the following, xtdcce is compared to xtpmg by Blackburne and Frank (2007), which implements the Pooled Mean Group estimator of Shin et al. (1999) in Stata. The two programs differ in two ways. First, xtpmg estimates the following equation

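$$\Delta c_{i,t} = \phi_i (c_{i,t-1} - \theta_{1,i} y_{i,t} - \theta_{2,i} \pi_{i,t}) + \delta_{0,i} + \delta_{1,i} \Delta y_{i,t} + \delta_{2,i} \Delta \pi_{i,t} + \epsilon_{i,t}, \qquad (3)$$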

while xtdcce internally estimates (leaving out any cross sectional means):

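$$\Delta c_{i,t} = \phi_i c_{i,t-1} + \gamma_{1,i} y_{i,t} + \gamma_{2,i} \pi_{i,t} + \delta_{0,i} + \delta_{1,i} \Delta y_{i,t} + \delta_{2,i} \Delta \pi_{i,t} + \epsilon_{i,t}. \qquad (4)$$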

Second, xtpmg calculates the long run coefficients by maximum likelihood, whereas xtdcce treats the long run variables, defined in lr(), as further covariates and estimates equation 4 entirely by OLS. To obtain the long run coefficients of equation 3, the estimated coefficients are divided by the negative of the speed of adjustment coefficient, $\theta_{1,i} = -\gamma_{1,i}/\phi_i$. The variances are calculated using the delta method. Equation 4 and the coefficients $\gamma_{1,i}, \ldots, \gamma_{K,i}$ can be estimated by using lr_options(nodivide).
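The transformation from equation 4 to equation 3 for a single coefficient can be sketched as $\theta = -\gamma/\phi$ with a delta method variance based on the gradient $(-1/\phi,\ \gamma/\phi^2)$. The Python sketch below is a minimal illustration; its inputs, including the covariance between $\hat{\gamma}$ and $\hat{\phi}$, are hypothetical, and xtdcce's own implementation may differ in detail.

```python
import math


def long_run_coefficient(gamma, phi, var_gamma, var_phi, cov_gamma_phi):
    """theta = -gamma / phi and its delta method variance.

    gradient = (d theta / d gamma, d theta / d phi) = (-1 / phi, gamma / phi**2)
    Var(theta) = gradient' * Cov(gamma, phi) * gradient
    """
    theta = -gamma / phi
    g_gamma = -1.0 / phi
    g_phi = gamma / phi ** 2
    var_theta = (g_gamma ** 2 * var_gamma
                 + g_phi ** 2 * var_phi
                 + 2.0 * g_gamma * g_phi * cov_gamma_phi)
    return theta, var_theta


# Hypothetical inputs: gamma = 0.15, phi = -0.2 (speed of adjustment).
theta, var_theta = long_run_coefficient(0.15, -0.2, 0.0004, 0.0009, 0.0001)
print(theta, math.sqrt(var_theta))
```

As a rough check on the point estimates in the table below, the rounded pooled estimates in column (3), -0.0327 for pi and 0.152 for y with a speed of adjustment of -0.168, give -(-0.0327)/(-0.168) ≈ -0.195 and -0.152/(-0.168) ≈ 0.905, close to the -0.194 and 0.903 reported in column (2); the small gaps are consistent with rounding of the inputs.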
As in Blackburne and Frank (2007), the jasa2 dataset is used to explain consumption with inflation and income from 1962 onwards. esttab produces the following output:
. use jasa2, clear
. tsset id year
panel variable: id (unbalanced)
time variable: year, 1960 to 1993
delta: 1 unit
. qui xtpmg d.c d.pi d.y if year>=1962, lr(l.c pi y) ec(ec) replace pmg
. eststo xtpmg
.
. eststo xtdcce1: qui xtdcce d.c d.pi d.y if year >= 1962 , /*
> */ lr(l.c pi y) p(l.c pi y) nocross lr_options(xtpmgnames) reportc
. eststo xtdcce2: qui xtdcce d.c d.pi d.y if year >= 1962 , /*
> */ lr(l.c pi y) p(l.c pi y) nocross lr_options(nodivide xtpmgnames) reportc
. eststo xtdcce3: qui xtdcce d.c d.pi d.y if year >= 1962 , /*
> */ lr(l.c pi y) p(l.c pi y) cr(d.c d.pi d.y) cr_lags(0) /*
> */ lr_options(xtpmgnames) reportc
. esttab xtpmg xtdcce1 xtdcce2 xtdcce3 /*
> */ , se mlabels(xtpmg "xtdcce (mg)" "xtdcce (mg)" "xtdcce (cce)") s(N cd cdp)

                  (1)            (2)            (3)            (4)
                xtpmg    xtdcce (mg)    xtdcce (mg)   xtdcce (cce)
ec
  pi        -0.466***       -0.194**      -0.0327**      -0.276***
             (0.0567)       (0.0701)       (0.0121)       (0.0698)
  y          0.904***       0.903***       0.152***       0.940***
            (0.00868)       (0.0163)       (0.0144)       (0.0170)
SR
  ec        -0.200***      -0.168***      -0.168***      -0.184***
             (0.0322)       (0.0151)       (0.0151)       (0.0172)
  D.pi        -0.0183        -0.0548        -0.0548         0.0237
             (0.0278)       (0.0299)       (0.0299)       (0.0317)
  D.y        0.327***       0.380***       0.380***       0.384***
             (0.0574)       (0.0350)       (0.0350)       (0.0431)
  _cons      0.154***       0.137***       0.137***      0.0643***
             (0.0217)      (0.00765)      (0.00765)      (0.00585)
N                 767            767            767            767
cd                             4.101          4.101          0.671
cdp                        0.0000410      0.0000410          0.502
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001
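The cd and cdp rows report the CD statistic and its p-value. For a balanced panel with N units and T periods, Pesaran (2015) defines $CD = \sqrt{2T/(N(N-1))}\sum_{i<j}\hat\rho_{ij}$, where $\hat\rho_{ij}$ is the sample correlation between the residuals of units i and j; under the null hypothesis of weak cross-sectional dependence CD is approximately standard normal. The Python sketch below implements only this balanced-panel formula. The function names are hypothetical and this is not the xtcd2 code; jasa2 is unbalanced, so the sketch cannot reproduce the values in the table.

```python
import math


def pearson(a, b):
    """Sample correlation of two equally long series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def cd_test(residuals):
    """Pesaran (2015) CD statistic for a balanced panel.

    residuals : list of N lists, each holding the T residuals of one unit.
    Returns (CD, two-sided p-value from the standard normal distribution).
    """
    n_units, t = len(residuals), len(residuals[0])
    if n_units < 2 or any(len(r) != t for r in residuals):
        raise ValueError("need at least two units with equally long series")
    rho_sum = sum(
        pearson(residuals[i], residuals[j])
        for i in range(n_units - 1)
        for j in range(i + 1, n_units)
    )
    cd = math.sqrt(2.0 * t / (n_units * (n_units - 1))) * rho_sum
    p_value = math.erfc(abs(cd) / math.sqrt(2.0))  # = 2 * (1 - Phi(|CD|))
    return cd, p_value


# Toy residuals: three units with identical (perfectly correlated) series
base = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0]
cd, p = cd_test([base, base, base])
print(f"CD = {cd:.3f}, p = {p:.2g}")
```

With three identical series every $\hat\rho_{ij}$ equals 1, so $CD = \sqrt{T N (N-1)/2} = \sqrt{24}$ for T = 8, a clear rejection. Because CD sums signed correlations, positive and negative correlations across pairs can offset each other.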

Column (1) shows the results of xtpmg and columns (2) to (4) those of xtdcce. Column (1) reproduces the results in Blackburne and Frank (2007, p. 203). Column (2) shows the corresponding xtdcce estimates: all coefficients have the same signs as in column (1), and their magnitudes, especially those of the short-run coefficients, are very similar. Column (3) uses the option nodivide and therefore reports the estimates of equation 4.¹⁰
10 An alternative way of obtaining long-run coefficients in a dynamic panel, using the between estimator, is outlined in Ditzen and Gundlach (2015).

As the last two rows show, the CD test rejects the null hypothesis of weak cross-sectional dependence when no cross-sectional averages are included (columns (2) and (3): CD = 4.101, p = 0.0000410). Column (4) therefore adds cross-sectional averages, after which the null can no longer be rejected (CD = 0.671, p = 0.502).
Any subset of coefficients can be pooled. Short-run coefficients are restricted to be equal across all units by including them in the pooled() option, and long-run coefficients can likewise be left to vary by omitting them from it. Which restrictions are consistent with the data can be checked with a Hausman test. Below, a mean group model (mg) with all coefficients heterogeneous, a pooled mean group model (pmg) with pooled long-run coefficients, and a pooled model (pooled) in which the short-run coefficients on d.pi and d.y are pooled as well are estimated and compared:
. eststo mg: qui xtdcce d.c d.pi d.y if year >= 1962 , /*
> */ lr(l.c pi y) nocross reportc
. eststo pmg: qui xtdcce d.c d.pi d.y if year >= 1962 , /*
> */ lr(l.c pi y) p(l.c pi y) nocross reportc
. eststo pooled: qui xtdcce d.c d.pi d.y if year >= 1962 , /*
> */ lr(l.c pi y) p(l.c pi y d.pi d.y) nocross reportc
. hausman mg pooled, sigmamore
                        Coefficients
               (b)          (B)          (b-B)    sqrt(diag(V_b-V_B))
                mg       pooled     Difference          S.E.
pi
  D1.    -.0253642    -.0280826       .0027184        .0305406
y
  D1.     .2337588     .3811944      -.1474357        .0533472
c
  L1.    -.3063473    -.1794146      -.1269326        .0328851
pi       -.3529095     -.266343      -.0865666        .1230082
y         .9181344     .9120574       .0060771        .0287949

b = consistent under Ho and Ha; obtained from xtdcce
B = inconsistent under Ha, efficient under Ho; obtained from xtdcce
Test: Ho: difference in coefficients not systematic
chi2(5) = (b-B)'[(V_b-V_B)^(-1)](b-B)
        = 18.13
Prob>chi2 = 0.0028
. hausman pmg pooled, sigmamore
                        Coefficients
               (b)          (B)          (b-B)    sqrt(diag(V_b-V_B))
               pmg       pooled     Difference          S.E.
c
  L1.    -.1683577    -.1794146       .0110569        .0050032
pi       -.1941238     -.266343       .0722191        .0316818
y         .9025766     .9120574      -.0094807         .007498
pi
  D1.    -.0548234    -.0280826      -.0267408        .0264268
y
  D1.     .3802491     .3811944      -.0009453        .0279448

b = consistent under Ho and Ha; obtained from xtdcce
B = inconsistent under Ha, efficient under Ho; obtained from xtdcce
Test: Ho: difference in coefficients not systematic
chi2(5) = (b-B)'[(V_b-V_B)^(-1)](b-B)
        = 2.66
Prob>chi2 = 0.7515
(V_b-V_B is not positive definite)
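The results of the Hausman tests are similar to those in Blackburne and Frank (2007, Sections 4.3 and 4.4). The first test compares the mean group model with the pooled model and rejects the null hypothesis that the difference in coefficients is not systematic (p = 0.0028); the pooled model, which is inconsistent under the alternative, is therefore rejected in favor of the mean group model. The second test compares the pooled mean group model with the pooled model and does not reject the null (p = 0.7515), although V_b - V_B is not positive definite in this case, so this statistic should be interpreted with caution. The conclusion is in line with Blackburne and Frank (2007): the pooled mean group model is preferred.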
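Both tests use the statistic $H = (b-B)'(V_b-V_B)^{-1}(b-B)$, which is compared with a $\chi^2$ distribution whose degrees of freedom equal the number of compared coefficients (5 here). The Python sketch below computes H and its p-value. The helper names are hypothetical, and the sketch uses the plain covariance difference without the sigmamore adjustment used above; the rounded output also lacks the off-diagonal covariances, so 18.13 and 2.66 cannot be recomputed from it. The sketch can, however, check the reported p-values from the reported statistics.

```python
import math


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        if abs(m[c][c]) < 1e-14:
            raise ValueError("V_b - V_B is singular")
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x


def chi2_sf(x, k):
    """P(X > x) for X ~ chi2(k), using the closed forms for integer k."""
    if x <= 0:
        return 1.0
    if k % 2 == 0:
        term, total = 1.0, 0.0
        for j in range(k // 2):
            total += term                  # (x/2)^j / j!
            term *= (x / 2.0) / (j + 1)
        return math.exp(-x / 2.0) * total
    term, total = math.sqrt(x), 0.0
    for j in range(1, (k - 1) // 2 + 1):
        total += term                      # x^((2j-1)/2) / (1*3*...*(2j-1))
        term *= x / (2 * j + 1)
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 / math.pi) * math.exp(-x / 2.0) * total)


def hausman(b, B, V_b, V_B):
    """Hausman statistic (b-B)'(V_b-V_B)^(-1)(b-B) and its chi2(k) p-value."""
    d = [bi - Bi for bi, Bi in zip(b, B)]
    k = len(d)
    V = [[V_b[i][j] - V_B[i][j] for j in range(k)] for i in range(k)]
    stat = sum(di * wi for di, wi in zip(d, solve(V, d)))
    return stat, chi2_sf(stat, k)


# Toy example with made-up coefficient vectors and covariance matrices
stat, p = hausman([1.0, 2.0], [0.5, 1.0],
                  [[2.0, 0.0], [0.0, 3.0]], [[1.0, 0.0], [0.0, 1.0]])
print(f"H = {stat:.3f}, p = {p:.4f}")

# p-values implied by the reported chi2(5) statistics
print(f"{chi2_sf(18.13, 5):.4f} {chi2_sf(2.66, 5):.4f}")
```

If V_b - V_B is not positive definite, as in the second test, H can be negative or unstable; the sketch then returns a p-value of 1 for a non-positive H rather than flagging the problem.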

5.3 Mean Group and Common Correlated Effects


xtdcce can also compute the mean group estimator of Pesaran and Smith (1995) and the common correlated effects estimator of Pesaran (2006), both of which are implemented in the xtmg command by Eberhardt (2012). Following Eberhardt (2012) and using the dataset manu_stata9.dta, the results of xtmg and xtdcce are compared below:¹¹
. use manu_stata9.dta
. xtset nwbcode year
panel variable: nwbcode (strongly balanced)
time variable: year, 1970 to 2002
delta: 1 unit
. eststo xtmg06: qui xtmg ly lk, cce trend
. eststo xtdcce06: qui xtdcce ly lk , cr(ly lk) cr_lags(0) trend reportc
. eststo xtmg95: qui xtmg ly lk, trend
. eststo xtdcce95: qui xtdcce ly lk , cr(ly lk) trend nocross reportc
. estout xtmg95 xtdcce95 xtmg06 xtdcce06 , c(b(star fmt(4)) se(fmt(4) par)) /*
> */ mlabels("xtmg - mg" xtdcce "xtmg - cce" xtdcce) s(N cd cdp , fmt(0 3 3 )) /*
> */ drop(*_ly *_lk) rename(__000007_t trend) collabels(,none)

                 (1)          (2)          (3)          (4)
           xtmg - mg       xtdcce   xtmg - cce       xtdcce
lk           0.1789*      0.1789*    0.3125***    0.3125***
            (0.0805)     (0.0805)     (0.0849)     (0.0849)
trend      0.0174***    0.0174***     0.0108**     0.0108**
            (0.0030)     (0.0030)     (0.0035)     (0.0035)
_cons      7.6528***    7.6354***    4.7860***    4.7752***
            (0.8546)     (0.8531)     (1.3227)     (1.3202)
N               1194         1194         1194         1194
cd                          6.686                    -0.201
cdp                         0.000                     0.841

The first two columns show the mean group estimator without cross-sectional averages. Columns (1) and (3) reproduce the estimates in Table 1 of Eberhardt (2012, p. 67). Columns (3) and (4) are based on the common correlated effects estimator, which adds contemporaneous cross-sectional averages to each unit regression. In column (2) the CD test statistic (6.686, p = 0.000) rejects the null hypothesis of weak cross-sectional dependence. Including cross-sectional averages reduces the statistic to -0.201 (p = 0.841), so the null hypothesis can no longer be rejected. The estimates of xtmg and xtdcce differ slightly, as the constant shows. The difference arises because xtdcce internally converts and creates all variables as doubles to obtain the highest precision.¹²

11 The dataset manu_stata9.dta is taken from Eberhardt and Teal (2014) and is available at https://sites.google.com/site/medevecon/.
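The mean group estimator averages unit-by-unit OLS slopes and takes its standard error from their dispersion across units, $\widehat{V}(\bar b) = \frac{1}{N(N-1)}\sum_i (\hat b_i - \bar b)^2$; the CCE variant of Pesaran (2006) adds the cross-sectional averages of the dependent and independent variables to every unit regression, which corresponds to cr(ly lk) cr_lags(0) in the commands above. The Python sketch below illustrates the difference on simulated data from a hypothetical one-factor model in which x_it loads on the same unobserved factor as y_it. It omits the trend used in the example, and all names and parameter values are made up for illustration. In this design the plain MG slope is biased upward by the omitted factor, while the CCE mean group slope should be close to the true mean slope of 0.5.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    return solve(xtx, xty)


def mean_group(slopes):
    """Mean group estimate and its standard error from unit-specific slopes."""
    n = len(slopes)
    mg = sum(slopes) / n
    var = sum((s - mg) ** 2 for s in slopes) / (n * (n - 1))
    return mg, var ** 0.5


def unit_slopes(y, x, cce):
    """Slope on x from unit-by-unit OLS of y_it on (1, x_it); with cce=True the
    cross-sectional averages ybar_t and xbar_t are added as regressors."""
    n, t = len(y), len(y[0])
    ybar = [sum(y[i][s] for i in range(n)) / n for s in range(t)]
    xbar = [sum(x[i][s] for i in range(n)) / n for s in range(t)]
    slopes = []
    for i in range(n):
        rows = [[1.0, x[i][s]] + ([ybar[s], xbar[s]] if cce else []) for s in range(t)]
        slopes.append(ols(rows, y[i])[1])
    return slopes


def simulate(n=40, t=60, seed=1):
    """Hypothetical panel: y_it = b_i x_it + lam_i f_t + e_it, x_it = g_i f_t + u_it."""
    rng = random.Random(seed)
    f = [rng.gauss(0.0, 1.0) for _ in range(t)]      # unobserved common factor
    y, x = [], []
    for _ in range(n):
        b = rng.gauss(0.5, 0.1)                       # heterogeneous slope, mean 0.5
        lam, g = rng.uniform(0.5, 1.5), rng.uniform(0.5, 1.5)
        xi = [g * f[s] + rng.gauss(0.0, 1.0) for s in range(t)]
        y.append([b * xi[s] + lam * f[s] + rng.gauss(0.0, 0.1) for s in range(t)])
        x.append(xi)
    return y, x


y, x = simulate()
mg, mg_se = mean_group(unit_slopes(y, x, cce=False))
cce, cce_se = mean_group(unit_slopes(y, x, cce=True))
print(f"MG : {mg:.3f} ({mg_se:.3f})")
print(f"CCE: {cce:.3f} ({cce_se:.3f})")
```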

6 Error Messages
xtdcce produces the following error codes:
r(109)   ivreg2 not installed.
r(184)   Options noconstant and pooledconstant, trend and trendconstant, or jackknife and recursive are combined.
r(506)   Rank condition on cross-section means not satisfied.
r(2001)  More variables than observations.

7 Conclusion
The user-written Stata command xtdcce provides routines to estimate heterogeneous panel models with dynamic common correlated effects. It combines the estimation procedures of Pesaran and Smith (1995) and Shin et al. (1999) with those of Pesaran (2006) and Chudik and Pesaran (2015b). Coefficients can be pooled or estimated as mean group coefficients. Furthermore, xtdcce supports unbalanced panels, instrumental variable estimation, small sample time series bias corrections, and tests for cross-sectional dependence using the included xtcd2 routine. An empirical example estimating a growth regression is given.

8 Acknowledgments
I am grateful to Arnab Bhattacharjee and Mark Schaffer for many valuable comments and suggestions, to Achim Ahrens for his contributions to the xtcd2 command, and to Kyle McNabb for testing the program. The code, and especially the output, benefited greatly from Markus Eberhardt's xtmg command. Any remaining errors are my own.
Jan Ditzen, PhD Student, Spatial Economics and Econometrics Centre (SEEC), School of Management & Languages, Heriot-Watt University, Edinburgh EH14 4AS, Scotland, United Kingdom; Email: jd219@hw.ac.uk, Web: www.jan.ditzen.net
12 Another difference from xtmg is that xtdcce supports time-series operators.
References
Baum, C. F., M. E. Schaffer, and S. Stillman. 2003. Instrumental variables and GMM: Estimation and testing. Stata Journal 3(1): 1–31.

Baum, C. F., M. E. Schaffer, and S. Stillman. 2007. Enhanced routines for instrumental variables/generalized method of moments estimation and testing. Stata Journal 7(4): 465–506.

Blackburne, E. F., and M. W. Frank. 2007. Estimation of nonstationary heterogeneous panels. Stata Journal 7(2): 197–208.

Bond, S. R., and M. Eberhardt. 2013. Accounting for unobserved heterogeneity in panel time series models.

Chudik, A., and M. H. Pesaran. 2015a. Large Panel Data Models with Cross-Sectional Dependence: A Survey. In The Oxford Handbook of Panel Data, ed. B. H. Baltagi, chap. 1, 2–45. Oxford University Press.

Chudik, A., and M. H. Pesaran. 2015b. Common correlated effects estimation of heterogeneous dynamic panel data models with weakly exogenous regressors. Journal of Econometrics 188(2): 393–420.

Chudik, A., M. H. Pesaran, and E. Tosetti. 2011. Weak and strong cross-section dependence and estimation of large panels. The Econometrics Journal 14(1): C45–C90.

De Hoyos, R. E., and V. Sarafidis. 2006. Testing for cross-sectional dependence in panel-data models. The Stata Journal 6(4): 482–496.

Ditzen, J., and E. Gundlach. 2015. A Monte Carlo study of the BE estimator for growth regressions. Empirical Economics forthcoming.

Durlauf, S. N., P. A. Johnson, and J. R. W. Temple. 2005. Growth econometrics. In Handbook of economic growth, ed. P. Aghion and S. N. Durlauf, volume 1, chap. 8, 555–677.

Eberhardt, M. 2012. Estimating panel time series models with heterogeneous slopes. Stata Journal 12(1): 61–71.

Eberhardt, M., C. Helmers, and H. Strauss. 2012. Do Spillovers Matter When Estimating Private Returns to R&D? Review of Economics and Statistics 95(May): 120207095627009.

Eberhardt, M., and F. Teal. 2014. The magnitude of the task ahead: Productivity analysis with heterogeneous technology.
Everaert, G., and T. De Groote. 2015. Common Correlated Effects Estimation of Dynamic Panels with Cross-Sectional Dependence. Econometric Reviews 4938(1981): 1–31.

Everaert, G., and I. De Vos. 2016. Bias-corrected Common Correlated Effects Pooled estimation in homogeneous dynamic panels. University of Ghent Working Paper.

Feenstra, R. C., R. Inklaar, and M. Timmer. 2015. The Next Generation of the Penn World Table. American Economic Review. URL www.ggdc.net/pwt.

Hayashi, F. 2000. Econometrics.

Islam, N. 1995. Growth Empirics: A Panel Data Approach. The Quarterly Journal of Economics 110(4): 1127–1170.

Islam, N. 1998. Growth Empirics: A Panel Data Approach - A reply. The Quarterly Journal of Economics 113(1): 325–329.

Jann, B. 2005. moremata: Stata module (Mata) to provide various functions. Available from http://ideas.repec.org/c/boc/bocode/s455001.html.

Jones, C. I. 2015. The Facts of Economic Growth. NBER Working Paper (w21142).

Kapetanios, G., M. H. Pesaran, and T. Yamagata. 2011. Panels with non-stationary multifactor error structures. Journal of Econometrics 160(2): 326–348.

Lee, K., M. H. Pesaran, and R. Smith. 1997. Growth and Convergence in a Multi-Country Empirical Stochastic Solow Model. Journal of Applied Econometrics 12(4): 357–392.

Lee, K., M. H. Pesaran, and R. P. Smith. 1998. Growth empirics: a panel data approach - A Comment. The Quarterly Journal of Economics 113(1): 319–323.

Mankiw, N. G., D. Romer, and D. N. Weil. 1992. A Contribution to the Empirics of Economic Growth. The Quarterly Journal of Economics 107(2): 407–437.

McNabb, K., and P. LeMay-Boucher. 2014. Tax Structures, Economic Growth and Development. ICTD Working Paper 22.

Pesaran, M. H. 2006. Estimation and inference in large heterogeneous panels with a multifactor error structure. Econometrica 74(4): 967–1012.

Pesaran, M. H. 2015. Testing Weak Cross-Sectional Dependence in Large Panels. Econometric Reviews 34(6-10): 1089–1117.

Pesaran, M. H., and R. Smith. 1995. Estimating long-run relationships from dynamic heterogeneous panels. Journal of Econometrics 68: 79–113.

Shin, Y., M. H. Pesaran, and R. P. Smith. 1999. Pooled Mean Group Estimation of Dynamic Heterogeneous Panels. Journal of the American Statistical Association 94(446): 621–634.

9 Technical Appendix
9.1 Delta Method
For the calculation of the long-run coefficients, the estimates $\hat\gamma_{k,i}$ obtained by OLS from equation 4 are divided by the negative of the estimated error correction coefficient $\hat\phi_i$. The variance-covariance matrix is calculated with the delta method, which gives an approximate probability distribution for a vector-valued function $a(\beta)$ of a random vector with a known variance (see for example Hayashi, 2000, p. 93). Suppose that for the random vector $\beta_i$, $\beta_i \xrightarrow{p} \beta$ and

$$\sqrt{n}\,(\beta_i - \beta) \xrightarrow{d} N(0, \Sigma).$$

Denote the first derivatives of $a(\beta)$ as

$$A(\beta) \equiv \frac{\partial a(\beta)}{\partial \beta'}.$$

Then the distribution of the function $a(\cdot)$ is

$$\sqrt{n}\,\left[ a(\beta_i) - a(\beta) \right] \xrightarrow{d} N\left(0,\; A(\beta)\, \Sigma\, A(\beta)' \right).$$
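For the calculation of the long-run coefficients, and using the notation of equation 4, let

$$\beta_i = (\phi_i, \gamma_{1,i}, \gamma_{2,i}, \delta_{1,i}, \delta_{2,i})'.$$

Its variance-covariance matrix is

$$\Sigma = \begin{pmatrix}
V(\phi_i) & \mathrm{Cov}(\phi_i, \gamma_{1,i}) & \mathrm{Cov}(\phi_i, \gamma_{2,i}) & \mathrm{Cov}(\phi_i, \delta_{1,i}) & \mathrm{Cov}(\phi_i, \delta_{2,i}) \\
\mathrm{Cov}(\phi_i, \gamma_{1,i}) & V(\gamma_{1,i}) & \mathrm{Cov}(\gamma_{1,i}, \gamma_{2,i}) & \cdots & \vdots \\
\vdots & & \ddots & & \vdots \\
\vdots & & & \ddots & \vdots \\
\mathrm{Cov}(\phi_i, \delta_{2,i}) & \cdots & \cdots & \cdots & V(\delta_{2,i})
\end{pmatrix}.$$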
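The function $a(\cdot)$ maps $\gamma_{1,i}$ and $\gamma_{2,i}$ into the long-run coefficients and leaves the remaining coefficients unchanged:

$$a(\beta_i) = \left(\phi_i,\; -\frac{\gamma_{1,i}}{\phi_i},\; -\frac{\gamma_{2,i}}{\phi_i},\; \delta_{1,i},\; \delta_{2,i}\right)' = \left(\phi_i,\; \theta_{1,i},\; \theta_{2,i},\; \delta_{1,i},\; \delta_{2,i}\right)'.$$

Its first derivative, with row $j$ holding the derivatives of the $j$th element of $a(\cdot)$, is

$$A_i(\beta) = \begin{pmatrix}
\frac{\partial \phi_i}{\partial \phi_i} & \frac{\partial \phi_i}{\partial \gamma_{1,i}} & \frac{\partial \phi_i}{\partial \gamma_{2,i}} & \frac{\partial \phi_i}{\partial \delta_{1,i}} & \frac{\partial \phi_i}{\partial \delta_{2,i}} \\[4pt]
\frac{\partial \theta_{1,i}}{\partial \phi_i} & \frac{\partial \theta_{1,i}}{\partial \gamma_{1,i}} & \frac{\partial \theta_{1,i}}{\partial \gamma_{2,i}} & \frac{\partial \theta_{1,i}}{\partial \delta_{1,i}} & \frac{\partial \theta_{1,i}}{\partial \delta_{2,i}} \\[4pt]
\frac{\partial \theta_{2,i}}{\partial \phi_i} & \frac{\partial \theta_{2,i}}{\partial \gamma_{1,i}} & \frac{\partial \theta_{2,i}}{\partial \gamma_{2,i}} & \frac{\partial \theta_{2,i}}{\partial \delta_{1,i}} & \frac{\partial \theta_{2,i}}{\partial \delta_{2,i}} \\[4pt]
\frac{\partial \delta_{1,i}}{\partial \phi_i} & \frac{\partial \delta_{1,i}}{\partial \gamma_{1,i}} & \frac{\partial \delta_{1,i}}{\partial \gamma_{2,i}} & \frac{\partial \delta_{1,i}}{\partial \delta_{1,i}} & \frac{\partial \delta_{1,i}}{\partial \delta_{2,i}} \\[4pt]
\frac{\partial \delta_{2,i}}{\partial \phi_i} & \frac{\partial \delta_{2,i}}{\partial \gamma_{1,i}} & \frac{\partial \delta_{2,i}}{\partial \gamma_{2,i}} & \frac{\partial \delta_{2,i}}{\partial \delta_{1,i}} & \frac{\partial \delta_{2,i}}{\partial \delta_{2,i}}
\end{pmatrix}
= \begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
\frac{\gamma_{1,i}}{\phi_i^2} & -\frac{1}{\phi_i} & 0 & 0 & 0 \\
\frac{\gamma_{2,i}}{\phi_i^2} & 0 & -\frac{1}{\phi_i} & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}.$$

All components are then known, and the variance-covariance matrix of $a(\beta_i)$ can be calculated as

$$\Sigma_a = A(\beta)\, \Sigma\, A(\beta)'.$$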
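The formulas above can be checked numerically. The Python sketch below, with hypothetical function names and a made-up covariance matrix (not xtdcce output), builds $A$ and $\Sigma_a = A\Sigma A'$ for one unit. Writing out the second diagonal element gives the scalar expansion $V(\theta_{1,i}) = (\gamma_{1,i}/\phi_i^2)^2 V(\phi_i) + V(\gamma_{1,i})/\phi_i^2 - 2\gamma_{1,i}\mathrm{Cov}(\phi_i,\gamma_{1,i})/\phi_i^3$, which the matrix product should reproduce.

```python
import math


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def long_run_delta(beta, sigma):
    """Delta method for a(beta) = (phi, -g1/phi, -g2/phi, d1, d2)'.

    beta  : (phi, gamma_1, gamma_2, delta_1, delta_2) for one unit
    sigma : 5x5 covariance matrix of beta
    Returns a(beta) and Sigma_a = A Sigma A'.
    """
    phi, g1, g2, d1, d2 = beta
    if phi == 0:
        raise ValueError("phi must be non-zero")
    a = [phi, -g1 / phi, -g2 / phi, d1, d2]
    # Jacobian A = d a / d beta': row j holds the derivatives of the j-th element of a
    A = [
        [1.0, 0.0, 0.0, 0.0, 0.0],
        [g1 / phi ** 2, -1.0 / phi, 0.0, 0.0, 0.0],
        [g2 / phi ** 2, 0.0, -1.0 / phi, 0.0, 0.0],
        [0.0, 0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 0.0, 1.0],
    ]
    return a, matmul(matmul(A, sigma), transpose(A))


# Hypothetical estimates and covariance matrix for one unit (not xtdcce output)
beta = [-0.2, -0.04, 0.18, -0.05, 0.33]
sigma = [
    [0.0004, 0.0001, 0.0, 0.0, 0.0],
    [0.0001, 0.0009, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0004, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0009, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0016],
]
a, sigma_a = long_run_delta(beta, sigma)
print(f"theta_1 = {a[1]:.3f}, s.e. = {math.sqrt(sigma_a[1][1]):.4f}")
```

The short-run coefficients keep their original variances because the corresponding rows of $A$ are unit vectors; only $\theta_{1,i}$ and $\theta_{2,i}$ pick up terms from $V(\phi_i)$ and the covariances with $\phi_i$.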
