Jan Ditzen
July 7, 2016
Abstract
This article introduces a new Stata command, xtdcce, to estimate a dynamic common correlated effects model with heterogeneous coefficients. The estimation procedure mainly follows Chudik and Pesaran (2015b). In addition, the common correlated effects estimator (Pesaran, 2006), the mean group estimator (Pesaran and Smith, 1995) and the pooled mean group estimator (Shin et al., 1999) are supported. Coefficients can be heterogeneous or homogeneous. Instrumental variable regressions and unbalanced panels are supported as well. The cross sectional dependence test (CD test) is calculated automatically and presented in the estimation output. Small sample time series bias can be corrected by jackknife correction or recursive mean adjustment. Examples of empirical applications of all estimation methods mentioned above are given.
1 Introduction
Estimating panels with heterogeneous coefficients in a large N and large T setting has become standard in recent years, thanks to seminal work in theoretical econometrics (Pesaran and Smith, 1995; Shin et al., 1999) and to the availability of data and computing power. Allowing for heterogeneous slopes enables the researcher to identify effects at a much more detailed, local level. At the same time, the theoretical literature on how to account for unobserved dependence between cross sectional units has evolved (Pesaran, 2006; Chudik and Pesaran, 2015b). Cross sectional means are added to account for the unobserved dependence between countries.
This paper introduces a new Stata program which combines these two strands of the literature. xtdcce allows for (pooled) mean group estimations in a dynamic panel with dependence between countries. It controls for dependence by adding cross sectional means and their lags, as proposed by Pesaran (2006) and Chudik and Pesaran (2015b).¹ Furthermore, it tests for cross sectional dependence in the error terms and allows for instrumental variable estimation. Additionally, xtdcce can correct for small sample time series bias using the jackknife correction method or the recursive mean adjustment, as proposed by Chudik and Pesaran (2015b).
xtdcce differs in several ways from the existing estimation procedures for common correlated effects in a heterogeneous panel. In comparison to xtmg, it allows the consistent estimation of a dynamic panel by adding lags of the cross sectional means. Moreover, coefficients may be constrained to be homogeneous across all units, and unbalanced panels are supported. Compared to xtpmg, xtdcce avoids maximum likelihood estimation, which makes it possible to estimate models with endogenous independent variables. Hence, the main novelties relative to xtpmg and xtmg are the inclusion of a test for cross sectional dependence, small T bias correction methods and the support for instrumental variable (IV) regressions. IV regressions are carried out with the ivreg2 package. Possible applications for an IV estimation are endogenous spatial lags, which are instrumented by exogenous measures such as distance, other variables or higher order spatial lags. Furthermore, adding cross sectional means accounts for unobserved heterogeneity across units.
The xtdcce package includes xtcd2, which tests for cross sectional dependence (henceforth CD test) as proposed by Pesaran (2015) and Chudik and Pesaran (2015a). Two other programs, xtcd (by Markus Eberhardt) and xtcsd by De Hoyos and Sarafidis (2006), already made the CD test available in Stata. The novelties of xtcd2 are the support of unbalanced panels, the possibility to test any variable for cross sectional dependence and the option to plot the cross correlations as a histogram.
1
Chudik and Pesaran (2015a) give a comprehensive overview of the literature on (dynamic) common correlated effects, while Chudik and Pesaran (2015b) focus on dynamic common correlated effects. In the following, common correlated effects cites Pesaran (2006), while dynamic common correlated effects cites Chudik and Pesaran (2015b), even though both are found in Chudik and Pesaran (2015a).
The remainder of the paper is structured as follows: the next two sections give a brief introduction to dynamic common correlated effects and to testing for cross sectional dependence. Then the syntax, options and saved values of xtdcce and xtcd2 are explained. The paper closes with examples of empirical applications and a comparison of regression results obtained by xtdcce with results from estimation procedures already available in Stata.
where the idiosyncratic errors $u_{i,t}$ are cross sectionally weakly dependent and $E(\lambda_i) = \lambda$. The lagged dependent variable is no longer strictly exogenous and therefore the estimator becomes inconsistent. Chudik and Pesaran (2015b) show that the estimator gains consistency if $\sqrt[3]{T}$ lags of the cross section means are added. The equation to be estimated is then
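$$y_{i,t} = \alpha_i + \lambda_i y_{i,t-1} + \beta_i x_{i,t} + \sum_{l=0}^{p_T} \delta_{i,l}' \bar{z}_{t-l} + e_{i,t}, \qquad (2)$$
where $\bar{z}_t = (\bar{y}_{t-1}, \bar{x}_t)$ and $p_T$ is the number of lags. $\lambda_i$ and $\beta_i$ are stacked into $\pi_i = (\lambda_i, \beta_i)$. The mean group estimates are then
$$\hat{\pi}_{MG} = \frac{1}{N}\sum_{i=1}^{N}\hat{\pi}_i.$$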
$\hat{\pi}_i$ and $\hat{\pi}_{MG}$ are consistently estimated, with $\hat{\pi}_{MG}$ converging at rate $\sqrt{N}$, if $(N, T, p_T) \to \infty$ and under full rank of the factor loadings (Chudik and Pesaran, 2015a,b).
The asymptotic variance can be consistently estimated by
$$\widehat{Var}(\hat{\pi}_{MG}) = N^{-1}\hat{\Sigma}_{\pi} = \frac{1}{N(N-1)}\sum_{i=1}^{N}(\hat{\pi}_i - \hat{\pi}_{MG})(\hat{\pi}_i - \hat{\pi}_{MG})'.$$
The mean group estimates have the following asymptotic distribution (Chudik and Pesaran, 2015b):
$$\sqrt{N}(\hat{\pi}_{MG} - \pi) \xrightarrow{d} N(0, \Sigma_{MG}).$$
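The mean group estimate and its variance estimator only need the unit-by-unit coefficient vectors $\hat{\pi}_i$. The Python sketch below (an illustration, not code from xtdcce, which is written in Stata and Mata) computes both from a hypothetical list of unit estimates; the function name `mean_group` and the list-of-lists input layout are assumptions.

```python
from typing import List, Tuple


def mean_group(pi_hat: List[List[float]]) -> Tuple[List[float], List[List[float]]]:
    """Mean group estimate and its estimated variance.

    pi_hat[i] holds the K coefficient estimates of unit i, e.g. (lambda_i, beta_i).
    Returns (pi_mg, var_mg) with
        pi_mg  = (1/N) * sum_i pi_i
        var_mg = 1/(N(N-1)) * sum_i (pi_i - pi_mg)(pi_i - pi_mg)'
    """
    n = len(pi_hat)
    if n < 2:
        raise ValueError("at least two units are needed")
    k = len(pi_hat[0])
    pi_mg = [sum(p[j] for p in pi_hat) / n for j in range(k)]
    # Accumulate the outer products of the deviations from the mean group estimate.
    var_mg = [[0.0] * k for _ in range(k)]
    for p in pi_hat:
        dev = [p[j] - pi_mg[j] for j in range(k)]
        for a in range(k):
            for b in range(k):
                var_mg[a][b] += dev[a] * dev[b]
    scale = 1.0 / (n * (n - 1))
    var_mg = [[v * scale for v in row] for row in var_mg]
    return pi_mg, var_mg


if __name__ == "__main__":
    # Hypothetical estimates (lambda_i, beta_i) for three units.
    est = [[0.5, 1.0], [0.7, 1.2], [0.6, 0.8]]
    pi_mg, var_mg = mean_group(est)
    print(pi_mg)   # approximately [0.6, 1.0]
    print(var_mg)  # approximately [[0.00333, 0.00333], [0.00333, 0.01333]]
```

The standard errors reported for mean group coefficients are the square roots of the diagonal of this matrix.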
The pooled mean group estimator (Shin et al., 1999) can be seen as intermediate between a pure pooled estimation (homogeneous coefficients) and a mean group estimation (heterogeneous coefficients). The pooled mean group (PMG) estimator assumes that the regressors have a homogeneous long run effect and a heterogeneous short run effect on the dependent variable. Equation 1 is transformed into an error correction model, such that
likelihood and the short run coefficients by OLS. The estimator is consistent as long as the disturbances are independently distributed across all individuals and time periods, with zero mean and strictly positive variance.
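The mean group estimate and the variance of the short run coefficients are:
$$\hat{\delta}_{MG} = \frac{1}{N}\sum_{i=1}^{N}\hat{\delta}_i, \qquad \widehat{Var}(\hat{\delta}_{MG}) = \frac{1}{N(N-1)}\sum_{i=1}^{N}\left(\hat{\delta}_i - \hat{\delta}_{MG}\right)^2.$$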
The mean group and the pooled mean group estimators, in both their static and dynamic versions, rely on large N and T. The literature on small sample time series bias corrections in dynamic heterogeneous panels is somewhat scarce, which is why Chudik and Pesaran (2015b) focus on the "half-panel" jackknife and the recursive mean adjustment bias correction methods. Neither requires any knowledge of the error factor structure, and both can be applied to the mean group estimates.³ The mean group estimate of the jackknife bias-corrected CCE estimator is
$$\tilde{\pi}_{MG} = 2\hat{\pi}_{MG} - \frac{1}{2}\left(\hat{\pi}^a_{MG} + \hat{\pi}^b_{MG}\right),$$
where $\hat{\pi}^a_{MG}$ is the mean group estimate of the first half ($t = 1, \dots, T/2$) of the panel and $\hat{\pi}^b_{MG}$ that of the second half ($t = T/2 + 1, \dots, T$).
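To make the half-panel jackknife concrete, the Python sketch below applies it to a deliberately simple case: per-unit OLS estimates of an AR(1) coefficient with an intercept, averaged into a mean group estimate, on simulated data with a true coefficient of 0.5. The data-generating process, the helper names and the choice to let the second half use the last observation of the first half as its initial lag are illustrative assumptions, not xtdcce internals; cross sectional means are left out.

```python
import random
from typing import List


def ols_ar1(y: List[float]) -> float:
    """OLS slope from regressing y_t on a constant and y_{t-1}."""
    lag, cur = y[:-1], y[1:]
    m_lag = sum(lag) / len(lag)
    m_cur = sum(cur) / len(cur)
    sxx = sum((a - m_lag) ** 2 for a in lag)
    sxy = sum((a - m_lag) * (b - m_cur) for a, b in zip(lag, cur))
    return sxy / sxx


def mean_group_ar1(panel: List[List[float]]) -> float:
    """Mean group estimate: average of the unit-specific AR(1) slopes."""
    return sum(ols_ar1(y) for y in panel) / len(panel)


def jackknife_mg(panel: List[List[float]]):
    """Half-panel jackknife: 2 * MG_full - (MG_a + MG_b) / 2.

    Each series holds y_0, ..., y_T; y_0 only serves as the first lag.
    Half a uses t = 1..T/2, half b uses t = T/2+1..T.
    """
    t = len(panel[0]) - 1
    h = t // 2
    full = mean_group_ar1(panel)
    first = mean_group_ar1([y[: h + 1] for y in panel])
    second = mean_group_ar1([y[h:] for y in panel])
    return 2.0 * full - 0.5 * (first + second), full


def simulate_panel(n_units: int, t_periods: int, lam: float, seed: int):
    """Hypothetical DGP: y_it = alpha_i + lam * y_i,t-1 + e_it, e_it ~ N(0, 1)."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_units):
        alpha = rng.gauss(0.0, 1.0)
        y = alpha / (1.0 - lam)  # start at the unit's long run mean
        for _ in range(50):  # burn-in
            y = alpha + lam * y + rng.gauss(0.0, 1.0)
        series = [y]
        for _ in range(t_periods):
            y = alpha + lam * y + rng.gauss(0.0, 1.0)
            series.append(y)
        panel.append(series)
    return panel


if __name__ == "__main__":
    panel = simulate_panel(n_units=500, t_periods=20, lam=0.5, seed=1)
    corrected, plain = jackknife_mg(panel)
    print(f"mean group: {plain:.3f}, jackknife corrected: {corrected:.3f}")
```

Because per-unit OLS estimates of an autoregressive coefficient are biased downward in short series, the uncorrected mean group estimate typically lies below 0.5 here, while the jackknife combination lands much closer to it, at the price of somewhat higher variance.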
The recursive mean adjustment removes the partial mean from all variables, that is,
$$\tilde{\omega}_{it} = \omega_{it} - \frac{1}{t-1}\sum_{s=1}^{t-1}\omega_{is},$$
where $\omega_{it} = (y_{it}, x_{it})$ or any other variable except the constant. In line with Chudik and Pesaran (2015b), the partial mean is lagged by one period to prevent it from being influenced by contemporaneous observations.
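The recursive mean adjustment for a single unit's series can be sketched in a few lines of Python. The sketch uses 0-based indexing, so the partial mean at position t averages the t earlier observations; returning None for the first observation, which has no earlier values, is an assumption about how that undefined period is handled.

```python
from typing import List, Optional


def recursive_demean(series: List[float]) -> List[Optional[float]]:
    """Subtract from each observation the mean of all *earlier* observations.

    Position t (0-based) uses the mean of series[0..t-1], matching
    omega_it - (1/(t-1)) * sum_{s=1}^{t-1} omega_is in 1-based notation.
    The first observation has no partial mean and is returned as None.
    """
    if not series:
        return []
    out: List[Optional[float]] = [None]
    running_sum = 0.0
    for t in range(1, len(series)):
        running_sum += series[t - 1]  # lagged partial sum: excludes series[t]
        out.append(series[t] - running_sum / t)
    return out


if __name__ == "__main__":
    print(recursive_demean([1.0, 2.0, 3.0, 4.0]))  # [None, 1.0, 1.5, 2.0]
```

In a panel, the function would be applied to each variable of each unit separately, leaving the constant untouched.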
null hypothesis, the error terms are weakly cross sectionally dependent. More formally, if the error term for unit i in period t is $u_{i,t}$, then the hypothesis is
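where
$$\bar{\hat{u}}_i = \frac{\sum_{t \in T_i \cap T_j} \hat{u}_{it}}{T_{ij}}, \qquad T_{ij} = \#(T_i \cap T_j),$$
and the CD test statistic then becomes
$$CD = \sqrt{\frac{2}{N(N-1)}} \sum_{i=1}^{N-1}\sum_{j=i+1}^{N} \sqrt{T_{ij}}\,\hat{\rho}_{ij}.$$
Under the null hypothesis, CD is asymptotically $N(0,1)$ distributed. For a more in-depth discussion see Pesaran (2015) and Chudik and Pesaran (2015a).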
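The CD statistic for an unbalanced panel can be sketched in Python as follows. The definition of $\hat{\rho}_{ij}$ is not reproduced above; the sketch assumes it is the sample correlation of the residuals of units i and j over their common periods $T_i \cap T_j$, using the means $\bar{\hat{u}}_i$ defined above. Skipping pairs with fewer than three common periods is a hypothetical safeguard, not necessarily what xtcd2 does. The two-sided p-value uses the standard normal distribution.

```python
import math
from typing import Dict, List, Optional, Tuple


def pairwise_rho(u_i: Dict[int, float], u_j: Dict[int, float]) -> Tuple[Optional[float], int]:
    """Correlation of two residual series over their common periods.

    Returns (rho_ij, T_ij), or (None, T_ij) if the pair cannot be used.
    """
    common = sorted(set(u_i) & set(u_j))
    t_ij = len(common)
    if t_ij < 3:  # hypothetical minimum overlap
        return None, t_ij
    a = [u_i[t] for t in common]
    b = [u_j[t] for t in common]
    mean_a = sum(a) / t_ij  # means over the common periods T_i ∩ T_j
    mean_b = sum(b) / t_ij
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b))
    if den == 0.0:
        return None, t_ij
    return num / den, t_ij


def cd_statistic(resid: List[Dict[int, float]]) -> float:
    """CD = sqrt(2 / (N(N-1))) * sum_{i<j} sqrt(T_ij) * rho_ij.

    resid[i] maps a period t to the residual u_it; missing periods are absent.
    """
    n = len(resid)
    if n < 2:
        raise ValueError("at least two units are needed")
    total = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            rho, t_ij = pairwise_rho(resid[i], resid[j])
            if rho is not None:
                total += math.sqrt(t_ij) * rho
    return math.sqrt(2.0 / (n * (n - 1))) * total


def cd_pvalue(cd: float) -> float:
    """Two-sided p-value under the asymptotic N(0, 1) distribution."""
    return math.erfc(abs(cd) / math.sqrt(2.0))


if __name__ == "__main__":
    # Three units with identical residuals: every rho_ij equals 1.
    same = {t: float(t) for t in range(1, 5)}
    print(cd_statistic([same, dict(same), dict(same)]))  # 2 * sqrt(3), about 3.4641
    print(cd_pvalue(1.3669791))  # about 0.1716
```

The last line feeds in the CD value that xtcd2 reports in the growth example later in the paper; the resulting p-value of roughly 0.1716 agrees with the p_value shown there.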
endogenous_vars(varlist) ivreg2options(string) lr(varlist) lr_options(string)
pooledconstant reportconstant trend pooledtrend residuals(string)
jackknife recursive nocd cluster(string) noomit full lists noisily
post_full
The data must be tsset (see [TS] tsset) before using xtdcce. varlist may contain time-series operators; see [TS] tsvarlist. xtdcce requires the moremata package by Jann (2005).
4.2 Options
pooled(varlist) specifies homogeneous coefficients. For these variables, the estimated coefficients are constrained to be equal across all units ($\beta_i = \beta\ \forall i$). Variables may occur in indepvars. Variables in exogenous_vars(), endogenous_vars() and lr() may be pooled as well.
crosssectional(varlist) defines the variables which are included in $z_t$ and added as cross sectional averages ($\bar{z}_{t-l}$) to the equation. Variables in crosssectional() may be included in pooled(), exogenous_vars(), endogenous_vars() and lr(). By default, all variables from depvar, indepvars and endogenous_vars() are included in $z_t$. The cross sectional averages of the variables in crosssectional() are partialled out; their coefficients are neither estimated nor reported.
cr_lags(#) specifies the number of lags of the cross sectional averages. If it is not defined but crosssectional() contains a varlist, then only contemporaneous cross sectional averages are added, without lags. cr_lags(0) is equivalent to omitting the option.
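Chudik and Pesaran (2015b), as summarized earlier, suggest adding about $\sqrt[3]{T}$ lags of the cross sectional averages. The small Python helper below, which is hypothetical and not part of xtdcce, turns that rule of thumb into an integer that could be passed to cr_lags(). It takes the integer part of the cube root using integer arithmetic, because a floating point cube root can land just below an exact integer and then be truncated one step too low.

```python
def csa_lags(t_periods: int) -> int:
    """Integer part of the cube root of T, i.e. the largest p with p**3 <= T."""
    if t_periods < 0:
        raise ValueError("T must be non-negative")
    p = 0
    while (p + 1) ** 3 <= t_periods:
        p += 1
    return p


if __name__ == "__main__":
    print(csa_lags(44))  # 3
```

For T = 44, the time dimension of the growth regression in section 5, the rule gives 3 lags, the value used there with cr_lags(3).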
nocrosssectional prevents adding cross sectional averages. Results will be equivalent to the Pesaran and Smith (1995) mean group estimator or, if lr(varlist) is specified, to the Shin et al. (1999) pooled mean group estimator.
xtdcce supports instrumental variable regression using ivreg2 by Baum et al. (2003, 2007). Endogenous and exogenous variables are set by:
endogenous_vars(varlist) specifies the endogenous and exogenous_vars(varlist) the exogenous variables. See ivreg2 for a further description.
ivreg2options passes further options on to ivreg2. See ivreg2, options for more information.
fulliv posts all available results from ivreg2 in e() with the prefix ivreg2_.
noisily shows the output of the wrapped ivreg2 regression command.
lr(varlist) specifies the variables to be included in the long run cointegration vector. The first variable is the error correcting speed of adjustment term.
lr_options(string) sets options for the long run coefficients. Options may be:
nodivide: coefficients are not divided by the error correction speed of adjustment vector (i.e. equation 4 is estimated).
xtpmgnames: coefficient names in e(b_p_mg) and e(V_p_mg) match the naming convention of xtpmg.
noconstant suppresses the constant term.
pooledconstant restricts the constant to be the same across all groups ($\beta_{0,i} = \beta_0\ \forall i$).
reportconstant reports the constant. If not specified, the constant is treated as a part of the cross sectional averages.
trend adds a linear unit specific trend. May not be combined with pooledtrend.
pooledtrend adds a linear common trend. May not be combined with trend.
jackknife applies the jackknife bias correction for small sample time series
bias. May not be combined with recursive.
recursive applies recursive mean adjustment method to correct for small
sample time series bias. May not be combined with jackknife.
residuals(varname) saves the residuals as a new variable.
nocd suppresses the calculation of the CD test statistic.
cluster(varname) requests clustered standard errors, where varname is the cluster identifier.
noomit suppresses checks for collinearity.
full reports the individual unit estimates in the output.
lists shows all variable names and lags of the cross section means.
post_full requests that the individual estimates, rather than the mean group estimates, are saved in e(b) and e(V). Mean group estimates are then saved in e(b_p_mg) and e(V_p_mg).
Scalars
  e(N)            number of observations
  e(N_g)          number of groups
  e(T)            number of time periods
  e(K)            number of regressors
  e(N_partial)    number of variables partialled out
  e(N_omitted)    number of omitted variables
  e(N_pooled)     number of pooled variables
  e(mss)          model sum of squares
  e(rss)          residual sum of squares
  e(F)            F statistic
  e(ll)           log-likelihood (only IV)
  e(rmse)         root mean squared error
  e(df_m)         model degrees of freedom
  e(df_r)         residual degrees of freedom
  e(r2)           R-squared
  e(r2_a)         adjusted R-squared
  e(cd)           CD test statistic
  e(cdp)          p-value of CD test statistic
Scalars (unbalanced panel)
  e(minT)         minimum time
  e(maxT)         maximum time
  e(avgT)         average time
Macros
  e(tvar)         name of time variable
  e(idvar)        name of unit variable
  e(depvar)       name of dependent variable
  e(indepvar)     names of independent variables
  e(omitted)      names of omitted variables
  e(lr)           long run variables
  e(pooled)       names of pooled variables
  e(cmd)          command line
  e(cmd_full)     command line including options
Macros (iv-specific)
  e(insts)        instruments (exogenous variables)
  e(instd)        instrumented (endogenous variables)
Matrices
  e(b)            coefficient vector (mean group or individual)
  e(V)            variance-covariance matrix (mean group or individual)
  e(b_p_mg)       coefficient vector (mean group and pooled)
  e(V_p_mg)       variance-covariance matrix (mean group and pooled)
  e(b_full)       coefficient vector (individual and pooled)
  e(V_full)       variance-covariance matrix (individual and pooled)
Functions
  e(sample)       marks estimation sample
4.4 xtcd2
Included in the xtdcce package is the xtcd2 command, which tests for weak cross sectional dependence. The command supports balanced as well as unbalanced panels. For a discussion of the test statistic, see section 3.
4.4.1 Syntax
xtcd2 varname(max=1), noestimation rho histogram name(string)
varname is the name of the residuals or of the variable to be tested; it is optional if the command is run after an estimation (postestimation).
4.4.2 Options
If noestimation is specified, then xtcd2 is not run as a postestimation command and does not require e(sample) to be set. This option allows testing any variable. If it is not set, then xtcd2 either uses the variable specified in varname or predicts the residuals using predict, residuals. In both cases the sample is restricted to e(sample).
histogram plots a histogram of the cross correlations. The number of observations, the mean, percentiles, minimum and maximum of the cross correlations are reported. If name(string) is set, then the histogram is saved and not drawn.
rho saves the matrix with the cross correlations in r(rho).
5 Empirical Examples
In the following, three empirical examples demonstrate the use of the xtdcce command. As a first exercise, an augmented Solow model with dynamic common correlated effects is estimated. This part includes IV regressions and introduces the novelties of xtdcce. Afterwards, mean group and common correlated effects estimations and finally pooled mean group estimations are shown. These last two parts compare xtdcce to the existing Stata commands xtmg and xtpmg.
6
For better readability, the commands are abbreviated in the outputs, but not in the text.
Mean Group Variables: L.log_rgdpo log_hc log_ck log_ngd _cons
Cross Sectional Averaged Variables: D.log_rgdpo L.log_rgdpo log_hc log_ck log_ngd
Degrees of freedom per country:
in mean group estimation = 39
with cross-sectional averages = 24
Number of
cross sectional lags = 3
variables in mean group regression = 2325
variables partialled out = 1860
On the lower right of the upper panel, the output shows a CD test statistic of 1.37, or a p-value of 0.17, so the hypothesis of weak cross sectional dependence cannot be rejected. Below the coefficient estimates, xtdcce displays the names of the 4 mean group variables and the constant, as well as the 5 cross sectionally averaged variables. As a regression without any pooled variables is essentially a regression run on each country separately, the degrees of freedom for each country are shown as well; they equal the number of time periods (T = 44) minus the number of variables (5). The degrees of freedom for each country with cross-sectional averages indicate the potential degrees of freedom of a regression which would include the cross sectional averages. They equal the number of time periods (44) minus the number of variables (K = 5), minus the number of cross section averages times the number of lags plus one for the contemporaneous averages (5 × (3 + 1) = 20). At the end of the table, the number of lags of the cross sectional means is displayed, together with the number of variables in the mean group regression and the number of variables partialled out, which equals the number of cross sectional means.

As the cross sectional means are treated purely as controls and have no interpretation, no information is lost by partialling them out. Therefore, the dependent variable and each explanatory variable of interest are regressed on the cross sectional means, and the residuals are collected. These residuals are then used as the new dependent and explanatory variables. The partialling out is performed in Mata. The variables to be partialled out (the cross sectional means and, if requested, the heterogeneous intercept) are stacked in a block diagonal matrix, with zeros on the off diagonals. For a large number of units the matrix becomes sparse, and calculating and inverting the cross product becomes computationally intensive and hence time consuming. To improve speed, the partialling out is done sequentially, unit by unit, which is possible as long as the coefficients on the cross sectional means, $\delta_{i,l}$, are heterogeneous.⁷ Within this process, the program
7
The loss in precision is of a negligible order of magnitude and is offset by the improvement in speed. A simulation supporting these results is available upon request from the author.
The standard solver for the calculation of the inverse of the cross product of the factor loadings is cholsolve. cholsolve cannot solve matrices that are not positive definite or that are singular; in this case qrsolve is used. Thanks to Mark Schaffer for providing the code for this routine.
checks whether the factor loadings are of full rank.⁸ If the check fails, xtdcce aborts with error code 506; see section 6. While all available data are used to compute the cross sectional means, the partialling out is restricted to the observations used in the regression. For the example above this means the following: 1 period is lost to create L.log_rgdpo and a further 3 periods are lost due to the creation of the lags of the cross sectional means. The cross sectional means of the variables, with the exception of the lagged level of the dependent variable, are computed using the full available sample. The average for L.log_rgdpo is based on the years 1961-2007, as the value for 1960 is missing. The regression and the partialling out are then applied to the restricted sample. In total 4 periods are lost, so the regression covers the years 1964-2007, making the time dimension T = 44.
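The partialling out described above rests on the Frisch-Waugh-Lovell theorem: the coefficient on a regressor of interest is the same whether the cross sectional means enter the regression as controls or are first partialled out of both the dependent and the explanatory variable. The Python sketch below checks this for a single unit with one regressor, a constant and one cross sectional average. The simulated data, the helper names and the plain normal-equations solver are illustrative assumptions, not the Mata code used by xtdcce.

```python
import random
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(cols: List[List[float]], y: List[float]) -> List[float]:
    """OLS coefficients via the normal equations; regressors given as columns."""
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    return solve(xtx, xty)


def partial_out(v: List[float], controls: List[List[float]]) -> List[float]:
    """Residuals from regressing v on the controls (e.g. constant and CSAs)."""
    coef = ols(controls, v)
    return [v[t] - sum(c * col[t] for c, col in zip(coef, controls)) for t in range(len(v))]


def simulate_unit(t_periods: int, seed: int):
    """Hypothetical unit: z plays the role of a cross sectional average."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(t_periods)]
    x = [0.5 * zt + rng.gauss(0.0, 1.0) for zt in z]
    y = [1.0 + 2.0 * xt + 1.5 * zt + rng.gauss(0.0, 0.5) for xt, zt in zip(x, z)]
    return y, x, z


if __name__ == "__main__":
    y, x, z = simulate_unit(44, seed=7)
    ones = [1.0] * len(y)
    # (a) coefficient on x with the constant and z as explicit controls
    beta_full = ols([x, ones, z], y)[0]
    # (b) partial the controls out of y and x, then regress residual on residual
    ry, rx = partial_out(y, [ones, z]), partial_out(x, [ones, z])
    beta_partial = sum(a * b for a, b in zip(rx, ry)) / sum(a * a for a in rx)
    print(beta_full, beta_partial)  # equal up to rounding error
```

Since each unit has its own coefficients on the cross sectional means, the same two steps can simply be repeated unit by unit, which is what keeps the sequential approach cheap.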
The regression results are not in favour of the Solow Model as the coef-
ficient on human capital is negative and the coefficient on physical capital
positive but not significant. For a more detailed discussion of the Solow
Model in growth empirics see Mankiw et al. (1992); Islam (1995); Lee et al.
(1997) or Durlauf et al. (2005); Jones (2015); and with a focus on slope
heterogeneity see Islam (1998) and Lee et al. (1998).
As the residuals are saved as a new variable called residuals, the test on
cross sectional dependence can be done by hand to confirm the result from
above.
. xtcd2 residuals
Chudik and Pesaran (2015) test for cross sectional dependence
Postestimation.
H0: errors are weakly cross sectional dependent.
CD = 1.3669791
p_value = .17163187
Using the option noestimation leads to the same result, as long as the observations omitted in the estimation are missing in the variable residuals. The advantage of noestimation is that it allows testing any variable for cross sectional dependence. For example, testing the variable log_d_rgdpo for cross sectional dependence reads:
. xtcd2 log_d_rgdpo , noest
Chudik and Pesaran (2015) test for cross sectional dependence
H0: errors are weakly cross sectional dependent.
CD = 72.243906
p_value = 0
countries, $\beta_{i,k} = \beta_k\ \forall i = 1, \dots, N,\ k = 0, \dots, 4$, by specifying the pooled() option, pooling the constant by using the pooledconstant option and forcing xtdcce to display the constant using reportconstant.⁹
. xtdcce d.log_rgdpo L.log_rgdpo log_hc log_ck log_ngd , /*
> */ p(L.log_rgdpo log_hc log_ck log_ngd) reportc /*
> */ cr(d.log_rgdpo L.log_rgdpo log_hc log_ck log_ngd) cr_lags(3) pooledc
Dynamic Common Correlated Effects - Pooled Mean Group
Panel Variable (i): id Number of obs = 4092
Time Variable (t): year Number of groups = 93
Obs per group (T) = 44
F( 5, 2227)= 0.23
Prob > F = 0.95
R-squared = 0.16
Adj. R-squared = 0.16
Root MSE = 0.07
CD Statistic = -0.92
p-value = 0.3567
Pooled Variables:
L.log_rgdpo -.291652 .014082 -20.71 0.000 -.3192522 -.2640528
log_hc .061267 .096776 0.63 0.527 -.1284114 .2509449
log_ck .147058 .013429 10.95 0.000 .1207373 .1733795
log_ngd .003515 .022033 0.16 0.873 -.0396682 .0466986
_cons -1.4e-15 .469827 -0.00 1.000 -.9208433 .9208433
The estimate for the constant is close to zero. This result is expected, as
outlined in the section above, because all coefficients are pooled, the panel is
balanced and all variables are added as cross sectional means.
As a final exercise, assume that investment in physical capital is endogenous and instrumented by its lagged values. The endogenous variable log_ck is specified in endogenous_vars() and the exogenous variable L.log_ck in exogenous_vars().
. xtdcce d.log_rgdpo L.log_rgdpo log_hc log_ngd , /*
> */ cr(d.log_rgdpo L.log_rgdpo log_hc log_ck log_ngd) reportc /*
> */ endo(log_ck) exo(L.log_ck) cr_lags(3) ivreg2options(nocollin noid)
Dynamic Common Correlated Effects - Mean Group IV
9
Pesaran (2006) discusses the mean group and the pooled version of the CCE estimator. A pooled version in the dynamic setting is not mentioned by Chudik and Pesaran (2015b). However, Everaert and De Groote (2015) show that the unrestricted CCEP estimator is consistent as long as $N, T \to \infty$.
Panel Variable (i): id Number of obs = 4092
Time Variable (t): year Number of groups = 93
Obs per group (T) = 44
F( 2325, 1767)= 3.55
Prob > F = 0.00
R-squared = 0.48
Adj. R-squared = -0.21
Root MSE = 0.04
CD Statistic = 1.16
p-value = 0.2474
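$$\Delta c_{i,t} = \phi_i\left(c_{i,t-1} - \theta_{1,i} y_{i,t} - \theta_{2,i}\pi_{i,t}\right) + \delta_{0,i} + \delta_{1,i}\Delta y_{i,t} + \delta_{2,i}\Delta\pi_{i,t} + \epsilon_{i,t}, \qquad (3)$$
while xtdcce internally estimates (leaving out any cross sectional means):
$$\Delta c_{i,t} = \phi_i c_{i,t-1} + \gamma_{1,i} y_{i,t} + \gamma_{2,i}\pi_{i,t} + \delta_{0,i} + \delta_{1,i}\Delta y_{i,t} + \delta_{2,i}\Delta\pi_{i,t} + \epsilon_{i,t}. \qquad (4)$$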
Secondly, xtpmg calculates the long run coefficients using maximum likelihood. xtdcce treats the long run variables, defined in lr(), as further covariates and estimates equation 4 entirely by OLS. To obtain the long run coefficients, the estimated coefficients are divided by the negative of the error correction speed of adjustment coefficient $\phi_i$ to match equation 3, $\theta_{1,i} = -\gamma_{1,i}/\phi_i$. The variances are calculated using the delta method. Equation 4, i.e. the coefficients $\gamma_{1,i}, \dots, \gamma_{K,i}$, can be estimated directly by using lr_options(nodivide).
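The division $\theta_{k,i} = -\gamma_{k,i}/\phi_i$ can be checked against the esttab comparison with xtpmg that follows. In columns (2) and (3) of that table the long run coefficients are pooled, so a single $\phi$ and a single $\gamma_k$ apply to all units. Column (3), estimated with lr_options(nodivide), reports $\hat{\phi} = -0.168$ (the ec row), $\hat{\gamma}_{\pi} = -0.0327$ and $\hat{\gamma}_y = 0.152$; dividing by $-\hat{\phi}$ should reproduce column (2) up to the rounding of the published values. A minimal Python check, with a hypothetical helper name:

```python
from typing import List


def long_run(phi: float, gammas: List[float]) -> List[float]:
    """Long run coefficients theta_k = -gamma_k / phi (from equation 4 to equation 3)."""
    if phi == 0.0:
        raise ValueError("speed of adjustment phi must be non-zero")
    return [-g / phi for g in gammas]


if __name__ == "__main__":
    # Rounded values from column (3) of the xtpmg comparison table (nodivide).
    phi_hat = -0.168
    gamma_hat = [-0.0327, 0.152]  # pi, y
    theta = long_run(phi_hat, gamma_hat)
    print([round(v, 3) for v in theta])  # [-0.195, 0.905]
```

Column (2) reports -0.194 and 0.903; the small gaps are consistent with the three-digit rounding of the inputs.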
As in Blackburne and Frank (2007), the jasa2 dataset is used to explain consumption with inflation and income from 1962 onwards. esttab produces the following output:
. use jasa2, clear
. tsset id year
panel variable: id (unbalanced)
time variable: year, 1960 to 1993
delta: 1 unit
. qui xtpmg d.c d.pi d.y if year>=1962, lr(l.c pi y) ec(ec) replace pmg
. eststo xtpmg:
.
. eststo xtdcce1: qui xtdcce d.c d.pi d.y if year >= 1962 , /*
> */ lr(l.c pi y) p(l.c pi y) nocross lr_options(xtpmgnames) reportc
. eststo xtdcce2: qui xtdcce d.c d.pi d.y if year >= 1962 , /*
> */ lr(l.c pi y) p(l.c pi y) nocross lr_options(nodivide xtpmgnames) reportc
. eststo xtdcce3: qui xtdcce d.c d.pi d.y if year >= 1962 , /*
> */ lr(l.c pi y) p(l.c pi y) cr(d.c d.pi d.y) cr_lags(0) /*
> */ lr_options(xtpmgnames) reportc
. esttab xtpmg xtdcce1 xtdcce2 xtdcce3 /*
> */ , se mlabels(xtpmg "xtdcce (mg)" "xtdcce (mg)" "xtdcce (cce)") s(N cd cdp)
                    (1)           (2)           (3)           (4)
                  xtpmg   xtdcce (mg)   xtdcce (mg)  xtdcce (cce)
ec
pi            -0.466***     -0.194**    -0.0327**     -0.276***
               (0.0567)     (0.0701)     (0.0121)      (0.0698)
y              0.904***     0.903***     0.152***      0.940***
              (0.00868)     (0.0163)     (0.0144)      (0.0170)
SR
ec            -0.200***    -0.168***    -0.168***     -0.184***
               (0.0322)     (0.0151)     (0.0151)      (0.0172)
D.pi            -0.0183      -0.0548      -0.0548        0.0237
               (0.0278)     (0.0299)     (0.0299)      (0.0317)
D.y            0.327***     0.380***     0.380***      0.384***
               (0.0574)     (0.0350)     (0.0350)      (0.0431)
_cons          0.154***     0.137***     0.137***     0.0643***
               (0.0217)    (0.00765)    (0.00765)     (0.00585)
Column (1) shows the results using xtpmg, columns (2) to (4) those using the xtdcce estimator. Column (1) matches the results in Blackburne and Frank (2007, p. 203). Column (2) displays the corresponding results using xtdcce. The signs of the estimates are the same and the magnitudes, especially of the short run coefficients, are very similar. In column (3) the option nodivide is used, producing estimates for equation 4.¹⁰
10
An alternative to obtain long run coefficients in a dynamic panel using the between
As the last row indicates, using no cross sectional means leads to a rejection of the null hypothesis of weak cross sectional dependence. Therefore, cross sectional means are added in column (4).
The short run coefficients can be restricted to be equal across all units by including them in the pooled() option. Likewise, the long run coefficients can be allowed to vary across units by not including them in pooled(). To test under which constraints the model is consistent, a Hausman test can be performed:
. eststo mg: qui xtdcce d.c d.pi d.y if year >= 1962 , /*
> */ lr(l.c pi y) nocross reportc
. eststo pmg: qui xtdcce d.c d.pi d.y if year >= 1962 , /*
> */ lr(l.c pi y) p(l.c pi y) nocross reportc
. eststo pooled: qui xtdcce d.c d.pi d.y if year >= 1962 , /*
> */ lr(l.c pi y) p(l.c pi y d.pi d.y) nocross reportc
. hausman mg pooled, sigmamore
Coefficients
(b) (B) (b-B) sqrt(diag(V_b-V_B))
mg pooled Difference S.E.
pi
D1. -.0253642 -.0280826 .0027184 .0305406
y
D1. .2337588 .3811944 -.1474357 .0533472
c
L1. -.3063473 -.1794146 -.1269326 .0328851
pi -.3529095 -.266343 -.0865666 .1230082
y .9181344 .9120574 .0060771 .0287949
c
L1. -.1683577 -.1794146 .0110569 .0050032
pi -.1941238 -.266343 .0722191 .0316818
y .9025766 .9120574 -.0094807 .007498
pi
D1. -.0548234 -.0280826 -.0267408 .0264268
y
D1. .3802491 .3811944 -.0009453 .0279448
The results of the Hausman tests are similar to those obtained by Blackburne and Frank (2007, Sections 4.3 and 4.4). The first Hausman test implies that the pooled model is preferred over the mean group model. The second Hausman test compares the pooled mean group and the pooled model. In line with Blackburne and Frank (2007), the conclusion is that the pooled mean group model is preferred.
The first two columns show regressions using the mean group estimator without any cross section means. Columns (1) and (3) match the estimation results from Table 1, p. 67 in Eberhardt (2012). Columns (3) and (4) are based on the common correlated effects estimator, which includes contemporaneous cross sectional means. In column (2), the CD test statistic rejects the hypothesis of weak cross sectional dependence. Including cross sectional averages improves the statistic such that the hypothesis cannot be rejected
11
The dataset manu_stata9.dta is taken from Eberhardt and Teal (2014) and is available at https://sites.google.com/site/medevecon/.
any longer. Estimation results produced by xtmg and xtdcce differ slightly, as seen here for the constant. The reason is that xtdcce internally stores all variables, including the ones it creates, as doubles to allow for the best possible precision.¹²
6 Error Messages
xtdcce produces the following error codes:
  r(109)    ivreg2 not installed.
  r(184)    Options noconstant and pooledconstant, trend and pooledtrend, or jackknife and recursive are combined.
  r(506)    Rank condition on cross section means not satisfied.
  r(2001)   More variables than observations.
7 Conclusion
The user-written Stata program xtdcce introduces new routines to estimate a heterogeneous panel model using dynamic common correlated effects. It combines estimation procedures proposed in Pesaran and Smith (1995) and Shin et al. (1999) with those in Pesaran (2006) and Chudik and Pesaran (2015b). It allows coefficients to be pooled or estimated as mean groups. Furthermore, it supports unbalanced panels, instrumental variable estimation, small sample time series bias corrections and tests for cross sectional dependence, using the included xtcd2 routine. An empirical example estimating a growth regression is given.
8 Acknowledgments
I am grateful to Arnab Bhattacharjee and Mark Schaffer for many valuable comments and suggestions, to Achim Ahrens for his contributions to the xtcd2 command and to Kyle McNabb for testing the program. The code and especially the output greatly benefited from Markus Eberhardt's xtmg command. Any remaining errors are my own.
Jan Ditzen, PhD Student, Spatial Economics and Econometrics Centre (SEEC), School of Management & Languages, Heriot-Watt University, Edinburgh EH14 4AS, Scotland, United Kingdom; Email: jd219@hw.ac.uk, Web: www.jan.ditzen.net
12
Another difference from xtmg is that xtdcce supports time-series operators.
References
Baum, C. F., M. E. Schaffer, and S. Stillman. 2003. Instrumental variables and GMM: Estimation and testing. Stata Journal 3(1): 1–31.
Chudik, A., and M. H. Pesaran. 2015a. Large Panel Data Models with Cross-Sectional Dependence: A Survey. In The Oxford Handbook of Panel Data, ed. B. H. Baltagi, chap. 1, 2–45. Oxford University Press.
Chudik, A., M. H. Pesaran, and E. Tosetti. 2011. Weak and strong cross-section dependence and estimation of large panels. The Econometrics Journal 14(1): C45–C90.
Ditzen, J., and E. Gundlach. 2015. A Monte Carlo study of the BE estimator for growth regressions. Empirical Economics, forthcoming.
Eberhardt, M., and F. Teal. 2014. The magnitude of the task ahead: Productivity analysis with heterogeneous technology.
Everaert, G., and T. De Groote. 2015. Common Correlated Effects Estimation of Dynamic Panels with Cross-Sectional Dependence. Econometric Reviews 4938(1981): 1–31.
Pesaran, M. H. 2006. Estimation and inference in large heterogeneous panels with a multifactor error structure. Econometrica 74(4): 967–1012.
Pesaran, M. H. 2015. Testing Weak Cross-Sectional Dependence in Large Panels. Econometric Reviews 34(6-10): 1089–1117.
Pesaran, M. H., and R. Smith. 1995. Estimating long-run relationships from dynamic heterogeneous panels. Journal of Econometrics 68: 79–113.
Shin, Y., M. H. Pesaran, and R. P. Smith. 1999. Pooled Mean Group Estimation of Dynamic Heterogeneous Panels. Journal of the American Statistical Association 94(446): 621–634.
9 Technical Appendix
9.1 Delta Method
For the calculation of the long run coefficients, the estimates $\hat{\gamma}_{k,i}$ obtained by OLS from equation 4 are divided by the negative of the estimated speed of adjustment $\hat{\phi}_i$. To calculate the variance covariance matrix, the delta method is used. The delta method allows the calculation of an approximate probability distribution for a function $a(\beta)$ of a random vector with known variance (see for example Hayashi, 2000, p. 93). Suppose that for the random vector $\beta_i$, $\beta_i \xrightarrow{p} \beta$ and $\sqrt{n}(\beta_i - \beta) \xrightarrow{d} N(0, \Sigma)$. Denote the first derivatives of $a(\beta)$ as
$$A(\beta) \equiv \frac{\partial a(\beta)}{\partial \beta'}.$$
Then the distribution of the function $a(\cdot)$ is
$$\sqrt{n}\,\left[a(\beta_i) - a(\beta)\right] \xrightarrow{d} N\left(0, A(\beta)\Sigma A(\beta)'\right).$$
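For the calculation of the long run coefficients and using the notation from equation 4, assume that
$$\beta_i = (\phi_i, \gamma_{1,i}, \gamma_{2,i}, \delta_{1,i}, \delta_{2,i})'.$$
The variance covariance matrix is:
$$\Sigma = \begin{pmatrix}
V(\phi_i) & Cov(\phi_i, \gamma_{1,i}) & Cov(\phi_i, \gamma_{2,i}) & Cov(\phi_i, \delta_{1,i}) & Cov(\phi_i, \delta_{2,i}) \\
Cov(\phi_i, \gamma_{1,i}) & V(\gamma_{1,i}) & Cov(\gamma_{1,i}, \gamma_{2,i}) & \cdots & \vdots \\
\vdots & \vdots & \ddots & & \vdots \\
\vdots & \vdots & & \ddots & \vdots \\
Cov(\phi_i, \delta_{2,i}) & \cdots & \cdots & \cdots & V(\delta_{2,i})
\end{pmatrix}$$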
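The function $a(\cdot)$ transforms the long run coefficients and leaves the short run coefficients unchanged:
$$a(\beta_i) = (\phi_i, -\gamma_{1,i}/\phi_i, -\gamma_{2,i}/\phi_i, \delta_{1,i}, \delta_{2,i})' = (\phi_i, \theta_{1,i}, \theta_{2,i}, \delta_{1,i}, \delta_{2,i})'.$$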
The first derivative of a() is then:
∂φi ∂θ1,i ∂θ2,i ∂δ1,i ∂δ2,i
∂φi ∂φi ∂φi ∂φi ∂φi
∂φi ∂θ1,i ∂θ2,i ∂δ1,i ∂δ2,i
∂γ1,i ∂γ1,i ∂γ1,i ∂γ1,i ∂γ1,i
∂φi ∂θ1,i ∂θ2,i ∂δ1,i ∂δ2,i
Ai (β) = ∂γ2,i ∂γ2,i ∂γ2,i ∂γ2,i ∂γ2,i
∂φi ∂θ1,i ∂θ2,i ∂δ1,i ∂δ2,i
∂δ1,i ∂δ1,i ∂δ1,i ∂δ1,i ∂δ1,i
∂θ1,i ∂θ2,i ∂δ1,i ∂δ2,i
∂φi
∂δ2,i ∂δ2,i ∂δ2,i ∂δ2,i ∂δ2,i
1 − γφ1,i2 − γφ2,i2 0 0
i i
0 − φ1i 0 0 0
=
0 0 − φ1i 0 0
0 0 0 1 0
0 0 0 0 1
All components of the variance covariance matrix are then known, and it can be calculated as:
$$\Sigma_a = A(\beta)\Sigma A(\beta)'.$$
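A minimal Python sketch of this delta method step for $\beta_i = (\phi_i, \gamma_{1,i}, \gamma_{2,i}, \delta_{1,i}, \delta_{2,i})'$: it builds the analytic Jacobian $A_i$, forms $\Sigma_a = A\Sigma A'$, and cross-checks the Jacobian against a finite-difference approximation. The coefficient values and the covariance matrix in the example are hypothetical numbers chosen for illustration, and the helper names are assumptions rather than xtdcce routines.

```python
from typing import List

Matrix = List[List[float]]


def transform(beta: List[float]) -> List[float]:
    """a(beta) = (phi, -gamma1/phi, -gamma2/phi, delta1, delta2)."""
    phi, g1, g2, d1, d2 = beta
    return [phi, -g1 / phi, -g2 / phi, d1, d2]


def jacobian(beta: List[float]) -> Matrix:
    """Analytic A = d a(beta) / d beta'; row k holds the gradient of a_k."""
    phi, g1, g2, _, _ = beta
    return [
        [1.0, 0.0, 0.0, 0.0, 0.0],
        [g1 / phi**2, -1.0 / phi, 0.0, 0.0, 0.0],
        [g2 / phi**2, 0.0, -1.0 / phi, 0.0, 0.0],
        [0.0, 0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 0.0, 1.0],
    ]


def numeric_jacobian(beta: List[float], h: float = 1e-6) -> Matrix:
    """Central finite differences, used only to cross-check jacobian()."""
    k = len(beta)
    cols = []
    for j in range(k):
        up, down = beta[:], beta[:]
        up[j] += h
        down[j] -= h
        fu, fd = transform(up), transform(down)
        cols.append([(fu[r] - fd[r]) / (2 * h) for r in range(k)])
    return [[cols[j][r] for j in range(k)] for r in range(k)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def delta_method(beta: List[float], sigma: Matrix) -> Matrix:
    """Sigma_a = A Sigma A'."""
    a = jacobian(beta)
    return matmul(matmul(a, sigma), transpose(a))


if __name__ == "__main__":
    beta = [-0.2, 0.1, -0.05, 0.3, 0.4]  # hypothetical estimates
    sigma = [  # hypothetical symmetric covariance matrix
        [0.010, 0.002, 0.001, 0.0, 0.0],
        [0.002, 0.020, 0.003, 0.0, 0.0],
        [0.001, 0.003, 0.015, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.050, 0.010],
        [0.0, 0.0, 0.0, 0.010, 0.040],
    ]
    sigma_a = delta_method(beta, sigma)
    print(sigma_a[1][1])  # Var(theta_1), about 0.6125
```

Written out for $\theta_{1,i}$ alone, the same product reads $Var(\theta_{1,i}) \approx (\gamma_{1,i}/\phi_i^2)^2 V(\phi_i) + V(\gamma_{1,i})/\phi_i^2 - 2(\gamma_{1,i}/\phi_i^3) Cov(\phi_i, \gamma_{1,i})$, while the short run entries pass through unchanged.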