OUP CORRECTED PROOF – FINAL, 10/9/2015, SPi
M. HASHEM PESARAN
Great Clarendon Street, Oxford, OX2 6DP,
United Kingdom
Oxford University Press is a department of the University of Oxford.
It furthers the University’s objective of excellence in research, scholarship,
and education by publishing worldwide. Oxford is a registered trade mark of
Oxford University Press in the UK and in certain other countries
© M. Hashem Pesaran 2015
The moral rights of the author have been asserted
First Edition published in 2015
Impression: 1
All rights reserved. No part of this publication may be reproduced, stored in
a retrieval system, or transmitted, in any form or by any means, without the
prior permission in writing of Oxford University Press, or as expressly permitted
by law, by licence or under terms agreed with the appropriate reprographics
rights organization. Enquiries concerning reproduction outside the scope of the
above should be sent to the Rights Department, Oxford University Press, at the
address above
You must not circulate this work in any other form
and you must impose this same condition on any acquirer
Published in the United States of America by Oxford University Press
198 Madison Avenue, New York, NY 10016, United States of America
British Library Cataloguing in Publication Data
Data available
Library of Congress Control Number: 2015936093
ISBN 978–0–19–873691–2 (HB)
978–0–19–875998–0 (PB)
Printed and bound by
CPI Group (UK) Ltd, Croydon, CR0 4YY
Links to third party websites are provided by Oxford in good faith and
for information only. Oxford disclaims any responsibility for the materials
contained in any third party website referenced in this work.
Preface
This book is concerned with recent developments in time series and panel data techniques
for the analysis of macroeconomic and financial data. It provides a rigorous, nevertheless
user-friendly, account of the time series techniques dealing with univariate and multivariate time
series models, as well as panel data models. An overview of econometrics as a subject is provided
in Pesaran (1987a) and updated in Geweke, Horowitz, and Pesaran (2008).
It is distinct from other time series texts in the sense that it also covers panel data models
and attempts at a more coherent integration of time series, multivariate analysis, and panel data
models. It builds on the author’s extensive research in the areas of time series and panel data
analysis and covers a wide variety of topics in one volume. Different parts of the book can be
used as teaching material for a variety of courses in econometrics. It can also be used as a reference
manual.
It begins with an overview of basic econometric and statistical techniques and provides an
account of stochastic processes, univariate and multivariate time series, tests for unit roots,
cointegration, impulse response analysis, autoregressive conditional heteroskedasticity mod-
els, simultaneous equation models, vector autoregressions, causality, forecasting, multivariate
volatility models, panel data models, aggregation and global vector autoregressive models
(GVAR). The techniques are illustrated using Microfit 5 (Pesaran and Pesaran (2009)) with
applications to real output, inflation, interest rates, exchange rates, and stock prices.
The book assumes that the reader has done an introductory econometrics course. It begins
with an overview of the basic regression model, which is intended to be accessible to advanced
undergraduates, and then deals with more advanced topics which are more demanding and
suited to graduate students and other interested scholars.
The book is organized into six parts:
Part I: Chapters 1 to 7 present the classical linear regression model, describe estimation and
statistical inference, and discuss the violation of the assumptions underlying the classical linear
regression model. This part also includes an introduction to dynamic economic modelling, and
ends with a chapter on predictability of asset returns.
Part II: Chapters 8 to 11 deal with asymptotic theory and present the maximum likelihood
and generalized method of moments estimation frameworks.
Part III: Chapters 12 and 13 provide an introduction to stochastic processes and spectral den-
sity analysis.
Part IV: Chapters 14 to 18 focus on univariate time series models and cover stationary ARMA
models, unit root processes, trend and cycle decomposition, forecasting and univariate volatility
models.
Part V: Chapters 19 to 25 consider a variety of reduced form and structural multivariate mod-
els, rational expectations models, as well as VARs, vector error corrections, cointegrating VARs,
VARX models, impulse response analysis, and multivariate volatility models.
Part VI: Chapters 26 to 33 consider panel data models both when the time dimension (T)
of the panels is short and when both N (the cross-section dimension) and T are
large. These chapters cover a wide range of panel data models, starting with static panels with
homogenous slopes and graduating to dynamic panels with slope heterogeneity, error cross-
section dependence, unit roots, and cointegration.
There are also chapters dealing with the aggregation of large dynamic panels and the theory
and practice of GVAR modelling. This part of the book focuses more on large N and T panels
which are less covered in other texts, and draws heavily on my research in this area over the past
20 years starting with Pesaran and Smith (1995).
Appendices A and B present background material on matrix algebra, probability and distribu-
tion theory, and Appendix C provides an overview of Bayesian analysis.
This book has evolved over many years of teaching and research and brings together in one
place a diverse set of research areas that have interested me. It is hoped that it will also be of
interest to others. I have used some of the chapters in my teaching of postgraduate students at
Cambridge University, University of Southern California, UCLA, and University of Pennsylva-
nia. Undergraduate students at Cambridge University have also been exposed to some of the
introductory material in Part I of the book. It is impossible to name all those who have helped
me with the preparation of this volume. But I would like particularly to name two of my Cam-
bridge Ph.D. students, Alexander Chudik and Elisa Tosetti, for their extensive help, particularly
with the material in Part VI of the book.
The book draws heavily from my published and unpublished research. In particular:
Chapter 7 is based on Pesaran (2010).
Chapter 25 draws from Pesaran and Pesaran (2010).
Chapter 32 is based on Pesaran (2003) and Pesaran and Chudik (2014) where additional
technical details and proofs are provided.
Chapter 31 is based on Breitung and Pesaran (2008) and provides some updates and extensions.
Chapter 33 is based on Chudik and Pesaran (2015b).
I would also like to acknowledge all my coauthors whose work has been reviewed in this vol-
ume. In particular, I would like to acknowledge Ron Smith, Bahram Pesaran, Allan Timmer-
mann, Kevin Lee, Yongcheol Shin, Vanessa Smith, Cheng Hsiao, Michael Binder, Richard Smith,
Alexander Chudik, Takashi Yamagata, Tony Garratt, Til Schuermann, Filippo di Mauro, Stéphane
Dées, Alessandro Rebucci, Adrian Pagan, Aman Ullah, and Martin Weale. It goes without saying
that none of them is responsible for the material presented in this volume.
Finally, I would like to acknowledge the helpful and constructive comments and suggestions
from two anonymous referees which provided me with further impetus to extend the coverage
of the material included in the book and to improve its exposition over the past six months. Ron
Smith has also provided me with detailed comments and suggestions over a number of successive
drafts. I am indebted to him for helping me to see the wood from the trees over the many years
that we have collaborated with each other.
Hashem Pesaran
Cambridge and Los Angeles
January 2015
Contents
5 Autocorrelated Disturbances
5.1 Introduction
5.2 Regression models with non-spherical disturbances
5.3 Consequences of residual serial correlation
5.4 Efficient estimation by generalized least squares
5.4.1 Feasible generalized least squares
5.5 Regression model with autocorrelated disturbances
5.5.1 Estimation
5.5.2 Higher-order error processes
5.5.3 The AR(1) case
5.5.4 The AR(2) case
5.5.5 Covariance matrix of the exact ML estimators for the AR(1) and AR(2) disturbances
5.5.6 Adjusted residuals, R², R̄², and other statistics
5.5.7 Log-likelihood ratio statistics for tests of residual serial correlation
5.6 Cochrane–Orcutt iterative method
5.6.1 Covariance matrix of the C-O estimators
5.7 ML/AR estimators by the Gauss–Newton method
5.7.1 AR(p) error process with zero restrictions
5.8 Testing for serial correlation
5.8.1 Lagrange multiplier test of residual serial correlation
5.9 Newey–West robust variance estimator
5.10 Robust hypothesis testing in models with serially correlated/heteroskedastic errors
5.11 Further reading
5.12 Exercises
6 Introduction to Dynamic Economic Modelling
6.1 Introduction
6.2 Distributed lag models
6.2.1 Estimation of ARDL models
6.3 Partial adjustment model
6.4 Error-correction models
6.5 Long-run and short-run effects
6.6 Concept of mean lag and its calculation
6.7 Models of adaptive expectations
6.8 Rational expectations models
6.8.1 Models containing expectations of exogenous variables
6.8.2 RE models with current expectations of endogenous variable
6.8.3 RE models with future expectations of the endogenous variable
6.9 Further reading
6.10 Exercises
7 Predictability of Asset Returns and the Efficient Market Hypothesis
7.1 Introduction
7.2 Prices and returns
7.2.1 Single period returns
7.2.2 Multi-period returns
Appendices
Appendix A: Mathematics
A.1 Complex numbers and trigonometry
A.1.1 Complex numbers
A.1.2 Trigonometric functions
A.1.3 Fourier analysis
A.2 Matrices and matrix operations
References
Name Index
Subject Index
List of Figures
32.2 GIRFs of one unit combined aggregate shock on the aggregate variable, $g_{\bar{\xi}}(s)$, for different persistence of common factor, $\psi = 0$, $0.5$, and $0.8$.
32.3 GIRFs of one unit combined aggregate shock on the aggregate variable.
32.4 GIRFs of one unit combined aggregate shocks on the aggregate variable (light-grey colour) and estimates of $a_s$ (dark-grey colour); bootstrap means and 90% confidence bounds, $s = 6$, $12$, and $24$.
List of Tables
25.1 Summary statistics for raw weekly returns and devolatized weekly returns over 1 April 1994 to 20 October 2009
25.2 Maximized log-likelihood values of DCC models estimated with weekly returns over 27 May 1994 to 28 December 2007
25.3 ML estimates of t-DCC model estimated with weekly returns over the period 27 May 94–28 Dec 07
26.1 Estimation of the Grunfeld investment equation
26.2 Pooled OLS, fixed-effects filter and HT estimates of wage equation
27.1 Arellano-Bover GMM estimates of budget shares determinants
27.2 Production function estimates
28.1 Fixed-effects estimates of static private saving equations, models M0 and M1 (21 OECD countries, 1971–1993)
28.2 Fixed-effects estimates of private savings equations with cross-sectionally varying slopes (Model M2) (21 OECD countries, 1971–1993)
28.3 Country-specific estimates of ‘static’ private saving equations (20 OECD countries, 1972–1993)
28.4 Fixed-effects estimates of dynamic private savings equations with cross-sectionally varying slopes (21 OECD countries, 1972–1993)
28.5 Private saving equations: fixed-effects, mean group and pooled MG estimates (20 OECD countries, 1972–1993)
28.6 Slope homogeneity tests for the AR(1) model of the real earnings equations
29.1 Error correction coefficients in cointegrating bivariate VAR(4) of log of real house prices in London and other UK regions (1974q4-2008q2)
29.2 Mean group estimates allowing for cross-sectional dependence
29.3 Small sample properties of CCEMG and CCEP estimators of mean slope coefficients in panel data models with weakly and strictly exogenous regressors
29.4 Size and power of CD and LM tests in the case of panels with weakly and strictly exogenous regressors (nominal size is set to 5 per cent)
29.5 Size and power of the JBFK test in the case of panel data models with strictly exogenous regressors and homoskedastic idiosyncratic shocks (nominal size is set to 5 per cent)
29.6 Size and power of the CD test for large N and short T panels with strictly and weakly exogenous regressors (nominal size is set to 5 per cent)
30.1 ML estimates of spatial models for household rice consumption in Indonesia
30.2 Estimation and RMSE performance of out-of-sample forecasts (estimation sample of twenty-five years; prediction sample of five years)
31.1 Pesaran’s CIPS panel unit root test results
31.2 Estimation result: income elasticity of real house prices: 1975–2003
31.3 Panel error correction estimates: 1977–2003
32.1 Weights $\omega_v$ and $\omega_{\bar{\varepsilon}}$ in experiments with $\psi = 0.5$
32.2 RMSE (×100) of estimating GIRF of one unit (1 s.e.) combined aggregate shock on the aggregate variable, averaged over horizons s = 0 to 12 and s = 13 to 24
32.3 Summary statistics for individual price relations for Germany, France, and Italy (equation (32.105))
OUP CORRECTED PROOF – FINAL, 5/9/2015, SPi
Part I
Introduction to Econometrics
1 Relationship Between
Two Variables
1.1 Introduction
There are a number of ways that a regression between two or more variables can be motivated.
It can, for example, arise because we know a priori that there exists an exact linear
relationship between Y and X, with Y being observed with measurement errors. Alternatively, it
could arise if (Y, X) have a bivariate distribution and we are interested in the conditional
expectation of Y given X, namely E(Y | X), which will be a linear function of X either if the
underlying relationship between Y and X is linear, or if Y and X have a bivariate normal distribution. A
regression line can also be considered without any underlying statistical model, just as a method
of fitting a line to a scatter of points in a two-dimensional space.
A: How to define and measure the distance of the points in the scatter diagram from the fitted
line. There are three plausible ways to measure the distance of a point from the fitted line:
B: How to add up all such distances of the sampled observations. Possible weighting (adding-
up) schemes are:
The simplest is the combination A(i) and B(i), which gives the ordinary least squares (OLS)
estimates of the regression of Y on X. The method of ordinary least squares will be treated
extensively in the rest of this chapter and in Chapter 2. The difference between A(i) and A(ii) can
also be characterized as to which of the two variables, X or Y, is represented on the horizontal
axis. The combination A(ii) and B(i) is also referred to as the ‘reverse regression of Y on X’.
Other combinations of distance/weighting schemes can also be considered. For example, A(iii)
and B(i) is called orthogonal regression, A(i) and B(ii) yields the absolute minimum distance
regression, and A(i) and B(iii) gives the weighted least squares (or weighted absolute
distance) regression.
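Treating X as the regressor and Y as the regressand, and choosing the distance measure
$d_t = y_t - \alpha - \beta x_t$, the least squares criterion function to be minimized is¹

\[ Q(\alpha, \beta) = \sum_{t=1}^{T} d_t^2 = \sum_{t=1}^{T} \left( y_t - \alpha - \beta x_t \right)^2 . \]

The necessary conditions for this minimization problem are given by

\[ \frac{\partial Q(\alpha, \beta)}{\partial \alpha} = \sum_{t=1}^{T} (-2) \left( y_t - \hat{\alpha} - \hat{\beta} x_t \right) = 0, \tag{1.1} \]

\[ \frac{\partial Q(\alpha, \beta)}{\partial \beta} = \sum_{t=1}^{T} (-2 x_t) \left( y_t - \hat{\alpha} - \hat{\beta} x_t \right) = 0. \tag{1.2} \]

Equations (1.1) and (1.2) are called normal equations for the OLS problem and can be written as

\[ \sum_{t=1}^{T} \hat{u}_t = 0, \tag{1.3} \]

\[ \sum_{t=1}^{T} \hat{u}_t x_t = 0, \tag{1.4} \]

¹ The notations $\sum_{t=1}^{T}$ and $\sum_t$ are used later to denote the sum of the terms after the summation sign over
$t = 1, 2, \ldots, T$.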
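where

\[ \hat{u}_t = y_t - \hat{\alpha} - \hat{\beta} x_t \tag{1.5} \]

are the OLS residuals. The condition $\sum_{t=1}^{T} \hat{u}_t = 0$ also gives $\bar{y} = \hat{\alpha} + \hat{\beta} \bar{x}$, where
$\bar{x} = \sum_{t=1}^{T} x_t / T$ and $\bar{y} = \sum_{t=1}^{T} y_t / T$, and demonstrates that the least squares regression line
$\hat{y}_t = \hat{\alpha} + \hat{\beta} x_t$ goes through the sample means of Y and X. Solving (1.3) and (1.4) for $\hat{\beta}$, and hence
for $\hat{\alpha}$, we have

\[ \hat{\beta} = \frac{\sum_{t=1}^{T} x_t y_t - T \bar{x} \bar{y}}{\sum_{t=1}^{T} x_t^2 - T \bar{x}^2}, \tag{1.6} \]

\[ \hat{\alpha} = \bar{y} - \hat{\beta} \bar{x}, \tag{1.7} \]

or since

\[ \sum_{t=1}^{T} (x_t - \bar{x})(y_t - \bar{y}) = \sum_{t=1}^{T} x_t y_t - T \bar{x} \bar{y}, \]

\[ \sum_{t=1}^{T} (x_t - \bar{x})^2 = \sum_{t=1}^{T} x_t^2 - T \bar{x}^2, \]

equivalently

\[ \hat{\beta} = \frac{\sum_{t=1}^{T} (x_t - \bar{x})(y_t - \bar{y})}{\sum_{t=1}^{T} (x_t - \bar{x})^2} = \frac{S_{XY}}{S_{XX}}, \]

where

\[ S_{XY} = \frac{\sum_{t=1}^{T} (x_t - \bar{x})(y_t - \bar{y})}{T} = S_{YX}, \]

\[ S_{XX} = \frac{\sum_{t=1}^{T} (x_t - \bar{x})^2}{T}. \]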
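The closed-form expressions (1.6) and (1.7) are easy to check numerically. The following is a minimal Python sketch; the function name `ols_fit` and the five data points are illustrative assumptions and are not taken from the text. It computes $\hat{\beta} = S_{XY}/S_{XX}$ and $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$, and then confirms that the residuals satisfy the normal equations (1.3) and (1.4) up to rounding error.

```python
def ols_fit(x, y):
    """Return (alpha_hat, beta_hat) for the regression of y on x, as in (1.6)-(1.7)."""
    T = len(x)
    x_bar = sum(x) / T
    y_bar = sum(y) / T
    # Sample second moments S_XY and S_XX (both divided by T, as in the text)
    s_xy = sum((xt - x_bar) * (yt - y_bar) for xt, yt in zip(x, y)) / T
    s_xx = sum((xt - x_bar) ** 2 for xt in x) / T
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


# Hypothetical data, chosen only for illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

alpha_hat, beta_hat = ols_fit(x, y)

# OLS residuals, as in (1.5)
residuals = [yt - alpha_hat - beta_hat * xt for xt, yt in zip(x, y)]

# Left-hand sides of the normal equations (1.3) and (1.4); both should be ~0
sum_u = sum(residuals)
sum_ux = sum(u * xt for u, xt in zip(residuals, x))

print(alpha_hat, beta_hat, sum_u, sum_ux)
```

Because $\hat{\alpha}$ is obtained from (1.7), the fitted line passes through $(\bar{x}, \bar{y})$ by construction, which is why the residuals sum to zero.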
It is easily seen that $\hat{\rho}_{YX}$ lies between $-1$ and $+1$. Notice also that the correlation coefficient
between Y and X is the same as the correlation coefficient between X and Y, namely $\hat{\rho}_{XY} = \hat{\rho}_{YX}$.
In this bivariate case we have the following interesting relationship between $\hat{\rho}_{XY}$ and the
regression coefficients of the regression of Y on X and the ‘reverse’ regression of X on Y. Denoting
these two regression coefficients respectively by $\hat{\beta}_{Y \cdot X}$ and $\hat{\beta}_{X \cdot Y}$, we have

\[ \hat{\beta}_{Y \cdot X} \hat{\beta}_{X \cdot Y} = \frac{S_{YX} S_{XY}}{S_{XX} S_{YY}} = \hat{\rho}_{YX}^2. \tag{1.9} \]

Hence, if $\hat{\beta}_{Y \cdot X} > 0$ then $\hat{\beta}_{X \cdot Y} > 0$. Since $\hat{\rho}_{XY}^2 \leq 1$, if we assume that $\hat{\beta}_{Y \cdot X} > 0$ it follows that
$\hat{\beta}_{X \cdot Y} \leq 1 / \hat{\beta}_{Y \cdot X}$. If we further assume that $0 < \hat{\beta}_{Y \cdot X} < 1$, then
$\hat{\beta}_{X \cdot Y} = \hat{\rho}_{XY}^2 / \hat{\beta}_{Y \cdot X} > \hat{\rho}_{XY}^2$.
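Relation (1.9) and the two inequalities that follow from it can be verified on a small sample. The sketch below is illustrative only: the helper `second_moments` and the data are assumptions made here, with the data chosen so that the slope of Y on X lies between 0 and 1.

```python
def second_moments(x, y):
    """Return (S_XX, S_YY, S_XY), each defined with divisor T."""
    T = len(x)
    x_bar = sum(x) / T
    y_bar = sum(y) / T
    s_xx = sum((a - x_bar) ** 2 for a in x) / T
    s_yy = sum((b - y_bar) ** 2 for b in y) / T
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / T
    return s_xx, s_yy, s_xy


# Hypothetical data for illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 3.0, 3.0, 4.0]

s_xx, s_yy, s_xy = second_moments(x, y)

beta_y_on_x = s_xy / s_xx            # slope in the regression of Y on X
beta_x_on_y = s_xy / s_yy            # slope in the reverse regression of X on Y
rho_sq = s_xy ** 2 / (s_xx * s_yy)   # squared sample correlation

print(beta_y_on_x, beta_x_on_y, rho_sq)
```

For these numbers the slope of Y on X is 0.6, so the reverse slope is bounded above by 1/0.6 and bounded below by the squared correlation, as the text states.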
where
and $\mathrm{Rank}(y_t : y)$ is equal to a number in the range [1 to T] determined by the size of $y_t$ relative
to the other $T - 1$ values of $y = (y_1, y_2, \ldots, y_T)'$. Note also that by construction $\sum_{t=1}^{T} d_t = 0$,
and that $\sum_{t=1}^{T} d_t^2$ can only take even integer values and has a mean equal to $(T^3 - T)/6$. Hence
$E(r_s) = 0$. The Spearman rank correlation can also be computed as a simple correlation between
$ry_t = \mathrm{Rank}(y_t : y)$ and $rx_t = \mathrm{Rank}(x_t : x)$. It is easily seen that

\[ r_s = \frac{\sum_{t=1}^{T} (ry_t - \overline{ry})(rx_t - \overline{rx})}{\left[ \sum_{t=1}^{T} (ry_t - \overline{ry})^2 \right]^{1/2} \left[ \sum_{t=1}^{T} (rx_t - \overline{rx})^2 \right]^{1/2}}, \]

where

\[ \overline{ry} = \overline{rx} = T^{-1} \sum_{t=1}^{T} ry_t = T^{-1} \sum_{t=1}^{T} rx_t = \frac{T+1}{2}. \]
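The equivalence between the two ways of computing the Spearman coefficient can be illustrated with a short Python sketch. It assumes the usual rank-difference form $r_s = 1 - 6 \sum_t d_t^2 / [T(T^2 - 1)]$ with $d_t = ry_t - rx_t$ (an assumption here, chosen because it is consistent with the stated mean of $\sum_t d_t^2$), that there are no ties, and hypothetical data; the helper names `ranks` and `pearson` are also illustrative.

```python
def ranks(values):
    """Rank observations from 1 (smallest) to T (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def pearson(a, b):
    """Simple (product-moment) correlation coefficient between a and b."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cross = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cross / (var_a * var_b) ** 0.5


# Hypothetical data for illustration
y = [3.1, 0.5, 2.2, 9.0, 4.7]
x = [1.0, 0.2, 5.5, 7.3, 3.0]
T = len(y)

ry = ranks(y)
rx = ranks(x)
d = [a - b for a, b in zip(ry, rx)]   # rank differences d_t
sum_d = sum(d)
sum_d2 = sum(v ** 2 for v in d)

r_s_formula = 1 - 6 * sum_d2 / (T * (T ** 2 - 1))   # rank-difference form
r_s_corr = pearson(ry, rx)                          # correlation of the ranks
mean_rank = sum(ry) / T                             # should equal (T + 1) / 2

print(ry, rx, sum_d, sum_d2, r_s_formula, r_s_corr)
```

With ties the two computations generally differ, so the sketch should not be applied unchanged to data with repeated values.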
Kendall’s τ correlation
Another rank correlation coefficient was introduced by Kendall (1938). Consider the T pairs
of ranked observations $(ry_t, rx_t)$, associated with the quantitative measures $(y_t, x_t)$, for
$t = 1, 2, \ldots, T$ as discussed above. Then the two pairs of ranks $(ry_t, rx_t)$ and $(ry_s, rx_s)$ are said to
be concordant if

\[ (rx_t - rx_s)(ry_t - ry_s) > 0, \quad \text{concordant pairs for all } t \text{ and } s, \]

and discordant if

\[ (rx_t - rx_s)(ry_t - ry_s) < 0, \quad \text{discordant pairs for all } t \text{ and } s. \]

Denoting the number of concordant pairs by $P_T$ and the number of discordant pairs by $Q_T$,
Kendall’s τ correlation coefficient is defined by

\[ \tau_T = \frac{2 (P_T - Q_T)}{T(T - 1)}. \tag{1.11} \]

More formally

\[ P_T = \sum_{t,s=1}^{T} I\left[ (rx_t - rx_s)(ry_t - ry_s) \right], \]

\[ Q_T = \sum_{t,s=1}^{T} I\left[ -(rx_t - rx_s)(ry_t - ry_s) \right], \]

where $I(A) = 1$ if $A > 0$, and zero otherwise.
\[ E(\tau_T) = \frac{2}{\pi} \sin^{-1}(\rho), \]

\[ E(r_s) = \rho_s + \frac{3(\tau - \rho_s)}{T + 1}, \]

where $\rho_s$ is the population value of Spearman rank correlation. Finally, in the bivariate normal
case we have

\[ \rho = 2 \sin\left( \frac{\pi \rho_s}{6} \right). \]

These relationships suggest the following indirect possibilities for estimation of the simple
correlation coefficient, namely

\[ \hat{\rho}_1 = \sin\left( \frac{\pi}{2} \tau_T \right), \]

\[ \hat{\rho}_2 = 2 \sin\left\{ \frac{\pi}{6} \left[ r_s - \frac{3(\tau_T - r_s)}{T + 1} \right] \right\}, \]

as possible alternatives to $\hat{\rho}$, the simple correlation coefficient. See Kendall and Gibbons (1990,
p. 169). The alternative estimators, $\hat{\rho}_1$ and $\hat{\rho}_2$, are likely to have some merit over $\hat{\rho}$ in small
samples in cases where the population distribution of $(y_t, x_t)$ differs from bivariate normal and/or
when the observations are subject to measurement errors.
Tests based on the different correlation measures are discussed in Section 3.4.
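Kendall's τ in (1.11) and the two indirect estimators of the simple correlation coefficient can be computed in a few lines. The Python sketch below rests on several assumptions that are not in the text: each unordered pair (t, s) with t < s is counted once when forming $P_T$ and $Q_T$, there are no ties, Spearman's $r_s$ is computed from the usual rank-difference formula, and the data and helper name `ranks` are hypothetical.

```python
import math


def ranks(values):
    """Rank observations from 1 (smallest) to T (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


# Hypothetical data for illustration
y = [3.1, 0.5, 2.2, 9.0, 4.7]
x = [1.0, 0.2, 5.5, 7.3, 3.0]
T = len(y)
ry, rx = ranks(y), ranks(x)

# Count concordant (P) and discordant (Q) pairs, each unordered pair once
P = 0
Q = 0
for t in range(T):
    for s in range(t + 1, T):
        product = (rx[t] - rx[s]) * (ry[t] - ry[s])
        if product > 0:
            P += 1
        elif product < 0:
            Q += 1

tau = 2 * (P - Q) / (T * (T - 1))                       # Kendall's tau, (1.11)

sum_d2 = sum((a - b) ** 2 for a, b in zip(ry, rx))
r_s = 1 - 6 * sum_d2 / (T * (T ** 2 - 1))               # Spearman's r_s

# Indirect estimators of the simple correlation coefficient
rho_1 = math.sin(math.pi * tau / 2)
rho_2 = 2 * math.sin((math.pi / 6) * (r_s - 3 * (tau - r_s) / (T + 1)))

print(P, Q, tau, r_s, rho_1, rho_2)
```

Both indirect estimators are motivated by the bivariate normal relationships, so with only five observations the numbers serve purely as an arithmetic illustration.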
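\[ \sum_{t=1}^{T} (y_t - \bar{y})^2 = \sum_{t=1}^{T} \left[ (\hat{y}_t - \bar{y}) - (\hat{y}_t - y_t) \right]^2 \]

\[ = \sum_{t=1}^{T} (\hat{y}_t - \bar{y})^2 + \sum_{t=1}^{T} (\hat{y}_t - y_t)^2 - 2 \sum_{t=1}^{T} (\hat{y}_t - y_t)(\hat{y}_t - \bar{y}) \]

\[ = \sum_{t=1}^{T} (y_t - \hat{y}_t)^2 + \sum_{t=1}^{T} (\hat{y}_t - \bar{y})^2 + 2 \sum_{t=1}^{T} \hat{u}_t (\hat{y}_t - \bar{y}). \]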
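But

\[ \sum_{t=1}^{T} \hat{u}_t (\hat{y}_t - \bar{y}) = \sum_{t=1}^{T} \hat{u}_t (\hat{\alpha} + \hat{\beta} x_t) - \sum_{t=1}^{T} \hat{u}_t \bar{y} = \hat{\alpha} \sum_{t=1}^{T} \hat{u}_t + \hat{\beta} \sum_{t=1}^{T} \hat{u}_t x_t - \bar{y} \sum_{t=1}^{T} \hat{u}_t = 0, \]

since from the normal equations (1.3) and (1.4), $\sum_{t=1}^{T} \hat{u}_t = 0$ and $\sum_{t=1}^{T} \hat{u}_t x_t = 0$. Then

\[ \sum_{t=1}^{T} (y_t - \bar{y})^2 = \sum_{t=1}^{T} (\hat{y}_t - \bar{y})^2 + \sum_{t=1}^{T} (y_t - \hat{y}_t)^2. \tag{1.12} \]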
This decomposition of the total variations in Y forms the basis of the analysis of variance, which
is described in the following table.
Proposition 1 highlights the relation between ρ̂ 2XY and the variance decomposition.
Proposition 1
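\[ \hat{\rho}_{XY}^2 = \frac{S_{XY}^2}{S_{XX} S_{YY}} = 1 - \frac{\sum_{t=1}^{T} (y_t - \hat{y}_t)^2}{\sum_{t=1}^{T} (y_t - \bar{y})^2}. \tag{1.13} \]

To see this, note that

\[ 1 - \frac{\sum_t (y_t - \hat{y}_t)^2}{\sum_t (y_t - \bar{y})^2} = \frac{\sum_t (y_t - \bar{y})^2 - \sum_t (y_t - \hat{y}_t)^2}{\sum_t (y_t - \bar{y})^2}, \]

and, using (1.12),

\[ 1 - \frac{\sum_t (y_t - \hat{y}_t)^2}{\sum_t (y_t - \bar{y})^2} = \frac{\sum_t (\hat{y}_t - \bar{y})^2}{\sum_t (y_t - \bar{y})^2}. \]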
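But

\[ \sum_t (\hat{y}_t - \bar{y})^2 = \sum_t (\hat{\alpha} + \hat{\beta} x_t - \bar{y})^2 = \sum_t \left[ \hat{\beta} (x_t - \bar{x}) + \hat{\beta} \bar{x} + \hat{\alpha} - \bar{y} \right]^2, \]

and since $\bar{y} = \hat{\alpha} + \hat{\beta} \bar{x}$,

\[ \sum_t (\hat{y}_t - \bar{y})^2 = \hat{\beta}^2 \sum_t (x_t - \bar{x})^2 = \frac{S_{XY}^2}{S_{XX}^2} \cdot T S_{XX} = \frac{T S_{XY}^2}{S_{XX}}. \]

Dividing by $\sum_t (y_t - \bar{y})^2 = T S_{YY}$ then gives

\[ 1 - \frac{\sum_t (y_t - \hat{y}_t)^2}{\sum_t (y_t - \bar{y})^2} = \frac{S_{XY}^2}{S_{YY} S_{XX}} = \hat{\rho}_{XY}^2. \]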
The above result is important since it also provides a natural generalization of the concept of
the simple correlation coefficient, ρ̂ XY , to the multivariate regression case, where it is referred to
as the multiple correlation coefficient (see Section 2.10).
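The decomposition (1.12) and the identity (1.13) can be checked numerically. The following Python sketch uses hypothetical data and illustrative variable names (`tss`, `ess`, `rss` for the total, explained, and residual sums of squares); none of these names come from the text.

```python
# Hypothetical data for illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 3.0, 3.0, 4.0]
T = len(x)

x_bar = sum(x) / T
y_bar = sum(y) / T
s_xx = sum((a - x_bar) ** 2 for a in x) / T
s_yy = sum((b - y_bar) ** 2 for b in y) / T
s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / T

# OLS estimates and fitted values
beta_hat = s_xy / s_xx
alpha_hat = y_bar - beta_hat * x_bar
y_fit = [alpha_hat + beta_hat * a for a in x]

tss = sum((b - y_bar) ** 2 for b in y)               # total variation in Y
ess = sum((f - y_bar) ** 2 for f in y_fit)           # explained variation
rss = sum((b - f) ** 2 for b, f in zip(y, y_fit))    # residual variation

r_squared = 1 - rss / tss                 # right-hand side of (1.13)
rho_sq = s_xy ** 2 / (s_xx * s_yy)        # squared simple correlation

print(tss, ess, rss, r_squared, rho_sq)
```

In this example the total variation of 5.2 splits into 3.6 explained and 1.6 residual, and both routes give the same squared correlation.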
A: Classical linear regression model. This model assumes that the relationship between Y
and X is a linear one:
\[ y_t = \alpha + \beta x_t + u_t, \tag{1.14} \]
where the disturbances $u_t$ satisfy the following assumptions:
(i) Zero mean: the disturbances $u_t$ have zero means, i.e., $E(u_t) = 0$.
(ii) Homoskedasticity: conditional on $x_t$ the disturbances $u_t$ have constant conditional
variance, $\mathrm{Var}(u_t | x_s) = \sigma^2$, for all t and s.
(iii) Non-autocorrelated error: the disturbances $u_t$ are serially uncorrelated,
$\mathrm{Cov}(u_t, u_s) = 0$ for all $t \neq s$.
(iv) Orthogonality: the disturbances $u_t$ and the regressor $x_t$ are uncorrelated, or conditional
on $x_s$, $u_t$ has a zero mean (namely $E(u_t | x_s) = 0$, for all t and s).
Assumption (i) ensures that the unconditional mean of yt is correctly specified by the
regression equation. The other assumptions can be relaxed and are introduced to provide
a simple model that can be used as a benchmark in econometric analysis.
B: Another way of motivating the linear regression model is to focus on the joint distribution
of Y and X, and assume that this distribution is normal with constant means, variances
and covariances. In this case the regression of Y on X defined as the conditional mean
of Y given a particular value of X, say X = x will be a linear function of x. In particular
we have:
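\[ E(Y | X = x_t) = \alpha + \beta x_t, \tag{1.15} \]

and

\[ \mathrm{Var}(Y | X = x_t) = \mathrm{Var}(Y) \left( 1 - \rho_{XY}^2 \right). \tag{1.16} \]

The parameters α and β are related to the moments of the joint distribution of Y and X in the
following manner:

\[ \alpha = E(Y) - \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)} E(X), \tag{1.17} \]

and

\[ \beta = \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)} = \rho_{XY} \sqrt{\frac{\mathrm{Var}(Y)}{\mathrm{Var}(X)}}. \tag{1.18} \]

Using (1.17) and (1.18), relation (1.15) can also be written as:

\[ E(Y | X = x_t) = E(Y) + \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)} \left[ x_t - E(X) \right]. \tag{1.19} \]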
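As a quick arithmetic illustration of (1.15) to (1.19), the sketch below starts from a hypothetical set of population moments (the numbers are assumptions made here) and computes the implied intercept, slope, and conditional variance. It also checks that the two ways of writing the slope in (1.18) agree, and that (1.15) and (1.19) give the same conditional mean.

```python
import math

# Hypothetical population moments of (X, Y), for illustration only
mean_x, mean_y = 1.0, 2.0
var_x, var_y = 4.0, 9.0
cov_xy = 3.0

rho = cov_xy / math.sqrt(var_x * var_y)       # population correlation

beta = cov_xy / var_x                         # slope, first form in (1.18)
beta_alt = rho * math.sqrt(var_y / var_x)     # slope, second form in (1.18)
alpha = mean_y - beta * mean_x                # intercept, (1.17)
cond_var = var_y * (1 - rho ** 2)             # conditional variance, (1.16)


def cond_mean(x_value):
    """Conditional mean of Y given X = x_value, written as in (1.15)."""
    return alpha + beta * x_value


def cond_mean_centred(x_value):
    """The same conditional mean, written in deviation form as in (1.19)."""
    return mean_y + beta * (x_value - mean_x)


print(rho, alpha, beta, cond_var, cond_mean(3.0), cond_mean_centred(3.0))
```

The conditional variance does not depend on the conditioning value, which is the population counterpart of the homoskedasticity assumption in model A.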
Model B does not postulate a linear relationship between Y and X, but assumes that (Y, X) have a
bivariate normal distribution. In contrast, model A assumes linearity of the relationship between
Y and X, but does not necessarily require that the joint probability distribution of (Y, X) be
normal. It is clear that under assumption (iv), (1.14) implies (1.15). Also (1.15) can be used to
obtain (1.14) by defining ut to be
\[ u_t = y_t - E(Y | X = x_t), \]
or more simply
\[ u_t = y_t - E(y_t | x_t). \tag{1.20} \]
It is in the light of this expression that the $u_t$'s are also often referred to as ‘innovations’ or ‘unexpected
components’ of $y_t$.
Both the above statistical models are used in the econometric literature. The two models can
also be combined to yield the ‘classical normal linear regression model’ which adds the extra
assumption that ut are normally distributed to the list of the four basic assumptions of the clas-
sical linear regression model set out above.
Finally, it is worth noting that under the normality assumption using (1.16) we also have

\[ \mathrm{Var}(u_t | x_t) = \sigma^2 = \mathrm{Var}(Y) \left( 1 - \rho_{YX}^2 \right). \tag{1.21} \]

Hence,

\[ \rho_{YX}^2 = 1 - \frac{\sigma^2}{\mathrm{Var}(Y)}, \]

which is the population value of the sample correlation coefficient defined by (1.8) and (1.13).
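\[ E(y_t) = \alpha + \beta E(x_t), \]

\[ E(y_t x_t) = \alpha E(x_t) + \beta E(x_t^2). \]

It is clear that α and β can now be derived in terms of the population moments, $E(y_t)$, $E(x_t)$,
$E(x_t^2)$, and $E(y_t x_t)$, namely

\[ \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} 1 & E(x_t) \\ E(x_t) & E(x_t^2) \end{pmatrix}^{-1} \begin{pmatrix} E(y_t) \\ E(y_t x_t) \end{pmatrix}. \tag{1.22} \]

The inverse exists if $\mathrm{Var}(x_t) = E(x_t^2) - [E(x_t)]^2 > 0$. The method of moments estimators of α
and β are obtained when the population moments in the above expression are replaced by the
sample moments, namely the sample averages of $y_t$, $x_t$, $x_t^2$, and $y_t x_t$ over $t = 1, 2, \ldots, T$.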
Using these sample moments in (1.22) gives $\hat{\alpha}_{MM}$ and $\hat{\beta}_{MM}$, which are easily verified to be the
same as the OLS estimators given by (1.7) and (1.6).
In cases where the number of moment conditions exceeds the number of unknown parameters,
the method of moments is generalized to take account of the additional moment conditions
in an efficient manner. The resultant estimator is then referred to as the generalized method of
moments (GMM) estimator, which will be discussed in some detail in Chapter 10.
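The claim that the method of moments estimators coincide with OLS can be checked by solving the 2×2 system (1.22) with sample averages in place of population moments. The sketch below is illustrative: the data and the variable names (`m_x`, `m_y`, `m_xx`, `m_xy`) are assumptions made here, and the inverse in (1.22) is written out explicitly using the formula for a 2×2 matrix.

```python
# Hypothetical data for illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 3.0, 3.0, 4.0]
T = len(x)

# Sample counterparts of E(x_t), E(y_t), E(x_t^2), E(y_t x_t)
m_x = sum(x) / T
m_y = sum(y) / T
m_xx = sum(a * a for a in x) / T
m_xy = sum(a * b for a, b in zip(x, y)) / T

# Determinant of the moment matrix in (1.22); it equals the sample variance of x
det = m_xx - m_x ** 2

# Solve (1.22) using the explicit inverse of a 2x2 matrix
alpha_mm = (m_xx * m_y - m_x * m_xy) / det
beta_mm = (m_xy - m_x * m_y) / det

# OLS estimators from (1.6) and (1.7) for comparison
beta_ols = (sum(a * b for a, b in zip(x, y)) - T * m_x * m_y) / (
    sum(a * a for a in x) - T * m_x ** 2
)
alpha_ols = m_y - beta_ols * m_x

print(alpha_mm, beta_mm, alpha_ols, beta_ols)
```

The system here is exactly identified (two moment conditions, two parameters), which is why no weighting of the moment conditions is needed.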
But under the assumption that the errors are normally distributed, the non-autocorrelated error
assumption, (iii), implies that the errors are independently distributed and hence we have

\[ \Pr\left( y \mid x, \alpha, \beta, \sigma^2 \right) = \Pr(u_1) \Pr(u_2) \cdots \Pr(u_T). \]

The likelihood of the unknown parameters, which we collect in the 3×1 vector $\theta = (\alpha, \beta, \sigma^2)'$,
is the same as the above joint density function, but is viewed as a function of θ rather than y.
Denoting the likelihood function of θ by $L_T(\theta)$ we have

\[ L_T(\theta) = (2\pi\sigma^2)^{-T/2} \exp\left[ \frac{-\sum_{t=1}^{T} (y_t - \alpha - \beta x_t)^2}{2\sigma^2} \right]. \tag{1.23} \]
To obtain the maximum likelihood estimator (MLE) of θ it is often more convenient to work
with the logarithm of the likelihood function, referred to as the log-likelihood function, which
we denote by $\ell_T(\theta)$. Using (1.23) we have

\[ \ell_T(\theta) = -\frac{T}{2} \log(2\pi\sigma^2) - \frac{\sum_{t=1}^{T} (y_t - \alpha - \beta x_t)^2}{2\sigma^2}. \]

It is now clear that maximization of $\ell_T(\theta)$ with respect to α and β will be the same as minimizing
$\sum_{t=1}^{T} (y_t - \alpha - \beta x_t)^2$ with respect to these parameters, which establishes that the MLEs of α and
β are the same as their OLS estimators, namely $\hat{\alpha}_{ML} = \hat{\alpha}$ and $\hat{\beta}_{ML} = \hat{\beta}$, where $\hat{\alpha}$ and $\hat{\beta}$ are given
by (1.7) and (1.6), respectively. The MLE of $\sigma^2$ can be obtained by taking the first derivative of
$\ell_T(\theta)$ with respect to $\sigma^2$. We have

\[ \frac{\partial \ell_T(\theta)}{\partial \sigma^2} = -\frac{T}{2\sigma^2} + \frac{\sum_{t=1}^{T} (y_t - \alpha - \beta x_t)^2}{2\sigma^4}. \]

Setting $\partial \ell_T(\theta) / \partial \sigma^2 = 0$ and solving for $\hat{\sigma}_{ML}^2$ in terms of the MLE of α and β now yields

\[ \hat{\sigma}_{ML}^2 = \frac{\sum_{t=1}^{T} (y_t - \hat{\alpha}_{ML} - \hat{\beta}_{ML} x_t)^2}{T} = \frac{\sum_{t=1}^{T} (y_t - \hat{\alpha} - \hat{\beta} x_t)^2}{T} = \frac{\sum_{t=1}^{T} \hat{u}_t^2}{T}. \tag{1.24} \]
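A simple numerical check of the maximum likelihood results is to evaluate the Gaussian log-likelihood at the OLS estimates with $\sigma^2$ set to the average squared residual, as in (1.24), and compare it with the values obtained at nearby parameter points. The sketch below does this on hypothetical data; the function name `log_likelihood` and the perturbation grid are illustrative assumptions.

```python
import math


def log_likelihood(alpha, beta, sigma2, x, y):
    """Gaussian log-likelihood of the simple regression model, from (1.23)."""
    T = len(x)
    ssr = sum((yt - alpha - beta * xt) ** 2 for xt, yt in zip(x, y))
    return -0.5 * T * math.log(2 * math.pi * sigma2) - ssr / (2 * sigma2)


# Hypothetical data for illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 3.0, 3.0, 4.0]
T = len(x)

# OLS estimates, which are also the MLEs of alpha and beta
x_bar = sum(x) / T
y_bar = sum(y) / T
beta_ml = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum(
    (a - x_bar) ** 2 for a in x
)
alpha_ml = y_bar - beta_ml * x_bar

# MLE of sigma^2: sum of squared residuals divided by T, as in (1.24)
ssr = sum((b - alpha_ml - beta_ml * a) ** 2 for a, b in zip(x, y))
sigma2_ml = ssr / T

ll_max = log_likelihood(alpha_ml, beta_ml, sigma2_ml, x, y)

# Log-likelihood at perturbed parameter values; each should fall below ll_max
perturbed = [
    log_likelihood(alpha_ml + da, beta_ml + db, sigma2_ml * scale, x, y)
    for da in (-0.1, 0.0, 0.1)
    for db in (-0.05, 0.0, 0.05)
    for scale in (0.8, 1.0, 1.25)
    if (da, db, scale) != (0.0, 0.0, 1.0)
]

print(sigma2_ml, ll_max, max(perturbed))
```

A grid comparison of this kind only illustrates the result; the argument in the text is what establishes that the maximum is attained at the OLS estimates.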
In what follows we present a proof of properties (i) to (iii) for $\hat{\beta}$. A similar proof can also be
established for $\hat{\alpha}$. Recall that

\[ \hat{\beta} = \frac{S_{XY}}{S_{XX}} = \frac{\sum_{t=1}^{T} (y_t - \bar{y})(x_t - \bar{x})}{\sum_{t=1}^{T} (x_t - \bar{x})^2}. \]
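However,

\[ \sum_{t=1}^{T} (y_t - \bar{y})(x_t - \bar{x}) = \sum_{t=1}^{T} y_t (x_t - \bar{x}) - \sum_{t=1}^{T} \bar{y} (x_t - \bar{x}), \]

and since $\sum_{t=1}^{T} \bar{y} (x_t - \bar{x}) = \bar{y} \sum_{t=1}^{T} (x_t - \bar{x}) = 0$, then

\[ \sum_{t=1}^{T} (y_t - \bar{y})(x_t - \bar{x}) = \sum_{t=1}^{T} y_t (x_t - \bar{x}). \]

Hence

\[ \hat{\beta} = \sum_{t=1}^{T} w_t y_t, \tag{1.25} \]

where the weights

\[ w_t = \frac{x_t - \bar{x}}{\sum_{t=1}^{T} (x_t - \bar{x})^2} \tag{1.26} \]

are fixed and add up to zero, namely $\sum_{t=1}^{T} w_t = 0$. This establishes property (ii).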
Notice that the $x_t$’s are taken as given, which is justified if they are strictly exogenous. Further
discussion of the concept of strict exogeneity is given in Section 2.2, but in the present context $x_t$
will be strictly exogenous if it is uncorrelated with current, past, as well as future values of the
error terms, $u_s$; more specifically if $\mathrm{Cov}(x_t, u_s) = 0$, for all values of t and s. Under this
assumption, taking conditional expectations of both sides of (1.25), we have:

\[ E(\hat{\beta}) = \sum_{t=1}^{T} E\left( w_t y_t \mid x_1, x_2, \ldots, x_T \right) = \sum_{t=1}^{T} w_t E(y_t | x_t). \]

But using (1.14) or (1.15), conditional on $x_t$, we have $E(y_t | x_t) = \alpha + \beta x_t$. Consequently,

\[ E(\hat{\beta}) = \sum_{t=1}^{T} w_t (\alpha + \beta x_t) = \alpha \sum_{t=1}^{T} w_t + \beta \sum_{t=1}^{T} w_t x_t. \tag{1.27} \]
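But

\[ \sum_{t=1}^{T} w_t x_t = \frac{\sum_{t=1}^{T} x_t (x_t - \bar{x})}{\sum_{t=1}^{T} (x_t - \bar{x})^2}, \]

and since

\[ \sum_{t=1}^{T} (x_t - \bar{x})^2 = \sum_{t=1}^{T} (x_t - \bar{x})(x_t - \bar{x}) \]

\[ = \sum_{t=1}^{T} x_t (x_t - \bar{x}) - \sum_{t=1}^{T} \bar{x} (x_t - \bar{x}) \]

\[ = \sum_{t=1}^{T} x_t (x_t - \bar{x}) - \bar{x} \sum_{t=1}^{T} (x_t - \bar{x}) \]

\[ = \sum_{t=1}^{T} x_t (x_t - \bar{x}), \]

it then follows that $\sum_{t=1}^{T} w_t x_t = 1$. We have also seen that $\sum_{t=1}^{T} w_t = 0$, hence it follows from
(1.27) that $E(\hat{\beta}) = \beta$, which establishes that $\hat{\beta}$ is an unbiased estimator, that is, point (i).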
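The two properties of the OLS weights used in this argument, that they sum to zero and that their cross-product with the regressor equals one, can be confirmed directly. The Python sketch below uses hypothetical regressor values and hypothetical 'true' parameters; it also applies the weights to the conditional means $\alpha + \beta x_t$ to reproduce the right-hand side of (1.27).

```python
# Hypothetical regressor values, treated as fixed
x = [1.0, 2.0, 3.0, 4.0, 5.0]
T = len(x)
x_bar = sum(x) / T
sum_sq_dev = sum((xt - x_bar) ** 2 for xt in x)

# OLS weights from (1.26)
w = [(xt - x_bar) / sum_sq_dev for xt in x]

sum_w = sum(w)                                   # should be 0
sum_wx = sum(wt * xt for wt, xt in zip(w, x))    # should be 1

# Hypothetical true parameters, used only to evaluate (1.27)
alpha, beta = 0.8, 0.6
cond_means = [alpha + beta * xt for xt in x]     # E(y_t | x_t)

# E(beta_hat) = sum_t w_t E(y_t | x_t), which should reproduce beta
expected_beta_hat = sum(wt * m for wt, m in zip(w, cond_means))

print(sum_w, sum_wx, expected_beta_hat)
```

Whatever value is chosen for the intercept, it drops out of the weighted sum because the weights add up to zero.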
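The variance of $\hat{\beta}$ can also be computed easily using (1.25). We have

\[ \mathrm{Var}(\hat{\beta}) = \sum_{t=1}^{T} w_t^2 \, \mathrm{Var}(y_t | x_t) = \sum_{t=1}^{T} w_t^2 \, \mathrm{Var}(u_t | x_t) = \sigma^2 \sum_{t=1}^{T} w_t^2, \]

and since from (1.26) $\sum_{t=1}^{T} w_t^2 = 1 / \sum_{t=1}^{T} (x_t - \bar{x})^2$,

\[ \mathrm{Var}(\hat{\beta}) = \frac{\sigma^2}{\sum_{t=1}^{T} (x_t - \bar{x})^2} = \frac{\sigma^2}{T S_{XX}}. \tag{1.28} \]

Similarly, we have

\[ \mathrm{Var}(\hat{\alpha}) = \frac{\sigma^2 \sum_{t=1}^{T} x_t^2}{T \sum_{t=1}^{T} (x_t - \bar{x})^2}, \tag{1.29} \]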
and

\[ \mathrm{Cov}(\hat{\alpha}, \hat{\beta}) = \frac{-\sigma^2 \bar{x}}{\sum_{t=1}^{T} (x_t - \bar{x})^2}. \tag{1.30} \]
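Formulas (1.28) to (1.30) can be cross-checked by writing both estimators as weighted sums of the $y_t$. The sketch below is illustrative and relies on two assumptions stated here rather than in the text: the regressor values and $\sigma^2 = 2$ are hypothetical, and the weights for the intercept, $a_t = 1/T - \bar{x} w_t$, are obtained by substituting (1.25) into (1.7). With serially uncorrelated, homoskedastic disturbances, the variances and covariance of the two weighted sums are $\sigma^2 \sum_t w_t^2$, $\sigma^2 \sum_t a_t^2$, and $\sigma^2 \sum_t a_t w_t$.

```python
# Hypothetical regressor values and disturbance variance
x = [1.0, 2.0, 3.0, 4.0, 5.0]
sigma2 = 2.0
T = len(x)
x_bar = sum(x) / T
sum_sq_dev = sum((xt - x_bar) ** 2 for xt in x)

# Weights: beta_hat = sum_t w_t y_t and alpha_hat = sum_t a_t y_t
w = [(xt - x_bar) / sum_sq_dev for xt in x]
a = [1.0 / T - x_bar * wt for wt in w]

# Variances and covariance computed directly from the weights
var_beta = sigma2 * sum(wt ** 2 for wt in w)
var_alpha = sigma2 * sum(at ** 2 for at in a)
cov_alpha_beta = sigma2 * sum(at * wt for at, wt in zip(a, w))

# Closed-form expressions (1.28), (1.29), and (1.30)
var_beta_formula = sigma2 / sum_sq_dev
var_alpha_formula = sigma2 * sum(xt ** 2 for xt in x) / (T * sum_sq_dev)
cov_formula = -sigma2 * x_bar / sum_sq_dev

print(var_beta, var_alpha, cov_alpha_beta)
```

The negative covariance reflects the fact that, with a positive sample mean of the regressor, overestimating the slope goes together with underestimating the intercept.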
The Gauss–Markov theorem (i.e., property (iii) above) states that among all linear, unbiased
estimators of β (or α) the OLS estimator, $\hat{\beta}$, has the smallest variance. To prove this result consider
another linear unbiased estimator of β and denote it by $\tilde{\beta}$. Then by assumption

\[ \tilde{\beta} = \sum_{t=1}^{T} \tilde{w}_t y_t, \]

where $\tilde{w}_t$ are fixed weights (which do not depend on $y_t$) and satisfy the conditions

\[ \sum_{t=1}^{T} \tilde{w}_t = 0, \tag{1.31} \]

and

\[ \sum_{t=1}^{T} \tilde{w}_t x_t = 1. \tag{1.32} \]

These two conditions ensure that $\tilde{\beta}$ is an unbiased estimator of β, that is, that $E(\tilde{\beta}) = \beta$.
Suppose now $\tilde{w}_t$ differ from $w_t$, the OLS weights given in (1.26), by the amount $\delta_t$ and let

\[ \tilde{w}_t = w_t + \delta_t, \quad t = 1, 2, \ldots, T, \tag{1.33} \]

where $\delta_t$ is the amount of discrepancy between the two weighting schemes. Since
$\sum_{t=1}^{T} w_t = \sum_{t=1}^{T} \tilde{w}_t = 0$, it follows also that $\sum_{t=1}^{T} \delta_t = 0$, and since
$\sum_{t=1}^{T} w_t x_t = \sum_{t=1}^{T} \tilde{w}_t x_t = 1$, then we should also have $\sum_{t=1}^{T} \delta_t x_t = 0$.
The variance of $\tilde{\beta}$ is now given by

\[ \mathrm{Var}(\tilde{\beta}) = \sum_{t=1}^{T} \tilde{w}_t^2 \, \mathrm{Var}(y_t | x_t) = \sigma^2 \sum_{t=1}^{T} \tilde{w}_t^2, \]
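Since $\tilde{w}_t = w_t + \delta_t$, we have $\sum_{t=1}^{T} \tilde{w}_t^2 = \sum_{t=1}^{T} w_t^2 + \sum_{t=1}^{T} \delta_t^2 + 2\sum_{t=1}^{T} w_t \delta_t$. The cross-product term $\sum_{t=1}^{T} w_t \delta_t$ is proportional to
\[
\sum_{t=1}^{T} \delta_t (x_t - \bar{x}) = \sum_{t=1}^{T} \delta_t x_t - \bar{x} \sum_{t=1}^{T} \delta_t,
\]
which is equal to zero. Recall that $\sum_{t=1}^{T} \delta_t = 0$ and $\sum_{t=1}^{T} \delta_t x_t = 0$. Hence $\sum_{t=1}^{T} w_t \delta_t = 0$, and
\[
\mathrm{Var}(\tilde{\beta}) = \sigma^2 \left( \sum_{t=1}^{T} w_t^2 + \sum_{t=1}^{T} \delta_t^2 \right) \geq \mathrm{Var}(\hat{\beta}),
\]
which establishes the Gauss–Markov theorem for $\hat{\beta}$. The equality sign holds if and only if $\delta_t = 0$ for all $t$. The proof of the Gauss–Markov theorem for the multivariate case is presented in Section 2.7.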
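The steps of this proof can be checked numerically. The following is a minimal Python sketch, not part of the original text: the regressor values in `x` and the arbitrary vector `raw` are made-up illustrations, and $\sigma^2$ is normalized to one so that the sums of squared weights are the variances themselves.

```python
# Illustrative regressor values (an assumption, not data from the text).
x = [1.0, 2.0, 4.0, 7.0, 11.0]
T = len(x)
x_bar = sum(x) / T
s_xx = sum((xt - x_bar) ** 2 for xt in x)

# OLS weights, w_t = (x_t - x_bar) / S_XX.
w = [(xt - x_bar) / s_xx for xt in x]

# Build delta_t with sum(delta) = 0 and sum(delta * x) = 0 by taking the
# residuals from regressing an arbitrary vector on a constant and x.
raw = [1.0, -2.0, 0.5, 3.0, -1.0]
raw_bar = sum(raw) / T
slope = sum((xt - x_bar) * (rt - raw_bar) for xt, rt in zip(x, raw)) / s_xx
delta = [rt - raw_bar - slope * (xt - x_bar) for xt, rt in zip(x, raw)]

# Alternative linear unbiased weights, as in (1.33).
w_tilde = [wt + dt for wt, dt in zip(w, delta)]

sum_w2 = sum(wt ** 2 for wt in w)              # Var(beta_hat) / sigma^2
sum_w_tilde2 = sum(wt ** 2 for wt in w_tilde)  # Var(beta_tilde) / sigma^2
sum_delta2 = sum(dt ** 2 for dt in delta)
cross = sum(wt * dt for wt, dt in zip(w, delta))

print(sum_w2, sum_w_tilde2, sum_delta2, cross)
```

Up to rounding error, the cross-product term is zero and the variance of the alternative estimator exceeds the OLS variance by exactly $\sigma^2 \sum_t \delta_t^2$.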
1.9.1 Estimation of $\sigma^2$
Since $\mathrm{Var}(\hat{\alpha})$ and $\mathrm{Var}(\hat{\beta})$ depend on the unknown parameter, $\sigma^2$ (the variance of the disturbance term), in order to obtain estimates of the variances of the OLS estimators, it is also necessary to obtain an estimate of $\sigma^2$. For this purpose we first note that
\[
\sigma^2 = \mathrm{Var}(u_t|x_t) = E(u_t^2).
\]
It is, therefore, reasonable to interpret $\sigma^2$ as the mean value of the squared disturbances. A moment estimator of $\sigma^2$ can then be obtained by the sample average of $u_t^2$. In practice, however, the $u_t$'s are observed only indirectly, through the estimates of $\alpha$ and $\beta$. Hence a feasible estimator of $\sigma^2$ can be obtained by replacing $\alpha$ and $\beta$ in the definition of $u_t$ by their OLS estimators. Namely,
\[
\tilde{\sigma}^2 = \frac{\sum_{t=1}^{T} \hat{u}_t^2}{T} = \frac{\sum_{t=1}^{T} (y_t - \hat{\alpha} - \hat{\beta} x_t)^2}{T},
\]
which is the same as the ML estimator of $\sigma^2$ given by (3). When $T$ is large, this provides a reasonable estimator of $\sigma^2$. However, in finite samples a more satisfactory estimator of $\sigma^2$ can be obtained by dividing the sum of squares of the residuals by $T - 2$ rather than $T$. Namely,
\[
\hat{\sigma}^2 = \frac{\sum_{t=1}^{T} (y_t - \hat{\alpha} - \hat{\beta} x_t)^2}{T - 2}, \tag{1.34}
\]
where ‘2’ is equal to the number of estimated unknown parameters in the simple regression model (here, $\hat{\alpha}$ and $\hat{\beta}$). Unlike $\tilde{\sigma}^2$, the above estimator of $\sigma^2$ given by (1.34) is unbiased, namely $E(\hat{\sigma}^2) = \sigma^2$.
Using the above estimator of σ 2 it is now possible to estimate the variances and covariances
of β̂ given in (1.28). For example we have
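\[
\widehat{\mathrm{Var}}(\hat{\beta}) = \frac{\hat{\sigma}^2}{\sum_{t=1}^{T} (x_t - \bar{x})^2},
\]
and similarly for $\widehat{\mathrm{Var}}(\hat{\alpha})$ and $\widehat{\mathrm{Cov}}(\hat{\alpha}, \hat{\beta})$.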
The problem of testing the statistical significance of the estimates and their confidence bands
will be addressed in Chapter 3.
An estimate of this expression gives the estimate of the conditional predictor of yT+1 . The OLS
estimate of yT+1 is given by
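\[
\hat{y}_{T+1} = \hat{E}(y_{T+1}|x_1, x_2, \ldots) = \hat{\alpha} + \hat{\beta} x_{T+1}.
\]
2 Notice that for the two random variables $x$ and $y$, and the fixed constants $\alpha$ and $\beta$, we have
\[
\mathrm{Var}(\alpha x + \beta y) = \alpha^2 \mathrm{Var}(x) + \beta^2 \mathrm{Var}(y) + 2\alpha\beta \, \mathrm{Cov}(x, y).
\]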
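\[
\begin{aligned}
\mathrm{Var}(\hat{y}_{T+1}) &= \frac{\sigma^2 \left( \frac{1}{T}\sum_t x_t^2 + x_{T+1}^2 - 2\bar{x} x_{T+1} \right)}{\sum_{t=1}^{T} (x_t - \bar{x})^2} \\
&= \frac{\sigma^2 \left( \sum_t x_t^2 + T x_{T+1}^2 - 2 x_{T+1} \sum_t x_t \right)}{T \sum_t (x_t - \bar{x})^2}.
\end{aligned}
\]
Therefore
\[
\mathrm{Var}(\hat{y}_{T+1}) = \frac{\sigma^2}{T \sum_t (x_t - \bar{x})^2} \left[ \sum_t (x_t - \bar{x})^2 + T (x_{T+1} - \bar{x})^2 \right],
\]
or
\[
\mathrm{Var}(\hat{y}_{T+1}) = \sigma^2 \left[ \frac{1}{T} + \frac{(x_{T+1} - \bar{x})^2}{\sum_t (x_t - \bar{x})^2} \right]. \tag{1.35}
\]
An estimate of $\mathrm{Var}(\hat{y}_{T+1})$ is now given by
\[
\widehat{\mathrm{Var}}(\hat{y}_{T+1}) = \hat{\sigma}^2 \left[ \frac{1}{T} + \frac{(x_{T+1} - \bar{x})^2}{\sum_t (x_t - \bar{x})^2} \right]. \tag{1.36}
\]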
The general theory of prediction under alternative loss functions is discussed in Chapter 17.
Under the assumption that yT+1 is generated according to the simple regression model we have
To compute the variance of ûT+1 we first note that both α̂ and β̂ are linear functions of the
disturbances over the estimation period (namely u1 , u2 , . . . , uT ) and do not depend on uT+1 .
Since by assumption ut ’s are serially uncorrelated it therefore follows that
\[
\mathrm{Cov}(u_{T+1}, \hat{\alpha} - \alpha) = 0,
\]
\[
\mathrm{Cov}(u_{T+1}, \hat{\beta} - \beta) = 0.
\]
Hence, conditional on $x_{T+1}$, $u_{T+1}$ and $\hat{y}_{T+1} = \hat{\alpha} + \hat{\beta} x_{T+1}$ will also be uncorrelated, and
\[
\mathrm{Var}(\hat{u}_{T+1}) = \mathrm{Var}(u_{T+1}) + \mathrm{Var}(\hat{y}_{T+1}).
\]
In the case where $\{x_t\}$ has a constant variance, $\mathrm{Var}(\hat{u}_{T+1})$ converges to $\sigma^2$ as $T \to \infty$. The above derivations also clearly show that $\mathrm{Var}(\hat{u}_{T+1})$ is composed of two parts: one part is due to the inherent uncertainty that surrounds the regression line (i.e., $\mathrm{Var}(u_t) = \sigma^2$), and the other part is due to the sampling variation that is associated with the estimation of the regression parameters, $\alpha$ and $\beta$. It is, therefore, natural that as $T \to \infty$, the latter source of variations disappears and we are left with the inherent uncertainty due to the regression, as measured by $\sigma^2$.
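The two components of the forecast error variance can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the bounded regressor pattern, the value of $x_{T+1}$ and the value of $\sigma^2$ are all hypothetical choices made for illustration, and the function name is not from the text.

```python
def forecast_error_variance(x, x_new, sigma2):
    """Var(u_hat_{T+1}) = sigma^2 + Var(y_hat_{T+1}), using (1.35)."""
    T = len(x)
    x_bar = sum(x) / T
    s_xx = sum((xt - x_bar) ** 2 for xt in x)
    var_y_hat = sigma2 * (1.0 / T + (x_new - x_bar) ** 2 / s_xx)
    return sigma2 + var_y_hat


sigma2 = 2.0   # assumed disturbance variance
x_new = 6.0    # assumed value of x_{T+1}
results = {}
for T in (10, 100, 1000):
    x = [float(t % 5) for t in range(T)]   # bounded regressor, illustrative only
    results[T] = forecast_error_variance(x, x_new, sigma2)
    print(T, results[T])
```

As the sample size grows, the part due to parameter estimation shrinks and the total approaches the assumed $\sigma^2$.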
where ε t s are assumed to have zero means and constant variances. Under this specification the
‘optimal’ forecast of xT+1 (conditional on past values of x’s) is given by
where ρ̂ is the OLS estimator of ρ, obtained from the regression of xt on its one-period lagged
value. In Chapter 17 we review forecasting within the general context of ARMA models, intro-
duced in Chapter 12.
1.11 Exercises
1. Show that the correlation coefficient defined in (1.8) ranges between −1 and 1.
2. In the model yt = α + βxt + ut what happens to the OLS estimator of β if xt and/or yt are
standardized by demeaning and scaling by their standard deviations?
3. The following table provides a few key summary statistics for daily rates of change of UK stock
index (FTSE) and the GB pound versus US dollar.
Using these statistics, what do you think are the main differences between these two series, and how are these differences best characterized?
4. Consider the following data
(X) (Y)
169.6 71.2
166.8 58.2
157.1 56.0
181.1 64.5
158.4 53.0
165.6 52.4
166.7 56.8
156.5 49.2
168.1 55.6
165.3 77.8
X̄ = 165.52 Ȳ = 59.47
We obtain
SXX = 472.076,
SYY = 731.961,
SXY = 274.786.
Plot Y against X. Run the OLS regression of Y on X and the reverse regression of X on Y. Check that the fitted regression line goes through the means of X and Y.
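As a starting point for this exercise, the following Python sketch (an illustration, not part of the original exercise) recomputes the summary statistics from the ten observations listed above and fits both regressions with the simple-regression formulae of this chapter. Variable names are arbitrary.

```python
X = [169.6, 166.8, 157.1, 181.1, 158.4, 165.6, 166.7, 156.5, 168.1, 165.3]
Y = [71.2, 58.2, 56.0, 64.5, 53.0, 52.4, 56.8, 49.2, 55.6, 77.8]
T = len(X)

x_bar = sum(X) / T
y_bar = sum(Y) / T
s_xx = sum((x - x_bar) ** 2 for x in X)
s_yy = sum((y - y_bar) ** 2 for y in Y)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))

# Regression of Y on X.
beta_hat = s_xy / s_xx
alpha_hat = y_bar - beta_hat * x_bar

# Reverse regression of X on Y.
b_rev = s_xy / s_yy
a_rev = x_bar - b_rev * y_bar

# Unbiased estimator of sigma^2, dividing by T - 2 as in (1.34).
residuals = [y - alpha_hat - beta_hat * x for x, y in zip(X, Y)]
sigma2_hat = sum(u ** 2 for u in residuals) / (T - 2)

print(round(s_xx, 3), round(s_yy, 3), round(s_xy, 3))
print(alpha_hat, beta_hat, a_rev, b_rev, sigma2_hat)
```

The fitted line passes through the sample means by construction, since $\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X}$.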
5. Consider the simple regression model
yt = α + βxt + ut , t = 1, 2, . . . , T
(a) Explain briefly what is meant by saying that an estimator, β̂, of β is:
i. unbiased
ii. consistent
iii. maximum likelihood.
(b) Under what assumptions is the OLS estimator of β:
i. the best linear unbiased estimator
ii. the maximum likelihood estimator
(c) For each of the assumptions you have listed under (b) give an example where the assump-
tion might not hold in economic applications.
(d) In the model above, why do econometricians make assumptions about the distribution
of ut when testing a hypothesis about the value of β?
\[
W_i = a + b \log(E_i) + \varepsilon_i,
\]
\[
\ln(W_i) = \alpha + \beta \log(E_i) + v_i,
\]
where $W_i = P F_i / E_i$ is the share of food expenditure of household $i$, $P$ is the price of food, assumed fixed across all households, $E_i = F_i + NF_i$, where $F_i$ and $NF_i$ are, respectively, food and non-food expenditures of the household, $\varepsilon_i$ and $v_i$ are random errors, and $a$, $b$, $\alpha$ and $\beta$ are constant coefficients.
(a) How do you use these specifications to compute the elasticity of food expenditure rela-
tive to the total expenditure?
(b) Discuss the relative statistical and theoretical merits of these specifications for the analy-
sis of food expenditure.
2 Multiple Regression
2.1 Introduction
This chapter considers the extension of the bivariate regression discussed in Chapter 1 to the case where more than one variable is available to explain/predict $y_t$, the dependent variable. The topic is known as multiple regression analysis, although only one relationship is in fact considered between $y_t$ and the $k$ explanatory variables, $x_{ti}$, for $i = 1, 2, \ldots, k$. The problem of multiple regressions, where $m$ sets of dependent (or endogenous) variables, $y_{tj}$, $j = 1, 2, \ldots, m$, are explained in terms of $x_{ti}$, for $i = 1, 2, \ldots, k$, will be considered in Chapter 19; it is known as multivariate analysis and includes topics such as canonical correlation and factor analysis. This chapter covers standard techniques such as ordinary least squares (OLS), examines the properties of OLS estimators under the classical assumptions, and discusses the Gauss–Markov theorem, the multiple correlation coefficient, the multicollinearity problem, partitioned regression, regressions that are nonlinear in variables, and the interpretation of coefficients.
\[
y_t = \sum_{j=1}^{k} \beta_j x_{tj} + u_t, \quad \text{for } t = 1, 2, \ldots, T, \tag{2.1}
\]
where $x_{t1}, x_{t2}, \ldots, x_{tk}$ are the $t$th observations on $k$ regressors. If the regression contains an intercept, then one of the $k$ regressors, say the first one $x_{t1}$, is set equal to unity for all $t$, namely $x_{t1} = 1$. The parameters $\beta_1, \beta_2, \ldots, \beta_k$, assumed to be fixed (i.e., time invariant), are the regression coefficients, and $u_t$ are the ‘disturbances’ or the ‘errors’ of the regression equation. The regression equation can also be written more compactly as
\[
y_t = \beta' x_t + u_t, \quad \text{for } t = 1, 2, \ldots, T, \tag{2.2}
\]
where $\beta = (\beta_1, \beta_2, \ldots, \beta_k)'$ and $x_t = (x_{t1}, x_{t2}, \ldots, x_{tk})'$. Stacking the equations for all the $T$ observations and using matrix notation, (2.1) or (2.2) can be written as (see Appendix A for an introduction to matrices and matrix operations)
\[
y = X\beta + u, \tag{2.3}
\]
where
\[
X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1k} \\ x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ x_{T1} & x_{T2} & \cdots & x_{Tk} \end{pmatrix}, \quad
y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_T \end{pmatrix}, \quad
u = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_T \end{pmatrix}.
\]
Assumption A4: Orthogonality: the disturbances ut and the regressors xt1 , xt2 , . . . , xtk are uncor-
related
Assumption A2 implies that the variances of ut s are constant also unconditionally, since,1
given that, under A4, E(ut |x1 , x2 , . . . , xT ) = 0. The assumption of constant conditional and
unconditional error variances is likely to be violated when dealing with cross-sectional regressions, while that of constant conditional error variances is often violated in the analysis of financial and macroeconomic time series, such as exchange rates, stock returns and interest rates. However, it is possible for errors to be unconditionally constant (time-invariant) but conditionally
Assumption A4(i) Orthogonality: the disturbances ut and the regressors xt1 , xt2 , . . . , xtk are
uncorrelated
Under these assumptions the regressors are said to be weakly exogenous, and allow for
lagged values of yt to be included in xt .
Adding assumption A5 to the classical model yields the classical linear normal regression
model. This model can also be derived using the joint distribution of yt , xt , and by assuming
2 Champernowne (1960) and Granger and Newbold (1974) provide Monte Carlo evidence on the spurious regression
problem, and Phillips (1986) establishes a number of theoretical results.
that this distribution is a multivariate normal with constant means, variances and covariances. In
this setting, the regression of yt on xt , defined as the mathematical expectation of yt conditional
on the realized values of the regressors, will be linear in the regressors. The linearity of the regres-
sion equation follows from the joint normality assumption and need not hold if this assumption
is relaxed. To be more precise suppose that
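\[
\begin{pmatrix} y_t \\ x_t \end{pmatrix} \sim N(\mu, \Sigma), \tag{2.4}
\]
where
\[
\mu = \begin{pmatrix} \mu_y \\ \mu_x \end{pmatrix}, \quad \text{and} \quad \Sigma = \begin{pmatrix} \sigma_{yy} & \sigma_{yx} \\ \sigma_{xy} & \Sigma_{xx} \end{pmatrix}.
\]
Then using known results from theory of multivariate normal distributions (see Appendix B for a summary and references) we have
\[
E(y_t|x_t) = \mu_y + \sigma_{yx} \Sigma_{xx}^{-1} (x_t - \mu_x),
\]
\[
\mathrm{Var}(y_t|x_t) = \sigma_{yy} - \sigma_{yx} \Sigma_{xx}^{-1} \sigma_{xy}.
\]
Under this setting, assuming that (2.2) includes an intercept, the regression coefficients $\beta$ will be given by $(\mu_y - \sigma_{yx}\Sigma_{xx}^{-1}\mu_x, \; \sigma_{yx}\Sigma_{xx}^{-1})'$. It is also easily seen that the regression errors associated with (2.4) are given by
\[
u_t = y_t - (\mu_y - \sigma_{yx}\Sigma_{xx}^{-1}\mu_x) - \sigma_{yx}\Sigma_{xx}^{-1} x_t,
\]
and, by construction, satisfy the classical assumptions. But note that no dynamic effects are allowed in the distribution of $(y_t, x_t')'$.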
Both of the above interpretations of the classical normal regression model have been used in
the literature (see, e.g., Spanos (1989)). We remark that the normality assumption A5 may be
important in small samples, but is not generally required when the sample under consideration
is large enough.
All the various departures from the classical normal regression model mentioned here will be
analysed in Chapters 3 to 6.
The necessary conditions for the minimization of $Q(\beta_1, \beta_2, \ldots, \beta_k)$ are given by
\[
\frac{\partial Q(\beta_1, \beta_2, \ldots, \beta_k)}{\partial \beta_s} = -2 \sum_{t=1}^{T} x_{ts} \left( y_t - \sum_{j=1}^{k} \hat{\beta}_j x_{tj} \right) = 0, \quad s = 1, 2, \ldots, k, \tag{2.6}
\]
where $\hat{\beta}_j$ is the OLS estimator of $\beta_j$. The $k$ equations in (2.6) are known as the ‘normal’ equations. Denoting the residuals by $\hat{u}_t = y_t - \sum_j \hat{\beta}_j x_{tj}$, the normal equations can be written as $\sum_{t=1}^{T} x_{ts} \hat{u}_t = 0$, for $s = 1, 2, \ldots, k$, or, in expanded form
\[
\sum_{t=1}^{T} x_{ts} y_t = \sum_{t=1}^{T} \sum_{j=1}^{k} \hat{\beta}_j x_{tj} x_{ts} = \sum_{j=1}^{k} \hat{\beta}_j \sum_{t=1}^{T} x_{tj} x_{ts}.
\]
Without the use of matrix notations, the study of the properties of multiple regression would be extremely tedious. In matrix form, the criterion function (2.5) to be minimized is
\[
Q(\beta) = (y - X\beta)'(y - X\beta), \tag{2.7}
\]
In the case where $\mathrm{Rank}(X) = r < k$, $\hat{\beta} = (X'X)^{-} X'y$, where $(X'X)^{-}$ represents the generalized inverse of $X'X$. In this case only $r$ linear combinations of the regression coefficients are uniquely determined.
Taking logs, we obtain the log-likelihood function for the classical linear regression model
\[
\ell_T(\theta) = \log L_T(\theta) = -\frac{T}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{t=1}^{T} (y_t - \beta' x_t)^2. \tag{2.10}
\]
\[
\begin{pmatrix} \partial \ell_T(\theta)/\partial \beta \\ \partial \ell_T(\theta)/\partial \sigma^2 \end{pmatrix}
= \begin{pmatrix} \frac{1}{\sigma^2} X'(y - X\beta) \\ -\frac{T}{2\sigma^2} + \frac{1}{2\sigma^4} (y - X\beta)'(y - X\beta) \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \end{pmatrix}.
\]
where $\tilde{u} = y - X\tilde{\beta}$. Notice that the estimator for the slope coefficients is identical to the OLS estimator (2.8), while the variance estimator differs from (2.14) by the divisor of $T$ instead of $T - k$. Clearly, the OLS estimator inherits all the asymptotic properties of the ML estimator. We refer to Chapter 9 for a review of the theory underlying the maximum likelihood approach, and to Chapter 19 for an extension of the above results to the case of multivariate regression.
The likelihood approach also forms the basis of Bayesian inference, where the likelihood is combined with prior distributions on the unknown parameters to obtain posterior probability distributions, which are then used for estimation and inference: see Section C.6 in Appendix C.
3 See also Section 1.8 where the likelihood approach is introduced for the analysis of bivariate regression models.
Therefore
\[
X'\hat{u} = X'My = 0, \tag{2.11}
\]
or $\sum_{t=1}^{T} x_{ts} \hat{u}_t = 0$, for $s = 1, 2, \ldots, k$, which are the normal equations of the regression problem. Therefore, the regressors are by construction ‘orthogonal’ to the vector of OLS residuals.
In the case where the regression equation contains an intercept term (i.e., when one of the $x_{tj}$'s is equal to 1 for all $t$) we also have
\[
\sum_{t=1}^{T} \hat{u}_t = T \left( \bar{y} - \hat{\beta}_1 \bar{x}_1 - \hat{\beta}_2 \bar{x}_2 - \ldots - \hat{\beta}_k \bar{x}_k \right) = 0,
\]
where $\bar{x}_j$ stands for the sample mean of the $j$th regressor, $x_{tj}$. This result follows directly from the normal equations $\sum_{t=1}^{T} x_{ts} \hat{u}_t = 0$, by choosing $x_{ts}$ to be the intercept term, namely setting $x_{ts} = 1$ in $\sum_{t=1}^{T} x_{ts} \hat{u}_t = 0$.
To summarize, the OLS residual vector, $\hat{u}$, has the following properties:
(i) By construction all the regressors are orthogonal to the residual vector, that is, $X'\hat{u} = 0$.
(ii) When the regression equation contains an intercept term, the residuals, $\hat{u}_t$, have mean zero exactly, i.e. $\sum_{t=1}^{T} \hat{u}_t = 0$. This result also implies that the regression plane goes through the sample mean of $y$ and the sample means of all the regressors.
(iii) Even if $u_t$ are homoskedastic and serially uncorrelated, the OLS residuals, $\hat{u}_t$, will be heteroskedastic and autocorrelated in small samples.
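Properties (i) and (ii) can be verified on a small example. The sketch below is a hedged illustration in plain Python: the data are invented, and the helper functions `solve` and `ols` are hypothetical names written for this example (they solve the normal equations $X'X\hat{\beta} = X'y$ by Gaussian elimination), not routines referred to in the text.

```python
def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        z[i] = (m[i][n] - sum(m[i][j] * z[j] for j in range(i + 1, n))) / m[i][i]
    return z


def ols(X, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yt for row, yt in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Illustrative data (assumed): an intercept and two regressors.
X = [[1.0, 2.0, 1.0], [1.0, 3.0, 5.0], [1.0, 5.0, 2.0],
     [1.0, 7.0, 8.0], [1.0, 8.0, 3.0], [1.0, 9.0, 7.0]]
y = [3.1, 6.2, 6.8, 12.1, 10.9, 14.2]
T, k = len(X), len(X[0])

beta_hat = ols(X, y)
resid = [yt - sum(b * xj for b, xj in zip(beta_hat, row)) for row, yt in zip(X, y)]

# Property (i): X'u_hat = 0, one entry per regressor.
xt_resid = [sum(row[s] * u for row, u in zip(X, resid)) for s in range(k)]

# Property (ii): the regression plane passes through the sample means.
means = [sum(row[j] for row in X) / T for j in range(k)]
gap = sum(y) / T - sum(b * m for b, m in zip(beta_hat, means))

print(beta_hat, xt_resid, gap)
```

Both `xt_resid` and `gap` should be zero up to floating-point rounding.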
û = My = M (Xβ + u) = Mu,
and
\[
E(\hat{u}\hat{u}') = E(Muu'M') = M E(uu') M'.
\]
But under the classical assumptions $E(uu') = \sigma^2 I_T$. Hence
\[
E(\hat{u}\hat{u}') = M(\sigma^2 I_T)M' = \sigma^2 MM' = \sigma^2 M,
\]
which is different from an identity matrix and establishes that $\hat{u}_t$ and $\hat{u}_{t'}$, $t \neq t'$, are neither uncorrelated nor homoskedastic. These properties of OLS residuals lie at the core of some of the difficulties encountered in practice in developing tests of the classical assumptions based on OLS residuals that perform well in small samples. Fortunately, the serial correlation and heteroskedasticity properties of OLS residuals tend to disappear in ‘large enough’ samples.
Therefore
\[
\mathrm{Var}(\hat{\beta}) = E\left[ (X'X)^{-1} X'uu'X (X'X)^{-1} \right].
\]
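and under assumptions A2 and A3, $E(uu'|X) = \sigma^2 I_T$. Therefore,
\[
E\left[ (X'X)^{-1} X'uu'X (X'X)^{-1} \,|\, X \right] = \sigma^2 (X'X)^{-1},
\]
and hence
\[
\mathrm{Var}(\hat{\beta}) = \sigma^2 E\left[ (X'X)^{-1} \right]. \tag{2.13}
\]
For given values of $X$ an estimator of $\mathrm{Var}(\hat{\beta})$ is
\[
\widehat{\mathrm{Var}}(\hat{\beta}) = \hat{\sigma}^2 (X'X)^{-1},
\]
where $\hat{\sigma}^2$ is
\[
\hat{\sigma}^2 = \frac{\sum_t \hat{u}_t^2}{T - k} = \frac{\hat{u}'\hat{u}}{T - k}, \tag{2.14}
\]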
with k being the number of regressors, including the intercept term. As in the case of the simple
regression model, σ̂ 2 is an unbiased estimator of σ 2 , namely E(σ̂ 2 ) = σ 2 . Unbiasedness of σ̂ 2
is easily established by noting that û = Mu and hence
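\[
\begin{aligned}
E(\hat{\sigma}^2) &= \frac{1}{T - k} E(u'Mu) \\
&= \frac{1}{T - k} E\left[ \mathrm{Tr}(u'Mu) \right] = \frac{1}{T - k} E\left[ \mathrm{Tr}(Muu') \right] \\
&= \frac{1}{T - k} \mathrm{Tr}\left[ M E(uu') \right] = \frac{1}{T - k} \mathrm{Tr}(\sigma^2 M).
\end{aligned}
\]
Noting that
\[
\begin{aligned}
\mathrm{Tr}(M) &= \mathrm{Tr}\left[ I_T - X(X'X)^{-1}X' \right] \\
&= \mathrm{Tr}(I_T) - \mathrm{Tr}\left[ (X'X)^{-1}X'X \right] = T - k,
\end{aligned}
\]
it follows that
\[
E(\hat{\sigma}^2) = \frac{\sigma^2 \, \mathrm{Tr}(M)}{T - k} = \sigma^2.
\]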
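The trace result, and the fact that $\sigma^2 M$ is not proportional to an identity matrix, can be illustrated numerically. The following plain-Python sketch uses an invented $6 \times 3$ regressor matrix; the helper `solve` is a hypothetical utility written for this example.

```python
def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        z[i] = (m[i][n] - sum(m[i][j] * z[j] for j in range(i + 1, n))) / m[i][i]
    return z


# Illustrative regressor matrix (assumed): an intercept and two regressors.
X = [[1.0, 2.0, 1.0], [1.0, 3.0, 5.0], [1.0, 5.0, 2.0],
     [1.0, 7.0, 8.0], [1.0, 8.0, 3.0], [1.0, 9.0, 7.0]]
T, k = len(X), len(X[0])
xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]

# For each observation t, a_t = (X'X)^{-1} x_t solves (X'X) a_t = x_t.
cols = [solve(xtx, row) for row in X]

# M = I_T - X (X'X)^{-1} X', element by element.
M = [[(1.0 if s == t else 0.0) - sum(X[s][j] * cols[t][j] for j in range(k))
      for t in range(T)] for s in range(T)]

trace_M = sum(M[t][t] for t in range(T))
MM = [[sum(M[s][r] * M[r][t] for r in range(T)) for t in range(T)] for s in range(T)]
max_idem_err = max(abs(MM[s][t] - M[s][t]) for s in range(T) for t in range(T))
diag = [M[t][t] for t in range(T)]

print(trace_M, max_idem_err, diag)
```

The diagonal elements of $M$ differ across observations, which is the sense in which the OLS residuals are heteroskedastic in small samples even when the disturbances are not.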
The estimator of $\mathrm{Cov}(\hat{\beta}_j, \hat{\beta}_s)$ is given by the $(j, s)$th element of the matrix $\hat{\sigma}^2 (X'X)^{-1}$.
where we have set the first variable, xt1 , equal to unity to allow for an intercept in the regres-
sion. To simplify the derivations we work with variables in terms of their deviations from their
respective sample means. Summing the equation (2.15) over t and dividing by the sample size, T,
yields:
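where $\bar{y} = \sum_t y_t / T$, $\bar{x}_2 = \sum_t x_{t2} / T$, $\bar{x}_3 = \sum_t x_{t3} / T$, $\bar{u} = \sum_t u_t / T$ are the sample means.
Subtracting (2.16) from (2.15) we obtain
where
\[
S_{js} = \sum_t (x_{tj} - \bar{x}_j)(x_{ts} - \bar{x}_s) = \sum_t (x_{tj} - \bar{x}_j) x_{ts}, \quad j, s = 2, 3,
\]
\[
S_{jy} = \sum_t (x_{tj} - \bar{x}_j) y_t, \quad j = 2, 3,
\]
\[
\begin{pmatrix} S_{22} & S_{23} \\ S_{23} & S_{33} \end{pmatrix}^{-1} = \frac{1}{S_{22} S_{33} - S_{23}^2} \begin{pmatrix} S_{33} & -S_{23} \\ -S_{23} & S_{22} \end{pmatrix}.
\]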
Hence
The estimator of β 1 , the intercept term, can now be obtained recalling that the regression plane goes
through the sample means when the equation has an intercept term. Namely
ȳ = β̂ 1 + β̂ 2 x̄2 + β̂ 3 x̄3 ,
and hence
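The estimates of the variances and the covariance of $\hat{\beta}_2$ and $\hat{\beta}_3$ are given by [using (2.12) and (2.13)]
\[
\widehat{\mathrm{Cov}} \begin{pmatrix} \hat{\beta}_2 \\ \hat{\beta}_3 \end{pmatrix} = \hat{\sigma}^2 \begin{pmatrix} S_{22} & S_{23} \\ S_{23} & S_{33} \end{pmatrix}^{-1},
\]
or
\[
\widehat{\mathrm{Var}}(\hat{\beta}_2) = \frac{\hat{\sigma}^2 S_{33}}{S_{22} S_{33} - S_{23}^2}, \tag{2.20}
\]
\[
\widehat{\mathrm{Var}}(\hat{\beta}_3) = \frac{\hat{\sigma}^2 S_{22}}{S_{22} S_{33} - S_{23}^2}, \tag{2.21}
\]
and
\[
\widehat{\mathrm{Cov}}(\hat{\beta}_2, \hat{\beta}_3) = -\frac{\hat{\sigma}^2 S_{23}}{S_{22} S_{33} - S_{23}^2}. \tag{2.22}
\]
Finally,
\[
\hat{\sigma}^2 = \frac{\sum_t \hat{u}_t^2}{T - 3} = \frac{\sum_t (y_t - \hat{\beta}_1 - \hat{\beta}_2 x_{t2} - \hat{\beta}_3 x_{t3})^2}{T - 3}. \tag{2.23}
\]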
Notice that the denominator of σ̂ 2 is T − 3, as we have estimated three coefficients, namely the
intercept term, β 1 , and the two regression coefficients, β 2 and β 3 .
\[
\beta^{*} = \hat{\beta} + Cy, \tag{2.24}
\]
where C is a k × T matrix with elements possibly depending on X, but not on y. It is clear that
β ∗ is a linear estimator. Also since β̂ is an unbiased estimator of β, for β ∗ to be an unbiased
estimator we need
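\[
E(\beta^{*}) = E(\hat{\beta}) + C E(y) = \beta + C E(y) = \beta,
\]
or that $C E(y) = 0$, which in turn implies that
\[
C E(y) = C(X\beta + E(u)) = CX\beta = 0, \tag{2.25}
\]
or
\[
\beta^{*} - \beta = CX\beta + \left[ (X'X)^{-1}X' + C \right] u.
\]
However, since $CX\beta = 0$ for all parameter values, $\beta$, then we should also have $CX = 0$, and
\[
\mathrm{Var}(\beta^{*}) = \sigma^2 (X'X)^{-1} + \sigma^2 CC'.
\]
Therefore
\[
\mathrm{Var}(\beta^{*}) - \mathrm{Var}(\hat{\beta}) = \sigma^2 CC',
\]
\[
\delta = \lambda'\beta,
\]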
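where $\lambda$ is a $k \times 1$ vector of fixed coefficients. Denote the OLS estimator of $\delta$ by $\hat{\delta}$, and the alternative linear unbiased estimator by $\delta^{*}$. We have
\[
\hat{\delta} = \lambda'\hat{\beta}, \quad \delta^{*} = \lambda'\beta^{*},
\]
and
\[
\begin{aligned}
\mathrm{Var}(\delta^{*}) - \mathrm{Var}(\hat{\delta}) &= \lambda' \mathrm{Var}(\beta^{*}) \lambda - \lambda' \mathrm{Var}(\hat{\beta}) \lambda \\
&= \lambda' \left[ \mathrm{Var}(\beta^{*}) - \mathrm{Var}(\hat{\beta}) \right] \lambda.
\end{aligned}
\]
But we have already shown that $\mathrm{Var}(\beta^{*}) - \mathrm{Var}(\hat{\beta})$ is a positive semi-definite matrix. Therefore,
\[
\mathrm{Var}(\delta^{*}) - \mathrm{Var}(\hat{\delta}) \geq 0.
\]
A number of other interesting results also follow from this last inequality. Setting $\lambda = (1, 0, \ldots, 0)'$, for example, gives
\[
\delta = \lambda'\beta = \beta_1,
\]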
where $\tilde{\beta}$ denotes an alternative estimator to the OLS estimator, $\hat{\beta}$, and $\beta_0$ is the true value of $\beta$. To see the bias-variance trade-off we first note that
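\[
\begin{aligned}
E\left[ (\tilde{\beta} - \beta_0)(\tilde{\beta} - \beta_0)' \right]
&= E\left\{ \left[ \tilde{\beta} - E(\tilde{\beta}) - (\beta_0 - E(\tilde{\beta})) \right] \left[ \tilde{\beta} - E(\tilde{\beta}) - (\beta_0 - E(\tilde{\beta})) \right]' \right\} \\
&= E\left\{ \left[ \tilde{\beta} - E(\tilde{\beta}) \right] \left[ \tilde{\beta} - E(\tilde{\beta}) \right]' \right\} + E\left\{ \left[ \beta_0 - E(\tilde{\beta}) \right] \left[ \beta_0 - E(\tilde{\beta}) \right]' \right\} \\
&\quad - E\left\{ \left[ \tilde{\beta} - E(\tilde{\beta}) \right] \left[ \beta_0 - E(\tilde{\beta}) \right]' \right\} - E\left\{ \left[ \beta_0 - E(\tilde{\beta}) \right] \left[ \tilde{\beta} - E(\tilde{\beta}) \right]' \right\}.
\end{aligned}
\]
But $\beta_0 - E(\tilde{\beta})$ is a constant (i.e., non-stochastic), and can be taken outside of the expectations operator. Also
\[
E\left\{ \left[ \tilde{\beta} - E(\tilde{\beta}) \right] \left[ \tilde{\beta} - E(\tilde{\beta}) \right]' \right\} = \mathrm{Var}(\tilde{\beta}),
\]
and by construction
\[
E\left[ \tilde{\beta} - E(\tilde{\beta}) \right] = 0.
\]
Hence
\[
\mathrm{MSE}(\tilde{\beta}) = \mathrm{Var}(\tilde{\beta}) + \left[ \beta_0 - E(\tilde{\beta}) \right] \left[ \beta_0 - E(\tilde{\beta}) \right]'.
\]
Namely, the $\mathrm{MSE}(\tilde{\beta})$ can be decomposed into a variance term plus the square of the bias. In principle it is clearly possible to find an estimator for $\beta$ with lower variance at the expense of some bias, leading to a reduction in the overall MSE. This result has been used by James and Stein (1961) to propose a biased estimator for $\beta$ such that its MSE is smaller than the MSE of $\hat{\beta}$. Specifically, they considered the estimator
\[
\tilde{\beta}_j = \left[ 1 - \frac{(k - 2)\sigma^2}{\hat{\beta}'(X'X)\hat{\beta}} \right] \hat{\beta}_j, \quad j = 1, 2, \ldots, k,
\]
obtained by minimizing the overall MSE of $\tilde{\beta}$. James and Stein proved that this estimator, by shrinking the OLS estimator towards zero, has an MSE smaller than the MSE of the OLS estimator when $k > 2$. For further details see, for example, Draper and van Nostrand (1979) and Gruber (1998).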
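The bias-variance trade-off can be seen in the simplest scalar case. The sketch below is an illustration only and is not the James–Stein estimator: it assumes a single unbiased estimator with known variance, scales it by a fixed constant `c`, and evaluates the MSE as variance plus squared bias. The parameter values are hypothetical.

```python
def mse_of_scaled_estimator(c, beta0, var_ols):
    """MSE of c * beta_hat when beta_hat is unbiased for beta0 with variance var_ols."""
    variance = c ** 2 * var_ols     # Var(c * beta_hat)
    bias = (c - 1.0) * beta0        # E(c * beta_hat) - beta0
    return variance + bias ** 2


beta0, var_ols = 2.0, 1.0           # assumed true value and OLS variance
for c in (1.0, 0.9, 0.8, 0.5):
    print(c, mse_of_scaled_estimator(c, beta0, var_ols))
```

Mild shrinkage (for example `c = 0.8` here) lowers the MSE below that of the unbiased estimator, while heavy shrinkage raises it because the squared bias dominates.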
\[
\hat{\beta} = \beta + (X'X)^{-1} X'u,
\]
and since under assumptions A1–A5, $u \sim N(0, \sigma^2 I_T)$, then recalling that $X'X$ is a positive definite matrix, we have
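\[
\hat{\beta} - \beta \sim N\left[ 0, \sigma^2 (X'X)^{-1} \right].
\]
Equivalently,
\[
(X'X)^{1/2} \left( \frac{\hat{\beta} - \beta}{\sigma} \right) \sim N(0, I_k),
\]
and
\[
(\hat{\beta} - \beta)' \left( \frac{X'X}{\sigma^2} \right) (\hat{\beta} - \beta) \sim \chi^2_k, \tag{2.26}
\]
where $\chi^2_k$ stands for the central chi-square distribution with $k$ degrees of freedom. The above result also follows unconditionally.
Consider now the distribution of $\hat{\sigma}^2$, the unbiased estimator of $\sigma^2$, given by (2.14). We note that $\hat{u} = Mu$, where $M = I_T - X(X'X)^{-1}X'$ is an idempotent matrix with rank $T - k$. Then the singular value decomposition of $M$ is given by $GMG' = \Lambda$, where $G$ is an orthonormal matrix such that $GG' = I_T$, and
\[
\Lambda = \begin{pmatrix} I_{T-k} & 0 \\ 0 & 0 \end{pmatrix}.
\]
Hence
\[
\frac{(T - k)\hat{\sigma}^2}{\sigma^2} = \frac{\hat{u}'\hat{u}}{\sigma^2} = \frac{u'Mu}{\sigma^2} = \xi'\Lambda\xi,
\]
where $\xi = \sigma^{-1} G u \sim N(0, I_T)$, so that
\[
\frac{(T - k)\hat{\sigma}^2}{\sigma^2} = \sum_{i=1}^{T-k} \xi_i^2,
\]
\[
\frac{(T - k)\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{T-k}. \tag{2.27}
\]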
where F(k, T − k) stands for the central F-distribution with k and T − k degrees of freedom.
This result follows immediately from the definition of F-distribution, which is given by the ratio
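of two independent chi-squared variates corrected for their respective degrees of freedom (see Appendix B). In the present application, the two chi-squared distributions are
\[
\frac{(T - k)\hat{\sigma}^2}{\sigma^2} = \frac{u'Mu}{\sigma^2} \sim \chi^2_{T-k},
\]
and
\[
\begin{aligned}
(\hat{\beta} - \beta)' \left( \frac{X'X}{\sigma^2} \right) (\hat{\beta} - \beta) &= u'X(X'X)^{-1} \left( \frac{X'X}{\sigma^2} \right) (X'X)^{-1} X'u \\
&= \frac{u'(I_T - M)u}{\sigma^2} \sim \chi^2_k.
\end{aligned}
\]
The independence of $u'Mu$ and $u'(I_T - M)u$ follows from the fact that $(I_T - M)M = M - M^2 = M - M = 0$.
The above results can be readily adapted for deriving the distribution of linear subsets of $\hat{\beta}$. Suppose we are interested in the distribution of $R\hat{\beta}$, where $R$ is an $r \times k$ matrix of fixed constants with rank $r \leq k$. Then
\[
\frac{(R\hat{\beta} - R\beta)' \left[ R(X'X)^{-1}R' \right]^{-1} (R\hat{\beta} - R\beta)}{r\hat{\sigma}^2}
= \frac{(R\hat{\beta} - R\beta)' \left[ R \, \widehat{\mathrm{Var}}(\hat{\beta}) R' \right]^{-1} (R\hat{\beta} - R\beta)}{r} \sim F(r, T - k). \tag{2.28}
\]
In the case where $r = 1$, the $F$-test reduces to a $t$-test. For example, by setting $R = (1, 0, \ldots, 0)$, the above result implies
\[
\frac{(\hat{\beta}_1 - \beta_1)^2}{\widehat{\mathrm{Var}}(\hat{\beta}_1)} \sim F(1, T - k),
\]
which in turn yields the familiar $t$-test statistic, given by $(\hat{\beta}_1 - \beta_1) / \sqrt{\widehat{\mathrm{Var}}(\hat{\beta}_1)} \sim t_{T-k}$.
\[
R^2 = \frac{\sum_t (\hat{y}_t - \bar{y})^2}{\sum_t (y_t - \bar{y})^2}. \tag{2.29}
\]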
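As in the case of the simple regression equation, the total variation of $y$, measured by $S_{yy} = \sum_t (y_t - \bar{y})^2$, can be decomposed into that explained by the regression equation, $\sum_t (\hat{y}_t - \bar{y})^2$, and the rest:⁴
\[
\sum_t (y_t - \bar{y})^2 = \sum_t (\hat{y}_t - \bar{y})^2 + \sum_t (y_t - \hat{y}_t)^2.
\]
or
\[
R^2 = 1 - \frac{\sum_t (y_t - \hat{y}_t)^2}{\sum_t (y_t - \bar{y})^2} = 1 - \frac{\sum_t \hat{u}_t^2}{S_{yy}} = 1 - \frac{\hat{u}'\hat{u}}{S_{yy}}, \tag{2.30}
\]
\[
\bar{R}^2 = 1 - \frac{\hat{\sigma}^2}{S_{yy} / (T - 1)}.
\]
This ‘adjusted’ measure provides a trade-off between fit, as measured by $R^2$, and parsimony as measured by $T - k$. To make this trade-off more explicit $\bar{R}^2$ is also often defined as
\[
1 - \bar{R}^2 = \left( \frac{T - 1}{T - k} \right) (1 - R^2). \tag{2.32}
\]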
4 The proof is similar to that presented in Chapter 1 for the bivariate regression model and will not be repeated here.
5 When the regression equation does not contain an intercept term, R2 can become negative.
All the above three definitions of R̄2 are algebraically equivalent. Note that, unlike R2 , there is
no guarantee for the R̄2 to be non-negative, and hence R̄ is not always defined.
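The equivalence of the definitions of $R^2$ and $\bar{R}^2$ can be checked on a small example. The Python sketch below is an illustration with invented data; for brevity it uses a regression with an intercept and a single slope (so $k = 2$), fitted with the simple-regression formulae of Chapter 1.

```python
x = [2.0, 3.0, 5.0, 7.0, 8.0, 9.0]          # illustrative regressor (assumed)
y = [3.1, 6.2, 6.8, 12.1, 10.9, 14.2]       # illustrative dependent variable (assumed)
T, k = len(y), 2                            # k counts the intercept and one slope

x_bar = sum(x) / T
y_bar = sum(y) / T
beta = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum((a - x_bar) ** 2 for a in x)
alpha = y_bar - beta * x_bar
fitted = [alpha + beta * a for a in x]

s_yy = sum((b - y_bar) ** 2 for b in y)              # total variation
ess = sum((f - y_bar) ** 2 for f in fitted)          # explained variation
ssr = sum((b - f) ** 2 for b, f in zip(y, fitted))   # residual sum of squares

r2 = 1.0 - ssr / s_yy                                # as in (2.30)
sigma2_hat = ssr / (T - k)
r2_bar = 1.0 - sigma2_hat / (s_yy / (T - 1))         # adjusted R^2
r2_bar_alt = 1.0 - (T - 1) / (T - k) * (1.0 - r2)    # via (2.32)

print(r2, r2_bar, r2_bar_alt)
```

The two adjusted measures coincide, and the adjusted value never exceeds the unadjusted one.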
In applied econometrics, R̄2 is often used as a criterion of model selection. However, its use
can be justified when the regression models under consideration are non-nested, in the sense
that none of the models under consideration can be obtained from the others by means of some
suitable parametric restrictions. In the case where the models are nested, a more suitable pro-
cedure would be to apply classical hypotheses testing procedures and test the models against
one another by means of F- or t-tests. (See Chapter 3 on hypotheses testing in linear regression
models.)
Remark 1 When $y_t$ is trended (upward or downward) it is possible to obtain an $R^2$ very close to unity, irrespective of whether the trend is deterministic or stochastic. This is because the denominator of $R^2$, namely $S_{yy} = \sum_t (y_t - \bar{y})^2$, implicitly assumes that $y_t$ is stationary with a constant mean and variance (see Chapter 12 for definition of stationarity). In the case of trended variables a more appropriate measure of fit would be to define $R^2$ with respect to the first differences of $y_t$, $\Delta y_t = y_t - y_{t-1}$, namely
\[
R^2_{\Delta y} = 1 - \frac{\hat{u}'\hat{u}}{\sum_t (\Delta y_t - \overline{\Delta y})^2},
\]
where $\overline{\Delta y} = \sum_t \Delta y_t / T$. This measure is applicable irrespective of whether $y_t$ is trend-stationary (namely when its deviations from a deterministic trend line are stationary), or first difference stationary. A variable is said to be first difference stationary if it must be first differenced once before it becomes stationary (see Chapter 15 for further details). The following simple relation exists between $R^2$ and $R^2_{\Delta y}$:
\[
1 - R^2_{\Delta y} = \left[ \frac{\sum_t (y_t - \bar{y})^2}{\sum_t (\Delta y_t - \overline{\Delta y})^2} \right] (1 - R^2).
\]
Since in the case of trended $y_t$, for modest values of $T$, the sum $\sum_t (y_t - \bar{y})^2$ will most certainly be substantially larger than $\sum_t (\Delta y_t - \overline{\Delta y})^2$, it then follows that in practice $R^2_{\Delta y}$ will be less than $R^2$, often by substantial amounts. Also as $T$ tends to infinity $R^2$ will tend to unity, but $R^2_{\Delta y}$ will remain bounded away from unity. An alternative approach to arriving at a plausible measure of fit in the case of trended variables would be to ensure that the dependent variable of the regression is stationary by running regressions of first differences, $\Delta y_t$, on the regressors, $x_t$, of interest. But in that case it is important that lagged values of $y_t$ are also included amongst the regressors, namely a dynamic specification should be considered. This naturally leads to the analysis of error correction specifications to be discussed in Chapters 6, 23, and 24.
y = Xβ + u, (2.33)
and suppose that $X$ is partitioned into two sub-matrices $X_1$ and $X_2$ of order $T \times k_1$ and $T \times k_2$ such that $k = k_1 + k_2$.⁶ Partitioning $\beta$ conformably with $X = (X_1 \,\vdots\, X_2)$ we have
\[
y = X_1\beta_1 + X_2\beta_2 + u. \tag{2.34}
\]
Such partitioned regressions arise, for example, when $X_1$ is composed of seasonal dummy variables or time trends, and $X_2$ contains the regressors of interest, or the ‘focus’ regressors. The OLS estimators of $\beta_1$ and $\beta_2$ are given by the normal equations
\[
X_1'y = X_1'X_1\hat{\beta}_1 + X_1'X_2\hat{\beta}_2, \tag{2.35}
\]
\[
X_2'y = X_2'X_1\hat{\beta}_1 + X_2'X_2\hat{\beta}_2. \tag{2.36}
\]
where
\[
M_j = I_T - X_j(X_j'X_j)^{-1}X_j', \quad \text{for } j = 1, 2.
\]
The estimators of the ‘focus’ coefficients, $\hat{\beta}_2$, can also be written as (recall that $M_j$ are symmetric and idempotent: $M_j = M_j' = M_j^2$):
\[
\hat{\beta}_2 = \left[ (M_1X_2)'(M_1X_2) \right]^{-1} (M_1X_2)'y,
\]
or
\[
\hat{\beta}_2 = (\tilde{X}_2'\tilde{X}_2)^{-1} \tilde{X}_2'\tilde{y},
\]
where $\tilde{X}_2 = M_1X_2$ and $\tilde{y} = M_1y$ are the residual matrices and vectors of the regressions of $X_2$ on $X_1$ and of $y$ on $X_1$, respectively. The residuals from the regression of $\tilde{y} = y - \hat{y}$ on $\tilde{X}_2 = X_2 - \hat{X}_2$ are also given by $\tilde{u} = \tilde{y} - \tilde{X}_2\hat{\beta}_2$. It is now easily seen that $\tilde{u}$ is in fact the same as the OLS residual vector from the unpartitioned regression of $y$ on $X$.⁷ Therefore, a regression of $\tilde{y}$ on $\tilde{X}_2$ yields the same estimate for $\beta_2$ as the standard regression of $y$ on $X_1$ and $X_2$ simultaneously and
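6 See Section A.9 in Appendix A for a description of partitioned matrices and their properties.
7 Notice that
\[
\begin{aligned}
\tilde{u} &= \left[ I - X_1(X_1'X_1)^{-1}X_1' \right] y - \left[ I - X_1(X_1'X_1)^{-1}X_1' \right] X_2\hat{\beta}_2 \\
&= y - X_2\hat{\beta}_2 - X_1(X_1'X_1)^{-1} \left( X_1'y - X_1'X_2\hat{\beta}_2 \right).
\end{aligned}
\]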
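But using (2.35), $X_1'y - X_1'X_2\hat{\beta}_2 = X_1'X_1\hat{\beta}_1$, and hence
\[
\begin{aligned}
\tilde{u} &= y - X_2\hat{\beta}_2 - X_1(X_1'X_1)^{-1}X_1'X_1\hat{\beta}_1 \\
&= y - X_1\hat{\beta}_1 - X_2\hat{\beta}_2 = \hat{u}.
\end{aligned}
\]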
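The partitioned regression result can be checked numerically. The sketch below is a hedged illustration in plain Python with invented data: `X1` holds a constant and one nuisance regressor, `x2` is a single focus regressor, and `solve`, `ols` and `residuals` are hypothetical helper names written for this example.

```python
def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        z[i] = (m[i][n] - sum(m[i][j] * z[j] for j in range(i + 1, n))) / m[i][i]
    return z


def ols(X, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yt for row, yt in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def residuals(X, y):
    """OLS residuals from regressing y on the columns of X."""
    b = ols(X, y)
    return [yt - sum(bj * xj for bj, xj in zip(b, row)) for row, yt in zip(X, y)]


# Illustrative data (assumed).
X1 = [[1.0, 2.0], [1.0, 3.0], [1.0, 5.0], [1.0, 7.0], [1.0, 8.0], [1.0, 9.0]]
x2 = [1.0, 5.0, 2.0, 8.0, 3.0, 7.0]
y = [3.1, 6.2, 6.8, 12.1, 10.9, 14.2]

# Unpartitioned regression of y on (X1, x2).
X = [r1 + [v] for r1, v in zip(X1, x2)]
beta_full = ols(X, y)
u_hat = residuals(X, y)

# Partitioned route: residualize y and x2 on X1, then regress residual on residual.
y_tilde = residuals(X1, y)
x2_tilde = residuals(X1, x2)
beta2_part = sum(a * b for a, b in zip(x2_tilde, y_tilde)) / sum(a * a for a in x2_tilde)
u_tilde = [a - beta2_part * b for a, b in zip(y_tilde, x2_tilde)]

print(beta_full[2], beta2_part)
```

Up to rounding, the coefficient on the focus regressor and the residual vector are the same under both routes.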
the analysis of dynamic models and have been discussed by Koop, Pesaran, and Potter (1996)
and Pesaran and Shin (1998) and will be addressed in Chapter 24.
To illustrate Pesaran and Smith (2014)’s argument consider the following simple classical lin-
ear regression model with two regressors:
yt = β 0 + β 1 x1t + β 2 x2t + ut .
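Suppose further that $x_{1t}$ and $x_{2t}$ are random draws from a bivariate normal distribution with the covariance matrix
\[
\mathrm{Var} \begin{pmatrix} x_{1t} \\ x_{2t} \end{pmatrix} = \begin{pmatrix} \sigma_{11} & \sigma_{21} \\ \sigma_{21} & \sigma_{22} \end{pmatrix}.
\]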
\[
y_t = \beta_0 + \beta_1 x_t + \beta_2 x_t^2 + u_t. \tag{2.39}
\]
Here it clearly does not make any sense to ask what is the effect on $y_t$ of a change in $x_t$, holding $x_t^2$ fixed. In this case we have
\[
\Delta E(y_t|x_t) = (\beta_1 + 2\beta_2 x_t) \Delta x_t, \tag{2.40}
\]
yt = α + β 1 xt + β 2 zt + ut , (2.41)
yt = α + βxt + ε t , (2.42)
which omits the regressor zt . The new error, εt , now contains the effect of the omitted variable
and the orthogonality assumption that requires xt and ε t to be uncorrelated might no longer
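hold. To see this consider the OLS estimator of $\beta$ in (2.42), which is given by
\[
\hat{\beta} = \frac{\sum_{t=1}^{T} (x_t - \bar{x})(y_t - \bar{y})}{\sum_{t=1}^{T} (x_t - \bar{x})^2}.
\]
Hence
\[
\hat{\beta} = \beta_1 + \beta_2 \frac{\sum_t (x_t - \bar{x})(z_t - \bar{z})}{\sum_t (x_t - \bar{x})^2} + \frac{\sum_t (x_t - \bar{x})(u_t - \bar{u})}{\sum_t (x_t - \bar{x})^2},
\]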
where bx•z stands for the OLS estimator of the regression coefficient of xt on zt . In general, there-
fore, β̂ is not an unbiased estimator of β 1 [the ‘true’ regression coefficient of xt in (2.41)]. The
extent of the bias depends on the importance of the zt variable as measured by β 2 and the degree
of the dependence of xt on zt . Only in the case where xt and zt are uncorrelated β̂ will yield an
unbiased estimator of β 1 . See Section 3.13 on the effects of omitting relevant regressors on test-
ing hypothesis involving the regression coefficients.
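The decomposition of $\hat{\beta}$ above can be made concrete with a small Python sketch. The data and parameter values are invented for illustration, and the disturbances are set to zero so that the third term vanishes and the omitted-variable term can be seen exactly; the name `ratio_xz` for $\sum_t (x_t - \bar{x})(z_t - \bar{z}) / \sum_t (x_t - \bar{x})^2$ is a label chosen here.

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]           # included regressor (assumed values)
z = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]           # omitted regressor, correlated with x
alpha, beta1, beta2 = 1.0, 0.5, 2.0          # assumed 'true' parameters of (2.41)
y = [alpha + beta1 * a + beta2 * b for a, b in zip(x, z)]   # u_t set to zero

T = len(x)
x_bar, z_bar, y_bar = sum(x) / T, sum(z) / T, sum(y) / T
s_xx = sum((a - x_bar) ** 2 for a in x)
s_xz = sum((a - x_bar) * (b - z_bar) for a, b in zip(x, z))
s_xy = sum((a - x_bar) * (c - y_bar) for a, c in zip(x, y))

beta_short = s_xy / s_xx      # OLS slope in the regression that omits z, as in (2.42)
ratio_xz = s_xz / s_xx        # sample co-movement of z with x

print(beta_short, beta1 + beta2 * ratio_xz)
```

With these numbers the short-regression slope differs from the assumed $\beta_1$ by exactly $\beta_2$ times `ratio_xz`; the two would coincide only if $x_t$ and $z_t$ were uncorrelated in the sample.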
The omitted regressor bias can be readily generalized to the case where two or more relevant
regressors are omitted. The appropriate set up is the partitioned regression equation given in
(2.34). Suppose that in that equation the regressors X2 are incorrectly omitted and β 1 is esti-
mated by
yt = α + βxt + ut ,
y t = α + β 1 xt + β 2 z t + ε t .
The OLS estimator of β 1 in this regression will still be unbiased, but will no longer be an efficient
estimator. There will also be the possibility of a multicollinearity problem that can arise if the
erroneously included regressor, zt , is highly correlated with xt (see Section 15.3.1). In general
suppose that the correct regression model is
y = Xβ + u, (2.44)
but $\beta$ is estimated by running the expanded regression of $y$ on $X$ and $Z$. The OLS estimator of the coefficients of $X$ in this regression, say $\hat{\beta}_1$, is given by (see also (2.37))
\[
\hat{\beta}_1 = (X'M_zX)^{-1} X'M_z y,
\]
Therefore, so long as $Z$ as well as $X$ are strictly exogenous and the orthogonality assumption $E(u|X, Z) = 0$ is satisfied we obtain
\[
E(\hat{\beta}_1 - \beta \,|\, X, Z) = 0,
\]
or unconditionally
\[
E(\hat{\beta}_1) = \beta.
\]
Notice, however, that the additional variables in $Z$ cannot be weakly exogenous. For example, adding lagged values of $y_t$ to the regressors in error can lead to biased estimators.
\[
y_i = \alpha + \beta x_i + \gamma x_i^{2} + u_i.
\]
To transform this nonlinear relation to a linear regression model, set zi = xi² and write the quadratic equation as
\[
y_i = \alpha + \beta x_i + \gamma z_i + u_i,
\]
which is a linear regression in the two regressors xi and zi . Other examples of nonlinear relations
that are transformable to linear regressions are general polynomial regressions, logistic models,
log-linear, semi-log-linear and inverse models. Here we examine some of these models in more
detail.
\[
Q_i = A L_i^{\alpha} K_i^{\beta}\exp(u_i),
\]
where Qi is output of firm i, Li and Ki are the quantities of labour and capital used in the production process, and ui are independently distributed productivity shocks. Taking logarithms of both sides now yields the linear logarithmic specification
\[
y_i = a + \alpha x_{1i} + \beta x_{2i} + u_i,
\]
with yi = log Qi, a = log A, x1i = log Li and x2i = log Ki, which is a linear regression equation in the two regressors x1i and x2i. The estimate of A can now be obtained by Â = exp(â), where â is the OLS estimate of the intercept term in the above regression.
Example 3 (Logistic function with a known saturation level) The logistic model has the general form
\[
Y_i = \frac{A}{1+\gamma x_i^{\beta}\exp(u_i)}, \qquad \beta,\gamma > 0,\; x_i > 0,
\]
where A is the saturation level of Y, which is assumed to be known. We also assume that A > Yi ,
for all i. This is clearly a nonlinear model in terms of Y and x. To transform this model into a linear
regression model in terms of the unknown parameters γ and β, we first note that
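\[
\gamma x_i^{\beta}\exp(u_i) = \frac{A}{Y_i} - 1,
\]
and taking logarithms of both sides gives
\[
y_i = \log\left(\frac{A-Y_i}{Y_i}\right) = \alpha + \beta\log x_i + u_i,
\]
where α = log γ.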
Other examples of nonlinear functions that can be transformed into linear regressions include the semi-logarithmic model
\[
y_i = \alpha + \beta\log x_i + u_i,
\]
and the inverse model
\[
y_i = \alpha + \frac{\beta}{x_i} + u_i.
\]
These models have proved very useful in cross-section studies of household consumption
behaviour.
2.16 Exercises
1. Suppose that in the classical regression model yi = α +βxi +ui the true value of the constant,
α, is zero. Compare the variance of the OLS estimator for β computed without a constant term
with that of the OLS estimator for β computed with the constant term.
2. Consider the following linear regression model
yt = α + βxt + γ wt + ut . (2.45)
Suppose that the classical assumptions are applicable to (2.45), but β is estimated by running an OLS regression of yt on a vector of ones and xt. Denote such an estimator by β̃, and show that β̃ is a biased estimator of β in (2.45). Derive the formula for the bias of β̃ in terms of the correlation coefficient of xt and wt, and their variances, namely ρxw, σx², σw².
3. Consider the following partitioned classical linear regression model:
y = X1 β 1 + X2 β 2 + u,
(a) Show that if we omit the variables included in X2 , and estimate β 1 by running a regression
of y on X1 only, then β̂ 1 is generally biased with the bias:
where X = (X1 , X2 ).
(b) Interpret the elements of matrix P12. Under what conditions will β̂1 be unbiased?
(c) A researcher is estimating the demand equation for furniture using cross-section data. As
regressors she uses an intercept term, the relative price of furniture, and omits the relevant
income variable. Find an expression for the bias of the OLS estimate of the price variable
in such a regression. What other regressors should she have considered, and how could
their omission have affected her estimate of the price effect?
and suppose that the observations (yt , x1t , x2t ), for t = 1, 2, . . . , T are available.
(a) Specify the assumptions under which (2.46) can be viewed as a classical linear regres-
sion model. In your response clearly distinguish between the cases where x1t and x2t are
fixed in repeated samples, strictly exogenous, and weakly exogenous (see Chapter 9 for
definition of strictly exogenous, and weakly exogenous regressors).
(b) Suppose that the classical assumptions are applicable to (2.46), but β1 is estimated by running an OLS regression of yt on a vector of ones and x1t, and β2 is estimated by running an OLS regression of yt on a vector of ones and x2t. Denote these estimators by β̃yx1 and β̃yx2. Show that in general β̃yx1 and β̃yx2 are biased estimators of β1 and β2 in (2.46).
(c) Denote the OLS estimators of β 1 and β 2 in the regression of yt on x1t and x2t as in (2.46)
by β̂ 1 and β̂ 2 , respectively. Show that
where s1 and s2 are the standard deviations of x1t and x2t , respectively, and r denotes the
correlation coefficients of x1t and x2t . Discuss the relevance of these results for empirical
time series research.
y = Xβ + u, u ∼ N(0, σ 2 IT ),
(a) Let λmax(X′X) and λmin(X′X) denote the largest and the smallest characteristic roots (or eigenvalues) of X′X. Prove that the following four statements are equivalent:
• λmin(X′X) tends to infinity
• λmax[(X′X)⁻¹] tends to zero
• Trace[(X′X)⁻¹] tends to zero
• Every diagonal element of (X′X)⁻¹ tends to zero
(b) Using the results under (a), or otherwise, show that the OLS estimator of β is consistent if λmin(X′X) tends to infinity.
(c) Prove that σ̂² = û′û/T is a consistent estimator of σ², where û is the vector of OLS residuals.
3 Hypothesis Testing in
Regression Models
3.1 Introduction
Statistical hypothesis testing is at the core of the classical theory of statistical inference. Although it is closely related to the problem of estimation, it can be considered almost independently of it. In this chapter, we introduce some key concepts of statistical inference, and show their use to investigate the statistical significance of the (linear) relationships modelled through regression analysis, or to investigate the validity of the classical assumptions in simple and multiple linear regression.
The hypothesis being tested (i.e. the maintained hypothesis) is usually denoted by H0 and
is called the null hypothesis. The hypothesis against which H0 is tested is called the alternative
hypothesis and is usually denoted by H1 .
The probability of a type I error is called the size of the test and is often denoted by αT; expressed as αT × 100 per cent, it is also called the significance level of the test. The probability of a type II error is called the size of the type II error and is often denoted by βT. Ideally, we would like both errors to be as small as possible. However, there is a trade-off between the two: for a given test and sample size, reducing the probability of a type I error increases the probability of a type II error.
The power of a test is defined as 1 minus the size of the type II error, namely powerT = 1 − βT. For a given significance level, αT, we would like the power of the test, powerT, to be as large as possible.
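The trade-off between the two error probabilities can be made concrete with a small calculation. The sketch below is an illustration under assumptions that are not in the text: a one-sided test of a zero mean against a positive mean, with known standard deviation `sigma`, based on the standardized sample mean. The function names are hypothetical.

```python
from statistics import NormalDist

def critical_value(alpha):
    """Critical value C of a one-sided standard normal test of size alpha."""
    return NormalDist().inv_cdf(1.0 - alpha)

def power(alpha, mu1, sigma, T):
    """Pr(statistic >= C | mean = mu1) for the statistic sqrt(T) * xbar / sigma."""
    c = critical_value(alpha)
    shift = (T ** 0.5) * mu1 / sigma     # mean of the statistic under the alternative
    return 1.0 - NormalDist().cdf(c - shift)

for alpha in (0.10, 0.05, 0.01):
    p = power(alpha, mu1=0.3, sigma=1.0, T=50)
    print(f"size = {alpha:.2f}   power = {p:.3f}   type II error = {1 - p:.3f}")
```

Lowering the size from 10 to 1 per cent raises the critical value, so the probability of a type II error rises and the power falls. Evaluating `power` at `mu1=0` returns the size of the test.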
or equivalently,
\[
\text{power}_T = \Pr\{T(x_1,x_2,\ldots,x_T)\geq C_T \mid H_1\}.
\]
\[
y_t = \alpha + \beta x_t + u_t,
\]
\[
H_0:\beta = \beta_0, \qquad H_1:\beta \neq \beta_0,
\]
where β 0 is a given value of β. To construct a test for β, first recall that, from (1.25) and (1.26),
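\[
\hat{\beta} = \sum_{t=1}^{T} w_t y_t,
\]
where
\[
w_t = \frac{x_t-\bar{x}}{\sum_{s=1}^{T}(x_s-\bar{x})^{2}}.
\]
Substituting yt = α + βxt + ut gives
\[
\hat{\beta} = \sum_{t=1}^{T} w_t(\alpha+\beta x_t+u_t),
\]
\[
\hat{\beta} = \alpha\sum_{t=1}^{T} w_t + \beta\sum_{t=1}^{T} w_t x_t + \sum_{t=1}^{T} w_t u_t,
\]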
and since \(\sum_{t=1}^{T} w_t = 0\) and \(\sum_{t=1}^{T} w_t x_t = 1\) (see the derivations in Section 1.9), we have
\[
\hat{\beta} = \beta + \sum_{t=1}^{T} w_t u_t. \tag{3.1}
\]
Noting that the weighted average of normal variates is also normal, it follows that
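\[
\hat{\beta} \mid x \sim N\left(\beta, Var(\hat{\beta})\right), \tag{3.2}
\]
where
\[
Var(\hat{\beta}) = \sigma^{2}\sum_{t=1}^{T} w_t^{2} = \frac{\sigma^{2}}{\sum_{t=1}^{T}(x_t-\bar{x})^{2}}.
\]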
In the case where σ² is known, we can base the test of H0: β = β0 on the following standardized statistic
\[
Z_{\hat{\beta}} = \frac{\hat{\beta}-\beta_0}{\sqrt{Var(\hat{\beta})}} = \frac{\hat{\beta}-\beta_0}{S.E.(\hat{\beta})}, \tag{3.3}
\]
where S.E. (·) stands for the standard errors. Under the null hypothesis, Zβ̂ ∼ N (0, 1) and the
critical values of the normal distribution will be applicable.
The appropriate choice of the critical values depends on the distribution of the test statistic, the size of the test (or the level of significance), and whether the alternative hypothesis is two-sided (namely H1: β ≠ β0) or one-sided, namely whether H1: β ≥ β0 or H1: β ≤ β0.
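In the case where σ² is not known, the use of statistic Zβ̂ defined by (3.3) is not feasible and σ² needs to be replaced by its estimate. Using the unbiased estimator of σ², given by (1.34), namely
\[
\hat{\sigma}^{2} = \frac{\sum_t (y_t-\hat{\alpha}-\hat{\beta}x_t)^{2}}{T-2},
\]
we obtain the t-statistic
\[
t_{\hat{\beta}} = \frac{\hat{\beta}-\beta_0}{\sqrt{\widehat{Var}(\hat{\beta})}} = \frac{\hat{\beta}-\beta_0}{\hat{\sigma}\left[\sum_t (x_t-\bar{x})^{2}\right]^{-1/2}},
\]
which, under the null hypothesis, has a t-distribution with T − 2 degrees of freedom.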
Example 5 Suppose we are interested in testing the hypothesis that the marginal propensity to consume out of disposable income is equal to unity. Using aggregate UK consumption data over the period 1948–89 we obtained the following OLS estimates:
The bracketed figures are standard errors. The estimate of the marginal propensity to consume is equal to β̂ = 0.87233. To test H0: β = 1 against H1: β ≠ 1 we compute the t-statistic
\[
t_{\hat{\beta}} = \frac{\hat{\beta}-\beta_0}{S.E.(\hat{\beta})} = \frac{0.87233-1.0}{0.01169} = -10.92.
\]
The number of degrees of freedom of this test is equal to 42 − 2 = 40, and the 95 per cent critical value of the t-distribution with 40 degrees of freedom for a two-sided test is equal to ±2.021. Hence, since the value of tβ̂ for the test of β = 1 against β ≠ 1 is well below the lower critical value of the test (i.e., −2.021), we reject the null hypothesis that β = 1.
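The arithmetic of this example can be reproduced in a few lines. The sketch below uses only the numbers reported in the example; the critical value 2.021 is taken from the text rather than computed, and the variable names are illustrative.

```python
beta_hat = 0.87233    # estimated marginal propensity to consume
se_beta = 0.01169     # its reported standard error
beta_0 = 1.0          # value of beta under the null hypothesis
crit_40 = 2.021       # two-sided 95 per cent critical value, 40 degrees of freedom

t_stat = (beta_hat - beta_0) / se_beta
reject = abs(t_stat) > crit_40    # two-sided test: reject if |t| exceeds the critical value

print(f"t = {t_stat:.2f}, reject H0: {reject}")
```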
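\[
\hat{\rho}^{2}_{XY} = \frac{S_{XY}^{2}}{S_{XX}S_{YY}}.
\]
But since
\[
\hat{\beta} = \frac{S_{XY}}{S_{XX}}, \qquad \widehat{Var}(\hat{\beta}) = \frac{\hat{\sigma}^{2}}{S_{XX}},
\]
we have
\[
\hat{\rho}^{2}_{XY} = \frac{\hat{\beta}^{2}S_{XX}^{2}}{S_{XX}S_{YY}} = \frac{\hat{\beta}^{2}S_{XX}}{S_{YY}}. \tag{3.4}
\]
\[
t_{\hat{\beta}} = \frac{\hat{\beta}}{\sqrt{\widehat{Var}(\hat{\beta})}},
\]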
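\[
t_{\hat{\beta}}^{2} = \frac{\hat{\beta}^{2}S_{XX}}{\hat{\sigma}^{2}}. \tag{3.5}
\]
Finally, recall from the decomposition of \(S_{YY} = \sum_t (y_t-\bar{y})^{2}\) in the analysis of variance table that (see Section 1.5)
\[
\hat{\rho}^{2}_{XY} = 1 - \frac{\sum_t (y_t-\hat{y}_t)^{2}}{\sum_t (y_t-\bar{y})^{2}} = 1 - \frac{(T-2)\hat{\sigma}^{2}}{S_{YY}},
\]
or
\[
\hat{\sigma}^{2} = \frac{S_{YY}\left(1-\hat{\rho}^{2}_{XY}\right)}{T-2}. \tag{3.6}
\]
Substituting (3.4) and (3.6) into (3.5) gives
\[
t_{\hat{\beta}}^{2} = \frac{(T-2)\hat{\rho}^{2}_{XY}}{1-\hat{\rho}^{2}_{XY}}, \tag{3.7}
\]
or, solving for the squared correlation,
\[
\hat{\rho}^{2}_{XY} = \frac{t_{\hat{\beta}}^{2}}{T-2+t_{\hat{\beta}}^{2}} < 1. \tag{3.8}
\]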
These results show that in the context of a simple regression model the statistical test of the ‘fit’ of the model (i.e., H0: ρXY = 0 against H1: ρXY ≠ 0) is the same as the test of a zero restriction on the slope coefficient of the regression model (i.e., a test of H0: β = 0 against H1: β ≠ 0). Moreover, testing the null hypothesis of a zero relationship between Y and X is equivalent to testing the significance of the reverse regression of X on Y, namely testing H0: δ = 0 against H1: δ ≠ 0 in the reverse regression
xt = ax + δyt + vt , (3.9)
assuming that the classical assumptions now apply to this model. Of course, it is clear that the classical assumptions cannot apply to the regression of Y on X and to the reverse regression of X on Y at the same time. But testing the null hypothesis that β = 0 and testing the null hypothesis that δ = 0 are equivalent, since the null states that there is no relationship between the two variables. However, if the null of no relationship between Y and X is rejected, then measuring the size of the effect of X on Y (β X·Y), as compared with the size of the effect of Y on X (β Y·X), will crucially depend on whether the classical assumptions are likely to hold for the regression of Y on X or for the reverse regression of X on Y. As was already established in Chapter 1, β̂ Y·X β̂ X·Y = ρ̂ 2YX = ρ̂ 2XY (see (1.9)), from
which it follows in general that the estimates of the effects of X on Y and the effects of Y on X do
not match, in the sense that β̂ Y·X is not equal to 1/β̂ X·Y , unless ρ̂ 2XY = 1, which does not apply
in practice.
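These equivalences are exact algebraic identities and can be verified on any sample. The sketch below uses simulated data; the sample size, coefficients and function names are assumptions made for the illustration only.

```python
import math
import random

def simple_regression(y, x):
    """Regress y on an intercept and x; return (slope, t-ratio of the slope)."""
    T = len(x)
    xbar, ybar = sum(x) / T, sum(y) / T
    sxx = sum((a - xbar) ** 2 for a in x)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    ssr = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    sigma2_hat = ssr / (T - 2)                 # unbiased estimator of the error variance
    return slope, slope / math.sqrt(sigma2_hat / sxx)

def squared_correlation(x, y):
    T = len(x)
    xbar, ybar = sum(x) / T, sum(y) / T
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    return sxy ** 2 / (sxx * syy)

random.seed(1)
T = 40
x = [random.gauss(0, 1) for _ in range(T)]
y = [0.5 + 0.4 * a + random.gauss(0, 1) for a in x]

b_yx, t_yx = simple_regression(y, x)   # regression of Y on X
b_xy, t_xy = simple_regression(x, y)   # reverse regression of X on Y
rho2 = squared_correlation(x, y)

t2_from_rho = (T - 2) * rho2 / (1 - rho2)       # relation (3.7)
rho2_from_t = t_yx ** 2 / (T - 2 + t_yx ** 2)   # relation (3.8)

print(t_yx, t_xy)          # identical t-ratios in both directions
print(b_yx * b_xy, rho2)   # product of the two slopes equals the squared correlation
```

The t-ratios coincide in the two directions even though the slope estimates differ, which is why the direction of the regression matters for the size of the effect but not for the significance test.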
Hence, in order to find the size of the effects, the direction of the analysis (whether Y is regressed on X or X is regressed on Y) matters crucially. But if the purpose of the analysis is simply to test for the significance of the statistical relationship between Y and X, the direction of the regression does not matter, and it is sufficient to test the null hypothesis of zero correlation (or more generally zero dependence) between Y and X. This can be done using a number of alternative measures of dependence between Y and X. In addition to ρYX, one can also use the Spearman rank correlation and Kendall’s τ coefficients defined in Section 1.4. The rank correlation measures are less sensitive to outliers and are more appropriate when the underlying bivariate distribution of (Y, X) shows significant departures from Gaussianity and the sample size, T, under consideration is small. But in cases where T is sufficiently large (60 or more), and the underlying bivariate distribution has fourth-order moments, the use of the simple correlation coefficient, ρYX, seems appropriate, and tests based on it are likely to be more powerful than tests based on rank correlation coefficients.
Under the null hypothesis that Y and X are independently distributed, √T ρ̂YX is asymptotically distributed as N(0, 1), and a test of ρYX = 0 can be based on
\[
z_{\rho} = \sqrt{T}\,\hat{\rho}_{YX} \rightarrow_d N(0,1).
\]
Fisher has derived an exact sample distribution for ρ̂YX when the observations are from an underlying bivariate normal distribution. But in general no exact sampling distribution is known for ρ̂YX in the case of non-Gaussian processes. In small samples more accurate inferences can be achieved by basing the test of ρYX = 0 on
\[
t_{\hat{\beta}} = \hat{\rho}_{YX}\sqrt{\frac{T-2}{1-\hat{\rho}^{2}_{YX}}},
\]
which is distributed approximately as the Student’s t with T − 2 degrees of freedom. This result follows from the equivalence of testing ρYX = 0 and testing β = 0 in the simple regression model yt = α + βxt + ut.
To use the Spearman rank correlation to test the null hypothesis that Y and X are independent, we recall from (1.10) that the Spearman rank correlation, rs, between Y and X is defined by
\[
r_s = 1 - \frac{6\sum_{t=1}^{T} d_t^{2}}{T(T^{2}-1)}, \tag{3.10}
\]
where dt is the difference between the ranks of the two variables. Under the null hypothesis of zero rank correlation between y and x (ρs = 0, where ρs is the rank correlation coefficient in the population from which the sample is drawn) we have
\[
Var(r_s) = \frac{1}{T-1}. \tag{3.11}
\]
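A minimal sketch of (3.10) and (3.11) in code follows. It assumes there are no ties in either series, and it standardizes rs by the null variance 1/(T − 1) to form an approximately standard normal statistic; that standardization and the function names are assumptions of this illustration.

```python
def ranks(values):
    """Rank observations from 1 (smallest) to T (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(x, y):
    """Spearman rank correlation computed from rank differences as in (3.10)."""
    T = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d2 / (T * (T ** 2 - 1))

def spearman_z(x, y):
    """r_s divided by its null standard deviation, (T - 1) ** -0.5, from (3.11)."""
    return spearman(x, y) * (len(x) - 1) ** 0.5

x = [1.2, 3.4, 2.2, 5.0, 4.1]
y = [2.0, 3.9, 3.1, 4.5, 4.8]
print(spearman(x, y), spearman_z(x, y))
```

With only five observations the normal approximation is crude; the example is meant to show the mechanics rather than to support inference.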
and suppose that we are interested in testing the null hypothesis on the jth coefficient
\[
H_0:\beta_j = \beta_{j0}, \tag{3.15}
\]
\[
H_1:\beta_j \neq \beta_{j0}.
\]
where \((X'X)^{-1}_{jj}\) is the (j, j) element of the matrix \((X'X)^{-1}\) (see expression (2.13)). Hence, in the case where σ² is known, the test can be based on the following standardized statistic
\[
Z_{\hat{\beta}_j} = \frac{\hat{\beta}_j-\beta_{j0}}{\sigma\left[(X'X)^{-1}_{jj}\right]^{1/2}},
\]
Under the null hypothesis (3.15), \(Z_{\hat{\beta}_j} \sim N(0,1)\) and the critical values of the normal distribution will be applicable. When σ² is not known, the unbiased estimator of σ², given by (2.14), namely
\[
\hat{\sigma}^{2} = \frac{\sum_t \hat{u}_t^{2}}{T-k} = \frac{\hat{u}'\hat{u}}{T-k},
\]
can be used, where k is the number of regression coefficients (inclusive of an intercept, if any). Replacing σ² with σ̂² yields the t-statistic
\[
t_{\hat{\beta}_j} = \frac{\hat{\beta}_j-\beta_{j0}}{\hat{\sigma}\left[(X'X)^{-1}_{jj}\right]^{1/2}},
\]
which, under the null hypothesis H0, has a t-distribution with T − k degrees of freedom.
and assume that it satisfies all the classical assumptions. Suppose now that we are interested in
testing the hypothesis
\[
H_0:\beta_1+\beta_2 = 1,
\]
against
\[
H_1:\beta_1+\beta_2 \neq 1.
\]
Let
\[
\delta = \beta_1+\beta_2-1, \tag{3.18}
\]
so that the hypotheses can be restated as
\[
H_0:\delta = 0,
\]
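against
\[
H_1:\delta \neq 0.
\]
The OLS estimate of δ is
\[
\hat{\delta} = \hat{\beta}_1+\hat{\beta}_2-1,
\]
and the corresponding t-statistic is
\[
t_{\hat{\delta}} = \frac{\hat{\delta}-0}{\sqrt{\widehat{Var}(\hat{\delta})}} = \frac{\hat{\beta}_1+\hat{\beta}_2-1}{\sqrt{\widehat{Var}(\hat{\delta})}},
\]
where
\[
\widehat{Var}(\hat{\delta}) = \widehat{Var}(\hat{\beta}_1)+\widehat{Var}(\hat{\beta}_2)+2\,\widehat{Cov}(\hat{\beta}_1,\hat{\beta}_2).
\]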
The relevant expressions of the variance-covariance matrix of the regression coefficients are
given in relations (2.20)–(2.22).
An alternative procedure for testing δ = 0, which does not require knowledge of Cov(β̂1, β̂2), would be to use (3.18) to solve for β1 or β2 in the regression equation (3.17). Solving for β2, for example, we have
\[
y_t = \beta_0+\beta_1x_{t1}+(\delta-\beta_1+1)x_{t2}+u_t,
\]
or
\[
y_t-x_{t2} = \beta_0+\beta_1(x_{t1}-x_{t2})+\delta x_{t2}+u_t.
\]
Therefore, the test of δ = 0 against δ ≠ 0 can be carried out by means of a simple t-test on the regression coefficient of xt2 in the regression of (yt − xt2) on (xt1 − xt2) and xt2.
Example 6 This example describes two different methods of testing the hypothesis of constant returns to scale in the context of the Cobb–Douglas (CD) production function
\[
Y_t = AK_t^{\alpha}L_t^{\beta}e^{u_t}, \qquad t = 1, 2, \ldots, T, \tag{3.20}
\]
H0 : α + β = 1,
which we consider as the null hypothesis and derive an appropriate test of it against the two-sided
alternative:
H1 : α + β ≠ 1.
In order to implement the test of H0 against H1, we first take logarithms of both sides of (3.20), which yields the log-linear specification
\[
LY_t = a + \alpha LK_t + \beta LL_t + u_t, \tag{3.21}
\]
where LYt = log(Yt), LKt = log(Kt), LLt = log(Lt), and a = log(A). It is now possible to obtain estimates of α and β by running OLS regressions of LYt on LKt and LLt (for t = 1, 2, . . . , T), including an intercept in the regression. Denote the OLS estimates of α and β by α̂ and β̂, and define a new parameter, δ, as
\[
\delta = \alpha+\beta-1. \tag{3.22}
\]
The hypotheses can then be restated as
\[
H_0:\delta = 0, \qquad H_1:\delta \neq 0.
\]
We now consider two alternative methods of testing δ = 0: a direct method and a regression
method. The first method directly focuses on the OLS estimates of δ, namely δ̂ = α̂ + β̂ − 1,
and examines whether this estimate is significantly different from zero. For this we need an estimate
of the variance of δ̂. We have
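\[
V(\hat{\delta}) = V(\hat{\alpha})+V(\hat{\beta})+2\,Cov(\hat{\alpha},\hat{\beta}),
\]
where V(·) and Cov(·) stand for the variance and the covariance operators, respectively. The OLS estimator of V(δ̂) is given by
\[
\hat{V}(\hat{\delta}) = \hat{V}(\hat{\alpha})+\hat{V}(\hat{\beta})+2\,\widehat{Cov}(\hat{\alpha},\hat{\beta}).
\]
The relevant test statistic is
\[
t_{\hat{\delta}} = \frac{\hat{\delta}}{\sqrt{\hat{V}(\hat{\delta})}} = \frac{\hat{\alpha}+\hat{\beta}-1}{\sqrt{\hat{V}(\hat{\alpha})+\hat{V}(\hat{\beta})+2\,\widehat{Cov}(\hat{\alpha},\hat{\beta})}}, \tag{3.23}
\]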
and, under δ = 0, has a t-distribution with T − 3 degrees of freedom. An alternative method for
testing δ = 0 is the regression method. This starts with (3.21) and replaces β (or α) in terms of δ
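\[
\beta = \delta-\alpha+1.
\]
Substituting this in (3.21) gives LYt = a + αLKt + (δ − α + 1)LLt + ut, or
\[
Z_t = a + \alpha W_t + \delta LL_t + u_t, \tag{3.24}
\]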
where Zt = log(Yt /Lt ) = LYt − LLt and Wt = log(Kt /Lt ) = LKt − LLt . A test of δ = 0
can now be carried out by first regressing Zt on Wt and LLt (including an intercept term), and then
carrying out the usual t-test on the coefficient of LLt in (3.24). The t-ratio of δ in (3.24) will be
identical to tδ̂ defined by (3.23). We now apply the two methods discussed above to the historical
data on Y, K, and L used originally by Cobb and Douglas (1928), covering the period 1899–1922.
The following estimates of α̂, β̂ and of the variance covariance matrix of (α̂, β̂) can be obtained:
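\[
\hat{\alpha} = 0.23305, \qquad \hat{\beta} = 0.80728,
\]
\[
\begin{bmatrix}
\hat{V}(\hat{\alpha}) & \widehat{Cov}(\hat{\alpha},\hat{\beta}) \\
\widehat{Cov}(\hat{\alpha},\hat{\beta}) & \hat{V}(\hat{\beta})
\end{bmatrix}
=
\begin{bmatrix}
0.004036 & -0.0083831 \\
-0.0083831 & 0.021047
\end{bmatrix}.
\]
Using these estimates in (3.23) gives
\[
t_{\hat{\delta}} = \frac{0.23305+0.80728-1}{\sqrt{0.004036+0.021047-2(0.0083831)}} = 0.442. \tag{3.25}
\]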
Comparing tδ̂ = 0.442 and the 5 per cent critical value of the t-distribution with T − 3 = 24 −
3 = 21 degrees of freedom (which is equal to 2.080), it is clear that since tδ̂ = 0.442 < 2.080,
then the hypothesis δ = 0 or α + β = 1 cannot be rejected at the 5 per cent level. Implementing
the regression approach, we estimate (3.24) by OLS and obtain estimates for the coefficients of
Wt and LLt of 0.2330(0.06353) and 0.0403(0.0912), respectively. (The figures in brackets are
standard errors.) Note that the t-ratio of the coefficient of the LL variable in this regression is equal
to 0.0403/0.0912 = 0.442, which is identical to tδ̂ as computed in (3.25). It is worth noting that
the estimates of α and β, which have played a historically important role in the literature, are very
‘fragile’, in the sense that they are highly sensitive to the sample period chosen in estimating them.
For example, estimating the model (given in (3.21)) over the period 1899–1920 (dropping the
observations for the last two years) yields α̂ = 0.0807(0.1099) and β̂ = 1.0935(0.2241).
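The direct method in this example reduces to a short calculation once the coefficient estimates and their covariance matrix are available. The sketch below uses only the numbers reported above; the critical value 2.080 is quoted from the text rather than computed, and the variable names are illustrative.

```python
import math

alpha_hat, beta_hat = 0.23305, 0.80728       # OLS estimates from the example
var_alpha, var_beta = 0.004036, 0.021047     # estimated variances
cov_alpha_beta = -0.0083831                  # estimated covariance
crit_21 = 2.080                              # 5 per cent critical value, 21 degrees of freedom

delta_hat = alpha_hat + beta_hat - 1.0
se_delta = math.sqrt(var_alpha + var_beta + 2.0 * cov_alpha_beta)
t_delta = delta_hat / se_delta

print(f"delta_hat = {delta_hat:.5f}, s.e. = {se_delta:.4f}, t = {t_delta:.3f}")
print("reject constant returns to scale:", abs(t_delta) > crit_21)
```

The standard error obtained this way, 0.0912, matches the standard error of the coefficient of LLt reported for the regression method, which is why the two t-ratios coincide.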
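\[
H_0:\beta_1 = \beta_2 = 0,
\]
\[
H_1:\beta_1 \neq 0 \ \text{and/or} \ \beta_2 \neq 0.
\]
Note that this joint hypothesis is different from testing the following two hypotheses separately:
\[
H_0^{I}:\beta_1 = 0 \ \text{against} \ H_1^{I}:\beta_1 \neq 0,
\]
or
\[
H_0^{II}:\beta_2 = 0 \ \text{against} \ H_1^{II}:\beta_2 \neq 0.
\]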
The latter tests are known as separate induced tests and could lead to test outcomes that differ
from the outcome of a joint test.
The general procedure for testing joint hypotheses in regression contexts is to construct
the F-statistic that compares the sum of squares of residuals (SSR) of the regression under
the restrictions (i.e., under H0 ) with the SSR under the alternative hypothesis, H1 , when the
parameter restrictions are not applied. This procedure is valid for a two-sided test. Carrying
out one sided tests in the case of joint hypotheses is more complicated and will not be
addressed here.
The relevant statistic for the joint test of r ≤ k different linear restrictions on the regression
coefficients is
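\[
F = \left(\frac{T-k-1}{r}\right)\left(\frac{SSR_R-SSR_U}{SSR_U}\right), \tag{3.26}
\]
where SSR_R is the sum of squares of residuals of the restricted regression (estimated under H0) and SSR_U is the sum of squares of residuals of the unrestricted regression (estimated under H1).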
Under the null hypothesis, the above statistic, F, has an F-distribution with r and T − k − 1
degrees of freedom.
Consider now the application of this general procedure to the problem of testing β 1 =β 2 = 0.
The restricted sum of squares of errors (SSRR ) for the problem is obtained by imposing the
restrictions β 1 = β 2 = 0 on (3.17) and then by estimating the restricted model
yt = β 0 + ut .
Hence
\[
F = \left(\frac{T-3}{2}\right)\left(\frac{S_{YY}-\sum_t \hat{u}_t^{2}}{\sum_t \hat{u}_t^{2}}\right),
\]
\[
H_0: R\beta - d_0 = 0,
\]
\[
H_1: R\beta - d_0 \neq 0,
\]
where R is an r × k matrix of known constants with full row rank given by r ≤ k, and d0 is an r × 1 vector of constants. The different hypotheses considered above can be obtained by appropriate choice of R and d0. For example, if the object of the exercise is to test the null hypothesis that the first element of β is equal to zero, then we need to set R = (1, 0, . . . , 0), and d0 = 0. To test the hypothesis that the sum of the first two elements of β adds up to 2 and the sum of the second and third elements of β adds up to 3 we set
\[
R = \begin{pmatrix} 1 & 1 & 0 & 0 & \ldots & 0 \\ 0 & 1 & 1 & 0 & \ldots & 0 \end{pmatrix}, \qquad d_0 = \begin{pmatrix} 2 \\ 3 \end{pmatrix}.
\]
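The way R and d0 encode a set of linear restrictions can be seen by evaluating Rβ − d0 for candidate coefficient vectors. The sketch below is an illustration with k = 5; the coefficient vectors and the function name are hypothetical.

```python
def restriction_gap(R, beta, d0):
    """Return the vector R*beta - d0; it is zero when beta satisfies the restrictions."""
    return [sum(r_ij * b_j for r_ij, b_j in zip(row, beta)) - d
            for row, d in zip(R, d0)]

# Two restrictions on a coefficient vector with k = 5 elements:
# beta_1 + beta_2 = 2 and beta_2 + beta_3 = 3.
R = [[1, 1, 0, 0, 0],
     [0, 1, 1, 0, 0]]
d0 = [2, 3]

beta_ok = [0.5, 1.5, 1.5, 7.0, -2.0]    # satisfies both restrictions
beta_bad = [1.0, 1.0, 1.0, 1.0, 1.0]    # violates the second restriction

print(restriction_gap(R, beta_ok, d0))
print(restriction_gap(R, beta_bad, d0))
```

Each row of R picks out one linear combination of the coefficients, so r is the number of rows and the remaining elements of β are left unrestricted.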
the result given by (2.28), it follows that under H0 the F statistic given by (3.27) has a central
F-distribution with r and T−k degrees of freedom. This result of course requires that the classical
normal regression assumptions A1–A5 set out in Chapter 2 hold.
where δ = d1 − d0 . Hence
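\[
X_1 = \frac{(R\hat{\beta}-d_0)'\left[R(X'X)^{-1}R'\right]^{-1}(R\hat{\beta}-d_0)}{\sigma^{2}} \sim \chi^{2}_r(\lambda), \tag{3.28}
\]
where χ²r(λ) is a non-central chi-square variate with r degrees of freedom and the non-centrality parameter¹
\[
\lambda = \frac{\delta'\left[R(X'X)^{-1}R'\right]^{-1}\delta}{\sigma^{2}} = \delta'\left[R\,Var(\hat{\beta})R'\right]^{-1}\delta. \tag{3.29}
\]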
\[
y_t = \beta_0 + \sum_{j=1}^{k}\beta_jx_{tj} + u_t, \qquad t = 1, 2, \ldots, T,
\]
1 For further information regarding the non-central chi-square distribution see Section B.10.2 in Appendix B.
and suppose we are interested in testing the joint significance of the regressors xt1, xt2, . . . , xtk. The relevant hypothesis is
\[
H_0:\beta_1 = \beta_2 = \cdots = \beta_k = 0,
\]
\[
H_1:\beta_1 \neq 0,\ \beta_2 \neq 0,\ \ldots,\ \beta_k \neq 0.
\]
Hence
\[
F = \left(\frac{T-k-1}{k}\right)\left(\frac{S_{YY}}{\sum_t \hat{u}_t^{2}}-1\right) = \left(\frac{T-k-1}{k}\right)\left(\frac{R^{2}}{1-R^{2}}\right),
\]
which yields the generalization of the result (3.7) obtained in the case of the simple regression.
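The two forms of the F-statistic for joint significance can be compared numerically. In the sketch below the sums of squares are made-up numbers chosen for the illustration, and the function names are hypothetical; the restricted model contains only an intercept, so its sum of squared residuals equals S_YY.

```python
def f_from_ssr(ssr_restricted, ssr_unrestricted, r, df_unrestricted):
    """F-statistic comparing restricted and unrestricted sums of squared residuals."""
    return (df_unrestricted / r) * (ssr_restricted - ssr_unrestricted) / ssr_unrestricted

def f_from_r2(r2, k, T):
    """F-statistic for the joint significance of k slopes, written in terms of R^2."""
    return ((T - k - 1) / k) * r2 / (1.0 - r2)

T, k = 50, 2                      # hypothetical sample size and number of slopes
s_yy, ssr_u = 100.0, 40.0         # hypothetical total and residual sums of squares
r2 = 1.0 - ssr_u / s_yy           # squared multiple correlation coefficient

F_ssr = f_from_ssr(s_yy, ssr_u, r=k, df_unrestricted=T - k - 1)
F_r2 = f_from_r2(r2, k, T)
print(F_ssr, F_r2)
```

Both expressions give the same value because R² = 1 − Σû²/S_YY, so the R² form is simply a convenient way of computing the statistic from standard regression output.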
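where β = (β1, β2)′,
\[
\widehat{Cov}(\hat{\beta}) = \begin{pmatrix}
\widehat{Var}(\hat{\beta}_1) & \widehat{Cov}(\hat{\beta}_1,\hat{\beta}_2) \\
\widehat{Cov}(\hat{\beta}_1,\hat{\beta}_2) & \widehat{Var}(\hat{\beta}_2)
\end{pmatrix},
\]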
and Fα (2, T − 3) is the (1 − α) × 100 per cent critical value of the F-distribution with 2 and
T − 3 degrees of freedom.
and assume for simplicity that (xt1 , xt2 ) have a bivariate distribution with the correlation coeffi-
cient, ρ 12 . That is
It is clear that as ρ approaches unity separate estimation of the slope coefficients β 1 and β 2
becomes more and more problematic. Multicollinearity (namely a value of ρ 12 near unity in the
context of the present example) will be a problem if xt1 and xt2 are jointly statistically significant
but neither is statistically significant when taken individually. Put differently, multicollinearity
will be a problem when the hypothesis β 1 = 0 and β 2 = 0 can not be rejected when tested
separately, while the joint hypothesis that β 1 = β 2 = 0 is rejected. This clearly happens when
xt1 (or xt2 ) is an exact linear function of xt2 (or xt1 ). In this case xt2 = γ xt1 and (3.31) reduces
to the simple regression equation
yt = α + (β1 + β2 γ) xt1 + ut ,
and it is only possible to estimate β 1 + γ β 2 . Neither β 1 nor β 2 can be estimated (or tested) sep-
arately. This is the case of ‘perfect multicollinearity’ and arises out of faulty specification of the
regression equation. One important example is when four seasonal dummies are included in a
quarterly regression model that already contains an intercept term. In general the multicollinear-
ity problem is likely to arise when ρ 212 is close to 1.
The multicollinearity problem is also closely related to the problem of low power when test-
ing hypotheses concerning the values of the regression coefficients separately. It is worth not-
ing that no matter how large the correlation coefficient between xt1 and xt2 , so long as it is not
exactly equal to ±1, a test of β 1 = 0 (or β 2 = 0) will have the correct size. The high degree of
correlation between xt1 and xt2 causes the power of the test to be rather low and as a result we
may end up not rejecting the null hypothesis that β 1 = 0 even if it is false.
Example 7 To demonstrate the multicollinearity problem and its relation to the problem of low power,
using Microfit 5.0 we generated 1,000 observations on x1 , x2 and y in the following manner.
x1 ∼ N (0, 1) ,
x2 = x1 + 0.15v,
v ∼ N (0, 1) ,
y = β 0 + β 1 x1 + β 2 x2 + u,
u ∼ N (0, 1) ,
Now running the regression of y on x1 and x2 (including an intercept term) using only the first fifty
observations yields
The standard errors of the parameter estimates are given in brackets, R is the multiple correlation coefficient, σ̂ is the estimated standard error of the regression equation, and F2,47 is the F-statistic for testing the joint hypothesis
\[
H_0^{J}:\beta_1 = \beta_2 = 0,
\]
against
\[
H_1^{J}:\beta_1 \neq 0,\ \beta_2 \neq 0.
\]
H0I : β 1 = 0
against
\[
H_1^{I}:\beta_1 \neq 0,
\]
and of
\[
H_0^{II}:\beta_2 = 0,
\]
against
\[
H_1^{II}:\beta_2 \neq 0,
\]
It is firstly clear that since the value of the F-statistic (F2,47 = 132.98) for the test of \(H_0^{J}\): β1 = β2 = 0 is well above the 95 per cent critical value of the F-distribution with 2 and 47 degrees of freedom, we conclude that the joint hypothesis β1 = β2 = 0 is rejected at least at the 95 per cent significance level. Turning now to the tests of β1 = 0 and β2 = 0 separately (i.e. testing the separate induced null hypotheses \(H_0^{I}\) and \(H_0^{II}\)), we note that the t-statistics for these hypotheses are equal to \(t_{\hat{\beta}_1}\) = 1.0950/1.0403 = 1.05 and \(t_{\hat{\beta}_2}\) = 0.8719/1.0200 = 0.85, respectively. Neither is statistically significant and the null hypothesis of β1 = 0 or β2 = 0 cannot be rejected. There is clearly a multicollinearity problem. The joint hypothesis that β1 and β2 are both equal to zero is strongly rejected, but neither of the hypotheses that β1 and β2 are separately equal to zero can be rejected. The sample correlation coefficient of x1 and x2 computed using the first 50 observations is equal to 0.99316, which is apparently too high, given the sample size and the fit of the underlying equation, for the β1 and β2 coefficients to be estimated separately with any degree of precision. In short, the separate induced tests lack the necessary power to allow rejection of β1 = 0 and β2 = 0 separately.
The relationship between the F-statistic used to test the joint hypothesis β 1 = β 2 = 0, and the
t-statistics used to test β 1 = 0 and β 2 = 0 separately, can also be obtained theoretically. Recall
from Section 3.7 that
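\[
F = \left(\frac{T-3}{2}\right)\left(\frac{S_{YY}-\sum_t \hat{u}_t^{2}}{\sum_t \hat{u}_t^{2}}\right). \tag{3.34}
\]
Denote the t-statistics for testing β1 = 0 and β2 = 0 separately by t1 and t2, respectively. Then
\[
t_j^{2} = \frac{\hat{\beta}_j^{2}}{\widehat{Var}(\hat{\beta}_j)}, \qquad j = 1, 2.
\]
\[
\widehat{Var}(\hat{\beta}_1) = \frac{\hat{\sigma}^{2}S_{22}}{S_{11}S_{22}-S_{12}^{2}},
\]
\[
\widehat{Var}(\hat{\beta}_2) = \frac{\hat{\sigma}^{2}S_{11}}{S_{11}S_{22}-S_{12}^{2}},
\]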
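where as before \(S_{js} = \sum_t (x_{tj}-\bar{x}_j)(x_{ts}-\bar{x}_s)\). Also, since \(y_t-\bar{y} = \hat{\beta}_1(x_{t1}-\bar{x}_1)+\hat{\beta}_2(x_{t2}-\bar{x}_2)+\hat{u}_t\), we have
\[
S_{YY} = \sum_t (y_t-\bar{y})^{2} = \hat{\beta}_1^{2}S_{11}+\hat{\beta}_2^{2}S_{22}+2\hat{\beta}_1\hat{\beta}_2S_{12}+\sum_t \hat{u}_t^{2}.
\]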
Using these results in the expression for the F-statistic in (3.34) we obtain:
\[
F = \frac{t_1^{2}+t_2^{2}+2\rho_{12}t_1t_2}{2\left(1-\rho_{12}^{2}\right)}, \tag{3.35}
\]
where ρ 12 is the sample correlation coefficient between xt1 and xt2 .3 This relationship clearly shows
that even for small values of t1 and t2 it is possible to get quite large values of F so long as ρ 12 is
chosen to be close enough to 1.
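Relation (3.35) can be checked against the figures reported in the example. The sketch below plugs in the coefficient estimates, standard errors and sample correlation quoted in the text; the function name is illustrative.

```python
def f_from_t_ratios(t1, t2, rho12):
    """Joint F-statistic for beta_1 = beta_2 = 0 implied by the two t-ratios, as in (3.35)."""
    return (t1 ** 2 + t2 ** 2 + 2.0 * rho12 * t1 * t2) / (2.0 * (1.0 - rho12 ** 2))

t1 = 1.0950 / 1.0403    # t-ratio of the coefficient of x1 (first 50 observations)
t2 = 0.8719 / 1.0200    # t-ratio of the coefficient of x2
rho12 = 0.99316         # sample correlation between x1 and x2

F = f_from_t_ratios(t1, t2, rho12)
print(f"t1 = {t1:.2f}, t2 = {t2:.2f}, F = {F:.2f}")
```

The result agrees with the reported F2,47 = 132.98 up to rounding, even though both t-ratios are close to one. With uncorrelated regressors (rho12 = 0) the same t-ratios would give an F-statistic below one.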
The above example considers the simple case of a regression model with two explanatory variables. In the case of regression models with more than two regressors the detection of the multicollinearity problem becomes more complicated. For example, when there are three regressors with the coefficients β1, β2 and β3, we need to consider all the possible combinations of the coefficients, namely testing them separately: β1 = 0, β2 = 0, β3 = 0; in pairs: β1 = β2 = 0, β2 = β3 = 0, β1 = β3 = 0; and jointly: β1 = β2 = β3 = 0. Only in the case where the results of the separate induced tests, the ‘pairs’ tests and the joint test are free from contradictions can we be confident that multicollinearity is not a problem.
There exist a number of measures in the literature that purport to detect and measure the seriousness of the multicollinearity problem. One commonly used diagnostic is the condition number, defined as the square root of the ratio of the largest to the smallest eigenvalue of the matrix X′X, where the columns of X have been re-scaled to length 1 (namely, the elements of the jth column of X have been divided by \(s_j = (\sum_{t=1}^{T} x_{tj}^{2})^{1/2}\), for j = 1, 2, . . . , k). The condition number detects whether the matrix X′X has a small determinant, namely if it is ill-conditioned. The larger the condition number, the more ill-conditioned is the matrix, and difficulties can be encountered in calculations involving (X′X)⁻¹. Values of the condition number higher than 30 are suggested as indicative of a problem (see Belsley, Kuh, and Welsch (1980) for details). Another diagnostic used to detect multicollinearity is the variance-inflation factor (VIF), defined as \(VIF_j = (1-R_j^{2})^{-1}\) for the jth regressor, where \(R_j^{2}\) is the squared multiple correlation coefficient of the regression of xtj on all other variables in the regression. A high value of VIFj suggests that xtj is in some collinear relationship with the other regressors. As a rule of thumb, for scaled data, a VIFj higher than ten indicates severe collinearity (see Kennedy (2003)). We remark that these measures only examine the inter-correlation between the regressors, and at best give a partial picture of the multicollinearity problem, and can often ‘lead’ to misleading conclusions.
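Both diagnostics have simple closed forms when there are only two regressors. The sketch below is an illustration under stated assumptions: the data are simulated along the lines of the design in Example 7, the condition number is computed for a matrix X containing just the two regressor columns (no intercept column), and the function names are hypothetical.

```python
import math
import random

def correlation(x, y):
    T = len(x)
    xbar, ybar = sum(x) / T, sum(y) / T
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

def vif_from_correlation(rho):
    """With two regressors, R_j^2 is the squared correlation between them."""
    return 1.0 / (1.0 - rho ** 2)

def condition_number_two_columns(x1, x2):
    """Condition number of X'X after scaling both columns of X to length 1.

    The scaled X'X is [[1, c], [c, 1]], whose eigenvalues are 1 + c and 1 - c.
    """
    s1 = math.sqrt(sum(a * a for a in x1))
    s2 = math.sqrt(sum(b * b for b in x2))
    c = sum(a * b for a, b in zip(x1, x2)) / (s1 * s2)
    return math.sqrt((1.0 + abs(c)) / (1.0 - abs(c)))

random.seed(7)
T = 50
x1 = [random.gauss(0, 1) for _ in range(T)]
x2 = [a + 0.15 * random.gauss(0, 1) for a in x1]   # x2 = x1 + 0.15 v

rho = correlation(x1, x2)
print(f"correlation = {rho:.4f}")
print(f"VIF = {vif_from_correlation(rho):.1f}")
print(f"condition number = {condition_number_two_columns(x1, x2):.1f}")
```

In designs of this kind the two rules of thumb need not agree, which is consistent with the remark that such measures give at best a partial picture.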
A useful rule of thumb which goes beyond regressor correlations is to compare the squared multiple correlation coefficient of the regression equation, R², with \(R_j^{2}\). Klein (1962) suggests that collinearity is likely to be a problem and could lead to imprecise estimates if R² < \(R_j^{2}\), for some j = 1, 2, . . . , k.
Example 8 To illustrate the problem return to the simulation exercise, and use the first 500 observations (instead of the first 50 observations) in computing the regression of $y$ on $x_1$ and $x_2$. The results are
As compared with the estimates based on the first 50 observations [see (3.32) and (3.33)], these estimates have much smaller standard errors and using the 95 percent significance level we arrive at similar conclusions whether we test $\beta_1 = 0$ and $\beta_2 = 0$ separately or jointly. Yet the sample correlation coefficient between $x_{t1}$ and $x_{t2}$ estimated over the first 500 observations is equal to 0.9895, which is only marginally smaller than the estimate obtained for the first 50 observations. By increasing the sample size from 50 to 500 we have increased the precision with which $\beta_1$ and $\beta_2$ are estimated and the power of testing $\beta_1 = 0$ and $\beta_2 = 0$ both separately and jointly.
The above illustration also points to the fact that the main cause of the multicollinearity problem is lack of adequate observations (or information), and hence the imprecision with which the parameters of interest are estimated. Assuming the regression model under consideration is correctly specified, the only valid solution to the problem is to increase the information on the basis of which the regression is estimated. The new information could be either in the form of additional observations on $y$, $x_1$ and $x_2$, or it could be some a priori information concerning the parameters. The latter fits well with the Bayesian approach, but is difficult to accommodate within the classical framework. There are also other approaches suggested in the literature, such as ridge regression and principal components regression, to deal with the multicollinearity problem. For a Bayesian treatment of the regression analysis see Section C.6 in Appendix C. However, in using Bayesian techniques to deal with the multicollinearity problem it is important to bear in mind that the posterior means of the regression coefficients are well defined in small samples even if the regressors are highly multicollinear and even if $X'X$ is rank deficient. But in such cases the posterior mean of $\beta$ can be very sensitive to the choice of the priors, and unless $T^{-1}X'X$ tends to a positive definite matrix the Bayes estimates of $\beta$ could become unstable as $T \to \infty$.
Example 9 As an example consider the following Fisher type explanation of nominal interest rates estimated on US quarterly data over the period 1948(1)–1990(4) using the file USGNP.FIT provided in Microfit 5:
where $R_t$ = nominal rate of interest, $DM_t$ = the growth of money supply (M2 definition). In this regression, the coefficients of the lagged interest rate variables are all significant, but neither of the two coefficients of the lagged monetary growth variable is statistically significant. The t-ratios for the coefficients of $DM_{t-1}$ and $DM_{t-2}$ are equal to 1.23 and 1.01, respectively, while the 95 percent critical value of the t-distribution with 165 (namely $T - k = 172 - 7$) degrees of freedom is equal to 1.97. As we have seen above, it would be a mistake to necessarily conclude from this result that monetary growth has no significant impact on the nominal interest rates in the US. The statistical insignificance of the coefficients of $DM_{t-1}$ and $DM_{t-2}$, when tested separately, may be due to the high intercorrelation between the regressors. Also we are not interested in testing the statistical significance of individual coefficients of the past monetary growth rates. What is of interest is the sum of the two coefficients of the lagged monetary growth rates, and not the individual coefficients separately. Denote the coefficients of $DM_{t-1}$ and $DM_{t-2}$ by $\gamma_1$ and $\gamma_2$ respectively, and let $\delta = \gamma_1 + \gamma_2$. We have
$$y = X_1\beta_1 + X_2\beta_2 + u = X\beta + u,$$
where $y = (y_1, y_2, \ldots, y_T)'$, $X_1$ and $X_2$ are $T \times k_1$ and $T \times k_2$ regressor matrices that are perfectly correlated, namely
$$X_2 = X_1A',$$
and $A$ is a $k_2 \times k_1$ matrix of fixed constants. Further assume that $X_1'X_1$ is a positive definite matrix. Consider now the forecast of $y_{T+1}$ conditional on $x_{T+1} = (x_{1T}', x_{2T}')'$, which is given by⁴
$$\hat{y}_{T+1} = x_{T+1}'\hat{\beta}_T = x_{T+1}'(X'X)^{+}X'y,$$
where $(X'X)^{+}$ is the generalized inverse of $X'X$, defined by (see also Section A.7)
$$(X'X)(X'X)^{+}(X'X) = X'X.$$
It is well known that $(X'X)^{+}$ is not unique when $X'X$ is rank deficient. In what follows we show that $\hat{y}_{T+1}$ is unique despite the non-uniqueness of $(X'X)^{+}$. Note that
$$X'X = \begin{pmatrix} X_1'X_1 & X_1'X_1A' \\ AX_1'X_1 & AX_1'X_1A' \end{pmatrix} = HX_1'X_1H',$$
where $H$ is a $k \times k_1$ matrix ($k = k_1 + k_2$):
$$H = \begin{pmatrix} I_{k_1} \\ A \end{pmatrix}.$$
Also
$$X'y = HX_1'y, \quad \text{and} \quad x_{T+1}' = x_{1T}'H'.$$
Hence
$$\hat{y}_{T+1} = x_{T+1}'(X'X)^{+}X'y = x_{1T}'H'\left(HX_1'X_1H'\right)^{+}HX_1'y.$$
Since $X_1'X_1$ is positive definite, its symmetric square root exists and we can write
$$\hat{y}_{T+1} = x_{1T}'(X_1'X_1)^{-1/2}(X_1'X_1)^{1/2}H'\left[H(X_1'X_1)^{1/2}(X_1'X_1)^{1/2}H'\right]^{+}H(X_1'X_1)^{1/2}(X_1'X_1)^{-1/2}X_1'y,$$
or
$$\hat{y}_{T+1} = x_{1T}'(X_1'X_1)^{-1/2}\,G'(GG')^{+}G\,(X_1'X_1)^{-1/2}X_1'y,$$
where
$$G = H(X_1'X_1)^{1/2}.$$
Consider now the $k_1 \times k_1$ matrix $G'(GG')^{+}G$ and note that from properties of generalized inverse we have
4 A general treatment of the prediction problem is given in Chapter 17.
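$$GG'(GG')^{+}GG' = GG'. \tag{3.36}$$
But
$$G'G = (X_1'X_1)^{1/2}H'H(X_1'X_1)^{1/2},$$
and since
$$H'H = I_{k_1} + A'A,$$
then
$$G'G = (X_1'X_1)^{1/2}\left(I_{k_1} + A'A\right)(X_1'X_1)^{1/2}$$
is a nonsingular matrix (for any $A$) and has a unique inverse. Using this result in (3.36), namely pre-multiplying (3.36) by $(G'G)^{-1}G'$ and post-multiplying it by $G(G'G)^{-1}$, it now follows that
$$G'(GG')^{+}G = I_{k_1},$$
and hence
$$\hat{y}_{T+1} = x_{1T}'(X_1'X_1)^{-1/2}\,G'(GG')^{+}G\,(X_1'X_1)^{-1/2}X_1'y = x_{1T}'(X_1'X_1)^{-1}X_1'y,$$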
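The invariance result can be checked numerically. The sketch below is only an illustration under assumed dimensions and simulated data: it builds $X = (X_1, X_1A')$, forms two different generalized inverses of $X'X$ (the Moore–Penrose inverse and a second one constructed from an arbitrary matrix `W`), and compares the resulting forecasts with $x_{1T}'(X_1'X_1)^{-1}X_1'y$.

```python
import numpy as np

rng = np.random.default_rng(1)
T, k1, k2 = 50, 2, 1
X1 = rng.standard_normal((T, k1))
A = rng.standard_normal((k2, k1))           # k2 x k1 matrix of fixed constants
X = np.hstack([X1, X1 @ A.T])               # X2 = X1 A', so X'X is rank deficient
y = X1 @ np.array([1.0, -0.5]) + rng.standard_normal(T)

x1_new = rng.standard_normal(k1)
x_new = np.concatenate([x1_new, A @ x1_new])  # x_{T+1} = H x_{1T}

M = X.T @ X
M_pinv = np.linalg.pinv(M, rcond=1e-10)     # Moore-Penrose generalized inverse
W = rng.standard_normal(M.shape)            # arbitrary matrix
# Another matrix satisfying M G M = M, different from the Moore-Penrose one.
M_ginv = M_pinv + W - M_pinv @ M @ W @ M @ M_pinv

forecast_pinv = x_new @ M_pinv @ X.T @ y
forecast_ginv = x_new @ M_ginv @ X.T @ y
forecast_x1 = x1_new @ np.linalg.solve(X1.T @ X1, X1.T @ y)
```

All three forecasts should agree up to rounding error, even though the two generalized inverses themselves differ.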
$$y_t = \beta_0 + \beta_1 x_t + \beta_2 z_t + u_t, \tag{3.37}$$
$$y_t = \alpha + \beta x_t + \varepsilon_t, \tag{3.38}$$
which omits the regressor $z_t$. We have seen in Section 2.13 that omitting a relevant regressor, $z_t$, may lead to biased estimates, unless the included regressor, $x_t$, and the omitted variable, $z_t$, are uncorrelated. However, even in the case where $x_t$ and $z_t$ are uncorrelated, $\hat{\beta}$ will not be an efficient
estimator of $\beta_1$. This is because the correct estimator of the variance of $\hat{\beta}$ requires knowledge of an estimator of $\sigma_u^2 = \mathrm{Var}(u_t)$, namely
$$\hat{\sigma}_u^2 = \frac{\sum_t \hat{u}_t^2}{T-3} = \frac{\sum_t \left(y_t - \hat{\beta}_0 - \hat{\beta}_1 x_t - \hat{\beta}_2 z_t\right)^2}{T-3},$$
with $\hat{\beta}_0$, $\hat{\beta}_1$, and $\hat{\beta}_2$ being OLS estimators of parameters in (3.37), while the regression with the omitted variable only yields an estimator of $\sigma_\varepsilon^2 = \mathrm{Var}(\varepsilon_t)$, namely
$$\hat{\sigma}_\varepsilon^2 = \frac{\sum_t \hat{\varepsilon}_t^2}{T-2} = \frac{\sum_t \left(y_t - \hat{\alpha} - \hat{\beta} x_t\right)^2}{T-2},$$
with $\hat{\alpha}$ and $\hat{\beta}$ being OLS estimators of parameters in (3.38). Notice that, in general, $\hat{\sigma}_\varepsilon^2 \geq \hat{\sigma}_u^2$, and therefore the variance of $\hat{\beta}$ will be generally larger than the variance of $\hat{\beta}_1$. A similar problem in the estimation of the variance of estimated regression parameters arises when additional irrelevant variables are included in the regression equation.
where
$$m_j = \frac{\sum_{t=1}^{T} \hat{u}_t^{\,j}}{T}, \quad j = 1, 2, 3, 4.$$
For a normal distribution $\sqrt{b_1} \approx 0$, and $b_2 \approx 3$. The Jarque–Bera test of departures from normality is given by (see Jarque and Bera (1980) and Bera and Jarque (1987))
$$\chi_T^2(2) = T\left[\tfrac{1}{6}\,b_1 + \tfrac{1}{24}\,(b_2 - 3)^2\right],$$
if the regression contains an intercept term (note that in that case $m_1 = 0$). When the regression does not contain an intercept term, then $m_1 \neq 0$, and the test statistic has the additional term
$$Tb_0 = T\left[3m_1^2/(2m_2) - m_3m_1/m_2^2\right],$$
namely
$$\chi_T^2(2) = T\left[b_0 + \tfrac{1}{6}\,b_1 + \tfrac{1}{24}\,(b_2 - 3)^2\right].$$
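A minimal sketch of the statistic follows. The definitions of $b_1$ and $b_2$ are not shown in the passage above, so the code assumes the usual ones, $b_1 = m_3^2/m_2^3$ (squared skewness) and $b_2 = m_4/m_2^2$ (kurtosis). The function name and the `has_intercept` flag are hypothetical.

```python
import numpy as np

def jarque_bera(resid, has_intercept=True):
    """Jarque-Bera statistic computed from the residual moments m_j = sum(u^j)/T."""
    u = np.asarray(resid, dtype=float)
    T = u.size
    m1, m2, m3, m4 = (np.mean(u ** j) for j in (1, 2, 3, 4))
    b1 = m3 ** 2 / m2 ** 3        # assumed definition: squared skewness
    b2 = m4 / m2 ** 2             # assumed definition: kurtosis
    stat = T * (b1 / 6.0 + (b2 - 3.0) ** 2 / 24.0)
    if not has_intercept:
        # Additional term T*b0 used when the regression has no intercept.
        b0 = 3.0 * m1 ** 2 / (2.0 * m2) - m3 * m1 / m2 ** 2
        stat += T * b0
    return stat

# Tiny hand-checkable case: m2 = 1, m3 = 0, m4 = 1, so the statistic is 4 * (4/24).
jb = jarque_bera([-1.0, 1.0, -1.0, 1.0])
```

The statistic would then be compared with a critical value of the $\chi^2$ distribution with 2 degrees of freedom (5.99 at the 5 per cent level).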
y0 = X0 β 1 + S2 δ + u0 , (3.41)
where
– û0 is the OLS residual vector of the regression of y0 on X0 (i.e., based on the first and the
second sample periods together).
– û1 is the OLS residual vector of the regression of y1 on X1 (i.e., based on the first sample
period).
Under the classical normal assumptions, the predictive failure test statistic, $F_{PF}$, has an exact F-distribution with $T_2$ and $T_1 - k$ degrees of freedom.
The LM version of the above statistic is computed as
$$\chi_{PF}^2 = T_2F_{PF} \overset{a}{\sim} \chi^2(T_2), \tag{3.43}$$
which is distributed as a chi-squared with T2 degrees of freedom for large T1 (see Chow (1960),
Salkever (1976), Dufour (1980), and Pesaran, Smith, and Yeo (1985), section III.)
It is also possible to test if the predictive failure is due to particular time period(s) by applying
the t- or the F-tests to one or more elements of δ in (3.41).
where
– û0 is the OLS residual vector for the first two sample periods together
– û1 is the OLS residual vector for the first sample period
– û2 is the OLS residual vector for the second sample period.
$$\chi_{SS}^2 = kF_{SS} \overset{a}{\sim} \chi^2(k). \tag{3.45}$$
For more details see, for example, Pesaran, Smith, and Yeo (1985, p. 285).
$$\hat{f}(y) = \frac{1}{T}\sum_{t=1}^{T}\frac{1}{h_T}\,K\!\left(\frac{y - y_t}{h_T}\right),$$
where $K(\cdot)$ is called the kernel function, and $h_T$ is the window width, also called the smoothing parameter or bandwidth. The kernel function needs to satisfy some regularity conditions typical of probability density functions, for example, $K(-\infty) = K(\infty) = 0$, and $\int_{-\infty}^{+\infty} K(x)\,dx = 1$. There exists a vast literature on the choice of this function. One popular choice is the Gaussian kernel, namely
$$K(y) = \frac{1}{\sqrt{2\pi}}\,e^{-y^2/2}.$$
As also pointed out by Pagan and Ullah (1999), the choice of $K$ is not critical to the analysis, and the optimal kernel in most cases will only yield modest improvements in the performance of $\hat{f}(y)$ over selections such as the Gaussian kernel.
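The estimator with a Gaussian kernel can be sketched in a few lines. The function name `kernel_density`, the simulated sample, and the bandwidth value 0.4 are assumptions chosen purely for illustration.

```python
import numpy as np

def kernel_density(y_grid, sample, h):
    """Gaussian-kernel density estimate evaluated at each point of y_grid."""
    y_grid = np.atleast_1d(np.asarray(y_grid, dtype=float))
    sample = np.asarray(sample, dtype=float)
    z = (y_grid[:, None] - sample[None, :]) / h       # (y - y_t) / h_T
    K = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)  # Gaussian kernel
    return K.sum(axis=1) / (sample.size * h)

rng = np.random.default_rng(0)
sample = rng.standard_normal(300)                     # hypothetical data
grid = np.linspace(-8.0, 8.0, 2001)
density = kernel_density(grid, sample, h=0.4)
area = density.sum() * (grid[1] - grid[0])            # Riemann sum, should be near 1
```

Because the kernel itself integrates to one, the estimated density integrates to one for any positive bandwidth. The bandwidth only controls how smooth the estimate looks.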
When implementing density estimates, the choice of the window width, $h_T$, plays an essential role. One crude way of choosing $h_T$ is by a trial-and-error approach, consisting of looking at several different plots of $\hat{f}(y)$ against $y$, when $\hat{f}(y)$ is computed for different values of $h_T$. Other more objective and automatic methods for selecting $h_T$ have been proposed in the literature. One popular choice is the Silverman rule of thumb, according to which
$$h_{srot} = 0.9 \cdot A \cdot T^{-1/5}, \tag{3.46}$$
where $A = \min(\sigma, R/1.34)$, $\sigma$ is the standard deviation of the variable $y$, $R$ is the interquartile range, and $T$ is the number of observations, see Silverman (1986, p. 47). Another very popular method is the least squares cross-validation method, according to which the window width is the value, $h_{lscv}$, that minimizes the following criterion
$$ISE(h_T) = \frac{1}{T^2h_T}\sum_{t \neq s}^{T} K_2\!\left(\frac{y_t - y_s}{h_T}\right) - \frac{2}{T}\sum_{t=1}^{T}\hat{f}_{-t}(y_t), \tag{3.47}$$
and $\hat{f}_{-t}(y_t)$ is the density estimator obtained after omitting the $t$th observation. We have
$$\frac{1}{T}\sum_{t=1}^{T}\hat{f}_{-t}(y_t) = \frac{1}{T(T-1)h_T}\sum_{t \neq j}^{T} K\!\left(\frac{y_t - y_j}{h_T}\right).$$
For the Gaussian kernel the expression for $ISE(h_T)$ simplifies to (see Bowman and Azzalini (1997, p. 37))
$$ISE(h_T) = \frac{1}{T-1}\,\phi\!\left(0, \sqrt{2}h_T\right) + \frac{T-2}{T(T-1)^2}\sum_{t \neq j}^{T}\phi\!\left(y_t - y_j, \sqrt{2}h_T\right) - \frac{2}{T(T-1)}\sum_{t \neq j}^{T}\phi\!\left(y_t - y_j, h_T\right),$$
where $\phi(y, \sigma)$ denotes the normal density function with mean 0 and standard deviation $\sigma$:
$$\phi(y, \sigma) = (2\pi\sigma^2)^{-1/2}\exp\!\left(\frac{-y^2}{2\sigma^2}\right).$$
In cases where local minima are encountered we select the bandwidth that corresponds to the local minimum with the largest value for $h_T$. See Bowman and Azzalini (1997, pp. 33–4). See also Pagan and Ullah (1999), Silverman (1986), Jones, Marron, and Sheather (1996), and Sheather (2004) for further details.
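Both bandwidth rules can be sketched as follows. This is an illustrative implementation under stated assumptions: the sample standard deviation uses the $T-1$ divisor, the cross-validation criterion is the Gaussian-kernel expression above, and the minimization is a plain grid search (the grid itself is arbitrary). All function names are hypothetical.

```python
import numpy as np

def normal_pdf(y, sigma):
    """phi(y, sigma): normal density with mean 0 and standard deviation sigma."""
    return np.exp(-y ** 2 / (2.0 * sigma ** 2)) / np.sqrt(2.0 * np.pi * sigma ** 2)

def silverman_bandwidth(y):
    """Rule of thumb h = 0.9 * A * T^(-1/5), with A = min(sigma, R / 1.34)."""
    y = np.asarray(y, dtype=float)
    q75, q25 = np.percentile(y, [75, 25])
    A = min(y.std(ddof=1), (q75 - q25) / 1.34)
    return 0.9 * A * y.size ** (-0.2)

def ise_gaussian(h, y):
    """Cross-validation criterion for the Gaussian kernel."""
    y = np.asarray(y, dtype=float)
    T = y.size
    d = (y[:, None] - y[None, :])[~np.eye(T, dtype=bool)]   # all pairs with t != j
    return (normal_pdf(0.0, np.sqrt(2.0) * h) / (T - 1)
            + (T - 2) / (T * (T - 1) ** 2) * normal_pdf(d, np.sqrt(2.0) * h).sum()
            - 2.0 / (T * (T - 1)) * normal_pdf(d, h).sum())

def lscv_bandwidth(y, grid):
    """Grid search; among local minima, keep the one with the largest bandwidth."""
    grid = np.asarray(grid, dtype=float)
    values = np.array([ise_gaussian(h, y) for h in grid])
    inner = np.arange(1, grid.size - 1)
    is_min = (values[inner] < values[inner - 1]) & (values[inner] < values[inner + 1])
    candidates = inner[is_min]
    idx = candidates.max() if candidates.size else values.argmin()
    return grid[idx]

rng = np.random.default_rng(0)
y = rng.standard_normal(200)               # hypothetical data
h_srot = silverman_bandwidth(y)
h_grid = np.linspace(0.05, 1.5, 60)
h_lscv = lscv_bandwidth(y, h_grid)
```

If no interior local minimum is found on the grid, the sketch falls back to the grid point with the smallest criterion value. That fallback is a design choice of this example, not something prescribed by the text.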
3.19 Exercises
1. Consider the model
where $L_t$ and $K_t$ are exogenous and the $u_t$ are distributed independently as $N(0, \sigma^2)$ variates. The estimated equation, based on data for 1929–67, is
(a) Test the hypothesis $H_1: \beta_1 = 0$ and also test $H_2: \beta_2 = 0$, each at the 5 per cent level.
(b) Test the joint hypothesis $H: \beta_1 = \beta_2 = 0$ at the 5 per cent level.
(c) Find the 95 per cent confidence interval for $\beta_1 + \beta_2$ (the return to scale parameter) and test the hypothesis $\beta_1 + \beta_2 = 1$.
(d) Re-estimating the equation with $\beta_1 = 1.5$ and $\beta_2 = 0.5$, the (restricted) sum of squared residuals is 0.0678. Use a 5 per cent level F-test to test the joint hypothesis $H: \beta_1 = 1.5$, $\beta_2 = 0.5$.
$$q = \alpha_0 + \alpha_1 y + \alpha_2 p_1 + \alpha_3 p_2 + \alpha_4 n + u \tag{3.48}$$
where $q$ is the volume of food consumption, $y$ real disposable income, $p_1$ an index of the price of food, $p_2$ an index of all other prices and $n$ population. All variables are in logarithms. He knows that the correlation between $p_1$ and $p_2$ is 0.95 and between $y$ and $n$ is 0.93, and decides that the equation suffers from multicollinearity. On asking his colleagues for advice, he gets the following suggestions: Colleague A suggests dropping all variables with t-statistics less than 2. Colleague B says that multicollinearity results from too little data variation and suggests pooling the aggregate time series data with a cross-section budget survey on food consumption. Colleague C recommends that he should reduce the amount he is asking of the data by imposing the restrictions $\alpha_2 + \alpha_3 = 0$ and $1 - \alpha_1 - \alpha_4 = 0$, which are suggested by economic theory. Colleague D says multicollinearity will be reduced by replacing (3.48) by (3.49)
$$Z_1 = \beta_0 + \beta_1 Z_2 + \beta_2 Z_3 + \beta_3 Z_4 + u \tag{3.49}$$
(a) Is the economist correct in being sure that (3.48) will necessarily suffer from multicollinearity?
Suppose that the classical assumptions are applicable to (3.50) and $\varepsilon_t \sim IID(0, \sigma^2)$. Denote the OLS estimators of $\beta_0$, $\beta_1$ and $\beta_2$ by $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\beta}_2$, respectively, and the estimator of $\sigma^2$ by
$$\hat{\sigma}^2 = \frac{1}{T-3}\sum_t \left(y_t - \hat{\beta}_0 - \hat{\beta}_1 x_{1t} - \hat{\beta}_2 x_{2t}\right)^2.$$
$$\widehat{\mathrm{Var}}(\hat{\beta}_1) = \frac{\hat{\sigma}^2}{s_1^2(1 - r^2)}, \qquad \widehat{\mathrm{Var}}(\hat{\beta}_2) = \frac{\hat{\sigma}^2}{s_2^2(1 - r^2)},$$
where $s_1$ and $s_2$ are the standard deviations of $x_{1t}$ and $x_{2t}$, respectively, and $r$ is the correlation coefficient between $x_{1t}$ and $x_{2t}$.
(b) Suppose that $x_{2t} - \mu_2 = \lambda(x_{1t} - \mu_1)$, where $\mu_1$ and $\mu_2$ are the means of $x_{1t}$ and $x_{2t}$, respectively, and $\lambda$ is a fixed non-zero constant. Show that in this case $\mathrm{Var}(\hat{\beta}_1 + \lambda\hat{\beta}_2)$ is finite, although $\widehat{\mathrm{Var}}(\hat{\beta}_1)$ and $\widehat{\mathrm{Var}}(\hat{\beta}_2)$ both blow up individually. How do you interpret this result?
(c) What is meant by the ‘multicollinearity problem’? How is it detected? What are the possible solutions? Discuss, in the light of the results in part (b).
(a) Let $x \sim N(\mu, \Sigma)$, where $x$ and $\mu$ are $s$-dimensional vectors, and $\Sigma$ is an $s \times s$ positive definite matrix. Show that
$$x'\Sigma^{-1}x \sim \chi_s^2(\delta^2),$$
$$y = X\beta + u, \quad u \sim N(0, \sigma^2 I_T),$$
$$H_0: R\beta = c,$$
where $R$ is a known $s \times k$ matrix of rank $s$ and $c$ is a known $s \times 1$ vector. Use the results in (b) to show that
$$(R\hat{\beta} - c)'\left[\sigma^2 R(X'X)^{-1}R'\right]^{-1}(R\hat{\beta} - c) \sim \chi_s^2(\delta^2),$$
4 Heteroskedasticity
4.1 Introduction
Two important assumptions of the classical linear regression model are that the disturbances of the regression equation have constant variances across statistical units and that they are uncorrelated. These assumptions are needed to prove the Gauss–Markov theorem and to show that the least squares estimator is asymptotically efficient. In this chapter, we consider an important extension of the classical model introduced in Chapter 2 by allowing the disturbances, $u_i$, to have variances differing across different values of $i$, namely to be heteroskedastic. Chapter 5 will consider the case of autocorrelated disturbances.
The heteroskedasticity problem frequently arises in cross-section regressions, while it is less
common in time-series regressions. Important examples of regressions with heteroskedastic
errors include cross-section regressions of household consumption expenditure on household
income, cross-country growth regressions, and the cross-section regression of labour produc-
tivity on output growth across firms or industries. Heteroskedasticity also arises if regression
coefficients vary randomly across observations or when observations on yi and xi are average
estimates based on stratified sampling where the average estimates are based on different sample
sizes. In time series regressions heteroskedasticity can arise either because of structural change
or omission of relevant variables from the regression.
yi = α + βxi + ui , (4.1)
and assume that all the classical assumptions apply to this model except that
We first examine the consequences of heteroskedastic errors for the OLS estimators of $\alpha$ and $\beta$, and their standard errors. The OLS estimator of $\beta$ is given by
$$\hat{\beta} = \frac{\sum_i (y_i - \bar{y})(x_i - \bar{x})}{\sum_i (x_i - \bar{x})^2} = \beta + \frac{\sum_i (x_i - \bar{x})u_i}{\sum_i (x_i - \bar{x})^2},$$
where $\sum_i$ stands for summation over $i = 1, 2, \ldots, T$. Hence
$$\hat{\beta} = \beta + \sum_i w_iu_i, \tag{4.3}$$
where the weights $w_i = (x_i - \bar{x})/\sum_j (x_j - \bar{x})^2$ are as given by (1.26). Taking expectations of both sides of (4.3) we have
$$E(\hat{\beta}) = \beta + \sum_i w_iE(u_i).$$
Note that $x_i$, and hence $w_i$, are taken as given. Therefore $E(\hat{\beta}) = \beta$, and the OLS estimator of $\beta$ continues to be unbiased. Consider now the variance of $\hat{\beta}$ in the presence of heteroskedasticity. Taking variances of both sides of (4.3) and noting that the $u_i$ are assumed to be uncorrelated across $i$, we have
$$\mathrm{Var}(\hat{\beta}) = \sum_i w_i^2\,\mathrm{Var}(u_i).$$
Substituting $\mathrm{Var}(u_i) = \sigma_i^2$ and the expression for $w_i$ gives
$$\mathrm{Var}(\hat{\beta}) = \frac{\sum_i \sigma_i^2(x_i - \bar{x})^2}{\left[\sum_i (x_i - \bar{x})^2\right]^2}, \tag{4.5}$$
which differs from the standard OLS formula for the variance of the slope coefficient given in (1.28) and reduces to it only when $\sigma_i^2 = \sigma^2$ for all $i$. Therefore, the presence of heteroskedastic errors biases the estimates of the variances of the OLS estimators and hence invalidates the application of the usual t or F tests to the parameters. This result readily generalizes to the multivariate case.
The direction of the bias in the OLS variance of $\hat{\beta}$ depends on the pattern of the heteroskedasticity. In the case where $\sigma_i^2$ and $(x_i - \bar{x})^2$ are positively correlated, the OLS formula underestimates the true variance of $\hat{\beta}$ given by (4.5), and its use can thus result in invalid inferences.
There are two general ways of dealing with the heteroskedasticity problem. One possibility is to specify the form of the heteroskedasticity and estimate the regression coefficients and the standard errors of their estimates by taking explicit account of the assumed pattern of heteroskedasticity. An alternative approach is to continue using the OLS estimators (i.e., $\hat{\alpha}$ and $\hat{\beta}$), but adjust the variances of $\hat{\alpha}$ and $\hat{\beta}$ for the presence of heteroskedasticity. This latter procedure is particularly of interest when the exact form of the heteroskedasticity (namely $\sigma_i^2$; $i = 1, 2, \ldots, T$) is not known. In these circumstances, consistent estimators of the variances and covariances of the OLS estimators of the regression coefficients have been suggested in the literature, and are known as heteroskedasticity-consistent estimators (HCV) (see Eicker (1963), Eicker, LeCam, and Neyman (1967), Rao (1970), and White (1980)). In the case of the simple regression the heteroskedasticity-consistent estimator of $\mathrm{Var}(\hat{\beta})$ is given by
$$\mathrm{HCV}(\hat{\beta}) = \left(\frac{T}{T-2}\right)\frac{\sum_i \hat{u}_i^2(x_i - \bar{x})^2}{\left[\sum_i (x_i - \bar{x})^2\right]^2}, \tag{4.6}$$
where $\hat{u}_i = y_i - \hat{\alpha} - \hat{\beta}x_i$ are the OLS residuals. The factor $T/(T-2)$ is asymptotically negligible and is introduced to allow for the loss in degrees of freedom due to the estimation of the parameters, $\alpha$ and $\beta$.¹ Notice that apart from the degrees of freedom correction factor, (4.6) gives a consistent estimator of $\mathrm{Var}(\hat{\beta})$ in (4.5) by replacing $\sigma_i^2$ with $\hat{u}_i^2$.
The result (4.6) readily generalizes to the multivariate case, and in matrix notation can be written as
$$\mathrm{HCV}(\hat{\beta}) = \left(\frac{T}{T-k}\right)(X'X)^{-1}\left(\sum_{i=1}^{T}\hat{u}_i^2\,x_ix_i'\right)(X'X)^{-1}, \tag{4.7}$$
where $k$ is the dimension of the coefficient vector $\beta$ in the multivariate regression model,
$$y = X\beta + u,$$
and $x_i$ is the vector of the $i$th observation on the variables (including the intercept term) in the regression equation. See White (1980) and also Sections 5.9 and 5.10 for further discussion on robust estimation and testing in the presence of heteroskedastic and autocorrelated errors.
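A sketch of (4.7), together with a check that its slope element coincides with the scalar formula (4.6) in a simple regression, might look as follows. The function name `ols_with_hcv` and the simulated data (error standard deviation proportional to $x_i$) are assumptions made only for this illustration.

```python
import numpy as np

def ols_with_hcv(X, y):
    """OLS coefficients, heteroskedasticity-consistent covariance, and usual OLS covariance."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    T, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    u = y - X @ beta
    middle = (X * u[:, None] ** 2).T @ X              # sum_i u_i^2 x_i x_i'
    hcv = T / (T - k) * XtX_inv @ middle @ XtX_inv    # matrix formula with T/(T-k)
    ols_cov = (u @ u) / (T - k) * XtX_inv             # valid only under homoskedasticity
    return beta, hcv, ols_cov

rng = np.random.default_rng(0)
T = 400
x = rng.uniform(1.0, 10.0, T)
y = 1.0 + 0.5 * x + x * rng.standard_normal(T)        # hypothetical heteroskedastic errors
X = np.column_stack([np.ones(T), x])
beta, hcv, ols_cov = ols_with_hcv(X, y)

# Scalar formula for the slope in the simple regression.
u = y - X @ beta
dx = x - x.mean()
hcv_slope = T / (T - 2) * np.sum(u ** 2 * dx ** 2) / np.sum(dx ** 2) ** 2
```

Robust standard errors are the square roots of the diagonal of `hcv`. The coefficient estimates themselves are unchanged relative to OLS.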
1 The small sample correction implicit in the introduction of the degrees of freedom factor in the calculation of the
heteroskedasticity-consistent estimators has been suggested in MacKinnon and White (1985).
$$\sigma_i^2 = \sigma^2z_i^2, \tag{4.8}$$
where $\sigma^2$ is an unknown scalar, and $z_i$ are known observations on a variable thought to be distributed independently of $u_i$. In this case, as heteroskedasticity takes a known form, efficient estimation of $\alpha$ and $\beta$ can be achieved by the method of weighted least squares. Dividing both sides of (4.1) by $z_i$ we have
$$\frac{y_i}{z_i} = \alpha\left(\frac{1}{z_i}\right) + \beta\left(\frac{x_i}{z_i}\right) + \frac{u_i}{z_i}, \tag{4.9}$$
or, equivalently,
$$y_i^* = \alpha x_{1i} + \beta x_{2i} + \varepsilon_i, \tag{4.10}$$
where $y_i^* = y_i/z_i$, $x_{1i} = 1/z_i$, $x_{2i} = x_i/z_i$ and $\varepsilon_i = u_i/z_i$. Using (4.8), we now have
$$\mathrm{Var}(\varepsilon_i) = \mathrm{Var}(u_i)/z_i^2 = \sigma^2.$$
Furthermore, since $z_i$ and $x_i$ are assumed to be distributed independently of $u_i$, it also follows that $x_{1i}$ and $x_{2i}$ in (4.10) will be distributed independently of $\varepsilon_i$, and (4.10) satisfies all the classical assumptions. Therefore, by the Gauss–Markov theorem, the OLS estimators of $\alpha$ and $\beta$ in the regression of $y_i^*$ on $x_{1i}$ and $x_{2i}$ will be BLUE (see Section 2.7 for a definition of BLUE estimators). These estimators of $\alpha$ and $\beta$ are also referred to as the weighted or the generalized least squares (GLS) estimators. The efficiency of the estimators of $\alpha$ and $\beta$ in the transformed equation (4.9) over their OLS counterparts easily follows from the fact that the GLS estimators of $\alpha$ and $\beta$ in (4.10) satisfy the Gauss–Markov theorem, while the OLS estimators in (4.1) do not. See Section 5.4 for a direct proof for the general case of a non-diagonal error covariance matrix.
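The weighted least squares transformation can be sketched as below, assuming $z_i$ is observed and strictly positive. The function name and the simulated data are hypothetical. The last lines verify that running OLS on the transformed variables gives the same coefficients as the matrix expression $(X'WX)^{-1}X'Wy$ with $W = \mathrm{diag}(1/z_i^2)$.

```python
import numpy as np

def weighted_least_squares(y, x, z):
    """OLS on the transformed equation y/z = alpha*(1/z) + beta*(x/z) + u/z."""
    y, x, z = (np.asarray(a, dtype=float) for a in (y, x, z))
    y_star = y / z
    X_star = np.column_stack([1.0 / z, x / z])        # no separate intercept column
    coef, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
    resid = y_star - X_star @ coef
    sigma2 = resid @ resid / (y.size - 2)             # estimate of sigma^2
    cov = sigma2 * np.linalg.inv(X_star.T @ X_star)
    return coef, cov

rng = np.random.default_rng(0)
T = 500
x = rng.uniform(1.0, 10.0, T)
z = x.copy()                                          # hypothetical choice: z_i = x_i
y = 1.0 + 0.5 * x + z * rng.standard_normal(T)
coef, cov = weighted_least_squares(y, x, z)

# Equivalent matrix form with weights 1/z_i^2.
X = np.column_stack([np.ones(T), x])
W = np.diag(1.0 / z ** 2)
coef_matrix = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
```

Note that the transformed regression has no constant term of its own: the coefficient on $1/z_i$ estimates $\alpha$ and the coefficient on $x_i/z_i$ estimates $\beta$.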
This model postulates a nonlinear relation between the conditional mean and the conditional variance of the dependent variable and can be justified on theoretical grounds. For example, in the case of risk-averse investors, mean return is related to volatility of stock returns measured by conditional variance of returns.
The estimation of the regression model under any of the above specifications of the heteroskedasticity can be carried out by the maximum likelihood (ML) method as illustrated in Chapter 9. However, in many cases estimation involves using some iterative numerical algorithm, as in the following example.
Example 10 Consider the linear regression model (4.1) with heteroskedastic errors having variances
$$\sigma_i^2 = \sigma^2z_i^{\gamma}, \quad z_i > 0, \tag{4.14}$$
where both $\sigma^2$ and $\gamma$ are unknown parameters. To apply the weighted least squares approach, we first need an estimate of $\gamma$, which can be obtained by the asymptotically efficient ML method. Assume that $u_i$ are normally distributed, then the log-likelihood function for this problem is
$$\ell(\theta) = \ln L(\theta) = \ln P(y_1, y_2, \ldots, y_T) = \sum_{i=1}^{T}\ln P(y_i), \tag{4.15}$$
where $\theta = (\alpha, \beta, \gamma, \sigma^2)'$, and $P(y_i)$ is the probability density function of $y_i$ conditional on $x_i$, given by
$$P(y_i) = \left(2\pi\sigma_i^2\right)^{-1/2}\exp\!\left[-\frac{1}{2\sigma_i^2}\left(y_i - \alpha - \beta x_i\right)^2\right].$$
Hence
$$\ell(\theta) = -\frac{T}{2}\ln\left(2\pi\sigma^2\right) - \frac{\gamma}{2}\sum_i \ln z_i - \frac{1}{2\sigma^2}\sum_i \frac{\left(y_i - \alpha - \beta x_i\right)^2}{z_i^{\gamma}}.$$
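The ML estimators are obtained by maximizing the above function with respect to the unknown parameters $\alpha$, $\beta$, $\gamma$ and $\sigma^2$. For this purpose, the Newton–Raphson method can be used (see Section A.16). The updating relation for the present problem is given by
$$\hat{\theta}^{(i+1)} = \hat{\theta}^{(i)} - \left[\frac{\partial^2\ell(\theta)}{\partial\theta\,\partial\theta'}\right]^{-1}_{\theta = \hat{\theta}^{(i)}}\left[\frac{\partial\ell(\theta)}{\partial\theta}\right]_{\theta = \hat{\theta}^{(i)}}, \quad i = 1, 2, \ldots \tag{4.16}$$
where
$$\frac{\partial\ell(\theta)}{\partial\theta} = \begin{bmatrix} \dfrac{1}{\sigma^2}\sum_i u_iz_i^{-\gamma} \\[2mm] \dfrac{1}{\sigma^2}\sum_i x_iu_iz_i^{-\gamma} \\[2mm] -\dfrac{1}{2}\sum_i \ln z_i + \dfrac{1}{2\sigma^2}\sum_i u_i^2z_i^{-\gamma}\ln z_i \\[2mm] -\dfrac{T}{2\sigma^2} + \dfrac{1}{2\sigma^4}\sum_i u_i^2z_i^{-\gamma} \end{bmatrix},$$
and
$$\frac{\partial^2\ell(\theta)}{\partial\theta\,\partial\theta'} = -\frac{1}{\sigma^2}\begin{bmatrix} \sum_i z_i^{-\gamma} & \sum_i x_iz_i^{-\gamma} & \sum_i u_iz_i^{-\gamma}\ln z_i & \dfrac{\sum_i u_iz_i^{-\gamma}}{\sigma^2} \\[2mm] \sum_i x_iz_i^{-\gamma} & \sum_i x_i^2z_i^{-\gamma} & \sum_i x_iu_iz_i^{-\gamma}\ln z_i & \dfrac{\sum_i x_iu_iz_i^{-\gamma}}{\sigma^2} \\[2mm] \sum_i u_iz_i^{-\gamma}\ln z_i & \sum_i x_iu_iz_i^{-\gamma}\ln z_i & \dfrac{1}{2}\sum_i u_i^2(\ln z_i)^2z_i^{-\gamma} & \dfrac{\sum_i u_i^2z_i^{-\gamma}\ln z_i}{2\sigma^2} \\[2mm] \dfrac{\sum_i u_iz_i^{-\gamma}}{\sigma^2} & \dfrac{\sum_i x_iu_iz_i^{-\gamma}}{\sigma^2} & \dfrac{\sum_i u_i^2z_i^{-\gamma}\ln z_i}{2\sigma^2} & \dfrac{\sum_i u_i^2z_i^{-\gamma}}{\sigma^4} - \dfrac{T}{2\sigma^2} \end{bmatrix},$$
with $u_i = y_i - \alpha - \beta x_i$. It is often convenient to replace $\partial^2\ell(\theta)/\partial\theta\,\partial\theta'$ by its expectation (or its probability limit). Since $E(u_i) = 0$ and $E(u_i^2) = \sigma^2z_i^{\gamma}$, in this case we have
$$E\left[\frac{-\partial^2\ell(\theta)}{\partial\theta\,\partial\theta'}\right] = \frac{1}{\sigma^2}\begin{bmatrix} \sum_i z_i^{-\gamma} & \sum_i x_iz_i^{-\gamma} & 0 & 0 \\ \sum_i x_iz_i^{-\gamma} & \sum_i x_i^2z_i^{-\gamma} & 0 & 0 \\ 0 & 0 & \dfrac{\sigma^2}{2}\sum_i (\ln z_i)^2 & \dfrac{1}{2}\sum_i \ln z_i \\ 0 & 0 & \dfrac{1}{2}\sum_i \ln z_i & \dfrac{T}{2\sigma^2} \end{bmatrix}, \tag{4.18}$$
which is Fisher’s information matrix for the ML estimator of $\theta$. The asymptotic variance-covariance matrix of the ML estimator of $\theta$ is given by the inverse of Fisher’s information matrix given in (4.18) (see Section 9.4). The block diagonality of the information matrix in the case of this example establishes that the ML estimators of the regression coefficients ($\alpha$ and $\beta$) and the parameters of the error-variance ($\sigma^2$ and $\gamma$) are asymptotically independent. Hence, we have:
$$\mathrm{Asy\,Var}\begin{pmatrix} \hat{\alpha} \\ \hat{\beta} \end{pmatrix} = \sigma^2\begin{bmatrix} \sum_i z_i^{-\gamma} & \sum_i x_iz_i^{-\gamma} \\ \sum_i x_iz_i^{-\gamma} & \sum_i x_i^2z_i^{-\gamma} \end{bmatrix}^{-1}, \tag{4.19}$$
which is the same as the variance matrix of $\alpha$ and $\beta$ in the OLS regression of $z_i^{-\gamma/2}y_i$ on $z_i^{-\gamma/2}$ and $x_iz_i^{-\gamma/2}$. Similarly
$$\mathrm{Asy\,Var}\begin{pmatrix} \hat{\gamma} \\ \hat{\sigma}^2 \end{pmatrix} = \sigma^2\begin{bmatrix} \dfrac{\sigma^2}{2}\sum_i (\ln z_i)^2 & \dfrac{1}{2}\sum_i \ln z_i \\[2mm] \dfrac{1}{2}\sum_i \ln z_i & \dfrac{T}{2\sigma^2} \end{bmatrix}^{-1},$$
which yields
$$\mathrm{Asy\,Var}(\hat{\gamma}) = \frac{2}{\sum_i (\ln z_i)^2 - \dfrac{1}{T}\left(\sum_i \ln z_i\right)^2}. \tag{4.20}$$
This result can be used, for example, to test the homoskedasticity hypothesis, $H_0: \gamma = 0$.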
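The following sketch estimates $\gamma$ for the model of Example 10. It does not implement the Newton–Raphson iterations: as a simpler stand-in it concentrates $\alpha$, $\beta$ and $\sigma^2$ out of the log-likelihood (for a given $\gamma$ these are the weighted least squares estimates and the mean squared weighted residual) and maximizes over a grid of $\gamma$ values. The grid, the simulated data and the function name are assumptions. The standard error uses $2/\sum_i(\ln z_i - \overline{\ln z})^2$, the variance implied by the information matrix of this model.

```python
import numpy as np

def concentrated_loglik(gamma, y, x, z):
    """Log-likelihood maximized over alpha, beta and sigma^2 for a given gamma."""
    w = z ** (-gamma / 2.0)                           # weights z_i^(-gamma/2)
    Xw = np.column_stack([w, x * w])
    coef, *_ = np.linalg.lstsq(Xw, y * w, rcond=None)
    resid = y * w - Xw @ coef
    T = y.size
    sigma2 = resid @ resid / T                        # ML estimate of sigma^2 given gamma
    ll = (-0.5 * T * np.log(2.0 * np.pi * sigma2)
          - 0.5 * gamma * np.log(z).sum() - 0.5 * T)
    return ll, coef, sigma2

rng = np.random.default_rng(0)
T, gamma_true = 500, 2.0
x = rng.uniform(1.0, 10.0, T)
z = rng.uniform(1.0, 10.0, T)
y = 1.0 + 0.5 * x + z ** (gamma_true / 2.0) * rng.standard_normal(T)

grid = np.linspace(0.0, 4.0, 401)
loglik = np.array([concentrated_loglik(g, y, x, z)[0] for g in grid])
gamma_hat = grid[loglik.argmax()]
_, (alpha_hat, beta_hat), sigma2_hat = concentrated_loglik(gamma_hat, y, x, z)

# Asymptotic standard error of gamma_hat and t-ratio for H0: gamma = 0.
lz = np.log(z)
se_gamma = np.sqrt(2.0 / (np.sum(lz ** 2) - lz.sum() ** 2 / T))
t_gamma = gamma_hat / se_gamma
```

A grid search is crude but transparent. In practice the grid maximizer could serve as the starting value for the Newton–Raphson or scoring iterations.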
Estimation procedures other than ML have also been suggested in the literature. For example, in the case of the mean-variance specification, (4.13), the following two-step procedure has often been suggested for the case where $\delta = 1$:
Step I Run the OLS regression of $y_i$ on $x_i$ (including an intercept) to obtain $\hat{\alpha}$ and $\hat{\beta}$ and hence the fitted values $\hat{y}_i = \hat{\alpha} + \hat{\beta}x_i$.
Step II Run the OLS regression of $y_i/\hat{y}_i$ on $1/\hat{y}_i$ and $x_i/\hat{y}_i$ to obtain new estimates of $\alpha$ and $\beta$.
The estimates of $\alpha$ and $\beta$ obtained in Step II are asymptotically more efficient than the OLS estimator. In the case where $\delta$ is not known a further step is needed to estimate $\delta$ from the regression of $\ln \hat{u}_i^2$ on $\ln \hat{y}_i^2$, including an intercept term. The coefficient of $\ln \hat{y}_i^2$ in this regression provides us with an estimate of $\delta$ (say $\hat{\delta}$), which can then be used to compute new estimates of $\alpha$ and $\beta$ from the OLS regression of $y_i\hat{y}_i^{-\hat{\delta}}$ on $\hat{y}_i^{-\hat{\delta}}$ and $\hat{y}_i^{-\hat{\delta}}x_i$. Recall that $\hat{y}_i = \hat{\alpha} + \hat{\beta}x_i$.
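Steps I and II for the case $\delta = 1$ can be sketched as follows. The simulated design (error standard deviation proportional to the conditional mean, with all fitted values positive) and the helper name `ols` are assumptions for illustration.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients via least squares."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
T = 500
x = rng.uniform(1.0, 10.0, T)
mean = 2.0 + 0.5 * x
y = mean + 0.2 * mean * rng.standard_normal(T)    # s.d. proportional to the mean

# Step I: OLS of y on an intercept and x, then fitted values.
X = np.column_stack([np.ones(T), x])
alpha1, beta1 = ols(X, y)
y_fit = alpha1 + beta1 * x

# Step II: OLS of y/y_fit on 1/y_fit and x/y_fit (no extra intercept).
X2 = np.column_stack([1.0 / y_fit, x / y_fit])
alpha2, beta2 = ols(X2, y / y_fit)
```

The division in Step II requires fitted values that stay away from zero, which holds by construction in this simulated example but would need checking with real data.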
Group 1: $y_i = \alpha_{I} + \beta_{I}x_i + u_i^{I}$, $\quad i = 1, 2, \ldots, T_1$,
Group 2: $y_i = \alpha_{II} + \beta_{II}x_i + u_i^{II}$, $\quad i = T_1 + 1, \ldots, T_1 + T_2$,
Group 3: $y_i = \alpha_{III} + \beta_{III}x_i + u_i^{III}$, $\quad i = T_1 + T_2 + 1, \ldots, T$,
where $T = T_1 + T_2 + T_3$.
Step II: Run the OLS regressions of $y_i$ on $x_i$ for the first and the third groups separately. Obtain the sums of squares of residuals for these two regressions, and denote them by $SSR_{I}$ and $SSR_{III}$, respectively.
Step III: Construct the statistic
$$F = \frac{SSR_{I}/(T_1 - 2)}{SSR_{III}/(T_3 - 2)} = \frac{\hat{\sigma}_{I}^2}{\hat{\sigma}_{III}^2},$$
where $\hat{\sigma}_{I}^2$ and $\hat{\sigma}_{III}^2$ are the unbiased estimates of $\mathrm{Var}(u_i^{I})$ and $\mathrm{Var}(u_i^{III})$, computed using the observations in groups I and III, respectively. It is convenient to compute the above F statistic such that it is larger than unity (by putting the larger estimate of the variance in the numerator), so that the test statistic is more directly comparable to the critical values in F tables. Under the null hypothesis of homoskedasticity, the above F-statistic has an F-distribution with $T_1 - 2$ and $T_3 - 2$ degrees of freedom. Large values of F are associated with the rejection of the homoskedasticity assumption, and possible evidence of heteroskedasticity. The Goldfeld–Quandt test readily generalizes to multivariate regressions and to more than three observation groups.
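Steps II and III can be sketched as below. The first step is not shown in the passage above, so the code assumes that observations are first ordered by a variable thought to drive the error variance (the `order_by` argument, a hypothetical name) before the first $T_1$ and last $T_3$ observations are used.

```python
import numpy as np

def ssr(x, y):
    """Sum of squared OLS residuals from regressing y on an intercept and x."""
    X = np.column_stack([np.ones(x.size), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return resid @ resid

def goldfeld_quandt(x, y, order_by, T1, T3):
    """F statistic (larger variance estimate in the numerator) and its degrees of freedom."""
    idx = np.argsort(order_by)                       # assumed ordering step
    x = np.asarray(x, dtype=float)[idx]
    y = np.asarray(y, dtype=float)[idx]
    s2_first = ssr(x[:T1], y[:T1]) / (T1 - 2)        # group I variance estimate
    s2_last = ssr(x[-T3:], y[-T3:]) / (T3 - 2)       # group III variance estimate
    if s2_first >= s2_last:
        return s2_first / s2_last, (T1 - 2, T3 - 2)
    return s2_last / s2_first, (T3 - 2, T1 - 2)

rng = np.random.default_rng(0)
T = 300
x = rng.uniform(1.0, 10.0, T)
y = 1.0 + 0.5 * x + x * rng.standard_normal(T)       # error s.d. proportional to x
F, dof = goldfeld_quandt(x, y, order_by=x, T1=100, T3=100)
```

The statistic would be compared with the critical value of the F-distribution with the returned degrees of freedom.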
1. Multiplicative specification:
$H_0: \gamma_1 = \gamma_2 = \cdots = \gamma_p = 0$,
2. Additive specification:
$H_0: \lambda_1 = \lambda_2 = \cdots = \lambda_p = 0$,
3. Mean-variance specification:
$H_0: \delta = 0$.
Any one of the three likelihood-based approaches discussed in Section 9.7 can be used to implement the tests. The simplest procedure to compute is the Lagrange multiplier (LM) method, since this method does not require the estimation of the regression model under heteroskedasticity. One popular LM procedure for testing the homoskedasticity assumption is based on an additive version of the mean-variance model, (4.13). The LM statistic is computed as the t-ratio of the slope coefficient in the regression of $\hat{u}_i^2$ on $\hat{y}_i^2$ (including an intercept), where $\hat{u}_i$ are the OLS residuals and $\hat{y}_i$ are the fitted values. Under the null hypothesis of homoskedastic variances, this t-ratio is asymptotically distributed as a standard normal variable. In small samples, however, it is more advisable to use critical values from the t-distribution rather than the critical values from the normal distribution.
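The t-ratio version of the LM test described above can be sketched as follows. The function name and the simulated data are assumptions for illustration.

```python
import numpy as np

def lm_t_ratio(X, y):
    """t-ratio of the slope in the regression of squared OLS residuals on squared fitted values."""
    T = y.size
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_fit = X @ coef
    u2 = (y - y_fit) ** 2
    Z = np.column_stack([np.ones(T), y_fit ** 2])    # intercept and squared fitted values
    g, *_ = np.linalg.lstsq(Z, u2, rcond=None)
    e = u2 - Z @ g
    s2 = e @ e / (T - 2)
    cov = s2 * np.linalg.inv(Z.T @ Z)
    return g[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(0)
T = 500
x = rng.uniform(1.0, 10.0, T)
mean = 2.0 + 0.5 * x
y = mean + 0.5 * mean * rng.standard_normal(T)       # variance rises with the mean
X = np.column_stack([np.ones(T), x])
t_stat = lm_t_ratio(X, y)
```

In this simulated design the t-ratio should be well above the usual critical values, pointing to rejection of homoskedasticity.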
$$H_0: \lambda_1 = \lambda_2 = \cdots = \lambda_p = 0,$$
against
$$H_1: \lambda_1 \neq 0,\ \lambda_2 \neq 0,\ \ldots,\ \lambda_p \neq 0,$$
using the F-test or other asymptotically equivalent procedures. For example, denoting the multiple correlation coefficient of the regression of $\hat{u}_i^2$ on $z_{i1}, z_{i2}, \ldots, z_{ip}$ by $R$, it is easily seen that under $H_0: \lambda_1 = \lambda_2 = \cdots = \lambda_p = 0$, the statistic $T \cdot R^2$ is asymptotically distributed as a $\chi^2$ with $p$ degrees of freedom. This test is also asymptotically equivalent to the test proposed by Breusch and Pagan (1980), which tests $H_0$ against the more general alternative specification $\sigma_i^2 = f(\alpha_0 + \lambda_1z_{i1} + \cdots + \lambda_pz_{ip})$, where $f(\cdot)$ could be any general function. The White (1980) test of homoskedasticity is a particular application of the above test where the $z_{ij}$’s are chosen to be equal to the regressors, their squares and their cross products. For example, in the case of the regression equation
$$y_i = \alpha + \beta_1x_{i1} + \beta_2x_{i2} + u_i,$$
$$z_{i1} = x_{i1}, \quad z_{i2} = x_{i2}, \quad z_{i3} = x_{i1}^2, \quad z_{i4} = x_{i2}^2, \quad z_{i5} = x_{i1}x_{i2}.$$
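For the two-regressor case, the $T \cdot R^2$ statistic with the five auxiliary variables listed above can be sketched as follows. The function name and the simulated data are assumptions made for this example.

```python
import numpy as np

def white_test(x1, x2, y):
    """T * R^2 from regressing squared OLS residuals on levels, squares and cross product."""
    T = y.size
    X = np.column_stack([np.ones(T), x1, x2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    u2 = (y - X @ coef) ** 2
    Z = np.column_stack([np.ones(T), x1, x2, x1 ** 2, x2 ** 2, x1 * x2])
    g, *_ = np.linalg.lstsq(Z, u2, rcond=None)
    e = u2 - Z @ g
    r2 = 1.0 - e @ e / np.sum((u2 - u2.mean()) ** 2)
    p = Z.shape[1] - 1                               # number of z variables (intercept excluded)
    return T * r2, p

rng = np.random.default_rng(0)
T = 500
x1 = rng.uniform(1.0, 10.0, T)
x2 = rng.standard_normal(T)
y = 1.0 + 0.5 * x1 - 0.3 * x2 + x1 * rng.standard_normal(T)   # s.d. proportional to x1
stat, p = white_test(x1, x2, y)
```

The statistic would be compared with a $\chi^2$ critical value with $p = 5$ degrees of freedom (11.07 at the 5 per cent level).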
A particularly simple example of the above testing procedure involves running the auxiliary
regression
4.7 Exercises
1. The linear regression model
$$y_t = \sum_{j=1}^{k} b_jx_{tj} + u_t, \quad t = 1, 2, \ldots, T, \tag{4.22}$$
satisfies all the assumptions of the classical normal regression model except for the variance of $u_t$, which is given by $V(u_t) = \sigma^2|z_t|$.
(a) Discuss the statistical problems that arise if relation (4.22) is estimated by OLS.
(b) Set out the computational steps involved in estimating the parameters of (4.22) effi-
ciently.
(c) How would you test for heteroskedasticity if the form of error variance is not known?
where ȳi is the mean expenditure on alcohol in group i, and x̄i is the mean income of group i.
Each group i has Ni members and the model satisfies all the classical assumptions except that
the variance of ε i is equal to σ 2 /Ni .
(a) What are the statistical properties of the OLS estimates of α and β?
(b) Obtain the best linear unbiased estimators of α and β.
3. Using cross-sectional observations on 2049 households, the following linear Engel curve was
estimated by OLS:
(a) Test ‘Engel’s Law’ (that the share of food in household expenditure declines with house-
hold income).
(b) Test the hypothesis that the error variances in equation (4.23) are homoskedastic. Interpret your results.
(c) Discuss the relevance of your test for heteroskedasticity. Do the reported summary and
diagnostic statistics have any implications for your test of Engel’s Law?
where $\bar{W}_g \equiv \bar{C}_g/\bar{Y}_g$, $\bar{C}_g \equiv$ household expenditure on food for income group $g$, $\bar{Y}_g \equiv$ mean income for group $g$, $N_g \equiv$ number of households in income group $g$.
(a) Test Engel’s Law using (4.24). Compare your results with those based on (4.23)
(b) Suppose that grouping the data has dealt with the measurement error problem. Discuss
the other econometric problems that may arise with grouped data. Suggest a more effi-
cient way of estimating an Engel curve using grouped data.
5 Autocorrelated Disturbances
5.1 Introduction
This chapter considers extensions of multiple regression analysis to the case where regression disturbances are serially correlated. Serial correlation, or autocorrelation, arises when the regression errors are not independently distributed, due to persistence of observations either over time or over space. Our focus in this chapter is on time series observations; the problem of spatial dependence is addressed in Chapter 30. Serially correlated errors may also arise when the dynamics of the interactions between the dependent variable, $y_t$, and the regressors, $x_t$, are not adequately taken into account, or if the regression model is misspecified due to the omission of persistent regressors.
$$y = X\beta + u, \tag{5.1}$$
where we assume
with $\Sigma$ being a $T \times T$ positive definite matrix. Model (5.1), together with assumptions (5.2) and (5.3), is known as the generalized linear regression model. Note that the classical linear regression model can be obtained by setting $\Sigma = \sigma^2I_T$. Another special case of the above specification is the regression model with heteroskedastic disturbances introduced in Chapter 4.
and $\hat\beta_{OLS}$ is still an unbiased estimator of $\beta$. The variance of the OLS estimator is
$$\begin{aligned}
\mathrm{Var}(\hat\beta_{OLS}) &= E\left[(\hat\beta_{OLS}-\beta)(\hat\beta_{OLS}-\beta)'\right] \\
&= E\left[(X'X)^{-1}X'uu'X(X'X)^{-1}\right] \\
&= (X'X)^{-1}X'\Omega X(X'X)^{-1}.
\end{aligned}$$
It follows that, if the matrices $\operatorname{Plim}_{T\to\infty} T^{-1}X'X$ and $\operatorname{Plim}_{T\to\infty} T^{-1}X'\Omega X$ are both positive definite matrices with finite elements, then $\hat\beta_{OLS}$ is consistent for $\beta$. Further, under normality of $u$,
$$\hat\beta_{OLS} \sim N\left(\beta,\ (X'X)^{-1}X'\Omega X(X'X)^{-1}\right).$$
Hence, in the presence of residual serial correlation, the variance of the least squares estimator is not $\sigma^2(X'X)^{-1}$, and statistical inference based on $\hat\sigma^2(X'X)^{-1}$ may be misleading.
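To illustrate how far the conventional formula can be from the correct variance, the following Python sketch compares the two for a regression on an intercept and a linear trend with AR(1) disturbances. The function names, the choice of regressors and the value $\phi = 0.8$ are illustrative assumptions, not taken from the text.

```python
import numpy as np

def ar1_cov(T, phi, sigma2_eps=1.0):
    """Covariance matrix of a stationary AR(1) process:
    element (t, s) equals sigma2_eps * phi^|t-s| / (1 - phi^2)."""
    idx = np.arange(T)
    return sigma2_eps * phi ** np.abs(idx[:, None] - idx[None, :]) / (1.0 - phi ** 2)

def ols_variances(X, Omega):
    """Return (correct, conventional) variance matrices of the OLS estimator."""
    XtX_inv = np.linalg.inv(X.T @ X)
    correct = XtX_inv @ X.T @ Omega @ X @ XtX_inv   # (X'X)^-1 X' Omega X (X'X)^-1
    conventional = Omega[0, 0] * XtX_inv            # sigma^2 (X'X)^-1
    return correct, conventional

T = 50
X = np.column_stack([np.ones(T), np.arange(T, dtype=float)])  # intercept and trend
correct, conventional = ols_variances(X, ar1_cov(T, phi=0.8))
ratio = np.diag(correct) / np.diag(conventional)
print(ratio)  # values above one mean the conventional formula understates the variance
```

With positively autocorrelated errors and a smooth regressor, the ratios exceed one, so t-ratios based on the conventional formula would be too large.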
$$\Omega^{-1} = QQ'.$$
Matrix $Q$ exists if $\Omega$ is positive definite, but it is not unique. It can be constructed from the eigenvalues and eigenvectors of $\Omega$ (see Section A.5 in Appendix A). Now consider the following transformations
$$y^* = Q'y, \quad X^* = Q'X, \quad u^* = Q'u.$$
$$y^* = X^*\beta + u^*,$$
satisfies all the classical assumptions, and establishes that the OLS estimator of $\beta$ in the regression of $y^*$ on $X^*$ is the best linear unbiased estimator (BLUE). We have
$$\hat\beta_{GLS} = (X^{*\prime}X^*)^{-1}X^{*\prime}y^* = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y.$$
This is known as the generalized least squares (GLS) estimator, and is more efficient than the
OLS estimator. The efficiency of the GLS over the OLS estimator follows immediately from
the fact that the GLS estimator satisfies the assumptions of the Gauss–Markov theorem. It is
also instructive to give a direct proof of the efficiency of β̂ GLS over the β̂ OLS estimator. We first
note that
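$$\mathrm{Var}(\hat\beta_{GLS}) = (X^{*\prime}X^*)^{-1} = (X'\Omega^{-1}X)^{-1},$$
and
$$\mathrm{Var}(\hat\beta_{OLS}) = (X'X)^{-1}X'\Omega X(X'X)^{-1}.$$
To prove that $\hat\beta_{GLS}$ is at least as efficient as $\hat\beta_{OLS}$ it is sufficient to show that $\mathrm{Var}(\hat\beta_{OLS}) - \mathrm{Var}(\hat\beta_{GLS})$ is a positive semi-definite matrix. This follows if it is shown that
$$\left[\mathrm{Var}(\hat\beta_{GLS})\right]^{-1} - \left[\mathrm{Var}(\hat\beta_{OLS})\right]^{-1} \ge 0,$$
or equivalently if
$$X'\Omega^{-1}X - X'X(X'\Omega X)^{-1}X'X \ge 0.$$
Note that the left-hand side of the above inequality can be written as
$$X'\Omega^{-1/2}\left[I_T - \Omega^{1/2}X\left(X'\Omega^{1/2}\Omega^{1/2}X\right)^{-1}X'\Omega^{1/2}\right]\Omega^{-1/2}X,$$
and the condition for the efficiency of $\hat\beta_{GLS}$ over $\hat\beta_{OLS}$ becomes
$$Y'\left[I_T - Z(Z'Z)^{-1}Z'\right]Y \ge 0,$$
with $Y = \Omega^{-1/2}X$ and $Z = \Omega^{1/2}X$.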
$M_Z = I_T - Z(Z'Z)^{-1}Z'$ being an idempotent matrix allows us to write $Y'M_ZY$ as $(M_ZY)'(M_ZY)$.
The above proof also shows that for the GLS estimators to be strictly more efficient than the
least squares estimators, it is required that MZ Y be non-zero.
See Section 19.2.1 for a review of the GLS estimator in the context of seemingly unrelated
regressions.
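The transformation argument and the efficiency result can be checked numerically. The sketch below is a minimal illustration under assumed inputs: a known AR(1) covariance matrix, simulated data, and a matrix $Q$ taken from the Cholesky factor of $\Omega^{-1}$ (one of many valid choices, since $Q$ is not unique). The function name `gls` is hypothetical.

```python
import numpy as np

def gls(y, X, Omega):
    """GLS as OLS on y* = Q'y, X* = Q'X, where Q Q' = Omega^{-1}."""
    Q = np.linalg.cholesky(np.linalg.inv(Omega))    # lower triangular, Q @ Q.T = Omega^{-1}
    y_star, X_star = Q.T @ y, Q.T @ X
    beta = np.linalg.solve(X_star.T @ X_star, X_star.T @ y_star)
    var = np.linalg.inv(X_star.T @ X_star)          # equals (X' Omega^{-1} X)^{-1}
    return beta, var

rng = np.random.default_rng(0)
T, phi = 40, 0.7
idx = np.arange(T)
Omega = phi ** np.abs(idx[:, None] - idx[None, :]) / (1 - phi ** 2)
X = np.column_stack([np.ones(T), rng.normal(size=T)])
y = X @ np.array([1.0, 2.0]) + np.linalg.cholesky(Omega) @ rng.normal(size=T)

beta_gls, var_gls = gls(y, X, Omega)
XtX_inv = np.linalg.inv(X.T @ X)
var_ols = XtX_inv @ X.T @ Omega @ X @ XtX_inv
# Var(OLS) - Var(GLS) should be positive semi-definite
min_eig = np.linalg.eigvalsh(var_ols - var_gls).min()
print(beta_gls, min_eig)
```

The smallest eigenvalue of the difference should be non-negative up to rounding error, in line with the proof above.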
Example 11 In general $\Omega$ is not known, but there are some special cases of interest where $\Omega$ is known up to a scalar constant. One important example is the simple heteroskedastic model introduced in Chapter 4 where
$$\Omega = \sigma^2 \begin{pmatrix} z_1^2 & 0 & \cdots & 0\\ 0 & z_2^2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & z_T^2 \end{pmatrix},$$
and the $z_t$'s are observations on a variable distributed independently of the disturbances, $u$. In this case the GLS estimator reduces to the weighted least squares estimator obtained from the regression of $y_t/z_t$ on $x_{it}/z_t$, for $i = 1, 2, \ldots, k$, the weights being the inverse of the $z_t$'s.
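The weighted least squares reduction in Example 11 can be verified with a few lines of Python. This is only a sketch: the function name `wls`, the scale variable `z` and the simulated data are assumptions made for illustration.

```python
import numpy as np

def wls(y, X, z):
    """OLS of y_t/z_t on x_it/z_t, i.e. GLS when Omega = sigma^2 diag(z_t^2)."""
    return np.linalg.lstsq(X / z[:, None], y / z, rcond=None)[0]

rng = np.random.default_rng(3)
T = 60
z = rng.uniform(0.5, 2.0, size=T)                      # hypothetical scale variable
X = np.column_stack([np.ones(T), rng.normal(size=T)])
y = X @ np.array([1.0, -0.5]) + z * rng.normal(size=T)  # error standard deviation proportional to z_t
beta_wls = wls(y, X, z)
print(beta_wls)
```

The result coincides with the GLS formula $(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y$ evaluated at $\Omega = \mathrm{diag}(z_t^2)$, since the scalar $\sigma^2$ cancels.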
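$$\hat\beta_{FGLS} = \left(X'\hat\Omega^{-1}X\right)^{-1}X'\hat\Omega^{-1}y.$$
and
$$\operatorname{Plim}_{T\to\infty}\left(T^{-1/2}X'\hat\Omega^{-1}u - T^{-1/2}X'\Omega^{-1}u\right) = 0.$$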
If these conditions are satisfied, then the FGLS estimator, based on θ̂ , has the same asymptotic
properties as the infeasible GLS estimator, β̂ GLS .
$$y_t = \alpha + \beta x_t + u_t, \tag{5.6}$$
$$u_t = \phi u_{t-1} + \varepsilon_t, \tag{5.7}$$
where we assume that $|\phi| < 1$, and $\varepsilon_t$ is a white-noise process, namely it is a serially uncorrelated process with a zero mean and a constant variance $\sigma^2_\varepsilon$.
Note that condition (5.9) is weaker than the orthogonality assumption (5.3), where it is assumed
that εt is uncorrelated with future values of xt , as well as with its current and past values.
By repeated substitution in (5.7), we have
$$u_t = \varepsilon_t + \phi\varepsilon_{t-1} + \phi^2\varepsilon_{t-2} + \ldots,$$
which is known as the moving average form for $u_t$. From the above expression, under $|\phi| < 1$, each disturbance $u_t$ embodies the entire past history of the $\varepsilon_t$, with the most recent observations receiving greater weight than those in the distant past. Since the successive values of $\varepsilon_t$ are uncorrelated, the variance of $u_t$ is
$$\mathrm{Var}(u_t) = \sigma^2_\varepsilon + \phi^2\sigma^2_\varepsilon + \phi^4\sigma^2_\varepsilon + \ldots = \frac{\sigma^2_\varepsilon}{1-\phi^2}.$$
The covariance between $u_t$ and $u_{t-1}$ is
$$\mathrm{Cov}(u_t, u_{t-1}) = E(u_tu_{t-1}) = E\left[(\phi u_{t-1} + \varepsilon_t)u_{t-1}\right] = \phi\,\frac{\sigma^2_\varepsilon}{1-\phi^2}.$$
To obtain the covariance between $u_t$ and $u_{t-s}$, for any $s$, first note that by applying repeated substitution equation (5.7) can be written as
$$u_t = \phi^su_{t-s} + \sum_{i=0}^{s-1}\phi^i\varepsilon_{t-i}.$$
It follows that
$$\mathrm{Cov}(u_t, u_{t-s}) = E(u_tu_{t-s}) = E\left[\left(\phi^su_{t-s} + \sum_{i=0}^{s-1}\phi^i\varepsilon_{t-i}\right)u_{t-s}\right] = \phi^s\,\frac{\sigma^2_\varepsilon}{1-\phi^2}.$$
Note that, under |φ| < 1, the values decline exponentially as we move away from the diagonal.
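As a check on the autocovariance formula, the closed form $\phi^s\sigma^2_\varepsilon/(1-\phi^2)$ can be compared with the value implied by the moving average form of $u_t$. The sketch below truncates the infinite sum at `n_terms` terms; the function names and the truncation point are assumptions chosen for illustration.

```python
import numpy as np

def ar1_autocov(phi, sigma2_eps, s):
    """Closed form: Cov(u_t, u_{t-s}) = phi^s * sigma2_eps / (1 - phi^2)."""
    return phi ** s * sigma2_eps / (1.0 - phi ** 2)

def ar1_autocov_ma(phi, sigma2_eps, s, n_terms=2000):
    """Same covariance from the (truncated) moving average form.
    eps_{t-s-i} enters u_t with weight phi^(s+i) and u_{t-s} with weight phi^i."""
    i = np.arange(n_terms)
    return sigma2_eps * np.sum(phi ** (s + i) * phi ** i)

for s in range(4):
    print(s, ar1_autocov(0.6, 1.0, s), ar1_autocov_ma(0.6, 1.0, s))
```

The two columns agree to machine precision, and the geometric decline in $s$ is the exponential decay away from the diagonal noted above.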
5.5.1 Estimation
Suppose, initially, that the parameter φ in (5.7) is known. Then model (5.6) can be transformed
so that the transformed equation satisfies the classical assumptions. To do this, first substitute
ut = yt − α − βxt in ut = φut−1 + εt to obtain:
It is clear that in this transformed regression, disturbances ε t satisfy all the classical assumptions,
and efficient estimators of α and β can be obtained by the OLS regression of y∗t on x∗t .
For the AR(2) error process we need to use the following transformations:
The above procedure ignores the effect of initial observations. For example, for the AR(1) case
we can allow for the initial observations using:
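$$x_1^* = \sqrt{1-\phi^2}\,x_1, \tag{5.14}$$
$$y_1^* = \sqrt{1-\phi^2}\,y_1, \tag{5.15}$$
so that the transformed initial disturbance, $\sqrt{1-\phi^2}\,u_1$, has the same variance, $\sigma^2_\varepsilon$, as the remaining disturbances.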
Efficient estimators of α and β that take account of initial observations can now be obtained by
running the OLS regression of
$$y^* = (y_1^*, y_2^*, \ldots, y_T^*)', \tag{5.16}$$
on an intercept and
$$x^* = (x_1^*, x_2^*, \ldots, x_T^*)'. \tag{5.17}$$
The estimators that make use of the initial observations and those that do not are asymptotically
equivalent (i.e., there is little to choose between them when it is known that |φ| < 1 and T is
relatively large).
If $\phi = 1$, then $y_t^*$ and $x_t^*$ will be the same as the first differences of $y_t$ and $x_t$, and $\beta$ can be estimated by the regression of $\Delta y_t$ on $\Delta x_t$. In this case there is no long-run relationship between the levels of $y_t$ and $x_t$.
When $\phi$ is unknown, $\phi$ and $\beta$ can be estimated using the Cochrane and Orcutt (C–O) two-step procedure. Let $\hat\phi(0)$ be the initial estimate of $\phi$, then generate the quasi-differenced variables
$$x_t^*(0) = x_t - \hat\phi(0)x_{t-1}, \quad y_t^*(0) = y_t - \hat\phi(0)y_{t-1}.$$
Then run a regression of $y_t^*(0)$ on $x_t^*(0)$ to obtain new estimates of $\alpha$ and $\beta$ (say $\hat\alpha(1)$ and $\hat\beta(1)$), and hence a new estimate of $\phi$, given by
$$\hat\phi(1) = \frac{\sum_{t=2}^T\hat u_t(1)\hat u_{t-1}(1)}{\sum_{t=2}^T\hat u_t^2(1)},$$
where $\hat u_t(1) = y_t - \hat\alpha(1) - \hat\beta(1)x_t$. Generate new transformed observations $x_t^*(1) = x_t - \hat\phi(1)x_{t-1}$ and $y_t^*(1) = y_t - \hat\phi(1)y_{t-1}$, and repeat the above steps until two successive estimates of $\beta$ are sufficiently close to one another.
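The iteration described above can be sketched in Python as follows. This is a minimal illustration for the single-regressor model (5.6)–(5.7), not the Microfit implementation; the function name, tolerance, starting value and simulated data are all assumptions. Note that the intercept of the quasi-differenced regression estimates $\alpha(1-\phi)$, so it is rescaled to recover $\alpha$.

```python
import numpy as np

def cochrane_orcutt(y, x, phi0=0.0, tol=1e-8, max_iter=100):
    """Alternate between OLS on quasi-differenced data and re-estimation of phi."""
    phi, beta_old = phi0, np.inf
    for _ in range(max_iter):
        y_star = y[1:] - phi * y[:-1]
        x_star = x[1:] - phi * x[:-1]
        Z = np.column_stack([np.ones_like(x_star), x_star])
        a_star, beta = np.linalg.lstsq(Z, y_star, rcond=None)[0]
        alpha = a_star / (1.0 - phi)          # transformed intercept is alpha * (1 - phi)
        u = y - alpha - beta * x              # residuals in levels
        phi = np.sum(u[1:] * u[:-1]) / np.sum(u[1:] ** 2)
        if abs(beta - beta_old) < tol:        # stop when successive betas agree
            break
        beta_old = beta
    return alpha, beta, phi

# Simulated example with alpha = 1, beta = 2, phi = 0.6 (assumed values)
rng = np.random.default_rng(1)
T = 500
x = rng.normal(size=T)
eps = rng.normal(size=T)
u = np.zeros(T)
u[0] = eps[0] / np.sqrt(1 - 0.6 ** 2)         # draw u_1 from the stationary distribution
for t in range(1, T):
    u[t] = 0.6 * u[t - 1] + eps[t]
y = 1.0 + 2.0 * x + u
alpha_hat, beta_hat, phi_hat = cochrane_orcutt(y, x)
print(alpha_hat, beta_hat, phi_hat)
```

Because the likelihood can have more than one maximum, it may be worth rerunning such a routine from several values of `phi0`.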
$$y_t = \beta'x_t + u_t, \quad t = 1, 2, \ldots, T,$$
Assuming the error process is stationary and has started a long time prior to the first observation
date (i.e., t = 1) we have
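$$\text{AR(1) case:}\quad \mathrm{Var}(u_1) = \frac{\sigma^2}{1-\phi^2},$$
$$\text{AR(2) case:}\quad \begin{cases} \mathrm{Var}(u_1) = \mathrm{Var}(u_2) = \dfrac{\sigma^2(1-\phi_2)}{(1+\phi_2)\left[(1-\phi_2)^2-\phi_1^2\right]}, \\[2ex] \mathrm{Cov}(u_1, u_2) = \dfrac{\sigma^2\phi_1}{(1+\phi_2)\left[(1-\phi_2)^2-\phi_1^2\right]}. \end{cases}$$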
The exact ML estimation procedure then allows for the effect of initial values on the parameter
estimates by adding the logarithm of the density function of the initial values to the log-density
function of the remaining observations obtained conditional on the initial values. For example,
in the case of the AR(1) model the log-density function of (u2 , u3 , . . . , uT ) conditional on the
initial value, u1 , is given by
$$\log f(u_2, u_3, \ldots, u_T \mid u_1) = -\frac{(T-1)}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{t=2}^T(u_t - \phi u_{t-1})^2, \tag{5.21}$$
and
$$\log f(u_1) = -\frac{1}{2}\log(2\pi\sigma^2) + \frac{1}{2}\log(1-\phi^2) - \frac{(1-\phi^2)}{2\sigma^2}u_1^2.$$
Combining the above log-densities yields the full (unconditional) log-density function of $(u_1, u_2, \ldots, u_T)$
$$\log f(u_1, u_2, \ldots, u_T) = -\frac{T}{2}\log(2\pi\sigma^2) + \frac{1}{2}\log(1-\phi^2) - \frac{1}{2\sigma^2}\left[\sum_{t=2}^T(u_t - \phi u_{t-1})^2 + (1-\phi^2)u_1^2\right]. \tag{5.22}$$
Asymptotically, the effect of the distribution of the initial values on the ML estimators is negli-
gible, but it could be important in small samples where xt s are trended and φ is suspected to be
near but not equal to unity. See Pesaran (1972) and Pesaran and Slater (1980) (Chs 2 and 3) for
further details. Also see Judge et al. (1985), Davidson and MacKinnon (1993), and the papers by
Hildreth and Dent (1974), and Beach and MacKinnon (1978). Strictly speaking, the ML estima-
tion will be exact if lagged values of yt are not included amongst the regressors. For a discussion
of the exact ML estimation of models with lagged dependent variables and serially correlated
errors see Pesaran (1981a).
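The exact log-density (5.22) can be verified against a direct evaluation of the multivariate normal density with the stationary AR(1) covariance matrix. The following sketch does this for an arbitrary vector `u`; the function names and test values are illustrative assumptions.

```python
import numpy as np

def ar1_exact_loglik(u, phi, sigma2):
    """Exact Gaussian log-density of (u_1, ..., u_T) for a stationary AR(1), as in (5.22)."""
    T = len(u)
    eps = u[1:] - phi * u[:-1]
    ssq = np.sum(eps ** 2) + (1.0 - phi ** 2) * u[0] ** 2
    return (-0.5 * T * np.log(2 * np.pi * sigma2)
            + 0.5 * np.log(1.0 - phi ** 2)
            - ssq / (2.0 * sigma2))

def mvn_loglik(u, phi, sigma2):
    """Direct evaluation using Cov(u_t, u_s) = sigma2 * phi^|t-s| / (1 - phi^2)."""
    T = len(u)
    idx = np.arange(T)
    V = sigma2 * phi ** np.abs(idx[:, None] - idx[None, :]) / (1.0 - phi ** 2)
    _, logdet = np.linalg.slogdet(V)
    return -0.5 * (T * np.log(2 * np.pi) + logdet + u @ np.linalg.solve(V, u))

u = np.random.default_rng(4).normal(size=25)
print(ar1_exact_loglik(u, 0.5, 2.0), mvn_loglik(u, 0.5, 2.0))
```

The two values coincide, while (5.22) avoids forming and inverting the $T \times T$ covariance matrix.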
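$$\ell_{AR1}(\theta) = -\frac{T}{2}\log(2\pi\sigma^2) + \frac{1}{2}\log(1-\phi^2) - \frac{1}{2\sigma^2}(y - X\beta)'R(\phi)(y - X\beta), \tag{5.23}$$
with respect to the unknown parameters $\theta = (\beta', \sigma^2, \phi)'$, where $R(\phi)$ is the $T \times T$ matrix
$$R(\phi) = \begin{pmatrix} 1 & -\phi & \cdots & 0 & 0 \\ -\phi & 1+\phi^2 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1+\phi^2 & -\phi \\ 0 & 0 & \cdots & -\phi & 1 \end{pmatrix}, \tag{5.24}$$
$$\ell_{AR1}(\phi) = -\frac{T}{2}\left[1 + \log(2\pi)\right] + \frac{1}{2}\log(1-\phi^2) - \frac{T}{2}\log\{\tilde u'R(\phi)\tilde u/T\}, \quad |\phi| < 1, \tag{5.25}$$
$$\ell_{AR2}(\theta) = -\frac{T}{2}\log(2\pi\sigma^2) + \log(1+\phi_2) + \frac{1}{2}\log\left[(1-\phi_2)^2 - \phi_1^2\right] - \frac{1}{2\sigma^2}(y - X\beta)'R(\phi)(y - X\beta), \tag{5.26}$$
1 This result follows readily from (5.22) and can be obtained by substituting $u_t = y_t - \beta'x_t$ in (5.22).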
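$$\left.\begin{aligned} 1 + \phi_2 &> 0, \\ 1 - \phi_2 + \phi_1 &> 0, \\ 1 - \phi_2 - \phi_1 &> 0. \end{aligned}\right\} \tag{5.28}$$
$$\tilde V(\tilde\beta) = \hat\sigma^2\left[X'R(\tilde\phi)X\right]^{-1}, \tag{5.29}$$
$$\tilde V(\tilde\phi) = T^{-1}(1 - \tilde\phi^2), \tag{5.30}$$
where $R(\tilde\phi)$ is already defined by (5.24), and $\hat\sigma^2$ is given below by (5.39).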
For the AR(2) case we have
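$$\tilde\varepsilon_1 = \tilde u_1\left\{\left[(1-\tilde\phi_2)^2 - \tilde\phi_1^2\right](1+\tilde\phi_2)/(1-\tilde\phi_2)\right\}^{1/2}, \tag{5.34}$$
$$\tilde\varepsilon_2 = \tilde u_2\sqrt{1-\tilde\phi_2^2} - \tilde u_1\tilde\phi_1\sqrt{(1+\tilde\phi_2)/(1-\tilde\phi_2)}, \tag{5.35}$$
where
$$\tilde u_t = y_t - x_t'\tilde\beta, \quad t = 1, 2, \ldots, T.$$
Recall that $\tilde\phi = (\tilde\phi_1, \tilde\phi_2)'$. The programme also takes account of the specification of the AR-error process in computations of the fitted values. Denoting these adjusted (or conditional) fitted values by $\tilde y_t$, we have
$$\tilde y_t = \tilde E(y_t \mid y_{t-1}, y_{t-2}, \ldots; x_t, x_{t-1}, \ldots) = y_t - \tilde\varepsilon_t, \quad t = 1, 2, \ldots, T. \tag{5.38}$$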
where $p = 1$ for the AR(1) case, and $p = 2$ for the AR(2) case. Given the way the adjusted residuals $\tilde\varepsilon_t$ are defined above we also have
$$\hat\sigma^2 = \tilde u'R(\tilde\phi)\tilde u/(T-k-p) = \sum_{t=1}^T\tilde\varepsilon_t^2/(T-k-p). \tag{5.40}$$
$$\tilde\sigma^2 = \sum_{t=1}^T\tilde\varepsilon_t^2/T,$$
and the estimator adopted in Pesaran and Slater (1980). The difference lies in the way the sum of squares of residuals, $\sum_t\tilde\varepsilon_t^2$, is corrected for the loss in degrees of freedom arising from the estimation of the regression coefficients, $\beta$, and the parameters of the error process, $\phi = (\phi_1, \phi_2)'$.
The $R^2$, $\bar R^2$, and the F-statistic are computed from the adjusted residuals:
$$R^2 = 1 - \sum_{t=1}^T\tilde\varepsilon_t^2\Big/\sum_{t=1}^T(y_t - \bar y)^2,$$
$$\bar R^2 = 1 - (\hat\sigma^2/\hat\sigma_y^2), \tag{5.41}$$
The F-statistics reported following the regression results are computed according to the formula
$$F\text{-statistic} = \left(\frac{R^2}{1-R^2}\right)\left(\frac{T-k-p}{k+p-1}\right) \overset{a}{\sim} F(k+p-1,\ T-k-p) \tag{5.42}$$
with
and
Notice that R2 in (5.42) is given by (5.41). The above F-statistic can be used to test the joint
hypothesis that except for the intercept term, all the other regression coefficients and the param-
eters of the AR-error process are zero. Under this hypothesis the F-statistic is distributed approx-
imately as F with k + p − 1 and T − k − p degrees of freedom. The chi-squared version of this test
can be based on TR2 /(1−R2 ), which under the null hypothesis of zero slope and AR coefficients
is asymptotically distributed as a chi-squared variate with k + p − 1 degrees of freedom.
The Durbin–Watson statistic is also computed using the adjusted residuals, $\tilde\varepsilon_t$:
$$DW = \frac{\sum_{t=2}^T(\tilde\varepsilon_t - \tilde\varepsilon_{t-1})^2}{\sum_{t=1}^T\tilde\varepsilon_t^2}.$$
$$\chi^2_{AR1,OLS} = 2(LL_{AR1} - LL_{OLS}) \overset{a}{\sim} \chi^2_1.$$
The log-likelihood ratio statistic for the test of the AR(2)-error specification against the AR(1)-error specification is given by
$$\chi^2_{AR2,AR1} = 2(LL_{AR2} - LL_{AR1}) \overset{a}{\sim} \chi^2_1.$$
Both of the above statistics are asymptotically distributed, under the null hypothesis, as a chi-squared variate with one degree of freedom.
The log-likelihood values, $LL_{AR1}$ and $LL_{AR2}$, represent the maximized values of the log-likelihood functions defined by (5.23) and (5.26), respectively. $LL_{OLS}$ denotes the maximized value of the log-likelihood function for the OLS case.
$$u_t = \sum_{i=1}^p\phi_iu_{t-i} + \varepsilon_t, \quad \varepsilon_t \sim N(0, \sigma^2), \quad t = 1, 2, \ldots, T, \tag{5.43}$$
with ‘fixed initial’ values. The fixed initial value assumption is the same as treating the values,
y1 , y2 , . . . , yp as given or non-stochastic. This procedure in effect ignores the possible contribu-
tion of the distribution of the initial values to the overall log-likelihood function of the model.
Once again the primary justification of treating initial values as fixed is asymptotic and is plau-
sible only when (5.43) is stationary and T is reasonably large (see Pesaran and Slater (1980,
Section 3.2), and Judge et al. (1985) for further discussion).
The log-likelihood function for this case is defined by
$$LL_{CO}(\theta) = -\frac{(T-p)}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{t=p+1}^T\varepsilon_t^2 + c, \tag{5.44}$$
where $\tilde\phi_{i,j}$ and $\tilde\phi_{i,(j-1)}$ stand for estimators of $\phi_i$ in the $j$th and $(j-1)$th iterations, respectively.
The estimator of $\sigma^2$ is computed as
$$\hat\sigma^2 = \sum_{t=p+1}^T\tilde\varepsilon_t^2/(T-p-k), \tag{5.46}$$
$$\tilde\varepsilon_t = \tilde u_t - \sum_{i=1}^p\tilde\phi_i\tilde u_{t-i}, \quad t = p+1, p+2, \ldots, T, \tag{5.47}$$
where
$$\tilde u_t = y_t - \sum_{i=1}^k\tilde\beta_ix_{it}, \quad t = 1, 2, \ldots, T. \tag{5.48}$$
As before, the symbol $\sim$ on top of an unknown parameter stands for ML estimators (now under fixed initial values). The estimator of $\sigma^2$ in (5.46) differs from the ML estimator, given by $\tilde\sigma^2 = \sum_{t=p+1}^T\tilde\varepsilon_t^2/(T-p)$. The estimator $\hat\sigma^2$ allows for the loss of degrees of freedom associated with the estimation of the unknown coefficients, $\beta$, and the parameters of the AR process, $\phi$. Notice also that the estimator of $\sigma^2$ is based on $T-p$ adjusted residuals, since the initial values $y_1, y_2, \ldots, y_p$ are treated as fixed.
The adjusted fitted values, $\tilde y_t$, in the case of this option are computed as
$$\tilde y_t = \hat E(y_t \mid y_{t-1}, y_{t-2}, \ldots; x_t, x_{t-1}, \ldots) = y_t - \tilde\varepsilon_t, \tag{5.49}$$
for $t = p+1, p+2, \ldots, T$. Notice that the initial values $\tilde y_1, \tilde y_2, \ldots, \tilde y_p$ are not defined.
In the case where $p = 1$, Microfit also provides a plot of the concentrated log-likelihood function in terms of $\phi_1$, defined by
$$LL_{CO}(\tilde\phi_1) = -\frac{(T-1)}{2}\left[1 + \log(2\pi\tilde\sigma^2)\right], \tag{5.50}$$
where
$$\tilde\sigma^2 = \sum_{t=2}^T\tilde\varepsilon_t^2/(T-1),$$
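$$\tilde X^* = X - \sum_{i=1}^p\tilde\phi_iX_{-i}, \tag{5.52}$$
and $S$ is a $(T-p) \times p$ matrix containing the $p$ lagged values of the C-O residuals, $\tilde u_t$, namely
$$S = \begin{pmatrix} \tilde u_p & \tilde u_{p-1} & \cdots & \tilde u_1 \\ \tilde u_{p+1} & \tilde u_p & \cdots & \tilde u_2 \\ \vdots & \vdots & & \vdots \\ \tilde u_{T-1} & \tilde u_{T-2} & \cdots & \tilde u_{T-p} \end{pmatrix}. \tag{5.53}$$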
The unadjusted residuals, ũt , are already defined by (5.48). The above estimator of the variance
matrix of β̃ and φ̃ is asymptotically valid even if the regression model contains lagged dependent
variables.
where $s_t$ is the saving rate, $\Delta\log y_t$ is the rate of change of real disposable income, $\pi_t$ is the rate of inflation, $\pi_t^e$ are the adaptive expectations of $\pi_t$, and
$$u_t = \phi_1u_{t-1} + \varepsilon_t. \tag{5.55}$$
Equation (5.54) is a modified version of the saving function estimated by Deaton (1977).3 In the following, we use an approximation of $\pi_t^e$ by a geometrically declining distributed lag function of the UK inflation rate (see Lesson 10.12 in Pesaran and Pesaran (2009) for details). Figure 5.1 shows the log-likelihood profile for different values of $\phi_1$, in the range $[-0.99, 0.99]$. The log-likelihood function is bimodal, with one mode at a negative and one at a positive value of $\phi_1$. The global maximum of the log-likelihood is achieved for $\phi_1 < 0$. Bimodal log-likelihood functions frequently arise in estimation of models with lagged dependent variables subject to a serially correlated error process, particularly in cases where the regressors show a relatively low degree of variability. The bimodal problem is sure to arise if, apart from the lagged values of the dependent variable, there are no other regressors in the regression equation. Table 5.1 reports maximum likelihood estimates of the model obtained by the Cochrane–Orcutt method. The iterative algorithm has converged to the correct estimate of $\phi_1$ (i.e. $\hat\phi_1 = -0.22838$), which corresponds to the global maximum of the log-likelihood function, $LL(\hat\phi_1 = -0.22838) = 445.3720$. Notice also that the estimation results are reasonably robust to the choice of the initial estimate of $\phi_1$, so long as negative or small positive values are chosen. However, if the iterations are started from $\phi_1^{(0)} = 0.5$ or higher, the results in Table 5.2 will be obtained. The iterative process has now converged to $\hat\phi_1 = 0.81487$ with the maximized value for the log-likelihood function given by $LL(\hat\phi_1 = 0.81487) = 444.3055$, which is a local maximum. (Recall from Table 5.1
3 Note, however, that the saving function estimated by Deaton (1977) assumes that the inflation expectations $\pi_t^e$ are time invariant.
[Figure 5.1: Log-likelihood profile. Horizontal axis: parameter of the autoregressive error process of order 1, from −0.99 to 0.99; vertical axis: log-likelihood, from 410 to 450.]
that LL(φ̂ 1 = −0.22838) = 445.3720.) This example clearly shows the importance of exper-
imenting with different initial values when estimating regression models (particularly when they
contain lagged dependent variables) with serially correlated errors. (For further details see Lesson
11.6 in Pesaran and Pesaran (2009).)
Dependent variable is S
140 observations used for estimation from 1960Q1 to 1994Q4
U= −.22838*U(-1)+E
( −2.5135) [.013]
t-ratio(s) based on asymptotic standard errors in brackets
Table 5.2 An example in which the Cochrane–Orcutt method has converged to a local maximum
Dependent variable is S
140 observations used for estimation from 1960Q1 to 1994Q4
U= .81487*U(−1)+E
( 16.1214) [.000]
t-ratio(s) based on asymptotic standard errors in brackets
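$$\begin{pmatrix}\tilde\beta \\ \tilde\phi\end{pmatrix}_j = \begin{pmatrix}\tilde\beta \\ \tilde\phi\end{pmatrix}_{j-1} + \begin{pmatrix}\tilde X^{*\prime}\tilde X^* & \tilde X^{*\prime}S \\ S'\tilde X^* & S'S\end{pmatrix}^{-1}_{j-1}\begin{pmatrix}\tilde X^{*\prime}\tilde\varepsilon \\ S'\tilde\varepsilon\end{pmatrix}_{j-1}, \tag{5.56}$$
where the subscripts $j$ and $j-1$ refer to the $j$th and the $(j-1)$th iterations; and $\tilde\varepsilon = (\tilde\varepsilon_{p+1}, \tilde\varepsilon_{p+2}, \ldots, \tilde\varepsilon_T)'$, $\tilde X^*$, and $S$ have the same expressions as those already defined by (5.47), (5.52), and (5.53), respectively. The iterations can start with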
and end them if either the number of iterations exceeds, say, 20, or if the condition (5.45) is satisfied.
On exit from the iterations Microfit computes a number of statistics including estimates of σ 2 ,
the variance matrices of β̃ and φ̃, R2 , R̄2 , etc. using the results already set out in Sections 5.6.1
and 5.5.6.
$$\hat\sigma^2 = \sum_{t=p+1}^T\tilde\varepsilon_t^2/(T-p-r-k), \quad T > p+r+k, \tag{5.57}$$
The chi-squared version of this statistic can, as before, be computed by $TR^2/(1-R^2)$, which is asymptotically distributed (under the null hypothesis) as a chi-squared variate with $k+r-1$ degrees of freedom.
where $\hat u_t = y_t - \hat\alpha - \sum_{i=1}^k\hat\beta_ix_{it}$ are the OLS residuals. Note that the test is not valid if the regression equation does not include an intercept term. It is easy to see that, for large sample size,
$$d \approx 2(1 - \hat\phi),$$
where $\hat\phi = \sum_{t=2}^T\hat u_t\hat u_{t-1}\big/\sum_{t=2}^T\hat u_{t-1}^2$. Further, for values of $\hat\phi$ near unity, we have $d \approx 0$.
The critical values depend only on T and on k and are available in Appendix C of Microfit
5 (Pesaran and Pesaran (2009)). From the tables provided we get upper (du ) and lower bound
(dL ) values. We have:
• If d ≤ dL , then we reject H0 : φ = 0.
• If d > du , then we do not reject H0 : φ = 0.
• If dL ≤ d ≤ du , then the test is inconclusive.
• If d > 2 then calculate d∗ = 4 − d and apply the above testing procedure to d∗ .
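The statistic and the bounds procedure listed above can be written compactly in Python. In this sketch the function names are hypothetical, and the bounds passed to `dw_decision` are placeholders: actual values of $d_L$ and $d_u$ must be read from published tables for the relevant $T$ and $k$.

```python
import numpy as np

def durbin_watson(resid):
    """d = sum_{t=2}^T (u_t - u_{t-1})^2 / sum_{t=1}^T u_t^2."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

def dw_decision(d, d_lower, d_upper):
    """Bounds test of H0: phi = 0; values of d above 2 are reflected to d* = 4 - d."""
    d_eff = 4.0 - d if d > 2.0 else d
    if d_eff <= d_lower:
        return "reject"
    if d_eff > d_upper:
        return "do not reject"
    return "inconclusive"

d = durbin_watson([1.0, -1.0, 1.0, -1.0])   # strongly negatively autocorrelated residuals
print(d, dw_decision(d, d_lower=1.2, d_upper=1.4))   # the bounds here are placeholders
```

For the alternating residuals in the example, $d = 3$, so the procedure is applied to $d^* = 1$.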
One limitation of the Durbin–Watson statistic is that it only deals with the first-order autoregressive error process. Further, it is biased towards acceptance of the null of no serial correlation when the regression model contains lagged dependent variables. The h-statistic can be used to deal with the problem of lagged dependent variables. This statistic is defined by
$$h\text{-statistic} = \left(1 - \frac{DW}{2}\right)\sqrt{\frac{T}{1 - T\hat V(\hat\lambda)}},$$
where $\hat V(\hat\lambda)$ is the estimator of the variance of the OLS estimator of $\lambda$ in the regression
$$y_t = \alpha + \beta x_t + \lambda y_{t-1} + u_t.$$
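A direct transcription of the h-statistic formula is given below as a sketch. The function name and the numerical inputs are illustrative assumptions. The square root is not defined when $T\hat V(\hat\lambda) \ge 1$, in which case the function returns `None` rather than a number.

```python
import math

def durbin_h(dw, T, var_lambda):
    """h = (1 - DW/2) * sqrt(T / (1 - T * var_lambda)).
    Returns None when T * var_lambda >= 1, where the statistic is not defined."""
    denom = 1.0 - T * var_lambda
    if denom <= 0.0:
        return None
    return (1.0 - dw / 2.0) * math.sqrt(T / denom)

# Hypothetical inputs: DW = 1.6, T = 100, estimated Var(lambda_hat) = 0.0036
print(durbin_h(1.6, 100, 0.0036))
```

With these inputs, $1 - T\hat V(\hat\lambda) = 0.64$ and the statistic equals $0.2 \times 12.5 = 2.5$.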
$$\hat u_t = \alpha_0 + \sum_{j=1}^k\alpha_jx_{tj} + \gamma y_{t-1} + \delta_1\hat u_{t-1} + \delta_2\hat u_{t-2} + \cdots + \delta_p\hat u_{t-p} + \text{error}. \tag{5.60}$$
The LM test is also applicable if the regression equation contains higher-order lags of yt (i.e.,
when (5.60) contains yt−1 , yt−2 , . . . , yt−q ).
Two versions of this test have been used in the literature: an LM version and an F version. In the general case the LM version is given by (see Godfrey (1978a, 1978b))
$$\chi^2_{SC}(p) = T\left(\frac{\hat u_{OLS}'W(W'M_xW)^{-1}W'\hat u_{OLS}}{\hat u_{OLS}'\hat u_{OLS}}\right) \overset{a}{\sim} \chi^2_p, \tag{5.61}$$
where $\hat u_{OLS}$ is the vector of OLS residuals, $X$ is the observation matrix, possibly containing lagged values of the dependent variable,
$$M_x = I_T - X(X'X)^{-1}X',$$
$$\hat u_{OLS} = y - X\hat\beta_{OLS} = (\hat u_1, \hat u_2, \ldots, \hat u_T)',$$
$$W = \begin{pmatrix} 0 & 0 & \cdots & 0 \\ \hat u_1 & 0 & \cdots & 0 \\ \hat u_2 & \hat u_1 & \cdots & 0 \\ \vdots & \hat u_2 & \cdots & \vdots \\ \vdots & \vdots & & \hat u_{T-p-1} \\ \hat u_{T-1} & \hat u_{T-2} & \cdots & \hat u_{T-p} \end{pmatrix}, \tag{5.62}$$
and $p$ is the order of the error process, and $\chi^2_p$ stands for a chi-squared variate with $p$ degrees of freedom.
The F-version of (5.61) is given by4
$$F_{SC}(p) = \left(\frac{T-k-p}{p}\right)\left(\frac{\chi^2_{SC}(p)}{T - \chi^2_{SC}(p)}\right) \overset{a}{\sim} F_{p,T-k-p}, \tag{5.63}$$
where $\chi^2_{SC}(p)$ is given by (5.61). The above statistic can also be computed as the F-statistic for the (joint) test of zero restrictions on the coefficients of $W$ in the auxiliary regression
$$y = X\alpha + W\delta + v.$$
The two versions of the test of residual serial correlation, namely χ 2SC (p) and FSC (p), are asymp-
totically equivalent.
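The two versions of the test can be computed directly from (5.61) and (5.63). The Python sketch below builds $W$ from lagged OLS residuals with pre-sample values set to zero, as in (5.62). The function name and the simulated data are assumptions for illustration only.

```python
import numpy as np

def lm_serial_correlation(y, X, p):
    """Return (chi2_SC(p), F_SC(p)) computed from (5.61) and (5.63)."""
    T, k = X.shape
    u = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]          # OLS residuals
    # Column j of W holds the residuals lagged j periods, zeros for pre-sample values
    W = np.column_stack([np.concatenate([np.zeros(j), u[:T - j]])
                         for j in range(1, p + 1)])
    Mx = np.eye(T) - X @ np.linalg.solve(X.T @ X, X.T)
    Wu = W.T @ u
    chi2 = T * (Wu @ np.linalg.solve(W.T @ Mx @ W, Wu)) / (u @ u)
    F = ((T - k - p) / p) * chi2 / (T - chi2)
    return chi2, F

# Simulated regression with AR(1) errors (all values are assumptions)
rng = np.random.default_rng(2)
T = 120
X = np.column_stack([np.ones(T), rng.normal(size=T)])
e = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.5 * u[t - 1] + e[t]
y = X @ np.array([1.0, 0.5]) + u
chi2, F = lm_serial_correlation(y, X, p=2)
print(chi2, F)
```

Since the OLS residuals are orthogonal to $X$, the statistic in (5.61) equals $T$ times the $R^2$ of the regression of $\hat u_{OLS}$ on $X$ and $W$, which offers a convenient cross-check.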
$$Q_T = \frac{1}{T}\sum_{t=1}^Tx_tx_t',$$
$$\hat S_T = \hat\Omega_0 + \sum_{j=1}^mw(j,m)\left(\hat\Omega_j + \hat\Omega_j'\right) \tag{5.65}$$
4 For a derivation of the relationship between the LM-version, and the F-version of the test statistics see, for example,
Pesaran (1981a, pp. 78–80).
in which
$$\hat\Omega_j = \sum_{t=j+1}^T\hat u_t\hat u_{t-j}x_tx_{t-j}',$$
$$\hat u_t = y_t - \hat\beta'x_t,$$
and w(j, m) is the kernel or lag window, and m is the ‘bandwidth’ or the ‘window size’. White’s
heteroskedasticity-consistent estimators considered in Section 4.2 can be computed using the
Newey-West estimator by setting the window size, m, equal to zero. In applied work the following
are popular choices for the kernel:
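• Uniform kernel
$$w(j,m) = 1, \quad \text{for } j = 1, 2, \ldots, m,$$
• Bartlett kernel
$$w(j,m) = 1 - \frac{j}{m+1}, \quad j = 1, 2, \ldots, m,$$
• Parzen kernel
$$w(j,m) = \begin{cases} 1 - 6\left(\dfrac{j}{m+1}\right)^2 + 6\left(\dfrac{j}{m+1}\right)^3, & 1 \le j \le \dfrac{m+1}{2}, \\[2ex] 2\left(1 - \dfrac{j}{m+1}\right)^3, & \dfrac{m+1}{2} < j \le m. \end{cases}$$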
In their paper, Newey and West (1987) adopted the Bartlett kernel. The uniform kernel is
appropriate when estimating a regression model with moving average errors of known order.
This type of model arises in testing the market efficiency hypothesis where the forecast horizon
exceeds the sampling interval (see, e.g., Pesaran (1987c, Section 7.6)). In other cases, a Parzen lag
window may be preferable. Note that the positive semi-definiteness of the Newey-West variance
matrix is only ensured in the case of the Bartlett and Parzen kernels. The choice of the uniform
kernel can result in a negative-definite variance matrix, especially if a large value for m is chosen
relative to the number of available observations, T. See the discussion in Andrews (1991), and
Andrews and Monahan (1992).
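The kernel weights and the matrix $\hat S_T$ in (5.65) can be sketched in Python as follows. The function names and simulated inputs are assumptions. The second branch of the Parzen weight uses the cubic form of the standard Parzen window, which makes the two branches agree at $j = (m+1)/2$. No $1/T$ scaling is applied here, so any scaling required by the variance formula has to be added by the user.

```python
import numpy as np

def kernel_weight(j, m, kernel="bartlett"):
    """Lag-window weight w(j, m) for the uniform, Bartlett or Parzen kernel."""
    x = j / (m + 1.0)
    if kernel == "uniform":
        return 1.0
    if kernel == "bartlett":
        return 1.0 - x
    if kernel == "parzen":
        return 1.0 - 6.0 * x ** 2 + 6.0 * x ** 3 if x <= 0.5 else 2.0 * (1.0 - x) ** 3
    raise ValueError(f"unknown kernel: {kernel}")

def newey_west_S(X, resid, m, kernel="bartlett"):
    """S_T = Omega_0 + sum_{j=1}^m w(j,m) (Omega_j + Omega_j'),
    with Omega_j = sum_{t=j+1}^T u_t u_{t-j} x_t x_{t-j}'."""
    g = X * resid[:, None]                 # row t is x_t' * u_t
    S = g.T @ g                            # Omega_0 (White's case when m = 0)
    for j in range(1, m + 1):
        Omega_j = g[j:].T @ g[:-j]
        S += kernel_weight(j, m, kernel) * (Omega_j + Omega_j.T)
    return S

rng = np.random.default_rng(0)
T = 200
X = np.column_stack([np.ones(T), rng.normal(size=T)])
resid = rng.normal(size=T)                 # stand-in for OLS residuals
S_bartlett = newey_west_S(X, resid, m=4)
print(np.linalg.eigvalsh(S_bartlett))
```

Setting `m=0` reproduces the middle matrix of White's heteroskedasticity-consistent estimator, and the eigenvalues printed for the Bartlett kernel are non-negative, in line with the positive semi-definiteness property noted above.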
The choice of the window size or bandwidth, $m$, is even more critical for the properties of the NW estimator. The maximum lag $m$ must be determined in advance to be sufficiently large so that the correlation between $x_tu_t$ and $x_{t-j}u_{t-j}$ for $j \ge m$ is essentially zero. Current practice is to use the smallest integer greater than or equal to $T^{1/3}$, or $T^{1/4}$. Automatic bandwidth selection procedures that asymptotically minimize estimation errors have also been proposed in the literature. See, for example, Newey and West (1994) and Sun, Phillips, and Jin (2008).
One major problem in the use of tests based on the NW estimator is the tendency to over-
reject the null hypothesis in finite samples. This problem has been documented in several
studies; see, among others, Andrews and Monahan (1992) and den Haan and Levin (1997). To
deal with this issue, as discussed below, a number of authors have proposed a stochastic transfor-
mation of OLS estimates so that the asymptotic distribution of the transformed estimates does
not depend on nuisance parameters.
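$$\hat C_T = \frac{1}{T^2}\sum_{t=1}^T\hat s_t\hat s_t',$$
where
$$\hat s_t = \sum_{j=1}^tx_j\hat u_j,$$
and $\hat u_t = y_t - \hat\beta'x_t$. Now define
$$\hat M = \left(\frac{1}{T}\sum_{t=1}^Tx_tx_t'\right)^{-1}\hat C_T^{1/2},$$
where $\hat C_T^{1/2}$ represents a lower triangular Cholesky factor of $\hat C_T$. Kiefer, Vogelsang, and Bunzel (2000) establish that as $T \to \infty$,
$$\hat M^{-1}\sqrt{T}\left(\hat\beta - \beta\right) \xrightarrow{d} Z_k^{-1}b_k(1), \tag{5.66}$$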
and bk (r) denotes a k-dimensional vector of independent standard Wiener processes.5 This
transformation results in a limiting distribution that does not depend on nuisance parameters.
However, the distribution of $Z_k^{-1}b_k(1)$ is non-standard, although it only depends on $k$, the
number of regression coefficients being estimated. Critical values have been computed by simu-
lation by Kiefer, Vogelsang, and Bunzel (2000). The main advantage of this approach compared
with standard approaches is that estimates of the variance-covariance matrix are not explicitly
required to construct the tests. Further, Kiefer, Vogelsang, and Bunzel (2000) show that tests
constructed using their procedure can have better finite sample size properties than tests based
on consistent NW estimates.
Kiefer and Vogelsang (2002) showed that the above approach is exactly equivalent to using
NW standard errors with Bartlett kernel, and without truncation (namely, setting m = T in
(5.65)). This result suggests that valid tests can be constructed using kernel based estimators
with bandwidth m = T.
Kiefer and Vogelsang (2005) studied the limiting distribution of robust tests based on the NW
estimators setting m = b · T, where b ∈ (0, 1] is a constant, labelling the asymptotics obtained
under this framework as ‘fixed-b asymptotics’. The authors showed that the limiting distribution
of the F- and t-statistics based on such NW variance estimator are non-standard, and that they
depend on the choice of the kernel and on b. Kiefer and Vogelsang (2005) have also analysed the
properties of these test statistics via a simulation study. Their results indicate a trade-off between
size distortions and power with regard to choice of the bandwidth. Smaller bandwidths lead to
tests with higher power but at the cost of greater size distortions, whereas larger bandwidths
lead to tests with smaller size distortions but lower power. They also found that, among a group
of common choice kernels, the Bartlett kernel leads to tests with highest power in their fixed-b
framework.
Phillips, Sun, and Jin (2006) suggested a new class of kernel functions obtained by exponenti-
ating a ‘mother’ kernel (such as the Bartlett or Parzen lag window), but without using lag trunca-
tion. When the exponent parameter is not too large, the absence of lag truncation influences the
variability of the estimate because of the presence of autocovariances at long lags. Such effects
can have the advantage of better reflecting finite sample behavior in test statistics that employ
NW estimates, and leading to some improvement in test size, as also reported in a simulation
study by Phillips, Sun, and Jin (2006).
While this approach works well, Kapetanios and Psaradakis (2007) note that it does not exploit information on the structure of the dependence in the regression errors. However, such information may be used to improve the properties of robust inference procedures. Hence, the authors suggest employing a feasible GLS estimator where the stochastic process generating the disturbances is approximated by an autoregressive model with an order that grows at a slower rate than the sample size (see also Amemiya (1973) on such approximation). Specifically, let $\hat u_t = y_t - \hat\beta'x_t$, where $\hat\beta$ is an initial consistent estimator of $\beta$. For some positive integer $p$ chosen as a function of $T$ so that $p \to \infty$ and $p/T \to 0$ as $T \to \infty$, let $\hat\phi_p = (\hat\phi_{p,1}, \hat\phi_{p,2}, \ldots, \hat\phi_{p,p})'$ be the $p$th order OLS estimator of the autoregressive coefficients for $\hat u_t$, obtained as the solution to the minimization of
$$\min_{\phi_{p,1},\phi_{p,2},\ldots,\phi_{p,p}\in\mathbb{R}^p}(T-p)^{-1}\sum_{t=p+1}^T\left(\hat u_t - \phi_{p,1}\hat u_{t-1} - \phi_{p,2}\hat u_{t-2} - \ldots - \phi_{p,p}\hat u_{t-p}\right)^2.$$
$$\hat\beta = \left(X'\hat\Phi'\hat\Phi X\right)^{-1}X'\hat\Phi'\hat\Phi y, \tag{5.68}$$
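where $\hat\Phi$ is the $(T-p) \times T$ matrix defined as
$$\hat\Phi = \begin{pmatrix} -\hat\phi_{p,p} & -\hat\phi_{p,p-1} & -\hat\phi_{p,p-2} & \cdots & -\hat\phi_{p,1} & 1 & \cdots & 0 & 0 \\ 0 & -\hat\phi_{p,p} & -\hat\phi_{p,p-1} & \cdots & -\hat\phi_{p,2} & -\hat\phi_{p,1} & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & -\hat\phi_{p,p} & -\hat\phi_{p,p-1} & \cdots & -\hat\phi_{p,1} & 1 \end{pmatrix}.$$
Note that (5.68) can be obtained by applying OLS to the regression of $\hat y_t^* = \left(1 - \sum_{j=1}^p\hat\phi_{p,j}L^j\right)y_t$ on $\hat x_t^* = \left(1 - \sum_{j=1}^p\hat\phi_{p,j}L^j\right)x_t$.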
One drawback of NW-type estimators is that they cannot be employed to obtain valid tests
of the significance of OLS estimates when there are lagged dependent variables in the regressors,
and errors are serially correlated. The problem is that these procedures require the OLS esti-
mator to be consistent. However, as formally proved in Section 14.6, in general OLS estimators
will be inconsistent when errors are autocorrelated and there are lagged values of the dependent
variable among the regressors. One possible way of dealing with this problem would be to use an
instrumental variables (IV) approach for estimating consistently the parameters of the regres-
sion model, and then obtain IV-based robust tests. As an alternative, Godfrey (2011) suggested
a joint test for misspecification and autocorrelation using the J-test approach by Davidson and
MacKinnon (1981) and introduced in Chapter 11 (see Section 11.6). Suppose that the valid-
ity of M1 is to be tested using information about M2 , and that the regressors xt and zt in models
(11.22)–(11.23) both contain at least one lagged value of yt . Also suppose that the autoregressive
or moving average model of order m is used as the alternative to the assumption of independent
errors. The author suggested a heteroskedasticity-robust joint test of the (1 + m) restrictions
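$\lambda = \phi_1 = \ldots = \phi_m = 0$, in the ‘artificial’ OLS regression
$$y_t = \beta_1'x_t + \lambda(\hat\beta_2'z_t) + \phi_1\hat u_{1,t-1} + \ldots + \phi_m\hat u_{1,t-m} + \varepsilon_t, \tag{5.69}$$
where $\hat u_{1,t-j}$ is a lagged value of the OLS residual from estimation of (11.22) when $(t-j) > 0$ and is set equal to zero when $(t-j) \le 0$. More specifically, Davidson and MacKinnon (1981) proposed using the Wald approach, combined with a heteroskedasticity-consistent variance (HCV) estimator:
$$\tau_J = \left(\hat\lambda, \hat\phi_1, \ldots, \hat\phi_m\right)\left(R_1\hat C_JR_1'\right)^{-1}\left(\hat\lambda, \hat\phi_1, \ldots, \hat\phi_m\right)', \tag{5.70}$$
$$\hat C_J = \left(\sum_{t=1}^T\hat r_t\hat r_t'\right)^{-1}\left(\sum_{t=1}^T\hat u_{1t}^2\hat r_t\hat r_t'\right)\left(\sum_{t=1}^T\hat r_t\hat r_t'\right)^{-1}, \tag{5.71}$$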
with $\hat r_t = \left(x_t', \hat\beta_2'z_t, \hat u_{1,t-1}, \ldots, \hat u_{1,t-m}\right)'$. Note that, under the null hypothesis, the OLS estimators of the artificial alternative regression are consistent and asymptotically normal. It follows that $\tau_J \overset{a}{\sim} \chi^2_{m+1}$. Recent work has indicated that, when several restrictions are under test, the use
of asymptotic critical values with HCV-based test statistics produces estimates of null hypothe-
sis rejection probabilities that are too small (see Godfrey and Orme (2004)). To overcome this
problem, Davidson and MacKinnon (1981) suggested a bootstrap implementation of the above
test, using the wild bootstrap method.
5.12 Exercises
1. Consider the generalized linear regression model (5.1), under assumptions (5.2) and (5.3), and suppose that $\Omega$ is known. Then:
(a) What is the variance matrix of the OLS residual vector ûOLS = y − Xβ̂ OLS ?
(b) What is the variance matrix of the GLS residual vector ûGLS = y − Xβ̂ GLS ?
(c) What is the covariance matrix of OLS and GLS residual vectors?
2. Consider
yt = β xt + ut ,
where
ut = ε t + θ ε t−1 ,
yt = βxt−1 + ut , (5.72)
ut = ρut−1 + ε t ,
xt = φxt−1 + vt ,
and derive expressions for λ, ψ 0 and ψ 1 in terms of β, ρ and φ. In particular, show that
θ = (ψ 0 + ψ 1 )/(1 − λ) is equal to β. Hence, or otherwise, suggest how to test model
(5.73) against model (5.72).
(b) How do you think β should be estimated?
(a) Show that the OLS estimator of β is biased, and derive an expression for its bias. Under
what conditions is the OLS estimator of β unbiased?
(b) Show that $\hat{\beta}_{OLS} x_{t-1}$ is an unbiased estimator of $E(y_t | x_{t-1})$. How do you estimate
$E(y_t | x_{t-1}, x_{t-2}, y_{t-1})$?
6 Introduction to Dynamic
Economic Modelling
6.1 Introduction
Dynamic economic models typically arise as a characterization of the path of the economy
around its long-run equilibrium (steady state), and involve modelling expectations, learn-
ing and adjustment costs. There exist a variety of dynamic specifications used in applied time
series econometrics. This chapter reviews a number of single-equation specifications suggested
by econometric literature to represent dynamics in regression models. It provides a preliminary
introduction to distributed lag models, autoregressive distributed lag models, partial adjustment
models, error-correction models, and adaptive and rational expectations models. More general
multi-equation dynamic systems will be considered in the second part of the book, where vector
autoregressive models with and without weakly exogenous variables and multivariate rational
expectations are discussed.
\[ y_t = \alpha + \beta_0 x_t + \beta_1 x_{t-1} + \cdots + \beta_q x_{t-q} + u_t = \alpha + \beta(L) x_t + u_t, \tag{6.1} \]
\[ \beta(L) = \beta_0 + \beta_1 L + \ldots + \beta_q L^q, \]
\[ L^i x_t = x_{t-i}, \quad i = 0, 1, 2, \ldots. \tag{6.2} \]
The lag coefficients are often restricted to lie on a polynomial of order r ≤ q. In the case where
r = q the distributed lag model is unrestricted.
\[ y_t = \alpha + \frac{\beta(L)}{\lambda(L)} x_t + v_t, \tag{6.3} \]
\[ \lambda(L) = \lambda_0 + \lambda_1 L + \ldots + \lambda_p L^p. \]
A comprehensive early treatment of rational distributed lag models can be found in Dhrymes
(1971). See also Jorgenson (1966).
or
Note that the rational distributed lag model defined by (6.3) can also be written in the form of
an ARDL(p, q) model with moving average errors, namely
which is the same as (6.4) except that the error term is now given by
ut = λ(L)vt ,
Recent developments in time series analysis focus on the ARDL(p, q) specification for two
reasons. First, it is analytically much simpler to work with than the rational distributed
lag model. Second, by selecting p and q to be sufficiently large one can provide a reasonable
approximation to the rational distributed lag specification if required.
Deterministic trends, or seasonal dummies, can be easily incorporated in the ARDL model.
For example, we could have
The ARDL model is said to be stable if all the roots of the pth order polynomial equation
\[ \lambda(z) = 1 + \lambda_1 z + \lambda_2 z^2 + \ldots + \lambda_p z^p = 0, \tag{6.7} \]
lie outside the unit circle, namely if |z| > 1. The process is said to have a unit root if λ(1) =
0. The explosive case where one or more roots of λ(z) = 0 lie inside the unit circle is not of
practical relevance for the analysis of economic time series and will not be considered.
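The stability condition can be checked numerically. The following is a minimal sketch, assuming the sign convention of (6.7) as printed, so that an AR(1) model $y_t = 0.5 y_{t-1} + u_t$ corresponds to $\lambda_1 = -0.5$. The function names `ardl_roots` and `is_stable` are illustrative and not part of the text.

```python
import numpy as np

def ardl_roots(lams):
    """Roots of lambda(z) = 1 + lam_1 z + ... + lam_p z^p, as written in (6.7)."""
    # np.roots expects coefficients ordered from the highest power to the constant.
    coeffs = list(reversed(lams)) + [1.0]
    return np.roots(coeffs)

def is_stable(lams):
    """True if every root of lambda(z) = 0 lies strictly outside the unit circle."""
    return bool(np.all(np.abs(ardl_roots(lams)) > 1.0))

print(is_stable([-0.5]))        # single root at z = 2
print(is_stable([-1.2]))        # single root at z = 1/1.2, inside the unit circle
print(is_stable([-0.5, -0.3]))  # second-order case
```

A root with modulus exactly equal to one (the unit root case) is also reported as not stable by this check; in floating-point arithmetic that boundary case should be judged with a tolerance.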
The simple model (6.4) with one regressor can be extended to the case of k regressors,
each with a specific number of lags. Specifically, consider the following ARDL(p, q1 , q2 , . . . , qk )
model
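\[ \lambda(L, p) y_t = \sum_{j=1}^{k} \beta_j(L, q_j) x_{tj} + u_t, \]
where
\[ \lambda(L, p) = 1 + \lambda_1 L + \lambda_2 L^2 + \ldots + \lambda_p L^p, \]
\[ \beta_j(L, q_j) = \beta_{j0} + \beta_{j1} L + \ldots + \beta_{jq_j} L^{q_j}, \quad j = 1, 2, \ldots, k. \]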
See Hendry, Pagan, and Sargan (1984) for a comprehensive early review of ARDL models.
\[ \hat{\theta} = \left( \sum_{t=1}^{T} z_t z_t' \right)^{-1} \sum_{t=1}^{T} z_t y_t. \tag{6.8} \]
The first condition is satisfied if zt follows a covariance stationary process. This will be
the case if all the roots of the pth order polynomial equation λ(z) lie outside the unit circle,
and xt and ut are covariance stationary processes with absolute summable autocovariances, as
defined by
\[ x_t = \sum_{i=0}^{\infty} a_i v_{t-i}, \]
and
\[ u_t = \sum_{i=0}^{\infty} b_i \varepsilon_{t-i}, \]
where $\sum_{i=0}^{\infty} |a_i| < K < \infty$, $\sum_{i=0}^{\infty} |b_i| < K < \infty$, and $\{v_t\}$ and $\{\varepsilon_t\}$ are standard white
noise processes. However, as we will show in Section 14.6, condition 1 alone does not guarantee
consistency of OLS estimators and must be accompanied by condition 2.
To select the lag orders p and q, one possibility is to estimate model (6.4) by OLS for all
possible values of p = 0, 1, 2, . . . , m, q = 0, 1, 2, . . . , m, where m is the maximum lag, and
t = m + 1, m + 2, . . . , T. Hence, one of the $(m + 1)^2$ estimated models can be selected by using
one of the four model selection criteria described in Chapters 2 and 11, namely the $\bar{R}^2$ criterion,
the Akaike information criterion, the Schwarz Bayesian criterion, and the Hannan and Quinn cri-
terion. See Section 6.4 for the derivation of the error-correction representation of ARDL models,
and Section 6.5 for the computation of the long-run coefficients for the response of yt to a unit
change in xt .
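The search over the $(m + 1)^2$ candidate models can be sketched as follows. This is only an illustration under stated assumptions: the ARDL(p, q) regression is taken to include an intercept, lags 1 to p of y and lags 0 to q of x; every model is estimated on the common sample t = m + 1, ..., T; and the Schwarz criterion is used in the minimisation form n ln(RSS/n) + k ln(n), which may differ by a monotone transformation from the definition used in Chapters 2 and 11. The names `ardl_design` and `select_ardl` are hypothetical.

```python
import numpy as np

def ardl_design(y, x, p, q, m):
    """Regressors and dependent variable for ARDL(p, q) on t = m+1, ..., T."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    T = len(y)
    cols = [np.ones(T - m)]                              # intercept
    cols += [y[m - i:T - i] for i in range(1, p + 1)]    # y_{t-1}, ..., y_{t-p}
    cols += [x[m - j:T - j] for j in range(0, q + 1)]    # x_t, ..., x_{t-q}
    return np.column_stack(cols), y[m:]

def select_ardl(y, x, m):
    """Return the (p, q) pair with the smallest value of the assumed SBC form."""
    best = None
    for p in range(m + 1):
        for q in range(m + 1):
            Z, yy = ardl_design(y, x, p, q, m)
            coef, *_ = np.linalg.lstsq(Z, yy, rcond=None)
            resid = yy - Z @ coef
            n, k = Z.shape
            sbc = n * np.log(resid @ resid / n) + k * np.log(n)
            if best is None or sbc < best[0]:
                best = (sbc, p, q)
    return best[1], best[2]
```

Using the same estimation sample for every candidate model keeps the criteria comparable across different lag orders, which is why the first m observations are set aside even when p and q are small.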
Further suppose that yt adjusts to its desired level according to the following first-order partial
adjustment equation
\[ y_t - y_{t-1} = \lambda (y_t^* - y_{t-1}), \tag{6.10} \]
where λ is the adjustment coefficient. No adjustment will take place if λ = 0, and adjustment
will be instantaneous if λ = 1. Using (6.9) to substitute for y∗t we have
In this formulation, θ measures the cost of adjustment relative to the cost of being out of equi-
librium. The first-order condition for this optimization problem is
\[ (y_t - y_t^*) + \theta (y_t - y_{t-1}) = 0, \tag{6.13} \]
\[ y_t - y_{t-1} = \frac{1}{1 + \theta} (y_t^* - y_{t-1}), \tag{6.14} \]
where $\theta = (\beta_0 + \beta_1)/(1 - \lambda)$ is the slope coefficient in the long-run relationship between $y_t$
and $x_t$ (see (6.24) below). Relation (6.16) is known as the error-correction representation of the
ARDL(1, 1) model (6.15). The term $\Delta x_t$ is referred to as the derivative effect, and $(y_{t-1} - \theta x_{t-1})$
as the error-correction term.
More general error-correction models can also be obtained from ARDL(p, q) specifications,
given by (6.4). In general, any lagged variable, yt−j , for j = 1, 2, . . . , can be written as
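\[ y_{t-j} = y_{t-1} - (\Delta y_{t-1} + \Delta y_{t-2} + \ldots + \Delta y_{t-j+1}). \]
Similarly
\[ x_{t-j} = x_{t-1} - (\Delta x_{t-1} + \Delta x_{t-2} + \ldots + \Delta x_{t-j+1}). \]
Also
\[ y_t = \Delta y_t + y_{t-1}, \]
\[ x_t = \Delta x_t + x_{t-1}. \]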
1 See Alogoskoufis and Smith (1991) for an historical survey of the evolution of ECMs.
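\[ \qquad + \sum_{i=1}^{p-1} \lambda_i^* \Delta y_{t-i} + \sum_{i=0}^{q-1} \beta_i^* \Delta x_{t-i} + \varepsilon_t, \]
where
\[ \lambda(1) = 1 + \lambda_1 + \ldots + \lambda_p, \quad \beta(1) = \beta_0 + \beta_1 + \ldots + \beta_q, \]
\[ \lambda_i^* = \sum_{j=i+1}^{p} \lambda_j, \quad \beta_i^* = -\sum_{j=i+1}^{q} \beta_j. \]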
The stability of the ARDL model, or the equilibrating properties of the ECM representation in
the present simple (single equation) case critically depend on the roots of the characteristic
equation $\lambda(z) = 0$, defined by (6.7). If the underlying ARDL model is stable, $\lambda(1) \neq 0$, the process
$y_t$ is said to be ‘mean reverting’. The equilibrium (or steady state) value of $y_t$ depends on the
nature of the $\{x_t\}$ process. If $x_t$ is a stationary process with a constant mean, $\mu_x$, and a constant
variance, $\sigma_x^2$, then in the limit $y_t$ will also tend to a constant mean given by
\[ \mu_y = \frac{\beta(1) \mu_x + a}{\lambda(1)}. \]
In the case where $x_t$ is trended, or unit root non-stationary (also known as first difference
stationary), then $y_t$ is given by
\[ y_t = a^* + \frac{\beta(1)}{\lambda(1)} x_t + v_t, \]
where $a^*$ is a constant term and $v_t$ is a stationary random variable. In this case we have
\[ \mu_y = \frac{\beta(1) \mu_x}{\lambda(1)}, \]
where $\mu_y$ and $\mu_x$ are the means of $\Delta y_t$ and $\Delta x_t$, which are assumed to be stationary. Pesaran,
Shin, and Smith (2001) develop a procedure for testing the existence of a level relationship
between yt and xt in (6.17), when it is not known if the regressor, xt , is I(1) or I(0). In this case,
the distribution of the F- or Wald statistics for testing the existence of the level relations in the
ARDL model are non-standard and must be computed by stochastic simulations. See Section
22.3.
\[ \frac{\partial y_t}{\partial x_t} = \lambda \beta. \tag{6.18} \]
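The long-run effect of a unit change in x upon y can be obtained (when it exists) by setting all
current and lagged values of y and x in (6.11) equal to $\tilde{y}$ and $\tilde{x}$, respectively, namely
\[ \tilde{y} = \alpha \lambda + (1 - \lambda) \tilde{y} + \lambda \beta \tilde{x} + \lambda \tilde{u}, \tag{6.19} \]
\[ \tilde{y} = \alpha + \beta \tilde{x} + \tilde{u}. \tag{6.20} \]
Hence
\[ \frac{\partial \tilde{y}}{\partial \tilde{x}} = \beta. \]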
Then setting yt and yt−1 equal to ỹ, and xt and xt−1 equal to x̃, we have
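\[ \tilde{y} (1 - \lambda) = \alpha + (\beta_0 + \beta_1) \tilde{x}, \tag{6.22} \]
\[ \tilde{y} = \frac{\alpha}{1 - \lambda} + \frac{\beta_0 + \beta_1}{1 - \lambda} \tilde{x}. \tag{6.23} \]
Hence
\[ \frac{\partial \tilde{y}}{\partial \tilde{x}} = \frac{\beta_0 + \beta_1}{1 - \lambda}. \tag{6.24} \]
In the case where the long run effect is assumed to be equal to unity, then $(\beta_0 + \beta_1)/(1 - \lambda) = 1$, or
$1 = \lambda + \beta_0 + \beta_1$. The parametric restriction $\beta_0 + \beta_1 + \lambda = 1$ can be tested by noting that (6.15)
may also be written as
\[ \Delta y_t = \alpha + \beta_0 \Delta x_t - (\beta_0 + \beta_1)(y_{t-1} - x_{t-1}) - (1 - \lambda - \beta_0 - \beta_1) y_{t-1} + u_t, \tag{6.25} \]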
\[ \frac{\beta_0 + \beta_1 + \ldots + \beta_q}{1 - \lambda_1 - \lambda_2 - \ldots - \lambda_p}. \tag{6.26} \]
The long-run coefficient can be estimated by replacing β j and λj by their OLS estimates, and p
and q by their estimates p̂ and q̂ obtained by applying one of the selection criteria introduced in
Chapters 2 and 11. The asymptotic standard errors of the long-run coefficient can be computed
by using the methodology developed by Bewley (1979) or by applying the Δ-method.
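As an illustration of the Δ-method route, the sketch below computes the long-run coefficient in (6.26) and a first-order approximation to its standard error. It assumes that an estimated covariance matrix of the coefficients, ordered as (β_0, ..., β_q, λ_1, ..., λ_p), is already available; the function names and the numerical inputs are hypothetical and are not taken from the text.

```python
import math

def long_run_coefficient(betas, lams):
    """theta = (b_0 + ... + b_q) / (1 - l_1 - ... - l_p), as in (6.26)."""
    return sum(betas) / (1.0 - sum(lams))

def long_run_se(betas, lams, cov):
    """Delta-method standard error of theta.

    cov is the covariance matrix (list of lists) of the estimates,
    ordered as betas followed by lams.
    """
    denom = 1.0 - sum(lams)
    theta = sum(betas) / denom
    # d(theta)/d(beta_j) = 1/denom and d(theta)/d(lambda_j) = theta/denom.
    grad = [1.0 / denom] * len(betas) + [theta / denom] * len(lams)
    n = len(grad)
    var = sum(grad[i] * cov[i][j] * grad[j] for i in range(n) for j in range(n))
    return math.sqrt(var)

# Made-up inputs, for illustration only.
betas, lams = [0.2, 0.1], [0.5]
cov = [[0.01, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 0.01]]
print(long_run_coefficient(betas, lams), long_run_se(betas, lams, cov))
```

The approximation deteriorates as the sum of the λ coefficients approaches one, since the gradient then becomes very large.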
\[ y_t = a_0 + \theta \sum_{i=0}^{\infty} w_i x_{t-i} + v_t, \]
where the weights, $w_i$, are non-negative and add up to unity: $w_i \geq 0$, $\sum_i w_i = 1$. The long-run
impact of $x_t$ on $y_t$ is given by θ. Using the lag operator L ($L x_t = x_{t-1}$) we have
\[ y_t = a_0 + \theta W(L) x_t + v_t, \]
where
\[ W(L) = \sum_{i=0}^{\infty} w_i L^i. \]
\[ W(L) = \frac{1 - \lambda}{1 - \lambda L}, \]
and
\[ W'(L) = \frac{(1 - \lambda) \lambda}{(1 - \lambda L)^2}, \]
which yields the mean lag of $W'(1) = \lambda/(1 - \lambda)$, as derived above.
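The mean lag formula can be verified numerically. The sketch below, with illustrative function names, builds truncated geometric weights $w_i = (1 - \lambda)\lambda^i$ and computes the weighted average lag directly; the truncation point n is an assumption chosen large enough for the neglected tail to be negligible.

```python
def geometric_weights(lam, n):
    """First n geometric lag weights w_i = (1 - lam) * lam**i."""
    return [(1.0 - lam) * lam**i for i in range(n)]

def mean_lag(weights):
    """Weighted average lag, sum_i i*w_i / sum_i w_i."""
    return sum(i * w for i, w in enumerate(weights)) / sum(weights)

lam = 0.5
print(mean_lag(geometric_weights(lam, 200)), lam / (1.0 - lam))
```

The same `mean_lag` function applies to any finite set of lag coefficients, for example the three coefficients of a distributed lag model with two lags.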
Let us go back to the simple ARDL(1, 0) model
yt = α + βxt + λyt−1 + ut .
To obtain the mean lag we first write this model in its distributed lag form
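\[ y_t = \frac{\alpha}{1 - \lambda} + \beta \sum_{i=0}^{\infty} \lambda^i x_{t-i} + \sum_{i=0}^{\infty} \lambda^i u_{t-i}, \]
or equivalently
\[ y_t = \frac{\alpha}{1 - \lambda} + \frac{\beta}{1 - \lambda} \sum_{i=0}^{\infty} (1 - \lambda) \lambda^i x_{t-i} + \sum_{i=0}^{\infty} \lambda^i u_{t-i}. \]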
\[ y_t = \alpha + \beta \, {}_t x^e_{t+1} + u_t, \tag{6.27} \]
where ${}_t x^e_{t+1}$ denotes the expectations of $x_{t+1}$ formed at time t. The adaptive model of
expectations formation postulates that expectations are revised with the past error of expectations,
namely
\[ {}_t x^e_{t+1} - {}_{t-1} x^e_t = (1 - \mu) (x_t - {}_{t-1} x^e_t), \tag{6.28} \]
Solving (6.28) backwards gives
\[ {}_t x^e_{t+1} = (1 - \mu) \sum_{i=0}^{\infty} \mu^i x_{t-i}. \]
Substituting this result in (6.27) gives the distributed lag model with geometrically declining
weights
\[ y_t = \alpha + \beta (1 - \mu) \sum_{i=0}^{\infty} \mu^i x_{t-i} + u_t. \tag{6.29} \]
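The link between the adaptive updating rule (6.28) and the geometrically declining weights can be checked with a short sketch. It assumes a finite sample and an initial expectation of zero, in which case the recursion and the truncated geometric sum coincide exactly; the function names are illustrative.

```python
def adaptive_expectations(x, mu, initial=0.0):
    """Apply (6.28) recursively: e_new = e_old + (1 - mu) * (x_t - e_old)."""
    e = initial
    path = []
    for value in x:
        e = e + (1.0 - mu) * (value - e)
        path.append(e)
    return path

def geometric_forecast(x, mu):
    """(1 - mu) * sum_i mu**i * x_{t-i}, using all available observations."""
    t = len(x) - 1
    return (1.0 - mu) * sum(mu**i * x[t - i] for i in range(t + 1))

x = [1.0, 2.0, 3.0, 4.0]
print(adaptive_expectations(x, 0.5)[-1], geometric_forecast(x, 0.5))
```

With a non-zero initial expectation the two expressions differ by the term $\mu^{t+1}$ times that initial value, which dies out as the sample grows when |μ| < 1.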
Notice that the basic difference between a partial adjustment model and an adaptive expectations
model lies in the autocorrelation patterns of their residuals. Clearly, one can consider a combined
partial adjustment/expectations model, namely
where t xet+1 and y∗t are defined by (6.28) and (6.10), respectively. Using (6.28) and (6.10) we
have
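\[ y_t - (1 - \lambda) y_{t-1} = \alpha \lambda + \lambda \beta (1 - \mu) \sum_{i=0}^{\infty} \mu^i x_{t-i} + \lambda u_t, \tag{6.32} \]
or
\[ y_t = (1 - \lambda) y_{t-1} + \alpha \lambda + \lambda \beta (1 - \mu) \sum_{i=0}^{\infty} \mu^i x_{t-i} + \lambda u_t. \]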
This geometric distributed lag model can also be written as an ARDL model. We have
where
vt = λ (ut − μut−1 ) .
\[ {}_t x^e_{t+1} = E(x_{t+1} | \Omega_t), \]
where $\Omega_t$ is the information set, and it is assumed that
\[ \Omega_t = (x_t, x_{t-1}, \ldots; y_t, y_{t-1}, \ldots). \]
Hence
\[ {}_t x^e_{t+1} = \mu_1 x_t + \mu_2 x_{t-1}, \tag{6.36} \]
or
Finally
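\[ y_t = \frac{\alpha}{1 - \beta} + \frac{\gamma \beta}{1 - \beta} x^e_t + \gamma x_t + u_t. \tag{6.42} \]
As before we have $x^e_t = \mu_1 x_{t-1} + \mu_2 x_{t-2}$, and after substituting this result in (6.42) we obtain
the solution
\[ y_t = \frac{\alpha}{1 - \beta} + \frac{\gamma \beta}{1 - \beta} (\mu_1 x_{t-1} + \mu_2 x_{t-2}) + \gamma x_t + u_t. \tag{6.44} \]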
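\[ y_t = \alpha + \beta E(y_{t+1} | \Omega_t) + \gamma x_t + u_t. \tag{6.45} \]
Let $w_t = \alpha + \gamma x_t + u_t$, and rolling the equation one step forward we have
\[ y_{t+1} = \beta E(y_{t+2} | \Omega_{t+1}) + w_{t+1}. \tag{6.46} \]
Taking expectations conditional on $\Omega_t$ gives
\[ E(y_{t+1} | \Omega_t) = \beta E(y_{t+2} | \Omega_t) + E(w_{t+1} | \Omega_t). \]
Similarly
\[ E(y_{t+2} | \Omega_t) = \beta E(y_{t+3} | \Omega_t) + E(w_{t+2} | \Omega_t), \]
and so on. Using these results recursively forward we obtain the forward solution of the model
given by
\[ y_t = \beta^h E(y_{t+h} | \Omega_t) + \sum_{j=0}^{h-1} \beta^j E(w_{t+j} | \Omega_t). \tag{6.47} \]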
\[ \lim_{h \to \infty} \beta^h E(y_{t+h} | \Omega_t) = 0, \tag{6.48} \]
and the stochastic process of the forcing variables $\{w_t\}$ is such that
\[ \lim_{h \to \infty} \left[ \sum_{j=0}^{h-1} \beta^j E(w_{t+j} | \Omega_t) \right] = \sum_{j=0}^{\infty} \beta^j E(w_{t+j} | \Omega_t), \]
exists. Under these assumptions the unique solution of the RE equation, (6.46), is given by the
familiar present value formula
\[ y_t = \sum_{j=0}^{\infty} \beta^j E(w_{t+j} | \Omega_t). \]
The condition (6.48) is known as the transversality condition and is likely to be met if and only
if |β| < 1. The present value expression exists for a wide class of stochastic processes. It clearly
exists if wt is stationary. It also exists if wt follows a unit root process.
As an example, suppose that ut is serially uncorrelated and xt follows the AR(1) process
xt = ρxt−1 + ε t .
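\[ E(w_t | \Omega_t) = \alpha + \gamma x_t + u_t, \]
\[ E(w_{t+j} | \Omega_t) = \alpha + \gamma E(x_{t+j} | \Omega_t) = \alpha + \gamma \rho^j x_t, \quad \text{for } j > 0. \]
Hence,
\[ y_t = \sum_{j=0}^{\infty} \beta^j E(w_{t+j} | \Omega_t) = \frac{\alpha}{1 - \beta} + \frac{\gamma}{1 - \beta \rho} x_t + u_t. \]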
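The closed-form solution for the AR(1) example can be checked against a truncated version of the present value sum. The sketch below uses made-up parameter values with |β| < 1 and |βρ| < 1; the truncation horizon and the function names are assumptions introduced for illustration.

```python
def present_value(alpha, beta, gamma, rho, x_t, u_t, horizon=2000):
    """Truncated sum of beta**j * E(w_{t+j} | Omega_t) for AR(1) x_t."""
    total = 0.0
    for j in range(horizon):
        # u_t is serially uncorrelated, so it only enters at j = 0.
        expected_w = alpha + gamma * rho**j * x_t + (u_t if j == 0 else 0.0)
        total += beta**j * expected_w
    return total

def closed_form(alpha, beta, gamma, rho, x_t, u_t):
    """alpha/(1 - beta) + gamma/(1 - beta*rho) * x_t + u_t."""
    return alpha / (1.0 - beta) + gamma / (1.0 - beta * rho) * x_t + u_t

args = (1.0, 0.9, 0.5, 0.8, 2.0, 0.3)
print(present_value(*args), closed_form(*args))
```

The two numbers agree up to the truncation error, which is of order $\beta^{h}$ for a horizon of h periods.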
xt = xt−1 exp(μ + εt ),
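and making use of results from the moment generating function of normal variates, we have
\[ E(x_{t+j} | \Omega_t) = x_t \exp\left( j\mu + \frac{j \sigma_\varepsilon^2}{2} \right). \]
Hence, the present value exists if $\beta \exp(\mu + 0.5 \sigma_\varepsilon^2) < 1$, and the unique solution of the RE
model (assuming |β| < 1) is given by
\[ y_t = \frac{\alpha}{1 - \beta} + \frac{\gamma}{1 - \beta \lambda} x_t + u_t, \]
where $\lambda = \exp(\mu + \sigma_\varepsilon^2 / 2)$.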
If |β| > 1 the solution is not unique and depends on $E(y_{t+h} | \Omega_t)$ for some h. Note that in
this case the present value component of the solution, (6.47), might exist but the transversality
condition would not be satisfied. The multiplicity of the solution in the case of |β| > 1, also
known as the irregular case, can be obtained by observing that under the REH the expectations
error
yt = α + β(yt+1 − vt+1 ) + γ xt + ut ,
and
The non-uniqueness nature of this solution is due to the presence of the martingale process vt
in the solution. Note that vt is a function of the information set and can be further characterized
as functions of the innovations in the forcing variables. For example,
\[ v_t = g_0 [x_t - E(x_t | \Omega_{t-1})] + \xi_t, \]
where $g_0$ is an arbitrary constant and $\xi_t$ is another martingale process, in the sense that
\[ E(\xi_t | \Omega_{t-1}) = 0. \]
Solutions of more complicated RE models where there are backward as well as forward
components are reviewed in Pesaran (1987c) and Binder and Pesaran (1995). In Chapter 20
we consider RE models in a multivariate setting.
formal treatment of stochastic processes is given in Chapter 12. More details on univariate ratio-
nal expectations models are provided in Pesaran (1987c) and Gouriéroux, Monfort, and Gallo
(1997).
6.10 Exercises
1. Consider the dynamic model
where λ, β are parameters and $u_t$ is a white-noise disturbance term (i.e. $E(u_t) = 0$,
$E(u_t^2) = \sigma^2$, and $E(u_i u_j) = 0$, for all $i \neq j$). Show that (6.49) is equivalent to the model
\[ y_t = \beta \sum_{j=0}^{\infty} \lambda^j x_{t-j} + v_t, \]
where vt is an autoregressive process with parameter λ. Derive the short-run and long-run
effects on y of a once and for all change in x.
2. Show that the mean lag for the DL model
\[ y_t = \alpha + \beta_0 x_t + \beta_1 x_{t-1} + \beta_2 x_{t-2} + u_t, \]
is given by
\[ \frac{\beta_1 + 2\beta_2}{\beta_0 + \beta_1 + \beta_2}. \]
How do you interpret the difference between the two mean lags of the above two equations?
4. Consider the following simple model
yt = λyt−1 + ut ,
with |λ| < 1, and let $\Omega_t = (y_t, y_{t-1}, \ldots, y_{t-p})$. Derive the rational expectations at horizon
h, namely $E(y_{t+h} | \Omega_t)$.
5. Suppose that the desired level of consumption, Ct∗ is determined by
Ct∗ = α + βYt + ε t ,
where $Y_t$ is real disposable income, $\varepsilon_t$ is an error term, and actual (realized) consumption, $C_t$,
evolves gradually in response to desired consumption
where ν t = θ ε t .
(ii) Discuss the advantages and disadvantages of estimating (6.51) over (6.50).
(iii) Show that the autoregressive lag model (6.51) can be rewritten as a distributed lag
model.
7 Predictability of Asset
Returns and the Efficient
Market Hypothesis
7.1 Introduction
Economists have long been fascinated by the nature and sources of variations in the stock
market. By the early 1970s, a consensus had emerged among financial economists suggesting
that stock prices could be well approximated by a random walk model and that changes in stock
prices were basically unpredictable. Fama (1970) provides an early, definitive statement of this
position. Historically, the ‘random walk’ theory of stock prices was preceded by theories relating
movements in the financial markets to the business cycle. A prominent example is the interest
shown by Keynes in the variation in stock returns over the business cycle.
The efficient market hypothesis (EMH) evolved in the 1960s from the random walk theory of
asset prices advanced by Samuelson (1965). Samuelson showed that, in an informationally effi-
cient market, price changes must be unforecastable. Kendall (1953), Cowles (1960), Osborne
(1959, 1962), and many others had already provided statistical evidence on the random nature
of equity price changes. Samuelson’s contribution was, however, instrumental in providing aca-
demic respectability for the hypothesis, despite the fact that the random walk model had been
around for many years, having been originally discovered by Louis Bachelier, a French statisti-
cian, back in 1900.
Although a number of studies found some statistical evidence against the random walk hypoth-
esis, these were dismissed as economically unimportant (they could not generate profitable trad-
ing rules in the presence of transaction costs) and statistically suspect (they could be due to data
mining). For example, Fama (1965) concluded that ‘there is no evidence of important depen-
dence from either an investment or a statistical point of view’. Despite its apparent empirical suc-
cess, the random walk model was still a statistical statement and not a coherent theory of asset
prices. For example, it need not hold in markets populated by risk-averse traders, even under
market efficiency.
There now exist many different versions of the EMH, and one of the aims of this chap-
ter is to provide a simple framework where alternative versions of the EMH can be articu-
lated and discussed. We begin with an overview of the statistical properties of asset returns at
different frequencies (daily, weekly, and monthly), and consider the evidence on return pre-
dictability, risk aversion, and market efficiency. We then focus on the theoretical foundation
of the EMH, and show that market efficiency could co-exist with heterogeneous beliefs and
individual ‘irrationality’, so long as individual errors are cross-sectionally weakly dependent in
the sense defined by Chudik, Pesaran, and Tosetti (2011). But at times of market euphoria or
gloom these individual errors are likely to become cross sectionally strongly dependent and
the collective outcome could display significant departures from market efficiency. Market effi-
ciency could be the norm, but most likely it will be punctuated by episodes of bubbles and
crashes. To test for such episodes we argue in favour of compiling survey data on individual
expectations of price changes that are combined with information on whether such expecta-
tions are compatible with market equilibrium. A trader who believes that asset prices are too
high (low) might still expect further price rises (falls). Periods of bubbles and crashes could
result if there are sufficiently large numbers of such traders that are prepared to act on the basis
of their beliefs. The chapter also considers if periods of market inefficiency can be exploited
for profit.
\[ 1 + R_t = P_t / P_{t-1}, \]
\[ r_t = \Delta \ln(P_t) = \ln(1 + R_t). \]
It is easily seen that for small relative price changes the log-price change and the relative price
change are almost identical.
In the case of daily observations when dividends are negligible, 100 · Rt measures the per cent
return on the security, and 100·rt is the continuously compounded return. Rt is also known as dis-
cretely compounded return. The continuously compounded return, rt , is particularly convenient
in the case of temporal aggregation (multi-period returns: see Section 7.2.2), while the discretely
compounded returns are convenient for use in cross-sectional aggregation, namely aggregation
of returns across different instruments in a portfolio. For example, for a portfolio composed of
N instruments with weights $w_{i,t-1}$ ($\sum_{i=1}^{N} w_{i,t-1} = 1$, $w_{i,t-1} \geq 0$), we have
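\[ R_{pt} = \sum_{i=1}^{N} w_{i,t-1} R_{it}, \quad \text{(per cent return)}, \]
\[ r_{pt} = \ln\left( \sum_{i=1}^{N} w_{i,t-1} e^{r_{it}} \right), \quad \text{(continuously compounded)}. \]
Often $r_{pt}$ is approximated by $\sum_{i=1}^{N} w_{i,t-1} r_{it}$.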
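The two aggregation formulae, and the quality of the linear approximation, can be illustrated with a small sketch. The weights and returns below are made-up numbers and the function names are hypothetical.

```python
import math

def portfolio_simple_return(weights, simple_returns):
    """R_p = sum_i w_i * R_i (discretely compounded)."""
    return sum(w * R for w, R in zip(weights, simple_returns))

def portfolio_log_return(weights, log_returns):
    """r_p = ln(sum_i w_i * exp(r_i)) (continuously compounded)."""
    return math.log(sum(w * math.exp(r) for w, r in zip(weights, log_returns)))

weights = [0.6, 0.4]
R = [0.02, -0.01]                        # simple returns of the two assets
r = [math.log(1.0 + Ri) for Ri in R]     # corresponding log returns

exact = portfolio_log_return(weights, r)
approx = sum(w * ri for w, ri in zip(weights, r))
print(portfolio_simple_return(weights, R), exact, approx)
```

Because the weights add up to one, the exact continuously compounded portfolio return equals ln(1 + R_p), while the weighted average of individual log returns is only an approximation that works well when returns are small.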
When dividends are paid out we have
\[ R_t(h) = \frac{P_t - P_{t-h}}{P_{t-h}}, \]
or
1 + Rt (h) = Pt /Pt−h ,
and
where rt−i , i = 0, 1, 2, . . . , h − 1 are the single-period returns. For example, weekly returns are
defined by rt (5) = rt +rt−1 +. . .+rt−4 . Similarly, since there are 25 business days in one month,
then the 1-month return can be computed as the sum of the last 25 1-day returns, or rt (25).
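The additivity of continuously compounded returns over time can be sketched as follows. The price series is made up for illustration, and the function names are not from the text.

```python
import math

def log_returns(prices):
    """Single-period log returns r_t = ln(P_t / P_{t-1})."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

def multi_period_log_return(returns, h):
    """r_t(h) = r_t + r_{t-1} + ... + r_{t-h+1}, using the last h returns."""
    if not 1 <= h <= len(returns):
        raise ValueError("h must be between 1 and the number of returns")
    return sum(returns[-h:])

prices = [100.0, 101.0, 99.0, 102.0, 104.0, 103.0]
r = log_returns(prices)
print(multi_period_log_return(r, 5), math.log(prices[-1] / prices[0]))
```

The sum of the five single-period log returns equals the log of the ratio of the last price to the first, so no compounding correction is needed when aggregating over time.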
where $\mu_t$ and $\sigma_t^2$ are the conditional mean and the conditional variance of returns (with respect
to the information set $\Omega_t$ available at time t) and $\varepsilon_{t+1}$ represents the unpredictable component
of return. Two popular distributions for $\varepsilon_{t+1}$ are
\[ \varepsilon_{t+1} | \Omega_t \sim IID \; Z, \]
\[ \varepsilon_{t+1} | \Omega_t \sim IID \; \sqrt{\frac{v - 2}{v}} \, T_v, \]
where Z ∼ N(0, 1) stands for a standard normal distribution, and Tv stands for Student’s
t-distribution with v degrees of freedom. Unlike the normal distribution that has moments of
all orders, Tv only has moments of order v − 1 and smaller. For the Student’s t to have a variance,
for example, we need v > 2.
Since $r_{t+1} = \ln(1 + R_{t+1})$, where $R_{t+1} = (P_{t+1} - P_t)/P_t$, it then follows that under
$\varepsilon_{t+1} | \Omega_t \sim IID \; Z$, the price level, $P_{t+1}$, conditional on $\Omega_t$ will be lognormally distributed. Note that
$\Omega_t = (P_t, P_{t-1}, \ldots)$ and $\Omega_t = (r_t, r_{t-1}, \ldots)$ convey the same information and are equivalent.
Hence, Pt+1 = Pt exp(rt+1 ), and we have1
In practice, it is much more convenient to work with log returns, rt+1 , rather than asset prices.
The probability density functions of Z and Tv are given by
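\[ f(Z) = (2\pi)^{-1/2} \exp\left( \frac{-Z^2}{2} \right), \quad -\infty < Z < \infty, \tag{7.2} \]
and
\[ f(T_v) = \frac{1}{\sqrt{v} \, B(v/2, 1/2)} \left( 1 + \frac{T_v^2}{v} \right)^{-(v+1)/2}, \tag{7.3} \]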
1 Using properties of the moment generating function of normal variates, if $x \sim N(\mu_x, \sigma_x^2)$ then $E[\exp(x)] =
\exp(\mu_x + 0.5\sigma_x^2)$.
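where $-\infty < T_v < \infty$, and $B(v/2, 1/2)$ is the beta function defined by
\[ B(\alpha, \beta) = \frac{\Gamma(\alpha) \Gamma(\beta)}{\Gamma(\alpha + \beta)}, \quad \Gamma(\alpha) = \int_0^{\infty} u^{\alpha - 1} e^{-u} \, du. \]
\[ E(T_v) = 0, \quad \text{and} \quad Var(T_v) = \frac{v}{v - 2}. \]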
A large part of financial econometrics is concerned with alternative ways of modelling the con-
ditional mean (mean returns), μt , the conditional variance (asset return volatility), σ t , and the
cumulative probability distribution of the errors, εt+1 . A number of issues need to be addressed
in order to choose an adequate model. In particular:
The above modelling issues can be readily extended to the case where we are concerned with
a vector of asset returns, rt = (r1t , r2t , . . . rmt ) . In this case we also need to model the pair-wise
conditional correlations of asset returns, namely
\[ Corr(r_{it}, r_{jt} | \Omega_t) = \frac{Cov(r_{it}, r_{jt} | \Omega_t)}{\sqrt{Var(r_{it} | \Omega_t) \, Var(r_{jt} | \Omega_t)}}. \]
Typically the conditional variances and correlations are modelled using exponential smoothing
procedures or the multivariate generalized autoregressive conditional heteroskedastic models
developed in the econometric literature. See Chapters 18 and 25 for further details.
In the literature on risk management Cp is used to compute ‘Value at Risk’ or VaR for short. For
p = 1% , Cp associated with the one-sided critical value of the normal distribution is given by
−2.33σ , where σ is the standard deviation of returns (see Chapter 25 for an application of the
VaR in the context of risk management).
In hypothesis testing Cp is known as the critical value of the test associated with a (one-sided)
test of size p. In the case of two-sided tests of size p, the associated critical value is computed as
Cp/2 . See Chapter 3.
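The critical values quoted above can be reproduced with the standard library. The sketch below assumes that $C_p$ denotes the p-th quantile of the return distribution and that returns are normal with mean `mu` (set to zero by default) and standard deviation `sigma`; the function name `normal_quantile` is illustrative.

```python
from statistics import NormalDist

def normal_quantile(sigma, p=0.01, mu=0.0):
    """p-th quantile of a N(mu, sigma**2) distribution."""
    return mu + sigma * NormalDist().inv_cdf(p)

print(normal_quantile(1.0, p=0.01))    # one-sided 1 per cent quantile, about -2.33 sigma
print(normal_quantile(1.0, p=0.025))   # quantile used for a two-sided test of size 5 per cent
```

Under fat-tailed distributions such as Student's t with few degrees of freedom, the corresponding quantile lies further out than the normal value, so a normal-based figure would tend to understate tail risk.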
\[ f(r_{t+1}) = (2\pi \sigma_t^2)^{-1/2} \exp\left[ -\frac{1}{2\sigma_t^2} (r_{t+1} - \mu_t)^2 \right], \]
with $\mu_t = E(r_{t+1} | \Omega_t)$ and $\sigma_t^2 = E[(r_{t+1} - \mu_t)^2 | \Omega_t]$ being the conditional mean and
variance. If the return process is stationary, unconditionally we also have $\mu = E(r_{t+1})$, and
$\sigma^2 = E[(r_{t+1} - \mu)^2]$.
Skewness and tail-fatness measures are defined by
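\[ \text{Skewness} = \sqrt{b_1} = m_3 / m_2^{3/2}, \]
\[ \text{Kurtosis} = b_2 = m_4 / m_2^2, \]
where
\[ m_j = \frac{\sum_{t=1}^{T} (r_t - \bar{r})^j}{T}, \quad j = 2, 3, 4. \]
For a normal distribution $\sqrt{b_1} \approx 0$, and $b_2 \approx 3$. In particular
\[ \hat{\mu} = \bar{r} = \sum_{t=1}^{T} r_t / T, \quad \hat{\sigma} = \sqrt{\frac{\sum_{t=1}^{T} (r_t - \bar{r})^2}{T - 1}}. \]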
The Jarque–Bera test statistic for departure from normality is given by (see Jarque and Bera
(1980), and Section 3.14)
\[ JB = T \left[ \frac{1}{6} b_1 + \frac{1}{24} (b_2 - 3)^2 \right]. \]
Under the joint null hypothesis that $b_1 = 0$ and $b_2 = 3$, the JB statistic is asymptotically
distributed (as T → ∞) as a chi-squared with 2 degrees of freedom, $\chi^2_2$. Therefore, a value of JB
in excess of 5.99 will be statistically significant at the 95 per cent confidence level, and the null
hypothesis of normality will be rejected.
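The statistic can be computed directly from the sample moments defined above. The sketch below is a plain implementation of those formulae; the function name and the small data set are illustrative only, and with so few observations the asymptotic critical value would not be reliable.

```python
def jarque_bera(returns):
    """JB = T * [b1/6 + (b2 - 3)**2 / 24], using the moments m_2, m_3, m_4."""
    T = len(returns)
    mean = sum(returns) / T
    m2, m3, m4 = (sum((r - mean) ** j for r in returns) / T for j in (2, 3, 4))
    b1 = m3**2 / m2**3     # square of the skewness coefficient m3 / m2**1.5
    b2 = m4 / m2**2        # kurtosis coefficient
    return T * (b1 / 6.0 + (b2 - 3.0) ** 2 / 24.0)

sample = [-2.0, -1.0, 0.0, 1.0, 2.0]
jb = jarque_bera(sample)
print(jb, jb > 5.99)   # compare with the 5.99 critical value of chi-squared(2)
```

For this symmetric sample the skewness term is zero, so the whole statistic comes from the kurtosis term.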
2 All statistics and graphs have been obtained using Microfit 5.0.
Figure 7.1 Histogram and Normal curve for daily returns on S&P 500 (over the period 3 Jan 2000–31 Aug 2009).
Figure 7.2 Daily returns on S&P 500 (over the period 3 Jan 2000–31 Aug 2009).
Table 7.2 Descriptive statistics for daily returns on British pound, euro,
Japanese yen, Swiss franc, Canadian dollar, and Australian dollar
Variables BU BE BG BJ
considered are the British pound (GBP), euro (EU), Japanese yen (JPY), Swiss franc (CHF),
Canadian dollar (CAD), and Australian dollar (AD), all measured in terms of the US dollar.
The returns on government bonds are generally less fat-tailed than the returns on equities and
currencies. But their distribution still shows a significant degree of departure from normality.
Table 7.3 reports descriptive statistics on daily returns on the main four government bond
futures: US T-Note 10Y (BU), Europe Euro Bund 10Y (BE), Japan Government Bond 10Y (BJ),
and UK Long Gilts 8.75-13Y (BG) over the period 03 Jan 2000–31 Aug 2009.
It is clear that, for all three asset classes, there are significant departures from normality which
need to be taken into account when analysing financial time series.
rt = sign(rt ) |rt | ,
where sign(rt ) = +1 if rt > 0 and sign(rt ) = −1 if rt ≤ 0. Since |rt | is predictable, it is, there-
fore, the non-predictability of sign(rt ), or the direction of the market, which lies behind the dif-
ficulty of predicting returns.
The extent to which returns are predictable depends on the forecast horizon, the degree
of market volatility, and the state of the business cycle. Predictability tends to rise during cri-
sis periods. Similar considerations also apply to the degree of fat-tailedness of the underly-
ing distribution and the cross-correlations of asset returns. The return distributions become
less fat-tailed as the horizon is increased, and cross-correlations of asset returns become more
predictable with the horizon. Cross-correlation of returns also tends to increase with market
volatility. The analysis of time variations in the cross correlation of asset returns is discussed in
Chapter 25.
In the case of daily returns, equity returns tend to be negatively serially correlated. During nor-
mal times they are small and only marginally significant statistically, but they become relatively
large and attain a high level of statistical significance during crisis periods. These properties are
illustrated in the following empirical application.
The first- and second-order serial correlation coefficients of daily returns on S&P 500 over
the period 3 Jan 2000–31 Aug 2007 are −0.015 (0.0224) and −0.0458 (0.0224), respectively,
but increase to −0.068 (0.0199) and −0.092 (0.0200) once the sample is extended to the end
of August 2009, which covers the 2008 global financial crisis.3 Similar patterns are also observed
for other equity indices. For currencies the evidence is more mixed. In the case of major cur-
rencies such as euro and yen, there is little evidence of serial correlation in returns and this out-
come does not seem much affected by whether one considers normal or crisis periods. For other
currencies there is some evidence of negative serial correlation, particularly at times of crisis.
For example, over the period 3 Jan 2000–31 Aug 2009 the first-order serial correlation of daily
returns on Australian dollar amounts to −0.056 (0.0199), but becomes statistically insignifi-
cant if we exclude the crisis period. There is also very little evidence of serial correlation in
daily returns on the four major government bonds that we have been considering. This outcome
does not depend on whether the crisis period is included in the sample. Irrespective of whether
the underlying returns are serially correlated, their absolute values (or their squares) are highly
serially correlated, often over many periods. For example, over the 3 Jan 2000–31 Aug 2009
period the first- and second-order serial correlation coefficients of absolute return on S&P 500
are 0.2644(0.0199), 0.3644(0.0204); for euro they are 0.0483(0.0199) and 0.1125(0.0200), and
for US 10Y bond they are 0.0991(0.0199) and 0.1317(0.0201). The serial correlation in absolute
returns tends to decay very slowly and continues to be statistically significant even after 120
trading days (see Figure 7.3).
It is also interesting to note that there is little correlation between rt and |rt |. Based on the full
sample ending in August 2009, this correlation is −.0003 for S&P 500, 0.025 for euro, and 0.009
for the US 10Y bond.
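Serial correlation coefficients of returns and of absolute returns, of the kind reported above, can be computed with a few lines of code. The sketch below uses the usual sample autocorrelation with the full-sample mean and variance; the short return series is made up, and the rough standard error 1/√T shown at the end is an assumption that applies to low-order coefficients of a serially uncorrelated series.

```python
import math

def autocorr(x, k):
    """Sample autocorrelation of order k, using the full-sample mean and variance."""
    T = len(x)
    mean = sum(x) / T
    dev = [v - mean for v in x]
    num = sum(dev[t] * dev[t - k] for t in range(k, T))
    den = sum(d * d for d in dev)
    return num / den

returns = [0.4, -1.2, 0.9, -0.3, 2.1, -2.5, 0.2, 0.1, -0.6, 1.5]
abs_returns = [abs(r) for r in returns]

print(autocorr(returns, 1), autocorr(abs_returns, 1))
print(1.0 / math.sqrt(len(returns)))   # approximate standard error under no serial correlation
```

Applying the same function to the returns and to their absolute values is how one would contrast weak serial correlation in the former with strong, slowly decaying serial correlation in the latter.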
Figure 7.3 Autocorrelation function of the absolute values of returns on S&P 500 (over the period 3 Jan
2000–31 Aug 2009). Horizontal axis: order of lags.
where SPt is the monthly spot price index of S&P 500 and SPDIVt denotes the associated div-
idends on the S&P 500 index. Over the period 1871m1 to 2009m9 (a total of 1,664 monthly
observations) the coefficient of skewness and kurtosis of RSP amounted to 1.07 and 23.5
per cents, respectively. The excess kurtosis coefficient of 20.5 is much higher than the figure of
11.3 obtained for the daily observations on SP over the period 3 Jan 2000–31 Aug 2009. Also as
before the skewness coefficient is relatively small. However, the monthly returns show a much
higher degree of serial correlation and a lower degree of volatility as compared to daily or weekly
returns. The correlation coefficients of RSP are 0.346 (0.0245) and 0.077 (0.027), and the serial
correlation coefficients continue to be statistically significant up to the lag order of 12 months.
Also, the pattern of serial correlations in absolute monthly returns, |RSPt |, is not that different
from that of the serial correlation in RSPt , which suggests a lower degree of return volatility (as
compared with the volatility of daily or weekly returns) once the effects of mean returns are taken
into account.
Similar, but less pronounced, results are obtained if we exclude the 1929 stock market crash
and focus on the post-Second World War period. The coefficients of skewness and kurtosis
of monthly returns over the period 1948m1 to 2009m9 (741 observations) are –0.49 and 5.2,
respectively. The first- and second-order serial correlation coefficients of returns are 0.361
(0.0367) and 0.165 (0.041), respectively. The main difference between these sub-sample esti-
mates and those obtained for the full sample is the much lower estimate for the kurtosis coeffi-
cient. But even the lower post 1948 estimates suggest a significant degree of fat-tailedness in the
monthly returns.
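The figures in brackets next to the serial correlation coefficients reported above appear to be consistent with Bartlett's approximation to the standard error of a sample autocorrelation, $\sqrt{(1 + 2\sum_{s<k}\hat{\rho}_s^2)/T}$. This is an inference from the numbers rather than something stated in the text, and the function name below is hypothetical. A minimal sketch of the calculation:

```python
import math

def bartlett_se(lag, n_obs, lower_order_acf):
    """Approximate standard error of the sample autocorrelation at `lag`,
    given the autocorrelations at lags 1, ..., lag - 1 (Bartlett's formula)."""
    assert len(lower_order_acf) == lag - 1
    return math.sqrt((1.0 + 2.0 * sum(r * r for r in lower_order_acf)) / n_obs)

# Full sample, 1871m1-2009m9 (1,664 observations), first-order coefficient 0.346
se1_full = bartlett_se(1, 1664, [])
se2_full = bartlett_se(2, 1664, [0.346])

# Post-war sample (741 observations), first-order coefficient 0.361
se1_post = bartlett_se(1, 741, [])
se2_post = bartlett_se(2, 741, [0.361])

print(round(se1_full, 4), round(se2_full, 3))  # 0.0245 0.027
print(round(se1_post, 4), round(se2_post, 3))  # 0.0367 0.041
```

The rounded values agree with the bracketed figures quoted for both samples.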
$$R_{t+1} - r_t^f = a + b_1 x_{1t} + b_2 x_{2t} + \dots + b_k x_{kt} + \varepsilon_{t+1}, \tag{7.4}$$
where $R_{t+1}$ is the one-period holding return on a stock index, such as FTSE or Dow Jones,
defined by
$$R_{t+1} = \frac{P_{t+1} + D_{t+1} - P_t}{P_t}, \tag{7.5}$$
$P_t$ is the stock price at the end of period $t$, $D_{t+1}$ is the dividend paid out over the period $t$ to
$t + 1$, and $x_{it}$, $i = 1, 2, \dots, k$, are the factors/variables thought to be important in predicting
stock returns. Finally, $r_t^f$ is the return on the government bond with one period to maturity (the
period to maturity of the bond should be exactly the same as the holding period of the stock).
$R_{t+1} - r_t^f$ is known as the excess return (return on stocks in excess of the return on the safe
asset). Note also that $r_t^f$ would be known to the investor/trader at the end of period $t$, before the
price of stocks, $P_{t+1}$, is revealed at the end of period $t + 1$.
Examples of possible stock market predictors are past changes in macroeconomic variables
such as interest rates, inflation, dividend yield (Dt /Pt−1 ), price earnings ratio, output growth,
and term premium (the difference in yield of a high grade and a low grade bond such as AAA
rated minus BAA rated bonds).
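A regression of the form (7.4) with a single predictor can be estimated by ordinary least squares as in the sketch below. The data are simulated, the coefficient values and the function name `ols_one_predictor` are illustrative assumptions, and the predictor is only a stand-in for a variable such as the dividend yield observed at the end of period $t$.

```python
import math
import random

def ols_one_predictor(y, x):
    """OLS of y on a constant and a single predictor x.
    Returns (a_hat, b_hat, se_b, r_squared)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    resid = [v - a - b * u for u, v in zip(x, y)]
    rss = sum(e * e for e in resid)
    tss = sum((v - my) ** 2 for v in y)
    se_b = math.sqrt(rss / (n - 2) / sxx)  # conventional OLS standard error
    return a, b, se_b, 1.0 - rss / tss

rng = random.Random(42)
T = 600
# Hypothetical predictor x_t, observed at the end of period t
x = [rng.gauss(0.0, 1.0) for _ in range(T)]
# Simulated excess returns: excess[t] plays the role of R_{t+1} - r_t^f,
# with a small predictable component (assumed values a = 0.5, b = 0.3)
true_a, true_b = 0.5, 0.3
excess = [true_a + true_b * x[t] + rng.gauss(0.0, 4.0) for t in range(T)]

a_hat, b_hat, se_b, r2 = ols_one_predictor(excess, x)
t_ratio = b_hat / se_b  # t-ratio for the hypothesis b = 0
```

With a noise standard deviation that is large relative to the slope, the fitted $R^2$ is small even though the predictor genuinely enters the data generating process, which is typical of return regressions.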
For individual stocks the relevant stock market regression is the capital asset pricing model
(CAPM), augmented with potential predictors:
$$R_{i,t+1} = a_i + b_{1i} x_{1t} + b_{2i} x_{2t} + \dots + b_{ki} x_{kt} + \beta_i R_{t+1} + \varepsilon_{i,t+1}, \tag{7.6}$$
where $R_{i,t+1}$ is the holding period return on asset $i$ (shares of firm $i$), defined similarly to $R_{t+1}$.
The asset-specific regressions (7.6) could also include firm specific predictors, such as $R_{it}$ or its
higher-order lags, book-to-market value or size of firm $i$. Under market efficiency, as characterized
by CAPM,
$$a_i = 0, \quad b_{1i} = b_{2i} = \dots = b_{ki} = 0,$$
and only the ‘betas’, $\beta_i$, will be significantly different from zero. Under CAPM, the value of $\beta_i$
captures the risk of holding the share $i$ with respect to the market.
efficiency needs to be defined separately from predictability. In fact, it is easily seen that stock
market returns will be non-predictable only if market efficiency is combined with risk neutrality.
$$\$(1 + r_t^f)A_t,$$
A risk-neutral investor will be indifferent between the certainty of $\$(1 + r_t^f)A_t$, and her/his
expectations of the uncertain payout of option 2. Namely, for such a risk-neutral investor
$$(1 + r_t^f)A_t = E\left[(A_t/P_t)(P_{t+1} + D_{t+1}) \mid \Omega_t\right], \tag{7.7}$$
where $\Omega_t$ is the investor’s information at the end of period $t$. This relationship is called the
‘arbitrage condition’. Dividing both sides by $A_t$ and using (7.5) we now have
$$1 + r_t^f = E\left(\frac{P_{t+1} + D_{t+1}}{P_t} \,\Big|\, \Omega_t\right) = 1 + E(R_{t+1} \mid \Omega_t),$$
or
$$E(R_{t+1} - r_t^f \mid \Omega_t) = 0. \tag{7.8}$$
This result establishes that if the investor forms his/her expectations of future stock (index)
returns taking account of all market information efficiently, then the excess return, $R_{t+1} - r_t^f$,
should not be predictable using any of the market information that is available at the end of
period $t$. Notice that $r_t^f$ is known at time $t$ and is therefore included in $\Omega_t$. Hence, under the joint
hypothesis of market efficiency and risk neutrality we must have $E(R_{t+1} \mid \Omega_t) = r_t^f$.
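The above set up can also be used to derive conditions under which asset prices can be
characterized as a random walk model. Suppose the risk-free rate, $r_t^f$, in addition to being known at
time $t$, is also constant over time and given by $r^f$. Then using (7.7) we can also write
$$P_t = \frac{1}{1 + r^f}\, E\left[(P_{t+1} + D_{t+1}) \mid \Omega_t\right],$$
or
$$P_t = \frac{1}{1 + r^f}\left[E(P_{t+1} \mid \Omega_t) + E(D_{t+1} \mid \Omega_t)\right].$$
Under the rational expectations hypothesis and assuming that the ‘transversality condition’
$$\lim_{j \to \infty} \left(\frac{1}{1 + r^f}\right)^{j} E(P_{t+j} \mid \Omega_t) = 0,$$
holds, forward iteration of this equation yields
$$P_t = \sum_{j=1}^{\infty} \left(\frac{1}{1 + r^f}\right)^{j} E(D_{t+j} \mid \Omega_t), \tag{7.9}$$
which equates the level of stock price to the present discounted stream of the dividends expected to
accrue to the asset over the infinite future. The transversality condition rules out rational
speculative bubbles and is satisfied if the asset prices are not expected to rise faster than the
exponential decay rate determined by the discount factor, $0 < 1/(1 + r^f) < 1$. It is now easily seen
that if $D_t$ follows a random walk so will $P_t$. For example, suppose
$$D_t = D_{t-1} + \varepsilon_t, \tag{7.10}$$
where $E(\varepsilon_{t+j} \mid \Omega_t) = 0$ for $j \geq 1$, so that $E(D_{t+j} \mid \Omega_t) = D_t$ for all $j \geq 1$. Then, using (7.9)
and $\sum_{j=1}^{\infty} (1 + r^f)^{-j} = 1/r^f$,
$$P_t = \frac{D_t}{r^f}. \tag{7.11}$$
Using (7.10) in (7.11) we also have
$$P_t = P_{t-1} + u_t, \tag{7.12}$$
where $u_t = \varepsilon_t / r^f$.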
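The link between the present value relation (7.9) and the random walk result (7.12) can be checked numerically. In the sketch below the risk-free rate, the initial dividend, the shock variance and the truncation horizon are all hypothetical values chosen for illustration, and `present_value` is an illustrative helper name.

```python
import random

def present_value(expected_dividends, r):
    """Discounted sum of expected dividends E(D_{t+j}), j = 1, 2, ..., at rate r."""
    return sum(d / (1.0 + r) ** j for j, d in enumerate(expected_dividends, start=1))

r_f = 0.05          # hypothetical constant risk-free rate
horizon = 2000      # truncation point for the infinite sum in (7.9)
D_t = 2.0           # hypothetical current dividend

# Under the random walk (7.10), E(D_{t+j} | information at t) = D_t for every j >= 1
pv = present_value([D_t] * horizon, r_f)
closed_form = D_t / r_f            # equation (7.11)

# Simulate the dividend random walk and the implied prices P_t = D_t / r_f
rng = random.Random(1)
dividends = [D_t]
shocks = []
for _ in range(200):
    eps = rng.gauss(0.0, 0.1)
    shocks.append(eps)
    dividends.append(dividends[-1] + eps)
prices = [d / r_f for d in dividends]
price_changes = [prices[t] - prices[t - 1] for t in range(1, len(prices))]
# Each price change equals u_t = eps_t / r_f, as in (7.12)
```

The truncated sum `pv` agrees with `closed_form` up to rounding error, and every element of `price_changes` equals the corresponding dividend shock divided by `r_f`.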
The random walk property holds even if $r^f = 0$, since in such a case it would be reasonable to
expect no dividends to be paid out, namely $D_t = 0$. In this case the arbitrage condition becomes
$$E(P_{t+1} \mid \Omega_t) = P_t, \tag{7.13}$$
which is satisfied by the random walk model but is in fact more general than the random walk
model. An asset price that satisfies (7.13) is a martingale process. Random walk processes with
zero drift are martingale processes but not all martingale processes are random walks. For
example, the price process
$$P_{t+1} = P_t + \lambda\left\{(\Delta P_{t+1})^2 - E\left[(\Delta P_{t+1})^2 \mid \Omega_t\right]\right\} + \varepsilon_{t+1},$$
where $\varepsilon_{t+1}$ is a white noise process, is a martingale process with respect to the information set $\Omega_t$,
but it is clearly not a random walk process, unless $\lambda = 0$. See Section 15.3.1 for a brief discussion
of martingale processes.
Other modifications of the random walk theory are obtained if it is assumed that dividends
follow a geometric random walk which is more realistic than the linear dividend model assumed
in (7.10). In this case
$$D_{t+1} = D_t \exp(\mu_d + \sigma_d \nu_{t+1}), \tag{7.14}$$
where $\mu_d$ and $\sigma_d$ are mean and standard deviation of the growth rate of the dividends. If it is
further assumed that $\nu_{t+1} \mid \Omega_t$ is $N(0, 1)$, we have
$$E(D_{t+j} \mid \Omega_t) = D_t \exp\left(j\mu_d + \tfrac{1}{2} j \sigma_d^2\right).$$
Using this result in (7.9) now yields, assuming that $(1 + r^f)^{-1} \exp\left(\mu_d + \tfrac{1}{2}\sigma_d^2\right) < 1$,
$$P_t = \frac{D_t}{\rho}, \tag{7.15}$$
where
$$\rho = (1 + r^f) \exp\left(-\mu_d - \tfrac{1}{2}\sigma_d^2\right) - 1.$$
The condition $(1 + r^f)^{-1} \exp\left(\mu_d + \tfrac{1}{2}\sigma_d^2\right) < 1$ ensures that the infinite sum in (7.9) is
convergent and $\rho > 0$. Under this set up $\ln(P_t) = \ln(D_t) - \ln(\rho)$, and
$$\ln(P_{t+1}) = \ln(P_t) + \mu_d + \sigma_d \nu_{t+1}, \tag{7.16}$$
which establishes that in this case it is log prices that follow the random walk model. This is a
special case of the statistical model of return, (7.1), discussed in Section 7.3, where $\mu_t = \mu_d$,
and $\sigma_t = \sigma_d$.
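The closed form (7.15) can be compared with a truncated version of the present value sum in (7.9) under the geometric random walk for dividends. The parameter values below are hypothetical per-period figures chosen only so that the convergence condition holds, and the function names are illustrative.

```python
import math

def rho(r_f, mu_d, sigma_d):
    """Dividend-price ratio implied by (7.15) under the geometric random walk."""
    return (1.0 + r_f) * math.exp(-mu_d - 0.5 * sigma_d ** 2) - 1.0

def truncated_present_value(D_t, r_f, mu_d, sigma_d, horizon):
    """Partial sum of (7.9) using E(D_{t+j}) = D_t * exp(j*mu_d + 0.5*j*sigma_d^2)."""
    q = math.exp(mu_d + 0.5 * sigma_d ** 2) / (1.0 + r_f)  # growth/discount ratio
    return D_t * sum(q ** j for j in range(1, horizon + 1))

# Hypothetical values, for illustration only
r_f, mu_d, sigma_d, D_t = 0.05, 0.02, 0.10, 2.0

q = math.exp(mu_d + 0.5 * sigma_d ** 2) / (1.0 + r_f)
converges = q < 1.0   # the condition stated in the text for (7.9) to converge

price_closed_form = D_t / rho(r_f, mu_d, sigma_d)
price_partial_sum = truncated_present_value(D_t, r_f, mu_d, sigma_d, horizon=5000)
```

When `converges` is true the two prices coincide up to truncation and rounding error; if the parameters are changed so that `q >= 1`, `rho` becomes non-positive and the partial sums grow without bound as the horizon is extended.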
There are, however, three different types of empirical evidence that cast doubt on the empirical
validity of the present value model under risk neutrality.
1. The model predicts a constant price-dividend ratio for a large class of the dividend
processes, two prominent examples being the linear and the geometric random walk models, (7.10)
and (7.14), discussed above. For more general dividend processes the price-dividend ratio,
$\rho_t = P_t / D_t$, could be time varying, but it must be mean-reverting, in the sense that shocks to
prices and dividends must eventually cancel out. In reality, the price-dividend ratio varies
considerably over time, shows a high degree of persistence, and in general it is not possible to reject
the hypothesis that the processes for $\rho_t$ or $\ln(\rho_t)$ contain a unit root. For the Shiller data
discussed in Section 7.4.2 the first-order autocorrelation coefficient of the log dividend to price
ratio computed over the period 1871m1 to 2009m9 is 0.994 (0.024); it falls very gradually as its order
is increased and amounts to 0.879 (0.111) at the lag order 12. Formal tests of unit root hypothesis
are discussed in Chapter 15.
2. We have already established that under risk neutrality excess returns must not be pre-
dictable (see equation (7.8)). Yet there is ample evidence of excess return predictability at least
in periods of high market volatility. For example, it is possible to explain 15 per cent of the vari-
ations in monthly excess returns on S&P 500 over the period 1872m2–2009m9 by running a
linear regression of the excess return on a constant and its 12 lagged values—namely by a uni-
variate AR(12) process. This figure rises to 19 per cent if we exclude the 1929 stock market crash
and focus on the post-1948 period. See also the references cited in Section 7.7.1.
3. To derive the geometric random walk model of asset prices, (7.16), from the present
value model under risk neutrality, we have assumed that innovations to the dividend process
are normally distributed. This implies that innovations to asset returns must also be normally
distributed. But the empirical evidence discussed in Section 7.4 above clearly shows that
innovations to asset returns tend to be fat-tailed, and often significantly depart from normality. This
anomaly between the theory and the evidence is also difficult to reconcile. Under the present
value model prices will have fat-tailed innovations only if the dividends that drive asset
prices are also fat-tailed. But under the geometric random walk model for dividends (7.14),
$E(D_{t+j} \mid \Omega_t)$ need not exist if the dividend innovations, $\nu_t$, are fat-tailed. One important
example arises when $\nu_t$ has the Student t-distribution as defined by (7.3). For the derivation of the
present value expression in this case we need $E[\exp(\sigma_d \nu_{t+j})]$, which is the moment generating
function of $\nu_{t+j}$ evaluated at $\sigma_d$. But the Student t-distribution does not have a moment generating
function, and hence the present value formula cannot be computed when innovations to the dividends
are t-distributed.
where $\lambda_t$ is the premium per \$ of invested capital required (expected) by the investor. It is now
easily seen that
$$E(R_{t+1} - r_t^f \mid \Omega_t) = \lambda_t,$$
and it is no longer necessarily true that under market efficiency excess returns are non-predictable.
The extent to which excess returns can be predicted will depend on the existence of a historically
stable relationship between the risk premium, $\lambda_t$, and macroeconomic and business cycle indicators,
such as changes in interest rates and dividends.
In the context of the consumption capital asset pricing model, $\lambda_t$ is determined by the ex ante
correlation of excess returns and changes in the marginal utility of consumption. In the case
of a representative consumer with the single period utility function, $u(c_t)$, the first-order
inter-temporal optimization condition (the Euler equation) is given by
$$E\left[(R_{t+1} - r_t^f)\, \frac{u'(c_{t+1})}{u'(c_t)} \,\Big|\, \Omega_t\right] = 0, \tag{7.17}$$
where $c_t$ denotes the consumer’s real consumption in period $t$. Using the above condition it is
now easily seen that⁵
$$\lambda_t = -\frac{\mathrm{Cov}\left[R_{t+1}, \frac{u'(c_{t+1})}{u'(c_t)} \,\Big|\, \Omega_t\right]}{E\left[\frac{u'(c_{t+1})}{u'(c_t)} \,\Big|\, \Omega_t\right]} = -\frac{\mathrm{Cov}\left[R_{t+1}, u'(c_{t+1}) \mid \Omega_t\right]}{E\left[u'(c_{t+1}) \mid \Omega_t\right]}.$$
For a power utility function, $u(c_t) = (c_t^{1-\gamma} - 1)/(1 - \gamma)$, we have
$u'(c_{t+1})/u'(c_t) = \exp(-\gamma \Delta\ln(c_{t+1}))$, where $\gamma > 0$ is the coefficient of relative risk
aversion. In this case $\lambda_t$ is given by
$$\lambda_t = \frac{-\mathrm{Cov}\left\{R_{t+1}, \exp[-\gamma \Delta\ln(c_{t+1})] \mid \Omega_t\right\}}{E\left\{\exp[-\gamma \Delta\ln(c_{t+1})] \mid \Omega_t\right\}}. \tag{7.18}$$
This result shows that the risk premium depends on the covariance of asset returns with the
marginal utility of consumption. The premium demanded by the investor to hold the stock is
higher if the return on the asset co-varies positively with consumption. The extent of this
co-variation depends on the magnitude of the risk aversion coefficient $\gamma$. For plausible values of
$\gamma$ (in the range 1 to 3) and historically observed values of the consumption growth, we would
expect $\lambda_t$ to be relatively small, below 1 per cent per annum. However, using annual observations
over relatively long periods one obtains a much larger estimate for $\lambda_t$. This was first pointed out
by Mehra and Prescott (1985) who found that in the 90 years from 1889 to 1978 the average
estimate of $\lambda_t$ in fact amounted to 6.18 per cent per annum, which could only be reconciled with
the theory if one was prepared to consider an implausibly large value for the relative risk aversion
coefficient (in the region of 30 or 40). The large discrepancy between the historical estimate of
$\lambda_t$ based on $R_{t+1} - r_t^f$, and the theory-consistent estimate of $\lambda_t$ based on (7.18) is known as the
‘equity premium puzzle’. There have been many attempts in the literature to resolve the puzzle
by modifications to the utility function, attitudes towards risk, allowing for the possibility of rare
events, and the heterogeneity in asset holdings and preferences across consumers. For reviews
see Kocherlakota (2003) and Mehra and Prescott (2003).
5 Let $X_{t+1} = R_{t+1} - r_t^f$ and $Y_{t+1} = u'(c_{t+1})/u'(c_t)$, and write the Euler equation (7.17) as
$$E(X_{t+1} Y_{t+1} \mid \Omega_t) = 0 = \mathrm{Cov}(X_{t+1}, Y_{t+1} \mid \Omega_t) + E(X_{t+1} \mid \Omega_t)\, E(Y_{t+1} \mid \Omega_t),$$
then the required results follow immediately, also noting that $r_t^f$ is known at time $t$ and hence has a
zero correlation with $u'(c_{t+1})/u'(c_t)$.
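The order of magnitude of the premium implied by (7.18) can be illustrated by simulation. The sketch below draws consumption growth and stock returns from a bivariate normal distribution and evaluates the sample analogue of (7.18) for several values of $\gamma$. The means, standard deviations and correlation are hypothetical annual figures used purely for illustration; they are not estimates taken from the text or from Mehra and Prescott (1985), and the function name is an assumption.

```python
import math
import random

def risk_premium(returns, cons_growth, gamma):
    """Sample analogue of (7.18): -Cov(R, m) / E(m), with m = exp(-gamma * dlog c)."""
    m = [math.exp(-gamma * g) for g in cons_growth]
    n = len(m)
    mean_m = sum(m) / n
    mean_r = sum(returns) / n
    cov = sum((r - mean_r) * (v - mean_m) for r, v in zip(returns, m)) / n
    return -cov / mean_m

# Hypothetical annual moments, for illustration only
mu_c, sd_c = 0.02, 0.036      # mean and s.d. of consumption growth
mu_r, sd_r = 0.07, 0.167      # mean and s.d. of stock returns
corr_rc = 0.4                 # correlation between returns and consumption growth

rng = random.Random(7)
cons_growth, returns = [], []
for _ in range(100_000):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    cons_growth.append(mu_c + sd_c * z1)
    returns.append(mu_r + sd_r * (corr_rc * z1 + math.sqrt(1.0 - corr_rc ** 2) * z2))

# Premium implied by (7.18) for 'plausible' and for very large risk aversion
premia = {gamma: risk_premium(returns, cons_growth, gamma) for gamma in (1, 2, 3, 30)}
```

Under joint normality the implied premium is approximately $\gamma\,\mathrm{Cov}(R_{t+1}, \Delta\ln c_{t+1})$, so with these assumed moments it stays below 1 per cent for $\gamma$ between 1 and 3, and a premium of several per cent requires a value of $\gamma$ of the order of 30.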
But even if the mean discrepancy between $E(R_{t+1} - r_t^f \mid \Omega_t)$ and $\lambda_t$ as given by (7.18) is
resolved, the differences in the higher moments of historical and theory-based risk premia are
likely to be important empirical issues of concern. It seems difficult to reconcile the high volatility
of excess returns with the low volatility of consumption growth that is observed historically.
(a) The weak form asserts that all price information is fully reflected in asset prices, in the
sense that current price changes cannot be predicted from past prices. This weak form
was also introduced in an unpublished paper by Roberts (1967).
(b) The semi-strong form requires asset price changes to fully reflect all publicly available
information and not only past prices.
(c) The strong form postulates that prices fully reflect information even if some investor
or group of investors have monopolistic access to some information.
Fama regarded the strong form version of the EMH as a benchmark against which the other
forms of market efficiency are to be compared. With respect to the weak form version he
concluded that the test results strongly support the hypothesis, and considered the various
departures documented as economically unimportant. He reached a similar conclusion with respect
to the semi-strong version of the hypothesis, although, as he noted, the empirical evidence
available at the time was rather limited and far less comprehensive than the evidence on
the weak version.
The three forms of the EMH represent different degrees to which public and private information
is revealed in transaction prices. It is difficult to reconcile all three versions with mainstream
asset pricing theory, and as we shall see in Section 7.7.1 a closer connection is needed
between market efficiency and the specification of the model economy that underlies it.
the forecast horizon. While the vast majority of these studies have looked at the US stock market,
an emerging literature has also considered the UK stock market. US studies include Balvers,
Cosimano, and MacDonald (1990), Breen, Glosten, and Jagannathan (1989), Campbell (1987),
Fama and French (1989), and more recently Ferson and Harvey (1993), Kandel and Stambaugh
(1996), Pesaran and Timmermann (1994, 1995). See Granger (1992) for a survey of the
methods and results in the literature. UK studies after 1991 include Clare, Thomas, and Wickens
(1994), Clare, Psaradakis, and Thomas (1995), Black and Fraser (1995), and Pesaran and
Timmermann (2000).
Theoretical advances over Samuelson’s seminal paper by Leroy (1973), Rubinstein (1976),
and Lucas (1978) also made it clear that in the case of risk-averse investors tests of predictability
of excess returns could not on their own confirm or falsify the EMH. The neoclassical theory
cast the EMH in the context of dynamic stochastic general equilibrium models and showed
that excess returns weighted by marginal utility could be predictable. Only under risk neutrality,
where marginal utility was constant, did the equilibrium condition imply the non-predictability
of excess returns.
As Fama (1991) noted in his second review, the test of the EMH involved a joint hypothesis:
market efficiency and the underlying equilibrium asset pricing model. He concluded that ‘Thus,
market efficiency per se is not testable’ (see p. 1575). This did not, however, mean that market
efficiency was not a useful concept; almost all areas of empirical economics are subject to the
joint hypotheses problem.
hindsight, and are unlikely to be repeated in real time. In this connection, the following
considerations would need to be borne in mind:
1. Investor rationality: it is assumed that investors are rational, in the sense that they correctly
update their beliefs when new information is available.
2. Arbitrage: individual investment decisions satisfy the arbitrage condition, and trade deci-
sions are made guided by the calculus of the subjective expected utility theory à la
Savage.
3. Collective rationality: differences in beliefs across investors cancel out in the market.
To illustrate how these premises interact, suppose that at the start of period (day, week, month)
$t$ there are $N_t$ traders (investors) that are involved in an act of arbitrage between a stock and a
safe (risk-free) asset. Denote the one-period holding returns on these two assets by $R_{t+1}$ and $r_t^f$,
respectively. Following a similar line of argument as in Section 7.6.2, the arbitrage condition for
trader $i$ is given by
$$\hat{E}_i(R_{t+1} - r_t^f \mid \Omega_{it}) = \lambda_{it} + \delta_{it},$$
where $\hat{E}_i(R_{t+1} - r_t^f \mid \Omega_{it})$ is his/her subjective expectations of the excess return, $R_{t+1} - r_t^f$, taken
with respect to the information set
$$\Omega_{it} = \Psi_{it} \cup \Phi_t,$$
where $\Phi_t$ is the component of the information which is publicly available, $\lambda_{it} > 0$ represents
trader’s risk premium, and $\delta_{it} > 0$ is her/his information and trading costs per unit of funds
invested. In the absence of information and trading costs, $\lambda_{it}$ can be characterized in terms of the
trader’s utility function, $u_i(c_{it})$, where $c_{it}$ is his/her real consumption expenditures during the
period $t$ to $t + 1$, and is given by
$$\lambda_{it} = \hat{E}_i(R_{t+1} - r_t^f \mid \Omega_{it}) = \frac{-\widehat{\mathrm{Cov}}_i(m_{i,t+1}, R_{t+1} \mid \Omega_{it})}{\hat{E}_i(m_{i,t+1} \mid \Omega_{it})},$$
where $\widehat{\mathrm{Cov}}_i(\cdot \mid \Omega_{it})$ is the subjective covariance operator conditional on the trader’s information
set, $\Omega_{it}$, $m_{i,t+1} = \beta_i u_i'(c_{i,t+1})/u_i'(c_{it})$, which is known as the ‘stochastic discount factor’, $u_i'(\cdot)$ is
the first derivative of the utility function, and $\beta_i$ is his/her discount factor.
The expected returns could differ across traders due to the differences in their perceived
conditional probability distribution function of $R_{t+1} - r_t^f$, the differences in their information sets,
$\Omega_{it}$, the differences in their risk preferences, and/or endowments. Under the rational
expectations hypothesis
$$\hat{E}_i(R_{t+1} - r_t^f \mid \Omega_{it}) = E(R_{t+1} - r_t^f \mid \Omega_{it}),$$
where $E(R_{t+1} - r_t^f \mid \Omega_{it})$ is the ‘true’ or ‘objective’ conditional expectations. Furthermore, in
this case
$$E\left[\hat{E}_i(R_{t+1} - r_t^f \mid \Omega_{it}) \mid \Phi_t\right] = E\left[E(R_{t+1} - r_t^f \mid \Omega_{it}) \mid \Phi_t\right],$$
and, since $\Phi_t \subseteq \Omega_{it}$, the law of iterated expectations gives
$E[E(R_{t+1} - r_t^f \mid \Omega_{it}) \mid \Phi_t] = E(R_{t+1} - r_t^f \mid \Phi_t)$.
Therefore, under the REH, taking expectations of the individual arbitrage conditions with respect
to the public information set yields
$$E(R_{t+1} - r_t^f \mid \Phi_t) = E(\lambda_{it} + \delta_{it} \mid \Phi_t),$$
which also implies that $E(\lambda_{it} + \delta_{it} \mid \Phi_t)$ must be the same across all $i$, or
$$E(R_{t+1} - r_t^f \mid \Phi_t) = E(\lambda_{it} + \delta_{it} \mid \Phi_t) = \rho_t, \quad \text{for all } i,$$
where $\rho_t$ is an average market measure of the combined risk premia and transaction costs. The
REH combined with perfect arbitrage ensures that different traders have the same expectations of
$\lambda_{it} + \delta_{it}$. Rationality and market discipline override individual differences in tastes, information
processing abilities and other transaction related costs and render the familiar representative
agent arbitrage condition:
$$E(R_{t+1} - r_t^f \mid \Phi_t) = \rho_t. \tag{7.19}$$
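$$\lambda_{it} = \lambda_t + \varepsilon_{it}, \quad E(\varepsilon_{it} \mid \Phi_t) = 0,$$
$$\delta_{it} = \delta_t + \upsilon_{it}, \quad E(\upsilon_{it} \mid \Phi_t) = 0,$$
where $\varepsilon_{it}$ and $\upsilon_{it}$ are distributed with mean zero independently of $\Phi_t$, and $\lambda_t$ and $\delta_t$ are known
functions of the publicly available information.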
Under this setting, the extent to which excess returns can be predicted will depend on the
existence of a historically stable relationship between the risk premium, λt , and the macro and
business cycle indicators such as changes in interest rates, dividends, and a number of other
indicators.
The rational expectations hypothesis is rather extreme and is unlikely to hold at all times
in all markets. Even if one assumes that in financial markets learning takes place reasonably fast,
there will still be periods of turmoil where market participants will be searching in the dark,
trying and experimenting with different models of $R_{t+1} - r_t^f$, often with marked departures from
the common rational outcomes, given by $E(R_{t+1} - r_t^f \mid \Phi_t)$.
Herding and correlated behaviour across some of the traders could also lead to further
departures from the equilibrium RE solution. In fact, the objective probability distribution of
$R_{t+1} - r_t^f$ might itself be affected by market transactions based on subjective estimates
$\hat{E}_i(R_{t+1} - r_t^f \mid \Omega_{it})$.
Market inefficiencies provide further sources of stock market predictability by introducing a
wedge between a ‘correct’ ex ante measure $E(R_{t+1} - r_t^f \mid \Phi_t)$, and its average estimate by market
participants, which we write as $\sum_{i=1}^{N_t} w_{it} \hat{E}_i(R_{t+1} - r_t^f \mid \Omega_{it})$, where $w_{it}$ is the market share of the
$i$th trader. Let
$$\bar{\xi}_{wt} = \sum_{i=1}^{N_t} w_{it} \hat{E}_i(R_{t+1} - r_t^f \mid \Omega_{it}) - E(R_{t+1} - r_t^f \mid \Phi_t),$$
and note that it can also be written as (since $\sum_{i=1}^{N_t} w_{it} = 1$)
$$\bar{\xi}_{wt} = \sum_{i=1}^{N_t} w_{it} \xi_{it}, \tag{7.20}$$
where
$$\xi_{it} = \hat{E}_i(R_{t+1} - r_t^f \mid \Omega_{it}) - E(R_{t+1} - r_t^f \mid \Phi_t), \tag{7.21}$$
measures the degree to which individual expectations differ from the correct (but unobservable)
expectations, $E(R_{t+1} - r_t^f \mid \Phi_t)$. A non-zero $\xi_{it}$ could arise from individual irrationality,
but not necessarily so. Rational individuals faced with an uncertain environment, costly
information and limitations on computing power could rationally arrive at their expectations of future
price changes that with hindsight differ from the correct ones.⁶ A non-zero $\xi_{it}$ could also arise
due to disparity of information across traders (including information asymmetries), and
heterogeneous priors due to model uncertainty or irrationality. Nevertheless, despite such individual
deviations, $\bar{\xi}_{wt}$, which measures the extent of market or collective inefficiency, could be quite
negligible. When $N_t$ is sufficiently large, individual ‘irrationality’ can cancel out at the level of
6 This is in line with the premise of the recent paper by Angeletos, Lorenzoni, and Pavan (2010) who maintain the axiom
of rationality, but allow for dispersed information and the possibility of information spillovers in the financial markets to
explain market inefficiencies.
$$\sum_{i=1}^{N_t} w_{it} \hat{E}_i(R_{t+1} - r_t^f \mid \Omega_{it}) \xrightarrow{\;q.m.\;} E(R_{t+1} - r_t^f \mid \Phi_t), \quad \text{as } N_t \to \infty.$$
In such periods the representative agent paradigm would be applicable, and predictability of
excess return will be governed solely by changes in business cycle conditions and other publicly
available information.8
However, in periods where traders’ individual expectations become strongly correlated (say as
the result of herding or common over-reactions to distressing news), ξ̄ wt need not be negligible
even in thick markets with many traders; and market inefficiencies and profitable opportunities
could prevail. Markets could also display inefficiencies without exploitable profitable opportu-
nities if ξ̄ wt is non-zero but there is no stable predictable relationship between ξ̄ wt and business
cycle or other variables that are observed publicly.
The evolution and composition of ξ̄ wt can also help in shedding light on possible bubbles
or crashes developing in asset markets. Bubbles tend to develop in the aftermath of technologi-
cal innovations that are commonly acknowledged to be important, but with uncertain outcomes.
The emerging common beliefs about the potential advantages of the new technology and the dif-
ficulties individual agents face in learning how to respond to the new investment opportunities
can further increase the gap between average market expectations of excess returns and the asso-
ciated objective rational expectations outcome. Similar circumstances can also prevail during a
crash phase of the bubble when traders tend to move in tandem trying to reduce their risk
exposures all at the same time. Therefore, one would expect the individual errors, $\xi_{it}$, to become
more correlated during bubbles and crashes, such that the average errors, $\bar{\xi}_{wt}$, are no longer
negligible. In contrast, at times of market calm the individual errors are likely to be weakly correlated,
with the representative agent rational expectations model being a reasonable approximation.
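The contrast between weakly and strongly correlated individual errors can be illustrated with a small simulation. The sketch below assumes, purely for illustration, equal market shares $w_{it} = 1/N_t$ and a one-factor structure $\xi_{it} = \kappa f_t + e_{it}$, where $f_t$ is a common (herding) component and $e_{it}$ is idiosyncratic; neither the factor structure nor the parameter values come from the text.

```python
import math
import random

def average_error_sd(n_traders, common_loading, n_periods=400, seed=0):
    """Standard deviation over time of the equally weighted average error,
    where xi_it = common_loading * f_t + e_it, with f_t and e_it independent N(0, 1)."""
    rng = random.Random(seed)
    averages = []
    for _ in range(n_periods):
        f_t = rng.gauss(0.0, 1.0)  # common (herding) component
        xi = [common_loading * f_t + rng.gauss(0.0, 1.0) for _ in range(n_traders)]
        averages.append(sum(xi) / n_traders)  # weights w_it = 1 / N_t
    mean = sum(averages) / n_periods
    return math.sqrt(sum((a - mean) ** 2 for a in averages) / n_periods)

# Market calm: purely idiosyncratic errors (hypothetical loading of zero)
calm = {n: average_error_sd(n, common_loading=0.0) for n in (10, 100, 1000)}
# Herding: errors share a common component (hypothetical loading of one)
herding = {n: average_error_sd(n, common_loading=1.0) for n in (10, 100, 1000)}
```

With idiosyncratic errors the dispersion of the average error shrinks roughly at the rate $1/\sqrt{N_t}$, whereas with a common component it remains close to the loading on $f_t$ however many traders there are.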
More formally, note that since $r_t^f$ and $P_t$ are known at time $t$, then
Also, to simplify the exposition, assume that the length of the period $t$ is sufficiently small so that
dividends are of secondary importance and
7 Concepts of weak and strong cross-sectional dependence are defined and discussed in Chudik, Pesaran, and Tosetti
(2011). See also Chapter 29.
8 The heterogeneity of expectations across traders can also help in explaining large trading volume observed in the finan-
cial markets, a feature which has proved difficult to explain in representative agent asset pricing models. But see Scheinkman
and Xiong (2003), who relate the occurrence of bubbles and crashes to changes in trading volume.
where $f_t = E[\Delta\ln(P_{t+1}) \mid \Phi_t]$ denotes the unobserved price change expectations. Individual
deviations, $\xi_{it}$, could then become strongly correlated if individual expectations
$\hat{E}_i[\Delta\ln(P_{t+1}) \mid \Omega_{it}]$ differ systematically from $f_t$. For example, suppose that
1. Do you believe the current price is (a) just right (in the sense that the price is in line with
market fundamentals), (b) above the fundamental price, or (c) below the fundamental
price?
2. Do you expect the market price next period to (a) stay about the level it is currently, (b)
fall, or (c) rise?
In cases where the market is equilibrating we would expect a close association between the
proportion of respondents who select 1a and 2a, 1b and 2b, and 1c and 2c. But in periods of
bubbles (crashes) one would expect a large proportion of respondents who select 1b (1c) to
also select 2c (2b).
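Responses to the two questions can be summarized by a simple cross-tabulation, as in the sketch below. The response data are entirely hypothetical and the function names are illustrative; the code only shows how the share of respondents choosing the combinations (1b, 2c) or (1c, 2b) could be tracked over time.

```python
from collections import Counter

def cross_tabulate(responses):
    """Share of respondents for each (question 1, question 2) answer pair."""
    counts = Counter(responses)
    total = len(responses)
    return {pair: counts[pair] / total for pair in sorted(counts)}

def disequilibrium_share(responses):
    """Share answering 'above fundamentals and expected to rise' (1b, 2c)
    or 'below fundamentals and expected to fall' (1c, 2b)."""
    hits = sum(1 for pair in responses if pair in {("b", "c"), ("c", "b")})
    return hits / len(responses)

# Hypothetical responses: (answer to question 1, answer to question 2)
equilibrating = [("a", "a")] * 50 + [("b", "b")] * 25 + [("c", "c")] * 20 + [("b", "c")] * 5
bubble = [("a", "a")] * 20 + [("b", "b")] * 15 + [("c", "c")] * 5 + [("b", "c")] * 60

table = cross_tabulate(bubble)
share_calm = disequilibrium_share(equilibrating)
share_bubble = disequilibrium_share(bubble)
```

In this made-up example the share rises from 5 per cent in the equilibrating sample to 60 per cent in the bubble sample.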
In situations where the equilibrating process is well established and commonly understood,
the second question is redundant. For example, if an individual states that the room temperature
is too high, it will be understood that he/she would prefer less heating. The same is not applicable
to financial markets and hence responses to both questions are needed for a better understanding
of the operations of the markets and their evolution over time.
Automated systems reduce, but do not eliminate the need for discretion in real time decision
making. There are many ways that automated systems can be designed and implemented. The
space of models over which to search is huge and is likely to expand over time. Different approx-
imation techniques such as genetic algorithms, simulated annealing and MCMC algorithms can
be used. There are also many theoretically valid model selection or model averaging procedures.
The challenge facing real time econometrics is to provide insight into many of these choices that
researchers face in the development of automated systems.
Return forecasts need to be incorporated in sound risk management systems. For this purpose
point forecasts are not sufficient and joint probability forecast densities of a large number of inter-
related asset returns will be required. Transaction and slippage costs need to be allowed for in the
derivation of trading rules. Slippage arises when long (short) orders, optimally derived based on
currently observed prices, are placed in rising (falling) markets. Slippage can be substantial, and
is in addition to the usual transactions costs.
Familiar risk measures such as the Sharpe ratio and the VaR are routinely used to monitor
and evaluate the potential of trading systems. But due to cash constraints (for margin calls,
etc.) it is large drawdowns that are most feared. Prominent recent examples are the downfall
of Long-Term Capital Management, which experienced substantial drawdowns in 1998 following the
Russian financial crisis, and the collapse of Lehman Brothers during the global financial crisis
of 2008.
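The difference between the Sharpe ratio and the maximum drawdown can be seen from a small example. The two return paths below are hypothetical and contain exactly the same returns in a different order; the function names are illustrative, and the Sharpe ratio is computed per period without annualization.

```python
import math

def max_drawdown(returns):
    """Largest peak-to-trough fall in cumulative wealth, as a fraction of the peak."""
    wealth, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        worst = max(worst, 1.0 - wealth / peak)
    return worst

def sharpe_ratio(excess_returns):
    """Mean excess return divided by its sample standard deviation (not annualized)."""
    n = len(excess_returns)
    mean = sum(excess_returns) / n
    var = sum((r - mean) ** 2 for r in excess_returns) / (n - 1)
    return mean / math.sqrt(var)

# Two hypothetical return paths with the same returns in a different order
steady = [0.10, -0.05, 0.10, -0.05, 0.10, -0.05]
clustered = [0.10, 0.10, 0.10, -0.05, -0.05, -0.05]

dd_steady, dd_clustered = max_drawdown(steady), max_drawdown(clustered)
sr_steady, sr_clustered = sharpe_ratio(steady), sharpe_ratio(clustered)
```

The two paths have the same Sharpe ratio, but the maximum drawdown is 5 per cent for the first and about 14.3 per cent for the second, where the losses arrive consecutively.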
Successful traders might not be (and usually are not) better at forecasting returns than many
others in the market. What they have is a sense of ‘big’ opportunities when they are confident of
making a ‘kill’.
– limits to rational expectations (for an early treatment see Pesaran (1987c); see also the
recent paper on survey expectations by Pesaran and Weale (2006)).
– limits to arbitrage due to liquidity requirements and institutional constraints.
– herding and correlated behaviour with noise traders entering markets during bull periods
and deserting during bear periods.
Departures from the EMH listed above are addressed by behavioural finance, complexity the-
ory, and the Adaptive Markets Hypothesis recently advocated by Lo (2004). Some of the recent
developments in behavioural finance are reviewed in Barberis and Thaler (2003). Farmer and
Lo (1999) focus on recent research that views the financial markets from a biological perspec-
tive and, specifically, within an evolutionary framework in which markets, instruments, institu-
tions, and investors interact and evolve dynamically according to the ‘law’ of economic selection.
Under this view, financial agents compete and adapt, but they do not necessarily do so in an opti-
mal fashion.
Special care should also be exercised in evaluation of return predictability and trading rules.
To minimize the effects of hindsight in such analysis recursive modelling techniques discussed
in Pesaran and Timmermann (1995, 2000, 2005a) seem much more appropriate than the return
regressions on a fixed set of regressors/factors that are estimated ex post on historical data.
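The distinction between real time and ex post analysis can be sketched as follows. The code re-estimates a one-predictor return regression on an expanding window and forms each forecast using only data available at the time, and then compares it with fitted values from a single regression estimated on the whole sample. It is a deliberately simplified illustration on simulated data: the predictor is fixed rather than re-selected at each point in time, so it covers only the recursive estimation element and not the model selection stage of the recursive modelling procedures cited above. All names and parameter values are assumptions.

```python
import random

def ols(y, x):
    """Intercept and slope from an OLS regression of y on a constant and x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sxx
    return my - b * mx, b

def recursive_forecasts(y, x, first_window):
    """One-step-ahead forecasts of y[t] using only pairs (x[s], y[s]) with s < t.
    Here y[s] stands for the excess return realized after x[s] was observed."""
    forecasts = []
    for t in range(first_window, len(y)):
        a, b = ols(y[:t], x[:t])       # estimated on data available in real time
        forecasts.append(a + b * x[t])
    return forecasts

def mse(forecasts, actual):
    """Mean squared forecast error."""
    return sum((f - v) ** 2 for f, v in zip(forecasts, actual)) / len(forecasts)

rng = random.Random(3)
T, first_window = 300, 60
x = [rng.gauss(0.0, 1.0) for _ in range(T)]                     # hypothetical predictor
y = [0.2 + 0.5 * x[t] + rng.gauss(0.0, 1.0) for t in range(T)]  # simulated excess returns

real_time = recursive_forecasts(y, x, first_window)
a_full, b_full = ols(y, x)                                      # ex post, uses hindsight
ex_post = [a_full + b_full * x[t] for t in range(first_window, T)]

mse_real_time = mse(real_time, y[first_window:])
mse_ex_post = mse(ex_post, y[first_window:])
```

Comparing `mse_real_time` with `mse_ex_post` gives a rough indication of how much of the apparent fit of a return regression depends on parameter estimates that would not have been available to an investor at the time.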
7.11 Exercises
1. The file FUTURESDATA.fit, provided in Microfit 5, contains daily returns on a number of
equity index futures, currencies and government bonds. Use this data set to compute skew-
ness and kurtosis coefficients for daily returns on different assets over the periods before
and after 2000. Examine if your results are qualitatively affected by which sub-period is
considered.
2. The file UKUS.fit, provided in Microfit 5, contains monthly observations on the UK and US
economies. Using the available data, investigate the extent to which stock markets in the UK and
the US could have been predicted during the 1990s.
$$P_t = \sum_{j=1}^{\infty} \left(\frac{1}{1 + r}\right)^{j} E(D_{t+j} \mid \Omega_t),$$
where
Show that
$$E(D_{t+j} \mid \Omega_t) = \exp\left[\delta(1 - \rho^{j})\right] \exp\left[\rho^{j} \log(D_t)\right] \exp\left[\frac{0.5\,\sigma_u^2 (1 - \rho^{2j})}{1 - \rho^2}\right],$$
and hence or otherwise derive the price equation for this process and establish conditions
under which the price equation exists. In particular, consider the case where $\rho = 1$.
(c) How would you go about testing the validity of the above price equations?
4. Consider an investor who wishes to allocate the fractions $w_t = (w_{t1}, w_{t2}, \dots, w_{tn})'$ of his/her
wealth at time $t$ to $n$ risky assets and the remainder to the risk-free asset.
where $r_{t+1}$ is an $n \times 1$ vector of returns on the risky assets, $r_f$ is the return on the safe
asset, and $\tau$ is an $n$-dimensional vector of ones.
(b) Suppose that $E(r_{t+1} \mid I_t) = \mu_t$ and $\mathrm{Var}(r_{t+1} \mid I_t) = \Sigma_t$, where $I_t$ is an information set
that contains $r_t$ and its lagged values. Derive $w_t$ such that $\mathrm{Var}(\rho_{t+1} \mid I_t)$ is minimized
subject to $E(\rho_{t+1} \mid I_t) = \bar{\mu}_\rho > 0$.
(c) Under the same assumptions as above, now derive $w_t$ such that $E(\rho_{t+1} \mid I_t)$ is maximized
subject to $\mathrm{Var}(\rho_{t+1} \mid I_t) = \bar{\sigma}_\rho^2 > 0$.
(d) Compare your answers under (b) and (c) and discuss the results in the light of the investor’s
degree of risk aversion.
Part II
Statistical Theory
8 Asymptotic Theory
8.1 Introduction
Most econometric methods used in applied economics, particularly in time series econometrics,
are asymptotic in the sense that they are likely to hold only when the sample size is 'large enough'.
In this chapter we briefly review the different concepts of asymptotic convergence used in
mathematical statistics and discuss their applications to econometric problems.
and when $x$ is a fixed constant it is referred to as the probability limit of $x_t$, written as $\operatorname{Plim}(x_t) = x$,
as $t \to \infty$. The above concept is readily extended to multivariate cases where $\{x_t, t = 1, 2, \ldots\}$
denote $m$-dimensional vectors of random variables. Condition (8.1) should now be replaced by
\[ \lim_{t \to \infty} \Pr\left( \lVert x_t - x \rVert < \epsilon \right) = 1, \quad \text{for every } \epsilon > 0, \]
where $\lVert \cdot \rVert$ denotes an appropriate norm measuring the discrepancy between $x_t$ and $x$. Using the
Euclidean norm we have $\lVert z \rVert = \left( \sum_{i=1}^{m} z_i^{2} \right)^{1/2}$, where $z = (z_1, z_2, \ldots, z_m)'$.
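Example 13 Suppose $x_T$ is normally distributed with mean $\mu + k/T$ and variance $\sigma_T^{2} = \sigma^{2}/T > 0$.
Show that $\{x_T\}$ converges in probability to $\mu$, a fixed constant. Here we show the convergence of $x_T$
to $\mu$ by obtaining $\Pr(|x_T - \mu| < \epsilon)$ directly. However, as will become clear later, the result can be
established much more easily using general results on convergence in probability. Since
$x_T - \mu \sim N\left( k/T, \sigma^{2}/T \right)$ it is easily seen that
\[ \Pr(|x_T - \mu| < \epsilon) = \Phi\left( \frac{\epsilon - k/T}{\sigma/\sqrt{T}} \right) - \Phi\left( \frac{-\epsilon - k/T}{\sigma/\sqrt{T}} \right), \tag{8.2} \]
where $\Phi(\cdot)$ represents the cumulative distribution function of a standard normal variate. But
\[ \lim_{T \to \infty} \Phi\left( \frac{\epsilon - k/T}{\sigma/\sqrt{T}} \right) = \lim_{T \to \infty} \Phi\left( \frac{\sqrt{T}\, \epsilon}{\sigma} \right) = 1, \quad \text{for any } \epsilon > 0, \]
and
\[ \lim_{T \to \infty} \Phi\left( \frac{-\epsilon - k/T}{\sigma/\sqrt{T}} \right) = \lim_{T \to \infty} \Phi\left( \frac{-\sqrt{T}\, \epsilon}{\sigma} \right) = 0, \quad \text{for any } \epsilon > 0. \]
Therefore
\[ \lim_{T \to \infty} \Pr(|x_T - \mu| < \epsilon) = 1, \]
as required. For a given value of $\epsilon$, the rate of convergence of $x_T$ to $\mu$ depends on $k$, $\sigma$ and the
shape of the distribution function $\Phi(\cdot)$. The larger the value of $\sigma$, the slower will be the rate of
convergence of $x_T$ to $\mu$.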
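The probability in (8.2) can be evaluated numerically to see how quickly it approaches one. The following is a minimal sketch, not part of the original text: the function names and the values of $\epsilon$, $k$ and $\sigma$ are hypothetical choices made only for illustration, and the normal distribution function is computed from the error function in the Python standard library.

```python
import math


def std_normal_cdf(z):
    """Standard normal distribution function, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_within(T, eps, k, sigma):
    """Pr(|x_T - mu| < eps) as in (8.2), when x_T - mu ~ N(k/T, sigma^2/T)."""
    scale = sigma / math.sqrt(T)
    upper = (eps - k / T) / scale
    lower = (-eps - k / T) / scale
    return std_normal_cdf(upper) - std_normal_cdf(lower)


# Hypothetical values, chosen only for illustration.
eps, k = 0.1, 2.0
for sigma in (1.0, 3.0):
    probs = [prob_within(T, eps, k, sigma) for T in (10, 100, 1000, 10000)]
    print("sigma =", sigma, [round(p, 4) for p in probs])
```

For each value of $\sigma$ the printed probabilities rise towards one as $T$ grows, and they do so more slowly for the larger $\sigma$, in line with the remark that closes the example.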
This is often written as $x_t \overset{wp1}{\to} x$ (or $x_t \overset{as}{\to} x$). An equivalent condition for convergence with
probability 1 is given by
The equivalence of conditions (8.3) and (8.4) is proved by Halmos (1950) and clearly shows
that the concept of convergence in probability defined by (8.1) is a special case of (8.4) (setting
m = t in (8.4) delivers (8.1)). But as we shall see below, the reverse is not necessarily true.
The concept of convergence with probability 1 is stronger than convergence in probability and
is often referred to as the ‘strong convergence’ as compared to convergence in probability which
is referred to as ‘weak convergence’.
\[ \lim_{t \to \infty} E\, |x_t - x|^{s} = 0, \tag{8.5} \]
and it is written as $x_t \overset{s\text{-th}}{\to} x$.
or
\[ E\, |x_t - x|^{s} \geq \left[ E\, |x_t - x|^{r} \right]^{s/r}, \quad \text{for } s > r > 0, \]
which is also known as Lyapunov's inequality (see, e.g., Billingsley (1999)). Taking limits of both
sides of this inequality, and assuming that $x_t \overset{s\text{-th}}{\to} x$, we have
\[ \lim_{t \to \infty} \left[ E\, |x_t - x|^{r} \right]^{s/r} \leq \lim_{t \to \infty} E\, |x_t - x|^{s} = 0, \]
and therefore $x_t \overset{s\text{-th}}{\to} x$ implies $x_t \overset{r\text{-th}}{\to} x$, for $s > r > 0$.
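\[ \Pr(|x_t - x| > \epsilon) \leq \frac{1}{\epsilon^{2}} E(x_t - x)^{2}, \tag{8.6} \]
and for any fixed $\epsilon$, taking limits of both sides yields
\[ \lim_{t \to \infty} \Pr(|x_t - x| > \epsilon) \leq \frac{1}{\epsilon^{2}} \lim_{t \to \infty} E(x_t - x)^{2}. \]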
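Therefore, we have the following result:
Theorem 2 Convergence in quadratic mean implies convergence in probability, i.e.,
$x_t \overset{q.m.}{\to} x \Longrightarrow x_t \overset{p}{\to} x$. More generally,
\[ x_t \overset{s\text{-th}}{\to} x \Longrightarrow x_t \overset{p}{\to} x, \quad \text{for any } s > 0. \]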
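where $I(|z_t| > \epsilon)$ is the indicator function, taking the value of unity if $|z_t| > \epsilon$, and zero
otherwise. Since $|z_t|$ is non-negative, we have
\[ E\, |z_t|^{s} \geq E\left[ |z_t|^{s} I(|z_t| > \epsilon) \right] = \int_{|z_t| > \epsilon} |z_t|^{s} f(z_t)\, dz_t \geq \epsilon^{s} \int_{|z_t| > \epsilon} f(z_t)\, dz_t = \epsilon^{s} \Pr\{ |z_t| > \epsilon \}. \]
Hence
\[ \Pr\{ |z_t| > \epsilon \} \leq \frac{E\, |z_t|^{s}}{\epsilon^{s}}, \quad s > 0, \tag{8.7} \]
or
\[ \Pr\{ |x_t - x| > \epsilon \} \leq \frac{E\, |x_t - x|^{s}}{\epsilon^{s}}, \quad s > 0. \tag{8.8} \]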
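Inequality (8.7) can be checked exactly for any discrete distribution. The sketch below is an illustration only: the distribution `dist`, the helper names and the grid of $\epsilon$ and $s$ values are hypothetical, and exact rational arithmetic is used so that no rounding is involved.

```python
from fractions import Fraction

# Hypothetical discrete distribution for z: value -> probability.
dist = {
    -3: Fraction(1, 10),
    -1: Fraction(3, 10),
    0: Fraction(2, 10),
    2: Fraction(3, 10),
    4: Fraction(1, 10),
}


def abs_moment(dist, s):
    """E|z|^s for a discrete distribution (s a positive integer here)."""
    return sum(p * abs(z) ** s for z, p in dist.items())


def tail_prob(dist, eps):
    """Pr(|z| > eps)."""
    return sum(p for z, p in dist.items() if abs(z) > eps)


def markov_bound(dist, eps, s):
    """Right-hand side of (8.7): E|z|^s / eps^s."""
    return abs_moment(dist, s) / Fraction(eps) ** s


for eps in (1, 2, 3):
    for s in (1, 2, 4):
        print(eps, s, tail_prob(dist, eps), markov_bound(dist, eps, s))
```

Each printed tail probability is no larger than the bound next to it; which moment order $s$ gives the tightest bound depends on $\epsilon$ and on the distribution.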
Notice, however, as the following example demonstrates, convergence in probability does not
necessarily imply convergence in s-th mean.
As $t \to \infty$, the probability that $x_t = 0$ tends to unity, and hence $x_t \overset{p}{\to} 0$. However, as $t \to \infty$
\[ E\, |x_t|^{s} = \frac{t^{s}}{\log t} \to \infty, \quad \text{for } s > 0, \]
which contradicts the necessary condition for $x_t$ to converge to zero in $s$-th mean.
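The contrast between the two modes of convergence can be tabulated directly. The sketch below assumes a two-point design in which $x_t = t$ with probability $1/\log t$ and $x_t = 0$ otherwise; this design is an assumption chosen to be consistent with the moment formula quoted above, and the function names are hypothetical.

```python
import math


def prob_nonzero(t):
    """Pr(x_t = t) = 1/log(t) under the assumed two-point design (t > e)."""
    return 1.0 / math.log(t)


def abs_moment(t, s):
    """E|x_t|^s = t^s / log(t) under the same assumed design."""
    return t ** s / math.log(t)


for t in (10, 10**3, 10**6, 10**9):
    print(t, round(prob_nonzero(t), 4), round(abs_moment(t, 0.5), 2))
```

The probability of a non-zero value shrinks (slowly) as $t$ grows, while the moment of order $s = 0.5$ keeps increasing, which is the sense in which convergence in probability fails to deliver convergence in $s$-th mean here.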
The relationship between convergence with probability 1 and convergence in quadratic mean
is more complex, and involves additional conditions. A useful result is provided in the following
theorem:
\[ \sum_{t=1}^{\infty} E\left( x_t - c \right)^{2} < \infty, \tag{8.9} \]
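Condition (8.9) is not necessary for $x_t$ to converge to $x$ with probability 1. In the case of
Example 13, it is easily seen that
\[ E\left( x_t - \mu \right)^{2} = \frac{\sigma^{2}}{t} + \frac{k^{2}}{t^{2}}, \tag{8.10} \]
and hence $x_t \overset{q.m.}{\to} \mu$, which in turn implies that $x_t \overset{p}{\to} \mu$. Using (8.2) we have
\[ \Pr(|x_m - \mu| < \epsilon) = \Phi\left( \frac{\sqrt{m}\, \epsilon - k/\sqrt{m}}{\sigma} \right) - \Phi\left( \frac{-\sqrt{m}\, \epsilon - k/\sqrt{m}}{\sigma} \right), \]
and it readily follows that for all $m \geq t$, $\Pr(|x_m - \mu| < \epsilon)$ tends to unity as $t \to \infty$, and by
condition (8.4) then $x_t \overset{wp1}{\to} \mu$. But, using (8.10), we have
\[ \sum_{t=1}^{T} E\left( x_t - \mu \right)^{2} = \sigma^{2} \left( 1 + \frac{1}{2} + \frac{1}{3} + \ldots + \frac{1}{T} \right) + k^{2} \left( 1 + \frac{1}{2^{2}} + \frac{1}{3^{2}} + \ldots + \frac{1}{T^{2}} \right). \]
The sequence $1 + \frac{1}{2} + \frac{1}{3} + \ldots + \frac{1}{T}$ diverges since
\[ 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \ldots > 1 + \frac{1}{2} + \left( \frac{1}{4} + \frac{1}{4} \right) + \left( \frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8} \right) + \ldots = 1 + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \ldots \]
It is clear that the condition $\sum_{t=1}^{\infty} E\left( x_t - c \right)^{2} < \infty$ is not satisfied in the present example.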
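The failure of condition (8.9) in this example is easy to see numerically: the harmonic part of the sum keeps growing (roughly like $\log T$) while the part involving $1/t^{2}$ settles down. The following sketch is illustrative only; the function name and the values of $\sigma^{2}$ and $k$ are hypothetical.

```python
def sum_second_moments(T, sigma2, k):
    """Partial sum of E(x_t - mu)^2 = sigma2/t + k^2/t^2 over t = 1..T, as in (8.10).

    Returns the total together with its harmonic and inverse-square components.
    """
    harmonic = sum(1.0 / t for t in range(1, T + 1))
    squares = sum(1.0 / t**2 for t in range(1, T + 1))
    return sigma2 * harmonic + k**2 * squares, harmonic, squares


# Hypothetical parameter values, for illustration only.
for T in (10, 10**3, 10**5):
    total, harmonic, squares = sum_second_moments(T, sigma2=1.0, k=2.0)
    print(T, round(total, 4), round(harmonic, 4), round(squares, 4))
```

The last column stays below $\pi^{2}/6 \approx 1.645$, whereas the harmonic column increases without bound, so the total cannot converge.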
Convergence in distribution is usually denoted by $x_t \overset{d}{\to} x$, $x_t \overset{L}{\to} x$, $x_t \overset{a}{\sim} x$, or $F_t \Longrightarrow F$.
Definition 5 If Ft ⇒ F, and F is continuous, then the convergence is said to be uniform, that is:
Theorem 4 Let ϕ t (θ ) and ϕ (θ ) be the characteristic functions associated with the distribution func-
tions Ft (·) and F (·) , respectively. The following statements are equivalent:
(i) $F_t \Longrightarrow F$ or $x_t \overset{d}{\to} x$.
For a proof of the above theorem see Rao (1973) and Serfling (1980). An important applica-
tion of the above theorem is given by the following lemma.
Proof Let xt = at x + bt and denote its distribution function and the associated characteristic
function by Ft and ϕ t (θ ) , respectively. From the properties of characteristic functions we
have
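\[ \varphi_t(\theta) = E\left( e^{i\theta x_t} \right) = E\left( e^{i\theta (a_t x + b_t)} \right) = e^{i\theta b_t} E\left( e^{i\theta a_t x} \right). \]
Further
\[ \lim_{t \to \infty} \varphi_t(\theta) = e^{i\theta b} E\left( e^{i\theta a x} \right), \]
which is the characteristic function of $ax + b$, and consequently $a_t x + b_t \overset{d}{\to} ax + b$, and
$F_t(a_t x + b_t) \Longrightarrow F(ax + b)$.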
Theorem 6 Let $(x_t, y_t)$, $t = 1, 2, \ldots$ be a sequence of pairs of random variables with $y_t \overset{d}{\to} y$, and
$|y_t - x_t| \overset{p}{\to} 0$. Then the limiting distribution of $x_t$ exists and is the same as that of $y$, that is
$x_t \overset{d}{\to} y$.
Proof Let $F_t$ be the distribution function of $x_t$, and $F_y$ be the distribution function of $y$. Set $z_t =
y_t - x_t$, and let $u$ be a continuity point of $F_y(\cdot)$. Then
\[ \begin{aligned} F_t(u) &= \Pr(x_t < u) = \Pr(y_t < u + z_t) \\ &= \Pr(y_t < u + z_t,\ z_t < \epsilon) + \Pr(y_t < u + z_t,\ z_t \geq \epsilon), \end{aligned} \tag{8.11} \]
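and
\[ \Pr(y_t < u + z_t,\ z_t \geq \epsilon) = \Pr(z_t \geq \epsilon) \Pr(y_t < u + z_t \mid z_t \geq \epsilon) \leq \Pr(z_t \geq \epsilon). \]
Hence
\[ F_t(u) \leq \Pr(y_t < u + \epsilon) + \Pr(z_t \geq \epsilon), \]
and
\[ \lim_{t \to \infty} F_t(u) \leq \lim_{t \to \infty} \Pr(y_t < u + \epsilon) + \lim_{t \to \infty} \Pr(z_t \geq \epsilon). \]
But given that $z_t \overset{p}{\to} 0$ by assumption, the second limit on the right-hand side of this inequality
is equal to zero, and since $y_t \overset{d}{\to} y$, we have
\[ \lim_{t \to \infty} F_t(u) \leq F_y(u + \epsilon). \]
A similar argument gives
\[ \lim_{t \to \infty} F_t(u) \geq F_y(u - \epsilon), \]
and since $\epsilon$ is arbitrary and $u$ is a continuity point of $F_y(\cdot)$, $\lim_{t \to \infty} F_t(u) = F_y(u)$,
as required.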
Theorem 7 If $x_t \overset{d}{\to} x$ and $y_t \overset{p}{\to} c$, where $c$ is a finite constant, then
(i) $x_t + y_t \overset{d}{\to} x + c$.
(ii) $y_t x_t \overset{d}{\to} cx$.
(iii) $x_t / y_t \overset{d}{\to} x/c$, if $c \neq 0$.
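Proof
(i) By assumption, we have $y_t - c = (x_t + y_t) - (x_t + c) \overset{p}{\to} 0$. Therefore, from Theorem 6
$x_t + y_t$ and $x_t + c$ have the same limiting distribution. But by assumption we also have
$x_t + c \overset{d}{\to} x + c$. Hence it also follows that $x_t + y_t \overset{d}{\to} x + c$.
(ii) Let $z_t = x_t (y_t - c)$, and for arbitrary positive constants $\epsilon$ and $\delta$, consider
\[ \begin{aligned} \Pr(|z_t| > \epsilon) &= \Pr\left( |x_t|\, |y_t - c| > \epsilon,\ |y_t - c| < \frac{\epsilon}{\delta} \right) + \Pr\left( |x_t|\, |y_t - c| > \epsilon,\ |y_t - c| \geq \frac{\epsilon}{\delta} \right) \\ &\leq \Pr(|x_t| \geq \delta) + \Pr\left( |y_t - c| \geq \frac{\epsilon}{\delta} \right). \end{aligned} \]
For any fixed $\delta$, taking limits of both sides of the above inequality, and noting that by
assumption $y_t \overset{p}{\to} c$ and $x_t \overset{d}{\to} x$, we have
\[ \lim_{t \to \infty} \Pr(|z_t| > \epsilon) \leq \lim_{t \to \infty} \Pr(|x_t| \geq \delta). \]
But $\delta$ is arbitrary and hence $\Pr(|x_t| > \delta)$ can be made as small as desired by choosing a
large enough value for $\delta$. Therefore $\lim_{t \to \infty} \Pr(|z_t| > \epsilon) = 0$ and $z_t \overset{p}{\to} 0$. Hence by
Theorem 6, $x_t y_t$ and $c x_t$ will have the same asymptotic distribution given by the distribution
of $cx$.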
(iii) The proof is similar to that given above for (ii).
The above results readily extend to the multivariate case where xt is a vector of random vari-
ables. In this connection the following theorem is particularly useful.
Theorem 8 Let $x_t = (x_{1t}, x_{2t}, \ldots, x_{mt})'$ be a sequence of $m \times 1$ vectors of random variables and
suppose that
\[ \lambda' x_t \overset{d}{\to} \lambda' x, \]
where $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_m)'$ is an arbitrary vector of fixed constants. Then the limiting
distribution of $x_t$ exists and is given by the limiting distribution of $x$.
Proof $\lambda' x_t \overset{d}{\to} \lambda' x$ implies that $\lambda' x_t$ and $\lambda' x$ have the same characteristic function (see
Theorem 4). Denote the characteristic functions of $x_t$ and $x$ by $\varphi_t(\theta_1, \theta_2, \ldots, \theta_m)$ and
$\varphi(\theta_1, \theta_2, \ldots, \theta_m)$, respectively. Then the characteristic functions of $\lambda' x_t$ and $\lambda' x$ are given by
$\varphi_t(\lambda_1 \theta_1, \lambda_2 \theta_2, \ldots, \lambda_m \theta_m)$ and $\varphi(\lambda_1 \theta_1, \lambda_2 \theta_2, \ldots, \lambda_m \theta_m)$, and as $t \to \infty$ by assumption
\[ \varphi_t(\lambda_1 \theta_1, \lambda_2 \theta_2, \ldots, \lambda_m \theta_m) \to \varphi(\lambda_1 \theta_1, \lambda_2 \theta_2, \ldots, \lambda_m \theta_m), \]
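\[ \varphi_t(\theta_1, \theta_2, \ldots, \theta_m) \to \varphi(\theta_1, \theta_2, \ldots, \theta_m), \]
(i) $x_t \overset{wp1}{\to} x \Rightarrow g(x_t) \overset{wp1}{\to} g(x)$
(ii) $x_t \overset{p}{\to} x \Rightarrow g(x_t) \overset{p}{\to} g(x)$
(iii) $x_t \overset{d}{\to} x \Rightarrow g(x_t) \overset{d}{\to} g(x)$
(iv) $x_t - y_t \overset{p}{\to} 0$ and $y_t \overset{d}{\to} y \Rightarrow g(x_t) - g(y) \overset{d}{\to} 0$.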
For a proof see, for example, Serfling (1980) and Rao (1973).
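Example 15
(a) Suppose that $x_t \overset{p}{\to} c$, then $\lambda' x_t \overset{p}{\to} \lambda' c$.
(b) If $x_t \overset{d}{\to} N(0, I_m)$, then $x_t' M x_t \overset{d}{\to} \chi_s^{2}$.
(c) If $x_t \overset{d}{\to} N(0, 1)$, then $x_t^{2} \overset{d}{\to} \chi_1^{2}$.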
Definition 6 Let $\{a_t\}$ be a sequence of positive numbers and $\{x_t\}$ be a sequence of random variables.
Then
(i) $x_t = O_p(a_t)$, or $x_t / a_t$ is bounded in probability, if, for each $\epsilon > 0$ there exist real numbers $M$
and $N$ such that
\[ \Pr\left( \frac{|x_t|}{a_t} > M \right) < \epsilon, \quad \text{for } t > N. \tag{8.12} \]
(ii) $x_t = o_p(a_t)$, if
\[ \frac{x_t}{a_t} \overset{p}{\to} 0. \]
The above definition can be generalized for two sequences of random variables $\{x_t\}$ and $\{y_t\}$.
The notation $x_t = O_p(y_t)$ denotes that the sequence $x_t / y_t$ is $O_p(1)$. Also $x_t = o_p(y_t)$ means
that $x_t / y_t \overset{p}{\to} 0$. See, for example, Bierens (2005, p. 157).
One important use of the stochastic order notation is in the Taylor series expansion of functions
of random variables. Let $x_t - c = o_p(a_t)$, where $a_t \to 0$ as $t \to \infty$, and assume that $g(x)$
has a $k$th order Taylor series expansion at $c$, namely
\[ g(x) = G_k(x, c) + o_p\left( |x - c|^{k} \right). \]
Then we have
\[ g(x_t) = G_k(x_t, c) + o_p\left( a_t^{k} \right). \tag{8.13} \]
The proof is very simple and follows immediately from the fact that if $x_t - c = o_p(a_t)$, then
$|x_t - c|^{k} = o_p\left( a_t^{k} \right)$.
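Theorem 10 (Khinchine) Suppose that $\{x_t\}$ is a sequence of IID random variables with constant
mean; i.e., $E(x_t) = \mu < \infty$. Then
\[ \bar{x}_T = \frac{\sum_{t=1}^{T} x_t}{T} \overset{p}{\to} \mu. \]
Proof Denote the characteristic function (c.f.) of $x_t$ by $\varphi_x(\theta)$. Since $x_t$ are IID, the c.f. of $\bar{x}_T$ is
given by
\[ \varphi_T(\theta) = \left[ \varphi_x\left( \frac{\theta}{T} \right) \right]^{T}. \]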
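and
\[ \log\left[ \varphi_T(\theta) \right] = T \log\left[ \varphi_x\left( \frac{\theta}{T} \right) \right] = T \left[ \frac{i \mu \theta}{T} + o\left( \frac{\theta}{T} \right) \right] = i \mu \theta + o(\theta), \]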
This theorem represents the weak law of large numbers (WLLN) for independent random
variables, and only requires that the mean of xt exists.
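A small simulation gives a feel for the weak law. The sketch below is illustrative and not part of the original text: it assumes IID exponential draws with mean 2 (a hypothetical choice), uses the standard library generator with a fixed seed, and the function name is an assumption.

```python
import random


def sample_mean(T, rng):
    """Mean of T IID exponential draws with mean 2 (rate 0.5); a hypothetical design."""
    return sum(rng.expovariate(0.5) for _ in range(T)) / T


rng = random.Random(12345)  # fixed seed so that the run is reproducible
for T in (10, 1000, 100000):
    print(T, round(sample_mean(T, rng), 4))
```

The printed sample means typically move closer to the population mean of 2 as $T$ increases, although any single run only illustrates the result and does not prove it.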
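and hence $\lim_{T \to \infty} V(\bar{y}_T) = 0$, if $\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \sigma_t^{2} < \infty$. Therefore, by Theorem 2,
$\bar{y}_T \overset{p}{\to} 0$, or $\bar{x}_T - \bar{\mu}_T \overset{p}{\to} 0$.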
Theorem 10 and Theorem 11 give different conditions for the weak convergence of the sums
of random variables. The strong forms of the law of large numbers are given by the following
theorems:
Theorem 12 (Kolmogorov) Let $\{x_t\}$ be a sequence of independent random variables with $E(x_t) =
\mu_t < \infty$ and $V(x_t) = \sigma_t^{2}$, such that
\[ \sum_{t=1}^{\infty} \frac{\sigma_t^{2}}{t^{2}} < \infty. \tag{8.14} \]
Then $\bar{x}_T - \bar{\mu}_T \overset{wp1}{\to} 0$. If the independence assumption is replaced by lack of correlation (i.e.
$\operatorname{cov}(x_t, x_s) = 0$, $t \neq s$), the convergence of $\bar{x}_T - \bar{\mu}_T$ with probability one requires the stronger
condition
\[ \sum_{t=1}^{\infty} \frac{\sigma_t^{2} (\log t)^{2}}{t^{2}} < \infty. \tag{8.15} \]
For a proof see Rao (1973), where other forms of the law are also discussed.
Another version of the strong law of large numbers, which is more relevant in econometric
applications, is given in Theorem 13.
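Theorem 13 Suppose that $x_1, x_2, \ldots$ are independent random variables, and that $E(x_i) = 0$,
$E(x_i^{4}) \leq K$, $\forall i$, where $K$ is an arbitrary positive constant. Then $\bar{x}_T$ converges to zero with
probability 1.
Proof We have
\[ E\left( \bar{x}_T^{4} \right) = \frac{1}{T^{4}} E\left[ \left( \sum_{i=1}^{T} x_i \right)^{4} \right] = \frac{1}{T^{4}} E\left( \sum_{i=1}^{T} x_i^{4} + 6 \sum_{i<j} x_i^{2} x_j^{2} \right), \]
since the remaining cross-product terms have zero expectations under independence and $E(x_i) = 0$. Also
\[ \left[ E\left( x_i^{2} \right) \right]^{1/2} \leq \left[ E\left( x_i^{4} \right) \right]^{1/4}. \]
Therefore
\[ E\left( \bar{x}_T^{4} \right) \leq \frac{1}{T^{4}} \left[ TK + 3T(T - 1)K \right] \leq 3K T^{-2}, \]
and
\[ \sum_{T=1}^{\infty} E\left( \bar{x}_T^{4} \right) \leq 3K \sum_{T=1}^{\infty} T^{-2} < \infty, \]
which establishes $\sum_{T=1}^{\infty} \bar{x}_T^{4} < \infty$, with probability 1.
It is also easily seen that if the zero-mean assumption is replaced by $E(x_i) = \mu < \infty$ in
the statement of the theorem, then the theorem still holds, with $\bar{x}_T \overset{wp1}{\to} \mu$ as its conclusion.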
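The two bounds used in the proof of Theorem 13 can be checked numerically. The sketch below is illustrative only: the function name and the value of $K$ are hypothetical, and the code simply evaluates $[TK + 3T(T-1)K]/T^{4}$, compares it with $3K/T^{2}$, and accumulates the partial sums.

```python
def fourth_moment_bound(T, K):
    """Upper bound on E(xbar_T^4) from the proof: [T*K + 3*T*(T-1)*K] / T^4."""
    return (T * K + 3 * T * (T - 1) * K) / T**4


K = 5.0  # hypothetical bound on the fourth moments
partial_sum = 0.0
for T in range(1, 100001):
    partial_sum += fourth_moment_bound(T, K)

# The partial sum stays below 3*K*(pi^2/6), the limit of 3*K*sum(T^-2).
print(partial_sum, 3 * K * 1.6449340668)
```

Because the bound is summable in $T$, the partial sums level off instead of growing, which is the step that delivers convergence with probability 1 in the proof.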
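Theorem 14 (Uniform strong law of large numbers) Let $\{x_t\}$ be a sequence of independent
random variables and suppose that the strong law of large numbers is satisfied, i.e.,
$\frac{\sum_{t=1}^{T} x_t}{T} \overset{wp1}{\to} x$.
Let $g(x, \theta)$ be a function continuous on $R \times \Theta$ where $R$ is the range of $x$ and $\theta$ lies in the compact
set $\Theta$. If $E\left[ \sup_{\theta \in \Theta} |g(x, \theta)| \right] < \infty$ then
\[ \lim_{T \to \infty} \sup_{\theta \in \Theta} \left| \frac{\sum_{t=1}^{T} g(x_t, \theta)}{T} - E\left[ g(x, \theta) \right] \right| = 0, \quad \text{with probability 1}. \]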
The above law is uniform since it happens for the supremum of the difference between the
average and the expectations. In other words, it holds at θ for which the difference is the greatest.
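Theorem 15 (Lindeberg and Lévy) Let $\{x_t\}$ be a sequence of IID random variables with $E(x_t) =
\mu$ and $V(x_t) = \sigma^{2}$. Then
\[ \frac{\sqrt{T} \left( \bar{x}_T - \mu \right)}{\sigma} \overset{d}{\to} N(0, 1). \tag{8.16} \]
Proof Let $\varphi_z(\theta)$ be the characteristic function of $z_t = x_t - \mu$, and let $\varphi_T(\theta)$ be the characteristic
function of $\sqrt{T}(\bar{x}_T - \mu)/\sigma$. Then using the independence of $x_t$, we have
\[ \varphi_T(\theta) = \left[ \varphi_z\left( \frac{\theta}{\sigma \sqrt{T}} \right) \right]^{T}. \]
Since $E(z_t) = 0$ and $E(z_t^{2}) = \sigma^{2}$, expanding $\varphi_z$ around zero gives
\[ \varphi_z\left( \frac{\theta}{\sigma \sqrt{T}} \right) = 1 - \frac{\theta^{2}}{2T} + o\left( \frac{\theta^{2}}{T} \right). \]
Therefore,
\[ \log\left[ \varphi_T(\theta) \right] = T \log\left[ 1 - \frac{\theta^{2}}{2T} + o\left( \frac{\theta^{2}}{T} \right) \right], \]
and
\[ \lim_{T \to \infty} \log\left[ \varphi_T(\theta) \right] = -\frac{\theta^{2}}{2}. \]
It follows that $\lim_{T \to \infty} \varphi_T(\theta) = e^{-\frac{1}{2} \theta^{2}}$, which is the c.f. of a standard normal variate, which
implies (8.16).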
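The convergence of the characteristic function can be traced exactly in a simple special case. The sketch below assumes, purely for illustration, that $x_t$ takes the values $+1$ and $-1$ with probability $1/2$ each, so that $\mu = 0$, $\sigma = 1$ and $\varphi_z(\theta) = \cos\theta$; the function names are hypothetical.

```python
import math


def cf_standardised_mean(theta, T):
    """c.f. of sqrt(T)*xbar_T when x_t = +1 or -1 with probability 1/2 each.

    In this assumed case phi_z(theta) = cos(theta), so that
    phi_T(theta) = cos(theta / sqrt(T)) ** T.
    """
    return math.cos(theta / math.sqrt(T)) ** T


def cf_standard_normal(theta):
    """c.f. of a standard normal variate, exp(-theta^2 / 2)."""
    return math.exp(-0.5 * theta**2)


theta = 1.5
for T in (10, 100, 1000, 10000):
    gap = abs(cf_standardised_mean(theta, T) - cf_standard_normal(theta))
    print(T, cf_standardised_mean(theta, T), gap)
```

The gap between the two characteristic functions shrinks as $T$ grows, which is the content of the limit $\log[\varphi_T(\theta)] \to -\theta^{2}/2$ for this particular distribution.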
Theorem 16 (Liapounov) Let $\{x_t\}$ be a sequence of independent random variables and assume
that the following moments of $x_t$ exist
\[ \begin{aligned} E(x_t) &= \mu_t, \\ E(x_t - \mu_t)^{2} &= \sigma_t^{2} > 0, \\ E(x_t - \mu_t)^{3} &= \alpha_t, \\ E\left| x_t - \mu_t \right|^{3} &= \beta_t, \end{aligned} \]
where
\[ B_T = \left( \sum_{t=1}^{T} \beta_t \right)^{1/3}, \quad C_T = \left( \sum_{t=1}^{T} \sigma_t^{2} \right)^{1/2}, \]
it follows that
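\[ \frac{\sqrt{T} \left( \bar{x}_T - \bar{\mu}_T \right)}{\bar{\sigma}_T} \overset{d}{\to} N(0, 1), \]
where $\bar{\mu}_T = \frac{\sum_{t=1}^{T} \mu_t}{T}$, and $\bar{\sigma}_T^{2} = \frac{1}{T} \sum_{t=1}^{T} \sigma_t^{2}$.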
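Theorem 17 (Lindeberg–Feller) Let $\{x_t\}$ be a sequence of independent random variables, and assume
that $E(x_t) = \mu_t$ and $V(x_t) = \sigma_t^{2} > 0$ exist. Then
\[ \frac{\sqrt{T} \left( \bar{x}_T - \bar{\mu}_T \right)}{\bar{\sigma}_T} \overset{d}{\to} N(0, 1), \]
and
\[ \lim_{T \to \infty} \max_{1 \leq t \leq T} \frac{1}{T} \frac{\sigma_t^{2}}{\bar{\sigma}_T^{2}} = 0, \tag{8.17} \]
if and only if, for every $\epsilon > 0$,
\[ \lim_{T \to \infty} \frac{1}{T \bar{\sigma}_T^{2}} \sum_{t=1}^{T} \int_{|x - \mu_t| > \epsilon \sqrt{T} \bar{\sigma}_T} \left( x - \mu_t \right)^{2} dF_t(x) = 0, \tag{8.18} \]
where $\bar{\mu}_T = \frac{\sum_{t=1}^{T} \mu_t}{T}$, $\bar{\sigma}_T^{2} = \frac{1}{T} \sum_{t=1}^{T} \sigma_t^{2}$, and $F_t(x)$ denotes the distribution function of $x_t$.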
Theorems 16 and 17 are proved in Gnedenko (1962) and Loève (1977) and cover the case
of independent but heterogeneously distributed random variables. They are particularly useful
in the case of cross-section observations. Condition (8.18), known as the Lindeberg condition, is,
however, difficult to verify in practice and the following limit theorem is often used instead.
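Theorem 18 Let $\{x_t\}$ be a sequence of independent random variables with $E(x_t) = \mu_t$, $\operatorname{Var}(x_t) =
\sigma_t^{2} > 0$ and $E\left| x_t - \mu_t \right|^{2+\delta} < \infty$, for some $\delta > 0$ and all $t$. If
\[ \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \sigma_t^{2} > 0, \tag{8.19} \]
then $\frac{\sqrt{T} \left( \bar{x}_T - \bar{\mu}_T \right)}{\bar{\sigma}_T} \overset{d}{\to} N(0, 1)$.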
as the uniform mixing coefficient. The strong mixing is weaker than the uniform mixing concept,
since
If A and B are independent then α (A, B ) = φ (A, B ) = 0. The converse is true in the case of
uniform mixing, while it is not true for strong mixing. See Davidson (1994, p. 206).
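where $\alpha(\cdot, \cdot)$ is given by (8.20). The sequence is said to be $\phi$-mixing (or uniform mixing) if
$\lim_{m \to \infty} \phi_m = 0$ with
\[ \phi_m = \sup_{t} \phi\left( \mathcal{F}_{-\infty}^{t}, \mathcal{F}_{t+m}^{\infty} \right), \]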
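\[ \operatorname{Cov}(x_t, x_{t+\tau}) \leq \rho_{\tau} \left[ \operatorname{Var}(x_t) \operatorname{Var}(x_{t+\tau}) \right]^{1/2}, \quad \text{for all } \tau > 0, \]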
Theorem 19 (Strong law for mixing processes) Let $\{x_t\}$ be an $\alpha$-mixing sequence of size $-r/(r-1)$
with $r > 1$, or a $\phi$-mixing sequence of size $-r/(2r - 1)$, with $r \geq 1$. If $E|x_t|^{r+\delta} < K < \infty$,
for some $\delta > 0$ and all $t$, then $\bar{x}_T - \bar{\mu}_T \overset{wp1}{\to} 0$.
Theorem 20 (Strong law for asymptotically uncorrelated processes) Let $\{x_t\}$ be a sequence
of random variables with asymptotically uncorrelated elements, and with means $E(x_t) = \mu_t$, and
variances $\operatorname{Var}(x_t) = \sigma_t^{2} < \infty$. Then $\bar{x}_T \overset{wp1}{\to} \bar{\mu}_T$.
See White (2000), page 53. Let, for example, $\{x_t\}$ be a covariance stationary process with
$E(x_t) = \mu$, and with autocovariances given by $\gamma(j) = E\left( x_t x_{t-j} \right)$. If autocovariances are
absolutely summable, namely if
\[ \sum_{j=0}^{\infty} \left| \gamma(j) \right| < \infty, \]
then from the above theorem it follows that $\bar{x}_T \overset{wp1}{\to} \mu$.
It is interesting to observe that Theorem 20, compared with Theorem 19, relaxes the depen-
dence restriction from asymptotic independence (mixing) to asymptotic uncorrelatedness. At
the same time, the moment requirements have been strengthened from requiring the existence
of moments of order r + δ (with r ≥ 1 and δ > 0), to requiring the existence of second-order
moments.
We now present some results for martingale difference sequences and for Lp -mixingales (see
Section 15.3). To this end, the following definition is helpful.
Definition 10 $\{x_t\}$ is said to be uniformly integrable if, for every $\epsilon > 0$, there exists a constant $M > 0$
such that
\[ E\left( |x_t|\, 1_{[|x_t| \geq M]} \right) < \epsilon, \tag{8.22} \]
The following theorems provide weak and strong laws of large numbers for martingale differ-
ence sequences and for mixingales.
Theorem 21 (Weak law for martingale differences) Let $\{x_t\}$ be a martingale difference sequence
with respect to the information set $\Omega_t$. If $\{x_t\}$ is uniformly integrable then $\bar{x}_T \overset{p}{\to} 0$.
Theorem 22 (Strong law for martingale differences) Let $\{x_t\}$ be a martingale difference sequence
with respect to the information set $\Omega_t$. If, for $1 \leq p \leq 2$, we have
\[ \sum_{t=1}^{\infty} E|x_t|^{p} / t^{p} < \infty, \]
then $\bar{x}_T \overset{wp1}{\to} 0$.
Theorem 23 (Weak law for $L_1$-mixingales) Let $\{x_t\}$ be an $L_1$-mixingale with respect to $\Omega_t$. If $\{x_t\}$
is uniformly integrable and there exists a choice for $\{c_t\}$ such that
\[ \lim_{T \to \infty} T^{-1} \sum_{t=1}^{T} c_t < \infty, \]
then $\bar{x}_T \overset{p}{\to} 0$.
See Davidson (1994), page 302, and Hamilton (1994), page 190.
Theorem 24 (Strong law for $L_p$-mixingales) Let $\{x_t\}$ be an $L_p$-mixingale with respect to $\Omega_t$ with
either (i) $p = 2$ of size $-1/2$, or (ii) $1 < p < 2$, of size $-1$; if there exists a choice for $\{c_t\}$ such
that
\[ \lim_{T \to \infty} T^{-1} \sum_{t=1}^{T} c_t < \infty, \]
then $\bar{x}_T \overset{wp1}{\to} 0$.
Theorem 25 (Strong law for $L_p$-mixingales) Let $\{x_t\}$ be an $L_p$-mixingale with respect to $\Omega_t$, with
$1 < p \leq 2$, of size $-\lambda$. If $c_t / a_t = O(t^{\alpha})$, with $\alpha < \min\left\{ -1/p,\ (1 - 1/p)\lambda - 1 \right\}$, then
$\bar{x}_T \overset{wp1}{\to} 0$.
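Theorem 26 Let $\{x_t\}$ be a sequence of random variables such that $E|x_t|^{r} < K < \infty$ for some
$r \geq 2$, and all $t$. If $\{x_t\}$ is $\alpha$-mixing of size $-r/(r - 2)$ or $\phi$-mixing of size $-r/[2(r - 1)]$, with
$r > 2$, and $\bar{\sigma}_T^{2} = \operatorname{Var}\left( T^{-1/2} \sum_{t=1}^{T} x_t \right) > 0$, then $\sqrt{T} \left( \bar{x}_T - \bar{\mu}_T \right) / \bar{\sigma}_T \overset{d}{\to} N(0, 1)$.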
See White (2000), Theorem 5.20. See also Corollary 3.2 in Wooldridge and White (1988),
and Theorem 4.2 in McLeish (1975a).
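Theorem 27 Let
\[ x_t = \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j}, \]
where $\varepsilon_t$ is a sequence of IID random variables with $E(\varepsilon_t) = 0$, and $E\left( \varepsilon_t^{2} \right) < \infty$. Assume
that $\sum_{j=0}^{\infty} \left| \psi_j \right| < \infty$. Then $\sqrt{T} \bar{x}_T \overset{d}{\to} N\left( 0, \sum_{j=-\infty}^{\infty} \gamma(j) \right)$, where $\gamma(j)$ is the $j$th order
autocovariance of $x_t$.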
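Theorem 28 Let $\{x_t\}$ be a martingale difference sequence with respect to the information set $\Omega_t$. Let
$\bar{\sigma}_T^{2} = \operatorname{Var}\left( \sqrt{T} \bar{x}_T \right) = \frac{1}{T} \sum_{t=1}^{T} \sigma_t^{2}$. If $E\left( |x_t|^{r} \right) < K < \infty$, $r > 2$ and for all $t$, and
\[ T^{-1} \sum_{t=1}^{T} x_t^{2} - \bar{\sigma}_T^{2} \overset{p}{\to} 0, \]
then $\sqrt{T} \bar{x}_T / \bar{\sigma}_T \overset{d}{\to} N(0, 1)$.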
where under standard classical assumptions it can be established that the OLS estimator of $\theta =
(\alpha, \lambda)'$, say $\hat{\theta}_T$, is asymptotically normally distributed with mean $\theta_0$ (the 'true' value of $\theta$), and a
covariance matrix $\sigma_T^{2} V$ where $\sigma_T^{2} \to 0$ as $T \to \infty$. But the parameter of interest is the long-run
response of $y_t$ to a unit change in $x_t$, namely $g(\theta) = \frac{\alpha}{1 - \lambda}$, and the asymptotic distribution of
$g\left( \hat{\theta}_T \right)$ is required. The following theorem is particularly useful for such problems:
Theorem 29 Suppose that $x_T = (x_{1T}, x_{2T}, \ldots, x_{mT})'$ is asymptotically distributed as $N\left( \mu, \sigma_T^{2} V \right)$,
where $V$ is a fixed matrix and the scalar constants $\sigma_T^{2} \to 0$ as $T \to \infty$. Let $g(x) =
(g_1(x), g_2(x), \ldots, g_p(x))'$, $x = (x_1, x_2, \ldots, x_m)'$, be a vector-valued function for which each
component $g_i(x)$ is real-valued, and with non-zero differentials at $x = \mu$ given by the $p \times m$ matrix
\[ G = \left[ \frac{\partial g_i(x)}{\partial x_j} \right]_{x = \mu}. \]
It follows that
\[ g(x_T) \overset{d}{\to} N\left( g(\mu), \sigma_T^{2}\, G V G' \right). \]
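Proof Since $\operatorname{Var}(x_T) = \sigma_T^{2} V$, and $\sigma_T^{2} \to 0$ as $T \to \infty$, it follows that $x_T \overset{p}{\to} \mu$, and
\[ x_T - \mu = o_p(1). \]
From the Taylor series approximation result for stochastic processes we have
\[ g(x_T) = g(\mu) + G\left( x_T - \mu \right) + z_T, \]
where $z_T = o_p\left( \lVert x_T - \mu \rVert \right)$. Now using Slutsky's convergence theorem (see Theorem 6),
$\frac{g(x_T) - g(\mu)}{\sigma_T}$ and $\frac{G(x_T - \mu)}{\sigma_T}$ will have the same limiting distribution if we show that
$\operatorname{Plim}_{T \to \infty} (z_T / \sigma_T) = 0$. But since $z_T = o_p\left( \lVert x_T - \mu \rVert \right) = o_p(1)$, then
\[ \operatorname{Plim}_{T \to \infty} \frac{z_T}{\lVert x_T - \mu \rVert} = 0, \]
or
\[ \operatorname{Plim}_{T \to \infty} \frac{z_T / \sigma_T}{\lVert x_T - \mu \rVert / \sigma_T} = 0. \]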
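Example 16 An application of this theorem to the dynamic regression model (8.23) yields the following
asymptotic distribution for the long-run response of $y_t$ with respect to a unit change in $x_t$:
\[ \frac{\hat{\alpha}_T}{1 - \hat{\lambda}_T} \overset{a}{\sim} N\left( \frac{\alpha}{1 - \lambda}, \frac{\sigma^{2}}{T}\, G V G' \right), \tag{8.24} \]
where
\[ V = \operatorname{Plim}_{T \to \infty} \begin{pmatrix} \dfrac{\sum_{t=1}^{T} x_t^{2}}{T} & \dfrac{\sum_{t=1}^{T} x_t y_{t-1}}{T} \\ \dfrac{\sum_{t=1}^{T} y_{t-1} x_t}{T} & \dfrac{\sum_{t=1}^{T} y_{t-1}^{2}}{T} \end{pmatrix}^{-1}, \tag{8.25} \]
and
\[ G = \left( \frac{1}{1 - \lambda}, \frac{\alpha}{(1 - \lambda)^{2}} \right), \tag{8.26} \]
assuming that $|\lambda| < 1$. As we shall see later in the case where $\{x_t\}$ is covariance-stationary, the
probability limits appearing in (8.25) exist and are finite.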
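The calculation behind (8.24) to (8.26) can be sketched in a few lines. The code below is a minimal illustration, not the author's procedure: the function names, the point estimates and the covariance matrix `cov_hat` (standing in for an estimate of $(\sigma^{2}/T) V$) are all hypothetical.

```python
import math


def long_run_response(alpha, lam):
    """g(theta) = alpha / (1 - lam), defined for |lam| < 1."""
    return alpha / (1.0 - lam)


def gradient(alpha, lam):
    """G in (8.26): derivatives of g with respect to alpha and lam."""
    return (1.0 / (1.0 - lam), alpha / (1.0 - lam) ** 2)


def delta_method_se(alpha, lam, cov):
    """Standard error of g(theta_hat): square root of G * cov * G'.

    cov is a 2x2 covariance matrix of (alpha_hat, lam_hat).
    """
    g1, g2 = gradient(alpha, lam)
    var = g1 * g1 * cov[0][0] + 2.0 * g1 * g2 * cov[0][1] + g2 * g2 * cov[1][1]
    return math.sqrt(var)


# Hypothetical estimates and covariance matrix, for illustration only.
alpha_hat, lam_hat = 0.3, 0.6
cov_hat = [[0.010, -0.004], [-0.004, 0.006]]
print(long_run_response(alpha_hat, lam_hat))
print(delta_method_se(alpha_hat, lam_hat, cov_hat))
```

Note that the second element of $G$ grows quickly as $\lambda$ approaches one, so the long-run response is estimated imprecisely when the dynamics are highly persistent.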
Theorem 29 can also be extended to cover cases where the first or even higher-order partial
derivatives of g(x) evaluated at μ vanish. This extension to the case where the first-order partial
derivatives of g(x) vanish at x = μ, will be stated without proof in the following theorem:
Theorem 30 (A generalization of Theorem 29). Suppose that the $m \times 1$ vector $x_T$ has an asymptotic
normal distribution $N\left( \mu, T^{-1} V \right)$. Let $g(x)$ be a vector real-valued function of order $p \times 1$
possessing continuous partials of order 2 in the neighbourhood of $x = \mu$, with all partial derivatives
of order 1 vanishing at $x = \mu$, but with the second-order partial derivatives not all vanishing at
$x = \mu$. Then
\[ T \left[ g(x_T) - g(\mu) \right] \overset{d}{\to} \frac{1}{2} z' Q' B Q z, \tag{8.27} \]
where $z = (z_1, z_2, \ldots, z_m)' \sim N(0, I_m)$, $V = QQ'$ and $B$ is an $m \times m$ matrix with $(i, j)$
elements
\[ B = \left[ \frac{\partial^{2} g(x)}{\partial x_i \partial x_j} \right]_{x = \mu}. \tag{8.28} \]
\[ y_t = \beta' x_t + u_t, \quad t = 1, 2, \ldots, T, \tag{8.29} \]
is consistent, if $T^{-1} \sum_{t=1}^{T} x_t x_t' \overset{p}{\to} \Sigma_{xx}$, where $\Sigma_{xx}$ is a nonsingular matrix, and
\[ T^{-1} \sum_{t=1}^{T} x_t u_t \overset{p}{\to} 0. \]
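\[ y = X\beta + u, \]
where $y$ and $u$ are $T \times 1$ vectors and $X$ is a $T \times k$ matrix of observations on $x_t$. Denoting the OLS
estimator of $\beta$ by $\hat{\beta}_T$, under (8.29) we have
\[ \hat{\beta}_T - \beta = \left( \sum_{t=1}^{T} x_t x_t' \right)^{-1} \sum_{t=1}^{T} x_t u_t = \left( X'X \right)^{-1} X'u = \left( \frac{X'X}{T} \right)^{-1} \left( \frac{X'u}{T} \right). \]
Since by assumption $T^{-1} X'X$ converges in probability to a nonsingular matrix, then by the
convergence Theorem 9, we have
\[ \operatorname{Plim}_{T \to \infty} \left( \hat{\beta}_T - \beta \right) = \left[ \operatorname{Plim}_{T \to \infty} \left( \frac{X'X}{T} \right) \right]^{-1} \operatorname{Plim}_{T \to \infty} \left( \frac{X'u}{T} \right) = \Sigma_{xx}^{-1} \operatorname{Plim}_{T \to \infty} \left( \frac{X'u}{T} \right), \]
and since by assumption $\frac{X'u}{T} \overset{p}{\to} 0$, then $\hat{\beta}_T - \beta \overset{p}{\to} 0$, and hence $\hat{\beta}_T$ is a consistent estimator
of $\beta$.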
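Example 18 Consider the regression model in the above example and suppose now that one of the
regressors is a time trend. It is clear that in this case the row and the column of $T^{-1} X'X$ associated
with the trended variable blow up as $T \to \infty$, and the convergence theorems cannot be
applied to $\hat{\beta}_T - \beta$ directly. Scaling $X'X$ by $T^{-3}$ does not resolve the problem either, since the terms
$T^{-3} \sum_{t=1}^{T} x_{it} x_{jt}$ converge to zero when at least one of the variables ($x_{it}$ or $x_{jt}$) is bounded.
Suppose that all the regressors are bounded except for $x_{kt}$ which is trended, i.e., $K_L \leq |x_{it}| \leq K_U$, for
$i = 1, 2, \ldots, k - 1$, and $x_{kt} = t$, for $t = 1, 2, \ldots, T$. Then $T^{-1} \sum_{t=1}^{T} x_{it} x_{kt}$, and $T^{-1} \sum_{t=1}^{T} x_{kt}^{2}$
blow up as $T \to \infty$, and while $T^{-3} \sum_{t=1}^{T} x_{kt}^{2} \to \frac{1}{3}$, the other terms in $T^{-3} \sum_{t=1}^{T} x_{it} x_{jt}$
converge to zero, which makes $\Sigma_{xx}$ a singular matrix. The problem lies in the fact that the OLS
estimators of the coefficients of the non-trended variables ($x_{1t}, x_{2t}, \ldots, x_{k-1,t}$) and that of the trended
variable, $x_{kt}$, converge to their 'true' values at different rates. To simplify the exposition let $k = 2$,
and suppose $x_{1t}$ is bounded (with coefficient $\beta_1$) and $x_{2t} = t$ (with coefficient $\beta_2$). We have
\[ \operatorname{Plim}_{T \to \infty} \frac{\sum_{t=1}^{T} x_{2t}^{2}}{T^{3}} = \lim_{T \to \infty} \frac{\sum_{t=1}^{T} t^{2}}{T^{3}} = \frac{1}{3}, \tag{8.30} \]
\[ \operatorname{Plim}_{T \to \infty} \frac{\sum_{t=1}^{T} x_{1t}^{2}}{T} < K_U^{2}, \tag{8.31} \]
\[ \operatorname{Plim}_{T \to \infty} \frac{\sum_{t=1}^{T} x_{1t} x_{2t}}{T^{2}} < K_U, \tag{8.32} \]
and since $\operatorname{Var}\left( T^{-2} \sum_{t=1}^{T} x_{2t} u_t \right) = T^{-4} \sum_{t=1}^{T} t^{2} \sigma^{2} \to 0$, as $T \to \infty$, then
\[ T^{-2} \sum_{t=1}^{T} x_{2t} u_t = T^{-2} \sum_{t=1}^{T} t u_t \overset{p}{\to} 0. \]
Similarly, it is easily established that $T^{-1} \sum_{t=1}^{T} x_{1t} u_t \overset{p}{\to} 0$. Consider now the following
expressions for the OLS estimators of $\beta_1$ and $\beta_2$
\[ \hat{\beta}_{1T} - \beta_1 = \frac{\left( \dfrac{\sum_{t=1}^{T} x_{2t}^{2}}{T^{3}} \right) \left( \dfrac{\sum_{t=1}^{T} x_{1t} u_t}{T} \right) - \left( \dfrac{\sum_{t=1}^{T} x_{1t} x_{2t}}{T^{2}} \right) \left( \dfrac{\sum_{t=1}^{T} x_{2t} u_t}{T^{2}} \right)}{\Delta_T}, \tag{8.33} \]
\[ T \left( \hat{\beta}_{2T} - \beta_2 \right) = \frac{\left( \dfrac{\sum_{t=1}^{T} x_{2t} u_t}{T^{2}} \right) \left( \dfrac{\sum_{t=1}^{T} x_{1t}^{2}}{T} \right) - \left( \dfrac{\sum_{t=1}^{T} x_{1t} u_t}{T} \right) \left( \dfrac{\sum_{t=1}^{T} x_{1t} x_{2t}}{T^{2}} \right)}{\Delta_T}, \tag{8.34} \]
where
\[ \Delta_T = \left( T^{-3} \sum_{t=1}^{T} x_{2t}^{2} \right) \left( T^{-1} \sum_{t=1}^{T} x_{1t}^{2} \right) - \left( T^{-2} \sum_{t=1}^{T} x_{1t} x_{2t} \right)^{2}. \]
Using the results (8.30)–(8.32) in (8.33) and (8.34) and noting that $T^{-1} \sum_{t=1}^{T} x_{1t} u_t$ and
$T^{-2} \sum_{t=1}^{T} x_{2t} u_t$ both converge in probability to zero, we have $\hat{\beta}_{1T} - \beta_1 \overset{p}{\to} 0$ and
$T \left( \hat{\beta}_{2T} - \beta_2 \right) \overset{p}{\to} 0$. It is also easily seen that
\[ \hat{\beta}_{1T} - \beta_1 = O_p\left( T^{-1/2} \right), \]
\[ \hat{\beta}_{2T} - \beta_2 = O_p\left( T^{-3/2} \right). \]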
Since the rate of convergence of β̂ 2T to β 2 is much faster than is usually encountered in the statistics
literature, β̂ 2T is said to be ‘super-consistent’ for β 2 , a concept which we shall encounter later in our
discussion of cointegration. See also Theorem 31 below.
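The different scalings used in this example can be illustrated numerically. The sketch below is an assumption-laden illustration: it takes $x_{2t} = t$ as in the text, and for the bounded regressor it uses the hypothetical choice $x_{1t} = 1$; the function names are not from the original.

```python
def scaled_trend_sum(T):
    """T^{-3} * sum_{t=1}^T t^2, which tends to 1/3 as in (8.30)."""
    return sum(t * t for t in range(1, T + 1)) / T**3


def scaled_bounded_sum(T):
    """T^{-3} * sum_{t=1}^T x_{1t}^2 for the bounded regressor x_{1t} = 1 (hypothetical)."""
    return T / T**3


for T in (10, 100, 1000, 10000):
    print(T, round(scaled_trend_sum(T), 6), scaled_bounded_sum(T))
```

The first column of results settles near $1/3$ while the second collapses to zero, which is why a single normalization of $X'X$ by a common power of $T$ yields a singular limit when trended and bounded regressors are mixed.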
\[ y_t = \alpha' z_t + \varepsilon_t \tag{8.35} \]
where $z_t$ is an $s \times 1$ vector of regressors, in general different from $x_t$, and $\varepsilon_t$ are IID random variables,
distributed independently of $x_t$ and $z_{t'}$ for all $t$ and $t'$. Namely, $z_t$ and $x_t$ are strictly exogenous in
the context of (8.35). But $\beta$ is still estimated using (8.29). We show that if $T^{-1} X'Z \overset{p}{\to} \Sigma_{xz}$ (a finite
$k \times s$ matrix) then $\hat{\beta}_T \overset{p}{\to} \Sigma_{xx}^{-1} \Sigma_{xz} \alpha$, which, in general, differs from $\alpha$. Under (8.35)
we have
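But $T^{-1} X'X \overset{p}{\to} \lim_{T \to \infty} T^{-1} E\left( X'X \right) = \Sigma_{xx}$, and hence $\operatorname{Var}\left( T^{-1} X'\varepsilon \right) \to 0$ as $T \to \infty$.
Therefore, $T^{-1} X'\varepsilon \overset{q.m.}{\to} 0$, which implies that $T^{-1} X'\varepsilon \overset{p}{\to} 0$ (see Theorem 2). Using these results
in (8.36) and taking advantage of the stochastic convergence Theorem 7 and Theorem 9, we have
\[ \operatorname{Plim}_{T \to \infty} \hat{\beta}_T = \Sigma_{xx}^{-1} \Sigma_{xz} \alpha + \Sigma_{xx}^{-1} \operatorname{Plim}_{T \to \infty} \left( \frac{X'\varepsilon}{T} \right), \]
and since $\operatorname{Plim}_{T \to \infty} \left( T^{-1} X'\varepsilon \right) = 0$, we have
\[ \operatorname{Plim}_{T \to \infty} \hat{\beta}_T = \beta^{*}(\alpha) = \Sigma_{xx}^{-1} \Sigma_{xz} \alpha. \]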
In the misspecification literature, $\beta^{*}(\alpha)$, or $\beta^{*}$ for short, is known as the pseudo-true value of $\hat{\beta}_T$,
and shows the explicit dependence of the probability limit of $\hat{\beta}_T$ on the parameters of the correctly
specified model. Theil (1957) and Griliches (1957) were the first to discuss the implications of the
above result in econometrics. In particular, they discussed the effect of incorrectly deleting or adding
regressors on the OLS estimators.
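The pseudo-true value can be illustrated with a scalar simulation. The sketch below is hypothetical throughout: it assumes a single regressor $x_t$ and a single true regressor $z_t = 0.5 x_t + v_t$ with standard normal $x_t$, $v_t$ and $\varepsilon_t$, so that $\Sigma_{xx}^{-1} \Sigma_{xz} = 0.5$ in the population; the function names and seed are illustrative choices.

```python
import random


def ols_slope(x, y):
    """OLS slope in a regression of y on x without an intercept."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def simulate(T, alpha, rng):
    """Generate y_t = alpha*z_t + eps_t, then regress y_t on x_t (scalar case)."""
    x = [rng.gauss(0.0, 1.0) for _ in range(T)]
    z = [0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]  # z is correlated with x
    y = [alpha * zi + rng.gauss(0.0, 1.0) for zi in z]
    beta_hat = ols_slope(x, y)
    beta_star = ols_slope(x, z) * alpha  # sample analogue of Sigma_xx^{-1} Sigma_xz alpha
    return beta_hat, beta_star


beta_hat, beta_star = simulate(50000, alpha=2.0, rng=random.Random(2015))
print(beta_hat, beta_star)
```

With $\alpha = 2$ the estimate from the misspecified regression should typically lie close to the pseudo-true value of $1$, not the true coefficient of $2$, which is the point of the example.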
Theorem 31 Consider the regression model in Example 17 and assume that $u_t$ is IID$(0, \sigma^{2})$ and
that
\[ \lim_{T \to \infty} \frac{\max_{1 \leq t \leq T} x_{it}^{2}}{\sum_{t=1}^{T} x_{it}^{2}} = 0, \quad i = 1, 2, \ldots, k, \tag{8.37} \]
then
\[ A_T \left( \hat{\beta}_T - \beta \right) \overset{d}{\to} N\left( 0, \sigma^{2} V^{-1} \right), \tag{8.38} \]
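where
\[ A_T = \begin{pmatrix} \left( \sum_{t=1}^{T} x_{1t}^{2} \right)^{1/2} & 0 & \ldots & 0 \\ 0 & \left( \sum_{t=1}^{T} x_{2t}^{2} \right)^{1/2} & \ldots & 0 \\ \vdots & \vdots & \ldots & \vdots \\ 0 & 0 & \ldots & \left( \sum_{t=1}^{T} x_{kt}^{2} \right)^{1/2} \end{pmatrix}, \tag{8.39} \]
and
\[ V = \lim_{T \to \infty} A_T^{-1} X'X A_T^{-1}, \tag{8.40} \]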
A proof of this theorem is given in Amemiya (1985), and makes use of the Lindeberg–Feller
central limit Theorem 17. What is interesting about this theorem is the fact that it accommodates
regression models containing both trended and non-trended variables. It is clear that when the
regressors are bounded the condition (8.37) is satisfied. For trended variables, say $x_{it} = t$, we
have $\max_{1 \leq t \leq T} x_{it}^{2} = T^{2}$, and
\[ \sum_{t=1}^{T} x_{it}^{2} = \frac{T(T + 1)(2T + 1)}{6}. \]
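Condition (8.37) can be inspected directly for particular regressors. The sketch below is illustrative only: the helper names are hypothetical, and the exponentially growing series is added as a contrasting case not discussed in the text, for which the ratio does not vanish.

```python
def max_share(x):
    """max_t x_t^2 / sum_t x_t^2, the ratio appearing in condition (8.37)."""
    squares = [v * v for v in x]
    return max(squares) / sum(squares)


def trend(T):
    """Linear time trend x_t = t, t = 1..T."""
    return list(range(1, T + 1))


for T in (10, 100, 1000, 10000):
    print(T, max_share(trend(T)))

# Contrasting (assumed) case: x_t = 2^t, where the last observation dominates.
print(max_share([2**t for t in range(1, 31)]))
```

For the linear trend the ratio equals $6T / [(T+1)(2T+1)]$ and shrinks at rate $1/T$, so no single observation dominates the sum of squares; for the exponentially growing series the ratio stays near $3/4$, so the condition fails.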
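\[ \hat{\beta}_{iT} - \beta_i = o_p\left[ \left( \frac{\sum_{t=1}^{T} x_{it}^{2}}{T} \right)^{-1/2} \right], \]
or
\[ \left( \sum_{t=1}^{T} x_{it}^{2} \right)^{1/2} \left( \hat{\beta}_{iT} - \beta_i \right) = O_p(1). \]
For bounded regressors $T^{-1} \sum_{t=1}^{T} x_{it}^{2} = O_p(1)$, and $\hat{\beta}_{iT} - \beta_i = o_p(1)$. When $x_{jt} = t$, then
$T^{-1} \sum_{t=1}^{T} x_{jt}^{2} = O(T^{2})$ and $\hat{\beta}_{jT} - \beta_j = o_p\left( T^{-1} \right)$, a result already demonstrated in Example
18. When $x_{lt} = t^{2}$, then $T^{-1} \sum_{t=1}^{T} x_{lt}^{2} = O(T^{4})$, and $\hat{\beta}_{lT} - \beta_l = o_p\left( T^{-2} \right)$, etc. Similarly, we
have $\hat{\beta}_{iT} - \beta_i = O_p\left( T^{-1/2} \right)$, $\hat{\beta}_{jT} - \beta_j = O_p\left( T^{-3/2} \right)$, and $\hat{\beta}_{lT} - \beta_l = O_p\left( T^{-5/2} \right)$.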
8.11 Exercises
1. Prove that, if $x_t \overset{d}{\to} x$, and $P(x = c) = 1$, where $c$ is a constant, then $x_t \overset{d}{\to} c$.
2. Show that if $x_t$ is bounded (i.e., $P(|x_t| \leq M) = 1$ for all $t$, and for some $M < \infty$), then
$x_t \overset{d}{\to} x$ implies $\lim_{t \to \infty} E(x_t) = E(x)$.
3. Let $\{X_t\}$ be a sequence of IID random variables with $E(X_t) = \mu$, $E(X_t^{2}) = 1$, third central
moment $E(X_t - \mu)^{3} = 0$, and fourth central moment $E(X_t - \mu)^{4} = 3$.
and discuss the estimation method that underlies the following estimate of $\mu$, based on
the $T$ observations $x_1, x_2, \ldots, x_T$
\[ \hat{\mu}_T = \left[ T^{-1} \sum_{t=1}^{T} \left( x_t^{2} - 1 \right) \right]^{1/2}. \]
where $\bar{\mu}_T$ lies on the line joining $\hat{\mu}_T$ to $\mu$. Hence, or otherwise, determine the asymptotic
distribution of $\sqrt{T} (\hat{\mu}_T - \mu)$. Discuss possible difficulties with your derivation when
$\mu = 0$.
4. Let $y_t$ be a sequence of random variables, where $y_t = 0$ with probability $1 - t^{-2}$, and $y_t = t$
with probability $t^{-2}$. Let $x_t = y_t - E(y_t)$. Verify whether the Lindeberg–Feller CLT holds
for $\{x_t\}$.
5. Let {xt } be a sequence of random variables with xt = ρxt−1 + εt , where εt is IID(0, σ 2 ) and
|ρ| < 1. Verify that {xt } is asymptotically uncorrelated (see Definition 9).
6. Consider the following regression
yt = α/t + ε t , for t = 1, 2 . . . , T,
where εt are IID random variables with mean zero and a constant variance. Show that the OLS
estimator of α need not be consistent.
9 Maximum Likelihood Estimation
9.1 Introduction
\[ L_T(\theta, x) = f(x_1, \ldots, x_T; \theta), \tag{9.1} \]
where $x = (x_1, x_2, \ldots, x_T)'$, and $f(x; \theta)$ represents the joint density function of the sample
$(x_1, x_2, \ldots, x_T)$. It is often convenient to work with the logarithm of the likelihood function, the
so-called log-likelihood function
\[ \ell_T(\theta) = \sum_{t=1}^{T} \log f(x_t, \theta). \tag{9.3} \]
The likelihood function gives the probability that a particular set of realizations of $x$, namely
$x_1, \ldots, x_T$, lie in the range $x$ and $x + \Delta x$. R. A. Fisher suggested estimating the unknown
parameters, $\theta$, by maximizing the likelihood function with respect to $\theta$. The value of $\theta$, say $\hat{\theta}_T$, at which
$\ell_T(\theta)$ is globally maximized is referred to as the maximum likelihood (ML) estimator of $\theta$.
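Example 20 (The Bernoulli distribution) Consider a random sample of size $T$ which is drawn
from the Bernoulli distribution
\[ f(x_t, \theta) = \theta^{x_t} (1 - \theta)^{1 - x_t}, \quad x_t \in \{0, 1\}, \]
where $\theta$ is a scalar parameter representing the probability of success (or failure). The sample values
$x_1, x_2, \ldots, x_T$ will be a sequence of 0 and 1. The log-likelihood function for this problem is given by
\[ \ell_T(\theta) = \left( \sum_{t=1}^{T} x_t \right) \ln(\theta) + \left( T - \sum_{t=1}^{T} x_t \right) \ln(1 - \theta). \]
The necessary condition for the maximization of the log-likelihood function is given by equating the
first derivative of $\ell_T(\theta)$ to zero
\[ \frac{\partial \ell_T(\theta)}{\partial \theta} = \frac{1}{\theta} \sum_{t=1}^{T} x_t - \frac{1}{1 - \theta} \left( T - \sum_{t=1}^{T} x_t \right) = \frac{1}{\theta (1 - \theta)} \left( \sum_{t=1}^{T} x_t - T\theta \right). \]
Hence, $\frac{\partial \ell_T(\theta)}{\partial \theta} = 0$ yields the ML estimator $\hat{\theta}_T = T^{-1} \sum_{t=1}^{T} x_t = \bar{x}_T$.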
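The closed-form solution can be confirmed by evaluating the log-likelihood on a grid. The sketch below is illustrative: the sample `x`, the grid and the function names are hypothetical, and the grid search stands in for a proper numerical optimizer.

```python
import math


def bernoulli_loglik(theta, x):
    """Log-likelihood: sum(x)*ln(theta) + (T - sum(x))*ln(1 - theta)."""
    s, T = sum(x), len(x)
    return s * math.log(theta) + (T - s) * math.log(1.0 - theta)


def score(theta, x):
    """First derivative of the log-likelihood with respect to theta."""
    s, T = sum(x), len(x)
    return (s - T * theta) / (theta * (1.0 - theta))


x = [1, 0, 0, 1, 1, 0, 1, 1, 0, 1]  # hypothetical sample: 6 successes in 10 draws
theta_hat = sum(x) / len(x)          # ML estimator: the sample mean
grid = [i / 100 for i in range(1, 100)]
best_on_grid = max(grid, key=lambda th: bernoulli_loglik(th, x))
print(theta_hat, best_on_grid, score(theta_hat, x))
```

The grid maximizer coincides with the sample mean and the score evaluated there is zero (up to rounding), as the first-order condition requires.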
Example 21 (Linear regression with normal errors) Consider the classical normal regression
model
$$y = X\beta + u, \tag{9.4}$$
$$u \mid X \sim N(0, \sigma^2 I_T), \tag{9.5}$$
$$\ell_T(\theta) = -\frac{T}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta), \tag{9.6}$$
where $\theta = (\beta', \sigma^2)'$. The maximization of $\ell_T(\theta)$ with respect to $\beta$ will be the same as the
minimization of the sum of squares of errors, $Q(\beta) = (y - X\beta)'(y - X\beta)$, with respect to $\beta$, and
establishes that in the context of this model the ML and OLS estimators of $\beta$ are algebraically the
same. We have
$$\frac{\partial \ell_T(\theta)}{\partial\beta} = \frac{1}{\sigma^2} X'(y - X\beta), \tag{9.7}$$
and similarly
$$\frac{\partial \ell_T(\theta)}{\partial\sigma^2} = -\frac{T}{2\sigma^2} + \frac{1}{2\sigma^4}(y - X\beta)'(y - X\beta). \tag{9.8}$$
Setting these derivatives equal to zero now yields the following ML estimators
$$\hat{\beta}_T = (X'X)^{-1}X'y, \tag{9.9}$$
$$\hat{\sigma}^2_T = \frac{(y - X\hat{\beta}_T)'(y - X\hat{\beta}_T)}{T}. \tag{9.10}$$
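As a rough numerical check of (9.7) to (9.10), the sketch below uses a simple regression with an intercept and one regressor, so that $X = [1, x]$ and the estimators have scalar closed forms. The data are hypothetical and the function names are illustrative; the point is only that the score vector vanishes at the OLS coefficients combined with the residual variance computed with divisor $T$.

```python
def ml_simple_regression(x, y):
    """ML (= OLS) estimates for y_t = b0 + b1*x_t + u_t with normal errors."""
    T = len(y)
    xbar, ybar = sum(x) / T, sum(y) / T
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return b0, b1, ssr / T  # ML variance estimator divides by T, not T - 2

def score(b0, b1, sigma2, x, y):
    """Score vector from (9.7) and (9.8), specialised to X = [1, x]."""
    T = len(y)
    u = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    d_b0 = sum(u) / sigma2
    d_b1 = sum(xi * ui for xi, ui in zip(x, u)) / sigma2
    d_s2 = -T / (2 * sigma2) + sum(ui * ui for ui in u) / (2 * sigma2 ** 2)
    return d_b0, d_b1, d_s2

# Hypothetical data (illustrative only).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.2]

b0_hat, b1_hat, s2_hat = ml_simple_regression(x, y)
print(b0_hat, b1_hat, s2_hat)
print(score(b0_hat, b1_hat, s2_hat, x, y))  # all three entries are numerically zero
```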
Suppose now $x_t$ and $y_t$ are jointly distributed with parameters $\theta = (\beta', \sigma^2, \gamma')'$, where $\gamma$
denotes the parameter vector of the probability distribution of $x_t$. Let $z_t = (y_t, x_t')'$. Then the
likelihood function of $\theta$ is given by the joint probability distribution of $z_1, z_2, \ldots, z_T$, namely
$$L_T(\theta) = f(z_1, z_2, \ldots, z_T, \theta) = \Pr(z_1 \mid \Omega_0, \theta)\Pr(z_2 \mid \Omega_1, \theta) \cdots \Pr(z_T \mid \Omega_{T-1}, \theta), \tag{9.12}$$
where $\Omega_t$, $t = 0, 1, \ldots$, is a sequence of non-decreasing $\sigma$-fields, containing at least observations
on current and past values of $z_t$.
In general, $\Pr(z_T \mid \Omega_{T-1})$ can be decomposed as
$$\Pr(z_T \mid \Omega_{T-1}, \theta) = \Pr(x_T \mid \Omega_{T-1}, \theta)\Pr(y_T \mid x_T; \Omega_{T-1}, \theta),$$
where $\Pr(x_T \mid \Omega_{T-1}, \theta)$ is known as the marginal density and $\Pr(y_T \mid x_T; \Omega_{T-1}, \theta)$ as the
conditional density. Suppose now that it is possible to write
$$\Pr(z_T \mid \Omega_{T-1}, \theta) = \Pr(x_T \mid \Omega_{T-1}, \gamma)\Pr(y_T \mid x_T; \Omega_{T-1}, \beta, \sigma^2). \tag{9.13}$$
When $\gamma$ does not depend on $\beta$ and $\sigma^2$, then we say that $x_t$ is weakly exogenous with respect to
the estimation of $\beta$ and $\sigma^2$. This decomposition, when it holds, allows us to ignore the marginal
density of $x$ in the ML estimation of $\beta$ and $\sigma^2$. The concept of weak exogeneity holds more
generally and has been discussed in detail by Engle, Hendry, and Richard (1983).
Under weak exogeneity, substituting (9.13) in (9.12) and taking logs we obtain
$$\ell_T(\beta, \sigma^2) = \sum_{t=1}^{T} \ln \Pr(y_t \mid x_t, \Omega_{t-1}, \beta, \sigma^2). \tag{9.14}$$
The probability density of $x_t$, which does not depend on $\beta$ and $\sigma^2$, is left out of the log-likelihood
function. Under (9.11) and conditional on $x_t$ we have
$$\Pr(y_t \mid x_t, \Omega_{t-1}, \beta, \sigma^2) = (2\pi\sigma^2)^{-1/2}\, e^{-\frac{1}{2\sigma^2}u_t^2}.$$
Summing the logarithms of these densities over $t$ gives
$$\ell_T(\beta, \sigma^2) = -\frac{T}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{t=1}^{T} u_t^2,$$
or
$$\ell_T(\beta, \sigma^2) = -\frac{T}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta).$$
Example 22 Consider the following model for xt
where $G$ and $\lambda$ are $k \times k$ and $k \times 1$ matrices of free coefficients (unrelated to $\beta$ and $\sigma^2$), and $v_t$ is
a k × 1 vector of disturbances. In this example, γ is defined in terms of G, λ and xx , which do not
depend on parameters of the conditional model ($\beta$ and $\sigma^2$), and the joint probability distribution
function of $z_t = (y_t, x_t')'$ decomposes as in (9.13) with $\gamma$ unrelated to $\beta$ and $\sigma^2$. If the weak
exogeneity condition
$$E(u_t v_t \mid \Omega_{t-1}) = 0$$
holds, we have
However, due to the feedback effect from yt−1 in (9.15), xt is not strictly exogenous, and in general
For xt to be strictly exogenous we need the additional restrictions that there are no lagged feedbacks
from y into x, namely we must also have λ = 0. With this additional set of restrictions it is now
easily seen that
namely under strict exogeneity ut is uncorrelated with past as well as future realizations of x. But
under weak exogeneity ut is only uncorrelated with current and past values of x.
In the above example, weak exogeneity is sufficient for asymptotic inference (as T → ∞),
but can lead to biased OLS estimators. For the OLS estimators to be unbiased strict exogeneity
is required. To see this note that:
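$$E(\hat{\beta}_T \mid x_1, x_2, \ldots, x_T) = \beta + E\left[\left(\sum_{t=1}^{T} x_t x_t'\right)^{-1}\sum_{t=1}^{T} x_t u_t \,\Big|\, x_1, x_2, \ldots, x_T\right] = \beta + \left(\sum_{t=1}^{T} x_t x_t'\right)^{-1}\sum_{t=1}^{T} x_t\, E(u_t \mid x_1, x_2, \ldots, x_T).$$
Hence
$$E(\hat{\beta}_T \mid x_1, x_2, \ldots, x_T) = \beta,$$
Under strict exogeneity the exact variance-covariance matrix of $\hat{\beta}_T$ can also be derived and is
given by
$$\operatorname{Var}(\hat{\beta}_T) = \sigma^2 E\left[(X'X)^{-1}\right],$$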
introduced in Chapter 6), and dynamic panel data models (the models described in
Chapter 27).
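A small Monte Carlo sketch can illustrate why strict exogeneity matters for unbiasedness. It assumes the simplest dynamic model, $y_t = \phi y_{t-1} + u_t$ with standard normal errors and no intercept, in which the regressor $y_{t-1}$ is uncorrelated with current and future errors but not with past ones. The model, the parameter values, and the function names are illustrative assumptions, not the author's example.

```python
import random

def ols_ar1_slope(y):
    """OLS estimate of phi in y_t = phi*y_{t-1} + u_t (no intercept)."""
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    return num / den

def mean_ols_estimate(phi, T, reps, rng):
    """Average OLS estimate of phi over `reps` simulated samples of length T."""
    total = 0.0
    for _ in range(reps):
        # Start from the stationary distribution N(0, 1/(1 - phi^2)).
        y = [rng.gauss(0.0, (1.0 / (1.0 - phi ** 2)) ** 0.5)]
        for _step in range(T):
            y.append(phi * y[-1] + rng.gauss(0.0, 1.0))
        total += ols_ar1_slope(y)
    return total / reps

rng = random.Random(12345)
phi = 0.8
bias_small = mean_ols_estimate(phi, T=20, reps=2000, rng=rng) - phi
bias_large = mean_ols_estimate(phi, T=200, reps=500, rng=rng) - phi
print(bias_small, bias_large)  # downward bias that shrinks as T grows
```

In this design the OLS estimator is biased downwards in short samples, while the bias becomes negligible as $T$ increases, which is in line with weak exogeneity being sufficient for asymptotic inference only.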
Assumption 2 RC2 For $\theta_0 \in \Theta$, there exist functions $G(x)$, $H(x)$ and $K(x)$ (possibly depending
on $\theta_0$) such that for $\theta$ in the neighbourhood of $\theta_0$, the inequalities
Assumption RC1 ensures that $\partial \log f(x, \theta)/\partial\theta$ has a Taylor series expansion as a function of $\theta$.
The derivative $\partial \log f(x, \theta)/\partial\theta$ is known as the score vector. Assumption RC2 allows differentiation
of $\int f(x, \theta)\,dx$ and $\int \frac{\partial \log f(x, \theta)}{\partial\theta}\,dx$ with respect to $\theta$ under the integral sign. That is, it permits
the order of differentiation and integration to be interchanged. Finally, assumption RC3 requires
that $\partial \log f(x, \theta)/\partial\theta$ has a finite variance.
Theorem 32 (Score vector) Under the regularity conditions RC1 to RC3, the score vector
$d(\theta) = \partial \log f(x, \theta)/\partial\theta$ has mean zero and a finite variance.
Taking the partial derivatives of both sides of this relation with respect to $\theta$ (and noting that
the regularity condition RC2 allows the order of differentiation and integration to be
interchanged), we have
$$\int \frac{\partial f(x, \theta)}{\partial\theta}\,dx = 0,$$
$$\int \frac{1}{f(x, \theta)}\frac{\partial f(x, \theta)}{\partial\theta} f(x, \theta)\,dx = 0,$$
$$\int \frac{\partial \log f(x, \theta)}{\partial\theta} f(x, \theta)\,dx = 0, \tag{9.17}$$
which may also be written as $\int d(\theta) f(x, \theta)\,dx = 0$, or simply $E[d(\theta)] = 0$, where the
expectations are taken with respect to the density function $f(x, \theta)$. The variance of the score
function is given by
$$\operatorname{Var}[d(\theta)] = E[d(\theta)d(\theta)'] = E\left[\frac{\partial \log f(x, \theta)}{\partial\theta} \cdot \frac{\partial \log f(x, \theta)}{\partial\theta'}\right],$$
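which is finite by assumption RC3. Taking partial derivatives of (9.17) with respect to $\theta'$ we
have:
$$\int \left[\frac{\partial^2 \log f(x, \theta)}{\partial\theta\,\partial\theta'} + \frac{\partial \log f(x, \theta)}{\partial\theta} \cdot \frac{\partial \log f(x, \theta)}{\partial\theta'}\right] f(x, \theta)\,dx = 0,$$
or
$$E[d(\theta)d(\theta)'] = E\left[-\frac{\partial^2 \log f(x, \theta)}{\partial\theta\,\partial\theta'}\right] = B(\theta). \tag{9.18}$$
Expression (9.18) is known as Fisher's information matrix. $E[d(\theta)d(\theta)'] = B(\theta)$ is known
as the outer-product form and $E\left[-\frac{\partial^2 \log f(x, \theta)}{\partial\theta\,\partial\theta'}\right] = A(\theta)$ as the inner product form of the
information matrix. Notice that these forms are equal only under the assumption that $f(x, \theta)$ is the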
density function with respect to which expectations are taken. In the misspecified case where the
density function of x is not the same as f (x, θ ), the two forms of the information matrix need
not be the same.
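The equality of the two forms of the information matrix, and its failure under misspecification, can be illustrated with a scalar example. The sketch below assumes a Poisson model with mean $\theta$, for which the score of one observation is $x/\theta - 1$ and the negative second derivative is $x/\theta^2$; expectations are computed by direct summation rather than simulation. The Poisson model, the two-point alternative distribution, and all names are illustrative assumptions.

```python
import math

def poisson_pmf(x, theta):
    """Poisson probability mass function, computed on the log scale."""
    return math.exp(-theta + x * math.log(theta) - math.lgamma(x + 1))

def score(x, theta):
    """Score of one Poisson observation: d log f / d theta."""
    return x / theta - 1.0

def neg_hessian(x, theta):
    """Minus the second derivative of log f with respect to theta."""
    return x / theta ** 2

def expect(g, dist):
    """Expectation of g(x) under a discrete distribution [(x, prob), ...]."""
    return sum(p * g(x) for x, p in dist)

theta = 2.5

# Correctly specified case: data really are Poisson(theta).
# The support is truncated at 200, where the omitted tail mass is negligible.
poisson = [(x, poisson_pmf(x, theta)) for x in range(200)]
mean_score = expect(lambda x: score(x, theta), poisson)
outer = expect(lambda x: score(x, theta) ** 2, poisson)
inner = expect(lambda x: neg_hessian(x, theta), poisson)

# Misspecified case: true distribution puts mass 1/2 on 0 and 1/2 on 5
# (same mean 2.5, but variance 6.25 instead of 2.5).
two_point = [(0, 0.5), (5, 0.5)]
outer_mis = expect(lambda x: score(x, theta) ** 2, two_point)
inner_mis = expect(lambda x: neg_hessian(x, theta), two_point)

print(mean_score, outer, inner)   # approximately 0, 0.4, 0.4
print(outer_mis, inner_mis)       # 1.0 versus 0.4: the two forms differ
```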
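$$B_T(\theta) = T\,B(\theta),$$
where
$$B(\theta) = E\left[-\frac{\partial^2 \log f(x, \theta)}{\partial\theta\,\partial\theta'}\right].$$
where $f(x, \theta)$ represents the joint density function of $x = (x_1, x_2, \ldots, x_T)'$. Taking partial
derivatives of both sides of (9.20) (and noting that $\tilde{\theta}_T$ is dependent only on $x$) we have
$$\int \tilde{\theta}_T \frac{\partial \ell_T(\theta)}{\partial\theta'} f(x, \theta)\,dx = I_p,$$
where $I_p$ is an identity matrix of order $p$. But since from Theorem 32, $E\left[\partial \ell_T(\theta)/\partial\theta\right] = 0$, the
above relation can also be written as
$$\operatorname{Cov}\left(\tilde{\theta}_T, \frac{\partial \ell_T(\theta)}{\partial\theta}\right) = I_p. \tag{9.21}$$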
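1 This inequality can be easily derived by first minimizing $\operatorname{Var}\left[\lambda'\tilde{\theta}_T + \mu'\frac{\partial \ell_T(\theta)}{\partial\theta}\right]$ with respect to $\mu$ for fixed $\lambda$, and then noting
that the minimized value of this variance is non-negative. $\lambda$ and $\mu$ are vectors of constants.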
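$$\operatorname{Var}(\tilde{\theta}_T) - \operatorname{Cov}\left(\tilde{\theta}_T, \frac{\partial \ell_T(\theta)}{\partial\theta}\right)\left[\operatorname{Var}\left(\frac{\partial \ell_T(\theta)}{\partial\theta}\right)\right]^{-1}\operatorname{Cov}\left(\frac{\partial \ell_T(\theta)}{\partial\theta}, \tilde{\theta}_T\right) \geq 0. \tag{9.22}$$
$$\operatorname{Var}(\tilde{\theta}_T) - \left[\operatorname{Var}\left(\frac{\partial \ell_T(\theta)}{\partial\theta}\right)\right]^{-1} \geq 0.$$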
In the following section, we will show that, under the regularity conditions set out above, the
ML estimators are asymptotically efficient and achieve the C–R lower bound.
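A case in which the Cramer-Rao lower bound is attained exactly in finite samples is the Bernoulli model of Example 20, where the sample mean is unbiased and the information in a sample of size $T$ is $T/[\theta(1-\theta)]$. The sketch below computes the exact variance of the sample mean from the binomial distribution and compares it with the inverse of the information; the parameter values are illustrative assumptions.

```python
import math

def binom_pmf(k, n, p):
    """Probability of k successes in n independent Bernoulli(p) trials."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

T, theta = 20, 0.3

# Exact mean and variance of the sample mean k/T, with k ~ Binomial(T, theta).
mean_xbar = sum((k / T) * binom_pmf(k, T, theta) for k in range(T + 1))
var_xbar = sum((k / T - mean_xbar) ** 2 * binom_pmf(k, T, theta) for k in range(T + 1))

# Fisher information of the whole sample and the implied lower bound.
info_T = T / (theta * (1.0 - theta))
cr_bound = 1.0 / info_T

print(mean_xbar, var_xbar, cr_bound)  # unbiased, and variance equals the bound
```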
Theorem 34 (Consistency of ML estimators) Under the regularity conditions RC1 to RC3, the
ML estimator of $\theta$, namely $\hat{\theta}_T$, converges in probability to $\theta_0$, the true value of $\theta$ under $f(x, \theta)$, as
$T \to \infty$.
Proof In the IID case, using the law of large numbers due to Khinchine (Theorem 10), it is easily
seen that the average log-likelihood function $T^{-1}\ell_T(\theta) = T^{-1}\sum_{t=1}^{T} \log f(x_t, \theta)$ converges
in probability to $E[\log f(x_t, \theta)]$, where expectations are taken under $f(x_t, \theta_0)$; that is
$$\frac{\ell_T(\theta)}{T} \xrightarrow{p} E\{\log f(x_t, \theta)\} = \int [\log f(x, \theta)]\, f(x, \theta_0)\,dx. \tag{9.23}$$
Khinchine's theorem directly applies to the sum $T^{-1}\sum_{t=1}^{T} \log f(x_t, \theta)$, since independence
of the $x_t$ implies that $\log f(x_t, \theta)$, $t = 1, 2, \ldots, T$, are also independently distributed with a
constant mean, given by $E[\log f(x, \theta)]$. Consider now the divergence of $f(x, \theta)$ from $f(x, \theta_0)$,
measured by the Kullback–Leibler information criterion (KLIC) defined by,
Since $\log[f(x, \theta)/f(x, \theta_0)]$ is a concave function of the ratio $f(x; \theta)/f(x; \theta_0)$, then by Jensen's inequality we have
(using $f$ and $f_0$ for $f(x, \theta)$ and $f(x, \theta_0)$, respectively)
$$E\left[\log\left(\frac{f}{f_0}\right)\right] \leq \log\left[E\left(\frac{f}{f_0}\right)\right]. \tag{9.24}$$
But
$$E\left(\frac{f}{f_0}\right) = \int \frac{f(x, \theta)}{f(x, \theta_0)} f(x, \theta_0)\,dx = \int f(x, \theta)\,dx = 1,$$
with equality holding if and only if $\theta = \theta_0$. Therefore, $\theta_0$ is the value of $\theta$ that globally
maximizes $E[\log f(x, \theta)]$. Hence, on the one hand from (9.23) we note that as $T \to \infty$, $\hat{\theta}_T$
maximizes $E[\log f(x, \theta)]$, and from (9.25) we have that the value of $\theta$ that maximizes $E[\log f(x, \theta)]$
is the true value, $\theta_0$. Hence, by the continuity of the log-likelihood function in $\theta$, we also
have $\hat{\theta}_T$ converging in probability to $\theta_0$. The strong convergence of $\hat{\theta}_T$ to $\theta_0$ also follows
when $T^{-1}\ell_T(\theta)$ converges to $E[\log f(x, \theta)]$ with probability 1. In the IID case, the strong
law of large numbers (Theorem 12) is applicable to $T^{-1}\sum_{t=1}^{T} \log f(x_t, \theta)$ and hence, $\hat{\theta}_T$
converges to $\theta_0$ with probability 1.
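The key step of the proof, that the expected log-likelihood is maximized at the true parameter value, can be seen directly in the Bernoulli case, where $E[\log f(x, \theta)] = \theta_0 \ln\theta + (1-\theta_0)\ln(1-\theta)$ when expectations are taken under $\theta_0$. The sketch below is illustrative only: the value of $\theta_0$ and the grid are assumptions, and the divergence is written in the standard form $E[\log(f_0/f)]$.

```python
import math

def expected_loglik(theta, theta0):
    """E[log f(x, theta)] for a Bernoulli observation, expectation under theta0."""
    return theta0 * math.log(theta) + (1.0 - theta0) * math.log(1.0 - theta)

def kl_divergence(theta, theta0):
    """E[log(f0/f)] under theta0; non-negative by Jensen's inequality."""
    return expected_loglik(theta0, theta0) - expected_loglik(theta, theta0)

theta0 = 0.3
grid = [i / 1000 for i in range(1, 1000)]

# The maximizer of the expected log-likelihood over the grid.
theta_star = max(grid, key=lambda th: expected_loglik(th, theta0))

# The divergence is zero at theta0 and positive elsewhere.
min_kl = min(kl_divergence(th, theta0) for th in grid)

print(theta_star, min_kl)
```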
Remark 2 It is clear that the $\theta$ maximizing $E[\log f(x, \theta)]$ needs to be unique or should correspond
to the global maximum of $E[\log f(x, \theta)]$. The necessary condition for $E[\log f(x, \theta)]$ to have a
unique maximum is given by
$$\frac{\partial^2 E[\log f(x, \theta)]}{\partial\theta\,\partial\theta'} = E\left[\frac{\partial^2 \log f(x, \theta)}{\partial\theta\,\partial\theta'}\right] < 0, \tag{9.26}$$
Remark 3 Whether $E[\log f(x, \theta)]$ has local or global maxima is closely related to the problem of
local and global identifiability of parameters. When $E[\log f(x, \theta)]$ is a concave function of $\theta$ (and
hence $A(\theta)$ is positive definite for all $\theta \in \Theta$), then $\theta$ is globally identified, otherwise $\theta$ is at best
only locally identified. Parameters $\theta$ are not identified when $A(\theta)$ is rank deficient for all $\theta$. See the
discussion of identification of the parameters of the simultaneous equation models in Chapters 20
and 22, and the paper by Rothenberg (1971).
Remark 4 In the case where $\log f(x, \theta)$ is differentiable, $\hat{\theta}_T$ is obtainable as a root of the equations
$$\left.\frac{\partial \ell_T(\theta)}{\partial\theta}\right|_{\theta = \hat{\theta}_T} = 0.$$
In practice, multiple roots may arise due to a variety of factors. For example, the parameters
may not be globally identified, the available sample size may not be large enough, the underlying
model may be misspecified, or any combination of these factors. However, if all three regularity
conditions are met, one would expect the multiple root problem to disappear as the sample size
increases.
The following theorem provides the asymptotic properties of ML estimators.
Theorem 35 Under the regularity conditions RC1 to RC3, the ML estimator has the following
asymptotic properties
(i) Asymptotic normality: $\sqrt{T}(\hat{\theta}_T - \theta_0) \xrightarrow{d} N(0, A(\theta_0)^{-1})$, where
$$A(\theta_0) = \lim_{T\to\infty} E\left[-\frac{1}{T}\frac{\partial^2 \ell_T(\theta_0)}{\partial\theta\,\partial\theta'}\right] = \operatorname*{Plim}_{T\to\infty}\left[-\frac{1}{T}\frac{\partial^2 \ell_T(\theta_0)}{\partial\theta\,\partial\theta'}\right].$$
(ii) Asymptotic unbiasedness: $\lim_{T\to\infty} E(\hat{\theta}_T) = \theta_0$.
(iii) Asymptotic efficiency: $\hat{\theta}_T$ is an asymptotically efficient estimator and achieves the
Cramer–Rao lower bound, asymptotically.
3 The interchange of the order of differentiation and integration in (9.26) is justified by the regularity conditions RC1
and RC2.
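Proof To prove (i), we expand $\partial \ell_T(\hat{\theta}_T)/\partial\theta$ around $\theta_0$. By the mean value theorem
$$\frac{1}{\sqrt{T}}\frac{\partial \ell_T(\hat{\theta}_T)}{\partial\theta} = \frac{1}{\sqrt{T}}\frac{\partial \ell_T(\theta_0)}{\partial\theta} + \frac{1}{T}\frac{\partial^2 \ell_T(\theta_0)}{\partial\theta\,\partial\theta'}\sqrt{T}(\hat{\theta}_T - \theta_0) + \delta_T, \tag{9.27}$$
$$\delta_{iT} = \sum_{j=1}^{p}\sum_{k=1}^{p} \frac{1}{T}\frac{\partial^3 \ell_T(\bar{\theta}_T)}{\partial\theta_i\,\partial\theta_j\,\partial\theta_k}\sqrt{T}(\hat{\theta}_{jT} - \theta_{j0})(\hat{\theta}_{kT} - \theta_{k0}),$$
which by assumption RC2 are bounded for all $x$, and $i, j, k = 1, 2, \ldots, p$. The convergence of
$\bar{\theta}_T$ to $\theta_0$ in probability follows from the fact that $\bar{\theta}_T$ lies between $\hat{\theta}_T$ and $\theta_0$ and the fact that
$\hat{\theta}_T \xrightarrow{p} \theta_0$ (see Theorem 34). Hence, we have
$$\delta_T = o_p\left(\sqrt{T}(\hat{\theta}_T - \theta_0)\right). \tag{9.28}$$
Also, applying the law of large numbers to the elements of the inner product matrix
$$-\frac{1}{T}\frac{\partial^2 \ell_T(\theta_0)}{\partial\theta\,\partial\theta'} = -\frac{1}{T}\sum_{t=1}^{T}\frac{\partial^2 \log f(x_t, \theta_0)}{\partial\theta\,\partial\theta'},$$
we have
$$-\frac{1}{T}\frac{\partial^2 \ell_T(\theta_0)}{\partial\theta\,\partial\theta'} \xrightarrow{p} E\left[-\frac{\partial^2 \log f(x, \theta_0)}{\partial\theta\,\partial\theta'}\right] = A(\theta_0), \tag{9.29}$$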
which, by Assumption RC3, is a positive definite matrix. Using (9.28) and (9.29) in (9.27)
and invoking Slutsky's convergence theorem (see Theorem 35) we have
$$\sqrt{T}(\hat{\theta}_T - \theta_0) = A(\theta_0)^{-1}\frac{1}{\sqrt{T}}\frac{\partial \ell_T(\theta_0)}{\partial\theta} + o_p(1). \tag{9.30}$$
It now remains to show that $\frac{1}{\sqrt{T}}\frac{\partial \ell_T(\theta_0)}{\partial\theta}$ tends to a normal distribution. This follows
immediately from the application of the Lindeberg–Lévy theorem (see Theorem 15) to
$$\frac{1}{\sqrt{T}}\frac{\partial \ell_T(\theta_0)}{\partial\theta} = \frac{1}{\sqrt{T}}\sum_{t=1}^{T}\frac{\partial \log f(x_t, \theta_0)}{\partial\theta}.$$
Firstly, as the theorem requires, under the regularity conditions, $\partial \log f(x_t, \theta_0)/\partial\theta$ has a constant
mean equal to zero, and a finite nonsingular variance given by $A(\theta_0)$. Therefore,
$$\frac{1}{\sqrt{T}}\frac{\partial \ell_T(\theta_0)}{\partial\theta} \xrightarrow{d} N(0, A(\theta_0)). \tag{9.31}$$
Combining (9.31) with (9.30) then gives
$$\sqrt{T}(\hat{\theta}_T - \theta_0) \xrightarrow{d} N(0, A(\theta_0)^{-1}). \tag{9.32}$$
As for (ii), using (9.30) it is easily seen that $\operatorname{Var}(\hat{\theta}_T) \to 0$ as $T \to \infty$, and hence $\hat{\theta}_T$
converges to $\theta_0$ in mean squared error, and that $\lim_{T\to\infty} E(\hat{\theta}_T) = \theta_0$. This latter result
follows from the fact that $E\|\hat{\theta}_T - \theta_0\|^2$ is bounded for all $T$, since by regularity conditions
RC1 and RC3, the derivatives of the log-likelihood function up to the third order exist and
are bounded by functions that have finite integrals. Hence it follows that $\lim_{T\to\infty} E(\hat{\theta}_T)$ will
be equal to the mean of the asymptotic distribution of $\hat{\theta}_T$ (for a proof see Rao (1973)). As
for (iii), it is clear that $\hat{\theta}_T$ asymptotically achieves the Cramer–Rao lower bound (Theorem
33) given by the inverse of the information matrix. Given that $\hat{\theta}_T$ is asymptotically unbiased
and achieves the Cramer–Rao lower bound, then it is also asymptotically efficient.
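The asymptotic normality result can be explored by simulation. The sketch below assumes the Bernoulli model of Example 20, for which $A(\theta_0)^{-1} = \theta_0(1-\theta_0)$, and compares the Monte Carlo mean and variance of $\sqrt{T}(\hat{\theta}_T - \theta_0)$ with their limiting values. The sample size, number of replications, seed, and function names are illustrative choices.

```python
import math
import random

def simulate_scaled_errors(theta0, T, reps, seed=0):
    """Draw `reps` values of sqrt(T) * (theta_hat - theta0) for Bernoulli samples."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        successes = sum(1 for _ in range(T) if rng.random() < theta0)
        draws.append(math.sqrt(T) * (successes / T - theta0))
    return draws

theta0, T, reps = 0.3, 200, 2000
z = simulate_scaled_errors(theta0, T, reps)

mean_z = sum(z) / reps
var_z = sum((zi - mean_z) ** 2 for zi in z) / (reps - 1)
asy_var = theta0 * (1.0 - theta0)        # inverse of the information, A(theta0)^{-1}

# Share of draws inside the nominal 95 per cent band of the limiting normal.
band = 1.96 * math.sqrt(asy_var)
coverage = sum(1 for zi in z if abs(zi) <= band) / reps

print(mean_z, var_z, asy_var, coverage)
```

Because the Bernoulli sample mean takes values on a lattice, the coverage figure should only be expected to be close to, not exactly equal to, 0.95.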
Example 23 Consider the problem of deriving the asymptotic distribution of the ML estimators of
$\alpha$, $\beta$ and $\sigma^2$ in the following simple nonlinear regression
$$y_t = \alpha x_t^{\beta} + u_t, \quad u_t \sim N(0, \sigma^2), \tag{9.33}$$
for $t = 1, 2, \ldots, T$. Let $g(x_t, \gamma) = \alpha x_t^{\beta}$, where $\gamma = (\alpha, \beta)'$ and set $\theta = (\gamma', \sigma^2)'$. Conditional
on $x_t$, the log-likelihood function of this model is given by
$$\ell_T(\theta) = -\frac{T}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{t=1}^{T}\left[y_t - g(x_t, \gamma)\right]^2.$$
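To apply Theorem 35 to this problem, we need to find the score vector, $\partial \ell_T(\theta)/\partial\theta$, and the information
matrix, $B(\theta_0)$. We have
$$\frac{\partial \ell_T(\theta)}{\partial\alpha} = \frac{1}{\sigma^2}\sum_{t=1}^{T} x_t^{\beta}\left(y_t - \alpha x_t^{\beta}\right),$$
$$\frac{\partial \ell_T(\theta)}{\partial\beta} = \frac{\alpha}{\sigma^2}\sum_{t=1}^{T} x_t^{\beta}\left(y_t - \alpha x_t^{\beta}\right)\log x_t,$$
$$\frac{\partial \ell_T(\theta)}{\partial\sigma^2} = -\frac{T}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{t=1}^{T}\left(y_t - \alpha x_t^{\beta}\right)^2,$$
$$\frac{\partial^2 \ell_T(\theta)}{\partial\alpha^2} = -\frac{1}{\sigma^2}\sum_{t=1}^{T} x_t^{2\beta},$$
$$\frac{\partial^2 \ell_T(\theta)}{\partial\beta^2} = \frac{\alpha}{\sigma^2}\sum_{t=1}^{T} x_t^{\beta}\left(y_t - \alpha x_t^{\beta}\right)(\log x_t)^2 - \frac{\alpha^2}{\sigma^2}\sum_{t=1}^{T} x_t^{2\beta}(\log x_t)^2,$$
$$\frac{\partial^2 \ell_T(\theta)}{\partial\alpha\,\partial\beta} = \frac{1}{\sigma^2}\sum_{t=1}^{T} x_t^{\beta}\left(y_t - \alpha x_t^{\beta}\right)\log x_t - \frac{\alpha}{\sigma^2}\sum_{t=1}^{T} x_t^{2\beta}\log x_t,$$
$$\frac{\partial^2 \ell_T(\theta)}{\partial\sigma^2\,\partial\alpha} = -\frac{1}{\sigma^4}\sum_{t=1}^{T} x_t^{\beta}\left(y_t - \alpha x_t^{\beta}\right),$$
$$\frac{\partial^2 \ell_T(\theta)}{\partial\sigma^2\,\partial\beta} = -\frac{\alpha}{\sigma^4}\sum_{t=1}^{T} x_t^{\beta}\left(y_t - \alpha x_t^{\beta}\right)\log x_t.$$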
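and $E\left[-\frac{\partial^2 \ell_T(\theta)}{\partial\sigma^2\,\partial\alpha}\right] = E\left[-\frac{\partial^2 \ell_T(\theta)}{\partial\sigma^2\,\partial\beta}\right] = 0$. Hence $A(\theta) = E\left[-\frac{1}{T}\frac{\partial^2 \ell_T(\theta)}{\partial\theta\,\partial\theta'}\right]$ is block-diagonal
and (assuming that the expectations of $x_t^{2\beta}$, $x_t^{2\beta}\log x_t$ and $x_t^{2\beta}(\log x_t)^2$ exist and are finite) is
given by
$$A(\theta) = \frac{1}{\sigma^2}\begin{bmatrix} \frac{1}{T}\sum_{t=1}^{T} E\left(x_t^{2\beta}\right) & \frac{\alpha}{T}\sum_{t=1}^{T} E\left(x_t^{2\beta}\log x_t\right) & 0 \\ \frac{\alpha}{T}\sum_{t=1}^{T} E\left(x_t^{2\beta}\log x_t\right) & \frac{\alpha^2}{T}\sum_{t=1}^{T} E\left[x_t^{2\beta}(\log x_t)^2\right] & 0 \\ 0 & 0 & \frac{1}{2\sigma^2} \end{bmatrix}.$$
In the case where $x_t$ are identically distributed $A(\theta)$ simplifies further. The information matrix for
$\gamma$ is given by
$$A(\gamma) = \frac{1}{\sigma^2}\begin{bmatrix} E\left(x_t^{2\beta}\right) & \alpha E\left(x_t^{2\beta}\log x_t\right) \\ \alpha E\left(x_t^{2\beta}\log x_t\right) & \alpha^2 E\left[x_t^{2\beta}(\log x_t)^2\right] \end{bmatrix}. \tag{9.34}$$
Hence $\hat{\gamma}_T$, the ML estimator of $\gamma$, is asymptotically distributed with mean $\gamma_0$ (the true value of $\gamma$
under (9.33)), and the asymptotic covariance matrix given by $\frac{1}{T}A(\gamma)^{-1}$. It is clear that for the
nonlinear model, (9.33), to be meaningful, the realized values of $x_t$ should all be strictly positive.
This requirement is, for example, satisfied if we assume that $x_t$ has a log-normal distribution. It is
also clear that $\alpha$ should be strictly non-zero, otherwise the information matrix becomes singular,
and the parameter $\beta$ will no longer be identified.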
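Analytic derivatives such as those in Example 23 are easy to get wrong, so it can be useful to check them against finite differences of the log-likelihood. The sketch below does this for the three elements of the score vector, using simulated data with log-normal $x_t$. The parameter values, the seed, the step size, and the function names are illustrative assumptions.

```python
import math
import random

def loglik(theta, x, y):
    """Log-likelihood of y_t = alpha * x_t**beta + u_t, u_t ~ N(0, sigma2)."""
    alpha, beta, sigma2 = theta
    T = len(y)
    ssr = sum((yt - alpha * xt ** beta) ** 2 for xt, yt in zip(x, y))
    return -0.5 * T * math.log(2 * math.pi * sigma2) - ssr / (2 * sigma2)

def score(theta, x, y):
    """Analytic score with respect to (alpha, beta, sigma2)."""
    alpha, beta, sigma2 = theta
    T = len(y)
    u = [yt - alpha * xt ** beta for xt, yt in zip(x, y)]
    d_alpha = sum(xt ** beta * ut for xt, ut in zip(x, u)) / sigma2
    d_beta = alpha * sum(xt ** beta * ut * math.log(xt) for xt, ut in zip(x, u)) / sigma2
    d_sigma2 = -T / (2 * sigma2) + sum(ut * ut for ut in u) / (2 * sigma2 ** 2)
    return [d_alpha, d_beta, d_sigma2]

def numerical_score(theta, x, y, h=1e-6):
    """Central finite-difference approximation to the score."""
    grads = []
    for i in range(len(theta)):
        up, down = list(theta), list(theta)
        up[i] += h
        down[i] -= h
        grads.append((loglik(up, x, y) - loglik(down, x, y)) / (2 * h))
    return grads

rng = random.Random(1)
x = [math.exp(rng.gauss(0.0, 0.5)) for _ in range(50)]    # log-normal, strictly positive
y = [1.5 * xt ** 0.8 + rng.gauss(0.0, 0.3) for xt in x]   # hypothetical true values

theta = (1.2, 0.6, 0.2)   # arbitrary evaluation point, not the ML estimate
analytic = score(theta, x, y)
numeric = numerical_score(theta, x, y)
print(analytic)
print(numeric)
```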
where $f(x_T \mid x_{T-1}, x_{T-2}, \ldots, x_1)$ represents the conditional density function of $x_T$ given the
realizations $x_1, x_2, \ldots, x_{T-1}$. The above result can also be written more generally as
$$f(x, \theta) = \prod_{t=1}^{T} f(x_t, \theta \mid \Omega_{t-1}),$$
where $\{\Omega_t\}$ is a sequence of non-decreasing $\sigma$-fields, containing at least some observations on
current and past values of $x_t$. This formulation assumes that $\{x_t\}$ is adapted to $\{\Omega_t\}$, namely that
for each $t$, $x_t$ is $\Omega_t$-measurable. The log-likelihood function for this general case can be written as
$$\ell_T(\theta) = \sum_{t=1}^{T} \ln\left[f(x_t, \theta \mid \Omega_{t-1})\right]. \tag{9.35}$$
Example 24 (The AR process) A simple example of dependent observations is the first-order sta-
tionary autoregressive process,
where $f(x_1, \theta)$ is the marginal distribution of the initial observations and $\theta = (\phi, \sigma^2)'$. Assuming
the process is stationary and has started a long time ago, we have
$$x_1 \sim N\left(0, \frac{\sigma^2}{1 - \phi^2}\right),$$
We shall be dealing with more general time series processes in Chapter 12.
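The factorization of the joint density into conditional densities can be checked numerically for the AR(1) case. The sketch below assumes the process $x_t = \phi x_{t-1} + \varepsilon_t$ with $\varepsilon_t \sim N(0, \sigma^2)$ and $|\phi| < 1$, builds the exact log-likelihood from the stationary marginal of $x_1$ plus the conditional normal densities, and compares it with the joint Gaussian log-density computed from the covariance matrix with elements $\sigma^2\phi^{|i-j|}/(1-\phi^2)$. The data, parameter values, and function names are illustrative assumptions.

```python
import math

def ar1_exact_loglik(x, phi, sigma2):
    """Exact Gaussian AR(1) log-likelihood: marginal of x_1 plus conditionals."""
    v1 = sigma2 / (1.0 - phi ** 2)                      # stationary variance of x_1
    ll = -0.5 * math.log(2 * math.pi * v1) - x[0] ** 2 / (2 * v1)
    for t in range(1, len(x)):
        e = x[t] - phi * x[t - 1]                       # one-step prediction error
        ll += -0.5 * math.log(2 * math.pi * sigma2) - e * e / (2 * sigma2)
    return ll

def gaussian_loglik(x, cov):
    """Log-density of N(0, cov) at x, via a hand-written Cholesky factorization."""
    n = len(x)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = cov[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    z = []                                              # solve L z = x by forward substitution
    for i in range(n):
        z.append((x[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2 * math.pi) + logdet + sum(v * v for v in z))

phi, sigma2 = 0.6, 1.5
x = [0.4, -0.3, 1.1, 0.8, -0.5, 0.2]                    # hypothetical observations
n = len(x)
cov = [[sigma2 * phi ** abs(i - j) / (1.0 - phi ** 2) for j in range(n)] for i in range(n)]

ll_factorized = ar1_exact_loglik(x, phi, sigma2)
ll_joint = gaussian_loglik(x, cov)
print(ll_factorized, ll_joint)                          # the two values agree
```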
where $h_T = \frac{1}{T}\frac{\partial \log f(x, \theta)}{\partial\theta}$, and $C$ is a matrix of constants which may depend on $\theta$.
Theorem 36 Assume that the parameter space, $\Theta$, is compact, and that the likelihood $L(\theta) =
f(x, \theta)$ is continuous on $\Theta$. A necessary and sufficient condition for weak consistency of the ML
estimator $\hat{\theta}_T$ is that for every $\theta \neq \theta_0$, with $\theta \in \Theta$, there exists a neighbourhood of $\theta$, $N(\theta)$,
satisfying
$$\lim_{T\to\infty} \Pr\left[\sup_{\theta \in N(\theta)} \left(\ell_T(\theta) - \ell_T(\theta_0)\right) < 0\right] = 1, \tag{9.37}$$
For a proof see Theorem 1 in Heijmans and Magnus (1986b). Note that, since an ML estima-
tor may not be unique, the above theorem gives sufficient conditions for the consistency of every
ML estimator. If (9.37) (or (9.38)) is not satisfied, then at least one inconsistent ML estimator
exists.
The following theorem establishes asymptotic normality. To this end, let $g_T(\theta) = L_T(\theta)/L_{T-1}(\theta)$,
where $L_T(\theta)$ and $L_{T-1}(\theta)$ are likelihood functions based on $T$ and $T-1$ observations,
respectively, and let $\xi_{Tj} = \partial g_T(\theta)/\partial\theta_j$, for $j = 1, 2, \ldots, p$.
Theorem 37 Assume that the ML estimator, θ̂ T , exists asymptotically almost surely, and is weakly
consistent. Further, assume that:
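(v) $\lim_{T\to\infty} (1/T^2)\operatorname{Var}\left[\sum_{t=1}^{T} \xi_{ti}\xi_{tj} \mid x_1, x_2, \ldots, x_{t-1}\right] = 0$ for all $j = 1, 2, \ldots, p$.
(vi) $\lim_{T\to\infty} (1/T^2)\operatorname{Var}\left[\sum_{t=1}^{T} \left(\xi_{ti}\xi_{tj} - E\left(\xi_{ti}\xi_{tj} \mid x_1, x_2, \ldots, x_{t-1}\right)\right)\right] = 0$ for all $j =
1, 2, \ldots, p$.
(vii) There exists a finite positive definite $p \times p$ matrix, $A(\theta_0)$, such that
$$\lim_{T\to\infty} (1/T)\, E\left[\frac{\partial \ell_T(\theta_0)}{\partial\theta}\frac{\partial \ell_T(\theta_0)}{\partial\theta'}\right] = A(\theta_0),$$
$$\lim_{T\to\infty}\left[(1/T)\, R(\theta_0)\right] = \lim_{T\to\infty} (1/T)\frac{\partial^2 \ell_T(\theta_0)}{\partial\theta\,\partial\theta'} = -A(\theta_0).$$
$$\lim_{T\to\infty} \Pr\left[T^{-1}\sup_{\theta \in N(\theta)}\left|R_{ij}(\theta) - R_{ij}(\theta_0)\right| > \epsilon\right] = 0.$$
$$\sqrt{T}(\hat{\theta}_T - \theta_0) \xrightarrow{d} N\left(0, A(\theta_0)^{-1}\right).$$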
All these three procedures yield asymptotically valid tests, in the sense that they will have the
correct size (i.e., the type I error) and possess certain optimal power properties in large samples:
they are asymptotically equivalent, although they can lead to different results in small samples.
The choice between them is often made on the basis of computational simplicity and ease of use.
Let $L_T(\theta)$ be the likelihood function of the $p \times 1$ vector of unknown parameters, $\theta$, associated
with the joint probability distribution of $y_1, y_2, \ldots, y_T$, conditional (possibly) on a set of
predetermined variables or regressors. Assume also that the hypothesis of interest to be tested can
be written as a set of $s \leq p$ independent restrictions (linear and/or nonlinear) on $\theta$. Denote
these $s$ restrictions by⁴
$$H_0: h(\theta) = 0, \tag{9.39}$$
$$H_1: h(\theta) \neq 0. \tag{9.40}$$
$$LR \xrightarrow{d} \chi^2_s,$$
where s is the number of restrictions imposed (see below for a sketch of the proof). The null
hypothesis H0 is rejected if LR is larger than the appropriate critical value of the chi-squared
distribution.
The LR approach requires that the maintained model is estimated both under the null and
under the alternative hypotheses. The other two likelihood-based approaches to be presented
below require the estimation of the maintained model either under the null or under the alter-
native hypothesis, but not under both hypotheses.
4 The assumption that these restrictions are independent requires that the s × p matrix of the derivatives ∂h/∂θ has a
full rank, namely that Rank ∂h/∂θ = s.
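where $\partial \log L_T(\theta)/\partial\theta$ and $\partial^2 \log L_T(\theta)/\partial\theta\,\partial\theta'$ are the first and the second derivatives of the
log-likelihood function which are evaluated at $\theta = \tilde{\theta}$, the restricted estimator of $\theta$. Recall that it
is computed under the null hypothesis, $H_0$, which defines the set of restrictions to be tested. The
LM test was originally proposed by Rao and is also referred to as Rao's score test, or simply the
'score test'. Under the null hypothesis, LM has a limiting chi-squared distribution with degrees
of freedom equal to the number of restrictions, $s$, in the case of $H_0$ in (9.39).
where $\widehat{\operatorname{Var}}[h(\hat{\theta})]$ is the estimator of the variance of $h(\hat{\theta})$ and can be estimated consistently
by
$$\widehat{\operatorname{Var}}[h(\hat{\theta})] = \left[\frac{\partial h(\theta)}{\partial\theta'}\right]_{\theta = \hat{\theta}}\left[-\frac{\partial^2 \log L_T(\theta)}{\partial\theta\,\partial\theta'}\right]^{-1}_{\theta = \hat{\theta}}\left[\frac{\partial h(\theta)}{\partial\theta'}\right]'_{\theta = \hat{\theta}}. \tag{9.44}$$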
Under H0 , W has a chi-squared distribution with degrees of freedom equal to the number of
restrictions. However, it is important to note that in small samples the outcome of the Wald test
could crucially depend on the particular choice of the algebraic formulation of the nonlinear
restrictions used. This has been illustrated by Gregory and Veall (1985), using Monte Carlo sim-
ulations.
Asymptotically (namely as the sample size, T, is allowed to increase without a bound), all the
three test procedures are equivalent. Like the LR statistic, under the null hypothesis, the LM and
the W statistics are asymptotically distributed as chi-squared variates with s degrees of freedom.
We can write
$$LR \overset{a}{\sim} LM \overset{a}{\sim} W,$$
where '$\overset{a}{\sim}$' stands for 'asymptotic equivalence' in distribution functions.
Other versions of the LM and the W statistics are also available. One possibility would be
to replace $-\frac{\partial^2 \log L_T(\theta)}{\partial\theta\,\partial\theta'}$ in (9.42) and (9.44) by $T \operatorname{Plim}\left[-T^{-1}\frac{\partial^2 \log L_T(\theta)}{\partial\theta\,\partial\theta'}\right]$. This would not
affect the asymptotic distribution of the test statistics, but in some cases could simplify their
computation.
The literature contains a variety of proofs of the above propositions at different levels of gen-
erality. In what follows we provide a sketch of the proof under basic regularity conditions. It is
simpler to start with the LM test.
Define the Lagrangian function
$$\mathcal{L}_T(\theta, \lambda) = \ell_T(\theta) + \lambda' h(\theta),$$
LM test is, if H0 is valid, then the restricted estimator should be near to the unrestricted estimator,
i.e., λ̃T should be near to zero.
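The first-order conditions of the constrained maximization of the likelihood function yield
$$\frac{\partial \ell_T(\tilde{\theta}_T)}{\partial\theta} + H(\tilde{\theta}_T)\tilde{\lambda}_T = 0, \tag{9.45}$$
$$h(\tilde{\theta}_T) = 0. \tag{9.46}$$
Since we are interested in the distribution of $\tilde{\lambda}_T$ under $H_0$, we take first-order Taylor
expansions of (9.45) and (9.46) around $\theta_0$
$$d(\theta_0) + \frac{\partial^2 \ell_T(\theta_0)}{\partial\theta\,\partial\theta'}(\tilde{\theta}_T - \theta_0) + H(\tilde{\theta}_T)\tilde{\lambda}_T = o_p(1),$$
$$H(\theta_0)'(\tilde{\theta}_T - \theta_0) = o_p(1).$$
Replacing $\tilde{\theta}_T$ by $\theta_0$ under the null hypothesis and putting them in matrix form
$$\begin{bmatrix} d(\theta_0) \\ 0 \end{bmatrix} + \begin{bmatrix} \frac{\partial^2 \ell_T(\theta_0)}{\partial\theta\,\partial\theta'} & H(\theta_0) \\ H(\theta_0)' & 0 \end{bmatrix}\begin{bmatrix} \tilde{\theta}_T - \theta_0 \\ \tilde{\lambda}_T \end{bmatrix} = o_p(1).$$
This is equivalent to
$$\begin{bmatrix} \frac{1}{\sqrt{T}} d(\theta_0) \\ 0 \end{bmatrix} + \begin{bmatrix} \frac{1}{T}\frac{\partial^2 \ell_T(\theta_0)}{\partial\theta\,\partial\theta'} & H(\theta_0) \\ H(\theta_0)' & 0 \end{bmatrix}\begin{bmatrix} \sqrt{T}(\tilde{\theta}_T - \theta_0) \\ \frac{1}{\sqrt{T}}\tilde{\lambda}_T \end{bmatrix} = o_p(1).$$
Using (9.29) to replace $\frac{1}{T}\frac{\partial^2 \ell_T(\theta_0)}{\partial\theta\,\partial\theta'}$ by $-A(\theta_0)$, we get
$$\begin{bmatrix} \sqrt{T}(\tilde{\theta}_T - \theta_0) \\ \frac{1}{\sqrt{T}}\tilde{\lambda}_T \end{bmatrix} \overset{a}{=} -\begin{bmatrix} -A(\theta_0) & H(\theta_0) \\ H(\theta_0)' & 0 \end{bmatrix}^{-1}\begin{bmatrix} \frac{1}{\sqrt{T}} d(\theta_0) \\ 0 \end{bmatrix}. \tag{9.47}$$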
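and under $H_0$
$$\sqrt{T}(\tilde{\theta}_T - \theta_0) \overset{a}{\sim} N\left(0, B(\theta_0)^{-1}\right),$$
$$\frac{1}{\sqrt{T}}\tilde{\lambda}_T \overset{a}{\sim} N\left(0, \left[H(\theta_0)' A(\theta_0)^{-1} H(\theta_0)\right]^{-1}\right).$$
$$\frac{1}{\sqrt{T}}\tilde{\lambda}_T' \left[H(\theta_0)' A(\theta_0)^{-1} H(\theta_0)\right]\frac{1}{\sqrt{T}}\tilde{\lambda}_T \sim \chi^2_s, \text{ where } s = \operatorname{rank}(H(\theta_0)) \text{ and } s \leq p.$$
$$LM = \frac{1}{T}\tilde{\lambda}_T' H(\tilde{\theta}_T)' A(\tilde{\theta}_T)^{-1} H(\tilde{\theta}_T)\tilde{\lambda}_T = \frac{1}{T}\left(\frac{\partial \ell_T(\tilde{\theta}_T)}{\partial\theta}\right)' A(\tilde{\theta}_T)^{-1}\left(\frac{\partial \ell_T(\tilde{\theta}_T)}{\partial\theta}\right) \sim \chi^2_s. \tag{9.48}$$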
Hence under regularity conditions the LM test will be asymptotically distributed as $\chi^2_s$ under
the null hypothesis. The advantage of the LM test is that we only need to estimate the model
under the constraints.
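The LR test focuses on the difference between the restricted and unrestricted values of the
log-likelihood function. The statistic is defined as
$$LR = -2\left[\ell_T(\tilde{\theta}_T) - \ell_T(\hat{\theta}_T)\right].$$
Note the difference $\ell_T(\tilde{\theta}_T) - \ell_T(\hat{\theta}_T)$ is always non-positive, hence LR is always non-negative.
Expand the restricted log-likelihood around the unrestricted estimator
$$\ell_T(\tilde{\theta}_T) = \ell_T(\hat{\theta}_T) + \frac{\partial \ell_T(\hat{\theta}_T)}{\partial\theta'}(\tilde{\theta}_T - \hat{\theta}_T) + \frac{1}{2}(\tilde{\theta}_T - \hat{\theta}_T)'\frac{\partial^2 \ell_T(\bar{\theta}_T)}{\partial\theta\,\partial\theta'}(\tilde{\theta}_T - \hat{\theta}_T),$$
where $\bar{\theta}_T$ lies between $\tilde{\theta}_T$ and $\hat{\theta}_T$. Since $\hat{\theta}_T$ is an unrestricted ML estimator, we get
$\partial \ell_T(\hat{\theta}_T)/\partial\theta = 0$. Hence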
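$$LR = -2\left[\ell_T(\tilde{\theta}_T) - \ell_T(\hat{\theta}_T)\right] = \sqrt{T}(\tilde{\theta}_T - \hat{\theta}_T)'\left[-\frac{1}{T}\frac{\partial^2 \ell_T(\bar{\theta}_T)}{\partial\theta\,\partial\theta'}\right]\sqrt{T}(\tilde{\theta}_T - \hat{\theta}_T).$$
Hence
$$\sqrt{T}(\tilde{\theta}_T - \hat{\theta}_T) = -A(\theta_0)^{-1}\frac{1}{\sqrt{T}}\frac{\partial \ell_T(\tilde{\theta}_T)}{\partial\theta} + o_p(1).$$
Hence, since $-\frac{1}{T}\frac{\partial^2 \ell_T(\bar{\theta}_T)}{\partial\theta\,\partial\theta'}$ converges in probability to $A(\theta_0)$,
$$LR = \sqrt{T}(\tilde{\theta}_T - \hat{\theta}_T)' A(\theta_0)\sqrt{T}(\tilde{\theta}_T - \hat{\theta}_T) + o_p(1) = \frac{1}{T}\left(\frac{\partial \ell_T(\tilde{\theta}_T)}{\partial\theta}\right)' A(\theta_0)^{-1}\left(\frac{\partial \ell_T(\tilde{\theta}_T)}{\partial\theta}\right) + o_p(1). \tag{9.49}$$
Comparing (9.48) and (9.49) we observe LR has the same $\chi^2$ distribution as LM, that is, LR
and LM are asymptotically equivalent.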
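The Wald test focuses on the unrestricted estimation. If $H_0$ is valid, $h(\hat{\theta}_T)$ should be
close to zero under the unrestricted estimation, whereas by construction we have
$h(\tilde{\theta}_T) = 0$. We carry out a Taylor expansion of the unrestricted estimates around $\theta_0$
$$h(\hat{\theta}_T) = h(\theta_0) + \frac{\partial h(\bar{\theta}_T)}{\partial\theta'}(\hat{\theta}_T - \theta_0), \quad \bar{\theta}_T \in (\hat{\theta}_T, \theta_0).$$
Hence
$$\sqrt{T}\left[h(\hat{\theta}_T) - h(\theta_0)\right] = \frac{\partial h(\bar{\theta}_T)}{\partial\theta'}\sqrt{T}(\hat{\theta}_T - \theta_0).$$
By consistency of ML estimators, $\frac{\partial h(\bar{\theta}_T)}{\partial\theta'} \xrightarrow{p} \frac{\partial h(\theta_0)}{\partial\theta'} = H(\theta_0)'$. Now under $H_0$, $h(\theta_0) = 0$, and
$\sqrt{T}h(\hat{\theta}_T)$ has the same limiting distribution as $H(\theta_0)'\sqrt{T}(\hat{\theta}_T - \theta_0)$, since $\frac{\partial h(\bar{\theta}_T)}{\partial\theta'}$ tends to the matrix $H(\theta_0)'$
in probability. Hence
$$\sqrt{T}h(\hat{\theta}_T) \overset{a}{\sim} N\left(0, H(\theta_0)'\operatorname{Var}\left[\sqrt{T}(\hat{\theta}_T - \theta_0)\right]H(\theta_0)\right).$$
Noting that $\operatorname{Var}\left[\sqrt{T}(\hat{\theta}_T - \theta_0)\right]$ is asymptotically just the inverse of the Fisher information matrix, $B(\theta_0)^{-1}$, we get
$$W = h(\hat{\theta}_T)'\left[H(\theta_0)'\,\widehat{\operatorname{Var}}(\hat{\theta}_T)\,H(\theta_0)\right]^{-1}h(\hat{\theta}_T) \sim \chi^2_s.$$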
The LM, LR, and Wald tests will converge to the same $\chi^2_s$ distribution asymptotically. Hence in
large samples, it does not matter which one we choose. The choice could be based on ease of
computations. If both $\tilde{\theta}_T$ and $\hat{\theta}_T$ are easy to compute, we can choose the LR test; if $\tilde{\theta}_T$ is easy to
compute, we choose the LM test; if $\hat{\theta}_T$ is easy to compute, we choose the Wald test.
Although all three statistics are asymptotically equivalent, there is, however, an interesting
inequality relationship between them in small samples. We have
W ≥ LR ≥ LM.
This result suggests that in finite samples, the LR test rejects the null hypothesis less often than
the W test, but rejects the null more often than the LM test. In practice, the real value of these
likelihood-based procedures lies in situations where the problem cannot be cast in the classical
normal regression model framework, or when the hypotheses under consideration impose non-
linear parametric restrictions on the parameters of the linear regression model. Among the three
testing procedures, the LR approach seems to be more robust, especially as far as the formulation
of the null hypothesis is concerned, and is to be preferred to the other procedures. This is partic-
ularly the case when the null hypothesis imposes nonlinear restrictions on the parameters. Often
the LM procedure is favoured over the other two approaches on grounds of computational ease,
as it requires estimation of the ML estimators only under the null hypothesis.
The three tests are discussed under the maximum likelihood context, which requires knowl-
edge of density function of the variable. When we need to estimate parameters without specify-
ing density functions, the Generalized Method of Moments (GMM) is more robust. The GMM
approach is discussed in Chapter 10.
Example 25 (Linear regression with normal errors) The difference between the three test pro-
cedures is best demonstrated by means of a simple example. Suppose we are interested in testing the
hypothesis
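$$H_0: \beta = 0, \text{ against } H_1: \beta \neq 0,$$
where $\beta$ is the slope coefficient in the simple classical normal regression model
$$y_t = \alpha + \beta x_t + u_t, \quad u_t \sim N(0, \sigma^2), \quad t = 1, 2, \ldots, T.$$
The log-likelihood function of this model is given by (9.6) which we reproduce here for convenience
$$\log L_T(\theta) = -\frac{T}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{t=1}^{T}(y_t - \alpha - \beta x_t)^2,$$
where $\theta = (\alpha, \beta, \sigma^2)'$. The unrestricted ML estimators obtained under $H_1$,
$$\hat{\theta} = \begin{pmatrix} \hat{\alpha} \\ \hat{\beta} \\ \hat{\sigma}^2 \end{pmatrix} = \begin{pmatrix} \bar{y} - \bar{x}\,\dfrac{S_{XY}}{S_{XX}} \\ \dfrac{S_{XY}}{S_{XX}} \\ \dfrac{1}{T}\sum_t (y_t - \hat{\alpha} - \hat{\beta}x_t)^2 \end{pmatrix},$$
where as before $S_{XY} = \sum_t (x_t - \bar{x})(y_t - \bar{y})$ and $S_{XX} = \sum_t (x_t - \bar{x})^2$. The restricted ML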
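The maximized values of the log-likelihood function under $H_0$ and $H_1$ are given by
$$\log L_T(\tilde{\theta}) = -\frac{T}{2}\log\left[2\pi\frac{\sum_t (y_t - \bar{y})^2}{T}\right] - \frac{T}{2},$$
and
$$\log L_T(\hat{\theta}) = -\frac{T}{2}\log\left[2\pi\frac{\sum_t (y_t - \hat{\alpha} - \hat{\beta}x_t)^2}{T}\right] - \frac{T}{2},$$
respectively. Hence, the LR statistic for testing $H_0: \beta = 0$, against the two-sided alternatives
$H_1: \beta \neq 0$ will be
$$LR = 2\left[\log L_T(\hat{\theta}) - \log L_T(\tilde{\theta})\right] = -T\log\left[\frac{\sum_t (y_t - \hat{\alpha} - \hat{\beta}x_t)^2}{\sum_t (y_t - \bar{y})^2}\right],$$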
where $SSR_U$ and $SSR_R$ are respectively the unrestricted and the restricted sums of squares of
residuals.⁵ In the present application there is also a simple relationship between the LR and the F test
of linear restrictions discussed in Section 3.7. Recall from (3.26) that the F-statistic for testing
$H_0: \beta = 0$ is given by
$$F = \left(\frac{T - 2}{1}\right)\left(\frac{SSR_R - SSR_U}{SSR_U}\right).$$
Hence there is a monotonic relationship between the exact F-test of $\beta = 0$ and the asymptotic LR
test. Also, since in this simple case under $H_0$, the F-statistic is distributed with 1 and $T - 2$ degrees
of freedom, we have $F = t^2$, where $t$ has a t-distribution with $T - 2$ degrees of freedom, and (9.51)
becomes
$$LR = T\log\left(1 + \frac{t^2}{T - 2}\right),$$
where $\hat{\rho}_{XY}$ is the sample correlation coefficient between $Y$ and $X$. Hence, not surprisingly, a large
value of the LR statistic is associated with a large value of $|\hat{\rho}_{XY}|$. These results are readily
generalized to the multivariate case.⁶ Turning now to the LM and W statistics, we need first and
second-order derivatives of the log-likelihood function:
$$\frac{\partial \log L_T(\theta)}{\partial\theta} = \begin{pmatrix} \frac{1}{\sigma^2}\sum_t u_t \\ \frac{1}{\sigma^2}\sum_t x_t u_t \\ -\frac{T}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_t u_t^2 \end{pmatrix}. \tag{9.53}$$
5 The unrestricted estimators of the residuals are $\hat{u}_t = y_t - \hat{\alpha} - \hat{\beta}x_t$ and the restricted ones are $\tilde{u}_t = y_t - \bar{y}$. Hence
$\sum_t (y_t - \hat{\alpha} - \hat{\beta}x_t)^2 = \sum_t \hat{u}_t^2$ is the unrestricted sum of squares of residuals, and $\sum_t (y_t - \bar{y})^2 = \sum_t \tilde{u}_t^2$ is the restricted
sum of squares of residuals.
6 In testing the joint hypothesis $H_0: \beta_1 = \beta_2 = \ldots = \beta_k = 0$ against the alternative, $H_1: \beta_1 \neq 0, \beta_2 \neq
0, \ldots, \beta_k \neq 0$, in the multivariate regression model $y_t = \alpha + \sum_{i=1}^{k}\beta_i x_{it}$, we have $LR = -T\log(1 - R^2)$, where $R^2$ is
the squared multiple correlation coefficient of the regression equation.
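Evaluating these derivatives under $H_0$ involves replacing $\theta = (\alpha, \beta, \sigma^2)'$ with the restricted
estimators, namely $\tilde{\alpha} = \bar{y}$, $\tilde{\beta} = 0$ and $\tilde{\sigma}^2 = T^{-1}\sum_t (y_t - \bar{y})^2$. We have
$$\left.\frac{\partial \log L_T(\theta)}{\partial\theta}\right|_{\theta = \tilde{\theta}} = \begin{pmatrix} \frac{1}{\tilde{\sigma}^2}\sum_t (y_t - \bar{y}) \\ \frac{1}{\tilde{\sigma}^2}\sum_t (y_t - \bar{y})x_t \\ -\frac{T}{2\tilde{\sigma}^2} + \frac{1}{2\tilde{\sigma}^4}\sum_t (y_t - \bar{y})^2 \end{pmatrix} = \begin{pmatrix} 0 \\ \frac{1}{\tilde{\sigma}^2}\sum_t x_t(y_t - \bar{y}) \\ 0 \end{pmatrix},$$
and
$$\left.\frac{\partial^2 \log L_T(\theta)}{\partial\theta\,\partial\theta'}\right|_{\theta = \tilde{\theta}} = \begin{pmatrix} -\frac{T}{\tilde{\sigma}^2} & -\frac{T\bar{x}}{\tilde{\sigma}^2} & 0 \\ -\frac{T\bar{x}}{\tilde{\sigma}^2} & -\frac{\sum_t x_t^2}{\tilde{\sigma}^2} & -\frac{\sum_t x_t(y_t - \bar{y})}{\tilde{\sigma}^4} \\ 0 & -\frac{\sum_t x_t(y_t - \bar{y})}{\tilde{\sigma}^4} & -\frac{T}{2\tilde{\sigma}^4} \end{pmatrix}.$$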
To simplify the derivations we use an asymptotically equivalent version of the estimator of ∂ 2 log
LT (θ) /∂θ∂θ which in effect treats the off diagonal element in the matrix of second derivatives,
T −1 x y − ȳ as being negligible. With this approximation it is now easily seen that
t t t
−1
− yt − ȳ
T
− Tx̄2 0
t xt σ̃ 2 σ̃ 2
LM = 0, −Tx̄ t xt (yt −ȳ)
,
σ̃ 2 t xt
σ̃ 2 σ̃ 2 σ̃ 2
or
2
t xt yt − ȳ
LM = .
σ̃ 2 t (xt − x̄)2
2
But since σ̃ 2 = T −1 −1
t yt − ȳ = T SYY , and t xt yt − ȳ = SXY , then
TS2XY
LM = = T ρ̂ 2XY . (9.55)
SXX SYY
This form of the LM statistic is quite common in more complicated applications of the LM principle.
For example, in the multivariate case, the LM statistic for the test of $H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$
in $y_t = \alpha + \sum_{i=1}^{k} \beta_i x_{it} + u_t$ is given by $TR^2$, where $R^2$ is the square of the coefficient of
the multiple correlation of the regression equation.
The Wald statistic for testing $H_0: \beta = 0$ is based on the unrestricted estimate of $\beta$, namely
$h(\theta) = \beta$ and $h(\hat{\theta}) = \hat{\beta}$. Hence,
$$
W = \hat{\beta}^2 \left[\widehat{\mathrm{Var}}(\hat{\beta})\right]^{-1},
$$
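where in the ML framework the estimator of the variance of $\hat{\beta}$ is based on the ML estimator of $\sigma^2$,
i.e. $\hat{\sigma}^2 = T^{-1}\sum_t (y_t - \hat{\alpha} - \hat{\beta}x_t)^2$, rather than the unbiased estimator which is obtained by dividing the
sum of squares of residuals by $T - 2$. We have
$$
W = \frac{T\hat{\beta}^2 \sum_t (x_t - \bar{x})^2}{\sum_t (y_t - \hat{\alpha} - \hat{\beta}x_t)^2} = \frac{T}{T-2}\, t^2,
$$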
where $t$ is the t-ratio of the slope coefficient. Once again using (3.7), W can be written in terms of
$\hat{\rho}_{XY}^2$. That is
$$
W = \frac{T\hat{\rho}_{XY}^2}{1 - \hat{\rho}_{XY}^2}. \qquad (9.56)
$$
To summarize, for testing H0 : β = 0 in the simple regression model, the LR, LM, and W statistics
given by (9.52), (9.55) and (9.56), respectively, can all be written in terms of ρ̂ 2XY . Collecting these
results in one place we have
$$
LR = -T \log\left(1 - \hat{\rho}_{XY}^2\right), \qquad
LM = T\hat{\rho}_{XY}^2, \qquad
W = \frac{T\hat{\rho}_{XY}^2}{1 - \hat{\rho}_{XY}^2}.
$$
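The three expressions can be checked numerically. The following is a minimal sketch, assuming simulated data and a hypothetical helper name `trinity_statistics` (neither is from the text); it computes LR, LM and W from the squared sample correlation and illustrates the ordering LM ≤ LR ≤ W, which follows from the three formulae for any value of the squared correlation between 0 and 1.

```python
import numpy as np

def trinity_statistics(x, y):
    """Return (LR, LM, W) for H0: beta = 0 in y_t = alpha + beta*x_t + u_t."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    T = x.size
    s_xx = np.sum((x - x.mean()) ** 2)
    s_yy = np.sum((y - y.mean()) ** 2)
    s_xy = np.sum((x - x.mean()) * (y - y.mean()))
    rho2 = s_xy ** 2 / (s_xx * s_yy)      # squared sample correlation
    lr = -T * np.log(1.0 - rho2)
    lm = T * rho2
    w = T * rho2 / (1.0 - rho2)
    return lr, lm, w

# Illustrative simulated sample (an assumption, not data from the text).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.0 + 0.2 * x + rng.normal(size=200)
lr, lm, w = trinity_statistics(x, y)
print(lm <= lr <= w)
```

All three statistics are monotonic functions of the same squared correlation, so they rank samples identically; they differ only in the finite-sample rejection frequency implied by a common critical value.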
9.9 Exercises
1. Let x1 , x2 , . . . , xT be independent random drawings from the exponential distribution
p (xi , λ) = λe−λxi ,
(a) Derive the maximum likelihood (ML) estimator for λ, λ̂, and derive the asymptotic vari-
ance of λ̂.
(b) Obtain the likelihood ratio (LR) test statistic for testing H0 : λ = λ0 versus H1 : λ ≠ λ0 ,
and compare it to the Lagrange multiplier (LM) statistic for the same test.
$$
\Pr(y_i \mid \theta) = \frac{\theta^{y_i} \exp(-\theta)}{y_i!}, \quad y_i = 0, 1, 2, \ldots,
$$
yt = β 1 x1t + β 2 x2t + ut .
yt = xt β + ut,
= x1t β 1 + x2t β 2 + ut,
where β 1 and β 2 are k1 and k2 vectors of constant parameters and ut ∼ IIDN(0, σ 2 ). Suppose
we are interested in testing H0 : β 2 = 0 against H1 : β 2 ≠ 0.
(a) Obtain the Lagrange multiplier (LM), likelihood ratio (LR) and Wald (W) statistics for
testing H0 .
(b) Show that the F-test of H0 may be obtained as a simple transformation of all three statis-
tics mentioned under (a).
(c) Demonstrate the inequality LM ≤ LR ≤ W.
6. Show that when the regression equation (9.11) is augmented with the model of the exoge-
nous variables given by (9.15), the result can be written in the form of the following vector
autoregressive (VAR) model (see also Chapter 21)
zt = zt−1 + ξ t .
Derive an expression for zt , and ξ t . Hence, or otherwise, assuming that E(vt ut ) = 0, prove
that E(ut xt−s ) =0, for s = 0, 1, 2, . . . . Also noting that
E (zt ut ) = E ξ t ut ,
and
E (zt+1 ut ) = E (zt ut ) ,
show that
E(ut xt+1 ) = σ 2 λ.
10 Generalized Method of Moments
10.1 Introduction
Standard econometric modelling practice has for a long time been based on strong assumptions
concerning the data generating process underlying a series. Such assumptions, although
often unrealistic, allowed the construction of estimators with optimal theoretical properties. The
most prominent example of this perspective is the maximum likelihood (ML) method, which
requires a complete specification of the model to be estimated, including the probability distribu-
tion of the variables of interest. However, in practice, the investigator may not have full knowl-
edge of the probability distribution to commit himself to such a complete specification of the
econometric model. The generalized method of moments (GMM), discussed in this chapter,
is an alternative estimation procedure which is devised for such circumstances. This estimator
requires only the specification of a set of moment conditions that are deduced from the assump-
tions underlying the econometric model to be estimated. The GMM is particularly attractive to
economists who deal with a variety of moment or orthogonality conditions derived from the
theoretical properties of their economic models. This method is also useful in cases where the
complexity of the economic model makes it difficult to write down a tractable likelihood func-
tion (Bera and Bilias (2002)). Finally, in cases where the distribution of the data is known, the
GMM may be a convenient method to adopt to avoid the computational complexities often asso-
ciated with ML techniques.
The utilization of moment-based estimation techniques dates back to the work by Karl Pear-
son on the method of moments (see Pearson (1894)), although it has been the object of renewed
interest by econometricians since the seminal paper by Hansen (1982) on GMM. Recently,
GMM techniques have been widely applied to analyse economic and financial data, using time
series, cross-sectional or panel data.
In the rest of the chapter we review the estimation theory of the GMM. We also describe the
instrumental variables (IV) approach within the broader context of the GMM.
f (w1 , w2 , . . . , wT ; θ 0 ),
Estimation can be based on the empirical counterpart of E [m(wt , θ )], given by the
r-dimensional vector of sample moments
$$
M_T(\theta) = \frac{1}{T}\sum_{t=1}^{T} m(w_t, \theta). \qquad (10.2)
$$
Example 26 (Linear regression) Consider the linear regression model $y_t = x_t'\beta_0 + u_t$, where $x_t$
is a k-dimensional vector of regressors. Under the classical assumptions, the following population
moment conditions hold
$$
E(x_t u_t) = E\left[x_t (y_t - x_t'\beta_0)\right] = 0, \quad t = 1, 2, \ldots, T.
$$
Suppose now that the regressors, $x_t$, are correlated with the error term, $u_t$, namely $E(x_t u_t) \neq 0$.
This may arise in a variety of circumstances such as errors-in-variables, simultaneous equations, or
rational expectations models. It is well known that in this case the ordinary least squares (OLS)
estimator of $\beta_0$ is biased and inconsistent. Assume we can find an m-dimensional vector of so-called
instrumental variables, $z_t$, satisfying the orthogonality conditions
$$
E(z_t u_t) = E\left[z_t (y_t - x_t'\beta_0)\right] = 0, \quad t = 1, 2, \ldots, T. \qquad (10.3)
$$
$$
\Sigma_{zy} = \Sigma_{zx}\beta_0,
$$
where $\Sigma_{zy} = E(z_t y_t)$, and $\Sigma_{zx} = E(z_t x_t')$. For identification of $\beta_0$, it is required that the $m \times k$
matrix $\Sigma_{zx}$ be of full rank, k, ensuring that $\beta_0$ is the unique solution to (10.3). If $m = k$, then $\Sigma_{zx}$
is invertible and $\beta_0$ may be determined by
$$
\beta_0 = \Sigma_{zx}^{-1}\Sigma_{zy}.
$$
Example 27 (Consumption based asset pricing model) One of the most cited applications of
the GMM principle for estimating econometric models is the Hansen and Singleton (1982) con-
sumption based asset pricing model. This model involves a representative agent who makes invest-
ment and consumption decisions at time t to maximize his/her discounted lifetime utility subject
to a budget constraint. Assume that only one asset is available as a possible investment, and that it yields a
pay-off in the following period. The agent wishes to maximize
$$
E\left[\sum_{i=0}^{\infty} \beta^i U(c_{t+i}) \,\Big|\, \Omega_t\right],
$$
subject to
$$
c_t + q_t = r_t q_{t-1} + w_t,
$$
where $\beta$ is a discount factor, $U(\cdot)$ is a utility function, $c_t$ is consumption in period t, $q_t$ is the quantity
of the asset held in period t which pays $r_t$, $w_t$ is real labour income, and $\Omega_t$ is the information
available at time t. Optimal choice of consumption and investment satisfies
$$
U'(c_t) = \beta E\left[r_{t+1} U'(c_{t+1}) \mid \Omega_t\right],
$$
which can be written as
$$
E\left[\beta\, r_{t+1} \frac{U'(c_{t+1})}{U'(c_t)} \,\Big|\, \Omega_t\right] - 1 = 0.
$$
Hansen and Singleton (1982) set $U(c_t) = (c_t^{\gamma} - 1)/\gamma$ so that the above equation becomes
$$
E\left[\beta\, r_{t+1} \left(\frac{c_{t+1}}{c_t}\right)^{\gamma - 1} \Big|\, \Omega_t\right] - 1 = 0.
$$
The above moment can be exploited for estimation of the unknown parameters, $\beta$ and $\gamma$. In particular,
let $z_t$ be a vector of variables in the agent's information set at time t, namely $z_t \in \Omega_t$. For example,
$z_t$ may contain lagged values $c_t/c_{t-1}$, $c_{t-1}/c_{t-2}$, as well as a constant. Hansen and Singleton (1982) suggest
the following population moment conditions to be used in GMM estimation
$$
E\left\{z_t\left[\beta (1 + r_{t+1}) \left(\frac{c_{t+1}}{c_t}\right)^{\gamma - 1} - 1\right]\right\} = 0.
$$
$$
M_T(\hat{\theta}_T) = \frac{1}{T}\sum_{t=1}^{T} m(w_t, \hat{\theta}_T) = 0.
$$
This estimation procedure was introduced by Pearson (1894), and θ̂ T is usually referred to as
the ‘methods-of-moments’ estimator. Its application yields many familiar estimators, such as the
OLS or the IV.
Example 28 Suppose we wish to estimate the parameters of a univariate normally distributed random
variable, $v_t$. It is well known that the normal distribution depends only on two parameters, the
population mean, $\mu_0$, and the population variance, $\sigma_0^2$. These two parameters satisfy the population
moment conditions
$$
E(v_t) - \mu_0 = 0, \qquad E(v_t^2) - \left(\sigma_0^2 + \mu_0^2\right) = 0.
$$
Hence, Pearson's method involves estimating $(\mu_0, \sigma_0^2)$ by the values $(\hat{\mu}_T, \hat{\sigma}_T^2)$ which satisfy the
analogous sample moment conditions, and therefore are solutions of
$$
\frac{1}{T}\sum_{t=1}^{T} v_t - \hat{\mu}_T = 0, \qquad \frac{1}{T}\sum_{t=1}^{T} v_t^2 - \left(\hat{\sigma}_T^2 + \hat{\mu}_T^2\right) = 0,
$$
which gives
$$
\hat{\mu}_T = \frac{1}{T}\sum_{t=1}^{T} v_t, \qquad \hat{\sigma}_T^2 = \frac{1}{T}\sum_{t=1}^{T} \left(v_t - \hat{\mu}_T\right)^2.
$$
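As a small numerical illustration of the two sample moment conditions, the sketch below solves them for a toy sample. The function name `normal_method_of_moments` and the four data points are illustrative assumptions rather than anything taken from the text.

```python
import numpy as np

def normal_method_of_moments(v):
    """Solve the two sample moment conditions for (mu_hat, sigma2_hat)."""
    v = np.asarray(v, dtype=float)
    m1 = v.mean()                 # sample counterpart of E(v_t)
    m2 = np.mean(v ** 2)          # sample counterpart of E(v_t^2)
    mu_hat = m1                   # first condition: m1 - mu = 0
    sigma2_hat = m2 - mu_hat ** 2 # second condition: m2 - (sigma2 + mu^2) = 0
    return mu_hat, sigma2_hat

sample = np.array([1.0, 2.0, 3.0, 4.0])  # toy data, purely illustrative
mu_hat, sigma2_hat = normal_method_of_moments(sample)
print(mu_hat, sigma2_hat)  # 2.5 1.25
```

The variance estimate coincides with the average squared deviation from the sample mean, so it uses the divisor T rather than T − 1.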
Example 29 (Wright’s demand equation) Wright (1925) considered the following simple simul-
taneous equations system in agricultural demand and supply
$$
q_t^D = \alpha_0 p_t + u_t^D,
$$
$$
q_t^S = \beta_0' n_t + \gamma_0 p_t + u_t^S,
$$
$$
q_t^D = q_t^S = q_t,
$$
where $q_t^D$ and $q_t^S$ represent demand and supply in year t, $p_t$ is the price of the commodity in year
t, $q_t$ equals quantity produced, $n_t$ is a vector of variables affecting supply but not the demand, and
$u_t^D$ and $u_t^S$ are zero mean disturbances. The interest is in estimating $\alpha_0$, given a sample of T
observations on $(q_t, p_t)$. OLS regression of $q_t$ on $p_t$ would yield misleading results, given that price and
output are simultaneously determined. Wright (1925) suggests solving this problem by taking a
variable, $z_t^D$, which is related to price, but is such that $\mathrm{Cov}(z_t^D, u_t^D) = 0$. One example is the input
price or the yield per acre. Then taking the covariance of $z_t^D$ with both sides of the equation for
$q_t^D$ yields
$$
E(z_t^D q_t) - \alpha_0 E(z_t^D p_t) = 0.
$$
The above expression provides a population moment condition that can be exploited for estimating
$\alpha_0$. Using Pearson's method of moments yields
$$
\hat{\alpha}_T = \sum_{t=1}^{T} z_t^D q_t \Big/ \sum_{t=1}^{T} z_t^D p_t,
$$
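The contrast between OLS and the moment-based estimator can be illustrated by simulation. The sketch below is only an illustration under assumed parameter values (demand slope −1, supply slope 1, a scalar supply shifter that also serves as the instrument, standard normal shocks); none of these numbers come from Wright (1925) or from the text.

```python
import numpy as np

rng = np.random.default_rng(123)
T = 20000
alpha0, beta0, gamma0 = -1.0, 1.0, 1.0   # assumed demand and supply parameters

z = rng.normal(size=T)       # supply shifter, used as the instrument z_t^D
u_d = rng.normal(size=T)     # demand disturbance
u_s = rng.normal(size=T)     # supply disturbance

# Market clearing: alpha0*p + u_d = beta0*z + gamma0*p + u_s
p = (beta0 * z + u_s - u_d) / (alpha0 - gamma0)
q = alpha0 * p + u_d

alpha_ols = np.sum(p * q) / np.sum(p ** 2)   # ignores simultaneity
alpha_iv = np.sum(z * q) / np.sum(z * p)     # method-of-moments estimator
print(round(alpha_ols, 2), round(alpha_iv, 2))
```

Because the price depends on the demand disturbance through market clearing, the OLS slope is pulled away from the demand parameter, whereas the ratio of sample cross-moments with the instrument stays close to it in large samples.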
The matrix $A_T$ imposes weights on the r moment conditions in such a way that we obtain a
unique parameter vector $\hat{\theta}_T$. Note that the positive semi-definiteness of $A_T$ implies that
$M_T(\theta)'A_T M_T(\theta) \geq 0$ for any $\theta$. The first-order conditions for minimization of (10.4) are
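where
$$
D_T(\theta) = \frac{\partial M_T(\theta)}{\partial \theta'} = \frac{1}{T}\sum_{t=1}^{T} \frac{\partial m(w_t, \theta)}{\partial \theta'}. \qquad (10.6)
$$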
10.4.1 Consistency
To establish the theoretical properties of the GMM estimator, it is necessary to impose some
regularity conditions on $w_t$ and $m(\cdot)$. Specifically, we assume:
Assumption A1: (i) $E[m(w_t, \theta)]$ exists and is finite for all $\theta \in \Theta$, and for all t; (ii) If
$g_t(\theta) = E[m(w_t, \theta)]$, then there exists a $\theta_0 \in \Theta$ such that $g_t(\theta) = 0$
for all t if and only if $\theta = \theta_0$.
Assumption A2: $\sup_{\theta \in \Theta} \left\| g_t(\theta) - M_T(\theta) \right\| \overset{p}{\to} 0$.
Assumption A1, part (ii), implies that θ 0 can be identified by the population moments, gt (θ ).
Note that if for more than one value of θ we had gt (θ ) = 0 then we would not be able to
identify θ 0 (see also the discussion on observational equivalence in Section 20.9). Assumption
A2 requires that m(wt , θ ) satisfies a uniform weak law of large numbers, so that the difference
between the average sample and population moments converges in probability to zero. Note
that this is a high-level assumption that is satisfied under a variety of more primitive conditions.
For example, under some general conditions on $m(\cdot)$, and if $\Theta$ is compact, Assumption A2 is
satisfied if $\{w_t\}_{t=0}^{\infty}$ satisfies a weak law of large numbers for independent and heterogeneously
distributed processes, stationary and ergodic processes, or mixing processes (see Chapter 8,
Section 8.8, for further details).
Under Assumption A1 it follows that $M_T(\theta)'A M_T(\theta) = 0$ if and only if $\theta = \theta_0$, and
$M_T(\theta)'A M_T(\theta) > 0$, otherwise. Further, under Assumption A2 it is possible to show that
$$
M_T(\theta)'A_T M_T(\theta) - M_T(\theta)'A M_T(\theta) \overset{p}{\to} 0. \qquad (10.7)
$$
The facts that the GMM estimator, $\hat{\theta}_T$, defined in (10.4), minimizes $M_T(\theta)'A_T M_T(\theta)$, that
$\theta_0$ minimizes $M_T(\theta)'A M_T(\theta)$, and relation (10.7) together imply weak consistency of $\hat{\theta}_T$, namely,
$$
\hat{\theta}_T \overset{p}{\to} \theta_0.
$$
See Mátyás (1999) and Amemiya (1985, p. 107), for further details.
$$
[S(\theta_0)]^{-1/2} \sqrt{T} M_T(\theta_0) \overset{d}{\longrightarrow} N(0, I_r), \qquad (10.8)
$$
where
$$
S(\theta_0) = E\left[T M_T(\theta_0) M_T(\theta_0)'\right],
$$
is a nonsingular matrix.
Again, note that Assumption A5 is a high-level assumption that is satisfied under a number of
more primitive conditions. For example, Assumption A5 holds if $\{w_t\}_{t=0}^{\infty}$ satisfies central limit
theorems for independent and heterogeneously distributed processes, stationary and ergodic
processes, or mixing processes (see Chapter 8).
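To prove asymptotic normality, consider the mean-value expansion of $M_T(\hat{\theta}_T)$ around $\theta_0$
$$
M_T(\hat{\theta}_T) = M_T(\theta_0) + D_T(\bar{\theta})(\hat{\theta}_T - \theta_0),
$$
where $\bar{\theta}$ lies, element by element, between $\theta_0$ and $\hat{\theta}_T$. Premultiplying by $D_T(\hat{\theta}_T)'A_T$ gives
$$
D_T(\hat{\theta}_T)'A_T M_T(\hat{\theta}_T) = D_T(\hat{\theta}_T)'A_T M_T(\theta_0) + D_T(\hat{\theta}_T)'A_T D_T(\bar{\theta})(\hat{\theta}_T - \theta_0), \qquad (10.9)
$$
or
$$
D_T(\hat{\theta}_T)'A_T M_T(\theta_0) + D_T(\hat{\theta}_T)'A_T D_T(\bar{\theta})(\hat{\theta}_T - \theta_0) = 0,
$$
given that the left-hand side of (10.9) is the vector of first-order conditions for minimization of
$M_T(\theta)'A_T M_T(\theta)$, and hence equals zero. Rearranging and multiplying by $\sqrt{T}$ we obtain
$$
D_T(\hat{\theta}_T)'A_T D_T(\bar{\theta})\, \sqrt{T}(\hat{\theta}_T - \theta_0) = -D_T(\hat{\theta}_T)'A_T \sqrt{T} M_T(\theta_0). \qquad (10.10)
$$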
$$
\sqrt{T}\left(\hat{\theta}_T(A_T) - \theta_0\right) \overset{a}{\sim} N\left(0,\ (D'AD)^{-1} D'ASA'D\, (D'A'D)^{-1}\right), \qquad (10.11)
$$
where $S = S(\theta_0)$.
or, equivalently, its trace. The traditional proof of this is rather tedious and, therefore, omitted
here. An informed guess for the optimal A, denoted by A∗ , would be
A∗ = S−1 . (10.13)
We can now prove that this is the smallest possible variance matrix. To this end, we will show
that the difference between the variance in equation (10.12) with any arbitrary A and the variance
in equation (10.14), $V_A - V_{A^*}$, is a positive semi-definite matrix. To establish this it is sufficient
to show that $V_{A^*}^{-1} - V_A^{-1}$ is a positive semi-definite matrix. Using the results from equations
(10.12) and (10.14) we have
$$
V_{A^*}^{-1} - V_A^{-1} = (D'S^{-1}D) - \left[(D'AD)^{-1} D'ASA'D\, (D'A'D)^{-1}\right]^{-1}.
$$
Since S is positive definite, we can decompose it using the Cholesky decomposition $S = CC'$,
so that $S^{-1}C = C'^{-1}$. Now define
$$
H = C^{-1}D, \quad \text{and} \quad B = C'A'.
$$
Then
$$
\begin{aligned}
V_{A^*}^{-1} - V_A^{-1} &= (D'C'^{-1}C^{-1}D) - (D'A'D)\left(D'ACC'A'D\right)^{-1}(D'AD) \\
&= D'C'^{-1}\left[I_r - C'A'D\left(D'ACC'A'D\right)^{-1}D'AC\right]C^{-1}D \\
&= H'\left[I_r - BD\left(D'B'BD\right)^{-1}D'B'\right]H.
\end{aligned}
$$
$$
S = \mathrm{Var}\left[\sqrt{T} M_T(\theta_0)\right] = \frac{1}{T}\sum_{t=1}^{T}\sum_{s=1}^{T} E\left[m(w_t, \theta_0)\, m(w_s, \theta_0)'\right]. \qquad (10.15)
$$
In the case m (wt , θ 0 ) is a stationary and ergodic process, under some regularity conditions it
can be shown that
$$
S = \Gamma_0(\theta_0) + \sum_{j=1}^{\infty}\left[\Gamma_j(\theta_0) + \Gamma_j(\theta_0)'\right], \qquad (10.16)
$$
$$
\hat{S}_T = \hat{\Gamma}_0 + \sum_{j=1}^{m} w(j, m)\left(\hat{\Gamma}_j + \hat{\Gamma}_j'\right),
$$
where $\hat{\Gamma}_j = \frac{1}{T}\sum_{t=j+1}^{T} m(w_t, \hat{\theta}_T)\, m(w_{t-j}, \hat{\theta}_T)'$, for $j = 0, 1, \ldots, m$, $w(j, m)$ is the kernel
or lag window, and m is the bandwidth. The kernel and bandwidth must satisfy certain
restrictions to ensure that the HAC estimator is both consistent and positive semi-definite (see Section 5.9 for
details).
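A minimal sketch of such an estimator is given below. It assumes the Bartlett lag window, w(j, m) = 1 − j/(m + 1), which is one common choice that keeps the estimate positive semi-definite; the text leaves the kernel unspecified, so this choice, the function name `hac_bartlett` and the simulated moment series are all assumptions made for illustration.

```python
import numpy as np

def hac_bartlett(m, bandwidth):
    """HAC estimate of S from a (T, r) array of moment contributions.

    Row t of `m` holds m(w_t, theta_hat)'. The Bartlett window is assumed.
    """
    m = np.asarray(m, dtype=float)
    T = m.shape[0]
    S = m.T @ m / T                           # Gamma_hat_0
    for j in range(1, bandwidth + 1):
        gamma_j = m[j:].T @ m[:-j] / T        # (1/T) sum_{t=j+1}^T m_t m_{t-j}'
        weight = 1.0 - j / (bandwidth + 1.0)  # Bartlett lag window w(j, m)
        S += weight * (gamma_j + gamma_j.T)
    return S

# Illustrative serially correlated moments (simulated, MA(1) dependence).
rng = np.random.default_rng(3)
e = rng.normal(size=(400, 2))
m_demo = e.copy()
m_demo[1:] += 0.5 * e[:-1]
S_hat = hac_bartlett(m_demo, bandwidth=4)
print(S_hat)
```

With a bandwidth of zero the function collapses to the contemporaneous term alone, which is the appropriate estimator when the moment contributions are serially uncorrelated.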
Clearly, the calculation of $\hat{S}_T$ requires knowledge of the unknown parameters, $\theta_0$. A two-step
estimation procedure can be followed to compute the optimal GMM. This consists of computing
an initial, consistent estimator of $\theta_0$, $\hat{\theta}_T^{(1)}$, for example using the non-efficient GMM based
on an arbitrary choice of $A_T$. Then, $\hat{\theta}_T^{(1)}$ is used to compute $\hat{S}_T$, the inverse of which is used as the
weighting matrix in (10.4) to obtain the asymptotically efficient GMM estimator, $\hat{\theta}_T^{(2)}$, in the second step. One common
choice of $A_T$ at the first step is the identity matrix, $I_r$. Instead of stopping after just two steps,
the procedure can be continued so that on the jth step the GMM estimation is performed using
a weighting matrix based on $\hat{S}_T$ computed using $\hat{\theta}_T^{(j-1)}$. Such an estimator is known as the iterated GMM
estimator.
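The two-step procedure can be sketched for the special case of moments that are linear in the parameters, m(w_t, β) = z_t(y_t − x_t'β), where the minimizer of the criterion has a closed form. The code below is an illustration under stated assumptions: simulated data, serially uncorrelated moment contributions (so the estimate of S has no autocovariance terms), and hypothetical function names `linear_gmm` and `two_step_gmm`.

```python
import numpy as np

def linear_gmm(y, X, Z, A):
    """Minimise M_T(b)' A M_T(b) for the linear moments M_T(b) = Z'(y - Xb)/T."""
    XZ = X.T @ Z
    return np.linalg.solve(XZ @ A @ XZ.T, XZ @ A @ (Z.T @ y))

def two_step_gmm(y, X, Z):
    T, r = Z.shape
    beta1 = linear_gmm(y, X, Z, np.eye(r))             # step 1: A_T = I_r
    Zu = Z * (y - X @ beta1)[:, None]                  # rows are m(w_t, beta1)'
    S_hat = Zu.T @ Zu / T                              # no autocovariance terms (assumption)
    beta2 = linear_gmm(y, X, Z, np.linalg.inv(S_hat))  # step 2: A_T = S_hat^{-1}
    return beta1, beta2

# Simulated example: one endogenous regressor, three instruments, true slope 2.
rng = np.random.default_rng(42)
T = 5000
Z = rng.normal(size=(T, 3))
v = rng.normal(size=T)
X = (Z @ np.array([1.0, 0.5, 0.5]) + v)[:, None]
y = 2.0 * X[:, 0] + 0.8 * v + rng.normal(size=T)
beta1, beta2 = two_step_gmm(y, X, Z)
print(beta1, beta2)
```

Repeating the second step with residuals from the latest estimate would give the iterated version described above.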
Monte Carlo studies show that the estimated asymptotic standard errors of the two-step and
iterated GMM estimators may be severely downward biased in small samples (see e.g., Hansen,
Heaton, and Yaron (1996) and Arellano and Bond (1991)). To improve finite sample proper-
ties of GMM, Hansen, Heaton, and Yaron (1996) suggested the so-called continuous-updating
GMM (CUGMM) estimator, which is the value of $\theta$ that minimizes $M_T(\theta)'S_T(\theta)^{-1}M_T(\theta)$,
where $S_T(\theta)$ is a matrix function of $\theta$ such that $S_T(\theta) \overset{p}{\to} S$. Note that this estimator does
not depend on an initial choice of the weighting matrix, although it is computationally more
burdensome than the iterated estimator, especially for a large nonlinear model. It is possible to
show that this estimator is asymptotically equivalent to the two-step and iterated estimators,
but may differ in finite samples. However, Anatolyev (2005) demonstrated analytically that the
CUGMM estimator can be expected to exhibit lower finite sample bias than its two-step coun-
terpart.
Windmeijer (2005) showed that the extra variation due to the presence of the estimated
parameters in the efficient weighting matrix accounts for much of the difference between the
finite sample and the estimated asymptotic variance for two-step GMM estimators based on
moment conditions that are linear in the parameters. In response to this problem Windmei-
jer (2005) has proposed a finite sample correction for the estimates of the asymptotic vari-
ance. In a Monte Carlo study the author shows that such bias correction leads to more accurate
inference.
A further complication arises when the number of available moment conditions for GMM
estimation is large, a case that often occurs in practice. For example, the application of GMM
techniques for estimation of dynamic panel data (see Chapter 27) leads, when T increases, to a
rapid rise in the number of orthogonality conditions available for estimation. Even though using
many moment conditions is desirable according to conventional first-order asymptotic theory,
it has been found that the two-step GMM estimator has a considerable bias in finite samples, in
the presence of many moment restrictions (see, e.g., Newey and Smith (2000)). To deal with
this problem, Donald, Imbens, and Newey (2009) developed asymptotic mean-square error cri-
teria (MSE) that can be minimized to choose the number of moments to use in the two-step
GMM estimator. Koenker and Machado (1999) showed that r³/T → 0 as T → ∞ is a sufficient
condition for the limiting distribution of the GMM estimator to remain valid.
Hansen (1982) suggests testing the above hypothesis using T times the minimized value of the
GMM criterion function
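$$
J = \left[\sqrt{T} M_T(\hat{\theta}_T)\right]' \hat{S}_T^{-1} \left[\sqrt{T} M_T(\hat{\theta}_T)\right]. \qquad (10.18)
$$
It can be shown that $J \overset{d}{\longrightarrow} \chi^2(r - q)$. The J-statistic is known as the over-identifying restrictions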
test, and is widely adopted as a diagnostic tool for models estimated by GMM. If the statistic
(10.18) exceeds the relevant critical value of a χ 2 (r − q) distribution, then (10.17) must be
rejected since at least some of the moment conditions are not supported by the data. The J-test
may also be used to investigate whether an additional vector of moments has mean zero and,
thus, may be incorporated in the moment conditions in order to improve inference. To this end,
assume that an initial GMM estimator, θ̃ T , based only on the r1 -dimensional vector, m1 (wt , θ ),
was computed. Then consider
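$$
J_1 = \left[\sqrt{T} M_T(\hat{\theta}_T)\right]' \hat{S}_T^{-1} \left[\sqrt{T} M_T(\hat{\theta}_T)\right] - \left[\sqrt{T} M_{1,T}(\tilde{\theta}_T)\right]' \hat{S}_{1,T}^{-1} \left[\sqrt{T} M_{1,T}(\tilde{\theta}_T)\right],
$$
where $M_{1,T}(\tilde{\theta}_T) = \frac{1}{T}\sum_{t=1}^{T} m_1(w_t, \tilde{\theta}_T)$, and $\hat{S}_{1,T}$ is the corresponding optimal weight matrix.
Under the null hypothesis, $J_1 \overset{d}{\longrightarrow} \chi^2(r - r_1)$, as $T \to \infty$.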
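To make the over-identifying restrictions test concrete, the sketch below computes the J-statistic for linear moment conditions after a two-step GMM fit. The data are simulated, the moment contributions are assumed to be serially uncorrelated, and the function name `j_statistic` is hypothetical. Two cases are compared: one where all three instruments are valid, and one where the third instrument enters the equation directly and so violates its moment condition.

```python
import numpy as np

def j_statistic(y, X, Z):
    """Two-step GMM J-statistic and its degrees of freedom, r - k."""
    T, r = Z.shape
    k = X.shape[1]
    XZ = X.T @ Z

    def gmm(A):
        # Closed-form minimiser of the quadratic form for linear moments.
        return np.linalg.solve(XZ @ A @ XZ.T, XZ @ A @ (Z.T @ y))

    b1 = gmm(np.eye(r))                       # first step, identity weights
    Zu = Z * (y - X @ b1)[:, None]
    S_inv = np.linalg.inv(Zu.T @ Zu / T)      # inverse of estimated S
    b2 = gmm(S_inv)                           # second step
    M = Z.T @ (y - X @ b2) / T                # sample moments at b2
    return float(T * M @ S_inv @ M), r - k

rng = np.random.default_rng(7)
T = 5000
Z = rng.normal(size=(T, 3))
v = rng.normal(size=T)
X = (Z @ np.array([1.0, 0.5, 0.5]) + v)[:, None]
u = 0.8 * v + rng.normal(size=T)

j_valid, dof = j_statistic(2.0 * X[:, 0] + u, X, Z)
j_invalid, _ = j_statistic(2.0 * X[:, 0] + u + 0.5 * Z[:, 2], X, Z)
print(dof, j_valid, j_invalid)
```

The statistic would then be compared with a critical value of the chi-squared distribution with r − k degrees of freedom (here two); in the second case the violation of one moment condition should push it far into the rejection region.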
$$
y_t = \beta' x_t + u_t, \qquad (10.19)
$$
where it is known that $E(x_t u_t) \neq 0$, with $x_t$ being a k-dimensional vector of regressors, and
the errors, $u_t$, satisfying $E(u_t) = 0$, $E(u_t u_s) = 0$ for $t \neq s$ and $E(u_t^2) = \sigma^2$. Let $z_t$ be an
r-dimensional vector of instruments, assumed to be correlated with $x_t$, but independent of $u_s$ for
all s and t. Suppose that the number of instruments is larger than the number of parameters to
be estimated, i.e., $r > k$. The generalized instrumental variable estimator (GIVE), proposed by
Sargan (1958, 1959), combines all available instruments for estimation of $\beta$. Consider the r
population moment conditions
$$
E\left[z_t (y_t - \beta_0' x_t)\right] = 0. \qquad (10.20)
$$
For expositional convenience suppose that the variables $y_t$, $x_t$, and $z_t$ have mean 0. The covariance
matrices of $x_t$ and $z_t$ are denoted by
$$
\Sigma_{xx} = E(x_t x_t'), \quad \Sigma_{zz} = E(z_t z_t'), \quad \Sigma_{xz} = E(x_t z_t').
$$
The sample moments associated with the population moment conditions in equation (10.20)
are given by
$$
M_T(\theta) = \frac{1}{T}\sum_{t=1}^{T} z_t (y_t - \beta' x_t).
$$
Also
$$
S = E\left[\frac{1}{T}\left(\sum_{t=1}^{T} z_t u_t\right)\left(\sum_{s=1}^{T} z_s u_s\right)'\right]
= E\left[\frac{1}{T}\sum_{t,s=1}^{T} z_t z_s' u_t u_s\right],
$$
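which depends on the assumptions made about $u_t$ and $z_t$. Given that in this simple application
the errors, $u_t$, are assumed to be homoskedastic and serially uncorrelated, and distributed
independently of $z_s$ for all s, t, we have
$$
S = \frac{1}{T}\sum_{t=1}^{T}\sum_{\tau=1}^{T} E(z_t z_\tau')\, E(u_t u_\tau)
= \frac{1}{T}\sum_{t=1}^{T} E(z_t z_t')\, \sigma_u^2
= \sigma_u^2 E\left(\frac{Z'Z}{T}\right) = \sigma_u^2 \Sigma_{zz},
$$
and
$$
D = E\left[\frac{\partial m(w_t, \theta_0)}{\partial \theta'}\right]
= E\left[\frac{1}{T}\frac{\partial}{\partial \beta_0'}\sum_{t=1}^{T} z_t (y_t - x_t'\beta)\right]
= -E\left[\frac{1}{T}\sum_{t=1}^{T} z_t x_t'\right] = -\Sigma_{zx}.
$$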
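Denoting the GIVE or GMM estimator of $\beta$ by $\hat{\beta}_{IV}$, it follows from the two results above that
$$
\mathrm{Var}(\sqrt{T}\hat{\beta}_{IV}) = (D'S^{-1}D)^{-1} = \sigma^2 \left(\Sigma_{xz}\Sigma_{zz}^{-1}\Sigma_{zx}\right)^{-1},
$$
$$
\widehat{\mathrm{Var}}(\sqrt{T}\hat{\beta}_{IV}) = \sigma^2 \left[\left(\frac{X'Z}{T}\right)\left(\frac{Z'Z}{T}\right)^{-1}\left(\frac{Z'X}{T}\right)\right]^{-1}. \qquad (10.21)
$$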
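Furthermore,
$$
\hat{A}_T = \hat{D}_T'\hat{S}_T^{-1} = -\frac{1}{\sigma_u^2}\hat{\Sigma}_{zx}'\hat{\Sigma}_{zz}^{-1} = -\frac{1}{\sigma_u^2}(Z'X)'(Z'Z)^{-1}, \qquad (10.22)
$$
and
$$
M_T(\beta) = \frac{1}{T}\sum_{t=1}^{T} z_t (y_t - \beta' x_t) = \frac{1}{T} Z'(Y - X\beta). \qquad (10.23)
$$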
Hence, using the results from equations (10.22) and (10.23) in expression (10.4) we obtain
$$
\hat{\beta}_{IV} = \underset{\beta}{\mathrm{argmin}} \left\| \frac{X'Z(Z'Z)^{-1}Z'(Y - X\beta)}{T} \right\|. \qquad (10.24)
$$
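Define
$$
P_z = Z(Z'Z)^{-1}Z',
$$
so that the estimator satisfies
$$
X'P_z Y - (X'P_z X)\hat{\beta}_T = 0,
$$
with solution $\hat{\beta}_{IV} = (X'P_z X)^{-1}X'P_z Y$,
which is the GIVE (White (1982a)). Using (10.21), an estimator of the variance matrix of
$\hat{\beta}_{IV}$ is $\widehat{\mathrm{Var}}(\hat{\beta}_{IV}) = \hat{\sigma}_{IV}^2 (X'P_z X)^{-1}$, where
$$
\hat{\sigma}_{IV}^2 = \frac{1}{T - k}\hat{u}_{IV}'\hat{u}_{IV}. \qquad (10.27)
$$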
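If $u_t \sim ID(0, \sigma_t^2)$, a heteroskedasticity-consistent estimator of the covariance matrix of $\hat{\beta}_{IV}$
can be computed (see White (1982b) p. 489)
$$
\widehat{HCV}(\hat{\beta}_{IV}) = \frac{T}{T - k}\, Q_T^{-1} P_T' \hat{V}_T P_T Q_T^{-1}, \qquad (10.29)
$$
where
$$
Q_T = X'P_z X, \quad P_T = (Z'Z)^{-1}Z'X, \quad \hat{V}_T = \sum_{t=1}^{T} \hat{u}_{t,IV}^2\, z_t z_t'.
$$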
where $\hat{X} = P_z X$, and $\hat{x}_t = X'Z(Z'Z)^{-1}z_t$. It follows that if Z is specified to include X, then
$\hat{X} = X$, $\hat{x}_t = x_t$, $\hat{u}_{t,IV} = \hat{u}_t$, and $\widehat{HCV}(\hat{\beta}_{IV}) = \widehat{HCV}(\hat{\beta}_{OLS})$ (see Section 4.2).
Before concluding we observe that for an instrument to be valid it must be ‘relevant’ for the
endogenous variables included in the regression equation. When instruments are only weakly
correlated with the included endogenous variables, we have the problem of ‘weak instruments’
or ‘weak identification’, which poses considerable challenges to inference using GMM and IV
methods. Indeed, if instruments are weak, the sampling distributions of GMM and IV statistics
are in general non-normal, and standard GMM and IV point estimates, hypothesis tests, and
confidence intervals are unreliable (see Stock, Wright, and Yogo (2002)).
$$
y_i = \rho \sum_{j=1}^{m} w_{ij} y_j + \beta' x_i + u_i, \quad i = 1, 2, \ldots, m,
$$
where $u_i \sim IID(0, \sigma^2)$, $x_i$ is a vector of strictly exogenous, non-stochastic regressors, and $w_{ij}$ are known
elements of an $m \times m$ matrix, known in the literature as the spatial weights matrix, W. It is assumed
that $w_{ii} = 0$, the matrix W is row-normalized so that the elements of each row add up to 1, and
$|\rho| < 1$. In matrix form the above model is
$$
y = \rho W y + X\beta + u. \qquad (10.31)
$$
The variable $Wy$ is typically referred to as the spatial lag of y. Note that, in general, the elements
of the spatially lagged dependent vector are correlated with those of the disturbance vector, i.e.,
$E(y'W'u) \neq 0$. One implication of this is that the parameters of (10.31) cannot be consistently
estimated by OLS. Under some regularity conditions, (10.31) can be rewritten as
Note that $W E(y)$ can be seen as formed by a linear combination of the columns of the matrices $WX$,
$W^2X$, $W^3X$, .... On the basis of this observation, Kelejian and Prucha (1998) have suggested an
IV estimator for the parameters $\theta = (\rho, \beta')'$ in the spatial regression (10.31), using as instruments
the matrix $Z = (X, WX)$ (see (10.25)).
$\hat{X} = P_z X$ are computed, where $P_z = Z(Z'Z)^{-1}Z'$. Then $\hat{\beta}_{IV}$ is obtained by the OLS regression
of y on $\hat{X}$. Notice, however, that such a two-step procedure does not, in general, produce a correct
estimator of $\sigma^2$, and hence of $\widehat{\mathrm{Var}}(\hat{\beta}_{IV})$. This is because the IV residuals, $\hat{u}_{IV}$, defined by (10.28),
used in the estimation of $\hat{\sigma}_{IV}^2$ are not the same as the residuals obtained at the second stage of
the 2SLS method. To see this denote the 2SLS residuals by $\hat{u}_{2SLS}$ and note that
$$
\begin{aligned}
\hat{u}_{2SLS} &= y - \hat{X}\hat{\beta}_{IV} \\
&= (y - X\hat{\beta}_{IV}) + (X - \hat{X})\hat{\beta}_{IV} \qquad (10.34) \\
&= \hat{u}_{IV} + (X - \hat{X})\hat{\beta}_{IV},
\end{aligned}
$$
where $X - \hat{X}$ is the $T \times k$ matrix of residuals from the regressions of X on Z. Only in the case where
Z is an exact predictor of X will the two sets of residuals be the same.
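The distinction between the two sets of residuals can be verified numerically. The following sketch uses simulated data with one endogenous regressor and three instruments (all parameter values are illustrative assumptions); it computes the IV estimator directly, reproduces it by the second-stage OLS regression, and checks the decomposition of the 2SLS residuals.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 500
Z = rng.normal(size=(T, 3))
v = rng.normal(size=T)
X = (Z @ np.array([1.0, 0.5, 0.5]) + v)[:, None]   # endogenous regressor
y = 2.0 * X[:, 0] + 0.8 * v + rng.normal(size=T)

Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)             # projection on the columns of Z
X_hat = Pz @ X                                     # first-stage fitted values

beta_iv = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)    # (X'PzX)^{-1} X'Pz y
beta_2sls = np.linalg.lstsq(X_hat, y, rcond=None)[0]   # OLS of y on X_hat

u_iv = y - X @ beta_iv          # residuals to be used for sigma^2
u_2sls = y - X_hat @ beta_iv    # second-stage residuals

sigma2_iv = u_iv @ u_iv / (T - X.shape[1])
sigma2_2sls = u_2sls @ u_2sls / (T - X.shape[1])
print(beta_iv, beta_2sls, sigma2_iv, sigma2_2sls)
```

The point estimates coincide, but the two variance estimates differ, which is why standard errors reported by a naive second-stage OLS regression should not be used.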
$$
GR^2 = 1 - \frac{\hat{u}_{2SLS}'\hat{u}_{2SLS}}{\sum_{t=1}^{T}(y_t - \bar{y})^2}, \qquad (10.35)
$$
where $\hat{u}_{2SLS}$, given by (10.34), is the vector of residuals from the second stage in the 2SLS
procedure. Note also that
Pesaran and Smith (1994) show that under reasonable assumptions and for large T, the use of
GR2 is a valid discriminator for models estimated by the IV method.
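$$
\chi_{SM}^2 = \frac{Q(\hat{\beta}_{IV})}{\hat{\sigma}_{IV}^2}, \qquad (10.38)
$$
where
$$
\begin{aligned}
Q(\hat{\beta}_{IV}) &= (y - X\hat{\beta}_{IV})'P_z (y - X\hat{\beta}_{IV}) \\
&= y'\left[P_z - P_z X (X'P_z X)^{-1} X'P_z\right] y \qquad (10.39) \\
&= \hat{y}'\left[I - \hat{X}(\hat{X}'\hat{X})^{-1}\hat{X}'\right]\hat{y}.
\end{aligned}
$$
Under the null hypothesis that the regression equation (10.19) is correctly specified, and that
the r instrumental variables Z are valid instruments, Sargan's misspecification statistic satisfies
$\chi_{SM}^2 \overset{a}{\sim} \chi^2(r - k)$. It is easily seen that $\chi_{SM}^2$ is a special case of the J-statistic introduced in
Section 10.7.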
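$$
W = \begin{pmatrix}
0 & 0 & \ldots & 0 \\
\hat{u}_{1,IV} & 0 & \ldots & 0 \\
\hat{u}_{2,IV} & \hat{u}_{1,IV} & \ldots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
\hat{u}_{T-2,IV} & \hat{u}_{T-3,IV} & \ldots & \hat{u}_{T-p-1,IV} \\
\hat{u}_{T-1,IV} & \hat{u}_{T-2,IV} & \ldots & \hat{u}_{T-p,IV}
\end{pmatrix}, \qquad (10.41)
$$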
and
in which $\hat{X} = P_z X$.1 Note that when Z includes X, then $\hat{X} = X$, and (10.40) reduces to
(5.61). Under the null hypothesis that the disturbances in (10.19) are serially uncorrelated,
$\chi_{SC}^2(p) \overset{a}{\sim} \chi^2(p)$.
10.10 Exercises
1. Consider a random sample x1 , x2 , . . . , xT , drawn from a centered Student’s t-distribution
with ν 0 degrees of freedom, and assume ν 0 > 4 (see expression (B. 39)). Write down two
population moment conditions to estimate ν 0 , exploiting the second and fourth moments of
the distribution. Derive the corresponding sample moments and write down the objective
function for GMM estimation.
2. Consider the linear regression model yt = β 0 xt + ut , where xt is a k-dimensional vector of
regressors assumed to be orthogonal to the error term. Assume that ut is conditionally het-
eroskedastic and serially correlated. Find k population moment conditions for estimation of
β 0 and show that the GMM estimator of β 0 , β̂ T , is equivalent to the OLS estimator. Derive
the covariance matrix of β̂ T .
3. Consider the linear regression model yt = β 0 xt + ut , where xt is a k-dimensional vector of
regressors assumed to be orthogonal to the error term. Assume that ut ∼ IID(0, σ 2t ). Find
a set of population moment conditions for estimation of β 0 and σ 2t , t = 1, 2, . . . , N. Derive
the corresponding sample moments, write down the objective function for GMM estimation,
and derive the matrices D and S.
4. Consider the MA(1) model
yt = μ0 + εt + ψ 0 ε t−1 ,
1 See Breusch and Godfrey (1981, p. 101), for further details. The statistic in (10.40) is derived from the results in Sargan
(1976).
11.1 Introduction
Model selection in econometric analysis involves both statistical and non-statistical considerations.
It depends on the objective(s) of the analysis, the nature and the extent of
economic theory used, and the statistical adequacy of the model under consideration compared
with other econometric models. The various choice criteria discussed below are concerned with
the issue of ‘statistical fit’ and provide different approaches to trading off the ‘fit’ and ‘parsimony’
of a given econometric model.
We also contrast model selection with testing of statistical hypotheses that are non-nested, or
belong to separate families of distributions, meaning that none of the individual models may be
obtained from the remaining models either by imposition of parameter restrictions or through a
limiting process. In econometric analysis non-nested models arise naturally when rival economic
theories are used to explain the same phenomena such as unemployment, inflation or output
growth. Typical examples from economics literature are Keynesian and new classical explana-
tions of unemployment, structural and monetary theories of inflation, alternative theories of
investment, and endogenous and exogenous theories of growth.
Non-nested models could also arise when alternative functional specifications are considered
such as multinomial probit and logit distribution functions used in the qualitative choice liter-
ature, exponential and power utility functions used in the asset pricing models, and a variety of
non-nested specifications considered in the empirical analysis of income and wealth distribu-
tions. Finally, even starting from the same theoretical paradigm, it is possible for different inves-
tigators to arrive at different models if they adopt different conditioning or follow different paths
to a more parsimonious model.
More recently, Bayesian and penalized regression techniques have also been used as alternative
approaches to the problem of model selection and model combination, in particular when there
are a large number of predictors under consideration. We end this chapter with a brief account
of these approaches.
where fi (.) is the probability density function of the model (hypothesis) Mi , and ϕ i is a pi × 1
vector of unknown parameters associated with model Mi .1
The models characterized by fi (W|w0 , ϕ i ) are unconditional in the sense that probability dis-
tribution of wt is fully specified in terms of some initial values, w0 , and for a given value of ϕ i .2 In
econometrics the interest often centres on conditional models, where a vector of ‘endogenous’
variables, yt , is explained (or modelled) conditional on a set of ‘exogenous’ variables, xt . Such
conditional models can be derived from (11.1) by noting that
fi (w1 , w2 , . . . , wT |w0 , ϕ i )
= fi (y1 , y2 , . . . , yT |x1 , x2 , . . . , xT , ψ(ϕ i )) × fi (x1 , x2 , . . . , xT |w0 , κ(ϕ i )), (11.2)
where wt = (yt , xt ) . The unconditional model Mi is decomposed into a conditional model of
yt given xt and a marginal model of xt . Denoting the former by Mi,y|x we have
1 In cases where one or more elements of $z_t$ are discrete, as in probit or Tobit specifications, cumulative probability
distribution functions can be used instead of probability density functions.
2 Strictly speaking, however, the models defined by (11.1) are conditional on the initial values. This is unlikely to present
any difficulties when dealing with ergodic time series models. But in the case of panel data models with short T the formu-
lation of the unconditional model also requires that the distribution of the initial values be specified as well.
3 See Engle, Hendry, and Richard (1983).
exogenous variables according to a subset xit which are included in model Mi , and a subset xit∗
which are excluded. We may then write
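$$
\begin{aligned}
f_i(Y \mid x_1, x_2, \ldots, x_T, w_0, \varphi_i)
&= f_i(Y \mid x_{i1}, x_{i2}, \ldots, x_{iT}, x_{i1}^*, x_{i2}^*, \ldots, x_{iT}^*, w_0, \varphi_i) \\
&= f_i(Y \mid X_i, w_0, \psi_i(\varphi_i)) \times f_i(X_i^* \mid X_i, w_0, c_i(\varphi_i)),
\end{aligned}
$$
where $X_i = (x_{i1}', x_{i2}', \ldots, x_{iT}')'$ and $X_i^* = (x_{i1}^{*\prime}, x_{i2}^{*\prime}, \ldots, x_{iT}^{*\prime})'$. As noted above in the case of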
models differentiated solely by different p.d.f., a comparison of models based upon the partition
of xt into xit and xit∗ should be preceded by determining whether ∂ψ i (ϕ i )/∂ci (ϕ i ) = 0.
The above set up allows consideration of rival models that could differ in the conditioning
set of variables, {xit , i = 1, 2, . . . , m} and/or the functional form of their underlying probability
distribution functions, {fi (·), i = 1, 2, . . . , m}.
where $\Omega_{t-1}$ denotes the set of all past observations on y, x and z, $\theta$ and $\gamma$ are respectively $k_f$ and $k_g$ vectors of unknown parameters belonging to the non-empty compact sets $\Theta$ and $\Gamma$, and where x and z represent the conditioning variables. For the sake of notational simplicity we shall also often use $f_t(\theta)$ and $g_t(\gamma)$ in place of $f(y_t \mid x_t, \Omega_{t-1}; \theta)$ and $g(y_t \mid z_t, \Omega_{t-1}; \gamma)$, respectively.
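Now given the observations $(y_t, x_t, z_t,\ t = 1, 2, \ldots, T)$ and conditional on the initial values $w_0$, the maximum likelihood (ML) estimators of $\theta$ and $\gamma$, denoted by $\hat{\theta}_T$ and $\hat{\gamma}_T$, are obtained by maximizing the log-likelihood functions
\[
L_f(\theta) = \sum_{t=1}^{T} \ln f_t(\theta), \qquad L_g(\gamma) = \sum_{t=1}^{T} \ln g_t(\gamma). \tag{11.7}
\]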
Throughout we shall assume that the conditional densities ft (θ ) and gt (γ ) satisfy the usual reg-
ularity conditions needed to ensure that θ̂ T and γ̂ T have asymptotically normal limiting dis-
tributions under the DGP.4 We allow the DGP to differ from Hf and Hg , and denote it by Hh ,
thus admitting the possibility that both Hf and Hg could be misspecified and that both are likely
to be rejected in practice. In this setting, θ̂ T and γ̂ T are referred to as quasi-ML estimators and
their probability limits under Hh , which we denote by θ h∗ and γ h∗ respectively, are known as
pseudo-true values. These pseudo-true values are defined by
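\[
\theta_{h*} = \underset{\theta \in \Theta}{\operatorname{argmax}}\; E_h\left[T^{-1} L_f(\theta)\right], \qquad
\gamma_{h*} = \underset{\gamma \in \Gamma}{\operatorname{argmax}}\; E_h\left[T^{-1} L_g(\gamma)\right], \tag{11.8}
\]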
where Eh (·) denotes expectations taken under Hh . In the case where wt follows a strictly station-
ary process, (11.8) simplifies to
To ensure global identifiability of the pseudo-true values, it will be assumed that θ f ∗ and γ f ∗
provide unique maxima of Eh T −1 Lf (θ ) and Eh T −1 Lg (γ ) , respectively. Clearly, under Hf ,
namely assuming Hf is the DGP, we have θ f ∗ = θ 0 , and γ f ∗ = γ ∗ (θ 0 ) where θ 0 is the ‘true’
value of θ under Hf . Similarly, under Hg we have γ g∗ = γ 0 , and θ g∗ = θ ∗ (γ 0 ) with γ 0 denot-
ing the ‘true’ value of γ under Hg . The functions γ ∗ (θ 0 ), and θ ∗ (γ 0 ) that relate the parameters
of the two models under consideration are called the binding functions. These functions do not
involve the true model, Hh , and only depend on the models Hf and Hg that are under consider-
ation. We now consider some examples of non-nested models.
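The conditional probability densities associated with these regression models are given by
\[
H_f:\; f(y_t \mid x_t; \theta) = (2\pi\sigma^2)^{-1/2} \exp\left\{ \frac{-1}{2\sigma^2} (y_t - \alpha' x_t)^2 \right\}, \tag{11.12}
\]
\[
H_g:\; g(y_t \mid z_t; \gamma) = (2\pi\omega^2)^{-1/2} \exp\left\{ \frac{-1}{2\omega^2} (y_t - \beta' z_t)^2 \right\}, \tag{11.13}
\]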
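\[
E_h\left[T^{-1} L_f(\theta)\right] = -\frac{1}{2}\ln(2\pi\sigma^2) - \frac{\upsilon^2}{2\sigma^2}
- \frac{\delta'\hat{\Sigma}_{ww}\delta - 2\delta'\hat{\Sigma}_{wx}\alpha + \alpha'\hat{\Sigma}_{xx}\alpha}{2\sigma^2},
\]
where
\[
\hat{\Sigma}_{ww} = T^{-1}\sum_{t=1}^{T} w_t w_t', \qquad
\hat{\Sigma}_{xx} = T^{-1}\sum_{t=1}^{T} x_t x_t', \qquad
\hat{\Sigma}_{wx} = T^{-1}\sum_{t=1}^{T} w_t x_t'.
\]
\[
\theta_{h*} = \begin{pmatrix} \alpha_{h*} \\ \sigma^2_{h*} \end{pmatrix}
= \begin{pmatrix} \hat{\Sigma}_{xx}^{-1}\hat{\Sigma}_{xw}\delta \\
\upsilon^2 + \delta'\left(\hat{\Sigma}_{ww} - \hat{\Sigma}_{wx}\hat{\Sigma}_{xx}^{-1}\hat{\Sigma}_{xw}\right)\delta \end{pmatrix}. \tag{11.15}
\]
Similarly,
\[
\gamma_{h*} = \begin{pmatrix} \beta_{h*} \\ \omega^2_{h*} \end{pmatrix}
= \begin{pmatrix} \hat{\Sigma}_{zz}^{-1}\hat{\Sigma}_{zw}\delta \\
\upsilon^2 + \delta'\left(\hat{\Sigma}_{ww} - \hat{\Sigma}_{wz}\hat{\Sigma}_{zz}^{-1}\hat{\Sigma}_{zw}\right)\delta \end{pmatrix}, \tag{11.16}
\]
where
\[
\hat{\Sigma}_{zz} = T^{-1}\sum_{t=1}^{T} z_t z_t', \qquad
\hat{\Sigma}_{wz} = T^{-1}\sum_{t=1}^{T} w_t z_t'.
\]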
When the regressors are stationary, the unconditional counterparts of the above pseudo-true values can be obtained by replacing $\hat{\Sigma}_{ww}$, $\hat{\Sigma}_{xx}$, $\hat{\Sigma}_{wx}$ etc. with their population values, namely $\Sigma_{ww} = E(w_t w_t')$, $\Sigma_{xx} = E(x_t x_t')$, $\Sigma_{wx} = E(w_t x_t')$ etc. It is clear that the pseudo-true values of the regression coefficients, $\alpha_{h*}$ and $\beta_{h*}$, in general differ from the true values given by $\delta$.
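The pseudo-true values in (11.15) can be evaluated numerically from sample moment matrices. The following is a minimal sketch, assuming a DGP of the form $y_t = \delta' w_t + u_t$ with error variance $\upsilon^2$; the function name `pseudo_true_regression` and the simulated design are illustrative assumptions, not part of the original text.

```python
import numpy as np

def pseudo_true_regression(W, X, delta, upsilon2):
    """Evaluate (11.15): pseudo-true (alpha, sigma^2) of a regression on X
    when the data are generated by y_t = delta'w_t + u_t, Var(u_t) = upsilon2.
    W is T x k_w, X is T x k_x (hypothetical inputs for illustration)."""
    T = W.shape[0]
    S_ww = W.T @ W / T          # sample moment matrix of w_t
    S_xx = X.T @ X / T          # sample moment matrix of x_t
    S_xw = X.T @ W / T          # cross moment matrix of x_t and w_t
    alpha = np.linalg.solve(S_xx, S_xw @ delta)
    # delta'(S_ww - S_wx S_xx^{-1} S_xw) delta, with S_wx = S_xw'
    gap = S_ww - S_xw.T @ np.linalg.solve(S_xx, S_xw)
    sigma2 = upsilon2 + delta @ gap @ delta
    return alpha, sigma2

rng = np.random.default_rng(0)
T = 500
W = rng.normal(size=(T, 3))
delta = np.array([1.0, -0.5, 0.25])

# Misspecified model: the third regressor of the DGP is omitted.
alpha_star, sigma2_star = pseudo_true_regression(W, W[:, :2], delta, 1.0)
# Correctly specified model: X coincides with W.
alpha_full, sigma2_full = pseudo_true_regression(W, W, delta, 1.0)
```

When the regressors of the model coincide with those of the DGP, the pseudo-true values reduce to the true values; otherwise the pseudo-true error variance is inflated by the part of $\delta' w_t$ that the model's regressors cannot span.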
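\[
L_f(\theta) = \sum_{t=1}^{T} y_t \log \Phi(\theta' x_t) + \sum_{t=1}^{T} (1 - y_t) \log\left[1 - \Phi(\theta' x_t)\right],
\]
\[
E_h\left[T^{-1} L_f(\theta)\right] = T^{-1}\sum_{t=1}^{T} H(\delta' x_t) \log \Phi(\theta' x_t)
+ T^{-1}\sum_{t=1}^{T} \left[1 - H(\delta' x_t)\right] \log\left[1 - \Phi(\theta' x_t)\right].
\]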
Therefore, the pseudo-true value of $\theta$, namely $\theta_*(\delta)$ or simply $\theta_*$, satisfies the following equation
\[
T^{-1}\sum_{t=1}^{T} x_t\, \phi(\theta_*' x_t) \left\{ \frac{H(\delta' x_t)}{\Phi(\theta_*' x_t)} - \frac{1 - H(\delta' x_t)}{1 - \Phi(\theta_*' x_t)} \right\} = 0,
\]
where $\phi(\theta_*' x_t) = (2\pi)^{-1/2} \exp\left[-\tfrac{1}{2}(\theta_*' x_t)^2\right]$. It is easily established that the solution of $\theta_*$ in terms of $\delta$ is in fact unique, and $\theta_* = \delta$ if and only if $\Phi(\cdot) = H(\cdot)$. Similar results also obtain for the logistic specification.
with the aim of choosing one of the models under consideration for a particular purpose with a
specific loss (utility) function in mind. In essence, model selection is a part of decision-making
and as argued in Granger and Pesaran (2000a), ideally it should be fully integrated into the
decision-making process. However, most of the current literature on model selection builds on
statistical measures of fit such as sums of squares of residuals or more generally maximized log-
likelihood values, rather than economic benefit which one would expect to follow from a model
choice. Consequently, model selection seems much closer to hypothesis testing than it actually
is in principle.
The model selection process treats all models under consideration symmetrically, while
hypothesis testing attributes a different status to the null and to the alternative hypotheses and
by design treats the models asymmetrically. Model selection always ends in a definite outcome,
namely one of the models under consideration is selected for use in decision-making. Hypoth-
esis testing on the other hand asks whether there is any statistically significant evidence (in the
Neyman–Pearson sense) of departure from the null hypothesis in the direction of one or more
alternative hypotheses. Rejection of the null hypothesis does not necessarily imply acceptance
of any one of the alternative hypotheses; it only warns the investigator of possible shortcomings
of the null that is being advocated. Hypothesis testing does not seek a definite outcome and if
carried out with due care need not lead to a favourite model. For example, in the case of non-
nested hypothesis testing it is possible for all models under consideration to be rejected, or all
models to be deemed as observationally equivalent.
Due to its asymmetric treatment of the available models, the choice of the null hypothesis
plays a critical role in the hypothesis testing approach. When the models are nested the most
parsimonious model can be used as the null hypothesis. But in the case of non-nested models
(particularly when the models are globally non-nested) there is no natural null, and it is impor-
tant that the null hypothesis is selected on a priori grounds.5 Alternatively, the analysis could
be carried out with different models in the set treated as the null. Therefore, the results of non-nested hypothesis testing are less clear cut than in the case where the models are nested.
It is also important to emphasise the distinction between paired and joint non-nested hypoth-
esis tests. Letting f1 denote the null model and fi ∈ M, i = 2, 3, . . . , m index a set of m − 1
alternative models, a paired test is a test of f1 against a single member of M, whereas a joint test
is a test of f1 against multiple alternatives in M.
The distinction between model selection and non-nested hypothesis tests can also be moti-
vated from the perspective of Bayesian versus sampling-theory approaches to the problem of
inference. For example, it is likely that with a large amount of data the posterior probabilities asso-
ciated with a particular hypothesis will be close to one. However, the distinction drawn by Zell-
ner (1971) between ‘comparing’ and ‘testing’ hypotheses is relevant given that within a Bayesian
perspective the progression from a set of prior to posterior probabilities on M, mediated by the
Bayes factor, does not necessarily involve a decision to accept or reject the hypothesis. If a deci-
sion is required it is generally based upon minimizing a particular expected loss function. Thus,
model selection motivated by a decision problem is much more readily reconcilable with the
Bayesian rather than the classical approach to model selection.
Finally, the choice between hypothesis testing and model selection clearly depends on the pri-
mary objective of the exercise. There are no definite rules. Model selection is more appropriate
when the objective is decision-making. Hypothesis testing is better suited to inferential prob-
lems where the empirical validity of a theoretical prediction is the primary objective. A model
may be empirically adequate for a particular purpose but of little relevance for another use. Only
in the unlikely event that the true model is known or knowable will the selected model be uni-
versally applicable. In the real world where the truth is elusive and unknowable both approaches
to model evaluation are worth pursuing.
5 The concepts of globally and partially non-nested models are defined in Pesaran (1987b).
\[
\mathrm{AIC} = \ell_T(\tilde{\theta}) - p, \tag{11.19}
\]
where
In the case of single-equation linear (or nonlinear) regression models, the AIC can also be written equivalently as
\[
\mathrm{AIC}_\sigma = \log(\tilde{\sigma}^2) + \frac{2p}{T}, \tag{11.20}
\]
\[
\mathrm{SBC} = \ell_T(\tilde{\theta}) - \tfrac{1}{2}\, p \log T. \tag{11.21}
\]
In application of the SBC across models, the model with the highest SBC value is chosen. For
regression models an alternative version of (11.21), based on the estimated standard error of the
regression, σ̃ , is given by
6 For linear regression models, the equivalence of (11.19) and (11.20) follows by substituting $\ell(\tilde{\theta}) = -\frac{n}{2}(1 + \log 2\pi) - \frac{n}{2}\log\tilde{\sigma}^2$ in (11.19):
\[
\mathrm{AIC} = -\frac{n}{2}(1 + \log 2\pi) - \frac{n}{2}\log\tilde{\sigma}^2 - p,
\]
hence using (11.20)
\[
\mathrm{AIC} = -\frac{n}{2}(1 + \log 2\pi) - \frac{n}{2}\,\mathrm{AIC}_\sigma.
\]
Therefore, in the case of regression models estimated on the same sample period, the same preference ordering across models will result irrespective of whether AIC or AICσ criteria are used.
\[
\mathrm{SBC}_\sigma = \log(\tilde{\sigma}^2) + \left(\frac{\log T}{T}\right) p.
\]
According to this criterion, a model is chosen if it has the lowest SBCσ value. See C.5 in Appendix C
for a Bayesian treatment and derivations in the case of linear regression models.
\[
\mathrm{HQC} = \ell_T(\tilde{\theta}) - (\log\log T)\, p,
\]
\[
\mathrm{HQC}_\sigma = \log\tilde{\sigma}^2 + \left(\frac{2\log\log T}{T}\right) p.
\]
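The likelihood-based and $\tilde{\sigma}$-based versions of the three criteria can be computed side by side for a linear regression. The sketch below assumes that $\tilde{\sigma}^2$ is the ML estimator (residual sum of squares divided by T) and that p is taken to be the number of regression coefficients; the function name `regression_criteria` and the simulated data are illustrative assumptions.

```python
import numpy as np

def regression_criteria(y, X):
    """Model selection criteria for an OLS regression of y on X.
    Assumes sigma2 is the ML estimator RSS/T and p = number of columns of X."""
    T, p = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    sigma2 = resid @ resid / T
    # Maximized Gaussian log-likelihood of the regression
    loglik = -0.5 * T * (1.0 + np.log(2.0 * np.pi)) - 0.5 * T * np.log(sigma2)
    return {
        "AIC": loglik - p,                              # larger is better
        "SBC": loglik - 0.5 * p * np.log(T),            # larger is better
        "HQC": loglik - p * np.log(np.log(T)),          # larger is better
        "AIC_sigma": np.log(sigma2) + 2.0 * p / T,      # smaller is better
        "SBC_sigma": np.log(sigma2) + p * np.log(T) / T,
        "HQC_sigma": np.log(sigma2) + 2.0 * p * np.log(np.log(T)) / T,
    }

rng = np.random.default_rng(1)
T = 120
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=T)
crit = regression_criteria(y, X)
const = -0.5 * T * (1.0 + np.log(2.0 * np.pi))
```

For a fixed sample, each likelihood-based criterion equals a constant minus T/2 times its $\tilde{\sigma}$-based counterpart, so the two versions rank models identically, with the direction of preference reversed.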
\[
M_2:\; y = Z\beta_2 + u_2, \qquad u_2 \sim N(0, \omega^2 I_T), \tag{11.23}
\]
where y is the T × 1 vector of observations on the dependent variable, X and Z are T × k1 and
T × k2 observation matrices for the regressors of models M1 and M2 , β 1 and β 2 are the k1 × 1
and k2 × 1 unknown regression coefficient vectors, and u1 and u2 are the T × 1 disturbance
vectors.
In the context of these regressions, models M1 and M2 are non-nested if the regressors of
M1 (respectively M2 ) cannot be expressed as an exact linear combination of the regressors of
M2 (respectively M1 ). For a formal definition of the concepts of nested and non-nested mod-
els, see Pesaran (1987b). An early review of the literature on non-nested hypothesis testing is
given in McAleer and Pesaran (1986). A more recent review can be found in Pesaran and Weeks
(2001).
where
Similarly, the Cox statistic N2 is also computed for the test of M2 against M1 .
Pesaran and Deaton (1978) extend the Cox test to non-nested nonlinear system equation
models.
where
Similarly, the Ñ-test statistic, Ñ2 , is also computed for the test of M2 against M1 .
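\[
W_1 = \frac{(T - k_2)(\tilde{\omega}^2 - \tilde{\omega}_*^2)}
{\left\{ 2\tilde{\sigma}^4\, \mathrm{Tr}(B^2) + 4\tilde{\sigma}^2\, \hat{\beta}_1' X' M_2 M_1 M_2 X \hat{\beta}_1 \right\}^{1/2}}. \tag{11.27}
\]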
All the notations are as above. Notice that it is similarly possible to compute a statistic, W2 , for
the test of M2 against M1 .
y = Xβ 1 + λ(Zβ̂ 2 ) + u.
The relevant statistic for the J-test of M2 against M1 is the t-ratio of μ in the OLS regression
y = Zβ 2 + μ(Xβ̂ 1 ) + v,
where β̂ 1 = (X X)−1 X y, and β̂ 2 = (Z Z)−1 Z y. The J-test is asymptotically equivalent to the
above non-nested tests but, as demonstrated by extensive Monte Carlo experiments in Godfrey
and Pesaran (1983), the Ñ-test, and the W-test, defined above, are preferable to the J-test in small
samples.
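The two J-test regressions described above can be sketched as follows. The helper names `ols` and `j_test`, and the simulated data in which M1 is taken to be the DGP, are illustrative assumptions; standard errors use the usual degrees-of-freedom corrected OLS formula.

```python
import numpy as np

def ols(y, X):
    """OLS coefficients and conventional standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - X.shape[1])
    return beta, np.sqrt(s2 * np.diag(XtX_inv))

def j_test(y, X_null, X_alt):
    """t-ratio on the fitted values of the alternative model when they are
    added as an extra regressor to the null model."""
    b_alt, _ = ols(y, X_alt)
    augmented = np.column_stack([X_null, X_alt @ b_alt])
    beta, se = ols(y, augmented)
    return beta[-1] / se[-1]

rng = np.random.default_rng(2)
T = 200
x, z = rng.normal(size=T), rng.normal(size=T)
X = np.column_stack([np.ones(T), x])   # regressors of M1
Z = np.column_stack([np.ones(T), z])   # regressors of M2
y = 1.0 + 2.0 * x + rng.normal(size=T) # simulated under M1 (assumption)

j_m1_against_m2 = j_test(y, X, Z)      # t-ratio of lambda
j_m2_against_m1 = j_test(y, Z, X)      # t-ratio of mu
```

In this simulated design the test of M2 against M1 should reject decisively, since the fitted values from M1 carry almost all of the explanatory power.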
y = Xβ 1 + λ(A2 Xβ̂ 1 ) + u.
The relevant statistic for the JA-test of M2 against M1 is the t-ratio of μ in the OLS regression
y = Zβ 2 + μ(A1 Zβ̂ 2 ) + v.
y = Xa0 + Z∗ δ + u,
where Z∗ denotes the variables in M2 that cannot be expressed as exact linear combinations of
the regressors of M1 . Similarly, it is possible to compute the F-statistic for the test of M2 against
M1 . The encompassing test is asymptotically equivalent to the above non-nested tests under the
null hypothesis, but in general it is less powerful for a large class of alternative non-nested models
(see Pesaran (1982)).
A Monte Carlo study of the relative performance of the above non-nested tests in small sam-
ples can be found in Godfrey and Pesaran (1983).
where f(y) and g(y) are known transformations of the T × 1 vector of observations on the
underlying dependent variable of interest, y. Examples of the functions f(y) and g(y), are
where z is a variable of choice. Notice that log(y) refers to a vector of observations with elements
equal to log(yt ), t = 1, 2, . . . , n. Also y − y(−1) refers to a vector with a typical element equal
to yt − yt−1 , t = 1, 2, . . . , T.
Similarly, the PE statistic for testing Mg against Mf is given by the t-ratio of α g in the auxiliary
regression
\[
g(y) = Zd + \alpha_g \left[ X\hat{\beta}_1 - f\{g^{-1}(Z\hat{\beta}_2)\} \right] + \text{Error}. \tag{11.31}
\]
Functions $f^{-1}(\cdot)$ and $g^{-1}(\cdot)$ represent the inverse functions for $f(\cdot)$ and $g(\cdot)$, respectively, such that $f\left(f^{-1}(y)\right) = y$, and $g\left(g^{-1}(y)\right) = y$. For example, in the case where $M_f$ is linear (i.e., $f(y) = y$) and $M_g$ is log-linear (i.e., $g(y) = \log y$), we have
\[
f^{-1}(y_t) = y_t, \qquad g^{-1}(y_t) = \exp(y_t).
\]
In the case where Mf is in first differences (i.e., f (yt ) = yt − yt−1 ) and Mg is in log-differences
(i.e., g(yt ) = log(yt /yt−1 )) we have
The BM statistic for the test of Mg against Mf is given by the t-ratio of θ g in the auxiliary
regression
where SSRf denotes the sums of squares of residuals from the DL regression
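\[
\begin{pmatrix} e_1/\hat{\sigma} \\ \tau \end{pmatrix}
= \begin{pmatrix} -X \\ 0 \end{pmatrix} b
+ \begin{pmatrix} e_1/\hat{\sigma} \\ -\tau \end{pmatrix} c
+ \begin{pmatrix} -e_2 \\ \hat{\sigma}\hat{v} \end{pmatrix} d + \text{Error}, \tag{11.35}
\]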
where
and $\tau = (1, 1, \ldots, 1)'$ is a T × 1 vector of ones, and $g'(y_t)$ and $f'(y_t)$ stand for the derivatives of $g(y_t)$ and $f(y_t)$ with respect to $y_t$.
To compute the SSRf statistic we first note that
where
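\[
\tilde{y} = \begin{pmatrix} e_1/\hat{\sigma} \\ \tau \end{pmatrix}, \qquad
\tilde{X} = \begin{pmatrix} -X & e_1/\hat{\sigma} & -e_2 \\ 0 & -\tau & \hat{\sigma}\hat{v} \end{pmatrix}.
\]
\[
DL_f = \frac{1}{D}\left\{ k_1 R_1^2 + (2T - k_1) R_3^2 - 2 k_1 R_2 R_3 \right\}, \tag{11.36}
\]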
where
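\[
T_f(R) = -\tfrac{1}{2} T^{1/2} \log(\hat{\sigma}^2/\hat{\omega}^2) + T^{-1/2} \sum_{t=1}^{T} \log\left| f'(y_t)/g'(y_t) \right|
+ \tfrac{1}{2} T^{-1/2}(k_1 - k_2) - T^{1/2}\, C_R(\hat{\theta}, \hat{\gamma}_*(R)), \tag{11.37}
\]
where $\hat{\theta} = (\hat{\beta}_1', \hat{\sigma}^2)'$, R is the number of replications, $\hat{\gamma}_*(R)$ is the simulated pseudo-ML estimator of $\gamma = (\beta_2', \omega^2)'$ under $M_f$:
\[
\hat{\gamma}_*(R) = R^{-1} \sum_{j=1}^{R} \hat{\gamma}_j, \tag{11.38}
\]
where $\hat{\gamma}_j$ is the ML estimator of $\gamma$ computed using the artificially simulated independent observations $Y_j = (Y_{j1}, Y_{j2}, \ldots, Y_{jT})'$ obtained under $M_f$ with $\theta = \hat{\theta}$. $C_R(\hat{\theta}, \hat{\gamma}_*(R))$ is the simulated estimator of the 'closeness' measure of $M_f$ with respect to $M_g$ (see Pesaran (1987b))
\[
C_R(\hat{\theta}, \hat{\gamma}_*(R)) = R^{-1} \sum_{j=1}^{R} \left[ L_f(Y_j, \hat{\theta}) - L_g\left(Y_j, \hat{\gamma}_*(R)\right) \right], \tag{11.39}
\]
where $L_f(Y, \theta)$ and $L_g(Y, \gamma)$ are the average log-likelihood functions under $M_f$ and $M_g$, respectively
\[
L_f(Y, \theta) = -\tfrac{1}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{t=1}^{T} \left[ f(y_t) - \beta_1' x_t \right]^2 / T
+ T^{-1} \sum_{t=1}^{T} \log\left| f'(y_t) \right|, \tag{11.40}
\]
\[
L_g(Y, \gamma) = -\tfrac{1}{2} \log(2\pi\omega^2) - \frac{1}{2\omega^2} \sum_{t=1}^{T} \left[ g(y_t) - \beta_2' z_t \right]^2 / T
+ T^{-1} \sum_{t=1}^{T} \log\left| g'(y_t) \right|. \tag{11.41}
\]
\[
V_{*d}^2(R) = (T - 1)^{-1} \sum_{t=1}^{T} (d_{*t} - \bar{d}_*)^2, \tag{11.42}
\]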
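where $\bar{d}_* = T^{-1} \sum_{t=1}^{T} d_{*t}$, and
\[
d_{*t} = -\tfrac{1}{2} \log\left( \hat{\sigma}^2/\hat{\omega}_*^2(R) \right) - \frac{1}{2\hat{\sigma}^2} e_{t1}^2
+ \frac{1}{2\hat{\omega}_*^2(R)} \left[ g(y_t) - z_t' \hat{\beta}_{*2}(R) \right]^2 + \log\left| f'(y_t)/g'(y_t) \right|,
\]
and
\[
LL_{fg} = -\frac{T}{2} \log(\hat{\sigma}^2/\hat{\omega}^2) + \sum_{t=1}^{T} \log\left| f'(y_t)/g'(y_t) \right| + \tfrac{1}{2}(k_1 - k_2). \tag{11.43}
\]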
One could also apply the known model selection criteria such as AIC and SBC to the models Mf
and Mg (see Section 11.5). For example, in the case of the AIC we have
7 The Monte Carlo results reported in Pesaran and Pesaran (1995) also clearly show that the $SC_c$ and the DL tests are more powerful than the PE or BM tests discussed in Sections 11.7.1 and 11.7.2 above.
8 Note that throughout $\hat{\sigma}^2 = e_1' e_1/(n - k_1)$ and $\hat{\omega}^2 = e_2' e_2/(n - k_2)$ are used as estimators of $\sigma^2$ and $\omega^2$, respectively.
Vuong’s criterion is motivated in the context of testing the hypothesis that Mf and Mg are
equivalent, using the Kullback and Leibler (1951) information criterion as a measure of
goodness of fit. The Vuong (1989) test criterion for the comparison of Mf and Mg is
computed as
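\[
V_{fg} = \frac{\sum_{t=1}^{T} d_t}{\left\{ \sum_{t=1}^{T} (d_t - \bar{d})^2 \right\}^{1/2}}, \tag{11.44}
\]
where $\bar{d} = T^{-1} \sum_{t=1}^{T} d_t$, and
\[
d_t = -\tfrac{1}{2} \log(\hat{\sigma}^2/\hat{\omega}^2) - \tfrac{1}{2} \left( \frac{e_{t1}^2}{\hat{\sigma}^2} - \frac{e_{t2}^2}{\hat{\omega}^2} \right) + \log\left| f'(y_t)/g'(y_t) \right|,
\]
\[
e_{t1} = f(y_t) - \hat{\beta}_1' x_t, \qquad e_{t2} = g(y_t) - \hat{\beta}_2' z_t.
\]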
Under the null hypothesis that ‘Mf and Mg are equivalent’, Vfg is approximately distributed as
a standard normal variate.
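As an illustration of (11.44), the statistic can be computed for the common case of a linear model, f(y) = y, against a log-linear model, g(y) = log y, for which $\log|f'(y_t)/g'(y_t)| = \log y_t$. The sketch below is written under those assumptions; the function name and the simulated log-linear DGP are hypothetical, and the variance estimators use the degrees-of-freedom corrections noted in the text.

```python
import numpy as np

def vuong_linear_vs_loglinear(y, X, Z):
    """Vuong statistic V_fg for M_f: y = X b1 + u1 against
    M_g: log(y) = Z b2 + u2. Requires y > 0."""
    T = len(y)
    b1 = np.linalg.lstsq(X, y, rcond=None)[0]
    b2 = np.linalg.lstsq(Z, np.log(y), rcond=None)[0]
    e1 = y - X @ b1
    e2 = np.log(y) - Z @ b2
    s2 = e1 @ e1 / (T - X.shape[1])   # sigma-hat squared
    w2 = e2 @ e2 / (T - Z.shape[1])   # omega-hat squared
    # f'(y) = 1 and g'(y) = 1/y, so log|f'/g'| = log(y)
    d = -0.5 * np.log(s2 / w2) - 0.5 * (e1**2 / s2 - e2**2 / w2) + np.log(y)
    return d.sum() / np.sqrt(((d - d.mean()) ** 2).sum())

rng = np.random.default_rng(3)
T = 300
x = rng.normal(size=T)
X = np.column_stack([np.ones(T), x])
# Data simulated from a log-linear model (an assumption for this example)
y = np.exp(1.0 + 1.0 * x + 0.2 * rng.normal(size=T))
v_fg = vuong_linear_vs_loglinear(y, X, X)
```

Large positive values of the statistic favour the linear model and large negative values favour the log-linear model; with data simulated from a log-linear DGP the statistic should be negative.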
Example 31 Suppose you are interested in testing the following linear form of the inflation augmented
ARDL(1, 1) model for aggregate consumption (ct )
where ct is real non-durable consumption expenditure in the US, yt is real disposable income, and
π t is the inflation rate in the years 1990 to 1994. Table 11.1 reports the parameter estimates under
models M1 and M2 . The estimates of the parameters of M1 computed under M1 are the OLS esti-
mates (α̂), while the estimates of the parameters of M1 computed under M2 are the pseudo-true
estimators (α̂ ∗ = α̂ ∗ (β̂)). If model M1 is correctly specified, one would expect α̂ and α̂ ∗ to be
near to one another. The same also applies to the estimates of the parameters of model M2 (β).
The bottom part of Table 11.1 gives a number of non-nested statistics for testing the linear versus
the log-linear model and vice versa, computed by simulations, using a number of replications equal
to 100. This table also gives the Sargan (1964) and Vuong (1989) likelihood function criteria
for the choice between the two models. All the tests reject the linear model against the log-linear
model, and none reject the log-linear model against the linear one at the 5 per cent significance
level, although the simulated Cox and the double-length tests also suggest rejection of the log-linear
model at the 10 per cent significance level. Increasing the number of replications to 500 does not
alter this conclusion. The two choice criteria also favour the log-linear specification over the linear
specification.
S-Test is the SCc test proposed by Pesaran and Pesaran (1995) and is the simple version of the simulated Cox
test statistic.
PE-Test is the PE test due to MacKinnon, White, and Davidson (1983).
BM-Test is due to Bera and McAleer (1989).
DL-Test is the double-length regression test statistic due to Davidson and MacKinnon (1984).
9 The exposition in this section follows Garratt et al. (2003a). Also see Section C.5 in Appendix C on Bayesian model
selection.
where fi (.) is the joint probability density function of past and future values of zt . Conditional
on each model, Mi , being true we shall assume that the true value of θ i , which we denote by θ i0
is fixed and remains constant across the estimation and the prediction periods and lies in the
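interior of $\Theta_i$. We denote the maximum likelihood estimator of $\theta_{i0}$ by $\hat{\theta}_{iT}$, and assume that it satisfies the usual regularity conditions so that
\[
\sqrt{T}\left( \hat{\theta}_{iT} - \theta_{i0} \right) \big| M_i \overset{a}{\sim} N\left(0, V_{\theta_i}\right),
\]
where $\overset{a}{\sim}$ stands for 'asymptotically distributed as', and $T^{-1} V_{\theta_i}$ is the asymptotic covariance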
matrix of θ̂ iT conditional on Mi . Under these assumptions, parameter uncertainty only arises
when T is finite. The case where θ i0 could differ across the estimation and forecast periods poses
new difficulties and can be resolved in a satisfactory manner if one is prepared to formalize how
θ i0 changes over time.
The object of interest is the probability density function of $Z_{T+1,h} = (z_{T+1}, z_{T+2}, \ldots, z_{T+h})$ conditional on the available observations at the end of period T, $Z_T = (z_1, z_2, \ldots, z_T)$. This will be denoted by $\Pr(Z_{T+1,h} \mid Z_T)$. For this purpose, models and their parameters serve as intermediate inputs in the process of characterization and estimation of $\Pr(Z_{T+1,h} \mid Z_T)$. The Bayesian approach provides an elegant and logically coherent solution to this problem, with a full solution given by the so-called Bayesian model averaging formula (see, e.g., Draper (1995), Hoeting, Madigan, Raftery, and Volinsky (1999)):
\[
\Pr(Z_{T+1,h} \mid Z_T) = \sum_{i=1}^{m} \Pr(M_i \mid Z_T) \Pr(Z_{T+1,h} \mid Z_T, M_i), \tag{11.46}
\]
Pr (Mi ) is the prior probability of model Mi , Pr(ZT |Mi ) is the integrated or average likelihood
\[
\Pr(Z_T \mid M_i) = \int_{\theta_i} \Pr(\theta_i \mid M_i) \Pr(Z_T \mid M_i, \theta_i)\, d\theta_i, \tag{11.48}
\]
The Bayesian approach requires a priori specifications of Pr (Mi ) and Pr (θ i |Mi ) for i = 1, 2, . . . ,
m, and further assumes that one of the m models being considered is the DGP so that
Pr ZT+1,h |ZT defined by (11.46) is proper.
The Bayesian model averaging formula also provides a simple ‘optimal’ solution to the prob-
lem of pooling of the point forecasts, E(ZT+1,h |ZT , Mi ), studied extensively in the literature,
namely (see, for example, Draper (1995))
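\[
E(Z_{T+1,h} \mid Z_T) = \sum_{i=1}^{m} \Pr(M_i \mid Z_T)\, E(Z_{T+1,h} \mid Z_T, M_i),
\]
\[
V(Z_{T+1,h} \mid Z_T) = \sum_{i=1}^{m} \Pr(M_i \mid Z_T)\, V(Z_{T+1,h} \mid Z_T, M_i)
+ \sum_{i=1}^{m} \Pr(M_i \mid Z_T) \left[ E(Z_{T+1,h} \mid Z_T, M_i) - E(Z_{T+1,h} \mid Z_T) \right]^2,
\]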
where the first term accounts for within model variability and the second term for between model
variability.
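The pooling formulae for the mean and variance can be illustrated for a scalar forecast. The sketch below takes the posterior model probabilities and the model-specific forecast means and variances as given inputs; the function name `bma_pool` and the numbers used are purely illustrative assumptions.

```python
import numpy as np

def bma_pool(post_probs, means, variances):
    """Pool scalar point forecasts across models using posterior model
    probabilities as weights. Returns the pooled mean and pooled variance
    (within-model plus between-model components)."""
    w = np.asarray(post_probs, dtype=float)
    m = np.asarray(means, dtype=float)
    v = np.asarray(variances, dtype=float)
    if not np.isclose(w.sum(), 1.0):
        raise ValueError("posterior model probabilities must sum to one")
    pooled_mean = w @ m
    within = w @ v                         # within-model variability
    between = w @ (m - pooled_mean) ** 2   # between-model variability
    return pooled_mean, within + between

# Two hypothetical models with equal posterior probability
pooled_mean, pooled_var = bma_pool([0.5, 0.5], [1.0, 3.0], [1.0, 1.0])
```

In this example the pooled variance exceeds each model's own forecast variance because the two models disagree about the mean.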
There is no doubt that the Bayesian model averaging (BMA) provides an attractive solution
to the problem of accounting for model uncertainty. But its strict application can be problematic,
particularly in the case of high-dimensional models. The major difficulties lie in the choice of the
space of models to be considered, the model priors Pr (Mi ), and the specification of meaningful
priors for the unknown parameters, Pr (θ i |Mi ). The computational issues, while still consider-
able, are partly overcome by Monte Carlo integration techniques. For an excellent overview of
these issues, see Hoeting et al. (1999). Also see Fernandez et al. (2001) for specific applications.
Putting the problem of model specification to one side, the two important components of the
BMA formula are the posterior probability of the models, Pr (Mi |ZT ), and the posterior density
functions of the parameters, Pr (θ i |ZT , Mi ), for i = 1, . . . , m.
\[
y_t = \sum_{i=1}^{p} \beta_i x_{it} + u_t.
\]
The predictor variables, $x_{it}$, are typically standardized (to make the scale of $\beta_i$ comparable across i). It is assumed that the regressors, $x_{it}$, are strictly exogenous, precluding the inclusion of lagged dependent variables. Most crucially it is assumed that the 'true' regression model is 'sparse' in the sense that only a few $\beta_i$'s are non-zero and the rest are zero. Lasso (least absolute shrinkage and selection operator) regression uses $\sum_{i=1}^{p} |\beta_i|$ as the penalty, which is bounded by the sparseness assumption. Lasso was originally proposed by Tibshirani (1996) and is closely related to the Ridge regression, which uses the penalty term $\sum_{i=1}^{p} \beta_i^2$, which is less restrictive as compared to the Lasso penalty. As shown in Section C.7 of Appendix C, the Ridge regression also results from the application of Bayesian analysis to regression models.
The two penalty terms can also be combined. In general, the penalized regressions can be
computed by solving the optimization problem
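\[
\min_{\beta} \left\{ \sum_{t=1}^{T} \left( y_t - \sum_{i=1}^{p} \beta_i x_{it} \right)^2 + \lambda \sum_{i=1}^{p} \left[ (1 - \alpha) |\beta_i| + \alpha \beta_i^2 \right] \right\},
\]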
where $\lambda$ and $\alpha$ are called tuning parameters and are typically estimated by cross-validation. OLS corresponds to the no penalty case of $\lambda = 0$. When $\lambda \neq 0$, $\alpha = 1$ yields the Ridge regression, and $\alpha = 0$ with $\lambda \neq 0$ yields the Lasso regression. As originally noted by Tibshirani (1996), Lasso is a selection procedure since Lasso optimization yields corner solutions due to the non-differentiable nature of Lasso's penalty function. Penalized regressions, particularly Lasso, are easy to apply and have been shown to work well in the context of independently distributed observations. Although the regressions are linear in structure, nonlinear effects, such as threshold effects, can also be included as predictors. Comprehensive reviews of the penalized regression techniques can be found in Hastie, Tibshirani, and Friedman (2009) and Bühlmann and van de Geer (2012).
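The penalized least squares problem above can be solved by cyclical coordinate descent, one of several possible algorithms. The sketch below is a minimal illustration written for the objective exactly as stated in the text, with penalty $\lambda \sum_i [(1-\alpha)|\beta_i| + \alpha\beta_i^2]$; the function names, iteration limits, and simulated data are assumptions, and the regressors are not standardized inside the function.

```python
import numpy as np

def soft_threshold(a, k):
    """Soft-thresholding operator: shrink a towards zero by k."""
    return np.sign(a) * max(abs(a) - k, 0.0)

def penalized_regression(y, X, lam, alpha, n_iter=500, tol=1e-10):
    """Minimize sum_t (y_t - x_t'b)^2 + lam * sum_i [(1-alpha)|b_i| + alpha*b_i^2]
    by cyclical coordinate descent (alpha=1: Ridge, alpha=0: Lasso)."""
    p = X.shape[1]
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)          # sum of squares of each column
    for _ in range(n_iter):
        beta_old = beta.copy()
        for j in range(p):
            # Partial residual excluding the contribution of regressor j
            r_j = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r_j
            beta[j] = soft_threshold(rho, 0.5 * lam * (1.0 - alpha)) / (
                col_ss[j] + lam * alpha)
        if np.max(np.abs(beta - beta_old)) < tol:
            break
    return beta

rng = np.random.default_rng(4)
T, p = 200, 5
X = rng.normal(size=(T, p))
y = X @ np.array([2.0, 0.0, 0.0, -1.0, 0.0]) + rng.normal(size=T)

b_ols = penalized_regression(y, X, lam=0.0, alpha=0.5)
b_ridge = penalized_regression(y, X, lam=5.0, alpha=1.0)
b_lasso = penalized_regression(y, X, lam=1e6, alpha=0.0)
```

Setting `lam=0` reproduces OLS, `alpha=1` reproduces the closed-form Ridge estimator $(X'X + \lambda I)^{-1}X'y$, and a sufficiently large `lam` with `alpha=0` sets every coefficient exactly to zero, which is the corner-solution property of the Lasso noted in the text.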
In the case of large data sets often encountered in macroeconomics and finance, penalized
regressions must be adapted to deal with temporal dependence and possible structural breaks.
These are topics for future research, but some progress has been made for the analysis of high
dimensional factor-augmented vector autoregressions. For a review of this literature see
Chapter 33.
11.11 Exercises
1. Let f (y, θ ) and g (y, γ ) be the log-likelihood functions under models Hf and Hg , where y is
a T × 1 vector of observations on Y. Define the closeness of model Hf with respect to Hg by
Ifg (θ, γ ) = Ef f (y, θ ) − g (y, γ ) . (11.51)
(a) Show that, in general, Ifg (θ , γ ) is not the same as Igf (γ , θ ). Under what conditions
Ifg (θ , γ ) = 0 ?
(b) Suppose that under Hf , y are draws from the log-normal density
\[
f(y, \theta) = y^{-1} (2\pi\theta_2)^{-1/2} \exp\left\{ \frac{-(\ln y - \theta_1)^2}{2\theta_2} \right\}, \qquad \theta_2 > 0,\; y > 0,
\]
Derive the expression for Ifg (θ ,γ ), and show that Ifg (θ , γ ) > 0 for all values of θ and γ .
What is the significance of this result when comparing log-normal and logistic densities?
2. Suppose that it is known that T observations $y_t$, t = 1, 2, . . . , T, are generated from the MA(1)
process
yt = ε t + θ ε t−1 ,
where ε t ∼ IIDN(0, σ 2 ).
(a) What is the pseudo-true value of ρ if it is incorrectly assumed that yt follows the AR(1)
process
yt = ρyt−1 + ut ,
where ut ∼ IIDN(0, ω2 ).
(b) Derive the divergence measure of the MA(1) process from the AR(1) process and vice
versa. The divergence measure of one density against another is defined by (11.51).
(c) Discuss alternative testing procedures for testing the AR(1) model against MA(1) and
vice versa.
\[
H_f:\; y = X\alpha + u_f, \qquad u_f \sim N(0, \sigma^2 I_T), \tag{11.52}
\]
\[
H_g:\; y = Z\beta + u_g, \qquad u_g \sim N(0, \omega^2 I_T), \tag{11.53}
\]
\[
L_\lambda(y \mid X, Z) = \frac{L_f^{1-\lambda}(y \mid X)\, L_g^{\lambda}(y \mid Z)}{\int L_f^{1-\lambda}(y \mid X)\, L_g^{\lambda}(y \mid Z)\, dy},
\]
where Lf (y |X ) and Lg (y |Z ) are the likelihood functions associated with models Hf and
Hg , respectively.
Hκ : y = (1 − κ)Xα + κZβ + u.
(a) Show that the t-ratio statistic for testing κ = 0, for a given value of β, is given by
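\[
t_\kappa(Z\beta) = \frac{\beta' Z' M_x y}{\hat{\sigma} \left( \beta' Z' M_x Z \beta \right)^{1/2}},
\]
where
\[
\hat{\sigma}^2 = \frac{1}{T - k_f - 1} \left\{ y' M_x y - \frac{\left( \beta' Z' M_x y \right)^2}{\beta' Z' M_x Z \beta} \right\}.
\]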
(b) Derive an expression for supβ {tκ (Zβ)} and discuss its relevance for testing Hf against
Hg , defined by (11.52) and (11.53).
5. Consider the log-normal and the exponential models set out in Question 1 above. Denote the
prior densities of the parameters of the two models by π f (θ ) and π g (γ ).
(a) Suppose you are given the observations y = (y1 , y2 , . . . , yT ) . Derive the posterior odds
of the log-normal against the exponential model assuming that they have the same prior
odds.
(b) Compare the Bayesian posterior odds with the values of Efg (θ 0 , γ ∗ ) and Efg (γ 0 , θ ∗ ) as
T → ∞, where θ 0 (γ 0 ) is the true value of θ(γ ) under Hf (Hg ), and γ ∗ and θ ∗ are the
associated pseudo-true values.
Part III
Stochastic Processes
12 Introduction to Stochastic Processes
12.1 Introduction
Any ordered series may be regarded as a time series. The temporal, immutable order imposed
on the observations is the critical and distinguishing feature of time series. As a result, time
series techniques cannot be generally applied to cross-section observations (such as those over
different individuals, firms, countries, or regions) where they cannot be ordered in an immutable
(time-invariant) fashion. The origin of modern time series analysis dates back to the pioneering
work of Slutsky (1937) and Yule (1926, 1927), on the analysis of the linear combinations of
purely random numbers. There are two main approaches to the analysis of time series; the time
domain and the spectral (or frequency domain) approaches. Time domain techniques are preva-
lent in econometrics, whilst in engineering and oceanography the frequency domain approach
dominates. Until the 1980s, the analysis of time series was largely confined to stationary processes, or processes that can be transformed to stationarity. Since then, important developments have taken place, particularly in the area of non-stationary and nonlinear time series analysis.
Definition 14 (Strict stationarity of order s) The stochastic process $\{y_t, t \in \mathcal{T}\}$ is said to be strictly stationary of order s, if the joint distribution functions of $(y_{t_1}, y_{t_2}, \ldots, y_{t_k})$ and $(y_{t_1+h}, y_{t_2+h}, \ldots, y_{t_k+h})$ are identical for all values of $t_1, t_2, \ldots, t_k$, and h, and all positive integers $k \le s$.
Definition 15 (Strict stationarity) The stochastic process {yt , t ∈ T } is said to be strictly stationary
if it is strictly stationary of order s for any positive integer s.
In effect, under strict stationarity the process is in ‘stochastic equilibrium’ and realizations of
the process obtained over different time intervals would be similar. This is a counterpart of the
concept of static equilibrium in the theory of deterministic processes. One important implica-
tion of strict stationarity is that yt will have the same distribution for all t. The importance of the
stationarity property for empirical analysis is closely tied up with the ergodicity property which,
loosely speaking, ensures consistent estimation of the unknown parameters of the process from
time-averages. For a rigorous account of the ergodicity property and the conditions under which
it holds see, for example, Hannan (1970), and Karlin and Taylor (1975). It is clear that if a pro-
cess is strictly stationary and has second-order moments, then its mean and variance will be time
invariant, namely they do not depend on t.
Another important concept is weak stationarity (or, simply, stationarity).
Therefore, for a weakly stationary process the covariance between any two observations
depends only on the length of time separating the observations. A weakly stationary process is
also referred to as ‘covariance stationary’, or ‘wide-sense stationary’. This definition is, however,
too restrictive for most economic time series that are trended. A related concept which allows
for deterministic trends is the trend-stationary process.
Examples of purely deterministic processes include time trends, seasonal dummies and
sinusoid functions, such as dt = 1 for odd (even) value of t, dt = a0 + a1 t, or more generally
dt = f (t), where f (t) is a deterministic function of t.
Finally, note that a strictly stationary process with finite second-order moments is weakly sta-
tionary, but a weakly stationary process need not be strictly stationary. It is also worth noting
that it is possible for a strictly stationary process not to be weakly stationary. This happens when
the strictly stationary process does not have a second-order moment.
The simplest form of a covariance (or weakly) stationary process is the ‘white noise process’.
Definition 18 The process $\{\varepsilon_t\}$ is said to be a white noise process if it has mean zero, a constant variance, and $\varepsilon_t$ and $\varepsilon_s$ are uncorrelated for all $s \neq t$.
\[
y_t = \sum_{i=0}^{q} a_i \varepsilon_{t-i}, \qquad t = 0, \pm 1, \pm 2, \ldots, \tag{12.1}
\]
where $\{\varepsilon_t\}$ is a white noise process with mean 0 and a constant variance $\sigma^2$, and $a_q \neq 0$. Recall that a white noise process is also serially uncorrelated, namely $E(\varepsilon_t \varepsilon_{t'}) = 0$, for all $t \neq t'$. It is
easily seen that without loss of generality we can set a0 = 1. This process is also referred to as a
‘one-sided moving average process,’ and distinguished from the two-sided representation
$$y_t = \sum_{i=-q}^{q} a_i \varepsilon_{t-i}, \quad t = 0, \pm 1, \pm 2, \ldots .$$
But by letting $\eta_t = \varepsilon_{t+q}$, the above two-sided process can be written as the one-sided moving average process
$$y_t = \sum_{i=0}^{2q} a_i^{*} \eta_{t-i}, \quad t = 0, \pm 1, \pm 2, \ldots,$$
where $a_i^{*} = a_{i-q}$. Therefore, in what follows we focus on the one-sided moving average process, (12.1), and simply refer to it as the moving average process of order $q$, denoted by MA($q$).
It is often useful to write down the moving average process, (12.1), in terms of polynomial lag operators. Denote a first-order lag operator by $L$ and note that by repeated application of the operator we have $L^i \varepsilon_t = \varepsilon_{t-i}$, where $L^0 = 1$, by convention. Then (12.1) can be written as
$$y_t = \left(\sum_{i=0}^{q} a_i L^i\right)\varepsilon_t = a_q(L)\varepsilon_t.$$
For a finite $q$ an MA($q$) process is well defined for all finite values of the coefficients (weights) $a_i$ and is covariance stationary. The autocovariance function of an MA($q$) process is given by
$$\gamma(h) = E(y_t y_{t+h}) = \begin{cases} \sigma^2 \sum_{i=0}^{q-|h|} a_i a_{i+|h|}, & \text{if } 0 \le |h| \le q, \\ 0, & \text{for } |h| > q. \end{cases} \qquad (12.2)$$
Only the first $q$ autocovariances of an MA($q$) process are non-zero. This is rather restrictive for many economic and financial time series, but can be relaxed by letting $q$ tend to infinity. However, certain restrictions must be imposed on the coefficients $\{a_i\}$ for the infinite series $\sum_{i=0}^{q} a_i \varepsilon_{t-i}$ to converge to a well defined limit as $q \to \infty$.
$$\sum_{i=0}^{\infty} |a_i| < \infty.$$
$$\sum_{i=0}^{\infty} a_i^2 < \infty.$$
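It is easily seen that an absolutely summable sequence is also square summable, but the reverse is not true. For a proof, note that
$$\left(\sum_{i=0}^{\infty} |a_i|\right)^2 = \sum_{i=0}^{\infty} a_i^2 + 2\sum_{i>j} |a_i||a_j| > \sum_{i=0}^{\infty} a_i^2.$$
Hence $\left(\sum_{i=0}^{\infty} a_i^2\right)^{1/2} < \sum_{i=0}^{\infty} |a_i|$ and $\sum_{i=0}^{\infty} a_i^2$ will be bounded if $\sum_{i=0}^{\infty} |a_i| < \infty$. To see that the reverse does not hold, note that the sequence $\{1/(i+1)\}$ is square summable, namely
$$\sum_{i=0}^{\infty} \left(\frac{1}{i+1}\right)^2 < \infty,$$
but it is not absolutely summable, since
$$\sum_{i=0}^{\infty} \frac{1}{i+1} = 1 + \frac{1}{2} + \frac{1}{3} + \ldots,$$
in fact diverges.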
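The contrast between the two notions of summability can be checked numerically. The sketch below is only an illustration: the function name `partial_sums` and the chosen truncation points are assumptions made here, not part of the text. It accumulates partial sums of $1/(i+1)$ and $1/(i+1)^2$.

```python
def partial_sums(n_terms):
    """Return (sum of 1/(i+1), sum of 1/(i+1)^2) for i = 0, ..., n_terms - 1."""
    abs_sum = 0.0
    sq_sum = 0.0
    for i in range(n_terms):
        term = 1.0 / (i + 1)
        abs_sum += term          # partial sum of the harmonic series
        sq_sum += term * term    # partial sum of the squared sequence
    return abs_sum, sq_sum


if __name__ == "__main__":
    for n in (10, 1000, 100000):
        abs_sum, sq_sum = partial_sums(n)
        print(n, round(abs_sum, 4), round(sq_sum, 4))
```

The first partial sum keeps growing (roughly like $\log n$), while the second settles quickly, which is what square summability without absolute summability looks like in practice.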
$$y_t = \lim_{q \to \infty} \sum_{i=0}^{q} a_i \varepsilon_{t-i}. \qquad (12.3)$$
Proposition 38 The infinite moving average process exists in the mean squared error sense
$$\lim_{q \to \infty} E\left[\left(y_t - \sum_{i=0}^{q} a_i \varepsilon_{t-i}\right)^2\right] = 0,$$
if the sequence $\{a_i\}$ is square summable, with $E(y_t^2) = \sigma^2 \sum_{i=0}^{\infty} a_i^2 < \infty$.¹
Proposition 39 The infinite moving average process converges almost surely to $y_t = \sum_{i=0}^{\infty} a_i \varepsilon_{t-i}$ if the sequence $\{a_i\}$ is absolutely summable.²
$$\gamma(h) = \sigma^2 \sum_{i=0}^{\infty} a_i a_{i+|h|}. \qquad (12.4)$$
Notice that $\gamma(h) = \gamma(-h)$, and hence $\gamma(h)$ is an even function of $h$. Scaling the autocovariance function by $\gamma(0)$ we obtain the autocorrelation function of order $h$, denoted by $\rho(h)$
$$\rho(h) = \frac{\gamma(h)}{\gamma(0)} = \frac{\sum_{i=0}^{\infty} a_i a_{i+|h|}}{\sum_{i=0}^{\infty} a_i^2}.$$
Clearly $\rho(0) = 1$. It is also readily seen that $\rho^2(h) \le 1$, for all $h$. For a proof first note that, for any real $\lambda$,
$$Var(y_t - \lambda y_{t-h}) = (1 + \lambda^2)\gamma(0) - 2\lambda\gamma(h) \ge 0.$$
Since this inequality holds for all values of $\lambda$, it should also hold for $\lambda_* = \gamma(h)/\gamma(0)$, which globally minimizes its left-hand side. Namely, we must also have
$$(1 + \lambda_*^2)\gamma(0) - 2\lambda_*\gamma(h) = \gamma(0) - \frac{\gamma^2(h)}{\gamma(0)} \ge 0,$$
and since $\gamma(0) > 0$, dividing the last inequality by $\gamma(0)$ we obtain $1 - \rho^2(h) \ge 0$, as desired. For a non-zero $h$ the equality holds if and only if $y_t$ is an exact linear function of $y_{t-h}$.
Finally, we observe that a linear stationary process with absolutely summable coefficients
will have absolutely summable autocovariances. Consider the autocovariance function of the
MA(∞) process given by (12.4). Hence
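$$\sum_{h=0}^{\infty} |\gamma(h)| \le \sigma^2 \sum_{h=0}^{\infty} \sum_{i=0}^{\infty} \left|a_i a_{i+|h|}\right| \le \sigma^2 \sum_{i=0}^{\infty} |a_i| \sum_{h=0}^{\infty} \left|a_{i+|h|}\right|.$$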
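$$y_t = \sum_{i=-\infty}^{\infty} a_i \varepsilon_{t-i} = \left(\sum_{i=-\infty}^{\infty} a_i L^i\right)\varepsilon_t = a(L)\varepsilon_t.$$
Suppose that $\{a_i\}$ is an absolutely summable sequence. Then the autocovariance generating function of $y_t$ is given by
$$G(z) = \sigma^2 a(z) a(z^{-1}),$$
where
$$a(z) = \sum_{i=-\infty}^{\infty} a_i z^i.$$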
Multiplying both sides of the above relationship by zh , summing over h ∈ (−∞, ∞) and taking
expectations we have
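$$G(z) = \sum_{h=-\infty}^{\infty} \sum_{i=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} z^h a_i a_j E\left(\varepsilon_{t-i} \varepsilon_{t-h-j}\right).$$
But $E\left(\varepsilon_{t-i} \varepsilon_{t-h-j}\right)$ is non-zero only if $t - i = t - h - j$, or if $i = h + j$. Hence
$$\begin{aligned}
G(z) &= \sigma^2 \sum_{h=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} z^h a_{h+j} a_j \\
&= \sigma^2 \sum_{h=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} \left(z^{h+j} a_{h+j}\right)\left(z^{-j} a_j\right) \\
&= \sigma^2 \left(\sum_{s=-\infty}^{\infty} a_s z^s\right)\left(\sum_{j=-\infty}^{\infty} a_j z^{-j}\right).
\end{aligned}$$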
The above proof is carried out for a two-sided MA process, but it applies equally to one-sided
MA processes by setting ai = 0 for i < 0. Also, there exist important relationships between
the autocovariance generating function and the spectral density function which we shall dis-
cuss later.
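For a finite one-sided MA process the result $G(z) = \sigma^2 a(z)a(z^{-1})$ can be verified by brute force: expanding the product term by term and reading off the coefficient of $z^h$ should reproduce $\gamma(h)$ in (12.2). The sketch below does this; the function names and the example weights are illustrative assumptions, not taken from the text.

```python
def acgf_coefficients(a, sigma2=1.0):
    """Coefficients of G(z) = sigma2 * a(z) * a(1/z), keyed by the power of z."""
    coeffs = {}
    for i, a_i in enumerate(a):        # term a_i z^i from a(z)
        for j, a_j in enumerate(a):    # term a_j z^(-j) from a(1/z)
            power = i - j
            coeffs[power] = coeffs.get(power, 0.0) + sigma2 * a_i * a_j
    return coeffs


def ma_autocovariance(a, h, sigma2=1.0):
    """gamma(h) from (12.2) for an MA(q) process with weights a[0], ..., a[q]."""
    q = len(a) - 1
    h = abs(h)
    if h > q:
        return 0.0
    return sigma2 * sum(a[i] * a[i + h] for i in range(q - h + 1))


if __name__ == "__main__":
    weights = [1.0, 0.5, -0.3]         # hypothetical MA(2) weights
    G = acgf_coefficients(weights, sigma2=2.0)
    for h in range(-3, 4):
        print(h, G.get(h, 0.0), ma_autocovariance(weights, h, sigma2=2.0))
```

The two columns printed for each $h$ coincide, and both vanish for $|h| > q$.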
In a number of time series applications, a stationary stochastic process is obtained from another stationary stochastic process through an infinite-order MA filtration. Examples include consumption growth obtained from the growth of real disposable income, long term real interest rates from short term rates, or equity returns derived from dividend growth. The following proposition establishes the conditions under which such infinite-order filtrations exist and are stationary with absolutely summable autocovariances.
Proposition 41 Consider the following two infinite moving average processes with absolutely summable coefficients
$$y_t = \sum_{i=0}^{\infty} a_i x_{t-i} = \left(\sum_{i=0}^{\infty} a_i L^i\right) x_t = a(L) x_t,$$
and
$$x_t = \sum_{i=0}^{\infty} b_i \varepsilon_{t-i} = \left(\sum_{i=0}^{\infty} b_i L^i\right)\varepsilon_t = b(L)\varepsilon_t.$$
Then
$$y_t = a(L) x_t = a(L) b(L) \varepsilon_t = c(L)\varepsilon_t = \sum_{i=0}^{\infty} c_i \varepsilon_{t-i},$$
where $\{c_i\}$ is an absolutely summable sequence, and $y_t$ is a stationary process with absolutely summable autocovariances.
$$c(L) = a(L) b(L),$$
then
$$\begin{aligned}
c_0 &= a_0 b_0, \\
c_1 &= a_0 b_1 + a_1 b_0, \\
c_2 &= a_0 b_2 + a_1 b_1 + a_2 b_0, \\
&\;\;\vdots \\
c_i &= a_0 b_i + a_1 b_{i-1} + \ldots + a_i b_0, \text{ and so on.}
\end{aligned}$$
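Then
$$\sum_{i=0}^{\infty} |c_i| \le |a_0| \sum_{i=0}^{\infty} |b_i| + |a_1| \sum_{i=0}^{\infty} |b_i| + \ldots,$$
or
$$\sum_{i=0}^{\infty} |c_i| \le \left(\sum_{i=0}^{\infty} |a_i|\right)\left(\sum_{i=0}^{\infty} |b_i|\right),$$
which establishes the absolute summability of $\{c_i\}$ considering that $\sum_{i=0}^{\infty} |a_i| < K < \infty$, and $\sum_{i=0}^{\infty} |b_i| < K < \infty$. Also since $y_t = \sum_{i=0}^{\infty} c_i \varepsilon_{t-i}$, then by (12.5) it follows that $y_t$ has absolutely summable autocovariances.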
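The coefficients $c_i$ are the convolution of $\{a_i\}$ and $\{b_i\}$, so the bound on $\sum |c_i|$ can be illustrated with truncated sequences. The sketch below uses two geometric sequences chosen purely for illustration (the values 0.5 and -0.8 and the truncation length are assumptions).

```python
def convolve(a, b):
    """Coefficients of c(L) = a(L) b(L) for finite coefficient lists a and b."""
    c = [0.0] * (len(a) + len(b) - 1)
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            c[i + j] += a_i * b_j      # a_i L^i times b_j L^j contributes to L^(i+j)
    return c


def abs_sum(seq):
    """Sum of absolute values of a finite sequence."""
    return sum(abs(x) for x in seq)


if __name__ == "__main__":
    n = 200                            # truncation point for the infinite sums
    a = [0.5 ** i for i in range(n)]
    b = [(-0.8) ** i for i in range(n)]
    c = convolve(a, b)
    print(c[:3])                       # c_0, c_1, c_2
    print(abs_sum(c), abs_sum(a) * abs_sum(b))
```

The first printed sum never exceeds the second, in line with the inequality used in the proof.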
These four components are usually combined together using either an additive or a multiplica-
tive model. The latter is often transformed into an additive structure using the log-transformation.
Most statistical procedures are concerned with modelling of the cyclical component and usually
take trend and seasonal patterns as given or specified a priori by the investigator. Further discus-
sion can be found in Mills (1990, 2003).
The meaning and the importance of stationarity can be appreciated in the context of the
famous decomposition theorem due to Wold (1938). Wold proved that any stationary process
can be decomposed into the sum of a deterministic (perfectly predictable) and a purely non-
deterministic (stochastic) component. More formally
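Theorem 42 (Wold’s decomposition) Any trend-stationary process $y_t$ can be represented in the form of
$$y_t = d_t + \sum_{i=0}^{\infty} \alpha_i \varepsilon_{t-i},$$
where $\alpha_0 = 1$, and $\sum_{i=0}^{\infty} \alpha_i^2 < K < \infty$. The term $d_t$ is a deterministic component, while $\{\varepsilon_t\}$ is a serially uncorrelated process defined by
$$\varepsilon_t = y_t - E(y_t \mid I_{t-1}), \quad t = 1, 2, \ldots,$$
$$E\left(\varepsilon_t^2 \mid I_{t-1}\right) = \sigma^2 > 0,$$
$$E(\varepsilon_t d_s) = 0, \quad \text{for all } s \text{ and } t,$$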
In the above decomposition, $\varepsilon_t$ is the error in the one step ahead forecast of $y_t$, and is also known as the ‘innovation error’. As noted in Definition 17, the deterministic component, $d_t$, is also known as the perfectly predictable component of $y_t$, in the sense that $E(d_t \mid I_{t-1}) = d_t$. Further discussion on Wold’s decomposition theorem can be found in Nerlove, Grether, and Carvalho (1979) and in Brockwell and Davis (1991).
$K < \infty$, it is possible to approximate $\alpha(z)$ by a ratio of two finite-order polynomials, $\phi_p(z)/\theta_q(z)$, where $\phi_p(z) = \sum_{i=0}^{p} \phi_i z^i$ and $\theta_q(z) = \sum_{i=0}^{q} \theta_i z^i$, for sufficiently large, but finite, $p$ and $q$. This yields the general form of an ARMA($p, q$) process which is given by
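$$y_t = \sum_{i=1}^{p} \phi_i y_{t-i} + \sum_{i=0}^{q} \theta_i \varepsilon_{t-i}, \quad \theta_0 = 1. \qquad (12.6)$$
It is easily seen that the MA part of the process, $u_t = \sum_{i=0}^{q} \theta_i \varepsilon_{t-i}$, is stationary for any finite $q$, and hence $y_t$ is stationary if the AR part of the process is stationary. Consider the process
$$y_t = \sum_{i=1}^{p} \phi_i y_{t-i} + u_t,$$
$$y_t^{g} = \sum_{i=1}^{p} A_i \lambda_i^{t},$$
$$\lambda^{t} = \sum_{i=1}^{p} \phi_i \lambda^{t-i}. \qquad (12.7)$$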
The above general solution assumes that the roots of this characteristic equation are distinct. More complicated solution forms follow when two or more roots are identical, but the main conclusions are unaffected by such complications. For the $y_t$ process to be stationary it is necessary that all the roots of (12.7) lie strictly inside the unit circle. Alternatively, the condition can be written in terms of $z = \lambda^{-1}$, thus requiring that all the roots of
$$1 - \sum_{i=1}^{p} \phi_i z^i = 0, \qquad (12.8)$$
lie outside the unit circle. The ARMA process is said to be invertible (so that $y_t$ can be solved uniquely in terms of its past values) if all the roots of
$$1 + \sum_{i=1}^{q} \theta_i z^i = 0, \qquad (12.9)$$
$$\rho(1) = \frac{\theta}{1 + \theta^2}, \quad \text{and,}$$
$$\rho(h) = 0, \quad \text{for } h > 1.$$
It is easily seen that, for a given value of $\rho(1)$, the moving average parameter, $\theta$, is not unique; for any choice of $\theta$, its inverse also satisfies the relationship, $\rho(1) = \theta/(1 + \theta^2) = \theta^{-1}/(1 + \theta^{-2})$. Also notice that for obtaining a real-valued solution $\theta$ in terms of $\rho(1)$, it must be that $|\rho(1)| \le \frac{1}{2}$. Similar conditions apply to the more general MA($q$) process defined by
12.6.2 AR processes
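First, consider the first-order autoregressive process, denoted by AR(1)
$$y_t = \phi y_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim (0, \sigma^2).$$
This process is stationary if $|\phi| < 1$. Under this condition $y_t$ can be written as an infinite MA process with absolutely summable coefficients
$$y_t = \sum_{i=0}^{\infty} \phi^i \varepsilon_{t-i} = \left(\frac{1}{1 - \phi L}\right)\varepsilon_t.$$
Therefore, using results in Proposition 40, the autocovariance generating function of the AR(1) process is given by
$$\begin{aligned}
G(z) &= \sigma^2 \left(\frac{1}{1 - \phi z}\right)\left(\frac{1}{1 - \phi z^{-1}}\right) \\
&= \sigma^2 \left(1 + \phi z + \phi^2 z^2 + \ldots\right)\left(1 + \phi z^{-1} + \phi^2 z^{-2} + \ldots\right) \\
&= \frac{\sigma^2}{1 - \phi^2}\left[1 + \sum_{h=1}^{\infty} \phi^h \left(z^h + z^{-h}\right)\right].
\end{aligned}$$
Hence
$$\text{Autocovariance function}: \gamma(h) = \frac{\sigma^2 \phi^{|h|}}{1 - \phi^2},$$
$$\text{Autocorrelation function}: \rho(h) = \phi^{|h|}.$$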
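As a numerical cross-check, the closed-form AR(1) autocovariances can be compared with the general MA($\infty$) formula (12.4) evaluated at $a_i = \phi^i$ and truncated after a large number of terms. The sketch below assumes a truncation of 500 terms and a value $\phi = 0.7$ purely for illustration.

```python
def ar1_autocov_closed(phi, h, sigma2=1.0):
    """Closed-form gamma(h) = sigma2 * phi^|h| / (1 - phi^2) for a stationary AR(1)."""
    return sigma2 * phi ** abs(h) / (1 - phi ** 2)


def ar1_autocov_truncated(phi, h, sigma2=1.0, n_terms=500):
    """gamma(h) from (12.4) with a_i = phi^i, truncating the infinite sum."""
    h = abs(h)
    return sigma2 * sum(phi ** i * phi ** (i + h) for i in range(n_terms))


if __name__ == "__main__":
    phi = 0.7
    for h in range(6):
        print(h, ar1_autocov_closed(phi, h), ar1_autocov_truncated(phi, h))
```

For $|\phi|$ well below one the truncation error is negligible and the two columns agree to machine precision; closer to the unit circle more terms would be needed.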
$$\phi(L) y_t = \varepsilon_t,$$
where
$$\phi(L) = 1 - \phi_1 L - \phi_2 L^2 - \cdots - \phi_p L^p.$$
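To derive the conditions under which this process is stationary it is convenient to consider its so-called companion form as the following first-order vector autoregressive process
$$\mathbf{y}_t = \boldsymbol{\Phi} \mathbf{y}_{t-1} + \boldsymbol{\xi}_t,$$
where
$$\mathbf{y}_t = \begin{pmatrix} y_t \\ y_{t-1} \\ \vdots \\ y_{t-p+1} \end{pmatrix}, \quad
\boldsymbol{\Phi} = \begin{pmatrix} \phi_1 & \phi_2 & \ldots & \phi_p \\ 1 & 0 & \ldots & 0 \\ \vdots & \ddots & & \vdots \\ 0 & \ldots & 1 & 0 \end{pmatrix}, \quad
\boldsymbol{\xi}_t = \begin{pmatrix} \varepsilon_t \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$
$$\mathbf{y}_t = \boldsymbol{\Phi}^{t} \mathbf{y}_0 + \sum_{j=0}^{t-1} \boldsymbol{\Phi}^{j} \boldsymbol{\xi}_{t-j}.$$
$$\lim_{t \to \infty} \boldsymbol{\Phi}^{t} = \mathbf{0}.$$
This condition is satisfied if all the eigenvalues of the companion matrix, $\boldsymbol{\Phi}$, lie inside the unit circle, which is equivalent to the absolute values of all the roots of $\phi(z) = 0$ being strictly larger than unity (see (12.8)). Under this condition the AR process has the following infinite-order MA representation
$$y_t = \sum_{i=0}^{\infty} \alpha_i \varepsilon_{t-i} = \alpha(L)\varepsilon_t,$$
where
$$\alpha(L)\phi(L) \equiv 1,$$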
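$$\sum_{i=1}^{\infty} i|\alpha_i| < \infty, \quad \text{and} \quad \sum_{i=1}^{\infty} i\alpha_i^2 < \infty. \qquad (12.11)$$
To see why, suppose that $|\alpha_i| < K\rho^i$, where $0 < \rho < 1$, and $K$ is a positive finite constant. Then
$$\begin{aligned}
\sum_{i=1}^{\infty} i|\alpha_i| &< K \sum_{i=1}^{\infty} i\rho^i = K\rho \sum_{i=1}^{\infty} i\rho^{i-1} \\
&= K\rho \frac{d}{d\rho}\left(\sum_{i=1}^{\infty} \rho^i\right) = K\rho \frac{d}{d\rho}\left(\frac{\rho}{1 - \rho}\right) \\
&= \frac{K\rho}{(1 - \rho)^2} < \infty.
\end{aligned}$$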
$$G(z) = \frac{\sigma^2}{\phi(z)\phi(z^{-1})}.$$
This result can now be used to derive the autocovariances of the AR($p$) process. But a simpler approach would be to use the Yule–Walker equations, which can be readily obtained by pre-multiplying (12.10) with $y_t, y_{t-1}, y_{t-2}, \ldots, y_{t-p}$, and then taking expectations. Namely
Taking expectations of both sides of this relation, and recalling that under stationarity $\gamma(h) = \gamma(-h)$, we have
$$E(y_{t-h}\varepsilon_t) = \begin{cases} \sigma^2, & \text{for } h = 0, \\ 0, & \text{for } h > 0. \end{cases}$$
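Therefore, we have
$$\gamma(0) = \phi_1\gamma(1) + \phi_2\gamma(2) + \ldots + \phi_p\gamma(p) + \sigma^2, \qquad (12.12)$$
and, for $h = 1, 2, \ldots, p$,
$$\gamma(h) = \phi_1\gamma(h-1) + \phi_2\gamma(h-2) + \ldots + \phi_p\gamma(h-p). \qquad (12.13)$$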
The system of equations (12.12) and (12.13) is known as the Yule–Walker equations and can be used in two ways: to solve for the autocovariances, recursively, from the autoregressive coefficients; and to estimate the autoregressive coefficients consistently from estimates of the autocovariances. Writing (12.13) in matrix notation we have
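$$\begin{pmatrix} \gamma(0) & \gamma(1) & \ldots & \gamma(p-1) \\ \gamma(1) & \gamma(0) & \ldots & \gamma(p-2) \\ \vdots & \vdots & \ddots & \vdots \\ \gamma(p-1) & \gamma(p-2) & \ldots & \gamma(0) \end{pmatrix}
\begin{pmatrix} \phi_1 \\ \phi_2 \\ \vdots \\ \phi_p \end{pmatrix} =
\begin{pmatrix} \gamma(1) \\ \gamma(2) \\ \vdots \\ \gamma(p) \end{pmatrix},$$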
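which can be used to compute the autoregressive coefficients, $\phi_i$, in terms of the autocovariances. Alternatively, we have
$$\begin{pmatrix} 1 & -\phi_1 & \cdots & -\phi_p \\ -\phi_1 & 1 & \cdots & -\phi_{p-1} \\ \vdots & \vdots & \ddots & \vdots \\ -\phi_p & -\phi_{p-1} & \cdots & 1 \end{pmatrix}
\begin{pmatrix} \gamma(0) \\ \gamma(1) \\ \vdots \\ \gamma(p) \end{pmatrix} =
\begin{pmatrix} \sigma^2 \\ 0 \\ \vdots \\ 0 \end{pmatrix},$$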
which can be used to solve for autocovariances in terms of the autoregressive coefficients. For
example, in the case of the AR(2) process we have
and
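$$\gamma(0) = \frac{(1 - \phi_2)\sigma^2}{(1 + \phi_2)\left[(1 - \phi_2)^2 - \phi_1^2\right]},$$
$$\gamma(1) = \frac{\phi_1 \sigma^2}{(1 + \phi_2)\left[(1 - \phi_2)^2 - \phi_1^2\right]},$$
$$\gamma(2) = \frac{\left[\phi_1^2 + \phi_2(1 - \phi_2)\right]\sigma^2}{(1 + \phi_2)\left[(1 - \phi_2)^2 - \phi_1^2\right]}.$$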
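The AR(2) expressions above can be checked against the Yule–Walker relations $\gamma(0) = \phi_1\gamma(1) + \phi_2\gamma(2) + \sigma^2$, $\gamma(1) = \phi_1\gamma(0) + \phi_2\gamma(1)$ and $\gamma(2) = \phi_1\gamma(1) + \phi_2\gamma(0)$, and then extended to higher lags with the same recursion. The sketch below is illustrative only; the function name and the parameter values $\phi_1 = 0.5$, $\phi_2 = 0.3$ are assumptions.

```python
def ar2_autocovariances(phi1, phi2, sigma2=1.0, max_lag=5):
    """Autocovariances gamma(0), ..., gamma(max_lag) of a stationary AR(2)."""
    denom = (1 + phi2) * ((1 - phi2) ** 2 - phi1 ** 2)
    gamma = [
        (1 - phi2) * sigma2 / denom,                       # gamma(0)
        phi1 * sigma2 / denom,                             # gamma(1)
        (phi1 ** 2 + phi2 * (1 - phi2)) * sigma2 / denom,  # gamma(2)
    ]
    # Higher lags follow the difference equation gamma(h) = phi1*gamma(h-1) + phi2*gamma(h-2).
    for h in range(3, max_lag + 1):
        gamma.append(phi1 * gamma[h - 1] + phi2 * gamma[h - 2])
    return gamma[: max_lag + 1]


if __name__ == "__main__":
    g = ar2_autocovariances(0.5, 0.3)
    print([round(v, 5) for v in g])
```

With these illustrative values the first three autocovariances are about 2.24359, 1.60256 and 1.47436.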
Using the above expressions for γ (1) and γ (2) as initial conditions, the difference equation
(12.13) can now be used to solve for higher-order autocovariances either recursively or directly
in terms of the roots of (12.8). Assuming the roots of (12.8) are distinct (real or complex) and
denoting the inverse of these roots by λ1 and λ2 we have
Notice also that the stability conditions on the roots of $1 - \phi_1 z - \phi_2 z^2 = 0$ are satisfied if
$$1 - \phi_2 - \phi_1 > 0,$$
$$1 - \phi_2 + \phi_1 > 0,$$
$$1 + \phi_2 > 0.$$
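The three inequalities can be compared with a direct root check. The sketch below works with the inverse roots $\lambda$ of $\lambda^2 - \phi_1\lambda - \phi_2 = 0$, i.e. (12.7) with $p = 2$, and tests whether both lie strictly inside the unit circle. The function names are illustrative assumptions.

```python
import cmath


def ar2_inverse_roots(phi1, phi2):
    """Roots of lambda^2 - phi1*lambda - phi2 = 0 (real or complex)."""
    disc = cmath.sqrt(phi1 ** 2 + 4 * phi2)
    return (phi1 + disc) / 2, (phi1 - disc) / 2


def stationary_by_roots(phi1, phi2):
    """True if both inverse roots lie strictly inside the unit circle."""
    return all(abs(lam) < 1 for lam in ar2_inverse_roots(phi1, phi2))


def stationary_by_inequalities(phi1, phi2):
    """True if the three stability inequalities for the AR(2) process hold."""
    return (1 - phi2 - phi1 > 0) and (1 - phi2 + phi1 > 0) and (1 + phi2 > 0)


if __name__ == "__main__":
    for phi1, phi2 in [(0.5, 0.3), (0.5, 0.6), (1.0, -0.5), (0.0, -1.1)]:
        print(phi1, phi2, stationary_by_roots(phi1, phi2),
              stationary_by_inequalities(phi1, phi2))
```

Away from the boundary of the stability region the two checks give the same answer; exactly on the boundary floating point rounding may make them differ.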
12.8 Exercises
1. Which of the following autoregressive processes are stationary?
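$$y_t = \phi y_{t-1} + \varepsilon_t + \theta\varepsilon_{t-1},$$
$$y_t = \rho_1 y_{t-1} + \rho_2 y_{t-2} + \varepsilon_t.$$
$$x_t = \frac{\mu}{1 - \rho_1 - \rho_2} + \sum_{i=0}^{\infty} \alpha_i \varepsilon_{t-i},$$
$$y_t = \rho_1 y_{t-1} + \rho_2 y_{t-2} + \varepsilon_t,$$
(c) Using the above result, or otherwise, derive the following Yule–Walker equations for the AR(2) model in (12.14)
$$\gamma_0 = \rho_1\gamma_1 + \rho_2\gamma_2 + \sigma^2,$$
$$\gamma_1 = \rho_1\gamma_0 + \rho_2\gamma_1,$$
$$\gamma_2 = \rho_1\gamma_1 + \rho_2\gamma_0,$$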
4. Let
and
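$$y_{2t} = u_t - 0.9u_{t-1},$$
$$x_t = \mu + \alpha(L)\varepsilon_t, \quad \varepsilon_t \sim IID(0, \sigma^2),$$
where
$$\alpha(L) = \alpha_0 + \alpha_1 L + \alpha_2 L^2 + \ldots,$$
$\alpha_0 = 1$ and $L$ is a lag operator, $Lx_t = x_{t-1}$. Derive the conditions under which $\{x_t\}$ is
$$y_t = \phi y_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim IIDN(0, \sigma^2),$$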
(a) Obtain the log-likelihood function of the model assuming (i) y0 is fixed, or (ii) y0 is
stochastic with mean zero and variance σ 2 /(1 − φ 2 ).
(b) Show that the ML estimator of φ is guaranteed to be in the range |φ| < 1, only under the
stochastic initial value case.
9. Prove that a linear combination of a finite number of covariance stationary processes is covariance stationary. Under what conditions does this result hold if the number of stationary processes under consideration tends to infinity?
13 Spectral Analysis
13.1 Introduction
Spectral analysis provides an alternative to the time domain approach to time series analysis. This approach views a stochastic process as a weighted sum of the periodic functions $\sin(\cdot)$ and $\cos(\cdot)$ with different frequencies, namely
$$y_t = \mu + \sum_{j=1}^{m} \left[a_j \cos(\omega_j t) + b_j \sin(\omega_j t)\right], \qquad (13.1)$$
where $\omega$ denotes frequency in the range $(-\pi, \pi)$, $\omega_j$ denotes a particular realization of $\omega$, $a_j$ and $b_j$ are the weights attached to different sine and cosine waves, and $m$ is the window size. The above specification explicitly models $y_t$ as a weighted average of sine and cosine functions rather than lagged values of $y_t$. Any covariance stationary process has both a time domain and a frequency domain representation, and any feature of the data that can be described by one representation can equally be described by the other. The frequency domain approach, or spectral analysis, is concerned with determining the importance of cycles of different frequencies for the variations of $y_t$ over time.
$$\begin{aligned}
\gamma(0) = E(y_t - \mu)^2 &= \sum_{j=1}^{m} \left[E(a_j^2)\cos^2(\omega_j t) + E(b_j^2)\sin^2(\omega_j t)\right] \\
&= \sum_{j=1}^{m} \sigma_j^2 \left[\cos^2(\omega_j t) + \sin^2(\omega_j t)\right] \\
&= \sum_{j=1}^{m} \sigma_j^2,
\end{aligned}$$
and similarly
or
$$\gamma(h) = \sum_{j=1}^{m} \sigma_j^2 \cos(\omega_j h) = \sum_{j=1}^{m} \sigma_j^2 \cos(-\omega_j h). \qquad (13.2)$$
Clearly, $\gamma(h) = \gamma(-h)$, and it readily follows that $y_t$ is a covariance stationary process.
It is also possible to consider the reverse problem and derive the frequency specific variances, $\sigma_j^2$, in terms of the autocovariances. In principle, the unknown variances, $\sigma_j^2$, associated with the individual frequencies, $\omega_j$, can be estimated from the estimates of the autocovariances, $\gamma(h)$, $h = 0, 1, \ldots$. For example, for $m = 3$ we have
$$\gamma(0) = \sigma_1^2 + \sigma_2^2 + \sigma_3^2,$$
$$\gamma(1) = \sigma_1^2\cos(\omega_1) + \sigma_2^2\cos(\omega_2) + \sigma_3^2\cos(\omega_3),$$
$$\gamma(2) = \sigma_1^2\cos(2\omega_1) + \sigma_2^2\cos(2\omega_2) + \sigma_3^2\cos(2\omega_3),$$
for given choices of $\omega_1$, $\omega_2$, and $\omega_3$ in the frequency range $(0, \pi)$. However, this is a rather cumbersome approach and other alternative procedures using Fourier transforms of the autocovariance functions have been explored in the literature. This idea is formalized in the following definition.
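The $m = 3$ system above is linear in the unknown variances, so for given frequencies it can be solved directly. The sketch below builds the matrix of cosines and solves it by Gauss–Jordan elimination; the helper names, the frequencies (0.5, 1.5, 2.5) and the variances (1.0, 0.5, 2.0) are illustrative assumptions used only to show that the variances are recovered from the autocovariances.

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def frequency_variances(gammas, omegas):
    """Recover sigma_j^2 from gamma(0), ..., gamma(m-1) using gamma(h) = sum_j sigma_j^2 cos(omega_j h)."""
    m = len(omegas)
    A = [[math.cos(h * w) for w in omegas] for h in range(m)]
    return solve_linear(A, gammas[:m])


if __name__ == "__main__":
    omegas = [0.5, 1.5, 2.5]
    true_vars = [1.0, 0.5, 2.0]
    gammas = [sum(s * math.cos(h * w) for s, w in zip(true_vars, omegas)) for h in range(3)]
    print(frequency_variances(gammas, omegas))
```

The system is well behaved only when the chosen frequencies have distinct cosines; with many closely spaced frequencies it becomes ill conditioned, which is one way to see why the approach is described as cumbersome.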
Definition 22 (Spectral density) Let $\{y_t\}$ be a stationary stochastic process, and let $\gamma(h)$ be its autocovariance function of order $h$. The spectral density function associated to $\gamma(h)$ is defined by the infinite-order Fourier transform
$$f(\omega) = \frac{1}{2\pi} \sum_{h=-\infty}^{\infty} \gamma(h) e^{ih\omega}, \quad \omega \in (-\pi, \pi), \qquad (13.3)$$
where $e^{ih\omega} = \cos(h\omega) + i\sin(h\omega)$, and $i = \sqrt{-1}$ is a complex number.
Using the one-to-one relationship that exists between the spectral density function, $f(\omega)$, and the autocovariances $\gamma(h)$, we also have
$$\gamma(h) = \int_{-\pi}^{+\pi} f(\omega) e^{i\omega h}\, d\omega,$$
or, equivalently,
$$\gamma(h) = \int_{-\pi}^{+\pi} f(\omega)\cos(\omega h)\, d\omega. \qquad (13.4)$$
This last result corresponds to $\gamma(h) = \sum_{j=1}^{m} \sigma_j^2 \cos(\omega_j h)$, obtained using the trigonometric representation (13.1) and (13.2).
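1. $f(\omega)$ always exists and is bounded if $\gamma(h)$ is absolutely summable. Since $e^{ih\omega} = \cos(h\omega) + i\sin(h\omega)$, then $\left|e^{ih\omega}\right|^2 = \cos^2(h\omega) + \sin^2(h\omega) = 1$, and
$$\left|f(\omega)\right| \le \frac{1}{2\pi}\left|\sum_{h=-\infty}^{\infty} \gamma(h) e^{ih\omega}\right| \le \frac{1}{2\pi}\sum_{h=-\infty}^{\infty} \left|\gamma(h)\right|\left|e^{ih\omega}\right| = \frac{1}{2\pi}\sum_{h=-\infty}^{\infty} \left|\gamma(h)\right| < \infty.$$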
$$f(-\omega) = \frac{1}{2\pi}\sum_{h=-\infty}^{\infty} \gamma(h) e^{-ih\omega} = \frac{1}{2\pi}\sum_{s=-\infty}^{\infty} \gamma(-s) e^{is\omega},$$
if we let $h = -s$. Since for stationary processes we have $\gamma(-s) = \gamma(s)$, then
$$f(-\omega) = \frac{1}{2\pi}\sum_{s=-\infty}^{\infty} \gamma(s) e^{is\omega} = f(\omega),$$
hence $f(-\omega) = f(\omega)$. This shows that $f(\omega)$ is symmetric around $\omega = 0$. Thus the spectral density function can also be written as
$$f(\omega) = \frac{1}{2}\left[f(\omega) + f(-\omega)\right],$$
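or, upon using (13.3), we have (noting that $e^{i\omega h} + e^{-i\omega h} = 2\cos(\omega h)$)
$$\begin{aligned}
f(\omega) &= \frac{1}{2}\left[\frac{1}{2\pi}\sum_{h=-\infty}^{\infty} \gamma(h) e^{ih\omega} + \frac{1}{2\pi}\sum_{h=-\infty}^{\infty} \gamma(h) e^{-ih\omega}\right] \\
&= \frac{1}{2}\,\frac{1}{2\pi}\sum_{h=-\infty}^{\infty} \gamma(h)\left(e^{ih\omega} + e^{-ih\omega}\right) \\
&= \frac{1}{2\pi}\sum_{h=-\infty}^{\infty} \gamma(h)\cos(h\omega),
\end{aligned}$$
or
$$f(\omega) = \frac{1}{2\pi}\left[\gamma(0) + 2\sum_{h=1}^{\infty} \gamma(h)\cos(h\omega)\right], \quad \text{with } \omega \in [0, \pi], \qquad (13.5)$$
which is bounded since by assumption the autocovariances, $\gamma(h)$, are absolutely summable.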
4. Spectral decomposition of the variance. Using (13.4), and the symmetry property of the spectrum we first note that¹
1 This result can be obtained directly by deriving the integral $\int_0^{\pi} f(\omega)\, d\omega$ with $f(\omega)$ given by (13.5).
$$\gamma(0) = 2\int_0^{\pi} f(\omega)\, d\omega.$$
Since $f(\omega) \ge 0$, the term $2\gamma^{-1}(0)\int_{(j-1)\pi/m}^{j\pi/m} f(\omega)\, d\omega$ can be viewed as the proportion of the variance explained by the frequency $\omega_j = j\pi/m$. Compare this result with the
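$$y_t = \sum_{i=0}^{\infty} a_i \varepsilon_{t-i},$$
$$G(z) = \sigma^2 a(z) a(z^{-1}),$$
where $a(z) = \sum_{i=0}^{\infty} a_i z^i$. The spectral density function of $y_t$ can now be obtained from $G(z)$ by evaluating it at $z = e^{i\omega}$. More specifically
$$f(\omega) = \frac{1}{2\pi} G(e^{i\omega}) = \frac{\sigma^2}{2\pi} a(e^{i\omega}) a(e^{-i\omega}). \qquad (13.6)$$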
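We now calculate the spectral density function for various processes.
Suppose first, that $y_t$ is a white noise process. In this case, $\gamma_0 = \sigma^2$ and $\gamma_k = 0$ for $k \neq 0$. It follows from (13.5) that $f(\omega)$ is flat at $\sigma^2/(2\pi)$ for all $\omega \in [0, \pi]$. Consider the stationary AR(1) process $y_t = \phi y_{t-1} + \varepsilon_t$, where $|\phi| < 1$, and $\varepsilon_t \sim IID(0, \sigma^2)$. Since $\gamma(h) = \sigma^2\phi^{|h|}/(1 - \phi^2)$, by direct methods we have
$$\begin{aligned}
f(\omega) &= \frac{1}{2\pi}\sum_{h=-\infty}^{\infty} \frac{\sigma^2\phi^{|h|}}{1 - \phi^2} e^{ih\omega} \\
&= \frac{1}{2\pi}\sum_{h=-\infty}^{\infty} \frac{\sigma^2\phi^{|h|}}{1 - \phi^2} \left(e^{i\omega}\right)^h \qquad (13.7) \\
&= \frac{1}{2\pi}\sum_{h=-\infty}^{\infty} \gamma(h) z^h,
\end{aligned}$$
with $z = e^{i\omega}$.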
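We can also express $f(\omega)$ as a real valued function of $\omega$. From (13.7) we have
$$\begin{aligned}
f(\omega) &= \frac{1}{2\pi}\,\frac{\sigma^2}{1 - \phi^2}\left[1 + \sum_{h=-\infty}^{-1} \phi^{|h|} e^{i\omega h} + \sum_{h=1}^{\infty} \phi^h e^{i\omega h}\right] \\
&= \frac{1}{2\pi}\,\frac{\sigma^2}{1 - \phi^2}\left[1 + \sum_{h=1}^{\infty} \phi^h\left(e^{-i\omega h} + e^{i\omega h}\right)\right]. \qquad (13.8)
\end{aligned}$$
Since $\left|\phi e^{i\omega}\right| = |\phi| < 1$, the infinite series in the above expression converge and we have
$$f(\omega) = \frac{1}{2\pi}\,\frac{\sigma^2}{1 - \phi^2}\left[1 + \frac{\phi e^{i\omega}}{1 - \phi e^{i\omega}} + \frac{\phi e^{-i\omega}}{1 - \phi e^{-i\omega}}\right],$$
or
$$f(\omega) = \frac{1}{2\pi}\,\frac{\sigma^2}{\left(1 - \phi e^{i\omega}\right)\left(1 - \phi e^{-i\omega}\right)}.$$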
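The same result can also be obtained directly using the autocovariance generating function which for the AR(1) process is given by
$$G(z) = \frac{\sigma^2}{(1 - \phi z)\left(1 - \phi z^{-1}\right)}.$$
Using (13.6), the spectral density function of the $y_t$ process can be obtained directly as
$$f(\omega) = \frac{1}{2\pi}\,\frac{\sigma^2}{\left(1 - \phi e^{i\omega}\right)\left(1 - \phi e^{-i\omega}\right)}, \quad \text{for } \omega \in [0, \pi],$$
or
$$f(\omega) = \frac{1}{2\pi}\,\frac{\sigma^2}{1 - 2\phi\cos(\omega) + \phi^2}. \qquad (13.9)$$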
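The closed form (13.9) can be compared numerically with the cosine series (13.5) evaluated at the AR(1) autocovariances. The sketch below truncates the series after a fixed number of terms; the function names, the truncation length and the value $\phi = 0.6$ are illustrative assumptions.

```python
import math


def ar1_spectrum_closed(omega, phi, sigma2=1.0):
    """AR(1) spectral density from the closed form (13.9)."""
    return sigma2 / (2 * math.pi * (1 - 2 * phi * math.cos(omega) + phi ** 2))


def ar1_spectrum_series(omega, phi, sigma2=1.0, n_terms=2000):
    """AR(1) spectral density from (13.5), truncating the sum over h."""
    gamma0 = sigma2 / (1 - phi ** 2)
    total = gamma0
    gamma_h = gamma0
    for h in range(1, n_terms + 1):
        gamma_h *= phi                       # gamma(h) = phi * gamma(h-1)
        total += 2 * gamma_h * math.cos(h * omega)
    return total / (2 * math.pi)


if __name__ == "__main__":
    phi = 0.6
    for omega in (0.0, 1.0, 2.0, math.pi):
        print(omega, ar1_spectrum_closed(omega, phi), ar1_spectrum_series(omega, phi))
```

For a positive $\phi$ the printed values fall as $\omega$ moves from 0 to $\pi$, so most of the variance is attributed to low frequencies.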
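Note that, for $\phi > 0$, $f(\omega)$ is monotonically decreasing in $\omega$ over $[0, \pi]$, while for $\phi < 0$, $f(\omega)$ is monotonically increasing in $\omega$. Using (13.8) and the result $e^{i\omega h} + e^{-i\omega h} = 2\cos(\omega h)$ we also have
$$f(\omega) = \frac{1}{2\pi}\,\frac{\sigma^2}{1 - \phi^2}\left[1 + 2\sum_{h=1}^{\infty} \phi^h\cos(\omega h)\right],$$
$$\phi(L) y_t = \theta(L)\varepsilon_t,$$
is given by
$$f(\omega) = \frac{1}{2\pi}\,\frac{\sigma^2\theta(e^{i\omega})\theta(e^{-i\omega})}{\phi(e^{i\omega})\phi(e^{-i\omega})}, \quad \text{for } \omega \in [0, 2\pi].$$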
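$$y_t = \sum_{i=0}^{\infty} a_i x_{t-i} = a(L) x_t, \qquad (13.10)$$
$$x_t = \sum_{j=0}^{\infty} b_j \varepsilon_{t-j} = b(L)\varepsilon_t, \quad \varepsilon_t \sim IID(0, \sigma^2), \qquad (13.11)$$
where we assume that $x_t$ is absolutely summable. $x_t$ is the input process, $y_t$ is the output process, and (13.10) is the distributed lag or the transfer function. From (13.10) and (13.11) we can write
$$y_t = a(L) b(L)\varepsilon_t.$$
$$\begin{aligned}
f_y(\omega) &= \frac{\sigma^2}{2\pi} c(e^{i\omega}) c(e^{-i\omega}), \quad \omega \in (0, \pi) \\
&= a(e^{i\omega}) a(e^{-i\omega}) f_x(\omega).
\end{aligned}$$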
Evaluating at $\omega = 0$,
$$f_y(0) = [a(1)]^2 f_x(0). \qquad (13.12)$$
If we consider (13.10) as a filter, from (13.12) we observe the above filter changes the memory
of the stochastic process. Since [a (1)]2 < ∞, and xt absolutely summable implies fx (0) < ∞,
we get fy (0) < ∞. This shows that if the input is a stationary stochastic process, then the output
process is also stationary. On the other hand, a (1) shows the degree and direction of the changes
in memory through the filter. If a (1) > 1, the filter increases the memory of the output process,
the larger the a (1), the more persistent will be the output process. If a (1) < 1, then the filter
decreases the memory of the output process. If a (1) = 1, the filter does not affect the memory
of the output process.
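The relation between input and output spectra can be illustrated with finite filters. The sketch below evaluates $f(\omega) = (\sigma^2/2\pi)\,|c(e^{i\omega})|^2$ for real coefficient lists and checks that the output spectrum equals $|a(e^{i\omega})|^2 f_x(\omega)$, so that at $\omega = 0$ the ratio of the two spectra is $[a(1)]^2$. The helper names and the example coefficients are assumptions made for illustration.

```python
import cmath
import math


def polynomial_at(coeffs, z):
    """Evaluate c(z) = sum_i coeffs[i] * z^i."""
    return sum(c * z ** i for i, c in enumerate(coeffs))


def ma_spectrum(coeffs, omega, sigma2=1.0):
    """Spectral density of sum_i coeffs[i] * eps_{t-i} with real coefficients."""
    z = cmath.exp(1j * omega)
    return sigma2 / (2 * math.pi) * abs(polynomial_at(coeffs, z)) ** 2


def convolve(a, b):
    """Coefficients of a(L) b(L)."""
    c = [0.0] * (len(a) + len(b) - 1)
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            c[i + j] += a_i * b_j
    return c


if __name__ == "__main__":
    a = [1.0, 0.5]             # hypothetical filter with a(1) = 1.5
    b = [1.0, -0.4, 0.2]       # hypothetical input process weights
    c = convolve(a, b)
    ratio = ma_spectrum(c, 0.0) / ma_spectrum(b, 0.0)
    print(ratio, sum(a) ** 2)  # both equal a(1)^2
```

Here $a(1) = 1.5 > 1$, so by the argument in the text this filter increases the memory of the output relative to the input.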
13.6 Exercises
1. Derive the spectral density function of the stationary autoregressive process
$$y_t = \phi y_{t-1} + \varepsilon_t - \theta\varepsilon_{t-1}.$$
Derive the spectral density function of $y_t$ and discuss its property in the case when (i) $\phi = \theta$, and (ii) $|\phi - \theta| = \epsilon$, where $\epsilon$ is a small positive constant.
4. Consider the univariate process $\{y_t\}_{t=1}^{\infty}$ generated from $x_t$ by the following linear filter
$$y_t = (1 - \lambda)\sum_{i=0}^{\infty} \lambda^i x_{t-i}, \quad t = 1, 2, \ldots, T, \quad |\lambda| < 1,$$
where $\{x_t\}_{-\infty}^{\infty}$ is generated according to the MA(1) process
and $\{\varepsilon_t\}_{-\infty}^{\infty}$ are IID$(0, \sigma^2)$ random variables. The autocovariances of $x_t$ are denoted by $\gamma_s =$
(a) Show that the spectral density function of $x_t$, $f_x(\omega)$, is given by the following expression
$$f_x(\omega) = \frac{\sigma^2}{\pi}\left(1 + 2\theta\cos\omega + \theta^2\right), \quad 0 \le \omega < \pi.$$
where $\varepsilon_t$ are IID innovation processes and $\alpha_i$ decay exponentially. Derive the spectral density of $x_t$ and show that it is zero at zero frequency.
Part IV
Univariate Time Series Models
14 Estimation of Stationary
Time Series Processes
14.1 Introduction
We start with the problem of estimating the mean and autocovariances of a stationary process and then consider the estimation of autoregressive and moving average processes as well as the estimation of spectral density functions. We also relate the analysis of this section to the standard OLS regression models and show that when the errors are serially correlated, the OLS estimators of models with lagged dependent variables are inconsistent, and derive an asymptotic expression for the bias.
$$\hat{\mu}_T = \bar{y}_T = \frac{y_1 + y_2 + \ldots + y_T}{T}.$$
It is easily seen that $\bar{y}_T$ is an unbiased estimator of $\mu$, since under stationarity $E(y_t) = \mu$ for all $t$. Also, a sufficient condition for $\bar{y}_T$ to be a consistent estimator of $\mu$ is given by $\lim_{T \to \infty} Var(\bar{y}_T) = 0$. If this condition holds we say the process is ergodic in mean. To investigate this condition further let $\mathbf{y} = (y_1, y_2, \ldots, y_T)'$, and $\boldsymbol{\tau} = (1, 1, \ldots, 1)'$, a $T \times 1$ vector of ones. Then $\bar{y}_T = T^{-1}\boldsymbol{\tau}'\mathbf{y}$, and $Var(\bar{y}_T) = T^{-2}\left[\boldsymbol{\tau}' Var(\mathbf{y})\boldsymbol{\tau}\right]$, where¹
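$$\begin{aligned}
Var(\mathbf{y}) &= \begin{pmatrix}
Var(y_1) & Cov(y_1, y_2) & \ldots & Cov(y_1, y_{T-1}) & Cov(y_1, y_T) \\
Cov(y_2, y_1) & Var(y_2) & \ldots & Cov(y_2, y_{T-1}) & Cov(y_2, y_T) \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
Cov(y_{T-1}, y_1) & Cov(y_{T-1}, y_2) & \ldots & Var(y_{T-1}) & Cov(y_{T-1}, y_T) \\
Cov(y_T, y_1) & Cov(y_T, y_2) & \ldots & Cov(y_T, y_{T-1}) & Var(y_T)
\end{pmatrix} \\
&= \begin{pmatrix}
\gamma_0 & \gamma_1 & \ldots & \gamma_{T-2} & \gamma_{T-1} \\
\gamma_1 & \gamma_0 & \ldots & \gamma_{T-3} & \gamma_{T-2} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\gamma_{T-2} & \gamma_{T-3} & \ldots & \gamma_0 & \gamma_1 \\
\gamma_{T-1} & \gamma_{T-2} & \ldots & \gamma_1 & \gamma_0
\end{pmatrix}.
\end{aligned}$$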
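Then
$$Var(\bar{y}_T) = \boldsymbol{\tau}' Var(\mathbf{y})\boldsymbol{\tau}/T^2 = \frac{1}{T}\left[\gamma_0 + 2\sum_{h=1}^{T-1}\left(1 - \frac{h}{T}\right)\gamma_h\right]. \qquad (14.1)$$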
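To ensure $\lim_{T \to \infty} Var(\bar{y}_T) = \lim_{T \to \infty} \boldsymbol{\tau}' Var(\mathbf{y})\boldsymbol{\tau}/T^2 = 0$, it is therefore sufficient that
$$\lim_{T \to \infty} \sum_{h=1}^{T-1}\left(1 - \frac{h}{T}\right)\gamma_h < \infty,$$
which is clearly satisfied if the autocovariances are absolutely summable (recall that $\gamma(0) < \infty$). To see this, note that
$$\left|\sum_{h=1}^{T-1}\left(1 - \frac{h}{T}\right)\gamma_h\right| \le \sum_{h=1}^{T-1}\left|1 - \frac{h}{T}\right|\left|\gamma_h\right| < \sum_{h=1}^{T-1}\left|\gamma_h\right|.$$
Hence, the condition
$$\lim_{T \to \infty} \sum_{h=0}^{T-1}\left|\gamma_h\right| < \infty, \qquad (14.2)$$
would be sufficient. When this condition is met, it is said that $y_t$ is ‘ergodic in mean’.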
In spectral analysis, the condition for ergodicity in mean is equivalent to the spectrum, $f_y(\omega)$, being bounded at zero frequency. Recall that (see Chapter 13)
$$f_y(0) = \frac{1}{2\pi}\left[\gamma_0 + 2\sum_{h=1}^{\infty}\gamma_h\right] < \infty,$$
holds if $\sum_{h=0}^{\infty}\left|\gamma_h\right| < \infty$. The spectrum at zero frequency measures the extent to which shocks to the process $y_t$ are persistent, and captures the ‘long-memory’ property of the series. Note also from (14.1) that
$$\lim_{T \to \infty} Var(\sqrt{T}\bar{y}_T) = \lim_{T \to \infty}\left[\gamma_0 + 2\sum_{h=1}^{T-1}\left(1 - \frac{h}{T}\right)\gamma_h\right] = 2\pi f_y(0),$$
which relates the asymptotic variance of $\sqrt{T}\bar{y}_T$ to the value of the spectral density at zero frequency.
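For a stationary AR(1) process these quantities are available in closed form, which makes the limit easy to check: $\gamma_h = \sigma^2\phi^h/(1-\phi^2)$ and, from (13.9), $2\pi f_y(0) = \sigma^2/(1-\phi)^2$. The sketch below evaluates (14.1) for a finite $T$ and compares $T \cdot Var(\bar{y}_T)$ with this limit; the function names and parameter values are illustrative assumptions.

```python
def var_sample_mean_ar1(T, phi, sigma2=1.0):
    """Var(ybar_T) from (14.1) when y_t is a stationary AR(1) process."""
    gamma_h = sigma2 / (1 - phi ** 2)       # gamma_0
    total = gamma_h
    for h in range(1, T):
        gamma_h *= phi                      # gamma_h = phi * gamma_{h-1}
        total += 2 * (1 - h / T) * gamma_h
    return total / T


def long_run_variance_ar1(phi, sigma2=1.0):
    """2*pi*f_y(0) for the AR(1) process, i.e. sigma2 / (1 - phi)^2."""
    return sigma2 / (1 - phi) ** 2


if __name__ == "__main__":
    phi = 0.5
    for T in (10, 100, 10000):
        print(T, T * var_sample_mean_ar1(T, phi), long_run_variance_ar1(phi))
```

As $T$ grows the scaled variance approaches the long-run value from below, with a gap of order $1/T$.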
$$\lim_{H \to \infty} \frac{1}{H}\sum_{h=1}^{H}\gamma_h^2 = 0. \qquad (14.5)$$
2 The denominator is T instead of T − h to ensure the positive definiteness of the covariance matrix. See Brockwell and
Davis (1991) for a proof.
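and
$$\begin{aligned}
\hat{\gamma}_T(h) &= T^{-1}\sum_{t=h+1}^{T}\left(y_t - \bar{y}_T\right)\left(y_{t-h} - \bar{y}_T\right) = T^{-1}\sum_{t=h+1}^{T}(y_t - \mu)(y_{t-h} - \mu) \\
&\quad + \left(\mu - \bar{y}_T\right) T^{-1}\sum_{t=h+1}^{T}\left(y_t - \mu\right) + \left(\mu - \bar{y}_T\right) T^{-1}\sum_{t=h+1}^{T}\left(y_{t-h} - \mu\right) \\
&\quad + \left(1 - \frac{h}{T}\right)\left(\mu - \bar{y}_T\right)^2. \qquad (14.6)
\end{aligned}$$
$$\bar{y}_T = \mu + O_p(T^{-1/2}),$$
$$T^{-1/2}\sum_{t=h+1}^{T}\left(y_t - \mu\right) = O_p(1).$$
$$\hat{\gamma}_T(h) = T^{-1}\sum_{t=h+1}^{T}(y_t - \mu)(y_{t-h} - \mu) + O_p\left(T^{-1}\right).$$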
Therefore, $\lim_{T \to \infty} E\left[\hat{\gamma}_T(h)\right] = \gamma_h$. Also using results in Bartlett (1946) we have
$$y_t = \mu + \sum_{i=0}^{\infty} \alpha_i \varepsilon_{t-i}, \qquad (14.7)$$
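$$\sqrt{T}\hat{\rho}_T(h) = \frac{\sqrt{T}\hat{\gamma}_T(h)}{\hat{\gamma}_T(0)} \overset{a}{\sim} \frac{\sqrt{T}\hat{\gamma}_T(h)}{\gamma_0},$$
and we have
$$\hat{\rho}_T(h) \overset{p}{\to} \rho_h.$$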
When condition (14.5) is met, the process is said to be ‘ergodic in variance’. It is easily seen that the mean and variance ergodicity conditions defined by (14.2) and (14.5) are satisfied if the process $y_t$ has absolutely summable autocovariances, namely if $\sum_{h=1}^{\infty}\left|\gamma_h\right| < \infty$. The fourth-order moment requirement in (iii) can be relaxed at the expense of augmenting the absolute summability condition (i) above with $\sum_{i=0}^{\infty} i\alpha_i^2 < K < \infty$.
As shown in Chapter 12, stationary ARMA processes have exponentially decaying autocovariances. Therefore it also follows that stationary ARMA processes are mean/variance ergodic and their mean and autocovariances can be consistently estimated by $\bar{y}_T$ and $\hat{\gamma}_T(h)$, for a fixed $h$ as $T \to \infty$. In fact, since the condition $\sum_{i=0}^{\infty} i\alpha_i^2 < K < \infty$ is also met in the case of stationary ARMA processes, consistency of the estimates of the autocorrelation coefficients of stationary ARMA processes follows even if the errors, $\varepsilon_t$, do not possess fourth-order moments. The problem with the estimation of $\rho_h$ arises when one considers values of $h$ that are large relative to $T$, an issue that one encounters when estimating the spectral density function of $y_t$, as discussed below. See Section 14.9.
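Consider now the asymptotic distribution of $\hat{\boldsymbol{\rho}}_{mT} = (\hat{\rho}_T(1), \hat{\rho}_T(2), \ldots, \hat{\rho}_T(m))'$, where $m$ is fixed and $T \to \infty$ and suppose that $y_t$ follows the stationary linear process (14.7) where
(i) $\varepsilon_t \sim IID(0, \sigma^2)$,
(ii) $\sum_{i=0}^{\infty} |\alpha_i| < K < \infty$,
(iii) $\sum_{i=1}^{\infty} i\alpha_i^2 < K < \infty$, or $E(\varepsilon_t^4) < K < \infty$.
Then
$$\sqrt{T}\left(\hat{\boldsymbol{\rho}}_{mT} - \boldsymbol{\rho}_m\right) \overset{a}{\sim} N(0, \mathbf{W}_m), \qquad (14.8)$$
where the $(i, j)$ element of $\mathbf{W}_m$ is
$$w_{m,ij} = \sum_{h=1}^{\infty}\left(\rho_{h+i} + \rho_{h-i} - 2\rho_i\rho_h\right)\left(\rho_{h+j} + \rho_{h-j} - 2\rho_j\rho_h\right).$$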
For a proof see, for example, Brockwell and Davis (1991), Section 7.3.
Finally, assuming that conditions (i)–(iii) above hold, under the null hypothesis
$$H_0: \rho_1 = \rho_2 = \ldots = \rho_m = 0,$$
we have $w_{m,ij} = 0$ if $i \neq j$ and $w_{m,ii} = 1$, and it is easily seen from (14.8) that
$$\sqrt{T}\hat{\rho}_T(h) \overset{a}{\sim} N(0, 1), \quad h = 1, 2, \ldots, m.$$
Furthermore, under the null hypothesis $H_0$ defined above, the statistics $\sqrt{T}\hat{\rho}_T(h)$, $h = 1, 2, \ldots, m$ are asymptotically independently distributed. This result now can be used to derive the Box and Pierce (1970) $Q$ statistic (of order $m$)
$$Q = T\sum_{h=1}^{m}\hat{\rho}_T^2(h) \overset{a}{\sim} \chi^2(m),$$
or its small-sample modification, the Ljung and Box (1978) statistic (of order $m$)
$$Q^{*} = T(T + 2)\sum_{h=1}^{m}\frac{1}{T - h}\hat{\rho}_T^2(h) \overset{a}{\sim} \chi^2(m).$$
Under the assumption that $y_t$ are serially uncorrelated, the Box–Pierce and the Ljung–Box statistics are both distributed asymptotically as $\chi^2$ variates with $m$ degrees of freedom. The two tests are asymptotically equivalent, although the Ljung–Box statistic is likely to perform better in small samples. See Kendall, Stuart, and Ord (1983, Chs 48 and 50), for further details.
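Both statistics are simple functions of the sample autocorrelations. The sketch below computes $\hat{\rho}_T(h)$ with the divisor $T$ (as in the definition of $\hat{\gamma}_T(h)$ used in this chapter) and then $Q$ and $Q^{*}$. The function names and the tiny example series are illustrative assumptions; comparing the result with $\chi^2(m)$ critical values is left to statistical tables or a statistics library.

```python
def sample_autocorrelations(y, max_lag):
    """Sample autocorrelations rho_hat(1), ..., rho_hat(max_lag), using divisor T."""
    T = len(y)
    mean = sum(y) / T
    dev = [v - mean for v in y]
    gamma0 = sum(d * d for d in dev) / T
    rhos = []
    for h in range(1, max_lag + 1):
        gamma_h = sum(dev[t] * dev[t - h] for t in range(h, T)) / T
        rhos.append(gamma_h / gamma0)
    return rhos


def box_pierce(y, m):
    """Box-Pierce Q statistic of order m."""
    T = len(y)
    return T * sum(r * r for r in sample_autocorrelations(y, m))


def ljung_box(y, m):
    """Ljung-Box Q* statistic of order m."""
    T = len(y)
    rhos = sample_autocorrelations(y, m)
    return T * (T + 2) * sum(r * r / (T - h) for h, r in enumerate(rhos, start=1))


if __name__ == "__main__":
    series = [1.0, -1.0, 1.0, -1.0]    # toy series, far too short for real testing
    print(sample_autocorrelations(series, 1), box_pierce(series, 1), ljung_box(series, 1))
```

Because $(T+2)/(T-h) > 1$ for every $h$, the Ljung–Box statistic is never smaller than the Box–Pierce statistic computed from the same sample.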
$$y_t = \varepsilon_t + \theta\varepsilon_{t-1}, \qquad (14.9)$$
$$|\theta| < 1, \quad \varepsilon_t \sim IIDN(0, \sigma^2).$$
Denote the parameters by the $2 \times 1$ vector $\boldsymbol{\theta} = (\theta, \sigma^2)'$. There are two basic approaches for estimation of MA processes: the method of moments and the maximum likelihood procedure (see Chapters 9 and 10 for a description of these methods).
$$\gamma_0 = E(y_t^2) = \sigma^2(1 + \theta^2),$$
$$\gamma_1 = E(y_t y_{t-1}) = \sigma^2\theta,$$
$$\gamma_s = E(y_t y_{t-s}) = 0, \quad \text{for } s \ge 2.$$
The last set of moment conditions does not depend on the parameters and hence is not informative with respect to the estimation of $\theta$ and $\sigma^2$, although it can be used to test the MA(1) specification. Using the first two moment conditions we have
$$\rho_1 = \frac{\gamma_1}{\gamma_0} = \frac{\theta}{1 + \theta^2}.$$
We have already seen that $\gamma_s$ can be estimated consistently (for a fixed $s$), by
$$\hat{\gamma}_s = \frac{\sum_{t=s+1}^{T} y_t y_{t-s}}{T}, \qquad (14.10)$$
and hence $\theta$ can be estimated consistently by finding the solution to the following estimating equation (assuming that such a solution in fact exists)³
$$\hat{\rho}_1\tilde{\theta}^2 - \tilde{\theta} + \hat{\rho}_1 = 0, \qquad (14.11)$$
where we have denoted the moment estimator of $\theta$ by $\tilde{\theta}$. This quadratic equation in $\tilde{\theta}$ has a real solution if
$$\Delta = 1 - 4\hat{\rho}_1^2 \ge 0, \quad \text{or if } \left|\hat{\rho}_1\right| \le 1/2.$$
When this condition is met, (14.11) has a solution that lies in the range $[-1, 1]$. Note that such a solution exists since the product of the two solutions of (14.11) is equal to unity. This solution is the moment estimator of $\theta$. The reason for selecting the solution that lies inside the unit circle is to ensure that the estimated MA(1) process is invertible, namely that it can be written as the infinite-order AR process
$$y_t - \theta y_{t-1} + \theta^2 y_{t-2} - \ldots = \varepsilon_t.$$
This representation provides a simple solution to the prediction problem, to be discussed later. The moment estimator of $\theta$, although simple to compute, is not efficient and does not exist if $\left|\hat{\rho}_1\right| > 1/2$. Other estimation procedures need to be considered.
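The moment estimator amounts to picking the root of the quadratic (14.11) that lies in $[-1, 1]$. A minimal sketch, assuming the first-order sample autocorrelation has already been computed, is given below; the function name and the convention of returning `None` when no real solution exists are choices made here for illustration.

```python
import math


def ma1_moment_estimate(rho1):
    """Invertible root of rho1*theta^2 - theta + rho1 = 0, or None if no real root exists."""
    if abs(rho1) > 0.5:
        return None                    # discriminant 1 - 4*rho1^2 is negative
    if rho1 == 0:
        return 0.0                     # the equation reduces to theta = 0
    # The two roots multiply to one; this one is the root with |theta| <= 1.
    return (1 - math.sqrt(1 - 4 * rho1 ** 2)) / (2 * rho1)


if __name__ == "__main__":
    for rho1 in (0.4, -0.4, 0.0, 0.5, 0.6):
        print(rho1, ma1_moment_estimate(rho1))
```

For example, $\theta = 0.5$ implies $\rho_1 = 0.5/1.25 = 0.4$, and feeding 0.4 back into the function recovers a value of 0.5 up to rounding, rather than the non-invertible alternative $\theta = 2$.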
3 To simplify the notation we are using $\hat{\gamma}_h$ and $\hat{\rho}_h$ instead of $\hat{\gamma}_T(h)$, and $\hat{\rho}_T(h)$, respectively.
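Hence
$$\boldsymbol{\Sigma} = \sigma^2\boldsymbol{\Sigma}^{*}, \qquad (14.13)$$
where
$$\boldsymbol{\Sigma}^{*} = \left(1 + \theta^2\right)\mathbf{I}_T + \theta\mathbf{A},$$
$$\ell(\boldsymbol{\theta}) = -\frac{T}{2}\log\left(2\pi\sigma^2\right) - \frac{1}{2}\log\left|\boldsymbol{\Sigma}^{*}\right| - \frac{\mathbf{y}'\boldsymbol{\Sigma}^{*-1}\mathbf{y}}{2\sigma^2}, \qquad (14.14)$$
where $\left|\boldsymbol{\Sigma}^{*}\right|$ denotes the determinant of $\boldsymbol{\Sigma}^{*}$. Note, we cannot ignore $\log\left|\boldsymbol{\Sigma}^{*}\right|$ if $|\theta|$ is very close to unity. This can be illustrated by noting that $\left|\boldsymbol{\Sigma}^{*}\right| = 1 + \theta^2 + \theta^4 + \ldots + \theta^{2T} = \left(1 - \theta^{2T+2}\right)/\left(1 - \theta^2\right)$.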
As is clear from (14.15), the exact inverse is highly nonlinear in θ and its direct use in (14.14) in order to compute the ML estimators involves a great deal of computation. The computation of the exact inverse of $\Sigma^*$ can be quite time consuming when T is large and might not be practical for very large T.
In order to facilitate the computations we first reduce $\Sigma^*$ to a diagonal form by means of an orthogonal transformation. To do so we note that the characteristic roots of A are distinct and do not depend on the unknown parameter, θ. Also from (14.13) it readily follows that A and $\Sigma^*$ commute and hence have the same characteristic vectors, and the characteristic roots of $\Sigma^*$ can be obtained from those of A.5 This implies that $\Sigma^*$ has the following characteristic vectors
\[ h_j = \left( \sin\frac{j\pi}{T+1}, \sin\frac{2j\pi}{T+1}, \dots, \sin\frac{Tj\pi}{T+1} \right)', \]
4 See, for example, Pesaran (1973). 5 Matrices A and B are said to commute if and only if AB = BA.
for j = 1, 2, . . . , T. After the necessary normalization of the above characteristic vectors, the orthogonal transformation H, which does not depend on the unknown parameters, can be formed as
\[ H = \left(\frac{2}{T+1}\right)^{1/2} (h_1, h_2, \dots, h_T). \qquad (14.17) \]
Then using the theorem on the diagonalization of real symmetric matrices it follows that $\Sigma^{*-1} = H\Lambda^{-1}H$, where $\Lambda$ is a diagonal matrix with the characteristic roots $\lambda_j$, j = 1, 2, . . . , T, as its diagonal elements. With $\lambda_j$ specified in (14.16), we can calculate $\log|\Sigma^*|$ by
\[ |\Sigma^*| = \prod_{j=1}^{T}\left(\theta^2 + 2\theta\cos\frac{j\pi}{T+1} + 1\right), \]
which can also be obtained by a direct evaluation of $|\Sigma^*|$.
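Substituting $\bar{y} = Hy$ and (14.18) in (14.14), the log-likelihood function becomes
\[ \ell(\theta) = -\frac{T}{2}\log(2\pi\sigma^2) - \frac{1}{2}\log\left(\frac{1 - \theta^{2T+2}}{1 - \theta^2}\right) - \frac{1}{2\sigma^2}\bar{y}'\Lambda^{-1}\bar{y}. \qquad (14.19) \]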
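In order to maximize the likelihood function given by (14.19), we first obtain the following concentrated log-likelihood function
\[ \ell(\theta) = -\frac{T}{2}\log\left[2\pi\hat{\sigma}^2(\theta)\right] - \frac{1}{2}\log\left(\frac{1 - \theta^{2T+2}}{1 - \theta^2}\right) - \frac{T}{2}, \qquad (14.20) \]
where
\[ \hat{\sigma}^2(\theta) = \bar{y}'\Lambda^{-1}\bar{y}/T. \]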
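The concentrated log-likelihood (14.20) can be evaluated without inverting $\Sigma^*$, as the following sketch suggests. It assumes the characteristic roots are $\lambda_j = 1 + \theta^2 + 2\theta\cos(j\pi/(T+1))$, in line with the determinant expression above, and computes $\log|\Sigma^*|$ as the sum of $\log\lambda_j$ rather than through the closed form. The function name is hypothetical.

```python
import numpy as np

def ma1_concentrated_loglik(theta, y):
    """Concentrated exact log-likelihood of a zero-mean MA(1), as in (14.20)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    j = np.arange(1, T + 1)
    # Normalized sine vectors: H is symmetric and orthogonal, free of theta.
    H = np.sqrt(2.0 / (T + 1)) * np.sin(np.outer(j, j) * np.pi / (T + 1))
    ybar = H @ y
    # Characteristic roots of Sigma* = (1 + theta^2) I + theta A.
    lam = 1.0 + theta ** 2 + 2.0 * theta * np.cos(j * np.pi / (T + 1))
    sigma2 = np.sum(ybar ** 2 / lam) / T
    logdet = np.sum(np.log(lam))
    return -0.5 * T * np.log(2 * np.pi * sigma2) - 0.5 * logdet - 0.5 * T
```

A simple grid search would evaluate this function over, say, `np.linspace(-0.99, 0.99, 199)` and keep the maximizing value of θ.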
Now dealing with the transformed observations, $\bar{y}$, the problem of maximizing $\ell(\theta)$ defined in (14.19) is much simpler. A grid search method or iterative procedures, such as the Newton-Raphson method, can be used. Here we describe an iterative procedure for the maximization of (14.20) which is certain to converge. For this purpose we first note that
\[ \frac{\partial \ell(\theta)}{\partial\theta} = -\frac{T}{2\hat{\sigma}^2(\theta)}\frac{\partial\hat{\sigma}^2(\theta)}{\partial\theta} + \frac{(T+1)\theta^{2T+1}}{1 - \theta^{2T+2}} - \frac{\theta}{1 - \theta^2} = 0, \qquad (14.21) \]
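where
\[ \frac{\partial\hat{\sigma}^2(\theta)}{\partial\theta} = \frac{1}{T}\bar{y}'\frac{\partial\Lambda^{-1}}{\partial\theta}\bar{y} + \frac{2}{T}\bar{y}'\Lambda^{-1}\frac{\partial\bar{y}}{\partial\theta}. \]
Hence, since $\bar{y} = Hy$ does not depend on θ and $\partial\lambda_j^{-1}/\partial\theta = -2\left(\theta + \cos\frac{j\pi}{T+1}\right)\lambda_j^{-2}$,
\[ \frac{\partial\hat{\sigma}^2(\theta)}{\partial\theta} = -\frac{2}{T}\sum_{j=1}^{T}\left(\theta + \cos\frac{j\pi}{T+1}\right)\lambda_j^{-2}\bar{y}_j^2. \]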
Consequently, using the above result in (14.21) and ignoring $\theta^{2T+2}$ ($|\theta| < 1$), the first-order condition for the maximization of the log-likelihood function can be written as
\[ f(\theta) = \theta\sum_{j=1}^{T}\lambda_j^{-1}\bar{y}_j^2 - T(1 - \theta^2)\sum_{j=1}^{T}\left(\theta + \cos\frac{j\pi}{T+1}\right)\lambda_j^{-2}\bar{y}_j^2 = 0. \]
Now it is easily seen that $f(1) > 0$ and $f(-1) < 0$, so that $f(1)f(-1) < 0$, and hence the equation $f(\hat{\theta}) = 0$ must have a root within the range $|\theta| < 1$. A simple procedure for computing this root is the iterative method of 'False position'. A description of this method together with the proof of its convergence can be found, for example, in Hartree (1958); also see Pesaran (1973).
\[ y_t = w_t'\beta + u_t, \quad t = 1, 2, \dots, T, \qquad (14.22) \]
where
\[ u_t = \sum_{i=0}^{q}\theta_i\varepsilon_{t-i}, \quad \varepsilon_t \sim N(0, \sigma^2), \quad \theta_0 \equiv 1. \qquad (14.23) \]
\[ LL_{MA}(\theta) = -\frac{T}{2}\log(2\pi\sigma^2) - \frac{1}{2}\log|\Sigma^*| - \frac{1}{2\sigma^2}(y - W\beta)'\Sigma^{*-1}(y - W\beta), \qquad (14.24) \]
where $u = y - W\beta$, and $E(uu') = \sigma^2\Sigma^*$. This yields exact ML estimates of the unknown parameters $\theta = (\beta', \theta_1, \theta_2, \dots, \theta_q, \sigma^2)'$, when the regressors $w_t$ do not include lagged values of $y_t$.
The numerical method used to solve the above maximization problem involves a Cholesky decomposition of the variance-covariance matrix $\Sigma^*$. For the MA(q) error specification we have $\Sigma^* = HDH'$, with H being an upper triangular matrix
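\[ H = \begin{pmatrix}
1 & h_{11} & h_{21} & \cdots & h_{q1} & \cdots & 0 \\
 & 1 & h_{12} & h_{22} & \cdots & h_{q2} & 0 \\
 & & \ddots & \ddots & & \ddots & \vdots \\
 & & & \ddots & \ddots & & h_{q,T-q} \\
 & & & & \ddots & \ddots & \vdots \\
 & & & & \ddots & \ddots & h_{2,T-2} \\
 & & & & & 1 & h_{1,T-1} \\
0 & & & & & & 1
\end{pmatrix}, \]
\[ d_t = \delta_0 - \sum_{i=1}^{q} h_{it}^2 d_{t+i}, \quad t = T-1, T-2, \dots, 1, \]
\[ h_{jt} = d_{t+j}^{-1}\left(\delta_j - \sum_{i=j+1}^{q} h_{it}h_{i-j,t+j}d_{t+i}\right), \quad t = T-j, T-j-1, \dots, 1, \quad j = q-1, q-2, \dots, 1, \]
\[ h_{qt} = d_{t+q}^{-1}\delta_q, \]
\[ \delta_s = \begin{cases} \sum_{i=s}^{q}\theta_i\theta_{i-s}, & 0 \leq s \leq q, \\ 0, & s > q. \end{cases} \]
\[ y_t^f = y_t - \sum_{i=1}^{q} h_{it}y_{t+i}^f, \quad \text{for } t = T-1, T-2, \dots, 1, \]
\[ w_t^f = w_t - \sum_{i=1}^{q} h_{it}w_{t+i}^f, \quad \text{for } t = T-1, T-2, \dots, 1, \]
and
\[ y_t^* = d_t^{-1/2}y_t^f, \quad w_t^* = d_t^{-1/2}w_t^f. \]
\[ d_T = \delta_0 = 1 + \theta_1^2 + \theta_2^2 + \dots + \theta_q^2, \]
\[ h_{jT} = h_{j,T-1} = \dots = h_{j,T-j+1} = 0, \]
\[ y_T^f = y_T, \]
\[ w_T^f = w_T. \]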
For a given value of (θ 1 , θ 2 , . . . , θ q ), the estimator of β can be computed by the OLS regression
of $y_t^*$ on $w_t^*$. The estimation of $\theta_1, \theta_2, \dots, \theta_q$ needs to be carried out iteratively. Microfit 5.0
carries out these iterations by the modified Powell method of conjugate directions that does not
require derivatives of the log-likelihood function. See Powell (1964), Brent (1973), and Press
et al. (1989). The application of the Gauss–Newton method to the present problem requires
derivatives of the log-likelihood function which are analytically intractable, and can be very time-
consuming if they are to be computed numerically. In the case of pure MA(q) processes, we need
to set w = (1, 1, . . . , 1) .
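The backward recursions for $d_t$ and $h_{jt}$, and the associated filtering of the data, can be sketched as follows. This is an illustrative implementation under the assumption that $\Sigma^*$ is the banded Toeplitz matrix with elements $\delta_{|i-j|}$; the function names are hypothetical and the code is not the Microfit routine.

```python
import numpy as np

def ma_q_factor(theta, T):
    """Backward recursions for Sigma* = H D H' of an MA(q) error process.

    theta holds (theta_1, ..., theta_q). Returns d (length T), h with
    h[i, t] the element of H in row t, column t + i (0-based t), and delta.
    """
    th = np.r_[1.0, np.asarray(theta, dtype=float)]  # theta_0 = 1
    q = len(th) - 1
    delta = np.array([th[s:] @ th[:q + 1 - s] for s in range(q + 1)])
    d = np.zeros(T)
    h = np.zeros((q + 1, T))  # entries with t + i >= T stay at zero
    for t in range(T - 1, -1, -1):
        for j in range(q, 0, -1):
            if t + j >= T:
                continue
            acc = delta[j]
            for i in range(j + 1, q + 1):
                if t + i < T:
                    acc -= h[i, t] * h[i - j, t + j] * d[t + i]
            h[j, t] = acc / d[t + j]
        d[t] = delta[0] - sum(h[i, t] ** 2 * d[t + i]
                              for i in range(1, q + 1) if t + i < T)
    return d, h, delta

def filter_series(x, d, h):
    """Apply x^f_t = x_t - sum_i h_it x^f_{t+i}, then scale by d_t^{-1/2}."""
    xf = np.array(x, dtype=float)
    T, q = len(xf), h.shape[0] - 1
    for t in range(T - 2, -1, -1):
        for i in range(1, q + 1):
            if t + i < T:
                xf[t] -= h[i, t] * xf[t + i]
    return xf / np.sqrt(d)
```

For given MA coefficients, regressing `filter_series(y, d, h)` on the filtered regressors by OLS then gives the estimate of β.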
\[ y_t = \sum_{i=1}^{p}\phi_i y_{t-i} + \varepsilon_t, \quad \varepsilon_t \sim IID(0, \sigma^2), \]
and suppose that the observations $(y_1, y_2, \dots, y_T)$ are available. As with the MA processes, estimation of the unknown parameters $\theta = (\phi', \sigma^2)'$, where $\phi = (\phi_1, \phi_2, \dots, \phi_p)'$, can be accomplished by the method of moments or by the ML procedure.
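\[ \begin{pmatrix}
\gamma_0 & \gamma_1 & \cdots & \gamma_{p-2} & \gamma_{p-1} \\
\gamma_1 & \gamma_0 & \cdots & \gamma_{p-3} & \gamma_{p-2} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\gamma_{p-2} & \gamma_{p-3} & \cdots & \gamma_0 & \gamma_1 \\
\gamma_{p-1} & \gamma_{p-2} & \cdots & \gamma_1 & \gamma_0
\end{pmatrix}
\begin{pmatrix} \phi_1 \\ \phi_2 \\ \vdots \\ \phi_{p-1} \\ \phi_p \end{pmatrix}
=
\begin{pmatrix} \gamma_1 \\ \gamma_2 \\ \vdots \\ \gamma_{p-1} \\ \gamma_p \end{pmatrix}, \]
or more compactly as
\[ \Gamma_p\phi = \gamma_p. \]
Using the moment estimates of $\gamma_s$ given by (14.10), the YW estimators of φ are given by
\[ \hat{\phi}_{YW} = \hat{\Gamma}_p^{-1}\hat{\gamma}_p, \]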
\[ \hat{\phi}_1 = \frac{\hat{\gamma}_1}{\hat{\gamma}_0} = \frac{\sum_{t=2}^{T} y_t y_{t-1}}{\sum_{t=1}^{T} y_t^2}. \qquad (14.25) \]
The YW estimator of φ is consistent and asymptotically efficient and for sufficiently large T is
very close to the ML estimator of φ, to which we now turn.
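The Yule–Walker calculation can be illustrated with a short sketch. It assumes a zero-mean series and uses the moment estimates in (14.10); the function name `yule_walker` is chosen here for illustration only.

```python
import numpy as np

def yule_walker(y, p):
    """Yule-Walker estimates of (phi_1, ..., phi_p) for a zero-mean series."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    # Sample autocovariances with divisor T, as in (14.10).
    gamma = np.array([y[s:] @ y[:T - s] / T for s in range(p + 1)])
    # Toeplitz matrix with (i, j) element gamma_{|i-j|}.
    Gamma = np.array([[gamma[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(Gamma, gamma[1:])
```

For p = 1 this reduces to the ratio of the first-order sample autocovariance to the sample variance, as in (14.25).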
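where $f(y_1)$ is the marginal distribution of the initial observations, and $\theta = (\phi, \sigma^2)'$. Assuming the process has started a long time ago, we have
\[ y_1 \sim N\left(0, \frac{\sigma^2}{1 - \phi^2}\right), \qquad (14.27) \]
\[ \begin{aligned}
y_1 &= y_1, \\
y_2 &= \phi y_1 + \varepsilon_2, \\
y_3 &= \phi y_2 + \varepsilon_3, \\
&\ \ \vdots \\
y_T &= \phi y_{T-1} + \varepsilon_T.
\end{aligned} \]
Also
\[ f(y_1, y_2, \dots, y_T) = f(y_2, \dots, y_T \mid y_1) f(y_1) = f(y_1, \varepsilon_2, \varepsilon_3, \dots, \varepsilon_T)|J|, \]
where
\[ J = \frac{\partial(y_1, \varepsilon_2, \varepsilon_3, \dots, \varepsilon_T)}{\partial(y_1, y_2, \dots, y_T)} \]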
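Hence
\[ f(y_1, y_2, \dots, y_T \mid \theta) = f(y_1 \mid \theta)\prod_{t=2}^{T} f(\varepsilon_t \mid \theta). \qquad (14.28) \]
\[ \log f(y_1, y_2, \dots, y_T \mid \theta) = \log f(y_1 \mid \theta) + \sum_{t=2}^{T}\log f(\varepsilon_t \mid \theta) \qquad (14.29) \]
Under the assumption that the AR(1) process has started a long time in the past and is stationary, we have
\[ \log f(y_1 \mid \theta) = -\frac{1}{2}\log\left(\frac{2\pi\sigma^2}{1 - \phi^2}\right) - \frac{1 - \phi^2}{2\sigma^2}y_1^2. \qquad (14.30) \]
Also
\[ \log f(\varepsilon_t \mid \theta) = -\frac{1}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}(y_t - \phi y_{t-1})^2, \quad t = 2, 3, \dots, T. \qquad (14.31) \]
Therefore, substituting these in (14.29) we have (recalling that $\ell(\theta) = \log f(y_1, y_2, \dots, y_T \mid \theta)$)
\[ \ell(\theta) = -\frac{T}{2}\log(2\pi\sigma^2) + \frac{1}{2}\log(1 - \phi^2) - \frac{1 - \phi^2}{2\sigma^2}y_1^2 - \frac{1}{2\sigma^2}\sum_{t=2}^{T}(y_t - \phi y_{t-1})^2. \qquad (14.32) \]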
We need to take care of the initial value if φ is very close to 1. The reason is, if $|\phi|$ is sufficiently less than 1, $\log[2\pi\sigma^2/(1 - \phi^2)]$ is finite, hence $\log f(y_1)$ is finite. The effect of the distribution of the initial value will become smaller and smaller as T → ∞. However, as $|\phi| \to 1$ then $\log[2\pi\sigma^2/(1 - \phi^2)] \to \infty$. Therefore, initial values matter in small samples or when the process is near non-stationary. Only in the case where $|\phi| < 1$, and T is sufficiently large, is one justified to ignore $\log f(y_1)$.
Rearranging the terms in (14.32), we now obtain
\[ \ell(\theta) = -\frac{T}{2}\log(2\pi\sigma^2) + \frac{1}{2}\log(1 - \phi^2) - \frac{1}{2\sigma^2}\left[(1 - \phi^2)y_1^2 + \sum_{t=2}^{T}(y_t - \phi y_{t-1})^2\right]. \qquad (14.33) \]
There is a one-to-one relationship between the above expression and the general log-likelihood
specification given by (14.14). It is easily seen that
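\[ (1 - \phi^2)y_1^2 + \sum_{t=2}^{T}(y_t - \phi y_{t-1})^2 = y'\Sigma^{*-1}y, \]
where
\[ \Sigma^* = \frac{1}{1 - \phi^2}\begin{pmatrix}
1 & \phi & \cdots & \phi^{T-2} & \phi^{T-1} \\
\phi & 1 & \cdots & \phi^{T-3} & \phi^{T-2} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\phi^{T-2} & \phi^{T-3} & \cdots & 1 & \phi \\
\phi^{T-1} & \phi^{T-2} & \cdots & \phi & 1
\end{pmatrix}, \]
and
\[ \Sigma^{*-1} = \begin{pmatrix}
1 & -\phi & \cdots & 0 & 0 \\
-\phi & 1 + \phi^2 & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 + \phi^2 & -\phi \\
0 & 0 & \cdots & -\phi & 1
\end{pmatrix}. \]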
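The equivalence between (14.33) and the general Gaussian form can be checked numerically. The sketch below evaluates the exact AR(1) log-likelihood from the scalar expression; the function name is hypothetical, and $|\phi| < 1$ is assumed.

```python
import numpy as np

def ar1_exact_loglik(phi, sigma2, y):
    """Exact Gaussian log-likelihood of a stationary zero-mean AR(1), as in (14.33)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    resid = y[1:] - phi * y[:-1]
    # Quadratic form y' Sigma*^{-1} y written without forming the matrix.
    quad = (1.0 - phi ** 2) * y[0] ** 2 + resid @ resid
    return (-0.5 * T * np.log(2 * np.pi * sigma2)
            + 0.5 * np.log(1.0 - phi ** 2)
            - quad / (2.0 * sigma2))
```

The value should agree with the multivariate normal log-density computed with covariance matrix $\sigma^2\Sigma^*$, which is one way of checking the matrix expressions above.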
Ignoring the log density of the initial observation, $f(y_1)$, the ML estimator of φ can be computed by finding the solution to the following least squares problem
\[ \hat{\phi}_{LS} = \operatorname*{argmin}_{\phi}\sum_{t=2}^{T}(y_t - \phi y_{t-1})^2, \]
and is given by the OLS coefficient of the regression of yt , on yt−1 , for t = 2, 3, . . . , T. In the
case where yt has a non-zero mean, the least squares regression must also include an intercept.
In such a case the least squares estimator of φ is given by
\[ \hat{\phi}_{LS} = \frac{\sum_{t=2}^{T}(y_t - \bar{y})(y_{t-1} - \bar{y}_{-1})}{\sum_{t=2}^{T}(y_{t-1} - \bar{y}_{-1})^2}, \qquad (14.34) \]
where $\bar{y} = (T - 1)^{-1}\sum_{t=2}^{T} y_t$, and $\bar{y}_{-1} = (T - 1)^{-1}\sum_{t=2}^{T} y_{t-1}$. It is now easily seen that $\hat{\phi}_{LS}$ is asymptotically equivalent to the Yule–Walker estimator. This result holds for more general AR processes and extends to the ML estimators when a stationary initial value distribution, $f(y_1)$, is added to the log-likelihood function.
\[ y_t = \alpha + \sum_{i=1}^{p}\phi_i y_{t-i} + \varepsilon_t, \quad \varepsilon_t \sim IIDN(0, \sigma^2). \]
In this case, the log-likelihood function for the sample observations, $y = (y_1, y_2, \dots, y_T)'$, is given by
\[ \ell(\theta) = \log f(y_1, y_2, \dots, y_p) + \sum_{t=p+1}^{T}\log f(y_t \mid y_{t-1}, y_{t-2}, \dots, y_{t-p}), \qquad (14.35) \]
where $\log f(y_1, y_2, \dots, y_p)$ is the log-density of the initial observations, $(y_1, y_2, \dots, y_p)$, and $\theta = (\phi_1, \phi_2, \dots, \phi_p, \sigma^2)'$. Under $\varepsilon_t \sim IIDN(0, \sigma^2)$
\[ \sum_{t=p+1}^{T}\log f(y_t \mid y_{t-1}, y_{t-2}, \dots, y_{t-p}) = -\frac{(T - p)}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{t=p+1}^{T}\left(y_t - \alpha - \sum_{i=1}^{p}\phi_i y_{t-i}\right)^2. \]
When the AR(p) process is stationary, the average log-likelihood function, $T^{-1}\ell(\theta)$, converges to the limit
\[ \operatorname*{Plim}_{T\to\infty} T^{-1}\ell(\theta) = -\frac{1}{2}\log(2\pi\sigma^2) - \frac{1}{2}, \]
and does not depend on the density of the initial observations. Namely, for sufficiently large T, the first part of (14.35) can be ignored, and asymptotically the log-likelihood function can be approximated by
\[ \ell(\theta) \approx -\frac{(T - p)}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{t=p+1}^{T}\left(y_t - \alpha - \sum_{i=1}^{p}\phi_i y_{t-i}\right)^2, \]
where α is an intercept, and allows for the possibility of $y_t$ not having mean zero. The MLE of $\theta = (\alpha, \phi', \sigma^2)' = (\beta', \sigma^2)'$ based on this approximation can be computed by the OLS regression of $y_t$ on $1, y_{t-1}, y_{t-2}, \dots, y_{t-p}$. We have
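where
\[ y_p = \begin{pmatrix} y_{p+1} \\ y_{p+2} \\ \vdots \\ y_T \end{pmatrix}, \quad
X_p = \begin{pmatrix}
1 & y_p & y_{p-1} & \cdots & y_1 \\
1 & y_{p+1} & y_p & \cdots & y_2 \\
\vdots & \vdots & \vdots & & \vdots \\
1 & y_{T-1} & y_{T-2} & \cdots & y_{T-p}
\end{pmatrix}. \]
Also
\[ \hat{\sigma}^2 = \frac{(y_p - X_p\hat{\beta})'(y_p - X_p\hat{\beta})}{T - p}. \]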
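The conditional (least squares) estimation of the AR(p) model with an intercept can be sketched as below. The function name `ar_ols` is hypothetical; the design matrix follows the layout of $X_p$ above, and the variance estimate uses the divisor T − p as in the expression for $\hat{\sigma}^2$.

```python
import numpy as np

def ar_ols(y, p):
    """OLS estimates of (alpha, phi_1, ..., phi_p) and sigma^2 for an AR(p)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    # Row k holds (1, y_{t-1}, ..., y_{t-p}) for the k-th usable observation.
    X = np.column_stack([np.ones(T - p)] +
                        [y[p - i:T - i] for i in range(1, p + 1)])
    yp = y[p:]
    beta, *_ = np.linalg.lstsq(X, yp, rcond=None)
    resid = yp - X @ beta
    sigma2 = resid @ resid / (T - p)
    return beta, sigma2
```

With p = 1, the slope coefficient coincides with the estimator in (14.34).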
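\[ y_{t-1} = \mu + \sum_{j=0}^{\infty}\phi^j\varepsilon_{t-1-j}, \]
and $E(y_{t-1}\varepsilon_{t-s}) = \sigma^2\phi^{s-1}$, for s = 1, 2, . . . The small sample bias of $\hat{\phi}_{LS}$ has been derived in the case where $|\phi| < 1$ and the errors, $\varepsilon_t$, are normally distributed by Kendall (1954) and Marriott and Pope (1954), and is shown to be given by7
\[ E(\hat{\phi}_{LS}) = \phi - \frac{1 + 3\phi}{T} + O\left(\frac{1}{T^2}\right). \qquad (14.37) \]
Also
\[ Var(\hat{\phi}_{LS}) = \frac{1 - \phi^2}{T} + O\left(\frac{1}{T^2}\right). \]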
6 See Sections 2.2 and 9.3 on the differences between weak and strict exogeneity assumptions.
7 Bias corrections for the LS estimates in the case of higher-order AR processes are provided in Shaman and Stine (1988).
\[ \text{bias}(\hat{\phi}_{LS}) = E(\hat{\phi}_{LS}) - \phi = -T^{-1}(1 + 3\phi) + O(T^{-2}), \qquad (14.38) \]
can be substantial when T is small and φ is close to unity. It is also clear that the bias is negative for all positive values of φ. For example, when T = 40 and φ = 0.9, the bias is of the order of −0.0925, which represents a substantial underestimation of φ.
To deal with the problem of small sample bias a number of bias-corrected estimators of φ are
proposed in the literature. Here we draw attention to two of these estimators. The first, initially
proposed by Orcutt and Winokur (1969), uses the bias formula, (14.38), to obtain the following bias-corrected estimator
\[ \tilde{\phi} = \frac{1}{T - 3}\left(1 + T\hat{\phi}_{LS}\right). \]
The second approach is known as the (half) Jackknife bias-corrected estimator and was first
proposed by Quenouille (1949). It is defined by
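\[ \breve{\phi} = 2\hat{\phi}_{LS} - \frac{1}{2}\left(\hat{\phi}_{1,LS} + \hat{\phi}_{2,LS}\right), \]
where $\hat{\phi}_{1,LS}$ and $\hat{\phi}_{2,LS}$ are the least squares estimates of φ based on the first T/2 and the last T/2 observations (it is assumed that T is even; if it is not, one of the observations can be dropped). Again using (14.37) we have
\[ E(\hat{\phi}_{1,LS}) = \phi - \frac{1 + 3\phi}{(T/2)} + O\left(\frac{1}{(T/2)^2}\right), \]
\[ E(\hat{\phi}_{2,LS}) = \phi - \frac{1 + 3\phi}{(T/2)} + O\left(\frac{1}{(T/2)^2}\right), \]
so that
\[ E(\breve{\phi}) = \phi + O\left(\frac{1}{(T/2)^2}\right) = \phi + O\left(\frac{1}{T^2}\right). \]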
Both bias-corrected estimators work well in reducing the bias, but impact the variances of the
bias-corrected estimators differently. The Jackknife estimator does not require knowing the
expression for the bias in (14.38), and is more generally applicable and can be used with sim-
ilar effects in the case of higher-order AR processes.
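The two bias corrections can be illustrated with the following sketch. The function names are hypothetical, T is taken to be the number of observations supplied, and the least squares estimator includes an intercept as in (14.34).

```python
import numpy as np

def ls_ar1(y):
    """Least squares slope of y_t on y_{t-1} with an intercept, as in (14.34)."""
    y = np.asarray(y, dtype=float)
    x, z = y[:-1], y[1:]
    xd = x - x.mean()
    return float(xd @ (z - z.mean()) / (xd @ xd))

def approx_bias(phi, T):
    """Leading term of the small sample bias in (14.38)."""
    return -(1.0 + 3.0 * phi) / T

def orcutt_winokur(y):
    """Bias-corrected estimator (1 + T * phi_LS) / (T - 3)."""
    T = len(y)
    return (1.0 + T * ls_ar1(y)) / (T - 3)

def half_jackknife(y):
    """Half-jackknife estimator built from the two half-sample estimates."""
    y = np.asarray(y, dtype=float)
    if len(y) % 2:          # drop one observation when T is odd
        y = y[1:]
    half = len(y) // 2
    return 2.0 * ls_ar1(y) - 0.5 * (ls_ar1(y[:half]) + ls_ar1(y[half:]))
```

For instance, `approx_bias(0.9, 40)` evaluates the leading bias term for the numerical example given above.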
Finally, when using bias-corrected estimators it is important to bear in mind that the variance
of the bias-corrected estimator tends to be higher than the variance of the uncorrected estimator.
For example, $Var(\tilde{\phi}) = \left(\frac{T}{T-3}\right)^2 Var(\hat{\phi}_{LS}) > Var(\hat{\phi}_{LS})$, and the overall effect of bias-correction
on the mean squared error of the estimators generally depends on the true value, φ. In the case
of φ̂ LS and φ̃, for example, we have
2 2
(1 + 3φ)2 + 1 − φ 2 1
MSE φ̂ LS = bias of φ̂ LS + Var(φ̂ LS ) = + O ,
T2 T3
2 2
1 − φ2 1
MSE φ̃ = bias of φ̃ + Var(φ̃ ) = + O .
(T − 3)2 T3
Although for most values of φ and T, $MSE(\tilde{\phi}) < MSE(\hat{\phi}_{LS})$, there exist combinations of φ and T for which $MSE(\tilde{\phi}) > MSE(\hat{\phi}_{LS})$. For example, this is the case when φ = 0 and T < 10.
yt = λyt−1 + βxt + ut ,
where, without loss of generality, we assume that yt and xt both have zero means. Suppose that
$|\lambda| < 1$, and $x_t$ and $u_t$ are covariance stationary processes with absolutely summable autocovariances, as defined by
\[ x_t = \sum_{i=0}^{\infty} a_i v_{t-i}, \]
and
\[ u_t = \sum_{i=0}^{\infty} b_i\varepsilon_{t-i}, \]
where $\sum_{i=0}^{\infty}|a_i| < K < \infty$, $\sum_{i=0}^{\infty}|b_i| < K < \infty$, and $\{v_t\}$ and $\{\varepsilon_t\}$ are standard white noise processes.
Let θ = (λ, β) , and zt = (yt−1 , xt ) , for t = 1, 2, . . . , T, (see Section 6.2 for further details),
and consider the least squares estimator of θ, θ̂ OLS , and note that
\[ \hat{\theta}_{OLS} - \theta_0 = \left(T^{-1}\sum_{t=1}^{T} z_t z_t'\right)^{-1}\left(T^{-1}\sum_{t=1}^{T} z_t u_t\right), \qquad (14.39) \]
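\[ y_t = \beta c(L)v_t + d(L)\varepsilon_t, \]
where
Both $c(L)$ and $d(L)$ are products of two polynomials with absolutely summable coefficients, and therefore also satisfy the absolute summability conditions $\sum_{i=0}^{\infty}|c_i| < \left(\sum_{i=0}^{\infty}|\lambda^i|\right)\left(\sum_{i=0}^{\infty}|a_i|\right) < K < \infty$ and $\sum_{i=0}^{\infty}|d_i| < \left(\sum_{i=0}^{\infty}|\lambda^i|\right)\left(\sum_{i=0}^{\infty}|b_i|\right) < K < \infty$. Hence,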
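\[ T^{-1}\sum_{t=1}^{T} z_t z_t' \overset{p}{\to} E(z_t z_t') = \begin{pmatrix} \omega_{11} & \omega_{12} \\ \omega_{12} & \omega_{22} \end{pmatrix}, \]
where
\[ \omega_{11} = \beta^2\sum_{i=0}^{\infty} c_i^2 + \sum_{i=0}^{\infty} d_i^2, \]
\[ \omega_{12} = \beta\sum_{i=0}^{\infty} c_i a_{i+1}, \quad \omega_{22} = \sum_{i=0}^{\infty} a_i^2. \]
Similarly,
\[ T^{-1}\sum_{t=1}^{T} z_t u_t \overset{p}{\to} E(z_t u_t) = \begin{pmatrix} \delta \\ 0 \end{pmatrix}, \]
where
\[ \delta = \sum_{i=1}^{\infty} b_i d_{i-1}. \]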
In general, therefore, the OLS estimator, θ̂ OLS , is inconsistent, unless δ = 0. The direction of
inconsistency (the asymptotic bias) depends particularly on the sign of δ. Note that $\omega_{22} > 0$, and also since $E(z_t z_t')$ is a positive definite matrix then $\omega_{11}\omega_{22} - \omega_{12}^2 > 0$. Hence
In most economic applications where λ > 0 and the errors are positively serially correlated,
δ > 0, the OLS estimator of λ will be biased upward.
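The step linking (14.39) to the sign of the asymptotic bias can be sketched as follows. This is a reconstruction from (14.39) and the probability limits stated above, offered as an illustration rather than as the text's own displayed equation:

```latex
\operatorname*{Plim}_{T\to\infty}\left(\hat{\theta}_{OLS} - \theta_0\right)
= \begin{pmatrix} \omega_{11} & \omega_{12} \\ \omega_{12} & \omega_{22} \end{pmatrix}^{-1}
  \begin{pmatrix} \delta \\ 0 \end{pmatrix}
= \frac{\delta}{\omega_{11}\omega_{22} - \omega_{12}^2}
  \begin{pmatrix} \omega_{22} \\ -\omega_{12} \end{pmatrix}.
```

Under this sketch the first element, which is the asymptotic bias of the OLS estimator of λ, has the same sign as δ, because $\omega_{22} > 0$ and the determinant is positive.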
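\[ y_t = \sum_{i=1}^{p}\phi_i y_{t-i} + \sum_{i=0}^{q}\theta_i\varepsilon_{t-i}, \quad \varepsilon_t \sim IID(0, \sigma^2), \text{ with } \theta_0 = 1. \qquad (14.40) \]
\[ y_t = w_t'\beta + \sum_{i=0}^{q}\theta_i\varepsilon_{t-i}, \qquad (14.41) \]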
Suppose now that $\lambda_{p+1}$ is very close to $\mu_{p+1}$, then the above ARMA specification might be indistinguishable from the ARMA(p, q) specification given by (14.40). In the extreme case where $\lambda_{p+1} = \mu_{p+1}$, the common lag factor, $1 - \lambda_{p+1}L$, can be cancelled from both sides which yields the minimal-order ARMA(p, q) specification. See also Exercise 5 at the end of this chapter.
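\[ 1 - \sum_{i=1}^{p}\phi_i z^i = 0, \]
lie outside the unit circle, $|z| > 1$, and for T sufficiently large and p fixed we have
\[ \sqrt{T}(\hat{\phi} - \phi) \overset{a}{\sim} N(0, \sigma^2 V_p^{-1}), \]
where
\[ V_p = \operatorname*{Plim}_{T\to\infty}\left(\frac{Y_p'Y_p}{T}\right) = \begin{pmatrix}
\gamma_0 & \gamma_1 & \cdots & \gamma_{p-2} & \gamma_{p-1} \\
\gamma_1 & \gamma_0 & \cdots & \gamma_{p-3} & \gamma_{p-2} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\gamma_{p-2} & \gamma_{p-3} & \cdots & \gamma_0 & \gamma_1 \\
\gamma_{p-1} & \gamma_{p-2} & \cdots & \gamma_1 & \gamma_0
\end{pmatrix} = \Gamma_p. \]
\[ \text{Asy.Var}(\hat{\phi}_1) = \frac{1 - \phi_1^2}{T}, \quad \text{for } |\phi_1| < 1. \]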
\[ \tilde{f}_T(\omega) = \frac{1}{2\pi}\left[\hat{\gamma}_T(0) + 2\sum_{h=1}^{T-1}\hat{\gamma}_T(h)\cos(\omega h)\right], \]
where the unknown covariances are replaced by their estimators, (14.3). But this is not a good
estimator since for large values of h, γ̂ T (h) will be based only on a few data points, and hence
the condition for consistency of f̃T (ω) will not be satisfied. To avoid this problem the sum in
the above expression needs to be truncated. It is also necessary to put less weight on distant autocovariances, namely those with relatively large h, to reduce the possibility of undue influence of these estimators on $\tilde{f}_T(\omega)$. This suggests using a truncated and weighted version of $\tilde{f}_T(\omega)$. At a given frequency, $\omega = \omega_j$ in the range [0, π], a consistent estimator of $f(\omega)$ is given by
\[ \hat{f}_T(\omega_j) = \frac{1}{2\pi}\left[\hat{\gamma}_T(0) + 2\sum_{h=1}^{K}\lambda_h\hat{\gamma}_T(h)\cos(\omega_j h)\right], \qquad (14.42) \]
where $\omega_j = j\pi/K$, j = 0, 1, . . . , K, K is the 'window size', and $\{\lambda_h\}$ are a set of weights called the 'lag window'.
Many different weighting schemes are proposed in the literature. Among these, the most com-
monly used are
\[ \text{Bartlett window}: \quad \lambda_h = 1 - \frac{h}{K}, \quad 0 \leq h \leq K, \]
\[ \text{Tukey window}: \quad \lambda_h = \frac{1}{2}\left[1 + \cos(\pi h/K)\right], \quad 0 \leq h \leq K, \]
\[ \text{Parzen window}: \quad \lambda_h = \begin{cases} 1 - 6(h/K)^2 + 6(h/K)^3, & 0 \leq h \leq K/2, \\ 2(1 - h/K)^3, & K/2 \leq h \leq K. \end{cases} \]
We need the window size K to increase with T, but at a lower rate, such that the condition K/T → 0 as T → ∞ is satisfied. The value for K is often set equal to $2\sqrt{T}$.
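The truncated and weighted estimator (14.42) with the three lag windows can be sketched as follows. The sketch assumes that the autocovariances are computed from the demeaned series with divisor T (the definition in (14.3) is not reproduced in this section), and the function names are hypothetical.

```python
import numpy as np

def lag_window(h, K, kind="bartlett"):
    """Lag-window weights for lags h (array-like) and window size K."""
    x = np.asarray(h, dtype=float) / K
    if kind == "bartlett":
        return 1.0 - x
    if kind == "tukey":
        return 0.5 * (1.0 + np.cos(np.pi * x))
    if kind == "parzen":
        return np.where(x <= 0.5,
                        1.0 - 6.0 * x ** 2 + 6.0 * x ** 3,
                        2.0 * (1.0 - x) ** 3)
    raise ValueError(f"unknown window: {kind}")

def spectral_density(y, K, kind="bartlett"):
    """Estimate f(omega_j) at omega_j = j*pi/K, j = 0, ..., K, as in (14.42)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    yc = y - y.mean()  # assumption: autocovariances use the demeaned series
    gamma = np.array([yc[h:] @ yc[:T - h] / T for h in range(K + 1)])
    h = np.arange(1, K + 1)
    lam = lag_window(h, K, kind)
    omega = np.arange(K + 1) * np.pi / K
    f = (gamma[0] + 2.0 * np.cos(np.outer(omega, h)) @ (lam * gamma[1:])) / (2.0 * np.pi)
    return omega, f
```

A call such as `spectral_density(y, K=int(2 * np.sqrt(len(y))), kind="parzen")` follows the rule of thumb for K mentioned above.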
In practice, it is more convenient to work with a standardized spectrum, defined by
where $\hat{\rho}_T(h)$ is given by (14.4). The standard errors reported for the estimates of the standardized spectrum can be calculated according to the following formulae, which are valid asymptotically
\[ \widehat{s.e.}\left(\hat{f}(\omega_j)\right) = \sqrt{2/v}\,\hat{f}(\omega_j), \quad \text{for } j = 1, 2, \dots, K - 1, \]
\[ \qquad\qquad\quad\ = \sqrt{4/v}\,\hat{f}(\omega_j), \quad \text{for } j = 0, K, \]
where $v = 2T/\sum_{k=-K}^{K}\lambda_k^2$. For the three different windows, v is given by:
\[ \text{Bartlett window}: \quad v = 3\frac{T}{K}, \]
\[ \text{Tukey window}: \quad v = \frac{8T}{3K}, \]
\[ \text{Parzen window}: \quad v = 3.71\frac{T}{K}. \]
Note that the estimates of the standard error of the spectrum at the limit frequencies 0 and π are $\sqrt{2}$ times as large as the standard error estimates at the intermediate frequencies (the corresponding variances are twice as large). This is particularly relevant to the analysis of persistence as it involves estimation of the spectrum at zero frequency.
Example 32 Figure 14.1 shows the estimation of the spectral density function for the rate of change
of US real GNP, (denoted by DYUS) using quarterly data from 1979q3 to 2013q1, for a total of
135 observations. Estimates of the standardized spectral density function based on Bartlett, Tukey,
and Parzen windows are reported. The estimates of the spectral density are scaled and standardized using the unconditional variance of the variable. The window size is set to $2\sqrt{T} = 23$. One
important feature of this plot is that the contribution to the sample variance of the lowest-frequency
component is much larger than the contributions of other frequencies (for example at business cycle
frequencies). For further details see Lessons 10.11 and 12.2 in Pesaran and Pesaran (2009).
\[ \hat{f}(\omega_j) = \sum_{i=-K}^{K}\lambda(\omega_{j+i}, \omega_j)\hat{f}(\omega_{j+i}), \]
Figure 14.1 Spectral density function for the rate of change of US real GNP.
where $\omega_j = j\pi/K$, K is a bandwidth parameter indicating how many frequencies $\{\omega_j, \omega_{j\pm 1}, \dots, \omega_{j\pm K}\}$ are used in estimating the population spectrum, and the kernel $\lambda(\omega_{j+i}, \omega_j)$ indicates how much weight each frequency is to be given, where $\sum_{i=-K}^{K}\lambda(\omega_{j+i}, \omega_j) = 1$. Specification of kernel $\lambda(\omega_{j+i}, \omega_j)$ can equivalently be described in terms of a weighting sequence $\{\lambda_j, j = 1, \dots, K\}$. One important problem is the choice of the bandwidth parameter, K. As a practical
guide, it is often recommended to plot an estimate of the spectrum using several different band-
widths and then rely on subjective judgement to choose the bandwidth that produces the most
plausible estimate. More formal statistical procedures for the choice of K have been proposed,
among others, by Andrews (1991), and Andrews and Monahan (1992).
For an introductory text on the estimation of the spectrum see Chapter 7 in Chatfield (2003).
For more advanced treatments of the subject see Priestley (1981, Ch. 6), and Brockwell and
Davis (1991, Ch. 10).
14.10 Exercises
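1. Suppose $y_t$ is a covariance stationary linear process given by
\[ y_t = \mu + \sum_{j=0}^{\infty}\alpha_j\varepsilon_{t-j}, \quad \varepsilon_t \sim IID(0, \sigma^2), \]
where $\sum_{j=0}^{\infty}|\alpha_j| < \infty$, and $\sum_{j=0}^{\infty}\alpha_j \neq 0$.
(a) Show that the sample mean, $\bar{y}_T = T^{-1}\sum_{t=1}^{T} y_t$, is a consistent estimator of μ.
(b) Consider now the following estimate of the autocovariance function of $y_t$
\[ \hat{\gamma}_s = T^{-1}\sum_{t=1}^{T-s}(y_t - \bar{y}_T)(y_{t+s} - \bar{y}_T), \]
for 0 ≤ s ≤ T − 1. Show that the matrix $\begin{pmatrix} \hat{\gamma}_0 & \hat{\gamma}_1 \\ \hat{\gamma}_1 & \hat{\gamma}_0 \end{pmatrix}$ is positive definite.
(a) Show that the limit of the covariance of $y_1$ and $\bar{y}_T = T^{-1}(y_1 + y_2 + \dots + y_T)$ is given by
\[ \lim_{T\to\infty}\left(\frac{1}{T}\sum_{j=0}^{T-1}\gamma_j\right) = 0. \qquad (14.43) \]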
3. The time series {yt } and {xt } are independently generated according to the following schemes
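\[ y_t = \beta x_t + u_t, \]
\[ \operatorname*{Plim}_{T\to\infty}(\hat{\beta}) = 0, \]
\[ t_{\hat{\beta}}^2 = \hat{\beta}^2/\hat{V}(\hat{\beta}) = (T - 1)r^2/(1 - r^2), \]
\[ \operatorname*{Plim}_{T\to\infty}(Tr^2) = (1 + \lambda\rho)/(1 - \lambda\rho), \]
where $\hat{\beta}$ is the OLS estimator of β, $\hat{V}(\hat{\beta})$ is the estimated variance of $\hat{\beta}$, and r is the sample correlation coefficient between x and y, i.e., $r^2 = \left(\sum_{t=1}^{T} x_t y_t\right)^2/\left(\sum_{t=1}^{T} x_t^2\sum_{t=1}^{T} y_t^2\right)$. What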
are the implications of these results for problems of spurious correlation in economic time
series analysis?
4. Consider the AR(1) process
(a) Derive the log-likelihood function of the model: (i) conditional on y0 being a given fixed
constant. (ii) y0 is normally distributed with mean zero and variance σ 2 /(1 − φ 2 ).
(b) Discuss the maximum likelihood estimator of φ under (i) and (ii), and show that it is
guaranteed to be in the range |φ| < 1, only under (ii).
5. Weekly returns (rt ) on DAX futures contracts are available over the period 14 Jan 1994 – 30
Oct 2009 (825 weeks). A trader interested in predicting future values of rt proceeds initially
by estimating the first- and second-order autocorrelation coefficients of returns and obtains
the estimates
ρ̂ 1 = −0.041, ρ̂ 2 = 0.0511.
It is known that under $\rho_1 = \rho_2 = 0$, $\sqrt{T}\hat{\rho} = (\sqrt{T}\hat{\rho}_1, \sqrt{T}\hat{\rho}_2)'$ is asymptotically distributed as $N(0, I_2)$, where T is the sample size and $I_2$ is an identity matrix of order 2. He/she finds these results disappointing and decides to estimate an ARMA(1,1) model for $r_t$,
\[ r_t = \phi r_{t-1} + \varepsilon_t + \theta\varepsilon_{t-1}, \]
and obtains the following maximum likelihood (ML) estimates
\[ \hat{\phi} = \underset{(0.0678)}{-0.8453}, \quad \hat{\theta} = \underset{(0.0767)}{0.7811}, \quad \widehat{Cov}(\hat{\phi}, \hat{\theta}) = 0.0045, \]
where the figures in brackets are standard errors of the associated ML estimates. Knowing that
the ML estimators are asymptotically normally distributed, the trader argues in favour of the
ARMA model on the grounds that the parameters of the model, φ and θ are statistically highly
significant.
(a) Do you agree with the trader’s statistical analyses and conclusions? To answer this ques-
tion we suggest that you carry out the following tests at the 5 per cent significance level
(b) Discuss the general problem of asset return predictability and its relation to the efficient
market hypothesis.
15.1 Introduction
Within the class of stochastic linear processes discussed in the earlier chapters, the case
where one of the roots of the autoregressive representation of the underlying process is
unity plays an important role in the analysis of macroeconomic and financial data. In this chap-
ter we compare the properties of unit root processes with the stationary processes and consider
alternative ways of testing for unit roots.
with a given (fixed or stochastic) initial value, y0 . The parameter μ is known as the ‘drift’ param-
eter. Solving for yt in terms of its initial value we obtain
yt = y0 + tμ + ε 1 + ε 2 + . . . + ε t ,
Clearly, the random walk model is not covariance stationary, even if we set the drift term μ equal to zero. The coefficients of the innovations, $\varepsilon_t$, are not square summable, and the variance of $y_t$ is trended. But it is easily seen that the first difference of $y_t$, namely $\Delta y_t = \mu + \varepsilon_t$, is covariance stationary. For this reason the random walk model is also called the first-difference stationary process. Pictorial examples of random walk models are given in Figures 15.1 and 15.2.
[Figures 15.1 and 15.2: plots of random walk series against observations (horizontal axis up to 1,000).]
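Paths of the kind shown in such figures can be generated from the solution $y_t = y_0 + t\mu + \varepsilon_1 + \dots + \varepsilon_t$. The sketch below assumes Gaussian innovations, which is an illustrative choice rather than a requirement of the model; the function name and default values are hypothetical.

```python
import numpy as np

def simulate_random_walk(T, mu=0.0, sigma=1.0, y0=0.0, seed=0):
    """Simulate y_1, ..., y_T from y_t = mu + y_{t-1} + eps_t, starting at y0."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, size=T)
    # Closed-form solution: initial value + deterministic drift + cumulated shocks.
    return y0 + mu * np.arange(1, T + 1) + np.cumsum(eps)
```

Setting `mu=0.0` gives a random walk without drift, and a non-zero `mu` adds a linear trend to the cumulated shocks.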
\[ y_t = y_{t-1} + u_t, \qquad (15.2) \]
\[ u_t = \sum_{i=0}^{\infty} a_i\varepsilon_{t-i}, \qquad (15.3) \]
where $\{\varepsilon_t\}$ is a mean zero, serially uncorrelated process. Therefore, $\{y_t\}$ is an integrated process of order 1, I(1). Similarly, $\{u_t\}$ is also referred to as an I(0) process.
The I(1) process in (15.2) is a unit root process without a drift, namely $E(y_t) = y_0$. A unit root process with a non-zero drift is defined by
\[ \Delta y_t = \mu + u_t, \qquad (15.4) \]
A more formal definition of $\Omega_t$ treats it as a non-decreasing sequence of σ-fields generated by $y_1, y_2, \dots, y_t$, i.e., $\Omega_t \supseteq \Omega_{t-1} \supseteq \dots \supseteq \Omega_0$. A process satisfying Definition 23 with $\Omega_t$ so defined is said to have unbounded memory. When $y_t$ is the outcome of a game, then the martingale condition, $E(y_{t+1} \mid \Omega_t) = y_t$, is also known as the 'fair game' condition. It is clear that the random walk model
\[ y_t = y_{t-1} + \varepsilon_t, \]
and $E(\varepsilon_t \mid \Omega_{t-1}) = 0$, by assumption. The main difference between random walk and martingale processes lies in the assumption concerning the innovations $\varepsilon_t$. Under the random walk model $\varepsilon_t$ has a constant variance, whilst when $y_t$ is a martingale process, the innovations $\varepsilon_t$ could be conditionally and/or unconditionally heteroskedastic.
Some important properties of the martingale processes are:
1. $E(y_{t+j} \mid \Omega_t) = y_t$, for all j ≥ 0. This follows from the law of iterated expectations: suppose there are two sets $S_t \subseteq \Omega_t$, then we have $E[E(y_t \mid \Omega_{t-1}) \mid S_{t-1}] = E(y_t \mid S_{t-1})$. Apply this to the martingale process at hand
\[ E[E(y_{t+1} \mid \Omega_t) \mid \Omega_{t-1}] = E(y_{t+1} \mid \Omega_{t-1}), \qquad (15.5) \]
since $E(y_{t+1} \mid \Omega_t) = y_t$, it then follows that
\[ E[E(y_{t+1} \mid \Omega_t) \mid \Omega_{t-1}] = E(y_t \mid \Omega_{t-1}) = y_{t-1} = E(y_{t+1} \mid \Omega_{t-1}). \qquad (15.6) \]
A generalization of (15.6) yields the desired result, $E(y_{t+j} \mid \Omega_t) = y_t$, for j ≥ 0.
2. Constant mean. Since
\[ E(y_t) = E[E(y_t \mid \Omega_{t-1})] = E(y_{t-1}), \]
a martingale process has a constant mean, but it can have time-varying variance, that is, a martingale process allows for heteroskedasticity or conditional heteroskedasticity.
the normality assumption does not hold and martingale difference processes are more generally
applicable.
15.3.3 Lp -mixingales
The one-step ahead unpredictability property of martingale differences is often unrealistic in a
time series context. A more general class of processes, known as Lp -mixingales introduced by
McLeish (1975b) and Andrews (1988), provide an important generalization of martingale dif-
ferences to situations where the process is asymptotically unpredictable in the sense formalized
in Definition 25.
Definition 25 Let $\{y_t\}$ be a sequence of random variables with $E(y_t) = 0$, t = 1, 2, . . ., and let $\Omega_t$ be the information set available at time t. The sequence is said to follow an $L_p$-mixingale with respect to $\Omega_t$ if we can find a sequence of deterministic constants $\{c_t\}_{-\infty}^{\infty}$ and $\{\xi_m\}_{0}^{\infty}$ such that $\xi_m \to 0$ as $m \to \infty$ and
\[ \left\| E(y_t \mid \Omega_{t-m}) \right\|_p \leq c_t\xi_m, \]
\[ \left\| y_t - E(y_t \mid \Omega_{t+m}) \right\|_p \leq c_t\xi_{m+1}, \]
None of the above properties holds for a model with a unit root. In the case of the simple
random walk process without a drift
we have
yt = y0 + σ (ε 1 + ε 2 + · · · + ε t ),
1. The variance of $y_t$ is
\[ Var(y_t) = \sigma^2 t, \]
which is an increasing function of time and hence grows without bound as t → ∞.
2. A shock will have a permanent effect on $y_t$, namely for a non-zero shock of size δ hitting the system at time t
\[ \lim_{h\to\infty}\left[E(y_{t+h} \mid \varepsilon_t = \delta, \Omega_{t-1}) - E(y_{t+h} \mid \Omega_{t-1})\right] = \delta, \]
which is non-zero.
3. The spectrum of $y_t$ has the approximate shape $f(\omega) \sim A\omega^{-2}$, and $f(0) \to \infty$.
4. The expected time between crossings of $y = y_0$ is infinite.
5. $\rho_k \to 1$, for all k as t → ∞.
The trend and difference stationary processes have both played a very important role in the
empirical analysis of economic data, with the latter being used particularly in the analysis of
financial data. The importance of difference stationary processes for the analysis of economic
time series was first emphasized by Nelson and Plosser (1982), who argued that the majority of aggregate time series, such as output, employment, prices and interest rates, are best characterized by a first difference stationary process, rather than by a stationary process around a deterministic trend. The issue of whether economic time series are trend stationary or
first difference stationary (also known as the ‘unit root’ problem) has been the subject of inten-
sive research and controversy.
where $Var(y_{t+m-1} - y_{t-1})$ is known as the long-difference variance, and $Var(y_t - y_{t-1})$ the short-difference variance. Under the random walk hypothesis (with or without a drift), we have $VR_m = 1$.
Consider now the properties of VRm under the alternative hypothesis that yt is a stationary
AR(1) process
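For m = 2, we have
\[ VR_2 = \frac{Var(y_{t+1} - y_{t-1})}{2Var(y_t - y_{t-1})} = \frac{Var(y_{t+1}) - 2Cov(y_{t+1}, y_{t-1}) + Var(y_{t-1})}{2Var(y_t - y_{t-1})}, \qquad (15.8) \]
and under the AR(1) model $Var(y_{t+1}) = Var(y_{t-1}) = \sigma^2/(1 - \rho^2)$, and $Cov(y_{t+1}, y_{t-1}) = \rho^2\sigma^2/(1 - \rho^2)$. Hence, using these in (15.8) yields
\[ VR_2 = \frac{2\frac{\sigma^2}{1 - \rho^2} - 2\frac{\rho^2\sigma^2}{1 - \rho^2}}{2\left[2\frac{\sigma^2}{1 - \rho^2} - 2\frac{\rho\sigma^2}{1 - \rho^2}\right]} = \frac{1}{2}(1 + \rho) < 1, \qquad (15.9) \]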
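\[ y_{t+m-1} - y_{t-1} = \Delta y_{t+m-1} + \Delta y_{t+m-2} + \dots + \Delta y_t = \sum_{j=0}^{m-1}\Delta y_{t+j}, \]
then
\[ Var\left(\frac{y_{t+m-1} - y_{t-1}}{m}\right) = Var\left(\frac{\sum_{j=0}^{m-1}\Delta y_{t+j}}{m}\right). \]
Now using the general formula for the variance of the sample mean of a stationary process given by (14.1), it readily follows that
\[ Var\left(\frac{y_{t+m-1} - y_{t-1}}{m}\right) = \frac{1}{m}\left[\gamma_{\Delta y}(0) + 2\sum_{h=1}^{m-1}\left(1 - \frac{h}{m}\right)\gamma_{\Delta y}(h)\right], \]
where $\gamma_{\Delta y}(h)$ is the autocovariance function of $\Delta y_t$, which is a stationary process. Hence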
\[ VR_m = \frac{m\,Var\left(\frac{y_{t+m-1} - y_{t-1}}{m}\right)}{Var(y_t - y_{t-1})} = \frac{m\,Var\left(\frac{y_{t+m-1} - y_{t-1}}{m}\right)}{\gamma_{\Delta y}(0)} = 1 + 2\sum_{h=1}^{m-1}\left(1 - \frac{h}{m}\right)\rho_{\Delta y}(h), \qquad (15.10) \]
where $\rho_{\Delta y}(h) = \gamma_{\Delta y}(h)/\gamma_{\Delta y}(0)$ is the autocorrelation function of order h of the first difference process $\{\Delta y_t\}$. It is easily verified that (15.9) is a special case of (15.10), noting that in the case of the AR(1) process given by (15.7) we have
\[ \rho_{\Delta y}(1) = \frac{Cov(y_{t+1} - y_t, y_t - y_{t-1})}{Var(y_t - y_{t-1})} = -\frac{1 - \rho}{2}. \]
\[ \widehat{VR}_m = 1 + 2\sum_{h=1}^{m-1}\left(1 - \frac{h}{m}\right)\hat{\rho}_{\Delta y}(h), \]
\[ w_t = \Delta y_t - T^{-1}\sum_{\tau=1}^{T}\Delta y_\tau, \]
for series with linear deterministic trends (or random walk models with a drift).
It is interesting to note that equation (15.10) is closely related to the estimate of the standardized spectral density of $\Delta y_t$ with Bartlett window of size m − 1, namely (in the case of random walk models without a drift)
\[ \hat{f}^*_{\Delta y}(0) = \frac{1}{2\pi}\left[1 + 2\sum_{h=1}^{m-1}\lambda_h\hat{\rho}_{\Delta y}(h)\right], \quad \lambda_h = 1 - \frac{h}{m-1}. \qquad (15.11) \]
Hence, a test of the random walk model can be carried out by testing the hypothesis that $2\pi\hat{f}^*_{\Delta y}(0) \approx VR_m = 1$. Significant departures of $2\pi f^*_{\Delta y}(0)$ from unity can be interpreted as evidence against the random walk model. The choice of m is guided by the same considerations as with the estimation of the spectral density, namely it should be chosen to be sufficiently large such that m/T → 0 as T → ∞. Popular choices are $m = T^{1/2}$ and $T^{1/3}$.
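A sample version of the variance ratio in (15.10) can be computed as in the sketch below. It assumes that the first differences are demeaned (which allows for a drift) and that autocovariances use the divisor equal to the number of differences; the function name is hypothetical.

```python
import numpy as np

def variance_ratio(y, m):
    """Sample variance ratio 1 + 2 * sum_{h=1}^{m-1} (1 - h/m) * rho_hat(h)."""
    dy = np.diff(np.asarray(y, dtype=float))
    w = dy - dy.mean()          # demeaned first differences
    n = len(w)
    g0 = w @ w / n
    vr = 1.0
    for h in range(1, m):
        rho_h = (w[h:] @ w[:n - h] / n) / g0
        vr += 2.0 * (1.0 - h / m) * rho_h
    return vr
```

Values well below one point to negative autocorrelation in the differences, as in the stationary AR(1) case with m = 2.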
The above version of the random walk hypothesis requires the strict IID assumption for the error terms, $\varepsilon_t$. One generalization of the above model relaxes the IID assumption and allows the $\varepsilon_t$ to have non-identical distributions. This model includes the IID case as a special case and allows for unconditional heteroskedasticity in the $\varepsilon_t$.
A more general version of the random walk hypothesis is obtained by relaxing the independence assumption. This is the same as the I(1) or the first difference stationary model. Under the null hypothesis that $y_t$ is first difference stationary, $2\pi f^{*}_{\Delta y}(0)$ can depart from unity. Therefore, in general, it is not possible to base a test of the unit root hypothesis on $2\pi f^{*}_{\Delta y}(0)$.
The absence of drift for the unit root is achieved by the restriction on the intercept. That is, when
|φ| < 1, E(yt ) = μ and when φ = 1, then E(yt ) = y0 . Therefore, (15.12) shows the AR (1)
without time trend regardless of whether φ = 1 or |φ| < 1. Such consideration is important
since the trend characteristics contained in the data are invariant to whether the model used is
unit root or not.
The unit root hypothesis is
H0 : φ = 1,
against
H1 : |φ| < 1.
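To compute the Dickey–Fuller (DF) test statistic (Dickey and Fuller (1979)) we need first to write (15.12) as
\[
\Delta y_t = \mu(1-\phi) - (1-\phi)y_{t-1} + \varepsilon_t, \tag{15.13}
\]
and then test the null hypothesis that the coefficient of $y_{t-1}$ in the above regression is zero. Letting $\beta = -(1-\phi)$, then (15.13) is
\[
\Delta y_t = -\mu\beta + \beta y_{t-1} + \varepsilon_t, \tag{15.14}
\]
and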
H0 : β = 0,
against
H1 : β < 0.
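\[
DF = \frac{\hat\beta}{\mathrm{s.e.}(\hat\beta)}, \tag{15.15}
\]
with
\[
\hat\beta = \frac{\sum_{t=1}^{T}\Delta y_t\,(y_{t-1} - \bar y_{-1})}{\sum_{t=1}^{T}(y_{t-1} - \bar y_{-1})^2},
\]
where $\bar y_{-1} = T^{-1}\sum_{t=1}^{T} y_{t-1}$ is the simple average of $y_0, y_1, \dots, y_{T-1}$, and
\[
\hat V(\hat\beta) = \frac{\hat\sigma^2}{\sum_{t=1}^{T}(y_{t-1} - \bar y_{-1})^2},
\]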
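In matrix form,
\[
DF = \frac{\Delta y' M_\tau y_{-1}}{\hat\sigma\,(y_{-1}' M_\tau y_{-1})^{1/2}}, \tag{15.17}
\]
where $\Delta y = (\Delta y_1, \Delta y_2, \dots, \Delta y_T)'$, $\tau = (1, 1, \dots, 1)'$ is a $T \times 1$ vector of ones, $M_\tau = I_T - \tau(\tau'\tau)^{-1}\tau'$, $y_{-1} = (y_0, y_1, \dots, y_{T-1})'$, and $s_{-1} = (s_0, s_1, s_2, \dots, s_{T-1})'$, with $s_0 = 0$, and $s_t = \sum_{i=1}^{t}\varepsilon_i$. But under $H_0$ we have $\Delta y = \varepsilon$ and $y_{-1} = y_0\tau + s_{-1}$, so that $M_\tau y_{-1} = M_\tau s_{-1}$ and
\[
DF = \frac{\varepsilon' M_\tau s_{-1}}{\hat\sigma\,(s_{-1}' M_\tau s_{-1})^{1/2}}. \tag{15.19}
\]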
Also, under $H_0$
\[
\hat\sigma^2 = \sigma^2 + o_p(1).
\]
Hence asymptotically,
\[
DF \overset{a}{\sim} \frac{\left(\frac{\varepsilon}{\sigma}\right)' M_\tau \left(\frac{s_{-1}}{\sigma}\right)}{\left[\left(\frac{s_{-1}}{\sigma}\right)' M_\tau \left(\frac{s_{-1}}{\sigma}\right)\right]^{1/2}}, \tag{15.20}
\]
with $E(\varepsilon/\sigma) = 0$ and $V(\varepsilon/\sigma) = I_T$. Therefore, for large $T$, the DF statistic does not depend on $\sigma$, and without loss of generality we can set $\sigma = 1$, and write
\[
DF \overset{a}{\sim} \frac{\varepsilon' M_\tau s_{-1}}{(s_{-1}' M_\tau s_{-1})^{1/2}} = \frac{\varepsilon' M_\tau s_{-1}/T}{\left(s_{-1}' M_\tau s_{-1}/T^2\right)^{1/2}}, \tag{15.21}
\]
where $\varepsilon \sim (0, I_T)$. It is clear from this result that the asymptotic distribution of the DF statistic does not depend on any nuisance parameters and therefore can be tabulated for reasonably large values of $T$. We shall return to the mathematical form of the asymptotic distribution of DF below.
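Because (15.21) is free of nuisance parameters, its distribution can be tabulated by simulation. The following is a minimal Monte Carlo sketch under the assumption of Gaussian $\varepsilon_t$ with $\sigma = 1$; the sample size, number of replications and function name are illustrative choices, not those behind any published table.

```python
import numpy as np


def simulate_df_null(T=200, n_rep=4000, seed=0):
    """Draws of eps' M_tau s_{-1} / (s_{-1}' M_tau s_{-1})^{1/2} under the unit root null."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_rep, T))                  # eps ~ (0, I_T), sigma = 1
    s = np.cumsum(eps, axis=1)                             # s_1, ..., s_T
    s_lag = np.hstack([np.zeros((n_rep, 1)), s[:, :-1]])   # s_0 = 0, s_1, ..., s_{T-1}
    m_s = s_lag - s_lag.mean(axis=1, keepdims=True)        # M_tau s_{-1} (demeaning)
    num = np.sum(eps * m_s, axis=1)                        # eps' M_tau s_{-1}
    den = np.sqrt(np.sum(m_s * m_s, axis=1))               # (s_{-1}' M_tau s_{-1})^{1/2}
    return num / den


stats = simulate_df_null()
crit_5pct = np.quantile(stats, 0.05)   # simulated 5 per cent critical value
```

The simulated distribution is shifted well to the left of zero, which is why standard normal critical values are not appropriate for this statistic.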
\[
\Delta y_t = \alpha + \mu(1-\phi)t - (1-\phi)y_{t-1} + \varepsilon_t,
\]
and
H0 : β = 0,
against
H1 : β < 0.
Equation (15.22) allows the model to share the same trend features, irrespective of whether $|\phi| < 1$ or $\phi = 1$. This follows since under $|\phi| < 1$, we have $E(y_t) = \alpha/(1-\phi) + \mu t$, and when $\phi = 1$, we also have $E(y_t) = y_0 + \alpha t$, namely the mean of $y_t$ follows a linear trend in both cases. The DF statistic is given by the t-ratio of the OLS estimate of $\beta$ in (15.23), namely the t-ratio of the coefficient associated with the level variable, $y_{t-1}$, in the regression of $\Delta y_t$ on the intercept term, a linear time trend and $y_{t-1}$.
Definition 26 (Wiener process) Let $\Delta w(t)$ be the change in $w(t)$ during the time interval $dt$. Then $w(t)$ is said to follow a Wiener process if
\[
\Delta w(t) = \varepsilon_t\sqrt{dt}, \quad \varepsilon_t \sim IID(0, 1), \tag{15.24}
\]
\[
R_T(a) = \frac{1}{\sqrt{T}}\, s_{[Ta]}, \tag{15.25}
\]
where
\[
s_{[Ta]} = \varepsilon_1 + \varepsilon_2 + \dots + \varepsilon_{[Ta]},
\]
$[Ta]$ denotes the integer part of $Ta$ and $s_{[Ta]} = 0$ if $[Ta] = 0$. Then $R_T(a)$ weakly converges to $w(a)$, i.e.,
\[
R_T(a) \Rightarrow w(a),
\]
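The scaled partial sum process in (15.25) is easy to simulate. The sketch below is illustrative only: the function name is hypothetical, Gaussian draws are assumed, and the check relies on the standard property $\mathrm{Var}[w(a)] = a$ of a Wiener process, which is not stated in this passage.

```python
import numpy as np


def partial_sum_process(eps, a):
    """R_T(a) = T^{-1/2} s_[Ta], with s_[Ta] = 0 when [Ta] = 0."""
    eps = np.asarray(eps, dtype=float)
    T = len(eps)
    k = int(np.floor(T * a))            # integer part of T*a
    return 0.0 if k == 0 else float(eps[:k].sum() / np.sqrt(T))


rng = np.random.default_rng(0)
T, n_rep, a = 400, 4000, 0.5
draws = np.array([partial_sum_process(rng.standard_normal(T), a) for _ in range(n_rep)])
sample_var = draws.var()                # should be close to a = 0.5
```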
The following limit results are useful for deriving the limiting distribution of the DF statistic. Let $\bar s_T = (s_1 + s_2 + \dots + s_T)/T$, then
\[
T^{-1/2}\,\bar s_T \approx \int_0^1 R_T(a)\,da \Rightarrow \int_0^1 w(a)\,da, \tag{15.27}
\]
\[
T^{-2}\sum_{t=1}^{T} s_t^2 \Rightarrow \int_0^1 w(a)^2\,da,
\]
\[
T^{-5/2}\sum_{t=1}^{T} t\,s_t \Rightarrow \int_0^1 a\,w(a)\,da,
\]
\[
T^{-3/2}\sum_{t=1}^{T} t\,\varepsilon_t \Rightarrow \int_0^1 a\,dw(a),
\]
and
\[
T^{-1}\sum_{t=1}^{T}\varepsilon_t\,s_{t-1} \Rightarrow \int_0^1 w(a)\,dw(a).
\]
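\[
\sum_{t=1}^{T} s_t^2 = \sum_{t=1}^{T}(s_{t-1} + \varepsilon_t)^2 = \sum_{t=1}^{T} s_{t-1}^2 + \sum_{t=1}^{T}\varepsilon_t^2 + 2\sum_{t=1}^{T} s_{t-1}\varepsilon_t.
\]
We also have
\[
\frac{2\sum_{t=1}^{T} s_{t-1}\varepsilon_t}{T} = \frac{\sum_{t=1}^{T} s_t^2}{T} - \frac{\sum_{t=1}^{T} s_{t-1}^2}{T} - \frac{\sum_{t=1}^{T}\varepsilon_t^2}{T} = \frac{s_T^2}{T} - \frac{\sum_{t=1}^{T}\varepsilon_t^2}{T}.
\]
Since $(1/\sqrt{T})\,s_T = R_T(1) \overset{a}{\sim} N(0, 1)$, we have $s_T^2/T \overset{a}{\sim} \chi_1^2$, and since $\varepsilon_t \sim IID(0, 1)$, by the application of a standard law of large numbers, $\sum_{t=1}^{T}\varepsilon_t^2/T$ converges to its limit 1. Hence, it follows that
\[
\frac{\sum_{t=1}^{T} s_{t-1}\varepsilon_t}{T} \Rightarrow \int_0^1 w(a)\,dw(a) \equiv \frac{1}{2}\left(\chi_1^2 - 1\right).
\]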
Also
\[
\int_0^1 w(a)\,da \sim N\left(0, \frac{1}{3}\right).
\]
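Two of these results can be checked numerically. The sketch below first verifies the exact finite-sample identity used above for $T^{-1}\sum s_{t-1}\varepsilon_t$ (with $s_0 = 0$), and then approximates $\int_0^1 w(a)\,da$ by $T^{-3/2}\sum_t s_t$ to compare its simulated variance with $1/3$. Gaussian draws, the seeds and the function name are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 500
eps = rng.standard_normal(T)
s = np.cumsum(eps)                              # s_1, ..., s_T
s_lag = np.concatenate(([0.0], s[:-1]))         # s_0 = 0, ..., s_{T-1}

# Exact identity: 2 * sum(s_{t-1} eps_t) / T = s_T^2 / T - sum(eps_t^2) / T
lhs = 2.0 * np.sum(s_lag * eps) / T
rhs = s[-1] ** 2 / T - np.sum(eps ** 2) / T


def scaled_mean_of_partial_sums(T, n_rep, seed=2):
    """Draws of T^{-3/2} * sum_t s_t, approximating the integral of w(a) over [0, 1]."""
    g = np.random.default_rng(seed)
    paths = np.cumsum(g.standard_normal((n_rep, T)), axis=1)
    return paths.sum(axis=1) / T ** 1.5


draws = scaled_mean_of_partial_sums(T=400, n_rep=4000)
sim_var = draws.var()                           # compare with 1/3
```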
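More general results can also be obtained for I(1) processes. Suppose that $y_t$ follows the general linear process
\[
y_t - y_{t-1} = u_t = \sum_{i=0}^{\infty} a_i\varepsilon_{t-i}, \quad t = 1, 2, \dots, T,
\]
where $y_0 = 0$, $\varepsilon_t \sim IID(0, \sigma^2)$, $\sum_{i=0}^{\infty}|a_i| < \infty$, and $\mathrm{Var}(u_t) = \sigma^2\sum_{i=0}^{\infty} a_i^2 = \sigma_u^2 < \infty$. Let $a(1) = \sum_{i=0}^{\infty} a_i$, and let $\gamma_{\Delta y}(h)$ denote the autocovariance of $u_t = \Delta y_t$ at lag $h$. Then
\[
T^{-1/2}\sum_{t=1}^{T} u_t \xrightarrow{d} \sigma a(1) w(1) \equiv N[0, \sigma^2 a^2(1)],
\]
\[
T^{-1/2}\sum_{t=1}^{T} u_{t-h}\varepsilon_t \xrightarrow{d} N\left(0, \sigma^2\gamma_{\Delta y}(0)\right),
\]
\[
T^{-1}\sum_{t=1}^{T} u_t u_{t-h} \xrightarrow{p} \gamma_{\Delta y}(h),
\]
\[
T^{-1}\sum_{t=1}^{T} y_{t-1}\varepsilon_t \xrightarrow{d} \frac{1}{2}\sigma^2 a(1)\left[w^2(1) - 1\right] \equiv \frac{1}{2}\sigma^2 a(1)\left[\chi_1^2 - 1\right],
\]
\[
T^{-1}\sum_{t=1}^{T} y_{t-1}u_t \xrightarrow{d} \frac{1}{2}\left[\sigma^2 a^2(1) w(1)^2 - \gamma_{\Delta y}(0)\right],
\]
\[
T^{-1}\sum_{t=1}^{T} y_{t-1}u_{t-h} \xrightarrow{d} \frac{1}{2}\left[\sigma^2 a^2(1) w(1)^2 - \gamma_{\Delta y}(0)\right] + \gamma_{\Delta y}(0) + \gamma_{\Delta y}(1) + \dots + \gamma_{\Delta y}(h-1), \quad h = 1, 2, \dots
\]
Also
\[
T^{-3/2}\sum_{t=1}^{T} y_{t-1} \xrightarrow{d} \sigma a(1)\int_0^1 w(a)\,da,
\]
\[
T^{-3/2}\sum_{t=1}^{T} t\,u_{t-j} \xrightarrow{d} \sigma a(1)\left[w(1) - \int_0^1 w(a)\,da\right],
\]
\[
T^{-2}\sum_{t=1}^{T} y_{t-1}^2 \xrightarrow{d} \sigma^2 a^2(1)\int_0^1 w^2(a)\,da,
\]
\[
T^{-5/2}\sum_{t=1}^{T} t\,y_{t-1} \xrightarrow{d} \sigma a(1)\int_0^1 a\,w(a)\,da,
\]
\[
T^{-3}\sum_{t=1}^{T} t\,y_{t-1}^2 \xrightarrow{d} \sigma^2 a^2(1)\int_0^1 a\,w^2(a)\,da.
\]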
Similar expressions can also be obtained for DF regression models with a linear trend (Case III).
\[
\Delta y_t = \alpha + \mu(1-\phi)t - (1-\phi)y_{t-1} + \sum_{i=1}^{p}\psi_i\Delta y_{t-i} + \varepsilon_t,
\]
where $p$ is chosen such that the equation's residuals, $\varepsilon_t$, are serially uncorrelated. In practice model selection criteria such as the Akaike information criterion (AIC), or the Schwarz Bayesian Criterion (SBC) are used to select $p$.2 The ADF($p$) statistic is given by the t-ratio of $y_{t-1}$ in the above regression. Critical values for DF and ADF tests are the same. This follows since as $T \to \infty$, the $y_{t-1}$ component dominates the augmentation part, $\sum_{i=1}^{p}\psi_i\Delta y_{t-i}$, which is a stationary process, and hence its effect diminishes and can be ignored as $T \to \infty$.
2 For a discussion of AIC and SBC see Sections 11.5.1 and 11.5.2.
When using the ADF tests the following points are worth bearing in mind:
1. The ADF test is not very powerful in finite samples for alternatives H1 : φ < 1 when φ is
near unity.
2. There is a size-power trade-off depending on the order of augmentation used in dealing
with the problem of residual serial correlation. Therefore, it is often crucial that an appro-
priate value is chosen for p, the order of augmentation of the test.
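The ADF($p$) statistic is simply an OLS t-ratio, so it can be sketched with plain least squares. The code below is a minimal illustration, not a substitute for a tested econometrics package: the function name and arguments are hypothetical, the order $p$ is taken as given rather than selected by AIC or SBC, and the resulting statistic must still be compared with DF critical values rather than normal ones.

```python
import numpy as np


def adf_statistic(y, p=1, trend=False):
    """t-ratio of y_{t-1} in a regression of dy_t on an intercept, (trend), y_{t-1}, p lags of dy."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                               # dy[k] = y[k+1] - y[k]
    n = len(dy) - p                               # observations left after losing p lags
    cols = [np.ones(n)]
    if trend:
        cols.append(np.arange(1, n + 1, dtype=float))
    cols.append(y[p:-1])                          # level term y_{t-1}
    for i in range(1, p + 1):
        cols.append(dy[p - i:len(dy) - i])        # augmentation term dy_{t-i}
    X = np.column_stack(cols)
    z = dy[p:]                                    # dependent variable dy_t
    coef, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ coef
    sigma2 = resid @ resid / (n - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    j = 2 if trend else 1                         # position of the y_{t-1} coefficient
    return coef[j] / np.sqrt(cov[j, j])


rng = np.random.default_rng(0)
e = rng.standard_normal(500)
random_walk = np.cumsum(e)
stationary = np.zeros(500)
for t in range(1, 500):
    stationary[t] = 0.5 * stationary[t - 1] + e[t]

adf_rw = adf_statistic(random_walk, p=1)
adf_st = adf_statistic(stationary, p=1)           # strongly negative when phi = 0.5
```

With $\phi = 0.5$ the test rejects easily; the low power noted in point 1 concerns values of $\phi$ close to unity.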
Appropriate (asymptotic) critical values of the DF test have been tabulated by Fuller (1996)
and by MacKinnon (1991). In the following, we describe how critical values can be obtained.
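\[
\Delta y_t = a + b\,y_{t-1} + u_t, \quad t = 1, 2, \dots, T,
\]
and denote the residuals from this regression by $\hat u_t$ and the t ratio of the OLS estimator of $b$ by $DF_\tau$. Compute
\[
s_T^2 = \frac{\sum_{t=1}^{T}\hat u_t^2}{T}, \quad \hat\gamma_j = \frac{\sum_{t=j+1}^{T}\hat u_t\hat u_{t-j}}{T},
\]
and
\[
s_{LT}^2 = \hat\gamma_0 + 2\sum_{j=1}^{m}\left(1 - \frac{j}{m+1}\right)\hat\gamma_j
\]
which uses the Bartlett window.3 The PP unit root test is given by
\[
Z_{\tau,df} = \frac{s_T}{s_{LT}}\,DF_\tau - \frac{\frac{1}{2}\left(s_{LT}^2 - s_T^2\right)}{s_{LT}\left[\dfrac{\sum_{t=1}^{T}(y_{t-1} - \bar y_{-1})^2}{T^2}\right]^{1/2}}
\]
where $\bar y_{-1} = \sum_{t=1}^{T} y_{t-1}/T$.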
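The steps above translate directly into code. The sketch below follows the formulas as reconstructed here for the intercept-only case; the function name is hypothetical and the window size $m$ is left to the user. A useful sanity check is that with $m = 0$ the long-run variance equals $s_T^2$ and the statistic collapses to $DF_\tau$.

```python
import numpy as np


def pp_statistic(y, m):
    """Phillips-Perron Z statistic (intercept case) and the underlying DF t-ratio."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    ylag = y[:-1]
    T = len(dy)
    X = np.column_stack([np.ones(T), ylag])
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    u = dy - X @ coef                                   # OLS residuals
    ssq = np.sum((ylag - ylag.mean()) ** 2)
    se_b = np.sqrt((u @ u / (T - 2)) / ssq)
    df_tau = coef[1] / se_b                             # t-ratio of b
    s2_T = u @ u / T                                    # gamma_hat_0
    s2_LT = s2_T + 2.0 * sum(
        (1.0 - j / (m + 1)) * (u[j:] @ u[:-j]) / T for j in range(1, m + 1)
    )                                                   # Bartlett long-run variance
    z = np.sqrt(s2_T / s2_LT) * df_tau - 0.5 * (s2_LT - s2_T) / (
        np.sqrt(s2_LT) * np.sqrt(ssq / T ** 2)
    )
    return z, df_tau


rng = np.random.default_rng(0)
y = np.cumsum(rng.standard_normal(300))
z0, df0 = pp_statistic(y, m=0)     # no correction: z0 equals df0
z4, df4 = pp_statistic(y, m=4)
```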
Models with an intercept and a linear trend
In this case the underlying DF regression is
\[
\Delta y_t = a_0 + a_1 t + b\,y_{t-1} + u_t, \quad t = 1, 2, \dots, T,
\]
with $DF_t$ given by the t ratio of the OLS estimate of the coefficient of $y_{t-1}$, and
\[
\Delta y = W\theta + u
\]
where
\[
W = \begin{pmatrix}
1 & 1 & y_0 \\
1 & 2 & y_1 \\
\vdots & \vdots & \vdots \\
1 & T-1 & y_{T-2} \\
1 & T & y_{T-1}
\end{pmatrix}.
\]
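\[
y_{\rho 1} = y_1, \qquad y_{\rho t} = y_t - \rho\,y_{t-1}, \quad \text{for } t = 2, \dots, T,
\]
\[
z_{11}(\rho) = 1, \qquad z_{1t}(\rho) = 1 - \rho, \quad \text{for } t = 2, \dots, T,
\]
and
\[
z_{21}(\rho) = 1, \qquad z_{2t}(\rho) = t - \rho(t-1), \quad \text{for } t = 2, \dots, T.
\]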
\[
w_t = y_t - \hat\beta_\rho, \quad \text{for } t = 1, 2, \dots, T,
\]
and carry out the ADF($p$) test applied to $w_t$. It is recommended that $\rho$ is set to $1 - 7/T$.
Models with an intercept and a linear trend
Compute the OLS regression coefficients of $y_{\rho t}$ on $z_{1t}(\rho)$ and $z_{2t}(\rho)$, denote these coefficients by $\hat\beta_{1\rho}$ and $\hat\beta_{2\rho}$, and then compute
\[
w_t = y_t - \hat\beta_{1\rho} - \hat\beta_{2\rho}\,t, \quad \text{for } t = 1, 2, \dots, T,
\]
and then apply the ADF($p$) procedure to $w_t$. The recommended choice of $\rho$ for this case is $1 - 13.5/T$.
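The quasi-differencing and detrending steps can be sketched as follows. This is an illustration under the recommended choices $\rho = 1 - 7/T$ and $\rho = 1 - 13.5/T$; the function name and return values are hypothetical, and the ADF($p$) step applied to $w_t$ is not repeated here.

```python
import numpy as np


def gls_detrend(y, trend=False):
    """Return the GLS-detrended series w_t and the estimated coefficients."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    rho = 1.0 - (13.5 if trend else 7.0) / T
    t = np.arange(1, T + 1, dtype=float)
    # Quasi-differenced data: first observation kept in levels
    y_rho = np.concatenate(([y[0]], y[1:] - rho * y[:-1]))
    z1 = np.concatenate(([1.0], np.full(T - 1, 1.0 - rho)))
    cols = [z1]
    if trend:
        z2 = np.concatenate(([1.0], t[1:] - rho * t[:-1]))   # t - rho * (t - 1)
        cols.append(z2)
    Z = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(Z, y_rho, rcond=None)
    w = y - beta[0] - (beta[1] * t if trend else 0.0)
    return w, beta


t = np.arange(1, 51, dtype=float)
w_lin, beta_lin = gls_detrend(3.0 + 0.5 * t, trend=True)     # exact linear trend
w_const, beta_const = gls_detrend(np.full(50, 2.0))          # constant series
```

For a series that is exactly a constant or an exact linear trend the regression fits perfectly, so the detrended series is zero; this makes a convenient check of the index conventions.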
The form of the ADF-GLS test can be set out as
ADF-GLS($c_\mu$, $c_\tau$),
where
\[
\rho = 1 - \frac{c_\mu}{T}, \quad \text{for models with intercept only,}
\]
\[
\rho = 1 - \frac{c_\tau}{T}, \quad \text{for models with intercept and linear trend.}
\]
The 5 per cent critical values of ADF-GLS test can be found in Pantula, Gonzalez-Farias, and
Fuller (1994) and in Elliott, Rothenberg, and Stock (1996) and have been reproduced in
Table 15.1.
Bold figures are simulated using Microfit 5.0, with 10,000 replications.
(1) From Table 2 of Pantula, Gonzalez-Farias, and Fuller (1994)
(2) From Table I in Elliott, Rothenberg, and Stock (1996)
\[
y_t = \phi\,y_{t-1} + \sum_{j=1}^{p}\delta_j\Delta y_{t-j} + \varepsilon_t^{b},
\]
\[
y_t = \phi\,y_{t+1} - \sum_{j=1}^{p}\delta_j\Delta y_{t+j+1} + \varepsilon_t^{f}.
\]
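The WS estimator of $\phi$ is obtained by solving the following weighted least squares problem
\[
Q(\phi, \delta) = \sum_{t=p+2}^{T} w_t\left(y_t - \phi\,y_{t-1} - \sum_{j=1}^{p}\delta_j\Delta y_{t-j}\right)^2 + \sum_{t=p+2}^{T}\left(1 - w_{t-p}\right)\left(y_{t-p-1} - \phi\,y_{t-p} + \sum_{j=1}^{p}\delta_j\Delta y_{t-p+j}\right)^2,
\]
or equivalently
\[
Q(\phi, \delta) = \sum_{t=p+2}^{T} w_t\left(y_t - \phi\,y_{t-1} - \sum_{j=1}^{p}\delta_j\Delta y_{t-j}\right)^2 + \sum_{t=1}^{T-p-1}\left(1 - w_{t+1}\right)\left(y_t - \phi\,y_{t+1} + \sum_{j=1}^{p}\delta_j\Delta y_{t+j+1}\right)^2,
\]
where
\[
w_t = \begin{cases}
0, & \text{for } 1 \le t \le p+1, \\
\dfrac{t-p-1}{T-2p}, & \text{for } p+1 < t \le T-p, \\
1, & \text{for } T-p < t \le T,
\end{cases}
\]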
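\[
\frac{\hat\phi - 1}{\sqrt{\widehat{\mathrm{Var}}(\hat\phi)}},
\]
where
\[
\widehat{\mathrm{Var}}(\hat\phi) = \hat\sigma^2 a^{\phi\phi},
\]
\[
\hat\sigma^2 = \frac{Q(\hat\phi, \hat\delta)}{T-p-2} \quad \text{for a model with intercept,}
\]
\[
\hat\sigma^2 = \frac{Q(\hat\phi, \hat\delta)}{T-p-3} \quad \text{for a model with a linear trend,}
\]
and $a^{\phi\phi}$ is the element (1,1) in the inverse of $\partial^2 Q(\phi, \delta)/\partial\theta\partial\theta'$, where $\theta = (\hat\phi, \hat\delta')'$.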
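Explicit solution
Let $z_{bt} = (y_{t-1}, \Delta y_{t-1}, \dots, \Delta y_{t-p})'$ and $z_{ft} = (y_{t+1}, -\Delta y_{t+2}, -\Delta y_{t+3}, \dots, -\Delta y_{t+p+1})'$, then it is easily seen that
\[
\hat\theta = A_T^{-1} b_T,
\]
where
\[
A_T = \sum_{t=p+2}^{T} w_t\,z_{bt}z_{bt}' + \sum_{t=1}^{T-p-1}(1 - w_{t+1})\,z_{ft}z_{ft}',
\]
\[
b_T = \sum_{t=p+2}^{T} w_t\,z_{bt}\,y_t + \sum_{t=1}^{T-p-1}(1 - w_{t+1})\,z_{ft}\,y_t.
\]
Also
\[
\frac{\partial^2 Q(\phi, \delta)}{\partial\theta\partial\theta'} = A_T,
\]
and
\[
\widehat{\mathrm{Var}}(\hat\theta) = \hat\sigma^2 A_T^{-1}.
\]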
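The explicit solution can be coded directly from $A_T$ and $b_T$. The sketch below is illustrative: function names are hypothetical, arrays are padded so that index $t$ matches the text's $t = 1, \dots, T$, and the series is assumed to have already been demeaned or detrended. It returns $\hat\theta = (\hat\phi, \hat\delta')'$ only; the test statistic would additionally need $\hat\sigma^2$ and $A_T^{-1}$.

```python
import numpy as np


def ws_weights(T, p):
    """Weights w_t for t = 1..T (index 0 is unused padding)."""
    w = np.zeros(T + 1)
    for t in range(1, T + 1):
        if t <= p + 1:
            w[t] = 0.0
        elif t <= T - p:
            w[t] = (t - p - 1) / (T - 2 * p)
        else:
            w[t] = 1.0
    return w


def ws_estimate(y_obs, p=0):
    """Weighted symmetric estimate theta_hat = A_T^{-1} b_T."""
    T = len(y_obs)
    y = np.concatenate(([np.nan], np.asarray(y_obs, dtype=float)))   # y[1..T]
    dy = np.full(T + 1, np.nan)
    dy[2:] = y[2:] - y[1:-1]                                         # dy[t] = y[t] - y[t-1]
    w = ws_weights(T, p)
    k = p + 1
    A, b = np.zeros((k, k)), np.zeros(k)
    for t in range(p + 2, T + 1):                                    # backward regression terms
        zb = np.array([y[t - 1]] + [dy[t - j] for j in range(1, p + 1)])
        A += w[t] * np.outer(zb, zb)
        b += w[t] * zb * y[t]
    for t in range(1, T - p):                                        # forward regression terms
        zf = np.array([y[t + 1]] + [-dy[t + j + 1] for j in range(1, p + 1)])
        A += (1.0 - w[t + 1]) * np.outer(zf, zf)
        b += (1.0 - w[t + 1]) * zf * y[t]
    return np.linalg.solve(A, b)


rng = np.random.default_rng(0)
y = np.zeros(3000)
for t in range(1, 3000):
    y[t] = 0.6 * y[t - 1] + rng.standard_normal()
phi_hat = ws_estimate(y, p=0)[0]
weights = ws_weights(10, 2)
```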
is relatively small under stationarity as compared with the alternative unit root hypothesis.
\[
\zeta_T = \frac{1}{T^2 s_T^2(l)}\sum_{t=1}^{T} s_t^2,
\]
and
\[
w_j = 1 - \frac{j}{l+1}, \quad j = 1, 2, \dots, l.
\]
These weights are the Bartlett window introduced in Chapter 13. Other choices for $w_j$ are also possible, such as Tukey or Parzen windows. The critical values of the KPSS test statistic are reproduced in Table 15.4.
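A minimal sketch of the statistic for the level-stationarity case follows. Since the definitions of $s_t$ and $s_T^2(l)$ are not reproduced in this passage, the code assumes the usual KPSS construction: $s_t$ is the partial sum of the demeaned series and $s_T^2(l)$ is the Bartlett-weighted long-run variance of the same residuals. The function name is hypothetical.

```python
import numpy as np


def kpss_statistic(y, l):
    """KPSS-type statistic for level stationarity with Bartlett weights w_j = 1 - j/(l+1)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    e = y - y.mean()                      # residuals from a regression on an intercept (assumed)
    s = np.cumsum(e)                      # partial sums s_t
    lrv = e @ e / T                       # gamma_hat_0
    for j in range(1, l + 1):
        w_j = 1.0 - j / (l + 1)
        lrv += 2.0 * w_j * (e[j:] @ e[:-j]) / T
    return np.sum(s ** 2) / (T ** 2 * lrv)


rng = np.random.default_rng(0)
white_noise = rng.standard_normal(500)
random_walk = np.cumsum(rng.standard_normal(500))

stat_wn = kpss_statistic(white_noise, l=4)   # small under stationarity
stat_rw = kpss_statistic(random_walk, l=4)   # large under a unit root
```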
10% 5% 2.5% 1%
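\[
y_t = \sum_{i=0}^{\infty}\frac{1}{1+i}\,\varepsilon_{t-i},
\]
where $\{\varepsilon_t\}$ is a white noise process. It is easily seen that $y_t$ has mean zero, a finite constant variance, $\sigma^2\sum_{i=0}^{\infty}\left(\frac{1}{1+i}\right)^2 = \sigma^2\pi^2/6$, and
\[
\gamma(h) = \gamma(-h) = \sigma^2\sum_{i=0}^{\infty}\frac{1}{1+i}\cdot\frac{1}{1+i+|h|} < K < \infty, \quad \text{for all } h.
\]
\[
\sum_{h=0}^{\infty}|\gamma(h)| = \sigma^2\sum_{h=0}^{\infty}\sum_{i=0}^{\infty}\frac{1}{1+i}\cdot\frac{1}{1+i+|h|} = \sigma^2\sum_{i=0}^{\infty}\frac{1}{1+i}\left(\sum_{h=0}^{\infty}\frac{1}{1+i+|h|}\right) = \infty,
\]
since all the elements are positive and for each $i = 0, 1, 2, \dots$, we have $\sum_{h=0}^{\infty}\frac{1}{1+i+|h|} = \infty$.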
Definition 27 Consider a covariance stationary process, yt , and let γ (h) be its autocovariance
function at lag h. Then yt is said to be a long memory process if
\[
\sum_{h=-\infty}^{\infty}|\gamma(h)| = \infty, \tag{15.29}
\]
or alternatively if
where g(h) is a slowly varying function of h. The constant d is known as the ‘long-memory
parameter’.
The function g(.) is said to be slowly varying if for any c > 0, g(ch)/g(h) converges to unity
as h → ∞. An example of slowly varying functions is ln(.).
Consider the infinite-order moving average process
\[
y_t = \lim_{q\to\infty}\sum_{i=0}^{q} a_i\varepsilon_{t-i}.
\]
The long memory condition can also be defined in terms of the weights, $a_i$, in the infinite-order moving average representation of $y_t$. Note that for such a representation to exist we need $\{a_i\}$ to be square summable and not necessarily absolutely summable. The infinite-order moving average representation is said to be a long memory process if
The above four definitions of long memory are not necessarily equivalent, unless further
restrictions are imposed. But all point to a decay in the dependence of the series on their past
which is slower than the exponential decay, but with the decay being sufficiently fast to ensure
that the series have a finite variance. In the case where 0 < d < 1/2, it is easily seen that (15.31)
implies (15.30), and (15.30) implies (15.29).4
where $d > 0$ is the long-memory parameter, and $b_f(\cdot)$ is a slowly varying function. The existence of the spectral density for long memory processes depends on the properties of the slowly varying function $b_f(\cdot)$. It is possible to show that (15.32) has a one-to-one relationship with the following specification of the autocovariance function
so long as $b_f(\cdot)$ and $b_\gamma(\cdot)$ are slowly varying in the sense of Zygmund (1959). For a proof and further details see Palma (2007, Ch. 3).
where $L$ is the usual lag operator, $\varepsilon_t$ is a white noise with zero mean and variance $\sigma_\varepsilon^2$, and $d$ can be any real number.
When $d = 0$, then $y_t$ is stationary, while under $d = 1$ we have $y_t \sim I(1)$. When $-1/2 < d < 1/2$, it is possible to prove that the process is covariance stationary and invertible. Under $d \neq 0$, the above model displays long memory, and can be used to characterise a wide range of long-term patterns. The autocorrelation function of $y_t$ defined by (15.33) declines to zero at a very slow rate. These processes are therefore very useful in the study of economic time series that are known to display rather slow long-term movements as is the case with some inflation and interest rate series. For large $h$, the autocorrelation function of ARFIMA models can be approximated by
\[
\rho(h) \approx K h^{2d-1},
\]
where $K$ is a constant. For $d < 0.5$, the exponent $2d - 1 < 0$ so that the correlations eventually decay, but at a slow hyperbolic rate compared with the fast exponential decay in the case of standard stationary ARMA models.
4 Other notions of long memory or long-range dependence are also proposed in terms of other more general concepts of slowly varying functions. But they will not be pursued here.
5 For an introduction to spectral density analysis see Chapter 13.
When $1/2 < d < 1$, a number of studies have shown that the usual unit root tests display a bias in favour of the hypothesis $d = 1$ (see, e.g., Diebold and Rudebusch (1991)).
Estimation techniques for fractionally integrated processes include semi-parametric estima-
tors of their spectral density function, such as the methods proposed by Robinson (1995) and
Velasco (1999), or parametric methods based on approximation of the likelihood function (see,
for example, Fox and Taqqu (1986)). Further details of long memory processes can be found in
Robinson (1994) and Baillie (1996).
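The hyperbolic decay of the autocorrelations can be illustrated for the simplest case of pure fractional noise, ARFIMA(0, d, 0), with $0 < d < 1/2$. The sketch assumes the standard closed form $\rho(h) = \Gamma(1-d)\Gamma(h+d)/[\Gamma(d)\Gamma(h+1-d)]$ for that special case, which is not derived in this section, and compares it with $K h^{2d-1}$ using $K = \Gamma(1-d)/\Gamma(d)$. Function names are illustrative.

```python
import math


def arfima_acf(h, d):
    """Exact lag-h autocorrelation of ARFIMA(0, d, 0) for 0 < d < 1/2 (assumed closed form)."""
    log_rho = (math.lgamma(1 - d) + math.lgamma(h + d)
               - math.lgamma(d) - math.lgamma(h + 1 - d))
    return math.exp(log_rho)


def hyperbolic_approx(h, d):
    """Large-h approximation K * h**(2d - 1) with K = Gamma(1 - d) / Gamma(d)."""
    return math.gamma(1 - d) / math.gamma(d) * h ** (2 * d - 1)


d = 0.3
rho_1 = arfima_acf(1, d)                                   # equals d / (1 - d)
ratio_1000 = arfima_acf(1000, d) / hyperbolic_approx(1000, d)
ar1_at_50 = 0.9 ** 50                                      # exponential decay, for comparison
```

Even at lag 50 the fractional-noise autocorrelation remains far above that of an AR(1) with coefficient 0.9, which is the contrast between hyperbolic and exponential decay described above.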
for $i = 1, 2, \dots, N$, and $t = \dots, -1, 0, 1, 2, \dots$, where $|\lambda_i| < 1$. It is clear that for each $i$, $y_{it}$ is covariance stationary with absolutely summable autocovariances. Suppose that $\mathrm{Var}(u_{it}) = \sigma_i^2 < \infty$, and $\lambda_i$ are independently and identically distributed random draws with the distribution function $F(\lambda)$ for $\lambda$ on the range $[0, 1)$. Consider now the moving average representation of the cross-sectional average $\bar y_t = N^{-1}\sum_{i=1}^{N} y_{it}$, and note that
\[
\bar y_t = N^{-1}\sum_{i=1}^{N}\left(\frac{1}{1-\lambda_i L}\right)u_{it} = N^{-1}\sum_{i=1}^{N}\sum_{j=0}^{\infty}\lambda_i^{j}u_{i,t-j} = \sum_{j=0}^{\infty}\left(N^{-1}\sum_{i=1}^{N}\lambda_i^{j}u_{i,t-j}\right).
\]
Under the assumption that for each $t$, $\lambda_i$ and $u_{it}$ are independently distributed we have
\[
E\left(\bar y_t \mid u_t, \mathcal{F}_{t-1}\right) = N^{-1}\sum_{j=0}^{\infty}\sum_{i=1}^{N}E\left(\lambda_i^{j}\right)u_{i,t-j},
\]
where $\mathcal{F}_{t-1} = (u_{1,t-1}, u_{1,t-2}, \dots; u_{2,t-1}, u_{2,t-2}, \dots; u_{N,t-1}, u_{N,t-2}, \dots)$, and $u_t = (u_{1t}, u_{2t}, \dots, u_{Nt})'$. But since by assumption $\lambda_i$ is identically distributed across $i$, then $E(\lambda_i^{j}) = a_j$ and we have
\[
E\left(\bar y_t \mid u_t, \mathcal{F}_{t-1}\right) = \sum_{j=0}^{\infty}a_j\bar u_{t-j},
\]
where $\bar u_{t-j} = N^{-1}\sum_{i=1}^{N}u_{i,t-j}$. Hence,
\[
\bar y_t = \sum_{j=1}^{\infty}a_j\bar u_{t-j} + v_t,
\]
where $v_t = \bar y_t - E(\bar y_t \mid \mathcal{F}_{t-1})$. It is easily seen that $\bar u_{t-j}$, for $j = 1, 2, \dots$, and $v_t$ are serially uncorrelated with zero means and finite variances. Therefore, $\bar y_t$ has a moving average representation with coefficients $a_j = E(\lambda_i^{j})$. The rate of decay of $a_j$ over $j$ depends on the distribution of $\lambda_i$. For example, if $\lambda_i$ are random draws from a uniform distribution over $[0, 1)$ we have $a_j = 1/(1+j)$, and the coefficients $a_j$ are not absolutely summable, and therefore $\bar y_t$ is a long memory process. Similar results follow if it is assumed that $\lambda_i$ are draws from a beta distribution with support that covers unity. Granger (1980) was the first to obtain this result, albeit under a more restrictive set of assumptions. Granger also showed that when $\lambda$ is type II beta distributed with parameters $p > 0$ and $q > 0$, the $s$th-order autocovariance of $\bar y_t$ is $O(s^{1-q})$, and therefore the aggregate variable behaves as a fractionally integrated process of order $1 - q/2$. For a generalization to multivariate models see Pesaran and Chudik (2014) and Chapter 32.
Finally, it is important to note that the long memory property of the aggregate, $\bar y_t$, critically depends on whether the support of the distribution of $\lambda_i$ covers unity. For example, if $\lambda_i$, for $i = 1, 2, \dots, N$, are draws from a uniform distribution over the range $[0, b)$ where $0 < b < 1$, the moving average coefficients are given by $a_j = b^{j}/(1+j)$, and we have6
\[
\sum_{j=0}^{\infty}|a_j| = \sum_{j=0}^{\infty}\frac{b^{j}}{1+j} = \frac{-\ln(1-b)}{b} < \infty, \quad \text{for } b < 1.
\]
Hence, $\{a_j\}$ is an absolutely summable sequence and the aggregate variable, $\bar y_t$, will no longer be a long memory process.
6 Note that
\[
\frac{d}{db}\left[\sum_{j=0}^{\infty}\frac{b^{1+j}}{1+j}\right] = \sum_{j=0}^{\infty}b^{j} = \frac{1}{1-b},
\]
and hence $\sum_{j=0}^{\infty}\frac{b^{1+j}}{1+j} = -\ln(1-b)$.
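The contrast between the two cases can be seen numerically from partial sums of $|a_j|$. The sketch below is purely illustrative (function name and truncation points are arbitrary): for $b < 1$ the partial sums settle at $-\ln(1-b)/b$, while for $b = 1$ they are harmonic sums that keep growing.

```python
import math


def abs_sum_ma_coefficients(b, n_terms):
    """Partial sum of |a_j| = b**j / (1 + j) over j = 0, ..., n_terms - 1."""
    return sum(b ** j / (1 + j) for j in range(n_terms))


b = 0.5
partial = abs_sum_ma_coefficients(b, 200)
closed_form = -math.log(1 - b) / b                          # limit when b < 1

harmonic_1e3 = abs_sum_ma_coefficients(1.0, 1_000)          # b = 1: support covers unity
harmonic_1e5 = abs_sum_ma_coefficients(1.0, 100_000)        # still growing, roughly like ln(n)
```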
15.10 Exercises
1. Consider the simple autoregressive distributed lag model
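\[
y_t = \alpha + \lambda y_{t-1} + \beta x_t + u_t,
\]
where
\[
x_t = \rho x_{t-1} + \varepsilon_t,
\]
\[
\begin{pmatrix} u_t \\ \varepsilon_t \end{pmatrix} \sim IID\left[\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma^2 & 0 \\ 0 & \omega^2 \end{pmatrix}\right],
\]
for $t = 1, 2, \dots, T$.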
(b) Hence, or otherwise, derive the mean and variance of yt .
(c) Show that under the conditions $|\lambda| < 1$ and $|\rho| < 1$, $y_\infty = \lim_{t\to\infty} y_t$ has a finite mean and variance, and derive an expression for the variance of $y_\infty$.
(d) Discuss the case where $|\lambda| < 1$, but $\rho = 1$, and consider the limiting properties of $y_t$.
\[
A(L) = a_0 + a_1 L + a_2 L^2 + \cdots,
\]
is a polynomial in the lag operator $L$ ($Ly_t = y_{t-1}$), and $\mu$ is a scalar constant. The $\varepsilon_t$ are mean zero, serially uncorrelated shocks with common variance, $\sigma_\varepsilon^2$.
(a) Derive the conditions under which (15.34) reduces to the trend-stationary process
yt = λt + B(L)ε t . (15.35)
(b) Given the observations (y1 , y2 , . . . , yn ), discuss alternative methods of testing (15.34)
against (15.35) and vice versa.
(c) What is meant by ‘persistence’ of shocks in time series models? How useful do you think
the concept of ‘persistence’ is for an understanding of cyclical fluctuations of the US real
GNP?
3. Suppose that the time series of interest can be decomposed into a deterministic trend, a ran-
dom walk component and stationary errors
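\[
y_t = \alpha + \delta t + \gamma_t + v_t, \tag{15.36}
\]
\[
\gamma_t = \gamma_{t-1} + u_t.
\]
\[
y_t = \delta + y_{t-1} + w_t, \tag{15.37}
\]
\[
w_t = \varepsilon_t + \theta\varepsilon_{t-1},
\]
where $\varepsilon_t$ are IIDN$(0, \sigma_\varepsilon^2)$. In this case show that under $\theta = -1$, $y_t$ is a trend stationary process.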
(c) Derive a relation between λ and the MA(1) parameter θ , and hence or otherwise show
that a test of θ = −1 in (15.37) is equivalent to a test of λ = 0 in (15.36).
(d) Assume that vt and ut are distributed independently, then show that (15.37) as a charac-
terization of (15.36) implies θ < 0.
H0 : ρ = 1,
against
H1 : ρ < 1,
\[
y_t = \rho y_{t-1} + \varepsilon_t, \quad t = 1, 2, \dots, T, \tag{15.38}
\]
where $\{\varepsilon_t\}_{-\infty}^{\infty}$ is a sequence of IID random variables with mean 0 and variance $\sigma^2$. Let $\hat\rho$ be
(a) Derive the asymptotic distribution of T(ρ̂ − 1) under the null hypothesis.
(b) How does the asymptotic distribution in (a) change if an intercept is included in (15.38)?
What is the appropriate way of including such an intercept in the model?
(c) Suppose now that, instead of (15.38), {yt } is generated according to the second-order
autoregressive (AR(2)) process
where $\{\varepsilon_t\}_{-\infty}^{\infty}$ is a sequence of IID random variables with mean 0 and variance $\sigma^2$.
H0 : ρ 1 + ρ 2 = 1,
against
H1 : ρ 1 + ρ 2 < 1 ?
yt = α + ρyt−1 + δt + ut .
Note that the fitted values and estimate of ρ from this regression are identical to those from
an OLS regression of yt on a constant , time trend , and ξ t−1 = yt−1 − α(t − 1)
yt = α ∗ + ρ ∗ ξ t−1 + δt + ut .
6. The following regression equations are estimated by ordinary least squares using US monthly
data over the period 1948M1-2009M9.
Model (A)
Model (B)
Pt = 1.2907 + 1.2203 Pt−1 − 0.2232 Pt−2 + ε̂ pt
(0.8382) (0.03592) (0.03591)
Model (C)
Pt = −2.3591 + 0.003483 · t + 0.9945 Pt−1 + ε̂ pt
(3.8009) (0.003535) (0.003895)
LL = −2951.1, $\bar R^2$ = 0.9955, $\hat\sigma_\varepsilon$ = 13.0091
Model (D)
Pt = −3.4460 + 0.004523 · t + 1.2187 Pt−1 − 0.2254 Pt−2 + ε̂ pt
(3.7098) (0.003451) (0.03592) (0.03593)
LL = −2931.8, $\bar R^2$ = 0.9957, $\hat\sigma_\varepsilon$ = 12.6836
Model (E)
Dt = 0.02649 + 0.9979 Dt−1 + ε̂ dt
(0.009465) (0.001173)
LL = 1048.8, $\bar R^2$ = 0.9990, $\hat\sigma_\varepsilon$ = 0.05884
Model (F)
Model (G)
Model (H)
where Pt represents real equity prices, and Dt is real dividends per annum for the S&P 500
portfolio.
(a) Use the above regression results to test the hypothesis of a unit root in price and dividend
processes.
(b) Consider the following asset pricing model (r > 0)
\[
P_t = \frac{1}{1+r}\,E\left(P_{t+1} + D_{t+1} \mid I_t\right), \tag{15.40}
\]
where It = (Pt , Dt , Pt−1 , Dt−1 , . . .). Suppose that Dt follows the following stationary
AR(p) process in Dt
Model ( J)
Model (K)
Model (L)
Test the hypothesis that the vt process contains a unit root. Interpret the result of your
tests in relation to the market efficiency hypothesis (see Chapter 7).
OUP CORRECTED PROOF – FINAL, 5/9/2015, SPi
16.1 Introduction
In this chapter we consider alternative approaches proposed in the literature for the decomposition of time series into trend and cyclical components. We focus on univariate techniques, and consider Hodrick–Prescott and band-pass filters, structural time series techniques, and the Beveridge–Nelson decomposition technique specifically designed for unit root processes. A multivariate version of the Beveridge–Nelson decomposition is considered in Section 22.15, where the role of long run economic theory in such decomposition is also discussed.
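\[
y_t = y_t^{*} + c_t.
\]
Hodrick and Prescott (1997) suggested a way to isolate $c_t$ from $y_t$ by the following minimization problem
\[
\min_{y_1^{*}, y_2^{*}, \dots, y_T^{*}}\left\{\sum_{t=1}^{T}\left(y_t - y_t^{*}\right)^2 + \lambda\sum_{t=2}^{T-1}\left(\Delta^2 y_{t+1}^{*}\right)^2\right\},
\]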
where $\lambda$ is a penalty parameter. The first term in the loss function penalizes the variance of $c_t$, while the second term penalizes the lack of smoothness in $y_t^{*}$, with the parameter $\lambda$ regulating the trade-off between the two sources of variation. Putting it differently, the HP filter identifies the cyclical component $c_t$ from $y_t$ by trading off the extent to which the trend component, $y_t^{*}$, keeps track of the original series, $y_t$ (goodness of fit), whilst maintaining a desired degree of smoothness in the trend component. Note that as $\lambda$ approaches 0, the trend component becomes equivalent to the original series, while as $\lambda$ diverges to $\infty$, $y_t^{*}$ becomes a linear trend, since for sufficiently large $\lambda$ it is optimal to set $\Delta^2 y_{t+1}^{*} = 0$, which yields $y_{t+1}^{*} = d_0 + d_1 t$, where $d_0$ and $d_1$ are fixed constants.
Figure 16.1 Logarithm of UK output and its Hodrick–Prescott filter using λ = 1,600 (series YUK and YUKHP).
Figure 16.2 Plot of detrended UK output series using the Hodrick–Prescott filter with λ = 1,600 (series DYUK).
The ‘smoothing’ parameter λ is usually chosen by trial and error, and for quarterly observa-
tions it is set to 1,600. A discussion on the value of λ for different observation frequencies can
be found in Ravn and Uhlig (2002) and Maravall and Rio (2007).
Example 33 Figure 16.1 shows the plot of the logarithm of UK real GDP and its trend computed
using the HP filter, setting λ = 1600, over the period 1970Q1 to 2013Q1. Figure 16.2 reports
the detrended series, computed using Microfit 5.0. The HP detrending procedure in this exercise
is quite sensitive to the choice of λ, giving much more pronounced cyclical fluctuations for smaller
values of λ.
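The minimization problem has a closed-form solution, which the sketch below computes with dense linear algebra. It assumes the usual first-order condition $(I_T + \lambda D'D)\,y^{*} = y$, where $D$ is the $(T-2) \times T$ second-difference matrix; this matrix form and the function name are not given in the text, and a sparse solver would be preferable for long series.

```python
import numpy as np


def hp_filter(y, lam=1600.0):
    """Return (trend, cycle) from the Hodrick-Prescott filter with penalty lam."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    D = np.zeros((T - 2, T))
    for i in range(T - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]          # second-difference operator
    trend = np.linalg.solve(np.eye(T) + lam * (D.T @ D), y)
    return trend, y - trend


t = np.arange(60, dtype=float)
linear = 2.0 + 0.3 * t
trend_lin, cycle_lin = hp_filter(linear)           # a linear series has no cycle

rng = np.random.default_rng(0)
noisy = linear + rng.standard_normal(60)
trend_0, cycle_0 = hp_filter(noisy, lam=0.0)       # lam = 0: trend equals the series
```

The two limiting cases discussed above serve as checks: with $\lambda = 0$ the trend reproduces the data, and an exactly linear series is returned unchanged for any $\lambda$.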
For a discussion of the statistical properties of the HP filter, see, for example, Cogley (1995),
Söderlind (1994), and Harvey and Jaeger (1993) who show that the use of the HP filter can
generate spurious cyclical patterns.
\[
y_t^{*} = \sum_{k=-K}^{K} a_k y_{t-k} = a(L)\,y_t. \tag{16.1}
\]
The weights can be derived from the inverse Fourier transform of the frequency response function (see Priestley (1981)). Baxter and King adjust the band-pass filter with a constraint that the sum of the coefficients in (16.1) must be zero. Under this condition, the authors show that $a(L)$ can be factorized as
\[
a(L) = (1 - L)\left(1 - L^{-1}\right)a^{*}(L),
\]
with $a^{*}(L)$ being a symmetric moving average with $K-1$ leads and lags. It follows that the moving average has the characteristic of rendering stationary series that contain quadratic deterministic trends.
When applied to quarterly data, K in the band-pass filter is usually set at K = 12, and as a result
24 data points (12 at the start and 12 at the end of the sample) are sacrificed, seriously limiting
the usefulness of the filter for the analysis of the current state of the economy. The use of two-
sided filters also creates difficulties in forecasting. To avoid some of these difficulties two-sided
filters must be applied recursively, rather than to the full sample. Further details are provided in
the papers by Baxter and King (1999) and Christiano and Fitzgerald (2003).
Harvey and Jaeger (1993), many stylized facts reported in the literature do not fulfil these crite-
ria. Information based on mechanically detrended series can easily lead the researcher to report
spurious cyclical behaviours; analysis based on ARIMA models can also be misleading if such
models are chosen primarily on grounds of parsimony.
Structural time series models which are linear and time invariant all have a corresponding
reduced form ARIMA representation which is equivalent in the sense that it will give identical
forecasts to the structural form.1 For example, consider the local trend model,
\[
y_t = \mu_t + \varepsilon_t,
\]
\[
\mu_t = \mu_{t-1} + \eta_t,
\]
where $\varepsilon_t$ and $\eta_t$ are uncorrelated white noise disturbances. Taking first differences yields
\[
\Delta y_t = \eta_t + \varepsilon_t - \varepsilon_{t-1},
\]
which is equivalent to an MA(1) process with a non-positive autocorrelation at lag one. By equat-
ing autocorrelations at lag one it is possible to derive the relationship between the moving aver-
age parameter and q, the ratio of the variance of ηt to that of εt . In more complex models, there
may not be a simple correspondence between the structural and reduced form parameters. The
key to handling structural time series models is the state space form, with the state of the system
representing the various unobserved components such as trends and seasonals. See Harvey and
Shephard (1993) for a detailed analysis of structural time series models.
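For the local trend model the mapping between $q$ and the MA(1) parameter can be worked out explicitly. The lag-one autocorrelation of $\Delta y_t$ is $-\sigma_\varepsilon^2/(\sigma_\eta^2 + 2\sigma_\varepsilon^2) = -1/(q+2)$, and equating this to $\theta/(1+\theta^2)$ gives $\theta^2 + (q+2)\theta + 1 = 0$. The sketch below, with a hypothetical function name, returns the invertible root; the derivation is carried out here and is not quoted from the text.

```python
import math


def ma1_theta_from_q(q):
    """Invertible MA(1) parameter implied by the signal-to-noise ratio q = var(eta)/var(eps)."""
    # Roots of theta**2 + (q + 2) * theta + 1 = 0; the '+' root satisfies |theta| <= 1
    return (-(q + 2.0) + math.sqrt(q * q + 4.0 * q)) / 2.0


theta_0 = ma1_theta_from_q(0.0)      # q = 0 gives theta = -1
theta_1 = ma1_theta_from_q(1.0)
rho1_implied = theta_1 / (1.0 + theta_1 ** 2)   # should equal -1 / (q + 2) = -1/3
```

As $q$ rises from 0 the implied $\theta$ moves from $-1$ towards 0, consistent with the non-positive autocorrelation at lag one noted above.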
\[
y_t = Z_t\alpha_t + \varepsilon_t, \quad t = 1, 2, \dots, T, \tag{16.2}
\]
\[
\alpha_t = T_t\alpha_{t-1} + R_t\eta_t, \tag{16.3}
\]
1 It is worth noting that the word ‘structural’ in this literature has a very different meaning to what is meant by structural
in the literature on simultaneous equation models, and the more recent literature on dynamic stochastic general equilibrium
models. For these alternative meanings of ‘structural’ see Chapters 19 and 20.
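\[
x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + u_t.
\]
\[
y_t = Z\alpha_t + \varepsilon_t,
\]
\[
\alpha_t = T\alpha_{t-1} + \eta_t,
\]
where
\[
y_t = x_t, \quad \alpha_t = \begin{pmatrix} x_t \\ x_{t-1} \end{pmatrix}, \quad Z = (1, 0), \quad \varepsilon_t = 0,
\]
\[
T = \begin{pmatrix} \phi_1 & \phi_2 \\ 1 & 0 \end{pmatrix}, \quad \eta_t = \begin{pmatrix} u_t \\ 0 \end{pmatrix},
\]
with $H = 0$, and
\[
Q = \begin{pmatrix} \mathrm{Var}(u_t) & 0 \\ 0 & 0 \end{pmatrix}.
\]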
Kalman (1960) showed that the calculations needed for estimating a state space model could
be set out in a recursive form which has proved very convenient computationally. Since then, the
so-called Kalman filter has been almost universally adopted in modern control and system the-
ory, and has been useful in handling time series models (Harvey (1989), Durbin and Koopman
(2001)). The method based on the Kalman filter has many practical advantages among which are
its applicability to cases where there are missing observations, measurement errors and variables
which are observed at different frequencies. The optimal forecasts of α t in the mean squared
forecast error sense (see Section 17.2), given information up to period t−1, are given by the
prediction equations
where $a_{t|t-1} = E(\alpha_t \mid \Omega_{t-1})$, with $\Omega_{t-1}$ denoting the information available up to period $t-1$, and $P_{t|t-1} = E\left[(\alpha_t - a_{t|t-1})(\alpha_t - a_{t|t-1})' \mid \Omega_{t-1}\right]$ is the covariance matrix of the prediction error, $\alpha_t - a_{t|t-1}$. From (16.2), and using equations (16.6)–(16.7), the best estimate of $y_t$ is
\[
\hat y_{t|t-1} = Z_t a_{t|t-1},
\]
Once a new observation, $y_t$, becomes available, using results on multivariate normal distributions (see Section B.10 in Appendix B), it follows that $a_{t|t-1}$ and $P_{t|t-1}$ can be revised using the updating equations
Note that the term $P_{t|t-1}Z_t'F_t^{-1}Z_t$ in equation (16.9) is the weight assigned to the new information available at time $t$. The Kalman algorithm calculates optimal predictions of $\alpha_t$ in a recursive manner. It starts with the initial values $\alpha_0$ and $P_0$, and then iterates between (16.6)–(16.7) and (16.8)–(16.9), for $t = 1, 2, \dots, T$.
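The recursion can be sketched for a time-invariant system as follows. Because the prediction and updating equations are not reproduced in this passage, the code uses the standard textbook forms and should be read as an assumption about their content: prediction $a_{t|t-1} = Ta_{t-1}$, $P_{t|t-1} = TP_{t-1}T' + RQR'$; updating with $F_t = ZP_{t|t-1}Z' + H$ and gain $P_{t|t-1}Z'F_t^{-1}$. The function and argument names are illustrative, and the Gaussian log-likelihood is accumulated from the prediction errors along the way.

```python
import numpy as np


def kalman_filter(y, Z, Tm, R, H, Q, a0, P0):
    """Filtered states and Gaussian log-likelihood for y_t = Z a_t + eps_t, a_t = Tm a_{t-1} + R eta_t."""
    a, P = np.asarray(a0, dtype=float), np.asarray(P0, dtype=float)
    filtered, loglik = [], 0.0
    for yt in y:
        # Prediction step
        a_pred = Tm @ a
        P_pred = Tm @ P @ Tm.T + R @ Q @ R.T
        # Prediction error and its covariance
        e = np.atleast_1d(yt) - Z @ a_pred
        F = Z @ P_pred @ Z.T + H
        # Updating step
        K = P_pred @ Z.T @ np.linalg.inv(F)
        a = a_pred + K @ e
        P = P_pred - K @ Z @ P_pred
        loglik += -0.5 * (len(e) * np.log(2 * np.pi)
                          + np.log(np.linalg.det(F)) + e @ np.linalg.solve(F, e))
        filtered.append(a.copy())
    return np.array(filtered), loglik


# Local level model: scalar state, Z = T = R = 1
one = np.array([[1.0]])
rng = np.random.default_rng(0)
y = np.cumsum(rng.standard_normal(50)) + rng.standard_normal(50)

states_noisy, ll_noisy = kalman_filter(y, one, one, one, H=one, Q=one,
                                       a0=[0.0], P0=one)
# With no measurement noise (H = 0) the filtered state reproduces the data
states_exact, _ = kalman_filter(y, one, one, one, H=np.zeros((1, 1)), Q=one,
                                a0=[0.0], P0=one)
```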
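If the initial value $\alpha_0$ and the innovations $\varepsilon_t$ and $\eta_t$ are Gaussian processes, then conditional on the information available up to period $t-1$, denoted by $\Omega_{t-1}$, we have
\[
y_t \mid \Omega_{t-1} \sim N\left(Z_t a_{t|t-1}, F_t\right),
\]
\[
\ell(\theta) = -\frac{mT}{2}\ln(2\pi) - \frac{1}{2}\sum_{t=1}^{T}\ln|F_t| - \frac{1}{2}\sum_{t=1}^{T}e_t'F_t^{-1}e_t,
\]
where $m$ is the dimension of $y_t$, $e_t = y_t - Z_t a_{t|t-1}$ is the one-step-ahead prediction error, and $\theta$ contains the parameters of interest. Maximization of the above log-likelihood function can be achieved by employing, for example, the Newton-Raphson algorithm or the expectation maximization algorithm introduced by Dempster, Laird, and Rubin (1977).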
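\[
a(L) = a_0 + a_1 L + a_2 L^2 + \dots,
\]
\[
y_t = z_t + \xi_t, \quad t = 1, 2, \dots, T, \tag{16.11}
\]
with
\[
z_t = z_{t-1} + \mu + u_t, \quad u_t \sim IID\left(0, \sigma_u^2\right), \tag{16.12}
\]
\[
\xi_t = c(L)v_t, \quad v_t \sim IID\left(0, \sigma_v^2\right),
\]
\[
c(L) = c_0 + c_1 L + c_2 L^2 + \dots.
\]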
Here zt is a random walk with drift. It is considered as the permanent component of the series
since shocks to zt have permanent effects on yt , whilst shocks to ξ t do not have a permanent effect
on yt , namely their effects die out eventually. This occurs because ξ t , also called the transitory
or cyclical component of the series, is a stationary process.
(i) Can we find μ, the sequences {ut } , {vt }, and {ci } such that the above decomposition is
compatible with the original process defined by (16.10)?
(ii) Is the solution for μ, ci , ut , and vt unique?
That is,
μ + ut + (1 − L) c (L) vt = μ + a (L) ε t .
Hence
Whether the decomposition is unique is clearly of interest. Recall that two processes are considered to be observationally equivalent if they have the same autocovariance generating function (see Section 12.4). In the present context, since $\Delta y_t$ and $\xi_t$ are stationary processes with $a(L)$ and $c(L)$ being absolutely summable, the autocovariance generating functions for the two sides of (16.13) exist and are equal, namely
\[
\sigma_u^2 + (1-z)\left(1-z^{-1}\right)c(z)c\left(z^{-1}\right)\sigma_v^2 + 2\sigma_{uv}(1-z)\left(1-z^{-1}\right)c(z)c\left(z^{-1}\right) = \sigma^2 a(z)a\left(z^{-1}\right), \tag{16.14}
\]
where
\[
\begin{pmatrix} u_t \\ v_t \end{pmatrix} \sim IID\left[\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_u^2 & \sigma_{uv} \\ \sigma_{vu} & \sigma_v^2 \end{pmatrix}\right].
\]
Now any ut and vt processes satisfying (16.14) will also satisfy (16.10), hence they can be con-
sistent with the original series. A solution clearly exists but it is not unique. To obtain a unique
solution, Beveridge and Nelson (1981) (BN) assume that ut and vt are perfectly collinear, that is
ut = λvt .
We need to solve for λ and c (z) using (16.16). By equating the constant terms and the terms
with the same order of z from both sides of (16.16), we obtain
\[ c_0 = a_0 - \lambda, \tag{16.17} \]
and
\[ c_i = c_{i-1} + a_i, \quad \text{for } i = 1, 2, \ldots, \]
or
\[ c_i = c_0 + \sum_{j=1}^{i} a_j. \tag{16.18} \]
\[ \lambda^2 = a(1)^2, \]
and without loss of generality, we can set $\lambda = a(1) = \sum_{i=0}^{\infty} a_i$. Using this result in (16.17) we
now have $c_0 = -\sum_{j=1}^{\infty} a_j$, and in view of (16.18) we obtain
\[ c_i = -\sum_{j=i+1}^{\infty} a_j. \tag{16.19} \]
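To make (16.17)-(16.19) concrete, the following Python sketch computes $a(1)$ and the coefficients $c_i$ when $a(L)$ is a finite-order polynomial. The function name and the example coefficients are hypothetical and serve only as an illustration.

```python
def bn_coefficients(a):
    """Given a = [a_0, a_1, ..., a_q], return (a(1), [c_0, ..., c_{q-1}]).

    Uses c_i = -(a_{i+1} + a_{i+2} + ...), as in (16.19); for a finite-order
    a(L) the coefficients c_i are zero for i >= q.
    """
    long_run = sum(a)                                   # a(1)
    c = [-sum(a[i + 1:]) for i in range(len(a) - 1)]    # c_0, ..., c_{q-1}
    return long_run, c

# Illustrative coefficients (an assumption, not from the text)
long_run, c = bn_coefficients([1.0, 0.5, 0.25])
print(long_run, c)
```

For these coefficients, $c_1 = c_0 + a_1$ holds as required by the recursion preceding (16.18).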
Thus, under the assumption that ut and vt are perfectly correlated we have the following unique
answer to the decomposition problem
\[ y_t = z_t + \xi_t, \]
where
\[ z_t = z_{t-1} + \mu + a(1)\varepsilon_t, \tag{16.20} \]
and
\[ \xi_t = c(L) v_t = (c_0 + c_1 L + c_2 L^2 + \ldots)\varepsilon_t. \tag{16.21} \]
decomposition. Therefore, testing the hypothesis a (1) = 0 is the same as testing for a unit root.
For this reason, a (1) is also often referred to as the ‘size of unit root’.
Another method of estimating $a(1)$ would be via the spectral density approach. Since $\Delta y_t$
is a stationary process, the related spectral density is
\[ f_{\Delta y}(\omega) = \frac{\sigma^2}{2\pi}\, a(e^{i\omega})\, a(e^{-i\omega}). \]
Evaluating at zero frequency,
\[ f_{\Delta y}(0) = \frac{\sigma^2}{2\pi}\, a(1)\, a(1), \]
which yields
\[ a(1) = \frac{\left[2\pi f_{\Delta y}(0)\right]^{1/2}}{\sigma}. \]
Since $a(1)$ is the square root of the standardized spectral density at zero frequency, it follows
that identification of $a(1)$ does not depend on the particular decomposition advocated by Beveridge
and Nelson. Thus the non-uniqueness of the BN decomposition does not pose any difficulty for
the estimation and interpretation of $a(1)$.
\[ \sigma_u^2 = \sigma^2 a(1)^2. \]
\[ a(1)^2 + \frac{\sigma_v^2}{\sigma^2}(1 - z)(1 - z^{-1}) c(z) c(z^{-1}) = a(z) a(z^{-1}). \]
Again, by equating both sides of (16.24), we can get a unique answer to the decomposition.
\[ \Delta y_t = \mu + \sum_{i=0}^{\infty} a_i \varepsilon_{t-i} = \mu + a(L)\varepsilon_t, \]
\[ y_t = y_0^* + \mu t + a(1)\sum_{i=1}^{t}\varepsilon_i + a^*(L)\varepsilon_t, \]
which is known as the stochastic trend representation of the $y_t$ process, and decomposes $y_t$ into
a deterministic linear trend, $y_0^* + \mu t$, a stochastic trend component, $a(1)\sum_{i=1}^{t}\varepsilon_i$, and a
stationary (cyclical) component, $a^*(L)\varepsilon_t$, which satisfies the absolute summability condition,
$\sum_{j=0}^{\infty}|a_j^*| < \infty$, if $\sum_{j=0}^{\infty} j|a_j| < \infty$. In relation to the BN decomposition
\[ y_0^* + \mu t + a(1)\sum_{i=1}^{t}\varepsilon_i = z_t, \]
and $a^*(L)\varepsilon_t = \xi_t$, where $z_t$ and $\xi_t$ are defined by (16.20) and (16.21), respectively.
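To obtain the stochastic trend representation of the unit root process, first note that $a(L)$ can
be written as
\[ a(L) = a(1) + (1 - L)\, a^*(L), \tag{16.25} \]
where $a^*(L) = \sum_{i=0}^{\infty} a_i^* L^i$. Therefore
\[ \Delta y_t = \mu + a(1)\varepsilon_t + (1 - L)\, a^*(L)\varepsilon_t, \]
and
\[ \Delta(y_t - \zeta_t) = \mu + a(1)\varepsilon_t, \]
where
\[ \zeta_t = a^*(L)\varepsilon_t. \]
Hence
\[ y_t = y_0^* + \mu t + a(1)\sum_{i=1}^{t}\varepsilon_i + a^*(L)\varepsilon_t. \]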
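The coefficients $a_i^*$ can be obtained in terms of $a_i$ using (16.25). Equating powers of $L^i$ in
expansions of both sides of (16.25) we have
\[ a_0 = a(1) + a_0^*, \quad a_i = a_i^* - a_{i-1}^*, \quad i = 1, 2, \ldots, \]
and hence
\[ a_i^* = -\sum_{j=i+1}^{\infty} a_j. \]
Also, since
\[ \sum_{i=0}^{\infty}\left|\sum_{j=i+1}^{\infty} a_j\right| \le \sum_{i=0}^{\infty}\sum_{j=i+1}^{\infty} |a_j| = \sum_{j=1}^{\infty}|a_j| + \sum_{j=2}^{\infty}|a_j| + \sum_{j=3}^{\infty}|a_j| + \ldots = \sum_{i=1}^{\infty} i\,|a_i|, \]
it follows that
\[ \sum_{i=0}^{\infty}|a_i^*| = \sum_{i=0}^{\infty}\left|\sum_{j=i+1}^{\infty} a_j\right| \le \sum_{i=1}^{\infty} i\,|a_i| < \infty, \]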
\[ y_t^P = \lim_{h\to\infty} E\left[ y_{t+h} - y_0^* - (h + t)\mu \mid I_t \right] = a(1)\sum_{i=1}^{t}\varepsilon_i, \]
where $I_t = (y_t, y_{t-1}, \ldots)$. This follows by noting that the long-horizon expectation of the mean
zero stationary component of $y_t$, namely $a^*(L)\varepsilon_t$, is zero.
A multivariate version of the above trend/cycle decomposition is discussed in Section 22.15.
16.8 Exercises
1. Consider the ARMA(p, q) model analysed in Section 12.6
\[ \phi(L) y_t = \theta(L)\varepsilon_t, \]
where
\[ \phi(L) = 1 - \phi_1 L - \phi_2 L^2 - \ldots - \phi_p L^p, \]
\[ \theta(L) = 1 - \theta_1 L - \theta_2 L^2 - \ldots - \theta_q L^q, \]
and $\varepsilon_t \sim IID(0, \sigma^2)$. Suppose that all the roots of $\phi(z) = 0$ lie outside the unit circle and
$y_t$ has the infinite-order moving average process
\[ \psi_1 = \phi_1 - \theta_1, \]
\[ \psi_2 = \phi_1\psi_1 + \phi_2 - \theta_2, \]
\[ \vdots \]
\[ \psi_n = \phi_1\psi_{n-1} + \phi_2\psi_{n-2} + \ldots + \phi_{n-1}\psi_1 + \phi_n - \theta_n. \]
(b) Consider the conditional forecasts $y_{t+h|t+s} = E(y_{t+h} \mid F_{t+s})$, where $F_{t+s} = (y_{t+s},
y_{t+s-1}, \ldots)$, and $s < h$. Show that
(c) Hence, or otherwise, show that the ARMA process can be written in the following
state-space form
yt = (1, 0, . . . , 0)st
st+1 = Tst + Rε t+1 ,
a(L) = a0 + a1 L + a2 L2 + . . . ,
is a polynomial in the lag operator, (Lyt = yt−1 ) and μ is a scalar constant. The εt are
mean zero, serially uncorrelated shocks with common variance, σ 2ε .
(a) Show that the {yt } process can be decomposed into a stationary component, xt , and
a random walk component τ t
yt = xt + τ t , (16.27)
where
xt = b(L)ε t ,
τ t = μ + τ t−1 + ηt , ηt ∼ IID(0, σ 2η ),
and b(L) = b0 + b1 L + b2 L2 + . . . .
(b) Obtain the coefficients {bi } in terms of {ai }, and show that
\[ \eta_t = \left(\sum_{i=0}^{\infty} a_i\right)\varepsilon_t. \]
(c) Discuss the relevance of the decomposition (16.27) for the impulse response analysis
of shocks to y.
3. Suppose that $y_t$ follows the ARIMA($p, d, q$) process
\[ w_t = y_t + A_0 + A_1 t + \cdots + A_{d-1} t^{d-1}, \]
yt = a0 + a1 t + ρyt−1 + ut .
(a) Let xt = yt − δ 0 − δ 1 t and derive δ 0 and δ 1 in terms of the parameters of the AR(1)
process such that
xt = ρxt−1 + ut .
(b) Derive the long horizon forecast of $x_t$, defined by $E(x_{t+h} \mid \Omega_t)$, where $\Omega_t = (y_t,
y_{t-1}, \ldots)$, for values of $\rho$ inside the unit circle as well as when $\rho = 1$.
(c) Using the results in (b) above derive the permanent component of yt , and compare
your results with the Beveridge–Nelson decomposition for ρ inside the unit circle as
well as when ρ = 1.
5. Use quarterly time series observations on US GDP over the period 1979Q1-2013Q2
(provided in the GVAR data set https://sites.google.com/site/gvarmodelling/data) to
compute the permanent component of the log of US output ($y_t$) using the Hodrick–Prescott
filter. Compare your results with the long-run forecasts of $y_t$, namely
$E(y_{t+h} \mid y_t, y_{t-1}, \ldots)$, for $h$ sufficiently large, computed using the following ARIMA(1, 1, 1)
specification
17 Introduction to Forecasting
17.1 Introduction
This chapter provides an introduction to the theory of forecasting and presents some
applications to forecasting univariate processes. It begins with a discussion of alternative
criteria of forecast optimality. It distinguishes between point and probability forecasts, one-step
and multi-step ahead forecasts, conditional and ex ante forecasts. Using a quadratic loss func-
tion, point and probability forecasts are derived for univariate time series processes that are opti-
mal in the mean squared forecast error sense. Also, the problem of parameter and model uncer-
tainty in forecasting is discussed, and an overview of the techniques for forecast evaluation is
provided.
\[ L_q(y_{t+1}, y^*_{t+1|t}) = A e_{t+1}^2 = A\left(y_{t+1} - y^*_{t+1|t}\right)^2, \tag{17.1} \]
where $A$ is a positive constant. The optimal forecast is obtained by minimizing the
expected loss conditional on the information available at time $t$, namely
\[ y^*_{t+1|t} = \underset{y^*_{t+1|t}}{\operatorname{argmin}}\; E\left[ L_q(y_{t+1}, y^*_{t+1|t}) \mid \Omega_t \right]. \]
$y^*_{t+1|t}$ is also said to be optimal in the mean squared forecast error sense. Note that (setting $A = 1$
without loss of generality)
\[ E\left[ L_q(y_{t+1}, y^*_{t+1|t}) \mid \Omega_t \right] = \int_R \left(y_{t+1} - y^*_{t+1|t}\right)^2 f(y_{t+1} \mid \Omega_t)\, dy_{t+1}, \]
where $R$ denotes the range of variation of $y_{t+1}$, and $f(y_{t+1} \mid \Omega_t)$ is the probability density of
$y_{t+1}$ conditional on the information set $\Omega_t$. Suppose now that the probability density function
is exogenously given and is not affected by the forecasting exercise (reality is invariant to the
way forecasts are formed). Then the first-order condition for the above minimization problem is
given by
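\[ \begin{aligned}
\frac{\partial E\left(y_{t+1} - y^*_{t+1|t}\right)^2}{\partial y^*_{t+1|t}}
&= \frac{\partial}{\partial y^*_{t+1|t}} \int_R \left(y_{t+1} - y^*_{t+1|t}\right)^2 f(y_{t+1} \mid \Omega_t)\, dy_{t+1} \\
&= \int_R \frac{\partial}{\partial y^*_{t+1|t}} \left(y_{t+1} - y^*_{t+1|t}\right)^2 f(y_{t+1} \mid \Omega_t)\, dy_{t+1} \\
&= -2\int_R \left(y_{t+1} - y^*_{t+1|t}\right) f(y_{t+1} \mid \Omega_t)\, dy_{t+1} = 0.
\end{aligned} \tag{17.2} \]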
Since $y^*_{t+1|t}$, the predicted value, can be viewed as known by the forecaster, the integral in (17.2)
can also be written as
\[ \int_R y_{t+1} f(y_{t+1} \mid \Omega_t)\, dy_{t+1} = y^*_{t+1|t} \int_R f(y_{t+1} \mid \Omega_t)\, dy_{t+1}. \tag{17.3} \]
But since $f(y_{t+1} \mid \Omega_t)$ is a density function, $\int_R y_{t+1} f(y_{t+1} \mid \Omega_t)\, dy_{t+1} = E(y_{t+1} \mid \Omega_t)$
and $\int_R f(y_{t+1} \mid \Omega_t)\, dy_{t+1} = 1$, and from (17.3) we obtain
\[ y^*_{t+1|t} = E(y_{t+1} \mid \Omega_t). \tag{17.4} \]
Thus it is established that $E(y_{t+1} \mid \Omega_t)$ is the optimal point forecast of $y_{t+1}$ conditional on $\Omega_t$
when
This fundamental result will be used in Section 17.6 to construct forecasts of ARMA
processes.
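The optimality of the conditional mean under quadratic loss can be checked numerically. The sketch below, written in Python with NumPy, draws outcomes from an arbitrary skewed distribution (a gamma distribution, chosen purely as an assumption for illustration) and searches a grid of candidate forecasts for the one with the smallest average squared error.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative outcomes: any distribution with a finite variance would do
draws = rng.gamma(shape=2.0, scale=1.5, size=10_000)

def expected_quadratic_loss(forecast, outcomes):
    """Sample analogue of E[(y - forecast)^2] with A = 1."""
    return float(np.mean((outcomes - forecast) ** 2))

grid = np.linspace(0.0, 8.0, 801)            # candidate forecasts, step 0.01
losses = [expected_quadratic_loss(f, draws) for f in grid]
best = grid[int(np.argmin(losses))]          # grid point with the smallest loss

print(best, draws.mean())
```

The minimizing grid point coincides with the sample mean up to the grid spacing, even though the distribution is asymmetric.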
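\[ L_a(y_{t+1}, y^*_{t+1|t}) = \frac{2\left[\exp(\alpha e_{t+1}) - \alpha e_{t+1} - 1\right]}{\alpha^2}, \tag{17.5} \]
where as before, $e_{t+1} = y_{t+1} - y^*_{t+1|t}$, and $\alpha$ is a parameter that controls the degree of
asymmetry. This function has the interesting property that it reduces to the familiar quadratic
loss function as $\alpha \to 0$. Using L'Hôpital's rule
\[ \lim_{\alpha\to 0} L_a(y_{t+1}, y^*_{t+1|t}) = \lim_{\alpha\to 0}\frac{2 e_{t+1}\left[\exp(\alpha e_{t+1}) - 1\right]}{2\alpha} = \lim_{\alpha\to 0} e_{t+1}^2 \exp(\alpha e_{t+1}) = e_{t+1}^2. \]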
A pictorial representation of the LINEX function for α = 0.5 is provided in Figure 17.1.
Figure 17.1 The LINEX cost function defined by (17.5) for α = 0.5. (Horizontal axis: e = y − y*; vertical axis: L(e).)
For this particular loss function under-predicting is more costly than over-predicting when
α > 0. The reverse is true when α < 0. Again assuming that the forecast will not affect the
range of the integral, for the LINEX loss function the optimal forecast, y∗t+1|t , can be obtained as
the solution of the following equation
\[ \frac{\partial}{\partial y^*_{t+1|t}} E\left[ L_a(y_{t+1}, y^*_{t+1|t}) \mid \Omega_t \right] = 0, \tag{17.7} \]
which yields
\[ y^*_{t+1|t} = \frac{1}{\alpha}\log E\left[\exp(\alpha y_{t+1}) \mid \Omega_t\right], \]
where the expectations are taken with respect to the conditional true density function of $y_{t+1}$.
In the case where this density is normal, we have
\[ y^*_{t+1|t} = E(y_{t+1} \mid \Omega_t) + \frac{\alpha}{2} Var(y_{t+1} \mid \Omega_t), \]
where $E(y_{t+1} \mid \Omega_t)$ and $Var(y_{t+1} \mid \Omega_t)$ are the conditional mean and variance of $y_{t+1}$. Notice that
the higher the degree of asymmetry in the cost function (as measured by the magnitude of $\alpha$),
the larger will be the discrepancy between the optimal forecast and $E(y_{t+1} \mid \Omega_t)$. The average
realized value of the cost function, evaluated at the optimal forecast, is given by
\[ E\left[ L_a(y_{t+1}, y^*_{t+1|t}) \right] = E\left[ Var(y_{t+1} \mid \Omega_t) \right], \]
which, interestingly enough, is independent of $\alpha$, the degree of asymmetry of the underlying loss
function.
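The normal-case results can be verified with a few lines of Python. The sketch below uses the closed form $E[\exp(\alpha e)] = \exp(\alpha m_e + \alpha^2 v/2)$ for a normal forecast error with mean $m_e$ and variance $v$; the function names and the numerical values are illustrative assumptions only.

```python
import math

def linex_loss(e, alpha):
    """LINEX loss (17.5) for a forecast error e = y - y*."""
    return 2.0 * (math.exp(alpha * e) - alpha * e - 1.0) / alpha ** 2

def linex_forecast_normal(mean, var, alpha):
    """Optimal LINEX forecast when y is conditionally normal."""
    return mean + 0.5 * alpha * var

def expected_linex_loss_normal(forecast, mean, var, alpha):
    """E[L_a] when y ~ N(mean, var), using the normal moment generating function."""
    shift = alpha * (mean - forecast)
    return 2.0 * (math.exp(shift + 0.5 * alpha ** 2 * var) - shift - 1.0) / alpha ** 2

mean, var, alpha = 1.0, 4.0, 0.5             # illustrative values
f_star = linex_forecast_normal(mean, var, alpha)
print(f_star, expected_linex_loss_normal(f_star, mean, var, alpha))
```

With these values the optimal forecast exceeds the conditional mean by $\alpha/2$ times the variance, and the expected loss at the optimum equals the conditional variance whatever the value of $\alpha$.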
In this case, the probability (event) forecast, π̂ t+1|t , can be converted to the point forecast, using
a ‘rule of thumb’ which gives ẑ∗t+1|t = 1 if π̂ t+1|t exceeds some specified probability threshold,
α t ∈ (0, 1).
Hence, the economic forecaster has two alternative forms of forecast to announce, either
π̂ t+1|t , which takes some value in the region 0 ≤ π̂ t+1|t ≤ 1, and represents a probability
forecast; or ẑ∗t+1|t which is an event forecast. The relationship between probability and event
forecasts can also be written as ẑ∗t+1|t = I(π̂ t+1|t − α t ), where the indicator function I(·), is
defined by I(A) = 1 if A > 0, and I(A) = 0, otherwise. For further discussion see Pesaran
and Granger (2000a, 2000b). Two-state decision problems typically arise when the focus of the
analysis is correct prediction of the direction of change in the variable under consideration (up,
down) (see Pesaran and Timmermann (1992) on this, and also Section 17.12 below).
More generally, let $y_t$ be a variable of interest, and suppose that we are interested in forecasting
$y_{t+1}$ at time $t$, having available an information set $\Omega_t$. Probability event forecast refers to
the probability of a particular event taking place, say the probability that the event
$A_{t+1} = \{b \le y_{t+1} \le a\}$ occurs. For example, the probability of inflation ($p_{t+1}$) conditional
on the information at time t falling in the range (a1 , a2 ), or the probability of a recession defined
as two successive negative growth rates (yt+1 )
or the joint probability of the inflation rate (pt+1 ) falling within a target range and a positive
output growth
Probability forecasts also play an important role in the Value-at-Risk (VaR) analysis in insurance
and finance (see Chapter 7). For example, it is often required that return on a given portfolio,
rt+1 (or insurance claim) satisfies the following VaR probability constraint
where VaR denotes the maximum permitted loss over the period t to t + 1.
A density forecast of the realization of a random variable at some future time is an estimate of
the probability distribution of the possible future values of that variable. Thus, density forecasting
is concerned with $\hat f_t(y_{t+1} \mid \Omega_t)$ for all feasible values of $y_{t+1}$, or equivalently with its probability
distribution function, $\hat F_t(y) = \int_{-\infty}^{y} \hat f_t(u \mid \Omega_t)\, du$, for all feasible values of $y$. It thus provides
a complete description of the uncertainty associated with a forecast, and stands in contrast to a
point forecast, which by itself contains no description of the associated uncertainty. Notice that
probability forecasts can be seen as a special case of density forecasting, since we have
As explained above, an event forecast can be put in the form of an indicator function and states
whether an event is forecast to occur. For example, in the case of $A_{t+1} = \{b \le y_{t+1} \le a\}$, the
event forecast will simply be
It is always possible to compute event forecasts from probability event forecasts, but not vice
versa. This could be done with respect to a probability threshold, p, often taken to be 1/2 in
practice. In the case of the above example we have
\[ \hat I(A_{t+1} \mid \Omega_t) = I\left(\hat F_t(a) - \hat F_t(b) - p\right). \]
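The mapping from a predictive distribution to a probability event forecast and then to an event forecast can be sketched as follows. A normal predictive distribution is assumed here purely for illustration, and the function names are hypothetical; `lower` and `upper` play the roles of $b$ and $a$.

```python
from statistics import NormalDist

def event_probability(predictive, lower, upper):
    """Probability forecast of the event {lower <= y <= upper}: F(upper) - F(lower)."""
    return predictive.cdf(upper) - predictive.cdf(lower)

def event_forecast(predictive, lower, upper, threshold=0.5):
    """Event forecast I(F(upper) - F(lower) - p): 1 if the argument is positive, else 0."""
    return 1 if event_probability(predictive, lower, upper) - threshold > 0 else 0

predictive = NormalDist(mu=2.0, sigma=1.0)   # assumed predictive distribution
print(event_probability(predictive, 1.0, 3.0), event_forecast(predictive, 1.0, 3.0))
```

The reverse mapping is not available: knowing only that the event forecast equals one says no more than that the probability exceeds the threshold.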
Finally, the main object of interest could be point forecasts, such as the mean
\[ E(y_{t+1} \mid \Omega_t) = \int_{-\infty}^{\infty} u f(u \mid \Omega_t)\, du, \]
\[ y_t = a + \lambda y_{t-1} + \beta x_t + u_t, \]
where $u_t$ is a serially uncorrelated process with mean zero and $x_t$ is a conditioning variable.
Then assuming a mean squared loss function, the conditional forecast of $y_{t+1}$ based on
$\Omega_t = (y_t, x_t, y_{t-1}, x_{t-1}, \ldots)$ and the value of $x_{t+1}$ is given by
\[ E(y_{t+1} \mid \Omega_t, x_{t+1}) = a + \lambda y_t + \beta x_{t+1}. \]
In contrast, unconditional forecasting does not assume known future values for the conditioning
variable, xt . An unconditional (or ex ante) forecast of yt+1 is given by
\[ E(y_{t+1} \mid \Omega_t) = a + \lambda y_t + \beta E(x_{t+1} \mid \Omega_t). \]
where
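is the forecast error. As with the case of 1-step ahead forecasts outlined in Section 17.2, the value
of $y^*_{T+h|T}$ that minimizes the expected loss, $E\left[L_q(y_{T+h}, y^*_{T+h|T}) \mid \Omega_T\right]$, is
\[ y^*_{T+h|T} = E(y_{T+h} \mid \Omega_T) = \underset{y^*_{T+h|T}}{\operatorname{argmin}}\; E\left[ L_q(y_{T+h}, y^*_{T+h|T}) \mid \Omega_T \right]. \tag{17.8} \]
For the LINEX loss function (17.5), the optimal forecast is
\[ y^*_{T+h|T} = \alpha^{-1}\log E\left[\exp(\alpha y_{T+h}) \mid \Omega_T\right], \]
where the expectations are taken with respect to the conditional true density function of $y_{T+h}$.
In the case where this density is normal, we have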
\[ y^*_{T+h|T} = E(y_{T+h} \mid \Omega_T) + \frac{\alpha}{2} Var(y_{T+h} \mid \Omega_T), \]
where $Var(y_{T+h} \mid \Omega_T)$ is the conditional variance of $y_{T+h}$.
\[ y_t = \sum_{i=1}^{p} \phi_i y_{t-i} + \sum_{i=0}^{q} \theta_i \varepsilon_{t-i}, \quad \theta_0 = 1, \]
and suppose that we are interested in forecasting $y_{T+h}$ given the information set $\Omega_T = (y_T,
y_{T-1}, \ldots)$. Optimal point or probability forecasts of $y_{T+h}$ can be derived with respect to a
given loss function and conditional on the information set $\Omega_T$. In the following, we derive optimal
point forecasts of $y_{T+h}$ using the quadratic loss function and result (17.8) for AR, MA, and
ARMA models.
\[ y_t - d_t = \phi(y_{t-1} - d_{t-1}) + \varepsilon_t, \]
where $d_t$ is the deterministic or perfectly predictable component of the process (recall that for
$d_t$ we would have $E(d_{T+h} \mid \Omega_T) = d_{T+h}$). It is now easily seen that
\[ y^*_{T+h|T} = E(y_{T+h} \mid \Omega_T) = d_{T+h} + \phi^h (y_T - d_T), \tag{17.10} \]
and $E(y_{T+h} \mid \Omega_T)$ converges to its perfectly predictable component (also known in economic
applications as the steady state) as $h \to \infty$, if the process $(y_t - d_t)$ is stationary, namely if
$|\phi| < 1$. For this reason a (trend-) stationary process is also known as a mean reverting process.
Note, however, that the above forecasts are optimal even if the underlying process is non-stationary.
For example, if $\phi = 1$ we would have $E(y_{T+h} \mid \Omega_T) = y_T + (d_{T+h} - d_T)$. But in this
case the long-horizon forecast, defined by $\lim_{h\to\infty} E(y_{T+h} \mid \Omega_T)$, is no longer mean reverting.
When parameters are not known and have to be estimated, (17.10) becomes (abstracting from
deterministic components)
\[ \hat y^*_{T+h|T} = \hat\phi^h y_T, \tag{17.11} \]
where $\hat\phi$ is an estimator of $\phi$, for example the Yule-Walker estimator (see formula (14.25) in
Chapter 14). Notice that this formula, obtained by minimizing the quadratic loss function, is
equivalent to the forecast obtained by using the iterative approach (see, in particular, formula
(17.15)).
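Formula (17.10) translates directly into code. The short Python sketch below treats the deterministic components $d_T$ and $d_{T+h}$ as known inputs; the function name and the example numbers are hypothetical.

```python
def ar1_forecast(y_T, phi, h, d_T=0.0, d_Th=0.0):
    """h-step ahead forecast (17.10): d_{T+h} + phi**h * (y_T - d_T)."""
    return d_Th + phi ** h * (y_T - d_T)

# Stationary case: the forecast decays towards the deterministic component
print(ar1_forecast(2.0, 0.8, 3))
# Unit root case (phi = 1): the current deviation from d_T never dies out
print(ar1_forecast(2.0, 1.0, 5, d_T=1.0, d_Th=1.5))
```

Replacing `phi` by an estimate gives the feasible forecast (17.11) when the deterministic components are set to zero.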
For higher-order AR models, h-step ahead forecasts can be obtained recursively. For example,
the optimal point forecasts for an AR(2) process are given by
\[ y^*_{T+1|T} = \phi_1 y_T + \phi_2 y_{T-1}, \]
\[ y^*_{T+2|T} = \phi_1 y^*_{T+1|T} + \phi_2 y_T, \]
\[ y^*_{T+j|T} = \phi_1 y^*_{T+j-1|T} + \phi_2 y^*_{T+j-2|T}, \quad \text{for } j = 3, 4, \ldots, h. \]
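The recursion for the AR(2) case can be written for a general AR($p$) model in a few lines of Python. This is a minimal sketch for a zero-mean process with known coefficients; the function name and the example values are assumptions made for illustration.

```python
def ar_forecasts(history, phi, h):
    """Recursive forecasts y*_{T+1|T}, ..., y*_{T+h|T} for a zero-mean AR(p).

    history: observed values ordered from oldest to newest (at least p of them)
    phi:     coefficients [phi_1, ..., phi_p]
    """
    p = len(phi)
    path = list(history[-p:])                 # last p observations
    for _ in range(h):
        # Forecasts replace unobserved future values as the recursion proceeds
        path.append(sum(phi[i] * path[-1 - i] for i in range(p)))
    return path[p:]

print(ar_forecasts([1.0, 2.0], [0.5, 0.3], 3))
```

For $p = 1$ the recursion reproduces $\phi^h y_T$, in line with (17.10) without deterministic components.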
More generally
\[ y^*_{T+j|T} = \sum_{i=1}^{p} \phi_i y^*_{T+j-i|T}, \quad j = 1, 2, \ldots, h, \]
For h = 1 < q,
for h = 2 < q,
and so on. To compute the forecasts we now need to estimate εT , ε T−1 , . . . from the realiza-
tions yT , yT−1 , . . .. This can be achieved assuming that the invertibility condition (discussed in
Section 12.6) holds. When this condition is met we can obtain εT and its lagged values from a
truncated version of the infinite AR representation of the MA process
\[ \varepsilon_T = \theta(L)^{-1} y_T = \alpha(L) y_T, \]
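\[ \alpha_0\theta_0 = 1, \]
\[ \alpha_1\theta_0 + \alpha_0\theta_1 = 0, \]
\[ \vdots \]
\[ \alpha_q\theta_0 + \alpha_{q-1}\theta_1 + \ldots + \alpha_0\theta_q = 0, \]
\[ \alpha_i\theta_0 + \alpha_{i-1}\theta_1 + \ldots + \alpha_{i-q}\theta_q = 0, \quad \text{for } i > q, \]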
where θ 0 = 1.
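The recursive relations above can be solved sequentially for the AR weights $\alpha_i$, which can then be applied to the observations to back out the innovations. The Python sketch below assumes a zero-mean invertible MA($q$) process and sets pre-sample observations to zero, which is one simple way of truncating the infinite AR representation; the function names are hypothetical.

```python
def ar_weights(theta, n_weights):
    """Solve alpha(L) * theta(L) = 1 for alpha_0, ..., alpha_{n_weights - 1}.

    theta: [theta_0, theta_1, ..., theta_q] with theta_0 = 1.
    """
    q = len(theta) - 1
    alpha = [1.0 / theta[0]]
    for i in range(1, n_weights):
        acc = sum(theta[k] * alpha[i - k] for k in range(1, min(i, q) + 1))
        alpha.append(-acc / theta[0])
    return alpha

def recover_innovations(y, theta):
    """Truncated AR filter: eps_t = sum_{i=0}^{t} alpha_i * y_{t-i}."""
    alpha = ar_weights(theta, len(y))
    return [sum(alpha[i] * y[t - i] for i in range(t + 1)) for t in range(len(y))]

print(ar_weights([1.0, 0.5], 4))
```

For an MA(1) with $\theta_1 = 0.5$ the weights alternate in sign and decay geometrically, so truncation has little effect once a moderate number of observations is available.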
The above procedures can be adapted to forecasting using ARMA models. We have
and
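\[ y_t = \mu + u_t, \quad \text{where} \tag{17.13} \]
\[ u_t = \sum_{i=0}^{\infty} b_i \varepsilon_{t-i}, \quad b_i = \phi^i. \]
Rewrite (17.12) as
\[ y_t = a\left(\frac{1 - \phi^h}{1 - \phi}\right) + \phi^h y_{t-h} + v_t \equiv a_h + \phi^h y_{t-h} + v_t, \tag{17.14} \]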
where
\[ v_t = \sum_{j=0}^{h-1} \phi^j \varepsilon_{t-j}. \]
Notice that vt follows an MA(h − 1) process even when ε t is serially uncorrelated due to the data
overlap resulting from h > 1.
Two basic strategies exist for generating multi-period forecasts in the context of AR models.
The first approach, known as the ‘iterated’ or ‘indirect’ method, consists of estimating (17.12)
for the data observed and then using the chain rule to generate a forecast at the desired horizon,
$h \ge 1$. Specifically, the iterated forecast of $y_{T+h}$, denoted by $\hat y^*_{T+h|T}$, is given as
\[ \hat y^*_{T+h|T} = \hat a_T\left(\frac{1 - \hat\phi_T^h}{1 - \hat\phi_T}\right) + \hat\phi_T^h\, y_T, \tag{17.15} \]
where $\hat a_T$ and $\hat\phi_T$ are the estimators of $a$ and $\phi$ obtained from the OLS regression (17.12) of $y_t$
on an intercept and $y_{t-1}$, using the observations $y_t$, $t = -h+1, -h+2, \ldots, T$. We have
\[ (T + h - 1)^{-1}\sum_{t=-h+2}^{T} y_t = \hat a_T + \hat\phi_T\, (T + h - 1)^{-1}\sum_{t=-h+2}^{T} y_{t-1}, \]
or, equivalently,
where
\[ \bar y_{h:T} = (T + h - 1)^{-1}\sum_{t=-h+2}^{T} y_t, \quad \bar y_{h:T,-1} = (T + h - 1)^{-1}\sum_{t=-h+2}^{T} y_{t-1}, \]
\[ \hat\phi_T = \frac{\sum_{t=-h+2}^{T} y_t\,(y_{t-1} - \bar y_{h:T,-1})}{\sum_{t=-h+2}^{T} (y_{t-1} - \bar y_{h:T,-1})^2}. \]
Under this approach, the forecasting equation is the same across all forecast horizons; only the
number of iterations changes with $h$. Note also that this method yields the same forecasts as those
obtained by minimizing the MSFE loss function.
An alternative approach, known as the ‘direct’ method, consists of estimating a model for the
variable measured h-periods ahead as a function of current information. Specifically, the direct
forecast of $y_{T+h}$, $\tilde y^*_{T+h|T}$, is given by
\[ \tilde y^*_{T+h|T} = \tilde a_{h,T} + \tilde\phi_{h,T}\, y_T, \tag{17.17} \]
where $\tilde a_{h,T}$ and $\tilde\phi_{h,T}$ are the OLS estimators of $a_h$ and $\phi^h$ obtained directly from (17.14), by
regressing $y_t$ on an intercept and $y_{t-h}$ using the same sample observations, $y_t$,
$t = -h+1, -h+2, \ldots, T$, used in the computation of the iterated forecasts. Notice that
under this approach, the forecasting model and its estimates will typically vary across different
forecast horizons.
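The two strategies can be compared on simulated data with the following Python sketch. It is a simplified illustration: both regressions use all available pairs of observations rather than the exact index ranges given in the text, and the helper names are hypothetical.

```python
import numpy as np

def ols_intercept_slope(x, y):
    """OLS regression of y on an intercept and x; returns (intercept, slope)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(coef[0]), float(coef[1])

def iterated_forecast(y, h):
    """Estimate the one-step AR(1) regression, then iterate it h times as in (17.15)."""
    a_hat, phi_hat = ols_intercept_slope(y[:-1], y[1:])
    return a_hat * (1 - phi_hat ** h) / (1 - phi_hat) + phi_hat ** h * y[-1]

def direct_forecast(y, h):
    """Regress y_t on an intercept and y_{t-h}, then forecast y_{T+h} from y_T."""
    a_h, phi_h = ols_intercept_slope(y[:-h], y[h:])
    return a_h + phi_h * y[-1]

# Simulated AR(1) data (parameter values are arbitrary)
rng = np.random.default_rng(1)
a, phi, T = 1.0, 0.6, 500
y = np.empty(T)
y[0] = a / (1 - phi)
for t in range(1, T):
    y[t] = a + phi * y[t - 1] + rng.normal()

print(iterated_forecast(y, 4), direct_forecast(y, 4))
```

For $h = 1$ the two functions run the same regression and return the same forecast; for $h > 1$ they generally differ in finite samples.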
We next establish conditions under which both the direct and indirect forecasts are uncondi-
tionally unbiased:
Proposition 45 Suppose data are generated by the stationary AR(1) process, (17.12), and define the
h-step ahead forecast errors from the iterated and direct methods,
\[ \hat e_{T+h|T} = y_{T+h} - \hat y^*_{T+h|T}, \]
and
\[ \tilde e_{T+h|T} = y_{T+h} - \tilde y^*_{T+h|T}, \]
where the iterated h-step forecast, $\hat y^*_{T+h|T}$, and the direct forecast, $\tilde y^*_{T+h|T}$, are given by (17.15) and
(17.17), respectively. Assume that $u_t$ and $v_t$, defined in (17.13) and (17.14), are symmetrically
distributed around zero, have finite second-order moments and expectations of $\hat\phi_T$ and $\tilde\phi_{h,T}$ exist.
Then for any finite $T$ and $h$ we have
\[ E(\hat e_{T+h|T}) = E(\tilde e_{T+h|T}) = 0. \]
The proposition generalizes the known result in the literature for h = 1 established, for
example, by Fuller (1996) to multi-step ahead forecasts. For h = 1, Pesaran and Timmermann
(2005b) also show that forecast errors are unconditionally unbiased for symmetrically dis-
tributed error processes even in the presence of breaks in the autocorrelation coefficient, φ, so
long as μ is stable over the estimation sample.
In comparing iterated and direct forecasts, it is worth noting that when $\phi$ is positive and not
too close to unity, for moderately large values of $h$, $\hat\phi_T^h \approx 0$, since $\hat\phi_T < |\phi| < 1$.² It follows
that in such cases, $\hat e_{T+h|T} = (\mu - \hat\mu_T) + v_{T+h} + o(\phi^h)$. Similarly, $\tilde e_{T+h|T} = -\bar v_T + v_{T+h} + o(\phi^h)$.
Hence, for $h$ moderately large and $\phi$ not too close to the unit circle, a measure of the relative
efficiency of the two forecasting methods can be obtained as
since $(\mu - \hat\mu_T)$ and $\bar v_T$ are uncorrelated with $v_{T+h}$. But $E(\mu - \hat\mu_T)^2 = O(T^{-1})$ and does not
depend on $h$. To derive $E(\bar v_T^2)$, recall that $v_t = \sum_{j=0}^{h-1}\phi^j\varepsilon_{t-j}$, and hence after some algebra
and
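\[ E(\bar v_T^2) = Var(\bar v_T) = \frac{\sigma_\varepsilon^2}{T^2}\left\{ \sum_{j=1}^{h-1}\left(\frac{1 - \phi^j}{1 - \phi}\right)^2\left(1 + \phi^{2(h-j)}\right) + (T - h + 1)\left(\frac{1 - \phi^h}{1 - \phi}\right)^2 \right\}. \]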
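The variance expression can be checked numerically by collecting the weight that each innovation receives in $\sum_t v_t$. The Python sketch below assumes that $\bar v_T = T^{-1}\sum_{t=1}^{T} v_t$ with $T \ge h$ (the definition of $\bar v_T$ is not shown above, so this is an assumption), and the function names are hypothetical.

```python
import numpy as np

def var_vbar_formula(phi, h, T, sigma2=1.0):
    """Closed-form Var(v_bar_T) as displayed above."""
    def ratio(k):
        return (1 - phi ** k) / (1 - phi)
    edge = sum(ratio(j) ** 2 * (1 + phi ** (2 * (h - j))) for j in range(1, h))
    return sigma2 / T ** 2 * (edge + (T - h + 1) * ratio(h) ** 2)

def var_vbar_direct(phi, h, T, sigma2=1.0):
    """Var(v_bar_T) from the weights on eps_s, s = 2-h, ..., T, in sum_{t=1}^T v_t."""
    weights = np.zeros(T + h - 1)
    for t in range(1, T + 1):
        for j in range(h):
            s = t - j                          # v_t loads on eps_{t-j} with weight phi**j
            weights[s - (2 - h)] += phi ** j
    return sigma2 * float(weights @ weights) / T ** 2

print(var_vbar_formula(0.7, 4, 20), var_vbar_direct(0.7, 4, 20))
```

The two computations agree, and for $h = 1$ both reduce to $\sigma_\varepsilon^2/T$.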
Clearly, $E(\bar v_T^2) = O(T^{-1})$ if $h$ is fixed. But it is easily seen that we continue to have
$E(\bar v_T^2) = O(T^{-1})$ even if $h \to \infty$ so long as $h/T \to \kappa$, where $\kappa$ is a fixed finite fraction in
the range $[0, 1)$. Therefore,
\[ \frac{E(\hat e^2_{T+h|T})}{E(\tilde e^2_{T+h|T})} = 1 + O(T^{-1}) + o(\phi^h), \]
and for sufficiently large T there will be little to choose between the iterated and the direct
procedures. From the above result, we should expect to find the greatest difference between the
performance of the two forecasting methods in small samples (T) or in situations where h is
large, that is, when h/T is large.
Marcellino, Stock, and Watson (2006) compared the performance of iterated and direct
approaches by applying simulated out-of-sample methods to 170 US macroeconomic time series
spanning 1959–2002. They found that iterated forecasts outperform direct forecasts, particularly
if the models can select long lag specifications. Along similar lines, Pesaran, Pick, and Timmer-
mann (2011) conducted a broad-based comparison of iterated and direct multi-period forecast-
ing approaches applied to both univariate and multivariate models in the form of parsimonious
factor-augmented vector autoregressions. These authors also accounted for the serial correlation
in the residuals of the multi-period direct forecasting models by considering SURE-based esti-
mation methods, and proposed modified Akaike information criteria for model selection. Using
the data set studied by Marcellino, Stock, and Watson (2006), Pesaran, Pick, and Timmermann
(2011) further show that information in factors helps improve forecasting performance for most
types of economic variables, although it can also lead to larger biases. They also show that SURE
estimation and finite-sample modifications to the Akaike information criterion can improve the
performance of the direct multi-period forecasts.
Consider the problem of forecasting at time $T$ the future value of a target variable, $y$, after $h$
periods, whose realization is denoted $y_{T+h}$. Suppose we have an $m$-dimensional vector of alternative
forecasts of $y_{T+h}$, namely $\mathbf{y}^*_{T+h|T} = (y^*_{1,T+h|T}, y^*_{2,T+h|T}, \ldots, y^*_{m,T+h|T})'$, where $y^*_{i,T+h|T}$ is
the $i$th forecast of $y_{T+h}$ formed on the basis of information available at time $T$. Forecast combination
consists of aggregating or pooling the forecasts so that the information in the $m$ components
of $\mathbf{y}^*_{T+h|T}$ is reduced to a single combined or pooled point forecast,
\[ y^C_{T+h|T} = \sum_{i=1}^{m} \alpha_{i,T+h|T}\, y^*_{i,T+h|T}, \]
where $\alpha_{i,T+h|T}$ is the weight attached to the $i$th forecast, $y^*_{i,T+h|T}$. These weights are typically
constrained to be positive and add up to unity, namely $\sum_{i=1}^{m}\alpha_{i,T+h|T} = 1$, with $\alpha_{i,T+h|T} > 0$. The
combined forecast is optimal if the weights $\alpha^*_{T+h|T} = (\alpha^*_{1,T+h|T}, \alpha^*_{2,T+h|T}, \ldots, \alpha^*_{m,T+h|T})'$
are chosen to minimize the expected loss,
assuming the quadratic loss function (17.1). Bates and Granger (1969) have shown that the optimal
weights, $\alpha^*_{T+h|T}$, depend on the (unknown) covariance matrix of all forecast errors, namely
$e_{1,T+h}, e_{2,T+h}, \ldots, e_{m,T+h}$, with $e_{i,T+h} = y_{T+h} - y^*_{i,T+h|T}$. In practice, if the number of forecasts,
$m$, is large, computing the covariance matrix of $e_{1,T+h}, e_{2,T+h}, \ldots, e_{m,T+h}$ is unfeasible.
Even when m is small the estimates of the weights might be unreliable due to short data samples
and/or breaks in the underlying forecasting processes. In practice, other (possibly sub-optimal)
weighting schemes are used. A prominent example is the equal weights average forecast
\[ y^C_{T+h|T} = \frac{1}{m}\sum_{i=1}^{m} y^*_{i,T+h|T}, \]
which often works well. Other combinations that are less sensitive to outliers than the simple
average combinations are the median or the trimmed mean forecasts. Stock and Watson (2004)
have suggested using weights that depend inversely on the historical forecasting performance of
individual models. To evaluate the historical forecasting performance, the authors suggest split-
ting the sample into two sub-samples: the observations prior to date T0 are used for estimating
the individual forecasting models, while T − T0 (with T − T0 ≥ h) observations are used for
evaluation purposes. Hence, the weights are set to
\[ \alpha_{i,T+h|T} = \frac{p^{-1}_{i,T+h|T}}{\sum_{j=1}^{m} p^{-1}_{j,T+h|T}}, \quad \text{with } p_{i,T+h|T} = \sum_{s=T_0}^{T-h} \delta^{T-h-s}\left(y_{s+h} - y^*_{i,s+h|s}\right)^2, \]
where δ is a discount factor. When δ = 1, there is no discounting, while for δ < 1, greater impor-
tance is attributed to the recent forecast performance of the individual models. Other possible
choices of weights involve the use of shrinking methods, which shrink the weights towards a
value imposed a priori.
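The weighting schemes discussed above (equal weights, the median, and weights inversely related to discounted past squared forecast errors) can be sketched in Python as follows. The function names, the array layout, and the example numbers are illustrative assumptions; the inverse-performance weights normalize the reciprocals of the discounted sums of squared errors so that they add up to one.

```python
import numpy as np

def inverse_msfe_weights(errors, delta=1.0):
    """Weights proportional to the inverse of discounted sums of squared errors.

    errors: array of shape (n_periods, m) holding past forecast errors of the
            m models over the evaluation sample, oldest period first.
    delta:  discount factor; delta < 1 emphasises recent performance.
    """
    errors = np.asarray(errors, dtype=float)
    n_periods = errors.shape[0]
    discounts = delta ** np.arange(n_periods - 1, -1, -1)   # latest period gets delta**0
    p = discounts @ errors ** 2                             # one value per model
    inv = 1.0 / p
    return inv / inv.sum()

def combine_forecasts(forecasts, weights=None, method="mean"):
    """Pool m point forecasts into a single combined forecast."""
    forecasts = np.asarray(forecasts, dtype=float)
    if weights is not None:
        return float(np.asarray(weights, dtype=float) @ forecasts)
    if method == "median":
        return float(np.median(forecasts))
    return float(forecasts.mean())                          # equal weights

weights = inverse_msfe_weights([[1.0, 2.0], [1.0, 2.0]])
print(weights, combine_forecasts([1.0, 3.0], weights=weights))
```

In this example the second model's past errors are twice as large, so its sum of squared errors is four times larger and it receives one quarter of the first model's weight.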
See also Section 17.9, and Section C.4 in Appendix C, on a Bayesian approach to forecast
combination.
Measurement uncertainty and future changes in the underlying structure of the economy pose
special problems of their own and will not be addressed in this chapter. We refer to Hendry and
Ericsson (2003) for further discussion on these sources of forecast uncertainty. Model uncer-
tainty concerns the ‘structural’ assumptions underlying a statistical model for the variable of
interest. Further details on the problem of model uncertainty can be found in Draper (1995).
Future uncertainty refers to the effects of unobserved future shocks on forecasts, while parame-
ter uncertainty is concerned with the robustness of forecasts to the choice of parameter values,
assuming a given forecasting model. In the following, we focus on future and parameter uncer-
tainty and consider alternative ways that these types of uncertainty can be taken into account.
The standard textbook approach to taking account of future and parameter uncertainties is
through the construction of forecast intervals. For the purpose of exposition, initially we abstract
from parameter uncertainty and consider the following simple linear regression model
\[ y_t = x_{t-1}'\beta + u_t, \quad t = 1, 2, \ldots, T, \]
where $x_{t-1}$ is a $k \times 1$ vector of predetermined regressors, $\beta$ is a $k \times 1$ vector of fixed but unknown
coefficients, and $u_t \sim N(0, \sigma^2)$. The optimal forecast of $y_{T+1}$ at time $T$ (in the mean squared
error sense) is given by $x_T'\beta$. In the absence of parameter uncertainty, the calculation of a probability
forecast for a specified event is closely related to the more familiar concept of forecast
confidence interval. For example, suppose that we are interested in the probability that the value
of $y_{T+1}$ lies below a specified threshold, say $a$, conditional on $\Omega_T = (y_T, x_T, y_{T-1}, x_{T-1}, \ldots)$,
the information available at time $T$. For given values of $\beta$ and $\sigma^2$, we have
\[ \Pr(y_{T+1} < a \mid \Omega_T) = \Phi\left(\frac{a - x_T'\beta}{\sigma}\right), \]
where $\Phi(\cdot)$ is the standard normal cumulative distribution function, while the $100(1 - \alpha)\%$ forecast
interval for $y_{T+1}$ (conditional on $\Omega_T$) is given by $x_T'\beta \pm \sigma\Phi^{-1}(1 - \alpha/2)$.
The two approaches, although related, are motivated by different considerations. The point
forecast provides the threshold value $a = x_T'\beta$ for which $\Pr(y_{T+1} < a \mid \Omega_T) = 0.5$, while
the forecast interval provides the threshold values $c_L = x_T'\beta - \sigma\Phi^{-1}(1 - \alpha/2)$, and
$c_U = x_T'\beta + \sigma\Phi^{-1}(1 - \alpha/2)$, for which $\Pr(y_{T+1} < c_L \mid \Omega_T) = \alpha/2$, and $\Pr(y_{T+1} < c_U \mid \Omega_T) = 1 - \alpha/2$.
Clearly, the threshold values, $c_L$ and $c_U$, associated with the $100(1 - \alpha)\%$ forecast interval may or
may not be of interest.³ Only by chance will the forecast interval calculations provide information
in a way which is directly useful in specific decision making contexts.
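The link between the probability forecast and the forecast interval can be made concrete with the Python standard library. In the sketch below, `point` stands for $x_T'\beta$; the function names and the numerical values are illustrative assumptions.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()                    # standard normal distribution

def prob_below(a, point, sigma):
    """Pr(y_{T+1} < a | Omega_T) = Phi((a - x_T' beta) / sigma)."""
    return STD_NORMAL.cdf((a - point) / sigma)

def forecast_interval(point, sigma, alpha):
    """Interval x_T' beta -/+ sigma * Phi^{-1}(1 - alpha / 2), returned as (c_L, c_U)."""
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return point - sigma * z, point + sigma * z

c_low, c_up = forecast_interval(point=1.0, sigma=2.0, alpha=0.05)
print(c_low, c_up, prob_below(c_low, 1.0, 2.0), prob_below(c_up, 1.0, 2.0))
```

The interval endpoints are simply the thresholds at which the probability forecast equals $\alpha/2$ and $1 - \alpha/2$; a decision maker interested in some other threshold needs `prob_below` rather than the interval.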
The relationship between probability forecasts and interval forecasts becomes even more
obscure when parameter uncertainty is also taken into account. In the context of the above
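regression model, the point estimate of the forecast is given by $\hat y^*_{T+1|T} = x_T'\hat\beta_T$, where
\[ \hat\beta_T = Q_{T-1}^{-1} q_T, \]
\[ Q_{T-1} = \sum_{t=1}^{T} x_{t-1}x_{t-1}', \quad \text{and} \quad q_T = \sum_{t=1}^{T} x_{t-1}y_t. \]
The relationship between $y_{T+1}$ and its time $T$ predictor can be written as
\[ \begin{aligned} y_{T+1} &= x_T'\beta + u_{T+1} \\ &= x_T'\hat\beta_T + x_T'(\beta - \hat\beta_T) + u_{T+1}, \end{aligned} \tag{17.18} \]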
This example shows that the point forecasts, $x_T'\hat\beta_T$, are subject to two types of uncertainties,
namely that relating to $\beta$ and that relating to the distribution of $u_{T+1}$. For any given sample of
data, $\Omega_T$, $\hat\beta_T$ is known and can be treated as fixed. On the other hand, although $\beta$ is assumed
fixed at the estimation stage, it is unknown to the forecaster and, from this perspective, it is best
viewed as a random variable at the forecasting stage. Hence, in order to compute probability
forecasts which account for future as well as parameter uncertainties, we need to specify the
joint probability distribution of $\beta$ and $u_{T+1}$, conditional on $\Omega_T$. As far as $u_{T+1}$ is concerned, we
continue to assume that
3 The association between probability forecasts and interval forecasts is even weaker when one considers joint events.
For example, it would be impossible to infer the probability of the joint event of a positive output growth and an inflation
rate falling within a pre-specified range from individual, variable-specific forecast intervals. Many different such intervals
will be needed for this purpose.
and to keep the exposition simple, for the time being we shall assume that $\sigma^2$ is known and that
$u_{T+1}$ is distributed independently of $\beta$. For $\beta$, noting that
\[ \hat\beta_T - \beta \mid \Omega_T \sim N\left(0, \sigma^2 Q_{T-1}^{-1}\right), \tag{17.19} \]
we assume that
\[ \beta \mid \Omega_T \sim N\left(\hat\beta_T, \sigma^2 Q_{T-1}^{-1}\right), \tag{17.20} \]
When $\sigma^2$ is unknown, under the standard non-informative Bayesian priors on $(\beta, \sigma^2)$, the appropriate
forecast interval can be obtained by replacing $\sigma^2$ by its unbiased estimate,
\[ \hat\sigma_T^2 = (T - k)^{-1}\sum_{t=1}^{T}\left(y_t - x_{t-1}'\hat\beta_T\right)^2, \]
and $\Phi^{-1}(1 - \alpha/2)$ by the $100(1 - \alpha/2)\%$ critical value of the standard t-distribution with $T - k$ degrees
of freedom. Although such interval forecasts have been discussed in the econometrics literature,
the particular assumptions that underlie them are not fully recognized.
Using this interpretation, the effect of parameter uncertainty on forecasts can also be obtained
via stochastic simulations, by generating alternative forecasts of yT+1 for different values of β
(and σ 2 ) drawn from the conditional probability distribution of β given by (17.20). Alterna-
tively, one could estimate probability forecasts by focusing directly on the probability distribu-
tion of yT+1 for a given value of xT , simultaneously taking into account both parameter and
future uncertainties. For example, in the simple case where $\sigma^2$ is known, this can be achieved
by simulating $\hat y^{*(j)}_{T+1|T}$, $j = 1, 2, \ldots, J$, where
$$\hat y^{*(j)}_{T+1|T} = x_T'\hat\beta^{(j)} + u^{(j)}_{T+1},$$
$\hat\beta^{(j)}$ is the $j$th random draw from $N(\hat\beta_T, \sigma^2 Q_{T-1}^{-1})$, and $u^{(j)}_{T+1}$ is the $j$th random draw from
$N(0, \sigma^2)$, with $\sigma^2$ replaced by its unbiased estimator, $\hat\sigma^2_T$, defined above.
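The simulation scheme just described can be sketched in a few lines. The following is a minimal illustration only, assuming a single regressor ($k = 1$) so that $Q_{T-1}$ reduces to the scalar $\sum_t x_{t-1}^2$; the function names `simulate_predictive_draws` and `event_probability`, the seed, and the artificial data are hypothetical choices made for this example.

```python
import random
import statistics


def simulate_predictive_draws(y, x_lag, x_T, J=5000, seed=0):
    """Simulate J draws of y_{T+1} for the scalar model y_t = beta * x_{t-1} + u_t.

    Assumes one regressor (k = 1), so Q_{T-1} is the scalar sum of x_{t-1}^2.
    """
    rng = random.Random(seed)
    T = len(y)
    q = sum(x * x for x in x_lag)                         # Q_{T-1}
    beta_hat = sum(x * v for x, v in zip(x_lag, y)) / q   # OLS estimate
    resid = [v - beta_hat * x for x, v in zip(x_lag, y)]
    sigma2_hat = sum(e * e for e in resid) / (T - 1)      # (T - k)^{-1} RSS with k = 1
    draws = []
    for _ in range(J):
        # parameter uncertainty: draw beta from N(beta_hat, sigma2_hat / Q)
        beta_j = rng.gauss(beta_hat, (sigma2_hat / q) ** 0.5)
        # future uncertainty: draw the shock from N(0, sigma2_hat)
        u_j = rng.gauss(0.0, sigma2_hat ** 0.5)
        draws.append(x_T * beta_j + u_j)
    return beta_hat, sigma2_hat, draws


def event_probability(draws, threshold):
    """Probability forecast of the event y_{T+1} > threshold."""
    return sum(d > threshold for d in draws) / len(draws)


if __name__ == "__main__":
    rng = random.Random(1)
    x_lag = [rng.gauss(0.0, 1.0) for _ in range(50)]
    y = [0.5 * x + rng.gauss(0.0, 1.0) for x in x_lag]
    _, _, draws = simulate_predictive_draws(y, x_lag, x_T=1.0)
    print(event_probability(draws, 0.0), statistics.pvariance(draws))
```

The variance of the simulated draws should be close to $\hat\sigma_T^2(1 + x_T^2/Q_{T-1})$, which makes the separate contributions of future and parameter uncertainty visible.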
$y_t$ is the decision variable to be chosen by the decision maker, and $\Omega_t$ is the information set containing
at least observations on current and past values of $x_t$. To simplify the analysis, we assume
that the choice of yt does not affect Ft (x), although clearly changes in Ft (x) will influence the
decisions. In general, the cost and the probability distribution functions, C(yt , xt+1 ) and Ft (x),
also depend on a number of parameters characterizing the degree of risk aversion of the deci-
sion maker and his/her (subjective) specification of the future uncertainty characterized by the
curvature of the conditional distribution function of xt+1 .
Suppose now that, at time t, a forecaster provides the decision maker with the predictive distri-
bution F̂t , being an estimate of Ft (x), and we are interested in computing the value of this forecast
to the decision maker. Under the traditional approach, the forecasts F̂t are evaluated using sta-
tistical criteria which are based on the degree of closeness of F̂t to Ft (x) at different realizations
of x. This could involve the first- or higher-order conditional moments of xt+1 , the probability
that xt+1 falls in a particular range, or other event forecasts of interest. However, such evaluation
criteria need not be directly relevant to the decision maker. A more appropriate criterion would
be the loss function that underlies the decision problem. As we shall see, such decision-based
evaluation criteria simplify to the familiar MSFE criterion only in special cases.
Under the decision-based approach, we first need to solve for the decision variable yt based
on the predictive distribution function F̂t . For the above simple decision problem, the optimal
value of yt , which we denote by y∗t , is given by
$$y_t^* = \underset{y_t}{\operatorname{argmin}}\left\{E_{\hat F}\left[C(y_t, x_{t+1})\,|\,\Omega_t\right]\right\}, \tag{17.24}$$
where $E_{\hat F}[C(y_t, x_{t+1})\,|\,\Omega_t]$ is the conditional expectations operator with respect to the predictive
distribution function, $\hat F_t$. A ‘population average’ criterion function for the evaluation of the
probability distribution function, $\hat F_t$, is given by
$$\Psi_C(F_t, \hat F_t) = E_F\left[C(y_t^*, x_{t+1})\,|\,\Omega_t\right], \tag{17.25}$$
where the conditional expectations are taken with respect to $F_t(x)$, the ‘true’ probability distribution
function of $x_{t+1}$ conditional on $\Omega_t$. The above function can also be viewed as the average
cost of making errors when large samples of forecasts and realizations are available, for the same
specifications of cost and predictive distribution function.
To simplify notation, we drop the subscript F when the expectations are taken with respect to
the true distribution functions. We now turn to some decision problems of particular interest.
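where $a > 0$ and $ca - b^2 > 0$, thus ensuring that $C(y_t, x_{t+1})$ is globally convex in $y_t$ and $x_{t+1}$.
Based on the forecasts, $\hat F_t$, the unique optimal decision rule for this problem is given by
$$y_t^* = \left(\frac{-b}{a}\right)E_{\hat F}(x_{t+1}\,|\,\Omega_t) = \left(\frac{-b}{a}\right)\hat x_{t+1|t},$$
where $\hat x_{t+1|t}$ is the one-step forecast of $x$ formed at time $t$ based on the estimate, $\hat F_t$. Substituting
this result in the cost function, after some simple algebra we have
$$C(y_t^*, x_{t+1}) = \left(c - \frac{b^2}{a}\right)x_{t+1}^2 + \frac{b^2}{a}\left(x_{t+1} - \hat x_{t+1|t}\right)^2.$$
Therefore,
$$\Psi_C(F_t, \hat F_t) = \left(c - \frac{b^2}{a}\right)E\left(x_{t+1}^2\,|\,\Omega_t\right) + \frac{b^2}{a}E\left[\left(x_{t+1} - \hat x_{t+1|t}\right)^2|\,\Omega_t\right].$$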
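The algebra behind this decomposition can be checked numerically. The sketch below assumes that the quadratic cost function takes the form $C(y_t, x_{t+1}) = a y_t^2 + 2b y_t x_{t+1} + c x_{t+1}^2$, which is the form implied by the convexity conditions $a > 0$ and $ca - b^2 > 0$; the function names and parameter values are hypothetical.

```python
def lq_cost(y, x, a, b, c):
    """Quadratic cost a*y^2 + 2*b*y*x + c*x^2 (assumed form of the cost function)."""
    return a * y * y + 2.0 * b * y * x + c * x * x


def optimal_decision(x_hat, a, b):
    """Optimal decision y* = (-b/a) * x_hat, given the point forecast x_hat."""
    return -(b / a) * x_hat


def decomposed_cost(x, x_hat, a, b, c):
    """Cost at the optimum written as a forecast-free term plus a squared-error term."""
    return (c - b * b / a) * x * x + (b * b / a) * (x - x_hat) ** 2


if __name__ == "__main__":
    a, b, c = 2.0, 0.5, 1.0          # satisfies a > 0 and c*a - b^2 > 0
    x, x_hat = 1.3, 0.9              # realization and its forecast
    y_star = optimal_decision(x_hat, a, b)
    print(lq_cost(y_star, x, a, b, c), decomposed_cost(x, x_hat, a, b, c))
```

Only the second term depends on the forecast, and it is proportional to the squared forecast error, which is why the decision-based criterion coincides with the MSFE ranking in this univariate case.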
$$\begin{pmatrix} A & B \\ B' & C \end{pmatrix},$$
is positive definite and symmetric. As before, due to the quadratic nature of the cost function,
the optimal decision depends only on the first conditional moment of the assumed conditional
probability distribution function of the state variables and is given by
where $\hat x_{t+1|t}$ is the point forecast of $x_{t+1}$ formed at time $t$, with respect to the conditional probability
distribution function, $\hat F_t$. Substituting this result in (17.27) and taking conditional expectations
with respect to $F_t(x)$, the true conditional probability distribution function of $x_{t+1}$,
we have
$$\Psi_C(F_t, \hat F_t) = E\left[x_{t+1}'(C - H)x_{t+1}\,|\,\Omega_t\right] + E\left[(x_{t+1} - \hat x_{t+1|t})'H(x_{t+1} - \hat x_{t+1|t})\,|\,\Omega_t\right],$$
which, through H, depends on the parameters of the underlying cost function. Only in the
univariate LQ case can the implied evaluation criterion be cast in terms of a purely statistical
criterion function.
The dependence of the evaluation criterion in the multivariate case on the parameters of
the cost (or utility) function of the underlying decision model has direct bearing on the non-
invariance critique of MSFEs to scale-preserving linear transformations discussed by Clements
and Hendry (1993). In multivariate forecasting problems, the choice of the evaluation criterion
is not as clear cut as in the univariate case even if we confine our attention to MSFE type criteria.
One possible procedure, commonly adopted, is to use $E[(x_{t+1} - \hat x_{t+1|t})'(x_{t+1} - \hat x_{t+1|t})\,|\,\Omega_t]$,
or equivalently the trace of the MSFE matrix $E[(x_{t+1} - \hat x_{t+1|t})(x_{t+1} - \hat x_{t+1|t})'\,|\,\Omega_t]$. Alternatively,
the determinant of the MSFE matrix has also been suggested. In the context of the LQ
decision problem, both of these purely statistical criteria are inappropriate. The trace MSFE criterion
is justified only when $H$ is proportional to an identity matrix of order $m + k$.
4 Edison and Cho (1993) consider a utility-based procedure for comparisons of exchange rate volatility models. Skouras
(1998) discusses asset allocation decisions and forecasts of a ‘risk neutral’ investor.
costs. At the end of the period (the start of period t + 1) the speculator’s net worth will be
given by
Wt+1 = yt ρ t+1 ,
where ρ t+1 is the rate of return on the security. The speculator chooses yt in order to maximize
the expected value of the negative exponential utility function
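$$U(y_t, \rho_{t+1}) = -\exp\left(-\lambda y_t \rho_{t+1}\right), \quad \lambda > 0, \tag{17.29}$$
and
$$\frac{\partial E_{\hat F}\left[U(y_t, \rho_{t+1})\,|\,\Omega_t\right]}{\partial y_t} = -\left(-\lambda\hat\rho_{t+1|t} + \lambda^2 y_t\hat\sigma^2_{t+1|t}\right)\exp\left(-\lambda y_t\hat\rho_{t+1|t} + \tfrac{1}{2}\lambda^2 y_t^2\hat\sigma^2_{t+1|t}\right).$$
Setting this derivative equal to zero, we now have the following familiar result for the speculator’s
optimal decision
$$y_t^* = \frac{\hat\rho_{t+1|t}}{\lambda\hat\sigma^2_{t+1|t}}. \tag{17.31}$$
Hence
$$U(y_t^*, \rho_{t+1}) = -\exp\left(-\frac{\rho_{t+1}\hat\rho_{t+1|t}}{\hat\sigma^2_{t+1|t}}\right), \tag{17.32}$$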
5 We assume that $y_t$ is small relative to the size of the market and the choice of $y_t$ does not influence the returns
distribution.
6 In general, where the conditional distribution of returns is not normal we have
$$E_{\hat F}\left[U(y_t, \rho_{t+1})\,|\,\Omega_t\right] = -M_{\hat F}(-\lambda y_t),$$
where $M_{\hat F}(\theta)$ is the moment generating function of the assumed conditional distribution of returns. In this more general
case, the optimal solution is $y_t^*$ that solves $\partial M_{\hat F}(-\lambda y_t)/\partial y_t = 0$.
where expectations are taken with respect to the true distribution of returns, $F_t(\rho)$.7 This result
has three notable features. The decision-based forecast evaluation measure does not depend on
the risk-aversion coefficient, $\lambda$. It has little bearing on the familiar purely statistical forecast
evaluation criteria such as the MSFEs of the mean return, given by $E_F[(\rho_{t+1} - \hat\rho_{t+1|t})^2\,|\,\Omega_t]$.
Finally, even under Gaussian assumptions the evaluation criterion involves return predictions,
$\hat\rho_{t+1|t}$, as well as volatility predictions, $\hat\sigma_{t+1|t}$.
It is also interesting to note that under the assumption that (17.30) is based on a correctly
specified model we have
$$\Psi_U(F_t, \hat F_t) = -\exp\left[-\frac{1}{2}\left(\frac{\hat\rho_{t+1|t}}{\hat\sigma_{t+1|t}}\right)^2\right],$$
where $\hat\rho_{t+1|t}/\hat\sigma_{t+1|t}$ is a single-period Sharpe ratio routinely used in the finance literature for
the economic evaluation of risky portfolios.
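The chain from the optimal position (17.31) to the Sharpe-ratio expression can be verified with a short numerical sketch. It assumes Gaussian returns, so that the expected utility is $-\exp(-\lambda y_t\hat\rho_{t+1|t} + \tfrac12\lambda^2 y_t^2\hat\sigma^2_{t+1|t})$, the standard normal moment generating function result; the function names and numerical values are hypothetical.

```python
import math


def expected_utility(y, rho_hat, sigma2_hat, lam):
    """E[-exp(-lam * y * rho)] when rho ~ N(rho_hat, sigma2_hat) (Gaussian assumption)."""
    return -math.exp(-lam * y * rho_hat + 0.5 * lam ** 2 * y ** 2 * sigma2_hat)


def optimal_position(rho_hat, sigma2_hat, lam):
    """Optimal decision y* of (17.31)."""
    return rho_hat / (lam * sigma2_hat)


def realized_utility(rho, rho_hat, sigma2_hat):
    """Utility (17.32) realized at y* once the return rho is observed."""
    return -math.exp(-rho * rho_hat / sigma2_hat)


if __name__ == "__main__":
    rho_hat, sigma2_hat, lam = 0.02, 0.04, 3.0
    y_star = optimal_position(rho_hat, sigma2_hat, lam)
    sharpe = rho_hat / math.sqrt(sigma2_hat)
    # Under correct specification the expected utility depends only on the Sharpe ratio.
    print(expected_utility(y_star, rho_hat, sigma2_hat, lam), -math.exp(-0.5 * sharpe ** 2))
```

Changing `lam` alters the position size `y_star` but leaves the expected utility at the optimum unchanged, in line with the first of the three features noted above.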
The average loss associated with the error in forecasting can now be computed as
$$\bar\Psi_U = -h^{-1}\sum_{t=T}^{T+h-1}\exp\left(-\frac{\rho_{t+1}\hat\rho_{t+1|t}}{\hat\sigma^2_{t+1|t}}\right),$$
Hence, under (17.34) the two forecast models are equally accurate on average, according to a
given loss function. If the null hypothesis is rejected, one would choose the model yielding the
lower loss. Diebold and Mariano (1995) have proposed a test that is based on the loss-differential
$$d_t = L\left(y_{t+h}, y^{*1}_{t+h|t}\right) - L\left(y_{t+h}, y^{*2}_{t+h|t}\right).$$
The null of equal predictive accuracy is then H0 : E (dt ) = 0. Given a series of T forecast errors,
the Diebold and Mariano (1995) test statistic is
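$$DM = \frac{T^{1/2}\bar d}{\left[\widehat{Var}(\bar d)\right]^{1/2}}, \tag{17.36}$$
where
$$\bar d = \frac{1}{T}\sum_{t=1}^{T} d_t, \tag{17.37}$$
and $\widehat{Var}(\bar d)$ is an estimator of
$$Var(\bar d) = \sum_{j=-\infty}^{\infty}\gamma_j, \quad \text{with } \gamma_j = Cov(d_t, d_{t-j}). \tag{17.38}$$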
Expression (17.38) is used for the variance of $\bar d$ because the sample of loss differentials, $d_t$, is serially
correlated for $h > 1$. Under the null of equal predictive ability, and under a set of regularity
conditions, it is possible to show that as $T \to \infty$, $DM \overset{a}{\sim} N(0, 1)$. Notice that this result holds
for a wide class of loss functions (see McCracken and West (2004)). A number of modifications
and extensions of the above test have been suggested in the literature. West (1996) has extended
the DM test to deal with the case in which forecasts and forecast errors depend on estimated
regression parameters. Harvey, Leybourne, and Newbold (1997) have proposed two modifications
of the DM test. Since the DM test could be seriously oversized for moderate numbers of
sample observations,8 the authors suggest the use of the following modified statistic
$$MDM = T^{-1/2}\left[T + 1 - 2h + T^{-1}h(h-1)\right]^{1/2}DM.$$
A further modification of the DM test proposed by Harvey, Leybourne, and Newbold (1997) is
to compare the statistic with critical values from the t-distribution with T−1 degrees of freedom,
8 See also the Monte Carlo study reported in Diebold and Mariano (1995).
rather than the standard normal. Monte Carlo experiments provided by these authors show sub-
stantially better size properties for the MDM test when compared to the DM test in moderate
samples (see also Harvey, Leybourne, and Newbold (1998)).
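The DM and MDM statistics can be computed directly from a series of loss differentials. The sketch below is illustrative only: it assumes that the long-run variance in (17.38) is estimated with a rectangular window that includes the first $h-1$ sample autocovariances, which is one common choice but not the only one, and the function name `dm_test` is hypothetical.

```python
import math


def dm_test(d, h=1):
    """Return (DM, MDM) for a loss-differential series d and forecast horizon h."""
    T = len(d)
    d_bar = sum(d) / T
    dev = [v - d_bar for v in d]

    def gamma(j):
        # sample autocovariance of d_t at lag j
        return sum(dev[t] * dev[t - j] for t in range(j, T)) / T

    # Long-run variance: gamma_0 + 2 * sum of the first h-1 autocovariances (assumed window).
    lrv = gamma(0) + 2.0 * sum(gamma(j) for j in range(1, h))
    if lrv <= 0.0:
        raise ValueError("non-positive long-run variance estimate")
    dm = math.sqrt(T) * d_bar / math.sqrt(lrv)
    # Small-sample correction of Harvey, Leybourne, and Newbold (1997).
    mdm = math.sqrt((T + 1 - 2 * h + h * (h - 1) / T) / T) * dm
    return dm, mdm


if __name__ == "__main__":
    e1 = [0.5, -1.2, 0.3, 0.8, -0.4, 1.1, -0.9, 0.2]   # errors of forecast 1
    e2 = [0.7, -1.0, 0.9, 1.2, -0.1, 1.5, -1.3, 0.6]   # errors of forecast 2
    d = [a * a - b * b for a, b in zip(e1, e2)]        # squared-error loss differential
    print(dm_test(d, h=1))
```

DM is compared with standard normal critical values, while MDM is compared with critical values of the t-distribution with $T-1$ degrees of freedom.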
The statistic (17.36) is based on unconditional expectations of forecasts and forecast errors,
and therefore can be seen as a test of unconditional out-of-sample predictive ability. More
recently, Giacomini and White (2006) (GW) have focused on a test for the null hypothesis of
equal conditional predictive ability, namely
$$H_0 : E\left[L\left(y_{t+h}, \hat y^{*1}_{t+h|t}\right)|\,\Omega_t\right] - E\left[L\left(y_{t+h}, \hat y^{*2}_{t+h|t}\right)|\,\Omega_t\right] = 0. \tag{17.39}$$
Notice that, in the above expression, expectations are conditional on the information set $\Omega_t$ available
at time $t$, and the losses depend on the parameter estimates at time $t$. One important advantage
of the GW test is that it captures the effect of estimation uncertainty together with model
uncertainty, and can be used to study forecasts produced by general estimation methods. These
advantages come at the cost of having to specify a test function, which helps to predict the loss
from a forecast.
In Table 17.1, the proportion of Ups that were correctly forecast to occur, and the propor-
tion of Downs that were incorrectly forecast are known as the ‘hit rate’ and the ‘false alarm rate’
respectively. These can be computed as
$$HI = \frac{N_{uu}}{N_{uu} + N_{du}}, \quad F = \frac{N_{ud}}{N_{ud} + N_{dd}}. \tag{17.40}$$
One important evaluation criterion for directional forecasts is the Kuipers score (KS), originally
developed for evaluation of weather forecasts. This is defined by
KS = HI − F. (17.41)
For more details, see Murphy and Dann (1985) and Wilks (1995).
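As a small illustration, the hit rate, false alarm rate, and Kuipers score can be computed from the four cell counts of the contingency table. The sketch assumes that the first subscript of $N_{ij}$ refers to the forecast and the second to the realization, which is the reading implied by the definitions in (17.40); the function name and the counts are hypothetical.

```python
def directional_scores(n_uu, n_ud, n_du, n_dd):
    """Return (hit rate, false alarm rate, Kuipers score).

    n_ij = number of cases with forecast i and realization j (assumed convention),
    where u = Up and d = Down.
    """
    hit_rate = n_uu / (n_uu + n_du)        # share of realized Ups forecast as Up
    false_alarm = n_ud / (n_ud + n_dd)     # share of realized Downs forecast as Up
    return hit_rate, false_alarm, hit_rate - false_alarm


if __name__ == "__main__":
    hi, f, ks = directional_scores(n_uu=40, n_ud=10, n_du=20, n_dd=30)
    print(hi, f, ks)
```

A forecast that always predicts Up has both rates equal to one and hence a Kuipers score of zero, which is why the score is not rewarded by trivially correct calls.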
The Henriksson and Merton (1981) (HM) market-timing statistic is based on the conditional
probabilities of making correct forecasts. Merton (1981) postulates the following conditional
probabilities of taking correct actions
where $\rho_{t+1}$ is the (excess) return on a given security, and $\hat\rho_{t+1|t}$ is its forecast (see Section 17.10.2
for further details). Assuming that $p_1(t)$ and $p_2(t)$ do not depend on the size of the excess returns,
|ρ t+1 |, Merton (1981) shows that p1 (t) + p2 (t) is a sufficient statistic for the evaluation of the
forecasting ability. Together with Henriksson, he then develops a nonparametric statistic for test-
ing the hypothesis
H0 : p1 (t) + p2 (t) = 1,
or, equivalently,
H0 : p1 (t) = 1 − p2 (t),
that a market-timing forecast ($\hat\rho_{t+1|t} \geq 0$ or $\hat\rho_{t+1|t} < 0$) has no economic value against the
alternative
$$H_1 : p_1(t) + p_2(t) > 1,$$
that it has positive economic value. As HM point out, their test is essentially a test of the independence
between the forecasts and whether the excess return on the market portfolio is positive.
In terms of the notation in the above contingency table, the sample estimate of the HM statistic,
p1 (t) − (1 − p2 (t)), is exactly equal to the Kuipers score given by (17.41). The hit rate, HI, is
the sample estimate of p1 (t) and the false alarm rate, F, is the sample estimate of 1 − p2 (t).
$$PT = \frac{\hat P - \hat P_*}{\left[\hat V(\hat P) - \hat V(\hat P_*)\right]^{1/2}}, \tag{17.42}$$
where P̂ is the proportion of Ups that are correctly predicted, P̂∗ is the estimate of the prob-
ability of correctly predicting the events assuming predictions and realizations are indepen-
dently distributed, and V̂(P̂) and V̂(P̂∗ ) are consistent estimates of the variances of P̂ and P̂∗ ,
respectively. More specifically, suppose we are interested in testing whether one binary variable,
xt = I(Xt ) is related to another binary variable, yt = I(Yt ) using a sample of observations
$(y_1, x_1), (y_2, x_2), \ldots, (y_T, x_T)$. Let $I(A)$ be an indicator function that takes the value of unity if
$A > 0$ and zero otherwise. Now we have
$$\hat P = T^{-1}\sum_{t=1}^{T} I(Y_t X_t), \quad \hat P_* = \bar y\bar x + (1 - \bar y)(1 - \bar x), \tag{17.43}$$
$$\hat V(\hat P) = T^{-1}\hat P_*(1 - \hat P_*), \tag{17.44}$$
$$\hat V(\hat P_*) = T^{-1}(2\bar y - 1)^2\bar x(1 - \bar x) + T^{-1}(2\bar x - 1)^2\bar y(1 - \bar y). \tag{17.45}$$
where N = N uu + Nud + Ndu + Ndd is the total number of forecasts (provided in Table 17.1),
π̂ a = N −1 (N uu + Ndu ) is the estimate of the probability that the realizations are Up, and
π̂ f = (N uu + Nud ) /N is the estimate of the probability that outcomes are forecast to be Up.
The above results also establish the asymptotic equivalence of the HM and PT statistics.
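The PT statistic is straightforward to compute from two binary series. The following sketch implements (17.42) to (17.45) as displayed above, with realized and forecast directions coded as 0/1 indicators so that $I(Y_t X_t) = 1$ whenever the two indicators agree; the function name `pt_statistic` and the data are hypothetical.

```python
import math


def pt_statistic(y, x):
    """PT statistic for 0/1 realizations y and 0/1 forecasts x of equal length."""
    T = len(y)
    y_bar = sum(y) / T
    x_bar = sum(x) / T
    if y_bar in (0.0, 1.0) or x_bar in (0.0, 1.0):
        raise ValueError("PT is undefined when y_bar or x_bar equals zero or unity")
    p_hat = sum(1 for a, b in zip(y, x) if a == b) / T          # share of correct calls
    p_star = y_bar * x_bar + (1.0 - y_bar) * (1.0 - x_bar)      # expected share under independence
    v_p = p_star * (1.0 - p_star) / T                           # (17.44)
    v_star = ((2.0 * y_bar - 1.0) ** 2 * x_bar * (1.0 - x_bar)
              + (2.0 * x_bar - 1.0) ** 2 * y_bar * (1.0 - y_bar)) / T   # (17.45)
    return (p_hat - p_star) / math.sqrt(v_p - v_star)


if __name__ == "__main__":
    y = [1] * 10 + [0] * 10
    x = [1] * 8 + [0] * 2 + [0] * 8 + [1] * 2    # 16 of 20 directions called correctly
    print(pt_statistic(y, x))
```

Under the null of independence the statistic is compared with standard normal critical values, usually one-sided since only positive association indicates forecasting skill.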
yt = α + βxt + ut , (17.46)
where E (ut |xt , xt−1 , . . . ) = 0. We deal with the case where ut could be serially correlated
and/or heteroskedastic below.
The t-ratio of the OLS estimator of β in the above regression is given by
$$t_\beta = \frac{r\sqrt{T-2}}{\sqrt{1 - r^2}}, \tag{17.47}$$
9 The PT statistic is undefined when ȳ or x̄ take the extreme values of zero or unity.
where r is the simple correlation coefficient between yt and xt . To establish the relationship
between tβ and the PT statistic, note that
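and hence
$$\hat P = T^{-1}\sum_{t=1}^{T} I(Y_t X_t) = 2T^{-1}\sum_{t=1}^{T} y_t x_t - \bar y - \bar x + 1.$$
Ignoring the second term which is of order $T^{-2}$, and noting that $x_t^2 = x_t$ and $y_t^2 = y_t$, we have
$$S_{xx} = T^{-1}\sum_{t=1}^{T}(x_t - \bar x)^2 = \bar x(1 - \bar x), \quad S_{xy} = S_{yx} = T^{-1}\sum_{t=1}^{T}(x_t - \bar x)(y_t - \bar y),$$
$$S_{yy} = T^{-1}\sum_{t=1}^{T}(y_t - \bar y)^2 = \bar y(1 - \bar y).$$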
This in turn establishes that the student-t test of β = 0 in (17.47), and the PT test defined by
(17.42), will be asymptotically equivalent. The two test statistics are also likely to be numerically
very close in most applications.
with negative GDP growth is usually not viewed as a recession, nor is the emergence of a short
period with negative stock returns sufficient to constitute a bear market—or may reflect the serial
dependence properties of the underlying data generating process. For example, the presence of
regimes whose dynamics are determined by a Markov process as in Hamilton (1989) might give
rise to persistence in output growth. Serial correlation in such variables is likely to generate serial
dependence in the qualitative outcomes and could cause distortions in the size of the PT test,
typically in the form of over-rejection of the null hypothesis.
In the context of the regression based test (17.47), the serial dependence in outcomes under
the null hypothesis translates into serial dependence in the errors, ut . Due to the discrete nature
of the yt = I(Yt ) series, the pattern of serial dependence in yt could differ from that of Yt and
additionally yt could be conditionally heteroskedastic even if Yt is not and vice versa.
In testing β = 0 in (17.46), serial dependence in the errors, ut , can be dealt with either para-
metrically or by using Bartlett weights recommended by Newey and West (1987) in the con-
struction of the test statistic. Consider the t-ratio
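$$\tilde t_\beta = \frac{\hat\beta}{\sqrt{\hat V_{NW}(\hat\beta)}}, \tag{17.49}$$
where $\hat\beta$ is the OLS estimator of $\beta$, and $\hat V_{NW}(\hat\beta)$ is the (2, 2) element of the Newey and West
variance estimator (see Section 5.9)
$$\hat V_{NW}(\hat\phi) = \frac{1}{(T-2)\,\bar z^2(1 - \bar z)^2}\begin{pmatrix}\bar z & -\bar z \\ -\bar z & 1\end{pmatrix}\hat F_h\begin{pmatrix}\bar z & -\bar z \\ -\bar z & 1\end{pmatrix}, \tag{17.50}$$
$$\hat F_h = \hat\Gamma_0 + \sum_{j=1}^{h}\left(1 - \frac{j}{h+1}\right)\left(\hat\Gamma_j + \hat\Gamma_j'\right),$$
and
$$\hat\Gamma_j = T^{-1}\sum_{t=j+1}^{T}\hat u_t\hat u_{t-j}\begin{pmatrix}1 & z_{t-j} \\ z_t & z_t z_{t-j}\end{pmatrix}.$$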
Clearly, other estimates of the variance of β̂, based on different estimates of the spectral density
of ût = yt − α̂ − β̂zt at zero frequency, could be used.
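The robust t-ratio (17.49) can be sketched using only 2 × 2 matrix arithmetic, since the regression has an intercept and a single binary regressor. The code below follows (17.50) with Bartlett weights; the function name `nw_t_ratio`, the return convention, and the data are hypothetical, and the window size `h` is left to the user.

```python
import math


def nw_t_ratio(y, z, h):
    """Return (robust t-ratio, beta_hat) for the OLS regression of y on (1, z), z binary."""
    T = len(y)
    z_bar = sum(z) / T
    y_bar = sum(y) / T
    s_zy = sum((a - z_bar) * (b - y_bar) for a, b in zip(z, y)) / T
    beta = s_zy / (z_bar * (1.0 - z_bar))        # S_zz = z_bar * (1 - z_bar) for binary z
    alpha = y_bar - beta * z_bar
    u = [b - alpha - beta * a for a, b in zip(z, y)]   # OLS residuals

    def gamma(j):
        # T^{-1} * sum_t u_t u_{t-j} (1, z_t)' (1, z_{t-j})
        g = [[0.0, 0.0], [0.0, 0.0]]
        for t in range(j, T):
            w = u[t] * u[t - j] / T
            g[0][0] += w
            g[0][1] += w * z[t - j]
            g[1][0] += w * z[t]
            g[1][1] += w * z[t] * z[t - j]
        return g

    F = gamma(0)
    for j in range(1, h + 1):
        g = gamma(j)
        wgt = 1.0 - j / (h + 1.0)                # Bartlett weight
        for r in range(2):
            for c in range(2):
                F[r][c] += wgt * (g[r][c] + g[c][r])
    m2 = (-z_bar, 1.0)                           # second row of the matrix in (17.50)
    quad = sum(m2[r] * F[r][c] * m2[c] for r in range(2) for c in range(2))
    var_beta = quad / ((T - 2) * z_bar ** 2 * (1.0 - z_bar) ** 2)
    return beta / math.sqrt(var_beta), beta


if __name__ == "__main__":
    y = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
    z = [1, 0, 1, 0, 0, 0, 1, 1, 1, 1]
    print(nw_t_ratio(y, z, h=2))
```

Setting `h = 0` gives a heteroskedasticity-robust t-ratio without any correction for serial correlation, which is a convenient check on the implementation.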
when interest lies in testing whether one sequence of discrete random variables (‘outcomes’, {yt })
is predicted by another sequence of discrete random variables (‘forecasts’, {xt }). For example,
prediction of the direction of change in the variable under consideration may have multiple cat-
egories such as ‘down’, ‘unchanged’ and ‘up’.10
Suppose a time series of T observations on some explanatory or predictive variable, x, is
arranged into mx categories (states) while observations on the dependent or realized variable,
y, are categorized into my groups. Without loss of generality we assume that mx ≤ my and that
these are finite numbers that remain fixed as T → ∞. Denote the x-categories by xjt so that
xjt = 1 if the jth category occurs at time t and zero otherwise. Similarly, denote the realized out-
comes by yit so yit = 1 if category i occurs at time t and zero otherwise. Convert the categorical
observations into quantitative measures by assigning the (time-invariant) weights ai to yit for
i = 1, 2, . . . , my and bj to xjt for j = 1, 2, . . . , mx and t = 1, 2, . . . , T as follows
$$y_t = \sum_{i=1}^{m_y} a_i y_{it}, \quad \text{and} \quad x_t = \sum_{j=1}^{m_x} b_j x_{jt}.$$
Since the outcome categories are mutually exclusive, the regression of yt on an intercept and xt
can be written as
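$$a_{m_y} + \sum_{i=1}^{m_y-1}\left(a_i - a_{m_y}\right)y_{it} = \alpha + \beta b_{m_x} + \beta\left[\sum_{j=1}^{m_x-1}\left(b_j - b_{m_x}\right)x_{jt}\right] + u_t,$$
or
$$\theta' y_t = c + \gamma' x_t + u_t, \tag{17.51}$$
where $y_t = (y_{1t}, y_{2t}, \ldots, y_{m_y-1,t})'$, $x_t = (x_{1t}, x_{2t}, \ldots, x_{m_x-1,t})'$, $c = \alpha + \beta b_{m_x} - a_{m_y}$ and
$$\theta = \begin{pmatrix} a_1 - a_{m_y} \\ a_2 - a_{m_y} \\ \vdots \\ a_{m_y-1} - a_{m_y} \end{pmatrix}, \quad \gamma = \begin{pmatrix} \beta\left(b_1 - b_{m_x}\right) \\ \beta\left(b_2 - b_{m_x}\right) \\ \vdots \\ \beta\left(b_{m_x-1} - b_{m_x}\right) \end{pmatrix}.$$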
Suppose first that ut is serially uncorrelated. A test of predictability can now be carried out by
testing H0 : γ = 0 in (17.51), conditional on a given value of the ‘nuisance’ parameters, θ . (See
Section 17.12.4 for the special case where my = mx = 2). For a given value of θ , a standard
F-statistic can be employed to test independence of yt and xt
$$F(\theta) = \left(\frac{T - m_x}{m_x - 1}\right)\frac{\theta' S_{yx}S_{xx}^{-1}S_{xy}\theta}{\theta'\left(S_{yy} - S_{yx}S_{xx}^{-1}S_{xy}\right)\theta}, \tag{17.52}$$
10 Another example arises in the analysis of contagion where positive as well as negative discrete jumps in market returns
or spreads could be of interest (see, e.g., Favero and Giavazzi (2002) and Pesaran and Pick (2007)).
where
subject to the normalizing restriction that θ Syy θ =1. This idea has been used in the statis-
tical literature in cases where certain parameters of the statistical model disappear under the
null hypothesis (e.g., see Davies (1977)). However, we note that in this specific application of
Davies’s main idea the nuisance parameters, θ, do not disappear under the null. Using (17.52),
the first-order condition for optimization of F(θ) is given by
$$S_{yx}S_{xx}^{-1}S_{xy}\hat\theta = \hat\rho^2 S_{yy}\hat\theta, \tag{17.53}$$
where
$$\hat\rho^2 = \frac{\frac{m_x-1}{T-m_x}F(\hat\theta)}{1 + \frac{m_x-1}{T-m_x}F(\hat\theta)}. \tag{17.54}$$
The value of θ that maximizes F(θ ) is therefore given by the eigenvector associated with the
maximum eigenvalue of
$$S = S_{yy}^{-1}S_{yx}S_{xx}^{-1}S_{xy}. \tag{17.55}$$
$$F_{\max} = \frac{(T - m_x)\hat\rho_1^2}{(m_x - 1)\left(1 - \hat\rho_1^2\right)}, \tag{17.56}$$
which is a generalization of (17.48) and reduces to $t_\beta^2$ in the case of $m_x = 2$. Note that
$\hat\rho_i^2$, for $i = 1, 2, \ldots, m_x - 1$, are the squared canonical correlation coefficients between the
indicators, $x_t$, and the realizations, $y_t$ (see Section 19.6 for a definition). There are $m_x - 1$ such
canonical correlations, given by the square roots of the ordered non-zero solutions of the determinantal
equation (recall that $m_x \leq m_y$)
$$\left|S_{yx}S_{xx}^{-1}S_{xy} - \rho^2 S_{yy}\right| = 0.$$
These are the same as the $m_x - 1$ non-zero eigenvalues of the matrix $S$ defined by (17.55). The
estimator of $\theta$, denoted by $\hat\theta_1$, is given by the eigenvector associated with $\hat\rho_1^2$, which satisfies
$$\left(S_{yx}S_{xx}^{-1}S_{xy} - \hat\rho_1^2 S_{yy}\right)\hat\theta_1 = 0. \tag{17.57}$$
Since ρ̂ 21 < 1 and Fmax is a monotonic function of ρ̂ 21 , a test of γ = 0 in (17.51) is thus reduced
to testing the statistical significance of the largest canonical correlation between yt and xt . The
exact joint probability distribution of the canonical correlations, $1 > \hat\rho_1^2 > \hat\rho_2^2 > \ldots > \hat\rho_{m_x-1}^2$,
is provided in Anderson (2003) (see pp. 543–545) for the case where the distribution of $y_t$ conditional
on $x_t$ is Gaussian. In the present application where the elements of $y_t$ (conditional on
$x_t$) can be viewed as independent draws from a multinomial distribution, the exact distribution
of the canonical correlations will be less tractable but can be simulated. Critical values of the test
statistic (17.56) are provided in Pesaran and Timmermann (2009).
The null of independence between x and y implies not only that ρ 1 = 0 but that ρ 1 = ρ 2 =
. . . = ρ mx −1 = 0. An alternative to the maximum canonical correlation test is therefore to base
a test of γ = 0 on an average concept of F(θ ) given by
$$\bar F = \frac{(T - m_x)}{m_x - 1}\sum_{i=1}^{m_x-1}\frac{\hat\rho_i^2}{1 - \hat\rho_i^2} \approx \frac{(T - m_x)}{m_x - 1}\,Tr(S).$$
This test can also be derived in the context of the reduced rank regression
$$y_t = a + \Pi x_t + \varepsilon_t, \tag{17.58}$$
where in our application the null hypothesis of interest is $\Pi = 0$, or $\operatorname{rank}(\Pi) = 0$.11 Under
some regularity conditions and assuming that under the null hypothesis the outcomes or $\varepsilon_t$ are
serially independent, an asymptotic test of $\Pi = 0$ is given by
$$(T - m_x)\sum_{i=1}^{m_x-1}\hat\rho_i^2 \overset{a}{\sim} \chi^2_{(m_x-1)^2}.$$
$\sum_{i=1}^{m_x-1}\hat\rho_i^2$ can also be computed by $Tr(S)$ and for this reason is often called the trace test.
It is possible to show that the trace test based on the reduced rank regression (17.58) is iden-
tical to the Fisher chi-square test of independence for data arranged in a contingency table. The
values or ‘labels’ assigned to the categories for the X and Y variables may have a specific meaning
in some applications but are often arbitrary—think of the convention of labelling recessions as
unity and expansions as zeros. It would be unfortunate if such labels had an effect on the outcome
of the proposed tests. However, Pesaran and Timmermann (2009) showed that both maximum
canonical correlation and trace canonical correlation tests are invariant to the values taken by the
my categories of Y and the values taken by the mx categories of X.
11 For an account of the reduced rank regression technique, see Section 19.7 and references cited therein.
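The link between the trace statistic and the familiar chi-square test of independence can be illustrated numerically. The sketch below is restricted to the case $m_x = m_y = 3$, so that all moment matrices are 2 × 2 after dropping the last category, and it builds $S$ of (17.55) directly from a contingency table of counts; the helper names (`inv2`, `mul2`, `canonical_summary`, `pearson_chi2`) and the counts are hypothetical. It relies on the standard result that $T \cdot Tr(S)$ equals Pearson's chi-square statistic, so that the $(T - m_x)$ scaling used above differs from it only by a degrees-of-freedom factor.

```python
import math


def inv2(m):
    """Inverse of a 2 x 2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def mul2(a, b):
    """Product of two 2 x 2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def canonical_summary(table):
    """Return (T, Tr(S), largest squared canonical correlation) for a 3 x 3 table.

    table[i][j] = number of periods with outcome category i and forecast category j.
    """
    T = sum(map(sum, table))
    p = [[n / T for n in row] for row in table]
    py = [sum(row) for row in p]
    px = [sum(p[i][j] for i in range(3)) for j in range(3)]
    s_yy = [[(py[i] if i == k else 0.0) - py[i] * py[k] for k in range(2)] for i in range(2)]
    s_xx = [[(px[j] if j == l else 0.0) - px[j] * px[l] for l in range(2)] for j in range(2)]
    s_yx = [[p[i][j] - py[i] * px[j] for j in range(2)] for i in range(2)]
    s_xy = [[s_yx[i][j] for i in range(2)] for j in range(2)]
    S = mul2(mul2(inv2(s_yy), s_yx), mul2(inv2(s_xx), s_xy))
    tr = S[0][0] + S[1][1]
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    rho1_sq = 0.5 * (tr + math.sqrt(max(tr * tr - 4.0 * det, 0.0)))   # largest eigenvalue of S
    return T, tr, rho1_sq


def pearson_chi2(table):
    """Pearson chi-square statistic of independence for a 3 x 3 table."""
    T = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [sum(table[i][j] for i in range(3)) for j in range(3)]
    return sum((table[i][j] - rows[i] * cols[j] / T) ** 2 / (rows[i] * cols[j] / T)
               for i in range(3) for j in range(3))


if __name__ == "__main__":
    table = [[20, 5, 5], [6, 18, 6], [4, 7, 19]]
    T, tr, rho1_sq = canonical_summary(table)
    m_x = 3
    f_max = (T - m_x) * rho1_sq / ((m_x - 1) * (1.0 - rho1_sq))   # (17.56)
    print(f_max, (T - m_x) * tr, T * tr, pearson_chi2(table))
```

Relabelling the categories, for example by reordering rows or columns of `table`, leaves both statistics unchanged, in line with the invariance property noted above.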
where ε t are serially independent. For this error specification, using (17.51) we have
$$\theta' y_t = c(1 - \varphi) + \gamma' x_t - \varphi\gamma' x_{t-1} + \varphi\theta' y_{t-1} + \varepsilon_t.$$
As before, a consistent test of γ = 0 can now be carried out using the maximum or the average
of the canonical correlation coefficients of Y and X after filtering both sets of variables for the
effects of yt−1 and xt−1 . More specifically, we compute the eigenvalues of
$$S_w = S_{yy,w}^{-1}S_{yx,w}S_{xx,w}^{-1}S_{xy,w},$$
where
$X_{-1}$ and $Y_{-1}$ are $T \times (m_x - 1)$ and $T \times (m_y - 1)$ observation matrices on $x_{t-1}$ and $y_{t-1}$,
respectively. It is now easy to show that the trace test based on $S_w$ is the same as testing $\Pi = 0$
in the dynamically augmented reduced rank regression
$$Y = X\Pi + WB + E,$$
An alternative to dynamically augmenting the reduced rank regression is to adjust the moment
matrices used in calculating the variance matrix of γ̂ to account for heteroskedasticity and auto-
correlation in the errors in (17.51). The F-statistic corresponding to (17.52) in this case is
given by
$$F(\theta) = \left(\frac{T - m_x}{m_x - 1}\right)\frac{\theta' S_{yx}H_{xx}^{-1}(\theta)S_{xy}\theta}{\theta'\left(S_{yy} - S_{yx}H_{xx}^{-1}(\theta)S_{xy}\right)\theta},$$
where
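$$H_{xx}(\theta) = \lim_{T\to\infty}\frac{1}{T}\sum_{s=1}^{T}\sum_{t=1}^{T}E\left[(x_t - \bar x)(x_s - \bar x)'u_t(\theta)u_s(\theta)\right],$$
$\bar x = (\bar x_1, \bar x_2, \ldots, \bar x_{m_x-1})'$, $\bar y = (\bar y_1, \bar y_2, \ldots, \bar y_{m_y-1})'$, and under $\gamma = 0$, we have $u_t(\theta) = \theta'(y_t - \bar y)$. Hence
$$H_{xx}(\theta) = \lim_{T\to\infty}\frac{1}{T}\sum_{s=1}^{T}\sum_{t=1}^{T}E\left[\theta'(y_t - \bar y)(x_t - \bar x)(x_s - \bar x)'(y_s - \bar y)'\theta\right],$$
can be viewed as the long-run variance of $T^{-1/2}\sum_{t=1}^{T}d_t(\theta)$, where $d_t(\theta) = \theta'(y_t - \bar y)(x_t - \bar x)$.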
Since elements of xt and yt are bounded, Hxx (θ ) exists under general assumptions concerning
the serial dependence and heteroskedasticity of the error terms, as set out in Newey and West
(1987).
Unlike the serially independent case, the first-order conditions for maximization of LM(θ )
cannot be reduced to solving an eigenvalue problem. An asymptotically equivalent alternative
(under γ = 0) is to use a first-stage consistent estimate of Hxx (θ ) that abstracts from the serial
dependence of the errors. Such an estimator of θ is given by (17.57), and the first-stage estimate
of Hxx (θ) can be obtained by (using a Bartlett window)
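$$\hat H_{xx,h}(\hat\theta_1) = \hat\Gamma_0 + \sum_{j=1}^{h}\left(1 - \frac{j}{h+1}\right)\left(\hat\Gamma_j + \hat\Gamma_j'\right),$$
$$\hat\Gamma_j = T^{-1}\sum_{t=j+1}^{T}d_t(\hat\theta_1)\,d_{t-j}'(\hat\theta_1),$$
$$d_t(\hat\theta_1) = \hat\theta_1'(y_t - \bar y)(x_t - \bar x).$$
Using this estimator, one can solve the following eigenvalue problem
$$\left(S_{yx}\hat H_{xx}^{-1}(\hat\theta_1)S_{xy} - \tilde\rho_1^2 S_{yy}\right)\tilde\theta_1 = 0,$$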
Under the null that γ = 0, and conditional on the initial estimator of θ , θ̂ 1 , the trace test is now
given by
$$(T - m_x)\,Tr\left[\tilde S(\hat\theta_1)\right] \overset{a}{\sim} \chi^2_{(m_x-1)^2}, \tag{17.60}$$
where
$$\tilde S(\hat\theta_1) = S_{yy}^{-1}S_{yx}\hat H_{xx}^{-1}(\hat\theta_1)S_{xy}.$$
The estimate of θ used for the estimation of Hxx (θ ) can be iterated upon as required until con-
vergence is achieved, subject to the normalization restriction, θ Syy θ = 1.
See Pesaran and Timmermann (2009) for further details and small sample results from Monte
Carlo experiments.
Hence, the probability integral transform $U_t$ is the cumulative distribution function corresponding to
the density, evaluated at $x_{t+1}$. Under the null hypothesis that $\hat f_t(x_{t+1}|\Omega_t)$ and $f(x_{t+1}|\Omega_t)$ coincide,
the $U_t$ are independent uniform $U(0, 1)$ random variables. Deviations from uniform IID
will indicate that the forecasts have failed to capture some aspect of the underlying data generating
process. Non-uniformity may indicate improper distributional assumptions, while the
presence of serial correlation in the series $U_t$ may indicate that the dynamics are not adequately
captured by the forecast model. Hence, the statistical adequacy of the predictive distributions, $\hat F_t$,
can be assessed by testing whether $\{U_t, t = T + 1, \ldots, T + h\}$ forms an IID $U(0, 1)$ sequence
(see Diebold, Gunther, and Tay (1998)). This test can be carried out using a number of familiar
statistical procedures. The uniformity of the distribution of $U_t$ over $t$ can be evaluated by
adopting graphical methods, for example by visually comparing the estimated density (by simple
histograms) to $U(0, 1)$. Formal tests of goodness of fit can also be employed, such as the
Kolmogorov test, which measures the maximum absolute difference between the empirical distribution
function and the uniform distribution. The IID property again can be visually evaluated,
by examining the correlogram of the series $U_t - \bar U$. It is important to note that the IID
uniform property is a necessary but not a sufficient condition for the optimality of the underlying
predictive distribution. For example, expanding the information set $\Omega_t$ will lead to a different
predictive distribution, but $\hat f_t$ continues to satisfy the IID uniform property.
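These diagnostics are easy to compute. The sketch below assumes Gaussian predictive densities purely for illustration and computes the probability integral transforms, the Kolmogorov distance from $U(0,1)$, and the first-order autocorrelation of $U_t - \bar U$; the function names and simulated data are hypothetical, and critical values for the Kolmogorov test are not supplied.

```python
import math
import random


def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2), used here as the predictive distribution function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def kolmogorov_distance(u):
    """Maximum absolute difference between the empirical CDF of u and the U(0,1) CDF."""
    s = sorted(u)
    n = len(s)
    return max(max((i + 1) / n - v, v - i / n) for i, v in enumerate(s))


def lag1_autocorr(u):
    """First-order sample autocorrelation of the demeaned series."""
    n = len(u)
    m = sum(u) / n
    dev = [v - m for v in u]
    return sum(dev[t] * dev[t - 1] for t in range(1, n)) / sum(e * e for e in dev)


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.gauss(0.0, 1.0) for _ in range(2000)]        # realizations from N(0, 1)
    u_good = [normal_cdf(v, 0.0, 1.0) for v in x]         # correctly specified density
    u_bad = [normal_cdf(v, 0.0, 2.0) for v in x]          # predictive variance too large
    print(kolmogorov_distance(u_good), kolmogorov_distance(u_bad))
    print(lag1_autocorr(u_good))
```

With the overdispersed predictive density the transforms cluster around one half, so the Kolmogorov distance is markedly larger than under the correctly specified density.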
In practice, there will be discrepancies between f (xt+1 |t ) and f̂ (xt+1 |t ) that could be more
serious for some ranges of the state variable, x, and for some decisions as compared with other
ranges or decisions. For example, in risk management the extreme values of x (viewed as portfolio
$$\bar\Psi_h(\hat F_t) = \frac{1}{h}\sum_{t=T}^{T+h-1}C\left(y_t^*(\hat F_t), x_{t+1}\right), \tag{17.62}$$
may be used, where the dependence of the decision variable, $y_t^*$, on the choice of the probability
forecast distribution, $\hat F_t$, is made explicit here. Under certain regularity conditions on the
distribution of the state variable and the underlying cost function, $C(\cdot, \cdot)$, it is reasonable to
expect that
$$\underset{h\to\infty}{\operatorname{Plim}}\;\bar\Psi_h(\hat F_t) = \lim_{h\to\infty}\frac{1}{h}\sum_{t=T}^{T+h-1}\Psi_C(F_t, \hat F_t),$$
namely that $\bar\Psi_h(\hat F_t)$ is an asymptotically consistent estimate of $E_F[\Psi_C(F_t, \hat F_t)]$. The average
$\bar\Psi_h$ provides an estimate of the realized cost to the decision maker of using $\hat F_t$ over the period
$t = T, T + 1, \ldots, T + h - 1$. Clearly, the decision maker will be interested in forecasts that
minimize $\bar\Psi_h$.
In general, when one does not expect F̂t to coincide with Ft , a sensible approach to forecast
evaluation is to focus on comparisons between two competing forecasts. For example, suppose
in addition to F̂t the alternative predictive distribution function, F̃t , is also available. A baseline
alternative could be the unconditional probability distribution function of the state variable, or
some other simple conditional model. Then the average economic loss arising from using the
predictive distribution F̂t as compared with F̃t is given by
$$L_h(\hat F_t, \tilde F_t) = \frac{1}{h}\sum_{t=T}^{T+h-1}\left[C\left(y_t^*(\hat F_t), x_{t+1}\right) - C\left(y_t^*(\tilde F_t), x_{t+1}\right)\right],$$
which can also be written as the simple average of the loss-differential series,
As noted above, (17.63) has been considered in Diebold and Mariano (1995) in the special case
where C(y∗t (F̂t ), xt+1 ) = ϕ(xt+1 − x̂t+1|t ), C(y∗t (F̃t ), xt+1 ) = ϕ(xt+1 − x̃t+1|t ), and ϕ(·) is a
loss function. But in a decision-based framework it is C(y∗t , xt+1 ) which determines the appro-
priate cost-of-error function and is equally applicable to the evaluation of point and probability
forecasts.
In the case of the illustrative examples of Section 17.10, we have the following expressions for
the loss-differential series
(b) For the finance example in Section 17.10.2, using (17.32), the loss-differential becomes
$$d_{t+1} = \exp\left(-\frac{\rho_{t+1}\hat\rho_{t+1|t}}{\hat\sigma^2_{t+1|t}}\right) - \exp\left(-\frac{\rho_{t+1}\tilde\rho_{t+1|t}}{\tilde\sigma^2_{t+1|t}}\right).$$
17.16 Exercises
1. Compute the 1- and 2- step ahead forecasts for the model
yt = μ + φyt−1 + ε t ,
assuming μ = 0 and φ = 0.4, and when μ and φ are estimated using the first ten observations
from the following data:
Time yt
1 101.1
2 103.4
3 103.7
4 104.6
5 104.6
6 105.7
7 107.5
8 108.1
9 108.6
10 108.9
11 109.4
12 109.6
12 For a fixed choice of $H$, specified independently of the parameters of the underlying cost function, $C(y_t, x_{t+1})$, the loss
differential series $d_t$ is not invariant under nonsingular linear transformations of the state variables, $x_{t+1}$, a point emphasized
by Clements and Hendry (1993) and alluded to above. But $d_{t+1}$ is invariant to the nonsingular linear transformations of
the state variables once it is recognized that such linear transformations will also alter the weighting matrix $H$, in line with
the transformation of the state variables, thus leaving the loss differential series, $d_{t+1}$, unaltered.
where εt is an IID white noise with mean zero and variance one. Use both the iterated and
direct methods described in Section 17.7. Compute the corresponding forecast errors under
both methods.
2. Compute the h-step ahead optimal forecast (assuming quadratic loss function) for the model
3. Consider the following linear exponential (LINEX) cost function in the forecast errors, et =
yt − y∗t|t−1 , where y∗t|t−1 is the forecast of yt formed with the information available at time
t − 1, which we denote by t−1
exp(αet ) − αet − 1
L(et ) = .
α2
(a) Plot L(et ) as a function of et for α = 0 and for α = 1/2. What are the main differences
between these two plots? How do you interpret parameter α?
(b) What is the value of $y^*_{t|t-1}$ which minimizes the expected loss, assuming that $y_t$ conditional on $\Omega_{t-1}$ is Gaussian with mean $\mu_{t|t-1}$ and variance $h_{t|t-1}$?
(c) How does the solution obtained under (b) differ from the rational expectations of $y_t$ obtained conditional on $\Omega_{t-1}$?
(d) Discuss the relevance of the above analysis for tests of the rational expectations hypoth-
esis using survey data where individual forecasters are asked about their expectations of
yt in the future.
yt = φyt−1 + ut ,
where ut ∼ IID(0, σ 2 ).
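(a) Derive iterated and direct forecasts of $y_{t+2}$ conditional on $y_t$, and show that they can be estimated as
\[ \text{Iterated: } \hat{y}^{(it)}_{t+2|t} = \hat{\phi}^2 y_t, \]
\[ \text{Direct: } \hat{y}^{(d)}_{t+2|t} = \hat{\phi}_2 y_t, \]
where $\hat{\phi}$ and $\hat{\phi}_2$ are OLS coefficients in the regressions of $y_t$ on $y_{t-1}$ and $y_{t-2}$, respectively, using the $M$ observations $y_{T-M+1}, y_{T-M+2}, \ldots, y_T$.
(b) Show that
\[ E\left(y_{T+2} - \hat{y}^{(it)}_{t+2|t}\right)^2 = E\left(\phi^2 - \hat{\phi}^2\right)^2 + (1 + \phi^2)\sigma^2, \]
\[ E\left(y_{T+2} - \hat{y}^{(d)}_{t+2|t}\right)^2 = E\left(\phi^2 - \hat{\phi}_2\right)^2 + (1 + \phi^2)\sigma^2. \]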
\[ \lim_{M\to\infty} E(d_{T+2}) = 0, \]
where $d_{T+2}$ is the loss differential of the two forecasting methods defined by
\[ d_{T+2} = \left(y_{T+2} - \hat{y}^{(it)}_{t+2|t}\right)^2 - \left(y_{T+2} - \hat{y}^{(d)}_{t+2|t}\right)^2. \]
(d) Suppose now that there exist iterated and direct forecasts, $\hat{y}^{(it)}_{j,t+2|t}$ and $\hat{y}^{(d)}_{j,t+2|t}$, across N
cross-section units j = 1, 2, . . . , N. Develop statistical tests of the two approaches as N
and M tend to infinity. Specify your assumptions carefully with justification.
5. Using inflation data over the sample period 1979Q2-2007Q4 from the GVAR data, com-
pute one quarter ahead forecasts of output growth for US and UK over the period 2008Q1-
2012Q4.
(a) Initially use univariate techniques applied to US and UK output growths separately.
The GVAR 2013 data vintage can be downloaded from <https://sites.google.com/site/
gvarmodelling/data>.
(b) Then use bivariate VAR models in US and UK output growth jointly to generate alterna-
tive forecasts.
(c) Compare the two sets of forecasts and discuss their relative merit.
18.1 Introduction
Volatility as a measure of uncertainty has been used extensively in the theoretical and empirical
literature. But ‘volatility’ as such is not directly observable and like many other economic
concepts, such as expectations, demand and supply, it is usually treated as a latent variable and
measured indirectly using a number of different proxies. Initially, volatility was measured by stan-
dard deviations of price changes computed over time, typically using a rolling window. But it was
realized that such an historical measure tends to underestimate sudden changes in volatility and
is only suitable when the underlying volatility is relatively stable.
To allow for time variations in volatility, Engle (1982) developed the autoregressive condi-
tional heteroskedastic (ARCH) model that relates the (unobserved) volatility to squares of past
innovations in price (or output) changes. Such a model-based approach only partly overcomes
the deficiency of the historical measure and continues to respond very slowly when volatility
undergoes rapid changes, as has been the case during the recent financial crisis (see, e.g., Hansen,
Huang, and Shek (2012)). The use of ARCH, or its various generalizations (GARCH), in
macro-econometric modelling is further complicated by the temporal aggregation of daily GARCH
models for use in quarterly models.
In the finance literature, the focus of volatility measurement has shifted to market-based
implied volatility obtained from option prices, and realized measures based on summation of
intra-period higher-frequency squared returns. The use of implied volatility in macro-econo-
metric modelling is limited both by availability of option price data and the fact that we still
need to aggregate daily implied volatilities to a quarterly measure. By contrast, the idea of real-
ized volatility can be easily adapted for use in macro-econometric models by summing squares
of daily returns within a given quarter to construct a quarterly measure of market volatility.
The approach can be extended to include intra-daily return observations when available, but
this could contaminate the quarterly realized volatility measures with measurement errors of
intra-daily returns due to market micro-structure and jumps in intra-daily returns. In addition,
intra-daily returns are not available for all markets, and when available tend to cover a relatively
short period.
\[ RV_t^2 = \frac{1}{D_t}\sum_{\tau=1}^{D_t}\left(r_t(\tau) - \bar{r}_t\right)^2, \]
where $r_t(\tau) = \Delta\ln P_t(\tau)$ and $\bar{r}_t = D_t^{-1}\sum_{\tau=1}^{D_t} r_t(\tau)$ is the average intra-daily price change
over the day $t$. In practice, $\bar{r}_t$ is very small and is set to zero. The key issue is the appropriate
number of price changes to be included in the computation of RV measures and the implications
of large price changes (jumps) for the measurement of volatility. For further details see Andersen
et al. (2003) and Barndorff-Nielsen and Shephard (2002).
The same idea can be applied to measuring quarterly RV based on daily price changes. In this
case, rt (τ ) would refer to price changes in day τ of quarter t, and Dt will be the number of trading
days within the quarter t. For most quarters we have Dt = 3 × 22 = 66, which is larger than
the number of data points typically used in the construction of daily realized market volatility in
finance.1 Similar realized quarterly volatility measures can also be computed for real asset prices
with Pt (τ ) in the above expressions replaced by Pt (τ )/CPIt , where CPIt is the general price level
for quarter t.
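The quarterly measure described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `realized_volatility` and the `demean` switch are hypothetical, returns are taken to be log price changes, and the within-period mean is set to zero by default, as the text suggests is done in practice.

```python
import math


def realized_volatility(prices, demean=False):
    """Realized volatility from the prices observed within one period.

    Computes RV = sqrt((1/D) * sum_tau (r(tau) - rbar)^2), where r(tau)
    are log price changes and D is the number of such changes.
    With demean=False the mean return rbar is set to zero.
    """
    returns = [math.log(p1 / p0) for p0, p1 in zip(prices[:-1], prices[1:])]
    d = len(returns)
    if d == 0:
        raise ValueError("need at least two prices")
    rbar = sum(returns) / d if demean else 0.0
    rv2 = sum((r - rbar) ** 2 for r in returns) / d
    return math.sqrt(rv2)
```

For a quarterly measure, `prices` would hold the daily closing prices of the quarter (roughly 66 of them), possibly deflated by the price level if a real measure is wanted.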
where $r_t$ is the variable under consideration (such as asset returns, inflation rate or output
growth), and $\Omega_t$ is the information set at time $t$. Volatility can arise due to a number of factors:
over-reaction to news, incomplete learning, parameter variations, and abrupt switches in policy
regimes. Econometric analysis of volatility usually focuses on daily, weekly or monthly obser-
vations. This chapter provides the technical details of the econometric methods that underlie
models of asset return volatility.
zt = rt − r̄.
1 In the case of intra-day observations, prices are usually sampled at 10-minute intervals which yield around 48 intra-
daily returns in an 8-hour-long trading day.
Under the RiskMetrics approach, the historical volatility of zt conditional on observations avail-
able at time t − 1 is computed using the exponentially weighted moving average
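\[ h_t^2 = (1-\lambda)\sum_{\tau=0}^{\infty}\lambda^{\tau} z^2_{t-\tau-1}, \qquad (18.1) \]
where $\lambda$ is known as the decay factor (or $1-\lambda$ the decay rate). Note that the weights
\[ w_\tau = (1-\lambda)\lambda^{\tau}, \quad \tau = 0, 1, 2, \ldots, \]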
which is a restricted version of the GARCH(1, 1) model introduced below (see equation (18.5)),
with parameters satisfying α 1 + φ 1 = 1. Model (18.1) requires the initialization of the process.
For a finite observation window, denoted by H, a more appropriate specification is
\[ h^2_{H,t} = \sum_{\tau=0}^{H} w_{H\tau}\, z^2_{t-1-\tau}, \]
where
\[ w_{H\tau} = \frac{(1-\lambda)\lambda^{\tau}}{1-\lambda^{H+1}}, \quad \tau = 0, 1, \ldots, H. \qquad (18.2) \]
Once again, these weights add up to unity. Other weighting schemes have also been considered.
In particular, the equal weighted specification is
\[ h_t^2 = \frac{1}{H+1}\sum_{\tau=0}^{H} z^2_{t-1-\tau}, \]
where $w_{H\tau} = 1/(1+H)$, for all $\tau$, which is a simple moving average specification.
The value chosen for the decay factor, λ, and the size of the observation window, H, are
related. For example, for λ = 0.9, even if a relatively large value is chosen for H, due to the
exponentially declining weights attached to past observations only around 110 observations are
effectively used in the computation of h2t .
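The finite-window weights in (18.2) and the resulting volatility measure can be illustrated as follows. This is a minimal sketch: the function names are hypothetical, and the list `z` is assumed to hold the demeaned observations up to and including time t − 1, oldest first.

```python
def ewma_weights(lam, H):
    """Weights w_{H,tau} of (18.2) for tau = 0, 1, ..., H."""
    if not 0.0 < lam < 1.0:
        raise ValueError("decay factor must lie strictly between 0 and 1")
    norm = 1.0 - lam ** (H + 1)
    return [(1.0 - lam) * lam ** tau / norm for tau in range(H + 1)]


def ewma_variance(z, lam, H):
    """h^2_{H,t}: weighted sum of the H+1 most recent squared observations.

    z[-1] is taken to be z_{t-1}, so weight tau is applied to z_{t-1-tau}.
    """
    if len(z) < H + 1:
        raise ValueError("need at least H+1 observations")
    weights = ewma_weights(lam, H)
    recent_first = z[::-1][: H + 1]
    return sum(w * x ** 2 for w, x in zip(weights, recent_first))
```

With λ = 0.9 the infinite-window weights on the first 110 lags already sum to 1 − 0.9^110, which is above 0.9999, in line with the remark that only around 110 observations are effectively used.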
\[ r_t = \beta' x_{t-1} + \varepsilon_t. \]
Under the classical normal assumptions (A1 to A5) set out in Chapter 2, the disturbances εt ,
in the above regression model have a constant variance both unconditionally and conditionally.
However, in many applications in macroeconomics and finance, the assumption that the con-
ditional variance of ε t is constant over time is not valid. One possible model capturing such
variations over time is the autoregressive conditional heteroskedasticity (ARCH) model, where
volatility depends on the variability of past observations. In financial econometrics, ARCH is a
fundamental tool for analyzing the time-variation of conditional variance. The ARCH model was
introduced into the econometric literature by Engle (1982), and was subsequently generalized
by Bollerslev (1986), who proposed the generalized ARCH (or GARCH) model. Other related
models where the conditional variance of εt is used as one of the regressors explaining the con-
ditional mean of rt have also been suggested in the literature, and are known as ARCH-in-mean
and GARCH-in-mean (or GARCH-M) models.
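It is clear that conditional on the information set, $\Omega_{t-1}$, the variance of $\varepsilon_t$ is time varying. But this
need not hold unconditionally. To see this first note that the unconditional variance of $\varepsilon_t$, which
we denote by $V(\varepsilon_t)$, can be decomposed as
\[ V(\varepsilon_t) = E\left[V(\varepsilon_t|\Omega_{t-1})\right] + V\left[E(\varepsilon_t|\Omega_{t-1})\right], \]
\[ \sigma_t^2 = \alpha_0 + \alpha_1\sigma^2_{t-1}. \]
Therefore,
\[ \sigma_t^2 = \alpha_0\left(1 + \alpha_1 + \alpha_1^2 + \ldots + \alpha_1^{t+M-1}\right) + \alpha_1^{t+M}\sigma^2_{-M}, \]
and provided $|\alpha_1| < 1$, in the limit as $M \to \infty$ we have (for any finite choice of $\sigma^2_{-M}$)
\[ V(\varepsilon_t) = \sigma^2 = E\left(h_t^2\right) = \frac{\alpha_0}{1-\alpha_1} > 0. \]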
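The limit above can be checked numerically by iterating the variance recursion from an arbitrary starting value. The sketch below is illustrative only; the function name and the parameter values used in the usage note are assumptions, not taken from the text.

```python
def arch1_variance_path(alpha0, alpha1, sigma2_start, steps):
    """Iterate sigma2_t = alpha0 + alpha1 * sigma2_{t-1} for a number of steps."""
    path = [sigma2_start]
    for _ in range(steps):
        path.append(alpha0 + alpha1 * path[-1])
    return path
```

For example, with alpha0 = 0.2 and alpha1 = 0.5 the path settles at 0.2 / (1 − 0.5) = 0.4 whatever finite starting value is used, because the weight on the starting value, alpha1 raised to the number of steps, vanishes.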
This process is unconditionally stationary if α 1 + φ 1 < 1. Note that
or
σ 2t = α 0 + (α 1 + φ 1 )σ 2t−1 .
The unconditional variance exists and is fixed if α 1 + φ 1 < 1. The case where α 1 + φ 1 = 1
is known as the Integrated GARCH(1, 1), or IGARCH(1, 1), for short. The RiskMetrics expo-
nentially weighted formulation of h2t for large H is a special case of the IGARCH(1, 1) model
where α 0 is set equal to 0. RiskMetrics formulation avoids the variance non-existence problem
by focusing on H fixed.
A further generalization of the GARCH model is the asymmetric GARCH(1, 1), where
\[ h_t^2 = \alpha_0 + \alpha_1^{+} d_{t-1}\varepsilon^2_{t-1} + \alpha_1^{-}(1 - d_{t-1})\varepsilon^2_{t-1} + \phi_1 h^2_{t-1}, \quad \alpha_0 > 0, \]
with $d_t = I(\varepsilon_t)$.
where
and $\Omega_{t-1}$ is the information set at time $t-1$, containing at least observations on lagged values
of $y_t$ and $x_t$; namely $\Omega_{t-1} = (x_{t-1}, x_{t-2}, \ldots, y_{t-1}, y_{t-2}, \ldots)$. The unconditional variance of $\varepsilon_t$
is determined by
\[ \sigma_t^2 = \alpha_0 + \sum_{i=1}^{q}\alpha_i\sigma^2_{t-i} + \sum_{i=1}^{p}\phi_i\sigma^2_{t-i}, \]
and yields a stationary outcome if all the roots of $1 = \sum_{i=1}^{q}\alpha_i\lambda^i + \sum_{i=1}^{p}\phi_i\lambda^i$ lie outside the unit circle.
\[ V(\varepsilon_t) = \sigma^2 = \frac{\alpha_0}{1 - \sum_{i=1}^{q}\alpha_i - \sum_{i=1}^{p}\phi_i} > 0. \qquad (18.8) \]
\[ \sum_{i=1}^{q}\alpha_i + \sum_{i=1}^{p}\phi_i < 1. \qquad (18.9) \]
In addition to the restrictions (18.8) and (18.9), Bollerslev (1986) also assumes that $\alpha_i \geq 0$,
$i = 1, 2, \ldots, q$, and $\phi_i \geq 0$, $i = 1, 2, \ldots, p$. Although these additional restrictions are sufficient
for the conditional variance to be positive, they are not necessary (see Nelson and Cao (1992)).
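\[ \log h_t^2 = \alpha_0 + \sum_{i=1}^{q}\alpha_i\left(\frac{\varepsilon_{t-i}}{h_{t-i}}\right) + \sum_{i=1}^{q}\alpha_i^{*}\left(\left|\frac{\varepsilon_{t-i}}{h_{t-i}}\right| - \mu\right) + \sum_{i=1}^{p}\phi_i\log h^2_{t-i}, \qquad (18.10) \]
where $\mu = E\left|\varepsilon_t/h_t\right|$. The value of $\mu$ depends on the density function assumed for the standardized
disturbances, $\tilde{\varepsilon}_t = \varepsilon_t/h_t$. We have
\[ \mu = \sqrt{\frac{2}{\pi}}, \quad \text{if } \tilde{\varepsilon}_t \sim N(0, 1), \]
and
\[ \mu = \frac{2(v-2)^{1/2}}{(v-1)\,B\left(\frac{v}{2}, \frac{1}{2}\right)}, \]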
where ht is given by
\[ h_t = \alpha_0 + \sum_{i=1}^{q}\alpha_i\left|\tilde{\varepsilon}_{t-i}\right| + \sum_{i=1}^{p}\phi_i h_{t-i} + \delta' w_t. \qquad (18.12) \]
The AGARCH model can also be estimated for different error distributions. The log-likelihood
functions for the cases where $\tilde{\varepsilon}_t = \varepsilon_t/h_t$ has a standard normal distribution and where it
has a standardized Student t-distribution are given by (18.18) and (18.19), respectively,
with $\varepsilon_t$ and $h_t$ now specified by (18.11) and (18.12).
regression of yt on xt are computed. In the second step, ε̂2t is regressed on a constant and q of its
own lagged values
A test of the ARCH(q) effect can now be carried out by testing the statistical significance of the
slope coefficients
α 1 = α 2 = · · · = α q = 0,
\[ H_0: \alpha_1 = 0, \qquad (18.13) \]
against
\[ H_1: \alpha_1 \neq 0. \]
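\[ H_0: \tilde{\alpha}_1 = \tilde{\alpha}_2 = \ldots = \tilde{\alpha}_q = 0, \]
against
\[ H_1: \tilde{\alpha}_1 \neq 0, \ \tilde{\alpha}_2 \neq 0, \ \ldots, \ \tilde{\alpha}_q \neq 0, \]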
which can be achieved by using the LM test proposed above. Note the LM test cannot distinguish
between ARCH or GARCH processes.
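The second step of the procedure described above, regressing the squared residuals on a constant and q of their own lags, can be sketched as follows. The statistic returned is the number of usable observations times the R-squared of that auxiliary regression, which is the usual form of Engle's LM statistic; that choice, and the function name, are assumptions made here for illustration. Under the null of no ARCH effects the statistic is compared with a chi-squared distribution with q degrees of freedom.

```python
import numpy as np


def arch_lm_statistic(resid, q):
    """LM-type statistic for ARCH(q) effects from first-step residuals.

    Regresses e_t^2 on a constant and e_{t-1}^2, ..., e_{t-q}^2 and
    returns ((n - q) * R^2, estimated coefficients).
    """
    e2 = np.asarray(resid, dtype=float) ** 2
    n = e2.size
    if q < 1 or n <= 2 * q + 1:
        raise ValueError("too few observations for the chosen q")
    y = e2[q:]
    # Column i holds the i-th lag of the squared residuals.
    lags = [e2[q - i : n - i] for i in range(1, q + 1)]
    X = np.column_stack([np.ones(n - q)] + lags)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    ss_res = np.sum((y - X @ coef) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return (n - q) * r2, coef
```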
Several points need to be kept in mind when using GARCH models:
1. GARCH models are not closed under cross-sectional aggregation. This means that if every
individual process follows GARCH(1, 1), there is no guarantee that the average of those
processes is also GARCH(1, 1).
2. GARCH models need restrictions on their coefficients to make sure that the variance is
positive.
3. In order to price options, one needs to know how volatile the price is. To achieve this, one
has to match the GARCH model to some diffusion process. However, GARCH models do
not fit diffusion processes.
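with
\[ \begin{pmatrix} \varepsilon_t \\ \xi_t \end{pmatrix} \sim N\left[\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma^2_{\varepsilon} & 0 \\ 0 & \sigma^2_{\xi} \end{pmatrix}\right]. \]
Here $\alpha_t$ is an AR(1) process and the persistence of the shock $\xi_t$ to it is measured by $\psi$. If we square
(18.14) then take logs, we obtain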
The advantage of this model is that we do not need restrictions on parameters since log z2t is well
defined and zt cannot take negative values. On the other hand, this model is computationally
demanding as it is nonlinear and non-Gaussian. Even if we assume εt is normally distributed,
ε 2t is a χ 21 and its logarithm involves nonlinearities. Prediction also becomes nonlinear and the
prediction formula is difficult to derive.
\[ y_t = \beta_0 + \beta_1 x_{1,t-1} + \gamma h_t^2 + \varepsilon_t, \]
\[ V(\varepsilon_t|\Omega_{t-1}) = h_t^2 = \alpha_0 + \alpha_1\varepsilon^2_{t-1} + \cdots + \alpha_p\varepsilon^2_{t-p}. \]
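\[ y_t = \alpha + \varphi_t y_{t-1} + u_t, \]
\[ \varphi_t = \varphi + \xi_t, \]
yields
\[ y_t = \alpha + \varphi y_{t-1} + \varepsilon_t, \]
\[ \varepsilon_t = u_t + \xi_t y_{t-1}, \]
and
\[ E(\varepsilon_t|\Omega_{t-1}) = 0, \]
\[ V(\varepsilon_t|\Omega_{t-1}) = \sigma_u^2 + \sigma_\xi^2 y^2_{t-1}, \]
assuming that $u_t$ and $\xi_t$ are distributed independently and also that they are serially uncorrelated.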
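The conditional variance result follows in one step from the composite error. The short derivation below is a sketch that assumes, in addition to what is stated, that $u_t$ and $\xi_t$ have zero means and variances $\sigma_u^2$ and $\sigma_\xi^2$ and are independent of $\Omega_{t-1}$, with $y_{t-1}$ contained in $\Omega_{t-1}$.

```latex
\begin{aligned}
V(\varepsilon_t \mid \Omega_{t-1})
  &= E\left[(u_t + \xi_t y_{t-1})^2 \mid \Omega_{t-1}\right] \\
  &= E(u_t^2) + 2\,y_{t-1}\,E(u_t \xi_t) + y_{t-1}^2\,E(\xi_t^2) \\
  &= \sigma_u^2 + \sigma_\xi^2\, y_{t-1}^2 .
\end{aligned}
```

The cross term drops out because $u_t$ and $\xi_t$ are independent with zero means, so random variation in the autoregressive coefficient alone is enough to generate ARCH-type effects.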
\[ h_t^2 = \alpha_0 + \sum_{i=1}^{q}\alpha_i\varepsilon^2_{t-i} + \sum_{i=1}^{p}\phi_i h^2_{t-i} + \delta' w_{t-1}, \qquad (18.16) \]
\[ \sigma^2 = \frac{\alpha_0 + \delta'\mu_w}{1 - \sum_{i=1}^{q}\alpha_i - \sum_{i=1}^{p}\phi_i} > 0, \qquad (18.17) \]
where $\mu_w = E(w_t)$.
The ML estimation of the above augmented GARCH-M model can be carried out under two
different assumptions concerning the conditional distribution of the disturbances, namely Gaus-
sian and standardized t-distribution. In both cases the exact log-likelihood function depends on
the joint density function of the initial observations, f (y1 , y2 , . . . , yq ), which is non-Gaussian and
intractable analytically. In most applications where the sample size is large (as is the case with
most financial time series) the effect of the distribution of the initial observations is relatively
small and can be ignored.2
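\[ \ell(\theta) = -\frac{(n-q)}{2}\log(2\pi) - \frac{1}{2}\sum_{t=q+1}^{n}\log h_t^2 - \frac{1}{2}\sum_{t=q+1}^{n} h_t^{-2}\varepsilon_t^2, \qquad (18.18) \]
\[ \ell(\theta, v) = \sum_{t=q+1}^{n}\ell_t(\theta, v), \qquad (18.19) \]
where
\[ \ell_t(\theta, v) = -\log B\left(\frac{v}{2}, \frac{1}{2}\right) - \frac{1}{2}\log(v-2) - \frac{1}{2}\log h_t^2 - \frac{v+1}{2}\log\left[1 + \frac{\varepsilon_t^2}{h_t^2 (v-2)}\right], \qquad (18.20) \]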
2 Diebold and Schuermann (1993) examine the quantitative importance of the distribution of the initial observations
in the case of simple ARCH models and find their effect to be negligible.
and $B\left(\frac{v}{2}, \frac{1}{2}\right)$ is the beta function.3
The degrees of freedom of the underlying t-distribution, v, are then estimated along with the
other parameters. The Gaussian log-likelihood function (18.18) is a special case of (18.20) and
can be obtained from it for large values of v. In most applications, the two log-likelihood func-
tions give very similar results for values of v around 20. The t -distribution is particularly appro-
priate for the analysis of stock returns where the distribution of the standardized residuals, ε̂t /ĥt ,
is often found to have fatter tails than the normal distribution.
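As an illustration of how the Gaussian log-likelihood (18.18) is evaluated in practice, the sketch below computes it for a GARCH(1, 1) conditional variance with q = 1, so that the sum runs from the second observation. The function name, the argument `h2_init` (an assumed starting value for the first conditional variance) and the treatment of `eps` as given disturbances are assumptions made for this example; in an actual estimation `eps` would depend on the mean parameters and the function would be passed to a numerical optimizer.

```python
import math


def garch11_gaussian_loglik(eps, alpha0, alpha1, phi1, h2_init):
    """Gaussian log-likelihood (18.18) for GARCH(1, 1) errors, with q = 1.

    h2_t = alpha0 + alpha1 * eps_{t-1}^2 + phi1 * h2_{t-1}, started from
    h2_init for the first observation, which is dropped from the sum.
    """
    h2_prev = h2_init
    loglik = 0.0
    for t in range(1, len(eps)):
        h2 = alpha0 + alpha1 * eps[t - 1] ** 2 + phi1 * h2_prev
        if h2 <= 0.0:
            raise ValueError("conditional variance must be positive")
        loglik += -0.5 * math.log(2.0 * math.pi) - 0.5 * math.log(h2) - 0.5 * eps[t] ** 2 / h2
        h2_prev = h2
    return loglik
```

When alpha1 and phi1 are both zero the conditional variance is constant and the function reduces to the familiar homoskedastic Gaussian log-likelihood, which gives a simple check on the implementation.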
The (approximate) log-likelihood function for the EGARCH model has the same form as
in (18.18) and (18.19) for the Gaussian and Student t-distributions, respectively. Unlike the
GARCH–M class of models, the EGARCH–M model always yields a positive conditional variance,
$h_t^2$, for any choice of the unknown parameters; it is only required that the roots of
$1 - \sum_{i=1}^{p}\phi_i z^i = 0$ should all fall outside the unit circle. The unconditional variance of $\varepsilon_t$ in the
case of the EGARCH model does not have a simple analytical form.
The absolute GARCH model can also be estimated for different error distributions. The log-
likelihood functions for the cases where ε̃ t = ε t /ht has a standard normal distribution and when
it has a standardized Student t-distribution are given by (18.18) and (18.19), where εt and ht are
now specified by (18.11) and (18.12), respectively.
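3 Notice that $B\left(\frac{v}{2}, \frac{1}{2}\right) = \Gamma\left(\frac{v}{2}\right)\Gamma\left(\frac{1}{2}\right)/\Gamma\left(\frac{v+1}{2}\right)$. The constant term $\Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}$ is omitted from the expression
used by Bollerslev (1987). See his equation (1).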
Codes Industries
1 EN Energy
2 MA Materials
3 IC Capital Goods
4 CS Commercial Services & Supplies
5 TRN Transportation
6 AU Automobiles & Components
7 LP Consumer Durables & Apparel
8 HR Hotels, Restaurants & Leisure
9 ME Media
10 MS Retailing
11 FD Food & Staples Retailing
12 FBT Food, Beverage & Tobacco
13 HHPE Household & Personal Products
14 HC Health Care Equipment & Services
15 PHB Pharmaceuticals & Biotechnology
16 BK Banks
17 DF Diversified Financials
18 INSC Insurance
19 IS Software & Services
20 TEHW Technology Hardware & Equipment
21 TS Telecommunication Services
22 UL Utilities
Note: The codes in the second column are taken from Reuters for the S&P 500 industry groups
according to the Global Industry Classification Standard. The ‘Real Estate’ and ‘Semiconductors &
Semiconductor Equipment’ industries are excluded.
Source: Datastream.
\[ y_t = \beta' x_{t-1} + \varepsilon_t, \]
\[ V(\varepsilon_t|\Omega_{t-1}) = h_t^2 = \alpha_0 + \alpha_1\varepsilon^2_{t-1} + \beta_1 h^2_{t-1}. \]
\[ y^*_{t+1} = \beta' x_t, \]
and the GARCH component does not affect the forecasts, except possibly through its effect on
the estimate of β. The interval forecast for yt+1 is also largely unaffected by the GARCH structure
of the disturbances.
Note: Columns 2 to 4 report the sample mean, standard deviation, skewness and kurtosis. Column 5
reports the Ljung–Box statistic of order 20 for testing autocorrelations in individual asset returns. The
critical value of χ 220 at the 1% significance level is 37.56. The sample period is 2nd January 1995 to 13th
October 2003.
\[ \Pr(y_{t+1} > c\,|\,\Omega_{t-1}) = 1 - \Phi\left(\frac{c - \beta' x_t}{h_{t+1}}\right), \]
assuming that ε t (conditionally) has a normal distribution. Probability event forecasting, there-
fore, involves forecasting volatility. See also Chapter 17.
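The dependence of the event probability on the volatility forecast can be seen in a small numerical sketch. It assumes conditional normality, as in the text; the function names and the illustrative inputs are hypothetical, with `mean_forecast` standing for the point forecast of the conditional mean and `h_forecast` for the forecast of the conditional standard deviation.

```python
import math


def normal_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_exceed(c, mean_forecast, h_forecast):
    """Forecast probability that next period's value exceeds the threshold c."""
    if h_forecast <= 0.0:
        raise ValueError("volatility forecast must be positive")
    return 1.0 - normal_cdf((c - mean_forecast) / h_forecast)
```

For a threshold above the mean forecast, a higher volatility forecast raises the probability of the event, which is why the GARCH component matters here even though it leaves the point forecast unchanged.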
Sector α̂ i0 α̂ i1 φ̂ i1 α̂ i0 α̂ i1 φ̂ i1 ν̂ i
Note: Columns 2 to 4 report the ML estimates of the univariate GARCH(1,1) model for each sector i =
1, 2, . . . , 22, assuming Gaussian innovations:
\[ h^2_{it} = \alpha_{i0} + \alpha_{i1} r^2_{i,t-1} + \phi_{i1} h^2_{i,t-1}. \]
Columns 5 to 8 report the ML estimates of the univariate GARCH(1, 1) for each sector i = 1, 2, . . . , 22,
assuming Student t innovations with ν i degrees of freedom. All the estimates reported for α i0 , α i1 , and φ i1
are statistically significant at the 5% level. The estimation period is 2 Jan. 1995 to 13 Oct. 2003.
Notice that for large enough forecast horizon, h, ĥ2T+h tends to the unconditional variance of zt
given by ĥ2∞ = α̂ 0 /(1 − α̂ 1 − φ̂ 1 ).
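The convergence of the multi-step volatility forecasts to the unconditional variance can be illustrated with the GARCH(1, 1) forecast recursion, in which each further step applies the factor (alpha1 + phi1) to the previous forecast. This is a sketch under assumptions: the function name is hypothetical, and `h2_next` is taken to be the one-step-ahead conditional variance, which is known at the forecast origin.

```python
def garch11_variance_forecasts(h2_next, alpha0, alpha1, phi1, horizon):
    """Forecasts of the conditional variance for steps 1, 2, ..., horizon.

    Uses h2_{T+j} = alpha0 + (alpha1 + phi1) * h2_{T+j-1} for j >= 2,
    starting from the one-step-ahead value h2_next.
    """
    persistence = alpha1 + phi1
    forecasts = [h2_next]
    for _ in range(horizon - 1):
        forecasts.append(alpha0 + persistence * forecasts[-1])
    return forecasts
```

With alpha0 = 0.1, alpha1 = 0.1 and phi1 = 0.8 the forecasts decay geometrically, at rate 0.9 per step, towards 0.1 / (1 − 0.1 − 0.8) = 1.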
Engle, Lilien, and Robins (1987), while textbook treatments can be found in Hamilton (1994),
Satchell and Knight (2007), Campbell, Lo, and MacKinlay (1997), and Engle (1995). Shephard
(2005) provides selected readings of the literature on stochastic volatility. For an extension of
volatility models to the multivariate case see Chapter 25.
18.12 Exercises
1. Consider the generalized autoregressive conditional heteroskedastic (GARCH) model
yt = ht zt ,
where
and $\Omega_t$ is the information set that contains at least $y_t$ and its lagged values.
2. The RiskMetrics measure of conditional volatility of rt , excess return, is given by the expo-
nentially weighted moving average,
\[ h_t^2 = (1-\lambda)\sum_{\tau=0}^{\infty}\lambda^{\tau} r^2_{t-\tau-1}, \]
(a) Under what conditions does this measure of volatility coincide with the GARCH(1, 1)
model?
(b) What are the main limitations of the RiskMetrics approach?
\[ y_t = \gamma' x_t + h_t\varepsilon_t, \]
(a) Suppose that the interest lies in estimating the regression coefficients, γ . Discuss the rela-
tive merits of least squares (LS) and quasi-maximum likelihood (QML) estimators, with
the latter obtained assuming (perhaps incorrectly) that the errors, εt , are Gaussian. Dis-
cuss carefully the assumptions that you must make under the two estimation procedures
to ensure that the estimators of γ are consistent. What can be said about the relative
efficiency of LS and QML estimators?
(b) Derive the asymptotic variance of the LS estimator of γ .
(c) How would you forecast the mean and variance of yT+h conditional on the information
set $\Omega_T = (y_T, x_T, y_{T-1}, x_{T-1}, \ldots)$?
yt = λyt−1 + βxt + ut ,
xt = ρxt−1 + vt ,
where ut and vt are serially uncorrelated with zero means and conditional variances
5. Suppose that the errors ut and vt in the above question are independently distributed.
Part V
Multivariate Time Series Models
19 Multivariate Analysis
19.1 Introduction
This chapter reviews techniques for the analysis of multivariate systems. It begins with a
discussion of the system of regression equations both when the regressors are strictly exogenous
(the so-called SURE model) and when one or more of the regressors are endogenously
determined (the classical simultaneous equation system). It provides an overview of two and
three stage least squares, and iterated instrumental variables estimators for systems of equations
containing endogenous variables. It then considers other statistical techniques for the analysis
of multivariate systems and gives an account of principal components analysis and factor mod-
els that are useful when introducing econometric techniques for panel data with error cross-
sectional dependence (see Chapter 29). We end the chapter with canonical correlation analysis
and reduced rank regressions where a sub-set of matrix coefficients in a SURE model is assumed
to be rank deficient. Such analyses form the basis of cointegration analysis that will be considered
in Chapter 22.
yi = Xi β i + ui , i = 1, 2, . . . , m, (19.1)
where $y_i = (y_{i1}, y_{i2}, \ldots, y_{iT})'$ is a $T \times 1$ vector of observations on the dependent variable $y_{it}$,
$X_i$ is a $T \times k_i$ matrix of observations on the $k_i$ regressors explaining $y_{it}$, $\beta_i$ is a $k_i \times 1$ vector
of unknown coefficients and $u_i = (u_{i1}, u_{i2}, \ldots, u_{iT})'$ is a $T \times 1$ vector of disturbances or errors,
for $i = 1, 2, \ldots, m$. Let $u = (u_1', u_2', \ldots, u_m')'$ be the $mT$-dimensional vector of disturbances.
We assume
Assumption A1: The $mT \times 1$ vector of disturbances, $u$, has zero conditional mean
\[ E(u\,|\,X_1, X_2, \ldots, X_m) = 0. \]
In econometric analysis of the system of equations in (19.1), three cases can be distinguished:
1. Contemporaneously uncorrelated disturbances, namely $E(u_i u_j') = 0$, for $i \neq j$.
2. Contemporaneously correlated disturbances, with identical regressors across all the equations, namely
\[ E(u_i u_j') = \sigma_{ij} I_T \neq 0, \]
\[ X_i = X_j, \quad \text{for all } i, j. \]
and
In the rest of this chapter, we briefly review estimation and inference in the above three cases.
To this end, it is convenient to stack the different equations in the system in the following
manner
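\[ \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix} = \begin{pmatrix} X_1 & 0 & \ldots & 0 \\ 0 & X_2 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & X_m \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_m \end{pmatrix} + \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_m \end{pmatrix}, \qquad (19.2) \]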
y = Gβ + u, (19.3)
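\[ E(uu'\,|\,X_1, X_2, \ldots, X_m) = \Omega = \Sigma \otimes I_T, \]
where $\Sigma = (\sigma_{ij})$ is an $m \times m$ symmetric positive definite matrix and $\otimes$ stands for the Kronecker
matrix multiplication.1 More specifically, we have
\[ \Omega = \Sigma \otimes I_T = \begin{pmatrix} \sigma_{11} I_T & \sigma_{12} I_T & \ldots & \sigma_{1m} I_T \\ \sigma_{21} I_T & \sigma_{22} I_T & \ldots & \sigma_{2m} I_T \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{m1} I_T & \sigma_{m2} I_T & \ldots & \sigma_{mm} I_T \end{pmatrix}. \qquad (19.4) \]
Note that
\[ \Omega^{-1} = \Sigma^{-1} \otimes I_T. \]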
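When $\Sigma$ is known, the efficient estimator of $\beta$ is the GLS estimator (see Section 4.3) given by
\[ \hat{\beta}_{GLS} = \left[G'\left(\Sigma^{-1} \otimes I_T\right)G\right]^{-1} G'\left(\Sigma^{-1} \otimes I_T\right)y. \qquad (19.5) \]
But in practice, $\Sigma$ is not known and as a result $\hat{\beta}_{GLS}$ is often referred to as the infeasible GLS
estimator of $\beta$. A feasible GLS estimator is obtained by replacing the unknown elements of $\Sigma$,
namely $\sigma_{ij}$, with suitable estimators. In the case where $m$ is small relative to $T$, $\sigma_{ij}$ can be
estimated consistently by
\[ \hat{\sigma}_{ij} = \frac{\hat{u}_i'\hat{u}_j}{T}, \quad i, j = 1, 2, \ldots, m, \qquad (19.6) \]
where $\hat{u}_i = y_i - X_i\hat{\beta}_{i,OLS}$, and $\hat{\beta}_{i,OLS}$ is the ordinary least squares estimator of $\beta_i$. The resultant
feasible GLS estimator is then given by
\[ \hat{\beta}_{FGLS} = \left[G'\left(\hat{\Sigma}^{-1} \otimes I_T\right)G\right]^{-1} G'\left(\hat{\Sigma}^{-1} \otimes I_T\right)y. \]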
1 For a definition of the Kronecker product and the rules of its operation see Section A.8 in Appendix A.
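The feasible GLS steps above can be sketched with NumPy as follows. This is an illustrative implementation rather than a production routine: the function name `sure_fgls` is hypothetical, the full $mT \times mT$ weighting matrix is formed explicitly for clarity (which is wasteful for large systems), and all equations are assumed to share the same number of observations T.

```python
import numpy as np


def sure_fgls(ys, Xs):
    """Feasible GLS for a system of m regression equations.

    ys: list of m arrays of length T; Xs: list of m arrays of shape (T, k_i).
    Returns the stacked coefficient estimates and the estimate of Sigma.
    """
    m, T = len(ys), len(ys[0])
    # Step 1: equation-by-equation OLS residuals.
    resid = []
    for y, X in zip(ys, Xs):
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid.append(y - X @ b)
    U = np.column_stack(resid)
    sigma_hat = U.T @ U / T                      # elements as in (19.6)
    # Step 2: block-diagonal G and stacked y, as in (19.2) and (19.3).
    k = [X.shape[1] for X in Xs]
    G = np.zeros((m * T, sum(k)))
    col = 0
    for i, X in enumerate(Xs):
        G[i * T:(i + 1) * T, col:col + k[i]] = X
        col += k[i]
    y = np.concatenate(ys)
    # Step 3: GLS with Sigma replaced by its estimate.
    W = np.kron(np.linalg.inv(sigma_hat), np.eye(T))
    beta = np.linalg.solve(G.T @ W @ G, G.T @ W @ y)
    return beta, sigma_hat
```

When every equation has the same regressors, the output coincides with the single-equation OLS estimates, which provides a convenient check.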
Notice that (19.7) is now an m-dimensional vector containing single-equation OLS estimators.
Therefore, when all equations have the same regressors in common, then the GLS reduces to the
least squares procedure applied to one equation at a time.
Rβ = b, (19.8)
where $R$ and $b$ are an $r \times k$ matrix and an $r \times 1$ vector of known constants, and as in Section 19.2,
$\beta = (\beta_1', \beta_2', \ldots, \beta_m')'$ is a $k \times 1$ vector of unknown coefficients, with $k = \sum_{i=1}^{m} k_i$.
In what follows we distinguish between the cases where the restrictions are applicable to the
coefficients β i in each equation separately, and when there are cross-equation restrictions. In the
former case, the matrix R is block diagonal, namely
\[ R = \begin{pmatrix} R_1 & 0 & \cdots & 0 \\ 0 & R_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & R_m \end{pmatrix}, \qquad (19.9) \]
where Ri is the ri × ki matrix of known constants applicable to β i only, with rank(Ri ) = ri < ki .
In the more general case, where the restrictions involve coefficients from different equations, R
is not block-diagonal.
Computations of the ML estimators of β in (19.1) subject to the restrictions in (19.8) can
be carried out in the following manner. Initially suppose $\Sigma$ is known and define the $mT \times mT$
matrix $P = (P_\sigma \otimes I_T)$ such that $P_\sigma \Sigma P_\sigma' = I_m$, and hence
\[ P\left(\Sigma \otimes I_T\right)P' = I_{mT}, \qquad (19.10) \]
where $I_{mT}$ is an identity matrix of order $mT$. Such a matrix always exists since $\Sigma$ is a symmetric
positive definite matrix. Then compute the transformations
\[ G_* = PG, \quad y_* = Py, \qquad (19.11) \]
where $G$ and $y$ are given by (19.2) and (19.3). Now using familiar results from the estimation of
linear regression models subject to linear restrictions, we have (see, for example, Section 1.4 in
Amemiya (1985))
\[ \tilde{\beta} = \left(G_*'G_*\right)^{-1}G_*'y_* - \left(G_*'G_*\right)^{-1}R'q, \qquad (19.12) \]
where
\[ q = \left[R\left(G_*'G_*\right)^{-1}R'\right]^{-1}\left[R\left(G_*'G_*\right)^{-1}G_*'y_* - b\right]. \qquad (19.13) \]
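Expressions (19.12) and (19.13) translate directly into a few matrix operations. The sketch below takes the transformed data as given; the function name `restricted_estimator` and its argument names are hypothetical.

```python
import numpy as np


def restricted_estimator(G_star, y_star, R, b):
    """Least squares on transformed data subject to R beta = b.

    Implements beta = beta_u - A^{-1} R' q with A = G*'G*,
    beta_u = A^{-1} G*'y* and q = [R A^{-1} R']^{-1} (R beta_u - b).
    """
    A_inv = np.linalg.inv(G_star.T @ G_star)
    beta_u = A_inv @ G_star.T @ y_star           # unrestricted estimate
    q = np.linalg.solve(R @ A_inv @ R.T, R @ beta_u - b)
    return beta_u - A_inv @ R.T @ q
```

By construction the returned vector satisfies the restrictions exactly (up to rounding), which is the natural first check when applying it.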
In practice, since $\Sigma$ is not known we need to estimate it. Starting with unrestricted SURE, or
other initial estimates of $\beta_i$ (say $\hat{\beta}_{i,OLS}$), an initial estimate of $\Sigma = (\sigma_{ij})$ can be obtained. Using
the OLS estimates of $\beta_i$, the initial estimates of $\sigma_{ij}$ are given by
\[ \hat{\sigma}_{ij,OLS} = \frac{\hat{u}_{i,OLS}'\hat{u}_{j,OLS}}{T}, \quad i, j = 1, 2, \ldots, m, \]
where
\[ \hat{u}_{i,OLS} = y_i - X_i\hat{\beta}_{i,OLS}, \quad i = 1, 2, \ldots, m. \]
With the help of these initial estimates, constrained estimates of β i can be computed
using (19.12). Starting from these new estimates of β i , another set of estimates for σ ij can
then be computed. This process can be repeated until the convergence criteria in (19.21)
are met.
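Notice that
\[ \hat{G}_*'\hat{G}_* = G'\hat{P}'\hat{P}G = G'\left(\hat{\Sigma}^{-1} \otimes I_T\right)G. \]
The $(i, j)$ element of $\hat{\Sigma}$ is computed differently depending on whether matrix $R$ in (19.9) is block
diagonal or not. When $R$ is block diagonal, $\sigma_{ij}$ is estimated by
\[ \hat{\sigma}_{ij} = \frac{\tilde{u}_i'\tilde{u}_j}{\sqrt{(T - s_i)(T - s_j)}}, \quad i, j = 1, 2, \ldots, m, \qquad (19.14) \]
\[ \tilde{\sigma}_{ij} = \frac{\tilde{u}_i'\tilde{u}_j}{T}, \quad i, j = 1, 2, \ldots, m. \qquad (19.15) \]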
In the case where R is not block diagonal, an appropriate degrees of freedom correction is not
available, and hence the ML estimator of σ ij is used in the computation of the covariance matrix
of the ML estimators of β.
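\[ \ell(\theta) = -\frac{Tm}{2}\log(2\pi) - \frac{1}{2}\log|\Omega| - \frac{1}{2}\left(y - G\beta\right)'\Omega^{-1}\left(y - G\beta\right). \]
Since
\[ |\Omega| = |\Sigma \otimes I_T| = |\Sigma|^{T}, \]
then
\[ \ell(\theta) = -\frac{Tm}{2}\log(2\pi) - \frac{T}{2}\log|\Sigma| - \frac{1}{2}\left(y - G\beta\right)'\left(\Sigma^{-1} \otimes I_T\right)\left(y - G\beta\right). \qquad (19.16) \]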
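Denoting the ML estimator of $\theta$ by $\tilde{\theta} = (\tilde{\beta}_1', \tilde{\beta}_2', \ldots, \tilde{\beta}_m', \tilde{\sigma}_{11}, \tilde{\sigma}_{12}, \ldots, \tilde{\sigma}_{1m}; \tilde{\sigma}_{22}, \tilde{\sigma}_{23}, \ldots, \tilde{\sigma}_{2m}; \ldots; \tilde{\sigma}_{mm})'$, it is then easily seen that
\[ \tilde{\sigma}_{ij} = \frac{\left(y_i - X_i\tilde{\beta}_i\right)'\left(y_j - X_j\tilde{\beta}_j\right)}{T}, \qquad (19.17) \]
and
\[ \tilde{\beta} = \left[G'\left(\tilde{\Sigma}^{-1} \otimes I_T\right)G\right]^{-1} G'\left(\tilde{\Sigma}^{-1} \otimes I_T\right)y. \qquad (19.18) \]
where $\tilde{\Sigma} = (\tilde{\sigma}_{ij})$ with
\[ \tilde{\sigma}_{ij} = \frac{\tilde{u}_i'\tilde{u}_j}{T}, \quad i, j = 1, 2, \ldots, m, \qquad (19.20) \]
where $\tilde{u}_i = y_i - X_i\tilde{\beta}_i$.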
The computation of the ML estimators $\tilde{\beta} = (\tilde{\beta}_1', \tilde{\beta}_2', \ldots, \tilde{\beta}_m')'$, and $\tilde{\sigma}_{ij}$, $i, j = 1, 2, \ldots, m$,
can be carried out by iterating between (19.17) and (19.18) starting from the OLS estimators of
$\beta_i$, namely $\hat{\beta}_{i,OLS} = (X_i'X_i)^{-1}X_i'y_i$. This iterative procedure is continued until a pre-specified
convergence criterion is met. For example, the stopping rule could be the following
\[ \sum_{\ell=1}^{k_i}\left|\tilde{\beta}^{(r)}_{i\ell} - \tilde{\beta}^{(r-1)}_{i\ell}\right| < k_i \times 10^{-4}, \quad i = 1, 2, \ldots, m, \qquad (19.21) \]
where $\tilde{\beta}^{(r)}_{i\ell}$ stands for the estimate of the $\ell$th element of $\beta_i$ at the $r$th iteration.
The maximized value of the system log-likelihood function is given by
\[ \ell(\tilde{\theta}) = -\frac{Tm}{2}\log(2\pi) - \frac{T}{2}\log|\tilde{\Sigma}|. \qquad (19.22) \]
Example 37 (Grunfeld’s investment equation I) In an important study of investment demand,
Grunfeld (1960) and Grunfeld and Griliches (1960) estimated investment equations for ten firms
in the US economy over the period 1935–1954. Here we estimate investment equations for five
of these firms by the SURE method, namely for General Motors (GM), Chrysler (CH), General
Electric (GE), Westinghouse (WE) and US Steel (USS). This smaller data set is also analysed in
Greene (2002). For example, GMI refers to General Motors’ gross investment, WEF to the market
value of Westinghouse, and CHC to the stock of plant and equipment of Chrysler. The SURE model
to be estimated is given by
for i = GM, CH, GE, WE, and USS, and t = 1935, 1936, . . . , 1954. The results for Chrysler are
reported in Table 19.1 (in the table the above variables are denoted by adding the prefix CH to
the variable names). Except for the intercept term, the results in this table are comparable with
the SURE estimates for the same equations reported in Table 14.3 in Greene (2002). See also
Example 57.
Table 19.1 SURE estimates of the investment equation for the Chrysler company
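\[ H_0: h(\beta) = 0, \]
\[ H_1: h(\beta) \neq 0, \]
where $h(\beta)$ is the known $r \times 1$ vector function of $\beta$, with continuous partial derivatives.
The Wald statistic for testing the null $H_0: h(\beta) = 0$ against the two-sided alternatives, $H_1: h(\beta) \neq 0$, is given by
\[ W = h(\tilde{\beta})'\left[H(\tilde{\beta})\,\widehat{Cov}(\tilde{\beta})\,H'(\tilde{\beta})\right]^{-1} h(\tilde{\beta}), \qquad (19.24) \]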
Example 38 Consider the investment equations for five US firms estimated in Example 37. We are
now interested in testing the hypothesis that the coefficients of Fit , the market value of the firms, are
the same across all the five companies. In terms of the coefficients of the equations in (19.23), the
relevant null hypothesis is
These four restrictions clearly involve coefficients from all the five equations. We report the test results
in Table 19.2. The LR statistic for testing these restrictions is 20.46 which is well above the 95 per
cent critical value of the chi-squared distribution with 4 degrees of freedom, and we therefore strongly
reject the slope homogeneity hypothesis.
\[ \begin{aligned} H_0:\ & \sigma_{12} = \sigma_{13} = \cdots = \sigma_{1m} = 0, \\ & \sigma_{23} = \cdots = \sigma_{2m} = 0, \\ & \qquad \vdots \\ & \sigma_{m-1,m} = 0, \end{aligned} \]
against the alternative that one or more of the off-diagonal elements of $\Sigma$ are non-zero; the null is
the hypothesis that the errors from different regressions are uncorrelated. One possibility would
be to use the log-likelihood ratio statistic which is given by
\[ LR = 2\left[\ell(\tilde{\theta}) - \sum_{i=1}^{m}\ell_i(\hat{\theta}_{i,OLS})\right], \qquad (19.25) \]
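where $\ell(\theta)$ is given by (19.22) and $\ell_i(\hat{\theta}_{i,OLS})$ is the log-likelihood function of the ith equation
evaluated at the OLS estimators. Using (19.22), we have
$$LR = T\left(\sum_{i=1}^{m}\log\tilde{\sigma}_{ii} - \log|\hat{\Sigma}|\right), \tag{19.26}$$
where
$$\tilde{\sigma}_{ii} = T^{-1}\left(y_i - X_i\hat{\beta}_{i,OLS}\right)'\left(y_i - X_i\hat{\beta}_{i,OLS}\right).$$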
$$LM = T\sum_{i=2}^{m}\sum_{j=1}^{i-1}\hat{\rho}_{ij}^{2},$$
where $\hat{\rho}_{ij} = \tilde{\sigma}_{ij,OLS}/\left(\tilde{\sigma}_{ii,OLS}\,\tilde{\sigma}_{jj,OLS}\right)^{1/2}$ is the pair-wise correlation coefficient of the residuals
from regression equations i and j. This statistic is also asymptotically distributed as a χ² with
m(m − 1)/2 degrees of freedom, for a fixed m and as T → ∞.
These tests of cross-equation error uncorrelatedness are asymptotically equivalent for a fixed
m and as T → ∞. They tend, however, to over-reject when m is relatively large and should be
used when m is small and T large. In cases where m is also large, a bias-corrected version of the
LM test is proposed by Pesaran, Ullah, and Yamagata (2008). See also Section 29.7.
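The LM statistic above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `breusch_pagan_lm` and the simulated residuals are hypothetical, and the residual covariances use the divisor T as in the text.

```python
import numpy as np

def breusch_pagan_lm(residuals):
    """LM statistic for H0: the error covariance matrix is diagonal.

    residuals: T x m array of OLS residuals, one column per equation.
    Returns the statistic and its degrees of freedom, m(m - 1)/2.
    """
    e = np.asarray(residuals, dtype=float)
    T, m = e.shape
    sigma = e.T @ e / T                    # sigma_tilde_ij with divisor T
    d = np.sqrt(np.diag(sigma))
    rho = sigma / np.outer(d, d)           # pair-wise residual correlations
    lower = np.tril_indices(m, k=-1)       # all pairs with j < i
    lm = T * np.sum(rho[lower] ** 2)
    return lm, m * (m - 1) // 2

# Hypothetical residuals from m = 4 unrelated equations.
rng = np.random.default_rng(0)
e = rng.standard_normal((200, 4))
lm_stat, dof = breusch_pagan_lm(e)
```

The statistic would then be compared with the critical value of a chi-squared distribution with `dof` degrees of freedom, keeping in mind the over-rejection problem noted above when m is large relative to T.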
Example 39 We now test the hypothesis of a diagonal error covariance matrix in the context of
the investment equation estimated in Example 37. For this purpose, we need to estimate the five
individual equations separately by the OLS method, and then employ the log-likelihood ratio pro-
cedure. The maximized log-likelihood values for the five equations estimated separately are for
General Motors (−117.1418), Chrysler (−78.4766), General Electric (−93.3137), Westinghouse
(−73.2271) and US Steel (−119.3128), respectively, yielding the restricted log-likelihood
value of −481.472, the sum of these five values.
The maximized log-likelihood value for the unrestricted system (namely, when the error covariance
matrix is not restricted) is given at the bottom right-hand corner of Table 19.1, under ‘System Log-
likelihood’ (= −459.0922). Therefore, the log-likelihood ratio statistic for testing the diagonality
of the error covariance matrix is given by LR = 2(−459.0922 + 481.472) = 44.76, which is
asymptotically distributed as a chi-squared variate with 5 (5 − 1) /2 = 10 degrees of freedom.
The 95 per cent critical value of the chi-squared distribution with 10 degrees of freedom is 18.31.
We therefore reject the hypothesis that the error covariance matrix of the five investment equations
is diagonal, which provides support for the application of the SURE technique to this problem.
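The arithmetic of Example 39 can be checked directly. The sketch below only re-uses the log-likelihood values quoted in the example; the variable names are illustrative.

```python
# Maximized log-likelihood values quoted in Example 39 (OLS, equation by equation).
ols_loglik = {
    "General Motors": -117.1418,
    "Chrysler": -78.4766,
    "General Electric": -93.3137,
    "Westinghouse": -73.2271,
    "US Steel": -119.3128,
}
system_loglik = -459.0922  # 'System Log-likelihood' reported in Table 19.1

restricted = sum(ols_loglik.values())      # log-likelihood under a diagonal covariance matrix
lr = 2.0 * (system_loglik - restricted)    # log-likelihood ratio statistic
m = len(ols_loglik)
dof = m * (m - 1) // 2                     # number of off-diagonal restrictions

print(f"restricted log-likelihood = {restricted:.4f}")
print(f"LR = {lr:.2f} with {dof} degrees of freedom")
```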
Table 19.3 Estimated system covariance matrix of errors for Grunfeld–Griliches investment equations
Table 19.3 reports the estimated error covariance matrix. The covariance estimates on the off-
diagonal elements are quite large relative to the respective diagonal elements.
$$\begin{aligned}
y_i &= X_i\beta_i + Y_i\gamma_i + u_i,\\
&= W_i\delta_i + u_i, \quad i = 1, 2, \ldots, m,
\end{aligned} \tag{19.27}$$
$$X = \bigcup_{i=1}^{m} X_i. \tag{19.28}$$
Finally, to derive the relationship between the structural form parameters, (β i , γ i ) and the
reduced form parameters (to be defined below) we first note that (19.27) can be written as
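$$y_i = (XH_i)\beta_i + (YG_i)\gamma_i + u_i,$$
$$Y = XH_\beta + YG_\gamma + U,$$
where
$$\begin{aligned}
H_\beta &= (H_1\beta_1, H_2\beta_2, \ldots, H_m\beta_m),\\
G_\gamma &= (G_1\gamma_1, G_2\gamma_2, \ldots, G_m\gamma_m),\\
U &= (u_1, u_2, \ldots, u_m).
\end{aligned} \tag{19.30}$$
Hence, the reduced form model (associated with the structural model (19.27)) is
$$Y = X\Pi + V, \tag{19.31}$$
where
$$\Pi = H_\beta\left(I_m - G_\gamma\right)^{-1}, \quad \text{and} \quad V = U\left(I_m - G_\gamma\right)^{-1}. \tag{19.32}$$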
$$\hat{\Pi} = (X'X)^{-1}X'Y,$$
assuming that $X'X$ is a positive definite matrix. Using these estimates in (19.29), $Y_i$ (which enters
on the right-hand side of (19.27)) can then be consistently estimated by $\hat{Y}_i = X(X'X)^{-1}X'Y_i$.
Using these estimates the familiar two-stage least squares (2SLS) estimator of $\delta_i$ can then be
written as
$$\hat{\delta}_{i,2SLS} = \left(\hat{W}_i'\hat{W}_i\right)^{-1}\hat{W}_i'y_i,$$
where
The 2SLS is consistent if $T^{-1}\hat{W}_i'\hat{W}_i$ tends to a positive definite matrix. The order condition
k − ki ≥ pi is necessary but not sufficient. To see this note that
But
$$T^{-1}W_i'P_XW_i = \left(T^{-1}W_i'X\right)\left(T^{-1}X'X\right)^{-1}\left(T^{-1}X'W_i\right),$$
$$T^{-1}W_i'P_Xu_i = \left(T^{-1}W_i'X\right)\left(T^{-1}X'X\right)^{-1}\left(T^{-1}X'u_i\right).$$
To obtain an explicit expression for the 3SLS estimator stack the m equations as
y = Ŵδ + ξ , (19.35)
where
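$$y = \begin{pmatrix} y_1\\ y_2\\ \vdots\\ y_m \end{pmatrix}, \quad
\hat{W} = \begin{pmatrix} \hat{W}_1 & 0 & \cdots & 0\\ 0 & \hat{W}_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \hat{W}_m \end{pmatrix},$$
$$\delta = \begin{pmatrix} \delta_1\\ \delta_2\\ \vdots\\ \delta_m \end{pmatrix}, \quad
\xi = \begin{pmatrix} \xi_1\\ \xi_2\\ \vdots\\ \xi_m \end{pmatrix}.$$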
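Then
$$\hat{\delta}_{3SLS} = \left[\hat{W}'\left(\hat{\Sigma}^{-1}\otimes I_T\right)\hat{W}\right]^{-1}\hat{W}'\left(\hat{\Sigma}^{-1}\otimes I_T\right)y, \tag{19.36}$$
with
$$\hat{\Sigma} = (\hat{\sigma}_{ij}), \quad \hat{\sigma}_{ij} = \frac{\left(y_i - W_i\hat{\delta}_{i,2SLS}\right)'\left(y_j - W_j\hat{\delta}_{j,2SLS}\right)}{T}. \tag{19.37}$$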
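The 2SLS and 3SLS formulae can be sketched as follows. The function names and the simulated two-equation system are hypothetical and are used purely for illustration; the block-by-block construction avoids forming the Kronecker product explicitly.

```python
import numpy as np

def two_sls(y, W, X):
    """2SLS for one equation: replace W by its projection on X, then run OLS."""
    W_hat = X @ np.linalg.solve(X.T @ X, X.T @ W)          # W_hat = P_X W
    delta = np.linalg.solve(W_hat.T @ W_hat, W_hat.T @ y)
    return delta, W_hat

def three_sls(ys, Ws, X):
    """3SLS for m equations, as in (19.36); ys and Ws are lists of length m."""
    m, T = len(ys), len(ys[0])
    fits = [two_sls(y, W, X) for y, W in zip(ys, Ws)]
    # 2SLS residuals are computed with the original W_i, as in (19.37).
    resid = np.column_stack([y - W @ d for y, W, (d, _) in zip(ys, Ws, fits)])
    s_inv = np.linalg.inv(resid.T @ resid / T)
    Wh = [W_hat for _, W_hat in fits]
    # Blocks of W'(Sigma^-1 kron I_T)W and W'(Sigma^-1 kron I_T)y.
    A = np.block([[s_inv[i, j] * Wh[i].T @ Wh[j] for j in range(m)] for i in range(m)])
    b = np.concatenate([sum(s_inv[i, j] * Wh[i].T @ ys[j] for j in range(m)) for i in range(m)])
    return np.linalg.solve(A, b)

# Hypothetical system: y1 = x1 + 0.5*y2 + u1,  y2 = x2 - 0.3*y1 + u2.
rng = np.random.default_rng(1)
T = 2000
X = rng.standard_normal((T, 3))                             # all exogenous variables
U = rng.standard_normal((T, 2)) @ np.array([[1.0, 0.6], [0.0, 0.8]])  # correlated errors
A_struct = np.array([[1.0, -0.5], [0.3, 1.0]])
Y = (X[:, :2] + U) @ np.linalg.inv(A_struct).T              # reduced form solution
W1 = np.column_stack([X[:, 0], Y[:, 1]])
W2 = np.column_stack([X[:, 1], Y[:, 0]])
delta_3sls = three_sls([Y[:, 0], Y[:, 1]], [W1, W2], X)
```

In this simulated design the stacked estimate should be close to (1, 0.5, 1, −0.3), the structural coefficients used to generate the data.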
$$\hat{\delta}_{i,2SLS} = \left(\hat{W}_i'\hat{W}_i\right)^{-1}\hat{W}_i'y_i. \tag{19.38}$$
Zi = XLi .
Note that the elements of Li are known. The IV estimator of δ i with Zi as instruments is
given by
2 The sub-set selection can be carried out using reduced rank regression techniques reviewed in Section 19.7.
$$\hat{\delta}_{i,IV} = \left(\hat{W}_{zi}'\hat{W}_{zi}\right)^{-1}\hat{W}_{zi}'y_i,$$
where
where
$$\hat{W}_z = \begin{pmatrix} \hat{W}_{z1} & 0 & \cdots & 0\\ 0 & \hat{W}_{z2} & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \hat{W}_{zm} \end{pmatrix}.$$
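This estimate can be updated using $\Pi = H_\beta\left(I_m - G_\gamma\right)^{-1}$, where $H_\beta$ and $G_\gamma$ are estimated
using $\hat{\delta}_{(1)IV}$, namely $\hat{\Pi}_{(1)IV} = H_{\hat{\beta}_{(1),IV}}\left(I_m - G_{\hat{\gamma}_{(1),IV}}\right)^{-1}$. Then compute
$$\hat{W}_{i(1)} = \left(X_i,\ X\hat{\Pi}_{i,(1)IV}\right), \tag{19.41}$$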
where
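$$\hat{W}_{(1)} = \begin{pmatrix} \hat{W}_{1(1)} & 0 & \cdots & 0\\ 0 & \hat{W}_{2(1)} & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \hat{W}_{m(1)} \end{pmatrix},$$
and the (i, j) element of $\tilde{\Sigma}_{(1)}$ is given by
$$\tilde{\sigma}_{ij(1)} = \frac{\left(y_i - W_i\hat{\delta}_{i,(1)IV}\right)'\left(y_j - W_j\hat{\delta}_{j,(1)IV}\right)}{T}.$$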
The iterations can be continued to obtain a fully iterated system IV estimator.
$$S_T = \frac{Y'Y}{T}.$$
Let $\hat{c}_1 = (\hat{c}_{11}, \hat{c}_{12}, \ldots, \hat{c}_{1m})'$ be an m-dimensional real valued vector. The first principal
component is defined by taking the linear combination of the elements of $y_t$
Therefore, the first (population) PC, denoted by $c_1$, is given by a suitably normalized eigenvector
associated with the largest eigenvalue, denoted by $\lambda_1$, of $\Sigma_{yy}$. The estimate of $c_1$, denoted by $\hat{c}_1$, is
based on the sample estimate of $\Sigma_{yy}$, which is given by $S_T = T^{-1}Y'Y$.3
3 In cases where m is large, one could also base the estimate of $c_1$ on a regularized estimate of $\Sigma_{yy}$.
Similarly, the second principal component is defined as the linear combination of the
elements of $y_t$, $\hat{p}_{2t} = \hat{c}_2'y_t$, having maximum variance, subject to the constraints $\hat{c}_2'\hat{c}_2 = 1$, and
$\mathrm{Cov}(\hat{p}_{1t}, \hat{p}_{2t}) = 0$. Again, we can compute T linear combinations to obtain the vector
$\hat{p}_2 = (\hat{p}_{21}, \hat{p}_{22}, \ldots, \hat{p}_{2T})'$. The kth principal component is defined as the linear combination of the
elements of $y_t$, $\hat{p}_{kt} = \hat{c}_k'y_t$, having maximum variance, subject to the constraints $\hat{c}_k'\hat{c}_k = 1$, and
$\mathrm{Cov}(\hat{p}_{kt}, \hat{p}_{ht}) = 0$, for h = 1, 2, . . . , k − 1. In this way, we can obtain m principal
components. Let
λ1 ≥ λ2 ≥ . . . ≥ λm ≥ 0,
be the m eigenvalues of ST , in a descending order. It is possible to prove that the vector of coeffi-
cients ĉk for the kth principal component, p̂k , is given by the eigenvector of ST corresponding to
λk , satisfying
$$\hat{c}_k'\hat{c}_k = 1, \quad k = 1, \ldots, m,$$
$$\hat{c}_k'\hat{c}_h = 0, \quad k \neq h.$$
Since the sample covariance matrix ST is non-negative definite, it has spectral decomposition
(see Section A.5 in Appendix A). Using such decomposition, it is easy to prove that
E p̂k p̂k = λk ,
where $\lambda_k$ is the kth largest eigenvalue of $S_T$. If m > T, eigenvalues and principal components can
be computed using the T × T matrix $m^{-1}YY'$.
It is also possible to estimate principal components for Y, once these have been filtered by
a set of variables, contained in a T × s matrix X, that might influence Y. In this case, principal
components are computed from eigenvectors and eigenvalues of
$$S_T = \frac{Y'M_XY}{T},$$
where $M_X = I_T - X(X'X)^{-1}X'$, and $I_T$ is a T × T identity matrix. For example, in the case
where the means of $y_{it}$ are unknown, X can be chosen to be a vector of ones, namely by setting
$M_X = M_\tau = I_T - \tau(\tau'\tau)^{-1}\tau'$, where τ is a T × 1 vector of ones.
There are a number of methods that can be used to select k < T, the number of PCs or
factors. The simplest and most popular procedures are the Kaiser (1960) criterion and the scree
test. To use the Kaiser criterion the observations are standardized so that the variables have unit
variances (in sample), and hence $\sum_{i=1}^{m}\lambda_i = m$ (when T > m). According to this criterion one
would then retain only factors with eigenvalues greater than 1. In effect only factors that explain
as much as the equivalent of one original variable are retained.
The scree test is based on a graphical method, first proposed by Cattell (1966). A simple line
plot of the eigenvalues is used to identify a visual break in this plot. There is no formal method
for identifying the threshold, and a certain degree of personal judgement is required.
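The computation of principal components from the eigenvalues and eigenvectors of $S_T$, together with the Kaiser criterion, can be sketched as follows. The function name `principal_components` and the simulated one-factor data are hypothetical; the data are demeaned and standardized with divisor T so that the eigenvalues add up to m.

```python
import numpy as np

def principal_components(Y, standardize=True):
    """Eigenvalues, weights and principal components of a T x m data matrix Y."""
    Y = np.asarray(Y, dtype=float)
    T = Y.shape[0]
    Z = Y - Y.mean(axis=0)                  # filter out the means (M_tau)
    if standardize:
        Z = Z / Z.std(axis=0)               # unit in-sample variances (divisor T)
    S = Z.T @ Z / T                         # m x m sample covariance matrix
    eigval, eigvec = np.linalg.eigh(S)      # eigh returns eigenvalues in ascending order
    order = np.argsort(eigval)[::-1]
    lam, C = eigval[order], eigvec[:, order]
    return lam, C, Z @ C                    # lambda_k, weights c_k, components p_k

# Hypothetical data: one common factor plus idiosyncratic noise.
rng = np.random.default_rng(0)
T, m = 200, 6
f = rng.standard_normal((T, 1))
Y = f @ rng.uniform(0.5, 1.5, (1, m)) + rng.standard_normal((T, m))

lam, C, P = principal_components(Y)
n_kaiser = int(np.sum(lam > 1.0))           # Kaiser criterion: keep eigenvalues above 1
```

A scree plot would simply plot `lam` against its index and look for a visual break.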
For comprehensive treatments of the PC literature see Chapter 11 of Anderson (2003), and
Jolliffe (2004).
where $y_{i\cdot} = (y_{i1}, y_{i2}, \ldots, y_{iT})'$, $F = (f_1, f_2, \ldots, f_T)'$, and $u_{i\cdot} = (u_{i1}, u_{i2}, \ldots, u_{iT})'$. Assuming
that F is known, the above system of equations has the same format as the SURE model with
$X_i = F$ for all i. Then, by the result in Section 19.2.1, the GLS estimator of $\gamma_i$ is the same as the
OLS estimator and is given by
or $\hat{\Gamma}'(F) = [\hat{\gamma}_1(F), \hat{\gamma}_2(F), \ldots, \hat{\gamma}_m(F)] = (F'F)^{-1}F'Y$, where $Y = (y_{1\cdot}, y_{2\cdot}, \ldots, y_{m\cdot})$. For
identification of F, the normalization restrictions $T^{-1}F'F = I_k$ are imposed which yield
$$\hat{\Gamma}(F) = T^{-1}Y'F. \tag{19.45}$$
It is clear that for a given F, $\hat{\Gamma}(F)$ is a consistent estimator of Γ, which is also robust to
cross-correlations of $u_{i\cdot}$ and $u_{j\cdot}$.
Similarly, for a given Γ, the observations can be stacked over i, which gives
$$y_{\cdot t} = \Gamma f_t + u_{\cdot t},$$
where $y_{\cdot t} = (y_{1t}, y_{2t}, \ldots, y_{mt})'$, and $u_{\cdot t} = (u_{1t}, u_{2t}, \ldots, u_{mt})'$. Again using the results in Section
19.2.1, we note that for a given Γ, a consistent and efficient estimator of $f_t$ is given by
$$\hat{f}_t(\Gamma) = \left(\Gamma'\Gamma\right)^{-1}\Gamma'y_{\cdot t}. \tag{19.46}$$
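To ensure that these estimates of $f_t$ satisfy the normalization restrictions we must have
$$T^{-1}\sum_{t=1}^{T}\hat{f}_t(\Gamma)\hat{f}_t'(\Gamma) = I_k = \left(\Gamma'\Gamma\right)^{-1}\Gamma'S_T\,\Gamma\left(\Gamma'\Gamma\right)^{-1},$$
$$P'S_TP = I_k, \tag{19.47}$$
with
$$S_T = T^{-1}\sum_{t=1}^{T}y_{\cdot t}y_{\cdot t}', \quad \text{and} \quad P = \Gamma\left(\Gamma'\Gamma\right)^{-1}.$$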
Therefore, P is the m × k matrix of the PCs of the sample m × m covariance matrix, $S_T$, and
the factor estimates, $\hat{f}_t(P) = P'y_{\cdot t}$, are formed as linear combinations of the observations (over
i), with the weights in these linear combinations given by the first k < m PCs of $Y'Y/T$. Using
the factor estimates, the loadings $\gamma_i$ can then be estimated by running OLS regressions of $y_{it}$ (for
each i) on the estimated factors, $\hat{f}_t$. To summarize, the unobserved factors and the associated
loadings can be consistently estimated by
$$\hat{\Gamma} = T^{-1}Y'\hat{F}, \tag{19.49}$$
where $\hat{P}$ is a T × k matrix of the first k PCs of $YY'/T$, namely the eigenvectors corresponding to
the k largest eigenvalues of the T × T matrix $YY'/T$, and $\hat{F} = (\hat{f}_1, \hat{f}_2, \ldots, \hat{f}_T)'$. See also Section
19.4.
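The PC estimators of the factors and loadings can be sketched as below. The scaling $\hat{F} = \sqrt{T}\hat{P}$ is an assumption made here so that $T^{-1}\hat{F}'\hat{F} = I_k$ holds; the function name and the simulated data are hypothetical. Since factors are identified only up to a rotation, the check projects the true factors on the estimated ones.

```python
import numpy as np

def pc_factors(Y, k):
    """PC estimates of k factors (T x k) and loadings (m x k) from a T x m panel."""
    T = Y.shape[0]
    eigval, eigvec = np.linalg.eigh(Y @ Y.T / T)           # T x T matrix YY'/T
    P_hat = eigvec[:, np.argsort(eigval)[::-1][:k]]        # eigenvectors of the k largest eigenvalues
    F_hat = np.sqrt(T) * P_hat                             # assumed scaling: F'F/T = I_k
    Gamma_hat = Y.T @ F_hat / T                            # loadings, as in (19.49)
    return F_hat, Gamma_hat

# Hypothetical two-factor panel.
rng = np.random.default_rng(0)
T, m, k = 100, 50, 2
F_true = rng.standard_normal((T, k))
Gamma_true = rng.standard_normal((m, k))
Y = F_true @ Gamma_true.T + rng.standard_normal((T, m))

F_hat, Gamma_hat = pc_factors(Y, k)
coef = F_hat.T @ F_true / T                                # projection of true factors on F_hat
r2 = 1.0 - np.sum((F_true - F_hat @ coef) ** 2) / np.sum(F_true ** 2)
```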
The PC estimators of the factors and their loadings can also be motivated by the following
minimization problem
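$$\min_{\Gamma,\ f_t;\ t=1,2,\ldots,T}\ \sum_{t=1}^{T}\left(y_{\cdot t} - \Gamma f_t\right)'\left(y_{\cdot t} - \Gamma f_t\right) = \min_{F,\ \gamma_i;\ i=1,2,\ldots,m}\ \sum_{i=1}^{m}\left(y_{i\cdot} - F\gamma_i\right)'\left(y_{i\cdot} - F\gamma_i\right), \tag{19.50}$$
subject to the k(k + 1)/2 normalization constraints $T^{-1}\sum_{t=1}^{T}f_tf_t' = I_k$. The first-order
conditions for this minimization problem are given by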
Recalling that $T^{-1}F'F = I_k$, the estimates of the factor loadings are given by $\hat{\gamma}_i = (F'F)^{-1}F'y_{i\cdot} = T^{-1}F'y_{i\cdot}$,
or $\hat{\Gamma} = T^{-1}Y'\hat{F}$, which is the same as those given by (19.49). Also using
(19.52) we have $\hat{f}_t = (\Gamma'\Gamma)^{-1}\Gamma'y_{\cdot t}$, which is the same as (19.46). Therefore, minimization
of (19.50) with respect to Γ and F simultaneously yields the same solution as the sequential
optimization followed earlier. Both approaches result in (19.48) and (19.49) as the solutions.
where
and $\eta_i$ and $f_t$ are distributed independently of $u_{it}$. Let $\bar{\gamma}_{mw} = \sum_{i=1}^{m}w_i\gamma_i$, where
the weights $w_i$ add up to unity, $\sum_{i=1}^{m}w_i = 1$, and are granular in the sense that $w_i = O(m^{-1})$, and
$\sum_{i=1}^{m}w_i^2 = O(m^{-1})$.4 Suppose that $\bar{\gamma}_{mw} \neq 0$ and $\gamma \neq 0$. Then
and $f_t$ can be consistently estimated by $\bar{y}_{tw} = \sum_{i=1}^{m}w_iy_{it}$ (up to the scaling factor $\bar{\gamma}_{mw}$) so long
as $\bar{u}_{tw} = O_p(m^{-1/2})$. The restriction that the scaling factor $\bar{\gamma}_{mw} \neq 0$ serves as the identifying
restriction, very much in the same way that $\mathrm{Var}(f_t) = 1$ is used as the identifying restriction
under the factor model. But the condition $\bar{\gamma}_{mw} \neq 0$ is clearly more restrictive than assuming
$\mathrm{Var}(f_t) \neq 0$, although in most economic applications it is likely to be satisfied,
since otherwise $\bar{y}_{tw}$ tends to a non-stochastic constant (in the above example to zero) which is
contrary to what we observe about the highly cyclical and volatile nature of economic and financial
aggregates.
Consider now the PC estimator of $f_t$ which is given by $\hat{f}^{PC}_{t,T} = \sum_{i=1}^{m}p_{iT}y_{it}$, where
$p_T = (p_{1T}, p_{2T}, \ldots, p_{mT})'$ is the eigenvector associated with the largest eigenvalue of
$S_T = T^{-1}\sum_{t=1}^{T}y_{\cdot t}y_{\cdot t}'$. It is clear that both estimators of $f_t$ are cross-sectional weighted averages of the
observations. The main difference between the two estimators lies in the choice of the weights. In
construction of $\bar{y}_{tw}$ the weights $w_i$ are predetermined and can be typically taken to be $w_i = 1/m$.
In contrast, the weights in the PC estimator, $\hat{f}^{PC}_{t,T}$, are endogenously obtained as nonlinear functions
of the observations, $y_{it}$. In small samples, the two estimators could have different degrees
of correlations with $f_t$, but when m and T are sufficiently large both estimators ($\bar{y}_{tw}$ and $\hat{f}^{PC}_{t,T}$)
become perfectly correlated with $f_t$ and hence with one another. The cross-section average (CS)
estimator, $\bar{y}_{tw}$, becomes perfectly correlated with $f_t$ even if T is small, but the validity of the PC
estimator requires that both m and T be large. But as we have noted above, the advantage of the
PC estimator over the CS estimator is that it is valid even if $\bar{\gamma}_{mw} \to 0$, as m → ∞.
The relationship between the CS and PC estimators of $f_t$ can be better understood in the
case of an exact factor model where the $u_{it}$ in (19.53) are cross-sectionally independently
distributed with a common variance, $\sigma_u^2$. In this case, and imposing the normalizing restriction
$T^{-1}\sum_{t=1}^{T}f_t^2 = 1$, we have
$$\begin{aligned}
S_T &= T^{-1}\sum_{t=1}^{T}y_{\cdot t}y_{\cdot t}' = T^{-1}\sum_{t=1}^{T}\left(\gamma f_t + u_t\right)\left(\gamma f_t + u_t\right)'\\
&= S + O_p(T^{-1/2}),
\end{aligned}$$
where $S = \gamma\gamma' + \sigma^2I_m$. Therefore5
where p is the first eigenvector of S. Now let $\lambda_{\max}$ be the largest eigenvalue of S then
$$\left(\gamma\gamma' + \sigma^2I_m\right)p = \lambda_{\max}\,p,$$
and hence
$$\gamma\gamma'p = \left(\lambda_{\max} - \sigma^2\right)p.$$
5 Recall that $p_T$ is the first eigenvector of $S_T$, normalized to have a unit length. Here we scale the PC estimator by $p_T'p_T$
which ensures that the PC estimator and $f_t$ have the same scale.
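Thus p is also the first eigenvector of $\gamma\gamma'$ associated with $\lambda_{\max} - \sigma^2$, and since $\gamma\gamma'$ has rank
unity, then $p = \gamma$, and $\lambda_{\max} = \sigma^2 + \gamma'\gamma$. Using this result in (19.56) and using (19.54) we
have
$$\begin{aligned}
\hat{f}^{PC}_{t,T} &= \left(\gamma'\gamma\right)^{-1}\gamma'y_{\cdot t} + O_p(T^{-1/2})\\
&= \frac{\mu_\gamma}{m^{-1}\gamma'\gamma}\,\bar{y}_t + \frac{m^{-1}\sum_{i=1}^{m}\eta_iy_{it}}{m^{-1}\gamma'\gamma} + O_p(T^{-1/2}).
\end{aligned}$$
and hence
$$\hat{f}^{PC}_{t,T} = \frac{\mu_\gamma}{\mu_\gamma^2 + \sigma_\gamma^2}\,\bar{y}_t + \frac{\sigma_\gamma^2}{\mu_\gamma^2 + \sigma_\gamma^2}\,f_t + O_p(m^{-1/2}) + O_p(T^{-1/2}).$$
$$\bar{y}_t = \mu_\gamma f_t + O_p(m^{-1/2}).$$
It is clear that when $\mu_\gamma \neq 0$, then $f_t$ and $\bar{y}_t$ will be perfectly correlated if m is sufficiently large
even if T is small. But when $\mu_\gamma = 0$, we have $\bar{y}_t = O_p(m^{-1/2})$ and, as noted earlier, $\bar{y}_t \to_p 0$,
and $f_t$ cannot be identified by $\bar{y}_t$. In this case $\hat{f}^{PC}_{t,T}$ identifies $f_t$ if $\sigma_\gamma^2 > 0$. Using the above results
it is easily seen that
$$\bar{y}_t = \mu_\gamma\hat{f}^{PC}_{t,T} + O_p(m^{-1/2}) + O_p(T^{-1/2}),$$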
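$$\begin{aligned}
\bar{y}_{\delta t} &= m^{-1}\sum_{i=1}^{m}\hat{\delta}_iy_{it} = m^{-1}\sum_{i=1}^{m}\hat{\delta}_i\left(\gamma_if_t + u_{it}\right)\\
&= a_T\left(m^{-1}\sum_{i=1}^{m}\gamma_i^2\right)f_t + a_T\left(m^{-1}\sum_{i=1}^{m}\gamma_iu_{it}\right)\\
&\quad + \left(T^{-1}\sum_{t=1}^{T}\bar{y}_t^2\right)^{-1}(mT)^{-1}\sum_{\tau=1}^{T}\sum_{i=1}^{m}\bar{y}_\tau u_{i\tau}u_{it},
\end{aligned}$$
where
$$a_T = \left(T^{-1}\sum_{t=1}^{T}\bar{y}_t^2\right)^{-1}\left(T^{-1}\sum_{t=1}^{T}\bar{y}_tf_t\right).$$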
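It is now clear that when T is fixed and $a_T \neq 0$, then $\bar{y}_{\delta t}$ becomes proportional to $f_t$ if
$$\lim_{m\to\infty}m^{-1}\sum_{i=1}^{m}\gamma_i^2 > 0, \tag{19.57}$$
$$\lim_{m\to\infty}m^{-1}\sum_{i=1}^{m}\gamma_iu_{it} = c, \tag{19.58}$$
$$\lim_{m\to\infty}(mT)^{-1}\sum_{\tau=1}^{T}\sum_{i=1}^{m}\bar{u}_\tau u_{i\tau}u_{it} = c_T, \tag{19.59}$$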
where c represents a generic constant. Condition (19.57) is standard in the factor literature. Con-
dition (19.58) is less restrictive than assuming γ i and uit are uncorrelated, which is typically
assumed in the literature. Condition (19.59) is more complicated to relate to the literature, but
allows for weak cross-sectional dependence in the idiosyncratic errors. Note that by the Cauchy–Schwarz
inequality
$$E\left|m^{-1}\sum_{i=1}^{m}\bar{u}_\tau u_{i\tau}u_{it}\right| \leq \left[E\left(\bar{u}_\tau^2\right)\right]^{1/2}\left[E\left(m^{-1}\sum_{i=1}^{m}u_{i\tau}u_{it}\right)^2\right]^{1/2},$$
and condition (19.59) is met if $E(\bar{u}_\tau^2) < K$ and $E\left(m^{-1}\sum_{i=1}^{m}u_{i\tau}u_{it}\right)^2 < K$. These conditions
are satisfied if the $u_{it}$ have fourth-order moments and are weakly cross-correlated.
To investigate how quickly the correlation between $\bar{y}_t$ and $\hat{f}^{PC}_{t,T}$ tends to unity when m and
T → ∞, we carried out a limited number of Monte Carlo experiments using (19.53) as the
DGP, with $\gamma_i \sim IIDN(1, 1)$; $f_t \sim IIDN(0, 1)$; $u_{it} \sim IIDN(0, 1)$, for m and T = 30, 50, 100,
200, 1000. The squared pair-wise correlation coefficients of $f_t$, $\hat{f}^{PC}_{t,T}$, $\bar{y}_t$ and $\bar{y}_{\delta t}$, averaged across
2,000 replications, are summarized in the top part of the following Table 19.4.6 We have also
carried out experiments with spatially correlated errors generated as7
6 I would like to thank Alex Chudik for carrying out the Monte Carlo experiments reported in this sub-section.
7 For a discussion of spatial models, see Chapter 30.
$$u_t = a_uH_uu_t + e_t, \tag{19.60}$$
where the elements of $e_t$ are drawn as $IIDN(0, \sigma_e^2)$,
$$H_u = \begin{pmatrix}
0 & \tfrac{1}{2} & 0 & \cdots & 0 & 0\\
\tfrac{1}{2} & 0 & \tfrac{1}{2} & \cdots & 0 & 0\\
0 & \tfrac{1}{2} & 0 & \ddots & 0 & 0\\
\vdots & \vdots & \ddots & \ddots & \ddots & \vdots\\
0 & 0 & 0 & \cdots & 0 & \tfrac{1}{2}\\
0 & 0 & 0 & \cdots & \tfrac{1}{2} & 0
\end{pmatrix},$$
the spatial autoregressive parameter is set to $a_u = 0.6$, and $\sigma_e^2$ is set to ensure that
$m^{-1}\sum_{i=1}^{m}\mathrm{Var}(u_{it}) = 1$. Experiments with spatially correlated errors are reported in the bottom
part of Table 19.4.
The results clearly show that all three estimators of ft are highly correlated with the unobserved
factor and this correlation is almost perfect for values of m above 100 (for all values of T) when
the idiosyncratic errors are independently distributed. But when the errors are weakly (spatially)
dependent then the value of m needed to get an almost perfect fit is around 200. It is also clear
that T does not matter and for a given m the correlations are hardly affected by increasing T.
Finally, although the simple average estimator, $\bar{y}_t$, performs well when m is sufficiently large, for
small values of m the weighted average estimator, $\bar{y}_{\delta t}$, is to be preferred. It is also interesting that
$\bar{y}_{\delta t}$ performs very similarly to the PC estimator, $\hat{f}^{PC}_{t,T}$, which also performs well even when T is
small.
Finally, we also carried out the same experiments but with $E(\gamma_i) = 0$. The results are summarized
in Table 19.5. As to be expected, the simple average estimator, $\bar{y}_t$, performs poorly.
However, an iterated version of the weighted average estimator, $\bar{y}_{\delta t}$, performs well and very
similarly to the PC estimator even if $E(\gamma_i) = 0$. The rth iterated estimator is computed as
$\bar{y}^{(r)}_{\delta t} = m^{-1}\sum_{i=1}^{m}\hat{\delta}^{(r)}_iy_{it}$, where $\hat{\delta}^{(r)}_i$ is the coefficient of $\bar{y}^{(r-1)}_{\delta t}$ in the OLS regression of $y_{it}$ on
$\bar{y}^{(r-1)}_{\delta t}$, with $\bar{y}^{(1)}_{\delta t} = \bar{y}_{\delta t}$. The results reported in Table 19.5 set r = 2. Further iterations did not
make much difference.
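A scaled-down version of these experiments can be sketched as follows, for IID errors only. This is an illustrative design with 200 replications and a single (m, T) pair, not a replication of Tables 19.4 and 19.5; the function names are hypothetical and the regression of $y_{it}$ on $\bar{y}_t$ is run without an intercept, in line with the expression for $a_T$.

```python
import numpy as np

def squared_corr(a, b):
    return np.corrcoef(a, b)[0, 1] ** 2

def one_replication(m, T, mean_gamma, rng):
    gamma = rng.normal(mean_gamma, 1.0, m)
    f = rng.standard_normal(T)
    Y = np.outer(f, gamma) + rng.standard_normal((T, m))   # y_it = gamma_i f_t + u_it
    y_bar = Y.mean(axis=1)                                  # simple cross-section average
    delta = Y.T @ y_bar / (y_bar @ y_bar)                   # OLS slopes of y_it on y_bar_t
    y_delta = Y @ delta / m                                 # weighted average estimator
    eigval, eigvec = np.linalg.eigh(Y.T @ Y / T)
    f_pc = Y @ eigvec[:, -1]                                # eigenvector of the largest eigenvalue
    return [squared_corr(f, est) for est in (f_pc, y_bar, y_delta)]

rng = np.random.default_rng(0)
m, T, reps = 100, 50, 200
# Average squared correlations with f_t, in the order (PC, simple average, weighted average).
res_mean1 = np.mean([one_replication(m, T, 1.0, rng) for _ in range(reps)], axis=0)
res_mean0 = np.mean([one_replication(m, T, 0.0, rng) for _ in range(reps)], axis=0)
```

With $E(\gamma_i) = 1$ all three averages should be close to one, whilst with $E(\gamma_i) = 0$ the simple average is expected to do much worse than the PC estimator.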
Table 19.4 Monte Carlo findings for squared correlations of the unobserved common factor and its estimates: Experiments with E(γi) = 1
(m/T) 30 50 100 200 1000 30 50 100 200 1000 30 50 100 200 1000
Experiments with IID idiosyncratic errors
30 98.16 98.21 98.25 98.25 98.27 96.28 96.31 96.41 96.36 96.42 98.16 98.21 98.25 98.25 98.27
50 98.92 98.97 98.97 98.97 98.98 97.87 97.93 97.89 97.92 97.91 98.92 98.97 98.97 98.97 98.98
100 99.46 99.49 99.50 99.49 99.49 98.93 98.96 98.98 98.97 98.98 99.46 99.49 99.50 99.49 99.49
200 99.73 99.74 99.75 99.75 99.75 99.47 99.49 99.49 99.50 99.49 99.73 99.74 99.75 99.75 99.75
1000 99.95 99.95 99.95 99.95 99.95 99.90 99.90 99.90 99.90 99.90 99.95 99.95 99.95 99.95 99.95
Experiments with spatially correlated idiosyncratic errors
30 96.34 96.35 96.43 96.46 96.49 89.54 89.32 89.59 89.67 89.84 96.14 96.18 96.28 96.32 96.37
50 97.77 97.85 97.87 97.87 97.88 93.44 93.68 93.71 93.71 93.65 97.68 97.78 97.82 97.83 97.84
100 98.88 98.91 98.94 98.94 98.95 96.70 96.80 96.82 96.83 96.84 98.85 98.89 98.92 98.93 98.94
200 99.44 99.46 99.46 99.47 99.47 98.33 98.38 98.39 98.41 98.41 99.43 99.45 99.46 99.47 99.47
1000 99.89 99.89 99.89 99.89 99.89 99.67 99.68 99.68 99.68 99.68 99.89 99.89 99.89 99.89 99.89
Notes: $\hat{f}^{PC}_{t,T}$ is the principal component estimator of $f_t$, $\bar{y}_t = m^{-1}\sum_{i=1}^{m}y_{it}$, and $\bar{y}_{\delta t} = m^{-1}\sum_{i=1}^{m}\hat{\delta}_iy_{it}$, where $\hat{\delta}_i$ is given by a regression of $y_{it}$ on $\bar{y}_t$. DGP is
$y_{it} = \gamma_if_t + u_{it}$, for i = 1, 2, . . . , m, t = 1, 2, . . . , T, where $\gamma_i \sim IIDN(1, 1)$, $f_t \sim IIDN(0, 1)$, and errors are generated either as $u_{it} \sim IIDN(0, 1)$ (top panel),
or from a spatial autoregressive (SAR) process with SAR parameter 0.6. $\rho(x_t, y_t)$ denotes correlation between $x_t$ and $y_t$. Findings in this table are based on R = 2000
Monte Carlo replications.
Table 19.5 Monte Carlo findings for squared correlations of the unobserved common factor and its estimates: Experiments with E(γi) = 0
(m/T) 30 50 100 200 1000 30 50 100 200 1000 30 50 100 200 1000 30 50 100 200 1000
Experiments with IID idiosyncratic errors
30 96.24 96.40 96.49 96.52 96.60 36.04 35.19 34.18 34.23 34.09 87.48 89.17 90.51 91.53 92.59 95.09 95.75 96.03 96.31 96.42
50 97.80 97.85 97.91 97.95 97.97 36.58 35.00 34.36 34.71 35.06 91.01 92.63 94.04 94.59 95.36 97.22 97.26 97.61 97.84 97.94
100 98.92 98.96 98.97 98.97 98.99 35.94 35.18 35.56 34.00 34.67 92.62 94.91 95.90 96.82 97.57 98.21 98.81 98.82 98.93 98.95
200 99.45 99.48 99.49 99.49 99.50 35.85 35.05 35.31 34.84 35.39 94.09 96.27 97.34 97.92 99.03 99.17 99.42 99.46 99.49 99.50
1000 99.89 99.90 99.90 99.90 99.90 37.52 35.16 34.44 35.05 34.98 96.16 96.95 97.98 99.26 99.63 99.74 99.85 99.86 99.90 99.90
Experiments with spatially correlated idiosyncratic errors
30 95.86 96.01 96.18 96.25 96.28 20.48 19.83 19.17 18.11 18.22 72.62 74.44 75.83 75.73 76.18 90.40 91.86 92.72 92.95 93.13
50 97.66 97.77 97.79 97.82 97.85 20.87 19.94 19.12 19.03 18.31 79.42 82.80 83.92 83.50 84.87 94.84 96.07 96.44 96.24 96.96
100 98.84 98.92 98.95 98.95 98.97 20.67 18.75 18.72 18.07 18.44 85.62 87.91 89.80 89.98 92.57 97.57 98.14 98.23 98.58 98.62
200 99.43 99.46 99.48 99.49 99.49 20.72 19.56 18.67 17.96 18.60 89.56 91.30 92.94 94.84 96.21 98.75 98.93 99.22 99.34 99.46
1000 99.89 99.89 99.90 99.90 99.90 20.28 19.32 19.20 18.64 18.32 91.54 94.97 96.61 97.70 99.02 99.40 99.75 99.80 99.88 99.89
Notes: $\hat{f}^{PC}_{t,T}$ is the principal component estimator of $f_t$, $\bar{y}_t = m^{-1}\sum_{i=1}^{m}y_{it}$, and $\bar{y}_{\delta t} = m^{-1}\sum_{i=1}^{m}\hat{\delta}_iy_{it}$, where $\hat{\delta}_i$ is given by a regression of $y_{it}$ on $\bar{y}_t$. The rth iterated estimator is
computed as $\bar{y}^{(r)}_{\delta t} = m^{-1}\sum_{i=1}^{m}\hat{\delta}^{(r)}_iy_{it}$, where $\hat{\delta}^{(r)}_i$ is the coefficient of $\bar{y}^{(r-1)}_{\delta t}$ in the OLS regression of $y_{it}$ on $\bar{y}^{(r-1)}_{\delta t}$, with $\bar{y}^{(1)}_{\delta t} = \bar{y}_{\delta t}$. DGP is $y_{it} = \gamma_if_t + u_{it}$, for i = 1, 2, . . . , m,
t = 1, 2, . . . , T, where $\gamma_i \sim IIDN(0, 1)$, $f_t \sim IIDN(0, 1)$, and errors are generated either as $u_{it} \sim IIDN(0, 1)$ (top panel), or from a spatial autoregressive (SAR) process with
SAR parameter 0.6. $\rho(x_t, y_t)$ denotes correlation between $x_t$ and $y_t$. Findings in this table are based on R = 2000 Monte Carlo replications.
where
$$V\left(h, \hat{F}^{(h)}\right) = \min_{\Gamma}\frac{1}{mT}\sum_{i=1}^{m}\sum_{t=1}^{T}\left(y_{it} - \gamma_i^{(h)\prime}\hat{f}_t^{(h)}\right)^2,$$
$\gamma_i^{(h)} = (\gamma_{i1}, \gamma_{i2}, \ldots, \gamma_{ih})'$ and $\hat{f}_t^{(h)} = (\hat{f}_{t1}, \hat{f}_{t2}, \ldots, \hat{f}_{th})'$, where factors are estimated by
principal components, and g(.) is a penalty function for over-fitting, satisfying the following
conditions
(a) : $g(m, T) \to 0$, as m, T → ∞,
(b) : $C_{mT}^2 \times g(m, T) \to \infty$, as m, T → ∞,
with $C_{mT} = \min\left(\sqrt{m}, \sqrt{T}\right)$. The authors prove that, under some regularity conditions, the criteria
PC(h) and IC(h) will consistently estimate k. Bai and Ng (2002) also propose the following
specific formulations of g(m, T)
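$$\begin{aligned}
PC_{p1}(h) &= V\left(h, \hat{F}^{(h)}\right) + h\hat{\sigma}^2\left(\frac{m+T}{mT}\right)\ln\left(\frac{mT}{m+T}\right),\\
PC_{p2}(h) &= V\left(h, \hat{F}^{(h)}\right) + h\hat{\sigma}^2\left(\frac{m+T}{mT}\right)\ln C_{mT}^2,\\
PC_{p3}(h) &= V\left(h, \hat{F}^{(h)}\right) + h\hat{\sigma}^2\left(\frac{\ln C_{mT}^2}{C_{mT}^2}\right),
\end{aligned}$$
where $\hat{\sigma}^2 = (mT)^{-1}\sum_{i=1}^{m}\sum_{t=1}^{T}e_{it}^2$, and
$$\begin{aligned}
IC_{p1}(h) &= \ln V\left(h, \hat{F}^{(h)}\right) + h\left(\frac{m+T}{mT}\right)\ln\left(\frac{mT}{m+T}\right),\\
IC_{p2}(h) &= \ln V\left(h, \hat{F}^{(h)}\right) + h\left(\frac{m+T}{mT}\right)\ln C_{mT}^2,\\
IC_{p3}(h) &= \ln V\left(h, \hat{F}^{(h)}\right) + h\left(\frac{\ln C_{mT}^2}{C_{mT}^2}\right).
\end{aligned}$$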
In practice, Bai and Ng suggest replacing $\hat{\sigma}^2$ with $V\left(k_{\max}, \hat{F}^{(k_{\max})}\right)$, where $k_{\max}$ is the maximum
number of selected factors. Note that, in the IC criteria, scaling by $\hat{\sigma}^2$ is implicitly performed by
the logarithmic transformation of $V\left(h, \hat{F}^{(h)}\right)$ and is thus not required in the penalty function.
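As an illustration, the $IC_{p2}$ criterion can be sketched as below. The function name `bai_ng_icp2`, the restriction of the search to h = 1, ..., kmax, and the simulated two-factor panel are assumptions made for this example; the factors are estimated by principal components with the scaling $T^{-1}\hat{F}'\hat{F} = I_h$.

```python
import numpy as np

def bai_ng_icp2(Y, kmax):
    """Select the number of factors by minimizing IC_p2(h) over h = 1, ..., kmax."""
    T, m = Y.shape
    eigval, eigvec = np.linalg.eigh(Y @ Y.T / T)
    vecs = eigvec[:, np.argsort(eigval)[::-1]]       # eigenvectors, largest eigenvalue first
    c2 = min(m, T)                                   # C_mT^2 = min(m, T)
    penalty = (m + T) / (m * T) * np.log(c2)
    ic = []
    for h in range(1, kmax + 1):
        F = np.sqrt(T) * vecs[:, :h]                 # PC factor estimates with F'F/T = I_h
        resid = Y - F @ (F.T @ Y) / T                # Y - F Gamma', with Gamma = Y'F/T
        V = np.sum(resid ** 2) / (m * T)             # V(h, F_hat)
        ic.append(np.log(V) + h * penalty)
    return int(np.argmin(ic)) + 1, np.array(ic)

# Hypothetical panel with two strong factors.
rng = np.random.default_rng(0)
T, m, k_true = 120, 120, 2
F = rng.standard_normal((T, k_true))
G = rng.standard_normal((m, k_true))
Y = F @ G.T + 0.5 * rng.standard_normal((T, m))
k_hat, ic_values = bai_ng_icp2(Y, kmax=8)
```

The other criteria differ only in the penalty term, so they could be obtained by changing the line that defines `penalty`.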
In a Monte Carlo exercise, Onatski (2010) shows that Bai and Ng (2002) information criteria
perform rather poorly, unless m and T are quite large. Further, the performance of these criteria
deteriorates considerably as the variances of the idiosyncratic components increase, or when
such components are cross-sectionally (weakly) correlated. In particular, Onatski observes an
overestimation of the number of factors when the idiosyncratic errors are contemporaneously
correlated. One explanation for this result is that, in this case, some linear combinations of the
idiosyncratic errors may have a non-trivial effect on a sizeable portion of the data. Hence, the
explanatory power of such linear combination rises and Bai and Ng (2002) criteria have difficulty
in distinguishing these linear combinations from ft .
Onatski (2010) proposes an estimator of the number of factors based on the empirical distribution
of eigenvalues of the sample covariance matrix. Let $\lambda_i$ be the ith largest eigenvalue of
$T^{-1}YY'$, and consider
$$\hat{k}_\delta = \#\left\{i \leq m : \lambda_i > (1 + \delta)\hat{v}\right\},$$
where δ is a positive scalar, and $\hat{v} = w\lambda_{k_{\max}} + (1 - w)\lambda_{2k_{\max}+1}$, $w = 2^{2/3}/\left(2^{2/3} - 1\right)$. Onatski
(2010) proves that $\hat{k}_\delta$ is consistent for k when $\delta \sim m^{-\alpha}$ for any scalar α satisfying a set of
conditions. See Onatski (2010) for details.
Factor models are used extensively in panel data models to characterize strong cross-sectional
dependence. See Chapter 29.
where $y_t = (y_{1t}, y_{2t}, \ldots, y_{m_y,t})'$, $x_t = (x_{1t}, x_{2t}, \ldots, x_{m_x,t})'$, and $\alpha_{(i)}$ and $\gamma_{(j)}$ are the associated
$m_y \times 1$ and $m_x \times 1$ loading vectors, respectively. The first canonical correlation of $y_t$ and
$x_t$ is given by those values $\alpha_{(1)}$ and $\gamma_{(1)}$ that maximize the correlation of $u_{1t}$ and $v_{1t}$. These
variables are known as canonical variates. The second canonical correlation refers to $\alpha_{(2)}$ and
$\gamma_{(2)}$ such that $u_{2t}$ and $v_{2t}$ have maximum correlation subject to the restriction that they are
uncorrelated with $u_{1t}$ and $v_{1t}$. The loadings are typically normalized so that the canonical variates
have unit variances, namely $\alpha'\Sigma_{yy}\alpha = 1$, and $\gamma'\Sigma_{xx}\gamma = 1$. The optimization problem
can be set as
$$\max_{\alpha,\gamma}\left\{\alpha'\Sigma_{yx}\gamma - \frac{1}{2}\rho_1\left(\alpha'\Sigma_{yy}\alpha - 1\right) - \frac{1}{2}\rho_2\left(\gamma'\Sigma_{xx}\gamma - 1\right)\right\},$$
where $\Sigma_{yx}$ is the population covariance matrix of $y_t$ and $x_t$, $\Sigma_{yy}$ and $\Sigma_{xx}$ are the population
variance matrices of $y_t$ and $x_t$, respectively, and $\rho_1$ and $\rho_2$ are Lagrange multipliers. The first-order
conditions for this optimization are given by
$$\Sigma_{yx}\gamma - \rho_1\Sigma_{yy}\alpha = 0,$$
$$\Sigma_{xy}\alpha - \rho_2\Sigma_{xx}\gamma = 0.$$
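Now assuming that $\Sigma_{xx}$ and $\Sigma_{yy}$ are nonsingular, using standard results on the determinant of
partitioned matrices we have (see Section A.9 in Appendix A)
$$\begin{aligned}
\begin{vmatrix} -\rho\Sigma_{yy} & \Sigma_{yx}\\ \Sigma_{xy} & -\rho\Sigma_{xx} \end{vmatrix}
&= \left|\Sigma_{yy}\right|\left|\rho^2\Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}\right|\\
&= \left|\Sigma_{xx}\right|\left|\rho^2\Sigma_{yy} - \Sigma_{yx}\Sigma_{xx}^{-1}\Sigma_{xy}\right| = 0.
\end{aligned}$$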
$$S_{yxy} = S_{yy}^{-1}S_{yx}S_{xx}^{-1}S_{xy}, \quad \text{if } m_y \leq m_x,$$
and
$$S_{xyx} = S_{xx}^{-1}S_{xy}S_{yy}^{-1}S_{yx}, \quad \text{if } m_x < m_y,$$
and let $\rho_1^2 \geq \rho_2^2 \geq \ldots \geq \rho_{m_y}^2 \geq 0$ be the eigenvalues of $S_{yxy}$. Then the kth squared canonical
correlation of Y and X is given by the kth largest eigenvalue of matrix $S_{yxy}$, $\rho_k^2$. These coefficients
measure the strength of the overall relationships between the two canonical variates, or weighted
sums of Y and X.
The canonical variates, $u_{kt}$ and $v_{kt}$, associated with the kth squared canonical correlation, $\rho_k^2$,
are given by
where
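$$\begin{pmatrix} -\rho_kS_{yy} & S_{yx}\\ S_{xy} & -\rho_kS_{xx} \end{pmatrix}\begin{pmatrix} \alpha_{(k)}\\ \gamma_{(k)} \end{pmatrix} = 0.$$
and hence $\alpha_{(k)}$ can be computed as the eigenvector associated with the kth largest root of
$S_{yxy} = S_{yy}^{-1}S_{yx}S_{xx}^{-1}S_{xy}$, and $\gamma_{(k)}$ can be computed as the eigenvector associated with the kth
largest root of $S_{xyx} = S_{xx}^{-1}S_{xy}S_{yy}^{-1}S_{yx}$. These eigenvectors are normalized such that
$\alpha_{(k)}'S_{yy}\alpha_{(k)} = 1$, $\gamma_{(k)}'S_{xx}\gamma_{(k)} = 1$, and $\alpha_{(k)}'S_{yx}\gamma_{(k)} = \rho_k$.
$$T \times \mathrm{Trace}\left(S_{yxy}\right) \overset{a}{\sim} \chi^2_{(m_y-1)(m_x-1)}.$$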
The above analysis can be extended to control for a third set of variables that might influence Y
and X. Consider the $T \times m_z$ observation matrix Z, and suppose that $T > \max(m_y, m_x, m_z)$.
Let $M_z = I_T - Z(Z'Z)^{-1}Z'$. Compute
$$\hat{Y} = M_zY, \quad \hat{X} = M_zX.$$
if my ≤ mx and
if my > mx .
Similarly, the covariates in this case are defined by
where α (k) is the eigenvector of Sŷx̂ŷ associated with its kth largest eigenvalue, and γ (k) is the
eigenvector of Sx̂ŷx̂ associated with its kth largest eigenvalue. Note that by construction
$$\mathrm{Corr}(u_{kt}, v_{kt}) = \rho_k, \quad \mathrm{Var}(u_{kt}) = \mathrm{Var}(v_{kt}) = 1, \quad \text{for } k = 1, 2, \ldots, \min(m_y, m_x).$$
See Anderson (2003, Ch. 12) for further details.
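The sample canonical correlations can be computed directly from the eigenvalues of $S_{yxy}$, as in the following sketch. The function name is hypothetical, and demeaning the data before forming the moment matrices is an assumption made here. For simplicity $S_{yxy}$ is used whatever the relative sizes of $m_y$ and $m_x$; when $m_y > m_x$ the extra eigenvalues are zero and are discarded.

```python
import numpy as np

def canonical_correlations(Y, X):
    """Sample canonical correlations between Y (T x my) and X (T x mx), largest first."""
    T = Y.shape[0]
    Yc, Xc = Y - Y.mean(axis=0), X - X.mean(axis=0)        # demeaning is an assumption
    Syy, Sxx, Syx = Yc.T @ Yc / T, Xc.T @ Xc / T, Yc.T @ Xc / T
    Syxy = np.linalg.solve(Syy, Syx) @ np.linalg.solve(Sxx, Syx.T)
    eigval = np.linalg.eigvals(Syxy).real                  # squared canonical correlations
    rho2 = np.sort(np.clip(eigval, 0.0, 1.0))[::-1]        # guard against rounding error
    return np.sqrt(rho2[: min(Y.shape[1], X.shape[1])])

# Hypothetical data: the first column of Y is related to X, the second is pure noise.
rng = np.random.default_rng(0)
T = 300
X = rng.standard_normal((T, 3))
Y = np.column_stack([X[:, 0] + 0.5 * rng.standard_normal(T), rng.standard_normal(T)])
rho = canonical_correlations(Y, X)

# When Y is an exact linear function of X all canonical correlations equal one.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
rho_exact = canonical_correlations(X @ A, X)
```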
Y = XB + U, (19.62)
where r is an integer. The above rank restriction has the interpretation that fewer than m linear
combinations of the X variables are relevant to the explanation of the dependent variables (Tso
(1981)). Under the reduced rank hypothesis (19.63), the coefficient matrix B can be expressed
as the product of two matrices of lower dimensions, namely B = CD, with C and D having
dimensions km × r and r × m respectively, so that model (19.62) can now be written as
Y = XCD + U.
Under the rank deficiency condition, the OLS method is not valid since it ignores the cross-
equation restrictions on the elements of B imposed by the rank deficiency. This is clarified in the
following example.
Given that Rank (B) = 1, the determinant of B is 0, and we have the following nonlinear restriction
on the elements of B : β 11 β 22 − β 12 β 21 = 0.
The log-likelihood function of (19.62) is given by (see Anderson (1951), Tso (1981))
$$\ell(\theta) = -\frac{Tm}{2}\log(2\pi) - \frac{T}{2}\log|\Sigma| - \frac{1}{2}\mathrm{Tr}\left(U\Sigma^{-1}U'\right), \tag{19.64}$$
with $\theta = \left(\mathrm{vec}(C)', \mathrm{vec}(D)', \mathrm{vech}(\Sigma)'\right)'$, and $U = Y - XCD$. The maximum likelihood estimator
of Σ conditional on C and D is given by
which, if substituted in (19.64), reduces the problem of maximizing (19.64) to the problem of
finding the minimum of
$$q(C, D) = T^{-1}\left|(Y - XCD)'(Y - XCD)\right|. \tag{19.66}$$
We observe that the above optimization problem does not lead to a unique solution for C and
D. In fact, for any r × r nonsingular matrix, G,
$$B = CD = (CG)\left(G^{-1}D\right) = C^*D^*,$$
with $C^* = CG$, and $D^* = G^{-1}D$, and therefore $q(C, D) = q(C^*, D^*)$. It follows that $r^2$
identifying restrictions are needed. Tso (1981) suggests the following restrictions
$$P'P = I_r, \quad \text{where} \quad \underset{T \times r}{P} = \underset{T \times (mk)}{X}\ \underset{(mk) \times r}{C}, \tag{19.67}$$
Noting that
$$Y - PD = \left(I_T - PP'\right)Y + P\left(P'Y - D\right),$$
Given that when $D = P'Y = C'X'Y$, (19.69) attains its minimum, we are only left with the problem
of minimizing the following expression
$$\tilde{q}(P) = T^{-1}\left|Y'\left(I_T - PP'\right)Y\right|,$$
over all matrices P satisfying (19.67). Assume that X and Y are full column ranks and consider
the following decompositions of X and Y (see Section A.2 in Appendix A for a description of
matrix decompositions)
where R and V are T × km and T × m orthogonal matrices, and S and Q are km × km and m × m
invertible matrices. Noting that, given (19.70), P = RSC = RF with F = SC being a km × r
matrix such that $F'F = I_r$, we have
$$\begin{aligned}
\tilde{q}(P) = \tilde{q}(RF) &= T^{-1}\left|Q'V'\left(I_T - RFF'R'\right)VQ\right|\\
&= T^{-1}|Q|^2\left|I_m - H'FF'H\right| = T^{-1}|Q|^2\left|F'\left(I_{km} - HH'\right)F\right|,
\end{aligned} \tag{19.71}$$
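$$\lambda_k\left(HH'\right) = \lambda_k\left(R'VV'R\right) = \lambda_k\left[S\left(S'S\right)^{-1}S'R'VQ\left(Q'Q\right)^{-1}Q'V'R\right],$$
$$\lambda_k\left(HH'\right) = \lambda_k\left[\left(S'R'RS\right)^{-1}\left(S'R'VQ\right)\left(Q'V'VQ\right)^{-1}\left(Q'V'RS\right)\right].$$
$$S'R'RS = X'X, \quad S'R'VQ = X'Y, \quad Q'V'VQ = Y'Y,$$
and hence
$$\lambda_k\left(HH'\right) = \lambda_k\left(S_{xx}^{-1}S_{xy}S_{yy}^{-1}S_{yx}\right),$$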
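where $S_{xx}$, $S_{xy}$, and $S_{yy}$ are defined by (19.61). It follows that $\lambda_k(HH')$ corresponds to the
kth largest squared canonical correlation between the variables in Y and X (see Section 19.6).
Estimates of C are given by $\hat{C} = S^{-1}F$, while estimates of D and Σ, in terms of $\hat{C}$, can be
obtained as
$$\hat{D} = \left(\hat{C}'X'X\hat{C}\right)^{-1}\left(X\hat{C}\right)'Y = \left(\hat{C}'S_{xx}\hat{C}\right)^{-1}\hat{C}'S_{xy},$$
$$\hat{\Sigma} = T^{-1}\left(Y - X\hat{C}\hat{D}\right)'\left(Y - X\hat{C}\hat{D}\right).$$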
See Anderson (1951) and Tso (1981) for further details. As we will see in Chapter 22, the RRR
method is particularly useful in the analysis of cointegrated variables (see also Johansen (1991)).
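The reduced rank regression steps described in this section can be sketched numerically. The following is a minimal illustration, not the author's code: it assumes NumPy is available, uses the QR factorization as the decomposition of $X$ and $Y$, and takes $F$ to be the eigenvectors of $HH'$ associated with its $r$ largest eigenvalues. The function name `reduced_rank_regression` is hypothetical.

```python
import numpy as np

def reduced_rank_regression(Y, X, r):
    """Sketch of RRR for Y = X C D + U under the normalization (XC)'(XC) = I_r."""
    T = Y.shape[0]
    R, S = np.linalg.qr(X)            # X = R S, with R'R = I
    V, Q = np.linalg.qr(Y)            # Y = V Q, with V'V = I
    H = R.T @ V                       # km x m matrix H = R'V
    eigval, eigvec = np.linalg.eigh(H @ H.T)   # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1]           # re-order: largest first
    F = eigvec[:, order[:r]]          # F'F = I_r by construction
    C_hat = np.linalg.solve(S, F)     # C = S^{-1} F
    D_hat = C_hat.T @ X.T @ Y         # D = P'Y because P'P = I_r, with P = X C
    U = Y - X @ C_hat @ D_hat
    Sigma_hat = U.T @ U / T
    return C_hat, D_hat, Sigma_hat, eigval[order]
```

The returned eigenvalues can be compared with the squared canonical correlations between $Y$ and $X$, and with $r = m \leq km$ the product $\hat{C}\hat{D}$ should coincide with the unrestricted least squares estimate.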
19.9 Exercises
1. Consider the following system of regression equations
yi = Xi β i + ui , for i = 1, 2, . . . , m,
y = Wβ + u,
(a) Using the information available to you, derive the instrumental variable (IV) estimator
of β
i. when s = k,
and
ii. when s > k.
(b) Derive necessary and sufficient conditions under which the IV estimators are consistent
and asymptotically efficient with respect to the available set of instruments.
(c) What is a suitable statistic for testing the validity of the IV estimators when s > k?
Comment on the usefulness of such a test.
where $y_t = (y_{1t}, y_{2t})'$ is the $2 \times 1$ vector of endogenous variables, and $x_t$ is the only exogenous variable of the model. The $2 \times 1$ vector of errors $u_t = (u_{1t}, u_{2t})'$ is serially uncorrelated with mean zero and the positive definite covariance matrix
$$\mathrm{Cov}(u_t) = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix}.$$
(a) Discuss the conditions under which $\alpha$ is identified. Can $\beta$ be identified as well?
(b) Show that the structural model has the following reduced form representation (assuming that $\alpha\beta \neq 1$)
$$y_{1t} = \frac{\alpha\theta}{1 - \alpha\beta}\,x_t + \frac{u_{1t} + \alpha u_{2t}}{1 - \alpha\beta},$$
$$y_{2t} = \frac{\theta}{1 - \alpha\beta}\,x_t + \frac{\beta u_{1t} + u_{2t}}{1 - \alpha\beta}.$$
(c) Show that the OLS estimator of $\alpha$ based on the observations $\{y_t, x_t;\ t = 1, 2, \ldots, T\}$ is biased. Under what conditions does this bias vanish asymptotically (as $T \to \infty$)?
(d) Consider now the IV (or two-stage) estimator of $\alpha$
$$\hat{\alpha}_{IV} = \left(\sum_{t=1}^{T} x_t y_{2t}\right)^{-1} \sum_{t=1}^{T} x_t y_{1t}.$$
$$y_{it} = \sum_{j=1}^{k} \gamma_{ij} f_{jt} + u_{it}, \quad \text{for } i = 1, 2, \ldots, m, \text{ and } t = 1, 2, \ldots, T, \tag{19.72}$$
where the factors, fjt , j = 1, 2, . . . , k are mutually uncorrelated with unit variances, and dis-
tributed independently of uit .
Var(yt ) = + u ,
5. Consider the multifactor model given by (19.72) and suppose that $y_{it}$ denotes the return on security $i$ during period $t$. Consider the portfolio return $\rho_{\omega t} = \omega' y_t$, where $\omega = (\omega_1, \omega_2, \ldots, \omega_m)'$ is a vector of granular weights such that $\omega_i = O(1/m)$.
20 Multivariate Rational Expectations Models
20.1 Introduction
All economic and financial decisions are subject to major uncertainties. How to model
uncertainty and expectations has been controversial, although since the pioneering contri-
butions of Muth, Lucas, and Sargent, the rational expectations hypothesis (REH) has come to
dominate economics and finance as the favoured approach to expectations formation. According
to the REH, subjective characterization of uncertainty as conditional probability distributions
will coincide (through learning) with the associated objective outcomes. The REH is mathe-
matically elegant and allows model-consistent solutions, and fits nicely within the equilibrium
economic theory. Almost all dynamic stochastic general equilibrium (DSGE) models used in
macroeconomics and finance are solved under the REH. It is with this in mind that we devote
this chapter to the solution, identification, and estimation of rational expectations models. But
readers should be aware of the limitations of the REH, as set out in Pesaran (1987c).
We begin with an overview of solution techniques, distinguishing between RE models with
and without feedbacks from the decision (or target) variables to the state variables. We also con-
sider models with and without lagged values of the decision variables. In the case of RE models
with feedbacks we argue that it is best to cast the RE models as a closed dynamic system before
solving them. We then consider identification of structural parameters of DSGE models and esti-
mation of RE models in general.
and we obtain
Similarly,
$$y_t = A^h E(y_{t+h}|\Omega_t) + \sum_{j=0}^{h-1} A^j E(w_{t+j}|\Omega_t). \tag{20.2}$$
A unique solution exists if it is possible to eliminate the effect of future expectations, $E(y_{t+h}|\Omega_t)$, on $y_t$. Suppose that all eigenvalues of $A$ are distinct and consider the spectral decomposition of $A$ given by $A = PDP^{-1}$, where $D$ is a diagonal matrix formed from the eigenvalues of $A$, the columns of the matrix $P$ are formed from the associated eigenvectors of $A$, and $A^h = PD^hP^{-1}$.¹ Using this decomposition, (20.2) can be written as
$$\tilde{y}_t = D^h E(\tilde{y}_{t+h}|\Omega_t) + \sum_{j=0}^{h-1} D^j E(\tilde{w}_{t+j}|\Omega_t),$$
where $\tilde{y}_t = P^{-1}y_t = (\tilde{y}_{1t}, \tilde{y}_{2t}, \ldots, \tilde{y}_{mt})'$, and $\tilde{w}_t = P^{-1}w_t$. Hence,
$$\tilde{y}_{it} = \lambda_i^h E(\tilde{y}_{i,t+h}|\Omega_t) + \sum_{j=0}^{h-1} \lambda_i^j E(\tilde{w}_{i,t+j}|\Omega_t), \quad \text{for } i = 1, 2, \ldots, m,$$
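where $\lambda_i$, $i = 1, 2, \ldots, m$, are the distinct eigenvalues of $A$. It is now clear that if all eigenvalues of $A$ have an absolute value smaller than unity (namely $|\lambda_i| < 1$), then as $h \to \infty$, $\lambda_i^h \to 0$, and the solution to $\tilde{y}_{it}$ will not depend on the future expectations of $y_t$ so long as for all $h$ the future expectations, $E(\tilde{y}_{i,t+h}|\Omega_t)$, are bounded or satisfy the transversality conditions
$$\lim_{h \to \infty} \lambda_i^h E(\tilde{y}_{i,t+h}|\Omega_t) = 0, \quad \text{for } i = 1, 2, \ldots, m. \tag{20.3}$$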
Finally, assuming that the process of the forcing variables is stable, the unique solution of (20.1) is given by
$$y_t = \sum_{j=0}^{\infty} A^j E(w_{t+j}|\Omega_t). \tag{20.4}$$
This solution does not require the wt process to be stationary and allows the forcing variables
to contain unit roots. For example, suppose that wt follows the first-order vector autoregressive
process
1 In the case where one or more eigenvalues of A are the same, one needs to use the Jordan form where the diagonal
matrix D is replaced by an upper (lower) triangular matrix having eigenvalues of A on its main diagonal.
$$w_t = \Phi w_{t-1} + v_t,$$
where $v_t$ are serially uncorrelated innovations. For this process, $E(w_{t+j}|\Omega_t) = \Phi^j w_t$, and assuming that all eigenvalues of $A$ lie inside the unit circle we have
$$y_t = \left(\sum_{j=0}^{\infty} A^j \Phi^j\right) w_t. \tag{20.5}$$
It is now easily seen that for a finite $m$, the solution exists if the product of the largest eigenvalues of $A$ and $\Phi$ lies strictly within the unit circle. Therefore, one or more eigenvalues of $\Phi$ could be equal to unity if all the eigenvalues of $A$ are less than unity in absolute value.
In cases where one or more eigenvalues of A lie on or outside the unit circle, the solution to
the RE model is not unique and depends on arbitrary martingale processes. In the extreme case
where all eigenvalues of A fall outside the unit circle the general solution can be written as
t−1
yt = A−t mt − A−j wt−j , for t ≥ 1,
j=0
where mt is a martingale vector process with m arbitrary martingale components such that
E (mt+1 |t ) = mt (see Section 15.3.1). In the more general case where m1 of the roots of A
fall on or outside the unit circle and the rest fall inside, the solution will depend on m1 arbitrary
martingale processes.
$$y_t = Gw_t,$$
$$Gw_t = AG\Phi w_t + w_t,$$
and the unknown coefficient matrix, $G$, must satisfy the system of equations (known as 'Sylvester equations')
$$G = AG\Phi + I_m, \tag{20.6}$$
where $I_m$ is an identity matrix of order $m$. To obtain the solution (20.5), matrix $G$ can be solved in terms of $A$ and $\Phi$ iteratively. Consider the recursive system of equations with $G^{(0)} = I_m$
where $\mathrm{vec}(A)$ denotes a vector composed of the stacked columns of $A$. But (see, e.g., Magnus and Neudecker (1999, p. 30, Theorem 2))
$$\mathrm{vec}(AG\Phi) = (\Phi' \otimes A)\,\mathrm{vec}(G)$$
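The two routes to $G$ mentioned here, the recursion started at $G^{(0)} = I_m$ and the vec/Kronecker solution of (20.6), can be compared numerically. The sketch below is illustrative only: it assumes NumPy, assumes the recursion takes the form $G^{(s)} = AG^{(s-1)}\Phi + I_m$, and uses hypothetical matrices in which $\Phi$ has a unit root while all eigenvalues of $A$ lie inside the unit circle.

```python
import numpy as np

def solve_G_vec(A, Phi):
    """Solve G = A G Phi + I via vec(G) = [I - (Phi' kron A)]^{-1} vec(I)."""
    m = A.shape[0]
    K = np.kron(Phi.T, A)
    # Column-major flattening reproduces the vec operator (stacked columns)
    vec_G = np.linalg.solve(np.eye(m * m) - K, np.eye(m).flatten(order="F"))
    return vec_G.reshape((m, m), order="F")

def solve_G_iter(A, Phi, n_iter=500):
    """Iterate the assumed recursion G_s = A G_{s-1} Phi + I, starting from G_0 = I."""
    m = A.shape[0]
    G = np.eye(m)
    for _ in range(n_iter):
        G = A @ G @ Phi + np.eye(m)
    return G

# Hypothetical example: Phi has eigenvalues 1.0 and 0.7, A has 0.5 and 0.4
A = np.array([[0.5, 0.1], [0.0, 0.4]])
Phi = np.array([[1.0, 0.0], [0.2, 0.7]])
G_vec = solve_G_vec(A, Phi)
G_iter = solve_G_iter(A, Phi)
print(np.max(np.abs(G_vec - G_iter)))
```

Both approaches should agree whenever all products of an eigenvalue of $A$ with an eigenvalue of $\Phi$ lie inside the unit circle, which is the case in this example despite the unit root in $\Phi$.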
The above solution strategy can be readily extended to the case where wt follows higher-order
processes, or when wt contains serially correlated unobserved components, as in the following
example.
$$x_t = \Phi_1 x_{t-1} + \Phi_2 x_{t-2} + v_t,$$
and
$$u_t = Ru_{t-1} + \eta_t,$$
where $v_t$ and $\eta_t$ are serially uncorrelated with zero means. Using (20.4)
2 This happens because the eigenvalues of Kronecker products of two matrices are given by the products of their
respective eigenvalues.
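$$y_t = \sum_{j=0}^{\infty} A^j B E(x_{t+j}|\Omega_t) + \left(\sum_{j=0}^{\infty} A^j R^j\right) u_t.$$
Since $x_t$ is a second-order process then the unique solution (when it exists) will have the general form
$$y_t = G_1 x_t + G_2 x_{t-1} + Hu_t.$$
Substituting this result in (20.1) and equating the relevant coefficient matrices we obtain
$$G_1 = AG_1\Phi_1 + AG_2 + B,$$
$$G_2 = AG_1\Phi_2,$$
$$H = AHR + I_m,$$
so that, with $k$ denoting the dimension of $x_t$,
$$\mathrm{vec}(G_1) = \left[I_{mk} - (\Phi_1' \otimes A) - (\Phi_2' \otimes A^2)\right]^{-1} \mathrm{vec}(B),$$
$$\mathrm{vec}(G_2) = (\Phi_2' \otimes A)\,\mathrm{vec}(G_1),$$
$$\mathrm{vec}(H) = \left[I_{m^2} - (R' \otimes A)\right]^{-1} \mathrm{vec}(I_m).$$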
For further details on alternative methods of solving RE model with strictly exogenous vari-
ables, see Pesaran (1981b) and Pesaran (1987c). See also Whiteman (1983), Salemi (1986) and
Uhlig (2001).
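As a numerical check on this example, the coefficient matrices $G_1$, $G_2$ and $H$ can be computed from the vec form of the three matrix equations and then substituted back. This is a sketch under stated assumptions: NumPy is available, $w_t = Bx_t + u_t$ with $B$ of dimension $m \times k$, and the function name `solve_second_order` is hypothetical.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into a vector."""
    return M.flatten(order="F")

def solve_second_order(A, B, Phi1, Phi2, R):
    """Solve G1 = A G1 Phi1 + A G2 + B, G2 = A G1 Phi2, H = A H R + I."""
    m, k = B.shape
    # Substituting G2 into the first equation gives a linear system in vec(G1)
    K = np.eye(m * k) - np.kron(Phi1.T, A) - np.kron(Phi2.T, A @ A)
    G1 = np.linalg.solve(K, vec(B)).reshape((m, k), order="F")
    G2 = A @ G1 @ Phi2
    KH = np.eye(m * m) - np.kron(R.T, A)
    H = np.linalg.solve(KH, vec(np.eye(m))).reshape((m, m), order="F")
    return G1, G2, H
```

The design choice is to eliminate $G_2$ first, so that only one $mk \times mk$ linear system has to be solved for $G_1$.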
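$$Y_t = y_t - Cy_{t-1}, \tag{20.8}$$
obeys an equation of the form (20.1). Using the fact that $E(Y_{t+1}|\Omega_t) = E(y_{t+1}|\Omega_t) - Cy_t$, so that $E(y_{t+1}|\Omega_t) = E(Y_{t+1}|\Omega_t) + Cy_t$, and substituting (20.8) back into (20.7) we obtain
$$Y_t = -Cy_{t-1} + Ay_{t-1} + B\left[E(Y_{t+1}|\Omega_t) + Cy_t\right] + u_t$$
$$\phantom{Y_t} = -Cy_{t-1} + Ay_{t-1} + BE(Y_{t+1}|\Omega_t) + BC(Y_t + Cy_{t-1}) + u_t.$$
Collecting terms,
$$(I_m - BC)Y_t = (BC^2 - C + A)y_{t-1} + BE(Y_{t+1}|\Omega_t) + u_t. \tag{20.9}$$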
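This equation characterizes the matrix $C$ introduced in (20.8) as the solution of the quadratic equation
$$BC^2 - C + A = 0_m, \tag{20.10}$$
under which, provided that $I_m - BC$ is nonsingular,³ $Y_t$ satisfies
$$Y_t = FE(Y_{t+1}|\Omega_t) + W_t, \tag{20.11}$$
where
$$F = (I_m - BC)^{-1}B,$$
$$W_t = (I_m - BC)^{-1}u_t.$$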
The new equation system (20.11) does not depend on lagged values of the transformed variable, and can be solved using the martingale difference approach (see Section 20.7.4 on this). Binder and Pesaran (1995) and Binder and Pesaran (1997) have shown that there will be a unique solution if there exists a real matrix solution to equation (20.10) such that all eigenvalues of $C$ lie inside or on the unit circle, and all eigenvalues of $F$ lie strictly inside the unit circle. In such a case, the unique solution is given by
$$y_t = Cy_{t-1} + \sum_{h=0}^{\infty} F^h E(W_{t+h}|\Omega_t). \tag{20.12}$$
3 Notice that the nonsingularity of $(I_n - BC)$ does not necessarily require $B$ to be nonsingular. Binder and Pesaran (1997) provide sufficient conditions under which $(I_n - BC)$ is nonsingular (see their Proposition 2).
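The infinite sum in the solution can be solved for different choices of the $u_t$ process. For example, if $u_t$ follows a VAR(1) given by
$$u_t = Ru_{t-1} + \varepsilon_t, \tag{20.13}$$
we have
$$E(W_{t+h}|\Omega_t) = (I_m - BC)^{-1}R^h u_t.$$
Hence,
$$y_t = Cy_{t-1} + \sum_{h=0}^{\infty} F^h (I_m - BC)^{-1} R^h u_t,$$
or
$$y_t = Cy_{t-1} + Gu_t, \quad \text{with } G = \sum_{h=0}^{\infty} F^h (I_m - BC)^{-1} R^h.$$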
As before, $G$ can also be obtained using the method of undetermined coefficients, noting that $C$ satisfies the quadratic matrix equation, (20.10). We first note that
$$E(y_{t+1}|\Omega_t) = Cy_t + GRu_t = C(Cy_{t-1} + Gu_t) + GRu_t = C^2 y_{t-1} + (CG + GR)u_t.$$
Hence
$$G = B(CG + GR) + I_m,$$
or
$$G = (I_m - BC)^{-1}BGR + (I_m - BC)^{-1}, \tag{20.15}$$
so that $\mathrm{vec}(G) = \left[I_{m^2} - (R' \otimes F)\right]^{-1}\mathrm{vec}\left[(I_m - BC)^{-1}\right]$. This solution exists if all the eigenvalues of $F$ lie inside the unit circle and all the roots of $R$ lie on or inside the unit circle. These conditions ensure that $I_{m^2} - (R' \otimes F)$ is a nonsingular matrix. The solution in terms of the innovations to the forcing variables can now be written as
$$y_t = Cy_{t-1} + GRu_{t-1} + G\varepsilon_t.$$
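The QDE steps of this section can be illustrated with a small numerical sketch. It is not the algorithm of Binder and Pesaran: it assumes NumPy, finds one solution of $BC^2 - C + A = 0$ by the simple fixed-point iteration $C \leftarrow (I_m - BC)^{-1}A$ started from zero (an assumption made for illustration, which need not converge in general), and then obtains $F$ and $G$ from the formulae above. The function name `solve_qde` is hypothetical.

```python
import numpy as np

def solve_qde(A, B, R, n_iter=1000, tol=1e-13):
    """Sketch: C from B C^2 - C + A = 0, F = (I - BC)^{-1} B, G from (20.15)."""
    m = A.shape[0]
    I = np.eye(m)
    C = np.zeros((m, m))
    for _ in range(n_iter):
        # Rearranged quadratic: (I - B C) C = A
        C_new = np.linalg.solve(I - B @ C, A)
        converged = np.max(np.abs(C_new - C)) < tol
        C = C_new
        if converged:
            break
    M = np.linalg.inv(I - B @ C)
    F = M @ B
    # vec(G) = [I - (R' kron F)]^{-1} vec((I - BC)^{-1})
    vec_G = np.linalg.solve(np.eye(m * m) - np.kron(R.T, F), M.flatten(order="F"))
    G = vec_G.reshape((m, m), order="F")
    return C, F, G
```

After calling the function it is worth checking the eigenvalue conditions stated in the text (eigenvalues of $C$ inside or on the unit circle, those of $F$ strictly inside) before treating the result as the unique stable solution.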
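Example 42 Consider the following new Keynesian Phillips curve (NKPC) with a backward component
$$\pi_t = \beta_b \pi_{t-1} + \beta_f E(\pi_{t+1}|\Omega_t) + \gamma x_t + u_t, \tag{20.16}$$
where $\pi_t$ is the rate of inflation, $x_t$ is a measure of output gap, and $u_t$ is a serially uncorrelated 'supply' shock with mean zero. The theory also predicts that $\beta_f, \beta_b > 0$. The solution of the model depends on the process generating $x_t$ and $u_t$, and the backward ($\beta_b$) and the forward ($\beta_f$) coefficients. Following the QDE approach let $y_t = \pi_t - \lambda\pi_{t-1}$ and write (20.16) as
$$y_t = \frac{\beta_f}{1 - \beta_f\lambda} E(y_{t+1}|\Omega_t) + \frac{1}{1 - \beta_f\lambda}(\gamma x_t + u_t), \tag{20.17}$$
where $\lambda$ is a root of
$$\beta_f\lambda^2 - \lambda + \beta_b = 0. \tag{20.18}$$
Denoting the two roots by $\lambda_b$ and $\lambda_f$, we have $\lambda_b + \lambda_f = \beta_f^{-1}$, and hence
$$\beta_f^{-1}(1 - \beta_f\lambda_b) = \beta_f^{-1} - \lambda_b = \lambda_f.$$
For a unique stable solution we need to select $\lambda$ such that $\left|\beta_f(1 - \beta_f\lambda)^{-1}\right| < 1$. Set $\lambda = \lambda_b$, and using the above result note that $\beta_f(1 - \beta_f\lambda_b)^{-1} = \lambda_f^{-1}$. The solution will be unique if $|\lambda_f| > 1$. Using the results in Section 20.2.1 the unique solution of $y_t$ is given by
$$y_t = \frac{1}{1 - \beta_f\lambda_b} \sum_{j=0}^{\infty} \lambda_f^{-j} E(\gamma x_{t+j} + u_{t+j}|\Omega_t),$$
or, in terms of inflation,
$$\pi_t = \lambda_b\pi_{t-1} + \frac{1}{1 - \beta_f\lambda_b} \sum_{j=0}^{\infty} \lambda_f^{-j} E(\gamma x_{t+j} + u_{t+j}|\Omega_t),$$
where $|\lambda_b| < 1$. A sufficient condition for the quadratic equation to have one root, $\lambda_b$, inside the unit circle and the other root, $\lambda_f$, outside the unit circle is given by (note that $\lambda_b\lambda_f = \beta_b/\beta_f$)
$$(1 - \lambda_b)(\lambda_f - 1) = \frac{1}{\beta_f} - \frac{\beta_b}{\beta_f} - 1 = \frac{1 - \beta_b - \beta_f}{\beta_f} \geq 0.$$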
For an economically meaningful solution, the roots must be real and this is ensured if $\beta_f\beta_b \leq 1/4$. In the boundary case where $\beta_f + \beta_b = 1$, then $\lambda_b = 1$, and $\lambda_f = \beta_f^{-1} - 1$, and a unique solution follows if $\lambda_f = (1 - \beta_f)/\beta_f > 1$, or if $\beta_f < 1/2$. Therefore, in the case where $\beta_f + \beta_b = 1$, and $\beta_f < 1/2$, the unique solution of the NKPC is given by
$$\pi_t = \pi_{t-1} + \frac{\gamma}{1 - \beta_f} \sum_{j=0}^{\infty} \left(\frac{\beta_f}{1 - \beta_f}\right)^{j} E(x_{t+j}|\Omega_t) + \frac{1}{1 - \beta_f}\,u_t.$$
Since by design the output gap, $x_t$, is a stationary process, inflation will be $I(1)$ if $\beta_b + \beta_f = 1$.
If both roots, $\lambda_b$ and $\lambda_f$, fall inside the unit circle a general solution is given by
$$\pi_t = \beta_f^{-1}\pi_{t-1} - \beta_f^{-1}\beta_b\pi_{t-2} - \beta_f^{-1}\gamma x_{t-1} + m_t - \beta_f^{-1}u_{t-1}, \tag{20.19}$$
where
where g is an arbitrary constant. This in itself gives a multiplicity of solutions, depending on the
choice of g. Finally, the NKPC does not have any stable solutions if both roots fall outside the unit
circle.
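The classification of NKPC solutions by the roots of (20.18) can be sketched in a few lines. This is an illustrative helper only, using the standard library: the function name `nkpc_roots`, the tolerance, and the labels are hypothetical, and cases not covered by the discussion above (for example, the larger root lying exactly on the unit circle) are simply flagged as `"boundary"`.

```python
import cmath

def nkpc_roots(beta_f, beta_b, tol=1e-9):
    """Roots of beta_f*lam^2 - lam + beta_b = 0 and the implied type of solution."""
    disc = cmath.sqrt(1.0 - 4.0 * beta_f * beta_b)
    roots = ((1.0 - disc) / (2.0 * beta_f), (1.0 + disc) / (2.0 * beta_f))
    lam_b, lam_f = sorted(roots, key=abs)   # lam_b has the smaller modulus
    if abs(lam_b) <= 1.0 + tol and abs(lam_f) > 1.0 + tol:
        kind = "unique"      # one root inside or on, the other outside the unit circle
    elif abs(lam_f) < 1.0 - tol:
        kind = "multiple"    # both roots inside the unit circle
    elif abs(lam_b) > 1.0 + tol:
        kind = "none"        # both roots outside: no stable solution
    else:
        kind = "boundary"    # not covered by the cases discussed in the text
    return lam_b, lam_f, kind

# Hypothetical parameter values
print(nkpc_roots(0.4, 0.5))   # beta_f + beta_b < 1
print(nkpc_roots(0.4, 0.6))   # beta_f + beta_b = 1 with beta_f < 1/2
```

Sorting by modulus rather than by value keeps the function usable when the roots are complex, although the text restricts attention to real roots.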
where
S is a non-zero matrix of fixed coefficients and captures the degree of feedbacks from yt−1 back
into ut . It is clear that the solution approaches of the previous section do not apply here directly.
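But RE models with feedbacks can be written in the form of a larger RE model, with no feedbacks. Let $z_t = (y_t', u_t')'$ and write the above set of equations as
$$\begin{pmatrix} I_m & -I_m \\ 0 & I_m \end{pmatrix} z_t = \begin{pmatrix} A & 0 \\ S & R \end{pmatrix} z_{t-1} + \begin{pmatrix} B & 0 \\ 0 & 0 \end{pmatrix} E(z_{t+1}|\Omega_t) + \begin{pmatrix} 0 \\ \varepsilon_t \end{pmatrix},$$
or
$$z_t = \tilde{A}z_{t-1} + \tilde{B}E(z_{t+1}|\Omega_t) + v_t, \tag{20.22}$$
where
$$\tilde{A} = \begin{pmatrix} A + S & R \\ S & R \end{pmatrix}, \quad \tilde{B} = \begin{pmatrix} B & 0 \\ 0 & 0 \end{pmatrix}, \quad \text{and } v_t = \begin{pmatrix} \varepsilon_t \\ \varepsilon_t \end{pmatrix}.$$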
In the enlarged RE model (20.22), there are no longer any feedbacks, and the solution methods
of previous sections can be readily applied to it. In particular, it is easily seen that this model has
the unique solution
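where $\tilde{C}$ is such that
$$\tilde{B}\tilde{C}^2 - \tilde{C} + \tilde{A} = 0_{2m \times 2m}, \tag{20.23}$$
and
Also
$$I_{2m} - \tilde{B}\tilde{C} = \begin{pmatrix} I_m - BC & -BD \\ 0 & I_m \end{pmatrix}$$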
$$(I_{2m} - \tilde{B}\tilde{C})^{-1} = \begin{pmatrix} (I_m - BC)^{-1} & (I_m - BC)^{-1}BD \\ 0 & I_m \end{pmatrix}.$$
Using the above result, the unique solution of yt is given by (assuming that regularity conditions
are satisfied)
which upon using (20.25) and after some algebra can be written equivalently as
where
The solution of the RE model in this case requires solving the nonlinear matrix equations given
by (20.24) and (20.25) for C and D.
The above solution form can also be used to derive C and G directly, using the method of
undetermined coefficients. Note that if $y_t = Cy_{t-1} + G(Ru_{t-1} + \varepsilon_t)$ is to be a solution of (20.20) and (20.21) we must have
$$Cy_{t-1} + G(Ru_{t-1} + \varepsilon_t) = Ay_{t-1} + B(Cy_t + GRu_t) + u_t,$$
and hence
$$(C - A)y_{t-1} + G(Ru_{t-1} + \varepsilon_t) = BC\left[Cy_{t-1} + G(Ru_{t-1} + \varepsilon_t)\right] + (BGR + I_m)(Sy_{t-1} + Ru_{t-1} + \varepsilon_t).$$
Equating coefficient matrices of yt−1 , ut−1 and ε t from both sides we have
which simplify to
These two sets of matrix equations can now be solved iteratively for C and G.
$$y_t = \sum_{j=1}^{p} A_{j0}\,y_{t-j} + \sum_{j=0}^{p}\sum_{h=1}^{H} A_{jh}\,E(y_{t+h-j}|\Omega_{t-j}) + v_t, \tag{20.28}$$
where zt , ut and
ϑ t are of dimension m(H + 1)p × 1, ϑ t is of dimension m(H + 1) × 1, and
A and B are square matrices of dimension m(H + 1)p, with Di , i = −1, 0, 1, defined by
⎛ ⎞ ⎛ ⎞
−1 0m · · · 0m 0 1 · · · p−1
⎜ 0m 0m · · · 0m ⎟ ⎜ 0m Im · · · 0m ⎟
⎜ ⎟ ⎜ ⎟
D−1 = ⎜ .. .. .. .. ⎟ , D0 = ⎜ .. .. .. .. ⎟,
⎝ . . . . ⎠ ⎝ . . . . ⎠
0m 0m · · · 0m 0m 0m ··· In
⎛ ⎞
0m 0m · · · 0m p
⎜ −Im 0m · · · 0m 0m ⎟
⎜ ⎟
D1 = ⎜ . .. .. .. .. ⎟,
⎝ .. . . . . ⎠
0m 0m · · · −Im 0m
and
⎛ ⎞
0m 0m ··· 0m 0m
⎜ −Im 0m ··· 0m 0m ⎟
⎜ ⎟
−1 = ⎜ .. .. .. .. .. ⎟.
⎝ . . . . . ⎠
0m 0m · · · −Im 0m
Using the above auxiliary vectors and matrices, we obtain the canonical form
See, for example, Broze, Gouriéroux, and Szafarz (1995) and Binder and Pesaran (1995) for
further details.
xt = Rxt−1 + ut . (20.31)
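Let $Y_t = (y_t', E(y_{t+1}|\Omega_t)')'$ and note that (20.30) can be written as
$$Y_t = AE(Y_{t+1}|\Omega_t) + W_t,$$
where
$$A = \begin{pmatrix} A_1 & A_2 \\ I_m & 0 \end{pmatrix}, \quad \text{and } W_t = \begin{pmatrix} Bx_t + \varepsilon_t \\ 0 \end{pmatrix}.$$
Suppose now that all the eigenvalues of $A$ lie within the unit circle and the standard transversality condition is met. Using the method of undetermined coefficients, the unique solution of the RE model is given by
$$y_t = Cx_t + \varepsilon_t,$$
where $C$ satisfies
$$C = A_1CR + A_2CR^2 + B,$$
or
$$\mathrm{vec}(C) = (R' \otimes A_1)\,\mathrm{vec}(C) + \left((R^2)' \otimes A_2\right)\mathrm{vec}(C) + \mathrm{vec}(B),$$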
and finally
$$\mathrm{vec}(C) = \left[I_{km} - (R' \otimes A_1) - \left((R^2)' \otimes A_2\right)\right]^{-1}\mathrm{vec}(B). \tag{20.32}$$
zt = Czt−1 + ht , (20.33)
where C and ht can be obtained from a backward recursion as set out in Binder and Pesaran
(1997). We now address the problem of how to retrieve yt from this solution. To simplify the
exposition we set p = 2 and H = 1. Let
$$D = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$
and
Hence, if all eigenvalues of $G_{22}$ fall within the unit circle, the solution for $q_{1t}$ will be given by
$$q_{1t} = G_{11}q_{1,t-1} + G_{12}(I - G_{22}L)^{-1}\left(G_{21}q_{1,t-2} + \tilde{h}_{2,t-1}\right) + \tilde{h}_{1t}.$$
The solution for $y_t$ can be obtained from the above equations. Clearly, the solution for $y_t$ involves infinite moving average components, unless $G_{12} = 0$. In the case where $G_{12} \neq 0$, the presence of the infinite-order moving average term in the solution complicates the problems of
identification and estimation of RE models, and raises the issue of whether the solution can be
approximated by finite-order VARMA processes.
where E yt+τ +j−i |t+τ −i is defined for t + τ + j − i > T. Binder and Pesaran (2000) have
presented efficient methods for the solution of model (20.34), and showed that this is linked
to the problem of solving sparse linear equations systems with a block tridiagonal coefficients
matrix structure.
See Binder and Pesaran (2000) and Gilli and Pauletto (1997).
Proceeding recursively backward, we can obtain $y_{T-1}$ as a function of $y_{T-2}$, the terminal condition $E(y_{T+1}|\Omega_T)$, and of $E(w_T|\Omega_{T-1})$ and $w_{T-1}$. Combining (20.34) for $\tau = T - t - 1$ with (20.35), one readily obtains
$$y_{T-1} = (I_m - BA)^{-1}\left[Ay_{T-2} + B^2E(y_{T+1}|\Omega_T) + BE(w_T|\Omega_{T-1}) + w_{T-1}\right]. \tag{20.36}$$
Proceeding to period $T - 2$, combining (20.34) for $\tau = T - t - 2$ with (20.36), the solution for $y_{T-2}$ is given by
$$y_{T-2} = \left[I_m - B(I_m - BA)^{-1}A\right]^{-1} \times \left[\begin{array}{l} Ay_{T-3} + B(I_m - BA)^{-1}B^2E(y_{T+1}|\Omega_T) + B(I_m - BA)^{-1}BE(w_T|\Omega_{T-2}) \\ {} + B(I_m - BA)^{-1}E(w_{T-1}|\Omega_{T-2}) + w_{T-2} \end{array}\right].$$
The pattern of these backward recursions should be apparent. Along the same lines of reasoning,
the solution for yt+τ to (20.34) is given by
yt+τ = −1 −1
τ Ay t+τ −1 + τ E ( t+τ |t+τ ) , τ = 0, 1, . . . , T − t, (20.37)
where
and
T = BE yT+1 |T +wT , T−i = B−1
T−t−i+1 T−i+1 +wT−i , i = 1, 2, . . . , T−t.
The matrices T−t−i are assumed to be nonsingular for i = 1, 2, . . ., T − t. Note that the solu-
tion in all periods is a linear combination of the initial and terminal values, and the conditional
expectations of the forcing variables. As the forcing variables were assumed to be adapted to the information sets $\{\Omega_{t+\tau}\}$, so too will the solution $y_{t+\tau}$.
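The backward recursion can be illustrated for the simplified case of perfect foresight, where conditional expectations are replaced by known future values. This is a sketch under assumptions, not the Binder and Pesaran (2000) algorithm: it assumes NumPy, the model $y_s = Ay_{s-1} + By_{s+1} + w_s$ for $s = 1, \ldots, n$ with given $y_0$ and $y_{n+1}$, and uses the hypothetical names `Lam` and `Psi` for the recursively defined matrices and vectors. A direct solve of the stacked block tridiagonal system is included for comparison.

```python
import numpy as np

def backward_recursion(A, B, w, y0, y_end):
    """Solve y_s = A y_{s-1} + B y_{s+1} + w_s, s = 1..n, given y_0 and y_{n+1}."""
    n, m = w.shape
    I = np.eye(m)
    Lam = [None] * n   # Lam[s] y = A y_prev + Psi[s] holds in each period
    Psi = [None] * n
    Lam[-1] = I
    Psi[-1] = B @ y_end + w[-1]
    for s in range(n - 2, -1, -1):            # backward pass
        Lam_next_inv = np.linalg.inv(Lam[s + 1])
        Lam[s] = I - B @ Lam_next_inv @ A
        Psi[s] = B @ Lam_next_inv @ Psi[s + 1] + w[s]
    y = np.empty((n, m))
    prev = y0
    for s in range(n):                        # forward pass from the initial value
        y[s] = np.linalg.solve(Lam[s], A @ prev + Psi[s])
        prev = y[s]
    return y

def stacked_solve(A, B, w, y0, y_end):
    """Solve the same problem as one block tridiagonal linear system."""
    n, m = w.shape
    M = np.eye(n * m)
    rhs = w.astype(float).copy()
    rhs[0] += A @ y0
    rhs[-1] += B @ y_end
    for s in range(n):
        if s > 0:
            M[s * m:(s + 1) * m, (s - 1) * m:s * m] = -A
        if s < n - 1:
            M[s * m:(s + 1) * m, (s + 1) * m:(s + 2) * m] = -B
    return np.linalg.solve(M, rhs.reshape(-1)).reshape(n, m)
```

The recursion only inverts $m \times m$ matrices, whereas the stacked system has dimension $nm$, which is the practical motivation for exploiting the block tridiagonal structure.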
4 We will also show that under this rank condition, (20.38) can be written as a special case of the canonical model given
by (20.7).
5 Namely J is arranged in Jordan blocks. See Broze, Gouriéroux, and Szafarz (1995) for definition of Jordan canonical
form, Jordan blocks, and canonical variables.
where Ju contains the unstable eigenvalues of G with absolute value greater than unity, Js contains
the stable eigenvalues of G with absolute value less than unity,6 and ut and st are the canonical
variables associated with the eigenvalues in Ju and Js , respectively. Substituting the above results
in (20.39) we now have
∗
ut+1 Ju 0 ut u
E t = + vt . (20.40)
st+1 0 Js st ∗s
ut = J−1 −1 ∗
u E (ut+1 |t ) − Ju u vt ,
$$u_t = C_{uy}y_t + C_{ux}x_t.$$
The above equations link $y_t$ to the canonical variables $u_t$ (that evolve according to (20.41)) and the predetermined variables, $x_t$. In the case where $C_{uy}$ is nonsingular we have
$$y_t = C_{uy}^{-1}u_t - C_{uy}^{-1}C_{ux}x_t. \tag{20.43}$$
6 The diagonal elements of J and J are also given by the roots of the determinant equation | z − | = 0. This will
s u 0 1
be useful when introducing the King and Watson (1998) method.
7 See Section 20.2.1 for a description of the forward method and the assumptions required for obtaining a unique
solution.
where $R_{xs} = C_{sx} - C_{sy}C_{uy}^{-1}C_{ux}$, which is a nonsingular matrix. Recall that by assumption $C$ and $C_{uy}$ are nonsingular matrices. Equations (20.43) and (20.44) can then be used recursively to solve for $y_t$, $x_t$ and $z_t$, given the initial values, $y_0$ and $x_0$, and the unique solution of $u_t$ as given above.
where L−1 is the forward operator, i.e. L−1 zt = zt+1 . To ensure a unique solution King and
Watson assume that | 0 λ − 1 | = 0, for all values of λ. King and Watson (1998) show that,
under this condition, model (20.38) (or (20.45)) can be written equivalently as
⎛ ⎞ ⎛ ⎞ ⎛ ⎞⎛ ⎞ ⎛ ∗ ⎞
G 0 0 qt+1 Iq 0 0 qt ι
⎝ 0 Iu 0 ⎠ E ⎝ ut+1 t ⎠ = ⎝ 0 Ju 0 ⎠ ⎝ ut ⎠ + ⎝ ∗u ⎠ vt ,
0 0 Is st+1 0 0 Js st ∗s
where G is an upper triangular matrix with zeros on the main diagonal, and Iq , Iu , and Is are
identity matrices of orders conformable to qt , ut and st . This representation contains the same
variables identified by Blanchard and Kahn (1980) (see, in particular, equations (20.40)), but
also the new set of variables, qt . These are the canonical variables associated to the roots of the
polynomial | 0 λ − 1 | that are infinite, or explosive, under singularity of the matrix 0 (see
Section 20.7.1 for the definition of canonical variables). A solution for qt can be obtained by
noting that
−1 ∞
qt = E GL−1 − I ∗ι vt |t = − Gh ∗ι E (vt+h |t ) ,
h=0
where L−1 is the forward operator, i.e. L−1 vt =vt+1 . Let υ t = qt , ut . The non-predetermined
variables, yt , are described by the equations (see also transformations (20.42))
$$\upsilon_t = C_{\upsilon y}y_t + C_{\upsilon x}x_t,$$
where $C_{\upsilon y}$ and $C_{\upsilon x}$ have a number of rows equal to the sum of the number of elements in $q_t$ and $u_t$. Under the condition that $C_{\upsilon y}$ is nonsingular, we can write a solution for the non-predetermined variables
$$y_t = C_{\upsilon y}^{-1}\upsilon_t - C_{\upsilon y}^{-1}C_{\upsilon x}x_t, \tag{20.46}$$
which is a generalization of (20.43) to include qt . The solution for xt+1 can then be obtained
following the steps outlined in Section 20.7.1 for the Blanchard and Kahn (1980) method. Note
that under nonsingularity of Cυy , and using (20.46), it is possible to express (20.38) in the gen-
eral form (20.7) (see Section 20.7.1 for details in the case of the Blanchard and Kahn (1980)
method).
0 zt = 1 zt−1 + vt +
ηt , (20.47)
where, in the vector $z_t$, some variables may enter as actual values and others as expectations, such as $E(y_{j,t+1}|\Omega_t)$. In the above model, the coefficient matrix of $z_t$ is allowed to be singular, $v_t$ is a random, exogenous and potentially serially correlated process, and $\eta_t$ satisfies $E(\eta_{t+1}|\Omega_t) = 0$, for all $t$. We note that in the Sims (2001) approach the vector of expectations revisions, $\eta_t$, is determined endogenously as part of the solution.
This method is based on the generalized Schur decomposition of matrices 0 and 1
Q 0 Z = 0 ,
Q 1 Z = 1 ,
The above system can be rearranged so that the lower right blocks of 0 and 1 contain the
generalized eigenvalues exploding to infinity. Partition z∗t as follows
0,11 0,12 z∗1t 1,11 1,12 z∗1,t−1 x1t
= + , (20.49)
0 0,22 z∗2t 0 1,22 z∗2,t−1 x2t
∗
where z∗t = z∗ 1t , z2t , xt = x1t , x2t = Q vt +
ηt , and z∗2t is the vector of unstable
variables associated with the explosive generalized eigenvalues. Note that z∗2t does not depend on
z∗1t . Letting M = −1 h ∗
1,22 0,22 , and assuming limh→∞ M z2,t+h = 0, solve forward the equations
∗
for z2t to obtain
∞
z∗2t = − Mh−1 −1
1,22 x2,t+h
h=1
∞
=− Mh−1 −1
1,22 Q 2 vt+h +
η t+h , (20.50)
h=1
which relates z∗2t to future values of vt+h and ηt+h . This means that knowing z∗2t requires that
all future events be known at time t. Since taking expectations conditional on the information
available at time t does not change the left-hand side of the above equation, we obtain
∞
z∗2t = −E Mh−1 −1
1,22 Q 2
vt+h +
η t+h |t . (20.51)
h=1
The fact that the right-hand side of equation (20.50) never deviates from its expected value
implies that the vector of expectations revisions, ηt , must fluctuate as a function of current and
future values of vt to guarantee that equality (20.51) holds. In particular, equality in (20.51) is
satisfied if and only if ηt satisfies
∞
Q 2
ηt+1 = 1,22 Mh−1 −1
1,22 Q 2 [E (vt+h |t+1 ) − E (vt+h |t )] .
h=1
Hence, the stability of the system crucially depends on the existence of expectations revisions
ηt to offset the effect that the fundamental shocks vt have on z∗2t .
Sims (2001) also proves that a necessary and sufficient condition to have a unique solution is
that the row space of Q 1
should be contained in that of Q 2
. In this case, we can write
Q 1
= Q 1
,
for some matrix . Premultiplying (20.49) by I − yields a new set of equations, free of refer-
ences to ηt , that can be combined with (20.50) to give
0,11 0,12 − 0,22 z∗1t 1,11 1,12 − 1,22
= ×
0 I z∗2t 0 0
z∗1,t−1 Q 1 − Q 1
+ vt
z∗2,t−1 0
0
− ∞ h−1 −1 Q E (v .
h=1 M 1,22 2 t+h |t )
Hence, when the matrix exists, the term involving ηt drops out, and the reduced form of the
RE model can be written as
∞
zt = 1 zt−1 + v vt + z h−1
f v E (vt+h |t ) , (20.52)
h=1
where the matrices 1 , v , z and f are a function of parameters in (20.49) (see Sims (2001)
for a description of the elements in system (20.52)).
(L) = −B + Im L − AL2 .
Premultiplying both sides of (20.53) lagged by one period by the adjoint matrix of (L), Y (L),
we have
where Lm1 (L) = det [ (L)], and m1 equals the number of zero roots of (L). Multiplying
both sides by L−m1 , and substituting the expression of (L) in the first term of the right-hand
side we obtain
(L) yt − L−m1 Y (L) (L) ξ t = −Y (L) Im L − AL2 ξ t+m1 + Y (L) ut+m1 −1 ,
Note that now the left-hand side of equation (20.54) only depends on information known at
time t − 1 (recall that yt − ξ t =E(yt |t−1 )). The same line of reasoning can be applied to the
right-hand side of (20.54), and we must therefore have
E −Y (L) Im L − AL2 ξ t+m1 + Y (L) ut+m1 −1 |t−1
= −Y (L) Im L − AL2 ξ t+m1 + Y (L) ut+m1 −1 . (20.55)
Solutions to the RE model (20.7) can thus be computed by finding the martingale difference pro-
cesses, ξ t+m1 , that satisfy (20.55), and then solving the corresponding difference equation sys-
tems (20.53) for yt in terms of ut and ξ t . Generally there will be an infinite number of bounded
solutions, and the number of martingale difference processes that can be chosen arbitrarily may
be derived using the restrictions implied by (20.55). For further details see Broze, Gouriéroux,
and Szafarz (1990).
Remark 5 The choice amongst the alternative solution methods depends on the nature of the RE model
and the type of solution sought. For example, the undetermined coefficients method is appropriate
when it is known that the solution is unique. The Blanchard and Kahn method only applies when the
coefficients of the future expectations are nonsingular which could be highly restrictive in practice.
The King and Watson solution strategy relaxes the restrictive nature of Blanchard and Kahn's
approach but does not allow characterizations of all the possible solutions in the general case. The
same also applies to Sims’ method. In contrast, QDE and the martingale difference methods can be
used to develop all the solutions of the RE models in a transparent manner. In the case where a unique
solution exists, the numerical accuracy and speed of alternative solution methods are compared by
Anderson (2008).
The solution of this model is discussed in Section 20.3, and assuming that the solution is unique
it takes the form
A0 yt = A1 Et (yt+1 ) + ε t , (20.59)
E(ε t ) = 0, E(ε t ε t ) = ε .
The regular case, where there is a unique stationary solution, arises if all eigenvalues of $Q = A_0^{-1}A_1$ lie within the unit circle (see Section 20.2). In this case, the unique solution of the model is given by
$$y_t = \sum_{j=0}^{\infty} Q^j A_0^{-1} E_t(\varepsilon_{t+j}). \tag{20.61}$$
A0 yt = εt , (20.62)
or
Notice that (20.63) provides us with a likelihood function which does not depend on $A_1$ and, therefore, the parameters that are unique to $A_1$ (i.e., the coefficients that are specific to the forward variables) are not identified. Furthermore, the RE model is observationally equivalent to a model without forward variables which takes the form of (20.62). Since what can be estimated from the data, namely the covariance matrix of $y_t$, is not a function of $A_1$, all possible choices of $A_1$ are observationally equivalent in the sense that they lead to the same observed data covariance matrix.
Although the coefficients in the forward solution (20.61) are functions of $A_1$, this does not identify them because $E_t(\varepsilon_{t+j}) = 0$ for $j \geq 1$. Elements of $A_1$ could be identified by certain sorts of a priori restrictions, but these are likely to be rather special, rather limited in number, and cannot be tested.
If the parameters of the DSGE model were thought to be known a priori from calibration,
there would be no identification problem and the structural errors εit could be recovered and
used, for instance, in calculating impulse response functions, IRFs (see Chapter 24). How-
ever, suppose someone else believed that the true model was just a set of random errors yt =
ut , with different IRFs. There is no information in the data that a proponent of the DSGE
could use to persuade another person that the DSGE model was correct relative to the random
error model.
The above result generalizes to higher-order RE models. Consider, for example, the model
$$A_0y_t = \sum_{i=1}^{p} A_iE_t(y_{t+i}) + \varepsilon_t.$$
Once again the unique stable solution of this model is given by A0 yt = εt , and none of
the elements of A1 , A2 , . . . , Ap that are variation free with respect to the elements of A0 are
identified.
Example 44 Consider the following standard three equation NK-DSGE model used in Benati (2010)
involving only current and future variables
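$$R_t = \psi\pi_t + \varepsilon_{1t}, \tag{20.64}$$
$$y_t = E(y_{t+1}|\Omega_t) - \sigma\left[R_t - E(\pi_{t+1}|\Omega_t)\right] + \varepsilon_{2t}, \tag{20.65}$$
$$\pi_t = \beta E(\pi_{t+1}|\Omega_t) + \gamma y_t + \varepsilon_{3t}. \tag{20.66}$$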
Equation (20.64) is a Taylor rule determining the interest rate, Rt , (20.66) a Phillips curve deter-
mining inflation, π t , and (20.65) is an IS curve determining output, yt , all measured as devia-
tions from their steady states. The errors, which are assumed to be white noise, are a monetary
policy shock, ε1t , a demand shock, ε 2t , and a supply or cost shock, ε3t , which we collect in the
vector ε t = (ε 1t , ε 2t , ε 3t ) . These are also usually assumed to be orthogonal. This system is
highly restricted, with many parameters set to zero a priori. For instance, output does not appear
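in the Taylor rule and the coefficient of future output is assumed to be equal to unity. Let $y_t = (R_t, \pi_t, y_t)'$ and
$$A_0 = \begin{pmatrix} 1 & -\psi & 0 \\ \sigma & 0 & 1 \\ 0 & 1 & -\gamma \end{pmatrix}, \quad A_1 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & \sigma & 1 \\ 0 & \beta & 0 \end{pmatrix}. \tag{20.67}$$
$$y_t = AE(y_{t+1}|\Omega_t) + w_t,$$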
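where $w_t = A_0^{-1}\varepsilon_t$,
$$A_0^{-1} = \frac{1}{\gamma\sigma\psi + 1}\begin{pmatrix} 1 & \gamma\psi & \psi \\ -\gamma\sigma & \gamma & 1 \\ -\sigma & 1 & -\sigma\psi \end{pmatrix},$$
$$A = A_0^{-1}A_1 = \frac{1}{\gamma\sigma\psi + 1}\begin{pmatrix} 1 & \gamma\psi & \psi \\ -\gamma\sigma & \gamma & 1 \\ -\sigma & 1 & -\sigma\psi \end{pmatrix}\begin{pmatrix} 0 & 0 & 0 \\ 0 & \sigma & 1 \\ 0 & \beta & 0 \end{pmatrix} = \frac{1}{\gamma\sigma\psi + 1}\begin{pmatrix} 0 & \psi(\beta + \gamma\sigma) & \gamma\psi \\ 0 & \beta + \gamma\sigma & \gamma \\ 0 & \sigma(1 - \beta\psi) & 1 \end{pmatrix}.$$
The two non-zero eigenvalues of $A$ are
$$\lambda_1 = \frac{1}{2(\gamma\sigma\psi + 1)}(1 + \beta + \gamma\sigma + \kappa), \quad \lambda_2 = \frac{1}{2(\gamma\sigma\psi + 1)}(1 + \beta + \gamma\sigma - \kappa),$$
where $\kappa = \sqrt{\beta^2 - 2\beta + \gamma^2\sigma^2 + 2\gamma\sigma + 2\gamma\sigma\beta - 4\gamma\sigma\beta\psi + 1}$. Assuming that $|\lambda_j| < 1$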
for j = 1, 2 and under serially uncorrelated errors, the solution of the above model is given by the
forward solution which in this case reduces to
yt = A0−1 εt , (20.68)
which does not depend on $A_1$. This solution is also obtainable from (20.64), (20.65), and (20.66)
by setting all expectational variables to zero. Writing the solution in full we have
$$R_t = \psi \pi_t + \varepsilon_{1t},$$
$$y_t = -\sigma R_t + \varepsilon_{2t},$$
$$\pi_t = \gamma y_t + \varepsilon_{3t},$$
which does not depend on $\beta$. As we shall see later, this has implications for identification and
estimation of $\beta$ (see Example 46). This example illustrates some of the features of DSGE models.
First, the RE model parameter matrices, $A_0$ and $A_1$, are written in terms of deep parameters,
$\theta = (\gamma, \sigma, \psi, \beta)'$. Second, the parameters which appear only in $A_1$ do not enter the RE
solution and, thus, do not enter the likelihood function. In this example, β does not appear in the
likelihood function, though σ which appears in A1 does appear in the likelihood function because it
also appears in A0 . Third, the restrictions necessary to ensure regularity (i.e., |λi | < 1 for i = 1, 2),
imply bounds involving the structural parameters, including the unidentified β. Thus, the param-
eter space is not variation free. Fourth, if β is fixed at some pre-selected value for the discount rate
(as would be done by a calibrator), then the model is identified.
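The algebra of Example 44 can be checked numerically. The following is a minimal sketch, assuming illustrative parameter values that are not taken from the text; the function names are hypothetical. It builds $A_0$ and $A_1$ from (20.67), forms $A = A_0^{-1}A_1$, and compares its eigenvalues with the closed-form expressions for $\lambda_1$ and $\lambda_2$.

```python
import numpy as np

def nk_matrices(gamma, sigma, psi, beta):
    """A0 and A1 of (20.67) for the ordering (R_t, pi_t, y_t)."""
    A0 = np.array([[1.0, -psi, 0.0],
                   [sigma, 0.0, 1.0],
                   [0.0, 1.0, -gamma]])
    A1 = np.array([[0.0, 0.0, 0.0],
                   [0.0, sigma, 1.0],
                   [0.0, beta, 0.0]])
    return A0, A1

def closed_form_eigenvalues(gamma, sigma, psi, beta):
    """lambda_1 and lambda_2 from the closed-form expressions (may be complex)."""
    d = gamma * sigma * psi + 1.0
    kappa_sq = (beta**2 - 2*beta + (gamma*sigma)**2 + 2*gamma*sigma
                + 2*gamma*sigma*beta - 4*gamma*sigma*beta*psi + 1.0)
    kappa = np.sqrt(complex(kappa_sq))  # kappa is the square root of this expression
    s = 1.0 + beta + gamma * sigma
    return (s + kappa) / (2 * d), (s - kappa) / (2 * d)

# Illustrative values only (assumed, not from the text).
gamma, sigma, psi, beta = 0.3, 1.0, 1.5, 0.99

A0, A1 = nk_matrices(gamma, sigma, psi, beta)
A = np.linalg.solve(A0, A1)            # A = A0^{-1} A1
numeric = np.linalg.eigvals(A)         # one zero eigenvalue plus lambda_1, lambda_2
lam1, lam2 = closed_form_eigenvalues(gamma, sigma, psi, beta)

# Regularity requires both non-zero eigenvalues to lie inside the unit circle.
regular = max(abs(lam1), abs(lam2)) < 1
print("numerical eigenvalues:", numeric)
print("closed form:", lam1, lam2, "regular:", regular)
```

Varying `beta` changes the eigenvalues, and hence the region where the regularity condition holds, even though the solution (20.68) itself does not involve $\beta$.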
where, as shown in Section 20.3.1, $C$ solves the quadratic matrix equation $A_1 C^2 - A_0 C + A_2 = 0$.
The solution is unique and stationary if all the eigenvalues of $C$ and $(I_m - A_1 C)^{-1} A_1$ lie strictly
inside the unit circle. Therefore, the RE solution is observationally equivalent to the non-RE
simultaneous equations model (SEM)
$$A_0 y_t = A_2 y_{t-1} + \varepsilon_t,$$
Example 45 Consider the new Keynesian Phillips curve (NKPC) model in example 42, but to
simplify the discussion of identification, abstract from the backward component and write the
NKPC as
where inflation, $\pi_t$, is determined by expected inflation and an exogenous driving process, such as
the output gap, $x_t$. $\beta_f$ and $\gamma$ are fixed parameters, $\varepsilon_t$ is a martingale difference process, and
$E_{t-1} \pi_{t+1} = E(\pi_{t+1} \mid I_{t-1})$, where $I_{t-1}$ is the information set available at time $t - 1$. Note
that in this model expectations are conditioned on $I_{t-1}$, rather than on $I_t$. It is assumed that there
is no feedback from $\pi_t$ to $x_t$, and $x_t$ follows a stationary AR(2) process
where
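$$\alpha_1 = \frac{\gamma (\rho_1 + \beta_f \rho_2)}{1 - \beta_f \rho_1 - \beta_f^2 \rho_2}, \quad \alpha_2 = \frac{\gamma \rho_2}{1 - \beta_f \rho_1 - \beta_f^2 \rho_2}. \tag{20.74}$$
$$\beta_f = \frac{\alpha_1 \rho_2 - \alpha_2 \rho_1}{\rho_2 \alpha_2}, \quad \text{for } \rho_2 \alpha_2 \neq 0,$$
$$\gamma = \frac{\alpha_1 \left(1 - \beta_f \rho_1 - \beta_f^2 \rho_2\right)}{\rho_1 + \beta_f \rho_2}, \quad \text{for } 1 - \beta_f \rho_1 - \beta_f^2 \rho_2 \neq 0.$$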
Within the classical framework, the matrix of derivatives of the reduced form parameters with
respect to the structural parameters plays an important role in identification. In this example the
relevant part of this matrix is the derivatives of $\alpha = (\alpha_1, \alpha_2)'$ with respect to $\theta$, which can be
obtained using (20.74)
$$R(\theta) = \frac{\partial \alpha}{\partial \theta'} = \frac{1}{1 - \beta_f \rho_1 - \beta_f^2 \rho_2} \begin{bmatrix} \gamma \left( \rho_2 + \dfrac{(\rho_1 + \beta_f \rho_2)(\rho_1 + 2\beta_f \rho_2)}{1 - \beta_f \rho_1 - \beta_f^2 \rho_2} \right) & \rho_1 + \beta_f \rho_2 \\ \gamma \rho_2 \dfrac{\rho_1 + 2\beta_f \rho_2}{1 - \beta_f \rho_1 - \beta_f^2 \rho_2} & \rho_2 \end{bmatrix}.$$
A ‘yes/no’ answer to the question of whether a particular value of $\theta$ is identified is given by
investigating if the rank of $R(\theta)$, evaluated at that particular value, is full. Therefore, necessary
conditions for identification are $1 - \beta_f \rho_1 - \beta_f^2 \rho_2 \neq 0$, $\gamma \neq 0$ and $\rho_2 \neq 0$. This matrix will
also play a role in the Bayesian identification analysis. The weakly identified case can arise if
$1 - \beta_f \rho_1 - \beta_f^2 \rho_2 \neq 0$, $\gamma \neq 0$, but $\rho_2$ is replaced by $\rho_{2T} = \delta / \sqrt{T}$.
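The rank condition can be explored numerically. The sketch below is illustrative only: the parameter values and function names are assumptions, and the derivative matrix is taken with respect to $(\beta_f, \gamma)$ in that order. It evaluates $R(\theta)$ from the analytical expression, checks it against finite differences of (20.74), and shows how the determinant shrinks when $\rho_2$ is replaced by $\delta/\sqrt{T}$.

```python
import numpy as np

def alpha(beta_f, gamma, rho1, rho2):
    """Reduced form coefficients (alpha_1, alpha_2) from (20.74)."""
    D = 1.0 - beta_f * rho1 - beta_f**2 * rho2
    return np.array([gamma * (rho1 + beta_f * rho2) / D, gamma * rho2 / D])

def R_matrix(beta_f, gamma, rho1, rho2):
    """Analytical derivatives of alpha with respect to (beta_f, gamma)."""
    D = 1.0 - beta_f * rho1 - beta_f**2 * rho2
    N = rho1 + beta_f * rho2
    M = rho1 + 2.0 * beta_f * rho2
    return np.array([[gamma * (rho2 + N * M / D), N],
                     [gamma * rho2 * M / D, rho2]]) / D

def numerical_R(beta_f, gamma, rho1, rho2, h=1e-6):
    """Central finite-difference approximation to the same matrix."""
    d_beta = (alpha(beta_f + h, gamma, rho1, rho2)
              - alpha(beta_f - h, gamma, rho1, rho2)) / (2 * h)
    d_gamma = (alpha(beta_f, gamma + h, rho1, rho2)
               - alpha(beta_f, gamma - h, rho1, rho2)) / (2 * h)
    return np.column_stack([d_beta, d_gamma])

# Illustrative values (assumed).
beta_f, gamma, rho1 = 0.5, 0.2, 0.6

print("rank with rho2 = 0.2:", np.linalg.matrix_rank(R_matrix(beta_f, gamma, rho1, 0.2)))
print("rank with rho2 = 0.0:", np.linalg.matrix_rank(R_matrix(beta_f, gamma, rho1, 0.0)))

# Weak identification: rho_2T = delta / sqrt(T) drives det R(theta) towards zero.
delta = 1.0
for T in (100, 10_000, 1_000_000):
    rho2_T = delta / np.sqrt(T)
    print(T, np.linalg.det(R_matrix(beta_f, gamma, rho1, rho2_T)))
```

The determinant of $R(\theta)$ equals $\gamma\rho_2^2/(1 - \beta_f\rho_1 - \beta_f^2\rho_2)^2$, so under $\rho_{2T} = \delta/\sqrt{T}$ it declines at the rate $1/T$.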
Identification conditions were given by determining when the mapping from reduced-form to
structural parameters is unique.
The focus of the more recent discussion on identification of RE models has been on closed
systems with no exogenous variables. Specifically, consider the following structural RE model
with lagged values:
where the elements of the matrices $A_0(\theta)$, $A_1(\theta)$, $A_2(\theta)$, and $A_3(\theta)$ are functions of the
structural parameters $\theta$, and $v_t$ is such that $E(v_t) = 0$, and $E(v_t v_t') = I_m$. Assuming that a
unique solution exists, we have seen that it can be cast in the form
$$y_t = \Phi y_{t-1} + u_t, \tag{20.76}$$
where
The solution of the RE model can, therefore, be viewed as a restricted form of the VAR model
popularized in econometrics by Sims (1980). (20.76) can also be viewed as the reduced form
model associated with the structural model (20.75). Identification of structural parameters,
$\theta$, can be investigated by considering the mapping from the reduced form parameters, $\Phi$ and
$\mathrm{Var}(u_t) = \Sigma_u$, to $\theta$. Identification of RE models is complicated by the fact that this mapping is
often highly nonlinear.
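Example 46 Consider the static DSGE model given in Example 44, and note that under the
assumption that $\varepsilon_t \sim IIDN(0, \Sigma_\varepsilon)$ the log-likelihood function of the model is given by
$$\ell_T(\theta) \propto -\frac{T}{2} \ln |\Sigma_\varepsilon| - \frac{1}{2} \sum_{t=1}^{T} y_t' A_0' \Sigma_\varepsilon^{-1} A_0 y_t,$$
where
$$A_0 = \begin{pmatrix} 1 & -\psi & 0 \\ \sigma & 0 & 1 \\ 0 & 1 & -\gamma \end{pmatrix},$$
and $\Sigma_\varepsilon$ is a diagonal matrix with $\mathrm{diag}(\Sigma_\varepsilon) = (\sigma_{\varepsilon 1}^2, \sigma_{\varepsilon 2}^2, \sigma_{\varepsilon 3}^2)'$. It is clear that the likelihood
function does not depend on $\beta$, and hence $\beta$ is not identified. In fact all parameters of $A_1$ (defined
by (20.67)) that do not appear as an element of $A_0$ are potentially unidentifiable.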
evidence. Kydland and Prescott (1996) argue that the task of computational experiments of the
sort they conduct is to derive the quantitative implications of the theory rather than to measure
economic parameters, one of the primary objects of econometric analysis. Calibration is gen-
erally based on estimates from microeconomic studies, or cross-country estimates. As pointed
out by Canova and Sala (2009), if the calibrated parameters do not enter in the solution of the
model (and therefore do not appear in the likelihood function), then estimates of the remaining
parameters will be unaffected.
Over the past ten years it has become more common to estimate, rather than calibrate, dynamic
stochastic general equilibrium (DSGE) models, often using Bayesian techniques (see, among
many others, De Jong, Ingram, and Whiteman (2000), Smets and Wouters (2003), Smets and
Wouters (2007) and An and Schorfheide (2007) ). In this context, the issue of identification has
attracted renewed attention. Questions have been raised about the identification of particular
equations of the standard new Keynesian DSGE model, such as the Phillips curve (Mavroeidis
(2005), Nason and Smith (2008), Kleibergen and Mavroeidis (2009), Dées, di Mauro, Pesaran,
and Smith (2009), and others), or the Taylor rule (Cochrane (2011)). There have also been
questions about the identification of DSGE systems as a whole. Canova and Sala (2009, p. 448)
conclude: ‘it appears that a large class of popular DSGE structures are only very weakly identified’.
Iskrev (2010a) concludes ‘the results indicate that the parameters of the Smets and Wouters
(2007) model are quite poorly identified in most of the parameter space’. Other recent papers
that consider the identification of DSGE systems are Iskrev (2010b) and Iskrev and
Ratto (2010), who provide rank and order conditions for local identification based on the
spectral density matrix.
Whereas papers like Iskrev (2010a), Iskrev (2010b), and Komunjer and Ng (2011) provide
classical procedures for determining identification based on the rank of particular matrices,
Koop, Pesaran, and Smith (2013) propose Bayesian indicators. A Bayesian approach to
identification is useful both because the DSGE models are usually estimated by Bayesian methods
and because the issues raised by identification are rather different in a Bayesian context. Given an
informative choice of the prior, such that a well-behaved marginal prior exists for the parameter
of interest, then there is a well-defined posterior distribution, whether or not the parameter is
identified. In a Bayesian context lack of identification manifests itself in rendering the Bayesian
inference sensitive to the choice of the priors even for sufficiently large sample sizes. If the param-
eter is not identified, one cannot learn about the parameter directly from the data and, even with
an infinite sample of data, the posterior would be determined by the priors.
Within a Bayesian context, learning is interpreted as a changing posterior distribution, and
a common practice in DSGE estimation is to judge identification by a comparison of the prior
and posterior distributions for a parameter. Among many others, Smets and Wouters (2007,
p. 594) compare prior and posteriors and note that the mean of the posterior distribution is
typically quite close to the mean of the prior distribution and later note that ‘It appears that the
data are quite informative on the behavioral parameters, as indicated by the lower variance of the
posterior distribution relative to the prior distribution.’
As we discuss, not only can the posterior distribution differ from the prior even when the
parameter is unidentified, but in addition a changing posterior need not be informative about
identification. This can happen because, for instance, the requirement for a determinate solu-
tion of a DSGE model puts restrictions on the joint parameter space, which may create depen-
dence between identified and unidentified parameters, even if their priors are independent.
What proves to be informative in a Bayesian context is the rate at which learning takes place
(posterior precision increases) as more data become available.
Koop, Pesaran, and Smith (2013) suggest two Bayesian indicators of identification. The first,
like the classical procedures, indicates non-identification while the second, which is likely to be
more useful in practice, indicates either non-identification or weak identification. Like most of
the literature the analysis is local in the sense that identification at a given point in the feasible
parameter space is investigated. Although these indicators can be applied to any point in the
parameter space, in the Bayesian context prior means seem a natural starting point. If the param-
eters are identified at their prior means then other points could be investigated.
The first indicator, referred to as the ‘Bayesian comparison indicator’, is based on Proposition
2 of Poirier (1998) and considers identification of the q1 × 1 vector of parameters, θ 1 , assum-
ing that the remaining q2 × 1 vector of parameters, θ 2 , is identified. It compares the posterior
distribution of θ 1 with the posterior expectation of its prior distribution conditional on θ 2 , and
concludes that θ 1 is unidentified if the two distributions coincide. This contrasts with the direct
comparison of the prior of θ 1 with its posterior, which could differ even if θ 1 is unidentified. Like
the classical indicators based on the rank of a matrix, this Bayesian indicator provides a yes/no
answer, though in practice the comparison will depend on the numerical accuracy of the MCMC
procedures used to compute the posterior distributions.
The application of the Bayesian comparison indicator to DSGE models can be problematic,
since it is often difficult to suitably partition the parameters of the model such that there exists a
sub-set which is known to be identified. Furthermore, in many applications the main empirical
issue of interest is not a yes/no response to an identification question, but whether a parameter
of the model is weakly identified, in the sense discussed, for example, by Stock, Wright, and Yogo
(2002), and Andrews and Cheng (2012), in the classical literature. Accordingly, Koop, Pesaran,
and Smith (2013) also propose a second indicator, which they refer to as the ‘Bayesian learn-
ing rate indicator’, that examines the rate at which the posterior precision of a given parameter
gets updated with the sample size, T, using simulated data. For identified parameters the poste-
rior precision increases at the rate T. But for parameters that are either not identified or weakly
identified the posterior precision may be updated but its rate of update will be slower than T.
Implementation of this procedure requires simulating samples of increasing size and does not
require the size of the available realized data to be large. In a recent paper Caglar, Chadha, and
Shibayama (2012) apply the learning rate indicator to examine the identification of the parame-
ters of the Bayesian DSGE model of Smets and Wouters (2007), and find that many parameters
of this widely used model do not appear to be well identified.
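The logic of the learning rate indicator can be illustrated with a toy example. The sketch below is not the DSGE setting of Koop, Pesaran, and Smith (2013); it assumes a conjugate Gaussian model $y_t = \theta_1 + \theta_2 + e_t$ with independent normal priors, in which only the sum $\theta_1 + \theta_2$ is identified. In this special case the posterior covariance is available in closed form and does not depend on the realized data, so no simulation is needed. All names and values are hypothetical.

```python
import numpy as np

def posterior_precisions(T, prior_var=1.0, noise_var=1.0):
    """Posterior precision of theta_1 and of theta_1 + theta_2 after T observations.

    Model (assumed for illustration): y_t = theta_1 + theta_2 + e_t,
    e_t ~ N(0, noise_var), with independent N(0, prior_var) priors.
    """
    x = np.ones((1, 2))                        # regressor row: both parameters load equally
    prior_prec = np.eye(2) / prior_var
    post_prec = prior_prec + T * (x.T @ x) / noise_var
    post_cov = np.linalg.inv(post_prec)
    ones = np.ones(2)
    prec_theta1 = 1.0 / post_cov[0, 0]         # unidentified on its own
    prec_sum = 1.0 / (ones @ post_cov @ ones)  # identified combination
    return prec_theta1, prec_sum

for T in (10, 100, 1_000, 10_000):
    p1, ps = posterior_precisions(T)
    # Precision divided by T: tends to a positive constant only for the identified quantity.
    print(T, p1 / T, ps / T)
```

The posterior precision of $\theta_1$ is updated relative to its prior precision but stays bounded, so its ratio to $T$ tends to zero, whereas the precision of the identified sum grows at the rate $T$.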
We shall return to Bayesian estimation of RE models below. But first we consider estimation
of RE models by ML and GMM approaches.
where as pointed out above the structural errors, $v_t$, are typically assumed to be uncorrelated, and
without loss of generality their variance matrix is set to an identity matrix, $E(v_t v_t') = \Sigma_v = I_m$.
Note that under nonsingularity of $A_0(\theta)$ we have
$$u_t = R u_{t-1} + \varepsilon_t, \tag{20.78}$$
where $\varepsilon_t \sim IIDN(0, \Sigma_\varepsilon)$, and we assume all the roots of $R$ lie inside the unit circle. In this case,
the unique solution (20.12) via the QDE method can be written as
where $C(\theta)$ and $G(\theta)$ satisfy equations (20.10) and (20.15). Noting that, under the regularity
conditions, $u_t = (I_m - RL)^{-1} \varepsilon_t$, an infinite moving average term arises in (20.79). Its
likelihood function can be computed by applying the Kalman filter (see Section 16.5 for further
details on state space models and the Kalman filter). Let
$$z_t = S \begin{pmatrix} y_t \\ u_t \end{pmatrix}, \tag{20.80}$$
where $S$ is a selection matrix picking the observables from the variables of the model. Then
(20.79)–(20.80) represent a state space system. Let $\hat{z}_{t|t-1}$ be the one-step ahead forecast of
the state vector $z_t$ (given the information up to time $t - 1$). Then the one-step ahead prediction
error $\nu_t = z_t - \hat{z}_{t|t-1}$ has zero mean and covariance matrix $F_t$, and the log-likelihood function
of the sample is
$$\ell_T(\theta) = -\frac{mT}{2} \ln(2\pi) - \frac{1}{2} \sum_{t=1}^{T} \ln |F_t| - \frac{1}{2} \sum_{t=1}^{T} \nu_t' F_t^{-1} \nu_t. \tag{20.81}$$
For further details see Pesaran (1987c) and Binder and Pesaran (1995).
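The prediction error decomposition in (20.81) can be sketched in a few lines. The code below is a generic illustration, assuming a linear Gaussian state space model in a textbook notation that is not taken from this chapter: state $a_t = T a_{t-1} + \eta_t$ with $\mathrm{Var}(\eta_t) = Q$, and observation $z_t = Z a_t + e_t$ with $\mathrm{Var}(e_t) = H$. The function name and arguments are hypothetical.

```python
import numpy as np

def kalman_loglik(z, Z, T_mat, Q, H, a1, P1):
    """Gaussian log-likelihood via the prediction error decomposition.

    z     : (n, m) array of observations
    Z     : (m, k) observation matrix
    T_mat : (k, k) state transition matrix
    Q, H  : state and measurement error covariance matrices
    a1,P1 : mean and covariance of the state at t = 1, before z_1 is observed
    """
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]
    n, m = z.shape
    a = np.array(a1, dtype=float)
    P = np.array(P1, dtype=float)
    loglik = 0.0
    for t in range(n):
        nu = z[t] - Z @ a                      # one-step ahead prediction error
        F = Z @ P @ Z.T + H                    # its covariance matrix F_t
        _, logdet = np.linalg.slogdet(F)
        loglik += -0.5 * (m * np.log(2 * np.pi) + logdet + nu @ np.linalg.solve(F, nu))
        K = np.linalg.solve(F, Z @ P).T        # gain P Z' F^{-1} (F and P are symmetric)
        a_filt = a + K @ nu                    # updating step
        P_filt = P - K @ Z @ P
        a = T_mat @ a_filt                     # prediction step for t + 1
        P = T_mat @ P_filt @ T_mat.T + Q
    return loglik
```

For a scalar AR(1) state observed without error and initialized at its stationary distribution, this recursion reproduces the exact Gaussian likelihood, which provides a simple check before the routine is used inside a numerical optimizer over $\theta$.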
Under both of the above specifications, ML estimation of the structural parameters, $\theta$, can be
carried out using suitable numerical algorithms. In practice, the maximization of $\ell_T(\theta)$ is
complicated when one or more of the structural parameters are not well identified, which tends to
result in log-likelihood profiles that are flat over certain regions of the parameter space. It is
therefore important that identification of the parameters is adequately investigated before estimation.
where the expectational errors, $\xi_{t+1} = y_{t+1} - E(y_{t+1} \mid \Omega_t)$, are by assumption uncorrelated with
past observations $y_{t-1}, y_{t-2}, \ldots$. Also considering that the structural errors are serially
uncorrelated, it follows that the composite errors, $A_3(\theta) v_t - A_2(\theta) \xi_{t+1}$, are also uncorrelated
with past observations. Hence, the GMM procedure can be based on the following moment
conditions
$$E\left\{ y_{i,t-s} \left[ A_0(\theta) y_t - A_1(\theta) y_{t-1} - A_2(\theta) y_{t+1} \right] \right\} = 0, \quad \text{for each } i = 1, 2, \ldots, m \text{ and } s = 1, 2, \ldots, p, \tag{20.82}$$
where p denotes the number of lagged variables used in the construction of the moment condi-
tions. The choice of p is determined by the number of unknown parameters, θ , but otherwise is
arbitrary. The orthogonality condition (20.82) suggests that a consistent estimate of the
parameters can be obtained by applying the IV method using $y_{t-2}, y_{t-3}, \ldots, y_{t-p}$ (as well as current
and past values of any exogenous variables present in the model) as instruments. However, the
validity of the GMM procedure critically depends on whether the parameters θ are identified.
This aspect is clearly illustrated in Example 47.
where $x_t$ is a scalar exogenous variable, and $v_t$ has mean zero and variance $\sigma_v^2$. Suppose that
$z_t = (x_{t-1}, x_t)'$ is chosen as a vector of instrumental variables. Clearly, the components of this
vector satisfy the orthogonality condition (20.82). Further, using (20.84), noting that
$E(x_t x_{t-j}) = \sigma_v^2 \rho^j / (1 - \rho^2)$, for $|\rho| < 1$, it follows that the matrix
$$\Sigma_{zz} = E(z_t z_t') = \frac{\sigma_v^2}{1 - \rho^2} \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix},$$
is non-singular, as required by the econometric theory on GMM (see Section 10.8). However, a
further condition for consistency of the IV estimator is that $T^{-1} \sum_{t=1}^{T} z_t h_t'$, where $h_t = (y_{t+1}, x_t)'$,
converges in probability to a constant matrix of rank 2 (on this, see the discussion on identification
conditions provided in Hall (2005, Ch. 2)). Suppose that $|\beta_f| < 1$, then the unique stationary
solution for $y_t$ is given by
$$y_t = \theta x_t + u_t,$$
where $\theta = \gamma / (1 - \beta_f \rho)$. In this case it is easily seen that the matrix
$$\Sigma_{zh} = E(z_t h_t') = \frac{\sigma_v^2}{1 - \rho^2} \begin{pmatrix} \theta \rho^2 & \rho \\ \theta \rho & 1 \end{pmatrix}$$
has rank 1. Therefore, when $|\beta_f| < 1$, the matrix $\Sigma_{zh}$ is not full rank and the application of the
IV method fails to yield consistent estimates of $\beta_f$ and $\gamma$. This is not surprising, given that the RE
model (20.83)-(20.84) is observationally equivalent to the non-RE model $y_t = \delta x_t + u_t$, and
therefore $\beta_f$ and $\gamma$ are not identifiable. See also the discussion in Pesaran (1987c, Ch. 7).
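The rank failure in Example 47 can be seen in a small simulation. The sketch below assumes, for illustration, that $x_t$ is a Gaussian AR(1) process with unit innovation variance and that $y_t$ is generated from the stationary solution $y_t = \theta x_t + u_t$ with standard normal $u_t$; the parameter values and function names are hypothetical.

```python
import numpy as np

def sigma_zh(theta, rho, sigma_v2=1.0):
    """Population matrix E(z_t h_t') with z_t = (x_{t-1}, x_t)', h_t = (y_{t+1}, x_t)'."""
    scale = sigma_v2 / (1.0 - rho**2)
    return scale * np.array([[theta * rho**2, rho],
                             [theta * rho, 1.0]])

def sample_sigma_zh(beta_f, gamma, rho, T, seed=0):
    """Sample analogue T^{-1} sum z_t h_t' from simulated data."""
    rng = np.random.default_rng(seed)
    theta = gamma / (1.0 - beta_f * rho)
    v = rng.standard_normal(T + 2)
    u = rng.standard_normal(T + 2)
    x = np.zeros(T + 2)
    x[0] = v[0] / np.sqrt(1.0 - rho**2)          # draw from the stationary distribution
    for t in range(1, T + 2):
        x[t] = rho * x[t - 1] + v[t]
    y = theta * x + u                             # stationary RE solution
    z = np.column_stack([x[0:T], x[1:T + 1]])     # (x_{t-1}, x_t)
    h = np.column_stack([y[2:T + 2], x[1:T + 1]]) # (y_{t+1}, x_t)
    return z.T @ h / T

beta_f, gamma, rho = 0.5, 1.0, 0.7               # illustrative values (assumed)
theta = gamma / (1.0 - beta_f * rho)
print("population determinant:", np.linalg.det(sigma_zh(theta, rho)))
print("sample singular values:",
      np.linalg.svd(sample_sigma_zh(beta_f, gamma, rho, T=200_000), compute_uv=False))
```

The population determinant is zero for any admissible parameter values, and the smallest singular value of the sample matrix is close to zero relative to the largest, which is the numerical counterpart of the failure of the rank condition.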
The GMM method is particularly convenient for estimation of unknown parameters of the
nonlinear rational expectations equations obtained as first-order conditions of intertemporal
optimization problems encountered by agents and firms. However, such applications can be
problematic since in practice it might not be possible to ascertain whether the underlying
parameters are identifiable. Example 47 clearly illustrates the danger of indiscriminate application of
the GMM method to RE models, and shows that identification of the parameters should be
checked prior to estimation.
A textbook treatment of GMM can be found in Hall (2005), where GMM inference tech-
niques are also illustrated through some empirical examples from macroeconomics (see, in par-
ticular, Ch. 9).
In Bayesian analysis the likelihood function, $L_T(\theta; Y)$, given by (20.77), is combined with
the prior probability distribution, $P(\theta)$, to obtain the posterior distribution
$$P_T(\theta \mid Y) \propto P(\theta) L_T(\theta; Y).$$
When the structural parameters, $\theta$, are identified, the posterior probability distribution, $P_T(\theta \mid Y)$,
will become dominated by the likelihood function and the posterior mean will converge in
probability to the ML estimator of $\theta$. In cases where one or more elements of $\theta$ are not identified, or
are only weakly identified, this convergence need not take place or might require $T$ to be very
large, which is not the case in most empirical macroeconomic applications. In such cases the
Bayesian inference could be quite sensitive to the choice of the priors and it is important that the
robustness of the results to the choice of priors is investigated.
In the context of DSGE models, the errors are typically assumed to be Gaussian and the log-
likelihood is given by (20.77), or by (20.81) if the errors are serially correlated. For the priors
many specifications have been considered in the literature including improper priors, conjugate
priors, Minnesota priors (Litterman (1980, 1986) and Doan, Litterman, and Sims (1984)), and
priors more recently proposed by Sims and Zha (1998) that are intended to simplify the com-
putations of the posterior distribution in the case of structural VARs. Since the DSGE model can
be viewed as a restricted VAR model, the prior specification must also satisfy the DSGE param-
eter restrictions. To this end Markov chain Monte Carlo (MCMC) methods are employed to
generate random samples for the purpose of numerical evaluation of the posterior distributions
that are otherwise impossible to evaluate analytically or by standard quadrature or Monte Carlo
methods. An introduction to the MCMC algorithm is provided in Chib (2011). Empirical appli-
cations of Bayesian DSGE models can be carried out using standard software packages such as
Dynare (<http://www.dynare.org/>).
Example 48 As an example of Bayesian estimation consider the NKPC in Example 45, and note
that
yt = Dzt + vt ,
Under the assumption that $v_t \sim IIDN(0, \Sigma_v)$, and given the observations, $Y = (y_1, y_2, \ldots, y_T)$,
the log-likelihood function of the model is given by
$$\ell_T(\theta) = -\frac{2T}{2} \ln(2\pi) - \frac{T}{2} \ln |\Sigma_v| - \frac{1}{2} \sum_{t=1}^{T} [y_t - D(\psi, \phi) z_t]' \Sigma_v^{-1} [y_t - D(\psi, \phi) z_t],$$
where $\theta = (\psi', \rho', \mathrm{vech}(\Sigma_v)')'$. For the posterior distribution of $\theta$ we have
where P(θ ) denotes the prior distribution of θ . To compute the posterior distribution of a parameter
of interest, such as γ or β f , numerical methods are typically used to integrate out the effects of the
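other parameters. In small samples, the outcome can critically depend on the choice of $P(\theta)$. The
rank condition for identification for this example is $\mathrm{Rank}(\partial \alpha / \partial \psi') = 2$, where
$$\frac{\partial \alpha}{\partial \psi'} = \frac{1}{1 - \beta_f \rho_1 - \beta_f^2 \rho_2} \begin{bmatrix} \gamma \left( \rho_2 + \dfrac{(\rho_1 + \beta_f \rho_2)(\rho_1 + 2\beta_f \rho_2)}{1 - \beta_f \rho_1 - \beta_f^2 \rho_2} \right) & \rho_1 + \beta_f \rho_2 \\ \gamma \rho_2 \dfrac{\rho_1 + 2\beta_f \rho_2}{1 - \beta_f \rho_1 - \beta_f^2 \rho_2} & \rho_2 \end{bmatrix}.$$
In the case where $\rho_2$ is close to zero the parameters $\gamma$ and $\beta_f$ could be weakly identified and their
posterior distribution might depend on the chosen prior distribution, $P(\theta)$, even for sufficiently large
samples.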
Various econometric and computational issues involved in the application of Bayesian tech-
niques to DSGE models are surveyed in Karagedikli et al. (2010) and Del Negro and Schorfheide
(2011). An overview of Bayesian analysis is provided in Appendix C.
20.15 Exercises
1. Consider the new Keynesian Phillips curve
where $\pi_t$ is the rate of inflation, $x_t$ is a measure of output gap, and $E(\pi_{t+1} \mid \Omega_t)$ is the expectation
of $\pi_{t+1}$ conditional on the information set $\Omega_t$, which is assumed to contain at least current and
past values of inflation and the output gap.
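$$\begin{pmatrix} u_t \\ v_t \end{pmatrix} \sim IID \left[ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_u^2 & 0 \\ 0 & \sigma_v^2 \end{pmatrix} \right].$$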
xt = θ π t−1 + vt .
Derive the conditions under which the solution for π t is unique in this case.
where $r$ is the real rate of discount, $p_t$ is the real share price, $d_t$ is real dividends paid per share,
$u_t$ is a serially uncorrelated process, and $\Omega_t$ is the information available at time $t$. Suppose that
$d_t$ follows a random walk with a non-zero drift, $\mu$,
$$d_t = \mu + d_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim IID(0, \sigma^2).$$
(a) Derive the solution of pt and discuss the conditions under which the solution is unique.
(b) Suppose that there are feedbacks from pt−1 − pt−2 into dt so that
Compare your solution with the case without feedbacks and discuss the conditions under
which the solution with feedbacks is unique.
where yt is an m × 1 vector of target variables (such as employment, sales and stock levels), yt∗
is the desired level of yt , H and G are m×m matrices with known coefficients, β (0 ≤ β < 1)
is a known discount factor. Suppose further that
yt∗ = xt + ut ,
xt = xt−1 + ε t ,
where is an m×k matrix of known constants, xt is a k×1 vector of observed forcing variables
assumed to follow a first-order vector autoregression. Finally, ut and ε t are unobserved serially
uncorrelated error processes.
(a) Derive the first-order (Euler) conditions for the following minimization problem
$$\min_{y_t, y_{t+1}, \ldots} E\left[ \sum_{j=0}^{\infty} \beta^j c(y_{t+j}) \,\Big|\, I_t \right],$$
where $I_t$ is the information set available to the firm at time $t$ and contains at least current
and past values of $y_t$ and $x_t$. Show that these conditions can be written as the following
canonical form rational expectations model
$$y_t = A y_{t-1} + B E(y_{t+1} \mid I_t) + w_t,$$
where
$$A = (H + (1 + \beta) G)^{-1} G, \quad B = \beta A, \quad w_t = (H + (1 + \beta) G)^{-1} H y_t^*.$$
Establish the conditions under which this optimization problem has a unique solution.
(b) Write down the solution in the case where Ik − is rank deficient.
(c) Discuss the conditions under which A and β can be identified from time series observa-
tions on xt and yt .
where $I_t$ is the information set at time $t$, $|\alpha| < 1$, $u_t \sim IID(0, \sigma_u^2)$, and $x_t$ follows the AR(1)
process
$$x_t = \rho x_{t-1} + v_t.$$
yt = αyt+1 + γ xt + ξ t
= β wt + ξ t ,
21 Vector Autoregressive Models
21.1 Introduction
This chapter provides an overview of vector autoregressive (VAR) models, with a particular
focus on estimation and hypothesis testing.
yt = yt−1 + ut , t = 1, 2, . . . ,
where $\Phi_i$ are $m \times m$ matrices. The above model can be extended to include deterministic
components such as intercepts, linear trends or seasonal dummy variables, as well as weakly
exogenous I(1) variables. The inclusion of deterministic components in the model will be discussed in
Section 21.4, while the extension to weakly exogenous variables will be reviewed in Chapter 23.
which is a VAR(1) model, but in the $mp \times 1$ vector of random variables
$Y_t = (y_t', y_{t-1}', \ldots, y_{t-p+1}')'$, namely
$$Y_t = \Phi Y_{t-1} + U_t, \tag{21.2}$$
$$Y_t = \Phi^{t+M-p} Y_{-M+p} + \sum_{j=0}^{t+M-p-1} \Phi^j U_{t-j}. \tag{21.4}$$
(21.4) by letting $M \to \infty$, namely by assuming that the process has been in operation for a
sufficiently long time before its realization, $Y_t$, is observed at time $t$. Also, when all eigenvalues
of $\Phi$ lie inside the unit circle, the norm of $\Phi^j$ defined by $[\mathrm{Tr}(\Phi^j \Phi^{j\prime})]^{1/2}$ tends to zero exponentially in
$j$, and the process $Y_t = \sum_{j=0}^{\infty} \Phi^j U_{t-j}$ will also be absolutely summable, in the sense that the sum
of absolute values of the elements of $\Phi^j$, for $j = 0, 1, 2, \ldots$, converges. It is then easily seen that $Y_t$
(or $y_t$) will have finite moments of a given order, assuming $u_t$ has finite moments of the same order. In
particular, by Assumptions B1 to B3, since $u_t \sim IID(0, \Sigma)$, then
the stability condition of the $Y_t$ (or $y_t$) can be written equivalently in terms of the roots of the
determinantal equation
$$\left| I_m - \Phi_1 z - \Phi_2 z^2 - \ldots - \Phi_p z^p \right| = 0. \tag{21.6}$$
In this formulation the $Y_t$ process will be stable and covariance stationary if all the roots of (21.6)
lie outside the unit circle ($|z| > 1$).
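The equivalence between the two statements of the stability condition can be checked numerically. The sketch below is a minimal illustration with hypothetical function names and assumed coefficient values: it stacks the coefficient matrices of a VAR($p$) into the companion matrix of the VAR(1) representation and tests whether all its eigenvalues lie inside the unit circle, which corresponds to all roots of (21.6) lying outside it.

```python
import numpy as np

def companion(Phis):
    """Companion matrix of a VAR(p) with coefficient matrices Phi_1, ..., Phi_p."""
    Phis = [np.atleast_2d(np.asarray(P, dtype=float)) for P in Phis]
    m, p = Phis[0].shape[0], len(Phis)
    top = np.hstack(Phis)                              # [Phi_1 Phi_2 ... Phi_p]
    if p == 1:
        return top
    bottom = np.hstack([np.eye(m * (p - 1)), np.zeros((m * (p - 1), m))])
    return np.vstack([top, bottom])

def is_stable(Phis):
    """True if all eigenvalues of the companion matrix lie inside the unit circle."""
    return bool(np.max(np.abs(np.linalg.eigvals(companion(Phis)))) < 1.0)

# Scalar AR(2) illustration (assumed values): y_t = 0.5 y_{t-1} + 0.3 y_{t-2} + u_t.
eigs = np.linalg.eigvals(companion([0.5, 0.3]))
roots = np.roots([-0.3, -0.5, 1.0])                    # roots of 1 - 0.5 z - 0.3 z^2 = 0
print("companion eigenvalues:", np.sort(eigs.real))
print("inverse roots of (21.6):", np.sort((1.0 / roots).real))
print("stable:", is_stable([0.5, 0.3]))
```

The eigenvalues of the companion matrix are the reciprocals of the roots of the determinantal equation, so the two checks agree.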
and $Y_t$ (or $y_t$) is said to be a unit root process. It is, however, important to note that the above
condition does not rule out the possibility of $y_t$ being integrated of order 2 or more.1 To ensure
that $y_t \sim I(1)$, namely that $\Delta y_t \sim I(0)$, further restrictions are required. See, for example,
Johansen (1991, p. 1559, Theorem 4.1) and Johansen (1995, p. 49, Theorem 4.2).
Define the long-run multiplier matrix as $\Pi = I_m - \Phi_1 - \Phi_2 - \ldots - \Phi_p$. When all the roots of
(21.6) satisfy $|z| > 1$ we have $|I_m - \Phi_1 - \Phi_2 - \ldots - \Phi_p| \neq 0$, and $\Pi$ will be full rank, namely
$\mathrm{Rank}(\Pi) = m$. On the other hand, if (21.7) is true, we must have $\mathrm{Rank}(\Pi) = r < m$, with $m - r$
being the number of unit roots in the system. See also Chapter 22.
21.3 Estimation
We now focus on estimation of the parameters of (21.1). To this end, it is convenient to rewrite
(21.1) as follows
1 Recall that a process is said to be integrated of order q if it must be differenced q times before stationarity is achieved.
where
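$$A' = (\Phi_1, \Phi_2, \ldots, \Phi_p), \quad g_t = (y_{t-1}', y_{t-2}', \ldots, y_{t-p}')'. \tag{21.8}$$
Assumption B5: The augmented VAR(p) model, (21.1), is stable. That is, all the roots of the
determinantal equation
$$\left| I_m - \Phi_1 \lambda - \Phi_2 \lambda^2 - \cdots - \Phi_p \lambda^p \right| = 0, \tag{21.9}$$
lie outside the unit circle.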
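Since the system of equations (21.1) is in the form of a SURE model with all the equations
having the same set of regressors, $g_t$, in common, it then follows that when the errors $u_t$ are Gaussian the
ML estimators of the unknown coefficients can be computed by OLS regressions of $y_t$ on $g_t$ (see
Section 19.2.1). Writing (21.1) in matrix notation we have
$$y_t = A' g_t + u_t, \tag{21.10}$$
where $A$ and $g_t$ are defined in (21.8). Hence, the ML estimators of $A$ and $\Sigma$ are given by
$$\hat{A} = \left( \sum_{t=1}^{T} g_t g_t' \right)^{-1} \sum_{t=1}^{T} g_t y_t', \tag{21.11}$$
and
$$\hat{\Sigma} = T^{-1} \sum_{t=1}^{T} (y_t - \hat{A}' g_t)(y_t - \hat{A}' g_t)'. \tag{21.12}$$
The maximized value of the log-likelihood function is
$$\ell(\hat{A}, \hat{\Sigma}) = -\frac{Tm}{2} \left( 1 + \log 2\pi \right) - \frac{T}{2} \log \left| \hat{\Sigma} \right|. \tag{21.13}$$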
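The estimators in (21.11) to (21.13) amount to a multivariate least squares regression. The sketch below is one possible way to compute them, assuming a VAR($p$) without deterministic terms and treating the first $p$ observations as initial values; the function name is hypothetical.

```python
import numpy as np

def fit_var(y, p):
    """OLS/ML estimates of a VAR(p) without deterministic terms.

    y : (n, m) array; the first p rows are used as initial values, so T = n - p.
    Returns A_hat (mp x m, with y_t = A_hat' g_t + residual), Sigma_hat, and the
    maximized Gaussian log-likelihood.
    """
    y = np.asarray(y, dtype=float)
    n, m = y.shape
    T = n - p
    Y = y[p:]                                               # rows are y_t'
    G = np.hstack([y[p - j:n - j] for j in range(1, p + 1)])  # rows are g_t'
    A_hat = np.linalg.solve(G.T @ G, G.T @ Y)               # (sum g g')^{-1} sum g y'
    resid = Y - G @ A_hat
    Sigma_hat = resid.T @ resid / T                         # divisor T, not T - mp
    _, logdet = np.linalg.slogdet(Sigma_hat)
    loglik = -0.5 * T * m * (1.0 + np.log(2.0 * np.pi)) - 0.5 * T * logdet
    return A_hat, Sigma_hat, loglik
```

The transpose of the first $m$ rows of `A_hat` estimates $\Phi_1$, the next $m$ rows give $\Phi_2$, and so on. The covariance estimate uses the divisor $T$, which is what makes the maximized log-likelihood collapse to the simple expression in (21.13).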
xt = yt − dt ,
or, equivalently,
This is because the trend properties of $y_t$ are determined by $d_t$ under (21.14), while if
specification (21.15) is adopted the trend property of $y_t$ would critically depend on the number of
eigenvalues of $\Phi$ (if any) that fall on the unit circle. For example, setting $d_t = a_0 + a_1 t$, and
assuming $\Phi = I_m$, under (21.14) we have
$$y_t = y_0 + a_1 t + s_t,$$
where $s_t = \sum_{j=1}^{t} u_j$. Under (21.15) we have
$$y_t = y_0 + a_0 t + a_1 \frac{t(t + 1)}{2} + s_t.$$
Only when all eigenvalues of $\Phi$ fall within the unit circle are the two specifications stochastically
equivalent. In this case the two specifications, (21.14) and (21.15), can be solved as
$$y_t = d_t + (I_m - \Phi L)^{-1} u_t,$$
and
respectively. Since dt is a deterministic process it then follows that both processes have the same
covariance matrix. The two processes have the same means when dt = a0 + a1 t. To see this
note that
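$$\begin{aligned} (I_m - \Phi L)^{-1} (a_0 + a_1 t) &= \left( I_m + \Phi L + \Phi^2 L^2 + \ldots \right) (a_0 + a_1 t) \\ &= \left( I_m + \Phi L + \Phi^2 L^2 + \ldots \right) a_0 + \left( I_m + \Phi L + \Phi^2 L^2 + \ldots \right) a_1 t \\ &= b_0 + b_1 t, \end{aligned}$$
where
$$b_0 = (I_m - \Phi)^{-1} a_0 - \left( \Phi + 2\Phi^2 + 3\Phi^3 + \ldots \right) a_1,$$
$$b_1 = (I_m - \Phi)^{-1} a_1.$$
Hence, in the stationary case both specifications will have the same linear trends.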
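$$AIC_p = -\frac{Tm}{2} \left( 1 + \log 2\pi \right) - \frac{T}{2} \log \left| \hat{\Sigma}_p \right| - ms, \tag{21.16}$$
and
$$SBC_p = -\frac{Tm}{2} \left( 1 + \log 2\pi \right) - \frac{T}{2} \log \left| \hat{\Sigma}_p \right| - \frac{ms}{2} \log(T), \tag{21.17}$$
where $s = mp$, and $\hat{\Sigma}_p$ is defined by (21.12). The $AIC_p$ and $SBC_p$ can be computed for
$p = 0, 1, 2, \ldots, P$, where $P$ is the maximum order for the VAR model chosen by the user.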
The log-likelihood ratio statistic for testing the hypothesis that the order of the VAR is $p$
against the alternative that it is $P$ (with $P > p$) is given by
$$LR_{P,p} = T \left( \log \left| \hat{\Sigma}_p \right| - \log \left| \hat{\Sigma}_P \right| \right), \tag{21.18}$$
for $p = 0, 1, 2, \ldots, P - 1$, where $P$ is the maximum order for the VAR model selected by the
user, $\hat{\Sigma}_p$ is defined by (21.12), and $\hat{\Sigma}_0$ refers to the ML estimator of the system covariance matrix
of $y_t$.
Under the null hypothesis, the LR statistic in (21.18) is asymptotically distributed as a
chi-squared variate with $m^2 (P - p)$ degrees of freedom.
In small samples the use of the LR statistic, (21.18), tends to result in over-rejection of the
null hypothesis. In an attempt to take some account of this small sample problem, in practice the
following degrees of freedom adjusted LR statistics can be computed
$$LR^*_{P,p} = (T - mP) \left( \log \left| \hat{\Sigma}_p \right| - \log \left| \hat{\Sigma}_P \right| \right), \tag{21.19}$$
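The selection criteria and test statistics above can be computed together. The following sketch makes two assumptions that go beyond the text: the VAR contains no deterministic terms, and all orders $p = 0, 1, \ldots, P$ are estimated on the same sample, with the first $P$ observations held back as initial values. Function names are hypothetical.

```python
import numpy as np

def sigma_hat(y, p, P):
    """ML covariance estimate of a VAR(p) fitted on observations P+1, ..., n."""
    n, m = y.shape
    Y = y[P:]
    if p == 0:
        resid = Y                                   # no regressors: Sigma_0 is E(y y')
    else:
        G = np.hstack([y[P - j:n - j] for j in range(1, p + 1)])
        coef, *_ = np.linalg.lstsq(G, Y, rcond=None)
        resid = Y - G @ coef
    return resid.T @ resid / len(Y)

def order_selection(y, P):
    """AIC, SBC, LR and adjusted LR statistics for p = 0, ..., P."""
    y = np.asarray(y, dtype=float)
    n, m = y.shape
    T = n - P
    logdets = [np.linalg.slogdet(sigma_hat(y, p, P))[1] for p in range(P + 1)]
    rows = []
    for p in range(P + 1):
        s = m * p
        ll = -0.5 * T * m * (1.0 + np.log(2.0 * np.pi)) - 0.5 * T * logdets[p]
        rows.append({
            "p": p,
            "LL": ll,
            "AIC": ll - m * s,                      # (21.16)
            "SBC": ll - 0.5 * m * s * np.log(T),    # (21.17)
            "LR": T * (logdets[p] - logdets[P]),    # (21.18)
            "LR_adj": (T - m * P) * (logdets[p] - logdets[P]),  # (21.19)
            "df": m * m * (P - p),
        })
    return rows
```

The selected order is the one with the largest AIC or SBC value, since both criteria are defined here as penalized log-likelihoods. The LR statistics can be compared with chi-squared critical values with `df` degrees of freedom.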
Example 49 We now consider the problem of selecting the order of a trivariate VAR model in the
output growths of USA, Japan and Germany, estimated over the period 1964(3)-1992(4). Table
21.1 reports the log-likelihood, Akaike and the Schwarz criteria, the LR and adjusted LR statistics
for the seven VAR models VAR(p), for p = 0, 1, 2, . . . , 6. As expected, the maximized values of
the log-likelihood function given under the column headed LL increase with p. However, the Akaike
and the Schwarz criteria select the orders 1 and 0, respectively. The log-likelihood ratio statistics
reject order 0, but do not reject a VAR of order 1. In the light of these results we choose the VAR(1)
model. Note that it is quite usual for the SBC to select a lower order VAR as compared with the AIC.
Having chosen the order of the VAR it is prudent to examine the residuals of individual equations for
serial correlation. Tables 21.2 to 21.4 show the regression results for the US, Japan, and Germany,
respectively. There is no evidence of residual serial correlation in the case of the US and Germany’s
output equations, but there is statistically significant evidence of residual serial correlation in the case
of Japan’s output equation. There is also important evidence of departures from normality in the case
of output equations for the US and Japan. A closer examination of the residuals of these equations
suggests considerable volatility during the early 1970s as a result of the abandonment of the Bretton
Woods system and the quadrupling of oil prices. It is therefore likely that the remaining serial
correlation in the residuals of Japan’s output equation may be due to these unusual events. Such a
possibility can be handled by introducing a dummy variable for the oil shock in the VAR model.
Table 21.1 Selecting the order of a trivariate VAR model in output growths
Test statistics and choice criteria for selecting the order of the VAR model
Diagnostic Tests
A:Lagrange multiplier test of residual serial correlation; B:Ramsey’s RESET test using the square of the fitted
values
C:Based on a test of skewness and kurtosis of residuals; D:Based on the regression of squared residuals on
squared fitted values
More specifically, a variable X is said to ‘Granger-cause’ a variable Y if past and present values
of X contain information that helps predict future values of Y better than using the information
contained in past and present values of Y alone.
Consider two stationary processes $y_t$ and $x_t$. Let $y^*_{T+h|T}$ be the forecast of $y_{T+h}$ formed
at time $T$, using the information set $\Omega_T$, and $\tilde{y}^*_{T+h|T}$ be the forecast based on the information
set $\tilde{\Omega}_T$, containing all the information in $\Omega_T$ except for that on past and present values of the
process $\{x_t\}$. Let $L_q(.,.)$ be a quadratic loss function (see Section 17.5). The process $\{x_t\}$ is said
to Granger-cause $y_t$ if
\[ E\left[L_q(y_{T+h}, y^*_{T+h|T})\right] < E\left[L_q(y_{T+h}, \tilde{y}^*_{T+h|T})\right], \quad \text{for at least one } h = 1, 2, \ldots . \]
Hence, if $\{x_t\}$ fails to Granger-cause $y_t$, for all $h > 0$, the mean square forecast error based on
$y^*_{T+h|T}$ is the same as that based on $\tilde{y}^*_{T+h|T}$.²
2 See Dufour and Renault (1998) for further details on causality in Granger’s sense.
There are other notions of ‘cause’ and ‘effect’ discussed in philosophy and statistics that we
shall not be considering here. But even if we confine our analysis to the causality in Granger’s
sense, we need to be careful that all proximate third-party channels that influence the interactions
of Y and X are taken into account. The most obvious case arises when there exists a third variable
Z which influences both Y and X, but with differential time delays. As an example, suppose that
changes in Z affect X before affecting Y. Then, in the absence of a priori knowledge of Z, it
might be falsely concluded that X is the cause of Y. In practice, the presence of Z (not known to
the investigator) might not be an issue if the profiles of the effects of Z on Y, and Z on X are
stable over time. But the situation could be very different if (for some reason unknown to the
investigator) the time delays in the way Z affects Y and X change.
Other examples arise in situations where economic agents possess forecasting skills. Consider
an individual who decides to take an umbrella when leaving home depending on whether the
forecast is rain or shine. Suppose further that this individual is reasonably good at predicting
the weather over the course of a day. Let X be an indicator variable that takes the value of 1
if the individual carries an umbrella and zero otherwise, and Y takes the value of 1 if it rains
and zero otherwise. Then it is clear that a crude application of the Granger causality test to X
and Y, can lead to the misleading conclusion that the decision to take the umbrella causes rain!
However, if a third variable Z, which captures the forecasting skill of the individual, is included
in the analysis, we will soon learn that the correlation between Y and X represents the extent to
which the individual is good at forecasting the weather.
In short, we need to consider all possible variables that interact to bring about an outcome
before making any definite conclusions about causality from the application of Granger non-
causality tests. In what follows we discuss a useful generalization that allows for additional factors
in the application of the Granger test. But the fundamental problem remains that no matter how
exhaustive we are in our analysis of Granger non-causality, we still need to be aware of other
possible causal links that we might have inadvertently overlooked.
where $Y_1$ and $Y_2$ are $T \times m_1$ and $T \times m_2$ matrices of observations on $y_{1t}$ and $y_{2t}$ respectively, $G_1$
and $G_2$ are $T \times pm_1$ and $T \times pm_2$ matrices of observations on the $p$ lagged values of $y_{1,t-\ell}$ and
$y_{2,t-\ell}$, for $t = 1, 2, \ldots, T$, $\ell = 1, 2, \ldots, p$, respectively. The process $y_{2t}$ does not Granger-cause
$y_{1t}$ if the $m_1 m_2 p$ restrictions $A_{12} = 0$ hold.
Y = AG + U,
with
Y1 A11 A12 G1 U1
Y= ,A= ,G= ,U= .
Y2 A21 A22 G2 U2
See formula (21.12). $\tilde{\Sigma}_R$ is the ML estimator of $\Sigma$ when the restrictions $A_{12} = 0$ are imposed.
Under the null hypothesis that $A_{12} = 0$, $LR_G$ is asymptotically distributed as a chi-squared
variate with $m_1 m_2 p$ degrees of freedom.
Since under $A_{12} = 0$ the system of equations (21.20)–(21.21) is block recursive, $\tilde{\Sigma}_R$ can
be computed in the following manner:
\[ \tilde{U}_2 = Y_2 - G_1 \tilde{A}^*_{21} - G_2 \tilde{A}^*_{22}, \]
where $\tilde{A}^*_{21}$ and $\tilde{A}^*_{22}$ are the OLS estimators of $A^*_{21}$ and $A^*_{22}$ in (21.22). Define
\[ \tilde{U} = (\tilde{U}_1 : \tilde{U}_2). \]
Then
\[ \tilde{\Sigma}_R = T^{-1} \tilde{U}' \tilde{U}. \qquad (21.23) \]
\[ y_t = \Phi y_{t-1} + u_t, \quad t = 1, 2, \ldots . \qquad (21.24) \]
Following the discussions in Chapter 17, conditional on $\Omega_T = (y_T, y_{T-1}, \ldots)$, the point forecast
of $y_{T+h}$ is given by
\[ y^*_{T+h|T} = E\left(y_{T+h} \mid \Omega_T\right), \qquad (21.25) \]
where A is a symmetric positive definite matrix. The elements of A measure the relative importance
of the forecast errors of the different variables in $y_t$ and their correlations. It is important
to note that the optimal forecast does not depend on A. For the VAR(1) specification in (21.24)
we have
\[ y^*_{T+h|T} = \Phi^h y_T . \]
For the VAR(p) specification defined by (21.1) the h-step ahead forecasts, $y^*_{T+h|T}$, can be obtained
recursively by noting that
\[ y^*_{T+j|T} = \sum_{i=1}^{p} \Phi_i \, y^*_{T+j-i|T}, \quad j = 1, 2, \ldots, h, \]
with initial values $y^*_{T+j-i|T} = y_{T+j-i}$ for $j - i \leq 0$. See Chapter 23 for a discussion of forecasting
in the case of cointegrating VARs possibly in the presence of weakly exogenous variables.
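The recursion above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name `var_forecast`, the coefficient matrix `Phi` and the starting value `y_T` are hypothetical and are not taken from the estimated model of Example 49; the VAR is assumed to have no intercept or trend.

```python
import numpy as np

def var_forecast(phis, history, h):
    """Recursive h-step ahead forecasts for a VAR(p) without deterministic terms.

    phis    : list [Phi_1, ..., Phi_p] of (m x m) coefficient matrices
    history : (n x m) array of observations with n >= p; the last row is y_T
    h       : forecast horizon
    Returns an (h x m) array whose j-th row is the forecast of y_{T+j}.
    """
    p = len(phis)
    history = np.atleast_2d(np.asarray(history, dtype=float))
    # Initial values: for j - i <= 0 the "forecast" is the observed value itself.
    path = [row for row in history[-p:]]
    for _ in range(h):
        # path[-1 - i] is the value (i + 1) periods back, so it is paired with Phi_{i+1}.
        path.append(sum(phis[i] @ path[-1 - i] for i in range(p)))
    return np.array(path[p:])

# Hypothetical VAR(1): the recursion should reproduce Phi^h y_T.
Phi = np.array([[0.5, 0.1], [0.0, 0.3]])
y_T = np.array([1.0, 2.0])
forecasts = var_forecast([Phi], y_T, 3)
```

For a VAR(1) the last row of `forecasts` coincides with `np.linalg.matrix_power(Phi, 3) @ y_T`, which is the closed-form expression given in the text.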
Example 50 Table 21.5 reports the multivariate, multi-step ahead forecasts and forecast errors of
output growths for the VAR(1) model estimated in Example 49 (see Tables 21.2–21.4), for the
four quarters of 1993. As can be seen from the summary statistics, the size of the forecast errors and
the in-sample residuals are very similar. A similar picture also emerges by plotting in-sample fitted
values and out-of-sample forecasts (see Figure 21.1). It is, however, important to note that the US
growth experience in 1993 may not have been a stringent enough test of the forecast performance
of the VAR, as the US output growth has been positive in all four quarters. A good test of forecast
performance is to see whether the VAR model predicts the turning points of the output movements.
Similarly, forecasts of output growths for Japan and Germany can also be computed. For Japan the
root mean sum of squares of the forecast errors over the 1993(1)–1993(4) period turned out to be
1.48 per cent, which is slightly higher than the value of 1.02 per cent obtained for the root mean sum
of squares of residuals over the estimation period. It is also worth noting that the growth forecasts for
Japan miss the two negative quarterly output growths that occurred in the second and fourth quar-
ters of 1993. A similar conclusion is also reached in the case of output growth forecasts for Germany.
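coefficients, and it is square summable (refer to Appendix A for details for the square summability
of a matrix).
The autocovariance generating function for (21.26) is
\[ G_y(z) = A(z) \, \Sigma \, A'(z^{-1}), \]
and the corresponding spectral density is
\[ F_y(\omega) = \frac{1}{2\pi} A(e^{i\omega}) \, \Sigma \, A'(e^{-i\omega}), \quad 0 \leq \omega \leq \pi . \]
Evaluating at zero frequency, we obtain
\[ F_y(0) = \frac{1}{2\pi} A(1) \, \Sigma \, A'(1). \]
\[ y_t = \Phi y_{t-1} + u_t . \qquad (21.27) \]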
[Figure 21.1: in-sample fitted values and out-of-sample forecasts of US output growth; series DLYUSA and Forecast, 1964Q1–1993Q4.]
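\[ F_y(\omega) = \frac{1}{2\pi} \left(I_m - \Phi e^{i\omega}\right)^{-1} \Sigma \left(I_m - \Phi' e^{-i\omega}\right)^{-1} . \]
Evaluating at zero frequency, we obtain
\[ F_y(0) = \frac{1}{2\pi} (I_m - \Phi)^{-1} \, \Sigma \, \left(I_m - \Phi'\right)^{-1} . \]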
For an introductory text on the estimation of the spectrum see Chatfield (2003). For more
advanced treatments of the subject see Priestley (1981) and Brockwell and Davis (1991). See
also Chapter 13.
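As a small numerical illustration of the VAR(1) spectral density formula, the following sketch evaluates it at an arbitrary frequency. The function name and the matrices `Phi` and `Sigma` are hypothetical choices made for illustration only; `Phi` is assumed to have all eigenvalues inside the unit circle.

```python
import numpy as np

def var1_spectral_density(Phi, Sigma, omega):
    """Spectral density matrix of a stationary VAR(1) at frequency omega."""
    m = Phi.shape[0]
    identity = np.eye(m)
    left = np.linalg.inv(identity - Phi * np.exp(1j * omega))
    right = np.linalg.inv(identity - Phi.T * np.exp(-1j * omega))
    return left @ Sigma @ right / (2 * np.pi)

# Hypothetical stationary VAR(1) parameters.
Phi = np.array([[0.5, 0.1], [0.2, 0.3]])
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])

F0 = var1_spectral_density(Phi, Sigma, 0.0)      # zero-frequency value
F_mid = var1_spectral_density(Phi, Sigma, 1.0)   # an interior frequency
```

At zero frequency the result is real and equals $(I_m - \Phi)^{-1} \Sigma (I_m - \Phi')^{-1} / 2\pi$; at other frequencies the matrix is complex but Hermitian.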
21.10 Exercises
1. Consider the bivariate vector autoregressive model
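\[ \begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \end{pmatrix} + \begin{pmatrix} u_{1t} \\ u_{2t} \end{pmatrix}, \]
\[ y_t = A y_{t-1} + u_t, \quad u_t \sim IIDN(0, \Sigma), \]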
where
\[ \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix} . \]
(a) Derive the conditional mean and variance of y1t with respect to y2t , and lagged values of
y1t and y2t .
(b) Show that the univariate representation of y1t (or y2t ) is an autoregressive moving average
process of order (1,1).
where $\Phi_i$, for $i = 1, 2$, are $m \times m$ matrices of fixed coefficients, and $u_t$ is a mean zero, serially
uncorrelated vector of disturbances with a common positive definite variance–covariance
matrix, $\Sigma$. Derive the conditions under which the VAR(2) model defined in (21.28) is stationary.
3. Consider the first-order vector autoregressive model
yt = Ayt−1 + ut ,
where $y_t$ is an $m \times 1$ vector of observed variables, and $u_t \sim IID(0, \Sigma)$, with $\Sigma$ being an $m \times m$
positive definite matrix.
(a) Show that yt is covariance stationary if all eigenvalues of A lie inside the unit circle.
(b) Derive the point forecasts of yT+1 , yT+2 , . . . , yT+h , based on observations y1 , y2 ,
. . . , yT , and show that the j-step ahead forecast errors
\[ \xi_{T+j} = y_{T+j} - E\left(y_{T+j} \mid y_T, y_{T-1}, \ldots, y_1\right), \quad \text{for } j = 1, 2, \ldots, h \]
(d) Discuss the relevance of this result for multi-period ahead forecasting.
4. Suppose that the m-dimensional random variable, $y_t = (y_{1t}, y_{2t}, \ldots, y_{mt})'$, follows a VAR(1)
process.
(a) Show that y1t follows a univariate ARMA(m, m − 1) process. Start by first proving this
result for m = 2.
(b) Derive the pair-wise correlation of yit and yjt across all i and j, and show that the univariate
representations form a system of seemingly unrelated autoregressions.
(c) Set m = 2 and compare the forecast performance of y1t based on a VAR(1) in (y1t , y2t )
with forecasts obtained from univariate ARMA(2, 1).
where $y_{1t}$ and $y_{2t}$ are $m_1 \times 1$ and $m_2 \times 1$ vectors of random variables, and the $m \times 1$ error
vector $u_t = (u'_{1t}, u'_{2t})'$ is $IID(0, \Sigma)$.
(a) Given the set of observations, yt for t = 1, 2, . . . , T, test the hypothesis that y2t ‘Granger
causes’ y1t and not vice versa.
(b) Discuss the pros and cons of Granger causality tests. Illustrate your response by an empiri-
cal application based on the GVAR data set which can be downloaded from
<https://sites.google.com/site/gvarmodelling/data>.
(c) Consider now the possibility that y1t and y2t are also affected by a third set of variables,
$y_{3t}$, not already included in $y_t = (y'_{1t}, y'_{2t})'$. How does this affect your analysis? Again
illustrate your response empirically.
22 Cointegration Analysis
22.1 Introduction
In this chapter we provide an overview of the econometric methods used in long-run structural
macroeconometric modelling. In what follows we first introduce the concept of cointegration
for a set of time series variables. We then turn our attention to cointegration within
a VAR framework and review the literature on identification, estimation and hypothesis testing
in cointegrated systems. We discuss estimation of cointegrating relations under general linear
restrictions, and review tests of the over-identifying restrictions on the cointegrating vectors.
We also comment on the small sample properties of some of the test statistics discussed in the
chapter, and discuss a bootstrap approach for obtaining critical values. We conclude the chap-
ter by reviewing the multivariate version of the Beveridge-Nelson decomposition, extended to
include possible restrictions on the intercepts and/or trend coefficients, as well as the existence
of long-run relationships.
22.2 Cointegration
The concept of cointegration was first introduced by Granger (1986) and more formally devel-
oped in Engle and Granger (1987). Two or more variables are said to be cointegrated if they are
individually integrated (or have a random walk component), but there exist linear combinations
of them which are stationary. More formally, consider m time series variables y1t , y2t , . . . , ymt
known to be non-stationary with unit roots, integrated of order one, namely (see Section 15.2)
yit ∼ I(1), i = 1, 2, . . . , m.
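The $m \times 1$ vector time series $y_t = (y_{1t}, y_{2t}, \ldots, y_{mt})'$ is said to be cointegrated if there exists an
$m \times r$ matrix $\beta$ ($r \geq 1$) such that
\[ \underset{r \times m}{\beta'} \; \underset{m \times 1}{y_t} = \underset{r \times 1}{\xi_t} \sim I(0), \]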
where the integer r denotes the number of cointegrating vectors, also known as the dimension of the
cointegration space. Cointegration means that, although each individual series is I(1), there exist
some relations linking the individual series together, represented by the linear combinations,
$\beta' y_t$, which are I(0). The cointegrating relations summarized in the $r \times 1$ vector $\beta' y_t$ are also
known as long-run relations (Johansen (1991)).
Example 52 Many examples of cointegrating relations exist in the literature. In finance, under the
expectations hypothesis, interest rates of different maturities are cointegrated. In macroeconomics
examples of cointegration include the purchasing power parity hypothesis, the Fisher equation (that
relates nominal interest rate to the expected rate of inflation), and the uncovered interest parity.
For further details see Garratt et al. (2003b). Here we derive the cointegrating relationship that
exists between equity prices and dividends in a simple model where equity prices are assumed to be
determined by the discounted stream of dividends that are expected to occur to the equity
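\[ P_t = \sum_{i=1}^{\infty} \left(\frac{1}{1+r}\right)^{i} E\left(D_{t+i} \mid \Omega_t\right), \]
where $\Omega_t = (P_t, D_t, P_{t-1}, D_{t-1}, \ldots)$ is the information set, and assuming that r is constant over
time. To model $P_t$ we first need to model the dividend process, $D_t$. If $D_t$ is a unit root process, then
$P_t$ will also be a unit root process. A bilinear version of the above model is
\[ P_t = \sum_{i=1}^{\infty} \left(\frac{1}{1+r}\right)^{i} E\left(D_{t+i} \mid \Omega_t\right) + u_t, \]
\[ D_t = D_{t-1} + \sum_{i=0}^{\infty} \alpha_i \varepsilon_{t-i}, \]
where $u_t$ could characterize the influence of noise traders or the effects of other similar factors on
equity prices. We shall assume that $u_t$ and $\varepsilon_t$ are white noise processes, and that $\{\alpha_i\}$ is absolutely
summable, so that $w_t = \sum_{i=0}^{\infty} \alpha_i \varepsilon_{t-i}$ is covariance stationary. $P_t^* = \sum_{i=1}^{\infty} \left(\frac{1}{1+r}\right)^{i} E\left(D_{t+i} \mid \Omega_t\right)$
is often referred to as the ‘fundamental’ price. To derive $P_t^*$ we first note that
\[ \sum_{j=1}^{\infty} \lambda^{j} D_{t+j} = \lambda \sum_{j=1}^{\infty} \lambda^{j-1} D_{t+j-1} + \sum_{j=1}^{\infty} \lambda^{j} w_{t+j}, \]
where $\lambda = 1/(1 + r)$. Taking expectations conditional on $\Omega_t$ it is then easily seen that
\[ P_t^* = \lambda \left(D_t + P_t^*\right) + \xi_t, \]
where
\[ \xi_t = E\left( \sum_{j=1}^{\infty} \lambda^{j} w_{t+j} \,\Big|\, \Omega_t \right), \]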
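which is the expected value of the discounted stream of a covariance stationary process, and itself
will be stationary (recall that $|\lambda| < 1$).¹ Hence
\[ P_t^* = \frac{\lambda}{1-\lambda} D_t + \frac{1}{1-\lambda} \xi_t . \]
Since $\lambda/(1-\lambda) = 1/r$ and $P_t = P_t^* + u_t$, with $u_t$ and $\xi_t$ both stationary, it follows that
\[ P_t - D_t / r \sim I(0), \]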
1 See Appendix D in Pesaran (1987c) for exact derivation of $\xi_t$ in terms of the dividends innovations, $\varepsilon_t$.
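A quick numerical check of the fundamental price formula in Example 52 can be made under a special case that is an assumption of this sketch, not part of the example: $w_t = \varepsilon_t + \theta \varepsilon_{t-1}$ with $\varepsilon_t$ observed at time t. Then $E(D_{t+i} \mid \Omega_t) = D_t + \theta \varepsilon_t$ for all $i \geq 1$ and $\xi_t = \lambda \theta \varepsilon_t$. All numerical values below are hypothetical.

```python
# Hypothetical inputs: discount rate, MA(1) coefficient, current dividend and innovation.
r = 0.05
lam = 1.0 / (1.0 + r)
theta = 0.4
D_t, eps_t = 10.0, 0.7

# Direct evaluation: discounted sum of expected dividends, truncated at a long horizon.
expected_dividend = D_t + theta * eps_t          # E(D_{t+i} | Omega_t), i >= 1
direct = sum(lam ** i * expected_dividend for i in range(1, 2000))

# Closed form from the text: P* = lam/(1-lam) D_t + xi_t/(1-lam), with xi_t = lam*theta*eps_t.
xi_t = lam * theta * eps_t
closed_form = lam / (1.0 - lam) * D_t + xi_t / (1.0 - lam)

gap_from_dividend_multiple = closed_form - D_t / r   # the stationary part, theta*eps_t/r
```

The two evaluations agree up to truncation error, and the gap between the price and $D_t / r$ depends only on the stationary innovation term.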
that are invariant to the ordering of variables based on the full information maximum likelihood
have been proposed by Johansen (1991) (see Section 22.10). Another shortcoming of residual-
based tests is that they do not allow for more than one cointegrating relation. Further, these tests
do not make the best use of available data, and have generally low power. We refer to Pesavento
(2007) for a comparison of residual-based tests under a set of local alternatives.
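\[ \Delta y_t = a_0 + \sum_{i=1}^{p} \psi_i \Delta y_{t-i} + \sum_{i=0}^{p} \phi_{i1} \Delta x_{1,t-i} + \sum_{i=0}^{p} \phi_{i2} \Delta x_{2,t-i} + \delta_1 y_{t-1} + \delta_2 x_{1,t-1} + \delta_3 x_{2,t-1} + u_t . \qquad (22.1) \]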
Step 2: Compute the usual Wald or F-statistics for testing the null hypothesis
H0 : δ 1 = δ 2 = δ 3 = 0.
The distribution of this test statistic is non-standard, and the relevant critical value
bounds have been tabulated by Pesaran, Shin, and Smith (2001). The critical values dif-
fer depending on whether the regression equation has a trend or not.
Step 3: Compare the Wald or F-statistic computed in Step 2 with the upper and lower critical
value bounds for a given significance level, denoted by FU and FL . Then:
Hence, if the computed Wald or F-statistics fall outside the critical value bounds, a conclusive
decision results without needing to know the order of the integration of the underlying variables.
If, however, the Wald or F-statistics fall within these bounds, inference would be inconclusive. In
such circumstances, more needs to be found out about the order of integration of the underlying
variables.
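The decision rule of the bounds test can be sketched as a small helper. The direction of the rule (reject the null of no level relationship above the upper bound, fail to reject below the lower bound) follows the usual reading of Pesaran, Shin, and Smith (2001); the function name and the numerical bounds used in the example are hypothetical placeholders, not values from the published tables.

```python
def bounds_test_decision(stat, lower, upper):
    """Classify a Wald or F statistic against lower/upper critical value bounds."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    if stat > upper:
        return "reject H0"          # conclusive: evidence of a level relationship
    if stat < lower:
        return "do not reject H0"   # conclusive: no evidence of a level relationship
    return "inconclusive"           # need to establish the order of integration

# Placeholder bounds for illustration only.
F_L, F_U = 3.0, 4.0
decisions = [bounds_test_decision(f, F_L, F_U) for f in (2.1, 3.5, 5.2)]
```

Only the middle case requires further information on whether the underlying variables are I(0) or I(1).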
It is also possible to carry out a bounds t-test only on the coefficient of the lagged dependent
variable, namely testing $\delta_1 = 0$ against $\delta_1 \neq 0$ in the error correction model (22.1). Such
a test is also non-standard, and the appropriate critical values are tabulated in Pesaran, Shin,
and Smith (2001). Once it is established that the linear relationship between the variables is
not ‘spurious’ the parameters of the long-run relationship can be estimated using the ARDL
procedure, discussed in Chapter 6 (see, in particular, Section 6.5). Pesaran and Shin (1999)
show that the ARDL approach to estimation of long-run relations continues to be applicable
even if the variables under consideration are I(1) and cointegrated. They also provide Monte
Carlo evidence on the comparative small sample performance of the ARDL and the fully mod-
ified OLS (FM-OLS) approach proposed by Phillips and Hansen (1990), showing that in gen-
eral the former performs better than the latter. For proofs and further details see Pesaran and
Shin (1999). In what follows we provide a brief account of the FM-OLS approach for
completeness.
\[ y_t = \beta_0 + \beta_1' x_t + u_t, \quad t = 1, 2, \ldots, T, \qquad (22.2) \]
where the $k \times 1$ vector of I(1) regressors, $x_t$, are not themselves cointegrated. Therefore, $x_t$ has
a first difference stationary process given by
\[ \Delta x_t = \mu + v_t, \quad t = 2, 3, \ldots, T, \qquad (22.3) \]
\[ \hat{\xi}_t = \begin{pmatrix} \hat{u}_t \\ \hat{v}_t \end{pmatrix}, \quad t = 2, 3, \ldots, T, \qquad (22.4) \]
where $\hat{v}_t = \Delta x_t - \hat{\mu}$, for $t = 2, 3, \ldots, T$, and $\hat{\mu} = (T - 1)^{-1} \sum_{t=2}^{T} \Delta x_t$. A consistent estimator
of the long-run variance of $\xi_t$ is given by
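\[ \hat{\Omega} = \hat{\Sigma} + \hat{\Lambda} + \hat{\Lambda}' = \begin{pmatrix} \underset{1 \times 1}{\hat{\Omega}_{11}} & \underset{1 \times k}{\hat{\Omega}_{12}} \\ \underset{k \times 1}{\hat{\Omega}_{21}} & \underset{k \times k}{\hat{\Omega}_{22}} \end{pmatrix}, \qquad (22.5) \]
where
\[ \hat{\Sigma} = \frac{1}{T-1} \sum_{t=2}^{T} \hat{\xi}_t \hat{\xi}_t', \qquad (22.6) \]
and
\[ \hat{\Lambda} = \sum_{s=1}^{m} \omega(s, m) \, \hat{\Gamma}_s, \qquad (22.7) \]
in which
\[ \hat{\Gamma}_s = T^{-1} \sum_{t=1}^{T-s} \hat{\xi}_t \hat{\xi}_{t+s}', \qquad (22.8) \]
and $\omega(s, m)$ is the lag window with horizon (or truncation) m. For a choice of lag window such
as Bartlett, Tukey or Parzen see Section 5.9.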
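The long-run variance estimator in (22.5)–(22.8) can be sketched as follows. This is a minimal sketch under stated assumptions: the Bartlett window is taken to be $\omega(s, m) = 1 - s/(m+1)$, every sum is divided by the number of available residual vectors (a simplification of the slightly different normalisations in (22.6) and (22.8)), and the function name is hypothetical.

```python
import numpy as np

def bartlett_long_run_variance(xi, m):
    """Long-run variance Omega = Sigma + Lambda + Lambda' with a Bartlett window.

    xi : (n x q) array whose rows are the residual vectors xi_hat_t
    m  : truncation lag of the window (m = 0 returns Sigma)
    """
    xi = np.asarray(xi, dtype=float)
    n = xi.shape[0]
    sigma = xi.T @ xi / n                       # contemporaneous covariance
    lam = np.zeros_like(sigma)
    for s in range(1, m + 1):
        gamma_s = xi[:-s].T @ xi[s:] / n        # sum over t of xi_t xi_{t+s}' divided by n
        lam += (1.0 - s / (m + 1.0)) * gamma_s  # Bartlett weight (assumed form)
    return sigma + lam + lam.T

# Tiny hand-checkable example: an alternating scalar series.
xi_demo = np.array([[1.0], [-1.0], [1.0], [-1.0]])
omega_0 = bartlett_long_run_variance(xi_demo, 0)
omega_1 = bartlett_long_run_variance(xi_demo, 1)
```

For the alternating series the first autocovariance is negative, so the long-run variance with `m = 1` is smaller than the contemporaneous variance.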
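Now let
\[ \hat{\Delta} = \hat{\Sigma} + \hat{\Lambda} = \begin{pmatrix} \hat{\Delta}_{11} & \hat{\Delta}_{12} \\ \hat{\Delta}_{21} & \hat{\Delta}_{22} \end{pmatrix}, \qquad (22.9) \]
\[ \hat{Z} = \hat{\Delta}_{21} - \hat{\Delta}_{22} \hat{\Omega}_{22}^{-1} \hat{\Omega}_{21}, \qquad (22.10) \]
\[ \hat{y}_t^* = y_t - \hat{\Omega}_{12} \hat{\Omega}_{22}^{-1} \hat{v}_t, \qquad (22.11) \]
and
\[ \underset{(k+1) \times k}{D} = \begin{pmatrix} \underset{1 \times k}{0} \\ \underset{k \times k}{I_k} \end{pmatrix} . \qquad (22.12) \]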
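\[ \hat{\beta}^* = (W'W)^{-1} \left( W' \hat{y}^* - T D \hat{Z} \right), \qquad (22.13) \]
where $\hat{y}^* = (\hat{y}_1^*, \hat{y}_2^*, \ldots, \hat{y}_T^*)'$, $W = (\tau_T, X)$, and $\tau_T = (1, 1, \ldots, 1)'$.
A consistent estimator of the variance matrix of $\hat{\beta}^*$ defined in (22.13) is given by
\[ \hat{V}(\hat{\beta}^*) = \hat{\omega}_{11.2} \, (W'W)^{-1}, \qquad (22.14) \]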
where
\[ \hat{\omega}_{11.2} = \hat{\Omega}_{11} - \hat{\Omega}_{12} \hat{\Omega}_{22}^{-1} \hat{\Omega}_{21} . \qquad (22.15) \]
\[ y_t = \Phi_1 y_{t-1} + \Phi_2 y_{t-2} + \ldots + \Phi_p y_{t-p} + u_t, \quad u_t \sim IID(0, \Sigma), \qquad (22.16) \]
where p, the order of the VAR, is assumed to be known, and the initial values, $y_0, y_{-1}, \ldots, y_{-p+1}$,
are assumed to be given. Cointegration within the VAR model (22.16) can be introduced by
considering its error correction representation. Rewrite (22.16) as
\[ \Delta y_t + y_{t-1} = \Phi_1 y_{t-1} + \Phi_2 (y_{t-1} - \Delta y_{t-1}) + \ldots + \Phi_p (y_{t-1} - \Delta y_{t-1} - \ldots - \Delta y_{t-p+1}) + u_t, \]
\[ \Delta y_t = -\Pi y_{t-1} + \sum_{j=1}^{p-1} \Gamma_j \Delta y_{t-j} + u_t, \qquad (22.17) \]
with $\Pi = I_m - \Phi_1 - \Phi_2 - \ldots - \Phi_p$, and $\Gamma_j = -\sum_{i=j+1}^{p} \Phi_i$, for $j = 1, 2, \ldots, p - 1$.
Suppose now $y_t \sim I(1)$, then the left-hand side of (22.17) is I(0), and on the right-hand side
both $\Delta y_{t-j}$ and $u_t$ are I(0). Since I(1) + I(0) = I(1), (22.17) holds if and only if $\Pi y_{t-1}$ is
I(0). Now let $d_t = \Pi y_{t-1} \sim I(0)$. In the case where Rank($\Pi$) = m, $\Pi$ is nonsingular and
we have $y_{t-1} = \Pi^{-1} d_t \sim I(0)$, that is, $y_{t-1}$ is a stationary process, which contradicts the
assumption that $y_t \sim I(1)$. Therefore, given $y_t \sim I(1)$ and $\Pi y_{t-1} \sim I(0)$ holds, we must have
Rank($\Pi$) = r < m. This introduces us to the concept of cointegration.
Definition 28 If $y_{t-1} \sim I(1)$ and the linear combinations of $y_{t-1}$, namely $\Pi y_{t-1}$, are covariance
stationary, namely if $\Pi y_{t-1} \sim I(0)$, we say the VAR model (22.16) is cointegrated. Denoting
Rank($\Pi$) = r < m, r is the dimension of the cointegration space.
\[ \Pi = \alpha \beta', \qquad (22.18) \]
where $\alpha$ and $\beta$ are $m \times r$ matrices of full column ranks, namely Rank($\beta$) = Rank($\alpha$) = r.
Then
\[ \Delta y_t = -\alpha \beta' y_{t-1} + \sum_{j=1}^{p-1} \Gamma_j \Delta y_{t-j} + u_t . \qquad (22.19) \]
\[ \beta' y_{t-1} \sim I(0), \]
where $\beta' y_t$ is the $r \times 1$ vector of cointegrating relations, also known as the long-run relations.
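The mapping from the VAR(p) coefficients to the error correction form (22.17) can be sketched numerically. The sketch assumes the sign convention of (22.17), that is, the long-run matrix equals $I_m$ minus the sum of the VAR coefficient matrices and enters the error correction form with a negative sign; the function name and the randomly drawn coefficient matrices are hypothetical.

```python
import numpy as np

def var_to_vecm(phis):
    """Return (Pi, [Gamma_1, ..., Gamma_{p-1}]) for VAR coefficients [Phi_1, ..., Phi_p]."""
    m = phis[0].shape[0]
    p = len(phis)
    pi = np.eye(m) - sum(phis)
    # Gamma_j = -(Phi_{j+1} + ... + Phi_p); phis[j:] starts at Phi_{j+1} (0-based list).
    gammas = [-sum(phis[j:]) for j in range(1, p)]
    return pi, gammas

# Hypothetical VAR(3) coefficients and three arbitrary lagged observations.
rng = np.random.default_rng(1)
phis = [0.3 * rng.standard_normal((2, 2)) for _ in range(3)]
pi, gammas = var_to_vecm(phis)
lags = rng.standard_normal((3, 2))            # rows: y_{t-1}, y_{t-2}, y_{t-3}

# Systematic part of y_t computed from the VAR and from the error correction form.
var_part = sum(phis[i] @ lags[i] for i in range(3))
dy_lags = [lags[j] - lags[j + 1] for j in range(2)]
vecm_part = lags[0] - pi @ lags[0] + sum(gammas[j] @ dy_lags[j] for j in range(2))
```

Both representations give the same conditional mean for $y_t$, which confirms that (22.17) is only a reparameterization of (22.16).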
Example 53 Cointegration can also be defined in terms of the spectral density of first differences of
the variables evaluated at zero frequency. Consider the $m \times 1$ vector of I(1) processes, $y_t \sim I(1)$,
such that
\[ \Delta y_t = A(L) u_t \qquad (22.20) \]
is stationary, with $u_t \sim IID(0, \Sigma)$, where $\Sigma$ is a positive definite matrix. Since $\Delta y_t$ is stationary
its spectral density exists and when evaluated at zero frequency can be written as (see Section 21.8)
\[ F_{\Delta y}(0) = \frac{1}{2\pi} A(1) \, \Sigma \, A(1)' . \qquad (22.21) \]
Suppose now that $\xi_t = \beta' y_t \sim I(0)$, and note that the spectral density of $\Delta \xi_t$ at zero frequency
must be zero, due to over-differencing of a stationary process (see Exercise 5 in Chapter 13). Hence
\[ \beta' F_{\Delta y}(0) \beta = \frac{1}{2\pi} \beta' A(1) \, \Sigma \, A(1)' \beta = 0, \qquad (22.22) \]
where $\Sigma$ is a positive definite matrix. It then follows that we must have $A(1)' \beta = 0$, which is
possible if and only if rank[$A(1)$] = m − r, where r is the number of cointegrating relations.
Therefore, cointegration is present when the spectral density of $\Delta y_t$, evaluated at zero frequency, is
rank deficient. This suggests that non-parametric methods may be used to test for cointegration (see,
e.g., Breitung (2000)).
from the observations. This is called the ‘long-run identification problem’. To exactly identify
the long-run (or cointegrating) coefficients, we need $r^2$ exact- or just-identifying restrictions, r
restrictions on each of the r cointegrating relations. Note that it is not possible to distribute the
$r^2$ just-identifying restrictions unevenly across the r cointegrating relations.
where $p_t$ and $p_t^*$ are domestic and foreign log prices, $e_t$ is the exchange rate at time t, and $r_t$ and $r_t^*$
are domestic and foreign interest rates, respectively. Denote
\[ \xi_t = \beta' x_t = \begin{pmatrix} \xi_{1t} \\ \xi_{2t} \end{pmatrix}, \]
with
To distinguish between (22.23) and (22.24) and identify the cointegrating vectors, we need to
impose two restrictions on the coefficients of each of the two cointegrating vectors. To identify
(22.23), we need to impose the restriction $\beta_{11} = 1$, and either $\beta_{14} = 0$ or $\beta_{15} = 0$. Similarly,
to identify (22.24) we need to impose $\beta_{24} = 1$, and either $\beta_{21} = 0$ or $\beta_{22} = 0$. Therefore,
a possible set of exact identifying restrictions for this example is
\[ \begin{pmatrix} \beta_{14} = 0, & \beta_{11} = 1 \\ \beta_{21} = 0, & \beta_{24} = 1 \end{pmatrix}, \]
which involves $r^2 = 4$ restrictions. Note that the economic theory imposes ten restrictions
\[ \beta_1 = \begin{pmatrix} 1 \\ -1 \\ -1 \\ 0 \\ 0 \end{pmatrix}, \quad \beta_2 = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \\ -1 \end{pmatrix}, \]
with four of the restrictions used for exact identification. Hence, we are left with six over-identifying
restrictions, which is in line with using the formula $mr - r^2 = 5 \times 2 - 2^2 = 6$.
When $\Pi$ is of full rank m, then $\Pi$ and the other parameters of (22.28) are identified under fairly
general conditions, and can be consistently estimated by OLS (see Chapter 21). However, if the
rank of $\Pi$ is r < m, then $\Pi$ is subject to $(m - r)^2$ nonlinear restrictions, and is therefore uniquely
determined in terms of the $m^2 - (m - r)^2 = 2mr - r^2$ underlying unknown parameters.
The cointegrating VAR analysis is concerned with the estimation of VAR(1) (22.28)
(or (22.29)) when the multiplier matrix, $\Pi$, is rank deficient. As pointed out in Section 19.7,
under the rank deficiency, the OLS method is not valid. In addition, since (22.27) is a system
of $m \times 1$ equations, the OLS method will not be appropriate if we have contemporaneously
correlated disturbances with different regressors across the equations, that is, the equations are
‘seemingly’ unrelated. Estimation of (22.28) can be approached by applying the reduced rank
regression method, which consists of carrying out a canonical correlation analysis between the
variables in $\Delta y_t$ and $y_{t-1}$ (see Sections 19.6 and 19.7, Anderson (1951) and Johansen (1991)).
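Conditional on the initial values, $y_0$, and assuming $u_t \sim IIDN(0, \Sigma)$, where $\Sigma$ is a symmetric
positive definite matrix, the log-likelihood function of (22.28) is given by
\[ \ell(\theta; r) = -\frac{Tm}{2} \log 2\pi - \frac{T}{2} \log |\Sigma| - \frac{1}{2} \sum_{t=1}^{T} u_t' \Sigma^{-1} u_t, \qquad (22.30) \]
with $\theta = \left(\text{vec}(\alpha)', \text{vec}(\beta)', \text{vech}(\Sigma)'\right)'$, $u_t = \Delta y_t + \Pi y_{t-1}$, and r is the assumed rank of
$\Pi = \alpha \beta'$. Taking $\beta$ as given, $\alpha$ can be estimated by least squares, namely²
\[ \hat{\alpha}(\beta) = -\left( \sum_{t=1}^{T} \Delta y_t \, y_{t-1}' \beta \right) \left( \sum_{t=1}^{T} \beta' y_{t-1} y_{t-1}' \beta \right)^{-1} = -S_{01} \beta \left( \beta' S_{11} \beta \right)^{-1}, \]
where
\[ S_{01} = \frac{1}{T} \sum_{t=1}^{T} \Delta y_t \, y_{t-1}' = \frac{1}{T} (Y - Y_{-1})' Y_{-1}, \qquad (22.31) \]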
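\[ S_{11} = \frac{1}{T} \sum_{t=1}^{T} y_{t-1} y_{t-1}' = \frac{1}{T} Y_{-1}' Y_{-1} . \qquad (22.32) \]
$Y$ and $Y_{-1}$ are the $T \times m$ matrices of observations on $y_t$ and its lagged value, $y_{t-1}$. Further, we have
\[ \hat{\Sigma}(\beta) = T^{-1} \sum_{t=1}^{T} \hat{u}_t(\hat{\alpha}) \, \hat{u}_t(\hat{\alpha})' = S_{00} - S_{01} \beta \left( \beta' S_{11} \beta \right)^{-1} \beta' S_{01}', \]
with
\[ \hat{u}_t(\hat{\alpha}) = \Delta y_t + \hat{\alpha} \beta' y_{t-1} = \Delta y_t - S_{01} \beta \left( \beta' S_{11} \beta \right)^{-1} \beta' y_{t-1}, \]
\[ S_{00} = \frac{1}{T} \sum_{t=1}^{T} \Delta y_t \, \Delta y_t' = \frac{1}{T} (Y - Y_{-1})' (Y - Y_{-1}) . \]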
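Assume that T is sufficiently large such that matrices $S_{00}$ and $S_{11}$ are nonsingular. Then the
concentrated log-likelihood function which is given by
\[ \ell^c(\beta; r) = -\frac{Tm}{2} \log 2\pi - \frac{T}{2} \log \left| \hat{\Sigma}(\beta) \right| - \frac{1}{2} \sum_{t=1}^{T} \hat{u}_t(\hat{\alpha})' \, \hat{\Sigma}(\beta)^{-1} \hat{u}_t(\hat{\alpha}), \]
can be written as
\[ \ell^c(\beta; r) = -\frac{Tm}{2} (1 + \log 2\pi) - \frac{T}{2} \log \left| S_{00} - S_{01} \beta \left( \beta' S_{11} \beta \right)^{-1} \beta' S_{01}' \right| . \qquad (22.33) \]
However, it is easily seen that³
\[ \left| S_{00} - S_{01} \beta \left( \beta' S_{11} \beta \right)^{-1} \beta' S_{01}' \right| = \frac{|S_{00}| \, \left| \beta' A_T \beta \right|}{\left| \beta' B_T \beta \right|}, \qquad (22.34) \]
where
\[ A_T = S_{11} - S_{10} S_{00}^{-1} S_{01}, \quad B_T = S_{11}, \qquad (22.35) \]
with $S_{10} = S_{01}'$. Hence
\[ \ell^c(\beta; r) = -\frac{Tm}{2} (1 + \log 2\pi) - \frac{T}{2} \log |S_{00}| - \frac{T}{2} \left\{ \log \left| \beta' A_T \beta \right| - \log \left| \beta' B_T \beta \right| \right\} . \qquad (22.36) \]
3 Note that
\[ |G + XHY| = |G| \, |H| \, \left| H^{-1} + Y G^{-1} X \right|, \]
where H and G are $n \times n$ and $m \times m$ nonsingular matrices, and X and Y are $m \times n$ and $n \times m$ matrices.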
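The determinant identity in footnote 3, read as $|G + XHY| = |G||H||H^{-1} + YG^{-1}X|$, can be checked numerically. The matrices below are randomly generated for illustration only; G and H are built as positive definite matrices so that they are guaranteed to be nonsingular, which is an assumption of this sketch rather than a requirement of the identity.

```python
import numpy as np

rng = np.random.default_rng(42)
m, n = 4, 2

# Positive definite (hence nonsingular) G (m x m) and H (n x n).
a = rng.standard_normal((m, m))
b = rng.standard_normal((n, n))
G = a @ a.T + np.eye(m)
H = b @ b.T + np.eye(n)
X = rng.standard_normal((m, n))
Y = rng.standard_normal((n, m))

lhs = np.linalg.det(G + X @ H @ Y)
rhs = (np.linalg.det(G) * np.linalg.det(H)
       * np.linalg.det(np.linalg.inv(H) + Y @ np.linalg.solve(G, X)))
```

Setting $G = S_{00}$, $X = -S_{01}\beta$, $H = (\beta' S_{11} \beta)^{-1}$ and $Y = \beta' S_{01}'$ in this identity is one way to arrive at the determinant ratio in (22.34).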
It is clear that the maximization of $\ell^c(\beta; r)$ with respect to $\beta$ is equivalent to the minimization
of the ratio
\[ q(\beta) = \frac{\left| \beta' A_T \beta \right|}{\left| \beta' B_T \beta \right|}, \]
with respect to $\beta$. Also, neither of these two optimization problems will lead to a unique solution
for $\beta$. It is easily seen that $q(\beta Q) = q(\beta)$ holds for any arbitrary $r \times r$ nonsingular matrix, Q.
Therefore, as also explained in Section 19.7, $r^2$ just-identifying restrictions are needed for exact
identification. For computational purposes Johansen (1991) employs the following restrictions
\[ \beta' B_T \beta = I_r, \]
and further assumes that the different columns of $\beta$ are orthogonal to each other. These restrictions
together impose the required $r^2$ exact-identifying restrictions; with the restrictions
$\beta' B_T \beta = I_r$ providing $r(r+1)/2$ restrictions and the orthogonality of the cointegrating vectors
supplying the remaining needed $r(r-1)/2$ restrictions.
Hence, ML estimates of $\beta$ (and the maximized log-likelihood function) can be obtained
by noting that when $A_T$ and $B_T$ are positive definite matrices, the minimized value of
$q(\beta) = \left| \beta' A_T \beta \right| / \left| \beta' B_T \beta \right|$, denoted by $q(\hat{\beta})$, is given by⁴
\[ q(\hat{\beta}) = q(\hat{\beta} Q) = \prod_{i=1}^{r} \hat{\rho}_i, \]
where $\hat{\rho}_1 < \hat{\rho}_2 < \ldots < \hat{\rho}_r$ are the r smallest eigenvalues of $A_T$ with respect to $B_T$, given by the
solution to the following determinantal equation in $\rho$
\[ |A_T - \rho B_T| = 0 . \]
where $\lambda = 1 - \rho$. Also since $S_{11}$ is a nonsingular matrix, then $\hat{\lambda}_i$ can be computed as the ith
largest eigenvalue of⁵
\[ S_{10} S_{00}^{-1} S_{01} S_{11}^{-1} . \]
\[ S_{10} S_{00}^{-1} S_{01} S_{11}^{-1} \hat{v}_i = \hat{\lambda}_i \hat{v}_i, \quad i = 1, 2, \ldots, r, \qquad (22.37) \]
\[ \ell^c(r) = -\frac{Tm}{2} (1 + \log 2\pi) - \frac{T}{2} \log |S_{00}| - \frac{T}{2} \sum_{i=1}^{r} \log \left( 1 - \hat{\lambda}_i \right) . \qquad (22.38) \]
Note that the maximized value of the log-likelihood $\ell^c(r)$ is only a function of the cointegration
rank r through the eigenvalues $\{\hat{\lambda}_i\}_{i=1}^{r}$ defined by (22.37). For further details see Johansen
(1991) and Pesaran and Shin (2002).
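\[ \Delta y_t = -\alpha \beta' y_{t-1} + \sum_{j=1}^{p-1} \Gamma_j \Delta y_{t-j} + u_t, \quad \text{for } t = 1, 2, \ldots, T, \qquad (22.39) \]
which corresponds to an underlying VAR(p) specification. Writing the model in matrix notation
we have the following system of regression equations
\[ \Delta Y = -Y_{-1} \Pi' + \Delta X \, \Gamma + U, \qquad (22.40) \]
where $\Pi = \alpha \beta'$, $\Delta Y = (\Delta y_1, \Delta y_2, \ldots, \Delta y_T)'$, $\Delta X = (\Delta Y_{-1}, \Delta Y_{-2}, \ldots, \Delta Y_{-p+1})$, $U = (u_1, u_2, \ldots, u_T)'$,
and $\Gamma = (\Gamma_1, \Gamma_2, \ldots, \Gamma_{p-1})'$ is an $m(p-1) \times m$ matrix of unknown coefficients. Further,
$\Delta Y = Y - Y_{-1}$, and $\Delta Y_{-1}, \Delta Y_{-2}, \ldots, \Delta Y_{-p+1}$ refer to $T \times m$ matrices of lagged observations
on $\Delta Y$.
Conditional on the p initial values, $y_{-p+1}, \ldots, y_0$, the log-likelihood function of (22.40) can
be written as
\[ \ell(\theta; r) = -\frac{Tm}{2} \log 2\pi - \frac{T}{2} \log |\Sigma| - \frac{1}{2} \text{Tr}\left( \Sigma^{-1} U'U \right), \]
where $U = \Delta Y + Y_{-1} \Pi' - \Delta X \, \Gamma$, and $\theta = \left(\text{vec}(\alpha)', \text{vec}(\beta)', \text{vec}(\Gamma)', \text{vech}(\Sigma)'\right)'$. The results
obtained in Section 22.6 still hold for this more general case. One only needs to replace the
cross-product sample moment matrices $S_{01}$ and $S_{11}$, defined by (22.31) and (22.32), by
\[ S_{ij} = \frac{1}{T} \sum_{t=1}^{T} r_{it} r_{jt}', \quad \text{for } i, j = 0, 1, \]
where $r_{0t}$ and $r_{1t}$ are the residual vectors from the OLS regressions of $\Delta y_t$ and $y_{t-1}$ on $\Delta y_{t-1}$,
$\Delta y_{t-2}, \ldots, \Delta y_{t-p+1}$, respectively. The rest of the analysis will be unaffected.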
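The reduced rank computations can be sketched with NumPy as follows. This is a minimal sketch, not the author's implementation: it assumes no intercepts or trends, partials out the lagged differences by OLS when p > 1, and obtains the eigenvalues from $S_{11}^{-1} S_{10} S_{00}^{-1} S_{01}$, which has the same eigenvalues as the matrix in (22.37). The function names and the simulated data are hypothetical.

```python
import numpy as np

def johansen_eigenvalues(y, p=1):
    """Eigenvalues (in descending order), S00 and sample size for a VAR(p) in levels y (n x m)."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y, axis=0)
    z0 = dy[p - 1:]                 # Delta y_t
    z1 = y[p - 1:-1]                # y_{t-1}
    if p > 1:
        # Regressors Delta y_{t-1}, ..., Delta y_{t-p+1}; keep the OLS residuals r_0t, r_1t.
        x = np.hstack([dy[p - 1 - j:-j] for j in range(1, p)])
        z0 = z0 - x @ np.linalg.lstsq(x, z0, rcond=None)[0]
        z1 = z1 - x @ np.linalg.lstsq(x, z1, rcond=None)[0]
    T = z0.shape[0]
    s00 = z0.T @ z0 / T
    s01 = z0.T @ z1 / T
    s11 = z1.T @ z1 / T
    mat = np.linalg.solve(s11, s01.T @ np.linalg.solve(s00, s01))
    lam = np.sort(np.linalg.eigvals(mat).real)[::-1]
    return lam, s00, T

def max_loglik(lam, s00, T, r):
    """Maximized concentrated log-likelihood for cointegration rank r, as in (22.38)."""
    m = s00.shape[0]
    return (-T * m / 2 * (1 + np.log(2 * np.pi))
            - T / 2 * np.log(np.linalg.det(s00))
            - T / 2 * np.sum(np.log(1 - lam[:r])))

# Simulated bivariate system sharing one stochastic trend (one cointegrating relation).
rng = np.random.default_rng(0)
trend = np.cumsum(rng.standard_normal(400))
y_sim = np.column_stack([trend + rng.standard_normal(400),
                         trend + rng.standard_normal(400)])
lam, s00, T = johansen_eigenvalues(y_sim, p=1)
logliks = [max_loglik(lam, s00, T, r) for r in range(3)]
```

Because each $\log(1 - \hat{\lambda}_i)$ is non-positive, the maximized log-likelihood is non-decreasing in r, which is what the rank tests built on these eigenvalues exploit.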
It is also possible to include intercepts, linear (deterministic) trends and I(1) weakly exogenous
variables in the model. For the inclusion of intercepts, or linear deterministic trends, see Sections
22.8 and 22.9. See Chapter 23 for the inclusion of weakly exogenous variables in the VAR model.
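\[ \Phi(L)(y_t - \mu - \gamma t) = u_t, \quad t = 1, 2, \ldots, \qquad (22.41) \]
where $\mu$ and $\gamma$ are m-dimensional vectors of unknown coefficients, and $\Phi(L) \equiv I_m - \sum_{i=1}^{p} \Phi_i L^i$
is an $m \times m$ matrix lag polynomial of order p. It is convenient to re-express the lag polynomial
$\Phi(L)$ in a form which arises in the vector error correction model
\[ \Phi(L) \equiv -\Pi L + \Gamma(L)(1 - L), \qquad (22.42) \]
where the long-run multiplier matrix is
\[ \Pi \equiv -\left( I_m - \sum_{j=1}^{p} \Phi_j \right), \qquad (22.43) \]
and the short-run response matrix lag polynomial $\Gamma(L) \equiv I_m - \sum_{i=1}^{p-1} \Gamma_i L^i$, $\Gamma_j = -\sum_{i=j+1}^{p} \Phi_i$,
$j = 1, \ldots, p - 1$. Hence, the VAR(p) model (22.41) may be rewritten in the following form
\[ \Phi(L) y_t = a_0 + a_1 t + u_t, \quad t = 1, 2, \ldots, \qquad (22.44) \]
where
\[ a_0 \equiv -\Pi \mu + (\Gamma + \Pi) \gamma, \quad a_1 \equiv -\Pi \gamma, \qquad (22.45) \]
\[ \Gamma \equiv I_m - \sum_{j=1}^{p-1} \Gamma_j . \qquad (22.46) \]
\[ H_r : \text{Rank}(\Pi) = r, \quad r = 0, 1, \ldots, m . \qquad (22.47) \]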
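The restrictions linking the intercept and trend coefficients in (22.44) to $\mu$ and $\gamma$ in (22.41) can be verified numerically. The check below assumes the sign convention in which the long-run matrix equals minus ($I_m$ less the sum of the VAR coefficient matrices), as in (22.43), and uses a hypothetical VAR(2) with randomly drawn coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 3
Phi1 = 0.3 * rng.standard_normal((m, m))
Phi2 = 0.2 * rng.standard_normal((m, m))
mu = rng.standard_normal(m)
gamma = rng.standard_normal(m)
I = np.eye(m)

Pi = -(I - Phi1 - Phi2)          # long-run multiplier matrix, sign convention of (22.43)
Gamma1 = -Phi2                   # Gamma_1 = -(Phi_2) when p = 2
Gamma = I - Gamma1               # (22.46)
a0 = -Pi @ mu + (Gamma + Pi) @ gamma
a1 = -Pi @ gamma

def deterministic_part(t):
    """mu + gamma * t, the deterministic component of y_t in (22.41)."""
    return mu + gamma * t

def filtered(t):
    """Phi(L) applied to the deterministic component at time t."""
    return (deterministic_part(t)
            - Phi1 @ deterministic_part(t - 1)
            - Phi2 @ deterministic_part(t - 2))

max_gap = max(np.abs(filtered(t) - (a0 + a1 * t)).max() for t in range(2, 12))
```

The gap is zero up to rounding error, so applying the lag polynomial to $\mu + \gamma t$ reproduces $a_0 + a_1 t$ with the restricted coefficients in (22.45).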
\[ \Pi = \alpha \beta', \qquad (22.48) \]
where $\alpha$ and $\beta$ are $m \times r$ matrices of full column rank. Correspondingly, we may define the
$m \times (m - r)$ matrices of full column rank $\alpha_\perp$ and $\beta_\perp$ whose columns form bases for the null
spaces (kernels) of $\alpha'$ and $\beta'$, respectively. In particular, $\alpha' \alpha_\perp = 0$ and $\beta' \beta_\perp = 0$. We make the
following assumptions.
Assumption 1: The $m \times m$ matrix polynomial $\Phi(z) = I_m - \sum_{i=1}^{p} \Phi_i z^i$ is such that the roots
of the determinantal equation $|\Phi(z)| = 0$ satisfy $|z| > 1$ or $z = 1$.
Assumption 1 rules out the possibility that the random process $\{(y_t - \mu - \gamma t)\}_{t=1}^{\infty}$ admits
explosive roots or seasonal unit roots except at zero frequency. Under Assumption 1, Assumption 2
is necessary and sufficient for the processes $\{\beta_\perp'(y_t - \mu - \gamma t)\}_{t=1}^{\infty}$ and $\{\beta'(y_t - \mu - \gamma t)\}_{t=1}^{\infty}$
to be integrated of orders one and zero respectively.⁶ Moreover, Assumption 2 specifically
excludes the process $\{(y_t - \mu - \gamma t)\}_{t=1}^{\infty}$ being integrated of order two, or I(2). Together
these assumptions allow us to write the solution of (22.41) as an infinite-order moving average
representation, given below. See Johansen (1991, Theorem 4.1, p. 1559) and Johansen (1995,
Theorem 4.2, p. 49).
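The differenced process $\{\Delta y_t\}_{t=1}^{\infty}$ may be expressed as the infinite vector moving average
process
where $b_0 \equiv C a_0 + C^* a_1$, $b_1 \equiv C a_1$. The matrix lag polynomial $C(L)$ is given by⁷
\[ C(L) \equiv I_m + \sum_{j=1}^{\infty} C_j L^j = C + (1 - L) C^*(L), \quad C^*(L) \equiv \sum_{j=0}^{\infty} C_j^* L^j, \]
\[ C \equiv \sum_{j=0}^{\infty} C_j, \quad C^* \equiv \sum_{j=0}^{\infty} C_j^* . \qquad (22.50) \]
Now, as $C(L)\Phi(L) = \Phi(L)C(L) = (1 - L) I_m$, then $\Pi C = 0$ and $C \Pi = 0$, and in particular,
$C = \beta_\perp (\alpha_\perp' \Gamma \beta_\perp)^{-1} \alpha_\perp'$. Re-expressing (22.49) in levels,
\[ y_t = y_0 + b_0 t + b_1 \frac{t(t+1)}{2} + C s_t + C^*(L)(u_t - u_0), \qquad (22.51) \]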
6 See Johansen (1995), Definitions 3.2 and 3.3, p.35. That is, defining the difference operator $\Delta \equiv (1 - L)$, the processes
$\{\beta_\perp'[\Delta(y_t - \mu - \gamma t)]\}_{t=1}^{\infty}$, and $\{\beta'(y_t - \mu - \gamma t)\}_{t=1}^{\infty}$ admit stationary and invertible ARMA representations; see also
Engle and Granger (1987, p. 252, Definition).
7 The matrices $\{C_i\}$ can be obtained from the recursions $C_i = \sum_{j=1}^{p} C_{i-j} \Phi_j$, $i > 1$, $C_0 = I_m$, $C_1 = -(I_m - \Phi_1)$,
defining $C_i = 0$, for $i < 0$. Similarly, for the matrices $\{C_j^*\}$, $C_j^* = C_j + C_{j-1}^*$, $j > 0$, $C_0^* = I_m - C$.
where $s_t \equiv \sum_{s=1}^{t} u_s$, $t = 1, 2, \ldots$.
Adopting the VAR(p) formulation (22.41) rather than the more usual (22.44), in which $a_0$
and $a_1$ are unrestricted, reveals immediately from (22.51) that the restrictions (22.45) on $a_1$
induce $b_1 = 0$ and ensure that the nature of the deterministic trending behaviour of the level
process $\{y_t\}_{t=1}^{\infty}$ remains invariant to the rank r of the long-run multiplier matrix $\Pi$; that is,
the deterministic trend of $y_t$ will be linear for all values of r, the rank of β. Hence, the
infinite moving average representation for the level process $\{y_t\}_{t=1}^{\infty}$ is8
\[ y_t = \mu + \gamma t + Cs_t + C^*(L)u_t, \quad t = 1, 2, \ldots, \tag{22.52} \]
where we have used the initialization $y_0 \equiv \mu + C^*(L)u_0$.9 See also Johansen (1994) and
Johansen (1995, Section 5.7, pp. 80–84).10 If, however, $a_1$ were not subject to the restrictions
(22.45), the quadratic trend term would be present in the level equation (22.51) apart from in
the full rank stationary case $H_m$: Rank$[\Pi] = m$ or $C = 0$. However, $b_1$ would be unconstrained
under the null hypothesis of no cointegration; that is, $H_0$: Rank$[\Pi] = 0$, and C full
rank. In the general case $H_r$: Rank$[\Pi] = r$ of (22.47), this would imply different deterministic
trending behaviour for $\{y_t\}_{t=1}^{\infty}$ for differing values of the cointegrating rank r, with the number
of independent quadratic deterministic trends, m − r, decreasing as r increases.
The above analysis further reveals that because cointegration is only concerned with the elimination
of stochastic trends it does not rule out the possibility of deterministic trends in the
cointegrating relations. Pre-multiplying both sides of (22.52) by the cointegrating matrix $\beta'$, we
obtain the cointegrating relations
\[ \beta' y_t = \beta'\mu + (\beta'\gamma)t + \beta' C^*(L)u_t, \quad t = 1, 2, \ldots, \tag{22.53} \]
\[ \Delta y_t = a_0 + a_1 t - \Pi y_{t-1} + \sum_{j=1}^{p-1}\Gamma_j\Delta y_{t-j} + u_t. \tag{22.54} \]
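Given the above discussion, we can differentiate between five cases of interest:
Case I (no intercepts and no trends) $a_0 = 0$ and $a_1 = 0$.
This corresponds to a model with no deterministic components. In particular, model (22.54)
becomes
\[ \Delta y_t = -\Pi y_{t-1} + \sum_{j=1}^{p-1}\Gamma_j\Delta y_{t-j} + u_t. \tag{22.55} \]
Case II (restricted intercepts and no trends) $a_0 = \Pi\mu$ and $a_1 = 0$.
\[ \Delta y_t = -\Pi(y_{t-1} - \mu) + \sum_{j=1}^{p-1}\Gamma_j\Delta y_{t-j} + u_t. \tag{22.56} \]
Case III (unrestricted intercepts and no trends) $a_0 \neq 0$ and $a_1 = 0$.
\[ \Delta y_t = a_0 - \Pi y_{t-1} + \sum_{j=1}^{p-1}\Gamma_j\Delta y_{t-j} + u_t. \tag{22.57} \]
Case IV (unrestricted intercepts and restricted trends) $a_0 \neq 0$ and $a_1 = \Pi\gamma$.
\[ \Delta y_t = a_0 - \Pi[y_{t-1} - \gamma(t-1)] + \sum_{j=1}^{p-1}\Gamma_j\Delta y_{t-j} + u_t. \tag{22.58} \]
Case V (unrestricted intercepts and trends) $a_0 \neq 0$ and $a_1 \neq 0$.
\[ \Delta y_t = a_0 + a_1 t - \Pi y_{t-1} + \sum_{j=1}^{p-1}\Gamma_j\Delta y_{t-j} + u_t. \tag{22.59} \]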
The maximum likelihood estimation for the above cases can be carried out using, instead of
$S_{01}$ and $S_{11}$ defined by (22.31) and (22.32), the matrices
\[ S_{ij} = \frac{1}{T}\sum_{t=1}^{T} r_{it} r_{jt}', \quad \text{for } i, j = 0, 1, \]
where $r_{0t}$ and $r_{1t}$, respectively, are the residual vectors computed using the following regressions:
Case I: ($a_0 = a_1 = 0$)
$r_{0t}$ is the residual vector from the OLS regressions of $\Delta y_t$ on $\Delta y_{t-1}, \Delta y_{t-2}, \ldots, \Delta y_{t-p+1}$,
and $r_{1t}$ is the residual vector from the OLS regressions of $y_{t-1}$ on $\Delta y_{t-1}, \Delta y_{t-2}, \ldots, \Delta y_{t-p+1}$.
Case II: ($a_1 = 0$, $a_0 = \Pi\mu$)
$r_{0t}$ is the residual vector from the OLS regressions of $\Delta y_t$ on $\Delta y_{t-1}, \Delta y_{t-2}, \ldots, \Delta y_{t-p+1}$,
and $r_{1t}$ is the residual vector from the OLS regressions of $(1, y_{t-1}')'$ on $\Delta y_{t-1}, \Delta y_{t-2}, \ldots, \Delta y_{t-p+1}$.
Case III: ($a_1 = 0$, $a_0 \neq 0$)
$r_{0t}$ is the residual vector from the OLS regressions of $\Delta y_t$ on $1, \Delta y_{t-1}, \Delta y_{t-2}, \ldots, \Delta y_{t-p+1}$,
and $r_{1t}$ is the residual vector from the OLS regressions of $y_{t-1}$ on $1, \Delta y_{t-1}, \Delta y_{t-2}, \ldots, \Delta y_{t-p+1}$.
Case IV: ($a_0 \neq 0$, $a_1 = \Pi\gamma$)
$r_{0t}$ is the residual vector from the OLS regressions of $\Delta y_t$ on $1, \Delta y_{t-1}, \Delta y_{t-2}, \ldots, \Delta y_{t-p+1}$,
and $r_{1t}$ is the residual vector from the OLS regressions of $(t, y_{t-1}')'$ on $1, \Delta y_{t-1}, \Delta y_{t-2}, \ldots, \Delta y_{t-p+1}$.
Case V: ($a_0 \neq 0$, $a_1 \neq 0$)
$r_{0t}$ is the residual vector from the OLS regressions of $\Delta y_t$ on $1, t, \Delta y_{t-1}, \Delta y_{t-2}, \ldots, \Delta y_{t-p+1}$,
and $r_{1t}$ is the residual vector from the OLS regressions of $y_{t-1}$ on $1, t, \Delta y_{t-1}, \Delta y_{t-2}, \ldots, \Delta y_{t-p+1}$.
The rest of the analysis will be unaffected.
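The residual regressions listed above can be sketched numerically as follows. This is only an illustrative sketch, assuming NumPy is available and covering Cases I and III; the function name `johansen_eigenvalues`, its arguments and the layout of the data (a (T+1) × m array of levels) are assumptions made for the example, not part of the text.

```python
import numpy as np

def johansen_eigenvalues(y, p=2, case="III"):
    """Eigenvalues of S11^{-1} S10 S00^{-1} S01 from residual regressions.

    y is a (T+1) x m array of levels. Only Cases I and III are sketched.
    Hypothetical helper for illustration, not a library routine.
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y, axis=0)                  # dy[k] = y[k+1] - y[k]
    n = dy.shape[0] - (p - 1)                # usable observations
    d_y = dy[p - 1:]                         # Delta y_t
    lev = y[p - 1:-1]                        # y_{t-1}
    # Regressors: Delta y_{t-1}, ..., Delta y_{t-p+1}
    lags = [dy[p - 1 - j: dy.shape[0] - j] for j in range(1, p)]
    if case == "III":
        lags.append(np.ones((n, 1)))         # unrestricted intercept
    if lags:
        X = np.hstack(lags)
        r0 = d_y - X @ np.linalg.lstsq(X, d_y, rcond=None)[0]
        r1 = lev - X @ np.linalg.lstsq(X, lev, rcond=None)[0]
    else:
        r0, r1 = d_y, lev                    # Case I with p = 1: nothing to partial out
    S00 = r0.T @ r0 / n
    S01 = r0.T @ r1 / n
    S11 = r1.T @ r1 / n
    M = np.linalg.solve(S11, S01.T) @ np.linalg.solve(S00, S01)
    lam = np.sort(np.linalg.eigvals(M).real)[::-1]
    return lam, n
```

The matrix used here, $S_{11}^{-1}S_{10}S_{00}^{-1}S_{01}$, has the same eigenvalues as $S_{10}S_{00}^{-1}S_{01}S_{11}^{-1}$, so either ordering can be used when only the eigenvalues are needed.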
The log-likelihood ratio statistic for testing the null of r cointegrating relations against the
alternative that there are r + 1 of them is defined by
\[ LR(H_r \,|\, H_{r+1}) = 2\left[\ell^c(\hat\beta; r+1) - \ell^c(\hat\beta; r)\right], \]
where $\ell^c(\hat\beta; r+1)$ and $\ell^c(\hat\beta; r)$ refer to the maximized log-likelihood values under $H_{r+1}$ and $H_r$,
respectively. Hence, by substituting the expression for the maximized concentrated likelihood
(see equation (22.38) for the VAR(1)) we obtain
\[ LR(H_r \,|\, H_{r+1}) = -T\log(1 - \hat\lambda_{r+1}), \tag{22.61} \]
where $\hat\lambda_r$ is defined in (22.37). See Johansen (1991) for further details.
\[ H_m: \mathrm{Rank}(\Pi) = m, \]
or
\[ LR(H_r \,|\, H_m) = -T\sum_{i=r+1}^{m}\log(1 - \hat\lambda_i), \tag{22.62} \]
where $\hat\lambda_{r+1} > \hat\lambda_{r+2} > \ldots > \hat\lambda_m$ are the smallest m − r eigenvalues of $S_{10}S_{00}^{-1}S_{01}S_{11}^{-1}$.
We note that, unlike residual-based cointegration tests, tests based on the maximum eigenvalue
or trace statistics are invariant to the ordering of variables, namely, they are not affected
when the variables in $y_t$ are re-ordered or replaced by other linear combinations.
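As a small illustration of (22.61) and (22.62), the two statistics can be computed directly from a list of estimated eigenvalues. The function names below are hypothetical, and the eigenvalues used in any call are made-up inputs rather than estimates from the text.

```python
import math

def max_eigenvalue_stat(eigs, r, T):
    """LR(H_r | H_{r+1}) = -T log(1 - lambda_{r+1}), as in (22.61)."""
    lam = sorted(eigs, reverse=True)
    return -T * math.log(1.0 - lam[r])       # lam[r] is lambda_{r+1} (0-based index)

def trace_stat(eigs, r, T):
    """LR(H_r | H_m) = -T sum_{i=r+1}^{m} log(1 - lambda_i), as in (22.62)."""
    lam = sorted(eigs, reverse=True)
    return -T * sum(math.log(1.0 - l) for l in lam[r:])
```

By construction the trace statistic for a given r is the sum of the maximum eigenvalue statistics for r, r + 1, ..., m − 1, and the two coincide when r = m − 1.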
Next we derive the asymptotic distribution of the Trace statistic for model (22.27).
Assumption 3(a) states that the error process $\{u_t\}_{t=-\infty}^{\infty}$ is a martingale difference sequence
with constant conditional variance; hence, $\{u_t\}_{t=-\infty}^{\infty}$ is an uncorrelated process. Assumption 3
is required for the multivariate invariance principle to hold (see Appendix B, Section B.13.1).
Consider the trace statistic defined by (22.62) under r = 0
\[ LR(H_0 \,|\, H_m) = -T\sum_{i=1}^{m}\log(1 - \hat\lambda_i) = T\sum_{i=1}^{m}\hat\lambda_i + o_p(1), \]
\[ \sum_{i=1}^{m}\hat\lambda_i = \mathrm{Tr}\left(S_{10}S_{00}^{-1}S_{01}S_{11}^{-1}\right). \]
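Under the null of no cointegration the VAR(1) model (without intercepts or linear trends)
implies
\[ y_t = y_0 + \sum_{j=1}^{t} u_j = y_0 + s_t, \]
and similarly
\[ y_{t-1} = y_0 + \sum_{j=1}^{t-1} u_j = y_0 + s_{t-1}. \]
Hence
\[ T^{-1}S_{11} = \frac{1}{T^2}\sum_{t=1}^{T} y_{t-1}y_{t-1}' = \frac{y_0 y_0'}{T} + \frac{1}{T^2}\sum_{t=1}^{T} s_{t-1}s_{t-1}' + y_0\left(\frac{1}{T^2}\sum_{t=1}^{T} s_{t-1}'\right) + \left(\frac{1}{T^2}\sum_{t=1}^{T} s_{t-1}\right) y_0'. \]
\[ \frac{1}{T^2}\sum_{t=1}^{T} s_{t-1} \to 0, \qquad \frac{y_0 y_0'}{T} \to 0, \]
\[ T^{-2}\sum_{t=1}^{T} s_t s_t' \Rightarrow \int_0^1 W(a)W(a)'\,da. \]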
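where $W(a)$ is an m-dimensional Brownian motion with the covariance matrix, $\Sigma$. Similarly, it
is possible to show that
\[ S_{01} = \frac{1}{T}\sum_{t=1}^{T}\Delta y_t y_{t-1}' = \frac{1}{T}\sum_{t=1}^{T} u_t (y_0 + s_{t-1})' \Rightarrow \int_0^1 dW(a)W(a)', \]
\[ S_{00} = \frac{1}{T}\sum_{t=1}^{T}\Delta y_t\Delta y_t' \xrightarrow{p} \Sigma. \]
It follows that
\[ LR(H_0 \,|\, H_m) = -T\sum_{i=1}^{m}\log(1 - \hat\lambda_i) \Rightarrow \mathrm{Tr}\left\{\left[\int_0^1 W(a)W(a)'\,da\right]^{-1}\int_0^1 W(a)\,dW(a)'\;\Sigma^{-1}\int_0^1 dW(a)W(a)'\right\}. \]
\[ -T\sum_{i=1}^{m}\log(1 - \hat\lambda_i) \Rightarrow \mathrm{Tr}\left\{\int_0^1 dB(a)B(a)'\left[\int_0^1 B(a)B(a)'\,da\right]^{-1}\int_0^1 B(a)\,dB(a)'\right\}. \]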
This is a multivariate generalization of the Dickey–Fuller distribution used to test the unit root
hypothesis (for the basic case of no intercept or trend) where m = 1. Note that the asymptotic
distribution of the trace statistic does not depend on $\Sigma$, and depends only on the dimension of
$y_t$, m. It is also easily seen that a test based on $-T\sum_{i=1}^{m}\log(1 - \hat\lambda_i)$ will be consistent, in the
sense that the power of the test will tend to unity as T → ∞, if r > 0.
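The limiting distribution above has no closed form, but it can be approximated by simulation. The following is a rough Monte Carlo sketch, assuming NumPy; it replaces the Brownian motion by scaled partial sums of standard normal draws, and the function name, the number of steps `T` and the number of replications `reps` are all illustrative choices rather than anything prescribed in the text.

```python
import numpy as np

def simulate_trace_limit(m, T=400, reps=2000, seed=0):
    """Draws approximating Tr{ int dB B' [int B B' da]^{-1} int B dB' }."""
    rng = np.random.default_rng(seed)
    out = np.empty(reps)
    for i in range(reps):
        eps = rng.standard_normal((T, m))
        S = np.cumsum(eps, axis=0)
        S_lag = np.vstack([np.zeros((1, m)), S[:-1]])   # S_{t-1}, with S_0 = 0
        A = eps.T @ S_lag / T               # discrete analogue of int dB B'
        Bm = S_lag.T @ S_lag / T**2         # discrete analogue of int B B' da
        out[i] = np.trace(A @ np.linalg.solve(Bm, A.T))
    return out
```

Empirical quantiles of the returned draws (for example via `np.quantile(draws, 0.95)`) give simulated critical values for the no-intercept, no-trend case; tabulated values such as those cited below should be preferred in applied work.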
The critical values for the maximum eigenvalue and the trace statistics defined by (22.61) and
(22.62), respectively, depend on m and whether the VECM contains intercepts and/or trends
and whether these are restricted. These critical values are available in MacKinnon, Haug, and
Michelis (1999) (see also Osterwald-Lenum (1992)).
Monte Carlo simulation results indicate that these cointegrating rank test statistics generally
tend to under-reject in small samples (see Pesaran, Shin, and Smith (2000)). Appropriate critical
values can be computed by adopting a bootstrap approach, as outlined in Section 22.12. We
also refer to Lütkepohl, Saikkonen, and Trenkler (2001) for a comparison of the properties of
maximum eigenvalue and trace tests under a set of local alternatives.
\[ \hat\beta_J' B_T\hat\beta_J = I_r, \tag{22.63} \]
and
\[ \hat\beta_{iJ}'(B_T - A_T)\hat\beta_{jJ} = 0, \quad i \neq j, \; i, j = 1, 2, \ldots, r, \tag{22.64} \]
where $\hat\beta_{iJ}$ represents the ith column of $\hat\beta_J$. The conditions (22.63) and (22.64) together
impose exactly $r^2$ just-identifying restrictions on β. It is, however, clear that the $r^2$ restrictions in (22.63)
and (22.64) are adopted for their mathematical convenience and not because they are meaningful
from the perspective of any long-run economic theory.
A more satisfactory procedure would be to directly maximize the concentrated log-likelihood
function (22.36) subject to exact or over-identifying a priori restrictions obtained from the
long-run equilibrium properties of a suitable underlying economic model (on this see Pesaran
(1997)). We can formulate the following general linear restrictions on the elements of β
\[ R\,\mathrm{vec}(\beta) = b, \tag{22.65} \]
where R and b are a k × rm matrix and a k × 1 vector of known constants (with Rank(R) = k),
and vec(β) is the rm × 1 vector of long-run coefficients, which stacks the r columns of β into a
vector. If the matrix R is block-diagonal then (22.65) can be written as
\[ R_i\beta_i = b_i, \quad i = 1, 2, \ldots, r, \tag{22.66} \]
This result also implies that there must be at least r independent restrictions on each of the r
cointegrating vectors.
The identification condition in the case where R is not block diagonal is given by
A necessary condition for (22.68) to hold is given by the order condition $k \geq r^2$. Three cases of
interest can be distinguished:
Pesaran and Shin (2002) proved that this estimator satisfies the restriction (22.65), and is invari-
ant to nonsingular transformations of the cointegrating space spanned by columns of β̂.
Over-identified case $k > r^2$
In this case, there are $k - r^2$ additional restrictions that need to be taken into account at the
estimation stage. This can be done by maximizing the concentrated log-likelihood function (22.36),
subject to the restrictions given by (22.65). We assume that the normalization restrictions on
each of the r cointegrating vectors are also included in $R\,\mathrm{vec}(\beta) = b$. The Lagrangian function
for this problem is given by
\[ \Lambda(\theta, \lambda) = \frac{1}{T}\ell^c(\theta; r) - \frac{1}{2}\lambda'(R\theta - b) = \text{constant} - \frac{1}{2}\left\{\log|\beta' A_T\beta| - \log|\beta' B_T\beta|\right\} - \frac{1}{2}\lambda'(R\theta - b), \]
\[ R\tilde\theta = b, \tag{22.71} \]
where $\tilde\theta$ and $\tilde\lambda$ stand for the restricted ML estimators, and $d(\tilde\theta)$ is the score function defined by
\[ d(\tilde\theta) = \left\{\left[(\tilde\beta' A_T\tilde\beta)^{-1}\otimes A_T\right] - \left[(\tilde\beta' B_T\tilde\beta)^{-1}\otimes B_T\right]\right\}\tilde\theta. \tag{22.72} \]
$\tilde\theta$ can be computed by numerical methods such as the Newton–Raphson procedure.
Evidence on the small sample properties of alternative methods of estimating the cointegrating
relations is provided in Gonzalo (1994), who shows that the Johansen maximum likelihood
approach is to be preferred to the other alternatives proposed in the literature.
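\[ R\,\mathrm{vec}(\beta) = b, \tag{22.73} \]
\[ \underset{r^2\times rm}{R_A}\;\underset{rm\times 1}{\theta} = \underset{r^2\times 1}{b_A}, \tag{22.74} \]
\[ \underset{(k-r^2)\times rm}{R_B}\;\underset{rm\times 1}{\theta} = \underset{(k-r^2)\times 1}{b_B}, \tag{22.75} \]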
where $R = (R_A', R_B')'$, and $b = (b_A', b_B')'$, such that Rank$(R_A) = r^2$, Rank$(R_B) = k - r^2$, and
$b_A \neq 0$. Without loss of generality the restrictions characterized by (22.74) can be viewed as
the just-identifying restrictions, and the remaining restrictions defined by (22.75) will then
constitute the $k - r^2$ over-identifying restrictions. Let $\hat\theta$ be the ML estimator of θ obtained subject
to the $r^2$ exactly-identifying restrictions, and $\tilde\theta$ be the ML estimator of θ obtained under all the
k restrictions in (22.73). Then the log-likelihood ratio statistic for testing the over-identifying
restrictions is given by
\[ LR(R \,|\, R_A) = 2\left[\ell^c(\hat\theta; r) - \ell^c(\tilde\theta; r)\right], \tag{22.76} \]
where $\ell^c(\hat\theta; r)$ is given by (22.38) and represents the maximized value of the log-likelihood
function under the just-identifying restrictions (say $R_A\theta = b_A$), and $\ell^c(\tilde\theta; r)$ is the maximized value
of the log-likelihood function under the k just- and over-identifying restrictions given by (22.73).
Pesaran and Shin (2002) proved that, under the null hypothesis that the restrictions (22.73)
hold, the log-likelihood ratio statistic $LR(R \,|\, R_A)$ defined by (22.76) is asymptotically
distributed as a $\chi^2$ variate with degrees of freedom equal to the number of the over-identifying
restrictions, namely $k - r^2 > 0$.
The above testing procedure is also applicable when interest lies in testing restrictions on a single
cointegrating vector or a subset of cointegrating vectors. For this purpose, one simply needs
to impose just-identifying restrictions on all the vectors except for the vector(s) that are to be
subject to the over-identifying restrictions. The resultant test statistic will be invariant to the
nature of the just-identifying restrictions. Note that this test of the over-identifying restrictions
on the cointegrating relations pre-assumes that the variables $y_t$ are I(1), and that the number of
cointegrating relations, r, is correctly chosen. See Pesaran and Shin (2002).
More specifically, suppose that the model in (22.16) has been estimated under the exact- or
over-identifying restrictions given by (22.65). We therefore have estimates of the cointegrating
vectors, $\hat\beta$, of the short-run parameters, $\hat\alpha$, $\hat\Gamma_i$, and the elements of the covariance matrix, $\hat\Sigma$.
Taking the p lagged values of the $y_t$ observed just prior to the sample as fixed, for the sth replication,
we can recursively simulate the values of $y_t^{(s)}$, s = 1, 2, . . . , S, using
\[ \Delta y_t^{(s)} = -\hat\alpha\hat\beta' y_{t-1}^{(s)} + \sum_{i=1}^{p-1}\hat\Gamma_i\Delta y_{t-i}^{(s)} + u_t^{(s)}, \quad t = 1, 2, \ldots, T. \tag{22.77} \]
The simulated errors, $u_t^{(s)}$, can be obtained in two alternative ways, so that the contemporaneous
correlations that exist across the errors in the different equations of the VAR model are taken
into account and maintained. The first is a parametric method where the errors are drawn from
an assumed probability distribution function. Alternatively, one could employ a non-parametric
procedure. The latter is slightly more complicated and is based on re-sampling techniques in
which the simulated errors are obtained by a random draw from the in-sample estimated residuals
(e.g., Hall (1992)).
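The recursion (22.77) and the two ways of generating errors can be sketched as follows. This is a minimal sketch, assuming NumPy and assuming that estimates of α, β, the Γ_i and either Σ or the in-sample residuals are already available; the function `simulate_vecm` and its argument names are hypothetical. The parametric branch assumes Gaussian errors, which is one possible choice of distribution.

```python
import numpy as np

def simulate_vecm(alpha, beta, gammas, y_init, T, rng, sigma=None, resid=None):
    """One simulated path from
    Delta y_t = -alpha beta' y_{t-1} + sum_i Gamma_i Delta y_{t-i} + u_t.

    y_init holds the p pre-sample levels (oldest first); gammas is the list
    of the p-1 short-run matrices. Hypothetical helper for illustration.
    """
    m = alpha.shape[0]
    if resid is not None:
        # Non-parametric: resample whole residual rows, so the contemporaneous
        # correlation across equations is preserved.
        u = resid[rng.integers(0, resid.shape[0], size=T)]
    else:
        # Parametric: Gaussian draws with covariance matrix sigma.
        u = rng.standard_normal((T, m)) @ np.linalg.cholesky(sigma).T
    y = [np.asarray(row, dtype=float) for row in y_init]
    for t in range(T):
        dy = -alpha @ (beta.T @ y[-1]) + u[t]
        for i, G in enumerate(gammas, start=1):
            dy = dy + G @ (y[-i] - y[-i - 1])     # Gamma_i * Delta y_{t-i}
        y.append(y[-1] + dy)
    return np.array(y[len(y_init):])
```

Repeating the call S times, re-estimating the model on each simulated path and collecting the test statistic of interest gives its bootstrap distribution.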
functional form misspecifications can be readily computed, based on these OLS regressions, in
the usual manner. Further discussion of the validity of standard diagnostic test procedures when
different estimation procedures are adopted in models involving unit roots and cointegrating
relations is provided in Gerrard and Godfrey (1998). This is an important observation because
it simplifies estimation and diagnostic testing procedures. Moreover, it makes clear that the
modelling procedure is robust to uncertainties surrounding the order of integration of particular
variables. It is often difficult to establish the order of integration of particular variables using the
techniques and samples of data which are available, and it would be problematic if the modelling
procedure required all the variables in the model to be integrated of a particular order. However,
the observations above indicate that, so long as the r × 1 cointegrating relations, $\hat\xi_t = \hat\beta' y_{t-1}$,
are stationary, the conditional VEC model, estimated and interpreted in the usual manner, will
be valid even if it turns out that some or all of the variables in $y_{t-1}$ are I(0) and not I(1)
after all.
in which the variables $y_{1t}$ and $y_{2t}$ are cointegrated with cointegrating vector $\beta = (\beta_1, \beta_2)'$.
Denoting $\xi_{t+1} = \beta_1 y_{1t} + \beta_2 y_{2t}$, and pre-multiplying both sides of (22.78) by $\beta'$, we obtain
grating vector, and the estimate of α 1 alone will not allow us to sign the expressions β 1 α 1 +β 2 α 2
and β 1 α 1 +β 2 α 2 −2. Hence, for example, restricting α 1 to lie in the range (0, 2) ensures the sta-
bility of (22.79) only under the normalization β 1 = 1, and in the simple case where α 2 = 0.11
More generally, we can rewrite (22.39) as an infinite-order difference equation in an r × 1
vector of (stochastic) disequilibrium terms, $\xi_t = \beta' y_{t-1}$. Under our assumption that all the
variables in $y_t$ are I(1), and all the roots of $|I_m - \sum_{i=1}^{p-1}\Gamma_i z^i| = 0$ fall outside the unit circle, we
have the following expression for $\Delta y_t$
\[ \Delta y_t = \Gamma(L)^{-1}\left(-\alpha\xi_t + u_t\right), \quad t = 1, 2, \ldots, T, \tag{22.80} \]
where $\Gamma(L) = I_m - \sum_{i=1}^{p-1}\Gamma_i L^i$. Defining $\Theta(L) = \Gamma(L)^{-1} = \sum_{i=0}^{\infty}\Theta_i L^i$, then it is easily
seen that
\[ \xi_{t+1} = \left(I_r - \beta'\alpha - \sum_{i=1}^{\infty}\beta'\Theta_i\alpha L^i\right)\xi_t + \left(\beta' + \sum_{i=1}^{\infty}\beta'\Theta_i L^i\right)(a_0 + a_1 t + u_t). \tag{22.82} \]
This shows that, in general, when p ≥ 2, the error correction variables, $\xi_{t+1}$, follow infinite-order
VARMA processes, and there exists no simple rule involving α alone that could ensure the
stability of the dynamic processes in $\xi_{t+1}$. This result also highlights the deficiency of residual-based
approaches to testing for cointegration described in Section 22.3, where finite-order ADF
regressions are fitted to the residuals even if the order of the underlying VAR is 2 or more.
However, given the assumption that none of the roots of $|I_m - \sum_{i=1}^{p-1}\Gamma_i z^i| = 0$ fall on or
inside the unit circle, it is easily seen that the matrices $\Theta_i$, i = 0, 1, 2, . . ., are absolute summable,
and therefore a suitably truncated version of $\sum_{i=1}^{\infty}\beta'\Theta_i\alpha L^i$ can provide us with an adequate
approximation in practice. Using an $\ell$-order truncation we have
\[ \xi_{t+1} \approx \sum_{i=1}^{\ell} D_i\xi_{t-i+1} + v_t, \quad t = 1, 2, \ldots, T, \tag{22.83} \]
where
\[ D_1 = I_r - \beta'\alpha, \qquad D_i = -\beta'\Theta_{i-1}\alpha, \quad i = 2, 3, \ldots, \ell, \tag{22.84} \]
\[ v_t = \left(\beta' + \sum_{i=1}^{\ell}\beta'\Theta_i L^i\right)(a_0 + a_1 t + u_t). \]
11 When $\alpha_2 = 0$, $y_{2t}$ is said to be long-run forcing for $y_{1t}$. See Chapter 23.
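To explicitly evaluate the stability of the cointegrated system, we rewrite (22.83) more
compactly as
\[ \check\xi_{t+1} = D\check\xi_t + \check v_t, \tag{22.85} \]
where
\[ \underset{r\ell\times 1}{\check\xi_t} = \begin{pmatrix} \xi_t \\ \xi_{t-1} \\ \xi_{t-2} \\ \vdots \\ \xi_{t-\ell+1} \end{pmatrix}, \quad \underset{r\ell\times r\ell}{D} = \begin{pmatrix} D_1 & D_2 & \cdots & D_{\ell-1} & D_\ell \\ I_r & 0 & \cdots & 0 & 0 \\ 0 & I_r & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & I_r & 0 \end{pmatrix}, \quad \underset{r\ell\times 1}{\check v_t} = \begin{pmatrix} v_t \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}. \tag{22.86} \]
The above cointegrated system is stable if all the roots of $|I_r - D_1 z - \cdots - D_\ell z^\ell| = 0$ lie
outside the unit circle, or if all the eigenvalues of D have modulus less than unity.12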
12 Notice that the stability analysis is not affected by the presence of deterministic and stationary exogenous variables in
the system.
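The eigenvalue form of the stability condition is straightforward to check numerically once the matrices D_1, ..., D_ℓ are available. The sketch below assumes NumPy; the helper names `companion` and `is_stable` are illustrative, and the D_i passed to them are taken as given inputs.

```python
import numpy as np

def companion(D_list):
    """Stack D_1, ..., D_l into the companion matrix D of (22.86)."""
    r = D_list[0].shape[0]
    l = len(D_list)
    top = np.hstack(D_list)
    if l == 1:
        return top
    bottom = np.hstack([np.eye(r * (l - 1)), np.zeros((r * (l - 1), r))])
    return np.vstack([top, bottom])

def is_stable(D_list):
    """True if every eigenvalue of the companion matrix has modulus below one."""
    eigs = np.linalg.eigvals(companion(D_list))
    return bool(np.max(np.abs(eigs)) < 1.0)
```

For a scalar example with ℓ = 2, `is_stable([np.array([[0.5]]), np.array([[0.3]])])` checks the roots of z² − 0.5z − 0.3 = 0.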
\[ y_{dt}^P = g_0 + g t, \]
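The above result forms the basis of the trend/cycle decomposition of $y_t$ described in Garratt
et al. (2006). Suppose that $y_t$ has the following vector error correction representation with
unrestricted intercept and restricted trend
\[ \Delta y_t = a - \alpha\beta'[y_{t-1} - \gamma(t-1)] + \sum_{i=1}^{p-1}\Gamma_i\Delta y_{t-i} + u_t. \tag{22.88} \]
Denote the deviation of the variables in $y_t$ from their deterministic components as $\tilde y_t$, namely
$\tilde y_t = y_t - g_0 - g t$. Substituting into (22.88) gives
\[ \Delta\tilde y_t = a - \alpha\beta' g_0 - \left(I_m - \sum_{i=1}^{p-1}\Gamma_i\right) g - \alpha\beta'(g - \gamma)(t-1) - \alpha\beta'\tilde y_{t-1} + \sum_{i=1}^{p-1}\Gamma_i\Delta\tilde y_{t-i} + u_t. \]
For $\tilde y_t$ to be free of deterministic components, the intercept and trend terms must vanish, which requires
\[ a = \alpha\beta' g_0 + \left(I_m - \sum_{i=1}^{p-1}\Gamma_i\right) g, \tag{22.90} \]
and
\[ \beta' g = \beta'\gamma. \tag{22.91} \]
Under these restrictions
\[ \Delta\tilde y_t = -\alpha\beta'\tilde y_{t-1} + \sum_{i=1}^{p-1}\Gamma_i\Delta\tilde y_{t-i} + u_t, \tag{22.92} \]
or, equivalently,
\[ \tilde y_t = \sum_{i=1}^{p}\Phi_i\tilde y_{t-i} + u_t, \tag{22.93} \]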
where
\[ \Phi_1 = I_m + \Gamma_1 - \alpha\beta', \qquad \Phi_i = \Gamma_i - \Gamma_{i-1}, \; i = 2, \ldots, p-1, \qquad \Phi_p = -\Gamma_{p-1}. \]
In the general case where one or more elements of $\tilde y_t$ are I(1), it is not possible to invert the
polynomial operator, $I_m - \sum_{i=1}^{p}\Phi_i L^i$, to derive $\tilde y_t$ in terms of the shocks, $u_t$. However, since
it is assumed that the order of integration of the variables is at most I(1), it follows that $\Delta\tilde y_t$
will follow a general stationary process irrespective of the I(0)/I(1) properties of the underlying
variables. More specifically, we have
\[ \Delta\tilde y_t = C(L)u_t, \]
where $C(L) = C_0 + C_1 L + C_2 L^2 + \ldots$, such that $\{C_i\}$ are absolute summable matrices. In the
case where $\tilde y_t$ is stationary, we must have $C(1) = 0$. In general,
\[ C_0 = I_m, \qquad C_1 = -(I_m - \Phi_1), \]
\[ C_2 = \Phi_1 C_1 + \Phi_2 C_0, \qquad C_{p-1} = \sum_{i=1}^{p-1}\Phi_i C_{p-1-i}, \]
or more generally
\[ C_j = \sum_{i=1}^{p}\Phi_i C_{j-i}, \quad \text{for } j = p, p+1, \ldots. \]
and cumulating the above from some initial state ỹ0 = y0 − g0 , we have
\[ \tilde y_t = \tilde y_0 + C(1)\sum_{i=1}^{t} u_i + C^*(L)(u_t - u_0). \]
\[ y_t = y_0 + g t + C(1)\sum_{i=1}^{t} u_i + C^*(L)(u_t - u_0). \tag{22.96} \]
\[ y_{st}^P = C(1)\sum_{i=1}^{t} u_i, \tag{22.97} \]
\[ y_t^C = C^*(L)(u_t - u_0) + y_0. \]
\[ y_{t+h} - g_0 - g(t+h) = C(1)\sum_{i=1}^{t+h} u_i + \xi_{t+h}, \]
where $g_0 = y_0 - C^*(L)u_0$, and $\xi_{t+h} = C^*(L)u_{t+h}$. Since $C_i^*$ are absolute summable
matrices, and the error vectors, $u_t$, are serially uncorrelated stationary processes with zero means,
then $\xi_{t+h}$ is also a stationary process, and hence
\[ \lim_{h\to\infty} E\left(\xi_{t+h} \,|\, \Omega_t\right) = 0. \]
As a result
\[ y_{st}^P = \lim_{h\to\infty} E\left(C(1)\sum_{i=1}^{t+h} u_i \,\Big|\, \Omega_t\right) = C(1)\sum_{i=1}^{t} u_i, \]
as required.
As for the estimation of the various components, note that $y_{st}^P$ can easily be estimated since
the coefficients for $C_i$ can be derived recursively in terms of $\Phi_i$, which in turn can be obtained
from the $\Gamma_i$. Once $y_{st}^P$ has been estimated, consider the difference
\[ \hat w_t = y_t - \hat C(1)\sum_{i=1}^{t}\hat u_i, \]
Hence, to obtain $\hat g$ and $\hat y_t^C$, one can perform a seemingly unrelated (SURE) regression of $\hat w_t$ on
an intercept and a time trend t, subject to the restrictions
\[ \hat\beta'\hat g = \hat\beta'\hat\gamma, \tag{22.98} \]
where $\hat\gamma$ and $\hat\beta$ have already been estimated, under the assumption that the cointegrating vectors
are exactly identified. Residuals obtained from such a regression will be an estimate of the cyclical
component $y_t^C$. In the case of a cointegrating VAR with no intercept and no trends, we have
\[ w_t = y_t - \hat C(1)\sum_{i=1}^{t}\hat u_i = y_0 + \hat y_t^C, \]
and the deterministic component is given by $y_0$. In the case of a cointegrating VAR with restricted
intercepts and no trends, consistent estimates of g and $y_t^C$ can be obtained by running the SURE
regressions of $w_t$ on an intercept, subject to the restrictions
\[ \hat\beta' g_0 = \hat\beta'\hat a, \]
where, once again, $\hat\beta$ and $\hat a$ have already been estimated from the VECM.
In the case of a cointegrating VAR with unrestricted intercepts and no trends, g = 0 and
g0 can be consistently estimated by computing the sample mean of wt (or by running OLS
regressions of wt on intercepts). Finally, for a cointegrating VAR with unrestricted intercepts
and trends, consistent estimates of g can be obtained by running OLS regressions of wt on an
intercept and a linear trend. The cyclical component ŷtC in all cases is the residual from the above
regressions.
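For the last case (unrestricted intercepts and trends), where plain OLS suffices, the steps can be sketched as below. This is a sketch under stated assumptions: NumPy is available, and the estimates of C(1) and of the residuals û_t have already been obtained from the VECM; the function name `bn_trend_cycle` and its arguments are hypothetical. The restricted cases, which need SURE estimation subject to (22.98), are not covered.

```python
import numpy as np

def bn_trend_cycle(y, u_hat, C1):
    """Split y (T x m) into deterministic, stochastic-trend and cyclical parts.

    u_hat: T x m residuals; C1: m x m estimate of C(1).
    Returns (g0_hat, g_hat, stochastic_trend, cycle).
    """
    T = y.shape[0]
    stochastic = np.cumsum(u_hat, axis=0) @ C1.T       # C(1) * sum_{i<=t} u_i
    w = y - stochastic                                 # w_t in the text
    X = np.column_stack([np.ones(T), np.arange(1, T + 1)])
    coef = np.linalg.lstsq(X, w, rcond=None)[0]        # rows: g0', g'
    cycle = w - X @ coef                               # residual = cyclical part
    return coef[0], coef[1], stochastic, cycle
```

The regression is run equation by equation, which is all that OLS on a common set of regressors requires.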
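\[ y_t = Ay_{t-1} + u_t, \tag{22.99} \]
where $y_t = (r_t, r_t^*)'$, $u_t = (\varepsilon_t, \varepsilon_t^*)'$ and
\[ A = \begin{pmatrix} 1 + a & -a \\ b & 1 - b \end{pmatrix}, \]
and hence
\[ E\left(y_{t+h} \,|\, \Omega_t\right) = A^h y_t. \]
Since in this example there are no deterministic variables such as intercept or trend, the
permanent component of $y_t$ is given by
\[ y_t^P = y_{st}^P = \lim_{h\to\infty} E\left(y_{t+h} \,|\, \Omega_t\right) = \lim_{h\to\infty} A^h y_t = A^{\infty} y_t. \tag{22.100} \]
\[ y_t = \tilde y_0 + C(1)s_{ut} + C^*(L)u_t, \]
where $\tilde y_0 = y_0 - C^*(L)u_0$, $s_{ut} = \sum_{i=1}^{t} u_i$, and
\[ C(1) = \sum_{i=0}^{\infty} C_i, \qquad C^*(L) = \sum_{i=0}^{\infty} C_i^* L^i, \]
with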
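Hence,
\[ \lim_{h\to\infty} E\left(y_{t+h} \,|\, \Omega_t\right) = \tilde y_0 + A^{\infty}s_{ut} = \tilde y_0 + A^{\infty}(u_1 + u_2 + \ldots + u_t). \tag{22.101} \]
This result looks very different from that in (22.100) obtained using the direct method. However,
note that
\[ y_t = A^t y_0 + \sum_{j=0}^{t-1} A^{j} u_{t-j}. \]
\[ \lim_{h\to\infty} A^h y_t = \lim_{h\to\infty} A^{t+h} y_0 + \lim_{h\to\infty}\sum_{j=0}^{t-1} A^{h+j} u_{t-j}, \]
\[ A^{\infty} y_t = A^{\infty} y_0 + A^{\infty}\sum_{j=0}^{t-1} u_{t-j} = \tilde y_0 + A^{\infty}(u_1 + u_2 + \ldots + u_t), \]
where
\[ Q = \begin{pmatrix} 1 & 1 \\ 1 & b/a \end{pmatrix}, \qquad Q^{-1} = \begin{pmatrix} \dfrac{b}{-a+b} & -\dfrac{a}{-a+b} \\[2ex] -\dfrac{a}{-a+b} & \dfrac{a}{-a+b} \end{pmatrix}. \]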
Clearly, rt and rt∗ have the same stochastic components. Furthermore, the cycles for rt and rt∗ are
given by
Using UK data over the period 1979–2003, â = −0.13647 and b̂ = 0.098014 (Dées et al.
(2007)). Microfit can be used to check that equations (22.102) and (22.103)–(22.104) provide
the stochastic and cyclical components of rt and rt∗ in the BN decomposition (see Lesson 17.6
in Pesaran and Pesaran (2009) and Dées et al. (2007)).
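As a quick numerical check of the permanent component in this example, the limit of A^h can be approximated by a high matrix power using the estimates of a and b quoted above. The sketch assumes NumPy; the exponent 500 is an arbitrary large number, and the closed form used for comparison follows from A having eigenvalues 1 and 1 + a − b, with right eigenvector (1, 1)' and left eigenvector proportional to (b, −a) for the unit eigenvalue.

```python
import numpy as np

a, b = -0.13647, 0.098014                  # estimates quoted in the text
A = np.array([[1 + a, -a],
              [b, 1 - b]])

A_inf = np.linalg.matrix_power(A, 500)     # numerical stand-in for lim A^h
closed_form = np.array([[b, -a],
                        [b, -a]]) / (b - a)
stable_root = 1 + a - b                    # the non-unit eigenvalue of A
```

Both rows of the limit are identical, so the two interest rates share the same permanent component, a weighted average of r_t and r_t* with weights of about 0.418 and 0.582 under these estimates.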
22.18 Exercises
1. Consider the following standard asset pricing model
\[ p_t = \sum_{i=1}^{\infty}\beta^i E(d_{t+i} \,|\, \Omega_t), \]
where $\Omega_t$ is the information available at time t, $\beta = (1 + r)^{-1}$, r > 0 is the discount rate,
$p_t$ is the real share price, $d_t$ is real dividends paid per share, and $E(d_{t+i} \,|\, \Omega_t)$ is the conditional
mathematical expectation of $d_{t+i}$ with respect to $\Omega_t$. Suppose that $d_t$ follows a random walk
model with a non-zero drift, μ
\[ d_t = \mu + d_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim IID(0, \sigma^2). \]
(a) Show that pt is integrated of order 1 (namely I(1)), and that pt and dt are cointegrated.
(b) Derive the cointegrating vector associated with (pt , dt ).
(c) Write down the error-correction representation of asset prices, pt , and discuss its rela-
tionship to the random walk theory of asset prices.
\[ y_t = \mu + \Phi_1 y_{t-1} + \Phi_2 y_{t-2} + u_t, \tag{22.105} \]
(a) Derive the conditions under which the VAR(2) model defined in (22.105) is stationary.
(b) Suppose now that one or more elements of yt is I(1). Derive suitable restrictions on the
intercepts, μ, such that despite the I(1) nature of the variables in (22.105), yt has a fixed
mean. Discuss the importance of such restrictions for the analysis of
cointegration.
(c) Write down the error-correction form of (22.105), and use it to motivate and describe
Johansen’s method of testing for cointegration.
\[ A(L) = \sum_{i=0}^{\infty} A_i L^i, \quad A_0 = I_m, \]
where $u_t = (u_{yt}, u_{xt})'$ is a serially uncorrelated process with zero mean and the constant
variance-covariance matrix $\Sigma = (\sigma_{ij})$, with $\sigma_{ij} \neq 0$.
(a) State the necessary and sufficient conditions under which yt and xt are cointegrated. Pro-
vide examples of yt and xt from macroeconomics and finance where cointegration con-
ditions are expected to hold, based on long-run economic theory.
(b) Write down the above error correction model in the form of the following VAR(2) spec-
ification
\[ z_t = \Phi_1 z_{t-1} + \Phi_2 z_{t-2} + u_t, \]
\[ z_t = \beta' x_t, \]
(a) Show that $x_t$ is integrated of order 1 and cointegrated if and only if all the eigenvalues of
the r × r matrix $H = \beta'\alpha$ lie in the range (−2, 0).
(b) Suppose T + 1 observations $x_0, x_1, x_2, \ldots, x_T$ are available. Show that the concentrated
log-likelihood function of (22.107) in terms of β can be written as
\[ \ell(\beta) \propto -\frac{T}{2}\log\left|S_{00} - S_{01}\beta\left(\beta' S_{11}\beta\right)^{-1}\beta' S_{01}'\right|, \]
where
\[ S_{01} = \frac{1}{T}\sum_{t=1}^{T}\Delta x_t x_{t-1}', \]
\[ S_{11} = \frac{1}{T}\sum_{t=1}^{T} x_{t-1}x_{t-1}'. \]
(c) Using the concentrated log-likelihood function or otherwise, derive the conditions under
which β is exactly identified, and discuss alternative procedures suggested in the literature
for identification of β.
23 VARX Modelling
23.1 Introduction
This chapter generalizes the cointegration analysis of Chapter 22 and provides a brief account
of the econometric issues involved in the modelling approach advanced by Pesaran, Shin,
and Smith (2000). We start by describing a general VARX model, which allows for the possibility
of distinguishing between endogenous and weakly exogenous I(1) variables, and consider
its efficient estimation. In this framework, we also prove that weak exogeneity is sufficient for
consistent estimation of the long-run parameters of interest that enter the conditional model.
We then turn our attention to the analysis of cointegrating VARX models, present cointegrating
rank tests, derive their asymptotic distribution, and discuss testing the over-identifying restrictions
on the cointegrating vectors. We also consider the problem of forecasting using a VARX
model, and conclude with an empirical application to the UK economy as discussed in Garratt
et al. (2003b). The methods discussed in this chapter are used to estimate country-specific modules
in the GVAR approach outlined in Chapter 33.
For the purpose of exposition, in this section we assume $u_t \sim IIDN(0, \Sigma)$. Further, the analysis
that follows is conducted given the initial values, $z_0$.
\[ u_{yt} = \Sigma_{yx}\Sigma_{xx}^{-1}u_{xt} + \upsilon_t, \tag{23.2} \]
where $\Pi_{yy.x} = \Pi_y - \Sigma_{yx}\Sigma_{xx}^{-1}\Pi_x$, and $\Lambda = \Sigma_{yx}\Sigma_{xx}^{-1}$. Following Pesaran, Shin, and Smith
(2000), we assume that the process $\{x_t\}_{t=1}^{\infty}$ is weakly exogenous with respect to the matrix of
long-run multiplier parameters $\Pi$, namely
\[ \Pi_x = 0, \tag{23.4} \]
so that,
\[ \Pi_{yy.x} = \Pi_y. \tag{23.5} \]
Since under $\Pi_x = 0$, the exogenous variables, $x_t$, are I(1), $x_t$ are also referred to as I(1) weakly
exogenous variables in the conditional model of $y_t$. Note that the weak exogeneity restriction
(23.4) implies that $\{x_t\}_{t=1}^{\infty}$ is integrated of order 1, and that it is long-run forcing for $\{y_t\}_{t=1}^{\infty}$ (see
Granger and Lin (1995)). Strictly speaking one can also consider a generalization of this concept
to the case where $\Pi_x$ is non-zero, but rank deficient.
Under the above restrictions, the conditional model for $\Delta y_t$ can be written as
Let $\psi = (\mathrm{vec}(\alpha)', \mathrm{vec}(\beta)', \mathrm{vec}(\Gamma_1)', \mathrm{vech}(\Sigma)')'$ be the parameters of interest and note that the
log-likelihood function of (23.7) for the sample of observations over t = 1, 2, . . . , T, is given by
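\[ \ell(\psi) = -\frac{T}{2}\ln|\Sigma| - \frac{1}{2}\sum_{t=1}^{T} u_t'\Sigma^{-1}u_t, \]
where
\[ \Sigma^{-1} = \begin{pmatrix} \Sigma^{yy} & \Sigma^{yx} \\ \Sigma^{xy} & \Sigma^{xx} \end{pmatrix}, \]
and
\[ |\Sigma| = |\Sigma_{xx}|\,\left|\Sigma_{yy} - \Sigma_{yx}\Sigma_{xx}^{-1}\Sigma_{xy}\right|. \]
Following the discussion below equation (23.2) (see also Appendix B, Section B.10), we have
\[ u_t'\Sigma^{-1}u_t = \upsilon_t'\Sigma_{\upsilon\upsilon}^{-1}\upsilon_t + u_{xt}'\Sigma_{xx}^{-1}u_{xt}, \]
where $\upsilon_t = u_{yt} - \Sigma_{yx}\Sigma_{xx}^{-1}u_{xt}$. Also, partitioning $\Gamma_1$ as $\Gamma_1 = (\Gamma_{y1}', \Gamma_{x1}')'$, we have
where $\Lambda = \Sigma_{yx}\Sigma_{xx}^{-1}$, and $\Psi_1 = \Gamma_{y1} - \Sigma_{yx}\Sigma_{xx}^{-1}\Gamma_{x1}$. Hence, under $\Pi_x = 0$ (see (23.4)), the
log-likelihood function can be decomposed as
\[ \ell(\psi) = \ell_1(\theta) + \ell_2(\Sigma_{xx}, \Gamma_{x1}), \]
where
\[ \ell_1(\theta) = -\frac{T}{2}\ln|\Sigma_{\upsilon\upsilon}| - \frac{1}{2}\sum_{t=1}^{T}\left(\Delta y_t - \Pi_y z_{t-1} - \Lambda\Delta x_t - \Psi_1\Delta z_{t-1}\right)'\Sigma_{\upsilon\upsilon}^{-1}\left(\Delta y_t - \Pi_y z_{t-1} - \Lambda\Delta x_t - \Psi_1\Delta z_{t-1}\right), \]
and
\[ \ell_2(\Sigma_{xx}, \Gamma_{x1}) = -\frac{T}{2}\ln|\Sigma_{xx}| - \frac{1}{2}\sum_{t=1}^{T}\left(\Delta x_t - \Gamma_{x1}\Delta z_{t-1}\right)'\Sigma_{xx}^{-1}\left(\Delta x_t - \Gamma_{x1}\Delta z_{t-1}\right). \]
Hence, under $\Pi_x = 0$, the parameters of interest, θ, that enter the conditional model, $\ell_1(\theta)$, are
variation free with respect to the parameters of the marginal model, $\ell_2(\Sigma_{xx}, \Gamma_{x1})$, and the ML
estimators of θ based on the conditional model will be identical to the ML estimators computed
indirectly (as set out above) using the full model, $\ell(\psi)$.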
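The two identities behind this decomposition, the determinant factorization of Σ and the split of the quadratic form, can be checked numerically. The sketch below assumes NumPy and uses an arbitrary randomly generated positive definite matrix in place of Σ; the dimensions `my` and `mx` and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
my, mx = 2, 3
L = rng.standard_normal((my + mx, my + mx))
Sigma = L @ L.T + np.eye(my + mx)        # an arbitrary positive definite covariance matrix
Syy, Syx = Sigma[:my, :my], Sigma[:my, my:]
Sxy, Sxx = Sigma[my:, :my], Sigma[my:, my:]

Lam = Syx @ np.linalg.inv(Sxx)           # Sigma_yx Sigma_xx^{-1}
Svv = Syy - Lam @ Sxy                    # covariance of v_t = u_yt - Lam u_xt

u = rng.standard_normal(my + mx)         # one arbitrary error vector (u_yt', u_xt')'
uy, ux = u[:my], u[my:]
v = uy - Lam @ ux

lhs_det = np.linalg.det(Sigma)
rhs_det = np.linalg.det(Sxx) * np.linalg.det(Svv)
lhs_q = u @ np.linalg.solve(Sigma, u)
rhs_q = v @ np.linalg.solve(Svv, v) + ux @ np.linalg.solve(Sxx, ux)
```

Here `lhs_det` and `rhs_det` agree up to rounding, as do `lhs_q` and `rhs_q`, which is what allows the joint log-likelihood to be written as the sum of a conditional and a marginal part.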
\[ \Delta z_t = -\Pi z_{t-1} + \sum_{i=1}^{p-1}\Gamma_i\Delta z_{t-i} + u_t, \tag{23.8} \]
where the matrices $\{\Gamma_i\}_{i=1}^{p-1}$ are the short-run responses and $\Pi$ is the long-run multiplier matrix.
The analysis is now conducted given the initial values $Z_0 \equiv (z_{-p+1}, \ldots, z_0)$. Following a
similar line of reasoning as above, the conditional model for $\Delta y_t$ in terms of $z_{t-1}$, $\Delta x_t$, $\Delta z_{t-1}$,
$\Delta z_{t-2}, \ldots$, can be obtained as
\[ \Delta y_t = -\Pi_{yy.x} z_{t-1} + \Lambda\Delta x_t + \sum_{i=1}^{p-1}\Psi_i\Delta z_{t-i} + \upsilon_t, \tag{23.9} \]
where $\Pi_{yy.x} \equiv \Pi_y - \Sigma_{yx}\Sigma_{xx}^{-1}\Pi_x$, $\Lambda = \Sigma_{yx}\Sigma_{xx}^{-1}$, $\Psi_i \equiv \Gamma_{yi} - \Sigma_{yx}\Sigma_{xx}^{-1}\Gamma_{xi}$, i = 1, 2, . . . , p − 1.
Hence, under restrictions (23.4), we obtain the following system of equations
\[ \Delta y_t = -\Pi_y z_{t-1} + \Lambda\Delta x_t + \sum_{i=1}^{p-1}\Psi_i\Delta z_{t-i} + \upsilon_t, \tag{23.10} \]
\[ \Delta x_t = \sum_{i=1}^{p-1}\Gamma_{xi}\Delta z_{t-i} + a_{x0} + u_{xt}. \tag{23.11} \]
Equation (23.11) describes the dynamics of the weakly exogenous variables, and is also called
the marginal model. Note from (23.11) that restriction (23.4) implies that the elements of the
vector process $\{x_t\}_{t=1}^{\infty}$ are not cointegrated among themselves. However, it does not preclude
$\{y_t\}_{t=1}^{\infty}$ being Granger-causal for $\{x_t\}_{t=1}^{\infty}$ in the short run, in the sense that $\Delta y_{t-1}, \Delta y_{t-2}, \ldots$
could help in predicting $\Delta x_t$, even if its lagged values are included in the regression model.
Finally, we note that the cointegration rank hypothesis (22.60) is restated in the context of
(23.6) as
\[ H_r: \mathrm{Rank}(\Pi_y) = r, \quad r = 0, 1, \ldots, m_y. \tag{23.12} \]
\[ \Pi_y = \alpha_y\beta', \tag{23.13} \]
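\[ \ell(\theta) = -\frac{m_y T}{2}\ln 2\pi + \frac{T}{2}\ln\left|\Sigma_{\upsilon\upsilon}^{-1}\right| - \frac{1}{2}\mathrm{Trace}\left(\Sigma_{\upsilon\upsilon}^{-1}VV'\right), \tag{23.15} \]
with $\theta = (\mathrm{vec}(\Psi)', \mathrm{vec}(\alpha_y)', \mathrm{vec}(\beta)', \mathrm{vech}(\Sigma_{\upsilon\upsilon})')'$. Concentrating out $\Sigma_{\upsilon\upsilon}^{-1}$, $\Psi$ and $\alpha_y$ in
(23.15) results in the concentrated log-likelihood function
\[ \ell^c(\beta) = -\frac{m_y T}{2}(1 + \ln 2\pi) - \frac{T}{2}\ln\left|T^{-1}\Delta\hat Y\left(I_T - \hat Z_{-1}'\beta\left(\beta'\hat Z_{-1}\hat Z_{-1}'\beta\right)^{-1}\beta'\hat Z_{-1}\right)\Delta\hat Y'\right|, \tag{23.16} \]
where $\Delta\hat Y$ and $\hat Z_{-1}$ are respectively the OLS residuals from regressions of $\Delta Y$ and $Z_{-1}$ on $\Delta Z_-$.
Defining the sample moment matrices
\[ S_{YY} \equiv T^{-1}\Delta\hat Y\Delta\hat Y', \qquad S_{YZ} \equiv T^{-1}\Delta\hat Y\hat Z_{-1}', \qquad S_{ZZ} \equiv T^{-1}\hat Z_{-1}\hat Z_{-1}', \tag{23.17} \]
the maximization of the concentrated log-likelihood function $\ell^c(\beta)$ of (23.16) reduces to the
minimization of
\[ \left|S_{YY} - S_{YZ}\beta\left(\beta' S_{ZZ}\beta\right)^{-1}\beta' S_{ZY}\right| = \frac{|S_{YY}|\,\left|\beta'\left(S_{ZZ} - S_{ZY}S_{YY}^{-1}S_{YZ}\right)\beta\right|}{\left|\beta' S_{ZZ}\beta\right|}, \]
with respect to β. The solution $\hat\beta$ to this minimization problem, that is, the maximum likelihood
(ML) estimator for β, is given by the eigenvectors corresponding to the r largest eigenvalues
$\hat\lambda_1 > \ldots > \hat\lambda_r > 0$ of
\[ \left|\hat\lambda S_{ZZ} - S_{ZY}S_{YY}^{-1}S_{YZ}\right| = 0. \tag{23.18} \]
See Section 22.6, and pp. 1553–1554 in Johansen (1991). The ML estimator $\hat\beta$ is identified up to
post-multiplication by an r × r nonsingular matrix; that is, $r^2$ just-identifying restrictions on β are
required for exact identification. The resultant maximized concentrated log-likelihood function
$\ell^c(\beta)$ at $\hat\beta$ of (23.16) is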
\[
\ell^{c}(r) = -\frac{m_y T}{2}(1 + \ln 2\pi) - \frac{T}{2}\ln\left|\mathbf{S}_{YY}\right| - \frac{T}{2}\sum_{i=1}^{r}\ln\left(1 - \hat{\lambda}_i\right). \tag{23.19}
\]
Note that the maximized value of the log-likelihood $\ell^{c}(r)$ is only a function of the cointegration rank $r$ (and $m_y$ and $m_x$) through the eigenvalues $\{\hat{\lambda}_i\}_{i=1}^{r}$ defined by (23.18). Also, see Boswijk (1995) and Harbo et al. (1998).
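As a rough numerical illustration of (23.18) and (23.19), the sketch below solves the determinantal equation with NumPy and evaluates the concentrated log-likelihood for a given rank. It is a minimal sketch under stated assumptions, not the procedure used in the text: the function names, the Cholesky-based solution method, and the simulated residual matrices are all introduced here for illustration, and the residuals are taken to be stored as $m_y \times T$ and $m \times T$ arrays.

```python
import numpy as np


def reduced_rank_eigen(Y_hat, Z_hat):
    """Solve |lambda*S_ZZ - S_ZY S_YY^{-1} S_YZ| = 0 (hypothetical helper).

    Y_hat: (m_y, T) residuals, Z_hat: (m, T) residuals.
    Returns eigenvalues in descending order, eigenvectors normalised so that
    beta' S_ZZ beta = I, and the moment matrices S_YY and S_ZZ.
    """
    T = Y_hat.shape[1]
    S_YY = Y_hat @ Y_hat.T / T
    S_YZ = Y_hat @ Z_hat.T / T
    S_ZZ = Z_hat @ Z_hat.T / T
    N = S_YZ.T @ np.linalg.solve(S_YY, S_YZ)      # S_ZY S_YY^{-1} S_YZ
    L = np.linalg.cholesky(S_ZZ)                  # S_ZZ = L L'
    A = np.linalg.solve(L, N)                     # L^{-1} N
    M = np.linalg.solve(L, A.T).T                 # L^{-1} N L^{-T}
    lam, V = np.linalg.eigh((M + M.T) / 2)        # eigh returns ascending order
    order = np.argsort(lam)[::-1]
    lam, V = lam[order], V[:, order]
    beta = np.linalg.solve(L.T, V)                # map back: beta = L^{-T} V
    return lam, beta, S_YY, S_ZZ


def concentrated_loglik(lam, S_YY, T, r):
    """Maximised concentrated log-likelihood (23.19) for cointegration rank r."""
    m_y = S_YY.shape[0]
    _, logdet = np.linalg.slogdet(S_YY)
    return (-0.5 * m_y * T * (1 + np.log(2 * np.pi))
            - 0.5 * T * logdet
            - 0.5 * T * np.sum(np.log(1 - lam[:r])))


# Simulated stand-ins for the residual matrices (not data from the text).
rng = np.random.default_rng(0)
T, m_y, m = 200, 2, 3
Z_hat = rng.standard_normal((m, T))
Y_hat = 0.3 * Z_hat[:m_y] + rng.standard_normal((m_y, T))

lam, beta, S_YY, S_ZZ = reduced_rank_eigen(Y_hat, Z_hat)
beta_hat = beta[:, :1]                            # eigenvector for r = 1
ll_by_rank = [concentrated_loglik(lam, S_YY, T, r) for r in range(m_y + 1)]
```

Because each $\ln(1 - \hat{\lambda}_i)$ is negative, the entries of `ll_by_rank` are non-decreasing in `r`, which is what the rank tests exploit.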
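\[
\Delta\mathbf{y}_t = \mathbf{a}_0 + \mathbf{a}_1 t + \boldsymbol{\Lambda}\Delta\mathbf{x}_t + \sum_{i=1}^{p-1}\boldsymbol{\Psi}_i\Delta\mathbf{z}_{t-i} - \boldsymbol{\Pi}_y\mathbf{z}_{t-1} + \boldsymbol{\upsilon}_t.
\]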
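In these more general cases we need to modify the definitions of $\hat{\mathbf{Y}}$ and $\hat{\mathbf{Z}}_{-1}$ and, consequently, the sample moment matrices $\mathbf{S}_{YY}$, $\mathbf{S}_{YZ}$ and $\mathbf{S}_{ZZ}$ given by (23.17) (see also Section 22.9). Let $\mathbf{1}_T = (1, 1, \ldots, 1)'$ and $\boldsymbol{\tau}_T = (1, 2, \ldots, T)'$. We have:
Case I ($\mathbf{a}_0 = \mathbf{0}$ and $\mathbf{a}_1 = \mathbf{0}$)
$\hat{\mathbf{Y}}$ and $\hat{\mathbf{Z}}_{-1}$ are the OLS residuals from the regression of $\Delta\mathbf{Y}$ and $\mathbf{Z}_{-1}$ on $\Delta\mathbf{Z}_{-}$.
Case II ($\mathbf{a}_0 = -\boldsymbol{\Pi}_y\boldsymbol{\mu}$ and $\mathbf{a}_1 = \mathbf{0}$)
$\hat{\mathbf{Y}}$ and $\hat{\mathbf{Z}}^{*}_{-1}$ are the OLS residuals from the regression of $\Delta\mathbf{Y}$ and $\mathbf{Z}^{*}_{-1}$ on $\Delta\mathbf{Z}_{-}$, where $\mathbf{Z}^{*}_{-1} = (\mathbf{1}_T, \mathbf{Z}_{-1}')'$.
Case III ($\mathbf{a}_0 \neq \mathbf{0}$ and $\mathbf{a}_1 = \mathbf{0}$)
$\hat{\mathbf{Y}}$ and $\hat{\mathbf{Z}}_{-1}$ are the OLS residuals from the regression of $\Delta\mathbf{Y}$ and $\mathbf{Z}_{-1}$ on $(\mathbf{1}_T, \Delta\mathbf{Z}_{-}')'$.
Case IV ($\mathbf{a}_0 \neq \mathbf{0}$ and $\mathbf{a}_1 = -\boldsymbol{\Pi}_y\boldsymbol{\gamma}$)
$\hat{\mathbf{Y}}$ and $\hat{\mathbf{Z}}^{*}_{-1}$ are the OLS residuals from the regression of $\Delta\mathbf{Y}$ and $\mathbf{Z}^{*}_{-1}$ on $(\mathbf{1}_T, \Delta\mathbf{Z}_{-}')'$, where $\mathbf{Z}^{*}_{-1} = (\boldsymbol{\tau}_T, \mathbf{Z}_{-1}')'$.
Case V ($\mathbf{a}_0 \neq \mathbf{0}$ and $\mathbf{a}_1 \neq \mathbf{0}$)
$\hat{\mathbf{Y}}$ and $\hat{\mathbf{Z}}_{-1}$ are the OLS residuals from the regression of $\Delta\mathbf{Y}$ and $\mathbf{Z}_{-1}$ on $(\mathbf{1}_T, \boldsymbol{\tau}_T, \Delta\mathbf{Z}_{-}')'$.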
Tests of the cointegrating rank are obtained along exactly the same lines as those in Section 22.10. Estimation of the VECM subject to exact- and over-identifying long-run restrictions can be carried out by maximum likelihood methods as outlined above, applied to (23.10) subject to the appropriate restrictions on the intercepts and trends, subject to $\operatorname{Rank}(\boldsymbol{\Pi}_y) = r$, and subject to $k$ general linear restrictions. Having computed ML estimates of the cointegrating vectors, the short-run parameters of the conditional VECM can be computed by OLS regressions.
While estimation and inference on the parameters of (23.10) can be conducted without a
reference to the marginal model (23.11), for forecasting and impulse response analysis the pro-
cesses driving the weakly exogenous variables must be specified. In other words, one needs to
take into account the possibility that changes in one variable may have an impact on the weakly
exogenous variables and that these effects will continue and interact over time.
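\[
\Delta x_{it} = \sum_{j=1}^{r}\gamma_{ij}\,ECM_{j,t-1} + \sum_{k=1}^{s}\varphi_{ik}\,\Delta y_{i,t-k} + \sum_{m=1}^{n}\vartheta_{im}\,\Delta x_{i,t-m} + \varepsilon_{it},
\]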
where ECMj,t−1 , j = 1, 2, . . . , r, are the estimated error correction terms corresponding to the
r cointegrating relations. The statistic for testing the weak exogeneity of xit is the standard F
statistic for testing the joint hypothesis that γ ij = 0, j = 1, 2, . . . , r, in the above regression.
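The F statistic described above can be sketched as a comparison of restricted and unrestricted residual sums of squares. The code below is an illustrative sketch only: the function name, the argument layout, and the simulated series are assumptions made here, and in practice the error correction terms would come from the estimated cointegrating relations.

```python
import numpy as np


def weak_exogeneity_f_stat(dx, ecm, other):
    """F statistic for H0: all ECM coefficients are zero (hypothetical helper).

    dx:    (T,) changes in the candidate weakly exogenous variable
    ecm:   (T, r) lagged error correction terms
    other: (T, q) remaining regressors (intercept, lagged changes)
    """
    T, r = ecm.shape
    X_u = np.column_stack([ecm, other])           # unrestricted regressors
    X_r = other                                   # restricted: ECM terms dropped

    def rss(X):
        coef, *_ = np.linalg.lstsq(X, dx, rcond=None)
        resid = dx - X @ coef
        return resid @ resid

    rss_u, rss_r = rss(X_u), rss(X_r)
    dof = T - X_u.shape[1]
    return ((rss_r - rss_u) / r) / (rss_u / dof)


# Simulated stand-ins (not data from the text).
rng = np.random.default_rng(1)
T = 200
ecm = rng.standard_normal((T, 2))                 # stand-ins for ECM_{j,t-1}
other = np.column_stack([np.ones(T), rng.standard_normal(T)])
noise = rng.standard_normal(T)
dx_exog = other @ np.array([0.1, 0.5]) + noise    # ECM terms play no role
dx_endog = dx_exog + 2.0 * ecm[:, 0]              # first ECM term matters

f_exog = weak_exogeneity_f_stat(dx_exog, ecm, other)
f_endog = weak_exogeneity_f_stat(dx_endog, ecm, other)
```

A large value such as `f_endog` would count as evidence against weak exogeneity; the statistic is compared with critical values of the F distribution with $r$ and $T - r - q$ degrees of freedom.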
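in the structural VECM (23.10). To this end, we weaken the independent normal distributional assumption of previous sections on the error process $\{\mathbf{u}_t\}_{t=-\infty}^{\infty}$ and make the following assumptions:
Assumption 1: The error process $\{\mathbf{u}_t\}$ is such that
(a) (i) $E\left(\mathbf{u}_t \mid \{\mathbf{z}_{t-i}\}_{i=1}^{t-1}, \mathbf{Z}_0\right) = \mathbf{0}$; (ii) $\operatorname{Var}\left(\mathbf{u}_t \mid \{\mathbf{z}_{t-i}\}_{i=1}^{t-1}, \mathbf{Z}_0\right) = \boldsymbol{\Sigma}$, with $\boldsymbol{\Sigma}$ positive definite;
(b) (i) $E\left(\boldsymbol{\upsilon}_t \mid \Delta\mathbf{x}_t, \{\mathbf{z}_{t-i}\}_{i=1}^{t-1}, \mathbf{Z}_0\right) = \mathbf{0}$; (ii) $\operatorname{Var}\left(\boldsymbol{\upsilon}_t \mid \Delta\mathbf{x}_t, \{\mathbf{z}_{t-i}\}_{i=1}^{t-1}, \mathbf{Z}_0\right) = \boldsymbol{\Sigma}_{\upsilon\upsilon}$, where $\boldsymbol{\upsilon}_t = \mathbf{u}_{yt} - \boldsymbol{\Sigma}_{yx}\boldsymbol{\Sigma}_{xx}^{-1}\mathbf{u}_{xt}$, and $\boldsymbol{\Sigma}_{\upsilon\upsilon} \equiv \boldsymbol{\Sigma}_{yy} - \boldsymbol{\Sigma}_{yx}\boldsymbol{\Sigma}_{xx}^{-1}\boldsymbol{\Sigma}_{xy}$;
(c) $\sup_t E\left(\|\mathbf{u}_t\|^{s}\right) < \infty$ for some $s > 2$.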
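Assumption 1 states that the error process $\{\mathbf{u}_t\}_{t=-\infty}^{\infty}$ is a martingale difference sequence with constant conditional variance; hence, $\{\mathbf{u}_t\}_{t=-\infty}^{\infty}$ is an uncorrelated process. Therefore, the VECM (23.8) represents a conditional model for $\Delta\mathbf{z}_t$ given $\{\Delta\mathbf{z}_{t-i}\}_{i=1}^{p-1}$ and $\mathbf{z}_{t-1}$, $t = 1, 2, \ldots$. Under Assumption 1(b)(i), $E\left(\mathbf{u}_{yt} \mid \Delta\mathbf{x}_t, \{\mathbf{z}_{t-i}\}_{i=1}^{t-1}, \mathbf{Z}_0\right) = \boldsymbol{\Sigma}_{yx}\boldsymbol{\Sigma}_{xx}^{-1}\mathbf{u}_{xt}$, while 1(b)(ii) ensures that $\operatorname{Var}\left(\mathbf{u}_{yt} \mid \Delta\mathbf{x}_t, \{\mathbf{z}_{t-i}\}_{i=1}^{t-1}, \mathbf{Z}_0\right) = \boldsymbol{\Sigma}_{\upsilon\upsilon}$. Therefore, under this assumption, (23.10) can be interpreted as a conditional model for $\Delta\mathbf{y}_t$ given $\Delta\mathbf{x}_t$, $\{\Delta\mathbf{z}_{t-i}\}_{i=1}^{p-1}$ and $\mathbf{z}_{t-1}$, $t = 1, 2, \ldots$. Hence, (23.10) remains appropriate for conditional inference. Moreover, the error process $\{\boldsymbol{\upsilon}_t\}_{t=-\infty}^{\infty}$ is also a martingale difference process with constant conditional variance and is uncorrelated with the $\{\mathbf{u}_{xt}\}_{t=-\infty}^{\infty}$ process. Thus, Assumptions 1(a)(ii) and 1(b)(ii) rule out any conditional heteroskedasticity. Assumption 1(c) is standard and, together with Assumption 1(a), is required for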
the multivariate invariance principle stated in (23.20) below, while Assumption 1(b) together
with Assumption 1(c) implies the multivariate invariance principle (23.21) below.
Define the partial sum process
\[
\mathbf{s}_T^{u}(a) \equiv T^{-1/2}\sum_{s=1}^{[Ta]}\mathbf{u}_s,
\]
where $[Ta]$ denotes the integer part of $Ta$, $a \in [0, 1]$. Under Assumption 1 (see also Assumptions 1–2 in Chapter 22), $\mathbf{s}_T^{u}(a)$ satisfies the multivariate invariance principle (see Section B.13 in Appendix B and Phillips and Durlauf (1986))
\[
\mathbf{s}_T^{u}(a) \Rightarrow \mathbf{W}_m(a), \quad a \in [0, 1], \tag{23.20}
\]
where $\mathbf{W}_m(\cdot)$ denotes an $m$-dimensional Brownian motion with the variance matrix $\boldsymbol{\Sigma}$.
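We partition $\mathbf{s}_T^{u}(a) = \left(\mathbf{s}_T^{y}(a)', \mathbf{s}_T^{x}(a)'\right)'$ conformably with $\mathbf{z}_t = (\mathbf{y}_t', \mathbf{x}_t')'$ and the Brownian motion $\mathbf{W}_m(a) = \left(\mathbf{W}_{m_y}(a)', \mathbf{W}_{m_x}(a)'\right)'$ likewise, $a \in [0, 1]$. Define $\mathbf{s}_T^{\upsilon}(a) \equiv T^{-1/2}\sum_{s=1}^{[Ta]}\boldsymbol{\upsilon}_s$, $a \in [0, 1]$. Hence, as $\boldsymbol{\upsilon}_t = \mathbf{u}_{yt} - \boldsymbol{\Sigma}_{yx}\boldsymbol{\Sigma}_{xx}^{-1}\mathbf{u}_{xt}$,
\[
\mathbf{s}_T^{\upsilon}(a) \Rightarrow \mathbf{W}^{*}_{m_y}(a), \quad a \in [0, 1], \tag{23.21}
\]
where $\mathbf{W}^{*}_{m_y}(a) \equiv \mathbf{W}_{m_y}(a) - \boldsymbol{\Sigma}_{yx}\boldsymbol{\Sigma}_{xx}^{-1}\mathbf{W}_{m_x}(a)$ is a Brownian motion with variance matrix $\boldsymbol{\Sigma}_{\upsilon\upsilon}$ which is independent of $\mathbf{W}_{m_x}(a)$, $a \in [0, 1]$. See also Harbo et al. (1998).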
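Under restriction (23.4), the $m \times (m - r)$ matrix $\boldsymbol{\alpha}^{\perp} \equiv \operatorname{diag}(\boldsymbol{\alpha}_y^{\perp}, \boldsymbol{\alpha}_x^{\perp})$, where $\boldsymbol{\alpha}_x^{\perp}$, an $m_x \times m_x$ nonsingular matrix, is a basis for the orthogonal complement of the $m \times r$ loadings matrix $\boldsymbol{\alpha} = (\boldsymbol{\alpha}_y', \mathbf{0}')'$. Hence, we define the $(m - r)$-dimensional standard Brownian motion $\mathbf{B}_{m-r}(a) \equiv \left(\mathbf{B}_{m_y-r}(a)', \mathbf{B}_{m_x}(a)'\right)'$ partitioned into the $(m_y - r)$- and $m_x$-dimensional sub-vector independent standard Brownian motions
\[
\mathbf{B}_{m_y-r}(a) \equiv \left(\boldsymbol{\alpha}_y^{\perp\prime}\boldsymbol{\Sigma}_{\upsilon\upsilon}\boldsymbol{\alpha}_y^{\perp}\right)^{-1/2}\boldsymbol{\alpha}_y^{\perp\prime}\mathbf{W}^{*}_{m_y}(a), \qquad \mathbf{B}_{m_x}(a) \equiv \left(\boldsymbol{\alpha}_x^{\perp\prime}\boldsymbol{\Sigma}_{xx}\boldsymbol{\alpha}_x^{\perp}\right)^{-1/2}\boldsymbol{\alpha}_x^{\perp\prime}\mathbf{W}_{m_x}(a), \quad a \in [0, 1].
\]
See Pesaran, Shin, and Smith (2000) for further details. We also need to introduce the following associated de-meaned $(m - r)$-vector standard Brownian motion
\[
\tilde{\mathbf{B}}_{m-r}(a) \equiv \mathbf{B}_{m-r}(a) - \int_0^1 \mathbf{B}_{m-r}(a)\,da, \tag{23.22}
\]
and their respective partitioned counterparts $\tilde{\mathbf{B}}_{m-r}(a) = \left(\tilde{\mathbf{B}}_{m_y-r}(a)', \tilde{\mathbf{B}}_{m_x}(a)'\right)'$, and $\hat{\mathbf{B}}_{m-r}(a) = \left(\hat{\mathbf{B}}_{m_y-r}(a)', \hat{\mathbf{B}}_{m_x}(a)'\right)'$, $a \in [0, 1]$.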
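where $\hat{\lambda}_r$ is the $r$th largest eigenvalue from the determinantal equation (23.18), $r = 0, \ldots, m_y - 1$, with the appropriate definitions of $\hat{\mathbf{Y}}$ and $\hat{\mathbf{Z}}^{*}_{-1}$ and, thus, the sample moment matrices $\mathbf{S}_{YY}$, $\mathbf{S}_{YZ}$ and $\mathbf{S}_{ZZ}$ given by (23.17) to cover Cases I–V. Under Assumption 1 the limit distribution of $LR(H_r \mid H_{r+1})$ of (23.24) for testing $H_r$ against $H_{r+1}$ is given by the distribution of the maximum eigenvalue of
\[
\int_0^1 d\mathbf{W}_{m_y-r}(a)\,\mathbf{F}_{m_y-r}(a)'\left(\int_0^1 \mathbf{F}_{m_y-r}(a)\mathbf{F}_{m_y-r}(a)'\,da\right)^{-1}\int_0^1 \mathbf{F}_{m_y-r}(a)\,d\mathbf{W}_{m_y-r}(a)', \tag{23.25}
\]
where
\[
\mathbf{F}_{m_y-r}(a) =
\begin{cases}
\mathbf{B}_{m_y-r}(a) & \text{Case I}\\
\left(\mathbf{B}_{m_y-r}(a)', 1\right)' & \text{Case II}\\
\tilde{\mathbf{B}}_{m_y-r}(a) & \text{Case III}\\
\left(\tilde{\mathbf{B}}_{m_y-r}(a)', a - \tfrac{1}{2}\right)' & \text{Case IV}\\
\hat{\mathbf{B}}_{m_y-r}(a) & \text{Case V}
\end{cases}
\qquad a \in [0, 1], \tag{23.26}
\]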
where $\hat{\lambda}_i$ is the $i$th largest eigenvalue from the determinantal equation (23.18). Under Assumption 1 the limit distribution of $LR(H_r \mid H_{m_y})$ of (23.27) for testing $H_r$ against $H_{m_y}$ is given by the distribution of
\[
\operatorname{Trace}\left\{\int_0^1 d\mathbf{W}_{m_y-r}(a)\,\mathbf{F}_{m_y-r}(a)'\left(\int_0^1 \mathbf{F}_{m_y-r}(a)\mathbf{F}_{m_y-r}(a)'\,da\right)^{-1}\int_0^1 \mathbf{F}_{m_y-r}(a)\,d\mathbf{W}_{m_y-r}(a)'\right\},
\]
where $\mathbf{F}_{m_y-r}(a)$, $a \in [0, 1]$, is defined in (23.26) for Cases I–V, $r = 0, \ldots, m_y - 1$.
ratio tests for cointegration will now depend on nuisance parameters. However, the above anal-
ysis may easily be adapted to deal with this difficulty.
Let $\{\mathbf{w}_t\}_{t=1}^{\infty}$ denote a $k_w$-vector process of weakly exogenous explanatory variables which is integrated of order zero. Therefore, the partial sum vector process $\left\{\sum_{s=1}^{t}\mathbf{w}_s\right\}_{t=1}^{\infty}$ is integrated of order one. Defining $\sum_{s=1}^{t}\mathbf{w}_s$ as a sub-vector of $\mathbf{x}_t$ with the corresponding sub-vector of $\Delta\mathbf{x}_t$ as $\mathbf{w}_t$, $t = 1, 2, \ldots$, allows the above analysis to proceed unaltered. With these re-definitions of $\mathbf{x}_t$ and $\Delta\mathbf{x}_t$ to include $\sum_{s=1}^{t}\mathbf{w}_s$ and $\mathbf{w}_t$, $t = 1, 2, \ldots$, respectively, the partial sum $\sum_{s=1}^{t}\mathbf{w}_s$ will now appear in the cointegrating relations (22.53) and the lagged level term $\mathbf{z}_{t-1}$ in (23.6) although economic theory may indicate its absence; that is, the corresponding $(k_w, r)$ block of the cointegrating matrix $\boldsymbol{\beta}$ is null. This constraint on the cointegrating matrix $\boldsymbol{\beta}$ is straightforwardly tested using a likelihood ratio statistic which will possess a limiting chi-squared distribution with $r k_w$ degrees of freedom under $H_r$. See Rahbek and Mosconi (1999) for further discussion.
The asymptotic critical values for the (log-) likelihood ratio cointegration rank statistics
(23.24) and (23.27) are available in Pesaran, Shin, and Smith (2000). However, as also explained
in Section 22.10, these distributions are appropriate only asymptotically. When the sample is
small or when the order of the VARX or the number of variables in the VARX is large, it is advis-
able to compute critical values by the bootstrap approach, as outlined in Section 22.12.
\[
\mathbf{R}\operatorname{vec}(\boldsymbol{\beta}) = \mathbf{b}, \tag{23.28}
\]
where R and b are a k × (m + 1)r matrix of full row rank and a k × 1 vector of known constants,
respectively, and vec(β) is the (m + 1)r × 1 vector of long-run coefficients, which stacks the r
columns of β into a vector. As in the case of VAR models, three cases can be distinguished:
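full $k$ restrictions (namely, $\mathbf{R}\boldsymbol{\theta} = \mathbf{b}$), respectively. Then, following similar lines of reasoning as in Section 22.11, the $k - r^2$ over-identifying restrictions on $\boldsymbol{\theta}$ can be tested using the log-likelihood ratio statistic given by
\[
LR = 2\left\{\ell^{c}\left(\hat{\boldsymbol{\theta}}; r\right) - \ell^{c}\left(\tilde{\boldsymbol{\theta}}; r\right)\right\}, \tag{23.29}
\]
where $\ell^{c}(\hat{\boldsymbol{\theta}}; r)$ and $\ell^{c}(\tilde{\boldsymbol{\theta}}; r)$ represent the maximized values of the log-likelihood function obtained under $\mathbf{R}_A\boldsymbol{\theta} = \mathbf{b}_A$ and $\mathbf{R}\boldsymbol{\theta} = \mathbf{b}$, respectively. Pesaran and Shin (2002) prove that the log-likelihood ratio statistic for testing $\mathbf{R}\boldsymbol{\theta} = \mathbf{b}$ given by (23.29) has a $\chi^2$ distribution with $k - r^2$ degrees of freedom, asymptotically.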
Also critical values for the above tests can be computed using the bootstrap approach of the
type described in Section 22.12.
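\[
\Delta\mathbf{z}_t = -\boldsymbol{\alpha}\boldsymbol{\beta}'\mathbf{z}_{t-1} + \sum_{i=1}^{p-1}\boldsymbol{\Gamma}_i\Delta\mathbf{z}_{t-i} + \mathbf{H}\boldsymbol{\zeta}_t, \tag{23.30}
\]
where
\[
\boldsymbol{\alpha} = \begin{pmatrix}\boldsymbol{\alpha}_y\\ \mathbf{0}\end{pmatrix}, \qquad
\boldsymbol{\Gamma}_i = \begin{pmatrix}\boldsymbol{\Psi}_i + \boldsymbol{\Lambda}\boldsymbol{\Gamma}_{xi}\\ \boldsymbol{\Gamma}_{xi}\end{pmatrix}, \tag{23.31}
\]
\[
\boldsymbol{\zeta}_t = \begin{pmatrix}\boldsymbol{\upsilon}_t\\ \mathbf{u}_{xt}\end{pmatrix}, \qquad
\mathbf{H} = \begin{pmatrix}\mathbf{I}_{m_y} & \boldsymbol{\Lambda}\\ \mathbf{0} & \mathbf{I}_{m_x}\end{pmatrix}, \qquad
\operatorname{Cov}(\boldsymbol{\zeta}_t) = \boldsymbol{\Sigma}_{\zeta\zeta} = \begin{pmatrix}\boldsymbol{\Sigma}_{\upsilon\upsilon} & \mathbf{0}\\ \mathbf{0} & \boldsymbol{\Sigma}_{xx}\end{pmatrix}. \tag{23.32}
\]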
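\[
\mathbf{z}_t = \sum_{i=1}^{p}\boldsymbol{\Phi}_i\mathbf{z}_{t-i} + \mathbf{H}\boldsymbol{\zeta}_t, \tag{23.33}
\]
\[
\hat{\mathbf{z}}(t + \tau, t) = \sum_{i=1}^{p}\hat{\boldsymbol{\Phi}}_i\hat{\mathbf{z}}(t + \tau - i, t), \quad \text{for } \tau = 1, 2, \ldots, h,
\]
with $\hat{\mathbf{z}}(t - i, t) = \mathbf{z}_{t-i}$, for $i = 0, 1, \ldots, p - 1$. Further, define the $h$-step ahead forecast changes as $\hat{\mathbf{x}}_t(h) = \hat{\mathbf{z}}(t + h, t) - \mathbf{z}_t$ and the associated $h$-step ahead realized changes as $\mathbf{x}_t(h) = \mathbf{z}_{t+h} - \mathbf{z}_t$.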
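The forecast recursion can be sketched in a few lines. This is an illustrative sketch only: the function name and the numerical coefficient values are made up for the example, and deterministic terms are omitted, as in (23.33).

```python
import numpy as np


def recursive_forecasts(Phi_hat, history, h):
    """h-step forecasts z_hat(t+tau, t) = sum_i Phi_hat[i-1] @ z_hat(t+tau-i, t).

    Phi_hat: list of p (m, m) coefficient matrices (assumed already estimated)
    history: (p, m) array holding z_{t-p+1}, ..., z_t (oldest first)
    Returns an (h, m) array of forecasts for tau = 1, ..., h.
    """
    p = len(Phi_hat)
    path = [np.asarray(row, dtype=float) for row in history[-p:]]
    for _ in range(h):
        # path[-i] is z_hat(t+tau-i, t); observed values are used where available
        path.append(sum(Phi_hat[i - 1] @ path[-i] for i in range(1, p + 1)))
    return np.array(path[p:])


Phi_hat = [np.array([[0.5, 0.0], [0.0, 0.5]])]    # illustrative VAR(1) values
z_t = np.array([[2.0, 4.0]])
z_hat = recursive_forecasts(Phi_hat, z_t, h=2)
forecast_change = z_hat[-1] - z_t[-1]             # z_hat(t+h, t) - z_t
```

With these made-up inputs the two-step forecast is half of the one-step forecast, so `forecast_change` is simply the cumulative decay from the last observation.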
The h-step ahead forecast error is then computed as
where $p_t = \ln(P_t)$, $p_t^{*} = \ln(P_t^{*})$, $e_t = \ln(E_t)$, $y_t = \ln(Y_t/P_t)$, $y_t^{*} = \ln(Y_t^{*}/P_t^{*})$, $r_t = \ln(1 + R_t)$, $r_t^{*} = \ln(1 + R_t^{*})$, $h_t - y_t = \ln(H_{t+1}/P_t) - \ln(Y_t/P_t) = \ln(H_{t+1}/Y_t)$ and $b_{50} = \ln(1 + \rho)$. The variables $P_t$ and $P_t^{*}$ are the domestic and foreign price indices, $Y_t$ and $Y_t^{*}$ are per-capita domestic and foreign outputs, $R_t$ is the nominal interest rate on domestic assets held from the beginning to the end of period $t$, $R_t^{*}$ is the nominal interest rate paid on foreign assets during period $t$, $E_t$ is the effective exchange rate, defined as the domestic price of a unit of foreign currency at the beginning of period $t$, and $H_t = \tilde{H}_t/POP_{t-1}$ with $\tilde{H}_t$ the stock of high-powered money.
Equations (23.34), (23.35) and (23.38) describe a set of arbitrage conditions, included in
many macroeconomic models in one form or another. These are the (relative) purchasing power
parity (PPP), the uncovered interest parity (UIP), and the Fisher inflation parity (FIP) relation-
ships. Equation (23.36) is an output gap relation, while (23.37) is a long-run condition that is
derived from the solvency constraints to which the economy is subject (see Garratt et al. (2003b)
for details).
We have allowed for intercept and trend terms (when appropriate) in order to ensure that
(long-run) reduced form disturbances, ξ i,t+1 , i = 1, 2, . . . , 5, have zero means.
The five long-run relations of the core model, (23.34)–(23.38), can be written more com-
pactly as
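where
\[
\mathbf{z}_t = \left(p_t^{o}, e_t, r_t^{*}, r_t, \Delta p_t, y_t, p_t - p_t^{*}, h_t - y_t, y_t^{*}\right)', \tag{23.40}
\]
\[
\mathbf{b}_0 = (b_{01}, b_{02}, b_{03}, b_{04}, b_{05})', \quad \mathbf{b}_1 = (b_{11}, 0, 0, b_{41}, 0)', \quad \boldsymbol{\xi}_t = (\xi_{1t}, \xi_{2t}, \xi_{3t}, \xi_{4t}, \xi_{5t})',
\]
\[
\boldsymbol{\beta}' = \begin{pmatrix}
0 & -1 & 0 & 0 & 0 & 0 & 1 & 0 & 0\\
0 & 0 & -1 & 1 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & -1\\
0 & 0 & 0 & -\beta_{42} & 0 & -\beta_{43} & 0 & 1 & 0\\
0 & 0 & 0 & 1 & -1 & 0 & 0 & 0 & 0
\end{pmatrix}, \tag{23.41}
\]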
and $p_t^{o}$ is the logarithm of oil prices. In modelling the short-run dynamics, we follow Sims (1980) and others and assume that departures from the long-run relations, $\boldsymbol{\xi}_t$, can be approximated by a linear function of a finite number of past changes in $\mathbf{z}_{t-1}$. For estimation purposes we also partition $\mathbf{z}_t = (p_t^{o}, \mathbf{y}_t')'$ where $\mathbf{y}_t = (e_t, r_t^{*}, r_t, \Delta p_t, y_t, p_t - p_t^{*}, h_t - y_t, y_t^{*})'$. Here, $p_t^{o}$ is considered to be a ‘long-run forcing’ variable for the determination of $\mathbf{y}_t$, in the sense that changes in $p_t^{o}$ have a direct influence on $\mathbf{y}_t$, but changes in $p_t^{o}$ are not affected by the presence of $\boldsymbol{\xi}_t$, which measures the extent of disequilibria in the UK economy. The treatment of oil prices as ‘long-run forcing’ represents a generalization of the approach to modelling oil price effects in some previous applications of cointegrating VAR analyses (e.g., Johansen and Juselius (1992)), where the oil price change is treated as a strictly exogenous I(0) variable. The approach taken in the previous literature excludes the possibility that there might exist cointegrating relationships which involve the oil price level, while the approach taken here allows the validity of the hypothesized restriction to be tested, and for the restriction to be imposed if it is not rejected. Note that foreign output and
interest rates are treated as endogenous, to allow for the possibility of feedbacks. This involves
loss of efficiency in estimation if they were in fact long-run forcing or strictly exogenous.
Under the assumption that oil prices are long-run forcing for yt , the cointegrating properties
of the model can be investigated without having to specify the oil price equation. However, spec-
ification of an oil price equation is required for the analysis of the short-run dynamics. We shall
adopt the following general specification for the evolution of oil prices
\[
\Delta p_t^{o} = \delta_o + \sum_{i=1}^{s-1}\boldsymbol{\delta}_{oi}\Delta\mathbf{z}_{t-i} + u_{ot}, \tag{23.42}
\]
where uot represents a serially uncorrelated oil price shock with a zero mean and a constant vari-
ance. The above specification ensures oil prices are long-run forcing for yt since it allows lagged
changes in the endogenous and exogenous variables of the model to influence current oil prices
but rules out the possibility that error correction terms, ξ t , have any effects on oil price changes.
These assumptions are weaker than the requirement of ‘Granger non-causality’ often invoked
in the literature.
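Assuming that the variables in $\mathbf{z}_t$ are difference-stationary, our modelling strategy is now to embody $\boldsymbol{\xi}_t$ in an otherwise unrestricted VAR($s - 1$) in $\Delta\mathbf{z}_t$. Under the assumption that oil prices are long-run forcing, it is efficient (for estimation purposes) to base our analysis on the following conditional error correction model
\[
\Delta\mathbf{y}_t = \mathbf{a}_y - \boldsymbol{\alpha}_y\boldsymbol{\xi}_t + \sum_{i=1}^{s-1}\boldsymbol{\Gamma}_{yi}\Delta\mathbf{z}_{t-i} + \boldsymbol{\psi}_{yo}\Delta p_t^{o} + \mathbf{u}_{yt}, \tag{23.43}
\]
\[
\Delta\mathbf{y}_t = \mathbf{a}_y - \boldsymbol{\alpha}_y\mathbf{b}_0 - \boldsymbol{\alpha}_y\left[\boldsymbol{\beta}'\mathbf{z}_{t-1} - \mathbf{b}_1(t - 1)\right] + \sum_{i=1}^{s-1}\boldsymbol{\Gamma}_{yi}\Delta\mathbf{z}_{t-i} + \boldsymbol{\psi}_{yo}\Delta p_t^{o} + \mathbf{u}_{yt}, \tag{23.44}
\]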
where $\boldsymbol{\beta}'\mathbf{z}_{t-1} - \mathbf{b}_1(t - 1)$ is a $5 \times 1$ vector of error correction terms. The above specification
embodies the economic theory’s long-run predictions by construction, in contrast to the more
usual approach where the starting point is an unrestricted VAR model, with some vague priors
about the nature of the long-run relations.
Estimation of the parameters of the core model, (23.44), can be carried out using the long-
run structural modelling approach described in Sections 23.3 and 23.5. With this approach, hav-
ing selected the order of the underlying VAR model (using model selection criteria such as the
Akaike information criterion (AIC) or the Schwarz Bayesian criterion (SBC)), we test for the
number of cointegrating relations among the 9 variables in zt . When performing this task, and in
all subsequent empirical analyses, we work with a VARX model with unrestricted intercepts and
restricted trend coefficients (Case IV). In terms of (23.44), we allow the intercepts to be freely estimated but restrict the trend coefficients so that $\boldsymbol{\alpha}_y\mathbf{b}_1 = \boldsymbol{\Pi}_y\boldsymbol{\gamma}$, where $\boldsymbol{\Pi}_y = \boldsymbol{\alpha}_y\boldsymbol{\beta}'$ and $\boldsymbol{\gamma}$ is a $9 \times 1$ vector of unknown coefficients. We then compute ML estimates of the model parameters subject to exact and over-identifying restrictions on the long-run coefficients. Assuming
that there is empirical support for the existence of five long-run relationships, as suggested by
theory, exact identification in our model requires five restrictions on each of the five cointegrat-
ing vectors (each row of β), or a total of 25 restrictions on β. These represent only a subset of the
restrictions suggested by economic theory, as characterized in (23.41). Estimation of the model
subject to all the (exact- and over-identifying) restrictions given in (23.41) enables a test of the
validity of the over-identifying restrictions, and hence the long-run implications of the economic
theory, to be carried out.
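\[
\boldsymbol{\beta}' = \begin{pmatrix}
\beta_{11} & \beta_{12} & 0 & 0 & \beta_{15} & 0 & 1 & \beta_{18} & 0\\
\beta_{21} & 0 & \beta_{23} & 1 & \beta_{25} & 0 & 0 & 0 & \beta_{29}\\
\beta_{31} & 0 & 0 & 0 & 0 & 1 & \beta_{37} & \beta_{38} & \beta_{39}\\
\beta_{41} & 0 & 0 & \beta_{44} & \beta_{45} & \beta_{46} & 0 & 1 & 0\\
\beta_{51} & 0 & 0 & \beta_{54} & -1 & 0 & 0 & \beta_{58} & \beta_{59}
\end{pmatrix}, \tag{23.45}
\]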
Notes: The underlying VARX model is of order 2 and contains unrestricted intercepts and
restricted trend coefficients, with $p_t^{o}$ treated as an exogenous I(1) variable. The statistics
refer to Johansen’s log-likelihood-based trace and maximal eigenvalue statistics and are
computed using 140 observations for the period 1965q1–1999q4. The asymptotic criti-
cal values are taken from Pesaran, Shin, and Smith (2000).
that corresponds to $\mathbf{z}_t = \left(p_t^{o}, e_t, r_t^{*}, r_t, \Delta p_t, y_t, p_t - p_t^{*}, h_t - y_t, y_t^{*}\right)'$. The first vector (the first row of $\boldsymbol{\beta}'$) relates to the PPP relationship defined by (23.34) and is normalized on $p_t - p_t^{*}$; the second relates to the IRP relationship defined by (23.35) and is normalized on $r_t$; the third relates to the ‘output gap’ relationship defined by (23.36) and is normalized on $y_t$;2 the fourth is the money market equilibrium condition defined by (23.37) and is normalized on $h_t - y_t$; and the fifth is the real interest rate relationship defined by (23.38), normalized on $\Delta p_t$.
Having exactly identified the long-run relations, we then test the over-identifying restrictions
predicted by the long-run theory. There are 20 unrestricted parameters in (23.45), and two in
(23.41), yielding a total of 18 over-identifying restrictions. In addition, working with a cointe-
grating VAR with restricted trend coefficients, there are potentially five further parameters on
the trend terms in the five cointegrating relationships. The imposition of zeros on the trend
coefficients in the IRP, FIP or output gap relationships provides a further three over-identifying
restrictions. The absence of a trend in the PPP relationship is also consistent with the theory, as
is the restriction that β 46 = 0 (so that equation (23.37) is effectively a relationship explaining
the velocity of circulation of money). These final two restrictions, together with those which are
2 Our use of the term ‘output gap relationship’ to describe (23.36) should not be confused with the more usual use of
the term which relates, more specifically, to the difference between a country’s actual and potential output levels (although
clearly the two uses of the term are related).
intrinsic to the theory, mean that there are just two parameters to be freely estimated in the coin-
tegrating relationships and provide a total of 23 over-identifying restrictions on which the core
model is based and with which the validity of the economic theory can be tested.
The log-likelihood ratio (LR) statistic for jointly testing the 23 over-identifying restrictions
takes the value 71.49. In view of the relatively large dimension of the underlying VAR model, the
number of restrictions considered and the available sample size, we proceed to test the signifi-
cance of this statistic using critical values which are computed by means of bootstrap techniques.
(See Section 22.12 for a discussion of bootstrap techniques applied to cointegrating VAR mod-
els.) In the present application, the bootstrap exercise is based on 3,000 replications of the LR
statistic testing the 23 restrictions. For each replication, an artificial data set is generated (of the
same length as the original data set) on the assumption that the estimated version of the core
model is the true data-generating process, using the observed initial values of each variable, the
estimated model, and a set of random innovations.3 The test of the over-identifying restrictions
is carried out on each of the replicated data sets and the empirical distribution of the test statistic
is derived across all replications. This shows that the relevant critical values for the joint tests of
the 23 over-identifying restrictions are 67.51 at the 10 per cent significance level and 73.19 at the
5 per cent level. Therefore, the LR statistic of 71.49, which lies between these two critical values, is not sufficiently large to justify the rejection of the over-identifying restrictions implied by the long-run theory at the 5 per cent level.
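The comparison of the LR statistic with bootstrap critical values can be sketched as follows. This is a minimal illustration, not the routine used in the text: the helper names and the order-statistic quantile rule are assumptions, the LR statistic and critical values are those reported above, and the short list of numbers is a made-up toy sample used only to show the quantile rule (a real exercise would pass in the replicated LR statistics).

```python
import math


def empirical_critical_value(stats, level):
    """Upper critical value: the ceil(level * n)-th order statistic."""
    ordered = sorted(stats)
    k = math.ceil(level * len(ordered))
    return ordered[max(k, 1) - 1]


def reject(lr_stat, critical_value):
    """Reject the restrictions when the statistic exceeds the critical value."""
    return lr_stat > critical_value


# Values reported in the text.
lr_stat = 71.49
cv_10, cv_05 = 67.51, 73.19
decision_10 = reject(lr_stat, cv_10)
decision_05 = reject(lr_stat, cv_05)

# Toy check of the quantile rule on a tiny made-up sample (not bootstrap output).
toy_sample = [3.0, 1.0, 4.0, 1.5, 9.0, 2.6, 5.0, 3.5, 8.0, 7.0]
toy_cv = empirical_critical_value(toy_sample, 0.5)
```

The two decisions differ because the statistic exceeds the 10 per cent bootstrap critical value but not the 5 per cent one.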
ML estimation of the five error correction terms yields
The bracketed figures are asymptotic standard errors. The first equation, (23.46), describes the
PPP relationship and the failure to reject this in the context of our core model provides an inter-
esting empirical finding. Of course, there has been considerable interest in the literature exam-
ining the co-movements of exchange rates and relative prices, and the empirical evidence on
PPP appears to be sensitive to the data set used and the way in which the analysis is conducted.
For example, the evidence of a unit root in the real exchange rate found by Darby (1983) and
Huizinga (1988) contradicts PPP as a long-run relationship, while Grilli and Kaminsky (1991)
and Lothian and Taylor (1996) have obtained evidence in favour of rejecting the unit root
hypothesis in real exchange rates using longer annual series.
The second cointegrating relation, defined by (23.47), is the IRP condition. This includes an
intercept, which can be interpreted as the deterministic component of the risk premia associated
with bonds and foreign exchange uncertainties. Its value is estimated at 0.0058, implying a risk
premium of approximately 2.3 per cent per annum. The empirical support we find for the IRP
condition, namely that $r_t - r_t^{*} \sim I(0)$, is in accordance with the results obtained in the literature,
3 In light of the evidence of non-normality of residuals, in this exercise we apply the non-parametric bootstrap (see
Section 22.12). The cointegrating matrix subject to the over-identifying restrictions is estimated on each replicated data set
using the simulated annealing routine by Goffe, Ferrier, and Rogers (1994). Also see Section A.16.3 in Appendix A.
and is compatible with UIP, defined by (23.35). However, under the UIP hypothesis it is also required that a regression of $r_t - r_t^{*}$ on $\Delta\ln(E_{t+1})$ has a unit coefficient, but this is not supported by the data.
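The annualized risk premium quoted above follows from the quarterly intercept by simple arithmetic. The sketch below assumes, as the sample description suggests, that the data are quarterly and that the intercept is measured in log (continuously compounded) terms; the variable names are introduced here for illustration.

```python
import math

quarterly_premium = 0.0058            # estimated intercept reported in the text

simple_annual = 4 * quarterly_premium                 # sum of four quarterly log rates
compound_annual = math.expm1(4 * quarterly_premium)   # exp(4 * 0.0058) - 1

simple_pct = round(100 * simple_annual, 1)
compound_pct = round(100 * compound_annual, 1)
```

Both conventions give approximately 2.3 per cent per annum, so the rounding in the text does not depend on which one is used.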
The third long-run relationship, given by (23.48), is the output gap (OG) relationship
with per capita domestic and foreign output (measured by the Organisation for Economic
Co-operation and Development (OECD) total output) levels moving in tandem in the long run.
It is noteworthy that the co-trending hypothesis cannot be rejected; that is, the coefficient of the
deterministic trend in the output gap equation is zero. This suggests that the average long-run growth
rate for the UK is the same as that in the rest of the OECD. This finding seems, in the first instance,
to contradict some of the results obtained in the literature on the cointegrating properties of
real output across countries. Campbell and Mankiw (1989), and Cogley (1990), for example,
consider cointegration among international output series and find little evidence that outputs
of different pairs of countries are cointegrated. However, our empirical analysis, being based on
a single foreign output index, does not necessarily contradict this literature, which focuses on
pair-wise cointegration of output levels. The hypothesis advanced here, that $y_t$ and $y_t^{*}$ are cointegrated, is much less restrictive than the hypothesis considered in the literature that all pairs of
output variables in the OECD are cointegrated.
For the money market equilibrium (MME) condition, given by (23.49), we could not reject
the hypothesis that the elasticity of real money balances with respect to real output is equal
to unity, and therefore (23.49) in fact represents an M0 velocity equation. The MME condi-
tion, however, contains a deterministic downward trend, representing the steady decline in the
money–income ratio in the UK over most of the period 1965–99, arising primarily from the
technological innovations in financial intermediation. There is also strong statistical evidence
of a negative interest rate effect on real money balances. This long-run specification is compara-
ble with the recent research on the determinants of the UK narrow money velocity reported in,
for example, Breedon and Fisher (1996).
Finally, the fifth equation, (23.50), defines the FIP relationship, where the estimated constant
implies an annual real rate of return of approximately 1.67 per cent. While the presence of this
relationship might appear relatively non-contentious, there is empirical work in which the rela-
tionship appears not to hold; see, for example MacDonald and Murphy (1989) and Mishkin
(1992). The results support the FIP relationship and again highlight the important role played
by the FIP relationship in a model of the macroeconomy which can incorporate interactions
between variables omitted from more partial analyses.
The estimates of the long-run relations and short-run dynamics of the model are provided in
Table 23.2. The estimates of the error correction coefficients (also known as the loading coeffi-
cients) show that the long-run relations make an important contribution in most equations and
that the error correction terms provide for a complex and statistically significant set of inter-
actions and feedbacks across commodity, money and foreign exchange markets. The results
in Table 23.2 also show that the core model fits the data well and has satisfactory diagnostic
statistics.
Table 23.2 Reduced form error correction specification for the UK model

| Equation | $\Delta(p_t - p_t^{*})$ | $\Delta e_t$ | $\Delta r_t$ | $\Delta r_t^{*}$ | $\Delta y_t$ | $\Delta y_t^{*}$ | $\Delta(h_t - y_t)$ | $\Delta(\Delta p_t)$ |
|---|---|---|---|---|---|---|---|---|
| $\hat{\xi}_{1,t}$ | −.015† (.007) | .060† (.029) | .002 (.002) | .002 (.001) | .017† (.008) | .021† (.004) | −.024∗ (.013) | −.005 (.004) |
| $\hat{\xi}_{2,t}$ | −.840† (.301) | 1.42 (1.28) | .049 (.107) | .130∗ (.043) | 1.34† (.353) | .891† (.181) | −.721 (.576) | −.811† (.297) |
| $\hat{\xi}_{3,t}$ | .062† (.029) | −.210∗ (.121) | −.013 (.010) | −.006 (.004) | −.165† (.034) | −.021 (.017) | .106∗ (.055) | .034 (.028) |
| $\hat{\xi}_{4,t}$ | .018† (.005) | −.029 (.020) | −.003∗ (.002) | −.001∗ (.001) | −.027† (.005) | −.016† (.003) | −.003 (.009) | .009∗ (.005) |
| $\hat{\xi}_{5,t}$ | −.149∗ (.083) | −.244 (.353) | −.054∗ (.028) | −.024† (.012) | −.099 (.098) | −.119† (.050) | .408† (.159) | .451† (.082) |
| $\Delta(p_{t-1} - p_{t-1}^{*})$ | .459† (.095) | .150 (.404) | −.039 (.032) | −.028† (.014) | −.136 (.111) | −.013 (.057) | .046 (.182) | .436† (.094) |
| $\Delta e_{t-1}$ | .051† (.022) | .216† (.092) | −.005 (.007) | −.001 (.003) | .021 (.025) | .013 (.013) | .007 (.042) | −.022 (.021) |
| $\Delta r_{t-1}$ | .416† (.294) | −1.31 (1.25) | .125 (.098) | −.067 (.042) | .467 (.345) | .204 (.177) | −.677 (.562) | .974† (.290) |
| $\Delta r_{t-1}^{*}$ | −.810 (.617) | 2.75 (2.62) | −.606† (.205) | .430† (.088) | .306 (.723) | .573 (.371) | −.267 (1.18) | .166 (.606) |
| $\Delta y_{t-1}$ | .083 (.089) | .072 (.381) | .017 (.030) | .015 (.013) | −.044 (.105) | .031 (.053) | −.168 (.172) | .356† (.089) |
| $\Delta y_{t-1}^{*}$ | .010 (.161) | −.630 (.683) | −.050 (.054) | .040∗ (.023) | −.073 (.188) | .069 (.097) | .602∗ (.307) | −.010 (.158) |
| $\Delta(h_{t-1} - y_{t-1})$ | .116 (.054) | .331 (.228) | .026 (.018) | .006 (.008) | .069 (.063) | −.014 (.032) | −.253† (.103) | .140† (.053) |
| $\Delta(\Delta p_{t-1})$ | −.151† (.073) | .321 (.302) | .016 (.024) | .010 (.011) | .125 (.086) | −.082∗ (.044) | .012 (.140) | −.244† (.072) |
| $\Delta p_t^{o}$ | −.018† (.004) | −.024 (.018) | .001 (.001) | .001† (.0005) | −.010† (.005) | .0001 (.002) | .024† (.008) | .003 (.004) |
| $\Delta p_{t-1}^{o}$ | .010† (.005) | −.013 (.019) | −.002 (.002) | −.0001 (.0001) | .006 (.005) | .002 (.003) | −.011 (.009) | .016† (.004) |
| $\bar{R}^2$ | .484 | .070 | .115 | .345 | .260 | .367 | .257 | .445 |
| Benchmark $\bar{R}^2$ | .316 | .026 | .007 | .213 | .022 | .196 | .00 | .191 |
| $\hat{\sigma}$ | .007 | .032 | .002 | .001 | .009 | .004 | .014 | .007 |
| $\chi^2_{SC}[4]$ | 2.79 | 0.96 | 2.43 | 17.13† | 6.71 | .79 | 8.37† | 5.63 |
| $\chi^2_{FF}[1]$ | 8.57† | 0.13 | 4.34† | 6.70† | 0.04 | 5.28† | .033 | 0.01 |
| $\chi^2_{N}[2]$ | 12.53† | 13.98† | 17.15† | 19.9† | 112.4† | 10.84 | 31.45† | 118.9† |
| $\chi^2_{H}[1]$ | 6.13† | 1.97 | 4.53† | 5.2† | 0.88 | 0.93 | 0.19 | 4.55† |

Notes: Standard errors are given in parentheses. ‘∗’ indicates significance at the 10% level, and ‘†’ indicates significance at the 5% level. The diagnostics are chi-squared statistics for serial correlation (SC), functional form (FF), normality (N) and heteroskedasticity (H). The benchmark $\bar{R}^2$ statistics are computed based on univariate ARMA(s,q), s,q = 0,1,…,4, specifications with s- and q-order selected by AIC.
and methods described in this chapter can be found in Assenmacher-Wesche and Pesaran (2008)
for the Swiss economy, and in Garratt et al. (2006) and Garratt et al. (2003b) for the UK.
23.10 Exercises
1. Suppose that $\mathbf{z}_t = (y_t, x_t)'$ is jointly determined by the following vector autoregressive model of order 1, VAR(1),
\[
\mathbf{z}_t = \boldsymbol{\Phi}\mathbf{z}_{t-1} + \mathbf{e}_t,
\]
where $\boldsymbol{\Phi} = (\phi_{ij})$ is a $2 \times 2$ matrix of unknown parameters, and $\mathbf{e}_t = (e_{yt}, e_{xt})'$ is a 2-dimensional vector of reduced form errors. Denoting the covariance of $e_{yt}$ and $e_{xt}$ by $\omega\operatorname{Var}(e_{xt})$
where $u_t$ is uncorrelated with $e_{xt}$, and therefore the first equation in the VAR can be written as the following ARDL model
\[
y_t = \varphi y_{t-1} + \beta_0 x_t + \beta_1 x_{t-1} + u_t,
\]
with
\[
\varphi = \phi_{11} - \omega\phi_{21}, \quad \beta_0 = \omega, \quad \beta_1 = \phi_{12} - \omega\phi_{22}.
\]
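For reference, the substitution behind these coefficient expressions can be written out as follows. This is a sketch of the algebra only, assuming the decomposition $e_{yt} = \omega e_{xt} + u_t$ with $u_t$ uncorrelated with $e_{xt}$, which is what the definition of $\omega$ in the exercise implies.

```latex
\begin{aligned}
y_t &= \phi_{11}y_{t-1} + \phi_{12}x_{t-1} + e_{yt}, \qquad
x_t = \phi_{21}y_{t-1} + \phi_{22}x_{t-1} + e_{xt},\\
e_{yt} &= \omega e_{xt} + u_t
        = \omega\left(x_t - \phi_{21}y_{t-1} - \phi_{22}x_{t-1}\right) + u_t,\\
y_t &= \underbrace{(\phi_{11} - \omega\phi_{21})}_{\varphi}\,y_{t-1}
     + \underbrace{\omega}_{\beta_0}\,x_t
     + \underbrace{(\phi_{12} - \omega\phi_{22})}_{\beta_1}\,x_{t-1} + u_t .
\end{aligned}
```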
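where
\[
\theta = \frac{\beta_0 + \beta_1}{1 - \varphi},
\]
$\tilde{u}_t = (1 - \varphi L)^{-1}u_t$, $\alpha(L) = \sum_{\ell=0}^{\infty}\alpha_\ell L^{\ell}$, with $\alpha_\ell = \sum_{s=\ell+1}^{\infty}\delta_s$, for $\ell = 0, 1, 2, \ldots$, and $\delta(L) = \sum_{\ell=0}^{\infty}\delta_\ell L^{\ell} = (1 - \varphi L)^{-1}(\beta_0 + \beta_1 L)$.
(c) Under what condition is $(1, -\theta)$ a cointegrating vector?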
(d) Discuss alternative approaches to the estimation of θ , distinguishing the cases where xt
is I(0) and I(1).
and
\[
E\left(\mathbf{y}_t \mid \mathbf{y}_{t-1}, \mathbf{x}_t\right) = \mathbf{A}\mathbf{y}_{t-1} + \mathbf{G}\mathbf{x}_t,
\]
(b) Derive and compare the mean squared forecast errors of predicting $\mathbf{y}_{T+1}$ conditional on $(\mathbf{y}_T, \mathbf{x}_T)$ and $(\mathbf{y}_T, \mathbf{x}_{T+1})$, with respect to a quadratic loss function.
yt = yt−1 + Cxt + ut ,
xt = xt−1 + vt ,
where yt = y1t , y2t , . . . , ymt , xt = (x1t , x2t , . . . , xkt ) , , C, and are fixed-coefficient
matrices, and ξ t = (ut , vt ) ∼ IID(0, ), where is a positive definite matrix.
(a) Suppose that all eigenvalues of lie within the unit circle. Derive the Beveridge–Nelson
decomposition of yt when the eigenvalues of lie inside the unit circle, and when Ik −
is rank deficient.
(b) Repeat the exercise under (a) assuming that Ik − and Im − are both rank deficient.
(c) Derive the long horizon forecasts of yt assuming that has some roots on the unit circle.
(d) How do you estimate limh→∞ E yt+h yt , xt ?
4. Use quarterly time series data on the US and UK economies, which can be downloaded from
<https://sites.google.com/site/gvarmodelling/data>, to estimate a VARX model for the UK
economy conditional on US real equity prices and long term interest rates, taking as endoge-
nous the following UK variables: real output, inflation, real equity prices, short-term and long-
term interest rates.
(a) Test for the presence of a long-run relation between UK inflation and UK long-term inter-
est rate.
(b) Test for the presence of a long-run relation between UK and US long-term interest rates.
(c) Discuss the pros and cons of this VARX model with an alternative specification where
conditioning variables also include euro area variables.
24.1 Introduction
We first introduce impulse response analysis and forecast error variance decomposition
for unrestricted VAR models, and discuss the orthogonalized and generalized impulse
response functions. We then consider the identification problem of short-run effects in a struc-
tural VAR model. We review Sims’ approach, and then investigate the identification problem of
a structural model when one or more of the structural shocks have permanent effects.
$$I_y(n, \delta, \Omega_{t-1}) = E(y_{t+n} \mid u_t = \delta, u_{t+1} = 0, \ldots, u_{t+n} = 0; \Omega_{t-1}) - E(y_{t+n} \mid u_t = 0, u_{t+1} = 0, \ldots, u_{t+n} = 0; \Omega_{t-1}),$$
where $u_t$ is the vector of variables being shocked, $\delta$ the vector of shocks, and $\Omega_t$ is the information set at time $t$, containing the information available up to time $t$.
In this case m = 1, and it is easily seen that the impulse response function of yt is
$$I_y(n, \delta, \Omega_{t-1}) = \rho^n \delta, \quad n = 0, 1, 2, \ldots.$$
yt = φyt−1 + ut .
$$y_t = \Phi y_{t-1} + u_t,$$
assuming that the process is stationary, we can write $y_t$ in terms of the shocks, $u_t$, and their lagged values (see Chapter 21)
$$y_t = u_t + \Phi u_{t-1} + \Phi^2 u_{t-2} + \ldots.$$
Then
$$I_y(n, \delta, \Omega_{t-1}) = A_n \delta,$$
where $\Sigma$ is the covariance matrix of the errors. First write the VAR model in the form of an infinite-order moving average (MA) representation
$$y_t = \sum_{j=0}^{\infty} A_j u_{t-j}, \tag{24.3}$$
with
$$\Sigma = PP', \tag{24.5}$$
where $P$ is a lower-triangular matrix (see also Section 24.10). Then rewrite the moving average representation of $y_t$ as
$$y_t = \sum_{j=0}^{\infty} A_j P P^{-1} u_{t-j} = \sum_{j=0}^{\infty} B_j \eta_{t-j}, \tag{24.6}$$
where
$$B_j = A_j P, \quad \text{and} \quad \eta_t = P^{-1} u_t.$$
and the new errors $\eta_t = (\eta_{1t}, \eta_{2t}, \ldots, \eta_{mt})'$ in (24.6) are contemporaneously uncorrelated, namely $\mathrm{Cov}(\eta_{it}, \eta_{jt}) = 0$, for $i \neq j$. The orthogonalized impact of a unit shock at time $t$ to the $i$th equation on $y$ at time $t + n$ is given by
$$B_n e_i, \quad n = 0, 1, 2, \ldots, \tag{24.7}$$
Written more compactly, the orthogonalized impulse response function of a unit (one stan-
dard error) shock to the ith variable on the jth variable is given by
These orthogonalized impulse responses are not unique and depend on the particular ordering of the variables in the VAR. The orthogonalized responses are invariant to the ordering of the variables only if $\Sigma$ is diagonal. The non-uniqueness of the orthogonalized impulse responses is also related to the non-uniqueness of the matrix $P$ in the Cholesky decomposition of $\Sigma$ in (24.5). For more details see Lütkepohl (2005).
where $\sigma_{12} = \mathrm{cov}(u_{1t}, u_{2t})$, $\sigma_{22} = \mathrm{var}(u_{2t})$, and the new error, $\eta_{1t}$, has a zero correlation with $u_{2t}$. Using this relationship we have
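or
$$y_{1t} = \frac{\sigma_{12}}{\sigma_{22}}\, y_{2t} + \left(\varphi_{11} - \frac{\sigma_{12}}{\sigma_{22}}\varphi_{21}\right) y_{1,t-1} + \left(\varphi_{12} - \frac{\sigma_{12}}{\sigma_{22}}\varphi_{22}\right) y_{2,t-1} + \eta_{1t}. \tag{24.10}$$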
By construction $\eta_{1t}$ and $u_{2t}$ are orthogonal. Hence shocking $\eta_{1t}$ will move $y_{1t}$ on impact but leaves $y_{2t}$ unchanged. By contrast, a shock to $u_{2t}$ of size, say, $\sqrt{\sigma_{22}}$ will move $y_{2t}$ directly by the amount of the shock, $\sqrt{\sigma_{22}}$, and through equation (24.10) will cause $y_{1t}$ to move on impact by the amount of $(\sigma_{12}/\sigma_{22})\sqrt{\sigma_{22}} = \sigma_{12}/\sqrt{\sigma_{22}}$.
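This system can also be presented in an (upper) triangular form with the orthogonalized shocks, $\eta_{1t}$ and $u_{2t}$
$$\begin{pmatrix} 1 & -\sigma_{12}/\sigma_{22} \\ 0 & 1 \end{pmatrix} \begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix} = \begin{pmatrix} \varphi_{11} - \frac{\sigma_{12}}{\sigma_{22}}\varphi_{21} & \varphi_{12} - \frac{\sigma_{12}}{\sigma_{22}}\varphi_{22} \\ \varphi_{21} & \varphi_{22} \end{pmatrix} \begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \end{pmatrix} + \begin{pmatrix} \eta_{1t} \\ u_{2t} \end{pmatrix}.$$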
$$A_0 y_t = A_1 y_{t-1} + \varepsilon_t,$$
where $A_0$ is an upper triangular matrix, and the shocks in $\varepsilon_t = (\eta_{1t}, u_{2t})'$ are orthogonal by construction. This is the identification scheme of Sims which treats the shocks in $\varepsilon_t$ as structural. It is also worth noting that out of the two reduced form errors, $u_{1t}$ and $u_{2t}$, it is $u_{2t}$ which is viewed as structural, and this is made possible by restricting the second equation in the above system not to contain contemporaneous effects from $y_{1t}$, and by assuming the shocks $\eta_{1t}$ and $u_{2t}$ (in $\varepsilon_t$) to be orthogonal.
A generalization of the above identification scheme when there are m equations, with m > 2
is provided in Section 24.10, where we show that in order to identify a shock as structural it is
not necessary that all the shocks be orthogonal and/or A0 to be lower triangular. But first we
develop the concept of the generalized impulse response function which allows the analysis of
systems with non-orthogonalized shocks.
1. How was the dynamical system hit by shocks at time t? Was it hit by a variable-specific
shock or system-wide shocks?
2. What was the state of the system at time t − 1, before the system was hit by shocks? Was
the trajectory of the system in an upward or in a downward phase?
3. How would one expect the system to be shocked in the future, namely over the interim
period from t + 1, to t + n?
In the context of the VAR model, the GIRF for a system-wide shock, $u_t^0$, is defined by
$$GI_y(n, u_t^0, \Omega_{t-1}^0) = E(y_{t+n} \mid u_t = u_t^0, \Omega_{t-1}^0) - E(y_{t+n} \mid \Omega_{t-1}^0), \tag{24.11}$$
where $E(\cdot \mid \cdot)$ is the conditional mathematical expectation taken with respect to the VAR model, and $\Omega_{t-1}^0$ is a particular historical realization of the process at time $t - 1$. In the case of the VAR model having the infinite moving average representation (24.3) we have
$$GI_y(n, u_t^0, \Omega_{t-1}^0) = A_n u_t^0, \tag{24.12}$$
which is independent of the ‘history’ of the process. This history invariance property of the
impulse response function (also shared by the traditional methods of impulse response analy-
sis) is, however, specific to linear systems and does not carry over to nonlinear dynamic models.
In practice, the choice of the vector of shocks, $u_t^0$, is arbitrary; one possibility would be to consider a large number of likely shocks and then examine the empirical distribution function of $A_n u_t^0$ for all these shocks. In the case where $u_t^0$ is drawn from the same distribution as $u_t$, namely a multivariate normal with zero means and a constant covariance matrix $\Sigma$, we have the analytical result that
$$GI_y(n, u_t^0, \Omega_{t-1}^0) \sim N(0, A_n \Sigma A_n'). \tag{24.13}$$
The diagonal elements of $A_n \Sigma A_n'$, when appropriately scaled, are the ‘persistence profiles’ proposed in Lee and Pesaran (1993), and applied in Pesaran and Shin (1996) to analyse the speed of convergence to equilibrium in cointegrated systems. It is also worth noting that when the underlying VAR model is stable, the limit of the persistence profile as $n \to \infty$ tends to the spectral density function of $y_t$ at zero frequency (apart from a multiple of $\pi$).
Consider now the effect of a variable-specific shock on the evolution of $y_{t+1}, y_{t+2}, \ldots, y_{t+n}$, and suppose that the VAR model is perturbed by a shock of size $\delta_i = \sqrt{\sigma_{ii}}$ to its $i$th equation at time $t$. By the definition of the generalized IR function we have
$$GI_y(n, \delta_i, \Omega_{t-1}^0) = E(y_{t+n} \mid u_{it} = \delta_i, \Omega_{t-1}^0) - E(y_{t+n} \mid \Omega_{t-1}^0). \tag{24.14}$$
Once again using the infinite moving average representation (24.3), we obtain
$$GI_y(n, \delta_i, \Omega_{t-1}^0) = A_n E(u_t \mid u_{it} = \delta_i), \tag{24.15}$$
which is history invariant (i.e. does not depend on $\Omega_{t-1}^0$). The computation of the conditional expectations $E(u_t \mid u_{it} = \delta_i)$ depends on the nature of the multivariate distribution assumed for the disturbances, $u_t$. In the case where $u_t \sim IIDN(0, \Sigma)$, we have
$$E(u_t \mid u_{it} = \delta_i) = \begin{pmatrix} \sigma_{1i}/\sigma_{ii} \\ \sigma_{2i}/\sigma_{ii} \\ \vdots \\ \sigma_{mi}/\sigma_{ii} \end{pmatrix} \delta_i, \tag{24.16}$$
where as before $\Sigma = (\sigma_{ij})$. Hence, for a ‘unit shock’ defined by $\delta_i = \sqrt{\sigma_{ii}}$, we have
$$GI_y(n, \delta_i = \sqrt{\sigma_{ii}}, \Omega_{t-1}^0) = \frac{A_n \Sigma e_i}{\sqrt{\sigma_{ii}}}, \quad i = 1, 2, \ldots, m, \tag{24.17}$$
where $e_i$ is a selection vector defined by (24.8). The GIRF of a unit shock to the $i$th equation in the VAR model (24.1) on the $j$th variable at horizon $n$ is given by the $j$th element of (24.17), or expressed more compactly by
$$GI_{ij,n} = \frac{e_j' A_n \Sigma e_i}{\sqrt{\sigma_{ii}}}, \quad i, j = 1, 2, \ldots, m. \tag{24.18}$$
Unlike the orthogonalized impulse responses in (24.9), the generalized IR in (24.18) are invariant to the ordering of the variables in the VAR. It is also interesting to note that the two impulse responses coincide only for the first variable in the VAR, or when $\Sigma$ is a diagonal matrix. See
Pesaran and Shin (1998) for further details and derivations.
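The ordering properties stated above can be checked numerically. The following is a minimal sketch, assuming a bivariate VAR(1) with MA coefficients $A_n = \Phi^n$; the parameter values `PHI` and `SIGMA` and all function names are hypothetical and chosen only for illustration. It computes the orthogonalized responses $e_j' A_n P e_i$ and the generalized responses in (24.18) using only the Python standard library.

```python
import math

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def cholesky(s):
    """Lower-triangular P with P P' = s, for s symmetric positive definite."""
    m = len(s)
    p = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            acc = s[i][j] - sum(p[i][k] * p[j][k] for k in range(j))
            p[i][j] = math.sqrt(acc) if i == j else acc / p[j][j]
    return p

def ma_coefficients(phi, horizon):
    """A_0 = I and A_n = Phi A_{n-1}: the MA coefficients of a VAR(1)."""
    m = len(phi)
    coeffs = [[[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]]
    for _ in range(horizon):
        coeffs.append(matmul(phi, coeffs[-1]))
    return coeffs

def oirf(phi, sigma, horizon):
    """oirf[n][j][i] = e_j' A_n P e_i (response of variable j to shock i)."""
    p = cholesky(sigma)
    return [matmul(a_n, p) for a_n in ma_coefficients(phi, horizon)]

def girf(phi, sigma, horizon):
    """girf[n][j][i] = e_j' A_n Sigma e_i / sqrt(sigma_ii), as in (24.18)."""
    m = len(sigma)
    out = []
    for a_n in ma_coefficients(phi, horizon):
        a_sigma = matmul(a_n, sigma)
        out.append([[a_sigma[j][i] / math.sqrt(sigma[i][i]) for i in range(m)]
                    for j in range(m)])
    return out

def swap_order(mat):
    """Re-order a 2x2 matrix so that the second variable comes first."""
    return [[mat[1][1], mat[1][0]], [mat[0][1], mat[0][0]]]

# Hypothetical parameter values, used only for illustration.
PHI = [[0.5, 0.1], [0.2, 0.3]]
SIGMA = [[1.0, 0.5], [0.5, 2.0]]
```

Calling `girf(PHI, SIGMA, 5)` and `girf(swap_order(PHI), swap_order(SIGMA), 5)` should give the same responses once the indices are relabelled, whereas the two `oirf` calls generally differ; the responses to the shock in the variable ordered first agree across the two methods.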
$$A_0 y_t = A_1 y_{t-1} + \varepsilon_t,$$
where yt is an m×1 vector of endogenous variables, A0 and A1 are matrices that contain the struc-
tural coefficients, and εt is an m × 1 vector of structural shocks in the sense that A0 and A1 are
invariant to shocks to one or more elements of εt . Here, however, unlike the usual assumptions
made in the literature, we do not require the structural shocks to be orthogonal. In particular, we
assume that $\varepsilon_t \sim IID(0, \Sigma_\varepsilon)$, where $\Sigma_\varepsilon$ is unrestricted.
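Consider the problem of deriving the generalized impulse response function of a unit shock to the composite structural shock, $\varepsilon_{ct} = a'\varepsilon_t$, where $a = (a_1, a_2, \ldots, a_m)'$, and the size of the unit shock is given by $\delta_c = \sqrt{a'\Sigma_\varepsilon a}$. We have
$$g_y(h, \delta_c) = E(y_{t+h} \mid a'\varepsilon_t = \delta_c, I_{t-1}) - E(y_{t+h} \mid I_{t-1}),$$
where $I_{t-1}$ is the information set at time $t - 1$. It is now easily seen that
$$y_t = \Phi y_{t-1} + u_t,$$
$$g_y(h, \delta_c) = \Phi^h g_y(0, \delta_c),$$
$$g_y(0, \delta_c) = \delta_c^{-1} A_0^{-1} \Sigma_\varepsilon a = \delta_c^{-1} \Sigma A_0' a,$$
where $\Phi = A_0^{-1} A_1$, $u_t = A_0^{-1}\varepsilon_t$, and $\Sigma = A_0^{-1} \Sigma_\varepsilon A_0'^{-1}$ is the covariance matrix of $u_t$. Note that $\Phi$ and $\Sigma$ can be identified from the reduced form model, and the scaling parameter, $\delta_c$, is given. Hence, for the identification of the effects of the composite shock we require identification of $A_0' a$, and not all the elements of $A_0$.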
Suppose now that we are interested in identifying the effects of the $i$th structural shock, $\varepsilon_{it}$. In this case we need to set $a = e_i = (0, 0, \ldots, 0, 1, 0, \ldots, 0, 0)'$, which is a vector of zeros with the exception of its $i$th element, and $\delta_c = \sqrt{\sigma_{ii,\varepsilon}}$. It is now easily seen that $A_0' a = A_0' e_i$, which is equal to the $i$th row of $A_0$, which we denote by the $m \times 1$ vector, $a_{0i}$, and note that $A_0' = (a_{01}, a_{02}, \ldots, a_{0m})$. Hence, to identify the effects of the $i$th structural shock we only need to identify the elements of the $i$th row of $A_0$ even if the structural shocks are not orthogonal.
An important example where the effects of the ith structural shock are identified arises if there
are no contemporaneous effects in the ith structural equation. This is equivalent to placing the
ith endogenous variable first in the list of the variables when applying Sims' orthogonalization
procedure, with this important difference that under the above setup we do not require εit to be
orthogonal to the other structural shocks.
It is clear that other more general assumptions concerning the ith row of A0 can be entertained.
For example, it is possible to identify other elements of a0i by standard exclusion restrictions. The
point is that we do not need to make assumptions that identify the effects of all the structural
shocks, as is often done in the literature, if we are interested in identifying the effects of the ith
shock only.
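As a numerical illustration of the argument above, the sketch below evaluates the impact response of a composite shock in two ways: directly as $\delta_c^{-1} A_0^{-1}\Sigma_\varepsilon a$, and through the reduced-form covariance matrix as $\delta_c^{-1}\Sigma A_0' a$. The two-variable setting, the values of `A0` and `SIGMA_EPS`, and the function names are assumptions made purely for illustration; the structural shocks are deliberately taken to be correlated.

```python
import math

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*a)]

def inverse_2x2(a):
    """Inverse of a nonsingular 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def impact_response(a0, sigma_eps, a):
    """Return g_y(0, delta_c) computed in the two equivalent ways."""
    a_col = [[x] for x in a]
    # delta_c = sqrt(a' Sigma_eps a): one standard error of the composite shock.
    delta_c = math.sqrt(matmul(matmul([a], sigma_eps), a_col)[0][0])
    a0_inv = inverse_2x2(a0)
    # Structural form: A0^{-1} Sigma_eps a / delta_c.
    structural = matmul(matmul(a0_inv, sigma_eps), a_col)
    # Reduced form: Sigma A0' a / delta_c, with Sigma = A0^{-1} Sigma_eps A0'^{-1}.
    sigma_u = matmul(matmul(a0_inv, sigma_eps), transpose(a0_inv))
    reduced = matmul(matmul(sigma_u, transpose(a0)), a_col)
    return ([row[0] / delta_c for row in structural],
            [row[0] / delta_c for row in reduced])

# Hypothetical structural parameters; the shocks are not orthogonal.
A0 = [[1.0, -0.4], [0.3, 1.0]]
SIGMA_EPS = [[1.0, 0.3], [0.3, 0.5]]
```

With `a = [0.0, 1.0]` the second expression depends on $A_0$ only through its second row, which is the sense in which identifying one shock requires restrictions on one row only.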
$$\xi_t(n) = \sum_{\ell=0}^{n} B_\ell \eta_{t+n-\ell}.$$
Since all the elements of $\eta_{t+n-\ell}$ are pair-wise orthogonal at all leads and lags, knowing the $j$th orthogonalized shocks now and in future will have no contemporaneous effect on the other variables, but could affect the ability to forecast the other variables in future periods. More formally, the forecast errors conditional on the shocks $\eta_{j,t+n-\ell}$, for $\ell = 0, 1, 2, \ldots, n$, are given by
$$\xi_t^{(j)}(n) = \sum_{\ell=0}^{n} B_\ell \left(\eta_{t+n-\ell} - e_j \eta_{j,t+n-\ell}\right),$$
where by construction $e_j' \xi_t^{(j)}(n) = 0$ for $n = 0$ (note that $B_0 = I_m$). It is now easily seen that [noting that $E(\eta_{t+n-\ell}\,\eta_{t+n-\ell'}') = 0$ if $\ell \neq \ell'$ and $E(\eta_{t+n-\ell}\,\eta_{t+n-\ell'}') = I_m$ if $\ell = \ell'$] we have
$$\mathrm{Var}[\xi_t(n)] = \sum_{\ell=0}^{n} B_\ell B_\ell',$$
and
$$\mathrm{Var}\left[\xi_t^{(j)}(n)\right] = \sum_{\ell=0}^{n} B_\ell B_\ell' - \sum_{\ell=0}^{n} B_\ell e_j e_j' B_\ell'.$$
Hence, improvements in the $n$-step ahead forecasts (in the mean squared error sense) of knowing the values of the $j$th orthogonalized shocks are given by
$$\mathrm{Var}[\xi_t(n)] - \mathrm{Var}\left[\xi_t^{(j)}(n)\right] = \sum_{\ell=0}^{n} B_\ell e_j e_j' B_\ell'.$$
Scaling the $i$th diagonal element of this matrix by the $i$th diagonal element of $\sum_{\ell=0}^{n} B_\ell B_\ell'$ yields the proportion of the forecast error variance of the $i$th variable that can be predicted by knowing the values of the $j$th orthogonalized shocks, namely
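$$\theta_{ij,n} = \frac{\sum_{\ell=0}^{n} \left(e_i' B_\ell e_j\right)^2}{\sum_{\ell=0}^{n} e_i' B_\ell B_\ell' e_i} = \frac{\sum_{\ell=0}^{n} e_i' B_\ell e_j e_j' B_\ell' e_i}{\sum_{\ell=0}^{n} e_i' B_\ell B_\ell' e_i}, \quad i = 1, 2, \ldots, m. \tag{24.19}$$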
In the multivariate time series literature, $\theta_{ij,n}$ for $i = 1, 2, \ldots, m$ is known as the forecast error variance decomposition of the $i$th variable in the VAR. Since $\sum_{j=1}^{m} e_j e_j' = I_m$, it is then easily seen that $\sum_{j=1}^{m} \theta_{ij,n} = 1$, which follows due to the orthogonal nature of the shocks, $\eta_{jt}$. Also since $B_\ell = A_\ell P$, where $P$ is defined by the Cholesky decomposition of $\Sigma$, (24.5), we can also write $\theta_{ij,n}$ as
$$\theta_{ij,n} = \frac{\sum_{\ell=0}^{n} \left(e_i' A_\ell P e_j\right)^2}{\sum_{\ell=0}^{n} e_i' A_\ell \Sigma A_\ell' e_i}, \quad j = 1, 2, \ldots, m. \tag{24.20}$$
θ ij,n can also be viewed as measuring the proportion of the n-step ahead forecast error variance
of variable i, which is accounted for by the orthogonalized innovations in variable j. For further
details, see, for example, Lütkepohl (2005). As with the orthogonalized impulse response func-
tion, the orthogonalized forecast error variance decompositions in (24.20) are not invariant to
the ordering of the variables in the VAR.
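A minimal sketch of the orthogonalized decomposition in (24.20) follows, assuming a bivariate VAR(1) so that $A_\ell = \Phi^\ell$; the parameter values and function names are hypothetical. The denominator is computed from $B_\ell = A_\ell P$, using $B_\ell B_\ell' = A_\ell \Sigma A_\ell'$.

```python
import math

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def cholesky(s):
    """Lower-triangular P with P P' = s, for s symmetric positive definite."""
    m = len(s)
    p = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            acc = s[i][j] - sum(p[i][k] * p[j][k] for k in range(j))
            p[i][j] = math.sqrt(acc) if i == j else acc / p[j][j]
    return p

def ma_coefficients(phi, horizon):
    """A_0 = I and A_l = Phi A_{l-1}: the MA coefficients of a VAR(1)."""
    m = len(phi)
    coeffs = [[[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]]
    for _ in range(horizon):
        coeffs.append(matmul(phi, coeffs[-1]))
    return coeffs

def orthogonalized_fevd(phi, sigma, horizon):
    """theta[i][j]: share of the n-step forecast error variance of variable i
    accounted for by the j-th orthogonalized shock, as in (24.20)."""
    m = len(sigma)
    p = cholesky(sigma)
    b = [matmul(a_l, p) for a_l in ma_coefficients(phi, horizon)]
    theta = []
    for i in range(m):
        # e_i' B_l B_l' e_i is the sum of squares of row i of B_l.
        total = sum(b_l[i][k] ** 2 for b_l in b for k in range(m))
        theta.append([sum(b_l[i][j] ** 2 for b_l in b) / total
                      for j in range(m)])
    return theta

# Hypothetical parameter values, used only for illustration.
PHI = [[0.5, 0.1], [0.2, 0.3]]
SIGMA = [[1.0, 0.5], [0.5, 2.0]]
```

Each row of `orthogonalized_fevd(PHI, SIGMA, n)` sums to one, and at horizon zero the variable ordered first is explained entirely by its own shock, which reflects the lower-triangular form of $P$.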
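$$\underbrace{\xi_t(n)}_{m \times 1} = \sum_{\ell=0}^{n} A_\ell u_{t+n-\ell}, \tag{24.21}$$
$$\mathrm{Var}[\xi_t(n)] = \sum_{\ell=0}^{n} A_\ell \Sigma A_\ell'. \tag{24.22}$$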
Consider now the forecast error covariance matrix of predicting yt+n conditional on the infor-
mation at time t − 1, and given values of the shocks to the jth equation, ujt , uj,t+1 , . . . , uj,t+n .
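$$\underbrace{\xi_t^{(j)}(n)}_{m \times 1} = \sum_{\ell=0}^{n} A_\ell \left[u_{t+n-\ell} - E\left(u_{t+n-\ell} \mid u_{j,t+n-\ell}\right)\right]. \tag{24.23}$$
$$\xi_t^{(j)}(n) = \sum_{\ell=0}^{n} A_\ell \left(u_{t+n-\ell} - \sigma_{jj}^{-1} \Sigma e_j u_{j,t+n-\ell}\right),$$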
Therefore, using (24.22) and (24.24) it follows that the decline in the $n$-step forecast error variance of $\xi_t$ obtained as a result of conditioning on the future shocks to the $j$th equation is given by
$$\Delta_{jn} = \mathrm{Var}[\xi_t(n)] - \mathrm{Var}\left[\xi_t^{(j)}(n)\right] = \sigma_{jj}^{-1} \sum_{\ell=0}^{n} A_\ell \Sigma e_j e_j' \Sigma A_\ell'. \tag{24.25}$$
Scaling the $i$th diagonal element of $\Delta_{jn}$, namely $e_i' \Delta_{jn} e_i$, by the $n$-step ahead forecast error variance of the $i$th variable in $y_t$, we have the following generalized forecast error variance decomposition
$$\psi_{ij,n} = \frac{\sigma_{jj}^{-1} \sum_{\ell=0}^{n} \left(e_i' A_\ell \Sigma e_j\right)^2}{\sum_{\ell=0}^{n} e_i' A_\ell \Sigma A_\ell' e_i}. \tag{24.26}$$
Note that the denominator of this measure is the $i$th diagonal element of the total forecast error variance formula in (24.22) and is the same as the denominator of the orthogonalized forecast error variance decomposition formula (24.20). Also $\theta_{ij,n} = \psi_{ij,n}$ when $y_{it}$ is the first variable in the VAR, and/or $\Sigma$ is diagonal. However, in general the two decompositions differ.
1 Note that since the $u_t$'s are serially uncorrelated, $E(u_{t+n-\ell} \mid u_{it}, u_{i,t+1}, \ldots, u_{i,t+n}) = E(u_{t+n-\ell} \mid u_{i,t+n-\ell})$, $\ell = 0, 1, 2, \ldots, n$.
For computational purposes it is worth noting that the numerator of (24.26) can also be written as the sum of squares of the generalized responses of the shocks to the $i$th equation on the $j$th variable in the model, namely $\sum_{\ell=0}^{n} GI_{ij,\ell}^2$, where $GI_{ij,\ell}$ is given by (24.18).
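The generalized decomposition in (24.26) can be sketched in the same way. The example below again assumes a bivariate VAR(1) with hypothetical parameter values; `generalized_fevd` and the variable `psi` are illustrative names, not notation taken from the text. It accumulates the squared terms $e_i' A_\ell \Sigma e_j$, scaled by $\sigma_{jj}$, over the forecast horizon.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def ma_coefficients(phi, horizon):
    """A_0 = I and A_l = Phi A_{l-1}: the MA coefficients of a VAR(1)."""
    m = len(phi)
    coeffs = [[[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]]
    for _ in range(horizon):
        coeffs.append(matmul(phi, coeffs[-1]))
    return coeffs

def generalized_fevd(phi, sigma, horizon):
    """psi[i][j]: generalized share of the n-step forecast error variance of
    variable i associated with shocks to equation j."""
    m = len(sigma)
    coeffs = ma_coefficients(phi, horizon)
    a_sigma = [matmul(a_l, sigma) for a_l in coeffs]
    psi = []
    for i in range(m):
        # Denominator: sum over l of e_i' A_l Sigma A_l' e_i.
        total = sum(a_sigma[l][i][k] * coeffs[l][i][k]
                    for l in range(horizon + 1) for k in range(m))
        # Numerator: sum over l of (e_i' A_l Sigma e_j)^2 / sigma_jj.
        psi.append([sum(a_sigma[l][i][j] ** 2 for l in range(horizon + 1))
                    / (sigma[j][j] * total) for j in range(m)])
    return psi

# Hypothetical parameter values, used only for illustration.
PHI = [[0.5, 0.1], [0.2, 0.3]]
SIGMA = [[1.0, 0.5], [0.5, 2.0]]
SIGMA_DIAGONAL = [[1.0, 0.0], [0.0, 2.0]]
```

With correlated errors the rows of `generalized_fevd(PHI, SIGMA, n)` need not sum to one, because the shocks are not orthogonalized; with `SIGMA_DIAGONAL` the rows do sum to one.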
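$$\Delta z_t = -\alpha\beta' z_{t-1} + \sum_{i=1}^{p-1} \Gamma_i \Delta z_{t-i} + H\zeta_t, \tag{24.27}$$
$$z_t = \sum_{i=1}^{p} \Phi_i z_{t-i} + a_0 + a_1 t + H\zeta_t, \tag{24.28}$$
where
$$C(L) = \sum_{j=0}^{\infty} C_j L^j = C(1) + (1 - L)C^*(L),$$
$$C^*(L) = \sum_{j=0}^{\infty} C_j^* L^j, \quad \text{and} \quad C_j^* = -\sum_{i=j+1}^{\infty} C_i,$$
with $C_0 = I_m$, $C_1 = \Phi_1 - I_m$ and $C_i = 0$, for $i < 0$. Cumulating forward one obtains the level MA representation,
$$z_t = z_0 + b_0 t + C(1)\sum_{j=1}^{t} H\zeta_j + C^*(L)H(\zeta_t - \zeta_0),$$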
$$GI(n, z : \zeta_i) = \frac{1}{\sqrt{\sigma_{\zeta,ii}}}\, \tilde{C}_n H \Sigma_{\zeta\zeta} e_i, \quad n = 0, 1, \ldots, \quad i = 1, 2, \ldots, m, \tag{24.30}$$
$$OI(n, z : \zeta_i^*) = \tilde{C}_n H P_\zeta e_i, \quad n = 0, 1, \ldots, \quad i = 1, 2, \ldots, m, \tag{24.31}$$
where $\zeta_t$ is $IID(0, \Sigma_{\zeta\zeta})$, $\zeta_i^*$ is an orthogonalized residual, $\sigma_{\zeta,ij}$ is the $(i, j)$th element of $\Sigma_{\zeta\zeta}$, $\tilde{C}_n = \sum_{j=0}^{n} C_j$, with $C_j$'s given by the recursive relations (24.29), $H$ and $\Sigma_{\zeta\zeta}$ are given in (23.32), $e_i$ is a selection vector of zeros with unity as its $i$th element, and $P_\zeta$ is a lower triangular matrix obtained by the Cholesky decomposition of $\Sigma_{\zeta\zeta} = P_\zeta P_\zeta'$.
Similarly, the generalized and orthogonalized impulse response functions for the cointegrating relations with respect to a unit change in the error, $\zeta_{it}$, are given by
$$GI(n, \xi : \zeta_i) = \frac{1}{\sqrt{\sigma_{\zeta,ii}}}\, \beta' \tilde{C}_n H \Sigma_{\zeta\zeta} e_i, \quad n = 0, 1, \ldots, \quad i = 1, 2, \ldots, m, \tag{24.32}$$
$$OI(n, \xi : \zeta_i^*) = \beta' \tilde{C}_n H P_\zeta e_i, \quad n = 0, 1, \ldots, \quad i = 1, 2, \ldots, m, \tag{24.33}$$
where $\xi_t = \beta' z_{t-1}$.
While the impulse responses show the effect of a shock to a particular variable, the persistence
profile, as developed by Lee and Pesaran (1993) and Pesaran and Shin (1996), shows the effects
of system-wide shocks on the cointegrating relations. In the case of the cointegrating relations
the effects of the shocks (irrespective of their sources) will eventually disappear. Therefore, the
shape of the persistence profiles provides valuable information on the speed of convergence of
the cointegrating relations towards equilibrium. The persistence profile for a given cointegrating
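relation defined by the cointegrating vector $\beta_j$ in the case of a VARX model is given by
$$h(\beta_j' z, n) = \frac{\beta_j' \tilde{C}_n H \Sigma_{\zeta\zeta} H' \tilde{C}_n' \beta_j}{\beta_j' H \Sigma_{\zeta\zeta} H' \beta_j}, \quad n = 0, 1, \ldots, \quad j = 1, \ldots, r, \tag{24.34}$$
where $\beta$, $\tilde{C}_n$, $H$ and $\Sigma_{\zeta\zeta}$ are as defined above.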
$$GI(n, y : \varepsilon_i) = \frac{1}{\sqrt{\omega_{ii}}}\, \tilde{C}_n A_0^{-1} \Omega e_i,$$
where $\tilde{C}_n = \sum_{j=0}^{n} C_j$. Also
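$$GI(n, y : u_i) = \frac{1}{\sqrt{\sigma_{ii}}}\, \tilde{C}_n \Sigma e_i.$$
In particular,
$$GI(\infty, y : \varepsilon_i) = \frac{1}{\sqrt{\omega_{ii}}}\, C(1) A_0^{-1} \Omega e_i$$
and
$$GI(\infty, y : u_i) = \frac{1}{\sqrt{\sigma_{ii}}}\, C(1) \Sigma e_i$$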
and unlike the stationary case shocks will have permanent effects on the I(1) variables, though
not on the cointegrating relations.
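For the cointegrating relations $\xi_t = \beta' y_t$, we have
$$GI(n, \xi : \varepsilon_i) = \frac{1}{\sqrt{\omega_{ii}}}\, \beta' \tilde{C}_n A_0^{-1} \Omega e_i.$$
$$\beta' \tilde{C}_n \Sigma \tilde{C}_n' \beta, \quad n = 0, 1, \ldots.$$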
The profiles tend to zero as n → ∞, and provide a useful graphical representation of the extent
to which the cointegrating (equilibrium) relations adjust to system-wide shocks. The persistence
profiles are uniquely determined.
$$z_t^{(s)} = \sum_{i=1}^{p} \hat{\Phi}_i z_{t-i}^{(s)} + \hat{a}_0 + \hat{a}_1 t + \hat{H}\zeta_t^{(s)}, \quad t = 1, 2, \ldots, T, \tag{24.35}$$
realizations are used for the initial values, $z_{-1}, \ldots, z_{-p}$, and the $\zeta_t^{(s)}$'s can be drawn either by parametric or nonparametric methods (see Section 22.12).
Having obtained the $S$ sets of simulated in-sample values, $(z_1^{(s)}, z_2^{(s)}, \ldots, z_T^{(s)})$, the VAR($p$) model, (24.27), is re-estimated $S$ times to obtain the ML estimates, $\hat{\Phi}_i^{(s)}$, $\hat{a}_0^{(s)}$, $\hat{a}_1^{(s)}$, $\hat{H}^{(s)}$ and $\hat{\Sigma}_{\zeta\zeta}^{(s)}$, for $i = 1, 2, \ldots, p$, and $s = 1, 2, \ldots, S$. For each of these bootstrap replications, we then obtain the estimates of $GI^{(s)}(n, z^{(s)} : \zeta_i^{(s)})$, $GI^{(s)}(n, \xi^{(s)} : \zeta_i^{(s)})$, $OI^{(s)}(n, z^{(s)} : \zeta_i^{*(s)})$, $OI^{(s)}(n, \xi^{(s)} : \zeta_i^{*(s)})$, and $h^{(s)}(\beta_j' z^{(s)}, n)$. Therefore, using the $S$ sets of simulated estimates, we will obtain both empirical means and confidence intervals of impulse response functions and persistence profiles.
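The logic of the bootstrap steps can be sketched in a deliberately simplified setting. The example below is not the VARX procedure itself: it assumes a univariate stationary AR(1) without intercept, estimated by OLS rather than ML, with impulse responses $\hat{\varphi}^n$ and nonparametric resampling of the residuals. All function names, the simulated data, and the choice of 5 and 95 per cent percentile bands are assumptions made for illustration.

```python
import random

def simulate_ar1(phi, shocks, y0):
    """Generate y_t = phi * y_{t-1} + u_t starting from the initial value y0."""
    path, prev = [], y0
    for u in shocks:
        prev = phi * prev + u
        path.append(prev)
    return path

def ols_ar1(y0, path):
    """OLS slope of y_t on y_{t-1} (no intercept) and the implied residuals."""
    lagged = [y0] + path[:-1]
    phi_hat = (sum(a * b for a, b in zip(path, lagged))
               / sum(b * b for b in lagged))
    resid = [a - phi_hat * b for a, b in zip(path, lagged)]
    return phi_hat, resid

def bootstrap_irf(y0, path, horizon, replications, rng):
    """Empirical mean and percentile bands of the impulse responses phi^n."""
    phi_hat, resid = ols_ar1(y0, path)
    draws = []
    for _ in range(replications):
        # Resample residuals with replacement (nonparametric bootstrap).
        shocks = [rng.choice(resid) for _ in path]
        # Simulate in-sample values from the observed initial value.
        simulated = simulate_ar1(phi_hat, shocks, y0)
        # Re-estimate the model on the simulated sample.
        phi_s, _ = ols_ar1(y0, simulated)
        draws.append([phi_s ** n for n in range(horizon + 1)])
    bands = []
    for n in range(horizon + 1):
        col = sorted(d[n] for d in draws)
        mean = sum(col) / len(col)
        lower = col[int(0.05 * (len(col) - 1))]
        upper = col[int(0.95 * (len(col) - 1))]
        bands.append((lower, mean, upper))
    return phi_hat, bands

# Artificial data generated only to make the sketch runnable.
rng = random.Random(0)
y0 = 0.0
data = simulate_ar1(0.6, [rng.gauss(0.0, 1.0) for _ in range(200)], y0)
phi_hat, bands = bootstrap_irf(y0, data, horizon=8, replications=200, rng=rng)
```

Each entry of `bands` holds the lower band, the empirical mean and the upper band of the response at one horizon. In the multivariate case the same loop would re-estimate the full model and recompute the generalized and orthogonalized responses and persistence profiles in every replication.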
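$$\Delta y_t = -\alpha\beta' y_{t-1} + \sum_{j=1}^{p-1} \Gamma_j \Delta y_{t-j} + u_t, \tag{24.37}$$
with $\Pi = I_m - \Phi_1 - \Phi_2 - \ldots - \Phi_p$, and $\Gamma_j = -\sum_{i=j+1}^{p} \Phi_i$, for $j = 1, 2, \ldots, p - 1$. We have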
seen in Chapter 22 that the VECM(p − 1) model, (24.37), under Rank(β) = r < m, is subject
to long-run identification problems. We now consider the problem of identification of short run
effects. To this end premultiply both sides of (24.37) by a nonsingular m×m matrix A0 to obtain
$$A_0 \Delta y_t = -A_0 \alpha\beta' y_{t-1} + A_0 \sum_{j=1}^{p-1} \Gamma_j \Delta y_{t-j} + A_0 u_t. \tag{24.38}$$
The structural shocks are given by $\varepsilon_t = A_0 u_t$, and their identification requires knowing $A_0$. But knowing $A_0$ does not help in identification of $\beta$, since irrespective of the value of $A_0$ we have $A_0 \alpha\beta' = A_0 \alpha Q^{-1} Q \beta'$ and replacing $A_0$ with another nonsingular matrix will not help in restricting
Sims (1980) identification procedure leads to the ‘orthogonalized impulses’ (see also Section
24.4). Sims assumed that:
Due to the symmetric nature of v and and since the elements of are estimated with-
out any restrictions, then the above imposes m (m − 1) /2 restrictions on the elements
of A0 .
(ii) The contemporaneous coefficient matrix, A0 is (lower) triangular
$$A_0 = \begin{pmatrix} a_{0,11} & 0 & \cdots & 0 \\ a_{0,21} & a_{0,22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ a_{0,m1} & a_{0,m2} & \cdots & a_{0,mm} \end{pmatrix},$$
Wold showed, the parameters of these linear relations can be estimated consistently in a recursive
manner.
Other identification schemes have followed the work by Sims (1980). One prominent exam-
ple is the identification scheme developed in Blanchard and Quah (1989), who distinguished
between permanent and transitory shocks and attempted to identify the structural models
through long-run restrictions. For example, Blanchard and Quah argued that the effect of a
demand shock on real output should be temporary (namely it should have a zero long-run
impact), whilst a supply shock should have a permanent effect. This approach is known as ‘struc-
tural VAR’ (SVAR) and has been used extensively in the literature.
A0 yt = A1 yt−1 + A2 yt−2 + εt ,
Now suppose that there are $r < m$ cointegrating relations in this system, so that $\Pi$ is rank deficient and $\Pi = \alpha\beta'$, where $\alpha$ and $\beta$ are $m \times r$ full column rank matrices. Then
with
$$\Gamma = -A_0^{-1} A_2, \tag{24.40}$$
and
is the structural VEC (SVEC) model, where $\alpha^* = A_0 \alpha$. The central task in SVEC (and SVAR) systems is to estimate the $m^2$ coefficients of $A_0$, $m$ of which can be fixed by suitable normalization restrictions. The remaining $m(m - 1)$ coefficients need to be identified by means of a priori restrictions inspired by economic reasoning. A number of different identification schemes are possible depending on the nature of the available a priori information. Each identification scheme produces a set of instruments for $\Delta y_t$ and so enables the consistent estimation of the unknown parameters in $A_0$. Notice from (24.41) that, if one or more elements of $\alpha^*$ are known and we are able to estimate $\beta$ consistently, then $\beta' y_{t-1}$ can be used as instruments. This idea will be described and illustrated in the following section.
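where $F = \beta_\perp \left[\alpha_\perp' (I_m - \Gamma) \beta_\perp\right]^{-1} \alpha_\perp'$, with $\alpha_\perp' \alpha = 0$ and $\beta' \beta_\perp = 0$, so that ($\Gamma$ is defined by (24.40))
$$y_t = y_0 + F A_0^{-1} \sum_{j=1}^{t} \varepsilon_j + \sum_{i=0}^{\infty} F_i^* u_{t-i} = y_0 + F A_0^{-1} \begin{pmatrix} \sum_{j=1}^{t} \varepsilon_{1j} \\ \sum_{j=1}^{t} \varepsilon_{2j} \end{pmatrix} + \sum_{j=0}^{\infty} F_j^* u_{t-j}.$$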
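These restrictions are necessary and sufficient and apply irrespective of whether the transitory shocks are correlated or not. However, using (24.43) it follows that
$$A_0^{-1} \begin{pmatrix} 0_{(m-r) \times r} \\ I_r \end{pmatrix} = \alpha Q, \tag{24.45}$$
where $Q$ is an arbitrary $r \times r$ nonsingular matrix.3 Hence, after multiplying both sides of (24.45) by $A_0$ we have
$$\begin{pmatrix} 0_{(m-r) \times r} \\ I_r \end{pmatrix} = A_0 \alpha Q = \alpha^* Q = \begin{pmatrix} \alpha_1^* Q \\ \alpha_2^* Q \end{pmatrix}. \tag{24.46}$$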
This in turn implies that $\alpha_1^* = 0_{(m-r) \times r}$, namely the structural equations for which there are known permanent shocks must have no error correction terms present in them, thereby freeing up the latter to be used as instruments in estimating their parameters. More specifically, the identification of the first $m - r$ structural shocks as permanent imposes $r(m - r)$ restrictions on the structural parameters. Also $\alpha_2^* = Q^{-1}$ is an arbitrary nonsingular $r \times r$ matrix.
The restrictions $\alpha_1^* = 0_{(m-r) \times r}$ can then be exploited by noting that the $r$ lagged error correction terms, $\beta' y_{t-1}$, are available to be used as instruments for estimating the structural parameters of the first $m - r$ equations in (24.41). More specifically, under $\alpha_1^* = 0_{(m-r) \times r}$ the first $m - r$ equations can be written as
$$A_{11}^0 \Delta y_{1t} + A_{12}^0 \Delta y_{2t} = -A_{11}^2 \Delta y_{1,t-1} - A_{12}^2 \Delta y_{2,t-1} + \varepsilon_{1t}, \tag{24.47}$$
and it is clear that the $r \times 1$ error correction terms, $\xi_{t-1} = \beta' y_{t-1}$, that do not appear in these equations, but are included in the remaining $r$ equations of (24.41), can be used as instruments for the $m - r$ equations in (24.47). These instruments are clearly uncorrelated with the error terms $\varepsilon_{1t}$, whilst at the same time being correlated with $\Delta y_{1t}$ and $\Delta y_{2t}$ since $\alpha_2^*$ is a nonsingular matrix. Note also that since instrumental variable estimators are unaffected by nonsingular transformations of the instruments, for the purpose of estimating the structural parameters of the first $m - r$ equations ($A_{11}^0$ and $A_{12}^0$) the error correction terms, $\xi_{t-1}$ (or $\beta$), need only be identified up to a nonsingular transformation.
Further discussion on the implications of the permanent/transitory decomposition of shocks
for identification can be found in Pagan and Pesaran (2008).
unt = α 021 yt + α 022 β 2 unt−1 + α 121 yt−1 + α 122 (β 2 unt−1 ) + ε2t .
It is clear that in this setup $\beta_2 \mathit{un}_{t-1}$ does not enter the first equation and can therefore be used as an instrument for $\mathit{un}_t$ in it. So long as $\beta_2 \neq 0$, the value of $\beta_2$ does not matter as the instrumental variable estimator is invariant to it. However, unlike the cointegration case where $\beta_2$ could be estimated super-consistently, this is not possible when $\mathit{un}_t$ is I(0), so that we would need to treat
unt−1 as a regressor in the second equation. That means unt−1 is not available as an instrument
for yt . But the residuals from the first equation form a suitable instrument. This instrumental
variable interpretation of Blanchard and Quah is due to Shapiro and Watson (1988). The prob-
lem with this procedure is that unt−1 is often a very poor instrument for unt and this can lead
to highly non-normal densities for the instrumental variables estimator. Using the same data as
Blanchard and Quah this is shown in Fry and Pagan (2005).
0 1 −1 0 −1 0
β = (β 1 , β 2 ) = , with β 2 = .
0 0 1 −1 1 −1
Gali works with an SVAR in yt , it , ξ 1t and ξ 2t rather than the SVECM that is implied by the
assumptions that there are I(1) variables and cointegration.
The implied SVAR for the first equation has the form
It is clear that we can use $\xi_{1,t-1}$ and $\xi_{2,t-1}$ as instruments for $\xi_{jt}$. But still we need another instrument for $\Delta i_t$. To this end Gali assumes that the long-run effect of the second permanent shock upon $y_t$ is zero, which yields the restriction $\alpha_{12}^0 = -\alpha_{12}^1$, and so the equation can be re-expressed in terms of $\Delta^2 i_t$, allowing $\Delta i_{t-1}$ to be used as an instrument.
The second equation has the form
We can still use the lagged ECM terms as instruments. Assuming that the shocks are uncorre-
lated, we can also use the residuals from the first equation as instruments. Gali adopts the latter
but not the former as instruments.
Under expectations formation mechanisms consistent with the reduced form VECM, the expec-
tational variables E(rt∗ | t−1 ), E(et | t−1 ), and E(pot | t−1 ) can be replaced by the
error correction terms β ξ t−1 − b1 (t − 1) and the lagged changes zt−i , i = 1, 2, . . . , s − 1.
This would yield
rt − arr∗ rt∗ − are et − aro pot = rtb − rt−1 + ρ t−1
s−1
∗
+ φ r β zt−1 − b1 (t − 1) + φ ∗zi zt−i + ε rt ,
i=1
where the parameters φ ∗r and φ ∗zi are functions of arr∗ , are , aro and the coefficients in the rows of
the reduced form model associated with rt∗ , et and pot . Suppose now that rtb is set by solving
the optimization problem
where C(wt , rt ) is the loss function of the monetary authorities, assumed to be quadratic so that
$$C(w_t, r_t) = \frac{1}{2}(w_t - w_t^\dagger)' H (w_t - w_t^\dagger) + \frac{1}{2}\theta (r_t - r_{t-1})^2, \tag{24.49}$$
where $w_t = (y_t, \Delta p_t)'$ and $w_t^\dagger = (y_t^\dagger, \pi_t^\dagger)'$ are the target variables and their desired values, respectively.
where the monetary policy shock is identified by εrt . Note that changes in the preference param-
eters of the monetary authorities affect the magnitude and the speed with which interest rates
respond to economic disequilibria, but such changes have no effect on the long-run coefficients,
β. It is also easily shown that, while changes in the trade-off parameter matrix, H, affect all the
short-run coefficients of the interest rate equation, changes to the desired target values affect only
the intercept term, ar .
The structural interest rate equation (24.50) can now be used, in conjunction with certain
other a priori restrictions, to derive the impulse response functions of the monetary policy
shocks, εrt . See Garratt et al. (2003b) for further details.
24.15 Exercises
1. Consider the VAR(2) model
in the m × 1 vector of random variables, xt , and is the covariance matrix of the errors with
a typical element, σ ij .
(a) Derive the conditions under which this process is stationary, and show that it has the
following moving average representation
$$x_t = \sum_{j=0}^{\infty} A_j \varepsilon_{t-j}.$$
xt = A(L)ε t ,
A(L) = A0 + A1 L + A2 L2 + . . . ,
and suppose that $\|A_h\| < K\lambda^h$, where $K$ is a fixed positive constant and $0 \leq \lambda < 1$, and $\|A_h\|$ represents a matrix norm.
(a) Show that there exist the infinite-order polynomials B(L) and G(L) such that
xt = B(L)ut ,
3. Consider the following stationary VAR(1) model in $z_t = (y_t, x_t)'$
$$z_t = \Phi z_{t-1} + e_t,$$
where $\Phi = (\varphi_{ij})$ is a $2 \times 2$ matrix of unknown parameters, and $e_t = (e_{yt}, e_{xt})'$ is a 2-dimensional vector of reduced form errors. Define the effects of a permanent shock to the $x_t$ process on itself and on $y_t$ in the long run by
$$g_x = \lim_{s \to \infty} E\left(x_{t+s} \mid I_{t-1}, e_{x,t+h} = \sigma_x, \text{ for } h = 0, 1, 2, \ldots\right),$$
and
$$g_y = \lim_{s \to \infty} E\left(y_{t+s} \mid I_{t-1}, e_{x,t+h} = \sigma_x, \text{ for } h = 0, 1, 2, \ldots\right),$$
$$\theta = \frac{g_y}{g_x} = \frac{\omega + \varphi_{12} - \omega\varphi_{22}}{1 - (\varphi_{11} - \omega\varphi_{21})},$$
4. Assume that $y_t$ and $x_t$ are $m \times 1$ vectors of random variables that follow the following VAR(1)
processes
yt = yt−1 + ut ,
xt = xt−1 + ε t .
(b) Derive the generalized impulse response functions of a unit (one standard error) shock
to the ith element of ut on zt process, assuming that
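$$\mathrm{Var}\begin{pmatrix} u_t \\ \varepsilon_t \end{pmatrix} = \begin{pmatrix} \Sigma_{uu} & \Sigma_{u\varepsilon} \\ \Sigma_{\varepsilon u} & \Sigma_{\varepsilon\varepsilon} \end{pmatrix}.$$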
(c) Discuss estimation of the impulse response function under (b), again assuming that only
observations on zt are available.
(d) How do your responses to the above questions are altered if and do not commute?
25.1 Introduction
Modelling of conditional volatilities and correlations across asset returns is part of portfolio
decision making and risk management. In risk management the Value at Risk (VaR) of a
given portfolio can be computed using univariate volatility models, but a multivariate model is
needed for portfolio decisions.1 Even in risk management the use of a multivariate model would
be desirable when a number of alternative portfolios of the same universe of m assets are under
consideration. By using the same multivariate volatility model marginal contributions of differ-
ent assets towards the overall portfolio risk can be computed in a consistent manner. Multivariate
volatility models are also needed for determination of hedge ratios and leverage factors.
There exists a growing literature on multivariate volatility modelling. A general class of such
models is the multivariate generalized autoregressive conditional heteroskedastic (MGARCH)
specification (Engle and Kroner (1995)). However, the number of unknown parameters in the
unrestricted MGARCH model rises exponentially with m and its estimation will not be possi-
ble even for a modest number of assets. To deal with the curse of dimensionality the dynamic
conditional correlations (DCC) model is proposed by Engle (2002) which generalizes an ear-
lier specification in Bollerslev (1990) by allowing for time variations in the correlation matrix.
This is achieved parsimoniously by separating the specification of the conditional volatilities
from that of the conditional correlations. The latter are then modelled in terms of a small number of unknown parameters, which avoids the curse of dimensionality. DCC is an attractive estimation procedure which is reasonably flexible in modelling individual volatilities and can
be applied to portfolios with a large number of assets. Pesaran and Pesaran (2010) propose a
DCC model combined with a multivariate t-distribution assumption on the distribution of asset
returns. Indeed, in many applications in finance the t-distribution seems more appropriate to
capture the fat-tailed nature of the distribution of asset returns. The authors suggest a simulta-
neous approach for estimating the parameters, including the degree-of-freedom parameter of the
multivariate t-distribution, of a t-DCC model.
OUP CORRECTED PROOF – FINAL, 5/9/2015, SPi
where $\Omega_{t-1}$ is the information set available at close of day t − 1, and $\Sigma_{t-1}$ is assumed to be
non-singular. Here we are not concerned with how mean returns are predicted and take $\mu_{t-1}$ as given
and equal to a zero vector.2
$$\Sigma_{t-1} = \lambda \Sigma_{t-2} + \frac{(1-\lambda)}{(1-\lambda^{n})}\, r_{t-1} r_{t-1}' - \frac{(1-\lambda)}{(1-\lambda^{n})}\, \lambda^{n}\, r_{t-n-1} r_{t-n-1}', \tag{25.1}$$
for a constant parameter 0 < λ < 1, and a window of size n. Typically, the initialization of the
recursion in (25.1) is based on estimates of the unconditional variances using a pre-sample of
data. For the (i, j)th entry of $\Sigma_{t-1}$ we have
$$\sigma_{ij,t-1} = \frac{(1-\lambda)}{(1-\lambda^{n})} \sum_{s=1}^{n} \lambda^{s-1} r_{i,t-s}\, r_{j,t-s}.$$
The Riskmetrics specification discussed in Chapter 18 is characterized by the fact that n and λ
are fixed a priori. The choice of λ depends on the frequency of the returns. For daily returns the
values of λ = 0.94, 0.95, and 0.96, have often been used. There is an obvious trade-off between
λ and n, with a small λ yielding similar results to a small n. Note that for $\Sigma_{t-1}$ to be non-singular
we require n ≥ m, and it is therefore advisable that a relatively large value is selected for n.
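The windowed exponential smoother above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `ewma_cov` and its argument layout are hypothetical, returns are taken to have zero mean as in the text, and the default values λ = 0.96 and n = 250 are simply those suggested in Exercise 3 of this chapter.

```python
def ewma_cov(returns, lam=0.96, n=250):
    """Windowed EWMA covariance matrix from the n most recent return vectors.

    returns: list of m-element lists ordered from oldest to newest,
             so that returns[-1] plays the role of r_{t-1}.
    (Hypothetical helper; zero mean returns are assumed.)
    """
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie strictly between 0 and 1")
    if len(returns) < n:
        raise ValueError("need at least n observations")
    m = len(returns[0])
    scale = (1.0 - lam) / (1.0 - lam ** n)   # makes the n weights sum to one
    sigma = [[0.0] * m for _ in range(m)]
    for s in range(1, n + 1):
        r = returns[-s]                      # r_{t-s}
        w = scale * lam ** (s - 1)           # weight attached to lag s
        for i in range(m):
            for j in range(m):
                sigma[i][j] += w * r[i] * r[j]
    return sigma
```

Because the weights sum to one, a constant return vector reproduces its own outer product, and updating the window by one period agrees with the recursive form (25.1).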
which form the diagonal matrix $D_{t-1}$. The covariances are based on the same recursion as (25.1)
but using a smoothing parameter, ν, generally different from λ (ν ≤ λ), yielding
$$\sigma_{ij,t-1} = \frac{(1-\nu)}{(1-\nu^{n})} \sum_{s=1}^{n} \nu^{s-1} r_{i,t-s}\, r_{j,t-s}, \quad \text{for } i \neq j.$$
We assume that the same window size, n, applies to variance and covariance recursions. The ratio
$$\rho_{ij,t-1} = \sigma_{ij,t-1} \big/ \sqrt{\sigma_{ii,t-1}\, \sigma_{jj,t-1}} \tag{25.2}$$
represents the (i, j)th entry of the correlation matrix $R_{t-1}$, with $\Sigma_{t-1} = D_{t-1}^{1/2} R_{t-1} D_{t-1}^{1/2}$. The
parameters ν and λ are not estimated but calibrated a priori, as for the one-parameter EWMA
model.
$$\sigma_{ij,t-1} = \frac{(1-\nu)}{(1-\nu^{n})} \sum_{s=1}^{n} \nu^{s-1} \varepsilon_{i,t-s}\, \varepsilon_{j,t-s},$$
which after normalization according to (25.2), yields the conditional correlation matrix, $R_{t-1}$,
and hence $\Sigma_{t-1} = D_{t-1}^{1/2} R_{t-1} D_{t-1}^{1/2}$.
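where
$$D_{t-1} = \begin{pmatrix} \sigma_{1,t-1} & 0 & \cdots & 0 \\ 0 & \sigma_{2,t-1} & \cdots & \vdots \\ \vdots & \vdots & \ddots & 0 \\ 0 & 0 & \cdots & \sigma_{m,t-1} \end{pmatrix},$$
$$R_{t-1} = \begin{pmatrix} 1 & \rho_{12,t-1} & \cdots & \rho_{1m,t-1} \\ \rho_{21,t-1} & 1 & \cdots & \rho_{2m,t-1} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{m1,t-1} & \rho_{m2,t-1} & \cdots & 1 \end{pmatrix}.$$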
$D_{t-1}$ is an m × m diagonal matrix with elements $\sigma_{i,t-1}$, i = 1, 2, ..., m, denoting the conditional
volatilities of asset returns, and $R_{t-1}$ is the symmetric m × m matrix of pair-wise conditional
correlations. More specifically, the conditional volatility for the ith asset return is defined as
$$\sigma_{i,t-1}^{2} = V(r_{it} \mid \Omega_{t-1}),$$
and the conditional pair-wise return correlation between the ith and the jth asset is
$$\rho_{ij,t-1} = \rho_{ji,t-1} = \frac{Cov(r_{it}, r_{jt} \mid \Omega_{t-1})}{\sigma_{i,t-1}\, \sigma_{j,t-1}}.$$
where $\bar{\sigma}_i^{2}$ is the unconditional variance of the ith asset return. Note that in (25.4) we allow the
parameters $\lambda_{1i}$, $\lambda_{2i}$ to differ across assets. An alternative approach to model (25.4) would be to
use the conditionally heteroskedastic factor model discussed, for example, in Sentana (2000)
where the vector of unobserved common factors is assumed to be conditionally heteroskedastic.
Parsimony is achieved by assuming that the number of the common factors is much less than the
number of assets under consideration.
Under the restriction $\lambda_{1i} + \lambda_{2i} = 1$, unconditional variance does not exist. In this case we
have the integrated GARCH (IGARCH) model used extensively in the professional financial
community 3
$$\sigma_{i,t-1}^{2} = (1-\lambda_i) \sum_{s=1}^{\infty} \lambda_i^{s-1} r_{i,t-s}^{2}, \quad 0 < \lambda_i < 1. \tag{25.5}$$
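The volatility recursions can be illustrated with a short sketch. Equation (25.4) is not reproduced in the text above, so the code assumes the usual mean-reverting GARCH(1,1) form, $\sigma_{i,t-1}^{2} = \bar{\sigma}_i^{2}(1 - \lambda_{1i} - \lambda_{2i}) + \lambda_{1i}\sigma_{i,t-2}^{2} + \lambda_{2i} r_{i,t-1}^{2}$, which collapses to the recursive counterpart of (25.5) when $\lambda_{1i} + \lambda_{2i} = 1$. The function name `garch11_variances` and the initialization at the unconditional variance are illustrative choices, not taken from the source.

```python
def garch11_variances(r, lam1, lam2, sigma2_bar=None):
    """Conditional variances from an assumed GARCH(1,1) recursion.

    r:          list of returns for one asset, oldest first
    lam1, lam2: weights on the lagged variance and the lagged squared return
    sigma2_bar: unconditional variance used for the intercept and the
                start-up value (defaults to the sample mean of r squared)
    Returns a list of length len(r) + 1; element k is the variance for
    period k conditional on returns up to period k - 1.
    """
    if sigma2_bar is None:
        sigma2_bar = sum(x * x for x in r) / len(r)
    # The intercept vanishes in the integrated case lam1 + lam2 = 1.
    intercept = sigma2_bar * (1.0 - lam1 - lam2)
    var = [sigma2_bar]
    for x in r:
        var.append(intercept + lam1 * var[-1] + lam2 * x * x)
    return var
```

With `lam2 = 1 - lam1` the last element approaches the exponentially weighted sum in (25.5) as the sample grows, since the effect of the start-up value decays at the rate `lam1 ** len(r)`.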
For cross-asset correlations, Engle proposes the use of the following exponential smoother
applied to the ‘standardized returns’
$$\tilde{\rho}_{ij,t-1} = \frac{q_{ij,t-1}}{\sqrt{q_{ii,t-1}\, q_{jj,t-1}}},$$
where $q_{ij,t-1}$ are given by
In (25.6), $\bar{\rho}_{ij}$ is the (i, j)th unconditional correlation, $\phi_1$, $\phi_2$ are parameters such that
$\phi_1 + \phi_2 < 1$, and $\tilde{r}_{i,t-1}$ are the standardized asset returns. Under $\phi_1 + \phi_2 < 1$, the
process is mean reverting. In the case $\phi_1 + \phi_2 = 1$, we have
$$\tilde{r}_{i,t-1} = \tilde{r}_{i,t-1}^{exp} = \frac{r_{it}}{\sigma_{i,t-1}}, \tag{25.7}$$
where σ i,t−1 is given either by (25.4) or, in the case of non-mean reverting volatilities, by (25.5).
We refer to (25.7) as the ‘exponentially weighted returns’.
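The correlation smoother can be sketched as follows. Equation (25.6) is not reproduced in the text above, so the code assumes the standard DCC form associated with Engle (2002), $q_{ij,t-1} = \bar{\rho}_{ij}(1 - \phi_1 - \phi_2) + \phi_1 q_{ij,t-2} + \phi_2 \tilde{r}_{i,t-1}\tilde{r}_{j,t-1}$, followed by the normalization given earlier. The function name `dcc_correlations` and the start-up at the unconditional correlations are illustrative assumptions.

```python
import math


def dcc_correlations(z, phi1, phi2, rho_bar):
    """Conditional correlation matrices from an assumed DCC(1,1) recursion.

    z:       list of m-vectors of standardized returns, oldest first
    phi1:    weight on the lagged q matrix
    phi2:    weight on the cross-products of standardized returns
    rho_bar: m x m unconditional correlation matrix (list of lists)
    Returns one m x m correlation matrix per observation in z.
    """
    m = len(rho_bar)
    q = [row[:] for row in rho_bar]          # start at the unconditional values
    out = []
    for zt in z:
        q = [[rho_bar[i][j] * (1.0 - phi1 - phi2)
              + phi1 * q[i][j]
              + phi2 * zt[i] * zt[j]
              for j in range(m)] for i in range(m)]
        # Normalize so that the diagonal equals one.
        out.append([[q[i][j] / math.sqrt(q[i][i] * q[j][j])
                     for j in range(m)] for i in range(m)])
    return out
```

The normalization step is what guarantees a unit diagonal; the q matrix itself is only an intermediate quantity and need not be a correlation matrix.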
An alternative way of standardizing returns is to use a measure of realized volatility (Pesaran
and Pesaran (2010))
$$\tilde{r}_{i,t-1} = \tilde{r}_{i,t-1}^{devol} = \frac{r_{it}}{\sigma_{i,t-1}^{realized}}, \tag{25.8}$$
where $\sigma_{i,t-1}^{realized}$ is a proxy for the realized volatility of the ith return during day t. The use of
r̃it is data intensive and requires intra-daily observations. Although intra-daily observations are
becoming increasingly available across a large number of assets, it would still be desirable to work
with a version of r̃it that does not require intra-daily observations, but is nevertheless capable of
rendering the devolatized returns approximately Gaussian. One of the main reasons for the
non-Gaussian behavior of daily returns is the presence of jumps in the return process, as documented
for a number of markets in the literature (see, e.g., Barndorff-Nielsen and Shephard (2002)). The
standardized return (25.7) does not deal with such jumps, since the jump process that affects
the numerator of $\tilde{r}_{i,t-1}^{exp}$ in day t does not enter the denominator of $\tilde{r}_{i,t-1}^{exp}$, which is based on past
returns and excludes the day t return, $r_{it}$. The problem is accentuated due to the fact that jumps
are typically independently distributed over time. The use of realized volatility ensures that the
numerator and the denominator of the devolatized returns, r̃it , are both affected by the same
jumps in day t.
Pesaran and Pesaran (2010) have suggested the following approximation for the realized
volatility
$$\tilde{\sigma}_{it}^{2}(p) = \frac{\sum_{s=0}^{p-1} r_{i,t-s}^{2}}{p}. \tag{25.9}$$
The lag-order, p, needs to be chosen carefully. We refer to returns (25.8) where the realized
volatility is estimated using (25.9) as ‘devolatized returns’. In a series of papers Andersen,
Bollerslev, and Diebold show that daily foreign exchange and stock returns standardized
by realized volatility are approximately Gaussian (see, e.g., Andersen, Bollerslev, Diebold, and
Ebens (2001), and Andersen et al. (2001)).
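Note that $\tilde{\sigma}_{it}^{2}(p)$ is not the same as the rolling historical estimate of $\sigma_{it}$ defined by
$$\hat{\sigma}_{it}^{2}(p) = \frac{\sum_{s=1}^{p} r_{i,t-s}^{2}}{p}.$$
Specifically,
$$\tilde{\sigma}_{it}^{2}(p) - \hat{\sigma}_{it}^{2}(p) = \frac{r_{it}^{2} - r_{i,t-p}^{2}}{p}.$$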
It is the inclusion of the current squared returns, $r_{it}^{2}$, in the estimation of $\tilde{\sigma}_{it}^{2}$ that seems to be
critical in the transformation of $r_{it}$ (which is non-Gaussian) into $\tilde{r}_{it}$, which seems to be approximately
Gaussian.
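The difference between the two volatility measures is easy to see in code. The sketch below is illustrative only: the function names are hypothetical, returns are stored oldest first in a plain list, and the value of p used in any application has to be chosen by the user.

```python
import math


def realized_vol_proxy(r, t, p):
    """Square root of (25.9): uses r[t], r[t-1], ..., r[t-p+1] (needs t >= p - 1)."""
    return math.sqrt(sum(r[t - s] ** 2 for s in range(p)) / p)


def rolling_hist_vol(r, t, p):
    """Rolling historical estimate: uses r[t-1], ..., r[t-p] (needs t >= p)."""
    return math.sqrt(sum(r[t - s] ** 2 for s in range(1, p + 1)) / p)


def devolatize(r, p):
    """Devolatized returns r[t] / realized_vol_proxy(r, t, p) for t >= p - 1.

    Raises ZeroDivisionError if p consecutive returns are all exactly zero.
    """
    return [r[t] / realized_vol_proxy(r, t, p) for t in range(p - 1, len(r))]
```

Since the current squared return enters the denominator, a devolatized return can never exceed $\sqrt{p}$ in absolute value, which is one way to see how a jump in day t is damped by this scaling.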
This decomposition allows the size of the estimation window to vary by moving the index
s along the time axis in order to accommodate estimation of the unknown parameters using
expanding or rolling observation windows, with different estimation update frequencies. For
example, for an expanding estimation window we set s = T0 + 1. For a rolling window of size W
we need to set s = T + 1 − W. The whole estimation process can then be rolled into the future
with an update frequency of h by carrying out the estimations at T + h, T + 2h, …, using either
expanding or rolling estimation samples from t = s.
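The bookkeeping for expanding and rolling windows can be sketched as below. The helper name `estimation_windows` and its arguments are hypothetical; the code only enumerates the start and end dates of each estimation sample and assumes that re-estimation stops before the end of an evaluation sample of length N.

```python
def estimation_windows(T, N, h, T0=0, W=None):
    """List the (s, end) pairs of successive estimation samples.

    T:  last observation of the first estimation sample
    N:  length of the evaluation sample that follows T
    h:  update frequency (number of periods between re-estimations)
    T0: number of start-up observations (expanding window: s = T0 + 1)
    W:  window size (rolling window: s = end + 1 - W); None means expanding
    """
    windows = []
    end = T
    while end < T + N:
        s = T0 + 1 if W is None else end + 1 - W
        windows.append((s, end))
        end += h
    return windows
```

For example, `estimation_windows(710, 96, 20, W=500)` starts with the samples (211, 710) and (231, 730), whereas with `W=None` every sample starts at observation 1.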
In the non-mean reverting case these intercept coefficients disappear, but for initialization of the
recursive relations (25.4) and (25.6) it is still advisable to use unconditional estimates of the
correlation matrix and asset return volatilities.
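$$l_t(\theta) = \sum_{\tau=s}^{t} f_\tau(\theta),$$
$$f_\tau(\theta) = -\frac{m}{2}\ln(2\pi) - \frac{1}{2}\ln|R_{\tau-1}(\theta)| - \ln|D_{\tau-1}(\lambda_1,\lambda_2)| - \frac{1}{2}\, e_\tau' D_{\tau-1}^{-1}(\lambda_1,\lambda_2)\, R_{\tau-1}^{-1}(\theta)\, D_{\tau-1}^{-1}(\lambda_1,\lambda_2)\, e_\tau,$$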
with eτ = rτ − μτ −1 . For estimation of the unknown parameters, Engle (2002) shows that the
log-likelihood function of the DCC model can be maximized using a two-step procedure. In the
first step, m univariate GARCH models are estimated separately. In the second step using stan-
dardized residuals, computed from the estimated volatilities from the first stage, the parameters
of the conditional correlations are then estimated. The two-step procedure can then be iterated
if desired for full maximum likelihood estimation. Note that under Engle’s specification Rt−1
depends on λ1 and λ2 as well as on φ 1 and φ 2 .
This procedure has two main drawbacks. First, the Gaussian assumption in general does not
hold for daily returns (see Chapter 7) and its use can under-estimate the portfolio risk. Second,
the two-stage approach is likely to be inefficient even under Gaussianity.
For further details on ML estimation using Gaussian returns, see Engle (2002).
$$l_t(\theta) = \sum_{\tau=s}^{t} f_\tau(\theta), \tag{25.12}$$
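$$f_\tau(\theta) = -\frac{m}{2}\ln(\pi) - \frac{1}{2}\ln|R_{\tau-1}(\theta)| - \ln|D_{\tau-1}(\lambda_1,\lambda_2)| + \ln\left[\Gamma\left(\frac{m+v}{2}\right) \Big/ \Gamma\left(\frac{v}{2}\right)\right] - \frac{m}{2}\ln(v-2)$$
$$\qquad - \left(\frac{m+v}{2}\right) \ln\left[1 + \frac{e_\tau' D_{\tau-1}^{-1}(\lambda_1,\lambda_2)\, R_{\tau-1}^{-1}(\theta)\, D_{\tau-1}^{-1}(\lambda_1,\lambda_2)\, e_\tau}{v-2}\right], \tag{25.13}$$
with
$$\ln|D_{\tau-1}(\lambda_1,\lambda_2)| = \sum_{i=1}^{m} \ln\left[\sigma_{i,\tau-1}(\lambda_{1i},\lambda_{2i})\right].$$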
Under the specification based on devolatized returns, Rt−1 does not depend on λ1 and λ2 , but
depends on φ 1 and φ 2 , and p, the lag order used in the devolatization process. Under the spec-
ification based on exponentially weighted returns, Rt−1 depends on λ1 and λ2 as well as on φ 1
and φ 2 .
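A single term of the t-DCC log-likelihood can be evaluated as in the following sketch. It is not the estimation routine used in the chapter: the function names are hypothetical, the conditional volatilities and the correlation matrix are taken as given inputs, and a small Cholesky factorization is written out by hand so that only the standard library is needed.

```python
import math


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def t_dcc_loglik_term(e, sig, R, v):
    """One-period log-density as in (25.13).

    e:   residual vector e_tau
    sig: conditional volatilities (diagonal of D)
    R:   conditional correlation matrix
    v:   degrees of freedom, v > 2
    """
    m = len(e)
    low = cholesky(R)
    u = [e[i] / sig[i] for i in range(m)]            # D^{-1} e
    y = []                                           # solves low * y = u
    for i in range(m):
        y.append((u[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i])
    quad = sum(val * val for val in y)               # u' R^{-1} u
    log_det_R = 2.0 * sum(math.log(low[i][i]) for i in range(m))
    log_det_D = sum(math.log(s) for s in sig)
    return (-0.5 * m * math.log(math.pi) - 0.5 * log_det_R - log_det_D
            + math.lgamma(0.5 * (m + v)) - math.lgamma(0.5 * v)
            - 0.5 * m * math.log(v - 2.0)
            - 0.5 * (m + v) * math.log(1.0 + quad / (v - 2.0)))
```

Summing this function over τ = s, ..., t gives the log-likelihood in (25.12) for given values of the volatilities, correlations, and v; a numerical optimizer would then be needed to maximize over the underlying parameters.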
In practice the simultaneous estimation of all the parameters of the DCC model could be
problematic, since it can encounter convergence problems, or could lead to a local maximum of the
likelihood function. When the returns are conditionally Gaussian one could simplify (at the expense
of some loss of estimation efficiency) the computations by adopting Engle’s two-stage estimation
procedure. But in the case of t-distributed returns the use of such a two-stage procedure could
lead to contradictions. For example, estimation of separate t-GARCH(1, 1) models for individ-
ual asset returns can lead to different estimates of v, while the multivariate t-distribution requires
v to be the same across all assets.4
4 Marginal distributions associated with a multivariate t-distribution with v degrees of freedom are also t-distributed
with the same degrees of freedom.
Suppose that we are interested in computing the capital Value at Risk (VaR) of this portfolio
expected at the close of business on day t − 1 with probability 1 − α, which we denote by
VaR(wt−1 , α). For this purpose we require that
$$\Pr\left[ w_{t-1}' r_t < -VaR(w_{t-1},\alpha) \mid \Omega_{t-1} \right] \le \alpha.$$
Under our assumptions, conditional on $\Omega_{t-1}$, $w_{t-1}' r_t$ has a Student t-distribution with mean
$w_{t-1}'\mu_{t-1}$, variance $w_{t-1}'\Sigma_{t-1}w_{t-1}$, and degrees of freedom v. Hence
$$z_t = \sqrt{\frac{v}{v-2}} \left( \frac{w_{t-1}' r_t - w_{t-1}'\mu_{t-1}}{\sqrt{w_{t-1}'\Sigma_{t-1}w_{t-1}}} \right),$$
conditional on $\Omega_{t-1}$ will also have a Student t-distribution with v degrees of freedom. It is easily
verified that $E(z_t \mid \Omega_{t-1}) = 0$, and $V(z_t \mid \Omega_{t-1}) = v/(v-2)$. Denoting the cumulative
distribution function of a Student's t with v degrees of freedom by $F_v(z)$, $VaR(w_{t-1},\alpha)$ will be given as
the solution to
$$F_v\left( \frac{-VaR(w_{t-1},\alpha) - w_{t-1}'\mu_{t-1}}{\sqrt{\frac{v-2}{v}\, w_{t-1}'\Sigma_{t-1}w_{t-1}}} \right) \le \alpha.$$
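Since $F_v(z)$ is continuous and monotonically increasing in z, the constraint binds when the argument
of $F_v$ equals $-c_\alpha$, where $c_\alpha$ is the α per cent critical value of a Student t-distribution with v degrees
of freedom. Therefore,
$$VaR(w_{t-1},\alpha) = \tilde{c}_\alpha \sqrt{w_{t-1}'\Sigma_{t-1}w_{t-1}} - w_{t-1}'\mu_{t-1}, \tag{25.16}$$
where $\tilde{c}_\alpha = c_\alpha \sqrt{\frac{v-2}{v}}$.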
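The VaR formula (25.16) translates directly into code. In the sketch below the function name `portfolio_var` is hypothetical, and the critical value $c_\alpha$ is passed in by the user (for example, from standard t tables) rather than computed, since the Python standard library does not provide Student t quantiles.

```python
import math


def portfolio_var(w, mu, sigma, v, c_alpha):
    """Value at Risk from (25.16) for a portfolio with weights w.

    w:       list of m portfolio weights
    mu:      list of m conditional mean returns
    sigma:   m x m conditional covariance matrix (list of lists)
    v:       degrees of freedom of the multivariate t, v > 2
    c_alpha: positive critical value with F_v(-c_alpha) = alpha,
             supplied by the user
    """
    m = len(w)
    mean = sum(w[i] * mu[i] for i in range(m))
    variance = sum(w[i] * sigma[i][j] * w[j]
                   for i in range(m) for j in range(m))
    c_tilde = c_alpha * math.sqrt((v - 2.0) / v)     # rescaled critical value
    return c_tilde * math.sqrt(variance) - mean


# Illustrative inputs: two assets, equal weights, zero means, and the
# tabulated 1 per cent critical value 2.681 for v = 12.
example_var = portfolio_var([0.5, 0.5], [0.0, 0.0],
                            [[4.0, 0.0], [0.0, 4.0]], 12.0, 2.681)
```

The factor $\sqrt{(v-2)/v}$ reflects the fact that $\Sigma_{t-1}$ is the covariance matrix of returns, whereas the standard Student t variate has variance $v/(v-2)$.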
Following Engle and Manganelli (2004), a simple test of the validity of the t-DCC model can be
conducted recursively using the indicator statistics
$$d_t = I\left( w_{t-1}' r_t + VaR(w_{t-1},\alpha) \right), \tag{25.17}$$
where I(A) is an indicator function, equal to unity if A > 0 and zero otherwise. These
indicator statistics can be computed in-sample or preferably can be based on a recursive
out-of-sample one-step ahead forecast of $\Sigma_{t-1}$ and $\mu_{t-1}$, for a given (predetermined) set of
portfolio weights, $w_{t-1}$. In such an out-of-sample exercise the parameters of the mean returns
and the volatility variables (β and θ, respectively) could either be kept fixed at the start of
the evaluation sample or changed with an update frequency of h periods (for example with
h = 5 for weekly updates, or h = 20 for monthly updates). For the evaluation sample,
$S_{eval} = \{r_t,\ t = T+1, T+2, \ldots, T+N\}$, the mean hit rate is given by
$$\hat{\pi}_N = \frac{1}{N} \sum_{t=T+1}^{T+N} d_t. \tag{25.18}$$
Under the t-DCC specification, π̂ N will have mean 1 − α and variance α(1 − α)/N, and the
standardized statistic,
$$z_\pi = \frac{\sqrt{N}\left[ \hat{\pi}_N - (1-\alpha) \right]}{\sqrt{\alpha(1-\alpha)}}, \tag{25.19}$$
will have a standard normal distribution for a sufficiently large evaluation sample size, N. This
result holds irrespective of whether the unknown parameters are estimated recursively or fixed
at the start of the evaluation sample. In such cases the validity of the test procedure requires that
N/T → 0 as (N, T) → ∞. For further details on this statistic, see Pesaran and Timmermann
(2005a).
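The hit-rate test in (25.17) to (25.19) can be sketched as follows. The function name `var_hit_test` is hypothetical, and the inputs are assumed to be the realized portfolio returns together with the corresponding one-step ahead VaR forecasts over the evaluation sample.

```python
import math


def var_hit_test(portfolio_returns, var_forecasts, alpha):
    """Mean hit rate (25.18) and standardized statistic (25.19).

    portfolio_returns: realized values of w'r_t over the evaluation sample
    var_forecasts:     matching VaR(w, alpha) forecasts (positive numbers)
    alpha:             tolerance probability, e.g. 0.01
    """
    # d_t = 1 when the VaR constraint is respected, 0 when it is violated.
    hits = [1 if r + var > 0 else 0
            for r, var in zip(portfolio_returns, var_forecasts)]
    n = len(hits)
    pi_hat = sum(hits) / n
    z_pi = math.sqrt(n) * (pi_hat - (1.0 - alpha)) / math.sqrt(alpha * (1.0 - alpha))
    return pi_hat, z_pi
```

As a check on the arithmetic, an evaluation sample of N = 96 with α = 0.01 gives $z_\pi$ of about −2.09 for three violations and about −3.12 for four, which are the values reported in the empirical application of this chapter.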
The $z_\pi$ statistic provides evidence on the performance of $\Sigma_{t-1}$ and $\mu_{t-1}$ in an average
(unconditional) sense. An alternative conditional evaluation procedure can be based on proba-
bility integral transforms
$$\hat{U}_t = F_v\left( \frac{w_{t-1}' r_t - w_{t-1}'\hat{\mu}_{t-1}}{\sqrt{\frac{v-2}{v}\, w_{t-1}'\hat{\Sigma}_{t-1}w_{t-1}}} \right), \quad t = T+1, T+2, \ldots, T+N. \tag{25.20}$$
Under the null hypothesis of correct specification of the t-DCC model, the probability trans-
form estimates, Ût , are serially uncorrelated and uniformly distributed over the range (0, 1).
Both of these properties can be readily tested. The serial correlation property of $\hat{U}_t$ can be tested
by Lagrange Multiplier tests using OLS regressions of $\hat{U}_t$ on an intercept and the lagged values
$\hat{U}_{t-1}, \hat{U}_{t-2}, \ldots, \hat{U}_{t-s}$, where the maximum lag length, s, can be selected by using the AIC.
The uniformity of the distribution of $\hat{U}_t$ over t can be tested using the Kolmogorov–Smirnov
statistic defined by $KS_N = \sup_x |F_{\hat{U}}(x) - U(x)|$, where $F_{\hat{U}}(x)$ is the empirical cumulative
distribution function (CDF) of the $\hat{U}_t$, for t = T + 1, T + 2, ..., T + N, and U(x) = x
is the CDF of IIDU[0, 1]. Large values of the Kolmogorov–Smirnov statistic, $KS_N$, indicate that
the sample CDF is not similar to the hypothesized uniform CDF.5
5 For details of the Kolmogorov-Smirnov test and its critical values see, e.g., Neave and Worthington (1992, pp. 89–93).
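The Kolmogorov–Smirnov statistic against the uniform distribution can be computed with a few lines of code. The function name `ks_uniform` is hypothetical, and the sketch returns only the statistic; it does not compute exact critical values. For large N a commonly used asymptotic 5 per cent critical value is $1.36/\sqrt{N}$, which for N = 96 is about 0.1388.

```python
def ks_uniform(u):
    """Kolmogorov-Smirnov distance between the empirical CDF of u and U(x) = x.

    u: list of probability integral transforms, each in [0, 1]
    """
    x = sorted(u)
    n = len(x)
    d = 0.0
    for k, val in enumerate(x, start=1):
        # The empirical CDF jumps from (k - 1) / n to k / n at val, so the
        # largest gap to the 45-degree line occurs on one side of the jump.
        d = max(d, k / n - val, val - (k - 1) / n)
    return d
```

Checking both sides of each jump matters because the supremum in the definition of $KS_N$ can be attained just before an observation as well as at it.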
$$\bar{\sigma}_{i,T}^{2} = T^{-1} \sum_{\tau=1}^{T} r_{i\tau}^{2},$$
λ̂1i,T and λ̂2i,T are the ML estimates of λ1i and λ2i computed using the observations over the
estimation sample Sest = {rt , t = s, s + 1, . . . , T}, and σ̂ 2i,T−1 is the ML estimate of σ 2i,T−1 ,
based on the estimates σ̄ 2i,T−1 , λ̂1i,T−1 and λ̂2i,T−1 .
Similarly, the one step-ahead forecast of $\rho_{ij,T}$ (using either exponentially weighted returns
(25.7) or devolatized returns (25.8)) is given by
$$\hat{\rho}_{ij,T}(\phi) = \frac{\hat{q}_{ij,T}}{\sqrt{\hat{q}_{ii,T}\, \hat{q}_{jj,T}}},$$
where
As before, $\hat{\phi}_{1T}$ and $\hat{\phi}_{2T}$ are the ML estimates of $\phi_1$ and $\phi_2$ computed using the estimation
sample, and q̂ij,T−1 is the ML estimate of qij,T−1 , based on the estimates ρ̄ ij,T−1 , φ̂ 1T−1 and
φ̂ 2T−1 .
• 6 currencies: British pound (GBP), euro (EU), Japanese yen ( JPY), Swiss franc (CHF),
Canadian dollar (CAD), and Australian dollar (AD).
• 4 government bonds: US T-Note 10Y (BU), Europe euro bund 10Y (BE), Japan govern-
ment bond 10Y ( JGB), and UK long gilts 8.75-13Y (BG).
• 7 equity index futures: S&P 500 (SP), FTSE 100 (FTSE), German DAX (DAX), French
CAC40 (CAC), Swiss Market Index (SM), Australia SPI200 (AUS), and Nikkei 225 (NK).
The weekly returns are computed from daily prices obtained from Datastream and cover the
period from 7 January 1994 to 30 October 2009.
Table 25.1 Summary statistics for raw weekly returns and devolatized weekly returns over 1 April 1994 to 20
October 2009

                    Returns                                 Devolatized returns
Asset               Mean      S.D.      Skewness  Ex. kurt. Mean      S.D.      Skewness  Ex. kurt.
Currencies
Australian dollar   0.044     1.690     –1.163    7.886     0.059     1.005     –0.214    –0.112
British pound       0.019     1.297     –0.831    5.348     0.037     1.013     –0.148    –0.197
Canadian dollar     0.035     1.136     –0.739    7.443     0.031     1.023     –0.040    –0.266
Swiss franc         0.053     1.517     0.210     1.071     0.044     0.994     0.146     –0.299
Euro                0.039     1.381     –0.043    1.424     0.044     1.012     –0.008    –0.281
Yen                 0.031     1.669     1.326     9.462     –0.009    1.016     0.328     0.139
Bonds
Euro Bunds          0.070     0.755     –0.378    0.910     0.123     1.000     –0.210    –0.205
UK Gilt             0.051     0.893     –0.013    1.744     0.068     1.008     –0.015    –0.290
Japan JGB           0.072     0.578     –0.436    2.323     0.152     1.007     –0.364    0.022
US T-Note           0.077     0.894     –0.359    0.954     0.084     1.004     –0.243    –0.188
Equities
S&P 500             0.094     2.575     –0.749    8.018     0.054     1.011     –0.314    –0.124
Nikkei              –0.017    3.175     –0.979    9.645     –0.005    0.996     –0.235    –0.147
FTSE                0.060     2.535     –0.858    10.399    0.042     1.002     –0.264    –0.132
CAC                 0.107     3.116     –0.656    5.473     0.043     1.003     –0.216    –0.478
DAX                 0.113     3.398     –0.559    5.673     0.055     1.008     –0.312    –0.220
SM                  0.137     2.819     –0.734    10.174    0.077     1.005     –0.349    0.077
AUS                 0.083     2.118     –0.670    4.698     0.066     1.001     –0.224    –0.253
weekly returns on AD, BP, and JY falls from 7.89, 5.35, and 9.46 to –0.112, –0.020, and 0.139,
respectively. Out of the four ten year government bonds only the weekly returns on Japanese
government bond show some degree of excess kurtosis which is eliminated once the returns are
devolatized. It is also interesting to note that the standard deviations of the devolatized returns
are now all very close to unity, which allows for a more direct comparison of the devolatized returns
across assets.
25.8.2 ML estimation
It is well established that daily or weekly returns are approximately mean zero serially uncorre-
lated processes and for the purpose of risk analysis it is reasonable to assume that μt−1 = 0.
Using the ML procedure described above, initially we estimate a number of DCC models on the
17 weekly returns over the period 27 May 1994 to 28 Dec 2007 (710 observations). We then use
the post estimation sample observations from 4 January, 2008 to 30 October, 2009 for the evalu-
ation of the estimated volatility models using the VaR and distribution free diagnostics.6 We also
provide separate t-DCC models for currencies, bonds and equities for purposes of comparison.
We begin with the unrestricted version of the DCC(1,1) model with asset-specific volatility
parameters λ1 = (λ11 , λ12 , . . . , λ1m ) , λ2 = (λ21 , λ22 , . . . , λ2m ) , and common conditional
correlation parameters, φ 1 and φ 2 , and the degrees-of-freedom parameter, v, under conditionally
t distributed returns (note that m = 17). We did not encounter any convergence problems, and
obtained the same ML estimates when starting from different initial parameter values. But to
achieve convergence in some applications we had to experiment with different initial values. In
particular we found the initial values λ1i = 0.95, λ2i = 0.05, φ 1 = 0.96, φ 2 = 0.03 and v = 12
to work relatively well. Also the sum of unrestricted estimates of λ1 and λ2 for the Canadian
dollar exceeded 1, and to ensure a non-explosive outcome we estimated its volatility equation
subject to the restriction λ1,CD + λ2,CD = 1.
To evaluate the statistical significance of the multivariate t-distribution for the analysis of
return volatilities, in Table 25.2 we first provide the maximized log-likelihood values under mul-
tivariate normal and t-distributions for currencies, bonds and equities separately, as well as for all
the 17 assets jointly. These results are reported for both the standardized and devolatized returns.
Table 25.2 Maximized log-likelihood values of DCC models estimated with weekly returns over 27 May
1994 to 28 December 2007

                  Standardized returns                    Devolatized returns
Assets            Normal      t           D.F.            Normal      t           D.F.
Currencies (6)    –5783.7     –5689.8     9.62 (1.098)    –5790.6     –5694.1     9.24 (0.94)
Bonds (4)         –2268.5     –2243.5     11.28 (2.00)    –2270.7     –2246.9     11.35 (5.53)
Equities (7)      –9500.1     –9380.7     7.96 (0.74)     –9504.4     –9383.2     7.79 (0.72)
All 17            –17509.2    –17244.8    11.84 (0.90)    –17510.4    –17250.4    12.11 (0.92)

Note: D.F. is the estimated degrees of freedom of the multivariate t-distribution. Standard errors of the estimates are given
in parentheses.
6 The ML estimation and the computation of the diagnostic statistics are carried out using Microfit 5. See Pesaran and
Pesaran (2009).
It is clear from these results that the normal-DCC specifications are strongly rejected relative
to the t-DCC models for all asset categories. The maximized log-likelihood values for the
t-DCC models are significantly larger than the ones for the normal-DCC models. The estimated
degrees of freedom of the multivariate t-distribution for different asset classes are quite close and
range from 8 (for equities) to 11 (for bonds), all well below the values of 30 or more that one
would expect for a multivariate normal distribution. For the full set of 17 assets the estimate of
v is closer to 12. There seems to be a tendency for the estimate of v to rise as more assets are
included in the t-DCC model.
The above conclusions are robust to the way returns are scaled for computation of cross asset
return correlations. The maximized log-likelihoods for the standardized and devolatized returns
are very close although, due to the non-nested nature of the two return transformations, no def-
inite conclusions can be reached as to their relative merits. The specifications where the returns
are standardized by the conditional volatilities tend to fit better (give higher log-likelihood
values). But this is to be expected since the maximization of the log-likelihood function in this
case is carried out with respect to the parameters of the scaling factor, unlike the case where scal-
ing is carried out with respect to the realized volatilities which do not depend on the unknown
parameters of the likelihood function. In what follows we base our correlation analysis on the
devolatized returns on the grounds of their approximate Gaussianity, as argued in Section 25.3.
7 Recall that for the Canadian dollar the volatility model is estimated subject to the restriction
$\lambda_{1,CD} + \lambda_{2,CD} = 1$.
Table 25.3 ML estimates of t-DCC model estimated with weekly returns over the
period 27 May 94–28 Dec 07

ML Estimates
Asset               λ̂1                  λ̂2                  1 − λ̂1 − λ̂2
Currencies
Australian dollar   0.9437 (0.0201)     0.0361 (0.0097)     0.0201 (0.0140) [1.44]
British pound       0.9862 (0.0110)     0.0124 (0.0056)     0.0014 (0.0081) [0.18]
Canadian dollar     0.9651 (0.0102)     0.0349 (0.0102)     0 (N/A) [N/A]
Swiss franc         0.9365 (0.0517)     0.0303 (0.0157)     0.0332 (0.0378) [0.88]
Euro                0.9222 (0.0264)     0.0487 (0.0133)     0.0291 (0.0154) [1.89]
Yen                 0.9215 (0.0235)     0.0586 (0.0151)     0.01992 (0.0107) [1.86]
Bonds
Euro Bunds          0.9031 (0.0237)     0.0703 (0.0149)     0.0266 (0.0118) [2.26]
UK Gilt             0.9062 (0.0304)     0.0774 (0.0224)     0.0164 (0.0091) [1.80]
Japan JGB           0.8179 (0.0369)     0.1444 (0.0268)     0.0377 (0.0141) [2.74]
US T-Note           0.9072 (0.0249)     0.0714 (0.0165)     0.0216 (0.0115) [1.87]
Equities
CAC                 0.9252 (0.0118)     0.0674 (0.0099)     0.0074 (0.0033) [2.23]
DAX                 0.9267 (0.0117)     0.0653 (0.0095)     0.0080 (0.0039) [2.03]
Nikkei              0.9552 (0.0305)     0.0402 (0.0210)     0.0046 (0.0109) [0.42]
S&P 500             0.9326 (0.0194)     0.0582 (0.0150)     0.0091 (0.0060) [1.53]
FTSE                0.9298 (0.0144)     0.0589 (0.0109)     0.0112 (0.0052) [2.16]
SM                  0.9066 (0.0225)     0.0774 (0.0165)     0.0160 (0.0076) [2.11]
AUS                 0.9393 (0.0295)     0.0370 (0.0128)     0.0237 (0.0194) [1.22]

Note: Standard errors of the estimates are given in parentheses; t-statistics are given in brackets;
λ1i and λ2i are the asset-specific volatility parameters; and φ 1 and φ 2 are the common
conditional correlation parameters.
in the present applications, the unrestricted parameter estimates and those obtained under
IGARCH are very close and one can view the restrictions λ1i + λ2i = 1 as a first-order approx-
imation that avoids explosive outcomes. We also note that the diagnostic test results, to be
reported in Section 25.8.5, are not qualitatively affected by the imposition of the restrictions,
λ1i + λ2i = 1.
Finally, it is worth noting that there is statistically significant evidence of parameter hetero-
geneity across assets, which could lead to misleading inference if these differences are ignored.
(25.15) set to 1/17, and use the risk tolerance probability of α = 1%, which is the value typically
assumed in practice. We consider two versions of the t-DCC model: a version with no restrictions
on λ1i and λ2i (except for i = CD), and an integrated version where λ1i + λ2i = 1, for all i.
Using the Lagrange multiplier statistic to test the null hypothesis that the $\hat{U}_t$'s are serially
uncorrelated, we obtained the values of $\chi^2_{12} = 4.74$ and $\chi^2_{12} = 5.31$ for the unrestricted and the
restricted t-DCC specifications, respectively. These statistics are computed assuming a maximum lag order of
12, and are asymptotically distributed as chi-squared variates with twelve degrees of freedom. It
is clear that both specifications of the t-DCC model pass this test.
Next we apply the Kolmogorov-Smirnov statistic to Ût ’s to test the null hypothesis that the
PIT values are draws from a uniform distribution. The KS statistics for the unrestricted and the
restricted versions amount to 0.0646 and 0.0454, respectively. Both these statistics are well below
the KS critical value of 0.1388 (at the 5 per cent level).8 Therefore, the null hypothesis that the
sample CDF of Ût ’s is similar to the hypothesized uniform CDF cannot be rejected.
It is interesting that neither of the tests based on the $\hat{U}_t$'s is capable of detecting the effects
of the financial turmoil that occurred in 2008. A test based on the violations of the VaR
constraint is likely to be more discriminating, since it focusses on the tail properties of the return
distributions. For a tolerance probability of α = 0.01, we would expect only one violation
of the VaR constraint in 100 observations (our evaluation sample contains 96 observations).
The unrestricted specification results in three violations of the VaR constraint, and the restricted
specification in four violations. Both specifications violate the VaR constraint in the weeks
starting on 5 Sep 08, 3 Oct 08, and 10 Oct 08. The restricted version also violates the VaR in
the week starting on 18 Jan 08. The $z_\pi$ statistics associated with these violations are −2.09
and −3.12, respectively, and are to be compared with the critical values of the standard normal
distribution. Thus both specifications are rejected by the VaR violation test.9 Not surprisingly,
the rejection of the test is due to the unprecedented market volatility during the weeks in
September and October of 2008. This period covers the federal takeover of Fannie Mae
<http://www.fanniemae.com/portal/index.html> and Freddie Mac
<http://www.freddiemac.com/>, the collapse of Lehman Brothers, and the downgrading of
AIG's credit rating. In fact, during the two weeks starting on 3 Oct 08, the S&P 500 dropped
by 29.92 per cent, which is larger than the 20 per cent market decline experienced during the
October Crash of 1987.
the recursive PIT values are draws from a uniform distribution. We also could not find any evi-
dence of serial correlation in the PIT values. But as before, the violations of the VaR constraint
were statistically significant with zπ = −3.09. The violations occur exactly on the same dates
as when the parameters were fixed at the end of 2007. Updating the parameter estimates of the
t-DCC model seems to have little impact on the diagnostic test outcomes.
[Figure: conditional correlations Cor(EU,JY), Cor(BP,EU), Cor(CH,EU), Cor(CD,EU), and Cor(AD,EU), plotted over 30-Dec-94 to 30-Oct-09.]
[Figure: conditional correlations Cor(BU,BG), Cor(BU,BJ), and Cor(BU,BE), plotted over 30-Dec-94 to 30-Oct-09.]
[Figure: Cor_Eigen_Max, plotted over 30-Dec-94 to 30-Oct-09.]
25.10 Exercises
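1. Consider the m × 1 vector of returns, $r_t = (r_{1t}, r_{2t}, \ldots, r_{mt})'$, and suppose that
$$r_{it} = \mu_i + u_{it},$$
where
$$u_{it} = \sigma_{it}\, \varepsilon_{it}, \quad \text{for } i = 1, 2, \ldots, m,$$
$$\log \sigma_{it}^{2} = \lambda_i \log \sigma_{i,t-1}^{2} + \alpha_{0i} + v_{it},$$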
2. Let $\rho_t(\omega_{t-1}) = \omega_{t-1}' r_t$ be a portfolio return, where $\omega_{t-1} = (\omega_{1,t-1}, \omega_{2,t-1}, \ldots, \omega_{N,t-1})'$
is the N × 1 vector of weights and $r_t = (r_{1t}, r_{2t}, \ldots, r_{Nt})'$ is the associated vector of returns.
Suppose that $r_t$ is distributed with the conditional mean, $E(r_t \mid \Omega_{t-1})$, and the conditional
covariance, $V(r_t \mid \Omega_{t-1})$, where $\Omega_{t-1}$ is the available information at time t − 1.
(a) Derive the portfolio weights, $\omega_{t-1}$, assuming the aim is to maximize expected returns
subject to a given value for the portfolio variance.
(b) Assume further that $r_t \mid \Omega_{t-1}$ is Student t-distributed with v > 2 degrees of freedom.
Derive the portfolio weights subject to the VaR constraint given by
$$\Pr\left[ \rho_t(\omega_{t-1}) < -L_{t-1} \mid \Omega_{t-1} \right] \le \alpha, \tag{25.21}$$
where $L_{t-1} > 0$ is a pre-specified maximum daily loss and α is a (small) probability
value.
(c) Show that the above two optimization problems can be combined by solving the
following mean-variance objective function
$$Q(\omega_{t-1} \mid \Omega_{t-1}) = \omega_{t-1}' E(r_t \mid \Omega_{t-1}) - \frac{\delta_{t-1}}{2}\, \omega_{t-1}' V(r_t \mid \Omega_{t-1})\, \omega_{t-1},$$
with
$$\delta_{t-1}^{*} \equiv \frac{\sqrt{s_{t-1}\frac{v-2}{v}}\; c_{v,\alpha} - s_{t-1}}{L_{t-1}},$$
where $s_{t-1} = E(r_t' \mid \Omega_{t-1}) [V(r_t \mid \Omega_{t-1})]^{-1} E(r_t \mid \Omega_{t-1})$, and $c_{v,\alpha} > 0$ is the α%
left tail of the Student t-distribution with v degrees of freedom.
Hint: See Pesaran, Schleicher, and Zaffaroni (2009).
3. Use the daily returns data on the equity index futures S&P 500 (SP), FTSE 100 (FTSE),
German DAX (DAX), French CAC40 (CAC), Swiss Market Index (SM), Australia SPI200
(AUS), Nikkei 225 (NK) provided in Pesaran and Pesaran (2009) to estimate the conditional
covariance of these seven returns using the Riskmetrics specification with parameters
λ = 0.96 and n = 250, and compare your results (using some suitable diagnostics) with
the estimates obtained using the DCC approach.
Part VI
Panel Data Econometrics
26.1 Introduction
Panel data consist of observations on many individual economic units over two or more periods
of time. The individual units are usually referred to as cross-sectional units, and in economic
and finance applications are typically represented by single individuals, firms, returns on
individual securities, industries, regions, or countries.
In recent years, panel data sets have become widely available to empirical researchers.
Examples of such data sets in the US include the Panel Study of Income Dynamics (PSID),
collected by the Institute for Social Research at the University of Michigan, and the National
Longitudinal Surveys of Labor Market Experience (NLS), from the Center for Human Resource
Research at Ohio State University. The PSID began in 1968 by collecting annual economic
information from a representative national sample of about 6,000 families and 15,000 individuals.
The NLS started in the mid 1960s, and contains five separate annual surveys covering
various segments of the labour force. In Europe, many countries have their national annual
surveys such as the Netherlands Socioeconomic Panel, the German Socio-Economic Panel,
and the British Household Panel Survey. At the aggregate level, the published statistics of the
Organisation for Economic Co-operation and Development (OECD) contain numerous series
of economic aggregates observed yearly for many countries. New data sources are also emerging
through the Google search engine and retail scanner datasets. Examples are Google Flu Trends
(<http://www.google.org/flutrends/>) and Nielsen Datasets for consumer marketing
(<http://research.chicagobooth.edu/nielsen/>). This increasing availability of panel data sets, while
opening up new possibilities for analysis, has also raised a number of new and interesting econometric
issues.
Panel data offer several important advantages over data sets with only a temporal or longi-
tudinal dimension. A major motivation for using panel data is the ability to control for possi-
bly correlated, time-invariant heterogeneity without actually observing it. One may be able to
identify and measure effects that are otherwise not detectable, as well as to account for latent
individual heterogeneity. An additional advantage of panel data, compared to time series data,
is the reduction in collinearity among explanatory variables and the increase in efficiency of
econometric estimators. Finally, the cross-sectional dimension may also alleviate problems of
aggregation. These benefits do come at a cost. Important difficulties arise when explanatory
variables in panel data regression models cannot be assumed strictly exogenous. As we shall
see in Chapter 27, standard panel estimators are inconsistent when panel data regression mod-
els have weakly exogenous regressors, and their treatment poses a number of methodological
challenges. A further complication arises when regression errors attached to different cross-
section units are dependent, even after conditioning on variables that are specific to cross-
sectional units. In the presence of cross-section dependence, conventional panel estimators
can result in misleading inference and even inconsistent estimators (see Chapter 29 on this).
Finally, important econometric issues arise when panel data sets involve non-responses and
measurement errors.
The literature on panel data can be broadly divided into three categories, depending on their
assumptions about the relative magnitudes of the number of cross-sectional units (N) and the
number of time periods (T). First, there exists a ‘small N, large T’ time series literature which
closely follows the SURE procedure, due to Zellner (1962) and described in Chapter 19. The
main attraction of the SURE approach is that it allows the contemporaneous error covariances
to be freely estimated. But this is possible only when N is reasonably small relative to T, while
the SURE procedure is not feasible when N is of the same order of magnitude as T. Also the
SURE approach assumes that the regressors are uncorrelated with the errors which rules out
the error correlation being due to the presence of unobserved common factors. The general
problem of error cross-sectional correlation will be discussed in Chapter 29. Second, there is
the ‘small T, large N’ panel literature. The econometric models and techniques suggested
for carrying out inference on this type of panel data set, assuming strictly exogenous regressors,
are the subject of this chapter. The next chapter relaxes the exogeneity assumption and allows the
regressors to be weakly exogenous. The analysis of ‘large T, large N’ panels will be covered in
Chapters 28 and 31.
Consider the linear panel data model
\[
y_{it} = \alpha_i + \beta' x_{it} + u_{it}, \tag{26.1}
\]
where $x_{it}$ is a $k \times 1$ vector of observed individual-specific regressors on the $i$th
cross-sectional unit at time $t$, $u_{it}$ is the error term, $\beta$ is a $k$-dimensional vector of
unknown parameters, and $\alpha_i$ denotes an unobservable, unit-specific effect. Note that
$\alpha_i$ is time-invariant, and it accounts for any individual-specific effect that is not included
in the regression (Mundlak (1978)).
It is often convenient to rewrite model (26.1) in stacked form using a unit-specific formulation
as follows
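\[
y_{i.} = \alpha_i \tau_T + X_{i.}\beta + u_{i.}, \tag{26.2}
\]
where $y_{i.} = (y_{i1}, y_{i2}, \ldots, y_{iT})'$, $X_{i.} = (x_{i1}, x_{i2}, \ldots, x_{iT})'$ is the
$T \times k$ matrix of observations on the regressors for unit $i$,
$u_{i.} = (u_{i1}, u_{i2}, \ldots, u_{iT})'$, and $\tau_T$ is a $T \times 1$ vector of ones.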
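In other cases, it is convenient to rewrite (26.1) in stacked form using a time-specific formulation
\[
y_{.t} = \alpha + X_{.t}\beta + u_{.t},
\]
where
\[
y_{.t} = \begin{pmatrix} y_{1t} \\ y_{2t} \\ \vdots \\ y_{Nt} \end{pmatrix}, \quad
X_{.t} = \begin{pmatrix}
x_{1t}^{(1)} & x_{1t}^{(2)} & \cdots & x_{1t}^{(k)} \\
x_{2t}^{(1)} & x_{2t}^{(2)} & \cdots & x_{2t}^{(k)} \\
\vdots & \vdots & \ddots & \vdots \\
x_{Nt}^{(1)} & x_{Nt}^{(2)} & \cdots & x_{Nt}^{(k)}
\end{pmatrix},
\]
\[
u_{.t} = \begin{pmatrix} u_{1t} \\ u_{2t} \\ \vdots \\ u_{Nt} \end{pmatrix}, \quad
\alpha = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_N \end{pmatrix}.
\]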
\[
y = (\alpha \otimes \tau_T) + X\beta + u, \tag{26.5}
\]
where $y = (y_{1.}', y_{2.}', \ldots, y_{N.}')'$, $X = (X_{1.}', X_{2.}', \ldots, X_{N.}')'$,
$u = (u_{1.}', u_{2.}', \ldots, u_{N.}')'$, and $\otimes$ is the Kronecker product.
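The stacked form (26.5) can be checked numerically. The following is a minimal sketch, assuming small illustrative dimensions and simulated data (the sizes, the seed and the parameter values are not from the text), showing that stacking the unit-specific equations reproduces $(\alpha \otimes \tau_T) + X\beta + u$.

```python
import numpy as np

N, T, k = 3, 4, 2                      # illustrative sizes (assumed)
rng = np.random.default_rng(0)
alpha = rng.normal(size=N)             # unit-specific effects alpha_i
beta = np.array([1.0, -0.5])           # slope vector (assumed values)
X = rng.normal(size=(N, T, k))         # X[i] plays the role of the T x k matrix X_i.
u = rng.normal(size=(N, T))            # u[i] plays the role of u_i.

# Unit-by-unit form: y_i. = alpha_i * tau_T + X_i. beta + u_i.
tau_T = np.ones(T)
y_units = np.stack([alpha[i] * tau_T + X[i] @ beta + u[i] for i in range(N)])

# Fully stacked form: y = (alpha kron tau_T) + X beta + u
y_stacked = np.kron(alpha, tau_T) + X.reshape(N * T, k) @ beta + u.reshape(N * T)
```

Here observations are ordered by unit first and by time within each unit, which is the ordering implied by $\alpha \otimes \tau_T$.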
In the rest of this chapter, it is assumed E(uit |Xi. ) = 0, for all i and t, namely, that regressors
are strictly exogenous (see Section 9.3 for a discussion of the notion of strict and weak exogene-
ity). In other words, at each time period, the error term is assumed to be uncorrelated with all
lags and leads of the explanatory variables. As we shall see, this is a critical assumption for the
methods developed in this chapter. The case of weakly exogenous regressors will be considered
in Chapter 27.
and α and β can be estimated by the OLS procedure. The resultant estimator of β is known as
pooled OLS and is given by
\[
\hat{\beta}_{OLS} = \left[ \sum_{i=1}^{N}\sum_{t=1}^{T} (x_{it}-\bar{x})(x_{it}-\bar{x})' \right]^{-1}
\sum_{i=1}^{N}\sum_{t=1}^{T} (x_{it}-\bar{x})(y_{it}-\bar{y}), \tag{26.7}
\]
where
\[
\bar{x} = (NT)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T} x_{it}, \quad
\bar{y} = (NT)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T} y_{it},
\]
and assuming that $\sum_{i=1}^{N}\sum_{t=1}^{T} (x_{it}-\bar{x})(x_{it}-\bar{x})'$ is a nonsingular
matrix. The pooled estimator is unbiased and consistent if $x_{it}$ is strictly exogenous and the
intercepts are homogeneous. Heteroskedasticity of the errors, $u_{it}$, and temporal dependence
affect inference but do not affect the consistency of the pooled estimator when $T$ is fixed and
$N$ is large. More formally, we make the following assumptions:
Assumption P1: $E(u_{it} \mid x_{it'}) = 0$, for all $i$, $t$ and $t'$.
Assumption P2: The regressors, $x_{it}$, are either deterministic and bounded, namely
$\|x_{it}\| < K < \infty$, or they satisfy the moment conditions
$E\|(x_{it}-\bar{x})(x_{jt'}-\bar{x})'\| < K < \infty$, for all $i$, $j$, $t$ and $t'$, where
$\|A\|$ denotes the Frobenius norm of matrix $A$.
Assumption P3: The $k \times k$ matrix $Q_{p,NT}$ defined by
\[
Q_{p,NT} = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} (x_{it}-\bar{x})(x_{it}-\bar{x})', \tag{26.8}
\]
where $\gamma_i(t,t')$ is the autocovariance of the $u_{it}$ process, assumed to be bounded, namely
$|\gamma_i(t,t')| < K < \infty$, for all $i$, $t$ and $t'$.
Remark 6 Assumption P2 can be relaxed to allow for trended or unit root processes. The
autocovariances $\gamma_i(t,t')$ can be left unrestricted when $T$ is fixed.
To establish the unbiasedness property we first note that α can be eliminated by demeaning
using the grand means, x̄ and ȳ. We have
\[
\hat{\beta}_{OLS} - \beta = Q_{p,NT}^{-1} \left[ \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} (x_{it}-\bar{x})\, u_{it} \right].
\]
Under the strict exogeneity Assumption P1, we have $E(u_{it} \mid X) = 0$, for all $i$ and $t$, where
$X = \{x_{it}, \text{ for } i = 1, 2, \ldots, N;\ t = 1, 2, \ldots, T\}$, and it readily follows that
\[
E(\hat{\beta}_{OLS} \mid X) - \beta = Q_{p,NT}^{-1} \left[ \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} (x_{it}-\bar{x})\, E(u_{it} \mid X) \right] = 0,
\]
and, therefore, unconditionally we also have $E[E(\hat{\beta}_{OLS} \mid X) - \beta] = 0$, or
$E(\hat{\beta}_{OLS}) = \beta$, which establishes that $\hat{\beta}_{OLS}$ is an unbiased estimator of $\beta$.
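As an illustration of the pooled estimator (26.7), the sketch below computes it on simulated data with a homogeneous intercept and errors that are heteroskedastic across units and correlated over time. The data-generating process, the function name `pooled_ols` and all parameter values are assumptions made for this example only.

```python
import numpy as np

def pooled_ols(y, X):
    """Pooled OLS slope as in (26.7). y has shape (N, T); X has shape (N, T, k)."""
    N, T, k = X.shape
    Xs = X.reshape(N * T, k)
    Xd = Xs - Xs.mean(axis=0)              # x_it minus the grand mean
    yd = y.reshape(N * T) - y.mean()       # y_it minus the grand mean
    return np.linalg.solve(Xd.T @ Xd, Xd.T @ yd)

rng = np.random.default_rng(1)
N, T = 2000, 3
beta = np.array([1.0, 2.0])
X = rng.normal(size=(N, T, 2))             # strictly exogenous regressors
# Errors u_it = s_i * (e_i + eps_it): heteroskedastic and serially correlated
s = rng.uniform(0.5, 1.5, size=(N, 1))
u = s * (rng.normal(size=(N, 1)) + rng.normal(size=(N, T)))
y = 0.7 + X @ beta + u                     # common intercept for all units
b_ols = pooled_ols(y, X)
```

With $T$ fixed and $N$ large the estimate should be close to the assumed slopes, in line with the consistency result discussed in this section.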
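Consider now the variance of $\hat{\beta}_{OLS}$ and note that
\[
Var(\hat{\beta}_{OLS} \mid X) = Q_{p,NT}^{-1} \left[ \frac{1}{N^2T^2} \sum_{i=1}^{N}\sum_{j=1}^{N}\sum_{t=1}^{T}\sum_{t'=1}^{T}
E(u_{it}u_{jt'} \mid X)\,(x_{it}-\bar{x})(x_{jt'}-\bar{x})' \right] Q_{p,NT}^{-1}. \tag{26.9}
\]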
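\[
Var(\hat{\beta}_{OLS} \mid X) = \frac{1}{NT}\, Q_{p,NT}^{-1} V_{p,NT}\, Q_{p,NT}^{-1}, \tag{26.10}
\]
where
\[
V_{p,NT} = \frac{1}{N} \left[ T^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T} \sigma_i^2 (x_{it}-\bar{x})(x_{it}-\bar{x})'
+ T^{-1}\sum_{i=1}^{N}\sum_{t \neq t'} \gamma_i(t,t')\,(x_{it}-\bar{x})(x_{it'}-\bar{x})' \right]. \tag{26.11}
\]
1 Note that
\[
\sum_{i=1}^{N}\sum_{t=1}^{T} (x_{it}-\bar{x})\,\bar{u} = \bar{u}\sum_{i=1}^{N}\sum_{t=1}^{T} (x_{it}-\bar{x}) = 0.
\]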
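Also under Assumption P2, it readily follows that
\[
\frac{1}{T}\sum_{t=1}^{T} E\left[(x_{it}-\bar{x})(x_{it}-\bar{x})'\right] = O_p(1), \text{ for all } i,
\]
\[
\frac{1}{T}\sum_{t \neq t'} \gamma_i(t,t')\, E\left[(x_{it}-\bar{x})(x_{it'}-\bar{x})'\right] = O(T), \text{ for all } i,
\]
and as a result
\[
\lim_{N \to \infty} Var(\hat{\beta}_{OLS} \mid X) = 0, \text{ for a fixed } T,
\]
which in turn establishes that $\hat{\beta}_{OLS}$ converges in root mean squared error to its true
value, and
\[
\operatorname*{Plim}_{N \to \infty} (\hat{\beta}_{OLS} - \beta) = 0.
\]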
This is a general result and holds so long as the regressors are strictly exogenous, the errors are
cross-sectionally uncorrelated, the individual effects, α i , are uncorrelated with the errors and the
regressors, and T is fixed as N → ∞, or if N and T → ∞, jointly in any order. But when N is
fixed and $T \to \infty$, then certain mixing or stationarity conditions are required on the
autocovariances, $\gamma_i(t,t')$, for the pooled OLS estimator to remain consistent. A sufficient
condition is given by $T^{-2}\sum_{t=1}^{T}\sum_{t'=1}^{T} \gamma_i^2(t,t') \to 0$, for each $i$.
In a panel data context the most interesting cases
are when T is fixed and N large or when both N and T are large. In such cases, under assumptions
P1-P5, the pooled OLS is robust to any degree of temporal dependence in the errors, uit . It can
also account for cross-sectional heteroskedasticity.
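Furthermore, as in the case of the classical regression model, if we also assume that $u_{it}$ are
normally distributed it then readily follows that
\[
\sqrt{NT}\,(\hat{\beta}_{OLS} - \beta) \sim N(0, \Omega_{p,NT}),
\]
where
\[
\Omega_{p,NT} = Q_{p,NT}^{-1} V_{p,NT}\, Q_{p,NT}^{-1}.
\]
In the case where the errors are not normally distributed, then for any fixed $T$,
\[
\sqrt{NT}\,(\hat{\beta}_{OLS} - \beta) \to_d N(0, \Omega_{\beta_{ols}}), \text{ as } N \to \infty, \text{ where }
\Omega_{\beta_{ols}} = \operatorname*{Plim}_{N \to \infty} \Omega_{p,NT}.
\]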
All the above results critically depend on the intercept homogeneity assumption. In the case
where α i differ sufficiently across i, the pooled OLS estimator could be biased depending on the
degree of the heterogeneity of α i and the extent to which α i and xit are correlated. As a simple
formulation suppose that
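\[
\alpha_i = \alpha + \eta_i, \qquad x_{it} = g_t \eta_i + w_{it},
\]
where $\eta_i \sim IID(0, \sigma_\eta^2)$, $g_t = (g_{1t}, g_{2t}, \ldots, g_{kt})'$, and $w_{it}$ is a
$k \times 1$ vector of strictly exogenous regressors that are uncorrelated with $\eta_i$. The degree
of correlation between $\eta_i$ and $x_{it}$ is given by $\sigma_\eta^2 g_t$. To derive the asymptotic
bias of $\hat{\beta}_{OLS}$ note that under this setup
$y_{it} - \bar{y} = \eta_i - \bar{\eta} + (x_{it}-\bar{x})'\beta + u_{it} - \bar{u}$, and
\[
\hat{\beta}_{OLS} - \beta = \left[ \sum_{i=1}^{N}\sum_{t=1}^{T} (x_{it}-\bar{x})(x_{it}-\bar{x})' \right]^{-1}
\sum_{i=1}^{N}\sum_{t=1}^{T} (x_{it}-\bar{x})(u_{it} - \bar{u} + \eta_i - \bar{\eta}).
\]
Hence
\[
\operatorname*{Plim}_{N \to \infty} (\hat{\beta}_{OLS} - \beta) = Q_{T,p}^{-1} \lim_{N \to \infty}
\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} E\left[(x_{it}-\bar{x})(\eta_i - \bar{\eta})\right],
\]
where $Q_{T,p} = \operatorname*{Plim}_{N \to \infty} Q_{p,NT}$. Also
\[
E\left[(x_{it}-\bar{x})(\eta_i - \bar{\eta})\right]
= E\left\{ \left[ g_t(\eta_i - \bar{\eta}) + (g_t - \bar{g})\bar{\eta} + (w_{it} - \bar{w}) \right](\eta_i - \bar{\eta}) \right\}
= \frac{N-1}{N}\,\sigma_\eta^2\, g_t,
\]
and hence
\[
\operatorname*{Plim}_{N \to \infty} \hat{\beta}_{OLS} = \beta + \sigma_\eta^2\, Q_{T,p}^{-1}\, \bar{g}_T,
\]
where $\bar{g}_T = T^{-1}\sum_{t=1}^{T} g_t$. This bias arises because of the omission of $\eta_i$ which is correlated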
with xit . One way of dealing with this bias is to employ the fixed-effects estimator to which we
now turn.
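The size of this bias can be illustrated by simulation. The sketch below is a rough check under assumed values: regressors are generated as $x_{it} = g_t\eta_i + w_{it}$ with Gaussian draws, the loadings `g`, the slopes and the sample sizes are invented for the example, and the sample moment matrix is used in place of its probability limit.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, k = 20000, 3, 2
beta = np.array([1.0, -1.0])
sigma_eta = 1.0
g = np.array([[1.0, 0.5], [0.8, 0.4], [1.2, 0.6]])   # g_t, one row per period (assumed)

eta = sigma_eta * rng.normal(size=(N, 1))            # eta_i
w = rng.normal(size=(N, T, k))                       # w_it, unrelated to eta_i
X = eta[:, :, None] * g[None, :, :] + w              # x_it = g_t * eta_i + w_it
u = rng.normal(size=(N, T))
y = 0.5 + eta + X @ beta + u                         # alpha_i = alpha + eta_i

# Pooled OLS in deviations from grand means
Xs = X.reshape(N * T, k)
Xd = Xs - Xs.mean(axis=0)
yd = y.reshape(N * T) - y.mean()
Q = Xd.T @ Xd / (N * T)                              # sample analogue of Q_{T,p}
b_ols = np.linalg.solve(Q, Xd.T @ yd / (N * T))

# Large-N limit implied by the bias formula: beta + sigma_eta^2 Q^{-1} g_bar
g_bar = g.mean(axis=0)
b_limit = beta + sigma_eta**2 * np.linalg.solve(Q, g_bar)
```

In this design the pooled estimate should sit close to `b_limit` and visibly away from `beta`, which is the point of the derivation above.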
is boundedness, namely that |α i | < K < ∞, for all i, where K is a fixed positive constant. Oth-
erwise, α i is allowed to have any degree of dependence on the regressors, xit or the error term,
uit . This setup does not rule out the possibility that α i are random draws from a given distribu-
tion. In general, one can think of the fixed-effects as draws from a joint probability distribution
function over α i , xit and uit , where the number of parameters characterizing this distribution is
allowed to increase at the same rate as the number of cross-sectional observations, N. For further
discussion see Mundlak (1978) and Hausman and Taylor (1981).
Under the FE specification, we assume that conditional on the individual effects, α i , the
regressors, xit , are strictly exogenous, but do not impose any restrictions on the fixed-effects.
More formally, we continue to maintain Assumptions P1, P4 and P5 , but replace Assumptions
P2 and P3 with the following:
Assumption P2’: The regressors, $x_{it}$, are either deterministic and bounded, namely
$\|x_{it}\| < K < \infty$, or they satisfy the moment conditions
$E\|(x_{it}-\bar{x}_i)(x_{jt'}-\bar{x}_j)'\| < K < \infty$, for all $i$, $j$, $t$ and $t'$, where
$\bar{x}_i = T^{-1}\sum_{t=1}^{T} x_{it}$.
Assumption P3’: The $k \times k$ matrix $Q_{FE,NT}$ defined by
\[
Q_{FE,NT} = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} (x_{it}-\bar{x}_i)(x_{it}-\bar{x}_i)', \tag{26.13}
\]
is positive definite for all $N$ and $T$, and as $N$ and/or $T \to \infty$. We denote the (probability)
limits of $Q_{FE,NT}$ as $N$ or $T$, or both tending to infinity by $Q_{FE,T}$, $Q_{FE,N}$ and $Q_{FE}$,
respectively.
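The basic idea behind FE estimation is to estimate $\beta$ after eliminating the individual effects,
$\alpha_i$. Averaging equation (26.1) over time yields
\[
\bar{y}_{i.} = \alpha_i + \beta'\bar{x}_{i.} + \bar{u}_{i.}, \tag{26.14}
\]
where
\[
\bar{y}_{i.} = \frac{1}{T}\sum_{t=1}^{T} y_{it}, \quad
\bar{x}_{i.} = \frac{1}{T}\sum_{t=1}^{T} x_{it}, \quad
\bar{u}_{i.} = \frac{1}{T}\sum_{t=1}^{T} u_{it}. \tag{26.15}
\]
Subtracting (26.14) from (26.1) yields
\[
y_{it} - \bar{y}_{i.} = \beta'(x_{it} - \bar{x}_{i.}) + (u_{it} - \bar{u}_{i.}), \tag{26.16}
\]
which is known as FE, or within transformation. $\beta$ is now estimated by applying the method of
pooled OLS to the above transformed relations to obtain
\[
\hat{\beta}_{FE} = \left[ \sum_{i=1}^{N}\sum_{t=1}^{T} (x_{it}-\bar{x}_{i.})(x_{it}-\bar{x}_{i.})' \right]^{-1}
\sum_{i=1}^{N}\sum_{t=1}^{T} (x_{it}-\bar{x}_{i.})(y_{it}-\bar{y}_{i.}). \tag{26.17}
\]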
\[
\hat{\alpha}_i = \bar{y}_i - \hat{\beta}_{FE}'\,\bar{x}_i. \tag{26.18}
\]
The transformed equation (26.16), and the FE estimator, can also be rewritten in a more
convenient form using the unit-specific stacked notation (26.2). In particular, let
$M_T = I_T - \tau_T(\tau_T'\tau_T)^{-1}\tau_T'$. The matrix $M_T$ is a $T \times T$ idempotent
transformation matrix that converts variables into deviations from their means. Noting that
$M_T\tau_T = \tau_T - \tau_T(\tau_T'\tau_T)^{-1}\tau_T'\tau_T = 0$, and pre-multiplying both sides
of (26.2) by $M_T$, we obtain
\[
M_T y_{i.} = M_T X_{i.}\beta + M_T u_{i.}. \tag{26.19}
\]
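Applying OLS to equation (26.19) yields (assuming Assumption P3’ holds)
\[
\hat{\beta}_{FE} = \left( \sum_{i=1}^{N} X_{i.}' M_T X_{i.} \right)^{-1} \sum_{i=1}^{N} X_{i.}' M_T y_{i.}. \tag{26.20}
\]
For future reference, let
\[
q_{FE,NT} = (NT)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T} (x_{it}-\bar{x}_{i.})(y_{it}-\bar{y}_{i.})
= (NT)^{-1}\sum_{i=1}^{N} X_{i.}' M_T y_{i.}. \tag{26.21}
\]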
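The equivalence between the demeaned-data formula (26.17) and the projection form (26.20) can be verified directly. The sketch below is illustrative only: the function names `fe_within` and `fe_projection`, the simulated design (with $\alpha_i$ deliberately correlated with the regressors) and the parameter values are assumptions.

```python
import numpy as np

def fe_within(y, X):
    """FE slope from explicit time-demeaning, as in (26.17)."""
    Xd = (X - X.mean(axis=1, keepdims=True)).reshape(-1, X.shape[2])
    yd = (y - y.mean(axis=1, keepdims=True)).reshape(-1)
    return np.linalg.solve(Xd.T @ Xd, Xd.T @ yd)

def fe_projection(y, X):
    """FE slope from M_T = I_T - tau (tau'tau)^{-1} tau', as in (26.20)."""
    N, T, k = X.shape
    tau = np.ones((T, 1))
    M = np.eye(T) - tau @ np.linalg.inv(tau.T @ tau) @ tau.T
    A = sum(X[i].T @ M @ X[i] for i in range(N))
    b = sum(X[i].T @ M @ y[i] for i in range(N))
    return np.linalg.solve(A, b), M

rng = np.random.default_rng(3)
N, T = 500, 4
beta = np.array([1.0, -2.0])
alpha = 2.0 * rng.normal(size=(N, 1))                    # individual effects
X = rng.normal(size=(N, T, 2)) + 0.5 * alpha[:, :, None] # regressors correlated with alpha_i
y = alpha + X @ beta + rng.normal(size=(N, T))
b_within = fe_within(y, X)
b_proj, M_T = fe_projection(y, X)
```

Both routes give the same estimate up to rounding; the demeaning version avoids forming $T \times T$ matrices and is the cheaper of the two.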
It is now easily seen that, under the above assumptions, $\hat{\beta}_{FE}$ is unbiased and consistent
for any fixed $T$ and as $N \to \infty$. Substituting the expression for $y_{i.}$ in (26.20) yields
\[
\hat{\beta}_{FE} - \beta = \left( \frac{\sum_{i=1}^{N} X_{i.}' M_T X_{i.}}{NT} \right)^{-1}
\frac{\sum_{i=1}^{N} X_{i.}' M_T u_{i.}}{NT},
\]
and
\[
E(\hat{\beta}_{FE} \mid X) - \beta = \left( \frac{\sum_{i=1}^{N} X_{i.}' M_T X_{i.}}{NT} \right)^{-1}
\frac{\sum_{i=1}^{N} X_{i.}' M_T\, E(u_{i.} \mid X)}{NT}.
\]
But under Assumption P1, $E(u_{i.} \mid X) = 0$, and it readily follows that
$E(\hat{\beta}_{FE} \mid X) = \beta$; and hence unconditionally we also have
$E(\hat{\beta}_{FE}) = \beta$, which establishes that the FE estimator of $\beta$ is unbiased under
Assumptions P1 and P3’. Consider now the variance of $\hat{\beta}_{FE}$ and note that
\[
Var(\hat{\beta}_{FE} \mid X) = \frac{1}{NT}\, Q_{FE,NT}^{-1} V_{FE,NT}\, Q_{FE,NT}^{-1}, \tag{26.22}
\]
where
\[
V_{FE,NT} = \frac{1}{NT}\sum_{i=1}^{N}\sum_{j=1}^{N} X_{i.}' M_T\, E(u_{i.}u_{j.}' \mid X)\, M_T X_{j.}. \tag{26.23}
\]
But under Assumptions P4 and P5, $E(u_{i.}u_{j.}' \mid X) = 0$ if $i \neq j$, and
$E(u_{i.}u_{i.}' \mid X) = \Sigma_i$, which is a $T \times T$ matrix with its $(t,t')$ element given by
$\gamma_i(t,t')$, and $\gamma_i(t,t) = \sigma_i^2$, and we have
\[
V_{FE,NT} = \frac{1}{N}\sum_{i=1}^{N} \frac{X_{i.}' M_T \Sigma_i M_T X_{i.}}{T}
= \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} \sigma_i^2 (x_{it}-\bar{x}_i)(x_{it}-\bar{x}_i)'
+ \frac{1}{NT}\sum_{i=1}^{N}\sum_{t \neq t'} \gamma_i(t,t')\,(x_{it}-\bar{x}_i)(x_{it'}-\bar{x}_i)'.
\]
Under Assumptions P2’ and P3’, $Var(\hat{\beta}_{FE} \mid X) \to 0$, when $T$ is fixed and
$N \to \infty$, which together with $E(\hat{\beta}_{FE}) = \beta$ establishes the consistency of
$\hat{\beta}_{FE}$. In the case where both $N$ and $T \to \infty$, a sufficient condition for
consistency of $\hat{\beta}_{FE}$ is given by
\[
\frac{1}{N^2T^2}\sum_{i=1}^{N}\sum_{t=1}^{T}\sum_{t'=1}^{T} \gamma_i^2(t,t') \to 0,
\]
which is met since $|\gamma_i(t,t')| < K$. But if $N$ is fixed as $T \to \infty$, then we need
\[
T^{-2}\sum_{t=1}^{T}\sum_{t'=1}^{T} \gamma_i^2(t,t') \to 0, \text{ for each } i,
\]
which is the usual time series ergodicity condition and is met if the $T \times T$ autocovariance
matrix $\Sigma_i = (\gamma_i(t,t'))$ has bounded absolute row (column) sum norm. This condition is
met, for example, if $u_{it}$ is a stationary process for all $i$ (see Chapter 14).
The asymptotic distribution of $\hat{\beta}_{FE}$ can also be obtained either assuming that the
errors, $u_{it}$, are normally distributed when $N$ and $T$ are fixed, or satisfy certain
distributional conditions when $N$ and/or $T \to \infty$. In the case where $u_{it}$ is normally
distributed, under Assumptions P1, P2’, P3’, P4 and P5 and for any given $N$ and $T$ we have
\[
\sqrt{NT}\,(\hat{\beta}_{FE} - \beta) \sim N(0, \Omega_{FE,NT}),
\]
where
\[
\Omega_{FE,NT} = Q_{FE,NT}^{-1} V_{FE,NT}\, Q_{FE,NT}^{-1}.
\]
A number of results in the literature can be derived. In the case where the errors are serially
uncorrelated, $V_{FE,NT}$ simplifies to
\[
V_{FE,NT} = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} \sigma_i^2 (x_{it}-\bar{x}_i)(x_{it}-\bar{x}_i)'.
\]
If it is further assumed that the errors are homoskedastic, so that $\sigma_i^2 = \sigma^2$, we then have
\[
V_{FE,NT} = \frac{\sigma^2}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} (x_{it}-\bar{x}_i)(x_{it}-\bar{x}_i)', \qquad
\Omega_{FE,NT} = \sigma^2 Q_{FE,NT}^{-1}.
\]
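where
\[
\Omega_{FE,T} = Q_{FE,T}^{-1} V_{FE,T}\, Q_{FE,T}^{-1},
\]
\[
Var(\bar{u}_{i.}) = \frac{1}{T^2}\sum_{t=1}^{T}\sum_{t'=1}^{T} \gamma_i(t,t'),
\]
\[
\bar{x}_i'\, Var(\hat{\beta}_{FE})\, \bar{x}_i = O\!\left(\frac{1}{NT}\right),
\]
and
\[
Cov\left[\bar{u}_i,\ \bar{x}_{i.}'(\hat{\beta}_{FE} - \beta) \mid X\right]
= \bar{x}_{i.}'\, Q_{FE,NT}^{-1}\, \frac{\sum_{j=1}^{N} X_{j.}' M_T\, E(u_{j.}\bar{u}_i)}{NT}
= \bar{x}_{i.}'\, Q_{FE,NT}^{-1}\, \frac{X_{i.}' M_T\, E(u_{i.}\bar{u}_i)}{NT}.
\]
However, $E(u_{i.}\bar{u}_{i.})$ is a $T \times 1$ vector with its $t$th element given by
$T^{-1}\sum_{\tau=1}^{T} \gamma_i(t,\tau)$. Hence, the last two terms of $Var(\hat{\alpha}_i)$ vanish
as $N \to \infty$. But when $T$ is fixed the first term does not vanish as $N \to \infty$, even if it is
assumed that the errors are serially uncorrelated. Therefore, in general, $\hat{\alpha}_i$ is
consistent only if $T \to \infty$.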
The above results show that the FE estimator is fairly robust to temporal dependence and
cross-sectional heteroskedasticity. But it is important to note that the robustness of the FE esti-
mator to possible correlations between α i and xit comes at a cost. Using the FE approach we can
only estimate the effects of time varying regressors. The effects of non-time varying regressors
(such as sex or race) will be unidentified under the within or the FE transformation. But with
additional assumptions the time-invariant effects can be estimated using time averages of the
residuals from fixed-effects regressions. For further details see Section 26.10.
Another important point to bear in mind is that the consistency of β̂ FE crucially depends on
the assumption of strict exogeneity of the explanatory variables. As we shall see in Chapter 27,
in the presence of weakly exogenous regressors, since the time averages x̄i. in (26.15), contain
the values of xit at all time periods, the demeaning operation would introduce a correlation of
order O(T −1 ) between the regressors and the error term in the transformed equation (26.16)
that renders β̂ FE biased in small samples. Finally, the FE is often not fully efficient since it ignores
variation across individuals in the sample (see Hausman and Taylor (1981)).
\[
\begin{pmatrix} y_{1.} \\ y_{2.} \\ \vdots \\ y_{N.} \end{pmatrix}
= \alpha_1 \begin{pmatrix} \tau_T \\ 0 \\ \vdots \\ 0 \end{pmatrix}
+ \alpha_2 \begin{pmatrix} 0 \\ \tau_T \\ \vdots \\ 0 \end{pmatrix}
+ \ldots
+ \alpha_N \begin{pmatrix} 0 \\ 0 \\ \vdots \\ \tau_T \end{pmatrix}
+ \begin{pmatrix} X_{1.} \\ X_{2.} \\ \vdots \\ X_{N.} \end{pmatrix} \beta
+ \begin{pmatrix} u_{1.} \\ u_{2.} \\ \vdots \\ u_{N.} \end{pmatrix},
\]
or, more compactly,
\[
y = \sum_{i=1}^{N} \alpha_i d_i + X\beta + u,
\]
where di is an NT × 1 vector of a dummy variable with all its elements zero except for the ele-
ments associated with the ith cross-sectional unit which are set to unity. It is easily seen that the
OLS estimator of β in this regression is the same as the FE estimator. For this reason the FE esti-
mator is also known as the least squares dummy variable (LSDV) estimator. However, when N is
relatively large it is not computationally efficient as compared to using the pooled formula given
by (26.17). Also, special care needs to be exercised when the LSDV approach is used, since
the standard errors obtained from such regressions are only valid under the strong assumptions
that the errors are homoskedastic and serially uncorrelated. In general it is more appropriate to
use (26.17) to compute the FE estimator and then compute robust standard errors as set out in
Section 26.7.
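The numerical equivalence of the LSDV and FE estimators is easy to confirm on a small panel. The sketch below uses simulated data with assumed dimensions and parameter values; the dummy matrix is built with a Kronecker product so that its $i$th column corresponds to $d_i$.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T, k = 30, 5, 2
beta = np.array([0.5, -1.0])
alpha = rng.normal(size=N)
X = rng.normal(size=(N, T, k)) + alpha[:, None, None]   # regressors correlated with alpha_i
y = alpha[:, None] + X @ beta + rng.normal(size=(N, T))

# LSDV: regress stacked y on the N unit dummies d_1, ..., d_N and on X
D = np.kron(np.eye(N), np.ones((T, 1)))                 # NT x N matrix of dummies
Z = np.hstack([D, X.reshape(N * T, k)])
coef = np.linalg.lstsq(Z, y.reshape(N * T), rcond=None)[0]
alpha_lsdv, b_lsdv = coef[:N], coef[N:]

# FE: pooled OLS on time-demeaned data, then alpha_hat_i = ybar_i - b'xbar_i
Xd = (X - X.mean(axis=1, keepdims=True)).reshape(N * T, k)
yd = (y - y.mean(axis=1, keepdims=True)).reshape(N * T)
b_fe = np.linalg.solve(Xd.T @ Xd, Xd.T @ yd)
alpha_fe = y.mean(axis=1) - X.mean(axis=1) @ b_fe
```

The dummy-variable regression inverts an $(N+k)$-dimensional system, whereas the within formula only needs a $k \times k$ inverse, which is why the latter is preferred when $N$ is large.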
\[
\ell(\theta) = \sum_{i=1}^{N} \ell_i(\theta_i), \tag{26.25}
\]
where
\[
\ell_i(\theta_i) = -\frac{T}{2}\log(2\pi) - \frac{1}{2}\log|\Sigma_i|
- \frac{1}{2}\,(y_{i.} - \alpha_i\tau_T - X_{i.}\beta)'\,\Sigma_i^{-1}\,(y_{i.} - \alpha_i\tau_T - X_{i.}\beta),
\]
$\theta_i = (\alpha_i, \beta', vech(\Sigma_i)')'$, $\theta = (\theta_1', \theta_2', \ldots, \theta_N')'$.
In the special case where $\Sigma_i = \sigma^2 I_T$, it is easily seen that the maximum likelihood
estimator for $\beta$ and $\alpha_i$ obtained from the first-order conditions, by maximizing (26.25),
is identical to the FE estimator for these parameters. However, the estimator for $\sigma^2$ does
not have the appropriate correction for the degrees of freedom, since from the first-order
conditions we obtain
\[
\hat{\sigma}^2_{ML} = \frac{1}{NT}\sum_{i=1}^{N}
(y_{i.} - X_{i.}\hat{\beta}_{FE} - \hat{\alpha}_i\tau_T)'\,(y_{i.} - X_{i.}\hat{\beta}_{FE} - \hat{\alpha}_i\tau_T).
\]
$\hat{\sigma}^2_{ML}$ is not a consistent estimator of $\sigma^2$ when $T$ is fixed and $N \to \infty$.
This is due to the dependence of $\hat{\sigma}^2_{ML}$ on $\hat{\alpha}_i$, for $i = 1, 2, \ldots, N$,
estimated for each $i$ based on a finite sample of $T$ observations, which is known as the
incidental parameters problem discussed by Neyman and Scott (1948).
In the more general case where $\Sigma_i \neq \sigma^2 I_T$, the FE and ML estimators differ and the
latter is only feasible if $T$ is sufficiently large.
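The incidental parameters problem can be seen in a small simulation. The sketch below assumes homoskedastic, serially uncorrelated Gaussian errors with $\sigma^2 = 1$, a single regressor and $T = 2$ (all values chosen for illustration). In this design the uncorrected estimator settles near $\sigma^2 (T-1)/T$ rather than $\sigma^2$, while dividing by $N(T-1) - k$ removes the distortion.

```python
import numpy as np

rng = np.random.default_rng(5)
N, T, k = 5000, 2, 1
sigma2 = 1.0
X = rng.normal(size=(N, T, k))
alpha = rng.normal(size=(N, 1))
y = alpha + X @ np.array([1.0]) + np.sqrt(sigma2) * rng.normal(size=(N, T))

# FE slope from time-demeaned data
Xd = (X - X.mean(axis=1, keepdims=True)).reshape(N * T, k)
yd = (y - y.mean(axis=1, keepdims=True)).reshape(N * T)
b_fe = np.linalg.solve(Xd.T @ Xd, Xd.T @ yd)

# Residuals y_i. - X_i. b - alpha_hat_i tau_T coincide with the demeaned residuals
rss = np.sum((yd - Xd @ b_fe) ** 2)

s2_ml = rss / (N * T)               # ML estimator, no degrees-of-freedom correction
s2_dof = rss / (N * (T - 1) - k)    # degrees-of-freedom corrected estimator
```

With $T = 2$ the ML estimate is roughly half the true variance however large $N$ becomes.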
and
It follows that
\[
\Sigma_v = E(v_{i.}v_{i.}') = (\sigma_\alpha^2 + \sigma^2)
\begin{pmatrix}
1 & \rho & \cdots & \rho \\
\rho & 1 & \cdots & \rho \\
\vdots & \vdots & \ddots & \vdots \\
\rho & \rho & \cdots & 1
\end{pmatrix}, \tag{26.26}
\]
where
\[
\rho = \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma^2}. \tag{26.27}
\]
Note that the presence of the time-invariant effects, $\alpha_i$, introduces equi-correlation among
regression errors belonging to the same cross-sectional unit, although errors from different
cross-sectional units are independent. It follows that the GLS estimator needs to be used to
obtain an efficient estimator of $\beta$, which is given by
\[
\hat{\beta}_{RE} = \left( \sum_{i=1}^{N} X_{i.}'\Sigma_v^{-1}X_{i.} \right)^{-1} \sum_{i=1}^{N} X_{i.}'\Sigma_v^{-1}y_{i.}. \tag{26.28}
\]
Assumption RE.3: The matrix $(NT)^{-1}\sum_{i=1}^{N} X_{i.}'\Sigma_v^{-1}X_{i.}$ is nonsingular for
all $N$ and $T$, and as $N$ and $T \to \infty$.
\[
Var(\hat{\beta}_{RE}) = \left( \sum_{i=1}^{N} X_{i.}'\Sigma_v^{-1}X_{i.} \right)^{-1}. \tag{26.29}
\]
If the variance components $\sigma^2$ and $\sigma_\alpha^2$ are unknown, a two-step procedure can be
used to implement the GLS. In the first step, the variance components are estimated using some
consistent estimators. In particular, the within-group residuals can be used to estimate
$\sigma^2$ and $\sigma_\alpha^2$:
\[
\hat{\sigma}^2 = \frac{1}{N(T-1)-k}\sum_{i=1}^{N} (y_{i.} - X_{i.}\hat{\beta}_{FE})'\, M_T\, (y_{i.} - X_{i.}\hat{\beta}_{FE}),
\]
\[
\hat{\sigma}_\alpha^2 = \frac{1}{N-k}\sum_{i=1}^{N} (\bar{y}_i - \hat{\beta}_{FE}'\bar{x}_i)^2 - \frac{1}{T}\hat{\sigma}^2.
\]
But care must be exercised since there is no guarantee that $\hat{\sigma}_\alpha^2 > 0$, when $T$ is
relatively small. An alternative estimator of $\sigma_\alpha^2$ which is ensured to be positive is
given by
\[
\tilde{\sigma}_\alpha^2 = \frac{\sum_{i=1}^{N} (\hat{\alpha}_i - \bar{\hat{\alpha}})^2}{N-1},
\]
where $\hat{\alpha}_i$ is the least squares estimate of $\alpha_i$ given by (26.24) and
$\bar{\hat{\alpha}} = N^{-1}\sum_{i=1}^{N} \hat{\alpha}_i$. However, $\tilde{\sigma}_\alpha^2$ is a
consistent estimator of $\sigma_\alpha^2$ only if both $N$ and $T$ are large.
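\[
\Sigma_v = \sigma^2 I_T + \sigma_\alpha^2\, \tau_T\tau_T', \tag{26.30}
\]
\begin{align*}
\Sigma_v &= \sigma^2 I_T + \sigma_\alpha^2\, T\, \tau_T(\tau_T'\tau_T)^{-1}\tau_T', \\
&= \sigma^2 I_T + \sigma_\alpha^2\, T\, P_T, \\
&= \sigma^2 \left( M_T + \frac{1}{\psi} P_T \right),
\end{align*}
where $P_T = I_T - M_T = \tau_T(\tau_T'\tau_T)^{-1}\tau_T'$, and
\[
\psi = \frac{\sigma^2}{T\sigma_\alpha^2 + \sigma^2} = \frac{1-\rho}{1-\rho+\rho T}. \tag{26.31}
\]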
Because ψ > 0, it follows from (26.33) that the difference between the covariance matrices of
β̂ FE and β̂ RE is a positive semi-definite matrix. Namely, under RE specification the RE estimator
is more efficient than the FE estimator.
Further insights into the RE procedure can be obtained by noting that, from (26.30),
\[
\Sigma_v^{-1/2} = \frac{1}{\sigma}\,(I_T - \phi P_T),
\]
where $\phi = 1 - \psi^{1/2}$. Hence, the RE estimator is obtained by applying the pooled OLS
estimator to the transformed equation
where
\[
y_{i.}^0 = (I_T - \phi P_T)\, y_{i.} = y_{i.} - \phi\, \bar{y}_{i.}, \quad
X_{i.}^0 = (I_T - \phi P_T)\, X_{i.} = X_{i.} - \phi\, \bar{X}_{i.}.
\]
Note that errors in (26.34) are serially uncorrelated, and hence the pooled OLS in this model is
efficient. Seen from this perspective, the RE estimator is obtained by quasi-time demeaning (or
quasi-differencing) the data: rather than removing the time average from the explanatory and
dependent variables at each t as in the FE approach, the RE approach removes a fraction of the time
average. If φ is close to 1 (which implies ψ close to 0), the random effects and fixed-effects esti-
mates tend to be close.
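The quasi-demeaning interpretation can be checked against the GLS formula (26.28). The sketch below assumes known variance components and a zero-mean random effect (values chosen for illustration); it also verifies the decomposition of the error covariance matrix into $M_T$ and $P_T$ components with $\psi = \sigma^2/(T\sigma_\alpha^2 + \sigma^2)$ and $\phi = 1 - \psi^{1/2}$.

```python
import numpy as np

rng = np.random.default_rng(6)
N, T, k = 200, 4, 2
sigma2, sigma2_alpha = 1.0, 2.0                     # assumed variance components
beta = np.array([1.0, 0.5])
X = rng.normal(size=(N, T, k))
y = (np.sqrt(sigma2_alpha) * rng.normal(size=(N, 1)) + X @ beta
     + np.sqrt(sigma2) * rng.normal(size=(N, T)))

tau = np.ones((T, 1))
P = tau @ tau.T / T                                 # P_T
M = np.eye(T) - P                                   # M_T
Sigma_v = sigma2 * np.eye(T) + sigma2_alpha * (tau @ tau.T)
psi = sigma2 / (T * sigma2_alpha + sigma2)
phi = 1.0 - np.sqrt(psi)

# GLS as in (26.28)
Si = np.linalg.inv(Sigma_v)
A = sum(X[i].T @ Si @ X[i] for i in range(N))
b = sum(X[i].T @ Si @ y[i] for i in range(N))
b_gls = np.linalg.solve(A, b)

# Pooled OLS on quasi-demeaned data: subtract the fraction phi of the time averages
X0 = (X - phi * X.mean(axis=1, keepdims=True)).reshape(N * T, k)
y0 = (y - phi * y.mean(axis=1, keepdims=True)).reshape(N * T)
b_qd = np.linalg.solve(X0.T @ X0, X0.T @ y0)
```

The two estimates agree because the inverse covariance matrix is proportional to $(I_T - \phi P_T)^2$, so the scale factor cancels in the GLS formula.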
The above discussion also shows that, similar to FE estimation, the consistency of the RE esti-
mator crucially depends on the assumption of strict exogeneity of the explanatory variables. In
the presence of weakly exogenous regressors, the transformation (26.34) to eliminate the indi-
vidual effects would render the transformed regressors, Xi.0 , correlated with the new error term,
u0i. , thus inducing a small sample bias in the β̂ RE .
There are a number of advantages in using an RE specification. First, it allows the derivation
of efficient estimators which, as seen above, make use of both within- and between-group varia-
tions. Further, contrary to the FE specification, with an RE specification it is possible to estimate
the impact of time-invariant variables. However, the disadvantage is that one has to specify a con-
ditional density of α i given Xi. , which needs to be independent of the explanatory variables. If
such an independence assumption does not hold, then the RE estimator would be inconsistent.
For further discussion, see Mundlak (1978).
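where
\[
S = \left( I_T - \phi\, \frac{\tau_T\tau_T'}{\tau_T'\tau_T} \right) \frac{1}{\sqrt{1-\rho}},
\]
and as before, $\phi = 1 - \sqrt{\dfrac{1-\rho}{1-\rho+\rho T}}$. Under the cross-sectional
independence of the errors we obtain the following log-likelihood function for the RE model
\[
\ell(\theta) = -\frac{TN}{2}\log(2\pi\sigma_v^2) - \frac{TN}{2}\log(1-\rho)
- \frac{1}{2\sigma_v^2}\sum_{i=1}^{N} (Sy_i - \alpha S\tau_T - SX_i\beta)'\,(Sy_i - \alpha S\tau_T - SX_i\beta),
\]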
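\[
\ell(\theta) = -\frac{TN}{2}\log(2\pi\sigma_v^2) - \frac{TN}{2}\log(1-\rho)
- \frac{1}{2\sigma_v^2}\sum_{i=1}^{N} (\tilde{y}_i - \tilde{\alpha}\tau_T - \tilde{X}_i\beta)'\,(\tilde{y}_i - \tilde{\alpha}\tau_T - \tilde{X}_i\beta). \tag{26.36}
\]
Also
\[
\sum_{i=1}^{N} (\tilde{y}_i - \tilde{\alpha}\tau_T - \tilde{X}_i\beta)'\,(\tilde{y}_i - \tilde{\alpha}\tau_T - \tilde{X}_i\beta)
= \sum_{i=1}^{N} (\tilde{y}_i - \tilde{X}_i\beta)'\,(\tilde{y}_i - \tilde{X}_i\beta) + NT\tilde{\alpha}^2
- 2\tilde{\alpha}\sum_{i=1}^{N} \tau_T'(\tilde{y}_i - \tilde{X}_i\beta).
\]
For a given value of $\rho$, the ML estimators of $\tilde{\alpha}$, $\beta$ and $\sigma_v^2$ are
\[
\hat{\tilde{\alpha}}(\rho) = N^{-1}T^{-1}\sum_{i=1}^{N} \tau_T'\left[\tilde{y}_i - \tilde{X}_i\hat{\beta}(\rho)\right],
\]
\[
\hat{\beta}(\rho) = \left( \sum_{i=1}^{N} \tilde{X}_i'\tilde{X}_i \right)^{-1}
\left( \sum_{i=1}^{N} \tilde{X}_i'\tilde{y}_i - \hat{\tilde{\alpha}}(\rho)\sum_{i=1}^{N} \tilde{X}_i'\tau_T \right),
\]
and
\[
\hat{\sigma}_v^2(\rho) = \frac{1}{NT}\sum_{i=1}^{N}
\left[\tilde{y}_i - \hat{\tilde{\alpha}}(\rho)\tau_T - \tilde{X}_i\hat{\beta}(\rho)\right]'
\left[\tilde{y}_i - \hat{\tilde{\alpha}}(\rho)\tau_T - \tilde{X}_i\hat{\beta}(\rho)\right].
\]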
These ML estimators can be substituted back into the log-likelihood function to obtain a con-
centrated log-likelihood function in terms of ρ. The concentrated log-likelihood function can
then be maximized using grid search techniques which can be readily implemented, considering
that ρ must lie in the region 0 ≤ ρ < 1. Plotting the profile function of the concentrated log-
likelihood function also allows us to check for multiple or local maxima. See Maddala (1971)
and Hsiao (2003) for further details.
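A grid search of this kind might look as follows. This is a sketch under stated assumptions rather than a transcription of (26.36): it evaluates the exact Gaussian log-likelihood of the equicorrelated error model directly, using the log-determinant of the $T \times T$ correlation matrix, and concentrates out the intercept, the slopes and $\sigma_v^2$ by GLS for each trial value of $\rho$. The function name `concentrated_loglik`, the grid and the simulated design are all hypothetical.

```python
import numpy as np

def concentrated_loglik(rho, y, X):
    """Gaussian RE log-likelihood concentrated over (alpha, beta, sigma_v^2) for given rho."""
    N, T, k = X.shape
    R = (1.0 - rho) * np.eye(T) + rho * np.ones((T, T))    # error correlation matrix
    Ri = np.linalg.inv(R)
    Z = np.concatenate([np.ones((N, T, 1)), X], axis=2)    # intercept plus regressors
    A = np.einsum('itj,ts,isl->jl', Z, Ri, Z)
    b = np.einsum('itj,ts,is->j', Z, Ri, y)
    theta = np.linalg.solve(A, b)                          # GLS estimates for this rho
    e = y - Z @ theta
    s2 = np.einsum('it,ts,is->', e, Ri, e) / (N * T)       # sigma_v^2 estimate for this rho
    _, logdet = np.linalg.slogdet(R)
    ll = -0.5 * N * T * (np.log(2.0 * np.pi * s2) + 1.0) - 0.5 * N * logdet
    return ll, theta, s2

rng = np.random.default_rng(7)
N, T = 300, 5
beta = np.array([1.0, -1.0])
X = rng.normal(size=(N, T, 2))
# Unit variances for the random effect and the idiosyncratic error, so true rho = 0.5
y = 0.5 + rng.normal(size=(N, 1)) + X @ beta + rng.normal(size=(N, T))

grid = np.linspace(0.0, 0.99, 100)
ll = np.array([concentrated_loglik(r, y, X)[0] for r in grid])
rho_hat = grid[ll.argmax()]
_, theta_hat, s2_hat = concentrated_loglik(rho_hat, y, X)
```

Plotting `ll` against `grid` gives the profile of the concentrated log-likelihood, which can be inspected for multiple or local maxima as suggested above.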
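Alternatively, the cross-sectional regression could be based on time averages of $y_{it}$ and
$x_{it}$, for example
\[
\bar{y}_i = \alpha + \beta'\bar{x}_i + \bar{v}_i, \tag{26.40}
\]
where as before $\bar{y}_i = T^{-1}\sum_{t=1}^{T} y_{it}$, $\bar{x}_i = T^{-1}\sum_{t=1}^{T} x_{it}$, and
$\bar{v}_i = \bar{u}_i + \eta_i$. Running the regression of $\bar{y}_i$ on $\bar{x}_{i.}$ defined by
(26.40), we obtain the cross-sectional estimator of $\beta$, which we denote by $\hat{\beta}_b$, namely
\[
\hat{\beta}_b = \left[ N^{-1}\sum_{i=1}^{N} (\bar{x}_{i.}-\bar{x})(\bar{x}_{i.}-\bar{x})' \right]^{-1}
N^{-1}\sum_{i=1}^{N} (\bar{x}_{i.}-\bar{x})(\bar{y}_i-\bar{y}). \tag{26.41}
\]
$\hat{\beta}_b$ is also known as the between estimator since it only exploits variation between
groups, while ignoring the variability of observations within groups. For future reference we also
note that
\[
\hat{\beta}_b = Q_{b,NT}^{-1}\, q_{b,NT},
\]
where
\[
Q_{b,NT} = N^{-1}\sum_{i=1}^{N} (\bar{x}_{i.}-\bar{x})(\bar{x}_{i.}-\bar{x})', \tag{26.42}
\]
and
\[
q_{b,NT} = N^{-1}\sum_{i=1}^{N} (\bar{x}_{i.}-\bar{x})(\bar{y}_i-\bar{y}). \tag{26.43}
\]
To obtain the variance of the between estimator, since
$\bar{y}_i - \bar{y} = (\bar{x}_i-\bar{x})'\beta + (\bar{v}_i - \bar{v})$, we note that
\[
\hat{\beta}_b - \beta = Q_{b,NT}^{-1} \left[ N^{-1}\sum_{i=1}^{N} (\bar{x}_i-\bar{x})\,\bar{v}_i \right].
\]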
Unlike the RE estimator which is consistent in terms of both N and T, the between estimator is
consistent only if N → ∞, which is not surprising since it ignores the variability along the time
dimension.
where vit = uit + ηi . Since under RE specification E (vit |X ) = 0, then Assumption P1 of the
pooled OLS estimator is satisfied. Also Assumptions P2 and P3 are satisfied under the RE model.
Furthermore, Assumption P5 clearly applies to vit , as they allow for the errors of the pooled OLS
regression to be serially correlated. Finally, Assumption P4, the cross-sectional independence of
the errors, is assumed to hold for both pooled OLS and RE estimators. Therefore, pooled OLS
continues to be consistent under RE specification, although it will be inefficient under the RE
specification that maintains uit to be serially uncorrelated and homoskedastic. But these assump-
tions are likely to be quite restrictive in practice, and pooled OLS with robust standard errors
might be preferable. For estimation of robust standard errors for the pooled OLS estimator in
the presence of general forms of residual serial correlation and cross-sectional heteroskedastic-
ity see Section 26.7.
namely the total variations of the regressors in the case of the pooled OLS decomposes into the
total variations in the case of within (FE) and between estimators. Also let
\[
q_{p,NT} = (NT)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T} (x_{it}-\bar{x})(y_{it}-\bar{y}),
\]
where qFE,NT , and qb,NT are defined by (26.21) and (26.43), respectively. Using (26.32) and the
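above notations, the RE estimator, $\hat{\beta}_{RE}$, can be rewritten as
\[
\hat{\beta}_{RE} = \left( Q_{FE,NT} + \psi\, Q_{b,NT} \right)^{-1} \left( q_{FE,NT} + \psi\, q_{b,NT} \right),
\]
\[
\hat{\beta}_{RE} = \left( Q_{FE,NT} + \psi\, Q_{b,NT} \right)^{-1} \left( Q_{FE,NT}\,\hat{\beta}_{FE} + \psi\, Q_{b,NT}\,\hat{\beta}_b \right),
\]
\[
\hat{\beta}_{RE} = W\hat{\beta}_b + (I_k - W)\,\hat{\beta}_{FE}, \tag{26.46}
\]
where
\[
W = \psi \left( Q_{FE,NT} + \psi\, Q_{b,NT} \right)^{-1} Q_{b,NT}.
\]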
Expression (26.46) shows that β̂ RE is a weighted average of the between-group and within-group
estimators. If ψ → 0, the RE estimator becomes the FE estimator, while for ψ → 1, it is easy
to see from (26.46) that β̂ RE converges to the OLS estimator. The parameter ψ measures the
degree of heterogeneity in the intercept; under the pooled OLS, we have α i = α and σ 2α = 0,
and thus ψ = 1; under the fixed-effects hypothesis, the case of maximum heterogeneity, ψ = 0.
It also follows from (26.31) that as T → ∞, then ψ → 0 and RE and FE estimators tend to
the same value.
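The weighted-average representation (26.46) can be confirmed numerically. The sketch below assumes known variance components and a model with a common intercept, and compares the combination of the between and within estimators with the slope coefficients from a direct GLS regression that includes an intercept. All numerical values are invented for the illustration.

```python
import numpy as np

rng = np.random.default_rng(8)
N, T, k = 150, 4, 2
sigma2, sigma2_alpha = 1.0, 1.5                               # assumed variance components
beta = np.array([1.0, -0.5])
X = rng.normal(size=(N, T, k)) + rng.normal(size=(N, 1, k))   # within and between variation
y = (2.0 + np.sqrt(sigma2_alpha) * rng.normal(size=(N, 1)) + X @ beta
     + np.sqrt(sigma2) * rng.normal(size=(N, T)))
psi = sigma2 / (T * sigma2_alpha + sigma2)

xbar_i, ybar_i = X.mean(axis=1), y.mean(axis=1)
Xw = (X - xbar_i[:, None, :]).reshape(N * T, k)               # within deviations
yw = (y - ybar_i[:, None]).reshape(N * T)
Xb, yb = xbar_i - xbar_i.mean(axis=0), ybar_i - ybar_i.mean() # between deviations

Q_fe, q_fe = Xw.T @ Xw / (N * T), Xw.T @ yw / (N * T)
Q_b, q_b = Xb.T @ Xb / N, Xb.T @ yb / N
b_fe = np.linalg.solve(Q_fe, q_fe)
b_b = np.linalg.solve(Q_b, q_b)

W = psi * np.linalg.solve(Q_fe + psi * Q_b, Q_b)
b_re = W @ b_b + (np.eye(k) - W) @ b_fe                       # weighted average, as in (26.46)

# Direct GLS with an intercept, for comparison
Si = np.linalg.inv(sigma2 * np.eye(T) + sigma2_alpha * np.ones((T, T)))
Z = np.concatenate([np.ones((N, T, 1)), X], axis=2)
A = np.einsum('itj,ts,isl->jl', Z, Si, Z)
c = np.einsum('itj,ts,is->j', Z, Si, y)
b_gls = np.linalg.solve(A, c)[1:]
```

Setting `psi` close to 0 in this sketch moves `b_re` towards `b_fe`, while `psi = 1` reproduces pooled OLS.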
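a general covariance matrix of the $u_{it}$, as in White (1980). In particular, the robust asymptotic
covariance matrix of $\hat{\beta}_{FE}$, also known as the ‘clustered’ covariance matrix (CCM)
estimator, is given by
\[
\widehat{Var}(\hat{\beta}_{FE}) = \frac{1}{NT}\, Q_{FE,NT}^{-1}
\left( \frac{1}{NT}\sum_{i=1}^{N} X_{i.}' M_T\, \hat{u}^*_{i.}\hat{u}^{*\prime}_{i.}\, M_T X_{i.} \right) Q_{FE,NT}^{-1},
\]
where $\hat{u}^*_{i.} = M_T (y_{i.} - X_{i.}\hat{\beta}_{FE})$. To show that
$\widehat{Var}(\hat{\beta}_{FE})$ is an appropriate estimator of $Var(\hat{\beta}_{FE})$, defined by
(26.22), we need to establish that
\[
\lim_{N,T \to \infty} \frac{1}{NT}\sum_{i=1}^{N} X_{i.}' M_T\, E(\hat{u}^*_{i.}\hat{u}^{*\prime}_{i.} \mid X)\, M_T X_{i.}
= \lim_{N,T \to \infty} \frac{1}{NT}\sum_{i=1}^{N} X_{i.}' M_T\, E(u_{i.}u_{i.}' \mid X)\, M_T X_{i.}.
\]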
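Note that $\hat{u}^*_{i.} = M_T u_{i.} - M_T X_{i.}(\hat{\beta}_{FE} - \beta)$, and hence (recalling
that $M_T u_{i.}$ and $X_{i.}(\hat{\beta}_{FE} - \beta)$ are uncorrelated)
\[
E(\hat{u}^*_{i.}\hat{u}^{*\prime}_{i.} \mid X) = M_T\, E(u_{i.}u_{i.}' \mid X)\, M_T
+ M_T X_{i.}\, E\left[(\hat{\beta}_{FE} - \beta)(\hat{\beta}_{FE} - \beta)' \mid X\right] X_{i.}' M_T.
\]
Therefore
\[
\frac{1}{NT}\sum_{i=1}^{N} X_{i.}' M_T\, E(\hat{u}^*_{i.}\hat{u}^{*\prime}_{i.} \mid X)\, M_T X_{i.}
= \frac{1}{NT}\sum_{i=1}^{N} X_{i.}' M_T\, E(u_{i.}u_{i.}' \mid X)\, M_T X_{i.}
+ \frac{T}{N}\sum_{i=1}^{N} \left( \frac{X_{i.}' M_T X_{i.}}{T} \right)
E\left[(\hat{\beta}_{FE} - \beta)(\hat{\beta}_{FE} - \beta)' \mid X\right]
\left( \frac{X_{i.}' M_T X_{i.}}{T} \right).
\]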
Consider now the relevant case where $T$ is fixed as $N \to \infty$. In this case we have already
established that
\[
E\left[(\hat{\beta}_{FE} - \beta)(\hat{\beta}_{FE} - \beta)' \mid X\right] = Var(\hat{\beta}_{FE}) = O\!\left(\frac{1}{N}\right),
\]
and hence
\[
\frac{1}{NT}\sum_{i=1}^{N} X_{i.}' M_T\, E(\hat{u}^*_{i.}\hat{u}^{*\prime}_{i.} \mid X)\, M_T X_{i.}
= \frac{1}{NT}\sum_{i=1}^{N} X_{i.}' M_T\, E(u_{i.}u_{i.}' \mid X)\, M_T X_{i.} + O(N^{-1}),
\]
as desired. But when $T$ is also large we either need to restrict the degree of error serial
correlations or assume that $T/N \to 0$, as $N$ and $T \to \infty$, to obtain the consistency of the
variance estimator. Clearly $\widehat{Var}(\hat{\beta}_{FE})$ is not a consistent estimator if $T$ is
large and $N$ is fixed. See also Hansen (2007). In a simulation study, Kezdi (2004) showed that
$\widehat{Var}(\hat{\beta}_{FE})$ behaves well in finite samples, when $N$ is large and $T$ is fixed.
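A direct implementation of the clustered covariance matrix estimator for the FE estimator is short. The sketch below follows the sandwich form described in this section, using within-demeaned data; the function name `fe_clustered` and the simulated AR(1), heteroskedastic errors are assumptions made for illustration.

```python
import numpy as np

def fe_clustered(y, X):
    """FE slope and cluster-robust covariance built from within residuals."""
    N, T, k = X.shape
    Xd = X - X.mean(axis=1, keepdims=True)            # rows of M_T X_i.
    yd = y - y.mean(axis=1, keepdims=True)            # M_T y_i.
    Q = np.einsum('itj,itl->jl', Xd, Xd) / (N * T)    # Q_{FE,NT}
    b = np.linalg.solve(Q, np.einsum('itj,it->j', Xd, yd) / (N * T))
    e = yd - Xd @ b                                   # M_T (y_i. - X_i. b)
    s = np.einsum('itj,it->ij', Xd, e)                # X_i.' M_T e_i., one row per unit
    V = s.T @ s / (N * T)                             # middle term of the sandwich
    Qi = np.linalg.inv(Q)
    return b, Qi @ V @ Qi / (N * T)

rng = np.random.default_rng(9)
N, T = 400, 6
beta = np.array([1.0, 2.0])
X = rng.normal(size=(N, T, 2))
u = np.zeros((N, T))
u[:, 0] = rng.normal(size=N)
for t in range(1, T):                                 # AR(1) errors within each unit
    u[:, t] = 0.6 * u[:, t - 1] + rng.normal(size=N)
u *= rng.uniform(0.5, 2.0, size=(N, 1))               # unit-specific error scale
y = rng.normal(size=(N, 1)) + X @ beta + u
b_fe, cov_fe = fe_clustered(y, X)
se_fe = np.sqrt(np.diag(cov_fe))
```

No structure is imposed on the within-unit error covariances, which is why the approach is suited to large $N$ and fixed $T$.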
Similar arguments can also be applied to obtain a consistent estimator of the variance of the
pooled OLS given by (26.10). Let
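where $\tilde{X}_{i.} = (x_{i1}-\bar{x}, x_{i2}-\bar{x}, \ldots, x_{iT}-\bar{x})'$, and
$\hat{u}_{i,OLS} = (\hat{u}_{i1,OLS}, \hat{u}_{i2,OLS}, \ldots, \hat{u}_{iT,OLS})'$.
In the case of the RE specification, a feasible estimator that allows for an arbitrary error
covariance matrix is given by
\[
\widehat{Var}(\hat{\beta}_{RE}) = \left( \sum_{i=1}^{N} X_{i.}'\hat{\Sigma}_v^{-1}X_{i.} \right)^{-1}
\left( \sum_{i=1}^{N} X_{i.}'\hat{\Sigma}_v^{-1}\hat{v}_{i.}\hat{v}_{i.}'\hat{\Sigma}_v^{-1}X_{i.} \right)
\left( \sum_{i=1}^{N} X_{i.}'\hat{\Sigma}_v^{-1}X_{i.} \right)^{-1}.
\]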
where lit and kit are the logarithm of labour and capital inputs, respectively, and mi is an input that
represents the effect of a set of unobserved inputs such as quality of the soil, or the location of the land.
It is realistic to assume that mi remains constant over time (over a short time period), and that it
is known by the farmer, although not observed by the econometrician. If the farmer maximizes his
expected profits, then he will choose the observed inputs in xit = (lit , kit ) , in the light of mi . Hence,
there will be a correlation between the observed and unobserved inputs that renders pooled OLS,
and the RE estimators inconsistent. To avoid inconsistent estimates FE estimates can be used. A
similar problem has been considered in Mundlak (1961), who identifies mi with the effect of an
unobserved ‘management’ activity that influences observed inputs. Mundlak (1961) also suggests
how to measure the ‘management bias’ by comparing the FE regression with the OLS regression
from a pooled regression without farm fixed-effects.
Example 57 (Grunfeld’s investment equation II) Following from Examples 37 and 38, we now
use data from the study by Grunfeld (1960) and Grunfeld and Griliches (1960) on eleven firms in
the US economy over the period 1935–1954. Consider
\[
I_{it} = \alpha_i + \beta_1 F_{it} + \beta_2 C_{it} + u_{it}, \quad i = 1, 2, \ldots, 11;\ t = 1935, 1936, \ldots, 1954, \tag{26.47}
\]
where Iit is gross investment, Fit is the market value of the firm at the end of the previous year,
and Cit is the value of the stock of plant and equipment at the end of the previous year. The eleven
firms indexed by i are General Motors (GM), Chrysler (CH), General Electric (GE), Westinghouse
(WE), US Steel (USS), Atlantic Refining (AR), IBM, Union Oil (UO), Goodyear (GY), Diamond
Match (DM), and American Steel (AS). Table 26.1 reports estimation of the above equation
using various estimation methods: the pooled OLS estimator given by (26.7), the FE estimator
given by (26.17), the RE estimator computed by maximization of the likelihood function, (26.36),
using the Newton-Raphson algorithm, and the between (BE) estimator given by (26.41). Note that
results from FE and RE (or ML) are very close to each other; this is also confirmed by the Hausman
test, which does not reject the null hypothesis that the RE and FE are identical (see Section 26.9.1
for a description of the Hausman test).
Estimation method          $\hat\beta_1$      $\hat\beta_2$
OLS                        0.114 (0.006)      0.227 (0.024)
FE                         0.110 (0.011)      0.310 (0.016)
RE                         0.109 (0.010)      0.308 (0.016)
RE-MLE                     0.109 (0.009)      0.307 (0.016)
BE                         0.134 (0.027)      0.029 (0.175)
Hausman test FE vs RE      3.97 [0.137]
$$
y = (\alpha \otimes \tau_T) + (\tau_N \otimes d) + X\beta + u, \tag{26.49}
$$
where $d = (d_1, d_2, \dots, d_T)'$. Let $P_T = \tau_T(\tau_T'\tau_T)^{-1}\tau_T'$, $P_N = \tau_N(\tau_N'\tau_N)^{-1}\tau_N'$, and
consider the within transformation matrix
$$
Q = I_N \otimes I_T - I_N \otimes P_T - P_N \otimes I_T + P_N \otimes P_T.
$$
Let $y^* = Qy$, $X^* = QX$, and $u^* = Qu$. For example, the generic element of $y^*$ is $y_{it} - \bar y_{i.} - \bar y_{.t} + \bar y_{..}$.
Noting that $P_T\tau_T = \tau_T$, $P_N\tau_N = \tau_N$, we have $Q(\alpha \otimes \tau_T) = 0$, and $Q(\tau_N \otimes d) = 0$.
Hence, the two-way FE estimator of $\beta$ can be obtained by applying OLS to the transformed
model
$$
y^* = X^*\beta + u^*.
$$
One important point to bear in mind is that the above transformation wipes out the α i and
dt effects, as well as the effect of any time-invariant or individual-invariant variables. Therefore,
the two-way FE estimator cannot estimate the effect of time-invariant and individual-invariant
variables.
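The properties of the two-way within transformation are easy to check numerically. The following sketch (using NumPy, with arbitrary dimensions and simulated data chosen only for illustration) builds $Q$ from Kronecker products, with observations stacked by unit and then by time as in (26.49), and confirms that it removes both sets of effects and reproduces the double-demeaned element $y_{it} - \bar y_{i.} - \bar y_{.t} + \bar y_{..}$.

```python
import numpy as np

def two_way_within_matrix(N, T):
    """Q = I_N (x) I_T - I_N (x) P_T - P_N (x) I_T + P_N (x) P_T."""
    P_T = np.full((T, T), 1.0 / T)   # tau_T (tau_T' tau_T)^{-1} tau_T'
    P_N = np.full((N, N), 1.0 / N)   # tau_N (tau_N' tau_N)^{-1} tau_N'
    I_N, I_T = np.eye(N), np.eye(T)
    return (np.kron(I_N, I_T) - np.kron(I_N, P_T)
            - np.kron(P_N, I_T) + np.kron(P_N, P_T))

rng = np.random.default_rng(0)
N, T = 4, 3
alpha = rng.normal(size=N)       # individual effects
d = rng.normal(size=T)           # time effects
Y = rng.normal(size=(N, T))      # Y[i, t] holds y_it

Q = two_way_within_matrix(N, T)

# Row-major reshape stacks the T observations of unit 1, then unit 2, and so on.
y_star = Q @ Y.reshape(-1)
manual = (Y - Y.mean(axis=1, keepdims=True)
            - Y.mean(axis=0, keepdims=True) + Y.mean())

# Both kinds of effects are annihilated by Q.
effects_alpha = Q @ np.kron(alpha, np.ones(T))
effects_d = Q @ np.kron(np.ones(N), d)

print(np.allclose(y_star, manual.reshape(-1)))
print(np.allclose(effects_alpha, 0.0), np.allclose(effects_d, 0.0))
```

Forming $Q$ explicitly is only sensible for small $N$ and $T$; in practice the double demeaning in `manual` gives the same result without building an $NT \times NT$ matrix.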
If the true model is a two-way fixed-effects model, as in (26.48), then applying the pooled
OLS, which ignores both time and individual effects, or the one-way FE estimator, which omits
the time effects, will yield biased and inconsistent estimates of regression coefficients.
Under the random effects specification, $\alpha_i$ and $d_t$ are assumed to be random draws from
a probability distribution, and the GLS estimator can be used. Under this specification, it is
assumed that $E(u_{it}|X_{i.}, \alpha_i, d_t) = 0$, $E(\alpha_i|X_{i.}) = 0$, $E(d_t|X_{i.}) = 0$, $E(u_{i.}u_{i.}'|X_{i.}, \alpha_i) = \sigma_u^2 I_T$,
$E(\alpha_i^2|X_{i.}) = \sigma_\alpha^2$, $E(d_t^2|X_{i.}) = \sigma_d^2$ for all i and t. In this case, letting $v_{it} = \alpha_i + d_t + u_{it}$, the
covariance matrix of $v = (v_{1.}', v_{2.}', \dots, v_{N.}')'$, where $v_{i.} = (v_{i1}, v_{i2}, \dots, v_{iT})'$, is
$$
E(vv') = \Sigma_v = \sigma_\alpha^2\left(I_N \otimes \tau_T\tau_T'\right) + \sigma_d^2\left(\tau_N\tau_N' \otimes I_T\right) + \sigma_u^2\left(I_N \otimes I_T\right).
$$
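To obtain the GLS estimator, an expression for the inverse of $\Sigma_v$ is needed. It is possible to show
that (see Wallace and Hussain (1969))
$$
\Sigma_v^{-1} = \frac{1}{\sigma_u^2}\left[I_{NT} - \psi_1\left(I_N \otimes \tau_T\tau_T'\right) - \psi_2\left(\tau_N\tau_N' \otimes I_T\right) + \psi_3\left(\tau_N\tau_N' \otimes \tau_T\tau_T'\right)\right],
$$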
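where
$$
\psi_1 = \frac{\sigma_\alpha^2}{\sigma_u^2 + T\sigma_\alpha^2}, \quad \psi_2 = \frac{\sigma_d^2}{\sigma_u^2 + N\sigma_d^2},
$$
$$
\psi_3 = \frac{\sigma_\alpha^2\sigma_d^2}{\left(\sigma_u^2 + N\sigma_d^2\right)\left(\sigma_u^2 + T\sigma_\alpha^2\right)}\left[\frac{2\sigma_u^2 + T\sigma_\alpha^2 + N\sigma_d^2}{\sigma_u^2 + T\sigma_\alpha^2 + N\sigma_d^2}\right].
$$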
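The closed-form inverse can be verified numerically for small panels. The sketch below is only a check under illustrative values of $N$, $T$ and the three variances (all chosen arbitrarily): it builds the two-way error covariance matrix from Kronecker products, forms the inverse from $\psi_1$, $\psi_2$ and $\psi_3$ (with $\psi_1$ attached to the individual-effect block and $\psi_2$ to the time-effect block), and confirms that their product is the identity matrix.

```python
import numpy as np

def two_way_covariance(N, T, s2_u, s2_a, s2_d):
    """sigma_a^2 (I_N x J_T) + sigma_d^2 (J_N x I_T) + sigma_u^2 I_NT."""
    J_N, J_T = np.ones((N, N)), np.ones((T, T))   # tau tau' matrices
    return (s2_a * np.kron(np.eye(N), J_T)
            + s2_d * np.kron(J_N, np.eye(T))
            + s2_u * np.eye(N * T))

def two_way_covariance_inverse(N, T, s2_u, s2_a, s2_d):
    """Closed-form inverse based on psi_1, psi_2 and psi_3."""
    J_N, J_T = np.ones((N, N)), np.ones((T, T))
    psi1 = s2_a / (s2_u + T * s2_a)
    psi2 = s2_d / (s2_u + N * s2_d)
    psi3 = (s2_a * s2_d / ((s2_u + N * s2_d) * (s2_u + T * s2_a))
            * (2 * s2_u + T * s2_a + N * s2_d)
            / (s2_u + T * s2_a + N * s2_d))
    inner = (np.eye(N * T)
             - psi1 * np.kron(np.eye(N), J_T)
             - psi2 * np.kron(J_N, np.eye(T))
             + psi3 * np.kron(J_N, J_T))
    return inner / s2_u

N, T = 5, 4
s2_u, s2_a, s2_d = 1.3, 0.7, 0.4   # illustrative variance components
sigma = two_way_covariance(N, T, s2_u, s2_a, s2_d)
sigma_inv = two_way_covariance_inverse(N, T, s2_u, s2_a, s2_d)
print(np.allclose(sigma @ sigma_inv, np.eye(N * T)))
```

The practical value of the closed form is that GLS never requires a numerical inversion of the $NT \times NT$ matrix.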
Example 58 In an interesting study, Lillard and Weiss (1979) investigate the sources of variation in
the earnings of American scientists reported by a panel of PhDs every two years, and over the decade
1960–1970. The sample is composed of six fields: biology, chemistry, earth sciences, mathematics,
physics, and psychology. The earning function has the form
where dt are year dummies, malei is a dummy variable equal to 1 if the scientist is a male and 0 oth-
erwise, schooli is a set of schooling related variables, and experienceit is a set of experience related
variables. The residual earnings variation in vit is decomposed into a random effect individual vari-
ance component in the level of earnings, a random effect individual component in earnings growth,
and a serially correlated transitory component. Specifically, it is assumed that (see also Exercise 5
below)
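$$
v_{it} = \alpha_i + u_{it} + \xi_i\left(t - \bar t_T\right), \tag{26.50}
$$
$\bar t_T = T^{-1}(1 + 2 + \dots + T)$, and
$$
u_{it} = \rho u_{i,t-1} + \varepsilon_{it},
$$
where
$$
\begin{pmatrix} \alpha_i \\ \xi_i \end{pmatrix} \sim \left(0, \Sigma_{\alpha\xi}\right), \quad \varepsilon_{it} \sim IID\left(0, \sigma_\varepsilon^2\right).
$$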
The individual-specific term, α i , represents the effect of unmeasured characteristics such as ability
and work-related preferences, on the relative earnings of scientists, while ξ i represents the effect of
omitted variables which influence the growth in earnings such as individual learning ability. It is not
unreasonable to expect some of the same unobserved variables to affect both α i and ξ i , in which
case they will be correlated. The serial correlation coefficient, ρ, represents the rate of deterioration
of the effects of random shocks, ε it , which persist for more than a year. The model is estimated by
maximum likelihood. One interesting finding is that, during the sample period, divergent patterns
in earnings are observed for individuals with similar characteristics. In particular, individuals with
greater mean earnings also had greater earnings growth. Further, a substantial increase in the
variance of individual mean earnings is observed with increased experience, while the variance
of the growth component remains constant. These patterns suggest that a substantial amount of
inequality is sustained even if one adopts measures of earnings which are based on longer periods,
such as a lifetime or permanent income. See Lillard and Weiss (1979) for further details.
$$
H_0: \alpha_1 = \alpha_2 = \dots = \alpha_N = 0.
$$
The F-test is
$$
F_1 = \frac{(RRSS - URSS)/(N-1)}{URSS/\left[N(T-1) - k\right]}, \tag{26.51}
$$
where RRSS denotes the residual sum of squares under the null hypothesis, URSS the residual
sum of squares under the alternative. Under $H_0$, this statistic is distributed as $F_{(N-1),\,N(T-1)-k}$.
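The statistic in (26.51) is simple to compute once the two residual sums of squares are available. The helper below is a minimal sketch; the function name and the numerical inputs are hypothetical and are not results from any data set. It returns the statistic together with its degrees of freedom, which would then be compared with the critical values of the corresponding F distribution.

```python
def fixed_effects_f_stat(rrss, urss, n_units, n_periods, n_regressors):
    """F_1 = [(RRSS - URSS)/(N - 1)] / [URSS/(N(T - 1) - k)]."""
    df_num = n_units - 1                                 # N - 1 restrictions
    df_den = n_units * (n_periods - 1) - n_regressors    # N(T - 1) - k
    f_stat = ((rrss - urss) / df_num) / (urss / df_den)
    return f_stat, df_num, df_den

# Hypothetical sums of squares for a panel with N = 11, T = 20 and k = 2.
f1, df1, df2 = fixed_effects_f_stat(rrss=120.0, urss=100.0,
                                    n_units=11, n_periods=20, n_regressors=2)
print(f"F1 = {f1:.2f} with ({df1}, {df2}) degrees of freedom")
```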
Consider now the two-way fixed-effects specification, (26.48). In this case, it is possible to test
for joint significance of the time and group effects
$$
H_0: \alpha_1 = \alpha_2 = \dots = \alpha_N = 0, \text{ and } d_1 = d_2 = \dots = d_T = 0.
$$
One can also test the null of no group effects allowing for time effects, namely
$$
H_0: \alpha_1 = \alpha_2 = \dots = \alpha_N = 0, \text{ and } d_t \neq 0, \; t = 1, 2, \dots, T,
$$
and the F-statistic is $F_3 \sim F_{(N-1),\,(N-1)(T-1)-k}$. Finally, one can test the null of no time effects
allowing for group effects, namely
$$
H_0: d_1 = d_2 = \dots = d_T = 0, \text{ and } \alpha_i \neq 0, \; i = 1, 2, \dots, N.
$$
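Denote the efficient estimator by subscript ‘e’ and the inefficient but consistent estimator
(under the alternative hypothesis) by the subscript ‘c’. We then have
$$
\mathrm{Var}(\hat\theta_c - \hat\theta_e) = \mathrm{Var}(\hat\theta_c) - \mathrm{Var}(\hat\theta_e). \tag{26.52}
$$
This is the result used by Hausman (1978) where it is assumed that $\hat\theta_e$ is asymptotically the most
efficient estimator. However, it is easily shown that (26.52) holds under a weaker requirement,
namely when the (asymptotic) efficiency of $\hat\theta_e$ cannot be enhanced by the information contained
in $\hat\theta_c$. Consider a third estimator $\hat\theta_*$, defined as a convex combination of $\hat\theta_c$ and $\hat\theta_e$,
$$
a'\hat\theta_* = (1 - \delta)\,a'\hat\theta_e + \delta\,a'\hat\theta_c,
$$
where $a$ is a vector of constants and $\delta$ is a scalar. Its variance is
$$
\mathrm{Var}(a'\hat\theta_*) = (1-\delta)^2 a'\mathrm{Var}(\hat\theta_e)a + \delta^2 a'\mathrm{Var}(\hat\theta_c)a + 2\delta(1-\delta)\,a'\mathrm{Cov}(\hat\theta_e, \hat\theta_c)a.
$$
If $\hat\theta_c$ contains no information that improves on $\hat\theta_e$, this variance is minimized at $\delta = 0$, so its
derivative with respect to $\delta$ vanishes there,
and hence $a'[\mathrm{Var}(\hat\theta_e) - \mathrm{Cov}(\hat\theta_e, \hat\theta_c)]a = 0$. But, if this result is to hold for an arbitrary vector, $a$,
we must have
$$
\mathrm{Var}(\hat\theta_e) = \mathrm{Cov}(\hat\theta_e, \hat\theta_c).
$$
Using this in
$$
\mathrm{Var}(\hat\theta_c - \hat\theta_e) = \mathrm{Var}(\hat\theta_c) + \mathrm{Var}(\hat\theta_e) - 2\,\mathrm{Cov}(\hat\theta_e, \hat\theta_c)
$$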
yields (26.52) as desired. Because under the null hypothesis both estimators are consistent, the
difference, $\hat\theta_c - \hat\theta_e$, will converge to zero if the null hypothesis is true, while under the alternative
hypothesis it will diverge. The Hausman test based on $(\hat\theta_c - \hat\theta_e)'[\mathrm{Var}(\hat\theta_c) - \mathrm{Var}(\hat\theta_e)]^{-1}(\hat\theta_c - \hat\theta_e)$
will be consistent if $\mathrm{Var}(\hat\theta_c) - \mathrm{Var}(\hat\theta_e)$ converges to a positive definite matrix and $\hat\theta_c - \hat\theta_e$
converges to a non-zero limit under the alternative hypothesis. See Pesaran and Yamagata (2008)
for examples of cases where the Hausman test fails to be applicable. See also Chapter 28.
The Hausman testing procedure is quite general and can be applied to a variety of testing prob-
lems, and is particularly convenient in the case of panels where N is large and the use of classical
tests can encounter the incidental parameter problem. In the context of panels, Hausman and
Taylor (1981) consider the hypothesis
$$
H_0: E\left(\eta_i | x_{it}\right) = 0, \tag{26.56}
$$
where
α i = α + ηi . (26.57)
Indeed, under H0 the RE estimator of β achieves the Cramer–Rao lower bound, while under H1
it is biased. In contrast, the FE estimator of β is consistent under both H0 and H1 , but it is not
efficient under H0 . Let
q̂ = β̂ RE − β̂ FE . (26.58)
Hence, the Hausman test examines whether RE and FE estimates are significantly different.
We have
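$$
\widehat{\mathrm{Var}}(\hat q) = \widehat{\mathrm{Var}}(\hat\beta_{FE}) - \widehat{\mathrm{Var}}(\hat\beta_{RE}), \tag{26.59}
$$
where $\widehat{\mathrm{Var}}(\hat\beta_{FE})$ and $\widehat{\mathrm{Var}}(\hat\beta_{RE})$ are the estimated covariance matrices of $\hat\beta_{FE}$ and $\hat\beta_{RE}$ obtained under
the assumption that errors, $u_{it}$, are serially uncorrelated and homoskedastic. Under this setting
the Hausman statistic is given by
$$
H = \hat q'\left[\widehat{\mathrm{Var}}(\hat q)\right]^{-1}\hat q, \tag{26.60}
$$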
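Given the two sets of estimates and their estimated covariance matrices, the statistic in (26.60) is a single quadratic form. The sketch below uses NumPy and purely hypothetical inputs (they are not the Grunfeld results reported earlier); it follows the sign convention $\hat q = \hat\beta_{RE} - \hat\beta_{FE}$ of (26.58). Under the null hypothesis the statistic is usually compared with the critical values of a chi-squared distribution with degrees of freedom equal to the number of coefficients compared.

```python
import numpy as np

def hausman_statistic(b_fe, b_re, var_fe, var_re):
    """H = q' [Var(b_FE) - Var(b_RE)]^{-1} q, with q = b_RE - b_FE."""
    q = np.asarray(b_re, dtype=float) - np.asarray(b_fe, dtype=float)
    var_q = np.asarray(var_fe, dtype=float) - np.asarray(var_re, dtype=float)
    # solve() avoids forming the inverse of Var(q) explicitly
    return float(q @ np.linalg.solve(var_q, q))

# Hypothetical estimates for two slope coefficients.
b_fe = [1.0, 2.0]
b_re = [0.9, 2.2]
var_fe = np.diag([0.04, 0.09])
var_re = np.diag([0.03, 0.05])

H = hausman_statistic(b_fe, b_re, var_fe, var_re)
print(f"Hausman statistic: {H:.3f}")
```

In finite samples the estimated difference $\widehat{\mathrm{Var}}(\hat\beta_{FE}) - \widehat{\mathrm{Var}}(\hat\beta_{RE})$ need not be positive definite, in which case the quadratic form can be negative and the statistic should not be interpreted.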
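$$
\hat q = \hat\beta_{FE} - \hat\beta_{OLS} = Q_{FE,NT}^{-1}\left[(NT)^{-1}\sum_{t=1}^{T}\sum_{i=1}^{N}\left(x_{it} - \bar x_{i.}\right)u_{it}\right] - Q_{P,NT}^{-1}\left[(NT)^{-1}\sum_{t=1}^{T}\sum_{i=1}^{N}\left(x_{it} - \bar x\right)\left(\eta_i + u_{it}\right)\right].
$$
It is clear that under $H_0$, and supposing that Assumptions P1–P5 hold, $E(\hat\beta_{FE} - \hat\beta_{OLS}|X) = 0$,
or $E(\hat\beta_{FE} - \hat\beta_{OLS}) = 0$. But if $H_0$ does not hold we then have
$$
E\left(\hat\beta_{FE} - \hat\beta_{OLS}|X\right) = -Q_{P,NT}^{-1}\left\{(NT)^{-1}\sum_{t=1}^{T}\sum_{i=1}^{N}E\left[\left(x_{it} - \bar x\right)\eta_i\right]\right\} \neq 0.
$$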
Using (26.10) and (26.22) and noting that (under Assumptions P1-P5) we have
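$$
\mathrm{Cov}\left(\hat\beta_{FE}, \hat\beta_{OLS}|X\right) = \frac{1}{NT}Q_{FE,NT}^{-1}V_{FEP,NT}Q_{P,NT}^{-1},
$$
where
$$
V_{FEP,NT} = (NT)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\sum_{t'=1}^{T}\gamma_i(t,t')\left(x_{it} - \bar x_{i.}\right)\left(x_{it'} - \bar x\right)'.
$$
Similarly,
$$
\mathrm{Cov}\left(\hat\beta_{OLS}, \hat\beta_{FE}|X\right) = \frac{1}{NT}Q_{P,NT}^{-1}V_{PFE,NT}Q_{FE,NT}^{-1},
$$
$$
V_{PFE,NT} = (NT)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\sum_{t'=1}^{T}\gamma_i(t,t')\left(x_{it} - \bar x\right)\left(x_{it'} - \bar x_{i.}\right)'.
$$
Hence
$$
\mathrm{Var}\left(\hat\beta_{FE} - \hat\beta_{OLS}|X\right) = \frac{1}{NT}\left\{Q_{FE,NT}^{-1}V_{FE,NT}Q_{FE,NT}^{-1} + Q_{P,NT}^{-1}V_{P,NT}Q_{P,NT}^{-1} - Q_{FE,NT}^{-1}V_{FEP,NT}Q_{P,NT}^{-1} - Q_{P,NT}^{-1}V_{PFE,NT}Q_{FE,NT}^{-1}\right\}.
$$
The above result simplifies if we assume that the errors are serially uncorrelated, and reduces
to $\mathrm{Var}(\hat\beta_{FE}|X) - \mathrm{Var}(\hat\beta_{OLS}|X)$ if it is further assumed that the errors, $u_{it}$, are homoskedastic.
To see this, note that in the case of serially uncorrelated errors, $\gamma_i(t,t') = 0$ if $t \neq t'$, and
$\gamma_i(t,t) = \sigma_i^2$, we have
$$
\begin{aligned}
V_{FEP,NT} &= (NT)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\sigma_i^2\left(x_{it} - \bar x_{i.}\right)\left(x_{it} - \bar x\right)' \\
&= (NT)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\sigma_i^2\left(x_{it} - \bar x_{i.}\right)\left[x_{it} - \bar x_{i.} + \left(\bar x_{i.} - \bar x\right)\right]' \\
&= V_{FE,NT} + (NT)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\sigma_i^2\left(x_{it} - \bar x_{i.}\right)\left(\bar x_{i.} - \bar x\right)' \\
&= V_{FE,NT},
\end{aligned}
$$
where the last equality follows since $\sum_{t=1}^{T}(x_{it} - \bar x_{i.}) = 0$ for each i.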
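Similarly, $V_{PFE,NT} = V_{FE,NT}$, and hence
$$
\mathrm{Var}\left(\hat\beta_{FE} - \hat\beta_{OLS}|X\right) = \frac{1}{NT}\left\{Q_{FE,NT}^{-1}V_{FE,NT}Q_{FE,NT}^{-1} + Q_{P,NT}^{-1}V_{P,NT}Q_{P,NT}^{-1} - Q_{FE,NT}^{-1}V_{FE,NT}Q_{P,NT}^{-1} - Q_{P,NT}^{-1}V_{FE,NT}Q_{FE,NT}^{-1}\right\}.
$$
If we now further assume that $\sigma_i^2 = \sigma^2$ we have $V_{FE,NT} = \sigma^2 Q_{FE,NT}$, and $V_{P,NT} = \sigma^2 Q_{P,NT}$, and
we have the further simplification
$$
\begin{aligned}
\mathrm{Var}\left(\hat\beta_{FE} - \hat\beta_{OLS}|X\right) &= \frac{\sigma^2}{NT}\left\{Q_{FE,NT}^{-1}Q_{FE,NT}Q_{FE,NT}^{-1} + Q_{P,NT}^{-1}Q_{P,NT}Q_{P,NT}^{-1} - Q_{FE,NT}^{-1}Q_{FE,NT}Q_{P,NT}^{-1} - Q_{P,NT}^{-1}Q_{FE,NT}Q_{FE,NT}^{-1}\right\} \\
&= \frac{\sigma^2}{NT}\left(Q_{FE,NT}^{-1} - Q_{P,NT}^{-1}\right),
\end{aligned}
$$
which accords with Hausman’s variance formula.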
For consistent estimation of Var β̂ FE − β̂ OLS |X in the general case see the derivations in
Section 26.7.
where
α i = α + ηi , (26.62)
and zi is an m × 1 vector of observed individual-specific variables that only vary over the cross-
sectional units, i. The focus of the analysis is on estimation and inference involving the elements
of γ . Important examples of time-invariant regressors are sex, ethnicity, and place of birth.
In what follows, we allow for ηi and xit to have any degree of dependence and distinguish
between case 1, where zi is assumed to be uncorrelated with ηi , and case 2, where one or more
elements of zi are allowed to be correlated with ηi . Under case 2, to identify the time-invariant
effects we need to assume that there exists a sufficient number of instruments that can be used
to deal with the dependence of zi and ηi .
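$$
\hat u_{it} = y_{it} - \hat\beta_{FE}'x_{it}. \tag{26.63}
$$
Then the FEF estimator of $\gamma$ is computed by regressing $\bar{\hat u}_i = T^{-1}\sum_{t=1}^{T}\hat u_{it} = \bar y_i - \hat\beta_{FE}'\bar x_i$ on an
intercept and $z_i$, and is given by
$$
\hat\gamma_{FEF} = \left[\sum_{i=1}^{N}\left(z_i - \bar z\right)\left(z_i - \bar z\right)'\right]^{-1}\sum_{i=1}^{N}\left(z_i - \bar z\right)\left(\bar{\hat u}_i - \bar{\hat u}\right), \tag{26.64}
$$
where $\bar{\hat u} = N^{-1}\sum_{i=1}^{N}\bar{\hat u}_i$.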
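The two steps behind (26.63) and (26.64) can be sketched in a few lines. The code below is an illustrative NumPy implementation on simulated data, not the authors' own code; the function name, array layout, and data generating process (one time-varying regressor correlated with $\eta_i$ and one time-invariant regressor independent of $\eta_i$, as in case 1) are assumptions made for the example.

```python
import numpy as np

def fef_estimator(y, X, Z):
    """Two-step FEF: y is (N, T), X is (N, T, k), Z is (N, m)."""
    k = X.shape[2]
    # Step 1: FE (within) estimator of beta
    Xd = (X - X.mean(axis=1, keepdims=True)).reshape(-1, k)
    yd = (y - y.mean(axis=1, keepdims=True)).reshape(-1)
    beta_fe = np.linalg.solve(Xd.T @ Xd, Xd.T @ yd)
    # Step 2: regress time-averaged FE residuals on an intercept and z_i
    u_bar = y.mean(axis=1) - X.mean(axis=1) @ beta_fe
    Zc = Z - Z.mean(axis=0)                  # demeaning absorbs the intercept
    gamma_fef = np.linalg.solve(Zc.T @ Zc, Zc.T @ (u_bar - u_bar.mean()))
    return beta_fe, gamma_fef

rng = np.random.default_rng(0)
N, T = 500, 5
eta = rng.normal(size=N)                               # individual effects
z = rng.normal(size=(N, 1))                            # independent of eta
x = eta[:, None, None] + rng.normal(size=(N, T, 1))    # correlated with eta
eps = rng.normal(size=(N, T))
y = 1.0 + 0.5 * x[:, :, 0] + 2.0 * z + eta[:, None] + eps

beta_fe, gamma_fef = fef_estimator(y, x, z)
print(f"beta_FE = {beta_fe[0]:.3f}, gamma_FEF = {gamma_fef[0]:.3f}")
```

With the simulated coefficients set to 0.5 and 2.0, both estimates should lie close to their true values, even though the time-varying regressor is correlated with the individual effects.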
Pesaran and Zhou (2014) also derive the asymptotic distribution of γ̂ FEF under Assump-
tions P1–P2, P3, P3’ and P4–P5, and the following additional assumptions on the time-invariant
regressors:
Assumption P6: Consider the $m \times m$ matrix $Q_{zz,N}$, and the $m \times k$ matrix $Q_{z\bar x,N}$ defined by
$$
Q_{zz,N} = \frac{1}{N}\sum_{i=1}^{N}\left(z_i - \bar z\right)\left(z_i - \bar z\right)', \tag{26.65}
$$
$$
Q_{z\bar x,N} = \frac{1}{N}\sum_{i=1}^{N}\left(z_i - \bar z\right)\left(\bar x_i - \bar x\right)'. \tag{26.66}
$$
Matrix $Q_{zz,N}$ is nonsingular for all $N > m$ and as $N \to \infty$, namely $\lambda_{min}(Q_{zz,N}) > 1/K$, for
all N. Matrices $Q_{zz,N}$ and $Q_{z\bar x,N}$ converge (in probability) to the non-stochastic limits $Q_{zz}$ and
$Q_{z\bar x}$, respectively.
Assumption P7: The time-invariant regressors, $z_i$, are independently distributed of $v_j = \eta_j + \bar\varepsilon_j$,
for all i and j, and $\eta_i$ and $\bar\varepsilon_i$ are independently distributed such that $v_i \sim IID(0, \sigma_\eta^2 + \sigma_i^2/T)$,
where $E(\eta_i^2) = \sigma_\eta^2$, and $E(\varepsilon_{it}^2) = \sigma_i^2$. Also, $z_i$ are either deterministic or have bounded support,
namely $\|z_i\| < K$, or $z_i$ satisfy the moment conditions $E\|z_i - \bar z\|^4 < K$, for all i.
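Under the above assumptions Pesaran and Zhou (2014) show that (for a fixed T and as $N \to \infty$)
$$
\sqrt{N}\left(\hat\gamma_{FEF} - \gamma\right) \to_d N\left(0, \Sigma_{\hat\gamma_{FEF}}\right), \tag{26.67}
$$
where
$$
\Sigma_{\hat\gamma_{FEF}} = Q_{zz}^{-1}\left(\sigma_\eta^2 Q_{zz} + \Omega_{\bar\xi}\right)Q_{zz}^{-1}, \tag{26.68}
$$
$$
\Omega_{\bar\xi} = \lim_{N\to\infty} N^{-1}\sum_{i=1}^{N}\left[T^{-2}\sum_{t,s=1}^{T} d_{z,it}\,d_{z,is}'\,E\left(\varepsilon_{it}\varepsilon_{is}\right)\right], \tag{26.69}
$$
and
$$
d_{z,it} = \left(z_i - \bar z\right) - \frac{1}{N}\sum_{j=1}^{N}\left(z_j - \bar z\right)w_{ji,t}, \quad w_{ij,t} = \left(\bar x_i - \bar x\right)'Q_{FE,NT}^{-1}\left(x_{jt} - \bar x_j\right). \tag{26.70}
$$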
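Pesaran and Zhou (2014) propose to estimate $\mathrm{Var}(\hat\gamma_{FEF})$ by
$$
\widehat{\mathrm{Var}}\left(\hat\gamma_{FEF}\right) = N^{-1}Q_{zz,N}^{-1}\left[\hat V_{zz,N} + Q_{z\bar x,N}\left(N\,\widehat{\mathrm{Var}}(\hat\beta)\right)Q_{z\bar x,N}'\right]Q_{zz,N}^{-1}, \tag{26.71}
$$
where
$$
\hat V_{zz,N} = \frac{1}{N}\sum_{i=1}^{N}\left(\hat\varsigma_i - \bar{\hat\varsigma}\right)^2\left(z_i - \bar z\right)\left(z_i - \bar z\right)', \tag{26.72}
$$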
$x_{i\cdot} = (x_{i1} - \bar x_i, x_{i2} - \bar x_i, \dots, x_{iT} - \bar x_i)'$ denotes the demeaned vector of $x_{it}$ and the t-th element
of $e_i$ is given by
where $X_i = (x_{i1}, \dots, x_{iT})'$, $y_i = (y_{i1}, y_{i2}, \dots, y_{iT})'$, and $\varepsilon_i = (\varepsilon_{i1}, \varepsilon_{i2}, \dots, \varepsilon_{iT})'$. Then the
following two-step procedure is used:
Step 1 of HT: $\beta$ is estimated by $\hat\beta_{FE}$, the FE estimator, and the deviations $\hat d_i = \bar y_i - \bar x_i'\hat\beta_{FE}$,
$i = 1, 2, \dots, N$, are used to compute the 2SLS (or IV) estimator
$$
\hat\gamma_{IV} = \left(Z'P_AZ\right)^{-1}Z'P_A\hat d, \tag{26.75}
$$
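where $\hat d = (\hat d_1, \hat d_2, \dots, \hat d_N)'$, $Z = (z_1, z_2, \dots, z_N)' = (Z_1, Z_2)$, and $P_A = A(A'A)^{-1}A'$ is the
orthogonal projection matrix of $A = (\tau_N, \bar X_1, Z_1)$, where $\bar X = (\bar X_1, \bar X_2)$, $\bar X = (\bar x_1, \bar x_2, \dots, \bar x_N)'$,
and $\bar x_i = (\bar x_{i,1}', \bar x_{i,2}')'$. Using these initial estimates of $\beta$ and $\gamma$, the error variances $\sigma_\varepsilon^2$ and $\sigma_\eta^2$ are
estimated as
$$
\hat\sigma_\eta^2 = s^2 - \hat\sigma_\varepsilon^2,
$$
$$
\hat\sigma_\varepsilon^2 = \frac{1}{N(T-1)}\sum_{i=1}^{N}\left(y_i - X_i\hat\beta_{FE}\right)'M_{\tau_T}\left(y_i - X_i\hat\beta_{FE}\right),
$$
$$
s^2 = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\left(y_{it} - \hat\mu - x_{it}'\hat\beta_{FE} - z_i'\hat\gamma_{IV}\right)^2,
$$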
Step 2 of HT: In the second step the N equations in (26.74) are stacked to obtain
$$
y = W\theta + \left(\eta \otimes \tau_T\right) + \varepsilon,
$$
To simplify the notation we assume that the first column of Z is $\tau_N$, and then write the (infeasible)
HT estimator as
$$
\hat\theta_{HT} = \left(W'\Omega^{-1/2}P_A\Omega^{-1/2}W\right)^{-1}W'\Omega^{-1/2}P_A\Omega^{-1/2}y, \tag{26.77}
$$
where $P_A = A(A'A)^{-1}A'$ is the projection onto the space of instruments $A = (\tau_N \otimes \tau_T, Q_VX, X^{(1)}, Z_1 \otimes \tau_T)$,
where $X^{(1)} = (x_{1,1}', x_{1,2}', \dots, x_{1,N}')'$, with $x_{1,i} = (x_{1,i1}, \dots, x_{1,iT})'$, and
$x_{1,it}$ contains the regressors that are uncorrelated with $\eta_i$.$^{3}$
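The covariance matrix of $\hat\theta_{HT}$ is given by
$$
\mathrm{Var}\left(\hat\theta_{HT}\right) = Q^{-1} + Q^{-1}\left\{\frac{T}{\sigma_\varepsilon^2 + T\sigma_\eta^2}\,W'\Omega^{-1/2}P_A\left[\left(V_\eta - \sigma_\eta^2 I_N\right) \otimes \frac{1}{T}\tau_T\tau_T'\right]P_A\Omega^{-1/2}W\right\}Q^{-1}, \tag{26.78}
$$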
3 See Amemiya and MaCurdy (1986) and Breusch, Mizon, and Schmidt (1989) for discussion on the choice of instru-
ments for HT estimation.
where $Q = W'\Omega^{-1/2}P_A\Omega^{-1/2}W$, and $V_\eta$ represents the covariance matrix of $\eta$. $\mathrm{Var}(\hat\theta_{HT})$
reduces to $Q^{-1}$ in the standard case where the $\eta_i$’s are assumed to be homoskedastic and
cross-sectionally independent, namely when $V_\eta = \sigma_\eta^2 I_N$.
Remark 7 In the case where the effects of the time-invariant regressors are exactly identified, the
HT estimator of $\gamma$, $\hat\gamma_{HT}$, is identical to the first stage estimator of $\gamma$, given by (26.75). See Baltagi
and Bresson (2012).
$$
Q_{rz,N} = N^{-1}\sum_{i=1}^{N}\left(r_i - \bar r\right)\left(z_i - \bar z\right)', \quad Q_{r\bar x,N} = N^{-1}\sum_{i=1}^{N}\left(r_i - \bar r\right)\left(\bar x_i - \bar x\right)', \quad Q_{rr,N} = N^{-1}\sum_{i=1}^{N}\left(r_i - \bar r\right)\left(r_i - \bar r\right)', \tag{26.79}
$$
where $\bar r = N^{-1}\sum_{i=1}^{N} r_i$. $Q_{rz,N}$ and $Q_{rr,N}$ are full rank matrices for all $N > r$, and have finite
probability limits as $N \to \infty$, given by $Q_{rz}$ and $Q_{rr}$, respectively. Matrices $Q_{r\bar x,N}$ and $Q_{zz,N}$
have finite probability limits given by $Q_{r\bar x}$ and $Q_{zz}$, respectively, and in cases where $x_{it}$ and $z_i$
are stochastic with unbounded supports, then $\lambda_{min}(Q_{rr,N}) > 1/K$, for all N, and as $N \to \infty$,
with probability approaching one.
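Under the above assumptions (including Assumptions P1–P6) $\gamma$ can be estimated consistently by
$$
\hat\gamma_{FEF-IV} = \left(Q_{zr,N}Q_{rr,N}^{-1}Q_{zr,N}'\right)^{-1}Q_{zr,N}Q_{rr,N}^{-1}Q_{r\hat u,N}, \tag{26.80}
$$
$$
Q_{r\hat u,N} = \frac{1}{N}\sum_{i=1}^{N}\left(r_i - \bar r\right)\left(\bar{\hat u}_i - \bar{\hat u}\right),
$$
and as before, $\bar{\hat u} = N^{-1}\sum_{i=1}^{N}\bar{\hat u}_i$, and $\bar{\hat u}_i = \bar y_i - \bar x_i'\hat\beta_{FE}$. It then follows that
$$
\sqrt{N}\left(\hat\gamma_{FEF-IV} - \gamma\right) \to_d N\left(0, \Sigma_{\hat\gamma_{FEF-IV}}\right),
$$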
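where
$$
\Sigma_{\hat\gamma_{FEF-IV}} = \left(Q_{zr}Q_{rr}^{-1}Q_{zr}'\right)^{-1}Q_{zr}Q_{rr}^{-1}\left(\sigma_\eta^2 Q_{rr} + \Omega_{\bar\psi}\right)Q_{rr}^{-1}Q_{zr}'\left(Q_{zr}Q_{rr}^{-1}Q_{zr}'\right)^{-1}. \tag{26.81}
$$
The variance of $\hat\gamma_{FEF-IV}$ can be consistently estimated by
$$
\widehat{\mathrm{Var}}\left(\hat\gamma_{FEF-IV}\right) = N^{-1}H_{zr,N}\left[\hat V_{rr,N} + Q_{r\bar x,N}\left(N\,\widehat{\mathrm{Var}}(\hat\beta_{FE})\right)Q_{r\bar x,N}'\right]H_{zr,N}',
$$
where
$$
H_{zr,N} = \left(Q_{zr,N}Q_{rr,N}^{-1}Q_{zr,N}'\right)^{-1}Q_{zr,N}Q_{rr,N}^{-1},
$$
$$
Q_{r\bar x,N} = \frac{1}{N}\sum_{i=1}^{N}\left(r_i - \bar r\right)\left(\bar x_i - \bar x\right)',
$$
$$
\hat V_{rr,N} = \frac{1}{N}\sum_{i=1}^{N}\left(\hat\upsilon_i - \bar{\hat\upsilon}\right)^2\left(r_i - \bar r\right)\left(r_i - \bar r\right)',
$$
where
$$
\hat\upsilon_i - \bar{\hat\upsilon} = \bar y_i - \bar y - \left(\bar x_i - \bar x\right)'\hat\beta - \left(z_i - \bar z\right)'\hat\gamma_{FEF-IV}.
$$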
Monte Carlo experiments reported in Pesaran and Zhou (2014) show that γ̂ FEF and γ̂ FEF−IV
perform well in small samples and are robust to heteroskedasticity and residual serial correlation.
$\overline{married}_i = T^{-1}\sum_{t=1}^{T} married_{it}$ as an instrument for $educ_i$, FEF-IV2 uses $black_i$ as an instrument
for $educ_i$, and FEF-IV3 uses $hisp_i$ as an instrument for $educ_i$. The results are summarized in
Table 26.2.
The results show that the estimates could be quite sensitive to the choice of the estimation pro-
cedure and the instrument used for educi which is the time-invariant variable of interest and deter-
mines the return to schooling. Estimates of the coefficient of the educi variable all have the expected
positive sign, although they vary widely across the estimation procedures. The pooled OLS and
FEF estimates are very close (around 0.10), although, as is to be expected, the pooled OLS has a
much smaller standard error than that of the FEF estimate (0.0046 as compared to 0.0091). The
FEF estimates are preferable to the pooled OLS estimates since they allow for possible dependence
between the time varying regressors and the individual-specific effects, whilst this is ruled out under
pooled OLS. Nevertheless, the FEF estimates could still be biased since they ignore possible depen-
dence between educi and the individual specific effects. To take account of such dependence we need
to have suitable instruments. We consider a number of possibilities. Since the model contains three
time-invariant variables ($educ_i$, $black_i$ and $hisp_i$) we need at least three instruments. Initially,
following HT, we consider using $x_{1,it} = (married_{it})$ and $z_{1,i} = (black_i, hisp_i)'$ as instruments. The
corresponding set of instruments for the FEF-IV procedure will be $r_i = (\overline{married}_i, black_i, hisp_i)'$.
The associated estimates in Table 26.2 are given under columns HT and FEF-IV1. As can be seen,
the HT and FEF-IV1 estimates are very close and differ only marginally in terms of the estimated
standard errors. This is partly due to the fact that the parameters are exactly identified and there
is little variation in $married_{it}$ over time, and as a result it does not make much difference in using
Table 26.2 Pooled OLS, fixed-effects filter and HT estimates of wage equation
marriedit or marriedi as an instrument for educi . Also, note that blacki and hispi are treated as
exogenous under both HT and FEF-IV procedures. The HT and FEF-IV1 estimates of return to
schooling are rather disappointing—neither is statistically significant, which seems to be largely due
to the quality of marriedit or marriedi as an instrument for educi . As you might recall, it is not suffi-
cient that the instrument is uncorrelated with the individual-specific effects, α i ; a strong instrument
should also have a reasonable degree of correlation with the instrumented variable. In the present
application the correlation between marriedi and educi is around 0.025, which renders marriedi
a weak instrument for educi . In view of this, we consider two additional specifications of the wage
equation which are reported in Table 26.2 under columns FEF-IV2 and FEF-IV3 . Specification
FEF-IV2 excludes blacki from the regression and uses it as the instrument for educi , whilst FEF-IV3
excludes $hisp_i$ and uses it as the instrument. In these specifications the estimates of the coefficient
of $educ_i$ are both positive and statistically significant, although the estimate obtained (namely 0.469)
when blacki is used as an instrument is rather large and not very precisely estimated (it has a large
standard error of 0.216). Once again this could reflect the poor quality of the instrument (blacki ) in
this specification whose correlation with educi is −0.037. In contrast, the correlation between hispi
and educi is around −0.20, and the specification FEF-IV3 which uses hispi as the instrument for
educi yields a much more reasonable estimate for the schooling variable at 0.071, which is closer
to the FEF’s estimate of 0.104, although the FEF-IV3 estimate is much less precisely estimated as
compared to the FEF estimate.
where $x_{it}$ is a vector of exogenous explanatory variables, and $\alpha_i$ could be treated as fixed or random.
Instead of observing $y^*_{it}$, we observe $y_{it}$, where
$$
y_{it} = \begin{cases} 1, & \text{if } y^*_{it} > 0, \\ 0, & \text{if } y^*_{it} \le 0. \end{cases}
$$
Two widely used parametric specifications are the logit and probit models, based on the logistic
and the standard normal distributions, respectively. Specifically, under the unobserved effects
logit specification, it is assumed that
$$
P\left(y_{it} = 1|x_{it}, \alpha_i\right) = F\left(\alpha_i + \beta'x_{it}\right) = \frac{e^{\alpha_i + \beta'x_{it}}}{1 + e^{\alpha_i + \beta'x_{it}}},
$$
$$
L_i\left(y_{i1}, y_{i2}, \dots, y_{iT}; X_{i.}, \theta\right) = \int_{-\infty}^{+\infty}\left\{\prod_{t=1}^{T}\Phi\left(\alpha_i + \beta'x_{it}\right)^{y_{it}}\left[1 - \Phi\left(\alpha_i + \beta'x_{it}\right)\right]^{1-y_{it}}\right\}\left(1/\sigma_\alpha\right)\phi\left(\alpha_i/\sigma_\alpha\right)d\alpha_i,
$$
where $\phi(.)$ is the standard normal density. Hence, the log-likelihood function for the full sample
can be computed and maximized with respect to $\beta$ and $\sigma_\alpha^2$ to obtain $\sqrt{N}$-consistent asymptotically
normal estimators. The conditional MLE in this context is typically called the random
effects probit estimator. For further details, the reader is referred to Wooldridge (2010).
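The integral in the individual likelihood contribution has no closed form, but it is one-dimensional and can be approximated accurately by numerical quadrature. The sketch below uses Gauss–Hermite quadrature, which is one common choice rather than a method prescribed in the text; the function names, the number of nodes, and the parameter values are illustrative assumptions, and the density of $\alpha_i$ is taken to be $N(0, \sigma_\alpha^2)$.

```python
import math
from itertools import product

import numpy as np
from numpy.polynomial.hermite import hermgauss

def std_normal_cdf(v):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def re_probit_likelihood(y_i, X_i, beta, sigma_alpha, n_nodes=20):
    """Likelihood contribution of unit i, integrating alpha_i ~ N(0, sigma_alpha^2)."""
    nodes, weights = hermgauss(n_nodes)     # rule for integrals against exp(-z^2)
    index = X_i @ beta                      # beta' x_it for t = 1, ..., T
    total = 0.0
    for node, weight in zip(nodes, weights):
        a = math.sqrt(2.0) * sigma_alpha * node   # change of variable to alpha_i
        lik = 1.0
        for y_it, xb in zip(y_i, index):
            p = std_normal_cdf(a + xb)
            lik *= p if y_it == 1 else 1.0 - p
        total += weight * lik
    return total / math.sqrt(math.pi)

# Illustrative values: T = 3 periods and a single regressor.
X_i = np.array([[0.2], [-0.5], [1.0]])
beta = np.array([0.7])
sigma_alpha = 1.2

# Summing over all 2^T possible outcome sequences should give one.
total_prob = sum(re_probit_likelihood(seq, X_i, beta, sigma_alpha)
                 for seq in product([0, 1], repeat=3))
print(f"sum of likelihoods over all outcomes: {total_prob:.6f}")
```

The full-sample log-likelihood is the sum of the logarithms of these contributions over i, which can then be passed to a numerical optimizer.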
to some pre-specified rules; individuals initially participating in the panel may not be willing
or able to participate in some waves. The way unbalanced panels are treated critically depends
on whether the cause of missing observations on individual cross-sectional units is random or
systematic.
Let $s_{it} = 1$ if $(y_{it}, x_{it})$ is observed and zero otherwise. Then $s_{it}$ is an indicator variable telling
us which time periods are missing for each i, and $(y_{it}, x_{it}, s_{it})$ can be treated as a random sample
from the population. If the indicator variable, $s_{it}$, is independent of the error term $u_{it'}$ for all t and
$t'$ (although it may be possibly correlated with $\alpha_i$ and/or $X_{i,T_i}$), then FE on the unbalanced panel
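remains consistent and asymptotically normal. The unbalanced panel data model in general can
now be written as
$$
y^*_{it} = \alpha_i s_{it} + x^{*\prime}_{it}\beta + u^*_{it},
$$
for $i = 1, 2, \dots, N$ and $t = 1, 2, \dots, T$, where $y^*_{it} = s_{it}y_{it}$, $x^*_{it} = s_{it}x_{it}$, and $u^*_{it} = s_{it}u_{it}$. The
fixed-effects estimation approach can now be applied to the above model. We first note that
$$
\sum_{t=1}^{T} y^*_{it} = \alpha_i\sum_{t=1}^{T} s_{it} + \sum_{t=1}^{T} x^{*\prime}_{it}\beta + \sum_{t=1}^{T} u^*_{it},
$$
and assuming that $\sum_{t=1}^{T} s_{it} \neq 0$ (which will be the case if we have at least one time series
observation on unit i), we have
$$
\bar y^*_i = \alpha_i + \bar x^{*\prime}_i\beta + \bar u^*_i,
$$
where $\bar y^*_i = \sum_{t=1}^{T} y^*_{it}/\sum_{t=1}^{T} s_{it}$, with $\bar x^*_i$ and $\bar u^*_i$ defined similarly,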
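and hence
$$
\hat\beta_{FE} = \left[\sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}\left(x_{it} - \bar x^*_i\right)\left(x_{it} - \bar x^*_i\right)'\right]^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}\left(x_{it} - \bar x^*_i\right)\left(y_{it} - \bar y^*_i\right). \tag{26.83}
$$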
Alternatively, one could drop all missing observations and consider the remaining Ti obser-
vations, so that for unit i (stacking available observations) we have
where $M_{T_i} = I_{T_i} - \tau_{T_i}(\tau_{T_i}'\tau_{T_i})^{-1}\tau_{T_i}'$. Noting that $T_i = \sum_{t=1}^{T} s_{it}$, it is easily seen that the
alternative expressions in (26.83) and (26.84) are the same.
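The equivalence of the two routes can also be checked numerically. In the sketch below (NumPy, simulated data, hypothetical function names) the first function works with the selection indicators on the full $N \times T$ array, in the spirit of (26.83), while the second drops the missing observations unit by unit and demeans the remaining $T_i$ rows. Unobserved entries are simply left in the arrays and are never used, since they are multiplied by $s_{it} = 0$.

```python
import numpy as np

def fe_with_selection(y, X, s):
    """FE estimator using indicators s[i, t]; y is (N, T), X is (N, T, k)."""
    k = X.shape[2]
    T_i = s.sum(axis=1)                            # observed periods, assumed > 0
    x_bar = (s[:, :, None] * X).sum(axis=1) / T_i[:, None]
    y_bar = (s * y).sum(axis=1) / T_i
    Xd = (s[:, :, None] * (X - x_bar[:, None, :])).reshape(-1, k)
    yd = (s * (y - y_bar[:, None])).reshape(-1)
    return np.linalg.solve(Xd.T @ Xd, Xd.T @ yd)

def fe_dropping_missing(y, X, s):
    """Same estimator, stacking only the observed rows of each unit."""
    k = X.shape[2]
    A, b = np.zeros((k, k)), np.zeros(k)
    for i in range(y.shape[0]):
        keep = s[i] == 1
        Xi = X[i, keep] - X[i, keep].mean(axis=0)  # M_{T_i} X_i
        yi = y[i, keep] - y[i, keep].mean()        # M_{T_i} y_i
        A += Xi.T @ Xi
        b += Xi.T @ yi
    return np.linalg.solve(A, b)

rng = np.random.default_rng(0)
N, T, k = 30, 6, 2
X = rng.normal(size=(N, T, k))
alpha = rng.normal(size=N)
y = alpha[:, None] + X @ np.array([1.0, -0.5]) + rng.normal(size=(N, T))
s = (rng.random((N, T)) < 0.7).astype(float)       # random missingness
s[:, :2] = 1.0                                     # at least two periods per unit

b_sel = fe_with_selection(y, X, s)
b_drop = fe_dropping_missing(y, X, s)
print(np.allclose(b_sel, b_drop))
```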
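A similar approach can be utilized for pooled OLS and RE estimators. The pooled OLS
estimator will be given by
$$
\hat\beta_{OLS} = \left[\sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}\left(x_{it} - \bar x^*\right)\left(x_{it} - \bar x^*\right)'\right]^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}\left(x_{it} - \bar x^*\right)\left(y_{it} - \bar y^*\right),
$$
where $\bar x^* = \sum_{i=1}^{N}\sum_{t=1}^{T} x^*_{it}/\sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}$, and $\bar y^* = \sum_{i=1}^{N}\sum_{t=1}^{T} y^*_{it}/\sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}$.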
For RE specification, the relevant quasi-demeaned observations for the unbalanced panel are
where
$$
\phi_i = 1 - \sqrt{\frac{\sigma^2}{T_i\sigma_\alpha^2 + \sigma^2}}.
$$
To remain consistent and asymptotically normal, this estimator requires the stronger condition
that $s_{it}$ is independent of $\alpha_i$, as well as of $u_{it'}$ for all t and $t'$.
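The unit-specific weight $\phi_i$ depends on the number of available observations, $T_i$. The short sketch below computes it and applies it to one unit's observed values, assuming the usual quasi-demeaned form $y_{it} - \phi_i\bar y_i$ over the observed periods; that form, the function names, and the variance values are assumptions made for illustration.

```python
import math

def quasi_demeaning_weight(T_i, sigma2_alpha, sigma2):
    """phi_i = 1 - sqrt(sigma^2 / (T_i * sigma_alpha^2 + sigma^2))."""
    return 1.0 - math.sqrt(sigma2 / (T_i * sigma2_alpha + sigma2))

def quasi_demean(values, phi):
    """Subtract phi times the unit mean from each observed value."""
    mean = sum(values) / len(values)
    return [v - phi * mean for v in values]

# Units with more observations receive a weight closer to one (closer to FE).
for T_i in (1, 4, 16):
    print(T_i, round(quasi_demeaning_weight(T_i, 1.0, 1.0), 4))

phi_4 = quasi_demeaning_weight(4, 1.0, 1.0)
y_tilde = quasi_demean([2.0, 3.0, 1.0, 4.0], phi_4)
```

When $\sigma_\alpha^2 = 0$ the weight is zero and the transformation leaves the data unchanged, which corresponds to pooled OLS.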
Sample selection is likely to be a problem and could induce a bias when selection is related
to the idiosyncratic errors. One possible way to test for sample selection bias is to compare esti-
mates of the regression equation based on the balanced sub-panel of complete observations only,
with estimates from the unbalanced panel by means of a Hausman test. Significant differences
between the estimates would point to a non-random response problem. However, note that,
since both estimators are inconsistent under the alternative, the power of this test may be limited.
As an alternative, a simple test of selection bias has been suggested by Nijman and Verbeek
(1992). It consists of augmenting the regression equation with the lagged selection indicator,
$s_{i,t-1}$, estimating the model, and performing a t-test for the significance of $s_{i,t-1}$. Under the null
hypothesis, $u_{it}$ is uncorrelated with $s_{it'}$ for all $t'$, and selection in the previous time period should
not be significant in the equation at time t.
See Nijman and Verbeek (1992) and Wooldridge (2010) for further discussion.
26.14 Exercises
1. Show that the maximum likelihood estimator for β and α i of model (26.1) obtained by max-
imizing (26.25) is identical to the FE estimator for these parameters.
2. Show that the FE estimator of β in model (26.1) is identical to the OLS estimator of β in
a regression of yit on xit and N dummies dj , j = 1, 2, . . . , N, with dj = 1 if j = i and 0
otherwise.
3. Derive the error covariance structure of vit in (26.50), in Example 58. For further details, see
Lillard and Weiss (1979).
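4. Consider the RE model $y_{it} = \alpha_i + \beta'x_{it} + u_{it}$, where $\alpha_i \sim IID(\alpha, \sigma_\alpha^2)$, $\alpha_i$ and $u_{it}$ are
independently distributed of each other and of $x_{it'}$ for all i, t, $t'$. Derive the bias of estimating
$\sigma_\alpha^2$ by
$$
\tilde\sigma_\alpha^2 = \frac{\sum_{i=1}^{N}\left(\hat\alpha_i - \bar{\hat\alpha}\right)^2}{N-1},
$$
where $\hat\alpha_i$ is the least squares estimate of $\alpha_i$ given by (26.24) and $\bar{\hat\alpha} = N^{-1}\sum_{i=1}^{N}\hat\alpha_i$.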
5. The panel data model
is defined over the groups i = 1, 2, . . . , N, and the unbalanced periods t = Ti0 + 1, Ti0 +
2, . . . , Ti1 . The unit-specific intercepts are assumed to be random
α i = α + ηi ,
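where $\eta_i \sim IID(0, \sigma_\eta^2)$ and $\varepsilon_{it} \sim IID(0, \sigma_i^2)$ are distributed independently of $x_{it'}$ for i, t
and $t'$.
where
$$
\bar y_i = \frac{1}{T_i}\sum_{t=T_{i0}+1}^{T_{i1}} y_{it}, \quad \text{and} \quad \bar x_i = \frac{1}{T_i}\sum_{t=T_{i0}+1}^{T_{i1}} x_{it},
$$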
where
$$
\bar\varepsilon_i = \frac{1}{T_i}\sum_{t=T_{i0}+1}^{T_{i1}}\varepsilon_{it}.
$$
(b) Under what conditions does the cross-sectional regression in (a) yield a consistent esti-
mate of the long-run relationship between yit and xit ?
(c) Assuming the conditions in (b) are satisfied, discuss the efficient estimation of a1 in view
of the unbalanced nature of the underlying panel.
(d) If one is interested in the long-run relations, are there any advantages in cross-sectional
estimation over panel estimation?
27.1 Introduction
So far we have considered panels with strictly exogenous regressors, but often we also wish
to estimate economic relationships that are dynamic in nature, namely, for which the data
generating process is a panel containing lagged dependent variables. If lagged dependent vari-
ables appear as explanatory variables, strict exogeneity of the regressors does not hold, and the
maximum-likelihood estimator or the within estimator under the fixed-effects specification is
no longer consistent in the case of panel data models where the number of cross-section units,
N, is large and T, the number of time periods, is small. This is due to the presence of the inci-
dental parameters problem discussed earlier (see Section 26.4). In addition, the treatment of
initial observations in a dynamic process raise a number of theoretical and practical problems.
As we shall see in this chapter, the assumption about initial observations plays a crucial role in
interpreting the model and formulating consistent estimators. Solving the initial value problem
is even more difficult in nonlinear panels, where misspecification in the distribution of initial
values can lead to serious bias in the parameter estimates.
in Chapter 26, the presence of lagged values of the dependent variable amongst the regressors,
creates two complications. First, yi,t−1 can no longer be viewed as a strictly exogenous regressor.
By construction, the lagged dependent variable, yi,t−1 , will be correlated with the unit-specific
effects, $\alpha_i$, whether they are fixed or random, and with lagged $u_{it}$. The second complication arises
due to the non-vanishing effects of the initial values, $y_{i0}$, on $y_{it}$, in small T panels. More explicitly,
using (27.1) to solve for $y_{it}$ recursively from the initial states, $y_{i0}$, we obtain
$$
y_{it} = \lambda^t y_{i0} + \sum_{j=0}^{t-1}\lambda^j\beta'x_{i,t-j} + \frac{1-\lambda^t}{1-\lambda}\alpha_i + \sum_{j=0}^{t-1}\lambda^j u_{i,t-j}. \tag{27.2}
$$
Each observation on the dependent variable can thus be written as the sum of four components:
a term depending on initial observations, a component depending on current and past values of
the exogenous variables, a modified intercept term that depends on the unit-specific effects, α i ,
and a moving average term in past values of the disturbances. It is firstly clear that yit depends on
α i and the initial values, yi0 , and the effects of the latter do not vanish when T is small or when
λ is close to unity. In such cases, assumptions on initial observations play an important role in
determining the properties of the various estimators proposed in the literature (see Nerlove and
Balestra (1992)). At one extreme it can be assumed that the initial observations, yi0 , are fixed
constants specified independently of the parameters of the model. Under this specification there
are no unit-specific effects at the initial period, t = 0, which considerably simplifies the analysis.
However, as pointed out by Nerlove and Balestra (1992), unless there is a specific argument
in favour of treating yi0 as fixed (see, e.g., the application in Balestra and Nerlove (1966)), in
general such an assumption is not justified and can lead to biased estimates. Alternatively, the
initial values can be assumed to be random draws from a distribution with a common mean
$$
y_{i0} = \mu + \epsilon_i, \tag{27.3}
$$
where $\epsilon_i$ is assumed to be independent of $\alpha_i$ and $u_{it}$. In this case, starting values may be seen as
representing the initial individual endowments, and their impact on current observations gradually
diminishes and eventually vanishes with t. Finally, in the more general case, $y_{i0}$ could be specified
to depend on the unit-specific effects, $\alpha_i$, and time averages of the regressors. For example,
$$
y_{i0} = \frac{\alpha_i + \beta'\bar x_i}{1-\lambda} + \epsilon_i, \tag{27.4}
$$
where $\bar x_i = T^{-1}\sum_{t=1}^{T} x_{it}$, and $\epsilon_i$ is independent of $\alpha_i$. This specification is sufficiently general and
encompasses a number of other specifications of the initial values considered in the literature as
special cases.
According to Nerlove and Balestra (1992), the data generating process of yi0 should be quite
similar, if not identical, to the process generating subsequent observations. For further discussion
on assumptions concerning initial values, the reader is referred to Anderson and Hsiao (1981)
(see, in particular, Table 1 in Anderson and Hsiao (1981)), who show the sensitivity of max-
imum likelihood estimators to alternative assumptions about initial conditions, and Bhargava
and Sargan (1983).
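where we assume that $y_{i0}$ are given (non-stochastic), $|\lambda| < 1$, and $u_{it} \sim IID(0, \sigma_u^2)$. Using
(27.2) and setting $\beta = 0$, we have
$$ y_{it} = \alpha_i \frac{1-\lambda^{t}}{1-\lambda} + \lambda^{t} y_{i0} + \sum_{j=0}^{t-1} \lambda^{j} u_{i,t-j}. \qquad (27.6) $$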
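$$ \hat{\lambda}_{FE} - \lambda = \frac{\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} (y_{i,t-1} - \bar{y}_{i,-1})(u_{it} - \bar{u}_i)}{\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} (y_{i,t-1} - \bar{y}_{i,-1})^2}, \qquad (27.7) $$
where $\bar{y}_i = T^{-1}\sum_{t=1}^{T} y_{it}$, $\bar{y}_{i,-1} = T^{-1}\sum_{t=1}^{T} y_{i,t-1}$, and $\bar{u}_i = T^{-1}\sum_{t=1}^{T} u_{it}$. For a fixed T and
as N → ∞, we have (by the Slutsky Theorem)
$$ \operatorname*{plim}_{N\to\infty} (\hat{\lambda}_{FE} - \lambda) = \frac{\lim_{N\to\infty} \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} E[(y_{i,t-1} - \bar{y}_{i,-1})(u_{it} - \bar{u}_i)]}{\lim_{N\to\infty} \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} E[(y_{i,t-1} - \bar{y}_{i,-1})^2]}, \qquad (27.8) $$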
assuming that the limit of the denominator is finite and non-zero. To derive the above limiting
values we first note that
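$$ \bar{y}_{i,-1} = \frac{1}{T}\sum_{t=1}^{T} y_{i,t-1} = \alpha_i \frac{(T-1) - T\lambda + \lambda^{T}}{T(1-\lambda)^2} + \frac{y_{i0}}{T}\,\frac{1-\lambda^{T}}{1-\lambda} \qquad (27.9) $$
$$ \qquad + \frac{1}{T}\,\frac{1-\lambda^{T-1}}{1-\lambda}\, u_{i1} + \frac{1}{T}\,\frac{1-\lambda^{T-2}}{1-\lambda}\, u_{i2} + \ldots + \frac{1}{T}\,\frac{1-\lambda}{1-\lambda}\, u_{i,T-1}. $$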
Using this result and observing that $E(u_{it}\, y_{i,t-1}) = 0$, we have
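$$ \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} E[(y_{i,t-1} - \bar{y}_{i,-1})(u_{it} - \bar{u}_i)] = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} E[(y_{i,t-1} - \bar{y}_{i,-1})\, u_{it}] \qquad (27.10) $$
$$ \qquad = -\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} E(u_{it}\, \bar{y}_{i,-1}). $$
From (27.9), $E(u_{it}\,\bar{y}_{i,-1}) = \frac{\sigma_u^2}{T}\,\frac{1-\lambda^{T-t}}{1-\lambda}$, and summing over t gives
$$ \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} E[(y_{i,t-1} - \bar{y}_{i,-1})(u_{it} - \bar{u}_i)] = -\frac{\sigma_u^2}{T(1-\lambda)} \left[ 1 - \frac{1}{T}\,\frac{1-\lambda^{T}}{1-\lambda} \right]. $$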
As for the denominator, using a similar line of reasoning it may be verified that
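$$ \frac{1}{NT}\sum_{t=1}^{T}\sum_{i=1}^{N} E(y_{i,t-1} - \bar{y}_{i,-1})^2 = \frac{\sigma_u^2}{1-\lambda^2} \left\{ 1 - \frac{1}{T} - \frac{2\lambda}{(1-\lambda)T} \left[ 1 - \frac{1}{T}\,\frac{1-\lambda^{T}}{1-\lambda} \right] \right\} $$
$$ \qquad = \frac{\sigma_u^2}{1-\lambda^2} + O(T^{-1}). $$
Substituting these results in (27.8) gives
$$ \operatorname*{plim}_{N\to\infty} (\hat{\lambda}_{FE} - \lambda) = -\frac{(1+\lambda)}{T} \left[ 1 - \frac{1}{T}\,\frac{1-\lambda^{T}}{1-\lambda} \right] \left\{ 1 - \frac{1}{T} - \frac{2\lambda}{(1-\lambda)T} \left[ 1 - \frac{1}{T}\,\frac{1-\lambda^{T}}{1-\lambda} \right] \right\}^{-1}. \qquad (27.11) $$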
This bias is often referred to as the Nickell bias, since it was Nickell (1981) who was one of the
first to provide a formal derivation of the bias of the FE estimator of λ. The above result can be
written more compactly (assuming |λ| < 1) as
$$ \operatorname*{plim}_{N\to\infty} (\hat{\lambda}_{FE} - \lambda) = -\frac{(1+\lambda)}{T} + O(T^{-2}). $$
The Nickell bias is of order 1/T, and disappears only if T → ∞. It could be substantial when T is
small and/or λ close to unity, which can arise in the case of microeconomic panels of households
and firms where T is typically in the range of 3 to 8, as well as for some cross-country data sets
where time series represent averages over sub-periods. Note that, for λ ≥ 0, the bias is negative.
At $\lambda = 0$, the bias is given by $\operatorname{plim}_{N\to\infty} \hat{\lambda}_{FE} = -1/T$. Similar results can be obtained for the
RE estimator. The properties of the ML estimator under different assumptions on initial values
yi0 have been investigated by Anderson and Hsiao (1981). Kiviet (1995) derives an approxima-
tion for the bias of the FE estimator in a dynamic panel data model with exogenous regressors,
and suggests a bias-corrected FE estimator that subtracts a consistent estimator of this bias from
the original FE estimator.
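The size of the Nickell bias for typical values of T can be read off directly from (27.11). The following is a minimal sketch in Python; the function names `nickell_bias` and `nickell_bias_approx` are illustrative choices, not taken from the text, and the code simply evaluates the large-N expression and its leading term $-(1+\lambda)/T$.

```python
def nickell_bias(lam, T):
    """Large-N bias of the FE estimator of lambda, evaluated from (27.11)."""
    if abs(lam) >= 1 or T < 2:
        raise ValueError("requires |lam| < 1 and T >= 2")
    # Common bracket: 1 - (1/T) * (1 - lam^T) / (1 - lam)
    a = 1.0 - (1.0 - lam ** T) / (T * (1.0 - lam))
    numerator = -(1.0 + lam) / T * a
    denominator = 1.0 - 1.0 / T - 2.0 * lam / ((1.0 - lam) * T) * a
    return numerator / denominator


def nickell_bias_approx(lam, T):
    """Leading term of the bias, -(1 + lam) / T."""
    return -(1.0 + lam) / T


def bias_table(lam, horizons=(3, 5, 8, 20)):
    """Return (T, exact bias, leading-term approximation) for each T."""
    return [(T, nickell_bias(lam, T), nickell_bias_approx(lam, T)) for T in horizons]
```

Calling `bias_table(0.5)` lists the exact expression next to the approximation for T = 3, 5, 8 and 20, which makes it easy to see how slowly the bias shrinks over the range of T common in micro panels.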
To avoid the small-T bias, transformations of the regression equation that eliminate the $\alpha_i$,
other than the within transformation, are required.
Example 60 (The demand for natural gas) One important early application of dynamic panel
data methods in economics is the study by Balestra and Nerlove (1966) on the demand for
natural gas. One feature of the proposed model is the distinction between ‘captive’ and new demand
for energy and natural gas, where captive energy consumption depends on the existing stock of
energy-consuming equipment. This feature is represented in the model by the following relation
Gt = u · K t , (27.12)
where Gt is the use of gas, Kt is stock of gas-consuming capital, and u is the utilization rate, assumed
to be constant over time. Assuming that the capital stock is depreciated at a constant rate, δ, the
following relation holds between the capital stock and new investments It
Kt = It + (1 − δ) Kt−1 .
Applying (27.12), we obtain a corresponding dynamic equation for the incremental change in
consumption of natural gas ($G_t^*$), given by
$$ G_t^* = G_t - (1-\delta)\, G_{t-1}, \qquad (27.13) $$
where $G_t^* = u \cdot I_t$, so that the total new demand appears as the sum of the incremental change in
consumption, (Gt − Gt−1 ), and a ‘replacement’ demand term, given by δGt−1 . Gross investments
in gas-consuming equipment, and thus the new demand for gas, is specified as a function of the
relative price of gas, Pg , and new demand for total energy, denoted by E∗ , namely
$$ G_t^* = f(P_{g,t}, E_t^*). \qquad (27.14) $$
A relation similar to (27.13) can be derived for the increment in total energy use
$$ E_t^* = E_t - (1-\delta_e)\, E_{t-1}, \qquad (27.15) $$
where $E_t$ is the total use of all fuels in period t, and $\delta_e$ is the rate of depreciation for energy-using
equipment. The model is then closed by specifying a relation explaining the total consumption
of energy
$$ E_t = f(P_{e,t}, Y_t, H_t), \qquad (27.16) $$
where Pe,t is the price of energy, Yt is real income and Ht is a vector of socioeconomic variables.
By combining (27.15) and (27.16), and inserting the expression for $E_t^*$ in (27.14), the following
linearized dynamic equation for total natural gas demand is obtained
$$ G_t = \alpha_0 + \alpha_1' P_t + \alpha_2' H_t + \alpha_3 Y_t + \alpha_4 G_{t-1}, $$
where $P_t = (P_{g,t}, P_{e,t})'$, the coefficient attached to the lagged gas consumption is $\alpha_4 = (1-\delta)$,
and hence can be interpreted as linked to the depreciation rate for gas-consuming equipment. This
equation is then estimated by OLS and by the FE estimator, using data on 36 US states over 13 years.
Results are reported in Table I of Balestra and Nerlove (1966). OLS yields an estimate for $\alpha_4$
that is above 1, a result that is incompatible with theoretical expectations as it implies a negative
depreciation rate for gas equipment. On the other hand, the FE gives an estimate for α 4 of 0.68.
According to the authors, the inclusion of state dummy variables seems to reduce the coefficient of
the lagged gas variable to too low a level. Such a result may be explained by the negative bias of the
FE estimator as obtained above. For further details, see Balestra and Nerlove (1966) and Nerlove
(2002).
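The link between the coefficient of lagged gas consumption and the depreciation rate can be made explicit with a one-line calculation. The block below is only a worked restatement of $\alpha_4 = 1 - \delta$ using the FE estimate quoted in the example; the hats are notation added here for the implied estimates.

```latex
\hat{\delta} = 1 - \hat{\alpha}_4 , \qquad
\hat{\alpha}_4^{FE} = 0.68 \;\Rightarrow\; \hat{\delta}^{FE} = 0.32 , \qquad
\hat{\alpha}_4^{OLS} > 1 \;\Rightarrow\; \hat{\delta}^{OLS} < 0 .
```

An implied depreciation rate of 32 per cent per year from FE is what the authors regard as too high, while the OLS estimate implies a negative rate.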
A panel data equation with lagged dependent variables among the regressors is a particular
case of a panel with weakly exogenous regressors (see Section 9.3 for a formal definition of weak
exogeneity). Kiviet (1999) discusses the finite sample properties of the FE estimator under model
(27.1), where the regressors, $x_{it}$, are allowed to be weakly exogenous by assuming that
where $w_{it}$ is independent of $\alpha_j$ and $u_{j,t-s}$, for all i, j, t and s. Note that under $\varphi = 0$, $x_{it}$ are strictly
exogenous, while if $\varphi \neq 0$, $x_{it}$ are weakly exogenous, due to feedbacks from $u_{i,t-1}$. Under this
specification, Kiviet (1999) shows that weak exogeneity has an effect on the FE bias of simi-
lar magnitude as the presence of a lagged dependent variable. Even when no lagged dependent
variable is present in the model, weak exogeneity will render the FE estimator inconsistent for a
fixed T.
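where $\alpha_i \sim IID(0, \sigma_\alpha^2)$, and $u_{it} \sim IID(0, \sigma_u^2)$ are assumed to be independent of each other. One
possible way of eliminating unit-specific effects is to take first differences:
$$ \Delta y_{it} = \lambda\, \Delta y_{i,t-1} + \beta' \Delta x_{it} + \Delta u_{it}. \qquad (27.19) $$
Since $\Delta y_{i,t-1}$ and $\Delta u_{it}$ both depend on $u_{i,t-1}$, they are correlated.
Hence applying OLS to the first differenced model (27.19) would yield inconsistent estimates.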
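Also note that, even if $u_{it}$ is serially uncorrelated, $\Delta u_{it}$ will be correlated over time since
$$ E(\Delta u_{it}\, \Delta u_{i,t-s}) = \begin{cases} 2\sigma_u^2, & \text{for } s = 0, \\ -\sigma_u^2, & \text{for } s = 1, \\ 0, & \text{for } s > 1. \end{cases} $$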
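The three autocovariances follow directly from writing out the differences, assuming only that $u_{it}$ is serially uncorrelated with constant variance $\sigma_u^2$:

```latex
\begin{aligned}
E(\Delta u_{it}^2) &= E(u_{it}^2) + E(u_{i,t-1}^2) = 2\sigma_u^2, \\
E(\Delta u_{it}\,\Delta u_{i,t-1}) &= E\left[(u_{it} - u_{i,t-1})(u_{i,t-1} - u_{i,t-2})\right] = -E(u_{i,t-1}^2) = -\sigma_u^2, \\
E(\Delta u_{it}\,\Delta u_{i,t-s}) &= 0, \qquad s > 1 .
\end{aligned}
```

The first differenced errors therefore follow a first-order moving average process with a unit root in the MA part.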
To deal with the problem of the correlation between $\Delta y_{i,t-1}$ and $\Delta u_{it}$, Anderson and Hsiao
(1981) suggest using an instrumental variable (IV) approach. They note that since
$E(y_{i,t-2}\, \Delta u_{it}) = 0$, $y_{i,t-2}$ is a valid instrument for $\Delta y_{i,t-1}$: it is correlated with $\Delta y_{i,t-1}$
and not correlated with $\Delta u_{it}$, as long as $u_{it}$ are not serially correlated. As an example, suppose that
$\beta = 0$, and T = 3, then we have (assuming that $E(u_{it}\alpha_i) = E(u_{it} y_{i0}) = 0$),
$$ E(y_{i,t-2}\, \Delta y_{i,t-1}) = -\sigma_u^2 (1-\lambda)\, \frac{1-\lambda^{2(t-1)}}{1-\lambda^2}, $$
which is non-zero (as required by the IV approach) so long as |λ| < 1. It is clear that yi,t−2 as an
instrument for yi,t−1 starts to become rather weak as λ moves closer to unity. The IV approach
breaks down for λ = 1.
The IV estimation method delivers consistent but not necessarily efficient estimates of the
parameters in the model because, as we shall see later in the chapter, it does not make use of
all the available moment conditions. Furthermore, the suggested IV procedure does not take
into account the correlation structure on transformed errors. As noted by Alvarez and Arellano
(2003), ignoring autocorrelation in the first differenced errors leads to inconsistency of the IV
estimator if T/N → c > 0.
For further discussion, see Anderson and Hsiao (1981), and Anderson and Hsiao (1982).
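The contrast between OLS and the Anderson and Hsiao IV estimator in the first differenced model can be illustrated with a small simulation. This is a sketch under stated assumptions, not the procedure of any of the cited papers: $\beta = 0$, Gaussian $\alpha_i$ and $u_{it}$ with unit variances, a burn-in so that the process is close to stationary, and a single instrument $y_{i,t-2}$ pooled over all periods. The function names `simulate_panel` and `fd_estimates` are hypothetical.

```python
import random


def simulate_panel(n, t_obs, lam, burn=30, seed=0):
    """Simulate y_it = alpha_i + lam * y_{i,t-1} + u_it for n units.

    Returns a list of paths [y_i0, ..., y_iT] with T = t_obs.
    """
    rng = random.Random(seed)
    panel = []
    for _ in range(n):
        alpha = rng.gauss(0.0, 1.0)
        y = alpha / (1.0 - lam)          # start at the unit-specific mean
        path = []
        for s in range(burn + t_obs + 1):
            y = alpha + lam * y + rng.gauss(0.0, 1.0)
            if s >= burn:                # keep the last t_obs + 1 values
                path.append(y)
        panel.append(path)
    return panel


def fd_estimates(panel):
    """Return (IV, OLS) estimates of lam from the first differenced model."""
    s_zy = s_zx = s_xy = s_xx = 0.0
    for y in panel:
        for t in range(2, len(y)):
            dy = y[t] - y[t - 1]          # Delta y_it
            dy_lag = y[t - 1] - y[t - 2]  # Delta y_{i,t-1}
            z = y[t - 2]                  # instrument y_{i,t-2}
            s_zy += z * dy
            s_zx += z * dy_lag
            s_xy += dy_lag * dy
            s_xx += dy_lag * dy_lag
    return s_zy / s_zx, s_xy / s_xx
```

With a stationary start and $\lambda = 0.5$, OLS on the differenced equation converges to $(\lambda - 1)/2 = -0.25$ rather than to $\lambda$, whereas the IV estimate should settle close to 0.5 when N is large.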
$$ y_{i5} - y_{i4} = \lambda (y_{i4} - y_{i3}) + \beta' \Delta x_{i5} + \Delta u_{i5}, \qquad (27.22) $$
$$ \vdots \qquad (27.23) $$
$$ y_{iT} - y_{i,T-1} = \lambda (y_{i,T-1} - y_{i,T-2}) + \beta' \Delta x_{iT} + \Delta u_{iT}. \qquad (27.24) $$
In equation (27.20), the valid instrument for $(y_{i2} - y_{i1})$ is $y_{i1}$; in equation (27.21) valid
instruments for $(y_{i3} - y_{i2})$ are $y_{i1}$ and $y_{i2}$; while in (27.22) they are $y_{i1}$, $y_{i2}$, and $y_{i3}$, and so forth until
equation (27.24), where the valid instruments are $y_{i1}, y_{i2}, \ldots, y_{i,T-2}$. Hence, an additional valid
instrument is added with each additional time period. Clearly, the appropriate instruments for
$\Delta x_{it}$ are themselves, since, by assumption, $x_{it}$ are strictly exogenous. Hence, there is a total of
T(T − 1)/2 available instruments or moment conditions for $\Delta y_{i,t-1}$ that are given by
$$ E\left[ y_{is} \left( \Delta y_{it} - \lambda\, \Delta y_{i,t-1} - \beta' \Delta x_{it} \right) \right] = 0, \quad s = 0, 1, \ldots, t-2; \; t = 2, 3, \ldots, T. $$
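The way the instrument set grows by one lagged level per period can be sketched as a block-diagonal matrix with one row per differenced equation. The function below is an illustration only: the name `instrument_matrix` is hypothetical, and it follows the index range of the moment conditions just stated (s = 0, 1, ..., t − 2 for t = 2, ..., T), so that the number of columns is T(T − 1)/2.

```python
def instrument_matrix(y):
    """Build the instrument block for one unit from levels y = [y_i0, ..., y_iT].

    The row for period t (t = 2, ..., T) holds the instruments
    y_i0, ..., y_{i,t-2} in its own block of columns; all other entries are 0.
    """
    T = len(y) - 1
    if T < 2:
        raise ValueError("need at least y_i0, y_i1, y_i2")
    n_cols = T * (T - 1) // 2
    rows = []
    start = 0
    for t in range(2, T + 1):
        row = [0.0] * n_cols
        block = list(y[: t - 1])               # y_i0, ..., y_{i,t-2}
        row[start:start + len(block)] = block  # same-length slice assignment
        start += len(block)
        rows.append(row)
    return rows
```

For T = 4, `instrument_matrix([1.0, 2.0, 3.0, 4.0, 5.0])` has three rows and six columns, with blocks of width one, two and three along the diagonal.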
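To deal with the serial correlation in the transformed disturbances, $\Delta u_{it}$, Arellano and Bond
(1991) apply the GMM method to the stacked observations
$$ \Delta y_{i.} = \lambda\, \Delta y_{i.,-1} + \Delta X_{i.}\, \beta + \Delta u_{i.}, \qquad (27.25) $$
where
$$ \Delta y_{i.} = \begin{pmatrix} \Delta y_{i2} \\ \Delta y_{i3} \\ \vdots \\ \Delta y_{iT} \end{pmatrix}, \quad
\Delta X_{i.} = \begin{pmatrix} \Delta x_{i2}' \\ \Delta x_{i3}' \\ \vdots \\ \Delta x_{iT}' \end{pmatrix}, \quad
\Delta y_{i.,-1} = \begin{pmatrix} \Delta y_{i1} \\ \Delta y_{i2} \\ \vdots \\ \Delta y_{i,T-1} \end{pmatrix}, \quad
\Delta u_{i.} = \begin{pmatrix} \Delta u_{i2} \\ \Delta u_{i3} \\ \vdots \\ \Delta u_{iT} \end{pmatrix}. \qquad (27.26) $$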
Let
$$ W_i = \begin{pmatrix} y_{i1} & 0 & \cdots & 0 \\ 0 & (y_{i1}, y_{i2}) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & (y_{i1}, \ldots, y_{i,T-2}) \end{pmatrix}, \qquad (27.27) $$
Holtz-Eakin, Newey, and Rosen (1988) have also considered a GMM estimator based on similar
conditions. Stacking the observations on all the N different groups
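$$ \Delta y = \lambda\, \Delta y_{-1} + \Delta X \beta + \Delta u, \qquad (27.29) $$
where
$$ \Delta y = \begin{pmatrix} \Delta y_{1.} \\ \Delta y_{2.} \\ \vdots \\ \Delta y_{N.} \end{pmatrix}, \quad
\Delta y_{-1} = \begin{pmatrix} \Delta y_{1.,-1} \\ \Delta y_{2.,-1} \\ \vdots \\ \Delta y_{N.,-1} \end{pmatrix}, \quad
\Delta X = \begin{pmatrix} \Delta X_{1.} \\ \Delta X_{2.} \\ \vdots \\ \Delta X_{N.} \end{pmatrix}. $$
Also, let
$$ W = \begin{pmatrix} W_1 \\ W_2 \\ \vdots \\ W_N \end{pmatrix}, $$
and pre-multiply (27.29) by $W'$ to obtain
$$ W' \Delta y = \lambda W' \Delta y_{-1} + W' \Delta X \beta + W' \Delta u. \qquad (27.30) $$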
Similarly,
However, moments (27.32) still do not account for the serial correlation in the differenced error
term. Due to the first-order moving average structure of the error terms we also have
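$$ E(Z' \Delta u\, \Delta u' Z) = Z' \sigma^2 (I_N \otimes A) Z, \qquad (27.33) $$
where
$$ \underset{(T-2)\times(T-2)}{A} = \begin{pmatrix} 2 & -1 & \cdots & 0 & 0 \\ -1 & 2 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 2 & -1 \\ 0 & 0 & \cdots & -1 & 2 \end{pmatrix}. $$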
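The tridiagonal pattern of A is simply the covariance structure of first differences of serially uncorrelated, homoskedastic errors. A short sketch makes this concrete by forming the first-difference matrix D and computing D D', which reproduces the 2 and −1 entries; the helper names `first_difference_matrix` and `outer_gram` are illustrative.

```python
def first_difference_matrix(n_levels):
    """(n_levels - 1) x n_levels matrix D whose r-th row maps u to u_{r+1} - u_r."""
    D = []
    for r in range(n_levels - 1):
        row = [0] * n_levels
        row[r], row[r + 1] = -1, 1
        D.append(row)
    return D


def outer_gram(D):
    """Return the product D D' as a list of lists."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in D] for ri in D]
```

For example, `outer_gram(first_difference_matrix(4))` gives the 3 × 3 version of A, so that the covariance matrix of the differenced errors of one unit is $\sigma^2 D D'$.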
The GMM method can now be applied to the above moment conditions to obtain
$$ \hat{\gamma}_{GMM} = \left( G' Z S_N Z' G \right)^{-1} G' Z S_N Z' \Delta y, \qquad (27.34) $$
where $\hat{\gamma}_{GMM} = (\hat{\lambda}_{GMM}, \hat{\beta}_{GMM}')'$, $G = (\Delta y_{-1}, \Delta X)$. Alternative choices for the weights $S_N$
give rise to a set of GMM estimators based on the moment conditions in equation (27.32), all of
which are consistent for large N and finite T, but which differ in their asymptotic efficiency. It is
possible to show that the asymptotically optimal weights are given by
$$ S_N = \left( \sum_{i=1}^{N} Z_{i.}' \Delta\hat{u}_{i.}\, \Delta\hat{u}_{i.}' Z_{i.} \right)^{-1}, \qquad (27.35) $$
with $Z_{i.} = (W_i, \Delta X_{i.})$, and $\Delta\hat{u}_{i.}$ are the residuals from a consistent estimate, for example, from
preliminary IV estimates of β and λ. Such preliminary estimates are given by
$$ \hat{\gamma} = \left[ G' Z (Z' \Omega Z)^{-1} Z' G \right]^{-1} G' Z (Z' \Omega Z)^{-1} Z' \Delta y, $$
where $\Omega = (I_N \otimes A)$. The GMM estimator (27.34) with weighting matrix (27.35) is known
in the literature as the two-step GMM estimator. Note that if any of the $x_{it}$ variables are
predetermined rather than strictly exogenous with $E(x_{it} v_{is}) \neq 0$ for s < t, and zero otherwise, then
only $x_{i1}, x_{i2}, \ldots, x_{i,t-1}$ are valid instruments for the differenced equation at period t. In this
case, the matrix $W_i$ can be expanded with further columns containing the lagged values
$x_{i1}, x_{i2}, \ldots, x_{i,t-1}$.
In the absence of any additional knowledge about initial conditions for the dynamic processes,
the estimator (27.34) with weighting matrix (27.35) is asymptotically normal and is efficient
in the class of estimators based on the linear moment conditions (Hansen (1982), Chamber-
lain (1987)). However, as shown in Blundell and Bond (1998) and Binder, Hsiao, and Pesaran
(2005), the performance of the IV and of the one-step and two-step GMM estimators deterio-
rates as the variance of α i increases relative to the variance of the idiosyncratic error, uit , or when
λ is close to 1. Indeed, in these cases it is possible to show that the instruments $y_{i,t-s}$ are only
weakly related with the differences $\Delta y_{it}$. A further complication with GMM arises when T is not
small. As T → ∞, the number of GMM orthogonality conditions r = T(T − 1)/2 also tends to
infinity. In this case, Alvarez and Arellano (2003) show that the GMM remains asymptotically
normal, but, unless lim(T/N) = 0, it exhibits a bias of order O(1/N). Koenker and Machado
(1999) find conditions on r for the limiting distribution of the GMM estimator to remain valid.
Another important point to note is that consistency of the GMM estimator relies upon the
fact that errors are serially uncorrelated, i.e., that $E(\Delta u_{it}\, \Delta u_{i,t-2}) = 0$. In the case of serially
correlated errors, the GMM estimator would lose its consistency. Hence, Arellano and Bond (1991)
suggest testing the hypothesis that the second-order autocovariances for all periods in the sam-
ple are zero, based on residuals from first difference equations. See Arellano and Bond (1991,
p. 282).
to be uncorrelated with α i and yi0 . As a result, using (27.17), they identify the following addi-
tional moment conditions
Exploiting the assumption that the error term has constant variance through time
$$ E(v_{it}^2) = \sigma_i^2, \quad t = 1, 2, \ldots, T, $$
The above moments can be combined with the moment conditions already introduced by
Arellano and Bond (1991), and further columns can be added to the instrument matrix in
(27.27). Calculation of the one-step and two-step GMM estimators then proceeds exactly as
described above.
where lagged values of yit are now included in xit , zi are time-invariant variables, and α i are now
assumed to be random. More compactly the first equation can be written as
where $x_{it}^* = (x_{it}', z_i')'$. Arellano and Bover propose the following nonsingular transformation of
the above system of equations
$$ \underset{T\times T}{H} = \begin{pmatrix} C \\ \mathbf{1}_T'/T \end{pmatrix} $$
Since the first (T − 1) elements of vi.∗ do not contain α i , all exogenous variables are valid instru-
ments for the first (T − 1) equations of the transformed model. Let wi. = xi. , zi and mi.
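$$ E(W_i' H v_{i.}) = 0. $$
Let $H^* = (I_N \otimes H)$, $\Omega^* = (I_N \otimes \Omega)$, $X^* = (X_{1.}^{*\prime}, X_{2.}^{*\prime}, \ldots, X_{N.}^{*\prime})'$, and $W = (W_1', W_2', \ldots, W_N')'$.
The GMM estimator based on the above moment conditions is
$$ \hat{\delta} = \left[ X^{*\prime} H^{*\prime} W \left( W' H^* \Omega^* H^{*\prime} W \right)^{-1} W' H^* X^* \right]^{-1} X^{*\prime} H^{*\prime} W \left( W' H^* \Omega^* H^{*\prime} W \right)^{-1} W' H^* y. $$
In practice, the covariance matrix of the transformed system, $S = H \Omega H'$, will be replaced by a
consistent estimator. An unrestricted estimator of S is
$$ \hat{S} = \frac{1}{N} \sum_{i=1}^{N} \hat{v}_{i.}^*\, \hat{v}_{i.}^{*\prime}, $$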
where the v̂i.∗ are residuals based on some consistent preliminary estimates. Because the set of
instruments, Wi , is block-diagonal, Arellano and Bover show that δ̂ is invariant to the choice of
C. Another advantage of their representation is that the form of $\Omega$ need not be known. Further,
this approach can be easily extended to the dynamic panel data case. See Arellano and Bover
(1995) for details.
Notes: food-in: food at home; food-out: food outside the home; alct: alcohol and tobacco; clo: clothing; nds: other
nondurables and services; sdur: small durables such as books, etc.; nch: number of children, nad: number of adults in the
household; hage: age of the husband; hagez: age squared of the husband. Standard errors in parentheses.
∗, ∗∗, and ∗∗∗ denote significance at 10%, 5% and 1% levels.
where wiht is the budget share for good i, by household h, at time t, xht is total expenditure deflated
by a price index, zkht is a list of demographics and time and seasonal dummies, and εiht is a pos-
sibly autocorrelated error term. In the above specification, the random group effect, λih , allows for
persistent individual heterogeneity, while the coefficient γ i captures the state-dependence present in
the data. The Arellano and Bover (1995) approach is then adopted to estimate the above model.
Results, reported in Table 27.1, show that lagged budget shares are significant for food-out,
alcohol and tobacco, clothing and small durables, whereas for food-in and non-durables and services
there is no evidence of state dependence once they control for unobserved heterogeneity. The posi-
tive coefficient of the lagged budget shares for food-out and alcohol and tobacco is consistent with
habit formation in those commodities, while the negative sign for clothing and for small durables
reflects the durability of these two goods. The estimated elasticities show that, as expected, food-in
and alcohol and tobacco are necessities, whereas food-out, clothing and small durables are luxuries.
See Browning and Collado (2007) for further details.
$$ y_{i0} = \frac{\alpha_i}{1-\lambda} + u_{i0}, \quad i = 1, 2, \ldots, N, \qquad (27.37) $$
The above condition states that deviations of the initial conditions from $\alpha_i/(1-\lambda)$ are
uncorrelated with the level of $\alpha_i/(1-\lambda)$ itself. To guarantee this, it may be assumed that
$$ E\left[ \left( y_{i0} - \frac{\alpha_i}{1-\lambda} \right) \alpha_i \right] = 0. $$
If (27.38) holds in addition to other standard assumptions, then the following T − 1 additional
moment conditions are available
$$ E\left[ (y_{it} - \lambda y_{i,t-1})\, \Delta y_{i,t-1} \right] = 0, \quad \text{for } t = 2, 3, \ldots, T. $$
Hence, calculation of the one-step and two-step GMM estimators proceeds as described above,
adding T columns to the matrix of instruments (27.27). This estimator is known as system
GMM. Blundell and Bond (1998) show that the above moment conditions remain informative
when λ is close to unity or when σ 2α /σ 2u is large. A Monte Carlo exercise provided in Blundell,
Bond, and Windmeijer (2000) shows that the use of these additional moment conditions yields
substantial gains in terms of the properties of the 2-step GMM estimators, especially in the ‘weak
instrument’ case. But when using this estimator it is important to bear in mind that its validity
critically depends on assumption (27.38).
where yit is the log of sales of firm i in year t, nit is the log of employment, kit is the log of capital
stock, γ t is a year-specific intercept, α i is an unobserved firm-specific effect, vit is an autoregressive
productivity shock, and mit reflects serially uncorrelated measurement errors. Constant returns to
scale would imply β n +β k = 1. Estimation of the above production function is subject to a number
of econometric issues, including measurement errors in output and capital, and simultaneity arising
from potential correlation between observed inputs and productivity shocks (for example, manage-
rial ability). See, for example, Griliches and Mairesse (1997). GMM methods could be used to
control for these sources of bias. To this end, note that model (27.39)–(27.40) has the dynamic
common factor representation
$$ y_{it} = \pi_1 n_{it} + \pi_2 n_{i,t-1} + \pi_3 k_{it} + \pi_4 k_{i,t-1} + \pi_5 y_{i,t-1} + \gamma_t^* + \alpha_i^* + w_{it}, $$
[Table: estimates by OLS Levels, Within groups, DIF t-2, DIF t-3, SYS t-2, SYS t-3]
estimator. Estimates are both restricted and unrestricted. As expected in the presence of group effects,
OLS shows an upward bias in the estimate of $\pi_5$, while the FE estimator appears to give a
downward-biased estimate of this coefficient. See Blundell and Bond (2000) for further details. We
also refer to Levinsohn and Petrin (2003) and Ackerberg, Caves, and Frazer (2006) for alternative
extended GMM approaches for consistent estimation of production function parameters.
$$ H = \Delta\hat{u}' W \left( \sum_{i=1}^{N} W_i' \Delta\hat{u}_i\, \Delta\hat{u}_i' W_i \right)^{-1} W' \Delta\hat{u} \sim \chi^2_{r-k-1}, $$
where r (assumed to be greater than k) refers to the number of columns of the matrix W, con-
taining the instruments, and û denotes the residuals from the two-step estimation. In a Monte
Carlo study, Bowsher (2002) shows that, when T is large, using too many moment conditions
causes the above test to have extremely low power.
Hence, zit may contain lagged values of yit . Keane and Runkle consider the general covariance
specification
$$ E(v v') = I_N \otimes \Sigma, \qquad (27.43) $$
where
$$ \underset{(NT \times 1)}{v} = \begin{pmatrix} v_{1.} \\ v_{2.} \\ \vdots \\ v_{N.} \end{pmatrix}, $$
$v_{i.} = (v_{i1}, v_{i2}, \ldots, v_{iT})'$, and $\Sigma = E(v_{i.} v_{i.}')$ is no longer constrained to be equal to A as in the
case of the Arellano and Bond (1991) estimator. Stacking all the group regressions we have
$$ y = X\beta + v, \qquad (27.44) $$
where $y = (y_{1.}', y_{2.}', \ldots, y_{N.}')'$ and $X = (X_{1.}', X_{2.}', \ldots, X_{N.}')'$. The Keane and Runkle estimator
is obtained by applying the forward filtering approach by Hayashi and Sims (1983) to the above
stacked form of the panel. Forward filtering eliminates the serial correlation pattern, and yields a
more efficient estimator than the standard 2SLS estimator. First, $\Sigma^{-1}$ is decomposed using the
Cholesky decomposition
$$ \Sigma^{-1} = P' P, $$
where P is an upper triangular matrix. Hence, both sides of (27.44) are pre-multiplied by
$S = I_N \otimes P$, yielding
$$ S y = S X \beta + S v. $$
where $P_z = Z (Z'Z)^{-1} Z'$. Matrix $\Sigma$ can be estimated using consistent IV residuals from a
preliminary estimation
$$ \hat{\Sigma} = \frac{1}{N} \sum_{i=1}^{N} \hat{v}_{i.}\, \hat{v}_{i.}', \qquad (27.46) $$
group-specific effects assumed to be fixed unknown parameters. To simplify the analysis, abstract
from exogenous regressors and take first differences of (27.5) to obtain
$$ \Delta y_{i1} = \lambda^{m} \Delta y_{i,-m+1} + \sum_{j=0}^{m-1} \lambda^{j} \Delta u_{i,1-j} = \lambda^{m} \Delta y_{i,-m+1} + v_{i1}. $$
When $|\lambda| < 1$, for any fixed starting values $\Delta y_{i,-m+1}$, $\lim_{m\to\infty} \lambda^{m} \Delta y_{i,-m+1} = 0$, and, under
MLE.i $|\lambda| < 1$, and the process has been going on for a long time, namely m → ∞, with
$E(\Delta y_{i1}) = 0$, $Var(\Delta y_{i1}) = 2\sigma_u^2/(1+\lambda)$, $Cov(v_{i1}, \Delta u_{i2}) = -\sigma_u^2$, and
$Cov(v_{i1}, \Delta u_{it}) = 0$ for t = 3, 4, . . . , T, i = 1, 2, . . . , N.
MLE.ii The process has started from a finite period in the past not too far back from the 0th
period, namely for given values of $\Delta y_{i,-m+1}$ with m finite, such that $E(\Delta y_{i1}) = b$,
$Var(\Delta y_{i1}) = c\sigma_u^2$, with c > 0, $Cov(v_{i1}, \Delta u_{i2}) = -\sigma_u^2$, and $Cov(v_{i1}, \Delta u_{it}) = 0$ for
t = 3, 4, . . . , T, i = 1, 2, . . . , N.
Assumption MLE.i follows directly from the serial independence of uit , and under | λ |< 1.
Assumption MLE.ii imposes the restriction that expected changes in the initial endowments are
the same across individuals, but does not require | λ |< 1, or that all individuals start from the
same point in time.
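The variance stated in MLE.i can be checked in a few lines. The derivation below assumes $\beta = 0$, $|\lambda| < 1$ and a process that started in the infinite past, so that $\Delta y_{it}$ is covariance stationary, and uses the fact that $\Delta y_{i,t-1}$ contains $u_{i,t-1}$ with a unit coefficient.

```latex
\begin{aligned}
\Delta y_{it} &= \lambda\, \Delta y_{i,t-1} + \Delta u_{it}, \qquad
Cov(\Delta y_{i,t-1}, \Delta u_{it}) = -Cov(\Delta y_{i,t-1}, u_{i,t-1}) = -\sigma_u^2, \\
Var(\Delta y_{it}) &= \lambda^2 Var(\Delta y_{i,t-1}) + 2\sigma_u^2 - 2\lambda \sigma_u^2, \\
(1 - \lambda^2)\, Var(\Delta y_{it}) &= 2\sigma_u^2 (1 - \lambda)
\;\;\Rightarrow\;\; Var(\Delta y_{it}) = \frac{2\sigma_u^2}{1 + \lambda} .
\end{aligned}
```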
In the case where the dynamic model also contains the regressors, xit , a distinction must be
made depending on whether the regressors, xit , are strictly or weakly exogenous (see Assump-
tions 4.i and 4.ii in Hsiao, Pesaran, and Tahmiscioglu (2002)). In this more general case the first
differenced model is
$$ \Delta y_{i1} = \lambda^{m} \Delta y_{i,-m+1} + \beta' \sum_{j=0}^{m-1} \lambda^{j} \Delta x_{i,1-j} + \sum_{j=0}^{m-1} \lambda^{j} \Delta u_{i,1-j}. $$
To solve the incidental parameters problem associated with the initial conditions, it is also
required that xit follows either
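$$ x_{it} = \mu_i + g\, t + \sum_{j=0}^{\infty} a_j \varepsilon_{i,t-j}, \quad \sum_{j=0}^{\infty} |a_j| < \infty, \qquad (27.48) $$
or
$$ \Delta x_{it} = g + \sum_{j=0}^{\infty} d_j \varepsilon_{i,t-j}, \quad \sum_{j=0}^{\infty} |d_j| < \infty. \qquad (27.49) $$
In both cases,
$$ \Delta x_{it} = g + \sum_{j=0}^{\infty} d_j^* \varepsilon_{i,t-j}, \quad \sum_{j=0}^{\infty} |d_j^*| < \infty, $$
where $d_j^* = d_j$ in (27.49), and $d_j^* = a_j - a_{j-1}$ under (27.48), and it is easily seen that
$$ E(\Delta x_{i,1-j} \mid \Delta x_i) = b_j + \pi_j' \Delta x_i. $$
Let $\Delta y_i = (\Delta y_{i1}, \Delta y_{i2}, \ldots, \Delta y_{iT})'$ and
$$ \tilde{W}_i = \begin{pmatrix} 1 & \Delta x_i' & 0 & 0 \\ 0 & 0 & \Delta y_{i1} & \Delta x_{i2}' \\ \vdots & \vdots & \vdots & \vdots \\ 0 & 0 & \Delta y_{i,T-1} & \Delta x_{iT}' \end{pmatrix}, $$
and $\Delta u_i = (\Delta u_{i1}, \Delta u_{i2}, \ldots, \Delta u_{iT})'$. Note that under either MLE.i or MLE.ii we have
$$ \Delta y_i = \tilde{W}_i \phi + \Delta u_i, \qquad (27.50) $$
with $\phi = (b^*, \pi^{*\prime}, \gamma, \beta')'$, where $b^* = 0$ under MLE.i and if g = 0, and $b^* = b$ under MLE.ii,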
and π ∗ is a T × 1 vector of unknown coefficients which in general varies independently of the
variations in β and γ .
The transformed disturbances, $\Delta u_i$, have covariance matrix $\Omega = \sigma_u^2 \Omega^*$, where
$$ \Omega^* = \begin{pmatrix} \omega & -1 & \cdots & 0 & 0 \\ -1 & 2 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 2 & -1 \\ 0 & 0 & \cdots & -1 & 2 \end{pmatrix}, $$
where $\omega = (1/\sigma_u^2)\, Var(\Delta y_{i1})$. Note that ω is generally unrestricted except for the case when
$\beta = 0$ and $y_{it} \sim I(0)$, in which case $\omega = 2/(1+\lambda)$. The determinant of $\Omega^*$ is
$|\Omega^*| = 1 + T(\omega - 1)$.
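The determinant formula is what allows the log-likelihood to be written without any explicit matrix determinant. A quick numerical check is sketched below, assuming the T × T matrix has ω in its first diagonal position, 2 elsewhere on the diagonal and −1 on the two adjacent off-diagonals; the helper names `omega_star` and `determinant` are illustrative and the determinant is computed by plain Gaussian elimination.

```python
def omega_star(T, omega):
    """T x T tridiagonal matrix with omega in position (1,1), 2 on the rest
    of the diagonal and -1 on the first off-diagonals."""
    M = [[0.0] * T for _ in range(T)]
    for r in range(T):
        M[r][r] = omega if r == 0 else 2.0
        if r + 1 < T:
            M[r][r + 1] = M[r + 1][r] = -1.0
    return M


def determinant(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [row[:] for row in M]
    n = len(A)
    det = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        if abs(A[p][c]) < 1e-14:
            return 0.0
        if p != c:
            A[c], A[p] = A[p], A[c]   # a row swap flips the sign
            det = -det
        det *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
    return det
```

Comparing `determinant(omega_star(T, omega))` with `1 + T * (omega - 1)` for a few values of T and ω confirms the closed form numerically.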
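Let $\theta = (\phi', \omega, \sigma_u^2)'$. The likelihood function of the transformed model (27.50) is then
$$ \ell(\theta) = -\frac{NT}{2} \ln(2\pi) - \frac{NT}{2} \ln \sigma_u^2 - \frac{N}{2} \ln\left[ 1 + T(\omega - 1) \right] $$
$$ \qquad - \frac{1}{2} \sum_{i=1}^{N} \left( \Delta y_i - \tilde{W}_i \phi \right)' \Omega^{-1} \left( \Delta y_i - \tilde{W}_i \phi \right). \qquad (27.51) $$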
The exact specification of the new parameters b, π ∗ , and ω depends on whether yit is I(0),
and/or whether xit is strictly or weakly exogenous. See Hsiao, Pesaran, and Tahmiscioglu (2002)
for further details.
The Monte Carlo experiments reported in Hsiao, Pesaran, and Tahmiscioglu (2002) also
show that transformed MLE performs well in small samples, and tends to dominate GMM type
estimators. However, the transformed ML is based on a number of strong distributional assump-
tions on the disturbances. In particular, it assumes that uit are cross-sectionally homoskedastic.
In a recent paper, Hayakawa and Pesaran (2015) relax this assumption and allow $E(u_{it}^2) = \sigma_{ui}^2$
to differ across i. This is not a trivial extension, due to the incidental parameters problem that
arises, and its implications for estimation and inference. To deal with this problem, Hayakawa
and Pesaran (2015) use the pseudo- or quasi-ML approach where the error variance hetero-
geneity is ignored at the estimation stage, but robust standard errors are used at the inference
stage.
Let θ ∗ denote the pseudo true values obtained by maximizing the pseudo log-likelihood
function (27.51) of the misspecified model that ignores the error variance heterogeneity.
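Hayakawa and Pesaran (2015) establish that in this heteroskedastic case, $\theta^* = (\phi', \omega, \sigma_{u*}^2)'$,
where $\sigma_{u*}^2 = \lim_{N\to\infty} N^{-1} \sum_{i=1}^{N} \sigma_{ui}^2$. Hence, the ML estimator of $\phi$ and ω by Hsiao, Pesaran,
$$ \sqrt{N}\, (\hat{\theta} - \theta^*) \to_d N\left( 0, A^{*-1} B^* A^{*-1} \right) \qquad (27.52) $$
where $\theta^* = (\phi', \omega, \sigma_{u*}^2)'$,
$$ A^* = \lim_{N\to\infty} E\left[ -\frac{1}{N} \frac{\partial^2 \ell_p(\theta^*)}{\partial\theta\, \partial\theta'} \right], \quad \text{and} \quad B^* = \lim_{N\to\infty} E\left[ \frac{1}{N} \frac{\partial \ell_p(\theta^*)}{\partial\theta} \frac{\partial \ell_p(\theta^*)}{\partial\theta'} \right]. $$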
Chamberlain (1982) to deal with the correlation between the factor loadings and the regres-
sors, but continues to assume that all factor loadings (including the one associated with the
intercepts) are uncorrelated with the errors. Hayakawa, Pesaran, and Smith (2014), using the
transformed ML approach of Hsiao, Pesaran, and Tahmiscioglu (2002), propose an alternative
quasi-ML approach applied to the panel data model after first-differencing. The proposed esti-
mation procedure includes the transformed likelihood procedure of Hsiao, Pesaran, and Tahmis-
cioglu (2002) as a special case. It allows for both fixed and interactive effects (the latter based on
a random coefficient specification), and can be used to test the validity of the fixed-effects spec-
ification against the more general model with interactive effects.
In what follows, we give a summary account of the quasi-differencing and the transformed
MLE approaches, as they represent quite different ways of dealing with the unobserved factors
when T is short and N large. Also, to simplify the exposition, we consider a factor error structure
with only a single unobserved factor. To this end suppose that uit in the dynamic panel data
model, (27.1), is given by the following unobserved factor error structure,
uit = γ i ft + ε it , (27.53)
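where $f_t$ is the unobserved common factor, $\gamma_i$ is the factor loading of the ith unit and the $\varepsilon_{it}$'s are
cross-sectionally independent innovations. (27.53) is an exact factor model. Using (27.53) in (27.1)
we have
$$ y_{it} = \alpha_i + \lambda y_{i,t-1} + \beta' x_{it} + \gamma_i f_t + \varepsilon_{it}. \qquad (27.54) $$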
Ahn, Lee, and Schmidt (2001) employ the quasi-differencing approach of Holtz-Eakin, Newey,
and Rosen (1988) to eliminate $\gamma_i$. This involves multiplying the equation for $y_{i,t-1}$ by
$\varphi_t = f_t / f_{t-1}$, and then subtracting it from the equation for $y_{it}$ to obtain
$$ y_{it} - \varphi_t y_{i,t-1} = (1 - \varphi_t)\, \alpha_i + \lambda (y_{i,t-1} - \varphi_t y_{i,t-2}) + \beta' (x_{it} - \varphi_t x_{i,t-1}) + e_{it}, \qquad (27.55) $$
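That quasi-differencing removes the factor component can be verified numerically for a single unit. The sketch below is an illustration under assumptions not in the text: the factor is drawn away from zero so that the ratio $f_t/f_{t-1}$ is well defined, errors are Gaussian, and the names `quasi_difference` and `max_gap` are hypothetical. It checks that $u_{it} - \varphi_t u_{i,t-1}$ equals $\varepsilon_{it} - \varphi_t \varepsilon_{i,t-1}$, which is the error $e_{it}$ implied by the subtraction.

```python
import random


def quasi_difference(series, phi):
    """Return series[t] - phi[t] * series[t-1] for t = 1, ..., len(series) - 1."""
    return [series[t] - phi[t] * series[t - 1] for t in range(1, len(series))]


def max_gap(seed=0, T=6):
    """Largest absolute gap between quasi-differenced u and quasi-differenced eps."""
    rng = random.Random(seed)
    f = [rng.uniform(0.5, 2.0) for _ in range(T)]      # common factor, bounded away from 0
    gamma = rng.gauss(0.0, 1.0)                        # factor loading of one unit
    eps = [rng.gauss(0.0, 1.0) for _ in range(T)]      # idiosyncratic innovations
    u = [gamma * f[t] + eps[t] for t in range(T)]      # factor error structure (27.53)
    phi = [None] + [f[t] / f[t - 1] for t in range(1, T)]
    lhs = quasi_difference(u, phi)                     # u_t - phi_t * u_{t-1}
    rhs = quasi_difference(eps, phi)                   # eps_t - phi_t * eps_{t-1}
    return max(abs(a - b) for a, b in zip(lhs, rhs))
```

The gap returned by `max_gap()` is zero up to floating-point rounding, since $\gamma_i f_t - (f_t/f_{t-1})\gamma_i f_{t-1} = 0$ for every t.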
Conditions (27.56)–(27.57) imply that the error term of the transformed equation (27.55) sat-
isfies the orthogonality conditions
$$ E(y_{is}\, e_{it}) = 0, \quad E(x_{is}\, e_{it}) = 0, \quad \text{for } s < t - 1. $$
Thus, the vector of instrumental variables that is available to identify the parameters of equation
(27.55) (under $\alpha_i = \alpha$) is
$$ \left( 1, y_{i1}, \ldots, y_{i,t-2}, x_{i1}', x_{i2}', \ldots, x_{i,t-2}' \right)', $$
and the GMM estimation based on these instruments is consistent under fixed T and as
N → ∞, although with moderate T the number of moments can be large and rises rapidly as T
increases. The GMM estimation can also be subject to the weak instrument problem. Extension
of (27.54) to more than one unobserved factor is attempted by Ahn, Lee, and Schmidt (2007) to
estimate a production function for a set of rice farms observed over six seasons, where multiple
factors were included to proxy farm-specific, time varying technical inefficiencies. See also Ahn,
Lee, and Schmidt (2007, 2013) for details.
Hayakawa, Pesaran, and Smith (2014) consider the dynamic panel data model, (27.54), and allow
for α i to differ across i. First, they apply the first-differencing operator to eliminate α i , and then
deal with the factor loadings by assuming that γ i are random draws from a distribution with a
fixed number of unknown parameters. For simplicity assume that xit is a scalar, and write the
model as
$$ \Delta x_{it} = c + \vartheta_i g_t + \sum_{j=0}^{\infty} d_j \varepsilon_{i,t-j}, \quad \sum_{j=0}^{\infty} |d_j| < \infty, \qquad (27.60) $$
which is a generalization of (27.49), where ϑ i are the random interactive effects distributed inde-
pendently of uit and ft . Following similar assumptions as in Hsiao, Pesaran, and Tahmiscioglu
(2002), it is further shown that
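$$ \Delta y_i = W_i \phi + \lambda g + \xi_i, \qquad (27.62) $$
where $\Delta y_i = (\Delta y_{i1}, \Delta y_{i2}, \ldots, \Delta y_{iT})'$, $\phi = (b, \pi', \gamma, \beta)'$, $\xi_i = \eta_i g + r_i$, $g = (\tilde{g}_1, g_2, \ldots, g_T)'$,
$r_i = (v_{i1}, \Delta u_{i2}, \ldots, \Delta u_{iT})'$ and
$$ W_i = \begin{pmatrix} 1 & \Delta x_i' & 0 & 0 \\ 0 & 0 & \Delta y_{i1} & \Delta x_{i2} \\ \vdots & \vdots & \vdots & \vdots \\ 0 & 0 & \Delta y_{i,T-1} & \Delta x_{iT} \end{pmatrix}. \qquad (27.63) $$
$$ E(r_i r_i') = \sigma^2 \begin{pmatrix} \omega & -1 & \cdots & 0 & 0 \\ -1 & 2 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 2 & -1 \\ 0 & 0 & \cdots & -1 & 2 \end{pmatrix} = \sigma^2 \Omega. \qquad (27.64) $$
Also
$$ Var(\xi_i) = \sigma^2 \Omega + \sigma_\eta^2\, g g' = \sigma^2 \left( \Omega + \varphi\, g g' \right), $$
$$ \ell_N(\psi) = -\frac{NT}{2} \ln(2\pi) - \frac{NT}{2} \ln(\sigma^2) - \frac{N}{2} \ln\left| \Omega + \varphi\, g g' \right| $$
$$ \qquad - \frac{1}{2\sigma^2} \sum_{i=1}^{N} \left( \Delta y_i - W_i \phi - \lambda g \right)' \left( \Omega + \varphi\, g g' \right)^{-1} \left( \Delta y_i - W_i \phi - \lambda g \right). \qquad (27.65) $$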
where y∗it is a latent variable, yit = I(y∗it > d) is the discrete observed dependent variable that
takes the value of unity if y∗it > d and zero otherwise, d is a threshold value, xit is a vector of
strictly exogenous regressors, α i ∼ IIDN(0, σ 2α ), and uit ∼ IIDN(0, σ 2u ). We are interested in
modelling
$$ P\left( y_{it} = 1 \mid y_{i0}, y_{i1}, \ldots, y_{i,t-1}, X_{i.}, \alpha_i \right) = G\left( \alpha_i + \lambda y_{i,t-1} + \beta' x_{it} \right), \qquad (27.67) $$
where G is typically chosen to be the logit or the probit function. Under the above specification the
probability of success at time t is allowed to depend on the outcome in the previous period, t −1,
as well as on unobserved heterogeneity, α i . Of particular interest is testing the null hypothesis
that λ = 0. Under this hypothesis, the response probability at time t does not depend on past
outcomes once controlled for $\alpha_i$ and $X_{i.}$. As with linear models, the specification of the initial
values plays an important role in estimation and inference. A simple approach would be to
treat $y_{i0}$ as non-stochastic, and assume that $\alpha_i$, i = 1, 2, . . . , N are random and independent
of Xi. . In such a setting, the density of yi1 , yi2 , . . . , yiT given Xi. can be obtained by integrating
out α i ’s, following a Bayesian approach, along similar lines as those described in Section 26.11.
Although treating the yi0 as nonrandom simplifies estimation, it is undesirable because it implies
that yi0 is independent of α i and of any of the exogenous variables, which is a strong assumption.
In a recent paper, using a set of Monte Carlo experiments, Akay (2012) shows that the exoge-
nous initial values assumption, if incorrect, can lead to serious overestimation of the true state
dependence and serious underestimation of the variance of unobserved group effects, when T
is small. An alternative approach would be to allow the initial condition to be random, and then
to use the joint distribution of all outcomes on the responses—including that in the initial time
period—conditional on unobserved heterogeneity and observed strictly exogenous explanatory
variables. However, as shown by Wooldridge (2005), the main complication with this approach
is in specifying the distribution of the initial values given α i and xit . For the dynamic probit spec-
ification, Wooldridge (2005) proposes a very simple approach, which consists of specifying a
distribution for α i conditional on the initial values and on the time averages of the exogenous
variables
where $\bar{x}_{i.} = T^{-1}\sum_{t=1}^{T} x_{it}$, $\eta_i$ is an unobserved individual effect, such that $\eta_i \mid \bar{x}_{i.} \sim IIDN(0, \sigma^2_\eta)$,
and $y_{i0} \sim IIDN(0, \sigma^2_\eta)$. Plugging (27.68) into (27.66), under the probit specification, it is
possible to derive the joint distribution of outcomes conditional on the initial values and the strictly
exogenous variables. Such a likelihood has exactly the same structure as the standard random
effects probit model, except for the regressors, which are now given by $x^*_{it} = (1, x_{it}', y_{i0}, \bar{x}_{i.}')'$.
Hence, with this approach it is possible to add yi0 and x̄i. as additional explanatory variables in
each time period and use standard random effects probit software to estimate β, λ, π 0 , π 1 , π 2
and σ 2η .
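The augmentation step described above can be sketched as follows. This is a minimal illustration that assumes a scalar regressor; the function name `wooldridge_regressors` and the toy data are hypothetical and do not come from the text. The rows it returns would be passed, together with the lagged outcome, to whatever random effects probit routine is available.

```python
def wooldridge_regressors(x_i, y_i0):
    """Build x*_it = (1, x_it, y_i0, xbar_i) for one unit, t = 1, ..., T.

    x_i  : list with the scalar regressor values x_i1, ..., x_iT
           (a scalar regressor is an assumption made for this sketch)
    y_i0 : observed initial outcome (0 or 1)
    """
    xbar_i = sum(x_i) / len(x_i)  # time average of the exogenous regressor
    # The same y_i0 and xbar_i are repeated in every time period.
    return [(1.0, x_it, float(y_i0), xbar_i) for x_it in x_i]


# Hypothetical unit with T = 3 observations and initial outcome y_i0 = 1.
rows = wooldridge_regressors([0.5, 1.5, 2.5], y_i0=1)
```

Each row keeps the original regressor and adds two time-invariant columns, which is why standard software can be used without modification.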
Al-Sadoon, Li, and Pesaran (2012) introduce a binary choice panel data model where the
idiosyncratic error term follows an exponential distribution, and derive moment conditions that
eliminate the fixed-effect term and at the same time identify the parameters of the model. Appro-
priate moment conditions are derived both for identification of the state dependent parameter,
$\lambda$, as well as the coefficients of the exogenous covariates, $\beta$. It is shown that the resultant GMM
estimators are consistent and asymptotically normally distributed at the $\sqrt{N}$ rate.
We refer to Hsiao (2003, Section 7.5.2), and Wooldridge (2005) for further discussion on
the initial conditions problem and on estimation of dynamic nonlinear models. Arellano and
Bonhomme (2011) provide a review of recent developments in the econometric analysis of non-
linear panel data models.
27.10 Exercises
1. Consider model (27.17)–(27.18) and suppose that all variables $x_{it}$ are predetermined, with
$E(x_{it} v_{is}) \neq 0$ for $s < t$, and zero otherwise. Write the $W_i$ matrix with valid instruments for
GMM estimation and derive the one-step and two-step GMM estimators for this case.
2. Consider the dynamic panel data model with no exogenous regressors
(a) Transform the model into first differences and write down the log-likelihood function
using the transformed likelihood approach.
(b) Derive the first- and second-order conditions for maximization of the log-likelihood
function.
(c) Distinguish between the stationary and the unit root case and discuss the consistency of
the transformed MLE estimators under both cases.
3. Consider the dynamic panel data model with a single unobserved factor but without fixed-
effects
(a) Derive the bias of the least squares estimate of $\lambda$, $\hat{\lambda}_{OLS}$, given by
\[
\hat{\lambda}_{OLS} = \frac{\sum_{t=1}^{T}\sum_{i=1}^{N} y_{it}\, y_{i,t-1}}{\sum_{t=1}^{T}\sum_{i=1}^{N} y_{i,t-1}^{2}}.
\]
(b) Compare this bias with the Nickell bias given by (27.3).
4. Consider the following dynamic panel data model with interactive effects
where α i are fixed-effects, ft is an unobserved factor, and γ i is the factor loading for the
ith unit.
(a) Derive an equation in $y_{it}$, $x_{it}$ and their lagged values that does not depend on $\alpha_i$ and $\gamma_i$.
(b) Under what conditions can λ and β be estimated consistently?
28 Large Heterogeneous
Panel Data Models
28.1 Introduction
Panel data models introduced in the previous two chapters, 26 and 27, deal with panels where
the time dimension (T) is fixed, and assume that, conditional on a number of observable
characteristics, any remaining heterogeneity over the cross-sectional units can be modelled
through an additive intercept (assumed either fixed or random), and possibly heteroskedastic
errors. This chapter extends the analysis of panels to linear panel data models with slope hetero-
geneity. It discusses how neglecting such heterogeneities affects the consistency of the estimates
and inferences based upon them, and introduces models that explicitly allow for slope hetero-
geneity both in the case of static and dynamic panel data models. To deal with slope heterogene-
ity, particularly in the case of dynamic models, it is often necessary to assume that the number
of time series observations, T, is relatively large, so that individual equations can be estimated
for each unit separately. Models, estimation and inference procedures developed in this and sub-
sequent chapters are more suited to large N and T panels. Such panel data sets are becoming
increasingly available and cover countries, regions, industries, and markets over relatively long
time periods.
Despite the slope heterogeneity, the cross-sectional units could nevertheless share common
features of interest. For example, it is possible for different countries or geographical regions
to have different dynamics of adjustments towards equilibrium, due to their historical and cul-
tural differences, but they could all converge to the same economic equilibrium in the very
long run, due to forces of arbitrage and interconnections through international trade and cul-
tural exchanges. Other examples include cases where slope coefficients can be viewed as random
draws from a distribution with a number of parameters that is bounded in N. A large number of
panel data sets fit within this setup, where the cross-sectional units might be industries, regions,
or countries, and we wish to identify common patterns of responses across otherwise heteroge-
neous units. The parameters of interest may be intercepts, short-run coefficients, long-run coef-
ficients or error variances.
This chapter deals with panels with stationary variables. The econometric analysis of panels
with unit roots and cointegration is covered in Chapter 31.
\[
y_{it} = \sum_{j=1}^{k} \beta_{jit}\, x_{jit} + u_{it} = \beta_{it}' x_{it} + u_{it}, \quad i = 1, 2, \ldots, N, \; t = 1, 2, \ldots, T, \tag{28.1}
\]
where uit denotes the random error term, xit is a k×1 vector of exogenous variables and β it is the
k × 1 vector of coefficients. The above specification is very general and allows the coefficients to
vary both across time and over individual units. As it is specified it is too general. It simply states
that each individual unit has its own coefficients that are specific to each time period. However,
as pointed out by Balestra (1996), this general formulation is, at most, descriptive. It lacks any
explanatory power and it is not useful for prediction. Furthermore, it is not estimable, as the num-
ber of parameters to be estimated exceeds the number of observations. For a model to become
interesting and to acquire explanatory and predictive power, it is essential that some structure is
imposed on its parameters.
One way to reduce the number of parameters in (28.1) is to adopt a random coefficient
approach, which assumes that the coefficients β it are draws from probability distributions with a
fixed number of parameters that do not vary with N and/or T. Depending on the type of assump-
tion about the parameter variation, we can further classify the models into one of two categories:
stationary and non-stationary random-coefficient models.
The stationary random-coefficient models view the coefficients as having constant means and
variance-covariances. Namely, the k × 1 vector β it is specified as
β it = β + ηit , i = 1, 2, . . . , N, t = 1, 2, . . . , T, (28.2)
β it = β + ηi , i = 1, 2, . . . , N, t = 1, 2, . . . , T, (28.3)
and
Estimation and inference in the above specification are discussed in Section 28.4.
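\[
\beta_{it} = \beta + \xi_{it} = \beta + \eta_i + \lambda_t, \quad i = 1, 2, \ldots, N, \; t = 1, 2, \ldots, T, \tag{28.6}
\]
and assume
\[
E(\eta_i) = E(\lambda_t) = 0, \quad E(\eta_i \lambda_t') = 0, \quad E(\eta_i x_{it}') = 0, \quad E(\lambda_t x_{it}') = 0, \tag{28.7}
\]
\[
E(\eta_i \eta_j') = \begin{cases} \Omega_\eta, & \text{if } i = j, \\ 0, & \text{if } i \neq j, \end{cases}
\qquad
E(\lambda_i \lambda_j') = \begin{cases} \Lambda, & \text{if } i = j, \\ 0, & \text{if } i \neq j, \end{cases}
\]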
β it = β t = Hβ t−1 + δ t , (28.8)
where all eigenvalues of H lie inside the unit circle, and δ t is a stationary random variable with
mean $\mu$. Hence, letting $H = 0$ and $\delta_t$ be IID we obtain the model proposed by Hildreth and
Houck (1968), while for the Pagan (1980) model, $H = 0$ and
\[
\delta_t - \mu = \delta_t - \beta = A(L)\,\varepsilon_t, \tag{28.9}
\]
where $\beta$ is the mean of $\beta_t$ and $A(L)$ is a matrix polynomial in the lag operator $L$ (with $L\varepsilon_t = \varepsilon_{t-1}$),
and $\varepsilon_t$ is independent normal. The Rosenberg (1972, 1973) return-to-normality model assumes
that the absolute values of the characteristic roots of $H$ are less than 1, with $\delta_t$ independently
normally distributed with mean $\mu = (I_k - H)\beta$.
The non-stationary random coefficients models do not regard the coefficient vector as having
constant mean or variances. Changes in coefficients from one observation to the next can be the
result of the realization of a nonstationary stochastic process or can be a function of exogenous
variables. When the coefficients are realizations of a nonstationary stochastic process, we may
again use (28.8) to represent such a process. For instance, the Cooley and Prescott (1976) model
can be obtained by letting H = Ik and μ = 0. When the coefficients β it are functions of
individual characteristics or time variables (e.g. see Amemiya (1978), Boskin and Lau (1990)),
we can let
\[
\beta_{it} = \Gamma q_{it} + \eta_{it}. \tag{28.10}
\]
While the detailed formulation and estimation of the random coefficients model depends on the
specific assumptions about the parameter variation, many types of random coefficients models
can be conveniently represented using a mixed fixed and random coefficients framework of the
form (see, for example, Hsiao, Appelbe, and Dineen (1992))
where $z_{it}$ and $w_{it}$ are vectors of exogenous variables with dimensions $\ell$ and $p$ respectively, $\gamma$ is an
$\ell \times 1$ vector of constants, $\alpha_{it}$ is a $p \times 1$ vector of random variables, and $u_{it}$ is the error term. For
instance, the Swamy type model, (28.3), can be obtained from (28.11) by letting $z_{it} = w_{it} = x_{it}$,
$\gamma = \beta$, and $\alpha_{it} = \eta_i$; the Hsiao type model (28.6) and (28.7) is obtained by letting
$z_{it} = w_{it} = x_{it}$, $\gamma = \beta$, and $\alpha_{it} = \eta_i + \lambda_t$; the stochastic time varying parameter model (28.8) is
obtained by letting $z_{it} = x_{it}$, $w_{it}' = x_{it}'(H, I_k)$, $\gamma = \mu$, and $\alpha_{it}' = \lambda_t' = [\beta_{t-1}', (\delta_t - \mu)']$;
and the model where $\beta_{it}$ is a function of other variables is obtained by letting $z_{it}' = x_{it}' \otimes q_{it}'$,
$\gamma = \mathrm{vec}(\Gamma)$, $w_{it} = x_{it}$, $\alpha_{it} = \eta_{it}$, etc.
In this chapter we focus on models with time-invariant slope coefficients that vary randomly
or freely over the cross-sectional units. We begin by considering the implications of neglecting
such heterogeneity on the consistency and efficiency of the homogeneous slope type estimators
such as fixed and random effects models.
uit ∼ IID 0, σ 2u , and μi are unknown fixed parameters. The coefficients, β i , are allowed to vary
freely across units but are otherwise assumed to be fixed (over time). It proves useful to decom-
pose β i into a common component, β, and a remainder term, ηi , that varies across units:
β i = β + ηi . (28.13)
The nature of the slope heterogeneity can now be characterized in terms of the properties of
$\eta_i$, in particular whether there is systematic dependence between $\eta_i$ and the regressors $x_{it}$ and an
additional regressor $z_{it}$.
Consider an investigator who ignores the heterogeneity of the slope coefficients in (28.12),
and instead estimates the model
such that
\[
E(\Sigma_i) = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} \Sigma_i, \tag{28.16}
\]
yi = μi τ T + β i xi + ui , (28.17)
and
yi = α i τ T + Wi δ + vi , (28.18)
respectively, where
and
\[
W_i = \begin{pmatrix} x_{i1} & z_{i1} \\ x_{i2} & z_{i2} \\ \vdots & \vdots \\ x_{iT} & z_{iT} \end{pmatrix},
\qquad
v_i = \begin{pmatrix} v_{i1} \\ v_{i2} \\ \vdots \\ v_{iT} \end{pmatrix}.
\]
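The fixed-effects (FE) estimators of the slope coefficients in (28.18) can be written as
\[
\hat{\delta}_{FE} = \begin{pmatrix} \hat{\delta}_{x,FE} \\ \hat{\delta}_{z,FE} \end{pmatrix}
= \left( \sum_{i=1}^{N} W_i' M_T W_i \right)^{-1} \sum_{i=1}^{N} W_i' M_T y_i, \tag{28.19}
\]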
1 The fixed-effects estimator in (28.19) assumes a balanced panel. But the results readily extend to unbalanced panels.
It is now easily seen that under Assumptions H.1–H.4 and for N and/or T sufficiently large
\[
\frac{1}{N} \sum_{i=1}^{N} \frac{W_i' M_T u_i}{T} \overset{p}{\to} 0, \tag{28.21}
\]
where $\overset{p}{\to}$ denotes convergence in probability. To see this, note that since $u_{it}$ are cross-sectionally
independent and $w_{it}$ are strictly exogenous, then we have
\[
Var\left( \frac{1}{N} \sum_{i=1}^{N} \frac{W_i' M_T u_i}{T} \right)
= \frac{1}{TN} \left[ \frac{1}{N} \sum_{i=1}^{N} \sigma_i^2\, E\left( \frac{W_i' M_T W_i}{T} \right) \right].
\]
Also, under Assumptions H.1 and H.3, $\sigma_i^2$ and $E(T^{-1} W_i' M_T W_i)$ are bounded and as a result
\[
Var\left( \frac{1}{N} \sum_{i=1}^{N} \frac{W_i' M_T u_i}{T} \right) \to 0,
\]
if N and/or T → ∞. Also, under strict exogeneity of $w_{it}$, $E(T^{-1} W_i' M_T u_i) = 0$, for all $i$, and
the desired result in (28.21) follows.
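Using (28.21) in (28.20) we now have
\[
\underset{N,T \to \infty}{\mathrm{Plim}} \left( \hat{\delta}_{FE} \right)
= \left( \mathrm{Plim} \sum_{i=1}^{N} \frac{W_i' M_T W_i}{NT} \right)^{-1}
\left( \mathrm{Plim} \sum_{i=1}^{N} \frac{W_i' M_T x_i}{NT}\, \beta_i \right). \tag{28.22}
\]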
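Consider now the case where the slopes are heterogeneous. Using the above results, it is now
easily seen that the consistency result in (28.23) will follow if and only if
\[
\sum_{i=1}^{N} \frac{W_i' M_T x_i}{NT}\, \eta_i
= \begin{pmatrix} \dfrac{1}{NT} \sum_{i=1}^{N} x_i' M_T x_i\, \eta_i \\[2ex] \dfrac{1}{NT} \sum_{i=1}^{N} z_i' M_T x_i\, \eta_i \end{pmatrix}
\overset{p}{\to} 0. \tag{28.24}
\]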
This condition holds under the random coefficient specification where it is assumed that ηi ’s are
distributed independently of wit for all i and t. (See below and Swamy (1970)). Under Assump-
tion H.3 and as T → ∞ we have
\[
\frac{1}{NT} \sum_{i=1}^{N} x_i' M_T x_i\, \eta_i - \frac{1}{N} \sum_{i=1}^{N} \omega_{ixx}\, \eta_i \overset{p}{\to} 0,
\]
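and
\[
\frac{1}{NT} \sum_{i=1}^{N} z_i' M_T x_i\, \eta_i - \frac{1}{N} \sum_{i=1}^{N} \omega_{izx}\, \eta_i \overset{p}{\to} 0,
\]
so that condition (28.24) can be written as
\[
\frac{1}{N} \sum_{i=1}^{N} \omega_{ixx}\, \eta_i \overset{p}{\to} 0, \quad \text{and} \quad \frac{1}{N} \sum_{i=1}^{N} \omega_{izx}\, \eta_i \overset{p}{\to} 0, \tag{28.25}
\]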
where
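\[
Cov(\omega_{ixx}, \eta_i) = \underset{N \to \infty}{\mathrm{Plim}}\, \frac{1}{N} \sum_{i=1}^{N} \omega_{ixx}\, \eta_i, \qquad
Cov(\omega_{ixz}, \eta_i) = \underset{N \to \infty}{\mathrm{Plim}}\, \frac{1}{N} \sum_{i=1}^{N} \omega_{ixz}\, \eta_i,
\]
\[
E(\omega_{ixx}) = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} \omega_{ixx}, \qquad
E(\omega_{izz}) = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} \omega_{izz}, \tag{28.28}
\]
\[
E(\omega_{ixz}) = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} \omega_{ixz}.
\]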
Clearly, these conditions are met under slope homogeneity. In the present application
where the regressors are assumed to be strictly exogenous, the fixed-effects estimators con-
verge to their true values under the random coefficient model (RCM) where the slope
coefficients and the regressors are assumed to be independently distributed. Notice, how-
ever, that since the β i ’s are assumed to be fixed over time, then any systematic depen-
2 Notice that under slope heterogeneity the fixed-effects estimators are inconsistent when N is finite and only T → ∞.
dence of ηi on wit over time is already ruled out under model (28.12). The random coeffi-
cients assumption imposes further restrictions on the joint distribution of ηi and the cross-
sectional distribution of wit .
2. The FE estimator of δ z is robust to slope heterogeneity if the incorrectly included
regressors, zit , are on average orthogonal to xit , namely when E(ωixz ) = 0, and if
Cov(ωixz , ηi ) = 0. However, in the presence of slope heterogeneity, the FE estimator of
δ x continues to be inconsistent even if zit and xit are on average orthogonal. The direction
of the asymptotic bias of δ̂ x,FE depends on the sign of Cov(ωixx , ηi ). The bias of δ̂ x,FE is
positive when Cov(ωixx , ηi ) > 0 and vice versa.3
3. In general, where $E(\omega_{ixz}) \neq 0$ and $Cov(\omega_{ixz}, \eta_i) \neq 0$ and/or $Cov(\omega_{ixx}, \eta_i) \neq 0$, the
fixed-effects estimators, $\hat{\delta}_{x,FE}$ and $\hat{\delta}_{z,FE}$, are both inconsistent.
In short, if the slope coefficients are fixed but vary systematically across the groups, the appli-
cation of the general-to-specific methodology to standard panel data models can lead to mislead-
ing results (spurious inference). An important example is provided by the case when attempts are
made to check for the presence of nonlinearities by testing the significance of quadratic terms in
static panel data models using fixed-effects estimators. In the context of our simple specification,
this would involve setting zit = x2it , and a test of the significance of zit in (28.14) will yield sen-
sible results only if the conditions defined by (28.29) are met. In general, it is possible to falsely
reject the linearity hypothesis when there are systematic relations between the slope coefficients
and the cross-sectional distribution of the regressors. Therefore, results from nonlinearity tests
in panel data models should be interpreted with care. The linearity hypothesis may be rejected
not because of the existence of a genuine nonlinear relationship between yit and xit , but due to
slope heterogeneity.
Finally, it is worth noting that since the β i ’s are fixed for each i, the nonlinear specification
cannot be reconciled with (28.29), unless it is assumed that β i varies proportionately with xit .
Clearly, it is possible to allow the slopes, β i , to vary systematically with some aspect of the
cross-sectional distribution of xit without requiring β i to be proportional to xit , and hence time-
varying. For example, it could be that
β i = γ 0 + γ 1 x̄i , (28.31)
where $\bar{x}_i = T^{-1} \sum_{t=1}^{T} x_{it}$. This specification retains the linearity of (28.29) for each i, but can
still yield a statistically significant effect for x2it in (28.30) if slope heterogeneity is ignored and
fixed-effects estimates of (28.30) are used for inference. This feature of fixed-effects regressions
under heterogeneous slopes is illustrated in Figure 28.1. The figure shows scatter points and
associated regression lines for three countries with slopes that differ systematically with x̄i . It is
clear that the pooled regression based on the scatter points from all three countries will exhibit
strong nonlinearities, although the country-specific regressions are linear.
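The mechanism can be checked with a small noise-free calculation. The sketch below is an illustration under stated assumptions rather than the authors' own computation: three units, a scalar regressor, slopes generated by (28.31) with the hypothetical values γ0 = 1 and γ1 = 0.5, zero intercepts and no error term. The function name `fe_two_regressors` is likewise hypothetical. It runs the fixed-effects (within) regression of y on x and x² and returns the two pooled coefficients.

```python
def demean(values):
    """Subtract the unit-specific time average (within transformation)."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def fe_two_regressors(y_units, x_units, z_units):
    """FE estimates of (delta_x, delta_z) in y_it = a_i + delta_x*x_it + delta_z*z_it + v_it."""
    sxx = sxz = szz = sxy = szy = 0.0
    for y, x, z in zip(y_units, x_units, z_units):
        yd, xd, zd = demean(y), demean(x), demean(z)
        sxx += sum(a * a for a in xd)
        sxz += sum(a * b for a, b in zip(xd, zd))
        szz += sum(b * b for b in zd)
        sxy += sum(a * c for a, c in zip(xd, yd))
        szy += sum(b * c for b, c in zip(zd, yd))
    det = sxx * szz - sxz ** 2  # determinant of the 2x2 normal equations
    delta_x = (szz * sxy - sxz * szy) / det
    delta_z = (sxx * szy - sxz * sxy) / det
    return delta_x, delta_z


# Hypothetical design: unit i has x_it = c_i + d_t, so xbar_i = c_i.
gamma0, gamma1 = 1.0, 0.5
centres = [1.0, 2.0, 3.0]
deviations = [-1.0, 0.0, 1.0]
x_units = [[c + d for d in deviations] for c in centres]
betas = [gamma0 + gamma1 * (sum(x) / len(x)) for x in x_units]  # beta_i = gamma0 + gamma1*xbar_i
y_units = [[b * v for v in x] for b, x in zip(betas, x_units)]   # exactly linear within each unit
z_units = [[v * v for v in x] for x in x_units]                  # z_it = x_it squared

delta_x, delta_z = fe_two_regressors(y_units, x_units, z_units)
```

In this design the coefficient on the squared term is positive even though every unit-specific relation is exactly linear, which is the spurious nonlinearity described above.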
3 Notice that E(ωixx )E(ωizz ) − (E(ωixz ))2 > 0, unless xit and zit are perfectly collinear for all i, which we rule out.
[Figure 28.1: scatter points and fitted regression lines for country 1, country 2, and country 3]
Example 63 One interesting study illustrating the importance of slope heterogeneity in cross coun-
try analysis is the analysis by Haque, Pesaran, and Sharma (2000) on the determinants of cross-
country private savings rates, using a subset of data from Masson, Bayoumi, and Samiei (1998)
(MBS), on 21 OECD countries over 1971–1993. MBS ran FE regressions of
PSAV : the private savings rate, defined as the ratio of aggregate private savings
to GDP;
Table 28.1 contains the FE regression for the industrial countries. We refer to this specification as
model M0 . The estimates under ‘model M0 ’ in Table 28.1 are identical to those reported in column
1 of Table 3 in MBS (1998), except for a few typos. Apart from the coefficient of the GDP growth
rate (GR), all the estimated coefficients are statistically (some very highly) significant, and in par-
ticular suggest a strong quadratic relationship between saving and per-capita income. However, the
validity of these estimates and the inferences based on them critically depend on the extent to which
slope coefficients differ across countries, and in the case of static models, whether these differences
are systematic. As shown above, one important implication of neglected slope heterogeneity is the
possibility of obtaining spurious nonlinear effects. This possibility is explored by adding quadratic
terms in W, INF, PCTT, and DEP to the regressors already included in model M0 . Estimation
results, reported under ‘model M1 ’ in Table 28.1, show that the quadratic terms are all statistically
highly significant. While there may be some a priori argument for a nonlinear wealth effect in the
savings equation, the rationale for nonlinear effects in the case of the other three variables seems less
clear. The quadratic relationships between the private savings rate and the variables W, PCTT,
and DEP are in fact much stronger than the quadratic relationship between savings and per capita
income that MBS focus on. The R̄2 of the augmented model, 0.801, is also appreciably larger than
that obtained for model M0 , 0.766. A similar conclusion is reached using other model selection cri-
teria such as the Akaike information criterion (AIC) and the Schwarz Bayesian criterion (SBC)
also reported in Table 28.1. As an alternative to the quadratic specifications used in model M1 , the
authors investigate the possibility that the slope coefficients in each country are fixed over time, but
are allowed to vary across countries linearly with the sample means of their wealth to GDP ratio or
their per-capita income. More specifically, denote the vector of slope coefficients for country i by β i ,
and define
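\[
\bar{W}_i = T^{-1} \sum_{t=1}^{T} W_{it}, \quad \text{and} \quad \overline{YRUS}_i = T^{-1} \sum_{t=1}^{T} YRUS_{it}.
\]
\[
\beta_i = \beta_0 + \beta_{01} \bar{W}_i + \beta_{02} \overline{YRUS}_i. \tag{28.32}
\]
\[
x_{it} = (SUR_{it}, GCUR_{it}, GI_{it}, GR_{it}, RINT_{it}, W_{it}, INF_{it}, PCTT_{it}, YRUS_{it}, DEP_{it})'.
\]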
The estimated elements of β 0 , β 01 , and β 02 together with their t-ratios are given in Table 28.2.
Apart from the coefficient of the SUR variable, all the other coefficients show systematic variation
across countries. The coefficient of the SUR variable seems to be least affected by slope heterogeneity,
and the hypothesis of slope homogeneity cannot be rejected in the case of this variable. However, none
of the other estimates is directly comparable to the FE estimates given in Table 28.1. In particular,
the coefficients of output growth variables (GRit and GRit × W i ) are both statistically significant,
while this was not so in the case of the FE estimates in Table 28.1. Care must also be exercised when
interpreting these estimates. For example, the results suggest that the effect of real output growth on
the savings rate is likely to be higher in a country with a high wealth–GDP ratio. Similarly, inflation
Table 28.1 Fixed-effects estimates of static private saving equations, models M0 and M1
(21 OECD countries, 1971–1993)
Model M0 Model M1
Regressors Linear Terms Quadratic Terms Linear Terms Quadratic Terms
effects on the savings rate are estimated to be higher in countries with higher wealth to GDP ratios.
However, these results do not predict, for instance, that an individual country’s savings rate will
necessarily rise with output growth.
For further discussion on the consequences of ignoring parameter heterogeneity see, for
example, Robertson and Symons (1992) and Haque, Pesaran, and Sharma (2000).
Regressors    β̂_0                 β̂_01                  β̂_02
SUR           −0.625 (−12.10)     −                     −
GCUR          −1.146 (−6.91)      0.0022 (4.26)         −
GI            −1.891 (−2.44)      0.0039 (1.60)         −
GR            −0.744 (−2.69)      0.0023 (2.71)         −
RINT          0.417 (4.36)        −                     −0.0052 (−3.53)
W             0.119 (5.28)        −0.00033 (−4.70)      −
INF           −0.860 (−5.29)      0.0031 (6.29)         −
PCTT          −0.214 (−1.88)      0.00083 (2.30)        −
YRUS          1.435 (6.31)        −0.0046 (−6.72)       −
DEP           0.502 (2.54)        −0.0021 (−3.39)       −

R̄²            0.838
σ̂             1.934
LL            −982.9
AIC           −1022.9
SBC           −1106.5
t-ratios are given in parentheses.
∗ See the notes to Table 28.1
under the Swamy (1970) random coefficient scheme (28.3), where ηi satisfies assumptions
(28.4)–(28.5). For simplicity, we also assume that uit is independently distributed across i and
over t with zero mean and Var (uit ) = σ 2i . Substituting β i = β + ηi into (28.33) we obtain,
using stacked form notation,
y = Xβ + v,
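where
\[
y = \begin{pmatrix} y_{1.} \\ y_{2.} \\ \vdots \\ y_{N.} \end{pmatrix}, \quad
X = \begin{pmatrix} X_{1.} \\ X_{2.} \\ \vdots \\ X_{N.} \end{pmatrix}, \quad \text{and} \quad
v = \begin{pmatrix} v_{1.} \\ v_{2.} \\ \vdots \\ v_{N.} \end{pmatrix}.
\]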
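Suppose we are interested in estimating the mean coefficient vector, $\beta$, and the covariance matrix
of $v$, $\Sigma$, given by
\[
\Sigma = E(vv') = \begin{pmatrix}
\Sigma_1 & 0 & \ldots & 0 \\
0 & \Sigma_2 & \ldots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \ldots & \Sigma_N
\end{pmatrix},
\]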
where
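For known values of $\Omega_\eta$ and $\sigma_i^2$, the best linear unbiased estimator of $\beta$ is given by the
generalized least squares (GLS) estimator, known in this case as the Swamy estimator
\[
\hat{\beta}_{SW} = \left( X' \Sigma^{-1} X \right)^{-1} X' \Sigma^{-1} y
= \left( \sum_{i=1}^{N} X_{i.}' \Sigma_i^{-1} X_{i.} \right)^{-1} \sum_{i=1}^{N} X_{i.}' \Sigma_i^{-1} y_{i.}.
\]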
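It is easily seen that (under the assumption that $\Omega_\eta$ is nonsingular) (see property (A.9) in
Appendix A)
\[
\Sigma_i^{-1} = \frac{I_T}{\sigma_i^2} - \frac{X_{i.}}{\sigma_i^2} \left( \frac{X_{i.}' X_{i.}}{\sigma_i^2} + \Omega_\eta^{-1} \right)^{-1} \frac{X_{i.}'}{\sigma_i^2}.
\]
Note that $\Sigma_i^{-1}$ exists even if $\Omega_\eta$ is singular. In general we can write⁴
\[
\Sigma_i^{-1} = \frac{I_T}{\sigma_i^2} - \frac{X_{i.} \Omega_\eta}{\sigma_i^2} \left( I_k + \frac{X_{i.}' X_{i.}}{\sigma_i^2}\, \Omega_\eta \right)^{-1} \frac{X_{i.}'}{\sigma_i^2},
\]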
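Both inverse formulas can be verified numerically in a small case. The sketch below is a check under stated assumptions, not part of the original derivation: it takes k = 1 and T = 3 with hypothetical values for the regressor, the error variance and the slope variance, and it assumes the unit covariance matrix has the form σ²I_T + X Ω_η X', which is what substituting β_i = β + η_i into the model implies. The helper `matmul` is a hypothetical plain-Python matrix product.

```python
def matmul(a, b):
    """Plain-Python product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


x = [1.0, 2.0, 3.0]   # X_i. with T = 3 and k = 1 (hypothetical values)
sigma2 = 2.0          # error variance sigma_i^2 (hypothetical)
omega = 0.5           # scalar Omega_eta (hypothetical)
T = len(x)

# Assumed covariance matrix: Sigma_i = sigma2 * I_T + omega * x x'
Sigma = [[(sigma2 if s == t else 0.0) + omega * x[s] * x[t] for t in range(T)]
         for s in range(T)]

xx = sum(v * v for v in x)  # X'X, a scalar when k = 1

# First form: middle factor (X'X/sigma2 + Omega^{-1})^{-1}
middle_nonsingular = 1.0 / (xx / sigma2 + 1.0 / omega)
# General form: Omega * (I_k + (X'X/sigma2) * Omega)^{-1}, usable when omega = 0
middle_general = omega / (1.0 + (xx / sigma2) * omega)

Sigma_inv = [[(1.0 / sigma2 if s == t else 0.0)
              - x[s] * x[t] * middle_general / sigma2 ** 2 for t in range(T)]
             for s in range(T)]

product = matmul(Sigma, Sigma_inv)
max_error = max(abs(product[s][t] - (1.0 if s == t else 0.0))
                for s in range(T) for t in range(T))
```

The two middle factors coincide whenever the slope variance is nonzero, and the product of the matrix and its claimed inverse is the identity up to rounding error.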
Then
\[
\frac{X_{i.}' \Sigma_i^{-1} X_{i.}}{T} = Q_{iT} - Q_{iT} H_{iT}^{-1} Q_{iT},
\]
and
\[
\frac{X_{i.}' \Sigma_i^{-1} y_{i.}}{T} = q_{iT} - Q_{iT} H_{iT}^{-1} q_{iT}.
\]
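\[
\hat{\beta}_{SW} = \sum_{i=1}^{N} R_i \hat{\beta}_i, \tag{28.34}
\]
where
\[
R_i = \left[ \sum_{j=1}^{N} \left( \Omega_\eta + \Sigma_{\hat{\beta}_j} \right)^{-1} \right]^{-1} \left( \Omega_\eta + \Sigma_{\hat{\beta}_i} \right)^{-1}, \tag{28.35}
\]
and
\[
\hat{\beta}_i = (X_{i.}' X_{i.})^{-1} X_{i.}' y_{i.}, \qquad \Sigma_{\hat{\beta}_i} = Var(\hat{\beta}_i) = \sigma_i^2 (X_{i.}' X_{i.})^{-1}. \tag{28.36}
\]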
The expression (28.34) shows that the Swamy estimator is a matrix weighted average of the least
squares estimator for each cross-sectional unit (28.36), with the weights inversely proportional
to their covariance matrices. It also shows that the GLS estimator requires only a matrix inversion
of order k, and so it is not much more complicated to compute than the sample least squares
estimator.
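The covariance matrix of the SW estimator is
\[
Var\left( \hat{\beta}_{SW} \right) = \left( \sum_{i=1}^{N} X_{i.}' \Sigma_i^{-1} X_{i.} \right)^{-1}
= \left[ \sum_{i=1}^{N} \left( \Omega_\eta + \Sigma_{\hat{\beta}_i} \right)^{-1} \right]^{-1}. \tag{28.37}
\]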
If errors $u_{it}$ and $\eta_i$ are normally distributed, the SW estimator is the same as the maximum
likelihood (ML) estimator of $\beta$ conditional on $\Omega_\eta$ and $\sigma_i^2$. Without knowledge of $\Omega_\eta$ and $\sigma_i^2$, we
can estimate $\beta$, $\Omega_\eta$ and $\sigma_i^2$, $i = 1, 2, \ldots, N$, simultaneously by the ML method. However, it
can be computationally tedious. A natural alternative is to first estimate $\Sigma_i$, then substitute the
estimated $\Sigma_i$ into (28.37).
Swamy proposes using the least squares estimator of $\beta_i$, $\hat{\beta}_i = (X_{i.}' X_{i.})^{-1} X_{i.}' y_{i.}$, and the
residuals $\hat{u}_{i.} = y_{i.} - X_{i.} \hat{\beta}_i$ to obtain consistent estimators of $\sigma_i^2$, for $i = 1, \ldots, N$, and $\Omega_\eta$. Noting
that
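and
\[
\hat{\sigma}_i^2 = \frac{\hat{u}_{i.}' \hat{u}_{i.}}{T - k}
= \frac{1}{T - k}\, y_{i.}' \left[ I_T - X_{i.} (X_{i.}' X_{i.})^{-1} X_{i.}' \right] y_{i.}, \tag{28.40}
\]
\[
\hat{\Omega}_\eta = \frac{1}{N - 1} \sum_{i=1}^{N} \left( \hat{\beta}_i - N^{-1} \sum_{j=1}^{N} \hat{\beta}_j \right) \left( \hat{\beta}_i - N^{-1} \sum_{j=1}^{N} \hat{\beta}_j \right)'
- \frac{1}{TN} \sum_{i=1}^{N} \hat{\sigma}_i^2 \left( \frac{X_{i.}' X_{i.}}{T} \right)^{-1}. \tag{28.41}
\]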
Just as in the error-components model, the estimator (28.41) is not necessarily non-negative
definite. In this situation, Swamy has suggested replacing (28.41) by
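\[
\hat{\Omega}_\eta^{*} = \frac{1}{N - 1} \sum_{i=1}^{N} \left( \hat{\beta}_i - N^{-1} \sum_{j=1}^{N} \hat{\beta}_j \right) \left( \hat{\beta}_i - N^{-1} \sum_{j=1}^{N} \hat{\beta}_j \right)'. \tag{28.42}
\]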
This estimator, although biased, is nonnegative definite and consistent when T tends to infinity.
For further discussion on the above estimator see Swamy (1970), and Hsiao and Pesaran
(2008).
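The feasible version of the estimator can be sketched for the scalar case. The code below is a minimal illustration that assumes a single regressor without intercept (k = 1); the function name `swamy_scalar` and the toy data are hypothetical. It computes the unit-specific least squares estimates, the variance estimates in (28.40), the slope variance from (28.41) with a fallback to (28.42) when the first is negative, and then the weighted average in (28.34) and (28.35).

```python
def swamy_scalar(x_units, y_units):
    """Feasible Swamy estimate for y_it = beta_i * x_it + u_it with scalar x_it."""
    n = len(x_units)
    betas, variances = [], []
    for x, y in zip(x_units, y_units):
        t = len(x)
        sxx = sum(v * v for v in x)
        b = sum(a * c for a, c in zip(x, y)) / sxx        # unit-specific OLS
        resid = [c - b * a for a, c in zip(x, y)]
        sigma2 = sum(r * r for r in resid) / (t - 1)      # (28.40) with k = 1
        betas.append(b)
        variances.append(sigma2 / sxx)                    # sigma_i^2 (X'X)^{-1}
    mean_b = sum(betas) / n
    dispersion = sum((b - mean_b) ** 2 for b in betas) / (n - 1)
    omega = dispersion - sum(variances) / n               # scalar form of (28.41)
    if omega < 0.0:
        omega = dispersion                                # fallback (28.42)
    inv = [1.0 / (omega + v) for v in variances]
    total = sum(inv)
    weights = [w / total for w in inv]                    # R_i in (28.35)
    beta_sw = sum(w * b for w, b in zip(weights, betas))  # (28.34)
    return beta_sw, weights, betas, omega


# Hypothetical data: three units, four time periods each.
x_units = [[1.0, 2.0, 3.0, 4.0]] * 3
y_units = [[1.1, 1.9, 3.2, 3.8],
           [2.2, 3.9, 6.1, 7.8],
           [2.9, 6.2, 8.8, 12.1]]
beta_sw, weights, betas, omega = swamy_scalar(x_units, y_units)
```

Because the weights are positive and sum to one, the Swamy estimate always lies between the smallest and largest unit-specific estimates; units whose slopes are estimated less precisely receive less weight.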
\[
\hat{\beta}_{MG} = \frac{1}{N} \sum_{i=1}^{N} \hat{\beta}_i,
\]
where
\[
\hat{\beta}_i = (X_{i.}' X_{i.})^{-1} X_{i.}' y_{i.}.
\]
MG estimation is possible when both T and N are sufficiently large, and is applicable irrespective
of whether the slope coefficients are random (in Swamy’s sense), or fixed in the sense that the
diversity in the slope coefficients across cross sectional units cannot be captured by means of
a finite parameter probability distribution. To compute the variance of the MG estimator, first
note that
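\[
\hat{\beta}_i = \beta + \eta_i + \xi_{i.},
\]
where
\[
\xi_{i.} = (X_{i.}' X_{i.})^{-1} X_{i.}' u_{i.},
\]
\[
\hat{\beta}_{MG} = \beta + \bar{\eta} + \bar{\xi}, \tag{28.43}
\]
and
\[
\bar{\eta} = \frac{1}{N} \sum_{i=1}^{N} \eta_i, \qquad \bar{\xi} = \frac{1}{N} \sum_{i=1}^{N} \xi_{i.}.
\]
Hence, when the regressors are strictly exogenous and the errors, $u_{it}$, are independently
distributed, the variance of $\hat{\beta}_{MG}$ is
\[
Var\left( \hat{\beta}_{MG} \right) = Var(\bar{\eta}) + Var(\bar{\xi})
= \frac{1}{N}\, \Omega_\eta + \frac{1}{N^2} \sum_{i=1}^{N} \sigma_i^2\, E\left[ (X_{i.}' X_{i.})^{-1} \right].
\]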
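\[
\widehat{Var}\left( \hat{\beta}_{MG} \right) = \frac{1}{N(N-1)} \sum_{i=1}^{N} \left( \hat{\beta}_i - \hat{\beta}_{MG} \right) \left( \hat{\beta}_i - \hat{\beta}_{MG} \right)'.
\]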
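and
\[
E\left[ \sum_{i=1}^{N} \left( \hat{\beta}_i - \hat{\beta}_{MG} \right) \left( \hat{\beta}_i - \hat{\beta}_{MG} \right)' \right]
= (N - 1)\, \Omega_\eta + \left( 1 - \frac{1}{N} \right) \sum_{i=1}^{N} \sigma_i^2\, E\left[ (X_{i.}' X_{i.})^{-1} \right].
\]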
as required. For a further discussion of the mean group estimator, see Pesaran and Smith (1995),
and Hsiao and Pesaran (2008).
Example 64 Continuing from Example 63, Haque, Pesaran, and Sharma (2000) further investigate
the determinants of cross country private savings rates by carrying out a country-specific analysis.
The FE regression in Table 28.2 assumes that the slope coefficients across countries are exact linear
functions of W i and/or YRUSi (see equation 28.32), and that the error variances, Var(uit ) = σ 2i ,
are the same across countries. Clearly, these are rather restrictive assumptions, and the consequences
of incorrectly imposing them on the parameters of interest need to be examined. Under the alterna-
tive assumption of unrestricted slope and error variance heterogeneity, MG estimates can be com-
puted as simple averages of country-specific estimates from country-specific regressions and can then
be used to make inferences about E(β i ) = β. Results on country-specific estimates and MG esti-
mates are summarized in Table 28.3. The estimated slope coefficients differ considerably across
countries, both in terms of their magnitude and their statistical significance. Some of the coefficients
are statistically significant only in the case of 3 or 4 countries and in general are very poorly estimated.
This is true of the coefficients of GI, GR, W, PCTT, and YRUS. Also the sign of these estimated coef-
ficients varies quite widely across countries. The coefficients of RINT and INF are better estimated,
but still differ significantly both in magnitude and in sign across the countries. Only the coefficients
of SUR and GCUR tend to be similar across countries. The coefficient of SUR is estimated to be
negative in 19 of the 20 countries, and 13 of these are statistically significant. The positive estimate
obtained for New Zealand is very small and not statistically significant. Similarly, 17 out of 20 coef-
ficients estimated for the GCUR variable have a negative sign, with 7 of the 17 negative coefficients
statistically significant. None of the three positive coefficients estimated for GCUR is statistically
significant. The MG estimates based on the individual country regressions in Table 28.3 support
these general conclusions. Only the MG estimates of the SUR and the GCUR variables are statisti-
cally significant (see the last two rows of Table 28.3). At −0.671, the MGE of the SUR variable is
only marginally higher than the corresponding FE estimate in Table 28.2 that allows for some slope
heterogeneity.
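The MG calculation for a single coefficient can be reproduced from the country estimates. The sketch below is an illustration under stated assumptions: it takes the first numeric column of Table 28.3 to be the SUR coefficients (consistent with the values quoted in the example), reads the New Zealand entry as 0.02 and the Spain entry as −0.18, and uses a hypothetical helper named `mean_group`. It applies the simple average and the nonparametric variance estimator of the MG estimator to these 20 values.

```python
# First numeric column of Table 28.3, Australia to UK (assumed to be SUR).
# New Zealand is read as 0.02 and Spain as -0.18 (assumed readings).
sur = [-0.81, -0.48, -0.68, -1.31, -1.08, -0.70, -1.45, -0.80, -0.69, -0.48,
       -0.46, -0.58, -0.75, 0.02, -0.22, -1.00, -0.18, -0.84, -0.22, -0.72]


def mean_group(betas):
    """Return the MG estimate and its standard error for a scalar coefficient."""
    n = len(betas)
    b_mg = sum(betas) / n
    # Variance estimator: sum of squared deviations divided by N(N - 1).
    var_mg = sum((b - b_mg) ** 2 for b in betas) / (n * (n - 1))
    return b_mg, var_mg ** 0.5


b_mg, se_mg = mean_group(sur)
print(b_mg, se_mg)
```

Since the inputs are rounded to two decimals, the output can only be expected to agree with the Average and Standard error rows of Table 28.3 up to rounding.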
Table 28.3 Country-specific estimates of ‘static’ private saving equations (20 OECD countries, 1972–1993)
Country SUR GCUR GI GR RINT W INF PCTT YRUS DEP
Australia −0.81 −0.18 −1.00 0.08 0.18 0.06 0.27 0.04 0.42 0.46
[0.18] [0.27] [0.41] [0.08] [0.08] [0.02] [0.09] [0.03] [0.17] [0.22]
Austria −0.48 −0.42 0.35 0.06 0.24 0004 0.09 0.11 −0.10 −0.03
[0.56] [0.40] [0.84] [0.32] [0.32] [0.05] [0.54] [0.16] [0.24] [0.21]
Belgium −0.68 −0.53 −2.47 0.09 −0.04 −0.02 −0.10 −0.00 0.17 −0.22
[0.23] [0.15] [1.51] [0.11] [0.14] [0.03] [0.13] [0.02] [0.09] [0.35]
Canada −1.31 −0.56 1.01 0.24 0.10 −0.03 0.29 0.17 0.07 −0.17
[0.10] [0.14] [1.03] [0.09] [0.08] [0.04] [0.09] [0.05] [0.12] [0.12]
Denmark −1.08 −0.64 0.36 0.03 −0.20 −0.01 0.10 0.02 0.14 −1.17
[0.15] [0.22] [0.80] [0.25] [0.20] [0.03] [0.29] [0.05] [0.24] [0.36]
Finland −0.70 −0.35 0.87 0.14 0.40 0.03 0.52 0.01 0.02 −0.39
[0.16] [0.21] [1.59] [0.20] [0.18] [0.03] [0.22] [0.02] [0.19] [0.52]
France −1.45 −0.78 −3.13 0.10 −0.16 −0.04 −0.22 −0.06 0.12 −0.16
[0.51] [0.52] [2.00] [0.23] [0.18] [0.10] [0.24] [0.05] [0.12] [0.43]
Germany −0.80 −0.54 −0.18 0.19 −0.06 0.00 0.02 −0.01 −0.10 −0.28
[0.35] [0.28] [0.71] [0.18] [0.17] [0.03] [0.25] [0.05] [0.20] [0.11]
Greece −0.69 −0.29 −1.13 0.15 1.23 0.10 1.05 −0.49 −0.87 1.52
[0.45] [0.71] [1.65] [0.34] [0.58] [0.05] [0.63] [0.27] [1.29] [1.24]
Ireland −0.48 −0.50 1.33 −0.08 −0.71 −0.13 −0.88 0.32 0.79 1.14
[0.29] [0.14] [1.18] [0.14] [0.28] [0.06] [0.22] [0.11] [0.24] [0.35]
Italy −0.46 0.05 −0.16 0.13 0.12 −0.00 0.09 −0.00 −0.12 0.32
[0.18] [0.21] [0.48] [0.15] [0.11] [0.03] [0.13] [0.04] [0.15] [0.19]
Japan −0.58 −0.79 −0.98 −0.14 −0.05 0.04 0.01 0.04 −0.06 0.22
[0.21] [0.31] [0.50] [0.12] [0.16] [0.03] [0.09] [0.01] [0.08] [0.32]
Netherlands −0.75 −0.43 −1.50 −0.05 0.09 0.12 −0.37 0.06 0.30 0.22
[0.33] [0.33] [2.64] [0.20] [0.28] [0.05] [0.27] [0.15] [0.26] [0.39]
New Zealand 0.02 −0.54 −1.22 −0.12 −0.07 0.02 −0.20 0.07 −0.46 0.24
[0.29] [0.45] [0.78] [0.22] [0.20] [0.03] [0.19] [0.07] [0.33] [0.18]
Norway −0.22 0.13 −0.15 −0.06 0.02 −0.07 −0.04 0.23 0.12 −0.16
[0.51] [0.66] [0.61] [0.46] [0.51] [0.05] [0.60] [0.07] [0.31] [0.64]
Portugal −1.00 −0.57 2.91 0.60 0.47 −0.07 0.64 0.21 −0.72 0.16
[0.20] [0.32] [1.64] [0.24] [0.20] [0.05] [0.19] [0.13] [0.37] [0.41]
Spain −0.18 −0.06 1.36 −0.01 0.07 −0.09 0.11 0.18 −0.78 −0.28
[0.55] [0.59] [1.58] [0.31] [0.38] [0.05] [0.42] [0.12] [0.32] [0.57]
Sweden −0.84 −0.96 −2.54 −0.53 0.24 0.05 −0.02 0.09 0.00 0.22
[0.11] [0.20] [1.49] [0.30] [0.23] [0.05] [0.23] [0.10] [0.22] [0.81]
Switzerland −0.22 −0.09 0.36 −0.26 0.02 0.06 0.21 −0.04 −0.06 −0.59
[0.50] [0.16] [0.76] [0.13] [0.14] [0.03] [0.11] [0.5] [0.12] [0.09]
UK −0.72 0.03 −0.79 0.37 0.18 −0.04 0.21 0.01 −0.25 0.34
[0.12] [0.10] [0.34] [0.09] [0.08] [0.03] [0.08] [0.04] [0.15] [0.15]
Average −0.671 −0.401 −0.335 0.046 0.104 0.001 0.089 0.048 −0.069 0.080
Standard error [.083] [.067] [.332] [.052] [.081] [.014] [.088] [.036] [.127] [.090]
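Table 28.3 (continued) Diagnostic statistics reported by country: $\hat{\sigma}^2$, $\chi^2_{SC}(1)$, $\chi^2_{FF}(1)$, $\chi^2_{N}(2)$, $\chi^2_{H}(1)$, $\bar{R}^2$, LL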
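Write $\Omega_\eta^{-1} = \lambda A$. Then
$$\hat{\beta}_{SW} = \left\{\sum_{i=1}^{N}\left[Q_{iT} - \left(I_k + \frac{\lambda}{T} A Q_{iT}^{-1}\right)^{-1} Q_{iT}\right]\right\}^{-1} \sum_{i=1}^{N}\left[q_{iT} - \left(I_k + \frac{\lambda}{T} A Q_{iT}^{-1}\right)^{-1} q_{iT}\right].$$
Also
$$\left(I_k + \frac{\lambda}{T} A Q_{iT}^{-1}\right)^{-1} = I_k - \frac{\lambda}{T} G_{iT} + \left(\frac{\lambda}{T}\right)^{2} G_{iT}^{2} - \ldots,$$
where $G_{iT} = A Q_{iT}^{-1}$. Therefore,
$$\hat{\beta}_{SW} = \left\{\sum_{i=1}^{N}\left[Q_{iT} - \left(I_k - \frac{\lambda}{T} G_{iT} + \left(\frac{\lambda}{T}\right)^{2} G_{iT}^{2} - \ldots\right) Q_{iT}\right]\right\}^{-1} \sum_{i=1}^{N}\left[q_{iT} - \left(I_k - \frac{\lambda}{T} G_{iT} + \left(\frac{\lambda}{T}\right)^{2} G_{iT}^{2} - \ldots\right) q_{iT}\right]$$
$$= \left\{\sum_{i=1}^{N} G_{iT} Q_{iT} - \frac{\lambda}{T}\sum_{i=1}^{N} G_{iT}^{2} Q_{iT} + O\left(\frac{\lambda^{2}}{T^{2}}\right)\right\}^{-1} \left\{\sum_{i=1}^{N} G_{iT} q_{iT} - \frac{\lambda}{T}\sum_{i=1}^{N} G_{iT}^{2} q_{iT} + O\left(\frac{\lambda^{2}}{T^{2}}\right)\right\}.$$
Hence, as $\lambda/T \to 0$,
$$\hat{\beta}_{SW} \to \left(\sum_{i=1}^{N} G_{iT} Q_{iT}\right)^{-1} \sum_{i=1}^{N} G_{iT} q_{iT}.$$
But
$$\left(\sum_{i=1}^{N} G_{iT} Q_{iT}\right)^{-1} \sum_{i=1}^{N} G_{iT} q_{iT} = \left(\sum_{i=1}^{N} A Q_{iT}^{-1} Q_{iT}\right)^{-1} \sum_{i=1}^{N} A Q_{iT}^{-1} q_{iT} = \frac{1}{N}\sum_{i=1}^{N} Q_{iT}^{-1} q_{iT} = \frac{1}{N}\sum_{i=1}^{N} \hat{\beta}_i = \hat{\beta}_{MG},$$
so that
$$\lim_{\lambda \to 0} \hat{\beta}_{SW}(\lambda) = \hat{\beta}_{MG},$$
$$\lim_{T \to \infty} \left[\hat{\beta}_{SW}(\lambda) - \hat{\beta}_{MG}\right] = 0.$$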
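The two limits can be checked numerically in the scalar case (k = 1). The following is only an illustrative sketch: it assumes that the sampling variance of each individual estimate is 1/(T Q_i) and that the variance of the random coefficients is 1/(λA), and the function name and inputs are hypothetical rather than taken from the text.

```python
def swamy_scalar(beta_hats, q_list, T, lam, A=1.0):
    """Scalar (k = 1) Swamy-type weighted average of unit-specific estimates.

    Illustrative assumptions: Var(beta_hat_i) = 1 / (T * Q_i) and the
    variance of the random coefficients is 1 / (lam * A).
    """
    omega = 1.0 / (lam * A)
    # Weight of unit i is the inverse of (omega + sampling variance of beta_hat_i).
    weights = [1.0 / (omega + 1.0 / (T * q)) for q in q_list]
    total = sum(weights)
    return sum(w * b for w, b in zip(weights, beta_hats)) / total


beta_hats = [1.0, 2.0, 6.0]
q_list = [1.0, 2.0, 4.0]
mean_group = sum(beta_hats) / len(beta_hats)

# lam -> 0: the weights become equal and the estimate approaches the MG average.
near_mg = swamy_scalar(beta_hats, q_list, T=10, lam=1e-9)
# lam very large: the weights become proportional to Q_i (pooled-type weights).
near_pooled = swamy_scalar(beta_hats, q_list, T=10, lam=1e9)
```

With a very small λ the weighted average is numerically indistinguishable from the simple average of the individual estimates, while a very large λ moves it towards the Q-weighted average.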
$$y_{it} = \alpha_i + \sum_{j=1}^{p} \lambda_{ij}\, y_{i,t-j} + \sum_{j=0}^{q} \delta_{ij}'\, x_{i,t-j} + u_{it}, \quad \text{for } i = 1, 2, \ldots, N, \tag{28.44}$$
where xit is a k-dimensional vector of explanatory variables for group i; α i represent the
fixed-effects; the coefficients of the lagged dependent variables, λij , are scalars; and δ ij are k-
dimensional coefficient vectors. In the following, we assume that the disturbances uit , i =
1, 2, . . . , N; t = 1, 2, . . . , T, are independently distributed across i and t, with zero means, vari-
ances σ 2i , and are distributed independently of the regressors xit .
The error correction representation of the above ARDL model is:
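$$\Delta y_{it} = \alpha_i + \phi_i\, y_{i,t-1} + \beta_i' x_{it} + \sum_{j=1}^{p-1} \lambda^{*}_{ij}\, \Delta y_{i,t-j} + \sum_{j=0}^{q-1} \delta^{*\prime}_{ij}\, \Delta x_{i,t-j} + u_{it}, \tag{28.45}$$
where
$$\phi_i = -\left(1 - \sum_{j=1}^{p} \lambda_{ij}\right), \quad \beta_i = \sum_{j=0}^{q} \delta_{ij},$$
$$\lambda^{*}_{ij} = -\sum_{m=j+1}^{p} \lambda_{im}, \quad j = 1, 2, \ldots, p-1,$$
$$\delta^{*}_{ij} = -\sum_{m=j+1}^{q} \delta_{im}, \quad j = 1, 2, \ldots, q-1.$$
If we stack the time series observations for each group, (28.45) can be written as
$$\Delta y_{i.} = \alpha_i \tau_T + \phi_i\, y_{i.,-1} + X_{i.} \beta_i + \sum_{j=1}^{p-1} \lambda^{*}_{ij}\, \Delta y_{i.,-j} + \sum_{j=0}^{q-1} \Delta X_{i.,-j}\, \delta^{*}_{ij} + u_{i.},$$
for i = 1, 2, . . . , N, where $\tau_T$ is a T × 1 vector of ones, $y_{i.,-j}$ and $X_{i.,-j}$ are j-period lagged values of $y_{i.}$ and $X_{i.}$, $\Delta y_{i.} = y_{i.} - y_{i.,-1}$, $\Delta X_{i.} = X_{i.} - X_{i.,-1}$, and $\Delta y_{i.,-j}$ and $\Delta X_{i.,-j}$ are j-period lagged values of $\Delta y_{i.}$ and $\Delta X_{i.}$.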
If the roots of the polynomial
$$f_i(z) = 1 - \sum_{j=1}^{p} \lambda_{ij} z^{j} = 0,$$
for i = 1, 2, . . . , N, fall outside the unit circle, then the ARDL(p, q, q, . . . , q) model is stable. In
this chapter we maintain this assumption, while the non-stationary case will be discussed in
Chapter 31. This condition ensures that $\phi_i < 0$, and that there exists a long-run relationship
between $y_{it}$ and $x_{it}$ defined by (see Sections 6.5 and 22.2)
$$y_{it} = \theta_i' x_{it} + \eta_{it},$$
for each i = 1, 2, . . . , N, where $\eta_{it}$ is I(0), and $\theta_i$ are the long-run coefficients on $X_{i.}$,
$$\theta_i = -\beta_i / \phi_i.$$
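The mapping from the ARDL coefficients to the error correction and long-run coefficients can be sketched as follows for a single group with a scalar regressor. This is a minimal illustration, not part of the original text: the function name and the returned dictionary keys are hypothetical, and the index range used for the δ* terms follows the summation limits in (28.45), which start at j = 0.

```python
def ardl_to_ecm(lams, deltas):
    """Map ARDL(p, q) coefficients to error correction form (scalar regressor).

    lams   = [lambda_1, ..., lambda_p]   coefficients on lagged y
    deltas = [delta_0, ..., delta_q]     coefficients on current and lagged x
    """
    phi = -(1.0 - sum(lams))            # error correction coefficient
    beta = sum(deltas)                  # coefficient on the level of x
    # lambda*_j = -sum_{m=j+1}^{p} lambda_m, for j = 1, ..., p-1
    lam_star = [-sum(lams[j:]) for j in range(1, len(lams))]
    # delta*_j = -sum_{m=j+1}^{q} delta_m, for j = 0, ..., q-1
    delta_star = [-sum(deltas[j + 1:]) for j in range(len(deltas) - 1)]
    theta = -beta / phi                 # long-run coefficient
    return {"phi": phi, "beta": beta, "lam_star": lam_star,
            "delta_star": delta_star, "theta": theta}


# ARDL(2, 2) example: the autoregressive coefficients sum to 0.7 (stable case).
ecm = ardl_to_ecm([0.5, 0.2], [1.0, 0.3, 0.2])
```

In this example the error correction coefficient is negative, as required for stability, and the long-run coefficient equals the sum of the δ's divided by one minus the sum of the λ's.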
where the slopes, λi and β i , as well as the intercepts, α i , are allowed to vary across cross-
sectional units (groups). Here, for simplicity, xit is a scalar random variable but the analysis can
be extended to the case of more than one regressor. We assume that xit is strictly exogenous. Let
θ i = β i / (1 − λi ) be the long-run coefficient of xit for the ith group and rewrite (28.46) as
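$$\Delta y_{it} = \alpha_i - (1 - \lambda_i)\left(y_{i,t-1} - \theta_i x_{it}\right) + u_{it},$$
or
$$\Delta y_{it} = \alpha_i - \phi_i\left(y_{i,t-1} - \theta_i x_{it}\right) + u_{it},$$
with $\phi_i = 1 - \lambda_i$.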
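$$\phi_i = \phi + \eta_{i1}, \tag{28.47}$$
$$\theta_i = \theta + \eta_{i2}. \tag{28.48}$$
Hence
$$\beta_i = \theta_i \phi_i = \theta\phi + \eta_{i3}, \tag{28.49}$$
where
$$\eta_{i3} = \phi\,\eta_{i2} + \theta\,\eta_{i1} + \eta_{i1}\eta_{i2},$$
and
$$\omega_{33} = Var(\eta_{i3}) = Var(\phi\,\eta_{i2} + \theta\,\eta_{i1} + \eta_{i1}\eta_{i2}).$$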
It is now clear that vit and yi,t−1 are correlated and the FE or RE estimators will not be consistent.
This is not a surprising result in the case where T is small. In Chapter 27 we saw that the FE
(and RE) estimators are inconsistent when T is finite and N large when the slopes λi and β i are
homogeneous, that is, ηi1 = ηi3 = 0. The significant result here is that the inconsistency of the
FE and RE estimators will not disappear even when both T → ∞ and N → ∞, if the slopes λi
and/or β i are heterogeneous across groups. In fact, in the relatively simple case where
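$$\lambda_i = \lambda, \text{ or } \eta_{i1} = 0,$$
$$\beta_i = \beta + \eta_{i3},$$
$$x_{it} = \mu_i(1 - \rho) + \rho\, x_{i,t-1} + \nu_{it},$$
$$|\rho| < 1, \quad E(x_{it}) = \mu_i,$$
$$\nu_{it} \sim IID(0, \tau^2), \tag{28.53}$$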
we have5
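$$\underset{N,T\to\infty}{\text{Plim}}\left(\hat{\lambda}_{FE}\right) - \lambda = \frac{\rho\,(1 - \lambda\rho)\left(1 - \lambda^2\right)\omega_{33}}{\Psi_1}, \tag{28.54}$$
$$\underset{N,T\to\infty}{\text{Plim}}\left(\hat{\beta}_{FE}\right) - \beta = -\frac{\beta\rho^2\left(1 - \lambda^2\right)\omega_{33}}{\Psi_1},$$
where
$$\Psi_1 = \frac{\sigma^2}{\tau^2}\left(1 - \rho^2\right)(1 - \lambda\rho)^2 + \left(1 - \lambda^2\rho^2\right)\omega_{33} + \left(1 - \rho^2\right)\beta^2 > 0,$$
and $\omega_{33} = Var(\eta_{i3}) = Var(\beta_i)$ measures the degree of heterogeneity in $\beta_i$. It is now clear that
when ρ > 0,
$$\text{Plim}\left(\hat{\lambda}_{FE}\right) > \lambda, \quad \text{Plim}\left(\hat{\beta}_{FE}\right) < \beta.$$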
5 It is interesting that when ρ > 0 the heterogeneity bias, given by (28.54), is in the opposite direction to the Nickell
bias defined by (27.3).
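The bias of the FE estimator of the long-run coefficient, $\hat{\theta}_{FE} = \hat{\beta}_{FE} / (1 - \hat{\lambda}_{FE})$, is given by
$$\underset{N,T\to\infty}{\text{Plim}}\left(\hat{\theta}_{FE}\right) = \frac{\theta}{1 - \rho\,\Psi_2},$$
where
$$\Psi_2 = \frac{(1 + \lambda)\,\omega_{33}}{(1 + \rho)\left[\dfrac{\sigma^2}{\tau^2}(1 - \lambda\rho)^2 + \beta^2 + \omega_{33}\right]}.$$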
irrespective of the true value of λ. See Pesaran and Smith (1995) for further details.
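The heterogeneity bias expressions above can be evaluated numerically. The sketch below is an illustration only: the function names and argument names are hypothetical, and it assumes the special case set out around (28.53) with a scalar regressor. It computes the probability limits of the FE estimators from (28.54) and checks that the implied limit of the long-run estimate agrees with the closed-form expression for it.

```python
def fe_plims(lam, beta, rho, omega33, sigma2, tau2):
    """Large (N, T) probability limits of the FE estimators in the case of (28.53)."""
    # Common denominator of the two bias terms in (28.54); positive for |rho| < 1.
    denom = ((sigma2 / tau2) * (1 - rho**2) * (1 - lam * rho)**2
             + (1 - lam**2 * rho**2) * omega33
             + (1 - rho**2) * beta**2)
    plim_lam = lam + rho * (1 - lam * rho) * (1 - lam**2) * omega33 / denom
    plim_beta = beta - beta * rho**2 * (1 - lam**2) * omega33 / denom
    # Implied limit of the long-run estimate beta_FE / (1 - lambda_FE).
    plim_theta = plim_beta / (1 - plim_lam)
    return plim_lam, plim_beta, plim_theta


def plim_theta_closed_form(lam, beta, rho, omega33, sigma2, tau2):
    """Closed-form limit theta / (1 - rho * ratio) of the FE long-run estimate."""
    theta = beta / (1 - lam)
    ratio = (1 + lam) * omega33 / (
        (1 + rho) * ((sigma2 / tau2) * (1 - lam * rho)**2 + beta**2 + omega33))
    return theta / (1 - rho * ratio)


params = dict(lam=0.5, beta=1.0, rho=0.6, omega33=0.25, sigma2=1.0, tau2=1.0)
plim_lam, plim_beta, plim_theta = fe_plims(**params)
closed_form = plim_theta_closed_form(**params)
```

For these illustrative parameter values with ρ > 0, the limit of the FE estimate of λ lies above its true value, the limit of the estimate of β lies below its true value, and the long-run coefficient is overestimated.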
Example 65 The FE ‘static’ private savings regressions reported in Tables 28.1 and 28.2 within
Example 63 are subject to a substantial degree of residual serial correlation, which can lead to incon-
sistent estimates even under slope homogeneity since the wealth variable, W, is in fact constructed
from accumulation of past savings. The presence of residual serial correlation could be due to a host
of factors: omitted variables, neglected slope heterogeneity in the case of serially correlated regres-
sors, and of course neglected dynamics. The diagnostic statistics provided in the second part of Table
28.3, within Example 64, show statistically significant evidence of residual serial correlation in the
case of eight of the twenty countries.6 It is clear that, even when the slope coefficients are allowed to be
estimated freely across countries, residual serial correlation still continues to be a problem, at least in
the case of some, if not all, the countries.7 The usual time series technique for dealing with dynamic
misspecification is to estimate error correction models based on ARDL models. ARDL models have
the advantage that they are robust to integration and cointegration properties of the regressors, and
for sufficiently high lag-orders could be immune to the endogeneity problem, at least as far as the
long-run properties of the model are concerned. In the present application, observations for each
individual country are available for too short a period to estimate even a first-order ARDL model
including all the 10 regressors for each country separately.8 Pooling in the form of FE estimation can
compensate for lack of time series observations but, as shown in the previous example, this can have its
own set of problems. To check the robustness of the ‘static’ FE estimates presented in Table 28.2 to
dynamic misspecification, Haque, Pesaran, and Sharma (2000) estimated the following first-order
dynamic panel data model
6 The diagnostic statistics are computed using the Lagrange multiplier procedure described in Section 5.8, and are valid
irrespective of whether the regressions contain lagged dependent variables, implicitly or explicitly.
7 Under slope homogeneity restrictions, residual serial correlation is a problem for all the countries in the panel.
8 A first-order ARDL model in the private savings rate for each country that contains all ten regressors would involve
estimating twenty-two unknown parameters with only twenty-two time series observations available per country!
The FE estimates computed using all the 21 countries over the period 1972–1993 are given in Table
28.4.9 Clearly, there are significant dynamics, particularly in the relationship between changes in the
government surplus and expenditure variables (SUR, GCUR, and GI) and the private savings rate.
There is also important evidence of cross-sectional variations in the coefficients of wealth, income
and demographic variables (W, YRUS and DEP). However, unlike the static estimates in Table
28.2, the coefficients of GDP growth and the real interest rate are no longer statistically significant.
Overall, this equation presents a substantial improvement over the static FE estimates. In fact, the
estimated standard error of this dynamic regression is 62 percent lower than the standard error of
the FE estimates favoured by Masson, Bayoumi, and Samiei (1998), and reproduced in the first
column of Table 28.1. Using the formula (28.56) the following estimates of the long-run coefficients
are obtained
Variable   Long-run coefficient (t-ratios in parentheses)
SUR        −0.432 (−3.11)
GCUR       −0.398 (−4.65)
GI         −0.202 (−0.91)
GR         −0.004 (−0.03)
RINT        0.154 (1.64)
W           0.224 (4.58) − 0.00057 (−3.77) W̄i
INF         0.248 (3.10)
PCTT        0.136 (4.11)
YRUS        1.384 (2.58) − 0.0047 (−2.92) W̄i
DEP         0.708 (2.19) − 0.0027 (−2.64) W̄i
According to these estimates the long-run coefficients of the SUR and GCUR variables are still sta-
tistically significant, although the coefficient of the SUR variable is now estimated to be much lower
than the estimate based on the static regressions. The long-run coefficients of the GI, GR and RINT
variables are no longer statistically significant. It appears that, in contrast to government consump-
tion expenditures, the effect of changes in government investment expenditures on private savings is
temporary and tends to zero in the long run. The inflation and the terms of trade variables (INF
9 For relatively simple dynamic models where T (= 22) is reasonably large and of the same order of magnitude as N
(= 21), the application of the IV type estimators, discussed in Chapter 27, to a first differenced version of (28.55) does not
seem necessary and can lead to considerable loss of efficiency.
and PCTT) have the expected signs and are also statistically significant. The long-run coefficients of
the remaining variables vary with country-specific average wealth-GDP ratio and when averaged
across countries yield the values of 0.043 [0.026], −0.118 [0.219] and −0.148 [0.125] for W,
YRUS, and DEP variables respectively. The cross-sectional standard errors of these estimates are
given in square brackets. The average estimate of the coefficient of the relative income variable has
the wrong sign, but it is not statistically significant. The average estimates of the other two coefficients
have the expected signs, but are not statistically significant either. It seems that the effects of many of
the regressors considered in the MBS study are not robust to dynamic misspecifications. However, it
would be interesting to examine the consequences of jointly allowing for unrestricted short-run slope
heterogeneity and dynamics.
where xit is a k×1 vector of exogenous variables, and the error term uit is assumed to be indepen-
dently, identically distributed over t with mean zero and variance σ 2i , and is independent across i.
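Let $\psi_i = (\lambda_i, \beta_i')'$. Further assume that $\psi_i$ is independently distributed across i with
$$E(\psi_i) = \psi = (\lambda, \beta')', \tag{28.58}$$
$$E\left[(\psi_i - \psi)(\psi_i - \psi)'\right] = \Omega. \tag{28.59}$$
$$y_{i,t-1} = \sum_{j=0}^{\infty} (\lambda + \eta_{i1})^{j}\, x_{i,t-j-1}'(\beta + \eta_{i2}) + \sum_{j=0}^{\infty} (\lambda + \eta_{i1})^{j}\, u_{i,t-j-1}, \tag{28.61}$$
where $\eta_i = (\eta_{i1}, \eta_{i2}')'$. It follows that $E(\eta_i\, y_{i,t-1}) \neq 0$.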
The violation of the independence between the regressors and the individual effects, ηi ,
implies that the pooled least squares regression of yit on yi,t−1 , and xit will yield inconsistent
estimates of ψ, even for sufficiently large T and N. Pesaran and Smith (1995) have noted that, as
T → ∞, the least squares regression of yit on yi,t−1 and xit yields a consistent estimator of ψ i ,
ψ̂ i . Hence, the authors suggest a MG estimator of ψ by taking the average of ψ̂ i across i,
$$\hat{\psi}_{MG} = \frac{1}{N}\sum_{i=1}^{N} \hat{\psi}_i, \tag{28.62}$$
where
$$\hat{\psi}_i = \left(W_{i.}' W_{i.}\right)^{-1} W_{i.}'\, y_{i.},$$
$W_{i.} = (y_{i.,-1}, X_{i.})$ with $y_{i.,-1} = (y_{i0}, y_{i1}, \ldots, y_{iT-1})'$. The variance of $\hat{\psi}_{MG}$ is consistently estimated by
$$\widehat{Var}\left(\hat{\psi}_{MG}\right) = \frac{1}{N(N-1)}\sum_{i=1}^{N} \left(\hat{\psi}_i - \hat{\psi}_{MG}\right)\left(\hat{\psi}_i - \hat{\psi}_{MG}\right)'.$$
Note that, for finite T, $\hat{\psi}_i$ is a biased estimator of $\psi_i$, with a bias of order 1/T (Hurwicz (1950), Kiviet
and Phillips (1993)). Hsiao, Pesaran, and Tahmiscioglu (1999) have shown that the MG estimator
is asymptotically normal for large N and large T, so long as $\sqrt{N}/T \to 0$ as both N and
T → ∞.
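The MG estimator in (28.62) and its nonparametric variance estimator only require the unit-specific estimates. The following is a minimal sketch using plain Python lists; the function name is hypothetical, and the individual estimates are assumed to have been obtained beforehand from separate least squares regressions for each unit.

```python
def mean_group(psi_hats):
    """Mean group estimate and its covariance matrix from unit-level estimates.

    psi_hats: list of N coefficient vectors (each a list of length m).
    Returns (psi_mg, cov), where cov is the m x m matrix
    sum_i (psi_i - psi_mg)(psi_i - psi_mg)' / (N (N - 1)).
    """
    n = len(psi_hats)
    m = len(psi_hats[0])
    psi_mg = [sum(p[j] for p in psi_hats) / n for j in range(m)]
    cov = [[0.0] * m for _ in range(m)]
    for p in psi_hats:
        dev = [p[j] - psi_mg[j] for j in range(m)]
        for r in range(m):
            for c in range(m):
                cov[r][c] += dev[r] * dev[c] / (n * (n - 1))
    return psi_mg, cov


# Three hypothetical units, each with estimates of (lambda_i, beta_i).
psi_mg, cov = mean_group([[0.5, 1.0], [0.7, 1.2], [0.6, 0.8]])
```

Standard errors of the MG estimates are the square roots of the diagonal elements of the returned matrix.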
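$$E\left(\hat{\psi}_{MG}\right) = \psi + \frac{1}{N}\sum_{i=1}^{N} E\left[\left(W_{i.}' W_{i.}\right)^{-1} W_{i.}'\, u_{i.}\right]. \tag{28.63}$$
It is easy to see that, due to the presence of lagged dependent variables, N → ∞ is not sufficient
for eliminating the second term. One needs large enough T for the bias to disappear. In practice,
when the model contains lagged dependent variables, we have
$$E\left[\left(W_{i.}' W_{i.}\right)^{-1} W_{i.}'\, u_{i.}\right] = \frac{K_{iT}}{T} + O\left(T^{-3/2}\right),$$
where $K_{iT}$ is bounded in T and a function of the unknown underlying parameters. Hence
$$E\left(\hat{\psi}_{MG}\right) = \psi + \frac{1}{T}\left(\frac{1}{N}\sum_{i=1}^{N} K_{iT}\right) + O\left(T^{-3/2}\right).$$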
Pesaran and Zhao (1999) propose a number of bias reduction techniques for the MG estimator
of the long-run coefficients in dynamic models. Estimation of such coefficients poses additional
difficulties, since the nonlinearity of the long-run coefficients in terms of the underlying short-run
parameters is an additional source of bias for the MG estimation of dynamic models. In a set
of Monte Carlo experiments, Hsiao, Pesaran, and Tahmiscioglu (1999) showed that the MG
estimator is unlikely to be a good estimator when either N or T is small.
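$$\hat{\psi}_B = \left\{\sum_{i=1}^{N}\left[\sigma_i^2\left(W_i' W_i\right)^{-1} + \Omega\right]^{-1}\right\}^{-1} \sum_{i=1}^{N}\left[\sigma_i^2\left(W_i' W_i\right)^{-1} + \Omega\right]^{-1} \hat{\psi}_i, \tag{28.64}$$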
where $W_i = (y_{i,-1}, X_i)$ with $y_{i,-1} = (y_{i0}, y_{i1}, \ldots, y_{iT-1})'$. This Bayes estimator is a weighted
average of the least squares estimator of individual units with the weights being inversely propor-
tional to individual variances. When T → ∞, N → ∞, and N/T 3/2 → 0, the Bayes estimator
is asymptotically equivalent to the MG estimator (28.62) (Hsiao, Pesaran, and Tahmiscioglu
(1999)).
In practice, the variance components, $\sigma_i^2$ and $\Omega$, are rarely known. The Monte Carlo studies
conducted by Hsiao, Pesaran, and Tahmiscioglu (1999) show that following the approach of
Lindley and Smith (1972), and assuming that the prior distributions of $\sigma_i^2$ and $\Omega$ are independent
and are distributed as
$$P\left(\Omega^{-1}, \sigma_1^2, \ldots, \sigma_N^2\right) = W\left(\Omega^{-1} \mid (rR)^{-1}, r\right) \prod_{i=1}^{N} \sigma_i^{-2}, \tag{28.65}$$
yields a Bayes estimator almost as good as the Bayes estimator with known $\Omega$ and $\sigma_i^2$, where
W(.) represents the Wishart distribution with scale matrix, rR, and degrees of freedom r.
The Hsiao, Pesaran, and Tahmiscioglu (1999) Bayes estimator is derived under the assumption
that the initial observations $y_{i0}$ are fixed constants. As discussed in Anderson and Hsiao
(1981, 1982), this assumption is clearly unjustifiable for a panel with finite T. However, contrary
to the sampling approach where the correct modelling of initial observations is quite important,
the Hsiao, Pesaran, and Tahmiscioglu (1999) Bayesian approach appears to perform fairly well in
the estimation of the mean coefficients for dynamic random coefficient models as demonstrated
in their Monte Carlo studies.
θi = θ, i = 1, 2, . . . , N.
This estimator, known as the pooled mean group estimator, provides a useful intermediate alter-
native between estimating separate regressions, which allows all coefficients and error variances
to differ across the groups, and standard FE estimators that assume the slope coefficients are the
same across i. Under the above assumptions, the error correction model can be written more
compactly as
$$\Delta y_i = \phi_i\, \xi_i(\theta) + W_i \kappa_i + \varepsilon_i, \tag{28.66}$$
where
There are three issues to be noted in estimating (28.66). First, the regression equations for each
group are nonlinear in φ i and θ . A further complication arises from the cross-equation parameter
restrictions existing by virtue of the long-run homogeneity assumption. Finally, note that the
error variances differ across groups. The log-likelihood function is
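$$\ell_T(\varphi) = -\frac{T}{2}\sum_{i=1}^{N} \ln\left(2\pi\sigma_i^2\right) - \frac{1}{2}\sum_{i=1}^{N} \sigma_i^{-2}\, Q_i, \tag{28.67}$$
where
$$Q_i = \left[\Delta y_i - \phi_i\, \xi_i(\theta)\right]' H_i \left[\Delta y_i - \phi_i\, \xi_i(\theta)\right],$$
$$H_i = I_T - W_i\left(W_i' W_i\right)^{-1} W_i',$$
$$\frac{1}{NT}\sum_{i=1}^{N} \frac{\phi_i^2}{\sigma_i^2}\, X_i' H_i X_i,$$
converges in probability to a fixed positive definite matrix. In the case where the $x_{it}$'s are I(1), the
matrix
$$\frac{1}{NT^2}\sum_{i=1}^{N} \frac{\phi_i^2}{\sigma_i^2}\, X_i' H_i X_i,$$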
converges to a random positive definite matrix with probability 1. These conditions should hold
for all feasible values of φ i and σ 2i as T → ∞ either for a fixed N, or for N → ∞ and T → ∞,
jointly. See Pesaran, Shin, and Smith (1999) for details.
The ML estimates of the long-run coefficients, θ, and the group-specific error-correction coef-
ficients, φ i , can be computed by maximizing (28.67) with respect to ϕ. These ML estimators
are termed pooled mean group (PMG) estimators in order to highlight the pooling effect of the
homogeneity restrictions on the estimates of the long-run coefficients, and the fact that averages
across groups are used to obtain group-wide mean estimates of the error-correction coefficients
and the other short-run parameters of the model.
Pesaran, Shin, and Smith (1999) propose two different likelihood-based algorithms for the
computation of the PMG estimators which are computationally less demanding than estimating
the pooled regression. The first is a ‘back-substitution’ algorithm that only makes use of the first
derivatives of the log-likelihood function
$$\hat{\theta} = -\left\{\sum_{i=1}^{N} \frac{\hat{\phi}_i^2}{\hat{\sigma}_i^2}\, X_i' H_i X_i\right\}^{-1} \left\{\sum_{i=1}^{N} \frac{\hat{\phi}_i}{\hat{\sigma}_i^2}\, X_i' H_i \left(\Delta y_i - \hat{\phi}_i\, y_{i,-1}\right)\right\}, \tag{28.68}$$
$$\hat{\phi}_i = \left(\hat{\xi}_i' H_i \hat{\xi}_i\right)^{-1} \hat{\xi}_i' H_i\, \Delta y_i, \tag{28.69}$$
$$\hat{\sigma}_i^2 = T^{-1}\left(\Delta y_i - \hat{\phi}_i \hat{\xi}_i\right)' H_i \left(\Delta y_i - \hat{\phi}_i \hat{\xi}_i\right), \tag{28.70}$$
where $\hat{\xi}_i = y_{i,-1} - X_i \hat{\theta}$. Starting with an initial estimate of θ, say $\hat{\theta}^{(0)}$, estimates of $\phi_i$ and $\sigma_i^2$ can
be computed using (28.69) and (28.70), which can then be substituted in (28.68) to obtain a
new estimate of θ, say $\hat{\theta}^{(1)}$, and so on until convergence is achieved. Alternatively, the PMG estimators
can be computed using (a variation of) the Newton-Raphson algorithm which makes use
of both the first and the second derivatives. An overview of alternative numerical optimization
techniques is provided in Section A.16 of Appendix A.
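The back-substitution iterations can be sketched in a few lines for the simplest case. The code below is illustrative only and rests on stated assumptions: a single regressor, an ARDL(1,0) specification for every group, and short-run regressors consisting of an intercept only, so that pre-multiplication by H_i reduces to removing group means. The function names, the simulated design and its parameter values are all hypothetical.

```python
import random


def _demean(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def pmg_back_substitution(groups, theta0=0.0, tol=1e-10, max_iter=500):
    """groups: list of (dy, y_lag, x) with equal-length lists for each unit."""
    # With W_i containing only an intercept, H_i is the demeaning operator.
    data = [(_demean(dy), _demean(yl), _demean(x)) for dy, yl, x in groups]
    theta = theta0
    phis, sig2s = [], []
    for _ in range(max_iter):
        num = den = 0.0
        phis, sig2s = [], []
        for dy, yl, x in data:
            xi = [a - theta * b for a, b in zip(yl, x)]
            phi = _dot(xi, dy) / _dot(xi, xi)                     # (28.69)
            resid = [a - phi * b for a, b in zip(dy, xi)]
            sig2 = _dot(resid, resid) / len(dy)                   # (28.70)
            den += phi**2 / sig2 * _dot(x, x)
            num += phi / sig2 * _dot(x, [a - phi * b for a, b in zip(dy, yl)])
            phis.append(phi)
            sig2s.append(sig2)
        theta_new = -num / den                                    # (28.68)
        if abs(theta_new - theta) < tol:
            return theta_new, phis, sig2s
        theta = theta_new
    return theta, phis, sig2s


def simulate_groups(n_groups=5, T=200, theta=2.0, seed=0):
    """Hypothetical design: common long-run slope, group-specific adjustment speeds."""
    rng = random.Random(seed)
    groups = []
    for i in range(n_groups):
        phi = -0.3 - 0.1 * i
        alpha = rng.gauss(0, 1)
        y_prev, dy, yl, xs = 0.0, [], [], []
        for _ in range(T):
            x = rng.gauss(0, 1)
            d = alpha + phi * (y_prev - theta * x) + rng.gauss(0, 0.1)
            dy.append(d)
            yl.append(y_prev)
            xs.append(x)
            y_prev += d
        groups.append((dy, yl, xs))
    return groups


theta_hat, phi_hats, sigma2_hats = pmg_back_substitution(simulate_groups())
```

Each pass updates the group-specific adjustment coefficients and error variances for a given long-run coefficient and then updates the long-run coefficient, so no second derivatives are needed.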
Note that, for small T, the PMG estimator (as well as the group-specific estimator) will be sub-
ject to the familiar downward bias on the coefficient of the lagged dependent variable. Because
the bias is in the same direction for each group, averaging or pooling does not reduce this bias.
Bias corrections are available in the literature (e.g., Kiviet and Phillips (1993)), but these apply to
the short-run coefficients. Because the long-run coefficient is a nonlinear function of the short-
run coefficients, procedures that remove the bias in the short-run coefficients can leave the long-
run coefficient biased. Pesaran and Zhao (1999) discuss how the bias in the long-run coefficients
can be reduced.
Example 66 Continuing from Example 65, Haque, Pesaran, and Sharma (2000) then allowed for
both unrestricted short-run slope heterogeneity and dynamics. To this end, they estimate individ-
ual country regressions containing first-order lagged values of the savings rates, PSAVi,t−1 . The
MG and pooled mean group (PMG) estimates of the long-run coefficients based on these dynamic
individual country regressions are given in Table 28.5. For ease of comparison, the MG estima-
tor based on a static version of these regressions, as well as the corresponding FE estimates, are
reported. Unlike the FE estimates, the consequences of allowing for dynamics on the MG estimates
are rather limited. Once again only the coefficients of the SUR and the GCUR variables are sta-
tistically significant, although the dynamic MG estimates suggest the coefficient of the PCTT vari-
able to be also marginally significant. Finally, the last column of Table 28.5 provides the pooled
mean group estimates of the long-run coefficients, where the short-run dynamics are allowed to
differ freely across countries but equality restrictions are imposed on one or more of the long-run
coefficients; the rationale being that due to differences in factors such as adjustment costs or the
institutional set-up across countries slope homogeneity is more likely to be valid in the long run.
The PMG estimates in Table 28.5 impose the slope homogeneity restrictions only on the long-
run coefficients of the SUR variable. As expected, the PMG estimates are generally more precisely
estimated and confirm that, amongst the various determinants of private savings considered by
MBS, only the effects of the SUR and the GCUR variables seem to be reasonably robust to the
presence of slope heterogeneity and yield plausible estimates for the offsetting effects of govern-
ment budget surpluses and government consumption expenditures on private savings across OECD
countries.
Table 28.5 Private saving equations: fixed-effects, mean group and pooled MG estimates (20 OECD
countries, 1972–1993)
$$H_0: \beta_i = \beta, \text{ for all } i, \quad \|\beta\| < K < \infty, \tag{28.72}$$
One assumption underlying existing tests for slope homogeneity is that, under H1 , the fraction
of the slopes that are not the same does not tend to zero as N → ∞.
where RSSR and USSR are restricted and unrestricted residual sum of squares, respectively,
obtained under the null (β i = β) and the alternative hypotheses. This test is applicable when
N is fixed as T → ∞, and the error variances are homoskedastic, σ 2i = σ 2 . But it is likely to
perform rather poorly in cases where N is relatively large, the regressors contain lagged values of
the dependent variable and/or if the error variances are cross sectionally heteroskedastic.
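$$\hat{\beta}_{MG} = N^{-1}\sum_{i=1}^{N} \hat{\beta}_i, \tag{28.74}$$
where $M_\tau = I_T - \tau_T\left(\tau_T' \tau_T\right)^{-1}\tau_T'$, $\tau_T$ is a T × 1 vector of ones, $I_T$ is an identity matrix of
order T, and
$$\hat{\beta}_i = \left(X_i' M_\tau X_i\right)^{-1} X_i' M_\tau\, y_i. \tag{28.75}$$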
For the Hausman test to have the correct size and be consistent two conditions must be met
(see also Section 26.9.1)
(a) Under H0 , β̂ FE and β̂ MG must both be consistent for β, with β̂ FE being asymptotically
more efficient such that
$$AVar\left(\hat{\beta}_{MG} - \hat{\beta}_{FE}\right) = AVar\left(\hat{\beta}_{MG}\right) - AVar\left(\hat{\beta}_{FE}\right) > 0, \tag{28.76}$$
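$$\beta_i = \beta + v_i, \quad v_i \sim IID(0, \Omega_v),$$
where $\Omega_v \neq 0$ is a non-negative definite matrix, and $E(X_j' v_i) = 0$ for all i and j. Then
$$\hat{\beta}_{FE} - \hat{\beta}_{MG} = \left(\sum_{i=1}^{N} X_i' M_\tau X_i\right)^{-1} \sum_{i=1}^{N} X_i' M_\tau X_i\, v_i - N^{-1}\sum_{i=1}^{N} v_i$$
$$\qquad + \left(\sum_{i=1}^{N} X_i' M_\tau X_i\right)^{-1} \sum_{i=1}^{N} X_i' M_\tau\, \varepsilon_i - N^{-1}\sum_{i=1}^{N} \left(X_i' M_\tau X_i\right)^{-1} X_i' M_\tau\, \varepsilon_i,$$
and it readily follows that, under the random coefficients alternatives and strictly exogenous
regressors, we have $E\left(\hat{\beta}_{FE} - \hat{\beta}_{MG} \mid H_1\right) = 0$. This result holds for N and T fixed as well as
when N and T → ∞, and hence condition (b) of Hausman’s procedure is not satisfied.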
Another important case where the Hausman test does not apply arises when testing the homo-
geneity of slopes in pure autoregressive panel data models. To simplify the exposition, consider
the following stationary AR(1) panel data model
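$$y_{it} = \alpha_i(1 - \beta_i) + \beta_i\, y_{i,t-1} + \varepsilon_{it}, \quad \text{with } |\beta_i| < 1. \tag{28.77}$$
It is now easily seen that with N fixed and as T → ∞, under $H_0$ (where $\beta_i = \beta$) we have
$$\sqrt{NT}\left(\hat{\beta}_{FE} - \beta\right) \to_d N\left(0, 1 - \beta^2\right),$$
and
$$\sqrt{NT}\left(\hat{\beta}_{MG} - \beta\right) \to_d N\left(0, 1 - \beta^2\right).$$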
Hence the variance inequality part of condition (a), namely (28.76), is not satisfied, and the
application of the Hausman test to autoregressive panels will not have the correct size.
where $\hat{\beta} = (\hat{\beta}_1', \hat{\beta}_2', \ldots, \hat{\beta}_N')'$ is an Nk × 1 stacked vector of all the N individual least squares
estimates of $\beta_i$, $\hat{\beta}_{FE}$ is a fixed-effects estimator as before, and $\hat{\Sigma}_g$ is a consistent estimator of $\Sigma_g$,
the asymptotic variance matrix of $\hat{\beta} - \tau_N \otimes \hat{\beta}_{FE}$, under $H_0$. Under standard assumptions for
stationary dynamic models, and assuming $H_0$ holds and N is fixed, then $G \to_d \chi^2(Nk)$ as
T → ∞, so long as $\Sigma_g$ is a non-stochastic positive definite matrix.
As compared to the Hausman test based on β̂ MG − β̂ FE , the G test is likely to be more pow-
erful; but its use will be limited to panel data models where N is small relative to T. Also, the G
test will not be valid in the case of pure dynamic models, very much for the same kind of rea-
sons noted above in relation to the Hausman test based on β̂ MG − β̂ FE . This is easily established
in the case of the stationary first-order autoregressive panel data model considered by Phillips
and Sul (2003). In the case of AR(1) panel regressions with σ 2i = σ 2 , it is easily verified that
under H0
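$$Avar\left[\sqrt{T}\left(\hat{\beta}_i - \hat{\beta}_{FE}\right)\right] = Avar\left[\sqrt{T}\left(\hat{\beta}_i - \beta\right) - \sqrt{T}\left(\hat{\beta}_{FE} - \beta\right)\right] = 1 - \beta^2 - \frac{1 - \beta^2}{N},$$
$$Acov\left[\sqrt{T}\left(\hat{\beta}_i - \hat{\beta}_{FE}\right), \sqrt{T}\left(\hat{\beta}_j - \hat{\beta}_{FE}\right)\right] = -\frac{1 - \beta^2}{N}.$$
Therefore
$$\Sigma_g = \frac{1 - \beta^2}{T}\left(I_N - N^{-1}\tau_N \tau_N'\right).$$
It is now easily seen that rank$(\Sigma_g) = N - 1$, and $\Sigma_g$ is non-invertible.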
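The rank deficiency comes from the matrix I_N − N⁻¹τ_Nτ_N′, which annihilates the vector of ones. A small illustrative check with a hypothetical helper and N = 5, confirming that every row of this matrix sums to zero, so that the matrix is singular:

```python
def centering_matrix(n):
    """Return I_N - N^{-1} tau_N tau_N' as a list of rows."""
    return [[(1.0 if r == c else 0.0) - 1.0 / n for c in range(n)]
            for r in range(n)]


def times_ones(matrix):
    """Post-multiply a matrix by a vector of ones (row sums)."""
    return [sum(row) for row in matrix]


# A non-zero vector mapped to zero means the matrix cannot be inverted.
product = times_ones(centering_matrix(5))
```

Since the vector of ones lies in the null space, the variance matrix has at most N − 1 linearly independent rows, which is why its ordinary inverse does not exist.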
where N is small relative to T, but allows for cross-sectional heteroskedasticity. Swamy’s statistic
applied to the slope coefficients can be written as
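$$\hat{S} = \sum_{i=1}^{N} \left(\hat{\beta}_i - \hat{\beta}_{WFE}\right)' \frac{X_i' M_\tau X_i}{\hat{\sigma}_i^2} \left(\hat{\beta}_i - \hat{\beta}_{WFE}\right), \tag{28.78}$$
where
$$\hat{\sigma}_i^2 = \frac{1}{T - k - 1}\left(y_i - X_i \hat{\beta}_{WFE}\right)' M_\tau \left(y_i - X_i \hat{\beta}_{WFE}\right),$$
and $\hat{\beta}_{WFE}$ is the weighted pooled estimator also computed using $\hat{\sigma}_i^2$, namely
$$\hat{\beta}_{WFE} = \left(\sum_{i=1}^{N} \frac{X_i' M_\tau X_i}{\hat{\sigma}_i^2}\right)^{-1} \sum_{i=1}^{N} \frac{X_i' M_\tau\, y_i}{\hat{\sigma}_i^2}.$$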
In the case where N is fixed and T tends to infinity, under H0 the Swamy statistic, Ŝ, is asymp-
totically chi-square-distributed with k(N − 1) degrees of freedom.
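$$\hat{\Delta} = \sqrt{N}\left(\frac{N^{-1}\hat{S} - k}{\sqrt{2k}}\right), \quad \tilde{\Delta} = \sqrt{N}\left(\frac{N^{-1}\tilde{S} - k}{\sqrt{2k}}\right), \tag{28.79}$$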
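where
$$\tilde{S} = \sum_{i=1}^{N} \left(\hat{\beta}_i - \tilde{\beta}_{WFE}\right)' \frac{X_i' M_\tau X_i}{\tilde{\sigma}_i^2} \left(\hat{\beta}_i - \tilde{\beta}_{WFE}\right), \tag{28.80}$$
$$\tilde{\beta}_{WFE} = \left(\sum_{i=1}^{N} \frac{X_i' M_\tau X_i}{\tilde{\sigma}_i^2}\right)^{-1} \sum_{i=1}^{N} \frac{X_i' M_\tau\, y_i}{\tilde{\sigma}_i^2},$$
and
$$\tilde{\sigma}_i^2 = \frac{1}{T - 1}\left(y_i - X_i \hat{\beta}_{FE}\right)' M_\tau \left(y_i - X_i \hat{\beta}_{FE}\right).$$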
Although the difference between Ŝ and S̃ might appear slight at first, the different choices of
the estimator of σ 2i used in construction of these statistics have important implications for the
properties of the two tests as N and T tends to infinity. To see this let
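$$Q_{iT} = T^{-1} X_i' M_\tau X_i, \tag{28.81}$$
$$Q_{NT} = (NT)^{-1}\sum_{i=1}^{N} X_i' M_\tau X_i, \tag{28.82}$$
$$P_i = M_\tau X_i\left(X_i' M_\tau X_i\right)^{-1} X_i' M_\tau, \tag{28.83}$$
Assumption H.5:
(i) $\varepsilon_{it} \mid X_i \sim IID(0, \sigma_i^2)$, $\sigma_{max}^2 = \max_{1 \le i \le N}(\sigma_i^2) < K$, and $\sigma_{min}^2 = \min_{1 \le i \le N}(\sigma_i^2) > 0$.
(ii) $\varepsilon_{it}$ and $\varepsilon_{js}$ are independently distributed for $i \neq j$ and/or $t \neq s$.
(iii) $E(\varepsilon_{it}^{9} \mid X_i) < K$.
Assumption H.6:
(ii) The k × k pooled observation matrix $Q_{NT}$ defined by (28.82) is positive definite, and
$Q_{NT}$ tends to a non-stochastic positive definite matrix, $Q = \lim_{N\to\infty} N^{-1}\sum_{i=1}^{N} Q_i$,
as $(N, T) \overset{j}{\to} \infty$.
Assumption H.7: There exists a finite $T_0$ such that for $T > T_0$, $E\{[\upsilon_i' M_\tau \upsilon_i/(T-1)]^{-4-\epsilon}\} < K$
and $E\{[\upsilon_i' M_i \upsilon_i/(T-k-1)]^{-4-\epsilon}\} < K$, for each i and for some small positive constant $\epsilon$, where
$\upsilon_i = \varepsilon_i/\sigma_i$.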
Assumption H.8: Under H1 , the fraction of slopes that are not the same does not tend to zero as
N → ∞.
Under Assumptions H.5–H.7 and assuming that H0 (the null of slope homogeneity) holds,
then the dispersion statistics Ŝ and S̃ defined above can be written as
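$$N^{-1/2}\hat{S} = N^{-1/2}\sum_{i=1}^{N} \hat{z}_{iT} + O_p\left(N^{-1/2}\right) + O_p\left(T^{-1/2}\right), \tag{28.85}$$
$$N^{-1/2}\tilde{S} = N^{-1/2}\sum_{i=1}^{N} \tilde{z}_{iT} + O_p\left(N^{-1/2}\right) + O_p\left(T^{-1/2}\right), \tag{28.86}$$
where
$$\hat{z}_{iT} = \frac{(T - k - 1)\,\upsilon_i' P_i \upsilon_i}{\upsilon_i' M_i \upsilon_i}, \quad \text{and} \quad \tilde{z}_{iT} = \frac{(T - 1)\,\upsilon_i' P_i \upsilon_i}{\upsilon_i' M_\tau \upsilon_i}. \tag{28.87}$$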
Under Assumptions H.4–H.7, ẑiT and z̃iT are independently (but not necessarily identically)
distributed random variables across i with finite means and variances, and for all i we have
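Also under the null hypothesis that the slopes are homogeneous, we have
$$\hat{\Delta} \to_d N(0, 1), \text{ as } (N, T) \overset{j}{\to} \infty, \text{ so long as } \sqrt{N}/T \to 0,$$
$$\tilde{\Delta} \to_d N(0, 1), \text{ as } (N, T) \overset{j}{\to} \infty, \text{ so long as } \sqrt{N}/T^2 \to 0,$$
where the standardized dispersion statistics, $\hat{\Delta}$ and $\tilde{\Delta}$, are defined above. Furthermore, if the
errors, $\varepsilon_{it}$, are normally distributed, under $H_0$ we have
$$\hat{\Delta} \to_d N(0, 1), \text{ as } (N, T) \overset{j}{\to} \infty, \text{ so long as } \sqrt{N}/T \to 0,$$
$$\tilde{\Delta} \to_d N(0, 1), \text{ as } (N, T) \overset{j}{\to} \infty.$$
The small sample properties of the dispersion tests can be improved under normally
distributed errors by considering the following mean and variance bias adjusted versions of
$\hat{\Delta}$ and $\tilde{\Delta}$:
$$\tilde{\Delta}_{adj} = \sqrt{\frac{N(T + 1)}{T - k - 1}}\left(\frac{N^{-1}\tilde{S} - k}{\sqrt{2k}}\right), \tag{28.91}$$
$$\hat{\Delta}_{adj} = \sqrt{N}\left(\frac{N^{-1}\hat{S} - E(\hat{z}_{iT})}{\sqrt{Var(\hat{z}_{iT})}}\right),$$
where
$$E(\hat{z}_{iT}) = \frac{k(T - k - 1)}{T - k - 3}, \quad Var(\hat{z}_{iT}) = \frac{2k\,(T - k - 1)^2 (T - 3)}{(T - k - 3)^2 (T - k - 5)}. \tag{28.92}$$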
The Monte Carlo results reported in Pesaran and Yamagata (2008) suggest that the ˜ adj test
works well even if there are major departures from normality, and is to be recommended.
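The steps needed to compute the standardized dispersion statistic based on S̃ and its bias-adjusted version in (28.91) can be sketched as follows. This is an illustration under simplifying assumptions: a balanced panel, a single regressor (k = 1), and group-specific intercepts removed by demeaning. The function names and the simulated design are hypothetical and are not taken from Pesaran and Yamagata (2008).

```python
import math
import random


def _demean(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def delta_tests(panel):
    """panel: list of (y, x) pairs, each a list of length T (balanced, k = 1)."""
    k, n, t = 1, len(panel), len(panel[0][0])
    data = [(_demean(y), _demean(x)) for y, x in panel]
    # Unit-specific least squares slopes and the pooled fixed-effects slope.
    b_i = [_dot(x, y) / _dot(x, x) for y, x in data]
    b_fe = sum(_dot(x, y) for y, x in data) / sum(_dot(x, x) for y, x in data)
    # Error variances estimated from the FE residuals, divided by T - 1.
    s2 = []
    for y, x in data:
        e = [a - b_fe * c for a, c in zip(y, x)]
        s2.append(_dot(e, e) / (t - 1))
    # Weighted fixed-effects slope.
    b_wfe = (sum(_dot(x, y) / s for (y, x), s in zip(data, s2))
             / sum(_dot(x, x) / s for (y, x), s in zip(data, s2)))
    # Dispersion statistic as in (28.80).
    s_tilde = sum((b - b_wfe) ** 2 * _dot(x, x) / s
                  for b, (y, x), s in zip(b_i, data, s2))
    delta = math.sqrt(n) * (s_tilde / n - k) / math.sqrt(2 * k)
    delta_adj = (math.sqrt(n * (t + 1) / (t - k - 1))
                 * (s_tilde / n - k) / math.sqrt(2 * k))
    return delta, delta_adj


def simulate_panel(slopes, t=40, seed=1):
    """Hypothetical design with standard normal regressors and errors."""
    rng = random.Random(seed)
    panel = []
    for b in slopes:
        alpha = rng.gauss(0, 1)
        x = [rng.gauss(0, 1) for _ in range(t)]
        y = [alpha + b * xv + rng.gauss(0, 1) for xv in x]
        panel.append((y, x))
    return panel


delta_h0, delta_adj_h0 = delta_tests(simulate_panel([1.0] * 30))
delta_h1, delta_adj_h1 = delta_tests(simulate_panel([0.0, 2.0] * 15))
```

Under the alternative with slopes alternating between 0 and 2 the statistic is far out in the right tail of the standard normal distribution, whereas with a common slope it is much smaller; the adjusted version rescales the unadjusted statistic by the square root of (T + 1)/(T − k − 1).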
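$$\underset{T\times 1}{y_i} = \alpha_i \tau_T + \underset{T\times k_1}{X_{i1}}\beta_{i1} + \underset{T\times k_2}{X_{i2}}\beta_{i2} + \varepsilon_i, \quad i = 1, 2, \ldots, N,$$
or
$$\underset{T\times 1}{y_i} = \underset{T\times (k_1+1)}{Z_{i1}}\delta_i + \underset{T\times k_2}{X_{i2}}\beta_{i2} + \varepsilon_i,$$
where $Z_{i1} = (\tau_T, X_{i1})$ and $\delta_i = (\alpha_i, \beta_{i1}')'$. Suppose the slope homogeneity hypothesis of interest
is given by
$$H_0: \beta_{i2} = \beta_2, \text{ for } i = 1, 2, \ldots, N. \tag{28.93}$$
$$\tilde{S}_2 = \sum_{i=1}^{N} \left(\hat{\beta}_{i2} - \tilde{\beta}_{2,WFE}\right)' \frac{X_{i2}' M_{i1} X_{i2}}{\tilde{\sigma}_i^2} \left(\hat{\beta}_{i2} - \tilde{\beta}_{2,WFE}\right),$$
where
$$\hat{\beta}_{i2} = \left(X_{i2}' M_{i1} X_{i2}\right)^{-1} X_{i2}' M_{i1}\, y_i,$$
$$\tilde{\beta}_{2,WFE} = \left(\sum_{i=1}^{N} \frac{X_{i2}' M_{i1} X_{i2}}{\tilde{\sigma}_i^2}\right)^{-1} \sum_{i=1}^{N} \frac{X_{i2}' M_{i1}\, y_i}{\tilde{\sigma}_i^2},$$
$$M_{i1} = I_T - Z_{i1}\left(Z_{i1}' Z_{i1}\right)^{-1} Z_{i1}',$$
$$\tilde{\sigma}_i^2 = \frac{\left(y_i - X_{i2}\hat{\beta}_{2,FE}\right)' M_{i1} \left(y_i - X_{i2}\hat{\beta}_{2,FE}\right)}{T - k_1 - 1},$$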
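and
$$\hat{\beta}_{2,FE} = \left(\sum_{i=1}^{N} X_{i2}' M_{i1} X_{i2}\right)^{-1} \sum_{i=1}^{N} X_{i2}' M_{i1}\, y_i.$$
Using a similar line of reasoning as above, it is now easily seen that under $H_0$ defined by (28.93),
and for $(N, T) \overset{j}{\to} \infty$, such that $\sqrt{N}/T^2 \to 0$, then
$$\tilde{\Delta}_2 = \sqrt{N}\left(\frac{N^{-1}\tilde{S}_2 - k_2}{\sqrt{2k_2}}\right) \to_d N(0, 1).$$
In the case of normally distributed errors, the following mean-variance bias adjusted statistic can
be used
$$\tilde{\Delta}_{adj} = \sqrt{\frac{N(T - k_1 + 1)}{T - k - 1}}\left(\frac{N^{-1}\tilde{S}_2 - k_2}{\sqrt{2k_2}}\right).$$
The $\Delta$-tests can also be extended to unbalanced panels. Denoting the number of time series
observations on the ith cross-section by $T_i$, the standardized dispersion statistic is given by
$$\tilde{\Delta} = \frac{1}{\sqrt{N}}\sum_{i=1}^{N} \left(\frac{\tilde{d}_i - k}{\sqrt{2k}}\right), \tag{28.94}$$
where
$$\tilde{d}_i = \left(\hat{\beta}_i - \tilde{\beta}_{WFE}\right)' \frac{X_i' M_{\tau i} X_i}{\tilde{\sigma}_i^2} \left(\hat{\beta}_i - \tilde{\beta}_{WFE}\right),$$
$X_i = (x_{i1}, x_{i2}, \ldots, x_{iT_i})'$, $M_{\tau i} = I_{T_i} - \tau_{T_i}\left(\tau_{T_i}' \tau_{T_i}\right)^{-1}\tau_{T_i}'$ with $\tau_{T_i}$ being a $T_i \times 1$ vector of
unity,
$$\hat{\beta}_i = \left(X_i' M_{\tau i} X_i\right)^{-1} X_i' M_{\tau i}\, y_i, \tag{28.95}$$
$$\tilde{\beta}_{WFE} = \left(\sum_{i=1}^{N} \frac{X_i' M_{\tau i} X_i}{\tilde{\sigma}_i^2}\right)^{-1} \sum_{i=1}^{N} \frac{X_i' M_{\tau i}\, y_i}{\tilde{\sigma}_i^2}, \tag{28.96}$$
$y_i = (y_{i1}, y_{i2}, \ldots, y_{iT_i})'$,
$$\tilde{\sigma}_i^2 = \frac{\left(y_i - X_i \hat{\beta}_{FE}\right)' M_{\tau i} \left(y_i - X_i \hat{\beta}_{FE}\right)}{T_i - 1},$$
and
$$\hat{\beta}_{FE} = \left(\sum_{i=1}^{N} X_i' M_{\tau i} X_i\right)^{-1} \sum_{i=1}^{N} X_i' M_{\tau i}\, y_i. \tag{28.97}$$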
The $\tilde{\Delta}$-test can also be applied to stationary dynamic models. Pesaran and Yamagata (2008)
show that the test will be valid for dynamic panel data models so long as $N/T \to \kappa$, as $(N, T) \overset{j}{\to} \infty$,
where $0 \le \kappa < \infty$. This condition is more restrictive than the one obtained for panels with
exogenous regressors, but is the same as the condition required for the validity of the fixed-effects
estimator of the slope in AR(1) models in large N and T panels.
Using Monte Carlo experiments it is shown that the $\tilde{\Delta}$-test has the correct size and satisfactory
power in panels with strictly exogenous regressors for various combinations of N and T. Similar
results are also obtained for dynamic panels, but only if the autoregressive coefficient is not too
close to unity and so long as T ≥ N. See Pesaran and Yamagata (2008) for further discussion.
$$\mathring{\lambda}_{WFE} = \tilde{\lambda}_{WFE} + \frac{1}{T}\left(1 + \tilde{\lambda}_{WFE}\right), \qquad (28.98)$$
and estimate the associated intercepts as
10 For example, see Beran (1988), Horowitz (1994), Li and Maddala (1996), and Bun (2004), although none of these
authors makes any bias corrections in their bootstrapping procedures.
11 Bias-corrected estimates are also used in the literature on the derivation of the bootstrap confidence intervals to generate the bootstrap samples in dynamic AR(p) models. See Kilian (1998), among others.
12 Bias corrections for the OLS estimates of individual $\lambda_i$ are provided by Kendall (1954) and Marriott and Pope (1954), and further elaborated by Orcutt and Winokur (1969). See also Section 14.5. No bias corrections seem to be available for FE or WFE estimates of AR(p) panel data models in the case of $p \ge 2$.
where $\tilde{S}^{(b)}$ is the modified Swamy statistic, defined by (28.80), computed using the $b$th bootstrapped sample. The statistics $\tilde{\Delta}^{(b)}$ for $b = 1, 2, \ldots, B$, can now be used to obtain the bootstrap p-values
$$p_B = \frac{1}{B}\sum_{b=1}^{B} I\left(\tilde{\Delta}^{(b)} - \tilde{\Delta}\right),$$
where $B$ is the number of bootstrap samples, $I(A)$ takes the value of unity if $A > 0$ or zero otherwise, and $\tilde{\Delta}$ is the standardized dispersion statistic applied to the actual observations. If $p_B < 0.05$, say, the null hypothesis of slope homogeneity is rejected at the 5 per cent significance level.
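As a minimal sketch of the p-value calculation above, assuming the bootstrapped statistics $\tilde{\Delta}^{(b)}$ have already been computed (the function name `bootstrap_p_value` and the numbers used are hypothetical and serve only as an illustration):

```python
import numpy as np

def bootstrap_p_value(delta_actual, delta_boot):
    """p_B = B^{-1} sum_b I(delta^(b) - delta), where I(A) = 1 if A > 0, else 0."""
    delta_boot = np.asarray(delta_boot, dtype=float)
    return float(np.mean(delta_boot - delta_actual > 0))

# Hypothetical values: two of the four bootstrapped statistics exceed 2.0.
p_b = bootstrap_p_value(2.0, [0.5, 2.5, -1.0, 3.0])
print(p_b)  # 0.5
```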
where within each education group $\lambda^{(e)}$ is assumed to be homogeneous across the different individuals. Our interest is to test the hypothesis that $\lambda^{(e)} = \lambda_i^{(e)}$ for all $i$ in $e$.
The test results are given in the first panel of Table 28.6. The $\tilde{\Delta}$ statistics and the associated bootstrapped p-values by education groups all lead to strong rejections of the homogeneity hypothesis. Judging by the size of the $\tilde{\Delta}$ statistics, the rejection is stronger for the pooled sample as compared with the sub-samples, confirming the importance of education as a discriminatory factor in the characterizations of heterogeneity of earnings dynamics across individuals. The test results also indicate the possibility of other statistically significant sources of heterogeneity within each of the education groups, and cast some doubt on the two-step estimation procedure adopted in the literature for dealing with heterogeneity, a point recently emphasized by Browning, Ejrnæs, and Alvarez (2010).
In Table 28.6 we also provide a number of different FE estimates of λ(e) , e = 0, 1, 2, 3, on the
assumption of within group slope homogeneity. Given the relatively small number of time series
observations available (on average 18), the bias corrections to the FE estimates are quite large.
The cross-section error variance heterogeneity also plays an important role in this application,
as can be seen from a comparison of FE and WFE estimates with the latter being larger. Focusing
on the bias-corrected WFE estimates, we also observe that the persistence of earnings dynamics
rises systematically from 0.52 in the case of the school drop outs to 0.72 for the college graduates.
This seems sensible, and partly reflects the more reliable job prospects that are usually open to
individuals with a higher level of education.
The homogeneity test results suggest that further efforts are needed also to take account of within group heterogeneity. One possibility would be to adopt a Bayesian approach, assuming that $\lambda_i^{(e)}$, $i = 1, 2, \ldots, N^{(e)}$, are draws from a common probability distribution and focus attention on the whole posterior density function of the persistence coefficients, rather than the average estimates that tend to divert attention from the heterogeneity problem. Another possibility would be to follow Browning, Ejrnæs, and Alvarez (2010) and consider particular parametric functions, relating $\lambda_i^{(e)}$ to individual characteristics as a way of capturing within group heterogeneity. Finally, one could consider a finer categorization of the individuals in the panel, say by further splitting of the education groups or by introducing new categories such as occupational classifications. The slope homogeneity tests provide an indication of the statistical importance of the heterogeneity problem, but are silent as to how best to deal with the problem.
13 Log real earnings are computed as $w_{it}^{(e)} = \ln\left(LABY_{it}^{(e)}/PCED_t\right)$, where $LABY_{it}^{(e)}$ is earnings in current US dollars, and $PCED_t$ is the personal consumption expenditure deflator, base year 1992.
Table 28.6 Slope homogeneity tests for the AR(1) model of the real earnings equations
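Notes: The FE estimator and the WFE estimator are defined by (28.97), and (28.96), respectively, and their associated standard errors (shown in round brackets) are based on
$$Var\left(\hat{\lambda}_{FE}\right) = \hat{\sigma}^2\left(\sum_{i=1}^{N} y_{i,-1}' M_{\tau i}\, y_{i,-1}\right)^{-1},$$
where
$$\hat{\sigma}^2 = \left(T - N - 1\right)^{-1}\sum_{i=1}^{N}\left(y_i - \hat{\lambda}_{FE}\, y_{i,-1}\right)' M_{\tau i}\left(y_i - \hat{\lambda}_{FE}\, y_{i,-1}\right),$$
$T = \sum_{i=1}^{N} T_i$, and $Var\left(\tilde{\lambda}_{WFE}\right) = \left(\sum_{i=1}^{N} \tilde{\sigma}_i^{-2}\, y_{i,-1}' M_{\tau i}\, y_{i,-1}\right)^{-1}$.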
Bias corrected estimates are based on λ̊WFE = λ̃WFE + (T/N) 1 + λ̃WFE and Var λ̊WFE = T−1 1 − λ̊2WFE .
Bias-corrected bootstrapped tests also use λ̊WFE and the associated estimates to generate bootstrap samples (see Section
28.11.7 for further details).
28.13 Exercises
1. Suppose that
β it = β + ηi + λt , i = 1, 2, . . . , N, t = 1, 2, . . . , T,
with
E(ηi ) = E(λt ) = 0, E ηi λt = 0, (28.99)
E ηi xit = 0, E λt xit = 0,
η , if i = j,
E ηi ηj =
0, if i = j,
, if i = j,
E(λi λj ) =
0, if i = j.
Derive the best linear unbiased estimator of β in the above model, for known values of η ,
and σ 2i .
2. Consider the random coefficient panel data model
where α i are fixed group-specific effects, εit are independently distributed across i and t with
mean zero and the variance σ 2i < ∞, ηi ∼ IID(0, σ 2η ), σ 2η < ∞, and xit , ε it and ηi are
independently distributed for all t, t , and i.
where the error term $u_{it}$ is assumed to be independently, identically distributed over $t$ with mean zero and variance $\sigma_i^2$, is independent across $i$, and $\lambda_i = \lambda + \eta_i$, with $E(\eta_i) = 0$,
E ηi ηj = , if i = j, and 0 otherwise. Find an expression for the bias of the MG estimator
of λ when T is finite (see (28.63)).
where
$\alpha_i = \alpha + \eta_i$, $\eta_i \sim IID(0, \sigma_\eta^2)$,
$|\lambda| < 1$, $\varepsilon_{it}$ are independently distributed across $i$ and $t$ with mean zero and the common variance $\sigma^2$, and $x_{it}$ are strictly exogenous.
where
$$v_i = \frac{\eta_i + \bar{\varepsilon}_i}{1 - \lambda},$$
$\bar{y}_i = T^{-1}\sum_{t=1}^{T} y_{it}$, $\bar{x}_i = T^{-1}\sum_{t=1}^{T} x_{it}$, etc.
(b) Derive the conditions under which the cross-sectional regression of ȳi on x̄i will yield a
consistent estimator of the long-run coefficient, θ = β/(1 − λ). Is it possible also to
obtain consistent estimates of the short-run coefficients, β and λ, from cross-sectional
regressions?
(c) How robust are your results under (b) to possible dynamic misspecification of
(28.102)?
where the error correction coefficients, $\phi_i$, and the long-run coefficients, $\theta_i$, as well as the intercepts, $\alpha_i$, are allowed to vary across the groups, $x_{it}$ is a scalar random variable assumed to be stationary and strictly exogenous, and $\varepsilon_{it} \sim IID(0, \sigma_i^2)$. Assume also that $\phi_i$ and $\theta_i$ are generated according to the following random coefficient model
ϕ i = ϕ + ηi1 ,
θ i = θ + ηi2 ,
where ηi = (ηi1 , ηi2 ) IID(0, ), being a 2×2 nonsingular matrix and ηi are distributed
independently of xjt , for all i, j, and t.
where
Under what conditions (if any) will the fixed-effects estimation of (28.104) yield a con-
sistent estimate of (ϕ, θ )?
(b) Assuming T and N are sufficiently large, how would you estimate ϕ and θ ?
(c) Suppose ηi2 = 0, and slope heterogeneity is confined to the error correction coeffi-
cients, ϕ i . How would you now estimate θ?
29 Cross-Sectional Dependence in Panels
29.1 Introduction
This chapter reviews econometric methods for large linear panel data models subject to error
cross-sectional dependence. Early panel data literature assumed cross-sectionally indepen-
dent errors and homogeneous slopes. Heterogeneity across units was confined to unit-specific
intercepts, treated as fixed or random (see, e.g., the survey by Chamberlain (1984)). Depen-
dence of errors was only considered in spatial models, but not in standard panels. However, with
an increasing availability of data (across countries, regions, or industries), the panel literature
moved from predominantly micro panels, where the cross dimension (N) is large and the time
series dimension (T) is small, to models with both N and T large, and it has been recognized
that, even after conditioning on unit-specific regressors, individual units, in general, need not be
cross-sectionally independent.
Ignoring cross-sectional dependence of errors can have serious consequences, and the pres-
ence of some form of cross-section correlation of errors in panel data applications in economics
is likely to be the rule rather than the exception. Cross correlations of errors could be due to
omitted common effects, spatial effects, or could arise as a result of interactions within socioeco-
nomic networks. Conventional panel estimators such as fixed or random effects can result in mis-
leading inference and even inconsistent estimators, depending on the extent of cross-sectional
dependence and on whether the source generating the cross-sectional dependence (such as an
unobserved common shock) is correlated with regressors (Phillips and Sul (2003), Andrews
(2005), and Sarafidis and Robertson (2009)). Correlation across units in panels may also have
serious drawbacks on commonly used panel unit root tests, since several of the existing tests
assume independence. As a result, when applied to cross-sectionally dependent panels, such unit
root tests can have substantial size distortions (O’Connell (1998)). This potential problem has
recently given major impetus to the research on panel unit root tests that allow for cross unit
correlations. These and other related developments are reviewed in Chapter 31. If, however, the
extent of cross-sectional dependence of errors is sufficiently weak, or limited to a sufficiently
small number of cross-sectional units, then its consequences might be unimportant. Consis-
tency of conventional estimators can be affected only when the factors behind cross-correlations
are themselves correlated with regressors. The problem of testing for the extent of cross-section
correlation of panel residuals and modelling the cross-sectional dependence of errors are there-
fore important issues.
In the case of panel data models where the cross-section dimension is short and the time series
dimension is long, the standard approach to cross-sectional dependence is to consider the equa-
tions from different cross-sectional units as a system of seemingly unrelated regression equations
(SURE), and then estimate it by the generalized least squares techniques (see Chapter 19 and
Zellner (1962)). This approach assumes that the factors generating the cross-sectional depen-
dence are not correlated with the regressors, an assumption which is required for the consistency
of the SURE estimator. Also, if the time series dimension is not sufficiently large, and in particular
if N > T, the SURE approach is not feasible either.
Currently, there are two main strands in the literature for dealing with error cross-sectional
dependence in panels where N is large, namely the spatial econometric and the residual mul-
tifactor approaches. The spatial econometric approach assumes that the structure of cross-
sectional correlation is related to location and distance among units, defined according to a
pre-specified metric given by a ‘connection or spatial’ matrix that characterizes the pattern of
spatial dependence according to pre-specified rules. Hence, cross-sectional correlation is repre-
sented by means of a spatial process, which explicitly relates each unit to its neighbours (see
Whittle (1954), Moran (1948), Cliff and Ord (1973, 1981), Anselin (1988, 2001), Haining
(2003, Chapter 7), and the recent survey by Lee and Yu (2013)). This approach, however, typ-
ically does not allow for slope heterogeneity across the units and requires a priori knowledge of
the weight matrix. Spatial econometric literature is reviewed in Chapter 30.
The residual multifactor approach assumes that the cross dependence can be characterized
by a small number of unobserved common factors, possibly due to economy-wide shocks that
affect all units, albeit with different intensities (see Chapter 19 for an introduction to common
factor models). Geweke (1977) and Sargent and Sims (1977) introduced dynamic factor mod-
els, which have more recently been generalized to allow for weak cross-sectional dependence
by Forni and Lippi (2001), Forni et al. (2000, 2004). This approach does not require any prior
knowledge regarding the ordering of individual cross-sectional units or a weight matrix used in
the spatial econometric literature.
The main focus of this chapter is on estimation and inference in the case of large N and T
panel data models with a common factor error structure. We provide a synthesis of the alter-
native approaches proposed in the literature (such as principal components and common corre-
lated effects approaches), with particular focus on key assumptions and their consequences from
the practitioner’s view point. In particular, we discuss robustness of estimators to cross-sectional
dependence of errors, the consequences of coefficient heterogeneity, panels with strictly or
weakly exogenous regressors, including panels with a lagged dependent variable, and highlight
how to test for residual cross-sectional dependence.
The outline of the chapter is as follows: an overview of the different types of cross-sectional
dependence is provided in Section 29.2. The analysis of cross-sectional dependence using a fac-
tor error structure is presented in Section 29.3. A review of estimation and inference in the case
of large panels with a multifactor error structure and strictly exogenous regressors is provided
in Section 29.4, and its extension to models with lagged dependent variables and weakly exoge-
nous regressors is given in Section 29.5. A review of tests of error cross-sectional dependence
in static and dynamic panels is presented in Section 29.7, and Section 29.8 discusses the application of common correlated effects estimators and tests of error cross-sectional dependence to
unbalanced panels.
Assumption CSD.1: For each $t \in \mathcal{T} \subseteq \mathbb{Z}$, $z_t = (z_{1t}, z_{2t}, \ldots, z_{Nt})'$ has mean $E(z_t) = 0$, and variance $Var(z_t) = \Sigma_t$, where $\Sigma_t$ is an $N \times N$ symmetric, nonnegative definite matrix. The $(i,j)$th element of $\Sigma_t$, denoted by $\sigma_{ij,t}$, is bounded such that $0 < \sigma_{ii,t} \le K$, for $i = 1, 2, \ldots, N$, where $K$ is a finite constant independent of $N$.
Instead of assuming unconditional mean and variances, one could consider conditioning on a given information set, $\Omega_{t-1}$, for $t = 1, 2, \ldots, T$, as done in Chudik, Pesaran, and Tosetti (2011). The assumption of zero means can also be relaxed to $E(z_t) = \mu$ [or $E(z_t \mid \Omega_{t-1}) = \mu_{t-1}$]. The covariance matrix, $\Sigma_t$, fully characterizes cross-sectional correlations of the double index process $\{z_{it}\}$, and this section discusses summary measures based on the elements of $\Sigma_t$ that can be used to characterize the extent of the cross-sectional dependence in $z_t$.
Summary measures of cross-sectional dependence based on $\Sigma_t$ can be constructed in a number of different ways. One possible measure, that has received a great deal of attention in the literature, is the largest eigenvalue of $\Sigma_t$, denoted by $\lambda_1(\Sigma_t)$ (see, e.g., Bai and Silverstein (1998), Hachem et al. (2005) and Yin et al. (1988)). However, the existing work in this area suggests that the estimates of $\lambda_1(\Sigma_t)$ based on sample estimates of $\Sigma_t$ could be very poor when $N$ is large relative to $T$, and consequently using estimates of $\lambda_1(\Sigma_t)$ for the analysis of cross-sectional dependence might be problematic, particularly in cases where $T$ is not sufficiently large relative to $N$. Accordingly, other measures based on matrix norms of $\Sigma_t$ have also been used in the literature. One prominent choice is the absolute column sum matrix norm, defined by $\|\Sigma_t\|_1 = \max_{j \in \{1,2,\ldots,N\}} \sum_{i=1}^{N} |\sigma_{ij,t}|$, which is equal to the absolute row sum matrix norm of $\Sigma_t$, defined by $\|\Sigma_t\|_\infty = \max_{i \in \{1,2,\ldots,N\}} \sum_{j=1}^{N} |\sigma_{ij,t}|$, due to the symmetry of $\Sigma_t$. It is easily seen that $|\lambda_1(\Sigma_t)| \le \sqrt{\|\Sigma_t\|_1 \|\Sigma_t\|_\infty} = \|\Sigma_t\|_1$. See Chudik, Pesaran, and Tosetti (2011). Another possible measure of cross-sectional dependence can be based on the behaviour of (weighted) cross-sectional averages which is often of interest in panel data econometrics, as well as in macroeconomics and finance where the object of the analysis is often the study of aggregates or portfolios of asset returns. In view of this, Bailey, Kapetanios, and Pesaran (2015) and Chudik, Pesaran, and Tosetti (2011) suggest summarizing the extent of cross-sectional dependence based on the behaviour of cross-sectional averages, $\bar{z}_{wt} = \sum_{i=1}^{N} w_{it} z_{it} = w_t' z_t$, at a point in time $t$, for $t \in \mathcal{T}$, where $z_t$ satisfies Assumption CSD.1 and the sequence of weight vectors $w_t$ satisfies the following assumption.
Assumption CSD.2, known in finance as the granularity condition, ensures that the weights
{wit } are not dominated by a few of the cross-sectional units.1 Although we have assumed the
weights to be non-stochastic, this is done for expositional convenience and can be relaxed by
allowing the weights, wt , to be random but distributed independently of zt . Chudik, Pesaran,
and Tosetti (2011) define the concepts of weak and strong cross-sectional dependence based on
the limiting behaviour of z̄wt at a given point in time t ∈ T , as N → ∞.
Definition 29 (Weak and strong cross-sectional dependence) The process {zit } is said to be
cross-sectionally weakly dependent (CWD) at a given point in time t ∈ T , if for any sequence of
weight vectors {wt } satisfying the granularity conditions (29.1)–(29.2) we have
The above concepts can also be defined conditional on a given information set, $\Omega_{t-1}$. The choice of the conditioning set largely depends on the nature of the underlying processes and the purpose of the analysis. For example, in the case of dynamic stationary models, the information set could contain all lagged realizations of the process $\{z_{it}\}$, that is $\Omega_{t-1} = \{z_{t-1}, z_{t-2}, \ldots\}$, whilst for dynamic non-stationary models, such as unit root processes, the information included in $\Omega_{t-1}$ could start from a finite past. The conditioning information set could also contain contemporaneous realizations, which might be useful in applications where a particular unit has a dominant influence on the rest of the units in the system. For further details, see Chudik and Pesaran (2013).
The following proposition establishes the relationship between weak cross-sectional dependence and the asymptotic behaviour of the largest eigenvalue of $\Sigma_t$.
1 Conditions (29.1)–(29.2) imply existence of a finite constant K (which does not depend on i or N) such that
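(ii) The process $\{z_{it}\}$ is CSD at a point in time $t \in \mathcal{T}$, if and only if for any $N$ sufficiently large (and as $N \to \infty$), $N^{-1}\lambda_1(\Sigma_t) \ge K > 0$.
Proof First, suppose $\lambda_1(\Sigma_t)$ is bounded in $N$ or increases at the rate slower than $N$. We have
$$Var(w_t' z_t) = w_t' \Sigma_t w_t \le \left(w_t' w_t\right)\lambda_1(\Sigma_t), \qquad (29.5)$$
and, since $w_t' w_t = O(N^{-1})$ under the granularity conditions (29.1)–(29.2), it follows that
$$\lim_{N\to\infty} Var(w_t' z_t) = 0,$$
namely that $\{z_{it}\}$ is CWD, which proves (i). Proof of (ii) is provided in Chudik, Pesaran, and Tosetti (2011).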
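The inequality in (29.5) and the contrast between weak and strong dependence can be illustrated numerically. The sketch below is only an illustration under assumptions that are not in the text: equal weights $w_{it} = 1/N$, a covariance matrix with geometrically decaying elements $0.5^{|i-j|}$ as the weakly dependent case, and an equicorrelated matrix with correlation 0.5 as the strongly dependent case. The function names are hypothetical.

```python
import numpy as np

def var_and_bound(Sigma):
    """Return Var(w'z) = w' Sigma w and the bound (w'w) * lambda_1(Sigma)."""
    N = Sigma.shape[0]
    w = np.full(N, 1.0 / N)                        # equal weights (assumed)
    var = w @ Sigma @ w
    bound = (w @ w) * np.linalg.eigvalsh(Sigma)[-1]  # eigvalsh sorts ascending
    return var, bound

def weak_cov(N, rho=0.5):
    """Covariances decay geometrically with |i - j| (assumed example)."""
    idx = np.arange(N)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def strong_cov(N, r=0.5):
    """Equicorrelated covariance matrix (assumed example)."""
    return r * np.ones((N, N)) + (1.0 - r) * np.eye(N)

for N in (50, 100, 400):
    v_weak, b_weak = var_and_bound(weak_cov(N))
    v_strong, b_strong = var_and_bound(strong_cov(N))
    print(N, v_weak, b_weak, v_strong, b_strong)
```

In the first case the largest eigenvalue stays bounded and the variance of the average shrinks with $N$; in the second the largest eigenvalue grows at the rate $N$ and the variance of the average stays above 0.5.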
It is often of interest to know not only whether $\bar{z}_{wt}$ converges to its mean, but also the rate at which this convergence (if at all) takes place. To this end, Bailey, Kapetanios, and Pesaran (2015) propose to characterize the degree of cross-sectional dependence by an exponent of cross-sectional dependence defined by the rate of change of $Var(\bar{z}_{wt})$ in terms of $N$. Note that in the case where $z_{it}$ are independently distributed across $i$, we have $Var(\bar{z}_{wt}) = O(N^{-1})$, whereas in the case of strong cross-sectional dependence $Var(\bar{z}_{wt}) \ge K > 0$. There is, however, a range of possibilities in between, where $Var(\bar{z}_{wt})$ decays but at a rate slower than $N^{-1}$. In particular, using a factor framework, Bailey, Kapetanios, and Pesaran (2015) show that in general
where κ i > 0 for i = 0 and 1, are bounded in N, and will be time invariant in the case of sta-
tionary processes. Since the rate at which Var(z̄wt ) tends to zero with N cannot be faster than
N −1 , the range of α identified by Var(z̄wt ) lies in the restricted interval −1 < 2α − 2 ≤ 0 or
1/2 < α ≤ 1. Note that (29.3) holds for all values of α < 1, whereas (29.4) holds only for
α = 1. Hence the process with α < 1 is CWD, and a CSD process has the exponent α = 1.
Bailey, Kapetanios, and Pesaran (2015) show that, under certain conditions on the underlying
factor model, α is identified in the range 1/2 < α ≤ 1, and can be consistently estimated. Alter-
native bias-adjusted estimators of α are proposed and shown by Monte Carlo experiments to
have satisfactory small sample properties.
A particular form of a CWD process arises when pair-wise correlations take non-zero values
only across finite subsets of units that do not spread widely as the sample size increases. As we
shall see in Chapter 30, a similar situation arises in the case of spatial processes, where direct
dependence exists only amongst adjacent observations, and indirect dependence is assumed to
decay with distance.
Since $\lambda_1(\Sigma_t) \le \|\Sigma_t\|_1$, it follows from (29.5) that both the spectral radius and the column norm of the covariance matrix of a CSD process will be increasing at the rate $N$. Similar situations also arise in the case of time series processes with long memory or strong temporal dependence where autocorrelation coefficients are not absolutely summable. Along the cross-sectional dimension, common factor models represent examples of strong cross-sectional dependence.
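As a small numerical check of $\lambda_1(\Sigma_t) \le \|\Sigma_t\|_1$ and of the rate-$N$ growth under strong dependence, consider a one-factor covariance matrix $\Sigma = \gamma\gamma' + I_N$. The loadings (alternating between 0.5 and 1.5), the unit factor and idiosyncratic variances, and the function name are assumptions chosen for this sketch only.

```python
import numpy as np

def lambda1_and_col_norm(N):
    """Largest eigenvalue and absolute column sum norm of gamma gamma' + I_N."""
    gamma = np.where(np.arange(N) % 2 == 0, 0.5, 1.5)  # assumed loadings
    Sigma = np.outer(gamma, gamma) + np.eye(N)
    lam1 = np.linalg.eigvalsh(Sigma)[-1]               # eigenvalues sorted ascending
    col_norm = np.abs(Sigma).sum(axis=0).max()
    return lam1, col_norm

for N in (50, 100, 200):
    lam1, col_norm = lambda1_and_col_norm(N)
    # Both ratios settle at positive constants, so both measures grow at rate N.
    print(N, lam1 / N, col_norm / N)
```

For even $N$ the largest eigenvalue equals $\gamma'\gamma + 1 = 1.25N + 1$ and the column norm equals $1.5N + 1$, so the inequality holds and both quantities rise linearly with $N$.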
$$z_t = \Gamma f_t + e_t, \qquad (29.8)$$
$$e_t = R\varepsilon_t, \qquad (29.9)$$
sum matrix norms (so that the cross-sectional dependence of et is sufficiently weak) and the
factor loadings are such that $\lim_{N\to\infty}\left(N^{-1}\Gamma'\Gamma\right)$ is a full rank matrix.
A leading example of $R$ arises in the context of the first-order spatial autoregressive, SAR(1), model, defined by
$$e_t = \rho W e_t + \Lambda\varepsilon_t, \qquad (29.10)$$
where $\Lambda$ is a diagonal matrix with strictly positive and bounded elements, $0 < \sigma_i < \infty$, $\rho$ is a spatial autoregressive coefficient, and the matrix $W$ is a 'connection' or 'spatial' weight matrix which is taken as given.2 Assuming that $(I_N - \rho W)$ is invertible, we then have $R = (I_N - \rho W)^{-1}\Lambda$. In the spatial literature, $W$ is assumed to have non-negative elements and is typically row-standardized so that $\|W\|_\infty = 1$. Under these assumptions, $|\rho| < 1$ ensures that $|\rho|\,\|W\|_\infty < 1$, and we have
$$\|R\|_\infty \le \|\Lambda\|_\infty \left\|I_N + \rho W + \rho^2 W^2 + \ldots\right\|_\infty \le \|\Lambda\|_\infty\left(1 + |\rho|\,\|W\|_\infty + |\rho|^2\|W\|_\infty^2 + \ldots\right) = \frac{\|\Lambda\|_\infty}{1 - |\rho|\,\|W\|_\infty} < K < \infty,$$
where $\|\Lambda\|_\infty = \max_i(\sigma_i) < \infty$. Similarly, $\|R\|_1 < K < \infty$, if it is further assumed that $|\rho|\,\|W\|_1 < 1$. In general, $R = (I_N - \rho W)^{-1}\Lambda$ has bounded row and column sum matrix norms if $|\rho| < \max\left(1/\|W\|_1, 1/\|W\|_\infty\right)$. In the case where $W$ is a row and column stochastic matrix (often assumed in the spatial literature) this sufficient condition reduces to $|\rho| < 1$, which also ensures the invertibility of $(I_N - \rho W)$. Note that for a doubly stochastic matrix $\rho(W) = \|W\|_1 = \|W\|_\infty = 1$, where $\rho(W)$ is the spectral radius of $W$. It turns out that almost all spatial models analysed in the spatial econometrics literature characterize weak forms of cross-sectional dependence. See Sarafidis and Wansbeek (2012) for further discussion.
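The bound on $\|R\|_\infty$ can be checked numerically. The sketch below assumes a particular weight matrix that is not taken from the text: units arranged on a circle, each with two neighbours receiving weight 0.5, so that $W$ is row and column stochastic. The values of $\rho$ and $\sigma_i$ and all function names are likewise hypothetical.

```python
import numpy as np

def ring_weights(N):
    """Row-standardized W: each unit's two neighbours on a circle get weight 0.5."""
    W = np.zeros((N, N))
    idx = np.arange(N)
    W[idx, (idx - 1) % N] = 0.5
    W[idx, (idx + 1) % N] = 0.5
    return W

def sar_R(W, rho, sigma):
    """R = (I_N - rho W)^{-1} Lambda, with Lambda = diag(sigma)."""
    N = W.shape[0]
    return np.linalg.solve(np.eye(N) - rho * W, np.diag(sigma))

def row_norm(A):
    return np.abs(A).sum(axis=1).max()   # absolute row sum norm

def col_norm(A):
    return np.abs(A).sum(axis=0).max()   # absolute column sum norm

rho = 0.4
for N in (10, 100, 400):
    W = ring_weights(N)
    sigma = np.linspace(0.5, 2.0, N)     # assumed bounded scale parameters
    R = sar_R(W, rho, sigma)
    bound = sigma.max() / (1.0 - abs(rho) * row_norm(W))
    # The row norm of R stays below a bound that does not grow with N.
    print(N, row_norm(R), col_norm(R), bound)
```

Because the bound depends only on $\max_i \sigma_i$, $\rho$ and $\|W\|_\infty$, it is the same for every $N$ in this design, which is the sense in which the SAR(1) errors are weakly cross-sectionally dependent.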
Turning now to the factor representation, to ensure that the factor component of (29.8) represents strong cross-sectional dependence, it is sufficient that the absolute column sum matrix norm $\|\Gamma\|_1 = \max_{j \in \{1,2,\ldots,N\}} \sum_{i=1}^{N} |\gamma_{ij}|$ rises with $N$ at the rate $N$, and $\lim_{N\to\infty}\left(N^{-1}\Gamma'\Gamma\right)$ is a full rank matrix, as noted earlier.
The distinction between weak and strong cross-sectional dependence in terms of factor load-
ings is formalized in the following definition.
$$\lim_{N\to\infty} N^{-1}\sum_{i=1}^{N} |\gamma_{i\ell}| = K > 0. \qquad (29.11)$$
$$\lim_{N\to\infty} \sum_{i=1}^{N} |\gamma_{i\ell}| = K < \infty. \qquad (29.12)$$
2 Spatial econometric models are discussed in Chapter 30. In particular, see Section 30.3.
$$\lim_{N\to\infty} N^{-\alpha}\sum_{i=1}^{N} |\gamma_{i\ell}| = K < \infty, \text{ for } K > 0. \qquad (29.13)$$
Strong and weak factors correspond to the two values of $\alpha = 1$ and $\alpha = 0$, respectively. For any other values of $\alpha \in (0,1)$ the factor $f_{\ell t}$ can be said to be semi-strong or semi-weak. It will prove useful to associate the semi-weak factors with values of $0 < \alpha < 1/2$, and the semi-strong factors with values of $1/2 \le \alpha < 1$. In a multi-factor set up the overall exponent can be defined by $\alpha = \max(\alpha_1, \alpha_2, \ldots, \alpha_m)$.
Example 67 Suppose that $z_{it}$ are generated according to the simple factor model, $z_{it} = \gamma_i f_t + e_{it}$, where $f_t$ is independently distributed of $\gamma_i$, and $e_{it} \sim IID(0, \sigma_i^2)$, for all $i$ and $t$, $\sigma_i^2$ is non-
$$\gamma_i = \mu + v_i, \text{ for } i = 1, 2, \ldots, [N^{\alpha_\gamma}], \qquad (29.14)$$
$$\gamma_i = 0, \text{ for } i = [N^{\alpha_\gamma}] + 1, [N^{\alpha_\gamma}] + 2, \ldots, N, \qquad (29.15)$$
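where (dropping the integer part sign, $[\cdot]$, for further clarity)
$$\bar{\gamma}_N = N^{-1}\sum_{i=1}^{N}\gamma_i = N^{-1}\sum_{i=1}^{N^{\alpha_\gamma}}\gamma_i = \mu N^{\alpha_\gamma - 1} + N^{\alpha_\gamma - 1}\left(\frac{1}{N^{\alpha_\gamma}}\sum_{i=1}^{N^{\alpha_\gamma}} v_i\right),$$
$$\bar{\sigma}_N^2 = N^{-1}\sum_{i=1}^{N}\sigma_i^2 > 0.$$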
we have
3 The assumption of zero loadings for $i > N^{\alpha_\gamma}$ could be relaxed so long as $\sum_{i=[N^{\alpha_\gamma}]+1}^{N} |\gamma_i| = O_p(1)$. But for expositional simplicity we maintain $\gamma_i = 0$, for $i = N^{\alpha_\gamma} + 1, N^{\alpha_\gamma} + 2, \ldots, N$.
$$E\left(\bar{\gamma}_N^2\right) = \left[E\left(\bar{\gamma}_N\right)\right]^2 + Var\left(\bar{\gamma}_N\right) = \mu^2 N^{2(\alpha_\gamma - 1)} + N^{\alpha_\gamma - 2}\sigma_v^2.$$
Thus the exponent of cross-sectional dependence of $z_{it}$, denoted as $\alpha_z$, and the exponent $\alpha_\gamma$ coincide in this example, so long as $\alpha_\gamma > 1/2$. When $\alpha_\gamma = 1/2$, one cannot use $Var(\bar{z}_t)$ to distinguish the factor effects from those of the idiosyncratic terms. Of course, this does not necessarily mean that other more powerful techniques cannot be found to distinguish such weak factor effects from the effects of the idiosyncratic terms. Finally, note also that in this example $\sum_{i=1}^{N}\gamma_i^2 = O_p(N^{\alpha_\gamma})$, and the largest eigenvalue of the $N \times N$ covariance matrix, $Var(z_t)$, also rises at the rate of $N^{\alpha_\gamma}$.
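The last remark of the example can be illustrated with a short simulation. The sketch below draws loadings as in (29.14)–(29.15) and computes $\sum_i \gamma_i^2$ together with the largest eigenvalue of $Var(z_t)$ conditional on the loadings. It assumes, purely for illustration, that $Var(f_t) = 1$, $\sigma_i^2 = 1$ for all $i$, $\mu = 1$, $\sigma_v = 0.5$ and normally distributed $v_i$; the function name is hypothetical.

```python
import numpy as np

def example_67(N, alpha, mu=1.0, sigma_v=0.5, seed=0):
    """Draw loadings as in (29.14)-(29.15); return sum of squares and lambda_1."""
    rng = np.random.default_rng(seed)
    n_nonzero = int(N ** alpha)               # integer part of N^alpha
    gamma = np.zeros(N)
    gamma[:n_nonzero] = mu + sigma_v * rng.standard_normal(n_nonzero)
    # Covariance of z_t given the loadings, assuming Var(f_t) = 1, sigma_i^2 = 1.
    Sigma = np.outer(gamma, gamma) + np.eye(N)
    lam1 = np.linalg.eigvalsh(Sigma)[-1]      # eigenvalues sorted ascending
    return np.sum(gamma ** 2), lam1

for N in (100, 400, 900):
    sum_sq, lam1 = example_67(N, alpha=0.75)
    # Both ratios should remain of order one as N grows.
    print(N, sum_sq / N ** 0.75, lam1 / N ** 0.75)
```

Under these assumptions the largest eigenvalue equals $\sum_i \gamma_i^2 + 1$ exactly, so the two quantities necessarily grow at the same rate.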
The relationship between the notions of CSD and CWD and the definitions of weak and strong factors are explored in the following theorem.
Theorem 47 Consider the factor model (29.8) and suppose that Assumptions CF.1-CF.2 hold, and there exists a positive constant $\alpha = \max(\alpha_1, \alpha_2, \ldots, \alpha_m)$ in the range $0 \le \alpha \le 1$, such that condition (29.13) is met for any $\ell = 1, 2, \ldots, m$. Then the following statements hold:
(i) The process $\{z_{it}\}$ is cross-sectionally weakly dependent at a given point in time $t \in \mathcal{T}$, if $\alpha < 1$, which includes cases of weak, semi-weak or semi-strong factors, $f_{\ell t}$, for $\ell = 1, 2, \ldots, m$.
(ii) The process $\{z_{it}\}$ is cross-sectionally strongly dependent at a given point in time $t \in \mathcal{T}$, if and only if there exists at least one strong factor.
$$z_{it} = \sum_{j=1}^{N}\gamma_{ij} f_{jt} + \varepsilon_{it}, \text{ for } i = 1, 2, \ldots, N,$$
where $\varepsilon_{it}$ is independently distributed across $i$. Under this formulation, to ensure that the variance of $z_{it}$ is bounded in $N$, we also require that
$$\sum_{\ell=1}^{N} |\gamma_{i\ell}| \le K < \infty, \text{ for } i = 1, 2, \ldots, N. \qquad (29.19)$$
where
$$z_{it}^{s} = \sum_{\ell=1}^{m}\gamma_{i\ell} f_{\ell t}; \quad z_{it}^{w} = \sum_{\ell=m+1}^{N}\gamma_{i\ell} f_{\ell t} + \varepsilon_{it}, \qquad (29.21)$$
and $\gamma_{i\ell}$ satisfy conditions (29.11) for $\ell = 1, \ldots, m$, where $m$ must be finite in view of the absolute summability condition (29.19) that ensures finite variances. Remaining loadings $\gamma_{i\ell}$ for $\ell = m+1, m+2, \ldots, N$ must satisfy either (29.12) or (29.13) for some $\alpha < 1$.4 In the light of Theorem 47, it can be shown that $z_{it}^{s}$ is CSD and $z_{it}^{w}$ is CWD. Also, notice that when $z_{it}$ is CWD, we have a model with no strong factors and potentially an infinite number of weak or semi-strong factors. Seen from this perspective, spatial models considered in the literature can be viewed as an $N$ weak factor model.
Consistent estimation of factor models with weak or semi-strong factors may be problematic,
as evident from the following example.
Example 68 Consider the single factor model with known factor loadings
$$z_{it} = \gamma_i f_t + \varepsilon_{it}, \quad \varepsilon_{it} \sim IID\left(0, \sigma^2\right).$$
The least squares estimator of $f_t$, which is the best linear unbiased estimator, is given by
$$\hat{f}_t = \frac{\sum_{i=1}^{N}\gamma_i z_{it}}{\sum_{i=1}^{N}\gamma_i^2}, \quad Var\left(\hat{f}_t\right) = \frac{\sigma^2}{\sum_{i=1}^{N}\gamma_i^2}.$$
In the weak factor case where $\sum_{i=1}^{N}\gamma_i^2$ is bounded in $N$, then $Var(\hat{f}_t)$ does not vanish as $N \to \infty$, and $\hat{f}_t$ need not be a consistent estimator of $f_t$. See also Onatski (2012).
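The point of Example 68 can be seen by evaluating $Var(\hat{f}_t)$ for two sets of loadings. The loadings used below are assumptions chosen for illustration: $\gamma_i = 1$ for a strong factor, and $\gamma_i = 1/i$ for a weak factor whose squared loadings are summable. The function names are hypothetical.

```python
import numpy as np

def f_hat(gamma, z_t):
    """Least squares estimator of f_t for known loadings gamma."""
    return gamma @ z_t / np.sum(gamma ** 2)

def var_f_hat(gamma, sigma2=1.0):
    """Var(f_hat_t) = sigma^2 / sum_i gamma_i^2."""
    return sigma2 / np.sum(gamma ** 2)

for N in (10, 100, 1000):
    strong = np.ones(N)                    # sum of squares equals N
    weak = 1.0 / np.arange(1, N + 1)       # sum of squares bounded by pi^2 / 6
    print(N, var_f_hat(strong), var_f_hat(weak))
```

With the strong loadings the variance is $1/N$ and vanishes as $N$ grows; with the weak loadings it stays above $6/\pi^2 \approx 0.61$ however large $N$ becomes.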
The presence of weak or semi-strong factors in errors does not affect consistency of conven-
tional panel data estimators, but affects inference, as is evident from the following example.
where
xit = δ i ft + vit .
To simplify the exposition we assume that $\varepsilon_{it}$, $v_{js}$ and $f_{t'}$ are independently and identically distributed across all $i$, $j$, $t$, $s$ and $t'$, as $\varepsilon_{it} \sim IID(0, \sigma_\varepsilon^2)$, $v_{it} \sim IID(0, \sigma_v^2)$, and $f_t \sim IID(0, 1)$. The pooled estimator of $\beta$ satisfies
4 Note that the number of factors with α > 0 is limited by the absolute summability condition (29.19).
$$\sqrt{NT}\left(\hat{\beta}_P - \beta\right) = \frac{\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{T} x_{it}u_{it}}{\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} x_{it}^2}, \qquad (29.22)$$
where the denominator converges in probability to $\sigma_v^2 + \lim_{N\to\infty} N^{-1}\sum_{i=1}^{N}\delta_i^2 > 0$, while the numerator can be expressed, after substituting for $x_{it}$ and $u_{it}$, as
$$\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{T} x_{it}u_{it} = \frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{T}\gamma_i\delta_i f_t^2 + \frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{T}\left(\delta_i f_t\varepsilon_{it} + \gamma_i v_{it} f_t + v_{it}\varepsilon_{it}\right). \qquad (29.23)$$
Under the above assumptions it is now easily seen that the second term in the above expression is $O_p(1)$, but the first term can be written as
$$\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{T}\gamma_i\delta_i f_t^2 = \frac{1}{\sqrt{N}}\sum_{i=1}^{N}\gamma_i\delta_i \cdot \frac{1}{\sqrt{T}}\sum_{t=1}^{T} f_t^2 = \frac{1}{\sqrt{N}}\sum_{i=1}^{N}\gamma_i\delta_i \cdot O_p\left(T^{1/2}\right).$$
Suppose now that $f_t$ is a factor such that loadings $\gamma_i$ and $\delta_i$ are given by (29.14)–(29.15) with the exponents $\alpha_\gamma$ and $\alpha_\delta$ ($0 \le \alpha_\gamma, \alpha_\delta \le 1$), respectively, and let $\alpha = \min(\alpha_\gamma, \alpha_\delta)$. It then follows that $\sum_{i=1}^{N}\gamma_i\delta_i = O_p(N^{\alpha})$, and
$$\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{T}\gamma_i\delta_i f_t^2 = O_p\left(N^{\alpha - 1/2}T^{1/2}\right).$$
Therefore, even if $\alpha < 1$ the first term in (29.23) diverges, and overall we have $\hat{\beta}_P - \beta = O_p(N^{\alpha - 1}) + O_p(T^{-1/2}N^{-1/2})$. It is now clear that even if $f_t$ is not a strong factor, the rate of convergence of $\hat{\beta}_P$ and its asymptotic variance will still be affected by the factor structure of the error term. In the case where $\alpha = 0$, and the errors are spatially dependent, the variance matrix of the pooled estimator also depends on the nature of the spatial dependence which must be taken into account when carrying out inference on $\beta$. See Pesaran and Tosetti (2011) for further results and discussions. See also Section 30.7.
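A small Monte Carlo sketch may help to fix ideas about the order $O_p(N^{\alpha-1})$ of the leading term. The design below is an assumption made for illustration and is not the specification used in the text: the model is taken to be $y_{it} = \beta x_{it} + u_{it}$ with $u_{it} = \gamma_i f_t + \varepsilon_{it}$ and no intercepts, the non-zero loadings are fixed at one ($\mu = 1$ and $v_i = 0$ in (29.14)), $\gamma_i = \delta_i$, and all variances are set to one. The function name `pooled_bias` is hypothetical.

```python
import numpy as np

def pooled_bias(N, T, alpha, beta=1.0, seed=0):
    """Return beta_hat_P - beta in one simulated panel (illustrative design)."""
    rng = np.random.default_rng(seed)
    n_nonzero = int(N ** alpha)            # number of units loading on the factor
    gamma = np.zeros(N)
    gamma[:n_nonzero] = 1.0                # loadings in the error (assumed)
    delta = gamma.copy()                   # loadings in the regressor (assumed)
    f = rng.standard_normal(T)
    v = rng.standard_normal((N, T))
    eps = rng.standard_normal((N, T))
    x = np.outer(delta, f) + v             # x_it = delta_i f_t + v_it
    u = np.outer(gamma, f) + eps           # u_it = gamma_i f_t + eps_it
    y = beta * x + u
    beta_p = np.sum(x * y) / np.sum(x * x)  # pooled OLS without intercepts
    return beta_p - beta

bias_strong = pooled_bias(N=200, T=400, alpha=1.0)  # strong factor
bias_semi = pooled_bias(N=200, T=400, alpha=0.5)    # weaker factor
print(bias_strong, bias_semi)
```

With $\alpha = 1$ the estimation error does not shrink with $N$ (in this design it is close to 0.5), whereas with $\alpha = 1/2$ it is much smaller, in line with the $N^{\alpha - 1}$ term.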
Weak, strong and semi-strong common factors may be used to represent very general forms
of cross-sectional dependence. For example, as we will see in Chapter 30, a factor process with
an infinite number of weak factors, and no idiosyncratic errors can be used to represent spatial
processes. In particular, the spatial model (29.9) can be represented by $e_{it} = \sum_{j=1}^{N}\gamma_{ij} f_{jt}$, where $\gamma_{ij} = r_{ij}$ and $f_{jt} = \varepsilon_{jt}$. Strong factors can be used to represent the effect of the cross-sectional
units that are “dominant” or pervasive, in the sense that they impact all the other units in the
sample and their effect does not vanish as N tends to infinity (Chudik and Pesaran (2013)).
As outlined in Example 70 below, a large city may play a dominant role in determining house
prices nationally (Holly, Pesaran, and Yamagata (2011)). Semi-strong factors may exist if there
is a cross-sectional unit or an unobserved common factor that affects only a subset of the units and the number of affected units rises more slowly than the total number of units. Estimates of the exponent of cross-sectional dependence reported by Bailey, Kapetanios, and Pesaran (2015) suggest that for typical large macroeconomic data sets the estimates of $\alpha$ fall in the range of 0.77–0.92, which falls short of the value of 1 assumed in the factor literature. For cross-country quarterly real GDP growth, inflation and real equity prices the estimates of $\alpha$ are much closer to unity and tend to be around 0.97.
Example 70 (The diffusion of UK house prices) Holly, Pesaran, and Yamagata (2011) study the diffusion of (log) quarterly house prices, $p_{it}$, over time and across London and 11 UK regions in the years from 1973q4 to 2008q2. The authors assume that one of the regions, the London area, denoted as region 0, is dominant in the sense that shocks to it propagate to other regions simultaneously and over time. Conversely, shocks to the remaining regions are assumed to have little immediate impact on region 0, although there may be some lagged effects of shocks from the other regions onto region 0. Hence, the following first-order linear error correction specification is adopted for region 0:
$$\Delta p_{0t} = \phi_{0s}\left(p_{0,t-1} - \bar{p}^{s}_{0,t-1}\right) + a_0 + a_{01}\Delta p_{0,t-1} + b_{01}\Delta\bar{p}^{s}_{0,t-1} + \varepsilon_{0t}. \qquad (29.24)$$
In the above equations, $\bar{p}^{s}_{0t}$ and $\bar{p}^{s}_{it}$ are the spatial lags of prices defined as
$$\bar{p}^{s}_{0t} = \sum_{j=1}^{N} s_{0j}\,\bar{p}^{s}_{j,t-1}, \qquad \bar{p}^{s}_{it} = \sum_{j=0,\, j\neq i}^{N} s_{ij}\,\bar{p}^{s}_{j,t-1},$$
where $s_{ij} = 1/n_i$ if $i$ and $j$ share a border and zero otherwise, with $n_i$ being the number of neighbours of region $i$. From the above specifications, London prices are assumed to be cointegrated with average prices in the neighbourhood of London. At the same time, prices in other regions are allowed to cointegrate with London as well as with their neighbouring regions. The assumption that $p_{0t}$ is weakly exogenous in the equations for $p_{it}$, $i = 1, 2, \ldots, N$, can be tested using the procedure advanced by Wu (1973). OLS estimates of equations (29.24)–(29.25) and the Wu (1973) statistic are reported in Table 29.1. The error correction term measured relative to London is statistically significant in five regions (East Anglia, East Midlands, West Midlands, South West, and North), while the error correction term measured relative to neighbouring regions is statistically significant only in the price equation for Scotland. The estimates show a considerable degree of heterogeneity in lag lengths and short-term dynamics. Surprisingly, the own lag effect ($a_{i1}$) is rather weak and generally statistically insignificant, except for the region 'North'. Finally, the contemporaneous effect of London house prices ($c_{i0}$) is sizeable and statistically significant in all regions. Figure 29.1 shows the effect of a unit shock to London house prices on London over time, compared to the impact effects of the same shock on regions ordered by their distance from London, for different horizons, $h = 0, 1, \ldots, 11$. This figure clearly shows the levelling off of the effect of shocks over time and across regions, indicating that the decay along the geographical dimension
Table 29.1 Error correction coefficients in cointegrating bivariate VAR(4) of log of real house prices in London and other UK regions (1974q4-2008q2)
Notes: This table reports estimates based on the price equations
$$\Delta p_{it} = \phi_{is}\left(p_{i,t-1} - \bar{p}^{s}_{i,t-1}\right) + \phi_{i0}\left(p_{i,t-1} - p_{0,t-1}\right) + \sum_{\ell=1}^{k_{ia}} a_{i\ell}\,\Delta p_{i,t-\ell} + \sum_{\ell=1}^{k_{ib}} b_{i\ell}\,\Delta\bar{p}^{s}_{i,t-\ell} + \sum_{\ell=1}^{k_{ic}} c_{i\ell}\,\Delta p_{0,t-\ell} + c_{i0}\,\Delta p_{0t} + \varepsilon_{it},$$
for $i = 1, 2, \ldots, N$. For $i = 0$, denoting the London equation, we have the additional a priori restrictions $\phi_{00} = c_{00} = 0$. 'EC1', 'EC2', 'Own lag effects', 'Neighbour lag effects', 'London lag effects', and 'London contemporaneous effects' relate to the estimates of $\phi_{i0}$, $\phi_{is}$, $\sum_{\ell=1}^{k_{ia}} a_{i\ell}$, $\sum_{\ell=1}^{k_{ib}} b_{i\ell}$, $\sum_{\ell=1}^{k_{ic}} c_{i\ell}$, and $c_{i0}$, respectively. t-ratios are shown in parentheses. *** signifies that the test rejects the null at the 1% level, ** at the 5% level, and * at the 10% level. The error correction coefficients ($\phi_{is}$ and $\phi_{i0}$) are restricted such that at least one of them is statistically significant at the 5% level. Wu-Hausman is the t-ratio for testing $H_0: \lambda_i = 0$ in the augmented regression
$$\Delta p_{it} = \phi_{is}\left(p_{i,t-1} - \bar{p}^{s}_{i,t-1}\right) + \phi_{i0}\left(p_{i,t-1} - p_{0,t-1}\right) + \sum_{\ell=1}^{k_{ia}} a_{i\ell}\,\Delta p_{i,t-\ell} + \sum_{\ell=1}^{k_{ib}} b_{i\ell}\,\Delta\bar{p}^{s}_{i,t-\ell} + \sum_{\ell=0}^{k_{ic}} c_{i\ell}\,\Delta p_{0,t-\ell} + \lambda_i\hat{\varepsilon}_{0t} + \varepsilon_{it},$$
where $\hat{\varepsilon}_{0t}$ is the residual of the London house price equation, and the error correction coefficients are restricted as described above. In selecting the lag-orders $k_{ia}$, $k_{ib}$, and $k_{ic}$, the maximum lag-order is set to 4. All regressions include an intercept term.
Figure 29.1 GIRFs of one unit shock (+ s.e.) to London on house price changes over time and across regions. [Plot not reproduced. Vertical axis: GIRF, 0.000 to 0.030; horizontal axis: horizon in quarters, 0 to 11; series: L, OM, OSE, EA, EM, WM, SW, W, YW, NW, N, S.]
Notes: Broken lines are bootstrap 90% confidence bands of the GIRFs for the regions, based on 10,000 bootstrap samples.
seems to be slower as compared with the decay along the time dimension. The effects of a shock to London on itself die away and are largely dissipated after two years. By contrast, the effects of the same shock on other regions take much longer to dissipate, the further the region is from London. This finding is in line with other empirical evidence on the rate of spatial as compared to temporal decay discussed in Whittle (1954). For further details see Holly, Pesaran, and Yamagata (2011).
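The row-normalised weights $s_{ij} = 1/n_i$ used in Example 70 can be sketched in a few lines. This is only an illustration: the four-region border pattern and the price vector below are made up, the function name is hypothetical, and the spatial lag is taken here to be the simple average of neighbouring regions' prices.

```python
import numpy as np

def row_normalised_weights(borders):
    """Return S with s_ij = 1/n_i if regions i and j share a border, else 0."""
    A = np.array(borders, dtype=float)   # copy, so the input is not modified
    np.fill_diagonal(A, 0.0)             # a region is not its own neighbour
    n = A.sum(axis=1, keepdims=True)     # n_i = number of neighbours of region i
    return np.divide(A, n, out=np.zeros_like(A), where=n > 0)

# Hypothetical contiguity pattern for 4 regions (1 = shared border)
borders = np.array([[0, 1, 1, 0],
                    [1, 0, 1, 1],
                    [1, 1, 0, 0],
                    [0, 1, 0, 0]])
S = row_normalised_weights(borders)

# Made-up log prices for the 4 regions at one date
p_t = np.array([5.0, 4.0, 3.0, 2.0])
spatial_lag = S @ p_t                    # neighbour-average price for each region
```

Each row of `S` sums to one, so `spatial_lag[i]` is an equally weighted average over the neighbours of region `i`.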
where f̂t is an m × 1 vector of principal components of the residuals computed in the first stage.
The resultant estimator of β is consistent for N and T large, so long as ft and the regressors, xit ,
are uncorrelated. However, if the factors and the regressors are correlated, as is likely to be the
case in practice, the two-stage estimator becomes inconsistent (Pesaran (2006)).
5 Pooled estimation is carried out assuming that $\beta_i = \beta$ for all $i$, whilst mean group estimation allows for slope heterogeneity and estimates $\beta$ by the average of the individual estimates of $\beta_i$. See Chapter 28.
6 Tests of slope homogeneity hypothesis in static and dynamic panels are discussed in Pesaran and Yamagata (2008)
and in Section 28.11.
Building on Coakley, Fuertes, and Smith (2002), Bai (2009) has proposed an iterative method
which consists of alternating the PC method applied to OLS residuals and the least squares esti-
mation of (29.28), until convergence. In particular, to simplify the exposition suppose α i = 0.
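Then the least squares estimator of $\beta$ and $F$ is the solution of the following set of nonlinear equations
$$\hat{\beta}_{PC} = \left(\sum_{i=1}^{N} X_i' M_{\hat{F}} X_i\right)^{-1}\sum_{i=1}^{N} X_i' M_{\hat{F}}\, y_i,$$
$$\frac{1}{NT}\sum_{i=1}^{N}\left(y_i - X_i\hat{\beta}_{PC}\right)\left(y_i - X_i\hat{\beta}_{PC}\right)'\hat{F} = \hat{F}\hat{V},$$
where $X_i = (x_{i1}, x_{i2}, \ldots, x_{iT})'$ is the matrix of observations on $x_{it}$, $y_i = (y_{i1}, y_{i2}, \ldots, y_{iT})'$ is the vector of observations on $y_{it}$, $M_{\hat{F}} = I_T - \hat{F}(\hat{F}'\hat{F})^{-1}\hat{F}'$, $\hat{F} = (\hat{f}_1, \hat{f}_2, \ldots, \hat{f}_T)'$, and $\hat{V}$ is a diagonal matrix with the $m$ largest eigenvalues of the matrix $(NT)^{-1}\sum_{i=1}^{N}(y_i - X_i\hat{\beta}_{PC})(y_i - X_i\hat{\beta}_{PC})'$ arranged in decreasing order. The solution $\hat{\beta}_{PC}$, $\hat{F}$ and $\hat{\gamma}_i = (\hat{F}'\hat{F})^{-1}\hat{F}'(y_i - X_i\hat{\beta}_{PC})$ minimizes the sum of squared residuals function,
$$SSR_{NT}\left(\beta, \{\gamma_i\}_{i=1}^{N}, \{f_t\}_{t=1}^{T}\right) = \sum_{i=1}^{N}\left(y_i - X_i\beta - F\gamma_i\right)'\left(y_i - X_i\beta - F\gamma_i\right).$$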
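The alternation between principal components and least squares described above can be sketched as follows. This is a minimal illustration rather than Bai's own implementation: the function name, the pooled OLS starting value, the normalisation $\hat{F}'\hat{F}/T = I_m$, and the stopping rule are all assumptions made here, and the number of factors `m` is taken as known.

```python
import numpy as np

def iterative_pc(y, X, m, max_iter=500, tol=1e-9):
    """y: (N, T) array; X: (N, T, k) array; m: assumed number of factors.
    Returns (beta, F), with F a T x m matrix normalised so that F'F/T = I_m."""
    N, T, k = X.shape
    # Starting value (an assumption): pooled OLS that ignores the factors
    beta = np.linalg.lstsq(X.reshape(N * T, k), y.reshape(N * T), rcond=None)[0]
    F = None
    for _ in range(max_iter):
        # Step 1: principal components of the residuals y_i - X_i beta
        U = y - X @ beta                            # (N, T), row i is u_i'
        S = U.T @ U / (N * T)                       # (NT)^{-1} sum_i u_i u_i'
        _, eigvec = np.linalg.eigh(S)               # eigenvalues in ascending order
        F = np.sqrt(T) * eigvec[:, -m:][:, ::-1]    # m leading eigenvectors
        # Step 2: least squares after projecting out F
        M = np.eye(T) - F @ F.T / T                 # M_F, using F'F = T I_m
        A = np.einsum('itk,ts,isl->kl', X, M, X)    # sum_i X_i' M_F X_i
        b = np.einsum('itk,ts,is->k', X, M, y)      # sum_i X_i' M_F y_i
        beta_new = np.linalg.solve(A, b)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, F
```

Given `y` of shape `(N, T)` and `X` of shape `(N, T, k)`, a call such as `beta, F = iterative_pc(y, X, m=1)` returns the slope estimate and the estimated factors at the last iteration.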
$$\beta_i = \beta + \upsilon_i, \qquad \upsilon_i \sim IID(0, \Omega_{\upsilon}) \quad \text{for } i = 1, 2, \ldots, N,$$
where the deviations, $\upsilon_i$, are distributed independently of $e_{jt}$, $x_{jt}$, and $d_t$, for all $i$, $j$ and $t$. To allow for possible dependence of the regressors and the factors, the following model for the individual-specific regressors in (29.26) is adopted
$$x_{it} = A_i' d_t + \Gamma_i' f_t + v_{it}, \qquad (29.29)$$
where $A_i$ and $\Gamma_i$ are $n \times k$ and $m \times k$ factor loading matrices with fixed components, $v_{it}$ is the idiosyncratic component of $x_{it}$ distributed independently of the common effects $f_t$ and errors $e_{jt'}$ for all $i$, $j$, $t$ and $t'$. However, $v_{it}$ is allowed to be serially correlated, and cross-sectionally weakly correlated.
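Equations (29.26), (29.27) and (29.29) can be combined to yield the following system of equations
$$z_{it} = \begin{pmatrix} y_{it} \\ x_{it} \end{pmatrix} = B_i' d_t + C_i' f_t + \xi_{it}, \qquad (29.30)$$
where
$$\xi_{it} = \begin{pmatrix} e_{it} + \beta_i' v_{it} \\ v_{it} \end{pmatrix}, \qquad B_i = \begin{pmatrix} \alpha_i & A_i \end{pmatrix}\begin{pmatrix} 1 & 0 \\ \beta_i & I_k \end{pmatrix}, \qquad C_i = \begin{pmatrix} \gamma_i & \Gamma_i \end{pmatrix}\begin{pmatrix} 1 & 0 \\ \beta_i & I_k \end{pmatrix}.$$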
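Consider the weighted average of $z_{it}$ using the weights $w_i$ satisfying the granularity conditions (29.1)–(29.2)
$$\bar{z}_{wt} = \bar{B}_w' d_t + \bar{C}_w' f_t + \bar{\xi}_{wt},$$
where
$$\bar{z}_{wt} = \sum_{i=1}^{N} w_i z_{it}, \qquad \bar{B}_w = \sum_{i=1}^{N} w_i B_i, \qquad \bar{C}_w = \sum_{i=1}^{N} w_i C_i, \qquad \text{and} \qquad \bar{\xi}_{wt} = \sum_{i=1}^{N} w_i \xi_{it}.$$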
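Assuming that$^{7}$
$$\mathrm{Rank}(\bar{C}_w) = m \le k + 1, \qquad (29.31)$$
we have
$$f_t = \left(\bar{C}_w\bar{C}_w'\right)^{-1}\bar{C}_w\left(\bar{z}_{wt} - \bar{B}_w' d_t - \bar{\xi}_{wt}\right). \qquad (29.32)$$
Under the assumption that the $e_{it}$'s and $v_{it}$'s are CWD processes, it is possible to show that (see Pesaran and Tosetti (2011))
$$\bar{\xi}_{wt} \overset{q.m.}{\longrightarrow} 0, \qquad (29.33)$$
which implies
$$f_t - \left(\bar{C}_w\bar{C}_w'\right)^{-1}\bar{C}_w\left(\bar{z}_{wt} - \bar{B}' d_t\right) \overset{q.m.}{\longrightarrow} 0, \quad \text{as } N \to \infty, \qquad (29.34)$$
where
$$C = \lim_{N\to\infty}\left(\bar{C}_w\right) = \tilde{\Gamma}\begin{pmatrix} 1 & 0 \\ \beta & I_k \end{pmatrix}, \qquad (29.35)$$
$\tilde{\Gamma} = [E(\gamma_i), E(\Gamma_i)]$, and $\beta = E(\beta_i)$. Therefore, the unobservable common factors, $f_t$, can be well approximated by a linear combination of observed effects, $d_t$, the cross-section averages of the dependent variable, $\bar{y}_{wt}$, and those of the individual-specific regressors, $\bar{x}_{wt}$.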
When the parameters of interest are the cross-section means of the slope coefficients, $\beta$, we can consider two alternative estimators, the CCE Mean Group (CCEMG) estimator, originally proposed by Pesaran and Smith (1995), and the CCE Pooled (CCEP) estimator. Let $\bar{M}_w$ be defined by
$$\bar{M}_w = I_T - \bar{H}_w\left(\bar{H}_w'\bar{H}_w\right)^{+}\bar{H}_w', \qquad (29.36)$$
where $A^{+}$ denotes the Moore–Penrose inverse of matrix $A$, $\bar{H}_w = (D, \bar{Z}_w)$, and $D$ and $\bar{Z}_w$ are, respectively, the matrices of the observations on $d_t$ and $\bar{z}_{wt} = (\bar{y}_{wt}, \bar{x}_{wt}')'$.
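The CCEMG is a simple average of the estimators of the individual slope coefficients$^{8}$
$$\hat{\beta}_{CCEMG} = N^{-1}\sum_{i=1}^{N}\hat{\beta}_{CCE,i}, \qquad (29.37)$$
where
$$\hat{\beta}_{CCE,i} = \left(X_i'\bar{M}_w X_i\right)^{-1}X_i'\bar{M}_w y_i. \qquad (29.38)$$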
7 This assumption can be relaxed. See the discussions at the end of this Section and examples 71 and 72.
8 Pesaran (2006) also considered a weighted average of individual b̂i , with weights inversely proportional to the indi-
vidual variances.
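Under some general conditions Pesaran (2006) shows that $\hat{\beta}_{CCEMG}$ is asymptotically unbiased for $\beta$, and as $(N, T) \to \infty$,
$$\sqrt{N}\left(\hat{\beta}_{CCEMG} - \beta\right) \overset{d}{\to} N(0, \Sigma_{CCEMG}), \qquad (29.39)$$
where $\Sigma_{CCEMG} = \Omega_{\upsilon}$. A consistent estimator of the variance of $\hat{\beta}_{CCEMG}$, denoted by $\widehat{Var}(\hat{\beta}_{CCEMG})$, can be obtained by adopting the non-parametric estimator
$$\widehat{Var}\left(\hat{\beta}_{CCEMG}\right) = N^{-1}\hat{\Sigma}_{CCEMG} = \frac{1}{N(N-1)}\sum_{i=1}^{N}\left(\hat{\beta}_{CCE,i} - \hat{\beta}_{CCEMG}\right)\left(\hat{\beta}_{CCE,i} - \hat{\beta}_{CCEMG}\right)'. \qquad (29.40)$$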
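A minimal sketch of the CCEMG estimator and the non-parametric variance in (29.40) follows. It assumes equal weights $w_i = 1/N$ and an intercept as the only observed common effect $d_t$; the function name and the array layout are illustrative choices, not part of the original exposition.

```python
import numpy as np

def cce_mean_group(y, X):
    """y: (N, T) array; X: (N, T, k) array.
    Returns (b_mg, var_mg, b): the mean group estimate, its non-parametric
    variance estimate, and the N individual CCE estimates."""
    N, T, k = X.shape
    # H = (D, Z_bar): intercept plus cross-section averages of y and x
    H = np.column_stack([np.ones(T), y.mean(axis=0), X.mean(axis=0)])
    # Annihilator of H; H @ pinv(H) equals H (H'H)^+ H'
    M = np.eye(T) - H @ np.linalg.pinv(H)
    b = np.empty((N, k))
    for i in range(N):
        MX = M @ X[i]
        # (X_i' M X_i)^{-1} X_i' M y_i, using the symmetry of M
        b[i] = np.linalg.solve(X[i].T @ MX, MX.T @ y[i])
    b_mg = b.mean(axis=0)
    dev = b - b_mg
    var_mg = dev.T @ dev / (N * (N - 1))
    return b_mg, var_mg, b
```

Standard errors for the mean group estimate can then be read off as `np.sqrt(np.diag(var_mg))`.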
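The CCEP estimator is given by
$$\hat{\beta}_{CCEP} = \left(\sum_{i=1}^{N} w_i X_i'\bar{M}_w X_i\right)^{-1}\sum_{i=1}^{N} w_i X_i'\bar{M}_w y_i. \qquad (29.41)$$
It is now easily seen that $\hat{\beta}_{CCEP}$ is asymptotically unbiased for $\beta$, and, as $(N, T) \to \infty$,
$$\left(\sum_{i=1}^{N} w_i^2\right)^{-1/2}\left(\hat{\beta}_{CCEP} - \beta\right) \overset{d}{\to} N(0, \Sigma_{CCEP}).$$
A consistent estimator of $Var(\hat{\beta}_{CCEP})$, denoted by $\widehat{Var}(\hat{\beta}_{CCEP})$, is given by
$$\widehat{Var}\left(\hat{\beta}_{CCEP}\right) = \left(\sum_{i=1}^{N} w_i^2\right)\hat{\Sigma}_{CCEP} = \left(\sum_{i=1}^{N} w_i^2\right)\hat{\Psi}^{*-1}\hat{R}^{*}\hat{\Psi}^{*-1}, \qquad (29.42)$$
where
$$\hat{\Psi}^{*} = \sum_{i=1}^{N} w_i\left(\frac{X_i'\bar{M}_w X_i}{T}\right),$$
$$\hat{R}^{*} = \frac{1}{N-1}\sum_{i=1}^{N}\tilde{w}_i^2\left(\frac{X_i'\bar{M}_w X_i}{T}\right)\left(\hat{\beta}_{CCE,i} - \hat{\beta}_{CCEMG}\right)\left(\hat{\beta}_{CCE,i} - \hat{\beta}_{CCEMG}\right)'\left(\frac{X_i'\bar{M}_w X_i}{T}\right).$$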
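The pooled estimator in (29.41) can be sketched in the same way. As before, this is an illustration under stated assumptions: equal weights $w_i = 1/N$, an intercept as the only observed common effect, and a hypothetical function name. The variance formula (29.42) is not coded here.

```python
import numpy as np

def cce_pooled(y, X):
    """y: (N, T) array; X: (N, T, k) array. Returns the CCEP slope estimate."""
    N, T, k = X.shape
    # Same annihilator as for the mean group estimator
    H = np.column_stack([np.ones(T), y.mean(axis=0), X.mean(axis=0)])
    M = np.eye(T) - H @ np.linalg.pinv(H)
    w = np.full(N, 1.0 / N)              # equal weights (an assumption)
    A = np.zeros((k, k))
    c = np.zeros(k)
    for i in range(N):
        MX = M @ X[i]
        A += w[i] * (X[i].T @ MX)        # accumulates sum_i w_i X_i' M X_i
        c += w[i] * (MX.T @ y[i])        # accumulates sum_i w_i X_i' M y_i
    return np.linalg.solve(A, c)
```

With equal weights the $w_i$ cancel between the two sums, so the estimate is unchanged if every weight is rescaled by the same constant.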
The rate of convergence of $\hat{\beta}_{CCEMG}$ and $\hat{\beta}_{CCEP}$ is $\sqrt{N}$ when $\Omega_{\upsilon} \neq 0$. Note that even if $\beta_i$ were observed for all $i$, the estimate of $\beta = E(\beta_i)$ cannot converge at a faster rate than $\sqrt{N}$. If the individual slope coefficients $\beta_i$ are homogeneous (namely if $\Omega_{\upsilon} = 0$), $\hat{\beta}_{CCEMG}$ and $\hat{\beta}_{CCEP}$ are still consistent and converge at the rate $\sqrt{NT}$ rather than $\sqrt{N}$.
The advantage of the non-parametric estimators $\hat{\Sigma}_{CCEMG}$ and $\hat{\Sigma}_{CCEP}$ is that they do not require knowledge of the form of weak cross-sectional dependence of $e_{it}$, nor the knowledge of serial correlation of $e_{it}$. An important question is whether the non-parametric variance estimators $\widehat{Var}(\hat{\beta}_{CCEMG})$ and $\widehat{Var}(\hat{\beta}_{CCEP})$ can be used in both cases of homogeneous and heterogeneous slopes. As established in Pesaran and Tosetti (2011), the asymptotic distribution of $\hat{\beta}_{CCEMG}$ and $\hat{\beta}_{CCEP}$ depends on nuisance parameters when slopes are homogeneous ($\Omega_{\upsilon} = 0$), including the nature of cross-section correlations of $e_{it}$ and their serial correlation structure. However, it can be shown that the robust non-parametric estimators $\widehat{Var}(\hat{\beta}_{CCEMG})$ and $\widehat{Var}(\hat{\beta}_{CCEP})$ are consistent when the regressor-specific components, $v_{it}$, are independently distributed across $i$.
The CCE continues to be applicable even if the rank condition (29.31) is not satisfied. Fail-
ure of the rank condition can occur if there is an unobserved factor for which the average of
the loadings in the yit and xit equations tends to a zero vector. This could happen if, for exam-
ple, the factor in question is weak, in the sense defined above. Another possible reason for failure
of the rank condition is if the number of unobservable factors, m, is larger than k + 1, where k
is the number of the unit-specific regressors included in the model. In such cases, common fac-
tors cannot be estimated from cross-section averages. However, it is possible to show that the
cross-section means of the slope coefficients, $\beta_i$, can still be consistently estimated, under the additional assumption that the unobserved factor loadings, $\gamma_i$, in equation (29.27) are independently and identically distributed across $i$, independently of $e_{jt}$, $v_{jt}$, and $g_t = (d_t', f_t')'$ for all $i$, $j$ and $t$, and are uncorrelated with the loadings attached to the regressors, $\Gamma_i$. The consequences of the correlation between loadings $\gamma_i$ and $\Gamma_i$ for the performance of CCE estimators in the rank deficient
case are documented in Sarafidis and Wansbeek (2012). The following example illustrates the
implications of such correlations.
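$$y_{it} = \alpha_i + \beta_i x_{it} + \gamma_i f_t + \varepsilon_{it}, \quad i = 1, 2, \ldots, N;\; t = 1, 2, 3, \ldots, T,$$
$$x_{it} = \delta_i f_t + v_{it}, \qquad \beta_i = \beta + \upsilon_i, \qquad \gamma_i = \gamma + \eta_i, \qquad \delta_i = \delta + \xi_i,$$
where $\upsilon_i$, $\eta_i$ and $\xi_i$ are distributed with zero means and constant variances. Suppose also that $\varepsilon_{it}$ and $v_{it}$ are cross-sectionally and serially uncorrelated, and $v_{it}$ is uncorrelated with $f_t$. But allow $\upsilon_i$, $\eta_i$ and $\xi_i$ to be correlated with one another. Let $w_i = N^{-1}$ and note that for $i = 1, 2, \ldots, N$, we have
$$\hat{\beta}_{CCEP} - \beta = \left(N^{-1}\sum_{i=1}^{N}\frac{x_i'\bar{M}x_i}{T}\right)^{-1}\left[\frac{1}{N}\sum_{i=1}^{N}\frac{x_i'\bar{M}(x_i\upsilon_i + \varepsilon_i)}{T} + q_{NT}\right], \qquad (29.43)$$
$$q_{NT} = \frac{1}{N}\sum_{i=1}^{N}\frac{x_i'\bar{M}f\,\gamma_i}{T}, \qquad (29.44)$$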
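where $\bar{M} = I_T - \bar{H}(\bar{H}'\bar{H})^{-1}\bar{H}'$, with $\bar{H} = (\tau_T, \bar{x}, \bar{y})$, $\tau_T = (1, 1, \ldots, 1)'$, $\bar{x} = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_T)'$, $\bar{x}_t = N^{-1}\sum_{i=1}^{N}x_{it}$, $\bar{y} = (\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_T)'$, and $\bar{y}_t = N^{-1}\sum_{i=1}^{N}y_{it}$. Note that
$$q_{NT} = \left(N^{-1}\sum_{i=1}^{N}\frac{x_i'\bar{M}f}{T}\right)\gamma + N^{-1}\sum_{i=1}^{N}\left(\frac{x_i'\bar{M}f}{T}\right)\eta_i. \qquad (29.45)$$
$$d_{NT} = N^{-1}\sum_{i=1}^{N}\left(\frac{(\delta_i f + v_i)'\bar{M}f}{T}\right)\eta_i = \left(N^{-1}\sum_{i=1}^{N}\delta_i\eta_i\right)\frac{f'\bar{M}f}{T} + N^{-1}\sum_{i=1}^{N}\left(\frac{v_i'\bar{M}f}{T}\right)\eta_i.$$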
The second term tends to zero as $N$ and $T$ tend to infinity since by assumption $v_{it}$ and $f_t$ are independently distributed. It is clear that the first term also tends to zero if $\gamma_i$ and $\delta_i$ are uncorrelated. So far none of these results requires $\gamma_i$ and/or $\delta_i$ to have non-zero mean. Consider now the case where $N^{-1}\sum_{i=1}^{N}\delta_i\eta_i \to \rho_{\gamma\delta} \neq 0$; then the asymptotic bias of $\hat{\beta}_{CCEP}$ depends on the limiting property of $T^{-1}f'\bar{M}f$. It is now clear that if either $\gamma$ or $\delta$ is non-zero then $\bar{y}$ or $\bar{x}$ can be written as a function of $f$ even as $N \to \infty$, and $T^{-1}f'\bar{M}f \to 0$. It is only in the case where $\gamma = \delta = 0$ that neither $\bar{y}$ nor $\bar{x}$ has any relationship to $f$ in the limit and as a result $T^{-1}f'\bar{M}f$ does not tend to zero. Further, in the case where $\gamma = 0$ but $\delta \neq 0$, $\hat{\beta}_{CCEP}$ will be consistent even if $\rho_{\gamma\delta} \neq 0$.
In the case where γ = δ = 0, ȳt → E(α i ) and x̄t → 0. Also Var(ȳt ) → 0 and Var(x̄t ) → 0
as N → ∞ . In economic applications such non-stochastic limits do not seem plausible. For exam-
ple, when factor loadings have zero means, then variances of per capita consumption, output, and
investment will all tend to zero, which seems unlikely. Similarly, in the case of capital asset pricing
models, rit = α i + β i ft + ε it , the assumption that β i have a zero mean is equivalent to saying that
there exist risk free portfolios that have positive excess returns—again a very unlikely scenario.
An advantage of the CCE approach is that it yields consistent estimates under a variety of situa-
tions. Kapetanios, Pesaran, and Yamagata (2011) consider the case where the unobservable com-
mon factors follow unit root processes and could be cointegrated. They show that the asymp-
totic distribution of panel estimators in the case of I(1) factors is similar to that in the stationary
case. Pesaran and Tosetti (2011) prove consistency and asymptotic normality for CCE estima-
tors when {eit } are generated by a spatial process. Chudik, Pesaran, and Tosetti (2011) prove
consistency and asymptotic normality of the CCE estimators when errors are subject to a finite
number of unobserved strong factors and an infinite number of weak and/or semi-strong unob-
served common factors as in (29.20)–(29.21), provided that certain conditions on the loadings
of the infinite factor structure are satisfied. A further advantage of the CCE approach is that it
does not require an a priori knowledge of the number of unobserved common factors.
In a Monte Carlo (MC) study, Coakley, Fuertes, and Smith (2006) compare ten alternative
estimators for the mean slope coefficient in a linear heterogeneous panel regression with strictly
exogenous regressors and unobserved common (correlated) factors. Their results show that,
overall, the mean group version of the CCE estimator stands out as the most efficient and robust.
These conclusions are in line with those in Kapetanios and Pesaran (2007) and Chudik, Pesaran,
and Tosetti (2011), who investigate the small sample properties of CCE estimators and the esti-
mators based on principal components. The MC results show that PC augmented methods do
not perform as well as the CCE approach, and can lead to substantial size distortions, due, in
part, to the small sample errors in the number of factors selection procedure. In a theoretical
study, Westerlund and Urbain (2011) investigate the merits of the CCE and PC estimators in
the case of homogeneous slopes and known number of unobserved common factors and find
that, although the PC estimates of factors are more efficient than the cross-sectional averages,
the CCE estimators of slope coefficients generally perform the best.
Example 72 There exists extensive literature on the relationship between long-run economic growth
and investment in physical capital. Simple exogenous growth theories, such as the Solow model,
predict a positive association between investment and the level of per capita GDP, but no relation
between investment and steady-state growth rates (Barro and Sala-i-Martin (2003)). This conclu-
sion has been supported by a number of empirical studies (see, e.g., the review of empirical litera-
ture in Easterly and Levine (2001)). Bond, Leblebicioglu, and Schiantarelli (2010) reconsider this
problem, using data for seventy-five economies over the period 1960–2005, distinguishing between
OECD and non-OECD countries. They adopt a model that allows for country-specific heterogene-
ity, endogeneity of investments and cross-sectional dependence. Let yit denote the logarithm of GDP
per worker in country i in year t, and xit denote the logarithm of the investment to GDP ratio. The
authors consider the ARDL(p, p) model
$$y_{it} = g_{it} + \sum_{s=1}^{p}\alpha_s y_{i,t-s} + \sum_{s=1}^{p}\beta_s x_{i,t-s} + \eta_i + u_{it}, \qquad (29.46)$$
where $g_{it}$ is a non-stationary process that determines the behavior of the growth rate of $y_{it}$ in the long run. The long-run growth rate is modelled as
$$\Delta g_{it} = \theta_0 + \theta_1 x_{it} + d_i + \gamma_i f_t + v_{it}, \qquad (29.47)$$
where $d_i$ is a country-specific effect, and $f_t$ and $v_{it}$ are permanent shocks, common to all countries ($f_t$) and country-specific ($v_{it}$). The main object of the analysis is $\theta_1$. Under $\theta_1 = 0$, there is no long-run relationship between investment as a share of GDP and the long-run growth rate, while under $\theta_1 > 0$, a permanent increase in investment predicts a higher long-run growth rate. Taking first differences of equation (29.46) and substituting for $\Delta g_{it}$ from equation (29.47) yields
$$\Delta y_{it} = \theta_0 + \theta_1 x_{it} + \sum_{s=1}^{p}\alpha_s\Delta y_{i,t-s} + \sum_{s=1}^{p}\beta_s\Delta x_{i,t-s} + d_i + \gamma_i f_t + v_{it} + \Delta u_{it}. \qquad (29.48)$$
One interesting point to observe is that, from (29.47), $g_{it}$ can be written as
$$g_{it} = g_{i0} + (\theta_0 + d_i)\,t + \theta_1\sum_{s=1}^{t}x_{is} + \gamma_i\sum_{s=1}^{t}f_s + \sum_{s=1}^{t}v_{is}.$$
Hence, substituting $g_{it}$ in (29.46) yields a model for the level of $y_{it}$ in which the error term has an I(1) component $\sum_{s=1}^{t}v_{is}$ if these idiosyncratic permanent shocks to income levels are present, implying that the I(1) series $y_{it}$ and $\sum_{s=1}^{t}x_{is}$ are not cointegrated among themselves. Equation (29.48) is
then estimated separately for each country, by the IV method using as instruments lagged observa-
tions dated from t−2 to t−6 on yit and xit , and lagged observations dated t−2 and t−3 on a set of
additional instruments (inflation, trade as a share of GDP, and government spending as a share of
GDP). Following the CCE approach, Bond, Leblebicioglu, and Schiantarelli (2010) approximate
the unobserved common factor, ft , by including ȳt and x̄t in the regression specification. Estimation
results for the mean and median estimated coefficients are reported in Table 29.2. Results show that
investment as a share of GDP has a large and statistically significant effect on long-run growth rates,
using the full sample of seventy-five countries and sub-sample of non-OECD countries. However,
this evidence is weaker for OECD countries, for which the estimated coefficient θ 1 is not statistically
significant. This result may reflect important differences across countries in the growth process.
relaxed. One important example is the panel data model with lagged dependent variables and
unobserved common factors (possibly correlated with the regressors)9
for $i = 1, 2, \ldots, N$; $t = 1, 2, \ldots, T$. It is assumed that $|\lambda_i| < 1$, and the dynamic processes have started a long time in the past. As in Section 29.4, we distinguish between the case of homogeneous coefficients, where $\lambda_i = \lambda$ and $\beta_i = \beta$ for all $i$, and the heterogeneous case, where $\lambda_i$ and $\beta_i$ are randomly distributed across units and the objects of interest are the mean coefficients $\lambda = E(\lambda_i)$ and $\beta = E(\beta_i)$. This distinction is more important for dynamic panels, since not only is the rate of convergence affected by the presence of coefficient heterogeneity, but, as shown by Pesaran and Smith (1995), pooled least squares estimators are no longer consistent in the case of dynamic panel data models with heterogeneous coefficients. See also Section 28.6.
It is convenient to define the vector of regressors $\zeta_{it} = (y_{i,t-1}, x_{it}')'$ and the corresponding parameter vector $\pi_i = (\lambda_i, \beta_i')'$ so that (29.49) can be written as
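where $\mathcal{B}$ is a compact set assumed to contain the true parameter values, and
$$L_{NT}(\pi) = \min_{\{\gamma_i\}_{i=1}^{N},\,\{f_t\}_{t=1}^{T}}\frac{1}{NT}\sum_{i=1}^{N}\left(y_i - \Xi_i\pi - F\gamma_i\right)'\left(y_i - \Xi_i\pi - F\gamma_i\right),$$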
Both π̂ QMLE and β̂ PC minimize the same objective function and therefore, when the same set of
regressors is considered, these two estimators are numerically the same. But there are important
9 Fixed-effects and observed common factors (denoted by $d_t$ previously) can also be included in the model. They are excluded here to simplify the exposition.
differences in their bias-corrected versions also considered in Bai (2009) and Moon and Weidner
(2015). The latter paper allows for more general assumptions on regressors, including the pos-
sibility of weak exogeneity, and adopts a quadratic approximation of the profile likelihood func-
tion, which allows the authors to work out the asymptotic distribution and to conduct inference
on the coefficients.
Moon and Weidner (MW) show that $\hat{\pi}_{QMLE}$ is a consistent estimator of $\pi$, as $(N, T) \to \infty$ without any restrictions on the ratio $T/N$. To derive the asymptotic distribution of $\hat{\pi}_{QMLE}$, MW require $T/N \to \kappa$, $0 < \kappa < \infty$, as $(N, T) \to \infty$, and assume that the idiosyncratic errors, $e_{it}$, are cross-sectionally independent. Under certain high level assumptions, they show that $\sqrt{NT}\left(\hat{\pi}_{QMLE} - \pi\right)$ converges to a normal distribution with a non-zero mean, which is due
to two types of asymptotic bias. The first follows from the heteroskedasticity of the error terms,
as in Bai (2009), and the second is due to the presence of weakly exogenous regressors. The
authors provide consistent estimators of these two components, and propose a bias-corrected
QMLE.
There are two important considerations that should be borne in mind when using the QMLE
proposed by MW. First, it is developed for the case of slope homogeneity, namely under π i = π
for all i. This assumption, for example, rules out the inclusion of fixed-effects into the model,
which can be quite restrictive in practice, although the unobserved factor component, γ i ft , does
in principle allow for fixed-effects if the first element of ft can be constrained to be unity at the
estimation stage. A second consideration is the small sample properties of QMLE in the case of
models with fixed-effects, which are of primary interest in empirical applications. Simulations
reported in Chudik and Pesaran (2015a) suggest that the bias correction does not go far enough
and the QMLE procedure could yield tests which are grossly over-sized. To check the robustness
of the QMLE to the presence of fixed-effects, we carried out a small Monte Carlo experiment in
the case of a homogeneous AR(1) panel data model with fixed-effects, $\lambda_i = 0.70$, and $N = T = 100$. Using $R = 2,000$ replications, the bias of the bias-corrected QMLE, $\hat{\lambda}_{QMLE}$, turned out to be $-0.024$, and tests based on $\hat{\lambda}_{QMLE}$ were grossly oversized with size exceeding 60 per cent.
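$$\hat{\pi}_{i,PC} = \left(\Xi_i' M_{\hat{F}}\,\Xi_i\right)^{-1}\Xi_i' M_{\hat{F}}\,y_i, \quad \text{for } i = 1, 2, \ldots, N, \qquad (29.52)$$
$$\frac{1}{NT}\sum_{i=1}^{N}\left(y_i - \Xi_i\hat{\pi}_{i,PC}\right)\left(y_i - \Xi_i\hat{\pi}_{i,PC}\right)'\hat{F} = \hat{F}\hat{V}. \qquad (29.53)$$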
independence of $e_{it}$. Song (2013) does not provide theoretical results on the estimation of the mean coefficients $\pi = E(\pi_i)$, but he considers the following mean group estimator based on the individual estimates $\hat{\pi}_{i,PC}$,
$$\hat{\pi}^{s}_{PCMG} = \frac{1}{N}\sum_{i=1}^{N}\hat{\pi}_{i,PC},$$
in a Monte Carlo study and finds that $\hat{\pi}^{s}_{PCMG}$ has satisfactory small sample properties in terms of bias and root mean squared error. But he does not provide any results on the asymptotic distribution of $\hat{\pi}^{s}_{PCMG}$. However, results of a Monte Carlo study presented in Chudik and Pesaran (2015a) suggest that $\sqrt{N}\left(\hat{\pi}^{s}_{PCMG} - \pi\right)$ is asymptotically normally distributed with mean zero and a covariance matrix that can be estimated (as in the case of the CCEMG estimator) by
$$\widehat{Var}\left(\hat{\pi}^{s}_{PCMG}\right) = \frac{1}{N(N-1)}\sum_{i=1}^{N}\left(\hat{\pi}_{i,PC} - \hat{\pi}^{s}_{PCMG}\right)\left(\hat{\pi}_{i,PC} - \hat{\pi}^{s}_{PCMG}\right)'.$$
Tests based on this conjecture tend to perform well so long as $T$ is sufficiently large. However, as with the other PC based estimators, knowledge of the number of factors and the assumption that the factors under consideration are strong continue to play an important role in the small sample properties of the tests based on $\hat{\pi}^{s}_{PCMG}$.
10 See Everaert and Groote (2011) who derive the asymptotic bias of the CCE pooled estimator in the case of dynamic
homogeneous panels.
where $a(L) = \sum_{\ell=0}^{\infty}a_\ell L^\ell$, with $a_\ell = E(\lambda_i^\ell)$, $\beta = E(\beta_i)$, and $\gamma = E(\gamma_i)$. Under the assumption that the idiosyncratic errors are cross-sectionally weakly dependent, we have $\xi_{wt} \overset{p}{\to} 0$, as $N \to \infty$, with the rate of convergence depending on the degree of cross-sectional dependence of $\{e_{it}\}$ and the granularity of $w$. In the case where $w$ satisfies the usual granularity conditions (29.1)–(29.2), and the exponent of cross-sectional dependence of $e_{it}$ is $\alpha_e \le 1/2$, we have $\xi_{wt} = O_p(N^{-1/2})$. In the special case where $\beta = 0$ and $m = 1$, (29.55) reduces to
$$y_{wt} = \gamma\,a(L)\,f_t + O_p\left(N^{-1/2}\right).$$
The extent to which $f_t$ can be accurately approximated by $y_{wt}$ and its lagged values depends on the rate at which $a_\ell = E(\lambda_i^\ell)$, the coefficients in the polynomial lag operator $a(L)$, decay with $\ell$, and on the size of the cross-sectional dimension, $N$. The coefficients in $a(L)$ are given by the moments of $\lambda_i$ and therefore these coefficients need not be absolutely summable if the support of $\lambda_i$ is not sufficiently restricted in the neighborhood of the unit circle (see Section 15.8 and Chapter 32). Assuming that for all $i$ the support of $\lambda_i$ lies strictly within the unit circle, it is easily seen that $a_\ell$ will then decay exponentially and for $N$ sufficiently large, $f_t$ can be well approximated by $y_{wt}$ and a number of its lagged values.$^{11}$ The number of lagged values of $y_{wt}$ needed to approximate $f_t$ rises with $T$ but at a slower rate.
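As a worked illustration of the decay of $a_\ell$, take the uniform case mentioned in footnote 11, with $\lambda_i$ distributed uniformly over $(0, b)$ and $0 < b < 1$. The value $b = 0.8$ used at the end is an arbitrary choice made here for illustration.

```latex
a_\ell = E\left(\lambda_i^\ell\right)
       = \frac{1}{b}\int_0^b \lambda^\ell \, d\lambda
       = \frac{1}{b}\cdot\frac{b^{\ell+1}}{\ell+1}
       = \frac{b^\ell}{\ell+1},
\qquad
\sum_{\ell=0}^{\infty}\left|a_\ell\right|
       \le \sum_{\ell=0}^{\infty} b^\ell = \frac{1}{1-b} < \infty .
% Example with b = 0.8: a_{10} = 0.8^{10}/11 \approx 0.0098
```

The coefficients are therefore absolutely summable and bounded by a geometric sequence, which is what allows a finite number of lags to approximate $f_t$ well.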
In the general case where $\beta$ is nonzero, $x_{it}$ are weakly exogenous, and $m \ge 1$, Chudik and Pesaran (2015a) show that there exists the following large $N$ distributed lag relationship between the unobserved common factors and cross-sectional averages of the dependent variable and the regressors, $\bar{z}_{wt} = (\bar{y}_{wt}, \bar{x}_{wt}')'$,
$$\Lambda(L)\,\tilde{\Gamma}' f_t = \bar{z}_{wt} + O_p\left(N^{-1/2}\right),$$
where as before $\tilde{\Gamma} = E(\gamma_i, \Gamma_i)$ and the decay rate of the matrix coefficients in $\Lambda(L)$ depends on the heterogeneity of $\lambda_i$ and $\beta_i$ and other related distributional assumptions. The existence of a large $N$ relationship between the unobserved common factors and cross-sectional averages of variables is not surprising considering that only the components with the largest exponents of cross-sectional dependence can survive cross-sectional aggregation with granular weights. Assuming $\tilde{\Gamma}$ has full row rank, namely $\mathrm{rank}(\tilde{\Gamma}) = m$, and the distributions of coefficients are such that $\Lambda^{-1}(L)$ exists and has exponentially decaying coefficients, yields the following unit-specific dynamic CCE regressions,
11 For example, if $\lambda_i$ is distributed uniformly over the range $(0, b)$ where $0 < b < 1$, we have $a_\ell = E(\lambda_i^\ell) = b^\ell/(1 + \ell)$, which decays exponentially with $\ell$. See also Section 15.8.3.
$$y_{it} = \lambda_i y_{i,t-1} + \beta_i' x_{it} + \sum_{\ell=0}^{p_T}\delta_{i\ell}'\,\bar{z}_{w,t-\ell} + e_{yit}, \qquad (29.56)$$
where $\bar{z}_{wt}$ and its lagged values are used to approximate $f_t$. The error term $e_{yit}$ consists of three parts: an idiosyncratic term, $e_{it}$, an error component due to the truncation of a possibly infinite distributed lag function, and an $O_p(N^{-1/2})$ error component due to the approximation of unobserved common factors based on large $N$ relationships.
Chudik and Pesaran (2015a) consider the least squares estimates of $\pi_i = (\lambda_i, \beta_i')'$ based on the above dynamic CCE regressions, denoted as $\hat{\pi}_i = (\hat{\lambda}_i, \hat{\beta}_i')'$, and the mean group estimate of $\pi = E(\pi_i)$ based on $\hat{\pi}_i$. To define these estimators, we introduce the following data matrices
$$\tilde{\Xi}_i = \begin{pmatrix} y_{i,p_T} & x_{i,p_T+1}' \\ y_{i,p_T+1} & x_{i,p_T+2}' \\ \vdots & \vdots \\ y_{i,T-1} & x_{iT}' \end{pmatrix}, \qquad \bar{Q}_w = \begin{pmatrix} \bar{z}_{w,p_T+1}' & \bar{z}_{w,p_T}' & \cdots & \bar{z}_{w,1}' \\ \bar{z}_{w,p_T+2}' & \bar{z}_{w,p_T+1}' & \cdots & \bar{z}_{w,2}' \\ \vdots & \vdots & & \vdots \\ \bar{z}_{w,T}' & \bar{z}_{w,T-1}' & \cdots & \bar{z}_{w,T-p_T}' \end{pmatrix}, \qquad (29.57)$$
and the projection matrix $\bar{M}_q = I_{T-p_T} - \bar{Q}_w\left(\bar{Q}_w'\bar{Q}_w\right)^{+}\bar{Q}_w'$, where $I_{T-p_T}$ is a $(T - p_T) \times (T - p_T)$ dimensional identity matrix.$^{12}$ $p_T$ should be set such that $p_T^2/T$ tends to zero as $p_T$ and $T$ both tend to infinity. The number of lags cannot increase too fast, otherwise there will not be a sufficient number of observations to accurately estimate the parameters, whilst at the same time a sufficient number of lags is needed to ensure that the factors are well approximated. Setting the number of lags equal to $T^{1/3}$ seems to be a good choice, balancing the effects of the above two opposing considerations.$^{13}$
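The individual estimates, $\hat{\pi}_i$, can now be written as
$$\hat{\pi}_i = \left(\tilde{\Xi}_i'\bar{M}_q\tilde{\Xi}_i\right)^{-1}\tilde{\Xi}_i'\bar{M}_q\tilde{y}_i, \qquad (29.58)$$
where $\tilde{y}_i = (y_{i,p_T+1}, y_{i,p_T+2}, \ldots, y_{i,T})'$. The mean group estimator of $\pi = E(\pi_i) = (\lambda, \beta')'$ is given by
$$\hat{\pi}_{MG} = \frac{1}{N}\sum_{i=1}^{N}\hat{\pi}_i. \qquad (29.59)$$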
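The construction in (29.57)–(29.59) can be sketched as below. The sketch assumes equal weights, takes the integer part of $T^{1/3}$ as the default lag order, and, unlike the simplified exposition above, adds an intercept column to the matrix of cross-section averages so that fixed effects are absorbed; the function name and array layout are hypothetical.

```python
import numpy as np

def dynamic_cce_mg(y, X, p=None):
    """y: (N, T) array; X: (N, T, k) array; p: number of lags of the averages.
    Returns (pi_mg, var_mg, pi), where row i of pi stacks (lambda_i, beta_i)."""
    N, T, k = X.shape
    if p is None:
        p = int(T ** (1.0 / 3.0))        # integer part of T^(1/3)
    if p < 1:
        raise ValueError("p must be at least 1 so that y_{i,t-1} is available")
    # Cross-section averages z_bar_t = (y_bar_t, x_bar_t')', equal weights
    Z = np.column_stack([y.mean(axis=0), X.mean(axis=0)])       # (T, k + 1)
    # Rows correspond to t = p+1, ..., T; columns hold z_bar_t, ..., z_bar_{t-p}
    lags = [Z[p - l: T - l] for l in range(p + 1)]
    Q = np.column_stack([np.ones(T - p)] + lags)                # intercept added
    M = np.eye(T - p) - Q @ np.linalg.pinv(Q)
    pi = np.empty((N, k + 1))
    for i in range(N):
        Xi = np.column_stack([y[i, p - 1: T - 1], X[i, p:]])    # (y_{i,t-1}, x_it')
        MXi = M @ Xi
        pi[i] = np.linalg.solve(Xi.T @ MXi, MXi.T @ y[i, p:])
    pi_mg = pi.mean(axis=0)
    dev = pi - pi_mg
    var_mg = dev.T @ dev / (N * (N - 1))                        # as in (29.40)
    return pi_mg, var_mg, pi
```

The first element of `pi_mg` estimates the mean coefficient on the lagged dependent variable and the remaining `k` elements estimate the mean slopes.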
12 Matrices $\tilde{\Xi}_i$, $\bar{Q}_w$, and $\bar{M}_q$ depend also on $p_T$, $N$ and $T$, but these subscripts are omitted to simplify notations.
13 See Berk (1974), Said and Dickey (1984), and Pesaran and Chudik (2014) for a related discussion on the choice of
lag truncation for estimation of infinite-order autoregressive models.
correlated with $x_{it}$), then $\hat{\pi}_{MG}$ is consistent also in the rank deficient case, despite the inconsistency of $\hat{\pi}_i$, so long as factor loadings are independently, identically distributed across $i$. The convergence rate of $\hat{\pi}_{MG}$ is $\sqrt{N}$ due to the heterogeneity of the slope coefficients. Chudik and Pesaran (2015a) show that $\hat{\pi}_{MG}$ converges to a normal distribution as $(N, T, p_T) \overset{j}{\to} \infty$ such that $p_T^3/T \to \kappa_1$ and $T/N \to \kappa_2$, $0 < \kappa_1, \kappa_2 < \infty$. The ratio $N/T$ needs to be restricted
for conducting inference, due to the presence of small time series bias. In the full rank case, the
asymptotic variance of π̂ MG is given by the variance of π i alone. When the rank condition does
not hold, but factors are serially uncorrelated, then the asymptotic variance depends also on
other parameters, including the variance of factor loadings. In both cases the asymptotic vari-
ance can be consistently estimated non-parametrically, as in (29.40).
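A minimal sketch of the mean group average and a non-parametric standard error is given below. It assumes that the variance estimator referred to in (29.40) takes the usual form, the cross-section variance of the unit-specific estimates divided by $N$, and it treats a single scalar coefficient. The function name and the numbers are hypothetical.

```python
import math


def mean_group(estimates):
    """Mean group estimate and its non-parametric standard error (scalar case)."""
    n = len(estimates)
    if n < 2:
        raise ValueError("at least two cross-section units are needed")
    mg = sum(estimates) / n
    # Assumed form: sum of squared deviations from the mean, divided by N(N - 1).
    variance = sum((e - mg) ** 2 for e in estimates) / (n * (n - 1))
    return mg, math.sqrt(variance)


# Hypothetical unit-specific estimates of the lagged dependent variable coefficient.
lambda_hats = [0.42, 0.55, 0.47, 0.61, 0.50]
mg, se = mean_group(lambda_hats)
print(round(mg, 3), round(se, 3))
```

The standard error depends only on the dispersion of the individual estimates, which is why no unit-specific variance estimates are needed.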
Monte Carlo experiments in Chudik and Pesaran (2015a) show that the dynamic CCE
approach performs reasonably well (in terms of bias, RMSE, size and power). This is particu-
larly the case when the parameter of interest is the average slope of the regressors (β), where the
small sample results are quite satisfactory even if N and T are relatively small (around 40). But
the situation is different if the parameter of interest is the mean coefficient of the lagged depen-
dent variable (λ). In the case of λ, the CCEMG estimator suffers from the well known time series
bias and tests based on it tend to be over-sized, unless T is sufficiently large. To reduce this bias,
Chudik and Pesaran (2015a) consider application of the half-panel jackknife procedure (Dhaene
and Jochmans (2012)), and the recursive mean adjustment procedure (So and Shin (1999)),
both of which are easy to implement.14 The proposed jackknife bias-corrected CCEMG estima-
tor is found to be more effective in mitigating the time series bias, but it cannot deal fully with
the size distortion when T is relatively small. Improving the small T sample properties of the
CCEMG estimator of λ in the heterogeneous panel data models still remains a challenge.
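The half-panel jackknife correction can be sketched as follows, assuming the usual form: twice the full-sample estimate minus the average of the estimates computed from the first and second halves of the time dimension. The function name and the stylised bias of order $1/T$ are illustrative assumptions, not taken from the text.

```python
def half_panel_jackknife(full, first_half, second_half):
    """Bias-corrected estimate: 2 * full - average of the two half-sample estimates."""
    return 2.0 * full - 0.5 * (first_half + second_half)


# Stylised check: suppose the estimator equals theta + b / T, so that each
# half-sample estimate (based on T / 2 periods) has twice the bias.
theta, b, T = 0.5, -1.2, 40
full = theta + b / T
half = theta + b / (T / 2)
corrected = half_panel_jackknife(full, half, half)
print(full, corrected)
```

In this stylised case the first-order bias cancels exactly. In practice higher-order terms remain, which is consistent with the size distortions that persist when $T$ is small.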
and
14 The jackknife bias-correction procedure was first proposed by Quenouille (1949). See also Section 14.5.
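$$
f_t = \rho_f f_{t-1} + \varsigma_{ft}, \quad \varsigma_{ft} \sim IIDN(0, 1 - \rho_f^2),
$$
$$
v_{it} = \rho_{xi} v_{i,t-1} + \varsigma_{it}, \quad \varsigma_{it} \sim IIDN(0, \sigma_{vi}^2), \tag{29.62}
$$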
$$
\theta_i = \frac{\beta_i}{1 - \lambda_i}, \quad \text{for } i = 1, 2, \ldots, N. \tag{29.63}
$$
Long-run relationships are of great importance in economics. The concept of ‘long-run rela-
tions’ is typically associated with the steady-state solution of a structural macroeconomic model.
Often the same long-run relations can also be obtained from arbitrage conditions within and
Table 29.3 Small sample properties of CCEMG and CCEP estimators of mean slope coefficients in panel data models with weakly and strictly exogenous regressors
CCEMG
40 –5.70 –1.46 -0.29 0.00 0.11 7.82 3.65 2.80 2.67 2.61 23.70 9.35 6.20 6.05 6.25 86.80 94.05 96.00 96.30 96.95
50 –5.84 –1.56 –0.39 0.04 0.11 7.56 3.43 2.56 2.41 2.33 29.50 9.30 7.00 6.70 6.20 93.40 96.75 98.75 98.70 99.20
100 –5.88 –1.50 –0.41 –0.05 0.07 6.82 2.63 1.83 1.70 1.64 46.70 13.10 6.00 5.75 5.25 99.75 99.95 100.00 100.00 100.00
150 –6.11 –1.59 –0.45 –0.11 0.08 6.73 2.36 1.53 1.35 1.30 66.05 16.15 6.60 4.75 4.80 100.00 100.00 100.00 100.00 100.00
200 –6.04 –1.55 –0.43 –0.12 0.01 6.54 2.17 1.37 1.18 1.18 74.65 19.70 7.35 4.50 6.10 100.00 100.00 100.00 100.00 100.00
CCEP
40 –3.50 –0.09 0.76 0.98 1.23 6.58 3.71 3.33 3.24 3.35 14.80 6.75 7.50 7.55 9.85 72.30 78.45 80.55 82.70 82.55
50 –3.55 –0.27 0.70 1.08 1.19 6.07 3.31 2.96 3.00 2.96 14.00 5.70 6.20 8.65 8.80 79.70 86.90 88.55 88.70 90.90
100 –3.56 –0.10 0.76 1.08 1.17 5.11 2.42 2.22 2.27 2.26 21.75 5.50 6.75 9.10 10.45 96.05 97.80 98.80 98.95 99.30
150 –3.78 –0.10 0.74 1.10 1.16 4.86 1.98 1.87 1.99 1.98 30.45 5.85 7.60 11.45 12.60 99.15 99.75 99.95 99.95 100.00
200 –3.66 –0.19 0.80 1.08 1.13 4.56 1.77 1.67 1.78 1.77 35.65 6.25 8.35 12.50 12.45 100.00 100.00 100.00 100.00 100.00
Table 29.3 Continued
CCEMG
40 0.19 –0.05 0.02 0.07 0.04 6.43 3.91 3.06 2.91 2.75 6.20 6.40 4.60 6.40 5.55 36.20 74.40 89.95 93.90 95.60
50 –0.02 0.08 0.11 –0.05 –0.02 5.72 3.48 2.83 2.68 2.46 5.25 6.10 5.90 6.75 5.75 43.90 82.20 93.70 96.80 98.05
100 –0.06 0.01 0.02 –0.05 –0.01 4.13 2.52 2.02 1.79 1.78 5.55 6.45 4.90 4.95 6.20 69.95 97.60 99.75 99.95 100.00
150 0.06 0.03 0.00 0.02 0.01 3.29 2.03 1.62 1.50 1.42 5.40 6.00 5.50 5.05 5.30 85.65 99.95 100.00 100.00 100.00
200 –0.06 0.03 –0.02 –0.03 –0.01 2.87 1.75 1.39 1.33 1.23 4.50 5.30 4.85 6.50 5.15 94.10 100.00 100.00 100.00 100.00
CCEP
40 0.21 0.17 0.02 –0.01 –0.02 5.78 3.85 3.16 3.08 2.85 6.40 6.45 5.95 7.10 6.35 74.55 72.90 88.10 92.15 93.50
50 0.03 –0.01 –0.13 0.02 –0.02 5.20 3.48 2.84 2.59 2.54 5.60 6.25 6.25 6.00 5.95 83.35 83.30 94.80 96.30 97.30
100 –0.01 -0.06 0.05 –0.04 0.07 3.67 2.56 2.03 1.89 1.76 5.60 6.15 5.00 5.35 5.65 98.50 97.75 99.85 100.00 100.00
150 0.05 0.02 0.02 0.01 0.01 2.95 2.02 1.65 1.52 1.49 4.50 5.20 5.50 4.95 5.60 99.80 99.95 100.00 100.00 100.00
200 –0.09 –0.04 –0.06 0.03 0.02 2.57 1.74 1.43 1.38 1.28 6.05 5.75 5.15 5.75 4.95 100.00 100.00 100.00 100.00 100.00
Notes: Observations are generated as yit = cyi + β 0i xit + β 1i xi,t−1 + uit , uit = γ i ft + ε it , and xit = cxi + α xi yi,t−1 + γ xi ft + vit , (see (29.60)–(29.61)), where β 0i ∼ IIDU(0.5, 1),
β 1i = −0.5 for all i, and m = 3 (number of unobserved common factors). Fixed-effects are generated as cyi ∼ IIDN (1, 1), and cxi = cyi +IIDN (0, 1). In the case of weakly exogenous
regressors, α xi ∼ IIDU(0, 1) (with E (α xi ) = 0.5), and under the case of strictly exogenous regressors α xi = 0 for all i. The errors are generated to be heteroskedastic and weakly
cross-sectionally dependent. See Section 29.5.3 for a more detailed description of the MC design.
across markets. As a result, many long-run relationships in economics are free of particular model
assumptions; examples include purchasing power parity, uncovered interest parity and the Fisher
inflation parity.
Estimation of long-run relations for pure time series models is discussed in Section 6.5, and
for dynamic panel data models without cross-sectional dependence in Chapter 28
(Sections 28.6–28.10). This section extends the estimation of long-run effects to dynamic
panels with a multifactor error structure.
There are two approaches to estimating long-run coefficients. One approach is to estimate
the individual short-run coefficients $\lambda_i$ and $\beta_i$ in the ARDL relation (29.49) and then compute
the estimates of long-run effects using formula (29.63) with the short-run coefficients replaced
by their estimates ($\hat{\lambda}_i$ and $\hat{\beta}_i$) discussed in Section 29.5. This is the ‘ARDL approach to the
estimation of long-run effects’. This approach yields consistent estimates irrespective of whether the
underlying variables are $I(0)$ or $I(1)$, and whether the regressors in $x_{it}$ are strictly or weakly exogenous.
These robustness properties are clearly important in empirical research. However, the ARDL
approach also has its own drawbacks. Most importantly, the sampling uncertainty could be large,
especially when the speed of convergence towards the long-run relation is rather slow and the
time dimension is not sufficiently long. This is readily apparent from (29.63), since even a small
change to 1 − λ̂i could have a large impact on the estimates of θ i , when λ̂i is close to unity. In
this respect, a correct specification of lag-orders could be quite important for the performance
of the ARDL estimates. Moreover, the estimates of the short-run coefficients are subject to small
T bias.
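The sensitivity of the long-run estimate to $\hat{\lambda}_i$ near unity follows directly from (29.63) and can be seen in a small numerical sketch. The estimates used below are hypothetical and the function name is illustrative.

```python
def long_run_effect(beta, lam):
    """Long-run coefficient theta = beta / (1 - lam), as in (29.63)."""
    if abs(lam) >= 1.0:
        raise ValueError("requires |lam| < 1")
    return beta / (1.0 - lam)


beta_hat = 0.10  # hypothetical short-run estimate
for lam_hat in (0.50, 0.52, 0.95, 0.97):
    print(lam_hat, round(long_run_effect(beta_hat, lam_hat), 3))
```

The same change of 0.02 in $\hat{\lambda}_i$ moves the implied long-run effect from 0.2 to about 0.208 when $\hat{\lambda}_i$ is around 0.5, but from 2.0 to about 3.33 when $\hat{\lambda}_i$ is around 0.95.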
An alternative approach, proposed by Chudik et al. (2015), is to estimate the long-run
coefficients $\theta_i$ directly, without first estimating the short-run coefficients. This is possible by observing
that the ARDL model (29.49) can be written as
$$
y_{it} = \theta_i' x_{it} - \alpha_i'(L) \Delta x_{it} + \tilde{u}_{it}, \tag{29.64}
$$
where $\tilde{u}_{it} = \lambda_i(L)^{-1} u_{it}$, $\lambda_i(L) = 1 - \lambda_i L$, and $\alpha_i(L) = \sum_{\ell=0}^{\infty} \sum_{s=\ell+1}^{\infty} \lambda_i^{s} \beta_i L^{\ell}$. We shall
refer to the direct estimation of $\theta_i$ based on the distributed lag (DL) representation (29.64) as
the ‘DL approach to the estimation of long-run effects’. Under the usual assumption $|\lambda_i| < 1$
(the roots of $\lambda_i(L)$ fall strictly outside the unit circle), the coefficients of $\alpha_i(L)$ are exponentially
decaying, and in the absence of feedback effects from lagged values of $y_{it}$ onto the regressors $x_{it}$,
a consistent estimate of $\theta_i$ can be obtained directly based on the least squares regression of $y_{it}$ on
$x_{it}$, $\{\Delta x_{i,t-\ell}\}_{\ell=0}^{p_T}$ and a set of cross-sectional averages that deals with the effects of unobserved
common factors in $u_{it}$. The truncation lag-order $p_T = p(T)$ is chosen as a non-decreasing
function of $T$ such that $0 \le p_T < T$.
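The algebra behind the DL representation can be checked numerically. The sketch below makes several simplifying assumptions that are not in the text: a scalar regressor, an ARDL(1,0) model without intercept, and pre-sample values of the regressor set to zero so that all sums are finite. It verifies that $\sum_{s \ge 0} \lambda^s \beta x_{t-s}$ equals $\theta x_t$ minus a weighted sum of current and lagged first differences, with weights $a_\ell = \sum_{s > \ell} \lambda^s \beta = \beta \lambda^{\ell+1}/(1-\lambda)$. Function names are illustrative.

```python
import random


def ardl_distributed_lag(x, beta, lam):
    """sum_{s=0}^{t} lam**s * beta * x[t-s], taking x as zero before the sample."""
    return [sum(lam ** s * beta * x[t - s] for s in range(t + 1))
            for t in range(len(x))]


def level_and_differences(x, beta, lam):
    """theta * x[t] minus a weighted sum of current and lagged first differences."""
    theta = beta / (1.0 - lam)  # long-run coefficient
    # First differences, with the pre-sample value of x taken to be zero.
    dx = [x[0]] + [x[t] - x[t - 1] for t in range(1, len(x))]
    out = []
    for t in range(len(x)):
        # a_l = sum_{s=l+1}^{inf} lam**s * beta = beta * lam**(l + 1) / (1 - lam)
        adjustment = sum(beta * lam ** (l + 1) / (1.0 - lam) * dx[t - l]
                         for l in range(t + 1))
        out.append(theta * x[t] - adjustment)
    return out


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(30)]
lhs = ardl_distributed_lag(x, beta=0.5, lam=0.8)
rhs = level_and_differences(x, beta=0.5, lam=0.8)
max_gap = max(abs(a - b) for a, b in zip(lhs, rhs))
print(max_gap < 1e-9)
```

Because the weights $a_\ell$ decay geometrically in $\ell$, dropping the terms beyond a truncation lag introduces an error that shrinks quickly as the lag order grows.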
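The cross-section augmented distributed lag (CS-DL) mean group estimator of the long-run
coefficients is given by
$$
\hat{\theta}_{MG} = \frac{1}{N} \sum_{i=1}^{N} \hat{\theta}_i, \tag{29.65}
$$
where
$$
\hat{\theta}_i = (\tilde{X}_i' M_{qi} \tilde{X}_i)^{-1} \tilde{X}_i' M_{qi} \tilde{y}_i. \tag{29.66}
$$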
Estimators $\hat{\theta}_{MG}$ and $\hat{\theta}_{P}$ differ from the mean group and pooled CCE estimators developed in
Pesaran (2006) (see (29.37)–(29.38)), which only allow for the inclusion of a fixed number
of regressors, whilst the CS-DL type estimators include $p_T$ lags of $\Delta x_{it}$ and their cross-section
averages, where $p_T$ increases with $T$, albeit at a slower rate. Specifically, when $N, T, p_T \overset{j}{\to} \infty$
such that $\sqrt{N} p_T \rho^{p_T} \to 0$, for any constant $0 < \rho < 1$, and $p_T^3/T \to \kappa$, $0 < \kappa < \infty$,
Chudik et al. (2015) establish asymptotic normality of $\hat{\theta}_{MG}$ and $\hat{\theta}_{P}$ under the assumption of the
random coefficient model,
$$
\theta_i = \theta + \upsilon_i, \quad \upsilon_i \sim IID(0, \Omega_\theta), \quad \text{for } i = 1, 2, \ldots, N, \tag{29.68}
$$
where $\|\theta\| < K$, $\|\Omega_\theta\| < K$, and $\Omega_\theta$ is a $k \times k$ symmetric nonnegative definite matrix. The rate of
convergence is $\sqrt{N}$ (due to coefficient heterogeneity) and the asymptotic variance can be estimated
non-parametrically along similar lines as in Section 29.4.2 (see (29.40) and (29.42)).
Monte Carlo evidence presented in Chudik et al. (2015) suggests that the DL approach
often has better small sample performance than the ARDL approach when T is in the range
30 ≤ T < 100. The advantage of the DL approach is its robustness to residual serial
correlation, breaks in error processes, and dynamic misspecifications. However, unlike the ARDL
approach, the DL procedure could be subject to simultaneity bias (when there are feedbacks
from lagged values of yit to the regressors in xit ). Nevertheless, the extensive Monte Carlo exper-
iments reported in Chudik et al. (2015) suggest that the endogeneity bias of the DL approach
is more than compensated for by its better small sample performance as compared with the
ARDL procedure when the time dimension is not very large. ARDL seems to dominate DL
only if the time dimension is sufficiently large and the underlying ARDL model is correctly
specified.
where ai and β i for i = 1, 2, . . . , N, are assumed to be fixed unknown coefficients, and xit is a
k-dimensional vector of regressors. We consider both cases where the regressors are strictly and
weakly exogenous, as well as when they include lagged values of yit .
15 Strictly speaking, the hypothesis being tested is zero cross-section correlations rather than independence of the errors,
although the two notions coincide in the case of linear or Gaussian models.
The literature on testing for error cross-sectional dependence in large panels follows two sep-
arate strands, depending on whether the cross-sectional units are ordered or not. In the case of
ordered data sets (which could arise when observations are spatial or belong to given economic
or social networks) tests of cross-sectional independence that have high power with respect to
such ordered alternatives have been proposed in the spatial econometrics literature. A prominent
example of such tests is Moran’s I test. See Moran (1948) with further developments by Anselin
(1988), Anselin and Bera (1998), Haining (2003), and Baltagi, Song, and Koh (2003).
In the case of cross-section observations that do not admit an ordering, tests of cross-sectional
dependence are typically based on estimates of pair-wise error correlations (ρ ij ) and are appli-
cable when T is sufficiently large so that relatively reliable estimates of ρ ij can be obtained. An
early test of this type is the Lagrange multiplier (LM) test of Breusch and Pagan (1980) which
tests the null hypothesis that all pair-wise correlations are zero, namely that $\rho_{ij} = 0$ for all $i \neq j$.
This test is based on the average of the squared estimates of pair-wise correlations, and under
standard regularity conditions it is shown to be asymptotically (as $T \to \infty$) distributed as $\chi^2$
with $N(N-1)/2$ degrees of freedom. The LM test tends to be highly over-sized in the case of
panels with relatively large $N$.
In what follows, we review the various attempts made in the literature to develop tests of cross-
sectional independence when N is large and the cross-sectional units are unordered. But before
proceeding further, we first need to consider the appropriateness of the null hypothesis of cross-
sectional ‘independence’ or ‘uncorrelatedness’, that underlies the LM test of Breusch and Pagan
(1980), namely that all $\rho_{ij}$ are zero for all $i \neq j$, when $N$ is large. The null that underlies the LM
test is sensible when N is small and fixed as T → ∞. But when N is relatively large and rising
with T, it is unlikely to matter if, out of the total N(N − 1)/2 pair-wise correlations, only a few
are non-zero. Accordingly, Pesaran (2015) argues that the null of cross-sectionally uncorrelated
errors, defined by
$$
H_0 : E(u_{it} u_{jt}) = 0, \quad \text{for all } t \text{ and } i \neq j, \tag{29.70}
$$
is restrictive for large panels and the null of a sufficiently weak cross-sectional dependence could
be more appropriate since the mere incidence of isolated error dependencies is of little conse-
quence for estimation or inference about the parameters of interest, such as the individual slope
coefficients, β i , or their average value, E(β i ) = β.
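Consider the panel data model (29.69), and let $\hat{u}_{it}$ be the OLS estimate of $u_{it}$ defined by
$$
\hat{u}_{it} = y_{it} - \hat{a}_i - \hat{\beta}_i' x_{it}, \tag{29.71}
$$
with $\hat{a}_i$ and $\hat{\beta}_i$ being the OLS estimates of $a_i$ and $\beta_i$, based on the $T$ sample observations, $(y_{it}, x_{it})$,
for $t = 1, 2, \ldots, T$. Consider the sample estimate of the pair-wise correlation of the residuals,
$\hat{u}_{it}$ and $\hat{u}_{jt}$, for $i \neq j$,
$$
\hat{\rho}_{ij} = \hat{\rho}_{ji} = \frac{\sum_{t=1}^{T} \hat{u}_{it} \hat{u}_{jt}}{\left( \sum_{t=1}^{T} \hat{u}_{it}^2 \right)^{1/2} \left( \sum_{t=1}^{T} \hat{u}_{jt}^2 \right)^{1/2}}.
$$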
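The pair-wise correlations can be computed directly from an $N \times T$ array of residuals, as in the following sketch. The function name and the toy residuals are illustrative. No demeaning is applied, on the assumption that each residual series comes from a regression that includes an intercept and therefore already sums to zero.

```python
import math


def pairwise_correlations(residuals):
    """Return rho_ij for all pairs i < j, in row-major order.

    `residuals` is a list of N lists, each holding the T residuals of one unit.
    """
    n_units = len(residuals)
    norms = [math.sqrt(sum(u * u for u in row)) for row in residuals]
    rhos = []
    for i in range(n_units - 1):
        for j in range(i + 1, n_units):
            cross = sum(a * b for a, b in zip(residuals[i], residuals[j]))
            rhos.append(cross / (norms[i] * norms[j]))
    return rhos


# Toy residuals: unit 3 is a multiple of unit 1, unit 2 is orthogonal to both.
u = [[1.0, -1.0, 1.0, -1.0],
     [1.0, -1.0, -1.0, 1.0],
     [2.0, -2.0, 2.0, -2.0]]
rho = pairwise_correlations(u)
print(rho)
```

The list has $N(N-1)/2$ entries, one for each distinct pair of units.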
In the case where $u_{it}$ is symmetrically distributed and the regressors are strictly exogenous,
under the null hypothesis of no cross-sectional dependence, $\hat{\rho}_{ij}$ and $\hat{\rho}_{is}$ are cross-sectionally
uncorrelated for all $i$, $j$ and $s$ such that $i \neq j \neq s$. This follows since
$$
E(\hat{\rho}_{ij} \hat{\rho}_{is}) = \sum_{t=1}^{T} \sum_{t'=1}^{T} E(\hat{\eta}_{it} \hat{\eta}_{it'} \hat{\eta}_{jt} \hat{\eta}_{st'}) = \sum_{t=1}^{T} \sum_{t'=1}^{T} E(\hat{\eta}_{it} \hat{\eta}_{it'}) E(\hat{\eta}_{jt}) E(\hat{\eta}_{st'}) = 0, \tag{29.72}
$$
where $\hat{\eta}_{it} = \hat{u}_{it} / \left( \sum_{t=1}^{T} \hat{u}_{it}^2 \right)^{1/2}$. Note that when $x_{it}$ is strictly exogenous for each $i$, $\hat{u}_{it}$, being a
linear function of $u_{it}$, for $t = 1, 2, \ldots, T$, will also be symmetrically distributed with zero means,
which ensures that $\hat{\eta}_{it}$ is also symmetrically distributed around its mean which is zero. Further,
under (29.70) and when $N$ is finite, it is known that (see Pesaran (2004))
$$
\sqrt{T} \hat{\rho}_{ij} \overset{a}{\sim} N(0, 1), \tag{29.73}
$$
for a given $i$ and $j$, as $T \to \infty$. The above result has been widely used for constructing tests based
on the sample correlation coefficient or its transformations. Noting that, from (29.73), $T \hat{\rho}_{ij}^2$ is
asymptotically distributed as a $\chi_1^2$, it is possible to consider the following test statistic
$$
CD_{LM} = \sqrt{\frac{1}{N(N-1)}} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \left( T \hat{\rho}_{ij}^2 - 1 \right). \tag{29.74}
$$
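A short sketch of the scaled LM statistic is given below. It assumes the scaling factor $\sqrt{1/(N(N-1))}$, which standardises the sum of $N(N-1)/2$ centred $\chi_1^2$ terms, and it also computes the unscaled Breusch and Pagan statistic $T \sum_{i<j} \hat{\rho}_{ij}^2$ for comparison. The function names and the correlations are hypothetical.

```python
import math


def breusch_pagan_lm(rhos, t_obs):
    """Unscaled LM statistic: T times the sum of squared pair-wise correlations."""
    return t_obs * sum(r * r for r in rhos)


def cd_lm(rhos, n_units, t_obs):
    """Scaled LM statistic built from the correlations of all pairs i < j."""
    n_pairs = n_units * (n_units - 1) // 2
    if len(rhos) != n_pairs:
        raise ValueError("expected N(N-1)/2 pair-wise correlations")
    scale = math.sqrt(1.0 / (n_units * (n_units - 1)))
    return scale * sum(t_obs * r * r - 1.0 for r in rhos)


# Hypothetical correlations for N = 3 units observed over T = 50 periods.
rhos = [0.1, -0.2, 0.3]
lm = breusch_pagan_lm(rhos, 50)
stat = cd_lm(rhos, 3, 50)
print(round(lm, 3), round(stat, 3))
```

The two statistics are linked by a simple centring and scaling: the scaled version subtracts the number of pairs from the unscaled statistic and divides by $\sqrt{N(N-1)}$.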
Based on the Euclidean norm of the matrix of sample correlation coefficients, (29.74) is a ver-
sion of the Lagrange multiplier test statistic due to Breusch and Pagan (1980). Frees (1995)
first explored the finite sample properties of the LM statistic, calculating its moments for fixed
values of T and N, under the normality assumption. He advanced a non-parametric version of
the LM statistic based on the Spearman rank correlation coefficient. Dufour and Khalaf (2002)
suggested applying Monte Carlo exact tests to correct the size distortions of $CD_{LM}$ in finite
samples. However, these tests, being based on the bootstrap method applied to $CD_{LM}$, are
computationally intensive, especially when $N$ is large.
An alternative adjustment to the LM test is proposed by Pesaran, Ullah, and Yamagata (2008),
where the LM test is centered to have a zero mean for a fixed $T$. These authors also propose a
correction to the variance of the LM test. The basic idea is generally applicable, but analytical
bias corrections can be obtained only under the assumption that the regressors, $x_{it}$, are strictly
exogenous and the errors, $u_{it}$, are normally distributed. Under these assumptions, Pesaran, Ullah,
and Yamagata (2008) show that the exact mean and variance of $(T-k) \hat{\rho}_{ij}^2$ are given by:
$$
\mu_{Tij} = E\left[ (T-k) \hat{\rho}_{ij}^2 \right] = \frac{1}{T-k} Tr\left[ E(M_i M_j) \right],
$$
$$
v_{Tij}^2 = Var\left[ (T-k) \hat{\rho}_{ij}^2 \right] = \left\{ Tr\left[ E(M_i M_j) \right] \right\}^2 a_{1T} + 2\, Tr\left\{ E\left[ (M_i M_j)^2 \right] \right\} a_{2T},
$$
where
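$$
a_{1T} = a_{2T} - \frac{1}{(T-k)^2}, \quad a_{2T} = 3 \left[ \frac{(T-k-8)(T-k+2) + 24}{(T-k+2)(T-k-2)(T-k-4)} \right]^2,
$$
$M_i = I_T - \tilde{X}_i (\tilde{X}_i' \tilde{X}_i)^{-1} \tilde{X}_i'$, and $\tilde{X}_i$ is the $T \times (k+1)$ matrix of observations on $(1, x_{it}')'$. The adjusted
LM statistic is now given by
$$
LM_{Adj} = \sqrt{\frac{2}{N(N-1)}} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \frac{(T-k) \hat{\rho}_{ij}^2 - \mu_{Tij}}{v_{Tij}}, \tag{29.75}
$$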
which is asymptotically $N(0, 1)$ under $H_0$ as $T \to \infty$ followed by $N \to \infty$. The asymptotic
distribution of $LM_{Adj}$ is derived under sequential asymptotics, but it might be possible to establish it
under the joint asymptotics following the method of proof in Schott (2005) or Pesaran (2015).
The application of the $LM_{Adj}$ test to dynamic panels or panels with weakly exogenous
regressors is further complicated by the fact that the bias corrections depend on the true values of the
unknown parameters and will be difficult to implement. The implicit null of LM tests when $T$ and
$N \to \infty$ jointly, rather than sequentially, could also differ from the null that
all pair-wise correlations are zero. To overcome some of these difficulties, Pesaran (2004) has proposed
a test that has exactly mean zero for fixed values of $T$ and $N$. This test is based on the average of
pair-wise correlation coefficients
$$
CD_P = \sqrt{\frac{2T}{N(N-1)}} \left( \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \hat{\rho}_{ij} \right). \tag{29.76}
$$
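The $CD_P$ statistic in (29.76) is simple to compute once the pair-wise correlations are available, as the following sketch shows. The function name and the correlations are hypothetical. The comparison with 1.96 assumes a two-sided test at the 5 per cent level based on the standard normal distribution.

```python
import math


def cd_p(rhos, n_units, t_obs):
    """CD statistic: sqrt(2T / (N(N-1))) times the sum of rho_ij over pairs i < j."""
    n_pairs = n_units * (n_units - 1) // 2
    if len(rhos) != n_pairs:
        raise ValueError("expected N(N-1)/2 pair-wise correlations")
    return math.sqrt(2.0 * t_obs / (n_units * (n_units - 1))) * sum(rhos)


# Hypothetical correlations for N = 3 units observed over T = 50 periods.
rhos = [0.1, -0.2, 0.3]
stat = cd_p(rhos, 3, 50)
reject = abs(stat) > 1.96  # assumed two-sided 5 per cent critical value
print(round(stat, 3), reject)
```

Because the statistic sums the correlations themselves rather than their squares, positive and negative correlations can offset each other, which is one way to see why the test targets the average degree of dependence rather than every pair-wise correlation.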
As established in (29.72), under the null hypothesis $\hat{\rho}_{ij}$ and $\hat{\rho}_{is}$ are uncorrelated for all
$i \neq j \neq s$, but they need not be independently distributed when $T$ is finite. Therefore, the standard
central limit theorems cannot be applied to the elements of the double sum in (29.76) when
$(N, T) \to \infty$ jointly, and as shown in Pesaran (2015, Theorem 2) the derivation of the limiting
distribution of the $CD_P$ statistic involves a number of complications. It is also important to bear
in mind that the implicit null of the test in the case of large $N$ depends on the rate at which $T$
expands with $N$. Indeed, as argued in Pesaran (2004), under the null hypothesis of $\rho_{ij} = 0$ for
all $i \neq j$, we continue to have $E(\hat{\rho}_{ij}) = 0$, even when $T$ is fixed, so long as $u_{it}$ are symmetrically
distributed around zero, and the $CD_P$ test continues to be valid.
Pesaran (2015) extends the analysis of the $CD_P$ test and shows that the implicit null of the test
is weak cross-sectional dependence. In particular, the implicit null hypothesis of the test depends
on the relative expansion rates of $N$ and $T$.¹⁶ Using the exponent of cross-sectional dependence,
$\alpha$, developed in Bailey, Kapetanios, and Pesaran (2015) and discussed above, Pesaran (2015)
shows that when $T = O(N^{\epsilon})$ for some $0 < \epsilon \le 1$, the implicit null of the $CD_P$ test is given by
$0 \le \alpha < (2 - \epsilon)/4$. This yields the range $0 \le \alpha < 1/4$ when $(N, T) \to \infty$ at the same rate
such that $T/N \to \kappa$ for some finite positive constant $\kappa$, and the range $0 \le \alpha < 1/2$ when $T$ is
16 Pesaran (2015) also derives the exact variance of the $CD_P$ test under the null of cross-sectional independence and
proposes a slightly modified version of the $CD_P$ test distributed exactly with mean zero and a unit variance.
small relative to N. For larger values of α, as shown by Bailey, Kapetanios, and Pesaran (2015),
α can be estimated consistently using the variance of the cross-section averages.
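The dependence of the implicit null on the relative expansion rate of $T$ can be tabulated with a one-line function. The sketch assumes the bound has the form $(2 - \epsilon)/4$ when $T = O(N^{\epsilon})$, which is consistent with the two limiting ranges quoted above. The function name is illustrative.

```python
def implicit_null_upper_bound(eps):
    """Upper bound on alpha under the implicit null when T = O(N**eps)."""
    if not 0.0 < eps <= 1.0:
        raise ValueError("eps must lie in (0, 1]")
    return (2.0 - eps) / 4.0


for eps in (1.0, 0.5, 0.1):
    print(eps, implicit_null_upper_bound(eps))
```

With $\epsilon = 1$ (that is, $N$ and $T$ growing at the same rate) the bound is 1/4, and it approaches 1/2 as $\epsilon$ falls towards zero.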
Monte Carlo experiments reported in Pesaran (2015) show that the CDP test has good small
sample properties for values of α in the range 0 ≤ α ≤ 1/4, even in cases where T is small
relative to N, as well as when the test is applied to residuals from pure autoregressive panels so
long as there are no major asymmetries in the error distribution.
Other statistics have also been proposed in the literature to test for zero contemporaneous
correlation in the errors of panel data model (29.69).17 Using results from the literature on spac-
ing discussed in Pyke (1965), Ng (2006) considers a statistic based on the qth differences of the
cumulative normal distribution associated to the N(N − 1)/2 pair-wise correlation coefficients
ordered from the smallest to the largest, in absolute value. Building on the work of John (1971),
and under the assumption of normal disturbances, strictly exogenous regressors, and homoge-
neous slopes, Baltagi, Feng, and Kao (2011) propose a test of the null hypothesis of sphericity,
defined by
$$
H_0^{BFK} : u_t \sim IIDN(0, \sigma_u^2 I_N),
$$
where Ŝ is the N ×N sample covariance matrix, computed using the fixed-effects residuals under
the assumption of slope homogeneity, β i = β. Under H0BFK , errors uit are cross-sectionally inde-
pendent and homoskedastic and the JBFK statistic converges to a standardized normal distribu-
tion as (N, T) → ∞ such that N/T → κ for some finite positive constant κ. The rejection of
H0BFK could be caused by cross-sectional dependence, heteroskedasticity, slope heterogeneity,
and/or non-normal errors. Simulation results reported in Baltagi, Feng, and Kao (2011) show
that this test performs well in the case of homoskedastic, normal errors, strictly exogenous regres-
sors, and homogeneous slopes, although it is oversized for panels with large N and small T, and is
sensitive to non-normality of disturbances. The joint assumption of homoskedastic errors and
homogeneous slopes is quite restrictive in applied work, and therefore the use of the $J_{BFK}$ statistic as
a test of cross-sectional dependence should be approached with care.
A slightly modified version of the $CD_{LM}$ statistic, given by
$$
LM_S = \sqrt{\frac{T+1}{N(N-1)(T-2)}} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \left[ (T-1) \hat{\rho}_{ij}^2 - 1 \right], \tag{29.78}
$$
has also been considered by Schott (2005), who shows that when the $LM_S$ statistic is
computed based on normally distributed observations, as opposed to panel residuals, it converges to
$N(0, 1)$ under $\rho_{ij} = 0$ for all $i \neq j$ as $(N, T) \to \infty$ such that $N/T \to \kappa$ for some $0 < \kappa < \infty$.
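The $LM_S$ statistic differs from $CD_{LM}$ only in its centring and scaling, which the following sketch makes explicit. It assumes the scaling factor $\sqrt{(T+1)/(N(N-1)(T-2))}$ applied to the sum of $(T-1)\hat{\rho}_{ij}^2 - 1$ over all pairs. The function name and inputs are hypothetical.

```python
import math


def lm_s(rhos, n_units, t_obs):
    """Schott-type LM statistic from the correlations of all pairs i < j."""
    n_pairs = n_units * (n_units - 1) // 2
    if len(rhos) != n_pairs:
        raise ValueError("expected N(N-1)/2 pair-wise correlations")
    if t_obs <= 2:
        raise ValueError("T must exceed 2")
    scale = math.sqrt((t_obs + 1.0) / (n_units * (n_units - 1) * (t_obs - 2.0)))
    return scale * sum((t_obs - 1.0) * r * r - 1.0 for r in rhos)


# Hypothetical correlations for N = 3 units observed over T = 50 periods.
stat = lm_s([0.1, -0.2, 0.3], 3, 50)
print(round(stat, 3))
```

For large $T$ the factor $(T+1)/(T-2)$ tends to one and $(T-1)\hat{\rho}_{ij}^2$ is close to $T\hat{\rho}_{ij}^2$, so the two statistics differ mainly in short panels.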
Monte Carlo simulations reported in Jensen and Schmidt (2011) suggest that the LMS test has
good size properties for various sample sizes when applied to panel residuals in the case when
slopes are homogeneous and estimated using the fixed-effects approach. However, the LMS test
can lead to severe over-rejection when the slopes are in fact heterogeneous and the fixed-effects
estimators are used. Over-rejection of the LMS test could persist even if mean group estimates
are used in the computation of the residuals to take care of slope heterogeneity. This is because
for relatively small values of T, unlike the LMAdj statistic defined by (29.75), the LMS statistic
defined by (29.78) is not guaranteed to have a zero mean exactly.
The problem of testing for cross-sectional dependence in limited dependent variable panel
data models with strictly exogenous covariates has also been investigated by Hsiao, Pesaran, and
Pick (2012). In this paper the authors derive an LM test and show that in terms of the general-
ized residuals of Gourieroux et al. (1987), the test reduces to the LM test of Breusch and Pagan
(1980). However, not surprisingly as with the linear panel data models, the LM test based on
generalized residuals tends to over-reject in panels with large N. They then develop a CD type
test based on a number of different residuals, and using Monte Carlo experiments they find that
the CD test performs well for most combinations of N and T.
The existing literature on testing for error cross-sectional dependence, with the exception
of Sarafidis et al. (2009), has mostly focused on the case of strictly exogenous regressors. This
assumption is required for both LMAdj and JBFK tests, while Pesaran (2004) shows that the CDP
test is also applicable to autoregressive panel data models so long as the errors are symmetrically
distributed. The properties of the CDP test for dynamic panels that include weakly or strictly
exogenous regressors have not yet been investigated.
We conduct Monte Carlo experiments to investigate the performance of these tests in the case
of dynamic panels and to shed light also on the performance of LMS test in the case of panels with
heterogeneous slopes. We generate the dependent variable and the regressors in the same way
as described in Section 29.5.3 with the following two exceptions. First, we introduce lags of the
dependent variable in (29.60),
$$
y_{it} = c_{yi} + \lambda_i y_{i,t-1} + \beta_{0i} x_{it} + \beta_{1i} x_{i,t-1} + u_{it}, \tag{29.79}
$$
and generate $\lambda_i$ as $IIDU(0, 0.8)$. As discussed in Chudik and Pesaran (2015a), the lagged
dependent variable coefficients, $\lambda_i$, and the feedback coefficients, $\alpha_{xi}$, in (29.61) need to be chosen
so as to ensure that the variances of $y_{it}$ remain bounded. We generate $\alpha_{xi}$ as $IIDU(0, 0.35)$, which
ensures that this condition is met and $E(\alpha_{xi}) = 0.35/2$. For comparison purposes, we also
consider the case of strictly exogenous regressors where we set $\lambda_i = \alpha_{xi} = 0$ for all $i$. The second
exception is the generation of the reduced form errors. In order to consider different options for
cross-sectional dependence, we use the following residual factor model to generate the errors uit
$$
\gamma_i = v_{\gamma i}, \quad \text{for } i = 1, 2, \ldots, M_\alpha,
$$
$$
\gamma_i = 0, \quad \text{for } i = M_\alpha + 1, M_\alpha + 2, \ldots, N,
$$
where $M_\alpha = [N^{\alpha}]$, $v_{\gamma i} \sim IIDU[\mu_v - 0.5, \mu_v + 0.5]$. We set $\mu_v = 1$, and consider four values
for the exponent of the cross-sectional dependence of the errors, namely $\alpha = 0, 0.25, 0.5$, and
0.75. We also consider the following combinations of N ∈ {40, 50, 100, 150, 200}, and T ∈
{20, 50, 100, 150, 200}, and use 2,000 replications for all experiments.
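The way the exponent $\alpha$ controls the number of non-zero loadings can be sketched as follows. The code assumes that $[N^{\alpha}]$ denotes the integer part and draws the non-zero loadings from the uniform distribution on $[\mu_v - 0.5, \mu_v + 0.5]$. The function name, the seed, and the small tolerance added before flooring are illustrative choices.

```python
import math
import random


def factor_loadings(n_units, alpha, mu_v=1.0, rng=None):
    """Loadings with the first [N**alpha] units non-zero and the rest set to zero."""
    rng = rng or random.Random()
    # Small tolerance guards against N**alpha landing just below an integer.
    m_alpha = math.floor(n_units ** alpha + 1e-9)
    return [rng.uniform(mu_v - 0.5, mu_v + 0.5) if i < m_alpha else 0.0
            for i in range(n_units)]


for alpha in (0.0, 0.25, 0.5, 0.75):
    gamma = factor_loadings(100, alpha, rng=random.Random(1))
    print(alpha, sum(1 for g in gamma if g != 0.0))
```

With $N = 100$ the number of units loading on the factor rises from 1 at $\alpha = 0$ to 31 at $\alpha = 0.75$, so even the largest exponent considered leaves most units unaffected by the factor.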
Table 29.4 presents the findings for the CDp , LMAdj and LMS tests. The rejection rates for
JBFK in all cases, including the cross-sectionally independent case of α = 0, were all close to 100
per cent, in part due to the error variance heteroskedasticity, and are not included in Table 29.4.
Panel A of Table 29.4 reports the test results for the case of strictly exogenous regressors, and
Panel B gives the results for the case of weakly exogenous regressors. We see that the CDP test
continues to perform well even when the panel data model contains a lagged dependent variable
and other weakly exogenous regressors, for the combination of N and T samples considered.
The results also confirm the theoretical finding discussed above that shows the implicit null of
the CDP test to be 0 ≤ α ≤ 0.25. In contrast, the LMAdj test tends to over-reject when the panel
includes dynamics and T is small compared with N. For N = 200 and T = 20 the rejection
rate is 14.25 per cent.18 Furthermore, the MC results also suggest that the LMAdj test has power
when the cross-sectional dependence is very weak, namely in the case when the exponent of
cross-sectional dependence is α = 0.25. LMS also over-rejects when T is small relative to N,
but the over-rejection is much more severe than with the LMAdj test, since in the weakly
exogenous regressor case the test statistic is not centered at zero for a fixed T.
The over-rejection of the JBFK test in these experiments is caused by a combination of several
factors, including heteroskedastic errors and heterogeneous coefficients. In order to distinguish
between these effects, we also conducted experiments with homoskedastic errors where we set
$Var(\varepsilon_{it}) = \sigma_i^2 = 1$, for all $i$, and strictly exogenous regressors (by setting $\alpha_{xi} = 0$ for all $i$),
and consider two cases for the coefficients: heterogeneous and homogeneous (we set
$\beta_{0i} = E(\beta_{0i}) = 0.75$, for all $i$). The results under homoskedastic errors and homogeneous slopes are
summarized in the upper part of Table 29.5. As is to be expected, the $J_{BFK}$ test has good size and
power when T > 20 and α = 0. But the test tends to over-reject when T = 20 and N is
relatively large even under these restrictions. The bottom part of Table 29.5 presents findings for
the experiments with slope heterogeneity, whilst maintaining the assumptions of homoskedastic
errors and strictly exogenous regressors. We see that even a small degree of slope heterogeneity
can cause the JBFK test to over-reject badly.
Finally, it is important to bear in mind that even the CDP test is likely to over-reject in the
case of models with weakly exogenous regressors if N is much larger than T. Only in the case of
models with strictly exogenous regressors, and pure autoregressive models with symmetrically
distributed disturbances, would we expect the CDP test to perform well even if N is much larger
than T. To illustrate this property we provide empirical size and power results when N = 1,000
and T = 10 in Table 29.6. As can be seen, the CDP test has the correct size when we consider
panel data models with strictly exogenous regressors or in the case of pure AR(1) models. This
is in contrast to the case of panels with weakly exogenous regressors where the size of the CDP
test is close to 70 per cent. It is clear that the small sample properties of the CDP test for very
large N and small T panels very much depend on whether the panel includes weakly exogenous
regressors.
18 The rejection rates based on the LMAdj test were above 90 per cent for the sample sizes N = 500, 1000 and T = 10.
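Table 29.4 Size and power of CD and LM tests in the case of panels with weakly and strictly exogenous regressors (nominal size is set to 5 per cent)
Panel A: Strictly exogenous regressors
Columns: N, followed by rejection rates (per cent) for T = 20, 50, 100, 150, 200 under α = 0, then α = 0.25, then α = 0.5, then α = 0.75
CDP test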
40 5.65 6.25 5.15 5.15 4.95 6.20 6.50 6.20 6.25 6.75 24.60 46.70 74.60 86.35 91.95 99.50 100.00 100.00 100.00 100.00
50 5.40 4.80 5.30 5.15 5.40 5.05 5.15 6.85 7.25 6.50 28.55 52.60 78.85 90.45 96.10 99.70 100.00 100.00 100.00 100.00
100 5.45 5.45 5.40 4.40 5.45 5.10 6.15 6.75 6.60 8.20 32.50 60.15 82.55 92.95 97.45 99.95 100.00 100.00 100.00 100.00
150 4.80 4.75 4.65 4.95 5.05 5.05 5.70 5.85 5.15 6.10 31.90 56.45 83.45 92.80 97.50 100.00 100.00 100.00 100.00 100.00
200 5.85 4.70 5.25 6.60 4.50 6.00 5.80 5.30 5.55 6.40 30.00 57.60 83.65 94.15 97.95 100.00 100.00 100.00 100.00 100.00
LMAdj test
40 4.75 5.25 5.50 4.30 5.20 6.80 7.65 15.95 28.30 36.55 43.05 93.35 99.80 100.00 100.00 99.70 100.00 100.00 100.00 100.00
50 6.05 5.25 4.00 4.95 4.95 6.05 6.45 12.40 19.70 31.50 47.85 95.85 100.00 100.00 100.00 99.70 100.00 100.00 100.00 100.00
100 7.00 5.10 4.75 4.70 4.80 7.35 8.80 18.40 34.30 46.25 53.75 98.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00
150 6.55 4.85 5.10 5.15 5.40 7.45 6.70 11.95 18.55 28.65 49.85 98.55 99.95 100.00 100.00 100.00 100.00 100.00 100.00 100.00
200 7.75 4.95 5.15 3.90 5.10 8.75 6.45 8.50 13.25 19.00 52.05 98.70 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00
LMS test
40 11.35 5.30 4.70 5.70 5.30 15.75 10.85 20.10 28.15 41.50 63.35 95.65 99.70 99.95 100.00 99.80 100.00 100.00 100.00 100.00
50 17.65 6.70 5.90 5.40 4.90 18.90 11.40 15.65 21.05 31.45 73.85 97.05 99.95 99.90 100.00 99.95 100.00 100.00 100.00 100.00
100 44.70 9.40 5.80 6.15 6.15 49.70 19.80 24.80 40.00 51.05 88.65 99.20 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00
150 67.25 14.30 7.25 7.30 5.45 70.40 25.85 19.35 25.20 35.40 94.15 99.70 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00
200 85.10 24.45 9.45 6.55 6.60 85.75 31.10 19.85 22.05 26.05 98.20 99.85 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00
Table 29.4 Continued
CDP test
40 5.90 4.75 5.55 5.05 4.90 6.00 6.10 5.70 6.40 5.90 26.00 49.75 72.55 84.80 93.10 99.80 100.00 100.00 100.00 100.00
50 6.35 5.00 5.15 5.60 4.40 5.90 6.75 6.00 5.95 6.45 28.90 55.00 77.90 90.50 96.00 99.75 100.00 100.00 100.00 100.00
100 6.55 5.50 5.05 5.10 3.95 6.85 6.85 6.80 6.75 8.15 31.75 59.15 81.30 93.65 97.10 100.00 100.00 100.00 100.00 100.00
150 7.55 5.90 4.50 5.60 4.35 8.30 5.75 6.80 5.25 6.45 34.30 56.15 80.95 94.15 97.00 100.00 100.00 100.00 100.00 100.00
200 8.10 4.75 5.05 5.65 4.60 10.30 6.00 7.25 6.40 6.45 35.75 61.10 83.20 94.05 98.45 100.00 100.00 100.00 100.00 100.00
LMAdj test
40 5.05 4.70 5.25 4.10 5.85 6.40 6.85 15.25 26.70 38.65 32.60 92.00 99.80 99.95 100.00 99.40 100.00 100.00 100.00 100.00
50 5.80 4.90 4.70 4.90 4.90 5.30 6.15 12.10 20.80 30.85 35.65 95.55 99.85 100.00 100.00 99.65 100.00 100.00 100.00 100.00
100 6.45 5.60 5.05 4.70 4.80 7.85 7.50 18.45 31.05 47.00 36.40 98.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00
150 11.55 5.95 5.30 5.20 3.85 10.35 6.50 10.35 19.65 28.60 31.60 97.65 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00
200 14.25 5.50 5.35 4.90 5.25 12.85 6.00 8.20 11.55 19.25 31.55 98.80 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00
LMS test
40 15.40 6.10 5.40 4.50 5.10 17.00 10.90 19.45 28.35 41.70 61.30 94.85 99.85 99.85 100.00 99.65 100.00 100.00 100.00 100.00
50 18.60 6.55 5.60 4.25 5.05 22.25 10.25 14.25 22.80 31.70 72.80 97.35 99.95 100.00 100.00 99.80 100.00 100.00 100.00 100.00
100 50.25 10.55 6.60 5.10 5.55 55.60 21.65 26.35 37.95 54.40 88.20 99.25 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00
150 77.65 18.25 7.70 6.25 6.95 77.70 28.20 18.75 27.70 34.95 95.40 99.55 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00
200 89.60 29.55 11.90 6.90 6.30 87.95 36.65 19.95 23.40 25.85 98.70 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00
Notes: Observations are generated using the equations $y_{it} = c_{yi} + \lambda_i y_{i,t-1} + \beta_{0i} x_{it} + \beta_{1i} x_{i,t-1} + u_{it}$ and $x_{it} = c_{xi} + \alpha_{xi} y_{i,t-1} + \gamma_{xi} f_t + v_{it}$ (see (29.79) and (29.61),
respectively), and $u_{it} = \gamma_i g_t + \varepsilon_{it}$ (see (29.80)). Four values of α = 0, 0.25, 0.5 and 0.75 are considered. Null of weak cross-sectional dependence is characterized by α = 0
and α = 0.25. In the case of panels with strictly exogenous regressors $\lambda_i = \alpha_{xi} = 0$, for all i. For a more detailed account of the MC design see Section 29.7. LMS test statistic
is computed using the fixed-effects estimates.
Table 29.5 Size and power of the JBFK test in the case of panel data models with strictly exogenous regressors and homoskedastic idiosyncratic shocks (nominal
40 7.85 5.60 5.60 5.20 5.90 21.85 53.80 79.40 86.65 92.50 82.70 99.30 100.00 100.00 100.00 99.70 100.00 100.00 100.00 100.00
50 8.90 5.90 6.00 6.10 4.20 17.85 44.90 73.75 83.75 89.10 84.35 99.90 100.00 100.00 100.00 99.85 100.00 100.00 100.00 100.00
100 9.70 6.10 5.65 5.30 5.50 19.35 52.30 81.30 91.90 95.55 88.25 100.00 100.00 100.00 100.00 99.95 100.00 100.00 100.00 100.00
150 15.00 5.90 5.30 5.10 5.60 14.65 39.60 69.80 83.95 91.00 87.95 99.95 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00
200 21.30 6.60 5.30 4.60 5.60 15.90 27.45 58.70 75.45 84.55 87.10 99.95 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00
40 7.30 9.10 13.70 22.10 31.80 22.15 55.15 83.90 93.30 96.45 81.40 99.60 100.00 100.00 100.00 99.75 100.00 100.00 100.00 100.00
50 7.60 8.80 18.20 30.90 40.45 18.65 53.25 80.95 92.45 96.95 85.45 99.85 100.00 100.00 100.00 99.85 100.00 100.00 100.00 100.00
100 9.40 16.85 42.05 65.20 83.10 21.40 65.65 94.80 99.20 99.90 88.75 100.00 100.00 100.00 100.00 99.90 100.00 100.00 100.00 100.00
150 12.65 24.70 60.80 86.25 96.25 17.35 62.30 94.10 99.60 99.95 88.45 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00
200 15.20 36.80 78.90 95.65 99.00 16.05 64.85 96.15 99.75 99.90 87.75 99.95 100.00 100.00 100.00 99.95 100.00 100.00 100.00 100.00
Notes: The data generating process is the same as the one used to generate the results in Table 29.4 with strictly exogenous regressors, but with two exceptions: error variances
are assumed homoskedastic ($Var(\varepsilon_{it}) = \sigma_i^2 = 1$, for all i) and two possibilities are considered for the slope coefficients: heterogeneous and homogeneous (in the latter case
$\beta_{i0} = E(\beta_{i0}) = 0.75$, for all i). Null of weak cross-sectional dependence is characterized by α = 0 and α = 0.25. See also the notes to Table 29.4. The JBFK test statistic is computed
using the fixed-effects estimates.
Table 29.6 Size and power of the CD test for large N and short T panels with
strictly and weakly exogenous regressors (nominal size is set to 5 per cent)
Notes: See the notes to Tables 29.3 and 29.4, and Section 29.7 for further details. In par-
ticular, note that null of weak cross-sectional dependence is characterized by α = 0 and
α = 0.25, with alternatives of semi-strong and strong cross-sectional dependence given
by values of α ≥ 1/2.
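for $t = \underline{t}, \underline{t}+1, \ldots, \bar{t}$, where $\mathcal{N} = \cap_{t=\underline{t}}^{\bar{t}} \mathcal{N}_t$ and the starting and ending points of the sample, $\underline{t}$
and $\bar{t}$, are chosen to maximize the use of data subject to the constraint $\#\mathcal{N} \geq N_{\min}$. The second
possibility utilizes data in a more efficient way,
\[ \bar{y}_t = \frac{1}{\#\mathcal{N}_t} \sum_{i \in \mathcal{N}_t} y_{it}, \quad \text{and} \quad \bar{x}_t = \frac{1}{\#\mathcal{N}_t} \sum_{i \in \mathcal{N}_t} x_{it}, \]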
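for $t = \underline{t}, \underline{t}+1, \ldots, \bar{t}$, where $\underline{t}$ and $\bar{t}$ are chosen such that $\#\mathcal{N}_t \geq N_{\min}$ for all $t = \underline{t}, \underline{t}+1, \ldots, \bar{t}$.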
Both procedures are likely to perform similarly when #N is reasonably large, and the occurrence
of missing observations is random. In cases where new cross-sectional units are added to the
panel over time and such additions can have systematic influences on the estimation outcomes,
it might be advisable to de-mean or de-trend the observations for individual cross-sectional units
before computing the cross-section averages to be used in the CCE regressions.
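The two ways of forming cross-section averages in an unbalanced panel can be sketched as follows. This is only an illustrative sketch under simplifying assumptions: missing observations are coded as NaN, the sample window is taken to be the whole sample rather than being chosen to maximize data use, and the function name `cross_section_averages` and the toy data are hypothetical.

```python
import numpy as np

def cross_section_averages(y, n_min):
    """Cross-section averages of an N x T array with NaN for missing values.

    Returns (fixed_avg, varying_avg):
      fixed_avg   - averages over the units observed in every period
      varying_avg - averages over the units observed in each period t
    """
    y = np.asarray(y, dtype=float)
    observed = ~np.isnan(y)

    # First possibility: the same fixed set of units in all periods.
    fixed_set = observed.all(axis=1)
    if fixed_set.sum() < n_min:
        raise ValueError("fewer than n_min units are observed in every period")
    fixed_avg = y[fixed_set].mean(axis=0)

    # Second possibility: all units available in period t.
    counts = observed.sum(axis=0)
    if counts.min() < n_min:
        raise ValueError("some period has fewer than n_min observed units")
    varying_avg = np.nansum(y, axis=0) / counts
    return fixed_avg, varying_avg

# Toy panel: unit 2 is missing in the second period.
y = np.array([[1.0, 2.0, 3.0],
              [3.0, np.nan, 5.0],
              [5.0, 6.0, 10.0]])
fixed_avg, varying_avg = cross_section_averages(y, n_min=2)
```

In the toy panel the two averages differ only in the last period, where the unit with a gap is used by the second approach but not by the first.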
Now suppose that the cross-section coverage differs for each variable. For example, the depen-
dent variable can be available only for OECD countries, whereas some of the regressors could be
available for a larger set of countries. Then it is preferable to utilize also data on non-OECD coun-
tries to maximize the number of units for the computation of cross-section averages for each of
the individual variables.
The CD and LM tests can also be readily extended to unbalanced panels. Denote by Ti , the set
of dates over which time series observations on yit and xit are available for the ith individual, and
denote the number of elements in the set by #Ti . For each i compute the OLS residuals based
on the full set of available time series observations. As before, denote these residuals by ûit , for
t ∈ Ti , and compute the pair-wise correlations of ûit and ûjt using the common set of data points
in Ti ∩ Tj . Since in such cases the estimated residuals need not sum to zero over the common
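sample period, $\rho_{ij}$ should be estimated by
\[ \hat{\rho}_{ij} = \frac{\sum_{t \in T_i \cap T_j} (\hat{u}_{it} - \bar{\hat{u}}_i)(\hat{u}_{jt} - \bar{\hat{u}}_j)}{\left[\sum_{t \in T_i \cap T_j} (\hat{u}_{it} - \bar{\hat{u}}_i)^2\right]^{1/2} \left[\sum_{t \in T_i \cap T_j} (\hat{u}_{jt} - \bar{\hat{u}}_j)^2\right]^{1/2}}, \]
where
\[ \bar{\hat{u}}_i = \frac{\sum_{t \in T_i \cap T_j} \hat{u}_{it}}{\#(T_i \cap T_j)}. \]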
The CD (similarly the LM type) statistics for the unbalanced panel can then be computed as
usual by
\[ CD_P = \sqrt{\frac{2}{N(N-1)}} \left( \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \sqrt{T_{ij}}\, \hat{\rho}_{ij} \right), \qquad (29.81) \]
where $T_{ij} = \#(T_i \cap T_j)$.
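A minimal sketch of the unbalanced-panel CD computation in (29.81) is given below. It assumes that the residuals are stored in an N × T array with NaN marking dates not available for a unit; the function name `cd_unbalanced` and the rule of skipping pairs with fewer than two common dates are illustrative assumptions, not part of the text.

```python
import numpy as np

def cd_unbalanced(resid):
    """CD statistic (29.81) from an N x T residual array; NaN marks missing dates."""
    resid = np.asarray(resid, dtype=float)
    n_units = resid.shape[0]
    total = 0.0
    for i in range(n_units - 1):
        for j in range(i + 1, n_units):
            # Common sample T_i ∩ T_j for the pair (i, j).
            common = ~np.isnan(resid[i]) & ~np.isnan(resid[j])
            t_ij = int(common.sum())
            if t_ij < 2:
                continue  # assumption: such pairs are skipped
            # Residuals are de-meaned over the common sample only.
            ui = resid[i, common] - resid[i, common].mean()
            uj = resid[j, common] - resid[j, common].mean()
            denom = np.sqrt((ui @ ui) * (uj @ uj))
            if denom == 0.0:
                continue  # assumption: constant residuals carry no information
            total += np.sqrt(t_ij) * (ui @ uj) / denom
    return np.sqrt(2.0 / (n_units * (n_units - 1))) * total

# Two units, second unit observed for one extra date.
cd_example = cd_unbalanced([[1.0, 2.0, 3.0, np.nan],
                            [2.0, 4.0, 6.0, 5.0]])
```

With two perfectly correlated units and three common dates, the statistic reduces to the square root of the common sample size.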
29.10 Exercises
1. Consider the following ‘star’ model
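\[ z_t = R\varepsilon_t, \]
where $\varepsilon_t = (\varepsilon_{1t}, \ldots, \varepsilon_{Nt})'$, $\varepsilon_{it} \sim IID(0, \sigma_\varepsilon^2)$,
\[ R = \begin{pmatrix} 1 & 0 & \cdots & 0 & 0 \\ r_{21} & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ r_{N-1,1} & 0 & \cdots & 1 & 0 \\ r_{N1} & 0 & \cdots & 0 & 1 \end{pmatrix}, \]
and $\frac{1}{N-1}\sum_{i=2}^{N} |r_{i1}|$ is bounded away from zero. Prove that $Var(w'z_t) > 0$ for any N and as
N → ∞, where $w = (w_{i1}, w_{i2}, \ldots, w_{iN})'$ satisfies granularity conditions (29.1)–(29.2).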
2. Consider the single factor model
\[ u_{it} = \gamma_i f_t + \varepsilon_{it}, \quad i = 1, \ldots, N; \; t = 1, 2, \ldots, T, \qquad (29.82) \]
with $\varepsilon_{it} \sim IID(0, \sigma_\varepsilon^2)$, $f_t \sim IID(0, 1)$, and $\gamma_i$ are fixed coefficients. Write down $E(u_{it}^2)$ and
$E(u_{it}u_{jt})$, the elements of the covariance matrix, $\Sigma$, of $u_{.t} = (u_{1t}, u_{2t}, \ldots, u_{Nt})'$. Hence,
derive the largest eigenvalue of $\Sigma$ and check conditions for the $\{u_{it}\}$ process to be CSD.
3. Consider the single factor model (29.82), with εit ∼ IID(0, σ 2ε ) and ft ∼ IID(0, 1). Assume
that γ i for i = 1, 2, . . . , N are fixed coefficients.
(a) Find a set of weights, $w = (w_1, w_2, \ldots, w_N)'$, such that $Var(w'u_{.t}) > 0$, for all N.
(b) Derive the correlation matrix of $u_{.t} = (u_{1t}, u_{2t}, \ldots, u_{Nt})'$, and use the elements of this
correlation matrix to write an expression for the statistics $CD_P$ and $\bar{\rho}$ (see (29.76)).
(c) Find conditions on the loadings, $\gamma_i$, which ensure $\bar{\rho} \neq 0$, even if N → ∞.
where
xit = δi ft + vit
uit = γ i ft + ε it ,
ft is a covariance stationary unobserved common factor, and the errors uit , vit and ε it are seri-
ally and cross-sectionally independently distributed with zero means and finite variances σ 2iu ,
σ 2iv , and σ 2iε , respectively. Further assume that ft is distributed independently of these errors.
(a) Discuss the statistical properties of the pooled OLS estimator of β in the case where
δ i = 0. In particular show that the pooled OLS is unbiased but inefficient if δ i = 0.
(b) Derive an expression for the probability limit of the pooled OLS in the general case where
$\delta_i \neq 0$, as N and T tend to infinity either sequentially or jointly. Hence or otherwise show
that the pooled OLS estimator of β is inconsistent.
(c) Is SURE estimation likely to help under case (b)?
OUP CORRECTED PROOF – FINAL, 5/9/2015, SPi
30.1 Introduction
This chapter reviews econometric methods for linear panel data models that exhibit spatial
dependence. Spatial dependence may arise from local interaction of individuals, or from
unobserved characteristics that show persistence across space or over a network. In order to
measure spatial correlation, we need to specify the modalities of how agents interact and define
a metric of distance between individuals. ‘Local’ does not need to be specified in terms of phys-
ical space, but can be related to other types of metrics, such as the economic, policy, or social
distance.
A sizeable literature has analysed the role of interactions and externalities in several differ-
ent branches of economics, both at a theoretical and at an empirical level. For instance, a rich
literature in microeconomics explores the decision-making process of an agent embedded in a
system of social relations, where he/she can watch other agents’ actions (Bala and Goyal (2001),
Brock and Durlauf (2001)). A key finding is that local interaction may allow some forms of
behaviour to propagate to the entire population (Ellison (1993)). Recent studies in macroeconomics
have theorized the existence of strategic complementarities that produce aggregate fluctuations
in industrial market economies (Cooper and Haltiwanger (1996); Binder and Pesaran
(1998)). Factors such as input–output linkages, demand complementarities and human cap-
ital spillovers have been used to explain observed comovements not attributable to aggregate
shocks (Aoki (1996)). Finally, literature on endogenous growth has emphasized the importance
of linkages between countries in the analysis of regional income growth (Rivera-Batiz and Romer
(1991); Barro and Sala-i-Martin (2003); Arbia (2006)). According to this literature, relations
established with neighbouring regions, in the form of demand linkages, interacting labour mar-
kets and knowledge spillovers, also due to the increased economic integration between devel-
oped economies, are determinants of regional economic growth.
Spatial correlation can also be caused by a variety of measurement problems often encoun-
tered in applied work, or by the particular sampling scheme used to select units. An example is
the lack of concordance between the delineation of observed spatial units, such as the region
or the country, and the spatial scope of the phenomenon under study (Anselin (1988)). When
the sampling scheme is clustered, potential correlation may also arise between respondents
belonging to the same cluster. Indeed, units sharing observable characteristics such as location
or industry, may also have similar unobservable characteristics that would cause the regression
disturbances to be correlated (Moulton (1990), Pepper (2002)).
This chapter provides a survey of econometric methods proposed to deal with spatial depen-
dence in the context of linear panel data regression models.
\[ y_{it} = \alpha_i + \rho \sum_{j=1}^{N} w_{ij} y_{jt} + \beta' x_{it} + u_{it}, \quad i = 1, 2, \ldots, N; \; t = 1, 2, \ldots, T, \qquad (30.1) \]
where xit is a k × 1 vector of observed regressors on the ith cross-sectional unit at time t, uit is
the error term, and ρ and β are unknown parameters to be estimated. The group or individual
effects, α i , could be either considered fixed, unknown parameters to be estimated, or draws from
a probability distribution. For the time being, it is assumed that regressors are strictly exogenous.
The above specification is typically considered for representation of the equilibrium outcome
of a spatial or social interaction process in which the value of the dependent variable for one
individual is jointly determined with that of the neighbouring individual (Anselin, Le Gallo, and
Jayet (2007)).
It is now easily seen that estimation of ρ and β by least squares applied to (30.1) can lead to
biased and inconsistent estimates. It is sufficient to show that $Cov\left(\sum_{j=1}^{N} w_{ij} y_{jt}, u_{it}\right) \neq 0$ when
$\rho \neq 0$. To see this, it is convenient to rewrite the model in stacked form as
\[ y_{.t} = \alpha + \rho W y_{.t} + X_{.t}\beta + u_{.t}, \]
where $y_{.t} = (y_{1t}, y_{2t}, \ldots, y_{Nt})'$, $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_N)'$, $X_{.t} = (x_{1t}, x_{2t}, \ldots, x_{Nt})'$, $u_{.t} = (u_{1t},
u_{2t}, \ldots, u_{Nt})' \sim IID(0, D)$, and D is an N × N diagonal matrix with elements $0 < \sigma_i^2 < K$.
To solve the above model, we first need to establish conditions under which $I_N - \rho W$
is invertible. To this end note that the eigenvalues of $I_N - \rho W$ are given by $1 - \lambda(\rho W)$,
and $I_N - \rho W$ is invertible if $|\lambda_{\max}(\rho W)| < 1$, where $\lambda_{\max}(A)$ denotes the largest eigenvalue
of matrix A. This condition can also be written in terms of column and row norms of W.
Since $|\lambda_{\max}(\rho W)| \leq |\rho| \|W\|$, where $\|W\|$ is any matrix norm of W, then we also have that
$|\lambda_{\max}(\rho W)| \leq |\rho| \|W\|_1$ and $|\lambda_{\max}(\rho W)| \leq |\rho| \|W\|_\infty$, where $\|W\|_1$ and $\|W\|_\infty$ are,
respectively, the column and row matrix norms of W. Therefore, invertibility of $I_N - \rho W$ is
ensured if $|\rho| < \max(1/\|W\|_1, 1/\|W\|_\infty)$. This condition can also be written equivalently as
$|\rho| < 1/\tau^*$, where $\tau^* = \min(\|W\|_1, \|W\|_\infty)$ (see Kelejian and Prucha (2010)). Under this
condition we have
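\[ \begin{aligned} w_i'(I_N - \rho W)^{-1} D e_i &= w_i' D e_i + \rho w_i' W D e_i + \rho^2 w_i' W^2 D e_i + \ldots \\ &= \rho \sigma_i^2 \left( w_i' W e_i + \rho w_i' W^2 e_i + \ldots \right). \end{aligned} \]
1 Under condition $|\rho| < \max(1/\|W\|_1, 1/\|W\|_\infty)$, we have $(I_N - \rho W)^{-1} = I_N + \rho W + \rho^2 W^2 + \ldots$. See also
Section 29.3.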
But $w_i' W e_i = \sum_{\ell=1}^{N} w_{i\ell} w_{\ell i}$ and given that $w_{ij} \geq 0$, then $w_i' W^j e_i \geq 0$ for j = 1, 2, . . . . Hence,
it must follow that $Cov(w_i' y_{.t}, u_{it}) > 0$ if ρ > 0 and $\sum_{\ell=1}^{N} w_{i\ell} w_{\ell i} \neq 0$. The last condition
holds if there are non-zero elements in the ith column and row of W. Also asymptotically (as
N → ∞) we need to have $\lim_{N \to \infty} \sum_{\ell=1}^{N} w_{i\ell} w_{\ell i} > 0$, which rules out the possibility of
spatial weights being granular, namely $w_{ij} = O(N^{-1})$. In such a case, $\sum_{\ell=1}^{N} w_{i\ell} w_{\ell i} = O(N^{-1})$
and $\lim_{N \to \infty} Cov(w_i' y_{.t}, u_{it}) = 0$, for each i.
We can therefore conclude that, in the case of non-granular spatial weights and assuming that
$|\rho| < \max(1/\|W\|_1, 1/\|W\|_\infty)$, conventional estimators of parameters ρ and β (such as
pooled OLS or FE) are inconsistent, and alternative estimation approaches, such as maximum
likelihood and generalized method of moments, are needed for consistent estimation of spatial
lag models.
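The covariance between the spatial lag and the error can be evaluated numerically as the ith diagonal element of $W(I_N - \rho W)^{-1}D$. The sketch below is illustrative only: the 'ring' and equal-weights matrices, the value ρ = 0.4, unit error variances, and all function names are assumptions chosen for the illustration.

```python
import numpy as np

def ring_weights(n):
    """Row-standardized weights linking each unit to its two neighbours on a circle."""
    w = np.zeros((n, n))
    idx = np.arange(n)
    w[idx, (idx + 1) % n] = 0.5
    w[idx, (idx - 1) % n] = 0.5
    return w

def equal_weights(n):
    """Granular weights: w_ij = 1/(n-1) for j != i, so that w_ij = O(1/n)."""
    return (np.ones((n, n)) - np.eye(n)) / (n - 1)

def spatial_lag_error_cov(w, rho, sigma2):
    """Cov(w_i' y_t, u_it) for each i: the diagonal of W (I - rho W)^{-1} D."""
    n = w.shape[0]
    # Invertibility condition |rho| < 1 / min(column norm, row norm).
    tau_star = min(np.linalg.norm(w, 1), np.linalg.norm(w, np.inf))
    if abs(rho) >= 1.0 / tau_star:
        raise ValueError("rho violates the invertibility condition")
    d = np.diag(sigma2)
    return np.diag(w @ np.linalg.inv(np.eye(n) - rho * w) @ d)

ring_cov = spatial_lag_error_cov(ring_weights(10), 0.4, np.ones(10))
small_cov = spatial_lag_error_cov(equal_weights(50), 0.4, np.ones(50))
large_cov = spatial_lag_error_cov(equal_weights(400), 0.4, np.ones(400))
```

Under these assumptions the covariance stays well away from zero for the ring weights, whereas for the equal (granular) weights it shrinks as N increases.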
where the notation is as above. There are a few main approaches to assigning a spatial structure
to the error term, $u_{.t}$;2 the intent is to represent the covariance as a simpler, lower dimensional
matrix than the unconstrained version.
One way is to define the covariance between two observations directly as a function of the
distance between them. Accordingly, the covariance matrix for the cross-section at time t is
$E(u_{.t} u_{.t}') = f(\theta, W)$, where θ is a parameter vector, and f is a suitable distance decay function,
such as the negative exponential (Dubin (1988), Cressie (1993); see also Example 75). The
decaying function suggests that the disturbances should become uncorrelated when the dis-
tance separating the observations is sufficiently large. One shortcoming of this method is that it
requires the specification of a functional form for the distance decay, which is subject to a degree
of arbitrariness.
An alternative strategy consists of specifying a spatial process for the error term, which relates
each unit to its neighbours through W. The most widely used model is the spatial autoregres-
sive (SAR) specification. Proposed by Cliff and Ord (1969) and Cliff and Ord (1981), the SAR
process is a variant of the model introduced by Whittle (1954)
2 As we shall see in Section 30.4.3, if α is assumed to be random, then a spatial structure could also be assigned to it.
Other spatial processes suggested to model spatial error dependence, although less used in
the empirical literature, are the spatial moving average (SMA) and the spatial error compo-
nent (SEC) specifications. The first, proposed by Haining (1978) (see also Huang (1984)),
assumes that
where $\psi_{.t} = (\psi_{1t}, \psi_{2t}, \ldots, \psi_{Nt})'$ and $\psi_{it} \sim IID(0, \sigma_\psi^2)$. The covariance matrix induced by
this model is
\[ \Sigma_{SEC} = \delta^2 \sigma_\psi^2 WW' + \sigma_\varepsilon^2 I_N. \]
A major distinction between the SAR and the other two specifications is that under SAR there is
an inverse involved in the covariance matrix. This has important consequences on the range of
dependence implied by its covariance matrix. Indeed, even if W contains few non-zero elements,
the covariance structure induced by the SAR is not sparse, linking all the units in the system to
each other, so that a perturbation in the error term of one unit will be ultimately transmitted
to all other units. Conversely, for the SMA and SEC, the off-diagonal non-zero elements of the
covariance matrix are those corresponding to the non-zero elements in W.
Conventional panel estimators introduced in Chapter 26, such as the fixed-effects (FE) or random
effects (RE) estimators of slope coefficients in equation (30.3) with spatially dependent
errors, are $\sqrt{NT}$-consistent under broad regularity conditions and strictly exogenous regressors.
However, these estimators are in general not efficient since the covariance of errors is non-diagonal
and the elements along its main diagonal are in general not constant.
u.t = Rε .t , (30.7)
where $R = (r_{ij})$ is an N × N matrix, and $\varepsilon_{.t} \sim IID(0, D)$, with $D = diag(\sigma_{\varepsilon 1}^2, \sigma_{\varepsilon 2}^2, \ldots, \sigma_{\varepsilon N}^2)$,
$\sigma_{\max}^2 = \sup_i \sigma_{\varepsilon i}^2 < K$. For example, for an invertible SAR process $R = (I_N - \rho W)^{-1}$, while
in the case of a SMA process, we have $R = I_N + \delta W$. It is now easily seen that (see Section A.10
in Appendix A)
\[ \lambda_1(\Sigma) = \lambda_1(RDR') \leq \sup_i \sigma_{\varepsilon i}^2 \|RR'\|_1 \leq \sigma_{\max}^2 \|R\|_1 \|R'\|_1 = \sigma_{\max}^2 \|R\|_1 \|R\|_\infty. \]
Hence assuming that R has bounded row and column matrix norms, namely $\|R\|_\infty$ and
$\|R\|_1 < K$, then $\lambda_1(\Sigma)$ is also bounded in N. Under these conditions spatial processes lead
to cross-sectional dependence that is weak (see Chudik, Pesaran, and Tosetti (2011), and Section 29.2).
For the SMA process $\|R\|_\infty$ and $\|R\|_1 < K$ if the spatial weights, W, have bounded row
and column matrix norms. For SAR models it is further required that $|\rho| < \max(1/\|W\|_1,
1/\|W\|_\infty)$. In the case where W is row and column standardized the latter condition reduces
to |ρ| < 1.
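The eigenvalue bound can be illustrated numerically. The following sketch is only an illustration under assumed inputs: a row- and column-standardized 'ring' weights matrix, ρ = δ = 0.5, and error variances between 0.5 and 2; the helper names are hypothetical.

```python
import numpy as np

def ring_weights(n):
    """Row- and column-standardized weights on a circle (illustrative choice)."""
    w = np.zeros((n, n))
    idx = np.arange(n)
    w[idx, (idx + 1) % n] = 0.5
    w[idx, (idx - 1) % n] = 0.5
    return w

def largest_eig_and_bound(r, sigma2):
    """Largest eigenvalue of R D R' and the bound sigma2_max * ||R||_1 * ||R||_inf."""
    cov = r @ np.diag(sigma2) @ r.T
    lam1 = np.linalg.eigvalsh(cov)[-1]  # eigvalsh returns eigenvalues in ascending order
    bound = sigma2.max() * np.linalg.norm(r, 1) * np.linalg.norm(r, np.inf)
    return lam1, bound

results = {}
for n in (20, 80, 320):
    w = ring_weights(n)
    sigma2 = np.linspace(0.5, 2.0, n)
    r_sar = np.linalg.inv(np.eye(n) - 0.5 * w)  # SAR: R = (I - rho W)^{-1}
    r_sma = np.eye(n) + 0.5 * w                 # SMA: R = I + delta W
    results[n] = (largest_eig_and_bound(r_sar, sigma2),
                  largest_eig_and_bound(r_sma, sigma2))
```

For this design the bound does not change with N, so the largest eigenvalue of the error covariance matrix remains bounded as the cross-section dimension grows.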
It is also interesting to observe that, under these conditions, the above process can be represented
by a factor process with an infinite number of weak factors, and no idiosyncratic error,
by setting $u_{it} = \sum_{j=1}^{N} \gamma_{ij} f_{jt}$, where $\gamma_{ij} = r_{ij}$, and $f_{jt} = \varepsilon_{jt}$, for i, j = 1, . . . , N. Under the
bounded column and row norms of R, the loadings in the above factor structure satisfy condition
(29.12) in Chapter 29, and hence $u_{it}$ will be a cross-sectionally weakly dependent (CWD)
process.
30.4 Estimation
30.4.1 Maximum likelihood estimator
The theoretical properties of quasi-maximum likelihood (ML) estimator in a single cross-
sectional framework have been studied by Ord (1975), Anselin (1988), and Lee (2004), among
others. More recently, considerable work has been undertaken to investigate the properties of
ML estimators in panel data contexts, in the presence of spatial dependence and unobserved,
time-invariant heterogeneity (Elhorst (2003); Baltagi, Song, and Koh (2003); Baltagi, Egger,
and Pfaffermayr (2013); and Lee and Yu (2010a)).
where the spatial lags in the dependent variable and in the error term are constructed using two
(possibly different) spatial weights matrices, W1 and W2 . Suppose that the group effects, α i ,
are treated as fixed and unknown parameters, and that εit ∼ IID(0, σ 2ε ). Lee and Yu (2010a)
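propose a transformation of the above model to get rid of the fixed-effects, and then use ML
to estimate the remaining parameters, ρ, β, δ and $\sigma_\varepsilon^2$. Specifically, the authors suggest multiplying
all variables by a T × (T − 1) matrix, P, having as columns the (T − 1) eigenvectors
associated with the non-zero eigenvalues of the deviation from the mean transformation,
$M_T = I_T - \tau_T(\tau_T'\tau_T)^{-1}\tau_T'$, where $\tau_T$ is a T-dimensional vector of ones.
Let $Z = (z_{.1}, z_{.2}, \ldots, z_{.T})$
be an N × T matrix of variables and let $Z^* = ZP$, with $Z^* = (z_{.1}^*, z_{.2}^*, \ldots, z_{.,T-1}^*)$, be the corresponding
transformed matrix of variables. It is easily seen that $\tau_T' P = 0$, so that such a transformation
removes the individual-specific intercepts. The transformed model is
\[ y_{.t}^* = \rho W_1 y_{.t}^* + X_{.t}^* \beta + u_{.t}^*, \quad u_{.t}^* = \delta W_2 u_{.t}^* + \varepsilon_{.t}^*, \quad t = 1, 2, \ldots, T-1. \]
After the transformation, the effective sample size reduces to N(T − 1), and, since $P'P = I_{T-1}$,
the new error term, $\varepsilon_{.t}^*$, has uncorrelated elements, that is, $E(\varepsilon_{.t}^* \varepsilon_{.t}^{*\prime}) = \sigma_\varepsilon^2 I_N$. The log-likelihood
function of the transformed model is
\[ \ell(\theta) = -\frac{N(T-1)}{2} \ln(2\pi\sigma_\varepsilon^2) + (T-1)\left[ \ln|I_N - \rho W_1| + \ln|I_N - \delta W_2| \right] - \frac{1}{2\sigma_\varepsilon^2} \sum_{t=1}^{T-1} \varepsilon_{.t}^{*\prime} \varepsilon_{.t}^*, \qquad (30.12) \]
where $\theta = (\rho, \beta', \delta, \sigma_\varepsilon^2)'$, and $\varepsilon_{.t}^* = (I_N - \delta W_2)\left[(I_N - \rho W_1) y_{.t}^* - X_{.t}^* \beta\right]$. Subject to some
identification conditions, the estimator of θ obtained by maximizing (30.12) is consistent and
asymptotically normal when N or T, or both, tend to infinity. See Lee and Yu (2010a).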
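The properties of the transformation matrix P can be checked numerically. The sketch below builds P from the eigenvectors of $M_T$ and applies it to simulated data; the function name `within_eigvec_transform`, the dimensions, and the simulated inputs are assumptions made purely for illustration.

```python
import numpy as np

def within_eigvec_transform(t):
    """T x (T-1) matrix whose columns are the eigenvectors of M_T with unit eigenvalues."""
    tau = np.ones((t, 1))
    m_t = np.eye(t) - tau @ tau.T / t
    # M_T is symmetric and idempotent: one zero eigenvalue and T-1 unit eigenvalues.
    # eigh returns eigenvalues in ascending order, so the first column is dropped.
    _, eigvec = np.linalg.eigh(m_t)
    return eigvec[:, 1:]

t_periods, n_units = 5, 4
p = within_eigvec_transform(t_periods)

rng = np.random.default_rng(0)
alpha = rng.normal(size=(n_units, 1))           # individual-specific intercepts
noise = rng.normal(size=(n_units, t_periods))   # remaining variation
z = alpha + noise                               # N x T matrix of observations
z_star = z @ p                                  # N x (T-1) transformed observations
```

Because $\tau_T' P = 0$, the transformed data `z_star` coincide with `noise @ p`, so the intercepts drop out, and $P'P = I_{T-1}$ leaves the covariance of an uncorrelated homoskedastic error unchanged.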
where $\mu = (\mu_1, \mu_2, \ldots, \mu_N)'$, and it is assumed that $\mu_i \sim IID(0, \sigma_\mu^2)$, and $\varepsilon_{it} \sim IID(0, \sigma_\varepsilon^2)$.
The above model, by distinguishing between time-invariant spatial error spillovers and spatial
spillovers of transitory shocks, encompasses various econometric specifications proposed in the
literature as special cases. If the same spatial process applies to α and u.t (i.e., δ = γ and W2 =
W3 ), this model reduces to that proposed by Kapoor, Kelejian, and Prucha (2007); if γ = 0, it
simplifies to that considered by Anselin (1988) and Baltagi, Song, and Koh (2003).
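\[ y = \rho (I_T \otimes W_1) y + X\beta + v, \qquad (30.17) \]
\[ v = (\tau_T \otimes A^{-1})\mu + (I_T \otimes B^{-1})\varepsilon, \qquad (30.18) \]
where $y = (y_{.1}', y_{.2}', \ldots, y_{.T}')'$, $X = (X_{.1}', X_{.2}', \ldots, X_{.T}')'$, $v = (v_{.1}', v_{.2}', \ldots, v_{.T}')'$, $\varepsilon = (\varepsilon_{.1}', \varepsilon_{.2}',
\ldots, \varepsilon_{.T}')'$, $A = (I_N - \gamma W_2)$, $B = (I_N - \delta W_3)$. The covariance matrix of v is
\[ \Sigma_v = \sigma_\mu^2 \left[ \tau_T \tau_T' \otimes (A'A)^{-1} \right] + \sigma_\varepsilon^2 \left[ I_T \otimes (B'B)^{-1} \right], \qquad (30.19) \]
and applying a set of lemmas by Magnus (1982), the inverse and determinant of $\Sigma_v$ are
\[ \Sigma_v^{-1} = \frac{1}{T} \tau_T \tau_T' \otimes \left[ T\sigma_\mu^2 (A'A)^{-1} + \sigma_\varepsilon^2 (B'B)^{-1} \right]^{-1} + \frac{1}{\sigma_\varepsilon^2} M_T \otimes (B'B), \]
\[ |\Sigma_v| = \left| T\sigma_\mu^2 (A'A)^{-1} + \sigma_\varepsilon^2 (B'B)^{-1} \right| \cdot \left| \sigma_\varepsilon^2 (B'B)^{-1} \right|^{T-1}. \]
Thus, the log-likelihood function of the random effects (RE) model (30.13)-(30.16) is given by
\[ \begin{aligned} \ell(\theta) = &-\frac{NT}{2} \ln(2\pi) - \frac{1}{2} \ln\left| T\sigma_\mu^2 (A'A)^{-1} + \sigma_\varepsilon^2 (B'B)^{-1} \right| \\ &- \frac{T-1}{2} \ln\left| \sigma_\varepsilon^2 (B'B)^{-1} \right| + T \ln|I_N - \rho W_1| - \frac{1}{2\sigma_\varepsilon^2} v' \Sigma_v^{-1} v, \end{aligned} \qquad (30.20) \]
where $\theta = (\rho, \beta', \gamma, \sigma_\mu^2, \delta, \sigma_\varepsilon^2)'$, and $v = [I_T \otimes (I_N - \rho W_1)] y - X\beta$. Consistency of the ML
estimator of θ is established in Baltagi, Egger, and Pfaffermayr (2013). Under the Kapoor, Kelejian,
and Prucha (2007) RE specification, A = B, and the covariance matrix (30.19) reduces to
\[ \Sigma_v = \left[ (\sigma_\varepsilon^2 + T\sigma_\mu^2) \frac{1}{T} \tau_T \tau_T' + \sigma_\varepsilon^2 M_T \right] \otimes (A'A)^{-1}, \]
and its inverse and determinant simplify considerably. When some observations are missing at
random, selection matrices excluding missing observations may be used to obtain $|\Sigma_v|$ and $\Sigma_v^{-1}$.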
However, the computational burden in this case may be considerable even at medium-sized N
and small T (Pfaffermayr (2009)). A set of joint and conditional specification Lagrange multi-
plier (LM) tests for spatial effects within the RE framework are proposed by Baltagi, Egger, and
Pfaffermayr (2013). These statistics allow for testing of the model (30.13)–(30.16) against its
restricted counterparts: the Anselin model, the Kapoor, Kelejian, and Prucha model, and the ran-
dom effects model without spatial correlation (see also Baltagi, Song, and Koh (2003); Baltagi
and Liu (2008); and Baltagi and Yang (2013)).
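The closed-form inverse and determinant of the covariance matrix in (30.19) can be verified numerically against brute-force computations. The sketch below is illustrative only: the 'ring' weights matrix, the parameter values, and the function names are assumptions, and the same weights matrix is used for both error components purely for convenience.

```python
import numpy as np

def ring_weights(n):
    """Row-standardized weights on a circle (illustrative choice)."""
    w = np.zeros((n, n))
    idx = np.arange(n)
    w[idx, (idx + 1) % n] = 0.5
    w[idx, (idx - 1) % n] = 0.5
    return w

def sigma_v_pieces(a, b, sigma2_mu, sigma2_eps, t):
    """Return Sigma_v of (30.19), its closed-form inverse and closed-form log-determinant."""
    tau = np.ones((t, 1))
    jbar = tau @ tau.T / t            # (1/T) tau tau'
    m_t = np.eye(t) - jbar            # deviation-from-mean matrix M_T
    aa_inv = np.linalg.inv(a.T @ a)
    bb = b.T @ b
    bb_inv = np.linalg.inv(bb)
    sigma_v = (sigma2_mu * np.kron(tau @ tau.T, aa_inv)
               + sigma2_eps * np.kron(np.eye(t), bb_inv))
    core = t * sigma2_mu * aa_inv + sigma2_eps * bb_inv
    inv_formula = np.kron(jbar, np.linalg.inv(core)) + np.kron(m_t, bb) / sigma2_eps
    logdet_formula = (np.linalg.slogdet(core)[1]
                      + (t - 1) * np.linalg.slogdet(sigma2_eps * bb_inv)[1])
    return sigma_v, inv_formula, logdet_formula

n_units, t_periods = 5, 3
w = ring_weights(n_units)
a = np.eye(n_units) - 0.3 * w   # A = I - gamma W, with gamma = 0.3 (assumed)
b = np.eye(n_units) - 0.5 * w   # B = I - delta W, with delta = 0.5 (assumed)
sigma_v, inv_formula, logdet_formula = sigma_v_pieces(a, b, 0.8, 1.5, t_periods)
```

The practical gain is that the formulas require only operations on N × N matrices, whereas direct inversion works with the full NT × NT matrix.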
As in the non-spatial case, the choice between the FE spatial model and its RE counterpart
can be based on the Hausman test. The properties of a Hausman type specification test and an
LM statistic for testing the FE versus RE specification are studied by Lee and Yu (2010c).
Example 73 One of the earliest applications of the spatial RE model is by Case (1991), who studies
households’ demand for rice in Indonesia, using data on households within a set of districts. The
author considers a regression model with district-specific random errors, which is a special case of
model (30.13)–(30.16). Specifically, let $y_{i\ell}$ be the log of quantity of rice purchased by household i
living in district ℓ, with i = 1, 2, . . . , N, ℓ = 1, 2, . . . , M, and $y = (y_{11}, y_{12}, \ldots, y_{1N}, \ldots, y_{M1},
y_{M2}, \ldots, y_{MN})'$. Then the following specification is assumed for y
\[ y = \phi W_{NM} y + X\beta + u, \qquad (30.21) \]
\[ u = \lambda W_{NM} u + (1_N \otimes \varphi) + \varepsilon, \qquad (30.22) \]
where ε is an NM-dimensional vector with $\varepsilon_{it} \sim IID(0, \sigma_\varepsilon^2)$, and $\varphi = (\varphi_1, \varphi_2, \ldots, \varphi_N)'$ is
a vector of district-specific random effects uncorrelated with X. The log-likelihood function for the
above model is
\[ \ell(\theta) = -\frac{NT}{2} \ln(2\pi) + \ln|A| + \ln|B| - \ln|\Sigma_v| - \frac{1}{2\sigma_\varepsilon^2} v' \Sigma_v^{-1} v, \]
where $\theta = (\rho, \beta', \delta, \sigma_\varepsilon^2)'$, $A = I_{NM} - \lambda W_{NM}$, $B = (I_{NM} - \phi W_{NM})$ and
$v = A(By - X\beta)$, and $\Sigma_v$ is the covariance matrix of the error term $(\tau_N \otimes \varphi) + \varepsilon$. The model
is fitted to data, using a sample of 2089 households across 141 districts, and including as exogenous
regressors household expenditure per household member, the size of the household, the number of its
members above the age of 10, and the mean village log price of rice, fish, housing, and fuel. Results
are reported in Table 30.1. Note that empirical evidence strongly supports the presence of spatial
error correlation, while it is weaker when compared with the spatial lag effect model.
Example 74 (Prediction in spatial panels) Baltagi and Li (2006) consider the problem of pre-
diction in a panel data regression model with spatial correlation in the context of a simple demand
equation for liquor in the US, at state level. The authors consider the following panel data model
with SAR errors for real per capita consumption of liquor (expressed in logs), for t = 1, 2, . . . , T,
where the explanatory variables include the average retail price of a 750 ml bottle of Seagram's Seven
(a blended American whiskey) expressed in real terms, real per capita disposable income, and a
time trend. It is assumed that $\varepsilon_{it} \sim IID(0, \sigma_\varepsilon^2)$. The authors estimate the above model both under
RE and FE specifications. Under the RE hypothesis, $\alpha_i \sim IID(0, \sigma_\alpha^2)$, and it is easily seen that the
covariance matrix of $v = (v_{.1}', v_{.2}', \ldots, v_{.T}')'$ is
\[ \Sigma_v = \sigma_\alpha^2 \left( \tau_T \tau_T' \otimes I_N \right) + \sigma_\varepsilon^2 \left[ I_T \otimes (B'B)^{-1} \right], \]
where $B = I_N - \delta W$. The parameters β, $\sigma_\alpha^2$, δ, $\sigma_\varepsilon^2$ are then estimated by ML. Under the FE
framework, $\alpha_1, \alpha_2, \ldots, \alpha_N$ are treated as fixed unknown parameters and the authors estimate
Table 30.1 ML estimates of spatial models for household rice consumption in Indonesia
Model Estimates
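them by ML, jointly with other parameters (β, δ, and $\sigma_\varepsilon^2$). Hence, they consider the following best
linear unbiased predictor for the ith state at a future period T + S under the RE framework
\[ \hat{y}_{i,T+S} = \hat{\beta}_{RE}' x_{i,T+S} + T\theta \sum_{j=1}^{N} v_{ij} \hat{\varepsilon}_{j.}, \]
where $\theta = \hat{\sigma}_\alpha^2 / \hat{\sigma}_\varepsilon^2$, $v_{ij}$ is the (i, j)th element of $V^{-1}$, with $V = T\theta I_N + (B'B)^{-1}$, and
$\hat{\varepsilon}_{j.} = T^{-1} \sum_{t=1}^{T} \hat{\varepsilon}_{jt}$. Under the FE framework, the predictor is
\[ \hat{y}_{i,T+S} = \hat{\beta}_{FE}' x_{i,T+S} + \hat{\alpha}_{i,FE}, \]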
Estimation results
Notes:
a The numbers in parentheses are standard errors.
b The F-test for $H_0: \mu = 0$ in the FE model is F(42, 1029) = 165.79, with p = 0.000.
c The Breusch–Pagan test for $H_0: \sigma_\mu^2 = 0$ in the RE model is 97.30, with p = 0.000.
d The Hausman test based on FE and RE yields a $\chi_3^2$ of 3.36, with p = 0.339.
RMSE of forecasts
where α̂ i,FE and β̂ FE are estimated by ML. The predictive performance is then compared using data
on forty-three States over the period 1965–1994. ML estimates and the RMSE of out-of-sample
forecasts of various estimators are reported in Table 30.2. Note that overall, both the FE and RE
estimators perform well in predicting liquor demand, while adding spatial correlation in the model
does not improve prediction except for the first year. See Baltagi and Li (2006) and Baltagi, Bresson,
and Pirotte (2012) for further discussion of forecasting with spatial panel data models.
In a single cross-sectional setting, Kelejian and Robinson (1993) and Kelejian and Prucha
(1998) propose a simple IV strategy to deal with the endogeneity of the spatially lagged depen-
dent variable, Wy.t , that consists of using as instruments the spatially lagged (exogenous) explana-
tory variables, WX.t (see Section 10.8 and the example therein). As shown by Mutl and Pfaffer-
mayr (2011), the IV approach can be easily adapted to spatial panel data either with fixed or
random effects (Wooldridge (2003)). The reader is also referred to Lee (2003) for a discussion
on the choice of optimal instruments.
GMM estimation of spatial regression models in a single cross-sectional setting was originally
advanced by Kelejian and Prucha (1999). The authors focus on a regression equation with SAR
disturbances, and suggest the use of three moment conditions that exploit the properties of dis-
turbances implied by a standard set of assumptions. Estimation consists of solving a nonlinear
optimization problem, which yields a consistent estimator under a number of regularity conditions.
Considerable work has been carried out to extend this procedure in various directions. Liu,
Lee, and Bollinger (2006) and Lee and Liu (2006) suggest a set of moments that encompass
Kelejian and Prucha conditions as special cases. They focus on a spatial lag model with SAR dis-
turbances (30.4) and T = 1, and consider a vector of linear and quadratic conditions in the error
term, where the matrices appearing in the quadratic forms have bounded row and column norms
(see also Lee (2007)). In panel (30.3), assuming $\alpha_i$ are fixed parameters, consider r quadratic
moments of the type
\[ M_\ell(\delta) = \frac{1}{NT} \sum_{t=1}^{T} E\left( \varepsilon_{.t}' A_\ell \varepsilon_{.t} \right), \quad \ell = 1, 2, \ldots, r, \qquad (30.25) \]
where $\varepsilon_{.t} = (I_N - \delta W) u_{.t}$, and $A_\ell$, for ℓ = 1, 2, . . . , r are non-stochastic matrices having
bounded row and column sum matrix norms. Lee and Liu (2006) note that the matrices $A_\ell$ have
zero diagonal elements, so that $M_\ell(\delta) = 0$. Interestingly, this assumption renders the GMM
procedure robust to unknown, cross-sectional heteroskedasticity. The empirical counterpart of
(30.25) is obtained by dropping the expectation operator and replacing $u_{.t}$ by a consistent estimator
(e.g., the IV estimator).
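To see why zero-diagonal matrices make the quadratic moments robust to heteroskedasticity, note that for independent zero-mean errors with variances $\sigma_i^2$ the expectation of the quadratic form equals $\sum_i a_{ii}\sigma_i^2$, which is zero whenever the diagonal elements vanish. The sketch below illustrates this with simulated errors; the weights matrix, the two choices of moment matrix, the variances, and the function names are all assumptions made for the illustration.

```python
import numpy as np

def ring_weights(n):
    """Row-standardized weights on a circle; the diagonal is zero (illustrative choice)."""
    w = np.zeros((n, n))
    idx = np.arange(n)
    w[idx, (idx + 1) % n] = 0.5
    w[idx, (idx - 1) % n] = 0.5
    return w

def expected_quadratic_moment(a, sigma2):
    """E(eps' A eps) / N for independent zero-mean eps_i with variances sigma2_i."""
    return np.trace(a @ np.diag(sigma2)) / a.shape[0]

def sample_quadratic_moment(a, eps):
    """(NT)^{-1} sum_t eps_t' A eps_t for an N x T array eps."""
    n, t = eps.shape
    return np.einsum("it,ij,jt->", eps, a, eps) / (n * t)

n_units, t_periods = 50, 200
w = ring_weights(n_units)
w2 = w @ w
a1 = w                              # zero diagonal by construction
a2 = w2 - np.diag(np.diag(w2))      # W^2 with its diagonal removed
sigma2 = np.linspace(0.5, 2.0, n_units)   # heteroskedastic variances (assumed)

rng = np.random.default_rng(0)
eps = rng.standard_normal((n_units, t_periods)) * np.sqrt(sigma2)[:, None]
sample_moments = [sample_quadratic_moment(a, eps) for a in (a1, a2)]
population_moments = [expected_quadratic_moment(a, sigma2) for a in (a1, a2)]
```

The population moments are exactly zero whatever the values in `sigma2`, and the sample moments are close to zero for moderately large N and T.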
Lee and Liu (2006) focus on the problem of selecting the matrices appearing in the vector of
linear and quadratic moment conditions, in order to obtain the lowest variance for the GMM esti-
mator. Lee and Liu (2010) extend this framework to estimate the SAR model with higher-order
spatial lags. Kelejian and Prucha (2010) generalize their original work to include spatial lags in
the dependent variable and allow for heteroskedastic disturbances. This setting is extended by
Kapoor, Kelejian, and Prucha (2007) to estimate a spatial panel regression model with group
error components, and by Moscone and Tosetti (2011) for a panel with fixed-effects. Druska
and Horrace (2004) have introduced the Kelejian and Prucha GMM within the framework of a
panel with SAR disturbances, time dummies and time varying spatial weights, while Fingleton
(2008a, 2008b) has extended it to the case of a regression with spatial moving average distur-
bances. Egger, Larch, Pfaffermayr, and Walde (2009) compare the small sample properties of
ML and GMM estimators, observing that they perform similarly under the assumption of nor-
mally and non-normally distributed homoskedastic disturbances. However, one advantage of
the GMM procedure over ML is that it is computationally simpler, especially when dealing with
unbalanced panels (Egger, Pfaffermayr, and Winner (2005)).
Example 75 (Spatial price competition) Spatial methods have been widely used to study firm
competition across space, under the assumption that markets are limited in extent. One important
example is the study by Pinkse, Slade, and Brett (2002) on competition in petrol prices in the US.
The authors consider the following model for the price of the ith product
\[
p_i = \sum_{j=1}^{N} g(d_{ij})\, p_j + \beta' x_i + \varepsilon_i, \quad i = 1, 2, \ldots, N,
\]
where $g(\cdot)$ is a function of distance $d_{ij}$, measuring the influence of distance on the strength of
competition between products $i$ and $j$, and $x_i$ is an $h$-dimensional vector of observed demand and cost
variables. It is further assumed that $d_{ij}$ depends on a discrete measure, $d_{ij}^{D}$, taking a finite number of
different values, $D$, and a vector of continuous distance measures, $d_{ij}^{C}$, so that
\[
g(d_{ij}) = \sum_{r=1}^{D} I\left(d_{ij}^{D} = r\right) g_r\left(d_{ij}^{C}\right),
\]
where $I(d_{ij}^{D} = r)$ is an indicator function, and it is assumed that
$g_r(d_{ij}^{C}) = \sum_{\ell=1}^{\infty} \alpha_{r\ell}\, e_{r\ell}(d_{ij}^{C})$, where
$\alpha_{r\ell}$ are unknown coefficients, and $e_{r\ell}(\cdot)$ form a basis of the function space to which $g_r(\cdot)$ belongs.
Setting $e_\ell(d_{ij}) = \sum_{r=1}^{D} I(d_{ij}^{D} = r)\, e_{r\ell}(d_{ij}^{C})$, and $\alpha_\ell$ the corresponding coefficients, it follows that
\[
g(d_{ij}) = \sum_{\ell=1}^{\infty} \alpha_\ell\, e_\ell(d_{ij}).
\]
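Truncating the expansion at $L_N$ terms, the price equation can be written as
\[
p_i = \sum_{\ell=1}^{L_N} \sum_{j=1}^{N} \alpha_\ell\, e_\ell(d_{ij})\, p_j + \beta' x_i + v_i,
\]
where
\[
v_i = \varepsilon_i + \sum_{\ell=L_N+1}^{\infty} \sum_{j=1}^{N} \alpha_\ell\, e_\ell(d_{ij})\, p_j,
\]
or, in matrix form,
\[
p = Z\alpha + X\beta + v, \tag{30.26}
\]
where $Z$ is an $N \times L_N$ matrix with a generic $(i, \ell)$th element given by $\sum_{j=1}^{N} e_\ell(d_{ij})\, p_j$. Note that the
dimension of $Z$ increases with $N$, and the disturbances $v$, containing neglected expansion terms, are
correlated with the dependent variable, $p$. Hence, Pinkse, Slade, and Brett (2002) suggest using
an IV approach to estimate $\theta = (\alpha', \beta')'$ in equation (30.26). The authors propose using,
as instruments for $\sum_{j=1}^{N} e_\ell(d_{ij})\, p_j$, $\ell = 1, 2, \ldots, L_N$, the spatial variables $\sum_{j=1}^{N} e_\ell(d_{ij})\, x_{jh}$, for
$\ell = 1, 2, \ldots, L_N$ and $h = 1, 2, \ldots, H$, where $x_{jh}$ is the observation on the $h$th regressor. Each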
\[
\hat{g}(d_{ij}) = \sum_{\ell=1}^{L_N} \hat{\alpha}_\ell\, e_\ell(d_{ij}).
\]
Pinkse, Slade, and Brett (2002) establish consistency and asymptotic normality of the above esti-
mator, and provide OLS and IV estimates of the above model using data on prices at 312 terminals
in the US in 1993.
This model is stable if |γ | + |ρ| + |λ| < 1 assuming that the spatial weight matrix, W, is row
and column standardized. Yu, de Jong, and Lee (2008) derive ML estimators for the fixed-effects
specification of the above model and show that when T is large relative to N, the ML estimators
are consistent and asymptotically normal. But if $\lim_{N,T\to\infty} N/T > 0$, the limit distribution of
the ML estimators is not centered around 0, in which case the authors propose a bias corrected
estimator. See Yu, de Jong, and Lee (2008); and Lee and Yu (2010b) for further details. IV and
GMM estimation of a stationary spatiotemporal model is considered in Kukenova and Monteiro
(2009).
by a spatial process of the form (30.7), such as SAR, SMA, or SEC, where ε .t follows a covariance
stationary process. Pesaran and Tosetti (2011) focus on estimation of the cross-section means
of parameters, β = E(β i ), in model (30.28), by fixed-effects (FE) and mean group (MG) esti-
mators, introduced in Chapters 26 and 28, respectively. In the general case of equation (30.28)
these estimators are
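\[
\hat{\beta}_{FE} = \left(\sum_{i=1}^{N} X_{i.}' M_D X_{i.}\right)^{-1} \sum_{i=1}^{N} X_{i.}' M_D\, y_{i.}, \tag{30.29}
\]
\[
\hat{\beta}_{MG} = N^{-1} \sum_{i=1}^{N} \left(X_{i.}' M_D X_{i.}\right)^{-1} X_{i.}' M_D\, y_{i.}, \tag{30.30}
\]
where $M_D = I_T - D(D'D)^{-1}D'$, and $D = (d_1, d_2, \ldots, d_n)$. Pesaran and Tosetti (2011)
show that, under general regularity conditions, as $(N, T) \overset{j}{\to} \infty$, for the FE estimator, $\hat{\beta}_{FE}$,
we have
\[
\sqrt{N}\left(\hat{\beta}_{FE} - \beta\right) \overset{d}{\to} N(0, \Sigma_{FE}),
\]
where
\[
\Sigma_{FE} = Q^{-1} \Psi\, Q^{-1}, \tag{30.31}
\]
with
\[
Q = \operatorname*{Plim}_{N,T\to\infty}\; N^{-1} \sum_{i=1}^{N} \frac{X_{i.}' M_D X_{i.}}{T}, \tag{30.32}
\]
\[
\Psi = \operatorname*{Plim}_{N,T\to\infty}\; N^{-1} \sum_{i=1}^{N} \left(\frac{X_{i.}' M_D X_{i.}}{T}\right) \Omega_{\upsilon} \left(\frac{X_{i.}' M_D X_{i.}}{T}\right).
\]
While for the MG estimator, $\hat{\beta}_{MG}$, given by (30.30), as $(N, T) \overset{j}{\to} \infty$ we have
\[
\sqrt{N}\left(\hat{\beta}_{MG} - \beta\right) \overset{d}{\to} N(0, \Sigma_{MG}),
\]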
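The asymptotic variances can be estimated non-parametrically by
\[
\widehat{\text{Asy.Var}}\left(\hat{\beta}_{FE}\right) = \frac{1}{N}\, Q_{NT}^{-1}\, \hat{\Psi}_{NT}\, Q_{NT}^{-1}, \tag{30.33}
\]
\[
\widehat{\text{Asy.Var}}\left(\hat{\beta}_{MG}\right) = \frac{1}{N(N-1)} \sum_{i=1}^{N} \left(\hat{\beta}_i - \hat{\beta}_{MG}\right)\left(\hat{\beta}_i - \hat{\beta}_{MG}\right)', \tag{30.34}
\]
where
\[
Q_{NT} = \frac{1}{N} \sum_{i=1}^{N} T^{-1} X_{i.}' M_D X_{i.}, \tag{30.35}
\]
\[
\hat{\Psi}_{NT} = \frac{1}{N-1} \sum_{i=1}^{N} \left(\frac{X_{i.}' M_D X_{i.}}{T}\right) \left(\hat{\beta}_i - \hat{\beta}_{MG}\right)\left(\hat{\beta}_i - \hat{\beta}_{MG}\right)' \left(\frac{X_{i.}' M_D X_{i.}}{T}\right).
\]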
One advantage of the above non-parametric variance estimators is that their computation does
not require a priori knowledge of the spatial arrangement of cross-sectional units. In a set of
Monte Carlo experiments, Pesaran and Tosetti (2011) show that misspecification of the spa-
tial weights matrix may lead to substantial size distortions in tests based on the ML or quasi-ML
estimators of β i (or β). Another advantage of using the above approach over standard spatial
techniques is that, while allowing for serially correlated errors, it does not entail information on
the time series processes underlying εit , so long as these processes are covariance stationary.
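As an illustration of how little the non-parametric MG variance estimator requires, the following minimal Python sketch computes the MG estimator and the variance formula (30.34) for the simplest case. It assumes, purely for illustration, a single regressor and a deterministic component consisting of an intercept only, so that $M_D$ amounts to demeaning over time; the function names and the small data set are hypothetical and are not taken from Pesaran and Tosetti (2011).

```python
from statistics import mean


def ols_slope(x, y):
    """Unit-specific slope from a regression of y on an intercept and x.

    Demeaning x and y plays the role of M_D when D is a vector of ones.
    """
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return sxy / sxx


def mean_group(panel):
    """panel is a list of (x_i, y_i) pairs, one pair per cross-section unit.

    Returns the MG estimate and its non-parametric variance estimate,
    sum_i (b_i - b_MG)^2 / (N (N - 1)), the scalar version of (30.34).
    """
    betas = [ols_slope(x, y) for x, y in panel]
    n = len(betas)
    beta_mg = sum(betas) / n
    var_mg = sum((b - beta_mg) ** 2 for b in betas) / (n * (n - 1))
    return beta_mg, var_mg


# Hypothetical three-unit panel with exact slopes 2, 1 and 3.
panel = [
    ([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 5.0, 7.0]),
    ([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]),
    ([0.0, 1.0, 2.0, 3.0], [0.0, 3.0, 6.0, 9.0]),
]
beta_mg, var_mg = mean_group(panel)
```

Note that neither a spatial weights matrix nor a model for the serial correlation of the errors appears anywhere in the calculation; only the dispersion of the unit-specific estimates is used.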
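where $\beta_t$, $\rho_t$ and $\delta_t$ are time varying parameters, and $\varepsilon_{.t}$ satisfies $E(\varepsilon_{.t}\varepsilon_{.s}') = \sigma_{ts} I_N$. Let $\Sigma$ be a
$T \times T$ positive definite matrix with elements $\sigma_{ts}$. ML or GMM techniques can be used to estimate
the above model. The log-likelihood function of (30.36)-(30.37) is
\[
\ell(\theta) = -\frac{NT}{2}\ln 2\pi - \frac{N}{2}\ln|\Sigma| + \sum_{t=1}^{T}\ln\left|I_N - \rho_t W_1\right| + \sum_{t=1}^{T}\ln\left|I_N - \delta_t W_2\right|
- \frac{1}{2}\, u'\left(\Sigma^{-1} \otimes I_N\right) u,
\]
where $\theta = \left(\rho_1, \ldots, \rho_T, \beta_1', \ldots, \beta_T', \delta_1, \ldots, \delta_T, \operatorname{vech}(\Sigma)'\right)'$, and $u = \left(u_{.1}', u_{.2}', \ldots, u_{.T}'\right)'$, with
$u_{.t} = (I_N - \delta_t W_2)\left[(I_N - \rho_t W_1)\, y_{.t} - X_{.t}\beta_t\right]$. A number of LM tests for the presence of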
spatial effects in the above specification are proposed by Mur, López, and Herrera (2010). See
also Baltagi and Pirotte (2010), who consider ML and GMM estimation of a SURE model with
spatial error of the SAR or SMA type, assuming that the remainder term of the spatial process
follows an error component structure.
where K1 (.) and K2 (.) satisfy a set of regularity conditions (see Kelejian and Prucha (2007)). A
Newey-West type SHAC estimator of the variance of the classic FE estimator (30.29) is given by
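\[
\widehat{\text{Asy.Var}}\left(\hat{\beta}_{FE}\right) = Q_{NT}^{-1}\left[\frac{1}{(NT)^2}\sum_{i,j=1}^{N}\sum_{t,s=1}^{T} K\left(\frac{\phi_{ij}}{\phi_N}, \frac{|t-s|}{m+1}\right)\tilde{x}_{it}\,\tilde{x}_{js}'\,\hat{u}_{it}\,\hat{u}_{js}\right] Q_{NT}^{-1}, \tag{30.38}
\]
where $Q_{NT} = \frac{1}{NT}\sum_{t=1}^{T} X_{.t}' M X_{.t}$, $\hat{u}_{it} = y_{it} - \hat{\alpha}_i - \hat{\beta}_{FE}' x_{it}$, and $\tilde{X}_{i.}' = X_{i.}' M$, with $X_{i.} =
(x_{i1}, x_{i2}, \ldots, x_{iT})'$, $\tilde{X}_{i.} = (\tilde{x}_{i1}, \tilde{x}_{i2}, \ldots, \tilde{x}_{iT})'$. For the MG estimator we have
\[
\widehat{\text{Asy.Var}}\left(\hat{\beta}_{MG}\right) = \frac{1}{(NT)^2}\sum_{i,j=1}^{N}\sum_{t,s=1}^{T} K\left(\frac{\phi_{ij}}{\phi_N}, \frac{|t-s|}{m+1}\right) w_{it}\, w_{js}'\,\hat{u}_{it}\,\hat{u}_{js}, \tag{30.39}
\]
where $w_{it}$ is the $t$th column of $W_{i.} = \left(T^{-1} X_{i.}' M_D X_{i.}\right)^{-1} X_{i.}' M_D$. One shortcoming of this method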
is that its finite sample properties may be quite poor when N or T are not sufficiently large
(Pesaran and Tosetti (2011)). An alternative strategy has been suggested by Bester, Conley, and
Hansen (2011), who, using results taken from Ibragimov and Müller (2010), propose dividing
the sample into groups so that group-level averages are approximately independent, and accordingly
suggest an HAC estimator based on a discrete group-membership metric. However, the
validity of this approach relies on the capacity of the researcher to construct groups whose averages
are approximately uncorrelated.
3 See Section 5.9 for a discussion of HAC estimators in the context of the time series literature.
Robinson (2007) considers smoothed nonparametric kernel regression estimation. Under
this approach, rather than employing mixing conditions, it is assumed that regression errors fol-
low a general linear process representation covering both weak (spatial) dependence as well as
dependence at longer ranges. Robinson (2007) establishes consistency of the Nadaraya-Watson
kernel estimate and derives its asymptotic distribution (see also Hallin, Lu, and Tran (2004)).
Spatial filtering techniques can also be used to control for spatial effects (Tiefelsdorf and Grif-
fith (2007)). Under this framework, spatial dependence in the regression is proxied by a linear
combination of a subset of eigenvectors of a matrix function of W. Hence, estimation is carried
out by least squares applied to auxiliary regressions where the observed regressors are augmented
with these artificial variables. The reader is referred to Tiefelsdorf and Griffith (2007) and Grif-
fith (2010) for further discussion.
where $S_0 = \sum_{i=1}^{N}\sum_{j=1}^{N} w_{ij}$, and $\hat{\rho}_{ij}$ is the sample pair-wise correlation coefficient computed
between fixed-effects residuals of units $i$ and $j$. The $CD_{P,Local}$ test is asymptotically normally
distributed. Similarly, it is possible to derive local versions of other tests proposed in the panel
literature such as the LM test given by (29.74). See also Pesaran, Ullah, and Yamagata (2008) and
Moscone and Tosetti (2009).
Robinson (2008) has proposed a general class of statistics that, like CDMoran , is based on
quadratic forms in OLS regression residuals, where the matrices appearing in the quadratic forms
satisfy certain sparseness conditions. The author shows that these statistics have a limiting chi-
square distribution under the null hypothesis of error independence. Special cases of this class
of statistics can be interpreted as ML tests directed against specific alternative hypotheses.
30.10 Exercises
1. Consider the simple SAR process
\[
u_{it} = 0.5\rho\left(u_{i-1,t} + u_{i+1,t}\right) + \varepsilon_{it}, \quad i = 2, \ldots, N-1;\ t = 1, 2, \ldots, T, \tag{30.42}
\]
where $\varepsilon_{it} \sim IID(0, \sigma_\varepsilon^2)$. Write down the spatial weights matrix for the above process and
derive the covariance matrix of $u_{.t} = (u_{1t}, u_{2t}, \ldots, u_{Nt})'$.
2. Derive the conditions under which the spatial process
where ε it ∼ IID(0, σ 2i ), wi = (wi1 , wi2 , . . . , wiN ) , and yt = (y1t , y2t , . . . , yNt ) , is cross-
sectionally weakly dependent.
3. Consider the simple SAR process (30.42)-(30.44), and assume that εit ∼ IIDN(0, σ 2ε ).
Write down the log-likelihood function for this process and derive the first- and second-order
conditions for estimation of ρ and σ 2ε .
4. Derive an expression for the conventional FE estimator for β in model (30.3) with SAR errors,
(30.4). Obtain its covariance matrix. Derive a set of conditions under which this estimator is
$\sqrt{NT}$-consistent.
31.1 Introduction
This chapter provides a review of the theoretical literature on testing for unit roots and
cointegration in panels where the time dimension (T), and the cross-section dimension (N)
are relatively large. In cases where N is large (say over 100) and T small (less than 50) the anal-
ysis can proceed only under restrictive assumptions such as dynamic homogeneity and/or local
cross-sectional dependence as in spatial autoregressive or moving average models. In cases where
N is small (say less than ten) and T is relatively large, standard time series techniques applied to
systems of equations, such as the seemingly unrelated regression equations (SURE), can be used
and the panel aspect of the data should not pose new technical difficulties.
One of the primary reasons behind the application of unit root and cointegration tests to a
panel of cross-section units was to gain statistical power and to improve on the poor power of
their univariate counterparts. This was supported by the application of what might be called
the first generation panel unit root tests to real exchange rates, output and inflation. For exam-
ple, the augmented Dickey–Fuller (1979) test, reviewed in Chapter 15, is typically not able to
reject the hypothesis that the real exchange rate is nonstationary. By contrast, panel unit root
tests applied to a collection of industrialized countries generally find that real exchange rates are
stationary, thereby lending empirical support to the purchasing power parity hypothesis (e.g.,
Coakley and Fuertes (1997) and Choi (2001)).
Testing the unit root and cointegration hypotheses by using panel data instead of individual
time series involves several additional complications. First, as seen in previous chapters, the anal-
ysis of panel data generally involves a substantial amount of unobserved heterogeneity, rendering
the parameters of the model cross-section specific. Second, the assumption of cross-sectional
independence is inappropriate in many empirical applications, particularly in the analysis of
real exchange rates mentioned above. To overcome these difficulties, variants of panel unit root
tests are developed that allow for different forms of cross-sectional dependence (see Section
31.4). Third, the panel test outcomes are often difficult to interpret if the null of the unit root
or cointegration is rejected. The best that can be concluded is that ‘a significant fraction of the
cross-section units is stationary or cointegrated’. Conventional panel tests do not provide explicit
guidance as to the size of this fraction or the identity of the cross-section units that are stationary
or cointegrated. To deal with this issue, recent studies have proposed methods for estimating the
fraction of non-stationary series in the panel, and for classifying the individual series into
stationary and non-stationary sets (see Pesaran (2012)). Fourth, with unobserved I(1) common
factors affecting some or all of the variables in the panel, it is also necessary to consider the pos-
sibility of cointegration between the variables across the groups (cross-section cointegration)
as well as within group cointegration (see Section 31.5). Finally, the asymptotic theory is con-
siderably more complicated due to the fact that the sampling design involves a time as well as a
cross-section dimension. For example, applying the usual Dickey–Fuller test to a panel data set
introduces a bias that is not present in the case of a univariate test. Furthermore, a proper limit
theory has to take account of the relationship between the increasing number of time periods
and cross-section units (see Phillips and Moon (1999)).
In comparison with panel unit root tests, the analysis of cointegration in panels is still at an
early stage of its development. So far the focus of the panel cointegration literature has been on
residual based approaches, although there have been a number of attempts at the development
of system approaches as well. As in the case of panel unit root tests, such tests are developed
based on homogeneous and heterogeneous alternatives. The residual based tests were developed
to guard against the ‘spurious regression’ problem that can also arise in panels when dealing with
I(1) variables. Such tests are appropriate when it is known a priori that at most there can be only
one within group cointegration in the panel. System approaches are required in more general
settings where more than one within group cointegrating relation might be present, and/or there
exist unobserved common I(1) factors.
Having established a cointegration relationship, the long-run parameters can be estimated
efficiently using techniques similar to those proposed in the case of single time series mod-
els. Specifically, fully-modified OLS procedures, the dynamic OLS estimator and estimators
based on a vector error correction representation were adapted to panel data structures. Most
approaches employ a homogeneous framework, that is, the cointegration vectors are assumed
to be identical for all panel units, whereas the short-run parameters are panel specific. Although
such an assumption seems plausible for some economic relationships (like the PPP hypothe-
sis mentioned above) there are other behavioural relationships (like the consumption function
or money demand), where a homogeneous framework seems overly restrictive. On the other
hand, allowing all parameters to be individual specific would substantially reduce the appeal of
a panel data study. It is therefore important to identify parameters that are likely to be similar
across panel units whilst at the same time allowing for sufficient heterogeneity of other parame-
ters. This requires the development of appropriate techniques for testing the homogeneity of a
sub-set of parameters across the cross-section units. When N is small relative to T, standard like-
lihood ratio based statistics can be used. Groen and Kleibergen (2003) provide an application.
Testing for parameter homogeneity in the case of large panels poses new challenges that require
further research (see Section 28.11).
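where the initial values, $y_{i0}$, are given, and the errors $\varepsilon_{it}$ are identically, independently distributed
across $i$ and $t$ with $E(\varepsilon_{it}) = 0$, $E(\varepsilon_{it}^2) = \sigma_i^2 < \infty$ and $E(\varepsilon_{it}^4) < \infty$. These processes can also be
written equivalently as simple Dickey–Fuller (DF) regressions
\[
\Delta y_{it} = -\phi_i \mu_i + \phi_i\, y_{i,t-1} + \varepsilon_{it}, \tag{31.2}
\]
where $\Delta y_{it} = y_{it} - y_{i,t-1}$, $\phi_i = \alpha_i - 1$. In further developments of the model it is also helpful
to write (31.1) or (31.2) in mean-deviations forms $\tilde{y}_{it} = \alpha_i \tilde{y}_{i,t-1} + \varepsilon_{it}$, where $\tilde{y}_{it} = y_{it} - \mu_i$.
The corresponding DF regression in $\tilde{y}_{it}$ is given by
\[
\Delta \tilde{y}_{it} = \phi_i\, \tilde{y}_{i,t-1} + \varepsilon_{it}. \tag{31.3}
\]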
Most panel unit root tests are designed to test the null hypothesis of a unit root for each individual
series in a panel. Accordingly, the null hypothesis of interest is
H0 : φ 1 = φ 2 = · · · = φ N = 0, (31.4)
that is, all time series are independent random walks. The formulation of the alternative hypothe-
sis is instead a controversial issue that critically depends on which assumptions one makes about
the nature of the homogeneity/heterogeneity of the panel. First, under the assumption that the
autoregressive parameter is identical for all cross-section units, we can consider
\[
H_{1a}: \phi_1 = \phi_2 = \cdots = \phi_N \equiv \phi \ \text{ and } \ \phi < 0.
\]
The panel unit root statistics motivated by $H_{1a}$ pool the observations across the different
cross-section units before forming the ‘pooled’ statistic (see, e.g., Harris and Tzavalis (1999) and Levin,
Lin, and Chu (2002)). One drawback of tests based on such alternative hypotheses is that they
tend to have power even if only a few of the units are stationary; hence a rejection of the null
hypothesis, H0 , is not convincing evidence that a significant proportion of the series are indeed
stationary. In particular, Westerlund and Breitung (2014) show that the local power of the Levin,
Lin, and Chu (2002) test is greater than that of the Im, Pesaran, and Shin (2003) test, based on
a less restrictive alternative, also when not all individual series are stationary. A further draw-
back in using H1a is that this is likely to be unduly restrictive, particularly for cross-country stud-
ies involving differing short-run dynamics. For example, such a homogeneous alternative seems
particularly inappropriate in the case of the PPP hypothesis, where yit is taken to be the real
exchange rate. There are no theoretical grounds for the imposition of the homogeneity hypoth-
esis, φ i = φ, under PPP. At the other extreme, there is the alternative hypothesis stating that at
least one of the series in the panel is generated by a stationary process
\[
H_{1b}: \phi_i < 0, \ \text{for one or more } i.
\]
Such an alternative hypothesis is at the basis of panel unit root tests proposed by Chang (2002)
and Chang (2004). We observe that H1b is only appropriate when N is finite, namely within the
multivariate model with a fixed number of variables analyzed in the time series literature. On the
contrary, in the case of large N and T, panel unit root tests will lack power if the alternative, H1b ,
is adopted. For large N and T panels it is reasonable to entertain alternatives that lie somewhere
between the two extremes of H1a and H1b . In this context, a more appropriate alternative is given
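by the heterogeneous alternative
\[
H_{1c}: \phi_i < 0, \ i = 1, 2, \ldots, N_1; \quad \phi_i = 0, \ i = N_1 + 1, N_1 + 2, \ldots, N,
\]
such that
\[
\lim_{N\to\infty} \frac{N_1}{N} = \delta, \quad 0 < \delta \le 1. \tag{31.6}
\]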
Using the above specification the null hypothesis is H0 : δ = 0, while H1c can be written as
H1c : δ > 0.
In other words, rejection of the unit root null hypothesis can be interpreted as providing evi-
dence in favour of rejecting the unit root hypothesis for a non-zero fraction of panel members
as N → ∞. The tests developed against the above heterogeneous alternatives, H1c , operate
directly on the test statistics for the individual cross-section units using (standardized) simple
averages of the underlying individual statistics or their suitable transformations such as rejec-
tion probabilities (see, among others, Choi (2001), Im, Pesaran, and Shin (2003), and Pesaran
(2007b)).
Remark 8 The heterogeneity of panel data models used in cross-country analysis introduces a new
kind of asymmetry in the way the null and the alternative hypotheses are treated, which is not usu-
ally present in the univariate time series (or cross-sectional) models. This is because the same null
hypothesis is imposed across all i but the specification of the alternative hypothesis is allowed to
vary with i. This asymmetry is assumed away in homogeneous panels. However, as demonstrated
in Pesaran and Smith (1995), neglected heterogeneity (even if purely random) can lead to spurious
results in dynamic panels. Therefore, in cross-country analysis where slope heterogeneity is a norm,
the asymmetry of the null and the alternative hypotheses has to be taken into account. The appropri-
ate response critically depends on the relative size of N and T. In large N-heterogeneous panel data
models with small T (say around 15) it is only possible to devise sufficiently powerful unit root tests
which are informative in some average sense, namely whether the null of a unit root can be rejected
in the case of a significant fraction of the countries in the panel.1 To identify the exact proportion
of the sample for which the null hypothesis is rejected, one requires country-specific data sets with
T sufficiently large. But if T is large enough for reliable country-specific inferences to be made, then
there seems little rationale in pooling countries into a panel.
In the rest of the chapter we will focus on panel unit root tests designed for one of the alterna-
tive hypotheses H1a or H1c . However, we observe that, despite the differences in the way the two
classes of test view the alternative hypothesis, both types of test can be consistent against both
types of the alternative. See, for example, the discussion in Westerlund and Breitung (2014).
1 Some of these difficulties can be circumvented if slope heterogeneity can be modelled in a sensible and parsimonious
manner.
The nuisance cross-section specific parameters θ i can be estimated either under the null or the
alternative hypothesis. Under the null hypothesis μi is unidentified, but as we shall see it is often
replaced by yi0 , on the implicit (identifying) assumption that ỹi0 = 0 for all i. For this choice
of μi the effective number of time periods used for estimation of φ i is reduced by one. Under
the alternative hypothesis the particular estimates of μi and σ 2i chosen naturally depend on the
nature of the alternatives envisaged. Under homogeneous alternatives, φ i = φ < 0, the ML
estimates of μi and σ 2i are given as nonlinear functions of φ̂. Under heterogeneous alternatives
φ i and σ 2i can be treated as free parameters and estimated separately for each i.
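Levin, Lin, and Chu (2002) avoid the problems associated with the choice of the estimators
for $\mu_i$ and base their tests on the t-ratio of $\phi$ in the pooled fixed-effects regression, which is given by
\[
t_\phi = \frac{\sum_{i=1}^{N} \hat{\sigma}_i^{-2}\, \Delta y_i' M_\tau\, y_{i,-1}}{\sqrt{\sum_{i=1}^{N} \hat{\sigma}_i^{-2}\, y_{i,-1}' M_\tau\, y_{i,-1}}}, \tag{31.9}
\]
where $\Delta y_i = (\Delta y_{i1}, \Delta y_{i2}, \ldots, \Delta y_{iT})'$, $y_{i,-1} = (y_{i0}, y_{i1}, \ldots, y_{i,T-1})'$, $M_\tau = I_T - \tau_T(\tau_T'\tau_T)^{-1}\tau_T'$,
$\tau_T$ is a $T$-dimensional vector of ones,
\[
\hat{\sigma}_i^2 = \frac{\Delta y_i' M_i\, \Delta y_i}{T-2}, \tag{31.10}
\]
$M_i = I_T - X_i(X_i'X_i)^{-1}X_i'$, and $X_i = (\tau_T, y_{i,-1})$.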
The construction of a unit root test against H1c is less clear because the alternative consists of
a set of inequality conditions. Im, Pesaran, and Shin (2003) suggest the mean of the individual
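specific t-statistics²
\[
\bar{t} = \frac{1}{N}\sum_{i=1}^{N} t_i,
\]
where
\[
t_i = \frac{\Delta y_i' M_\tau\, y_{i,-1}}{\hat{\sigma}_i \left(y_{i,-1}' M_\tau\, y_{i,-1}\right)^{1/2}},
\]
is the Dickey–Fuller t-statistic of cross-sectional unit $i$.³ LM versions of the t-ratios of $\phi$ and $\phi_i$,
that are analytically more tractable, can also be used which are given by
\[
\tilde{t}_\phi = \frac{\sum_{i=1}^{N} \tilde{\sigma}_i^{-2}\, \Delta y_i' M_\tau\, y_{i,-1}}{\sqrt{\sum_{i=1}^{N} \tilde{\sigma}_i^{-2}\, y_{i,-1}' M_\tau\, y_{i,-1}}}, \tag{31.11}
\]
and
\[
\tilde{t}_i = \frac{\Delta y_i' M_\tau\, y_{i,-1}}{\tilde{\sigma}_i \left(y_{i,-1}' M_\tau\, y_{i,-1}\right)^{1/2}}, \tag{31.12}
\]
where $\tilde{\sigma}_i^2 = (T-1)^{-1} \Delta y_i' M_\tau\, \Delta y_i$. It is easily established that the panel unit root tests based
on $t_\phi$ and $\tilde{t}_\phi$ in the case of the pooled versions, and those based on $\bar{t}$ and
\[
\tilde{t} = N^{-1} \sum_{i=1}^{N} \tilde{t}_i, \tag{31.13}
\]
in the case of their mean group versions, are asymptotically equivalent.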
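To establish the distribution of $\tilde{t}_\phi$ and $\tilde{t}$, we first note that under $\phi_i = 0$, $\Delta y_i = \sigma_i v_i =
\sigma_i (v_{i1}, v_{i2}, \ldots, v_{iT})'$, where $v_i \sim (0, I_T)$ and $y_{i,-1}$ can be written as
\[
y_{i,-1} = y_{i0}\, \tau_T + \sigma_i\, s_{i,-1}, \tag{31.14}
\]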
2 Andrews (1998) has considered optimal tests in such situations. His directed Wald statistic that gives a high weight to
alternatives close to the null (i.e., parameter c in Andrews (1998) tends to zero) is equivalent to the mean of the individual
specific test statistics.
3 The mean of other unit root test statistics may be used as well. For example, Smith, Leybourne, Kim, and Newbold
(2004) suggest using the mean of the weighted symmetric test statistic proposed for single time series by Park and Fuller
(1995) and Fuller (1996) (see Section 10.1.3 in Fuller (1996)), or the Max-ADF test proposed by Leybourne (1995) based
on the maximum of the original and the time-reversed Dickey–Fuller test statistics. See also Section 15.7.
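where $y_{i0}$ is a given initial value (fixed or random), $s_{i,-1} = (s_{i0}, s_{i1}, \ldots, s_{i,T-1})'$, with $s_{it} = \sum_{j=1}^{t} v_{ij}$,
$t = 1, 2, \ldots, T$, and $s_{i0} = 0$. Using these results in (31.11) and (31.12) we have
\[
\tilde{t}_\phi = \frac{\displaystyle\sum_{i=1}^{N} \frac{\sqrt{T-1}\; v_i' M_\tau\, s_{i,-1}}{v_i' M_\tau\, v_i}}{\sqrt{\displaystyle\sum_{i=1}^{N} \frac{s_{i,-1}' M_\tau\, s_{i,-1}}{v_i' M_\tau\, v_i}}},
\]
and
\[
\tilde{t} = N^{-1} \sum_{i=1}^{N} \frac{\sqrt{T-1}\; v_i' M_\tau\, s_{i,-1}}{\left(v_i' M_\tau\, v_i\right)^{1/2}\left(s_{i,-1}' M_\tau\, s_{i,-1}\right)^{1/2}}.
\]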
It is clear that under the null hypothesis both test statistics are free of nuisance parameters and
their critical values can be tabulated for all combinations of N and T assuming, for example, that
ε it (or vit ) are normally distributed. Therefore, in the case where the errors, εit , are serially uncor-
related, an exact sample panel unit root test can be developed using either of the test statistics and
no adjustments to the test statistics are needed. The main difference between the two tests lies in
the way information on individual units is combined and their relative small sample performance
would naturally depend on the nature of the alternative hypothesis being considered.
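The absence of nuisance parameters under the null can be checked numerically, and the same code can be used to tabulate critical values by simulation. The following Python sketch is illustrative only: it computes the LM versions (31.11)-(31.13) for a balanced panel with intercepts only and serially uncorrelated errors, and the helper names (`lm_parts`, `panel_lm_stats`, `simulate_null`) are hypothetical rather than taken from the cited papers.

```python
import math
import random


def demean(z):
    """Apply M_tau, i.e. subtract the sample mean."""
    m = sum(z) / len(z)
    return [v - m for v in z]


def lm_parts(y):
    """For y = [y_0, ..., y_T] return (dy'M y_{-1}, y_{-1}'M y_{-1}, sigma_tilde^2)."""
    T = len(y) - 1
    dy = [y[t] - y[t - 1] for t in range(1, T + 1)]
    ylag = demean(y[:-1])
    num = sum(a * b for a, b in zip(dy, ylag))
    den = sum(b * b for b in ylag)
    s2 = sum(a * a for a in demean(dy)) / (T - 1)
    return num, den, s2


def panel_lm_stats(panel):
    """Return the pooled statistic (31.11) and the mean statistic (31.13)."""
    parts = [lm_parts(y) for y in panel]
    pooled = (sum(n / s2 for n, _, s2 in parts)
              / math.sqrt(sum(d / s2 for _, d, s2 in parts)))
    mean_t = sum(n / math.sqrt(s2 * d) for n, d, s2 in parts) / len(parts)
    return pooled, mean_t


def random_walk(T, sigma, y0, rng):
    """Generate y_0, ..., y_T under the unit root null with normal errors."""
    y = [y0]
    for _ in range(T):
        y.append(y[-1] + sigma * rng.gauss(0.0, 1.0))
    return y


def simulate_null(N, T, reps, seed=0):
    """Sorted Monte Carlo draws of both statistics under the null."""
    rng = random.Random(seed)
    draws = [panel_lm_stats([random_walk(T, 1.0, 0.0, rng) for _ in range(N)])
             for _ in range(reps)]
    return sorted(d[0] for d in draws), sorted(d[1] for d in draws)


# Small illustrative run; a serious tabulation would use many more replications.
pooled_draws, mean_draws = simulate_null(N=5, T=20, reps=200)
crit_5pct = mean_draws[int(0.05 * len(mean_draws))]  # simulated lower 5 per cent point
```

Because both statistics depend on the data only through $v_i$, generating the random walks with different $\sigma_i$ or $y_{i0}$ but the same underlying shocks leaves their values unchanged up to rounding error, which is why simulating with $\sigma_i = 1$ and $y_{i0} = 0$ is sufficient.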
Asymptotic null distributions of the tests can also be derived depending on whether (T, N) →
∞, sequentially, or when both N and T → ∞, jointly. To derive the asymptotic distributions
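we need to work with the standardized versions of the test statistics
\[
Z_{LL} = \frac{\tilde{t}_\phi - E\left(\tilde{t}_\phi\right)}{\sqrt{Var\left(\tilde{t}_\phi\right)}}, \tag{31.15}
\]
and
\[
Z_{IPS} = \frac{\sqrt{N}\left[\bar{t} - E(t_i)\right]}{\sqrt{Var(t_i)}}, \tag{31.16}
\]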
assuming that T is sufficiently large such that the second-order moments of ti and tφ exist. The
conditions under which ti has a second-order moment are discussed in IPS and it is shown that,
when the underlying errors are normally distributed, the second-order moments exist for T > 5.
For non-normal distributions, the existence of the moments can be ensured by basing the IPS
test on suitably truncated versions of the individual t-ratios (see Pesaran (2007b) for further
details). The exact first- and second-order moments of ti and t̃i for different values of T are given
in Im, Pesaran, and Shin (2003, Table 1). Using these results it is also possible to generalize the
IPS test for unbalanced panels. Suppose the number of time periods available on the ith cross-
sectional unit is Ti , the standardized IPS statistics will now be given by
\[
Z_{IPS} = \frac{\sqrt{N}\left[\bar{t} - N^{-1}\sum_{i=1}^{N} E(t_{iT_i})\right]}{\sqrt{N^{-1}\sum_{i=1}^{N} Var(t_{iT_i})}}, \tag{31.17}
\]
where $E(t_{iT_i})$ and $Var(t_{iT_i})$ are, respectively, the exact mean and variance of the DF statistics
based on $T_i$ observations. IPS show that for all finite $T_i > 6$, $Z_{IPS} \overset{d}{\to} N(0, 1)$ as $N \to \infty$.
Similar results follow for the LL test.
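The standardization in (31.17) is mechanical once the unit-specific moments are available. A minimal Python sketch follows; the function name `z_ips` is hypothetical, and the means and variances in the example are arbitrary placeholders chosen to make the arithmetic easy to follow. They are not the values tabulated by Im, Pesaran, and Shin (2003), which would have to be looked up (or simulated) for each $T_i$.

```python
import math


def z_ips(t_stats, means, variances):
    """Standardized t-bar statistic for a possibly unbalanced panel, as in (31.17).

    t_stats[i]   : DF t-statistic of unit i
    means[i]     : E(t_{iT_i}), null mean of the DF statistic for T_i observations
    variances[i] : Var(t_{iT_i}), the corresponding null variance
    """
    n = len(t_stats)
    t_bar = sum(t_stats) / n
    mean_bar = sum(means) / n
    var_bar = sum(variances) / n
    return math.sqrt(n) * (t_bar - mean_bar) / math.sqrt(var_bar)


# Placeholder inputs (NOT tabulated moments): four units, t-bar = -2.0.
t_stats = [-2.0, -1.0, -3.0, -2.0]
means = [-1.5, -1.5, -1.5, -1.5]
variances = [1.0, 1.0, 1.0, 1.0]
z = z_ips(t_stats, means, variances)  # sqrt(4) * (-2.0 + 1.5) / 1 = -1.0
```

In a balanced panel all entries of `means` and `variances` coincide and the expression reduces to (31.16).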
To establish the asymptotic distribution of the panel unit root tests in the case of T → ∞,
we first note that for each i
\[
t_i \overset{d}{\to} \eta_i = \frac{\int_0^1 \widetilde{W}_i(a)\, dW_i(a)}{\sqrt{\int_0^1 \widetilde{W}_i(a)^2\, da}},
\]
where $\widetilde{W}_i(a)$ is a demeaned Brownian motion defined as $\widetilde{W}_i(a) = W_i(a) - \int_0^1 W_i(a)\, da$ and
$W_1(a), W_2(a), \ldots, W_N(a)$ are independent standard Brownian motions. The existence of the
moments of $\eta_i$ is established in Nabeya (1999), who also provides numerical values for the first
six moments of the DF-distribution for the three standard specifications; namely models with
and without intercepts and linear trends. Therefore, since the individual Dickey–Fuller
statistics $t_1, t_2, \ldots, t_N$ are independent, it follows that $\eta_1, \eta_2, \ldots, \eta_N$ are also independent with finite
moments. Hence, by standard central limit theorems we have
\[
Z_{IPS} \xrightarrow[T\to\infty]{d} \frac{\sqrt{N}\left[\bar{\eta} - E(\eta_i)\right]}{\sqrt{Var(\eta_i)}} \xrightarrow[N\to\infty]{d} N(0, 1),
\]
where $\bar{\eta} = N^{-1}\sum_{i=1}^{N} \eta_i$. Similarly,
\[
Z_{LL} = \frac{t_\phi - E(t_\phi)}{\sqrt{Var(t_\phi)}} \xrightarrow[(T,N)\to\infty]{d} N(0, 1).
\]
To simplify the exposition, the above asymptotic results are derived using a sequential limit
theory, where T → ∞ is followed by N → ∞. However, Phillips and Moon (1999) show
that sequential convergence does not imply joint convergence so that in some situations the
sequential limit theory may break down. In the case of models with serially uncorrelated errors,
IPS (2003) show that the t-bar test is in fact valid for N and T→∞ jointly. Furthermore, as we
shall see, the IPS test is valid for the case of serially correlated errors as N and T→∞ so long as
N/T → k where k is a finite non-zero constant.
Maddala and Wu (1999) and Choi (2001) independently suggested a test against the
heterogeneous alternative $H_{1c}$ that is based on the p-values of the individual statistics as originally
suggested by Fisher (1932). Let $\pi_i$ denote the p-value of the individual specific unit root test
applied to cross-sectional unit $i$. The combined test statistic is
\[
\pi = -2\sum_{i=1}^{N} \log(\pi_i). \tag{31.18}
\]
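Another possibility is the inverse normal statistic
\[
Z_{INV} = \frac{1}{\sqrt{N}}\sum_{i=1}^{N} \Phi^{-1}(\pi_i), \tag{31.19}
\]
where $\Phi(\cdot)$ denotes the cdf of the standard normal distribution. An important advantage of this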
approach is that it is possible to allow for different specifications (such as different deterministic
terms and lag-order) for each panel unit.
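Under the null hypothesis $\pi$ is $\chi^2$ distributed with $2N$ degrees of freedom. For large $N$ the
transformed statistic
\[
\bar{\pi}^{*} = -\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\left[\log(\pi_i) + 1\right], \tag{31.20}
\]
is asymptotically standard normal under the null hypothesis.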
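The three ways of combining individual p-values in (31.18)-(31.20) are simple to compute. The Python sketch below is a minimal illustration that uses only the standard library; the function names are hypothetical, the p-values in the example are made up, and independence of the individual tests across units is assumed. The chi-square tail probability uses the closed form available for even degrees of freedom, $P(\chi^2_{2N} > x) = e^{-x/2}\sum_{k=0}^{N-1}(x/2)^k/k!$.

```python
import math
from statistics import NormalDist


def chi2_sf_even(x, df):
    """Upper-tail probability of a chi-square variate with even df (closed form)."""
    k = df // 2
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= (x / 2.0) / j
        total += term
    return math.exp(-x / 2.0) * total


def combine_pvalues(pvals):
    """Return the statistic in (31.18), its chi-square p-value, Z_INV and pi-bar-star."""
    n = len(pvals)
    fisher = -2.0 * sum(math.log(p) for p in pvals)          # (31.18)
    fisher_p = chi2_sf_even(fisher, 2 * n)                   # chi-square, 2N df
    std_normal = NormalDist()
    z_inv = sum(std_normal.inv_cdf(p) for p in pvals) / math.sqrt(n)   # (31.19)
    pi_star = -sum(math.log(p) + 1.0 for p in pvals) / math.sqrt(n)    # (31.20)
    return fisher, fisher_p, z_inv, pi_star


# Made-up p-values from four unit-specific unit root tests.
fisher, fisher_p, z_inv, pi_star = combine_pvalues([0.02, 0.30, 0.11, 0.64])
```

Small individual p-values push the statistic in (31.18) and $\bar{\pi}^{*}$ upwards and $Z_{INV}$ downwards, so the first two reject in the right tail and the inverse normal statistic in the left tail. Since only p-values enter, each unit can be tested with its own deterministic terms and lag order.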
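Following Breitung (2000) and Moon, Perron, and Phillips (2007), the asymptotic distribution
under the local alternative is obtained as $Z_j \overset{d}{\to} N(-\bar{c}\,\theta_j, 1)$, $j = LL, IPS$, where $\bar{c} = \lim_{N\to\infty} N^{-1}\sum_{i=1}^{N} c_i$ and
\[
\theta_1 = \sqrt{E\left[\int_0^1 \widetilde{W}_i(a)^2\, da\right]}, \quad
\theta_2 = \frac{E\left[\sqrt{\int_0^1 \widetilde{W}_i(a)^2\, da}\right]}{\sqrt{Var(t_i)}}.
\]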
It is interesting to note that the local power of both test statistics depends on the mean c̄. Accord-
ingly, the test statistics do not exploit the deviations from the mean value of the autoregressive
parameter.
Moon, Perron, and Phillips (2007) derive the most powerful test statistic against the local
alternative (31.21). Assume that we (randomly) choose the sequence c∗1 , c∗2 , . . . , c∗N instead of
the unknown values c1 , c2 , . . . , cN . The point optimal test statistic is constructed using the (local-
to-unity) pseudo differences
√
For the model without individual constants and homogeneous variances the point optimal test
results in the statistic
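$$V_{NT} = \frac{1}{\hat{\sigma}^2} \sum_{i=1}^{N}\sum_{t=1}^{T} \left[ (\Delta_{c_i^*} y_{it})^2 - (\Delta y_{it})^2 \right] - \frac{1}{2}\kappa^2,$$
where $E(c_i^*)^2 = \kappa^2$. Under the sequence of local alternatives (31.21), Moon, Perron, and Phillips
(2007) derive the limiting distribution as
$$V_{NT} \overset{d}{\to} N\left(-E(c_i c_i^*),\, 2\kappa^2\right).$$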
The upper bound of the local power is achieved with ci = c∗i , that is, if the local alternatives
used to construct the test coincide with the actual alternative. Unfortunately, in practice it seems
extremely unlikely that one could select values of c∗i that are perfectly correlated with the true
values, ci . If, on the other hand, the variates c∗i are independent of ci , then the power is smaller
than the power of a test using identical values c∗i = c∗ for all i. This suggests that if there is no
information about the variation of ci , then a test cannot be improved by taking into account a
possible heterogeneity of the alternative.
where $d_{it}$ represents the deterministics and $\Delta\tilde{y}_{it} = \phi_i \tilde{y}_{i,t-1} + \varepsilon_{it}$. For the model with a
constant mean we let $d_{it} = 1$, and for the model with individual specific time trends we let
$d_{it} = (1, t)'$. Furthermore, structural breaks in the mean function can be accommodated by
including (possibly individual specific) dummy variables in the vector dit . The parameter vector
δ i is assumed to be unknown and has to be estimated. For the Dickey–Fuller test statistic, the
mean function is estimated under the alternative, that is, for the model with a time trend, δ̂ i dit
can be estimated from a regression of yit on a constant and t (t = 1, 2, . . . , T). Alternatively,
the mean function can also be estimated under the null hypothesis (see Schmidt and Phillips
(1992)) or under a local alternative (Elliott, Rothenberg, and Stock (1996)).4
Including deterministic terms may have an important effect on the asymptotic properties of
the test. Let $\Delta\hat{\tilde{y}}_{it}$ and $\hat{\tilde{y}}_{i,t-1}$ denote estimates for $\Delta\tilde{y}_{it} = \Delta y_{it} - E(\Delta y_{it})$ and $\tilde{y}_{i,t-1} = y_{i,t-1} - E(y_{i,t-1})$. In general, running the regression
does not render a t-statistic with a standard normal limiting distribution due to the fact that $\hat{\tilde{y}}_{i,t-1}$
is correlated with $e_{it}$. For example, if $d_{it}$ is an individual specific constant such that
$\hat{\tilde{y}}_{it} = y_{it} - T^{-1}(y_{i0} + \cdots + y_{i,T-1})$, we obtain under the null hypothesis
4 See, e.g. Choi (2002) and Harvey, Leybourne, and Sakkas (2006).
$$\lim_{T\to\infty} E\left[\frac{1}{T}\sum_{t=1}^{T} e_{it}\,\hat{\tilde{y}}_{i,t-1}\right] = -\sigma_i^2/2.$$
where δ̂ = (δ̂ 1 , δ̂ 2 , . . . , δ̂ N ) , and δ̂ i is the estimator of the coefficients of the deterministics, dit ,
in the OLS regression of yit on dit . The corrected, standardized statistic is given by
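$$Z_{LL}(\hat{\delta}) = \frac{\sum_{i=1}^{N}\sum_{t=1}^{T} \Delta\hat{\tilde{y}}_{it}\,\hat{\tilde{y}}_{i,t-1}/\hat{\sigma}_i^2 - NT\,a_T(\hat{\delta})}{b_T(\hat{\delta})\sqrt{\sum_{i=1}^{N}\sum_{t=1}^{T} \hat{\tilde{y}}_{i,t-1}^2/\hat{\sigma}_i^2}}.$$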
Levin, Lin, and Chu (2002) present simulated values of aT (δ̂) and bT (δ̂) for models with con-
stants, time trends and various values of T. A problem is, however, that for unbalanced data sets
no correction terms are tabulated.
Alternatively, the test statistic may be corrected such that the adjusted t-statistic
$$Z_{LL}^{*}(\hat{\delta}) = \left[Z_{LL}(\hat{\delta}) - a_T^{*}(\hat{\delta})\right]/b_T^{*}(\hat{\delta})$$
is asymptotically standard normal. Harris and Tzavalis (1999) derive the small sample values of
a∗T (δ̂) and b∗T (δ̂) for T fixed and N → ∞. Therefore, their test statistic can be applied for small
values of T and large values of N.
An alternative approach is to avoid the bias—and hence the correction terms—by using alter-
native estimates of the deterministic terms. Breitung and Meyer (1994) suggest using the initial
value yi0 as an estimator of the constant term. As argued by Schmidt and Phillips (1992), the ini-
tial value is the best estimate of the constant given the null hypothesis is true. Using this approach,
the regression equation for a model with a constant term becomes
Under the null hypothesis, the pooled t-statistic of H0 : φ ∗ = 0 has a standard normal limit
distribution.
For a model with a linear time trend, a minimal invariant statistic is obtained by the transfor-
mation (see Ploberger and Phillips (2002))
$$x_{it}^{*} = y_{it} - y_{i0} - \frac{t}{T}\,(y_{iT} - y_{i0}).$$
In this transformation, subtracting $y_{i0}$ eliminates the constant and $(y_{iT} - y_{i0})/T = (\Delta y_{i1} + \cdots + \Delta y_{iT})/T$ is an estimate of the slope of the individual trend function.
A Helmert transformation can be used to correct for the mean of $\Delta y_{it}$,
$$\Delta y_{it}^{*} = s_t\left[\Delta y_{it} - \frac{1}{T-t}\,(\Delta y_{i,t+1} + \cdots + \Delta y_{iT})\right], \quad t = 1, \ldots, T-1,$$
where $s_t^2 = (T-t)/(T-t+1)$ (see Arellano (2003), p. 17). Using these transformations, the
regression equation becomes
It is not difficult to verify that, under the null hypothesis, we have $E(\Delta y_{it}^{*} x_{i,t-1}^{*}) = 0$, and thus
the t-statistic for $\phi^{*} = 0$ is asymptotically standard normally distributed (see Breitung (2000)).
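The forward-demeaning step can be illustrated with a short sketch. The function name `helmert_transform` is hypothetical; the input is assumed to be the list of first differences $\Delta y_{i1}, \ldots, \Delta y_{iT}$ of a single unit.

```python
import math


def helmert_transform(dy):
    """Forward-demean the differences dy_1..dy_T of one unit; returns T-1 values."""
    T = len(dy)
    out = []
    for t in range(1, T):          # t = 1, ..., T-1 in the text's indexing
        future = dy[t:]            # dy_{t+1}, ..., dy_T (T - t terms)
        s_t = math.sqrt((T - t) / (T - t + 1))
        out.append(s_t * (dy[t - 1] - sum(future) / (T - t)))
    return out
```

Each observation is adjusted only by the mean of future differences, so a constant drift is removed exactly while the transformed value at time t does not involve past observations.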
It is important to note that including individual specific time trends substantially reduces the
(local) power of the test. This was first observed by Breitung (2000) and studied more rigorously
by Ploberger and Phillips (2002) and Moon, Perron, and Phillips (2007). Specifically, the latter
two papers show that a panel unit root test with incidental trends has non-trivial asymptotic
power only for local alternatives with rate $T^{-1}N^{-1/4}$. A similar result is found by Moon, Perron,
and Phillips (2006) for the test suggested by Breitung (2000).
The test against heterogeneous alternatives, H1c , can easily be adjusted for individual specific
deterministic terms such as linear trends or seasonal dummies. This can be done by computing
IPS statistics, defined by (31.16) and (31.17) for the balanced and unbalanced panels, using
Dickey–Fuller t-statistics based on DF regressions including the deterministics δ i dit , where
dit = 1 in the case of a constant term, dit = (1, t) in the case of models with a linear time trend
and so on. The mean and variance corrections should, however, be computed to match the nature
of the deterministics. Under a general setting IPS (2003) have shown that the ZIPS statistic con-
verges in distribution to a standard normal variate as N, T → ∞, jointly.
In a straightforward manner it is possible to include dummy variables in the vector dit that
accommodate structural breaks in the mean function (see, e.g., Murray and Papell (2002);
Tzavalis (2002); Carrion-i-Silvestre, Del Barrio, and Lopez-Bazo (2005); Breitung and Candelon
(2005); Im, Lee, and Tieslau (2005)).
For example, the IPS statistics (31.16) and (31.17) developed for balanced and unbalanced pan-
els can now be constructed using the ADF(pi ) statistics based on the above regressions. As noted
in IPS (2003), small sample properties of the test can be much improved if the standardization
of the IPS statistic is carried out using the simulated means and variances of ti (pi ), the t-ratio of
φ i computed based on ADF(pi ) regressions. This is likely to yield better approximations, since
E [ti (pi )], for example, makes use of the information contained in pi while E [ti (0)] = E(ti )
does not. Therefore, in the serially correlated case, IPS propose the following standardized t-bar
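statistic
$$Z_{IPS} = \frac{\sqrt{N}\left\{\bar{t} - \frac{1}{N}\sum_{i=1}^{N} E\left[t_i(p_i)\right]\right\}}{\sqrt{\frac{1}{N}\sum_{i=1}^{N} \mathrm{Var}\left[t_i(p_i)\right]}} \xrightarrow[(T,N)\to\infty]{d} N(0, 1). \tag{31.27}$$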
The values of E [ti (p)] and Var [ti (p)] simulated for different combinations of T and p, are pro-
vided in Table 3 of IPS. These simulated moments also allow the IPS panel unit root test to be
applied to unbalanced panels with serially correlated errors.
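The standardization in (31.27) is mechanical once the unit-specific moments are available. A minimal sketch follows; the function name `z_ips` is hypothetical, and the moments passed in are assumed to have been looked up by the user for each unit's own T and lag order (no tabulated values are reproduced here).

```python
import math


def z_ips(t_stats, means, variances):
    """Standardized t-bar statistic (31.27).

    t_stats[i]   : ADF(p_i) t-ratio of unit i
    means[i]     : simulated E[t_i(p_i)] matching unit i's T and p_i
    variances[i] : simulated Var[t_i(p_i)] matching unit i's T and p_i
    """
    n = len(t_stats)
    t_bar = sum(t_stats) / n
    mean_bar = sum(means) / n
    var_bar = sum(variances) / n
    return math.sqrt(n) * (t_bar - mean_bar) / math.sqrt(var_bar)
```

Passing unit-specific moments is what allows the same formula to cover unbalanced panels: each unit contributes the mean and variance appropriate to its own sample length and lag order.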
For tests against the homogeneous alternatives, $\phi_1 = \phi_2 = \cdots = \phi_N = \phi < 0$, Levin, Lin,
and Chu (2002) suggest removing all individual specific parameters within a first step regression
such that $e_{it}$ ($v_{i,t-1}$) are the residuals from a regression of $\Delta y_{it}$ ($y_{i,t-1}$) on $\Delta y_{i,t-1}, \ldots, \Delta y_{i,t-p_i}$
and $d_{it}$. In the second step the common parameter φ is estimated from a pooled regression
where σ̂ 2i is the estimated variance of eit . Unfortunately, the first step regressions are not sufficient
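to remove the effect of the short-run dynamics on the null distribution of the test. Specifically,
$$\lim_{T\to\infty} E\left[\frac{1}{T-p}\sum_{t=p+1}^{T} e_{it}\,v_{i,t-1}/\sigma_i^2\right] = \frac{\bar{\sigma}_i}{\sigma_i}\,a_{\infty}(\hat{\delta}),$$
where $\bar{\sigma}_i^2$ is the long-run variance and $a_{\infty}(\hat{\delta})$ denotes the limit of the correction term given in
(31.23). Levin, Lin, and Chu (2002) propose a nonparametric (kernel based) estimator for $\bar{\sigma}_i^2$
$$\bar{s}_i^2 = \frac{1}{T}\left[\sum_{t=1}^{T} \Delta\hat{\tilde{y}}_{it}^2 + 2\sum_{l=1}^{K}\left(\frac{K+1-l}{K+1}\right)\left(\sum_{t=l+1}^{T} \Delta\hat{\tilde{y}}_{it}\,\Delta\hat{\tilde{y}}_{i,t-l}\right)\right], \tag{31.28}$$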
where $\Delta\hat{\tilde{y}}_{it}$ denotes the demeaned difference and K denotes the truncation lag. As noted by
Breitung and Das (2005), in a time series context the estimator of the long-run variance based on
differences is inappropriate since under the stationary alternative $\bar{s}_i^2 \overset{p}{\to} 0$; thus the use of this
estimator yields an inconsistent test. In contrast, in the case of panels the use of $\bar{s}_i^2$ improves the
power of the test, since with $\bar{s}_i^2 \overset{p}{\to} 0$ the correction term drops out and the test statistic tends
to −∞.
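The kernel estimator (31.28) can be sketched directly from its definition. The function name `long_run_variance` is hypothetical, and the input is assumed to be the already demeaned differences of one unit.

```python
def long_run_variance(x, K):
    """Bartlett-weighted long-run variance (31.28) of a demeaned series x.

    x : demeaned differences of one unit, in time order
    K : truncation lag (K = 0 gives the ordinary variance estimate)
    """
    T = len(x)
    total = sum(v * v for v in x)
    for l in range(1, K + 1):
        weight = (K + 1 - l) / (K + 1)
        # sum over t = l+1, ..., T of x_t * x_{t-l} (0-based indices l..T-1)
        cross = sum(x[t] * x[t - l] for t in range(l, T))
        total += 2.0 * weight * cross
    return total / T
```

The linearly declining weights keep the estimate non-negative; negatively autocorrelated differences, as arise when a stationary series is over-differenced, pull it towards zero, which is the mechanism described in the text.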
It is possible to avoid the use of a kernel based estimator of the long-run variance by using an
alternative approach suggested by Breitung and Das (2005). Under the null hypothesis we have
$$\gamma_i(L)\,\Delta y_{it} = \delta_i' d_{it} + \varepsilon_{it},$$
where $\gamma_i(L) = 1 - \gamma_{i1}L - \cdots - \gamma_{i,p_i}L^{p_i}$ and L is the lag operator. It follows that $g_t = \gamma_i(L)[y_{it} - E(y_{it})]$ is a random walk with uncorrelated increments. Therefore, the serial correlation can be
removed by replacing $y_{it}$ by the pre-whitened variable $\hat{y}_{it} = \hat{\gamma}_i(L)y_{it}$, where $\hat{\gamma}_i(L)$ is an estimator
of the lag polynomial obtained from the least-squares regression
This approach may also be used for modifying the ‘unbiased statistic’ based on the t-statistic
of φ ∗ = 0 in (31.25). The resulting t-statistic has a standard normal limiting distribution if
T → ∞ is followed by N → ∞.
A related approach is proposed by Westerlund (2009), who suggests testing the unit root
hypothesis by running a modified ADF regression of the form
where y∗i,t−1 = (σ̂ i /s̄i )yi,t−1 and s̄2i is a consistent estimator of the long-run variance, σ 2i . West-
erlund (2009) recommends using a parametric estimate of the long-run variance based on an
autoregressive representation. This transformation of the lagged dependent variable eliminates
the nuisance parameters in the asymptotic distribution of the ADF statistic and, therefore, the
correction for the numerator of the corrected t-statistic of Levin, Lin, and Chu (2002) is the
same as in the case without short-run dynamics.
Pedroni and Vogelsang (2005) have proposed a test statistic that avoids the specification of
the short-run dynamics by using an autoregressive approximation. Their test statistic is based on
the pooled variance ratio statistic
$$Z_{NT}^{w} = \frac{T\,c_i(0)}{N\,\hat{s}_i^2},$$
where $c_i(\ell) = T^{-1}\sum_{t=\ell+1}^{T} \hat{\tilde{y}}_{it}\,\hat{\tilde{y}}_{i,t-\ell}$, $\hat{\tilde{y}}_{it} = y_{it} - \hat{\delta}_i' d_{it}$, and $\hat{s}_i^2$ is the untruncated Bartlett kernel
estimator defined as $\hat{s}_i^2 = \sum_{\ell=-T+1}^{T-1} (1 - |\ell|/T)\,c_i(\ell)$. As has been shown by Kiefer and Vogelsang
(2002) and Breitung (2002), the limiting distribution of such 'non-parametric' statistics
does not depend on nuisance parameters induced by the short-run dynamics of the processes.
Accordingly, no adjustment for short-run dynamics is necessary.
distribution even for a finite N. In this case no correction terms need to be tabulated to account
for the mean and the variance of the test statistic.
Chang (2002) proposes a nonlinear instrumental variable (IV) approach, where the trans-
formed variable
with $c_i > 0$ is used as an instrument for estimating $\phi_i$ in the regression $\Delta y_{it} = \phi_i y_{i,t-1} + \varepsilon_{it}$
(which may also include deterministic terms and lagged differences). Since wi,t−1 tends to zero
as yi,t−1 tends to ±∞ the trending behaviour of the nonstationary variable yi,t−1 is eliminated.
Using the results of Chang, Park, and Phillips (2001), Chang (2002) showed that the Wald test
of φ = 0 based on the nonlinear IV estimator possesses a standard normal limiting distribution.
Another important property of the test is that the nonlinear transformation also takes account of
possible contemporaneous dependence among the cross-sectional units. Accordingly, Chang’s
panel unit root test is also robust against cross-sectional dependence.
It should be noted that wi,t−1 ∈ [−(ci e)−1 , (ci e)−1 ] with a maximum (minimum) at yi,t−1 =
1/ci (yi,t−1 = −1/ci ). Therefore, the choice of the parameter ci is crucial for the properties of
the test. First, the parameter should be proportional to the inverse of the standard deviations of
yit . Chang notes that, if the time dimension is short, the test slightly over-rejects the null and
therefore she proposes the use of a larger value of K to correct for the size distortion.
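The formula for the transformed variable is not reproduced above, but the stated range and the location of the extrema pin down its shape. The sketch below assumes the form $w_{i,t-1} = y_{i,t-1}\exp(-c_i|y_{i,t-1}|)$, which is consistent with those bounds; the function name `chang_instrument` is hypothetical and the formula should be read as an assumption rather than a quotation.

```python
import math


def chang_instrument(y_lag, c):
    """Assumed nonlinear instrument w = y * exp(-c * |y|) with c > 0."""
    return y_lag * math.exp(-c * abs(y_lag))
```

Under this assumed form the instrument peaks at $y = 1/c$ with value $1/(ce)$ and decays to zero for large $|y|$, so a value of c that is too small or too large relative to the scale of $y_{it}$ makes the instrument either nearly linear or nearly zero over the observed range. This is why the text ties $c_i$ to the inverse of the standard deviation.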
An alternative approach to obtain an asymptotically standard normal test statistic is to adjust
the given samples in all cross-sections so that they all have sums of squares $y_{i1}^2 + \cdots + y_{ik_i}^2 = \sigma_i^2 cT^2 + h_i$, where $h_i \overset{p}{\to} 0$ as T → ∞. In other words, the panel data set becomes an unbalanced
panel with ki time periods in the ith unit. Chang calls this setting the ‘equi-squared sum contour’,
whereas the traditional framework is called the ‘equi-sample-size contour’. The nice feature of
this approach is that it yields asymptotically standard normal test statistics. An important draw-
back is, however, that a large number of observations may be discarded by applying this contour
which may result in a severe loss of power.
Testing the null of stationarity in panels
As in the time series case (see Section 15.7.5), it is possible to test the null hypothesis that
the series are stationary against the alternative that (at least some of) the series are nonstation-
ary. The test suggested by Tanaka (1990) and Kwiatkowski et al. (1992) is designed to test the
hypothesis H0∗ : θ i = 0 in the model
where rit is white noise with unit variance and uit is stationary. The cross-sectional specific
KPSS statistic is
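$$\kappa_i = \frac{1}{T^2\,\bar{\sigma}_{T,i}^2}\sum_{t=1}^{T} \hat{S}_{it}^2,$$
where $\bar{\sigma}_{T,i}^2$ denotes a consistent estimator of the long-run variance of $y_{it}$ and $\hat{S}_{it} = \sum_{\ell=1}^{t}\left(y_{i\ell} - \hat{\delta}_i' d_{i\ell}\right)$ is the partial sum of the residuals from a regression of $y_{it}$ on the deterministic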
terms (a constant or a linear time trend). The individual test statistics can be combined as in the
test suggested by IPS (2003) yielding
$$\bar{\kappa} = \frac{N^{-1/2}\sum_{i=1}^{N}\left[\kappa_i - E(\kappa_i)\right]}{\sqrt{\mathrm{Var}(\kappa_i)}},$$
where asymptotic values of E(κ i ) and Var(κ i ) are derived in Hadri (2000) and values for finite
T and N → ∞ are presented in Hadri and Larsson (2005).
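A minimal sketch of the two steps for the constant-only case follows. The function names are hypothetical, the long-run variance is taken as a user-supplied input, and the default moments 1/6 and 1/45 are the asymptotic values usually quoted for the level-stationarity case; they are stated here as an assumption and would need to be replaced for a model with trends or for small T.

```python
import math


def kpss_level(y, lrv):
    """Unit-specific statistic kappa_i for a model with a constant only.

    y   : observations of one unit, in time order
    lrv : estimate of the long-run variance (supplied by the user)
    """
    T = len(y)
    mean = sum(y) / T
    partial, total = 0.0, 0.0
    for value in y:
        partial += value - mean      # partial sum of residuals, S_it
        total += partial * partial
    return total / (T * T * lrv)


def pooled_stationarity_stat(kappas, mean=1.0 / 6.0, var=1.0 / 45.0):
    """Standardized cross-section sum of the kappa_i statistics."""
    n = len(kappas)
    return sum(k - mean for k in kappas) / math.sqrt(n * var)
```

Large positive values of the pooled statistic speak against the null of stationarity, since nonstationary units produce partial sums that grow too quickly.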
The test of Harris, Leybourne, and McCabe (2004) is based on the stationarity statistic
$$Z_i(k) = \sqrt{T}\,\hat{c}_i(k)/\hat{\omega}_{zi}(k),$$
where $\hat{c}_i(k)$ denotes the usual estimator of the covariance at lag k of cross-sectional unit i and
$\hat{\omega}_{zi}^2(k)$ is an estimator of the long-run variance of $z_{it}^{k} = (y_{it} - \hat{\delta}_i' d_{it})(y_{i,t-k} - \hat{\delta}_i' d_{i,t-k})$. The intuition
behind this test statistic is that for a stationary and ergodic time series we have $E[\hat{c}_i(k)] \to 0$
as k → ∞. Since $\hat{\omega}_{zi}^2$ is a consistent estimator for the variance of $\hat{c}_i(k)$ it follows that $Z_i(k)$ converges
to a standard normally distributed random variable as k → ∞ and $k/\sqrt{T} \to \delta < \infty$.
so-called false discovery rate (FDR), given by the expected fraction of series classified as I(0) that
are in fact I(1), as a useful diagnostic on the aggregate decision. In the computation of the FDR,
the authors estimate the fraction of true null hypotheses by applying the Ng (2008) approach
described above.
where
$$u_{it} = \gamma_i' f_t + \xi_{it}, \tag{31.32}$$
or
$$u_t = \Gamma f_t + \xi_t, \tag{31.33}$$
5 See, for example, O’Connell (1998) and Phillips and Sul (2003).
6 The case where $f_t$ and/or $\xi_{it}$ might be serially correlated will be considered below.
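A simple example of panel data models with weak cross-sectional dependence is given by
$$\begin{pmatrix} \Delta y_{1t} \\ \Delta y_{2t} \\ \vdots \\ \Delta y_{Nt} \end{pmatrix} = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_N \end{pmatrix} + \phi \begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \\ \vdots \\ y_{N,t-1} \end{pmatrix} + \begin{pmatrix} u_{1t} \\ u_{2t} \\ \vdots \\ u_{Nt} \end{pmatrix} \tag{31.34}$$
or
where $a_i = -\phi\mu_i$, $\Delta y_t$, $y_{t-1}$, $a$ and $u_t$ are N × 1 vectors, and the cross-sectional correlation is
represented by a non-diagonal matrix $\Omega = E(u_t u_t')$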
with bounded eigenvalues. For the model without constants, Breitung and Das (2005) showed
that the regression t-statistic of φ = 0 in (31.35) is asymptotically distributed as N (0, ν) where
$$\nu = \lim_{N\to\infty} \frac{\mathrm{tr}(\Omega^2/N)}{(\mathrm{tr}\,\Omega/N)^2}. \tag{31.36}$$
Note that $\mathrm{tr}(\Omega)$ and $\mathrm{tr}(\Omega^2)$ are O(N) and, thus, ν converges to a constant that can be shown to
be larger than one. This explains why the test ignoring the cross-correlation of the errors has a
positive size bias.
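The ratio inside the limit in (31.36) is easy to evaluate for a given error covariance matrix, written Ω here. The sketch below computes the finite-N ratio only; the function name `size_inflation_factor` is hypothetical.

```python
def size_inflation_factor(omega):
    """Finite-N version of the ratio in (31.36) for a symmetric matrix omega."""
    n = len(omega)
    trace = sum(omega[i][i] for i in range(n))
    # tr(omega @ omega) written out element by element
    trace_sq = sum(omega[i][j] * omega[j][i] for i in range(n) for j in range(n))
    return (trace_sq / n) / (trace / n) ** 2
```

For a diagonal matrix with equal variances the ratio equals one, and any cross-correlation adds positive off-diagonal squares to the numerator only. With two units, unit variances and correlation 0.5, the ratio is 1.25, so the uncorrected t-statistic would have a variance well above the nominal value of one.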
where $\tilde{y}_t$ is the vector of demeaned variables. Harvey and Bates (2003) derive the limiting
distribution of $t_{gls}(N)$ for a fixed N and as T → ∞, and tabulate its asymptotic distribution for
various values of N. Breitung and Das (2005) show that if $\tilde{y}_t = y_t - y_0$ is used to demean the
variables and T → ∞ is followed by N → ∞, then the GLS t-statistic possesses a standard
normal limiting distribution.
The GLS approach cannot be used if T < N since in this case the estimated covariance matrix
$\hat{\Omega}$ is singular. Furthermore, Monte Carlo simulations suggest that for reasonable size properties
of the GLS test, T must be substantially larger than N (e.g., Breitung and Das (2005)). Maddala
and Wu (1999) and Chang (2004) have suggested a bootstrap procedure that improves the size
properties of the GLS test.
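$$\widehat{\mathrm{Var}}(\hat{\phi}) = \frac{\sum_{t=1}^{T} y_{t-1}'\,\hat{\Omega}\,y_{t-1}}{\left(\sum_{t=1}^{T} y_{t-1}'\,y_{t-1}\right)^2}.$$
If T → ∞ is followed by N → ∞ the robust t statistic $t_{rob} = \hat{\phi}/\sqrt{\widehat{\mathrm{Var}}(\hat{\phi})}$ is asymptotically
standard normally distributed (Breitung and Das (2005)).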
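A sketch of the robust statistic for the model without constants follows. The definitions of the pooled OLS estimator and of the covariance estimator are not shown above, so the code assumes $\hat{\phi} = \sum_t y_{t-1}'\Delta y_t / \sum_t y_{t-1}'y_{t-1}$ and $\hat{\Omega} = T^{-1}\sum_t \hat{u}_t\hat{u}_t'$ with $\hat{u}_t = \Delta y_t - \hat{\phi}y_{t-1}$; the function name `robust_t` is hypothetical.

```python
import math


def robust_t(y):
    """Pooled OLS estimate of phi and its robust t-ratio.

    y : list of T+1 cross-section vectors y_0, ..., y_T, each of length N
    """
    T = len(y) - 1
    N = len(y[0])
    ylag = y[:-1]
    dy = [[y[t][i] - y[t - 1][i] for i in range(N)] for t in range(1, T + 1)]
    sxx = sum(v * v for row in ylag for v in row)
    sxy = sum(ylag[t][i] * dy[t][i] for t in range(T) for i in range(N))
    phi = sxy / sxx
    # residuals and the (assumed) sample covariance matrix of the errors
    u = [[dy[t][i] - phi * ylag[t][i] for i in range(N)] for t in range(T)]
    omega = [[sum(u[t][i] * u[t][j] for t in range(T)) / T for j in range(N)]
             for i in range(N)]
    num = sum(ylag[t][i] * omega[i][j] * ylag[t][j]
              for t in range(T) for i in range(N) for j in range(N))
    var_phi = num / sxx ** 2
    return phi, phi / math.sqrt(var_phi)
```

With a single unit the sandwich form collapses to the usual OLS variance, which gives a simple check; the correction matters only through the off-diagonal elements of the estimated covariance matrix.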
If it is assumed that the cross-correlation is due to common factors, then the largest eigenvalue
of the error covariance matrix, Ω, is $O_p(N)$ and the robust PCSE approach breaks down.
Specifically, Breitung and Das (2008) show that in this case $t_{rob}$ is distributed as the ordinary
Dickey–Fuller test applied to the first principal component.
In the case of a single unobserved common factor, Pesaran (2007b) suggests a simple modification
of the usual test procedure. Let $\bar{y}_t = N^{-1}\sum_{i=1}^{N} y_{it}$ and $\Delta\bar{y}_t = N^{-1}\sum_{i=1}^{N} \Delta y_{it} = \bar{y}_t - \bar{y}_{t-1}$. The cross-section augmented Dickey–Fuller (CADF) test is based on the following
regression
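The regression itself is not reproduced in this extract. The sketch below assumes the commonly cited form for serially uncorrelated errors, $\Delta y_{it} = a_i + b_i y_{i,t-1} + c_i\bar{y}_{t-1} + d_i\Delta\bar{y}_t + e_{it}$, and returns the t-ratio of $b_i$. The helper names `ols_t_stat` and `cadf_t` are hypothetical, and lag augmentation is omitted.

```python
import math


def ols_t_stat(X, y, k):
    """OLS of y on the rows of X; returns (coefficient k, its t-ratio)."""
    n, p = len(X), len(X[0])
    xtx = [[sum(r[a] * r[b] for r in X) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * v for r, v in zip(X, y)) for a in range(p)]
    # Gauss-Jordan inversion of X'X with partial pivoting
    aug = [xtx[a] + [1.0 if a == b else 0.0 for b in range(p)] for a in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        d = aug[col][col]
        aug[col] = [v / d for v in aug[col]]
        for r in range(p):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    inv = [row[p:] for row in aug]
    beta = [sum(inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    resid = [v - sum(bc * xc for bc, xc in zip(beta, r)) for r, v in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - p)
    return beta[k], beta[k] / math.sqrt(s2 * inv[k][k])


def cadf_t(panel, i):
    """t-ratio on y_{i,t-1} in the assumed CADF regression for unit i.

    panel : list of N series, each holding T+1 observations
    """
    N, T1 = len(panel), len(panel[0])
    ybar = [sum(s[t] for s in panel) / N for t in range(T1)]
    X, dy = [], []
    for t in range(1, T1):
        X.append([1.0, panel[i][t - 1], ybar[t - 1], ybar[t] - ybar[t - 1]])
        dy.append(panel[i][t] - panel[i][t - 1])
    return ols_t_stat(X, dy, 1)[1]
```

The cross-section averages act as observable stand-ins for the single common factor, so the resulting t-ratio has a non-standard null distribution and must be compared with critical values simulated for this design rather than with the usual Dickey–Fuller tables.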
variables, such as interest rates, inflation and output share the same factors. If anything, it would
be difficult to find macroeconomic time series that do not share one or more common factors.
For example, in testing for unit roots in a panel of real outputs one would expect the unob-
served common shocks to output (that originate from technology) to also manifest themselves
in employment, consumption and investment. In the case of testing for unit roots in inflation
across countries, one would expect the unobserved common factors that correlate inflation
rates across countries to also affect short-term and long-term interest rates across markets and
economies. The basic idea of using covariates to deal with a multiple factor structure is intuitive
and easy to implement—the ADF regression for yit is simply augmented with cross-section aver-
ages of yit and xit .7 Pesaran, Smith, and Yamagata (2013) show that the extended version of the
Pesaran (2007b) test, denoted as CIPS, is valid so long as k = mmax − 1, where mmax is the
assumed maximum number of factors. Importantly, the estimation of the true number of fac-
tors, m, is not needed so long as m ≤ mmax . Furthermore, it is not required that all of the factors
be strong. Following Bai and Ng (2010), Pesaran, Smith, and Yamagata (2013) also consider a
panel unit root test based on simple averages of cross-sectionally augmented Sargan-Bhargava-
type statistics, denoted as CSB. Monte Carlo simulations reported by these authors suggest that
both CIPS and CSB tests have the correct size across different experiments and with various com-
binations of N and T being considered. The experimental results also show that the proposed
CSB test has satisfactory power, which for some combinations of N and T tends to be higher
than that of the CIPS test.
7 The idea of augmenting ADF regressions with other covariates has been investigated in the unit root literature by
Hansen (1995) and Elliott and Jansson (2003). These authors consider the additional covariates in order to gain power
when testing the unit root hypothesis in the case of a single time series. Pesaran, Smith, and Yamagata (2013) augment
ADF regressions with cross-section averages to eliminate the effects of unobserved common factors in the case of panel
unit root tests.
where
sft = f1 + f2 + . . . + ft ,
sit = ε i1 + ε i2 + . . . + ε it .
Clearly, under the null hypothesis all cross-section units are related to the common stochastic
component, sft , albeit with varying effects, γ i . This framework rules out cross-unit cointegra-
tion as under the null hypothesis there does not exist a linear combination of y1t , . . . , yNt that is
stationary. Therefore, tests based on (31.37) are designed to test the joint null hypothesis: ‘All
time series are I(1) and not cointegrated’.
To allow for cross-unit cointegration, Bai and Ng (2004) propose analyzing the common fac-
tors and idiosyncratic components separately. A simple multi-factor example of the Bai and Ng
framework is given by
$$y_{it} = \mu_i + \gamma_i' g_t + e_{it},$$
$$\Delta g_t = \Pi\,g_{t-1} + v_t,$$
$$e_{it} = \rho_i e_{i,t-1} + \varepsilon_{it},$$
where $g_t$ is the m × 1 vector of unobserved components, $v_t$ and $\varepsilon_{it}$ are stationary common and
individual specific shocks, respectively. Two different sets of null hypotheses are considered:
H0a : (testing the I(0)/I(1) properties of the common factors) Rank(Π) = r ≤ m, and H0b :
(panel unit root tests) ρ i = 1, for all i. A test of H0a is based on common factors estimated
by principal components and cointegration tests are used to determine the number of the com-
mon trends, m − r. Panel unit root tests are then applied to the idiosyncratic components. The
null hypothesis that the time series have a unit root is rejected if either the test of the common
factors or the test for the idiosyncratic component reject the null hypothesis of nonstationary
components.8 As has been pointed out by Westerlund and Larsson (2009), replacing the unob-
served idiosyncratic components by estimates introduces an asymptotic bias when pooling the
t-statistic (or p-values) of the panel units, which renders the pooled tests in Bai and Ng (2004)
asymptotically invalid. Bai and Ng (2010) provide an alternative panel unit root test based on
Sargan and Bhargava (1983) that has much better small sample properties.
To allow for short-run and long-run dependencies, Chang and Song (2005) suggest a nonlinear
instrumental variable test procedure. As the nonlinear instruments suggested by Chang
(2002) are invalid in the case of cross-unit cointegration, panel specific instruments based on
Hermite functions of different orders are used as nonlinear instruments. Chang and Song (2005)
show that the t-statistics computed by nonlinear IV are asymptotically standard
normally distributed and, therefore, a panel unit root statistic against the heterogeneous
alternative H1c can be constructed that has a standard normal limiting distribution.
Choi and Chue (2007) employ a subsampling procedure to obtain tests that are robust against
a wide range of cross-sectional dependence such as weak and strong correlation as well as cross-
unit cointegration. To this end, the sample is grouped into a number of overlapping blocks of b
8 An alternative factor extraction method is suggested by Kapetanios (2007) who also provides detailed Monte Carlo
results on the small sample performance of panel unit root tests based on a number of alternative estimates of the unobserved
common factors. He shows that the factor-based panel unit root tests tend to perform rather poorly when the unobserved
common factor is serially correlated.
time periods. Using all (T − b + 1) possible overlapping blocks, the critical value of the test is
estimated by the respective quantile of the empirical distribution of the (T −b+1) test statistics
computed. The advantage of this approach is that the null distribution of the test statistic may
depend on unknown nuisance parameters. Whenever the test statistics converge in distribution
to some limiting null distribution as T → ∞ and N fixed, the sub-sample critical values con-
verge in probability to the true critical values. Using Monte Carlo simulations Choi and Chue
(2007) demonstrate that the size of the subsample test is indeed very robust against various
forms of cross-sectional dependence. But such tests are only appropriate in the case of panels
where N is small relative to T.
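The block-based critical value can be sketched generically. The function name `subsample_critical_value` and the callback `stat_fn` are hypothetical; any rescaling of the block statistics that a particular test may require is left out, and a lower-tail quantile is used on the assumption that the test rejects for small values.

```python
import math


def subsample_critical_value(data, b, stat_fn, alpha=0.05):
    """Lower-tail critical value from all T - b + 1 overlapping blocks.

    data    : list of T time periods (each entry may hold a cross-section)
    b       : block length
    stat_fn : function mapping a block of b periods to a test statistic
    alpha   : nominal size of a test that rejects for small values
    """
    T = len(data)
    stats = sorted(stat_fn(data[s:s + b]) for s in range(T - b + 1))
    idx = max(math.ceil(alpha * len(stats)) - 1, 0)
    return stats[idx]
```

The full-sample statistic is then compared with this empirical quantile. Because every block is a genuine stretch of the observed panel, the dependence across units is carried into the block statistics without having to be modelled.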
the mean CADF test has better size properties than the test of Moon and Perron (2004), which
tends to be conservative in small samples. However the latter test appears to have more power
against stationary idiosyncratic components. Since these tests remove the common factors, they
will eventually indicate stationary time series in cases where the series are actually nonstationary
due to a common stochastic trend. The results of Gengenbach, Palm, and Urbain (2010) also
suggest that the approach of Bai and Ng (2004) is able to cope with this possibility although the
power of the unit root test applied to the nonstationary component is not very high.
In general, the application of factor models in the case of weak cross sectional dependence
does not yield valid test procedures. Alternative unit root tests that allow for weak cross-sectional
dependence are considered in Breitung and Das (2005). They find that the GLS t-statistic may
have a severe size bias if T is only slightly larger than N. In these cases, the Chang (2004) boot-
strap procedure is able to substantially improve the size properties. The robust OLS t-statistic
performs slightly worse but outperforms the nonlinear IV test of Chang (2002). However,
Monte Carlo simulations carried out by Baltagi, Bresson, and Pirotte (2007) show that there can
be considerable size distortions even in panel unit root tests that allow for weak dependence.
Interestingly enough Pesaran’s test, which is not designed for weak cross-sectional dependence,
tends to be the most robust to spatial type dependence.
$$z_{ijt} \sim I(1), \quad j = 1, 2, \ldots, n_i.$$
Then $z_{it}$ is said to form one or more cointegrating relations if there are linear combinations of
$z_{ijt}$'s for $j = 1, 2, \ldots, n_i$ that are I(0), i.e. if there exists an $n_i \times r_i$ matrix $\beta_i$ ($r_i \geq 1$) such that
$$\underset{r_i \times n_i}{\beta_i'}\;\underset{n_i \times 1}{z_{it}} = \underset{r_i \times 1}{\xi_{it}} \sim I(0).$$
ri denotes the number of cointegrating (or long-run) relations. The residual-based tests are
appropriate when ri = 1, and zit can be partitioned such that zit = (yit , xit ) with no cointe-
gration amongst the ki × 1 (ki = ni − 1) variables, xit . The system cointegration approaches
are much more generally applicable and allow for ri > 1 and do not require any particular par-
titioning of the variables in zit . Another main difference between the two approaches is the way
the stationary component of ξ it is treated in the analysis. Most of the residual-based techniques
employ non-parametric (spectral density) procedures to model the residual serial correlation in
the error correction terms, ξ it , whilst vector autoregressions (VAR) are utilized in the develop-
ment of system approaches.
In panel data models, the analysis of cointegration is further complicated by heterogeneity,
unbalanced panels, cross-sectional dependence, cross unit cointegration and the N and T asymptotics.
But in cases where $n_i$ and N are small, such that $\sum_{i=1}^{N} n_i$ is less than 10, and T is relatively
large (T > 100), as noted by Banerjee, Marcellino, and Osbat (2004), many of these prob-
lems can be avoided by applying the system cointegration techniques discussed in Chapter 22
to the pooled vector, zt = (z1t , z2t , . . . , zNt ) . In this setting, cointegration will be defined by
the relationships β zt that could contain cointegration between variables from different cross-
section units as well as cointegration amongst the different variables specific to a particular cross-
sectional unit. This framework can also deal with residual cross-sectional dependence since it
allows for a general error covariance matrix that covers all the variables in the panel.
Despite its attractive theoretical features, the ‘full’ system approach to panel cointegration
is not feasible even in the case of panels with moderate values of N and ni . In practice, cross-
sectional cointegration can be accommodated using common factors as in the work of Bai and
Ng (2004), Pesaran (2006), Pesaran, Schuermann, and Weiner (2004) (PSW) and its subse-
quent developments in Dées et al. (2007) (DdPS). Bai and Ng (2004) consider the simple case
where ni = 1 but allow N and T to be large. But their setup can be readily generalized so that
cointegration within each cross-sectional unit as well as across the units can be considered. Fol-
lowing DdPS suppose that9
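$$z_{it} = \Gamma_{id}\,d_t + \Gamma_{if}\,f_t + \xi_{it}, \tag{31.38}$$
$$\Delta f_t = \Lambda(L)\,\eta_t, \quad \eta_t \sim IID(0, I_m), \tag{31.39}$$
$$\Delta\xi_{it} = \Psi_i(L)\,v_{it}, \tag{31.40}$$
where
$$\underset{m \times m}{\Lambda(L)} = \sum_{\ell=0}^{\infty} \Lambda_{\ell}L^{\ell}, \qquad \underset{n \times n}{\Psi_i(L)} = \sum_{\ell=0}^{\infty} \Psi_{i\ell}L^{\ell}. \tag{31.41}$$
The coefficient matrices, $\Lambda_{\ell}$ and $\Psi_{i\ell}$, i = 1, 2, . . . , N, are absolute summable, so that $\mathrm{Var}(\Delta f_t)$
and $\mathrm{Var}(\Delta\xi_{it})$ are bounded and positive definite, and $[\Psi_i(L)]^{-1}$ exists. In particular we
require that
$$\left\| \sum_{\ell=0}^{\infty} \Psi_{i\ell}\Psi_{i\ell}' \right\| \leq K < \infty, \tag{31.42}$$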
9 DdPS also allow for common observed macro factors (such as oil prices), but they are not included to simplify the
exposition. Also see Chapter 33.
where K is a fixed constant. A sufficient condition is given by $\left\|\Psi_{i\ell}\Psi_{i\ell}'\right\| < 1$, for all i and ℓ.
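Using the familiar decomposition (see Chapter 22):
$$\Lambda(L) = \Lambda(1) + (1 - L)\Lambda^{*}(L), \quad \text{and} \quad \Psi_i(L) = \Psi_i(1) + (1 - L)\Psi_i^{*}(L),$$
the common stochastic trend representations of (31.39) and (31.40) can now be written as
$$f_t = f_0 + \Lambda(1)\,s_t + \Lambda^{*}(L)\left(\eta_t - \eta_0\right),$$
and
$$\xi_{it} = \xi_{i0} + \Psi_i(1)\,s_{it} + \Psi_i^{*}(L)\left(v_{it} - v_{i0}\right),$$
where
$$s_t = \sum_{j=1}^{t} \eta_j, \quad \text{and} \quad s_{it} = \sum_{j=1}^{t} v_{ij}.$$
Substituting these into (31.38) gives
$$z_{it} = a_i + \Gamma_{id}\,d_t + \Gamma_{if}\Lambda(1)\,s_t + \Psi_i(1)\,s_{it} + \Gamma_{if}\Lambda^{*}(L)\,\eta_t + \Psi_i^{*}(L)\,v_{it},$$
where10
$$a_i = \Gamma_{if}\left[f_0 - \Lambda^{*}(L)\,\eta_0\right] + \xi_{i0} - \Psi_i^{*}(L)\,v_{i0}.$$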
In this representation $\Lambda(1)s_t$ and $\Psi_i(1)s_{it}$ can be viewed as common global and individual-specific stochastic trends, respectively; whilst $\Lambda^{*}(L)\eta_t$ and $\Psi_i^{*}(L)v_{it}$ are the common and individual-specific stationary components. From this result it is clear that, in general, it will not be possible to simultaneously eliminate the two types of common stochastic trends (global and individual-specific) in $z_{it}$.
Specific cases of interest where it would be possible for $z_{it}$ to form a cointegrating vector are when $\Lambda(1) = 0$ or $\Psi_i(1) = 0$. Under the former, panel cointegration exists if $\Psi_i(1)$ is rank deficient. The number of cointegrating relations could differ across $i$ and is given by $r_i = n - \mathrm{Rank}[\Psi_i(1)]$. Note that even in this case $z_{it}$ can be cross-sectionally correlated through the common stationary components, $\Lambda^{*}(L)\eta_t$. Under $\Psi_i(1) = 0$ for all $i$ with $\Lambda(1) \neq 0$, we will have panel cointegration if there exist $n \times r_i$ matrices $\beta_i$ such that $\beta_i'\Gamma_{if}\Lambda(1) = 0$.
Turning to the case where $\Lambda(1)$ and $\Psi_i(1)$ are both non-zero, panel cointegration could still exist but must involve both $z_{it}$ and $f_t$. But since $f_t$ is unobserved it must be replaced by a suitable estimate. The global VAR (GVAR) approach of Pesaran, Schuermann, and Weiner (2004) and Dées et al. (2007) implements this idea by replacing $f_t$ with the (weighted) cross-section averages of $z_{it}$ (see also Chapter 33). To see how this can be justified, first differencing (31.38) and
10 In the usual case where $d_t$ is specified to include an intercept, 1, $a_i$ can be absorbed into the deterministics.
$$(1-L)[\Psi_i(L)]^{-1} \approx \sum_{\ell=0}^{p}\Phi_{i\ell}L^{\ell} = \Phi_i(L,p),$$
When the common factors, ft , are observed the model for the ith cross-sectional unit decouples
from the rest of the units and can be estimated using the econometric techniques developed in
Pesaran, Shin, and Smith (2000), reviewed in Chapter 23, with ft treated as weakly exogenous.
But in general, where the common factors are unobserved, appropriate proxies for the common factors can be used. There are two possible approaches: one could either use the principal components of the observables, $z_{it}$, or alternatively, following Pesaran (2006), $f_t$ can be approximated in terms of $\bar{z}_t = N^{-1}\sum_{i=1}^{N}z_{it}$, the cross-section averages of the observables. To see how this procedure could be justified in the present context, average the individual equations given by (31.38) over $i$ to obtain
$$\bar{z}_t = \bar{\Gamma}_d d_t + \bar{\Gamma}_f f_t + \bar{\xi}_t, \quad (31.44)$$
where $\bar{\Gamma}_d = N^{-1}\sum_{i=1}^{N}\Gamma_{id}$, $\bar{\Gamma}_f = N^{-1}\sum_{i=1}^{N}\Gamma_{if}$, and $\bar{\xi}_t = N^{-1}\sum_{i=1}^{N}\xi_{it}$. Also, note from (31.40) that
$$\bar{\xi}_t - \bar{\xi}_{t-1} = N^{-1}\sum_{j=1}^{N}\Psi_j(L)v_{jt}. \quad (31.45)$$
But using results in Pesaran (2006), for each $t$ and as $N \to \infty$ we have $\bar{\xi}_t - \bar{\xi}_{t-1} \overset{q.m.}{\to} 0$, and hence $\bar{\xi}_t \overset{q.m.}{\to} \bar{\xi}$, where $\bar{\xi}$ is a time-invariant random variable. Using this result in (31.44) and assuming that the $n \times m$ average factor loading coefficient matrix, $\bar{\Gamma}_f$, has full column rank (with $n \geq m$) we obtain
$$f_t \overset{q.m.}{\to} (\bar{\Gamma}_f'\bar{\Gamma}_f)^{-1}\bar{\Gamma}_f'(\bar{z}_t - \bar{\Gamma}_d d_t - \bar{\xi}),$$
which justifies using the observable vector $\{d_t, \bar{z}_t\}$ as proxies for the unobserved common factors.
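The proxy argument can be illustrated with a small simulation. The sketch below is only an illustration under simplifying assumptions that are not in the text: a scalar observable (n = 1), a single I(1) factor (m = 1), no deterministics, loadings with a non-zero mean, and unit-specific components generated as independent random walks. The function name `factor_proxy_r2` and all parameter values are hypothetical. It regresses the simulated factor on an intercept and the cross-section average and reports the fit.

```python
import numpy as np

def factor_proxy_r2(N=500, T=100, seed=0):
    """R-squared from regressing a simulated common factor on (1, z_bar).

    Assumed design (illustrative only): one I(1) factor, scalar observables,
    loadings with mean one, independent random-walk idiosyncratic components.
    """
    rng = np.random.default_rng(seed)
    f = np.cumsum(rng.standard_normal(T))                # I(1) common factor
    gamma = 1.0 + 0.5 * rng.standard_normal(N)           # loadings, non-zero mean
    xi = np.cumsum(rng.standard_normal((T, N)), axis=0)  # unit-specific I(1) parts
    z = f[:, None] * gamma[None, :] + xi                 # T x N panel of observables
    z_bar = z.mean(axis=1)                               # cross-section average
    X = np.column_stack([np.ones(T), z_bar])
    coef, *_ = np.linalg.lstsq(X, f, rcond=None)
    resid = f - X @ coef
    return 1.0 - resid.var() / f.var()

print(factor_proxy_r2())
```

With a large cross-section the averaged idiosyncratic component is small relative to the factor, so the cross-section average tracks the factor closely, in line with the limiting argument above.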
The various contributions to the panel cointegration literature will now be reviewed in the
context of the above general set up. First-generation literature on panel cointegration tends to
ignore the possible effects of global unobserved common factors, or attempts to account for
them either by cross-section de-meaning or by using observable common effects such as oil
prices or US output. This literature also focusses on residual based approaches where it is often
assumed that there exists at most one cointegrating relation in the individual specific models.
Notable contributions to this strand of the literature include Kao (1999), Pedroni (1999, 2001,
2004), and more recently Westerlund (2005b). System approaches to panel cointegration that
allow for more than one cointegrating relation include the work of Larsson, Lyhagen, and Loth-
gren (2001), Groen and Kleibergen (2003) and Breitung (2005) who generalized the likelihood
approach introduced in Pesaran, Shin, and Smith (1999). Like the second generation panel unit
root tests, recent contributions to the analysis of panel cointegration have also emphasized the
importance of allowing for cross-sectional dependence which, as we have noted above, could be
due to the presence of common stationary or non-stationary components or both. The impor-
tance of allowing for the latter has been emphasized in Banerjee, Marcellino, and Osbat (2004)
through the use of Monte Carlo experiments in the case of panels where N is very small, at most
8 in their analysis. But to date a general approach that is capable of addressing all the various
issues involved does not exist if N is relatively large.
We now consider in some further detail the main contributions, beginning with a brief dis-
cussion of the spurious regression problem in panels.
are considered, where as before $\delta_i'd_{it}$ represents the deterministics and the $k \times 1$ vector of regressors, $x_{it}$, is assumed to be I(1) and not cointegrated. However, the innovations in $x_{it}$, denoted by $\varepsilon_{it} = \Delta x_{it} - E(\Delta x_{it})$, are allowed to be correlated with $u_{it}$. Residual-based approaches to panel cointegration focus on testing for unit roots in OLS or panel estimates of $u_{it}$.
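where $W_i$ is a $(k+1) \times 1$ vector of standard Brownian motions, $\overset{d}{\to}$ denotes weak convergence on $D[0,1]$ and
$$\Sigma_i = \begin{pmatrix} \sigma^2_{i,u} & \sigma_{i,u\varepsilon}' \\ \sigma_{i,u\varepsilon} & \Sigma_{i,\varepsilon\varepsilon} \end{pmatrix}.$$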
Kao (1999) showed that in the homogeneous case with $\Sigma_i = \Sigma$, $i = 1, \ldots, N$, and abstracting from the deterministics, the OLS estimator $\hat{\beta}$ converges in probability to the limit $\Sigma_{\varepsilon\varepsilon}^{-1}\sigma_{\varepsilon u}$, where it is assumed that $w_{it}$ is independently and identically distributed across $i$. In the heterogeneous case $\Sigma_{\varepsilon\varepsilon}$ and $\sigma_{\varepsilon u}$ are replaced by the means $\bar{\Sigma}_{\varepsilon\varepsilon} = N^{-1}\sum_{i=1}^{N}\Sigma_{i,\varepsilon\varepsilon}$ and $\bar{\sigma}_{\varepsilon u} = N^{-1}\sum_{i=1}^{N}\sigma_{i,\varepsilon u}$, respectively (see Pedroni (2000)). In contrast, the OLS estimator of $\beta$ fails to converge within a pure time series framework. On the other hand, if $x_{it}$ and $y_{it}$ are independent random walks, then the t-statistic for the hypothesis that one component of $\beta$ is zero is $O_p(T^{1/2})$ and, therefore, the t-statistic has similar properties as in the time series case. As demonstrated by Entorf (1997) and Kao (1999), the tendency for spuriously finding a relationship among $y_{it}$ and $x_{it}$ may be even stronger in panel data regressions than in the pure time series case. Therefore, it is important to test whether the errors in a panel data regression such as (31.46) are stationary.
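The two properties described above (a pooled slope estimate that settles down, combined with a t-statistic that does not) can be seen in a small Monte Carlo sketch. The design is an assumption made for illustration only: a single regressor, independent Gaussian random walks for both variables, unit-specific intercepts removed by time-demeaning, and a naive OLS standard error. The function names and default values are hypothetical.

```python
import numpy as np

def pooled_within_tstat(y, x):
    """Pooled within (fixed effects) slope and its naive OLS t-statistic.

    y and x are T x N arrays; each column is one cross-section unit.
    """
    yd = y - y.mean(axis=0)          # remove unit-specific intercepts
    xd = x - x.mean(axis=0)
    sxx = (xd ** 2).sum()
    beta = (xd * yd).sum() / sxx
    resid = yd - beta * xd
    T, N = y.shape
    s2 = (resid ** 2).sum() / (N * T - N - 1)
    return beta, beta / np.sqrt(s2 / sxx)

def spurious_rejection_rate(N=20, T=50, reps=200, seed=0):
    """Mean slope and share of |t| > 1.96 when y and x are unrelated I(1) series."""
    rng = np.random.default_rng(seed)
    betas, rejections = [], 0
    for _ in range(reps):
        y = np.cumsum(rng.standard_normal((T, N)), axis=0)
        x = np.cumsum(rng.standard_normal((T, N)), axis=0)
        b, t = pooled_within_tstat(y, x)
        betas.append(b)
        rejections += int(abs(t) > 1.96)
    return float(np.mean(betas)), rejections / reps

mean_beta, rate = spurious_rejection_rate()
print(mean_beta, rate)
```

In this design the average slope should be close to zero, the true value, while the nominal 5 per cent t-test rejects far more often than 5 per cent of the time, which is why a residual-based check of stationarity is needed before interpreting such regressions.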
Example 76 (House prices in the US) Holly, Pesaran, and Yamagata (2010) investigate the
extent to which real house prices at state level in the US are driven by fundamentals such as real
per capita disposable income, as well as by common shocks, and determine the speed of adjustment
of real house prices to macroeconomic and local disturbances. Economic theory suggests that real
house prices and incomes are cointegrated with cointegrating vector (1, −1). Let pit be the logarithm
of the real price of housing in the ith state during year t, and yit be the logarithm of the real per
capita personal disposable income. Table 31.1 reports CIPS panel unit root tests for these variables,
using data on forty-nine US states followed over the years 1975 to 2003. Results show that the
unit root hypothesis cannot be rejected for $p_{it}$ and $y_{it}$, if the trended nature of these variables is taken into account. This conclusion seems robust to the choice of the augmentation order of the underlying CADF regressions. Hence, the analysis proceeds taking $y_{it}$ and $p_{it}$ as I(1). To test for possible cointegration between $p_{it}$ and $y_{it}$, the authors estimate the following model
$$p_{it} = \alpha_i + \beta_i y_{it} + u_{it}, \quad (31.47)$$
With an Intercept
Notes: The reported values are CIPS(s) statistics, computed as the average of cross-sectionally augmented Dickey–Fuller (CADF(s)) test statistics (Pesaran (2007b)). The relevant lower 5% (10%) critical values for the CIPS statistics are −2.11 (−2.03) in the intercept case, and −2.62 (−2.54) in the intercept and linear trend case. $c_{it} = r_{it} - \Delta p_{it}$ is the real cost of borrowing net of real house price appreciation/depreciation. The superscripts ‘*’ and ‘†’ signify that the test is significant at the 5 and 10 per cent levels, respectively.
where, to allow for possible error cross-sectional dependence, $u_{it}$ is assumed to have the multi-factor error structure
$$u_{it} = \sum_{\ell=1}^{m}\gamma_{i\ell}f_{\ell t} + \varepsilon_{it}. \quad (31.48)$$
Table 31.2 Estimation result: income elasticity of real house prices: 1975–2003
MG CCEMG CCEP
Notes: Estimated model is $p_{it} = \alpha_i + \beta_i y_{it} + u_{it}$. MG stands for mean group estimates. CCEMG and CCEP denote the common correlated effects mean group and pooled estimates, respectively. $\hat{\alpha} = N^{-1}\sum_{i=1}^{N}\hat{\alpha}_i$ for all estimates, and $\hat{\beta} = N^{-1}\sum_{i=1}^{N}\hat{\beta}_i$ for MG and CCEMG estimates. Standard errors are given in parentheses. The average cross-correlation coefficient is computed as the simple average of the pair-wise cross-sectional correlation coefficients of the regression residuals, namely $\hat{\rho} = [2/N(N-1)]\sum_{i=1}^{N-1}\sum_{j=i+1}^{N}\hat{\rho}_{ij}$, with $\hat{\rho}_{ij}$ being the correlation coefficient of the regression residuals of the $i$ and $j$ cross-section units. The CD test statistic is $[TN(N-1)/2]^{1/2}\hat{\rho}$, which tends to $N(0,1)$ under the null hypothesis of no error cross-sectional dependence. See Section 29.7.
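The two quantities defined in these notes, the average pair-wise residual correlation and the CD statistic, can be computed as in the following sketch. It assumes a balanced T x N matrix of regression residuals; the function name `cd_statistic` and the simulated inputs are hypothetical and are not taken from the study.

```python
import numpy as np

def cd_statistic(resid):
    """Average pair-wise correlation and CD statistic for a T x N residual matrix."""
    T, N = resid.shape
    corr = np.corrcoef(resid, rowvar=False)      # N x N correlation matrix
    upper = np.triu_indices(N, k=1)              # pairs with i < j
    rho_bar = corr[upper].mean()                 # [2 / N(N-1)] * sum of rho_ij
    cd = np.sqrt(T * N * (N - 1) / 2.0) * rho_bar
    return rho_bar, cd

# Illustrative use with simulated residuals (T = 30, N = 49).
rng = np.random.default_rng(1)
independent = rng.standard_normal((30, 49))
with_factor = independent + rng.standard_normal((30, 1))  # common shock added
print(cd_statistic(independent))
print(cd_statistic(with_factor))
```

Under cross-sectional independence the statistic should look like a draw from a standard normal distribution, whereas a common shock shared by all units pushes it far into the rejection region.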
that the panel unit root tests applied to ûit should also allow for the cross-sectional dependence
of the residuals. Computing CIPS(s) panel unit root test statistics for pit − yit , including state-
specific intercepts, for different augmentation and lag-orders, s = 1, 2, 3 and 4, yields the results,
−2.16, −2.39, −2.45, and −2.29, respectively. The 5 per cent and 1 per cent critical values of the
CIPS statistic for the intercept case with N = 50 and T = 30 are −2.11 and −2.23, respectively.
The results suggest rejection of a unit root in pit − yit for all the augmentation orders at 5 per cent
level and rejection at 1 per cent level in the case of the augmentation orders 2 and more. Therefore,
one could conclude that pit and yit are cointegrated for a sufficiently large number of States. Hav-
ing established panel cointegration between pit and yit , Holly, Pesaran, and Yamagata (2010) turn
their attention to the dynamics of the adjustment of real house prices to real incomes and estimate
the panel error correction model
The coefficient φ i provides a measure of the speed of adjustment of house prices to a shock. The
half-life of a shock to pit is approximately −ln(2)/ln(1 + φ i ). To allow for possible cross-sectional
dependence in the errors, υ it , the authors compute CCEMG and CCEP estimators, and compare
these estimates with the mean group (MG) estimates, which do not take account of cross-sectional
dependence, as a benchmark. The former estimates are computed by the OLS regressions of $\Delta p_{it}$ on 1, $(p_{i,t-1} - y_{i,t-1})$, $\Delta p_{i,t-1}$, $\Delta y_{it}$, and the associated cross-section averages, $(\bar{p}_{t-1} - \bar{y}_{t-1})$, $\Delta\bar{y}_t$, $\Delta\bar{p}_t$, and $\Delta\bar{p}_{t-1}$. The results are summarized in Table 31.3. The coefficients are all correctly signed.
The CCEMG and CCEP estimators are very close and yield error correction coefficients given by
−0.183(0.016) and −0.171(0.015) that are reasonably large and statistically highly significant.
The average half-life estimates are around 3.5 years, much smaller than the half-life estimates of
6.3 years obtained using the MG estimators. But the MG estimators are likely to be biased, since
the residuals from these estimates show a high degree of cross-sectional dependence. The same is not
true of the CCE type estimators. This analysis suggests that, even if house prices deviate from the
equilibrating relationship because of state-specific or common shocks, they will eventually revert. If
Notes: The state-specific intercepts are estimated but not reported. MG stands for Mean Group esti-
mates. CCEMG and CCEP denote the common correlated effects mean group and pooled estimates,
respectively. Standard errors are given in parentheses. The half-life of a shock to $p_{it}$ is approximated by $-\ln(2)/\ln(1+\hat{\phi})$, where $\hat{\phi}$ is the pooled estimate of the coefficient on $p_{i,t-1} - y_{i,t-1}$.
house prices are above equilibrium they will tend to fall relative to income, and vice versa if they
are below equilibrium. Of course, because there is heterogeneity across states, a particular state need
not be in the same disequilibrium position as other states. But on average the change in the ratio
of house prices to per capita incomes should be zero, consistent with a cointegrating relationship,
for T sufficiently large. In their conclusions, Holly, Pesaran, and Yamagata (2010) also examine
the temporal pattern of the differences, pit − yit , since 2003. The process of house price boom that
started in the US in early 2000 accelerated during 2003–06 and some have interpreted this as a
bubble. Over the period 2000 to 2006 the average (unweighted) rise in US house prices was 46 per
cent, as compared with a 25 per cent rise in income per capita. However, the price increases relative
to per capita incomes have been quite heterogeneous. While house prices over the period 2000 to
2006 rose by 67 per cent in Virginia, 73 per cent in Arizona and 92 per cent in the District of
Columbia, they rose by only 20 per cent in Indiana and 21 per cent in Ohio. These differences were
much more pronounced than the rise in income per capita in these states (respectively 26 per cent,
23 per cent, 40 per cent, 20 per cent, and 19 per cent). Individual states can move about the average
because the loading of the driving variables differ across states or because the initial disequilibrium
is different. The extent of the heterogeneity in the disequilibrium, as measured by the time profile of
the logarithm of price-income per capita over the full sample, 1976–2007, for all the 49 states is
displayed in Figure 31.1. It is interesting that the excess rise in house prices tends to be associated
with increased dispersion in the log price-income ratios, which begin to decline with moderation of
house price rises relative to incomes. This fits well with the development of house prices in 2007,
where prices rose only by 4 per cent as compared with a rise in per capita income of 5 per cent. The
range of house price changes across states was also narrowed down substantially. In fact, in the case
[Figure 31.1 Log ratio of house prices to per capita incomes over the period 1976–2007 for the 49 states of the US. Line chart with one series per state; horizontal axis: year (1975–2007); vertical axis: log ratio, from −5.5 to −3.5.]
[Figure 31.2 Percent change in house prices to per capita incomes across the US states over 2000–06 as compared with the corresponding ratios in 2007. Chart by state with two series: ‘Average of net per cent change in house prices to income per capita over 2000–06’ and ‘Net per cent change in house prices to income per capita in 2007’; vertical axis from −0.10 to 0.10.]
of the five states mentioned above, the price-income ratio declined by 1 per cent in Virginia, 2 per
cent in Arizona, 0 per cent in District of Columbia, −2 per cent in Indiana, and 4 per cent in Ohio.
If we calculate the average change in the log ratio of house prices to per capita income for each state
over the period 2000–06, and compare it to the average change in the ratio for 2007, it is to be
expected that, if a state on average is above its equilibrium before 2006, the average change after
2006 should be negative, and vice versa otherwise. The results are plotted in Figure 31.2, and show
that of 49 states, 32 states have an average rate of price change in 2007 with the opposite sign to
the average price changes for 2000–06. Moreover, we note that the correlation coefficient between
the change in the price-income ratio in 2007, when the house price boom began to unwind, and the
average change in the same ratio over the preceding price boom period, 2000–2006, is negative and
quite substantial, around −0.42.
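The half-life approximation used in Example 76, $-\ln(2)/\ln(1+\phi)$, is easy to evaluate directly. The sketch below is a minimal illustration; the function name `half_life` is hypothetical, and the two inputs are the CCEMG and CCEP error correction coefficients quoted in the example.

```python
import math

def half_life(phi):
    """Approximate half-life (in periods) implied by an error correction coefficient.

    Uses -ln(2) / ln(1 + phi), which requires -1 < phi < 0.
    """
    if not -1.0 < phi < 0.0:
        raise ValueError("phi must lie strictly between -1 and 0")
    return -math.log(2.0) / math.log(1.0 + phi)

for label, phi in [("CCEMG", -0.183), ("CCEP", -0.171)]:
    print(label, round(half_life(phi), 2))
```

Both coefficients imply half-lives of roughly three and a half years, consistent with the figure reported in the example, and a coefficient closer to zero implies slower adjustment.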
$$(t_{\phi} - \sqrt{N}\mu_K)/\sigma_K \overset{d}{\to} N(0,1), \quad (31.50)$$
where the values of $\mu_K$ and $\sigma_K$ depend on the kind of deterministics included in the regression, the contemporaneous covariance matrix $E(w_{it}w_{it}')$ and the long-run covariance matrix, $\Sigma_i$. Kao (1999) proposes adjusting $t_{\phi}$ by using consistent estimates of $\mu_K$ and $\sigma_K$, where he assumes that the nuisance parameters are the same for all units in the panel.
Pedroni (2004) suggests two different test statistics for the models with heterogeneous cointegration vectors. Let $\hat{u}_{it} = y_{it} - \hat{\delta}_i'd_{it} - \hat{\beta}_i'x_{it}$ denote the OLS residual of the cointegration regression. Pedroni considers two different classes of test statistics: (i) the ‘panel statistic’ that is equivalent to the unit root statistic against homogeneous alternatives and (ii) the ‘group mean statistic’ which is analogous to the panel unit root tests against heterogeneous alternatives. The two versions of the t statistic are defined as
$$\text{panel:} \quad Z_{Pt} = \left(\tilde{\sigma}^2_{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{u}^2_{i,t-1}\right)^{-1/2}\left(\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{u}_{i,t-1}\Delta\hat{u}_{it} - T\sum_{i=1}^{N}\hat{\lambda}_i\right),$$
$$\text{group-mean:} \quad \tilde{Z}_{Pt} = \sum_{i=1}^{N}\left(\hat{\sigma}^2_{ie}\sum_{t=1}^{T}\hat{u}^2_{i,t-1}\right)^{-1/2}\left(\sum_{t=1}^{T}\hat{u}_{i,t-1}\Delta\hat{u}_{it} - T\hat{\lambda}_i\right),$$
where $\hat{\lambda}_i$ is a consistent estimator of the one-sided long-run variance $\lambda_i = \sum_{j=1}^{\infty}E(e_{it}e_{i,t-j})$, $e_{it} = u_{it} - \delta_i u_{i,t-1}$, $\delta_i = E(u_{it}u_{i,t-1})/E(u^2_{i,t-1})$, $\hat{\sigma}^2_{ie}$ denotes the estimated variance of $e_{it}$ and $\tilde{\sigma}^2_{NT} = N^{-1}\sum_{i=1}^{N}\hat{\sigma}^2_{ie}$. Pedroni presents values of $\mu_p$, $\sigma^2_p$ and $\tilde{\mu}_p$, $\tilde{\sigma}^2_p$ such that $(Z_{Pt} - \mu_p\sqrt{N})/\sigma_p$ and $(\tilde{Z}_{Pt} - \tilde{\mu}_p\sqrt{N})/\tilde{\sigma}_p$ have standard normal limiting distributions under the null hypothesis.
Other residual-based panel cointegration tests include the recent contribution of Westerlund (2005b) which is based on variance ratio statistics and does not require corrections for the residual serial correlations.
The finite sample properties of some residual based tests for panel cointegration are discussed
in Baltagi and Kao (2000). Gutierrez (2006) compares the power of various panel cointegration
test statistics. He shows that in homogeneous panels with a small number of time periods Kao’s
tests tend to have higher power than Pedroni’s tests, whereas in panels with large T the latter
tests perform best. Both tests outperform the system test suggested by Larsson, Lyhagen, and
Lothgren (2001). Wagner and Hlouskova (2010) compare various panel cointegration tests in
a large scale simulation study. They found that the Pedroni (2004) test based on ADF regres-
sions performs best, whereas all other tests tend to be severely undersized and have very low
power in many cases. Furthermore, the system tests suffer from large small sample distortions
and are unreliable tools for finding out the correct cointegration rank. Gengenbach, Palm, and
Urbain (2006) investigate the performance of Pedroni’s tests in cross-dependent models with a
factor structure.
to test the null hypothesis that $r = 0$ against the alternative that at most $r = r_0 \geq 1$. Using a sequential limit theory it can be shown that the standardized LR-bar statistic is asymptotically standard normally distributed. Asymptotic values of $E[\lambda_i(r)]$ and $\mathrm{Var}[\lambda_i(r)]$ are tabulated in Larsson, Lyhagen, and Lothgren (2001) for the model without deterministic terms and Breitung (2005) for models with a constant and a linear time trend. Unlike the residual-based tests, the LR-bar test allows for the possibility of multiple cointegration relations in the panel.
It is also possible to test the null hypothesis that the errors of the cointegration regression
are stationary. That is, under the null hypothesis it is assumed that yit , xit are cointegrated with
cointegration rank r = 1. McCoskey and Kao (1998) suggest a panel version of the Shin (1994)
cointegration test based on the residuals of a fully modified OLS regression. Westerlund (2005c)
suggests a related test procedure based on the CUSUM statistic.
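$$y^{+}_{it} = y_{it} - \sigma_{i,\varepsilon u}'\Sigma^{-1}_{i,\varepsilon\varepsilon}\Delta x_{it}. \quad (31.51)$$
$$\hat{\beta}_{FM} = \left(\sum_{i=1}^{N}\sum_{t=1}^{T}x_{it}x_{it}'\right)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\left(x_{it}y^{+}_{it} - \lambda_{i,\varepsilon u}\right), \quad (31.52)$$
where
$$\lambda_{i,\varepsilon u} = E\left(\sum_{j=0}^{\infty}\varepsilon_{i,t-j}u_{it}\right).$$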
where $v_{it}$ is orthogonal to all leads and lags of $\Delta x_{it}$. Inserting (31.53) in the regression (31.46) yields
$$y_{it} = \beta'x_{it} + \sum_{k=-\infty}^{\infty}\gamma_k'\Delta x_{i,t+k} + v_{it}. \quad (31.54)$$
In practice the infinite sums are truncated at a small number of leads and lags (see Kao and Chiang (2001), Mark and Sul (2003)). Westerlund (2005a) considers data dependent choices of the truncation lags. Kao and Chiang (2001) show that, in the homogeneous case with $\Sigma_i = \Sigma$ and individual specific intercepts, the limiting distribution of the DOLS estimator $\hat{\beta}_{DOLS}$ is given by
$$T\sqrt{N}(\hat{\beta}_{DOLS} - \beta) \overset{d}{\to} N(0, 6\sigma^2_{u|\varepsilon}\Sigma^{-1}_{\varepsilon\varepsilon}),$$
where
$$\sigma^2_{u|\varepsilon} = \sigma^2_u - \sigma_{\varepsilon u}'\Sigma^{-1}_{\varepsilon\varepsilon}\sigma_{\varepsilon u}.$$
Furthermore, the FM-OLS estimator possesses the same asymptotic distribution as the DOLS estimator. In the heterogeneous case $\Sigma_{\varepsilon\varepsilon}$ and $\sigma^2_{u|\varepsilon}$ are replaced by $\bar{\Sigma}_{\varepsilon\varepsilon} = N^{-1}\sum_{i=1}^{N}\Sigma_{i,\varepsilon\varepsilon}$ and $\bar{\sigma}^2_{u|\varepsilon} = N^{-1}\sum_{i=1}^{N}\sigma^2_{i,u|\varepsilon}$, respectively (see Phillips and Moon (1999)). Again, the matrix $\Sigma_i$ can be estimated consistently (for $T \to \infty$) by using a non-parametric approach.
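A truncated version of the leads-and-lags regression can be sketched as follows. This is only a minimal illustration under assumptions that go beyond the text: a single regressor, a common truncation order `p` for leads and lags, unit-specific intercepts removed by demeaning, and no standard errors. The function names `pooled_dols` and `simulate_cointegrated_panel`, and the simulated design with endogenous regressor innovations, are hypothetical.

```python
import numpy as np

def pooled_dols(y, x, p=2):
    """Pooled DOLS slope with p leads and p lags of the differenced regressor.

    y and x are T x N arrays; a single regressor is assumed.
    """
    T, N = y.shape
    dx = np.diff(x, axis=0)              # dx[t - 1] holds x_t - x_{t-1}
    ts = np.arange(p + 1, T - p)         # periods with all leads and lags available
    X_blocks, y_blocks = [], []
    for i in range(N):
        leads_lags = np.column_stack(
            [dx[ts - 1 + k, i] for k in range(-p, p + 1)]
        )
        Xi = np.column_stack([x[ts, i], leads_lags])
        yi = y[ts, i]
        X_blocks.append(Xi - Xi.mean(axis=0))   # remove unit-specific intercept
        y_blocks.append(yi - yi.mean())
    X = np.vstack(X_blocks)
    Y = np.concatenate(y_blocks)
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return coef[0]                        # coefficient on the level of x

def simulate_cointegrated_panel(N=20, T=100, beta=2.0, seed=0):
    """Cointegrated panel in which the error is correlated with the x innovations."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((T, N))
    x = np.cumsum(eps, axis=0)
    u = 0.5 * eps + rng.standard_normal((T, N))
    alpha = rng.standard_normal(N)
    return alpha[None, :] + beta * x + u, x

y, x = simulate_cointegrated_panel()
print(pooled_dols(y, x, p=2))
```

Including the leads and lags of the differenced regressor absorbs the correlation between the regression error and the regressor innovations, which is the purpose of the augmentation in (31.54).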
In many applications the number of time periods is smaller than 20 and, therefore, the kernel
based estimators of the nuisance parameters may perform poorly in such small samples. In these
cases, the pooled mean group estimator introduced by Pesaran, Shin, and Smith (1999) and dis-
cussed in Section 28.10 may be used. This method assumes that the long-run parameters are
identical across the cross-section units. Economic theory often predicts the same cointegration
relation(s) across the cross-section units, although it is often silent on the magnitude of short-
run dynamics, across i. For example, the long-run relationships predicted by the PPP, the uncov-
ered interest parity, or the Fisher equation are the same across countries, although the speed of
convergence to these long-run relations can differ markedly over countries due to differences in
economic and political institutions. For further discussion see, for example, Pesaran (1997).
where wit = (uit , εit ) . Once again we leave out deterministic terms and lagged differences. To
be consistent with the approaches considered above, we confine ourselves to the case of homo-
geneous cointegration, that is, we let β i = β for i = 1, 2, . . . , N. Larsson and Lyhagen (1999)
propose an ML estimator, whereas the estimator of Groen and Kleibergen (2003) is based on a
nonlinear GMM approach.
It is well known that the ML estimator of the cointegration parameters for a single series may
behave poorly in small samples. Phillips (1994) has shown that the finite sample moments of
the estimator do not exist. Using Monte Carlo simulations Hansen, Kim, and Mittnik (1998)
and Brüggemann and Lütkepohl (2005) found that the ML estimator may produce implausible
estimates far away from the true parameter values. Furthermore the asymptotic χ 2 distribution
of the likelihood ratio test for restrictions on the cointegration parameters may be a poor guide
for small sample inference (e.g., Gredenhoff and Jacobson (2001)).
To overcome these problems, Breitung (2005) proposes a computationally convenient two-step
estimator, which is adapted from Ahn and Reinsel (1990). This estimator is based on the
fact that the Fisher information is block-diagonal with respect to the short- and long-run param-
eters. Accordingly, an asymptotically efficient estimator can be constructed by estimating the
short- and long-run parameters in separate steps. Suppose that the $n \times r$ matrix of cointegrating vectors is ‘normalized’ as $\beta = (I_r, B)'$, where $I_r$ is the identity matrix of order $r$ and $B$ is the $(n-r) \times r$ matrix of unknown coefficients.¹² Then $\beta$ is exactly identified and the Gaussian ML estimator of $B$ is equivalent to the OLS estimator of $B$ in
$$z^{*}_{it} = Bz^{(2)}_{i,t-1} + v_{it}, \quad (31.56)$$
where $z^{(2)}_{it}$ is the $r \times 1$ vector defined by $z_{it} = (z^{(1)\prime}_{it}, z^{(2)\prime}_{it})'$, and
$$z^{*}_{it} = (\alpha_i'\Sigma_i^{-1}\alpha_i)^{-1}\alpha_i'\Sigma_i^{-1}\Delta z_{it} - z^{(1)}_{i,t-1}.$$
The matrices $\alpha_i$ and $\Sigma_i$ can be replaced by $\sqrt{T}$-consistent estimates without affecting the limiting distribution. Accordingly, these matrices can be estimated for each panel unit separately, for example by using the Johansen (1991) ML estimator. To obtain the same normalization as in (31.56) the estimator for $\alpha_i$ is multiplied with the $r \times r$ upper block of the ML estimator of $\beta$.
Breitung (2005) shows that the limiting distribution of the OLS estimator of B is asymptoti-
cally normal. Therefore, tests of restrictions on the cointegration parameters have the standard
limiting distributions (i.e. a χ 2 distribution for the usual Wald tests).
Some Monte Carlo experiments are performed by Breitung (2005) to compare the small sample
properties of the two-step estimator with the FM-OLS and DOLS estimators. The results
suggest that the latter two estimators may be severely biased in small samples, whereas the bias of
the two-step estimator is relatively small. Furthermore, the standard errors (and hence the size
properties of the t-statistics) of the two-step procedure are more reliable than the ones of the
semi-parametric estimation procedures. In a large scale simulation study, Wagner and Hlouskova
(2010) found that the DOLS estimator outperforms all other estimators, whereas the FM-OLS
and the two-step estimator perform similarly.
12 The analysis can be readily modified to take account of other types of exact identifying restrictions on β that might be
more appropriate from the viewpoint of long-run economic theory. See Pesaran and Shin (2002) for a general discussion
of identification and testing of cointegrating relations in the context of a single cross-sectional unit.
regression residuals. Furthermore, define $\tilde{y}_t = (\tilde{y}_{1t}, \tilde{y}_{2t}, \ldots, \tilde{y}_{Nt})'$ and $X_t = (\tilde{x}_{1t}, \tilde{x}_{2t}, \ldots, \tilde{x}_{Nt})$. The DSUR estimator of the (homogeneous) cointegration vector is
$$\hat{\beta}_{dsur} = \left(\sum_{t=p+1}^{T-p}X_t\Omega_{uu}^{-1}X_t'\right)^{-1}\left(\sum_{t=p+1}^{T-p}X_t\Omega_{uu}^{-1}\tilde{y}_t\right), \quad (31.57)$$
where $\Omega_{uu}$ denotes the long-run covariance matrix of $u_t = (u_{1t}, u_{2t}, \ldots, u_{Nt})'$, namely
$$\Omega_{uu} = \lim_{T\to\infty}\frac{1}{T}E\left[\left(\sum_{t=1}^{T}u_t\right)\left(\sum_{t=1}^{T}u_t\right)'\right],$$
for a fixed $N$. This matrix is estimated by using an autoregressive representation of $u_t$. See also (31.53). An alternative approach is suggested by Breitung (2005), where an SUR procedure is applied in the second step of the two-step estimator.
Bai and Kao (2005), Westerlund (2007), and Bai, Kao, and Ng (2009) suggest estimators for the cointegrated panel data model given by
$$y_{it} = \beta'x_{it} + \gamma_i'f_t + e_{it}, \quad (31.58)$$
where $f_t$ is an $r \times 1$ vector of common factors and $e_{it}$ is the idiosyncratic error. Bai and Kao (2005) and Westerlund (2007) assume that $f_t$ is stationary. They suggest an FM-OLS cointegration regression that accounts for the cross-correlation due to the common factors. Bai, Kao, and Ng (2009) consider a model with non-stationary factors. Their estimation procedure is based on a sequential minimization of the criterion function
$$S_{NT}(\beta, f_1, \ldots, f_T, \gamma_1, \ldots, \gamma_N) = \sum_{i=1}^{N}\sum_{t=1}^{T}(y_{it} - \beta'x_{it} - \gamma_i'f_t)^2, \quad (31.59)$$
subject to the constraint $T^{-1}\sum_{t=1}^{T}f_tf_t' = I_r$ and $\sum_{i=1}^{N}\gamma_i\gamma_i'$ being diagonal. The asymptotic bias of the resulting estimator is corrected for by using an additive bias adjustment term or by using a procedure similar to the FM-OLS estimator suggested by Phillips and Hansen (1990).
A common feature of these approaches is that cross-sectional dependence is represented by a contemporaneous correlation of the errors, which does not allow for the possibility of cross-unit cointegration. In many applications it is more realistic to allow for some form of dynamic cross-sectional dependence. A general model to accommodate cross-section cointegration and dynamic links between panel units is the panel VECM model considered by Groen and Kleibergen (2003) and Larsson and Lyhagen (1999). As in Section 31.7, let $z_{it}$ denote an $n$-dimensional vector of time series on the $i$th cross-sectional unit. Consider the $nN \times 1$ vector $z_t = (z_{1t}', z_{2t}', \ldots, z_{Nt}')'$ of all available time series in the panel data set. The VECM representation of this time series vector is
$$\Delta z_t = \Pi z_{t-1} + \sum_{k=1}^{p}\Gamma_k\Delta z_{t-k} + u_t. \quad (31.60)$$
For cointegrated systems $\mathrm{rank}(\Pi) < nN$. It is obvious that such systems typically involve a large number of parameters as the number of parameters increases with $N^2$. Therefore, to obtain reliable estimates of the parameters $T$ must be considerably larger than $N$. In many macroeconomic applications, however, the number of time periods is roughly as large as the number of cross-section units. Therefore, a simple structure must be imposed on the matrices $\Pi, \Gamma_1, \ldots, \Gamma_p$ that yields a reasonable approximation to the underlying dynamic system.
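The statement that the number of parameters increases with the square of N can be made concrete with a simple count. The sketch below assumes an unrestricted VECM for the stacked nN-dimensional vector with one long-run matrix and p short-run matrices, each of dimension nN x nN, and it ignores deterministic terms and the error covariance matrix. The function name `vecm_coefficient_count` is hypothetical.

```python
def vecm_coefficient_count(n, N, p):
    """Coefficients in an unrestricted VECM for an (n * N)-dimensional vector.

    Counts one long-run matrix plus p short-run matrices, each (nN) x (nN);
    deterministics and the error covariance matrix are not counted.
    """
    dim = n * N
    return (p + 1) * dim * dim

for N in (5, 10, 20, 40):
    print(N, vecm_coefficient_count(n=2, N=N, p=1))
```

Doubling N multiplies the count by four, while the number of observations available per equation grows only with T, which is why restrictions on the coefficient matrices are needed when T and N are of similar size.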
31.13 Exercises
1. Let $y_{it}$ be the real exchange rate (in logs) of country $i = 1, 2, \ldots, N$, observed over the period $t = 1, 2, \ldots, T$. Suppose that $y_{it}$ is generated by the first-order autoregressive process
$$y_{it} = (1 - \phi_i)\mu_i + \phi_i y_{i,t-1} + \varepsilon_{it}, \quad i = 1, 2, \ldots, N; \; t = 1, 2, \ldots, T,$$
where initial values, $y_{i0}$, are given, and $\varepsilon_{it}$ are serially uncorrelated and distributed independently across $i$, $\varepsilon_{it} \sim IIDN(0, \sigma^2_i)$.
$$\hat{\phi}_i = \frac{y_i'M_{\tau}y_{i,-1}}{y_{i,-1}'M_{\tau}y_{i,-1}},$$
where $y_{i,-1} = (y_{i0}, y_{i1}, \ldots, y_{i,T-1})'$, $y_i = (y_{i1}, y_{i2}, \ldots, y_{iT})'$, $M_{\tau} = I_T - \tau_T(\tau_T'\tau_T)^{-1}\tau_T'$, $\tau_T = (1, 1, \ldots, 1)'$. Hence, or otherwise, establish that under $\phi_i = 1$
$$\hat{\phi}_i = 1 + \frac{\varepsilon_i'M_{\tau}s_{i,-1}}{s_{i,-1}'M_{\tau}s_{i,-1}},$$
where $s_{i,-1} = (0, s_{i1}, \ldots, s_{i,T-1})'$, with $s_{it} = \sum_{j=1}^{t}\varepsilon_{ij}$, and for a fixed $T$ ($> 3$) show that
$$E(\hat{\phi}_i) = 1 + \text{Bias},$$
where
$$\text{Bias} = E\left(\frac{x'M_{\tau}Hx}{x'H'M_{\tau}Hx}\right),$$
$$\hat{\phi}_{MGE} = N^{-1}\sum_{i=1}^{N}\hat{\phi}_i.$$
(a) Suppose that $|\phi| < 1$, $E(\alpha_i u_{it}) = 0$ for all $i$ and $t$, and the processes in (31.61) started a long time ago. Show that the IV estimator of $\phi$ given by
$$\hat{\phi}_{IV} = \frac{\sum_{t=3}^{T}\sum_{i=1}^{N}\Delta y_{it}\,y_{i,t-2}}{\sum_{t=3}^{T}\sum_{i=1}^{N}\Delta y_{i,t-1}\,y_{i,t-2}},$$
is a consistent estimator of $\phi$ for a fixed $T > 2$, and as $N \to \infty$. Make sure you establish that $N^{-1}\sum_{i=1}^{N}\Delta y_{i,t-1}\,y_{i,t-2}$ tends to a non-zero value.
(b) Consider now the case where $\phi = 1$. Are there any conditions under which $\hat{\phi}_{IV}$ is a consistent estimator?
$$y_{it} = \mu_i + gt + v_{it},$$
(a) Show that $E(\Delta y_{it}) = g$, irrespective of whether $|\phi| < 1$ or $\phi = 1$. What is the implication of this observation for robust estimation of $g$?
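(b) Let
$$\Delta y_t = \begin{pmatrix} \Delta y_{1t} \\ \Delta y_{2t} \\ \vdots \\ \Delta y_{Nt} \end{pmatrix}, \quad u_t = \begin{pmatrix} u_{1t} \\ u_{2t} \\ \vdots \\ u_{Nt} \end{pmatrix},$$
$$W_t = \begin{pmatrix} 1 & \Delta y_{1,t-1} \\ 1 & \Delta y_{2,t-1} \\ \vdots & \vdots \\ 1 & \Delta y_{N,t-1} \end{pmatrix}, \quad Z_t = \begin{pmatrix} 1 & \Delta y_{1,t-2} \\ 1 & \Delta y_{2,t-2} \\ \vdots & \vdots \\ 1 & \Delta y_{N,t-2} \end{pmatrix},$$
and
$$\Delta y_t = W_t\psi + u_t,$$
converges in probability to $[(1-\phi)g, \phi]'$ if $|\phi| < 1$ and $E(u_{it}) = 0 = E(u_{it}\mu_i)$.
(c) Consider the case where $\phi = 1$ and analyse the asymptotic properties of $\hat{\psi}_{IV}$.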
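where $\bar{y}_t = N^{-1}\sum_{i=1}^{N}y_{it}$, $\bar{x}_t = N^{-1}\sum_{i=1}^{N}x_{it}$, $\bar{\alpha} = N^{-1}\sum_{i=1}^{N}\alpha_i$, $\bar{\gamma}_j = N^{-1}\sum_{i=1}^{N}\gamma_{ij}$, and $\bar{\theta} = N^{-1}\sum_{i=1}^{N}\theta_i$.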
(b) Using the results in (a) derive a panel unit root test of λ = 1, against the alternative of
λ < 1, assuming that fjt follows stationary processes.
(c) Consider now the case where γ i2 = 0, and yit and xit are determined by the same factor
f1t and assume that f1t is I(1). Discuss the conditions under which ȳt and x̄t are cointe-
grated. How do you test and estimate such a cointegrating relationship, assuming that it
does exist?
(d) How do you interpret a test of λ = 1 if one of the factors is I(1)?
32.1 Introduction
Aggregation can be broadly divided into two categories: aggregation across time and over
cross-sectional units. The concept of cross-sectional units in this chapter is broadly defined, and
includes geographical dimensions (e.g., regions or countries), individuals (e.g., firms or house-
holds), industries and products (e.g., the consumer price index basket of goods and services).
There are two leading examples of aggregation across time: sequential sampling and temporal
aggregation. The former method sequentially samples data from a higher frequency to a lower
frequency. One example of sequential sampling is market closing prices or end-of-period prices.
The latter method, on the other hand, combines the data typically by using period averages, to
convert the series from a higher frequency to a lower frequency. Examples of temporally aggre-
gated data are data measuring economic activity (gross domestic product, industrial output,
retail sales) or consumer price data, where prices are collected repeatedly over a number of days
within a month and then averaged across collection days to obtain monthly price indices for
individual goods and/or services.
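The difference between the two forms of aggregation across time can be shown with a short sketch. It assumes a series whose length is a multiple of the aggregation window and uses simple unweighted averages; the function names `sequential_sample` and `temporal_aggregate` and the monthly figures are hypothetical.

```python
def sequential_sample(series, k):
    """Keep the last observation of each non-overlapping block of k periods."""
    return [series[i + k - 1] for i in range(0, len(series) - k + 1, k)]

def temporal_aggregate(series, k):
    """Average each non-overlapping block of k periods."""
    return [sum(series[i:i + k]) / k for i in range(0, len(series) - k + 1, k)]

monthly = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
print(sequential_sample(monthly, 3))    # end-of-quarter values
print(temporal_aggregate(monthly, 3))   # quarterly averages
```

End-of-period sampling discards the within-period observations, whereas averaging uses all of them, so the two lower frequency series generally differ even though they come from the same underlying data.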
Aggregation over cross-sectional units (or ‘cross-sectional aggregation’) can also be divided
into two categories, depending on the number of units, whether aggregation is carried out across
a finite number of units (N) and/or over a large number of units (N→∞). This distinction
is important for theoretical analysis, where taking limits (N→∞) often simplifies the analy-
sis. Large N asymptotics often seem a reasonable approximation when it comes to macroeco-
nomic data, where the number of cross-sectional units (households, products, firms, etc.) can
be very large.
The focus of this chapter is predominantly on large N aggregation. It first briefly reviews
the main aggregation problems studied in the literature (Section 32.2). Then it presents a gen-
eral framework for micro/disaggregate behavioral relationships (Section 32.3) and develops a
forecasting approach to derive the optimal aggregate function (Section 32.4). This approach is
applied to a large cross-section aggregation of panel ARDL models (Section 32.5) and to the
case of large factor-augmented VAR models in N cross-sectional units, where each micro unit is
potentially related to all other micro units, and where micro innovations are allowed to be cross
sectionally dependent (Section 32.6). The optimal aggregate function is used to examine the
relationship between micro and macro parameters to show which distributional features of micro
parameters can be identified from the aggregate model (Section 32.7). This chapter also derives
and contrasts impulse response functions for the aggregate variables, distinguishing between the
effects of composite macro and aggregated idiosyncratic shocks (Section 32.8). Some of these
findings are illustrated by Monte Carlo experiments (Section 32.9) and two applications are pre-
sented. The first application investigates the aggregation of life-cycle consumption decision rules
under habit formation (Section 32.10). The second application investigates the sources of per-
sistence of consumer price inflation in Germany, France, and Italy, and re-examines the extent to
which ‘observed’ inflation persistence at the aggregate level is due to aggregation and/or com-
mon unobserved factors (Section 32.11).
to shocks is of considerable interest in policy making. Both inflation and real exchange rates are
aggregates from data on a large number of goods and services.
The problem of aggregating a large number of independent time series processes was first
addressed by Robinson (1978) and Granger (1980). Granger shows that aggregate variables can
have fundamentally different time series properties as compared with those of the underlying
micro units. Focusing on autoregressive models of order 1, AR(1), he shows that aggregation
can generate long memory even if the micro units follow stochastic processes with exponentially
decaying autocovariances (see also Section 15.8.3). Consider the following AR(1) disaggregate
relations,
$$y_{it} = \lambda_i y_{i,t-1} + u_{it},$$
for $i = 1, 2, \ldots, N$, and $t = \ldots, -1, 0, 1, 2, \ldots$, where $|\lambda_i| < 1$. Suppose these relations are independent, and in addition $\lambda_i$ and $\mathrm{Var}(u_{it}) = \sigma_i^2$ are independently and identically distributed (IID) random draws, with the distribution function $F(\lambda)$ for $\lambda$ on the range $[0, 1)$. Granger’s objective is to study the memory properties of the aggregate variable $S_{t,N}(y) = \sum_{i=1}^{N} y_{it}$. The same setup is also considered in earlier work by Robinson (1978), but with a focus on the estimation of the moments of $F(\lambda)$. To study the persistence properties of the aggregates, Granger considers the spectrum of $\bar{y}_t = N^{-1}\sum_{i=1}^{N} y_{it}$,
$$\bar{f}_N(\omega) = N^{-1}\sum_{i=1}^{N} f_i(\omega) \approx \frac{1}{2\pi}\, E[\mathrm{Var}(u_{it})] \int \frac{1}{\left|1 - \lambda e^{-i\omega}\right|^{2}}\, dF(\lambda).$$
$$y_{it} = x_{it} + \gamma_i f_t,$$
where $x_{it}$ is a unit-specific explanatory variable, $f_t$ is a common factor with loadings $\gamma_i$, and $y_{it}$ is the observation for unit $i$ at time $t$. Suppose $x_{it}$ and $f_t$ have zero means and bounded variances, and $x_{it}$ is distributed independently of $f_t$ and of $x_{jt}$ for all $j \neq i$. Consider the variance of the aggregate variable $N\bar{y}_t = \sum_{i=1}^{N} y_{it}$,
$$\mathrm{Var}(N\bar{y}_t) = \sum_{i=1}^{N} \mathrm{Var}(x_{it}) + N^{2}\bar{\gamma}^{2}\, \mathrm{Var}(f_t),$$
where $\bar{y}_t = N^{-1}\sum_{i=1}^{N} y_{it}$ and $\bar{\gamma} = N^{-1}\sum_{i=1}^{N} \gamma_i$. The first summand is at most of order $N$, denoted as $O(N)$, and, provided that $\lim_{N\to\infty}\bar{\gamma} \neq 0$, the second summand is of order $N^{2}$. The
second term will therefore generally dominate the aggregate relationship. Granger demonstrates
striking implications of this finding in terms of the fit of the aggregate (macro) relationship, where
the common factor prevails when N is sufficiently large, and disaggregate (micro) relationships,
where the micro regressor could play a leading role. If the common factor was unobserved, then
the aggregate relation would have zero fit (for N large) whereas the fit of disaggregate relations
could be quite high, being driven by the micro regressor, xit . On the other hand, if ft was observed
and xit was unobserved then the macro relation would have a perfect fit (for N large), whereas the
micro relation may have a very poor fit due to the missing micro regressor, xit . Hence variables
that may have very good explanatory power at the micro level might be unimportant at the macro
level, and vice versa. Granger shows that the strength and pattern of cross-sectional dependence
thus play a central role in aggregation and components with weaker cross-sectional dependence
typically do not matter for the behaviour of aggregate variables.
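The order-of-magnitude argument above can be checked numerically with a short Python sketch. It assumes, purely for illustration, that every $x_{it}$ has the same variance `var_x`; the function names and the parameter values are hypothetical.

```python
def variance_components(N, var_x=1.0, gamma_bar=0.5, var_f=1.0):
    """Return the two summands of Var(N * ybar_t) under equal Var(x_it)."""
    idiosyncratic = N * var_x                      # sum of Var(x_it), order N
    common = (N ** 2) * (gamma_bar ** 2) * var_f   # factor term, order N^2
    return idiosyncratic, common


def common_share(N, var_x=1.0, gamma_bar=0.5, var_f=1.0):
    """Share of the aggregate variance that is due to the common factor."""
    idiosyncratic, common = variance_components(N, var_x, gamma_bar, var_f)
    return common / (idiosyncratic + common)


# The factor share approaches one as N grows, unless gamma_bar is zero.
shares = {N: common_share(N) for N in (10, 100, 10_000)}
```

With a non-zero average loading the share of the factor term rises towards one as `N` increases, while setting `gamma_bar=0.0` removes the common component from the aggregate altogether.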
Aggregation has also been studied from the perspective of forecasting: is it better to fore-
cast using aggregate or disaggregate data, if the primary objective is to forecast the aggregates?
Pesaran, Pierse, and Kumar (1989) and Pesaran, Pierse, and Lee (1994), building on Grunfeld
and Griliches (1960), develop selection criteria for a choice between aggregate and disaggregate
specifications. Giacomini and Granger (2004) discuss forecasting of aggregates in the context of
space-time autoregressive models.
Other contributions to the theory of aggregation include those of Kelejian (1980), Stoker (1984, 1986), and Garderen et al. (2000) on the aggregation of static nonlinear micro models, and those of Pesaran and Smith (1995), Phillips and Moon (1999), and Trapani and Urga (2010) on the effects of aggregation on cointegration.
where yit denotes the vector of decision variables, xit is a vector of observable variables, uit is a
vector of unobservable variables, and θ i denotes the vector of unknown parameters.
Example 77 When the source of heterogeneity is different inputs (or endowments) across individuals
only, we have
$$y_{it} = f(x_{it}, u_{it}, \theta).$$
For this type of heterogeneity, aggregation clearly will not be a problem when the micro relations are
linear.
Example 78 When the input variables as well as the parameters differ across individuals, we have
$$y_{it} = f(x_{it}, u_{it}, \theta_i).$$
In the analysis of nonlinear Engel curves, such a scenario arises, for example, if the model is given by
$$w_{it} = a_{0i} + a_{1i}\log x_{it} + a_{2i}\left(\log x_{it}\right)^{2} + u_{it}. \qquad (32.6)$$
Example 79 It is also possible that there is heterogeneity in the functional form of the micro relations,
for example a production function of the form
$$y_{it} = \left[\lambda_i L_{it}^{-\delta_i} + (1 - \lambda_i) K_{it}^{-\delta_i}\right]^{-1/\delta_i} e^{u_{it}}. \qquad (32.7)$$
In this chapter, we consider the case where f (·) is the same across individuals, but the input
variables xit and uit , and/or the parameters θ i differ across individuals. The analysis can also be
easily extended to account for observed and unobserved macro (or aggregate) effects on indi-
vidual behaviour, namely
where zt represents a vector of observed macro effects, and vt represents a vector of unobserved
macro effects.
$$\bar{y}_t = N^{-1}\sum_{i=1}^{N} f(x_{it}, u_{it}, \theta_i). \qquad (32.9)$$
An aggregation problem is said to be present if the aggregate function $F(\bar{x}_t, \bar{u}_t, \theta_a)$ (with $\bar{x}_t = N^{-1}\sum_{i=1}^{N} x_{it}$, $\bar{u}_t = N^{-1}\sum_{i=1}^{N} u_{it}$, and where $\theta_a$ is the vector of parameters of the aggregate function) differs from $N^{-1}\sum_{i=1}^{N} f(x_{it}, u_{it}, \theta_i)$. Perfect aggregation holds if
$$\left\| F(\bar{x}_t, \bar{u}_t, \theta_a) - N^{-1}\sum_{i=1}^{N} f(x_{it}, u_{it}, \theta_i) \right\| = 0, \qquad (32.10)$$
for all $x_{it}$, $u_{it}$, and $\theta_i$, where $\|a - b\|$ denotes a suitable norm discrepancy measure between $a$ and $b$. This requirement turns out to be extremely restrictive and is rarely met in applied economic
analysis, except for linear models with identical coefficients. Condition (32.10) is not satisfied
when f (·) is a nonlinear function of xit and uit , even if θ i is identical across individuals.
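A small Python sketch illustrates why condition (32.10) fails for nonlinear micro functions even when the parameters are identical across individuals. It assumes, for illustration only, that the candidate macro function is the micro function itself evaluated at the cross-section mean, and it ignores the disturbances $u_{it}$; the function names and the data are hypothetical.

```python
from statistics import mean


def aggregation_gap(f, xs):
    """f(mean of x_i) minus the mean of f(x_i), for a common micro function f."""
    return f(mean(xs)) - mean([f(x) for x in xs])


def linear(x):
    """Linear micro relation with identical coefficients across units."""
    return 2.0 + 3.0 * x


def quadratic(x):
    """Nonlinear micro relation with identical coefficients across units."""
    return x ** 2


xs = [1.0, 2.0, 3.0]                      # hypothetical micro inputs
gap_linear = aggregation_gap(linear, xs)       # zero: perfect aggregation
gap_quadratic = aggregation_gap(quadratic, xs) # non-zero: aggregation problem
```

For the quadratic case the gap equals minus the cross-section variance of the inputs, so the aggregate relation depends on the distribution of $x_{it}$ and not only on its mean.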
remote the possibility of their occurrence. An alternative and less restrictive approach would
be to require that (32.10) holds ‘on average’. More precisely, let μy (t) and μx (t) be the means
of yit and xit across individuals at a point in time or over a given period of time (depending on
whether the variables are stocks or flows) and define a macro (or aggregate) relation as one that
links $\mu_y(t)$ to $\mu_x(t)$ at a point in time $t$. This approach is suggested by Kelejian (1980) and rigorously formalized by Stoker (1984). It treats $x_{it}$, $u_{it}$, and $\theta_i$ across individuals as stochastic, having a joint probability distribution function $P(x_t, u_t, \theta; \phi_t)$ with parameter vector $\phi_t$ that can vary over time, but not across individuals. Then
$$\mu_y(t) = \Psi_y(\phi_t) = \int f(x_t, u_t, \theta)\, P(x_t, u_t, \theta; \phi_t)\, dx_t\, du_t\, d\theta, \qquad (32.11)$$
and
$$\mu_x(t) = \Psi_x(\phi_t) = \int x_t\, P(x_t, u_t, \theta; \phi_t)\, dx_t\, du_t\, d\theta. \qquad (32.12)$$
Let $\phi_t = (\phi_{1t}', \phi_{2t}')'$, where $\phi_{2t}$ has the same dimension as $x_{it}$, for all $i$, and suppose that for a given $\phi_{1t}$ there is a one-to-one relationship between $\phi_{2t}$ and $\mu_x(t)$. Then
$$\phi_{2t} = \Psi_x^{-1}\left(\phi_{1t}, \mu_x(t)\right), \qquad (32.13)$$
and
$$\mu_y(t) = \Psi_y\left(\phi_{1t}, \Psi_x^{-1}\left(\phi_{1t}, \mu_x(t)\right)\right) = F\left(\mu_x(t), \phi_{1t}\right). \qquad (32.14)$$
The relationship between μy (t) and μx (t) is then defined as the exact aggregate equation.
This is clearly an improvement over the deterministic approach, but it is still rather removed
from direct empirical analysis and does not adequately focus on the inevitably approximate
nature of econometric analysis. Moreover, perhaps more importantly, due to its reliance on
unconditional means, this approach is not suitable for the analysis of dynamic systems.
$$E\left\| F(\Omega_t, \theta_{at}) - N^{-1}\sum_{i=1}^{N} f(x_{it}, u_{it}, \theta_i) \right\|,$$
be as small as possible over $F(\cdot)$. For expositional simplicity denote the aggregate function $F(\Omega_t, \theta_{at})$ by $F_t$, and $f(x_{it}, u_{it}, \theta_i)$ by $f_{it}$. Also note that the parameters of the aggregate function, $\theta_{at}$, will typically include first and higher moments of the joint distribution of $(x_{it}, u_{it}, \theta_i)$ across $i$, and could be time-dependent. To simplify the exposition in what follows we assume $F(\cdot)$ is a scalar function.
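and therefore the function that minimizes $E\left[(F_t - \bar{y}_t)^{2} \mid \Omega_t\right]$ is given by
$$F_t = E(\bar{y}_t \mid \Omega_t) = N^{-1}\sum_{i=1}^{N} E\left[f(x_{it}, u_{it}, \theta_i) \mid \Omega_t\right]. \qquad (32.16)$$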
This function will be referred to as the ‘optimal aggregator function’ (in the mean squared error
sense). The orthogonal projection used (implicitly or explicitly) by Granger (1980), Lütkepohl
(1984), and Lippi (1988) for aggregation of linear time series is a special case of this optimal
aggregator which is more widely applicable. For an application to aggregation of static nonlinear
models see Garderen et al. (2000).
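This choice of $F_t$ globally minimizes $E\left[(F_t - \bar{y}_t)^{2} \mid \Omega_t\right]$, but does not reduce it to zero, which is what (32.10) requires. We have
$$E\left[(F_t - \bar{y}_t)^{2} \mid \Omega_t\right] = \mathrm{Var}(\bar{y}_t \mid \Omega_t) \neq 0, \qquad (32.17)$$
unless, of course, $E(\bar{y}_t \mid \Omega_t) = \bar{y}_t$.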
It is also possible to define an aggregate prediction function, based on individual prediction
of yit , conditional on information on all the observed disaggregate variables at time t. Let
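$$\Omega_{it} = \left(y_{i,t-1}, y_{i,t-2}, \ldots; x_{it}, x_{i,t-1}, \ldots\right), \qquad (32.18)$$
denote the information set specific to individual $i$, and as before denote the information common to all individuals by
$$\Omega_t = \left(\bar{y}_{t-1}, \bar{y}_{t-2}, \ldots; \bar{x}_t, \bar{x}_{t-1}, \ldots\right). \qquad (32.19)$$
Then
$$\Psi_{it} = \Omega_{it} \cup \Omega_t, \qquad (32.20)$$
$$\Psi_t = \cup_{i=1}^{N} \Psi_{it}, \qquad (32.21)$$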
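all information available in the disaggregate model. Then the aggregate forecast, $\bar{y}_t^{d}$, based on the universal information set, $\Psi_t$, is given by
$$\bar{y}_t^{d} = N^{-1}\sum_{i=1}^{N} E\left[f(x_{it}, u_{it}, \theta_i) \mid \Psi_t\right], \qquad (32.22)$$
$$\bar{y}_t^{d} = N^{-1}\sum_{i=1}^{N} E\left[f(x_{it}, u_{it}, \theta_i) \mid \Psi_{it}\right]. \qquad (32.23)$$
Then we have
$$E\left[\left(\bar{y}_t - \bar{y}_t^{d}\right)^{2} \mid \Psi_t\right] \leq E\left\{\left[\bar{y}_t - E(\bar{y}_t \mid \Omega_t)\right]^{2} \mid \Psi_t\right\}, \qquad (32.24)$$
and hence
$$E\left(\bar{y}_t - \bar{y}_t^{d}\right)^{2} \leq E\left[\bar{y}_t - E(\bar{y}_t \mid \Omega_t)\right]^{2}, \qquad (32.25)$$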
which is basically saying that the optimal predictors ȳtd which utilize information on micro vari-
ables on average are expected to do better than the optimal predictors based on the aggregate
information only.
uit = ϕ i ηt + ξ it , (32.27)
1 A unit-specific intercept term can also be included in (32.26) without affecting the main results.
where ηt is the component which is common across all micro units, and ξ it is the idiosyncratic
component assumed to be distributed independently across i, with a mean zero and a finite
variance.
Assumption A.1 is standard in the aggregation and panel literature with random coefficients.
The stability conditions, |λi | < 1, for all i, can be relaxed at the expense of additional assump-
tions in the way the micro processes are initialized. Assumption A.3 is required for consistent
estimation of the parameters of the aggregate equation and can be relaxed. Assumption A.4 is
quite general and allows a considerable degree of dependence across the micro disturbances, uit .
Nor does it require ξ it and ϕ i ηt to be independently distributed.
To derive the optimal aggregator function, $E(\bar{y}_t \mid \Omega_t)$, one possibility would be to work with the autoregressive distributed lag representations, (32.26). But this will involve deriving expectations such as $E(\lambda_i y_{i,t-j} \mid \Omega_t)$, which is complicated by the fact that $\lambda_i$ and $y_{i,t-j}$ are not independently distributed. To see this notice that under Assumption A.2, (32.26) may be solved for
$$y_{it} = \sum_{j=0}^{\infty} \beta_i \lambda_i^{j} x_{i,t-j} + \sum_{j=0}^{\infty} \lambda_i^{j} u_{i,t-j}, \quad i = 1, 2, \ldots, N, \qquad (32.28)$$
which makes the dependence of yi,t−j on λi and β i explicit, and suggests that it might be more
appropriate to work directly with the distributed lag representations, (32.28). This is the approach
followed by Pesaran (2003).
Aggregating (32.28) across all i, we have
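$$\bar{y}_t = N^{-1}\sum_{j=0}^{\infty}\sum_{i=1}^{N} \beta_i \lambda_i^{j} x_{i,t-j} + N^{-1}\sum_{j=0}^{\infty}\sum_{i=1}^{N} \lambda_i^{j} u_{i,t-j}, \qquad (32.29)$$
where as before $\bar{y}_t = N^{-1}\sum_{i=1}^{N} y_{it}$. Introduce the new information set $\Upsilon_{it} = \{x_{it}, x_{i,t-1}, \ldots\} \cup \Omega_t$, which excludes the individual-specific information on lagged values of $y_{it}$, and let $\Upsilon_t = \cup_{i=1}^{N} \Upsilon_{it}$. Suppose also that $N$ is large enough so that $y_{i,t-j}$, $j = 1, 2, \ldots$, cannot be revealed from the aggregates $\bar{y}_{t-1}, \bar{y}_{t-2}, \ldots$. Now, under Assumptions A.1 and A.4
$$E\left(\beta_i \lambda_i^{j} \mid \Upsilon_t\right) = E\left(\beta \lambda^{j}\right) = a_j, \qquad (32.30)$$
$$E\left(\lambda_i^{j} \mid \Upsilon_t\right) = E\left(\lambda^{j}\right) = b_j, \qquad (32.31)$$
and
$$E\left(\lambda_i^{j} u_{i,t-j} \mid \Upsilon_t\right) = E\left(\lambda_i^{j} \mid \Upsilon_t\right) E\left(u_{i,t-j} \mid \Upsilon_t\right).$$
Taking conditional expectations of both sides of (32.29) with respect to $\Upsilon_t$ we now have
$$E(\bar{y}_t \mid \Upsilon_t) = N^{-1}\sum_{j=0}^{\infty}\sum_{i=1}^{N} x_{i,t-j}\, E\left(\beta_i \lambda_i^{j} \mid \Upsilon_t\right) + N^{-1}\sum_{j=0}^{\infty}\sum_{i=1}^{N} E\left(\lambda_i^{j} \mid \Upsilon_t\right) E\left(u_{i,t-j} \mid \Upsilon_t\right).$$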
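$$E(\bar{y}_t \mid \Upsilon_t) = \sum_{j=0}^{\infty} a_j \bar{x}_{t-j} + \sum_{j=0}^{\infty} b_j E(U_{t-j} \mid \Upsilon_t), \qquad (32.32)$$
where $\bar{x}_t = N^{-1}\sum_{i=1}^{N} x_{it}$ and $U_t = N^{-1}\sum_{i=1}^{N} u_{it}$. This result provides the forecast of the aggregate series $\{\bar{y}_t\}$ conditional on $\Upsilon_t$, which involves disaggregated observations on the $x_{it}$’s. To obtain the aggregate forecast function we need to take expectations of both sides of (32.32) with respect to $\Omega_t$. Noting that $\Omega_t$ is contained in $\Upsilon_t$ we now have
$$E(\bar{y}_t \mid \Omega_t) = \sum_{j=0}^{\infty} a_j \bar{x}_{t-j} + \sum_{j=0}^{\infty} b_j E(U_{t-j} \mid \Omega_t). \qquad (32.33)$$
The aggregate predictor function, $E(\bar{y}_t \mid \Omega_t)$, is composed of a predetermined component, $\sum_{j=0}^{\infty} a_j \bar{x}_{t-j}$, and a random component, $\sum_{j=0}^{\infty} b_j E(U_{t-j} \mid \Omega_t)$. To learn more about the random component, using (32.27) first note that
$$U_t = \bar{\varphi}\, \eta_t + Z_t,$$
where
$$\bar{\varphi} = N^{-1}\sum_{i=1}^{N} \varphi_i, \quad \text{and} \quad Z_t = N^{-1}\sum_{i=1}^{N} \xi_{it}.$$
Namely, the aggregate error term, $U_t$, is itself composed of a common component, $\eta_t$, and an aggregate of the idiosyncratic shocks, $Z_t$. Under Assumptions A.3 and A.4, $\eta_t$ and $Z_t$ are serially uncorrelated and independently distributed of the $x_{it}$’s, and hence (noting that $\bar{y}_t$ is not contained in $\Omega_t$) we have
$$E(U_t \mid \Omega_t) = \bar{\varphi}\, E(\eta_t \mid \Omega_t) + E(Z_t \mid \Omega_t) = 0. \qquad (32.34)$$
Using (32.34) in (32.33), the $j = 0$ term of the random component drops out and
$$E(\bar{y}_t \mid \Omega_t) = \sum_{j=0}^{\infty} a_j \bar{x}_{t-j} + \sum_{j=1}^{\infty} b_j V_{t-j}, \qquad (32.35)$$
where
$$V_{t-j} = E(U_{t-j} \mid \Omega_t) = \bar{\varphi}\, E(\eta_{t-j} \mid \Omega_t) + E(Z_{t-j} \mid \Omega_t), \quad j = 1, 2, \ldots. \qquad (32.36)$$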
The optimal aggregate dynamic model corresponding to the micro relations, (32.26), is now
given by
$$\bar{y}_t = \sum_{j=0}^{\infty} a_j \bar{x}_{t-j} + \sum_{j=1}^{\infty} b_j V_{t-j} + \varepsilon_t, \qquad (32.37)$$
or
$$\bar{y}_t = \sum_{j=0}^{\infty} a_j \bar{x}_{t-j} + \bar{\varphi}\sum_{j=1}^{\infty} b_j E(\eta_{t-j} \mid \Omega_t) + \sum_{j=1}^{\infty} b_j E(Z_{t-j} \mid \Omega_t) + \varepsilon_t, \qquad (32.38)$$
where $\varepsilon_t = \bar{y}_t - E(\bar{y}_t \mid \Omega_t)$. By construction $\varepsilon_t$ is orthogonal to $\{\bar{x}_t, \bar{x}_{t-1}, \ldots\}$ and $\{V_{t-1}, V_{t-2}, \ldots\}$. But, as in the static case, the contemporaneous errors of the aggregate equation, $\varepsilon_t$, are likely to be heteroskedastic. The above aggregate specification is optimal in the sense that $E(\bar{y}_t \mid \Omega_t)$ minimizes $E\left[\bar{y}_t - E(\bar{y}_t \mid \Omega_t)\right]^{2}$ with respect to the aggregate information set, $\Omega_t$.$^{2}$
The terms $V_{t-1}, V_{t-2}, \ldots$, in addition to being orthogonal to the aggregate disturbances, $\varepsilon_t$, are in fact serially uncorrelated with zero means and a finite variance. First, it is easily seen that
$$E(V_{t-j}) = E\left[E(U_{t-j} \mid \Omega_t)\right] = E(U_{t-j}) = 0.$$
But $U_{t-j}$ is a serially uncorrelated process with zero mean. Hence, $E(V_{t-j} V_{t-j-1} \mid \Omega_{t-j-1}) = 0$, which also implies that $E(V_{t-j} V_{t-j-1}) = 0$. Using a similar line of reasoning it is also easily established that $E(V_{t-j} V_{t-j-s}) = 0$, for all $s > 0$. Finally, since by Assumptions A.3 and A.4, $x_{is}$ and $u_{it}$ have finite variances, the random variables $V_{t-1}, V_{t-2}, \ldots$, being linear functions of $x_{is}$ and $u_{it}$, will also have finite variances. Clearly, the same arguments also apply to the components of $V_{t-j}$, namely $V^{\eta}_{t-j} = E(\eta_{t-j} \mid \Omega_t)$ and $V^{z}_{t-j} = E(Z_{t-j} \mid \Omega_t)$: both $V^{\eta}_{t-j}$ and $V^{z}_{t-j}$ have zero means and are serially uncorrelated with finite variances.
The aggregate function, (32.37), holds irrespective of whether the shocks to the underlying
micro relations contain a common component. But the contribution of the idiosyncratic shocks,
$Z_t$, to the aggregate function will depend on the rate at which the distributed lag coefficients, $b_j$, decay as $j \to \infty$. Although, under Assumption A.4, $Z_t \overset{p}{\to} 0$, this does not necessarily mean that the contribution of the idiosyncratic shocks, given by $\sum_{j=1}^{\infty} b_j E(Z_{t-j} \mid \Omega_t)$, will also tend to zero as $N \to \infty$. Heuristically, this is due to the fact that, under Assumptions A.3 and A.4, the variance of $V^{z}_{t-j}$ is of order $\sum_{j=1}^{\infty} b_j^{2}/N$ and need not tend to zero if the coefficients, $b_j$, do not
decay sufficiently fast. An example of such a possibility was first discussed by Granger (1980).
We now turn to this and other examples and show how a number of results in the literature can
be obtained from the optimal aggregator function given by (32.37). In the general case where
micro relations are subject to both common and idiosyncratic shocks, the effect of the common
shocks on the aggregate forecast, $E(\bar{y}_t \mid \Omega_t)$, will dominate as $N \to \infty$. Hence, for forecasting
purposes, the effects of idiosyncratic shocks can be ignored.
The analysis of aggregation of ARDL models by Lewbel (1994) can also be related to the
optimal aggregate function, (32.37). Lewbel considers the relatively simple case where the coef-
ficients β i and λi are independently and identically distributed across i, and makes the addi-
tional assumption that the distributions of β i and xit are uncorrelated and that λi and β i xit + uit
are independently distributed.3 Under these assumptions and adopting the statistical approach
described in Section 32.4.2, Lewbel derives the following aggregate infinite-order autoregressive
specification
$$\mu_y(t) = \sum_{j=1}^{\infty} c_j\, \mu_y(t - j) + \beta \mu_x(t) + \mu_u(t), \qquad (32.39)$$
where μy (t), μx (t), and μu (t) are the cross-section means of yit , xit , and uit , respectively.
Assuming the above infinite-order autoregressive representation exists, Lewbel shows that the
coefficients $c_s$ satisfy the recursions
$$b_s = \sum_{r=0}^{s-1} b_r c_{s-r}, \qquad (32.40)$$
with $b_j = E(\lambda^{j})$, as before. It is then easily seen that $c_1 = b_1 = E(\lambda)$, $c_2 = E(\lambda - b_1)^{2} = \mathrm{Var}(\lambda)$, which establishes that the autoregressive component of the aggregate specification must
at least be of second-order; otherwise the distribution of λ will be degenerate with all agents
having the same lag coefficient.
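The recursions (32.40) can be solved for the autoregressive coefficients numerically. The Python sketch below is illustrative only: the function name is hypothetical, and it assumes that $\lambda$ is uniformly distributed on $[0, 1)$, so that $b_j = E(\lambda^{j}) = 1/(j + 1)$ and $b_0 = 1$.

```python
def ar_coefficients(b, order):
    """Solve b_s = sum_{r=0}^{s-1} b_r c_{s-r} for c_1, ..., c_order (b[0] must be 1)."""
    c = [0.0] * (order + 1)  # c[0] is a placeholder and is never used
    for s in range(1, order + 1):
        # The r = 0 term is b_0 * c_s = c_s, so move the remaining terms across.
        c[s] = b[s] - sum(b[r] * c[s - r] for r in range(1, s))
    return c[1:]


# Assumed example: lambda ~ Uniform[0, 1), hence b_j = 1 / (j + 1).
b = [1.0 / (j + 1) for j in range(6)]
c = ar_coefficients(b, 5)

c1, c2 = c[0], c[1]  # c1 is the mean of lambda, c2 is its variance
```

Under the uniform assumption the first two coefficients come out as $1/2$ and $1/12$, the mean and variance of $\lambda$, in line with $c_1 = E(\lambda)$ and $c_2 = \mathrm{Var}(\lambda)$.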
Lewbel’s result and a number of its generalizations can be derived from the optimal aggre-
gate specification given by (32.37). Our approach also provides the conditions that ensure the
existence of Lewbel’s infinite-order autoregressive representation. In the simple case considered by Lewbel, where $\beta_i$ and $\lambda_i$ are assumed to be independently distributed, we have $E(\beta_i \lambda_i^{j}) = E(\beta_i) E(\lambda_i^{j}) = \beta b_j$, and (32.37) simplifies to$^{4}$
$$\bar{y}_t = \beta \sum_{j=0}^{\infty} b_j \bar{x}_{t-j} + \sum_{j=1}^{\infty} b_j V_{t-j} + \varepsilon_t. \qquad (32.41)$$
To see the relationship between (32.41) and Lewbel’s result, (32.39), first note that
$$\bar{y}_t = \beta B(L) \bar{x}_t + \left[B(L) - 1\right] V_t + \varepsilon_t, \qquad (32.42)$$
where $B(L) = \sum_{j=0}^{\infty} b_j L^{j}$. Whether it is possible to write (32.42) as an infinite-order autoregressive specification in $\bar{y}_t$ depends on whether $B(L)$ is invertible, and this in turn depends on
3 The consequences of relaxing some of these assumptions are briefly discussed by Lewbel (1994, Section 4).
4 Recall that here we are assuming that there are no common components in the micro shocks, $u_{it}$, and hence $V_t \overset{p}{\to} 0$.
the probability distribution of $\lambda$. It is, for example, clear from our discussion in the previous section that if $\lambda$ has a beta distribution of the second type with $0 < q \leq 1$, then $\{b_j\}$ will not be absolutely summable and $B(L) = \sum_{j=0}^{\infty} b_j L^{j}$ may not be invertible. Therefore, under this distributional assumption, Lewbel’s autoregressive representation may not exist. But if $\{b_j\}$ is absolutely summable, $B(L)$ can be inverted and (32.42) can be written as
$$\bar{y}_t = \sum_{j=1}^{\infty} c_j \bar{y}_{t-j} + \beta \bar{x}_t + \sum_{j=1}^{\infty} c_j V_{t-j} + C(L) \varepsilon_t, \qquad (32.43)$$
where $C(L) = 1 - \sum_{j=1}^{\infty} c_j L^{j}$. The coefficients $c_j$ are obtainable from the polynomial identity $B(L) C(L) \equiv 1$, and it is easily verified that they in fact satisfy the recursive relations (32.40) derived by Lewbel (1994).
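The role of the summability of $\{b_j\}$ can be explored with a short Python sketch. It is a sketch under a stated assumption: $\lambda$ is taken to follow the standard beta density on $(0, 1)$ with parameters $p$ and $q$, which need not coincide with the 'second type' parametrization referred to above. Under that assumption $b_j = E(\lambda^{j}) = B(p + j, q)/B(p, q)$, which decays at the rate $j^{-q}$. The function names are hypothetical.

```python
import math


def beta_moment(j, p, q):
    """b_j = E(lambda^j) for lambda ~ Beta(p, q) on (0, 1), computed via log-gamma."""
    log_b = (math.lgamma(p + j) + math.lgamma(p + q)
             - math.lgamma(p) - math.lgamma(p + q + j))
    return math.exp(log_b)


def partial_sum(J, p, q):
    """Partial sum of b_0 + b_1 + ... + b_J."""
    return sum(beta_moment(j, p, q) for j in range(J + 1))


# q = 1: b_j = 1 / (j + 1), a harmonic series whose partial sums keep growing.
slow = [partial_sum(J, 1.0, 1.0) for J in (100, 10_000)]

# q = 2: b_j = 2 / ((j + 1)(j + 2)), whose partial sums converge to 2.
fast = [partial_sum(J, 1.0, 2.0) for J in (100, 10_000)]
```

In the assumed setting the partial sums for $q = 1$ grow without bound (logarithmically), while those for $q = 2$ settle just below 2, which is the distinction between a non-summable and an absolutely summable lag distribution.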
In the more general case where β i and λi are allowed to be statistically dependent, the opti-
mal aggregate specification does not simplify to (32.43) and will be given by (32.37). In this
more general setting there seems little gain in rewriting the resultant distributed lag model in the
infinite-order autoregressive form used by Lewbel (1994).
$$\frac{|w_i|}{\|w\|} = O\left(N^{-1/2}\right), \text{ for any } i, \text{ and } \|w\| = O\left(N^{-1/2}\right). \qquad (32.45)$$
Denote the aggregate information set by $\Omega_t = (\bar{y}_{w,t-1}, \bar{y}_{w,t-2}, \ldots; \bar{x}_{wt}, \bar{x}_{w,t-1}, \ldots; f_t, f_{t-1}, \ldots)$. When $f_t$ is not observed, the current and lagged values of $f_t$ in $\Omega_t$ must be replaced by their fitted or forecast values obtained from an auxiliary model for $f_t$, and possibly other variables, not included in (32.44). Consider the augmented information set $\Upsilon_t = (y_{t-M}; w; x_t, x_{t-1}, \ldots;$
5 This specification can be readily generalized to allow for more than one cross-section specific regressor, by replacing
Bxt with B1 x1t + B2 x2t + . . . + Bk xkt .
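$f_t, f_{t-1}, \ldots; \bar{y}_{w,t-1}, \bar{y}_{w,t-2}, \ldots)$, that includes the weights, $w$, and the disaggregate observations on the regressors, $x_{it}$. Note that $\Omega_t$ is contained in $\Upsilon_t$.
Now introduce the following assumptions on the eigenvalues of $\Phi$ and the idiosyncratic errors, $\varepsilon_t = (\varepsilon_{1t}, \varepsilon_{2t}, \ldots, \varepsilon_{Nt})'$.
Assumption A.5. The coefficient matrix, $\Phi$, of the VAR model in (32.44) has distinct eigenvalues $\lambda_i(\Phi)$, for $i = 1, 2, \ldots, N$, and satisfies the following cross-sectionally invariant conditional moments
$$\left.\begin{array}{l} E\left[\lambda_i^{s}(\Phi) \mid \Upsilon_t, P, \varepsilon_{t-s}\right] = a_s, \\ E\left[\lambda_i^{s}(\Phi) \mid \Upsilon_t, P, \beta\right] = b_s(\beta), \\ E\left[\lambda_i^{s}(\Phi) \mid \Upsilon_t, P, \Gamma\right] = c_s(\Gamma), \end{array}\right\} \qquad (32.46)$$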
Remark 10 Assumption A.5 is analytically convenient and can be viewed as a natural generalization of the simple AR(1) specifications considered by Robinson (1978), Granger (1980) and others. Using the spectral decomposition of $\Phi = P \Lambda P^{-1}$, where $\Lambda = \mathrm{diag}[\lambda_1(\Phi), \lambda_2(\Phi), \ldots, \lambda_N(\Phi)]$ is a diagonal matrix with the eigenvalues of $\Phi$ on its diagonal, the factor-augmented VAR model can be written as
$$y^{*}_{it} = \lambda_i(\Phi)\, y^{*}_{i,t-1} + z^{*}_{it},$$
where $y^{*}_{it}$ is the $i$th element of $y^{*}_t = P^{-1} y_t$, and $z^{*}_{it}$ is the $i$th element of $z^{*}_t = P^{-1}(B x_t + \Gamma f_t + \varepsilon_t)$. Consider now the conditions under which an optimal aggregate function exists for $\bar{y}^{*}_{wt} = w' y^{*}_t = w' P^{-1} y_t$. We know from the existing literature that such an aggregate function exists if $E(\lambda_i^{s} z^{*}_{it}) = a^{*}_s$, for all $i$. Seen from this perspective, our assumption that conditional on $P$ the eigenvalues have moments that do not depend on $i$ seems sensible, and is likely to be essential for the validity of Granger’s conjecture.
Remark 11 It is also worth noting that Assumption A.5 does allow for possible dependence of $\lambda_i(\Phi)$ on the coefficients $\beta_i$ and $\gamma_{ij}$.
As already shown, the optimal aggregate function (in the mean squared error sense) is given by
$$\bar{y}_{wt} = E(w' y_t \mid \Omega_t) + v_{wt}, \qquad (32.48)$$
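$$y_t = \Phi^{t+M} y_{-M} + \sum_{s=0}^{t+M-1} \Phi^{s}\left(B x_{t-s} + \Gamma f_{t-s} + \varepsilon_{t-s}\right). \qquad (32.49)$$
$$\bar{y}_{wt} = w' P \Lambda^{t+M} P^{-1} y_{-M} + \sum_{s=0}^{t+M-1} w' P \Lambda^{s} P^{-1}\left(B x_{t-s} + \Gamma f_{t-s} + \varepsilon_{t-s}\right). \qquad (32.50)$$
It is now possible to show that (see Pesaran and Chudik (2014) for details)
$$E(\bar{y}_{wt} \mid \Omega_t) = w' y_{-M} E(a_{t+M} \mid \Omega_t) + \sum_{s=0}^{t+M-1} w' E\left[b_s(\beta) B x_{t-s} \mid \Omega_t\right] + \sum_{s=0}^{t+M-1} w' E\left[c_s(\Gamma) \Gamma \mid \Omega_t\right] f_{t-s} + \sum_{s=1}^{t+M-1} a_s E\left(\bar{\varepsilon}_{w,t-s} \mid \Omega_t\right), \qquad (32.51)$$
where $\bar{\varepsilon}_{wt} = w' \varepsilon_t$.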
Assumption A.7: The micro coefficients, β i and γ ij , are random draws from common distri-
butions with finite moments such that
where $b_s(\beta)$ and $c_s(\Gamma)$ are defined in Assumption A.5, $b_s = E[b_s(\beta)\beta_i]$, $c_s = E[c_s(\Gamma)\gamma_i]$, and $\tau_N$ is an $N \times 1$ vector of ones.
Assumption A.8: The eigenvalues of $\Phi$, $\lambda_i(\Phi)$, are draws from a common distribution with support over the range $(-1, 1)$.
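$$E(\bar{y}_{wt} \mid \Omega_t) = w' y_{-M} E(a_{t+M} \mid \Omega_t) + \sum_{s=0}^{t+M-1} b_s \bar{x}_{w,t-s} + \sum_{s=0}^{t+M-1} c_s' f_{t-s} + \sum_{s=1}^{t+M-1} a_s E\left(\bar{\varepsilon}_{w,t-s} \mid \Omega_t\right),$$
where $\bar{x}_{wt} = w' x_t$, and $E(\bar{y}_{wt} \mid \Omega_t)$ no longer depends on the individual-specific regressors. Under the additional Assumption A.8, and for $M$ sufficiently large, the initial states are also eliminated and we have
$$E(\bar{y}_{wt} \mid \Omega_t) = \sum_{s=0}^{\infty} b_s \bar{x}_{w,t-s} + \sum_{s=0}^{\infty} c_s' f_{t-s} + \sum_{s=1}^{\infty} a_s \eta_{t-s},$$
where $\eta_{t-s} = E(\bar{\varepsilon}_{w,t-s} \mid \Omega_t)$. Note that $\sum_{s=1}^{\infty} a_s \eta_{t-s} = E\left(\sum_{s=1}^{\infty} a_s \bar{\varepsilon}_{w,t-s} \mid \Omega_t\right)$. Using this result in (32.48) we obtain the optimal aggregate function
$$\bar{y}_{wt} = \sum_{s=0}^{\infty} b_s \bar{x}_{w,t-s} + \sum_{s=0}^{\infty} c_s' f_{t-s} + \sum_{s=1}^{\infty} a_s \eta_{t-s} + v_{wt}, \qquad (32.54)$$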
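$$\left\|\Sigma_{\varepsilon}\right\|_{1} = \left\|\Sigma_{\varepsilon}\right\|_{\infty} = O\left(N^{\alpha_{\varepsilon}}\right),$$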
Remark 12 Condition 0 ≤ α ε < 1 in Assumption A.9 is sufficient and necessary for weak cross-
section dependence of micro innovations. See Chudik, Pesaran, and Tosetti (2011). Following Bai-
ley, Kapetanios, and Pesaran (2015) we shall refer to the constant α ε as the exponent of cross-
sectional dependence of the idiosyncratic shocks. See also Section 29.2.
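Since under Assumption A.6 the errors, $\varepsilon_t$, are serially uncorrelated, we have
$$\mathrm{Var}\left(\sum_{s=1}^{\infty} a_s \bar{\varepsilon}_{w,t-s}\right) = \sum_{s=1}^{\infty} a_s^{2}\, \mathrm{Var}\left(\bar{\varepsilon}_{w,t-s}\right) \leq \left(\sum_{s=1}^{\infty} a_s^{2}\right) \sup_t \left[\mathrm{Var}(\bar{\varepsilon}_{wt})\right].$$
Furthermore
$$\mathrm{Var}(\bar{\varepsilon}_{wt}) = w' \Sigma_{\varepsilon} w \leq \|w\|^{2} \varrho(\Sigma_{\varepsilon}),$$
and $\sum_{s=1}^{\infty} a_s \bar{\varepsilon}_{w,t-s} \overset{q.m.}{\to} 0$, so long as $\sum_{s=1}^{\infty} a_s^{2} < K$, for some positive constant $K$.$^{7}$ Recall that under Assumption A.9, $\alpha_{\varepsilon} < 1$, and $\sup_t [\mathrm{Var}(\bar{\varepsilon}_{wt})] \to 0$, as $N \to \infty$. Moreover, since $\sum_{s=1}^{\infty} a_s \eta_{t-s} = E\left(\sum_{s=1}^{\infty} a_s \bar{\varepsilon}_{w,t-s} \mid \Omega_t\right)$, it follows that
$$\sum_{s=1}^{\infty} a_s \eta_{t-s} \overset{q.m.}{\to} 0. \qquad (32.55)$$
The limiting behaviour of $v_{wt}$, as $N \to \infty$, depends on the nature of the processes generating $x_{it}$, $f_t$, and $\varepsilon_{it}$, as well as the degree of cross-section dependence that arises from the non-zero off-diagonal elements of $\Phi$. Sufficient conditions for $v_{wt} \overset{q.m.}{\to} 0$ are not presented here due to space constraints, but can be found in Pesaran and Chudik (2014, Proposition 1). The key conditions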
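6 Note that $\varrho(\Sigma_{\varepsilon}) \leq \|\Sigma_{\varepsilon}\|_{1} = O(N^{\alpha_{\varepsilon}})$.
7 A sufficient condition for $\sum_{s=1}^{\infty} a_s^{2}$ to be bounded is $|\lambda_i| < 1 - \epsilon$, where $\epsilon$ is a small, strictly positive number.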
for $v_{wt} \overset{q.m.}{\to} 0$ are weak error cross-sectional dependence and sufficiently bounded dynamic interactions across the units. These conditions are satisfied, for example, if $\|\Sigma_{\varepsilon}\| = \|E(\varepsilon_t \varepsilon_t')\| < K$, and $\sum_{s=1}^{\infty} E\|\Phi^{s}\| \leq \sum_{s=1}^{\infty} E\|\Phi\|^{s} < K$, for some finite positive constant, $K$. If, on the other hand, $\sum_{s=1}^{\infty} E\|\Phi^{s}\|$ is not bounded as $N \to \infty$, or $\varepsilon_t$ is strongly cross-sectionally dependent, then the aggregation error, $v_{wt}$, does not necessarily converge to zero and could be sizeable.
$$\bar{\theta} = \frac{1}{N}\sum_{i=1}^{N} \theta_i = \frac{1}{N}\tau_N' \theta = \frac{1}{N}\tau_N' (I_N - \Phi)^{-1} \beta, \qquad (32.56)$$
where $\theta = (I_N - \Phi)^{-1}\beta = \beta + \Phi\beta + \Phi^{2}\beta + \ldots$ is the $N \times 1$ vector of individual long-run coefficients, and as before $\tau_N$ is an $N \times 1$ vector of ones. Suppose that Assumptions A.7 and A.8 are satisfied and denote the common mean of $\beta_i$ by $\beta$. Using (32.52), we have $E(\Phi^{s}\beta) = E\{E[b_s(\beta) B \mid \Omega_t]\} = b_s I_N$ for $s = 0, 1, \ldots$. Hence, the elements of $\theta$ have a common mean, $E(\theta_i) = \theta = \sum_{s=0}^{\infty} b_s$, which does not depend on the elements of $P$. If, in addition, the sequence of random variables $\theta_i$ is ergodic in mean, then for sufficiently large $N$, $\bar{\theta}$ is well approximated by its mean, $\sum_{s=0}^{\infty} b_s$, and the cross-section mean of the micro long-run effects can be estimated by the long-run coefficient of the associated optimal aggregate model. This result holds even if $\beta_i$ and $\lambda_i(\Phi)$ are not independently distributed, and irrespective of whether micro shocks contain a common factor.
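Whether $\bar{\theta} \overset{p}{\to} \theta$ deserves a comment. A sufficient condition for $\bar{\theta}$ to converge to its mean (in probability) is given by
$$\left\|\mathrm{Var}(\theta)\right\| = O\left(N^{1-\epsilon}\right), \text{ for some } \epsilon > 0, \qquad (32.57)$$
in which case $\left\|\mathrm{Var}(\bar{\theta})\right\| \leq N^{-1}\left\|\mathrm{Var}(\theta)\right\| = O\left(N^{-\epsilon}\right) \to 0$ as $N \to \infty$, and $\bar{\theta} \overset{q.m.}{\to} \theta$. Condition (32.57) need not always hold. This condition can be violated if there is a high degree of dependence of micro coefficients $\beta_i$ across $i$, or if there is a dominant unit in the underlying model, in which case the column norm of $\Phi$ becomes unbounded in $N$.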
The mean of $\beta_i$ is straightforward to identify from the aggregate relation since $E(\beta_i) = b_0$. But further restrictions are needed for identification of $E[\lambda_i(\Phi)]$ from the aggregate model. As with Pesaran (2003) and Lewbel (1994), the independence of $\beta_i$ and $\lambda_i(\Phi)$ will be sufficient for the identification of the moments of $\lambda_i(\Phi)$. Under the assumption that $\beta_i$ and $\lambda_i(\Phi)$ are independently distributed, all moments of $\lambda_i(\Phi)$ can be identified by
$$E\left[\lambda_i^{s}(\Phi)\right] = \frac{b_s}{b_0}. \qquad (32.58)$$
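The identification result (32.58) amounts to a simple ratio, as the following Python sketch shows. The function name and the coefficient values are hypothetical: the aggregate coefficients are generated under the assumption that $E(\beta_i) = 2$ and that $\lambda_i(\Phi)$ is uniform on $(0, 1)$ and independent of $\beta_i$, so that $b_s = 2/(s + 1)$.

```python
def lambda_moments(b):
    """Return E(lambda^s) = b_s / b_0 for s = 0, 1, ..., as in (32.58)."""
    if b[0] == 0:
        raise ValueError("b_0 = E(beta_i) must be non-zero for the ratio to exist")
    return [b_s / b[0] for b_s in b]


# Hypothetical aggregate coefficients b_0, b_1, b_2 (assumed, not estimated).
b = [2.0, 1.0, 2.0 / 3.0]

moments = lambda_moments(b)
mean_lambda = moments[1]                     # first moment of lambda
var_lambda = moments[2] - moments[1] ** 2    # variance from the first two moments
```

The ratio is only defined when $b_0 \neq 0$, which is why the sketch rejects a zero mean for $\beta_i$.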
Another possibility is to adopt a parametric specification for the distribution of the micro
coefficients and then identify the unknown parameters of the cross-sectional distribution of
micro coefficients from the aggregate specification. For example, suppose $\beta_i$ is independently distributed of $\lambda_i(\Phi)$, and $\lambda_i(\Phi)$ has a beta distribution over $(0, 1)$,
$$f(\lambda) = \frac{\lambda^{p-1}(1 - \lambda)^{q-1}}{B(p, q)}, \quad p > 0, \; q > 0, \; 0 < \lambda < 1.$$
$$f_t = \Psi f_{t-1} + v_t, \qquad (32.59)$$
Including the exogenous variables, xt , in the model is relatively straightforward and does not
affect the impulse responses of the shocks to macro factors, vt , or to the idiosyncratic errors.
The lag-orders of the VAR models in (32.59) and (32.60) are set to unity only for expositional
convenience.
We make the following additional assumption.
Assumption A.10: The $m \times 1$ macro shocks, $v_t$, are distributed independently of $\varepsilon_{t'}$, for all $t$ and $t'$. They are also serially uncorrelated, with zero means, and a diagonal variance matrix, $\Sigma_v = \mathrm{Diag}(\sigma^{2}_{v1}, \sigma^{2}_{v2}, \ldots, \sigma^{2}_{vm})$, where $0 < \sigma^{2}_{vj} < \infty$, for all $j$.
We are interested in the effects of two types of shocks on the aggregate variable $\bar{y}_{wt} = w' y_t$, namely the composite macro shock, defined by $\bar{v}_{\bar{\gamma} t} = w' \Gamma v_t = \bar{\gamma}_w' v_t$, and the aggregated
idiosyncratic shock, defined by $\bar{\varepsilon}_{wt} = w' \varepsilon_t$. We shall also consider the combined aggregate shock defined by
$$\bar{\xi}_{wt} = w' \Gamma v_t + w' \varepsilon_t = \bar{\gamma}_w' v_t + \bar{\varepsilon}_{wt} = \bar{v}_{\bar{\gamma} t} + \bar{\varepsilon}_{wt},$$
and investigate the time profiles of the effects of these shocks on $\bar{y}_{w,t+s}$, for $s = 0, 1, \ldots$. The combined aggregate shock, $\bar{\xi}_{wt}$, can be identified from the aggregate equation in $\bar{y}_{wt}$, so long as an AR($\infty$) approximation for $\bar{y}_{wt}$ exists. Since by assumption $\varepsilon_t$ and $v_t$ are distributed independently, then
$$\mathrm{Var}(\bar{\xi}_{wt}) = \bar{\gamma}_w' \Sigma_v \bar{\gamma}_w + w' \Sigma_{\varepsilon} w = \sigma^{2}_{\bar{v}} + \sigma^{2}_{\bar{\varepsilon}} = \sigma^{2}_{\bar{\xi}},$$
where $\sigma^{2}_{\bar{v}} = \bar{\gamma}_w' \Sigma_v \bar{\gamma}_w$ is the variance of the composite macro shock, and $\sigma^{2}_{\bar{\varepsilon}} = w' \Sigma_{\varepsilon} w$ is the variance of the aggregated idiosyncratic shock. Note that when $f_t$ is unobserved, the separate effects of the composite macro shock, $\bar{v}_{\bar{\gamma} t}$, and the aggregated idiosyncratic shock, $\bar{\varepsilon}_{wt}$, can only be identified under the disaggregated model (32.60). Only the effects of $\bar{\xi}_{wt}$ on $\bar{y}_{w,t+h}$ can be identified if the aggregate specification is used.
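Using the disaggregate model we obtain the following generalized impulse response functions (GIRFs)8
$$g_{\bar\varepsilon}(s) = E(\bar y_{w,t+s} \mid \bar\varepsilon_{wt} = \sigma_{\bar\varepsilon}, I_{t-1}) - E(\bar y_{w,t+s} \mid I_{t-1}) = \frac{w'\Phi^s\Sigma_\varepsilon w}{\sqrt{w'\Sigma_\varepsilon w}}, \tag{32.61}$$
$$g_{vj}(s) = E(\bar y_{w,t+s} \mid v_{jt} = \sigma_{vj}, I_{t-1}) - E(\bar y_{w,t+s} \mid I_{t-1}) = \frac{w' C_s\Sigma_v e_{j,v}}{\sqrt{e_{j,v}'\Sigma_v e_{j,v}}}, \tag{32.62}$$
for $j = 1, 2, \ldots, m$, where $I_t$ is an information set consisting of all current and past available information at time $t$,
$$C_s = \sum_{j=0}^{s}\Phi^{s-j}\Gamma\Psi^{j}, \tag{32.63}$$
and $e_{j,v}$ is an $m \times 1$ selection vector that selects the $j$th element of $v_t$. Hence
$$g_{\bar v}(s) = E(\bar y_{w,t+s} \mid \bar v_{\bar\gamma t} = \sigma_{\bar v}, I_{t-1}) - E(\bar y_{w,t+s} \mid I_{t-1}) = \frac{w' C_s\Sigma_v\bar\gamma_w}{\sqrt{\bar\gamma_w'\Sigma_v\bar\gamma_w}}. \tag{32.64}$$
Finally,
$$g_{\bar\xi}(s) = E(\bar y_{w,t+s} \mid \bar\xi_{wt} = \sigma_{\bar\xi}, I_{t-1}) - E(\bar y_{w,t+s} \mid I_{t-1}) = \frac{w' C_s\Sigma_v\bar\gamma_w + w'\Phi^s\Sigma_\varepsilon w}{\sqrt{\bar\gamma_w'\Sigma_v\bar\gamma_w + w'\Sigma_\varepsilon w}}. \tag{32.65}$$
Note that $C_0 = \Gamma$, and we have $g_{\bar\xi}(0) = \sqrt{\bar\gamma_w'\Sigma_v\bar\gamma_w + w'\Sigma_\varepsilon w} = \sigma_{\bar\xi}$, as to be expected.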
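The GIRF formulae in (32.61) to (32.65) can be evaluated numerically once the model matrices are given. The following is a minimal sketch under stated assumptions: `Phi` stands for the lagged micro coefficient matrix, `Gamma` for the factor loading matrix, `Psi` for the factor autoregressive matrix, and `Sigma_v`, `Sigma_eps` for the two shock covariance matrices; the function name and the small three-unit system are hypothetical and chosen only for illustration.

```python
import numpy as np


def girfs(Phi, Gamma, Psi, Sigma_v, Sigma_eps, w, horizon):
    """GIRFs of the aggregate w'y to one s.e. shocks, as in (32.61), (32.64), (32.65)."""
    gbar = Gamma.T @ w                        # gamma_bar_w = Gamma' w
    sig_v = np.sqrt(gbar @ Sigma_v @ gbar)    # s.d. of the composite macro shock
    sig_e = np.sqrt(w @ Sigma_eps @ w)        # s.d. of the aggregated idiosyncratic shock
    sig_xi = np.sqrt(sig_v**2 + sig_e**2)     # s.d. of the combined aggregate shock
    out = {"eps": [], "v": [], "xi": []}
    for s in range(horizon + 1):
        # C_s = sum_{j=0}^{s} Phi^{s-j} Gamma Psi^j, as in (32.63)
        C_s = sum(
            np.linalg.matrix_power(Phi, s - j) @ Gamma @ np.linalg.matrix_power(Psi, j)
            for j in range(s + 1)
        )
        macro = w @ C_s @ Sigma_v @ gbar
        idio = w @ np.linalg.matrix_power(Phi, s) @ Sigma_eps @ w
        out["v"].append(macro / sig_v)
        out["eps"].append(idio / sig_e)
        out["xi"].append((macro + idio) / sig_xi)
    return {k: np.array(v) for k, v in out.items()}, (sig_v, sig_e, sig_xi)


# Hypothetical small system: N = 3 units, m = 1 factor (values are assumptions).
Phi = np.array([[0.5, 0.0, 0.0], [0.2, 0.4, 0.0], [0.0, 0.1, 0.6]])
Gamma = np.array([[1.0], [0.5], [0.0]])
Psi = np.array([[0.5]])
Sigma_v = np.array([[0.75]])
Sigma_eps = np.eye(3)
w = np.full(3, 1.0 / 3.0)

g, (sig_v, sig_e, sig_xi) = girfs(Phi, Gamma, Psi, Sigma_v, Sigma_eps, w, horizon=12)
```

Two properties follow directly from the formulae and can be checked on the output: the impact response of the combined shock equals its standard deviation, and the combined GIRF is the weighted sum of the other two with weights `sig_v / sig_xi` and `sig_e / sig_xi`.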
8 See Chapter 24 for an account of impulse response analysis where the notion of GIRF is also discussed.
When N is finite, both, the combined aggregated idiosyncratic shock (ε̄wt ) and the composite
macro shock (v̄γ̄ t ) are important; and the impulse response of the combined aggregate shock on
the aggregate variable, given by (32.65), is a linear combination of gε̄ (s) and gv̄ (s), namely
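$$E\, g_{\bar\varepsilon}(s) = O\left(N^{(\alpha_\varepsilon - 1)/2}\right). \tag{32.67}$$
$$\gamma_i = \varkappa_i, \quad \text{for } i = 1, 2, \ldots, [N^{\alpha_\gamma}],$$
$$\gamma_i = 0, \quad \text{for } i = [N^{\alpha_\gamma}] + 1, \ldots, N,$$
The variance of the aggregated idiosyncratic shock, on the other hand, is bounded by
$$\sigma^2_{\bar\varepsilon} = w'\Sigma_\varepsilon w \le O\left(N^{\alpha_\varepsilon - 1}\right). \tag{32.69}$$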
It follows from (32.68)–(32.69) that only when $\alpha_\gamma > (\alpha_\varepsilon + 1)/2$ and $\mu_\kappa \neq 0$, the variance of the composite macro shock dominates, in which case $\operatorname{Plim}_{N\to\infty}\sigma^2_{\bar v}/\sigma^2_{\bar\xi} = 1$, and the combined aggregate shock, $\bar\xi_{wt} = \bar v_{\bar\gamma t} + \bar\varepsilon_{wt}$, converges in quadratic mean to the composite macro shock as $N \to \infty$. It is then possible to scale $g_{\bar\xi}(s)$ by $\sigma^{-1}_{\bar v}$, and for any given $s = 0, 1, 2, \ldots$, we can obtain
$$\operatorname{Plim}_{N\to\infty}\sigma^{-1}_{\bar v}\, g_{\bar\xi}(s) = \operatorname{Plim}_{N\to\infty}\sigma^{-1}_{\bar v}\, g_{\bar v}(s).$$
where $m^d_v(s) = \omega_v g^d_v(s)$, and $m^d_{\bar\varepsilon}(s) = \omega_{\bar\varepsilon}\, g^d_{\bar\varepsilon}(s)$ are the respective contributions of the macro and aggregated idiosyncratic shocks, and the weights $\omega_v$ and $\omega_{\bar\varepsilon}$ are defined below (32.66).
Aggregation weights are set equal to N −1 in all simulations. The subscript d is introduced to
highlight the fact that these impulse responses are based on the disaggregate model. We know
from theoretical results that, in cases where the optimal aggregate function exists, the common
factor is strong (i.e., α γ = 1), and the idiosyncratic shocks are weakly correlated (i.e., α ε = 0),
then gξ̄d (s) converges to gvd (s) as N → ∞, for all s. But it would be of interest to investigate the
contributions of macro and aggregated idiosyncratic shocks to the aggregate impulse response
functions, when N is finite, as well as when α γ takes intermediate values between 0 and 1.
We also use Monte Carlo experiments to investigate the persistence properties of the aggre-
gate variable. The degree and sources of persistence in macro variables, such as consumer price
inflation, output and real exchange rates, have been of considerable interest in economics. We
know from the theoretical results that there are two key components affecting the persistence
of the aggregate variables: the distribution of the eigenvalues of the lagged micro coefficients matrix, $\Phi$,
which we refer to as dynamic heterogeneity; and the persistence of the common factor itself,
which we refer to as the factor persistence. Our aim is to investigate how these two sources of
persistence combine and get amplified in the process of aggregation.
Finally, a related issue of practical significance is the effect of estimation uncertainty on the
above comparisons. To this end, we estimate disaggregated models using observations on indi-
vidual micro units, yit , as well as an aggregate model that only makes use of the aggregate obser-
vations, ȳt . We denote the estimated impulse responses of the combined aggregate shock on the
aggregate variable by ĝξ̄d (s) when based on the disaggregate model, and by ĝξ̄a (s) when based on
an aggregate autoregressive model fitted to ȳt . It is important to recall that, in general, the effects
of macro and aggregated idiosyncratic shocks cannot be identified from the aggregate model.
The remainder of this section is organized as follows. The next sub-section outlines the Monte
Carlo design. Section 32.9.2 describes the estimation of gξ̄d (s) using aggregate and disaggregate
data, and the last sub-section discusses the main findings.
and
where each unit, except the first, has one left neighbour (yi−1,t−1 ). The micro model given by
(32.71)-(32.72) can be written conveniently in vector notations as
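$$y_t = \Phi y_{t-1} + \gamma f_t + \varepsilon_t, \tag{32.73}$$
where $y_t = (y_{1t}, y_{2t}, \ldots, y_{Nt})'$, $\gamma = (\gamma_1, \gamma_2, \ldots, \gamma_N)'$, $\varepsilon_t = (\varepsilon_{1t}, \varepsilon_{2t}, \ldots, \varepsilon_{Nt})'$, and
$$\Phi = \begin{pmatrix}
\lambda_1 & 0 & 0 & \cdots & 0 \\
d_2 & \lambda_2 & 0 & \cdots & 0 \\
0 & d_3 & \lambda_3 & \cdots & 0 \\
\vdots & \vdots & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & d_N & \lambda_N
\end{pmatrix}.$$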
The autoregressive micro coefficients, $\lambda_i$, are generated as $\lambda_i \sim IIDU(0, \lambda_{max})$, for $i = 1, 2, \ldots, N$, with $\lambda_{max} = 0.9$ or 1. Recall that $\bar y_t$ will exhibit long memory features when $\lambda_{max} = 1$, but not when $\lambda_{max} = 0.9$. The neighbourhood coefficients, $d_i$, are generated as $IIDU(0, 1 - \lambda_i)$, for $i = 2, 3, \ldots, N$, to ensure bounded variances as $N \to \infty$. Specifically, $\|\Phi\|_\infty \le \max_i\{|\lambda_i| + |d_i|\} < 1$, see Chudik and Pesaran (2011).
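The design of the lagged coefficient matrix can be sketched as follows: own-lag coefficients on the diagonal, left-neighbour coefficients on the first sub-diagonal, and a check that the maximum absolute row sum stays below one. The function name, the seed, and `N = 200` with `lam_max = 0.9` are choices made here for illustration, not a reproduction of the original simulation code.

```python
import numpy as np


def draw_phi(n, lam_max, rng):
    """Draw lambda_i ~ U(0, lam_max) and d_i ~ U(0, 1 - lambda_i), i = 2..n."""
    lam = rng.uniform(0.0, lam_max, size=n)
    d = rng.uniform(0.0, 1.0 - lam[1:])          # one neighbour coefficient per unit 2..n
    Phi = np.diag(lam) + np.diag(d, k=-1)        # lower bidiagonal matrix
    return Phi, lam, d


rng = np.random.default_rng(0)                   # seed is an arbitrary assumption
Phi, lam, d = draw_phi(200, 0.9, rng)
row_norm = np.abs(Phi).sum(axis=1).max()         # maximum absolute row sum norm
```

Because each row sum is $\lambda_i + d_i$ with $d_i < 1 - \lambda_i$, `row_norm` is below one by construction.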
The idiosyncratic errors, ε t , are generated according to the following spatial autoregressive
process,
To ensure that the idiosyncratic errors are weakly correlated, the spatial autoregressive parameter, $\delta$, must lie in the range $[0, 1)$. We set $\delta = 0.4$. The variance $\sigma^2_\varsigma$ is set equal to $N/(\tau_N' R R' \tau_N)$, where $\tau_N = (1, 1, \ldots, 1)'$ and $R = (I_N - \delta S)^{-1}$, so that $\mathrm{Var}(\bar\varepsilon_t) = N^{-1}$.
The common factor, $f_t$, is generated as
$$f_t = \psi f_{t-1} + v_t, \quad v_t \sim IIDN(0, 1 - \psi^2), \quad |\psi| < 1,$$
for $t = -49, -48, \ldots, 1, 2, \ldots, T$, with $f_{-50} = 0$. We consider three values for $\psi$, namely 0, 0.5 and 0.8. By construction, $\mathrm{Var}(f_t) = 1$.
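A minimal sketch of this factor design is given below: the recursion starts from $f_{-50} = 0$, the first 50 periods are discarded as burn-in, and the innovation variance $1 - \psi^2$ delivers a unit stationary variance. The function name, seed, and `T = 100` are illustrative assumptions.

```python
import numpy as np


def simulate_factor(T, psi, rng, burn_in=50):
    """Simulate f_t = psi * f_{t-1} + v_t for t = -49, ..., T with f_{-50} = 0."""
    v = rng.normal(0.0, np.sqrt(1.0 - psi**2), size=T + burn_in)
    f = np.zeros(T + burn_in + 1)                # f[0] holds the start value f_{-50} = 0
    for t in range(1, T + burn_in + 1):
        f[t] = psi * f[t - 1] + v[t - 1]
    # Keep f_0, ..., f_T and the shocks v_1, ..., v_T.
    return f[burn_in:], v[burn_in:]


rng = np.random.default_rng(0)                   # seed is an arbitrary assumption
T, psi = 100, 0.5
f, v = simulate_factor(T, psi, rng)              # f has T + 1 entries, v has T
```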
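Finally, the factor loadings are generated as
$$\gamma_i = \kappa_i, \quad \text{for } i = 1, 2, \ldots, [N^{\alpha_\gamma}],$$
$$\gamma_i = 0, \quad \text{for } i = [N^{\alpha_\gamma}] + 1, [N^{\alpha_\gamma}] + 2, \ldots, N,$$
$$\bar y_t = \sum_{\ell=1}^{p_a}\pi_\ell\,\bar y_{t-\ell} + \zeta_{at}.$$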
Estimating $g_{\bar\xi}(s)$ using disaggregated data is much more complicated and requires estimates of the micro coefficients. In terms of the micro parameters, using (32.65), we have
$$g^d_{\bar\xi}(s) = E(\bar y_{w,t+s} \mid \bar\xi_{wt} = \sigma_{\bar\xi}, I_{t-1}) - E(\bar y_{w,t+s} \mid I_{t-1}) = \sum_{\ell=0}^{s} w'\Phi^{\ell}\left[E(u_{t+s-\ell} \mid \bar\xi_{wt} = \sigma_{\bar\xi}, I_{t-1}) - E(u_{t+s-\ell} \mid I_{t-1})\right]. \tag{32.74}$$
Following Chudik and Pesaran (2011), we first estimate the non-zero elements of $\Phi$, namely $\lambda_i$ and $d_i$, using the cross-section augmented least squares regressions,
$$y_{it} = \lambda_i y_{i,t-1} + d_i y_{i-1,t-1} + h_i(L, p_{hi})\,\bar y_t + \zeta_{it}, \quad \text{for } i = 2, 3, \ldots, N, \tag{32.75}$$
where $h_i(L, p_{hi}) = \sum_{\ell=0}^{p_{hi}} h_{i\ell}L^{\ell}$, and $p_{hi}$ is the lag-order. The equation for the first micro unit is
the same except that it does not feature any neighbourhood effects.9 These estimates are denoted
by λ̂i and d̂i , and an estimate of uit is computed as
where $\hat{\bar u}_t = N^{-1}\sum_{i=1}^{N}\hat u_{it}$; and the following marginal model,
An estimate of $\xi_{it}$ is computed as $\hat\xi_{it} = \hat u_{it} - \hat r_i\hat\psi_{\bar u}\hat{\bar u}_{t-1}$, for $i = 1, 2, \ldots, N$, where $\hat r_i$ and $\hat\psi_{\bar u}$ are the estimates of $r_i$ and $\psi_{\bar u}$, respectively. When $\alpha_\gamma = 1$, $\hat\psi_{\bar u}$ is a consistent estimator (as $N, T \xrightarrow{j} \infty$) of the autoregressive parameter $\psi$ that characterizes the persistence of the factor, $\hat r_i$ is a consistent estimator of the scaled factor loading, $\gamma_i/\bar\gamma$, and the regression residuals from (32.79), denoted by $\hat\vartheta_t$, are consistent estimates of the macro shock, $v_t$. But, when $\gamma = 0$, $\bar u_t = N^{-1}\sum_{i=1}^{N} u_{it}$ is serially uncorrelated and $\hat\psi_{\bar u} \to_p 0$ as $N, T \xrightarrow{j} \infty$.
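To compute the remaining terms in (32.74), we note that for $s - \ell = 0$, $E(u_t \mid \bar\xi_{wt} = \hat\sigma_{\bar\xi}, I_{t-1}) - E(u_t \mid I_{t-1}) = E(\xi_t \mid \bar\xi_{wt} = \hat\sigma_{\bar\xi}, I_{t-1})$ can be consistently estimated by $\hat\Sigma_\xi w/\hat\sigma_{\bar\xi}$, where $\hat\sigma_{\bar\xi} = (w'\hat\Sigma_\xi w)^{1/2}$, $\hat\Sigma_\xi = T^{-1}\sum_{t=p_h+1}^{T}\hat\xi_t\hat\xi_t'$, $\hat\xi_t = (\hat\xi_{1t}, \hat\xi_{2t}, \ldots, \hat\xi_{Nt})'$, and $p_h = \max_i p_{hi}$. Similarly, for $s - \ell > 0$, $E(u_{t+s-\ell} \mid \bar\xi_{wt} = \hat\sigma_{\bar\xi}, I_{t-1}) - E(u_{t+s-\ell} \mid I_{t-1})$ can be consistently estimated by $\hat\psi_{\bar u}^{\,s-\ell}\,\hat\sigma^2_\vartheta\,\hat r/(w'\hat\Sigma_\xi w)^{1/2}$, where $\hat r = (\hat r_1, \hat r_2, \ldots, \hat r_N)'$, and $\hat\sigma^2_\vartheta = T^{-1}\sum_{t=p_h+1}^{T}\hat\vartheta_t^2$. All lag-orders are selected by AIC with the maximum lag-order set to $[T^{1/2}]$.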
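Lag-order selection by AIC with a maximum order of $[T^{1/2}]$ can be illustrated on a single autoregression. The sketch below fits AR($p$) models by least squares on a common sample and picks the order minimising a Gaussian AIC of the form $n\log(\hat\sigma^2) + 2k$; this particular form of the criterion, the function name, and the simulated AR(1) series are assumptions made here, since the text does not spell out the exact formula used.

```python
import numpy as np


def select_ar_order_aic(y, p_max=None):
    """Pick the AR order in 0..p_max that minimises AIC; p_max defaults to [T^(1/2)]."""
    y = np.asarray(y, dtype=float)
    T = y.size
    if p_max is None:
        p_max = int(np.floor(np.sqrt(T)))
    n_eff = T - p_max                            # common estimation sample for all orders
    target = y[p_max:]
    best_p, best_aic = 0, np.inf
    for p in range(p_max + 1):
        lags = [y[p_max - l: T - l] for l in range(1, p + 1)]
        X = np.column_stack([np.ones(n_eff)] + lags)
        beta, *_ = np.linalg.lstsq(X, target, rcond=None)
        resid = target - X @ beta
        aic = n_eff * np.log(resid @ resid / n_eff) + 2 * (p + 1)
        if aic < best_aic:
            best_p, best_aic = p, aic
    return best_p, p_max


# Hypothetical example: a simulated AR(1) series with coefficient 0.8.
rng = np.random.default_rng(1)
T = 400
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.8 * y[t - 1] + rng.normal()
p_hat, p_max = select_ar_order_aic(y)
```

Using a common estimation sample across candidate orders keeps the AIC values comparable, which is a design choice of this sketch.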
9 Chudik and Pesaran (2011) show that if $\|\Phi\|_\infty < 1$, these augmented least squares estimates of the micro lagged coefficients are consistent and asymptotically normal when $\alpha_\gamma = 1$ (as $N, T \xrightarrow{j} \infty$), and also when there is no factor, i.e., $\gamma = 0$.
Figure 32.1 Contribution of the macro and aggregated idiosyncratic shocks to GIRF of one unit (1 s.e.) combined aggregate shock on the aggregate variable; N = 200. (Four panels, each plotting the macro shock and the aggregated idiosyncratic shock contributions over horizons 0 to 24.)
completely dominates the aggregate relationship. Similar results are obtained for N as small as
25 (not reported). Whether the support of the distribution of the eigenvalues λi covers unity or
not does not seem to make any difference to the relative importance of the macro shock. Table
32.1 reports the weights ωv and ωε̄ for different values of N, and complements what can be seen
from the plots in Figure 32.1. Note that these weights do not depend on the choice of λmax and,
by construction, $\omega_v^2 + \omega_{\bar\varepsilon}^2 = 1$. We see in Table 32.1 that for $\alpha_\gamma = 1$, $\omega_v$ is very close to unity
for all values of N considered, and gξ̄d (s) is mainly explained by the macro shock, regardless of
the shape of the impulse response functions.
Next we examine how dynamic heterogeneity and factor persistence affect the persistence of
the aggregate variable. Figure 32.2 plots the GIRF of the combined aggregate shock on the aggregate variable, $g^d_{\bar\xi}(s)$, for N = 200 and different values of $\lambda_{max}$ and $\psi$, which control the dynamic
heterogeneity and the persistence of the factor, respectively. The plot on the left of the figure
relates to λmax = 0.9 and the one on the right to λmax = 1. It is interesting that gξ̄d (s) looks very
different when we allow for serial correlation in the common factor. Even for a moderate value of
ψ, say 0.5, the factor contributes significantly to the overall persistence of the aggregate. By con-
trast, the effects of long memory on persistence (comparing the plots on the left and the right
of the panels in Figure 32.2), are rather modest. Common factor persistence tends to become
accentuated by the individual-specific dynamics.
Figure 32.2 GIRFs of one unit combined aggregate shock on the aggregate variable, $g_{\bar\xi}(s)$, for different persistence of common factor, $\psi$ = 0, 0.5, and 0.8. (Two panels over horizons 0 to 24.)
Finally, we consider the estimates of gξ̄ (s) based on the disaggregate and the aggregate models,
namely ĝξ̄d (s) and ĝξ̄a (s). Table 32.2 reports the root mean square error (RMSE×100) of these
estimates averaged over horizons s = 0 to 12 and s = 13 to 24, for the parameter values α γ =
0.5, 1, and ψ = 0.5, using 2,000 Monte Carlo replications.10 The estimator based on the disag-
gregate model, ĝξ̄d (s), performs much better (in some cases by twice as much) than its counter-
part based on the aggregate model. The difference between the two estimators is slightly smaller
when α γ = 0.5. As to be expected, an increase in the time dimension considerably improves
Table 32.2 RMSE (×100) of estimating GIRF of one unit (1 s.e.) combined aggregate shock on the aggregate variable, averaged over horizons s = 0 to 12 and s = 13 to 24
Columns: $\hat g^a_{\bar\xi}$ and $\hat g^d_{\bar\xi}$. Panels: Experiments with $\alpha_\gamma = 1$; (b) $\lambda_{max} = 1$; (d) $\lambda_{max} = 1$.
the precision of the estimates. Also, ĝξ̄d (s) improves with an increase in N, whereas the RMSE of
ĝξ̄a (s) is little affected by increasing N when α γ = 1, but improves with N when α γ = 0.5.
tion where changes in aggregate consumption respond to anticipated changes in labour income,
whilst the theory predicts otherwise. For a review of the empirical literature on excess smooth-
ness and excess sensitivity see, for example, Muellbauer and Lattimore (1995). Carroll and Weil
(1994) suggest that the reverse causality between growth and saving often observed in aggregate
data could be due to the neglect of habit formation in consumption behaviour. Fuhrer (2000)
maintains that the dynamics of aggregate consumption decisions as represented by autocovari-
ance functions can be much better understood using a model with habit formation than using a
model with standard time-separable preferences. A problem common to all these studies using
representative agent frameworks is that the coefficient of habit formation needed to reconcile the
model with the data is typically deemed implausibly high. In this section we consider the aggre-
gate implications of allowing for heterogeneity in habit formation coefficients across individuals
and investigate the extent to which empirical puzzles observed in aggregate consumption data
are due to the aggregation problem. Using stochastic simulations, Pesaran (2003) shows that the
estimates of the habit persistence coefficient are likely to be seriously biased downward if they
are based on analogue aggregate consumption functions, which could partly explain the excess
smoothness and excess sensitivity puzzles in terms of neglected heterogeneity.11
Consider an economy composed of a large number of consumers, where each consumer
indexed by i, i = 1, 2, . . . , N, at the beginning of period t is endowed with an initial level of
financial wealth, ai,t−1 . His/her labour income over the period t − 1 to t, yit , is generated accord-
ing to the following geometric random walk model
$$\log y_{it} = \alpha_i + \mu t + \sum_{s=1}^{t} v_s + \xi_{it}, \tag{32.80}$$
This formulation allows labour incomes at the individual and the economy-wide levels to exhibit
geometric growth and at the same time yields a plausible steady state size distribution for labour
incomes. Each individual solves the following inter-temporal optimization problem
$$\max_{\{c_{i,t+s}\}_{s=0}^{\infty}} E\left[\sum_{s=0}^{\infty}\delta^{s}\,u(c_{i,t+s}, c_{i,t+s-1}) \,\Big|\, \Omega_{it}\right] \tag{32.82}$$
11 In a different attempt at resolving the excess smoothness and excess sensitivity puzzles, Binder and Pesaran (2002)
argue that social interactions when combined with habit formation can also help.
and given initial consumption levels, $c_{i,t-1}$, as well as initial wealth levels, $a_{i,t-1}$, for all $i$. In equations (32.82)–(32.84), $u_{it} = u(c_{it}, c_{i,t-1})$ represents the $i$th individual's current-period utility function for period $t$, $\delta = 1/(1 + \rho)$ represents a constant discount factor, $r$ is the constant real rate of interest, and $E(\cdot \mid \Omega_{it})$ denotes the mathematical conditional expectations operator with respect to the information set available to the individual at time $t$.
Given the focus of our analysis on aggregation of linear models, we consider the case where the
current period utility function is quadratic, namely
$$u_{it} = -\frac{1}{2}\left(c_{it} - \lambda_i c_{i,t-1} - \bar c_i\right)^2, \quad 0 < \lambda_i < 1, \tag{32.86}$$
λi is the habit formation coefficient, and c̄i is the saturation coefficient. For simplicity we also
assume that ρ = r, so that individuals are time-indifferent. For each individual the consump-
tion decision rule for time period t that solves the above inter-temporal optimization problem is
given by
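$$c_{it} = \lambda_i c_{i,t-1} + \beta_i y_{it} + \gamma_i\exp\left(\alpha_i + \tfrac{1}{2}\sigma^2_\xi\right)\left[\tilde y_t - (1 + r)\tilde y_{t-1}\right], \tag{32.87}$$
$$\tilde y_t = \exp\left(\mu t + \sum_{s=1}^{t} v_s\right), \tag{32.88}$$
$$\beta_i = \frac{r(1 + r - \lambda_i)}{(1 + r)^2}, \tag{32.89}$$
$$\gamma_i = \frac{r(1 + r - \lambda_i)(1 + g)}{(1 + r)^2(r - g)}, \tag{32.90}$$
$$g = \exp\left(\mu + \tfrac{1}{2}\sigma^2_v\right) - 1. \tag{32.91}$$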
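The coefficients of the decision rule depend only on $\lambda_i$, $r$, $\mu$ and $\sigma_v$, so they are easy to tabulate. The sketch below evaluates (32.89) to (32.91); the function name and the parameter values are illustrative assumptions, and the check `r > g` is added here only to keep the denominator $(r - g)$ positive.

```python
import math


def habit_coefficients(lam, r, mu, sigma_v):
    """Return (beta_i, gamma_i, g) from (32.89)-(32.91) for habit coefficient lam."""
    g = math.exp(mu + 0.5 * sigma_v**2) - 1.0
    if not r > g:
        # Assumption of this sketch: r exceeds g so that (r - g) > 0.
        raise ValueError("this sketch assumes r > g")
    beta = r * (1.0 + r - lam) / (1.0 + r) ** 2
    gamma = r * (1.0 + r - lam) * (1.0 + g) / ((1.0 + r) ** 2 * (r - g))
    return beta, gamma, g


# Illustrative (assumed) parameter values.
beta_i, gamma_i, g = habit_coefficients(lam=0.6, r=0.04, mu=0.01, sigma_v=0.02)
```

Note that $\gamma_i = \beta_i(1 + g)/(r - g)$, so the two coefficients move proportionally as $\lambda_i$ varies.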
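Defining economy-wide average labour income as $\bar y_t = (1/N)\sum_{i=1}^{N} y_{it}$, then under (32.81) as $N \to \infty$ we have
$$\bar y_t \xrightarrow{p} \tilde y_t\exp\left(\alpha + \frac{\sigma^2_\alpha}{2} + \frac{\sigma^2_\xi}{2}\right). \tag{32.93}$$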
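$$c_{it} = \lambda c_{i,t-1} + \beta y_{it} + \gamma\exp\left(\alpha_i + \tfrac{1}{2}\sigma^2_\xi\right)\left[\tilde y_t - (1 + r)\tilde y_{t-1}\right], \tag{32.94}$$
$$\beta = \frac{r(1 + r - \lambda)}{(1 + r)^2}, \tag{32.95}$$
$$\gamma = \frac{r(1 + r - \lambda)(1 + g)}{(1 + r)^2(r - g)}, \tag{32.96}$$
and using (32.93) and noting that $\frac{1}{N}\sum_{i=1}^{N}\exp(\alpha_i) \xrightarrow{p} \exp(\alpha + \tfrac{1}{2}\sigma^2_\alpha)$ yields the perfect aggregate model
$$\bar c_t = \lambda\bar c_{t-1} + \beta\bar y_t + \frac{r(1 + r - \lambda)(1 + g)}{(1 + r)^2(r - g)}\left[\bar y_t - (1 + r)\bar y_{t-1}\right], \tag{32.97}$$
or equivalently
$$\bar c_t = \lambda\bar c_{t-1} + \frac{r(1 + r - \lambda)}{(1 + r)(r - g)}\left[\bar y_t - (1 + g)\bar y_{t-1}\right]. \tag{32.98}$$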
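The step from (32.97) to (32.98) amounts to collecting the coefficients of $\bar y_t$ and $\bar y_{t-1}$. A short worked check, using only the definitions of $\beta$ and $\gamma$ in (32.95) and (32.96), so that $\gamma = \beta(1+g)/(r-g)$:

```latex
\begin{align*}
\text{coefficient of } \bar y_t:\quad
  \beta + \gamma &= \beta\left[1 + \frac{1+g}{r-g}\right]
                  = \beta\,\frac{1+r}{r-g}
                  = \frac{r(1+r-\lambda)}{(1+r)(r-g)}, \\
\text{coefficient of } \bar y_{t-1}:\quad
  -(1+r)\gamma &= -(1+r)\,\beta\,\frac{1+g}{r-g}
                = -(1+g)\,\frac{r(1+r-\lambda)}{(1+r)(r-g)}.
\end{align*}
```

Factoring out the common term $r(1+r-\lambda)/[(1+r)(r-g)]$ leaves $\bar y_t - (1+g)\bar y_{t-1}$ inside the brackets.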
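Assuming that $\lambda_i$'s are IID draws from a distribution with finite moments of all orders defined on the unit interval, and taking conditional expectations of both sides of (32.100) with respect to $\Upsilon_t = \cup_{i=1}^{N}\Upsilon_{it}$, where $\Upsilon_{it} = (y_{it}, y_{i,t-1}, \ldots) \cup (\bar y_t, \bar y_{t-1}, \ldots)$, we have
$$E(\bar c_t \mid \Upsilon_t) = \frac{1}{N}\sum_{j=0}^{\infty}\sum_{i=1}^{N} E\left(\beta_i\lambda_i^{j} \mid \Upsilon_t\right) y_{i,t-j} + \frac{1}{N}\sum_{j=0}^{\infty}\sum_{i=1}^{N} E\left(\gamma_i\lambda_i^{j} \mid \Upsilon_t\right)\exp\left(\alpha_i + \tfrac{1}{2}\sigma^2_\xi\right)\left[\tilde y_{t-j} - (1 + r)\tilde y_{t-j-1}\right].$$
Since $E(\beta_i\lambda_i^{j} \mid \Upsilon_t) = E(\beta_i\lambda_i^{j}) = a_j$ and $E(\gamma_i\lambda_i^{j} \mid \Upsilon_t) = E(\gamma_i\lambda_i^{j}) = b_j$, for all $i$, then we have12
$$E(\bar c_t \mid \Upsilon_t) = \sum_{j=0}^{\infty} a_j\bar y_{t-j} + \left[\frac{1}{N}\sum_{i=1}^{N}\exp\left(\alpha_i + \tfrac{1}{2}\sigma^2_\xi\right)\right]\sum_{j=0}^{\infty} b_j\left[\tilde y_{t-j} - (1 + r)\tilde y_{t-j-1}\right]. \tag{32.101}$$
But, as noted earlier, for $N$ sufficiently large
$$\frac{1}{N}\sum_{i=1}^{N}\exp\left(\alpha_i + \tfrac{1}{2}\sigma^2_\xi\right) \xrightarrow{p} \exp\left(\alpha + \frac{\sigma^2_\alpha}{2} + \frac{\sigma^2_\xi}{2}\right),$$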
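and $m_j = E(\lambda_i^{j})$ is the $j$th-order moment of $\lambda_i$. Similarly, using (32.90) and (32.102) we have
$$b_j = E\left(\gamma_i\lambda_i^{j}\right) = \frac{(1 + g)}{(r - g)}\,a_j. \tag{32.103}$$
Now taking conditional expectations of (32.101) with respect to the aggregate information set $\Omega_t = (\bar y_t, \bar y_{t-1}, \ldots; \bar c_{t-1}, \bar c_{t-2}, \ldots)$
$$\begin{aligned}
E(\bar c_t \mid \Omega_t) &= \sum_{j=0}^{\infty} a_j\bar y_{t-j} + \frac{1 + g}{r - g}\sum_{j=0}^{\infty} a_j\left[\bar y_{t-j} - (1 + r)\bar y_{t-j-1}\right] \\
&= \left(\frac{1 + r}{r - g}\right)\left\{a_0\bar y_t + \sum_{j=1}^{\infty}\left[a_j - (1 + g)a_{j-1}\right]\bar y_{t-j}\right\}.
\end{aligned}$$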
where $\varepsilon_t$ is the aggregation error and by construction satisfies the orthogonality condition $E(\varepsilon_t \mid \Omega_t) = 0$.
The aggregation errors are serially uncorrelated with zero means, but in general are not
homoskedastic. The above optimal aggregate function is directly comparable to the aggregate
model, (32.98), obtained under homogeneous habit formation coefficients. It is easily seen that
(32.104) reduces to (32.98) if λi = λ for all i. The aggregation errors, εt ’s, also vanish if and
only if λi = λ. Finally, unless the habit formation coefficients are homogeneous, the optimal
aggregate model cannot be written as a finite-order ARDL model in c̄t and ȳt − (1 + g)ȳt−1 .
See Pesaran (2003) for an illustrative numerical result on the extent of the aggregation bias.
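The distributed-lag weights of the optimal aggregate function can be tabulated from the moments of $\lambda_i$. The sketch below relies on one step derived here from (32.89), namely $a_j = E(\beta_i\lambda_i^{j}) = r[(1+r)m_j - m_{j+1}]/(1+r)^2$ with $m_j = E(\lambda_i^{j})$, and then forms the weights $\frac{1+r}{r-g}a_0$ and $\frac{1+r}{r-g}[a_j - (1+g)a_{j-1}]$ on $\bar y_{t-j}$. The function name, the values of `r` and `g`, and the two moment sequences (uniform on the unit interval, and a degenerate case with $\lambda_i = 0.5$) are illustrative assumptions.

```python
def optimal_lag_weights(moments, r, g, n_lags):
    """Weights on y_bar_{t-j}, j = 0..n_lags, given moments[j] = E(lambda_i^j)."""
    if len(moments) < n_lags + 2:
        raise ValueError("need moments up to order n_lags + 1")
    # a_j = E(beta_i * lambda_i^j), using beta_i = r (1 + r - lambda_i) / (1 + r)^2
    a = [r * ((1 + r) * moments[j] - moments[j + 1]) / (1 + r) ** 2
         for j in range(n_lags + 1)]
    scale = (1 + r) / (r - g)
    weights = [scale * a[0]]
    weights += [scale * (a[j] - (1 + g) * a[j - 1]) for j in range(1, n_lags + 1)]
    return a, weights


r, g, J = 0.04, 0.01, 8                                   # assumed values
uniform_moments = [1.0 / (j + 1) for j in range(J + 2)]   # E(lambda^j) for U(0, 1)
homog_moments = [0.5 ** j for j in range(J + 2)]          # lambda_i = 0.5 for all i
a_u, w_u = optimal_lag_weights(uniform_moments, r, g, J)
a_h, w_h = optimal_lag_weights(homog_moments, r, g, J)
```

In the homogeneous case the weights beyond the first lag decline geometrically at rate $\lambda$, which is what allows the finite-order representation (32.98); with heterogeneous $\lambda_i$ the ratio of successive weights is no longer constant.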
due to dynamic heterogeneity alone does not explain the persistence of the aggregate inflation,
rather it is the combination of factor persistence and dynamic heterogeneity that is responsible
for the high persistence of aggregate inflation as compared to the persistence of the underlying
individual inflation series.
32.11.1 Data
The inflation series for the $i$th price category is computed as $y_{it} = 400 \times (\ln q_{it} - \ln q_{i,t-1})$, where $q_{it}$ is the seasonally adjusted consumer price index of unit $i$ at time $t$.13 Units are individual categories of the consumer price index (e.g. bread, wine, medical services, …) and the time dimension is quarterly covering the period 1985Q1 to 2004Q2; altogether 78 observations per price category. There are 85 categories in Germany, 145 in France, and 168 in Italy. The aggregate inflation measure is computed as $\bar y_{wt} = \sum_{i=1}^{N} w_i y_{it}$, where $N$ is the number of price categories and $w_i$ is the weight of the $i$th category in the consumer price index. Pesaran and Chudik (2014) conduct their empirical analysis for each of the three countries separately.
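The two data transformations described above, annualised quarterly log changes and their weighted aggregate, can be sketched as follows. The function names and the tiny two-category price array are hypothetical and serve only to show the shapes involved.

```python
import numpy as np


def annualized_inflation(q):
    """q: (T+1) x N array of price index levels; returns T x N of 400 * delta ln q."""
    return 400.0 * np.diff(np.log(q), axis=0)


def aggregate_inflation(y, w):
    """Weighted aggregate sum_i w_i * y_it for each period t."""
    return y @ w


# Hypothetical data: category 0 grows 1% per quarter, category 1 is flat.
q = np.array([[100.0, 100.0], [101.0, 100.0], [102.01, 100.0]])
w = np.array([0.5, 0.5])          # assumed consumer price index weights
y = annualized_inflation(q)
y_bar_w = aggregate_inflation(y, w)
```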
where |Ci | is the number of neighbours of unit i, assumed to be small and fixed as N → ∞, si
is the corresponding N × 1 sparse weights vector with |Ci | nonzero elements. yit represents the
local average of unit i. No unit is assumed to be dominant in the sense discussed by Chudik and
Pesaran (2011).
Following Pesaran (2006) and Chudik and Pesaran (2015a), the economy-wide average, $\bar y_t = N^{-1}\sum_{j=1}^{N} y_{jt}$, and the three sectoral averages
$$\bar y_{kt} = \frac{1}{|Q_k|}\sum_{j \in Q_k} y_{jt} = w_k' y_t, \quad \text{for } k \in \{f, g, s\},$$
are used in estimation, where Qk for k = {f , g, s} defines the set of units belonging to the food
and beverages sector (f ), the goods sector (g), and the services sector (s). |Qk | is the number
of units in sector k, and wk is the corresponding vector of sectoral weights. The following cross-
section augmented regressions are estimated by least squares for the price category i belonging
to sector k (intercepts are included but not shown)14
13 Descriptive statistics of the individual price categories are provided in Altissimo et al. (2009, Table 2).
14 The estimates are dynamic CCE discussed in Section 29.5.3.
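$$y_{it} = \sum_{\ell=1}^{p_{i\phi}}\phi_{ii\ell}\,y_{i,t-\ell} + \sum_{\ell=1}^{p_{id}} d_{i\ell}\,s_i' y_{t-\ell} + \sum_{\ell=0}^{p_{ih}} h_{i\ell}\,\bar y_{t-\ell} + \sum_{\ell=0}^{p_{ik}} h^{k}_{i\ell}\,\bar y_{k,t-\ell} + \zeta_{it}, \quad \text{for } i \in Q_k \text{ and } k \in \{f, g, s\}. \tag{32.105}$$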
The same equations are also estimated for the energy price category, but without sectoral aver-
ages. The estimates are dynamic CCE discussed in Section 29.5.3. Impulse response function
of the combined aggregate shock on the aggregate variable in a disaggregate model is computed
in the same way as in Section 32.9. The lag-orders for the individual price equations are chosen
by AIC with the maximum lag-order set to 2. In line with the theoretical derivations, a higher
maximum lag-order is selected when estimating the aggregate inflation equations.
Table 32.3 Summary statistics for individual price relations for Germany, France, and Italy
(equation (32.105))
the lower rejection rate for the cross-section averages, compared to the own lagged coefficients.
The fit is relatively high in most cases. The average $\bar R^2$ is 56 per cent in Germany, 48 per cent
in France, and 51 per cent in Italy (median values are 61 per cent, 52 per cent, and 54 per cent,
respectively).
Panel B. Bootstrap means and 90% confidence bounds based on aggregate model; y-axis shows the estimated size of the shock. (Plots for Germany, France, and Italy over horizons 0 to 24.)
Panel C. Bootstrap means and 90% confidence bounds based on disaggregate model; y-axis shows the estimated size of the shock. (Plots for Germany, France, and Italy over horizons 0 to 24.)
Figure 32.3 GIRFs of one unit combined aggregate shock on the aggregate variable.
Using the estimates of micro lagged coefficients in (33.36), for $i = 1, 2, \ldots, N$, Pesaran and Chudik (2014) compute eigenvalues of the companion matrix corresponding to the VAR polynomial matrix $\hat\Phi(L)$,
$$\hat\Phi(L) = \begin{pmatrix}
\hat\phi_{11}(L) & 0 & \cdots & 0 \\
0 & \hat\phi_{22}(L) & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \hat\phi_{NN}(L)
\end{pmatrix} + \begin{pmatrix}
\hat d_1(L)s_1' \\
\hat d_2(L)s_2' \\
\vdots \\
\hat d_N(L)s_N'
\end{pmatrix},$$
where $\hat\phi_{ii}(L) = \sum_{\ell=1}^{p_{i\phi}}\hat\phi_{ii\ell}L^{\ell-1}$, $\hat d_i(L) = \sum_{\ell=1}^{p_{id}}\hat d_{i\ell}L^{\ell-1}$, and $\hat\phi_{ii\ell}$ and $\hat d_{i\ell}$ denote estimates of $\phi_{ii\ell}$ and $d_{i\ell}$, respectively. The modulus of the largest eigenvalue is 0.94 for Germany and Italy, and 0.89 for France, so the eigenvalues do not cover unity. The authors therefore conclude that it is unlikely that dynamic heterogeneity alone could generate the degree of persistence observed in Figure 32.3.
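The eigenvalue calculation can be illustrated with a generic companion-form construction. The sketch below stacks the lag coefficient matrices of a VAR($p$) into its companion matrix and returns the largest eigenvalue modulus; the function name and the two toy examples are assumptions for illustration and do not use the estimates reported in the text.

```python
import numpy as np


def companion_max_modulus(coefs):
    """coefs: list of p (N x N) lag matrices A_1, ..., A_p of a VAR(p)."""
    p = len(coefs)
    n = coefs[0].shape[0]
    top = np.hstack(coefs)                        # first block row: [A_1, ..., A_p]
    if p == 1:
        comp = top
    else:
        # Identity blocks shift the lagged vectors down by one position.
        bottom = np.hstack([np.eye(n * (p - 1)), np.zeros((n * (p - 1), n))])
        comp = np.vstack([top, bottom])
    return np.abs(np.linalg.eigvals(comp)).max()


# Toy examples (assumed values): a diagonal VAR(1) and a scalar AR(2).
rho1 = companion_max_modulus([np.diag([0.5, 0.9])])
rho2 = companion_max_modulus([np.array([[0.5]]), np.array([[0.3]])])
```

A largest modulus below one indicates a stable system, which is the sense in which the reported values of 0.94 and 0.89 fall short of unity.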
This conclusion is further investigated in Figure 32.4, which compares the estimates of GIRFs for the combined aggregate shock on the aggregate variable with $\hat a_s = w'\hat G_s\tau_N$ at horizons $s$ = 6, 12 and 24, where the matrix $\hat G_s$ is defined by $\hat\Phi^{-1}(L) = \hat G(L) = \sum_{s=0}^{\infty}\hat G_s L^{s}$. $\hat a_s$ shows the effects of dynamic heterogeneity on the persistence of the aggregate variable, whereas the GIRFs of the combined aggregate shock on the aggregate variable are determined by factor persistence as well as dynamic heterogeneity. In the case of all three countries, $\hat a_s$ is found to decline with $s$ much faster than the effects of the combined aggregate shock. It therefore seems that dynamic heterogeneity alone does not sufficiently explain the observed persistence of aggregate inflation.
Figure 32.4 GIRFs of one unit combined aggregate shocks on the aggregate variable (light-grey colour) and
estimates of as (dark-grey colour); bootstrap means and 90% confidence bounds, s = 6, 12, and 24.
32.13 Exercises
1. Consider the following dynamic factor models for the n cross-sectional units
where
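$$\alpha_i(L) = \alpha_{i0} + \alpha_{i1}L + \alpha_{i2}L^2 + \ldots.$$
$$\bar y_t = \bar\alpha_n(L)f_t,$$
where $\bar y_t = n^{-1}\sum_{i=1}^{n} y_{it}$, $\bar\alpha_n(L) = \bar\alpha_{0n} + \bar\alpha_{1n}L + \bar\alpha_{2n}L^2 + \ldots$, and
$$\bar\alpha_{jn} = n^{-1}\sum_{i=1}^{n}\alpha_{ij}.$$
$$\alpha_i(L) = \frac{1 - \theta_i L}{1 - \phi_i L}, \quad \text{with } \phi_i < 1 \text{ and } |\theta_i| < 1,$$
and $\theta_i$ and $\phi_i$ are random draws from uniform distributions over the ranges $[a_\theta, b_\theta]$ and $[a_\phi, b_\phi]$. Under what values of these ranges is the absolute summability condition in (b) satisfied?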
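2. Suppose that
$$y_{it} = \left(\frac{1 - \theta_i L}{1 - \phi_i L}\right)f_t + u_{it},$$
where
$$u_{it} = \rho\sum_{j=1}^{n} w_{ij}u_{jt} + \varepsilon_{it},$$
$$\varepsilon_{it} \sim IID(0, \sigma^2_i), \quad w_{ij} > 0, \quad \sum_{j=1}^{n} w_{ij} = 1 = \sum_{i=1}^{n} w_{ij}, \quad \text{and } |\rho| < 1.$$
where
$$\bar\alpha_n(L) = n^{-1}\sum_{i=1}^{n}\left(\frac{1 - \theta_i L}{1 - \phi_i L}\right).$$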
(b) Suppose that θ i and φ i are independently uniformly distributed over the ranges [0, 1].
Show that the limit of ȳt as n → ∞, is a long memory process.
(c) Derive the autocorrelation function of ȳt for n sufficiently large.
(d) Discuss the relevance of the above for the analysis of the relationship between macro and
micro relationships in economics.
ft = ρft−1 + vt ,
$u_{it}$ and $v_{it}$ are defined by the following linear stationary processes
$$u_{it} = \sum_{j=0}^{\infty} a_j\varepsilon_{t-j}, \quad v_{it} = \sum_{j=0}^{\infty} b_j\xi_{t-j},$$
where $\varepsilon_t$ and $\xi_t$ are IID(0, 1). The coefficients, $\lambda_i$, $\gamma_i$ and $\theta_i$ are either fixed constants or, if stochastic, are independently distributed. Further $f_t$ follows the AR(1) process.
(a) Suppose that $u_{it}$ and $v_{it}$ are weakly cross-sectionally uncorrelated. Derive the correlation between $f_t$ and $\bar y_t = n^{-1}\sum_{i=1}^{n} y_{it}$, and show that this correlation tends to unity as $n \to \infty$.
(b) How do you forecast $\bar y_t$ based on (a) the aggregate information set $(\bar y_{t-1}, \bar x_t; \bar y_{t-2}, \bar x_{t-1}; \ldots)$, (b) the disaggregated information set $(y_{i,t-1}, x_{it}; y_{i,t-2}, x_{i,t-1}; \ldots$ for $i = 1, 2, \ldots, n)$, distinguishing between cases when $n$ is small and when $n$ is large?
(c) What additional information/restrictions are required if the object of interest is to fore-
cast yit for a particular cross-sectional unit i?
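for $i = 1, 2, \ldots, n$, where $u_{it}$ and $x_{it}$ are independently distributed, $u_{it} \sim IID(0, \sigma^2_u)$, $\Omega_t = \cup_{i=1}^{n}\Omega_{it}$, $\Omega_{it} = (y_{it}, x_{it}; y_{i,t-1}, x_{i,t-1}; \ldots)$,
$$x_{it} = \gamma_i f_t + v_{it},$$
$$v_{it} = \rho_i v_{i,t-1} + \varepsilon_{it},$$
$$f_t = \rho f_{t-1} + \varepsilon_t,$$
with $\varepsilon_{it}$ and $\varepsilon_t$ being random draws with zero means and constant variances, $\rho_i \le 1$, and $|\rho| < 1$.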
(a) Assuming that |α i | < 1 for all i, show that the above disaggregated rational expectations
model has a unique solution.
(b) Derive an expression for the aggregates ȳt and x̄t constructed as simple averages of yit and
xit over i.
(c) Suppose that uit and ε it are cross-sectionally weakly correlated. Derive the limiting prop-
erties of ȳt and x̄t as n → ∞, and show that they are cointegrated when ρ = 1.
i i
i i
OUP CORRECTED PROOF – FINAL, 8/9/2015, SPi
i
33.1 Introduction
Individual economies in the global economy are interlinked through many different channels in a complex way. These include sharing scarce resources (such as oil and other commodities), political and technological developments, cross-border trade in financial assets as well as trade in goods and services, labour and capital movement across countries. Even after allowing for such effects, there might still be residual interdependencies due to unobserved interactions and spillover effects not taken properly into account by using the common channels of interaction. Taking account of these channels of interaction poses a major challenge to modelling the global economy and conducting policy simulations and counterfactual scenario analyses.
The global VAR (GVAR) approach, originally proposed in Pesaran et al. (2004), provides a
relatively simple yet effective way of modelling complex high-dimensional systems such as the
global economy. Although GVAR is not the first large global macroeconomic model of the world
economy, its methodological contributions lie in dealing with the curse of dimensionality (i.e.,
the proliferation of parameters as the dimension of the model grows) in a theoretically coherent
and statistically consistent manner. Other existing large models are often incomplete and do not
present a closed system, which is required for simulation analysis. See Granger and Jeon (2007)
for a recent overview of global models.
The GVAR approach was developed in the aftermath of the 1997 Asian financial crisis to quan-
tify the effects of macroeconomic developments on the losses of major financial institutions. It
was clear then that all major banks are exposed to risk from adverse global or regional shocks,
but quantifying these effects required a coherent and simple-to-simulate global macroeconomic
model. The GVAR approach provides a useful and practical way of building such a model, and,
although developed originally as a tool for credit risk analysis, it soon became apparent that it has
numerous other applications. This chapter surveys the GVAR approach, focusing on the theo-
retical foundations of the approach as well as its empirical applications.
The GVAR can be briefly summarized as a two-step approach. In the first step, small-scale
country-specific models are estimated conditional on the rest of the world. These models are
represented as augmented VAR models, denoted as VARX*, and feature domestic variables and weighted cross-section averages of foreign variables, also commonly referred to as ‘star variables’, which are treated as weakly exogenous (or long-run forcing). In the second step, individual country VARX* models are stacked and solved simultaneously as one large global VAR model. The
solution can be used for shock scenario analysis and forecasting as is usually done with standard
low-dimensional VAR models.
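The construction of the star variables used in the first step can be sketched in a few lines. Each unit's star variable is a weighted cross-section average of the other units' values; the weight matrix below (zero diagonal, rows summing to one) and the function name are assumptions for illustration, and in applications the weights would come from data such as trade shares.

```python
import numpy as np


def star_variables(x, W):
    """x: T x N array of a variable; W: N x N weights. Returns x*_it = sum_j W[i, j] x_jt."""
    return x @ W.T


# Hypothetical example with N = 3 units and equal weights on the other two units.
W = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
x = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])
x_star = star_variables(x, W)
```

With a zero diagonal in `W`, a unit's own value never enters its star variable, which is what lets the country models condition on the rest of the world.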
The simplicity and usefulness of this approach has proved to be quite attractive and there are
numerous applications of the GVAR approach. Individual units need not necessarily be coun-
tries, but could be regions, industries, goods categories, banks, municipalities, or sectors of a
given economy, just to mention a few notable examples. Mixed cross-section GVAR models, for
instance linking country data with firm-level data, have also been considered in the literature.
The GVAR approach is conceptually simple, although it requires some programming skills since
it handles large data sets, and it is not yet incorporated in any of the mainstream econometric
software packages. Fortunately, an open source toolbox developed by Smith and Galesi (2014)
together with a global macroeconomic data set, covering the period 1979–2013, can be obtained
from the web at <https://sites.google.com/site/gvarmodelling/>. This toolbox has greatly facil-
itated empirical research using GVAR methodology.
We start with methodological issues, considering large linear dynamic systems. We suppose
that the large set of variables under consideration are all endogenously determined in a factor-
augmented high-dimensional VAR model. This model allows for a very general pattern of inter-
linkages among variables, but, as is well known, it cannot be estimated consistently due to the
curse of dimensionality when the cross-section dimension (N) is large. GVAR is one of the
common solutions to the curse of dimensionality, alongside popular factor-based modelling
approaches, large-scale Bayesian VARs and panel VARs. We introduce the GVAR approach as
originally proposed by Pesaran et al. (2004) and then review conditions (on the underlying
unobserved high-dimensional VAR data generating process) that justify the individual equa-
tions estimated in the GVAR approach when N and T (the time dimension) are large, and of the
same order of magnitude. Next, we survey impulse response analysis, forecasting, analysis of the
long run, and specification tests in the GVAR approach. Last but not least, we review empirical
GVAR applications. We separate forecasting from non-forecasting applications, and divide the
latter group of empirical papers into global applications (featuring countries) and the remaining
sectoral/other applications, where cross-section units represent sectors, industries or regions
within a given economy.
where $L$ is the time lag operator, $\Phi(L,p) = I_k - \sum_{\ell=1}^{p}\Phi_\ell L^\ell$ is a matrix lag polynomial in $L$,
$\Phi_\ell$ for $\ell = 1, 2, \ldots, p$ are $k \times k$ matrices of unknown coefficients, $\Gamma_a(L, s_a) = \sum_{\ell=1}^{s_a}\Gamma_{a\ell}L^\ell$,
for $a = f, \omega$, $\Gamma_{a\ell}$ for $\ell = 1, 2, \ldots, s_a$ and $a = f, \omega$ are $k \times m_a$ matrices of factor loadings, $f_t$ is
the $m_f \times 1$ vector of unobserved common factors, $\omega_t$ is the $m_\omega \times 1$ vector of observed common
effects, and $u_t$ is a $k \times 1$ vector of reduced form errors with zero means and $k \times k$ covariance
matrix $\Sigma_u = E(u_t u_t')$. We abstract from deterministic terms to keep the exposition simple,
but such terms can be easily incorporated in the analysis. GVAR allows for very general forms
of interdependencies across individual variables within a given unit and/or across units, since
lags of all k variables enter individual equations, and the reduced form errors are allowed to be
cross-sectionally dependent. GVAR can also be extended to allow for time varying parameters,
nonlinearities, or threshold effects. But such extensions are not considered in this Chapter.1
VAR models provide a rather general description of linear dynamic systems, but their number
of unknown parameters to be estimated grows at a quadratic rate in the dimension of the model,
k. We are interested in applications where the cross-section dimension, N, as well as the time
series dimension, T, can both be relatively large, while ki , for i = 1, 2, . . . , N, are small, so that
k = O (N). A prominent example arises in the case of global macroeconomic modelling, where
the number of cross-section units is relatively large but the number of variables considered within
each cross-sectional unit (such as real output, inflation, stock prices and interest rates) is small.
Understanding the transmission of shocks across economies (space) and time is a key question
in this example. Clearly, in such settings unrestricted VAR models cannot be estimated due to
the proliferation of unknown parameters (often referred to as the curse of dimensionality). The
main problem is how to impose a plethora of restrictions on the model (33.1) so that the param-
eters can be consistently estimated as N, T →j ∞, while still allowing for a general pattern of
interdependencies between the individual variables.
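The quadratic growth in the number of parameters can be made concrete with a small counting exercise. The sketch below compares the coefficient count of an unrestricted VAR($p$) in all $k$ variables with that of a set of conditional country models containing $k_i$ domestic and $k^*$ star variables. The function names and the values $N = 30$, $k_i = k^* = 4$ and two lags are illustrative assumptions only.

```python
def var_params(k, p):
    """Slope coefficients in an unrestricted VAR(p) with k variables."""
    return k * k * p

def varx_params(k_i, k_star, p_i, q_i):
    """Slope coefficients in one country model with p_i domestic lags and
    q_i lags (plus the contemporaneous value) of k_star star variables."""
    return k_i * k_i * p_i + k_i * k_star * (q_i + 1)

n_countries, k_i, k_star, lags = 30, 4, 4, 2
k = n_countries * k_i

unrestricted_total = var_params(k, lags)             # whole system
unrestricted_per_equation = k * lags                  # regressors per equation
country_total = varx_params(k_i, k_star, lags, lags)  # one country model
country_per_equation = country_total // k_i
```

Under these assumed dimensions each equation of the unrestricted VAR has 240 regressors, against 20 in a conditional country model, which is why the latter remains estimable with the sample sizes typically available.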
There are several approaches developed for modelling data sets with a large number of vari-
ables: models that utilize common factors (see Chapters 19 and 29 on factor models), large
Bayesian VARs, Panel VARs, and global VARs. Factor models can be interpreted as data shrink-
age procedures, where a large set of variables is shrunk into a small set of factors.2 Estimated
factors can be used together with the vector of domestic variables to form a small-scale model,
as in factor-augmented VAR models (Bernanke, Boivin, and Eliasz (2005) and Stock and
Watson (2005)).3 Large-scale Bayesian VARs, on the other hand, explicitly shrink the parameter
space by imposing tight priors on all or a sub-set of parameters. Such models have been explored,
among others, by Giacomini and White (2006), De Mol, Giannone, and Reichlin (2008), Car-
riero, Kapetanios, and Marcellino (2009), and Banbura, Giannone, and Reichlin (2010). Large
Bayesian VARs share many similarities with Panel VARs. The difference between the two is that,
while large Bayesian VARs typically treat each variable symmetrically, Panel VARs take account
of the structure of the variables, namely the division of the variables into different cross-section
groups and variable types. Parameter space is shrunk in the Panel VAR literature by assuming that
1 Extensions of the linear setting to allow for nonlinearities could also be considered, but most of the GVAR papers
in the literature are confined to a linear framework. The few exceptions include Binder and Gross (2013), who develop a
regime-switching GVAR model, and GVAR papers that consider time varying weights.
2 Stock and Watson (1999, 2002), and Giannone, Reichlin, and Sala (2005) conclude that only a few, perhaps two,
factors explain much of the predictable variation, while Bai and Ng (2007) estimate four factors and Stock and Watson
(2005) estimate as many as seven factors.
3 Dynamic factor models were introduced by Geweke (1977) and Sargent and Sims (1977), which have more recently
been generalized to allow for weak cross-sectional dependence by Forni and Lippi (2001), Forni et al. (2000), and Forni
et al. (2004).
the unknown coefficients can be decomposed into a component that is common across all vari-
ables, a cross-section specific component, a variable-specific component, lag-specific compo-
nent, and idiosyncratic effects; see Canova and Ciccarelli (2013) for a survey. Last but not least,
the GVAR approach solves the dimensionality problem by decomposing the underlying large
dimensional VARs into a smaller number of conditional models, which are linked together via
cross-sectional averages. The GVAR approach imposes an intuitive structure on cross-country
interlinkages and no restrictions are imposed on the dynamics of the individual country sub-
models. In the case where the number of lags is relatively large (compared with the time dimen-
sion of the panel) and/or the number of country specific variables is moderately large, it is pos-
sible to combine the GVAR structure with shrinkage estimation approaches in light of the usual
bias-variance trade-offs. Bayesian estimation of country-specific sub-models that feature in the
GVAR approach has been considered, for instance, in Feldkircher et al. (2014).
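$$x_{it} = \sum_{\ell=1}^{p_i}\Phi_{i\ell}\,x_{i,t-\ell} + \Lambda_{i0}\,x_{it}^{*} + \sum_{\ell=1}^{q_i}\Lambda_{i\ell}\,x_{i,t-\ell}^{*} + \varepsilon_{it}, \tag{33.3}$$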
4 It is straightforward to accommodate a different number of star variables across countries (ki∗ instead of k∗ ), if desired.
under conditions reviewed in Section 33.4, be treated as weakly exogenous for the purpose of
estimating the unknown coefficients of the conditional country models.
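Let $z_{it} = (x_{it}', x_{it}^{*\prime})'$ be the $k_i + k^*$ dimensional vector of domestic and country-specific foreign
variables included in the sub-model of country $i$ and rewrite (33.3) as
$$A_{i0}\,z_{it} = \sum_{\ell=1}^{p} A_{i\ell}\,z_{i,t-\ell} + \varepsilon_{it}, \tag{33.4}$$
where
$$A_{i0} = (I_{k_i}, -\Lambda_{i0}), \quad A_{i\ell} = (\Phi_{i\ell}, \Lambda_{i\ell}) \text{ for } \ell = 1, 2, \ldots, p,$$
$p = \max_i (p_i, q_i)$, and define $\Phi_{i\ell} = 0$ for $\ell > p_i$, and similarly $\Lambda_{i\ell} = 0$ for $\ell > q_i$. Individual
country models in (33.4) can be equivalently written in error-correction form,
$$\Delta x_{it} = \Lambda_{i0}\,\Delta x_{it}^{*} - \Pi_i\,z_{i,t-1} + \sum_{\ell=1}^{p} H_{i\ell}\,\Delta z_{i,t-\ell} + \varepsilon_{it}, \tag{33.5}$$
where
$$\Pi_i = A_{i0} - \sum_{\ell=1}^{p} A_{i\ell}, \quad\text{and}\quad H_{i\ell} = -\left(A_{i,\ell+1} + A_{i,\ell+2} + \ldots + A_{i,\ell+p}\right),$$
with $A_{i\ell} = 0$ for $\ell > p$. If $\Pi_i$ has rank $r_i$, it can be written as
$$\Pi_i = \alpha_i \beta_i',$$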
where α i is the ki × ri full column rank loading matrix and β i is the (ki + k∗ ) × ri full column
rank matrix of cointegrating vectors. It is well known that this decomposition is not unique and
the identification of long-run relationships requires theory-based restrictions (see Sections 23.6
and 33.7).
Country models in (33.3) resemble the small open economy (SOE) macroeconomic models
in the literature, where domestic variables are modelled conditional on the rest of the world. The
data shrinkage given by (33.2) solves the dimensionality problem. The conditions under which
it is valid to specify (33.3) are reviewed in Section 33.4. The estimation of country models in
(33.3), which allows for cointegration within and across countries (via the star variables), is the
first step of the GVAR approach.
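To illustrate the first step in the simplest possible setting, the sketch below estimates a scalar VARX*(1,1) country model by least squares. It assumes stationary data, one domestic and one star variable, no deterministic terms and no cointegration, so it is a stand-in for, not a reproduction of, the reduced rank estimation used when the variables are I(1). The function name `estimate_varx11` and the simulated, noise-free data are hypothetical.

```python
import numpy as np

def estimate_varx11(x, x_star):
    """Least squares estimates of (phi, lam0, lam1) in
    x_t = phi * x_{t-1} + lam0 * xstar_t + lam1 * xstar_{t-1} + e_t."""
    y = x[1:]
    regressors = np.column_stack([x[:-1], x_star[1:], x_star[:-1]])
    coef, *_ = np.linalg.lstsq(regressors, y, rcond=None)
    return coef

# Noise-free artificial data, so least squares recovers the coefficients exactly.
rng = np.random.default_rng(0)
n_obs = 200
x_star = rng.standard_normal(n_obs)
x = np.zeros(n_obs)
for t in range(1, n_obs):
    x[t] = 0.5 * x[t - 1] + 0.3 * x_star[t] - 0.2 * x_star[t - 1]

phi_hat, lam0_hat, lam1_hat = estimate_varx11(x, x_star)
```

In applied work the same regression would be run country by country, with the star variable treated as weakly exogenous.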
The second step of the GVAR approach consists of stacking estimated country models to form
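one large global VAR model. Using the $(k_i + k^*) \times k$ dimensional ‘link’ matrices $W_i = (E_i, \tilde{W}_i)'$,
where $E_i$ is the $k \times k_i$ dimensional selection matrix that selects $x_{it}$, namely $x_{it} = E_i' x_t$, and $\tilde{W}_i$ is
the weight matrix introduced in (33.2) to define country-specific foreign star variables, we have
$$z_{it} = (x_{it}', x_{it}^{*\prime})' = W_i x_t. \tag{33.6}$$
Substituting (33.6) in (33.4) gives
$$A_{i0} W_i x_t = \sum_{\ell=1}^{p} A_{i\ell} W_i x_{t-\ell} + \varepsilon_{it},$$
and stacking these equations for $i = 1, 2, \ldots, N$ yields
$$G_0 x_t = \sum_{\ell=1}^{p} G_\ell x_{t-\ell} + \varepsilon_t, \tag{33.7}$$
where $\varepsilon_t = (\varepsilon_{1t}', \varepsilon_{2t}', \ldots, \varepsilon_{Nt}')'$, and
$$G_\ell = \begin{pmatrix} A_{1,\ell} W_1 \\ A_{2,\ell} W_2 \\ \vdots \\ A_{N,\ell} W_N \end{pmatrix}, \quad \text{for } \ell = 0, 1, 2, \ldots, p.$$
If $G_0$ is invertible, premultiplying (33.7) by $G_0^{-1}$ gives the solution of the GVAR model,
$$x_t = \sum_{\ell=1}^{p} F_\ell x_{t-\ell} + G_0^{-1}\varepsilon_t, \tag{33.8}$$
where $F_\ell = G_0^{-1} G_\ell$ for $\ell = 1, 2, \ldots, p$. PSW established that the overall number of cointegrating
relationships in the GVAR model (33.8) cannot exceed the total number of long-run relations
$\sum_{i=1}^{N} r_i$ that exist in country-specific models.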
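The stacking step can be sketched numerically for the special case of one variable per country and one lag, so that each link matrix has two rows: a selection row and a row of weights. The helper name `solve_gvar`, the coefficient values and the weights below are hypothetical, and the sketch assumes that $G_0$ is invertible.

```python
import numpy as np

def solve_gvar(phi, lam0, lam1, w):
    """Stack scalar VARX*(1,1) country models into G0 x_t = G1 x_{t-1} + e_t
    and return (G0, G1, F) with F = G0^{-1} G1."""
    n = len(phi)
    g0 = np.zeros((n, n))
    g1 = np.zeros((n, n))
    for i in range(n):
        e_i = np.zeros(n)
        e_i[i] = 1.0
        link = np.vstack([e_i, w[i]])        # link matrix of country i, 2 x n
        a_i0 = np.array([1.0, -lam0[i]])     # (I, -Lambda_i0) in the scalar case
        a_i1 = np.array([phi[i], lam1[i]])   # (Phi_i1, Lambda_i1)
        g0[i] = a_i0 @ link
        g1[i] = a_i1 @ link
    f = np.linalg.solve(g0, g1)              # requires G0 to be non-singular
    return g0, g1, f

phi = np.array([0.5, 0.4, 0.3])
lam0 = np.array([0.2, 0.3, 0.1])
lam1 = np.array([0.1, 0.0, 0.05])
w = np.array([[0.00, 0.50, 0.50],
              [0.25, 0.00, 0.75],
              [0.50, 0.50, 0.00]])
g0, g1, f = solve_gvar(phi, lam0, lam1, w)
```

In this scalar case the stacked matrices reduce to $G_0 = I_N - \mathrm{diag}(\lambda_{i0})\,W$ and $G_1 = \mathrm{diag}(\phi_i) + \mathrm{diag}(\lambda_{i1})\,W$, where $W$ collects the weight rows.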
where we abstract from lags of $(x_{it}, x_{it}^{*})$. Let $\Lambda_0$ be the $k \times k$ block diagonal matrix defined by
$\Lambda_0 = \mathrm{diag}(\Lambda_{i0})$, and let $\tilde{W} = (\tilde{W}_1, \tilde{W}_2, \ldots, \tilde{W}_N)'$. Write (33.9) as
$$x_t = \Lambda_0 \tilde{W} x_t + \varepsilon_t,$$
or
$$G_0 x_t = \varepsilon_t, \tag{33.10}$$
where $G_0 = I_k - \Lambda_0 \tilde{W}$. Suppose that $G_0$ is rank deficient, namely $\mathrm{rank}(G_0) = k - m$, for
some $m > 0$. Then the solution of (33.10) exists only if $\varepsilon_t$ lies in the range of $G_0$, denoted as
$\mathrm{Col}(G_0)$. Assuming this is the case, system (33.10) does not uniquely determine $x_t$, and the set
of all its possible solutions can be characterized as
$$x_t = \Gamma \tilde{f}_t + G_0^{+}\varepsilon_t, \tag{33.11}$$
$$x_t = \Gamma f_t + R\,\varepsilon_t,$$
$$R = \Gamma M + G_0^{+}.$$
Without any loss of generality, it is standard convention to use the normalization $\mathrm{Var}(f_t) = I_m$,
and to set the first non-zero element in each of the $m$ column vectors of $\Gamma$ to be positive. These
normalization conditions ensure that $\Gamma$ is unique, in which case $R$ is unique up to the rotation
matrix, $M$. Note also that all of the findings above hold for any $N$.
Therefore, the full rank condition, rank(G0 ) = k, is necessary and sufficient for xt , given
by (33.9), to be uniquely determined. If G0 is known to be rank deficient with rank k − m,
and m > 0, then the GVAR model (33.9) would need to be augmented by m equations that
determine the $m$ cross-section averages defined by $\Gamma' x_t$ in order for $x_t$ to be uniquely determined.
We provide further clarification on the rank of G0 in Section 33.6, where we review conditions
under which the individual equations estimated in the GVAR approach can lead to a singular G0
as N → ∞.
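$$x_{it} = \sum_{\ell=1}^{p_i}\Phi_{i\ell}\,x_{i,t-\ell} + \Lambda_{i0}\,x_{it}^{*} + \sum_{\ell=1}^{q_i}\Lambda_{i\ell}\,x_{i,t-\ell}^{*} + D_{i0}\,\omega_t + \sum_{\ell=1}^{s_i} D_{i\ell}\,\omega_{t-\ell} + \varepsilon_{it}, \tag{33.12}$$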
for i = 1, 2, . . . , N. Both types of variables (common variables ωt and cross-section averages xit∗ )
can be treated as weakly exogenous for the purpose of estimation. As noted above, the weak exo-
geneity assumption is testable. Also not all of the coefficients {Di } associated with the common
variables need be significant and, in the case when they are not significant, they could be excluded
for the sake of parsimony.8 The marginal model for the dominant variables can be estimated with
or without the feedback effects from xt . In the latter case, we have the following marginal model,
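$$\omega_t = \sum_{\ell=1}^{p_\omega}\Phi_{\omega\ell}\,\omega_{t-\ell} + \eta_{\omega t}, \tag{33.13}$$
which can be written in the error-correction form
$$\Delta\omega_t = -\alpha_\omega\beta_\omega'\,\omega_{t-1} + \sum_{\ell=1}^{p_\omega-1} H_{\omega\ell}\,\Delta\omega_{t-\ell} + \eta_{\omega t}, \tag{33.14}$$
where $\alpha_\omega\beta_\omega' = I_{m_\omega} - \sum_{\ell=1}^{p_\omega}\Phi_{\omega\ell}$, $H_{\omega\ell} = -\left(\Phi_{\omega,\ell+1} + \Phi_{\omega,\ell+2} + \ldots + \Phi_{\omega,\ell+p_\omega-1}\right)$, for $\ell =
1, 2, \ldots, p_\omega - 1$, with $\Phi_{\omega\ell} = 0$ for $\ell > p_\omega$. In the case of I(1) variables, representation (33.14) clearly allows for
cointegration among the dominant variables. To allow for feedback effects from the variables in the
GVAR model back to the dominant variables via cross-section averages, the VAR model (33.13)
can be augmented by lags of $x_{\omega t}^{*} = \tilde{W}_\omega x_t$, where $\tilde{W}_\omega$ is a $k^* \times k$ dimensional weight matrix
defining $k^*$ global cross-section averages,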
8 Chudik and Smith (2013) find that contemporaneous US variables are significant in individual non-US country mod-
els in about a quarter of cases. Moreover, weak exogeneity of the US variables is not rejected by the data.
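$$\omega_t = \sum_{\ell=1}^{p_\omega}\Phi_{\omega\ell}\,\omega_{t-\ell} + \sum_{\ell=1}^{q_\omega}\Lambda_{\omega\ell}\,x_{\omega,t-\ell}^{*} + \eta_{\omega t}. \tag{33.15}$$
Assuming there is no cointegration among the common variables, $\omega_t$, and the cross-section
averages, $x_{\omega t}^{*}$, (33.15) can be written as
$$\Delta\omega_t = -\alpha_\omega\beta_\omega'\,\omega_{t-1} + \sum_{\ell=1}^{p_\omega-1} H_{\omega\ell}\,\Delta\omega_{t-\ell} + \sum_{\ell=1}^{q_\omega-1} B_{\omega\ell}\,\Delta x_{\omega,t-\ell}^{*} + \eta_{\omega t}, \tag{33.16}$$
where $B_{\omega\ell} = -\left(\Lambda_{\omega,\ell+1} + \Lambda_{\omega,\ell+2} + \ldots + \Lambda_{\omega,\ell+q_\omega-1}\right)$. Different lag-orders for the dominant
variables ($p_\omega$) and cross-section averages ($q_\omega$) can be considered. Note that contemporaneous
values of star variables do not feature in (33.16), and its unknown parameters can be
estimated consistently using least squares or reduced rank regression techniques depending on the
assumed rank of $\alpha_\omega\beta_\omega'$. Similar equations are estimated in Holly, Pesaran, and Yamagata (2011),
and in a stationary setting in Smith and Yamagata (2011).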
Conditional models (33.12) and the marginal model (33.16) can be combined and solved as a
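complete global VAR model in the usual way. Specifically, let $y_t = (\omega_t', x_t')'$ be the $(k + m_\omega) \times 1$
vector of all observable variables. Using (33.6) in (33.12) and stacking country-specific
conditional models (33.12) together with the model for common variables (33.15) yields
$$G_{y,0}\,y_t = \sum_{\ell=1}^{p} G_{y,\ell}\,y_{t-\ell} + \varepsilon_{yt}, \tag{33.17}$$
where $\varepsilon_{yt} = (\eta_{\omega t}', \varepsilon_t')'$,
$$G_{y,0} = \begin{pmatrix} I_{m_\omega} & 0_{m_\omega \times k} \\ -D_0 & G_0 \end{pmatrix}, \quad G_{y,\ell} = \begin{pmatrix} \Phi_{\omega\ell} & \Lambda_{\omega\ell}\tilde{W}_\omega \\ D_\ell & G_\ell \end{pmatrix}, \quad \text{for } \ell = 1, 2, \ldots, p,$$
$D_\ell = (D_{1\ell}', D_{2\ell}', \ldots, D_{N\ell}')'$ for $\ell = 0, 1, \ldots, p$, $p = \max_i \{p_i, q_i, s_i, p_\omega, q_\omega\}$, and we define
$D_{i\ell} = 0$ for $\ell > s_i$, $\Phi_{\omega\ell} = 0$ for $\ell > p_\omega$, and $\Lambda_{\omega\ell} = 0$ for $\ell > q_\omega$. Matrix $G_{y,0}$ is invertible if
and only if $G_0$ is invertible. Assuming $G_0^{-1}$ exists, the inverse of $G_{y,0}$ is
$$G_{y,0}^{-1} = \begin{pmatrix} I_{m_\omega} & 0_{m_\omega \times k} \\ G_0^{-1} D_0 & G_0^{-1} \end{pmatrix},$$
which is a block lower triangular matrix, showing the long-run causal nature of the common
(dominant) variables, $\omega_t$. Multiplying both sides of (33.17) by $G_{y,0}^{-1}$ we now obtain the following
GVAR model for $y_t$
$$y_t = \sum_{\ell=1}^{p} F_\ell\,y_{t-\ell} + G_{y,0}^{-1}\varepsilon_{yt}, \tag{33.18}$$
where $F_\ell = G_{y,0}^{-1} G_{y,\ell}$, for $\ell = 1, 2, \ldots, p$.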
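The block triangular inverse used in this step is easy to verify numerically. The sketch below takes a generic lower-left block `c` (standing for whatever matrix multiplies $\omega_t$ in the stacked contemporaneous system, whichever sign convention is adopted) and checks that the partitioned formula reproduces the inverse. The function name and the numerical values are hypothetical.

```python
import numpy as np

def inverse_block_lower(c, g):
    """Inverse of [[I, 0], [c, g]], computed from the partitioned formula
    [[I, 0], [-g^{-1} c, g^{-1}]]; g is assumed to be non-singular."""
    m = c.shape[1]
    k = g.shape[0]
    g_inv = np.linalg.inv(g)
    top = np.hstack([np.eye(m), np.zeros((m, k))])
    bottom = np.hstack([-g_inv @ c, g_inv])
    return np.vstack([top, bottom])

# One common variable (m = 1) and two country variables (k = 2).
c = np.array([[0.3], [-0.2]])
g = np.array([[1.0, -0.1], [-0.2, 1.0]])
g_y0 = np.block([[np.eye(1), np.zeros((1, 2))], [c, g]])
g_y0_inv = inverse_block_lower(c, g)
```

The zero upper-right block of the inverse is what makes the common variables contemporaneously unaffected by the country-specific shocks.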
For each $i$, $\Gamma_i$ is a $k_i \times m$ matrix of factor loadings, assumed to be uniformly bounded ($\|\Gamma_i\| <
K < \infty$), and $\xi_{it}$ is a $k_i \times 1$ vector of country-specific effects. Factors and the country effects are
assumed to satisfy
where $\Lambda_f(L) = \sum_{\ell=0}^{\infty}\Lambda_{f\ell}L^\ell$, $\Psi_i(L) = \sum_{\ell=0}^{\infty}\Psi_{i\ell}L^\ell$, and the coefficient matrices $\Lambda_{f\ell}$ and
$\Psi_{i\ell}$, for $i = 1, 2, \ldots, N$, are uniformly absolute summable, which ensures the existence of
$\mathrm{Var}(\Delta f_t)$ and $\mathrm{Var}(\Delta\xi_{it})$. In addition, $[\Psi_i(L)]^{-1}$ is assumed to exist.
Under these assumptions, after first differencing (33.19), using (33.21), and applying the approximation
$$(1 - L)\,[\Psi_i(L)]^{-1} \approx \sum_{\ell=0}^{p_i}\Phi_{i\ell}L^\ell = \Phi_i(L, p_i),$$
DdPS obtain the following approximate VAR($p_i$) model with factors
$$\Phi_i(L, p_i)\,x_{it} \approx \Phi_i(L, p_i)\,\Gamma_i f_t + u_{it}, \tag{33.22}$$
for $i = 1, 2, \ldots, N$, which is a special case of (33.1). Model (33.22) is more restrictive than
(33.1) because lags of other units do not feature in (33.22), and the errors, $u_{it}$, are assumed to
be cross-sectionally independently distributed.
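$$\|\tilde{W}_i\| < K N^{-1/2}, \quad \text{for all } i, \tag{33.23}$$
$$\frac{\|\tilde{W}_{ij}\|}{\|\tilde{W}_i\|} < K N^{-1/2}, \quad \text{for all } i, j, \tag{33.24}$$
where $\tilde{W}_{ij}$ are the blocks in the partitioned form of $\tilde{W}_i = (\tilde{W}_{i1}', \tilde{W}_{i2}', \ldots, \tilde{W}_{iN}')'$, and the
constant $K < \infty$ does not depend on $i$, $j$ or $N$. Taking cross-section averages of $x_{it}$ given by (33.19)
yields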
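$$\Delta\xi_{it}^{*} = \sum_{j=1}^{N}\tilde{W}_{ij}'\,\Delta\xi_{jt} = \sum_{j=1}^{N}\tilde{W}_{ij}'\,\Psi_j(L)\,u_{jt}.$$
$$f_t \overset{q.m.}{\to} \left(\Gamma_i^{*\prime}\Gamma_i^{*}\right)^{-1}\Gamma_i^{*\prime}\left(x_{it}^{*} - \xi_i^{*}\right)$$
as $N \to \infty$, which justifies using $(1, x_{it}^{*\prime})'$ as proxies for the unobserved common factors. Thus,
for $N$ sufficiently large, DdPS obtain the following country-specific VAR models augmented
with $x_{it}^{*}$,
$$\Phi_i(L, p_i)\left(x_{it} - \tilde{\delta}_i - \tilde{\Gamma}_i x_{it}^{*}\right) \approx u_{it}, \tag{33.25}$$
where $\tilde{\delta}_i$ and $\tilde{\Gamma}_i$ are given in terms of $\xi_i^{*}$ and $\Gamma_i^{*}$. (33.25) motivates the use of VARX* conditional
country models in (33.3) as an approximation to a global factor model.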
Note that the weights $\{\tilde{W}_i\}_{i=1}^{N}$ used in the construction of cross-sectional averages only need
to satisfy the granularity conditions (33.23) and (33.24), and for large $N$ asymptotics one might
as well use equal weights, namely replace all cross-sectional averages by simple averages. For the
theory to work, it is only needed that $\xi_{it}^{*} \overset{q.m.}{\to} 0$ at a sufficiently fast rate as $N \to \infty$. For
example, the weights could also be time varying without any major consequences so long as the
granularity conditions are met in each period. In practice, where the number of countries (N)
is moderate and spillover effects could also be of importance, it is advisable to use trade weights
that also capture cultural and political interlinkages across countries.10 Trade weights can also be
used to allow for time variations in the weights used when constructing the star variables. This
is particularly important in cases where there are important shifts in the trade weights, as has
occurred in the case of China and its trading partners. Allowing for such time variations is also
important in analyzing the way shocks transmit across the world economy. We review some of
the empirical applications of the GVAR that employ time varying weights below.
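A quick numerical check can indicate whether a proposed weighting scheme behaves granularly as $N$ grows. The sketch below, written for the scalar case of one variable per country, reports the two ratios $\sqrt{N}\,\|w\|$ and $\sqrt{N}\,\max_j |w_j| / \|w\|$, which must stay below a constant that does not depend on $N$ for conditions of the form (33.23) and (33.24) to hold. The function name and the two weighting schemes are illustrative assumptions.

```python
import numpy as np

def granularity_ratios(w):
    """Return (sqrt(N) * ||w||, sqrt(N) * max|w_j| / ||w||) for a weight vector w."""
    w = np.asarray(w, dtype=float)
    n = w.size
    norm = np.linalg.norm(w)
    return np.sqrt(n) * norm, np.sqrt(n) * np.max(np.abs(w)) / norm

def equal_weights(n):
    return np.full(n, 1.0 / n)

def dominant_weights(n):
    # Hypothetical scheme: one unit keeps half of the total weight for every N.
    return np.concatenate([[0.5], np.full(n - 1, 0.5 / (n - 1))])

ratios_equal = [granularity_ratios(equal_weights(n)) for n in (100, 400)]
ratios_dominant = [granularity_ratios(dominant_weights(n)) for n in (100, 400)]
```

With equal weights both ratios equal one for every $N$, whereas with a unit that retains a fixed share of the weight they grow at the rate $\sqrt{N}$, so no bound $K$ independent of $N$ exists.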
The analysis of DdPS has been further extended by Chudik and Pesaran (2011) and Chudik
and Pesaran (2013) to allow for joint asymptotics (i.e., as N and T → ∞, jointly), and weak
cross-sectional dependence in the errors in the case of stationary variables.
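$$x_t = \Phi x_{t-1} + u_t.$$
Let
$$\underset{N \times N}{\Phi} = \begin{pmatrix} \alpha & 0 & 0 & \cdots & 0 \\ \beta & \alpha & 0 & \cdots & 0 \\ 0 & \beta & \alpha & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \beta & \alpha \end{pmatrix},$$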
This model is stationary for any given N ∈ N, if and only if |α| < 1. Nevertheless, the stationarity
condition |α| < 1 is not sufficient to ensure that the variance of xNt is bounded in N, and without
additional conditions Var (xNt ) can rise with N. To see this, note that
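$$\begin{aligned}
\mathrm{Var}(x_{1t}) &= \frac{1}{1 - \alpha^2}, \\
\mathrm{Var}(x_{2t}) &= \frac{1}{1 - \alpha^2}\,(\lambda + 1), \\
&\;\;\vdots \\
\mathrm{Var}(x_{Nt}) &= \frac{1}{1 - \alpha^2}\left(\lambda^{N-1} + \lambda^{N-2} + \ldots + \lambda + 1\right).
\end{aligned}$$
The necessary and sufficient condition for $\mathrm{Var}(x_{Nt})$ to be bounded in $N$ is given by $\alpha^2 + \beta^2 < 1$.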
Therefore, the condition |α| < 1 is not sufficient if N → ∞. The condition < 1 −
implies α 2 + β 2 < 1, and is therefore sufficient (and in this example it is also necessary) for
Var(xNt ) to be bounded in N.
Similarly, as in DdPS, it is assumed in (33.26) that factors are included in the VAR model in
an additive way so that xt can be written as
$$x_t = \Gamma f_t + \xi_t, \tag{33.27}$$
where $\xi_t = (I_k - \Phi L)^{-1} u_t$, and the existence of the inverse of $(I_k - \Phi L)$ is ensured by the
assumption on $\Phi$ above. One can also consider the alternative factor augmentation setup,
where factors are added to the errors of the VAR model, instead of (33.26), where deviations
of xt from the factors are modelled as a VAR. But it is important to note that both specifica-
tions, (33.26) and (33.28), yield similar asymptotic results. The main difference between the
two formulations lies in the fact that the factor error structure in (33.28) results in infinite-order
distributed lag polynomials (as large N representation for cross-section averages and individ-
ual units), whilst the specification (33.26) yields finite-order lag representations. In the case of
(33.28), the infinite lag-order polynomials must be appropriately truncated for the purposes of
consistent estimation and inference, as in Berk (1974), Said and Dickey (1984) and Chudik and
Pesaran (2013, 2015a).
For any set of weights represented by the k × k∗ matrix W̃i we obtain (using (33.27))
where $\|\tilde{W}_i\|^2 = O(N^{-1})$ by (33.23), $\|\Sigma_u\| < K$ by the weak cross-sectional dependence
assumption, and $\sum_{\ell=0}^{\infty}\|\Phi^\ell\|^2 < K$ by the assumption on the spectral radius of $\Phi$. (33.29)
establishes that $\xi_{it}^{*} \overset{q.m.}{\to} 0$ (uniformly in $i$ and $t$) as $N, T \to_j \infty$. It now follows that
$$x_{it}^{*} - \Gamma_i^{*} f_t \overset{q.m.}{\to} 0, \quad \text{as } N, T \to_j \infty, \tag{33.30}$$
which confirms the well-known result that only strong cross-sectional dependence can survive
large $N$ aggregation with granular weights (see Section 32.5). Therefore, the unobserved
common factors can be approximated by cross-section averages $x_{it}^{*}$ in this dynamic setting, provided
that $\Gamma_i^{*}$ has full column rank.
It is now easy to see what additional requirements are needed on the coefficient matrix $\Phi$ to
obtain country VARX* models in (33.3) when $N$ is large. The model for the country specific
variables, $x_{it}$, from the system (33.26) is given by
$$x_{it} = \Phi_{ii}\,x_{i,t-1} + \sum_{j=1,\,j \neq i}^{N}\Phi_{ij}\left(x_{j,t-1} - \Gamma_j f_{t-1}\right) + \Gamma_i f_t - \Phi_{ii}\Gamma_i f_{t-1} + u_{it}, \tag{33.31}$$
Finally, substituting (33.30) and (33.33) in (33.31) we obtain the country-specific VARX*(1,1)
model
$$x_{it} - \Phi_{ii}\,x_{i,t-1} - \Lambda_{i0}\,x_{it}^{*} - \Lambda_{i1}\,x_{i,t-1}^{*} - u_{it} \overset{q.m.}{\to} 0 \quad \text{uniformly in } i, \text{ as } N \to \infty, \tag{33.34}$$
where
$$\Lambda_{i0} = \Gamma_i\left(\Gamma^{*\prime}\Gamma^{*}\right)^{-1}\Gamma^{*\prime}, \quad\text{and}\quad \Lambda_{i1} = -\Phi_{ii}\Gamma_i\left(\Gamma^{*\prime}\Gamma^{*}\right)^{-1}\Gamma^{*\prime}.$$
Requirement (33.32), together with the remaining assumptions in this sub-section, is thus
sufficient to obtain (33.3) when $N$ is large. In addition to the derivations of large $N$ representations
of the individual country models, CP also show that the coefficient matrices $\Phi_{ii}$, $\Lambda_{i0}$ and $\Lambda_{i1}$
can be consistently estimated under the joint asymptotics when $N$ and $T \to \infty$, jointly, plus a
number of further assumptions as set out in CP.
It is also important to consider the consequences of relaxing the restrictions in (33.32). One
interesting case is when units have ‘neighbours’ in the sense that there exist some country pairs
$j \neq i$ for which $\|\Phi_{ij}\|$ remains non-negligible as $N \to \infty$. Another interesting departure from
the above assumptions is when $\|\Sigma_u\|$ is not bounded in $N$, and there exists a dominant unit $j$
for which $\|\Phi_{ij}\|$ is non-negligible for the other units, $i \in S_j \subseteq \{1, 2, \ldots, N\}$. These scenarios
are investigated in Chudik and Pesaran (2011, 2013), and they lead to different specifications of
the country-specific models featuring additional variables. To improve estimation and inference
in such cases one can combine the GVAR approach with various penalized shrinkage methods
such as Bayesian shrinkage (Ridge), Lasso or other related techniques where the estimation is
subject to penalty, which becomes increasingly more binding as the number of parameters is
increased.11
11 LASSO and Ridge regressions are discussed in Sections 11.9 and C.7 in Appendix C. Feldkircher et al. (2014) imple-
ment a number of Bayesian priors (the normal-conjugate prior, a non-informative prior on the coefficients and the variance,
the inverse Wishart prior, the Minnesota prior, the single-unit prior, which accommodates potential cointegration relation-
ships, and the stochastic search variable selection prior) in estimating country-specific models in the GVAR.
Therefore, by construction, we have $E(v_t v_t') = I_k$, and the $k \times 1$ vector of structural impulse
response functions is given by
$$g_{vj}(h) = E\left(x_{t+h} \mid v_{jt} = 1, \mathcal{I}_{t-1}\right) - E\left(x_{t+h} \mid \mathcal{I}_{t-1}\right) = \frac{R_h G_0^{-1} P e_j}{e_j' e_j}, \tag{33.36}$$
for $j = 1, 2, \ldots, k$, where $\mathcal{I}_t = \{x_t, x_{t-1}, \ldots\}$ is the information set consisting of all available
information at time $t$, and $e_j$ is a $k \times 1$ selection vector that selects the variable $j$, and the $k \times k$
matrices, $R_h$, are obtained recursively as (see (33.18))
$$R_h = \sum_{\ell=1}^{p} F_\ell R_{h-\ell} \quad \text{with } R_0 = I_k \text{ and } R_\ell = 0 \text{ for } \ell < 0.$$
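The recursion for the matrices $R_h$ can be sketched in a few lines. The function name `ma_matrices` and the coefficient matrices below are hypothetical; the only inputs are the reduced-form coefficient matrices $F_1, \ldots, F_p$ of the solved model and the maximum horizon.

```python
import numpy as np

def ma_matrices(f_list, horizon):
    """Return [R_0, R_1, ..., R_horizon] from R_h = sum_l F_l R_{h-l},
    with R_0 = I and R_l = 0 for l < 0."""
    k = f_list[0].shape[0]
    r = [np.eye(k)]
    for h in range(1, horizon + 1):
        # Terms with h - l < 0 are zero and are simply skipped.
        r_h = sum(f_list[l - 1] @ r[h - l]
                  for l in range(1, len(f_list) + 1) if h - l >= 0)
        r.append(r_h)
    return r

f1 = np.array([[0.5, 0.1], [0.0, 0.4]])
f2 = np.array([[0.1, 0.0], [0.0, 0.1]])
r = ma_matrices([f1, f2], horizon=3)
```

With a single lag the recursion collapses to $R_h = F_1^h$, which provides a simple check on any implementation.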
Expectation operators in (33.36) are taken assuming that the GVAR model (33.8) is the DGP.
Decomposition (33.35) is not unique and identification of shocks requires $k(k-1)/2$
restrictions, which is of order $O(k^2)$.12 Even for moderate values of $k$, motivating such a large number
of restrictions is problematic, especially given that the existing macroeconomic literature focuses
mostly on distinguishing between different types of shocks (e.g., monetary policy shocks, fiscal
shocks, technology shocks, etc.), and does not provide a thorough guidance on how to identify
country origins of shocks, which is necessary to identify all the shocks in the GVAR model.
One possible approach to the identification of the shocks is the orthogonalized IR analysis of Sims
(1980), who considers setting $P$ to the Choleski factor of $\Sigma$ (see Section 24.4). But, as is well
known, the choice of the Choleski factor is not unique and depends on the ordering of variables
in the vector xt . Such an ordering is clearly difficult to entertain in the global setting, but partial
ordering could be considered to identify a single shock or a subset of shocks. This is, for example,
accomplished by Dées et al. (2007) who identify the US monetary policy shock (by assuming
that the US variables come first, and two different orderings for the vector of the US variables are
considered). Another well-known possibility to identify shocks in reduced-form VARs includes
the work of Bernanke (1986), Blanchard and Watson (1986), and Sims (1986) who consider
a priori restrictions on the contemporaneous covariance matrix of shocks; Blanchard and Quah
(1989) and Clarida and Gali (1994) who consider restrictions on the long-run impact of shocks
to identify the impulse responses; and the sign-restriction approach considered, among others,
in Faust (1998), Canova and Pina (1999), Canova and de Nicolò (2002), Uhlig (2005), Mount-
ford and Uhlig (2009), and Inoue and Kilian (2013). Identification of shocks in a GVAR is sub-
ject to the same issues as in standard VARs (see Chapter 24), but is further complicated due
to the cross-country interactions and the high dimensionality of the model. Dées et al. (2014)
provide a detailed discussion of the identification and estimation of the GVAR model subject to
theoretical constraints.
In view of these difficulties, Pesaran et al. (2004), Pesaran and Smith (2006), Dées et al.
(2007) and the subsequent literature mainly adopt the generalized IRF (GIRF) approach,
advanced in Koop et al. (1996), Pesaran and Shin (1998) and Pesaran and Smith (1998) (see also
Section 24.5). The GIRF approach does not aim at identification of shocks according to some
canonical system or a priori economic theory, but considers a counterfactual exercise where the
historical correlations of shocks are assumed as given. In the context of the GVAR model (33.8)
the k × 1 vector of GIRFs is given by
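$$g_{\varepsilon j}(h) = E\left(x_{t+h} \mid \varepsilon_{jt} = \sqrt{\sigma_{jj}}, \mathcal{I}_{t-1}\right) - E\left(x_{t+h} \mid \mathcal{I}_{t-1}\right) = \frac{R_h G_0^{-1}\Sigma e_j}{\sqrt{e_j'\Sigma e_j}}, \tag{33.37}$$
for $j = 1, 2, \ldots, k$, $h = 0, 1, 2, \ldots$, where $\sigma_{jj} = E(\varepsilon_{jt}^2)$ and the size of the shock is set
to one standard deviation (s.d.) of $\varepsilon_{jt}$.13 The GIRFs can also be obtained for (synthetic) ‘global’
or ‘regional’ shocks, defined by $\varepsilon_{m,t}^{g} = m'\varepsilon_t$, where the vector of weights, $m$, relates to a global
aggregate or a particular region. The vector of GIRF for the global shock, $\varepsilon_{m,t}^{g}$, is
$$g_m(h) = E\left(x_{t+h} \mid \varepsilon_{m,t}^{g} = \sqrt{m'\Sigma m}, \mathcal{I}_{t-1}\right) - E\left(x_{t+h} \mid \mathcal{I}_{t-1}\right) = \frac{R_h G_0^{-1}\Sigma m}{\sqrt{m'\Sigma m}}. \tag{33.38}$$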
13 Estimation and inference on impulse responses can be conducted by bootstrapping, see Dées et al. (2007) for details.
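A generalized impulse response of the form (33.37) can be computed directly once the matrices $R_h$, the contemporaneous matrix $G_0$ and the error covariance matrix $\Sigma$ are available. In the sketch below the function name `girf` and all numerical inputs are hypothetical; $G_0$ is set to the identity and $R_h = 0.5^h I_2$ purely to keep the arithmetic transparent.

```python
import numpy as np

def girf(r_list, g0, sigma, j):
    """Responses of all variables, at each horizon in r_list, to a one
    standard deviation shock to error j: R_h G0^{-1} Sigma e_j / sqrt(sigma_jj)."""
    impact = np.linalg.solve(g0, sigma[:, j]) / np.sqrt(sigma[j, j])
    return np.array([r_h @ impact for r_h in r_list])

sigma = np.array([[1.0, 0.5],
                  [0.5, 4.0]])
g0 = np.eye(2)
r_list = [np.eye(2) * 0.5 ** h for h in range(3)]   # R_0, R_1, R_2

responses = girf(r_list, g0, sigma, j=1)
```

On impact the shocked variable moves by its own standard deviation (here 2), while the other variable moves by $\sigma_{12}/\sqrt{\sigma_{22}} = 0.25$, reflecting the historical correlation of the errors rather than any structural ordering.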
and since the shocks are orthogonal, it follows that $\sum_{j=1}^{N}\mathrm{SFEVD}\left(x_{it}, v_{jt}, h\right) = 1$ for any $i$ and
$h$. In the case of non-orthogonal shocks, the forecast-error variance decompositions need not
sum to unity. Analogously to the GIRFs, generalized forecast error variance decomposition of
generalized shocks can be obtained as
$$\mathrm{GFEVD}\left(x_{it}, \varepsilon_{jt}, h\right) = \frac{\sigma_{jj}^{-1}\sum_{\ell=0}^{h}\left(e_i' R_\ell G_0^{-1}\Sigma e_j\right)^2}{\sum_{\ell=0}^{h} e_i' R_\ell G_0^{-1}\Sigma G_0^{-1\prime} R_\ell' e_i}.$$
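The following sketch evaluates the generalized forecast error variance decomposition for a given pair of variable and shock, and illustrates why the shares need not add up to one when the errors are correlated. The function name `gfevd` and the inputs are hypothetical, and the horizon is implied by the length of the list of $R_\ell$ matrices supplied.

```python
import numpy as np

def gfevd(r_list, g0, sigma, i, j):
    """Share of the forecast error variance of variable i attributed to
    generalized shock j, accumulated over the horizons in r_list."""
    g0_inv = np.linalg.inv(g0)
    numerator = 0.0
    denominator = 0.0
    for r_l in r_list:
        row = r_l[i] @ g0_inv                 # e_i' R_l G0^{-1}
        numerator += (row @ sigma[:, j]) ** 2
        denominator += row @ sigma @ row
    return numerator / (sigma[j, j] * denominator)

sigma = np.array([[1.0, 0.5],
                  [0.5, 4.0]])
g0 = np.eye(2)
r_list = [np.eye(2)]                           # horizon h = 0 only

share_own = gfevd(r_list, g0, sigma, i=0, j=0)
share_other = gfevd(r_list, g0, sigma, i=0, j=1)
total = share_own + share_other
```

In this example the two shares are 1 and 0.0625, so their sum exceeds unity; with a diagonal $\Sigma$ the second share would be zero and the decomposition would add up exactly.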
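$$E\left(x_{t_0+h} \mid \Omega_{t_0}\right) = \sum_{\ell=1}^{p} F_\ell\,E\left(x_{t_0+h-\ell} \mid \Omega_{t_0}\right) + G_0^{-1} E\left(\varepsilon_{t_0+h} \mid \Omega_{t_0}\right), \tag{33.39}$$
and standard forecasts $E\left(x_{t_0+h} \mid \mathcal{I}_{x t_0}\right)$ can be easily computed from (33.8) recursively using the
estimates of $F_\ell$ and $G_0^{-1}$, and noting that (33.40) holds and $E\left(x_t \mid \mathcal{I}_{x t_0}\right) = x_t$ for all $t \leq t_0$.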
Forecasts from model (33.18) featuring observed common variables can be obtained in a
similar way.
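The recursive computation of standard forecasts can be sketched as follows, under the assumption that future shocks have zero conditional mean so that the last term of the recursion drops out. The function name `forecast` and the numerical inputs are hypothetical; the history must contain at least as many observations as there are lags.

```python
import numpy as np

def forecast(f_list, history, horizon):
    """Iterate x_{t+h} = sum_l F_l x_{t+h-l} forward from the observed history.

    f_list  : [F_1, ..., F_p], reduced-form coefficient matrices.
    history : (T, k) array, ordered from oldest to most recent observation.
    """
    path = [np.asarray(row, dtype=float) for row in history]
    for _ in range(horizon):
        # Observed values are used where available, forecasts otherwise.
        nxt = sum(f_l @ path[-l] for l, f_l in enumerate(f_list, start=1))
        path.append(nxt)
    return np.array(path[len(history):])

f1 = 0.5 * np.eye(2)
history = np.array([[2.0, 4.0]])
point_forecasts = forecast([f1], history, horizon=2)
```

With one lag and $F_1 = 0.5\,I_2$ the forecasts simply halve at each step, which makes the output easy to verify by hand.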
Generating conditional forecasts for non-standard conditioning information sets with mixed
information on (future, present, and past values of) variables in the panel is more challenging.
This situation could arise, for instance, in the case where data for different variables are released at
14 See also Forni and Lippi (2001), Forni et al. (2000, 2004), Stock and Watson (1999, 2002, 2005), Giannone, Reichlin,
and Sala (2005), and Bai and Ng (2007).
different dates, or when mixed information sets are intentionally considered to answer specific
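questions as in Bussière et al. (2012). Without loss of generality, and for expositional
convenience, suppose, for some date $t$, that the first $k_a$ variables in the vector $x_t$ belong to $\Omega_{t_0}$ and the
remaining $k_b = k - k_a$ variables do not, and partition $\varepsilon_t$ as $\varepsilon_t = (\varepsilon_{at}', \varepsilon_{bt}')'$, and the associated
covariance matrix, $\Sigma = E(\varepsilon_t \varepsilon_t')$, as
$$\Sigma = \begin{pmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{pmatrix}. \tag{33.41}$$
It then follows that $E\left(\varepsilon_{at} \mid \Omega_{t_0}\right) = \varepsilon_{at}$, whereas $E\left(\varepsilon_{bt} \mid \Omega_{t_0}\right) = \Sigma_{ba}\Sigma_{aa}^{-1}\varepsilon_{at}$. Let $\hat{\Sigma}$ be an
estimate of $\Sigma$, then an estimate of $E\left(\varepsilon_t \mid \Omega_{t_0}\right)$ can be computed as
$$\hat{E}\left(\varepsilon_t \mid \Omega_{t_0}\right) = \begin{pmatrix} \hat{\varepsilon}_{at} \\ \hat{\Sigma}_{ba}\hat{\Sigma}_{aa}^{-1}\hat{\varepsilon}_{at} \end{pmatrix},$$
for any given $t \leq t_0 + h$. The conditional forecasts $E\left(x_{t_0+h} \mid \Omega_{t_0}\right)$ can then be computed
recursively as in (33.39). One problem is that $\Sigma$ and its four sub-matrices in (33.41) can have large
dimensions relative to the available number of time series observations, and therefore it is not
guaranteed that $\hat{\Sigma}_{aa}$ will be invertible. Even if it were, the inverse of the traditional estimate of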
variance-covariance matrices does not necessarily have good small sample properties when the
number of variables is large. For these reasons, it is desirable to make use of other covariance
matrix estimators with better small sample properties. There are several estimators proposed
in the literature for estimation of high-dimensional covariance matrices, including Ledoit and
Wolf (2004), Bickel and Levina (2008), Fan et al. (2008), Friedman et al. (2008), the shrink-
age estimator considered in Dées et al. (2014), and the multiple testing approach by Bailey et al.
(2015).
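The projection of the unobserved block of shocks on the observed block can be sketched as below. The fixed-weight shrinkage towards the diagonal is a deliberately simple placeholder chosen for illustration; it is an assumption of this sketch and is not any of the estimators cited above. The function names and numbers are likewise hypothetical.

```python
import numpy as np

def conditional_shock(sigma, eps_a):
    """Stack eps_a with E(eps_b | eps_a) = Sigma_ba Sigma_aa^{-1} eps_a,
    where the first len(eps_a) errors form the observed block."""
    k_a = len(eps_a)
    sigma_aa = sigma[:k_a, :k_a]
    sigma_ba = sigma[k_a:, :k_a]
    eps_b = sigma_ba @ np.linalg.solve(sigma_aa, eps_a)
    return np.concatenate([eps_a, eps_b])

def shrink_to_diagonal(sigma, weight):
    """Illustrative shrinkage: convex combination of sigma and its diagonal."""
    return (1.0 - weight) * sigma + weight * np.diag(np.diag(sigma))

sigma_hat = np.array([[2.0, 1.0],
                      [1.0, 3.0]])
eps_a = np.array([1.0])

plain = conditional_shock(sigma_hat, eps_a)
shrunk = conditional_shock(shrink_to_diagonal(sigma_hat, 0.5), eps_a)
```

Shrinking the off-diagonal elements pulls the implied value of the unobserved shock towards zero, which is the price paid for a better conditioned $\hat{\Sigma}_{aa}$ when the cross-section dimension is large.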
The implicit assumption in construction of the GVAR model (33.8) is invertibility of G0 ,
which ensures that the model is complete as discussed in Section 33.3.1. If G0 is not invertible,
then the system of country-specific equations is incomplete and it needs to be augmented with
additional equations. This possibility is considered in Chudik, Grossman, and Pesaran (2014)
who consider forecasting with GVARs in the case when N, T →j ∞, and the DGP is given by
a factor-augmented infinite-dimensional VAR model considered by CP and outlined above in
Section 33.4.2. For simplicity of exposition, consider a large dimensional VAR with one variable
per country (ki = 1) and one unobserved common factor (m = 1) generated as
in which $|\rho| < 1$ and the macro shock, $\eta_{ft}$, is serially uncorrelated and distributed with zero
mean and variance $\sigma_\eta^2$. Let the factor loadings be denoted by $\gamma = (\gamma_1, \gamma_2, \ldots, \gamma_N)'$, and
consider the granular weights vector $w = (w_1, w_2, \ldots, w_N)'$ that defines the cross-section averages
$x_{it}^{*} = x_t^{*} = w' x_t$ (assumed to be identical across countries). In this simple setting the GVAR
model can be written as (see (33.34))
$$x_{it} = \phi_{ii}\,x_{i,t-1} + \lambda_{i0}\,x_t^{*} + \lambda_{i1}\,x_{t-1}^{*} + u_{it} + O_p\left(N^{-1/2}\right), \quad \text{for } i \in \{1, 2, \ldots, N\}, \tag{33.43}$$
$$x_t = \hat{F} x_{t-1} + \hat{G}_0^{-1} \hat{\varepsilon}_t, \tag{33.44}$$

where $\hat{F} = \hat{G}_0^{-1} \hat{G}_1$, $\hat{G}_1 = \hat{\Phi} + \hat{\Lambda}_1 \tilde{W}$, $\hat{\Lambda}_1 = \mathrm{diag}(\hat{\lambda}_{11}, \hat{\lambda}_{21}, \ldots, \hat{\lambda}_{N1})$, and $\hat{\Phi} = \mathrm{diag}(\hat{\phi}_{11}, \hat{\phi}_{22}, \ldots, \hat{\phi}_{NN})$. However, in this setup it is not optimal to use (33.44) for forecasting for the following
two reasons. First, $G_0 = I_N - \Lambda_0 \tilde{W}$ is by construction rank deficient; to see this note that

$$w' G_0 = w' \left(I_N - \Lambda_0 \tilde{W}\right) = w' - w' \Lambda_0 \tau w',$$

and recalling that $\sum_{i=1}^{N} w_i \gamma_i = \gamma^*$, we have

$$w' G_0 = w' - \left(\frac{\sum_{i=1}^{N} w_i \gamma_i}{\gamma^*}\right) w' = w' - w' = 0',$$

which establishes that $G_0$ has a zero eigenvalue. Since $G_0$ is singular, the system of equations
(33.43) is not complete and it is unclear what the properties of $\hat{G}_0^{-1}$ are, given that the individual
elements of $\hat{G}_0$ are consistent estimates of the elements of $G_0$. Second, the parameters
in the conditional models, $\{\phi_{ii}, \lambda_{i0}, \lambda_{i1}\}_{i=1}^{N}$, do not contain information about the persistence of
the unobserved common factor, $\rho$, due to the conditional nature of these models.
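A small numerical check can make the rank deficiency concrete. The sketch below builds $G_0 = I_N - \Lambda_0 \tau w'$ with $\lambda_{i0} = \gamma_i / \gamma^*$, as used in the derivation above, and confirms that $w' G_0 = 0'$. The values of `N`, `gamma`, and `w` are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6
w = np.full(N, 1.0 / N)              # equal granular weights (assumption)
gamma = rng.uniform(0.5, 1.5, N)     # hypothetical factor loadings
gamma_star = w @ gamma               # gamma* = sum_i w_i gamma_i
lam0 = gamma / gamma_star            # lambda_i0 = gamma_i / gamma*
tau = np.ones(N)

# G0 = I_N - Lambda_0 * tau * w'
G0 = np.eye(N) - np.diag(lam0) @ np.outer(tau, w)

left_product = w @ G0                # should be a row of zeros
rank_G0 = np.linalg.matrix_rank(G0)  # should be N - 1
print(left_product, rank_G0)
```

Since $G_0$ is the identity minus a rank-one matrix, exactly one eigenvalue is zero in this setting, so its rank is N − 1 and no inverse exists.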
Chudik, Grossman, and Pesaran (2014) consider augmenting (33.43) with a set of equations
for the cross-section averages. In the present example we consider augmenting the GVAR model,
(33.43), with the following equation

$$x^*_t = \rho x^*_{t-1} + \gamma^* \eta_{ft} + O_p\left(N^{-1/2}\right), \tag{33.45}$$

where $x^*_t$ is treated as a proxy for the (scaled) unobserved common factor. See (33.30). Combining
(33.43) and (33.45), the following augmented VAR model in $z_t = (x_t', x^*_t)'$ is obtained

$$B_0 z_t = B_1 z_{t-1} + u_{zt} + O_p\left(N^{-1/2}\right), \tag{33.46}$$

where $u_{zt} = (u_t', \gamma^* \eta_{ft})'$,

$$B_0 = \begin{pmatrix} I_N & -\lambda_0 \\ 0' & 1 \end{pmatrix}, \quad B_1 = \begin{pmatrix} \Phi & \lambda_1 \\ 0' & \rho \end{pmatrix},$$
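$$x^f_{t+h} = B^h z_t, \tag{33.47}$$

where

$$B = B_0^{-1} B_1 = \begin{pmatrix} I_N & \lambda_0 \\ 0' & 1 \end{pmatrix} \begin{pmatrix} \Phi & \lambda_1 \\ 0' & \rho \end{pmatrix} = \begin{pmatrix} \Phi & \lambda_1 + \rho \lambda_0 \\ 0' & \rho \end{pmatrix}.$$

Consider now the infeasible optimal forecasts obtained using the factor-augmented infinite-
dimensional VAR model, (33.26), with one factor given by (33.42), and conditional on the combined
information set $I_{xt} \cup I_f = \{x_t, f_t; x_{t-1}, f_{t-1}; \ldots\}$,

$$E\left(x_{t+h} \mid I_{xt} \cup I_f\right) = \Phi^h x_t + \left(\rho^h I_N - \Phi^h\right) \gamma f_t. \tag{33.48}$$

namely $B^h z_t \to E\left(x_{t+h} \mid I_{xt} \cup I_f\right)$, the infeasible optimal forecasts, as $N \to \infty$.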
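The mechanics of forecasting with the augmented model can be sketched numerically. The example below forms $B_0$ and $B_1$ for a small system, checks that $B_0^{-1} B_1$ has the block form with $\lambda_1 + \rho \lambda_0$ in its upper-right block, and iterates $B^h z_t$. All parameter values, the weights `w`, and the starting vector `x_t` are hypothetical; in practice they would be replaced by estimates.

```python
import numpy as np

N, rho, h = 4, 0.6, 3
phi = np.array([0.2, 0.3, 0.1, 0.4])      # hypothetical phi_ii
lam0 = np.array([0.9, 1.1, 1.0, 1.0])     # hypothetical lambda_i0
lam1 = np.array([-0.1, 0.05, 0.0, 0.1])   # hypothetical lambda_i1
w = np.full(N, 1.0 / N)                   # equal weights (assumption)
x_t = np.array([1.0, -0.5, 0.3, 0.8])     # hypothetical current observations

Phi = np.diag(phi)
zero_row = np.zeros((1, N))
B0 = np.block([[np.eye(N), -lam0[:, None]], [zero_row, np.ones((1, 1))]])
B1 = np.block([[Phi, lam1[:, None]], [zero_row, np.full((1, 1), rho)]])

# B = B0^{-1} B1, computed by solving the linear system rather than inverting B0
B = np.linalg.solve(B0, B1)
B_closed = np.block([[Phi, (lam1 + rho * lam0)[:, None]],
                     [zero_row, np.full((1, 1), rho)]])

z_t = np.append(x_t, w @ x_t)             # z_t = (x_t', x*_t)'
z_forecast = np.linalg.matrix_power(B, h) @ z_t
x_forecast = z_forecast[:N]               # first N elements: country variables
print(x_forecast)
```

The only matrix that has to be inverted is $B_0$, which is unit upper triangular and therefore always invertible, in contrast to $G_0$.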
Even when G0 is invertible, it is possible that augmentation of the GVAR by equations for
cross-section averages leads to forecast improvements. Note that the GVAR model (33.8) does
not feature an unobserved factor error structure. We have seen that a sufficient number of cross-
section averages in the individual country-specific conditional models in (33.3) takes care of the
effects of any strongly cross-sectionally dependent processes that enter as unobserved common
factors for the purpose of estimation of country-specific coefficients. Inclusion of a sufficient
number of cross-section averages will also lead to weak cross-section dependence of the vector
of errors $\varepsilon_t$ in the country-specific models. But since the reduced form innovations $G_0^{-1} \varepsilon_t$ must
be strongly cross-sectionally dependent when a strong factor is present in $x_t$, it follows that
$G_0^{-1}$ (if it exists) cannot have bounded spectral matrix norm in $N$. Forecasts based on the augmented
GVAR model avoid the need for inversion of high-dimensional matrices. Monte Carlo
findings reported in Chudik, Grossman, and Pesaran (2014) suggest that augmentation of the
GVAR by equations for cross-section averages does not hurt when G0 is invertible, while it can
considerably improve forecasting performance when G0 is singular.
The majority of applications of the GVAR approach in the literature are concerned with mod-
elling of the global economy. Therefore, a brief discussion of important issues in forecasting the
global economy is in order. There are two important issues in particular: the presence of struc-
tural breaks and model uncertainty. Structural breaks are quite likely, considering the diverse
set of economies and the time period spanning three or more decades, which covers a lot of
historical events (financial crises, wars, regime changes, natural disasters, etc.). The timing and
the magnitude of breaks and the underlying DGP are not exactly known, which complicates
the forecasting problem. Pesaran, Schuermann, and Smith (2009a) address both problems by
using a forecast combination method. They consider simple averaging across selected models
(AveM) and estimation windows (AveW), as well as across both dimensions, models and windows
(AveAve), and obtain evidence of superior performance for their double-average (AveAve)
forecasts. This and other forecasting evidence is reviewed in more detail in the next section.
Forecast evaluation in the GVAR model is also challenging because the multi-horizon
forecasts obtained from the GVAR model could be cross-sectionally as well as serially dependent.
One test statistic to evaluate forecasting performance of the GVAR model is proposed by
Pesaran, Schuermann, and Smith (2009a), who develop a panel version of the Diebold and
Mariano (1995) (DM) test assuming cross-sectional independence.
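The simple averaging scheme described above can be sketched as follows. The forecast values are hypothetical, and the layout (rows as models, columns as estimation windows) is an assumption made for illustration; the sketch uses equal weights throughout and does not reproduce the model selection step.

```python
import numpy as np

# Hypothetical forecasts of a single target variable:
# rows are models, columns are estimation windows.
forecasts = np.array([
    [2.0, 2.4, 2.2],
    [1.6, 2.0, 1.8],
])

ave_m = forecasts.mean(axis=0)   # AveM: average across models, one value per window
ave_w = forecasts.mean(axis=1)   # AveW: average across windows, one value per model
ave_ave = forecasts.mean()       # AveAve: average across both models and windows
print(ave_m, ave_w, ave_ave)
```

With equal weights, averaging the AveM forecasts across windows or the AveW forecasts across models gives the same AveAve value.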
Persistence profiles
The speed of convergence with which the adjustment to long-run relations takes place in the
global model can be examined by persistence profiles (PPs). PPs refer to the time profiles of
the effects of system or variable-specific shocks on the cointegrating relations, and they provide
additional valuable evidence on the validity of long-run relations. In particular, when the speed of
convergence towards a cointegrating relation turns out to be very slow, then this is an important
indication of misspecification in the cointegrating vector under consideration. See Chapter 24
and Pesaran and Shin (1996) for a discussion of PPs in cointegrated VAR models, and Dées et al.
(2007b) for implementation of PPs in the GVAR context.
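As a rough illustration of what a persistence profile measures, the sketch below computes a scaled profile for a system-wide shock in a first-order cointegrated VAR, $x_t = \Phi x_{t-1} + \varepsilon_t$ with $\Phi = I + \alpha \beta'$. The formula used, $\beta' \Phi^n \Sigma \Phi^{n\prime} \beta / \beta' \Sigma \beta$, is an assumption of this sketch based on the usual definition for this simple case and is not reproduced from the text; `alpha`, `beta`, and `Sigma` are hypothetical values.

```python
import numpy as np

alpha = np.array([[-0.2], [0.1]])            # hypothetical adjustment coefficients
beta = np.array([[1.0], [-1.0]])             # hypothetical cointegrating vector
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])   # hypothetical error covariance matrix
Phi = np.eye(2) + alpha @ beta.T             # VAR(1) in levels: x_t = Phi x_{t-1} + e_t

def persistence_profile(Phi, beta, Sigma, horizons):
    """Scaled effect of a system-wide shock on beta'x_t at horizons 0..horizons."""
    scale = (beta.T @ Sigma @ beta).item()
    profile = []
    for n in range(horizons + 1):
        Phi_n = np.linalg.matrix_power(Phi, n)
        profile.append((beta.T @ Phi_n @ Sigma @ Phi_n.T @ beta).item() / scale)
    return np.array(profile)

pp = persistence_profile(Phi, beta, Sigma, 20)
print(pp[:5])
```

The profile equals one on impact and decays towards zero at a rate governed by $1 + \beta' \alpha$; a value of $1 + \beta' \alpha$ close to one would correspond to the slow convergence discussed above.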
When the GVAR contains deterministic components, $x_t^P$ will be given by the sum of the deterministic
components and long-horizon expectations of de-trended variables. The vector of deviations
from steady states in both cases is given by

$$\tilde{x}_t = x_t - x_t^P,$$

and, in the absence of deterministic components, $x_t^P$ satisfies the martingale property,
$E_t\left(x^P_{t+1}\right) = x^P_t$. Such a property is a natural requirement of any coherent definition of steady
states, but this property is not satisfied for the commonly used Hodrick–Prescott (HP) filter
and some of the other statistical measures of steady states.
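The martingale property can be verified numerically in a simple case. The sketch below assumes a first-order cointegrated VAR without deterministic components, $x_t = \Phi x_{t-1} + \varepsilon_t$ with $\Phi = I + \alpha \beta'$, and takes the permanent component to be the long-horizon expectation $x_t^P = \lim_{h \to \infty} E_t(x_{t+h}) = C x_t$, where $C = \lim_{h \to \infty} \Phi^h$. The parameter values and `x_t` are hypothetical.

```python
import numpy as np

alpha = np.array([[-0.2], [0.1]])        # hypothetical adjustment coefficients
beta = np.array([[1.0], [-1.0]])         # hypothetical cointegrating vector
Phi = np.eye(2) + alpha @ beta.T         # eigenvalues are 1 and 1 + beta'alpha = 0.7

C = np.linalg.matrix_power(Phi, 500)     # numerical approximation to lim Phi^h
C_closed = np.eye(2) - alpha @ beta.T / (beta.T @ alpha).item()

x_t = np.array([1.5, 0.4])               # hypothetical current observation
x_perm = C @ x_t                         # permanent component x_t^P
x_dev = x_t - x_perm                     # deviation from steady state

# E_t(x^P_{t+1}) = C E_t(x_{t+1}) = C Phi x_t, which should equal x_t^P
expected_next_perm = C @ (Phi @ x_t)
print(x_perm, x_dev, expected_next_perm)
```

The property holds here because $C \Phi = C$; the permanent component also satisfies the long-run relation exactly, since $\beta' C = 0'$.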
Permanent components can be easily obtained from the estimated GVAR model using the
Beveridge-Nelson decomposition, as illustrated in detail by Dées et al. (2007) and Dées et al.
(2009). Estimates of steady states are crucial for the mainstream macroeconomic literature,
which focuses predominantly on modelling the business cycle, that is explaining the behaviour
of deviations from the steady states. The GVAR provides a coherent method for constructing
steady states that reflect global influences and long-run structural relationships within, as well as
across, countries in the global economy.
15 See Chapter 13 for an introduction to trend and cycle decompositions. A multivariate analysis is provided in
Section 22.15.
16 See the following IMF policy publications for examples of the use of GVAR approach by fund staff: 2011 and 2014
Spillover Reports; 2006 World Economic Outlook; October 2010 and April 2014 Regional Economic Outlook: Asia and
Pacific Department; April 2014 Regional Economic Outlook: Western Hemisphere Department; November 2012 Regional
Economic Outlook: Middle East and Central Asia Department; October 2008 Regional Economic Outlook: Europe; April
and October 2012 Regional Economic Outlook: Sub-Saharan Africa; and IMF country reports for Algeria, India, Italy,
Russia, Saudi Arabia, South Africa, and Spain.
The GVAR handbook edited by di Mauro and Pesaran (2013) provides an interesting col-
lection of a number of GVAR empirical applications from 27 contributors. The GVAR hand-
book is a useful non-technical resource aimed at a general audience and/or practitioners inter-
ested in the GVAR approach. This handbook provides a historical background of the GVAR
approach (Chapter 1), describes an updated version of the basic DdPS model (Chapter 2), and
then provides seven applications of the GVAR approach on international transmission of shocks
and forecasting (Chapters 3–9), three finance applications (Chapters 10–12), and five regional
applications. The applications in the handbook span various areas of the empirical literature.
Chapters on international transmission and forecasting investigate, among others, the problem
of measuring output gaps across countries, structural modelling, the role of financial markets in
the transmission of international business cycles, international inflation interlinkages, and fore-
casting the global economy. Finance applications include a macroprudential application of the
GVAR approach, a model of sovereign bond spreads, and an analysis of cross-country spillover
effects of fiscal spending on financial variables. Regional applications investigate the increasing
importance of the Chinese economy, forecasting of the Swiss economy, imbalances in the euro
area, regional and financial spillovers across Europe, and modelling interlinkages in the West
African Economic and Monetary Union. We refer the reader to this handbook for further details
on these interesting applications. In what follows we provide an overview of a number of more
recent applications, starting with forecasting.
Schanne (2011) estimates a GVAR model applied to German regional labour market data,
and uses the GVAR to forecast different labour market indicators. The author finds that includ-
ing information about labour market policies and vacancies, and accounting for lagged and
contemporaneous spatial dependence, can improve the forecasts relative to a simple bivariate
benchmark model. On the other hand, business cycle indicators seem to help little with labour
market predictions.
Forecasting using a mixed conditional information set is considered in Bussière, Chudik, and
Sestieri (2012), who develop a GVAR model to analyse global trade imbalances. In particular,
they compare the growth rates of exports and imports of 21 countries during the Great Trade
Collapse of 2008–09 with the model’s prediction, conditioning on the observed values of real
output and real exchange rates. The objective of this exercise is to assess whether the collapse in
world trade that took place during 2008–09 can be rationalized by standard macro explanatory
variables (such as domestic and foreign output variables and real exchange rates) alone, or if other
factors may have played a role. Standard macro explanatory variables alone are found to be quite
successful in explaining the collapse of global trade for most of the economies in the sample. This
exercise also reveals that it is easier to reconcile the Great Trade Collapse of 2008–09 in the case
of advanced economies as opposed to emerging economies.
Forecasting of trade imbalances is also considered in Greenwood-Nimmo, Nguyen, and Shin
(2012b). These authors compute both central forecasts and scenario-based probabilistic fore-
casts for a range of events and account for structural instability by the use of country-specific
intercept shifts. They find that the predictive accuracy of the GVAR model is broadly compara-
ble to that of standard benchmark models over short horizons and superior over longer horizons.
Similarly to Bussière, Chudik, and Sestieri (2012), they conclude GVAR models may be useful
forecasting tools for policy analysis.
Forecasting of global output growth with GVARs is considered in a number of papers. Chudik,
Grossman, and Pesaran (2014) focus on the information content of purchasing manager indices
(PMIs) for nowcasting and for forecasting of real output growth. Feldkircher et al. (2014)
present Bayesian estimates of the GVAR, and report improved forecasts when the GVAR model is
based on country models estimated with shrinkage estimators. Garratt, Lee, and Shields (2014)
model real output growth for G7 economies using survey output expectations, and find that both
cross-country interdependencies and survey data are important for density forecasts of real out-
put growth of G7 economies. Forecasting with a regime-switching GVAR model is considered in
Binder and Gross (2013) who find that combining the regime-switching and the GVAR method-
ology significantly improves out-of-sample forecast accuracy in the case of real GDP, inflation,
and stock prices.
is used, for example, to compute the effects of a hypothetical negative equity price shock in
Southeast Asia on the loss distribution of a typical credit portfolio of a private bank with global
exposures over one or more quarters ahead. The authors find that the effects of such shocks on
losses are asymmetric and non-proportional, reflecting the highly nonlinear nature of the credit
risk model. de Wet, van Eyden, and Gupta (2009) develop a South African-specific component
of the GVAR model for the purpose of credit portfolio management in South Africa. Castrén,
Dées, and Zaher (2010) use a GVAR model to analyse the behaviour of euro area corporate
sector probabilities of default under a wide range of shocks. They link the core GVAR model
with a satellite equation for firm-level expected default frequencies (EDFs) and find that, at the
aggregate level, the median EDFs react most to shocks to GDP, exchange rate, oil and equity
prices.
A number of other empirical GVAR papers focus on modelling various types of risk (sovereign,
non-financial corporate or banking sector risks). Favero (2013) uses the GVAR approach to
model sovereign risk, particularly time varying interdependence among ten-year sovereign bond
spreads of the euro area member states. Gray et al. (2013) analyse interactions between banking
sector risk, sovereign risk, corporate sector risk, real economic activity, and credit growth for
15 European countries and the United States. The goal is to analyse the impact and spillover
effects of shocks and to help identify policies that could mitigate banking system failures and
sovereign credit risk. Alessandri et al. (2009) develop a quantitative framework which evaluates
systemic risk due to banks’ balance sheets which also allows for macro credit risk, interest
income risk, market risk, and asset side feedback effects. These authors show that a combina-
tion of extreme credit and trading losses can precipitate widespread defaults and trigger conta-
gious default associated with network effects and fire sales of distressed assets. Chen et al. (2010)
investigate how bank and corporate default risks are transmitted internationally. They find strong
macro-financial linkages within domestic economies as well as globally, and report significant
global spillover effects when the shock originates from an important economy.
Dreger and Wolters (2011) investigate the implications of an increase in liquidity in the years
preceding the global financial crisis on the formation of price bubbles in asset markets. They find
that the link between liquidity and asset prices seems fragile and far from being obvious. Impli-
cations of liquidity shocks and their transmission are also investigated in Chudik and Fratzscher
(2011). In addition to liquidity shocks, Chudik and Fratzscher (2011) identify risk shocks and
find that, while liquidity shocks have had a more severe impact on advanced economies during
the recent global financial crisis, it was mainly the decline in risk appetite that affected emerg-
ing market economies. Effects of risk shocks are also scrutinized in Bussière, Chudik, and Mehl
(2011) for a monthly panel of real effective exchange rates featuring 62 countries. Bussière,
Chudik, and Mehl (2011) find that the responses of real effective exchange rates of euro area
countries to a global risk aversion shock after the creation of the euro have been similar to the
effects of such shocks on Italy, Portugal, or Spain before the European Monetary Union, that is,
of economies in the euro area’s periphery. Moreover, their findings suggest that the divergence in
external competitiveness among euro area countries over the past decade, which is at the core of
today’s debate on the future of the euro area, is more likely due to country-specific shocks rather
than to global shocks. Dovern and van Roye (2013) use a GVAR model to study the interna-
tional transmission of financial stress and its effects on economic activity and find that financial
stress is quickly transmitted internationally. Moreover, they find that financial stress has a lagged
but persistent negative effect on economic activity, and that economic slowdowns tend to limit
financial stress.
Gross and Kok (2013) use a mixed cross-section (23 countries and 41 international banks)
GVAR specification to investigate contagion among sovereigns and private banks. They find that
the potential for spillovers in the credit default swap market was particularly pronounced in 2008
and again in 2011–12. Moreover, contagion primarily tends to move from banks to sovereigns
in 2008, whereas the direction seems to have been reversed in 2011–12 in the course of the
sovereign debt crisis.
Interrelation between volatility in financial markets and macroeconomic dynamics is investi-
gated in Cesa-Bianchi, Pesaran, and Rebucci (2014), who augment the GVAR model of DdPS
with a global volatility module. They find a statistically significant and economically sizable
impact of future output growth on current volatility, and no effect of an exogenous change in
volatility on the business cycle over and above those driven by the common factors. They inter-
pret this evidence as suggesting that volatility is a symptom rather than a cause of economic
instability.
The implications of global financial conditions for individual economies are also the object of a study
by Georgiadis and Mehl (2015), but with a very different focus from earlier studies, which
mostly concentrate on transmission of financial risk. These authors investigate the hypothesis
that global financial cycles determine domestic financial conditions regardless of an economy’s
exchange rate regime. Using a quarterly sample of 59 economies spanning the period 1999Q1–2009Q4,
the authors reject this hypothesis and find that the classic Mundell–Fleming trilemma
(namely that an economy cannot simultaneously maintain a fixed exchange rate, free capital
movement, and an independent monetary policy) remains valid, despite the significant rise in
financial globalization since the 1990s.
markets associated with structural factors, and argue that these are partly due to globalization
which, in addition to changes in monetary policy, seem to be behind some of the changes in the
inflationary process over the period under consideration.
Using the GVAR model, Dées et al. (2009) provide estimates of New Keynesian Phillips
Curves (NKPC) for eight developed industrial countries and discuss the weak instrument problem
and the characterization of the steady states. It is shown that the GVAR generates global
factors that are valid instruments and help alleviate the weak instrument problem. The use of
foreign variables as instruments is found to substantially increase the precision of the estimates
of the output coefficient in the NKPC equations. Moreover, it is argued that the GVAR steady
states perform better than the Hodrick–Prescott (HP) measure. Unlike HP, the GVAR measures
of the steady states are coherent and reflect long-run structural relationships within as well as
across countries.
Global imbalances and exchange rate misalignments
The effects of demand shocks and shocks to relative prices on global imbalances are examined in
Bussière, Chudik, and Sestieri (2012), using a GVAR model of global trade flows. Their results
indicate that changes in domestic and foreign demand have a much stronger effect on trade flows
as compared to changes in relative trade prices. Using the GVAR approach, global imbalances are
also investigated by Bettendorf (2012), although with a different focus. Estimating exchange rate
misalignments using a GVAR model is undertaken in Marçal et al. (2014). This paper contrasts
GVAR-based measures of misalignment with traditional time series estimates that treat individ-
ual countries as separate units. Large differences between a GVAR and more traditional time
series estimates are reported, especially for small and developing countries.
Role of the US as a dominant economy
The role of the US as a dominant economy in the global economy is examined in Chudik and
Smith (2013) by comparing two models: one that treats the US as a globally dominant economy,
and a standard version of the GVAR model that does not separate the impact of US variables
from the cross-section averages of foreign economies, as is done in DdPS, for example. They find
some support for the extended version of the GVAR model, with the US treated as a dominant
economy. A similar approach is also adopted by Dées and Saint-Guilhem (2011) who find that
the role of the US is somewhat diminished over time.
Business cycle synchronization and the rising role of China in the world economy
Dreger and Zhang (2013) investigate interdependence of business cycles in China and industrial
countries and study the effects of shocks to the Chinese economy. Cesa-Bianchi et al. (2012)
investigate the interdependence between China, Latin America, and the world economy. Feld-
kircher and Korhonen (2012) consider the effects of the rise of China on emerging markets.
All these studies find a significant degree of business cycle synchronization in the world econ-
omy with the importance of the Chinese economy increasing for both advanced and emerging
economies. Cesa-Bianchi et al. (2012), using a GVAR model with time varying trade weights,
find that the long-term impact of a China GDP shock on typical Latin American economies
has increased threefold since the mid-1990s, and the long-term impact of a US GDP shock has
halved. Feldkircher and Korhonen (2012) find that a 1 per cent shock to Chinese output trans-
lates to a 1.2 per cent increase in Chinese real GDP and 0.1 to 0.5 per cent rise in real out-
put in the case of large economies. The countries of Central Eastern Europe and the former
Commonwealth of Independent States also experience a rise of 0.2 per cent in their real output.
By contrast, China seems to be little affected by shocks to the US economy.
Boschi and Girardi (2011) investigate the business cycle in Latin America using a nine coun-
try/region version of the GVAR, and quantify the relative contribution of domestic, regional,
and international factors to the fluctuation of domestic output in Latin American economies. In
particular, they find that only a modest proportion of Latin American domestic output variabil-
ity is explained by industrial countries’ factors and that domestic and regional factors account
for the main share of output variability at all simulation horizons.
International linkages of the Korean economy are investigated in Greenwood-Nimmo,
Nguyen, and Shin (2012a). They find that the real economy and financial markets are highly
sensitive to oil price changes even though they have little effect on inflation. They also show
that the interest rate in Korea is set largely without recourse to overseas conditions except to the
extent that these influences are captured by the exchange rate. They find that the Korean econ-
omy is most affected by the US, the euro area, Japan, and China.
Understanding interlinkages between emerging Europe and the global economy is investi-
gated in Feldkircher (2013) who develops a GVAR model covering 43 countries. The main find-
ings are that emerging Europe’s real economy reacts to a US output shock as strongly as it does
to a corresponding euro area shock. Moreover, Feldkircher (2013) uncovers a negative effect
of tightening in the euro area’s short-term interest rate on output in the long-run throughout
Central, Eastern, and Southeastern Europe and the Commonwealth of Independent States.
Sun, Heinz, and Ho (2013) use the GVAR approach with combined trade and financial weights
to investigate cross-country linkages in Europe. Their findings show strong co-movements in
output growth and interest rates but weaker linkages between inflation and real credit growth
within Europe.
The impact of foreign shocks on South Africa is studied in de Waal and van Eyden (2013b).
Using time varying weights they show the increasing role of China and the decreasing role of
the US in the South African economy, reflecting the substantial increase in South Africa’s trade with
China since the mid-1990s. The impact of a US shock on South African GDP is found to be
insignificant by 2009, whereas the impact of a shock to Chinese GDP on South African GDP is
found to be three times stronger in 2009 than in 1995. These findings are in line with the way the
global crisis of 2007–09 affected South Africa, and highlight increased risk to the South African
economy from shocks to the Chinese economy.
Spillover effects of shocks in large economies (such as China, euro area, and the US) to the
Middle East and North Africa (MENA) region, as well as the effects of shocks originating in the
MENA oil exporters and Gulf Cooperation Countries to the rest of the world, are investigated
using a GVAR model by Cashin, Mohaddes, and Raissi (2014b). The results are as expected,
with shocks from China playing an increasingly important role for the MENA countries.
Impact of EMU membership
Two papers, Pesaran, Smith, and Smith (2007) and Dubois, Hericourt, and Mignon (2009),
investigate counterfactual scenarios regarding monetary union membership. Pesaran, Smith, and
Smith (2007) analyse counterfactual scenarios using a GVAR macroeconometric model and
empirically investigate ‘what if the UK had joined the Euro in 1999’. They report probability
estimates that output could have been higher and prices lower in the UK and in the euro area
as a result of the entry. They also examine the sensitivity of these results to a variety of assump-
tions about the UK entry. The aim of Dubois, Hericourt, and Mignon (2009) is to answer the
counterfactual question of the consequences of no euro launch in 1999. They find that monetary
unification promoted lower interest rates and higher output in most euro area economies, relative
to a situation where national monetary policies would have followed a German-type monetary
policy. An opposite picture emerges if national monetary policies had adopted British monetary
preferences after September 1992.
Commodity price models
Gutierrez and Piras (2013) construct a GVAR model of the global wheat market, where the feed-
back between the real and the financial sectors, and also the link between food and energy prices,
are taken into account. Their impulse response analysis reveals that a negative shock to wheat
consumption, an increase in oil prices, and real exchange rate devaluation all have inflationary
effects on wheat export prices, although their impacts are different across the main wheat export
countries.
While oil prices are included in the majority of GVAR models as an important observed com-
mon factor, these studies do not generally focus on the nature of oil shocks and their effects.
Identification of oil price shocks is attempted in Chudik and Fidora (2012) and Cashin et al.
(2014). Both papers argue that the cross-section dimension can help in the identification of
(global) oil shocks and exploit sign restrictions for identification. The former paper investigates
the effects of supply-induced oil price increases on aggregate output and real effective exchange
rates. It finds that adverse oil supply shocks have significant negative impacts on the real output
growth of oil importers, with emerging markets being more affected as compared to the more
mature economies. Moreover, oil supply shocks tend to cause an appreciation (depreciation) of
oil exporters’ (oil importers’) real effective exchange rates, but they also lead to an appreciation
of the US dollar. Cashin et al. (2014) identify demand as well as supply shocks and find that the
economic consequences of the two types of shocks are very different. They also find negative
impacts of adverse oil supply shocks for energy importers, while the impact on oil exporters
that possess large proven oil/gas reserves is positive. A positive oil-demand shock, on the other
hand, is found to be associated with long-run inflationary pressures, an increase in real output, a
rise in interest rates, and a fall in real equity prices.
Impact of the commodity price boom and bust over the period 1980–2010 on output growth
in Latin America and the Caribbean is estimated in a GVAR model by Gruss (2014). It is found
that, even if commodity prices remain unchanged at their high levels, the growth in the commod-
ity exporting region would be significantly lower than during the commodity price boom period.
Housing
Hiebert and Vansteenkiste (2009) adopt the GVAR approach to investigate the spillover effects
of house price changes across euro area economies, using three housing demand variables: real
house prices, real per capita disposable income, and the real interest rate for ten euro area coun-
tries. Their results suggest limited house price spillovers in the euro area, in contrast to the
impacts of a shock to domestic long term interest rates, with the latter causing a permanent shift
in house prices after around three years. Moreover, they find the effects of house price spillover
to be quite heterogeneous across countries.
Jannsen (2010) investigates the international effects of the 2008–10 housing crises, focusing
on the US, Great Britain, Spain, and France. Among other findings, Jannsen’s results show that
the adverse effects of a housing crisis tend to be greatest during the first two years, particularly
between the fifth and the seventh quarter after house prices have reached their peak. It is also
found that when several important industrial countries face a housing bust at the same time, eco-
nomic activity in other countries is likely to be dampened via international transmission effects,
leading to significant losses of GDP growth in a number of countries, notably in Europe.
Effects of fiscal and monetary policy
There are a number of studies that use the GVAR approach to examine the international effects
of fiscal policy shocks. Favero, Giavazzi, and Perego (2011) highlight the heterogeneous nature
of fiscal policy multipliers across countries, and show that the effects of fiscal shocks on output
differ according to the nature of the debt dynamics, the degree of openness of the economies
under consideration, and the fiscal reaction functions across countries. Hebous and Zimmer-
mann (2013) estimate spillovers of a fiscal shock in one euro area member country on the rest,
and find that the positive effects of area-wide fiscal shocks are larger than those of the domestic
shocks of comparable magnitude, thus showing that coordinated fiscal action is likely to be more
effective.
Cross-country effects of monetary policy shocks are investigated by Georgiadis (2014a) and
Georgiadis (2014b). These papers investigate the global spillover effects of monetary policy
shocks in the US and the euro area, respectively. In both papers, monetary policy shocks are identified
by sign restrictions. Georgiadis finds that the effects of US monetary policy shocks on aggregate
output are heterogeneous across countries with the foreign output effects being larger than the
domestic effects for many of the economies in the global economy. Substantial heterogeneity
is also observed in the transmission of euro area monetary policy shocks, where countries with
more wage and fewer unemployment rigidities are found to exhibit stronger output effects.
The role of US monetary policy shocks is also examined in Feldkircher and Huber (2015)
who, in addition to monetary policy shocks, also identify the US aggregate demand and supply
shocks within a Bayesian version of the GVAR model. Among the variety of findings
reported in Feldkircher and Huber (2015) is that US monetary policy shocks have
the most pronounced effects on real output internationally.
Labour market
The GVAR model developed by Hiebert and Vansteenkiste (2010) is used to analyse spillovers in
the labour market in the US. Using data on 12 manufacturing industries over the period 1977–
2003, Hiebert and Vansteenkiste (2010) analyse responses of a standard set of labour-market
related variables (employment, real compensation, productivity and capital stock) to exogenous
factors (such as a sector-specific measure of trade openness or a common technology shock),
along with industry spillovers using sector-specific manufacturing-wide measures. Their find-
ings suggest that increased trade openness negatively affects real compensation, has negligible
employment effects, and leads to higher labour productivity. Technology shocks are found to
have significantly positive effects on both real compensation and employment.
Role of credit
The role of credit in international business cycles is investigated using a GVAR approach
by Eickmeier and Ng (2011), Xu (2012) and Konstantakis and Michaelides (2014). Eickmeier
and Ng focus on the transmission of credit supply shocks in the US, the euro area and Japan,
using sign restrictions to identify the shocks. They find that negative US credit supply shocks
have stronger negative effects on domestic and foreign GDP, as compared with credit supply
shocks from the euro area and Japan. Xu (2012) investigates the effects of US credit shocks
and the importance of credit in explaining business cycle fluctuations. Her findings reveal the
importance of bank credit in explaining output growth, changes in inflation, and long term
interest rates in countries with a developed banking sector. Using GIRFs she finds strong evi-
dence of spillovers from US credit shocks to the UK, the euro area, Japan, and other indus-
trialized economies. Konstantakis and Michaelides (2014) use the GVAR approach to model
output and debt fluctuations in the US and the EU15 economies. Konstantakis and Michaelides
analyse the transmission of shocks to debt and real output using GIRFs and find that the
EU15 economy is more vulnerable than the US to foreign shocks. Moreover, a
shock to US debt has a significant and persistent impact on the EU15 and US economies,
whereas a shock to EU15 debt does not have a statistically significant impact on the
US economy.
Macroeconomic effects of weather shocks
In a unique study, Cashin, Mohaddes, and Raissi (2014a) investigate macroeconomic impacts of
El Niño weather shocks measured by the Southern Oscillation Index (SOI). Arguably, El Niño
weather events are exogenous in nature, and can have important consequences for economic
activity worldwide. SOI is added to a standard GVAR framework as an observable common fac-
tor and the effects of a shock to SOI on economic variables across the globe are investigated. The
authors find considerable heterogeneities in responses to El Niño weather shocks: some coun-
tries experience a short-lived fall in economic activity (Australia, Chile, Indonesia, India, Japan,
New Zealand, and South Africa), while others experience a growth-enhancing effect (the US
and the European region). Some inflationary pressures are also observed in response to El Niño
weather shocks, due to short-lived commodity price increases.
33.11 Exercises
1. Consider the following first-order GVAR model,
(a) Show that for a fixed $N$ and for each $i$ the variables $y_{it}$, $i = 1, 2, \ldots, N$, are integrated of
order one (i.e., $I(1)$) and pair-wise cointegrated.
(b) Suppose $T$ is fixed and $N$ is allowed to increase without bounds. Derive the integration/cointegration properties of $y_{it}$, $i = 1, 2, \ldots, N$.
(c) Derive optimal forecasts of $y_{i,T+h}$, $h > 0$, conditional on $y_{i,T-\ell}$, for $\ell = 0, 1, 2, \ldots$.
(d) Consider now the forecast problem if $y_{i,t-1} - \bar{y}_{t-1}$ is replaced by $y_{i,t-1} - \bar{y}_t$.
2. Consider the following factor-augmented VAR models for the N countries comprising the
world economy
N
N
¯ = N −1
(a) Let i and x̄t = N −1 xit , and suppose that ¯ is a positive definite
¯
i=1 i=1
−1
matrix for N including as N → ∞, and let S = ¯
¯ ¯ . Then
E ft − S (x̄t − x̄t−1 ) = O N −1/2 ,
and
(b) Using the above results discuss the problem of identification and estimation of the country-
specific shocks.
(c) Consider now the case where is replaced by i , thus allowing for dynamic heterogene-
ity across the countries. How are the above results affected by this generalization?
3. Suppose that we are interested in identifying the possible links between uncertainty (as mea-
sured by asset price volatility) and the macro economy. To this end an investigator considers
the following bivariate relations for country 1 (say the US)
where, as in the above question, x1t is the vector of macro-economic variables for country 1,
ft is the m × 1 vector of unobserved common factors (shocks), and u1t is the vector of shocks
specific to country 1. Also, vt is a measure of uncertainty which is assumed to be affected by
common factors. ε t represents the uncertainty specific shock.
Compare the conditions for identification of Cov(u1t , ε t ) in the above multi-country set-
ting with the single country framework considered above.
(c) How would you estimate Cov(uit , εt ) for i = 1, 2, . . . , N?
yt = yt−1 + γ ft + ε t ,
where yt = y1t , y2t , . . . , yNt , γ = γ 1 , γ 2 , . . . , γ N , ft is an unobserved common factor,
, is an N × N matrix of unknown coefficients, and ε t = (ε 1t , ε 2t , . . . , εNt ) , is an N × 1
vector of idiosyncratic shocks. It is assumed that the common factor follows the covariance
stationary AR(1) process
ft = ρft−1 + vt .
The errors ε t and vt are uncorrelated and serially independent with zero means. Further, it is
assumed that
ε t = Rηt ,
where the N × N matrix R has bounded row and column matrix norms (in N), and ηt ∼
IID(0, IN ).
h−1 h− γ .
for h = 1, 2, . . . , where ah = =0 ρ
(b) Show that
ȳt = w yt−1 + γ̄ ft + Op N −1/2 ,
and
∞
∞
w yt−1 = w j+1
ε t−j−1 + w j+1 γ ft−j−1 .
j=0 j=0
N
where ȳt = N −1 N i=1 yit , γ̄ = N
−1 −1
i=1 γ i , and w = N (1, 1, . . . , 1).
(c) Denoting the spectral norm of , by , show that if < 1, then
⎛ ⎞
∞
ȳt = ⎝ d L ⎠ ft + Op N −1/2 ,
=0
and |d | = O (1 − ) , for some small positive .
(d) Use the above results to obtain h-step forecasts of yit and ȳt based on the observables,
yt , yt−1 , . . . , for N sufficiently large.
Appendices
Appendix A: Mathematics
This appendix reviews background material on complex numbers, trigonometry, matrix algebra,
calculus, and linear difference equations. Further information on the topics covered
can be found in Horn and Johnson (1985), Hamilton (1994), Lütkepohl (1996), Magnus and
Neudecker (1999), Golub and Van Loan (1996), and Bernstein (2005).
$$z = a + bi, \tag{A.1}$$
where $a$, $b$ are real numbers, and $i = \sqrt{-1}$ is the imaginary unit. The length (or norm) of a
complex number $z = a + bi$ is defined as
$$|z| = \sqrt{a^2 + b^2}.$$
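1. Sum:
$$(a + bi) + (c + di) = (a + c) + (b + d)i.$$
2. Product:
$$(a + bi)(c + di) = (ac - bd) + (ad + bc)i.$$
3. Division:
$$\frac{a + bi}{c + di} = \frac{ac + bd}{c^2 + d^2} + \frac{bc - ad}{c^2 + d^2}\,i,$$
for $c + di \neq 0$.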
$$e^{a+bi} = \sum_{j=0}^{\infty}\frac{(a+bi)^j}{j!} = e^{a}\left[\sum_{j=0}^{\infty}\frac{(-1)^j b^{2j}}{(2j)!} + i\sum_{j=0}^{\infty}\frac{(-1)^j b^{2j+1}}{(2j+1)!}\right]. \tag{A.2}$$
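The rules for complex arithmetic and the series in (A.2) can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the test values are illustrative assumptions, not part of the text.

```python
import cmath
import math

def quotient(a, b, c, d):
    """Return (a+bi)/(c+di) using the real-arithmetic division formula."""
    denom = c * c + d * d
    return complex((a * c + b * d) / denom, (b * c - a * d) / denom)

def exp_series(a, b, terms=30):
    """Approximate e^(a+bi) as e^a times (cosine series + i * sine series), as in (A.2)."""
    cos_part = sum((-1) ** j * b ** (2 * j) / math.factorial(2 * j) for j in range(terms))
    sin_part = sum((-1) ** j * b ** (2 * j + 1) / math.factorial(2 * j + 1) for j in range(terms))
    return math.exp(a) * complex(cos_part, sin_part)

# Illustrative values (assumptions, chosen only for the check).
z, w = complex(3.0, -2.0), complex(1.0, 4.0)
q = quotient(z.real, z.imag, w.real, w.imag)
modulus = math.sqrt(z.real ** 2 + z.imag ** 2)
e = exp_series(0.5, 1.2)
```

Comparing `q` with Python's built-in `z / w`, `modulus` with `abs(z)`, and `e` with `cmath.exp(0.5 + 1.2j)` should give agreement up to rounding error.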
[Figure A: right-angled triangle with angle $\theta$, hypotenuse $h$, opposite side $a$, and adjacent side $b$.]
The functions sine and cosine are called trigonometric or sinusoidal functions. Viewed as a function
of $\theta$, we have $\sin(0) = 0$. As $\theta$ increases to $\pi/2$, the sine function increases to 1 and
then falls back to zero as $\theta$ rises further to $\pi$. The function then reaches its minimum of $-1$
when $\theta = 3\pi/2$ and then begins climbing back to zero. The function is periodic with period $2\pi$,
since
$$\sin(\theta + 2\pi) = \sin\theta.$$
The cosine can be seen as a horizontal shift of the sine function since
$$\cos\theta = \sin\left(\theta + \frac{\pi}{2}\right).$$
Hence, the cosine will also be a periodic function, starting out at 1 (i.e., $\cos(0) = 1$), and falling
to zero as $\theta$ increases to $\pi/2$. More generally, any linear combination of sinusoidal functions
of the type
$$f(\theta) = \sum_{j=0}^{\infty}\left[a_j\cos(j\theta) + b_j\sin(j\theta)\right], \tag{A.3}$$
where $a_j$ and $b_j$ are arbitrary sequences of constants, is a periodic function with period $2\pi$.
Some important identities
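$$\sin^2(\theta) + \cos^2(\theta) = 1,$$
$$\sin(\theta \pm \phi) = \sin(\theta)\cos(\phi) \pm \cos(\theta)\sin(\phi),$$
$$\cos(\theta \pm \phi) = \cos(\theta)\cos(\phi) \mp \sin(\theta)\sin(\phi),$$
$$\sin(-\theta) = -\sin\theta, \quad \cos(-\theta) = \cos\theta,$$
$$f(\theta) = \frac{1}{2}a_0 + \sum_{j=1}^{\infty}\left[a_j\cos(j\theta) + b_j\sin(j\theta)\right], \tag{A.4}$$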
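where
$$a_j = \frac{1}{\pi}\int_{-\pi}^{+\pi} f(\theta)\cos(j\theta)\,d\theta, \qquad b_j = \frac{1}{\pi}\int_{-\pi}^{+\pi} f(\theta)\sin(j\theta)\,d\theta.$$
Let
$$f_m(\theta) = \sum_{j=0}^{m}\left[a_j\cos(j\theta) + b_j\sin(j\theta)\right],$$
Consider now any function $f(\theta)$ defined over the interval $(-\infty, +\infty)$ and such that
$$\int_{-\infty}^{+\infty}|f(\omega)|\,d\omega < \infty.$$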
Equation (A.5) is the Fourier integral representation of f (θ ), while equation (A.6) is known as
Fourier transform of f (θ ).
See Priestley (1981, Ch. 4), and Chatfield (2003) for further details on Fourier analysis
applied to time series.
$a_{ij}$ is the element in the $i$th row and $j$th column of $A$, and is a real number. In the following, we
indicate by $\mathcal{M}_{m\times n}$ the space of real $m \times n$ matrices. We indicate by $a_{.j}$ the $j$th column of $A$, namely
$$a_{.j} = \begin{pmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{pmatrix},$$
$a_{i.}$ is an $n$-dimensional vector. An $n \times n$ matrix $A \in \mathcal{M}_{n\times n}$ is a square matrix. Its elements $a_{ii}$, for
$i = 1, 2, \ldots, n$, are called diagonal elements and the elements $a_{11}, a_{22}, \ldots, a_{nn}$ constitute the
main diagonal of $A$.
Scalar multiplication: Let $A \in \mathcal{M}_{m\times n}$. Let $B = \beta A$. $B$ has its $(i,j)$th generic element
$b_{ij} = \beta a_{ij}$.
Matrix multiplication: Let $A \in \mathcal{M}_{m\times n}$, $B \in \mathcal{M}_{n\times p}$. Let $C = AB$. $C$ has its $(i,j)$th generic element
$$c_{ij} = \sum_{h=1}^{n} a_{ih}b_{hj}.$$
Let $A \in \mathcal{M}_{m\times n}$. The transpose of $A$, indicated by $A'$, is the $n \times m$ matrix with generic $(i,j)$ element $a_{ji}$. A matrix
$A \in \mathcal{M}_{n\times n}$ is symmetric if $a_{ij} = a_{ji}$ for all $i, j$, namely for symmetric matrices we have $A = A'$.
The following basic properties hold:
1. (A + B) + C = A + (B + C) = A + B + C.
2. (AB) C = A (BC) = ABC.
3. A (B + C) = AB + AC.
4. $(A')' = A$.
5. $(A + B)' = A' + B'$.
6. $(AB)' = B'A'$.
7. $A^2 = AA$.
A.2.2 Trace
The trace of a square matrix $A \in \mathcal{M}_{n\times n}$ is the sum of the diagonal elements of $A$, i.e.
$$\mathrm{Tr}(A) = \sum_{i=1}^{n} a_{ii}. \tag{A.7}$$
1. $\mathrm{Tr}(A) = \mathrm{Tr}(A')$.
2. Tr (A + B) = Tr (A) + Tr (B) .
3. Tr (βA) = βTr (A) , for every scalar β.
4. Tr (AB) = Tr (BA) .
5. Tr (ABC) = Tr (CAB) = Tr (BCA) .
6. Tr (ABC) = Tr (ACB), if A, B, C are symmetric matrices.
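The trace properties listed above can be illustrated with small integer matrices. This is only a sketch with hypothetical helper functions (`matmul`, `transpose`, `trace`) and arbitrary example matrices; it shows that the trace is invariant to cyclic permutations, whereas a non-cyclic permutation of non-symmetric matrices generally changes it.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][h] * B[h][j] for h in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def trace(A):
    """Sum of the diagonal elements, as in (A.7)."""
    return sum(A[i][i] for i in range(len(A)))

# Arbitrary non-symmetric example matrices (assumptions for illustration).
A = [[1, 2], [3, 4]]
B = [[0, 5], [6, 7]]
C = [[2, -1], [4, 3]]

cyclic = (trace(matmul(matmul(A, B), C)),
          trace(matmul(matmul(C, A), B)),
          trace(matmul(matmul(B, C), A)))
# ACB is not a cyclic permutation of ABC; equality is only guaranteed for symmetric matrices.
non_cyclic = trace(matmul(matmul(A, C), B))
```

Here the three entries of `cyclic` coincide, while `non_cyclic` differs from them because `A`, `B`, and `C` are not symmetric.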
A.2.3 Rank
The rank of a matrix A ∈Mm×n , indicated by rank(A), is the maximum number of linearly inde-
pendent columns of A. We have:
A.2.4 Determinant
The determinant of a matrix A ∈Mn×n , indicated by det (A), is a scalar that can be defined iter-
atively as follows:
where $A_{ij}$ is the matrix obtained by deleting the $i$th row and $j$th column of $A$. The inverse $A^{-1}$
exists if and only if $A$ has full rank. The determinant is also often indicated by $|A|$. The determinant
satisfies the following properties:
$$x'Ax \geq 0,$$
for all $x \neq 0$. If $x'Ax > 0$ for all $x \neq 0$, then $A$ is positive definite, and we write $A > 0$. Positive
definite matrices satisfy the following properties:
AIn = A,
$$Ax = \lambda x,$$
or
$$(A - \lambda I_n)x = 0, \quad x \neq 0.$$
The above expression is called the characteristic equation or characteristic polynomial of A. Let
λ1 (A) , λ2 (A) , . . . , λn (A) be the eigenvalues of A. The following properties hold:
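1. $\lambda_i(A) = \lambda_i(A')$.
2. $\mathrm{Tr}(A) = \sum_{i=1}^{n} a_{ii} = \sum_{i=1}^{n}\lambda_i(A)$.
3. $\det(A) = \prod_{i=1}^{n}\lambda_i(A)$.
4. If $\lambda_i(A) \geq 0$ for all $i$, then $A \geq 0$.
5. If $\lambda_i(A) > 0$ for all $i$, then $A > 0$.
6. If $A \geq 0$ then $\lambda_i(A)$ for all $i$ are real eigenvalues.
7. If $B$ is positive semi-definite, $\mathrm{Tr}(AB) \leq \sqrt{\lambda_{\max}(A'A)}\,\mathrm{Tr}(B)$.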
Let λmin (A) and λmax (A) be the minimum and maximum eigenvalues of a symmetric matrix
A, respectively. Then the Rayleigh–Ritz theorem states that
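$$\lambda_{\min}(A) = \min_{x \neq 0}\frac{x'Ax}{x'x},$$
$$\lambda_{\max}(A) = \max_{x \neq 0}\frac{x'Ax}{x'x}.$$
and
$$\lambda_i(A) = \max_{y_1, y_2, \ldots, y_{i-1}}\ \min_{\substack{x \neq 0, \\ x'y_j = 0,\ j = 1, 2, \ldots, i-1}}\frac{x'Ax}{x'x}.$$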
See Horn and Johnson (1985) and Bernstein (2005) for further properties of eigenvalues.
This is known as the Woodbury matrix identity, which is a generalization of the Sherman–
Morrison formula. The latter obtains if B = In , and k = 1. Also see Dhrymes (2000,
p. 44).
6. Let $A \in \mathcal{M}_{n\times n}$. If $\lim_{k\to\infty} A^k = 0$ then $I - A$ is nonsingular and
$$(I - A)^{-1} = \sum_{k=0}^{\infty} A^k.$$
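The series representation of $(I - A)^{-1}$ in result 6 can be verified numerically for a matrix whose powers vanish. The sketch below uses hypothetical helper functions and an arbitrary $2 \times 2$ example whose eigenvalues lie inside the unit circle; the truncation point of 200 terms is an assumption.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][h] * B[h][j] for h in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def add(A, B):
    """Element-wise sum of two matrices of the same size."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def neumann_inverse(A, terms=200):
    """Approximate (I - A)^{-1} by the partial sum I + A + ... + A^(terms-1)."""
    n = len(A)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total, power = identity, identity
    for _ in range(terms - 1):
        power = matmul(power, A)
        total = add(total, power)
    return total

A = [[0.5, 0.2], [0.1, 0.3]]          # eigenvalues roughly 0.57 and 0.23, so A^k -> 0
S = neumann_inverse(A)
I_minus_A = [[1 - 0.5, -0.2], [-0.1, 1 - 0.3]]
check = matmul(I_minus_A, S)          # should be close to the 2 x 2 identity matrix
```

Since $(I - A)\sum_{k=0}^{K-1}A^k = I - A^K$, the product `check` differs from the identity only by $A^{200}$, which is negligible here.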
5. $\mathrm{Tr}(AA^{-}) = \mathrm{rank}(A)$.
(i) $AA^{+}A = A$.
(ii) $A^{+}AA^{+} = A^{+}$.
(iii) $(A^{+}A)' = A^{+}A$.
(iv) $(AA^{+})' = AA^{+}$.
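$$\underset{mp \times nq}{A \otimes B} = \begin{pmatrix} a_{11}B & a_{12}B & \cdots & a_{1n}B \\ a_{21}B & a_{22}B & \cdots & a_{2n}B \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1}B & a_{m2}B & \cdots & a_{mn}B \end{pmatrix}. \tag{A.10}$$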
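1. $(A \otimes B)' = A' \otimes B'$.
2. $\mathrm{Tr}(A \otimes B) = \mathrm{Tr}(A)\,\mathrm{Tr}(B)$.
4. $(A \otimes B)^{-} = A^{-} \otimes B^{-}$.
5. Let $A \in \mathcal{M}_{n\times n}$, $B \in \mathcal{M}_{m\times m}$, we have $\det(A \otimes B) = [\det(A)]^m[\det(B)]^n$.
6. Let $A \in \mathcal{M}_{m\times n}$, $B \in \mathcal{M}_{p\times q}$, $C \in \mathcal{M}_{n\times s}$, $D \in \mathcal{M}_{q\times g}$, we have
$$(A \otimes B)(C \otimes D) = (AC) \otimes (BD).$$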
Let $A \in \mathcal{M}_{n\times m}$. Then $\mathrm{vec}(A)$ is defined to be the $mn$-dimensional vector formed by stacking
the columns of $A$ on top of each other, that is,
$$\mathrm{vec}(A) = (a_{11}, a_{21}, \ldots, a_{n1}, a_{12}, a_{22}, \ldots, a_{n2}, \ldots, a_{1m}, a_{2m}, \ldots, a_{nm})'.$$
Let A ∈ Mn×n . Then vech(A) is defined to be the n (n + 1) /2-dimensional vector with the
elements on and below the principal diagonal A stacked on top of each other. In other words,
vech(A) is given by the vectorization of A using only the elements on and below the principal
diagonal:
Further results on Kronecker and vec operators can be found in Bernstein (2005).
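Let
$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix},$$
where $A_{11}$ is $m_1 \times m_1$, $A_{12}$ is $m_1 \times n_1$, $A_{21}$ is $n_1 \times m_1$, and $A_{22}$ is $n_1 \times n_1$. The following properties
can be derived:
1. Its transpose is
$$\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}' = \begin{pmatrix} A_{11}' & A_{21}' \\ A_{12}' & A_{22}' \end{pmatrix}.$$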
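3. If $A_{11}$ and $A_{22} - A_{21}A_{11}^{-1}A_{12}$ are nonsingular, then
$$A^{-1} = \begin{pmatrix} A_{11}^{-1} + A_{11}^{-1}A_{12}B^{-1}A_{21}A_{11}^{-1} & -A_{11}^{-1}A_{12}B^{-1} \\ -B^{-1}A_{21}A_{11}^{-1} & B^{-1} \end{pmatrix}, \tag{A.16}$$
where
$$B = A_{22} - A_{21}A_{11}^{-1}A_{12}.$$
4. If $A_{22}$ and $A_{11} - A_{12}A_{22}^{-1}A_{21}$ are nonsingular, then
$$A^{-1} = \begin{pmatrix} C^{-1} & -C^{-1}A_{12}A_{22}^{-1} \\ -A_{22}^{-1}A_{21}C^{-1} & A_{22}^{-1} + A_{22}^{-1}A_{21}C^{-1}A_{12}A_{22}^{-1} \end{pmatrix}, \tag{A.17}$$
where
$$C = A_{11} - A_{12}A_{22}^{-1}A_{21}.$$
5. If $A_{11}$ is nonsingular then $\det(A) = \det(A_{11})\det\left(A_{22} - A_{21}A_{11}^{-1}A_{12}\right)$.
6. If $A_{22}$ is nonsingular then $\det(A) = \det(A_{22})\det\left(A_{11} - A_{12}A_{22}^{-1}A_{21}\right)$.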
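As a small check of (A.16) and of the determinant result in property 5, consider the simplest case in which every block is a scalar. The sketch below uses exact rational arithmetic; the function name and the example matrix are assumptions made for illustration only.

```python
from fractions import Fraction

def block_inverse_scalar(a11, a12, a21, a22):
    """Apply the partitioned-inverse formula (A.16) when all four blocks are scalars."""
    a11_inv = 1 / a11
    b = a22 - a21 * a11_inv * a12          # B = A22 - A21 A11^{-1} A12
    b_inv = 1 / b
    return [[a11_inv + a11_inv * a12 * b_inv * a21 * a11_inv, -a11_inv * a12 * b_inv],
            [-b_inv * a21 * a11_inv, b_inv]]

# Arbitrary nonsingular example (assumption).
A = [[Fraction(4), Fraction(1)], [Fraction(2), Fraction(3)]]
A_inv = block_inverse_scalar(A[0][0], A[0][1], A[1][0], A[1][1])

# A times the candidate inverse should be the identity matrix.
product = [[sum(A[i][h] * A_inv[h][j] for h in range(2)) for j in range(2)] for i in range(2)]

# Property 5: det(A) = det(A11) * det(A22 - A21 A11^{-1} A12).
det_via_blocks = A[0][0] * (A[1][1] - A[1][0] * (1 / A[0][0]) * A[0][1])
```

With these values the product is exactly the identity, and `det_via_blocks` agrees with the usual $2 \times 2$ determinant $4 \times 3 - 1 \times 2$.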
$$\|A\|_1 = \max_{1\leq j\leq n}\sum_{i=1}^{m}|a_{ij}| \tag{A.18}$$
$$\|A\|_\infty = \max_{1\leq i\leq m}\sum_{j=1}^{n}|a_{ij}|. \tag{A.19}$$
$$\|A\|_2 = \left[\mathrm{Tr}(A'A)\right]^{1/2} = \left(\sum_{i=1}^{m}\sum_{j=1}^{n}a_{ij}^2\right)^{1/2}. \tag{A.20}$$
$$\|A\|_{spec} = \rho_{\max}(A) = \max_{1\leq i\leq n}\sqrt{\lambda_i(A'A)},$$
$$\|AB\| \leq \|A\|\,\|B\|,$$
A wide discussion on matrix norms and their properties can be found in Bernstein (2005).
$$\rho(A) \leq \|A^s\|^{1/s},$$
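The norms in (A.18) to (A.20), the spectral norm, and the bound of the spectral radius by a matrix norm (the case $s = 1$) can be illustrated for a $2 \times 2$ matrix. The sketch below relies on closed-form eigenvalues of $2 \times 2$ matrices; all function names and the example matrix are assumptions for illustration.

```python
import math

def col_sum_norm(A):
    """Maximum absolute column sum, as in (A.18)."""
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))

def row_sum_norm(A):
    """Maximum absolute row sum, as in (A.19)."""
    return max(sum(abs(a) for a in row) for row in A)

def frobenius_norm(A):
    """Square root of the sum of squared elements, as in (A.20)."""
    return math.sqrt(sum(a * a for row in A for a in row))

def spectral_norm_2x2(A):
    """Square root of the largest eigenvalue of A'A for a 2 x 2 matrix."""
    (a, b), (c, d) = A
    p, q, r = a * a + c * c, a * b + c * d, b * b + d * d    # A'A = [[p, q], [q, r]]
    return math.sqrt((p + r) / 2 + math.sqrt(((p - r) / 2) ** 2 + q * q))

def spectral_radius_2x2(A):
    """Largest eigenvalue modulus of a 2 x 2 matrix."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    disc = tr * tr - 4 * det
    if disc >= 0:
        return max(abs((tr + math.sqrt(disc)) / 2), abs((tr - math.sqrt(disc)) / 2))
    return math.sqrt(det)    # complex conjugate pair: squared modulus equals det

A = [[0.5, 0.4], [-0.3, 0.2]]    # arbitrary example (assumption)
norms = {"col": col_sum_norm(A), "row": row_sum_norm(A),
         "frob": frobenius_norm(A), "spec": spectral_norm_2x2(A)}
rho = spectral_radius_2x2(A)
```

For this matrix the spectral radius is smaller than each of the four norms, and the spectral norm does not exceed the norm in (A.20).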
A = UU .
$$Q'AZ = T,$$
$$Q'BZ = S,$$
where $T$ is an upper quasi-triangular matrix (namely, a block matrix with non-zero elements on the
blocks along the main diagonal and on the entries above the main diagonal), while $S$ is an upper
triangular matrix. If the diagonal elements of $T$ and $S$, namely $t_{kk}$ and $s_{kk}$, are non-zero then
$$\lambda_i(A, B) = \frac{t_{ii}}{s_{ii}},$$
are the so-called generalized eigenvalues and are solutions of the equation Ax − λBx = 0. The
columns of the matrices Q and Z are called generalized Schur vectors.
For further details on the Generalized Schur decomposition see Golub and Van Loan (1996,
p. 377).
$$A = C\Lambda C',$$
where $C \in \mathcal{M}_{n\times n}$ is a real orthogonal matrix, whose columns are the $n$ eigenvectors of $A$, and
$\Lambda = \mathrm{diag}\{\lambda_1(A), \lambda_2(A), \ldots, \lambda_n(A)\}$.
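$$A = T\Lambda T^{-1},$$
where
$$\Lambda = \begin{pmatrix} \Lambda_1 & 0 & \cdots & 0 \\ 0 & \Lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \Lambda_p \end{pmatrix}$$
$$\Lambda_i = \begin{pmatrix} \lambda_i(A) & 1 & 0 & \cdots & 0 & 0 \\ 0 & \lambda_i(A) & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda_i(A) & 1 \\ 0 & 0 & 0 & \cdots & 0 & \lambda_i(A) \end{pmatrix}.$$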
$$A = T\Lambda T^{-1},$$
where $\Lambda = \mathrm{diag}\{\lambda_1(A), \lambda_2(A), \ldots, \lambda_n(A)\}$ and $T$ is a nonsingular $n \times n$ matrix whose
columns are the $n$ eigenvectors associated to the eigenvalues of $A$.
$$A = LL'.$$
See Lütkepohl (1996) for further discussion on the above matrix decompositions.
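$$\frac{\partial f(x)}{\partial x} = \begin{pmatrix} \dfrac{\partial f(x)}{\partial x_1} \\ \dfrac{\partial f(x)}{\partial x_2} \\ \vdots \\ \dfrac{\partial f(x)}{\partial x_n} \end{pmatrix}, \tag{A.21}$$
is the vector of first-order partial derivatives. $\partial f(x)/\partial x$ is sometimes called the gradient vector of $f(x)$.
$$\left.\frac{\partial f}{\partial x}\right|_{x=x_0} = \frac{\partial f(x_0)}{\partial x} = \begin{pmatrix} \left.\dfrac{\partial f(x)}{\partial x_1}\right|_{x=x_0} \\ \left.\dfrac{\partial f(x)}{\partial x_2}\right|_{x=x_0} \\ \vdots \\ \left.\dfrac{\partial f(x)}{\partial x_n}\right|_{x=x_0} \end{pmatrix}, \tag{A.22}$$
is the vector of first-order partial derivatives evaluated at $x = x_0$. The Hessian matrix of second-order
partial derivatives of $f(x)$ is
$$\frac{\partial^2 f(x)}{\partial x\,\partial x'} = \begin{pmatrix} \dfrac{\partial^2 f(x)}{\partial x_1\partial x_1} & \dfrac{\partial^2 f(x)}{\partial x_1\partial x_2} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_1\partial x_n} \\ \dfrac{\partial^2 f(x)}{\partial x_2\partial x_1} & \dfrac{\partial^2 f(x)}{\partial x_2\partial x_2} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_2\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f(x)}{\partial x_n\partial x_1} & \dfrac{\partial^2 f(x)}{\partial x_n\partial x_2} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_n\partial x_n} \end{pmatrix}.$$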
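Let $f(X)$ be a differentiable real valued function of the matrix $X \in \mathcal{M}_{m\times n}$. The matrix of first-order
partial derivatives is
$$\frac{\partial f(X)}{\partial X} = \begin{pmatrix} \dfrac{\partial f(X)}{\partial x_{11}} & \dfrac{\partial f(X)}{\partial x_{12}} & \cdots & \dfrac{\partial f(X)}{\partial x_{1n}} \\ \dfrac{\partial f(X)}{\partial x_{21}} & \dfrac{\partial f(X)}{\partial x_{22}} & \cdots & \dfrac{\partial f(X)}{\partial x_{2n}} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial f(X)}{\partial x_{m1}} & \dfrac{\partial f(X)}{\partial x_{m2}} & \cdots & \dfrac{\partial f(X)}{\partial x_{mn}} \end{pmatrix}, \tag{A.23}$$
and the Hessian matrix of second-order partial derivatives of $f(X)$ is the $mn \times mn$ matrix
$$\frac{\partial^2 f(X)}{\partial\mathrm{vec}(X)\,\partial\mathrm{vec}(X)'}.$$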
∂x Ax
See Lütkepohl (1996, Ch. 10), and Magnus and Neudecker (1999) for further details.
In the case of a real function of more than one variable, let $f(x)$ be a differentiable real
function of $x = (x_1, x_2, \ldots, x_k)'$, defined on the open subset $C$ of $\mathbb{R}^k$. Assume that the line
segment from $a$ to $b$ is contained in $C$, and suppose that $f(x)$ is continuous along the segment
and differentiable between $a$ and $b$. Then there exists $c$ on the line segment $a$ to $b$
such that
$$f(b) - f(a) = (b - a)'\left.\frac{\partial f(u)}{\partial u}\right|_{u=c}.$$
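$$f(x) = f(x_0) + \sum_{k=1}^{n-1}\frac{(x - x_0)^k}{k!}f^{(k)}(x_0) + \frac{(x - x_0)^n}{n!}f^{(n)}\left(x_0 + \lambda(x - x_0)\right). \tag{A.24}$$
The above result carries over to real functions of more than one variable. We provide here the
second-order Taylor expansion:
\begin{align}
f(x) &= f(x_0) + (x - x_0)'\left.\frac{\partial f(u)}{\partial u}\right|_{u=x_0} \tag{A.25}\\
&\quad + \frac{1}{2}(x - x_0)'\left.\frac{\partial^2 f(u)}{\partial u\,\partial u'}\right|_{u=x_0+\lambda(x-x_0)}(x - x_0). \tag{A.26}
\end{align}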
$$\hat{\theta} = \underset{\theta}{\mathrm{argmax}}\ F(\theta),$$
where, for example, F (θ ) may be the log-likelihood or the generalized method of moments
objective function, and θ is a p-dimensional vector of unknown parameters.
where a large number of parameters needs to be estimated. Further, the amount of computations
involved when using the search procedure to obtain an estimate of θ which is accurate up to three
or four significant figures can be considerable.
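$$\hat{\theta}^{(i+1)} = \hat{\theta}^{(i)} - A^{(i)}g^{(i)},$$
where $A^{(i)}$ is a matrix that depends on $\hat{\theta}^{(i)}$, and $g^{(i)}$ is the gradient vector evaluated at $\hat{\theta}^{(i)}$,
given by
$$g^{(i)} = \left.\frac{\partial F(\theta)}{\partial\theta}\right|_{\theta=\hat{\theta}^{(i)}}.$$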
The choice of the weighting matrix, A(i) , leads to different gradient methods. One common mod-
ification to gradient methods is to include a ‘damping factor’ to prevent possible overshooting or
undershooting, so that
$$\hat{\theta}^{(i+1)} = \hat{\theta}^{(i)} - \lambda^{(i)}A^{(i)}g^{(i)}. \tag{A.27}$$
The iterations need to be started with a guess-estimate, $\hat{\theta}^{(0)}$. Iterations usually stop when one
or more of the following convergence criteria are satisfied: (i) a small relative change occurs in
the objective function, $F(\theta)$; (ii) a small change occurs in the gradient vector, $g^{(i)}$, relative to
$A^{(i)}$; and (iii) a small relative change occurs in the parameter estimates, $\hat{\theta}^{(i)}$. Normally, there is a
maximum number of iterations that will be attempted, and if such a maximum is reached, then
estimates should not be used, unless convergence has been achieved. Note that a poor choice of
starting values can lead to exiting at the maximum number of iterations, and general failure of
iterative methods.
Newton–Raphson method
The most frequently used method is the Newton–Raphson technique that makes use of the
second-order Taylor series expansion. This method works especially well when the function is
globally concave in θ . The Newton–Raphson iteration is
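$$\hat{\theta}^{(i+1)} = \hat{\theta}^{(i)} - \left[H^{(i)}\right]^{-1}g^{(i)}, \tag{A.28}$$
for $i = 1, 2, \ldots$ where
$$H^{(i)} = \left[\frac{\partial^2 F(\theta)}{\partial\theta\,\partial\theta'}\right]_{\theta=\hat{\theta}^{(i)}}, \tag{A.29}$$
$$H^{(i)} = E\left[\frac{\partial^2 F(\theta)}{\partial\theta\,\partial\theta'}\right]_{\theta=\hat{\theta}^{(i)}}. \tag{A.30}$$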
This modification is particularly useful in maximum likelihood estimation, because in this case,
by information matrix inequality, H(i) is positive definite.
See Cameron and Trivedi (2005, Ch. 10), for further discussion on gradient methods. See
also Boyd and Vandenberghe (2004) for a textbook treatment of optimization algorithms.
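To make the Newton–Raphson iteration (A.28) concrete, the following is a minimal sketch for a scalar parameter. The objective is the exponential log-likelihood $F(\theta) = n\log\theta - \theta\sum_i x_i$, whose maximizer is known to be $n/\sum_i x_i$; the data, the starting value, the tolerance, and the function name are all assumptions chosen for illustration. The stopping rule is criterion (iii) above, and the maximum number of iterations acts as a safeguard.

```python
def newton_raphson(gradient, hessian, theta0, tol=1e-12, max_iter=50):
    """Iterate theta <- theta - g / H for a scalar parameter until the step is tiny."""
    theta = theta0
    for iteration in range(1, max_iter + 1):
        step = gradient(theta) / hessian(theta)
        theta -= step
        if abs(step) < tol * max(1.0, abs(theta)):
            return theta, iteration, True
    return theta, max_iter, False    # iteration cap reached: the estimate should not be used

# Hypothetical sample; F(theta) = n*log(theta) - theta*sum(x) is globally concave for theta > 0.
x = [0.8, 1.7, 0.4, 2.3, 1.1]
n, s = len(x), sum(x)
theta_hat, iterations, converged = newton_raphson(
    gradient=lambda t: n / t - s,
    hessian=lambda t: -n / t ** 2,
    theta0=0.5,
)
```

In this example the iteration settles on $n/\sum_i x_i$ within a handful of steps. A starting value far from the maximum (here, above $2n/\sum_i x_i$) would send the iterate outside the admissible region, which illustrates the remark on poor starting values.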
Method of steepest ascent
The method of steepest ascent sets A(i) = Ip . It then usually employs the modified method
(A.27), using as damping factor
where H(i) is the Hessian matrix, (A.29). The advantage of this method over the Newton–
Raphson is that it works even in the case when H(i) is singular.
$$\hat{\theta}^{*} = \hat{\theta}^{(i)} + (0, 0, \ldots, 0, \lambda_jr_j, 0, \ldots, 0)',$$
where $\lambda_j$ is a pre-specified step length and $r_j$ is a draw from a uniform distribution on $(-1, 1)$.
Hence, the method sets $\hat{\theta}^{(i+1)} = \hat{\theta}^{*}$ if it increases the objective function, or if it does not increase
the value of the objective function but does pass the Metropolis criterion that
$$\exp\left\{\left[F(\hat{\theta}^{*}) - F(\hat{\theta}^{(i)})\right]/T_i\right\} > u,$$
where $u \sim U(0, 1)$, and $T_i$ is a scaling parameter called temperature. Thus, the method
accepts both uphill and downhill moves, with a probability that decreases with the difference
$F(\hat{\theta}^{(i)}) - F(\hat{\theta}^{*})$.
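The accept/reject rule just described can be sketched as follows. This is an illustrative implementation under stated assumptions, not the algorithm of any particular package: the geometric cooling schedule, the number of sweeps, the fixed step lengths, the seed, and the test objective are all hypothetical choices.

```python
import math
import random

def simulated_annealing(F, theta0, step, temp0=1.0, cooling=0.95, sweeps=400, seed=0):
    """Maximise F by perturbing one coordinate at a time and applying the Metropolis rule."""
    rng = random.Random(seed)
    theta, value = list(theta0), F(theta0)
    best, best_value = list(theta), value
    temp = temp0
    for _ in range(sweeps):
        for j in range(len(theta)):
            candidate = list(theta)
            candidate[j] += step[j] * rng.uniform(-1.0, 1.0)    # lambda_j * r_j
            cand_value = F(candidate)
            # Uphill moves are always accepted; downhill moves pass if exp(diff / temp) > u.
            if cand_value >= value or math.exp((cand_value - value) / temp) > rng.random():
                theta, value = candidate, cand_value
                if value > best_value:
                    best, best_value = list(theta), value
        temp *= cooling    # assumed geometric cooling schedule
    return best, best_value

def objective(theta):
    """Hypothetical concave test function with maximum value 0 at (1, -2)."""
    return -(theta[0] - 1.0) ** 2 - (theta[1] + 2.0) ** 2

best_theta, best_value = simulated_annealing(objective, [0.0, 0.0], step=[0.5, 0.5])
```

As the temperature falls, downhill moves are accepted less and less often, so the search behaves increasingly like a random hill-climbing procedure around the best region found so far.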
$$Lx_t = x_{t-1}.$$
Powers of the lag operator are defined as successive applications of $L$, that is,
$$L^kx_t = x_{t-k}.$$
$$\left(a_1L^p + a_2L^q\right)x_t = a_1x_{t-p} + a_2x_{t-q}.$$
$$a(L) = a_0 + a_1L + a_2L^2 + \ldots.$$
$$a(L) = (1 - \lambda_1L)(1 - \lambda_2L)(1 - \lambda_3L)\ldots,$$
See Griliches (1967) for further discussion on the properties of lag polynomials.
Let
$$P(L) = 1 - \lambda L.$$
$$[P(L)]^{-1} = \sum_{j=0}^{\infty}\lambda^jL^j.$$
Further, we have
$$(1 - \lambda L)^{-1}(1 - \lambda L) = 1.$$
$$L^{-k}x_t = F^kx_t = x_{t+k},$$
$$y_t = \phi y_{t-1} + w_t, \tag{A.31}$$
where $w_t$ is a real valued function defined for $t \geq 0$. By applying recursive substitution we can
rewrite (A.31) as a function of its initial value at date $t = 0$, $y_0$, and of the sequence of values of
the variable $w_t$ in dates between 1 and $t$
$$y_t = \phi^ty_0 + \sum_{s=0}^{t-1}\phi^sw_{t-s}. \tag{A.32}$$
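The equivalence between the recursion (A.31) and its solution (A.32) can be confirmed with a short numerical sketch. The parameter value, the initial value, and the sequence of $w_t$ below are hypothetical.

```python
def simulate_recursion(phi, y0, w):
    """Iterate y_t = phi * y_{t-1} + w_t for t = 1, ..., len(w); w[t-1] holds w_t."""
    y = y0
    for shock in w:
        y = phi * y + shock
    return y

def closed_form(phi, y0, w):
    """Evaluate phi^t * y_0 + sum_{s=0}^{t-1} phi^s * w_{t-s}, as in (A.32)."""
    t = len(w)
    return phi ** t * y0 + sum(phi ** s * w[t - s - 1] for s in range(t))

phi, y0 = 0.7, 2.0
w = [0.5, -1.0, 0.3, 0.0, 1.2]       # hypothetical values of w_1, ..., w_5
recursive_value = simulate_recursion(phi, y0, w)
direct_value = closed_form(phi, y0, w)
```

Both routes give the same value of $y_5$ up to rounding error.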
where wt is an exogenous ‘forcing’ variable, in the sense that wt does not depend on yt or its
lagged values. Using the lag operator, we note that (A.33) can also be expressed as follows
$$\left(1 - \phi_1L - \phi_2L^2 - \ldots - \phi_pL^p\right)y_t = \phi(L)y_t = w_t,$$
We have that
$$\xi_t = F\xi_{t-1} + v_t, \tag{A.34}$$
is a system of $p$ equations where the first equation is given by (A.33), and the remaining equations
are simple identities. By recursive substitution we can rewrite (A.34) as a function of $p$ initial
values $y_0, y_{-1}, \ldots, y_{-p+1}$, and of the sequence of values for the variable $w_t$ in dates between 1
and $t$
$$\xi_t = F^t\xi_0 + \sum_{s=0}^{t-1}F^sv_{t-s},$$
where $\xi_0 = (y_0, y_{-1}, \ldots, y_{-p+1})'$. A solution for $y_t$ can now be obtained as $e_1'\xi_t$, where $e_1$ is a
$p \times 1$ selection vector, namely $e_1 = (1, 0, \ldots, 0)'$.
The limit properties of yt (as t → ∞) depend on the eigenvalues of F, which are given as
the roots of the following pth -order polynomial equation, also known as the auxiliary equation
associated with (A.33)
The solution for yt is stable if all eigenvalues of the above auxiliary equation lie inside the unit
circle. In that case, the limit solution is given by
$$\lim_{t\to\infty}y_t = \lim_{t\to\infty}e_1'\left(\sum_{s=0}^{t-1}F^sv_{t-s}\right).$$
In the case that the process has started a long time ago, namely from an initial value $y_{-M}$, we have
$$\xi_t = F^{t+M}\xi_{-M} + \sum_{s=0}^{t+M-1}F^sv_{t-s},$$
and the limit value of $y_t$ when $M \to \infty$ exists and does not depend on the initial values if all the
roots of the auxiliary equation fall inside the unit circle. Under these conditions we have
$$\lim_{M\to\infty}y_t = e_1'\sum_{s=0}^{\infty}F^sv_{t-s}.$$
where λi is the ith root of the auxiliary equation, and assuming that the underlying difference
equation is stable, namely that ρ = maxk (|λk |) < 1. Under this condition |λi z| < 1, for all i,
and there exist p constants, Ak , (|Ak | < K < ∞) such that
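$$\frac{1}{(1 - \lambda_1z)(1 - \lambda_2z)\ldots(1 - \lambda_pz)} = \sum_{k=1}^{p}\frac{A_k}{1 - \lambda_kz},$$
and
$$\lim_{M\to\infty}y_t = \sum_{k=1}^{p}A_k\sum_{s=0}^{\infty}\lambda_k^sw_{t-s} = \sum_{s=0}^{\infty}\left(\sum_{k=1}^{p}A_k\lambda_k^s\right)w_{t-s} = \sum_{s=0}^{\infty}\alpha_sw_{t-s} = \alpha(L)w_t,$$
where
$$\alpha_s = \sum_{k=1}^{p}A_k\lambda_k^s, \quad\text{and}\quad \alpha(L) = \sum_{s=0}^{\infty}\alpha_sL^s.$$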
It is also convenient to derive $\alpha_s$ directly in terms of $\phi_i$. Since $\alpha(L)$ is the inverse of $\phi(L)$, we
must have
$$\phi(L)\alpha(L) = \alpha(L)\phi(L) = 1.$$
Multiplying the two polynomials and equating the coefficients of the non-zero powers of $L$ to
zero we have
\begin{align*}
\alpha_0 &= 1\\
\alpha_1 &= \phi_1\\
\alpha_2 &= \alpha_1\phi_1 + \alpha_0\phi_2\\
&\ \ \vdots\\
\alpha_p &= \alpha_{p-1}\phi_1 + \alpha_{p-2}\phi_2 + \ldots + \alpha_0\phi_p,\\
\alpha_s &= \alpha_{s-1}\phi_1 + \alpha_{s-2}\phi_2 + \ldots + \alpha_{s-p}\phi_p, \quad \text{for } s = p + 1, p + 2, \ldots.
\end{align*}
Further details on solution of difference equations can be found in Agarwal (2000), and in
Hamilton (1994, Ch. 1).
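The recursion for $\alpha_s$ can be checked against the response of $y_t$ to a unit value of $w_t$ at a single date, obtained by iterating the difference equation directly. The sketch below uses a hypothetical stable second-order equation; the function names are assumptions.

```python
def ma_coefficients(phi, horizon):
    """Return alpha_0, ..., alpha_horizon from alpha_s = sum_i phi_i * alpha_{s-i}, alpha_0 = 1."""
    p = len(phi)
    alpha = [1.0]
    for s in range(1, horizon + 1):
        alpha.append(sum(phi[i - 1] * alpha[s - i] for i in range(1, min(s, p) + 1)))
    return alpha

def impulse_response(phi, horizon):
    """Trace the effect of w_0 = 1 (all other w_t = 0) by iterating the difference equation."""
    p = len(phi)
    lags = [0.0] * p                     # y_{-1}, ..., y_{-p} are set to zero
    path = []
    for t in range(horizon + 1):
        shock = 1.0 if t == 0 else 0.0
        current = sum(phi[i] * lags[i] for i in range(p)) + shock
        path.append(current)
        lags = [current] + lags[:-1]     # shift the lags by one period
    return path

phi = [0.5, 0.3]                         # hypothetical phi_1, phi_2; both roots lie inside the unit circle
alpha = ma_coefficients(phi, 10)
response = impulse_response(phi, 10)
```

The two sequences coincide term by term; for instance $\alpha_2 = \alpha_1\phi_1 + \alpha_0\phi_2 = 0.5 \times 0.5 + 0.3 = 0.55$.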
This appendix covers key concepts from probability theory and statistics that are used in the
book. We refer to Rao (1973), Billingsley (1995), Zwillinger and Kokoska (2000), Bierens
(2005) and Durrett (2010) for further details.
(i) If $A \in \mathcal{F}$ then $A^c \in \mathcal{F}$, where $A^c$ is the complement set of $A$, i.e., the set of all elements
of $\Omega$ not belonging to $A$.
(ii) If $A_j \in \mathcal{F}$, for $j = 1, 2, \ldots$, then $\cup_{j=1}^{\infty}A_j \in \mathcal{F}$.
We have
$$P(X \in A) = \sum_{x\in A}p(x).$$
A random variable $X$ is continuous if its set of possible values is an interval of numbers. The
probability density function, $f_X(x)$, of $X$ is a real-valued function such that
$$P(a \leq X \leq b) = \int_a^bf_X(x)\,dx. \tag{B.3}$$
The cumulative distribution function, $F_X(x)$, for a continuous random variable $X$ is defined by
$$F_X(x) = P(X \leq x) = \int_{-\infty}^{x}f_X(u)\,du. \tag{B.4}$$
If $X$ and $Y$ are continuous random variables, then for any subset $A$ containing values of $(X, Y)$,
the joint density function is defined via the double integral
$$P[(X, Y) \in A] = \iint_A f_{XY}(u, v)\,du\,dv. \tag{B.6}$$
We have:
The conditional density of $Y$ given that $X = x$, denoted as $f_{Y|X}(y|x)$, is given by
$$f_{Y|X}(y|x) = \frac{f_{XY}(x, y)}{f_X(x)}, \tag{B.8}$$
where $f_{XY}(x, y)$ is the joint probability density function of $(X, Y)$.
For any subset $A$ containing the values that the random variable may assume, we have
$$P[(X_1, X_2, \ldots, X_n) \in A] = \sum_{(x_1, x_2, \ldots, x_n)\in A}p(x_1, x_2, \ldots, x_n).$$
If $(X_1, X_2, \ldots, X_n)$ is an $n$-dimensional, continuous random variable, then the joint density function
is defined via the integral
$$P[(X_1, X_2, \ldots, X_n) \in A] = \int_A f_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n)\,dx_1\,dx_2\ldots dx_n.$$
Hence, under independence the joint probability is equal to the product of the two marginal
probability distributions. If $X$ and $Y$ are continuous with joint density function, $f_{XY}(x, y)$, then
independence occurs if and only if
$$f_{XY}(x, y) = f_X(x)f_Y(y). \tag{B.12}$$
We have:
and, equivalently, $f_{X|Y}(x|y) = f_X(x)$. See (B.8).
2. If X and Y are independent then f (X) and g(Y) are also independent, where f (.) and g (.)
are two functions.
In the case $X$ is a continuous random variable with probability density function, $f_X(x)$, the
expressions corresponding to (B.14)-(B.16) are
$$E(X) = \int_{-\infty}^{+\infty}xf_X(x)\,dx, \tag{B.17}$$
$$\mathrm{Var}(X) = \int_{-\infty}^{+\infty}[x - E(X)]^2f_X(x)\,dx, \tag{B.18}$$
$$\mu_r = \int_{-\infty}^{+\infty}[x - E(X)]^rf_X(x)\,dx, \tag{B.19}$$
provided that the integrals exist. The following properties for the expectation operator (we now
only focus on the continuous case) are easy to verify:
1. Let $g(\cdot)$ be a function. Then $E[g(X)] = \int_{-\infty}^{+\infty}g(x)f_X(x)\,dx$.
2. $E(a + bX) = a + bE(X)$, where $a$, $b$ are two constants.
3. $E(aX + bY) = aE(X) + bE(Y)$, for any two random variables $X$ and $Y$.
4. Let $g(\cdot)$ be a function. Then $\mathrm{Var}[g(X)] = \int_{-\infty}^{+\infty}\left\{g(x) - E[g(X)]\right\}^2f_X(x)\,dx$.
5. $\mathrm{Var}(a + bX) = b^2\mathrm{Var}(X)$.
6. $\mathrm{Var}(X) = E(X^2) - [E(X)]^2$.
Let X, Y be two random variables, the conditional expectation of Y given that X takes on the
particular value x is
$$E(Y|X = x) = \int_{-\infty}^{+\infty}y\,f_{Y|X}(y|x)\,dy, \tag{B.20}$$
where $f_{Y|X}(y|x)$ is the conditional density of $Y$ given that $X = x$ (see equation (B.8)).
The following propositions hold.
Proposition 49 (Law of iterated expectations) Let X and Y be two random variables, then we
have
E (Y) = E [E (Y |X )] . (B.21)
Using the law of iterated expectations it is possible to prove the following proposition.
Proposition 50 (Law of total variance) Let $Y$, $X$ be two random variables, and assume that the
variance of $Y$ is finite, then
$$\mathrm{Var}(Y) = E[\mathrm{Var}(Y|X)] + \mathrm{Var}[E(Y|X)]. \tag{B.22}$$
Two other measures are often used to describe a probability distribution. These are the coef-
ficients of skewness and the coefficient of kurtosis. The coefficient of skewness is a measure of
the asymmetry of a probability distribution and is defined as
$$b_1 = \frac{\mu_3}{[\mathrm{Var}(X)]^{3/2}}. \tag{B.23}$$
is used. In particular, this measure is adopted to characterize departures from the normal distribution,
which has excess kurtosis of zero.
The sign of the covariance indicates the direction of covariation of X and Y. The following prop-
erties of the covariance can be verified:
Since the magnitude of the covariance depends on the scale of measurement of the variables,
a preferable measure is the correlation coefficient
$$\rho_{XY} = \frac{\mathrm{Cov}(X, Y)}{[\mathrm{Var}(X)]^{1/2}[\mathrm{Var}(Y)]^{1/2}}. \tag{B.28}$$
If X and Y are independent then the expectation operator has the property E (XY) = E (X) E (Y).
As a consequence, two independent random variables satisfy
Cov(X, Y) = 0,
and, consequently, ρ XY = 0. It follows that for two independent random variables, X and Y,
we have
where Y and X are independent random variables with zero means and unit variances, and χ 2k is
a chi-squared random variate with k > 4 degrees of freedom, distributed independent of Y and
X. It is now easily seen that
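$$E(\tilde{Y}) = E\left[\left(\frac{k-2}{\chi_k^2}\right)^{1/2}Y\right] = E\left[\left(\frac{k-2}{\chi_k^2}\right)^{1/2}\right]E(Y) = 0,$$
$$E(\tilde{X}) = E\left[\left(\frac{k-2}{\chi_k^2}\right)^{1/2}X\right] = E\left[\left(\frac{k-2}{\chi_k^2}\right)^{1/2}\right]E(X) = 0,$$
and
$$E(\tilde{Y}\tilde{X}) = E\left[\left(\frac{k-2}{\chi_k^2}\right)YX\right] = E\left(\frac{k-2}{\chi_k^2}\right)E(Y)E(X) = 0,$$
which yields $\mathrm{Cov}(\tilde{Y}, \tilde{X}) = 0$. Yet $\tilde{Y}$ and $\tilde{X}$ are not independent. To see this note that
$$E(\tilde{Y}^2) = E\left[\left(\frac{k-2}{\chi_k^2}\right)Y^2\right] = E\left(\frac{k-2}{\chi_k^2}\right)E(Y^2) = 1.$$
The result $E\left(\frac{k-2}{\chi_k^2}\right) = 1$ follows since the first moment of the inverse-chi-squared distribution
is given by $E(1/\chi_k^2) = 1/(k-2) > 0$. Similarly, $E(\tilde{X}^2) = 1$. But
$$E(\tilde{Y}^2\tilde{X}^2) = E\left[\left(\frac{k-2}{\chi_k^2}\right)^2X^2Y^2\right] = E\left[\left(\frac{k-2}{\chi_k^2}\right)^2\right]E(X^2)E(Y^2).$$
Furthermore, using the results for the second-order moment of the inverse-chi-squared distribution
we have $E\left[\left(\frac{k-2}{\chi_k^2}\right)^2\right] = (k-2)/(k-4)$. Hence,
$$\mathrm{Cov}(\tilde{Y}^2, \tilde{X}^2) = E(\tilde{Y}^2\tilde{X}^2) - E(\tilde{Y}^2)E(\tilde{X}^2) = \frac{k-2}{k-4} - 1 = \frac{2}{k-4},$$
which is non-zero for any finite $k$. But $\mathrm{Cov}(Y^2, X^2) = 0$, since $Y$ and $X$ are independently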
distributed. In the case where Y and X are normally distributed, it follows that Ỹ and X̃ are dis-
tributed as a multi-variate t with k degrees of freedom. See also Section B.10.3.
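The result above can be illustrated by simulation. The following is a minimal sketch, assuming that Y and X are standard normal and using only the Python standard library; the function names, the choice k = 20, the seed and the number of draws are illustrative assumptions and do not come from the text. The sample covariance of the levels should be close to zero, while the sample covariance of the squares should be close to 2/(k − 4).

```python
import random
import statistics


def simulate_scaled_pair(k, n_draws, seed=0):
    """Draw (Y~, X~) = ((k-2)/chi2_k)^(1/2) * (Y, X) with Y, X ~ N(0, 1)."""
    rng = random.Random(seed)
    y_tilde, x_tilde = [], []
    for _ in range(n_draws):
        y = rng.gauss(0.0, 1.0)
        x = rng.gauss(0.0, 1.0)
        # A chi-squared variate with k degrees of freedom is Gamma(k/2, scale=2).
        chi2 = rng.gammavariate(k / 2.0, 2.0)
        scale = ((k - 2.0) / chi2) ** 0.5  # common scale factor creates dependence
        y_tilde.append(scale * y)
        x_tilde.append(scale * x)
    return y_tilde, x_tilde


def sample_cov(u, v):
    """Unbiased sample covariance of two equally long sequences."""
    mu_u, mu_v = statistics.fmean(u), statistics.fmean(v)
    return sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v)) / (len(u) - 1)


k = 20  # illustrative choice; the text requires k > 4
y_t, x_t = simulate_scaled_pair(k, n_draws=200_000)

cov_levels = sample_cov(y_t, x_t)
cov_squares = sample_cov([a * a for a in y_t], [b * b for b in x_t])
theory = 2.0 / (k - 4.0)

print(f"Cov of levels  : {cov_levels: .4f} (theory 0)")
print(f"Cov of squares : {cov_squares: .4f} (theory {theory:.4f})")
```

A moderately large k is used here so that the Monte Carlo estimate of the covariance of the squares has a finite sampling variance; for k close to 4 the estimate becomes very noisy.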
where $i = \sqrt{-1}$ is the imaginary number, and $\theta \in \mathbb{R}$.
The characteristic function of the sum of two independent random variables X and Y satisfies
Namely, the characteristic function of their sum is the product of their marginal characteristic
functions.
P (X = 1) = 1 − P (X = 0) = 1 − q = p. (B.31)
The Bernoulli distribution is used to describe an experiment, the so-called Bernoulli trial, where
the outcome is random and can be either of two possible outcomes, typically a ‘success’ and a
‘failure’. A sequence of Bernoulli trials is referred to as repeated trials.
Binomial
The random variable X has a binomial distribution with parameters n and p, denoted by $X \sim Bi(n, p)$, if
\[ P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \tag{B.32} \]
for k = 0, 1, 2, . . . , n, where
\[ \binom{n}{k} = \frac{n!}{k!\,(n-k)!}, \tag{B.33} \]
is the binomial coefficient. Expression (B.32) gives the probability of getting exactly k successes in n Bernoulli trials. Note that, for n = 1, X reduces to a Bernoulli random variable. The expected value and variance of $X \sim Bi(n, p)$ are
\[ E(X) = np, \quad \mathrm{Var}(X) = np(1-p). \]
Poisson
The random variable X has a Poisson distribution with parameter λ, denoted by $X \sim \text{Poisson}(\lambda)$, if
\[ P(X = k) = \frac{\lambda^k}{k!}\, e^{-\lambda}, \tag{B.34} \]
for k = 0, 1, 2, . . ., where λ is a positive real number. Let p(n) = λ/n for some positive λ. Then the binomial distribution $Bi(n, p(n))$ approaches the Poisson distribution with parameter λ as n → ∞; that is, a binomial distribution with a small success probability and a large number of draws behaves like a Poisson distribution. The expected value and variance of $X \sim \text{Poisson}(\lambda)$ are
\[ E(X) = \lambda, \quad \mathrm{Var}(X) = \lambda. \]
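The Poisson approximation to the binomial distribution can be checked numerically. The sketch below, which uses only the Python standard library, compares the two probability mass functions for p(n) = λ/n and increasing n; the function names, the value λ = 3 and the range of k are illustrative assumptions.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Bi(n, p), as in (B.32)."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), as in (B.34)."""
    return lam**k * math.exp(-lam) / math.factorial(k)


def max_pmf_gap(n, lam, k_max=20):
    """Largest absolute difference between the two pmfs over k = 0, ..., k_max."""
    p = lam / n  # success probability shrinks as n grows
    return max(
        abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(k_max + 1)
    )


lam = 3.0
gaps = {n: max_pmf_gap(n, lam) for n in (30, 300, 3000)}
for n, gap in gaps.items():
    print(f"n = {n:5d}: max |Bi - Poisson| = {gap:.6f}")
```

The printed gaps should shrink roughly in proportion to 1/n.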
Hence, the uniform distribution has constant probability within the interval [a, b]. The expected
value and variance of X ∼ U (a, b) are
\[ E(X) = \frac{1}{2}(a + b), \quad \mathrm{Var}(X) = \frac{1}{12}(b - a)^2. \]
Normal
The random variable X has a normal distribution (or Gaussian distribution), denoted by $X \sim N(\mu, \sigma^2)$, if its probability density function is
\[ f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]. \tag{B.36} \]
\[ E\left[(X - \mu)^4\right] = 3\sigma^4. \]
\[ f_X(x) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\, x^{(k/2)-1} e^{-x/2}, \tag{B.37} \]
for x ≥ 0, and k is a positive integer. In the expression above, $\Gamma(\cdot)$ denotes the gamma function, which, if n is a positive integer, is given by
\[ \Gamma(n) = (n-1)! = (n-1)\cdot(n-2)\cdot\ldots\cdot 2\cdot 1. \]
\[ \Gamma(z) = \int_0^{\infty} t^{z-1} e^{-t}\, dt. \tag{B.38} \]
1. If two independent random variables $X_1$ and $X_2$ have $\chi^2_{n_1}$ and $\chi^2_{n_2}$ distributions, respectively, then $(X_1 + X_2) \sim \chi^2_{n_1+n_2}$.
2. Given three random variables $X$, $X_1$ and $X_2$, such that $X = X_1 + X_2$, $X \sim \chi^2_n$, $X_1 \sim \chi^2_{n_1}$, for $n_1 < n$, then $X_2$ is independently distributed of $X_1$, and $X_2 \sim \chi^2_{n_2}$, with $n = n_1 + n_2$.
3. If $X_1, X_2, \ldots, X_n$ are independent and $N(0, \sigma^2)$, then $\sum_{i=1}^{n} X_i^2/\sigma^2 \sim \chi^2_n$.
A related distribution is the non-central chi-square distribution, which often arises in the power analysis of statistical tests. Denoted by $\chi^2_k(\lambda)$, its probability density function is
\[ f_X(x) = \frac{e^{-(x+\lambda)/2}}{2^{k/2}} \sum_{j=0}^{\infty} \frac{x^{(k/2)+j-1}\,\lambda^j}{\Gamma\left(\frac{k}{2}+j\right) 2^{2j}\, j!}, \]
for x > 0, where k is the number of degrees of freedom, and λ > 0 is the non-centrality parameter.
Student’s t
The random variable X has a Student's t-distribution (or simply t-distribution) with ν degrees of freedom, denoted by $X \sim t_\nu$, if its probability density function is
\[ f_X(x) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}}, \tag{B.39} \]
where $\Gamma(\cdot)$ is the gamma function and ν is a positive integer. Let $Z \sim N(0, 1)$ and $V \sim \chi^2_k$, with Z and V independent, then
\[ \frac{Z}{\sqrt{V/k}} \sim t_k. \]
Fisher–Snedecor or F-distribution
The random variable X has a central F-distribution (also known as the Fisher–Snedecor distribution) with $d_1$ and $d_2$ degrees of freedom, denoted by $X \sim F(d_1, d_2)$, if its probability density function is
\[ f_X(x) = \frac{1}{B\left(\frac{d_1}{2}, \frac{d_2}{2}\right)} \left(\frac{d_1}{d_2}\right)^{\frac{d_1}{2}} x^{\frac{d_1}{2}-1} \left(1 + \frac{d_1}{d_2}\,x\right)^{-\frac{d_1+d_2}{2}}, \tag{B.40} \]
for x ≥ 0, where $d_1$ and $d_2$ are positive integers and $B(\cdot, \cdot)$ is the beta function defined by
\[ B(x, y) = \int_0^1 t^{x-1}(1-t)^{y-1}\, dt. \]
\[ E(X) = \frac{d_2}{d_2 - 2}, \quad \text{for } d_2 > 2, \]
\[ \mathrm{Var}(X) = \frac{2 d_2^2\,(d_1 + d_2 - 2)}{d_1 (d_2 - 2)^2 (d_2 - 4)}, \quad \text{for } d_2 > 4. \]
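\[ \frac{X_1/d_1}{X_2/d_2} \sim F(d_1, d_2). \]
\[ f_X(x) = \frac{e^{-\lambda/2}\, d_1^{d_1/2}\, d_2^{d_2/2}\, x^{\frac{1}{2}(d_1-2)}\, (d_1 x + d_2)^{-\frac{1}{2}(d_1+d_2)}}{B\left(\frac{d_1}{2}, \frac{d_2}{2}\right)} \cdot {}_1F_1\left(\frac{d_1+d_2}{2}, \frac{d_1}{2}, \frac{d_1 \lambda x}{2(d_1 x + d_2)}\right), \]
for x > 0, λ > 0, where ${}_1F_1$ is the generalized hypergeometric function (see Zwillinger and Kokoska (2000) for details). If $X_1 \sim \chi^2_{d_1}(\lambda)$, and $X_2 \sim \chi^2_{d_2}$, with $X_1$ and $X_2$ independent, then the random variable
\[ \frac{X_1/d_1}{X_2/d_2}, \]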
\[ E(X_i) = k p_i, \quad \mathrm{Var}(X_i) = k p_i (1 - p_i), \quad \mathrm{Cov}(X_i, X_j) = -k p_i p_j, \ \text{for } i \neq j. \]
If n = 2 and p1 = p, then the multinomial corresponds to the binomial random variable with
parameters k and p.
Multivariate normal
The n-dimensional vector of random variables, $X = (X_1, X_2, \ldots, X_n)'$, has a multivariate normal distribution, denoted by $X \sim N(\mu, \Sigma)$, if its probability density function is
\[ f_X(x) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}} \exp\left[-\frac{1}{2}(x - \mu)'\Sigma^{-1}(x - \mu)\right], \tag{B.42} \]
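1. If $X \sim N(\mu, \Sigma)$, then its individual marginal distributions are univariate normals.
2. Suppose $X = (X_1', X_2')'$, with $X \sim N(\mu, \Sigma)$. If $X_1$ and $X_2$ are uncorrelated, that is, $\mathrm{Cov}(X_1, X_2) = 0$, then $X_1$ and $X_2$ are independent.
3. Suppose $X = (X_1', X_2')'$, with $X \sim N(\mu, \Sigma)$, and $X_1$ and $X_2$ being two vectors of dimension $n_1$ and $n_2$, respectively. Partition μ and Σ accordingly, as follows:
\[ \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}, \]
\[ \Sigma_c = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}. \tag{B.44} \]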
Multivariate Student’s t
The n-dimensional vector of random variables, X, has a multivariate t-distribution with parameters $v$, μ and Σ, written as $X \sim t_v(\mu, \Sigma, n)$, if its probability density function is
\[ f_X(x) = \frac{\Gamma\left(\frac{n+v}{2}\right)}{\Gamma\left(\frac{v}{2}\right)(v\pi)^{n/2}}\, |S|^{-1/2} \left[1 + \frac{1}{v}(x - \mu)'S^{-1}(x - \mu)\right]^{-(n+v)/2}, \tag{B.45} \]
Proposition 51 Let X be an n-dimensional random vector with $X \sim N(0, \Sigma)$, and let $q = X'AX$, where A is a symmetric n × n matrix. Then $q \sim \chi^2_k$ if and only if A has k eigenvalues equal to 1, the rest being zero.
Proposition 52 Let X be an n-dimensional random vector with $X \sim N(0, \Sigma)$, and let $q = X'AX$, where A is a symmetric n × n matrix. Then $q \sim \chi^2_k$ if and only if
From the above propositions it follows that if $X \sim N(0, \Sigma)$ and A is an idempotent matrix then $X'AX \sim \chi^2_k$, with $k = \mathrm{Tr}(A)$.
Proposition 53 Let X be an n-dimensional random vector with $X \sim N(\mu, \Sigma)$, and let $q = X'AX$, where A is a symmetric n × n matrix. Then $q \sim \chi^2_k(\lambda)$ if and only if
Proposition 54 Let X be an n-dimensional random vector with $X \sim N(\mu, \Sigma)$, and let $q_1 = X'AX$, $q_2 = X'BX$, where A, B are two symmetric n × n matrices. Then $q_1$ and $q_2$ are independently distributed if and only if:
(i) $\Sigma A \Sigma B \Sigma = 0$.
(ii) $\Sigma A \Sigma B \mu = \Sigma B \Sigma A \mu = 0$.
(iii) $\mu' A \Sigma B \mu = 0$.
From the above theorem it follows that, if $X \sim N(0, I_n)$, and $M_1$ and $M_2$ are two idempotent matrices, then $X'M_1X$ and $X'M_2X$ are two independent central chi-square random variables if $M_1M_2 = 0$, or equivalently if $M_2M_1 = 0$. For further details, see Styan (1970).
The following theorem is due to Cochran (1934).
Cochran’s theorem has been widely investigated in the literature due to its importance in the
distribution theory for quadratic forms in normal random variables and in the analysis of vari-
ance. The following theorem extends Cochran’s theorem to the case of multivariate normal ran-
dom variables with non-diagonal covariance matrix, possibly singular.
Proposition 56 Let X be an n-dimensional random vector with $X \sim N(\mu, \Sigma)$. Further, let $q = X'AX$ and $q_i = X'A_iX$ be quadratic forms such that $q = \sum_{i=1}^{k} q_i$, $r = \mathrm{rank}(A)$ and $r_i = \mathrm{rank}(A_i)$, with $A_1, A_2, \ldots, A_k$ being n × n symmetric matrices, and $A = \sum_{i=1}^{k} A_i$. Consider the following statements:
(a) $q \sim \chi^2_r(\mu'A\mu)$.
(b) $q_i \sim \chi^2_{r_i}(\mu'A_i\mu)$.
(c) $q_i$ and $q_j$ are independently distributed, for $i \neq j = 1, 2, \ldots, k$.
(d) $r = \sum_{i=1}^{k} r_i$.
\[ \Pr(|X - \mu| \geq \lambda\sigma) \leq \frac{1}{\lambda^2}. \tag{B.46} \]
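Proof We have
\[ \sigma^2 = \int_{-\infty}^{+\infty} (x-\mu)^2\, dF_X(x) \]
\[ = \int_{-\infty}^{\mu-\lambda\sigma} (x-\mu)^2\, dF_X(x) + \int_{\mu-\lambda\sigma}^{\mu+\lambda\sigma} (x-\mu)^2\, dF_X(x) + \int_{\mu+\lambda\sigma}^{+\infty} (x-\mu)^2\, dF_X(x) \]
\[ \geq \int_{-\infty}^{-\lambda\sigma} (x-\mu)^2\, dF_X(x-\mu) + \int_{\lambda\sigma}^{+\infty} (x-\mu)^2\, dF_X(x-\mu). \]
Noting that
\[ \int_{-\infty}^{-\lambda\sigma} (x-\mu)^2\, dF_X(x-\mu) \geq \lambda^2\sigma^2 \int_{-\infty}^{-\lambda\sigma} dF_X(x-\mu), \]
and
\[ \int_{\lambda\sigma}^{+\infty} (x-\mu)^2\, dF_X(x-\mu) \geq \lambda^2\sigma^2 \int_{\lambda\sigma}^{\infty} dF_X(x-\mu), \]
we have
\[ \sigma^2 \geq \lambda^2\sigma^2 \left[\int_{-\infty}^{-\lambda\sigma} dF_X(x-\mu) + \int_{\lambda\sigma}^{\infty} dF_X(x-\mu)\right] \geq \lambda^2\sigma^2 \Pr(|X-\mu| \geq \lambda\sigma). \]
Hence
\[ \Pr(|X-\mu| \geq \lambda\sigma) \leq \frac{1}{\lambda^2}, \]
or
\[ \Pr(|X-\mu| \geq \varepsilon) \leq \frac{\sigma^2}{\varepsilon^2}, \]
if we set ε = λσ.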
When X has finite sth-order absolute moments (s > 0) we have the following generalization of Chebyshev's inequality:
\[ \Pr(|X-\mu| \geq \varepsilon) \leq \frac{E\{|X-\mu|^s\}}{\varepsilon^s}. \]
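Chebyshev's inequality can be verified exactly for a simple discrete distribution. The sketch below uses a fair six-sided die, which is an illustrative assumption and not an example taken from the text, and exact rational arithmetic from the Python standard library to compare the tail probability Pr(|X − μ| ≥ ε) with the bound σ²/ε².

```python
from fractions import Fraction

# Illustrative distribution: a fair six-sided die.
outcomes = range(1, 7)
prob = Fraction(1, 6)

mu = sum(prob * x for x in outcomes)               # mean of X
var = sum(prob * (x - mu) ** 2 for x in outcomes)  # variance of X


def tail_prob(eps):
    """Exact Pr(|X - mu| >= eps)."""
    return sum(prob for x in outcomes if abs(x - mu) >= eps)


def chebyshev_bound(eps):
    """Upper bound sigma^2 / eps^2 from Chebyshev's inequality."""
    return var / eps**2


checks = [
    (eps, tail_prob(eps), chebyshev_bound(eps))
    for eps in (Fraction(3, 2), Fraction(2), Fraction(5, 2))
]
for eps, tail, bound in checks:
    print(f"eps = {eps}: Pr = {tail} <= bound = {bound}")
```

The bound is not tight here; for ε = 3/2 it even exceeds one, which illustrates that the inequality is informative only when ε is large relative to σ.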
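\[ |E(XY)| \leq E(|XY|) \leq \left[E(X^2)\right]^{1/2} \left[E(Y^2)\right]^{1/2}. \tag{B.47} \]
Proof Consider the linear combination of X and Y given by aX + bY, where a, b are two non-zero constants. We have
\[ E(aX + bY)^2 = a^2 E(X^2) + b^2 E(Y^2) + 2ab\,E(XY) \geq 0. \]
Setting $a = [E(Y^2)]^{1/2}$ and $b = \mp[E(X^2)]^{1/2}$ and rearranging gives $\pm E(XY) \leq [E(X^2)]^{1/2}[E(Y^2)]^{1/2}$. Hence we have
\[ |E(XY)| \leq \left[E(X^2)\right]^{1/2} \left[E(Y^2)\right]^{1/2}. \]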
Proposition 59 Let X and Y be two random variables such that $E(|X|^p) < \infty$ and $E(|Y|^q) < \infty$, where $1 < p < \infty$, and $1 < q < \infty$, with $\frac{1}{p} + \frac{1}{q} = 1$. Then
\[ E(XY) \leq \left[E(|X|^p)\right]^{1/p} \left[E(|Y|^q)\right]^{1/q}. \tag{B.48} \]
E (|X|) < ∞, P (X ∈ ) = 1,
\[ E\,|f(X)| < \infty, \]
then
Proof Consider the following mean value expansion of f(X) around E(X) = μ:
\[ f(X) = f(\mu) + (X - \mu)\, f'(\mu) + \frac{1}{2}(X - \mu)^2 f''(\bar{X}), \]
where the random variable $\bar{X}$ lies in the range (X, μ). Then
\[ E[f(X)] = f(\mu) + \frac{1}{2} E\left[(X - \mu)^2 f''(\bar{X})\right]. \]
Since f(X) is convex then $f''(\bar{X}) \geq 0$, and hence
\[ E[f(X)] \geq f[E(X)]. \]
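Jensen's inequality is easy to check numerically for a convex function. The sketch below uses the exponential function and a three-point discrete distribution; both are illustrative assumptions rather than examples from the text.

```python
import math

# Illustrative three-point distribution for X.
values = [-1.0, 0.0, 2.0]
probs = [0.3, 0.5, 0.2]


def expectation(g):
    """E[g(X)] for the discrete distribution above."""
    return sum(p * g(x) for x, p in zip(values, probs))


mean_x = expectation(lambda x: x)   # E(X)
lhs = expectation(math.exp)         # E[f(X)] with f = exp (convex)
rhs = math.exp(mean_x)              # f[E(X)]
jensen_gap = lhs - rhs              # non-negative by Jensen's inequality

print(f"E[f(X)] = {lhs:.4f}, f[E(X)] = {rhs:.4f}, gap = {jensen_gap:.4f}")
```

The gap is strictly positive here because the exponential function is strictly convex and X is not degenerate.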
(i) b (0) = 0,
(ii) For any dates 0 ≤ a1 ≤ a2 ≤ . . . ≤ ak ≤ 1 the changes [b (a2 ) − b (a1 )] ,
[b (a3 ) − b (a2 )] , . . . , [b (ak ) − b (ak−1 )] are independent multivariate Gaussian with
b (a) − b (s) ∼ N(0, a − s),
(iii) For any given realization, b (a) is continuous in a with probability 1.
Other continuous time processes can be generated from the standard Brownian motion. For
example, a Brownian motion with variance σ 2 can be obtained as
(i) b(0) = 0,
(ii) For any dates 0 ≤ a1 ≤ a2 ≤ . . . ≤ ak ≤ 1 the changes [b (a2 ) − b (a1 )] ,
[b (a3 ) − b (a2 )] , . . . , [b (ak ) − b (ak−1 )] are independent multivariate Gaussian with
b(a) −b(s) ∼ N(0, (a − s) Im ),
(iii) For any given realization, b(a) is continuous in a with probability 1.
\[ w(a) = \Sigma^{1/2}\, b(a), \tag{B.51} \]
\[ s_{[Ta]} = \sum_{j=1}^{[Ta]} u_j, \]
where [Ta] is the largest integer part of Ta, and $u_t = (u_{1t}, u_{2t}, \ldots, u_{mt})'$ is an m × 1 random vector satisfying:
(i) $E(u_t\,|\,\Omega_{t-1}) = 0$, and $\mathrm{Var}(u_t\,|\,\Omega_{t-1}) = \Sigma$ for all t, where $\Omega_{t-1}$ is a non-decreasing information set, and Σ is a positive definite symmetric matrix.
(ii) $\sup_t E(\|u_t\|^s) < \infty$, for some s > 2.
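The following results hold
\[ T^{-3/2} \sum_{t=1}^{T} s_t \Rightarrow \int_0^1 w(a)\, da, \tag{B.53} \]
\[ T^{-2} \sum_{t=1}^{T} s_t s_t' \Rightarrow \int_0^1 w(a)\, w'(a)\, da, \tag{B.54} \]
\[ \frac{1}{T} \sum_{t=1}^{T} u_t s_{t-1} \Rightarrow \int_0^1 w(a)\, dw(a), \tag{B.55} \]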
See Phillips and Durlauf (1986) for the proof of the above results.
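Result (B.53) can be illustrated by Monte Carlo simulation in the scalar case. The sketch below assumes m = 1 and i.i.d. standard normal innovations, so that w(a) is a standard Brownian motion; it also relies on the standard fact, not stated in the text, that the integral of a standard Brownian motion over [0, 1] is distributed as N(0, 1/3). The function name, T, the number of replications and the seed are illustrative assumptions.

```python
import random
import statistics


def scaled_sum_of_partial_sums(T, rng):
    """Return T^(-3/2) * sum_{t=1}^T s_t, where s_t = u_1 + ... + u_t."""
    s = 0.0
    total = 0.0
    for _ in range(T):
        s += rng.gauss(0.0, 1.0)  # update the partial sum s_t
        total += s
    return total / T**1.5


rng = random.Random(12345)
T, reps = 200, 4000
draws = [scaled_sum_of_partial_sums(T, rng) for _ in range(reps)]

sample_mean = statistics.fmean(draws)
sample_var = statistics.variance(draws)

print(f"mean     = {sample_mean: .4f} (limit 0)")
print(f"variance = {sample_var: .4f} (limit 1/3)")
```

For finite T the exact variance of the statistic is (T + 1)(2T + 1)/(6T²), which is already within about 0.003 of 1/3 when T = 200.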
C.1 Introduction
The statistical approach adopted in this volume is primarily classical, but a Bayesian approach
is also considered in the analysis of DSGE models, forecast combination and panel data
modelling. This appendix provides an overview of the Bayesian approach and formally intro-
duces the Bayesian concepts and results used in various parts of the book. A full treatment
of Bayesian analysis can be found in Geweke (2005), Greenberg (2013), Koop (2003), and
Geweke, Koop, and van Dijk (2011).
\[ P(A \cap B) = P(A)\,P(B\,|\,A) = P(B \cap A) = P(B)\,P(A\,|\,B). \]
Hence
\[ P(A\,|\,B) = \frac{P(A)\,P(B\,|\,A)}{P(B)}. \tag{C.1} \]
Bayes theorem provides a rule for updating the probability of an event (such as A) in the light of
observing another event such as B . The theorem is named after Reverend Thomas Bayes whose
work was posthumously published in 1763 by the Royal Society as ‘An Essay towards solving a
Problem in the Doctrine of Chances’, Philosophical Transactions (1683–1775).
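As a small numerical illustration of (C.1), the sketch below updates the probability of an event A after observing B. The function name and the numerical inputs are hypothetical, and P(B) is obtained from the law of total probability, which is a standard result rather than part of the text above.

```python
def posterior_probability(prior_a, prob_b_given_a, prob_b_given_not_a):
    """P(A | B) from Bayes' theorem, with P(B) from the law of total probability."""
    prob_b = prior_a * prob_b_given_a + (1.0 - prior_a) * prob_b_given_not_a
    return prior_a * prob_b_given_a / prob_b


# Hypothetical inputs: P(A) = 0.01, P(B | A) = 0.95, P(B | not A) = 0.05.
p_a_given_b = posterior_probability(0.01, 0.95, 0.05)
print(f"P(A | B) = {p_a_given_b:.4f}")
```

Even though B is much more likely under A than under its complement, the updated probability of A remains modest because the prior probability P(A) is small.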
$\hat{\theta}_T$ is then derived by minimizing the risk function, $E_\theta[L(\theta, \hat{\theta}_T)]$, where the expectation, $E_\theta(\cdot)$, is taken with respect to the posterior distribution, $\pi(\theta\,|\,y)$. Under a quadratic loss function the Bayes estimator of θ is given by the mean of the posterior distribution, namely
\[ \hat{\theta}_T = \int \theta\, \pi(\theta\,|\,y)\, d\theta. \]
Other Bayes estimates, such as mode or the median of the posterior distribution, can also be
motivated using other loss functions.
When the focus of the analysis is on one of the elements of θ, say $\theta_1$, the marginal posterior distribution is considered. For $\theta_1$ the marginal posterior distribution is obtained by integrating out all the other elements of θ from $\pi(\theta\,|\,y)$, namely
\[ \pi(\theta_1\,|\,y) = \int \pi(\theta\,|\,y)\, d\theta_2\, d\theta_3 \ldots d\theta_p. \]
\[ h(\theta_1\,|\,y) = \frac{1}{\mathrm{Var}(\theta_1\,|\,y)}. \]
Computations of $E(\theta_1\,|\,y)$ and $h(\theta_1\,|\,y)$ are often quite complicated and time consuming, particularly when p is relatively large. But, thanks to recent advances in computing technology, such
computations are carried out reasonably fast using Markov Chain Monte Carlo (MCMC) simu-
lation techniques such as Metropolis–Hastings and Gibbs algorithms. An overview of alternative
Monte Carlo techniques used in the literature is provided by Chib (2011). A more accessible
textbook account is given in Greenberg (2013, Ch. 7).
C.3.1 Identification
In the case where θ is identified, namely $f(y; \theta_1) = f(y; \theta_2)$ if and only if $\theta_1 = \theta_2$, the posterior distribution is dominated by the likelihood function, and the precision of the individual elements of θ rises with T. This follows since $\ln \pi(\theta)$ is fixed as T → ∞, but $\ln f(y\,|\,\theta)$ rises with T when θ is identified. Bayesian inference in the case of non-identified or weakly identified parameters is discussed in Koop, Pesaran, and Smith (2013), where it is shown that if $\theta_1$ is non-identified then its precision does not rise with T, and $\lim_{T\to\infty} T^{-1} h(\theta_1\,|\,y) = 0$.
\[ f(y\,|\,\theta) = \theta^{\sum_{i=1}^{N} y_i}\, (1-\theta)^{N - \sum_{i=1}^{N} y_i}, \quad 0 \leq \theta \leq 1. \]
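Using the beta-distributed prior ($\Gamma(\alpha)$ denotes a gamma function defined by (B.38))
\[ \pi(\theta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, \theta^{\alpha-1} (1-\theta)^{\beta-1}, \quad \text{for } \alpha, \beta > 0, \]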
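where $\pi(\theta\,|\,y)$ is the posterior probability distribution of θ, defined by (C.2). This result can be obtained by application of the following Bayes rule
\[ f(y_{T+1}, y, \theta\,|\,M) = \pi(\theta\,|\,M)\, f(y\,|\,\theta, M)\, f(y_{T+1}\,|\,y, \theta, M), \]
and applying the Bayes rule now to $f(y_{T+1}, y)$ conditional on model M, we have
\[ f(y_{T+1}\,|\,y, M) = \frac{f(y_{T+1}, y\,|\,M)}{f(y\,|\,M)} = \frac{\int \pi(\theta\,|\,M)\, f(y\,|\,\theta, M)\, f(y_{T+1}\,|\,y, \theta, M)\, d\theta}{f(y\,|\,M)} \]
\[ = \int \frac{\pi(\theta\,|\,M)\, f(y\,|\,\theta, M)}{f(y\,|\,M)}\, f(y_{T+1}\,|\,y, \theta, M)\, d\theta \]
\[ = \int \pi(\theta\,|\,y, M)\, f(y_{T+1}\,|\,\theta, y, M)\, d\theta, \]
as required.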
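For Bernoulli observations with a beta prior the predictive integral can be evaluated both numerically and in closed form. The sketch below assumes the standard conjugate result that a Beta(α, β) prior combined with s successes in N Bernoulli draws gives a Beta(α + s, β + N − s) posterior, so that the predictive probability of a success is (α + s)/(α + β + N). The data, the prior parameters and the function names are hypothetical.

```python
import math


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_pdf(theta, a, b):
    """Density of the Beta(a, b) distribution at theta in (0, 1)."""
    log_pdf = (a - 1.0) * math.log(theta) + (b - 1.0) * math.log(1.0 - theta)
    return math.exp(log_pdf - log_beta(a, b))


def predictive_prob_success(y, alpha, beta, grid=20000):
    """Numerically integrate theta * pi(theta | y) over (0, 1) by the midpoint rule."""
    s, n = sum(y), len(y)
    a_post, b_post = alpha + s, beta + n - s  # conjugate posterior parameters
    h = 1.0 / grid
    return h * sum(
        (i + 0.5) * h * beta_pdf((i + 0.5) * h, a_post, b_post) for i in range(grid)
    )


y = [1, 0, 1, 1, 0, 1, 1, 1]  # hypothetical Bernoulli sample
alpha, beta = 2.0, 2.0        # hypothetical prior parameters

numeric = predictive_prob_success(y, alpha, beta)
closed_form = (alpha + sum(y)) / (alpha + beta + len(y))

print(f"numerical integral = {numeric:.6f}, closed form = {closed_form:.6f}")
```

Here the predictive density of the next observation given θ is simply θ, so the predictive probability of a success coincides with the posterior mean of θ.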
The posterior predictive distribution can also be extended to allow for multiple models. See
Section 17.9.
Note that $\pi(\theta_1\,|\,M_1)\, f(y\,|\,\theta_1, M_1) = \pi(\theta_1\,|\,y, M_1)$ is the posterior probability distribution of $\theta_1$ under $M_1$. $P(y\,|\,M_1)$ is also known as the 'marginal likelihood' of model $M_1$, and can be viewed as the expected value of the likelihood with respect to the prior distribution. It can also be viewed as an 'average likelihood' where the averaging is carried out with respect to the priors. Similarly
\[ P(y\,|\,M_2) = \int \pi(\theta_2\,|\,M_2)\, f(y\,|\,\theta_2, M_2)\, d\theta_2 = \int \pi(\theta_2\,|\,y, M_2)\, d\theta_2. \]
Also, since $M_1$ and $M_2$ are assumed to be mutually exclusive and exhaustive, in the sense that one or the other model holds, we have
\[ P(y) = \pi(M_1) \int \pi(\theta_1\,|\,y, M_1)\, d\theta_1 + \pi(M_2) \int \pi(\theta_2\,|\,y, M_2)\, d\theta_2. \]
Finally, the posterior ratio of model $M_1$ to model $M_2$ (also known as the 'posterior odds' ratio) is given by
\[ \frac{P(M_1\,|\,y)}{P(M_2\,|\,y)} = \frac{\pi(M_1)}{\pi(M_2)} \times \frac{\int \pi(\theta_1\,|\,M_1)\, f(y\,|\,\theta_1, M_1)\, d\theta_1}{\int \pi(\theta_2\,|\,M_2)\, f(y\,|\,\theta_2, M_2)\, d\theta_2}. \]
In words, the posterior odds ratio of model $M_1$ to model $M_2$ is equal to the prior odds ratio multiplied by the ratio of the marginal likelihoods, also known as the 'Bayes factor'.
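The posterior odds calculation can be sketched for two models of the same Bernoulli sample that differ only in their (proper) beta priors. The closed-form marginal likelihood B(α + s, β + N − s)/B(α, β) used below is the standard beta-Bernoulli result; the two priors, the data summary and the function names are hypothetical.

```python
import math


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_marginal_likelihood(s, n, alpha, beta):
    """log P(y | M) for s successes in n Bernoulli draws under a Beta(alpha, beta) prior."""
    return log_beta(alpha + s, beta + n - s) - log_beta(alpha, beta)


def posterior_odds(s, n, prior1, prior2, prior_odds=1.0):
    """Posterior odds of M1 to M2 = prior odds times the Bayes factor."""
    log_bayes_factor = log_marginal_likelihood(s, n, *prior1) - log_marginal_likelihood(
        s, n, *prior2
    )
    return prior_odds * math.exp(log_bayes_factor)


# Hypothetical setting: 9 successes in 10 draws.
# M1 uses a uniform Beta(1, 1) prior; M2 uses a Beta(50, 50) prior centred on 0.5.
odds = posterior_odds(s=9, n=10, prior1=(1.0, 1.0), prior2=(50.0, 50.0))
print(f"posterior odds of M1 to M2 = {odds:.3f}")
```

With equal prior model probabilities the posterior odds equal the Bayes factor, and the data favour the diffuse prior because 9 successes out of 10 are unlikely when θ is concentrated around 0.5.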
It is important to note that the Bayes factor is only well defined when the priors π (θ 1 |M1 ) and
π (θ 2 |M2 ) are proper.
For large values of the sample size T, the logarithm of the posterior odds will be dominated by
the Bayes factor. Under standard regularity conditions, and assuming that θ i is identified under
Mi , we obtain the familiar Schwarz model selection criterion
\[ \ln P(M_1\,|\,y) - \ln P(M_2\,|\,y) = \ln f(y\,|\,\hat{\theta}_{1,ML}, M_1) - \ln f(y\,|\,\hat{\theta}_{2,ML}, M_2) - \frac{1}{2}(p_1 - p_2) \ln T + O(1), \]
y = Xβ + u,
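The conjugate priors for the regression model are the inverse gamma distribution for $\sigma^2$, and the normal distribution for $\beta\,|\,\sigma^2$. More specifically,
\[ \pi(\sigma^2) = \left(\frac{1}{\sigma^2}\right)^{(\underline{a}/2)+1} \exp\left(\frac{-\underline{d}}{2\sigma^2}\right), \]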
1 For further details of the regression model and the underlying assumptions see Section 2.2.
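and
\[ \pi(\beta\,|\,\sigma^2) = (2\pi\sigma^2)^{-k/2}\, |\underline{H}|^{1/2} \exp\left[-\frac{1}{2\sigma^2}(\beta - \underline{b})'\underline{H}(\beta - \underline{b})\right], \]
where $\underline{a}$ and $\underline{d}$ are the prior hyperparameters of the inverse-gamma distribution, and $\underline{b}$ and $\sigma^{-2}\underline{H}$ are the prior mean and precision of $\beta\,|\,\sigma^2$. Recall that $\sigma^2\underline{H}^{-1}$ is the prior variance of $\beta\,|\,\sigma^2$. Combining the above results we have
\[ \pi(\theta\,|\,y, X) = (2\pi\sigma^2)^{-(T+k)/2}\, |\underline{H}|^{1/2} \left(\frac{1}{\sigma^2}\right)^{\underline{a}/2+1} \exp\left(\frac{-\underline{d}}{2\sigma^2}\right) \tag{C.3} \]
\[ \times \exp\left[-\frac{1}{2\sigma^2}(\beta - \underline{b})'\underline{H}(\beta - \underline{b}) - \frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)\right]. \]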
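Hence
\[ (\beta - \underline{b})'\underline{H}(\beta - \underline{b}) + (y - X\beta)'(y - X\beta) = \hat{u}'\hat{u} + (\beta - \hat{\beta})'X'X(\beta - \hat{\beta}) + (\beta - \underline{b})'\underline{H}(\beta - \underline{b}). \]
The term $\hat{u}'\hat{u}$ does not depend on θ and can be ignored. Further
\[ (\beta - \hat{\beta})'X'X(\beta - \hat{\beta}) + (\beta - \underline{b})'\underline{H}(\beta - \underline{b}) = \hat{\beta}'X'X\hat{\beta} + (\beta - \bar{\beta})'\bar{H}(\beta - \bar{\beta}), \tag{C.4} \]
where
\[ \bar{\beta} = (X'X + \underline{H})^{-1}(X'X\hat{\beta} + \underline{H}\,\underline{b}), \tag{C.5} \]
and
\[ \bar{H} = X'X + \underline{H}. \tag{C.6} \]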
In the case where $\sigma^2$ is known or when the analysis is done conditional on $\sigma^2$ (the case of conditional conjugate priors), it readily follows that the distribution of $\beta\,|\,\sigma^2$ is $N(\bar{\beta}, \sigma^2\bar{H}^{-1})$, where $\bar{\beta}$ is the posterior mean and $\bar{H}$ is the posterior precision of β. It is easily seen that $\bar{\beta}$ is a matrix weighted average of the OLS estimator, $\hat{\beta}$, and the prior mean, $\underline{b}$. The weights
\[ W_{OLS} = (X'X + \underline{H})^{-1} X'X, \quad \text{and} \quad W_{Prior} = (X'X + \underline{H})^{-1} \underline{H}, \]
add up to $I_k$. In the case where the regression coefficients are identified and $T^{-1}X'X \to_p \Sigma_{xx} > 0$, then
\[ W_{Prior} = (T^{-1}X'X + T^{-1}\underline{H})^{-1} (T^{-1}\underline{H}) \to_p 0, \]
and
\[ W_{OLS} = (T^{-1}X'X + T^{-1}\underline{H})^{-1} (T^{-1}X'X) \to_p I_k, \]
and $\bar{\beta} - \hat{\beta} \to_p 0$.
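The weighted-average interpretation of the posterior mean in (C.5) can be sketched for a regression with a single regressor (k = 1), where the matrices reduce to scalars. The data, the prior mean and the prior precision below are hypothetical, and the function name is an illustrative assumption.

```python
def posterior_mean_scalar(x, y, prior_mean, prior_h):
    """Posterior mean of beta in y = x * beta + u with one regressor, as in (C.5)."""
    sxx = sum(xi * xi for xi in x)                # X'X
    sxy = sum(xi * yi for xi, yi in zip(x, y))    # X'y
    beta_ols = sxy / sxx                          # OLS estimator
    h_bar = sxx + prior_h                         # posterior precision factor, (C.6)
    w_ols = sxx / h_bar                           # weight on the OLS estimator
    w_prior = prior_h / h_bar                     # weight on the prior mean
    beta_bar = w_ols * beta_ols + w_prior * prior_mean
    return beta_ols, beta_bar, w_ols, w_prior


# Hypothetical data and prior.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
beta_ols, beta_bar, w_ols, w_prior = posterior_mean_scalar(x, y, prior_mean=0.0, prior_h=10.0)
print(f"OLS = {beta_ols:.4f}, posterior mean = {beta_bar:.4f}, weights = ({w_ols:.2f}, {w_prior:.2f})")

# Repeating the sample 100 times mimics a larger T: the prior weight shrinks towards zero.
_, beta_bar_big, _, w_prior_big = posterior_mean_scalar(x * 100, y * 100, 0.0, 10.0)
print(f"with 100 times more data: posterior mean = {beta_bar_big:.4f}, prior weight = {w_prior_big:.4f}")
```

The second call illustrates the limiting result in the text: as the sample grows with the prior held fixed, the posterior mean approaches the OLS estimator.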
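When the Bayesian analysis is carried out jointly in β and $\sigma^2$, we first need to use (C.3) to integrate out β to obtain the posterior distribution of $\sigma^2$. This yields the following inverse-gamma posterior for $\sigma^2$
\[ \pi(\sigma^2\,|\,y, X) = \left(\frac{1}{\sigma^2}\right)^{\bar{a}/2+1} \exp\left(\frac{-\bar{d}}{2\sigma^2}\right), \]
where
\[ \bar{a} = T + \underline{a}, \quad \text{and} \quad \bar{d} = \underline{d} + \underline{b}'\underline{H}\,\underline{b} + y'y - \bar{\beta}'\bar{H}\bar{\beta}. \]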
\[ \hat{\beta}_{Shrinkage} = (X'X + \sigma^2\theta I_k)^{-1} X'X\hat{\beta} \tag{C.8} \]
\[ = (X'X + \sigma^2\theta I_k)^{-1} X'y. \]
A similar estimator (known as the ridge estimator) can also be derived using penalized regression with an $L_2$ penalty norm. The criterion function for this penalized regression is given by
\[ Q(\beta, \lambda) = (y - X\beta)'(y - X\beta) + \lambda(\beta'\beta - K), \]
where λ > 0 and K is a positive constant such that $\beta'\beta \leq K$. The first-order condition for this optimization problem is
\[ -2X'(y - X\beta) + 2\lambda\beta = 0, \]
which yields
\[ \hat{\beta}_{Ridge} = (X'X + \lambda I_k)^{-1} X'y. \tag{C.9} \]
It is clear that the shrinkage and the ridge estimators coincide when λ = σ 2 θ . The main differ-
ence between the Bayesian and the penalized regression approaches lies in the way the shrinkage
(or penalty) parameter is chosen. Under the Bayesian approach the choice of λ must be a pri-
ori, whilst under the penalized regression approach the choice is often made by cross validation.
See also Section 11.9 and Hastie, Tibshirani, and Friedman (2009) and Buhlmann and van de
Geer (2012).
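The ridge formula (C.9) and its first-order condition can be checked on a small example with two regressors, where the 2 × 2 system can be solved explicitly. The data, the value of λ and the function name are hypothetical; note that, following (C.9) literally, the penalty is applied to both coefficients, including the one on the constant regressor.

```python
def ridge_two_regressors(X, y, lam):
    """Solve (X'X + lam * I_2) beta = X'y for a design matrix with two columns."""
    a = sum(r[0] * r[0] for r in X) + lam        # (1,1) element of X'X + lam * I
    b = sum(r[0] * r[1] for r in X)              # off-diagonal element of X'X
    d = sum(r[1] * r[1] for r in X) + lam        # (2,2) element of X'X + lam * I
    g0 = sum(r[0] * yi for r, yi in zip(X, y))   # first element of X'y
    g1 = sum(r[1] * yi for r, yi in zip(X, y))   # second element of X'y
    det = a * d - b * b
    return ((d * g0 - b * g1) / det, (a * g1 - b * g0) / det)


def first_order_condition(X, y, beta, lam):
    """Evaluate -2 X'(y - X beta) + 2 lam beta, which should be zero at the ridge solution."""
    resid = [yi - (r[0] * beta[0] + r[1] * beta[1]) for r, yi in zip(X, y)]
    return tuple(
        -2.0 * sum(r[j] * e for r, e in zip(X, resid)) + 2.0 * lam * beta[j]
        for j in range(2)
    )


# Hypothetical data: a constant and a trend regressor.
X = [(1.0, 1.0), (1.0, 2.0), (1.0, 3.0), (1.0, 4.0)]
y = [1.0, 2.1, 2.9, 4.2]

beta_ols = ridge_two_regressors(X, y, lam=0.0)    # lam = 0 gives OLS
beta_ridge = ridge_two_regressors(X, y, lam=5.0)  # lam plays the role of sigma^2 * theta
foc = first_order_condition(X, y, beta_ridge, lam=5.0)

norm_ols = sum(b * b for b in beta_ols)
norm_ridge = sum(b * b for b in beta_ridge)
print(f"OLS   = {beta_ols}, squared norm = {norm_ols:.4f}")
print(f"Ridge = {beta_ridge}, squared norm = {norm_ridge:.4f}")
```

Setting λ equal to σ²θ in this function reproduces the Bayesian shrinkage estimator (C.8), since the two formulae are then identical.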
References
Ackerberg, D., K. Caves, and G. Frazer (2006). Structural estimation of production functions. Technical
report, Munich Personal RePEc Archive. <http://mpra.ub.uni-muenchen.de/38349/>.
Agarwal, R. P. (2000). Difference Equations and Inequalities: Theory, Methods, and Applications. New York:
Marcel Dekker.
Ahn, S. C. and A. R. Horenstein (2013). Eigenvalue ratio test for the number of factors. Econometrica 81,
1203–1207.
Ahn, S. C., Y. H. Lee, and P. Schmidt (2001). GMM estimation of linear panel data models with time-varying
individual effects. Journal of Econometrics 102, 219–255.
Ahn, S. C., Y. H. Lee, and P. Schmidt (2007). Stochastic frontier models with multiple time-varying individual
effects. Journal of Productivity Analysis 27, 1–12.
Ahn, S. C., Y. H. Lee, and P. Schmidt (2013). Panel data models with multiple time-varying individual effects.
Journal of Econometrics 174, 1–14.
Ahn, S. C. and P. Schmidt (1995). Efficient estimation of models for dynamic panel data. Journal of Economet-
rics 68, 29–52.
Ahn, S. K. and G. C. Reinsel (1990). Estimation of partially nonstationary multivariate autoregressive models.
Journal of the American Statistical Association 85, 813–823.
Akay, A. (2012). Finite-sample comparison of alternative methods for estimating dynamic panel data models.
Journal of Applied Econometrics 27, 1189–1204.
Alessandri, P., P. Gai, S. Kapadia, N. Mora, and C. Puhr (2009). Towards a framework for quantifying systemic
stability. International Journal of Central Banking 5, 47–81.
Alogoskoufis, G. S. and R. Smith (1991). On error correction models: specification, interpretation, estima-
tion. Journal of Economic Surveys 5, 97–128.
Altissimo, F., B. Mojon, and P. Zaffaroni (2009). Can aggregation explain the persistence of inflation? Journal
of Monetary Economics 56, 231–241.
Alvarez, J. and M. Arellano (2003). The time series and cross-section asymptotics of dynamic panel data esti-
mators. Econometrica 71, 1121–1159.
Amemiya, T. (1973). Generalized least squares with an estimated autocovariance matrix. Econometrica 41,
723–732.
Amemiya, T. (1978). A note on a random coefficients model. International Economic Review 19, 793–796.
Amemiya, T. (1980). Selection of regressors. International Economic Review 21, 331–354.
Amemiya, T. (1985). Advanced Econometrics. Oxford: Basil Blackwell.
Amemiya, T. and T. MaCurdy (1986). Instrumental-variable estimation of an error-component model. Econo-
metrica 54, 869–880.
Amengual, D. and M. W. Watson (2007). Consistent estimation of the number of dynamic factors in a large
N and T panel. Journal of Business and Economic Statistics 25, 91–6.
An, S. and F. Schorfheide (2007). Bayesian analysis of DSGE models. Econometric Reviews 26, 113–172.
Anatolyev, S. (2005). GMM, GEL, serial correlation, and asymptotic bias. Econometrica 73, 983–1002.
Andersen, T. G., T. Bollerslev, F. X. Diebold, and H. Ebens (2001). The distribution of realized stock return
volatility. Journal of Financial Economics 61, 43–76.
Andersen, T. G., T. Bollerslev, F. X. Diebold, and P. Labys (2001). The distribution of realized exchange rate
volatility. Journal of the American Statistical Association 96, 42–55.
Andersen, T. G., T. Bollerslev, F. X. Diebold, and P. Labys (2003). Modeling and forecasting realized volatility.
Econometrica 71, 579–625.
Anderson, G. S. (2008). Solving linear rational expectations models: a horse race. Computational Eco-
nomics 31, 95–113.
Anderson, T. W. (1951). Estimating linear restrictions on regression coefficients for multivariate normal
distributions. Annals of Mathematical Statistics 22, 327–351.
Anderson, T. W. (1971). The Statistical Analysis of Time Series. New York: John Wiley.
Anderson, T. W. (2003). An Introduction to Multivariate Statistical Analysis (3rd edn.). New York: John Wiley.
Anderson, T. W. and C. Hsiao (1981). Estimation of dynamic models with error components. Journal of the
American Statistical Association 76, 598–606.
Anderson, T. W. and C. Hsiao (1982). Formulation and estimation of dynamic models using panel data. Jour-
nal of Econometrics 18, 47–82.
Anderton, R., A. Galesi, M. Lombardi, and F. di Mauro (2010). Key elements of global inflation. In R. Fry,
C. Jones, and C. Kent (eds.), Inflation in an Era of Relative Price Shocks, RBA Annual Conference Volume.
Sydney: Reserve Bank of Australia.
Andrews, D. W. K. (1988). Laws of large numbers for dependent non-identically distributed random variables.
Econometric Theory 4, 458–467.
Andrews, D. W. K. (1991). Heteroskedasticity and autocorrelation consistent covariance matrix estimation.
Econometrica 59, 817–858.
Andrews, D. W. K. (1998). Hypothesis testing with a restricted parameter space. Journal of Econometrics 84,
155–199.
Andrews, D. W. K. (2005). Cross section regression with common shocks. Econometrica 73, 1551–1585.
Andrews, D. W. K. and X. Cheng (2012). Estimation and inference with weak, semi-strong, and strong iden-
tification. Econometrica 80, 2153–2211.
Andrews, D. W. K. and J. C. Monahan (1992). An improved heteroskedasticity and autocorrelation consistent
covariance matrix estimator. Econometrica 60, 953–966.
Andrews, D. W. K. and W. Ploberger (1994). Optimal tests when a nuisance parameter is present only under
the alternative. Econometrica 62, 1383–1414.
Angeletos, G., G. Lorenzoni, and A. Pavan (2010). Beauty contests and irrational exuberance: A neoclassical
approach. Working Paper 15883. Cambridge, MA: National Bureau Of Economic Research.
Anselin, L. (1988). Spatial Econometrics: Methods and Models. Dordrecht: Kluwer Academic.
Anselin, L. (2001). Spatial econometrics. In B. H. Baltagi (ed.), A Companion to Theoretical Econometrics.
Oxford: Blackwell.
Anselin, L. and A. K. Bera (1998). Spatial dependence in linear regression models with an introduction to
spatial econometrics. In A. Ullah and D. E. A. Giles (eds.), Handbook of Applied Economic Statistics. New
York: Marcel Dekker.
Anselin, L., J. Le Gallo, and J. Jayet (2007). Spatial panel econometrics. In L. Matyas and P. Sevestre (eds.),
The Econometrics of Panel Data, Fundamentals and Recent Developments in Theory and Practice (3rd edn.).
Dordrecht: Kluwer.
Aoki, M. (1996). New Approaches to Macroeconomic Modelling. Oxford: Oxford University Press.
Arbia, G. (2006). Spatial Econometrics: Statistical Foundations and Applications to Regional Growth Convergence.
Berlin: Springer-Verlag.
Arellano, M. (1987). Practitioners’ corner: computing robust standard errors for within-groups estimators.
Oxford Bulletin of Economics and Statistics 49, 431–434.
Arellano, M. (2003). Panel Data Econometrics. Oxford: Oxford University Press.
Arellano, M. and S. R. Bond (1991). Some tests of specification for panel data: Monte Carlo evidence and an
application to employment equations. Review of Economic Studies 58, 277–297.
Arellano, M. and S. Bonhomme (2011). Nonlinear panel data analysis. Annual Review of Economics 3, 395–424.
Arellano, M. and O. Bover (1995). Another look at the instrumental variable estimation of error-components
models. Journal of Econometrics 68, 29–51.
Assenmacher-Wesche, K. and M. H. Pesaran (2008). Forecasting the swiss economy using VECX* models:
An exercise in forecast combination across models and observation windows. National Institute Economic
Review 203, 91–108.
Baberis, N. and R. Thaler (2003). A survey of behavioral finance. In G. M. Constantinides, M. Harris, and
R. Stultz (eds.), Handbook of Behavioral Economics of Finance. Amsterdam: Elsevier.
Bai, J. (2009). Panel data models with interactive fixed effects. Econometrica 77, 1229–1279.
Bai, J. (2013). Likelihood approach to dynamic panel models with interactive effects. Mimeo, Columbia
University, New York.
Bai, J. and C. Kao (2005). On the estimation and inference of a panel cointegration model with cross-sectional
dependence. In B. H. Baltagi (ed.), Contributions to Economic Analysis. Amsterdam: Elsevier.
Bai, J., C. Kao, and S. Ng (2009). Panel cointegration with global stochastic trends. Journal of Econometrics 149,
82–99.
Bai, J. and S. Ng (2002). Determining the number of factors in approximate factor models. Econometrica 70,
191–221.
Bai, J. and S. Ng (2004). A panic attack on unit roots and cointegration. Econometrica 72, 1127–1177.
Bai, J. and S. Ng (2007). Determining the number of primitive shocks in factor models. Journal of Business and
Economic Statistics 25, 52–60.
Bai, J. and S. Ng (2008). Large dimensional factor analysis. Foundations and Trends in Econometrics 3, 89–168.
Bai, J. and S. Ng (2010). Panel unit root tests with cross-section dependence: a further investigation. Econo-
metric Theory 26, 1088–1114.
Bai, Z. D. and J. W. Silverstein (1998). No eigenvalues outside the support of the limiting spectral distribution
of large dimensional sample covariance matrices. Annals of Probability 26, 316–345.
Baicker, K. (2005). The spillover effects of state spending. Journal of Public Economics 89, 529–544.
Bailey, N., G. Kapetanios, and M. H. Pesaran (2015). Exponents of cross-sectional dependence: estimation
and inference. Journal of Applied Econometrics. Forthcoming.
Bailey, N., M. H. Pesaran, and L. V. Smith (2015, January). A multiple testing approach to the regularisa-
tion of large sample correlation matrices. Unpublished University of Cambridge, CAFE Research Paper
No. 14.05.
Baillie, R. T. (1996). Long memory processes and fractional integration in econometrics. Journal of Economet-
rics 73, 5–59.
Bala, V. and S. Goyal (2001). Conformism and diversity under social learning. Economic Theory 17,
101–120.
Balestra, P. (1996). Introduction to linear models for panel data. In L. Mátyás and P. Sevestre (eds.), The
Econometrics of Panel Data: A Handbook of the Theory with Applications. Berlin: Springer.
Balestra, P. and M. Nerlove (1966). Pooling cross section and time series data in the estimation of a dynamic
model: the demand for natural gas. Econometrica 34, 585–612.
Baltagi, B. H. (2005). Econometric Analysis of Panel Data. New York: John Wiley.
Baltagi, B. H. and G. Bresson (2012). A robust hausman-taylor estimator. In B. H. Baltagi, R. C. Hill, W. K.
Newey, and H. L. White (eds.), Essays in Honor of Jerry Hausman, vol. 29 of Advances in Econometrics,
pp. 175–214. Bingley: Emerald Group.
Baltagi, B. H., G. Bresson, and A. Pirotte (2007). Panel unit root tests and spatial dependence. Journal of
Applied Econometrics 22, 339–360.
Baltagi, B. H., G. Bresson, and A. Pirotte (2012). Forecasting with spatial panel data. Computational Statistics
& Data Analysis 56, 3381–3397.
Baltagi, B. H., P. Egger, and M. Pfaffermayr (2013). A generalized spatial panel data model with random effects.
Econometric Reviews 32, 650–685.
Baltagi, B. H., Q. Feng, and C. Kao (2011). Testing for sphericity in a fixed effects panel data model. The
Econometrics Journal 14, 25–47.
Baltagi, B. H. and C. Kao (2000). Nonstationary panels, cointegration in panels and dynamic panels, a survey.
In B. H. Baltagi (ed.), Nonstationary Panels, Panel Cointegration, and Dynamic Panels, Advances in Economet-
rics, vol. 15. New York: JAI Press.
Baltagi, B. H. and S. Khanti-Akom (1990). On efficient estimation with panel data: an empirical comparison
of instrumental variables estimators. Journal of Applied Econometrics 5, 401–406.
Baltagi, B. H. and D. Li (2006). Prediction in the panel data model with spatial correlation: the case of liquor.
Spatial Economic Analysis 1, 175–185.
Baltagi, B. H. and L. Liu (2008). Testing for random effects and spatial lag dependence in panel data models.
Statistics & Probability Letters 78, 3304–3306.
Baltagi, B. H. and A. Pirotte (2010). Seemingly unrelated regressions with spatial error components. Empirical
Economics 40, 5–49.
Baltagi, B. H., S. Song, and W. Koh (2003). Testing panel data regression models with spatial error correlation.
Journal of Econometrics 117, 123–150.
Baltagi, B. H. and Z. Yang (2013). Standardized LM tests for spatial error dependence in linear or panel regres-
sions. The Econometrics Journal 16, 103–134.
Balvers, R. J., T. F. Cosimano, and B. MacDonald (1990). Predicting stock returns in an efficient market. The
Journal of Finance 45, 1109–1128.
Banbura, M., D. Giannone, and L. Reichlin (2010). Large Bayesian vector auto regressions. Journal of Applied
Econometrics 25, 71–92.
Banerjee, A. (1999). Panel data unit roots and cointegration: an overview. Oxford Bulletin of Economics and
Statistics 61, 607–629.
Banerjee, A., J. J. Dolado, J. W. Galbraith, and D. F. Hendry (1993). Cointegration, Error Correction and the
Econometric Analysis of Non-stationary Data. Oxford: Oxford University Press.
Banerjee, A., M. Marcellino, and C. Osbat (2004). Some cautions on the use of panel methods for integrated
series of macroeconomic data. Econometrics Journal 7, 322–340.
Banerjee, A., M. Marcellino, and C. Osbat (2005). Testing for PPP: should we use panel methods? Empirical
Economics 30, 77–91.
Barndorff-Nielsen, O. E. and N. Shephard (2002). Econometric analysis of realised volatility and its use in
estimating stochastic volatility models. Journal of the Royal Statistical Society, Series B 64, 253–280.
Barndorff-Nielsen, O. E. and N. Shephard (2002). Estimating quadratic variation using realized variance. Jour-
nal of Applied Econometrics 17, 457–477.
Barro, R. J. and X. Sala-i-Martin (2003). Economic Growth (2nd edn). Cambridge, MA: The MIT Press.
Bartlett, M. S. (1946). On the theoretical specification and sampling properties of autocorrelated time-series.
Journal of the Royal Statistical Society Supplement 8, 27–41.
Bates, J. M. and C. W. J. Granger (1969). The combination of forecasts. OR 20, 451–468.
Bauwens, L., S. Laurent, and J. V. K. Rombouts (2006). Multivariate GARCH models: a survey. Journal of
Applied Econometrics 21, 79–109.
Baxter, M. and R. G. King (1999). Measuring business cycles: approximate band-pass filters for economic
time series. Review of Economics and Statistics 81, 575–593.
Beach, C. M. and J. G. MacKinnon (1978). A maximum likelihood procedure for regression with autocorre-
lated errors. Econometrica 46, 51–58.
Belsley, D. A., E. Kuh, and R. E. Welsch (1980). Regression Diagnostics: Identifying Influential Data and Sources
of Collinearity. New York: John Wiley.
Benati, L. (2010). Are policy counterfactuals based on structural VARs reliable? European Central Bank,
Working Paper Series, No. 1188.
Bera, A. K. and Y. Bilias (2002). The MM, ME, ML, EL, EF and GMM approaches to estimation: a synthesis.
Journal of Econometrics 107, 51–86.
Bera, A. K. and C. M. Jarque (1987). A test for normality of observations and regression residuals. International
Statistical Review 55, 163–172.
Bera, A. K. and M. McAleer (1989). Nested and non-nested procedures for testing linear and log-linear regres-
sion models. Sankhya B: Indian Journal of Statistics 21, 212–224.
Beran, R. (1988). Prepivoting test statistics: a bootstrap view of asymptotic refinements. Journal of the Amer-
ican Statistical Association 83, 687–697.
Berk, K. N. (1974). Consistent autoregressive spectral estimates. The Annals of Statistics 2, 489–502.
Bernanke, B. S. (1986). Alternative explanations of the money-income correlation. Carnegie-Rochester Confer-
ence Series on Public Policy 25, 49–99.
Bernanke, B. S., J. Boivin, and P. Eliasz (2005). Measuring the effects of monetary policy: a factor-augmented
vector autoregressive (FAVAR) approach. Quarterly Journal of Economics 120, 387–422.
Bernstein, D. S. (2005). Matrix Mathematics: Theory, Facts, and Formulas with Application to Linear Systems
Theory. Princeton, NJ: Princeton University Press.
Bertrand, M., E. Duflo, and S. Mullainathan (2004). How much should we trust differences-in-differences
estimates? Quarterly Journal of Economics 119, 249–275.
Bester, C. A., T. G. Conley, and C. B. Hansen (2011). Inference with dependent data using cluster covariance
estimators. Journal of Econometrics 165, 137–151.
Bettendorf, T. (2012). Investigating global imbalances: Empirical evidence from a GVAR approach. Studies
in Economics 1217, Department of Economics, University of Kent, UK.
Beveridge, S. and C. R. Nelson (1981). A new approach to the decomposition of economic time series into
permanent and transitory components with particular attention to measurement of the ‘business cycle’.
Journal of Monetary Economics 7, 151–174.
Bewley, R. (1979). The direct estimation of the equilibrium response in a linear dynamic model. Economics
Letters 3, 251–276.
Bhargava, A. and J. D. Sargan (1983). Estimating dynamic random effects models from panel data covering
short time periods. Econometrica 51, 1635–1660.
Bickel, P. J. and E. Levina (2008). Covariance regularization by thresholding. The Annals of Statistics 36,
2577–2604.
Bierens, H. J. (2005). Introduction to the Mathematical and Statistical Foundations of Econometrics. Cambridge:
Cambridge University Press.
Billingsley, P. (1995). Probability and Measure (3rd edn). New York: John Wiley.
Billingsley, P. (1999). Convergence of Probability Measures (2nd edn). New York: John Wiley & Sons.
Binder, M. and M. Gross (2013). Regime-switching global vector autoregressive models. Frankfurt: European
Central Bank, Working Paper No. 1569.
Binder, M., C. Hsiao, and M. H. Pesaran (2005). Estimation and inference in short panel vector autoregres-
sions with unit roots and cointegration. Econometric Theory 21, 795–837.
Binder, M. and M. H. Pesaran (1995). Multivariate rational expectations models and macroeconometric mod-
elling: a review and some new results. In M. H. Pesaran and M. R. Wickens (eds.), Handbook of Applied
Econometrics, vol. I: Macroeconometrics. Oxford: Blackwell.
Binder, M. and M. H. Pesaran (1997). Multivariate linear rational expectations models: characterization of
the nature of the solutions and their fully recursive computation. Econometric Theory 13, 877–888.
Binder, M. and M. H. Pesaran (1998). Decision making in presence of heterogeneous information and social
interactions. International Economic Review 39, 1027–1052.
Binder, M. and M. H. Pesaran (2000). Solution of finite-horizon multivariate linear rational expectations mod-
els and sparse linear systems. Journal of Economic Dynamics and Control 24, 325–346.
Binder, M. and M. H. Pesaran (2002). Cross-country analysis of saving rates and life-cycle models. Mimeo,
University of Cambridge.
Black, A. and P. Fraser (1995). UK stock returns: predictability and business conditions. The Manchester School
Supplement 63, 85–102.
Blanchard, O. J. and C. M. Kahn (1980). The solution of linear difference models under rational expectations.
Econometrica 48, 1305–1311.
Blanchard, O. J. and D. Quah (1989). The dynamic effects of aggregate demand and supply disturbances.
American Economic Review 79, 655–673.
Blanchard, O. J. and M. W. Watson (1986). Are business cycles all alike? In R. J. Gordon (ed.), The American
Business Cycle: Continuity and Change. Chicago: University of Chicago Press.
Blundell, R. and S. Bond (1998). Initial conditions and moment restrictions in dynamic panel data models.
Journal of Econometrics 87, 115–143.
Blundell, R. and S. Bond (2000). GMM estimation with persistent panel data: an application to production
functions. Econometric Reviews 19, 321–340.
Blundell, R., S. Bond, and F. Windmeijer (2000). Estimation in dynamic panel data models: improving on the
performance of the standard GMM estimator. IFS Working Papers W00/12. London: Institute for Fiscal
Studies.
Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31,
307–327.
Bollerslev, T. (1990). Modelling the coherence in short run nominal exchange rates: a multivariate generalized
ARCH model. Review of Economics and Statistics 72, 498–505.
Bollerslev, T., R. Y. Chou, and K. F. Kroner (1992). ARCH modeling in finance: a review of the theory and
empirical evidence. Journal of Econometrics 52, 5–59.
Bond, S., A. Leblebicioglu, and F. Schiantarelli (2010). Capital accumulation and growth: a new look at the
empirical evidence. Journal of Applied Econometrics 25, 1073–1099.
Bond, S., C. Nauges, and F. Windmeijer (2002). Unit roots and identification in autoregressive panel data
models: A comparison of alternative tests. Mimeo. London: Institute for Fiscal Studies.
Boschi, M. and A. Girardi (2011). The contribution of domestic, regional and international factors to
Latin America’s business cycle. Economic Modelling 28, 1235–1246.
Boskin, M. J. and L. J. Lau (1990). Post-war economic growth in the group-of-five countries: A new analysis.
Cambridge, MA: NBER Working Paper No. 3521.
Boswijk, H. P. (1995). Efficient inference on cointegration parameters in structural error correction models.
Journal of Econometrics 69, 133–158.
Bowman, A. W. and A. Azzalini (1997). Applied Smoothing Techniques for Data Analysis: The Kernel Approach
with S-Plus Illustrations. Oxford: Clarendon Press.
Bowsher, C. G. (2002). On testing overidentifying restrictions in dynamic panel data models. Economics
Letters 77, 211–220.
Box, G. E. P. and G. M. Jenkins (1970). Time Series Analysis: Forecasting and Control (rev. edn, 1976). San
Francisco: Holden-Day.
Box, G. E. P. and D. A. Pierce (1970). Distribution of residual autocorrelations in autoregressive-integrated
moving average time series models. Journal of the American Statistical Association 65, 1509–1526.
Boyd, S. and L. Vandenberghe (2004). Convex Optimization. Cambridge: Cambridge University Press.
Breedon, F. J. and P. Fisher (1996). M0: causes and consequences. The Manchester School 64, 371–387.
Breen, W., L. R. Glosten, and R. Jagannathan (1989). Economic significance of predictable variations in stock
index returns. Journal of Finance 44, 1177–1189.
Breitung, J. (2000). The local power of some unit root tests for panel data. In B. H. Baltagi (ed.), Nonstationary
Panels, Panel Cointegration, and Dynamic Panels, Advances in Econometrics, vol. 15. Amsterdam: JAI.
Breitung, J. (2002). Nonparametric tests for unit roots and cointegration. Journal of Econometrics 108,
343–363.
Breitung, J. (2005). A parametric approach to the estimation of cointegration vectors in panel data. Economet-
ric Reviews 24, 151–173.
Breitung, J. and B. Candelon (2005). Purchasing power parity during currency crises: a panel unit root test
under structural breaks. Review of World Economics 141, 124–140.
Breitung, J. and I. Choi (2013). Factor models. In N. Hashimzade and M. A. Thornton (eds.), Hand-
book of Research Methods and Applications in Empirical Macroeconomics, Chapter 11. Cheltenham:
Edward Elgar.
Breitung, J. and S. Das (2005). Panel unit root tests under cross-sectional dependence. Statistica
Neerlandica 59, 414–433.
Breitung, J. and S. Das (2008). Testing for unit roots in panels with a factor structure. Econometric Theory 24,
88–108.
Breitung, J. and W. Meyer (1994). Testing for unit roots in panel data: are wages on different bargaining levels
cointegrated? Applied Economics 26, 353–361.
Breitung, J. and M. H. Pesaran (2008). Unit roots and cointegration in panels. In L. Matyas and P. Sevestre
(eds.), The Econometrics of Panel Data: Fundamentals and Recent Developments in Theory and Practice (3rd
edn). Berlin: Springer-Verlag.
Breitung, J. and U. Pigorsch (2013). A canonical correlation approach for selecting the number of dynamic
factors. Oxford Bulletin of Economics and Statistics 75, 23–36.
Brent, R. P. (1973). Algorithms for Minimization Without Derivatives. Englewood Cliffs, NJ: Prentice-Hall.
Breusch, T. S. and L. G. Godfrey (1981). A review of recent work on testing for autocorrelation in dynamic
simultaneous models. In D. Currie, R. Nobay, and D. Peel (eds.), Macroeconomic Analysis: Essays in Macroe-
conomics and Econometrics. London: Croom Helm.
Breusch, T. S., G. Mizon, and P. Schmidt (1989). Efficient estimation using panel data. Econometrica 57,
695–700.
Breusch, T. S. and A. R. Pagan (1980). The Lagrange multiplier test and its application to model specifications
in econometrics. Review of Economic Studies 47, 239–253.
Brock, W. and S. Durlauf (2001). Interactions-based models. In J. Heckman and E. Leamer (eds.), Handbook
of Econometrics, vol. 5. Amsterdam: North-Holland.
Brockwell, P. J. and R. A. Davis (1991). Time Series: Theory and Methods (2nd edn). New York: Springer.
Browning, M. and M. D. Collado (2007). Habits and heterogeneity in demands: a panel data analysis. Journal
of Applied Econometrics 22, 625–640.
Browning, M., M. Ejrnæs, and J. Alvarez (2010). Modelling income processes with lots of heterogeneity.
Review of Economic Studies 77, 1353–1381.
Broze, L., C. Gouriéroux, and A. Szafarz (1990). Reduced Forms of Rational Expectations Models. New York:
Harwood Academic.
Broze, L., C. Gouriéroux, and A. Szafarz (1995). Solutions of multivariate rational expectations models. Econo-
metric Theory 11, 229–257.
Brüggemann, R. and H. Lütkepohl (2005). Practical problems with reduced rank ML estimators for cointe-
gration parameters and a simple alternative. Oxford Bulletin of Economics and Statistics 67, 673–690.
Bühlmann, P. and S. van de Geer (2012). Statistics for High-Dimensional Data. New York: Springer.
Bun, M. J. G. (2004). Testing poolability in a system of dynamic regressions with nonspherical disturbances.
Empirical Economics 29, 89–106.
Burns, A. F. and W. C. Mitchell (1946). Measuring Business Cycles. New York: National Bureau of Economic
Research.
Burridge, P. (1980). On the Cliff–Ord test for spatial autocorrelation. Journal of the Royal Statistical Society
B 42, 107–108.
Bussière, M., A. Chudik, and A. Mehl (2011). How have global shocks impacted the real effective exchange
rates of individual euro area countries since the euro’s creation? The BE Journal of Macroeconomics 13, 1–48.
Bussière, M., A. Chudik, and G. Sestieri (2012). Modelling global trade flows: results from a GVAR model.
Globalization and Monetary Policy Institute Working Paper 119, Federal Reserve Bank of Dallas.
Caglar, E., J. Chadha, and K. Shibayama (2012). Bayesian estimation of DSGE models: is the workhorse model
identified? Koç University-Tusiad Economic Research Forum, Working Paper No. 1205.
Cameron, A. C. and P. K. Trivedi (2005). Microeconometrics: Methods and Applications. New York: Cambridge
University Press.
Campbell, J. Y. (1987). Stock returns and the term structure. Journal of Financial Economics 18, 373–399.
Campbell, J. Y., A. W. Lo, and A. C. MacKinlay (1997). The Econometrics of Financial Markets. Princeton,
NJ: Princeton University Press.
Campbell, J. Y. and N. G. Mankiw (1987). Are output fluctuations transitory? Quarterly Journal of
Economics 102, 857–880.
Campbell, J. Y. and N. G. Mankiw (1989). International evidence of the persistence of economic fluctuations.
Journal of Monetary Economics 23, 319–333.
Canova, F. and M. Ciccarelli (2013). Panel vector autoregressive models: a survey. In T. B. Fomby, L. Kilian,
and A. Murphy (eds.), VAR Models in Macroeconomics - New Developments and Applications: Essays in Honor
of Christopher A. Sims. Bingley: Emerald Group.
Canova, F. and G. de Nicolò (2002). Monetary disturbances matter for business fluctuations in the G-7. Jour-
nal of Monetary Economics 49, 1131–1159.
Canova, F. and J. Pina (1999). Monetary policy misspecification in VAR models. Centre for Economic Policy
Research, Discussion Paper No. 2333.
Canova, F. and L. Sala (2009). Back to square one: identification issues in DSGE models. Journal of Monetary
Economics 56, 431–449.
Carriero, A., G. Kapetanios, and M. Marcellino (2009). Forecasting exchange rates with a large Bayesian VAR.
International Journal of Forecasting 25, 400–417.
Carrion-i-Silvestre, J. L., T. Del Barrio, and E. Lopez-Bazo (2005). Breaking the panels: an application to the
GDP per capita. Econometrics Journal 8, 159–175.
Carroll, C. D. and D. N. Weil (1994). Saving and growth: a reinterpretation. Carnegie-Rochester Conference
Series on Public Policy 40, 133–192.
Case, A. C. (1991). Spatial pattern in household demand. Econometrica 59, 953–965.
Cashin, P., K. Mohaddes, and M. Raissi (2014a). Fair weather or foul? The macroeconomic effects of El Niño.
Cambridge Working Paper in Economics, No. 1418.
Cashin, P., K. Mohaddes, and M. Raissi (2014b). The global impact of the systemic economies and MENA
business cycles. In I. A. Elbadawi and H. Selim (eds.), Understanding and Avoiding the Oil Curse in Resource-
rich Arab Economies. Cambridge: Cambridge University Press. Forthcoming.
Cashin, P., K. Mohaddes, M. Raissi, and M. Raissi (2014). The differential effects of oil demand and supply
shocks on the global economy. Energy Economics. Forthcoming.
Castrén, O., S. Dées, and F. Zaher (2010). Stress-testing Euro Area corporate default probabilities using a
global macroeconomic model. Journal of Financial Stability 6, 64–78.
Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research 1, 245–276.
Cesa-Bianchi, A., M. H. Pesaran, and A. Rebucci (2014). Uncertainty and economic activity: a global
perspective. Technical report, CAFE Research Paper No. 14.03, available at SSRN:
<http://ssrn.com/abstract=2414003>. Mimeo, 20 February 2014.
Cesa-Bianchi, A., M. H. Pesaran, A. Rebucci, and T. Xu (2012). China’s emergence in the world economy and
business cycles in Latin America. Journal of LACEA Economia 12, 1–75.
Chamberlain, G. (1982). Multivariate regression models for panel data. Journal of Econometrics 18, 5–46.
Chamberlain, G. (1983). Funds, factors and diversification in arbitrage pricing models. Econometrica 51,
1305–1324.
Chamberlain, G. (1984). Panel data. In Z. Griliches and M. Intriligator (eds.), Handbook of Econometrics,
vol. 2, ch. 22, pp. 1247–1318. Amsterdam: North-Holland.
Chamberlain, G. (1987). Asymptotic efficiency in estimation with conditional moment restrictions. Journal
of Econometrics 34, 305–334.
Chambers, M. (2005). The purchasing power parity puzzle, temporal aggregation, and half-life estimation.
Economics Letters 86, 193–198.
Champernowne, D. G. (1960). An experimental investigation of the robustness of certain procedures for esti-
mating means and regression coefficients. Journal of the Royal Statistical Society, Series A 123, 398–412.
Chan, N. H. and C. Z. Wei (1988). Limiting distributions of least squares estimates of unstable autoregressive
processes. Annals of Statistics 16, 367–401.
Chang, Y. (2002). Nonlinear IV unit root tests in panels with cross-sectional dependency. Journal of Econo-
metrics 110, 261–292.
Chang, Y. (2004). Bootstrap unit root test in panels with cross-sectional dependency. Journal of Economet-
rics 120, 263–293.
Chang, Y., J. Y. Park, and P. C. B. Phillips (2001). Nonlinear econometric models with cointegrated and deter-
ministically trending regressors. Econometrics Journal 4, 1–36.
Chang, Y. and W. Song (2005). Unit root tests for panels in the presence of short-run and long-run dependen-
cies. Mimeo, Rice University, TX.
Chatfield, C. (2003). The Analysis of Time Series: An Introduction (6th edn). London: Chapman and Hall.
Chen, Q., D. Gray, P. N’Diaye, H. Oura, and N. Tamirisa (2010). International transmission of bank and cor-
porate distress. IMF Working Paper No. 10/124.
Cheung, Y. and K. S. Lai (1993). Finite-sample sizes of Johansen’s likelihood ratio tests for cointegration.
Oxford Bulletin of Economics and Statistics 55, 315–328.
Chib, S. (2011). Introduction to simulation and MCMC methods. In J. Geweke, G. Koop, and H. van Dijk
(eds.), The Oxford Handbook of Bayesian Econometrics. Oxford: Oxford University Press.
Choi, I. (2001). Unit root tests for panel data. Journal of International Money and Finance 20, 249–272.
Choi, I. (2002). Combination unit root tests for cross-sectionally correlated panels. In Econometric Theory and
Practice: Frontiers of Analysis and Applied Research, Essays in Honor of P.C.B. Phillips. Cambridge: Cambridge
University Press.
Choi, I. (2006). Nonstationary panels. In K. Patterson and T. C. Mills (eds.), Palgrave Handbooks of Econo-
metrics, vol. 1. Basingstoke: Palgrave Macmillan.
Choi, I. and T. K. Chue (2007). Subsampling hypothesis tests for nonstationary panels with applications to
exchange rates and stock prices. Journal of Applied Econometrics 22, 233–264.
Choi, I. and H. Jeong (2013). Model selection for factor analysis: some new criteria and performance com-
parisons. Research Institute for Market Economy (RIME) Working Paper No. 1209, Sogang University,
South Korea.
Chortareas, G. and G. Kapetanios (2009). Getting PPP right: identifying mean-reverting real exchange rates
in panels. Journal of Banking & Finance 33, 390–404.
Chow, G. C. (1960). Tests of equality between sets of coefficients in two linear regressions. Econometrica 28,
591–605.
Christiano, L. J. and T. J. Fitzgerald (2003). The band pass filter. International Economic Review 44, 435–465.
Chudik, A. and M. Fidora (2012). How the global perspective can help us to identify structural shocks. Federal
Reserve Bank of Dallas Staff Paper No. 19.
Chudik, A. and M. Fratzscher (2011). Identifying the global transmission of the 2007–2009 financial crisis
in a GVAR model. European Economic Review 55, 325–339.
Chudik, A., V. Grossman, and M. H. Pesaran (2014). Nowcasting and forecasting global growth with purchas-
ing managers indices. Mimeo, January 2014.
Chudik, A., K. Mohaddes, M. H. Pesaran, and M. Raissi (2015). Long-run effects in large heterogeneous panel
data models with cross-sectionally correlated errors. Federal Reserve Bank of Dallas, Globalization and
Monetary Policy Institute Working Paper No. 223.
Chudik, A. and M. H. Pesaran (2011). Infinite dimensional VARs and factor models. Journal of Economet-
rics 163, 4–22.
Chudik, A. and M. H. Pesaran (2013). Econometric analysis of high dimensional VARs featuring a dominant
unit. Econometric Reviews 32, 592–649.
Chudik, A. and M. H. Pesaran (2015a). Common correlated effects estimation of heterogeneous dynamic
panel data models with weakly exogenous regressors. Journal of Econometrics. Forthcoming.
Chudik, A. and M. H. Pesaran (2015b). Theory and practice of GVAR modeling. Journal of Economic Surveys.
Forthcoming.
Chudik, A., M. H. Pesaran, and E. Tosetti (2011). Weak and strong cross-section dependence and estimation
of large panels. Econometrics Journal 14, C45–C90.
Chudik, A. and L. V. Smith (2013). The GVAR approach and the dominance of the U.S. economy. Federal
Reserve Bank of Dallas, Globalization and Monetary Policy Institute Working Paper No. 136.
Clare, A. D., Z. Psaradakis, and S. H. Thomas (1995). An analysis of seasonality in the UK equity market.
Economic Journal 105, 398–409.
Clare, A. D., S. H. Thomas, and M. R. Wickens (1994). Is the gilt-equity yield ratio useful for predicting UK
stock return? Economic Journal 104, 303–315.
Clarida, R. and J. Gali (1994). Sources of real exchange rate fluctuations: How important are nominal shocks?
Carnegie-Rochester Conference Series on Public Policy 41, 1–56.
Clarida, R., J. Gali, and M. Gertler (1999). The science of monetary policy: a new Keynesian perspective.
Journal of Economic Literature 37, 1661–1707.
Clements, M. P. and D. F. Hendry (1993). On the limitations of comparing mean square forecast errors.
Journal of Forecasting 12, 617–637.
Clements, M. P. and D. F. Hendry (1998). Forecasting Economic Time Series. Cambridge: Cambridge University
Press.
Clements, M. P. and J. Smith (2000). Evaluating the forecast densities of linear and nonlinear models: Appli-
cations to output growth and unemployment. Journal of Forecasting 19, 255–276.
Cliff, A. D. and J. K. Ord (1969). The problem of spatial autocorrelation. In A. J. Scott (ed.), London Papers in
Regional Science. London: Pion.
Cliff, A. D. and J. K. Ord (1973). Spatial Autocorrelation. London: Pion.
Cliff, A. D. and J. K. Ord (1981). Spatial Processes: Models and Applications. London: Pion.
Coakley, J. and A. M. Fuertes (1997). New panel unit root tests of PPP. Economics Letters 57, 17–22.
Coakley, J., A. M. Fuertes, and R. Smith (2002). A principal components approach to cross-section depen-
dence in panels. Birkbeck College Discussion Paper 01/2002.
Coakley, J., A. M. Fuertes, and R. Smith (2006). Unobserved heterogeneity in panel time series. Computational
Statistics and Data Analysis 50, 2361–2380.
Coakley, J., N. Kellard, and S. Snaith (2005). The PPP debate: price matters! Economics Letters 88, 209–213.
Cobb, C. W. and P. H. Douglas (1928). A theory of production. American Economic Review 18, 139–165.
Cochran, W. G. (1934). The distribution of quadratic forms in a normal system, with applications to the anal-
ysis of covariance. Proceedings of the Cambridge Philosophical Society 30, 178–191.
Cochrane, D. and G. H. Orcutt (1949). Application of least squares regression to relationships containing
auto-correlated error terms. Journal of the American Statistical Association 44, 32–61.
Cochrane, J. H. (2011). Determinacy and identification with Taylor rules. Journal of Political Economy 119,
565–615.
Cogley, T. (1990). International evidence on the size of the random walk in output. Journal of Political Econ-
omy 98, 501–518.
Cogley, T. (1995). Effects of Hodrick-Prescott filter on trend and difference stationary time series: implications
for business cycle research. Journal of Economic Dynamics and Control 19, 253–278.
Conley, T. G. (1999). GMM estimation with cross sectional dependence. Journal of Econometrics 92,
1–45.
Conley, T. G. and G. Topa (2002). Socio-economic distance and spatial patterns in unemployment. Journal of
Applied Econometrics 17, 303–327.
Cooley, T. F. and E. C. Prescott (1976). Estimation in the presence of stochastic parameter variation. Econo-
metrica 44, 167–184.
Cooper, R. and J. Haltiwanger (1996). Evidence on macroeconomic complementarities. The Review of
Economics and Statistics 78, 78–93.
Cornwell, C. and P. Rupert (1988). Efficient estimation with panel data: an empirical comparison of
instrumental variables estimators. Journal of Applied Econometrics 3, 149–155.
Cowles, A. (1960). A revision of previous conclusions regarding stock price behavior. Econometrica 28,
909–915.
Cox, D. R. (1961). Tests of separate families of hypotheses. In Proceedings of the Fourth Berkeley Symposium on
Mathematical Statistics and Probability, vol. 1, Berkeley: University of California Press.
Cox, D. R. (1962). Further results on tests of separate families of hypotheses. Journal of the Royal Statistical
Society, Series B 24, 406–424.
Cressie, N. (1993). Statistics for Spatial Data. New York: Wiley.
Crowder, M. J. (1976). Maximum likelihood estimation for dependent observations. Journal of the Royal Sta-
tistical Society, Series B 38, 45–53.
Darby, M. R. (1983). Movements in purchasing power parity: the short and long runs. In M. Darby and J. Loth-
ian (eds.), The International Transmission of Inflation. Chicago: University of Chicago Press (for National
Bureau of Economic Research).
Dastoor, N. K. (1983). Some aspects of testing non-nested hypotheses. Journal of Econometrics 21,
213–228.
Davidson, J. (1994). Stochastic Limit Theory: An Introduction for Econometricians. Oxford: Oxford University
Press.
Davidson, J. (2000). Econometric Theory. Malden, MA: Blackwell.
Davidson, J. and R. de Jong (1997). Strong laws of large numbers for dependent heterogeneous processes: a
synthesis of recent and new results. Econometric Reviews 16, 251–279.
Davidson, J., D. F. Hendry, F. Srba, and S. Yeo (1978). Econometric modelling of the aggregate time-series
relationship between consumers’ expenditure and income in the United Kingdom. Economic Journal 88,
661–692.
Davidson, R. and J. G. MacKinnon (1981). Several tests for model specification in the presence of alternative
hypotheses. Econometrica 49, 781–793.
Davidson, R. and J. G. MacKinnon (1984). Model specification tests based on artificial linear regressions.
International Economic Review 25, 485–502.
Davidson, R. and J. G. MacKinnon (1993). Estimation and Inference in Econometrics. New York: Oxford
University Press.
Davies, R. (1977). Hypothesis testing when a nuisance parameter is present only under the alternative.
Biometrika 64, 247–254.
Dawid, A. P. (1984). Present position and potential developments: some personal views: statistical theory,
the prequential approach. Journal of the Royal Statistical Society Series A 147, 278–292.
De Jong, D. N., B. Ingram, and C. Whiteman (2000). A Bayesian approach to dynamic macroeconomics.
Journal of Econometrics 98, 203–223.
de Jong, R. M. (1997). Central limit theorems for dependent heterogeneous random variables. Econometric
Theory 13, 353–367.
De Mol, C., D. Giannone, and L. Reichlin (2008). Forecasting using a large number of predictors: Is Bayesian
shrinkage a valid alternative to principal components? Journal of Econometrics 146, 318–328.
de Waal, A. and R. van Eyden (2013a). Forecasting key South African variables with a global VAR model.
Working Papers 201346, University of Pretoria, Department of Economics.
de Waal, A. and R. van Eyden (2013b). The impact of economic shocks in the rest of the world on South Africa:
Evidence from a global VAR. Working Papers 201328, University of Pretoria, Department of Economics.
de Wet, A. H., R. van Eyden, and R. Gupta (2009). Linking global economic dynamics to a South African-
specific credit risk correlation model. Economic Modelling 26, 1000–1011.
Deaton, A. S. (1977). Involuntary saving through unanticipated inflation. American Economic Review 67,
899–910.
Deaton, A. S. (1982). Model selection procedures, or does the consumption function exist? In G. Chow and
P. Corsi (eds.), Evaluating the Reliability of Macroeconometric Models, New York: John Wiley.
Deaton, A. S. (1987). Life-cycle models of consumption: is the evidence consistent with the theory? In T. F.
Bewley (ed.), Advances in Econometrics: Fifth World Congress, vol. 2. Cambridge: Cambridge University
Press.
Dées, S., F. di Mauro, M. H. Pesaran, and L. V. Smith (2007a). Exploring the international linkages of the Euro
Area: a global VAR analysis. Journal of Applied Econometrics 22, 1–38.
Dées, S., S. Holly, M. H. Pesaran, and L. V. Smith (2007b). Long run macroeconomic relations in the global
economy. Economics - The Open-Access, Open-Assessment E-Journal 1, 1–58.
Dées, S., M. H. Pesaran, L. V. Smith, and R. P. Smith (2009). Identification of New Keynesian Phillips curves
from a global perspective. Journal of Money, Credit and Banking 41, 1481–1502.
Dées, S., M. H. Pesaran, L. V. Smith, and R. P. Smith (2014). Constructing multi-country rational expectations
models. Oxford Bulletin of Economics and Statistics 76, 812–840.
Dées, S. and A. Saint-Guilhem (2011). The role of the United States in the global economy and its evolution
over time. Empirical Economics 41, 573–591.
Del Negro, M. and F. Schorfheide (2011). Bayesian macroeconometrics. In J. Geweke, G. Koop, and
H. van Dijk (eds.), The Oxford Handbook of Bayesian Econometrics. Oxford: Oxford University Press.
Dempster, A. P., N. M. Laird, and D. B. Rubin (1977). Maximum likelihood from incomplete data via the EM
algorithm. Journal of the Royal Statistical Society. Series B 39, 1–38.
den Haan, W. J. and A. Levin (1997). A practitioner’s guide to robust covariance matrix estimation. In
G. S. Maddala and C. R. Rao (eds.), Handbook of Statistics: Robust Inference, vol. 15. Amsterdam: North-
Holland.
Dhaene, G. and K. Jochmans (2012). Split-panel jackknife estimation of fixed-effect models. Mimeo, 21 July
2012.
Dhrymes, P. J. (1971). Distributed Lags: Problems of Estimation and Formulation. San Francisco: Holden Day.
Dhrymes, P. J. (2000). Mathematics for Econometrics (3rd edn). New York: Springer Verlag.
di Mauro, F. and M. H. Pesaran (2013). The GVAR Handbook: Structure and Applications of a Macro Model of
the Global Economy for Policy Analysis. Oxford: Oxford University Press.
Dickey, D. and W. Fuller (1979). Distribution of the estimators for autoregressive time series with a unit root.
Journal of the American Statistical Association 74, 427–431.
Diebold, F. X., T. A. Gunther, and A. S. Tay (1998). Evaluating density forecasts, with applications to financial
risk management. International Economic Review 39, 863–884.
Diebold, F. X., J. Hahn, and A. S. Tay (1999). Multivariate density forecast evaluation and calibration in finan-
cial risk management: high-frequency returns on foreign exchange. Review of Economics and Statistics 81,
661–673.
Diebold, F. X. and R. S. Mariano (1995). Comparing predictive accuracy. Journal of Business and Economic
Statistics 13, 253–265.
Diebold, F. X. and G. D. Rudebusch (1991). On the power of Dickey-Fuller tests against fractional alternatives.
Economics Letters 35, 155–160.
Doan, T., R. Litterman, and C. Sims (1984). Forecasting and conditional projection using realistic prior dis-
tributions. Econometric Reviews 3, 1–100.
Donald, S. G., G. W. Imbens, and W. K. Newey (2009). Choosing the number of moments in conditional
moment restriction models. Journal of Econometrics 152, 28–36.
Dovern, J. and B. van Roye (2013). International transmission of financial stress: evidence from a GVAR. Kiel
Working Papers 1844, Kiel Institute for the World Economy.
Draper, D. (1995). Assessment and propagation of model uncertainty (with discussion). Journal of the Royal
Statistical Society Series B 57, 45–97.
Draper, N. R. and R. C. van Nostrand (1979). Ridge regression and James-Stein estimation: review and com-
ments. Technometrics 21, 451–466.
Dreger, C. and J. Wolters (2011). Liquidity and asset prices: how strong are the linkages? Review of Economics
& Finance 1, 43–52.
Dreger, C. and Y. Zhang (2013). Does the economic integration of China affect growth and inflation in indus-
trial countries? FIW Working Paper series 116, FIW.
Driscoll, J. C. and A. C. Kraay (1998). Consistent covariance matrix estimation with spatially dependent panel
data. Review of Economics and Statistics 80, 549–560.
Druska, V. and W. C. Horrace (2004). Generalized moments estimation for spatial panels: Indonesian rice
farming. American Journal of Agricultural Economics 86, 185–198.
Dubin, R. A. (1988). Estimation of regression coefficients in the presence of spatially autocorrelated errors.
Review of Economics and Statistics 70, 466–474.
Dubois, E., J. Hericourt, and V. Mignon (2009). What if the euro had never been launched? A counterfactual
analysis of the macroeconomic impact of euro membership. Economics Bulletin 29, 2241–2255.
Dufour, J. M. (1980). Dummy variables and predictive tests for structural change. Economics Letters 6,
241–247.
Dufour, J. M. and L. Khalaf (2002). Exact tests for contemporaneous correlation of disturbances in seemingly
unrelated regressions. Journal of Econometrics 106, 143–170.
Dufour, J. M. and E. Renault (1998). Short run and long run causality in time series: theory. Econometrica 66,
1099–1126.
Durbin, J. and S. J. Koopman (2001). Time Series Analysis by State Space Methods. New York: Oxford University
Press.
Durbin, J. and G. S. Watson (1950). Testing for serial correlation in least squares regression I. Biometrika 37,
409–428.
Durbin, J. and G. S. Watson (1951). Testing for serial correlation in least squares regression II. Biometrika 38,
159–178.
Durrett, R. (2010). Probability: Theory and Examples. Cambridge: Cambridge University Press.
Easterly, W. and R. Levine (2001). What have we learned from a decade of empirical research on
growth? It’s not factor accumulation: stylized facts and growth models. World Bank Economic Review 15,
177–219.
Edison, K. D. W. H. J. and D. Cho (1993). A utility-based comparison of some models of exchange rate volatil-
ity. Journal of International Economics 35, 23–45.
Egger, P., M. Larch, M. Pfaffermayr, and J. Walde (2009). Small sample properties of maximum likelihood
versus generalized method of moments based tests for spatially autocorrelated errors. Regional
Science and Urban Economics 39, 670–678.
Egger, P., M. Pfaffermayr, and H. Winner (2005). An unbalanced spatial panel data approach to US state tax
competition. Economics Letters 88, 329–335.
Eicker, F. (1963). Asymptotic normality and consistency of the least squares estimators for families of linear
regressions. The Annals of Mathematical Statistics 34, 447–456.
Eicker, F., L. M. LeCam, and J. Neyman (1967). Limit theorems for regressions with unequal and dependent
errors. In L. LeCam and J. Neyman (eds.), Fifth Berkeley Symposium on Mathematical Statistics and
Probability, vol. 1, pp. 59–82. Berkeley: University of California Press.
Eickmeier, S. and T. Ng (2011). How do credit supply shocks propagate internationally? A GVAR approach.
Discussion Paper Series 1: Economic Studies 2011–27, Deutsche Bundesbank, Research Centre.
Eklund, J. and G. Kapetanios (2008). A review of forecasting techniques for large data sets. National Institute
Economic Review 203, 109–115.
Elhorst, J. P. (2003). Specification and estimation of spatial panel data models. International Regional Science
Review 26, 244–268.
Elhorst, J. P. (2005). Unconditional maximum likelihood estimation of linear and log-linear dynamic models
for spatial panels. Geographical Analysis 37, 85–106.
Elhorst, J. P. (2010). Dynamic panels with endogenous interaction effects when t is small. Regional Science and
Urban Economics 40, 272–282.
Elliott, G., C. W. J. Granger, and A. Timmermann (2006). Handbook of Economic Forecasting, vol. I.
Amsterdam: North-Holland.
Elliott, G. and M. Jansson (2003). Testing for unit roots with stationary covariates. Journal of Econometrics 115,
75–89.
Elliott, G., T. J. Rothenberg, and J. H. Stock (1996). Efficient tests for an autoregressive unit root. Economet-
rica 64, 813–836.
Elliott, G. and A. Timmermann (2004). Optimal forecast combinations under general loss functions and fore-
cast error distributions. Journal of Econometrics 122, 47–79.
Elliott, G. and A. Timmermann (2008). Economic forecasting. Journal of Economic Literature 46, 3–56.
Ellison, G. (1993). Learning, local interaction, and coordination. Econometrica 61, 1047–1071.
Embrechts, P., A. Hoing, and A. Juri (2003). Using copulas to bound VaR for functions of dependent risks.
Finance and Stochastics 7, 145–167.
Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United
Kingdom inflation. Econometrica 50, 987–1007.
Fernandez, C., E. Ley, and M. F. J. Steel (2001). Benchmark priors for Bayesian model averaging. Journal of
Econometrics 100, 381–427.
Ferson, W. E. and C. R. Harvey (1993). The risk and predictability of international equity returns. Review of
Financial Studies 6, 527–566.
Fingleton, B. (2008a). A generalized method of moments estimator for a spatial model with moving average
errors, with application to real estate prices. Empirical Economics 34, 35–57.
Fingleton, B. (2008b). A generalized method of moments estimator for a spatial panel model with an endoge-
nous spatial lag and spatial moving average errors. Spatial Economic Analysis 3, 27–44.
Fisher, G. R. and M. McAleer (1981). Alternative procedures and associated tests of significance for non-
nested hypotheses. Journal of Econometrics 16, 103–119.
Fisher, R. A. (1932). Statistical Methods for Research Workers (4th edn). Edinburgh: Oliver and Boyd.
Fleming, M. M. (2004). Techniques for estimating spatially dependent discrete choice models. In L. Anselin,
R. J. G. M. Florax, and S. J. Rey (eds.), Advances in Spatial Econometrics. Berlin: Springer-Verlag.
Flores, R., P. Jorion, P. Y. Preumont, and A. Szafarz (1999). Multivariate unit root tests of the PPP hypothesis.
Journal of Empirical Finance 6, 335–353.
Forni, M., M. Hallin, M. Lippi, and L. Reichlin (2000). The generalized dynamic factor model: identification
and estimation. Review of Economics and Statistics 82, 540–554.
Forni, M., M. Hallin, M. Lippi, and L. Reichlin (2004). The generalized dynamic factor model: consistency
and rates. Journal of Econometrics 119, 231–255.
Forni, M. and M. Lippi (1997). Aggregation and the Microfoundations of Dynamic Macroeconomics. Oxford:
Oxford University Press.
Forni, M. and M. Lippi (2001). The generalized factor model: representation theory. Econometric Theory 17,
1113–1141.
Fox, R. and M. S. Taqqu (1986). Large sample properties of parameters estimates for strongly dependent
stationary Gaussian time series. The Annals of Statistics 14, 517–532.
Frees, E. W. (1995). Assessing cross sectional correlation in panel data. Journal of Econometrics 69, 393–414.
Friedman, J., T. Hastie, and R. Tibshirani (2008). Sparse inverse covariance estimation with the graphical
LASSO. Biostatistics 9, 432–441.
Frisch, R. and F. V. Waugh (1933). Partial time regressions as compared with individual trends. Econometrica 1,
387–401.
Fry, R. and A. R. Pagan (2005). Some issues in using VARs for macroeconometric research. CAMA Working
Paper No. 19.
Fuhrer, J. C. (2000). Habit formation in consumption and its implications for monetary-policy models. Amer-
ican Economic Review 90, 367–390.
Fuller, W. A. (1996). Introduction to Statistical Time Series (2nd edn). New York: Wiley.
Galesi, A. and M. J. Lombardi (2009). External shocks and international inflation linkages: A global VAR
analysis. European Central Bank, Working Paper No. 1062.
Gali, J. (1992). How well does the IS-LM model fit postwar U.S. data? Quarterly Journal of Economics 107,
709–738.
Garderen, K. J., K. Lee, and M. H. Pesaran (2000). Cross-sectional aggregation of non-linear models. Journal
of Econometrics 95, 285–331.
Gardner Jr, E. S. (2006). Exponential smoothing: the state of the art - Part II. International Journal of Forecast-
ing 22, 637–666.
Garnett, J. C. (1920). The single general factor in dissimilar mental measurement. British Journal of Psychol-
ogy 10, 242–258.
Garratt, A., K. Lee, M. H. Pesaran, and Y. Shin (2003a). Forecast uncertainties in macroeconometric mod-
elling: an application to the UK economy. Journal of the American Statistical Association, Applications and
Case Studies 98, 829–838.
Garratt, A., K. Lee, M. H. Pesaran, and Y. Shin (2003b). A long run structural macroeconometric model of
the UK. Economic Journal 113, 412–455.
Garratt, A., K. Lee, and K. Shields (2014). Forecasting global recessions in a GVAR model of actual and
expected output in the G7. University of Nottingham, Centre for Finance, Credit and Macroeconomics
Discussion Paper No. 2014/06.
Garratt, A., D. Robertson, and S. Wright (2005). Permanent vs transitory components and economic
fundamentals. Journal of Applied Econometrics 21, 521–542.
Garratt, A., K. Lee, M. H. Pesaran, and Y. Shin (2006). Global and National Macroeconometric Modelling: A
Long Run Structural Approach. Oxford: Oxford University Press.
Geary, R. C. (1954). The contiguity ratio and statistical mapping. The Incorporated Statistician 5, 115–145.
Gengenbach, C., F. C. Palm, and J. Urbain (2006). Cointegration testing in panels with common factors.
Oxford Bulletin of Economics and Statistics 68, 683–719.
Gengenbach, C., F. C. Palm, and J. Urbain (2010). Panel unit root tests in the presence of cross-sectional
dependencies: comparison and implications for modelling. Econometric Reviews 29, 111–145.
Georgiadis, G. (2014a). Determinants of global spillovers from US monetary policy. Mimeo, May 2014.
Georgiadis, G. (2014b). Examining asymmetries in the transmission of monetary policy in the Euro Area:
Evidence from a mixed cross-section global VAR model. Mimeo, June 2014.
Georgiadis, G. and A. Mehl (2015). Trilemma, not dilemma: Financial globalisation and monetary policy
effectiveness. Federal Reserve Bank of Dallas, Globalization and Monetary Policy Institute Working Paper
No. 222.
Gerrard, W. J. and L. G. Godfrey (1998). Diagnostic checks for single-equation error-correction and autore-
gressive distributed lag models. The Manchester School of Economic & Social Studies 66, 222–237.
Geweke, J. (1977). The dynamic factor analysis of economic time series. In D. Aigner and A. Goldberger
(eds.), Latent Variables in Socio-Economic Models. Amsterdam: North-Holland.
Geweke, J. (1985). Macroeconometric modeling and the theory of the representative agent. American
Economic Review 75, 206–210.
Geweke, J. (2005). Contemporary Bayesian Econometrics and Statistics. New York: Wiley.
Geweke, J., J. L. Horowitz, and M. H. Pesaran (2008). Econometrics. In S. N. Durlauf and L. E. Blume (eds.),
The New Palgrave Dictionary of Economics (2nd edn). New York: Palgrave Macmillan.
Geweke, J., G. Koop, and H. van Dijk (2011). The Oxford Handbook of Bayesian Econometrics. Oxford: Oxford
University Press.
Giacomini, R. and C. W. J. Granger (2004). Aggregation of space-time processes. Journal of Econometrics 118,
7–26.
Giacomini, R. and H. White (2006). Tests of conditional predictive ability. Econometrica 74, 1545–1578.
Giannone, D., L. Reichlin, and L. Sala (2005). Monetary policy in real time. In M. Gertler and K. Rogoff (eds.),
NBER Macroeconomics Annual 2004, vol. 19, pp. 161–200. Cambridge MA: MIT Press.
Gilli, M. and G. Pauletto (1997). Sparse direct methods for model simulation. Journal of Economic Dynamics
and Control 21, 1093–1111.
Gnedenko, B. V. (1962). Theory of Probability. New York: Chelsea.
Godfrey, L. G. (1978a). Testing against general autoregressive and moving average error models when the
regressors include lagged dependent variables. Econometrica 46, 1293–1301.
Godfrey, L. G. (1978b). Testing for higher order serial correlation in regression equations when the regressors
include lagged dependent variables. Econometrica 46, 1303–1310.
Godfrey, L. G. (2011). Robust non-nested testing for ordinary least squares regression when some of the
regressors are lagged dependent variables. Oxford Bulletin of Economics and Statistics 73, 651–668.
Godfrey, L. G. and C. D. Orme (2004). Controlling the finite sample significance levels of heteroskedasticity-
robust tests of several linear restrictions on regression coefficients. Economics Letters 82, 281–287.
Godfrey, L. G. and M. H. Pesaran (1983). Test of non-nested regression models: small sample adjustments
and Monte Carlo evidence. Journal of Econometrics 21, 133–154.
Goffe, W. L., G. D. Ferrier, and J. Rogers (1994). Global optimization of statistical functions with simulated
annealing. Journal of Econometrics 60, 65–99.
Golub, G. H. and C. F. Van Loan (1996). Matrix Computations (3rd edn). Baltimore, MD: Johns Hopkins
University Press.
Gonzalo, J. (1994). Five alternative methods of estimating long-run equilibrium relationships. Journal of
Econometrics 60, 203–233.
Gorman, W. M. (1953). Community preference fields. Econometrica 21, 63–80.
Gouriéroux, C., A. Holly, and A. Monfort (1982). Likelihood ratio test, Wald test, and Kuhn–Tucker test in
linear models with inequality constraints on the regression parameters. Econometrica 50, 63–80.
Gouriéroux, C., A. Monfort, E. Renault, and A. Trognon (1987). Generalised residuals. Journal of Economet-
rics 34, 5–32.
Gouriéroux, C., A. Monfort, and G. M. Gallo (1997). Time Series and Dynamic Models. New York: Cambridge
University Press.
Granger, C. W. J. (1969). Investigating causal relations by econometric models and cross-spectral methods.
Econometrica 37, 424–438.
Granger, C. W. J. (1980). Long memory relationships and the aggregation of dynamic models. Journal of Econo-
metrics 14, 227–238.
Granger, C. W. J. (1986). Developments in the study of co-integrated economic variables. Oxford Bulletin of
Economics and Statistics 48, 213–228.
Granger, C. W. J. (1987). Implications of aggregation with common factors. Econometric Theory 3, 208–222.
Granger, C. W. J. (1990). Aggregation of time-series variables: a survey. In T. Barker and M. H. Pesaran (eds.),
Disaggregation in Econometric Modelling, ch. 2, pp. 17–34. London and New York: Routledge.
Granger, C. W. J. (1992). Forecasting stock market prices: lessons for forecasters. International Journal of Fore-
casting 8, 3–13.
Granger, C. W. J. and Y. Jeon (2007). Evaluation of global models. Economic Modelling 24, 980–989.
Granger, C. W. J. and J. L. Lin (1995). Causality in the long run. Econometric Theory 11, 530–536.
Granger, C. W. J. and M. J. Morris (1976). Time series modelling and interpretation. Journal of the Royal Sta-
tistical Society A 139, 246–257.
Granger, C. W. J. and P. Newbold (1974). Spurious regressions in econometrics. Journal of Econometrics 2,
111–120.
Granger, C. W. J. and P. Newbold (1977). Forecasting Economic Time Series. New York: Academic Press.
Granger, C. W. J. and M. H. Pesaran (2000a). A decision-based approach to forecast evaluation. In W. S. Chan,
W. K. Li, and H. Tong (eds.), Statistics and Finance: An Interface. London: Imperial College Press.
Granger, C. W. J. and M. H. Pesaran (2000b). Economic and statistical measures of forecast accuracy. Journal
of Forecasting 19, 537–560.
Gray, D. F., M. Gross, J. Paredes, and M. Sydow (2013). Modeling banking, sovereign, and macro risk in a
CCA Global VAR. IMF Working Papers 13/218, International Monetary Fund.
Gredenhoff, M. and T. Jacobson (2001). Bootstrap testing linear restrictions on cointegrating vectors. Journal
of Business and Economic Statistics 19, 63–72.
Greenberg, E. (2013). Introduction to Bayesian Econometrics (2nd edn). New York: Cambridge University
Press.
Greene, W. (2002). Econometric Analysis (5th edn). Upper Saddle River, NJ: Prentice Hall.
Greenwood-Nimmo, M., V. H. Nguyen, and Y. Shin (2012a). International linkages of the Korean economy:
The global vector error-correcting macroeconometric modelling approach. Melbourne Institute Working
Paper Series wp2012n18, Melbourne Institute of Applied Economic and Social Research, The University
of Melbourne.
Greenwood-Nimmo, M., V. H. Nguyen, and Y. Shin (2012b). Probabilistic forecasting of output, growth, infla-
tion and the balance of trade in a GVAR framework. Journal of Applied Econometrics 27, 554–573.
Gregory, A. W. and M. R. Veall (1985). Formulating Wald tests of nonlinear restrictions. Econometrica 53,
1465–1468.
Griffith, D. A. (2010). Modeling spatio-temporal relationships: retrospect and prospect. Journal of
Geographical Systems 12, 111–123.
Griliches, Z. (1957). Specification bias in estimates of production functions. Journal of Farm Economics 39,
8–20.
Hamilton, J. D. (1994). Time Series Analysis. Princeton, NJ: Princeton University Press.
Hanck, C. (2009). For which countries did PPP hold? A multiple testing approach. Empirical Economics 37,
93–103.
Hannan, E. J. (1970). Multiple Time Series. New York: John Wiley.
Hannan, E. J. and M. Deistler (1988). The Statistical Theory of Linear Systems. New York: John Wiley & Sons.
Hansen, B. E. (1995). Rethinking the univariate approach to unit root testing: using covariates to increase
power. Econometric Theory 11, 1148–1171.
Hansen, C. B. (2007). Asymptotic properties of a robust variance matrix estimator for panel data when T is
large. Journal of Econometrics 141, 597–620.
Hansen, G., J. R. Kim, and S. Mittnik (1998). Testing cointegrating coefficients in vector autoregressive error
correction models. Economics Letters 58, 1–5.
Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica 50,
1029–1054.
Hansen, L. P., J. Heaton, and A. Yaron (1996). Finite-sample properties of some alternative GMM estimators.
Journal of Business & Economic Statistics 14, 262–280.
Hansen, L. P. and K. J. Singleton (1982). Generalized instrumental variables estimation of nonlinear rational
expectations models. Econometrica 50, 1269–1286.
Hansen, P. R., Z. Huang, and H. H. Shek (2012). Realized GARCH: a joint model for returns and realized
measures of volatility. Journal of Applied Econometrics 27, 877–906.
Haque, N. U., M. H. Pesaran, and S. Sharma (2000). Neglected heterogeneity and dynamics in cross-country
savings regressions. In J. Krishnakumar and E. Ronchetti (eds.), Panel Data Econometrics: Papers in Honour
of Professor Pietro Balestra. New York: Elsevier.
Harbo, I., S. Johansen, B. Nielsen, and A. Rahbek (1998). Asymptotic inference on cointegrating rank in partial
systems. Journal of Business and Economic Statistics 16, 388–399.
Harding, M. (2013, April). Estimating the number of factors in large dimensional factor models. Mimeo, Stan-
ford University.
Harris, D., S. Leybourne, and B. McCabe (2004). Panel stationarity tests for purchasing power parity with
cross-sectional dependence. Journal of Business and Economic Statistics 23, 395–409.
Harris, R. D. F. and H. E. Tzavalis (1999). Inference for unit roots in dynamic panels where the time dimension
is fixed. Journal of Econometrics 91, 201–226.
Hartree, D. R. (1958). Numerical Analysis. Oxford: Clarendon.
Harvey, A. C. (1981). The Econometric Analysis of Time Series. London: Philip Allan.
Harvey, A. C. (1989). Forecasting Structural Time Series Models and the Kalman Filter. Cambridge: Cambridge
University Press.
Harvey, A. C. and D. Bates (2003). Multivariate unit root tests, stability and convergence. University of Cam-
bridge, DAE Working Paper No. 301.
Harvey, A. C. and A. Jaeger (1993). Detrending, stylized facts and the business cycle. Journal of Applied Econo-
metrics 8, 231–247.
Harvey, A. C. and N. Shephard (1993). Structural time series models. In G. S. Maddala, C. R. Rao, and H. D.
Vinod (eds.), Handbook of Statistics, vol. 11. Amsterdam: Elsevier Science.
Harvey, D. I., S. J. Leybourne, and P. Newbold (1997). Testing the equality of prediction mean squared errors.
International Journal of Forecasting 13, 281–291.
Harvey, D. I., S. J. Leybourne, and P. Newbold (1998). Tests for forecast encompassing. Journal of Business and
Economic Statistics 16, 254–259.
Harvey, D. I., S. J. Leybourne, and N. D. Sakkas (2006). Panel unit root tests and the impact of initial observa-
tions. Granger Centre Discussion Paper No. 06/02, University of Nottingham.
Hastie, T., R. Tibshirani, and J. Friedman (2009). The Elements of Statistical Learning (2nd edn). Berlin:
Springer.
Hausman, J. A. (1978). Specification tests in econometrics. Econometrica 46, 1251–1272.
Hausman, J. A. and W. E. Taylor (1981). Panel data and unobservable individual effects. Econometrica 49,
1377–1398.
Hayakawa, K., M. Pesaran, and L. Smith (2014). Transformed maximum likelihood estimation of short
dynamic panel data models with interactive effects. CESifo Working Paper No. 4822.
Hayakawa, K. and M. H. Pesaran (2015). Robust standard errors in transformed likelihood estimation of
dynamic panel data models. Journal of Econometrics. Forthcoming.
Hayashi, F. (1982). Tobin’s marginal Q and average Q: A neoclassical interpretation. Econometrica 50,
213–224.
Hayashi, F. and C. Sims (1983). Nearly efficient estimation of time series models with predetermined, but not
exogenous, instruments. Econometrica 51, 783–798.
Hebous, S. and T. Zimmermann (2013). Estimating the effects of coordinated fiscal actions in the Euro Area.
European Economic Review 58, 110–121.
Heijmans, R. D. H. and J. R. Magnus (1986a). Asymptotic normality of maximum likelihood estimators
obtained from normally distributed but dependent observations. Econometric Theory 2, 374–412.
Heijmans, R. D. H. and J. R. Magnus (1986b). Consistent maximum-likelihood estimation with dependent
observations: the general (non-normal) case and the normal case. Journal of Econometrics 32, 253–285.
Heijmans, R. D. H. and J. R. Magnus (1986c). On the first-order efficiency and asymptotic normality of
maximum likelihood estimators obtained from dependent observations. Statistica Neerlandica 40, 169–188.
Hendry, D. F. and N. R. Ericsson (2003). Understanding Economic Forecasts. Cambridge, MA: The MIT Press.
Hendry, D. F., A. R. Pagan, and J. D. Sargan (1984). Dynamic specification. In Z. Griliches and M. Intriligator
(eds.), Handbook of Econometrics, vol. II, pp. 1023–1100. Amsterdam: Elsevier.
Henriksson, R. D. and R. C. Merton (1981). On market-timing and investment performance. II. Statistical
procedures for evaluating forecasting skills. Journal of Business 54, 513–533.
Hepple, L. W. (1998). Exact testing for spatial correlation among regression residuals. Environment and Plan-
ning A 30, 85–108.
Hentschel, L. (1991). The absolute value GARCH model and the volatility of U.S. stock returns. Unpublished
Manuscript, Princeton University.
Hiebert, P. and I. Vansteenkiste (2009). Do house price developments spill over across Euro Area countries?
Evidence from a Global VAR. Working Paper Series 1026, European Central Bank.
Hiebert, P. and I. Vansteenkiste (2010). International trade, technological shocks and spillovers in the labour
market: a GVAR analysis of the US manufacturing sector. Applied Economics 42, 3045–3066.
Hildreth, C. (1950). Combining cross section data and time series. Cowles Commission Discussion Paper,
No. 347.
Hildreth, C. and W. Dent (1974). An adjusted maximum likelihood estimator. In W. Sellekaerts (ed.),
Econometrics and Economic Theory: Essays in Honour of Jan Tinbergen. London: Macmillan.
Hildreth, C. and J. Houck (1968). Some estimators for a linear model with random coefficients. Journal of the
American Statistical Association 63, 584–595.
Hlouskova, J. and M. Wagner (2006). The performance of panel unit root and stationarity tests: results from
a large scale simulation study. Econometric Reviews 25, 85–116.
Hodrick, R. and E. Prescott (1997). Post-war U.S. business cycles: an empirical investigation. Journal of Money,
Credit, and Banking 29, 1–16.
Hoeting, J. A., D. Madigan, A. E. Raftery, and C. T. Volinsky (1999). Bayesian model averaging: tutorial. Sta-
tistical Science 14, 382–417.
Holly, S., M. H. Pesaran, and T. Yamagata (2010). A spatio-temporal model of house prices in the US. Journal
of Econometrics 158, 160–173.
Holly, S., M. H. Pesaran, and T. Yamagata (2011). The spatial and temporal diffusion of house prices in the
UK. Journal of Urban Economics 69, 2–23.
Holly, S. and I. Petrella (2012). Factor demand linkages, technology shocks, and the business cycle. Review of
Economics and Statistics 94, 948–963.
Holtz-Eakin, D., W. Newey, and H. S. Rosen (1988). Estimating vector autoregressions with panel data. Econo-
metrica 56, 1371–1395.
Horn, R. A. and C. R. Johnson (1985). Matrix Analysis. Cambridge: Cambridge University Press.
Horowitz, J. L. (1994). Bootstrap-based critical values for the information matrix test. Journal of Economet-
rics 61, 395–411.
Horowitz, J. L. (2009). Semiparametric and Nonparametric Methods in Econometrics. Berlin: Springer.
Hsiao, C. (1974). Statistical inference for a model with both random cross-sectional and time effects. Interna-
tional Economic Review 15, 12–30.
Hsiao, C. (1975). Some estimation methods for a random coefficient model. Econometrica 43, 305–325.
Hsiao, C. (2003). Analysis of Panel Data (2nd edn). Cambridge: Cambridge University Press.
Hsiao, C. (2014). Analysis of Panel Data (3rd edn). Cambridge: Cambridge University Press.
Hsiao, C., T. W. Appelbe, and C. R. Dineen (1992). A general framework for panel data models with an
application to Canadian customer-dialed long distance telephone service. Journal of Econometrics 59,
63–86.
Hsiao, C. and M. H. Pesaran (2008). Random coefficient models. In L. Matyas and P. Sevestre (eds.), The
Econometrics of Panel Data, ch. 6, pp. 185–213. Berlin: Springer.
Hsiao, C., M. H. Pesaran, and A. Pick (2012). Diagnostic tests of cross-section independence for limited
dependent variable panel data models. Oxford Bulletin of Economics and Statistics 74, 253–277.
Hsiao, C., M. H. Pesaran, and A. K. Tahmiscioglu (1999). Bayes estimation of short-run coefficients in
dynamic panel data models. In C. Hsiao, L. F. Lee, K. Lahiri, and M. H. Pesaran (eds.), Analysis of
Panels and Limited Dependent Variables Models. Cambridge: Cambridge University Press.
Hsiao, C., M. H. Pesaran, and A. K. Tahmiscioglu (2002). Maximum likelihood estimation of fixed effects
dynamic panel data models covering short time periods. Journal of Econometrics 109, 107–150.
Hsiao, C., Y. Shen, and H. Fujiki (2005). Aggregate vs disaggregate data analysis: a paradox in the estimation
of a money demand function of Japan under the low interest rate policy. Journal of Applied Econometrics 20,
579–601.
Hsiao, C. and Q. Zhou (2015). Statistical inference for panel dynamic simultaneous equations models. Journal
of Econometrics. Forthcoming.
Huang, J. S. (1984). The autoregressive moving average model for spatial analysis. Australian Journal of Statis-
tics 26, 169–178.
Huizinga, J. (1988). An empirical investigation of the long-run behavior of real exchange rates. Carnegie-
Rochester Conference Series Public Policy 27, 149–214.
Hurwicz, L. (1950). Least squares bias in time series. In T. C. Koopmans (ed.), Statistical Inference in Dynamic
Economic Models. New York: Wiley.
Ibragimov, R. and U. K. Müller (2010). t-statistic based correlation and heterogeneity robust inference. Journal
of Business and Economic Statistics 28, 453–468.
Im, K. S., J. Lee, and M. Tieslau (2005). Panel LM unit root tests with level shifts. Oxford Bulletin of Economics
and Statistics 67, 393–419.
Im, K. S., M. H. Pesaran, and Y. Shin (2003). Testing for unit roots in heterogeneous panels. Journal of Econo-
metrics 115, 53–74.
Imbs, J., H. Mumtaz, M. O. Ravn, and H. Rey (2005). PPP strikes back: Aggregation and the real exchange
rate. Quarterly Journal of Economics 120, 1–43.
Inoue, A. and L. Kilian (2013). Inference on impulse response functions in structural VAR models. Journal of
Econometrics 177, 1–13.
Iskrev, N. (2010a). Evaluating the strength of identification in DSGE models: an a priori approach. 2010
Meeting Papers 1117, Society for Economic Dynamics.
Iskrev, N. (2010b). Local identification in DSGE models. Journal of Monetary Economics 57, 189–202.
Iskrev, N. and M. Ratto (2010). Analysing identification issues in DSGE models. MONFISPOL papers,
Stresa, Italy.
James, W. and C. Stein (1961). Estimation with quadratic loss. Proceedings of the Fourth Berkeley Symposium
on Mathematical Statistics and Probability 1, 361–379.
Jannsen, N. (2010). National and international business cycle effects of housing crises. Applied Economics
Quarterly (formerly: Konjunkturpolitik) 56, 175–206.
Jarque, C. M. and A. K. Bera (1980). Efficient tests for normality, homoscedasticity and serial independence
of regression residuals. Economics Letters 6, 255–259.
Jensen, P. S. and T. D. Schmidt (2011). Testing cross-sectional dependence in regional panel data. Spatial
Economic Analysis 6, 423–450.
Johansen, S. (1988). Statistical analysis of cointegration vectors. Journal of Economic Dynamics and Control 12,
231–254.
Johansen, S. (1991). Estimation and hypothesis testing of cointegration vectors in Gaussian vector autore-
gressive models. Econometrica 59, 1551–1580.
Johansen, S. (1992). Cointegration in partial systems and the efficiency of single-equation analysis. Journal of
Econometrics 52, 231–254.
Johansen, S. (1994). The role of the constant and linear terms in cointegration analysis of nonstationary vari-
ables. Econometric Reviews 13, 205–229.
Johansen, S. (1995). Likelihood Based Inference on Cointegration in the Vector Autoregressive Model. Oxford:
Oxford University Press.
Johansen, S. and K. Juselius (1992). Testing structural hypotheses in a multivariate cointegration analysis of
the PPP and UIP for UK. Journal of Econometrics 53, 211–244.
John, S. (1971). Some optimal multivariate tests. Biometrika 58, 123–127.
Jolliffe, I. T. (2004). Principal Components Analysis (2nd edn). New York: Springer.
Jones, M. C., J. S. Marron, and S. J. Sheather (1996). A brief survey of bandwidth selection for density estima-
tion. Journal of the American Statistical Association 91, 401–407.
Jönsson, K. (2005). Cross-sectional dependency and size distortion in a small-sample homogeneous panel-
data unit root test. Oxford Bulletin of Economics and Statistics 67, 369–392.
Jorgenson, D. W. (1966). Rational distributed lag functions. Econometrica 34, 135–149.
Judge, G. G., W. E. Griffiths, R. C. Hill, H. Lütkepohl, and T. C. Lee (1985). The Theory and Practice of Econo-
metrics (2nd edn). New York: John Wiley.
Juselius, K. (2007). The Cointegrated VAR Model: Methodology and Applications. Oxford: Oxford University
Press.
Kaiser, H. F. (1960). The application of electronic computers to factor analysis. Educational and Psychological
Measurement 20, 141–151.
Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. Journal of Basic Engineering,
Transactions ASME, Series D 82, 35–45.
Kandel, S. and R. F. Stambaugh (1996). On the predictability of stock returns: an asset-allocation perspective.
Journal of Finance 51, 385–424.
Kao, C. (1999). Spurious regression and residual-based tests for cointegration in panel data. Journal of Econo-
metrics 90, 1–44.
Kao, C. and M. Chiang (2001). On the estimation and inference of a cointegrated regression in panel data.
Advances in Econometrics 15, 179–222.
Kapetanios, G. (2003). Determining the poolability properties of individual series in panel datasets. Queen
Mary, University of London Working Paper No. 499.
Kapetanios, G. (2004). A new method for determining the number of factors in factor models with large
datasets. Queen Mary, University of London, Working Paper No. 525.
Kapetanios, G. (2007). Dynamic factor extraction of cross-sectional dependence in panel unit root tests. Jour-
nal of Applied Econometrics 22, 313–338.
Kapetanios, G. (2010). A testing procedure for determining the number of factors in approximate factor mod-
els with large datasets. Journal of Business and Economic Statistics 28, 397–409.
Kapetanios, G. and M. H. Pesaran (2007). Alternative approaches to estimation and inference in large multi-
factor panels: small sample results with an application to modelling of asset returns. In G. Phillips and
E. H. Tzavalis (eds.), The Refinement of Econometric Estimation and Test Procedures: Finite Sample and
Asymptotic Analysis. Cambridge: Cambridge University Press.
Kapetanios, G., M. H. Pesaran, and T. Yamagata (2011). Panels with nonstationary multifactor error struc-
tures. Journal of Econometrics 160, 326–348.
Kapetanios, G. and Z. Psaradakis (2007). Semiparametric sieve-type GLS inference. Working paper No. 587,
University of London.
Kapoor, M., H. H. Kelejian, and I. Prucha (2007). Panel data models with spatially correlated error compo-
nents. Journal of Econometrics 140, 97–130.
Karagedikli, O., T. Matheson, C. Smith, and S. P. Vahey (2010). RBCs and DSGEs: the computational
approach to business cycle theory and evidence. Journal of Economic Surveys 24, 113–136.
Karlin, S. and H. M. Taylor (1975). A First Course in Stochastic Processes (2nd edn). New York: Academic Press.
Keane, M. P. and D. E. Runkle (1992). On the estimation of panel-data models with serial correlation when
instruments are not strictly exogenous. Journal of Business and Economic Statistics 10, 1–9.
Kelejian, H. H. (1980). Aggregation and disaggregation of non-linear equations. In J. Kmenta and J. B. Ramsey
(eds.), Evaluation of Econometric Models. New York: Academic Press.
Kelejian, H. H. and I. Prucha (1998). A generalized spatial two stage least squares procedure for estimat-
ing a spatial autoregressive model with autoregressive disturbances. Journal of Real Estate Finance and
Economics 17, 99–121.
Kelejian, H. H. and I. Prucha (1999). A generalized moments estimator for the autoregressive parameter in a
spatial model. International Economic Review 40, 509–533.
Kelejian, H. H. and I. Prucha (2001). On the asymptotic distribution of the Moran I test with applications.
Journal of Econometrics 104, 219–257.
Kelejian, H. H. and I. Prucha (2007). HAC estimation in a spatial framework. Journal of Econometrics 140,
131–154.
Kelejian, H. H. and I. Prucha (2010). Specification and estimation of spatial autoregressive models with
autoregressive and heteroskedastic disturbances. Journal of Econometrics 157, 53–67.
Kelejian, H. H. and D. P. Robinson (1993). A suggested method of estimation for spatial interdependence
models with autocorrelated errors, and an application to a county expenditure model. Papers in Regional
Science 72, 297–312.
Kelejian, H. H. and D. P. Robinson (1995). Spatial correlation: a suggested alternative to the autoregressive
model. In L. Anselin and R. J. Florax (eds.), New Directions in Spatial Econometrics, pp. 75–95. Berlin:
Springer-Verlag.
Kendall, M. G. (1938). A new measure of rank correlation. Biometrika 30, 81–89.
Kendall, M. G. (1953). The analysis of economic time series - part 1: Prices. Journal of the Royal Statistical
Society 96, 11–25.
Kendall, M. G. (1954). Note on bias in the estimation of autocorrelation. Biometrika 41, 403–404.
Kendall, M. G. and J. D. Gibbons (1990). Rank Correlation Methods (5th edn). London: Edward Arnold.
Kendall, M. G., A. Stuart, and J. K. Ord (1983). The Advanced Theory of Statistics, vol. 3. London: Charles
Griffin & Co.
Kennedy, P. (2003). A Guide to Econometrics. Oxford: Blackwell.
Kezdi, A. (2004). Robust standard error estimation in fixed-effects panel models. Hungarian Statistical
Review 9, 95–116.
Khintchine, A. (1934). Korrelationstheorie der stationären stochastischen Prozesse. Mathematische
Annalen 109, 604–615.
Kiefer, N. M. and T. J. Vogelsang (2002). Heteroskedasticity-autocorrelation robust standard errors using the
Bartlett kernel without truncation. Econometrica 70, 2093–2095.
Kiefer, N. M. and T. J. Vogelsang (2005). A new asymptotic theory for heteroskedasticity autocorrelation
robust tests. Econometric Theory 21, 1130–1164.
Kiefer, N. M., T. J. Vogelsang, and H. Bunzel (2000). Simple robust testing of regression hypotheses. Econo-
metrica 68, 695–714.
Kilian, L. (1997). Impulse response analysis in vector autoregressions with unknown lag order. Unpublished
manuscript, University of Michigan.
Kilian, L. (1998). Confidence intervals for impulse responses under departures from normality. Econometric
Reviews 17, 1–29.
King, R. G. and M. W. Watson (1998). The solution of singular linear difference systems under rational expec-
tations. International Economic Review 39, 1015–1026.
Kiviet, J. F. (1995). On bias, inconsistency, and efficiency of various estimators in dynamic panel data models.
Journal of Econometrics 68, 53–78.
Kiviet, J. F. (1999). Expectation of expansions for estimators in a dynamic panel data model; some results for
weakly exogenous regressors. In C. Hsiao, K. Lahiri, L.-F. Lee, and M. H. Pesaran (eds.), Analysis of Panel
Data and Limited Dependent Variables. Cambridge: Cambridge University Press.
Kiviet, J. F. and G. D. A. Phillips (1993). Alternative bias approximation with lagged-dependent variables.
Econometric Theory 9, 62–80.
Kleibergen, F. and S. Mavroeidis (2009). Weak instrument robust tests in GMM and the new Keynesian Phillips
curve. Journal of Business and Economic Statistics 27, 293–311.
Klein, L. R. (1962). An Introduction to Econometrics. Upper Saddle River, NJ: Prentice-Hall.
Kocherlakota, N. R. (2003). The equity premium: it’s still a puzzle. Journal of Economic Literature 34, 42–71.
Koenker, R. and J. A. Machado (1999). GMM inference when the number of moment conditions is large.
Journal of Econometrics 93, 327–344.
Komunjer, I. and S. Ng (2011). Dynamic identification of DSGE models. Econometrica 79, 1995–2032.
Konstantakis, K. N. and P. G. Michaelides (2014). Transmission of the debt crisis: from EU15 to USA or vice
versa? A GVAR approach. Journal of Economics and Business 76, 115–132.
Koop, G. (2003). Bayesian Econometrics. New York: John Wiley.
Koop, G., M. H. Pesaran, and S. M. Potter (1996). Impulse response analysis in nonlinear multivariate models.
Journal of Econometrics 74, 119–147.
Koop, G., M. H. Pesaran, and R. Smith (2013). On identification of Bayesian DSGE models. Journal of Business
and Economic Statistics 31, 300–314.
Kukenova, M. and J. A. Monteiro (2009). Spatial dynamic panel model and system GMM: a Monte Carlo
investigation. MPRA Working Paper n. 13405.
Kullback, S. and R. A. Leibler (1951). On information and sufficiency. Annals of Mathematical Statistics 22,
79–86.
Kwiatkowski, D., P. C. B. Phillips, P. Schmidt, and Y. Shin (1992). Testing the null hypothesis of stationarity
against the alternative of a unit root: how sure are we that economic time series have a unit root? Journal
of Econometrics 54, 159–178.
Kydland, F. and E. Prescott (1996). The computational experiment: an econometric tool. Journal of Economic
Perspectives 10, 69–85.
Larsson, R. and J. Lyhagen (1999). Likelihood-based inference in multivariate panel cointegration models.
Working paper series in Economics and Finance, no. 331, Stockholm School of Economics.
Larsson, R., J. Lyhagen, and M. Löthgren (2001). Likelihood-based cointegration tests in heterogeneous
panels. Econometrics Journal 4, 109–142.
Ledoit, O. and M. Wolf (2004). A well-conditioned estimator for large-dimensional covariance matrices.
Journal of Multivariate Analysis 88, 365–411.
Lee, K. and M. H. Pesaran (1993). Persistence profiles and business cycle fluctuations in a disaggregated
model of UK output growth. Ricerche Economiche 47, 293–322.
Lee, L. F. (2003). Best spatial two-stage least squares estimators for a spatial autoregressive model with autore-
gressive disturbances. Econometric Reviews 22, 307–335.
Lee, L. F. (2004). Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive
models. Econometrica 72, 1899–1925.
Lee, L. F. (2007). GMM and 2SLS estimation of mixed regressive, spatial autoregressive models. Journal of
Econometrics 137, 489–514.
Lee, L. F. and X. Liu (2006). Efficient GMM estimation of a SAR model with autoregressive disturbances.
Mimeo.
Lee, L. F. and X. Liu (2010). Efficient GMM estimation of high order spatial autoregressive models with
autoregressive disturbances. Econometric Theory 26, 187–230.
Lee, L. F. and J. Yu (2010a). Estimation of spatial autoregressive panel data models with fixed effects. Journal
of Econometrics 154, 165–185.
Lee, L. F. and J. Yu (2010b). Some recent developments in spatial panel data model. Regional Science and Urban
Economics 40, 255–271.
Lee, L. F. and J. Yu (2010c). Spatial panels: Random components vs. fixed effects. Mimeo, Ohio State
University.
Lee, L. F. and J. Yu (2011). Estimation of Spatial Panels. Hanover, MA: Now Publishers, Foundations and
Trends in Econometrics.
Lee, L. F. and J. Yu (2013). Spatial panel data models. Mimeo, April, 2013.
Leroy, S. (1973). Risk aversion and the martingale property of stock returns. International Economic Review 14,
436–446.
LeSage, J. and R. K. Pace (2009). Introduction to Spatial Econometrics. Abingdon, Oxford: Taylor and Fran-
cis/CRC Press.
Levin, A., C. Lin, and C. Chu (2002). Unit root tests in panel data: asymptotic and finite-sample properties.
Journal of Econometrics 108, 1–24.
Levinsohn, J. and A. Petrin (2003). Estimating production functions using inputs to control for unobserv-
ables. Review of Economic Studies 70, 317–342.
Lewbel, A. (1994). Aggregation and simple dynamics. American Economic Review 84, 905–918.
Leybourne, S. J. (1995). Testing for unit roots using forward and reverse Dickey-Fuller regressions. Oxford
Bulletin of Economics and Statistics 57, 559–571.
Li, H. and G. S. Maddala (1996). Bootstrapping time series models. Econometric Reviews 15, 115–158.
Lillard, L. A. and Y. Weiss (1979). Components of variation in panel earnings data: American scientists 1960–
70. Econometrica 47, 437–454.
Lin, X. and L. F. Lee (2010). GMM estimation of spatial autoregressive models with unknown heteroskedas-
ticity. Journal of Econometrics 157, 34–52.
Lindley, D. V. and A. F. M. Smith (1972). Bayes estimates for the linear model. Journal of the Royal Statistical
Society, B 34, 1–41.
Lippi, M. (1988). On the dynamic shape of aggregated error correction models. Journal of Economic Dynamics
and Control 12, 561–585.
Litterman, R. (1980). Techniques for forecasting with vector autoregressions. Ph.D. Dissertation, University
of Minnesota, Minneapolis.
Litterman, R. (1986). Forecasting with Bayesian vector autoregressions—five years of experience. Journal of
Business and Economic Statistics 4, 25–38.
Litterman, R. and K. Winkelmann (1998). Estimating Covariance Matrices. Risk Management Series. New
York: Goldman Sachs.
Liu, X., L. F. Lee, and C. R. Bollinger (2006). Improved efficient quasi maximum likelihood estimator of spatial
autoregressive models. Mimeo.
Ljung, G. M. and G. E. P. Box (1978). On a measure of lack of fit in time series models. Biometrika 65, 297–303.
Lo, A. (2004). The adaptive markets hypothesis: market efficiency from an evolutionary perspective. Journal
of Portfolio Management 30, 15–29.
Lo, A. and C. MacKinlay (1988). Stock market prices do not follow random walks: evidence from a simple
specification test. Review of Financial Studies 1, 41–66.
Loeve, M. (1977). Probability Theory One. Berlin: Springer Verlag.
Lothian, J. R. and M. Taylor (1996). Real exchange rate behavior: the recent float from the perspective of the
last two centuries. Journal of Political Economy 104, 488–509.
Lovell, M. C. (1963). Seasonal adjustment of economic time series and multiple regression analysis. Journal
of the American Statistical Association 58, 993–1010.
Lucas, R. E. (1978). Asset prices in an exchange economy. Econometrica 46, 1429–1446.
Lütkepohl, H. (1984). Linear transformation of vector ARMA processes. Journal of Econometrics 26, 283–293.
Lütkepohl, H. (1996). Handbook of Matrices. New York: John Wiley.
Lütkepohl, H. (2005). New Introduction to Multiple Time Series Analysis. Berlin: Springer Verlag.
Lütkepohl, H. and M. Krätzig (2004). Applied Time Series Econometrics. Cambridge: Cambridge University
Press.
Lütkepohl, H., P. Saikkonen, and C. Trenkler (2001). Maximum eigenvalue versus trace tests for the cointe-
grating rank of a VAR process. Econometrics Journal 4, 287–310.
Lyhagen, J. (2008). Why not use standard panel unit root test for testing PPP. Economics Bulletin 3, 1–11.
MacDonald, R. and P. D. Murphy (1989). Testing for the long run relationship between nominal interest rates
and inflation using cointegration techniques. Applied Economics 21, 439–447.
MacKinnon, J. G. and H. White (1985). Some heteroskedasticity-consistent matrix estimators with improved
finite sample properties. Journal of Econometrics 29, 305–325.
MacKinnon, J. G. (1991). Critical values for cointegration tests. In R. F. Engle and C. W. J. Granger (eds.), Long-Run
Economic Relationships: Readings in Cointegration, ch. 13, pp. 267–276. Oxford: Oxford University Press.
MacKinnon, J. G. (1996). Numerical distribution functions for unit root and cointegration tests. Journal of
Applied Econometrics 11, 601–618.
MacKinnon, J. G., A. A. Haug, and L. Michelis (1999). Numerical distribution functions of likelihood ratio
tests for cointegration. Journal of Applied Econometrics 14, 563–577.
MacKinnon, J. G., H. White, and R. Davidson (1983). Tests for model specification in the presence of alter-
native hypotheses: some further results. Journal of Econometrics 21, 53–70.
Maddala, G. S. (1971). The use of variance components models in pooling cross section and time series data.
Econometrica 39, 341–358.
Maddala, G. S. (1988). Introduction to Econometrics. New York: Macmillan.
Maddala, G. S. and S. Wu (1999). A comparative study of unit root tests with panel data and a new simple test.
Oxford Bulletin of Economics and Statistics Special Issue, 631–652.
Madsen, E. (2010). Unit root inference in panel data models where the time-series dimension is fixed: a com-
parison of different tests. The Econometrics Journal 13, 63–94.
Magnus, J. R. (1982). Multivariate error components analysis of linear and nonlinear regression models by
maximum likelihood. Journal of Econometrics 19, 239–285.
Magnus, J. R. and H. Neudecker (1999). Matrix Differential Calculus with Applications in Statistics and Econo-
metrics. New York: John Wiley and Sons.
Malkiel, B. G. (2003). The efficient market hypothesis and its critics. Journal of Economic Perspectives 17,
59–82.
Maravall, A. and A. D. Rio (2007). Temporal aggregation, systematic sampling, and the Hodrick-Prescott
filter. Computational Statistics and Data Analysis 52, 975–998.
Marcellino, M., J. H. Stock, and M. W. Watson (2006). A comparison of direct and iterated multistep AR meth-
ods for forecasting macroeconomic time series. Journal of Econometrics 135, 499–526.
Mardia, K. V. and R. J. Marshall (1984). Maximum likelihood estimation of models for residual covariance in
spatial regression. Biometrika 71, 135–146.
Mark, N. C., M. Ogaki, and D. Sul (2005). Dynamic seemingly unrelated cointegration regression. Review of
Economic Studies 72, 797–820.
Mark, N. C. and D. Sul (2003). Cointegration vector estimation by panel DOLS and long-run money demand.
Oxford Bulletin of Economics and Statistics 65, 655–680.
Marriott, F. H. C. and J. A. Pope (1954). Bias in the estimation of autocorrelations. Biometrika 41,
390–402.
Marçal, E. F., B. Zimmermann, D. D. Prince, and G. T. Merlin (2014). Assessing interdependence among coun-
tries’ fundamentals and its implications for exchange rate misalignment estimates: An empirical exercise
based on GVAR. <http://ssrn.com/abstract=2364508> or <http://dx.doi.org/10.2139/ssrn.2364508>.
Massey, F. J. (1951). The Kolmogorov–Smirnov test of goodness of fit. Journal of the American Statistical Asso-
ciation 46, 68–78.
Masson, P. R., T. Bayoumi, and H. Samiei (1998). International evidence on the determinants of private sav-
ing. The World Bank Economic Review 12, 483–501.
Mátyás, L. (1999). Generalized Method of Moments Estimation. Cambridge: Cambridge University Press.
Mavroeidis, S. (2005). Identification issues in forward-looking models estimated by GMM, with an application
to the Phillips curve. Journal of Money, Credit, and Banking 37, 421–448.
McAleer, M. and M. H. Pesaran (1986). Statistical inference in non-nested econometric models. Applied
Mathematics and Computation 20, 271–311.
McCoskey, S. and C. Kao (1998). A residual-based test of the null of cointegration in panel data. Econometric
Reviews 17, 57–84.
McCracken, M. W. and K. D. West (2004). Inference about predictive ability. In M. P. Clements and D. F.
Hendry (eds.), A Companion to Economic Forecasting. Malden: Wiley Blackwell.
McLeish, D. L. (1975a). Invariance principles for dependent variables. Zeitschrift für Wahrscheinlichkeitsthe-
orie und Verwandte Gebiete 32, 165–178.
McLeish, D. L. (1975b). A maximal inequality and dependent strong laws. Annals of Probability 3, 829–839.
McMillen, D. P. (1995). Selection bias in spatial econometric models. Journal of Regional Science 35,
417–423.
Meghir, C. and L. Pistaferri (2004). Income variance dynamics and heterogeneity. Econometrica 72, 1–32.
Mehra, R. and E. Prescott (1985). The equity premium: a puzzle. Journal of Monetary Economics 15, 146–161.
Mehra, R. and E. C. Prescott (2003). The equity premium puzzle in retrospect. In G. M. Constantinides, M. Harris,
and R. Stulz (eds.), Handbook of the Economics of Finance, pp. 889–938. Amsterdam: North Holland.
Melino, A. and S. M. Turnbull (1990). Pricing foreign currency options with stochastic volatility. Journal of
Econometrics 45, 239–265.
Merton, R. C. (1981). On market-timing and investment performance: an equilibrium theory of market fore-
casts. Journal of Business 54, 363–406.
Mills, T. C. (1990). Time Series Techniques for Economists. Cambridge: Cambridge University Press.
Mills, T. C. (2003). Modelling Trends and Cycles in Economic Series. London: Palgrave Texts in Econometrics.
Mishkin, F. S. (1992). Is the Fisher effect for real? Journal of Monetary Economics 30, 195–215.
Mizon, G. E. and J. F. Richard (1986). The encompassing principle and its application to testing non-nested
hypotheses. Econometrica 54, 657–678.
Moon, H. R. and B. Perron (2004). Testing for a unit root in panels with dynamic factors. Journal of Econo-
metrics 122, 81–126.
Moon, H. R. and B. Perron (2005). Efficient estimation of the seemingly unrelated regression cointegration
model and testing for purchasing power parity. Econometric Reviews 23, 293–323.
Moon, H. R. and B. Perron (2012). Beyond panel unit root tests: using multiple testing to determine the
nonstationarity properties of individual series in a panel. Journal of Econometrics 169(1), 29–33.
Moon, H. R., B. Perron, and P. C. B. Phillips (2006). On the Breitung test for panel unit roots and local asymp-
totic power. Econometric Theory 22, 1179–1190.
Moon, H. R., B. Perron, and P. C. B. Phillips (2007). Incidental trends and the power of panel unit root tests.
Journal of Econometrics 141, 416–459.
Moon, H. R. and M. Weidner (2015). Dynamic linear panel regression models with interactive fixed effects.
Econometric Theory. Forthcoming.
Moran, P. A. P. (1948). The interpretation of statistical maps. Biometrika 35, 255–260.
Moran, P. A. P. (1950). Notes on continuous stochastic processes. Biometrika 37, 17–23.
Mörters, P. and Y. Peres (2010). Brownian Motion. New York: Cambridge Series in Statistical and Probabilistic
Mathematics.
Moscone, F. and E. Tosetti (2009). A review and comparison of tests of cross section independence in panels.
Journal of Economic Surveys 23, 528–561.
Moscone, F. and E. Tosetti (2011). GMM estimation of spatial panels with fixed effects and unknown het-
eroskedasticity. Regional Science and Urban Economics 41, 487–497.
Moulton, B. R. (1990). An illustration of a pitfall in estimating the effects of aggregate variables on micro units.
Review of Economics and Statistics 72, 334–338.
Mountford, A. and H. Uhlig (2009). What are the effects of fiscal policy shocks? Journal of Applied Economet-
rics 24, 960–992.
Muellbauer, J. (1975). Aggregation, income distribution and consumer demand. Review of Economic Stud-
ies 42, 525–543.
Muellbauer, J. and R. Lattimore (1995). The consumption function: A theoretical and empirical overview.
In M. H. Pesaran and M. R. Wickens (eds.), Handbook of Applied Econometrics: Macroeconomics,
pp. 221–311. Oxford: Basil Blackwell.
Mundlak, Y. (1961). Empirical production functions free of management bias. Journal of Farm Economics 43,
44–56.
Mundlak, Y. (1978). On the pooling of time series and cross section data. Econometrica 46, 69–85.
Mur, J., F. López, and M. Herrera (2010). Testing for spatial effects in seemingly unrelated regressions. Spatial
Economic Analysis 5, 399–440.
Murphy, A. H. and H. Daan (1985). Forecast evaluation. In A. H. Murphy and R. W. Katz (eds.), Probability,
Statistics, and Decision Making in the Atmospheric Sciences, pp. 379–437. Boulder, CO: Westview.
Murray, C. J. and D. H. Papell (2002). Testing for unit roots in panels in the presence of structural change with
an application to OECD unemployment. In B. H. Baltagi (ed.), Nonstationary Panels, Panel Cointegration, and
Dynamic Panels, Advances in Econometrics, vol. 15. Amsterdam: JAI.
Muth, J. F. (1961). Rational expectations and the theory of price movements. Econometrica 29, 315–335.
Mutl, J. and M. Pfaffermayr (2011). The Hausman test in a Cliff and Ord panel model. The Econometrics Jour-
nal 14, 48–76.
Nabeya, S. (1999). Asymptotic moments of some unit root test statistics in the null case. Econometric The-
ory 15, 139–149.
Nason, J. and G. Smith (2008). Identifying the new Keynesian Phillips curve. Journal of Applied Economet-
rics 23, 525–551.
Nauges, C. and A. Thomas (2003). Consistent estimation of dynamic panel data models with time-varying
individual effects. Annales d’Economie et de Statistique 70, 53–74.
Neave, H. R. and P. L. Worthington (1992). Distribution-free Tests. London: Routledge.
Nelson, C. R. and C. I. Plosser (1982). Trends and random walks in macro-economic time series. Journal of
Monetary Economics 10, 139–162.
Nelson, D. B. (1991). Conditional heteroskedasticity in asset returns: a new approach. Econometrica 59,
347–370.
Nelson, D. B. and C. Q. Cao (1992). Inequality constraints in the univariate GARCH model. Journal of Business
and Economic Statistics 10, 229–235.
Nerlove, M. (2002). Essays in Panel Data Econometrics. Cambridge: Cambridge University Press.
Nerlove, M. and P. Balestra (1992). Formulation and estimation of econometric models for the analysis
of panel data. In L. Matyas and P. Sevestre (eds.), The Econometrics of Panel Data. Dordrecht: Kluwer
Academic Publishers.
Nerlove, M., D. M. Grether, and J. L. Carvalho (1979). Analysis of Economic Time Series. New York: Academic
Press.
Newey, W. K. and R. J. Smith (2000). Asymptotic bias and equivalence of GMM and GEL estimators. MIT
Discussion Paper No. 01/517.
Newey, W. K. and K. D. West (1987). A simple, positive semi-definite, heteroskedasticity and autocorrelation
consistent covariance matrix. Econometrica 55, 703–708.
Newey, W. K. and K. D. West (1994). Automatic lag selection in covariance matrix estimation. Review of
Economic Studies 61, 631–653.
Neyman, J. and E. Scott (1948). Consistent estimates based on partially consistent observations. Economet-
rica 16, 1–32.
Ng, S. (2006). Testing cross section correlation in panel data using spacings. Journal of Business and Economic
Statistics 24, 12–23.
Ng, S. (2008). A simple test for nonstationarity in mixed panels. Journal of Business and Economic Statistics 26,
113–127.
Nickell, S. (1981). Biases in dynamic models with fixed effects. Econometrica 49, 1417–1426.
Nijman, T. and M. Verbeek (1992). Nonresponse in panel data: the impact on estimates of a life cycle con-
sumption function. Journal of Applied Econometrics 7, 243–257.
Nyblom, J. (1989). Testing for the constancy of parameters over time. Journal of the American Statistical Asso-
ciation 84, 223–230.
O’Connell, P. G. J. (1998). The overvaluation of purchasing power parity. Journal of International Economics 44,
1–19.
Ogaki, M. (1992). Engel’s law and cointegration. Journal of Political Economy 100, 1027–1046.
Onatski, A. (2009). Testing hypotheses about the number of factors in large factor models. Econometrica 77,
1447–1479.
Onatski, A. (2010). Determining the number of factors from empirical distribution of eigenvalues. Review of
Economics and Statistics 92, 1004–1016.
Onatski, A. (2012). Asymptotics of the principal components estimator of large factor models with weakly
influential factors. Journal of Econometrics 168, 244–258.
Orcutt, G. H. and H. S. Winokur (1969). First order autoregression: inference, estimation, and prediction.
Econometrica 37, 1–14.
Ord, J. K. (1975). Estimation methods for models of spatial interaction. Journal of the American Statistical
Association 70, 120–126.
Osborne, M. (1959). Brownian motion in the stock market. Operations Research 7, 145–173.
Osborne, M. (1962). Periodic structures in the Brownian motion of stock prices. Operations Research 10,
345–379.
Osterwald-Lenum, M. (1992). A note with quantiles of the asymptotic distribution of the maximum likeli-
hood cointegration rank test statistics. Oxford Bulletin of Economics and Statistics 54, 461–472.
Pagan, A. R. (1980). Some identification and estimation results for regression models with stochastically vary-
ing coefficients. Journal of Econometrics 13, 341–364.
Pagan, A. R. and A. Ullah (1999). Nonparametric Econometrics. Cambridge: Cambridge University Press.
Pagan, A. R. and M. H. Pesaran (2008). Econometric analysis of structural systems with permanent and tran-
sitory shocks and exogenous variables. Journal of Economic Dynamics and Control 32, 3376–3395.
Palma, W. (2007). Long-Memory Time Series: Theory and Methods. Hoboken, NJ: John Wiley.
Pantula, S. G., G. Gonzalez-Farias, and W. A. Fuller (1994). A comparison of unit-root test criteria. Journal of
Business and Economic Statistics 12, 449–459.
Park, H. and W. Fuller (1995). Alternative estimators and unit root tests for the autoregressive process. Journal
of Time Series Analysis 16, 415–429.
Park, J. Y. (1992). Canonical cointegrating regressions. Econometrica 60, 119–143.
Pearson, K. (1894). Mathematical contribution to the theory of evolution. II. Skew variation in homogeneous
material. Philosophical Transactions of the Royal Society of London, A 186, 343–414.
Pedroni, P. (1999). Critical values for cointegration tests in heterogeneous panels with multiple regressors.
Oxford Bulletin of Economics and Statistics 61, 653–670.
Pedroni, P. (2000). Fully modified OLS for heterogeneous cointegrated panels. In B. H. Baltagi (ed.), Non-
stationary Panels, Panel Cointegration, and Dynamic Panels, Advances in Econometrics, vol. 15. New York:
JAI Press.
Pedroni, P. (2001). Purchasing power parity tests in cointegrated panels. Review of Economics and Statistics 83,
727–731.
Pedroni, P. (2004). Panel cointegration: asymptotic and finite sample properties of pooled time series tests
with an application to the PPP hypothesis. Econometric Theory 20, 597–625.
Pedroni, P. and T. Vogelsang (2005). Robust tests for unit roots in heterogeneous panels. Mimeo, Williams
College.
Pepper, J. V. (2002). Robust inferences from random clustered samples: an application using data from the
panel study of income dynamics. Economics Letters 75, 341–345.
Pesaran, B. and M. H. Pesaran (2009). Time Series Econometrics using Microfit 5.0. Oxford: Oxford University
Press.
Pesaran, B. and M. H. Pesaran (2010). Conditional volatility and correlations of weekly returns and the VaR
analysis of 2008 stock market crash. Economic Modelling 27, 1398–1416.
Pesaran, M. H. (1972). Small Sample Estimation of Dynamic Economic Models. Ph.D. thesis, Cambridge
University.
Pesaran, M. H. (1973). Exact maximum likelihood estimation of a regression equation with first order
moving-average errors. Review of Economic Studies 40, 529–535.
Pesaran, M. H. (1974). On the general problem of model selection. Review of Economic Studies 41, 153–171.
Pesaran, M. H. (1981a). Diagnostic testing and exact maximum likelihood estimation of dynamic models.
In E. Charatsis (ed.), Proceedings of the Econometric Society European Meeting, 1979, Selected Econometric
Papers in memory of Stefan Valavanis, pp. 63–87. Amsterdam: North-Holland.
Pesaran, M. H. (1981b). Identification of rational expectations models. Journal of Econometrics 16, 375–398.
Pesaran, M. H. (1982). Comparison of local power of alternative tests of non-nested regression models. Econo-
metrica 50, 1287–1305.
Pesaran, M. H. (1987a). Econometrics. In J. Eatwell, M. Milgate, and P. Newman (eds.), The New Palgrave: A
Dictionary of Economics, vol. 2. London: Palgrave Macmillan.
Pesaran, M. H. (1987b). Global and partial non-nested hypotheses and asymptotic local power. Econometric
Theory 3, 69–97.
Pesaran, M. H. (1987c). The Limits to Rational Expectations. Oxford: Basil Blackwell. Reprinted with correc-
tions 1989.
Pesaran, M. H. (1997). The role of economic theory in modelling the long run. Economic Journal 107,
178–191.
Pesaran, M. H. (2003). Aggregation of linear dynamic models: an application to life-cycle consumption mod-
els under habit formation. Economic Modelling 20, 383–415.
Pesaran, M. H. (2004). General diagnostic tests for cross section dependence in panels. CESifo Working Paper
No. 1229.
Pesaran, M. H. (2006). Estimation and inference in large heterogeneous panels with a multifactor error structure.
Econometrica 74, 967–1012.
Pesaran, M. H. (2007a). A pair-wise approach to testing for output and growth convergence. Journal of Econo-
metrics 138, 312–355.
Pesaran, M. H. (2007b). A simple panel unit root test in the presence of cross section dependence. Journal of
Applied Econometrics 22, 265–312.
Pesaran, M. H. (2010). Predictability of asset returns and the efficient market hypothesis. In A. Ullah and D. E.
Giles (eds.), Handbook of Empirical Economics and Finance, pp. 281–311. New York: Taylor and Francis.
Pesaran, M. H. (2012). On the interpretation of panel unit root tests. Economics Letters 116, 545–546.
Pesaran, M. H. (2015). Testing weak cross-sectional dependence in large panels. Econometric Reviews 34,
1089–1117.
Pesaran, M. H. and A. Chudik (2014). Aggregation in large dynamic panels. Journal of Econometrics 178,
273–285.
Pesaran, M. H. and A. S. Deaton (1978). Testing non-nested nonlinear regression models. Econometrica 46,
677–694.
Pesaran, M. H. and B. Pesaran (1993). A simulation approach to the problem of computing Cox’s statistic for
testing non-nested models. Journal of Econometrics 57, 377–392.
Pesaran, M. H. and B. Pesaran (1995). A non-nested test of level-differenced versus log-differenced stationary
models. Econometric Reviews 14, 213–227.
Pesaran, M. H. and A. Pick (2007). Econometric issues in the analysis of contagion. Journal of Economic
Dynamics and Control 31, 1245–1277.
Pesaran, M. H., A. Pick, and A. Timmermann (2011). Variable selection, estimation and inference for multi-
period forecasting problems. Journal of Econometrics 164, 173–187.
Pesaran, M. H., R. G. Pierse, and K. C. Lee (1993). Persistence, cointegration and aggregation: A disaggregated
analysis of output fluctuations in the U.S. economy. Journal of Econometrics 56, 57–88.
Pesaran, M. H., R. G. Pierse, and K. Lee (1994). Choice between disaggregate and aggregate specifications
estimated by IV method. Journal of Business and Economic Statistics 12, 111–121.
Pesaran, M. H., R. G. Pierse, and M. S. Kumar (1989). Econometric analysis of aggregation in the context of
linear prediction models. Econometrica 57, 861–888.
Pesaran, M. H., C. Schleicher, and P. Zaffaroni (2009). Model averaging in risk management with an applica-
tion to futures markets. Journal of Empirical Finance 16, 280–305.
Pesaran, M. H., T. Schuermann, and L. V. Smith (2009a). Forecasting economic and financial variables with
global VARs. International Journal of Forecasting 25, 642–675.
Pesaran, M. H., T. Schuermann, and L. V. Smith (2009b). Rejoinder to comments on forecasting economic
and financial variables with global VARs. International Journal of Forecasting 25, 703–715.
Pesaran, M. H., T. Schuermann, and B.-J. Treutler (2007). Global business cycles and credit risk. In The Risks
of Financial Institutions, National Bureau of Economic Research Publications, pp. 419–474. Chicago: Uni-
versity of Chicago Press.
Pesaran, M. H., T. Schuermann, B.-J. Treutler, and S. M. Weiner (2006). Macroeconomic dynamics and credit
risk: a global perspective. Journal of Money, Credit and Banking 38, 1211–1261.
Pesaran, M. H., T. Schuermann, and S. Weiner (2004). Modelling regional interdependencies using
a global error-correcting macroeconometric model. Journal of Business and Economic Statistics 22,
129–162.
Pesaran, M. H. and Y. Shin (1996). Cointegration and speed of convergence to equilibrium. Journal of Econo-
metrics 71, 117–143.
Pesaran, M. H. and Y. Shin (1998). Generalised impulse response analysis in linear multivariate models. Eco-
nomics Letters 58, 17–29.
Pesaran, M. H. and Y. Shin (1999). An autoregressive distributed lag modelling approach to cointegration
analysis. In S. Strom and P. Diamond (eds.), Econometrics and Economic Theory in the 20th Century: The
Ragnar Frisch Centennial Symposium. Cambridge: Cambridge University Press.
Pesaran, M. H. and Y. Shin (2002). Long run structural modelling. Econometric Reviews 21, 49–87.
Pesaran, M. H., Y. Shin, and R. Smith (1999). Pooled mean group estimation of dynamic heterogeneous pan-
els. Journal of the American Statistical Association 94, 621–634.
Pesaran, M. H., Y. Shin, and R. J. Smith (2000). Structural analysis of vector error correction models with
exogenous I(1) variables. Journal of Econometrics 97, 293–343.
Pesaran, M. H., Y. Shin, and R. J. Smith (2001). Bounds testing approaches to the analysis of level relationships.
Journal of Applied Econometrics 16, 289–326. Special issue in honour of J D Sargan on the theme ‘Studies
in Empirical Macroeconometrics’.
Pesaran, M. H. and L. J. Slater (1980). Dynamic Regression: Theory and Algorithms. Chichester: Ellis Horwood.
Pesaran, M. H., L. V. Smith, and R. P. Smith (2007). What if the UK or Sweden had joined the euro in 1999?
An empirical evaluation using a global VAR. International Journal of Finance & Economics 12, 55–87.
Pesaran, M. H., L. V. Smith, and T. Yamagata (2013). Panel unit root test in the presence of a multifactor error
structure. Journal of Econometrics 175, 94–115.
Pesaran, M. H. and R. Smith (2006). Macroeconometric modelling with a global perspective. Manchester
School 74, 24–49.
Pesaran, M. H. and R. J. Smith (1994). A generalized R² criterion for regression models estimated by the
instrumental variables method. Econometrica 62, 705–710.
Pesaran, M. H. and R. P. Smith (1985). Evaluation of macroeconometric models. Economic Modelling 2,
125–134.
Pesaran, M. H. and R. P. Smith (1995). Estimating long-run relationships from dynamic heterogeneous pan-
els. Journal of Econometrics 68, 79–113.
Pesaran, M. H. and R. P. Smith (1998). Structural analysis of cointegrating VARs. Journal of Economic
Surveys 12, 471–505.
Pesaran, M. H. and R. P. Smith (2014). Signs of impact effects in time series regression models. Economics
Letters 122, 150–153.
Pesaran, M. H., R. P. Smith, and K. S. Im (1996). Dynamic linear models for heterogeneous pan-
els. In L. Matyas and P. Sevestre (eds.), The Econometrics of Panel Data (2nd edn). Boston: Kluwer
Academic.
Pesaran, M. H., R. P. Smith, and S. Yeo (1985). Testing for structural stability and predictive failure: a review.
Manchester School 53, 280–295.
Pesaran, M. H. and A. Timmermann (1992). A simple nonparametric test of predictive performance. Journal
of Business and Economic Statistics 10, 461–465.
Pesaran, M. H. and A. Timmermann (1994). Forecasting stock returns: an examination of stock market trad-
ing in the presence of transaction costs. Journal of Forecasting 13, 335–367.
Pesaran, M. H. and A. Timmermann (1995). The robustness and economic significance of predictability of
stock returns. Journal of Finance 50, 1201–1228.
Pesaran, M. H. and A. Timmermann (2000). A recursive modelling approach to predicting UK stock returns.
Economic Journal 110, 159–191.
Pesaran, M. H. and A. Timmermann (2005a). Real time econometrics. Econometric Theory 21, 212–231.
Pesaran, M. H. and A. Timmermann (2005b). Small sample properties of forecasts from autoregressive mod-
els under structural breaks. Journal of Econometrics 129, 183–217.
Pesaran, M. H. and A. Timmermann (2009). Testing dependence among serially correlated multi-category
variables. Journal of the American Statistical Association 104, 325–337.
Pesaran, M. H. and E. Tosetti (2011). Large panels with common factors and spatial correlation. Journal of
Econometrics 161, 182–202.
Pesaran, M. H., A. Ullah, and T. Yamagata (2008). A bias-adjusted LM test of error cross section indepen-
dence. Econometrics Journal 11, 105–127.
Pesaran, M. H. and M. Weale (2006). Survey expectations. In C. W. J. Granger, G. G. Elliott, and A. Timmer-
mann (eds.), Handbook of Economic Forecasting, Amsterdam: North-Holland.
Pesaran, M. H. and M. Weeks (2001). Non-nested hypothesis testing: an overview. In B. H. Baltagi (ed.),
Companion to Theoretical Econometrics. Oxford: Basil Blackwell.
Pesaran, M. H. and T. Yamagata (2008). Testing slope homogeneity in large panels. Journal of Econometrics 142,
50–93.
Pesaran, M. H. and Z. Zhao (1999). Bias reduction in estimating long run relationships from dynamic het-
erogeneous panels. In C. Hsiao, K. Lahiri, L. Lee, and M. Pesaran (eds.), Analysis of Panels and Limited
Dependent Variables: A Volume in Honour of G. S. Maddala. Cambridge: Cambridge University Press.
Pesaran, M. H. and Q. Zhou (2014). Estimation of time-invariant effects in static panel data models. CAFE
Research Paper No. 14.08, University of Southern California.
Pesavento, E. (2007). Residuals-based tests for the null of no-cointegration: an analytical comparison. Journal
of Time Series Analysis 28, 111–137.
Pfaffermayr, M. (2009). Maximum likelihood estimation of a general unbalanced spatial random effects
model: a Monte Carlo study. Spatial Economic Analysis 4, 467–483.
Phillips, A. W. (1954). Stabilisation policy in a closed economy. Economic Journal 64, 290–323.
Phillips, A. W. (1957). Stabilisation policy and the time-forms of lagged responses. Economic Journal 67,
265–277.
Phillips, P. C. B. (1986). Understanding spurious regressions in econometrics. Journal of Econometrics 33,
311–340.
Phillips, P. C. B. (1991). Optimal inference in cointegrated systems. Econometrica 59, 283–306.
Phillips, P. C. B. (1994). Some exact distribution theory for maximum likelihood estimators of cointegrating
coefficients in error correction models. Econometrica 62, 73–93.
Phillips, P. C. B. (1995). Fully modified least squares and vector autoregressions. Econometrica 63, 1023–1078.
Phillips, P. C. B. and S. N. Durlauf (1986). Multiple time series regression with integrated processes. Review of
Economic Studies 53, 473–495.
Phillips, P. C. B. and B. E. Hansen (1990). Statistical inference in instrumental variables regression with I(1)
processes. Review of Economic Studies 57, 99–125.
Phillips, P. C. B. and H. R. Moon (1999). Linear regression theory for nonstationary panel data. Economet-
rica 67, 1057–1111.
Phillips, P. C. B. and S. Ouliaris (1990). Asymptotic properties of residual based tests for cointegration. Econo-
metrica 58, 165–193.
Phillips, P. C. B. and P. Perron (1988). Testing for a unit root in time series regression. Biometrika 75, 335–346.
Phillips, P. C. B. and D. Sul (2003). Dynamic panel estimation and homogeneity testing under cross section
dependence. Econometrics Journal 6, 217–259.
Phillips, P. C. B. and D. Sul (2007). Bias in dynamic panel estimation with fixed effects, incidental trends and
cross section dependence. Journal of Econometrics 137, 162–188.
Phillips, P. C. B., Y. Sun, and S. Jin (2006). Spectral density estimation and robust hypothesis testing using
steep origin kernels without truncation. International Economic Review 47, 837–894.
Pinkse, J. (1999). Asymptotic properties of Moran and related tests and testing for spatial correlation in probit
models. University of British Columbia.
Pinkse, J., M. Slade, and C. Brett (2002). Spatial price competition: a semiparametric approach. Economet-
rica 70, 1111–1153.
Ploberger, W. and W. Krämer (1992). The CUSUM test with OLS residuals. Econometrica 60(2), 271–286.
Ploberger, W. and P. C. B. Phillips (2002). Optimal testing for unit roots in panel data. Mimeo, University of
Rochester.
Poirier, D. J. (1998). Revising beliefs in unidentified models. Econometric Theory 14, 483–509.
Powell, M. J. D. (1964). An efficient method for finding the minimum of a function of several variables without
calculating derivatives. Computer Journal 7, 155–162.
Press, W. H., B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling (1989). Numerical Recipes: The Art of Scientific
Computing FORTRAN version. Cambridge: Cambridge University Press.
Priestley, M. B. (1981). Spectral Analysis and Time Series. London: Academic Press.
Pyke, R. (1965). Spacings. Journal of the Royal Statistical Society, Series B 27, 395–449.
Quandt, R. E. (1960). Tests of the hypothesis that a linear regression system obeys two separate regimes.
Journal of the American Statistical Association 55, 324–330.
Quenouille, M. (1949). Approximate tests of correlation in time series. Journal of the Royal Statistical Society,
Series B 11, 68–83.
Rahbek, A. and R. Mosconi (1999). Cointegration rank inference with stationary regressors in VAR models.
The Econometrics Journal 2, 76–91.
Rao, C. R. (1970). Estimation of heteroscedastic variances in linear models. Journal of the American Statistical
Association 65, 161–172.
Rao, C. R. (1973). Linear Statistical Inference and its Applications. New York: John Wiley.
Ravn, M. O. and H. Uhlig (2002). On adjusting the Hodrick–Prescott filter for the frequency of observations.
Review of Economics and Statistics 84, 371–376.
Rivera-Batiz, L. A. and P. M. Romer (1991). Economic integration and endogenous growth. Quarterly Journal
of Economics 106, 531–555.
Roberts, H. (1967). Statistical versus clinical prediction in the stock market. Unpublished manuscript, Center
for Research in Security Prices, University of Chicago.
Robertson, D., A. Garratt, and S. Wright (2006). Permanent vs transitory components and economic funda-
mentals. Journal of Applied Econometrics 21, 521–542.
Robertson, D. and V. Sarafidis (2013). IV estimation of panels with factor residuals. Working Paper 1321,
Cambridge Working Papers in Economics.
Robertson, D. and J. Symons (1992). Some strange properties of panel data estimators. Journal of Applied
Econometrics 7, 175–189.
Robinson, P. M. (1978). Statistical inference for a random coefficient autoregressive model. Scandinavian
Journal of Statistics 5, 163–168.
Robinson, P. M. (1994). Time series with strong dependence. In C. A. Sims (ed.), Advances in Econometrics:
Sixth World Congress, vol 1. Cambridge: Cambridge University Press.
Robinson, P. M. (1995). Gaussian semiparametric estimation of long range dependence. The Annals of
Statistics 23, 1630–1661.
Robinson, P. M. (2007). Nonparametric spectrum estimation for spatial data. Journal of Statistical Planning
and Inference 137, 1024–1034.
Robinson, P. M. (2008). Correlation testing in time series, spatial and cross-sectional data. Journal of Econo-
metrics 147, 5–16.
Rose, D. E. (1977). Forecasting aggregates of independent ARIMA processes. Journal of Econometrics 5,
323–345.
Rosenberg, B. (1972). The estimation of stationary stochastic regression parameters reexamined. Journal of
the American Statistical Association 67, 650–654.
Rosenberg, B. (1973). The analysis of a cross-section of time series by stochastically convergent parameter
regression. Annals of Economic and Social Measurement 2, 399–428.
Rosenblatt, M. (1952). Remarks on a multivariate transformation. Annals of Mathematical Statistics 23,
470–472.
Rothenberg, T. J. (1971). Identification in parametric models. Econometrica 39, 577–591.
Rozanov, Y. A. (1967). Stationary Random Processes. San Francisco: Holden-Day.
Rubinstein, M. (1976). The valuation of uncertain income streams and the pricing of options. Bell Journal of
Economics 7, 407–425.
Al-Sadoon, M. M., T. Li, and M. H. Pesaran (2012). An exponential class of dynamic binary choice panel data
models with fixed effects. Technical report, CESifo Working Paper Series No. 4033. Revised 2014.
Said, E. and D. A. Dickey (1984). Testing for unit roots in autoregressive-moving average models of unknown
order. Biometrika 71, 599–607.
Saikkonen, P. (1991). Asymptotically efficient estimation of cointegration regressions. Econometric Theory 7,
1–21.
Salemi, M. K. (1986). Solution and estimation of linear rational expectation models. Journal of Economet-
rics 31, 41–66.
Salkever, D. S. (1976). The use of dummy variables to compute predictions, prediction errors and confidence
intervals. Journal of Econometrics 4, 393–397.
Samuelson, P. (1965). Proof that properly anticipated prices fluctuate randomly. Industrial Management
Review Spring 6, 41–49.
Sarafidis, V. and D. Robertson (2009). On the impact of error cross-sectional dependence in short dynamic
panel estimation. Econometrics Journal 12, 62–81.
Sarafidis, V. and T. Wansbeek (2012). Cross-sectional dependence in panel data analysis. Econometric
Reviews 31, 483–531.
Sarafidis, V., T. Yamagata, and D. Robertson (2009). A test of cross section dependence for a linear dynamic
panel model with regressors. Journal of Econometrics 148, 149–161.
Sargan, J. D. (1958). The estimation of economic relationships using instrumental variables. Econometrica 26,
393–415.
Sargan, J. D. (1959). The estimation of relationships with autocorrelated residuals by the use of instrumental
variables. Journal of the Royal Statistical Society, Series B 21, 91–105.
Sargan, J. D. (1964). Wages and prices in the United Kingdom: a study in econometric methodology. In
P. Hart, G. Mills, and J. Whitaker (eds.), Econometric Analysis for National Economic Planning. London:
Butterworths.
Sargan, J. D. (1976). Testing for misspecification after estimation using instrumental variables. Unpublished
manuscript.
Sargan, J. D. and A. Bhargava (1983). Testing for residuals from least squares regression being generated by
Gaussian random walk. Econometrica 51, 153–174.
Sargent, T. J. (1976). The observational equivalence of natural and unnatural rate theories of macroeco-
nomics. Journal of Political Economy 84, 631–640.
Sargent, T. J. and C. A. Sims (1977). Business cycle modeling without pretending to have too much a-priori
economic theory. In C. Sims (ed.), New Methods in Business Cycle Research. Minneapolis: Federal Reserve
Bank of Minneapolis.
Satchell, S. and J. L. Knight (eds.) (2007). Forecasting Volatility in the Financial Markets (3rd edn). Amsterdam:
Butterworth-Heinemann Finance.
Schanne, N. (2011). Forecasting regional labour markets with GVAR models and indicators. Conference
paper presented at the European Regional Science Association,
<http://econpapers.repec.org/paper/wiwwiwrsa/ersa10p1044.htm>.
Scheffé, H. (1959). The Analysis of Variance. New York: John Wiley.
Scheinkman, J. A. and W. Xiong (2003). Overconfidence and speculative bubbles. Journal of Political Econ-
omy 111, 1183–1219.
Schmidt, P. and P. C. B. Phillips (1992). LM test for a unit root in the presence of deterministic trends. Oxford
Bulletin of Economics and Statistics 54, 257–287.
Schott, J. R. (2005). Testing for complete independence in high dimensions. Biometrika 92, 951–956.
Sentana, E. (2000). The likelihood function of conditionally heteroskedastic factor models. Annales
d’Economie et de Statistique 58, 1–19.
Serfling, R. J. (1980). Approximation Theorems of Mathematical Statistics. New York: Wiley.
Shaman, P. and R. A. Stine (1988). The bias of autoregressive coefficient estimators. Journal of the American
Statistical Association 83, 842–848.
Shapiro, M. D. and M. W. Watson (1988). Sources of business cycle fluctuations. NBER Macroeconomics
Annual 3, 111–148.
Sheather, S. J. (2004). Density estimation. Statistical Science 19, 588–597.
Shephard, N. (2005). Stochastic Volatility: Selected Readings. Oxford: Oxford University Press.
Shiller, R. J. (2005). Irrational Exuberance (2nd edn). Princeton, NJ: Princeton University Press.
Shin, Y. (1994). A residual-based test of the null of cointegration against the alternative of no cointegration.
Econometric Theory 10, 91–115.
Silverman, B. (1986). Density Estimation for Statistics and Data Analysis. London: Chapman and Hall.
Sims, C. (1980). Macroeconomics and reality. Econometrica 48, 1–48.
Sims, C. (1986). Are forecasting models usable for policy analysis? Quarterly Review, Federal Reserve Bank of
Minneapolis 10, 105–120.
Sims, C. (2001). Solving linear rational expectations models. Computational Economics 20, 1–20.
Sims, C. and T. Zha (1998). Bayesian methods for dynamic multivariate models. International Economic
Review 39, 949–968.
Skouras, S. (1998). Risk neutral forecasting. EUI Working Papers, Eco No. 98/40, European University
Institute.
Slutsky, E. (1937). The summation of random causes as the source of cyclic processes. Econometrica 5,
105–146.
Smeekes, S. (2010). Bootstrap sequential tests to determine the stationary units in a panel. Mimeo, Maastricht
University.
Smets, F. and R. Wouters (2003). An estimated dynamic stochastic general equilibrium model of the Euro
Area. Journal of the European Economic Association 1, 1123–1175.
Smets, F. and R. Wouters (2007). Shocks and frictions in US business cycles: a Bayesian DSGE approach.
American Economic Review 97, 586–606.
Smith, L. V. and A. Galesi (2014). GVAR Toolbox 2.0 for Global VAR Modelling.
<https://sites.google.com/site/gvarmodelling/gvar-toolbox>.
Smith, L. V., S. Leybourne, T. Kim, and P. Newbold (2004). More powerful panel data unit root tests with an
application to mean reversion in real exchange rates. Journal of Applied Econometrics 19, 147–170.
Smith, L. V. and T. Yamagata (2011). Firm level return–volatility analysis using dynamic panels. Journal of
Empirical Finance 18, 847–867.
So, B. S. and D. W. Shin (1999). Recursive mean adjustment in time series inferences. Statistics & Probability
Letters 43, 65–73.
Söderlind, P. (1994). Cyclical properties of a real business cycle model. Journal of Applied Econometrics 9,
113–122.
Song, M. (2013). Asymptotic theory for dynamic heterogeneous panels with cross-sectional dependence and
its applications. Mimeo, 30 January 2013.
Spanos, A. (1989). Statistical Foundations of Econometric Modelling. Cambridge: Cambridge University Press.
Spearman, C. (1904). General intelligence objectively determined and measured. American Journal of Psychol-
ogy 15, 201–293.
Stock, J. H. and M. W. Watson (1999). Forecasting inflation. Journal of Monetary Economics 44, 293–335.
Stock, J. H. and M. W. Watson (2002). Macroeconomic forecasting using diffusion indexes. Journal of Business
and Economic Statistics 20, 147–162.
Stock, J. H. and M. W. Watson (2004). Combination forecasts of output growth in a seven-country data set.
Journal of Forecasting 23, 405–430.
Stock, J. H. and M. W. Watson (2005). Implications of dynamic factor models for VAR analysis. NBER Work-
ing Paper No. 11467.
Stock, J. H. and M. W. Watson (2011). Dynamic factor models. In M. P. Clements and D. F. Hendry (eds.),
The Oxford Handbook of Economic Forecasting. New York: Oxford University Press.
Stock, J. H., J. Wright, and M. Yogo (2002). A survey of weak instruments and weak identification in general-
ized method of moments. Journal of Business and Economic Statistics 20, 518–529.
Stoker, T. (1984). Completeness, distribution restrictions, and the form of aggregate functions. Economet-
rica 52, 887–907.
Stoker, T. (1986). Simple tests of distributional effects on macroeconomic equations. Journal of Political Econ-
omy 94, 763–795.
Stoker, T. (1993). Empirical approaches to the problem of aggregation over individuals. Journal of Economic
Literature 31, 1827–1874.
Styan, G. P. H. (1970). Notes on the distribution of quadratic forms in singular normal variables.
Biometrika 57, 567–572.
Su, L. and Z. Yang (2015). QML estimation of dynamic panel data models with spatial errors.
Journal of Econometrics 185, 230–258.
Sun, Y., F. F. Heinz, and G. Ho (2013). Cross-country linkages in Europe: a global VAR analysis. IMF Working
Papers 13/194, International Monetary Fund.
Sun, Y., P. C. B. Phillips, and S. Jin (2008). Optimal bandwidth selection in heteroskedasticity-autocorrelation
robust testing. Econometrica 76, 175–194.
Swamy, P. A. V. B. (1970). Efficient inference in random coefficient regression model. Econometrica 38,
311–323.
Tanaka, K. (1990). Testing for a moving average root. Econometric Theory 6, 433–444.
Theil, H. (1954). Linear Aggregation of Economic Relations. Amsterdam: North-Holland.
Theil, H. (1957). Specification errors and the estimation of economic relations. Review of the International
Statistical Institute 25, 41–51.
Tian, Y. and G. P. H. Styan (2006). Cochran’s statistical theorem revisited. Journal of Statistical Planning and
Inference 136, 2659–2667.
Tibshirani, R. (1996). Regression shrinkage and selection via the LASSO. Journal of the Royal Statistical Society
Series B 58, 267–288.
Tiefelsdorf, M. and D. A. Griffith (2007). Semiparametric filtering of spatial autocorrelation: the eigenvector
approach. Environment and Planning A 39, 1193–1221.
Timmermann, A. (2006). Forecast combinations. In Handbook of Economic Forecasting, pp. 135–196.
Amsterdam: North-Holland.
Tobin, J. (1950). A statistical demand function for food in the USA. Journal of the Royal Statistical Society A 113,
113–141.
Trapani, L. and G. Urga (2010). Micro versus macro cointegration in heterogeneous panels. Journal of Econo-
metrics 155, 1–18.
Tso, M. K. S. (1981). Reduced-rank regression and canonical analysis. Journal of the Royal Statistical Society,
Series B 43, 183–189.
Tzavalis, E. H. (2002). Structural breaks and unit root tests for short panels. Mimeo, Queen Mary, University
of London.
Uhlig, H. (2001). Toolkit for analysing nonlinear dynamic stochastic models easily. Computational Methods
for the Study of Dynamic Economies 33, 30–62.
Uhlig, H. (2005). What are the effects of monetary policy on output? Results from an agnostic identification
procedure. Journal of Monetary Economics 52, 381–419.
Vansteenkiste, I. (2007). Regional housing market spillovers in the US: lessons from regional divergences in
a common monetary policy setting. Working Paper Series 0708, European Central Bank.
Varian, H. (1975). A Bayesian approach to real estate assessment. In S. E. Fienberg and A. Zellner (eds.),
Studies in Bayesian Econometrics and Statistics in Honor of L. J. Savage. Amsterdam: North-Holland.
Velasco, C. (1999). Gaussian semiparametric estimation of non-stationary time series. Journal of Time Series
Analysis 20, 87–127.
Vella, F. and M. Verbeek (1998). Whose wages do unions raise? A dynamic model of unionism and wage rate
determination for young men. Journal of Applied Econometrics 13, 163–183.
Vuong, Q. H. (1989). Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica 57,
307–333.
Wagner, M. (2008). On PPP, unit roots and panels. Empirical Economics 35, 229–249.
Wagner, M. and J. Hlouskova (2010). The performance of panel cointegration methods: results from a large
scale simulation study. Econometric Reviews 29, 182–223.
Wallace, T. D. and A. Hussain (1969). The use of error components models in combining cross-section and
time-series data. Econometrica 37, 55–72.
Wallis, K. F. (1980). Econometric implications of the rational expectations hypothesis. Econometrica 48,
49–73.
Watson, M. W. (1986). Univariate detrending methods with stochastic trends. Journal of Monetary
Economics 18, 49–75.
Watson, M. W. (1994). Vector autoregression and cointegration. In D. McFadden and R. Engle (eds.),
Handbook of Econometrics, pp. 843–915. Amsterdam: North Holland.
Wegge, L. L. and M. Feldman (1983). Identifiability criteria for Muth rational expectations models. Journal of
Econometrics 21, 245–254.
West, K. D. (1996). Asymptotic inference about predictive ability. Econometrica 64, 1067–1084.
Westerlund, J. (2005a). Data dependent endogeneity correction in cointegrated panels. Oxford Bulletin of Eco-
nomics and Statistics 67, 691–705.
Westerlund, J. (2005b). New simple tests for panel cointegration. Econometric Reviews 24, 297–316.
Westerlund, J. (2005c). A panel CUSUM test of the null of cointegration. Oxford Bulletin of Economics and
Statistics 62, 231–262.
Westerlund, J. (2007). Estimating cointegrated panels with common factors and the forward rate unbiased-
ness hypothesis. Journal of Financial Econometrics 3, 491–522.
Westerlund, J. (2009). Some cautions on the LLC panel unit root test. Empirical Economics 37, 517–531.
Westerlund, J. and J. Breitung (2014). Myths and facts about panel unit root tests. Econometric Reviews.
Forthcoming.
Westerlund, J. and R. Larsson (2009). A note on the pooling of individual PANIC unit root tests. Econometric
Theory 25, 1851–1868.
Westerlund, J. and J. Urbain (2011). Cross-sectional averages or principal components? Research Memoranda
053, Maastricht: METEOR, Maastricht Research School of Economics of Technology and Organization.
White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for het-
eroskedasticity. Econometrica 48, 817–838.
White, H. (1982a). Instrumental variables regression with independent observations. Econometrica 50,
483–499.
White, H. (1982b). Maximum likelihood estimation of misspecified models. Econometrica 50,
1–25.
White, H. (2000). Asymptotic Theory for Econometricians (rev. edn). London and New York: Academic Press.
Whiteman, C. (1983). Linear Rational Expectations Models: A User’s Guide. Minneapolis: University of Min-
nesota Press.
Whittle, P. (1954). On stationary processes on the plane. Biometrika 41, 434–449.
Whittle, P. (1963). Prediction and Regulation by Linear Least-Squares Methods. London: English Universities
Press.
Whittle, P. (1979). Why predict? Prediction as an adjunct to action. In D. Anderson (ed.), Forecasting.
Amsterdam: North-Holland.
Wilks, D. S. (1995). Statistical Methods in the Atmospheric Sciences: An Introduction. San Diego: Academic Press.
Windmeijer, F. (2005). A finite sample correction for the variance of linear efficient two-step GMM estimators.
Journal of Econometrics 126, 25–51.
Wold, H. (1938). A Study in the Analysis of Stationary Time Series. Uppsala: Almqvist and Wiksell.
Wold, H. (1982). Soft modeling: The basic design and some extensions. In K. G. Joreskog and H. Wold (eds.),
Systems under indirect observation: Causality, structure, prediction: vol. 2, pp. 589–591. Amsterdam: North-
Holland.
Wooldridge, J. M. (2000). Introductory Econometrics: A Modern Approach (4th edn). Mason, USA: South-
Western.
Wooldridge, J. M. (2003). Further results on instrumental variables estimation of the average treatment effect
in the correlated random coefficient model. Economics Letters 79, 185–191.
Wooldridge, J. M. (2005). Simple solutions to the initial conditions problem in dynamic, nonlinear panel-data
models with unobserved heterogeneity. Journal of Applied Econometrics 20, 39–54.
Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data (2nd edn). Cambridge, MA:
The MIT Press.
Wooldridge, J. M. and H. White (1988). Some invariance principles and central limit theorems for dependent
heterogeneous processes. Econometric Theory 4, 210–230.
Wright, S. (1925). Corn and hog correlations. U.S. Department of Agriculture Bulletin 1300, Washington.
Wu, D. M. (1973). Alternative tests of independence between stochastic regressors and disturbances.
Econometrica 41, 733–750.
Xu, T. (2012, January). The role of credit in international business cycles. Cambridge Working Papers in
Economics 1202, Faculty of Economics, University of Cambridge.
Yin, Y. Q., Z. D. Bai, and P. R. Krishnaiah (1988). On the limit of the largest eigenvalue of the large
dimensional sample covariance matrix. Probability Theory and Related Fields 78, 509–521.
Yu, J., R. de Jong, and L. F. Lee (2008). Quasi-maximum likelihood estimators for spatial dynamic panel data
with fixed effects when both N and T are large. Journal of Econometrics 146, 118–137.
Yule, G. U. (1926). Why do we sometimes get nonsense-correlations between time series? A study in sampling
and the nature of time-series. Journal of the Royal Statistical Society 89, 1–63.
Yule, G. U. (1927). On a method of investigating periodicities in disturbed series with special application to
Wolfer’s sunspot numbers. Philosophical Transactions of the Royal Society, Series A 226, 267–298.
Zaffaroni, P. (2004). Contemporaneous aggregation of linear dynamic models in large economies. Journal of
Econometrics 120, 75–102.
Zaffaroni, P. (2008). Estimating and forecasting volatility with large scale models: Theoretical appraisal of
professionals’ practice. Journal of Time Series Analysis 29, 581–599.
Zellner, A. (1962). An efficient method of estimating seemingly unrelated regressions and tests for aggregation
bias. Journal of the American Statistical Association 57, 348–368.
Zellner, A. (1971). Introduction to Bayesian Inference in Econometrics. New York: John Wiley and Sons.
Zellner, A. (1986). Bayesian estimation and prediction using asymmetric loss functions. Journal of the
American Statistical Association 81, 446–451.
Zellner, A. and H. Theil (1962). Three stage least squares: simultaneous estimation of simultaneous
equations. Econometrica 30, 54–78.
Zwillinger, D. and S. Kokoska (2000). Standard Probability and Statistics Tables and Formulae. Boca Raton, FL:
Chapman and Hall.
Zygmund, A. (1959). Trigonometric Series, vol. 1. Cambridge: Cambridge University Press.
Name Index
Ackerberg, D., 691 Baillie, R. T., 349 Blanchard, O. J., 468, 483, 485, 486,
Agarwal, R. P., 964 Bala, V., 797 489, 600, 603, 915, 916
Ahn, S. C., 685–6, 696, 697, 698, 765 Balestra, P., 673, 677, 680, 681, 704 Blundell, R., 685, 688–91
Ahn, S. K., 852 Baltagi, B. H., 658, 667, 673, 784, Bollerslev, T., 414, 416, 422, 425,
Akay, A., 700 787, 802, 803, 804, 805, 807, 813, 609, 614
Alessandri, P., 926 839, 849, 855 Bollinger, C. R., 808
Alogoskoufis, G. S., 124 Balvers, R. J., 154 Bond, S., 233, 682, 683, 685–6,
Al-Sadoon, M. M., 700 Banbura, M., 902 688–92, 696, 771, 772, 838
Altissimo, F., 859, 892, 893 Banerjee, A., 559, 836, 840, 843, 855 Bonhomme, S., 700
Alvarez, J., 682, 685, 745 Barndorff Nielsen, O. E., 412, 614 Boschi, M., 929
Amemiya, T., 116, 192, 230, 250, Barro, R. J., 771, 797 Boskin, M. J., 705
435, 666, 705 Bartlett, M. S., 299, 528 Boswijk, H. P., 568
Amengual, D., 765 Bates, D., 834 Bover, O., 686–8
An, S., 497 Bates, J. M., 385, 386, 408 Boivin, J., 902
Anatolyev, S., 234 Bauwens, L., 629 Bowman, A. W., 79
Andersen, T. G., 412, 614 Baxter, M., 360 Box, G. E. P., 281, 302
Anderson, T. W., 281, 403, 404, 412, Bayes, T., 985 Box, G. E. P., 281, 302
448, 461, 463, 464, 489, 532, 614, Bayoumi, T., 711, 713, 727 Boyd, S., 959
677, 680, 681–2, 685, 731 Beach, C. M., 101 Breedon, F. J., 580
Anderton, R., 927 Belsley, D. A., 70 Breen, W., 154
Andrews, D. W. K., 114, 115, 321, Benati, L., 491 Breitung, J., 530, 765, 819, 820, 825,
328, 498, 750, 794, 822, 923 Bera, A. K., 75, 141, 225, 241, 254, 827, 828, 829, 830, 834, 835, 838,
Angeletos, G., 157 259, 784 843, 852, 853, 854
Anselin, L., 751, 784, 797, 799, 802, Beran, R., 743 Brent, R. P., 308
803, 810, 812, 815 Berk, K. N., 777, 913 Bresson, G., 807, 839
Aoki, M., 797 Bernanke, B. S., 902, 915 Brett, C., 809, 810, 813
Appelbe, T. W., 705 Bernstein, D. S., 939, 947, 950 Breusch, T. S., 91, 240, 241, 440, 666,
Arbia, G., 797, 815 Bertrand, M., 653 784, 785, 788
Arellano, M., 233, 653–4, 673, 682, Bester, C. A., 813–14 Brock, W., 797
683, 685, 686–8, 690, 692, 696, Bettendorf, T., 928 Brockwell, P. J., 275, 281, 299, 301,
700, 701, 828 Beveridge, S., 364–7, 368, 523, 321, 520
Assenmacher-Wesche, K., 581 552–6 Browning, M., 687, 688, 745
Azzalini, A., 79 Bewley, R., 127 Broze, L., 480, 483, 488, 489
Bhargava, A., 677, 837 Brüggemann, R., 852
Barberis, N., 161 Bickel, P. J., 918 Buhlmann, P., 262, 993
Bachelier, L., 136 Bierens, H. J., 79, 177, 940, 965, 978 Bun, M. J. G., 743
Bai, J., 454, 457, 458, 696, 764, 765, Bilias, Y., 225, 241 Bunzel, H., 115, 116
774, 836, 837, 839, 840, 854, 902, Billingsley, P., 169, 171, 193, 335, Burns, A. M., 360
917 965, 980, 983 Burridge, P., 814
Bai, Z. D., 752 Binder, M., 133, 473, 480, 481, 482, Bussière, M., 918, 925, 926, 928
Baicker, K., 798 500, 504, 685, 695, 797, 852, 888,
Bailey, N., 752, 754, 761, 786, 787, 902, 925 Caglar, E., 498
880, 918 Black, A., 154 Cameron, A. C., 959, 960
Campbell, J. Y., 154, 426, 580 Cooley, T. F., 705 Durbin, J., 111, 363, 369
Candelon, B., 828 Cooper, R., 797 Durlauf, S. N., 570, 984
Canova, F., 497, 903, 916 Cornwell, C., 668 Durlauf, S., 797
Cao, C. Q., 416 Cosimano, T. F., 154 Durrett, R., 965
Carriero, A., 902 Cowles, A., 136
Carrion-i-Silvestre, J. L., 828 Cox, D. R., 251, 257 Easterly, W., 771
Carroll, C. D., 888 Cressie, N., 800 Edison, K. D. W. J., 392
Carvalho, J. L., 275, 281 Crowder, M. J., 210 Egger, P., 802, 803, 804, 808
Case, A. C., 805 Eicker, F., 85
Cashin, P., 929, 930, 932 Dann, H., 397 Eickmeier, S., 931
Castrén, O., 926 Darby, M. R., 579 Ejrnæs, M., 745
Cattell, R. B., 447 Das, S., 829, 830, 834, 835 Eklund, J., 917
Caves, K., 691 Dastoor, N. K., 253 Elhorst, J. P., 696, 802
Cesa-Bianchi, A., 798, 927, 928 Davidson, J., 124, 183, 184, 185, 186, Eliasz, P., 902
Chadha, J., 498 193, 281, 350 Elliott, G., 339, 341, 342, 344, 373,
Chamberlain, G., 685, 696, 750, 755 Davidson, R., 43, 48, 101, 117, 118, 385, 408, 826, 836
Chambers, M., 861 252, 254, 259 Ellison, G., 797
Champernowne, D. G., 26 Davies, R., 402 Embrechts, P., 613
Chan, N. H., 335 Davis, R. A., 275, 281, 299, 301, 321, Engle, R. F., 26, 198, 243, 411, 414,
Chang, Y., 819, 831, 835, 837, 839 520 417, 426, 523, 525–6, 537, 552,
Chatfield, C., 281, 292, 321, 520, 942 Dawid, A. P., 406 564, 609, 612, 613, 616, 618
Chen, Q., 926 Deaton, A. S., 108, 251, 253, 887–8 Entorf, H., 844
Cheng, X., 498 Dées, S., 497, 559, 840, 909, 915, Ericsson, N. R., 387, 924
Cheung, Y., 577 916, 918, 921, 922, 926, 928 Evans, G., 552
Chiang, M., 851 Deistler, M., 281 Everaert, G., 775
Chib, S., 502, 987 De Jong, D., 497
Cho, D., 392 de Jong, R., 185, 810 Fama, E. F., 136, 153, 154
Choi, I., 765, 817, 820, 824, 826, Del Barrio, T., 828 Fan, J., 918
837–8, 855 Del Negro, M., 503, 504 Fan, Y., 918
Chortareas, G., 832 De Mol, C., 902, 917 Farmer, D., 161
Chou, R. Y., 425 Dempster, A. P., 364 Faust, J., 916
Chow, G. C., 77 Den Haan, W. J., 115 Favero, C. A., 401, 926, 931
Christiano, L. J., 360 Dent, W., 101 Feldkircher, M., 903, 914, 925, 928,
Chu, C., 819, 821, 827, 829, 830, 838 De Waal, A., 924, 929 929, 931
Chudik, A., 137, 158, 350, 453, 752, De Wet, A. H., 926 Feldman, M., 495
753, 754, 758, 760, 770, 771, 774, Dhaene, G., 778 Feng, Q., 787
775, 776, 777, 778, 779, 782, 783, Dhrymes, P. J., 121, 948 Fernandez, C., 261
788, 794, 802, 874, 876, 880, 882, Dickey, D. A., 332, 777, 817, 913 Ferrier, G. D., 549, 579, 960
883, 884, 892, 893, 895, 896, 907, Diebold, F. X., 349, 395, 406, 407, Ferson, W. E., 154
911, 913, 914, 918, 919, 920, 925, 421, 614 Fidora, M., 930
926, 928, 930 Di Mauro, F., 497, 924, 932 Fingleton, B., 808
Chue, T. K., 837–8 Dineen, C. R., 705 Fisher, G. R., 252
Ciccarelli, M., 903 Doan, T., 502 Fisher, P., 580
Clare, A. D., 154 Donald, S. G., 234 Fisher, R. A., 13, 824, 838
Clarida, R., 493, 916 Douglas, P. H., 62 Fitzgerald, T. J., 360
Clements, M. P., 387, 392, 406, 408 Dovern, J., 926 Fleming, M. M., 815
Cliff, A. D., 751, 800 Draper, D., 260, 261, 387 Flores, R., 834
Coakley, J., 764, 771, 817, 835 Draper, N. R., 37 Forni, M., 751, 862, 902, 917
Cobb, C. W., 62 Dreger, C., 926, 928 Fox, R., 349
Cochran, W. G., 979 Driscoll, J. C., 813 Fraser, P., 154
Cochrane, D., 106, 108, 110 Druska, V., 798 Fratzscher, M., 926
Cochrane, J. H., 497 Dubin, R. A., 800 Frazer, G., 691
Cogley, J., 360, 580 Dubois, E., 929–30 Frees, E. W., 785
Collado, M. D., 687, 688 Duflo, E., 653 French, K. R., 154
Conley, T. G., 798, 813–14 Dufour, J. M., 77, 514, 785 Friedman, J., 262, 917, 918, 993
Frisch, R., 43 Griffith, D. A., 814 Hendry, D. F., 122, 198, 243, 387,
Fry, R., 603 Griliches, Z., 191, 441, 656, 689, 862 392, 408, 564
Fuertes, A. M., 764, 771, 817 Grilli, V., 579 Henriksson, R. D., 397
Fuhrer, J. C., 888 Groen, J. J. J., 818, 843, 852, 854, 917 Hepple, L. W., 814
Fujiki, H., 859 Groote, T. D., 775 Hericourt, J., 929–30
Fuller, W. A., 271, 332, 339, 342, Gross, M., 902, 911, 925, 927 Herrera, M., 812, 813
345, 384, 817, 822 Grossman, S., 154 Hentschel, L., 417
Grossman, V., 918, 919, 920, 925 Heyde, C. C., 186, 350
Galesi, A., 901, 927 Gruber, M. H. J., 37 Hiebert, P., 930, 931
Gali, J., 493, 603–4, 916 Grunfeld, Y., 441, 656, 862 Hildreth, C., 101, 673, 705
Gallo, G. M., 134 Gruss, B., 930 Hlouskova, J., 838, 849, 853
Galton, F., 5 Gunther, T. A., 406 Ho, G., 929
Garderen, K. J., 862, 866 Gupta, R., 926 Hodrick, R., 358–60
Gardner Jr, E. S., 425 Gutierrez, L., 838, 849, 930 Hoeting, J. A., 260, 261
Garnett, J. C., 448 Hoing, A., 613
Garratt, A., 259, 524, 552, 553, 563, Hachem, W., 752 Holly, A., 253
574, 575, 577, 580, 581, 604, 605, Hadri, K., 832, 838 Holly, S., 760, 761, 763, 844, 847,
921, 925 Hahn, J., 406, 743 846, 908, 932
Garratt, T., 553 Haining, R. P., 751, 784, 801 Holtz-Eakin, D., 683, 696, 697
Geary, R. C., 814 Hall, A. R., 241, 501 Horenstein, A. R., 765
Gengenbach, C., 838, 839 Hall, P., 186, 350, 548 Horn, R. A., 939, 947
Georgiadis, G., 927, 931 Hallin, M., 454, 765, 814 Horowitz, J. L., 79, 743
Gerrard, W. J., 550 Halmos, P. R., 169 Horrace, W. C., 798
Gertler, M., 493 Haltiwanger, J., 797 Houck, J., 705
Geweke, J., 751, 859, 902, 917, 985 Hamilton, J. D., 133, 184, 185, 186, Hsiao, C., 650, 673, 677, 680, 681–2,
Giacomini, R., 396, 862, 902 281, 351, 400, 426, 520, 939, 941, 685, 692, 693, 695, 696, 697, 698,
Giannone, D., 902, 917 961 699, 700, 701, 705, 706, 717, 729,
Giavazzi, F., 401, 931 Hanck, C., 832 730, 731, 736, 788, 852, 859
Gibbons, J. D., 6, 8 Hannan, E. J., 268, 281 Huang, J. S., 801
Gilli, M., 482 Hansen, B. E., 527, 836, 850, 854 Huang, Z., 411
Girardi, A., 929 Hansen, C. B., 655, 813–14 Huber, F., 931
Glosten, L. R., 154 Hansen, G., 852 Huizinga, J., 579
Gnedenko, B. V., 182 Hansen, L. P., 225, 227, 228, 233, Hurwicz, L., 729
Godfrey, L. G., 112, 117, 118, 240, 234, 685, 691 Hussain, A., 657
241, 251, 252, 253, 550 Hansen, P. R., 411
Goffe, W. L., 549, 579, 960 Haque, N. U., 711, 713, 719, 726, 733 Ibragimov, R., 814
Golub, G. H., 939, 953 Harbo, L., 568, 569, 570, 580, 904 Im, K. S., 730, 735, 736, 819, 820,
Gonzalez-Farias, G., 342, 345 Harding, M., 765 822, 823, 828, 832
Gonzalo, J., 546 Harris, D., 832 Imbens, G. W., 234
Gorman, W. M., 859, 864 Harris, R. D. F., 819, 827 Imbs, J., 859
Gouriéroux, C., 134, 253, 480, 483, Hartree, D. R., 306 Ingram, B., 497
488, 489, 788 Harvey, A. C., 281, 360, 361, 363, Inoue, A., 916
Goyal, S., 797 369, 834 Iskrev, N., 497
Granger, C. W. J., 26, 154, 247, 281, Harvey, C. R., 154
350, 377, 385, 386, 398, 408, Harvey, D. I., 395, 396, 826 Jacobson, T., 852
513–17, 523, 525–6, 537, 552, Hastie, T., 262, 917, 918, 993 Jaeger, A., 360, 361
564, 576, 860, 861, 862, 866, 870, Haug, A. A., 543 Jagannathan, R., 154
873, 875, 900 Hausman, J. A., 640, 644, 660, 665, James, W., 36, 37
Gray, D. F., 926 686, 735 Jannsen, N., 930–1
Gredenhoff, M., 852 Hayakawa, K., 695, 697, 698, 699 Jansson, M., 836
Greenberg, E., 985, 987, 992 Hayashi, F., 482, 692 Jarque, C. M., 75, 141
Greene, W., 48, 92, 118, 438, 464 Heaton, J., 233 Jayet, J., 799, 810, 815
Greenwood-Nimmo, M., 925, 929 Hebous, S., 931 Jenkins, G. M., 281
Gregory, A. W., 214 Heijmans, R. D. H., 210, 211, 212 Jensen, P. S., 169, 787–8
Grether, D. M., 275, 281 Heinz, F. F., 929 Jeon, Y., 900
Mariano, R. S., 395, 407 Nabeya, S., 824 Perron, B., 825, 826, 828, 832–3,
Mark, N. C., 851, 853 Nason, J., 494, 497 836, 838–9, 853
Marriott, F. H. C., 313, 743 Nauges, C., 696, 838 Perron, P., 339, 345
Marron, J. S., 79 Neave, H. R., 619 Pesaran, B., 108, 109, 145, 256, 259,
Marshall, R. J., 212 Nelson, C. R., 329, 364–7, 368, 371, 320, 559, 609, 613, 614, 622, 630
Massey, F. J., 625 523, 552–6, 922 Pesaran, M. H., 43, 44, 77, 101, 102,
Masson, P. R., 711, 713, 727 Nelson, D. B., 416 103, 106, 108, 109, 113, 125, 130,
Mátyás, L., 230, 241 Nerlove, M., 275, 281, 673, 677, 680, 133, 134, 137, 138, 145, 154, 155,
Mavroeidis, S., 494, 497 681 158, 160, 161, 239, 247, 251, 252,
McAleer, M., 251, 252, 254, 259 Neudecker, H., 471, 939, 956 253, 256, 259, 262, 304, 306, 320,
McCabe, B., 832 Newbold, P., 26, 281, 395, 396 350, 377, 384, 385, 397, 398, 401,
McCoskey, S., 850 Newey, W. K., 114, 234, 400, 405, 403, 406, 440, 467, 472, 473, 480,
McCracken, M. W., 395 683, 696, 697, 813 481, 482, 494, 495–6, 497, 498,
McLeish, D. L., 328 Neyman, J., 85, 645, 671 500, 501, 504, 525, 526, 527, 535,
McMillen, D. P., 815 Ng, S., 454, 457, 458, 497, 765, 787, 543, 544, 545, 547, 559, 563, 564,
Meghir, C., 744 832, 833, 836, 837, 840, 854, 902, 570, 572, 573, 578, 580, 581, 589,
Mehl, A., 926, 927 917 590, 596, 597, 600, 602, 605, 609,
Mehra, R., 152, 153 Ng, T., 931 613, 614, 619, 622, 630, 660, 663,
Melino, A., 419 Nguyen, V. H., 925, 929 664–5, 667, 668, 669, 685, 692,
Merton, R. C., 397 Nickell, S., 679, 725 693, 695, 696, 697, 698, 699, 700,
Meyer, W., 827 711, 713, 717, 719, 726, 729, 730,
Nicolò, G. de, 916
Michaelides, P. G., 931, 932 731, 732, 733, 735, 736, 738, 741,
Nijman, T., 673
Michelis, L., 543 743, 746, 752, 753, 754, 758, 760,
Nyblom, J., 923
Mignon, V., 929–30 761, 763, 764, 766, 767, 768, 769,
Mills, T. C., 275, 281, 552 770, 771, 773, 774, 775, 776, 777,
O’Connell, P. G. J., 750, 833, 834
Mishkin, F. S., 580 778, 779, 783, 784, 785, 786, 787,
Ogaki, M., 538, 853
Mitchell, W. C., 360 788, 794, 797, 798, 802, 811, 812,
Onatski, A., 454, 457, 458,
Mittnik, S., 852 813, 814, 815, 818, 819, 820, 822,
759, 765
Mizon, G., 666 823, 828, 832, 835, 836, 839, 840,
Orcutt, G. H., 106, 314, 743 841, 842, 843, 844, 846, 847, 851,
Mizon, G. E., 253
Ord, J. K., 302, 751, 800, 802 852, 853, 854, 861, 862, 866, 874,
Mohaddes, K., 929, 930, 932
Orme, C. D., 118 876, 877, 878, 880, 882, 883, 884,
Monahan, J. C., 114, 115, 321
Osbat, C., 836, 840, 843 888, 892, 893, 895, 896, 900, 901,
Monfort, A., 134, 253
Osborne, M., 136 903, 904, 907, 908, 911, 913, 914,
Monteiro, J. A., 810
Osterwald-Lenum, M., 543 915, 916, 918, 919, 920–1, 922,
Moon, H. R., 773, 774, 818, 824,
825, 826, 828, 832–3, 836, 838–9, Ouliaris, S., 525 923, 924, 925, 927, 928, 929, 932,
850, 851, 853, 862 987
Moran, P. A. P., 751, 784, 814 Pace, R. K., 815 Pesavento, E., 526
Morris, M. J., 861 Pagan, A., 78, 79, 602, 705 Petrella, I., 932
Mörters, P., 983 Pagan, A. R., 91, 122, 440, 603, 784, Petrin, A., 691
Moscone, F., 787, 808, 815 785, 788 Pfaffermayr, M., 802, 803, 804, 808
Mosconi, R., 572 Palm, F. C., 838, 839 Phillips, A. W., 124
Moulton, B. R., 798 Palma, W., 348 Phillips, G. D. A., 729, 733
Mountford, A., 916 Pantula, S. G., 342, 345 Phillips, P. C. B., 26, 114, 116, 339,
Muellbauer, J., 859, 888 Papell, D. H., 828 345, 525, 527, 544, 570, 696, 737,
Mullainathan, S., 653 Park, H., 339, 342, 822 750, 818, 824, 825, 826, 827, 828,
Müller, U. K., 814 Park, J. Y., 538, 831 831, 833, 838, 850, 851, 852, 854,
Mundlak, Y., 634, 640, 649, 653, 656, Pauletto, G., 482 862, 984
673, 696 Pavan, A., 157 Pick, A., 138, 385, 401, 788
Mur, J., 812, 813 Pearson, K., 5, 12, 225, 228, 229 Pierce, D. A., 302
Murphy, A. H., 397 Pedroni, P., 830, 843, 844, 848–9, Pierse, R. G., 605, 862
Murphy, P. D., 580 850 Pigorsch, U., 765
Murray, C. J., 828 Pepper, J. V., 798 Pina, J., 916
Muth, J. F., 129, 467 Perego, J., 931 Pinkse, J., 809, 810, 813, 814
Mutl, J., 808 Peres, Y., 983 Piras, F., 930
Pirotte, A., 807, 813, 839 Rudebusch, G. D., 349 Sims, C., 486–8, 489, 493, 496, 502,
Pistaferri, L., 744 Runkle, D. E., 691–2 575, 584, 586, 588, 599, 600, 692,
Ploberger, W., 828, 838, 923 Rupert, P., 668 751, 902, 915, 917
Plosser, C. I., 329 Singleton, K. J., 227, 228
Poirier, D. J., 498 Said, E., 777, 913 Skouras, S., 392
Pope, J. A., 313, 743 Saikkonen, P., 543, 851 Slade, M., 809, 810, 813
Potter, S. M., 44, 589, 605 Saint-Guilhem, A., 928 Slater, L. J., 101, 102, 103, 106
Powell, M. J. D., 308 Sakkas, N. D., 826 Slutsky, E., 173–6, 187, 267
Prescott, E. C., 152, 153, 358–60, Sala, L., 497, 902, 917 Smeekes, S., 832
497, 705 Sala-i-Martin, X., 771, 797 Smets, F., 497–8
Press, W. H., 308 Salemi, M. K., 470, 472 Smith, A. F. M., 731
Priestley, M. B., 281, 292, 321, 360, Salkever, D. S., 77 Smith, G., 494, 497
520, 942 Samiei, H., 711, 713, 727 Smith, J., 406
Prucha, I., 799, 804, 808, 813, 814 Samuelson, P., 136, 154 Smith, L., 697, 698, 699
Psaradakis, Z., 116, 154 Sarafidis, V., 696, 750, 756, 769, 788 Smith, L. V., 497, 697, 698, 699, 822,
Pyke, 787 Sargan, J. D., 122, 124, 235, 257–8, 835, 836, 901, 907, 908, 918, 920,
677, 691, 837 921, 924, 928, 929
Quah, D., 600, 603, 916 Sargent, T. J., 467, 495, 751, 902, 917 Smith, R. J., 125, 234, 526, 527, 543,
Quandt, R. E., 923 Satchell, S., 426 563, 564, 570, 572, 578, 580, 842
Quenouille, M., 778 Schanne, N., 925 Smith, R. P., 43, 44, 77, 124, 239,
Scheffe, H., 77 262, 497, 498, 559, 717, 719, 726,
Rahbek, A., 572 Scheinkman, J. A., 158 729, 730, 735, 736, 764, 767, 773,
Raissi, M., 929, 930, 932 Schiantarelli, F., 771, 772 820, 862, 916, 929
Rao, C. R., 79, 85, 171, 173, 176, Schleicher, C., 630 Smith, V., 907, 928
178, 211, 214, 222, 965 Schmidt, P., 666, 685–6, 696, 697, Snaith, S., 835
Ratto, M., 497 698, 826, 827 So, B. S., 778
Ravn, M. O., 359 Schmidt, T. D., 787–8 Söderlind, P., 360
Rebucci, A., 798, 927, 928 Schorfheide, F., 497 Song, M., 774–5
Reichlin, L., 552, 902, 917 Schott, J. R., 786, 787 Song, S., 784, 802, 804
Reinsel, G. C., 852 Schuermann, T., 421, 798, 840, 841, Song, W., 837
Reisman, E., 924 920–1, 924, 925 Spanos, A., 27
Renault, E., 514 Scott, E., 645, 671 Spearman, C., 5, 6, 448
Richard, J. F., 198, 243, 253, 564 Sentana, E., 612 Stambaugh, R. F., 154
Rio, A. D., 359 Serfling, R. J., 173, 176, 187, 188 Steel, M. F. J., 261
Rivera-Batiz, L. A., 797 Sestieri, G., 918, 925, 928 Stiglitz, J., 154
Roberts, H., 153 Shaman, P., 313 Stiglitz, J., 154
Robertson, D., 552, 553, 696, Shapiro, M. D., 603 Stine, R. A., 313
713, 750 Sharma, S., 711, 713, 719, 726, 733 Stock, J., 339, 341, 342, 344, 826
Robins, R. P., 426 Sheather, S. J., 79 Stock, J. H., 238, 385, 386, 498, 765,
Robinson, D. P., 808 Shek, H. H., 411 902, 917
Robinson, P. M., 349, 350, 801, 814, Shen, Y., 859 Stoker, T., 860, 862, 865
815, 861, 873, 878 Shephard, N., 361, 412, 426, 614 Stuart, A., 302
Rogers, J., 549, 579, 960 Shibayama, K., 498 Styan, G. P. H., 979
Rombouts, J. V. K., 629 Shields, K., 925 Su, L., 696
Romer, P. M., 797 Shiller, R. J., 145, 146, 151 Sul, D., 696, 737, 750, 833, 851, 853
Rose, D. E., 861 Shin, D. W., 778 Sun, Y., 114, 116, 929
Rosen, H. S., 683, 696, 697 Shin, Y., 44, 125, 526, 527, 535, 543, Swamy, P. A. V. B., 704, 708, 713–17,
Rosenberg, B., 705 544, 545, 547, 563, 564, 570, 572, 718, 737–8
Rosenblatt, M., 406 573, 578, 580, 589, 590, 596, 597, Symons, J., 713
Rothenberg, T., 339, 341, 342, 605, 731, 732, 819, 820, 822, 823, Szafarz, A., 480, 483, 488, 489
344, 826 832, 842, 843, 850, 851, 853, 916,
Rothenberg, T. J., 495 922, 925, 929 Tahmiscioglu, A. K., 692, 693, 695,
Rozanov, Y. A., 281 Schorfheide, F., 503, 504 696, 697, 698, 699, 729, 730, 731
Rubin, D. B., 364 Silverman, B., 78, 79 Tanaka, K., 831
Rubinstein, M., 154 Silverstein, J. W., 752 Taqqu, M. S., 349
Tay, A. S., 406 Van Nostrand, R. C., 37 Winokur, H. S., 314, 743
Taylor, H. M., 268 Van Roye, B., 926 Wold, H., 267, 275, 917
Taylor, M., 579 Vansteenkiste, I., 930, 931, 932 Wolf, M., 918
Taylor, W. E., 640, 644, 660, 665, 686 Varian, H., 375 Wolters, J., 926
Thaler, R., 161 Veall, M. R., 214 Wooldridge, J. M., 48, 92, 118, 186,
Theil, H., 40, 191, 250, 441, 861, 864 Velasco, C., 349 655, 671, 673, 700, 701, 808
Thomas, A., 696 Vella, F., 668 Worthington, P. L., 619
Thomas, S. H., 154 Verbeek, M., 668, 673 Wouters, R., 497–8
Tian, Y., 979 Vogelsang, T. J., 115, 116, 118, 830 Wright, J., 238, 498
Tibshirani, R., 262, 917, 918, 993 Vuong, Q. H., 258 Wright, S., 228–9, 552, 553
Tiefelsdorf, M., 814 Wu, D. M., 761
Tieslau, M., 828 Wagner, M., 836, 838, 849, 853 Wu, S., 824, 835, 838
Timmermann, A., 138, 154, 155, Wallace, T. D., 657
160, 161, 373, 377, 384, 385, 397, Wallis, K. F., 495 Xiong, W., 158
398, 403, 406, 408, 619 Wansbeek, T., 756, 769 Xu, T., 931–2
Tobin, J., 673 Watson, G. S., 111
Topa, G., 798 Watson, M. W., 367, 385, 386, 484,
Yamagata, T., 440, 660, 696, 738,
Tosetti, E., 137, 158, 752, 753, 754, 485, 559, 603, 765, 902, 915, 917
741, 743, 760, 761, 763, 764, 770,
758, 760, 767, 769, 770, 771, 787, Waugh, F. V., 43
785, 815, 835, 836, 844, 847, 908,
794, 802, 808, 811, 812, 813, Weale, M., 161
932
815, 876 Weeks, M., 251, 262
Yang, Z., 696, 804
Tran, L. T., 814 Wegge, L. L., 495
Yaron, A., 233
Trapani, L., 862 Wei, C. Z., 335
Yeo, S., 77
Trenkler, C., 543 Weidner, M., 773, 774
Yin, Y. Q., 752
Treutler, B-J., 925 Weil, D. N., 888
Yogo, M., 238, 498
Trivedi, P. K., 959, 960 Weiner, S., 798, 840, 841
Tso, M. K. S., 461, 463 Weiss, Y., 658, 674 Yoo, B. S., 525
Turnbull, S. M., 419 West, K. D., 113, 114, 395, 400, 405, Yu, J., 696, 751, 802, 803, 804,
Tzavalis, E., 828 813 810, 815
Tzavalis, H. E., 819, 827 Westerlund, J., 771, 819, 820, 830, Yule, G. U., 26, 267
837, 843, 849, 850, 854
Uhlig, H., 359, 472, 916 White, H., 85, 86, 91, 113, 114, 180, Zaffaroni, P., 350, 623, 630, 862
Ullah, A., 78, 79, 440, 785, 815 184, 186, 193, 222, 237, 253, 259, Zaher, F., 926
Urbain, J., 771, 838, 839 396, 654, 902 Zellner, A., 248, 375, 441, 634, 992
Urga, G., 862 Whiteman, C., 470, 472, 497 Zha, T., 502
Whittle, P., 281, 390, 751, 763, 800 Zhang, Y., 928
van de Geer, S., 262, 993 Wickens, M. R., 154 Zhao, Z., 730, 733
Vandenberghe, L., 959 Wilks, D. S., 397 Zhou, Q., 664–5, 667, 668, 669, 695
Van Dijk, H., 985 Windmeijer, F., 234, 689, 838 Zimmermann, T., 931
Van Eyden, R., 924, 926, 929 Winkelmann, K., 613 Zwillinger, D., 965, 977
Van Loan, C. F., 939, 953 Winner, H., 808 Zygmund, A., 348
Subject Index
cross-sectional aggregation, and long Cholesky 95, 954 cointegration analysis 525
memory processes 349–50 classical decomposition of time computation of critical values of
cross-sectional dependence, in series 274–5 the statistics 339
panels 750–96, 795–6 generalized forecast error limiting distribution of
CCE estimators 766–72, variance 593–5 Dickey–Fuller statistic 338
775–8, 793–4 GVAR models 922 for models with a drift 334
common factor models 755–63 Jordan 954 for models without a drift 332–4
dynamic panel data models with matrices 953–4 panel unit root testing
factor error structure 772–9 orthogonalized forecast error 817, 818, 819, 821, 822,
error cross-sectional dependence, variance 592–3 826, 830
testing for 783–93 permanent/transitory time-reversed 822
error dependence, component 922 difference equations 961–4
cross-section 772–3 Schur/generalized first-order 965
errors, cross correlations 750 Schur 486, 953 difference stationary
large heterogeneous panels with spectral 953 processes 324–5
multifactor error trend and cycle see trend and cycle first difference versus
structure 763–72 decomposition; trend-cycle trend-stationary
long-run coefficients in dynamic decomposition of unit root processes 328–9
panel data models with processes as integrated processes 324
factor error structure, variance of Y 8–10 dimensionality curse, GVAR solution
estimating 779–83 Watson 367 to 903–5
panel unit root testing 833–4 Wold 275 common variables,
PC estimators 764–5, 774–5 Δ-test (Pesaran and introducing 907–8
quasi-maximum likelihood Yamagata) 738–41 rank deficient GVAR model G0
estimator 773–4, 802 extensions of 741–2 906–7
semi-strong factors 760–1 density forecasts, evaluation 406–8 direct search methods 959–60
short dynamic panels with density function directional forecast evaluation
unobserved factor error bivariate regression model 13 criteria
structure 696 convergence 172 generalized PT test for serially
strong and weak factors 756, 757 maximum likelihood (ML) dependent
weak, in spatial panels 801–2 estimation 201, 218 outcomes 399–400
weak and strong, in large model combination 259 Pesaran–Timmermann
panels 752–4 non-parametric estimation 77–9 market-timing test 397–8
cross-sectional regressions probability and statistics 966 regression approach to derivation
heteroskedasticity problem 83 returns, statistical of PT test 398–9
panel data models with strictly models 139, 141 relationship of the PT statistic to
exogenous regressors 650–3 spectral, properties of 287–91 Kuipers score 398
cross-sectionally weakly dependent dependent variable, models with disaggregate behavioural
(CWD) 753, 754, 758, different transformations relationships, general
759, 802 Bera–McAleer test statistic 254 framework for 863–4
cross-unit cointegration 836–8 double-length regression test distributed lag models 120–3
cumulative distribution function statistic 254–5 see also autoregressive distributed
(CDF) 619 PE test statistic 253 lag (ARDL) models
cumulative sum (CUSUM) Sargan and Vuong’s likelihood ARDL models, estimation 122–3
statistics 923 criteria 257–8 model selection criteria 123
curve fitting approach 3–4 simulated Cox’s non-nested test polynomial 120–1
statistics 256–7 rational 121
data generating process deterministic aggregation 864 spectral density 291–2
(DGP) 244, 245, 259, 882 deterministic trends 121–2 undetermined coefficients
decay factor 413 devolatized returns 614, 621–2 method 470
decision-based forecast evaluation Dickey–Fuller (DF) unit root tests distributions
framework 390–4 asymptotic distribution of asymptotic 541–3
decomposition Dickey–Fuller Bayesian analysis 985–6, 988–9
Beveridge–Nelson 358, statistic 335–8 Bernoulli 196, 973
364–7, 552–6 augmented 338–9, 525 binomial 973
bivariate 3, 11, 966–7 error-correction models 120, 124 information and processing
chi-squared 39, 975 long-run and short-run costs 154–5
continuous 974–7 effects 125–6 investor rationality 155
convergence in 172–6 mean lag 127–8 joint hypothesis problem 153
cumulative 966 panel data models 200, 234 market efficiency and stock
discrete probability 973–4 partial adjustment market
Fisher–Snedecor 976–7 model 120, 123–4, 125, 129 predictability 147–53
impulse response analysis 597–8 rational expectations profitable opportunities,
marginal 617 models 129–34 exploiting in
maximum likelihood containing expectations of practice 159–61
estimation 318 exogenous variables 130 semi-strong form 153
multinomial 977 with current expectations of strong form 153
multivariate 967–8, 977–9 endogenous variables 130–1 theoretical
normal 27, 974–5 with future expectations of foundations 137, 155–9
of OLS estimator 37–9 endogenous variables 131–3 versions 136
panel unit root testing 822–5 when arising 120 weak form 153
Poisson 974 dynamic forecast for US output EGARCH (exponential
posterior predictive 988–9 growth 519 GARCH-in-mean)
predictive 376 model 416–17
dynamic OLS (DOLS)
prior and posterior 985–6 El Niño weather shocks 932
estimator 851
probability 966 EMH see efficient market hypothesis
dynamic seemingly unrelated
test statistics 54 (EMH)
regression (DSUR)
uniform 974 EMU membership, impact 929–30
estimator 853
Donsker’s theorem 335 Encompassing test 253
dynamic stochastic equilibrium, and
dot-com bubble 142 endogenous variables 431, 493
joint hypothesis problem
double index process 752 rational expectations models with
153
double-length (DL) regression test current expectations of
dynamic stochastic general
statistic 254–5 130–1
equilibrium (DSGE)
dummy variables 76, 644–5, rational expectations models with
models, rational
658, 681, 826, 828 future expectations of 131–3
expectations 467
least squares dummy system of equations with
general framework 489–90
variable 644–5 iterated instrumental variables
with lags 493–5
seasonal 42, 468, 507, 510 estimator 444–5
without lags 490–2
VAR models 507, 510, 513 two- and three-stage least
Durbin–Watson squares 442–4
earnings dynamics, testing slope Engel curves, non-linear 863
statistic 105, 111, 112
homogeneity in 744–6 equal weights average forecast 386
dynamic conditional correlations
(DCC) econometric models, equilibrium
model 609, 612–14, 615, formulation 243 equilibrating process 159
622, 623 efficiency impulse response analysis 597
see also asset returns see also efficient market hypothesis money market 580
maximum likelihood (EMH) stochastic 268
estimation 615–17 asymptotic 203, 206 equi-sample-size contour 831
with Gaussian returns 616 first-order 211 equity index futures 142
with Student’s t-distributed market efficiency and stock ergodicity conditions 301
returns 616–17 market error-correction model
post estimation evaluation of predictability 147–53 (ECM) 120, 124–5
t-DCC model 624–5 efficient market hypothesis errors
simple diagnostic tests 618–19 (EMH) 161–4 AR(m) error process with zero
dynamic economic see also returns of assets, restrictions 111
modelling 120–35, 134–5 predictability assumption of constant
adaptive expectations alternative versions 153–5 conditional and
models 120, 128–9 dynamic stochastic equilibrium unconditional error
ARDL models, estimation 122–3 formulations 153 variances 25–6
distributed lag models 120–3 evolution of 136 asymptotic standard 233
generalized method of moments empirical applications 923–32 heterogeneous panel data models,
(GMM) (cont.) forecasting 917–21 large 703–49, 746–9
asymptotic normality 230–1 forecasting applications 924–5 see also panel data models with
consistency 230 global finance applications 925–7 strictly exogenous
generalized instrumental variable global macroeconomic regressors; short T dynamic
estimator 235–41 applications 927–32 panel data models
generalized R2 for IV impulse response Bayesian analysis 730–1
regressions 239 analysis 915–17 dynamic heterogeneous
Sargan’s general large-scale VAR reduced form data panels 723–4
misspecification representation 901–3 fixed effects (FE)
test 239–40 long-run properties 921–2 specification 710
Sargan’s test of residual serial panel cointegration 841 heterogeneous panels with strictly
correlation for IV permanent/transitory component exogenous regressors 704–6
regressions 240–1 decomposition 922 large sample bias of pooled
two-stage least squares 238–9 sectoral/other applications 932 estimators in dynamic
and instrumental variables see specification tests 923 models 724–8
instrumental variables and theoretical justification of mean group
GMM approach 909–14 estimator 717–23, 728–30
misspecification test 234–5 theory and practice 900–35 multifactor error structure, large
optimal weighting matrix 232 two-step approach of 901 heterogeneous panels
panel cointegration 852 GLS see generalized least squares with 763–72
population moment (GLS) pooled estimators in
conditions 226–8, 235 GMM see generalized method of heterogeneous
RE models, estimation 500–1 moments (GMM) panels 706–13
short T dynamic panel data Goldfeld–Quandt test, spatial panel
models 689 heteroskedasticity 89, 90 econometrics 811–13
two-step and iterated goodness of fit 358 Swamy estimator/test 713–17,
estimators 233–4, 689 gradient methods 958–9 719–23, 737–8
utilization of 225 method of steepest ascent 959 testing for slope homogeneity see
German DAX index 142 Newton-Raphson 958–9 slope homogeneity,
Germany Granger causality 513–17 testing for
inflation persistence 894, 895 and Granger heteroskedasticity 83–93, 92
output growth (VAR non-causality 516–17, 576 additive specification 87, 91
models) 513, 516, 518 granularity condition 753 in cross-section regressions 83
GIRF see generalized impulse Great Depression (1929) 146 diagnostic checks and
response function (GIRF) Grunfeld’s investment tests 89–92
GIVE see generalized instrumental equation 437, 441 efficient estimation of regression
variable estimator (GIVE) G-test of Phillips and Sul 737 coefficients in presence
global financial crisis GVAR see global vector of 86
(2008) 142, 145, 411, 925, 926 autoregressive (GVAR) errors 83, 113
global imbalances and exchange rate modelling F-test 90, 91
misalignment 928 Gauss–Markov theorem 83, 86
global vector autoregressive (GVAR) habit formation, aggregation of general models 86–9
modelling 563, 933–5 life-cycle consumption Goldfeld–Quandt test 89, 90
see also vector autoregressive decision rules graphical checks and tests 89
(VAR) models under 887–92 maximum likelihood
approximating a global factor Hannan–Quinn criterion (HQC), estimation 87, 88, 89
model 909–11 model selection 123, 250 mean-variance
approximating factor augmented Hausman test specification 87, 91
stationary high dimensional panel data models with strictly models with serially
VARs 911–14 exogenous regressors correlated/heteroskedastic
and Asian financial crisis 659–63, 673 errors 115–18
(1997) 900 slope homogeneity, testing multiple regression 30
benefits of 900 for 735–7 multiplicative
dimensionality curse 903–8 spatial panel econometrics 804 specification 86–7, 90
OLS estimators, null hypothesis see null hypothesis traditional impulse response
using 84, 85, 86, 89, 91 predictive failure test 76–7 functions 584–5
panel data models with strictly relationship between different in VARX models 595–7
exogenous regressors 661, ways of testing β = 0 55–8 independently identically distributed
668 simple hypotheses 51, 53–5 (IID) random variables
parametric tests 89, 90–2 size of test 52–3 see also random variables
regression models with stability of regression coefficients, aggregation in large panels 861
heteroskedastic testing 77 asymptotic theory 177, 180
disturbances 83–5 statistical hypothesis and maximum likelihood (ML)
heteroskedasticity autocorrelation statistical testing 51–2 estimation 196, 200, 203
consistent (HAC) testing significance of dependence inequalities
estimator 233 between ϒ and X 55–8 Cauchy–Schwarz 981–2
heteroskedasticity-consistent t-test see t-statistic/test Chebyshev 980
variance (HCV) Hölder 982
estimators 85, 117, 118 idempotent matrix 30, 946 Jensen 982–3
higher-order lags 535–6, 566 IID see independently identically infinite moving average
histogram 77, 143 distributed (IID) random process 270, 271, 272, 347
Hodrick–Prescott (HP) variables infinite vector moving average
filter 358–60, 922, 928 impulse response analysis 584–608, process 537
Hölder’s inequality 982 Blanchard and Quah (1989) global 927–8
homoskedasticity 10, 25, 26, 30 Blanchard and Quah (1989) global 927–8
household consumption model 603 persistence of see inflation
expenditure, cross-sectional in cointegrating VARs 596–7 persistence
regressions 83 empirical distribution of impulse rates of 860–1
housing 844–8, 930–1, 932 response functions and variance-inflation factor
hypothesis testing, regression persistence profiles 597 (VIF) 70
models 51–82, 79–82 forecast error variance inflation persistence
alternative hypothesis 52, 53 decompositions 592–5 aggregation 892–6
Chow test (stability of regression Gali’s IS-LM model 603–4 data 893
coefficients) 77 generalized impulse response estimation results 894–5
coefficient of multiple correlation function 589–90 micro model of consumer
and F-test 65–6 GVAR models 915–17 prices 893–4
composite hypotheses 51 see also global vector sources 895–6
confidence intervals 52, 59 autoregressive (GVAR) information and processing
critical or rejection region of modelling costs 154–5
test 51 identification of a single structural innovation error 275
error types 52–3 block in a structural instrumental variables and GMM
F-test see F-statistic/test model 590–1 225, 807
implications of misspecification of identification of monetary policy Ahn and Schmidt model 685–6
regression model on shocks 604–5 Anderson and Hsiao model 681
hypothesis testing 74–5 identification of short-run effects Arellano and Bond model 682–5
Jarque–Bera’s test of normality of in structural VAR models Arellano and Bover models (with
regression residuals 75–6 598–600 time-invariant
joint confidence region 66–7 macro and aggregated regressors) 686–8
linear restrictions see linear idiosyncratic Blundell and Bond
restrictions shocks 878–81 model 688–91
maintained hypothesis 52 multiple regression 43–4 over-identifying restrictions,
versus model selection 247–8 multivariate systems 585 testing for 691
models with serially orthogonalized impulse response spatial panel
correlated/heteroskedastic function 586–9 econometrics 807–10
errors 115–18 persistence profiles for instrumental variables (IV) 117
multicollinearity problem 67–72 cointegrating relations 597 integrated GARCH (IGARCH)
multiple models 58–9 structural systems with permanent hypothesis 623–4, 625
non-parametric estimation of and transitory shocks 600–2 intercept terms, regression
density function 77–9 SVARs 600–1, 603 equations 30, 33, 75
model selection 242–64, 262–4 cross-sectional dependence, in classical normal linear regression
see also Akaike information panels 765, 771, 775, 778, model 24–7, 41–2
criterion (AIC), linear 783, 785 covariance matrix of regression
regression models; Schwarz forecasting 396 coefficients β̂ 31–3
Bayesian criterion (SBC), and GMM 233, 234 distribution of OLS
non-nested tests, model heterogeneous panel data models, estimator 37–9
selection large 730, 731, 743 disturbances of regression
Bayesian Markov chain Monte Carlo equation 24–5
analysis 259–61, 989–90 (MCMC) methods 502 Frisch-Waugh-Lovell
combination of models, Bayesian max ADF unit root test 345 theorem 43, 48
approach to 259–61 model combination 261 Gauss–Markov theorem 14, 17,
consistency properties of multivariate 18, 24, 34–6, 83
criteria 250 analysis 453, 455–6, 457 heteroskedasticity 30
criteria 249–50 non-nested tests, linear regression homoskedasticity 25, 26, 30
formulation of econometric models 252, 257 impulse response analysis 43–4
models 243–4 panel cointegration 843, 852, 853 interpretation of
panel unit root coefficients 43–4
versus hypothesis testing 247–8
testing 834, 838, 839 irrelevant regressors, inclusion 46
Lasso regressions 261–2, 914
short T dynamic panel data linear regressions that are
models with different
models 689, 691, 700 non-linear in variables
transformations of
spatial panel econometrics 812 47–8
dependent variable
spurious regression problem 26 maximum likelihood
Bera–McAleer test approach 28–9
statistic 253 Wald test procedure 214
mean square error of an estimator
double-length regression test Moore–Penrose inverse
and bias-variance trade
statistic 254–5 matrix 906, 948
off 36
PE test statistic 253 Moran’s I test 784
multiple correlation
Sargan and Vuong’s likelihood moving average error model 121
coefficient 24, 39–41
criteria 257–8 moving average (MA) processes
ordinary least squares
simulated Cox’s non-nested test 269–72, 276–7, 595
method 24, 27–8,
statistics 256–7 autocorrelated disturbances 98
30–1, 37–9
probit versus logit models 246–7 forecasting 381–2
orthogonality 25, 26, 30
infinite 270, 271, 272, 347
pseudo-true values 244–7 partitioned regression 24, 41–3
MA(1) processes, estimation properties of OLS residuals
rival linear regression
maximum likelihood (ML) 30–1
models 245–6
estimation 303–6 multiplicative specification,
moment conditions
method of moments 302–3 heteroskedasticity 86–7, 90
see also method of moments
regression equations with multi-step ahead
exact numbers 228–9
MA(q) error processes, forecasting 373, 379–80
excess of 229–31 estimation 306–8 multivariate analysis 431–66,
population moment MA(q) error processes, estimation 464–6
conditions 226–8, 235 of regression equations canonical correlation
monetary policy shocks, with 306–8 analysis 458–61
identification 604–5 MSFE see mean squared forecast common factor models
money market equilibrium error (MSFE) criteria 448–58
(MME) 580 multicollinearity problem 24 determining number of
Monte Carlo investigations hypothesis testing 67–74 factors 454–8
see also aggregation and prediction problem 72–4 distributions 967–8, 977–9
aggregation in large seriousness, measuring 70 endogenous variables, system of
panels 860, 881–7 multinomial distribution 977 equations with 441–5
design 882–3 multi-period returns 138 forecasting 392, 517–18
estimation using aggregate and multiple correlation generalized least squares
disaggregate data 883–4 coefficient 24, 39–41 estimator 432–4
results 884–7 multiple regression 24–50, 48–50 heteroskedasticity 85
cointegration analysis 543, 547 ceteris paribus assumption 43, 44 hypothesis testing 65–6
multivariate analysis (cont.) NKPC (new Keynesian Phillips hypothesis testing 52, 53, 54,
impulse response systems 585 curve) 475, 476, 494 57, 58, 61, 63, 64
iterated instrumental variables non-autocorrelated errors 10, 25, 26 Lagrange multiplier (LM)
estimator 444–5 non-linear restrictions, test 214, 218
linear/non-linear restrictions, testing 438–9 model selection 248
testing of 438–9 non-nested tests, linear regression panel unit root testing 819,
LR statistic for testing whether models 822–5, 826, 827, 830
Σ is diagonal 439–41 Encompassing test 253 returns of assets,
maximum likelihood estimation globally and partially non-nested predictability 141
of SURE models 436–7 models 248 sphericity 787
normal distributions 27 hypotheses 51 stationarity, testing for 345
principal components JA-test 252 vector autoregressive models 512
(PC) 446–8 J-test 252 numerical optimization techniques
and cross-section average N-test 251 direct search methods 959–60
estimators of factors 450–4 NT-test 251–2 gradient methods 958–9
reduced rank regression 461–3 simulated Cox’s non-nested test grid search methods 957
seemingly unrelated regression statistics 256–7
equations 431–41 W-test 252 OECD (Organisation for Economic
spectral density 518–20 non-parametric approaches Co-operation and
system estimation subject to linear see also parametric tests Development) 580, 633
restrictions 434–6 cointegration analysis 548–9 oil shocks 513, 930
two- and three-stage least OLS estimator 37–9, 96
hypothesis testing 77–9
squares 431, 442–4, 444 see also ordinary least squares
spatial panel
multivariate generalized (OLS) analysis/regression
econometrics 813–14
autoregressive conditional ARDL models 122, 123, 127
non-spherical disturbances,
heteroskedastic asymptotic theory 192
regression models with 94
(MGARCH) 609 autocorrelated
normal equations, OLS problem 4
multivariate normal disturbances 96, 113
normal linear regression model see
distribution 978 biased 199
classical normal linear
Mundell-Fleming trilemma 927 compared to GLS 96
regression model
distribution 37–9
normality assumptions
Nadaraya-Watson kernel 814 estimation of σ² 18–19
asymptotic normality 205, 230–1
National Bureau of Economic implications of misspecification
departures from normality 142
Research (NBER) 360 for 44–6
National Longitudinal Surveys Jarque–Bera’s test, normality of
inconsistency of estimator of
(NLS), of Labor Market regression residuals 75–6
dynamic models with
Experience 633 multiple regression 25, 27, 28 serially correlated
negative exponential utility (finance normal distributions 974–5 errors 315–17
application) 392–4 n-step ahead forecast error 592 Phillips–Hansen fully
neoclassical investment model 482 N-test (non-nested) 251 modified 527–9
net present value (NPV) 150 NT-test (non-nested) 251–2 pooled 636–9, 652
new Keynesian Phillips curve null hypothesis properties 14–19
(NKPC) 475, 476, 494, 928 see also hypothesis testing, single-equation 434
Newey–West heteroskedasticity and regression models stochastic transformation 115
autocorrelation consistent autocorrelated unbiased 14
(HAC) variance matrix 113 disturbances 114, 118 omitted variable problem,
Newey–West robust variance autocovariances, misspecification 45
estimator 113–15 estimation 301–2 one-sided moving average process,
Newey–West SHAC estimator 813 cointegration analysis 540 versus two-sided
Newton-Raphson Dickey–Fuller (DF) unit root representation 269
method 305, 364, 546, 733, tests 332 one-step ahead forecast 373
958–9 fixed effects, testing for 659 optimal weighting matrix,
Nickell bias 679 forecasting 398, 402, 406 generalized method of
Nielsen Datasets 633 and GMM 234 moments 232
Nikkei 225 (NK) index 142, 621 heteroskedasticity 90 optimality, forecast 373–6
ordinary least squares (OLS) tests 848–50 relation between FE, RE and
analysis/regression panel corrected standard errors cross-sectional
ARCH/GARCH effects, testing (PCSE) 835 estimators 652–3
for 417, 418 panel data models relation between pooled OLS and
cointegration aggregation of large panels see RE estimators 652
analysis 527, 532, 549–50 under aggregation time invariant effects, estimation
common factor models 449 cross-sectional dependence see FEF-IV estimation 667–70
estimator see OLS estimator cross-sectional dependence, HT estimation
fully modified OLS (FM-OLS) in panels procedure 665–7
approach 527, 850, 854 dynamic 200, 699–700 time-specific effects 657–9
and GMM 229, 238 large heterogeneous see time-specific formulation 635
heteroskedasticity 84, 85, 86, heterogeneous panel data unbalanced panels 671–3
89, 91 models, large unit-specific formulation 634
hypothesis testing 53 non-linear unobserved effects Panel Study of Income Dynamics
method 4–5, 27–8 models 699–700 (PSID) 633
in multiple regression 24, 27–8, short T dynamic models see short panel unit root testing 817–38,
30–1, 37–9 T dynamic panel data models 855–8
non-nested tests, linear regression spatial panel econometrics see see also panel cointegration
models 252, 253 spatial panel econometrics asymptotic power of tests 825–6
orthogonality 30 with strictly exogenous regressors
cross-sectional
Pesaran–Timmermann (PT) see panel data models with
dependence 833–4
market-timing test 398 strictly exogenous regressors
Dickey–Fuller (DF) unit root
properties of residuals 30–1 unit roots and cointegration in
tests 817, 818, 819, 821,
regressions, second generation panels see panel
822, 830
panel unit root tests 835–6 cointegration; panel unit
distribution of tests under null
residuals 30–1, 112 root testing
hypothesis 822–5
vector autoregressive models 510 panel data models with strictly
finite sample properties of
orthogonality 4, 10, 234, 304, 501, exogenous regressors
tests 838–9
697, 946 633–75, 674–5, 676
first generation panel unit root
multiple regression 25, 26, 30 see also seemingly unrelated
tests 821–33
orthogonalized forecast error regression equations
GLS regressions, tests based
variance decomposition (SURE) models
on 834–5
592–3 cross-sectional regression 650–3
orthogonalized impulse response heterogeneous trends 826–8
estimation of the variance of
function 586–9 pooled OLS, FE and RE measuring proportion of
output gap relationship 578, 580 estimators of β (robust to cross-units with unit roots
output growths, VAR models heteroskedasticity and serial 832–3
Germany 513, 516 correlation) 653–6 model and hypotheses to
Japan 513, 515 fixed effects test 818–20
United States 513, 514 versus random effects 653 OLS regressions, tests based
overlapping returns 138 specification 639–45 on 835–6
testing for 659–63 other approaches to 830–2
panel cointegration 855–8 between group estimator of β and panel cointegration see panel
see also panel unit root testing 650–3 cointegration
with cross-sectional Hausman’s misspecification second generation panel unit root
dependence 853–5 test 659–63, 673 tests 833–6
cross-unit cointegration 836–7 heterogeneous panels 704–5 short-run dynamics 828–30
estimation of cointegrating linear panels with strictly Panel VARs (PVAR)
relations in panels 850–5 exogenous models 695, 852, 901, 902
general considerations 839–43 regressors 634–5 parametric tests
multiple cointegration, tests non-linear unobserved see also non-parametric
for 849–50 effects 670–1 approaches
residual-based approaches 843–9 pooled OLS estimator 636–9 cointegration analysis 548
spurious regression 843–8 random effects heteroskedasticity 89, 90–2
system estimators 852–3 specification 646–50 hypothesis testing 77
rational expectations (RE) rationality, efficient market hypothesis testing in models see
models 120, 129–34, hypothesis 155 hypothesis testing,
504–6 RE models see rational expectations regression models
backward recursive (RE) models interpretation of multiple
solution 482–3 realized volatility (RV) 412 regression coefficients 43–4
Bayesian analysis 501–3 reduced rank hypothesis 461 Lasso 261–2, 914
bias of RE estimators, in short T reduced rank regression linear see linear regression
dynamic panel data models (RRR) 403, 461–3 MA(q) error processes,
678–81 regression coefficients estimation of regression
Blanchard and Kahn efficient estimation of in presence equations with 306–7
method 483–5 of heteroskedasticity 86 models see regression models
calibration and linear restrictions, testing multiple see multiple regression
identification 496–8 on 59–62 OLS see ordinary least squares
containing expectations of multiple, interpretation of 43–4 (OLS) analysis/regression
exogenous variables 130 stability of (Chow test) 77 orthogonal 4
with current expectations of regression line 3, 5 partitioned 41–3
endogenous regression models penalized regression
variables 130–1 with autocorrelated techniques 242, 262
DSGE models disturbances 98–106 PT test, regression approach to
general framework 489–90 adjusted residuals, R2, and derivation of 398–9
with lags 493–5 other statistics 103–4 reverse 4, 6
without lags 490–3 AR(1) and AR(2) Spearman rank 5
efficient market cases 99, 102–3 spurious 26, 843–8
hypothesis 156, 157 covariance matrix of exact ML stock return 147
with feedbacks 476–8 estimators for AR(1) and three variable models 33, 59, 91
AR(2) disturbances 103 regularity conditions 200–3, 244
‘finite-horizon’ 482–3
estimation 99–100 residual matrices 42
with forward and backward
higher-order error residual serial correlation,
components 472–6
processes 100–1 consequences 95
with future expectations
log-likelihood ratio statistics for returns of assets
of endogenous
tests of residual serial see also efficient market hypothesis
variables 131–3
correlation 105–6 (EMH); weekly returns,
forward solution 468–70
with heteroskedastic volatilities and conditional
method of undetermined disturbances 83–5 correlations in
coefficients 470–2
hypothesis testing see hypothesis and alternative versions of
multivariate RE testing, regression models efficient market hypothesis
models 467–72 implications of misspecification 153–5
GMM estimation 500–2 on hypothesis testing 74–5 conditional correlation of,
higher-order case 479–82 multiple 58–9 modelling see conditional
identification, general with non-spherical correlation of asset returns,
treatment 495–8 disturbances 94 modelling 609–30
King and Watson method 485–6 simple see simple regressions covariance of asset returns with
lagged values 467, 468, 470, 473, regressions marginal utility of
490, 493, 496 absolute distance/minimum consumption 152
martingale difference distance 4 cross-correlation of returns 145
process 488–9 auxiliary 92, 253, 254 daily returns 144, 145
maximum likelihood bivariate see bivariate regressions empirical evidence 142–4
estimation 498–500 coefficients see regression extent to which predictable 145
multivariate 467–506 coefficients log-price change and relative price
quadratic determinantal equation cross-country growth 83 change 137
method 473–6, 481, 499 cross-sectional 83, 650–3 measures of departure from
retrieving solution for yt 481–2 generalized R2 for IV normality 141
Sims method 486–8 regressions 239 monthly stock market
rational expectations hypothesis GLS see generalized least squares returns 145–6
(REH) 129–30, 133 (GLS) multi-period returns 138
small open economy (SOE) spectral representation estimation of the mean 297–9
macroeconomic theorem 285–7 inconsistency of the OLS
models 905 spectral decomposition 953 estimator of dynamic models
smoothing parameter 78 spectral density with serially correlated
South Africa 929 see also spectral analysis errors 315–17
Southern Oscillation Index and autocovariance generating sample bias-corrected estimators
(SOI) 932 function 273 of autocorrelation
spatial autoregressive (SAR) cointegration analysis 530 coefficient, ϕ, small 313–15
specification 800 distributed lag models 291–2 spectral density,
spatial correlation 797–8 estimation 319–21 estimation 318–21
spatial error component (SEC) 801 of long memory processes 348 testing for stationarity 345–6
spatial error models 800–1 multivariate 518–20 Yule–Walker estimators 308–9
spatial heteroskedasticity properties of function 287–91 statistical aggregation 864–5
autocorrelation consistent spectral representation statistical fit 242, 247
(SHAC) estimator 813 theorem 286 statistical hypothesis and statistical
spatial lag models 798–800 standardized 331, 367 testing 51–2
spatial lag operator 798 trend-cycle decomposition of unit see also hypothesis testing,
spatial moving average (SMA) 801 root processes 367 regression models
spatial panel weighting schemes for item statistical inference, classical
econometrics 797–816, estimating 318 theory 51
815–16 spectral radius 952–3 steepest ascent, method of 959
dynamic panels with spatial spectral representation s-th mean, convergence
dependence 810 theorem 285–7 in 167, 169–70
estimation 802–10 spurious regression 26, 843–8 stochastic equilibrium 268
fixed effects specification 802 square summable sequence, stochastic orders Op (·) and op
heterogeneous panels 811–13 stochastic processes 270 (·) 176–7
instrumental variables and GMM SSR (sum of squares of residuals) 63 stochastic processes 267–84, 281–4
807–10 state space models and Kalman absolutely summable
maximum likelihood filter 361–4 sequence 270, 272, 273, 274
estimator 802 static factor model 448 autocovariance function 269, 271
non-parametric stationary stochastic autocovariance generating
approaches 813–14 processes 267–8, 281 function 272–4
random effects stationary time series classical decomposition of time
specification 803–7 processes 297–323, 321–3 series 274–5
spatial dependence in asymptotic distribution of ML moving average 269–72, 276–7
panels 798–802, 814–15 estimator 318 see also moving average (MA)
spatial error models 800–1 estimation of processes
spatial lag models 798–800 autocovariances 299–302 stationary 267–8, 281
spatial weights and spatial lag estimation of autoregressive (AR) trend-stationary
operator 798 processes 308–13 processes 268, 275
temporal heterogeneity 812–13 maximum likelihood white noise 268, 269
testing for spatial estimation of AR(1) stochastic trend
dependence 814–15 processes 309–12 representation 368–9
weak cross-sectional dependence maximum likelihood stochastic volatility models 419
in spatial panels 801–2 estimation of AR(p) stock market crash (1929) 146
Spearman rank processes 312–13 stock market crash
regression 5, 6–7, 8, 785 estimation of MA(1) processes (2008) 142, 145, 411, 925
spectral analysis 285–94, 292–4 maximum likelihood stock market predictability and
distributed lag models, spectral estimation 303–6 market efficiency 147–53
density 291–2 method of moments 302–3 risk-averse investors 151–3
properties of spectral density regression equations with risk-neutral investors 148–51
function 287–91 MA(q) error processes, stock prices, random walk
relation between f(ω) and estimation 306–8 model 136, 149
autocovariance generating estimation of mixed ARMA stock return 25
function 289–91 processes 317–18 stock returns, monthly 145–6
strict exogeneity 15, 26, 197–200 cointegration analysis 546–9 residual component 275
see also exogeneity; panel data cross-sectional dependence (CD) seasonal component 275
models with strictly tests 793–4 spurious regression
exogenous regressors; DCC model 618–19 problem 26, 843–8
seemingly unrelated error cross-sectional stationary processes, estimation
regression equations dependence 783–93 see stationary time series
(SURE) models fixed effects specification 659–63 processes
heterogeneous panel data models, forecasting 400–6 total impact effect, measuring 43
large 704–6 F-test 65–6, 735 US macroeconomic time
unbiased 199 GARCH effects 418–19 series 1959–2002 385
weak and strict 26, 197–200 Granger non-causality, trace statistic, asymptotic
strict stationarity, stochastic block 516–17 distribution 541–3
processes 268 G-test of Phillips and Sul 737 transaction costs 160
strong law for asymptotically heteroskedasticity 89–92 transversality
uncorrelated processes 184 hypothesis testing see hypothesis condition 132, 149, 484
strong law for mixing processes 184 testing, regression models trend and cycle
strong law of large likelihood-based tests 212–22 decomposition 358–72
numbers 178, 179 linear restrictions 59–66, 438–9 band-pass filter 358, 360
structural time series linear versus log-linear Hodrick–Prescott filter 358–60
approach 360–1 consumption functions 259 interest rates 556–9
structural VARs long-run relationships 526–7 state space models and Kalman
(SVARs) 600–1, 603 misspecification 234–5 filter 361–4
structural VEC (SVEC) 601 multiple cointegration 849–50 structural time series
Student t-distribution 618 non-nested tests see non-nested approach 360–1
Student’s t-distributed errors tests, linear regression trend-cycle decomposition of unit
distributions 976 models root processes 364–9
ML estimation with 421–3 for over-identifying trend-cycle decomposition of unit
subsampling procedure 837–8 restrictions 691 root processes
sum of squares of residuals panel unit root testing see panel see also trend and cycle
(SSR) 63 unit root testing decomposition
SURE models see seemingly parametric tests 90–2, 548 Beveridge–Nelson
unrelated regression power of a test 52 decomposition 364–7
equations (SURE) models residual serial correlation 105–6 stochastic trend
Swamy estimator/test 713–17 residual-based, cointegration representation 368–9
relationship with mean group analysis 525–6 Watson decomposition 367
estimator 719–23 small sample properties of test trended variables 192
testing for slope statistics 547–9 trend-stationary processes
homogeneity 737–8 spatial dependence in versus first difference stationary
Sylvester equations 470 panels 814–15 processes 328–9
specification, GVAR models 923 stochastic processes 268, 275
tail-fatness see kurtosis (tail-fatness) unit root see Dickey–Fuller (DF) trigonometric functions 940–1
Taylor series expansion of unit root tests; panel unit t-statistics/test 41, 54, 68, 69, 116
functions 177, 217 root testing; unit root panel unit root
Taylor’s theorem 957 processes and tests testing 821, 822, 823
tests/testing weak exogeneity 569 Tukey window 320, 346
asymptotic power of panel unit three variable models 33, 59, 91 two variables, relationship
root tests 825–6 three-stage least squares between 3–23, 22–3
bootstrap tests of slope (3SLS) 443, 444 correlation coefficients between
homogeneity for AR(1) time domain techniques 267 Y and X 5–8
model, time series analysis curve fitting approach 3–4
bias-corrected 743–4 classical decomposition 274–5 decomposition of variance of Y
cointegration cyclical component 275 8–10
VAR models 540–3 financial and macro-economic likelihood approach, bivariate
VARX models 570–1, time series 25 regressions 13–14
571–2, 577–80 long-term trend 275 linear statistical models 10–12
vector autoregressive process with multivariate spectral parameter variations and ARCH
exogenous variables (VARX) density 518–20 effects 420
modelling (cont.) output growths 513, 514, 515, and predictability 159–60
identifying long-run relationships 516, 518, 519 realized 412
in a cointegrating panel cointegration 839 RiskMetrics™ (JP Morgan)
VARX 572–3 Panel VARs 695, 852, 902, 903 method 412–13
impulse response analysis in short-run effects in structural risk-return relationships 419–20
models 595–7 models, stochastic models 419
long-run structural model for identification 598–600 testing for ARCH/GARCH
UK 574–80 stationary conditions for VAR effects 417–19
testing for cointegration (p) 508–9
in 569–72 SVARs 600–1, 603 Wald test procedure 117, 125, 438,
testing Hr against Hmy 571 testing for block Granger 526, 822
testing Hr against Hr+1 570–1 non-causality 516–17 maximum likelihood (ML)
testing Hr in presence of I(0) unit root case 509 estimation 195, 212,
weakly exogenous VAR order selection 512–13 214–22
regressors 571–2 VAR(1) model 507, Watson decomposition 367
testing weak exogeneity 569 517–18, 519–20, 532, 878 weak law of large numbers
weakly exogenous I(1) VAR(p) model 508–9, (WLLN) 178, 181
variables 563–6 535, 536, 586, 598 weak stationarity, stochastic
vector autoregressive (VAR) vector error correction (VEC) processes 268
models 520–2 models weather shocks 932
see also autoregressive (AR) see also cointegration analysis weekly returns, volatilities and
processes; cointegration estimation of short-run conditional correlations
analysis parameters 549–50 in 620–9
Beveridge–Nelson decomposition and GVAR modelling 924 asset specific estimates 623–4
in 552–6 small sample properties of test changing volatilities and
cointegration of VAR statistics 547 correlations 626–9
asymptotic distribution of trace treatment of trends 536 devolatized returns,
statistic 541–3 and VARX models 567, 568, 569 properties 621–2
impulse response volatility 426–8 ML estimation 622–3
analysis 596–7 conditional variance post estimation evaluation of
maximum eigenvalue models 412–13 t-DCC model 624–5
statistic 540–1 econometric approaches 413–17 recursive estimates and VaR
multiple cointegrating Absolute GARCH-in-mean diagnostics 625–6
relations 529–30 model 417 weighted symmetric tests of unit root
testing for ARCH(1) and GARCH(1,1) critical values 345
cointegration 540–3 specifications 414–15 treatment of deterministic
trace statistic 541–3 exponential GARCH-in-mean components 344
treatment of trends 536–8 model 416–17 weighted symmetric
companion form of VAR(p) higher-order GARCH estimates 342–4
model 508 models 415–16 white noise process 268, 269
deterministic estimation of ARCH and Wiener processes 115, 335
components 510–12 ARCH-in-mean window size
estimation 509–10 models 420–3 (bandwidth) 78, 114, 116
factor-augmented, aggregation ML estimation with Gaussian Wold’s decomposition 275, 364
of 872–7 errors 421 Wright’s demand equation 228–9
forecasting with multivariate ML estimation with Student’s W-test (non-nested) 252
models 517–18 t-distributed errors 421–3
Granger causality 513–17 forecasting with GARCH Yule–Walker
high dimensional VARs 900, models 423–5 equations/estimators 280,
901, 911–14 implied, market-based 411 308–9
large Bayesian 902 intra-daily returns 411
large-scale VAR reduced form data measurement and modelling zero concordance 58
representation 901–3 of 411–28 zero mean 10, 25, 179