
Lecture Notes in Empirical Macroeconomics
(MiQEF, MSc course at UNISG)

Paul Söderlind¹

January 2005 (with some corrections done later)

¹ University of St. Gallen. Address: s/bf-HSG, Rosenbergstrasse 52, CH-9000 St. Gallen, Switzerland. E-mail: Paul.Soderlind@unisg.ch. Document name: EmpMacroAll.TeX.

Contents

1 Instrumental Variable Method
1.1 Consistency of Least Squares or Not?
1.2 Reason 1 for IV: Measurement Errors
1.3 Reason 2 for IV: Lagged Dependent Variable + Autocorrelated Shocks
1.4 Reason 3 for IV: Simultaneous Equations Bias (and Inconsistency)
1.5 Definition of the IV Estimator—Consistency of IV
1.6 Hausman’s Specification Test
1.7 Tests of Overidentifying Restrictions in 2SLS∗

2 Non-Spherical Errors
2.1 Summary of Least Squares
2.2 Heteroskedasticity
2.3 Autocorrelation
2.4 Variance of a Sample Average (more details)
2.5 The Newey-West Estimator
2.6 Summary

3 Vector Autoregression (VAR)
3.1 Estimation
3.2 Canonical Form
3.3 Moving Average Form and Stability
3.4 Granger Causality
3.5 Forecasts and Forecast Error Variance
3.6 Forecast Error Variance Decompositions∗
3.7 Structural VARs
3.8 Cointegration, Common Trends, and Identification via Long-Run Restrictions∗
4 Monetary Policy in VAR Systems
4.1 VAR System, Structural Form, and Impulse Response Function
4.2 Fully Recursive Structural Form
4.3 Some Controversies
4.4 Summary of Some Important Results from VAR Studies of Monetary Policy∗

5 Microfoundations of Monetary Policy Models
5.1 Dynamic Models of Sticky Prices
5.2 Aggregate Demand
5.3 Recent Models for Studying Monetary Policy

6 A Summary of Solution Methods for Linear RE Models

7 Solving Linear Expectational Difference Equations
7.1 The Model
7.2 Matrix Decompositions
7.3 Solving
7.4 Time Series Representation∗

8 A Menu of Different Policy Rules
8.1 A “Simple” Policy Rule
8.2 Optimal Policy under Commitment
8.3 Discretionary Solution

9 Estimation of New Keynesian Models
9.1 “New Keynesian Economics and the Phillips Curve” by Roberts
9.2 “Solution and Estimation of RE Macromodels with Optimal Monetary Policy” by Söderlind
9.3 “Estimating The Euler Equation for Output” by Fuhrer and Rudebusch
9.4 “New-Keynesian Models and Monetary Policy: A Reexamination of the Stylized Facts” by Söderström et al

1 Instrumental Variable Method

Reference: Greene (2003) 5.4–6 and 15.1–2
Additional references: Hayashi (2000) 3.1–4; Verbeek (2004) 5.1–4; Hamilton (1994) 8.2; and Pindyck and Rubinfeld (1998) 7

1.1 Consistency of Least Squares or Not?

Consider the linear model

$$
y_t = x_t'\beta_0 + u_t, \qquad (1.1)
$$

where $y_t$ and $u_t$ are scalars, $x_t$ a $k\times 1$ vector, and $\beta_0$ is a $k\times 1$ vector of the true coefficients. The least squares estimator is

$$
\begin{aligned}
\hat\beta_{LS} &= \left(\frac{1}{T}\sum_{t=1}^{T} x_t x_t'\right)^{-1}\frac{1}{T}\sum_{t=1}^{T} x_t y_t \qquad (1.2)\\
&= \beta_0 + \left(\frac{1}{T}\sum_{t=1}^{T} x_t x_t'\right)^{-1}\frac{1}{T}\sum_{t=1}^{T} x_t u_t, \qquad (1.3)
\end{aligned}
$$

where we have used (1.1) to substitute for $y_t$. The probability limit is

$$
\mathrm{plim}\,\hat\beta_{LS} - \beta_0 = \left(\mathrm{plim}\,\frac{1}{T}\sum_{t=1}^{T} x_t x_t'\right)^{-1}\mathrm{plim}\,\frac{1}{T}\sum_{t=1}^{T} x_t u_t. \qquad (1.4)
$$

In many cases the law of large numbers applies to both terms on the right hand side. The first term is typically a matrix with finite elements and the second term is the covariance of the regressors and the true residuals. This covariance must be zero for LS to be consistent.

1.2 Reason 1 for IV: Measurement Errors

Suppose the true model is

$$
y_t^* = x_t^{*\prime}\beta_0 + u_t^*. \qquad (1.5)
$$
Data on $y_t^*$ and $x_t^*$ is not directly observable, so we instead run the regression

$$
y_t = x_t'\beta + u_t, \qquad (1.6)
$$

where $y_t$ and $x_t$ are proxies for the correct variables (the ones that the model is true for). We can think of the difference as measurement errors

$$
\begin{aligned}
y_t &= y_t^* + v_t^y \quad\text{and} \qquad (1.7)\\
x_t &= x_t^* + v_t^x, \qquad (1.8)
\end{aligned}
$$

where the errors are uncorrelated with the true values and the “true” residual $u_t^*$.

Use (1.7) and (1.8) in (1.5)

$$
y_t - v_t^y = \left(x_t - v_t^x\right)'\beta_0 + u_t^*
$$

or

$$
y_t = x_t'\beta_0 + \varepsilon_t, \quad\text{where } \varepsilon_t = -v_t^{x\prime}\beta_0 + v_t^y + u_t^*. \qquad (1.9)
$$

Suppose that $x_t^*$ is measured with error. From (1.8) we see that $v_t^x$ and $x_t$ are correlated, so LS on (1.9) is inconsistent in this case. To make things even worse, measurement errors in only one of the variables typically affect all the coefficient estimates.

To illustrate the effect of the error, consider the case when $x_t$ is a scalar. Then, the probability limit of the LS estimator of $\beta$ in (1.9) is

$$
\begin{aligned}
\mathrm{plim}\,\hat\beta_{LS} &= \mathrm{Cov}(y_t, x_t)/\mathrm{Var}(x_t)\\
&= \mathrm{Cov}(x_t^*\beta_0 + u_t^*, x_t)/\mathrm{Var}(x_t)\\
&= \mathrm{Cov}(x_t\beta_0 - v_t^x\beta_0 + u_t^*, x_t)/\mathrm{Var}(x_t)\\
&= \frac{\mathrm{Cov}(x_t\beta_0, x_t) + \mathrm{Cov}(-v_t^x\beta_0, x_t) + \mathrm{Cov}(u_t^*, x_t)}{\mathrm{Var}(x_t)}\\
&= \frac{\mathrm{Var}(x_t)}{\mathrm{Var}(x_t)}\beta_0 + \frac{\mathrm{Cov}(-v_t^x\beta_0,\ x_t^* + v_t^x)}{\mathrm{Var}(x_t)}\\
&= \beta_0 - \beta_0\,\mathrm{Var}(v_t^x)/\mathrm{Var}(x_t)\\
&= \beta_0\left[1 - \frac{\mathrm{Var}(v_t^x)}{\mathrm{Var}(x_t^*) + \mathrm{Var}(v_t^x)}\right], \qquad (1.10)
\end{aligned}
$$

since $x_t^*$ and $v_t^x$ are uncorrelated with $u_t^*$ and with each other. This shows that $\hat\beta_{LS}$ goes to zero as the measurement error becomes relatively more volatile compared with the true value. This makes a lot of sense, since when the measurement error is very large then the regressor $x_t$ is dominated by noise that has nothing to do with the dependent variable.

Suppose instead that only $y_t^*$ is measured with error. This is not a big problem since this measurement error is uncorrelated with the regressor, so the consistency of least squares is not affected. In fact, a measurement error in the dependent variable is like increasing the variance in the residual.

1.3 Reason 2 for IV: Lagged Dependent Variable + Autocorrelated Shocks

Anything that causes correlation between the residuals and the regressors will make LS inconsistent. One example is a model with a lagged dependent variable as regressor and autocorrelated shocks.

To illustrate this, consider the simple ARMA(1,1)

$$
y_t = \rho y_{t-1} + u_t, \quad\text{where } u_t = \varepsilon_t + \theta\varepsilon_{t-1}, \qquad (1.11)
$$

where $|\rho| < 1$ and $\varepsilon_t$ is iid white noise. It is clear that $\mathrm{Cov}(y_{t-1}, u_t) \neq 0$ if $\theta \neq 0$. To be precise, we have

$$
\begin{aligned}
\mathrm{Cov}(y_{t-1}, u_t) &= \mathrm{Cov}(\rho y_{t-2} + \varepsilon_{t-1} + \theta\varepsilon_{t-2},\ \varepsilon_t + \theta\varepsilon_{t-1})\\
&= \theta\,\mathrm{Var}(\varepsilon_{t-1}). \qquad (1.12)
\end{aligned}
$$

Results from a Monte Carlo experiment are shown in Figure 1.1.

1.4 Reason 3 for IV: Simultaneous Equations Bias (and Inconsistency)

Suppose economic theory tells you that the structural form of the $m$ endogenous variables, $y_t$, and the $k$ predetermined (exogenous) variables, $z_t$, is

$$
F y_t + G z_t = u_t, \quad\text{where } u_t \text{ is iid with } \mathrm{E}\,u_t = 0 \text{ and } \mathrm{Cov}(u_t) = \Sigma, \qquad (1.13)
$$

where $F$ is $m\times m$, and $G$ is $m\times k$. The disturbances are assumed to be uncorrelated with the predetermined variables, $\mathrm{E}(z_t u_t') = 0$.
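A minimal sketch of the kind of Monte Carlo experiment reported in Figure 1.1 for the ARMA(1,1) model (1.11), written in plain Python. The parameter values follow the figure ($\rho = 0.8$, $\theta = 0.5$); reading N(0,2) as a variance of 2, the burn-in length, the seed and the number of replications are assumptions made here for illustration. The helper `plim_ls_ar1` evaluates $\rho + \mathrm{Cov}(y_{t-1},u_t)/\mathrm{Var}(y_{t-1})$ with the covariance from (1.12) and the standard ARMA(1,1) variance $\sigma^2(1 + 2\rho\theta + \theta^2)/(1-\rho^2)$, which is not derived in these notes.

```python
import random
import statistics


def simulate_arma11(T, rho, theta, sigma, rng, burn_in=200):
    """Simulate y_t = rho*y_{t-1} + eps_t + theta*eps_{t-1}, eps_t iid N(0, sigma^2)."""
    y_prev, eps_prev = 0.0, 0.0
    ys = []
    for t in range(T + burn_in):
        eps = rng.gauss(0.0, sigma)
        y = rho * y_prev + eps + theta * eps_prev
        if t >= burn_in:  # drop the start-up values
            ys.append(y)
        y_prev, eps_prev = y, eps
    return ys


def ls_ar1(ys):
    """LS estimate of rho in y_t = rho*y_{t-1} + u_t (no constant)."""
    num = sum(ys[t] * ys[t - 1] for t in range(1, len(ys)))
    den = sum(ys[t - 1] ** 2 for t in range(1, len(ys)))
    return num / den


def plim_ls_ar1(rho, theta):
    """rho + Cov(y_{t-1}, u_t)/Var(y_{t-1}); Cov from (1.12), ARMA(1,1) variance assumed."""
    return rho + theta * (1 - rho**2) / (1 + theta**2 + 2 * rho * theta)


rng = random.Random(12345)
rho, theta, sigma = 0.8, 0.5, 2**0.5  # N(0,2) read as variance 2 (assumption)
for T in (200, 900):
    est = [ls_ar1(simulate_arma11(T, rho, theta, sigma, rng)) for _ in range(300)]
    print(T, round(statistics.mean(est), 3), round(statistics.stdev(est), 3))
print("plim:", round(plim_ls_ar1(rho, theta), 3))
```

The simulated means should settle near the probability limit of roughly 0.89 rather than the true $\rho = 0.8$, which is the inconsistency discussed in Section 1.3.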
[Figure 1.1: Distribution of LS estimator of the autoregressive parameter. Panels: T = 200 (mean 0.88, std 0.03) and T = 900 (mean 0.89, std 0.01). True model: $y_t = \rho y_{t-1} + u_t$, where $u_t = \varepsilon_t + \theta\varepsilon_{t-1}$, $\rho = 0.8$, $\theta = 0.5$ and $\varepsilon_t$ is iid N(0,2). Estimated model: $y_t = \rho y_{t-1} + u_t$.]

Suppose $F$ is invertible. Solve for $y_t$ to get the reduced form

$$
\begin{aligned}
y_t &= -F^{-1}G z_t + F^{-1}u_t \qquad (1.14)\\
&= \Pi z_t + \varepsilon_t, \quad\text{with } \mathrm{Cov}(\varepsilon_t) = \Omega. \qquad (1.15)
\end{aligned}
$$

The reduced form coefficients, $\Pi$, can be consistently estimated by LS on each equation since the exogenous variables $z_t$ are uncorrelated with the reduced form residuals (which are linear combinations of the structural residuals). The fitted residuals can then be used to get an estimate of the reduced form covariance matrix.

The $j$th line of the structural form (1.13) can be written

$$
F_j y_t + G_j z_t = u_{jt}, \qquad (1.16)
$$

where $F_j$ and $G_j$ are the $j$th rows of $F$ and $G$, respectively. Suppose the model is normalized so that the coefficient on $y_{jt}$ is one (otherwise, divide (1.16) by this coefficient). Then, rewrite (1.16) as

$$
\begin{aligned}
y_{jt} &= -G_j z_t - \tilde F_j\tilde y_t + u_{jt}\\
&= x_t'\beta + u_{jt}, \quad\text{where } x_t = \begin{bmatrix} z_t\\ \tilde y_t\end{bmatrix}, \qquad (1.17)
\end{aligned}
$$

where $\tilde y_t$ are the endogenous variables except $y_{jt}$ (and $\tilde F_j$ the corresponding coefficients). We collect $z_t$ and $\tilde y_t$ in the $x_t$ vector to highlight that (1.17) looks like any other linear regression equation. The problem with (1.17), however, is that the residual is likely to be correlated with the regressors, so the LS estimator is inconsistent. The reason is that a shock to $u_{jt}$ influences $y_{jt}$, which in turn will affect some other endogenous variables in the system (1.13). If any of these endogenous variables are in $x_t$ in (1.17), then there is a correlation between the residual and (some of) the regressors.

Note that the concept of endogeneity discussed here only refers to contemporaneous endogeneity as captured by off-diagonal elements in $F$ in (1.13). The vector of predetermined variables, $z_t$, could very well include lags of $y_t$ without affecting the econometric endogeneity problem.

Example 1 (Supply and Demand. Reference: Hamilton 9.1.) Consider the simplest simultaneous equations model for supply and demand on a market. Supply is

$$
q_t = \gamma p_t + u_t^s, \quad \gamma > 0,
$$

and demand is

$$
q_t = \beta p_t + \alpha A_t + u_t^d, \quad \beta < 0,
$$

where $A_t$ is an observable demand shock (perhaps income). The structural form is therefore

$$
\begin{bmatrix} 1 & -\gamma\\ 1 & -\beta\end{bmatrix}\begin{bmatrix} q_t\\ p_t\end{bmatrix} + \begin{bmatrix} 0\\ -\alpha\end{bmatrix}A_t = \begin{bmatrix} u_t^s\\ u_t^d\end{bmatrix}.
$$

The reduced form is

$$
\begin{bmatrix} q_t\\ p_t\end{bmatrix} = \begin{bmatrix} \pi_{11}\\ \pi_{21}\end{bmatrix}A_t + \begin{bmatrix} \varepsilon_{1t}\\ \varepsilon_{2t}\end{bmatrix}.
$$

If we knew the structural form, then we could solve for $q_t$ and $p_t$ to get the reduced form in terms of the structural parameters

$$
\begin{bmatrix} q_t\\ p_t\end{bmatrix} = \begin{bmatrix} -\frac{\gamma}{\beta-\gamma}\alpha\\ -\frac{1}{\beta-\gamma}\alpha\end{bmatrix}A_t + \begin{bmatrix} \frac{\beta}{\beta-\gamma} & -\frac{\gamma}{\beta-\gamma}\\ \frac{1}{\beta-\gamma} & -\frac{1}{\beta-\gamma}\end{bmatrix}\begin{bmatrix} u_t^s\\ u_t^d\end{bmatrix}.
$$

Example 2 (Supply equation with LS.) Suppose we try to estimate the supply equation in Example 1 by LS, that is, we run the regression

$$
q_t = \theta p_t + \varepsilon_t.
$$
If data is generated by the model in Example 1, then the reduced form shows that $p_t$ is correlated with $u_t^s$, so we cannot hope that LS will be consistent. In fact, when both $q_t$ and $p_t$ have zero means, then the probability limit of the LS estimator is

$$
\begin{aligned}
\mathrm{plim}\,\hat\theta &= \frac{\mathrm{Cov}(q_t, p_t)}{\mathrm{Var}(p_t)}\\
&= \frac{\mathrm{Cov}\left(\frac{\gamma\alpha}{\gamma-\beta}A_t + \frac{\gamma}{\gamma-\beta}u_t^d - \frac{\beta}{\gamma-\beta}u_t^s,\ \frac{\alpha}{\gamma-\beta}A_t + \frac{1}{\gamma-\beta}u_t^d - \frac{1}{\gamma-\beta}u_t^s\right)}{\mathrm{Var}\left(\frac{\alpha}{\gamma-\beta}A_t + \frac{1}{\gamma-\beta}u_t^d - \frac{1}{\gamma-\beta}u_t^s\right)},
\end{aligned}
$$

where the second line follows from the reduced form. Suppose the supply and demand shocks are uncorrelated. In that case we get

$$
\begin{aligned}
\mathrm{plim}\,\hat\theta &= \frac{\frac{\gamma\alpha^2}{(\gamma-\beta)^2}\mathrm{Var}(A_t) + \frac{\gamma}{(\gamma-\beta)^2}\mathrm{Var}(u_t^d) + \frac{\beta}{(\gamma-\beta)^2}\mathrm{Var}(u_t^s)}{\frac{\alpha^2}{(\gamma-\beta)^2}\mathrm{Var}(A_t) + \frac{1}{(\gamma-\beta)^2}\mathrm{Var}(u_t^d) + \frac{1}{(\gamma-\beta)^2}\mathrm{Var}(u_t^s)}\\
&= \frac{\gamma\alpha^2\mathrm{Var}(A_t) + \gamma\mathrm{Var}(u_t^d) + \beta\mathrm{Var}(u_t^s)}{\alpha^2\mathrm{Var}(A_t) + \mathrm{Var}(u_t^d) + \mathrm{Var}(u_t^s)}.
\end{aligned}
$$

First, suppose the supply shocks are zero, $\mathrm{Var}(u_t^s) = 0$; then $\mathrm{plim}\,\hat\theta = \gamma$, so we indeed estimate the supply elasticity, as we wanted. Think of a fixed supply curve, and a demand curve which moves around. These points of $p_t$ and $q_t$ should trace out the supply curve. It is clearly $u_t^s$ that causes a simultaneous equations problem in estimating the supply curve: $u_t^s$ affects both $q_t$ and $p_t$, and the latter is the regressor in the supply equation. With no movements in $u_t^s$ there is no correlation between the shock and the regressor. Second, now suppose instead that both demand shocks are zero (both $A_t = 0$ and $\mathrm{Var}(u_t^d) = 0$). Then $\mathrm{plim}\,\hat\theta = \beta$, so the estimated value is not the supply, but the demand elasticity. Not good. This time, think of a fixed demand curve, and a supply curve which moves around.

Example 3 (A flat demand curve.) Suppose we change the demand curve in Example 1 to be infinitely elastic, but to still have demand shocks. For instance, the inverse demand curve could be $p_t = \psi A_t + u_t^D$. In this case, supply and demand are no longer a simultaneous system of equations and both equations could be estimated consistently with LS. In fact, the system is recursive, which is easily seen by writing the system in vector form

$$
\begin{bmatrix} 1 & 0\\ -\gamma & 1\end{bmatrix}\begin{bmatrix} p_t\\ q_t\end{bmatrix} + \begin{bmatrix} -\psi\\ 0\end{bmatrix}A_t = \begin{bmatrix} u_t^D\\ u_t^s\end{bmatrix}.
$$

A supply shock, $u_t^s$, affects the quantity, but this has no effect on the price (the regressor in the supply equation), so there is no correlation between the residual and regressor in the supply equation. A demand shock, $u_t^D$, affects the price and the quantity, but since quantity is not a regressor in the inverse demand function (only the exogenous $A_t$ is) there is no correlation between the residual and the regressor in the inverse demand equation either.

1.5 Definition of the IV Estimator—Consistency of IV

Consider the linear model

$$
y_t = x_t'\beta_0 + u_t, \qquad (1.18)
$$

where $y_t$ is a scalar, $x_t$ a $k\times 1$ vector, and $\beta_0$ is a vector of the true coefficients. If we suspect that $x_t$ and $u_t$ in (1.18) are correlated, then we may use the instrumental variables (IV) method. To do that, let $z_t$ be a $k\times 1$ vector of instruments (as many instruments as regressors; we will later deal with the case when we have more instruments than regressors). If $x_t$ and $u_t$ are not correlated, then setting $z_t = x_t$ gives the least squares (LS) method.

Recall that LS minimizes the variance of the fitted residuals, $\hat u_t = y_t - x_t'\hat\beta_{LS}$. The first order conditions for that optimization problem are

$$
0_{k\times 1} = \frac{1}{T}\sum_{t=1}^{T} x_t\left(y_t - x_t'\hat\beta_{LS}\right). \qquad (1.19)
$$

If $x_t$ and $u_t$ are correlated, then $\mathrm{plim}\,\hat\beta_{LS} \neq \beta_0$. The reason is that the probability limit of the right hand side of (1.19) is $\mathrm{Cov}(x_t, y_t - x_t'\hat\beta_{LS})$, which at $\hat\beta_{LS} = \beta_0$ is non-zero, so the first order conditions (in the limit) cannot be satisfied at the true parameter values. Note that since the LS estimator by construction forces the fitted residuals to be uncorrelated with the regressors, the properties of the LS residuals are of little help in deciding whether to use LS or IV.

The idea of the IV method is to replace the first $x_t$ in (1.19) with a vector (of similar size) of some instruments, $z_t$. The identifying assumption of the IV method is that the instruments are uncorrelated with the residuals (and, as we will see, correlated with the
regressors)

$$
\begin{aligned}
0_{k\times 1} &= \mathrm{E}\,z_t u_t \qquad (1.20)\\
&= \mathrm{E}\,z_t\left(y_t - x_t'\beta_0\right). \qquad (1.21)
\end{aligned}
$$

The intuition is that the linear model (1.18) is assumed to be correctly specified: the residuals, $u_t$, represent factors which we cannot explain, so $z_t$ should not contain any information about $u_t$.

The sample analogue to (1.21) defines the IV estimator of $\beta$ as¹

$$
\begin{aligned}
0_{k\times 1} &= \frac{1}{T}\sum_{t=1}^{T} z_t\left(y_t - x_t'\hat\beta_{IV}\right), \quad\text{or} \qquad (1.22)\\
\hat\beta_{IV} &= \left(\frac{1}{T}\sum_{t=1}^{T} z_t x_t'\right)^{-1}\frac{1}{T}\sum_{t=1}^{T} z_t y_t. \qquad (1.23)
\end{aligned}
$$

It is clearly necessary for $\sum z_t x_t'/T$ to have full rank to calculate the IV estimator.

¹ In matrix notation, where $z_t'$ is the $t$th row of $Z$, we have $\hat\beta_{IV} = (Z'X/T)^{-1}Z'Y/T$.

Remark 4 (Probability limit of product) For any random variables $y_T$ and $x_T$ where $\mathrm{plim}\,y_T = a$ and $\mathrm{plim}\,x_T = b$ ($a$ and $b$ are constants), we have $\mathrm{plim}\,y_T x_T = ab$.

To see if the IV estimator is consistent, use (1.18) to substitute for $y_t$ in (1.22) and take the probability limit

$$
\mathrm{plim}\,\frac{1}{T}\sum_{t=1}^{T} z_t x_t'\beta_0 + \mathrm{plim}\,\frac{1}{T}\sum_{t=1}^{T} z_t u_t = \mathrm{plim}\,\frac{1}{T}\sum_{t=1}^{T} z_t x_t'\hat\beta_{IV}. \qquad (1.24)
$$

Two things are required for consistency of the IV estimator, $\mathrm{plim}\,\hat\beta_{IV} = \beta_0$. First, that $\mathrm{plim}\sum z_t u_t/T = 0$. Provided a law of large numbers applies, this is condition (1.20). Second, that $\mathrm{plim}\sum z_t x_t'/T$ has full rank. To see this, suppose $\mathrm{plim}\sum z_t u_t/T = 0$ is satisfied. Then, (1.24) can be written

$$
\left(\mathrm{plim}\,\frac{1}{T}\sum_{t=1}^{T} z_t x_t'\right)\left(\beta_0 - \mathrm{plim}\,\hat\beta_{IV}\right) = 0. \qquad (1.25)
$$

If $\mathrm{plim}\sum z_t x_t'/T$ has reduced rank, then $\mathrm{plim}\,\hat\beta_{IV}$ does not need to equal $\beta_0$ for (1.25) to be satisfied. In practical terms, the first order conditions (1.22) then do not define a unique value of the vector of estimates. If a law of large numbers applies, then $\mathrm{plim}\sum z_t x_t'/T = \mathrm{E}\,z_t x_t'$. If both $z_t$ and $x_t$ contain constants (or at least one of them has zero means), then a reduced rank of $\mathrm{E}\,z_t x_t'$ would be a consequence of a reduced rank of the covariance matrix of the stochastic elements in $z_t$ and $x_t$, for instance, that some of the instruments are uncorrelated with all the regressors. This shows that the instruments must indeed be correlated with the regressors for IV to be consistent (and to make sense).

[Figure 1.2: Distribution of different estimators of the autoregressive parameter. Panels: LS, IV and ML for T = 200 (means 0.88, 0.78, 0.78; stds 0.03, 0.05, 0.05) and T = 900 (means 0.89, 0.80, 0.80; stds 0.01, 0.02, 0.02).]

For an example, see Figure 1.2 (details are given in Figure 1.1).

Remark 5 (Second moment matrix) Note that $\mathrm{E}\,zx' = \mathrm{E}\,z\,\mathrm{E}\,x' + \mathrm{Cov}(z, x)$. If $\mathrm{E}\,z = 0$ and/or $\mathrm{E}\,x = 0$, then the second moment matrix is a covariance matrix. Alternatively, suppose both $z$ and $x$ contain constants normalized to unity: $z = [1, \tilde z']'$ and $x = [1, \tilde x']'$
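where $\tilde z$ and $\tilde x$ are random vectors. We can then write

$$
\begin{aligned}
\mathrm{E}\,zx' &= \begin{bmatrix} 1\\ \mathrm{E}\,\tilde z\end{bmatrix}\begin{bmatrix} 1 & \mathrm{E}\,\tilde x'\end{bmatrix} + \begin{bmatrix} 0 & 0\\ 0 & \mathrm{Cov}(\tilde z, \tilde x)\end{bmatrix}\\
&= \begin{bmatrix} 1 & \mathrm{E}\,\tilde x'\\ \mathrm{E}\,\tilde z & \mathrm{E}\,\tilde z\,\mathrm{E}\,\tilde x' + \mathrm{Cov}(\tilde z, \tilde x)\end{bmatrix}.
\end{aligned}
$$

For simplicity, suppose $\tilde z$ and $\tilde x$ are scalars. Then $\mathrm{E}\,zx'$ has reduced rank if $\mathrm{Cov}(\tilde z, \tilde x) = 0$, since the determinant of $\mathrm{E}\,zx'$ equals $\mathrm{Cov}(\tilde z, \tilde x)$. This is true also when $\tilde z$ and $\tilde x$ are vectors.

Example 6 (Supply equation with IV.) Suppose we try to estimate the supply equation in Example 1 by IV. The only available instrument is $A_t$, so (1.23) becomes

$$
\hat\gamma_{IV} = \left(\frac{1}{T}\sum_{t=1}^{T} A_t p_t\right)^{-1}\frac{1}{T}\sum_{t=1}^{T} A_t q_t,
$$

so the probability limit is

$$
\mathrm{plim}\,\hat\gamma_{IV} = \mathrm{Cov}(A_t, p_t)^{-1}\mathrm{Cov}(A_t, q_t),
$$

since all variables have zero means. From the reduced form in Example 1 we see that

$$
\mathrm{Cov}(A_t, p_t) = -\frac{1}{\beta-\gamma}\alpha\,\mathrm{Var}(A_t) \quad\text{and}\quad \mathrm{Cov}(A_t, q_t) = -\frac{\gamma}{\beta-\gamma}\alpha\,\mathrm{Var}(A_t),
$$

so

$$
\begin{aligned}
\mathrm{plim}\,\hat\gamma_{IV} &= \left[-\frac{1}{\beta-\gamma}\alpha\,\mathrm{Var}(A_t)\right]^{-1}\left[-\frac{\gamma}{\beta-\gamma}\alpha\,\mathrm{Var}(A_t)\right]\\
&= \gamma.
\end{aligned}
$$

This shows that $\hat\gamma_{IV}$ is consistent.

1.5.1 Asymptotic Normality of IV

Little is known about the finite sample distribution of the IV estimator, so we focus on the asymptotic distribution, assuming the IV estimator is consistent.

Remark 7 If $x_T \xrightarrow{d} x$ (a random variable) and $\mathrm{plim}\,Q_T = Q$ (a constant matrix), then $Q_T x_T \xrightarrow{d} Qx$.

Use (1.18) to substitute for $y_t$ in (1.22)

$$
\hat\beta_{IV} = \beta_0 + \left(\frac{1}{T}\sum_{t=1}^{T} z_t x_t'\right)^{-1}\frac{1}{T}\sum_{t=1}^{T} z_t u_t. \qquad (1.26)
$$

Premultiply by $\sqrt{T}$ and rearrange as

$$
\sqrt{T}\left(\hat\beta_{IV} - \beta_0\right) = \left(\frac{1}{T}\sum_{t=1}^{T} z_t x_t'\right)^{-1}\frac{\sqrt{T}}{T}\sum_{t=1}^{T} z_t u_t. \qquad (1.27)
$$

If the first term on the right hand side converges in probability to a finite matrix (as assumed in proving consistency), and the vector of random variables $z_t u_t$ satisfies a central limit theorem, then

$$
\sqrt{T}\left(\hat\beta_{IV} - \beta_0\right) \xrightarrow{d} N\left(0,\ \Sigma_{zx}^{-1}S_0\Sigma_{xz}^{-1}\right), \quad\text{where} \qquad (1.28)
$$

$$
\Sigma_{zx} = \frac{1}{T}\sum_{t=1}^{T}\mathrm{E}\,z_t x_t' \quad\text{and}\quad S_0 = \mathrm{Cov}\left(\frac{\sqrt{T}}{T}\sum_{t=1}^{T} z_t u_t\right).
$$

The last matrix in the covariance matrix follows from $(\Sigma_{zx}^{-1})' = (\Sigma_{zx}')^{-1} = \Sigma_{xz}^{-1}$. This general expression is valid for both autocorrelated and heteroskedastic residuals: all such features are loaded into the $S_0$ matrix. Note that $S_0$ is the variance-covariance matrix of $\sqrt{T}$ times a sample average (of the vector of random variables $z_t u_t$).

Example 8 (Choice of instrument in IV, simplest case) Consider the simple regression

$$
y_t = \beta_1 x_t + u_t.
$$

The asymptotic variance of the IV estimator is

$$
\mathrm{AVar}\left[\sqrt{T}\left(\hat\beta_{IV} - \beta_1\right)\right] = \mathrm{Var}\left(\frac{\sqrt{T}}{T}\sum_{t=1}^{T} z_t u_t\right)\Big/\mathrm{Cov}(z_t, x_t)^2.
$$

If $z_t$ and $u_t$ are serially uncorrelated and independent of each other, then $\mathrm{Var}(\sum_{t=1}^{T} z_t u_t/\sqrt{T}) = \mathrm{Var}(z_t)\mathrm{Var}(u_t)$. We can then write

$$
\mathrm{AVar}\left[\sqrt{T}\left(\hat\beta_{IV} - \beta_1\right)\right] = \frac{\mathrm{Var}(z_t)\mathrm{Var}(u_t)}{\mathrm{Cov}(z_t, x_t)^2} = \frac{\mathrm{Var}(u_t)}{\mathrm{Var}(x_t)\,\mathrm{Corr}(z_t, x_t)^2}.
$$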
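To see Examples 2, 6 and 8 at work, here is a minimal simulation sketch of the market in Example 1. The parameter values ($\gamma = 1$, $\beta = -1$, $\alpha = 1$) and the independent N(0,1) draws for $A_t$, $u_t^s$ and $u_t^d$ are assumptions for illustration only. LS is computed as the special case $z_t = x_t$ of the IV formula (1.23), and the standard error uses the simple Example 8 expression, which presumes zero means and serially uncorrelated $z_t u_t$.

```python
import math
import random


def simulate_market(T, gamma, beta, alpha, rng):
    """Draw (q_t, p_t, A_t) from Example 1 with independent N(0,1) A_t, u_t^s, u_t^d."""
    q, p, A = [], [], []
    for _ in range(T):
        a, us, ud = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        pt = (alpha * a + ud - us) / (gamma - beta)  # equate supply and demand
        qt = gamma * pt + us                         # supply equation
        q.append(qt)
        p.append(pt)
        A.append(a)
    return q, p, A


def iv_scalar(y, x, z):
    """IV estimator (1.23) with one regressor and one instrument, no constant."""
    return sum(zi * yi for zi, yi in zip(z, y)) / sum(zi * xi for zi, xi in zip(z, x))


def iv_std_error(y, x, z, b):
    """Std error from Example 8: AVar = Var(z)Var(u)/Cov(z,x)^2, divided by T.
    Assumes zero means and serially uncorrelated z_t*u_t."""
    T = len(y)
    u = [yi - b * xi for yi, xi in zip(y, x)]
    var_z = sum(zi * zi for zi in z) / T
    var_u = sum(ui * ui for ui in u) / T
    cov_zx = sum(zi * xi for zi, xi in zip(z, x)) / T
    return math.sqrt(var_z * var_u / cov_zx**2 / T)


gamma, beta, alpha = 1.0, -1.0, 1.0
q, p, A = simulate_market(50_000, gamma, beta, alpha, random.Random(0))

theta_ls = iv_scalar(q, p, p)  # LS = IV with z_t = x_t (Example 2)
gamma_iv = iv_scalar(q, p, A)  # A_t as instrument (Example 6)
# plim of LS from Example 2 when all three shocks have unit variance
plim_ls = (gamma * alpha**2 + gamma + beta) / (alpha**2 + 1 + 1)
print(round(theta_ls, 3), round(plim_ls, 3))
print(round(gamma_iv, 3), round(iv_std_error(q, p, A, gamma_iv), 4))
```

With these values Example 2 gives $\mathrm{plim}\,\hat\theta = 1/3$, while the IV estimate should be close to $\gamma = 1$. Lowering `alpha` toward zero weakens $A_t$ as an instrument, since $\mathrm{Cov}(A_t, p_t) = \alpha\mathrm{Var}(A_t)/(\gamma-\beta)$, and the standard error grows accordingly.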
An instrument with a weak correlation with the regressor gives an imprecise estimator. With a perfect correlation, we get the precision of the LS estimator (which has a low variance, but is perhaps not consistent).

1.5.2 2SLS

Suppose now that we have more instruments, $z_t$, than regressors, $x_t$. The IV method does not work since there are then more equations than unknowns in (1.22). Instead, we can use the 2SLS estimator. It has two steps. First, regress all elements in $x_t$ on all elements in $z_t$ with LS. Second, use the fitted values of $x_t$, denoted $\hat x_t$, as instruments in the IV method (use $\hat x_t$ in place of $z_t$ in the equations above). It can be shown that this is the most efficient use of the information in $z_t$. The IV is clearly a special case of 2SLS (when $z_t$ has the same number of elements as $x_t$).

It is immediate from (1.24) that 2SLS is consistent under the same conditions as IV since $\hat x_t$ is a linear function of the instruments, so $\mathrm{plim}\sum_{t=1}^{T}\hat x_t u_t/T = 0$ if all the instruments are uncorrelated with $u_t$.

The name, 2SLS, comes from the fact that we get exactly the same result if we replace the second step with the following: regress $y_t$ on $\hat x_t$ with LS.

Example 9 (Supply equation with 2SLS.) With only one instrument, $A_t$, this is the same as Example 6, but presented in another way. First, regress $p_t$ on $A_t$

$$
p_t = \delta A_t + u_t \;\Rightarrow\; \mathrm{plim}\,\hat\delta_{LS} = \frac{\mathrm{Cov}(p_t, A_t)}{\mathrm{Var}(A_t)} = -\frac{1}{\beta-\gamma}\alpha.
$$

Construct the predicted values as

$$
\hat p_t = \hat\delta_{LS}A_t.
$$

Second, regress $q_t$ on $\hat p_t$

$$
q_t = \gamma\hat p_t + e_t, \quad\text{with } \mathrm{plim}\,\hat\gamma_{2SLS} = \mathrm{plim}\,\frac{\widehat{\mathrm{Cov}}(q_t, \hat p_t)}{\widehat{\mathrm{Var}}(\hat p_t)}.
$$

Use $\hat p_t = \hat\delta_{LS}A_t$ and Slutsky’s theorem

$$
\begin{aligned}
\mathrm{plim}\,\hat\gamma_{2SLS} &= \frac{\mathrm{plim}\,\widehat{\mathrm{Cov}}(q_t, \hat\delta_{LS}A_t)}{\mathrm{plim}\,\widehat{\mathrm{Var}}(\hat\delta_{LS}A_t)}\\
&= \frac{\mathrm{Cov}(q_t, A_t)\,\mathrm{plim}\,\hat\delta_{LS}}{\mathrm{Var}(A_t)\,\mathrm{plim}\,\hat\delta_{LS}^2}\\
&= \frac{\left[-\frac{\gamma}{\beta-\gamma}\alpha\,\mathrm{Var}(A_t)\right]\left[-\frac{1}{\beta-\gamma}\alpha\right]}{\mathrm{Var}(A_t)\left[-\frac{1}{\beta-\gamma}\alpha\right]^2}\\
&= \gamma.
\end{aligned}
$$

Note that the trick here is to suppress some of the movements in $p_t$. Only those movements that depend on $A_t$ (the observable shifts of the demand curve) are used. Movements in $p_t$ which are due to the unobservable demand and supply shocks are disregarded in $\hat p_t$. We know from Example 2 that it is the supply shocks that make the LS estimate of the supply curve inconsistent. The IV method suppresses both them and the unobservable demand shock.

1.6 Hausman’s Specification Test

This test is constructed to test if an efficient estimator (like LS) gives (approximately) the same estimate as a consistent estimator (like IV). If not, the efficient estimator is most likely inconsistent. It is therefore a way to test for the presence of endogeneity and/or measurement errors.

Let $\hat\beta_e$ be an estimator that is consistent and asymptotically efficient when the null hypothesis, $H_0$, is true, but inconsistent when $H_0$ is false. Let $\hat\beta_c$ be an estimator that is consistent under both $H_0$ and the alternative hypothesis. When $H_0$ is true, the asymptotic distribution is such that

$$
\mathrm{Cov}\left(\hat\beta_e, \hat\beta_c\right) = \mathrm{Var}\left(\hat\beta_e\right). \qquad (1.29)
$$

Proof. Consider the estimator $\lambda\hat\beta_c + (1-\lambda)\hat\beta_e$, which is clearly consistent under $H_0$ since both $\hat\beta_c$ and $\hat\beta_e$ are. The asymptotic variance of this estimator is

$$
\lambda^2\,\mathrm{Var}\left(\hat\beta_c\right) + (1-\lambda)^2\,\mathrm{Var}\left(\hat\beta_e\right) + 2\lambda(1-\lambda)\,\mathrm{Cov}\left(\hat\beta_c, \hat\beta_e\right),
$$
which is minimized at $\lambda = 0$ (since $\hat\beta_e$ is asymptotically efficient). The first order condition with respect to $\lambda$,

$$
2\lambda\,\mathrm{Var}\left(\hat\beta_c\right) - 2(1-\lambda)\,\mathrm{Var}\left(\hat\beta_e\right) + 2(1-2\lambda)\,\mathrm{Cov}\left(\hat\beta_c, \hat\beta_e\right) = 0,
$$

should therefore hold at $\lambda = 0$, so

$$
\mathrm{Var}\left(\hat\beta_e\right) = \mathrm{Cov}\left(\hat\beta_c, \hat\beta_e\right).
$$

(See Davidson (2000) 8.1.)

This means that we can write

$$
\begin{aligned}
\mathrm{Var}\left(\hat\beta_e - \hat\beta_c\right) &= \mathrm{Var}\left(\hat\beta_e\right) + \mathrm{Var}\left(\hat\beta_c\right) - 2\,\mathrm{Cov}\left(\hat\beta_e, \hat\beta_c\right)\\
&= \mathrm{Var}\left(\hat\beta_c\right) - \mathrm{Var}\left(\hat\beta_e\right). \qquad (1.30)
\end{aligned}
$$

We can use this to test, for instance, if the estimates from least squares ($\hat\beta_e$, since LS is efficient if errors are iid normally distributed) and the instrumental variable method ($\hat\beta_c$, since it is consistent even if the true residuals are correlated with the regressors) are the same. In this case, $H_0$ is that the true residuals are uncorrelated with the regressors.

All we need for this test are the point estimates and consistent estimates of the variance matrices. Testing one of the coefficients can be done by a $t$ test, and testing all the parameters by a $\chi^2$ test

$$
\left(\hat\beta_e - \hat\beta_c\right)'\,\mathrm{Var}\left(\hat\beta_e - \hat\beta_c\right)^{-1}\left(\hat\beta_e - \hat\beta_c\right) \sim \chi^2(j), \qquad (1.31)
$$

where $j$ equals the number of regressors that are potentially endogenous or measured with error. Note that the covariance matrix in (1.30) and (1.31) is likely to have a reduced rank, so the inverse needs to be calculated as a generalized inverse.

1.7 Tests of Overidentifying Restrictions in 2SLS∗

When we use 2SLS, then we can test if instruments affect the dependent variable only via their correlation with the regressors. If not, something is wrong with the model since some relevant variables are excluded from the regression.

Bibliography

Davidson, J., 2000, Econometric Theory, Blackwell Publishers, Oxford.

Greene, W. H., 2003, Econometric Analysis, Prentice-Hall, Upper Saddle River, New Jersey, 5th edn.

Hamilton, J. D., 1994, Time Series Analysis, Princeton University Press, Princeton.

Hayashi, F., 2000, Econometrics, Princeton University Press.

Pindyck, R. S., and D. L. Rubinfeld, 1998, Econometric Models and Economic Forecasts, Irwin McGraw-Hill, Boston, Massachusetts, 4th edn.

Verbeek, M., 2004, A Guide to Modern Econometrics, Wiley, Chichester, 2nd edn.
2 Non-Spherical Errors

Reference: Greene (2003) 10.3
Additional references: Hayashi (2000) 6.5; Hamilton (1994) 14; Verbeek (2004) 4.10; Harris and Matyas (1999); and Pindyck and Rubinfeld (1998) Appendix 10.1; Cochrane (2001) 11.7

2.1 Summary of Least Squares

Consider the regression equation

$$
y_t = x_t'\beta_0 + \varepsilon_t. \qquad (2.1)
$$

Recall that the LS estimator can be written

$$
\begin{aligned}
\hat\beta_{LS} &= \left(\sum_{t=1}^{T} x_t x_t'\right)^{-1}\sum_{t=1}^{T} x_t y_t \qquad (2.2)\\
&= \beta_0 + \left(\sum_{t=1}^{T} x_t x_t'\right)^{-1}\sum_{t=1}^{T} x_t\varepsilon_t. \qquad (2.3)
\end{aligned}
$$

We know that the general expression for the asymptotic distribution of $\hat\beta_{LS}$ is

$$
\sqrt{T}\left(\hat\beta_{LS} - \beta_0\right) \xrightarrow{d} N\left(0,\ \Sigma_{xx}^{-1}S_0\Sigma_{xx}^{-1}\right), \quad\text{where} \qquad (2.4)
$$

$$
\Sigma_{xx} = \frac{1}{T}\sum_{t=1}^{T}\mathrm{E}\,x_t x_t' \quad\text{and}\quad S_0 = \mathrm{Cov}\left(\frac{\sqrt{T}}{T}\sum_{t=1}^{T} x_t\varepsilon_t\right). \qquad (2.5)
$$

In practice, $\Sigma_{xx}$ and $S_0$ are replaced by their sample analogues. When $x_t$ is independent of all $\varepsilon_{t-s}$ and $\varepsilon_t$ is iid, then $S_0 = \mathrm{Var}(\varepsilon_t)\Sigma_{xx}$, so the covariance matrix in (2.4) simplifies to $\mathrm{Var}(\varepsilon_t)\Sigma_{xx}^{-1}$, which is the classical LS case.

2.2 Heteroskedasticity

Definition: $\varepsilon_t$ is not iid, since $\mathrm{Var}(\varepsilon_t)$ is different for different observations ($t$).

Effect: LS is still consistent, but the typical expression for $\mathrm{Var}(\hat\beta)$ is wrong. LS is no longer the best estimator (GLS is).

2.2.1 White’s Test of Heteroskedasticity

$H_0$: homoskedasticity

$H_A$: the kind of heteroskedasticity which can be explained by the levels, squares, and cross products of the regressors.

Let $w_t$ be the unique elements in $x_t \otimes x_t$. Run a regression of squared fitted residuals on $w_t$

$$
e_t^2 = w_t'\gamma + v_t.
$$

Test if all elements (except the constant) in $\gamma$ are zero ($NR^2 \sim \chi^2_P$, $P = \dim(w_t) - 1$).

The reason for this specification is that if $\varepsilon_t^2$ is uncorrelated with $x_t \otimes x_t$, then the usual LS covariance matrix applies.

2.2.2 Correct Var($\hat\beta$) for LS

The matrix $S_0$ in (2.4)–(2.5) cannot be simplified to $\sigma^2\Sigma_{xx}$. Instead, estimate it with White’s estimator

$$
\hat S_0 = \sum_{t=1}^{T}\hat\varepsilon_t^2 x_t x_t'/T, \qquad (2.6)
$$

where $\hat\varepsilon_t$ are the fitted residuals.

Discussion: let $z_t = x_t\varepsilon_t$ and think of $\mathrm{Var}(z_1 + z_2 + \ldots)/T$ when $z_t$ is uncorrelated with $z_{t-1}$. Ideally we would like to estimate this variance as $\sum_{i=1}^{T}\widehat{\mathrm{Var}}(z_i)/T$, but that is not possible since there is not enough data to estimate each $\mathrm{Var}(z_i)$. However, White has shown that (2.6) is a consistent way of estimating the variance of the sum.

2.3 Autocorrelation

Definition: $\varepsilon_t$ is not iid, since $\varepsilon_t$ is correlated with some $\varepsilon_{t-s}$.

Effect: LS is still consistent, but the standard expression for $\mathrm{Var}(\hat\beta)$ is wrong. LS is no longer the best estimator (GLS is).
[Figure 2.1: Variance of OLS estimator, heteroskedastic errors. Std of the LS estimator plotted against $\alpha$ (from −0.5 to 0.2), comparing $\sigma^2(X'X)^{-1}$, White’s estimator and simulated values. Model: $y_t = 0.9x_t + \varepsilon_t$, where $\varepsilon_t \sim N(0, h_t)$, with $h_t = 0.5\exp(\alpha x_t^2)$.]

[Figure 2.2: Variance of OLS estimator, autocorrelated errors. Panels: Std of LS for Corr($x_t$, $x_{t-1}$) = −0.9, 0 and 0.9, each plotted against $\alpha$ (from −0.5 to 0.5), comparing $\sigma^2(X'X)^{-1}$, Newey-West and simulated values. Model: $y_t = 0.9x_t + \varepsilon_t$, where $\varepsilon_t = \alpha\varepsilon_{t-1} + u_t$ and $u_t$ is iid $N(0, h)$ such that Std($\varepsilon_t$) = 1.]

2.3.1 Test of Autocorrelation

$H_0$: no autocorrelation; $H_A$: autocorrelation

1. Estimate $\rho = \mathrm{Corr}(e_t, e_{t-1})$. Form the $t$-test: $\sqrt{T}\hat\rho \sim N(0, 1)$.
2. Durbin-Watson: $DW \approx 2 - 2\rho$. Reject $H_0$ in favour of positive autocorrelation if $DW < 1.5$ or so.

2.3.2 Correct Var($\hat\beta$) for LS

In this case $z_t = x_t\varepsilon_t$ in (2.5) is autocorrelated, which will affect the variance of the sum. To estimate $\mathrm{Var}(z_1 + z_2 + \ldots + z_T)$ we form a weighted average of the autocovariances of $z_t$. For instance, with $T = 2$

$$
\begin{aligned}
\mathrm{Var}(z_1 + z_2) &= \mathrm{Var}(z_1) + \mathrm{Var}(z_2) + 2\,\mathrm{Cov}(z_1, z_2)\\
&= 2\,\mathrm{Var}(z_1) + 2\,\mathrm{Cov}(z_1, z_2),
\end{aligned}
$$

if $\varepsilon_t$ is homoskedastic (so $\mathrm{Var}(z_1) = \mathrm{Var}(z_2)$).

One more, with $T = 3$

$$
\begin{aligned}
\mathrm{E}(z_1 + z_2 + z_3)^2 &= \mathrm{E}z_1^2 + \mathrm{E}z_2^2 + \mathrm{E}z_3^2 + 2\,\mathrm{E}z_1z_2 + 2\,\mathrm{E}z_2z_3 + 2\,\mathrm{E}z_1z_3\\
&= \underbrace{3}_{T}\mathrm{Var}(z_t) + \underbrace{4}_{2(T-1)}\mathrm{Cov}(z_t, z_{t-1}) + \underbrace{2}_{2(T-2)}\mathrm{Cov}(z_t, z_{t-2}).
\end{aligned}
$$

The pattern continues as $T$ increases. In practice, this leads to numerical problems, so the usual way is as follows:

- use all data points to estimate the variances and covariances in the previous expression;
- replace $T$ by 1 plus the number of lags used (3 in the previous expression).

Plugging this into (2.5) gives the Newey-West estimator.
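A minimal sketch of the recipe in 2.3.2 for the case of one regressor and no constant (a simplification made here to keep the arithmetic scalar). Dividing the pattern above by $T$ gives weights $1, 2(T-1)/T, 2(T-2)/T, \ldots$; replacing $T$ by one plus the number of lags $n$ turns these into $2[1 - s/(n+1)]$ on the autocovariance at lag $s$, the Bartlett weights commonly associated with Newey-West. With $n = 0$ the estimate reduces to White’s (2.6). The simulated design ($x_t$ and $\varepsilon_t$ both AR(1) with coefficient 0.9, $T = 2000$, 20 lags, the seed) is an assumption chosen loosely in the spirit of Figure 2.2.

```python
import math
import random


def ls_slope(y, x):
    """LS in y_t = beta*x_t + eps_t (one regressor, no constant), cf. (2.2)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def newey_west_s0(z, lags):
    """Estimate S0 = Var(sum z_t / sqrt(T)) from z_t = x_t * eps_hat_t.
    The lag-s autocovariance gets weight 1 - s/(lags+1); lags=0 is White's (2.6)."""
    T = len(z)
    s0 = sum(zt * zt for zt in z) / T
    for s in range(1, lags + 1):
        gamma_s = sum(z[t] * z[t - s] for t in range(s, T)) / T
        s0 += 2 * (1 - s / (lags + 1)) * gamma_s
    return s0


def ls_std_errors(y, x, lags):
    """Return (beta_hat, classical std error, White/Newey-West std error)."""
    T = len(y)
    b = ls_slope(y, x)
    e = [yi - b * xi for yi, xi in zip(y, x)]
    sxx = sum(xi * xi for xi in x) / T  # sample Sigma_xx
    classical = math.sqrt(sum(ei * ei for ei in e) / T / sxx / T)  # sigma^2 (X'X)^-1
    z = [xi * ei for xi, ei in zip(x, e)]
    robust = math.sqrt(newey_west_s0(z, lags) / sxx**2 / T)  # Sxx^-1 S0 Sxx^-1 / T
    return b, classical, robust


rng = random.Random(7)
T, a = 2000, 0.9
x = [rng.gauss(0, 1 / math.sqrt(1 - a**2))]  # start x_t at its stationary distribution
eps = [rng.gauss(0, 1)]                       # Std(eps_t) = 1
for _ in range(T - 1):
    x.append(a * x[-1] + rng.gauss(0, 1))
    eps.append(a * eps[-1] + rng.gauss(0, math.sqrt(1 - a**2)))
y = [0.9 * xi + ei for xi, ei in zip(x, eps)]

b, se_classical, se_nw = ls_std_errors(y, x, lags=20)
_, _, se_white = ls_std_errors(y, x, lags=0)
print(round(b, 3), round(se_classical, 4), round(se_white, 4), round(se_nw, 4))
```

With both $x_t$ and $\varepsilon_t$ strongly positively autocorrelated, $z_t = x_t\varepsilon_t$ is positively autocorrelated too, so the classical standard error should come out clearly smaller than the Newey-West one, while White’s estimate stays close to the classical value because the errors here are homoskedastic.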
h i0
2.4 Variance of a Sample Average (more details) xt z t . Then Cov(m̄) is

Consider a covariance stationary vector process m t with zero mean and Cov (m t , m t−s ) = " #!

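$R(s)$ (which only depends on $s$). That is, we allow for serial correlation in $m_t$, but no heteroskedasticity. This is more restrictive than we want, but we will allow for heteroskedasticity later.

Let $\bar{m} = \sum_{t=1}^{T} m_t / T$. The sampling variance of a mean estimator of the zero mean random variable $m_t$ is defined as

$$\operatorname{Cov}(\bar{m}) = \operatorname{E}\left[\left(\frac{1}{T}\sum_{t=1}^{T} m_t\right)\left(\frac{1}{T}\sum_{\tau=1}^{T} m_\tau\right)'\right]. \tag{2.7}$$

Let the covariance (matrix) at lag $s$ be

$$R(s) = \operatorname{Cov}(m_t, m_{t-s}) = \operatorname{E} m_t m_{t-s}', \tag{2.8}$$

since $\operatorname{E} m_t = 0$ for all $t$.

Example 10 ($m_t$ is a scalar iid process.) When $m_t$ is a scalar iid process, then

$$\begin{aligned}
\operatorname{Var}\left(\frac{1}{T}\sum_{t=1}^{T} m_t\right) &= \frac{1}{T^2}\sum_{t=1}^{T}\operatorname{Var}(m_t) && \text{(independently distributed)}\\
&= \frac{1}{T^2}\, T\operatorname{Var}(m_t) && \text{(identically distributed)}\\
&= \frac{1}{T}\operatorname{Var}(m_t).
\end{aligned}$$

This is the classical iid case. Clearly, $\lim_{T\to\infty}\operatorname{Var}(\bar{m}) = 0$. By multiplying both sides by $T$ we instead get $\operatorname{Var}(\sqrt{T}\bar{m}) = \operatorname{Var}(m_t)$, which is often more convenient for asymptotics.

Example 11 Let $x_t$ and $z_t$ be two scalars, with sample averages $\bar{x}$ and $\bar{z}$. Let $m_t = [x_t \;\; z_t]'$. Then

$$\operatorname{Cov}(\bar{m}) = \operatorname{Cov}\begin{bmatrix}\bar{x}\\ \bar{z}\end{bmatrix} = \begin{bmatrix}\operatorname{Var}(\bar{x}) & \operatorname{Cov}(\bar{x},\bar{z})\\ \operatorname{Cov}(\bar{z},\bar{x}) & \operatorname{Var}(\bar{z})\end{bmatrix}.$$

Example 12 ($\operatorname{Cov}(\bar{m})$ with $T = 3$.) With $T = 3$, we have

$$\begin{aligned}
\operatorname{Cov}(T\bar{m}) &= \operatorname{E}(m_1 + m_2 + m_3)(m_1' + m_2' + m_3')\\
&= \underbrace{\operatorname{E}(m_1 m_1' + m_2 m_2' + m_3 m_3')}_{3R(0)} + \underbrace{\operatorname{E}(m_2 m_1' + m_3 m_2')}_{2R(1)} + \underbrace{\operatorname{E}(m_1 m_2' + m_2 m_3')}_{2R(-1)} + \underbrace{\operatorname{E} m_3 m_1'}_{R(2)} + \underbrace{\operatorname{E} m_1 m_3'}_{R(-2)}.
\end{aligned}$$

The general pattern in the previous example is

$$\operatorname{Cov}(T\bar{m}) = \sum_{s=-(T-1)}^{T-1}(T - |s|)R(s). \tag{2.9}$$

Dividing both sides by $T$ gives

$$\operatorname{Cov}(\sqrt{T}\bar{m}) = \sum_{s=-(T-1)}^{T-1}\left(1 - \frac{|s|}{T}\right)R(s). \tag{2.10}$$

This is the exact expression for a given sample size.

In many cases, we use the asymptotic expression (limiting value as $T\to\infty$) instead. If $R(s) = 0$ for $s > q$, so $m_t$ is an MA($q$), then the limit as the sample size goes to infinity is

$$\operatorname{ACov}(\sqrt{T}\bar{m}) = \lim_{T\to\infty}\operatorname{Cov}(\sqrt{T}\bar{m}) = \sum_{s=-q}^{q}R(s), \tag{2.11}$$

where ACov stands for the asymptotic variance-covariance matrix. This continues to hold even if $q = \infty$, provided $R(s)$ goes to zero sufficiently quickly, as it does in stationary VAR systems. In this case we have

$$\operatorname{ACov}(\sqrt{T}\bar{m}) = \sum_{s=-\infty}^{\infty}R(s). \tag{2.12}$$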
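To see how quickly the exact expression (2.10) approaches the limit (2.12), here is a minimal Python sketch. It assumes a scalar AR(1), for which $R(s) = \rho^{|s|}\sigma^2/(1-\rho^2)$ (this autocovariance is stated in Example 13 below), and all function names are hypothetical helpers, not from any library. It also checks the $T = 3$ pattern of Example 12 by summing $R(t-\tau)$ over all pairs of dates.

```python
def ar1_autocov(s, rho, sigma2=1.0):
    """R(s) = rho**|s| * sigma2 / (1 - rho**2) for a stationary scalar AR(1)."""
    return rho ** abs(s) * sigma2 / (1.0 - rho ** 2)


def cov_sqrtT_mean_exact(R, T):
    """Exact Cov(sqrt(T) * mbar), eq. (2.10): sum over |s| < T of (1 - |s|/T) R(s)."""
    return sum((1.0 - abs(s) / T) * R(s) for s in range(-(T - 1), T))


def cov_sqrtT_mean_truncated(R, q):
    """Sum of R(s) for s = -q, ..., q, eq. (2.11); a large q approximates (2.12)."""
    return sum(R(s) for s in range(-q, q + 1))


def cov_sqrtT_mean_brute(R, T):
    """(1/T) * sum_t sum_tau R(t - tau), i.e. Cov(T * mbar) / T as in Example 12."""
    return sum(R(t - tau) for t in range(T) for tau in range(T)) / T


rho = 0.5


def R(s):
    return ar1_autocov(s, rho)


print(cov_sqrtT_mean_brute(R, 3), cov_sqrtT_mean_exact(R, 3))  # equal up to rounding
for T in (10, 100, 1000):
    print(T, cov_sqrtT_mean_exact(R, T))
print("limit:", cov_sqrtT_mean_truncated(R, 200))
```

For $\rho = 0.5$ and $\sigma^2 = 1$ the limit is $\sigma^2/(1-\rho)^2 = 4$, and the exact values for $T = 10, 100, 1000$ approach it from below, in line with the closed form in Example 15.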
Estimation in finite samples will of course require some cut-off point, which is discussed below.

The traditional estimator of $\operatorname{ACov}(\sqrt{T}\bar{m})$ is just $R(0)$, which is correct when $m_t$ has no autocorrelation, that is

$$\operatorname{ACov}(\sqrt{T}\bar{m}) = R(0) = \operatorname{Cov}(m_t, m_t) \text{ if } \operatorname{Cov}(m_t, m_{t-s}) = 0 \text{ for } s \neq 0. \tag{2.13}$$

By comparing with (2.11) we see that this underestimates the true variance if autocovariances are mostly positive, and overestimates it if they are mostly negative. The errors can be substantial.

Example 13 (Variance of sample mean of AR(1).) Let $m_t = \rho m_{t-1} + u_t$, where $\operatorname{Var}(u_t) = \sigma^2$. Note that $R(s) = \rho^{|s|}\sigma^2/(1-\rho^2)$, so

$$\begin{aligned}
\operatorname{AVar}(\sqrt{T}\bar{m}) &= \sum_{s=-\infty}^{\infty}R(s)\\
&= \frac{\sigma^2}{1-\rho^2}\sum_{s=-\infty}^{\infty}\rho^{|s|} = \frac{\sigma^2}{1-\rho^2}\left(1 + 2\sum_{s=1}^{\infty}\rho^{s}\right)\\
&= \frac{\sigma^2}{1-\rho^2}\,\frac{1+\rho}{1-\rho},
\end{aligned}$$

which is increasing in $\rho$ (provided $|\rho| < 1$, as required for stationarity). The variance of $\bar{m}$ is much larger for $\rho$ close to one than for $\rho$ close to zero: the high autocorrelation creates long swings, so the mean cannot be estimated with any good precision in a small sample. If we disregard all autocovariances, then we would conclude that the variance of $\sqrt{T}\bar{m}$ is $\sigma^2/(1-\rho^2)$, which is smaller (larger) than the true value when $\rho > 0$ ($\rho < 0$). For instance, with $\rho = 0.85$, it is approximately 12 times too small.

Example 14 (Variance of sample mean of AR(1), continued.) Part of the reason why $\operatorname{Var}(\bar{m})$ increased with $\rho$ in the previous examples is that $\operatorname{Var}(m_t)$ increases with $\rho$. We can eliminate this effect by considering how much larger $\operatorname{AVar}(\sqrt{T}\bar{m})$ is than in the iid case, that is, $\operatorname{AVar}(\sqrt{T}\bar{m})/\operatorname{Var}(m_t) = (1+\rho)/(1-\rho)$. This ratio is one for $\rho = 0$ (iid data), less than one for $\rho < 0$, and greater than one for $\rho > 0$. This says that if relatively more of the variance in $m_t$ comes from long swings (high $\rho$), then the sample mean is more uncertain. See Figure 2.3 for an illustration.

[Figure 2.3: Ratio of $\operatorname{Var}(\sqrt{T}\bar{m})$ and $\operatorname{Var}(m_t)$ of AR(1) process $m_t = \rho m_{t-1} + u_t$, where $u_t$ is iid N(0,1). Vertical axis: $\operatorname{Var}(\sqrt{T}\times\text{sample mean})/\operatorname{Var}(\text{series})$; horizontal axis: AR(1) coefficient.]

Example 15 (Variance of sample mean of AR(1), illustration of the limit of (2.10) as $T\to\infty$.) For an AR(1), (2.10) is

$$\begin{aligned}
\operatorname{Var}(\sqrt{T}\bar{m}) &= \frac{\sigma^2}{1-\rho^2}\sum_{s=-(T-1)}^{T-1}\left(1-\frac{|s|}{T}\right)\rho^{|s|}\\
&= \frac{\sigma^2}{1-\rho^2}\left[1 + 2\sum_{s=1}^{T-1}\left(1-\frac{s}{T}\right)\rho^{s}\right]\\
&= \frac{\sigma^2}{1-\rho^2}\left[1 + 2\frac{\rho}{1-\rho} + 2\frac{\rho^{T+1}-\rho}{T(1-\rho)^2}\right].
\end{aligned}$$

The last term in brackets goes to zero as $T$ goes to infinity. We then get the result in Example 13.

2.5 The Newey-West Estimator

2.5.1 Definition of the Estimator

Newey and West (1987) suggested the following estimator of the covariance matrix in
(2.11) as (for some $n < T$)

$$\begin{aligned}
\widehat{\operatorname{ACov}}(\sqrt{T}\bar{m}) &= \sum_{s=-n}^{n}\left(1-\frac{|s|}{n+1}\right)\hat{R}(s)\\
&= \hat{R}(0) + \sum_{s=1}^{n}\left(1-\frac{s}{n+1}\right)\left(\hat{R}(s) + \hat{R}(-s)\right), \text{ or since } \hat{R}(-s) = \hat{R}'(s),\\
&= \hat{R}(0) + \sum_{s=1}^{n}\left(1-\frac{s}{n+1}\right)\left(\hat{R}(s) + \hat{R}'(s)\right), \text{ where}
\end{aligned}\tag{2.14}$$

$$\hat{R}(s) = \frac{1}{T}\sum_{t=s+1}^{T} m_t m_{t-s}' \quad (\text{if } \operatorname{E} m_t = 0). \tag{2.15}$$

The tent-shaped (Bartlett) weights in (2.14) guarantee a positive semi-definite covariance estimate. In contrast, equal weights (as in (2.11)) may give an estimated covariance matrix which is not positive semi-definite, which is fairly awkward. Newey and West (1987) showed that this estimator is consistent if we let $n$ go to infinity as $T$ does, but in such a way that $n/T^{1/4}$ goes to zero.

There are several other possible estimators of the covariance matrix in (2.11), but simulation evidence suggests that they typically do not improve a lot on the Newey-West estimator.

Example 16 ($m_t$ is MA(1).) Suppose we know that $m_t = \varepsilon_t + \theta\varepsilon_{t-1}$. Then $R(s) = 0$ for $s \geq 2$, so it might be tempting to use $n = 1$ in (2.14). This gives $\widehat{\operatorname{ACov}}(\sqrt{T}\bar{m}) = \hat{R}(0) + \tfrac{1}{2}[\hat{R}(1) + \hat{R}'(1)]$, while the theoretical expression (2.11) is $\operatorname{ACov}(\sqrt{T}\bar{m}) = R(0) + R(1) + R'(1)$. The Newey-West estimator puts too little weight on the first lead and lag, which suggests that we should use $n > 1$ (or more generally, $n > q$ for an MA($q$) process).

It can also be shown that, under quite general circumstances, the estimator in (2.14)–(2.15) is a consistent estimator of $\operatorname{ACov}(\sqrt{T}\bar{m})$, even if $m_t$ is heteroskedastic (on top of being autocorrelated). (See Hamilton (1994) 10.5 for a discussion.)

2.5.2 How to Implement the Newey-West Estimator

Economic theory and/or stylized facts can sometimes help us choose the lag length $n$. For instance, we may have a model of stock returns, which typically show little autocorrelation, so it may make sense to set $n = 0$ or $n = 1$ in that case. A popular choice of $n$ is to round $(T/100)^{1/4}$ down to the closest integer, although this does not satisfy the consistency requirement.

It is important to note that the definitions of the covariance matrices in (2.8) and (2.15) assume that $m_t$ has zero mean. If that is not the case, then the mean should be removed in the calculation of the covariance matrix. In practice, you remove the same number, estimated on the whole sample, from both $m_t$ and $m_{t-s}$. It is often recommended to remove the sample means even if theory tells you that the true mean is zero.

2.6 Summary

Let $\bar{m} = \frac{1}{T}\sum_{t=1}^{T} m_t$ and $R(s) = \operatorname{Cov}(m_t, m_{t-s})$. Then

$$\begin{aligned}
\operatorname{ACov}(\sqrt{T}\bar{m}) &= \sum_{s=-\infty}^{\infty}R(s)\\
\operatorname{ACov}(\sqrt{T}\bar{m}) &= R(0) = \operatorname{Cov}(m_t, m_t) \text{ if } R(s) = 0 \text{ for } s \neq 0\\
\text{Newey-West: } \widehat{\operatorname{ACov}}(\sqrt{T}\bar{m}) &= \hat{R}(0) + \sum_{s=1}^{n}\left(1-\frac{s}{n+1}\right)\left(\hat{R}(s) + \hat{R}'(s)\right).
\end{aligned}$$

Bibliography

Cochrane, J. H., 2001, Asset Pricing, Princeton University Press, Princeton, New Jersey.

Greene, W. H., 2003, Econometric Analysis, Prentice-Hall, Upper Saddle River, New Jersey, 5th edn.

Hamilton, J. D., 1994, Time Series Analysis, Princeton University Press, Princeton.

Harris, D., and L. Matyas, 1999, “Introduction to the Generalized Method of Moments Estimation,” in Laszlo Matyas (ed.), Generalized Method of Moments Estimation, chap. 1, Cambridge University Press.

Hayashi, F., 2000, Econometrics, Princeton University Press.
Newey, W. K., and K. D. West, 1987, “A Simple Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix,” Econometrica, 55, 703–708.

Pindyck, R. S., and D. L. Rubinfeld, 1998, Econometric Models and Economic Forecasts, Irwin McGraw-Hill, Boston, Massachusetts, 4th edn.

Verbeek, M., 2004, A Guide to Modern Econometrics, Wiley, Chichester, 2nd edn.

3 Vector Autoregression (VAR)

Reference: Greene (2003) 19.6; Hamilton (1994) 10–11; Verbeek (2004) 9.4; and Pindyck and Rubinfeld (1998) 9.2 and 13.5.

Let $y_t$ be an $n \times 1$ vector of variables. The VAR($p$) is

$$y_t = \mu + A_1 y_{t-1} + \dots + A_p y_{t-p} + \varepsilon_t, \text{ where } \varepsilon_t \text{ is white noise, } \operatorname{Cov}(\varepsilon_t) = \Omega. \tag{3.1}$$

Example 17 (VAR(2) of 2 × 1 vector.) Let $y_t = [x_t \;\; z_t]'$. Then

$$\begin{bmatrix}x_t\\ z_t\end{bmatrix} = \begin{bmatrix}A_{1,11} & A_{1,12}\\ A_{1,21} & A_{1,22}\end{bmatrix}\begin{bmatrix}x_{t-1}\\ z_{t-1}\end{bmatrix} + \begin{bmatrix}A_{2,11} & A_{2,12}\\ A_{2,21} & A_{2,22}\end{bmatrix}\begin{bmatrix}x_{t-2}\\ z_{t-2}\end{bmatrix} + \begin{bmatrix}\varepsilon_{1t}\\ \varepsilon_{2t}\end{bmatrix}. \tag{3.2}$$

3.1 Estimation

The MLE of the VAR, conditional on the initial observations, is the same as OLS estimates of each equation separately (assuming iid normally distributed residuals). The MLE of the $ij$th element in $\operatorname{Cov}(\varepsilon_t)$ is given by $\sum_{t=1}^{T}\hat{v}_{it}\hat{v}_{jt}/T$, where $\hat{v}_{it}$ and $\hat{v}_{jt}$ are the OLS residuals.

Note that the VAR system is a system of “seemingly unrelated regressions,” with the same regressors in each equation. The OLS on each equation is therefore the GLS, which coincides with MLE if the errors are normally distributed.

The choice of variables to enter the VAR system is typically based on economic theory, whereas the lag length is guided by more practical considerations (to make residuals uncorrelated, information criteria, accounting for seasons, etc.).

3.2 Canonical Form

A VAR($p$) can be rewritten as a VAR(1). This turns out to be very practical for calculating, among other things, impulse response functions. For instance, a VAR(2) can be written
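as

$$\begin{bmatrix}y_t\\ y_{t-1}\end{bmatrix} = \begin{bmatrix}\mu\\ 0\end{bmatrix} + \begin{bmatrix}A_1 & A_2\\ I & 0\end{bmatrix}\begin{bmatrix}y_{t-1}\\ y_{t-2}\end{bmatrix} + \begin{bmatrix}\varepsilon_t\\ 0\end{bmatrix} \text{ or} \tag{3.3}$$

$$y_t^* = \mu^* + A y_{t-1}^* + \varepsilon_t^*. \tag{3.4}$$

Example 18 (Canonical form of a univariate AR(2).)

$$\begin{bmatrix}y_t\\ y_{t-1}\end{bmatrix} = \begin{bmatrix}\mu\\ 0\end{bmatrix} + \begin{bmatrix}a_1 & a_2\\ 1 & 0\end{bmatrix}\begin{bmatrix}y_{t-1}\\ y_{t-2}\end{bmatrix} + \begin{bmatrix}\varepsilon_t\\ 0\end{bmatrix}.$$

Example 19 (Canonical form of VAR(2) of a 2 × 1 vector.) Continuing on the previous examples, we get

$$\begin{bmatrix}x_t\\ z_t\\ x_{t-1}\\ z_{t-1}\end{bmatrix} = \begin{bmatrix}A_{1,11} & A_{1,12} & A_{2,11} & A_{2,12}\\ A_{1,21} & A_{1,22} & A_{2,21} & A_{2,22}\\ 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\end{bmatrix}\begin{bmatrix}x_{t-1}\\ z_{t-1}\\ x_{t-2}\\ z_{t-2}\end{bmatrix} + \begin{bmatrix}\varepsilon_{1t}\\ \varepsilon_{2t}\\ 0\\ 0\end{bmatrix}.$$

3.3 Moving Average Form and Stability

Consider a VAR(1), or a VAR(1) representation of a VAR($p$) or an AR($p$)

$$y_t^* = A y_{t-1}^* + \varepsilon_t^*. \tag{3.5}$$

Solve recursively backwards (substitute for $y_{t-s}^* = A y_{t-s-1}^* + \varepsilon_{t-s}^*$, $s = 1, 2, \dots$) to get the vector moving average representation (VMA), or impulse response function

$$\begin{aligned}
y_t^* &= A\left(A y_{t-2}^* + \varepsilon_{t-1}^*\right) + \varepsilon_t^*\\
&= A^2 y_{t-2}^* + A\varepsilon_{t-1}^* + \varepsilon_t^*\\
&= A^2\left(A y_{t-3}^* + \varepsilon_{t-2}^*\right) + A\varepsilon_{t-1}^* + \varepsilon_t^*\\
&= A^3 y_{t-3}^* + A^2\varepsilon_{t-2}^* + A\varepsilon_{t-1}^* + \varepsilon_t^*\\
&\;\;\vdots\\
&= A^{K+1} y_{t-K-1}^* + \sum_{s=0}^{K} A^s\varepsilon_{t-s}^*.
\end{aligned}\tag{3.6}$$

Remark 20 (Spectral decomposition.) The $n$ eigenvalues ($\lambda_i$) and associated eigenvectors ($z_i$) of the $n \times n$ matrix $A$ satisfy

$$(A - \lambda_i I_n) z_i = 0_{n\times 1}.$$

If the eigenvectors are linearly independent, then

$$A = Z\Lambda Z^{-1}, \text{ where } \Lambda = \begin{bmatrix}\lambda_1 & 0 & \cdots & 0\\ 0 & \lambda_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \lambda_n\end{bmatrix} \text{ and } Z = \begin{bmatrix}z_1 & z_2 & \cdots & z_n\end{bmatrix}.$$

Note that we therefore get

$$A^2 = AA = Z\Lambda Z^{-1}Z\Lambda Z^{-1} = Z\Lambda\Lambda Z^{-1} = Z\Lambda^2 Z^{-1} \Rightarrow A^q = Z\Lambda^q Z^{-1}.$$

Remark 21 (Modulus of complex number.) If $\lambda = a + bi$, where $i = \sqrt{-1}$, then $|\lambda| = |a + bi| = \sqrt{a^2 + b^2}$.

We want $\lim_{K\to\infty} A^{K+1} y_{t-K-1}^* = 0$ (stable VAR) to get a moving average representation of $y_t$ (where the influence of the starting values vanishes asymptotically). We note from the spectral decompositions that $A^{K+1} = Z\Lambda^{K+1}Z^{-1}$, where $Z$ is the matrix of eigenvectors and $\Lambda$ a diagonal matrix with eigenvalues. Clearly, $\lim_{K\to\infty} A^{K+1} y_{t-K-1}^* = 0$ is satisfied if the eigenvalues of $A$ are all less than one in modulus.

Example 22 (AR(1).) For the univariate AR(1) $y_t = a y_{t-1} + \varepsilon_t$, the characteristic equation is $(a - \lambda) z = 0$, which (for $z \neq 0$) is only satisfied if the eigenvalue is $\lambda = a$. The AR(1) is therefore stable (and stationary) if $-1 < a < 1$.

If we have a stable VAR, then (3.6) can be written

$$\begin{aligned}
y_t^* &= \sum_{s=0}^{\infty} A^s\varepsilon_{t-s}^*\\
&= \varepsilon_t^* + A\varepsilon_{t-1}^* + A^2\varepsilon_{t-2}^* + \dots
\end{aligned}\tag{3.7}$$

We may pick out the first $n$ equations from (3.7) (to extract the “original” variables from the canonical form) and write them as

$$y_t = \varepsilon_t + C_1\varepsilon_{t-1} + C_2\varepsilon_{t-2} + \dots, \tag{3.8}$$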
which is the vector moving average, VMA, form of the VAR.

Example 23 (AR(2), Example 18 continued.) Let $\mu = 0$ in Example 18 and note that the VMA of the canonical form is

$$\begin{bmatrix}y_t\\ y_{t-1}\end{bmatrix} = \begin{bmatrix}\varepsilon_t\\ 0\end{bmatrix} + \begin{bmatrix}a_1 & a_2\\ 1 & 0\end{bmatrix}\begin{bmatrix}\varepsilon_{t-1}\\ 0\end{bmatrix} + \begin{bmatrix}a_1^2 + a_2 & a_1 a_2\\ a_1 & a_2\end{bmatrix}\begin{bmatrix}\varepsilon_{t-2}\\ 0\end{bmatrix} + \dots$$

The MA of $y_t$ is therefore

$$y_t = \varepsilon_t + a_1\varepsilon_{t-1} + \left(a_1^2 + a_2\right)\varepsilon_{t-2} + \dots$$

Note that

$$\frac{\partial y_t}{\partial\varepsilon_{t-s}'} = C_s \text{ or } \frac{\partial\operatorname{E}_t y_{t+s}}{\partial\varepsilon_t'} = C_s, \text{ with } C_0 = I, \tag{3.9}$$

so the impulse response function is given by $\{I, C_1, C_2, \dots\}$. Note that it is typically only meaningful to discuss impulse responses to uncorrelated shocks with economic interpretations. The idea behind structural VARs (discussed below) is to impose enough restrictions to achieve this.

Example 24 (Impulse response function for AR(1).) Let $y_t = \rho y_{t-1} + \varepsilon_t$. The MA representation is $y_t = \sum_{s=0}^{\infty}\rho^s\varepsilon_{t-s}$, so $\partial y_t/\partial\varepsilon_{t-s} = \partial\operatorname{E}_t y_{t+s}/\partial\varepsilon_t = \rho^s$. Stability requires $|\rho| < 1$, so the effect of the initial value eventually dies off ($\lim_{s\to\infty}\partial y_t/\partial\varepsilon_{t-s} = 0$).

Example 25 (Numerical VAR(1) of a 2 × 1 vector.) Consider the VAR(1)

$$\begin{bmatrix}x_t\\ z_t\end{bmatrix} = \begin{bmatrix}0.5 & 0.2\\ 0.1 & -0.3\end{bmatrix}\begin{bmatrix}x_{t-1}\\ z_{t-1}\end{bmatrix} + \begin{bmatrix}\varepsilon_{1t}\\ \varepsilon_{2t}\end{bmatrix}.$$

The eigenvalues are approximately 0.52 and −0.32, so this is a stable VAR. The VMA is

$$\begin{bmatrix}x_t\\ z_t\end{bmatrix} = \begin{bmatrix}\varepsilon_{1t}\\ \varepsilon_{2t}\end{bmatrix} + \begin{bmatrix}0.5 & 0.2\\ 0.1 & -0.3\end{bmatrix}\begin{bmatrix}\varepsilon_{1,t-1}\\ \varepsilon_{2,t-1}\end{bmatrix} + \begin{bmatrix}0.27 & 0.04\\ 0.02 & 0.11\end{bmatrix}\begin{bmatrix}\varepsilon_{1,t-2}\\ \varepsilon_{2,t-2}\end{bmatrix} + \dots$$

3.4 Granger Causality

Main message: Granger-causality might be useful, but it is not the same as causality.

Definition: if $z$ cannot help forecast $x$, then $z$ does not Granger-cause $x$; the MSE of the forecast $\operatorname{E}(x_t \mid x_{t-s}, z_{t-s}, s > 0)$ equals the MSE of the forecast $\operatorname{E}(x_t \mid x_{t-s}, s > 0)$.

Test: Redefine the dimensions of $x_t$ and $z_t$ in (3.2): let $x_t$ be $n_1 \times 1$ and $z_t$ be $n_2 \times 1$. If the $n_1 \times n_2$ matrices $A_{1,12} = 0$ and $A_{2,12} = 0$, then $z$ fails to Granger-cause $x$. (In general, we would require $A_{s,12} = 0$ for $s = 1, \dots, p$.) This carries over to the MA representation in (3.8), so $C_{s,12} = 0$.

These restrictions can be tested with an F-test. The easiest case is when $x$ is a scalar, since we then simply have a set of linear restrictions on a single OLS regression.

Example 26 (RBC and nominal neutrality.) Suppose we have an RBC model which says that money has no effect on the real variables (for instance, output, capital stock, and the productivity level). Money stock should then not Granger-cause real variables.

Example 27 (Granger causality and causality.) Do Christmas cards cause Christmas?

Example 28 (Granger causality and causality II, from Hamilton 11.) Consider the price $P_t$ of an asset paying dividends $D_t$. Suppose the expected return ($\operatorname{E}_t(P_{t+1} + D_{t+1})/P_t$) is a constant, $R$. The price then satisfies $P_t = \operatorname{E}_t\sum_{s=1}^{\infty}R^{-s}D_{t+s}$. Suppose $D_t = u_t + \delta u_{t-1} + v_t$, so $\operatorname{E}_t D_{t+1} = \delta u_t$ and $\operatorname{E}_t D_{t+s} = 0$ for $s > 1$. This gives $P_t = \delta u_t/R$, and $D_t = u_t + v_t + R P_{t-1}$, so the VAR is

$$\begin{bmatrix}P_t\\ D_t\end{bmatrix} = \begin{bmatrix}0 & 0\\ R & 0\end{bmatrix}\begin{bmatrix}P_{t-1}\\ D_{t-1}\end{bmatrix} + \begin{bmatrix}\delta u_t/R\\ u_t + v_t\end{bmatrix},$$

where $P$ Granger-causes $D$. Of course, the true causality is from $D$ to $P$. Problem: forward looking behavior.

Example 29 (Money and output, Sims (1972).) Sims found that output, $y$, does not Granger-cause money, $m$, but that $m$ Granger-causes $y$. His interpretation was that money supply is exogenous (set by the Fed) and that money has real effects. Notice how he used a combination of two Granger causality tests to make an economic interpretation.

Example 30 (Granger causality and omitted information.∗) Consider the VAR

$$\begin{bmatrix}y_{1t}\\ y_{2t}\\ y_{3t}\end{bmatrix} = \begin{bmatrix}a_{11} & a_{12} & 0\\ 0 & a_{22} & 0\\ 0 & a_{32} & a_{33}\end{bmatrix}\begin{bmatrix}y_{1,t-1}\\ y_{2,t-1}\\ y_{3,t-1}\end{bmatrix} + \begin{bmatrix}\varepsilon_{1t}\\ \varepsilon_{2t}\\ \varepsilon_{3t}\end{bmatrix}$$
Notice that $y_{2t}$ and $y_{3t}$ do not depend on $y_{1,t-1}$, so the latter should not be able to Granger-cause $y_{3t}$. However, suppose we forget to use $y_{2t}$ in the regression and then ask if $y_{1t}$ Granger-causes $y_{3t}$. The answer might very well be yes since $y_{1,t-1}$ contains information about $y_{2,t-1}$, which does affect $y_{3t}$. (If you let $y_{1t}$ be money, $y_{2t}$ be the (autocorrelated) Solow residual, and $y_{3t}$ be output, then this is a short version of King's (1986) comment on Bernanke (1986) (see below) on why money may appear to Granger-cause output.) Also note that adding a nominal interest rate to Sims's (see above) money-output VAR showed that money cannot be taken to be exogenous.

3.5 Forecasts Forecast Error Variance

The forecast error of the $s$ period ahead forecast is

$$y_{t+s} - \operatorname{E}_t y_{t+s} = \varepsilon_{t+s} + C_1\varepsilon_{t+s-1} + \dots + C_{s-1}\varepsilon_{t+1}, \tag{3.10}$$

so the covariance matrix of the ($s$ periods ahead) forecasting errors is

$$\operatorname{Cov}(y_{t+s} - \operatorname{E}_t y_{t+s}) = \operatorname{E}(y_{t+s} - \operatorname{E}_t y_{t+s})(y_{t+s} - \operatorname{E}_t y_{t+s})' = \Omega + C_1\Omega C_1' + \dots + C_{s-1}\Omega C_{s-1}'. \tag{3.11}$$

For a VAR(1), $C_s = A^s$, so we have

$$y_{t+s} - \operatorname{E}_t y_{t+s} = \varepsilon_{t+s} + A\varepsilon_{t+s-1} + \dots + A^{s-1}\varepsilon_{t+1}, \text{ and} \tag{3.12}$$

$$\operatorname{Cov}(y_{t+s} - \operatorname{E}_t y_{t+s}) = \Omega + A\Omega A' + \dots + A^{s-1}\Omega\left(A^{s-1}\right)'. \tag{3.13}$$

Note that $\lim_{s\to\infty}\operatorname{E}_t y_{t+s} = 0$, that is, the forecast goes to the unconditional mean (which is zero here, since there are no constants; you could think of $y_t$ as a deviation from the mean). Consequently, the forecast error becomes the VMA representation (3.8). Similarly, the forecast error variance goes to the unconditional variance.

Example 31 (Unconditional variance of VAR(1).) Letting $s \to \infty$ in (3.13) gives

$$\begin{aligned}
\operatorname{Cov}(y_t) &= \sum_{s=0}^{\infty}A^s\Omega\left(A^s\right)'\\
&= \Omega + \left[A\Omega A' + A^2\Omega(A^2)' + \dots\right]\\
&= \Omega + A\left(\Omega + A\Omega A' + \dots\right)A'\\
&= \Omega + A\operatorname{Cov}(y_t)A',
\end{aligned}$$

which suggests that we can calculate $\operatorname{Var}(y_t)$ by an iteration (backwards in time) $\Phi_t = \Omega + A\Phi_{t+1}A'$, starting from $\Phi_T = I$, until convergence.

3.6 Forecast Error Variance Decompositions∗

If the shocks are uncorrelated, then it is often useful to calculate the fraction of $\operatorname{Var}(y_{i,t+s} - \operatorname{E}_t y_{i,t+s})$ due to the $j$th shock, the forecast error variance decomposition. Suppose the covariance matrix of the shocks, here $\Omega$, is a diagonal $n \times n$ matrix with the variances $\omega_{ii}$ along the diagonal. Let $c_{qi}$ be the $i$th column of $C_q$. We then have

$$C_q\Omega C_q' = \sum_{i=1}^{n}\omega_{ii}c_{qi}\left(c_{qi}\right)'. \tag{3.14}$$

Example 32 (Illustration of (3.14) with $n = 2$.) Suppose

$$C_q = \begin{bmatrix}c_{11} & c_{12}\\ c_{21} & c_{22}\end{bmatrix} \text{ and } \Omega = \begin{bmatrix}\omega_{11} & 0\\ 0 & \omega_{22}\end{bmatrix},$$

then

$$C_q\Omega C_q' = \begin{bmatrix}\omega_{11}c_{11}^2 + \omega_{22}c_{12}^2 & \omega_{11}c_{11}c_{21} + \omega_{22}c_{12}c_{22}\\ \omega_{11}c_{11}c_{21} + \omega_{22}c_{12}c_{22} & \omega_{11}c_{21}^2 + \omega_{22}c_{22}^2\end{bmatrix},$$

which should be compared with

$$\omega_{11}\begin{bmatrix}c_{11}\\ c_{21}\end{bmatrix}\begin{bmatrix}c_{11}\\ c_{21}\end{bmatrix}' + \omega_{22}\begin{bmatrix}c_{12}\\ c_{22}\end{bmatrix}\begin{bmatrix}c_{12}\\ c_{22}\end{bmatrix}' = \omega_{11}\begin{bmatrix}c_{11}^2 & c_{11}c_{21}\\ c_{11}c_{21} & c_{21}^2\end{bmatrix} + \omega_{22}\begin{bmatrix}c_{12}^2 & c_{12}c_{22}\\ c_{12}c_{22} & c_{22}^2\end{bmatrix}.$$
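Examples 25, 31 and 32 can be combined into a small numerical sketch. The code below assumes the VAR(1) matrix of Example 25 together with a diagonal $\Omega$ whose values are made up for illustration (the text does not give one), and the helper names are hypothetical. It computes the forecast error covariance (3.13), the unconditional covariance by the backward iteration of Example 31, and, using (3.14), the share of each shock in each variable's forecast error variance.

```python
import numpy as np

A = np.array([[0.5, 0.2],
              [0.1, -0.3]])        # VAR(1) coefficient matrix of Example 25
Omega = np.diag([1.0, 0.5])       # hypothetical diagonal shock covariance (not from the text)


def forecast_error_cov(A, Omega, s):
    """(3.13): Omega + A Omega A' + ... + A^(s-1) Omega (A^(s-1))'."""
    total = np.zeros_like(Omega)
    C = np.eye(A.shape[0])         # C_0 = I, then C_q = A^q for a VAR(1)
    for _ in range(s):
        total += C @ Omega @ C.T
        C = A @ C
    return total


def unconditional_cov(A, Omega, tol=1e-12, max_iter=10_000):
    """Example 31: iterate Phi <- Omega + A Phi A', starting from Phi = I."""
    Phi = np.eye(A.shape[0])
    for _ in range(max_iter):
        Phi_next = Omega + A @ Phi @ A.T
        if np.max(np.abs(Phi_next - Phi)) < tol:
            return Phi_next
        Phi = Phi_next
    raise RuntimeError("iteration did not converge; is the VAR stable?")


def fevd(A, Omega, s):
    """Share of shock j in the s-period forecast error variance of variable i.

    Uses (3.14): with diagonal Omega, shock j adds omega_jj * C_q[i, j]**2
    to the forecast error variance of variable i for each q = 0, ..., s-1.
    """
    omega = np.diag(Omega)
    contrib = np.zeros_like(Omega)
    C = np.eye(A.shape[0])
    for _ in range(s):
        contrib += C ** 2 * omega  # broadcasting multiplies column j by omega_jj
        C = A @ C
    return contrib / contrib.sum(axis=1, keepdims=True)


print(np.abs(np.linalg.eigvals(A)))  # moduli below one: stable VAR
print(fevd(A, Omega, 1))             # identity: one step ahead, only own shocks matter
print(fevd(A, Omega, 20))
```

Each row of the `fevd` output sums to one. Row $i$ is the decomposition of variable $i$'s forecast error variance at the chosen horizon.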
Applying (3.14) to (3.11) gives

$$\begin{aligned}
\operatorname{Cov}(y_{t+s} - \operatorname{E}_t y_{t+s}) &= \sum_{i=1}^{n}\omega_{ii}e_i e_i' + \sum_{i=1}^{n}\omega_{ii}c_{1i}(c_{1i})' + \dots + \sum_{i=1}^{n}\omega_{ii}c_{s-1,i}(c_{s-1,i})'\\
&= \sum_{i=1}^{n}\omega_{ii}\left[e_i e_i' + c_{1i}(c_{1i})' + \dots + c_{s-1,i}(c_{s-1,i})'\right],
\end{aligned}\tag{3.15}$$

where $e_i$ is the $i$th column of $C_0 = I_n$. This shows how the covariance matrix for the $s$-period forecast errors can be decomposed into its $n$ components.

3.7 Structural VARs

3.7.1 Structural and Reduced Forms

We are usually not interested in the impulse response function (3.8) or the variance decomposition (3.11) with respect to $\varepsilon_t$, but with respect to some structural shocks, $u_t$, which have clearer interpretations (technology, monetary policy shock, etc.).

Suppose the structural form of the model is

$$F y_t = \alpha + B_1 y_{t-1} + \dots + B_p y_{t-p} + u_t, \quad u_t \text{ is white noise, } \operatorname{Cov}(u_t) = D. \tag{3.16}$$

This could, for instance, be an economic model derived from theory.¹

¹ This is a “structural model” in a traditional, Cowles Commission, sense. This might be different from what modern macroeconomists would call structural.

Provided $F^{-1}$ exists, it is possible to write the time series process as

$$\begin{aligned}
y_t &= F^{-1}\alpha + F^{-1}B_1 y_{t-1} + \dots + F^{-1}B_p y_{t-p} + F^{-1}u_t && (3.17)\\
&= \mu + A_1 y_{t-1} + \dots + A_p y_{t-p} + \varepsilon_t, \quad \operatorname{Cov}(\varepsilon_t) = \Omega, && (3.18)
\end{aligned}$$

where

$$\mu = F^{-1}\alpha, \quad A_s = F^{-1}B_s, \quad \text{and} \quad \varepsilon_t = F^{-1}u_t \text{ so } \Omega = F^{-1}D\left(F^{-1}\right)'. \tag{3.19}$$

Equation (3.18) is a VAR model, so a VAR can be thought of as a reduced form of the structural model (3.16).

The key to understanding the relation between the structural model and the VAR is the $F$ matrix, which controls how the endogenous variables, $y_t$, are linked to each other contemporaneously. In fact, identification of a VAR amounts to choosing an $F$ matrix. Once that is done, impulse responses and forecast error variance decompositions can be made with respect to the structural shocks. For instance, the impulse response function of the VAR, (3.8), can be rewritten in terms of $u_t = F\varepsilon_t$ (from (3.19))

$$\begin{aligned}
y_t &= \varepsilon_t + C_1\varepsilon_{t-1} + C_2\varepsilon_{t-2} + \dots\\
&= F^{-1}F\varepsilon_t + C_1F^{-1}F\varepsilon_{t-1} + C_2F^{-1}F\varepsilon_{t-2} + \dots\\
&= F^{-1}u_t + C_1F^{-1}u_{t-1} + C_2F^{-1}u_{t-2} + \dots
\end{aligned}\tag{3.20}$$

Remark 33 The easiest way to calculate this representation is by first finding $F^{-1}$ (see below), then writing (3.18) as

$$y_t = \mu + A_1 y_{t-1} + \dots + A_p y_{t-p} + F^{-1}u_t. \tag{3.21}$$

To calculate the impulse responses to the first element in $u_t$, set $y_{t-1}, \dots, y_{t-p}$ equal to the long-run average, $(I - A_1 - \dots - A_p)^{-1}\mu$, make the first element in $u_t$ unity and all other elements zero. Calculate the response by iterating forward on (3.21), but putting all elements in $u_{t+1}, u_{t+2}, \dots$ to zero. This procedure can be repeated for the other elements of $u_t$.

We would typically pick $F$ such that the elements in $u_t$ are uncorrelated with each other, so they have a clear interpretation.

The VAR form can be estimated directly from data. Is it then possible to recover the structural parameters in (3.16) from the estimated VAR (3.18)? Not without restrictions on the structural parameters in $F$, $B_s$, $\alpha$, and $D$. To see why, note that in the structural form (3.16) we have $(p + 1)n^2$ parameters in $\{F, B_1, \dots, B_p\}$, $n$ parameters in $\alpha$, and $n(n+1)/2$ unique parameters in $D$ (it is symmetric). In the VAR (3.18) we have fewer parameters: $pn^2$ in $\{A_1, \dots, A_p\}$, $n$ parameters in $\mu$, and $n(n+1)/2$ unique parameters in $\Omega$. This means that we have to impose at least $n^2$ restrictions on the structural parameters $\{F, B_1, \dots, B_p, \alpha, D\}$ to identify all of them. This means, of course, that many different structural models can have exactly the same reduced form.

Example 34 (Structural form of the 2 × 1 case.) Suppose the structural form of the
VAR(2) in Example 17 is

$$\begin{bmatrix}F_{11} & F_{12}\\ F_{21} & F_{22}\end{bmatrix}\begin{bmatrix}x_t\\ z_t\end{bmatrix} = \begin{bmatrix}B_{1,11} & B_{1,12}\\ B_{1,21} & B_{1,22}\end{bmatrix}\begin{bmatrix}x_{t-1}\\ z_{t-1}\end{bmatrix} + \begin{bmatrix}B_{2,11} & B_{2,12}\\ B_{2,21} & B_{2,22}\end{bmatrix}\begin{bmatrix}x_{t-2}\\ z_{t-2}\end{bmatrix} + \begin{bmatrix}u_{1t}\\ u_{2t}\end{bmatrix}.$$

This structural form has 3 × 4 + 3 unique parameters. The VAR in (3.2) has 2 × 4 + 3. We need at least 4 restrictions on $\{F, B_1, B_2, D\}$ to identify them from $\{A_1, A_2, \Omega\}$.

3.7.2 “Triangular” Identification 1: Triangular F with $F_{ii} = 1$ and Diagonal D

Reference: Sims (1980).

Perhaps the most common way to achieve identification of the structural parameters is to restrict the contemporaneous response of the different endogenous variables, $y_t$, to the different structural shocks, $u_t$. Within this class of restrictions, the triangular identification is the most popular: assume that $F$ is lower triangular ($n(n+1)/2$ restrictions) with diagonal elements equal to unity, and that $D$ is diagonal ($n(n-1)/2$ restrictions), which gives $n^2$ restrictions (exact identification).

A lower triangular $F$ matrix is very restrictive. It means that the first variable can react to lags and the first shock, the second variable to lags and the first two shocks, etc. This is a recursive simultaneous equations model, and we obviously need to be careful with how we order the variables. The assumption that $F_{ii} = 1$ is just a normalization.

A diagonal $D$ matrix seems to be something that we would often like to have in a structural form in order to interpret the shocks as, for instance, demand and supply shocks. The diagonal elements of $D$ are the variances of the structural shocks.

Example 35 (Lower triangular F: going from structural form to VAR.) Suppose the structural form is

$$\begin{bmatrix}1 & 0\\ -\alpha & 1\end{bmatrix}\begin{bmatrix}x_t\\ z_t\end{bmatrix} = \begin{bmatrix}B_{11} & B_{12}\\ B_{21} & B_{22}\end{bmatrix}\begin{bmatrix}x_{t-1}\\ z_{t-1}\end{bmatrix} + \begin{bmatrix}u_{1t}\\ u_{2t}\end{bmatrix}.$$

This is a recursive system where $x_t$ does not depend on the contemporaneous $z_t$, and therefore not on the contemporaneous $u_{2t}$ (see first equation). However, $z_t$ does depend on $x_t$ (second equation). The VAR (reduced form) is obtained by premultiplying by $F^{-1}$

$$\begin{aligned}
\begin{bmatrix}x_t\\ z_t\end{bmatrix} &= \begin{bmatrix}1 & 0\\ \alpha & 1\end{bmatrix}\begin{bmatrix}B_{11} & B_{12}\\ B_{21} & B_{22}\end{bmatrix}\begin{bmatrix}x_{t-1}\\ z_{t-1}\end{bmatrix} + \begin{bmatrix}1 & 0\\ \alpha & 1\end{bmatrix}\begin{bmatrix}u_{1t}\\ u_{2t}\end{bmatrix}\\
&= \begin{bmatrix}A_{11} & A_{12}\\ A_{21} & A_{22}\end{bmatrix}\begin{bmatrix}x_{t-1}\\ z_{t-1}\end{bmatrix} + \begin{bmatrix}\varepsilon_{1t}\\ \varepsilon_{2t}\end{bmatrix}.
\end{aligned}$$

This means that $\varepsilon_{1t} = u_{1t}$, so the first VAR shock equals the first structural shock. In contrast, $\varepsilon_{2t} = \alpha u_{1t} + u_{2t}$, so the second VAR shock is a linear combination of the first two shocks. The covariance matrix of the VAR shocks is therefore

$$\operatorname{Cov}\begin{bmatrix}\varepsilon_{1t}\\ \varepsilon_{2t}\end{bmatrix} = \begin{bmatrix}\sigma_1^2 & \alpha\sigma_1^2\\ \alpha\sigma_1^2 & \alpha^2\sigma_1^2 + \sigma_2^2\end{bmatrix}, \text{ where } \operatorname{Cov}\begin{bmatrix}u_{1t}\\ u_{2t}\end{bmatrix} = \begin{bmatrix}\sigma_1^2 & 0\\ 0 & \sigma_2^2\end{bmatrix}.$$

This set of identifying restrictions can be implemented by estimating the structural form with LS, equation by equation. The reason is that this is just the old-fashioned fully recursive system of simultaneous equations. See, for instance, Greene (2000) 16.3.

3.7.3 “Triangular” Identification 2: Triangular F and D = I

The identifying restrictions in Section 3.7.2 are actually the same as assuming that $F$ is triangular and that $D = I$. In this latter case, the restriction on the diagonal elements of $F$ has been moved to the diagonal elements of $D$. This is just a change of normalization (that the structural shocks have unit variance). It happens that this alternative normalization is fairly convenient when we want to estimate the VAR first and then recover the structural parameters from the VAR estimates.

Example 36 (Change of normalization in Example 35) Suppose the structural shocks in Example 35 have the covariance matrix

$$D = \operatorname{Cov}\begin{bmatrix}u_{1t}\\ u_{2t}\end{bmatrix} = \begin{bmatrix}\sigma_1^2 & 0\\ 0 & \sigma_2^2\end{bmatrix}.$$

Premultiply the structural form in Example 35 by

$$\begin{bmatrix}1/\sigma_1 & 0\\ 0 & 1/\sigma_2\end{bmatrix}$$
to get

$$\begin{bmatrix}1/\sigma_1 & 0\\ -\alpha/\sigma_2 & 1/\sigma_2\end{bmatrix}\begin{bmatrix}x_t\\ z_t\end{bmatrix} = \begin{bmatrix}B_{11}/\sigma_1 & B_{12}/\sigma_1\\ B_{21}/\sigma_2 & B_{22}/\sigma_2\end{bmatrix}\begin{bmatrix}x_{t-1}\\ z_{t-1}\end{bmatrix} + \begin{bmatrix}u_{1t}/\sigma_1\\ u_{2t}/\sigma_2\end{bmatrix}.$$

This structural form has a triangular $F$ matrix (with diagonal elements that can be different from unity), and a covariance matrix equal to an identity matrix.

The reason why this alternative normalization is convenient is that it allows us to use the widely available Cholesky decomposition.

Remark 37 (Cholesky decomposition) Let $\Omega$ be an $n \times n$ symmetric positive definite matrix. The Cholesky decomposition gives the unique lower triangular $P$ with positive diagonal elements such that $\Omega = PP'$ (some software returns an upper triangular matrix, that is, $Q$ in $\Omega = Q'Q$ instead).

Remark 38 Note the following two important features of the Cholesky decomposition. First, each column of $P$ is only identified up to a sign transformation; they can be reversed at will. Second, the diagonal elements in $P$ are typically not unity.

Remark 39 (Changing sign of column and inverting.) Suppose the square matrix $A_2$ is the same as $A_1$ except that the $i$th and $j$th columns have the reverse signs. Then $A_2^{-1}$ is the same as $A_1^{-1}$ except that the $i$th and $j$th rows have the reverse sign.

This set of identifying restrictions can be implemented by estimating the VAR with LS and then taking the following steps.

• Step 1. From (3.19) $\Omega = F^{-1}I\left(F^{-1}\right)'$ (recall $D = I$ is assumed), so a Cholesky decomposition recovers $F^{-1}$ (lower triangular $F$ gives a similar structure of $F^{-1}$, and vice versa, so this works). The signs of each column of $F^{-1}$ can be chosen freely, for instance, so that a productivity shock gets a positive, rather than negative, effect on output. Invert $F^{-1}$ to get $F$.

• Step 2. Invert the expressions in (3.19) to calculate the structural parameters from the VAR parameters as $\alpha = F\mu$, and $B_s = FA_s$.

Example 40 (Identification of the 2 × 1 case.) Suppose the structural form of the VAR(2) in Example 17 is

$$\begin{bmatrix}F_{11} & 0\\ F_{21} & F_{22}\end{bmatrix}\begin{bmatrix}x_t\\ z_t\end{bmatrix} = \begin{bmatrix}B_{1,11} & B_{1,12}\\ B_{1,21} & B_{1,22}\end{bmatrix}\begin{bmatrix}x_{t-1}\\ z_{t-1}\end{bmatrix} + \begin{bmatrix}B_{2,11} & B_{2,12}\\ B_{2,21} & B_{2,22}\end{bmatrix}\begin{bmatrix}x_{t-2}\\ z_{t-2}\end{bmatrix} + \begin{bmatrix}u_{1t}\\ u_{2t}\end{bmatrix}, \text{ with } D = \begin{bmatrix}1 & 0\\ 0 & 1\end{bmatrix}.$$

Step 1 above solves

$$\begin{bmatrix}\Omega_{11} & \Omega_{12}\\ \Omega_{12} & \Omega_{22}\end{bmatrix} = \begin{bmatrix}F_{11} & 0\\ F_{21} & F_{22}\end{bmatrix}^{-1}\left(\begin{bmatrix}F_{11} & 0\\ F_{21} & F_{22}\end{bmatrix}^{-1}\right)' = \begin{bmatrix}\dfrac{1}{F_{11}^2} & -\dfrac{F_{21}}{F_{11}^2F_{22}}\\ -\dfrac{F_{21}}{F_{11}^2F_{22}} & \dfrac{F_{21}^2 + F_{11}^2}{F_{11}^2F_{22}^2}\end{bmatrix}$$

for the three unknowns $F_{11}$, $F_{21}$, and $F_{22}$ in terms of the known $\Omega_{11}$, $\Omega_{12}$, and $\Omega_{22}$. Note that the identifying restrictions are that $D = I$ (three restrictions) and $F_{12} = 0$ (one restriction). (This system is just four nonlinear equations in three unknowns; one of the equations for $\Omega_{12}$ is redundant. You do not need the Cholesky decomposition to solve it, since it could be solved with any numerical solver of non-linear equations, but why make life even more miserable?)

A practical consequence of this normalization is that the impulse response of shock $i$ equal to unity is exactly the same as the impulse response of shock $i$ equal to $\operatorname{Std}(u_{it})$ in the normalization in Section 3.7.2.

3.7.4 Other Identification Schemes∗

Reference: Bernanke (1986).

Not all economic models can be written in this recursive form. However, there are often cross-restrictions between different elements in $F$ or between elements in $F$ and $D$, or some other type of restrictions on $F$ which may allow us to identify the system.

Suppose we have (estimated) the parameters of the VAR (3.18), and that we want to
0 
impose D = Cov(u t ) = I . From (3.19) we then have (D = I )
   
1 0 0 1 0 0 1 0 0
 0 c s   0 c s  =  0 c2 + s 2 0  ,which is an identity matrix
 0      
 = F −1 F −1 . (3.22) 2 2
0 −s c 0 −s c 0 0 c +s
As before we need n(n − 1)/2 restrictions on F, but this time we don’t want to impose since cos2 θ + sin2 θ = 1. The transformation u = G 0 ε gives
the restriction that all elements in F above the principal diagonal are zero. Given these
u t = εt for t 6 = i, k
restrictions (whatever they are), we can solve for the remaining elements in B, typically
u i = εi c − εk s
with a numerical method for solving systems of non-linear equations.
u k = εi s + εk c.
3.7.5 What if the VAR Shocks are Uncorrelated ( = I )?∗
The effect of this transformation is to rotate the i th and k th vectors counterclockwise
Suppose we estimate a VAR and find that the covariance matrix of the estimated residuals through an angle of θ . (Try it in two dimensions.) There is an infinite number of such
is (almost) an identity matrix (or diagonal). Does this mean that the identification is transformations (apply a sequence of such transformations with different i and k, change
superfluous? No, not in general. Yes, if we also want to impose the restrictions that F is θ , etc.).
triangular.
There are many ways to reshuffle the shocks and still get orthogonal shocks. Recall Example 42 (Givens rotations and the F matrix.) We could take F in (3.23) to be (the
that the structural shocks are linear functions of the VAR shocks, u t = Fεt , and that we transpose) of any such sequence of givens rotations. For instance, if G 1 and G 2 are givens
0
assume that Cov (εt ) =  = I and we want Cov (u t ) = I , that, is from (3.19) we then rotations, then F = G 01 or F = G 2 or F = G 01 G 02 are all valid.
have (D = I )
F F 0 = I. (3.23) 3.7.6 Identification via Long-Run Restrictions, but No Cointegration∗

There are many such F matrices: the class of those matrices even have a name: orthogonal Suppose we have estimated a VAR system (3.1) for the first differences of some variables
matrices (all columns in F are orthonormal). However, there is only one lower triangular yt = 1xt , and that we have calculated the impulse response function as in (3.8), which
F which satisfies (3.23) (the one returned by a Cholesky decomposition, which is I ). we rewrite as
Suppose you know that F is lower triangular (and you intend to use this as the identi-
1xt = εt + C1 εt−1 + C2 εt−2 + ...
fying assumption), but that your estimated  is (almost, at least) diagonal. The logic then
= C (L) εt , with Cov(εt ) = . (3.24)
requires that F is not only lower triangular, but also diagonal. This means that u t = εt
(up to a scaling factor). Therefore, a finding that the VAR shocks are uncorrelated com-
bined with the identifying restriction that F is triangular implies that the structural and
reduced form shocks are proportional. We can draw no such conclusion if the identifying
assumption is something else than lower triangularity.

Example 41 (Rotation of vectors (“Givens rotations”).) Consider the transformation


of the vector ε into the vector u, u = G 0 ε, where G = In except that G ik = c,
G ik = s, G ki = −s, and G kk = c. If we let c = cos θ and s = sin θ for some an-
gle θ, then G 0 G = I . To see this, consider the simple example where i = 2 and k = 3

42 43
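Before turning to the level representation, the Cholesky route of Section 3.7.3 and the rotation argument of Section 3.7.5 can be checked numerically. The sketch below is illustrative only: the reduced-form estimates `Omega`, `mu` and `A1` are made-up numbers and the helper names are hypothetical. It uses NumPy's `numpy.linalg.cholesky`, which returns the lower triangular factor, to get $F^{-1}$ (Step 1), then applies Step 2. Finally it rotates the shocks with a Givens matrix as in Example 41. Since $\Omega$ is not the identity here, the alternative matrix is taken as $G'F$ rather than $G'$ alone (our extension of Example 42). It reproduces $\Omega$ just as well but is not triangular, which is why the triangularity restriction is what delivers identification.

```python
import numpy as np

# Hypothetical reduced-form estimates (made up for illustration)
Omega = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
mu = np.array([0.1, 0.2])
A1 = np.array([[0.5, 0.2],
               [0.1, -0.3]])


def cholesky_identification(Omega):
    """Step 1 (D = I): Omega = F^{-1} (F^{-1})', so F^{-1} is the Cholesky factor."""
    F_inv = np.linalg.cholesky(Omega)  # lower triangular with positive diagonal
    return np.linalg.inv(F_inv), F_inv


def givens(n, i, k, theta):
    """Givens rotation of Example 41 (0-based indices i and k)."""
    G = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    G[i, i], G[i, k], G[k, i], G[k, k] = c, s, -s, c
    return G


F, F_inv = cholesky_identification(Omega)
alpha = F @ mu   # Step 2: alpha = F mu
B1 = F @ A1      # Step 2: B_s = F A_s

# An observationally equivalent alternative: F_alt^{-1} = F^{-1} G, i.e. F_alt = G' F
G = givens(2, 0, 1, 0.3)
F_alt = G.T @ F
print(F)      # lower triangular
print(F_alt)  # not triangular, yet F_alt Omega F_alt' = I as well
```

Flipping the sign of a column of `F_inv` (Remark 38) gives yet another valid triangular choice. In practice such signs are fixed by an economic convention.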
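To find the MA of the level of $x_t$, we solve recursively

$$
\begin{aligned}
x_t &= C(L)\varepsilon_t + x_{t-1} \\
&= C(L)\varepsilon_t + C(L)\varepsilon_{t-1} + x_{t-2} \\
&\;\;\vdots \\
&= C(L)\left(\varepsilon_t + \varepsilon_{t-1} + \varepsilon_{t-2} + \dots\right) \\
&= \varepsilon_t + (C_1 + I)\varepsilon_{t-1} + (C_2 + C_1 + I)\varepsilon_{t-2} + \dots \\
&= C^{+}(L)\varepsilon_t, \quad \text{where } C^{+}_s = \sum_{j=0}^{s} C_j \text{ with } C_0 = I.
\end{aligned}
\tag{3.25}
$$

As before the structural shocks, $u_t$, are

$$u_t = F\varepsilon_t \quad \text{with } \operatorname{Cov}(u_t) = D.$$

The VMA in terms of the structural shocks is therefore

$$x_t = C^{+}(L)F^{-1}u_t, \quad \text{where } C^{+}_s = \sum_{j=0}^{s} C_j \text{ with } C_0 = I. \tag{3.26}$$

The $C^{+}(L)$ polynomial is known from the estimation, so we need to identify $F$ in order to use this equation for impulse response functions and variance decompositions with respect to the structural shocks.

As before we assume that $D = I$, so

$$\Omega = F^{-1}D\left(F^{-1}\right)' \tag{3.27}$$

(as in (3.19)) gives $n(n+1)/2$ restrictions.

We now add restrictions on the long-run impulse responses. From (3.26) we have

$$
\lim_{s\to\infty}\frac{\partial x_{t+s}}{\partial u_t'} = \lim_{s\to\infty} C^{+}_s F^{-1} = C(1)F^{-1},
\tag{3.28}
$$

where $C(1) = \sum_{j=0}^{\infty} C_j$. We impose $n(n-1)/2$ restrictions on these long-run responses. Together we have $n^2$ restrictions, which allows us to identify all elements in $F$.

In general, (3.27) and (3.28) form a set of non-linear equations which have to be solved for the elements in $F$. However, it is common to assume that the matrix of long-run responses in (3.28), $C(1)F^{-1}$, is lower triangular. We can then use the following “trick” to find $F$. Since $\varepsilon_t = F^{-1}u_t$

$$
\begin{aligned}
\mathrm{E}\, C(1)\varepsilon_t\varepsilon_t'C(1)' &= \mathrm{E}\, C(1)F^{-1}u_tu_t'\left(F^{-1}\right)'C(1)' \\
C(1)\Omega C(1)' &= C(1)F^{-1}\left(F^{-1}\right)'C(1)'.
\end{aligned}
\tag{3.29}
$$

We can therefore solve for a lower triangular matrix

$$\Lambda = C(1)F^{-1} \tag{3.30}$$

by calculating the Cholesky decomposition of the left hand side of (3.29) (which is available from the VAR estimate). Finally, we solve for $F^{-1}$ from (3.30).

Example 43 (The 2 × 1 case.) Suppose the structural form is

$$
\begin{bmatrix} F_{11} & F_{12} \\ F_{21} & F_{22} \end{bmatrix}
\begin{bmatrix} \Delta x_t \\ \Delta z_t \end{bmatrix}
=
\begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}
\begin{bmatrix} \Delta x_{t-1} \\ \Delta z_{t-1} \end{bmatrix}
+
\begin{bmatrix} u_{1t} \\ u_{2t} \end{bmatrix},
$$

and we have an estimate of the reduced form

$$
\begin{bmatrix} \Delta x_t \\ \Delta z_t \end{bmatrix}
= A \begin{bmatrix} \Delta x_{t-1} \\ \Delta z_{t-1} \end{bmatrix}
+ \begin{bmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{bmatrix},
\quad \text{with } \operatorname{Cov}\left(\begin{bmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{bmatrix}\right) = \Omega.
$$

The VMA form (as in (3.24)) is

$$
\begin{bmatrix} \Delta x_t \\ \Delta z_t \end{bmatrix}
= \begin{bmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{bmatrix}
+ A \begin{bmatrix} \varepsilon_{1t-1} \\ \varepsilon_{2t-1} \end{bmatrix}
+ A^2 \begin{bmatrix} \varepsilon_{1t-2} \\ \varepsilon_{2t-2} \end{bmatrix} + \dots
$$

and for the level (as in (3.25))

$$
\begin{bmatrix} x_t \\ z_t \end{bmatrix}
= \begin{bmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{bmatrix}
+ (A + I) \begin{bmatrix} \varepsilon_{1t-1} \\ \varepsilon_{2t-1} \end{bmatrix}
+ \left(A^2 + A + I\right) \begin{bmatrix} \varepsilon_{1t-2} \\ \varepsilon_{2t-2} \end{bmatrix} + \dots
$$

or, since $\varepsilon_t = F^{-1}u_t$,

$$
\begin{bmatrix} x_t \\ z_t \end{bmatrix}
= F^{-1}\begin{bmatrix} u_{1t} \\ u_{2t} \end{bmatrix}
+ (A + I)F^{-1} \begin{bmatrix} u_{1t-1} \\ u_{2t-1} \end{bmatrix}
+ \left(A^2 + A + I\right)F^{-1} \begin{bmatrix} u_{1t-2} \\ u_{2t-2} \end{bmatrix} + \dots
$$

There are 8 + 3 parameters in the structural form and 4 + 3 parameters in the VAR, so we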

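need four restrictions. Assume that $\operatorname{Cov}(u_t) = I$ (three restrictions) and that the long-run response of $x_t$ to $u_{2,t-s}$ is zero (the upper right element of $C(1)F^{-1}$), that is,

$$
\begin{aligned}
\begin{bmatrix} \text{unrestricted} & 0 \\ \text{unrestricted} & \text{unrestricted} \end{bmatrix}
&= \left(I + A + A^2 + \dots\right)
\begin{bmatrix} F_{11} & F_{12} \\ F_{21} & F_{22} \end{bmatrix}^{-1} \\
&= (I - A)^{-1}
\begin{bmatrix} F_{11} & F_{12} \\ F_{21} & F_{22} \end{bmatrix}^{-1} \\
&= \begin{bmatrix} 1 - A_{11} & -A_{12} \\ -A_{21} & 1 - A_{22} \end{bmatrix}^{-1}
\begin{bmatrix} F_{11} & F_{12} \\ F_{21} & F_{22} \end{bmatrix}^{-1}.
\end{aligned}
$$

The upper right element of the right hand side is

$$
\frac{-F_{12} + F_{12}A_{22} + A_{12}F_{11}}{\left(1 - A_{22} - A_{11} + A_{11}A_{22} - A_{12}A_{21}\right)\left(F_{11}F_{22} - F_{12}F_{21}\right)},
$$

which is one restriction on the elements in $F$. The other three are given by $F^{-1}\left(F^{-1}\right)' = \Omega$, that is,

$$
\begin{bmatrix}
\dfrac{F_{22}^2 + F_{12}^2}{(F_{11}F_{22} - F_{12}F_{21})^2} & -\dfrac{F_{22}F_{21} + F_{12}F_{11}}{(F_{11}F_{22} - F_{12}F_{21})^2} \\[2ex]
-\dfrac{F_{22}F_{21} + F_{12}F_{11}}{(F_{11}F_{22} - F_{12}F_{21})^2} & \dfrac{F_{21}^2 + F_{11}^2}{(F_{11}F_{22} - F_{12}F_{21})^2}
\end{bmatrix}
=
\begin{bmatrix} \Omega_{11} & \Omega_{12} \\ \Omega_{12} & \Omega_{22} \end{bmatrix}.
$$

3.8 Cointegration, Common Trends, and Identification via Long-Run Restrictions∗

These notes are a reading guide to Mellander, Vredin, and Warne (1992), which is well beyond the first year course in econometrics. See also Englund, Vredin, and Warne (1994). (I have not yet double checked this section.)

3.8.1 Common Trends Representation and Cointegration

The common trends representation of the $n$ variables in $y_t$ is

$$
y_t = y_0 + \Upsilon\tau_t + \Phi(L)\begin{bmatrix}\varphi_t \\ \psi_t\end{bmatrix}, \quad \text{with } \operatorname{Cov}\left(\begin{bmatrix}\varphi_t \\ \psi_t\end{bmatrix}\right) = I_n
\tag{3.31}
$$

$$\tau_t = \tau_{t-1} + \varphi_t, \tag{3.32}$$

where $\Phi(L)$ is a stable matrix polynomial in the lag operator. We see that the $k \times 1$ vector $\varphi_t$ has permanent effects on (at least some elements in) $y_t$, while the $r \times 1$ ($r = n - k$) vector $\psi_t$ does not.

The last component in (3.31) is stationary, but $\tau_t$ is a $k \times 1$ vector of random walks, so the $n \times k$ matrix $\Upsilon$ makes $y_t$ share the non-stationary components: there are $k$ common trends. If $k < n$, then we could find (at least) $r$ linear combinations of $y_t$, $\alpha'y_t$, where $\alpha'$ is an $r \times n$ matrix of cointegrating vectors, which are such that the trends cancel each other ($\alpha'\Upsilon = 0$).

Remark 44 (Lag operator.) We have the following rules: (i) $L^k x_t = x_{t-k}$; (ii) if $\Phi(L) = a + bL^{-m} + cL^{n}$, then $\Phi(L)(x_t + y_t) = a(x_t + y_t) + b(x_{t+m} + y_{t+m}) + c(x_{t-n} + y_{t-n})$ and $\Phi(1) = a + b + c$.

Example 45 (Söderlind and Vredin (1996)). Suppose we have

$$
y_t = \begin{bmatrix} \ln Y_t \text{ (output)} \\ \ln P_t \text{ (price level)} \\ \ln M_t \text{ (money stock)} \\ \ln R_t \text{ (gross interest rate)} \end{bmatrix}, \quad
\Upsilon = \begin{bmatrix} 0 & 1 \\ 1 & -1 \\ 1 & 0 \\ 0 & 0 \end{bmatrix}, \quad \text{and} \quad
\tau_t = \begin{bmatrix} \text{money supply trend} \\ \text{productivity trend} \end{bmatrix},
$$

then we see that $\ln R_t$ and $\ln Y_t + \ln P_t - \ln M_t$ (that is, log velocity) are stationary, so

$$\alpha' = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 1 & 1 & -1 & 0 \end{bmatrix}$$

are (or rather, span the space of) cointegrating vectors. We also see that $\alpha'\Upsilon = 0_{2\times 2}$.

3.8.2 VAR Representation

The VAR representation is as in (3.1). In practice, we often estimate the parameters in $A^*_s$, $\alpha$, the $n \times r$ matrix $\gamma$, and $\Omega = \operatorname{Cov}(\varepsilon_t)$ in the vector “error correction form”

$$
\Delta y_t = A^*_1\Delta y_{t-1} + \dots + A^*_{p-1}\Delta y_{t-p+1} + \gamma\alpha' y_{t-1} + \varepsilon_t, \quad \text{with } \operatorname{Cov}(\varepsilon_t) = \Omega.
\tag{3.33}
$$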
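Collecting terms in (3.33) shows how the error correction form maps into a levels VAR with $p$ lags: $A_1 = I + \gamma\alpha' + A^*_1$, $A_j = A^*_j - A^*_{j-1}$ for $1 < j < p$, and $A_p = -A^*_{p-1}$ (for $p = 1$ only $A_1 = I + \gamma\alpha'$ remains). The Python/NumPy sketch below implements that mapping. The function name `vecm_to_var` and the argument layout ($\alpha$ stored as an $n \times r$ array, so that $\alpha'y_{t-1}$ is $r \times 1$) are illustrative assumptions, not part of the notes.

```python
import numpy as np

def vecm_to_var(A_star, gamma, alpha):
    """Map the error correction form (3.33) into levels-VAR matrices A_1..A_p.

    A_star : list of the p-1 (n x n) matrices A*_1, ..., A*_{p-1}
    gamma  : (n x r) loading matrix
    alpha  : (n x r) matrix of cointegrating vectors (alpha' y is r x 1)
    """
    n = gamma.shape[0]
    Pi = gamma @ alpha.T                      # gamma alpha', the n x n "error correction" term
    p = len(A_star) + 1
    A = [np.zeros((n, n)) for _ in range(p)]
    A[0] = np.eye(n) + Pi                     # y_{t-1} enters via Delta y_t = y_t - y_{t-1}
    for i, Ai in enumerate(A_star):
        # A*_{i+1} Delta y_{t-i-1} = A*_{i+1} y_{t-i-1} - A*_{i+1} y_{t-i-2}
        A[i] += Ai
        A[i + 1] -= Ai
    return A
```

Since the $A^*_j$ cancel when the levels matrices are summed, $A_1 + \dots + A_p - I = \gamma\alpha'$, which has reduced rank $r$: this is the levels-VAR counterpart of the cointegration restriction.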

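This can easily be rewritten on the VAR form (3.1) or on the vector MA representation for $\Delta y_t$

$$\Delta y_t = \varepsilon_t + C_1\varepsilon_{t-1} + C_2\varepsilon_{t-2} + \dots \tag{3.34}$$

$$\phantom{\Delta y_t} = C(L)\varepsilon_t. \tag{3.35}$$

To find the MA of the level of $y_t$, we recurse on (3.35)

$$
\begin{aligned}
y_t &= C(L)\varepsilon_t + y_{t-1} \\
&= C(L)\varepsilon_t + C(L)\varepsilon_{t-1} + y_{t-2} \\
&\;\;\vdots \\
&= C(L)\left(\varepsilon_t + \varepsilon_{t-1} + \varepsilon_{t-2} + \dots + \varepsilon_0\right) + y_0.
\end{aligned}
\tag{3.36}
$$

We now try to write (3.36) in a form which resembles the common trends representation (3.31)-(3.32) as much as possible.

3.8.3 Multivariate Beveridge-Nelson decomposition

We want to split a vector of non-stationary series into some random walks and the rest (which is stationary). Rewrite (3.36) by adding and subtracting $C(1)(\varepsilon_t + \varepsilon_{t-1} + \dots)$

$$
y_t = C(1)\left(\varepsilon_t + \varepsilon_{t-1} + \dots + \varepsilon_0\right) + \left[C(L) - C(1)\right]\left(\varepsilon_t + \varepsilon_{t-1} + \dots + \varepsilon_0\right).
\tag{3.37}
$$

Suppose $\varepsilon_s = 0$ for $s < 0$ and consider the second term in (3.37). Since $C(1) = I + C_1 + C_2 + \dots$, it can be written

$$
\begin{aligned}
&\left[I + C_1L + C_2L^2 + \dots - C(1)\right]\left(\varepsilon_t + \varepsilon_{t-1} + \varepsilon_{t-2} + \dots + \varepsilon_0\right) \\
&= \left[-C_1 - C_2 - C_3 - \dots\right]\varepsilon_t + \left[-C_2 - C_3 - \dots\right]\varepsilon_{t-1} + \left[-C_3 - \dots\right]\varepsilon_{t-2} + \dots
\end{aligned}
\tag{3.38}
$$

Now define the random walks

$$\xi_t = \xi_{t-1} + \varepsilon_t = \varepsilon_t + \varepsilon_{t-1} + \varepsilon_{t-2} + \dots + \varepsilon_0. \tag{3.39}$$

Use (3.38) and (3.39) to rewrite (3.37) as

$$y_t = C(1)\xi_t + C^*(L)\varepsilon_t, \quad \text{where} \tag{3.40}$$

$$C^*_s = -\sum_{j=s+1}^{\infty} C_j. \tag{3.41}$$

3.8.4 Identification of the Common Trends Shocks

Rewrite (3.31)-(3.32) and (3.39)-(3.40) as

$$y_t = C(1)\sum_{s=0}^{t}\varepsilon_s + C^*(L)\varepsilon_t, \quad \text{with } \operatorname{Cov}(\varepsilon_t) = \Omega, \text{ and} \tag{3.42}$$

$$
y_t = \begin{bmatrix}\Upsilon & 0_{n\times r}\end{bmatrix}\begin{bmatrix}\sum_{s=0}^{t}\varphi_s \\ \psi_t\end{bmatrix} + \Phi(L)\begin{bmatrix}\varphi_t \\ \psi_t\end{bmatrix}, \quad \text{with } \operatorname{Cov}\left(\begin{bmatrix}\varphi_t \\ \psi_t\end{bmatrix}\right) = I_n.
\tag{3.43}
$$

Since both $\varepsilon_t$ and $\begin{bmatrix}\varphi_t' & \psi_t'\end{bmatrix}'$ are white noise, we notice that the response of $y_{t+s}$ to either must be the same, that is,

$$
\left[C(1) + C^*_s\right]\varepsilon_t = \left(\begin{bmatrix}\Upsilon & 0_{n\times r}\end{bmatrix} + \Phi_s\right)\begin{bmatrix}\varphi_t \\ \psi_t\end{bmatrix} \quad \text{for all } t \text{ and } s \ge 0.
\tag{3.44}
$$

This means that the VAR shocks are linear combinations of the structural shocks (as in the standard setup without cointegration)

$$\begin{bmatrix}\varphi_t \\ \psi_t\end{bmatrix} = F\varepsilon_t = \begin{bmatrix}F_k \\ F_r\end{bmatrix}\varepsilon_t. \tag{3.45}$$

Combining (3.44) and (3.45) gives that

$$C(1) + C^*_s = \Upsilon F_k + \Phi_s\begin{bmatrix}F_k \\ F_r\end{bmatrix} \tag{3.46}$$

must hold for all $s \ge 0$. In particular, it must hold for $s \to \infty$, where both $C^*_s$ and $\Phi_s$ vanish

$$C(1) = \Upsilon F_k. \tag{3.47}$$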
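The Beveridge-Nelson decomposition (3.40)-(3.41) can be checked numerically. The sketch below assumes, for illustration only, that $\Delta y_t$ follows a stable VAR(1), $\Delta y_t = A\Delta y_{t-1} + \varepsilon_t$, so that $C_s = A^s$, $C(1) = (I - A)^{-1}$ and $C^*_s = -\sum_{j>s}A^j = -A^{s+1}(I - A)^{-1}$; it also starts from $y_{-1} = 0$ and $\varepsilon_s = 0$ for $s < 0$. The function name `bn_check` is hypothetical. The same formula for $C^*_s$ shows why $C^*_s \to 0$ as $s \to \infty$ in (3.46)-(3.47): $A^{s+1}$ vanishes when $A$ is stable.

```python
import numpy as np

def bn_check(A, eps):
    """Simulate Delta y_t = A Delta y_{t-1} + eps_t from zero initial values and
    return the levels y_t together with the two Beveridge-Nelson components."""
    n = A.shape[0]
    T = eps.shape[0]
    I = np.eye(n)
    C1 = np.linalg.inv(I - A)                         # C(1) = I + A + A^2 + ... (A stable)
    # C*_s = -sum_{j>s} A^j = -A^{s+1} (I - A)^{-1}, as in (3.41)
    Cstar = [-np.linalg.matrix_power(A, s + 1) @ C1 for s in range(T)]

    dy = np.zeros((T, n))
    for t in range(T):
        dy[t] = eps[t] + (A @ dy[t - 1] if t > 0 else 0.0)
    y = np.cumsum(dy, axis=0)                         # levels, with y_{-1} = 0

    xi = np.cumsum(eps, axis=0)                       # random walks xi_t, as in (3.39)
    trend = xi @ C1.T                                 # row t holds C(1) xi_t
    cycle = np.array([sum(Cstar[m] @ eps[t - m] for m in range(t + 1))
                      for t in range(T)])             # row t holds C*(L) eps_t
    return y, trend, cycle
```

With any stable `A` and shock matrix `eps` of shape `(T, n)`, `y` should equal `trend + cycle` up to rounding, which is the content of (3.40).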

The identification therefore amounts to finding the $n^2$ coefficients in $F$, exactly as in the usual case without cointegration. Once that is done, we can calculate the impulse responses and variance decompositions with respect to the structural shocks by using $\varepsilon_t = F^{-1}\begin{bmatrix}\varphi_t' & \psi_t'\end{bmatrix}'$ in (3.42).² As before, assumptions about the covariance matrix of the structural shocks are not enough to achieve identification. In this case, we typically rely on information about long-run behavior (as opposed to short-run correlations) to supply the remaining restrictions.

• Step 1. From (3.31) we see that $\alpha'\Upsilon = 0_{r\times k}$ must hold for $\alpha'y_t$ to be stationary. Given an (estimate of) $\alpha$, this gives $rk$ equations from which we can identify $rk$ elements in $\Upsilon$. (It will soon be clear why it is useful to know $\Upsilon$.)

• Step 2. From (3.44) we have $\Upsilon\varphi_t = C(1)\varepsilon_t$ as $s \to \infty$. The variances of both sides must be equal

$$\mathrm{E}\,\Upsilon\varphi_t\varphi_t'\Upsilon' = \mathrm{E}\,C(1)\varepsilon_t\varepsilon_t'C(1)', \text{ or}$$

$$\Upsilon\Upsilon' = C(1)\Omega C(1)', \tag{3.48}$$

which gives $k(k+1)/2$ restrictions on $\Upsilon$ (the number of unique elements in the symmetric $\Upsilon\Upsilon'$). (However, each column of $\Upsilon$ is only identified up to a sign transformation: neither step 1 nor step 2 is affected by multiplying each element in column $j$ of $\Upsilon$ by $-1$.)

• Step 3. $\Upsilon$ has $nk$ elements, so we still need $nk - rk - k(k+1)/2 = k(k-1)/2$ further restrictions on $\Upsilon$ to identify all elements. They could be, for instance, that money supply shocks have no long-run effect on output (some $\Upsilon_{ij} = 0$). We now know $\Upsilon$.

• Step 4. Combining $\operatorname{Cov}\left(\begin{bmatrix}\varphi_t' & \psi_t'\end{bmatrix}'\right) = I_n$ with (3.45) gives

$$
\begin{bmatrix} I_k & 0 \\ 0 & I_r \end{bmatrix} = \begin{bmatrix} F_k \\ F_r \end{bmatrix}\Omega\begin{bmatrix} F_k \\ F_r \end{bmatrix}',
\tag{3.49}
$$

which gives $n(n+1)/2$ restrictions.

  – Step 4a. Premultiply (3.47) with $\Upsilon'$ and solve for $F_k$

$$F_k = \left(\Upsilon'\Upsilon\right)^{-1}\Upsilon'C(1). \tag{3.50}$$

  (This means that $\mathrm{E}\,\varphi_t\varphi_t' = F_k\Omega F_k' = \left(\Upsilon'\Upsilon\right)^{-1}\Upsilon'C(1)\Omega C(1)'\Upsilon\left(\Upsilon'\Upsilon\right)^{-1}$. From (3.48) we see that this indeed is $I_k$, as required by (3.49).) We still need to identify $F_r$.

  – Step 4b. From (3.49), $\mathrm{E}\,\varphi_t\psi_t' = 0_{k\times r}$, we get $F_k\Omega F_r' = 0_{k\times r}$, which gives $kr$ restrictions on the $rn$ elements in $F_r$. Similarly, from $\mathrm{E}\,\psi_t\psi_t' = I_r$, we get $F_r\Omega F_r' = I_r$, which gives $r(r+1)/2$ additional restrictions on $F_r$. We still need $r(r-1)/2$ restrictions. Exactly how they look does not matter for the impulse response function of $\varphi_t$ (as long as $\mathrm{E}\,\varphi_t\psi_t' = 0$). Note that restrictions on $F_r$ are restrictions on $\partial y_t/\partial\psi_t'$, that is, on the contemporaneous response. This is exactly as in the standard case without cointegration.

A summary of identifying assumptions used by different authors is found in Englund, Vredin, and Warne (1994).

² Equivalently, we can use (3.47) and (3.46) to calculate $\Upsilon$ and $\Phi_s$ (for all $s$) and then calculate the impulse response function from (3.43).

Bibliography

Bernanke, B., 1986, “Alternative Explanations of the Money-Income Correlation,” Carnegie-Rochester Series on Public Policy, 25, 49–100.

Englund, P., A. Vredin, and A. Warne, 1994, “Macroeconomic Shocks in an Open Economy - A Common Trends Representation of Swedish Data 1871-1990,” in Villy Bergström, and Anders Vredin (ed.), Measuring and Interpreting Business Cycles, pp. 125–233, Clarendon Press.

Greene, W. H., 2000, Econometric Analysis, Prentice-Hall, Upper Saddle River, New Jersey, 4th edn.

Greene, W. H., 2003, Econometric Analysis, Prentice-Hall, Upper Saddle River, New Jersey, 5th edn.

Hamilton, J. D., 1994, Time Series Analysis, Princeton University Press, Princeton.

King, R. G., 1986, “Money and Business Cycles: Comments on Bernanke and Related Literature,” Carnegie-Rochester Series on Public Policy, 25, 101–116.

Mellander, E., A. Vredin, and A. Warne, 1992, “Stochastic Trends and Economic Fluctuations in a Small Open Economy,” Journal of Applied Econometrics, 7, 369–394.

Pindyck, R. S., and D. L. Rubinfeld, 1998, Econometric Models and Economic Forecasts, Irwin McGraw-Hill, Boston, Massachusetts, 4th edn.

Sims, C. A., 1980, “Macroeconomics and Reality,” Econometrica, 48, 1–48.

Söderlind, P., and A. Vredin, 1996, “Applied Cointegration Analysis in the Mirror of Macroeconomic Theory,” Journal of Applied Econometrics, 11, 363–382.

Verbeek, M., 2004, A Guide to Modern Econometrics, Wiley, Chichester, 2nd edn.

4 Monetary Policy in VAR Systems

4.1 VAR System, Structural Form, and Impulse Response Function

Reference: Walsh (2003) 1.3; Favero (2001) 6

Let $y_t$ be an $n \times 1$ vector of macro variables, including the policy instrument (usually a short interest rate or a narrow money aggregate). The VAR system, that is, the reduced form, is

$$y_t = \mu + A_1y_{t-1} + \dots + A_py_{t-p} + \varepsilon_t, \quad \varepsilon_t \text{ is white noise, } \operatorname{Cov}(\varepsilon_t) = \Omega. \tag{4.1}$$

The underlying structural form is assumed to be

$$Fy_t = \alpha + B_1y_{t-1} + \dots + B_py_{t-p} + u_t, \quad u_t \text{ is white noise, } \operatorname{Cov}(u_t) = D. \tag{4.2}$$

We are, in most cases, interested in understanding the effect of the structural shocks, $u_t$. This essentially requires an estimate of the structural form, but that can be achieved by imposing identifying restrictions on the VAR. As an example, the impulse response function of the VAR in (4.1) is

$$y_t = \varepsilon_t + C_1\varepsilon_{t-1} + C_2\varepsilon_{t-2} + \dots \tag{4.3}$$

By comparing (4.1) and (4.2) we see that $\varepsilon_t = F^{-1}u_t$ (or $u_t = F\varepsilon_t$). We can then rewrite the impulse response function (4.3) in terms of the structural shocks

$$y_t = F^{-1}u_t + C_1F^{-1}u_{t-1} + C_2F^{-1}u_{t-2} + \dots \tag{4.4}$$

A VAR estimation gives us $C_i$, $i = 1, 2, \dots$, but not $F$, so we need to impose restrictions in order to identify the impulse responses to structural shocks.

Remark 46 The easiest way to calculate this representation is by first finding $F^{-1}$ (see below), then using $\varepsilon_t = F^{-1}u_t$ to write (4.1) as

$$y_t = \mu + A_1y_{t-1} + \dots + A_py_{t-p} + F^{-1}u_t. \tag{4.5}$$
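A minimal sketch of (4.3)-(4.4): for a VAR($p$) the MA matrices satisfy $C_0 = I$ and $C_s = A_1C_{s-1} + \dots + A_pC_{s-p}$ (with $C_j = 0$ for $j < 0$), and the responses to the structural shocks are the matrices $C_sF^{-1}$. The sketch assumes $F^{-1}$ has already been identified and simply passes it in; `structural_irf` is a hypothetical helper name, not a library function.

```python
import numpy as np

def structural_irf(A_list, F_inv, horizon):
    """Return [C_0 F^{-1}, C_1 F^{-1}, ..., C_h F^{-1}] for the VAR (4.1).

    A_list : [A_1, ..., A_p], each n x n
    F_inv  : n x n matrix with eps_t = F^{-1} u_t (assumed already identified)
    """
    n = F_inv.shape[0]
    p = len(A_list)
    C = [np.eye(n)]                                   # C_0 = I
    for s in range(1, horizon + 1):
        # MA recursion implied by the VAR: C_s = A_1 C_{s-1} + ... + A_p C_{s-p}
        C_s = sum(A_list[i] @ C[s - 1 - i] for i in range(min(s, p)))
        C.append(C_s)
    # column j of C_s F^{-1} is the response of y_{t+s} to a unit shock in u_jt, as in (4.4)
    return [C_s @ F_inv for C_s in C]
```

Iterating forward on (4.5), with the lags at their long-run average and only one element of $u_t$ switched on in period $t$, should reproduce the corresponding columns of these matrices, since deviations from the mean follow the same recursion.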

To calculate the impulse responses to the first element in $u_t$, set $y_{t-1}, \dots, y_{t-p}$ equal to the long-run average, $(I - A_1 - \dots - A_p)^{-1}\mu$, make the first element in $u_t$ unity and all other elements zero. Calculate the response by iterating forward on (4.5), but setting all elements in $u_{t+1}, u_{t+2}, \dots$ to zero. This procedure can be repeated for the other elements of $u_t$.

To see the mapping between the reduced form and the structural form, premultiply (4.2) by $F^{-1}$. This shows that the relation between the VAR parameters and the structural parameters is

$$
\begin{aligned}
\Omega &= F^{-1}D\left(F^{-1}\right)' \\
A_s &= F^{-1}B_s \quad \text{for } s = 1, \dots, p.
\end{aligned}
\tag{4.6}
$$

In the VAR, there are $pn^2$ elements in $A_1, \dots, A_p$ and $n(n+1)/2$ (unique) elements in $\Omega$. In the structural form, there are $(1+p)n^2$ elements in $F, B_1, \dots, B_p$ and $n(n+1)/2$ (unique) elements in $D$. We therefore have to impose at least $n^2$ (non-trivial) restrictions on the structural form in order to back out the structural form parameters from the reduced form.

4.2 Fully Recursive Structural Form

4.2.1 Identification

Remark 47 (Cholesky decomposition) Let $\Omega$ be an $n \times n$ symmetric positive definite matrix. The Cholesky decomposition gives the unique lower triangular $P$ with positive diagonal elements such that $\Omega = PP'$ (some software returns an upper triangular matrix, that is, $Q$ in $\Omega = Q'Q$ instead).

Note that if the sign requirement on the diagonal is dropped, each column of $P$ is only identified up to a sign transformation: reversing the sign of a column leaves $PP'$ unchanged.

Remark 48 (Changing sign of column and inverting.) Suppose the square matrix $A_2$ is the same as $A_1$ except that the $i$th and $j$th columns have the reverse signs. Then $A_2^{-1}$ is the same as $A_1^{-1}$ except that the $i$th and $j$th rows have the reverse sign.

The most common set of restrictions is to assume that $F$ is lower triangular and that $D = I$, which gives exact identification. The Cholesky decomposition is useful in this case.

A Cholesky decomposition of the covariance matrix of the VAR residuals, $\Omega$, gives a lower triangular matrix, which by (4.6) can be taken to represent $F^{-1}$, since a lower triangular $F$ (as assumed) implies a lower triangular $F^{-1}$, and $D = I$. Note, however, that the signs of each column of $F^{-1}$ are arbitrary. Therefore, we have

$$\operatorname{chol}(\Omega) = F^{-1}, \tag{4.7}$$

up to a sign transformation of each column of $F^{-1}$, which implies a sign transformation of each row of $F$. With $F$ identified, $B_1, \dots, B_p$ can be calculated from (4.6).

Expression (4.2) with a lower triangular $F$ and $D = I$ is, in fact, a fully recursive system of simultaneous equations (Greene (2003) 15.6). Using (4.6) and (4.7) is just a way to recover the fully recursive system from the VAR.¹

¹ We would asymptotically get the same structural parameters by equation-by-equation LS of (4.2) (LS is FIML in this case, assuming normally distributed shocks, since the structural shocks are assumed to be uncorrelated). The reason why the two estimates are not identical in small samples is that the VAR approach imposes that also the small sample estimate of $D$ is an identity matrix, while the equation-by-equation LS does not.

4.2.2 Monetary Policy

We now consider monetary policy in a fully recursive structural model. Partition the vector of endogenous variables, $y_t$, into the (scalar) policy instrument, $s_t$, the variables which come before $s_t$, $x_{1t}$, and those which come after $s_t$, $x_{2t}$,

$$y_t = \begin{bmatrix} x_{1t} \\ s_t \\ x_{2t} \end{bmatrix}. \tag{4.8}$$

Rewrite (4.2) as (assuming $\alpha = 0$)

$$
\begin{aligned}
\begin{bmatrix} F^{11} & 0 & 0 \\ F^{21} & F^{22} & 0 \\ F^{31} & F^{32} & F^{33} \end{bmatrix}
\begin{bmatrix} x_{1t} \\ s_t \\ x_{2t} \end{bmatrix}
&=
\begin{bmatrix} B_1^{11} & B_1^{12} & B_1^{13} \\ B_1^{21} & B_1^{22} & B_1^{23} \\ B_1^{31} & B_1^{32} & B_1^{33} \end{bmatrix}
\begin{bmatrix} x_{1t-1} \\ s_{t-1} \\ x_{2t-1} \end{bmatrix}
+ \dots \\
&\quad +
\begin{bmatrix} B_p^{11} & B_p^{12} & B_p^{13} \\ B_p^{21} & B_p^{22} & B_p^{23} \\ B_p^{31} & B_p^{32} & B_p^{33} \end{bmatrix}
\begin{bmatrix} x_{1t-p} \\ s_{t-p} \\ x_{2t-p} \end{bmatrix}
+
\begin{bmatrix} u_{1t} \\ u_{st} \\ u_{2t} \end{bmatrix},
\end{aligned}
\tag{4.9}
$$

where $F^{22}$ is a scalar, and $F^{11}$ and $F^{33}$ are lower-triangular matrices (not necessarily with diagonal elements equal to unity). The covariance matrix of the shocks is the identity matrix. This model has $D = I$ and a lower triangular $F$.

The equation for $s_t$ in (4.9) is

$$
F^{22}s_t = -F^{21}x_{1t}
+ \begin{bmatrix} B_1^{21} & B_1^{22} & B_1^{23} \end{bmatrix}\begin{bmatrix} x_{1t-1} \\ s_{t-1} \\ x_{2t-1} \end{bmatrix}
+ \dots
+ \begin{bmatrix} B_p^{21} & B_p^{22} & B_p^{23} \end{bmatrix}\begin{bmatrix} x_{1t-p} \\ s_{t-p} \\ x_{2t-p} \end{bmatrix}
+ u_{st}.
\tag{4.10}
$$

If we divide by the scalar $F^{22}$, then we get a traditional reaction function. Policy in $t$ is determined by (i) a rule which depends on the contemporaneous $x_{1t}$ (but not $x_{2t}$); (ii) all lagged variables; and (iii) a monetary policy shock, $u_{st}$.²

² Note also that since $\operatorname{Std}(u_{st}) = 1$, $\operatorname{Std}(u_{st}/F^{22}) = 1/\left|F^{22}\right|$. This clarifies the relation to the traditional normalization in systems of simultaneous equations (diagonal elements of $F$ equal to unity and $D$ diagonal but not restricted to be an identity matrix); the absolute values of the diagonal elements in $F$ here correspond to the inverses of the standard deviations of the shocks in the traditional normalization.

Suppose $s_t$ is the $j$th element in $y_t$. The impulse response with respect to the monetary policy shock is then found from the $j$th columns of the matrices in (4.4), that is, the $j$th columns of $F^{-1}, C_1F^{-1}, C_2F^{-1}, \dots$ Since $F^{-1}$ is lower triangular, a policy shock in period $t$, $u_{st}$, has a contemporaneous effect on $x_{2t}$, but not on $x_{1t}$.

4.2.3 Importance of the Ordering of the VAR

Suppose our objective is to analyze the effects of monetary policy shocks on the other variables in the VAR system, for instance, output and prices. The identification rests on the ordering of the VAR, that is, on the structure of the contemporaneous correlations as captured by $F$. It is therefore important to understand how the results on the monetary policy shock change if the variables are reordered. We have the following result (see Christiano, Eichenbaum, and Evans (1999)):

1. The partitioning of $y_t$ into variables which come before, $x_{1t}$, and after, $x_{2t}$, the policy instrument is important for $u_{st}$ and the impulse response function of all variables with respect to $u_{st}$.
2. The order within $x_{1t}$ and $x_{2t}$ does not matter for $u_{st}$ or the impulse response function of any variable with respect to $u_{st}$.

This suggests that we can settle for partial identification in the sense that we must take a stand on which variables come before and after the policy instrument, but the ordering within those blocks is unimportant for understanding the effects of monetary policy shocks.

The typical identifying assumption in much of Sims’ work (see, for instance, Sims (1980)) is that the monetary policy variable is unaffected by contemporaneous innovations in the other variables, that is, it is put “first” in the VAR. In later work, by Sims and others, monetary policy is instead put last (so monetary policy is potentially affected by, but does not affect, contemporaneous macro variables).

4.2.4 On Variance Decompositions

It is sometimes found in VAR studies that policy surprises explain only a small part of the variance of $y_t$ (a typical result for US studies for the period after 1982, see, for instance, Leeper, Sims, and Zha (1996)). Two comments are warranted (see also Bernanke (1996)). First, this does not mean that monetary policy in general has been unimportant. For instance, it could be the case that anticipated monetary policy, or more generally, the systematic monetary policy, decreases the variance of output and inflation. Second, the variance decomposition does not tell us about the potential effects of monetary policy surprises (the impulse response function does, however), only about the combination of the potential effect with the actual monetary policy shocks for that particular sample.

4.3 Some Controversies

4.3.1 The “Price Puzzle”

The price puzzle is that in a VAR of output, prices, money, the interest rate and perhaps some more variables, contractionary shocks to monetary policy lead to persistent price increases! This seems to hold not just in the US, but also in several other countries, and is more pronounced if the policy instrument is taken to be a short interest rate rather than a money aggregate. It is often not statistically significant, but is so common that it signals that the VAR might be misspecified.

Sims (1992) discusses how this could be due to a missing element in the reaction function of the central bank. Commodity prices may signal inflation expectations, so the

central bank may react now by raising interest rates, which makes inflation somewhat lower than it would otherwise have been (but still positive). If commodity prices are excluded from the VAR, this may appear as monetary policy shocks having a positive effect on inflation.

This explanation has been challenged on several accounts, for instance, by Hanson (2004) and Giordani (2004).

Figure 4.1: Impulse responses to identified structural shocks. Panels: IR wrt output shock, IR wrt price shock, IR wrt policy shock, each showing the responses of Δy, π and i at horizons 0 to 12. VAR(4) of US output growth, inflation and 3-month T-bill rate; recursive identification; quarterly data for 1959–2003.

4.3.2 Choice of Policy Instrument: $i_t$ or $m_t$?

Sims (1980) showed that the fraction of the forecasting error variance in US output that can be attributed to money stock innovations is much lower when an interest rate is added to a VAR of money, price, and output. (The typical identifying assumption in much of Sims’ work is that the monetary policy variable is unaffected by contemporaneous innovations in the other variables, that is, it is put “first” in the VAR.)

McCallum (1983) argues that the policy instrument of the Fed is a short interest rate and that the correct measure of the monetary policy shocks is the residual in a reaction function for this interest rate

$$\text{short interest rate} = f(\text{lagged macro data}) + \text{policy shock}. \tag{4.11}$$

The crucial assumption is that the policy instrument does not depend on contemporaneous macro data. In contrast, the money stock does, since money demand is probably affected by shocks to income and prices (as well as the policy shock to the interest rate). The innovation in the money stock, $m_t - \mathrm{E}_{t-1}m_t$, is therefore a mixture of the policy shock and other shocks.

Bernanke and Blinder (1992) argue for using the federal funds rate as the policy instrument. (The federal funds rate is the market interest rate for overnight US dollar loans. These are usually loans of reserves between banks, called “federal funds loans” since they have typically been used to meet the reserve requirements. It is not directly controlled by the Fed; only the discount rate is.) First, a policy instrument, if effective, should be able to predict macroeconomic variables. They find that the federal funds rate produces better forecasts of output, employment, and consumption than M1, M2, T-bill rates, or long bonds. Second, they notice that the federal funds rate was raised at all cyclical peaks (NBER) and at most of the “Romer dates.” Estimates of reaction functions like (4.11) produce reasonable responses to inflation and unemployment shocks. Third, the estimated supply curve of non-borrowed reserves is extremely elastic at the target funds rate between FOMC meetings. This suggests that the federal funds rate is predetermined within the month (and presumably set by policy), and not driven by the demand for reserves, which changes continually as the economy is hit by shocks.

Sims (1992) argues that expansionary shocks to monetary policy should drive output up and lead to opposite movements in the money stock and interest rates. Eichenbaum (1992) comments that this makes Sims choose the interest rate rather than M1 as the policy instrument, since positive shocks to M1 lead (in a typical VAR of the US economy) to an increase in the federal funds rate and a decline in output! He also notes that M0, but not non-borrowed reserves, has the same property. Since the former has a less pronounced price puzzle (see Section 4.3.1), Eichenbaum (1992) argues that it is a better measure of monetary policy than the federal funds rate.

In short (and for the US), the policy instrument is now usually taken to be the federal

funds rate, or in some cases, some narrow money aggregate.

4.4 Summary of Some Important Results from VAR Studies of Monetary Policy∗

4.4.1 “An Alternative Explanation of the Price Puzzle” by Giordani

Giordani (2004) suggests that the price puzzle may be an artifact of using detrended output instead of the output gap, which will cause the identified monetary policy shocks to be “contaminated” by other shocks.

The model used by Giordani (2004) is a simple Keynesian model (without any forward looking elements). The IS equation for the output gap, $y^g_t$, is

$$y^g_{t+1} = \beta_y y^g_t - \beta_r\left(i_t - \pi_t\right) + \varepsilon^{AD}_{t+1}, \tag{4.12}$$

where $i_t$ is the interest rate and $\pi_t$ inflation. The output gap is just the difference between (log) actual output, $y_t$, and potential output, $y^N_t$,

$$y^g_t = y_t - y^N_t, \tag{4.13}$$

where potential output is assumed to be an exogenous AR(1)

$$y^N_{t+1} = \rho y^N_t + \varepsilon^N_{t+1}, \tag{4.14}$$

where $\varepsilon^N_{t+1}$ are the technology shocks that play an important role in the analysis. The Phillips curve for inflation is

$$\pi_{t+1} = \pi_t + \alpha_y y^g_t + \varepsilon^{CP}_{t+1}, \tag{4.15}$$

($\varepsilon^{CP}$ for cost push shock) and the policy rule is

$$i_{t+1} = \gamma_\pi\pi_{t+1} + \gamma_y y^g_{t+1} + \varepsilon^{MP}_{t+1}. \tag{4.16}$$

All shocks are assumed to be uncorrelated. Notice that the paper uses $\varepsilon$ to denote the structural shocks (in contrast to the notation in (4.1) and (4.2)). One important feature of the model is that an AD shock ($\varepsilon^{AD}_{t+1}$) increases the output gap, which will drive up inflation in the next period.

A structural form of $(y^g_t, \pi_t, i_t)$ can therefore be written

$$
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -\gamma_y & -\gamma_\pi & 1 \end{bmatrix}
\begin{bmatrix} y^g_{t+1} \\ \pi_{t+1} \\ i_{t+1} \end{bmatrix}
=
\begin{bmatrix} \beta_y & \beta_r & -\beta_r \\ \alpha_y & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} y^g_t \\ \pi_t \\ i_t \end{bmatrix}
+
\begin{bmatrix} \varepsilon^{AD}_{t+1} \\ \varepsilon^{CP}_{t+1} \\ \varepsilon^{MP}_{t+1} \end{bmatrix}.
\tag{4.17}
$$

Notice that this is a recursive VAR with uncorrelated errors, so the standard triangular identification would work fine. There would be no price puzzle. It is clear that the policy shock has no effect on current inflation, and it is straightforward to show that the effect on future inflation is negative.

However, if we instead choose to work with output (rather than the output gap), then we should use (4.13) to substitute for $y^g_t$ in (4.17). We then (a) use (4.14) to substitute $\rho y^N_t + \varepsilon^N_{t+1}$ for $y^N_{t+1}$, and (b) use (4.16) to get $y^g_t = (i_t - \gamma_\pi\pi_t)/\gamma_y$, which combined with (4.13) gives $y^N_t = y_t + \pi_t\gamma_\pi/\gamma_y - i_t/\gamma_y$. To simplify we also assume that there is no monetary policy shock ($\varepsilon^{MP}_t = 0$ for all $t$). The result is then

$$
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -\gamma_y & -\gamma_\pi & 1 \end{bmatrix}
\begin{bmatrix} y_{t+1} \\ \pi_{t+1} \\ i_{t+1} \end{bmatrix}
= H
\begin{bmatrix} y_t \\ \pi_t \\ i_t \end{bmatrix}
+
\begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ -\gamma_y & 0 & 0 \end{bmatrix}
\begin{bmatrix} \varepsilon^N_{t+1} \\ \varepsilon^{AD}_{t+1} \\ \varepsilon^{CP}_{t+1} \end{bmatrix},
\tag{4.18}
$$

where $H$ is a matrix (whose details are not particularly important here). Notice that while this system also appears to be recursive, the errors are no longer uncorrelated.

To analyse this model further, consider premultiplying both sides of (4.18) by the inverse of the leading matrix on the left hand side (corresponding to the $F$ matrix in (4.2)). The “VAR model” (reduced form) then becomes (where $A$ is yet another matrix of little concern here)

$$
\begin{bmatrix} y_{t+1} \\ \pi_{t+1} \\ i_{t+1} \end{bmatrix}
= A
\begin{bmatrix} y_t \\ \pi_t \\ i_t \end{bmatrix}
+
\underbrace{
\begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & \gamma_y & \gamma_\pi \end{bmatrix}
\begin{bmatrix} \varepsilon^N_{t+1} \\ \varepsilon^{AD}_{t+1} \\ \varepsilon^{CP}_{t+1} \end{bmatrix}
}_{\text{VAR shocks}_{t+1}},
\quad \text{with}
\tag{4.19}
$$

$$
\operatorname{Cov}\begin{bmatrix} \varepsilon^N_{t+1} \\ \varepsilon^{AD}_{t+1} \\ \varepsilon^{CP}_{t+1} \end{bmatrix}
=
\begin{bmatrix} \sigma_N^2 & 0 & 0 \\ 0 & \sigma_{AD}^2 & 0 \\ 0 & 0 & \sigma_{CP}^2 \end{bmatrix}.
\tag{4.20}
$$
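The mechanics of Giordani’s argument can be traced numerically. The sketch below takes illustrative parameter values (arbitrary assumptions, not estimates from Giordani (2004)), forms the covariance of the VAR shocks implied by (4.19)-(4.20), sets $F^{-1}$ to its Cholesky factor as in the recursive identification (4.7), and computes the matrix that maps the true shocks $(\varepsilon^N, \varepsilon^{AD}, \varepsilon^{CP})$ into the “identified” shocks $(u^{AD*}, u^{CP*}, u^{MP*})$. Its rows can be compared with the closed-form expressions derived in (4.21)-(4.24).

```python
import numpy as np

# Illustrative (assumed) parameter values
gamma_y, gamma_pi = 0.5, 1.5
sigma_N, sigma_AD, sigma_CP = 1.0, 0.8, 0.6

# VAR shocks = M [eps_N, eps_AD, eps_CP]', the loading matrix in (4.19)
M = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, gamma_y, gamma_pi]])
Sigma = np.diag([sigma_N**2, sigma_AD**2, sigma_CP**2])   # (4.20)

Omega = M @ Sigma @ M.T                  # covariance of the VAR shocks
F_inv = np.linalg.cholesky(Omega)        # recursive identification: F^{-1} = chol(Omega)

# identified shocks u* = F (VAR shocks) = F M eps; solve F^{-1} X = M for X = F M
mapping = np.linalg.solve(F_inv, M)
print(np.round(mapping, 3))              # row 3: loadings of u^{MP*} on (eps_N, eps_AD, eps_CP)
```

The third row loads negatively on the supply shock and positively on the AD shock, and has a zero loading on the cost push shock, even though the model contains no monetary policy shock at all.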

This means that the covariance matrix of the VAR, denoted $\Omega$ as in (4.1), is

$$
\Omega =
\begin{bmatrix}
\sigma_N^2 + \sigma_{AD}^2 & 0 & \gamma_y\sigma_{AD}^2 \\
0 & \sigma_{CP}^2 & \gamma_\pi\sigma_{CP}^2 \\
\gamma_y\sigma_{AD}^2 & \gamma_\pi\sigma_{CP}^2 & \gamma_y^2\sigma_{AD}^2 + \gamma_\pi^2\sigma_{CP}^2
\end{bmatrix}.
\tag{4.21}
$$

The identification in (4.7) amounts to setting $F^{-1} = \operatorname{chol}(\Omega)$ and using this in the VMA (4.4) to calculate the impulse responses with respect to the (perceived) structural shocks ($u_t$ in (4.4)). Notice that the latter here are the “identified” structural shocks to AD, inflation, and monetary policy ($u^{AD*}$, $u^{CP*}$, and $u^{MP*}$, where a star indicates the “identified” structural shocks with an identity covariance matrix). The ordering of these shocks is (by definition) the same as the ordering of the VAR ($y$, $\pi$, and $i$).

We can then write the $F^{-1}u_t$ term in (4.4) as

$$
F^{-1}u_t =
\begin{bmatrix}
\sqrt{\sigma_N^2 + \sigma_{AD}^2} & 0 & 0 \\
0 & \sigma_{CP} & 0 \\
\dfrac{\gamma_y\sigma_{AD}^2}{\sqrt{\sigma_N^2 + \sigma_{AD}^2}} & \gamma_\pi\sigma_{CP} & \dfrac{\gamma_y\sigma_{AD}\sigma_N}{\sqrt{\sigma_N^2 + \sigma_{AD}^2}}
\end{bmatrix}
\begin{bmatrix} u^{AD*}_t \\ u^{CP*}_t \\ u^{MP*}_t \end{bmatrix},
\tag{4.22}
$$

where the matrix is the Cholesky decomposition of $\Omega$ in (4.21) (this calculation takes a bit of algebra). This representation of the VAR residuals must of course coincide with the VAR residuals (the last term, including the matrix) in (4.19). Equating, and solving for the “identified” shocks gives

$$F^{-1}u_t = \text{VAR shocks}_t \text{ (from (4.19))} \tag{4.23}$$

$$
\begin{bmatrix} u^{AD*}_t \\ u^{CP*}_t \\ u^{MP*}_t \end{bmatrix}
=
\begin{bmatrix}
\dfrac{1}{\sqrt{\sigma_N^2 + \sigma_{AD}^2}} & \dfrac{1}{\sqrt{\sigma_N^2 + \sigma_{AD}^2}} & 0 \\
0 & 0 & \dfrac{1}{\sigma_{CP}} \\
-\dfrac{\sigma_{AD}/\sigma_N}{\sqrt{\sigma_N^2 + \sigma_{AD}^2}} & \dfrac{\sigma_N/\sigma_{AD}}{\sqrt{\sigma_N^2 + \sigma_{AD}^2}} & 0
\end{bmatrix}
\begin{bmatrix} \varepsilon^N_t \\ \varepsilon^{AD}_t \\ \varepsilon^{CP}_t \end{bmatrix}.
\tag{4.24}
$$

This shows that the identified AD shock is a (scaled) sum of the supply and AD shocks, that the identified cost push shock is correct, and that the identified monetary policy shock depends negatively on the supply shock and positively on the aggregate demand shock. We already know that inflation does not react to the supply shock (see (4.15)), but it reacts (with a lag) positively to an AD shock (since the AD shock drives up the output gap). In the data it will therefore seem as if the identified monetary policy shock, $u^{MP*}_t$, pushes up inflation: a price puzzle.

Remark 49 If there are no supply shocks, $\sigma_N^2 = 0$, then (4.24) becomes

$$
\begin{bmatrix} u^{AD*}_t \\ u^{CP*}_t \\ u^{MP*}_t \end{bmatrix}
=
\begin{bmatrix} 1/\sigma_{AD} & 0 \\ 0 & 1/\sigma_{CP} \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} \varepsilon^{AD}_t \\ \varepsilon^{CP}_t \end{bmatrix},
\tag{4.25}
$$

which says that both the AD and cost push shocks are correctly identified and that there are no monetary policy shocks (as we assumed by setting $\varepsilon^{MP}_t = 0$).

Remark 50 To be directly comparable with the analysis in Giordani (2004), (4.22) and (4.24) need to be renormalized so that the $F$ matrix has ones along the main diagonal; the “identified” structural shocks then instead get variances that differ from unity. In (4.22) this can be achieved by postmultiplying the current $F^{-1}$ matrix by a diagonal matrix whose diagonal elements equal the reciprocals of the diagonal elements of the current $F^{-1}$, and premultiplying the shock vector by the inverse of that diagonal matrix (so that the product $F^{-1}u_t$ is unchanged). Consequently, the standard deviations of the identified shocks are then equal to the diagonal elements of the current $F^{-1}$.

4.4.2 “The ‘Price Puzzle’ Reconsidered” by Hanson

Hanson (2004) compares several commodity price indices and other indicators and finds very little correlation between the ability to forecast inflation and the ability to resolve the price puzzle. Moreover, the price puzzle (in US data) seems to be limited to the 1955–1979 sample, and commodity prices cannot solve the puzzle for this sample period.

The idea (in Sims (1992)) that there is a missing element in the reaction function of the central bank can be formalized as follows. Suppose the policy rule is

$$i_t = \beta\,\mathrm{E}_t\pi_{t+1} + g(x_t) + u^{MP}_t, \tag{4.26}$$

where $\mathrm{E}_t\pi_{t+1}$ is the inflation expectation, $g(x_t)$ the effect of other variables in the policy rule, and $u^{MP}_t$ the policy shock. (The notation differs from Hanson (2004).)

Suppose expected inflation is some function of the variables included in the VAR (denoted $\Omega_t$) and some other variables (denoted $Z_t$). It can then be represented as

$$\mathrm{E}_t\pi_{t+1} = \pi^m_t(\Omega_t) + \pi^z_t(Z_t, \Omega_t), \tag{4.27}$$

where $\pi^m_t(\Omega_t)$ is the inflation expectation in the VAR model and $\pi^z_t(Z_t, \Omega_t)$ is the rest. Combining the equations gives

$$
i_t = \beta\pi^m_t(\Omega_t) + g(x_t) + \underbrace{\beta\pi^z_t(Z_t, \Omega_t) + u^{MP}_t}_{\text{VAR residual}},
\tag{4.28}
$$

where the last two terms correspond to the VAR residuals. If $\pi^z_t(Z_t, \Omega_t)$ dominates $u^{MP}_t$, then it will look like the “monetary policy shocks” identified from the VAR lead to high future inflation. The solution would be to include the $Z_t$ variables in the VAR. Formally, this means that $Z_t$ is part of $\Omega_t$, so the $\pi^z_t(Z_t, \Omega_t)$ term is zero. Clearly, the $Z_t$ variables must have incremental (over $\Omega_t$) predictive power for future inflation.

A large number of possible variables in $Z_t$ are studied in the paper (broad indices of commodity prices, various individual commodity prices, financial data, and monetary aggregates). Most of the commodity prices and the monetary aggregates have forecasting power.

The next step of the analysis is to study which of the possible variables in $Z_t$ (from the previous list) actually solve the price puzzle (when included in a VAR). The broad commodity price indices and some of the financial data seem to be most successful, whereas individual commodity prices have very small effects.

The key point of the analysis is to study if those variables that solve the price puzzle are the same as those that have predictive power for inflation. There is a weak tendency for this to be true, but the overall impression is no relation at all.

4.4.3 “What Do the VARs Mean? Measuring the Output Effects of Monetary Policy” by Cochrane

Reference: Cochrane (1998).

Setup and Important VAR Results

Suppose we have estimated an output-money (plus whatever) VAR, imposed identifying restrictions, and calculated the impulse responses to the structural shocks

$$
\begin{bmatrix} m_t \\ y_t \end{bmatrix}
=
\begin{bmatrix} R_{mm}(L) & R_{my}(L) \\ R_{ym}(L) & R_{yy}(L) \end{bmatrix}
\begin{bmatrix} u_{mt} \\ u_{yt} \end{bmatrix},
\quad \operatorname{Cov}\left(\begin{bmatrix} u_{mt} \\ u_{yt} \end{bmatrix}\right) = I.
\tag{4.29}
$$

A typical result is that monetary policy shocks have a hump-shaped effect on output, but also a long-lasting effect on monetary policy. In fact, $R_{mm}(L)$ and $R_{ym}(L)$ are quite similar.

The issue in Cochrane’s paper is whether the impulse response of output, $R_{ym}(L)$, depends on monetary policy or not, and how this affects the interpretation of the impulse response functions obtained from an identified VAR model.

If Only Unanticipated Policy Matters

Suppose the true model is that output depends on policy surprises (current and lagged) as well as other shocks

$$y_t = a^*(L)\left(m_t - \mathrm{E}_{t-1}m_t\right) + b^*(L)\delta_t, \tag{4.30}$$

where $L^s\left(m_t - \mathrm{E}_{t-1}m_t\right) = m_{t-s} - \mathrm{E}_{t-s-1}m_{t-s}$, and where the non-monetary shock, $\delta_t$, is uncorrelated with the monetary shock.

The coefficients in (4.30) are supposed to be unaffected by monetary policy. (This rules out, among other things, that a change in the volatility of money supply affects the coefficient of the money supply surprise, as in Lucas’ model.)

Attaching a lag polynomial to policy surprises is ad hoc, but was done almost immediately after the Lucas model was published. The original Lucas model could not explain the business cycles or the long responses of output to monetary shocks. (The first motivation for these lags was in terms of capital accumulation, but this can hardly be a plausible explanation given the stability of the capital stock.)

From (4.29) we have

$$m_t - \mathrm{E}_{t-1}m_t = R_{mm}(0)u_{mt} + R_{my}(0)u_{yt},$$

which can be used to rewrite (4.30) as

$$y_t = a^*(L)R_{mm}(0)u_{mt} + a^*(L)R_{my}(0)u_{yt} + b^*(L)\delta_t. \tag{4.31}$$

Equation (4.31) and the second line in (4.29) must be identical. This implies that

$$a^*(L)R_{mm}(0) = R_{ym}(L), \tag{4.32}$$
64 65
so the VAR impulse response of output to policy shocks, $R_{ym}(L)$, is proportional to the true propagation mechanism, $a^*(L)$, which is invariant to actual monetary policy. In this case, the VAR is a useful tool for understanding the effect of monetary policy surprises on output.

This means that the hump-shaped and long-lasting effect of monetary policy shocks found in VAR studies, $R_{ym}(L)$, reflects the ad hoc dynamics, $a^*(L)$, attached to Lucas’ model.

In this setting, the similarity between $R_{ym}(L)$ and $R_{mm}(L)$ found in VAR studies should be interpreted as a coincidence.

If No Distinction Between Anticipated and Unanticipated Policy

Consider the extreme case where anticipated policy has the same effect as unanticipated policy, so the true model is

$$ y_t = a^*(L)\, m_t + b^*(L)\,\delta_t. \tag{4.33} $$

By using (4.29) this can be written

$$ y_t = a^*(L) R_{mm}(L)\, u^m_t + a^*(L) R_{my}(L)\, u^y_t + b^*(L)\,\delta_t. \tag{4.34} $$

This should be equal to the second line in (4.29), so

$$ a^*(L) R_{mm}(L) = R_{ym}(L). \tag{4.35} $$

The VAR impulse response function of output to policy shocks, $R_{ym}(L)$, is then no longer invariant to the policy rule; rather the opposite.

In the extreme case when the propagation mechanism is such that output only depends on the current money supply, $a^*(L) = a^*$, the typical hump-shaped pattern of $R_{ym}(L)$ reflects the hump-shaped pattern of how policy shocks affect future policy, that is, the hump-shaped pattern of $R_{mm}(L)$.

Since estimates of $R_{mm}(L)$ are typically hump-shaped in data, and fairly similar to the estimates of $R_{ym}(L)$, this setting suggests that $a^*(L) \approx a^*$, that is, the effect of money on output is almost contemporaneous.

In this case the VAR is not a useful tool for understanding the effect of monetary policy surprises on output. Instead, a direct estimation of (4.33) should work well.

Note that this does not concern the endogenous part of monetary policy, that is, how monetary policy affects $R_{my}(L)$ and thereby how output reacts to output shocks.

Federal Funds Rate

In this case monetary policy is measured by the innovations in the Federal funds rate (ordered last in a VAR that includes, among other things, commodity prices to deal with the “price puzzle”). The results are similar to those discussed above.

Sticky Price Models

Sticky price models (for instance, the Taylor model) have built-in dynamics, where both anticipated and unanticipated policy matter, but where the latter is usually more powerful. The built-in dynamics reduce the need for ad hoc dynamics, as captured by $a^*(L)$ above, to explain the observed VAR impulse response of output to money supply surprises.

4.4.4 “The Federal Funds Rate and the Channels of Monetary Transmission” by Bernanke and Blinder

Reference: Bernanke and Blinder (1992).

• Monthly US data 1959:1-1978:12.
• VAR of federal funds rate, unemployment rate, log of CPI, deposits/securities/loans.
• Identifying assumption: monetary policy is predetermined (does not depend on other contemporaneous shocks, as in much of Sims’ work).
• Results: (i) policy shocks (a higher federal funds rate) increase the unemployment rate after a year; (ii) bank deposits fall; (iii) banks initially sell off securities to balance the drop in deposits, but this is later undone and the volume of loans is reduced instead.
• Interpretation: adjustment of the stock of loans takes time, so the fall in deposits is initially met by selling liquid securities. Unemployment starts to rise at the same time as the stock of loans is reduced. More than a coincidence (decreased supply
of credit - the “credit channel”)? Or is it that the demand for loans decreases as the interest rate increase creates a recession through the standard IS-LM mechanism?

4.4.5 “The Effects of Monetary Policy Shocks: Evidence from the Flow of Funds” by Christiano, Eichenbaum, and Evans∗

Reference: Christiano, Eichenbaum, and Evans (1996).

• A study of the effect of monetary policy shocks on, for instance, “net funds raised in the financial markets” by nonfinancial businesses or households.
• Quarterly US data 1960:Q1-1992:Q4.
• VAR (in levels?) of log real GDP, log of GDP deflator, log commodity prices, federal funds rate, minus log of non-borrowed reserves, total reserves, and net funds raised in the financial markets. This is also the ordering in the identification.
  – Choice of policy instrument: federal funds rate or minus log of non-borrowed reserves (in the latter case the order of the federal funds rate and the non-borrowed reserves is reversed in the VAR). The results are not particularly sensitive to this choice.
  – Commodity prices are included to avoid the price puzzle.
• Inspection of the estimated shocks to the federal funds rate.
  – The policy shocks are relatively high before each NBER recession, and vice versa. Causality?
  – Persistent effects on the federal funds rate (lasting almost two years).
  – A positive shock decreases the Fed’s holdings of US government securities. Are the subsequent increases in the interest rate accomplished by selling bills and bonds (open market operations)?
  – A positive shock decreases M1 and output (the latter with a lag of two quarters).
• Result: the initial effect of a positive shock to the federal funds rate is to increase net funds raised by the business sector for almost a year, and only thereafter do we observe a decline! This is quite contrary to most models, including the “credit channel” interpretation of Bernanke and Blinder (1992). Why? Interest rate shocks create recessions and firm revenues decrease, but costs take time to change?

4.4.6 “Do Measures of Monetary Policy in a VAR Make Sense” by Rudebusch

Reference: Rudebusch (1998) and Sims (1998).

• VAR interest rate equations.
  – Can be interpreted as a reaction function, see (4.10).
  – Time-invariant, linear structure: tests of parameter stability in the reaction function often reject stability (monthly US data 1960:1-1995:3).
  – Small information set: traditional reaction functions typically use a much larger information set (trade deficit, stance of fiscal policy, measures of political pressure). The official records indicate that different types of data have been of interest at different times.
  – Use of final data, $Y^F_t$, while the true reaction function can only include preliminary data, $Y^P_t$, where $Y^F_t = Y^P_t + w_t$. Suppose the true reaction function is $s_t = \alpha Y^P_t + u^s_t$, but we estimate $s_t = \alpha Y^F_t + e^s_t$, where $e^s_t = u^s_t - \alpha w_t$. If the statistical agency produces inefficient preliminary estimates, then $w_t$ and $Y^P_t$ will be correlated and the estimator that produces $\hat{\alpha}$ is inconsistent. Important?
  – Long distributed lags: VAR estimates often show significant coefficients at lags of many months, which indicates that some of the variation in the federal funds rate can be predicted many months in advance. This is at variance with other evidence.
• VAR interest rate shocks.
  – Comparison of the VAR innovations, that is, the $j$th element in $\varepsilon_t$ from (4.1) (rather than the structural shocks $u^s_t$), with the difference between the forward federal funds rate and the realized federal funds rate (short sample: 1988:10-1995:3).
  – This is done for the forecast/forward price of the federal funds rate averaged over a month, which is realized one, two, and three months in the future.
  – The VAR shocks are much more volatile than the surprises according to financial data, and the correlation between them is low.
• Other observations.
  – Different VARs in the literature have produced different time series of policy shocks, but fairly similar impulse response functions. Strange! Data mining to get a reasonable impulse response function?

Sims (1998) has two important comments on Rudebusch’s paper. First, excluding an exogenous variable from a regression (in a system of equations) does not necessarily lead to bad estimates of the coefficients (this is clearly true if the excluded explanatory variables are uncorrelated with the other explanatory variables), but it will obviously change the fitted residual. Second, the Federal funds rate is often changed quickly as new information about the state of the economy arrives. This means that innovations to the federal funds rate contain both policy surprises (what we want to measure) and reactions to innovations in the state of the economy. The difference between the federal funds futures rate and the actual federal funds rate is such an innovation. In contrast, a VAR where the policy instrument is allowed to depend on current values of the state of the economy may potentially be able to separate the components of the innovation. Note that this is an argument against making the monetary policy instrument predetermined, as is done in much of Sims’ own work.

4.4.7 “What Does Monetary Policy Do?” by Leeper, Sims, and Zha

Reference: Leeper, Sims, and Zha (1996) and Bernanke (1996).

This paper estimates large VAR systems of monetary policy, sometimes with a split of the sample. The degrees of freedom problem is handled with a Bayesian approach.

In his comment on the paper, Bernanke finds that the most important conclusions from the paper are the following.

1. The estimated effects of monetary policy seem plausible: the VAR approach might work.
2. Empirically, short interest rates are better indicators of monetary policy than monetary aggregates. This finding is not surprising given the amount of interest rate smoothing that most central banks pursue: the supply curve of reserves is almost flat between infrequent interest rate changes, so most innovations in monetary aggregates reflect money demand innovations.
3. Monetary policy surprises have been relatively unimportant for US macroeconomic fluctuations since 1960.
4. Monetary policy reacts strongly to the macroeconomic situation (the “feedback” or “systematic” part of monetary policy is an important part of monetary policy).

Bernanke points out that the third point does not prove that monetary policy shocks cannot have large effects. To assess this possibility, the impulse response function is more useful than the forecast error variance decomposition. He also notes that the VAR approach has little to say about the effects of anticipated monetary policy.

4.4.8 “Identifying Monetary Policy in a Small Open Economy under Flexible Exchange Rates” by Cushman and Zha∗

Reference: Cushman and Zha (1997).

The authors argue that a VAR study of a small open economy cannot be done in the same way as for the US. They try to incorporate the following aspects in a VAR of monthly Canadian macro data.

1. Interest rates are likely to react contemporaneously to foreign interest rates. This is an argument against assuming that the monetary policy instrument is predetermined (as in much of Sims’ work).
2. Under a flexible exchange rate regime, the exchange rate should be allowed to react to all contemporaneous shocks. It is, after all, a forward-looking asset price.
3. Trade flows are interesting and important.
4. Foreign variables can be treated as a separate block, which is (block) exogenous for the domestic (small open) economy. In practice, this means that domestic variables are not allowed to affect foreign variables, not even with a lag.
5. Data: monthly 1974-1993 data for Canada. The US is taken to be the “rest of the world.”
6. A (Sims-style) identification gives the strange result that a monetary contraction leads to price increases (the “price puzzle” once again) and a depreciation of the exchange rate (an “exchange rate puzzle”).
7. Their identification implies a traditional money demand equation (M1, P, y, i), and a money supply equation which may depend on the foreign interest rate and commodity prices, but not on contemporaneous output. (Plus a few more things.)
8. Their results indicate that a monetary contraction leads to an appreciation of the Canadian dollar, an increase in the interest rate, a decrease in the money stock, a prolonged negative effect on the price level, and a small but negative effect on output. The variance decomposition indicates that monetary policy shocks account for only a small fraction of the forecast error variance of output.

4.4.9 “What Does the Bundesbank Target?” by Bernanke and Mihov∗

Reference: Bernanke and Mihov (1997). See also Walsh 9.4 and Bernanke and Mihov (1998).

Remark 51 (Bundesbank’s interest rates. See, for instance, Burda and Wyplosz (1997) 9.3) The floor of the interest rate tunnel was the discount rate. Access to this was limited by quotas. The ceiling was the Lombard rate, at which banks borrowed in emergencies. The repo rate was in between. The instruments of the Bundesbank were these three rates, the quotas, and the reserve requirements.

Bernanke and Mihov study what the Bundesbank actually did over the period 1969:01 to 1990:12. They find that inflation forecasts explain much more of the variance in the Lombard rate than money growth does. They conclude that the Bundesbank was, in fact, running an inflation target.

Their structural model looks like

$$ \begin{bmatrix} I - F & 0 \\ -D_0 & I - G_0 \end{bmatrix} \begin{bmatrix} Y_t \\ P_t \end{bmatrix} = \sum_{s=1}^{k} \begin{bmatrix} B_s & C_s \\ D_s & G_s \end{bmatrix} \begin{bmatrix} Y_{t-s} \\ P_{t-s} \end{bmatrix} + \begin{bmatrix} A_y & 0 \\ 0 & A_p \end{bmatrix} \begin{bmatrix} v^y_t \\ v^p_t \end{bmatrix}, \tag{4.36} $$

where the shocks are uncorrelated with each other. $P_t$ is a vector of “policy variables” with the associated structural shocks $v^p_t$, where the innovations to the variables in $P_t$ may be correlated through a non-diagonal matrix $A_p$. $Y_t$ is a vector of non-policy variables, like output and inflation.

Policy variables have no contemporaneous effect on the non-policy variables (the opposite of the approach Sims typically uses). If $P_t$ were a scalar, this would be enough to identify the policy shock, $v^p_t$. It would simply be the residual in a regression of $P_t$ on contemporaneous $Y_t$ and lags of both $P_t$ and $Y_t$.

In this paper, $P_t$ contains total reserves, $tr$, non-borrowed reserves, $nbr$, the call rate (a market rate for reserves, similar to the federal funds rate), $cr$, and the Lombard rate, $lr$. It is therefore necessary to put extra restrictions on the system in order to extract a scalar policy shock, $v^s_t$.

The reduced form (VAR) of (4.36) is

$$ \begin{bmatrix} Y_t \\ P_t \end{bmatrix} = \sum_{s=1}^{k} A_s \begin{bmatrix} Y_{t-s} \\ P_{t-s} \end{bmatrix} + \begin{bmatrix} u^y_t \\ u^p_t + u^q_t \end{bmatrix}. \tag{4.37} $$

The VAR residuals for the policy block have been split into two components. To see what they are, note that by the rules for partitioned matrices, the inverse of the leading matrix in (4.36) is

$$ \begin{bmatrix} I - F & 0 \\ -D_0 & I - G_0 \end{bmatrix}^{-1} = \begin{bmatrix} (I - F)^{-1} & 0 \\ (I - G_0)^{-1} D_0 (I - F)^{-1} & (I - G_0)^{-1} \end{bmatrix}. \tag{4.38} $$

The VAR shocks of the policy variables must therefore have the following relation to the structural shocks

$$ u^p_t + u^q_t = (I - G_0)^{-1} A_p v^p_t + (I - G_0)^{-1} D_0 (I - F)^{-1} A_y v^y_t. \tag{4.39} $$

Define $u^p_t$ to be $(I - G_0)^{-1} A_p v^p_t$, that is, the part of the VAR shock (of the policy variables) which is uncorrelated with the structural non-policy shocks, $v^y_t$. (Recall that $v^y_t$ and $v^p_t$ are uncorrelated.) We must therefore have

$$ (I - G_0)\, u^p_t = A_p v^p_t. \tag{4.40} $$
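A small numerical sanity check may help with (4.38)–(4.40). The Python/NumPy sketch below draws hypothetical matrices $F$, $G_0$, $D_0$, $A_y$ and $A_p$ at random. These are not Bernanke and Mihov’s estimates. The sketch verifies the partitioned inverse (4.38) and then simulates the reduced-form residuals. The first block row of (4.38) gives $u^y_t = (I-F)^{-1}A_y v^y_t$, so the term $u^q_t$ in (4.39) equals $(I-G_0)^{-1}D_0 u^y_t$. A least squares regression of the policy-block VAR residuals on $u^y_t$ should therefore return a slope close to $(I-G_0)^{-1}D_0$, and a residual whose covariance is close to $\operatorname{Cov}(u^p_t)$. The block sizes (two non-policy and four policy variables) and the unit variances of the structural shocks are assumptions of the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n_y, n_p = 2, 4          # hypothetical block sizes: 2 non-policy and 4 policy variables
T = 200_000              # long simulated sample, so sample moments are close to population ones

# Hypothetical contemporaneous matrices in (4.36), drawn at random (not estimates)
F = 0.2 * rng.standard_normal((n_y, n_y))
G0 = 0.2 * rng.standard_normal((n_p, n_p))
D0 = rng.standard_normal((n_p, n_y))
A_y = np.diag(rng.uniform(0.5, 1.5, n_y))
A_p = np.eye(n_p) + 0.3 * rng.standard_normal((n_p, n_p))

inv_IF = np.linalg.inv(np.eye(n_y) - F)
inv_IG = np.linalg.inv(np.eye(n_p) - G0)

# (4.38): compare a direct numerical inverse with the partitioned-inverse formula
lead = np.block([[np.eye(n_y) - F, np.zeros((n_y, n_p))],
                 [-D0, np.eye(n_p) - G0]])
formula = np.block([[inv_IF, np.zeros((n_y, n_p))],
                    [inv_IG @ D0 @ inv_IF, inv_IG]])
assert np.allclose(np.linalg.inv(lead), formula)

# Structural shocks: mutually uncorrelated with unit variances (an assumption of this sketch)
v_y = rng.standard_normal((T, n_y))
v_p = rng.standard_normal((T, n_p))

# VAR residuals implied by (4.38)-(4.39); one row per period, hence the transposes
u_y = v_y @ (inv_IF @ A_y).T             # non-policy block
u_p = v_p @ (inv_IG @ A_p).T             # component defined just before (4.40)
u_pol = u_p + u_y @ (inv_IG @ D0).T      # policy block: u^p + u^q

# Least squares of the policy residuals on the non-policy residuals (no constant,
# since all residuals have population mean zero by construction)
Pi_hat, *_ = np.linalg.lstsq(u_y, u_pol, rcond=None)   # shape (n_y, n_p)
u_p_hat = u_pol - u_y @ Pi_hat

# Slope should be close to (I - G0)^{-1} D0; residual covariance close to Cov(u^p)
pi_err = np.max(np.abs(Pi_hat.T - inv_IG @ D0))
cov_true = inv_IG @ A_p @ A_p.T @ inv_IG.T
cov_err = np.max(np.abs(np.cov(u_p_hat, rowvar=False) - cov_true)) / np.max(np.abs(cov_true))
print(f"max slope error: {pi_err:.4f}, relative covariance error: {cov_err:.4f}")
```

Once $I - G_0$ has been identified, $D_0$ could be recovered as $I - G_0$ times the transposed slope estimate. This is the idea behind (4.45).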
The model for the policy innovations is (corresponding to equations (2.7)-(2.10) in the paper)

$$ \begin{aligned} \text{Total reserves demand:}\quad & u^{tr} = -\alpha u^{cr} + v^d \\ \text{Lombard loans demand:}\quad & u^{ll} = \beta\left(u^{cr} - u^{lr}\right) + v^b \\ \text{Nonborrowed reserves supply:}\quad & u^{nbr} = \phi^d v^d + \phi^b v^b + \phi^s v^s + v^n \\ \text{Lombard rate:}\quad & u^{lr} = \gamma^d v^d + \gamma^b v^b + \gamma^n v^n + v^s, \end{aligned} $$

and $v^s$ is taken to be the “policy shock,” which we want to identify. Using the identity $u^{ll} = u^{tr} - u^{nbr}$, the model implies the following restrictions on (4.40)

$$ \underbrace{\begin{bmatrix} 1 & 0 & \alpha & 0 \\ 1 & -1 & -\beta & \beta \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}}_{I - G_0} \begin{bmatrix} u^{tr} \\ u^{nbr} \\ u^{cr} \\ u^{lr} \end{bmatrix} = \underbrace{\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ \phi^d & \phi^b & 1 & \phi^s \\ \gamma^d & \gamma^b & \gamma^n & 1 \end{bmatrix}}_{A_p} \begin{bmatrix} v^d \\ v^b \\ v^n \\ v^s \end{bmatrix}. \tag{4.41} $$

The model is estimated in a two-step procedure. First, each equation in the VAR (4.37) is estimated separately with least squares. Second, the policy shock is identified by matching the covariance matrix of the VAR residuals with the covariance matrix implied by the theoretical model (4.41). The idea is as follows. Suppose we had $\operatorname{Cov}(u^p_t)$, which has $4(4+1)/2 = 10$ unique elements. We could then solve for the parameters in (4.41) (there are 8 parameters) plus the variances of the structural shocks (4). With at least two additional restrictions (on top of all the zeros and cross restrictions already assumed), the parameters could be identified. That is, we solve (typically a non-linear problem) for the parameters and the variances from $\operatorname{Cov}(u^p_t)$.

They therefore impose additional restrictions. For instance, in the case of “Lombard rate targeting” they set $\gamma^d = \gamma^b = \gamma^n = 0$ (which gives overidentification), so that $v^s = u^{lr}$. Alternatively, with “nonborrowed-reserves targeting,” they impose $\phi^d = \phi^b = \phi^s = 0$ (which also gives overidentification).

The previous discussion supposed that we could observe $u^p_t$ and calculate $\operatorname{Cov}(u^p_t)$. However, the VAR shocks for the policy block are $u^p_t + u^q_t$, as given in (4.39). One way of dealing with this is as follows (I do not know how Bernanke and Mihov did it). By (4.36)-(4.38), the VAR shock for the non-policy block must be

$$ u^y_t = (I - F)^{-1} A_y v^y_t. \tag{4.42} $$

Substituting (4.42) into (4.39) gives

$$ u^p_t + u^q_t = (I - G_0)^{-1} A_p v^p_t + (I - G_0)^{-1} D_0 u^y_t. \tag{4.43} $$

This gives

$$ \operatorname{Cov}\left(u^p_t + u^q_t\right) = (I - G_0)^{-1} A_p \operatorname{Cov}\left(v^p_t\right) A_p' \left[(I - G_0)^{-1}\right]' + (I - G_0)^{-1} D_0 \operatorname{Cov}\left(u^y_t\right) D_0' \left[(I - G_0)^{-1}\right]', \tag{4.44} $$

since $v^p_t$ is uncorrelated with $v^y_t$, and therefore also with $u^y_t$. From (4.43) we also get

$$ \operatorname{Cov}\left(u^p_t + u^q_t,\, u^y_t\right) = (I - G_0)^{-1} D_0 \operatorname{Cov}\left(u^y_t\right). \tag{4.45} $$

The matrices $\operatorname{Cov}(u^p_t + u^q_t)$, $\operatorname{Cov}(u^p_t + u^q_t, u^y_t)$ and $\operatorname{Cov}(u^y_t)$ can be estimated from the VAR residuals. If $A_p$ and $I - G_0$ are identified as discussed in conjunction with (4.41), then the equations in (4.45) are sufficient to identify $D_0$, since there are as many equations as there are elements in $D_0$.

Bibliography

Bernanke, B. S., 1996, “Comment to ‘What Does Monetary Policy Do?’ by Leeper, Sims, and Zha,” Brookings Papers on Economic Activity, 2, 69–73.

Bernanke, B. S., and A. S. Blinder, 1992, “The Federal Funds Rate and the Channels of Monetary Transmission,” American Economic Review, 82, 901–921.

Bernanke, B. S., and I. Mihov, 1997, “What Does the Bundesbank Target?,” European Economic Review, 41, 1025–1053.

Bernanke, B. S., and I. Mihov, 1998, “Measuring Monetary Policy,” Quarterly Journal of Economics, 113, 869–902.
Burda, M., and C. Wyplosz, 1997, Macroeconomics - A European Text, Oxford University Press, 2nd edn.

Christiano, L., M. Eichenbaum, and C. Evans, 1999, “Monetary policy shocks: what have we learned and to what end?,” in John B. Taylor, and Michael Woodford (ed.), Handbook of Macroeconomics, Elsevier, Amsterdam.

Christiano, L. J., M. Eichenbaum, and C. Evans, 1996, “The Effects of Monetary Policy Shocks: Evidence from the Flow of Funds,” Review of Economics and Statistics, 78, 16–34.

Cochrane, J. H., 1998, “What Do the VARs Mean? Measuring the Output Effects of Monetary Policy,” Journal of Monetary Economics, 41, 277–300.

Cushman, D. O., and T. Zha, 1997, “Identifying Monetary Policy in a Small Open Economy under Flexible Exchange Rates,” Journal of Monetary Economics, 39, 433–448.

Eichenbaum, M., 1992, “Comment on ‘Interpreting the Macroeconomic Time Series Facts: The Effects of Monetary Policy’ by Christopher Sims,” European Economic Review, 36, 1001–1011.

Favero, C., 2001, Applied Macroeconomics, Oxford University Press.

Giordani, P., 2004, “An Alternative Explanation of the Price Puzzle,” Journal of Monetary Economics, 51, 1271–1296.

Greene, W. H., 2003, Econometric Analysis, Prentice-Hall, Upper Saddle River, New Jersey, 5th edn.

Hanson, M. S., 2004, “The Price Puzzle Reconsidered,” Journal of Monetary Economics, 51, 1385–1413.

Leeper, E. M., C. A. Sims, and T. Zha, 1996, “What Does Monetary Policy Do?,” Brookings Papers on Economic Activity, 2, 1–63.

McCallum, B. T., 1983, “A Reconsideration of Sims’ Evidence Regarding Monetarism,” Economics Letters, 13, 167–171.

Rudebusch, G. D., 1998, “Do Measures of Monetary Policy in a VAR Make Sense?,” International Economic Review, 39, 907–931.

Sims, C. A., 1980, “Comparison of Interwar and Postwar Business Cycles: Monetarism Reconsidered,” American Economic Review, 70, 250–257.

Sims, C. A., 1992, “Interpreting the Macroeconomic Time Series Facts: The Effects of Monetary Policy,” European Economic Review, 36, 975–1000.

Sims, C. A., 1998, “Comment on Glenn Rudebusch’s ‘Do Measures of Monetary Policy in a VAR Make Sense?’,” International Economic Review, 39, 933–942.

Walsh, C. E., 2003, Monetary Theory and Policy, MIT Press, Cambridge, Massachusetts, 2nd edn.
5 Microfoundations of Monetary Policy Models

Main references: Blanchard and Fischer (1989) (BF), Obstfeld and Rogoff (1996) (OR), and Walsh (2003).

5.1 Dynamic Models of Sticky Prices

This section deals with the effect of price rigidities in dynamic models. Prices are set in advance and firms are assumed to supply whatever demand happens to be (which is reasonable only as long as demand shocks do not push marginal costs above the price). This clearly assumes that firms can expand production, for instance by hiring more labour, so there must be a fairly elastic factor supply. If factor supply is not particularly elastic, then marginal costs will increase rapidly, so the assumption that marginal cost is always below the price becomes implausible.

Aggregate demand shocks (or money supply shocks) will usually have real effects when prices adjust slowly. This is certainly the case when prices are changed at prespecified intervals (time-dependent rules), and the main issue is instead how long the effects last. It is typically also the case when prices are changed once the old prices are too far from the frictionless optimum (state-dependent rules).

In general, we would like to find a reasonable model which can explain both why average prices seem to adjust gradually to monetary expansions and why price changes of individual firms appear to be “lumpy.” This is hard.

5.1.1 Quadratic Costs of Price-Adjustment

Reference: Rotemberg (1982a), Rotemberg (1982b), and Walsh (2003) 5.3.

Firm $i$ is a monopolist in its market and sets the log price, $p_{it}$, to maximize the value of the firm: the expected discounted sum of profits. If there were no costs of adjusting this price, then the price would be equal to some value, $p^*_{it}$, which we call the flex price optimum.

With costs of adjusting the price, we formulate the maximization problem in two steps. First, find the flex price optimum, $p^*_{it}$. Second, minimize the loss from not being at $p^*_{it}$ and from incurring adjustment costs. For the moment, we will take the time series process of $p^*_{it}$ as given and focus on the second part of the maximization problem. To make any progress, we also approximate the objective function in the second step by a quadratic function

$$ \min_{\{p_{it+s}\}_{s=0}^{\infty}} E_t \sum_{s=0}^{\infty} \beta^s \left[ \left(p_{it+s} - p^*_{it+s}\right)^2 + c\left(p_{it+s} - p_{it+s-1}\right)^2 \right] \quad \text{or} \tag{5.1} $$

$$ \min_{\{p_{it+s}\}_{s=0}^{\infty}} \left\{ \left(p_{it} - p^*_{it}\right)^2 + c\left(p_{it} - p_{it-1}\right)^2 + \beta E_t \left(p_{it+1} - p^*_{it+1}\right)^2 + \beta c\, E_t \left(p_{it+1} - p_{it}\right)^2 + \dots \right\}. $$

The first order condition with respect to $p_{it}$ is

$$ p_{it} - p^*_{it} + c\left(p_{it} - p_{it-1}\right) - \beta c\, E_t\left(p_{it+1} - p_{it}\right) = 0 \quad \text{or} \tag{5.2} $$

$$ \beta E_t \Delta p_{it+1} + \frac{1}{c}\left(p^*_{it} - p_{it}\right) = \Delta p_{it}. \tag{5.3} $$

There is no lumpiness in individual price changes. The loss function is quadratic, so both deviations from the flex price optimum ($p_{it} - p^*_{it}$) and price changes ($p_{it} - p_{it-1}$) are much more costly when they are large. The optimal policy is therefore to converge to $p^*_{it}$ by taking many small steps rather than a few large ones.

In a symmetric equilibrium, $p_{it} = p_t$ and $p^*_{it} = p^*_t$. It can also be noted that situations with high surprise inflation will lead to a larger gap $p^*_{it} - p_{it}$, so the price adjustment is then faster.

The smooth individual price changes carry over to the average prices, since all firms are similar. Let $p_{it} = p_t$ and $p^*_{it} = p^*_t$ be the common prices and write (5.3) as

$$ \Delta p_t = \beta E_t \Delta p_{t+1} + \frac{1}{c}\left(p^*_t - p_t\right). \tag{5.4} $$

In the special case of no adjustment cost ($c = 0$), (5.2) shows that $p_{it} = p^*_{it}$, so the firm will always set its actual price equal to the unrestricted optimal price (quite obvious, since the price is then unrestricted).

5.1.2 Determination of $p^*_t$ under Monopolistic Competition

What is the unrestricted optimal price, $p^*_{it}$, which plays such an important role in the previous model? A typical formulation is that it represents a monopolist’s price in a flex-price equilibrium. That price is typically an increasing function of aggregate demand and a decreasing function of the productivity level. In logs, we write

$$ p^*_{it} = p_t + \phi y_t + \varepsilon_t, \tag{5.5} $$

where $p_t$ is the average (log) price level in the economy (the competitors’ prices), $y_t$ is the output of this firm, and $\varepsilon_t$ is interpreted as a cost-push shock. This equation says that the firm will (try to) raise its relative price ($p^*_{it} - p_t$) when output is high (since $\phi > 0$) and after a cost-push shock. The latter could, for instance, be a negative productivity shock or an exogenous increase in labour costs.

The $\phi$ parameter is typically an increasing function of the slope of the marginal cost curve (the degree of decreasing returns to scale) and a decreasing function of the elasticity of substitution between goods in consumer preferences. In most models, we need an upward-sloping marginal cost curve to get $\phi > 0$, which could be motivated by some fixed factors of production. If these fixed factors are not completely fixed, but can be accumulated over time, then the problem becomes more complicated (dynamic). In that case, (5.5) can only be interpreted as an approximation that might be valid for short to medium run horizons (a business cycle, say).

Assuming a symmetric equilibrium ($p_{it} = p_t$ and $p^*_{it} = p^*_t$) and using (5.5) in (5.4) gives

$$ \Delta p_t = \beta E_t \Delta p_{t+1} + \delta\left(\phi y_t + \varepsilon_t\right), \quad \text{where } \delta = 1/c, \tag{5.6} $$

which can be thought of as an expectations-augmented Phillips curve. It is in a sense similar to the Keynesian AS curve, which has a positive relation between output and the price level.

Recursion forward gives

$$ \Delta p_t = \delta \sum_{s=0}^{\infty} \beta^s E_t\left(\phi y_{t+s} + \varepsilon_{t+s}\right), \tag{5.7} $$

provided $\lim_{s\to\infty} \beta^{s+1} E_t \Delta p_{t+s} = 0$. Note that $E_t y_{t+s}$ has a large effect on inflation if $\phi$ is high (strong decreasing returns to scale and/or strong market power). Also, $E_t(\phi y_{t+s} + \varepsilon_{t+s})$ has a large effect if $\delta$ is high (small $c$ in (5.6)).

As in any Phillips curve, it appears as if inflation is a real phenomenon! This is quite the opposite of the Cagan model, where it is assumed that both output and the real interest rate are constant. This suggests that this model of price setting is certainly not suitable for understanding a permanent change in the money supply trend. It is not plausible that the model parameters, for instance $\phi$ and $c$, would remain unchanged in such a case.

5.1.3 Example: The Phillips Curve (5.6) in a Very Simple Macro Model

To construct a simple example, assume that the quantity equation holds. In logs we have

$$ m_t = p_t + y_t, \tag{5.8} $$

where $m_t$ is the money supply (controlled by the central bank).

The quantity equation can be taken to represent aggregate demand. Aggregate supply is represented by the price setting rule, and it is assumed that firms supply whatever the market demands at the going price: output is demand determined. In traditional monetarist models, the quantity equation is aggregate demand, without much discussion of where it comes from. In a Keynesian model, the quantity equation would be an approximation to the Keynesian AD curve (the combination of the IS and LM curves, which traces out the relation between output and prices). Both these interpretations assume a negative relation between the price level and output. In some modern dynamic general equilibrium models, the quantity equation can be shown to be the money demand equation (see, for instance, Bénassy (1995)).

We now use this very simple model of “demand” to illustrate some properties of the sticky price model. Substitute for $y_t$ in (5.6) by using (5.8)

$$ \begin{aligned} \Delta p_t &= \beta E_t \Delta p_{t+1} + \delta\phi\left(m_t - p_t\right) + \delta\varepsilon_t \\ -p_{t-1} + p_t\left(1 + \beta + \delta\phi\right) - \beta E_t p_{t+1} &= \delta\left(\phi m_t + \varepsilon_t\right). \end{aligned} \tag{5.9} $$

This is a second-order expectational difference equation, which can be solved with a variety of methods. Perhaps the most straightforward one is to specify a time-series process for the exogenous driving process and transform the system to a vector first-order system. A decomposition of the resulting matrix can then be used to decouple the variables into two groups. The first group is those that are predetermined in $t$ (typically the exogenous variables and values determined in previous periods, like the capital stock and lagged variables). The second group is those that can jump in $t$ in response to changes in expectations about future values (typically asset prices and anything else that depends on expected future values).
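The solution strategy above can be sketched in Python/NumPy for the special case used in the rest of this section: no cost-push shock ($\varepsilon_t = 0$) and an AR(1) money supply, $m_t = \rho m_{t-1} + \varepsilon_{mt}$. Rather than the matrix decomposition described in the text, this sketch uses the method of undetermined coefficients, which in this small model delivers the same stable (saddle-path) solution. Guess $p_t = a m_t + b p_{t-1}$, substitute into (5.9) with $E_t p_{t+1} = a\rho m_t + b p_t$, and match coefficients. This gives $\beta b^2 - (1+\beta+\delta\phi) b + 1 = 0$, whose root inside the unit circle is chosen, and $a = \delta\phi / (1/b - \beta\rho)$. The code also checks two things: that the simulated path satisfies (5.9), and that $b$ is the stable eigenvalue of the first-order form of the system. The default parameter values are assumptions, set to the baseline values listed with Figure 5.1 ($\rho = 0.95$, $\phi = 3/7$, $c = 0.5$, $\beta = 0.96$). Output is backed out from the quantity equation (5.8).

```python
import numpy as np

def rotemberg_irf(rho=0.95, phi=3 / 7, c=0.5, beta=0.96, horizon=10):
    """Responses of money, the price level and output to eps_m = 1 at t = 0.

    Model: (5.9) with eps_t = 0, AR(1) money m_t = rho*m_{t-1} + eps_mt, and the
    quantity equation (5.8). Solved by undetermined coefficients, p_t = a*m_t + b*p_{t-1}.
    """
    delta = 1.0 / c
    k = 1.0 + beta + delta * phi
    # b solves beta*b^2 - k*b + 1 = 0. The roots multiply to 1/beta > 1 and the
    # polynomial is negative at b = 1 (when delta*phi > 0), so exactly one root is in (0, 1).
    roots = np.roots([beta, -k, 1.0])
    b = roots[np.abs(roots) < 1.0].real[0]
    a = delta * phi / (1.0 / b - beta * rho)

    m = rho ** np.arange(horizon + 1)      # money after a unit shock at t = 0, no later shocks
    p = np.zeros(horizon + 1)
    p_lag = 0.0                             # start in steady state (all log deviations zero)
    for t in range(horizon + 1):
        p[t] = a * m[t] + b * p_lag
        p_lag = p[t]
    y = m - p                               # (5.8): m_t = p_t + y_t
    return m, p, y, b

rho, phi, c, beta = 0.95, 3 / 7, 0.5, 0.96
m, p, y, b = rotemberg_irf(rho, phi, c, beta)

# Check 1: the path satisfies (5.9). After t = 0 there are no shocks, so E_t p_{t+1} = p_{t+1}.
delta = 1.0 / c
k = 1.0 + beta + delta * phi
p_prev = np.concatenate(([0.0], p[:-1]))
resid = -p_prev[:-1] + k * p[:-1] - beta * p[1:] - delta * phi * m[:-1]

# Check 2: b is the stable eigenvalue of the first-order form of (5.9),
# which maps [m_t, p_{t-1}, p_t] into [m_{t+1}, p_t, E_t p_{t+1}].
A = np.array([[rho, 0.0, 0.0],
              [0.0, 0.0, 1.0],
              [-delta * phi / beta, -1.0 / beta, k / beta]])
eig = np.linalg.eigvals(A)

# Cheaper price changes (smaller c) should dampen the impact effect on output.
y_flex = rotemberg_irf(rho, phi, 0.1, beta)[2]
print(y[0], y_flex[0], np.max(np.abs(resid)), np.min(np.abs(eig - b)))
```

Comparing `y[0]` with `y_flex[0]` gives a quick check of the qualitative point made in the discussion of Figure 5.1. A lower adjustment cost $c$ makes prices move more on impact, so output moves less.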
A trivial step is to note that (5.9) can be rewritten Baseline model Less price rigidity, c=0.1
1 1
1 1 + β + δφ δ
Et pt+1 = − pt−1 + pt − (φm t + εt ) . (5.10)
β β β money
0.5 0.5
price
For simplicity, suppose there is no cost push shock (εt = 0) and that m t is an AR(1)
output
0 0
m t = ρm t−1 + εmt . (5.11)
−2 0 2 4 6 8 −2 0 2 4 6 8
We can then write the model on state space form as period period

ρ εmt+1
      
m t+1 0 0 mt
 pt = 0 0 1   pt−1  +  0 . (5.12) Prices sensitive to demand, φ=2
      
Rotemberg model, response to money supply shock
δ 1 1+β+δφ 1
Et pt+1 −β φ −β β p t 0 Parameter values (base line):
ρ=0.95, φ=3/7, c=0.5, β=0.96
Some impulse response functions (dynamic simulations obtained from setting εmt = 1 0.5
in t = 0 but zero in all other periods) are shown in Figure 5.1. In the baseline case, price
adjustment is fairly slow (expensive to change prices), so a monetary shock leads to a 0
relatively large effect on output: money is far from neutral. With a smaller cost of price
−2 0 2 4 6 8
changes, price adjustment is much faster, so the monetary shock has almost no effect on period
output: money is almost neutral. Finally, when φ is high (quickly decreasing returns to
scale or strong monopoly power), then it is too costly for firms to keep their old prices—so
Figure 5.1: Impulse responses in the Rotemberg model
the price response is quick (and the effect on output small).

restrictions on which type of policy experiments which are meaningful to analyze with
5.1.4 The Phillips and the “Natural Rate Hypothesis”
the help of this model: we should probably only use this model for policy changes which
The “natural rate hypothesis” states that the mean of output cannot be affected by any keeps the average inflation rate unchanged. In many applications, the Phillips equation
monetary policy. Suppose the central bank can change the inflation rate by changing its is assumed to refer to detrended output (as a measure of the business cycle). The main
policy instrument. Take the unconditional expectation of the Rotemberg/Calvo model reason is that the Phillips effect is typically only relevant for as long as the production
(5.6) and use iterated expectations and E εt = 0 to get function has decreasing returns to scale, see the discussion of (5.5). Since detrended
E 1pt − β E 1pt+1 output per definition has a zero mean, the kind of experiments that changes the mean of
E yt = . (5.13) yt must be ruled out.
δφ
If β = 1 (β < 1), and inflation is a stationary series so E 1pt = E 1pt+1 , then this
means that inflation cannot (can) affect average output. Irrespective of whether β = 1 or 5.2 Aggregate Demand
not, a drifting inflation rate (E 1pt 6= E 1pt+1 ) can certainly affect average output.
Reference: Walsh (2003) 5.4
This should probably be regarded as an artifact of the Calvo/Rotemberg model. It puts
This section derives an AD curve from the optimal consumption/savings decision of

a forward looking agent. The demand shock comes from a random taste parameter.

The period utility function is

$$U(C_t) = \frac{A_t}{1-\gamma}C_t^{1-\gamma}, \tag{5.14}$$

where $A_t$ is a taste shift parameter. The Euler equation for optimal consumption is

$$\frac{\partial U(C_t)}{\partial C_t} = \beta E_t\left[Q_{t+1}\frac{\partial U(C_{t+1})}{\partial C_{t+1}}\right], \tag{5.15}$$

where $Q_{t+1}$ is the gross real return. The marginal utility of $C_t$ is

$$\frac{\partial U(C_t)}{\partial C_t} = A_t C_t^{-\gamma}, \tag{5.16}$$

so the optimality condition can be written

$$\begin{aligned} 1 &= \beta E_t\left[Q_{t+1}\frac{A_{t+1}}{A_t}\left(\frac{C_{t+1}}{C_t}\right)^{-\gamma}\right] \\ &= \beta E_t\exp\left(\ln Q_{t+1} + \Delta\ln A_{t+1} - \gamma\ln C_{t+1} + \gamma\ln C_t\right). \end{aligned} \tag{5.17}$$

Assume that $\ln Q_{t+1}$, $\ln A_{t+1}$, and $\ln C_{t+1}$ are jointly normally distributed. (Recall that $E\exp(x) = \exp\left(Ex + \operatorname{Var}(x)/2\right)$ if $x$ is normally distributed.) Take logs of (5.17) and rewrite it as

$$\begin{aligned} 0 &= \ln\beta + E_t\ln Q_{t+1} + E_t\Delta\ln A_{t+1} - \gamma E_t\ln C_{t+1} + \gamma\ln C_t + \operatorname{Var}_t\left(\ln Q_{t+1} + \ln A_{t+1} - \gamma\ln C_{t+1}\right)/2, \text{ or} \\ E_t\ln C_{t+1} &= \ln C_t + \frac{1}{\gamma}E_t\ln Q_{t+1} + \frac{1}{\gamma}E_t z_{t+1}, \end{aligned} \tag{5.18}$$

where $E_t z_{t+1} = \ln\beta + E_t\Delta\ln A_{t+1} + \operatorname{Var}_t(\cdot)/2$.

The most important part of $E_t z_{t+1}$ is $E_t\Delta\ln A_{t+1}$. If $\ln A_{t+1} = \rho\ln A_t + u_{t+1}$, then $E_t\Delta\ln A_{t+1} = (\rho-1)\ln A_t$, so the AR(1) formulation carries over to the expected change, but the sign is reversed if $\rho < 1$.

5.3 Recent Models for Studying Monetary Policy

Reference: Walsh (2003) 5.4

This section gives an introduction to recent models of monetary policy. Such models typically combine a forward looking Phillips curve, for instance, from a Calvo/Rotemberg model, with an aggregate demand equation derived from an optimizing consumer's intertemporal consumption/savings decision, and some kind of policy rule or objective function for the central bank.

5.3.1 A Simple Model

Prices are set as in the Calvo/Rotemberg model. In the Calvo model, a fraction of the firms are allowed to set a new price in a period, and the rest must keep their old price. When allowed to change the price, a firm chooses the price that minimizes a discounted sum of the squared deviations of the actual price from the flex price. The Rotemberg model instead says that the firm can change the price whenever it wants, but that there is a (quadratic) cost associated with price changes. Both models generate the same implications for the aggregate Phillips curve. We also assume that the flex price is determined as in a model of monopolistic competition, $p^*_{it} = p_t + \phi y_t + \varepsilon_{\pi t}$, where $\phi$ measures how much price setters want to increase the relative price when demand increases ($\phi$ is high when the substitution elasticity between goods is low and when the marginal cost curve is steep). The supply side of the economy can then be summarized by the “Phillips curve”

$$\pi_t = \beta E_t\pi_{t+1} + \delta\left(\phi y_t + \varepsilon_{\pi t}\right), \tag{5.19}$$

where $\delta$ is increasing in the fraction $q$.

The “aggregate demand” curve is derived from an Euler condition for optimal consumption choice with taste shocks, combined with the assumption that consumption equals output. It is

$$E_t y_{t+1} = y_t + \frac{1}{\gamma}\left(i_t - E_t\pi_{t+1}\right) + \varepsilon_{yt}, \tag{5.20}$$

where $\varepsilon_{yt}$ is a negative shock to current (time $t$) demand.

The central bank sets the short interest rate, $i_t$. This can affect output since prices are sticky, so the nominal interest rate affects the real interest rate. This, in turn, affects demand, and thus inflation through the “Phillips effect.” Suppose the reaction function, also called simple policy rule, of the central bank is a “Taylor rule”

$$i_t = \chi\pi_t + \upsilon y_t. \tag{5.21}$$
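
As a numerical check on the saddle-path condition stated after (5.29), the following is a minimal sketch, assuming NumPy. It builds $\tilde A_0$, $\tilde A$, $\tilde B$ and $F$ exactly as in (5.23)–(5.26), using the baseline parameter values listed further down in this section, forms $A - BF$ via (5.28), and counts the roots inside the unit circle. The helper names `taylor_rule_system` and `count_stable_roots` are hypothetical; this is an illustration, not the code behind Figures 5.2–5.4.

```python
import numpy as np


def taylor_rule_system(beta=0.99, delta=2.25, phi=2 / 7, gamma=2.0,
                       tau_pi=0.5, tau_y=0.5, upsilon=0.5, chi=1.5):
    """Return A - BF for (5.19)-(5.22) under the Taylor rule (5.21).

    Ordering: x1 = (eps_pi, eps_y) predetermined, x2 = (pi, y) forward looking.
    """
    # Left-hand side matrix A0_tilde and right-hand side matrix A_tilde in (5.23)
    A0_tilde = np.array([[1.0, 0.0, 0.0, 0.0],
                         [0.0, 1.0, 0.0, 0.0],
                         [0.0, 0.0, beta, 0.0],
                         [0.0, 0.0, 1.0 / gamma, 1.0]])
    A_tilde = np.array([[tau_pi, 0.0, 0.0, 0.0],
                        [0.0, tau_y, 0.0, 0.0],
                        [-delta, 0.0, 1.0, -delta * phi],
                        [0.0, 1.0, 0.0, 1.0]])
    B_tilde = np.array([[0.0], [0.0], [0.0], [1.0 / gamma]])
    # Taylor rule (5.24) written as i_t = -F x_t, as in (5.26)
    F = -np.array([[0.0, 0.0, chi, upsilon]])
    # (5.28): A = A0_tilde^{-1} A_tilde and B = A0_tilde^{-1} B_tilde
    A = np.linalg.solve(A0_tilde, A_tilde)
    B = np.linalg.solve(A0_tilde, B_tilde)
    return A - B @ F  # the matrix in (5.29)


def count_stable_roots(matrix):
    """Number of eigenvalues with modulus below one."""
    return int(np.sum(np.abs(np.linalg.eigvals(matrix)) < 1.0))


baseline = taylor_rule_system()
print(np.round(np.sort(np.abs(np.linalg.eigvals(baseline))), 4))
print("stable roots (baseline):", count_stable_roots(baseline))  # 2
print("stable roots (chi = 0.5):", count_stable_roots(taylor_rule_system(chi=0.5)))  # 3
```

With the baseline values, the two exogenous shocks contribute the stable roots $\tau_\pi = \tau_y = 0.5$ and the forward looking block has a complex pair with modulus of about 1.32, so there are exactly two stable roots for two predetermined variables. Lowering $\chi$ to 0.5 produces a third stable root, so the rule then fails the condition for a unique saddle path.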

This is a sub-optimal commitment policy. It is a commitment rule since the policy setter will stick to this rule, even if it would be optimal to deviate from it in certain states. The optimal commitment rule, however, would not restrict the decision rule to be a function of $y_t$ and $\pi_t$ only.

Note that there is no money demand function in this model. The reason is that monetary policy is specified in terms of the interest rate, so the money stock becomes demand determined (the money supply curve is flat at the chosen nominal interest rate). Of course, in order for the central bank to control anything of importance, there must be a demand for money. The money demand function could be added to the model, but its only role is to determine the money stock.

Suppose the shocks in (5.19) and (5.20) follow

$$\begin{aligned} \varepsilon_{\pi t+1} &= \tau_\pi\varepsilon_{\pi t} + \zeta_{\pi t+1} \\ \varepsilon_{yt+1} &= \tau_y\varepsilon_{yt} + \zeta_{yt+1}. \end{aligned} \tag{5.22}$$

We can write (5.19)–(5.22) as

$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \beta & 0 \\ 0 & 0 & 1/\gamma & 1 \end{bmatrix} \begin{bmatrix} \varepsilon_{\pi t+1} \\ \varepsilon_{yt+1} \\ E_t\pi_{t+1} \\ E_t y_{t+1} \end{bmatrix} = \begin{bmatrix} \tau_\pi & 0 & 0 & 0 \\ 0 & \tau_y & 0 & 0 \\ -\delta & 0 & 1 & -\delta\phi \\ 0 & 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} \varepsilon_{\pi t} \\ \varepsilon_{yt} \\ \pi_t \\ y_t \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1/\gamma \end{bmatrix} i_t + \begin{bmatrix} \zeta_{\pi t+1} \\ \zeta_{yt+1} \\ 0 \\ 0 \end{bmatrix}, \tag{5.23}$$

with

$$i_t = \begin{bmatrix} 0 & 0 & \chi & \upsilon \end{bmatrix} \begin{bmatrix} \varepsilon_{\pi t} \\ \varepsilon_{yt} \\ \pi_t \\ y_t \end{bmatrix}. \tag{5.24}$$

This system is in state space form and could be summarized as

$$\begin{aligned} \tilde A_0\begin{bmatrix} x_{1t+1} \\ E_t x_{2t+1} \end{bmatrix} &= \tilde A\begin{bmatrix} x_{1t} \\ x_{2t} \end{bmatrix} + \tilde B i_t + \tilde\xi_{t+1}, \text{ and} && (5.25) \\ i_t &= -F\begin{bmatrix} x_{1t} \\ x_{2t} \end{bmatrix}, && (5.26) \end{aligned}$$

where $x_{1t}$ is a vector of predetermined variables (here $\varepsilon_{\pi t}$ and $\varepsilon_{yt}$, which happen to be exogenous, although endogenous variables can also be predetermined) and $x_{2t}$ a vector of forward looking variables (here $\pi_t$ and $y_t$). For the Taylor rule (5.24), $F = -\begin{bmatrix} 0 & 0 & \chi & \upsilon \end{bmatrix}$. Premultiply (5.25) with $\tilde A_0^{-1}$ to get

$$\begin{aligned} \begin{bmatrix} x_{1t+1} \\ E_t x_{2t+1} \end{bmatrix} &= A\begin{bmatrix} x_{1t} \\ x_{2t} \end{bmatrix} + B i_t + \xi_{t+1}, \text{ where} && (5.27) \\ A &= \tilde A_0^{-1}\tilde A,\; B = \tilde A_0^{-1}\tilde B, \text{ and } \operatorname{Cov}(\xi_t) = \tilde A_0^{-1}\operatorname{Cov}(\tilde\xi_t)\tilde A_0^{-1\prime}. && (5.28) \end{aligned}$$

By using the policy rule (5.26) in (5.27) we get

$$\begin{bmatrix} x_{1t+1} \\ E_t x_{2t+1} \end{bmatrix} = (A - BF)\begin{bmatrix} x_{1t} \\ x_{2t} \end{bmatrix} + \xi_{t+1}. \tag{5.29}$$

This system of expectational difference equations (with stable and unstable roots) can be solved in several different ways. For instance, a decomposition of $A - BF$ in terms of eigenvalues and eigenvectors will work if the latter are linearly independent. Otherwise, other techniques must be used (see Appendix A and Söderlind (1999)). A necessary condition for a unique saddle path equilibrium is that $A - BF$ has as many stable roots (inside the unit circle) as there are predetermined variables (that is, elements in $x_{1t}$).

To solve the model numerically, parameter values are needed. The following values have been used in most of Figures 5.2–5.4 (exceptions are indicated)

$$\begin{array}{cccccccccc} \beta & \delta & \phi & \gamma & \tau_\pi & \tau_y & \upsilon & \chi & \lambda_y & \lambda_i \\ \hline 0.99 & 2.25 & 2/7 & 2 & 0.5 & 0.5 & 0.5 & 1.5 & 0.5 & 0 \end{array}$$

The choice of $\delta$ implies relatively little price stickiness. The choice of $\phi$ means that a 1% increase in aggregate demand leads to a desired increase of the relative price of 2/7%. The choice of the relative risk aversion $\gamma$ implies an elasticity of intertemporal substitution of 1/2. The values of $\upsilon$ and $\chi$ are those advocated by Taylor. The loss function parameters (see next
section) mean that inflation is twice as important as output, and that the policy maker does not care about fluctuations in the nominal interest rate.

The first subfigure in Figure 5.2 illustrates how the model with the policy rule (5.21) works. An inflation shock in period $t = 0$ increases inflation. The policy maker reacts by raising the nominal interest rate even more in order to increase the real interest rate. This, in turn, has a negative effect on output and therefore on inflation via the “Phillips curve.” The central bank creates a recession to bring down inflation. The other subfigures illustrate what happens if the coefficients in the reaction function (5.21) are changed.

[Figure 5.2: Impulse responses to price shock; simple policy rule (persistent price shock). Panels: a. Baseline model; b. Large inflation coefficient; c. Large output coefficient. Series: π, y, i; horizontal axis: period.]

5.3.2 Optimal Monetary Policy∗

Suppose the central bank's loss function is

$$E_t\sum_{s=0}^{\infty}\beta^s L_{t+s}, \text{ where} \tag{5.30}$$

$$L_{t+s} = \left(\pi_{t+s} - \pi^*\right)^2 + \lambda_y\left(y_{t+s} - y^*\right)^2 + \lambda_i\left(i_{t+s} - i^*\right)^2. \tag{5.31}$$

A particularly straightforward way to proceed is to optimize (5.30) by restricting the policy rule to be of the simple form discussed above, (5.21). Optimization then proceeds as follows: guess the coefficients $\upsilon$ and $\chi$, solve the model, and use the time series representation of the model to calculate the loss function value. Then try other coefficients $\upsilon$ and $\chi$, and see if they give a lower loss function value. Continue until the best coefficients have been found.

The unrestricted optimal commitment policy and the optimal discretionary policy rule are a bit harder to find. Methods for doing that are discussed in, among other places, Söderlind (1999).

Figure 5.3 compares the equilibria under the simple policy rule, the unrestricted optimal commitment rule, and the optimal discretionary rule, when it is assumed that $\pi^* = y^* = 0$. It is clear that the optimal commitment rule achieves much more stable inflation and output, in spite of a less vigorous increase in the nominal interest rate. This is achieved by credibly promising to keep interest rates high in the future (and even raise them further), which creates expectations of lower future output and therefore lower future inflation. This, in turn, gives lower inflation and output today. The discretionary equilibrium is fairly similar to the simple rule in this model. Note that there is no constant “inflation bias” when the target levels are at their natural levels (zero), as they are in these figures. The discretionary rule is still different from the commitment rule (they are, after all, outcomes of different games). The intuition is that there is a time-varying “bias” since the conditional expectations of output and inflation in the next periods (their “conditional natural rates”) typically differ from the target rates (here zero).

Figure 5.4 makes the same type of comparison, but for a positive demand shock, $-\varepsilon_{yt}$. In this case, both optimal rules “kill” the demand shock, which is seen almost directly from (5.20): any shock $\varepsilon_{yt}$ could be met by changing $i_t$ by $-\gamma\varepsilon_{yt}$ (an increase when the demand shock $-\varepsilon_{yt}$ is positive). In this way output is unaffected by the shock, and there will then be no effect on inflation either, since the only
way the demand shock can affect inflation is via output (see (5.19)). This is very similar to the static model discussed above: the demand shock drives both prices and output in the same direction and should, if possible, be neutralized. Of course, the result hinges on the assumption that the policy maker is not averse to movements in the nominal interest rate, that is, $\lambda_i = 0$ in (5.31). (It can be shown that this case can be approximated in the simple policy rule (5.21) by setting the coefficients very high.) Many studies indicate that central banks are unwilling to let the nominal interest rate vary much. This is sometimes interpreted as a concern for the banking sector, and sometimes as due to uncertainty about the state of the economy and/or the effect of policy changes on output/inflation. In any case, $\lambda_i > 0$ is often necessary in order to make this type of model fit the observed variability in nominal interest rates.

[Figure 5.3: Impulse responses to price shock: simple rule, optimal commitment policy, and discretionary policy (persistent price shocks). Panels: a. Simple policy rule; b. Commitment policy; c. Discretionary policy. Series: π, y, i; horizontal axis: period.]

[Figure 5.4: Impulse responses to positive demand shock: simple rule, optimal commitment policy, and discretionary policy (persistent demand shocks). Panels: a. Simple policy rule; b. Commitment policy; c. Discretionary policy. Series: π, y, i; horizontal axis: period.]

A Summary of Solution Methods for Linear RE Models

The model is

$$\begin{bmatrix} x_{1t+1} \\ E_t x_{2t+1} \end{bmatrix} = A\begin{bmatrix} x_{1t} \\ x_{2t} \end{bmatrix} + \begin{bmatrix} \varepsilon_{t+1} \\ 0 \end{bmatrix}, \tag{A.1}$$

where $x_{1t}$ is an $n_1\times 1$ vector of predetermined variables, $x_{2t}$ is an $n_2\times 1$ vector of “forward looking” variables, and $\varepsilon_t$ is a white noise process. All dynamics of the exogenous processes have been placed in $x_{1t}$. A necessary condition for a unique saddle path equilibrium is that $A$ has as many stable roots (inside the unit circle) as there are elements in $x_{1t}$.

Decompose $A$ as

$$A = ZTZ^{-1}, \tag{A.2}$$

where $T$ is (at least) block upper triangular. Note that we require $Z$ to be invertible. In
some cases we could let $T$ be a diagonal matrix with eigenvalues along the principal diagonal and with the corresponding eigenvectors in the columns of $Z$ (if the eigenvectors are linearly independent).

This decomposition should be reordered so that the blocks corresponding to the stable eigenvalues (in or on the unit circle) come first. Partition conformably with the stable and unstable roots

$$T = \begin{bmatrix} T_{\theta\theta} & T_{\theta\delta} \\ 0 & T_{\delta\delta} \end{bmatrix} \text{ and } Z = \begin{bmatrix} Z_{k\theta} & Z_{k\delta} \\ Z_{\lambda\theta} & Z_{\lambda\delta} \end{bmatrix}. \tag{A.3}$$

The solution can then be shown to be

$$x_{1t+1} = Z_{k\theta}T_{\theta\theta}Z_{k\theta}^{-1}x_{1t} + \varepsilon_{t+1} \tag{A.4}$$

$$x_{2t} = Z_{\lambda\theta}Z_{k\theta}^{-1}x_{1t}. \tag{A.5}$$

Bibliography

Bénassy, J.-P., 1995, “Money and Wage Contracts in an Optimizing Model of the Business Cycle,” Journal of Monetary Economics, 35, 303–315.

Blanchard, O. J., and S. Fischer, 1989, Lectures on Macroeconomics, MIT Press.

Obstfeld, M., and K. Rogoff, 1996, Foundations of International Macroeconomics, MIT Press.

Rotemberg, J. J., 1982a, “Monopolistic Price Adjustment and Aggregate Output,” Review of Economic Studies, 49, 517–531.

Rotemberg, J. J., 1982b, “Sticky Prices in the United States,” Journal of Political Economy, 90, 1187–1211.

Söderlind, P., 1999, “Solution and Estimation of RE Macromodels with Optimal Policy,” European Economic Review, 43, 813–823.

Walsh, C. E., 2003, Monetary Theory and Policy, MIT Press, Cambridge, Massachusetts, 2nd edn.

7 Solving Linear Expectational Difference Equations

References: Blanchard and Kahn (1980), King and Watson (1998), and Klein (2000).

7.1 The Model

The model in state-space form is

$$\begin{bmatrix} x_{1t+1} \\ E_t x_{2t+1} \end{bmatrix} = A\begin{bmatrix} x_{1t} \\ x_{2t} \end{bmatrix} + \begin{bmatrix} \varepsilon_{t+1} \\ 0_{n_2\times 1} \end{bmatrix}, \tag{7.1}$$

where $x_{1t}$ is an $n_1\times 1$ vector of predetermined variables with the initial value $x_{10}$ given, $x_{2t}$ is an $n_2\times 1$ vector of “forward looking” variables, and $\varepsilon_t$ is a white noise process with covariance matrix $\Sigma$. All dynamics of the exogenous processes have been placed in $x_{1t}$.

Example 52 (Cagan model.) In the Cagan model (see, for instance, Blanchard and Kahn (1980) 4), money demand is

$$\ln M_t - \ln P_t = -\omega i_t, \text{ with } \omega > 0,$$

where $M_t$, $P_t$, and $i_t$ are the nominal money balances, the price level and the nominal interest rate respectively. A money demand equation would typically include a term capturing real activity (output), but that is suppressed in the Cagan model. This is reasonable if we focus on cases where the price level fluctuates much more than output. Similarly, we assume that the real interest rate is constant, so the Fisher equation is

$$i_t = E_t\left(\ln P_{t+1} - \ln P_t\right) + \text{constant}.$$

Combining gives an equation where the price level, $P_t$, behaves like an asset price

$$\ln P_t = (1-\alpha)\ln M_t + \alpha E_t\ln P_{t+1}, \text{ with } 0 < \alpha < 1 \text{ since } \alpha = \omega/(1+\omega).$$

It will be important that $\alpha < 1$, so the future is “discounted.”
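
Why discounting matters can be seen by truncating the forward solution that Example 54 derives analytically. The sketch below, a minimal illustration in plain Python with a hypothetical helper `cagan_price_coefficient`, assumes $\ln M_t$ is an AR(1) with coefficient $\rho$, so that $E_t\ln M_{t+s} = \rho^s\ln M_t$, and sums the first `horizon` terms of $(1-\alpha)\sum_s(\alpha\rho)^s$.

```python
def cagan_price_coefficient(alpha, rho, horizon):
    """Coefficient on ln M_t from truncating the forward sum of Example 52.

    Assumes ln M_t is an AR(1) with coefficient rho, so E_t ln M_{t+s} = rho**s ln M_t,
    and sums (1 - alpha) * (alpha * rho)**s for s = 0, ..., horizon - 1.
    """
    return (1.0 - alpha) * sum((alpha * rho) ** s for s in range(horizon))


alpha, rho = 0.5, 0.9
closed_form = (1.0 - alpha) / (1.0 - alpha * rho)  # equals 10/11 for these values
for horizon in (1, 5, 20, 100):
    print(horizon, round(cagan_price_coefficient(alpha, rho, horizon), 6))
print("closed form:", round(closed_form, 6))
```

The partial sums approach $(1-\alpha)/(1-\alpha\rho)$ only because $|\alpha\rho| < 1$; without that discounting the terms would not shrink and the forward sum would not converge.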

Example 53 Consider the Cagan model in Example 52, which can be rewritten as

$$E_t\ln P_{t+1} = \frac{\alpha-1}{\alpha}\ln M_t + \frac{1}{\alpha}\ln P_t,$$

and suppose the log money supply, $\ln M_t$, is an exogenous AR(1)

$$\ln M_{t+1} = \rho\ln M_t + \varepsilon_{t+1}.$$

This can be rewritten on the state-space form (7.1)

$$\begin{bmatrix} \ln M_{t+1} \\ E_t\ln P_{t+1} \end{bmatrix} = \begin{bmatrix} \rho & 0 \\ \frac{\alpha-1}{\alpha} & \frac{1}{\alpha} \end{bmatrix} \begin{bmatrix} \ln M_t \\ \ln P_t \end{bmatrix} + \begin{bmatrix} \varepsilon_{t+1} \\ 0 \end{bmatrix}.$$

Example 54 In the simple case of Example 52, we can solve the model analytically by recursion forward (and using the law of iterated expectations)

$$\begin{aligned} \ln P_t &= (1-\alpha)\ln M_t + \alpha E_t\ln P_{t+1} \\ &= (1-\alpha)\ln M_t + \alpha E_t\left[(1-\alpha)\ln M_{t+1} + \alpha E_{t+1}\ln P_{t+2}\right] \\ &\;\;\vdots \\ &= (1-\alpha)E_t\left(\ln M_t + \alpha\ln M_{t+1} + \alpha^2\ln M_{t+2} + \dots\right), \end{aligned}$$

provided that $\lim_{K\to\infty}\alpha^K E_t\ln P_{t+K} = 0$. When $\ln M_t$ is an AR(1), then $E_t\ln M_{t+s} = \rho^s\ln M_t$, so we have

$$\ln P_t = (1-\alpha)\left(1 + \alpha\rho + \alpha^2\rho^2 + \dots\right)\ln M_t = \frac{1-\alpha}{1-\alpha\rho}\ln M_t.$$

For instance, with $\alpha = 0.5$ and $\rho = 0.9$ we get

$$\ln P_t = \frac{1-0.5}{1-0.5\times 0.9}\ln M_t \approx 0.909\ln M_t.$$

Take expectations of (7.1), based on information in $t$, of both sides

$$E_t\begin{bmatrix} x_{1t+1} \\ x_{2t+1} \end{bmatrix} = A\begin{bmatrix} x_{1t} \\ x_{2t} \end{bmatrix}. \tag{7.2}$$

We will first try to find the solution to (7.2), then reintroduce the shocks $\varepsilon_t$.

7.2 Matrix Decompositions

Remark 55 (Complex matrices.) Let $A^H$ denote the transpose of the complex conjugate of $A$, so that if

$$A = \begin{bmatrix} 1 & 2+3i \end{bmatrix} \text{ then } A^H = \begin{bmatrix} 1 \\ 2-3i \end{bmatrix}.$$

A square matrix $A$ is unitary (similar to orthogonal) if $A^H = A^{-1}$, for instance,

$$A = \begin{bmatrix} \frac{1+i}{2} & \frac{1+i}{2} \\ \frac{1-i}{2} & \frac{-1+i}{2} \end{bmatrix} \text{ gives } A^H = A^{-1} = \begin{bmatrix} \frac{1-i}{2} & \frac{1+i}{2} \\ \frac{1-i}{2} & \frac{-1-i}{2} \end{bmatrix}.$$

Remark 56 (Schur decomposition.) The decomposition of the $n\times n$ matrix $A$ gives the matrices $T$ and $Z$ such that

$$A = ZTZ^H, \tag{7.3}$$

where $Z$ is a unitary $n\times n$ matrix and $T$ is an $n\times n$ upper triangular Schur form with the eigenvalues along the diagonal. Note that premultiplying (7.3) with $Z^{-1} = Z^H$ and postmultiplying with $Z$ gives

$$T = Z^H AZ, \tag{7.4}$$

which is upper triangular. The ordering of the eigenvalues in $T$ can be reshuffled, although this requires that $Z$ is reshuffled conformably for (7.3) to keep holding; this involves a bit of tricky “book keeping.”

Remark 57 (Upper triangular matrices.) If $T$ is upper triangular, then $TT$ is as well.

Example 58 (Cagan again.) If $\alpha = 0.5$ and $\rho = 0.9$ in the Cagan model in Example 53, then

$$A = \begin{bmatrix} \rho & 0 \\ \frac{\alpha-1}{\alpha} & \frac{1}{\alpha} \end{bmatrix} = \begin{bmatrix} 0.9 & 0 \\ -1 & 2 \end{bmatrix},$$

so the Schur decomposition is

$$Z \approx \begin{bmatrix} -0.74 & 0.673 \\ -0.673 & -0.74 \end{bmatrix},\quad T = \begin{bmatrix} 0.9 & 1 \\ 0 & 2 \end{bmatrix}, \text{ and } Z^H \approx \begin{bmatrix} -0.74 & -0.673 \\ 0.673 & -0.74 \end{bmatrix}.$$

Note that $T$ is upper triangular, with the eigenvalues (0.9 and 2) along the diagonal, and $Z$ is unitary ($ZZ^H = I$). Note also that $A = ZTZ^H$ holds. In this example, $T$ and $Z$ are real since all eigenvalues are real.
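
The reordering “book keeping” of Remark 56 is handled by LAPACK's ordered Schur routines. As a minimal sketch, assuming SciPy is available, `scipy.linalg.schur` with `sort="iuc"` puts eigenvalues inside (or on) the unit circle first; it is applied here to the matrix of Example 58. The complex output is requested so that the sort option sees each eigenvalue as a single complex number. Signs and complex phases of the columns of $Z$ may differ from the numbers printed above, since they are not unique.

```python
import numpy as np
from scipy.linalg import schur

alpha, rho = 0.5, 0.9
A = np.array([[rho, 0.0],
              [(alpha - 1.0) / alpha, 1.0 / alpha]])  # [[0.9, 0], [-1, 2]]

# Complex Schur form A = Z T Z^H with eigenvalues inside the unit circle first
T, Z, sdim = schur(A, output="complex", sort="iuc")

print("diag(T):", np.round(np.diag(T).real, 4))  # stable root 0.9 first, then 2
print("|T[0, 1]|:", round(abs(T[0, 1]), 4))
print("number of stable roots:", sdim)
# Z is unitary and Z T Z^H reproduces A (up to rounding)
print(np.allclose(Z @ Z.conj().T, np.eye(2)), np.allclose(Z @ T @ Z.conj().T, A))
# The ratio of the first-column elements does not depend on sign or phase
print("Z[1,0] / Z[0,0]:", (Z[1, 0] / Z[0, 0]).real)
```

The last ratio, $Z_{\lambda\theta}/Z_{k\theta}$, is the coefficient that maps $\ln M_t$ into $\ln P_t$ in the solution of the model, here $10/11 \approx 0.909$.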

7.2.1 Why not a Spectral Decomposition?

Remark 59 (Spectral decomposition.) The $n$ eigenvalues ($\lambda_i$) and associated eigenvectors ($z_i$) of the $n\times n$ matrix $A$ satisfy

$$\left(A - \lambda_i I_n\right)z_i = 0_{n\times 1}. \tag{7.5}$$

If the eigenvectors are linearly independent, then we can decompose $A$ as

$$A = Z\Lambda Z^{-1}, \text{ where } \Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n) \text{ and } Z = \begin{bmatrix} z_1 & z_2 & \cdots & z_n \end{bmatrix},$$

so $\Lambda$ is a matrix with the eigenvalues along the diagonal and zeros elsewhere. To see why the spectral decomposition works, note that by (7.5) $AZ = Z\Lambda$, which can be postmultiplied by $Z^{-1}$. (Note that this decomposition can be quite convenient since the fact that $\Lambda$ is diagonal implies $A^2 = AA = Z\Lambda Z^{-1}Z\Lambda Z^{-1} = Z\Lambda\Lambda Z^{-1} = Z\Lambda^2 Z^{-1}$.)

Why should we not decompose $A$ with the help of eigenvalues and eigenvectors instead? We could if the eigenvectors were linearly independent (distinct eigenvalues is a sufficient, not necessary, condition for this). In this case, the approach in Section 7.3 still applies, but where we let $T = \Lambda$.

Often the eigenvectors are linearly dependent. This would create a fundamental problem when we try to “decouple” the system of difference equations (see below). We then have to use some other decomposition. The Jordan decomposition used by Blanchard and Kahn (1980) is perhaps the neatest, but also very difficult to calculate accurately (see Golub and van Loan (1989)). The calculation of the Schur decomposition is fairly robust, and is therefore widely implemented in software libraries.

Example 60 Consider the process $x_t - x_{t-1} = x_{t-1} - x_{t-2} + \varepsilon_t$. It can be written as a VAR(1) as

$$\begin{bmatrix} x_t \\ x_{t-1} \end{bmatrix} = \begin{bmatrix} 2 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} x_{t-1} \\ x_{t-2} \end{bmatrix} + \begin{bmatrix} \varepsilon_t \\ 0 \end{bmatrix}.$$

The VAR matrix has a repeated eigenvalue (1) and only one linearly independent eigenvector ([1, 1]), so its eigenvectors cannot form an invertible $Z$.

Example 61 (Cagan again.) The $A$ matrix in Example 53 has the following spectral decomposition

$$\begin{bmatrix} \rho & 0 \\ \frac{\alpha-1}{\alpha} & \frac{1}{\alpha} \end{bmatrix} = \begin{bmatrix} \frac{\alpha\rho-1}{\alpha-1} & 0 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} \rho & 0 \\ 0 & \frac{1}{\alpha} \end{bmatrix} \begin{bmatrix} \frac{\alpha\rho-1}{\alpha-1} & 0 \\ 1 & 1 \end{bmatrix}^{-1},$$

which would work fine since the eigenvectors are linearly independent.

7.3 Solving

7.3.1 “Decoupling”

Calculate the Schur decomposition (7.3) of $A$ in (7.2) and reorder it (both $T$ and $Z$, which is a bit tricky) so that the $n_\theta$ eigenvalues with modulus smaller than one come first. (Note that $T$ and $Z$ may include complex elements.) Partition $T$ accordingly

$$T = \begin{bmatrix} T_{\theta\theta} & T_{\theta\delta} \\ 0 & T_{\delta\delta} \end{bmatrix}. \tag{7.6}$$

If there are $n_\theta$ stable and $n_\delta$ unstable eigenvalues, then $T_{\theta\theta}$ is $n_\theta\times n_\theta$, $T_{\theta\delta}$ is $n_\theta\times n_\delta$, and $T_{\delta\delta}$ is $n_\delta\times n_\delta$.

Introduce the auxiliary variables

$$\begin{bmatrix} \theta_t \\ \delta_t \end{bmatrix} = Z^H\begin{bmatrix} x_{1t} \\ x_{2t} \end{bmatrix}. \tag{7.7}$$

Use $A = ZTZ^H$ in (7.2). Then, premultiply with the non-singular matrix $Z^H$ (“no information is lost,” that is, we get an equivalent system), and use (7.7) and (7.3)

$$\begin{aligned} Z^H E_t\begin{bmatrix} x_{1t+1} \\ x_{2t+1} \end{bmatrix} &= Z^H ZTZ^H\begin{bmatrix} x_{1t} \\ x_{2t} \end{bmatrix} && \text{(use } A = ZTZ^H \text{ in (7.2))} \\ E_t\begin{bmatrix} \theta_{t+1} \\ \delta_{t+1} \end{bmatrix} &= Z^H ZT\begin{bmatrix} \theta_t \\ \delta_t \end{bmatrix} && \text{(from (7.7))} \\ &= \begin{bmatrix} T_{\theta\theta} & T_{\theta\delta} \\ 0 & T_{\delta\delta} \end{bmatrix}\begin{bmatrix} \theta_t \\ \delta_t \end{bmatrix} && \text{(since } Z^H Z = I\text{)}. \end{aligned} \tag{7.8}$$
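
Example 60's point, that a spectral decomposition can break down while the Schur form used in (7.8) still exists, can be checked numerically. The sketch below assumes NumPy and SciPy: the eigenvector matrix returned by `numpy.linalg.eig` for the Example 60 VAR matrix is numerically singular, whereas the unitary Schur factor reconstructs the matrix without difficulty.

```python
import numpy as np
from scipy.linalg import schur

# VAR(1) matrix from Example 60: x_t - x_{t-1} = x_{t-1} - x_{t-2} + eps_t
A = np.array([[2.0, -1.0],
              [1.0, 0.0]])

eigvals, V = np.linalg.eig(A)  # columns of V are (numerical) eigenvectors
print("eigenvalues:", eigvals)  # both (numerically) equal to 1
# The two eigenvector columns are (numerically) parallel, so V is nearly singular
print("smallest singular value of V:", np.linalg.svd(V, compute_uv=False)[-1])

# The Schur form still exists and Z is unitary, so it is well conditioned
T, Z = schur(A, output="complex")
print(np.allclose(Z @ T @ Z.conj().T, A), np.allclose(Z.conj().T @ Z, np.eye(2)))
```

Because the eigenvalue is defective, the computed eigenvalues may differ from 1 by rounding of order $10^{-8}$, and inverting `V` would amplify such errors enormously; the Schur route never needs that inverse.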

7.3.2 Solving the System (7.8) of $E_t\theta_{t+1}$ and $E_t\delta_{t+1}$

Since $T_{\delta\delta}$ contains the roots outside the unit circle, $\delta_t$ will diverge as $t$ increases unless $\delta_0 = 0$. Any stable solution will therefore require that $\delta_t = 0$ for all $t$. The system (7.8) can therefore be written as

$$\delta_t = 0, \text{ and } E_t\theta_{t+1} = T_{\theta\theta}\theta_t. \tag{7.9}$$

7.3.3 Initial Values of $\theta_0$

Invert (7.7) and partition as

$$\begin{bmatrix} x_{1t} \\ x_{2t} \end{bmatrix} = \begin{bmatrix} Z_{k\theta} & Z_{k\delta} \\ Z_{\lambda\theta} & Z_{\lambda\delta} \end{bmatrix} \begin{bmatrix} \theta_t \\ \delta_t \end{bmatrix} = \begin{bmatrix} Z_{k\theta} \\ Z_{\lambda\theta} \end{bmatrix}\theta_t, \tag{7.10}$$

since $\delta_t = 0$.

The initial conditions are that $x_{10}$ is given. From (7.10) we have

$$x_{10} = Z_{k\theta}\theta_0, \tag{7.11}$$

which can be solved for $\theta_0$ if $Z_{k\theta}$ is invertible. It has $n_1$ rows (the number of predetermined/backward looking variables) and $n_\theta$ columns (as many as stable roots), so a necessary condition is that the number of stable roots equals the number of backward looking variables (Blanchard and Kahn (1980), proposition 1). If that is the case and $Z_{k\theta}$ is invertible, then

$$\theta_0 = Z_{k\theta}^{-1}x_{10}, \tag{7.12}$$

so we can calculate a unique stable solution by using (7.12) in (7.8) (and then transforming back to $x_{1t}$). This is developed below.

If the number of stable roots is less than the number of predetermined variables, $n_1$, then there is no stable solution. In contrast, if the number of stable roots is larger than the number of predetermined variables, $n_1$, then there is an infinite number of stable solutions. See Blanchard and Kahn (1980) for details.

Example 62 (Cagan model with a unique stable solution.) There is one predetermined variable (the exogenous $\ln M_t$) and the roots are $\rho$ and $1/\alpha$ (see Example 58 or 61), so if both $|\rho|$ and $|\alpha|$ are less than unity, then there is a unique stable solution, where $\theta_0 = Z_{k\theta}^{-1}\ln M_0$.

Example 63 (Cagan model with too many stable roots.) Consider the Cagan model in Example 53 again, but change the price equation to

$$\ln P_t = \ln M_t + aE_t\ln P_{t+1}, \text{ with } a > 0, \tag{7.13}$$

so the state-space form becomes

$$\begin{bmatrix} \ln M_{t+1} \\ E_t\ln P_{t+1} \end{bmatrix} = \begin{bmatrix} \rho & 0 \\ -\frac{1}{a} & \frac{1}{a} \end{bmatrix} \begin{bmatrix} \ln M_t \\ \ln P_t \end{bmatrix} + \begin{bmatrix} \varepsilon_{t+1} \\ 0 \end{bmatrix}, \tag{7.14}$$

with the eigenvalues $\rho$ (still assuming $|\rho| < 1$) and $1/a$. To illustrate, suppose $|a\rho| < 1$. Then, iterating on the price equation gives the stable fundamental solution

$$\ln P_t^* = \sum_{s=0}^{\infty}a^s E_t\ln M_{t+s} = \frac{1}{1-a\rho}\ln M_t.$$

However, the full set of solutions is $\ln P_t = \ln P_t^* + b_t$, where $b_t$ is a “bubble.” Try this in (7.13) to get

$$\begin{aligned} \frac{1}{1-a\rho}\ln M_t + b_t &= \ln M_t + aE_t\left(\frac{1}{1-a\rho}\ln M_{t+1} + b_{t+1}\right) \\ &= \ln M_t + \frac{a\rho}{1-a\rho}\ln M_t + aE_t b_{t+1}, \end{aligned}$$

which requires $b_t = aE_t b_{t+1}$. That means that $E_t b_{t+1} = b_t/a$. When $|a| < 1$ this means that the bubble is unstable, and we choose $b_t = 0$ to get an economically meaningful (stable) solution of the price level $\ln P_t = \ln P_t^* + b_t$. However, with $|a| > 1$, there is an infinity of stable bubbles (which all give a stable price level) and we have no good reason to choose one over another.

Example 64 (Cagan model with too many stable roots, continued.) The matrix in (7.14) can be decomposed in terms of the eigenvectors and eigenvalues

$$\begin{bmatrix} \rho & 0 \\ -\frac{1}{a} & \frac{1}{a} \end{bmatrix} = \begin{bmatrix} 1-a\rho & 0 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} \rho & 0 \\ 0 & \frac{1}{a} \end{bmatrix} \begin{bmatrix} 1-a\rho & 0 \\ 1 & 1 \end{bmatrix}^{-1}.$$

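When $|a| > 1$, both eigenvalues are stable, so (7.10) can be written

$$\begin{bmatrix} \ln M_t \\ \ln P_t \end{bmatrix} = \begin{bmatrix} 1-a\rho & 0 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} \theta_{1t} \\ \theta_{2t} \end{bmatrix},$$

where $\theta_{1t}$ and $\theta_{2t}$ are the elements of the vector $\theta_t$. We can identify $\theta_{1t} = \ln M_t/(1-a\rho)$ from the first equation. The second equation then says that $\ln P_t = \ln M_t/(1-a\rho) + \theta_{2t}$, where by (7.9) $E_t\theta_{2t+1} = \theta_{2t}/a$, so $\theta_{2t}$ is indeed a stable variable. However, beyond that we cannot say much about what it represents: this is the “bubble” discussed above.

Example 65 (Square, but singular, $Z_{k\theta}$.) It is possible that we have the right number of stable roots, but that $Z_{k\theta}$ is singular. This is a fairly odd case, but we cannot rule it out. For instance, consider a slight variation on the example by King and Watson (1998)

$$\begin{bmatrix} x_{1t+1} \\ x_{2t+1} \end{bmatrix} = \begin{bmatrix} 2 & 0 \\ \alpha & \frac{1}{2} \end{bmatrix} \begin{bmatrix} x_{1t} \\ x_{2t} \end{bmatrix}, \text{ with } x_{10} \text{ given}.$$

The model therefore has one stable root and one initial condition (it will turn out to be in the wrong place, however). The spectral decomposition is

$$\begin{bmatrix} 2 & 0 \\ \alpha & \frac{1}{2} \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 1 & \frac{2\alpha}{3} \end{bmatrix} \begin{bmatrix} \frac{1}{2} & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & \frac{2\alpha}{3} \end{bmatrix}^{-1}.$$

(7.9) and (7.10) become

$$\delta_t = 0 \text{ and } \theta_{t+1} = \frac{1}{2}\theta_t, \qquad \begin{bmatrix} x_{1t} \\ x_{2t} \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}\theta_t.$$

The stable auxiliary variable, $\theta_t$, is not related to the variable with an initial condition, $x_{1t}$, so $Z_{k\theta}$ is indeed singular (here it is zero). It is clear this model cannot have a stable solution since the solution for the first variable must be $x_{1t} = 2^t x_{10}$.

Example 66 (Fixing Example 65.) Change the model to

$$\begin{bmatrix} x_{1t+1} \\ x_{2t+1} \end{bmatrix} = \begin{bmatrix} 2 & \beta \\ 0 & \frac{1}{2} \end{bmatrix} \begin{bmatrix} x_{1t} \\ x_{2t} \end{bmatrix}, \text{ with } x_{10} \text{ given}.$$

The spectral decomposition is

$$\begin{bmatrix} 2 & \beta \\ 0 & \frac{1}{2} \end{bmatrix} = \begin{bmatrix} -\frac{2}{3}\beta & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} \frac{1}{2} & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} -\frac{2}{3}\beta & 1 \\ 1 & 0 \end{bmatrix}^{-1}.$$

In this case (7.9) and (7.10) become

$$\delta_t = 0 \text{ and } \theta_{t+1} = \frac{1}{2}\theta_t, \qquad \begin{bmatrix} x_{1t} \\ x_{2t} \end{bmatrix} = \begin{bmatrix} -\frac{2}{3}\beta \\ 1 \end{bmatrix}\theta_t.$$

If $\beta \neq 0$, then $Z_{k\theta}$ is non-singular and we have a unique stable solution. It is (since $x_{2t} = \theta_t$)

$$x_{1t+1} = \frac{1}{2}x_{1t} \text{ with } x_{10} \text{ given, and } x_{2t} = -\frac{3}{2\beta}x_{1t}.$$

This model has a unique stable solution when $\beta \neq 0$ since $x_{20}$ adjusts so that $x_{1t}$ will not explode. When $\beta = 0$, then $x_{1t}$ does not depend on $x_{20}$, so there is no possibility to put the system on a stable path (recall that $x_{1t}$ has an inherent tendency to be unstable).

The evolution of the deterministic system is (7.9) with (7.12) as starting values. (7.10) shows how to transform to expected values of $x_{1t}$ and $x_{2t}$.

7.3.4 Putting the Innovations Back

From the model in state-space form (7.1) we know that $x_{1t+1} - E_t x_{1t+1} = \varepsilon_{t+1}$. Using (7.10) to rewrite this gives

$$Z_{k\theta}\left(\theta_{t+1} - E_t\theta_{t+1}\right) = \varepsilon_{t+1}, \tag{7.15}$$

which under the same conditions as above can be inverted

$$\theta_{t+1} = E_t\theta_{t+1} + Z_{k\theta}^{-1}\varepsilon_{t+1}. \tag{7.16}$$

Combined with (7.9) we have

$$\theta_{t+1} = T_{\theta\theta}\theta_t + Z_{k\theta}^{-1}\varepsilon_{t+1}, \tag{7.17}$$

which with (7.12) and (7.10) is a complete solution of the stochastic model.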
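
The steps (7.6)–(7.17) fit in a short function. The following is a minimal sketch, assuming SciPy; `solve_saddle_path` is a hypothetical name, not code from the references. It computes the ordered complex Schur form, checks the root count against $n_1$ and the invertibility of $Z_{k\theta}$, and returns the matrices of the reduced form $x_{1t+1} = Z_{k\theta}T_{\theta\theta}Z_{k\theta}^{-1}x_{1t} + \varepsilon_{t+1}$ and $x_{2t} = Z_{\lambda\theta}Z_{k\theta}^{-1}x_{1t}$ (the mapping derived in Section 7.3.5).

```python
import numpy as np
from scipy.linalg import schur


def solve_saddle_path(A, n1, tol=1e-10):
    """Solve [x1_{t+1}; E_t x2_{t+1}] = A [x1_t; x2_t] + [eps_{t+1}; 0].

    Returns (M, C) with x1_{t+1} = M x1_t + eps_{t+1} and x2_t = C x1_t.
    """
    A = np.asarray(A, dtype=float)
    # A = Z T Z^H, with eigenvalues inside (or on) the unit circle ordered first
    T, Z, n_stable = schur(A, output="complex", sort="iuc")
    if n_stable != n1:
        raise ValueError(f"{n_stable} stable roots but {n1} predetermined variables")
    Z_kt = Z[:n1, :n1]   # Z_{k theta}
    Z_lt = Z[n1:, :n1]   # Z_{lambda theta}
    T_tt = T[:n1, :n1]   # T_{theta theta}
    if np.linalg.svd(Z_kt, compute_uv=False)[-1] < tol:
        raise ValueError("Z_k_theta is singular: no unique stable solution")
    Z_kt_inv = np.linalg.inv(Z_kt)
    M = Z_kt @ T_tt @ Z_kt_inv
    C = Z_lt @ Z_kt_inv
    # For a real A, M and C are real; imaginary parts are rounding noise
    return M.real, C.real


# Cagan model of Example 58 (alpha = 0.5, rho = 0.9)
M, C = solve_saddle_path([[0.9, 0.0], [-1.0, 2.0]], n1=1)
print(M, C)  # M is 0.9 and C is 10/11, up to rounding
```

As further checks, Example 66 with $\beta = 1$ should give $M = 1/2$ and $C = -3/2$, while Example 65 should trigger the singular-$Z_{k\theta}$ error. Using the complex Schur form avoids the 2×2 blocks that a real Schur form produces for complex eigenvalue pairs.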

7.3.5 Dynamics in Terms of $x_{1t}$ and $x_{2t}$

Using $\theta_t = Z_{k\theta}^{-1}x_{1t}$ from (7.10) in (7.17) gives

$$x_{1t+1} = Z_{k\theta}T_{\theta\theta}Z_{k\theta}^{-1}x_{1t} + \varepsilon_{t+1}. \tag{7.18}$$

Similarly, combining $x_{2t} = Z_{\lambda\theta}\theta_t$ with $\theta_t = Z_{k\theta}^{-1}x_{1t}$ (both from (7.10)) gives

$$x_{2t} = Z_{\lambda\theta}Z_{k\theta}^{-1}x_{1t}. \tag{7.19}$$

We can summarize these as

$$x_{1t+1} = Mx_{1t} + \varepsilon_{t+1}, \text{ and} \tag{7.20}$$

$$x_{2t} = Cx_{1t}, \tag{7.21}$$

where the definitions of the $M$ and $C$ matrices are clear from comparing with (7.18) and (7.19).

Example 67 (Solving the Cagan model.) From the Cagan model in Example 58, and (7.19) we have

$$\begin{aligned} \ln P_t &= Z_{\lambda\theta}Z_{k\theta}^{-1}\ln M_t \\ &\approx -0.673\,(-0.74)^{-1}\ln M_t \\ &\approx 0.909\ln M_t \quad \text{(exact answer is } \tfrac{10}{11}\ln M_t\text{)}, \end{aligned}$$

and (7.18) recovers the AR(1) of money supply

$$\begin{aligned} \ln M_{t+1} &= Z_{k\theta}T_{\theta\theta}Z_{k\theta}^{-1}\ln M_t + \varepsilon_{t+1} \\ &= -0.74\times 0.9\times(-0.74)^{-1}\ln M_t + \varepsilon_{t+1} \\ &= 0.9\ln M_t + \varepsilon_{t+1}. \end{aligned}$$

7.4 Time Series Representation∗

This section summarizes some useful tools for calculating unconditional variances from a VAR(1) model.

Consider the VAR(1)

$$y_{t+1} = My_t + \varepsilon_{t+1}, \tag{7.22}$$

where $\varepsilon_{t+1}$ has the covariance matrix $\Sigma$. We know that the impulse response function is

$$y_t = \varepsilon_t + M\varepsilon_{t-1} + M^2\varepsilon_{t-2} + \dots, \tag{7.23}$$

which immediately gives the unconditional covariance matrix of $y_t$ as (since $E\varepsilon_t\varepsilon_{t-s}' = 0$ if $s \neq 0$)

$$\operatorname{Cov}(y_t) = \Sigma + M\Sigma M' + M^2\Sigma\left(M^2\right)' + \dots \tag{7.24}$$

This sum converges when all eigenvalues of $M$ are inside the unit circle. It is easily calculated by iterating until convergence on $\operatorname{Cov}(y_t)$ in

$$\operatorname{Cov}(y_t) = \Sigma + M\operatorname{Cov}(y_t)M'. \tag{7.25}$$

The iteration could start by setting $\operatorname{Cov}(y_t)$ on the right hand side to a matrix of zeros. (An exact formula exists, but it usually gives longer computation time and less accuracy.)

Example 68 (Why (7.25) works.) Consider iterating on $A_{s+1} = B + CA_sC'$ by starting from $A_0 = 0$. The first iteration gives $A_1 = B$, the second $A_2 = B + CA_1C' = B + CBC'$, the third $A_3 = B + CA_2C' = B + C(B + CBC')C' = B + CBC' + CCBC'C'$, and so forth. Continue until $A_{s+1} \approx A_s$.

Now, consider a linear combination of $y_t$

$$z_t = Cy_t. \tag{7.26}$$

It is straightforward to calculate the impulse responses of $z_t$ by combining (7.23) and (7.26). It is also straightforward to calculate the unconditional covariance matrix of $z_t$ as

$$\operatorname{Cov}(z_t) = C\operatorname{Cov}(y_t)C'. \tag{7.27}$$

8 A Menu of Different Policy Rules

8.1 A “Simple” Policy Rule

Reference: Currie and Levine (1993) and Söderlind (1999).

We now change (7.1) by adding an effect of a vector of policy instruments, $u_t$,

$$\begin{bmatrix} x_{1t+1} \\ E_t x_{2t+1} \end{bmatrix} = A\begin{bmatrix} x_{1t} \\ x_{2t} \end{bmatrix} + Bu_t + \begin{bmatrix} \varepsilon_{t+1} \\ 0_{n_2\times 1} \end{bmatrix}, \tag{8.1}$$
with $x_{10}$ given.

The policy instrument is assumed to be a linear function of $x_{1t}$ and $x_{2t}$ (this might force you to change the definition of $x_{1t}$; you can always add variables)

$$u_t = -F\begin{bmatrix} x_{1t} \\ x_{2t} \end{bmatrix}. \tag{8.2}$$

Substituting for $u_t$ in (8.1) gives

$$\begin{bmatrix} x_{1t+1} \\ E_t x_{2t+1} \end{bmatrix} = (A - BF)\begin{bmatrix} x_{1t} \\ x_{2t} \end{bmatrix} + \begin{bmatrix} \varepsilon_{t+1} \\ 0_{n_2\times 1} \end{bmatrix}, \tag{8.3}$$

which is on the same form as (7.1), but where $A - BF$ replaces $A$. We can therefore apply the solution algorithm above.

8.2 Optimal Policy under Commitment

Reference: Currie and Levine (1993), Backus and Driffill (1986), Svensson (1994), and Söderlind (1999).

Let

$$x_t = \begin{bmatrix} x_{1t} \\ x_{2t} \end{bmatrix}, \tag{8.4}$$

where $x_{1t}$ is an $n_1\times 1$ vector of “backward looking” variables and $x_{2t}$ an $n_2\times 1$ vector of “forward looking” variables. Let $n = n_1 + n_2$, so $x_t$ is an $n\times 1$ vector.

The problem is to minimize the loss function

$$J_0 = E_0\sum_{t=0}^{\infty}\beta^t\begin{bmatrix} x_t' & u_t' \end{bmatrix}\begin{bmatrix} Q & U \\ U' & R \end{bmatrix}\begin{bmatrix} x_t \\ u_t \end{bmatrix}, \tag{8.5}$$

by choosing an optimal sequence of the $k\times 1$ vector of policy instruments, $u_t$. The constraints are

$$\begin{bmatrix} x_{1t+1} \\ E_t x_{2t+1} \end{bmatrix} = A\begin{bmatrix} x_{1t} \\ x_{2t} \end{bmatrix} + Bu_t + \begin{bmatrix} \varepsilon_{t+1} \\ 0_{n_2\times 1} \end{bmatrix}, \text{ or} \tag{8.6}$$

$$x_{t+1} = Ax_t + Bu_t + \xi_{t+1}, \text{ and } x_{10} \text{ given}, \tag{8.7}$$

where $\xi_{t+1} = (\varepsilon_{t+1}, x_{2t+1} - E_t x_{2t+1})$. The second part of $\xi_{t+1}$ are the innovations in $x_{2t+1}$, which have to be functions of $\varepsilon_{t+1}$, but that is something we will return to later.

Form the Lagrangian

$$L_0 = \min_{\{u_t\}_{t=0}^{\infty}} E_0\sum_{t=0}^{\infty}\beta^t\left[x_t'Qx_t + 2x_t'Uu_t + u_t'Ru_t + 2\rho_{t+1}'\left(Ax_t + Bu_t + \xi_{t+1} - x_{t+1}\right)\right]. \tag{8.8}$$

The $k$ first order conditions for $u_t$ are

$$-B'E_t\rho_{t+1} = U'x_t + Ru_t. \tag{8.9}$$

The $n$ first order conditions for $x_t$ are

$$\beta A'E_t\rho_{t+1} = \rho_t - \beta Qx_t - \beta Uu_t. \tag{8.10}$$

We can write (8.7), (8.10), and (8.9) as

$$\begin{bmatrix} I_n & 0_{n\times k} & 0_{n\times n} \\ 0_{n\times n} & 0_{n\times k} & \beta A' \\ 0_{k\times n} & 0_{k\times k} & -B' \end{bmatrix} \begin{bmatrix} x_{t+1} \\ u_{t+1} \\ E_t\rho_{t+1} \end{bmatrix} = \begin{bmatrix} A & B & 0_{n\times n} \\ -\beta Q & -\beta U & I_n \\ U' & R & 0_{k\times n} \end{bmatrix} \begin{bmatrix} x_t \\ u_t \\ \rho_t \end{bmatrix} + \begin{bmatrix} \xi_{t+1} \\ 0 \\ 0 \end{bmatrix}. \tag{8.11}$$

8.2.1 Initial Conditions

(See Currie and Levine (1993) p 171.)

We have $n_1$ initial conditions from the predetermined $x_{10}$, and $n_2$ from $\rho_{20} = 0_{n_2\times 1}$. The control variables in $u_t$ (a $k\times 1$ vector) should belong to the forward looking variables, since we have no initial value for them. Note that $\rho_{2t}$ will typically be non-zero, except in the initial period. This can be interpreted as if the policy maker in $t = 0$ exploits the fact that private sector expectations were formed in $t < 0$ (and these expectations still influence today's economy, for instance, through capital stocks and prices determined in previous periods). In fact, there is always a temptation to exploit this, that is, to set policy in such a way that $\rho_{2t} = 0$, but the commitment rules this out, except in the initial period. The “timeless perspective,” advocated by Woodford (1999), is essentially to use the policy rule that comes out from solving (8.11), but only from some period $t > 0$ where $\rho_{2t}$ is set to some non-zero value such that the policy is stationary.
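
Evaluating a loss such as (8.5) or (5.30) for a given rule typically requires the unconditional covariance of the solved system (7.20)–(7.21). For instance, with zero targets and zero-mean shocks, the unconditional mean of the period loss (5.31) is a weighted sum of the variances of $\pi_t$, $y_t$, and $i_t$, which are diagonal elements of a matrix like (7.27). The sketch below, assuming NumPy and SciPy and using a hypothetical helper `unconditional_cov` and made-up numbers, implements the iteration on (7.25) and cross-checks it against SciPy's direct solver of the same discrete Lyapunov equation.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov


def unconditional_cov(M, Sigma, tol=1e-12, max_iter=100_000):
    """Iterate Cov = Sigma + M Cov M' from Cov = 0 until convergence, as in (7.25)."""
    M = np.atleast_2d(np.asarray(M, dtype=float))
    Sigma = np.atleast_2d(np.asarray(Sigma, dtype=float))
    if np.max(np.abs(np.linalg.eigvals(M))) >= 1.0:
        raise ValueError("(7.24) diverges unless all eigenvalues of M are inside the unit circle")
    cov = np.zeros_like(Sigma)
    for _ in range(max_iter):
        new_cov = Sigma + M @ cov @ M.T
        if np.max(np.abs(new_cov - cov)) < tol:
            return new_cov
        cov = new_cov
    raise RuntimeError("iteration on (7.25) did not converge")


# A two-variable example with illustrative (made-up) numbers
M = np.array([[0.5, 0.1],
              [0.0, 0.8]])
Sigma = np.array([[1.0, 0.2],
                  [0.2, 0.5]])
cov_y = unconditional_cov(M, Sigma)
# Cross-check with the direct solution of X = M X M' + Sigma
print(np.allclose(cov_y, solve_discrete_lyapunov(M, Sigma)))

# (7.26)-(7.27): covariance of the linear combination z_t = C y_t
C = np.array([[1.0, -1.0]])
cov_z = C @ cov_y @ C.T
print(cov_z)
```

For a scalar AR(1) with coefficient 0.9 and unit shock variance, the same function returns $1/(1-0.81) \approx 5.263$, the textbook AR(1) variance.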

104 105
8.2.2 Solution

It can be shown that the solution is of the form
$$
\begin{bmatrix} x_{1t+1} \\ \rho_{2t+1} \end{bmatrix}
= M \begin{bmatrix} x_{1t} \\ \rho_{2t} \end{bmatrix}
+ \begin{bmatrix} \varepsilon_{t+1} \\ 0_{n_2 \times 1} \end{bmatrix}, \tag{8.12}
$$
and
$$
\begin{bmatrix} x_{2t} \\ u_t \\ \rho_{1t} \end{bmatrix}
= C \begin{bmatrix} x_{1t} \\ \rho_{2t} \end{bmatrix}. \tag{8.13}
$$
Equations (8.12) and (8.13), together with the initial values $x_{10}$ and $\rho_{20} = 0_{n_2 \times 1}$, give a complete description of the evolution of the system.

8.3 Discretionary Solution

References: Currie and Levine (1993), Backus and Driffill (1986), Oudiz and Sachs (1985), Svensson (1994), and Söderlind (1999).

The model is the same as in the commitment case, but the policy maker cannot commit to a policy rule. Instead, the policy maker reoptimizes every period.

1. We can find a stationary policy rule if we let the time horizon go to infinity.

2. The state of the economy is given by the predetermined variables, $x_{1t}$. As a consequence, the decision rule and the non-predetermined variables, $x_{2t}$, must be linear functions of $x_{1t}$ ($u_t = -F x_{1t}$ and $x_{2t} = C x_{1t}$, respectively, in the stationary equilibrium).

3. The policy maker takes the expectations of private agents as given (“Nash equilibrium,” unlike the commitment case, where the policy maker is a “Stackelberg leader”). From item 2, these expectations are $\mathrm{E}_t x_{2t+1} = C \mathrm{E}_t x_{1t+1}$.

4. No closed-form solution exists, and there is not even a proof (except in the scalar case) that the solution algorithm converges. The solution algorithm is backwards recursive, starting from a distant period in time, $T$. First, we find the optimal policy for $T$, then for $T - 1$ (incorporating the knowledge of how policy in $T$ will be set), etc. The recursion continues until the policy rule has converged.

The solution is then of the form
$$
x_{1t+1} = M x_{1t} + \varepsilon_{t+1}, \tag{8.14}
$$
$$
u_t = -F x_{1t}, \tag{8.15}
$$
$$
x_{2t} = C x_{1t}. \tag{8.16}
$$

Bibliography

Backus, D., and J. Driffill, 1986, “The Consistency of Optimal Policy in Stochastic Rational Expectations Models,” CEPR Discussion Paper 124.

Blanchard, O. J., and C. M. Kahn, 1980, “The Solution of Linear Difference Models under Rational Expectations,” Econometrica, 48, 1305–1311.

Currie, D., and P. Levine, 1993, Rules, Reputation and Macroeconomic Policy Coordination, Cambridge University Press.

Golub, G. H., and C. F. van Loan, 1989, Matrix Computations, The Johns Hopkins University Press, Baltimore, MD, 2nd edn.

King, R. G., and M. W. Watson, 1998, “The Solution of Singular Linear Difference Systems under Rational Expectations,” International Economic Review, 39, 1015–1026.

Klein, P., 2000, “Using the Generalized Schur Form to Solve a Multivariate Linear Rational Expectations Model,” Journal of Economic Dynamics and Control, 24, 1405–1423.

Oudiz, G., and J. Sachs, 1985, “International Policy Coordination in Dynamic Macroeconomic Models,” in Willem H. Buiter and Richard C. Marston (eds.), International Economic Policy Coordination, Cambridge University Press, Cambridge.

Söderlind, P., 1999, “Solution and Estimation of RE Macromodels with Optimal Policy,” European Economic Review, 43, 813–823.

Svensson, L. E. O., 1994, “Why Exchange Rate Bands? Monetary Independence in Spite of Fixed Exchange Rates,” Journal of Monetary Economics, 33, 157–199.
Woodford, M., 1999, “Commentary: How Should Monetary Policy Be Conducted in an
Era of Price Stability?,” in New Challenges for Monetary Policy, Federal Reserve Bank
of Kansas City.
9 Estimation of New Keynesian Models

9.1 “New Keynesian Economics and the Phillips Curve” by Roberts

(JMCB 1995)
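Rotemberg or Calvo or... (several sticky-price models lead to the same equation)

$$
\Delta p_t - \mathrm{E}_t \Delta p_{t+1} = c_0 + \gamma y_t + \varepsilon_t \tag{9}
$$

From theory: $\gamma > 0$; $\varepsilon_t$ could be serially correlated and correlated with $y_t$, where $y_t$ is the output gap.
Part of $\varepsilon_t$ could be due to real oil prices:

$$
\Delta p_t - \mathrm{E}_t \Delta p_{t+1} = c_0 + \gamma y_t + c_1 \Delta \mathit{rpoil}_t + c_2 \Delta \mathit{rpoil}_{t-1} + \varepsilon_t
$$

Also tried the unemployment rate ($-\gamma RU_t$) instead of the output gap ($+\gamma y_t$).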
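The remark that $\varepsilon_t$ may be correlated with $y_t$ matters for estimation: LS is then inconsistent, while an instrument that moves $y_t$ but is uncorrelated with $\varepsilon_t$ restores consistency. The simulation below is a minimal sketch of that point using only the Python standard library. The data-generating process, the instrument `z` (think of a hypothetical exogenous shifter such as $G_t$), all parameter values, and the helper names `ols_slope`, `iv_slope` and `simulate` are made up for illustration and are not taken from Roberts' paper. With a single instrument, 2SLS reduces to the simple IV ratio used here.

```python
import random


def ols_slope(x, y):
    """LS slope from regressing y on a constant and x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def iv_slope(z, x, y):
    """Just-identified IV slope Cov(z, y) / Cov(z, x), constant included."""
    mz, mx, my = sum(z) / len(z), sum(x) / len(x), sum(y) / len(y)
    szy = sum((a - mz) * (b - my) for a, b in zip(z, y))
    szx = sum((a - mz) * (b - mx) for a, b in zip(z, x))
    return szy / szx


def simulate(gamma=0.3, c0=0.0, n=20_000, seed=1):
    """Hypothetical DGP (not from the paper) where eps_t is correlated with y_t."""
    rng = random.Random(seed)
    z, ygap, lhs = [], [], []
    for _ in range(n):
        z_t = rng.gauss(0.0, 1.0)   # exogenous instrument, uncorrelated with eps_t
        v_t = rng.gauss(0.0, 1.0)   # shock hitting both y_t and eps_t
        e_t = rng.gauss(0.0, 1.0)
        eps_t = 0.8 * v_t + e_t     # Phillips-curve residual
        y_t = z_t + v_t             # output gap, so Cov(y_t, eps_t) = 0.8
        z.append(z_t)
        ygap.append(y_t)
        lhs.append(c0 + gamma * y_t + eps_t)  # plays the role of dp_t - E_t dp_{t+1}
    return z, ygap, lhs


z, ygap, lhs = simulate()
print("LS slope:", round(ols_slope(ygap, lhs), 3))
print("IV slope:", round(iv_slope(z, ygap, lhs), 3))
```

Under this made-up DGP the probability limit of the LS slope is $\gamma + \mathrm{Cov}(y_t, \varepsilon_t)/\mathrm{Var}(y_t) = 0.3 + 0.8/2 = 0.7$, while the IV slope converges to $\gamma = 0.3$.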

9.1.1 How to Measure $\mathrm{E}_t \Delta p_{t+1}$?

Survey data (Michigan, Livingston)
RE/ex post data (McCallum): use $\Delta p_{t+1} = \mathrm{E}_t \Delta p_{t+1} + u_{t+1}$ to substitute for $\mathrm{E}_t \Delta p_{t+1} = \Delta p_{t+1} - u_{t+1}$:
$$
\Delta p_t - \Delta p_{t+1} = c_0 + \gamma y_t + \varepsilon_t - u_{t+1}
$$
$u_{t+1}$ is (under RE) uncorrelated with the regressor (dated $t$), but adds extra noise.
Notice that having $-\mathrm{E}_t \Delta p_{t+1}$ on the LHS in (9) is crucial for the RE approach (at least if we were to use LS). Consider the alternative
$$
\begin{aligned}
\Delta p_t &= \beta \mathrm{E}_t \Delta p_{t+1} + c_0 + \gamma y_t + \varepsilon_t \\
&= \beta \Delta p_{t+1} + c_0 + \gamma y_t + \varepsilon_t - \beta u_{t+1}
\end{aligned}
$$
Correlation of residual and regressor: the regressor $\Delta p_{t+1}$ contains $u_{t+1}$, which is also part of the residual $\varepsilon_t - \beta u_{t+1}$.

9.1.2 Data

Sample: annual data 1949–1990
$\Delta p_t$: CPI, Dec to Dec
$\mathit{rpoil}$: oil price/GNP deflator
$y_t$: GNP deviations from deterministic trend (linear-quadratic)

9.1.3 Estimation Method

$\varepsilon_t$ could be correlated with $y_t$ ⇒ IV (or rather 2SLS)
Instruments: $\mathit{rpoil}_t$, $\mathit{rpoil}_{t-1}$, $G_t$, $G_{t-1}$, dummy (democrat)

9.1.4 Results

$\gamma$ is reasonable (though not significant when using the RE approach): 0.2–0.4 (cf ≈ 0.6 in PS lecture notes)
Residuals are autocorrelated (adjusted std, Newey-West?)
Subsample stability (split in 1973) when using survey data, perhaps not with the RE approach
[See Table 1 of the paper]

9.2 “Solution and Estimation of RE Macromodels with Optimal Monetary Policy” by Söderlind

(EER 1999)
Simplified version of the model by Fuhrer-Moore
$$
y_t = \alpha_1 y_{t-1} + \alpha_2 y_{t-2} + \alpha_r r_{t-1} + \varepsilon_{yt},
$$
$$
r_t = \frac{1}{41} \mathrm{E}_t \sum_{s=0}^{\infty} \left( \frac{40}{41} \right)^{s} \left( i_{t+s} - \pi_{t+1+s} \right),
$$
The log price level is the average of the wage contracts still in effect
$$
p_t = \theta_0 w_t + \theta_1 w_{t-1} + (1 - \theta_0 - \theta_1) w_{t-2}.
$$
Nominal wage contracts are set for 3 periods: increasing in the price level and $y_t$ (coefficient $\gamma$)
$$
L_t = \mathrm{E}_t \sum_{s=0}^{\infty} \beta^{s} \left[ q_y y_{t+s}^2 + (1 - q_y) \pi_{t+s}^2 + q_i i_{t+s}^2 \right],
$$
Söderlind (Journal of Policy Modeling, 2001) estimates the model on quarterly US data from the mid 1960s to the mid 1990s:
$y_t$: log real GNP per capita, detrended with a linear trend
$\pi_t$: CPI inflation
$i_t$: T-bill rate ($i_{1,t}$)
Monte Carlo experiment [See Table 1 of the paper]
The means of the estimates are generally close to the true values, even if...

1. the short sample exaggerates the effectiveness of monetary policy by making (i)
   output too responsive to real interest rates ($\alpha_r$ too low), and (ii) wage setting too responsive to output ($\gamma$ too high)

2. underestimates the willingness to use monetary policy ($q_i$ too high). These parameters, and also the weight on output in the loss function, $q_y$, have considerably higher standard deviations than predicted by asymptotic theory

3. The correlations of the parameters “stabilize” the properties of the model:
   (a) $\alpha_1$ and $\alpha_2$ are strongly negatively correlated (−0.85), which keeps the autocorrelation of output relatively stable across simulations.
   (b) $\alpha_r$ and $q_i$ are negatively correlated (−0.65), which keeps the effect of monetary policy on output and inflation relatively constant.
   (c) average contract length ($\theta_0 + 2\theta_1 + 3(1 - \theta_0 - \theta_1)$) and $\gamma$ are strongly positively correlated (0.83), which keeps the effect of a price shock on inflation and output relatively unchanged across simulations.

9.3 “Estimating The Euler Equation for Output” by Fuhrer and Rudebusch

(JME 2004)
Euler equation for the output gap
$$
y_t = \mathrm{E}_t y_{t+1} - \sigma \left( i_t - \mathrm{E}_t \pi_{t+1} \right) + \eta_t \tag{1}
$$
Correct only when $C = Y$ (consumption equals output)
Empirical problems ⇒ hybrid model
$$
y_t = \alpha_0 + \alpha_1 y_{t-1} + \alpha_2 y_{t-2} + \mu \mathrm{E}_{t-\tau} y_{t+1} - \beta \mathrm{E}_{t-\tau} \left[ \frac{1}{\kappa} \sum_{j=0}^{\kappa-1} \left( i_{t+j+m} - \pi_{t+j+m+1} \right) \right] \tag{2}
$$
Lags of output, timing of expectations ($\tau$), long ($\kappa$-period) real interest rate (lagged?, $m$)

9.3.1 Data

1966Q1–2000Q4
$i$: averages of the federal funds rate (FFR)
$\pi$: GDP deflator inflation
$y$: 5 different measures of the output gap

9.3.2 Estimation Methods

ML:
   (a) Estimate a VAR(4) of $\pi$ and $i$, and keep it fixed
   (b) Guess parameters in (2), solve for equilibrium, find residuals and use them in the likelihood function (SSE?); repeat with new parameters...
GMM (or rather IV/2SLS):
   Instruments: four lags of $y_t$, $\pi_t$, $i_t$, or four lags of defense spending, oil prices and dummy (democrat)

9.3.3 Results

ML: lags of output ($\alpha_1$, $\alpha_2$) large and significant; $\mu$ varies across specifications (perhaps ≈ 0), $\beta$ small whenever $\mu$ is large
GMM: like ML, but $\mu$ tends to be larger

9.3.4 Monte Carlo Simulations

Purpose: study the small sample properties of the ML and GMM estimators of $\mu$ and $\beta$.
Simulations made for different true DGPs
ML: no small sample bias in either $\mu$ or $\beta$
GMM: $\mu$ biased towards 0.5, $\beta$ downward biased when true $\mu$ is small. GMM is much less precise than ML. Weak instruments...

9.4 “New-Keynesian Models and Monetary Policy: A Reexamination of the Stylized Facts” by Söderström et al

(Scandinavian Journal of Economics, forthcoming)
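Model from Rudebusch [See Tables 1, 3 and 4 and Figures 3 and 4 of the paper]
$$
\pi_t = \mu_\pi \mathrm{E}_{t-1} \bar{\pi}_{t+3} + (1 - \mu_\pi) \sum_{j=1}^{4} \alpha_{\pi j} \pi_{t-j} + \alpha_y y_{t-1} + \varepsilon_t, \tag{9.1}
$$
$$
y_t = \mu_y \mathrm{E}_{t-1} y_{t+1} + (1 - \mu_y) \sum_{j=1}^{2} \beta_{yj} y_{t-j} - \beta_r \left( i_{t-1} - \mathrm{E}_{t-1} \bar{\pi}_{t+3} \right) + \eta_t, \tag{9.2}
$$
$$
\min_{\{i_t\}} \; \mathrm{Var}[\bar{\pi}_t] + \lambda \mathrm{Var}[y_t] + \nu \mathrm{Var}[\Delta i_t]. \tag{9.3}
$$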
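The loss (9.3) is a weighted sum of unconditional variances. As a rough check of a candidate policy, those variances can be approximated by sample variances of simulated (or observed) quarterly series. The sketch below is only that approximation, not the paper's computation, and it assumes (this is not stated in the notes) that $\bar{\pi}_t$ is the four-quarter average $(\pi_t + \pi_{t-1} + \pi_{t-2} + \pi_{t-3})/4$. The function names `variance` and `policy_loss` are hypothetical.

```python
def variance(x):
    """Sample variance, dividing by the number of observations."""
    m = sum(x) / len(x)
    return sum((a - m) ** 2 for a in x) / len(x)


def policy_loss(pi, y, i, lam, nu):
    """Sample analogue of Var[pibar_t] + lam*Var[y_t] + nu*Var[di_t] in (9.3).

    ASSUMPTION: pibar_t is the four-quarter average (pi_t + ... + pi_{t-3}) / 4.
    pi, y, i are quarterly series (lists of numbers) of the same length.
    """
    pibar = [sum(pi[t - 3:t + 1]) / 4 for t in range(3, len(pi))]
    di = [i[t] - i[t - 1] for t in range(1, len(i))]  # interest rate changes
    return variance(pibar) + lam * variance(y) + nu * variance(di)


print(round(policy_loss([1, 2, 3, 4, 5], [1, -1, 1, -1, 0], [0, 1, 0, 1, 0],
                        lam=0.5, nu=0.2), 6))  # 0.85
```

In the toy call, $\bar{\pi}$ takes the values 2.5 and 3.5 (variance 0.25), $\mathrm{Var}[y] = 0.8$ and $\mathrm{Var}[\Delta i] = 1$, so the loss is $0.25 + 0.5 \cdot 0.8 + 0.2 \cdot 1 = 0.85$.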

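The parameters $\{\lambda, \nu, \mu_\pi, \mu_y, \sigma_\pi, \sigma_y\}$ are found by minimizing the function
$$
\left( \xi - \hat{\xi} \right)' \hat{V}^{-1} \left( \xi - \hat{\xi} \right), \tag{9.4}
$$
where $\hat{\xi}$ are the moments in Table 1 (standard deviations and autocorrelations), $\xi$ the corresponding moments implied by the model, and $\hat{V}$ the covariance matrix of $\hat{\xi}$.
Other parameters are taken from Rudebusch (Table 3).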
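The criterion (9.4) can be evaluated without explicitly inverting $\hat{V}$: solve the linear system $\hat{V} w = \xi - \hat{\xi}$ and take the inner product of $w$ with $\xi - \hat{\xi}$. A minimal standard-library sketch follows; the helper names `solve` and `md_objective` are hypothetical. In an actual estimation, `xi_model` would be recomputed from the solved model for every candidate parameter vector and the criterion passed to a numerical minimizer, neither of which is shown here.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[r]] for r, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def md_objective(xi_model, xi_hat, V_hat):
    """(xi - xi_hat)' V_hat^{-1} (xi - xi_hat), as in (9.4)."""
    d = [m - h for m, h in zip(xi_model, xi_hat)]
    w = solve(V_hat, d)  # w = V_hat^{-1} d
    return sum(a * b for a, b in zip(d, w))


print(round(md_objective([1.0, 1.0], [0.0, 0.0], [[2.0, 1.0], [1.0, 2.0]]), 6))  # 0.666667
```

Solving instead of inverting is the usual numerically safer choice; with a diagonal $\hat{V}$ the criterion collapses to a sum of squared moment deviations, each scaled by its own variance.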
Key findings:

1. a small concern for output stability ($\lambda$ low) but a large preference for interest rate smoothing ($\nu$ high)

2. a small degree of forward-looking behavior in price setting ($\mu_\pi$ low) but a large degree of forward-looking behavior in the determination of output ($\mu_y$ high), which differs from Fuhrer and Rudebusch

Analysis of why we get these results:

1. larger values of λ (output stability) than in our estimated configuration imply too
high volatility and persistence of inflation and too low volatility and persistence of
output compared with U.S. data

2. while small decreases in ν (interest rate stability) have little effect, a positive value
for ν is crucial to avoid excessive volatility in the interest rate relative to the data

3. larger values of µπ than in our estimation imply too low volatility of output and the
interest rate and too low persistence of inflation and output

4. smaller values of $\mu_y$ lead to excessive volatility in the interest rate and inflation and
excessive persistence in inflation
