
# Review Topic 1: Simple Regression

Introductory Financial Econometrics
Review of Econometric Theory
3 Credits, 51 Hours
Jianhua Gang

Wooldridge, Ch.2

School of Finance
Renmin University of China

Spring 2013

## Regression Analysis

Regression analysis involves the estimation and evaluation of the relationship between a variable of interest (dependent variable, explained variable, regressand) and one or more other variables (independent variables, explanatory variables, regressors).

What are estimation, prediction (forecasting), and fitting?

## Classical Normal Simple Regression Model

The model generalizes the idea of a random sample of $n$ independently and identically distributed (i.i.d.) observations from $N(\mu, \sigma^2)$.

We have a sample of $n$ independent observations $y_1, \dots, y_n$, each of which is normally distributed with variance $\sigma^2$, but with conditional mean governed by

$$E(y_i) = \alpha + \beta x_i, \quad i = 1, \dots, n,$$

where

1. $\alpha$ and $\beta$ are termed regression parameters (regression coefficients);
2. the term $x_i$ varies with $i$, but is not random (nonstochastic, fixed in repeated sampling).

What is sampling?

If $u_i = y_i - (\alpha + \beta x_i)$ denotes the error (or disturbance) term, then the simple regression model can be written as

$$y_i = \alpha + \beta x_i + u_i, \quad u_i \sim NID(0, \sigma^2), \quad i = 1, \dots, n, \tag{1}$$

where

1. the intercept $\alpha$ is the mean of $y$ when $x_i$ equals zero (this interpretation of the intercept is not always sensible in economic applications);
2. the slope $\beta$ is the change in the mean of $y$ when $x_i$ increases by one unit.

The assumption that the regressor $x$ is nonstochastic is inappropriate in many applications in economics, and it is relaxed later.

It is more useful to think of the classical assumption as being appropriate when we condition on the values of $x_1, \dots, x_n$. Thus, conditional upon the values of $x_1, \dots, x_n$, the $y_i$ are independent normal variables with means $\alpha + \beta x_i$ and common constant variance $\sigma^2$ for $i = 1, \dots, n$.

## Estimation of Parameters

The following general approaches to estimating $\alpha$, $\beta$ and $\sigma^2$ are considered: method of moments (MM); ordinary least squares (OLS); and maximum likelihood estimation (MLE).

Population moment conditions (from the assumptions in (1)):

$$E(u_i) = 0, \qquad E(x_i u_i) = 0, \qquad E(u_i^2 - \sigma^2) = 0.$$

Let the MM estimators of $\alpha$ and $\beta$ be $\hat\alpha$ and $\hat\beta$, with associated residuals $\hat u_i = y_i - (\hat\alpha + \hat\beta x_i)$, $i = 1, \dots, n$.

The MM estimators are obtained by solving the sample analogues of the moment conditions (replacing $E(\cdot)$ by $n^{-1}\sum_i(\cdot)$, and $u_i$ by $\hat u_i$). The equations are:

$$\sum_i \hat u_i = \sum_i [y_i - (\hat\alpha + \hat\beta x_i)] = 0,$$

$$\sum_i x_i \hat u_i = \sum_i x_i [y_i - (\hat\alpha + \hat\beta x_i)] = 0,$$

$$\hat\sigma^2 = n^{-1} \sum_i \hat u_i^2.$$

It can be proved that, under weak conditions, the MM estimators (MME) are consistent and asymptotically normally distributed.

## Ordinary Least Squares Estimation (OLS)

Choose estimates $\hat\alpha$ and $\hat\beta$ to get the "best fit" in the sense of minimizing

$$S(\alpha, \beta) = \sum_i [y_i - (\alpha + \beta x_i)]^2.$$

The first order conditions (F.O.C.s) are

$$\frac{\partial S(\hat\alpha, \hat\beta)}{\partial \alpha} = -2 \sum_i [y_i - (\hat\alpha + \hat\beta x_i)] = 0, \qquad \frac{\partial S(\hat\alpha, \hat\beta)}{\partial \beta} = -2 \sum_i x_i [y_i - (\hat\alpha + \hat\beta x_i)] = 0,$$

which give the normal equations

$$\sum_i \hat u_i = 0, \qquad \sum_i x_i \hat u_i = 0, \tag{2}$$

where $\hat u_i$ is an OLS residual.

The solutions $\hat\alpha$ and $\hat\beta$ which minimize the objective function $S(\alpha, \beta)$ are

$$\hat\beta = \frac{\sum_i (x_i - \bar x)(y_i - \bar y)}{\sum_i (x_i - \bar x)^2}, \qquad \hat\alpha = \bar y - \hat\beta \bar x. \tag{3}$$

It is clear that the normal equations imply that the OLS estimates of $\alpha$ and $\beta$ are equal to the corresponding MME.
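The closed-form OLS solutions and the two normal equations can be checked numerically. The following is a minimal sketch in plain Python; the data and the helper name `ols_simple` are made up for illustration and are not part of the lecture material.

```python
def ols_simple(x, y):
    """Closed-form OLS estimates for y_i = alpha + beta * x_i + u_i."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = s_xy / s_xx                 # slope estimate
    alpha_hat = y_bar - beta_hat * x_bar   # intercept estimate
    return alpha_hat, beta_hat


# Hypothetical sample (illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

alpha_hat, beta_hat = ols_simple(x, y)
residuals = [yi - (alpha_hat + beta_hat * xi) for xi, yi in zip(x, y)]

# Normal equations: both sums should be zero up to rounding error.
sum_u = sum(residuals)
sum_xu = sum(xi * ui for xi, ui in zip(x, residuals))
print(alpha_hat, beta_hat, sum_u, sum_xu)
```

Because the MM equations and the OLS normal equations coincide, the same numbers are also the MM estimates of the intercept and slope.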

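Because $y_i \sim N(\alpha + \beta x_i, \sigma^2)$, $i = 1, \dots, n$, the density of each observation is

$$f(y_i) = (2\pi\sigma^2)^{-1/2} \exp\{-[y_i - (\alpha + \beta x_i)]^2 / 2\sigma^2\}, \quad \forall i.$$

We have already assumed that $y_1, \dots, y_n$ are independent, so

$$f(y_1, \dots, y_n) = \prod_i f(y_i) = L,$$

and the log-likelihood is

$$l(\alpha, \beta, \sigma^2) = -\frac{n}{2} \ln(2\pi\sigma^2) - \frac{\sum_i [y_i - (\alpha + \beta x_i)]^2}{2\sigma^2}.$$

The MLE of $(\alpha, \beta)$ equals OLS. The MLE of $\sigma^2$ is $\hat\sigma^2 = n^{-1} \sum_i \hat u_i^2$, which equals the MM estimate.

Let $\hat y_i = \hat\alpha + \hat\beta x_i$ denote a typical OLS predicted value. The normal equations for OLS then yield several results:

$$\sum_i \hat y_i \hat u_i = 0,$$

$$\sum_i y_i^2 = \sum_i (\hat y_i + \hat u_i)^2 = \sum_i \hat y_i^2 + \sum_i \hat u_i^2 + 0.$$

## Goodness of Fit

The same argument applied to deviations from the sample mean gives

$$\sum_i \Big(y_i - n^{-1}\sum_i y_i\Big)^2 = \sum_i \Big(\hat y_i - n^{-1}\sum_i \hat y_i\Big)^2 + \sum_i \hat u_i^2,$$

i.e. TSS = ESS + RSS.

The coefficient of determination $R^2$ is an index of the goodness of fit of the OLS line, with

$$R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS}, \quad 0 \le R^2 \le 1.$$

$R^2 = r_{XY}^2$, where $r_{XY} = \hat\rho_{XY}$ (the sample correlation coefficient between $x$ and $y$).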
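The decomposition TSS = ESS + RSS and the identity between $R^2$ and the squared sample correlation can be verified on a small data set. This is a sketch with made-up data; the variable names are chosen here for illustration.

```python
import math

# Hypothetical sample (illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# OLS fit and fitted values.
beta_hat = s_xy / s_xx
alpha_hat = y_bar - beta_hat * x_bar
y_fit = [alpha_hat + beta_hat * xi for xi in x]

tss = s_yy                                             # total sum of squares
ess = sum((yf - y_bar) ** 2 for yf in y_fit)           # explained sum of squares
rss = sum((yi - yf) ** 2 for yi, yf in zip(y, y_fit))  # residual sum of squares

r2 = ess / tss
r_xy = s_xy / math.sqrt(s_xx * s_yy)   # sample correlation between x and y
print(tss, ess + rss, r2, r_xy ** 2)
```

The two printed pairs should agree up to rounding error. The equality of $R^2$ and $r_{XY}^2$ is specific to the simple regression model with an intercept.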

## Sampling Distribution of OLS Estimators

For the classical normal simple regression model, $\hat\alpha$ and $\hat\beta$ are jointly normally distributed with

$$E(\hat\alpha) = \alpha, \qquad E(\hat\beta) = \beta,$$

$$Var(\hat\beta) = \frac{\sigma^2}{\sum_i (x_i - \bar x)^2}, \qquad Var(\hat\alpha) = \frac{\sigma^2}{n} + \bar x^2\, Var(\hat\beta), \qquad Cov(\hat\alpha, \hat\beta) = -\bar x\, Var(\hat\beta).$$

The OLS estimators are the best linear unbiased estimators (BLUE) of $\alpha$ and $\beta$, even when the errors $u_i$ are not normally distributed.

The OLS estimators of the regression parameters can be written as

$$\hat\alpha = \alpha + \sum_i w_i u_i, \qquad \hat\beta = \beta + \sum_i z_i u_i,$$

where the nonstochastic terms $w_i$ and $z_i$ depend upon the regressor values, e.g. the weights for the slope estimator are $(x_i - \bar x) / \sum_j (x_j - \bar x)^2$.

It can be shown that, in the classical normal simple regression model,

$$\sum_i \hat u_i^2 = RSS \sim \sigma^2 \chi^2(n - 2)$$

is independent of $\hat\alpha$ and $\hat\beta$.

Note that $(n - 2)$ is the number of observations minus the number of regression parameters estimated to derive the residuals, and is called the degrees of freedom parameter for the regression.

## Estimation of $\sigma^2$

Hence,

$$E\Big(\sum_i \hat u_i^2\Big) = \sigma^2 (n - 2),$$

so that

$$s^2 = \frac{\sum_i \hat u_i^2}{n - 2}$$

is unbiased. The ML estimator, however, $\hat\sigma^2 = [(n - 2)/n]\, s^2$, is biased (the bias matters when the sample size is relatively small).

## Statistical Inference

The study of statistical inference requires the specification of the probabilistic model for $y_1, \dots, y_n$. We make the following assumptions.

A1 There exist observation-invariant parameters $\alpha$ and $\beta$ such that $E(y_i) = \alpha + \beta x_i$ $\forall i$;

A2 The regressor is nonstochastic (fixed in repeated sampling), with $S_{XX} = \sum_{i=1}^{n} (x_i - \bar x)^2 > 0$ for $n > 1$. For the purpose of asymptotic theory, it is conventional to assume $0 < \lim n^{-1} S_{XX} < \infty$;

A3 Let $u_i = y_i - E(y_i)$. Common variance (homoskedasticity): $var(u_i) = \sigma^2$ $\forall i$. If the $u_i$ do not have the same variance, we have heteroskedasticity.

A4 Uncorrelated disturbances: $E(u_i u_j) = 0$ if $i \ne j$. If we have time series data and this assumption is false, we say that there is autocorrelation (or serial correlation).

A5 Normally distributed disturbances (so that A4 implies independence).

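$\hat\alpha$ and $\hat\beta$ are $N(\alpha, var(\hat\alpha))$ and $N(\beta, var(\hat\beta))$, respectively, so that

$$z(\hat\alpha) = \frac{\hat\alpha - \alpha}{\sqrt{var(\hat\alpha)}} \sim N(0, 1), \qquad z(\hat\beta) = \frac{\hat\beta - \beta}{\sqrt{var(\hat\beta)}} \sim N(0, 1).$$

$\sum_i \hat u_i^2 / \sigma^2 \sim \chi^2(n - 2)$ independently of $\hat\alpha$ and $\hat\beta$, so $(n - 2) s^2 / \sigma^2 \sim \chi^2(n - 2)$ independently of $z(\hat\alpha)$ and $z(\hat\beta)$, so

$$t(\hat\alpha) = \frac{z(\hat\alpha)}{\sqrt{\dfrac{(n-2)s^2/\sigma^2}{n-2}}} \sim t(n - 2), \qquad t(\hat\beta) = \frac{z(\hat\beta)}{\sqrt{\dfrac{(n-2)s^2/\sigma^2}{n-2}}} \sim t(n - 2).$$

For the slope,

$$t(\hat\beta) = \frac{z(\hat\beta)}{\sqrt{s^2/\sigma^2}} = \frac{\hat\beta - \beta}{\sqrt{var(\hat\beta)}\,\sqrt{s^2/\sigma^2}} \sim t(n - 2).$$

The quantity

$$var(\hat\beta)\Big(\frac{s^2}{\sigma^2}\Big) = \Big(\frac{\sigma^2}{S_{XX}}\Big)\Big(\frac{s^2}{\sigma^2}\Big) = \frac{s^2}{S_{XX}}$$

is the estimator of $var(\hat\beta)$, and the square root of this quantity is called the estimated standard error, denoted by

$$SE(\hat\beta) = \sqrt{var(\hat\beta)\Big(\frac{s^2}{\sigma^2}\Big)} = \sqrt{\frac{s^2}{S_{XX}}} \approx \sqrt{var(\hat\beta)} \quad \text{(when } n \text{ is big)}.$$

Hence,

$$t(\hat\beta) = \frac{\hat\beta - \beta}{SE(\hat\beta)} \sim t(n - 2), \qquad t(\hat\alpha) = \frac{\hat\alpha - \alpha}{SE(\hat\alpha)} \sim t(n - 2).$$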
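The quantities entering the t ratio can be computed directly from a sample. The sketch below uses made-up data and a hypothetical null value `beta_0`. Critical values of $t(n-2)$ are not computed here; they would come from tables or a statistics library.

```python
import math

# Hypothetical sample (illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
beta_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
alpha_hat = y_bar - beta_hat * x_bar

rss = sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(x, y))
s2 = rss / (n - 2)        # unbiased estimator of sigma^2
sigma2_mle = rss / n      # ML estimator, equal to [(n - 2) / n] * s2

# Estimated standard errors, from Var(beta_hat) and Var(alpha_hat)
# with sigma^2 replaced by s2.
se_beta = math.sqrt(s2 / s_xx)
se_alpha = math.sqrt(s2 * (1.0 / n + x_bar ** 2 / s_xx))

beta_0 = 2.0              # hypothetical value specified under H0
t0 = (beta_hat - beta_0) / se_beta
print(s2, sigma2_mle, se_alpha, se_beta, t0)
```

The statistic `t0` is then compared with a quantile of the $t(n-2)$ distribution, here with $n - 2 = 3$ degrees of freedom.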

For whatever value of $\beta$,

$$t(\hat\beta) = \frac{\hat\beta - \beta}{SE(\hat\beta)} \sim t(n - 2).$$

Let $d_1$ satisfy $prob(-d_1 \le t(n - 2) \le d_1) = (1 - \epsilon)$. Then the $(1 - \epsilon)100$ per cent confidence intervals (C.I.) for $\alpha$ and $\beta$ are given by

$$\hat\alpha \pm d_1 SE(\hat\alpha) \quad \text{and} \quad \hat\beta \pm d_1 SE(\hat\beta),$$

respectively.

## Hypothesis Testing: t Statistic

Consider the null hypothesis that restricts one of the regression parameters, e.g. $H_0: \beta = \beta_0$, where $\beta_0$ is some specified constant.

For whatever value of $\beta$, $t(\hat\beta) = (\hat\beta - \beta)/SE(\hat\beta) \sim t(n - 2)$, and so if $H_0$ is true,

$$t_0(\hat\beta) = \frac{\hat\beta - \beta_0}{SE(\hat\beta)} \sim t(n - 2).$$

$t_0(\hat\beta)$ is termed the test statistic.

The critical/rejection region depends upon the nature of the alternative hypothesis and the prespecified significance level, denoted by $\epsilon$.

1. $H_1: \beta \ne \beta_0$: reject $H_0$ if $|t_0(\hat\beta)| > d_1$, where $prob(t(n - 2) > d_1) = \epsilon/2$.
2. $H_1^+: \beta > \beta_0$: reject $H_0$ if $t_0(\hat\beta) > d_2$, where $prob(t(n - 2) > d_2) = \epsilon$.
3. $H_1^-: \beta < \beta_0$: reject $H_0$ if $t_0(\hat\beta) < -d_2$, where $prob(t(n - 2) < -d_2) = \epsilon$.

Just replace $\beta$ by $\alpha$ and $\hat\beta$ by $\hat\alpha$ in the above to obtain test procedures for $\alpha$ (the intercept).

## Relaxing the Assumption of Fixed Regressors

Suppose that $x$, like $y$, is a random variable (r.v.). The results above can now be regarded as being derived conditional upon the values $x_1, \dots, x_n$.

1. $E(\hat\alpha | x_1, \dots, x_n) = \alpha$, $E(\hat\beta | x_1, \dots, x_n) = \beta$ and $E(s^2 | x_1, \dots, x_n) = \sigma^2$. These expectations do not depend upon the $x$ values, and so the OLS estimators are unconditionally unbiased. Similar remarks apply to probability limits;
2. $var(\hat\alpha | x_1, \dots, x_n)$, $var(\hat\beta | x_1, \dots, x_n)$ and $cov(\hat\alpha, \hat\beta | x_1, \dots, x_n)$, as given above, do depend on the $x$ values, and so do not correspond to unconditional characteristics.

Fortunately, point 2 does not pose major problems for inference. The variables $(\hat\alpha - \alpha)/SE(\hat\alpha)$ and $(\hat\beta - \beta)/SE(\hat\beta)$ are, given the $x$ values, still both distributed as $t(n - 2)$. This distribution does not depend on the $x$ values, but only on the value of $(n - 2)$. Hence the t tests and confidence intervals described above are unconditionally valid.

1. It has been assumed that the errors $u_1, \dots, u_n$ are $NID(0, \sigma^2)$ whether or not we condition on the $x$ values, i.e. the regressor values and error terms are statistically independent.
2. The assumptions in 1 can be weakened, but then we cannot expect to get results that are exact, i.e. valid for finite sample sizes, and often have to resort to asymptotically valid results in practical situations.

## Prediction

Suppose we wish to make predictions for period $f$, $f > n$ (the sample size), with $x_f$ known and assuming the data generation process for $y$ is unchanged, so that

$$y_f = \alpha + \beta x_f + u_f, \quad u_f \sim N(0, \sigma^2).$$

Prediction of $E(y_f)$: use the predictor $\hat y_f = \hat\alpha + \hat\beta x_f$, where the OLS estimators use the data for $i = 1, \dots, n$. This predictor is BLUE for $E(y_f) = \alpha + \beta x_f$.

The predictor $\hat y_f$ is a linear combination of the OLS estimators and so is normally distributed.

The variance of $\hat y_f$ can be estimated, and confidence intervals and tests of hypotheses are feasible.

Prediction of $y_f$: use the same predictor, which implies a forecast error of

$$(y_f - \hat y_f) = u_f - \big[(\hat\alpha - \alpha) + (\hat\beta - \beta) x_f\big],$$

which has zero expectation, given that OLS is unbiased and $E(u_f) = 0$.

The forecast error is normally distributed, being a linear combination of three normal variates, and has a variance that can be estimated. Confidence intervals and tests of hypotheses, e.g. $H_0: E(y_f - \hat y_f) = 0$, are feasible.

# Review Topic 2: Multiple Regression

Wooldridge, Ch.3, 4

## Classical Multiple Regression Model

We have a sample of $n$ independent observations $y_1, \dots, y_n$, each of which is normally distributed with variance $\sigma^2$, but with means that vary according to

$$E(y_i) = \alpha + \beta_1 x_{1i} + \dots + \beta_k x_{ki} = \alpha + \sum_j \beta_j x_{ji}, \quad i = 1, \dots, n.$$

$\alpha$ and $\beta_j$ are parameters (coefficients).

Regressors $x_{ji}$ vary with $i$, but are nonrandom (nonstochastic, i.e. fixed in repeated sampling).

$\alpha$ can be regarded as an intercept, with $\alpha = E(y_i)$ given all $x_{ji} = 0$.

Slopes $\beta_j$ can often be regarded as partial derivatives: $\beta_j = \partial E(y_i) / \partial x_{ji}$.

Note: a regressor might be discrete or a nonlinear function of some other regressor, so that interpretations vary.

## Stochastic Specification of Classical Model

The following assumptions are made in the classical normal regression model:

A1 There exist observation-invariant parameters $\alpha$ and $\beta_j$, $j = 1, \dots, k$, such that $E(y_i) = \alpha + \sum_j \beta_j x_{ji}$ $\forall i$;

A2 The regressors $x_{ji}$ are nonstochastic (fixed in repeated sampling);

A3 No regressor is just a linear combination of the other regressors and the intercept term;

A4 Common variance (homoskedasticity): $var(u_i) = \sigma^2$ $\forall i$. If the $u_i$ do not have the same variance, we have heteroskedasticity;

A5 Uncorrelated disturbances: $E(u_i u_j) = 0$ if $i \ne j$. If we have time series data and this assumption is false, we say that there is autocorrelation (serial correlation);

A6 Normally distributed disturbances (so that A5 implies independence).

Assumption A2 is often too restrictive for economic applications, in which some regressors are probably better regarded as random rather than fixed in repeated sampling.

As in the case of the simple regression model, we can start by thinking about the conditional distribution of $y_i$, holding the values $x_{ji}$ ($i = 1, \dots, n$; $j = 1, \dots, k$) constant. Having derived results for the conditional model, we can see which of them will apply to the unconditional model for $y$.

For the former model, we have that, given the values of the regressors, the variates $y_i$ are independent with conditional distributions $N(\alpha + \sum_j x_{ji} \beta_j, \sigma^2)$ for $i = 1, \dots, n$.

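We have $E(u_i) = 0$ and $E(x_{ji} u_i) = 0$ for $j = 1, \dots, k$. Therefore, MM estimators, denoted by $\hat{}$, can be derived from

$$\sum_i \hat u_i = 0, \qquad \sum_i x_{ji} \hat u_i = 0$$

for $j = 1, \dots, k$, where $\hat u_i$ is the residual $y_i - (\hat\alpha + \sum_j \hat\beta_j x_{ji})$, $i = 1, \dots, n$.

From $E(u_i^2 - \sigma^2) = 0$, the MM estimator of the variance is $\hat\sigma^2 = n^{-1} \sum_i \hat u_i^2$.

## Ordinary Least Squares Estimation

The OLS estimators minimize

$$S(\alpha, \beta_1, \dots, \beta_k) = \sum_i \Big[ y_i - \Big( \alpha + \sum_j \beta_j x_{ji} \Big) \Big]^2.$$

The first order conditions are

$$\sum_i \hat u_i = 0, \qquad \sum_i x_{ji} \hat u_i = 0$$

for $j = 1, \dots, k$, where $\hat u_i$ is the OLS residual $y_i - (\hat\alpha + \sum_j \hat\beta_j x_{ji})$, $i = 1, \dots, n$.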

Using methods similar to those appropriate in the context of the simple regression model, it can be shown that the log-likelihood function is given by

$$l(\alpha, \beta_1, \dots, \beta_k, \sigma^2) = -\Big(\frac{n}{2}\Big) \ln(2\pi\sigma^2) - \frac{S(\alpha, \beta_1, \dots, \beta_k)}{2\sigma^2}.$$

The MLE of the regression parameters must minimize $S(\alpha, \beta_1, \dots, \beta_k)$, and so OLSE = MLE.

The MLE of $\sigma^2$ is $RSS/n$, where $RSS = \sum_i \hat u_i^2$ is the OLS residual sum of squares.

The OLS estimators are the best linear unbiased estimators (BLUE) of $\alpha$ and $\beta_j$, $j = 1, \dots, k$, even when the errors $u_i$ are not normally distributed.

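Let $\hat y_i = \hat\alpha + \sum_j \hat\beta_j x_{ji}$ denote a typical OLS predicted value. Then

$$\sum_i \hat y_i \hat u_i = \sum_i \Big(\hat\alpha + \sum_j \hat\beta_j x_{ji}\Big) \hat u_i = \hat\alpha \sum_i \hat u_i + \sum_j \hat\beta_j \sum_i x_{ji} \hat u_i = 0,$$

$$\sum_i y_i^2 = \sum_i (\hat y_i + \hat u_i)^2 = \sum_i \hat y_i^2 + \sum_i \hat u_i^2,$$

$$\sum_i \Big(y_i - n^{-1}\sum_i y_i\Big)^2 = \sum_i \Big(\hat y_i - n^{-1}\sum_i \hat y_i\Big)^2 + \sum_i \hat u_i^2,$$

i.e. TSS = ESS + RSS.

## Goodness of Fit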

The coefficient of determination $R^2$ is an index of the goodness of fit of the OLS line, with

$$R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS}, \quad 0 \le R^2 \le 1.$$

Some use the degrees-of-freedom adjusted $R^2$, denoted by $\bar R^2$ and defined by

$$\bar R^2 = 1 - \frac{RSS / (n - k - 1)}{TSS / (n - 1)}.$$

This index can be negative.

If we add regressors to a model and re-estimate by OLS, $R^2$ cannot fall (it is monotonic in the number of parameters), but $\bar R^2$ can.

It can be shown that

$$\hat\alpha = \bar y - \sum_j \hat\beta_j \bar x_j, \qquad \hat\beta_j = \frac{\sum_i \tilde x_{ji} y_i}{\sum_i \tilde x_{ji}^2},$$

where $\tilde x_{ji}$ is the $i$th residual from the OLS regression of the $j$th regressor on the other $(k - 1)$ regressors and the intercept term.
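The partialling-out formula for the slope can be checked against the slope obtained from the full set of normal equations. The sketch below uses made-up data with $k = 2$ regressors, so that the normal equations in deviation form reduce to a 2-by-2 system; the helper names `demean` and `dot` are chosen here for illustration.

```python
def demean(v):
    """Return the deviations of v from its sample mean."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def dot(a, b):
    """Sum of elementwise products."""
    return sum(p * q for p, q in zip(a, b))


# Hypothetical data with k = 2 regressors (illustration only).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [3.1, 3.9, 8.2, 8.8, 13.1, 13.9]

d1, d2, dy = demean(x1), demean(x2), demean(y)
s11, s22, s12 = dot(d1, d1), dot(d2, d2), dot(d1, d2)
s1y, s2y = dot(d1, dy), dot(d2, dy)

# Slopes from the normal equations in deviation form (Cramer's rule).
det = s11 * s22 - s12 ** 2
b1_full = (s22 * s1y - s12 * s2y) / det
b2_full = (s11 * s2y - s12 * s1y) / det

# Auxiliary regression of x1 on x2 and an intercept: keep the residuals.
gamma = s12 / s22
x1_tilde = [a - gamma * b for a, b in zip(d1, d2)]

# Partialled-out slope: sum(x1_tilde * y) / sum(x1_tilde ** 2).
b1_partial = dot(x1_tilde, y) / dot(x1_tilde, x1_tilde)
print(b1_full, b1_partial)
```

The denominator `dot(x1_tilde, x1_tilde)` is the residual sum of squares of the auxiliary regression for the first regressor.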

## Sampling Distribution of OLS Estimators

$$E(\hat\beta_j) = \beta_j, \qquad var(\hat\beta_j) = \sigma^2 / RSS_j, \quad j = 1, \dots, k,$$

where $RSS_j$ is the residual sum of squares from the OLS estimation of the auxiliary regression of the $j$th regressor on the other $(k - 1)$ regressors and the intercept term.

The OLS estimators of the slope parameters can be written as

$$\hat\beta_j = \beta_j + \frac{\sum_i \tilde x_{ji} u_i}{\sum_i \tilde x_{ji}^2} = \beta_j + \frac{\sum_i \tilde x_{ji} u_i}{RSS_j},$$

and the disturbances are normally distributed, so $\hat\beta_j \sim N(\beta_j, var(\hat\beta_j))$. Similarly, $\hat\alpha \sim N(\alpha, var(\hat\alpha))$.

$$\sum_i \hat u_i^2 \sim \sigma^2 \chi^2(n - k - 1),$$

independently of $\hat\alpha$ and $\hat\beta_j$, $\forall j$.

Note that $(n - k - 1)$ is the number of observations minus the number of regression parameters estimated to derive the residuals, and is called the degrees of freedom parameter for the regression.

$E(\sum_i \hat u_i^2) = \sigma^2 (n - k - 1)$, and so the estimator $s^2 = (n - k - 1)^{-1} \sum_i \hat u_i^2$ is unbiased.

However, the ML estimator, $\hat\sigma^2 = [(n - k - 1)/n]\, s^2$, is biased (the bias matters when the sample size is relatively small).

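## Sampling Distributions for Inference

$\hat\alpha$ and $\hat\beta_j$ are $N(\alpha, var(\hat\alpha))$ and $N(\beta_j, var(\hat\beta_j))$, respectively, so that

$$z(\hat\alpha) = \frac{\hat\alpha - \alpha}{\sqrt{var(\hat\alpha)}} \sim N(0, 1), \qquad z(\hat\beta_j) = \frac{\hat\beta_j - \beta_j}{\sqrt{var(\hat\beta_j)}} \sim N(0, 1).$$

$RSS/\sigma^2 \sim \chi^2(n - k - 1)$ independently of $z(\hat\alpha)$ and $z(\hat\beta_j)$, so

$$t(\hat\alpha) = \frac{z(\hat\alpha)}{\sqrt{[RSS/(n - k - 1)]/\sigma^2}} \sim t(n - k - 1), \qquad t(\hat\beta_j) = \frac{z(\hat\beta_j)}{\sqrt{[RSS/(n - k - 1)]/\sigma^2}} \sim t(n - k - 1).$$

We know $RSS/(n - k - 1) = s^2$, so that, for example,

$$t(\hat\beta_j) = \frac{z(\hat\beta_j)}{\sqrt{s^2/\sigma^2}} = \frac{\hat\beta_j - \beta_j}{\sqrt{var(\hat\beta_j)}\,\sqrt{s^2/\sigma^2}}.$$

$var(\hat\beta_j)(s^2/\sigma^2) = (\sigma^2/RSS_j)(s^2/\sigma^2) = s^2/RSS_j$, which is the estimator of $var(\hat\beta_j)$, and the square root of this quantity is called the estimated standard error, $SE(\hat\beta_j)$.

Hence,

$$t(\hat\beta_j) = \frac{\hat\beta_j - \beta_j}{SE(\hat\beta_j)} \sim t(n - k - 1),$$

and similarly

$$t(\hat\alpha) = \frac{\hat\alpha - \alpha}{SE(\hat\alpha)} \sim t(n - k - 1).$$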

## Confidence Intervals

If $d_1$ satisfies $prob(-d_1 \le t(n - k - 1) \le d_1) = (1 - \epsilon)$, the $(1 - \epsilon)100$ per cent confidence intervals for $\alpha$ and $\beta_j$ are given by $\hat\alpha \pm d_1 SE(\hat\alpha)$ and $\hat\beta_j \pm d_1 SE(\hat\beta_j)$, respectively.

## Test of Hypotheses Using t Statistics

Consider a null hypothesis that restricts one of the regression parameters, e.g. $H_0: \beta_j = \beta_{j0}$ (some specified constant).

For whatever value of $\beta_j$, $t(\hat\beta_j) = (\hat\beta_j - \beta_j)/SE(\hat\beta_j) \sim t(n - k - 1)$, and so if $H_0$ is true, $t_0(\hat\beta_j) = (\hat\beta_j - \beta_{j0})/SE(\hat\beta_j) \sim t(n - k - 1)$.

Then $t_0(\hat\beta_j)$ is the test statistic. The critical/rejection region depends upon the nature of the alternative hypothesis and the prespecified significance level, denoted by $\epsilon$.

1. $H_1: \beta_j \ne \beta_{j0}$: reject $H_0$ if $|t_0(\hat\beta_j)| > d_1$, where $prob(t(n - k - 1) > d_1) = \epsilon/2$.
2. $H_1^+: \beta_j > \beta_{j0}$: reject $H_0$ if $t_0(\hat\beta_j) > d_2$, where $prob(t(n - k - 1) > d_2) = \epsilon$.
3. $H_1^-: \beta_j < \beta_{j0}$: reject $H_0$ if $t_0(\hat\beta_j) < -d_2$, where $prob(t(n - k - 1) < -d_2) = \epsilon$.

Just replace $\beta_j$ by $\alpha$ and $\hat\beta_j$ by $\hat\alpha$ in the above to obtain test procedures relevant to testing hypotheses concerning the intercept.

## F Test of Several Linear Restrictions

Example. Suppose that the null hypothesis to be tested is denoted by $H_0$ and consists of several linear restrictions on the parameters of the regression model. Thus $H_0$ specifies the values of, say, $q < (k + 1)$ linear combinations of the regression coefficients. For example, with $k = 4$ and $q = 3$, $H_0$ could consist of the following restrictions: $\alpha + \beta_1 = 0$; $\beta_2 = 1$; and $\beta_4 = 0$. We now need a joint test of all the restrictions of $H_0$, rather than a collection of separate t-tests.

Let $RSS(H_0)$ be the sum of squared residuals obtained under the restrictions of $H_0$. In the example above, $RSS(H_0)$ is derived by applying OLS to the restricted model:

$$(y_i - x_{2i}) = \beta_1 (x_{1i} - 1) + \beta_3 x_{3i} + u_i.$$

Let $RSS(H_1)$ be the RSS obtained by applying OLS to the unrestricted model. In the example above, $RSS(H_1)$ is derived by applying OLS to

$$y_i = \alpha + \sum_{j=1}^{4} \beta_j x_{ji} + u_i.$$

Definition. Define the F statistic by the following equation:

$$F = \frac{[RSS(H_0) - RSS(H_1)] / q}{RSS(H_1) / df(H_1)},$$

in which $df(H_1)$ is the degrees of freedom parameter for the unrestricted model, i.e. $df(H_1) = (n - k - 1)$.

If $H_0$ is true, then $F \sim F(q, df(H_1))$.

The null hypothesis is regarded as inconsistent with the data if the sample (observed) value of $F$ is significantly large, i.e. the test is one-sided.

## Prediction

Suppose we wish to make predictions for period $f$, $f > n$ ($n$ is the sample size), with $x_{jf}$ known and it being assumed that the data generation process (DGP) for $y$ is unchanged, so that

$$y_f = \alpha + \sum_j \beta_j x_{jf} + u_f, \quad u_f \sim N(0, \sigma^2).$$

Prediction of $E(y_f)$: use the predictor $\hat y_f = \hat\alpha + \sum_j \hat\beta_j x_{jf}$, where the OLS estimators use the data for $i = 1, \dots, n$. This predictor is BLUE for $E(y_f) = \alpha + \sum_j \beta_j x_{jf}$.

The predictor $\hat y_f$ is a linear combination of the OLS estimators and so is normally distributed. The variance of $\hat y_f$ can be estimated, and confidence intervals and tests of hypotheses are feasible.

Prediction of $y_f$: use the same predictor, which implies a forecast error of

$$(y_f - \hat y_f) = u_f - \Big[ (\hat\alpha - \alpha) + \sum_j (\hat\beta_j - \beta_j) x_{jf} \Big],$$

which has zero expectation, given that OLS is unbiased and $E(u_f) = 0$.

The forecast error is normally distributed, being a linear combination of normal variates, and has a variance that can be estimated.

Confidence intervals and tests of hypotheses, e.g. $H_0: E(y_f - \hat y_f) = 0$, are feasible.

# Review Topic 3: Multicollinearity

Wooldridge, Ch.3.

## Multicollinearity

The information content of a sample available for the purpose of estimating the individual regression parameters depends, in part, upon the intercorrelations between the regressors.

Let $R_j^2$ denote the $R^2$ statistic from the OLS estimation of the auxiliary regression of the $j$th regressor on the other $(k - 1)$ regressors and the intercept term. Since it has been assumed that no regressor is a linear combination of the other regressors and the intercept term, it follows that $R_j^2 < 1$ for all $j$.

If $R_j^2 = 1$ for some $j$, then we say that there is perfect multicollinearity. If $R_j^2$ is close to 1 for some $j$, then we have a high degree of multicollinearity.

It can be proved that

$$var(\hat\beta_j) = \frac{\sigma^2}{RSS_j} = \frac{\sigma^2}{\Big[\sum_i (x_{ji} - \bar x_j)^2\Big] \big(1 - R_j^2\big)}.$$

Thus, ceteris paribus, high degrees of multicollinearity lead to high values of sampling variances.

Note: imprecise estimators can lead to wide confidence intervals and weak tests of hypotheses.

Note: "ceteris paribus" here means with $\sigma^2$ and $\sum_i (x_{ji} - \bar x_j)^2$ held constant. Variances may be small even when there is a high degree of multicollinearity, or large when the regressors are uncorrelated.
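The link between the auxiliary $R_j^2$ and the sampling variance can be illustrated with two regressors, where the auxiliary regression is a simple regression. This is a sketch with made-up regressor values and an assumed disturbance variance `sigma2`; neither comes from the lecture.

```python
def demean(v):
    """Return the deviations of v from its sample mean."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def dot(a, b):
    """Sum of elementwise products."""
    return sum(p * q for p, q in zip(a, b))


# Hypothetical regressor values, k = 2 (illustration only).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
sigma2 = 1.0   # assumed disturbance variance

d1, d2 = demean(x1), demean(x2)
s11, s22, s12 = dot(d1, d1), dot(d2, d2), dot(d1, d2)

# Auxiliary regression of x1 on x2 and an intercept.
r2_1 = s12 ** 2 / (s11 * s22)                     # its R^2
gamma = s12 / s22
rss_1 = sum((a - gamma * b) ** 2 for a, b in zip(d1, d2))

# Two equivalent expressions for var(beta_1_hat).
var_b1 = sigma2 / rss_1
var_b1_alt = sigma2 / (s11 * (1.0 - r2_1))

# Factor by which the variance exceeds the value it would take if r2_1 were zero.
inflation = 1.0 / (1.0 - r2_1)
print(r2_1, var_b1, var_b1_alt, inflation)
```

With these numbers the auxiliary $R_1^2$ is about 0.69, so the variance of the first slope estimator is roughly three times what it would be with the same $\sum_i (x_{1i} - \bar x_1)^2$ and uncorrelated regressors.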

Also note that, although multicollinearity is indeed a problem, no assumptions of the classical multiple regression model have been violated.

Therefore, provided multicollinearity is not perfect, OLS estimators are BLUE and MLE. Similarly, the standard test procedures are valid and retain optimality properties relative to other tests.

Klein proposes the rule of thumb that multicollinearity is a "problem" if $\max_j R_j^2 > R^2$.

When assessing multicollinearity, it is not sufficient to look only at pairwise correlations between regressors (a regressor may be closely related to a combination of several others even when no single pairwise correlation is high).

Multicollinearity is a feature of the nonrandom regressor set, and so we cannot test for it. Some measures for multicollinearity have been proposed, but they are open to objection, and the $R_j^2$ statistics are simple to calculate and interpret.

Models can be reparameterized to make the transformed regressors uncorrelated, but the transformed parameters may have no economic interest.

As noted above, multicollinearity can lead to large variances and weak tests, e.g. every individual slope estimate might be insignificant (as indicated by a t-test), while the F statistic for the hypothesis that all slopes equal zero is highly significant.

Multicollinearity can also lead to large changes in parameter estimates when there are small changes in the data.

Various "treatments" have been described, e.g. drop some variables, use first differences, use outside estimates of some coefficients. These treatments usually introduce new problems, e.g. dropping an insignificant, but relevant, variable will lead to biased estimators in the amended model.

The real solution is to get more valid information, so using false restrictions is not a good strategy. We may also have to wait for more data.

# Review Topic 4: The Mean Function

Wooldridge, Ch.3, Ch. 7, Ch. 9.

## Incorrect Specification in the Mean Function: Consequences

We have assumed that there exist observation-invariant parameters $\alpha$ and $\beta_1, \dots, \beta_k$ such that the conditional mean is given by

$$E(y_i | x_{ji}, j = 1, \dots, k) = \alpha + \sum_j \beta_j x_{ji},$$

where $x_{ji}$ is the $i$th value of the $j$th regressor.

Case 1. We may have included irrelevant regressors, i.e. some $\beta_j$ equal zero. OLS estimators are still unbiased and consistent, but no longer efficient (they fail to use the valid information that some coefficients are zero).

Case 2. We may have omitted some relevant regressors. Write the conditional mean function as

$$E(y_i | x_{ji}, j = 1, \dots, k) = \alpha + \sum_j \beta_j x_{ji} + E(f_i | x_{ji}, j = 1, \dots, k),$$

where $f_i$ stands for an omitted factor. In general, the OLS estimators of the regression parameters $\alpha$ and $\beta_j$ are biased and inconsistent. The estimator $s^2$ is biased and inconsistent, and the standard t- and F-tests are no longer valid.

Case 3. We may use an incorrect functional form, e.g. assume

$$y_i = \alpha + \sum_j \beta_j x_{ji} + u_i, \quad u_i \sim NID(0, \sigma^2),$$

when the true model is

$$\log(y_i) = \alpha + \sum_j \beta_j \log(x_{ji}) + v_i, \quad v_i \sim NID(0, \sigma^2).$$

The OLS estimators of the false linear-linear model do not correspond to parameters of economic interest.

## Test Procedures: RESET Test

If we have a strong belief about the omitted factor, we can use a precise test. For example, if we are sure that $f_i$ is a linear combination of $q$ variables $z_{ji}$, we can apply the F-test of $H_0: \gamma_1 = \dots = \gamma_q = 0$ in the expanded model

$$y_i = \alpha + \sum_j \beta_j x_{ji} + \sum_{j=1}^{q} \gamma_j z_{ji} + u_i, \quad u_i \sim NID(0, \sigma^2).$$

If we do not have a strong belief, then we can use the "information parsimonious" RESET test. In this test, fit the null model

$$y_i = \alpha + \sum_j \beta_j x_{ji} + u_i, \quad u_i \sim NID(0, \sigma^2),$$

by OLS to obtain predicted values $\hat y_i$, $i = 1, \dots, n$. Then test $H_0: \gamma_1 = \dots = \gamma_q = 0$ by an F-test in the expanded model

$$y_i = \alpha + \sum_j \beta_j x_{ji} + \sum_{j=1}^{q} \gamma_j (\hat y_i)^{j+1} + u_i, \quad u_i \sim NID(0, \sigma^2).$$

Notes:

1. There is no $\hat y_i$ term because this is a linear combination of the intercept term and the regressors $x_{ji}$;
2. The F-test is valid even though the added variables are random;
3. The choice of $q$ has an impact on power;
4. There is no rule for determining the best value of $q$;
5. Quite small values of $q$ are often used, e.g. 1 or 2;
6. We cannot expect RESET to indicate how a model should be re-specified;
7. We cannot assume RESET will always have high power.

Suppose we divide the sample into two subsamples, denoted by $\Omega_1$ and $\Omega_2$. Let $\Omega_1$ contain $n_1$ observations and $\Omega_2$ contain $n_2 = n - n_1$ observations. The unrestricted model of the alternative hypothesis is then written as

$$y_i = \alpha^{(1)} + \sum_j \beta_j^{(1)} x_{ji} + u_i, \quad u_i \sim NID(0, \sigma^2), \quad \text{if } i \in \Omega_1,$$

and

$$y_i = \alpha^{(2)} + \sum_j \beta_j^{(2)} x_{ji} + u_i, \quad u_i \sim NID(0, \sigma^2), \quad \text{if } i \in \Omega_2,$$

so that changes in regression coefficients are permitted (under the unrestricted model!). Note that homoskedasticity across the two subsamples is assumed.

The null hypothesis of constant coefficients consists of the $(k + 1)$ restrictions $\{\alpha^{(1)} = \alpha^{(2)};\ \beta_j^{(1)} = \beta_j^{(2)},\ j = 1, \dots, k\}$.

Suppose that $n_s > (k + 1)$, $s = 1, 2$. Let $RSS_s$ denote the residual sum of squares (RSS) for the OLS regression of $y_i$ on the intercept term and the $x_{ji}$ using only the observations for $\Omega_s$, $s = 1, 2$, and let $RSS$ denote the residual sum of squares for this OLS regression using all $n$ observations. $H_0$ can be tested using the F statistic

$$F = \frac{[RSS - (RSS_1 + RSS_2)] / (k + 1)}{(RSS_1 + RSS_2) / (n - 2k - 2)},$$

which is $F(k + 1, (n - 2k - 2))$ under $H_0$, with large values indicating the inconsistency of $H_0$.

If, say, $n_2 \le (k + 1)$, then use the predictive failure test. Test the $n_2$ restrictions

$$E\Big[ y_i - \Big( \tilde\alpha + \sum_j \tilde\beta_j x_{ji} \Big) \Big] = 0, \quad i \in \Omega_2,$$

where $\tilde{}$ denotes an estimator derived using only the observations of $\Omega_1$. The F-statistic is

$$F = \frac{(RSS - RSS_1) / n_2}{RSS_1 / (n_1 - k - 1)},$$

which is $F(n_2, (n_1 - k - 1))$ when the model is stable.

However, in the case of $n_2 < k + 1$, the $n_2$ restrictions being tested may be satisfied even though $H_0$ is false.
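Both stability statistics are simple functions of residual sums of squares. The sketch below wraps them in two helper functions; the function names and the numerical inputs are hypothetical and serve only to show the arithmetic.

```python
def chow_f(rss_pooled, rss_1, rss_2, n, k):
    """F statistic for constant coefficients across two subsamples.

    Requires each subsample to have more than k + 1 observations.
    Under H0 the statistic is F(k + 1, n - 2k - 2).
    """
    q = k + 1                  # number of restrictions
    df = n - 2 * (k + 1)       # degrees of freedom of the unrestricted model
    rss_unrestricted = rss_1 + rss_2
    return ((rss_pooled - rss_unrestricted) / q) / (rss_unrestricted / df)


def predictive_failure_f(rss_pooled, rss_1, n1, n2, k):
    """F statistic for the predictive failure test.

    For use when the second subsample is too small for a separate regression.
    When the model is stable the statistic is F(n2, n1 - k - 1).
    """
    return ((rss_pooled - rss_1) / n2) / (rss_1 / (n1 - k - 1))


# Hypothetical residual sums of squares (illustration only).
f_chow = chow_f(rss_pooled=120.0, rss_1=40.0, rss_2=50.0, n=60, k=2)
f_pred = predictive_failure_f(rss_pooled=120.0, rss_1=100.0, n1=50, n2=4, k=2)
print(f_chow, f_pred)
```

In both cases the null hypothesis is rejected for large values of the statistic, using the upper tail of the relevant F distribution.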

## Treatment

The only treatment that allows valid inference is the correct specification of the mean function.

# Review Topic 5: Non-normal Disturbances

Wooldridge, Ch.5.

## Non-normal Disturbances

Consider the model

$$y_i = \alpha + \sum_j \beta_j x_{ji} + u_i, \quad i = 1, \dots, n,$$

where the disturbances are independently and identically distributed (i.i.d.) with zero mean and variance $\sigma^2 < \infty$, but the common distribution is NOT normal.

## Consequences

OLS estimators are still BLUE but, in general, are NOT normally distributed. Therefore the t and F tests are no longer valid in finite samples.

The standard formulae for confidence intervals are also invalid in finite samples.

Under weak conditions, OLS estimators are consistent, and a Central Limit Theorem can be used to show that they are asymptotically normally distributed, implying that t and F tests of linear restrictions on regression coefficients are asymptotically valid. The usual confidence intervals are also asymptotically valid.

The prediction error test is, however, not asymptotically valid.

Since the normal-theory MLE maximizes the wrong likelihood function, it does not produce asymptotically efficient estimators.

Test Procedures

When the $u_i$ are $NID(0, \sigma^2)$, the following conditions are satisfied: $E(u_i^3) = 0$ and $E(u_i^4) - 3\sigma^4 = 0$.

If a typical OLS residual is denoted by $\hat{u}_i$, then it is natural to look at tests based upon the sample moments $n^{-1}\sum_i \hat{u}_i^3$ and $n^{-1}\sum_i \hat{u}_i^4 - 3\hat{\sigma}^4$, where $\hat{\sigma}^2 = n^{-1}\sum_i \hat{u}_i^2$. Jarque and Bera propose a test of the joint significance of these terms. However, this test is only asymptotically valid and, in large samples, there is little need to assume normality when examining OLS results for the linear multiple regression model.

Asymptotic theory sometimes provides a poor approximation to the actual finite sample behaviour of the Jarque-Bera statistic when the $u_i$ are normal.

The Jarque-Bera test can have low power under some nonnormal disturbance distributions.

Treatment

If precise information about the form of the disturbance distribution is available, then the likelihood function can be derived and the asymptotically efficient MLE obtained. Otherwise, use OLS and rely upon large sample results.
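A minimal sketch of the Jarque-Bera statistic follows. It uses the usual textbook form $JB = \frac{n}{6}\left[S^2 + \frac{(K-3)^2}{4}\right]$, with $S$ and $K$ the sample skewness and kurtosis of the residuals, which is asymptotically $\chi^2(2)$ under normality. The function name `jarque_bera` and the simulated inputs (which stand in for OLS residuals) are assumptions made for illustration.

```python
import random


def jarque_bera(resid):
    """Jarque-Bera statistic from sample skewness and kurtosis of residuals."""
    n = len(resid)
    mean = sum(resid) / n  # zero for OLS residuals when the model has an intercept
    dev = [r - mean for r in resid]
    m2 = sum(d ** 2 for d in dev) / n
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)


rng = random.Random(0)
normal_draws = [rng.gauss(0.0, 1.0) for _ in range(2000)]
skewed_draws = [rng.expovariate(1.0) - 1.0 for _ in range(2000)]

jb_normal = jarque_bera(normal_draws)
jb_skewed = jarque_bera(skewed_draws)
print("JB, normal draws:", jb_normal)
print("JB, centred exponential draws:", jb_skewed)
```

The statistic for the skewed draws should be far larger than any conventional $\chi^2(2)$ critical value, while the statistic for the normal draws should be small.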

## Review Topic 6: Autocorrelation and Heteroskedasticity

Heteroskedasticity: Introduction

Allow $\mathrm{var}(u_i)$ to vary with $i$, so that $y_1, \dots, y_n$ are independent $N(\alpha + \sum_j \beta_j x_{ji}, \sigma_i^2)$ variables, where $\sigma_i^2$ denotes $\mathrm{var}(u_i)$.

Heteroskedasticity is often regarded as associated with cross-section data, grouped data, or random coefficient models, but can occur in time-series applications (GARCH-family models for instance).


Consequences of Heteroskedasticity

OLS is still unbiased and consistent, but no longer efficient in either large or small samples.

OLS is not the MLE because OLS maximizes the likelihood under the false assumption that all $u_i$ have the same variance.

$$\hat{\beta}_j = \beta_j + \frac{\sum_i \tilde{x}_{ji} u_i}{\sum_i \tilde{x}_{ji}^2} = \beta_j + \frac{\sum_i \tilde{x}_{ji} u_i}{RSS_j},$$

so that

$$\mathrm{var}(\hat{\beta}_j) = \frac{\sum_i \tilde{x}_{ji}^2 \sigma_i^2}{\left[\sum_i \tilde{x}_{ji}^2\right]^2} = \frac{\sum_i \tilde{x}_{ji}^2 \sigma_i^2}{RSS_j^2},$$

which is not equal to $E(s^2)/RSS_j$. Conventional standard errors are, therefore, biased. The t- and F-tests are, therefore, invalid.

Tests for Heteroskedasticity

Goldfeld-Quandt Test

A finite sample test that requires normality of the disturbances. The null hypothesis is that the errors are homoskedastic. It is assumed that information is available about the relative magnitudes of variances under the alternative hypothesis of heteroskedasticity.

Using this information, reorder the data so that $\sigma_1^2 \le \sigma_2^2 \le \dots \le \sigma_n^2$. Split the sample into three parts containing $m$, $c$, and $m$ observations, with $m > (k + 1)$ and $n = 2m + c$. Drop the middle set of $c$ observations.

Let $RSS_1$ and $RSS_2$ denote the residual sum of squares functions for estimation using the first $m$ and last $m$ observations, respectively. Under the null hypothesis of homoskedasticity, the statistic $GQ = RSS_2/RSS_1$ is distributed as $F(m - k - 1, m - k - 1)$, and large values indicate that the data are inconsistent with the null hypothesis.

Problems: a) Choice of $m$ and $c$; b) Need enough information to reorder data according to values of variances.

## Lagrange Multiplier/Score Test

The original form suggested by Breusch-Pagan and Godfrey requires normal disturbances even for asymptotic validity, and is not recommended.
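As a rough illustration of the Goldfeld-Quandt procedure described above, the sketch below uses a simple regression ($k = 1$) with simulated data whose error standard deviation is proportional to the regressor, so that sorting by $x$ orders the variances. The data-generating process, the seed, the choice $c = 30$ and the helper names are assumptions made for the example. The F critical value is not computed here (the standard library has no F distribution), so the statistic would be compared with tabulated $F(m-k-1, m-k-1)$ values.

```python
import random


def rss_simple_ols(x, y):
    """Residual sum of squares from OLS of y on an intercept and x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))


def goldfeld_quandt(x, y, c):
    """GQ = RSS2 / RSS1; the data must already be ordered by increasing variance."""
    n = len(x)
    m = (n - c) // 2  # size of each outer subsample
    rss1 = rss_simple_ols(x[:m], y[:m])  # first m observations
    rss2 = rss_simple_ols(x[n - m:], y[n - m:])  # last m observations
    return rss2 / rss1, m


rng = random.Random(1)
n = 120
x = [1.0 + 9.0 * i / (n - 1) for i in range(n)]  # increasing, so already ordered
y = [1.0 + 0.5 * xi + rng.gauss(0.0, xi) for xi in x]  # sd of u_i equals x_i

gq, m = goldfeld_quandt(x, y, c=30)
print("GQ statistic:", gq, "with", m - 2, "and", m - 2, "degrees of freedom")
```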


## Studentized Score Test

Koenker's Studentized Score test is asymptotically robust to nonnormality. Estimate the model by OLS using all observations and obtain the residuals $\hat{u}_i$, $i = 1, \dots, n$. Assume an alternative of the form

$$\sigma_i^2 = g\Big(\gamma_0 + \sum_{j=1}^{p} \gamma_j z_{ji}\Big),$$

where the precise form of $g(\cdot)$ need not be specified.

Regress $\hat{u}_i^2$ on an intercept and $z_{1i}, \dots, z_{pi}$ and obtain the coefficient of determination denoted by $R_K^2$.

Koenker's test statistic is $nR_K^2$ and, under homoskedasticity, $nR_K^2$ is asymptotically distributed as $\chi^2(p)$ with large values indicating the rejection of the null model.

Problems: a) Large sample test; b) need enough information to select the variables $z_{ji}$; an incorrect choice has an impact on power.
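The $nR_K^2$ calculation can be sketched as follows for the simplest case $p = 1$, with the single test variable $z_i$ taken to be the regressor itself. The simulated data, the seed, the choice of $z_i$ and the helper name `ols_fit` are assumptions for illustration only. The 5% critical value of $\chi^2(1)$ is 3.84.

```python
import random


def ols_fit(x, y):
    """OLS of y on an intercept and x; returns (residuals, R-squared)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    tss = sum((yi - ybar) ** 2 for yi in y)
    rss = sum(r ** 2 for r in resid)
    return resid, 1.0 - rss / tss


rng = random.Random(2)
n = 500
x = [1.0 + 9.0 * i / (n - 1) for i in range(n)]
y = [1.0 + 0.5 * xi + rng.gauss(0.0, xi) for xi in x]  # heteroskedastic errors

# Step 1: OLS on all observations, keep the residuals.
resid, _ = ols_fit(x, y)

# Step 2: auxiliary regression of squared residuals on an intercept and z = x.
u2 = [r ** 2 for r in resid]
_, r2_k = ols_fit(x, u2)

koenker_stat = n * r2_k  # compare with chi-square(1), e.g. 3.84 at the 5% level
print("n * R2_K =", koenker_stat)
```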


White's test can be regarded as a Koenker-type test with the $z_{ji}$ being the nonredundant terms of $x_{iq}$ and $x_{iq}x_{ir}$, $q, r = 1, \dots, k$.

Problems: a) Large sample test; b) need enough information to select the variables $z_{ji}$; an incorrect choice has an impact on power.


ARCH models are widely used: the conditional variance depends upon squared past values of $u_i$. The test for ARCH is a Koenker-type check with $z_{ji} = \hat{u}_{i-j}^2$; $i = p + 1, \dots, n$ and $j = 1, \dots, p$.


Treatment of Heteroskedasticity

If the variances are known up to a constant of proportionality, OLS can be applied to transformed data to get efficient estimators. Suppose $\sigma_i^2 = \sigma^2 w_i^2$, with the $w_i^2$ being known, then $\mathrm{var}(u_i/w_i) = \sigma^2 \ \forall i$. In this case, apply OLS to the transformed model

$$(y_i/w_i) = \alpha(1/w_i) + \sum_j \beta_j (x_{ji}/w_i) + (u_i/w_i),$$

in which the $(u_i/w_i)$ are $NID(0, \sigma^2)$ variates.

Note: the transformed model may not contain an intercept.

If heteroskedasticity is suspected and very precise information about its form is not available, then White's heteroskedasticity consistent standard errors, denoted by $WSE(\hat{\alpha})$ and $WSE(\hat{\beta}_j)$, $j = 1, \dots, k$, can be used for asymptotically valid inference after OLS estimation.

White shows that, if

$$WSE(\hat{\beta}_j) = \left[\frac{\sum_i \tilde{x}_{ji}^2 \hat{u}_i^2}{RSS_j^2}\right]^{1/2}, \quad RSS_j = \sum_i \tilde{x}_{ji}^2,$$

then $(\hat{\beta}_j - \beta_j)/WSE(\hat{\beta}_j)$ is asymptotically distributed as $N(0, 1)$ in the presence of unspecified heteroskedasticity.
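A minimal sketch comparing the conventional and White standard errors for the slope of a simple regression. In that special case $\tilde{x}_i$ is just $x_i$ minus its sample mean. The simulated design (error standard deviation equal to $|x_i - 5.5|$), the seed and the function name `ols_with_ses` are assumptions chosen so that the conventional formula understates the sampling variability.

```python
import math
import random


def ols_with_ses(x, y):
    """Slope estimate with its conventional and White standard errors."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    xt = [xi - xbar for xi in x]  # x-tilde: x with the intercept partialled out
    rss_j = sum(v ** 2 for v in xt)
    b = sum(v * yi for v, yi in zip(xt, y)) / rss_j
    a = ybar - b * xbar
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    s2 = sum(r ** 2 for r in resid) / (n - 2)
    conventional_se = math.sqrt(s2 / rss_j)
    white_se = math.sqrt(sum(v ** 2 * r ** 2 for v, r in zip(xt, resid))) / rss_j
    return b, conventional_se, white_se


rng = random.Random(3)
n = 2000
x = [1.0 + 9.0 * i / (n - 1) for i in range(n)]
# Error variance is largest where x is far from its mean (illustrative design).
y = [1.0 + 0.5 * xi + rng.gauss(0.0, abs(xi - 5.5)) for xi in x]

b_hat, se_conv, se_white = ols_with_ses(x, y)
print("slope:", b_hat)
print("conventional SE:", se_conv, "White SE:", se_white)
```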

Hence, if $d_1$ is such that

$$\mathrm{prob}(-d_1 \le N(0, 1) \le d_1) = (1 - \lambda),$$

the $(1 - \lambda) \times 100$ per cent confidence intervals for $\alpha$ and $\beta_j$ are given by $\hat{\alpha} \pm d_1 WSE(\hat{\alpha})$ and $\hat{\beta}_j \pm d_1 WSE(\hat{\beta}_j)$, respectively.

Tests of $H_0: \beta_j = \beta_{j0}$ are based upon

$$t_0^W(\hat{\beta}_j) = (\hat{\beta}_j - \beta_{j0})/WSE(\hat{\beta}_j) \overset{a}{\sim} N(0, 1)$$

under $H_0$.

Since the procedures are only asymptotically valid, $N(0, 1)$ can be replaced by $t(n - k - 1)$ and this is often done. Thus the following can be used to obtain asymptotically valid tests of $H_0: \beta_j = \beta_{j0}$.


With $\lambda$ denoting the significance level:

$H_1: \beta_j \neq \beta_{j0}$: reject $H_0$ if $|t_0^W(\hat{\beta}_j)| > d_1$, where $\mathrm{prob}(t(n - k - 1) > d_1) = \lambda/2$;

$H_1^+: \beta_j > \beta_{j0}$: reject $H_0$ if $t_0^W(\hat{\beta}_j) > d_2$, where $\mathrm{prob}(t(n - k - 1) > d_2) = \lambda$;

$H_1^-: \beta_j < \beta_{j0}$: reject $H_0$ if $t_0^W(\hat{\beta}_j) < -d_2$, where $\mathrm{prob}(t(n - k - 1) > d_2) = \lambda$.

Just replace $\beta_j$ by $\alpha$ and $\hat{\beta}_j$ by $\hat{\alpha}$ in the above to obtain test procedures relevant to testing hypotheses concerning the intercept.

## Autocorrelation/Serial Correlation Introduction

Autocorrelation arises when $E(u_t u_s) \neq 0$ for some $t \neq s$. Use the $t$ subscript because autocorrelation is often discussed in a time-series framework, but spatial autocorrelation has been examined.

The regressors are assumed to be nonrandom. (It would be straightforward to allow for random regressors with $x_{jt}$ independent of $u_s$, for all $j$, $s$ and $t$.) This assumption will be relaxed later. In particular, autocorrelation will be considered when regressors include lagged values of the dependent variable.

Consequences of Autocorrelation

OLS is still unbiased and consistent, but no longer efficient in either large or small samples.

OLS is not the MLE because OLS maximizes the likelihood under the false assumption that the $u_t$ are independent.

$$\hat{\beta}_j = \beta_j + \frac{\sum_t \tilde{x}_{jt} u_t}{\sum_t \tilde{x}_{jt}^2} = \beta_j + \frac{\sum_t \tilde{x}_{jt} u_t}{RSS_j},$$

and, since the $\tilde{x}_{jt} u_t$ are correlated, $\mathrm{var}\left(\sum_t \tilde{x}_{jt} u_t\right) \neq \sum_t \mathrm{var}(\tilde{x}_{jt} u_t)$ and so $\mathrm{var}(\hat{\beta}_j) \neq \sigma^2/RSS_j$. Conventional standard errors are, therefore, biased.

## Autocorrelation/Serial Correlation

In the lectures given this term, it is assumed that the $u_t$ are covariance stationary with $E(u_t u_{t-g}) = \gamma(|g|)$ for all $t$, with $\gamma(|0|) = \sigma^2$. The autocorrelation of order $g$, denoted by $\rho(g)$, is the correlation between $u_t$ and $u_{t-g}$, i.e. $E(u_t u_{t-g})/\sigma^2$, with the sequence $\rho(1), \rho(2), \dots$ being called the autocorrelation function or ACF. Under the null hypothesis of serial independence, $\rho(g) = 0$ for all $g \neq 0$. Different tests check the significance of different sets of estimates of autocorrelations.


## Durbin-Watson Test

Basically a test for nonzero values of $\rho(1)$, based upon OLS residuals. The test statistic is

$$d = \frac{\sum_{t=2}^{n} (\hat{u}_t - \hat{u}_{t-1})^2}{\sum_{t=1}^{n} \hat{u}_t^2},$$

which is approx. $2(1 - r(1))$, where

$$r(1) = \frac{\sum_{t=2}^{n} \hat{u}_t \hat{u}_{t-1}}{\sum_{t=1}^{n} \hat{u}_t^2}.$$

Lemma
Values of $d$ close to 0 (resp. 4) indicate a high level of positive (resp. negative) residual first order serial correlation. The distribution of $d$ under the null hypothesis of independent errors depends upon the values of the regressors, so critical values vary from one case to another.

Tables are available for combinations of $n$ and $k$ (and for models with and without an intercept) giving bounds for the critical values for testing $H_0$ of serial independence against $H_1: \rho(1) > 0$. These upper and lower bounds, denoted by $d_u$ and $d_l$, define an interval that contains the true but unknown critical value. If $d < d_l$, reject. If $d > d_u$, accept. If $d_l \le d \le d_u$, the test is inconclusive. For $H_1: \rho(1) < 0$, use $4 - d_u$ and $4 - d_l$ as bounds.

The Durbin-Watson procedure is a useful test against either the first order autoregressive (AR(1)) model $u_t = \phi_1 u_{t-1} + \epsilon_t$, or the first order moving average (MA(1)) model $u_t = \epsilon_t + \theta_1 \epsilon_{t-1}$, in which the $\epsilon_t$ are $NID(0, \sigma_\epsilon^2)$. For reasons to be discussed later in time series, we assume $|\phi_1| < 1$ and $|\theta_1| \le 1$.

Problems:
Checks for nonzero values of $\rho(1)$ can be insensitive to $\rho(g) \neq 0$, $g \neq 1$, e.g. $g = 4$, when $\rho(1) = 0$.
Test is inconclusive when the sample value of $d$ falls between the bounds (the inconclusive region).
Requires errors to be normal and regressors to be fixed, e.g. no lagged dependent variables.
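The Durbin-Watson statistic and its approximation $2(1 - r(1))$ can be sketched as below. For brevity the functions are applied to a simulated AR(1) series that stands in for OLS residuals. The coefficient 0.8, the seed and the function names are assumptions for illustration.

```python
import random


def durbin_watson(resid):
    """d = sum of squared first differences / sum of squared residuals."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(r ** 2 for r in resid)
    return num / den


def first_order_autocorr(resid):
    """r(1) = sum of u_t * u_{t-1} / sum of u_t squared."""
    num = sum(resid[t] * resid[t - 1] for t in range(1, len(resid)))
    den = sum(r ** 2 for r in resid)
    return num / den


rng = random.Random(4)
phi = 0.8  # positive first order autocorrelation
u = [0.0]
for _ in range(499):
    u.append(phi * u[-1] + rng.gauss(0.0, 1.0))

d = durbin_watson(u)
approx = 2.0 * (1.0 - first_order_autocorr(u))
print("d =", d, " 2(1 - r(1)) =", approx)
```

With strong positive autocorrelation, $d$ should be well below 2 and close to $2(1 - r(1))$. The two differ only through end-point terms.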


Very flexible asymptotic test based upon OLS results. It is asymptotically valid for models with nonnormal errors and lagged dependent variables in the regressor set.

If the null hypothesis of serial independence is to be tested against an autoregressive or moving average model of order $g$, then apply an asymptotically valid F-test of $H_0: \delta_1 = \delta_2 = \dots = \delta_g = 0$ after OLS estimation of the model

$$y_t = \alpha + \sum_{j=1}^{k} \beta_j x_{jt} + \sum_{j=1}^{g} \delta_j \hat{u}_{t-j} + u_t,$$

in which the $\hat{u}_{t-j}$ are lagged values of the residuals from the OLS estimation of $y_t = \alpha + \sum_{j=1}^{k} \beta_j x_{jt} + u_t$. For "gaps" in the alternative model, omit selected $\delta_j$ terms. If $t - j$ is not positive, set $\hat{u}_{t-j} = 0$.

Estimation

If precise information about the form of autocorrelation is available, e.g. type (AR or MA) and order (value of $g$), the asymptotically efficient MLE or an approximation can be used:

$$u_t = \phi_1 u_{t-1} + \dots + \phi_g u_{t-g} + \epsilon_t, \quad \epsilon_t \ NID(0, \sigma_\epsilon^2), \quad AR(g),$$

or

$$u_t = \epsilon_t + \theta_1 \epsilon_{t-1} + \dots + \theta_g \epsilon_{t-g}, \quad \epsilon_t \ NID(0, \sigma_\epsilon^2), \quad MA(g).$$

MLE, or approximations based upon minimizing $\sum_t \epsilon_t^2$, are available in econometric software.

## Residual Serial Correlation or Genuine Disturbance Autocorrelation?

Significant outcomes of tests designed for autocorrelation can be caused by misspecification of the mean function, e.g. omitting relevant regressors or using the wrong functional form. In such cases, re-estimation allowing for autocorrelation is of little value.


A procedure, called the COMFAC test, has been developed to test the null hypothesis that the errors of a regression equation are generated by an autoregressive process of specified order. The COMFAC test uses as its alternative an expanded version of the original regression equation obtained by adding lagged values of the dependent variable and the initial set of regressors. Details are not provided because this test, while asymptotically valid, has finite sample properties that cause concern; see Gregory and Veall, Economics Letters, 1986, 22, 203-208. Moreover, the alternative [...] a test that rarely detects a false null hypothesis.


Mizon ("A simple message for autocorrelation correctors: don't", Journal of Econometrics, 1995, 69, 267-288) offers the following conclusions:
Although it is important to test for autocorrelation, it is rarely appropriate to "autocorrelation correct" in response to rejecting the null hypothesis of independent disturbances;
and, when re-estimation assuming autoregressive errors imposes invalid restrictions, inconsistent parameter estimators will result.

The nature of the restrictions to which Mizon refers can be illustrated by considering a simple case in which the model of the null is $y_t = \beta x_t + u_t$, with $u_t = \phi_1 u_{t-1} + \epsilon_t$, $\epsilon_t \ NID(0, \sigma_\epsilon^2)$, i.e. the disturbances $u_t$ are AR(1).

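Under this null,

$$y_t = \beta x_t + \phi_1 (y_{t-1} - \beta x_{t-1}) + \epsilon_t,$$

or equivalently,

$$y_t = \beta x_t + \phi_1 y_{t-1} - \phi_1 \beta x_{t-1} + \epsilon_t, \quad \epsilon_t \ NID(0, \sigma_\epsilon^2),$$

in which the coefficient of $x_{t-1}$ is restricted to be minus the product of the coefficients of $x_t$ and $y_{t-1}$. Note that this restriction is not linear.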
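To make the restriction explicit, it may help to write the unrestricted dynamic regression next to the AR(1)-error model. The coefficient names $b_0$, $b_1$, $b_2$ below are introduced only for this illustration and do not appear in the original notes.

```latex
\begin{align*}
\text{Unrestricted model:}\quad
  & y_t = b_0 x_t + b_1 y_{t-1} + b_2 x_{t-1} + \epsilon_t, \\
\text{AR(1)-error model:}\quad
  & b_0 = \beta, \qquad b_1 = \phi_1, \qquad b_2 = -\phi_1 \beta, \\
\text{Implied restriction:}\quad
  & b_2 + b_0 b_1 = 0 .
\end{align*}
```

The unrestricted model has three free coefficients while the AR(1)-error model has two, so "autocorrelation correcting" imposes one nonlinear restriction. If that restriction is false, the restricted estimators are inconsistent, which is the point made above.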


## Topic 1: Introduction of Time Series

Introductory Financial Econometrics
Topic 1 Introduction of Time Series
3 Credits, 51 Hours

Jianhua Gang
School of Finance
Renmin University of China

Spring 2013

Statistical analysis of data observed over time.

Data observed between two dates, normalized as $t = 1$ and $t = T$.
Equispaced, i.e. we observe $Y_1, Y_2, \dots, Y_t, Y_{t+1}, \dots, Y_{T-1}, Y_T$ and NO intermediate observation is missing.
$Y_t$ depends on $Y_s$ (if there is any dependence) if and only if $s < t$.
$Y_t$ does not depend on $Y_s$ if $s > t$.
Then, the vector $\{Y_1, Y_2, \dots, Y_t, Y_{t+1}, \dots, Y_{T-1}, Y_T\}$ is a time series.

Moments

For a generic random variable we can define the mean, variance, and for pairs of random variables we can also define covariance, correlation etc. In a time series we define these for each $Y_t$:

Definitions (Moments of Time Series)

Mean: $E(Y_t) = \mu_t$;
Variance: $E(Y_t - \mu_t)^2 = \sigma_t^2$;
Covariance: $E\{(Y_t - \mu_t)(Y_{t+j} - \mu_{t+j})\} = \gamma_t(j)$;
Correlation: $\dfrac{\gamma_t(j)}{\sigma_t \sigma_{t+j}}$.

Operators

Lag operator: $L$
$L Y_t = Y_{t-1}$
So, $L^{-1} Y_t = Y_{t+1}$

First Difference operator:
$\Delta = 1 - L$
$\Delta Y_t = (1 - L)Y_t = Y_t - Y_{t-1}$
Also, $\Delta^2 Y_t = (1 - L)^2 Y_t = Y_t - 2Y_{t-1} + Y_{t-2}$

## Stationarity and Ergodicity

Problem
Suppose $\{Y_1, Y_2, \dots, Y_t, Y_{t+1}, \dots, Y_{T-1}, Y_T\}$ is a single realization from a stochastic process $\{Y_t\}_{-\infty}^{\infty}$. We are interested in the model that generated the time series, but we do not know it. How can we make inference, using one single realization?

Solution
We must use the fact that this is a T-dimensional observation.

Restrict Heterogeneity

Assume some properties are common to all the $Y_t$'s in $\{Y_1, Y_2, \dots, Y_T\}$. For example,

Definition (Covariance Stationarity)
For time series $Y_t \in \{Y_t\}_{-\infty}^{\infty}$,

$$E(Y_t) = \mu, \quad \forall t$$

$$E\left[(Y_t - \mu)(Y_{t+j} - \mu)\right] = \gamma(j), \quad \forall t$$

i.e. the first two moments are finite and do not depend on time (spatial equivalent).

In this way, we may try to estimate $\mu$ or $\gamma(j)$ using the sample counterparts. "Covariance stationarity" is also known as "weak stationarity" or simply as "stationarity" (without other references).

For stationary processes, we shorten the notation and introduce $\gamma_j$ for $\gamma(j)$ to indicate the autocovariance.

The plot of $\gamma_j$ against $j$ is called the autocovariance function.

Definition (Strict Stationarity)
For any $j_1, \dots, j_n$, the joint distribution of $(Y_{t+j_1}, \dots, Y_{t+j_n})$ and of $(Y_{t+\tau+j_1}, \dots, Y_{t+\tau+j_n})$ is the same for any $\tau$; that is, the joint distribution depends only on $j_1, \dots, j_n$, not on time.

## Restrict Dependence over Time

Given $n \to \infty$, and given the process is stationary, then the sample moments would estimate the population moments consistently.

One may generalize this argument and allow for some dependence, provided that it is not too much: a sufficient condition for consistent estimation of $\mu$ is

$$\sum_{j=0}^{\infty} |\gamma_j| < \infty.$$

Definition
One restriction on the dependence that allows us to consistently estimate the population moments using the sample moments in stationary processes is called Ergodicity.

Often we are interested in time series because we want to answer one of the two questions:

1. Forecasting: What value do you expect for $Y_{t+1}$ if you observed $Y_1, \dots, Y_t$?
2. Impulse response: What is the consequence on $Y_t$ of a shock that took place $j$ periods ago (at time $t - j$)?

## Forecasts Based on a Linear Projection

Assume: $Y_t$ is stationary; $E(Y_t) = 0$ (if $E(Y_t) = \mu \neq 0$, then work with $Y_t - \mu$).

1. Linear forecast of $Y_{t+1}$ using $Y_t$ is
   $\hat{Y}_{t+1|t} = \alpha_1^{(1)} Y_t$;
2. Linear forecast of $Y_{t+1}$ using $Y_t$ and $Y_{t-1}$ is
   $\hat{Y}_{t+1|t} = \alpha_1^{(2)} Y_t + \alpha_2^{(2)} Y_{t-1}$;
3. Linear forecast of $Y_{t+1}$ using $Y_t, \dots, Y_{t-m+1}$ is
   $\hat{Y}_{t+1|t,\dots,t-m+1} = \alpha_1^{(m)} Y_t + \alpha_2^{(m)} Y_{t-1} + \dots + \alpha_m^{(m)} Y_{t-m+1}$.

Forecast

Now, which values of $(\alpha_1^{(m)}, \alpha_2^{(m)}, \dots, \alpha_m^{(m)})$ characterise a good linear projection?

Let $X_t = (Y_t, \dots, Y_{t-m+1})'$, $\alpha = (\alpha_1^{(m)}, \alpha_2^{(m)}, \dots, \alpha_m^{(m)})'$, then $\alpha$ must meet $E\left[(Y_{t+1} - \alpha' X_t) X_t'\right] = 0'$ (i.e., the forecast error $Y_{t+1} - \alpha' X_t$ is not correlated with $X_t$).

Then, given $Y_{t+1} = Y_{t+1}'$ ($Y_{t+1}$ being a single component),

$$E(Y_{t+1} X_t') - \alpha' E(X_t X_t') = 0'$$

$$\hat{\alpha} = \left[E(X_t X_t')\right]^{-1} E(X_t Y_{t+1})$$

It can be proved that $\hat{\alpha}$ gives the best linear forecast.

Wold Decomposition

Of course, in some cases a non-linear forecast may be better.

However, a linear model is usually easier to use, so it is important that any stationary process may be given a linear representation. This can be discussed using the Wold Decomposition.

Definition (Wold Decomposition)

$$Y_t = \kappa_t + \sum_{j=0}^{\infty} \psi_j \epsilon_{t-j}$$

where $\psi_0 = 1$, $\sum_{j=0}^{\infty} \psi_j^2 < \infty$,

and $\epsilon_t$, the error made in forecasting $Y_t$ on the basis of a linear function,

$$\epsilon_t = Y_t - \hat{E}(Y_t | Y_{t-1}, \dots),$$

is such that, for any $t$, $E(\epsilon_t) = 0$, $E(\epsilon_t^2) = \sigma^2$, $E(\epsilon_t \epsilon_s) = 0$ if $t \neq s$.

$\kappa_t$ is the linear deterministic component of $Y_t$: it can be predicted arbitrarily well as a linear function of past $Y_t$, i.e., $\kappa_t = \hat{E}(\kappa_t | Y_{t-1}, \dots)$, and it is such that $E(\kappa_t \epsilon_{t-j}) = 0 \ \forall j$.

Impulse Response

For a process $Y_t$ that admits

$$Y_t = \mu + \sum_{j=0}^{\infty} \psi_j \epsilon_{t-j}, \quad E(\epsilon_t) = 0, \quad E(\epsilon_t^2) = \sigma^2, \quad E(\epsilon_t \epsilon_s) = 0, \ s \neq t,$$

notice that

$$\frac{\partial Y_t}{\partial \epsilon_{t-j}} = \psi_j$$

so $\psi_j$ is the effect on $Y_t$ of a shock that took place $j$ periods before (at time $t - j$). A plot of $\psi_j$ (against $j$) is called the impulse response function.

Autocorrelation Function

Definition (ACF)
For a stationary $Y_t$, define the autocorrelation,

$$\rho_j = \frac{\gamma_j}{\gamma_0}.$$

## Partial Autocorrelation Function

Definition (PACF)
In the linear projections

$$\hat{Y}_{t+1|t,\dots,t-m+1} = \alpha_1^{(m)} Y_t + \alpha_2^{(m)} Y_{t-1} + \dots + \alpha_m^{(m)} Y_{t-m+1},$$

the last coefficients $\alpha_1^{(1)}, \alpha_2^{(2)}, \dots, \alpha_m^{(m)}$ are called partial autocorrelations, and a plot of $\alpha_j^{(j)}$ (against $j$) is called the partial autocorrelation function.
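A minimal sketch of sample autocorrelations and the first two partial autocorrelations. The partial autocorrelations come from solving the linear projection equations for $m = 1$ and $m = 2$ by hand, which gives $\alpha_1^{(1)} = \rho_1$ and $\alpha_2^{(2)} = (\rho_2 - \rho_1^2)/(1 - \rho_1^2)$. The simulated series (an AR(1) with coefficient 0.6), the seed and the function names are assumptions for illustration.

```python
import random


def sample_autocov(y, j):
    """Sample autocovariance at lag j (divisor n)."""
    n = len(y)
    ybar = sum(y) / n
    return sum((y[t] - ybar) * (y[t - j] - ybar) for t in range(j, n)) / n


def sample_acf(y, max_lag):
    """Sample autocorrelations rho_0, ..., rho_max_lag."""
    g0 = sample_autocov(y, 0)
    return [sample_autocov(y, j) / g0 for j in range(max_lag + 1)]


def pacf_first_two(rho):
    """Last projection coefficients for m = 1 and m = 2."""
    a11 = rho[1]
    a22 = (rho[2] - rho[1] ** 2) / (1.0 - rho[1] ** 2)
    return a11, a22


rng = random.Random(5)
phi = 0.6
y_prev = 0.0
series = []
for t in range(20500):
    y_prev = phi * y_prev + rng.gauss(0.0, 1.0)
    if t >= 500:  # discard a burn-in so the start-up value does not matter
        series.append(y_prev)

rho = sample_acf(series, 2)
a11, a22 = pacf_first_two(rho)
print("ACF at lags 1, 2:", rho[1], rho[2])
print("PACF at lags 1, 2:", a11, a22)
```

For this series the sample ACF should decay roughly like $0.6^j$, while the partial autocorrelation at lag 2 should be close to zero.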

## Topic 2: Moment Generating Function

Introductory Financial Econometrics
Topic 2 MGF
3 Credits, 51 Hours

Jianhua Gang
School of Finance
Renmin University of China

Spring 2013

It is however essential to consider the MGFs in order to depict/solve relevant time series problems.

Preliminaries: Sample Space and Random Variables

Define,
$\Omega_x$: sample space;
$x$: random variable.

Then a probability density function (pdf) $f(x)$ is a mapping from $\Omega_x$ to the set of $\mathbb{R}$ such that:

$$\sum_{x \in \Omega_x} f(x) = 1 \quad \text{(discrete)};$$

$$\Pr\{x \in \Omega_x\} = \int_{x \in \Omega_x} f(x)\,dx = 1 \quad \text{(continuous)}.$$

Preliminaries: Binomial Distribution

Define as,

$$f(x) = \frac{n!}{x!(n - x)!} p^x (1 - p)^{n - x}, \quad \text{for } x = 0, 1, \dots, n.$$

The density arises from the identity of:

$$(a + b)^n = \sum_{x=0}^{n} \frac{n!}{x!(n - x)!} a^x b^{n - x}.$$

Preliminaries: Poisson Distribution

Define as,

$$f(x) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad \text{for } x = 0, 1, 2, \dots$$

The density arises from the identity of:

$$e^{\lambda} = \sum_{x=0}^{\infty} \frac{\lambda^x}{x!}$$

in which $\lambda = E(x)$.

Preliminaries: Normal Distribution

Define as,

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(x - \mu)^2}{2\sigma^2}\right\}$$

written as $x \sim N(\mu, \sigma^2)$, where $-\infty < x < \infty$.

The expectation (or the mean, or 1st moment) of a random variable is defined by,

$$E(x) = \begin{cases} \sum_{x \in \Omega_x} x f(x) & \text{discrete} \\ \int_{x \in \Omega_x} x f(x)\,dx & \text{continuous} \end{cases}$$

i.e., it is a weighted average of $x$ over all possible outcomes.

The expectation of a measurable function $g(x)$ of a r.v. $x$ is therefore defined by:

$$E\{g(x)\} = \begin{cases} \sum_{x \in \Omega_x} g(x) f(x) \\ \int_{x \in \Omega_x} g(x) f(x)\,dx \end{cases}$$

The raw moments are $\mu_i' = E\left[x^i\right]$ and the central moments are $\mu_i = E\left\{(x - \mu)^i\right\}$.

Calculation of Moments

It is simple to show that:

$g(x) = c \Rightarrow E\{g(x)\} = c$

$E\{c\,g(x)\} = c\,E\{g(x)\}$

$E\{a + b\,g(x)\} = a + b\,E\{g(x)\}$

and hence,

$$\sigma^2 = E\left\{(x - \mu)^2\right\} = E\left[x^2\right] - [E(x)]^2$$

## Higher-Order Moments

What about the higher-order (central) moments?
By definition, the third and the fourth moments measure the following properties:
$\mu_3$: skewness;
$\mu_4$: kurtosis;
measured by the standardized moments $\mu_3/\sigma^3$ and $\mu_4/\sigma^4$.

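## Moment Generating Functions (MGFs)

Calculating the moments of even simple r.v.s can be difficult. However, consider the following function:

$$M_x(\theta) = E\left\{e^{\theta x}\right\} = \begin{cases} \sum_{x \in \Omega_x} e^{\theta x} f(x) \\ \int_{x \in \Omega_x} e^{\theta x} f(x)\,dx \end{cases}$$

Expanding the exponential,

$$M_x(\theta) = E\left\{e^{\theta x}\right\} = E\left\{1 + \theta x + \frac{\theta^2 x^2}{2!} + \dots\right\} = E\left\{\sum_{i=0}^{\infty} \frac{(\theta x)^i}{i!}\right\} = \int_{x \in \Omega_x} \left[\sum_{i=0}^{\infty} \frac{(\theta x)^i}{i!}\right] f(x)\,dx$$

$$= \int_{x \in \Omega_x} \left[1 + \theta x + \frac{\theta^2 x^2}{2!} + \dots\right] f(x)\,dx = 1 + \mu_1' \theta + \mu_2' \frac{\theta^2}{2!} + \mu_3' \frac{\theta^3}{3!} + \dots + \mu_i' \frac{\theta^i}{i!} + \dots$$

so that,

$$\left.\frac{d^i\left[M_x(\theta)\right]}{d\theta^i}\right|_{\theta=0} = \mu_i' \quad \text{(raw moments)}$$

Hence we call the function $M_x(\theta)$ the MGF of $x$. Note that this property is true in either the discrete or the continuous case.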

It is also easy to see that the MGF satisfies two very important properties (where the $x_i$ are independent copies of the r.v. $x$):

$$M_{\sum_{i=1}^{n} x_i}(\theta) = \prod_{i=1}^{n} M_{x_i}(\theta) = \left[M_x(\theta)\right]^n$$

$$M_{\frac{1}{n}\sum_{i=1}^{n} x_i}(\theta) = \prod_{i=1}^{n} M_{x_i}\left(\frac{\theta}{n}\right) = \left[M_x\left(\frac{\theta}{n}\right)\right]^n$$

An Example

Example
Observations $x_1$ through $x_n$ are independent copies from the r.v. $x \sim Po(\lambda)$. Suppose we are interested in the properties (distribution, moments, etc.) of the sample mean:

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$

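An Example

Problem
Calculate the MGF of $x$;

Solution

$$M_x(\theta) = E\left\{e^{\theta x}\right\} = \sum_{x=0}^{\infty} e^{\theta x} f(x) = \sum_{x=0}^{\infty} e^{\theta x} \frac{e^{-\lambda} \lambda^x}{x!} = e^{-\lambda} \sum_{x=0}^{\infty} \frac{\left(\lambda e^{\theta}\right)^x}{x!} = e^{-\lambda} \exp\left\{\lambda e^{\theta}\right\} = \exp\left\{\lambda\left(e^{\theta} - 1\right)\right\}$$

Problem
Calculate the MGF of $S_n = n\bar{X}$;

Solution

$$M_{S_n}(\theta) = \prod_{i=1}^{n} M_{x_i}(\theta) = \left[\exp\left\{\lambda\left(e^{\theta} - 1\right)\right\}\right]^n = \exp\left\{n\lambda\left(e^{\theta} - 1\right)\right\}$$

Note that the MGF of $S_n$ is of the same form as that for $x$, i.e. letting $\lambda^* = n\lambda$,

$$M_{S_n}(\theta) = \exp\left\{\lambda^*\left(e^{\theta} - 1\right)\right\}$$

i.e. $S_n \sim Po(n\lambda) = Po(\lambda^*)$.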

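An Example

Problem
Calculate the MGF of $\bar{X}$;

Solution

$$M_{\bar{X}}(\theta) = M_{S_n/n}(\theta) = M_{\sum x_i}\left(\frac{\theta}{n}\right) = \prod_{i=1}^{n} M_{x_i}\left(\frac{\theta}{n}\right) = \left[M_x\left(\frac{\theta}{n}\right)\right]^n = \left[\exp\left\{\lambda\left(e^{\theta/n} - 1\right)\right\}\right]^n = \exp\left\{n\lambda\left(e^{\theta/n} - 1\right)\right\}$$

Problem
The moments of $\bar{X}$.

Solution

$$E\left[\bar{X}^i\right] = \left.\frac{d^i \exp\left\{n\lambda\left(e^{\theta/n} - 1\right)\right\}}{d\theta^i}\right|_{\theta=0}$$

$$E\left[\bar{X}\right] = \left.\frac{d \exp\left\{n\lambda\left(e^{\theta/n} - 1\right)\right\}}{d\theta}\right|_{\theta=0} = \lambda$$

$$E\left[\bar{X}^2\right] = \left.\frac{d^2 \exp\left\{n\lambda\left(e^{\theta/n} - 1\right)\right\}}{d\theta^2}\right|_{\theta=0} = \lambda^2 + \frac{\lambda}{n}$$

$$\sigma_{\bar{X}}^2 = E\left[\bar{X}^2\right] - \left(E\left[\bar{X}\right]\right)^2 = \frac{\lambda}{n} \quad \text{(central moments)}$$

That is, we immediately find that

$$E\left[\bar{X}\right] = \lambda, \qquad \sigma_{\bar{X}}^2 = \frac{\lambda}{n}.$$

If we consider $\bar{X}$ as an estimator for $\lambda$, we refer to these properties as unbiasedness and, since the variance tends to zero as $n$ grows, consistency.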
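The two derivatives above can be checked numerically. The sketch below differentiates $M_{\bar{X}}(\theta) = \exp\{n\lambda(e^{\theta/n} - 1)\}$ at $\theta = 0$ by central finite differences. The values $\lambda = 2$, $n = 10$, the step size and the function names are assumptions chosen for the illustration.

```python
import math


def mgf_sample_mean(theta, lam, n):
    """MGF of the mean of n independent Poisson(lam) variables."""
    return math.exp(n * lam * (math.exp(theta / n) - 1.0))


def first_two_raw_moments(lam, n, h=1e-3):
    """Approximate M'(0) and M''(0) by central finite differences."""
    m_plus = mgf_sample_mean(h, lam, n)
    m_zero = mgf_sample_mean(0.0, lam, n)
    m_minus = mgf_sample_mean(-h, lam, n)
    first = (m_plus - m_minus) / (2.0 * h)
    second = (m_plus - 2.0 * m_zero + m_minus) / h ** 2
    return first, second


lam, n = 2.0, 10
m1, m2 = first_two_raw_moments(lam, n)
variance = m2 - m1 ** 2
print("E[Xbar]   ~", m1, " (theory:", lam, ")")
print("E[Xbar^2] ~", m2, " (theory:", lam ** 2 + lam / n, ")")
print("var(Xbar) ~", variance, " (theory:", lam / n, ")")
```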

## Topic 3: ARMA Models

Introductory Financial Econometrics
Topic 3 ARMA Models
3 Credits, 51 Hours

Jianhua Gang
School of Finance
Renmin University of China

Spring 2013

We said we are interested in the $\psi_j$ in the representation:

$$Y_t = \mu + \sum_{j=0}^{\infty} \psi_j \epsilon_{t-j}$$

for the impulse response analysis and for forecasting.

However, in general we don't know the $\psi_j$, and we can't hope to estimate an infinite number of parameters, so we have to propose parsimonious models.

## The Simplest Model: White Noise

Definition
$\{\epsilon_t\}_{-\infty}^{\infty}$ is white noise if:

$E(\epsilon_t) = 0 \quad \forall t$

$E(\epsilon_t^2) = \sigma^2 \quad \forall t$

$E(\epsilon_t \epsilon_s) = 0 \quad \forall t \ (t \neq s)$

If $\epsilon_t$ is $w.n.(0, \sigma^2)$,

$\epsilon_t$ may be independent, but need not be;
$\epsilon_t$ may be strictly stationary, but need not be;
$\epsilon_t$ is covariance stationary;
so, $\gamma_j = 0$, $\rho_j = 0$, and $\alpha_j^{(j)} = 0$ for $j \neq 0$;
and if $Y_t = \mu + \sum_{j=0}^{\infty} \psi_j \epsilon_{t-j}$, then $Y_t$ is stationary if $\sum_{j=0}^{\infty} \psi_j^2 < \infty$;
and if $Y_t = \mu + \sum_{j=0}^{\infty} \psi_j \epsilon_{t-j}$, then $Y_t$ is ergodic for the mean if $\sum_{j=0}^{\infty} |\psi_j| < \infty$.

MA(1)

Let $\epsilon_t \sim w.n.(0, \sigma^2)$, then $Y_t = \mu + \epsilon_t + \theta \epsilon_{t-1}$ is the MA(1).

We can check stationarity noticing that $\psi_0 = 1$, $\psi_1 = \theta$, so

$$\sum_{j=0}^{\infty} \psi_j^2 = 1 + \theta^2 < \infty.$$

Otherwise, we can check that the first two moments do not depend on time.

1. Mean: $E(Y_t) = \mu$
2. Autocovariances:
   $\gamma_0 = E[(Y_t - \mu)^2] = E[(\epsilon_t + \theta \epsilon_{t-1})^2] = (1 + \theta^2)\sigma^2$
   $\gamma_1 = E[(Y_t - \mu)(Y_{t-1} - \mu)] = \theta \sigma^2$
   $\gamma_{j \ge 2} = 0$
3. Autocorrelations: $\rho_1 = \dfrac{\theta}{1 + \theta^2}$, $\rho_{j \ge 2} = 0$.

Invertible MA(1)

Setting $\mu = 0$, rewrite $\epsilon_t = Y_t - \theta \epsilon_{t-1}$ as $\epsilon_t = Y_t - \theta L \epsilon_t$ using the lag operator. Then, $(1 + \theta L)\epsilon_t = Y_t$, so, for $|\theta| < 1$,

$$\epsilon_t = \frac{Y_t}{(1 + \theta L)} = Y_t \sum_{j=0}^{\infty} (-\theta)^j L^j = \sum_{j=0}^{\infty} (-\theta)^j Y_{t-j},$$

i.e.

$$Y_t = -\sum_{j=1}^{\infty} (-\theta)^j Y_{t-j} + \epsilon_t.$$

MA(q)

Let $\epsilon_t \sim w.n.(0, \sigma^2)$, then $Y_t = \mu + \epsilon_t + \theta_1 \epsilon_{t-1} + \dots + \theta_q \epsilon_{t-q}$ is the MA(q).

Mean: $E(Y_t) = \mu$

Autocovariances:

$\gamma_0 = E[(\epsilon_t + \theta_1 \epsilon_{t-1} + \dots + \theta_q \epsilon_{t-q})^2] = (1 + \theta_1^2 + \dots + \theta_q^2)\sigma^2$

$\gamma_{j \le q} = E[(\epsilon_t + \theta_1 \epsilon_{t-1} + \dots + \theta_q \epsilon_{t-q})(\epsilon_{t-j} + \theta_1 \epsilon_{t-1-j} + \dots + \theta_q \epsilon_{t-q-j})] = (\theta_j + \theta_1 \theta_{j+1} + \theta_2 \theta_{j+2} + \dots + \theta_{q-j} \theta_q)\sigma^2$

$\gamma_{j > q} = 0$

The impulse responses are $\theta_j$, $j \le q$, and drop to 0 after $q$ lags.

Invertibility: set $\mu = 0$; recall $Y_t = (1 + \theta_1 L + \dots + \theta_q L^q)\epsilon_t$ and factor $(1 + \theta_1 L + \dots + \theta_q L^q) = (1 - \lambda_1 L)(1 - \lambda_2 L)\dots(1 - \lambda_q L)$. In the MA(1) we asked that $|\theta_1| < 1$: in the same way here we have to ask that $|\lambda_1| < 1, |\lambda_2| < 1, \dots, |\lambda_q| < 1$. This is sometimes stated as asking that the roots of the equation in $z$ of the form $(1 + \theta_1 z + \dots + \theta_q z^q) = 0$ lie OUTSIDE the unit circle.
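The MA(1) autocorrelations derived above ($\rho_1 = \theta/(1 + \theta^2)$ and $\rho_j = 0$ for $j \ge 2$) can be checked by simulation. The sketch below assumes Gaussian white noise, $\theta = 0.5$ (so $\rho_1 = 0.4$), a fixed seed and hypothetical helper names.

```python
import random


def simulate_ma1(theta, n, rng, mu=0.0):
    """Simulate Y_t = mu + eps_t + theta * eps_{t-1} with N(0, 1) white noise."""
    eps_prev = rng.gauss(0.0, 1.0)
    out = []
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        out.append(mu + eps + theta * eps_prev)
        eps_prev = eps
    return out


def sample_autocorr(y, j):
    """Sample autocorrelation at lag j."""
    n = len(y)
    ybar = sum(y) / n
    g0 = sum((v - ybar) ** 2 for v in y) / n
    gj = sum((y[t] - ybar) * (y[t - j] - ybar) for t in range(j, n)) / n
    return gj / g0


rng = random.Random(6)
theta = 0.5
y = simulate_ma1(theta, 20000, rng)

rho1_theory = theta / (1.0 + theta ** 2)
r1 = sample_autocorr(y, 1)
r2 = sample_autocorr(y, 2)
print("lag 1: sample", r1, "theory", rho1_theory)
print("lag 2: sample", r2, "theory 0.0")
```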

MA(infinity)

Let $Y_t = \mu + \sum_{j=0}^{\infty} \psi_j \epsilon_{t-j}$. If $\sum_{j=0}^{\infty} |\psi_j| < \infty$, we can derive the moments by replacing $\theta_j$ by $\psi_j$ in the MA(q) results:

1. Mean: $E(Y_t) = \mu$
2. Autocovariances:
   $\gamma_0 = \sum_{k=0}^{\infty} \psi_k^2 \sigma^2$
   $\gamma_j = \sum_{k=0}^{\infty} \psi_k \psi_{k+j} \sigma^2$

AR(1)

Let $\epsilon_t \sim w.n.(0, \sigma^2)$, then $Y_t = c + \phi Y_{t-1} + \epsilon_t$ is AR(1). Assume further that $|\phi| < 1$. Since $Y_{t-1} = c + \phi Y_{t-2} + \epsilon_{t-1}$, replace into the previous equation:

$$Y_t = c + \phi(c + \phi Y_{t-2} + \epsilon_{t-1}) + \epsilon_t$$

$$= (1 + \phi)c + \phi^2 Y_{t-2} + \phi \epsilon_{t-1} + \epsilon_t$$

$$= \dots \text{ iterating}$$

$$= \sum_{j=0}^{n} \phi^j c + \phi^{n+1} Y_{t-n-1} + \sum_{j=0}^{n} \phi^j \epsilon_{t-j}$$

As $n \to \infty$, and $|\phi| < 1$,

$$Y_t = \frac{1}{1 - \phi} c + 0 + \sum_{j=0}^{\infty} \phi^j \epsilon_{t-j}$$

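AR(1)

Since $\psi_j = \phi^j$, the condition

$$\sum_{j=0}^{\infty} |\psi_j| = \sum_{j=0}^{\infty} |\phi|^j = \frac{1}{1 - |\phi|} < \infty$$

holds, so the AR(1) is stationary and ergodic for the mean.

This can also be obtained by rewriting $Y_t$ as $Y_t = c + \phi L Y_t + \epsilon_t$, using the lag operator, and then $(1 - \phi L)Y_t = c + \epsilon_t$. Since $|\phi| < 1$,

$$Y_t = (1 - \phi L)^{-1} c + (1 - \phi L)^{-1} \epsilon_t = \sum_{j=0}^{\infty} \phi^j c + \sum_{j=0}^{\infty} \phi^j \epsilon_{t-j} = \frac{c}{1 - \phi} + \sum_{j=0}^{\infty} \phi^j \epsilon_{t-j}$$

Mean:

$$E(Y_t) = \frac{c}{1 - \phi} \ (= \mu)$$

Autocovariances: using the formula for the MA($\infty$) process,

$$\gamma_0 = \sum_{k=0}^{\infty} \psi_k^2 \sigma^2 = \sum_{k=0}^{\infty} \phi^{2k} \sigma^2 = \frac{1}{1 - \phi^2} \sigma^2$$

$$\gamma_j = \sum_{k=0}^{\infty} \psi_k \psi_{k+j} \sigma^2 = \sum_{k=0}^{\infty} \phi^k \phi^{k+j} \sigma^2 = \sum_{k=0}^{\infty} \phi^{2k} \phi^j \sigma^2 = \frac{\phi^j}{1 - \phi^2} \sigma^2$$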
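The closed-form AR(1) autocovariances can be compared with a truncated version of the MA($\infty$) sums. This is a small numerical sketch: the parameter values $\phi = 0.7$, $\sigma^2 = 2$, the truncation at 2000 terms and the function names are assumptions for illustration.

```python
def ar1_autocov_from_ma(phi, sigma2, j, terms=2000):
    """Truncated sum of psi_k * psi_{k+j} * sigma^2 with psi_k = phi**k."""
    return sigma2 * sum(phi ** k * phi ** (k + j) for k in range(terms))


def ar1_autocov_closed(phi, sigma2, j):
    """Closed form gamma_j = phi**j * sigma^2 / (1 - phi^2)."""
    return phi ** j * sigma2 / (1.0 - phi ** 2)


phi, sigma2 = 0.7, 2.0
gaps = []
for j in range(4):
    from_sum = ar1_autocov_from_ma(phi, sigma2, j)
    closed = ar1_autocov_closed(phi, sigma2, j)
    gaps.append(abs(from_sum - closed))
    print("lag", j, "truncated sum:", from_sum, "closed form:", closed)
```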

AR(1)

Autocorrelations:

$$\rho_j = \frac{\gamma_j}{\gamma_0} = \phi^j$$

Impulse Response Function: $\psi_j = \phi^j$

Upon knowing that the process is stationary, we could derive the mean and autocovariances:

Mean: $E(Y_t) = E(c + \phi Y_{t-1} + \epsilon_t) = c + \phi E(Y_{t-1}) + E(\epsilon_t)$. Using stationarity, $E(Y_t) = \mu$, $E(Y_{t-1}) = \mu$, so $\mu = c + \phi\mu$ and then $\mu = \dfrac{c}{1 - \phi}$.

Autocovariances: Replacing $c = \mu(1 - \phi)$, rewrite $Y_t$ as

$$Y_t = \mu - \phi\mu + \phi Y_{t-1} + \epsilon_t$$

$$Y_t - \mu = \phi(Y_{t-1} - \mu) + \epsilon_t$$

then

$$\gamma_0 = E(Y_t - \mu)^2 = E(\phi(Y_{t-1} - \mu) + \epsilon_t)^2 = \phi^2 \gamma_0 + \sigma^2,$$

solving for $\gamma_0$,

$$\gamma_0 = \frac{\sigma^2}{1 - \phi^2}.$$

$$\gamma_{j \ge 1} = E[(Y_t - \mu)(Y_{t-j} - \mu)] = E[(\phi(Y_{t-1} - \mu) + \epsilon_t)(Y_{t-j} - \mu)] = \phi E[(Y_{t-1} - \mu)(Y_{t-j} - \mu)] + E(\epsilon_t (Y_{t-j} - \mu)) = \phi \gamma_{j-1}$$

So

$$\gamma_{j \ge 1} = \frac{\phi^j}{1 - \phi^2} \sigma^2.$$

AR(p)

Let $\epsilon_t \sim w.n.(0, \sigma^2)$, then $Y_t = c + \phi_1 Y_{t-1} + \dots + \phi_p Y_{t-p} + \epsilon_t$ is AR(p).

Problem
How can we check for stationarity?

Solution
Factoring $(1 - \phi_1 L - \dots - \phi_p L^p) = (1 - \lambda_1 L)\dots(1 - \lambda_p L)$, stationarity follows if $|\lambda_j| < 1$ for all $j$.
Another way to state this condition is to check that the solutions of the equation in $z$, $(1 - \phi_1 z - \dots - \phi_p z^p) = 0$, are all OUTSIDE the unit circle.
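For $p = 2$ the root condition can be checked directly with the quadratic formula. The sketch below solves $1 - \phi_1 z - \phi_2 z^2 = 0$ and tests whether every root lies outside the unit circle. The function names and the example coefficients are assumptions for illustration.

```python
import cmath


def ar2_roots(phi1, phi2):
    """Roots in z of 1 - phi1*z - phi2*z**2 = 0 (possibly complex)."""
    if phi2 == 0.0:
        # Polynomial is linear (or constant when phi1 is also zero).
        return [] if phi1 == 0.0 else [complex(1.0 / phi1)]
    # Equivalent quadratic: phi2*z**2 + phi1*z - 1 = 0.
    disc = cmath.sqrt(phi1 ** 2 + 4.0 * phi2)
    return [(-phi1 + disc) / (2.0 * phi2), (-phi1 - disc) / (2.0 * phi2)]


def is_stationary_ar2(phi1, phi2):
    """True when all roots lie strictly outside the unit circle."""
    return all(abs(z) > 1.0 for z in ar2_roots(phi1, phi2))


for coeffs in [(0.5, 0.3), (0.5, 0.6), (1.2, -0.5), (1.0, 0.0)]:
    moduli = [abs(z) for z in ar2_roots(*coeffs)]
    print(coeffs, "root moduli:", moduli, "stationary:", is_stationary_ar2(*coeffs))
```

The third case has complex roots, which corresponds to cyclical autocorrelations. The last case is a random walk, whose single root lies exactly on the unit circle.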

AR(p)

Given the stationarity,

Mean:

$$E(Y_t) = E(c + \phi_1 Y_{t-1} + \dots + \phi_p Y_{t-p} + \epsilon_t)$$

$$\mu = c + \phi_1 \mu + \dots + \phi_p \mu$$

$$\mu = \frac{c}{1 - \phi_1 - \dots - \phi_p}$$

Autocovariances:

$$\gamma_0 = E(Y_t - \mu)^2 = E[(\phi_1 (Y_{t-1} - \mu) + \dots + \phi_p (Y_{t-p} - \mu) + \epsilon_t)(Y_t - \mu)]$$

$$= E[\phi_1 (Y_{t-1} - \mu)(Y_t - \mu) + \dots + \phi_p (Y_{t-p} - \mu)(Y_t - \mu) + \epsilon_t (Y_t - \mu)]$$

$$= \phi_1 \gamma_1 + \dots + \phi_p \gamma_p + \sigma^2$$

$$\gamma_{j \ge 1} = E[(Y_t - \mu)(Y_{t-j} - \mu)] = E[(\phi_1 (Y_{t-1} - \mu) + \dots + \phi_p (Y_{t-p} - \mu) + \epsilon_t)(Y_{t-j} - \mu)]$$

$$= E[\phi_1 (Y_{t-1} - \mu)(Y_{t-j} - \mu) + \dots + \phi_p (Y_{t-p} - \mu)(Y_{t-j} - \mu) + \epsilon_t (Y_{t-j} - \mu)]$$

$$= \phi_1 \gamma_{j-1} + \dots + \phi_p \gamma_{j-p}$$

This is a linear system in $\gamma_j$, $j = 0, \dots, p$.

AR(p)

For example, for $p = 2$:

$$\gamma_0 = \phi_1 \gamma_1 + \phi_2 \gamma_2 + \sigma^2$$

$$\gamma_1 = \phi_1 \gamma_0 + \phi_2 \gamma_1$$

$$\gamma_2 = \phi_1 \gamma_1 + \phi_2 \gamma_0$$

and notice that $\gamma_1 = \dfrac{\phi_1}{1 - \phi_2} \gamma_0$, so replacing $\gamma_1$ and $\gamma_2$,

$$\gamma_2 = \left[\frac{\phi_1^2}{1 - \phi_2} + \phi_2\right] \gamma_0$$

$$\gamma_0 = \frac{(1 - \phi_2)\,\sigma^2}{(1 + \phi_2)\left[(1 - \phi_2)^2 - \phi_1^2\right]}$$

We can therefore also get autocorrelations.

Notice that if the roots of $(1 - \phi_1 z - \phi_2 z^2) = 0$ are complex, then the autocorrelations show cyclical dynamics.

In general we can compute the IRF by inverting $\phi(L)Y_t = \epsilon_t$ into $Y_t = \phi(L)^{-1} \epsilon_t$ (here we used stationarity), so $\phi(L)^{-1} = \psi(L)$.
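As a check on the algebra, the closed-form AR(2) autocovariances can be substituted back into the three equations of the linear system. The parameter values $\phi_1 = 0.5$, $\phi_2 = 0.3$, $\sigma^2 = 1$ and the function name are assumptions for illustration.

```python
def ar2_autocovariances(phi1, phi2, sigma2):
    """Closed-form gamma_0, gamma_1, gamma_2 of a stationary AR(2)."""
    g0 = (1.0 - phi2) * sigma2 / ((1.0 + phi2) * ((1.0 - phi2) ** 2 - phi1 ** 2))
    g1 = phi1 / (1.0 - phi2) * g0
    g2 = (phi1 ** 2 / (1.0 - phi2) + phi2) * g0
    return g0, g1, g2


phi1, phi2, sigma2 = 0.5, 0.3, 1.0
g0, g1, g2 = ar2_autocovariances(phi1, phi2, sigma2)

# Residuals of the three equations; all should be numerically zero.
eq0 = g0 - (phi1 * g1 + phi2 * g2 + sigma2)
eq1 = g1 - (phi1 * g0 + phi2 * g1)
eq2 = g2 - (phi1 * g1 + phi2 * g0)
print("gamma_0, gamma_1, gamma_2:", g0, g1, g2)
print("equation residuals:", eq0, eq1, eq2)
print("rho_1, rho_2:", g1 / g0, g2 / g0)
```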

ARMA( P, Q )

ARMA( P, Q )

## Let t w .n.(0, 2 ), then

Yt = c + 1 Yt 1 + ... + p Yt p + t + 1 t 1 + ... + q t q is
ARMA(p, q ).
Stationarity of the whole ARMA(p, q ) depends on the
autoregressive part only (whilst the invertibility depends on the
MA part only) :
Using the lag operator, the ARMA(p, q ) is
(1 1 L ... p Lp )Yt = (1 + 1 L + ... + q Lq )t .
For stationarity, we have to check if the roots of
(1 1 z ... p z p ) = 0 are all outside the unit circle.
For invertibility, we require that the roots of
(1 + 1 z + ... + q z q ) = 0 are outside the unit circle.

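Given the stationarity,

Mean:
$$\begin{aligned}
E(Y_t) &= E(c + \phi_1 Y_{t-1} + \dots + \phi_p Y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q}) \\
\mu &= c + \phi_1 \mu + \dots + \phi_p \mu + 0 + \dots + 0 \\
\mu &= \frac{c}{1 - \phi_1 - \dots - \phi_p}
\end{aligned}$$

Autocovariances: The autocovariances are a combination of
those of an AR(p) and an MA(q), so for $j > q$,
$$\gamma_j = \phi_1 \gamma_{j-1} + \dots + \phi_p \gamma_{j-p}$$

For example, ARMA(1, 1), $Y_t = c + \phi Y_{t-1} + \varepsilon_t + \theta \varepsilon_{t-1}$ $(|\phi| < 1)$:

Firstly notice that
$$\begin{aligned}
E[(Y_t - \mu)\varepsilon_t] &= E[(\phi (Y_{t-1} - \mu) + \varepsilon_t + \theta \varepsilon_{t-1})\varepsilon_t] \\
&= 0 + \sigma^2 + 0 = \sigma^2 \\
E[(Y_t - \mu)\varepsilon_{t-1}] &= E[(\phi (Y_{t-1} - \mu) + \varepsilon_t + \theta \varepsilon_{t-1})\varepsilon_{t-1}] \\
&= \phi \sigma^2 + 0 + \theta \sigma^2 = (\phi + \theta)\sigma^2
\end{aligned}$$
so
$$\begin{aligned}
\gamma_0 &= E[(\phi (Y_{t-1} - \mu) + \varepsilon_t + \theta \varepsilon_{t-1})(Y_t - \mu)] \\
&= \phi E[(Y_{t-1} - \mu)(Y_t - \mu)] + E[\varepsilon_t (Y_t - \mu)] + \theta E[\varepsilon_{t-1}(Y_t - \mu)] \\
&= \phi \gamma_1 + \sigma^2 + \theta(\phi + \theta)\sigma^2
\end{aligned}$$
and
$$\begin{aligned}
\gamma_1 &= E[(Y_t - \mu)(Y_{t-1} - \mu)] \\
&= \phi E[(Y_{t-1} - \mu)(Y_{t-1} - \mu)] + E[\varepsilon_t (Y_{t-1} - \mu)] + \theta E[\varepsilon_{t-1}(Y_{t-1} - \mu)] \\
&= \phi \gamma_0 + 0 + \theta \sigma^2
\end{aligned}$$
so
$$\begin{aligned}
\gamma_0 &= \left[ 1 + \frac{(\phi + \theta)^2}{1 - \phi^2} \right] \sigma^2 \\
\gamma_1 &= \left[ \phi + \theta + \phi\,\frac{(\phi + \theta)^2}{1 - \phi^2} \right] \sigma^2 \\
\gamma_j &= \phi \gamma_{j-1}, \quad j \ge 2
\end{aligned}$$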

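The autocorrelation can be derived in the same way: for the generic
ARMA(p, q), for $j > q$,
$$\rho_j = \phi_1 \rho_{j-1} + \dots + \phi_p \rho_{j-p}$$

## Impulse response function of ARMA(p, q)

Given stationarity, inverting $\phi(L) Y_t = \theta(L) \varepsilon_t$,
$$Y_t = \phi(L)^{-1} \theta(L) \varepsilon_t = \psi(L) \varepsilon_t$$
so that $\phi(L)\psi(L) = \theta(L)$, i.e.
$$(1 - \phi_1 L - \dots - \phi_p L^p)(1 + \psi_1 L + \psi_2 L^2 + \dots) = (1 + \theta_1 L + \dots + \theta_q L^q)$$
Expanding the left-hand side,
$$\begin{aligned}
(1 + \theta_1 L + \dots + \theta_q L^q) = {} & 1 - \phi_1 L + \psi_1 L - \phi_2 L^2 - \phi_1 \psi_1 L^2 + \psi_2 L^2 \\
& - \phi_3 L^3 - \phi_2 \psi_1 L^3 - \phi_1 \psi_2 L^3 + \psi_3 L^3 + \dots
\end{aligned}$$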

Matching the coefficients on each power of $L$ in $\phi(L)\psi(L) = \theta(L)$:
$$\begin{aligned}
L^0 &: \; 1 = 1 \\
L^1 &: \; \theta_1 = -\phi_1 + \psi_1 \\
L^2 &: \; \theta_2 = -\phi_2 - \phi_1 \psi_1 + \psi_2 \\
L^3 &: \; \theta_3 = -\phi_3 + \psi_3 - \phi_2 \psi_1 - \phi_1 \psi_2
\end{aligned}$$

In the ARMA(1, 1) case, then,
$$\begin{aligned}
\psi_1 &= \phi + \theta \\
\psi_j &= \phi \psi_{j-1} = (\phi + \theta)\phi^{j-1}, \quad j \ge 2
\end{aligned}$$
The ARMA(1, 1) could also be decomposed in impulse response by
looking at
$$Y_t = \phi Y_{t-1} + \eta_t, \quad \eta_t = \varepsilon_t + \theta \varepsilon_{t-1}$$
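The coefficient matching above amounts to the recursion $\psi_j = \theta_j + \sum_{i=1}^{\min(j,p)} \phi_i \psi_{j-i}$ with $\psi_0 = 1$ and $\theta_j = 0$ for $j > q$. A minimal sketch follows; the function name `psi_weights` and the parameter values $\phi = 0.5$, $\theta = 0.4$ are assumptions made for illustration only.

```python
def psi_weights(phi, theta, n):
    """MA(infinity) weights psi_0..psi_n of an ARMA(p, q).

    phi   = [phi_1, ..., phi_p]     (AR coefficients)
    theta = [theta_1, ..., theta_q] (MA coefficients)
    """
    psi = [1.0]  # psi_0 = 1 from the L^0 equation
    for j in range(1, n + 1):
        value = theta[j - 1] if j <= len(theta) else 0.0
        for i in range(1, min(j, len(phi)) + 1):
            value += phi[i - 1] * psi[j - i]
        psi.append(value)
    return psi

# Hypothetical ARMA(1, 1): compare with the closed form (phi + theta) * phi**(j - 1)
phi, theta = 0.5, 0.4
psi = psi_weights([phi], [theta], 10)
closed_form = [1.0] + [(phi + theta) * phi ** (j - 1) for j in range(1, 11)]
print(psi[:4], closed_form[:4])
```

The sequence $\psi_0, \psi_1, \dots$ is the impulse response function, so the same routine can be reused for any stationary ARMA(p, q).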

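Then,
$$\begin{aligned}
Y_t &= \sum_{j=0}^{\infty} \phi^j \eta_{t-j} = \sum_{j=0}^{\infty} \phi^j (\varepsilon_{t-j} + \theta \varepsilon_{t-j-1}) \\
&= \sum_{j=0}^{\infty} \phi^j \varepsilon_{t-j} + \theta \sum_{j=0}^{\infty} \phi^j \varepsilon_{t-j-1} \\
&= \sum_{j=0}^{\infty} \phi^j \varepsilon_{t-j} + \theta \sum_{l=1}^{\infty} \phi^{l-1} \varepsilon_{t-l} \\
&= \varepsilon_t + \phi \sum_{j=1}^{\infty} \phi^{j-1} \varepsilon_{t-j} + \theta \sum_{j=1}^{\infty} \phi^{j-1} \varepsilon_{t-j} \\
&= \varepsilon_t + (\phi + \theta) \sum_{j=1}^{\infty} \phi^{j-1} \varepsilon_{t-j}
\end{aligned}$$

## Common factors

In ARMA modelling, it may be that the same factor appears both in
$\phi(L)$ and in $\theta(L)$: in this case, the ARMA(p, q) process cannot be
distinguished, on the basis of the autocorrelation structure (or from
the weights in the MA($\infty$) representation), from an
ARMA(p - 1, q - 1) process.

In this case, it is sometimes also said that the model ARMA(p, q) is
overparametrised.

The ARMA(p, q) model may be simplified (and indeed it may be
desirable to do so, especially if the parameters $\phi_1, ..., \phi_p$ and
$\theta_1, ..., \theta_q$ have to be estimated).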

EXAMPLE:
$$Y_t = 1.2 Y_{t-1} - 0.35 Y_{t-2} + \varepsilon_t - 0.7 \varepsilon_{t-1}$$
is
$$\begin{aligned}
(1 - 1.2L + 0.35L^2) Y_t &= (1 - 0.7L)\varepsilon_t \\
(1 - 0.7L)(1 - 0.5L) Y_t &= (1 - 0.7L)\varepsilon_t
\end{aligned}$$
so, simplifying $(1 - 0.7L)$, the process has the same autocorrelation
structure (and the same weights in the MA($\infty$) representation) as
$$(1 - 0.5L) Y_t = \varepsilon_t$$
i.e.
$$Y_t = 0.5 Y_{t-1} + \varepsilon_t$$

## A final comment on stationary and invertible ARMA

We already saw that for a stationary ARMA(p, q), it is also possible
to give an MA($\infty$) representation; in the same way, it is also possible
to give an AR($\infty$) representation (indeed, this is a proper definition of
"invertibility"). All these models have the same autocovariances /
autocorrelation structures, and are therefore indistinguishable.

We can choose the representation that is more convenient for our
purpose: for example, we may like the MA($\infty$) if we are interested in
the impulse response function, the AR($\infty$) if we want to compute $\varepsilon_t$
given observations on $\{Y_t\}$ (and assuming we know the
parameters), or we may prefer the ARMA(p, q) if we are interested in
estimating the parameters.
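The common-factor example $Y_t = 1.2Y_{t-1} - 0.35Y_{t-2} + \varepsilon_t - 0.7\varepsilon_{t-1}$ can be checked numerically. The sketch below multiplies the two AR factors back together and compares the MA($\infty$) weights of the ARMA(2, 1) with those of the AR(1) with coefficient 0.5. The helper names `poly_mul` and `psi_weights` are illustrative assumptions, not part of the lecture notes.

```python
def poly_mul(a, b):
    """Multiply two lag polynomials given as coefficient lists [1, a_1, a_2, ...]."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def psi_weights(phi, theta, n):
    """MA(infinity) weights psi_0..psi_n from psi_j = theta_j + sum_i phi_i psi_{j-i}."""
    psi = [1.0]
    for j in range(1, n + 1):
        value = theta[j - 1] if j <= len(theta) else 0.0
        for i in range(1, min(j, len(phi)) + 1):
            value += phi[i - 1] * psi[j - i]
        psi.append(value)
    return psi

# (1 - 0.7L)(1 - 0.5L) should reproduce 1 - 1.2L + 0.35L^2
ar_poly = poly_mul([1.0, -0.7], [1.0, -0.5])

# Same impulse responses for the overparametrised and the simplified model
psi_arma21 = psi_weights([1.2, -0.35], [-0.7], 15)
psi_ar1 = psi_weights([0.5], [], 15)
max_gap = max(abs(a - b) for a, b in zip(psi_arma21, psi_ar1))
print(ar_poly, max_gap)
```

Because the two sets of weights coincide up to rounding error, no sample of $Y_t$ could tell the two models apart.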

## Transformation of ARMA models

ARMA models are quite standard and typical in econometrics;
however, we may derive some transformations from this particular
framework.

FILTERS
Sometimes data are treated (by nature or by the researcher) by
summing / averaging / differencing ...
For $Y_t$, a filter $h(L)$ is applied as:
$$X_t = h(L) Y_t$$
where
$$h(L) = \sum_{j=-\infty}^{\infty} h_j L^j$$
If
$$\sum_{j=-\infty}^{\infty} |h_j| < \infty, \quad \sum_{j=-\infty}^{\infty} |\psi_j| < \infty$$
then
$$X_t = \mu^* + \psi^*(L)\varepsilon_t$$
where
$$\mu^* = h(1)c, \quad \psi^*(L) = h(L)\psi(L)$$
Suppose $Y_t$ is $w.n.(0, \sigma^2)$, and
$$X_t = \frac{1}{k} \sum_{j=0}^{k-1} Y_{t-j}$$
as in a (moving) average of quarterly or monthly data on a yearly basis:
then averaging induces dependence where there was none.

## Transformation of ARMA models

SUM OF ARMA PROCESSES

Our variable of interest (signal) may be obscured by noise.

Example:
$$Y_t = X_t + v_t$$
where
$$X_t = u_t + \delta u_{t-1}$$
and $u_t$ is $w.n.(0, \sigma_u^2)$, $v_t$ is $w.n.(0, \sigma_v^2)$, $E(u_t v_\tau) = 0$ for all $t, \tau$.
Suppose we are interested in $X_t$, but we can only observe $Y_t$. What are
the properties of $Y_t$?

SUM OF ARMA PROCESSES

$E(Y_t) = 0$ for all $t$.
$$\begin{aligned}
\gamma_0 &= E(X_t + v_t)^2 = E(X_t^2) + E(v_t^2) + 2E(X_t v_t) \\
&= (1 + \delta^2)\sigma_u^2 + \sigma_v^2 \\
\gamma_1 &= E(X_t + v_t)(X_{t-1} + v_{t-1}) = E(X_t X_{t-1}) + E(v_t X_{t-1}) + E(X_t v_{t-1}) + E(v_t v_{t-1}) \\
&= \delta \sigma_u^2 \\
\gamma_j &= 0, \quad j \ge 2
\end{aligned}$$
so $Y_t$ has the autocovariance structure of an MA(1),
$$Y_t = \varepsilon_t + \theta \varepsilon_{t-1}, \quad \varepsilon_t \sim w.n.(0, \sigma^2)$$
In order to find $\theta$, compute
$$\rho_1 = \frac{\delta \sigma_u^2}{(1 + \delta^2)\sigma_u^2 + \sigma_v^2} .$$
Since in an MA(1),
$$\rho_1 = \frac{\theta}{1 + \theta^2}$$
solve for $\theta$:
$$\theta = \rho_1 + \rho_1 \theta^2$$
$$\theta_{1,2} = \frac{1 \pm \sqrt{1 - 4\rho_1^2}}{2\rho_1}$$
For $\theta = \theta_1$, where
$$\theta_1 = \frac{1 - \sqrt{1 - 4\rho_1^2}}{2\rho_1}$$
the process is invertible. We can also derive $\sigma^2$, for
example from $\gamma_1 = \theta \sigma^2$, so
$$\theta \sigma^2 = \delta \sigma_u^2 .$$

(Notice that we observe $Y_t$, so we can estimate $\theta$ and $\sigma^2$: since,
however, there are three parameters of interest, $\delta$, $\sigma_u^2$, and $\sigma_v^2$, we
cannot estimate them without an identification assumption.
In other words, $Y_t$ contains less information than $X_t$ and $v_t$.)

LEMMA
In general, consider
$$Y_t = X_t + W_t$$
where $X_t$ and $W_t$ are (zero mean) stationary processes such that $X_t$ and
$W_\tau$ are not correlated at any $t, \tau$, then
$$E(Y_t Y_{t-j}) = E(X_t X_{t-j}) + E(W_t W_{t-j})$$
i.e.
$$\gamma_j^Y = \gamma_j^X + \gamma_j^W$$
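The signal-plus-noise example above can be turned into a small calculation: given $\delta$, $\sigma_u^2$ and $\sigma_v^2$, compute $\rho_1$, pick the invertible root $\theta_1$, and back out $\sigma^2$ from $\gamma_1 = \theta\sigma^2$. This is a minimal sketch; the function name `observed_ma1` and the parameter values are hypothetical, and it assumes $\delta \neq 0$.

```python
import math

def observed_ma1(delta, sigma2_u, sigma2_v):
    """MA(1) parameters (theta, sigma^2) of Y_t = X_t + v_t, X_t = u_t + delta*u_{t-1}."""
    gamma0 = (1 + delta ** 2) * sigma2_u + sigma2_v
    gamma1 = delta * sigma2_u
    rho1 = gamma1 / gamma0
    # Invertible root of rho1 * theta^2 - theta + rho1 = 0
    theta = (1 - math.sqrt(1 - 4 * rho1 ** 2)) / (2 * rho1)
    sigma2 = gamma1 / theta  # from gamma_1 = theta * sigma^2
    return theta, sigma2

# Hypothetical values for the signal and the noise
theta, sigma2 = observed_ma1(delta=0.5, sigma2_u=1.0, sigma2_v=1.0)
implied_gamma0 = (1 + theta ** 2) * sigma2
print(theta, sigma2, implied_gamma0)
```

The implied $\gamma_0 = (1 + \theta^2)\sigma^2$ matches $(1 + \delta^2)\sigma_u^2 + \sigma_v^2$, while $\theta$ is smaller in absolute value than $\delta$: the noise dampens the observed first-order autocorrelation.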

SUM OF TWO MA PROCESSES

LEMMA
If $X_t$ is MA($q_1$) and $W_t$ is MA($q_2$), then $Y_t$ is MA($\max[q_1, q_2]$).

SUM OF TWO AR PROCESSES
Suppose,
$$Y_t = X_t + W_t$$
where
$$(1 - \phi_X L) X_t = u_t, \quad (1 - \phi_W L) W_t = v_t \quad (\phi_X \neq \phi_W)$$
then
$$\begin{aligned}
(1 - \phi_W L)(1 - \phi_X L) X_t &= (1 - \phi_W L) u_t, \\
(1 - \phi_X L)(1 - \phi_W L) W_t &= (1 - \phi_X L) v_t,
\end{aligned}$$
so
$$(1 - \phi_X L)(1 - \phi_W L)(X_t + W_t) = (1 - \phi_W L) u_t + (1 - \phi_X L) v_t$$
So $Y_t$ is ARMA(2, 1). (If $\phi_X = \phi_W$, $Y_t$ is AR(1).)

SUM OF TWO ARMA PROCESSES

LEMMA
If $X_t$ is ARMA($p_1, q_1$), $W_t$ is ARMA($p_2, q_2$), then $Y_t$ is ARMA($p, q$) with
$$p \le p_1 + p_2$$
and
$$q \le \max(p_1 + q_2, p_2 + q_1)$$

## Topic 4: Estimation of ARMA

Introductory Financial Econometrics
Topic 4 Estimation of ARMA
3 Credits, 51 Hours
MLE

Jianhua Gang
School of Finance
Renmin University of China

Spring 2013

## Estimation: sample moments

We described the properties of some stationary processes by focusing
on some population moments (mean, autocovariances, ...). However,
we only have the data that we observed, $(y_1, ..., y_T)$, so we can only
compute estimates of these moments. Are these estimates useful?

Sample mean
$$\bar{Y} = \frac{1}{T} \sum_{t=1}^{T} Y_t$$
Sample autocovariance
$$\hat{\gamma}_j = \frac{1}{T} \sum_{t=j+1}^{T} (Y_t - \bar{Y})(Y_{t-j} - \bar{Y})$$
Sample autocorrelation
$$\hat{\rho}_j = \frac{\hat{\gamma}_j}{\hat{\gamma}_0}$$

## Estimation: maximum likelihood (ML)

Let
$$\mathbf{Y} = (Y_1, ..., Y_T)'$$
be a Normally distributed vector with
$$E(\mathbf{Y}) = \boldsymbol{\mu}, \quad E((\mathbf{Y} - \boldsymbol{\mu})(\mathbf{Y} - \boldsymbol{\mu})') = \Sigma$$
The Gaussian density, computed at the points
$$\mathbf{y} = (y_T, ..., y_1)'$$
in the support of $\mathbf{Y}$ is
$$f_{Y_T, ..., Y_1}(y_T, ..., y_1) = (2\pi)^{-T/2} |\Sigma|^{-1/2} \exp\left( -\frac{1}{2} (\mathbf{y} - \boldsymbol{\mu})' \Sigma^{-1} (\mathbf{y} - \boldsymbol{\mu}) \right)$$

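Now assume that $\mathbf{y} = (y_T, ..., y_1)'$ is the realization of $\mathbf{Y}$, and
consider $\Sigma = \Sigma(\boldsymbol{\theta})$ where $\boldsymbol{\theta}$ is a set of parameters of interest. Then,
$$f_{Y_T, ..., Y_1}(y_T, ..., y_1; \boldsymbol{\theta}) = (2\pi)^{-T/2} |\Sigma(\boldsymbol{\theta})|^{-1/2} \exp\left( -\frac{1}{2} (\mathbf{y} - \boldsymbol{\mu})' \Sigma(\boldsymbol{\theta})^{-1} (\mathbf{y} - \boldsymbol{\mu}) \right)$$
is the likelihood function. Maximizing that function w.r.t. $\boldsymbol{\theta}$ yields the
(exact) maximum likelihood estimate.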
Note the difference between and .

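EXAMPLES:

AR(1) $(|\phi_0| < 1)$:
$$Y_t = c_0 + \phi_0 Y_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim Nid(0, \sigma_0^2)$$
$\boldsymbol{\theta} = (c, \phi, \sigma^2)'$, $(|\phi| < 1)$ and
$$\Sigma(\boldsymbol{\theta}) = \frac{\sigma^2}{1 - \phi^2}
\begin{pmatrix}
1 & \phi & \dots & \phi^{T-2} & \phi^{T-1} \\
\phi & 1 & \dots & \phi^{T-3} & \phi^{T-2} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\phi^{T-2} & \phi^{T-3} & \dots & 1 & \phi \\
\phi^{T-1} & \phi^{T-2} & \dots & \phi & 1
\end{pmatrix}$$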

EXAMPLES:

MA(1) $(|\theta_0| < 1)$:
$$Y_t = \mu_0 + \varepsilon_t + \theta_0 \varepsilon_{t-1}, \quad \varepsilon_t \sim Nid(0, \sigma_0^2)$$
$\boldsymbol{\theta} = (\mu, \theta, \sigma^2)'$, and
$$\Sigma(\boldsymbol{\theta}) = \sigma^2
\begin{pmatrix}
(1 + \theta^2) & \theta & \dots & 0 & 0 \\
\theta & (1 + \theta^2) & \dots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \dots & (1 + \theta^2) & \theta \\
0 & 0 & \dots & \theta & (1 + \theta^2)
\end{pmatrix}$$

The likelihood function may be computed for a given set of
observations and for any parameter (within the range of the
parameter space).

Consider the observations

| time | 1 | 2 | 3 | 4 |
|------|-----|-----|-----|---|
| obs. | $y_1$ = 0.5 | $y_2$ = 0.8 | $y_3$ = 0.2 | $y_4$ = 2 |

and suppose you want to estimate $\theta_0$ in the MA(1) model with the
additional assumption that $\mu_0 = 0$ and $\sigma_0^2 = 1$: consider five
potential values for $\theta_0$: $-0.5$, $-0.25$, $0$, $0.25$, $0.5$.
Then, we have to compute $\Sigma(\theta)$ for each $\theta$: for example, when
$\theta = 0.5$,

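then,
$$\Sigma(\theta) =
\begin{pmatrix}
(1 + 0.5^2) & 0.5 & \dots & 0 & 0 \\
0.5 & (1 + 0.5^2) & \dots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \dots & (1 + 0.5^2) & 0.5 \\
0 & 0 & \dots & 0.5 & (1 + 0.5^2)
\end{pmatrix}$$
and
$$(\mathbf{y} - \boldsymbol{\mu})' \Sigma(\theta)^{-1} (\mathbf{y} - \boldsymbol{\mu})
= \begin{pmatrix} 0.5 & 0.8 & 0.2 & 2 \end{pmatrix}
\begin{pmatrix}
1.25 & 0.5 & 0 & 0 \\
0.5 & 1.25 & 0.5 & 0 \\
0 & 0.5 & 1.25 & 0.5 \\
0 & 0 & 0.5 & 1.25
\end{pmatrix}^{-1}
\begin{pmatrix} 0.5 \\ 0.8 \\ 0.2 \\ 2 \end{pmatrix}
= 4.6903$$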

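So,
$$\begin{aligned}
& (2\pi)^{-T/2} |\Sigma(\theta)|^{-1/2} \exp\left( -\frac{1}{2} (\mathbf{y} - \boldsymbol{\mu})' \Sigma(\theta)^{-1} (\mathbf{y} - \boldsymbol{\mu}) \right) \\
&= (2\pi)^{-4/2} (1.332)^{-1/2} \exp\left( -\frac{1}{2} \times 4.6903 \right) \\
&= 2.1033 \times 10^{-3}
\end{aligned}$$
Therefore, we may get all the likelihoods for different $\theta$.

| $\theta$ | $-0.5$ | $-0.25$ | $0$ | $0.25$ | $0.5$ |
|---|---|---|---|---|---|
| $10^3 \times f$ | 3.178 | 2.618 | 2.153 | 1.967 | 2.103 |

The function may be computed for all the $\theta$, $|\theta| < 1$ ($\hat{\theta} = -0.76$).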
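The exact MA(1) likelihood in this example can be reproduced with a few lines of code. The sketch below solves the tridiagonal system $\Sigma(\theta) x = y$ by forward elimination and back substitution, so no general matrix inverse is needed. Two assumptions are worth flagging: the function name `ma1_exact_likelihood` is illustrative, and the data are taken as $y = (0.5, -0.8, -0.2, 2)$, i.e. with the second and third observations negative, because this sign pattern (up to a global sign flip, which leaves every quantity unchanged) is the one that reproduces the reported values 4.6903, 1.332 and the likelihood table.

```python
import math

def ma1_exact_likelihood(y, theta, sigma2=1.0):
    """Quadratic form, determinant and Gaussian likelihood of a zero-mean MA(1)."""
    n = len(y)
    a = sigma2 * (1 + theta ** 2)  # diagonal of Sigma(theta)
    b = sigma2 * theta             # first off-diagonals of Sigma(theta)
    # Forward elimination for Sigma x = y; the pivots multiply to det(Sigma)
    pivots, rhs = [a], [y[0]]
    for i in range(1, n):
        m = b / pivots[i - 1]
        pivots.append(a - m * b)
        rhs.append(y[i] - m * rhs[i - 1])
    det = math.prod(pivots)
    # Back substitution
    x = [0.0] * n
    x[-1] = rhs[-1] / pivots[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (rhs[i] - b * x[i + 1]) / pivots[i]
    quad = sum(yi * xi for yi, xi in zip(y, x))  # y' Sigma^{-1} y
    lik = (2 * math.pi) ** (-n / 2) * det ** -0.5 * math.exp(-0.5 * quad)
    return quad, det, lik

# Assumed sign pattern for the four observations (see the note above)
y = [0.5, -0.8, -0.2, 2.0]
quad, det, lik = ma1_exact_likelihood(y, theta=0.5)
grid = [-0.5, -0.25, 0.0, 0.25, 0.5]
table = {th: 1e3 * ma1_exact_likelihood(y, th)[2] for th in grid}
print(quad, det, lik, table)
```

Evaluating the function on a finer grid of $\theta$ values in $(-1, 1)$ is the brute-force way of locating the exact maximum likelihood estimate.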

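EXAMPLES:

The computation of
$$(2\pi)^{-T/2} |\Sigma(\boldsymbol{\theta})|^{-1/2} \exp\left( -\frac{1}{2} (\mathbf{y} - \boldsymbol{\mu})' \Sigma(\boldsymbol{\theta})^{-1} (\mathbf{y} - \boldsymbol{\mu}) \right)$$
is very heavy, because it requires the inversion of the $T \times T$ matrix
$\Sigma(\boldsymbol{\theta})$ for all the admissible values $\boldsymbol{\theta}$.

Luckily, it is sometimes easy to rewrite the likelihood function in a
way that does not require the inversion of $\Sigma(\boldsymbol{\theta})$; otherwise, it is also
possible to modify the problem so that, again, we can avoid the
inversion of $\Sigma(\boldsymbol{\theta})$.

## ML of AR(1)

$$Y_t = c_0 + \phi_0 Y_{t-1} + \varepsilon_t, \quad |\phi_0| < 1, \quad \varepsilon_t \sim Nid(0, \sigma_0^2)$$
Then
$$Y_1 \sim N\left( \frac{c_0}{1 - \phi_0}, \frac{\sigma_0^2}{1 - \phi_0^2} \right)$$
$$f_{Y_1}(y_1; \boldsymbol{\theta}) = (2\pi)^{-1/2} \left( \frac{\sigma^2}{1 - \phi^2} \right)^{-1/2}
\exp\left( -\frac{1}{2} \frac{\left( y_1 - \frac{c}{1 - \phi} \right)^2}{\frac{\sigma^2}{1 - \phi^2}} \right)$$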

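Of course, the same likelihood may be expressed for $Y_2$, however in
this case we can also exploit the fact that we observed $Y_1$ in the
period before:
$$Y_2 | Y_1 \sim N(c_0 + \phi_0 Y_1, \sigma_0^2),$$
so
$$f_{Y_2 | Y_1}(y_2 | y_1; \boldsymbol{\theta}) = (2\pi)^{-1/2} (\sigma^2)^{-1/2}
\exp\left[ -\frac{1}{2} \frac{(y_2 - c - \phi y_1)^2}{\sigma^2} \right]$$
and, by the same argument,
$$f_{Y_T, ..., Y_1}(y_T, ..., y_1; \boldsymbol{\theta}) = \prod_{t=2}^{T} f_{Y_t | Y_{t-1}, ..., Y_1}(y_t | y_{t-1}, ..., y_1; \boldsymbol{\theta}) \, f_{Y_1}(y_1; \boldsymbol{\theta})$$
where
$$f_{Y_t | Y_{t-1}, ..., Y_1}(y_t | y_{t-1}, ..., y_1; \boldsymbol{\theta}) = (2\pi)^{-1/2} (\sigma^2)^{-1/2}
\exp\left[ -\frac{1}{2} \frac{(y_t - c - \phi y_{t-1})^2}{\sigma^2} \right]$$
when $t = 2, ..., T$.

The conditional density depends on the past only through $y_{t-1}$, i.e. $f_{Y_t | Y_{t-1}, ..., Y_1} = f_{Y_t | Y_{t-1}}$,
so in what follows we simplify the notation in this way.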

The log-likelihood is
$$\begin{aligned}
l(\boldsymbol{\theta}) &= \ln(f_{Y_1}(y_1; \boldsymbol{\theta})) + \sum_{t=2}^{T} \ln f_{Y_t | Y_{t-1}}(y_t | y_{t-1}; \boldsymbol{\theta}) \\
&= -\frac{1}{2} \ln\left( 2\pi \frac{\sigma^2}{1 - \phi^2} \right)
- \frac{1}{2} \frac{\left( y_1 - \frac{c}{1 - \phi} \right)^2}{\frac{\sigma^2}{1 - \phi^2}}
- \frac{T - 1}{2} \ln(2\pi\sigma^2)
- \frac{1}{2} \sum_{t=2}^{T} \frac{(y_t - c - \phi y_{t-1})^2}{\sigma^2}
\end{aligned}$$
We then succeed in writing the (log) likelihood in a way that does
not require the inversion of a $T \times T$ matrix.

Maximizing that function would give the "maximum likelihood
estimate" when $\varepsilon_t$ is normally distributed.

However, although we eliminated the problem of inverting $\Sigma(\boldsymbol{\theta})$, we
still cannot express our estimate $\hat{\boldsymbol{\theta}}$ as a closed form function of the
observations, so we still have to compute the likelihood function on
all the admissible parameters in order to find the maximum.

Consider instead the value of $\boldsymbol{\theta}$ that maximizes
$$-\frac{T - 1}{2} \ln(2\pi\sigma^2) - \frac{1}{2} \sum_{t=2}^{T} \frac{(y_t - c - \phi y_{t-1})^2}{\sigma^2}$$
That estimate is known as "conditional maximum likelihood
estimate", because it is the maximum likelihood estimate if $Y_1$ is not
random (so, the log-likelihood above is called "conditional"
log-likelihood). In this case, a closed form solution exists.
In order to find the closed form solution, first notice that $\sigma^2$ can be
estimated and concentrated out:
$$\frac{\partial}{\partial \sigma^2} \left[ -\frac{T - 1}{2} \ln(2\pi\sigma^2) - \frac{1}{2} \sum_{t=2}^{T} \frac{(y_t - c - \phi y_{t-1})^2}{\sigma^2} \right]
= -\frac{T - 1}{2} \frac{1}{\sigma^2} + \frac{1}{2} \sum_{t=2}^{T} \frac{(y_t - c - \phi y_{t-1})^2}{(\sigma^2)^2}$$
and, equating the derivative to 0,
$$\hat{\sigma}^2 = \frac{1}{T - 1} \sum_{t=2}^{T} (y_t - c - \phi y_{t-1})^2$$
so replacing this in the log-likelihood, we get the solutions:
$$\hat{\phi} = \frac{\frac{1}{T - 1} \sum_{t=2}^{T} (y_t - \bar{y}_{.})(y_{t-1} - \bar{y}_{.-1})}{\frac{1}{T - 1} \sum_{t=2}^{T} (y_{t-1} - \bar{y}_{.-1})^2}$$
$$\hat{c} = \bar{y}_{.} - \hat{\phi}\, \bar{y}_{.-1}$$
where $\bar{y}_{.} = \frac{1}{T - 1} \sum_{t=2}^{T} y_t$, $\bar{y}_{.-1} = \frac{1}{T - 1} \sum_{t=2}^{T} y_{t-1}$.
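The closed-form conditional ML solution for the AR(1) can be written directly from the formulas above. This is a minimal sketch; the function name `ar1_conditional_mle` is an assumption, and the short data series is hypothetical (it is generated exactly by $y_t = 1 + 0.5 y_{t-1}$, so the estimates can be checked by hand).

```python
def ar1_conditional_mle(y):
    """Closed-form conditional ML estimates of (c, phi, sigma^2) for an AR(1)."""
    cur, lag = y[1:], y[:-1]   # y_t and y_{t-1} for t = 2, ..., T
    n = len(cur)               # T - 1 usable observations
    mean_cur, mean_lag = sum(cur) / n, sum(lag) / n
    sxy = sum((a - mean_cur) * (b - mean_lag) for a, b in zip(cur, lag))
    sxx = sum((b - mean_lag) ** 2 for b in lag)
    phi_hat = sxy / sxx
    c_hat = mean_cur - phi_hat * mean_lag
    sigma2_hat = sum((a - c_hat - phi_hat * b) ** 2 for a, b in zip(cur, lag)) / n
    return c_hat, phi_hat, sigma2_hat

# Hypothetical series satisfying y_t = 1 + 0.5 * y_{t-1} exactly
y = [0.0, 1.0, 1.5, 1.75, 1.875]
c_hat, phi_hat, sigma2_hat = ar1_conditional_mle(y)
print(c_hat, phi_hat, sigma2_hat)
```

These are the same numbers an OLS regression of $y_t$ on a constant and $y_{t-1}$ would return, with the residual variance divided by $T - 1$ rather than by the degrees of freedom.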

So for the "conditional maximum likelihood estimate" a closed
form solution exists, and it is the OLS estimate in
$$Y_t = c_0 + \phi_0 Y_{t-1} + \varepsilon_t .$$
Notice that this is not the likelihood function of our original
stationary AR(1) process, but the likelihood of the process,
$$Y_t = c_0 + \phi_0 Y_{t-1} + \varepsilon_t, \quad |\phi_0| < 1,$$
$$Y_1 = y_1$$

## ML of AR(p)

Assume
$$Y_t = c_0 + \phi_{0;1} Y_{t-1} + \dots + \phi_{0;p} Y_{t-p} + \varepsilon_t,$$
where
$$\varepsilon_t \sim Nid(0, \sigma_0^2)$$
and the roots of $1 - \phi_{0;1} z - \dots - \phi_{0;p} z^p = 0$ are outside the unit
circle.

Introduce the vector of the first $p$ observations $\mathbf{Y}_p = (Y_p, ..., Y_1)'$, its mean $\boldsymbol{\mu}_p = E(\mathbf{Y}_p)$,
and
$$V_p = (\sigma^2)^{-1} E\left[ (\mathbf{Y}_p - \boldsymbol{\mu}_p)(\mathbf{Y}_p - \boldsymbol{\mu}_p)' \right]$$

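and take again the Gaussian density:
$$f_{Y_p, ..., Y_1}(y_p, ..., y_1; \boldsymbol{\theta}) = (2\pi)^{-p/2} |\sigma^2 V_p(\boldsymbol{\theta})|^{-1/2}
\exp\left( -\frac{1}{2\sigma^2} (\mathbf{y}_p - \boldsymbol{\mu}_p)' V_p(\boldsymbol{\theta})^{-1} (\mathbf{y}_p - \boldsymbol{\mu}_p) \right)$$
and
$$f_{Y_t | Y_{t-1}, ..., Y_{t-p}}(y_t | y_{t-1}, ..., y_{t-p}; \boldsymbol{\theta}) = (2\pi)^{-1/2} |\sigma^2|^{-1/2}
\exp\left[ -\frac{1}{2} \frac{(y_t - c - \phi_1 y_{t-1} - \dots - \phi_p y_{t-p})^2}{\sigma^2} \right]$$
when $t = p + 1, ..., T$,
so the likelihood can be written as
$$f_{Y_p, ..., Y_1}(y_p, ..., y_1; \boldsymbol{\theta}) \prod_{t=p+1}^{T} f_{Y_t | Y_{t-1}, ..., Y_{t-p}}(y_t | y_{t-1}, ..., y_{t-p}; \boldsymbol{\theta})$$
where the problem of inverting the $T \times T$ matrix $\Sigma(\boldsymbol{\theta})$ is reduced to
inverting a $p \times p$ matrix $V_p(\boldsymbol{\theta})$.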

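$$\begin{aligned}
l(\boldsymbol{\theta}) &= \ln(f_{Y_p, ..., Y_1}(y_p, ..., y_1; \boldsymbol{\theta}))
+ \sum_{t=p+1}^{T} \ln\left[ f_{Y_t | Y_{t-1}, ..., Y_{t-p}}(y_t | y_{t-1}, ..., y_{t-p}; \boldsymbol{\theta}) \right] \\
&= -\frac{p}{2} \ln(2\pi) - \frac{1}{2} \ln|\sigma^2 V_p(\boldsymbol{\theta})|
- \frac{1}{2\sigma^2} (\mathbf{y}_p - \boldsymbol{\mu}_p)' V_p^{-1}(\boldsymbol{\theta}) (\mathbf{y}_p - \boldsymbol{\mu}_p) \\
&\quad - \frac{T - p}{2} \ln(2\pi\sigma^2)
- \frac{1}{2} \sum_{t=p+1}^{T} \frac{(y_t - c - \phi_1 y_{t-1} - \dots - \phi_p y_{t-p})^2}{\sigma^2}
\end{aligned}$$
The value of $\boldsymbol{\theta}$ that maximizes $l(\boldsymbol{\theta})$ is the "maximum likelihood
estimate".

Again, a "conditional maximum likelihood estimate" can be
considered instead: this is obtained by treating $Y_p, ..., Y_1$ as given,
and maximizing
$$-\frac{T - p}{2} \ln(2\pi\sigma^2)
- \frac{1}{2} \sum_{t=p+1}^{T} \frac{(y_t - c - \phi_1 y_{t-1} - \dots - \phi_p y_{t-p})^2}{\sigma^2}$$
instead. The value of $\boldsymbol{\theta}$ that maximizes this conditional log-likelihood is called
"conditional maximum likelihood estimate". This turns out to
be the OLS estimate of $c_0, \phi_{0;1}, ..., \phi_{0;p}$ in the corresponding
regression model.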

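## ML of MA(1)

Suppose
$$Y_t = \mu_0 + \varepsilon_t + \theta_0 \varepsilon_{t-1}, \quad |\theta_0| < 1, \quad \varepsilon_t \sim Nid(0, \sigma_0^2)$$
Under an additional assumption that
$$\varepsilon_0 = 0$$
we can also derive a "conditional maximum likelihood estimate" of
$\boldsymbol{\theta}$ in an MA(1).
In general, since $\varepsilon_t \sim Nid(0, \sigma_0^2)$, then
$$Y_t | \varepsilon_{t-1} \sim N(\mu_0 + \theta_0 \varepsilon_{t-1}, \sigma_0^2)$$
i.e. the density of $Y_t | \varepsilon_{t-1}$ is
$$\begin{aligned}
f_{Y_t | \varepsilon_{t-1}}(y_t | \varepsilon_{t-1}; \boldsymbol{\theta}_0)
&= \frac{1}{\sqrt{2\pi\sigma_0^2}} \exp\left[ -\frac{1}{2} \frac{(y_t - \mu_0 - \theta_0 \varepsilon_{t-1})^2}{\sigma_0^2} \right] \\
&= \frac{1}{\sqrt{2\pi\sigma_0^2}} \exp\left[ -\frac{1}{2} \frac{\varepsilon_t^2}{\sigma_0^2} \right]
\end{aligned}$$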

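Unfortunately $\varepsilon_{t-1}$ is not observable.
However, suppose that we know $\varepsilon_0$, then $Y_1 = \mu_0 + \varepsilon_1 + \theta_0 \varepsilon_0$, and,
given $\varepsilon_0$ and $\boldsymbol{\theta}_0$ we can also compute
$$\varepsilon_1(\boldsymbol{\theta}_0) = y_1 - \mu_0 - \theta_0 \varepsilon_0$$
(the notation $\varepsilon_1(\boldsymbol{\theta}_0)$ means that $\varepsilon_1$ is computed for a given value of
the vector of parameters $\boldsymbol{\theta}$).
Having computed $\varepsilon_1(\boldsymbol{\theta}_0)$ (and given $\boldsymbol{\theta}_0$) we can also compute
$\varepsilon_2(\boldsymbol{\theta}_0) = y_2 - \mu_0 - \theta_0 \varepsilon_1(\boldsymbol{\theta}_0)$, and, iterating the procedure,
$$\varepsilon_t(\boldsymbol{\theta}_0) = y_t - \mu_0 - \theta_0 \varepsilon_{t-1}(\boldsymbol{\theta}_0)$$
Then
$$f_{Y_t | \varepsilon_{t-1}}(y_t | \varepsilon_{t-1}; \boldsymbol{\theta}_0) = f_{Y_t | Y_{t-1}, ..., Y_1, \varepsilon_0}(y_t | y_{t-1}, ..., y_1, \varepsilon_0; \boldsymbol{\theta}_0)$$
and
$$\begin{aligned}
f_{Y_T, Y_{T-1}, ..., Y_1 | \varepsilon_0}(y_T, y_{T-1}, ..., y_1 | \varepsilon_0; \boldsymbol{\theta}_0)
&= f_{Y_1 | \varepsilon_0}(y_1 | \varepsilon_0; \boldsymbol{\theta}_0) \prod_{t=2}^{T} f_{Y_t | Y_{t-1}, ..., Y_1, \varepsilon_0}(y_t | y_{t-1}, ..., y_1, \varepsilon_0; \boldsymbol{\theta}_0) \\
&= (2\pi)^{-T/2} (\sigma_0^2)^{-T/2} \exp\left[ -\sum_{t=1}^{T} \frac{\varepsilon_t(\boldsymbol{\theta}_0)^2}{2\sigma_0^2} \right].
\end{aligned}$$
Notice that this is not the density of $(Y_T, Y_{T-1}, ..., Y_1)$ where each $Y_t$
has an MA(1) representation, but a density (i.e., the density of
$(Y_T, Y_{T-1}, ..., Y_1)$ when each $Y_t$ has an MA(1) representation)
conditional on $\varepsilon_0$.
Moreover, we cannot compute a likelihood, because we can't observe
$\varepsilon_0$.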

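Therefore, consider
$$Y_t = \mu_0 + \varepsilon_t + \theta_0 \varepsilon_{t-1},$$
with
$$t > 0, \quad \varepsilon_0 = 0$$
This process is very similar to the stationary MA(1), and it has the
density above (setting $\varepsilon_0 = 0$); given that we know $\varepsilon_0$, we can
initialize the iterations (for all the admissible values of $\boldsymbol{\theta}$)
$$\varepsilon_t(\boldsymbol{\theta}) = y_t - \mu - \theta \varepsilon_{t-1}(\boldsymbol{\theta})$$
We can then compute the likelihood (which is, then, a "conditional
likelihood") as a function of a set of observations $(y_T, y_{T-1}, ..., y_1)$,
and of a generic vector of unknown parameters $\boldsymbol{\theta}$,
$$\begin{aligned}
f_{Y_T, Y_{T-1}, ..., Y_1 | \varepsilon_0 = 0}(y_T, y_{T-1}, ..., y_1 | \varepsilon_0 = 0; \boldsymbol{\theta})
&= f_{Y_1 | \varepsilon_0 = 0}(y_1 | \varepsilon_0 = 0; \boldsymbol{\theta}) \prod_{t=2}^{T} f_{Y_t | Y_{t-1}, ..., Y_1, \varepsilon_0 = 0}(y_t | y_{t-1}, ..., y_1, \varepsilon_0 = 0; \boldsymbol{\theta}) \\
&= (2\pi)^{-T/2} (\sigma^2)^{-T/2} \exp\left[ -\sum_{t=1}^{T} \frac{\varepsilon_t(\boldsymbol{\theta})^2}{2\sigma^2} \right].
\end{aligned}$$

Taking the logs, the (conditional) log-likelihood is
$$l(\boldsymbol{\theta}) = -\frac{T}{2} \ln(2\pi) - \frac{T}{2} \ln(\sigma^2) - \frac{1}{2\sigma^2} \sum_{t=1}^{T} \varepsilon_t^2(\boldsymbol{\theta})$$

The value of $\boldsymbol{\theta}$ that maximizes the (conditional) log-likelihood is
called "conditional maximum likelihood estimate".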
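The recursion $\varepsilon_t(\boldsymbol{\theta}) = y_t - \mu - \theta\varepsilon_{t-1}(\boldsymbol{\theta})$ with $\varepsilon_0 = 0$ translates directly into code. The following is a minimal sketch of the conditional log-likelihood of an MA(1); the function name `ma1_conditional_loglik`, the data and the grid of $\theta$ values are hypothetical.

```python
import math

def ma1_conditional_loglik(y, mu, theta, sigma2):
    """Conditional Gaussian log-likelihood of an MA(1), setting eps_0 = 0."""
    eps_prev, ssr = 0.0, 0.0
    for obs in y:
        eps = obs - mu - theta * eps_prev  # eps_t(theta) from the recursion
        ssr += eps ** 2
        eps_prev = eps
    n = len(y)
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * n * math.log(sigma2) - ssr / (2 * sigma2)

# Hypothetical data; grid search over theta with mu = 0 and sigma^2 = 1 held fixed
y = [0.3, -0.1, 0.4, 0.2]
grid = [-0.5, -0.25, 0.0, 0.25, 0.5]
values = {th: ma1_conditional_loglik(y, 0.0, th, 1.0) for th in grid}
best_theta = max(values, key=values.get)

# Hand-checkable case: y = (1, 1), theta = 0.5 gives eps = (1, 0.5)
check = ma1_conditional_loglik([1.0, 1.0], 0.0, 0.5, 1.0)
print(values, best_theta, check)
```

Each evaluation costs only one pass through the data, which is why the conditional likelihood is much cheaper than the exact one that needs $\Sigma(\boldsymbol{\theta})^{-1}$.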

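## ML of MA(q)

Corresponding with the MA(1) case, the MA(q) process can be
written as:
$$Y_t = \mu_0 + \varepsilon_t + \theta_{0;1} \varepsilon_{t-1} + \dots + \theta_{0;q} \varepsilon_{t-q}, \quad \varepsilon_t \sim Nid(0, \sigma_0^2)$$
and the roots of $1 + \theta_{0;1} z + \dots + \theta_{0;q} z^q = 0$ are outside the unit circle.
Setting $\boldsymbol{\varepsilon}_0 = (\varepsilon_0, ..., \varepsilon_{-q+1})' = \mathbf{0}$, we can compute
$$\varepsilon_t(\boldsymbol{\theta}) = y_t - \mu - \theta_1 \varepsilon_{t-1}(\boldsymbol{\theta}) - \dots - \theta_q \varepsilon_{t-q}(\boldsymbol{\theta})$$
iteratively, and we can formulate a "conditional maximum likelihood":
$$\begin{aligned}
f_{Y_T, Y_{T-1}, ..., Y_1 | \boldsymbol{\varepsilon}_0 = \mathbf{0}}(y_T, y_{T-1}, ..., y_1 | \boldsymbol{\varepsilon}_0 = \mathbf{0}; \boldsymbol{\theta})
&= f_{Y_1 | \boldsymbol{\varepsilon}_0 = \mathbf{0}}(y_1 | \boldsymbol{\varepsilon}_0 = \mathbf{0}; \boldsymbol{\theta}) \prod_{t=2}^{T} f_{Y_t | Y_{t-1}, ..., Y_1, \boldsymbol{\varepsilon}_0 = \mathbf{0}}(y_t | y_{t-1}, ..., y_1, \boldsymbol{\varepsilon}_0 = \mathbf{0}; \boldsymbol{\theta}) \\
&= (2\pi)^{-T/2} |\sigma^2|^{-T/2} \exp\left[ -\sum_{t=1}^{T} \frac{\varepsilon_t(\boldsymbol{\theta})^2}{2\sigma^2} \right].
\end{aligned}$$

Taking the logs, the (conditional) log-likelihood is
$$l(\boldsymbol{\theta}) = -\frac{T}{2} \ln(2\pi) - \frac{T}{2} \ln(\sigma^2) - \frac{1}{2\sigma^2} \sum_{t=1}^{T} \varepsilon_t^2(\boldsymbol{\theta})$$

The value of $\boldsymbol{\theta}$ that maximizes the (conditional) log-likelihood is
called "conditional maximum likelihood estimate".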

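## ML of ARMA(p, q)

$$\begin{aligned}
Y_t &= c_0 + \phi_{0;1} Y_{t-1} + \dots + \phi_{0;p} Y_{t-p} + \varepsilon_t + \theta_{0;1} \varepsilon_{t-1} + \dots + \theta_{0;q} \varepsilon_{t-q}, \\
\varepsilon_t &\sim Nid(0, \sigma_0^2)
\end{aligned}$$
and the roots of $1 - \phi_{0;1} z - \dots - \phi_{0;p} z^p = 0$ and of
$1 + \theta_{0;1} z + \dots + \theta_{0;q} z^q = 0$ are all outside the unit circle, and there is
no common factor.

Again, assume that $Y_p, ..., Y_1$ are given and that $\varepsilon_p = \dots = \varepsilon_{p-q+1} = 0$.
Then we can compute
$$\varepsilon_t(\boldsymbol{\theta}) = y_t - c - \phi_1 y_{t-1} - \dots - \phi_p y_{t-p} - \theta_1 \varepsilon_{t-1}(\boldsymbol{\theta}) - \dots - \theta_q \varepsilon_{t-q}(\boldsymbol{\theta})$$
for $t > p$.

The conditional likelihood is then
$$\begin{aligned}
& f_{Y_T, ..., Y_{p+1} | Y_p, ..., Y_1, \varepsilon_p = 0, ..., \varepsilon_{p-q+1} = 0}(y_T, ..., y_{p+1} | y_p, ..., y_1, \varepsilon_p = 0, ..., \varepsilon_{p-q+1} = 0; \boldsymbol{\theta}) \\
&= (2\pi)^{-(T-p)/2} |\sigma^2|^{-(T-p)/2} \exp\left[ -\sum_{t=p+1}^{T} \frac{\varepsilon_t(\boldsymbol{\theta})^2}{2\sigma^2} \right].
\end{aligned}$$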

$$l(\boldsymbol{\theta}) = -\frac{T - p}{2} \ln(2\pi) - \frac{T - p}{2} \ln(\sigma^2) - \frac{1}{2\sigma^2} \sum_{t=p+1}^{T} \varepsilon_t^2(\boldsymbol{\theta})$$

The value of $\boldsymbol{\theta}$ that maximizes the conditional log-likelihood is called
"conditional maximum likelihood estimate".

When $\varepsilon_t$ is not normally distributed, the density is different and
then the maximum likelihood estimate is different as well. If we use
the Gaussian density even if $\varepsilon_t$ is not normally distributed, then our
estimate is no longer the maximum likelihood one. In this case it is
usually known as Pseudo (or Quasi) maximum likelihood instead.

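## Optimization of the objective function

In general, it is not always possible to obtain a closed form formula
for the estimate, and it may be extremely time consuming to compute
the log-likelihood function (even the conditional log-likelihood) for all
the potential $\boldsymbol{\theta}$.
The optimisation of the log-likelihood may be carried out using a
numerical algorithm, such as the Newton-Raphson one. Introduce
$$\begin{aligned}
g(\boldsymbol{\theta}^{(0)}) &= \left. \frac{\partial l(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} \right|_{\boldsymbol{\theta} = \boldsymbol{\theta}^{(0)}} \quad \text{(gradient)} \\
H(\boldsymbol{\theta}^{(0)}) &= -\left. \frac{\partial^2 l(\boldsymbol{\theta})}{\partial \boldsymbol{\theta} \partial \boldsymbol{\theta}'} \right|_{\boldsymbol{\theta} = \boldsymbol{\theta}^{(0)}} \quad \text{(Hessian, with the sign changed)}
\end{aligned}$$
for a generic $\boldsymbol{\theta}^{(0)}$, and consider an approximate second order Taylor
expansion of $l(\boldsymbol{\theta})$,
$$l(\boldsymbol{\theta}) \approx l(\boldsymbol{\theta}^{(0)}) + g(\boldsymbol{\theta}^{(0)})' \left[ \boldsymbol{\theta} - \boldsymbol{\theta}^{(0)} \right]
- \frac{1}{2} \left[ \boldsymbol{\theta} - \boldsymbol{\theta}^{(0)} \right]' H(\boldsymbol{\theta}^{(0)}) \left[ \boldsymbol{\theta} - \boldsymbol{\theta}^{(0)} \right]$$
The maximum likelihood estimate satisfies
$$\left. \frac{\partial l(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} \right|_{\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}} = 0$$

Now, consider the approximation of the derivative around $\boldsymbol{\theta}^{(0)}$:
$$\frac{\partial l(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} \approx g(\boldsymbol{\theta}^{(0)}) - H(\boldsymbol{\theta}^{(0)}) \left[ \boldsymbol{\theta} - \boldsymbol{\theta}^{(0)} \right].$$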

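If the approximation was perfect, we could have just computed $\hat{\boldsymbol{\theta}}$
solving for
$$g(\boldsymbol{\theta}^{(0)}) - H(\boldsymbol{\theta}^{(0)}) \left[ \hat{\boldsymbol{\theta}} - \boldsymbol{\theta}^{(0)} \right] = 0,$$
i.e.,
$$\hat{\boldsymbol{\theta}} = \boldsymbol{\theta}^{(0)} + H(\boldsymbol{\theta}^{(0)})^{-1} g(\boldsymbol{\theta}^{(0)}) .$$

However, this may be a rather poor estimate, because the
approximation is not exact (there is a remainder, in this case of the
third order, in the Taylor expansion of $l(\boldsymbol{\theta})$). Let's call this possibly
poor estimate $\boldsymbol{\theta}^{(1)}$, then, where
$$\boldsymbol{\theta}^{(1)} = \boldsymbol{\theta}^{(0)} + H(\boldsymbol{\theta}^{(0)})^{-1} g(\boldsymbol{\theta}^{(0)})$$
clearly, this is (in a certain probabilistic sense) better than a generic
$\boldsymbol{\theta}^{(0)}$.

Next, we can improve, by considering a second order approximation of
$l(\boldsymbol{\theta})$ in $\boldsymbol{\theta}^{(1)}$, and compute
$$\boldsymbol{\theta}^{(2)} = \boldsymbol{\theta}^{(1)} + H(\boldsymbol{\theta}^{(1)})^{-1} g(\boldsymbol{\theta}^{(1)}) .$$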

The procedure can then be iterated until convergence (which
gives $\hat{\boldsymbol{\theta}}$).
In many cases, you may start the optimisation with any set of
starting values, but this may result in a rather slow optimisation, or
even in an "incorrect" solution (you may end up picking a local
maximum, rather than the maximum). It is then advisable to start
from a "good" point, that is, from a consistent estimate of $\boldsymbol{\theta}$
(typically, an estimate that you may compute easily, even if it is less
efficient than maximum likelihood): the correlogram based
estimate is a good starting point (given certain regularity conditions,
properties as in the pseudo-maximum likelihood estimate may be
obtained after just one step).
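A single Newton-Raphson update $\boldsymbol{\theta}^{(1)} = \boldsymbol{\theta}^{(0)} + H^{-1} g$ can be illustrated on the conditional AR(1) log-likelihood in $(c, \phi)$ with $\sigma^2$ held fixed. Because that objective is exactly quadratic in $(c, \phi)$, the second order expansion has no remainder and one step reaches the maximum from any starting point; for a general log-likelihood the step would be repeated. This is a sketch under those assumptions: the function name `newton_step_ar1`, the fixed $\sigma^2 = 1$ and the data are hypothetical.

```python
def newton_step_ar1(y, c, phi, sigma2=1.0):
    """One Newton-Raphson step for (c, phi) in the conditional AR(1) log-likelihood."""
    cur, lag = y[1:], y[:-1]
    resid = [a - c - phi * b for a, b in zip(cur, lag)]
    # Gradient g of the log-likelihood with respect to (c, phi)
    g_c = sum(resid) / sigma2
    g_phi = sum(r * b for r, b in zip(resid, lag)) / sigma2
    # H = minus the matrix of second derivatives
    h_cc = len(cur) / sigma2
    h_cp = sum(lag) / sigma2
    h_pp = sum(b * b for b in lag) / sigma2
    det = h_cc * h_pp - h_cp ** 2
    # theta_new = theta_old + H^{-1} g, with the 2x2 inverse written out
    step_c = (h_pp * g_c - h_cp * g_phi) / det
    step_phi = (-h_cp * g_c + h_cc * g_phi) / det
    return c + step_c, phi + step_phi

# Hypothetical series generated exactly by y_t = 1 + 0.5 * y_{t-1}
y = [0.0, 1.0, 1.5, 1.75, 1.875]
c1, phi1 = newton_step_ar1(y, c=0.0, phi=0.0)   # first step from an arbitrary start
c2, phi2 = newton_step_ar1(y, c=c1, phi=phi1)   # second step leaves the estimate unchanged
print((c1, phi1), (c2, phi2))
```

In the MA and ARMA cases the residuals depend on $\boldsymbol{\theta}$ through the recursion for $\varepsilon_t(\boldsymbol{\theta})$, so the objective is no longer quadratic and several iterations from a good starting value are typically needed.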

## Topic 5: Model of Heteroskedasticity in Time Series

Introductory Financial Econometrics
Topic 5 Models of Heteroskedasticity
3 Credits, 51 Hours
Jianhua Gang
School of Finance
Renmin University of China

Spring 2013

Motivation: the linear structural (and time series) models cannot
explain a number of important features common to much financial
data

leptokurtosis
volatility clustering or volatility pooling
leverage effects

Our traditional structural model could be something like:
$$y_t = \beta_1 + \beta_2 x_{2t} + \dots + \beta_k x_{kt} + u_t$$
or more compactly
$$y = X\beta + u, \quad u \sim N(0, \sigma^2)$$

## A sample

[FIGURE: CBOE VIX and SPX (S&P500 Index) Scale. Series: SPX (S&P500 Index) and CBOE VIX, plotted by year, 1990 to 2009.]

## A sample

[FIGURE: SPX Return and Percentage Change of VIX (%VIX). Two panels plotted by year, 1990 to 2009.]

## Non-linear models: a definition

Campbell, Lo and MacKinlay (1997) define a non-linear data
generating process (DGP) as one that can be written as,
$$y_t = f(u_t, u_{t-1}, u_{t-2}, ...)$$
where $u_t$ is an iid error term and $f$ is a non-linear function. A slightly more specific form is
$$y_t = g(u_{t-1}, u_{t-2}, ...) + u_t \sigma^2(u_{t-1}, u_{t-2}, ...)$$
where $g$ is a function of past error terms only and $\sigma^2$ is a variance
term.
Models with nonlinear $g(.)$ are non-linear in the mean, while those
with nonlinear $\sigma^2(.)$ are nonlinear in variance.

The linear paradigm is a useful one: Many apparently non-linear
relationships can be made linear by a suitable transformation.
On the other hand, it is likely that many relationships in finance are
intrinsically non-linear.
There are many types of non-linear models, e.g.

ARCH / GARCH
switching models
bilinear models

## Testing for non-linearity

The traditional tools of time series analysis (acfs, spectral analysis)
may find no evidence that we could use a linear model, but the data
may still not be independent.

Portmanteau tests (discussed later) for non-linear dependence have been
developed.
The simplest is Ramsey's RESET test, which took the form:
$$\hat{u}_t = \beta_0 + \beta_1 \hat{y}_t^2 + \beta_2 \hat{y}_t^3 + \dots + \beta_{p-1} \hat{y}_t^p + v_t$$

Many other non-linearity tests are available.
One particular non-linear model that has proved very useful in finance
is the ARCH model due to Engle (1982).

## Heteroskedasticity revisited

An example of a structural model is
$$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \beta_4 x_{4t} + u_t, \quad u_t \sim N(0, \sigma_u^2)$$

The assumption that the variance of the error is constant is known as
homoskedasticity, i.e. $var(u_t) = \sigma_u^2$.
What if the variance of the error is not constant?
heteroskedasticity;
would imply that standard error estimates could be wrong.

Is the variance of the errors likely to be constant over time?
Not for financial data!!!

## Autoregressive conditionally heteroskedastic (ARCH) models

Models of the mean process have been extended to essentially analogous
models for the variance (seminal paper of Engle (1982)).
Autoregressive conditional heteroscedasticity (ARCH) models are now
commonly used to describe and forecast changes in the volatility of
financial time series.
See Bollerslev et al. (1992, 1994), Bera and Higgins (1993), Pagan
(1996), Palm (1996) and Shephard (1996), among others.

## ARCH models

So use a model which does not assume that the variance is constant.
Recall the definition of the variance of $u_t$:
$$\sigma_t^2 = var(u_t | u_{t-1}, u_{t-2}, ...) = E\left\{ [u_t - E(u_t)]^2 \,|\, u_{t-1}, u_{t-2}, ... \right\}$$
we usually assume that $E(u_t) = 0$, so
$$\sigma_t^2 = var(u_t | u_{t-1}, u_{t-2}, ...) = E\left[ u_t^2 \,|\, u_{t-1}, u_{t-2}, ... \right]$$
Now, what could the current value of the variance of the errors
plausibly depend upon? Previous squared error terms. This leads to the
autoregressive conditionally heteroskedastic (ARCH) model:
$$\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2$$
This is known as the ARCH(1) model.

The full model would be
$$y_t = \beta_1 + \beta_2 x_{2t} + \dots + \beta_k x_{kt} + u_t, \quad u_t \sim N(0, \sigma_t^2)$$
where $\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2$.
We can easily extend this to the general case where the error variance
depends on $q$ lags of squared errors:
$$\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \alpha_2 u_{t-2}^2 + \dots + \alpha_q u_{t-q}^2$$
This is an ARCH(q) model.
Instead of calling the variance $\sigma_t^2$, in the literature it is usually called
$h_t$, so the model is
$$y_t = \beta_1 + \beta_2 x_{2t} + \dots + \beta_k x_{kt} + u_t, \quad u_t \sim N(0, h_t)$$
where
$$h_t = \alpha_0 + \alpha_1 u_{t-1}^2 + \alpha_2 u_{t-2}^2 + \dots + \alpha_q u_{t-q}^2$$

For illustration, consider an ARCH(1). Instead of the above, we can
write
$$y_t = \beta_1 + \beta_2 x_{2t} + \dots + \beta_k x_{kt} + u_t, \quad u_t \sim N(0, \sigma_t^2)$$
$$\sigma_t = \sqrt{\alpha_0 + \alpha_1 u_{t-1}^2}$$
The two are different ways of expressing exactly the same model. The
first form is easier to understand while the second form is required for
simulating from an ARCH model, for example.

## Testing for ARCH effects

1. First, run any postulated linear regression of the form given in the
equation above, e.g.
$$y_t = \beta_1 + \beta_2 x_{2t} + \dots + \beta_k x_{kt} + u_t$$
saving the residuals, $\hat{u}_t$.

2. Then square the residuals, and regress them on $q$ own lags to test for
ARCH of order $q$, i.e. run the regression
$$\hat{u}_t^2 = \gamma_0 + \gamma_1 \hat{u}_{t-1}^2 + \gamma_2 \hat{u}_{t-2}^2 + \dots + \gamma_q \hat{u}_{t-q}^2 + v_t$$

3. The test statistic is defined as $TR^2$ (the number of observations
multiplied by the coefficient of multiple correlation) from the last
regression, and is distributed as a $\chi^2(q)$.

$$H_0: \gamma_1 = \gamma_2 = \dots = \gamma_q = 0$$
$$H_1: \gamma_1 \neq 0 \text{ or } \gamma_2 \neq 0 \text{ or } \dots \text{ or } \gamma_q \neq 0$$
If the value of the test statistic is greater than the critical value from
the $\chi^2$ distribution, then reject the null hypothesis.
Note that the ARCH test is also sometimes applied directly to returns
instead of the residuals from Stage 1 above.

## Problems with ARCH(q) models

How do we decide on $q$?

The required value of $q$ might be very large;

Non-negativity constraints might be violated;

When we estimate an ARCH model, we require $\alpha_i > 0$, $i = 1, 2, ..., q$
(since variance cannot be negative).
Therefore, a natural extension of an ARCH(q) model which circumvents
some of these problems is a GARCH model.
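The three-step ARCH test described above can be sketched for the simplest case $q = 1$, where the auxiliary regression has a single regressor and $R^2$ is the squared sample correlation between $\hat{u}_t^2$ and $\hat{u}_{t-1}^2$. The function name `arch_lm_statistic` and the residual series are hypothetical; the series is constructed so that the squared residuals follow $s_t = 1 + 0.5 s_{t-1}$ exactly, which makes the auxiliary regression fit perfectly. Here $T$ is taken to be the number of observations used in the auxiliary regression.

```python
import math

def arch_lm_statistic(resid):
    """T * R^2 from regressing squared residuals on one own lag (ARCH(1) test)."""
    sq = [u * u for u in resid]
    dep, lag = sq[1:], sq[:-1]
    n = len(dep)  # observations used in the auxiliary regression
    mean_dep, mean_lag = sum(dep) / n, sum(lag) / n
    sxy = sum((a - mean_dep) * (b - mean_lag) for a, b in zip(dep, lag))
    sxx = sum((b - mean_lag) ** 2 for b in lag)
    syy = sum((a - mean_dep) ** 2 for a in dep)
    r2 = sxy ** 2 / (sxx * syy)  # R^2 of a regression with one regressor
    return n * r2

# Hypothetical residuals whose squares satisfy s_t = 1 + 0.5 * s_{t-1} exactly
squares = [4.0]
for _ in range(9):
    squares.append(1.0 + 0.5 * squares[-1])
resid = [math.sqrt(s) for s in squares]

stat = arch_lm_statistic(resid)
reject = stat > 3.84  # approximate 5% critical value of chi-squared(1)
print(stat, reject)
```

For general $q$ the same statistic requires a multiple regression on $q$ lags, and the critical value comes from the $\chi^2(q)$ distribution.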

## GARCH models

Due to Bollerslev (1986). Allow the conditional variance to be
dependent upon previous own lags.
The variance equation is now
$$\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \beta \sigma_{t-1}^2$$
This is a GARCH(1,1) model, which is like an ARMA(1,1) model for
the variance equation.
We could also show that a GARCH(1,1) model can be written as an
infinite order ARCH model.

We can again extend the GARCH(1,1) model to a GARCH(p,q):
$$\sigma_t^2 = \alpha_0 + \sum_{i=1}^{q} \alpha_i u_{t-i}^2 + \sum_{j=1}^{p} \beta_j \sigma_{t-j}^2$$
But in general a GARCH(1,1) model will be sufficient to capture the
volatility clustering in the data.

Why is GARCH better than ARCH?
more parsimonious, avoiding overfitting;
less likely to breach non-negativity constraints.

The unconditional variance of $u_t$ is given by,
$$var(u_t) = \frac{\alpha_0}{1 - (\alpha_1 + \beta)}$$
when $\alpha_1 + \beta < 1$
$\alpha_1 + \beta > 1$ is termed non-stationarity in variance;

For non-stationarity in variance, the conditional variance forecasts will
not converge on their unconditional value as the horizon increases.

## Estimation of GARCH models

Since the model is no longer of the usual linear form, we cannot use
OLS.

We use maximum likelihood as we already discussed.

The method works by finding the most likely values of the parameters
given the actual data.
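The statement about variance forecasts can be illustrated with the standard GARCH(1,1) forecast recursion $E_t[\sigma_{t+s}^2] = \alpha_0 + (\alpha_1 + \beta) E_t[\sigma_{t+s-1}^2]$ for $s \ge 2$, which is not derived in these notes and is used here as an assumption. The function name `garch11_variance_forecasts` and all parameter values are hypothetical.

```python
def garch11_variance_forecasts(alpha0, alpha1, beta, sigma2_next, horizon):
    """Conditional variance forecasts for horizons 1..horizon of a GARCH(1,1)."""
    forecasts = [sigma2_next]  # one-step-ahead variance, known at time t
    for _ in range(horizon - 1):
        forecasts.append(alpha0 + (alpha1 + beta) * forecasts[-1])
    return forecasts

# Stationary in variance: alpha1 + beta = 0.9 < 1
alpha0, alpha1, beta = 0.1, 0.1, 0.8
unconditional = alpha0 / (1 - (alpha1 + beta))
stationary = garch11_variance_forecasts(alpha0, alpha1, beta, sigma2_next=4.0, horizon=500)

# Non-stationary in variance: alpha1 + beta = 1.05 > 1
explosive = garch11_variance_forecasts(0.1, 0.25, 0.8, sigma2_next=4.0, horizon=50)
print(unconditional, stationary[-1], explosive[-1])
```

In the first case the forecasts decay geometrically towards $\alpha_0 / (1 - \alpha_1 - \beta)$; in the second they grow without bound as the horizon increases.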

## Estimation of GARCH models using maximum likelihood

The steps involved in actually estimating an ARCH or GARCH model
are as follows

1. Specify the appropriate equations for the mean and the variance - e.g.
an AR(1)-GARCH(1,1) model:
$$\begin{aligned}
y_t &= \mu + \phi y_{t-1} + u_t, \quad u_t \sim N(0, \sigma_t^2) \\
\sigma_t^2 &= \alpha_0 + \alpha_1 u_{t-1}^2 + \beta \sigma_{t-1}^2
\end{aligned} \qquad (1)$$

2. Specify the log-likelihood function (LLF) to maximise:
$$l = -\frac{T}{2} \log(2\pi) - \frac{1}{2} \sum_{t=1}^{T} \log(\sigma_t^2) - \frac{1}{2} \sum_{t=1}^{T} \frac{(y_t - \mu - \phi y_{t-1})^2}{\sigma_t^2} \qquad (2)$$

3. The computer will maximise the function and give parameter values
and their standard errors.

Now we get model (1) and likelihood function (2).
Unfortunately, the LLF for a model with time-varying variances
cannot be maximised analytically, except in the simplest of cases. So
a numerical procedure is used to maximise the log-likelihood
function. A potential problem: local optima or multimodalities in
the likelihood surface.
The way we do the optimisation is:

1. Set up LLF.
2. Use regression to get initial guesses for the mean parameters.
3. Choose some initial guesses for the conditional variance parameters.
4. Specify a convergence criterion - either by criterion or by value.

## Recall that the conditional normality assumption for ut is essential.

We can test for normality using the following representation

## Since the GARCH model was developed, a huge number of extensions

and variants have been proposed. Three of the most important
examples are EGARCH, GJR, and GARCH-M models.

vt N (0, 1)
q ut = vt t
t = 0 + 1 ut21 + 2t 1
vt = utt
The sample counterpart is vbt =

## Problems with GARCH(p,q) Models:

ubt
bt

Are the vbt normal? Typically vbt are still leptokurtic, although less so
than u
bt . Is this a problem? Not really, as we discussed before. We
can use the ML with a robust variance/covariance estimator. ML
with robust standard error is called Quasi-Maximum Likelihood or
QML (also known as pseudo-).

S PRING 2013

23 / 30

## Non-negativity constraints may still be violated;

GARCH models cannot account for leverage effects

## Possible solutions: the exponential GARCH (EGARCH) model or the

GJR model, which are asymmetric GARCH models.

S PRING 2013

24 / 30
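Returning to the estimation steps, the log-likelihood (2) of the AR(1)-GARCH(1,1) model can be sketched as a plain function that a numerical optimiser would then maximise. This is a minimal sketch, not a full estimator: the function name is hypothetical, the likelihood conditions on the first observation, and starting the variance recursion at the sample variance of the residuals is an assumption (other initialisations are common).

```python
import math


def ar1_garch11_loglik(params, y):
    """Gaussian log-likelihood of an AR(1)-GARCH(1,1) model, conditioning on y[0].

    params = (mu, phi, alpha0, alpha1, beta)
    """
    mu, phi, alpha0, alpha1, beta = params
    # Mean-equation residuals u_t = y_t - mu - phi * y_{t-1}.
    resid = [y[t] - mu - phi * y[t - 1] for t in range(1, len(y))]
    n = len(resid)
    # Assumption: initialise sigma^2 at the sample variance of the residuals.
    sigma2 = sum(u * u for u in resid) / n
    loglik = -0.5 * n * math.log(2.0 * math.pi)
    for t, u in enumerate(resid):
        if t > 0:
            # Variance equation: alpha0 + alpha1 * u_{t-1}^2 + beta * sigma_{t-1}^2.
            sigma2 = alpha0 + alpha1 * resid[t - 1] ** 2 + beta * sigma2
        loglik -= 0.5 * (math.log(sigma2) + u * u / sigma2)
    return loglik


# Tiny made-up series, only to show the call.
example_value = ar1_garch11_loglik((0.0, 0.0, 1.0, 0.0, 0.0), [0.0, 1.0, -1.0, 2.0])
```

In practice the negative of this function would be handed to a numerical minimiser, starting from the initial guesses described in the optimisation steps.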

## The EGARCH Model

Suggested by Nelson (1991). The variance equation is given by

$$\log(\sigma_t^2) = \omega + \beta \log(\sigma_{t-1}^2) + \gamma \frac{u_{t-1}}{\sqrt{\sigma_{t-1}^2}} + \alpha\left[\frac{|u_{t-1}|}{\sqrt{\sigma_{t-1}^2}} - \sqrt{\frac{2}{\pi}}\right]$$

Since we model $\log(\sigma_t^2)$, then even if the parameters are negative,
$\sigma_t^2$ will be positive.
We can account for the leverage effect by noticing that a negative
shock ($u_{t-1}$) has an asymmetric effect on the dependent variable
$\log(\sigma_t^2)$ as opposed to a positive shock.

## The GJR Model

Due to Glosten, Jagannathan and Runkle:

$$\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \beta \sigma_{t-1}^2 + \gamma u_{t-1}^2 I_{t-1}$$

where

$I_{t-1} = 1$, if $u_{t-1} < 0$
$I_{t-1} = 0$, otherwise

For a leverage effect, we would see $\gamma > 0$.
We require $\alpha_1 + \gamma \geq 0$ and $\alpha_1 \geq 0$ for non-negativity conditions.

## An Example of GJR

Using monthly SPX returns, December 1979 - June 1998.
Estimating a GJR model, we obtain the following results (figures in parentheses are reported beneath each coefficient):

$$y_t = \underset{(3.198)}{0.172}$$

$$\sigma_t^2 = \underset{(16.372)}{\hat{\alpha}_0} + \underset{(0.437)}{\hat{\alpha}_1} u_{t-1}^2 + \underset{(14.999)}{\hat{\beta}} \sigma_{t-1}^2 + \underset{(5.772)}{\hat{\gamma}} u_{t-1}^2 I_{t-1}$$

## News Impact Curves

The news impact curve plots the next period volatility ($h_t$) that would
arise from various positive and negative values of $u_{t-1}$, given an
estimated model.

Figure: News impact curves for SPX returns using coefficients from GARCH
and GJR model estimates (vertical axis: value of conditional variance; curves: GARCH and GJR).
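A news impact curve of the kind described above can be sketched by holding the lagged variance fixed and varying the lagged shock. The coefficients below are hypothetical (they are not the SPX estimates), and fixing the lagged variance at a constant `sigma2_bar` is an assumption of this sketch.

```python
def nic_garch(u_prev, alpha0, alpha1, beta, sigma2_bar):
    """GARCH(1,1): next-period variance with the lagged variance fixed at sigma2_bar."""
    return alpha0 + alpha1 * u_prev ** 2 + beta * sigma2_bar


def nic_gjr(u_prev, alpha0, alpha1, beta, gamma, sigma2_bar):
    """GJR: negative shocks receive the additional gamma * u^2 term."""
    indicator = 1.0 if u_prev < 0 else 0.0
    return alpha0 + (alpha1 + gamma * indicator) * u_prev ** 2 + beta * sigma2_bar


# Hypothetical coefficients, for illustration only.
shocks = [x / 10.0 for x in range(-10, 11)]  # lagged shocks from -1.0 to 1.0
garch_curve = [nic_garch(u, 0.01, 0.10, 0.8, 0.1) for u in shocks]
gjr_curve = [nic_gjr(u, 0.01, 0.02, 0.8, 0.15, 0.1) for u in shocks]
```

The GARCH curve is symmetric around zero, while the GJR curve is steeper for negative shocks when $\gamma > 0$, which is the leverage effect.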

## GARCH-in-Mean

We expect a risk to be compensated by a higher return. So why not
let the return of a security be partly determined by its risk?
Engle, Lilien and Robins (1987) suggested the ARCH-M specification.
A GARCH-M model would be

$$y_t = \mu + \delta \sigma_{t-1} + u_t, \quad u_t \sim N(0, \sigma_t^2)$$
$$\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \beta \sigma_{t-1}^2$$

It is possible to combine all or some of these models together to get
more complex hybrid models, e.g. an ARMA-EGARCH(1,1)-M
model.

GARCH can model the volatility clustering effect since the conditional
variance is autoregressive. Such models can be used to forecast
volatility.
We could show that

$$\operatorname{Var}(y_t \mid y_{t-1}, y_{t-2}, \ldots) = \operatorname{Var}(u_t \mid u_{t-1}, u_{t-2}, \ldots)$$

## Topic 6: Multivariate GARCH

Introductory Financial Econometrics
Topic 6 Multivariate GARCH
3 Credits, 51 Hours
Jianhua Gang
School of Finance
Renmin University of China

Spring 2013

## MGARCH Family

Problem:

1. Is the volatility of an asset transmitted to another asset directly
(through its conditional variance) or indirectly (through its conditional
covariances)?
2. Is the impact the same for negative and positive shocks of the same
amplitude?
3. Do the correlations between asset returns change over time?
Are they higher during periods of higher volatility (sometimes
associated with financial crises)?
Are they increasing in the long run, perhaps because of the
globalization of financial markets?

The above questions can be answered using MGARCH.
A further example is the impact of volatility in financial markets on real variables
like exports and output growth rates, and the volatility of these
growth rates.

Literature regarding MGARCH models: Bollerslev et al. (1988);
Gourieroux (1997); De Santis and Gerard (1998); Hafner and
Herwartz (1998); Franses and van Dijk (2000); Lien and Tse (2002).
MGARCH models were initially developed in the late 1980s and the first
half of the 1990s. After 2000, the field entered another active phase.

## Definition (Conditional Variance)

$$y_t = \mu_t(\theta) + \varepsilon_t$$
$$\varepsilon_t = H_t^{1/2}(\theta) z_t$$

where $H_t^{1/2}(\theta)$ is an $N \times N$ positive definite matrix, and the $N \times 1$ random
vector $z_t$ satisfies

$$E(z_t) = 0, \quad \operatorname{Var}(z_t) = I_N$$

where $I_N$ is the identity matrix of order $N$.

## VEC Models (Bollerslev et al. 1988)

$$h_t = c + A \eta_{t-1} + G h_{t-1}$$

where

$$h_t = \operatorname{vech}(H_t), \quad \eta_t = \operatorname{vech}(\varepsilon_t \varepsilon_t')$$

and $\operatorname{vech}(\cdot)$ denotes the operator that stacks the lower triangular portion
of an $N \times N$ matrix as an $N(N+1)/2 \times 1$ vector. $A$ and $G$ are square
parameter matrices of order $N(N+1)/2$ and $c$ is an $N(N+1)/2 \times 1$
parameter vector.
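The vech operator used in the VEC model can be sketched in a few lines. The function name and the list-of-lists matrix representation are choices made for this illustration.

```python
def vech(matrix):
    """Stack the lower triangle of a square matrix column by column."""
    n = len(matrix)
    return [matrix[row][col] for col in range(n) for row in range(col, n)]


# A symmetric 3 x 3 example: N(N + 1)/2 = 6 distinct elements are kept.
h_matrix = [[1, 2, 3],
            [2, 4, 5],
            [3, 5, 6]]
h_vector = vech(h_matrix)
```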

## VEC and DVEC (Bollerslev et al. 1988)

The number of parameters is 78 for N = 3. So in practice, this model
is used only in the bivariate case.

Bollerslev et al. (1988) suggest the DVEC (diagonal VEC) model to
overcome this: the A and G matrices are assumed to be diagonal, each
element $h_{ij,t}$ depending only on its own lag and on the previous value
of $\varepsilon_{it}\varepsilon_{jt}$.

DVEC can hence reduce the number of parameters in A and G to 12 for N = 3
(18 in total once the 6 elements of c are counted, against 78 for the full VEC).
But, even under this diagonality, large-scale systems are still highly
parameterized and difficult to estimate.

Even simpler version of the DVEC (Ding and Engle, 2001): restrict A and G
to be positive scalars (scalar model).
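The parameter counts quoted for the VEC and DVEC models follow from the dimension $N(N+1)/2$ of $\operatorname{vech}(H_t)$. A short sketch with hypothetical function names, counting $c$, $A$ and $G$ together:

```python
def vec_param_count(n):
    """Parameters of a VEC(1,1): c plus full square matrices A and G."""
    m = n * (n + 1) // 2  # length of vech(H_t)
    return m + 2 * m * m


def dvec_param_count(n):
    """Parameters of a diagonal VEC(1,1): c plus the diagonals of A and G."""
    m = n * (n + 1) // 2
    return 3 * m


counts = {n: (vec_param_count(n), dvec_param_count(n)) for n in (2, 3, 5)}
```

For N = 3 this gives 78 parameters for the VEC model and 18 for the DVEC model, of which 12 sit in A and G.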

Practitioners who study volatility processes often observe that their
model is very close to the unit root case.
To take this into account, Riskmetrics uses the exponentially weighted
moving average model (EWMA) and defines the variances and
covariances as IGARCH-type models (Engle and Bollerslev, 1986):

$$h_t = (1 - \lambda) \eta_{t-1} + \lambda h_{t-1}$$

which is a scalar VEC. The decay factor $\lambda$ proposed by Riskmetrics is
0.94 for daily data and 0.97 for monthly data.
However, the decay factor is not estimated but merely suggested, and is therefore
very hard to justify.

## BEKK

It is difficult to guarantee the positivity of $H_t$ in the VEC
representation without imposing strong restrictions on the parameters.
Engle and Kroner (1995) propose an alternative form of $H_t$ that ensures
positivity: the BEKK model.
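The Riskmetrics EWMA recursion can be sketched element by element on the covariance matrix itself, which is equivalent to the vech form. The function name and the example inputs are hypothetical.

```python
def ewma_update(h_prev, eps_prev, lam=0.94):
    """H_t = (1 - lam) * eps_{t-1} eps_{t-1}' + lam * H_{t-1}."""
    n = len(eps_prev)
    return [[(1.0 - lam) * eps_prev[i] * eps_prev[j] + lam * h_prev[i][j]
             for j in range(n)] for i in range(n)]


# Start from an identity covariance matrix and feed in one shock vector.
h_next = ewma_update([[1.0, 0.0], [0.0, 1.0]], [1.0, -1.0])
```

With the daily decay factor 0.94, a single pair of opposite-signed unit shocks moves the covariance from 0 to about -0.06 and leaves the variances at 1.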

## Cat. 1: Direct Generalization: BEKK

Definition (BEKK(1,1,K))

$$H_t = C'C + \sum_{k=1}^{K} A_k' \varepsilon_{t-1}\varepsilon_{t-1}' A_k + \sum_{k=1}^{K} G_k' H_{t-1} G_k \tag{1}$$

K determines the generality of the process.
Parameters of the BEKK model do not represent directly the impact
of the different lagged terms on the elements of $H_t$ like in the VEC
model.

When K = 1, this is a VEC with $C$, $A_k$ and $G_k$ being restricted to
be positive.

However, we still prefer parsimonious models (at the cost of reducing the
generality).
Impose a diagonal BEKK model, i.e. diagonalize $A_k$ and $G_k$. (This is also
a DVEC model but less general, although it is guaranteed to be
positive definite while the DVEC is not.)

Scalar BEKK is also applicable, i.e. $A_k$ and $G_k$ are equal to a scalar
times a matrix of ones.
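One step of the BEKK(1,1,1) recursion (1) can be sketched with small helper functions for matrix algebra. The helper names and all parameter values are hypothetical; the point is only that the quadratic forms keep $H_t$ symmetric and positive definite.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matadd(*matrices):
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*matrices)]


def bekk_update(c, a, g, h_prev, eps_prev):
    """H_t = C'C + A' eps eps' A + G' H_{t-1} G for a BEKK(1,1,1) model."""
    outer = [[x * y for y in eps_prev] for x in eps_prev]  # eps eps'
    return matadd(matmul(transpose(c), c),
                  matmul(matmul(transpose(a), outer), a),
                  matmul(matmul(transpose(g), h_prev), g))


# Hypothetical parameter matrices, for illustration only.
c = [[0.5, 0.1], [0.0, 0.4]]
a = [[0.3, 0.05], [0.0, 0.3]]
g = [[0.9, 0.0], [0.0, 0.9]]
h_t = bekk_update(c, a, g, [[1.0, 0.0], [0.0, 1.0]], [1.0, -2.0])
```

No sign restrictions on individual elements of $H_t$ are needed here, in contrast to the VEC form.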

## Cat. 1: Direct Generalization: Factor Model (Engle et al. (1990b), Bollerslev and Engle (1993))

The difficulty when estimating a VEC or even a BEKK model is the
high number of unknown parameters, even after imposing several
restrictions.
It is thus not surprising that these models are rarely used when the
number of series is larger than 3 or 4.
Factor and orthogonal models circumvent this difficulty by imposing a
common dynamic structure on all the elements of $H_t$, which results in
less parameterized models.

Definition (FGARCH(1,1,K))

Lin (1992): the BEKK(1,1,K) model above is a factor GARCH model,
denoted by F-GARCH(1,1,K), if for each $k = 1, \ldots, K$, $A_k$ and $G_k$ have
rank one and have the same left and right eigenvectors, $\lambda_k$ and $w_k$, i.e.

$$A_k = \alpha_k w_k \lambda_k' \tag{2}$$
$$G_k = \beta_k w_k \lambda_k' \tag{3}$$

where $\alpha_k$ and $\beta_k$ are scalars, and $w_k$ and $\lambda_k$ (for $k = 1, \ldots, K$) are $N \times 1$
vectors satisfying

$$w_k' \lambda_i = \begin{cases} 0 & \text{for } k \neq i \\ 1 & \text{for } k = i \end{cases}, \qquad \sum_{n=1}^{N} w_{kn} = 1 \tag{4}$$

Substitute (2) and (3) into (1) and define $\Omega = C'C$; we get

$$H_t = \Omega + \sum_{k=1}^{K} \lambda_k \lambda_k' \left( \alpha_k^2 w_k' \varepsilon_{t-1}\varepsilon_{t-1}' w_k + \beta_k^2 w_k' H_{t-1} w_k \right) \tag{5}$$

The K-factor GARCH implies that the time-varying part of $H_t$, $H_t - \Omega$, has reduced rank K, but $H_t$
is of full rank because $\Omega$ is positive definite.
The vector $\lambda_k$ is defined as the factor loading, and the scalar $w_k' \varepsilon_t$
(denoted as $f_{kt}$) is the kth factor.
The expression between brackets can be replaced by other univariate
GARCH specifications.

Consider, for instance, the two-factor F-GARCH model,
F-GARCH(1,1,2):

$$H_t = \Omega + \lambda_1 \lambda_1' \left( \alpha_1^2 w_1' \varepsilon_{t-1}\varepsilon_{t-1}' w_1 + \beta_1^2 w_1' H_{t-1} w_1 \right) + \lambda_2 \lambda_2' \left( \alpha_2^2 w_2' \varepsilon_{t-1}\varepsilon_{t-1}' w_2 + \beta_2^2 w_2' H_{t-1} w_2 \right) \tag{6}$$

The two-factor model can be written as

$$\varepsilon_t = \lambda_1 f_{1t} + \lambda_2 f_{2t} + e_t$$

where $e_t$ represents an idiosyncratic shock with constant variance
matrix and uncorrelated with the two factors.
Each factor $f_{kt}$ has zero conditional mean and conditional variance
like a GARCH(1,1) process.

A K-factor model can then be written as

$$\varepsilon_t = \Lambda f_t + e_t$$

where $\Lambda$ is a matrix of dimension $N \times K$ and $f_t$ is a $K \times 1$ vector. A
factor is observable if it is specified as a function of $\varepsilon_t$.

Variants of the factor model exist in the literature, e.g. Vrontos et al.
(2003), the full-factor multivariate GARCH model (FF-GARCH).

## Cat. 2: Linear Combinations: Orthogonal GARCH

Definition (O-GARCH(1,1,m))

Kariya (1988) and Alexander and Chibumba (1997): the $N \times N$
time-varying variance matrix $H_t$ is generated by $m \leq N$ univariate GARCH
models.

## Cat. 3: Nonlinear Combinations

This category contains multivariate models in which one can specify separately the
individual conditional variances on the one hand, and the conditional correlation
matrix or another measure of dependence between the individual series (like
the copula of the conditional joint density) on the other.

A hierarchical procedure:

1. Choose a GARCH-type model for each conditional variance (the model may vary
within the multivariate system);
2. Model the conditional correlation matrix (imposing positive definiteness
for any t).

For models of this category, theoretical results on stationarity,
ergodicity and moments may not be so straightforward.

Nonetheless, they are less greedy in parameters than other categories,
and therefore more easily estimable.

## CCC Model

Definition (CCC)

$$H_t = D_t R D_t = \left( \rho_{ij} \sqrt{h_{iit} h_{jjt}} \right)$$

where

$$D_t = \operatorname{diag}\left( h_{11t}^{1/2}, \ldots, h_{NNt}^{1/2} \right) \tag{7}$$

$h_{iit}$ can be defined as any univariate GARCH model, and $R = (\rho_{ij})$ is a
symmetric positive definite matrix with $\rho_{ii} = 1$, for any i.

$R$ is the matrix containing the constant conditional correlations $\rho_{ij}$. The
original CCC model has a GARCH(1,1) specification for each
conditional variance in $D_t$:

$$h_{iit} = \omega_i + \alpha_i \varepsilon_{i,t-1}^2 + \beta_i h_{ii,t-1}, \quad i = 1, \ldots, N$$

Thus the conditional covariances are proportional to the product of
the corresponding conditional standard deviations.
This restriction greatly reduces the number of unknown parameters
and thus simplifies the estimation (Bollerslev (1990)).

However, unconditional covariances are difficult to calculate.
He and Terasvirta (2002b) use a VEC-type formulation for
$(h_{11t}, h_{22t}, \ldots, h_{NNt})$ to allow for interactions between the conditional
variances. They call this the extended CCC model.
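The CCC construction $H_t = D_t R D_t$ can be sketched directly from the conditional variances and the constant correlation matrix. The function name and the numbers are hypothetical.

```python
import math


def ccc_covariance(h, r):
    """H_t = D_t R D_t with D_t = diag(sqrt(h_11t), ..., sqrt(h_NNt))."""
    sd = [math.sqrt(v) for v in h]  # conditional standard deviations
    n = len(h)
    return [[r[i][j] * sd[i] * sd[j] for j in range(n)] for i in range(n)]


# Conditional variances 4 and 9 with a constant correlation of 0.5.
h_t = ccc_covariance([4.0, 9.0], [[1.0, 0.5], [0.5, 1.0]])
```

The off-diagonal element is the correlation times the product of the two conditional standard deviations, here 0.5 x 2 x 3 = 3.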

## DCC Model

CCC: the assumption that the conditional correlations are constant is
unrealistic in many empirical applications.

DCC: generalization of the CCC by making the conditional
correlation matrix time-dependent (Christodoulakis and Satchell
(2002), Engle (2002) and Tse and Tsui (2002)).

An additional difficulty is that the time-dependent conditional
correlation matrix has to be positive definite for any t. (The DCC
models guarantee this under simple conditions on the parameters.)

The DCC models of Tse and Tsui (2002) and Engle (2002) are useful
when modelling high-dimensional data sets.

Definition (DCC model of Tse and Tsui (2002) or $DCC_T(M)$)

$$H_t = D_t R_t D_t$$

where $D_t$ is defined in (7), $h_{iit}$ can be defined as any univariate GARCH
model and

$$R_t = (1 - \theta_1 - \theta_2) R + \theta_1 \Psi_{t-1} + \theta_2 R_{t-1} \tag{8}$$

In (8), $\theta_1$ and $\theta_2$ are non-negative parameters satisfying $\theta_1 + \theta_2 < 1$, $R$ is a
symmetric $N \times N$ positive definite parameter matrix with $\rho_{ii} = 1$ and
$\Psi_{t-1}$ is the $N \times N$ correlation matrix of $\varepsilon_\tau$ for
$\tau = t - M, t - M + 1, \ldots, t - 1$. Its i, jth element is given by:

$$\psi_{ij,t-1} = \frac{\sum_{m=1}^{M} u_{i,t-m} u_{j,t-m}}{\sqrt{\left( \sum_{m=1}^{M} u_{i,t-m}^2 \right) \left( \sum_{m=1}^{M} u_{j,t-m}^2 \right)}} \tag{9}$$

where $u_{it} = \varepsilon_{it} / \sqrt{h_{iit}}$. The matrix $\Psi_{t-1}$ can be expressed as
$\Psi_{t-1} = B_{t-1}^{-1} L_{t-1} L_{t-1}' B_{t-1}^{-1}$, where $B_{t-1}$ is an $N \times N$ diagonal matrix with
ith diagonal element given by $\left( \sum_{h=1}^{M} u_{i,t-h}^2 \right)^{1/2}$ and
$L_{t-1} = (u_{t-1}, \ldots, u_{t-M})$ is an $N \times M$ matrix, with $u_t = (u_{1t}\, u_{2t} \ldots u_{Nt})'$.

A necessary condition to ensure the positivity of $\Psi_{t-1}$, and therefore
also of $R_t$, is that $M > N$. Then $R_t$ is itself a correlation matrix if
$R_{t-1}$ is also a correlation matrix (notice $\rho_{iit} = 1$ for any i).
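Equations (8) and (9) can be sketched as two small functions: one builds the rolling correlation matrix $\Psi_{t-1}$ from the last M standardised residual vectors, the other updates $R_t$. Function names, the window and the parameter values are hypothetical.

```python
import math


def rolling_correlation(u_window):
    """Psi_{t-1}: correlation matrix of the last M standardised residual vectors, eq. (9)."""
    n = len(u_window[0])
    cross = [[sum(u[i] * u[j] for u in u_window) for j in range(n)] for i in range(n)]
    return [[cross[i][j] / math.sqrt(cross[i][i] * cross[j][j]) for j in range(n)]
            for i in range(n)]


def dcc_t_update(r_bar, psi_prev, r_prev, theta1, theta2):
    """R_t = (1 - theta1 - theta2) R + theta1 Psi_{t-1} + theta2 R_{t-1}, eq. (8)."""
    n = len(r_bar)
    return [[(1.0 - theta1 - theta2) * r_bar[i][j] + theta1 * psi_prev[i][j]
             + theta2 * r_prev[i][j] for j in range(n)] for i in range(n)]


# Hypothetical inputs: M = 3 past vectors for N = 2 series.
window = [[1.0, 0.5], [-0.5, -1.0], [0.2, 0.4]]
r_bar = [[1.0, 0.3], [0.3, 1.0]]
psi = rolling_correlation(window)
r_t = dcc_t_update(r_bar, psi, r_bar, theta1=0.05, theta2=0.9)
```

Because $R$, $\Psi_{t-1}$ and $R_{t-1}$ all have unit diagonals and the weights sum to one, $R_t$ keeps a unit diagonal.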

## Cat. 3: Nonlinear Combinations: DCC Model

Definition (DCC model of Engle (2002) or $DCC_E(1,1)$)

$$H_t = D_t R_t D_t$$

where

$$R_t = \operatorname{diag}\left( q_{11,t}^{-1/2}, \ldots, q_{NN,t}^{-1/2} \right) Q_t \operatorname{diag}\left( q_{11,t}^{-1/2}, \ldots, q_{NN,t}^{-1/2} \right)$$

where the $N \times N$ symmetric positive definite matrix $Q_t = (q_{ij,t})$ is given
by:

$$Q_t = (1 - \alpha - \beta) \bar{Q} + \alpha u_{t-1} u_{t-1}' + \beta Q_{t-1}$$

with $u_t$ as in (9). $\bar{Q}$ is the $N \times N$ unconditional variance matrix of $u_t$, and
$\alpha$ and $\beta$ are non-negative scalar parameters satisfying $\alpha + \beta < 1$.

The elements of $\bar{Q}$ can be estimated or set to their empirical
counterpart to render the estimation even simpler (Engle and Sheppard, 2001).

Unlike in $DCC_T$, the $DCC_E$ model does not formulate the conditional
correlation as a weighted sum of past correlations.

For the explicit difference between $DCC_T(M)$ and $DCC_E(1,1)$, see the
conditional correlations in the bivariate case:

$$\rho_{12t} = (1 - \theta_1 - \theta_2) \rho_{12} + \theta_2 \rho_{12,t-1} + \theta_1 \frac{\sum_{m=1}^{M} u_{1,t-m} u_{2,t-m}}{\sqrt{\left( \sum_{m=1}^{M} u_{1,t-m}^2 \right) \left( \sum_{m=1}^{M} u_{2,t-m}^2 \right)}} \tag{10}$$

$$\rho_{12t} = \frac{(1 - \alpha - \beta) \bar{q}_{12} + \alpha u_{1,t-1} u_{2,t-1} + \beta q_{12,t-1}}{\sqrt{\left( (1 - \alpha - \beta) \bar{q}_{11} + \alpha u_{1,t-1}^2 + \beta q_{11,t-1} \right) \left( (1 - \alpha - \beta) \bar{q}_{22} + \alpha u_{2,t-1}^2 + \beta q_{22,t-1} \right)}} \tag{11}$$

For both $DCC_T$ and $DCC_E$ models, one can test $\theta_1 = \theta_2 = 0$ or
$\alpha = \beta = 0$, respectively, to check whether imposing constant
conditional correlations is empirically relevant.
A drawback of the DCC models is that $\theta_1$, $\theta_2$ in $DCC_T$ and $\alpha$, $\beta$ in
$DCC_E$ are scalars, so that all the conditional correlations obey the
same dynamics. This is however necessary to ensure that $R_t$ is
positive definite for any t through sufficient conditions on the
parameters.
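The $DCC_E(1,1)$ recursion and the rescaling of $Q_t$ into a correlation matrix can be sketched as follows. The function name and all inputs are hypothetical.

```python
import math


def dcc_e_update(q_bar, q_prev, u_prev, alpha, beta):
    """Return (Q_t, R_t) for one step of Engle's DCC(1,1)."""
    n = len(u_prev)
    q_t = [[(1.0 - alpha - beta) * q_bar[i][j] + alpha * u_prev[i] * u_prev[j]
            + beta * q_prev[i][j] for j in range(n)] for i in range(n)]
    # Rescale Q_t by its own diagonal so that R_t has ones on the diagonal.
    r_t = [[q_t[i][j] / math.sqrt(q_t[i][i] * q_t[j][j]) for j in range(n)]
           for i in range(n)]
    return q_t, r_t


# Hypothetical inputs, for illustration only.
q_bar = [[1.0, 0.3], [0.3, 1.0]]
q_t, r_t = dcc_e_update(q_bar, q_bar, [1.5, -0.5], alpha=0.05, beta=0.9)
```

The off-diagonal element of `r_t` is exactly the bivariate expression (11): a ratio of smoothed quantities rather than a weighted sum of past correlations.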

## Why DCC

DCC models can be estimated consistently in two steps, which makes
this approach feasible when N is high.
Of course, when N is large, the restriction of common dynamics gets
tighter, but for large N the problem of maintaining tractability also
gets harder. In this respect, several variants of the DCC model are
proposed in the literature.

Billio et al. (2003) argue that constraining the dynamics of the
conditional correlation matrix to be the same for all the correlations is
not desirable.

Pelletier (2003) proposes a model where the conditional correlations
follow a switching regime driven by an unobserved Markov chain so
that the correlation matrix is constant in each regime but may vary
across regimes.

DCC models open the door to using flexible GARCH specifications in
the variance part.

As the conditional variances (together with the conditional means)
can be estimated using N univariate models, one can easily extend the
DCC-GARCH models to more complex GARCH-type structures.

## General Dynamic Covariance (GDC) Model

A model somewhat different from the previous ones but that nests
several of them is the general dynamic covariance (GDC) model
proposed by Kroner and Ng (1998).

Definition (GDC Model)

$$H_t = D_t R_t D_t + \Phi \odot \Theta_t \tag{12}$$

where

$D_t = (d_{ijt})$, $d_{iit} = \sqrt{\theta_{iit}}$ for any i, $d_{ijt} = 0$ for $i \neq j$, and $\Theta_t = (\theta_{ijt})$;
$R_t$ is specified as $DCC_T(M)$ or $DCC_E(1,1)$;
$\Phi = (\phi_{ij})$, $\phi_{ii} = 0$, $\phi_{ij} = \phi_{ji}$;

$$\theta_{ijt} = \omega_{ij} + \alpha_i' \varepsilon_{t-1}\varepsilon_{t-1}' \alpha_j + g_i' H_{t-1} g_j, \text{ for any } i, j \tag{13}$$

$\alpha_i$, $g_i$, $i = 1, \ldots, N$ are $N \times 1$ vectors of parameters, and $\Omega = (\omega_{ij})$ is
positive definite and symmetric.

Elementwise we have,

$$h_{iit} = \theta_{iit}, \text{ for any } i; \qquad h_{ijt} = \rho_{ijt} \sqrt{\theta_{iit}\theta_{jjt}} + \phi_{ij}\theta_{ijt}, \text{ for } i \neq j \tag{14}$$

where the $\theta_{ijt}$ are given by the BEKK formulation in (13).
The GDC model contains several MGARCH models as special cases.

## Copula-MGARCH Models

Any N-dimensional joint distribution function may be decomposed
into its N marginal distributions, and a copula function that
completely describes the dependence between the N variables (Sklar
(1959), Nelsen (1999), Patton (2000), Jondeau and Rockinger
(2001)).
Standard copula-GARCH:

1. GARCH for conditional variances;
2. marginal distributions for each series;
3. a conditional copula function.

Several papers point to the need to allow for time-variation in the conditional copula,
extending the DCC models to other specifications of the conditional
dependence, so that the copula function is rendered time-varying
through its parameters, which can be functions of past data.
These models can be estimated using a two-step maximum likelihood approach.
A feature of copula-GARCH models is the ease with which very flexible
joint distributions may be obtained in the bivariate case.
Their application to higher dimensions is a subject for further
research.

## Estimation Issues: Two-Step MLE

Two-step maximum likelihood estimation: Engle and Sheppard
(2001) show that the loglikelihood can be written as the sum of a
mean and volatility part (depending on one set of unknown parameters)
and a correlation part (depending on a second set).
Therefore the two sets of coefficients can be estimated separately, giving, say, $\hat{\theta}_1$, $\hat{\theta}_2$.
But maximizing the two parts separately is not fully efficient since the resulting estimators are
limited information estimators. However, one iteration of a
Newton-Raphson algorithm applied to the total likelihood starting at
$(\hat{\theta}_1, \hat{\theta}_2)$ provides an estimator that is asymptotically efficient.

## Estimation Issues: Variance Targeting

MGARCH models are difficult: too many parameters!
A simple trick to ensure a reasonable value of the model-implied
unconditional covariance matrix, which also helps to reduce the
number of parameters in the maximization of the likelihood function,
is referred to as VTE by Engle and Mezrich (1996).

VTE:

1. Re-parameterization of the model and the estimation of the
unconditional variance;
2. QML estimation of the remaining parameters.

Merits: when the model is misspecified, the VTE can be superior to
the QMLE for long-term prediction.

## Diagnostic Checking

It is desirable to check,

1. ex ante: whether the data present evidence of multivariate ARCH
effects;
2. ex post: the adequacy of the MGARCH specification.

Since the dynamics of the series is assumed to be captured by the
model (at least in the first two conditional moments), the
standardized error term $z_t = H_t^{-1/2} \varepsilon_t$ should obey the following
moment conditions (Ding and Engle, 2001):

1. $E(z_t z_t') = I_N$;
2. $\operatorname{Cov}(z_{it}^2, z_{jt}^2) = 0$, for all $i \neq j$;
3. $\operatorname{Cov}(z_{it}^2, z_{j,t-k}^2) = 0$, for $k > 0$.

Testing 1 helps to detect misspecification in the conditional mean;

Testing 2 is suited to check if the conditional distribution is Gaussian,
which could be false even if $H_t$ is correctly specified.

Testing 3 aims at checking the adequacy of the dynamic specification
of $H_t$, regardless of the validity of the assumption about the
distribution of $z_t$.
Following Tse (2002), diagnostics for conditional heteroscedasticity models
applied in the literature can be divided into three categories:
portmanteau tests, residual-based diagnostics and Lagrange multiplier
tests.

## Topic 7: Multivariate Models

Introductory Financial Econometrics
Topic 7 Multivariate Models
3 Credits, 51 Hours
Jianhua Gang
School of Finance
Renmin University of China

Spring 2013

## Simultaneous Equations Models: The Structural Form

All the models we have looked at thus far have been single equation
models of the form

$$y = X\beta + u$$

in which the variables in $X$ are assumed to be EXOGENOUS and
$y$ is an ENDOGENOUS variable.
An example from economics to illustrate: the demand and supply of a
good:

$$Q_{dt} = \alpha + \beta P_t + \gamma S_t + u_t \tag{1}$$
$$Q_{st} = \lambda + \mu P_t + \kappa T_t + v_t \tag{2}$$
$$Q_{dt} = Q_{st} \tag{3}$$

in which $P_t$ is the price, $Q_{dt}$ and $Q_{st}$ are the quantities demanded and supplied,
$S_t$ = price of a substitute good; $T_t$ = some variable
embodying the state of technology.

Assuming that the market always clears, and dropping the time
subscripts for simplicity:

$$Q = \alpha + \beta P + \gamma S + u \tag{4}$$
$$Q = \lambda + \mu P + \kappa T + v \tag{5}$$

This is a simultaneous STRUCTURAL FORM of the model.
The point is that price and quantity are determined simultaneously
(price affects quantity and quantity affects price). So, P and Q are
endogenous variables, while S and T are exogenous.
We can obtain REDUCED FORM equations corresponding to (4)
and (5) by solving equations (4) and (5) for P and for Q (separately).

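## Obtaining the Reduced Form

Solving for Q (setting the right-hand sides of (4) and (5) equal),

$$\alpha + \beta P + \gamma S + u = \lambda + \mu P + \kappa T + v \tag{6}$$

Solving for P (solving each of (4) and (5) for P and equating),

$$\frac{Q}{\beta} - \frac{\alpha}{\beta} - \frac{\gamma S}{\beta} - \frac{u}{\beta} = \frac{Q}{\mu} - \frac{\lambda}{\mu} - \frac{\kappa T}{\mu} - \frac{v}{\mu} \tag{7}$$

Re-arranging (6):

$$P = \frac{\lambda - \alpha}{\beta - \mu} + \frac{\kappa}{\beta - \mu} T - \frac{\gamma}{\beta - \mu} S + \frac{v - u}{\beta - \mu} \tag{8}$$

Q can be ultimately calculated by multiplying (7) through with $\beta\mu$:

$$Q = \frac{\mu\alpha - \beta\lambda}{\mu - \beta} - \frac{\beta\kappa}{\mu - \beta} T + \frac{\mu\gamma}{\mu - \beta} S + \frac{\mu u - \beta v}{\mu - \beta} \tag{9}$$

(8) and (9) are the reduced form equations for P and Q.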
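The reduced form equations (8) and (9) can be checked numerically: the price and quantity they produce should satisfy both structural equations (4) and (5). The function name and all parameter values below are hypothetical.

```python
def reduced_form(alpha, beta, gamma, lam, mu, kappa, S, T, u, v):
    """Equilibrium price and quantity from the reduced form equations (8) and (9)."""
    P = ((lam - alpha) + kappa * T - gamma * S + (v - u)) / (beta - mu)
    Q = ((mu * alpha - beta * lam) - beta * kappa * T + mu * gamma * S
         + (mu * u - beta * v)) / (mu - beta)
    return P, Q


# Hypothetical structural parameters, exogenous values and shocks.
P, Q = reduced_form(alpha=10.0, beta=-1.0, gamma=0.5, lam=2.0, mu=1.0, kappa=0.8,
                    S=3.0, T=4.0, u=0.2, v=-0.1)
demand_quantity = 10.0 + (-1.0) * P + 0.5 * 3.0 + 0.2    # equation (4)
supply_quantity = 2.0 + 1.0 * P + 0.8 * 4.0 + (-0.1)     # equation (5)
```

Both structural equations return the same quantity as (9), so the market clears at the reduced form price.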

## Simultaneous Equations Bias

But what would happen if we had estimated equations (4) and (5),
i.e. the structural form equations, separately using OLS?
Both equations depend on P. One of the CLRM assumptions was that
$E(X'u) = 0$, where X is a matrix containing all the variables on the
R.H.S. of the equation.
It is clear from (8) that P is related to the errors in (4) and (5), i.e. it is stochastic.
Hence the OLS estimate of the coefficient on P is biased (since
$E(X'u) \neq 0$ in general).

Conclusion: application of OLS to structural equations which are part
of a simultaneous system will lead to biased coefficient estimates.
Is the OLS estimator still consistent, even though it is biased?
No. In fact the estimator is inconsistent as well.
Hence it would NOT be possible to estimate equations (4) and (5)
validly using OLS.
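The inconsistency of OLS can be seen in a small simulation. This is a sketch under assumed values: the demand equation has no substitute-price term ($\gamma = 0$), all shocks and $T$ are independent standard normal draws, and the structural parameters are hypothetical. Under these assumptions the OLS slope converges to about $-1/3$ rather than the true demand slope $\beta = -1$.

```python
import random


def ols_slope(x, y):
    """Slope of a simple regression of y on x (with intercept)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_market(n, seed=0):
    """Draw equilibrium prices and quantities from the structural model."""
    rng = random.Random(seed)
    alpha, beta = 10.0, -1.0          # demand: Q = alpha + beta * P + u
    lam, mu, kappa = 2.0, 1.0, 1.0    # supply: Q = lam + mu * P + kappa * T + v
    prices, quantities = [], []
    for _ in range(n):
        T, u, v = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        P = (lam - alpha + kappa * T + v - u) / (beta - mu)  # reduced form (8)
        prices.append(P)
        quantities.append(alpha + beta * P + u)
    return prices, quantities


prices, quantities = simulate_market(20000)
ols_estimate = ols_slope(prices, quantities)  # estimate of the demand slope beta
```

Increasing the sample size does not move the estimate towards $-1$, which is what inconsistency means here.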

Taking equations (8) and (9), we can rewrite them as

$$P = \pi_{10} + \pi_{11} T + \pi_{12} S + \varepsilon_1 \tag{10}$$
$$Q = \pi_{20} + \pi_{21} T + \pi_{22} S + \varepsilon_2 \tag{11}$$

We CAN estimate equations (10) and (11) using OLS since all the
R.H.S. variables are exogenous.
But ... we probably don't care what the values of the $\pi$ coefficients
are; what we wanted were the original parameters in the structural
equations: $\alpha$, $\beta$, $\gamma$, $\lambda$, $\mu$, $\kappa$.

## Identification

Problem:
As well as simultaneity, we sometimes encounter another problem:
identification. Consider the following demand and supply equations

$$Q = \alpha + \beta P \tag{12}$$
$$Q = \lambda + \mu P \tag{13}$$

We cannot tell which is which! (From the OLS point of view, the two equations are the same in nature.)

Both equations (12) and (13) are UNIDENTIFIED, or NOT
IDENTIFIED, or UNDERIDENTIFIED.

We do not have enough information from the equations to estimate
four parameters. Notice that we would not have had this problem with
equations (4) and (5) since they have different exogenous variables.

If an equation (model) is identified, in general its coefficients can be
estimated. The appropriate estimation technique will depend upon
whether it is exactly identified or over-identified.

## What Determines the Identification?

We could have three possible situations:

1. An equation is unidentified,
like (12) and (13):
we cannot get the structural coefficients from the reduced form
estimates.
2. An equation is exactly identified,
e.g. (4) or (5):
we can get unique structural form coefficient estimates.
3. An equation is over-identified
(examples given later):
more than one set of structural coefficients could be obtained from
the reduced form.

How do we tell if an equation is identified or not?
There are two conditions we could look at:

The order condition is a necessary but not sufficient condition for an
equation to be identified.
The rank condition is a necessary and sufficient condition for
identification. We specify the structural equations in a matrix
form and consider the rank of a coefficient matrix.

## The Order Condition

Definition:
Statement of the Order Condition (from Ramanathan 1995, p. 666).
Let G denote the number of structural equations. An equation is just
identified if the number of variables excluded from an equation is G-1.
If more than G-1 are absent, it is over-identified. If less than G-1 are
absent, it is not identified.

Example:
In the following system of equations, the Ys are endogenous, while the Xs
are exogenous. Determine whether each equation is over-, under-, or
just-identified.

$$Y_1 = \alpha_0 + \alpha_1 Y_2 + \alpha_2 Y_3 + \alpha_3 X_1 + \alpha_4 X_2 + u_1 \tag{14}$$
$$Y_2 = \beta_0 + \beta_1 Y_3 + \beta_2 X_1 + u_2 \tag{15}$$
$$Y_3 = \gamma_0 + \gamma_1 Y_2 + u_3 \tag{16}$$

Solution:
G = 3.
If # excluded variables = 2, the eq. is just identified
If # excluded variables > 2, the eq. is over-identified
If # excluded variables < 2, the eq. is not identified
Hence,
Equation 14: Not identified
Equation 15: Just identified
Equation 16: Over-identified
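The counting rule in the solution above can be sketched as a small function. The function name and the way each equation is described (by the list of variables it includes) are choices made for this illustration.

```python
def order_condition(all_variables, included, n_equations):
    """Classify an equation by the number of variables it excludes."""
    excluded = len(set(all_variables) - set(included))
    if excluded == n_equations - 1:
        return "just identified"
    if excluded > n_equations - 1:
        return "over-identified"
    return "not identified"


variables = ["Y1", "Y2", "Y3", "X1", "X2"]
system = {
    "(14)": ["Y1", "Y2", "Y3", "X1", "X2"],
    "(15)": ["Y2", "Y3", "X1"],
    "(16)": ["Y3", "Y2"],
}
results = {eq: order_condition(variables, inc, n_equations=3)
           for eq, inc in system.items()}
```

Since the order condition is only necessary, a verdict of "just identified" or "over-identified" here still has to be confirmed with the rank condition.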

## The Rank Condition

In a system of G equations any particular equation is identified iff it is
possible to construct at least one non-zero determinant of the order
(G-1) from the coefficients excluded from that particular equation but
contained in other equations of the model.
Or: a sufficient condition for the identification of a relationship is that
the rank of the matrix of parameters of all the excluded variables
(endogenous and pre-determined) from that equation be equal to
(G-1).

For example:

$$y_1 = 3y_2 - 2x_1 + x_2 + u_1$$
$$y_2 = y_3 + x_3 + u_2$$
$$y_3 = y_1 - y_2 - 2x_3 + u_3$$

Moving everything to one side:

$$-y_1 + 3y_2 + 0y_3 - 2x_1 + x_2 + 0x_3 + u_1 = 0$$
$$0y_1 - y_2 + y_3 + 0x_1 + 0x_2 + x_3 + u_2 = 0$$
$$y_1 - y_2 - y_3 + 0x_1 + 0x_2 - 2x_3 + u_3 = 0$$

Results:

1. RC/OC: Equation 1 is exactly identified;
2. RC: Equation 2 is identified, OC: over-identified;
3. RC: Equation 3 is not identified, OC: exactly identified.
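The rank condition for the example above can be checked mechanically: for each equation, take the columns of variables it excludes, keep the rows of the other equations, and compute the rank. The helper names are hypothetical; exact fractions are used to avoid rounding issues.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank by Gaussian elimination with exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def rank_condition(coeffs, eq):
    """True if equation `eq` (0-based) satisfies the rank condition."""
    g = len(coeffs)
    excluded_cols = [j for j, c in enumerate(coeffs[eq]) if c == 0]
    sub = [[coeffs[i][j] for j in excluded_cols] for i in range(g) if i != eq]
    return bool(excluded_cols) and matrix_rank(sub) == g - 1


# Columns: y1, y2, y3, x1, x2, x3 (coefficients of the rewritten system).
coeffs = [[-1, 3, 0, -2, 1, 0],
          [0, -1, 1, 0, 0, 1],
          [1, -1, -1, 0, 0, -2]]
identified = [rank_condition(coeffs, i) for i in range(3)]
```

Equation 3 fails because the only excluded variables, $x_1$ and $x_2$, have zero coefficients in equation 2, so the excluded-coefficient matrix has rank 1.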

## Test for Exogeneity: Hausman Test

How do we tell whether variables really need to be treated as
endogenous or not?
Consider again equations (14)-(16). Equation (14) contains $Y_2$ and
$Y_3$, but do we really need equations for them?
We can, however, formally test this using a Hausman test, which is
calculated as follows:

1. Obtain the reduced form equations corresponding to equations
(14)-(16). The reduced forms turn out to be:

$$Y_1 = \pi_{10} + \pi_{11} X_1 + \pi_{12} X_2 + v_1 \tag{17}$$
$$Y_2 = \pi_{20} + \pi_{21} X_1 + v_2 \tag{18}$$
$$Y_3 = \pi_{30} + \pi_{31} X_1 + v_3 \tag{19}$$

Estimate the reduced form equations (17)-(19) using OLS, and obtain
the fitted values: $\hat{Y}_1$, $\hat{Y}_2$, $\hat{Y}_3$.

2. Run the regression corresponding to equation (14).

3. Run the regression (14) again, but now also including the fitted values
$\hat{Y}_2$, $\hat{Y}_3$ as additional regressors:

$$Y_1 = \alpha_0 + \alpha_1 Y_2 + \alpha_2 Y_3 + \alpha_3 X_1 + \alpha_4 X_2 + \lambda_2 \hat{Y}_2 + \lambda_3 \hat{Y}_3 + u_1 \tag{20}$$

4. Use an F-test to test the joint restriction that $H_0: \lambda_2 = \lambda_3 = 0$. If
the null hypothesis is rejected, $Y_2$ and $Y_3$ should be treated as
endogenous.

## Recursive Systems

Consider the following system of equations:

$$Y_1 = \beta_{10} + \gamma_{11} X_1 + \gamma_{12} X_2 + u_1 \tag{21}$$
$$Y_2 = \beta_{20} + \beta_{21} Y_1 + \gamma_{21} X_1 + \gamma_{22} X_2 + u_2 \tag{22}$$
$$Y_3 = \beta_{30} + \beta_{31} Y_1 + \beta_{32} Y_2 + \gamma_{31} X_1 + \gamma_{32} X_2 + u_3 \tag{23}$$

Problem:
Assume that the error terms are not correlated with each other. Can we
estimate the equations individually using OLS?
(21) contains no endogenous variables on the R.H.S., and $X_1$ and $X_2$ are NOT
correlated with $u_1$. So we can use OLS on (21).
(22) contains endogenous variable $Y_1$. We can use OLS on (22) if all
the R.H.S. variables are uncorrelated with the error $u_2$ (True!). In
fact, $Y_1$ is not correlated with $u_2$ because there is no $Y_2$ term in
equation (21). So we can use OLS on (22).
Equation (23) contains both $Y_1$ and $Y_2$; we require these to be
uncorrelated with $u_3$. By similar arguments to the above, equations
(21) and (22) do not contain $Y_3$, so we can use OLS on (23).
This is known as a RECURSIVE or TRIANGULAR system. We do not
have a simultaneity problem here.

But in practice not many systems of equations will be recursive...

## Indirect Least Squares (ILS)

We cannot use OLS on structural equations, but we can validly apply it
to the reduced form equations.

Definition:
If the system is just identified, ILS involves estimating the reduced form
equations using OLS, and then using them to substitute back to obtain
the structural parameters.

However, ILS is not used much because:

1. Solving back to get the structural parameters can be tedious.
2. Most simultaneous equations systems are over-identified.

## In fact, we can use this technique for just-identified and

over-identified systems.

## In fact, we can use this technique for just-identified and

over-identified systems.

## Two stage least squares (2SLS or TSLS) is done in two stages:

Obtain and estimate the reduced form equations using OLS. Save the
fitted values for the dependent variables.

## I NTRODUCTORY F INANCIAL E CONOMETRICS Topic 7 Multivariate

S PRING
Models2013

27 / 52

26 / 52

Obtain and estimate the reduced form equations using OLS. Save the
fitted values for the dependent variables.
Estimate the structural equations, but replace any R.H.S. endogenous
variables with their Stage 1 fitted values.

S PRING
Models2013

27 / 52

Stage 1: Estimate the reduced form equations (17)-(19) individually by OLS and obtain the fitted values, $\hat{Y}_1, \hat{Y}_2, \hat{Y}_3$.
Stage 2: Replace the R.H.S. endogenous variables with their Stage 1 estimated values:

$$Y_1 = \alpha_0 + \alpha_1 \hat{Y}_2 + \alpha_2 \hat{Y}_3 + \alpha_3 X_1 + \alpha_4 X_2 + u_1 \quad (24)$$
$$Y_2 = \beta_0 + \beta_1 \hat{Y}_3 + \beta_2 X_1 + u_2 \quad (25)$$
$$Y_3 = \gamma_0 + \gamma_1 \hat{Y}_2 + u_3 \quad (26)$$

Now $\hat{Y}_2$ and $\hat{Y}_3$ will not be correlated with u1, $\hat{Y}_3$ will not be correlated with u2, and $\hat{Y}_2$ will not be correlated with u3.
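The two stages can be sketched numerically. The following is a minimal illustration on simulated data, not the system (14)-(16) itself: the data-generating process, the coefficient values and the names `x1`, `x2`, `common`, `b_2sls`, `b_ols` are all assumptions made for the example. It shows OLS on the structural equation being pulled away from the true coefficient, while the Stage 2 regression on fitted values recovers it.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients for y = X b + e."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(0)
n = 5000
x1 = rng.normal(size=n)       # exogenous, included in the structural equation
x2 = rng.normal(size=n)       # exogenous, excluded from it (hypothetical instrument)
common = rng.normal(size=n)   # shared shock that makes y2 endogenous
u1 = common + rng.normal(size=n)

# Hypothetical data-generating process; the true coefficient on y2 is 0.7
y2 = 1.0 + 0.5 * x1 + 1.0 * x2 + common + rng.normal(size=n)
y1 = 2.0 + 0.7 * y2 + 0.3 * x1 + u1

const = np.ones(n)

# Stage 1: reduced form for y2 on all exogenous variables, keep fitted values
Z = np.column_stack([const, x1, x2])
y2_hat = Z @ ols(Z, y2)

# Stage 2: structural equation with y2 replaced by its Stage 1 fitted values
b_2sls = ols(np.column_stack([const, y2_hat, x1]), y1)

# For comparison: OLS applied directly to the structural equation
b_ols = ols(np.column_stack([const, y2, x1]), y1)

print("2SLS coefficient on y2:", b_2sls[1])
print("OLS  coefficient on y2:", b_ols[1])
```

The standard errors from a plain Stage 2 OLS run are not the correct 2SLS standard errors, so this sketch reports point estimates only.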


The standard error estimates also need to be modified compared with their OLS counterparts, but once this has been done, we can use the usual t- and F-tests to test hypotheses about the structural form coefficients.
It is still of concern in the context of simultaneous systems whether the CLRM assumptions are supported by the data.
If the disturbances in the structural equations are autocorrelated, the 2SLS estimator is not even consistent.

## INSTRUMENTAL VARIABLES (IV)

Recall that the reason we cannot use OLS directly on the structural equations is that the endogenous variables are correlated with the errors.
One solution: abandon Y2 and Y3 and use some other variables instead.
We want these other variables to be (highly) correlated with Y2 and Y3, but not correlated with the errors: these are the INSTRUMENTS.
Say we have suitable instruments for Y2 and Y3, z2 and z3 respectively. We do not use the instruments directly, but run regressions of the form:

$$Y_2 = \lambda_1 + \lambda_2 z_2 + \varepsilon_1 \quad (27)$$
$$Y_3 = \lambda_3 + \lambda_4 z_3 + \varepsilon_2 \quad (28)$$

Obtain the fitted values from (27) and (28), $\hat{Y}_2$ and $\hat{Y}_3$, and replace Y2 and Y3 with these in the structural equation.
We do not use the instruments directly in the structural equation.
It is typical to use more than one instrument per endogenous variable.
If the instruments are the variables in the reduced form equations, then IV is equivalent to 2SLS.

## What happens if we use IV/2SLS unnecessarily?

The coefficient estimates will still be consistent, but will be inefficient compared to those that just used OLS directly.

The Problem With IV:

PROBLEM
What are the instruments?

SOLUTION
2SLS is easier.

Other Estimation Techniques:
1. 3SLS: allows for non-zero covariances between the error terms.
2. LIML (Limited Information ML): estimating reduced form equations by maximum likelihood.
3. FIML (Full Information ML): estimating all the equations simultaneously using maximum likelihood.

## TOPIC 7 MULTIVARIATE MODELS

VAR: a natural generalisation of autoregressive models. It is a systems regression model, i.e. there is more than one dependent variable.

$$y_{1t} = \beta_{10} + \beta_{11} y_{1t-1} + \alpha_{11} y_{2t-1} + u_{1t}$$
$$y_{2t} = \beta_{20} + \beta_{21} y_{2t-1} + \alpha_{21} y_{1t-1} + u_{2t}$$

VAR(1):

$$\begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix} = \begin{pmatrix} \beta_{10} \\ \beta_{20} \end{pmatrix} + \begin{pmatrix} \beta_{11} & \alpha_{11} \\ \alpha_{21} & \beta_{21} \end{pmatrix} \begin{pmatrix} y_{1t-1} \\ y_{2t-1} \end{pmatrix} + \begin{pmatrix} u_{1t} \\ u_{2t} \end{pmatrix}$$

or

$$Y_t = \beta_0 + \beta_1 Y_{t-1} + u_t$$

No need to specify endogeneity/exogeneity: all are (weakly) endogenous!
Allows a variable to depend on more than just its own lags or combinations of white noise terms, so more general than ARMA modelling.
Provided that there are no contemporaneous terms on the R.H.S. of the equations, we can simply use OLS separately on each equation.
Forecasts are often better than traditional structural models.
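Equation-by-equation OLS for a VAR(1) can be sketched as follows. This is an illustration on simulated data under assumed parameter values (`c`, `A` and the sample size are hypothetical); because every equation has the same regressors, a single least-squares call with a two-column dependent variable gives the same coefficients as running OLS on each equation separately.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 2000
c = np.array([0.1, -0.2])        # hypothetical intercepts
A = np.array([[0.5, 0.1],
              [0.2, 0.3]])       # hypothetical lag-coefficient matrix (stable)

# Simulate Y_t = c + A Y_{t-1} + u_t with standard normal errors
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = c + A @ y[t - 1] + rng.normal(size=2)

# Regressors: a constant and the first lag of both variables
X = np.column_stack([np.ones(T - 1), y[:-1]])
Y = y[1:]

# One least-squares call; each column of coef is one equation's OLS estimate
coef = np.linalg.lstsq(X, Y, rcond=None)[0]   # shape (3, 2)
c_hat = coef[0]
A_hat = coef[1:].T                            # row i holds equation i's lag coefficients

print("estimated intercepts:", c_hat)
print("estimated lag matrix:\n", A_hat)
```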

## Problems with VARs

1. VARs are a-theoretical (as are ARMA models). What if the true process is not a VAR?
2. How to decide the appropriate lag length?
3. So many parameters to estimate!
4. Do we need to ensure all components of the VAR are stationary?
5. How do we interpret the coefficients?

## CHOOSING THE OPTIMAL LAG LENGTH FOR A VAR (CROSS-EQUATION RESTRICTIONS)

In the spirit of (unrestricted) VAR modelling, each equation should have the same lag length.
Suppose that we have a bivariate VAR(8), and we want to examine a restriction that the coefficients on lags 5 through 8 are jointly zero (de facto, $H_0$: VAR(4) against $H_A$: VAR(8)).
This can be done using a likelihood ratio (LR) test.

Denote the variance-covariance matrix of residuals (given by $\hat{u}\hat{u}'/T$) as $\hat{\Sigma}$. The LR test statistic for this joint hypothesis is:

$$LR = T\left[\ln|\hat{\Sigma}_r| - \ln|\hat{\Sigma}_u|\right]$$

where $\hat{\Sigma}_r$ and $\hat{\Sigma}_u$ are the residual variance-covariance matrices of the restricted and unrestricted models. LR is asymptotically distributed as

$$\chi^2(q), \quad q = \text{number of restrictions}$$

In our case above we restrict 4 lags of two variables in each of the two equations: 4 × 2 × 2 = 16 restrictions.

PROBLEM
Conducting the LR test is cumbersome and requires a normality assumption for the disturbances.
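The LR statistic can be sketched in a few lines. The helper names `var_resid_cov` and `lr_stat`, the simulated VAR(1) and its parameter values are assumptions for illustration; both models are fitted on the same sample so that the two determinants are comparable.

```python
import numpy as np

def var_resid_cov(y, p, start):
    """Fit a VAR(p) by OLS on observations start..T-1; return residual covariance."""
    T = y.shape[0]
    Y = y[start:]
    cols = [np.ones((T - start, 1))]
    for lag in range(1, p + 1):
        cols.append(y[start - lag:T - lag])
    X = np.hstack(cols)
    resid = Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]
    return resid.T @ resid / Y.shape[0]

def lr_stat(y, p_restricted, p_unrestricted):
    """LR = T [ln|Sigma_r| - ln|Sigma_u|] and the number of restrictions q."""
    start = p_unrestricted                  # common estimation sample
    T_eff = y.shape[0] - start
    sig_r = var_resid_cov(y, p_restricted, start)
    sig_u = var_resid_cov(y, p_unrestricted, start)
    lr = T_eff * (np.linalg.slogdet(sig_r)[1] - np.linalg.slogdet(sig_u)[1])
    n = y.shape[1]
    q = (p_unrestricted - p_restricted) * n * n
    return lr, q

# Hypothetical bivariate VAR(1) data
rng = np.random.default_rng(2)
A = np.array([[0.5, 0.1], [0.2, 0.3]])
y = np.zeros((500, 2))
for t in range(1, 500):
    y[t] = A @ y[t - 1] + rng.normal(size=2)

lr_12, q_12 = lr_stat(y, 1, 2)   # H0: VAR(1) against VAR(2)
lr_01, q_01 = lr_stat(y, 0, 1)   # H0: no lags against VAR(1)
print(lr_12, q_12, lr_01, q_01)
```

For the bivariate VAR(4) against VAR(8) case in the text, the same formula for `q` gives (8 - 4) × 2 × 2 = 16.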

## CHOOSING THE OPTIMAL LAG LENGTH FOR A VAR (INFORMATION CRITERIA)

Multivariate versions of the information criteria:

$$MAIC = \ln|\hat{\Sigma}| + 2k/T$$
$$MSBIC = \ln|\hat{\Sigma}| + \frac{k}{T}\ln(T)$$
$$MHQIC = \ln|\hat{\Sigma}| + \frac{2k}{T}\ln(\ln T)$$

k is the total number of regressors in all equations.
The values of the information criteria are constructed for 0, 1, ... lags (up to some pre-specified maximum $k_{max}$).

Now suppose the VAR contains contemporaneous terms:

$$y_{1t} = \beta_{10} + \beta_{11} y_{1t-1} + \alpha_{11} y_{2t-1} + \alpha_{12} y_{2t} + u_{1t}$$
$$y_{2t} = \beta_{20} + \beta_{21} y_{2t-1} + \alpha_{21} y_{1t-1} + \alpha_{22} y_{1t} + u_{2t}$$

in compact form:

$$\begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix} = \begin{pmatrix} \beta_{10} \\ \beta_{20} \end{pmatrix} + \begin{pmatrix} \beta_{11} & \alpha_{11} \\ \alpha_{21} & \beta_{21} \end{pmatrix} \begin{pmatrix} y_{1t-1} \\ y_{2t-1} \end{pmatrix} + \begin{pmatrix} \alpha_{12} & 0 \\ 0 & \alpha_{22} \end{pmatrix} \begin{pmatrix} y_{2t} \\ y_{1t} \end{pmatrix} + \begin{pmatrix} u_{1t} \\ u_{2t} \end{pmatrix}$$

We can take the contemporaneous terms over to the L.H.S. and write

$$\begin{pmatrix} 1 & -\alpha_{12} \\ -\alpha_{22} & 1 \end{pmatrix} \begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix} = \begin{pmatrix} \beta_{10} \\ \beta_{20} \end{pmatrix} + \begin{pmatrix} \beta_{11} & \alpha_{11} \\ \alpha_{21} & \beta_{21} \end{pmatrix} \begin{pmatrix} y_{1t-1} \\ y_{2t-1} \end{pmatrix} + \begin{pmatrix} u_{1t} \\ u_{2t} \end{pmatrix}$$

or

$$B Y_t = \beta_0 + \beta_1 Y_{t-1} + u_t$$

We then pre-multiply both sides by $B^{-1}$:

$$Y_t = B^{-1}\beta_0 + B^{-1}\beta_1 Y_{t-1} + B^{-1}u_t$$

or

$$Y_t = A_0 + A_1 Y_{t-1} + e_t$$

This is known as a standard form VAR, which we can estimate using ML as before.

## BLOCK SIGNIFICANCE AND CAUSALITY TESTS

It is likely that, when a VAR includes many lags of variables, it will be difficult to see which sets of variables have significant effects on each dependent variable and which do not. For illustration, consider the following bivariate VAR(3):

$$\begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix} = \begin{pmatrix} \alpha_{10} \\ \alpha_{20} \end{pmatrix} + \begin{pmatrix} \beta_{11} & \beta_{12} \\ \beta_{21} & \beta_{22} \end{pmatrix} \begin{pmatrix} y_{1t-1} \\ y_{2t-1} \end{pmatrix} + \begin{pmatrix} \gamma_{11} & \gamma_{12} \\ \gamma_{21} & \gamma_{22} \end{pmatrix} \begin{pmatrix} y_{1t-2} \\ y_{2t-2} \end{pmatrix} + \begin{pmatrix} \delta_{11} & \delta_{12} \\ \delta_{21} & \delta_{22} \end{pmatrix} \begin{pmatrix} y_{1t-3} \\ y_{2t-3} \end{pmatrix} + \begin{pmatrix} u_{1t} \\ u_{2t} \end{pmatrix}$$

We might be interested in testing the following hypotheses, and their implied restrictions on the parameter matrices:

| Hypothesis | Implied Restriction |
|---|---|
| 1. Lags of $y_{1t}$ do not explain current $y_{2t}$ | $\beta_{21} = 0$ and $\gamma_{21} = 0$ and $\delta_{21} = 0$ |
| 2. Lags of $y_{1t}$ do not explain current $y_{1t}$ | $\beta_{11} = 0$ and $\gamma_{11} = 0$ and $\delta_{11} = 0$ |
| 3. Lags of $y_{2t}$ do not explain current $y_{1t}$ | $\beta_{12} = 0$ and $\gamma_{12} = 0$ and $\delta_{12} = 0$ |
| 4. Lags of $y_{2t}$ do not explain current $y_{2t}$ | $\beta_{22} = 0$ and $\gamma_{22} = 0$ and $\delta_{22} = 0$ |

Each of these four joint hypotheses can be tested within the F-test framework, since each set of restrictions contains only parameters drawn from one equation.

Granger causality tests seek to answer questions such as "Do changes in y1 cause changes in y2?"
If y1 causes y2, lags of y1 should be significant in the equation for y2. If this is the case, we say that y1 "Granger-causes" y2.
If y2 causes y1, lags of y2 should be significant in the equation for y1.
If both sets of lags are significant, there is "bi-directional causality".

## IMPULSE RESPONSES

VAR models are often difficult to interpret: one solution is to construct the impulse responses and variance decompositions.
Impulse responses trace out the responsiveness of the dependent variables in the VAR to shocks in the error term. A unit shock is applied to each variable and its effects are noted.
Consider for example a simple bivariate VAR(1):

$$y_{1t} = \beta_{10} + \beta_{11} y_{1t-1} + \alpha_{11} y_{2t-1} + u_{1t}$$
$$y_{2t} = \beta_{20} + \beta_{21} y_{2t-1} + \alpha_{21} y_{1t-1} + u_{2t}$$

A change in $u_{1t}$ will immediately change y1. It will change y2 and also y1 during the next period.
We can examine how long and to what degree a shock to a given equation affects all of the variables in the system.
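For a standard form VAR(1), $Y_t = A_0 + A_1 Y_{t-1} + e_t$, the response at horizon s to a unit shock is given by the powers of $A_1$. The sketch below uses a hypothetical coefficient matrix and a helper name `impulse_responses` chosen for the example; it computes non-orthogonalised responses only, so it ignores any correlation between the errors.

```python
import numpy as np

# Hypothetical lag-coefficient matrix of a stable bivariate VAR(1)
A1 = np.array([[0.5, 0.3],
               [0.2, 0.2]])

def impulse_responses(A1, horizons):
    """irf[s][i, j] = response of variable i at t+s to a unit shock in error j at t."""
    k = A1.shape[0]
    irf = np.empty((horizons + 1, k, k))
    irf[0] = np.eye(k)                 # the shock hits its own variable on impact
    for s in range(1, horizons + 1):
        irf[s] = A1 @ irf[s - 1]       # propagate one period forward
    return irf

irf = impulse_responses(A1, 10)

# Path of y1 and y2 after a unit shock to the first error term
for s in range(4):
    print(s, irf[s][:, 0])
```

On impact only y1 moves; one period later both y1 and y2 respond, matching the description above, and for a stable VAR the responses die out as the horizon grows.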

## VARIANCE DECOMPOSITIONS

Variance decompositions offer a slightly different method of examining VAR dynamics. They give the proportion of the movements in the dependent variables that are due to their own shocks, versus shocks to the other variables.
This is done by determining how much of the s-step ahead forecast error variance for each variable is explained by innovations to each explanatory variable (s = 1, 2, ...).
The variance decomposition gives information about the relative importance of each shock to the variables in the VAR.

## IMPULSE RESPONSES AND VARIANCE DECOMPOSITIONS: THE ORDERING OF THE VARIABLES

For calculating IRs and variance decompositions, the ordering of the variables is important.
Main reason: VAR errors are often not independent of one another. Instead, they are typically correlated to some degree.
Therefore, the notion of examining the effect of the innovations separately has little meaning, since they have a common component.
Thus, we must orthogonalise the innovations.
In the bivariate VAR, this problem would be approached by attributing all of the effect of the common component to the first of the two variables in the VAR.
In the general case where there are more variables, the situation is more complex but the interpretation is the same.
## LAG NUMBER SELECTION

How do we choose the lags p, q in an ARMA(p, q) model?
1. By looking at the sample autocorrelations and the sample partial autocorrelations, and trying to recognize the pattern of a model with given p, q.
2. By using an automatic selection criterion (information criterion).

## TESTS OF RANDOMNESS

If $Y_t$ is i.i.d. (and has finite variance) then $\rho_1, ..., \rho_k$ are all 0.
Then, the sample autocorrelations ($\hat{\rho}_j, \hat{\rho}_h$, $j \neq h$, $j \geq 1$, $h \geq 1$) are asymptotically independent and

$$\sqrt{T}\hat{\rho}_j \to_d N(0, 1), \quad (j \geq 1)$$

We can use this property to design two tests to check if the data are independently distributed.

TEST OF RANDOMNESS

This test is so simple that it can be inspected visually, so computer packages usually plot two error bars at $\pm 1.96/\sqrt{T}$ with the sample autocorrelation function.
(Notice: although it is called "test for randomness" by some computer software and some references, a more appropriate name would be "test for independent distribution").

PORTMANTEAU TEST

We can also test a group of k autocorrelations jointly: under the null,

$$T\sum_{j=1}^{k} \hat{\rho}_j^2 \to_d \chi^2_k$$

(This test may be of particular interest when we suspect a seasonal structure in the data: for example with quarterly data the first three autocorrelations may be zero, and then the fourth one may be non-zero.) (The test may be sensitive to the choice of k on some occasions.)
The tests for independent distribution and the Portmanteau test may provide preliminary information about the sample AC.
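Both checks can be sketched with a few lines of NumPy. The function names `sample_autocorr` and `portmanteau`, the simulated series and the choice k = 4 are assumptions for illustration.

```python
import numpy as np

def sample_autocorr(y, k):
    """Sample autocorrelations rho_hat_1, ..., rho_hat_k."""
    d = np.asarray(y, dtype=float)
    d = d - d.mean()
    denom = d @ d
    return np.array([(d[j:] @ d[:-j]) / denom for j in range(1, k + 1)])

def portmanteau(y, k):
    """Q = T * sum of squared sample autocorrelations up to lag k."""
    rho = sample_autocorr(y, k)
    return len(y) * np.sum(rho ** 2), rho

rng = np.random.default_rng(3)
T = 1000
white = rng.normal(size=T)            # i.i.d. series
ar = np.zeros(T)                      # hypothetical AR(1) series, coefficient 0.6
for t in range(1, T):
    ar[t] = 0.6 * ar[t - 1] + white[t]

band = 1.96 / np.sqrt(T)              # individual error bar for each rho_hat_j
Q_white, rho_white = portmanteau(white, 4)
Q_ar, rho_ar = portmanteau(ar, 4)

print("band:", band)
print("white noise:", rho_white, Q_white)
print("AR(1):      ", rho_ar, Q_ar)
```

Each `Q` would be compared with a $\chi^2_4$ critical value (about 9.49 at the 5% level), and each individual autocorrelation with `band`.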

## MODEL SELECTION: INFORMATION CRITERIA

An automatic way to select p, q. The idea: use "maximum likelihood" to choose p, q.
The problem: if you compare an ARMA(p, q) with an ARMA(p + 1, q), the ARMA(p, q) always has the smaller likelihood. This is because the estimate from the ARMA(p, q) model maximises the likelihood with the constraint that $\hat{\phi}_{p+1} = 0$, while the ARMA(p + 1, q) does not impose that constraint, so the ARMA(p + 1, q) has higher maximum likelihood unless $\hat{\phi}_{p+1} = 0$ exactly (which is an event with probability zero in finite samples even when the true $\phi_{p+1}$ is actually 0). (Notice the analogy with regression here: when you increase the number of regressors, the $R^2$ does not decrease, and in general increases, even when the regressors are irrelevant.)

The solution: add a penalty which increases with p and q.

$$IC = -2\,l(\hat{\theta}) + \text{penalty}$$

penalty:
- $2(p + q)$ for the Akaike IC
- $(p + q)\ln(T)$ for the Bayes IC

BIC: consistent estimation of p, q.
AIC: inconsistent estimation of p, q (may select larger than correct p, q in large samples).
Both BIC and AIC may select smaller than correct p, q in finite samples (this, however, is not necessarily a bad thing in small samples).

An alternative approach: of course, we can also compare an ARMA(p, q) with an ARMA(p + 1, q), or with an ARMA(p, q + 1), using a likelihood ratio test. The criterion is then to keep adding lags as long as the likelihood ratio test statistic is above a user-chosen critical value (for example, 5% significance would have c.v. 3.84).

PARSIMONIOUS MODELLING

Large econometric models tend to do badly in terms of forecasting, and are outperformed by small ARMA models (Box & Jenkins).
Even in ARMA models, increasing the number of parameters reduces the precision with which each parameter is estimated. This is because when the parameters are estimated, their variance contributes to the variance of the forecast.
Adding extra parameters may then help to reduce or eliminate the forecast bias, but the gain in terms of reduction in bias² is outweighed by the loss in increased variance of the forecast.
We should balance the number of estimated parameters and the number of observations.
Sometimes, Information Criteria have also been advocated to select more parsimonious models.

## TOPIC 8 TRENDING VARIABLES AND CO-INTEGRATION

INTRODUCTORY FINANCIAL ECONOMETRICS
Topic 8 Trending and Co-Integration
3 CREDITS, 51 HOURS
Jianhua Gang
School of Finance
Renmin University of China
Spring 2013

In previous lectures, it has been assumed that all data are trend-free.
However, in many cases, this assumption turns out to be inappropriate.

EXAMPLE
Examine the series for consumption and income. The presence of trends can sometimes invalidate the usual asymptotic theory for OLS and test procedures.

A discussion of trends and related topics such as tests for unit roots and cointegration is, therefore, required.

## AUTOREGRESSIVE DISTRIBUTED LAG (ADL) MODELS

Applied workers often specify models that include both lagged values of the dependent variable and a distributed lag component in the regression function. These models are called autoregressive distributed lag (ADL) models.
A very simple ADL relationship is

$$y_t = \alpha + \phi_1 y_{t-1} + \beta_0 x_t + \beta_1 x_{t-1} + u_t, \quad |\phi_1| < 1.$$

The autoregressive component implies that a change in $x_t$ affects $y_t$ and all future values of $y_t$.
The short-run multiplier is $\partial E(y_t)/\partial x_t = \beta_0$.
It can be shown that

$$\partial E(y_{t+j})/\partial x_t = \phi_1^{j-1}(\beta_1 + \phi_1 \beta_0), \quad j = 1, 2, ...$$

Such terms are called dynamic multipliers.
The cumulative effect corresponding to the long-run multiplier is

$$\sum_{j=0}^{\infty} \partial E(y_{t+j})/\partial x_t = \frac{\beta_0 + \beta_1}{1 - \phi_1} = \theta,$$

then $\theta$, say, is the slope of the long-run relationship and the intercept is $\mu = \alpha/(1 - \phi_1)$.
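The multiplier formulas can be checked numerically. In this sketch the coefficient values are arbitrary assumptions and the symbol names follow the ADL written as $y_t = \alpha + \phi_1 y_{t-1} + \beta_0 x_t + \beta_1 x_{t-1} + u_t$; it feeds a one-off unit impulse in x through the ADL recursion and compares the result with the closed-form multipliers.

```python
import numpy as np

phi1, beta0, beta1 = 0.6, 0.5, 0.3     # illustrative values only

def dynamic_multiplier(j):
    """Effect on E(y_{t+j}) of a unit change in x_t."""
    if j == 0:
        return beta0                   # short-run multiplier
    return phi1 ** (j - 1) * (beta1 + phi1 * beta0)

# Long-run multiplier: closed form versus cumulative sum of dynamic multipliers
long_run = (beta0 + beta1) / (1 - phi1)
cumulative = sum(dynamic_multiplier(j) for j in range(200))

# Cross-check: one-off unit impulse in x passed through the ADL recursion
# (intercept and error omitted, since they do not affect the response)
x = np.zeros(50)
x[0] = 1.0
y = np.zeros(50)
y[0] = beta0 * x[0]
for t in range(1, 50):
    y[t] = phi1 * y[t - 1] + beta0 * x[t] + beta1 * x[t - 1]

print(long_run, cumulative)
print(y[:4], [dynamic_multiplier(j) for j in range(4)])
```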

It is useful to note that the simple ADL can be written in a mathematically equivalent form that has a parameterization of direct economic interest, called the error correction model (ECM).
The ECM is derived from the ADL as follows:

$$y_t = \alpha + \phi_1 y_{t-1} + \beta_0 x_t + \beta_1 x_{t-1} + u_t$$

can be written as

$$(\Delta y_t + y_{t-1}) = \alpha + \phi_1 y_{t-1} + \beta_0(\Delta x_t + x_{t-1}) + \beta_1 x_{t-1} + u_t,$$
$$\Delta y_t = \beta_0 \Delta x_t - (1 - \phi_1)\left[y_{t-1} - \mu - \theta x_{t-1}\right] + u_t,$$

with $\mu = \alpha/(1 - \phi_1)$ and $\theta = (\beta_0 + \beta_1)/(1 - \phi_1)$.
Thus the ECM has first differences in y linked to first differences in x and the extent by which y deviates from the long-run expected value in the previous period. (Martingale)

The ECM is nonlinear in the coefficients $\beta_0$, $\phi_1$, $\mu$, and $\theta$, all of which have meanings:
The coefficient $\beta_0$ measures the contemporaneous effect.
The coefficient $\phi_1$ can be thought of as reflecting speed of adjustment.
The coefficients $\mu$ and $\theta$ are the intercept and the slope of the long-run relationship.

If the OLS estimates of the ADL are denoted by a hat (e.g. $\hat{\beta}_0$) and the nonlinear least squares estimates of the ECM are denoted by a tilde (e.g. $\tilde{\beta}_0$), then it can be shown that

$$\hat{\beta}_0 = \tilde{\beta}_0, \quad \hat{\phi}_1 = \tilde{\phi}_1, \quad \tilde{\mu} = \frac{\hat{\alpha}}{1 - \hat{\phi}_1}, \quad \tilde{\theta} = \frac{\hat{\beta}_0 + \hat{\beta}_1}{1 - \hat{\phi}_1}.$$

While OLS estimation of the ADL and nonlinear least squares estimation of the ECM yield the same point estimates, the latter method (NLS) has the advantage of giving estimated standard errors for estimates of long-run parameters as part of standard output.

For future reference, it is important to note that if $\beta_0$ and $\phi_1$ are estimated by applying OLS to

$$\Delta y_t = \beta_0 \Delta x_t - (1 - \phi_1)\left[y_{t-1} - \tilde{\mu} - \tilde{\theta} x_{t-1}\right] + u_t$$

the estimates of these parameters are $\hat{\beta}_0 = \tilde{\beta}_0$ and $\hat{\phi}_1 = \tilde{\phi}_1$, respectively.
This two-step approach can play an important role when the data contain trends and will be discussed later in further detail.
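The mapping from OLS estimates of the ADL to the long-run parameters can be sketched on simulated data. The parameter values, sample size and variable names below are assumptions for illustration, and the sketch gives point estimates only (no standard errors for the long-run parameters, which is the advantage claimed for NLS above).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20000
alpha, phi1, beta0, beta1 = 1.0, 0.6, 0.5, 0.3   # hypothetical true values

# Simulate y_t = alpha + phi1*y_{t-1} + beta0*x_t + beta1*x_{t-1} + u_t
x = rng.normal(size=n)
u = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = alpha + phi1 * y[t - 1] + beta0 * x[t] + beta1 * x[t - 1] + u[t]

# OLS on the ADL: regressors are constant, y_{t-1}, x_t, x_{t-1}
X = np.column_stack([np.ones(n - 1), y[:-1], x[1:], x[:-1]])
a_hat, phi_hat, b0_hat, b1_hat = np.linalg.lstsq(X, y[1:], rcond=None)[0]

# Implied long-run intercept and slope
mu_tilde = a_hat / (1 - phi_hat)
theta_tilde = (b0_hat + b1_hat) / (1 - phi_hat)

print("long-run intercept:", mu_tilde, "true:", alpha / (1 - phi1))
print("long-run slope:    ", theta_tilde, "true:", (beta0 + beta1) / (1 - phi1))
```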
## TOPIC 8 TRENDING VARIABLES AND CO-INTEGRATION

TRENDING VARIABLES

In the past, attention has been restricted to (covariance) stationary processes. Recall that a time series variable $z_t$ is covariance stationary if $E(z_t) = \mu_z$, $var(z_t) = \sigma_z^2$, and $cov(z_t, z_{t-g}) = \gamma(|g|)$ for all t.

Two models of trends are discussed.
First, $z_t$ is said to be a trend stationary process if $z_t = f(t) + u_t$, in which f(.) is a deterministic function and $u_t$ is a stationary process (often a linear trend in t);
Second, $z_t$ is said to be a difference stationary process if $\Delta z_t = z_t - z_{t-1} = u_t$, where $u_t$ is a stationary process.

Series that can be differenced to obtain stationary variables are called integrated series and it is useful to adopt the following terminology and notation: a stationary process is denoted I(0) (integrated of order zero); and a series $z_t$ is said to be integrated of order d if $\Delta^d z_t \sim I(0)$.

Consider the model

$$\Delta z_t = z_t - z_{t-1} = u_t, \quad u_t \sim NID(0, \sigma^2). \quad (1)$$

Hence,

$$z_t = \sum_{s=1}^{t} u_s + z_0.$$

If $z_0$ is zero, then $z_t$ is the sum of current and past innovations.
Clearly $z_t \sim I(1)$ and $\Delta z_t \sim I(0)$.

Model (1) can be extended so that $\Delta z_t$ has a nonzero mean, i.e. to allow for a drift,

$$\Delta z_t = z_t - z_{t-1} = a + u_t, \quad u_t \sim NID(0, \sigma^2),$$

so that, if $z_0 = 0$, then,

$$z_t = \sum_{s=1}^{t} u_s + at, \quad (2)$$

and equation (2) implies the existence of both deterministic and stochastic trend components.

## DIFFERENCES BETWEEN I(0) AND I(1) VARIABLES

We are able to illustrate the differences using two simple models:

1. I(0): Let $z_t$ be an I(0) variate generated by the stable AR(1) model

$$z_t = \rho z_{t-1} + u_t, \quad |\rho| < 1, \quad u_t \sim NID(0, \sigma^2).$$

It can be proved that for whatever value of t:

$$E(z_t) = 0; \quad var(z_t) = \sigma^2/(1 - \rho^2) < \infty; \quad corr(z_t, z_{t-s}) = \rho^s, \; s \geq 0.$$

Thus this I(0) variable has constant mean, constant variance (hence large departures are rare), and autocorrelations decline as the order increases.
Note that

$$z_t = \sum_{j=0}^{\infty} \rho^j u_{t-j},$$

with $\rho^j$ tending rapidly to zero as $j \to \infty$, i.e. finite memory.

2. I(1): Let $z_t$ be an I(1) variate generated by the random walk

$$\Delta z_t = z_t - z_{t-1} = u_t, \quad u_t \sim NID(0, \sigma^2),$$

where $z_t = \sum_{s=1}^{t} u_s$, if $z_0 = 0$.
Hence,

$$E(z_t) = 0,$$
$$var(z_t) = t\sigma^2 \quad \text{(monotonic in t)},$$
$$corr(z_t, z_{t-g}) = (t - g)/\sqrt{t(t - g)} \quad \text{(dependence on t)}.$$

Even if g is large, $corr(z_t, z_{t-g})$ can be close to 1 for $t \gg g$. Clearly $z_t$ is nonstationary.
Note that $z_t = \sum_{s=1}^{t} u_s$, so every past innovation enters $z_t$ with unit weight, i.e. there is an infinitely long memory.
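The contrast between the two models can be illustrated by simulation. The sketch below generates many independent paths of a stable AR(1) and of a random walk from the same innovations (the number of paths, the horizon and the AR coefficient are arbitrary assumptions) and compares cross-path moments with the formulas above, taking $\sigma^2 = 1$.

```python
import numpy as np

rng = np.random.default_rng(5)
n_paths, T, rho = 20000, 100, 0.5       # illustrative settings, sigma^2 = 1

u = rng.normal(size=(n_paths, T))

# I(1): random walk z_t = u_1 + ... + u_t with z_0 = 0
rw = np.cumsum(u, axis=1)

# I(0): stable AR(1) z_t = rho * z_{t-1} + u_t with z_0 = 0
ar = np.zeros((n_paths, T))
ar[:, 0] = u[:, 0]
for t in range(1, T):
    ar[:, t] = rho * ar[:, t - 1] + u[:, t]

# Cross-path variances at each date (column t-1 holds z_t)
var_rw = rw.var(axis=0)
var_ar = ar.var(axis=0)

# Correlation between z_100 and z_50 for the random walk (t = 100, g = 50)
corr_rw = np.corrcoef(rw[:, 99], rw[:, 49])[0, 1]

print("random walk var at t=25, 100:", var_rw[24], var_rw[99])
print("AR(1) var at t=100:", var_ar[99], "theory:", 1 / (1 - rho ** 2))
print("corr(z_100, z_50):", corr_rw, "theory:", np.sqrt(50 / 100))
```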

## CONSEQUENCES FOR OLS ANALYSIS

Analysis is relatively straightforward when variables are trend-stationary with linear trends. In multiple regression with several regressors having linear trends, adding a trend term to the basic model and fitting

$$y_t = \alpha + \sum_j \beta_j x_{jt} + \delta t + u_t, \quad u_t \sim i.i.d.(0, \sigma^2),$$

provides a basis for valid estimation and inference, with the additional regressor serving as a trend-removing agent.
However, it has been established that the asymptotic theory of OLS estimators and tests developed for I(0) variables can be misleading when applied to data from I(1) processes. With nonstationary variables, OLS estimators may tend to nonstandard distributions, rather than normality, as $n \to \infty$.

## TESTING FOR UNIT ROOTS

It is therefore important to test whether or not variables are generated by I(1) difference stationary processes.
Many results have been derived for testing the null hypothesis that a series is difference stationary against the alternative of covariance stationarity.
- Consider a simple AR(1) model

$$z_t = \rho z_{t-1} + u_t, \quad u_t \sim NID(0, \sigma^2).$$

- If $\rho = 1$ then $z_t \sim I(1)$; if $|\rho| < 1$ then $z_t \sim I(0)$.
- Can define the polynomial $\phi(\lambda) = 1 - \rho\lambda$. Solving $\phi(\lambda) = 1 - \rho\lambda = 0$ yields the root $\lambda = 1/\rho$, which is unity if $\rho = 1$.
- Hence we talk of testing for a unit root. Several tests for unit roots are based upon work by Dickey and Fuller (hereafter DF).

Let DP and RE denote the true data process and the regression equation used to compute the test, respectively. Will consider three cases.

CASE 1 $H_1$: $z_t$ stable AR(1) with zero mean.
DP(1) $z_t = u_t$, $u_t = \rho u_{t-1} + \varepsilon_t$, $\varepsilon_t \sim NID(0, \sigma^2)$, with $-1 < \rho \leq 1$ and $H_0: \rho = 1$. Apply OLS to,
RE(1) $\Delta z_t = (\rho - 1)z_{t-1} + \varepsilon_t$.

CASE 2 $H_1$: $z_t$ stable AR(1) with nonzero mean.
DP(2) $z_t = \mu_0 + u_t$, $u_t = \rho u_{t-1} + \varepsilon_t$, $\varepsilon_t \sim NID(0, \sigma^2)$, with $-1 < \rho \leq 1$ and $H_0: \rho = 1$. Apply OLS to,
RE(2) $\Delta z_t = \alpha + (\rho - 1)z_{t-1} + \varepsilon_t$, $\alpha \equiv \mu_0(1 - \rho)$. $H_0$ implies that $\alpha = \rho - 1 = 0$.

CASE 3 $H_1$: $z_t$ linear trend + stable AR(1).
DP(3) $z_t = \mu_0 + \mu_1 t + u_t$, $u_t = \rho u_{t-1} + \varepsilon_t$, $\varepsilon_t \sim NID(0, \sigma^2)$, with $-1 < \rho \leq 1$ and $H_0: \rho = 1$. Apply OLS to,
RE(3) $\Delta z_t = \alpha + \beta t + (\rho - 1)z_{t-1} + \varepsilon_t$, in which $\alpha \equiv [\mu_0(1 - \rho) + \rho\mu_1]$ and $\beta \equiv \mu_1(1 - \rho)$. $H_0$ implies that $\beta = 0$.

Let $(\hat{\rho} - 1)$ denote an OLS estimator (same notation for all REs).
Can base tests on either $K(1) = n(\hat{\rho} - 1)$ or $t(1) = (\hat{\rho} - 1)/SE(\hat{\rho} - 1) = (\hat{\rho} - 1)/SE(\hat{\rho})$. For RE(3), can also use F(0, 1), the F statistic for testing the two restrictions $\beta = \rho - 1 = 0$.

Only t(1) type tests are to be considered. DF show that, provided DP and RE are correctly matched, these test statistics have nonstandard asymptotic distributions under $H_0$. DF also provide estimated critical values for each of the three cases given above.
If RE is incorrectly matched with DP, results can be quite different and wrong.
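The t(1) statistic for RE(2) can be computed with plain OLS algebra. The function name `df_t_stat` and the simulated series are assumptions for illustration; the statistic must be compared with DF critical values (roughly -2.86 at the 5% level for the constant-only case in large samples), not with the usual normal or t tables.

```python
import numpy as np

def df_t_stat(z):
    """t-ratio on z_{t-1} in RE(2): dz_t = alpha + (rho - 1) z_{t-1} + e_t."""
    z = np.asarray(z, dtype=float)
    dz = np.diff(z)
    X = np.column_stack([np.ones(len(dz)), z[:-1]])
    b = np.linalg.lstsq(X, dz, rcond=None)[0]
    resid = dz - X @ b
    s2 = resid @ resid / (len(dz) - X.shape[1])     # error variance estimate
    cov = s2 * np.linalg.inv(X.T @ X)               # OLS covariance matrix
    return b[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(6)
T = 500
eps = rng.normal(size=T)

random_walk = np.cumsum(eps)            # unit root: rho = 1
stationary = np.zeros(T)                # stable AR(1) with rho = 0.5
for t in range(1, T):
    stationary[t] = 0.5 * stationary[t - 1] + eps[t]

t_rw = df_t_stat(random_walk)
t_ar = df_t_stat(stationary)
print("t(1), random walk:     ", t_rw)
print("t(1), stationary AR(1):", t_ar)
```

A strongly negative t(1), as for the stationary series, leads to rejection of the unit root null.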

## TESTING FOR UNIT ROOTS: CRITICAL VALUES

DF-TAU TESTS: (table not recovered)

DF-GAMMA TESTS: (table not recovered)

It is useful to relax the assumption that, under $H_0$, the $\Delta z_t$ are independent.
A fairly general specification is that, under the unit root hypothesis, $\Delta z_t$ follows a stationary mixed autoregressive-moving average process

$$\phi(l)\Delta z_t = \theta(l)\varepsilon_t, \quad \varepsilon_t \sim i.i.d.(0, \sigma^2)$$

in which $\phi(l)$ and $\theta(l)$ are polynomials in the lag operator.
Often approximate the autocorrelation of the $\Delta z_t$ by an autoregressive model to obtain the Augmented-DF (ADF) test.
For example, the ADF form corresponding to RE(3) can be written as

$$\Delta z_t = \alpha_A + \beta_A t + (\rho_A - 1)z_{t-1} + \sum_{j=1}^{p} \gamma_j \Delta z_{t-j} + \varepsilon_t$$

It is asymptotically valid, for each of the three combinations of DP and RE, to use the same critical values for ADF and DF.

Schwert points out that it may be difficult to obtain a satisfactory autoregressive approximation to the serial correlation when $\Delta z_t$ has a moving average component.
Choi finds that the use of a large value of p in ADF tests can lead to low power.
More generally, studies of power indicate that it may be very difficult to discriminate between trend stationary and difference stationary processes, e.g. DF tests for unit roots have low power when the data are trend stationary.
Perron finds that if the data have segmented trends, e.g. structural breaks, then unit root tests lack power.

## CO-INTEGRATION

The recognition of the existence of unit roots, coupled with ideas of equilibrium, leads to co-integration. A simple case of a single relationship will be considered. More general treatments are available and the book by Banerjee et al. (1994) contains many useful discussions and references; also see the book by Harris (1995).

The topics to be covered are: the definition of co-integration and its links with equilibrium; testing for the absence of co-integration; and Granger's Representation Theorem, which concerns the Error Correction Models.
Co-integration theory was developed by Engle and Granger (1987) (Nobel Prize in Economics, 2003).

Definition: The variables $(z_{1t}, z_{2t}, ..., z_{mt})$ are said to be co-integrated if:
1. $z_{it} \sim I(d)$, $d > 0$, $\forall i$;
2. there exists a linear combination $a_t = \sum_i \alpha_i z_{it} \sim I(d - b)$, $b > 0$, with $\alpha \neq 0$.

If 1 and 2 are satisfied, then, in general, we write $(z_{1t}, z_{2t}, ..., z_{mt}) \sim CI(d, b)$.
Will only consider the case d = b = 1, i.e. a linear combination of I(1) processes is a stationary I(0) variable.

Let $y_t$ and $x_t$ denote consumption and income, respectively. Suppose that both variables are I(1). Consider $u_t$ defined by $y_t = \alpha + \beta x_t + u_t$. In general, a linear combination of I(1) variables, such as $u_t = y_t - (\alpha + \beta x_t)$, is also I(1).
Engle and Granger argue that there cannot be a meaningful equilibrium relationship between $y_t$ and $x_t$ unless $u_t \sim I(0)$, since an I(1) error will wander widely and rarely cross the line through zero.
Thus co-integration is sometimes viewed as being required for a certain type of equilibrium relationship.
However, economists might wish to include I(0) variables in an equilibrium relationship, as well as I(1) regressors.

For technical reasons, the null hypothesis is taken to be no co-integration, so that, using the simple example above, the true $u_t$ is I(1) (in which case, first difference the data and then apply classical methods for stationary processes).
Thus, if the intercept $\alpha$ and slope $\beta$ were known, we could calculate the $u_t$ and apply an ADF test for a unit root. If the test indicated the rejection of the unit root restriction, the evidence could be viewed as supporting the assumption of co-integration with $u_t \sim I(0)$, but the parameters are, of course, unknown.

The intercept $\alpha$ and slope $\beta$ can be estimated by applying OLS; this is known as fitting the co-integrating regression (CR). If $y_t$ and $x_t$ are co-integrated, the OLS estimator of $\beta$ from the CR exhibits a property known as superconsistency because, as $n \to \infty$, it approaches the true value at a faster rate than in the classical stationary variables case.
However, in small samples, biases may be important.
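Fitting the CR can be sketched on simulated data. The series below are co-integrated by construction (the intercept 1.0, slope 2.0 and sample size are assumptions), and the last lines run a simple DF-type regression on the CR residuals as a rough stationarity check; this sketch does not supply the special critical values that a residual-based test needs.

```python
import numpy as np

rng = np.random.default_rng(7)
T = 1000
x = np.cumsum(rng.normal(size=T))     # I(1) regressor: a random walk
u = rng.normal(size=T)                # I(0) equilibrium error
y = 1.0 + 2.0 * x + u                 # co-integrated with x by construction

# Co-integrating regression: OLS of y on a constant and x
X = np.column_stack([np.ones(T), x])
alpha_hat, beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
u_hat = y - alpha_hat - beta_hat * x

# DF-type regression on residuals: d(u_hat_t) = delta * u_hat_{t-1} + e_t
du = np.diff(u_hat)
lag = u_hat[:-1]
delta_hat = (lag @ du) / (lag @ lag)
e = du - delta_hat * lag
se = np.sqrt((e @ e) / (len(du) - 1) / (lag @ lag))
t_ratio = delta_hat / se

print("CR slope estimate:", beta_hat)
print("t-ratio on lagged residual:", t_ratio)
```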


## TESTING FOR THE ABSENCE OF CO-INTEGRATION

Having estimated the CR, the associated OLS residuals $\hat{u}_t$ can be used in a test for a unit root in the process $u_t$, which is a test for the absence of co-integration. Two simple tests can be used with the $\hat{u}_t$.
One procedure is to compute the DW statistic, which should be close to zero under the unit root hypothesis. This check is not recommended.

The second approach involves applying the ADF test in t-ratio form after OLS estimation of the equation

$$\Delta\hat{u}_t = \delta\hat{u}_{t-1} + \sum_{j=1}^{p} \gamma_j \Delta\hat{u}_{t-j} + e_t,$$

in which p is selected so that $e_t$ appears to be a sequence of i.i.d. variables.
This test is denoted CRADF. The DF tables are not valid for CRADF. Asymptotic distributions under the unit root hypothesis depend upon the number of I(1) regressors and whether or not the CR includes an intercept and/or a trend term.
Finite sample critical values have been estimated by computer methods for various cases and are available in some estimation programs, e.g. PcGive.

## CO-INTEGRATION EXAMPLE 1: PPP THEORY

Consider for example the PPP theory for the exchange rate. In perfect markets there are no arbitrage opportunities so the exchange rate R is determined by the relative movements of the domestic price level P and the foreign price level P*, i.e.

$$R = \frac{P}{P^*}, \quad r = p - p^* \text{ (in logs)}$$

This can be seen as a long-run equilibrium. Data like exchange rates and inflation levels are usually I(1). So they are quite volatile. However, if the PPP theory is correct they should not drift apart a lot over time, i.e.

$$r - (p - p^*) \text{ small}$$

## C O - INTEGRATION E XAMPLE 2: T HE M ONEY D EMAND

Suppose: mt real money supply; rt interest rate; t inflation; Yt real
income.

## If in fact there exists a vector of coefficients = ( 1 , 2 , 3 )such

that
1 rt 2 pt 3 pt = I (0)

## then rt , pt ,and pt are said to co-integrate and is the cointegrating

vector. Note: A co-integrating vector does not always exist, or there
might be more than one co-integrating vectors.

Moreover in practice the variables above are usually I (1) so they may
co-integrate.

mt = 1 rt + 2 t + 3 Yt

1
2
3

S PRING
Co-Integration
2013

33 / 49

S PRING
Co-Integration
2013

34 / 49

In the previous lecture we saw how to test for co-integration using the EG methodology. The EG approach has some serious drawbacks.

1. Suppose, for example, that $x_{1t}$ and $x_{2t}$ are I(1) and we want to test for co-integration between these two variables. Recall that the way to do it is to estimate the following regression by OLS,

$$x_{1t} = \hat{\beta}_1 + \hat{\beta}_2 x_{2t} + \hat{u}_t,$$

and to apply the DF test to the residuals $\hat{u}_t$. Because $x_{1t}$ and $x_{2t}$ can be treated symmetrically, an alternative regression is

$$x_{2t} = \hat{\beta}_3 + \hat{\beta}_4 x_{1t} + \hat{\varepsilon}_t,$$

with the same test applied to $\hat{\varepsilon}_t$. Theoretically the two approaches are equivalent and should give the same answer when the sample is large. In practice, however, they may give different answers because the samples used are not large enough.

1. Use the DF test to make sure that all the variables are I(1).
2. Use OLS to estimate the model $m_t = \hat{\beta}_1 r_t + \hat{\beta}_2 \pi_t + \hat{\beta}_3 Y_t + \hat{u}_t$.
3. Test the residuals for a unit root. If there is co-integration, the residuals must be stationary; otherwise the residuals will be I(1).
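
The residual-based procedure can be sketched in a few lines of Python. This is only an illustration on simulated data: the series, the seed, the helper names `ols` and `df_tstat`, and the use of a DF regression without augmentation lags are all assumptions made for the example. The statistic should be compared with co-integration critical values, not with the ordinary DF tables.

```python
import numpy as np

def ols(y, X):
    """Return OLS coefficients and residuals for y = X b + u."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b, y - X @ b

def df_tstat(u):
    """t-statistic on rho in the DF regression du_t = rho * u_{t-1} + e_t."""
    du, lag = np.diff(u), u[:-1]
    rho = (lag @ du) / (lag @ lag)
    e = du - rho * lag
    s2 = (e @ e) / (len(du) - 1)
    return rho / np.sqrt(s2 / (lag @ lag))

rng = np.random.default_rng(0)
n = 500
x2 = np.cumsum(rng.normal(size=n))          # I(1) random walk
x1 = 1.0 + 2.0 * x2 + rng.normal(size=n)    # co-integrated with x2 by construction
w = np.cumsum(rng.normal(size=n))           # independent random walk (no co-integration)

X = np.column_stack([np.ones(n), x2])
_, u_coint = ols(x1, X)                     # step 2: co-integrating regression
_, u_spur = ols(w, X)

t_coint = df_tstat(u_coint)                 # step 3: unit-root test on the residuals
t_spur = df_tstat(u_spur)
print(f"DF t-stat, co-integrated pair: {t_coint:.2f}")
print(f"DF t-stat, unrelated pair:     {t_spur:.2f}")
```

The co-integrated pair produces a strongly negative statistic (stationary residuals), while the unrelated pair does not.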


2. Moreover, when $m$ variables co-integrate, there may be more than one distinct co-integrating relationship (up to $m - 1$ linearly independent co-integrating vectors can be found). The EG methodology cannot estimate distinct co-integrating relationships.

The Johansen (1988) procedure provides a framework that circumvents these problems. The Johansen approach involves estimation of a system of equations rather than a single equation. Before we consider this approach, we need to introduce the VAR and the VECM.


VAR M ODEL
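The VAR model is, as the name suggests, an autoregression of a vector process. Consider the simplest example, a two-variable VAR model with one lag (VAR(1)):

$$\begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix} = \begin{pmatrix} \phi_{11} & \phi_{12} \\ \phi_{21} & \phi_{22} \end{pmatrix} \begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \end{pmatrix} + \begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{pmatrix}$$

or

$$Y_t = \Phi_1 Y_{t-1} + \varepsilon_t.$$

Recall that the AR(p) model is $y_t = \sum_{i=1}^{p} \phi_i y_{t-i} + \varepsilon_t$, and that it can be reparameterized as

$$\Delta y_t = \rho\, y_{t-1} + \sum_{i=2}^{p} \gamma_i\, \Delta y_{t-i+1} + \varepsilon_t$$

or

$$\Delta y_t = \rho\, y_{t-p} + \sum_{i=1}^{p-1} c_i\, \Delta y_{t-i} + \varepsilon_t.$$

Scalar autoregressive models are inappropriate for co-integration analysis, as they involve only one variable ($y_t$), whereas co-integration involves more than one variable. The VAR(p) model is

$$Y_t = \sum_{i=1}^{p} \Phi_i Y_{t-i} + \varepsilon_t,$$

where $Y_t = [y_{1t}, \dots, y_{mt}]'$ and the $\Phi_i$ are $(m \times m)$ matrices of coefficients.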


VECM

Just like the scalar AR(p) model, the VAR(p) model can be reparameterised as

$$\Delta Y_t = \Pi_p Y_{t-p} + \sum_{i=1}^{p-1} \Gamma_i\, \Delta Y_{t-i} + \varepsilon_t,$$

where, writing $\Phi_j$ for the coefficient matrices of the VAR(p), $\Gamma_i = -\big(I - \sum_{j=1}^{i} \Phi_j\big)$ and $\Pi_p = -\big(I - \sum_{j=1}^{p} \Phi_j\big)$.

This model is called the Vector Error Correction Model (VECM). Notice that the only term in levels is $Y_{t-p}$; the rest of the terms appear in differences. As will be explained in detail later, the Johansen approach relies on the VECM.

## T HE VECM, I(1) PROCESSES AND C O - INTEGRATION

Consider the VECM representation of a VAR(1) on two component I(1) series, $Y_t = [y_{1t}, y_{2t}]' \sim I(1)$:

$$\Delta Y_t = \Pi_1 Y_{t-1} + \varepsilon_t, \qquad \text{i.e.} \qquad \Pi_1 Y_{t-1} = \Delta Y_t - \varepsilon_t.$$

Note: the right-hand side is I(0), so $\Pi_1 Y_{t-1}$ must be I(0) as well, i.e. the rows of the matrix $\Pi_1$ are co-integrating vectors and $y_{1t}$ and $y_{2t}$ co-integrate. The rank of $\Pi_1$ gives the number of linearly independent co-integrating vectors.
Note that $m = 2$, so we cannot have more than one linearly independent co-integrating vector.
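
As a numerical check of the reparameterisation, the following sketch maps the coefficient matrices of a VAR(p) into the VECM matrices and verifies on a simulated path that both forms describe the same process. The function name `var_to_vecm`, the symbol `Phi` for the VAR coefficients, and the numerical values are assumptions chosen for illustration only.

```python
import numpy as np

def var_to_vecm(Phi):
    """Map VAR(p) matrices [Phi_1, ..., Phi_p] to (Pi_p, [Gamma_1, ..., Gamma_{p-1}])."""
    m = Phi[0].shape[0]
    cums = np.cumsum(np.stack(Phi), axis=0)       # Phi_1, Phi_1 + Phi_2, ...
    Gammas = [c - np.eye(m) for c in cums[:-1]]   # Gamma_i = -(I - sum_{j<=i} Phi_j)
    Pi = cums[-1] - np.eye(m)                     # Pi_p   = -(I - sum_{j<=p} Phi_j)
    return Pi, Gammas

# Hypothetical VAR(2) coefficients for a two-variable system.
Phi = [np.array([[0.5, 0.2], [0.1, 0.4]]),
       np.array([[0.3, -0.1], [0.0, 0.2]])]
p = len(Phi)
Pi, Gammas = var_to_vecm(Phi)

# Simulate the VAR(2) in levels.
rng = np.random.default_rng(1)
T = 50
Y = np.zeros((T, 2))
eps = rng.normal(size=(T, 2))
for t in range(p, T):
    Y[t] = sum(Phi[i] @ Y[t - 1 - i] for i in range(p)) + eps[t]

# Verify that the VECM form reproduces the same first differences.
max_gap = 0.0
for t in range(p, T):
    lhs = Y[t] - Y[t - 1]
    rhs = Pi @ Y[t - p] + eps[t]
    rhs = rhs + sum(Gammas[i - 1] @ (Y[t - i] - Y[t - i - 1]) for i in range(1, p))
    max_gap = max(max_gap, float(np.abs(lhs - rhs).max()))

print("largest discrepancy between VAR and VECM forms:", max_gap)
```

The discrepancy is zero up to rounding error, because the VECM is an algebraic rearrangement of the VAR, not a different model.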

## T HE VECM, I(1) PROCESSES AND CO - INTEGRATION

The result from the last slide generalizes easily to higher-order VECMs. Consider the model as before and suppose that $Y_t \sim I(1)$. Then,
1. The rows of $\Pi_p$ are co-integrating vectors of $Y_{t-p}$.
2. $\mathrm{rank}(\Pi_p) = r$, where $r$ is the number of linearly independent co-integrating vectors.

Since $r \le m - 1$, $\Pi_p$ is of reduced rank (singular).

So far we have considered what co-integration implies for the properties of the VECM. Now reverse the question and ask which properties of the VECM imply co-integration.

At the centre of the subsequent analysis is the matrix $\Pi_p$. In particular, we are interested in the rank $r$ of $\Pi_p$:
$r = m$: all component series of $Y_t$ are I(0), so co-integration is not an issue.
$0 < r < m$: all component series are at least I(1) and co-integration exists.
$r = 0$: all component series are I(1), but co-integration does not exist.


Under Johansen's approach, the test statistics for co-integration are formulated as

$$\lambda_{trace}(r) = -T \sum_{i=r+1}^{g} \ln(1 - \hat{\lambda}_i)$$

and

$$\lambda_{max}(r, r+1) = -T \ln(1 - \hat{\lambda}_{r+1}),$$

where $\hat{\lambda}_i$ is the estimated value of the $i$th ordered eigenvalue from the $\Pi_p$ matrix, $r$ is the rank of $\Pi_p$ under the null, $T$ is the number of observations, and $g$ is the dimension of $\Pi_p$.

$\lambda_{trace}(r)$ tests the null that the number of co-integrating vectors is less than or equal to $r$ against an unspecified alternative, while $\lambda_{max}(r, r+1)$ tests the null that the number of co-integrating vectors is $r$ against the alternative of $r + 1$.

## O BTAINING LINEARLY INDEPENDENT CO - INTEGRATING VECTORS FROM THE VECM

Recall that a reduced-rank matrix can be decomposed into a product of two full-rank matrices. If co-integration exists, then the $m \times m$ matrix $\Pi_p$ is of reduced rank ($r < m$) and can be expressed as

$$\Pi_p = \alpha \beta',$$

where $\alpha$ and $\beta$ are $m \times r$ full-rank matrices.
Consider, for example, the case $m = 2$. Then, if $y_{1t}$ and $y_{2t}$ co-integrate, $r = 1$ and

$$\Pi_p = \alpha\beta' = \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} \begin{pmatrix} \beta_1 & \beta_2 \end{pmatrix} = \begin{pmatrix} \alpha_1\beta_1 & \alpha_1\beta_2 \\ \alpha_2\beta_1 & \alpha_2\beta_2 \end{pmatrix}.$$
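
The two statistics and the rank decomposition can be illustrated numerically. In the sketch below the eigenvalues, the sample size and the entries of `alpha` and `beta` are made-up numbers, and `johansen_stats` is a hypothetical helper name; the estimation of the eigenvalues themselves is not shown, and no critical values are implied.

```python
import numpy as np

def johansen_stats(eigvals, T, r):
    """Trace and maximum-eigenvalue statistics for a hypothesised rank r."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]   # largest eigenvalue first
    trace = -T * np.sum(np.log(1.0 - lam[r:]))               # sums over i = r+1, ..., g
    lmax = -T * np.log(1.0 - lam[r])                         # uses eigenvalue r+1 only
    return trace, lmax

eigs = [0.30, 0.05]   # hypothetical ordered eigenvalues, g = 2
T = 100               # hypothetical number of observations

trace0, max0 = johansen_stats(eigs, T, r=0)
trace1, max1 = johansen_stats(eigs, T, r=1)
print(f"r = 0: trace = {trace0:.3f}, max = {max0:.3f}")
print(f"r = 1: trace = {trace1:.3f}, max = {max1:.3f}")

# Reduced-rank decomposition Pi = alpha beta' with m = 2 and r = 1.
alpha = np.array([[-0.4], [0.1]])   # hypothetical adjustment weights
beta = np.array([[1.0], [-1.0]])    # hypothetical co-integrating vector
Pi = alpha @ beta.T
rank = np.linalg.matrix_rank(Pi)
print("Pi =\n", Pi, "\nrank(Pi) =", rank)
```

By construction the trace statistic for $r$ equals the maximum-eigenvalue statistic for $r$ plus the trace statistic for $r + 1$, and the two coincide when only one eigenvalue is left.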

Now consider $\Pi_p Y_{t-p}$, the levels term in the VECM:

$$\Pi_p Y_{t-p} = \alpha\beta' \begin{pmatrix} y_{1,t-p} \\ y_{2,t-p} \end{pmatrix} = \begin{pmatrix} \alpha_1(\beta_1 y_{1,t-p} + \beta_2 y_{2,t-p}) \\ \alpha_2(\beta_1 y_{1,t-p} + \beta_2 y_{2,t-p}) \end{pmatrix}.$$

$\beta_1 y_{1,t-p} + \beta_2 y_{2,t-p}$ is the co-integrating relationship, and $\beta$ defines the co-integrating vector (or vectors, when $r > 1$). The matrix $\alpha$ of weightings can be seen as a matrix of "speed of adjustment" coefficients.

## E CONOMIC INTERPRETATION OF THE VECM

Error Correction Models (ECM) have been widely used in economics, e.g. in theories of the demand for money. The idea is as follows: let $x_t^*$ be the optimal money balance that the individual wants to hold in period $t$, and let $x_t$ be the actual money stock. Equilibrium is attained when $x_t = x_t^*$ (long run).
In practice, however, $x_t$ may differ from $x_t^*$ due to adjustment costs. The disequilibrium error in period $t$ is defined as

$$e_t = x_t - x_t^*.$$

The ECM suggests that $x_t$ changes over time to correct disequilibrium errors that occurred in the past, i.e.

$$\Delta x_t = \alpha\, e_{t-1},$$

where $\alpha$ is a speed of adjustment coefficient ($\alpha < 0$ when deviations are being corrected).
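
A minimal numerical sketch of this adjustment mechanism follows. It assumes a constant target balance and a hypothetical adjustment coefficient of -0.3; neither value comes from the lecture.

```python
import numpy as np

alpha = -0.3      # hypothetical speed of adjustment (negative: errors are corrected)
x_star = 100.0    # optimal balance, held fixed for simplicity
periods = 20

x = np.empty(periods)
x[0] = 80.0       # start 20 units below the optimum
for t in range(1, periods):
    e_prev = x[t - 1] - x_star          # disequilibrium error e_{t-1}
    x[t] = x[t - 1] + alpha * e_prev    # Delta x_t = alpha * e_{t-1}

errors = x - x_star
print(np.round(errors[:5], 3))
```

With a fixed target the error shrinks by the factor $1 + \alpha$ each period, so the gap closes geometrically whenever $-2 < \alpha < 0$.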


In the context of the VECM, long-run equilibrium relation(s) exist when there is co-integration. The long-run relationship is defined by $\beta' Y_{t-p}$. When $\beta' Y_{t-p} = 0$ the system is in equilibrium. In the short run, however, $\beta' Y_{t-p} \neq 0$ and $\beta' Y_{t-p}$ gives the disequilibrium error.
$Y_t$ changes over time in response
1. to the past error ($\beta' Y_{t-p}$), according to the adjustment coefficients given by the matrix $\alpha$, and
2. to past changes $\Delta Y_{t-1}, \Delta Y_{t-2}, \dots, \Delta Y_{t-p+1}$.


## T OPIC 9 C AUSALITY, E XOGENEITY, AND S HOCK

I NTRODUCTORY F INANCIAL E CONOMETRICS
Topic 9 Causality, Exogeneity and Shock
3 C REDITS , 51 H OURS
Jianhua Gang
School of Finance
Renmin University of China

Spring 2013


G OALS

O UR DATA

Time-series properties of macro variables;
How certain variables are related to each other (interactions, causality);
Systemic dynamics/transmission of shocks.


## E XAMPLES (M ACRO DATA )

Quantity, Price. These data are the result of aggregation procedures with
respect to economic agents, goods, and time.


VARIABLES

Some variables that are specific to the phenomenon under study, allow
D EFINITION (E XOGENOUS VARIABLE )
Variables that may influence the endogenous variables and so help to explain the phenomenon, but whose values are fixed outside the phenomenon.

## D YNAMIC M ACROECONOMETRIC M ODELS

LINEAR SYSTEM

Denote by $y_t$ the vector of $n$ endogenous variables at time $t$ and by $x_t$ the vector of $m$ exogenous variables; then create a linear system:

$$A_0 y_t + A_1 y_{t-1} + \dots + A_p y_{t-p} + B_0 x_t + B_1 x_{t-1} + \dots + B_p x_{t-p} + \mu = 0, \tag{1}$$

where the $A_j$, $j = 0, 1, \dots, p$, are $n \times n$ matrices, the $B_j$ are $n \times m$ matrices, and $\mu$ is an $n \times 1$ vector. $A_0$ is assumed to be nonsingular, so that the whole system allows for a unique determination of the current values of the endogenous variables.


Aim of the model: derive the impact on the economy of an autonomous (exogenously decided) expenditure ($G_t$) policy.
Endogenous variables: $GDP_t$, $C_t$, $I_t$.
Exogenous variable: $G_t$.

The system is

$$\begin{cases} GDP_t = C_t + I_t + G_t \\ C_t = a\,GDP_{t-1} \\ I_t = b\,(GDP_{t-1} - GDP_{t-2}) \end{cases} \tag{2}$$

where the first equation represents the equilibrium total supply = total demand; the second, the demand for consumption (with $a$ some fraction between 0 and 1); and the third, the propensity to invest (assuming a growth period).
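
To see how an expenditure policy feeds through system (2), the recursion can be simulated directly. The parameter values, the starting values and the function name `simulate_keynes` below are illustrative assumptions, not figures from the lecture.

```python
def simulate_keynes(a, b, G, gdp_init, periods):
    """Iterate system (2) forward with constant government expenditure G."""
    gdp = list(gdp_init)                      # two starting values of GDP
    for _ in range(periods):
        C = a * gdp[-1]                       # C_t = a * GDP_{t-1}
        I = b * (gdp[-1] - gdp[-2])           # I_t = b * (GDP_{t-1} - GDP_{t-2})
        gdp.append(C + I + G)                 # GDP_t = C_t + I_t + G_t
    return gdp

a, b, G = 0.6, 0.3, 100.0                     # hypothetical parameters
path = simulate_keynes(a, b, G, gdp_init=(200.0, 200.0), periods=200)
steady_state = G / (1 - a)                    # level at which GDP stops changing

print("first simulated values:", [round(v, 2) for v in path[2:6]])
print("last value:", round(path[-1], 6), "steady state:", steady_state)
```

For these parameter values the path oscillates and settles at $G/(1-a)$; other values of $a$ and $b$ can produce explosive paths, so convergence is a property of the chosen numbers, not of the model in general.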

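It is hence convenient to rewrite model (2) as

$$\begin{pmatrix} 1 & -1 & -1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} GDP_t \\ C_t \\ I_t \end{pmatrix} + \begin{pmatrix} 0 & 0 & 0 \\ -a & 0 & 0 \\ -b & 0 & 0 \end{pmatrix} \begin{pmatrix} GDP_{t-1} \\ C_{t-1} \\ I_{t-1} \end{pmatrix} + \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ b & 0 & 0 \end{pmatrix} \begin{pmatrix} GDP_{t-2} \\ C_{t-2} \\ I_{t-2} \end{pmatrix} + \begin{pmatrix} -1 \\ 0 \\ 0 \end{pmatrix} G_t = 0 \tag{3}$$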


R ANDOMNESS

D YNAMICS AND DISTURBANCES

The dynamic model (3) is deterministic and does not reflect short-run disturbances.
If the whole dynamics has been correctly included in the initial specification, as in (3), these disturbances should be independent.
With random factors, we may rewrite model (2) as

$$\begin{cases} TD_t = C_t + I_t + G_t \\ GDP_t = TD_t \\ C_t = a\,GDP_{t-1} + u_t \\ I_t = b\,(GDP_{t-1} - GDP_{t-2}) + v_t \end{cases} \tag{4}$$

where, given the equilibrium condition, price adjustment ensures total demand = total supply, while the behaviours ($C_t$ and $I_t$) are clearly determined by factors other than just revenue. We therefore add error terms to these behavioural equations.


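Model (4) can be further written as

$$\begin{pmatrix} C_t \\ I_t \end{pmatrix} = \begin{pmatrix} a & a \\ b & b \end{pmatrix} \begin{pmatrix} C_{t-1} \\ I_{t-1} \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ -b & -b \end{pmatrix} \begin{pmatrix} C_{t-2} \\ I_{t-2} \end{pmatrix} + \begin{pmatrix} a \\ b \end{pmatrix} G_{t-1} + \begin{pmatrix} 0 \\ -b \end{pmatrix} G_{t-2} + \begin{pmatrix} u_t \\ v_t \end{pmatrix} \tag{5}$$

More compactly,

$$A_0 y_t + A_1 y_{t-1} + A_2 y_{t-2} + B_0 x_t + B_1 x_{t-1} + B_2 x_{t-2} + \mu = \varepsilon_t.$$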


D EFINITIONS

## D EFINITION (C ONTROL VAR .)

Exogenous variables that can be controlled by the policy maker (a.k.a. instruments, economic policy variables, decision variables).

D EFINITION (E NVIRONMENT VAR .)

Other exogenous variables, which have their own evolution and on which we cannot easily intervene.

It is hence possible to distinguish between the exogenous variables ($x_t$ environment, $z_t$ control) in the model

$$A_0 y_t + A_1 y_{t-1} + \dots + A_p y_{t-p} + B_0 x_t + B_1 x_{t-1} + \dots + B_p x_{t-p} + C_0 z_t + C_1 z_{t-1} + \dots + C_p z_{t-p} + \mu = \varepsilon_t.$$


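Suppose that the control variables are fixed and that the environment variables, which influence the endogenous variables, are determined before $y_t$. Therefore,

$$\begin{aligned} A_0 y_t + A_1 y_{t-1} + \dots + A_p y_{t-p} + B_0 x_t + B_1 x_{t-1} + \dots + B_p x_{t-p} + C_0 z_t + C_1 z_{t-1} + \dots + C_p z_{t-p} + \mu &= \varepsilon_t \\ x_t + D_1 x_{t-1} + \dots + D_p x_{t-p} + E_0 z_t + E_1 z_{t-1} + \dots + E_p z_{t-p} + F_1 y_{t-1} + \dots + F_p y_{t-p} + \nu &= u_t \end{aligned} \tag{6}$$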


## C ONTROL AND E NVIRONMENT VARIABLES

The control variables can have an impact on the endogenous variables or on the environment variables. However, they do not alter the way these variables are determined (i.e. they do not change the coefficients $A_j$, $B_j$, $D_j$, $F_j$).
The $x$ are exogenous because the $x_t$'s are fixed prior to the $y_t$'s ($F_0 = 0$ and $\mathrm{cov}(u_t, \varepsilon_t) = 0$).

WEAK EXOGENEITY

Model (6) is called block-recursive (determination of $x$ and then of $y$).
The recursive model (6) corresponds to weak exogeneity ($x_t$ may still depend on the lagged endogenous variables $y$).


A more restrictive assumption is that the $x_t$'s are determined autonomously (without any relationship to the lagged endogenous variables). This corresponds to imposing $F_j = 0$ for all $j$:

$$\begin{aligned} A_0 y_t + A_1 y_{t-1} + \dots + A_p y_{t-p} + B_0 x_t + B_1 x_{t-1} + \dots + B_p x_{t-p} + C_0 z_t + C_1 z_{t-1} + \dots + C_p z_{t-p} + \mu &= \varepsilon_t \\ x_t + D_1 x_{t-1} + \dots + D_p x_{t-p} + E_0 z_t + E_1 z_{t-1} + \dots + E_p z_{t-p} + \nu &= u_t \end{aligned} \tag{7}$$

The policy maker can intervene on the control variables (their value or their evolution) so as to affect the endogenous variables.

E XAMPLE
In the Keynesian model, the government can alter $G_t$ so as to influence the economy, e.g. maintain a constant level of expenditure,

$$G_t = G_{t-1},$$

or modify government expenditure according to the observed evolution of investment,

$$G_t - G_{t-1} = \theta\,(I_{t-1} - I_{t-2}).$$

This is how the values of the control variables are fixed in terms of the main aggregates. It can be expressed by adding a policy equation.

## D EFINITION (W ITH POLICY EQUATION )

$$A_0 y_t + A_1 y_{t-1} + \dots + A_p y_{t-p} + B_0 x_t + B_1 x_{t-1} + \dots + B_p x_{t-p} + C_0 z_t + C_1 z_{t-1} + \dots + C_p z_{t-p} + \mu = \varepsilon_t \tag{8}$$

$$x_t + D_1 x_{t-1} + \dots + D_p x_{t-p} + E_0 z_t + E_1 z_{t-1} + \dots + E_p z_{t-p} + F_1 y_{t-1} + \dots + F_p y_{t-p} + \nu = u_t \tag{9}$$

$$z_t + G_1 z_{t-1} + \dots + G_p z_{t-p} + H_1 x_{t-1} + \dots + H_p x_{t-p} + I_1 y_{t-1} + \dots + I_p y_{t-p} + \gamma = v_t \tag{10}$$

From equations (8) to (10), an additional recursiveness can be observed: determination of $z$, then of $x$, then of $y$.
However, the policy maker can only choose the values of the coefficients $G_j$, $H_j$, $I_j$; he has no influence on the other parameters of the model.


## VARIOUS F ORMS OF A D YNAMIC M ODEL

T HE STRUCTURAL FORM

The structural form corresponds, for example, to the initial equation:

$$A_0 y_t + A_1 y_{t-1} + \dots + A_p y_{t-p} + B_0 x_t + B_1 x_{t-1} + \dots + B_p x_{t-p} + \mu = \varepsilon_t. \tag{11}$$

Traditionally, $A_0$ is normalised to have unit elements along its main diagonal; equation (11) can then be rewritten as

$$y_t = -\mu + (I - A_0) y_t - A_1 y_{t-1} - \dots - A_p y_{t-p} - B_0 x_t - B_1 x_{t-1} - \dots - B_p x_{t-p} + \varepsilon_t, \tag{12}$$

where $(I - A_0)$ has zero elements on the main diagonal.
A system such as (12) can be difficult to interpret without additional constraints.

S IMULTANEITY

Simultaneity among the variables can be introduced through the coefficients of $A_0$ and through nonzero contemporaneous correlation among the elements of the vector $\varepsilon_t$.
While the simultaneity appearing in $A_0$ is easily interpretable in terms of equilibrium, the one appearing in $\mathrm{var}(\varepsilon_t)$ is not!

T HE REDUCED FORM

## D EFINITION (R EDUCED FORM )

Each endogenous variable is expressed as a function of the lagged endogenous variables, of the exogenous variables, and of the disturbance term, e.g.

$$y_t = -A_0^{-1}\big(A_1 y_{t-1} + \dots + A_p y_{t-p} + B_0 x_t + B_1 x_{t-1} + \dots + B_p x_{t-p} + \mu\big) + A_0^{-1}\varepsilon_t. \tag{13}$$

Model (13) can be written more compactly as

$$y_t = -A(0)^{-1}\big((A(L) - A(0))\,y_t + B(L)\,x_t + \mu\big) + A(0)^{-1}\varepsilon_t,$$

where $A(L) = A_0 + A_1 L + \dots + A_p L^p$ and $B(L) = B_0 + B_1 L + \dots + B_p L^p$.

C OMMENTS

The initial parameters are therefore transformed into summarised (reduced-form) parameters. These are sometimes easier to calculate or estimate, but...
Problem: do we really care about reduced-form estimates?
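
The passage from structural to reduced form can be illustrated with the Keynesian example, for which $\mu = 0$ and the structural matrices follow from equations (3) and (4). The numerical values of `a`, `b`, the lagged values and the shocks are hypothetical, and `reduced_form_step` is just an illustrative helper name.

```python
import numpy as np

a, b = 0.6, 0.3   # hypothetical behavioural parameters

# Structural matrices for y_t = (GDP_t, C_t, I_t)' and x_t = G_t.
A0 = np.array([[1.0, -1.0, -1.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
A1 = np.array([[0.0, 0.0, 0.0], [-a, 0.0, 0.0], [-b, 0.0, 0.0]])
A2 = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [b, 0.0, 0.0]])
B0 = np.array([-1.0, 0.0, 0.0])

# Reduced-form coefficients: premultiply by -A0^{-1} (A0 must be nonsingular).
A0_inv = np.linalg.inv(A0)
P1, P2, Q0 = -A0_inv @ A1, -A0_inv @ A2, -A0_inv @ B0

def reduced_form_step(y_lag1, y_lag2, G, eps):
    """Current endogenous variables from lags, G_t and the structural shocks."""
    return P1 @ y_lag1 + P2 @ y_lag2 + Q0 * G + A0_inv @ eps

y_lag2 = np.array([200.0, 120.0, 0.0])   # (GDP, C, I) at t-2
y_lag1 = np.array([220.0, 120.0, 0.0])   # (GDP, C, I) at t-1
eps = np.array([0.0, 1.5, -0.5])         # (0, u_t, v_t)
G_t = 100.0

y_t = reduced_form_step(y_lag1, y_lag2, G_t, eps)
print("GDP_t, C_t, I_t =", y_t)

# The solution satisfies the structural equations exactly.
structural_residual = A0 @ y_t + A1 @ y_lag1 + A2 @ y_lag2 + B0 * G_t
```

Note how the consumption and investment shocks both enter the GDP equation of the reduced form through $A_0^{-1}$, which is the sense in which reduced-form disturbances mix the structural ones.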

T HE FINAL FORM

## D EFINITION (T HE FINAL FORM )

The above expressions can be further transformed so as to express the current value of the endogenous variables $y_t$ as a function of the exogenous variables and of the disturbances $\varepsilon_\tau$, $\tau \le t$. This is the final form.
Provided all roots of $\det A(L)$ lie outside the unit circle,

$$y_t = -A(L)^{-1} B(L)\,x_t - A(L)^{-1}\mu + A(L)^{-1} A_0\,(A_0^{-1}\varepsilon_t),$$

which allows us to separate the influence of the exogenous variables from that of the disturbances on $y$.

C AUSALITY

Previous cases: we studied the distinction between endogenous and exogenous (control/environment) variables.
Now consider another approach: analysing the joint evolution of the various variables of interest and examining whether some of them are fixed before others.
It can be applied to the processes $\{x_t\}$ and $\{y_t\}$ (and may also be applied to the control variables $\{z_t\}$).


C AUSALITY

D EFINITIONS - CAUSALITY

## D EFINITION (G RANGER (1969))

Here $\underline{x}_{t-1}$ denotes the history of $x$ up to time $t-1$ (and similarly for $y$).

1. y causes x at time t iff

$$E(x_t \mid \underline{x}_{t-1}, \underline{y}_{t-1}) \neq E(x_t \mid \underline{x}_{t-1});$$

2. y causes x instantaneously at time t iff

$$E(x_t \mid \underline{x}_{t-1}, \underline{y}_{t}) \neq E(x_t \mid \underline{x}_{t-1}, \underline{y}_{t-1}).$$

D EFINITION - NONCAUSALITY

As a result of the properties of linear regression, the forecast-error variances satisfy

$$\mathrm{var}\big(\varepsilon(x_t \mid \underline{x}_{t-1}, \underline{y}_{t-1})\big) \le \mathrm{var}\big(\varepsilon(x_t \mid \underline{x}_{t-1})\big),$$

where $\varepsilon(x_t \mid \cdot)$ denotes the forecast error of $x_t$ given the stated information; we then have the following conditions.

D EFINITION (N ONCAUSALITY )
1. y does not cause x at time t iff

$$\mathrm{var}\big(\varepsilon(x_t \mid \underline{x}_{t-1}, \underline{y}_{t-1})\big) = \mathrm{var}\big(\varepsilon(x_t \mid \underline{x}_{t-1})\big);$$

2. y does not cause x instantaneously at time t iff

$$\mathrm{var}\big(\varepsilon(x_t \mid \underline{x}_{t-1}, \underline{y}_{t})\big) = \mathrm{var}\big(\varepsilon(x_t \mid \underline{x}_{t-1}, \underline{y}_{t-1})\big).$$

C OROLLARY (S YMMETRIC )
The following two statements are equivalent:
1. y does not cause x instantaneously at time t;
2. x does not cause y instantaneously at time t.
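
The variance comparison behind these definitions can be illustrated on simulated data. The sketch below is a simplification: the "past" is truncated to a single lag, the data-generating process is invented so that $y$ helps to predict $x$ but not the reverse, and `resid_var` is a hypothetical helper name. It compares in-sample residual variances only and is not a formal test.

```python
import numpy as np

def resid_var(target, regressors):
    """Residual variance from an OLS regression of target on a constant and regressors."""
    X = np.column_stack([np.ones(len(target))] + regressors)
    b, *_ = np.linalg.lstsq(X, target, rcond=None)
    e = target - X @ b
    return (e @ e) / len(target)

rng = np.random.default_rng(42)
n = 2000
x = np.zeros(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + rng.normal()                    # y ignores past x
    x[t] = 0.5 * x[t - 1] + 0.8 * y[t - 1] + rng.normal()   # x depends on past y

x_t, x_lag = x[1:], x[:-1]
y_t, y_lag = y[1:], y[:-1]

# Ratio of forecast-error variances: with the other variable's past / without it.
ratio_y_to_x = resid_var(x_t, [x_lag, y_lag]) / resid_var(x_t, [x_lag])
ratio_x_to_y = resid_var(y_t, [y_lag, x_lag]) / resid_var(y_t, [y_lag])

print(f"does y help predict x?  variance ratio = {ratio_y_to_x:.3f}")
print(f"does x help predict y?  variance ratio = {ratio_x_to_y:.3f}")
```

A ratio well below one indicates that the added history reduces the forecast-error variance (causality in the sense above), while a ratio close to one corresponds to noncausality.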


C AUSALITY R EVERSAL

The definitions of causality proposed are valid for any time t. In reality, for certain phenomena we could observe a causality reversal.
such reversals.

## D EFINITION (A BSENCE OF REVERSAL )

y does not cause x (instantaneously) iff y does not cause x (instantaneously) at time t for all possible times t.

L IMIT

The definitions of causality given above clearly involve conditions on the forecast error only.
It might hence be preferable to use terms such as predictability and instantaneous predictability instead of causality and instantaneous causality.
However, academia still uses the term causality.
One should therefore keep in mind that the previous definitions are sometimes not suitable for describing real-world phenomena.


## C AUSALITY AND VAR M ODELS

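Consider the VAR representation

$$\begin{pmatrix} \Phi_y(L) & \Phi_{yx}(L) \\ \Phi_{xy}(L) & \Phi_x(L) \end{pmatrix} \begin{pmatrix} y_t \\ x_t \end{pmatrix} = \begin{pmatrix} c_y \\ c_x \end{pmatrix} + \begin{pmatrix} \varepsilon_{yt} \\ \varepsilon_{xt} \end{pmatrix}, \tag{14}$$

where the usual conditions on the roots of the autoregressive characteristic polynomial are satisfied. We can hence choose a normalization of the type

$$\Phi(0) = \begin{pmatrix} \Phi_y(0) & \Phi_{yx}(0) \\ \Phi_{xy}(0) & \Phi_x(0) \end{pmatrix} = I.$$

In this case, all simultaneous links between the two processes are therefore summarized in the covariance $\mathrm{cov}(\varepsilon_{yt}, \varepsilon_{xt})$.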


When the process $(x, y)$ is stationary, the definitions for a given date and for all dates therefore coincide.
