
INTRODUCTORY FINANCIAL ECONOMETRICS

Review of Econometric Theory

3 Credits, 51 Hours

Jianhua Gang
School of Finance
Renmin University of China
Spring 2013

Readings: Wooldridge, Ch.2

REGRESSION ANALYSIS

Regression analysis generalizes the idea of a random sample of n independently and identically distributed (i.i.d.) observations from N(μ, σ²). It studies the relationship between a variable of interest (the dependent variable, explained variable, or regressand) and one or more other variables (independent variables, explanatory variables, or regressors).

What are estimation, prediction (forecasting), and fitting?

Consider a sample y1, ..., yn in which each yi is normally distributed with variance σ², but with conditional mean governed by

    E(yi) = α + β xi,   i = 1, ..., n,

where
1. α is the intercept;
2. β is the slope.

The term xi varies with i, but is not random (nonstochastic, fixed in repeated sampling).

What is sampling?

If ui = yi − (α + β xi) denotes the error (or disturbance term), then we can write the simple regression model as:

    yi = α + β xi + ui,   ui ~ NID(0, σ²),   i = 1, ..., n,        (1)

where
1. the intercept α is the mean of y when xi = 0 (this interpretation of the intercept is not always sensible in economic applications);
2. the slope β is the change in the mean of y when xi increases by one unit.

The assumption of nonstochastic regressors is inappropriate in many applications in economics and it is relaxed later.

It is more useful to think of the classical assumptions as being appropriate when we condition on the values of x1, ..., xn. Thus, conditional upon the values of x1, ..., xn, the yi are independent normal variables with means α + β xi and common constant variance σ² for i = 1, ..., n.

ESTIMATION OF PARAMETERS

To estimate the parameters of model (1), the following general approaches to estimating α, β and σ² are considered: method of moments (MM); ordinary least squares (OLS); and maximum likelihood estimation (MLE).

The model implies the population moment conditions

    E(ui) = 0,
    E(xi ui) = 0,
    E(ui² − σ²) = 0.

The estimates are denoted α̂ and β̂, with associated residuals

    ûi = yi − (α̂ + β̂ xi),   i = 1, ..., n.

Replacing E(.) by the sample average n⁻¹ Σ(.), and ui by ûi, the equations defining the MM estimates are:

    Σi ûi = Σi [yi − (α̂ + β̂ xi)] = 0,
    Σi xi ûi = Σi xi [yi − (α̂ + β̂ xi)] = 0.        (2)

OLS: choose estimates α̂ and β̂ to get the "best fit" in the sense of minimizing

    S(α, β) = Σi [yi − (α + β xi)]².

The first-order conditions

    ∂S(α̂, β̂)/∂α = 0   and   ∂S(α̂, β̂)/∂β = 0

reproduce the normal equations Σi ûi = 0 and Σi xi ûi = 0 (where ûi is an OLS residual). It is clear that the normal equations imply that the OLS estimates of α and β are equal to the corresponding MM estimates above. The OLS estimators are best linear unbiased and asymptotically normally distributed.

The solutions α̂ and β̂ which minimize the objective function S(α, β) are:

    β̂ = Σi (xi − x̄)(yi − ȳ) / Σi (xi − x̄)²,        (3)
    α̂ = ȳ − β̂ x̄.
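Formulas (2) and (3) can be checked numerically. A minimal sketch with numpy, using simulated data purely for illustration (the parameter values and sample size are arbitrary assumptions, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
u = rng.normal(scale=0.5, size=n)
alpha_true, beta_true = 1.0, 2.0
y = alpha_true + beta_true * x + u

# OLS estimates from formula (3)
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()

# residuals: should satisfy the normal equations (2) up to rounding error
u_hat = y - (alpha_hat + beta_hat * x)
```

The two sums Σ ûi and Σ xi ûi are zero by construction, which is exactly the content of the normal equations.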

Under the normality assumption,

    f(yi) = (2πσ²)^(−1/2) exp{−[yi − (α + β xi)]² / (2σ²)},   for all i,

so the likelihood function is

    f(y1, ..., yn) = Πi f(yi) = L.

Maximizing L over (α, β) is equivalent to minimizing S(α, β), so the MLE of (α, β) equals OLS. The MLE of σ² is

    σ̂² = n⁻¹ Σi ûi² = MM estimate.

Let ŷi = (α̂ + β̂ xi) denote a typical OLS predicted value; the normal equations for OLS then yield several results:

    Σi yi² = Σi (ŷi + ûi)² = Σi ŷi² + Σi ûi² + 2 Σi ŷi ûi = Σi ŷi² + Σi ûi² + 0.

GOODNESS OF FIT

The log-likelihood is

    l(α, β, σ²) = −(n/2) ln(2πσ²) − Σi [yi − (α + β xi)]² / (2σ²).

Define the Total Sum of Squares TSS = Σi (yi − n⁻¹ Σi yi)², the Explained Sum of Squares ESS = Σi (ŷi − ȳ)², and the Residual Sum of Squares RSS = Σi ûi². Since Σi ŷi ûi = 0, TSS = ESS + RSS.

The coefficient of determination is an index of the goodness of fit of the OLS line, with

    R² = ESS/TSS = 1 − RSS/TSS,   0 ≤ R² ≤ 1.

In the simple regression model, R² = r²XY, where rXY is the sample correlation coefficient between x and y.
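The decomposition TSS = ESS + RSS and the identity R² = r²XY can both be verified numerically. A sketch on simulated data (the data-generating values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=n)
y = 0.5 + 1.5 * x + rng.normal(size=n)

# OLS fit
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()
y_fit = alpha_hat + beta_hat * x
u_hat = y - y_fit

# sums of squares and R^2
tss = np.sum((y - y.mean()) ** 2)
rss = np.sum(u_hat ** 2)
ess = np.sum((y_fit - y.mean()) ** 2)
r2 = 1.0 - rss / tss

r_xy = np.corrcoef(x, y)[0, 1]  # sample correlation coefficient between x and y
```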

For the classical normal simple regression model, α̂ and β̂ are jointly normally distributed with

    E(α̂) = α,   E(β̂) = β,

and they are best linear unbiased even when the errors ui are not normally distributed. Their second moments are

    Var(β̂) = σ² / Σi (xi − x̄)²,
    Var(α̂) = σ²/n + x̄² Var(β̂),
    Cov(α̂, β̂) = −x̄ Var(β̂).

It can be shown that, in the classical normal simple regression model,

    β̂ = β + Σi wi ui   and   α̂ = α + Σi zi ui,

where the weights wi and zi depend only on the regressor values, e.g. wi = (xi − x̄)/Σj (xj − x̄)². Moreover,

    Σi ûi² = RSS ~ σ² χ²(n − 2),

and RSS is independent of α̂ and β̂. The quantity (n − 2) is the sample size minus the number of regression parameters estimated to derive the residuals and is called the degrees of freedom parameter for the regression.

STATISTICAL INFERENCE

Hence,

    E(Σi ûi²) = σ²(n − 2),

so the estimator

    s² = Σi ûi² / (n − 2)

is unbiased. The ML estimator σ̂² = [(n − 2)/n] s², however, is biased (noticeably so when the sample size is relatively small).

The classical model is a probabilistic model for y1, ..., yn. We make the following assumptions.

A1 There exist constants α and β such that E(yi) = α + β xi for all i;
A2 The xi are nonstochastic, with S = Σi (xi − x̄)² > 0; for asymptotic results it is convenient to assume 0 < lim n⁻¹ S < ∞;
A3 Let ui = yi − E(yi): common variance (homoskedasticity), var(ui) = σ² for all i. If the ui do not have the same variance, we have heteroskedasticity;
A4 Let ui = yi − E(yi): uncorrelated disturbances, so E(ui uj) = 0 if i ≠ j. If we have time-series data and this assumption is false, then we say we have autocorrelation (or serial correlation);
A5 Let ui = yi − E(yi): normally distributed disturbances (so that A4 implies independence).

α̂ and β̂ are N(α, var(α̂)) and N(β, var(β̂)), respectively, so that

    z(α̂) = (α̂ − α)/√var(α̂) ~ N(0, 1),
    z(β̂) = (β̂ − β)/√var(β̂) ~ N(0, 1).

RSS = Σi ûi² ~ σ² χ²(n − 2) independently of α̂ and β̂, so RSS/σ² ~ χ²(n − 2) independently of z(α̂) and z(β̂), and therefore

    t(α̂) = z(α̂) / √(RSS / ((n − 2)σ²)) ~ t(n − 2),
    t(β̂) = z(β̂) / √(RSS / ((n − 2)σ²)) ~ t(n − 2).

Writing s² = RSS/(n − 2) and SXX = Σi (xi − x̄)²,

    t(β̂) = z(β̂) / √(RSS / ((n − 2)σ²)) = (β̂ − β) / √(var(β̂)(s²/σ²)) ~ t(n − 2),

since

    var(β̂)(s²/σ²) = (σ²/SXX)(s²/σ²) = s²/SXX.

The quantity s²/SXX estimates var(β̂), and the square root of this quantity is called the estimated standard error, denoted by

    SE(β̂) = √(var(β̂)(s²/σ²)) = √(s²/SXX) ≈ √var(β̂)   (when n is big).

Hence,

    t(β̂) = (β̂ − β)/SE(β̂) ~ t(n − 2),

and similarly

    t(α̂) = (α̂ − α)/SE(α̂) ~ t(n − 2).

Consider the null hypothesis that restricts one of the regression parameters, e.g. H0: β = β0, where β0 is some specified constant.

Let d1 satisfy prob(−d1 ≤ t(n − 2) ≤ d1) = (1 − a), where a denotes the significance level. Since

    t(β̂) = (β̂ − β)/SE(β̂) ~ t(n − 2),

the (1 − a) × 100 per cent confidence intervals for β and α are given by

    β̂ ± d1 SE(β̂)   and   α̂ ± d1 SE(α̂),

respectively, and so if H0 is true,

    t0(β̂) = (β̂ − β0)/SE(β̂) ~ t(n − 2).

The rejection rule depends upon the nature of the alternative hypothesis and the prespecified significance level, denoted by a:

1. Against H1: β ≠ β0, reject H0 if |t0(β̂)| > d1, where prob(t(n − 2) > d1) = a/2;
2. Against H1+: β > β0, reject H0 if t0(β̂) > d2, where prob(t(n − 2) > d2) = a;
3. Against H1−: β < β0, reject H0 if t0(β̂) < −d2, where prob(t(n − 2) < −d2) = a.

Just replace β by α and β̂ by α̂ in the above to obtain test procedures for α (the intercept).

STOCHASTIC REGRESSORS

Suppose that x, like y, is a random variable. The results above can now be regarded as being derived conditional upon the values x1, ..., xn.
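The two-sided t-test and confidence interval above can be sketched with numpy and scipy.stats (simulated data; the null H0: β = 0 and the 5% level are illustrative choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 60
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

# OLS fit and unbiased variance estimator s^2
sxx = np.sum((x - x.mean()) ** 2)
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / sxx
alpha_hat = y.mean() - beta_hat * x.mean()
u_hat = y - alpha_hat - beta_hat * x
s2 = np.sum(u_hat ** 2) / (n - 2)
se_beta = np.sqrt(s2 / sxx)                 # SE(beta_hat)

# test H0: beta = 0 against H1: beta != 0 at significance level a
beta0, a = 0.0, 0.05
t0 = (beta_hat - beta0) / se_beta           # ~ t(n-2) under H0
d1 = stats.t.ppf(1 - a / 2, df=n - 2)       # two-sided critical value
reject = abs(t0) > d1
ci = (beta_hat - d1 * se_beta, beta_hat + d1 * se_beta)
```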

E(α̂|x1, ..., xn) = α, E(β̂|x1, ..., xn) = β and E(s²|x1, ..., xn) = σ². These expectations do not depend upon the x values, and so the OLS estimators are unconditionally unbiased. Similar remarks apply to probability limits.

var(α̂|x1, ..., xn), var(β̂|x1, ..., xn) and cov(α̂, β̂|x1, ..., xn), as given above, do depend on the x values, and so do not correspond to unconditional characteristics.

Fortunately, this does not pose major problems for inference. The variables (α̂ − α)/SE(α̂) and (β̂ − β)/SE(β̂) are, given the x values, still both distributed as t(n − 2). This distribution does not depend on the x values, but just on the value of (n − 2). Hence the t-tests and confidence intervals described above are unconditionally valid.

The key requirement is that the conditional results hold whether or not we condition on the x values, i.e. the regressor values and error terms are statistically independent.

These assumptions can be weakened, but we cannot then expect to get results that are exact, i.e. valid for finite sample sizes, and we often have to resort to asymptotically valid results in practical situations.

PREDICTION

Suppose we wish to predict yf (an out-of-sample value, f outside the sample), with xf known and assuming the data generation process for y is unchanged, so that

    yf = α + β xf + uf,   uf ~ N(0, σ²).

The natural predictor is ŷf = α̂ + β̂ xf, where the OLS estimators use the data for i = 1, ..., n. This predictor is BLUE for E(yf) = α + β xf.

The predictor ŷf is a linear combination of the OLS estimators and so is normally distributed. The variance of ŷf can be estimated, and confidence intervals and tests of hypotheses are feasible.

The forecast error is (yf − ŷf) = uf − [(α̂ − α) + (β̂ − β) xf], which has zero expectation, given that OLS is unbiased and E(uf) = 0.

The forecast error is normally distributed, being a linear combination of three normal variates, and has a variance that can be estimated. Confidence intervals and tests of hypotheses, e.g. H0: E(yf − ŷf) = 0, are feasible.
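A sketch of the point prediction and a prediction interval. The notes only state that the forecast-error variance can be estimated; the explicit formula used below, var(yf − ŷf) = σ²[1 + 1/n + (xf − x̄)²/SXX], is the standard textbook expression for the simple regression case, and the data and xf value are illustrative assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 80
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.8, size=n)

# OLS fit
sxx = np.sum((x - x.mean()) ** 2)
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / sxx
alpha_hat = y.mean() - beta_hat * x.mean()
s2 = np.sum((y - alpha_hat - beta_hat * x) ** 2) / (n - 2)

x_f = 1.5                                   # known out-of-sample regressor value
y_f_hat = alpha_hat + beta_hat * x_f        # BLUE point prediction

# textbook estimate of var(y_f - y_f_hat) for simple regression
var_fe = s2 * (1.0 + 1.0 / n + (x_f - x.mean()) ** 2 / sxx)
d1 = stats.t.ppf(0.975, df=n - 2)
interval = (y_f_hat - d1 * np.sqrt(var_fe), y_f_hat + d1 * np.sqrt(var_fe))
```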

THE MULTIPLE REGRESSION MODEL

READING: Wooldridge, Ch.3, 4

We have a sample of n independent observations y1, ..., yn, each of which is normally distributed with variance σ², but with means that vary according to

    E(yi) = α + β1 x1i + ... + βk xki = α + Σj βj xji,   i = 1, ..., n,

where α and the βj are parameters/coefficients.

The regressors xji vary with i, but are nonrandom (nonstochastic, i.e. fixed in repeated sampling).

α can be regarded as an intercept, with α = E(yi), given all xji = 0.

The slopes βj can often be regarded as partial derivatives: βj = ∂E(yi)/∂xji, holding constant every other regressor; so interpretations vary with context.

The following assumptions are made in the classical normal multiple regression model:

A1 There exist constants α and βj such that E(yi) = α + Σj βj xji for all i;
A2 The regressors xji are nonstochastic (fixed in repeated sampling).

Assumption A2 is often too restrictive for economic applications in which some regressors are probably better regarded as random, rather than fixed in repeated sampling.

The remaining assumptions of the classical normal multiple regression model:

A3 No regressor is just a linear combination of the other regressors and the intercept term;
A4 Common variance (homoskedasticity): var(ui) = σ² for all i. If the ui do not have the same variance, we have heteroskedasticity;
A5 Uncorrelated disturbances, so E(ui uj) = 0 if i ≠ j. If we have time-series data and this assumption is false, then we say we have autocorrelation (serial correlation);
A6 Normally distributed disturbances (so that A5 implies independence).

Randomness of regressors can be handled by thinking about the conditional distribution of yi, holding the values xji (i = 1, ..., n; j = 1, ..., k) constant. Having derived results for the conditional model, we can see which of them will apply to the unconditional model for y.

For the former model, we have that, given the values of the regressors, the variates yi are independent with conditional distributions N(α + Σj βj xji, σ²) for i = 1, ..., n.

Therefore, MM estimators, denoted by ^, can be derived from

    Σi ûi = 0,
    Σi xji ûi = 0,   for j = 1, ..., k,

where ûi is the residual yi − (α̂ + Σj β̂j xji), i = 1, ..., n. From E(ui² − σ²) = 0, it follows that

    σ̂² = n⁻¹ Σi ûi².

OLS: choose estimates to minimize

    S(α, β1, ..., βk) = Σi [yi − (α + Σj βj xji)]².

The first-order conditions are again Σi ûi = 0 and Σi xji ûi = 0 for j = 1, ..., k, so the OLS and MM estimates coincide.

As in the simple regression model, it can be shown that the log-likelihood function is given by

    l(α, β1, ..., βk, σ²) = −(n/2) ln(2πσ²) − S(α, β1, ..., βk) / (2σ²).

Maximizing over the regression coefficients minimizes S(α, β1, ..., βk), and so OLSE = MLE. The OLS estimators are best linear unbiased estimators (BLUE) of α and βj, j = 1, ..., k, even when the errors ui are not normally distributed.

The MLE of σ² is RSS/n, where RSS = Σi ûi² is the OLS residual sum of squares.

Let ŷi = α̂ + Σj β̂j xji denote a typical OLS predicted value. The normal equations imply

    Σi ŷi ûi = Σi (α̂ + Σj β̂j xji) ûi = 0,

so that

    Σi yi² = Σi (ŷi + ûi)² = Σi ŷi² + Σi ûi²,

and, subtracting the mean terms from both sides,

    Σi (yi − n⁻¹ Σi yi)² = Σi (ŷi − n⁻¹ Σi ŷi)² + Σi ûi²,

i.e. Total Sum of Squares (TSS) = Explained Sum of Squares (ESS) + Residual Sum of Squares (RSS).

GOODNESS OF FIT

The coefficient of determination R² is an index of the goodness of fit of the OLS line, with R² = ESS/TSS = 1 − RSS/TSS, 0 ≤ R² ≤ 1.

Adding a regressor cannot make RSS rise, so R² cannot fall (it is a monotonic function of the number of parameters), but the adjusted coefficient R̄² can; it is defined by R̄² = 1 − {[RSS/(n − k − 1)] / [TSS/(n − 1)]}. This index can be negative.

The OLS estimates satisfy

    α̂ = ȳ − Σj β̂j x̄j,
    β̂j = Σi x̃ji yi / Σi x̃ji²,

where x̃ji is the ith residual from the OLS regression of the jth regressor on the other (k − 1) regressors and the intercept term.
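The partialled-out formula for β̂j (the Frisch-Waugh result) can be checked against a full multiple regression. A sketch on simulated correlated regressors (the coefficient values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 150
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)          # deliberately correlated with x1
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

# full OLS: coefficients [alpha, beta1, beta2]
X = np.column_stack([np.ones(n), x1, x2])
coef = np.linalg.lstsq(X, y, rcond=None)[0]

# partial out: residual of x1 from its regression on the intercept and x2
Z = np.column_stack([np.ones(n), x2])
x1_tilde = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]

# beta1 via the partialled-out formula
b1_fwl = np.sum(x1_tilde * y) / np.sum(x1_tilde ** 2)
```

The two routes to β̂1 agree to machine precision, which is the content of the formula above.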

α̂ ~ N(α, var(α̂)). For the slopes,

    β̂j = βj + Σi x̃ji ui / Σi x̃ji² = βj + Σi x̃ji ui / RSSj,

so each β̂j is a linear combination of the disturbances and is therefore normally distributed.

Moreover,

    Σi ûi² = RSS ~ σ² χ²(n − k − 1),

independently of α̂ and the β̂j for all j. The quantity (n − k − 1) is the sample size minus the number of regression parameters estimated to derive the residuals and is called the degrees of freedom parameter for the regression.

E(Σi ûi²) = σ²(n − k − 1), and so the estimator s² = [1/(n − k − 1)] Σi ûi² is unbiased. The MLE estimator RSS/n, however, is biased (noticeably so when the sample size is relatively small).

Also,

    E(β̂j) = βj,   var(β̂j) = σ²/RSSj,   j = 1, ..., k,

where RSSj is the residual sum of squares from the OLS estimation of the auxiliary regression of the jth regressor on the other (k − 1) regressors and the intercept term.

α̂ and β̂j are N(α, var(α̂)) and N(βj, var(β̂j)), respectively, so that

    z(α̂) = (α̂ − α)/√var(α̂) ~ N(0, 1),
    z(β̂j) = (β̂j − βj)/√var(β̂j) ~ N(0, 1).

RSS = Σi ûi² ~ σ² χ²(n − k − 1), independently of z(α̂) and the z(β̂j), so

    t(β̂j) = z(β̂j)/√([RSS/(n − k − 1)]/σ²) ~ t(n − k − 1),
    t(α̂) = z(α̂)/√([RSS/(n − k − 1)]/σ²) ~ t(n − k − 1).

CONFIDENCE INTERVALS

Writing s² = RSS/(n − k − 1),

    t(β̂j) = z(β̂j)/√(s²/σ²) = (β̂j − βj)/√(var(β̂j)(s²/σ²)),

where var(β̂j)(s²/σ²) = s²/RSSj is the natural estimator of var(β̂j); the square root of this quantity is called the estimated standard error, SE(β̂j).

Hence,

    t(β̂j) = (β̂j − βj)/SE(β̂j) ~ t(n − k − 1),

and similarly

    t(α̂) = (α̂ − α)/SE(α̂) ~ t(n − k − 1).

With prob(−d1 ≤ t(n − k − 1) ≤ d1) = (1 − a), the (1 − a) × 100 per cent confidence intervals for α and βj are given by α̂ ± d1 SE(α̂) and β̂j ± d1 SE(β̂j), respectively.

Consider a null hypothesis that restricts one of the regression parameters, e.g. H0: βj = βj0 (some specified constant), and so if H0 is true, t0(β̂j) = (β̂j − βj0)/SE(β̂j) ~ t(n − k − 1).

The rejection rule depends upon the nature of the alternative hypothesis and the prespecified significance level, denoted by a:

1. Against H1: βj ≠ βj0, reject H0 if |t0(β̂j)| > d1, where prob(t(n − k − 1) > d1) = a/2;
2. Against H1+: βj > βj0, reject H0 if t0(β̂j) > d2, where prob(t(n − k − 1) > d2) = a;
3. Against H1−: βj < βj0, reject H0 if t0(β̂j) < −d2, where prob(t(n − k − 1) < −d2) = a.

Just replace βj by α and β̂j by α̂ in the above to obtain test procedures relevant to testing hypotheses concerning the intercept.

EXAMPLE

Suppose that the null hypothesis to be tested is denoted by H0 and consists of several linear restrictions on the parameters of the regression model. Thus H0 specifies the values of, say, q < (k + 1) linear combinations of the regression coefficients. For example, with k = 4 and q = 3, H0 could consist of the following restrictions: α + β1 = 0; β2 = 1; and β4 = 0. We now need a joint test of all the restrictions of H0, rather than a collection of separate t-tests.

Let RSS(H0) be the RSS obtained by applying OLS after imposing the restrictions of H0. In the example of the previous note, RSS(H0) is derived by applying OLS to the restricted model:

    (yi − x2i) = β1 (x1i − 1) + β3 x3i + ui.

Let RSS(H1) be the RSS obtained by applying OLS to the unrestricted model. In the previous example, RSS(H1) is derived by regressing yi on the intercept and all regressors xji, j = 1, ..., 4, without restrictions.

DEFINITION

Define the F statistic by the following equation:

    F = { [RSS(H0) − RSS(H1)] / q } / { RSS(H1) / (n − k − 1) },

where the denominator degrees of freedom come from the unrestricted model, i.e. df(H1) = (n − k − 1).

The null hypothesis is regarded as inconsistent with the data if the sample (observed) value of F is significantly large, i.e. the test is one-sided.

PREDICTION

Suppose we wish to predict yf (an out-of-sample value), with the xjf known and it being assumed that the data generation process (DGP) for y is unchanged, so that

    yf = α + Σj βj xjf + uf,   uf ~ N(0, σ²).

The natural predictor is ŷf = α̂ + Σj β̂j xjf, where the OLS estimators use the data for i = 1, ..., n. This predictor is BLUE for E(yf) = α + Σj βj xjf.

The predictor ŷf is a linear combination of the OLS estimators and so is normally distributed. The variance of ŷf can be estimated, and confidence intervals and tests of hypotheses are feasible.
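The F statistic above can be sketched by fitting restricted and unrestricted models and comparing their residual sums of squares. Here the restriction tested is the simple exclusion H0: β3 = 0 (q = 1), and the data are simulated so that H0 is in fact true; all numbers are illustrative assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, k = 120, 3
X = rng.normal(size=(n, k))
# the DGP has beta3 = 0, so the single restriction H0: beta3 = 0 holds
y = 0.5 + 1.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=n)

def rss(Xmat, yv):
    """OLS residual sum of squares for a given design matrix."""
    b = np.linalg.lstsq(Xmat, yv, rcond=None)[0]
    e = yv - Xmat @ b
    return float(e @ e)

X1 = np.column_stack([np.ones(n), X])          # unrestricted model
X0 = np.column_stack([np.ones(n), X[:, :2]])   # restricted model: x3 dropped
q = 1
rss1, rss0 = rss(X1, y), rss(X0, y)

F = ((rss0 - rss1) / q) / (rss1 / (n - k - 1))
crit = stats.f.ppf(0.95, q, n - k - 1)         # 5% critical value of F(q, n-k-1)
```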

The forecast error is

    (yf − ŷf) = uf − [(α̂ − α) + Σj (β̂j − βj) xjf].

The forecast error is normally distributed, being a linear combination of normal variates, and has a variance that can be estimated. Confidence intervals and tests of hypotheses, e.g. H0: E(yf − ŷf) = 0, are feasible.

READING: Wooldridge, Ch.3.

MULTICOLLINEARITY

The precision achieved in estimating the individual regression parameters depends, in part, upon the intercorrelations between the regressors.

Let R²j denote the R² statistic from the OLS estimation of the auxiliary regression of the jth regressor on the other regressors and the intercept term. Then

    var(β̂j) = σ²/RSSj = σ² / [Σi (xji − x̄j)² (1 − R²j)].

Since it has been assumed that no regressor is a linear combination of the other regressors and the intercept term, it follows that R²j < 1 for all j.

If R²j is close to 1 for some j, then we have a high degree of multicollinearity, which produces large values of sampling variances and weak tests of hypotheses. Other things equal, var(β̂j) is large when there is a high degree of multicollinearity, or small when the regressors are uncorrelated.
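The variance-inflation effect of R²j can be illustrated directly; the factor 1/(1 − R²j) is often called the variance inflation factor (VIF), a term not used in the notes but standard. The collinearity strength in the simulation is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.1 * rng.normal(size=n)   # x2 nearly collinear with x1

def r2_aux(target, others):
    """R^2_j from regressing one regressor on the others plus an intercept."""
    Z = np.column_stack([np.ones(len(target))] + others)
    b = np.linalg.lstsq(Z, target, rcond=None)[0]
    e = target - Z @ b
    return 1.0 - (e @ e) / np.sum((target - target.mean()) ** 2)

r2_1 = r2_aux(x1, [x2])
vif_1 = 1.0 / (1.0 - r2_1)   # variance inflation factor for x1
```

With this design r2_1 is close to 1, so var(β̂1) is inflated by a large factor relative to the uncorrelated-regressor case.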

Multicollinearity may make the data relatively uninformative about individual coefficients, but nonetheless no assumptions of the classical multiple regression model have been violated. Therefore, provided multicollinearity is not perfect, the OLS estimators are BLUE and MLE. Similarly, the standard test procedures are valid and retain optimality properties relative to other tests.

Klein proposes the rule of thumb that multicollinearity is a "problem" if maxj R²j > R².

When trying to assess multicollinearity, it is not sufficient to look only at pairwise correlations between regressors (there might be more complex relationships among several regressors, possibly stochastic).

Multicollinearity is not a hypothesis about the model, so we cannot test for it. Some measures of multicollinearity have been proposed, but they are open to objection, and the R²j statistics are simple to calculate and interpret.

Models can be reparameterized to make transformed regressors uncorrelated, but the transformed parameters may have no economic interest.

As noted above, multicollinearity can lead to large variances and weak tests, e.g. we might have every individual slope estimate being insignificant (as indicated by a t-test), but a highly significant F statistic for the hypothesis that all slopes equal zero. It can also produce unstable estimates when there are small changes in the data.

Various "treatments" have been described, e.g. drop some variables, use first differences, use outside estimates of some coefficients. These treatments usually introduce new problems, e.g. dropping an insignificant, but relevant, variable will lead to a biased estimator in the amended model. Imposing invalid restrictions is not a good strategy. We may also have to wait for more data.

READING: Wooldridge, Ch.3, Ch.7, Ch.9.

REVIEW TOPIC 4: THE MEAN FUNCTION, CONSEQUENCES OF MISSPECIFICATION

Suppose there exist constants α and β1, ..., βk such that the conditional mean is given by

    E(yi | xji, j = 1, ..., k) = α + Σj βj xji.

CASE 1. We may have included irrelevant regressors, i.e. some βj equal zero. The OLS estimators are still unbiased and consistent, but no longer efficient (they fail to use the valid information that corresponds to some coefficients being zero).

CASE 2. We may have omitted some relevant regressors. Write the conditional mean function as E(yi | xji, j = 1, ..., k) = α + Σj βj xji + E(fi | xji, j = 1, ..., k), where fi denotes the omitted factors. In general the OLS estimators of the regression parameters α and βj are biased and inconsistent. The estimator s² is biased and inconsistent, and the standard t- and F-tests are no longer valid.

CASE 3. If we have a strong belief about the omitted factor, we can use a precise test. For example, if sure that fi is a linear combination of q variables zji, we can apply an F-test of H0: γ1 = ... = γq = 0 in the expanded model

    yi = α + Σj βj xji + Σj γj zji + ui,   ui ~ NID(0, σ²),

where the added variables correspond to parameters of economic interest. Otherwise we can use the "parsimonious" RESET test. In this test, fit the null model

    yi = α + Σj βj xji + ui,   ui ~ NID(0, σ²),

by OLS and obtain the predicted values ŷi, i = 1, ..., n. The unrestricted model of the alternative hypothesis is then written as

    yi = α + Σj βj xji + Σ_{j=1}^{q} γj (ŷi)^{j+1} + ui,   ui ~ NID(0, σ²),

and H0: γ1 = ... = γq = 0 is tested.

Notes:
1. No ŷi term is included because this is a linear combination of the intercept term and the regressors xji;
2. The F-test is valid even though the added variables are random;
3. The choice of q has an impact on power;
4. There is no rule for determining the best value of q;
5. Quite small values of q are often used, e.g. 1 or 2;
6. We cannot expect RESET to indicate how a model should be re-specified;
7. We cannot assume RESET will always have high power.

STRUCTURAL STABILITY

Split the sample into two subsamples Ω1 and Ω2, where Ω1 contains n1 observations and Ω2 contains n2 = n − n1 observations. The unrestricted model of the alternative hypothesis is then written as

    yi = α1 + Σj βj1 xji + ui,   ui ~ NID(0, σ²),   if i ∈ Ω1,
    yi = α2 + Σj βj2 xji + ui,   ui ~ NID(0, σ²),   if i ∈ Ω2.
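The RESET procedure described above can be sketched as an F-test comparing the null model with the model augmented by powers of ŷi. The data are simulated with a quadratic term so that the linear null model is genuinely misspecified; the functional form and q = 2 are illustrative assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 150
x = rng.normal(size=n)
# true DGP is nonlinear, so the linear null model omits a relevant term
y = 1.0 + 0.8 * x + 0.5 * x ** 2 + rng.normal(scale=0.3, size=n)

def ols(Xmat, yv):
    b = np.linalg.lstsq(Xmat, yv, rcond=None)[0]
    return b, yv - Xmat @ b

# step 1: fit the null (linear) model, keep fitted values
X0 = np.column_stack([np.ones(n), x])
b0, e0 = ols(X0, y)
y_fit = X0 @ b0

# step 2: augment with yhat^2, yhat^3 (q = 2) and test their joint significance
q = 2
X1 = np.column_stack([X0, y_fit ** 2, y_fit ** 3])
b1, e1 = ols(X1, y)

k1 = X1.shape[1]
F = ((e0 @ e0 - e1 @ e1) / q) / ((e1 @ e1) / (n - k1))
p_value = 1.0 - stats.f.cdf(F, q, n - k1)
```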

Let RSSs denote the residual sum of squares (RSS) for the OLS regression of yi on the intercept term and the xji using only the observations for Ωs, s = 1, 2, and let RSS denote the residual sum of squares for this OLS regression using all n observations. H0 (equal coefficients in the two subsamples) can be tested using the F statistic

    F = { [RSS − (RSS1 + RSS2)] / (k + 1) } / { (RSS1 + RSS2) / (n − 2k − 2) },

which is F(k + 1, n − 2k − 2) under H0 (the model fitted on all n observations is the restricted version of the unrestricted model). Note the maintained assumption of homoskedasticity across the two subsamples.

When n2 is small, use instead the predictive failure test. Test the n2 restrictions

    E[yi − (α̃ + Σj β̃j xji)] = 0,   for i ∈ Ω2,

where ˜ denotes an estimator based upon the observations in Ω1 only. The null hypothesis of constant coefficients consists of the (k + 1) restrictions of equal intercepts and slopes across the subsamples, and the F-statistic is

    F = { [RSS − RSS1] / n2 } / { RSS1 / (n1 − k − 1) },

which is F(n2, (n1 − k − 1)) when the model is stable. Note that the restrictions may be satisfied even though H0 is false, while a large value of F indicates the inconsistency of H0 with the data.

TREATMENT

The treatment for a misspecified mean function is a re-specification of the mean function.

READING: Wooldridge, Ch.5.

NON-NORMAL DISTURBANCES

CONSEQUENCES

Suppose

    yi = α + Σj βj xji + ui,   i = 1, ..., n,

where the ui are independently and identically distributed (i.i.d.) with zero mean and variance σ² < ∞, but the common distribution is NOT normal.

The OLS estimators are still BLUE, but, in general, are NOT normally distributed. Therefore the t and F tests are no longer valid in finite samples.

The Central Limit Theorem can be used to show that the OLS estimators are asymptotically normally distributed, implying that t and F tests of linear restrictions on regression coefficients are asymptotically valid. The usual confidence intervals are also asymptotically valid.

The prediction error test is, however, not asymptotically valid. Since the normal-based MLE maximizes the wrong likelihood function, it does not produce asymptotically efficient estimators.

TEST PROCEDURES

Under normality, E(ui³) = 0 and E(ui⁴) − 3σ⁴ = 0. If a typical OLS residual is denoted by ûi, then it is natural to look at tests based upon the sample moments n⁻¹ Σ ûi³ and n⁻¹ Σ ûi⁴ − 3σ̂⁴, where σ̂² = n⁻¹ Σ ûi². Jarque and Bera propose a test of the joint significance of these terms. However, this test is only asymptotically valid and, in large samples, there is little need to assume normality when examining OLS results for the linear multiple regression model.

The asymptotic critical values can, moreover, be a poor guide to the actual finite-sample behaviour of the Jarque-Bera statistic when the ui are normal. The Jarque-Bera test can also have low power under some nonnormal disturbance distributions.

TREATMENT

If the disturbances are known to follow some specific nonnormal distribution, then we can derive the likelihood function and obtain the asymptotically efficient MLE. Otherwise, use OLS and rely upon large-sample results.
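The Jarque-Bera statistic built from the residual moments above can be sketched as follows. The familiar form JB = n[skew²/6 + (kurtosis − 3)²/24], asymptotically χ²(2), is assumed here (it follows from the two moment conditions, but the notes do not spell it out); the simulated data use normal errors, so the statistic should typically be small:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n = 500
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)      # normal errors

# OLS residuals
X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b

# sample moments of the residuals (MLE-type variance n^{-1} sum e_i^2)
s2 = np.mean(e ** 2)
skew = np.mean(e ** 3) / s2 ** 1.5
kurt = np.mean(e ** 4) / s2 ** 2

JB = n * (skew ** 2 / 6.0 + (kurt - 3.0) ** 2 / 24.0)
p_value = 1.0 - stats.chi2.cdf(JB, df=2)    # asymptotic chi^2(2) reference
```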

HETEROSKEDASTICITY

READING

Given the regressor values, the yi are independent N(α + Σj βj xji, σ²i) variables, where σ²i denotes var(ui).

Heteroskedasticity is commonly associated with cross-section data, grouped data, or random coefficient models, but it can occur in time-series applications (GARCH-family models, for instance).

CONSEQUENCES OF HETEROSKEDASTICITY

The OLS estimators remain unbiased, but they are no longer efficient, in either large or small samples.

The conventional standard-error formulae are derived under the assumption that all ui have the same variance. As before,

    β̂j = βj + Σi x̃ji ui / Σi x̃²ji = βj + Σi x̃ji ui / RSSj,

so that under heteroskedasticity

    var(β̂j) = Σi x̃²ji σ²i / [Σi x̃²ji]² = Σi x̃²ji σ²i / (RSSj)²,

which is not equal to σ²/RSSj in general. The t- and F-tests are, therefore, invalid.

TESTS FOR HETEROSKEDASTICITY

Goldfeld-Quandt Test

The null hypothesis is that the errors are homoskedastic. It is assumed that information is available about the relative magnitudes of the variances under the alternative hypothesis of heteroskedasticity.

Using this information, reorder the data so that σ²1 ≤ σ²2 ≤ ... ≤ σ²n. Split the sample into three parts containing m, c, and m observations, with m > (k + 1) and n = 2m + c. Drop the middle set of c observations.

Let RSS1 and RSS2 denote the OLS residual sums of squares for estimation using the first m and last m observations, respectively. Under the null hypothesis of homoskedasticity, the statistic GQ = RSS2/RSS1 is distributed as F(m − k − 1, m − k − 1), and large values indicate inconsistency of the data with the null hypothesis.

The original form of the score test suggested by Breusch-Pagan and Godfrey requires normal disturbances even for asymptotic validity, and is not recommended. Score tests do not require us to reorder the data according to the values of the variances.
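A sketch of the Goldfeld-Quandt test for a simple regression (k = 1). The data are simulated with error standard deviation proportional to x, and are generated already ordered by x so that the variance ordering assumption holds; the split m = 45, c = 30 is an illustrative choice:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
n = 120
x = np.sort(rng.uniform(1.0, 5.0, size=n))   # ordered so sigma_i is nondecreasing
u = rng.normal(size=n) * x                    # sd(u_i) proportional to x_i
y = 1.0 + 2.0 * x + u

def rss_fit(xs, ys):
    """RSS from OLS of ys on an intercept and xs."""
    X = np.column_stack([np.ones(len(xs)), xs])
    b = np.linalg.lstsq(X, ys, rcond=None)[0]
    e = ys - X @ b
    return float(e @ e)

m, c = 45, 30                                 # n = 2m + c; drop middle c obs
rss1 = rss_fit(x[:m], y[:m])                  # first m (low-variance) observations
rss2 = rss_fit(x[-m:], y[-m:])                # last m (high-variance) observations

GQ = rss2 / rss1                              # ~ F(m-k-1, m-k-1) under H0, k = 1
p_value = 1.0 - stats.f.cdf(GQ, m - 2, m - 2)
```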

Koenker's studentized score test is asymptotically robust to nonnormality. Estimate the model by OLS using all observations and obtain the residuals ûi, i = 1, ..., n. Assume an alternative of the form

    σ²i = h(γ0 + Σ_{j=1}^{p} γj zji),

where the function h(.) need not be specified. Regress ûi² on an intercept and the zji, and let R²K denote the coefficient of determination of this auxiliary regression.

Koenker's test statistic is nR²K and, under homoskedasticity, nR²K is asymptotically distributed as χ²(p), with large values indicating the rejection of the null model.

Problems: (a) it is a large-sample test; (b) we need enough information to select the variables zji, and an incorrect choice has an impact on power.

White's direct test is a version of this check with the zji being the nonredundant terms of xiq and xiq xir, q, r = 1, ..., k. The same problems apply: it is a large-sample test, and we need enough information to select the test variables; an incorrect choice has an impact on power.

Engle's ARCH model makes σ²i depend upon squared past values of ui. The test for ARCH is a Koenker-type check with zji = û²(i−j), for i = p + 1, ..., n and j = 1, ..., p.

TREATMENT OF HETEROSKEDASTICITY

If the form of the heteroskedasticity is known, apply OLS to transformed data to get efficient estimators. Suppose σ²i = σ² w²i, with the w²i being known; then var(ui/wi) = σ² for all i. In this case, apply OLS to the transformed model

    (yi/wi) = α(1/wi) + Σj βj (xji/wi) + (ui/wi),

in which the (ui/wi) are homoskedastic. Note: the transformed model may not contain an intercept.

If we do not have enough information about the form of the heteroskedasticity, then we can use White's heteroskedasticity-consistent standard errors, denoted by WSE(α̂) and WSE(β̂j), j = 1, ..., k, for asymptotically valid inference after OLS estimation. White shows that, if

    WSE(β̂j) = √[ Σi x̃²ji ûi² / (RSSj)² ],

then (β̂j − βj)/WSE(β̂j) is asymptotically N(0, 1) in the presence of unspecified heteroskedasticity.
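White's formula for WSE(β̂j) can be sketched for the slope of a simple regression, where x̃ is just x in deviations from its mean. The heteroskedasticity pattern in the simulation is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(10)
n = 300
x = rng.normal(size=n)
u = rng.normal(size=n) * (1.0 + 0.5 * np.abs(x))   # variance depends on |x|
y = 1.0 + 2.0 * x + u

# OLS fit and residuals
X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b

# for the slope, x-tilde is the residual of x on the intercept alone
x_tilde = x - x.mean()
rss_j = np.sum(x_tilde ** 2)

# conventional standard error (assumes homoskedasticity) vs White's WSE
s2 = np.sum(e ** 2) / (n - 2)
se_conv = np.sqrt(s2 / rss_j)
wse = np.sqrt(np.sum(x_tilde ** 2 * e ** 2) / rss_j ** 2)
```

Under heteroskedasticity the two standard errors generally differ, and only the White version supports asymptotically valid inference.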

Asymptotically valid confidence intervals and tests of H0: βj = βj0 are based upon the standard normal distribution. With prob(−d1 ≤ N(0, 1) ≤ d1) = (1 − a), the (1 − a) × 100 per cent confidence intervals for α and βj are given by α̂ ± d1 WSE(α̂) and β̂j ± d1 WSE(β̂j), respectively, and

    tW0(β̂j) = (β̂j − βj0)/WSE(β̂j) ~ N(0, 1)

asymptotically under H0. Some prefer to replace N(0, 1) by t(n − k − 1), and this is often done. Thus we can use the following to obtain asymptotically valid tests of H0: βj = βj0.

1. Against H1: βj ≠ βj0, reject H0 if |tW0(β̂j)| > d1, where prob(t(n − k − 1) > d1) = a/2;
2. Against H1+: βj > βj0, reject H0 if tW0(β̂j) > d2, where prob(t(n − k − 1) > d2) = a;
3. Against H1−: βj < βj0, reject H0 if tW0(β̂j) < −d2, where prob(t(n − k − 1) > d2) = a.

Just replace βj by α and β̂j by α̂ in the above to obtain test procedures relevant to testing hypotheses concerning the intercept.

AUTOCORRELATION

Autocorrelation is usually discussed in a time-series framework, but spatial autocorrelation has also been examined. Index the observations by t and assume for now that the regressors are nonstochastic. (It is straightforward to allow for random regressors with xjt independent of us, for all j, s and t.) This assumption will be relaxed later; in particular, we will consider autocorrelation when the regressors include lagged values of the dependent variable.

CONSEQUENCES OF AUTOCORRELATION

The OLS estimators remain unbiased, but they are no longer efficient, in either large or small samples.

The conventional standard-error formulae are derived under the assumption that the ut are independent. As before,

    β̂t = βj + Σt x̃jt ut / Σt x̃²jt = βj + Σt x̃jt ut / RSSj,

and, since the terms x̃jt ut are now correlated over t,

    var(Σt x̃jt ut) ≠ σ² Σt x̃²jt,   and so   var(β̂j) ≠ σ²/RSSj.

Conventional standard errors are, therefore, biased.

Assume the ut are covariance stationary with E(ut u(t−g)) = γ(|g|) for all t, with γ(0) = σ². The autocorrelation of order g, denoted by ρ(g), is the correlation between ut and u(t−g), i.e. E(ut u(t−g))/σ², with the sequence ρ(1), ρ(2), ... being called the autocorrelation function or ACF. Under the null hypothesis of serial independence, ρ(g) = 0 for all g ≠ 0. Different tests check the significance of different sets of estimates of autocorrelations.

DURBIN-WATSON TEST

The Durbin-Watson test uses the OLS residuals. The test statistic is

    d = Σ_{t=2}^{n} (ût − û(t−1))² / Σ_{t=1}^{n} ût²,

which is approximately 2(1 − r(1)), where

    r(1) = Σ_{t=2}^{n} ût û(t−1) / Σ_{t=1}^{n} ût².

LEMMA
Values of d close to 0 (resp. 4) indicate a high level of positive (resp. negative) residual first-order serial correlation. The distribution of d under the null hypothesis of independent errors depends upon the values of the regressors, so critical values vary from one case to another.

Have tables for combinations of n and k (and for models with and

without an intercept) giving bounds for the critical values for

testing H0 of serial independence against H1 : (1) > 0.These

upper and lower bounds, denoted by du and dl , define an interval

that contains the true known critical value. If d < dl , reject.If

d > du , accept.If dl d du ,the test is inconclusive. For

H1 : (1) < 0,use 4 du and 4 dl as bounds.

order autoregressive (AR(1)) model ut = 1 ut1 + t ,or first order

moving average (MA(1)) model ut = t + 1 t1 ,in which the

t NID(0, 2 ).For reasons to be discussed later in time series, we

assume |1 | < 1 and | 1 | 1.

Problems:

Checks for nonzero values of (1) can be insensitive to

(g) 6= 0, g 6= 1,e.g. g = 4, when (1) = 0

Test is inconclusive when sample value of d falls between

bounds-inconclusive region.

Requires errors to be normal and regressors to be fixed, e.g. no

lagged dependent variables.

S PRING

Theory

2013

103 / 110

S PRING

Theory

2013

104 / 110
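The statistic and its approximation above can be sketched numerically. This is an illustrative check, not from the notes; the residual series is made up, and the approximation d ≈ 2(1 − r(1)) is only exact up to end effects.

```python
# Durbin-Watson statistic d and its approximation 2*(1 - r(1)),
# computed from a list of (illustrative) OLS residuals.
def durbin_watson(u):
    num = sum((u[t] - u[t - 1]) ** 2 for t in range(1, len(u)))
    den = sum(e ** 2 for e in u)
    return num / den

def r1(u):
    # first order autocorrelation of the residuals
    return sum(u[t] * u[t - 1] for t in range(1, len(u))) / sum(e ** 2 for e in u)

u = [0.3, 0.5, 0.2, -0.4, -0.6, -0.1, 0.4, 0.7, 0.1, -0.5]  # positively correlated
d = durbin_watson(u)
print(d, 2 * (1 - r1(u)))  # d below 2 signals positive serial correlation
```

With positively correlated residuals, d falls below 2, consistent with the lemma above.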

E STIMATION

There are tests that are asymptotically valid for models with nonnormal errors and lagged dependent variables in the regressor set. If the null hypothesis of serial independence is to be tested against an autoregressive or moving average model of order g, then apply an asymptotically valid F-test of H0 : ρ1 = ρ2 = ... = ρg = 0 after OLS estimation of the model

yt = α + Σ_{j=1}^{k} βj xjt + Σ_{j=1}^{g} ρj ût−j + error,

where the ût−j are lagged values of the residuals from the OLS estimation of the original model. If t − j is not positive, set ût−j = 0.

If the errors are assumed to follow a specified type (AR or MA) and order (value of g), one can use the asymptotically efficient MLE or an approximation; e.g. for ut = εt + θ1 εt−1 + ... + θg εt−g, εt ~ NID(0, σε²), an MA(g). MLE, or approximations based upon minimizing Σ εt², are available in econometric software packages.
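The construction of the lagged-residual regressors used in the auxiliary regression can be sketched as follows; the function name and residual values are mine, not from the notes. The convention of setting ût−j = 0 when t − j is not positive is taken from the slides.

```python
# Build the g columns of lagged OLS residuals u_hat[t-j] for the auxiliary
# regression, padding with zeros where t - j is not positive.
def lagged_residuals(u_hat, g):
    """Column j (j = 1..g) holds u_hat lagged j periods, zero-padded."""
    n = len(u_hat)
    return [[u_hat[t - j] if t - j >= 0 else 0.0 for t in range(n)]
            for j in range(1, g + 1)]

u_hat = [0.1, -0.2, 0.3, 0.05, -0.15]
cols = lagged_residuals(u_hat, 2)
print(cols[0])  # u_hat lagged once: [0.0, 0.1, -0.2, 0.3, 0.05]
print(cols[1])  # u_hat lagged twice: [0.0, 0.0, 0.1, -0.2, 0.3]
```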

D ISTURBANCE A UTOCORRELATION ?

Apparent disturbance autocorrelation may be caused by misspecification of the mean function, e.g. omitting relevant regressors or using the wrong functional form. In such cases, re-estimation allowing for autocorrelation is of little value.

The COMFAC test takes as its null hypothesis that the errors of a regression equation are generated by an autoregressive process of specified order. The COMFAC test uses as its alternative an expanded version of the original regression equation obtained by adding lagged values of the dependent variable and the initial set of regressors. Details are not provided because this test, while asymptotically valid, has finite sample properties that cause concern; see Gregory and Veall, Economics Letters, 1986, 22, 203-208. Moreover, the alternative adopted in the COMFAC procedure may be inadequate and yield a test that rarely detects a false null hypothesis.

Mizon (Journal of Econometrics, 1995, 69, 267-288) offers the following conclusions: although it is important to test for autocorrelation, it is rarely appropriate to "autocorrelation correct" in response to rejecting the null hypothesis of independent disturbances; and, when re-estimation assuming autoregressive errors imposes invalid restrictions, inconsistent parameter estimators will result.

This can be illustrated by considering a simple case in which the model of the null is yt = βxt + ut, with ut = ρ1 ut−1 + εt, εt ~ NID(0, σε²), i.e. the disturbances ut are AR(1). Then

yt = βxt + ρ1 (yt−1 − βxt−1) + εt,

or equivalently,

yt = βxt + ρ1 yt−1 − ρ1 β xt−1 + εt, εt ~ NID(0, σε²),

in which the coefficient of xt−1 is restricted to be minus the product of the coefficients of xt and yt−1. Note that this restriction is not linear.
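The nonlinear common-factor restriction above can be checked numerically. A tiny sketch (parameter values are mine): with β and ρ1 chosen freely, the implied coefficients (b1, b2, b3) on (xt, yt−1, xt−1) always satisfy b3 = −b1·b2.

```python
# Common-factor restriction of the AR(1)-error model:
# y_t = b1*x_t + b2*y_{t-1} + b3*x_{t-1} + eps_t with b3 = -b1*b2.
beta, rho1 = 2.0, 0.6
b1, b2, b3 = beta, rho1, -rho1 * beta
print(b3, -b1 * b2)  # the restriction holds by construction
```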

I NTRODUCTORY F INANCIAL E CONOMETRICS
Topic 1 Introduction of Time Series
3 C REDITS , 51 H OURS

Statistical analysis of data observed over time.

Jianhua Gang
School of Finance
Renmin University of China
Spring 2013

M OMENTS

For a single random variable we can define the mean and the variance, and for pairs of random variables we can also define covariance, correlation etc. A time series is:

Equispaced, i.e. we observe Y1, Y2, ..., Yt, Yt+1, ..., YT−1, YT and NO intermediate observation is missing;
Yt may depend on Ys only if s < t; Yt does not depend on Ys if s > t.

Then, the vector {Y1, Y2, ..., Yt, Yt+1, ..., YT−1, YT} is a time series. In a time series we define the moments for each Yt:

Mean: E(Yt) = μt;
Variance: E{(Yt − μt)²} = σt²;
Covariance: E{(Yt − μt)(Yt+j − μt+j)} = γt(j);
Correlation: ρt(j) = γt(j)/(σt σt+j).

O PERATORS

P ROBLEM
Suppose {Y1, Y2, ..., Yt, Yt+1, ..., YT−1, YT} is a single realization from a stochastic process {Yt}. We are interested in the model that generated the time series, but we do not know it. How can we make inference, using one single realization?

S OLUTION
We must use the fact that this is a T-dimensional observation.

Lag operator L: LYt = Yt−1, so L^{−1} Yt = Yt+1.
Difference operator Δ = 1 − L: ΔYt = (1 − L)Yt = Yt − Yt−1.
Also, Δ²Yt = (1 − L)²Yt = Yt − 2Yt−1 + Yt−2.
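The two operators can be sketched on a finite sample; positions before the sample start are returned as None here (a convention of this sketch, not of the notes).

```python
# Lag and difference operators on a finite sample.
def lag(y, k=1):
    return [None] * k + y[:-k] if k > 0 else y[:]

def diff(y):
    # Delta Y_t = Y_t - Y_{t-1}
    return [None] + [y[t] - y[t - 1] for t in range(1, len(y))]

y = [1.0, 3.0, 6.0, 10.0]
d1 = diff(y)
# Delta^2 Y_t = Y_t - 2*Y_{t-1} + Y_{t-2}, computed directly:
d2 = [None, None] + [y[t] - 2 * y[t - 1] + y[t - 2] for t in range(2, len(y))]
print(d1)  # [None, 2.0, 3.0, 4.0]
print(d2)  # [None, None, 1.0, 1.0]
```

Differencing the first difference again reproduces Δ²Yt, i.e. (1 − L)² applied in two steps.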

R ESTRICT H ETEROGENEITY

We restrict the heterogeneity of the process so that sample moments have population counterparts. For the time series Yt ∈ {Yt}, covariance stationarity requires:

E(Yt) = μ, ∀t;
E{(Yt − μ)(Yt+j − μ)} = γ(j), ∀t;

i.e. the first two moments are finite and do not depend on time (a spatial equivalent holds as well). In this case we write γ(j) for the autocovariance. "Covariance stationarity" is also known as "weak stationarity" or simply as "stationarity" (without other references).

Strict stationarity is stronger: for any j1, ..., jn, the joint distribution of (Yt+j1, ..., Yt+jn) and of (Yt+τ+j1, ..., Yt+τ+jn) is the same for any τ.

R ESTRICT D EPENDENCE

Given n → ∞, and given the process is stationary, the sample moments estimate the population moments consistently, provided the dependence is not too strong: a sufficient condition is

Σ_{j=0}^{∞} |γj| < ∞.

D EFINITION
One restriction on the dependence that allows us to consistently estimate the population moments using the sample moments in stationary processes is called Ergodicity.

F ORECAST

Assume: Yt is stationary; E(Yt) = 0 (if E(Yt) = μ ≠ 0, then consider Yt − μ instead). We are interested in one of the two questions:

Forecast: What is the best prediction of Yt+1, given Y1, ..., Yt?
Impulse response: What is the consequence on Yt of a shock that took place (t − j) periods ago?

Linear forecasts based on the last m observations take the form

Ŷt+1|t = a1^(1) Yt;
Ŷt+1|t,t−1 = a1^(2) Yt + a2^(2) Yt−1;
...
Ŷt+1|t,...,t−m+1 = a1^(m) Yt + a2^(m) Yt−1 + ... + am^(m) Yt−m+1.

W OLD D ECOMPOSITION

Of course, in some cases a non-linear forecast may be better. How do the coefficients (α1^(m), α2^(m), ..., αm^(m)) characterise a good linear projection? They must meet E[(Yt+1 − α′Xt) Xt] = 0 (i.e., the forecast error Yt+1 − α′Xt is not correlated with Xt). Then, given Yt+1 = Yt+1 (Yt+1 being a single component),

E(Yt+1 Xt) − α′ E(Xt Xt′) = 0,

so
α̂′ = E(Yt+1 Xt′) [E(Xt Xt′)]^{−1}

gives the best linear forecast.

It turns out that any stationary process may be given a linear representation. This can be discussed using the Wold Decomposition:

Yt = kt + Σ_{j=0}^{∞} ψj εt−j, where ψ0 = 1, Σ_{j=0}^{∞} ψj² < ∞,

with
εt = Yt − Ê(Yt|Yt−1, ...), E(εt) = 0, E(εt²) = σ², E(εt εs) = 0, s ≠ t;

kt is the (linearly) deterministic component: it can be predicted arbitrarily well as a linear function of past Yt, i.e. kt = Ê(kt|Yt−1, ...), and it is such that E(kt εt−j) = 0 ∀j.

I MPULSE R ESPONSE

For a process Yt that admits

Yt = μ + Σ_{j=0}^{∞} ψj εt−j,

notice that
∂Yt/∂εt−j = ψj,

as before. A plot of ψj (against j) is called the impulse response function.

A UTOCORRELATION F UNCTION

D EFINITION (ACF)
ρj = γ(j)/γ(0).
The sequence ρ(1), ρ(2), ... is called the autocorrelation function.

D EFINITION (PACF)
In the linear projection

Ŷt+1|t,...,t−m+1 = α1^(m) Yt + α2^(m) Yt−1 + ... + αm^(m) Yt−m+1,

the sequence of last coefficients α1^(1), α2^(2), ..., αm^(m), ... is called the partial autocorrelation function (PACF).
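The sample counterpart of the ACF can be sketched as below (the data are illustrative; the (1/T) normalization follows the sample autocovariance defined later in Topic 4).

```python
# Sample ACF: gamma_hat(j) = (1/T) * sum_{t=j+1}^{T} (Y_t - Ybar)(Y_{t-j} - Ybar),
# rho_hat(j) = gamma_hat(j) / gamma_hat(0).
def sample_acf(y, max_lag):
    T = len(y)
    ybar = sum(y) / T
    def gamma(j):
        return sum((y[t] - ybar) * (y[t - j] - ybar) for t in range(j, T)) / T
    g0 = gamma(0)
    return [gamma(j) / g0 for j in range(max_lag + 1)]

y = [2.0, 4.0, 3.0, 5.0, 4.0, 6.0, 5.0, 7.0]
acf = sample_acf(y, 3)
print(acf)  # acf[0] is always 1
```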

I NTRODUCTORY F INANCIAL E CONOMETRICS
Topic 2 MGF
3 C REDITS , 51 H OURS

It is essential to consider the MGFs in order to depict/solve relevant time series problems.

Jianhua Gang
School of Finance
Renmin University of China
Spring 2013

P RELIMINARIES

Define:
Ω the sample space;
x a random variable, i.e. a (measurable) map from the sample space Ω to the set of R;
f(x) the probability (density) function, such that Σx f(x) = 1 in the discrete case, or, in the continuous case,

Pr{x ≤ x*} = ∫_{−∞}^{x*} f(x)dx, with ∫_{−∞}^{∞} f(x)dx = 1.

P RELIMINARIES : B INOMIAL D ISTRIBUTION

Define as,
f(x) = n!/(x!(n − x)!) · p^x (1 − p)^{n−x}, for x = 0, 1, 2, ..., n.

The density arises from the binomial expansion
(a + b)^n = Σ_{x=0}^{n} n!/(x!(n − x)!) a^x b^{n−x}.

P RELIMINARIES : P OISSON D ISTRIBUTION

Define as,
f(x) = e^{−λ} λ^x / x!, for x = 0, 1, 2, ...

The density arises from the identity
e^λ = Σ_{x=0}^{∞} λ^x / x!,
in which λ = E(x).

P RELIMINARIES : N ORMAL D ISTRIBUTION

Define as,
f(x) = (2πσ²)^{−1/2} exp{−(x − μ)²/(2σ²)},
written as x ~ N(μ, σ²), where −∞ < x < ∞.

The expectation of a random variable is defined by
E(x) = Σx x f(x) (discrete), or E(x) = ∫ x f(x) dx (continuous).

The expectation of a measurable function g(x) of a r.v. x is therefore defined by
E{g(x)} = Σx g(x) f(x), or E{g(x)} = ∫ g(x) f(x) dx.

The raw moments are μi′ = E(x^i), and the central moments are μi = E{(x − μ)^i}.

C ALCULATION OF M OMENTS

In definition, the third and the fourth moments measure the following properties:
μ3: skewness (asymmetry), measured by μ3/σ³;
μ4: kurtosis (fat tails), measured by μ4/σ⁴.

Useful properties of the expectation operator:
g(x) = c ⇒ E{g(x)} = c;
E{c · g(x)} = c · E{g(x)};
E{a + b · g(x)} = a + b · E{g(x)};
and hence,
σ² = μ2 = E{(x − μ)²} = E(x²) − [E(x)]².

MGF

Consider the following function:

Mx(τ) = E(e^{τx}) = Σx e^{τx} f(x), or ∫ e^{τx} f(x)dx.

Expanding the exponential,

Mx(τ) = E[1 + τx + (τx)²/2! + ...] = E[Σ_{i=0}^{∞} (τx)^i / i!]
= 1 + τμ1′ + τ²μ2′/2! + τ³μ3′/3! + ... + τ^i μi′/i! + ...

so that
d^i Mx(τ)/dτ^i |_{τ=0} = μi′.

Hence we call the function Mx(τ) the MGF of x. Note that this property is true in either the discrete or the continuous case.

E XAMPLE OF MGF

It is also easy to see that the MGF satisfies two very important properties: the MGF of a sum of independent random variables is the product of the individual MGFs, and (when it exists) the MGF characterises the distribution of the r.v.

E XAMPLE
Observations x1 through xn are independent copies from the r.v. x ~ Po(λ). Suppose we are interested in the properties (distribution, moments, etc.) of the sample mean

X̄ = (1/n) Σ_{i=1}^{n} xi.

By independence,
M_{(1/n)Σxi}(τ) = Π_{i=1}^{n} M_{xi}(τ/n) = [Mx(τ/n)]^n.

P ROBLEM
Calculate the MGF of x.

S OLUTION
Mx(τ) = E(e^{τx}) = Σ_{x=0}^{∞} e^{τx} e^{−λ} λ^x / x! = e^{−λ} Σ_{x=0}^{∞} (λe^τ)^x / x!
= e^{−λ} exp(λe^τ) = exp{λ(e^τ − 1)}.

P ROBLEM
Calculate the MGF of Sn = nX̄.

S OLUTION
M_{Sn}(τ) = Π_{i=1}^{n} exp{λ(e^τ − 1)} = exp{nλ(e^τ − 1)}.

Note that the MGF of Sn is of the same form as that for x, i.e. letting λ* = nλ,
M_{Sn}(τ) = exp{λ*(e^τ − 1)},
i.e. Sn ~ Po(nλ) = Po(λ*).
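The closed form Mx(τ) = exp{λ(e^τ − 1)} can be checked numerically against a truncated version of the defining series Σx e^{τx} f(x); the values of λ and τ below are arbitrary.

```python
# Numeric check of the Poisson MGF: closed form vs truncated series.
import math

def poisson_mgf_series(lam, tau, terms=80):
    return sum(math.exp(tau * x) * math.exp(-lam) * lam ** x / math.factorial(x)
               for x in range(terms))

lam, tau = 1.5, 0.3
closed = math.exp(lam * (math.exp(tau) - 1))
series = poisson_mgf_series(lam, tau)
print(closed, series)  # the two values agree to machine precision
```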

E XAMPLE OF MGF

P ROBLEM
Calculate the MGF of X̄.

S OLUTION
M_{X̄}(τ) = M_{Sn}(τ/n) = Π_{i=1}^{n} M_{xi}(τ/n) = [Mx(τ/n)]^n
= [exp{λ(e^{τ/n} − 1)}]^n = exp{nλ(e^{τ/n} − 1)}.

P ROBLEM
The moments of X̄.

S OLUTION
E(X̄) = d exp{nλ(e^{τ/n} − 1)}/dτ |_{τ=0} = λ;
E(X̄²) = d² exp{nλ(e^{τ/n} − 1)}/dτ² |_{τ=0} = λ² + λ/n;
σ²_{X̄} = E(X̄²) − [E(X̄)]² (central moments) = λ/n.

If we consider X̄ as an estimator for λ, we refer to these properties as unbiasedness (E(X̄) = λ) and consistency (the variance λ/n tends to zero as n grows).

I NTRODUCTORY F INANCIAL E CONOMETRICS
Topic 3 ARMA Models
3 C REDITS , 51 H OURS

The Wold representation
Yt = μ + Σ_{j=0}^{∞} ψj εt−j
is useful for the impulse response analysis and for forecasting. However, we cannot estimate an infinite number of parameters, so we have to propose parsimonious models.

Jianhua Gang
School of Finance
Renmin University of China
Spring 2013

D EFINITION
{εt} is white noise if:
E(εt) = 0, ∀t;
E(εt²) = σ², ∀t;
E(εt εs) = 0, ∀t ≠ s.

If εt is w.n.(0, σ²), then:
εt may be strictly stationary, but need not be;
εt is covariance stationary, with μ = 0 and γ(j) = 0 for j ≠ 0;
if Yt = μ + Σ_{j=0}^{∞} ψj εt−j, then Yt is covariance stationary if Σ_{j=0}^{∞} ψj² < ∞;
if Yt = μ + Σ_{j=0}^{∞} ψj εt−j, then Yt is ergodic for the mean if Σ_{j=0}^{∞} |ψj| < ∞.

MA(1)

The MA(1) process is Yt = μ + εt + θεt−1, εt ~ w.n.(0, σ²). We can check stationarity noticing that ψ0 = 1, ψ1 = θ, so
Σ_{j=0}^{∞} ψj² = 1 + θ² < ∞.
Otherwise, we can check that the first two moments do not depend on time.

Mean: E(Yt) = μ.
Autocovariances:
γ0 = E[(Yt − μ)²] = E[(εt + θεt−1)²] = (1 + θ²)σ²;
γ1 = E[(Yt − μ)(Yt−1 − μ)] = θσ²;
γj = 0, j ≥ 2.
Autocorrelations: ρ1 = θ/(1 + θ²), ρj = 0 for j ≥ 2.

I NVERTIBLE MA(1)

Rewrite εt = Yt − μ − θεt−1 as εt = Yt − μ − θLεt, so (setting μ = 0 for simplicity and assuming |θ| < 1)

εt = Yt/(1 + θL) = Yt Σ_{j=0}^{∞} (−θ)^j L^j = Σ_{j=0}^{∞} (−θ)^j Yt−j,

i.e.
Yt = −Σ_{j=1}^{∞} (−θ)^j Yt−j + εt :
the invertible MA(1) admits an AR(∞) representation.

MA( Q )

The MA(q) process is Yt = μ + εt + θ1 εt−1 + ... + θq εt−q.

Mean: E(Yt) = μ.
Autocovariances:
γ0 = E[(εt + θ1 εt−1 + ... + θq εt−q)²] = (1 + θ1² + ... + θq²)σ²;
for 0 < j ≤ q,
γj = E[(εt + θ1 εt−1 + ... + θq εt−q)(εt−j + θ1 εt−1−j + ... + θq εt−q−j)]
= (θj + θ1 θj+1 + θ2 θj+2 + ... + θq−j θq)σ²;
γj = 0, j > q.

Invertibility: factor (1 + θ1 L + ... + θq L^q) = (1 − λ1 L)(1 − λ2 L)...(1 − λq L). In the MA(1) we asked that |θ1| < 1; in the same way here we have to ask that |λ1| < 1, |λ2| < 1, ..., |λq| < 1. This is sometimes stated as asking that the roots of the equation in z of the form (1 + θ1 z + ... + θq z^q) = 0 lie OUTSIDE the unit circle.
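The MA(1) moments above can be checked by simulation; this is an illustrative sketch with parameter values of my choosing (θ = 0.5, σ² = 1), so the theoretical values are γ0 = 1.25, γ1 = 0.5, γ2 = 0.

```python
# Simulation check of the MA(1) autocovariances:
# gamma0 = (1+theta^2)*sigma^2, gamma1 = theta*sigma^2, gamma_j = 0 for j >= 2.
import random

random.seed(12345)
theta, sigma = 0.5, 1.0
T = 50000
eps = [random.gauss(0.0, sigma) for _ in range(T + 1)]
y = [eps[t + 1] + theta * eps[t] for t in range(T)]

def gamma(j):
    ybar = sum(y) / T
    return sum((y[t] - ybar) * (y[t - j] - ybar) for t in range(j, T)) / T

g0, g1, g2 = gamma(0), gamma(1), gamma(2)
print(g0, g1, g2)  # close to 1.25, 0.5, 0.0
```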

AR(1)

The AR(1) process is Yt = c + φYt−1 + εt, εt ~ w.n.(0, σ²); assume further that |φ| < 1. Since Yt−1 = c + φYt−2 + εt−1, replace into the previous equation:

Yt = c + φ(c + φYt−2 + εt−1) + εt
= (1 + φ)c + φ²Yt−2 + φεt−1 + εt
= ... iterating,
Yt = Σ_{j=0}^{n} φ^j c + φ^{n+1} Yt−n−1 + Σ_{j=0}^{n} φ^j εt−j.

As n → ∞, and |φ| < 1,

Yt = c/(1 − φ) + Σ_{j=0}^{∞} φ^j εt−j,

an MA(∞) with ψj = φ^j. The condition for ergodicity for the mean holds:
Σ_{j=0}^{∞} |ψj| = Σ_{j=0}^{∞} |φ|^j = 1/(1 − |φ|) < ∞.

This can also be obtained by rewriting Yt as Yt = c + φLYt + εt, using the lag operator, and then (1 − φL)Yt = c + εt. Since |φ| < 1,

Yt = (1 − φL)^{−1} c + (1 − φL)^{−1} εt = c/(1 − φ) + Σ_{j=0}^{∞} φ^j εt−j.

Mean: E(Yt) = c/(1 − φ) (= μ).
Autocovariances:
γ0 = Σ_{k=0}^{∞} φ^{2k} σ² = σ²/(1 − φ²);
γj = Σ_{k=0}^{∞} φ^k φ^{k+j} σ² = φ^j σ²/(1 − φ²).

AR(1)

Autocorrelations:
ρj = γj/γ0 = φ^j.

Alternatively, upon knowing that the process is stationary, we can derive the mean and autocovariances directly. By stationarity, E(Yt) = μ and E(Yt−1) = μ, so from Yt = c + φYt−1 + εt, μ = c + φμ and then μ = c/(1 − φ).

Autocovariances: Replacing c = μ(1 − φ), rewrite Yt as
Yt − μ = φ(Yt−1 − μ) + εt,
then
γ0 = E(Yt − μ)² = E((φ(Yt−1 − μ) + εt)²) = φ²γ0 + σ²,
and solving for γ0,
γ0 = σ²/(1 − φ²).
For j ≥ 1,
γj = E[(Yt − μ)(Yt−j − μ)] = E[(φ(Yt−1 − μ) + εt)(Yt−j − μ)]
= φ E[(Yt−1 − μ)(Yt−j − μ)] + E(εt(Yt−j − μ)) = φγj−1,
so
γj = φγj−1 = φ^j σ²/(1 − φ²).

AR( P )

The AR(p) process is Yt = c + φ1 Yt−1 + ... + φp Yt−p + εt.

P ROBLEM
How can we check for stationarity?

S OLUTION
Factoring (1 − φ1 L − ... − φp L^p) = (1 − λ1 L)...(1 − λp L), stationarity follows if |λj| < 1 for all j.
Another way to state this condition is to check that the solutions of the equation in z, (1 − φ1 z − ... − φp z^p) = 0, are all OUTSIDE the unit circle.
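The stationarity check via factoring can be sketched for the AR(2): the factors λ1, λ2 of 1 − φ1L − φ2L² solve λ² − φ1λ − φ2 = 0, and the process is stationary iff both have modulus below one. The function names are mine; the coefficients (1.2, −0.35) anticipate the common-factor example later in this topic, whose factors are 0.7 and 0.5.

```python
# AR(2) stationarity: factor 1 - phi1*L - phi2*L^2 = (1 - l1*L)(1 - l2*L),
# stationary iff |l1| < 1 and |l2| < 1 (roots of the z-polynomial outside
# the unit circle).
import cmath

def ar2_factors(phi1, phi2):
    # l1, l2 solve l^2 - phi1*l - phi2 = 0 (possibly complex)
    disc = cmath.sqrt(phi1 ** 2 + 4 * phi2)
    return (phi1 + disc) / 2, (phi1 - disc) / 2

def is_stationary(phi1, phi2):
    l1, l2 = ar2_factors(phi1, phi2)
    return abs(l1) < 1 and abs(l2) < 1

print(is_stationary(1.2, -0.35))  # factors 0.7 and 0.5: stationary
print(is_stationary(1.2, 0.35))   # one factor exceeds 1: not stationary
```

Complex factors (disc < 0) are handled by `cmath`; they produce the cyclical autocorrelations mentioned below.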

AR( P )

Given stationarity,

Mean:
E(Yt) = E(c + φ1 Yt−1 + ... + φp Yt−p + εt) ⇒ μ = c + φ1 μ + ... + φp μ,
so
μ = c/(1 − φ1 − ... − φp).

Autocovariances:
γ0 = E(Yt − μ)² = E[(φ1(Yt−1 − μ) + ... + φp(Yt−p − μ) + εt)(Yt − μ)]
= φ1 γ1 + ... + φp γp + σ²;
for j ≥ 1,
γj = E[(Yt − μ)(Yt−j − μ)]
= E[(φ1(Yt−1 − μ) + ... + φp(Yt−p − μ) + εt)(Yt−j − μ)]
= φ1 γj−1 + ... + φp γj−p.

When the factors λj are complex, the autocorrelations show a cyclical dynamics.

For the AR(2), for example,
γ0 = φ1 γ1 + φ2 γ2 + σ²,
γ1 = φ1 γ0 + φ2 γ1,
γ2 = φ1 γ1 + φ2 γ0,
and notice that γ1 = γ−1, so replacing γ1 and γ2,

γ1 = φ1 γ0/(1 − φ2), ρ1 = φ1/(1 − φ2),

γ0 = (1 − φ2)σ² / {(1 + φ2)[(1 − φ2)² − φ1²]}.

Using the lag operator, the stationary AR(p) is φ(L)(Yt − μ) = εt, and Yt − μ = φ(L)^{−1} εt (here we used stationarity), so φ(L)^{−1} = ψ(L), i.e. the stationary AR(p) admits an MA(∞) representation.

ARMA( P, Q )

Yt = c + φ1 Yt−1 + ... + φp Yt−p + εt + θ1 εt−1 + ... + θq εt−q is ARMA(p, q).

Stationarity of the whole ARMA(p, q) depends on the autoregressive part only (whilst the invertibility depends on the MA part only). Using the lag operator, the ARMA(p, q) is
(1 − φ1 L − ... − φp L^p)Yt = (1 + θ1 L + ... + θq L^q)εt.
For stationarity, we have to check that the roots of (1 − φ1 z − ... − φp z^p) = 0 are all outside the unit circle.
For invertibility, we require that the roots of (1 + θ1 z + ... + θq z^q) = 0 are outside the unit circle.

ARMA( P, Q )

Given the stationarity,

Mean:
E(Yt) = E(c + φ1 Yt−1 + ... + φp Yt−p + εt + θ1 εt−1 + ... + θq εt−q)
⇒ μ = c + φ1 μ + ... + φp μ + 0 + ... + 0,
so
μ = c/(1 − φ1 − ... − φp).

Autocovariances: The autocovariances are a combination between those of an AR(p) and an MA(q), so for j > q,
γj = φ1 γj−1 + ... + φp γj−p.

For the ARMA(1,1), Yt − μ = φ(Yt−1 − μ) + εt + θεt−1, firstly notice that

E[(Yt − μ)εt] = E[(φ(Yt−1 − μ) + εt + θεt−1)εt] = 0 + σ² + 0 = σ²;
E[(Yt − μ)εt−1] = E[(φ(Yt−1 − μ) + εt + θεt−1)εt−1] = φσ² + 0 + θσ² = (φ + θ)σ².

So
γ0 = E[(φ(Yt−1 − μ) + εt + θεt−1)(Yt − μ)] = φγ1 + σ² + θ(φ + θ)σ²;
γ1 = E[(Yt − μ)(Yt−1 − μ)] = E[(φ(Yt−1 − μ) + εt + θεt−1)(Yt−1 − μ)] = φγ0 + 0 + θσ²;
γj = φγj−1, j ≥ 2.

Solving the first two equations,
γ0 = [1 + (φ + θ)²/(1 − φ²)]σ²;
γ1 = [φ + θ + φ(φ + θ)²/(1 − φ²)]σ².

ARMA( P, Q )

The autocorrelations can be derived in the same way: for the generic ARMA(p, q), for j > q,
ρj = φ1 ρj−1 + ... + φp ρj−p.

IRF OF ARMA

Write Yt − μ = φ(L)^{−1} θ(L) εt, so ψ(L) = φ(L)^{−1} θ(L), i.e. φ(L)ψ(L) = θ(L):

(1 − φ1 L − ... − φp L^p)(1 + ψ1 L + ψ2 L² + ...) = (1 + θ1 L + ... + θq L^q).

Solve this for the various powers of L:
L⁰ : 1 = 1;
L¹ : θ1 = ψ1 − φ1, so ψ1 = φ1 + θ1;
L² : θ2 = ψ2 − φ1 ψ1 − φ2, so ψ2 = θ2 + φ2 + φ1 ψ1;
L³ : θ3 = ψ3 − φ3 − φ2 ψ1 − φ1 ψ2, so ψ3 = θ3 + φ3 + φ2 ψ1 + φ1 ψ2;
...

For the ARMA(1,1),
ψ1 = φ + θ, and ψj = φψj−1 = (φ + θ)φ^{j−1}, j ≥ 2.

The ARMA(1,1) could also be decomposed in impulse response by looking at
Yt = φYt−1 + ut, ut = εt + θεt−1.
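The ARMA(1,1) recursion for the ψ-weights can be sketched and checked against the closed form ψj = (φ + θ)φ^{j−1}; the parameter values below are arbitrary.

```python
# IRF of the ARMA(1,1): psi_0 = 1, psi_1 = phi + theta,
# psi_j = phi * psi_{j-1} for j >= 2.
def arma11_irf(phi, theta, horizon):
    psi = [1.0]
    for j in range(1, horizon + 1):
        psi.append(phi * psi[-1] + (theta if j == 1 else 0.0))
    return psi

phi, theta = 0.8, 0.4
psi = arma11_irf(phi, theta, 5)
closed = [1.0] + [(phi + theta) * phi ** (j - 1) for j in range(1, 6)]
print(psi)
print(closed)  # identical to the recursion
```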

IRF OF ARMA

Then,
Yt = Σ_{j=0}^{∞} φ^j ut−j = Σ_{j=0}^{∞} φ^j (εt−j + θεt−j−1)
= εt + Σ_{j=1}^{∞} φ^j εt−j + Σ_{j=1}^{∞} θφ^{j−1} εt−j
= εt + (φ + θ) Σ_{j=1}^{∞} φ^{j−1} εt−j,

the same weights as before.

C OMMON FACTORS

There may be common roots of φ(L) and of θ(L): in this case, the ARMA(p, q) process cannot be distinguished, on the basis of the autocorrelation structure (or from the weights in the MA(∞) representation), from an ARMA(p − 1, q − 1) process. The model is then overparametrised.

It is possible to simplify the common factors, and it is desirable to do so, especially if the parameters φ1, ..., φp and θ1, ..., θq have to be estimated.

E XAMPLE
The ARMA(2,1)
Yt = 1.2Yt−1 − 0.35Yt−2 + εt − 0.7εt−1
is
(1 − 0.7L)(1 − 0.5L)Yt = (1 − 0.7L)εt,
so, simplifying (1 − 0.7L), the process has the same autocorrelation structure (and the same weights in the MA(∞) representation) as
(1 − 0.5L)Yt = εt,
i.e.
Yt = 0.5Yt−1 + εt.

A stationary ARMA(p, q) can be given an MA(∞) representation; in the same way, it is also possible to give an AR(∞) representation (indeed, this is a proper definition of "invertibility"). All these models have the same autocovariance / autocorrelation structures, and are therefore indistinguishable. Each representation serves its purpose: for example, we may like the MA(∞) if we are interested in the impulse response function, the AR(∞) if we want to compute εt given observations on {Yt} (and assuming we know the parameters), or we may prefer the ARMA(p, q) if we are interested in estimating the parameters.
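The common-factor example can be verified numerically: the ψ-weights of the overparametrised ARMA(2,1) above coincide with those of the AR(1) with φ = 0.5, i.e. ψj = 0.5^j. The generic coefficient-matching routine below is my own sketch of the recursion ψj = θj + Σ_i φi ψj−i used earlier.

```python
# psi-weights of an ARMA(p, q) from phi(L)*psi(L) = theta(L),
# matched coefficient by coefficient.
def arma_psi(phis, thetas, horizon):
    theta = [1.0] + thetas
    psi = []
    for j in range(horizon + 1):
        tj = theta[j] if j < len(theta) else 0.0
        acc = tj + sum(phis[i - 1] * psi[j - i]
                       for i in range(1, min(j, len(phis)) + 1))
        psi.append(acc)
    return psi

psi_arma = arma_psi([1.2, -0.35], [-0.7], 8)  # Yt = 1.2Y(t-1) - 0.35Y(t-2) + e - 0.7e(t-1)
psi_ar1 = [0.5 ** j for j in range(9)]        # Yt = 0.5Y(t-1) + e
print(psi_arma)
print(psi_ar1)  # identical: the factor (1 - 0.7L) cancels
```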

F ILTERS

Sometimes data are treated (by nature or by the researcher) by summing / averaging / differencing; we may derive the properties of such transformations within this framework.

For Yt, a filter h(L) is applied as

Xt = h(L)Yt, where h(L) = Σ_{j=−∞}^{∞} hj L^j.

If
Σ_{j=−∞}^{∞} |hj| < ∞, and Yt = c + ψ(L)εt with Σ_j |ψj| < ∞,

then
Xt = μ* + ψ*(L)εt, where μ* = h(1)c, ψ*(L) = h(L)ψ(L).

Example: Suppose Yt is w.n.(0, σ²), and
Xt = (1/k) Σ_{j=0}^{k−1} Yt−j;
then averaging induced dependence where there was none.

S UM OF ARMA PROCESSES

Consider
Yt = Xt + vt, where Xt = ut + θut−1.
Suppose we are interested in Xt, but we can only observe Yt. What are the properties of Yt?

S UM OF ARMA PROCESSES

γ0 = E(Xt + vt)² = E(Xt²) + E(vt²) + 2E(Xt vt) = (1 + θ²)σu² + σv²;
γ1 = E[(Xt + vt)(Xt−1 + vt−1)] = E(Xt Xt−1) + E(vt Xt−1) + E(Xt vt−1) + E(vt vt−1) = θσu²;
γj = 0, j ≥ 2.

Since only γ0 and γ1 are nonzero, Yt has the autocovariance structure of an MA(1),
Yt = ζt + αζt−1, ζt ~ w.n.(0, σζ²).

In order to find α, compute
ρ1 = α/(1 + α²) = θσu²/[(1 + θ²)σu² + σv²],

and solve for α:
ρ1 α² − α + ρ1 = 0, α1,2 = [1 ± √(1 − 4ρ1²)]/(2ρ1),

taking the root with |α| < 1 for invertibility. Since in an MA(1) γ1 = ασζ², the remaining parameter σζ² can be recovered, for example from ασζ² = θσu².

However, there are three parameters of interest, θ, σu², and σv²; we cannot estimate them all without an identification assumption. In other words, Yt contains less information than (Xt, vt).

L EMMA
In general, consider Yt = Xt + Wt, where Xt and Wt are (zero mean) stationary processes such that Xt and Wτ are not correlated at any t, τ; then
E(Yt Yt−j) = E(Xt Xt−j) + E(Wt Wt−j),
i.e. γjY = γjX + γjW.

S UM OF TWO MA PROCESSES

L EMMA
If Xt is MA(q1) and Wt is MA(q2), then Yt is MA(max[q1, q2]).

S UM OF TWO AR PROCESSES

Suppose
Yt = Xt + Wt, where (1 − φL)Xt = ut, (1 − ρL)Wt = vt (φ ≠ ρ);
then
(1 − φL)(1 − ρL)Xt = (1 − ρL)ut, (1 − φL)(1 − ρL)Wt = (1 − φL)vt,
so
(1 − φL)(1 − ρL)(Xt + Wt) = (1 − ρL)ut + (1 − φL)vt.
So Yt is ARMA(2,1). (If φ = ρ, Yt is AR(1).)

S UM OF T WO ARMA P ROCESSES

L EMMA
If Xt is ARMA(p1, q1) and Wt is ARMA(p2, q2), then Yt is ARMA(p, q) with
p ≤ p1 + p2 and q ≤ max(p1 + q2, p2 + q1).

I NTRODUCTORY F INANCIAL E CONOMETRICS
Topic 4 Estimation of ARMA
3 C REDITS , 51 H OURS

MLE

Jianhua Gang
School of Finance
Renmin University of China
Spring 2013

We characterised the process based on some population moments (mean, autocovariances, ...). However, we only have the data that we observed, (y1, ..., yT), so we can only compute estimates of these moments. Are these estimates useful?

Sample mean: Ȳ = (1/T) Σ_{t=1}^{T} Yt;
Sample autocovariance: γ̂j = (1/T) Σ_{t=j+1}^{T} (Yt − Ȳ)(Yt−j − Ȳ);
Sample autocorrelation: ρ̂j = γ̂j/γ̂0.

Let Y = (Y1, ..., YT)′ be a Normally distributed vector with
E(Y) = μ, E((Y − μ)(Y − μ)′) = Ω.
The joint density at y = (yT, ..., y1)′ in the support of Y is

fYT,...,Y1(yT, ..., y1) = (2π)^{−T/2} |Ω|^{−1/2} exp(−(1/2)(y − μ)′Ω^{−1}(y − μ)).

Consider Ω = Ω(ξ), where ξ is the set of parameters of interest. Then,

f = (2π)^{−T/2} |Ω(ξ)|^{−1/2} exp(−(1/2)(y − μ)′Ω(ξ)^{−1}(y − μ))

is the likelihood function. Maximizing that function w.r.t. ξ yields the (exact) maximum likelihood estimate. Note the difference between ξ and the true value ξ0.

E XAMPLES :

AR(1) (|φ0| < 1):
Yt = c0 + φ0 Yt−1 + εt, εt ~ Nid(0, σ0²),
ξ = (c, φ, σ²)′, (|φ| < 1), and

Ω(ξ) = σ²/(1 − φ²) ×
[ 1          φ          φ²    ...  φ^{T−1} ]
[ φ          1          φ     ...  φ^{T−2} ]
[ ...        ...        ...   ...  ...     ]
[ φ^{T−2}    φ^{T−3}    ...   1    φ       ]
[ φ^{T−1}    φ^{T−2}    ...   φ    1       ]

E XAMPLES :

MA(1) (|θ0| < 1):
Yt = μ0 + εt + θ0 εt−1, εt ~ Nid(0, σ0²),
ξ = (μ, θ, σ²)′, and

Ω(ξ) = σ² ×
[ (1 + θ²)  θ          0         ...  0        ]
[ θ         (1 + θ²)   θ         ...  0        ]
[ ...       ...        ...       ...  ...      ]
[ 0         ...        θ         (1 + θ²)  θ   ]
[ 0         ...        0         θ   (1 + θ²)  ]

In principle we can compute the likelihood for any set of observations and for any parameter (within the range of the parameter space). For example, suppose we observe

time   1        2        3        4
obs.   y1 = 0.5 y2 = 0.8 y3 = 0.2 y4 = 2

and suppose you want to estimate θ0 in the MA(1) model with the additional assumption that μ0 = 0 and σ0² = 1: consider five potential values for θ0: −0.5, −0.25, 0, 0.25, 0.5. Then, we have to compute Ω(θ) for each θ: for example, when θ = 0.5,

Ω(0.5) =
[ 1.25  0.5   0     0    ]
[ 0.5   1.25  0.5   0    ]
[ 0     0.5   1.25  0.5  ]
[ 0     0     0.5   1.25 ]

and then,

(y − μ)′Ω(0.5)^{−1}(y − μ) = [0.5 0.8 0.2 2] Ω(0.5)^{−1} [0.5 0.8 0.2 2]′ = 4.6903,

so, with |Ω(0.5)| = 1.332,

f = (2π)^{−4/2} (1.332)^{−1/2} exp(−4.6903/2) = 2.1033 × 10^{−3}.

Therefore, we may get all the likelihoods for different θ:

θ       −0.5   −0.25  0      0.25   0.5
10³ f   3.178  2.618  2.153  1.967  2.103
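The example can be reproduced with a short sketch (mine, not from the notes): the determinant of the tridiagonal Ω(θ) follows the recursion Dk = (1 + θ²)Dk−1 − θ²Dk−2, and the quadratic form is obtained by solving the tridiagonal system. Redoing the arithmetic exactly, the quadratic form comes out as ≈ 4.709 (likelihood ≈ 2.08 × 10⁻³), slightly different from the rounded figures reported above.

```python
# Exact MA(1) likelihood for the 4-observation example (mu0 = 0, sigma0^2 = 1).
import math

def omega_det(theta, T):
    # D_k = (1+theta^2) D_{k-1} - theta^2 D_{k-2}, D_0 = 1, D_1 = 1+theta^2
    d2, d1 = 1.0, 1.0 + theta ** 2
    for _ in range(2, T + 1):
        d2, d1 = d1, (1 + theta ** 2) * d1 - theta ** 2 * d2
    return d1

def quad_form(theta, y):
    # solve Omega x = y by Gaussian elimination on the tridiagonal system
    T = len(y)
    a = [1.0 + theta ** 2] * T    # diagonal
    b = [theta] * (T - 1)         # off-diagonal
    rhs = list(y)
    for i in range(1, T):
        m = b[i - 1] / a[i - 1]
        a[i] -= m * b[i - 1]
        rhs[i] -= m * rhs[i - 1]
    x = [0.0] * T
    x[-1] = rhs[-1] / a[-1]
    for i in range(T - 2, -1, -1):
        x[i] = (rhs[i] - b[i] * x[i + 1]) / a[i]
    return sum(yi * xi for yi, xi in zip(y, x))

y = [0.5, 0.8, 0.2, 2.0]
det = omega_det(0.5, 4)
q = quad_form(0.5, y)
lik = (2 * math.pi) ** (-2) * det ** (-0.5) * math.exp(-q / 2)
print(det, q, lik)  # det = 341/256 = 1.332..., q ~ 4.709
```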

ML OF AR(1)

The computation of
(2π)^{−T/2} |Ω(ξ)|^{−1/2} exp(−(1/2)(y − μ)′Ω(ξ)^{−1}(y − μ))
requires the inversion of the T × T matrix Ω(ξ) for all the admissible values ξ. For the AR(1), the likelihood can be factorized in a way that does not require the inversion of Ω(ξ); otherwise, it is also possible to modify the problem so that, again, we can avoid the inversion of Ω(ξ).

Note that
Y1 ~ N(c0/(1 − φ0), σ0²/(1 − φ0²)),
so
fY1(y1; ξ) = (2π)^{−1/2} (σ²/(1 − φ²))^{−1/2} exp{−(1/2)(y1 − c/(1 − φ))²/(σ²/(1 − φ²))}.

For Y2, in this case we can also exploit the fact that we observed Y1 on the period before:
Y2|Y1 ~ N(c0 + φ0 Y1, σ0²),
so
fY2|Y1(y2|y1; ξ) = (2π)^{−1/2} (σ²)^{−1/2} exp{−(y2 − c − φy1)²/(2σ²)},
and, by the same argument,
fYt|Yt−1,...,Y1(yt|yt−1, ..., y1; ξ) = (2π)^{−1/2} (σ²)^{−1/2} exp{−(yt − c − φyt−1)²/(2σ²)}
when t = 2, ..., T. The joint density is then
f = fY1(y1; ξ) Π_{t=2}^{T} fYt|Yt−1,...,Y1(yt|yt−1, ..., y1; ξ).

ML OF AR(1)

The log-likelihood is

l(ξ) = ln(fY1(y1; ξ)) + Σ_{t=2}^{T} ln(fYt|Yt−1,...,Y1(yt|yt−1, ..., y1; ξ))
= −(1/2) ln(2π σ²/(1 − φ²)) − (1/2)(y1 − c/(1 − φ))²/(σ²/(1 − φ²))
− ((T − 1)/2) ln(2πσ²) − (1/2) Σ_{t=2}^{T} (yt − c − φyt−1)²/σ².

Its maximizer is the "exact maximum likelihood estimate" when εt is normally distributed. We still cannot express our estimate ξ̂ as a closed form function of the observations, so we still have to compute the likelihood function on all the admissible parameters in order to find the maximum. We did, however, succeed in writing the (log) likelihood in a way that does not require the inversion of a T × T matrix.

ML OF AR(1)

Maximizing only
−((T − 1)/2) ln(2πσ²) − (1/2) Σ_{t=2}^{T} (yt − c − φyt−1)²/σ²
yields the "conditional maximum likelihood estimate", because it is the maximum likelihood estimate if Y1 is not random (so, the log-likelihood above is called the "conditional" log-likelihood). In this case, a closed form solution exists.

In order to find the closed form solution, first notice that σ² can be estimated and concentrated out:
∂/∂σ² [−((T − 1)/2) ln(2πσ²) − (1/2) Σ_{t=2}^{T} (yt − c − φyt−1)²/σ²]
= −(T − 1)/(2σ²) + (1/2) Σ_{t=2}^{T} (yt − c − φyt−1)²/(σ²)² = 0,
so
σ̂² = (1/(T − 1)) Σ_{t=2}^{T} (yt − ĉ − φ̂yt−1)².

Maximizing over c and φ then gives
φ̂ = Σ_{t=2}^{T} (yt − ȳ.)(yt−1 − ȳ.−1) / Σ_{t=2}^{T} (yt−1 − ȳ.−1)²,
ĉ = ȳ. − φ̂ ȳ.−1,
where ȳ. = (1/(T − 1)) Σ_{t=2}^{T} yt and ȳ.−1 = (1/(T − 1)) Σ_{t=2}^{T} yt−1.

Hence for the conditional likelihood a closed form solution exists, and it is the OLS estimate in Yt = c0 + φ0 Yt−1 + εt. Notice that this is not the likelihood function of our original stationary AR(1) process, but the likelihood of the process
Yt = c0 + φ0 Yt−1 + εt, |φ0| < 1,
in which Y1 = y1 is treated as fixed.

ML OF AR( P )

Assume
Yt = c0 + φ0;1 Yt−1 + ... + φ0;p Yt−p + εt, εt ~ Nid(0, σ0²),
and the roots of 1 − φ0;1 z − ... − φ0;p z^p = 0 are outside the unit circle.
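The closed-form conditional ML (= OLS) estimates of the AR(1) can be sketched directly from the formulas above; the short series is illustrative.

```python
# Conditional ML of the AR(1) = OLS of y_t on a constant and y_{t-1}.
y = [0.2, 0.5, 0.1, 0.4, 0.7, 0.3, 0.6, 0.2]
T = len(y)
yt = y[1:]    # y_t for t = 2..T
yl = y[:-1]   # y_{t-1}
m_t = sum(yt) / (T - 1)
m_l = sum(yl) / (T - 1)
phi_hat = (sum((a - m_t) * (b - m_l) for a, b in zip(yt, yl))
           / sum((b - m_l) ** 2 for b in yl))
c_hat = m_t - phi_hat * m_l
sigma2_hat = sum((a - c_hat - phi_hat * b) ** 2 for a, b in zip(yt, yl)) / (T - 1)
print(phi_hat, c_hat, sigma2_hat)
```

As with any OLS fit, the residuals sum to zero and are orthogonal to the lagged regressor, which is a quick way to verify the formulas.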

ML OF AR( P )

Introduce Vp(ξ) such that σ²Vp(ξ) is the covariance matrix of (Yp, ..., Y1)′, with mean vector μp. Then

fYp,...,Y1(yp, ..., y1; ξ) = (2π)^{−p/2} |σ²Vp(ξ)|^{−1/2} exp{−(1/(2σ²))(yp − μp)′Vp(ξ)^{−1}(yp − μp)},

and, for t = p + 1, ..., T,

fYt|Yt−1,...,Yt−p(yt|yt−1, ..., yt−p; ξ) = (2π)^{−1/2} (σ²)^{−1/2} exp{−(yt − c − φ1 yt−1 − ... − φp yt−p)²/(2σ²)}.

The joint density is then
f = fYp,...,Y1(yp, ..., y1; ξ) Π_{t=p+1}^{T} fYt|Yt−1,...,Yt−p(yt|yt−1, ..., yt−p; ξ),
which only requires inverting a p × p matrix Vp(ξ).

ML OF AR( P )

The log-likelihood is

l(ξ) = ln[fYp,...,Y1(yp, ..., y1; ξ)] + Σ_{t=p+1}^{T} ln[fYt|Yt−1,...,Yt−p(yt|yt−1, ..., yt−p; ξ)]
= −(p/2) ln(2π) − (1/2) ln|σ²Vp(ξ)| − (1/(2σ²))(yp − μp)′Vp^{−1}(ξ)(yp − μp)
− ((T − p)/2) ln(2πσ²) − (1/2) Σ_{t=p+1}^{T} (yt − c − φ1 yt−1 − ... − φp yt−p)²/σ².

Its maximizer is the "exact maximum likelihood estimate".

A conditional version may be considered instead: this is obtained by treating Yp, ..., Y1 as given, and maximizing
−((T − p)/2) ln(2πσ²) − (1/2) Σ_{t=p+1}^{T} (yt − c − φ1 yt−1 − ... − φp yt−p)²/σ²,
which yields the "conditional maximum likelihood estimate". This turns out to be the OLS estimate of c0, φ0;1, ..., φ0;p in the corresponding regression model.

ML OF MA(1)

ARMA2013

MA(1)

26 / 42

ML OF MA(1)

MA(1)

Suppose

i.e. the density of Yt |t 1 is

fY t |t 1 (yt |t 1 ; 0 )

1 (yt 0 0 t 1 )2

1

exp

= q

2

20

220

1 2t

1

exp 2

= q

2 0

220

0 = 0

we can also derive a "conditional maximum likelihood estimate" of

in an MA(1).

In general, since t Nid (0, 20 ),then

Yt |t 1 N (0 + 0 t 1 , 20 )

ARMA2013

27 / 42

ARMA2013

28 / 42

ML OF MA(1)

MA(1)

ML OF MA(1)

MA(1)

However, suppose that we know 0 ,then Y1 = 0 + 1 + 0 0 , and,

given 0 and 0 we can also compute

and

fY t ,Y t 1 ,...,Y 1 |0 (yt , yt 1 , ..., y1 |0 ; 0 )

= fY1 |0 (y1 |0 ; 0 )

1 (0 ) = y1 0 0 0

the vector of parameters ).

Having computed 1 (0 )(and given 0 ) we can also compute

2 (0 ) = y2 0 0 1 ( 0 ), and,iterating the procedure,

t (0 ) = yt 0 0 t 1 (0 )

Then

fY t |t 1 (yt |t 1 ; 0 ) = fY t |Y t 1 ,...,Y 1 ,0 (yt |yt 1 , ..., y1 , 0 ; 0 )

J IANHUA G ANG (RUC)

ARMA2013

29 / 42

t 1

t =2

T

t ( 0 )2

.

= (2 )T /2 (20 )T /2 exp

220

t =1

Notice that this is not the density of (Yt , Yt 1 , ..., Y1 ) where each Yt

has an MA(1) representation, but that a density (i.e., the density of

(Yt , Yt 1 , ..., Y1 ) when each Yt has MA(1) representation)

conditional on 0 .

Moreover, we cannot compute a likelihood, because we cant observe

0 .

I NTRODUCTORY F INANCIAL E CONOMETRICS Topic 4 Estimation SofPRING

ARMA2013

ML OF MA(1)

MA(1)

30 / 42

ML OF MA(1) (cont.)

Therefore, consider the process

  Y_t = μ₀ + ε_t + θ₀ε_{t-1},  ε_t ~ NID(0, σ₀²) for t > 0,  ε₀ = 0.

This process is very similar to the stationary MA(1), and it has the density above (setting ε₀ = 0). Given that we know ε₀ = 0, we can initialize the iterations (for all the admissible values of θ):

  ε_t(θ) = y_t - μ - θε_{t-1}(θ),  ε₀(θ) = 0.

We can then compute the likelihood (which is, then, a "conditional likelihood") as a function of a set of observations (y_T, y_{T-1}, ..., y₁) and of a generic vector of unknown parameters θ:

  f(y_T, ..., y₁, ε₀ = 0; θ) = f(y₁ | ε₀ = 0; θ) ∏_{t=2}^T f(y_t | y_{t-1}, ..., y₁, ε₀ = 0; θ)
                             = (2π)^{-T/2} (σ²)^{-T/2} exp( -∑_{t=1}^T ε_t(θ)² / (2σ²) ),

so the log-likelihood is

  l(θ) = -(T/2) ln(2π) - (T/2) ln(σ²) - (1/(2σ²)) ∑_{t=1}^T ε_t(θ)².

Its maximizer is called the "conditional maximum likelihood estimate".
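The residual recursion and the conditional log-likelihood above can be sketched in a few lines. A minimal illustration (not from the slides; the argument names mu, theta and sigma2 are stand-ins for μ, θ and σ²):

```python
import numpy as np

def ma1_conditional_loglik(mu, theta, sigma2, y):
    """Conditional log-likelihood of an MA(1), initializing eps_0 = 0."""
    eps_prev = 0.0                          # epsilon_0 = 0
    rss = 0.0
    for yt in y:
        eps_t = yt - mu - theta * eps_prev  # eps_t(theta) = y_t - mu - theta * eps_{t-1}(theta)
        rss += eps_t ** 2
        eps_prev = eps_t
    T = len(y)
    return -T / 2 * np.log(2 * np.pi) - T / 2 * np.log(sigma2) - rss / (2 * sigma2)
```

Maximizing this over (mu, theta, sigma2), e.g. with a numerical optimizer on its negative, yields the conditional MLE.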

ML OF MA(q)

The same argument applies to an MA(q),

  Y_t = μ₀ + ε_t + θ₀,₁ε_{t-1} + ... + θ₀,qε_{t-q},  ε_t ~ NID(0, σ₀²),

provided the roots of 1 + θ₀,₁z + ... + θ₀,q z^q = 0 are all outside the unit circle. Setting ε₀ = ε₋₁ = ... = ε₋q₊₁ = 0, we can compute the residuals iteratively,

  ε_t(θ) = y_t - μ - θ₁ε_{t-1}(θ) - ... - θ_qε_{t-q}(θ),

and formulate a "conditional likelihood",

  f(y_T, ..., y₁ | ε₀ = ... = ε₋q₊₁ = 0; θ) = (2π)^{-T/2} (σ²)^{-T/2} exp( -∑_{t=1}^T ε_t(θ)² / (2σ²) ),

with log-likelihood

  l(θ) = -(T/2) ln(2π) - (T/2) ln(σ²) - (1/(2σ²)) ∑_{t=1}^T ε_t(θ)².

Its maximizer is called the "conditional maximum likelihood estimate".

ML OF ARMA(p,q)

Consider the ARMA(p,q) process

  Y_t = c₀ + φ₀,₁Y_{t-1} + ... + φ₀,pY_{t-p} + ε_t + θ₀,₁ε_{t-1} + ... + θ₀,qε_{t-q},  ε_t ~ NID(0, σ₀²),

where the roots of 1 + θ₀,₁z + ... + θ₀,q z^q = 0 are all outside the unit circle and there is no common factor. Conditioning on the first p observations and setting ε_p = ε_{p-1} = ... = ε_{p-q+1} = 0, we can compute

  ε_t(θ) = y_t - c - φ₁y_{t-1} - ... - φ_p y_{t-p} - θ₁ε_{t-1}(θ) - ... - θ_qε_{t-q}(θ)  for t > p,

so that

  f(y_T, ..., y_{p+1} | y_p, ..., y₁, ε_p = 0, ..., ε_{p-q+1} = 0; θ)
    = (2π)^{-(T-p)/2} (σ²)^{-(T-p)/2} exp( -∑_{t=p+1}^T ε_t(θ)² / (2σ²) )

and

  l(θ) = -((T-p)/2) ln(2π) - ((T-p)/2) ln(σ²) - (1/(2σ²)) ∑_{t=p+1}^T ε_t(θ)².

Its maximizer is called the "conditional maximum likelihood estimate". Notice that, for given σ², maximizing l(θ) over the ARMA parameters is equivalent to choosing them so that the last term (the conditional RSS) is minimized.

If ε_t is not normally distributed, the likelihood is different, and then the maximum likelihood estimate is different as well. If we use the Gaussian density even though ε_t is not normally distributed, our estimate is no longer the maximum likelihood one; in this case it is usually known as the Pseudo (or Quasi) maximum likelihood estimate instead.

NUMERICAL OPTIMISATION

In general there is no closed-form solution for the estimate, and it may be extremely time consuming to compute the log-likelihood function (even the conditional log-likelihood) for all the potential θ. The optimisation of the log-likelihood may be carried out using a numerical algorithm, such as the Newton-Raphson one. Introduce, for a generic θ⁽⁰⁾,

  g(θ⁽⁰⁾) = ∂l(θ)/∂θ evaluated at θ = θ⁽⁰⁾  (gradient),
  H(θ⁽⁰⁾) = ∂²l(θ)/∂θ∂θ′ evaluated at θ = θ⁽⁰⁾  (Hessian),

and consider an approximate second-order Taylor expansion of l(θ):

  l(θ) ≈ l(θ⁽⁰⁾) + g(θ⁽⁰⁾)′[θ - θ⁽⁰⁾] + (1/2)[θ - θ⁽⁰⁾]′H(θ⁽⁰⁾)[θ - θ⁽⁰⁾].

The first-order condition

  ∂l(θ)/∂θ ≈ g(θ⁽⁰⁾) + H(θ⁽⁰⁾)[θ - θ⁽⁰⁾] = 0,

solving for θ, gives

  θ⁽¹⁾ = θ⁽⁰⁾ - [H(θ⁽⁰⁾)]⁻¹ g(θ⁽⁰⁾).

Clearly, θ⁽¹⁾ is (in a certain probabilistic sense) better than a generic θ⁽⁰⁾, but the approximation is not exact (there is a remainder, in this case of the third order, in the Taylor expansion of l(θ)). So we expand l(θ) again around this possibly poor estimate θ⁽¹⁾ and compute

  θ⁽²⁾ = θ⁽¹⁾ - [H(θ⁽¹⁾)]⁻¹ g(θ⁽¹⁾).

Iterating until convergence gives θ̂.

In many cases you may start the optimisation with any set of starting values, but this may result in a rather slow optimisation, or even in an "incorrect" solution (you may end up picking a local maximum rather than the maximum). It is then advisable to start from a "good" point, that is, from a consistent estimate of θ (typically, an estimate that you may compute easily, even if it is less efficient than maximum likelihood): the correlogram-based estimate is a good starting point (given certain regularity conditions, properties as in the pseudo-maximum likelihood estimate may be obtained after just one step).
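The Newton-Raphson iteration above can be illustrated with numerical derivatives on a scalar parameter. A sketch (the function name and step sizes are my own, not from the slides):

```python
def newton_raphson_max(loglik, theta0, h=1e-5, tol=1e-8, max_iter=100):
    """Maximize loglik(theta) by Newton-Raphson: theta_(i+1) = theta_(i) - H^-1 g,
    with gradient g and Hessian H computed by central differences (scalar theta)."""
    theta = theta0
    for _ in range(max_iter):
        g = (loglik(theta + h) - loglik(theta - h)) / (2 * h)                    # gradient
        H = (loglik(theta + h) - 2 * loglik(theta) + loglik(theta - h)) / h ** 2  # Hessian
        step = g / H
        theta -= step
        if abs(step) < tol:   # convergence criterion on the step size
            break
    return theta
```

For a quadratic log-likelihood the iteration converges in one step, which is why a good (consistent) starting value makes the first Newton-Raphson step already nearly efficient.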

INTRODUCTORY FINANCIAL ECONOMETRICS
Topic 5 Models of Heteroskedasticity
3 CREDITS, 51 HOURS

Jianhua Gang
School of Finance
Renmin University of China
Spring 2013

A SAMPLE (S&P500 AND CBOE EXCHANGE)

Motivation: to explain a number of important features common to much financial data:

  leptokurtosis
  volatility clustering or volatility pooling
  leverage effects

The classical linear model is

  y_t = β₁ + β₂x₂t + ... + β_k x_kt + u_t,

or, more compactly,

  y = Xβ + u,  u ~ N(0, σ²I).

[Figure: SPX (S&P500 Index) and CBOE VIX, 1990-2009]

NON-LINEAR MODELS

Campbell, Lo and MacKinlay (1997) define a non-linear data generating process (DGP) as one that can be written as

  y_t = f(u_t, u_{t-1}, u_{t-2}, ...),

where u_t is an i.i.d. error term. More specifically,

  y_t = g(u_{t-1}, u_{t-2}, ...) + u_t σ²(u_{t-1}, u_{t-2}, ...).

Models with nonlinear g(·) are non-linear in mean, while those with nonlinear σ²(·) are non-linear in variance.

[Figure: SPX daily returns and daily VIX changes, 1990-2009]

TESTING FOR NON-LINEARITY

The traditional tools of time series analysis (acfs, spectral analysis) may find no evidence that we could use a linear model, but the data may still not be independent. Some nonlinear relationships can be made linear by a suitable transformation; on the other hand, it is likely that many relationships in finance are intrinsically non-linear.

Many tests for non-linearity have been developed. The simplest is Ramsey's RESET test, which takes the form

  û_t = β₀ + β₁ŷ_t² + β₂ŷ_t³ + ... + β_{p-1}ŷ_t^p + v_t.

There are many types of non-linear models, e.g.

  ARCH / GARCH
  switching models
  bilinear models

One particular non-linear model that has proved very useful in finance is the ARCH model due to Engle (1982).

TOPIC 5 MODELS OF HETEROSKEDASTICITY IN TIME SERIES MODELS

HETEROSKEDASTICITY REVISITED

An example of a structural model is

  y_t = β₁ + β₂x₂t + β₃x₃t + β₄x₄t + u_t,  u_t ~ N(0, σ_u²),

where the classical assumption is homoskedasticity, i.e. var(u_t) = σ². What if the variance of the error is not constant? This is heteroskedasticity, and it would imply that standard error estimates could be wrong. Is the variance of the errors likely to be constant over time? Not for financial data!!!

ARCH MODELS

So use a model which does not assume that the variance is constant, but instead specifies a model for the variance (seminal paper of Engle (1982)). Autoregressive conditional heteroskedasticity (ARCH) models are now commonly used to describe and forecast changes in the volatility of financial time series. Surveys: Bollerslev et al. (1992, 1994), Bera and Higgins (1993), Pagan (1996), Palm (1996) and Shephard (1996), among others.

Recall the definition of the (conditional) variance of u_t:

  σ_t² = var(u_t | u_{t-1}, u_{t-2}, ...) = E{ [u_t - E(u_t)]² | u_{t-1}, u_{t-2}, ... }.

We usually assume that E(u_t) = 0, so

  σ_t² = var(u_t | u_{t-1}, u_{t-2}, ...) = E( u_t² | u_{t-1}, u_{t-2}, ... ).

Now, what could the current value of the variance of the errors plausibly depend upon? Previous squared error terms. This leads to the autoregressive conditionally heteroskedastic (ARCH) model:

  y_t = β₁ + β₂x₂t + ... + β_k x_kt + u_t,  u_t ~ N(0, σ_t²),

where σ_t² = α₀ + α₁u²_{t-1} (an ARCH(1)). We can easily extend this to the general case where the error variance depends on q lags of squared errors. Instead of calling the variance σ_t², in the literature it is usually called h_t, so the ARCH(q) model is

  h_t = α₀ + α₁u²_{t-1} + α₂u²_{t-2} + ... + α_q u²_{t-q}.

ANOTHER WAY OF WRITING AN ARCH MODEL

For illustration, consider an ARCH(1). Instead of the above, we can write

  y_t = β₁ + β₂x₂t + ... + β_k x_kt + u_t,  u_t = v_t σ_t,  v_t ~ N(0, 1),

  σ_t = sqrt( α₀ + α₁u²_{t-1} ).

The two are different ways of expressing exactly the same model. The first form is easier to understand, while the second form is required for simulating from an ARCH model, for example.

TESTING FOR ARCH EFFECTS

1. First, run any postulated linear regression of the form given in the equation above, e.g.

     y_t = β₁ + β₂x₂t + ... + β_k x_kt + u_t,

   saving the residuals, û_t.

2. Then square the residuals, and regress them on q own lags to test for ARCH of order q, i.e. run the regression

     û_t² = γ₀ + γ₁û²_{t-1} + γ₂û²_{t-2} + ... + γ_q û²_{t-q} + v_t.

3. The test statistic is defined as TR² (the number of observations multiplied by the coefficient of multiple correlation) from the last regression, and is distributed as a χ²(q).

The null and alternative hypotheses are

  H0: γ₁ = γ₂ = ... = γ_q = 0
  H1: at least one γ_j ≠ 0, j = 1, ..., q.

If the value of the test statistic is greater than the critical value from the χ² distribution, then reject the null hypothesis. Note that the ARCH test is also sometimes applied directly to returns instead of the residuals from Stage 1 above.
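The two-stage test above can be sketched as follows (a minimal illustration; the helper name arch_lm_test is my own):

```python
import numpy as np

def arch_lm_test(resid, q):
    """Engle's ARCH LM test: regress squared residuals on q own lags.
    Statistic = T * R^2, asymptotically chi-squared(q) under H0 of no ARCH."""
    u2 = np.asarray(resid, dtype=float) ** 2
    y = u2[q:]                                              # dependent variable u_t^2
    lags = [u2[q - j:len(u2) - j] for j in range(1, q + 1)]  # u_{t-1}^2, ..., u_{t-q}^2
    X = np.column_stack([np.ones(len(y))] + lags)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    r2 = 1.0 - (e @ e) / ((y - y.mean()) @ (y - y.mean()))
    return len(y) * r2
```

The returned value is compared with the χ²(q) critical value; a large statistic rejects the null of no ARCH effects.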

PROBLEMS WITH ARCH(q) MODELS

How do we decide on q? The required value of q might be very large, and non-negativity constraints on the α's might be violated (since a variance cannot be negative). Therefore, a natural extension of an ARCH(q) model which circumvents some of these problems is the GARCH model.

GARCH MODELS

Generalised ARCH (GARCH) models allow the conditional variance to be dependent upon previous own lags. The variance equation is now

  σ_t² = α₀ + α₁u²_{t-1} + βσ²_{t-1}.

This is a GARCH(1,1) model, which is like an ARMA(1,1) model for the variance equation. We could also show that a GARCH(1,1) model can be written as an infinite order ARCH model. Relative to a high-order ARCH, a GARCH model is

  more parsimonious - avoiding overfitting;
  less likely to breach non-negativity constraints.

The general GARCH(q,p) specification is

  σ_t² = α₀ + ∑_{i=1}^q α_i u²_{t-i} + ∑_{j=1}^p β_j σ²_{t-j}.

But in general a GARCH(1,1) model will be sufficient to capture the volatility clustering in the data.

THE UNCONDITIONAL VARIANCE UNDER A GARCH SPECIFICATION

The unconditional variance of u_t is given by

  var(u_t) = α₀ / (1 - (α₁ + β))

when α₁ + β < 1. The case α₁ + β ≥ 1 is termed non-stationarity in variance; then conditional variance forecasts do not converge on their unconditional value as the horizon increases.

ESTIMATION OF ARCH/GARCH MODELS

Since the model is no longer of the usual linear form, we cannot use OLS; we use maximum likelihood instead. The method works by finding the most likely values of the parameters given the actual data.
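Simulating from the multiplicative form u_t = v_t σ_t and checking the unconditional variance formula — a sketch (the parameter values are illustrative only):

```python
import numpy as np

def simulate_garch11(alpha0, alpha1, beta, T, seed=0):
    """Simulate u_t = v_t * sigma_t with sigma_t^2 = alpha0 + alpha1*u_{t-1}^2 + beta*sigma_{t-1}^2."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(T)
    u = np.empty(T)
    sig2 = alpha0 / (1.0 - alpha1 - beta)   # start at the unconditional variance
    for t in range(T):
        u[t] = v[t] * np.sqrt(sig2)
        sig2 = alpha0 + alpha1 * u[t] ** 2 + beta * sig2
    return u

u = simulate_garch11(alpha0=0.6, alpha1=0.1, beta=0.3, T=200_000)
# the sample variance should be close to alpha0 / (1 - alpha1 - beta) = 1.0 here
```

With α₁ + β well below one, the sample variance of a long simulated path settles near the unconditional value, as the formula predicts.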

MAXIMUM LIKELIHOOD ESTIMATION

The steps are as follows:

1. Specify the appropriate equations for the mean and the variance - e.g. an AR(1)-GARCH(1,1) model:

     y_t = μ + φy_{t-1} + u_t,  u_t ~ N(0, σ_t²),
     σ_t² = α₀ + α₁u²_{t-1} + βσ²_{t-1}.    (1)

2. Specify the log-likelihood function (LLF) to maximise:

     l = -(T/2) log(2π) - (1/2) ∑_{t=1}^T log(σ_t²) - (1/2) ∑_{t=1}^T (y_t - μ - φy_{t-1})² / σ_t².    (2)

3. The computer will maximise the function and give parameter values and their standard errors.

Unfortunately, the LLF for a model with time-varying variances cannot be maximised analytically, except in the simplest of cases, so a numerical procedure is used to maximise the log-likelihood function. A potential problem: local optima or multimodalities in the likelihood surface. The way we do the optimisation is:

1. Set up the LLF.
2. Use regression to get initial guesses for the mean parameters.
3. Choose some initial guesses for the conditional variance parameters.
4. Specify a convergence criterion - either by criterion or by value.
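A sketch of the LLF in (2) as a Python function (initializing σ₁² at the sample variance is a common choice of mine here, not prescribed by the slides):

```python
import numpy as np

def ar1_garch11_loglik(mu, phi, alpha0, alpha1, beta, y):
    """Log-likelihood (2) of the AR(1)-GARCH(1,1) model (1), summed over t = 2, ..., T."""
    y = np.asarray(y, dtype=float)
    e = y[1:] - mu - phi * y[:-1]           # residuals u_t from the mean equation
    sig2 = np.empty_like(e)
    sig2[0] = y.var()                       # initialization (assumption)
    for t in range(1, len(e)):
        sig2[t] = alpha0 + alpha1 * e[t - 1] ** 2 + beta * sig2[t - 1]
    return -0.5 * np.sum(np.log(2 * np.pi) + np.log(sig2) + e ** 2 / sig2)
```

A numerical optimizer would maximise this over (μ, φ, α₀, α₁, β), starting from regression-based guesses for the mean parameters as in the steps above.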

NON-NORMALITY AND MAXIMUM LIKELIHOOD

We can test for normality using the following representation:

  u_t = v_t σ_t,  v_t ~ N(0, 1),
  σ_t = sqrt( α₀ + α₁u²_{t-1} + βσ²_{t-1} ),  so that  v_t = u_t / σ_t.

The sample counterpart is v̂_t = û_t / σ̂_t. Are the v̂_t normal? Typically the v̂_t are still leptokurtic, although less so than the û_t. Is this a problem? Not really, as we discussed before: we can use ML with a robust variance/covariance estimator. ML with robust standard errors is called Quasi-Maximum Likelihood or QML (also known as pseudo-ML).

EXTENSIONS TO THE BASIC GARCH MODEL

Since the GARCH model was developed, a huge number of extensions and variants have been proposed. Three of the most important examples are the EGARCH, GJR, and GARCH-M models. Plain GARCH models cannot account for leverage effects; asymmetries are captured by the EGARCH and GJR models, which are asymmetric GARCH models.

THE EGARCH MODEL

The variance equation is given by

  log σ_t² = ω + β log σ²_{t-1} + γ u_{t-1}/sqrt(σ²_{t-1}) + α [ |u_{t-1}|/sqrt(σ²_{t-1}) - sqrt(2/π) ].

Since we model log σ_t², then even if the parameters are negative, σ_t² will be positive. We can account for the leverage effect by noticing that a negative shock (u_{t-1} < 0) has an asymmetric effect on the dependent variable log σ_t² as opposed to a positive shock.

THE GJR MODEL

  σ_t² = α₀ + α₁u²_{t-1} + βσ²_{t-1} + γu²_{t-1}I_{t-1},

where

  I_{t-1} = 1 if u_{t-1} < 0,
  I_{t-1} = 0 otherwise.

For a leverage effect, we would see γ > 0. We require α₁ + γ ≥ 0 and α₁ ≥ 0 for the non-negativity conditions.
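The asymmetry in the GJR variance equation can be seen directly by evaluating next-period variance at shocks of opposite sign (a sketch; the parameter values below are arbitrary, not estimates):

```python
import numpy as np

def gjr_next_variance(u_prev, alpha0, alpha1, beta, gamma, sig2_prev):
    """GJR: sigma_t^2 = a0 + a1*u_{t-1}^2 + beta*sigma_{t-1}^2 + gamma*u_{t-1}^2*I(u_{t-1} < 0)."""
    u_prev = np.asarray(u_prev, dtype=float)
    return (alpha0 + alpha1 * u_prev ** 2 + beta * sig2_prev
            + gamma * u_prev ** 2 * (u_prev < 0))

# with gamma > 0, a negative shock raises variance more than a positive shock of the same size
h_neg = gjr_next_variance(-1.0, 0.1, 0.05, 0.85, 0.1, 1.0)
h_pos = gjr_next_variance(+1.0, 0.1, 0.05, 0.85, 0.1, 1.0)
```

Evaluating this over a grid of u_{t-1} values, holding σ²_{t-1} fixed, traces out the news impact curve discussed next.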

AN EXAMPLE OF GJR

Estimating a GJR model on SPX returns, we obtain the following results (t-ratios in parentheses):

  y_t = 0.172 + û_t
        (3.198)

  σ_t² = α̂₀ + α̂₁û²_{t-1} + β̂σ̂²_{t-1} + γ̂û²_{t-1}I_{t-1}
        (16.372)  (0.437)   (14.999)   (5.772)

The news impact curve plots the next-period volatility (h_t) that would arise from various positive and negative values of u_{t-1}, given an estimated model.

[Figure: News impact curves for SPX returns using coefficients from GARCH and GJR model estimates]

GARCH-IN-MEAN

Most finance theories suppose that investors should be rewarded for taking additional risk by obtaining a higher return - so why not let the return of a security be partly determined by its risk? Engle, Lilien and Robins (1987) suggested the ARCH-M specification. A GARCH-M model would be

  y_t = μ + δσ_{t-1} + u_t,  u_t ~ N(0, σ_t²),
  σ_t² = α₀ + α₁u²_{t-1} + βσ²_{t-1}.

USES OF GARCH-TYPE MODELS

GARCH can model the volatility clustering effect since the conditional variance is autoregressive; such models can be used to forecast volatility. We could show that

  Var(y_t | y_{t-1}, y_{t-2}, ...) = Var(u_t | u_{t-1}, u_{t-2}, ...).

The components can also be combined into more complex hybrid models - e.g. an ARMA-EGARCH(1,1)-M model.

INTRODUCTORY FINANCIAL ECONOMETRICS
Topic 6 Multivariate GARCH
3 CREDITS, 51 HOURS

Jianhua Gang
School of Finance
Renmin University of China
Spring 2013

MGARCH FAMILY

Multivariate GARCH models address questions such as:

1. Does the volatility of one market lead the volatility of other markets, either directly (through its conditional variance) or indirectly (through its conditional covariances)?
2. Is the impact the same for negative and positive shocks of the same amplitude?
3. Are the correlations between asset returns constant, or do they change over time - perhaps higher during periods of higher volatility (sometimes associated with financial crises)? Are they increasing in the long run, perhaps because of the globalization of financial markets?

Such issues extend beyond asset returns - for example, the impact of volatility in financial markets on real variables like exports and output growth rates, and the volatility of these growth rates.

References: Gourieroux (1997); De Santis and Gerard (1998); Hafner and Herwartz (1998); Franses and van Dijk (2000); Lien and Tse (2002). MGARCH models were initially developed in the late 1980s and the first half of the 1990s; after the 2000s came another active phase of this field.

VEC MODEL (BOLLERSLEV ET AL. 1988)

Consider the general setup

  y_t = μ_t(θ) + ε_t,  ε_t = H_t^{1/2}(θ) z_t,

where H_t^{1/2} is a positive definite matrix and the N×1 innovation vector z_t satisfies

  E(z_t) = 0,  Var(z_t) = I_N,

with I_N the identity matrix of order N. The VEC model of Bollerslev et al. (1988) specifies

  h_t = c + A η_{t-1} + G h_{t-1},

where

  h_t = vech(H_t),  η_t = vech(ε_t ε_t′),

and vech(·) denotes the operator that stacks the lower triangular portion of an N×N matrix as an N(N+1)/2 × 1 vector. A and G are square parameter matrices of order N(N+1)/2 and c is an N(N+1)/2 × 1 parameter vector.

DVEC (BOLLERSLEV ET AL. 1988)

The number of parameters in the VEC model is so large that in practice it is used only in the bivariate case. The diagonal VEC (DVEC) model was proposed to overcome this: the A and G matrices are assumed to be diagonal, each element h_{ij,t} depending only on its own lag and on the previous value of ε_{it}ε_{jt}. But, even under this diagonality, large-scale systems are still highly parameterized and difficult to estimate. An even simpler version of the DVEC (Ding and Engle, 2001) restricts A and G to be positive scalars (the scalar model).

EWMA (RISKMETRICS)

In practice, the estimated VEC-type model is often very close to the unit root case. To take this into account, Riskmetrics uses the exponentially weighted moving average (EWMA) model and defines the variances and covariances as IGARCH-type models (Engle and Bollerslev, 1986):

  h_t = (1 - λ) η_{t-1} + λ h_{t-1},

which is a scalar VEC. The decay factor proposed by Riskmetrics is λ = 0.94 for daily data and λ = 0.97 for monthly data. However, the decay factor is not estimated but suggested, and is therefore very hard to justify.
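The scalar-VEC recursion can be applied to the covariance matrix itself. A sketch (initializing with the first outer product is my own choice, not Riskmetrics'):

```python
import numpy as np

def ewma_cov(eps, lam=0.94):
    """Riskmetrics EWMA: H_t = (1 - lam) * eps_{t-1} eps_{t-1}' + lam * H_{t-1}."""
    eps = np.asarray(eps, dtype=float)
    H = np.outer(eps[0], eps[0])            # initialization (assumption)
    for e in eps[1:]:
        H = (1.0 - lam) * np.outer(e, e) + lam * H
    return H
```

Because each update is a convex combination of positive semi-definite matrices, H stays positive semi-definite without any parameter restrictions beyond 0 < λ < 1.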

BEKK

It is difficult to ensure the positivity of H_t in the VEC representation without imposing strong restrictions on the parameters. Engle and Kroner (1995) propose an alternative H_t that ensures positivity: the BEKK model.

DEFINITION (BEKK(1,1,K))

  H_t = C*′C* + ∑_{k=1}^K A_k′ ε_{t-1}ε_{t-1}′ A_k + ∑_{k=1}^K G_k′ H_{t-1} G_k,    (1)

where C* is an upper triangular matrix (without loss of generality).

One can impose a diagonal BEKK model, i.e. diagonalize the A_k and G_k. (This is then also a DVEC model, but less general; on the other hand, the DVEC is not guaranteed to be positive definite.) Parameters of the BEKK model do not represent directly the impact of the different lagged terms on the elements of H_t, unlike in the VEC model.

FACTOR GARCH (ENGLE ET AL. (1990b), BOLLERSLEV AND ENGLE (1993))

The difficulty when estimating a VEC or even a BEKK model is the high number of unknown parameters, even after imposing several restrictions. It is thus not surprising that these models are rarely used when the number of series is larger than 3 or 4. Factor and orthogonal models circumvent this difficulty by imposing a common dynamic structure on all the elements of H_t, which results in less parameterized models.

DEFINITION (F-GARCH(1,1,K))

A BEKK model is a factor GARCH model, denoted F-GARCH(1,1,K), if for each k = 1, ..., K, A_k and G_k have rank one and have the same left and right eigenvectors, λ_k and w_k, i.e.

  A_k = α_k w_k λ_k′,    (2)
  G_k = β_k w_k λ_k′,    (3)

where α_k and β_k are scalars and λ_k and w_k are N×1 vectors satisfying

  w_k′ λ_i = 0 for k ≠ i,  w_k′ λ_k = 1,  ∑_{n=1}^N w_kn = 1.    (4)

Substituting (2) and (3) into (1) and defining Ω = C*′C*, we get

  H_t = Ω + ∑_{k=1}^K λ_k λ_k′ [ α_k² w_k′ε_{t-1}ε_{t-1}′w_k + β_k² w_k′H_{t-1}w_k ].    (5)

H_t is of full rank because Ω is positive definite. The vector λ_k is defined as the factor loading, and the scalar w_k′ε_t (denoted f_kt) is the kth factor. The expression between brackets can be replaced by other univariate GARCH specifications.

For example, for an F-GARCH(1,1,2),

  H_t = Ω + λ₁λ₁′ [ α₁² w₁′ε_{t-1}ε_{t-1}′w₁ + β₁² w₁′H_{t-1}w₁ ]
          + λ₂λ₂′ [ α₂² w₂′ε_{t-1}ε_{t-1}′w₂ + β₂² w₂′H_{t-1}w₂ ].    (6)

FACTOR STRUCTURE

For K = 2 factors, the model implies

  ε_t = λ₁f_{1t} + λ₂f_{2t} + e_t,

where e_t represents an idiosyncratic shock with constant variance matrix, uncorrelated with the two factors. Each factor f_kt has zero conditional mean and a conditional variance like a GARCH(1,1) process. More generally,

  ε_t = Λf_t + e_t,

where Λ is a matrix of dimension N×K and f_t is a K×1 vector. A factor is observable if it is specified as a function of ε_t. There are several variants of the factor model in the literature, e.g. Vrontos et al. (2003), the full-factor multivariate GARCH model (FF-GARCH).

ORTHOGONAL GARCH

DEFINITION (O-GARCH(1,1,m))

Kariya (1988) and Alexander and Chibumba (1997): the N×N time-varying variance matrix H_t is generated by m ≤ N univariate GARCH models.

CCC MODEL

Multivariate models should allow one to specify separately the individual conditional variances on the one hand, and the conditional correlation matrix (or another measure of dependence between the individual series, like the copula of the conditional joint density) on the other hand. A hierarchical procedure:

1. Model the individual conditional variances (within the multivariate system);
2. Model the conditional correlation matrix (imposing positive definiteness for any t).

Models of conditional variances and correlations are more intuitive and therefore more easily estimable, although results on stationarity, ergodicity and moments may not be so straightforward. The constant conditional correlation (CCC) restriction greatly reduces the number of unknown parameters and thus simplifies the estimation (Bollerslev (1990)).

DEFINITION (CCC)

  H_t = D_t R D_t = ( ρ_ij sqrt(h_iit h_jjt) ),    (7)

where

  D_t = diag( h_{11t}^{1/2}, ..., h_{NNt}^{1/2} ),

h_iit can be defined as any univariate GARCH model, and R = (ρ_ij) is a symmetric positive definite matrix with ρ_ii = 1 for any i. The original CCC model has a GARCH(1,1) specification for each conditional variance in D_t.

He and Terasvirta (2002b) use a VEC-type formulation for (h_{11t}, h_{22t}, ..., h_{NNt})′ to allow for interactions between the conditional variances. They call this the extended CCC model.
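Assembling H_t from (7) is a one-line matrix product. A sketch:

```python
import numpy as np

def ccc_cov(h_diag, R):
    """CCC: H_t = D_t R D_t with D_t = diag(sqrt(h_iit)) and constant correlation matrix R."""
    D = np.diag(np.sqrt(np.asarray(h_diag, dtype=float)))
    return D @ R @ D
```

Here h_diag holds the N univariate conditional variances at time t (each from its own GARCH model), and R is the constant correlation matrix, so only the diagonal of H_t changes over time.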

DCC MODELS

The assumption that the conditional correlations are constant may seem unrealistic in many empirical applications. Extensions of the CCC model make the conditional correlation matrix time-dependent (Christodoulakis and Satchell (2002), Engle (2002) and Tse and Tsui (2002)). The DCC models of Tse and Tsui (2002) and Engle (2002) are useful when modelling high-dimensional data sets. A time-dependent conditional correlation matrix has to be positive definite for any t; the DCC models guarantee this under simple conditions on the parameters.

DEFINITION (DCC MODEL OF TSE AND TSUI (2002), OR DCCT(M))

  H_t = D_t R_t D_t,

where D_t is defined in (7), h_iit can be defined as any univariate GARCH model, and

  R_t = (1 - θ₁ - θ₂) R + θ₁ Ψ_{t-1} + θ₂ R_{t-1},    (8)

where θ₁ and θ₂ are non-negative parameters satisfying θ₁ + θ₂ < 1, R is a symmetric N×N positive definite parameter matrix with ρ_ii = 1, and Ψ_{t-1} is the N×N sample correlation matrix of the standardized residuals u_τ for τ = t-M, t-M+1, ..., t-1. Its (i,j)th element is given by

  ψ_{ij,t-1} = ∑_{m=1}^M u_{i,t-m} u_{j,t-m} / sqrt( (∑_{m=1}^M u²_{i,t-m}) (∑_{m=1}^M u²_{j,t-m}) ).    (9)

In matrix form, Ψ_{t-1} = B_{t-1}⁻¹ L_{t-1} L_{t-1}′ B_{t-1}⁻¹, where B_{t-1} is an N×N diagonal matrix with ith diagonal element (∑_{h=1}^M u²_{i,t-h})^{1/2} and L_{t-1} = (u_{t-1}, ..., u_{t-M}) is an N×M matrix, with u_t = (u_{1t}, u_{2t}, ..., u_{Nt})′.

A necessary condition for the positive definiteness of Ψ_{t-1}, and therefore also of R_t, is that M ≥ N. Then R_t is itself a correlation matrix if R_{t-1} is also a correlation matrix (notice ρ_{ii,t} = 1 for any i).

DEFINITION (DCC MODEL OF ENGLE (2002), OR DCCE(1,1))

  H_t = D_t R_t D_t,

where

  R_t = diag( q_{11,t}^{-1/2}, ..., q_{NN,t}^{-1/2} ) Q_t diag( q_{11,t}^{-1/2}, ..., q_{NN,t}^{-1/2} ),

and the N×N symmetric positive definite matrix Q_t = (q_{ij,t}) is given by

  Q_t = (1 - α - β) Q̄ + α u_{t-1} u_{t-1}′ + β Q_{t-1},    (10)

where α and β are non-negative scalar parameters satisfying α + β < 1, and Q̄ is the unconditional variance matrix of u_t (which can be estimated by its sample counterpart, rendering the estimation even simpler; Engle and Sheppard, 2001).

To illustrate, in the bivariate case the DCCT conditional correlation is

  ρ_{12,t} = (1 - θ₁ - θ₂) ρ₁₂ + θ₂ ρ_{12,t-1}
             + θ₁ ∑_{m=1}^M u_{1,t-m} u_{2,t-m} / sqrt( (∑_{m=1}^M u²_{1,t-m}) (∑_{m=1}^M u²_{2,t-m}) ),    (11)

while for the DCCE,

  ρ_{12,t} = [ (1 - α - β) q̄₁₂ + α u_{1,t-1}u_{2,t-1} + β q_{12,t-1} ]
             / sqrt( [ (1 - α - β) q̄₁₁ + α u²_{1,t-1} + β q_{11,t-1} ] [ (1 - α - β) q̄₂₂ + α u²_{2,t-1} + β q_{22,t-1} ] ).

Unlike DCCT, the DCCE model does not formulate the conditional correlation as a weighted sum of past correlations.

One can test θ₁ = θ₂ = 0 (or α = β = 0, respectively) to check whether imposing constant conditional correlations is empirically relevant. A drawback of the DCC models is that θ₁, θ₂ in DCCT and α, β in DCCE are scalars, so that all the conditional correlations obey the same dynamics. This is, however, necessary to ensure that R_t is positive definite for any t through sufficient conditions on the parameters.
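The DCCE recursion (10), with the unconditional matrix replaced by its sample counterpart, can be sketched as follows (the function name is my own; u is a T×N array of standardized residuals):

```python
import numpy as np

def dcc_correlations(u, alpha, beta):
    """DCCE: Q_t = (1-a-b)*Qbar + a*u_{t-1}u_{t-1}' + b*Q_{t-1};
    R_t = diag(Q_t)^(-1/2) Q_t diag(Q_t)^(-1/2) rescales Q_t to a correlation matrix."""
    u = np.asarray(u, dtype=float)
    T, N = u.shape
    Qbar = u.T @ u / T                      # sample counterpart of the unconditional matrix
    Q = Qbar.copy()                         # initialization
    R = np.empty((T, N, N))
    for t in range(T):
        d = np.sqrt(np.diag(Q))
        R[t] = Q / np.outer(d, d)           # rescale to unit diagonal
        Q = (1 - alpha - beta) * Qbar + alpha * np.outer(u[t], u[t]) + beta * Q
    return R
```

With α, β ≥ 0 and α + β < 1, each Q_t is a positive combination of positive semi-definite matrices, so every R_t is a valid correlation matrix.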

WHY DCC?

DCC models allow flexible GARCH specifications in the variance part: since the conditional variances (together with the conditional means) can be estimated using N univariate models, one can easily extend the DCC-GARCH models to more complex GARCH-type structures. This makes the approach feasible even when N is high.

Of course, when N is large, the restriction of common dynamics gets tighter; but for large N the problem of maintaining tractability also gets harder. In this respect, several variants of the DCC model have been proposed in the literature. For example, if forcing the dynamics of the conditional correlation matrix to be the same for all the correlations is not desirable, the parameters can follow a switching regime driven by an unobserved Markov chain, so that the correlation matrix is constant in each regime but may vary across regimes.

THE GENERAL DYNAMIC COVARIANCE (GDC) MODEL

A model somewhat different from the previous ones, but that nests several of them, is the general dynamic covariance (GDC) model proposed by Kroner and Ng (1998):

  H_t = D_t R D_t + Φ ∘ Θ_t,    (12)

where ∘ denotes the elementwise (Hadamard) product,

  D_t = (d_ijt),  d_iit = sqrt(θ_iit),  d_ijt = 0 for i ≠ j,
  θ_ijt = ω_ij + a_i′ ε_{t-1}ε_{t-1}′ a_j + g_i′ H_{t-1} g_j  for any i, j,    (13)

and Φ = (φ_ij) is a symmetric parameter matrix, with Ω = (ω_ij) positive definite and symmetric. Elementwise we have

  h_ijt = ρ_ij sqrt(θ_iit θ_jjt) + φ_ij θ_ijt, for i ≠ j,
  h_iit = θ_iit, for any i.    (14)

The GDC model contains several MGARCH models as special cases.

COPULA-GARCH

A joint distribution can be decomposed into its N marginal distributions and a copula function that completely describes the dependence between the N variables (Sklar (1959), Nelsen (1999), Patton (2000), Jondeau and Rockinger (2001)). A standard copula-GARCH model specifies:

1. a GARCH-type model for each conditional variance;
2. marginal distributions for each series;
3. a conditional copula function.

Such models can be estimated using a two-step maximum likelihood approach. Research is ongoing on extending the DCC idea to other specifications of the conditional dependence, so that the copula function is rendered time-varying through its parameters, which can be functions of past data. A feature of copula-GARCH models is the ease with which very flexible joint distributions may be obtained in the bivariate case; their application to higher dimensions is a subject for further research.

ESTIMATION ISSUES: TWO-STEP MLE

Engle and Sheppard (2001) show that the log-likelihood can be written as the sum of a mean and volatility part and a correlation part (each depending on a set of unknown parameters, say θ₁ and θ₂). One can therefore estimate the coefficients separately, say θ̂₁, θ̂₂. But maximizing them separately is not fully efficient, since they are limited-information estimators. However, one iteration of a Newton-Raphson algorithm applied to the total likelihood, starting at (θ̂₁, θ̂₂), provides an estimator that is asymptotically efficient.

VARIANCE TARGETING (VTE)

A simple trick to ensure a reasonable value of the model-implied unconditional covariance matrix, which also helps to reduce the number of parameters in the maximization of the likelihood function, is referred to as the variance targeting estimator (VTE) by Engle and Mezrich (1996):

1. replace the intercept matrix by its estimate implied by the sample unconditional variance;
2. QML estimation of the remaining parameters.

This also helps the QMLE for long-term prediction.

DIAGNOSTIC CHECKING

It is desirable to check, ex ante, for possibly omitted ARCH effects, and, ex post, the adequacy of the MGARCH specification. If the model is correct (at least in the first two conditional moments), the standardized error term z_t = H_t^{-1/2} ε_t should obey the following moment conditions (Ding and Engle, 2001):

1. E(z_t z_t′) = I_N;
2. Cov(z_it², z_jt²) = 0, for all i ≠ j;
3. Cov(z_it², z²_{j,t-k}) = 0, for k > 0.

These conditions bear on the specification of H_t, regardless of the validity of the assumption about the distribution of z_t; checks that also rely on that distributional assumption could fail even if H_t is correctly specified. Following Tse (2002), diagnostics for conditional heteroskedasticity models applied in the literature can be divided into three categories: portmanteau tests, residual-based diagnostics and Lagrange multiplier tests.

INTRODUCTORY FINANCIAL ECONOMETRICS
Topic 7 Multivariate Models
3 CREDITS, 51 HOURS

Jianhua Gang
School of Finance
Renmin University of China
Spring 2013

SIMULTANEOUS EQUATIONS MODELS

All the models we have looked at thus far have been single-equation models of the form y = Xβ + u, where y is an ENDOGENOUS variable and the x's are EXOGENOUS. An example from economics to illustrate: the demand and supply of a good,

  Q_dt = α + βP_t + γS_t + u_t    (1)
  Q_st = λ + μP_t + kT_t + v_t    (2)
  Q_dt = Q_st    (3)

where S is the price of a substitute good and T is some variable embodying the state of technology. Assuming that the market always clears, and dropping the time subscripts for simplicity,

  Q = α + βP + γS + u    (4)
  Q = λ + μP + kT + v    (5)

These are the STRUCTURAL FORM equations. The point is that price and quantity are determined simultaneously (price affects quantity and quantity affects price). So P and Q are endogenous variables, while S and T are exogenous. We can obtain REDUCED FORM equations corresponding to (4) and (5) by solving equations (4) and (5) for P and for Q (separately).

OBTAINING THE REDUCED FORM

Setting (4) equal to (5),

  α + βP + γS + u = λ + μP + kT + v.    (6)

Re-arranging (6),

  P(β - μ) = (λ - α) + kT - γS + (v - u),    (7)

so, solving for P,

  P = (λ - α)/(β - μ) + k/(β - μ) T - γ/(β - μ) S + (v - u)/(β - μ).    (8)

Substituting (8) back into (5) and solving for Q,

  Q = (βλ - μα)/(β - μ) + βk/(β - μ) T - μγ/(β - μ) S + (βv - μu)/(β - μ).    (9)

(8) and (9) are the reduced form equations for P and Q.
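The reduced form can be checked numerically: treat the structural system (4)-(5) as two linear equations in the endogenous pair (Q, P) and compare the solution with (8). A sketch with illustrative parameter values (not from the slides):

```python
import numpy as np

def solve_market(alpha, beta, gamma, lam, mu, k, S, T, u=0.0, v=0.0):
    """Solve the structural system (4)-(5) for the endogenous variables (P, Q)."""
    A = np.array([[1.0, -beta],   # demand: Q - beta*P = alpha + gamma*S + u
                  [1.0, -mu]])    # supply: Q - mu*P   = lam + k*T + v
    b = np.array([alpha + gamma * S + u, lam + k * T + v])
    Q, P = np.linalg.solve(A, b)
    return P, Q
```

For example, with alpha=10, beta=-1, gamma=0.5, lam=2, mu=1, k=0.3, S=4 and T=10, equation (8) gives P = (2 - 10 + 3 - 2)/(-1 - 1) = 3.5, and either structural equation then gives Q = 8.5, matching the numerical solution.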

SIMULTANEOUS EQUATIONS BIAS

But what would happen if we had estimated equations (4) and (5), i.e. the structural form equations, separately using OLS? Both equations depend on P. One of the CLRM assumptions was that E(X′u) = 0, where X is a matrix containing all the variables on the R.H.S. of the equation. It is clear from (8) that P is related to the errors in (4) and (5), i.e. it is stochastic. Hence the OLS estimate of the coefficient on P is biased, since E(X′u) ≠ 0 in general. In fact, application of OLS to the structural equations of a simultaneous system will lead to biased coefficient estimates. Is the OLS estimator still consistent, even though it is biased? No - in fact the estimator is inconsistent as well. Hence it would NOT be possible to estimate equations (4) and (5) validly using OLS.

CAN WE AVOID SIMULTANEOUS EQUATIONS BIAS?

Short answer: sometimes. Write the reduced form as

  P = π₁₀ + π₁₁T + π₁₂S + ε₁    (10)
  Q = π₂₀ + π₂₁T + π₂₂S + ε₂    (11)

We CAN estimate equations (10) and (11) using OLS, since all the R.H.S. variables are exogenous. But ... we probably don't care what the values of the π coefficients are; what we wanted were the original parameters in the structural equations - α, β, γ, λ, μ, k.

THE IDENTIFICATION PROBLEM

As well as simultaneity, we sometimes encounter another problem: identification. Consider the following demand and supply equations:

  Q = α + βP    (12)
  Q = λ + μP    (13)

We cannot tell which is which: from the OLS point of view the two equations are the same in nature, and we cannot recover the four structural parameters from the reduced form. Notice that we would not have had this problem with equations (4) and (5), since they have different exogenous variables. An equation may be OVER-IDENTIFIED, JUST (EXACTLY) IDENTIFIED, or UNDER-IDENTIFIED; if it is identified, it can be estimated, and the appropriate estimation technique will depend upon whether it is exactly identified or over-identified.

WHAT DETERMINES IDENTIFICATION?

1. An equation is unidentified, like (12) and (13): we cannot get the structural coefficients from the reduced form estimates.
2. An equation is exactly identified, e.g. (4) or (5): we can get unique structural form coefficient estimates.
3. An equation is over-identified (examples given later): more than one set of structural coefficients could be obtained from the reduced form.

How do we tell if an equation is identified? There are two conditions we could look at:

  The order condition - a necessary but not sufficient condition for an equation to be identified.
  The rank condition - a necessary and sufficient condition for identification. We specify the structural equations in a matrix form and consider the rank of a coefficient matrix.

THE ORDER CONDITION

DEFINITION

Statement of the order condition (from Ramanathan 1995, pp. 666): let G denote the number of structural equations. An equation is just identified if the number of variables excluded from that equation is G-1. If more than G-1 are absent, it is over-identified; if fewer than G-1 are absent, it is not identified.

EXAMPLE

In the following system of equations, the Y's are endogenous, while the X's are exogenous. Determine whether each equation is over-, under-, or just-identified.

  Y₁ = α₀ + α₁Y₂ + α₂Y₃ + α₃X₁ + α₄X₂ + u₁    (14)
  Y₂ = β₀ + β₁Y₃ + β₂X₁ + u₂    (15)
  Y₃ = γ₀ + γ₁Y₂ + u₃    (16)

SOLUTION
G = 3, so G-1 = 2.
If # excluded variables = 2, the equation is just identified.
If # excluded variables > 2, the equation is over-identified.
If # excluded variables < 2, the equation is not identified.

Hence,
Equation 14: Not identified (no variables are excluded)
Equation 15: Just identified (Y1 and X2 are excluded)
Equation 16: Over-identified (Y1, X1 and X2 are excluded)

THE RANK CONDITION
An equation is identified if and only if it is possible to construct at least one non-zero determinant of order (G-1) from the coefficients of the variables excluded from that particular equation but contained in other equations of the model.
Equivalently: a sufficient condition for the identification of a relationship is that the rank of the matrix of parameters of all the excluded variables (endogenous and pre-determined) from that equation be equal to (G-1).
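The order-condition bookkeeping above is purely mechanical, so it can be sketched in a few lines of Python (the per-equation variable sets are transcribed from equations (14)-(16); the helper name order_condition is mine):

```python
# Order condition: with G structural equations, an equation is just
# identified when exactly G-1 of the system's variables are excluded
# from it, over-identified when more are excluded, and not identified
# when fewer are excluded.

def order_condition(system):
    all_vars = set().union(*system)   # every variable in the system
    g = len(system)                   # number of structural equations
    verdicts = []
    for eq in system:
        excluded = len(all_vars - set(eq))
        if excluded == g - 1:
            verdicts.append("just identified")
        elif excluded > g - 1:
            verdicts.append("over-identified")
        else:
            verdicts.append("not identified")
    return verdicts

# Equations (14)-(16): the variables appearing in each equation.
system = [
    {"Y1", "Y2", "Y3", "X1", "X2"},   # (14)
    {"Y2", "Y3", "X1"},               # (15)
    {"Y3", "Y2"},                     # (16)
]
print(order_condition(system))
# -> ['not identified', 'just identified', 'over-identified']
```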

THE RANK CONDITION: EXAMPLE
For example:

y1 = 3y2 - 2x1 + x2 + u1
y2 = y3 + x3 + u2
y3 = y1 - y2 - 2x3 + u3

Writing every equation with all terms on one side:

-y1 + 3y2 + 0y3 - 2x1 + x2 + 0x3 + u1 = 0
0y1 - y2 + y3 + 0x1 + 0x2 + x3 + u2 = 0
y1 - y2 - y3 + 0x1 + 0x2 - 2x3 + u3 = 0

Results (rank condition, RC, compared with the order condition, OC):

RC: Equation 2 is exactly identified; OC: over-identified.
RC: Equation 3 is not identified; OC: exactly identified.

A HAUSMAN-TYPE TEST FOR EXOGENEITY
How do we tell whether variables really need to be treated as endogenous or not? Consider again equations (14)-(16). Equation (14) contains Y2 and Y3 - but do we really need equations for them? The test is calculated as follows:

1. Obtain the reduced form equations corresponding to (14)-(16). The reduced forms turn out to be:

Y1 = π10 + π11 X1 + π12 X2 + v1    (17)
Y2 = π20 + π21 X1 + v2    (18)
Y3 = π30 + π31 X1 + v3    (19)

2. Estimate the reduced form equations (17)-(19) using OLS, and obtain the fitted values Ŷ1, Ŷ2, Ŷ3.

3. Run the regression (14) again, but now also including the fitted values Ŷ2, Ŷ3 as additional regressors:

Y1 = α0 + α1 Y2 + α2 Y3 + α3 X1 + α4 X2 + λ2 Ŷ2 + λ3 Ŷ3 + u1    (20)

4. Use an F-test of the joint restriction λ2 = λ3 = 0. If the null hypothesis is rejected, Y2 and Y3 should be treated as endogenous.

RECURSIVE SYSTEMS
Consider the following system of equations:

Y1 = β10 + γ11 X1 + γ12 X2 + u1    (21)
Y2 = β20 + β21 Y1 + γ21 X1 + γ22 X2 + u2    (22)
Y3 = β30 + β31 Y1 + β32 Y2 + γ31 X1 + γ32 X2 + u3    (23)

PROBLEM
Assume that the error terms are not correlated with each other. Can we estimate the equations individually using OLS?

Equation (21) contains no endogenous variables on the R.H.S., so X1 and X2 are not correlated with u1. We can use OLS on (21).
Equation (22) contains the endogenous variable Y1. We can use OLS on (22) if all the R.H.S. variables are uncorrelated with the error u2, which is true here: Y1 is not correlated with u2 because there is no Y2 term in equation (21). So we can use OLS on (22).
Equation (23) contains both Y1 and Y2; we require these to be uncorrelated with u3. By similar arguments to the above, equations (21) and (22) do not contain Y3, so we can use OLS on (23).

This is known as a RECURSIVE or TRIANGULAR system. We do not have a simultaneity problem here.
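A minimal numerical sketch of the exogeneity test above, on a made-up two-equation system in which Y2 really is endogenous (its shock shares a common component with u1); plain numpy OLS, all names and parameter values illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated system: Y2 is endogenous in the Y1 equation because the
# shocks u1 and u2 share a common factor.
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
common = rng.normal(size=n)
u1 = rng.normal(size=n) + common
u2 = rng.normal(size=n) + common
Y2 = 1.0 + 0.8 * X1 + u2              # reduced form for Y2
Y1 = 0.5 + 1.5 * Y2 + 0.7 * X2 + u1   # structural equation of interest

def ols_rss(y, X):
    """OLS residual sum of squares (X already includes the constant)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

const = np.ones(n)

# Steps 1-2: reduced form for Y2 on the exogenous variables; fitted values.
Z = np.column_stack([const, X1, X2])
pi, *_ = np.linalg.lstsq(Z, Y2, rcond=None)
Y2_hat = Z @ pi

# Step 3: re-run the structural regression adding the fitted value.
X_restricted = np.column_stack([const, Y2, X2])
X_augmented = np.column_stack([const, Y2, X2, Y2_hat])
rss_r = ols_rss(Y1, X_restricted)
rss_u = ols_rss(Y1, X_augmented)

# Step 4: F-test of the single restriction (coefficient on Y2_hat = 0).
q = 1
F = ((rss_r - rss_u) / q) / (rss_u / (n - X_augmented.shape[1]))
print(f"F statistic: {F:.1f}")   # large => treat Y2 as endogenous
```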

INDIRECT LEAST SQUARES (ILS)
ILS involves applying OLS to the reduced form equations.

DEFINITION
If the system is just identified, ILS involves estimating the reduced form equations using OLS, and then using them to substitute back to obtain the structural parameters.

However, ILS is not used much, chiefly because most simultaneous equations systems are over-identified.

TWO-STAGE LEAST SQUARES (2SLS)
2SLS can be used for over-identified systems. The two stages are:

Stage 1: Obtain and estimate the reduced form equations using OLS. Save the fitted values for the dependent variables.
Stage 2: Estimate the structural equations, but replace any R.H.S. endogenous variables with their Stage 1 fitted values.

For the system (14)-(16), the 2SLS estimator is computed as follows:

1. Estimate the reduced form equations (17)-(19) by OLS and obtain the fitted values Ŷ1, Ŷ2, Ŷ3.
2. Replace the R.H.S. endogenous variables with their Stage 1 fitted values:

Y1 = α0 + α1 Ŷ2 + α2 Ŷ3 + α3 X1 + α4 X2 + u1    (24)
Y2 = β0 + β1 Ŷ3 + β2 X1 + u2    (25)
Y3 = γ0 + γ1 Ŷ2 + u3    (26)

Now Ŷ2 and Ŷ3 will not be correlated with u1, Ŷ3 will not be correlated with u2, and Ŷ2 will not be correlated with u3.

The standard errors from Stage 2 need to be modified compared with their OLS counterparts, but once this has been done, we can use the usual t- and F-tests to test hypotheses about the structural form coefficients.
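The gain from the two stages can be seen on simulated data. A minimal sketch, with made-up parameters and a single endogenous regressor P, comparing naive OLS with the two-stage estimator (numpy only):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# A simple simultaneous setup: the structural error u is correlated
# with the regressor P, so OLS on the structural equation is biased.
# X is exogenous and appears only in the reduced form for P.
X = rng.normal(size=n)
e = rng.normal(size=n)
u = rng.normal(size=n) + e      # structural error, correlated with P
P = 1.0 + 2.0 * X + e           # reduced form for P
Y = 0.5 + 1.5 * P + u           # structural equation, true slope 1.5

const = np.ones(n)

def ols(y, Xmat):
    beta, *_ = np.linalg.lstsq(Xmat, y, rcond=None)
    return beta

# Naive OLS on the structural equation: slope biased upwards here.
b_ols = ols(Y, np.column_stack([const, P]))

# Stage 1: regress P on the exogenous variables, keep fitted values.
P_hat = np.column_stack([const, X]) @ ols(P, np.column_stack([const, X]))

# Stage 2: replace P by its fitted values.
b_2sls = ols(Y, np.column_stack([const, P_hat]))

print(b_ols[1], b_2sls[1])   # OLS slope is biased; 2SLS slope is near 1.5
```

(The Stage 2 standard errors printed by a naive OLS routine would be wrong, as noted above; only the point estimates are compared here.)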

INSTRUMENTAL VARIABLES (IV)
Recall that the reason we cannot use OLS directly on the structural equations is that the endogenous variables are correlated with the errors.

One solution: do not abandon Y2 and Y3, but use some other variables in their place. We want these other variables to be (highly) correlated with Y2 and Y3, but not correlated with the errors - the INSTRUMENTS. Say z2 and z3 are suitable instruments for Y2 and Y3, respectively. We do not use the instruments directly, but run regressions of the form:

Y2 = λ1 + λ2 z2 + ε1    (27)
Y3 = λ3 + λ4 z3 + ε2    (28)

Obtain the fitted values from (27) and (28), Ŷ2 and Ŷ3, and replace Y2 and Y3 with these in the structural equation. We do not use the instruments directly in the structural equation.

It is typical to use more than one instrument per endogenous variable.
If the instruments are the variables in the reduced form equations, then IV is equivalent to 2SLS.
If the instruments are not valid (i.e. they are correlated with the errors), the 2SLS estimator is not even consistent.
What if we treat variables as endogenous when they are in fact exogenous, so that the CLRM assumptions are supported by the data? The coefficient estimates will still be consistent, but will be inefficient compared to those that just used OLS directly.

The Problem With IV:

PROBLEM
What are the instruments?

SOLUTION
2SLS is easier.

Other Estimation Techniques:

1. LIML (Limited Information ML): estimating the reduced form equations by maximum likelihood.
2. FIML (Full Information ML): estimating all the equations simultaneously using maximum likelihood.

VECTOR AUTOREGRESSIVE (VAR) MODELS
VAR: a systems regression model, i.e. there is more than one dependent variable. A bivariate VAR(1) has two variables, y1t and y2t, each of whose current values depend on combinations of the previous values of both variables and error terms:

y1t = β10 + β11 y1,t-1 + α11 y2,t-1 + u1t
y2t = β20 + β21 y2,t-1 + α21 y1,t-1 + u2t

or, stacking the two equations,

[y1t; y2t] = [β10; β20] + [β11 α11; β21 α21] [y1,t-1; y2,t-1] + [u1t; u2t],

i.e. in compact matrix form, Yt = β0 + β1 Yt-1 + ut.
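A small simulation illustrating the point made below, that with no contemporaneous R.H.S. terms, equation-by-equation OLS recovers the VAR coefficients (all parameter values are made up):

```python
import numpy as np

rng = np.random.default_rng(2)
T = 3000

# True bivariate VAR(1):  Yt = b0 + B1 @ Y(t-1) + ut
b0 = np.array([0.1, -0.2])
B1 = np.array([[0.5, 0.2],
               [0.1, 0.4]])    # stable: eigenvalues 0.6 and 0.3
Y = np.zeros((T, 2))
for t in range(1, T):
    Y[t] = b0 + B1 @ Y[t - 1] + rng.normal(size=2)

# Each equation regresses one variable on a constant and one lag of
# both variables; OLS is applied separately per equation.
X = np.column_stack([np.ones(T - 1), Y[:-1]])
est = np.vstack([np.linalg.lstsq(X, Y[1:, i], rcond=None)[0]
                 for i in range(2)])

print(np.round(est, 2))   # rows: [const, lag coefficients] per equation
```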

ADVANTAGES OF VAR MODELLING
The researcher does not need to specify which variables are endogenous or exogenous - all are endogenous!
A VAR allows a variable to depend on more than just its own lags or combinations of white noise terms, so it is more general than ARMA modelling.
Provided that there are no contemporaneous terms on the R.H.S. of the equations, we can simply use OLS separately on each equation.
Forecasts are often better than those of traditional structural models.

PROBLEMS WITH VARs
1. VARs are a-theoretical (as are ARMA models). What if the data are not generated by a VAR process?
2. How to decide the appropriate lag length?
3. So many parameters to estimate!
4. Do we need to ensure all components of the VAR are stationary?
5. How do we interpret the coefficients?

VAR (CROSS-EQUATION RESTRICTIONS)
In a VAR, it is usual for all the equations to have the same lag length. Suppose we have a bivariate VAR(8), and we want to examine a restriction that the coefficients on lags 5 through 8 are jointly zero (de facto, H0: VAR(4) against HA: VAR(8)). This can be done using a likelihood ratio (LR) test.

Denote the variance-covariance matrix of the residuals (given by ûû'/T) as Σ̂. The LR test statistic for this joint hypothesis is

LR = T [ln|Σ̂r| - ln|Σ̂u|],

where Σ̂r is the residual covariance matrix of the restricted model (VAR(4)) and Σ̂u that of the unrestricted model (VAR(8)). LR is asymptotically distributed as a χ²(q), q = # of restrictions. In the case above we restrict 4 lags of two variables in each of the two equations: 4 × 2 × 2 = 16 restrictions.

PROBLEM
Conducting the LR test is cumbersome and requires a normality assumption for the disturbances.
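The LR statistic is easy to compute by fitting the restricted and unrestricted VARs on a common sample. A sketch with simulated VAR(1) data and a VAR(1)-versus-VAR(2) comparison (so q = 4 here, rather than the 16 of the slide's example; all values made up):

```python
import numpy as np

rng = np.random.default_rng(3)
T = 1000

# Data from a bivariate VAR(1); we test H0: VAR(1) against HA: VAR(2).
B1 = np.array([[0.5, 0.2], [0.1, 0.4]])
Y = np.zeros((T, 2))
for t in range(1, T):
    Y[t] = B1 @ Y[t - 1] + rng.normal(size=2)

def resid_cov(Y, p, start):
    """Equation-by-equation OLS fit of a VAR(p) on Y[start:];
    returns the residual covariance matrix u'u / n."""
    n = len(Y) - start
    X = np.column_stack([np.ones(n)] +
                        [Y[start - i:len(Y) - i] for i in range(1, p + 1)])
    resid = np.empty((n, Y.shape[1]))
    for j in range(Y.shape[1]):
        beta, *_ = np.linalg.lstsq(X, Y[start:, j], rcond=None)
        resid[:, j] = Y[start:, j] - X @ beta
    return resid.T @ resid / n

start = 2                        # common sample for both fits
sig_r = resid_cov(Y, 1, start)   # restricted: VAR(1)
sig_u = resid_cov(Y, 2, start)   # unrestricted: VAR(2)
n = T - start

# LR = T [ln|Sigma_r| - ln|Sigma_u|], asymptotically chi-square with
# q = (1 extra lag) x (2 variables) x (2 equations) = 4 restrictions.
LR = n * (np.linalg.slogdet(sig_r)[1] - np.linalg.slogdet(sig_u)[1])
print(round(LR, 2))
```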

VAR (INFORMATION CRITERIA)
Multivariate versions of the information criteria can be used instead:

MAIC = ln|Σ̂| + 2k'/T
MSBIC = ln|Σ̂| + (k'/T) ln(T)
MHQIC = ln|Σ̂| + (2k'/T) ln(ln T)

where k' is the total number of regressors in all equations. The values of the information criteria are constructed for 0, 1, ... lags (up to some pre-specified maximum kmax), and the lag length that minimises the chosen criterion is selected.

PRIMITIVE VERSUS STANDARD FORM VARs
A primitive VAR also contains contemporaneous terms:

y1t = β10 + β11 y1,t-1 + α11 y2,t-1 + α12 y2t + u1t
y2t = β20 + β21 y2,t-1 + α21 y1,t-1 + α22 y1t + u2t

in compact form:

[y1t; y2t] = [β10; β20] + [β11 α11; β21 α21] [y1,t-1; y2,t-1] + [α12 0; 0 α22] [y2t; y1t] + [u1t; u2t]

We can take the contemporaneous terms over to the L.H.S. and write

[1 -α12; -α22 1] [y1t; y2t] = [β10; β20] + [β11 α11; β21 α21] [y1,t-1; y2,t-1] + [u1t; u2t]

or

B Yt = β0 + β1 Yt-1 + ut.

We then pre-multiply both sides by B⁻¹:

Yt = B⁻¹β0 + B⁻¹β1 Yt-1 + B⁻¹ut

or

Yt = A0 + A1 Yt-1 + et.

This is known as a standard form VAR, which we can estimate as before.

BLOCK SIGNIFICANCE AND CAUSALITY TESTS
With many lags in a VAR, it is difficult to see which sets of variables have significant effects on each dependent variable and which do not. For illustration, consider the following bivariate VAR(3):

[y1t; y2t] = [α10; α20] + [β11 β12; β21 β22] [y1,t-1; y2,t-1] + [γ11 γ12; γ21 γ22] [y1,t-2; y2,t-2] + [δ11 δ12; δ21 δ22] [y1,t-3; y2,t-3] + [u1t; u2t]

We may ask, e.g., do changes in y1 cause changes in y2? The implied restrictions on the parameter matrices are:

Hypothesis | Implied restriction
1. Lags of y1t do not explain current y2t | β21 = 0 and γ21 = 0 and δ21 = 0
2. Lags of y1t do not explain current y1t | β11 = 0 and γ11 = 0 and δ11 = 0
3. Lags of y2t do not explain current y1t | β12 = 0 and γ12 = 0 and δ12 = 0
4. Lags of y2t do not explain current y2t | β22 = 0 and γ22 = 0 and δ22 = 0

Each of these four joint hypotheses can be tested within the F-test framework, since each set of restrictions contains only parameters drawn from one equation.

These are Granger causality tests: if lags of y1 are significant in the equation for y2, we say that y1 Granger-causes y2. If y2 causes y1, lags of y2 should be significant in the equation for y1. If both sets of lags are significant, there is bi-directional causality.

IMPULSE RESPONSES
VAR models can also be interpreted by constructing the impulse responses and variance decompositions. Impulse responses trace out the responsiveness of the dependent variables in the VAR to shocks in the error terms. A unit shock is applied to each variable and its effects are noted. We can examine how long, and to what degree, a shock to a given equation affects all of the variables in the system.

Consider for example a simple bivariate VAR(1):

y1t = β10 + β11 y1,t-1 + α11 y2,t-1 + u1t
y2t = β20 + β21 y2,t-1 + α21 y1,t-1 + u2t

A change in u1t will immediately change the value of y1t; it will also change y2t and y1 during the next period.
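The causality F-tests can be sketched directly, assuming simulated data in which y1 Granger-causes y2 but not conversely (the helper granger_F and all parameter values are mine):

```python
import numpy as np

rng = np.random.default_rng(4)
T = 1000

# y1 Granger-causes y2 (lagged y1 enters the y2 equation), but not
# the other way round.
y1 = np.zeros(T)
y2 = np.zeros(T)
for t in range(1, T):
    y1[t] = 0.5 * y1[t - 1] + rng.normal()
    y2[t] = 0.4 * y2[t - 1] + 0.5 * y1[t - 1] + rng.normal()

def granger_F(y, own, other, p=3):
    """F-test that all p lags of `other` are zero in the equation for y."""
    n = len(y) - p
    lags_own = np.column_stack([own[p - i - 1:len(own) - i - 1]
                                for i in range(p)])
    lags_oth = np.column_stack([other[p - i - 1:len(other) - i - 1]
                                for i in range(p)])
    yy = y[p:]

    def rss(X):
        beta, *_ = np.linalg.lstsq(X, yy, rcond=None)
        e = yy - X @ beta
        return e @ e

    Xr = np.column_stack([np.ones(n), lags_own])
    Xu = np.column_stack([np.ones(n), lags_own, lags_oth])
    rss_r, rss_u = rss(Xr), rss(Xu)
    return ((rss_r - rss_u) / p) / (rss_u / (n - Xu.shape[1]))

F_12 = granger_F(y2, y2, y1)   # do lags of y1 explain y2?  should be large
F_21 = granger_F(y1, y1, y2)   # do lags of y2 explain y1?  should be small
print(round(F_12, 1), round(F_21, 1))
```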

VARIANCE DECOMPOSITIONS
Variance decompositions offer a slightly different method of examining VAR dynamics. They give the proportion of the movements in the dependent variables that is due to their own shocks, versus shocks to the other variables. This is done by determining how much of the s-step-ahead forecast error variance for each variable is explained by innovations to each explanatory variable (s = 1, 2, ...). The variance decomposition gives information about the relative importance of each shock to the variables in the VAR.

VARIANCE DECOMPOSITIONS: THE ORDERING OF THE VARIABLES
For both impulse responses and variance decompositions, the ordering of the variables is important. Main reason: VAR errors are seldom independent of one another; instead, they typically correlate to some degree. Therefore, the notion of examining the effect of the innovations separately has little meaning, since they have a common component. Thus, we must orthogonalise the innovations. In the bivariate VAR, this problem is approached by attributing all of the effect of the common component to the first of the two variables in the VAR. In the general case where there are more variables, the situation is more complex, but the interpretation is the same.

LAG NUMBER SELECTION: TESTS OF RANDOMNESS
How do we choose the lags p, q in an ARMA(p, q) model?

1. By inspecting the sample autocorrelations and partial autocorrelations, and trying to recognize the pattern of a model with given p, q.
2. By using an automatic selection criterion (information criterion).

If the data are independently distributed, the sample autocorrelations ρ̂j, ρ̂h (j ≠ h, j ≥ 1, h ≥ 1) are asymptotically independent and

√T ρ̂j →d N(0, 1), (j ≥ 1).

We can use this property to design two tests to check if the data are independently distributed.

TEST OF RANDOMNESS
Under the null, √T ρ̂j is approximately standard normal. This test is so simple that it can be inspected visually, so computer packages usually plot two error bars at ±1.96/√T with the sample autocorrelation function. (Notice: although it is called a "test for randomness" by some computer software and some references, a more appropriate name would be "test for independent distribution".)

PORTMANTEAU TEST
Under the null,

Q = T Σ_{j=1}^{k} ρ̂j² →d χ²(k).

The portmanteau test can be useful when there is seasonal structure in the data: for example, with quarterly data the first three autocorrelations may be zero, and then the fourth one may be non-zero. (The test may be sensitive to the choice of k on some occasions.)

The tests for independent distribution and the portmanteau test may provide preliminary information about the sample autocorrelations.
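Both checks are easy to compute by hand; a sketch on i.i.d. simulated data, using the Box-Pierce form of the portmanteau statistic:

```python
import numpy as np

rng = np.random.default_rng(5)
T = 1000
z = rng.normal(size=T)      # i.i.d. data: autocorrelations should be ~ 0

def acf(x, k):
    """Sample autocorrelations rho_1 .. rho_k."""
    x = x - x.mean()
    denom = x @ x
    return np.array([(x[j:] @ x[:-j]) / denom for j in range(1, k + 1)])

k = 10
rho = acf(z, k)

# Visual test: under independence sqrt(T)*rho_j is approx N(0,1), so
# roughly 95% of the rho_j should lie inside the +/- 1.96/sqrt(T) bars.
bars = 1.96 / np.sqrt(T)

# Portmanteau (Box-Pierce) statistic: Q = T * sum rho_j^2, approx chi2(k).
Q = T * np.sum(rho ** 2)
print(round(Q, 2), round(bars, 3))
```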

LAG NUMBER SELECTION: INFORMATION CRITERIA
A naive approach would be to use the "maximised likelihood" to choose p, q. The problem: if you compare an ARMA(p, q) with an ARMA(p+1, q), the ARMA(p, q) always has the smaller likelihood. This is because the estimate from the ARMA(p, q) model maximises the likelihood with the constraint that φ̂p+1 = 0, while the ARMA(p+1, q) does not impose that constraint; so the ARMA(p+1, q) has higher maximised likelihood unless φ̂p+1 = 0 exactly, which is an event with probability zero in finite samples even when the true φp+1 = 0. (Notice the analogy with regression here: when you increase the number of regressors, the R² does not decrease, and in general increases, even when the regressors are irrelevant.)

The solution: add a penalty which increases with p and q:

IC = -2 l(θ̂) + penalty

penalty = 2(p + q) : Akaike IC (AIC)
penalty = (p + q) ln(T) : Bayes IC (BIC)

Properties:
AIC: inconsistent estimation of p, q (may select larger than correct p, q in large samples).
Both BIC and AIC may select smaller than correct p, q in finite samples (this, however, is not necessarily a bad thing in small samples).

An alternative is to compare an ARMA(p, q) with an ARMA(p+1, q), or with an ARMA(p, q+1), using a likelihood ratio test. The criterion is then to add lags as long as the likelihood ratio test statistic is above a user-chosen critical value (for example, 5% significance would have c.v. 3.84).
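For a pure AR model, the conditional Gaussian likelihood can be concentrated so that the criteria depend on the data only through the residual variance. A sketch of order selection on simulated AR(2) data under that simplification (fitting by OLS on a common sample; all values made up):

```python
import numpy as np

rng = np.random.default_rng(6)
T = 1000
# True AR(2):  z_t = 0.6 z_{t-1} + 0.3 z_{t-2} + eps_t
z = np.zeros(T)
for t in range(2, T):
    z[t] = 0.6 * z[t - 1] + 0.3 * z[t - 2] + rng.normal()

def fit_sigma2(z, p, pmax):
    """Conditional-likelihood AR(p) fit via OLS on a common sample."""
    n = len(z) - pmax
    y = z[pmax:]
    if p == 0:
        e = y - y.mean()
        return e @ e / n
    X = np.column_stack([np.ones(n)] +
                        [z[pmax - i:len(z) - i] for i in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    return e @ e / n

pmax = 5
n = T - pmax
aic, bic = {}, {}
for p in range(pmax + 1):
    s2 = fit_sigma2(z, p, pmax)
    # Up to constants, -2*loglik = n*ln(sigma2); then add the penalties.
    aic[p] = n * np.log(s2) + 2 * p
    bic[p] = n * np.log(s2) + p * np.log(n)

print(min(aic, key=aic.get), min(bic, key=bic.get))   # selected orders
```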

PARSIMONIOUS MODELLING
Large econometric models tend to do badly in terms of forecasting, and are outperformed by small ARMA models (Box & Jenkins).
Even in ARMA models, increasing the number of parameters reduces the precision with which each parameter is estimated. This is because, when the parameters are estimated, their variance contributes to the variance of the forecast.
Adding extra parameters may then help to reduce or eliminate the forecast bias, but the gain in terms of reduced bias² can be outweighed by the loss from the increased variance of the forecast.
One should balance the number of estimated parameters and the number of observations.
Sometimes, information criteria have been advocated also to select more parsimonious models.

INTRODUCTORY FINANCIAL ECONOMETRICS
Topic 8: Trending and Co-Integration
3 CREDITS, 51 HOURS
Jianhua Gang
School of Finance
Renmin University of China
Spring 2013

Many economic time series are trended, so the assumption that they are trend-free is often inappropriate.

EXAMPLE
Examine the series for consumption and income. The presence of trends can sometimes invalidate the usual asymptotic theory for OLS and test procedures.

A discussion of trends and related topics such as tests for unit roots and cointegration is, therefore, required.

AUTOREGRESSIVE DISTRIBUTED LAG (ADL) MODELS
Applied workers often specify models that include both lagged values of the dependent variable and a distributed lag component in the regression function. These models are called autoregressive distributed lag (ADL) models. A very simple ADL relationship is

yt = α + φ1 y_{t-1} + β0 xt + β1 x_{t-1} + ut,  |φ1| < 1.

A change in xt affects the expected value of yt and all future values of yt. It can be shown that

∂E(y_{t+j})/∂xt = φ1^{j-1} (β1 + φ1 β0),  j = 1, 2, ...;

such terms are called dynamic multipliers. The cumulative effect Σ_{j=0}^{∞} ∂E(y_{t+j})/∂xt, corresponding to the long-run multiplier, is (β0 + β1)/(1 - φ1), and the long-run intercept is α/(1 - φ1).

ERROR CORRECTION MODELS (ECM)
It is useful to note that the simple ADL can be written in a mathematically equivalent form that has a parameterization of direct economic interest, called the error correction model (ECM). The ECM is derived from the ADL as follows:

yt = α + φ1 y_{t-1} + β0 xt + β1 x_{t-1} + ut

can be written as

(Δyt + y_{t-1}) = α + φ1 y_{t-1} + β0 (Δxt + x_{t-1}) + β1 x_{t-1} + ut,

i.e.

Δyt = β0 Δxt - (1 - φ1) [y_{t-1} - μ - β x_{t-1}] + ut,

where β = (β0 + β1)/(1 - φ1) and μ = α/(1 - φ1). Thus the ECM has first differences in y linked to first differences in x and to the extent by which y deviates from its long-run expected value in the previous period.

The coefficients have meanings:
The coefficient β0 measures the contemporaneous effect.
The coefficient (1 - φ1) can be thought of as reflecting speed of adjustment.
The coefficients μ and β are the intercept and the slope of the long-run relationship.

If the OLS estimates of the ADL are denoted by "hats" and the nonlinear least squares (NLS) estimates of the ECM are denoted by "tildes", then it can be shown that

μ̃ = α̂/(1 - φ̂1),  β̃ = (β̂0 + β̂1)/(1 - φ̂1),  β̂0 = β̃0 and φ̂1 = φ̃1.

Although OLS estimation of the ADL and NLS estimation of the ECM yield the same point estimates, the latter method (NLS) has the advantage of giving estimated standard errors for estimates of the long-run parameters as part of standard output.

Alternatively, given estimates of the long-run parameters μ and β, the remaining parameters can be estimated by applying OLS to

Δyt = β0 Δxt - (1 - φ1) [y_{t-1} - μ̂ - β̂ x_{t-1}] + ut.

This two-step approach can play an important role when the data contain trends and will be discussed later in further detail.
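The equivalence of the ADL and ECM parameterizations is easy to verify numerically; a deterministic check with arbitrary made-up numbers:

```python
# Numerical check of the ADL <-> ECM reparameterization for arbitrary
# parameter values and data points (all numbers here are made up).
alpha, phi1, b0, b1 = 0.4, 0.7, 1.2, -0.5
mu = alpha / (1 - phi1)          # long-run intercept
beta = (b0 + b1) / (1 - phi1)    # long-run multiplier

y_lag, x, x_lag, u = 2.0, 1.5, 1.1, 0.3

# ADL:  y_t = alpha + phi1*y_{t-1} + b0*x_t + b1*x_{t-1} + u_t
y_adl = alpha + phi1 * y_lag + b0 * x + b1 * x_lag + u

# ECM:  dy_t = b0*dx_t - (1-phi1)*[y_{t-1} - mu - beta*x_{t-1}] + u_t
dy = b0 * (x - x_lag) - (1 - phi1) * (y_lag - mu - beta * x_lag) + u
y_ecm = y_lag + dy

print(abs(y_adl - y_ecm) < 1e-12)   # -> True: the two forms coincide
```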

TRENDING VARIABLES
There are important differences between trend-stationary and difference-stationary processes. Recall that a time series variable zt is covariance stationary if E(zt) = μz, var(zt) = σz², and cov(zt, z_{t-g}) = γ(|g|) for all t.

1. First, zt is said to be a trend-stationary process if zt = f(t) + ut, in which f(.) is a deterministic function and ut is a stationary process (often a linear trend in t).
2. Second, zt is said to be a difference-stationary process if Δzt = zt - z_{t-1} = ut, where ut is a stationary process.

Difference-stationary processes are also called integrated series, and it is useful to adopt the following terminology and notation: a stationary process is denoted I(0) (integrated of order zero); and a series zt is said to be integrated of order d if Δ^d zt ~ I(0).

For example, if

Δzt = zt - z_{t-1} = ut,  ut ~ NID(0, σ²),    (1)

then

zt = Σ_{s=1}^{t} us + z0.

Clearly zt ~ I(1) and Δzt ~ I(0).

We are able to illustrate the differences using two simple models.

1. First, a random walk, which we allow to have a drift:

Δzt = zt - z_{t-1} = a + ut,  ut ~ NID(0, σ²),    (2)

so that, if z0 = 0, then zt = Σ_{s=1}^{t} us + at, which contains both stochastic and deterministic trend components.

2. Second, a stable AR(1) process:

zt = φ z_{t-1} + ut,  |φ| < 1,  ut ~ NID(0, σ²).

It can be proved that, for whatever value of t:

E(zt) = 0;
var(zt) = σ²/(1 - φ²) < ∞;
corr(zt, z_{t-s}) = φ^s → 0 as s → ∞, s ≥ 0.

Thus this I(0) variable has constant mean, constant variance (hence large departures are rare), and autocorrelations that decline as the order increases.

2. I(1): Let zt be an I(1) variate generated by the random walk

Δzt = zt - z_{t-1} = ut,  ut ~ NID(0, σ²),

where zt = Σ_{s=1}^{t} us if z0 = 0. Hence,

E(zt) = 0,
var(zt) = t σ²  (monotonically increasing in t),
corr(zt, z_{t-g}) = sqrt((t - g)/t)  (dependence on t),

so the variance and the autocorrelations depend on t, and the process is nonstationary.
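The contrast between the I(0) and I(1) moments can be seen in a small Monte Carlo, computing ensemble variances across simulated paths (parameter values made up):

```python
import numpy as np

rng = np.random.default_rng(7)
reps, T = 2000, 400

eps = rng.normal(size=(reps, T))
rw = eps.cumsum(axis=1)          # random walk: z_t = z_{t-1} + u_t
ar = np.zeros((reps, T))         # stable AR(1), phi = 0.5
for t in range(1, T):
    ar[:, t] = 0.5 * ar[:, t - 1] + eps[:, t]

# Cross-section variance at two dates: the I(1) series has var ~ t*sigma^2,
# while the I(0) series settles at sigma^2/(1 - phi^2) = 4/3.
v_rw_100, v_rw_400 = rw[:, 99].var(), rw[:, 399].var()
v_ar_100, v_ar_400 = ar[:, 99].var(), ar[:, 399].var()
print(round(v_rw_400 / v_rw_100, 1))   # close to 4 (t grows fourfold)
print(round(v_ar_400 / v_ar_100, 1))   # close to 1 (constant variance)
```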

Analysis is relatively straightforward when variables are trend-stationary with linear trends. In a multiple regression with several regressors having linear trends, adding a trend term to the basic model and fitting

yt = α + Σ_j βj x_{jt} + δt + ut,  ut ~ i.i.d.(0, σ²),

provides a basis for valid estimation and inference, with the additional regressor t serving as a trend-removing agent.

However, it has been established that the asymptotic theory of OLS estimators and tests developed for I(0) variables can be misleading when applied to data from I(1) processes. With nonstationary variables, OLS estimators may tend to nonstandard distributions, rather than normality, as n → ∞.

TESTING FOR UNIT ROOTS
Macroeconomic and financial time series often appear to be well described by I(1) difference-stationary processes. Many results have been derived for testing the null hypothesis that a series is difference stationary against the alternative of covariance stationarity.

Consider a simple AR(1) model,

zt = φ z_{t-1} + ut,  ut ~ NID(0, σ²).

If φ = 1, zt ~ I(1); if |φ| < 1, zt ~ I(0). Solving φ(λ) = 1 - φλ = 0 yields the root λ = 1/φ, which is unity if φ = 1. Hence we talk of testing for a unit root. Several tests for unit roots are based upon work by Dickey and Fuller (hereafter DF).

Let DP and RE denote the true data process and the regression equation used to compute the test, respectively. We will consider three cases.

CASE 1: DP(1) zt = φ z_{t-1} + ut, with H1: zt a stable AR(1) with zero mean, and H0: φ = 1. Apply OLS to

RE(1): Δzt = (φ - 1) z_{t-1} + εt.

CASE 2: DP(2) zt = δ0 + ut, ut = φ u_{t-1} + εt, with -1 < φ < 1 under H1, and H0: φ = 1. Apply OLS to

RE(2): Δzt = α* + (φ - 1) z_{t-1} + εt,  α* ≡ δ0 (1 - φ).

H0 implies that α* = 0 and φ - 1 = 0.

CASE 3: DP(3) zt = δ0 + δ1 t + ut, ut = φ u_{t-1} + εt, εt ~ NID(0, σ²), with -1 < φ < 1 under H1, and H0: φ = 1. Apply OLS to

RE(3): Δzt = α* + β* t + (φ - 1) z_{t-1} + εt,

in which α* ≡ [δ0 (1 - φ) + δ1 φ] and β* ≡ δ1 (1 - φ). H0 implies that β* = 0.

Let (φ̂ - 1) denote an OLS estimator (same notation for all REs). We can base tests on either K(1) = n(φ̂ - 1) or the t-ratio t(1) = (φ̂ - 1)/SE(φ̂). For RE(3), we can also use F(0, 1), the F statistic for testing the two restrictions φ - 1 = 0 and β* = 0.

Provided DP and RE are correctly matched, these test statistics have nonstandard asymptotic distributions under H0. DF provide estimated critical values for each of the three cases given above. If we get an incorrect matching of RE with DP, results can be quite different and wrong.

The t-ratio forms of the tests are known as DF-tau tests; the n(φ̂ - 1) forms are known as DF-gamma tests.

AUGMENTED DICKEY-FULLER (ADF) TESTS
A fairly general specification is that, under the unit root hypothesis, Δzt follows a stationary mixed autoregressive-moving average process

φ(L) Δzt = θ(L) εt,  εt ~ i.i.d.(0, σ²),

in which φ(L) and θ(L) are polynomials in the lag operator. We often approximate the autocorrelation of the Δzt by an autoregressive model to obtain the Augmented DF (ADF) test. For example, the ADF form corresponding to RE(3) can be written as

Δzt = αA + βA t + (φA - 1) z_{t-1} + Σ_{j=1}^{p} cj Δz_{t-j} + εt.

It is asymptotically valid, with a correctly matched DP and RE, to use the same critical values for ADF and DF.

Some caveats:
Problems can arise from using an autoregressive approximation to the serial correlation when Δzt has a moving average component.
Choi finds that the use of a large value of p in ADF tests can lead to low power.
More generally, studies of power indicate that it may be very difficult to discriminate between trend-stationary and difference-stationary processes, e.g. DF tests for unit roots have low power when the data are trend stationary.
Perron finds that if the data have segmented trends, e.g. structural breaks, then unit root tests lack power.
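A sketch of the DF tau statistic from RE(2), computed with plain numpy and applied to one stationary and one unit-root series (the critical values quoted in the comments are the usual tabulated ones, not computed here):

```python
import numpy as np

rng = np.random.default_rng(8)
T = 500

def df_tau(z):
    """DF tau statistic from RE(2): regress dz_t on a constant and z_{t-1}."""
    dz, zlag = np.diff(z), z[:-1]
    X = np.column_stack([np.ones(len(dz)), zlag])
    beta, *_ = np.linalg.lstsq(X, dz, rcond=None)
    e = dz - X @ beta
    s2 = e @ e / (len(dz) - 2)
    cov = s2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])    # t-ratio on (phi - 1)

stationary = np.zeros(T)                   # stable AR(1), phi = 0.5
eps = rng.normal(size=T)
for t in range(1, T):
    stationary[t] = 0.5 * stationary[t - 1] + eps[t]
walk = rng.normal(size=T).cumsum()         # unit root process

print(round(df_tau(stationary), 2))   # far below the 5% DF c.v. (~ -2.86)
print(round(df_tau(walk), 2))         # typically above the c.v.
```

Note that the t-ratio must be compared with the DF tables, not the usual Student-t tables, as discussed above.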

CO-INTEGRATION
The idea that trending variables may be tied together by long-run equilibrium relationships leads to the study of co-integration. A simple case of a single relationship will be considered. More general treatments are available, and the book by Banerjee et al. (1994) contains many useful discussions and references; also see the book by Harris (1995). Co-integration theory was developed by Engle and Granger (1987) (Nobel Prize in Economics, 2003).

The topics covered are: links with equilibrium; testing for the absence of co-integration; and Granger's Representation theorem, which concerns the Error Correction Models.

DEFINITION
The variables (z1t, z2t, ..., zmt) are co-integrated, written (z1t, z2t, ..., zmt) ~ CI(d, b), if:
1. zit ~ I(d), d > 0, for all i;
2. there exists a vector a = (a1, ..., am)' ≠ 0 such that at = a1 z1t + ... + am zmt ~ I(d - b), b > 0.

We will only consider the case d = b = 1, i.e. a linear combination of I(1) processes is a stationary I(0) variable.

Suppose economic theory suggests a relationship between yt and xt, and that both variables are I(1). Consider ut defined by yt = α + βxt + ut. In general, a linear combination of I(1) variables, such as ut = yt - (α + βxt), is also I(1). Engle and Granger argue that there cannot be a meaningful equilibrium relationship between yt and xt unless ut ~ I(0), since an I(1) error will wander widely and rarely cross the line through zero. Thus co-integration is sometimes viewed as being required for a certain type of equilibrium relationship. However, economists might wish to include I(0) variables in an equilibrium relationship, as well as I(1) regressors.

It is clearly important to be able to test for the absence of co-integration, so that, using the simple example above, the true ut is I(1) (in which case, first difference the data and then apply classical methods for stationary processes). Thus, if α and β were known, we could calculate the ut and apply an ADF test for a unit root. If the test indicated the rejection of the unit root restriction, the evidence could be viewed as supporting the assumption of co-integration with ut ~ I(0). But the parameters are, of course, unknown.

The first step is therefore to estimate α and β by OLS on the levels of the data, which is known as fitting the co-integrating regression (CR). If yt and xt are co-integrated, the OLS estimator of β from the CR exhibits a property known as superconsistency because, as n → ∞, it approaches the true value at a faster rate than in the classical stationary-variables case. However, in small samples, biases may be important.

The residuals ût from the co-integrating regression can be used in a test for a unit root in the process ut, which is a test for the absence of co-integration. Two simple tests can be used with the ût.

One procedure is to compute the DW statistic, which should be close to zero under the unit root hypothesis. This check is not recommended.

The second approach involves applying the ADF test in t-ratio form after OLS estimation of the equation

Δût = γ û_{t-1} + Σ_{j=1}^{p} cj Δû_{t-j} + et.

This test is denoted CRADF. The DF tables are not valid for CRADF: asymptotic distributions under the unit root hypothesis depend upon the number of I(1) regressors and whether or not the CR includes an intercept and/or a trend term. Finite-sample critical values have been estimated by computer methods for various cases and are available in some estimation programs, e.g. PcGive.

EXAMPLE
Consider the PPP theory for the exchange rate. In perfect markets there are no arbitrage opportunities, so the exchange rate R is determined by the relative movements of the domestic price level P and the foreign price level P*, i.e.

R = P/P*,  or  r = p - p* (in logs).

This can be seen as a long-run equilibrium. Data like exchange rates and prices are usually I(1), so they are quite volatile. However, if the PPP theory is correct, they should not drift apart a lot over time, i.e.

r - (p - p*) should remain "small" (stationary).

EXAMPLE
Suppose: mt real money supply; rt interest rate; πt inflation; Yt real income. Economic theory suggests the long-run relationship

mt = β1 rt + β2 πt + β3 Yt.

In practice the variables above are usually I(1), so they may co-integrate: if

mt - β1 rt - β2 πt - β3 Yt = I(0),

then (1, -β1, -β2, -β3) is called a co-integrating vector. Note: a co-integrating vector does not always exist, and there might be more than one co-integrating vector.

THE ENGLE-GRANGER (EG) METHODOLOGY
In the example above we would test for co-integration using the EG methodology:

1. Use DF tests to make sure that all the variables are I(1).
2. Use OLS to estimate the model mt = ĉ1 rt + ĉ2 πt + ĉ3 Yt + ût.
3. Conduct a unit root test on the residuals: if there is co-integration, the residuals must be stationary; otherwise the residuals will be I(1).

The EG approach involves some serious drawbacks:

1. Suppose for example that x1t and x2t are I(1) and we want to test for co-integration between those two variables. Recall the way to do it: estimate the following regression by OLS,

x1t = b̂1 + b̂2 x2t + ût,

and test for a unit root in ût. Because x1t and x2t can be treated in a symmetric fashion, an alternative regression is

x2t = b̂3 + b̂4 x1t + v̂t.

Asymptotically the two regressions are equivalent and should give the same answer when the sample used is large. In practice, however, they may give different answers because the sample sizes used are not large enough.

2. With m variables there can be more than one distinct co-integrating relationship (in fact, up to (m - 1) linearly independent co-integrating vectors can be found). The EG methodology cannot estimate distinct co-integrating relationships.

The Johansen (1988) procedure provides a framework that circumvents those problems. The Johansen approach involves estimation of a system of equations rather than a single equation. Before we consider this approach, we need to introduce the VAR and the VECM.

VAR MODEL
The VAR model is, as the name suggests, an autoregression of a vector process. Consider the simplest example of a VAR: a two-variable VAR model with lag of first order (VAR(1)):

[y1t; y2t] = [φ11 φ12; φ21 φ22] [y1,t-1; y2,t-1] + [ε1t; ε2t].

Recall that the scalar AR(p) model, yt = Σ_{i=1}^{p} φi y_{t-i} + εt, can be reparameterized as

Δyt = π y_{t-p} + Σ_{i=1}^{p-1} ci Δy_{t-i} + εt.

Such scalar representations are, however, of no direct use for co-integration analysis, as they involve only one variable (yt), while co-integration involves more than one variable. The vector analogue is the VAR(p):

Yt = Σ_{i=1}^{p} Φi Y_{t-i} + εt.

VECM
Consider first a VAR(1) for the bivariate series Yt = [y1t; y2t] ~ I(1):

Yt = Φ1 Y_{t-1} + εt.

It can be reparameterized as

ΔYt = Π1 Y_{t-1} + εt,  Π1 = Φ1 - I.

Just like the scalar AR(p) model, the VAR(p) model can finally be reparameterised as

ΔYt = Πp Y_{t-p} + Σ_{i=1}^{p-1} Γi ΔY_{t-i} + εt,

where the Γi are functions of the Φj. This is the VECM (vector error correction model): note that the only term in levels is Y_{t-p}; the rest of the terms appear in differences. As will be explained in detail later, the Johansen approach relies on the VECM.

For the VAR(1) case, write Π1 Y_{t-1} = ΔYt - εt. The right-hand side is I(0), so Π1 Y_{t-1} must be I(0) as well, i.e. the rows of the matrix Π1 are co-integrating vectors and y1t and y2t co-integrate. The rank of Π1 gives the number of the linearly independent co-integrating vectors. Note that m = 2 here, so we cannot have more than one linearly independent co-integrating vector.

The result from the last slide can be generalized easily to higher-order VECMs. Consider the model as before and suppose that Yt ~ I(1). Then we are interested in the rank of Πp. With m component series and r = rank(Πp):

1. r = m: all component series are stationary, so co-integration is not an issue.
2. 0 < r < m: all component series are at least I(1) and co-integration exists.
3. r = 0: all component series are I(1), but co-integration does not exist.

CO-INTEGRATING VECTORS

So far we asked which properties of co-integration determine the properties of the VECM. Now reverse the question and ask which properties of the VECM imply co-integration.

$\mathrm{rank}(\Pi_p) = r$, where $r$ is the number of linearly independent co-integrating vectors. Since $r \le m - 1$, $\Pi_p$ is of reduced rank (singular).

The Johansen test statistics are formulated as

\[
\lambda_{trace}(r) = -T \sum_{i=r+1}^{g} \ln(1 - \hat{\lambda}_i)
\]

and

\[
\lambda_{max}(r, r+1) = -T \ln(1 - \hat{\lambda}_{r+1}),
\]

where $\hat{\lambda}_i$ is the $i$-th largest estimated eigenvalue of the $\Pi_p$ matrix, $r$ is the rank of matrix $\Pi_p$ under the null, $T$ is the number of observations, and $g$ is the dimension of $\Pi_p$.
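A hedged sketch of how the two statistics are computed once eigenvalues are in hand; the eigenvalues, `T` and the helper names `trace_stat`/`max_stat` below are mine, not estimates from data:

```python
import numpy as np

# hypothetical estimated eigenvalues of the Pi matrix, largest first
lams = np.array([0.35, 0.04, 0.01])
T = 200          # sample size (assumed)
g = len(lams)

def trace_stat(r):
    # lambda_trace(r) = -T * sum_{i=r+1}^{g} ln(1 - lam_i)
    return -T * np.log(1.0 - lams[r:]).sum()

def max_stat(r):
    # lambda_max(r, r+1) = -T * ln(1 - lam_{r+1})
    return -T * np.log(1.0 - lams[r])

tr0, mx0 = trace_stat(0), max_stat(0)
# in practice these are compared against Johansen critical values,
# increasing r until the null is no longer rejected
```

For the last admissible rank the two statistics coincide, since the trace sum then contains a single term.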

CO-INTEGRATING VECTORS FROM THE VECM

The $\lambda_{trace}(r)$ statistic tests the null that the number of co-integrating vectors is less than or equal to $r$ against an unspecified alternative, while $\lambda_{max}(r, r+1)$ tests the null that the number of co-integrating vectors is $r$ against an alternative of $r + 1$.

Recall that a reduced-rank matrix can be decomposed into a product of two full-rank matrices. If co-integration exists, then the $m \times m$ matrix $\Pi_p$ is of reduced rank ($r < m$) and can be expressed as

\[
\Pi_p = \alpha \beta' ,
\]

where $\alpha$, $\beta$ are $m \times r$ full-rank matrices. Consider, for example, the case of $m = 2$. Then if $y_{1t}$, $y_{2t}$ co-integrate, $r = 1$ and

\[
\Pi_p = \alpha \beta'
= \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix}
\begin{pmatrix} \beta_1 & \beta_2 \end{pmatrix}
= \begin{pmatrix} \alpha_1 \beta_1 & \alpha_1 \beta_2 \\ \alpha_2 \beta_1 & \alpha_2 \beta_2 \end{pmatrix}.
\]
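A minimal numerical illustration of the decomposition, with assumed $\alpha$ and $\beta$:

```python
import numpy as np

# reduced-rank Pi built from alpha (speed of adjustment) and beta
# (co-integrating vector); both columns are illustrative values
alpha = np.array([[-0.4], [0.2]])
beta = np.array([[1.0], [-0.8]])
Pi = alpha @ beta.T

# Pi is 2x2 but of rank 1: a single co-integrating vector
r = np.linalg.matrix_rank(Pi)

# the levels term Pi @ Y stacks alpha_i times the same disequilibrium
Y = np.array([5.0, 4.0])
diseq = beta.T @ Y                       # beta1*y1 + beta2*y2
assert np.allclose(Pi @ Y, (alpha * diseq).ravel())
```

The assertion makes explicit that every row of $\Pi Y$ is a multiple of one and the same linear combination $\beta' Y$.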

The levels term then stacks multiples of a single linear combination:

\[
\Pi_p Y_{t-p}
= \begin{pmatrix} \alpha_1 (\beta_1 y_{1,t-p} + \beta_2 y_{2,t-p}) \\ \alpha_2 (\beta_1 y_{1,t-p} + \beta_2 y_{2,t-p}) \end{pmatrix}.
\]

$\beta_1 y_{1t} + \beta_2 y_{2t}$ is the co-integrating relationship, and $\beta$ defines the co-integrating vector (a matrix of co-integrating vectors when $r > 1$). The matrix of weightings $\alpha$ can be seen as a matrix of "speed of adjustment" coefficients.

Error Correction Models (ECM) have been widely used in economics, e.g. in theories of the demand for money. The idea is as follows: let $x_t^*$ be the optimal money balance that the individual wants to hold in period $t$. Moreover, let $x_t$ be the actual money stock. Equilibrium is attained when $x_t = x_t^*$ (long run). In practice, however, $x_t$ may be different from $x_t^*$ due to adjustment costs. The disequilibrium error in period $t$ is defined as

\[
e_t = x_t - x_t^* .
\]

The ECM suggests that $x_t$ changes over time to correct disequilibrium errors that occurred in the past, i.e.

\[
\Delta x_t = -\theta e_{t-1},
\]

where $\theta$ is a speed-of-adjustment coefficient.
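A toy simulation of the adjustment rule; $\theta$, the target $x_t^*$ (held fixed here) and the starting value are all assumed:

```python
theta = 0.5            # speed-of-adjustment coefficient (assumed)
x_star = 10.0          # optimal balance x*_t, held fixed for the sketch
x = 0.0                # actual stock starts away from equilibrium
path = []

for _ in range(20):
    e = x - x_star                 # disequilibrium error e_t = x_t - x*_t
    x = x + (-theta) * e           # Delta x_t = -theta * e_{t-1}
    path.append(x)
```

Each period removes a fraction `theta` of the remaining error, so the error shrinks geometrically at rate `1 - theta` and the path converges to the target.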

Suppose there is co-integration. The long-run relationship is defined by $\beta Y_{t-p}$: when $\beta Y_{t-p} = 0$ the system is in equilibrium. In the short run, however, $\beta Y_{t-p} \neq 0$, and $\beta Y_{t-p}$ gives the disequilibrium error. $Y_t$ changes over time in response:

1. to past disequilibrium errors, with speeds of adjustment given by the matrix $\alpha$, and
2. to past changes $\Delta Y_{t-1}, \Delta Y_{t-2}, \ldots, \Delta Y_{t-p+1}$.


INTRODUCTORY FINANCIAL ECONOMETRICS
Topic 9: Causality, Exogeneity and Shock
3 CREDITS, 51 HOURS

Jianhua Gang
School of Finance
Renmin University of China
Spring 2013

Spring 2013 · Causality, Exogeneity and Shock · 1 / 31

GOALS

How certain variables are related to each other (interactions, causality);
Systemic dynamics/transmission of shocks.

OUR DATA

Quantity, price. These data are the result of aggregation procedures with respect to economic agents, goods, and time.

VARIABLES

DEFINITION (ENDOGENOUS VARIABLE)
Some variables are specific to the phenomenon under study and allow one to follow its evolution.

DEFINITION (EXOGENOUS VARIABLE)
In order to explain the phenomenon, some variables may have an influence on the endogenous variables, while their own values are fixed outside the phenomenon.

Suppose there are $n$ endogenous and $m$ exogenous variables; then create a linear system:

LINEAR SYSTEM

\[
A_0 y_t + A_1 y_{t-1} + \ldots + A_p y_{t-p} + B_0 x_t + B_1 x_{t-1} + \ldots + B_p x_{t-p} + b = 0,
\tag{1}
\]

where the $A_j$, $j = 0, 1, \ldots, p$, are $n \times n$ matrices, the $B_j$ are $n \times m$ matrices, and $b$ is an $n \times 1$ constant vector. $A_0$ is supposed to be nonsingular, so that the whole system allows for a unique determination of the current values of the endogenous variables.

Consider a simple Keynesian model with an autonomous (exogenously decided) expenditure ($G_t$) policy.

Endogenous variables: $GDP_t$, $C_t$, $I_t$.
Exogenous variable: $G_t$.

The system:

\[
\begin{aligned}
GDP_t &= C_t + I_t + G_t \\
C_t &= a\, GDP_{t-1} \\
I_t &= b\,(GDP_{t-1} - GDP_{t-2})
\end{aligned}
\tag{2}
\]

The first equation gives total demand; the second, the demand for consumption ($a$ is some fraction between 0 and 1); the third, the propensity to invest (assuming a growth period).
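The system (2) can be iterated directly; the parameter values and starting levels below are purely illustrative:

```python
import numpy as np

a, b, G = 0.6, 0.3, 100.0     # illustrative parameter values
T = 60
GDP = np.zeros(T)
GDP[0], GDP[1] = 200.0, 220.0  # arbitrary starting values

for t in range(2, T):
    C = a * GDP[t - 1]                      # consumption out of lagged income
    I = b * (GDP[t - 1] - GDP[t - 2])       # accelerator investment
    GDP[t] = C + I + G                      # total demand identity

# with a < 1 the multiplier gives a long-run level G / (1 - a)
```

With these values the oscillations die out and GDP settles at the multiplier level $G/(1-a) = 250$.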

In matrix form,

\[
\begin{pmatrix} 1 & -1 & -1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} GDP_t \\ C_t \\ I_t \end{pmatrix}
+ \begin{pmatrix} 0 & 0 & 0 \\ -a & 0 & 0 \\ -b & 0 & 0 \end{pmatrix}
\begin{pmatrix} GDP_{t-1} \\ C_{t-1} \\ I_{t-1} \end{pmatrix}
+ \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ b & 0 & 0 \end{pmatrix}
\begin{pmatrix} GDP_{t-2} \\ C_{t-2} \\ I_{t-2} \end{pmatrix}
+ \begin{pmatrix} -1 \\ 0 \\ 0 \end{pmatrix} G_t = 0.
\tag{3}
\]
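A quick consistency check of the matrix form (3) against the three structural equations; the lagged values and parameters are illustrative:

```python
import numpy as np

a, b = 0.6, 0.3
A0 = np.array([[1., -1., -1.],
               [0.,  1.,  0.],
               [0.,  0.,  1.]])
A1 = np.array([[0., 0., 0.],
               [-a, 0., 0.],
               [-b, 0., 0.]])
A2 = np.array([[0., 0., 0.],
               [0., 0., 0.],
               [b,  0., 0.]])
Bg = np.array([-1., 0., 0.])       # coefficient column on G_t

# pick lagged values, solve the system for the current endogenous vector
y1 = np.array([250., 150., 0.])    # [GDP, C, I] at t-1
y2 = np.array([240., 144., 0.])    # at t-2
G = 100.0
yt = np.linalg.solve(A0, -(A1 @ y1 + A2 @ y2 + Bg * G))

# compare with the structural equations directly
C = a * y1[0]
I = b * (y1[0] - y2[0])
assert np.allclose(yt, [C + I + G, C, I])
```

Because $A_0$ is nonsingular, `solve` delivers the unique current endogenous vector, as the slide claims.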

RANDOMNESS: DYNAMICS AND DISTURBANCES

The dynamic model (3) is deterministic and does not reflect short-run disturbances. If the whole dynamics has been correctly included in the initial specification as in (3), these disturbances should be independent. With random factors, we may re-write the model (2) as

\[
\begin{aligned}
TD_t &= C_t + I_t + G_t \\
GDP_t &= TD_t \\
C_t &= a\, GDP_{t-1} + u_t \\
I_t &= b\,(GDP_{t-1} - GDP_{t-2}) + v_t
\end{aligned}
\tag{4}
\]

The first two equations are accounting identities (total demand = total supply), while the behaviors ($C_t$ and $I_t$) are clearly determined by factors other than just revenue. So we add error terms to these behavioral equations.

Substituting $GDP_{t-1} = C_{t-1} + I_{t-1} + G_{t-1}$ from the identities, the behavioral equations become

\[
\begin{pmatrix} C_t \\ I_t \end{pmatrix}
= \begin{pmatrix} a & a \\ b & b \end{pmatrix} \begin{pmatrix} C_{t-1} \\ I_{t-1} \end{pmatrix}
+ \begin{pmatrix} 0 & 0 \\ -b & -b \end{pmatrix} \begin{pmatrix} C_{t-2} \\ I_{t-2} \end{pmatrix}
+ \begin{pmatrix} a \\ b \end{pmatrix} G_{t-1}
+ \begin{pmatrix} 0 \\ -b \end{pmatrix} G_{t-2}
+ \begin{pmatrix} u_t \\ v_t \end{pmatrix}.
\tag{5}
\]

More compactly,

\[
A_0 y_t + A_1 y_{t-1} + A_2 y_{t-2} + B_0 x_t + B_1 x_{t-1} + B_2 x_{t-2} + b = \varepsilon_t .
\]
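The equivalence of (4) and its reparameterised form (5) can be checked numerically; the parameters, shocks and expenditure path below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
a, b = 0.6, 0.3
T = 50
G = 100.0 + rng.standard_normal(T)      # exogenous expenditure path
u = rng.standard_normal(T)
v = rng.standard_normal(T)

# recursion (4): identities plus behavioural equations with shocks
GDP = np.full(T, 250.0)
C = np.zeros(T); I = np.zeros(T)
for t in range(2, T):
    C[t] = a * GDP[t - 1] + u[t]
    I[t] = b * (GDP[t - 1] - GDP[t - 2]) + v[t]
    GDP[t] = C[t] + I[t] + G[t]

# reparameterised form (5): (C_t, I_t) from own lags and lagged G only
for t in range(4, T):
    lhs = np.array([C[t], I[t]])
    rhs = (np.array([[a, a], [b, b]]) @ np.array([C[t - 1], I[t - 1]])
           + np.array([[0, 0], [-b, -b]]) @ np.array([C[t - 2], I[t - 2]])
           + np.array([a, b]) * G[t - 1] + np.array([0.0, -b]) * G[t - 2]
           + np.array([u[t], v[t]]))
    assert np.allclose(lhs, rhs)
```

Both routes generate exactly the same $(C_t, I_t)$ paths, which is what the substitution of the identities guarantees.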

DEFINITION (CONTROL VARIABLE)
Exogenous variables that can be controlled by the policy maker (a.k.a. instruments, economic-policy variables, decision variables).

DEFINITION (ENVIRONMENT VARIABLE)
Exogenous variables on which the policy maker cannot easily intervene.

Distinguishing the two kinds of exogenous variables ($x_t$ environment, $z_t$ control) in the model,

\[
A_0 y_t + A_1 y_{t-1} + \ldots + A_p y_{t-p} + B_0 x_t + B_1 x_{t-1} + \ldots + B_p x_{t-p}
+ C_0 z_t + C_1 z_{t-1} + \ldots + C_p z_{t-p} + b = \varepsilon_t .
\]

EXOGENOUS VAR.

Suppose the determination of the exogenous variables comes before that of the endogenous variables $y_t$. Therefore,

\[
\begin{aligned}
&A_0 y_t + A_1 y_{t-1} + \ldots + A_p y_{t-p} + B_0 x_t + B_1 x_{t-1} + \ldots + B_p x_{t-p} \\
&\quad + C_0 z_t + C_1 z_{t-1} + \ldots + C_p z_{t-p} + b = \varepsilon_t , \\
&x_t + D_1 x_{t-1} + \ldots + D_p x_{t-p} + E_0 z_t + E_1 z_{t-1} + \ldots + E_p z_{t-p} \\
&\quad + F_1 y_{t-1} + \ldots + F_p y_{t-p} + d = u_t .
\end{aligned}
\tag{6}
\]

WEAK EXOGENEITY

The control variables can have an impact on the endogenous variables or the environment variables; however, they do not influence them directly (i.e. they do not alter the coefficients $A_j$, $B_j$, $D_j$, $F_j$).

The $x$ are exogenous because the $x_t$ are fixed prior to the $y_t$ ($F_0 = 0$, and $\mathrm{cov}(u_t, \varepsilon_t) = 0$), even though they may depend on lagged values of $y$. The recursive model (6) corresponds to weak exogeneity (with information on the lagged endogenous variable $y$).

The policy maker sets the control variables so as to affect the endogenous variables.

It is more restrictive to assume that the $x_t$ are determined autonomously (without a relationship to the lagged endogenous variables). This corresponds to imposing $F_j = 0$, $\forall j$:

\[
\begin{aligned}
&A_0 y_t + A_1 y_{t-1} + \ldots + A_p y_{t-p} + B_0 x_t + B_1 x_{t-1} + \ldots + B_p x_{t-p} \\
&\quad + C_0 z_t + C_1 z_{t-1} + \ldots + C_p z_{t-p} + b = \varepsilon_t , \\
&x_t + D_1 x_{t-1} + \ldots + D_p x_{t-p} + E_0 z_t + E_1 z_{t-1} + \ldots + E_p z_{t-p} + d = u_t .
\end{aligned}
\tag{7}
\]

EXAMPLE
In the Keynesian model, the government can alter $G_t$ so as to influence the economy, e.g. maintain a constant level of expenditure,

\[
G_t = G_{t-1},
\]

or sustain the growth of investment,

\[
G_t - G_{t-1} = c\,(I_{t-1} - I_{t-2}).
\]

This is how the values of the control variables will be fixed in terms of the main aggregates. And this can be expressed by adding a policy equation.

From equations (8) to (10), additional recursiveness can be observed:

\[
\begin{aligned}
&A_0 y_t + A_1 y_{t-1} + \ldots + A_p y_{t-p} + B_0 x_t + B_1 x_{t-1} + \ldots + B_p x_{t-p} \\
&\quad + C_0 z_t + C_1 z_{t-1} + \ldots + C_p z_{t-p} + b = \varepsilon_t ,
\end{aligned}
\tag{8}
\]
\[
\begin{aligned}
&x_t + D_1 x_{t-1} + \ldots + D_p x_{t-p} + E_0 z_t + E_1 z_{t-1} + \ldots + E_p z_{t-p} \\
&\quad + F_1 y_{t-1} + \ldots + F_p y_{t-p} + d = u_t ,
\end{aligned}
\tag{9}
\]
\[
\begin{aligned}
&z_t + G_1 z_{t-1} + \ldots + G_p z_{t-p} + H_1 x_{t-1} + \ldots + H_p x_{t-p} \\
&\quad + I_1 y_{t-1} + \ldots + I_p y_{t-p} + f = v_t .
\end{aligned}
\tag{10}
\]

The determination of $z$ comes before that of $x$, which comes before that of $y$. However, the policy maker may only give the values that he wants to the coefficients $G_j$, $H_j$, $I_j$, whereas he does not have any influence on the other parameters of the model.

THE STRUCTURAL FORM: SIMULTANEITY

\[
A_0 y_t + A_1 y_{t-1} + \ldots + A_p y_{t-p} + B_0 x_t + B_1 x_{t-1} + \ldots + B_p x_{t-p} + b = \varepsilon_t .
\tag{11}
\]

If $A_0$ has a unit diagonal, then equation (11) can be re-written as

\[
y_t = -b + (I - A_0) y_t - A_1 y_{t-1} - \ldots - A_p y_{t-p} - B_0 x_t - B_1 x_{t-1} - \ldots - B_p x_{t-p} + \varepsilon_t .
\tag{12}
\]

Simultaneity appears through the off-diagonal coefficients of $A_0$ and through the nonzero contemporaneous correlation of the elements of the vector $\varepsilon$. While the simultaneity appearing in $A_0$ is easily interpretable in terms of equilibrium, the one appearing in $\mathrm{var}(\varepsilon)$ is not! A system such as (12) could be difficult to interpret without additional constraints.

THE REDUCED FORM

DEFINITION
Each endogenous variable is expressed as a function of the lagged endogenous variables, of the exogenous variables, and of the disturbance term:

\[
y_t = -A_0^{-1}\bigl(A_1 y_{t-1} + \ldots + A_p y_{t-p} + B_0 x_t + B_1 x_{t-1} + \ldots + B_p x_{t-p} + b\bigr) + A_0^{-1} \varepsilon_t .
\tag{13}
\]

In lag-polynomial notation,

\[
y_t = -A(0)^{-1}\bigl((A(L) - A(0)) y_t + B(L) x_t + b\bigr) + A(0)^{-1} \varepsilon_t ,
\]

where $A(L) = A_0 + A_1 L + \ldots + A_p L^p$ and $B(L) = B_0 + B_1 L + \ldots + B_p L^p$.

COMMENTS
Applied work frequently estimates reduced forms: they are sometimes easy to calculate/estimate, but the problem is, do we really care about reduced-form estimates?
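A sketch of passing from the structural form (11) to the reduced form (13) for a toy system with $p = 1$ and one exogenous variable; all matrices below are assumed, not taken from the lecture:

```python
import numpy as np

# assumed structural matrices for a two-equation system (p = 1, one x)
A0 = np.array([[1.0, -0.4],
               [0.2,  1.0]])
A1 = np.array([[-0.5,  0.0],
               [-0.1, -0.3]])
B0 = np.array([[-1.0],
               [-0.5]])
b = np.array([0.2, -0.1])

A0_inv = np.linalg.inv(A0)
P1 = -A0_inv @ A1          # reduced-form autoregressive coefficients
Q0 = -A0_inv @ B0          # reduced-form impact of x_t
c = -A0_inv @ b            # reduced-form constant

# verify: plugging the reduced form back satisfies the structural form
y_lag = np.array([1.0, 2.0]); x = np.array([0.7]); eps = np.array([0.3, -0.2])
y = P1 @ y_lag + (Q0 @ x) + c + A0_inv @ eps
assert np.allclose(A0 @ y + A1 @ y_lag + (B0 @ x) + b, eps)
```

The reduced-form matrices are simple functions of the structural ones; the converse mapping is the identification problem the slide alludes to.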

THE FINAL FORM

The above expressions can be further transformed into expressing the current value of the endogenous variables $y_t$ as a function of the exogenous variables and of the disturbances $\varepsilon_\tau$, $\tau \le t$. This is the final form. Given that all roots of the polynomial $A(L)$ are outside the unit circle,

\[
y_t = -A(L)^{-1} B(L) x_t - A(L)^{-1} b + A(L)^{-1} \varepsilon_t .
\]

The final form displays the cumulated effects of the exogenous variables and of the disturbances on $y$.

CAUSALITY

The approaches so far require fixing a priori the status of each (control/environment) variable. Now consider another approach: analyzing the joint evolution of the various variables of interest, and examining whether some of them are fixed before others. It can be used on the processes $\{x_t\}$ and $\{y_t\}$ (and may also be used on the control variables $\{z_t\}$).

CAUSALITY

DEFINITION (CAUSALITY)
Let $\underline{x}_{t}$ denote the history $(x_\tau, \tau \le t)$, and similarly for $\underline{y}_{t}$. Then:
1. $y$ causes $x$ at time $t$ iff $E(x_t \mid \underline{x}_{t-1}, \underline{y}_{t-1}) \neq E(x_t \mid \underline{x}_{t-1})$;
2. $y$ causes $x$ instantaneously at time $t$ iff $E(x_t \mid \underline{x}_{t-1}, \underline{y}_{t}) \neq E(x_t \mid \underline{x}_{t-1}, \underline{y}_{t-1})$.

As a result of the properties of linear regression, the forecast based on more information is necessarily the best one, and then we have the following noncausality conditions.

NONCAUSALITY

DEFINITION (NONCAUSALITY)
1. $y$ does not cause $x$ at time $t$ iff $\mathrm{var}(\epsilon(x_t \mid \underline{x}_{t-1}, \underline{y}_{t-1})) = \mathrm{var}(\epsilon(x_t \mid \underline{x}_{t-1}))$;
2. $y$ does not cause $x$ instantaneously at time $t$ iff $\mathrm{var}(\epsilon(x_t \mid \underline{x}_{t-1}, \underline{y}_{t})) = \mathrm{var}(\epsilon(x_t \mid \underline{x}_{t-1}, \underline{y}_{t-1}))$,
where $\epsilon(x_t \mid \cdot)$ denotes the corresponding forecast error.

COROLLARY (SYMMETRY)
The two following statements are equivalent:
1. $y$ does not cause $x$ instantaneously at time $t$;
2. $x$ does not cause $y$ instantaneously at time $t$.
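The noncausality condition suggests a simple regression-based check in the spirit of a Granger test: compare forecast-error variances for $x$ with and without lagged $y$. This is a sketch on simulated data with one lag, not the formal test from the lecture:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 4000
x = np.zeros(T); y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.5 * y[t - 1] + rng.standard_normal()
    x[t] = 0.3 * x[t - 1] + 0.4 * y[t - 1] + rng.standard_normal()  # y causes x

def rss(target, regressors):
    # residual sum of squares from an OLS fit
    beta = np.linalg.lstsq(regressors, target, rcond=None)[0]
    u = target - regressors @ beta
    return u @ u

ones = np.ones(T - 1)
restricted = np.column_stack([ones, x[:-1]])              # x on own past only
unrestricted = np.column_stack([ones, x[:-1], y[:-1]])    # add past y
r0, r1 = rss(x[1:], restricted), rss(x[1:], unrestricted)

# adding lagged y should reduce the forecast-error variance noticeably
```

By the noncausality definition, equal variances would mean $y$ does not cause $x$; here the data-generating process builds the causality in, so the restricted RSS is clearly larger.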

CAUSALITY REVERSAL AND LIMITS

In reality, for certain phenomena we could observe a causality reversal. The definitions of causality shown above involve conditions on the forecast error only and cannot capture such reversals; it would be more accurate to speak of predictability and instantaneous predictability instead of causality and instantaneous causality.

DEFINITION
$y$ does not cause $x$ (instantaneously) iff $y$ does not cause $x$ (instantaneously) at time $t$ for all possible times $t$.

Therefore, one should constantly keep in mind that the previous definitions are sometimes not suitable for describing real-world phenomena.

Consider the expression of the VAR,

\[
\begin{pmatrix} \Phi_y(L) & \Phi_{yx}(L) \\ \Phi_{xy}(L) & \Phi_x(L) \end{pmatrix}
\begin{pmatrix} y_t \\ x_t \end{pmatrix}
= \begin{pmatrix} c_y \\ c_x \end{pmatrix}
+ \begin{pmatrix} \varepsilon_{yt} \\ \varepsilon_{xt} \end{pmatrix},
\tag{14}
\]

given that the stationarity conditions on the roots of the characteristic polynomial are satisfied. In the stationary case it is apparent that the definitions for a certain date or for all dates coincide. We can hence choose a normalization of the type

\[
\Phi(0) = \begin{pmatrix} \Phi_y(0) & \Phi_{yx}(0) \\ \Phi_{xy}(0) & \Phi_x(0) \end{pmatrix} = I .
\]

Therefore, in this case, all simultaneous links between the two processes are summarized in the covariance $\mathrm{cov}(\varepsilon_{yt}, \varepsilon_{xt})$.
