
Autocorrelation

There is serial correlation (i.e. autocorrelation) when either the dependent variable (AR models) or the residual (MA models) shows correlation with its values in past periods. This is a problem because standard errors (even heteroskedasticity-robust ones) are not consistent, affecting statistical inference (i.e. hypothesis testing). In cross-section data, it is relatively safe to assume

E(e_i e_j) = 0, for i ≠ j

In words, the residuals are completely independent across observations. However, it is usually the case that errors are not independent in time series. Observations for the same individual in different periods of time are usually correlated. There are many factors that may cause this (auto)correlation:

a) Cyclical components

b) Omitted variable bias

c) Functional form misspecification

d) Data manipulation: a secondary effect of smoothing or interpolation techniques in compiling statistics

Consider the simplest bivariate model

y_t = β_0 + β_1 x_t + e_t

where the residuals e_t follow a first-order autoregressive process

e_t = ρ e_{t−1} + u_t

and u_t is an error term with no serial correlation. The parameter ρ is the correlation coefficient and indicates how strong the autocorrelation is. Replacing iteratively r times, the residuals can be expressed as a geometric series of the error terms u_t.

e_t = u_t + ρ u_{t−1} + ρ^2 u_{t−2} + … + ρ^r u_{t−r} + ρ^(r+1) e_{t−r−1} = Σ_{i=0}^{r} ρ^i u_{t−i} + ρ^(r+1) e_{t−r−1}

Since −1 < ρ < 1, ρ^(r+1) tends to zero and the last term can be dropped as r → ∞:

e_t = Σ_{i=0}^{∞} ρ^i u_{t−i}

where

E(e_t) = E(Σ_{i=0}^{∞} ρ^i u_{t−i}) = Σ_{i=0}^{∞} ρ^i E(u_{t−i}) = 0

and

V(e_t) = V(Σ_{i=0}^{∞} ρ^i u_{t−i}) = Σ_{i=0}^{∞} ρ^(2i) V(u_{t−i}) = σ_u² / (1 − ρ²)

The last result for the variance of the residuals assumes u_t is homoskedastic.
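The variance result above can be checked by simulation: generate a long AR(1) error series and compare its sample variance with σ_u²/(1 − ρ²). A minimal sketch in plain Python (not part of the original notes; ρ = 0.6, σ_u = 1 and the seed are arbitrary choices):

```python
import random

random.seed(1)
rho, n = 0.6, 200_000

# e_t = rho * e_{t-1} + u_t, with u_t ~ N(0, 1) serially uncorrelated
e, prev = [], 0.0
for _ in range(n):
    prev = rho * prev + random.gauss(0.0, 1.0)
    e.append(prev)

# sample variance (the process has mean zero by construction)
sample_var = sum(v * v for v in e) / n
closed_form = 1.0 / (1 - rho ** 2)   # sigma_u^2 / (1 - rho^2) = 1.5625
print(round(sample_var, 2), closed_form)
```

The sample variance settles near the closed-form value 1.5625 as the series gets longer, which is the geometric-series result derived above.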

The OLS estimator β̂_1 is unbiased and consistent, but the usual standard errors are not. The variance of β̂_1 looks like

V(β̂_1) = [σ_u² / Σ_t (x_t − x̄)²] · [1 + Σ_i Σ_{j≠i} ρ^|i−j| (x_i − x̄)(x_j − x̄) / Σ_i (x_i − x̄)²]

This is clearly not the simple variance formula for the homoskedastic, serially uncorrelated case. The standard errors are very likely underestimated, which will lead the usual t and F statistics to be invalid.
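The understatement can be seen numerically by computing the exact variance of β̂_1 under AR(1) errors for a given regressor series and comparing it with the naive homoskedastic formula. A sketch with assumed values (ρ = 0.6, σ_u = 1, and a hypothetical trending regressor; none of these come from the notes):

```python
rho, sigma2_u = 0.6, 1.0                  # assumed AR(1) parameters
x = [0.1 * t for t in range(30)]          # hypothetical trending regressor
n = len(x)
xbar = sum(x) / n
dev = [v - xbar for v in x]
sst = sum(d * d for d in dev)

# under the AR(1) process: Cov(e_i, e_j) = sigma_e^2 * rho^|i-j|
sigma2_e = sigma2_u / (1 - rho ** 2)
true_var = sum(dev[i] * dev[j] * sigma2_e * rho ** abs(i - j)
               for i in range(n) for j in range(n)) / sst ** 2

naive_var = sigma2_e / sst                # what the usual OLS formula reports

print(true_var > naive_var)               # True: the usual formula understates
```

With a positively autocorrelated error and a smoothly trending regressor, the cross terms are positive, so the true variance exceeds the reported one, which is why the usual t and F statistics overstate significance.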

Detecting Autocorrelation in the Error Term

Since the true population errors are not observed, all autocorrelation detection procedures are based on the regression residuals, which are the sample estimates of the population error terms. The first method relies on graphical detection. This method is very important and should always be considered the initial detection step. You run the regression, obtain the residuals, and plot them over time. We work with the workfile sugar.wf1 (which is part of the data sets provided by Hill, Griffiths and Judge, 2002) and estimate

ln(A_t) = β_0 + β_1 ln(P_t) + e_t

Figure 1 presents the graphs of the residuals (levels and first difference). The general pattern of the residuals reflects some inertia, where positive (negative) errors are followed by positive (negative) errors.
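That "inertia" pattern can be checked crudely by counting how often consecutive residuals share a sign; under no autocorrelation roughly half of the pairs would. A sketch on a made-up residual series (not the sugar.wf1 residuals):

```python
# hypothetical residuals, for illustration only
ehat = [0.4, 0.5, 0.2, -0.1, -0.3, -0.2, 0.1, 0.3, 0.2, -0.1]

# count consecutive pairs with the same sign
same_sign = sum(1 for a, b in zip(ehat, ehat[1:]) if a * b > 0)
share = same_sign / (len(ehat) - 1)
print(round(share, 2))  # 0.67: two thirds of neighbours share a sign, suggesting inertia
```

This is only an eyeball-level diagnostic, like the residual plot itself; the formal tests below quantify it.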

Method 2: Durbin-Watson Test

Then, you have more formal tests. The most commonly used is the Durbin-Watson test, which is based on the AR(1) model for the error term e_t. Notice that this is different from an AR(1) model for the dependent variable; Stock and Watson mostly work with that second case, while here we introduce the AR(1) model into the bivariate regression analysis. Consider a regression model with two equations:

y_t = β_0 + β_1 x_t + e_t
e_t = ρ e_{t−1} + u_t

This is a two-sided test of no serial correlation (H_0: ρ = 0, H_1: ρ ≠ 0). E-views automatically produces the Durbin-Watson (d) statistic, which is reported in the regression output. Consider, again,

ln(A_t) = β_0 + β_1 ln(P_t) + e_t

Dependent Variable: LOG(A)
Method: Least Squares
Date: 12/04/08   Time: 23:58
Sample: 1 34
Included observations: 34

Variable      Coefficient   Std. Error   t-Statistic   Prob.
C             6.111328      0.168570     36.25397      0.0000
LOG(P)        0.970582      0.110629     8.773336      0.0000

R-squared            0.706345    Mean dependent var      4.707273
Adjusted R-squared   0.697168    S.D. dependent var      0.561094
S.E. of regression   0.308771    Akaike info criterion   0.544589
Sum squared resid    3.050865    Schwarz criterion       0.634375
Log likelihood      -7.258010    F-statistic             76.97143
Durbin-Watson stat   1.291242    Prob(F-statistic)       0.000000

You can find the d-statistic in the bottom left-hand corner of the regression output. This statistic is defined as:

d = 2(1 − ρ̂)
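In full, the d-statistic is computed from the residuals as d = Σ_{t=2} (ê_t − ê_{t−1})² / Σ_t ê_t², and the expansion 2(1 − ρ̂) is a large-sample approximation to it. A sketch on a made-up residual series (not the sugar data):

```python
# hypothetical residuals; in practice these come from the regression
ehat = [0.4, 0.5, 0.2, -0.1, -0.3, -0.2, 0.1, 0.3, 0.2, -0.1]
n = len(ehat)

ss = sum(e * e for e in ehat)
d = sum((ehat[t] - ehat[t - 1]) ** 2 for t in range(1, n)) / ss
rho_hat = sum(ehat[t] * ehat[t - 1] for t in range(1, n)) / ss

# in a series this short the approximation d = 2(1 - rho_hat) is rough,
# but both versions signal positive autocorrelation (well below 2)
print(round(d, 3), round(2 * (1 - rho_hat), 3))  # 0.635 0.865
```

Both numbers are far below 2, the same qualitative reading as the d of 1.29 in the regression output above.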

Therefore, no serial correlation is consistent with a d-statistic close to 2. When d = 2, the correlation coefficient is zero and there is no evidence of autocorrelation in the error term. If d < 2 (d > 2), there is evidence of positive (negative) autocorrelation. There is perfect positive (negative) autocorrelation when d = 0 (d = 4).

E-views does not provide tables for the critical values or p-values of the Durbin-Watson test. In our case, the d-statistic is 1.29, situated in the rejection area.¹ You will not be required to do hypothesis testing with the d-w statistic, but you need to know what it is for.

¹ Go to http://www.stanford.edu/~clint/bench/dwcrit.htm. For T = 34 and 2 parameters, you have two critical values at the 5% significance level: 1.39 (lower level) and 1.51 (upper level). A d-statistic below 1.39 is strong evidence of positive serial correlation, while you do not reject the null of no correlation if the d-statistic is above the upper level (1.51). Values in the interval are mixed evidence, so the test is inconclusive.

Method 3: LM Test.

Durbin-Watson is, together with the ADF unit root test, the most commonly used test in time series. However, it is important to know that it is not valid in many instances: for example, if the error distribution is not normal, or if the lagged dependent variable appears as an independent variable, it is not an appropriate test for autocorrelation. A test that does not have these limitations is the Lagrange Multiplier test for autocorrelation, or Breusch-Godfrey test.

Starting from the initial equation

y_t = β_0 + β_1 x_t + e_t

the perturbation term e_t is estimated and its first lag is introduced in an auxiliary regression. The t-statistic of the new variable gives you ρ̂. Eviews computes this statistic directly from the equation window, clicking on View/Residual Tests/Serial Correlation LM Test. You should indicate the number of lagged residuals to be included (in this case, only one). The final output should be similar to:

Breusch-Godfrey Serial Correlation LM Test:

F-statistic     4.022271    Probability   0.053699
Obs*R-squared   3.904864    Probability   0.048147

Test Equation:
Dependent Variable: RESID
Method: Least Squares
Date: 12/05/08   Time: 01:58

Variable      Coefficient   Std. Error   t-Statistic   Prob.
C             0.019701      0.161432     0.122040      0.9037
LOG(P)        0.011818      0.105912     0.111581      0.9119
RESID(-1)     0.343298      0.171173     2.005560      0.0537

R-squared            0.114849    Mean dependent var      1.01E-16
Adjusted R-squared   0.057742    S.D. dependent var      0.304057
S.E. of regression   0.295148    Akaike info criterion   0.481415
Sum squared resid    2.700477    Schwarz criterion       0.616094
Log likelihood      -5.184062    F-statistic             2.011135
Durbin-Watson stat   1.978183    Prob(F-statistic)       0.150928

Since the p-value is higher than 5% but below 10%, rejection or non-rejection of the null hypothesis of no serial correlation depends on the significance level selected. At 5%, we cannot reject the hypothesis that there is no autocorrelation. This result contradicts our conclusion from the d-statistic. However, take into account that we can reject no serial correlation at the 10% significance level. One advantage of this test is that it can be generalized to higher orders of autocorrelation: you introduce further lags of the estimated residuals and use the F-statistic to test the null hypothesis of no serial correlation.
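The core of the LM approach, regressing the residuals on their own first lag and inspecting the t-statistic, can be sketched outside E-views. Plain Python on a made-up residual series (the genuine Breusch-Godfrey auxiliary regression would also include the original regressors, omitted here for brevity):

```python
# hypothetical residuals, for illustration only
ehat = [0.4, 0.5, 0.2, -0.1, -0.3, -0.2, 0.1, 0.3, 0.2, -0.1]

y = ehat[1:]               # e_t
x = ehat[:-1]              # e_{t-1}
n = len(y)
xbar, ybar = sum(x) / n, sum(y) / n

sxx = sum((v - xbar) ** 2 for v in x)
sxy = sum((x[i] - xbar) * (y[i] - ybar) for i in range(n))
rho_hat = sxy / sxx                     # slope on the lagged residual
a = ybar - rho_hat * xbar

# t-statistic for H0: rho = 0
resid = [y[i] - a - rho_hat * x[i] for i in range(n)]
s2 = sum(r * r for r in resid) / (n - 2)
t_stat = rho_hat / (s2 / sxx) ** 0.5
print(rho_hat > 0)   # True: positive estimated first-order autocorrelation
```

A large t-statistic on the lagged residual is evidence of serial correlation, which is exactly what the RESID(-1) row in the E-views output reports.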

Figure 1: Residual Plot

Equation _reg01.ls LOG(A) C LOG(P)   'warning: this is time series so no heterosk. robust s.e.
line resid
line d(resid)

[Two line plots over the sample: RESID (levels) and D(RESID) (first difference).]

The residual line plot shows some positive correlation while the first difference shows negative correlation.

Code: E-views code for serial correlation

'This is the Durbin-Watson statistic
Equation _reg01.ls LOG(A) C LOG(P)
Scalar d_stat = @dw

'Lagrange Multiplier test
Equation _reg01.ls LOG(A) C LOG(P)
Genr ehat = resid
Equation _reg01.ls LOG(A) C LOG(P) ehat(-1)