
BABS 502

Lecture 8
ARIMA Forecasting II
March 16 and 21, 2011

Content
The Box-Jenkins Modeling Process
Seasonal ARIMA Models
Concluding comments on ARIMA models

Martin L. Puterman 2008

The Box-Jenkins Approach to Forecasting with ARIMA Models

Identification
Fitting
Diagnostics
Refitting if necessary
Forecasting


Identification
What does the data look like?
What patterns exist?
Is the data stationary?

Tools
Plots of data
PACF
ACF
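As a rough illustration of the two identification tools, the sketch below computes the sample ACF with the usual estimator (cross-products of deviations from the mean at lag k, divided by the total sum of squares) and derives the PACF from it with the Durbin-Levinson recursion. The function names are hypothetical, and statistical packages may use slightly different estimators, so small numerical differences from NCSS output are to be expected.

```python
def sample_acf(x, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag of the series x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)  # lag-0 sum of squares
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(1, max_lag + 1)
    ]


def pacf_from_acf(r):
    """Partial autocorrelations from r = [r_1, r_2, ...] (Durbin-Levinson).

    At step k, phi holds the coefficients of the best-fitting AR(k-1) model;
    the last coefficient of the AR(k) fit is the lag-k partial autocorrelation.
    """
    pacf, phi = [], []
    for k in range(1, len(r) + 1):
        num = r[k - 1] - sum(phi[j] * r[k - 2 - j] for j in range(k - 1))
        den = 1.0 - sum(phi[j] * r[j] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append the new one.
        phi = [phi[j] - phi_kk * phi[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


print(sample_acf([1, 2, 3, 4, 5], 2))  # [0.4, -0.1]
```

As a rule of thumb in identification, a PACF that cuts off after lag p points to an AR(p) term, while an ACF that cuts off after lag q points to an MA(q) term.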


Model Fitting
Trial model is proposed
e.g. ARIMA(0,1,2)

Model parameters are estimated using statistical software
Output includes

Parameter estimates
Test statistics
Goodness of fit measures
Residuals
Diagnostics
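To make the estimation step less of a black box, here is a minimal sketch of one simple estimator, conditional least squares for a zero-mean AR(1) model $x_t = a_1 x_{t-1} + e_t$. This estimator is an assumption chosen for clarity: packages typically use exact or approximate maximum likelihood, so their estimates can differ somewhat. The function name is hypothetical; the residuals it returns are what the diagnostics step examines.

```python
def fit_ar1_cls(x):
    """Conditional least-squares estimate of a1 in x_t = a1 * x_{t-1} + e_t.

    Assumes x has already been differenced/centred so its mean is about 0.
    Returns the estimate and the in-sample one-step residuals.
    """
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    a1 = num / den
    residuals = [x[t] - a1 * x[t - 1] for t in range(1, len(x))]
    return a1, residuals


# A made-up series that follows x_t = 0.5 x_{t-1} exactly.
a1, res = fit_ar1_cls([8.0, 4.0, 2.0, 1.0, 0.5])
print(a1)  # 0.5
```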

Diagnostics
Determines whether the model fits the data adequately.
The goal is to extract all the information in the data and ensure that the residuals are white noise.

Key measures
ACF of Residuals
PACF of Residuals
Ljung-Box-Pierce Q Statistic (Portmanteau Test)
Tests whether a set of residual autocorrelations is significantly different from zero.
See next slide for details

If the model is deemed adequate, proceed with forecasting; otherwise, try a new model.

Comments on Model Adequacy Testing (NCSS Documentation)
The Portmanteau Test (sometimes called the Box-Pierce-Ljung statistic) is used to determine if there is any pattern left in the residuals that may be modeled. This is accomplished by testing the significance of the autocorrelations up to a certain lag. In a private communication with Dr. Greta Ljung, we have learned that this test should only be used for lags between 13 and 24. The test is computed as:

$$Q(k) = N(N+2)\sum_{j=1}^{k}\frac{r_j^2}{N-j}$$

where $r_j$ is the jth residual autocorrelation and N is the number of residuals.

Under H0 (all residual autocorrelations equal zero), Q(k) is distributed as a chi-square with $k - p - q - P - Q$ degrees of freedom, where p, q, P and Q are the model orders (P and Q being the seasonal AR and MA orders). For example, for an ARIMA(0,1,3)x(0,0,1)12 model with k = 24 this gives 24 - 3 - 1 = 20 degrees of freedom.
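A minimal sketch of the calculation, assuming the residual autocorrelations are already available (for example from a sample ACF routine). The p-value uses the exact closed form of the chi-square upper tail, which only holds for an even number of degrees of freedom; that restriction, and both helper names, are assumptions of this sketch rather than anything NCSS does.

```python
import math


def ljung_box(r, n, k):
    """Q(k) = N(N+2) * sum_{j=1..k} r_j^2 / (N - j), with r[0] = r_1."""
    return n * (n + 2) * sum(r[j - 1] ** 2 / (n - j) for j in range(1, k + 1))


def chi2_sf_even_df(q, df):
    """P(chi-square(df) > q); exact closed form, valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("this shortcut needs a positive even df")
    half = q / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (q/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


# Wages model ARIMA(0,1,3)x(0,0,1)12: k = 24, p = 0, q = 3, P = 0, Q = 1.
df = 24 - 0 - 3 - 0 - 1
print(df, chi2_sf_even_df(34.96, df))  # df = 20; p is roughly 0.02 (NCSS reports 0.020343)
```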


Forecasting with ARIMA models

ARIMA forecasting is done automatically by statistical software.
You should still work out how the forecasts follow from the model equation.
It helps to write out the model equation. This is more complicated for seasonal models; we discuss it below.

The AR portion of the model uses past values in forecasts.

The MA portion of the model uses past residuals in forecasts.

Prediction intervals are usually very wide; out-of-sample forecast errors may be a more reliable guide to forecast accuracy.
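The recursion the software carries out can be sketched as follows, under stated assumptions: the series has already been differenced and has mean zero, the AR part is written $x_t = a_1 x_{t-1} + \dots + e_t$ and the MA part as $e_t - b_1 e_{t-1} - \dots$ (the sign convention used for the MA(1) model later in these notes), and unknown future residuals are replaced by their expected value of zero. `forecast_arma` is a hypothetical helper, not a library function.

```python
def forecast_arma(x, e, a, b, h):
    """h-step forecasts for x_t = sum_i a_i x_{t-i} + e_t - sum_j b_j e_{t-j}.

    x: observed (differenced, zero-mean) series; e: its in-sample residuals
    (same length). a, b: AR and MA coefficients, a[0] = a_1, b[0] = b_1.
    """
    xs = list(x)  # observations, extended with forecasts as we go
    es = list(e)  # residuals, extended with 0 for unknown future shocks
    forecasts = []
    for _ in range(h):
        ar_part = sum(a_i * xs[-1 - i] for i, a_i in enumerate(a))   # past values
        ma_part = -sum(b_j * es[-1 - j] for j, b_j in enumerate(b))  # past residuals
        f = ar_part + ma_part
        forecasts.append(f)
        xs.append(f)    # the forecast stands in for the unknown future value
        es.append(0.0)  # the expected future shock is zero
    return forecasts


# AR(1) with a1 = 0.5 and last value 2.0: forecasts halve each step.
print(forecast_arma([1.0, 2.0], [0.0, 0.0], [0.5], [], 3))  # [1.0, 0.5, 0.25]
```

The MA terms therefore influence only the first q forecasts; after that the forecasts are driven by the AR terms alone, applied to earlier forecasts.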

Google Share Price Forecasting

Series: Weekly price (Jan 23, 2006 to March 3, 2008)
Model: Regular (1,1,0); Seasonal: no seasonal parameters
Observations: 111
Root Mean Square: 20.12511

Model Estimation Section
Parameter  Estimate   Standard Error  T-Value  Prob Level
AR(1)      0.2383635  ≈0.0918         2.5978   0.009382

Forecast of Price
Row  Date  Forecast  Lower 95% Limit  Upper 95% Limit
112  2094  423.5     360.7            486.3
113  2095  421.3     340.4            502.3
114  2096  420.8     324.8            516.8
115  2097  420.7     311.6            529.8
116  2098  420.7     299.9            541.4

Fitted Model
$X_{t+1} - X_t = 0.238(X_t - X_{t-1})$, or equivalently $X_{t+1} = X_t + 0.238(X_t - X_{t-1})$

One-step-ahead forecast, with last price $X_t = 432.70$ and previous price $X_{t-1} = 471.20$:
432.70 + 0.238 * (432.70 - 471.20) = 423.5
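The forecast table can be reproduced by applying the fitted equation recursively, each forecast taking the place of the unknown future price in the next step. This is a minimal sketch with a hypothetical helper name; it uses the unrounded estimate 0.2383635 from the estimation section.

```python
def forecast_arima_110(x_t, x_prev, phi, h):
    """Forecasts from the fitted ARIMA(1,1,0): X_{t+1} = X_t + phi * (X_t - X_{t-1})."""
    forecasts = []
    for _ in range(h):
        x_next = x_t + phi * (x_t - x_prev)
        forecasts.append(x_next)
        x_prev, x_t = x_t, x_next  # roll forward: the forecast becomes the latest value
    return forecasts


fc = forecast_arima_110(432.70, 471.20, 0.2383635, 5)
print([round(f, 1) for f in fc])  # [423.5, 421.3, 420.8, 420.7, 420.7]
```

With the coefficient rounded to .238 the second forecast comes out as 421.4 rather than 421.3, which suggests the software worked with the unrounded estimate. Because each step multiplies the previous change by roughly 0.24, the changes die out quickly and the forecasts level off near 420.7, while the prediction intervals keep widening.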


Seasonal ARIMA Models

The basic idea is to add extra terms to the model that account for a persistent seasonal pattern.

For example, an AR model for monthly data may contain information from lag 12, lag 24, etc., i.e.
$Y_t = A_1 Y_{t-12} + A_2 Y_{t-24} + e_t$
This is referred to as an ARIMA(0,0,0)x(2,0,0)12 model.

The general form is ARIMA(p,d,q)x(ps,ds,qs)s, where ps, ds and qs are the seasonal AR, differencing and MA orders and s is the length of the season (s = 12 for monthly data).
This combines both non-seasonal and seasonal terms, which provides a broader class of models.

The challenge is to select a model from this larger class.
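The differencing orders in the general form, d for regular differencing and ds for seasonal differencing, correspond to replacing $x_t$ by $x_t - x_{t-1}$ and by $x_t - x_{t-s}$ respectively. A minimal sketch on a made-up monthly series (linear trend plus a fixed 12-month pattern; the data and the helper name are hypothetical) shows that one seasonal and one regular difference remove both components:

```python
def difference(x, lag=1):
    """Return x_t - x_{t-lag} for t = lag, ..., len(x) - 1."""
    return [x[t] - x[t - lag] for t in range(lag, len(x))]


# Hypothetical monthly series: linear trend plus a fixed seasonal pattern.
pattern = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
x = [0.5 * t + pattern[t % 12] for t in range(48)]

seasonal = difference(x, lag=12)    # ds = 1, s = 12: removes the seasonal pattern
both = difference(seasonal, lag=1)  # d = 1: removes the remaining constant change
print(seasonal[:3])                 # [6.0, 6.0, 6.0]
print(max(abs(v) for v in both))    # 0.0
```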


Wages Data
[Figures: Plot of Wages (Wages vs. Time); Autocorrelations of Wages (0,0,12,0,0); Partial Autocorrelations of Wages (0,0,12,0,0)]

Observe that the data are non-stationary.

Differenced Wages Data
[Figures: Autocorrelations of Wages (1,0,12,0,0); Partial Autocorrelations of Wages (1,0,12,0,0)]

Autocorrelations of Wages (1,0,12,0,0)
Lag  Correlation   Lag  Correlation
1    -0.055496     11   0.062967
2    -0.004269     12   0.506937
3    0.298826      13   -0.056564
4    0.108858      14   0.041622
5    0.073639      15   0.287086
6    0.121665      16   0.001067
7    0.048026      17   0.088581
8    0.069370      18   0.092850
9    0.218783      19   0.010672
10   0.044824      20   0.115261

Lag  Correlation   Lag  Correlation
21   0.200640      31   0.013874
22   0.016009      32   0.152615
23   0.008538      33   0.077908
24   0.494130      34   0.013874
25   -0.024546     35   0.118463
26   0.058698      36   0.328709
27   0.200640      37   -0.086446
28   0.003202      38   0.028815
29   0.036286      39   0.205977
30   0.091782      40   -0.066169

Significant if |Correlation| > 0.237356

Model Fitting ARIMA(0,1,3)x(0,0,1)12

Model Estimation Section
Parameter  Estimate      Standard Error  T-Value  Prob Level
MA(1)      0.1390065     0.1200761       1.1577   0.247006
MA(2)      1.547035E-02  0.1202638       0.1286   0.897645
MA(3)      -0.2083403    0.1170662       -1.7797  0.075128
SMA(1)     -0.5427189    0.1019158       -5.3252  0.000000

[Figures: Forecasts of Wages (Wages vs. Time); Autocorrelations of Residuals (vs. Lag)]

Ljung-Box test: lag 24, 20 degrees of freedom, Q = 34.96, Prob Level = 0.020343

Inadequate Model: the Ljung-Box test rejects white-noise residuals at the 5% level.

Model Fitting ARIMA(0,1,3)x(0,0,2)12

Model Estimation Section
Parameter  Estimate      Standard Error  T-Value  Prob Level
MA(1)      0.2134133     0.1150536       1.8549   0.06361
MA(2)      7.882232E-02  0.1161695       0.6785   0.49744
MA(3)      -0.3358605    0.1119487       -3.000   0.00269
SMA(1)     -0.4282575    0.1181367       -3.625   0.00028
SMA(2)     -0.8555523    6.011709E-02    -14.23   0.00000

[Figures: Wages Chart (Wages vs. Time); Autocorrelations of Residuals (vs. Lag)]

Model Fitting ARIMA(0,1,0)x(1,1,0)12

Model Estimation Section
Parameter  Estimate    Standard Error  T-Value  Prob Level
SAR(1)     -0.5495576  8.447082E-02    -6.5059  0.000000

[Figures: Wages Chart (Wages vs. Time); Autocorrelations of Residuals (vs. Lag)]

Model Comparison
Model               RMSE    Ljung-Box (24)
(0,1,3)x(0,0,1)12   .0316   34.96
(0,1,3)x(0,0,2)12   .0245   11.08
(0,1,0)x(1,1,0)12   .0239   15.84

[Figures: Autocorrelations of Residuals (vs. Lag) for each of the three models]

But we are concerned with forecasting, so we should compare models out of sample (usually simpler models are better).
Also, the forecasts from the last model look the most reasonable.
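A holdout comparison can be sketched as follows: fit each candidate using only the data before a holdout period, forecast that period, and compare RMSE on the held-out values. To keep the sketch self-contained, two simple rules (naive and seasonal naive) stand in for fitted ARIMA models and the series is made up; in practice each forecaster would refit an ARIMA model on `train` and forecast from it. All names here are hypothetical.

```python
import math


def holdout_rmse(series, forecaster, n_holdout):
    """Fit/forecast on the data before the holdout, then score the holdout."""
    train, test = series[:-n_holdout], series[-n_holdout:]
    preds = forecaster(train, n_holdout)
    return math.sqrt(sum((y - f) ** 2 for y, f in zip(test, preds)) / n_holdout)


def naive(train, h):
    # Stand-in "model": repeat the last observed value.
    return [train[-1]] * h


def seasonal_naive(train, h, s=12):
    # Stand-in "model": repeat the value from the same month one season earlier.
    return [train[len(train) - s + (i % s)] for i in range(h)]


# Hypothetical monthly series with a fixed seasonal pattern and no noise.
pattern = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
y = [10 + pattern[t % 12] for t in range(60)]

for name, f in [("naive", naive), ("seasonal naive", seasonal_naive)]:
    print(name, round(holdout_rmse(y, f, 12), 3))
```

The ranking comes only from forecast errors on data the models never saw, which is the comparison the slide calls for, rather than from in-sample RMSE.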

Interpreting Seasonal Models

What does an ARIMA(1,0,0)x(1,0,0)12 model mean in terms of the data $x_t$?
We use the backshift operator $Bx_t = x_{t-1}$, the identity operator $Ix_t = x_t$ and the difference operator $Dx_t = (I - B)x_t = x_t - x_{t-1}$ to understand this.
An AR(1) model is written as

$(I - a_1 B)x_t = e_t$
which becomes $x_t - a_1 x_{t-1} = e_t$
which becomes $x_t = a_1 x_{t-1} + e_t$

Note $B^2 x_t = B(Bx_t) = Bx_{t-1} = x_{t-2}$.

An AR(2) model is written as

$(I - a_1 B - a_2 B^2)x_t = e_t$

An MA(1) model is written as

$x_t = (I - b_1 B)e_t = e_t - b_1 e_{t-1}$
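The operator notation can be mimicked by storing a lag polynomial as a list of coefficients indexed by the power of B, so that $I - a_1 B$ is `[1, -a1]` and $B^2$ is `[0, 0, 1]`. This is a minimal sketch with a hypothetical helper name. The sample series below was constructed by hand from an AR(1) with $a_1 = 0.5$ and shocks 1.0, 0.0, 2.0, so applying $(I - a_1 B)$ recovers those shocks.

```python
def apply_lag_poly(coeffs, x):
    """Apply c(B) = sum_k coeffs[k] * B^k to the series x.

    Returns y_t = sum_k coeffs[k] * x_{t-k} for every t with a full history.
    """
    p = len(coeffs) - 1
    return [sum(c * x[t - k] for k, c in enumerate(coeffs)) for t in range(p, len(x))]


x = [1.0, 1.5, 0.75, 2.375]
print(apply_lag_poly([1.0, -0.5], x))  # (I - 0.5B) x_t = e_t: [1.0, 0.0, 2.0]
print(apply_lag_poly([1.0, -1.0], x))  # D = I - B: [0.5, -0.75, 1.625]
```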

What does an ARIMA(1,0,0)x(1,0,0)12 model mean in terms of the data $x_t$?

It is written as:
$(I - a_{12}B^{12})(I - a_1 B)x_t = e_t$
Note that the order of the factors on the left doesn't matter. The above can be rewritten as
$(I - a_1 B)(I - a_{12}B^{12})x_t = e_t$
or
$(I - a_1 B - a_{12}B^{12} + a_1 a_{12}B^{13})x_t = e_t$
or
$x_t - a_1 x_{t-1} - a_{12}x_{t-12} + a_1 a_{12}x_{t-13} = e_t$
and finally
$x_t = a_1 x_{t-1} + a_{12}x_{t-12} - a_1 a_{12}x_{t-13} + e_t$
This is analogous to regressing $x_t$ on $x_{t-1}$, $x_{t-12}$ and $x_{t-13}$, and forecasts will be based on past or predicted values of these quantities.
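The expansion of the seasonal AR model is polynomial multiplication in B. A short sketch (hypothetical helper names and hypothetical parameter values a1 = 0.5, a12 = 0.3) confirms that the product has nonzero terms only at lags 0, 1, 12 and 13, with the $a_1 a_{12}$ cross term at lag 13:

```python
def mul_lag_poly(p, q):
    """Multiply two lag polynomials given as coefficient lists (index = power of B)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def ar_factor(coef, lag):
    """Coefficient list for (I - coef * B^lag)."""
    poly = [0.0] * (lag + 1)
    poly[0], poly[lag] = 1.0, -coef
    return poly


a1, a12 = 0.5, 0.3  # hypothetical parameter values
prod = mul_lag_poly(ar_factor(a1, 1), ar_factor(a12, 12))
# Nonzero terms of I - a1 B - a12 B^12 + a1*a12 B^13
print({k: c for k, c in enumerate(prod) if c != 0})  # {0: 1.0, 1: -0.5, 12: -0.3, 13: 0.15}
```

The same multiplication applied to $(I - a_{12}B^{12})(I - B)(I - B^{12})$ shows which past values enter the forecasts from a model such as ARIMA(0,1,0)x(1,1,0)12.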

Concluding Comments

ARIMA models are not designed for series with multiplicative seasonality. In such cases:

Use log transforms (these turn multiplicative seasonality into additive seasonality).

De-seasonalize and apply ARIMA to the de-seasonalized data.

Series with persistent trends can be detrended and ARIMA applied to the detrended series.

Several automatic fitting programs do a good job of fitting ARIMA models (not NCSS).

Parsimony is desirable: use models with as few terms as possible.

The AIC and BIC criteria penalize the number of terms in the model.

Theoretical result: a stationary AR model can be written as an infinite-order MA model, and an invertible MA model as an infinite-order AR model; in practice a high-order MA model can often be closely approximated by a low-order AR model and vice versa, e.g. an MA(6) by an AR(1) or AR(2) model.

Key point: the above approach to model selection is based on in-sample fitting.

All models need to be compared on the basis of out-of-sample forecasts on holdout data.

Simpler ARIMA models seem to work better out of sample even though they may not give the best fit.
Recall from earlier slides that fitting is different from forecasting.

ARIMA model forecasts can be pooled with those from one or more other models.
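One common way to compute the information criteria for models fitted by least squares is AIC = n ln(SSE/n) + 2k and BIC = n ln(SSE/n) + k ln(n), where k is the number of estimated terms and smaller values are better. This is a sketch under that assumption: software often computes the criteria from the exact likelihood with different additive constants, so only differences between models fitted to the same differenced series are meaningful. The function name and the residuals are hypothetical.

```python
import math


def aic_bic(residuals, n_params):
    """Least-squares forms of AIC and BIC (additive constants dropped)."""
    n = len(residuals)
    sse = sum(e * e for e in residuals)
    base = n * math.log(sse / n)  # goodness-of-fit part, shared by both criteria
    return base + 2 * n_params, base + n_params * math.log(n)


res = [1.0, -1.0] * 50  # hypothetical residuals: n = 100, SSE = 100
print(aic_bic(res, 1), aic_bic(res, 3))
```

Since ln(n) exceeds 2 once n ≥ 8, BIC charges more per term than AIC and tends to pick the more parsimonious model.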