
Bootstrapping EXPOS methods: a case study

Clara Cordeiro
DM/FCT, University of Algarve, Portugal
based on joint work with Professor M. Manuela Neves (supervisor)
DM/ISA, Technical University of Lisbon, Portugal

June 23, 2009

ISF 2009, Hong Kong 21-24 June

Clara Cordeiro

Bootstrapping EXPOS methods

1 / 28

Outline
1. Introduction
2. Overview of the EXPOS methods
3. Overview of the Bootstrap
4. Procedure
5. Case study
6. Closing comments
7. References


Introduction

Motivation
Boot motivation:
It is a resampling scheme with wide application in many fields;
It is a very popular methodology because of its simplicity and good
properties;
There has been great development in the area of dependent data.

EXPOS motivation:
It refers to a set of methods that can be used to model data and to
obtain forecasts;
It is a versatile approach that continually updates a forecast,
emphasizing the most recent experience;
They are the most widely used forecasting methods.

Introduction

Idea
The idea is to join these two approaches and to construct a
computational algorithm to obtain:
forecasts;
forecast intervals (work in progress);
and, in the long term, a tool for missing data imputation (work in progress).
What to use?
Use EXPOS methods to select the model that best fits a data set;
Use Boot to resample and reconstruct a replica of the original
data set;
Use R software and packages to build the Boot.EXPOS procedure.

Introduction

Work chronology

Past studies:
forecasting using the Holt-Winters method and some dependent bootstrap
approaches; the sieve bootstrap proved to be a good option;
model selection extended to include SES, Holt's linear and HW
additive and multiplicative methods;
first sketch of a computational algorithm where some considerations
are made: stationarity, Box-Cox transformations, differencing, ...;
applied to the M3 competition; good behavior among six well-known
methods but problems with small data sets;
...


Introduction

Work chronology

Recent studies:
EXPOS selection has been extended to a choice of thirty methods
(additive and multiplicative error terms);
a case study of 40 well-known time series is performed;
a Box-Cox transformation with 0 < λ < 1 is used;
the procedure is run on the M3 competition time series;
with this larger set of candidate models, better point forecasts were
obtained. Bootstrap intervals are narrower. BCa bootstrap intervals
can improve the results (work in progress);
...


Introduction

Objective of the project

In sum:
Consider EXPOS for modeling time series, instead of the traditional
ARIMA class.
Combine the use of EXPOS methods with the bootstrap
methodology: Boot.EXPOS.
Use the Boot.EXPOS procedure to obtain forecasts.
Test it on some well-known data sets and then use it on large data
sets.
Evaluate the forecast performance using some accuracy measures
and compare the results with other EXPOS forecasting methods.


Follow up: EXPOS methods


Overview of the EXPOS methods

Some notes...

A time series is a combination of a pattern and some random error.
According to the characteristics that a time series reveals, one of
the EXPOS methods is chosen by the AIC criterion.
The exponential smoothing parameters (α, β, γ) are estimated by
minimizing the MSE.
The goal is to find the best EXPOS model and to separate the
pattern (trend and/or seasonality) from the error term.
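
As an illustration of estimating a smoothing parameter by minimizing the MSE, the sketch below fits only the simplest EXPOS method (simple exponential smoothing) with a grid search over its smoothing parameter. The function names, the grid and the toy series are assumptions made for this example; model selection by AIC across methods is not shown.

```python
def ses_one_step_errors(y, alpha):
    """One-step-ahead forecast errors of simple exponential smoothing."""
    level = y[0]                    # initialise the level at the first observation
    errors = []
    for obs in y[1:]:
        errors.append(obs - level)  # the forecast for this period is the current level
        level = level + alpha * (obs - level)  # move the level towards the new observation
    return errors


def mse(errors):
    """Mean squared error of a list of forecast errors."""
    return sum(e * e for e in errors) / len(errors)


def fit_alpha(y, grid_size=99):
    """Pick the alpha in (0, 1) that minimizes the one-step MSE on a regular grid."""
    candidates = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    return min(candidates, key=lambda a: mse(ses_one_step_errors(y, a)))


# Toy data, invented for the example.
series = [12.0, 13.5, 12.8, 14.1, 15.0, 14.6, 15.9, 16.4]
alpha_hat = fit_alpha(series)
```

A numerical optimizer would normally replace the grid search; the grid keeps the sketch dependency-free.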


Overview of the EXPOS methods

The exponential smoothing classification


Historical evolution: Pegels (1969), extended by Gardner (1985),
Hyndman et al. (2002) and Taylor (2003).

Trend component              Seasonal component
                             N (None)    A (Additive)    M (Multiplicative)
N  (None)                    N,N         N,A             N,M
A  (Additive)                A,N         A,A             A,M
Ad (Additive damped)         Ad,N        Ad,A            Ad,M
M  (Multiplicative)          M,N         M,A             M,M
Md (Multiplicative damped)   Md,N        Md,A            Md,M

For each method in the framework, additive error and multiplicative error
versions are considered.
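
The count of thirty methods follows from the classification: five trend types, three seasonal types and two error types. A small sketch enumerating them, using the ETS(error,trend,seasonal) labelling that appears later in the slides (the variable names are assumptions of this example):

```python
from itertools import product

ERRORS = ["A", "M"]                   # additive or multiplicative error
TRENDS = ["N", "A", "Ad", "M", "Md"]  # rows of the classification table
SEASONS = ["N", "A", "M"]             # columns of the classification table

# One label per combination of error, trend and seasonal component.
models = [f"ETS({e},{t},{s})" for e, t, s in product(ERRORS, TRENDS, SEASONS)]
```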


Follow up: Bootstrap methodology


Overview of the Bootstrap

What is Bootstrap?

The bootstrap methodology (Efron 1979) (IID bootstrap) is a
versatile approach, so it can be applied to many models.
Advantage: relies on few assumptions and is easy to implement;
Disadvantage: is time consuming.

Data structure: i.i.d. or dependent.

It is generally not hard to do:
Draw random samples with replacement;
Extract results, such as predictions;
Iterate the calculation;
Accumulate the results.
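
The four steps above can be sketched for i.i.d. data as follows. This is a minimal illustration, not the code used in the study; the function name, the toy sample and the choice of the mean as the statistic are assumptions of the example.

```python
import random
import statistics


def iid_bootstrap(data, statistic, n_boot=1000, seed=0):
    """Return the statistic computed on n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    results = []
    for _ in range(n_boot):                                    # iterate the calculation
        resample = [data[rng.randrange(n)] for _ in range(n)]  # draw with replacement
        results.append(statistic(resample))                    # extract and accumulate
    return results


# Toy sample, invented for the example.
sample = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.9, 4.4]
boot_means = iid_bootstrap(sample, statistics.mean)
boot_se = statistics.stdev(boot_means)  # bootstrap estimate of the standard error of the mean
```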


Overview of the Bootstrap

Questioning Boot
Does it really work?
Yes!
The key to success is to make sure that the bootstrap resampling
correctly mimics the original sampling
The key assumption is independence

Does the bootstrap always work?
No!
It just works much more often than any of the common alternatives

Cases when it fails
Resampling done incorrectly, failing to preserve the original sampling
structure
Data are dependent, but resampling is done as though they were
independent
Some really weird statistics, like the maximum, that depend on very
small features of the data


Overview of the Bootstrap

Bootstrap and Dependent Data

The majority of bootstrap methods for dependent data suggest the
use of blocks, in order to keep the dependence structure.
The blocks are resampled as in the independent case and within the
blocks the dependence structure is kept.
However, if the time series process is driven by iid innovations,
another way of resampling can be used: the IID bootstrap can
easily be extended to the dependent case.
The autoregressive AR(p) process is a common example of such a process.
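
For comparison with the model-based route, a block scheme can be sketched as below. The moving block variant is chosen here only as one common example of a block bootstrap; the slides do not say which block method was used in the earlier studies, and the function name and block length are assumptions.

```python
import random


def moving_block_bootstrap(series, block_length, seed=0):
    """Build a replica by concatenating overlapping blocks drawn with replacement."""
    rng = random.Random(seed)
    n = len(series)
    n_starts = n - block_length + 1  # number of admissible block start positions
    replica = []
    while len(replica) < n:
        start = rng.randrange(n_starts)
        replica.extend(series[start:start + block_length])  # order inside the block is kept
    return replica[:n]               # trim to the original length


toy_series = list(range(20))
toy_replica = moving_block_bootstrap(toy_series, block_length=4)
```

Inside each block the original ordering survives, which is how the dependence structure is kept; the dependence is broken only at the joins between blocks.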


Connecting Boot & EXPOS

Any good model should yield residuals that do not show significant
patterns.
Most exponential smoothing models do not yield white noise
residuals.
In fact, some pattern is commonly left in the residuals.
In order to model such left-over patterns, an autoregressive process is
used to filter the EXPOS residual series.
Because of the iid nature of the AR residuals, the IID bootstrap can
easily be extended to the dependent case.
This model-based resampling for time series is based on resampling
from the AR residuals.


Example
[Figure: Unemployment benefits in Australia (time series dole); monthly total of people by year.]
[Figure: Decomposition by ETS(A,Ad,A) method; panels: observed, level, slope, season, plotted against time.]
[Figure: dole: EXPOS resid and dole: AR resid, each with its ACF and PACF plotted against lag.]
Flow: time series dole; best EXPOS selected by AIC; EXPOS residuals; AR residuals.


Inside the procedure

Simulation examples
[Figure: Simulation 1; panels: Resampling (#1), AR reconstruction (#1), AR reconstruction (#1)+components, Forecasts from ETS(A,Ad,A).]
[Figure: Simulation 2; panels: Resampling (#2), AR reconstruction (#2), AR reconstruction (#2)+components, Forecasts from ETS(A,Ad,A).]

Inside the procedure

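In each of the b = 1, ..., B simulations, h forecasts are obtained and
recorded in a matrix

\[
\begin{pmatrix}
f_{11} & f_{12} & \cdots & f_{1h} \\
f_{21} & f_{22} & \cdots & f_{2h} \\
\vdots & \vdots & \ddots & \vdots \\
f_{B1} & f_{B2} & \cdots & f_{Bh}
\end{pmatrix}
\]

At the end of the B replications, the forecasts obtained through the
Boot.EXPOS procedure are the means over each column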
For example, for the dole time series:
[Figure: Forecast comparison, series dole; real values, EXPOS and Boot.EXPOS; monthly total against forecast horizon.]


Procedure

Boot.EXPOS

Complete description
Remark: previous EXPOS fit required
1. Fit an AR model to the residuals, with increasing order p selected
   by the AIC criterion;
2. Obtain the AR residuals and center them;
3. Draw a random sample from the centered residuals;
4. Use the AR model recursively to obtain a bootstrap series of the
   residuals;
5. Construct a time series replica using the EXPOS components and the
   previous bootstrap series;
6. Obtain h step-ahead forecasts from the new time series using the
   EXPOS fit;
7. Repeat Step 3 to Step 6, B times;
8. For each h, obtain the mean of the B forecasts.
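
The steps above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' implementation: simple exponential smoothing with a fixed, hypothetical alpha stands in for the AIC-selected EXPOS fit, the AR model is estimated by Yule-Walker (Levinson-Durbin recursion), and all function names, defaults and the toy series are invented for the example.

```python
import math
import random


def ses_fit(y, alpha):
    """Stand-in EXPOS fit: one-step fitted values and the final level."""
    level, fitted = y[0], []
    for obs in y:
        fitted.append(level)
        level += alpha * (obs - level)
    return fitted, level


def fit_ar_by_aic(x, max_order):
    """Yule-Walker AR coefficients via Levinson-Durbin, order chosen by AIC."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    acov = [sum(z[t] * z[t - k] for t in range(k, n)) / n for k in range(max_order + 1)]
    phi, sigma2 = [], acov[0]
    best_aic, best_phi = n * math.log(sigma2), []
    for p in range(1, max_order + 1):
        k = (acov[p] - sum(phi[j] * acov[p - 1 - j] for j in range(p - 1))) / sigma2
        phi = [phi[j] - k * phi[p - 2 - j] for j in range(p - 1)] + [k]
        sigma2 *= 1 - k * k
        aic = n * math.log(sigma2) + 2 * p
        if aic < best_aic:
            best_aic, best_phi = aic, phi
    return best_phi, mean


def boot_expos(y, h, alpha=0.3, n_boot=200, max_order=5, seed=0):
    rng = random.Random(seed)
    fitted, _ = ses_fit(y, alpha)                  # previous EXPOS fit (stand-in)
    resid = [obs - f for obs, f in zip(y, fitted)]
    phi, mean = fit_ar_by_aic(resid, max_order)    # step 1
    p = len(phi)
    z = [r - mean for r in resid]
    ar_res = [z[t] - sum(phi[j] * z[t - 1 - j] for j in range(p))
              for t in range(p, len(z))]
    centre = sum(ar_res) / len(ar_res)
    ar_res = [e - centre for e in ar_res]          # step 2
    forecasts = []
    for _ in range(n_boot):                        # step 7
        z_star = list(z[:p])                       # start-up values for the recursion
        for t in range(p, len(z)):
            e_star = rng.choice(ar_res)            # step 3
            z_star.append(sum(phi[j] * z_star[t - 1 - j] for j in range(p)) + e_star)  # step 4
        replica = [f + zs + mean for f, zs in zip(fitted, z_star)]  # step 5
        _, level = ses_fit(replica, alpha)
        forecasts.append([level] * h)              # step 6: SES forecasts are flat in h
    return [sum(row[j] for row in forecasts) / n_boot for j in range(h)]  # step 8


# Toy trending series, invented for the example.
toy_rng = random.Random(1)
series = [100 + 0.5 * t + toy_rng.gauss(0, 2) for t in range(60)]
point_forecasts = boot_expos(series, h=12)
```

Because the stand-in method has no trend or seasonal component, its forecasts are identical at every horizon; with a richer EXPOS model the columns of the forecast matrix would differ.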


Follow up: Case study


Case study

The selection

Six time series were chosen:
the first series, nav, refers to the number of airplanes flying in the
Flight Information Region (FIR) of Lisbon;
pigs, ukdeaths and writing are three series that can be found in the R
package fma;
UKDriverDeaths is in the R package datasets;
gas is in the R package forecast.


Case study

The data
[Figure: time plots of the six series, each against year. Panel titles: Number of airplanes in the FIR Lisbon; Number of pigs slaughtered in Victoria, Australia; Deaths and serious injuries on UK roads; Sales of printing and writing paper; Road casualties in Great Britain; Australian gas production. Vertical axis labels include: monthly total; thousands of French francs; monthly totals of car drivers.]


Case study

Computing
for each time series, the best EXPOS method is selected using AIC
the EXPOS residuals are extracted and tested against the white noise hypothesis
Boot.EXPOS is applied
forecasts are obtained for the h periods ahead
the performance of the Boot.EXPOS procedure is evaluated using some
accuracy measures:

Acronym   Definition                        Formula
RMSE      Root Mean Squared Error           $\sqrt{\mathrm{mean}(E_t^2)}$
MAE       Mean Absolute Error               $\mathrm{mean}(|E_t|)$
MAPE      Mean Absolute Percentage Error    $\mathrm{mean}(|100\,E_t / Y_t|)$
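
The three accuracy measures can be computed as in the sketch below, taking the error as observed value minus forecast. The function name and the two-point toy data are assumptions made for illustration; they are not values from the study.

```python
import math


def accuracy(actual, forecast):
    """Return RMSE, MAE and MAPE for paired actual and forecast values."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mape = sum(abs(100 * e / a) for e, a in zip(errors, actual)) / n  # needs non-zero actuals
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape}


# Toy values, invented for the example.
scores = accuracy([100.0, 200.0], [110.0, 180.0])
```

RMSE and MAE are on the scale of the data, while MAPE is a percentage, which is why only MAPE is comparable across series of different magnitudes.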

Case study

Forecast results
[Figure: Forecast comparison for the series nav, pigs, ukdeaths, writing, UKDriverDeaths and gas; each panel shows real values, EXPOS and Boot.EXPOS; monthly total against forecast horizon (1 to 12).]


Case study

Accuracy results

                                                           Accuracy measures
Series           n     s    h    EXPOS fit   method        RMSE       MAE        MAPE
nav              279   12   12   (M,A,M)     EXPOS         3661.23    3369.51    10.15
                                             Boot.EXPOS    3456.60    3128.17     9.44
pigs             188   12   12   (A,N,A)     EXPOS         9377.28    7554.21     7.48
                                             Boot.EXPOS    8653.24    6475.87     6.49
ukdeaths         120   12   12   (M,N,M)     EXPOS          156.84     143.16    10.13
                                             Boot.EXPOS      89.84      71.28     4.89
writing          120   12   12   (A,A,A)     EXPOS           58.61      44.96     5.97
                                             Boot.EXPOS      57.21      43.95     5.92
UKDriverDeaths   192   12   12   (M,N,A)     EXPOS          205.63     198.49    14.68
                                             Boot.EXPOS      87.78      70.60     5.09
gas              476   12   12   (M,Md,M)    EXPOS         2773.72    2097.73     4.22
                                             Boot.EXPOS    2348.16    1908.15     3.84


Closing comments

About this study:
For these time series, Boot.EXPOS performs well in
obtaining forecasts.
The optimal combination of EXPOS methods and bootstrap
resampling could provide more accurate forecasts.
About other studies:
On other data sets that we have analyzed, Boot.EXPOS
also performs well.
If a transformation is applied to the data, more series have good
accuracy results.


Acknowledgements: To the IIF for the travel award that made it possible
to participate in the ISF 2009.

Thank you!
Complete references on the next slide.


References
Brockwell, P. and Davis, R., Introduction to Time Series and Forecasting, 2nd edition, Springer-Verlag New York, 2002.
Cordeiro, C. and Neves, M., Bootstrap and exponential smoothing working together in forecasting time series, Proceedings in Computational Statistics (COMPSTAT 2008), Paula Brito (editor), Physica-Verlag, 2008, pp. 891-899.
Cordeiro, C. and Neves, M., Forecasting time series with Boot.EXPOS procedure, to appear in REVSTAT Statistical Journal, June, 2009.
Davison, A.C. and Hinkley, D.V., Bootstrap Methods and their Application, Cambridge University Press, 1998.
Efron, B., Bootstrap Methods: Another Look at the Jackknife, The Annals of Statistics, 7 (1979), pp. 1-26.
Gardner Jr, E.S., Exponential Smoothing: The State of the Art-Part II, International Journal of Forecasting, 22 (2006), pp. 637-666.
Hyndman, R., forecast: Forecasting functions for time series; software available at http://www.robjhyndman.com/Rlibrary/forecast/.
Hyndman, R., expsmooth: Data sets for Forecasting with exponential smoothing; software available at http://www.robjhyndman.com/Rlibrary/forecast/.
Hyndman, R., fma: Data sets from Forecasting: methods and applications by Makridakis, Wheelwright & Hyndman (1998); software available at http://www.robjhyndman.com/Rlibrary/forecast/.
Hyndman, R., Koehler, A., Ord, J. and Snyder, R., Forecasting with Exponential Smoothing: The State Space Approach, Springer-Verlag Inc, 2008.
Lahiri, S.N., Resampling Methods for Dependent Data, Springer-Verlag Inc, 2003.
R Development Core Team, R: A Language and Environment for Statistical Computing; software available at http://www.R-project.org.
Trapletti, A. and Hornik, K., tseries: Time Series Analysis and Computational Finance; R package version 0.10-18, 2009.