
Introduction to Time Series

Arnaud Chevalier, University College Dublin, January 2004


Adapted by Mark Franklin, European University Institute, May 2007
Time series analysis is the analysis of a series of data points over time.
Time series analysis allows one to answer questions such as: what is the causal effect on a
variable Y of a change in variable X over time? An important difference between time series and
cross-sectional data is that the ordering of cases matters in time series.
Definition: A sequence of random variables indexed by time is called a stochastic process
(stochastic means random) or time series for mere mortals. A data set is one possible outcome
(realisation) of the stochastic process. If history had been different, we would observe a different
outcome, thus we can think of time series as the outcome of a random variable. For example,
[Figure: INFLATION RATE, quarterly series plotted from 1959:02 to 1996:04]

Rather than dealing with individuals as units, the unit of interest is time: the value of Y at time
t is Yt. The unit of time can be anything from days to election years. The value of Y in the previous
period is called the first lag value: Yt-1. The jth lag is denoted: Yt-j. Similarly Yt+1 is the value of Y
in the next period. So a simple bivariate regression equation for time series data looks like:
Yt = β0 + β1Xt + ut

Notation can vary a lot from one researcher to the next. Some will refer to 'a' or 'α' rather than
to 'β0'. Some will put the 'a' or 'β0' at the end of the equation instead of at the start. Some will refer
to 'e' rather than to 'u'. Some will refer to 'b' rather than to 'β'. Some will use a whole series of
different Greek letters for different variables. You need to get used to such variations in notation.
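
To make the lag notation concrete, here is a minimal sketch in Python using the pandas library (Python and pandas are our choice of illustration here, not something the STATA-oriented reader is required to use; the series values are hypothetical):

```python
import pandas as pd

# A toy quarterly series with hypothetical values, for illustration only
y = pd.Series([2.1, 2.4, 3.0, 2.8, 3.5],
              index=pd.period_range("1959Q2", periods=5, freq="Q"))

y_lag1 = y.shift(1)    # first lag:  Y(t-1)
y_lag2 = y.shift(2)    # second lag: Y(t-2)
y_lead1 = y.shift(-1)  # lead:       Y(t+1)
```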
Changes in the value of Y (delta-Y, denoted ΔY) rather than just plain Y are often used
because series tend to have a trend, for example increasing over time. If two series are both trending,
we could falsely conclude that one is causing the other (a spurious relationship). The change in the
value of Y between period t-1 and t is called the first difference: ΔYt = Yt − Yt-1.
To deal with trended series, it is possible to include a trend variable (a simple counter that goes
up by a set amount with each time point) as an independent variable in the regression, but you need
a theoretical basis to justify its inclusion: Yt = β0 + β1Xt + β2t + ut.
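
As a minimal sketch of both devices (first differencing and a trend counter), assuming Python with pandas, NumPy, and statsmodels, and entirely hypothetical data:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical trending data, for illustration only
rng = np.random.default_rng(0)
t = np.arange(100.0)                           # trend counter: 0, 1, 2, ...
x = 0.5 * t + rng.normal(size=100)             # trending regressor
y = 0.3 * t + 0.2 * x + rng.normal(size=100)   # trending dependent variable

dy = pd.Series(y).diff()                       # first difference: Y(t) - Y(t-1)

# Alternatively, keep the levels and add the trend counter as a regressor:
X = sm.add_constant(np.column_stack([x, t]))   # columns: const, X, t
res = sm.OLS(y, X).fit()                       # Yt = b0 + b1*Xt + b2*t + ut
print(res.params)
```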
Seasonality introduces patterns in the data: for example, if new car models are introduced in the
same month every year, then we observe higher sales in that month every year. Also, in the retail
sector, sales are expected to be higher in the run-up to Christmas. Including a set of dummy
variables for quarters (or months) will account for the seasonality of the dependent or independent
variables (remember to leave out one quarter/month, to avoid perfect multicollinearity).
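
A minimal sketch of the dummy-variable approach, assuming quarterly data in a pandas DataFrame (the data here are hypothetical):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical quarterly sales series, for illustration only
idx = pd.period_range("2000Q1", periods=40, freq="Q")
rng = np.random.default_rng(7)
df = pd.DataFrame({"sales": rng.normal(100, 10, size=40)}, index=idx)

# Dummies for quarters 2-4; quarter 1 is the omitted reference
# category, which avoids perfect multicollinearity
dummies = pd.get_dummies(idx.quarter, prefix="q",
                         drop_first=True, dtype=float)
dummies.index = df.index
X = sm.add_constant(dummies)
res = sm.OLS(df["sales"], X).fit()
```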
In time series, the value of Y in one period is typically correlated with its value in the next
period; this is called serial correlation or autocorrelation.
The 1st autocorrelation is the correlation between Yt and Yt-1.
The jth autocorrelation is the correlation between Yt and Yt-j.
For the inflation series shown earlier, the autocorrelation of inflation at time t with inflation at
various times in the past is:
Lag    Inflation rate    Change of inflation rate
1          0.85                 -0.24
2          0.77                  0.27
3          0.77                  0.32
4          0.68                 -0.06
Inflation is strongly positively autocorrelated, and the autocorrelation decreases as the lag
increases. This reflects the long-term trend of inflation. However, changes in inflation are correlated
at a much lower level, and the first lag is negatively correlated (the fourth lag for the change in
inflation is not significantly different from zero in the above table).
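
These autocorrelations are easy to compute yourself. A minimal sketch in Python with pandas, using simulated data as a stand-in for the real inflation series:

```python
import numpy as np
import pandas as pd

# Simulated, persistent stand-in for the inflation series (hypothetical)
rng = np.random.default_rng(1)
inflation = pd.Series(4 + 0.1 * np.cumsum(rng.normal(size=200)))
d_inflation = inflation.diff()        # change in inflation

for j in range(1, 5):                 # jth autocorrelation: corr(Yt, Yt-j)
    print(f"lag {j}: level {inflation.autocorr(lag=j):+.2f}, "
          f"change {d_inflation.autocorr(lag=j):+.2f}")
```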

How do you know when you have autocorrelated data?


The standard test is the Durbin-Watson statistic (DW), a rather awkward statistic that should
take on values close to 2.0 if autocorrelation is not to be considered problematic. Exactly how close
depends on the number of cases in the analysis, but anything outside the range 1.5 to 2.5 is almost
certainly problematic. This statistic, though standard, is falling into disuse because it only tests for
autocorrelation between time t and time t-1, and this can yield a 'false negative' if there is
autocorrelation at larger lags. These days a test known as the Breusch-Godfrey test is preferred. In
the best journals your model needs to provide a clean bill of health using this test, though you
might get away with a DW test in lesser journals.
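
A minimal sketch of both tests, assuming Python with statsmodels (the function names below are real statsmodels calls; the data are hypothetical):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

# Hypothetical data; substitute your own y and X
rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = 1.0 + 0.5 * x + rng.normal(size=200)

res = sm.OLS(y, sm.add_constant(x)).fit()

print("DW:", durbin_watson(res.resid))   # values near 2.0 suggest no AR(1)
lm, lm_pval, f, f_pval = acorr_breusch_godfrey(res, nlags=4)
print("Breusch-Godfrey p-value (4 lags):", lm_pval)  # small p => autocorrelation
```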
1st order autoregression AR(1)
This is a model that relates a variable to its own value in the immediate past.
An AR(1) process takes the following form:

Yt = β0 + β1Yt-1 + ut
One way to deal with autocorrelation is to find the value of β1 and use it to 'correct' the series so as
to remove the autocorrelation. The estimation can then be conducted on the 'AR1 corrected' data.
This is done in a single step using suitable software and is appropriate if the autocorrelation is
viewed as error to be purged, but this is a rather flat-footed approach. More appropriate is often to
think about WHY the dependent variable is affected by past values of itself, and develop a
theoretical rationale for including the lagged dependent variable in the equation you are estimating.
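
A minimal sketch of estimating an AR(1) model, assuming Python with statsmodels and simulated (hypothetical) data:

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

# Simulate an AR(1) process: Yt = 1.0 + 0.7*Y(t-1) + ut  (hypothetical)
rng = np.random.default_rng(3)
y = np.zeros(300)
for t in range(1, 300):
    y[t] = 1.0 + 0.7 * y[t - 1] + rng.normal()

res = AutoReg(y, lags=1).fit()   # estimates beta0 and beta1
print(res.params)                # should be roughly [1.0, 0.7]
```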
pth order autoregression AR(p)
The more distant past may independently affect the current value of a variable. One way to
incorporate this information is to include additional lags in the AR(1) model.
An AR(p) model represents Yt as a linear function of its p lagged values.
Yt = β0 + β1Yt-1 + β2Yt-2 + ... + βpYt-p + ut
There is no convenient software for 'correcting' the data for lags beyond the first (but look up the
Generalized Method of Moments in STATA for inconvenient software), so lags 2 to p that show
significant autocorrelation may have to be included as independent variables in any model.
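
A minimal sketch of fitting an AR(p) model and letting an information criterion choose p, again in Python with statsmodels on simulated (hypothetical) data:

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg, ar_select_order

# Simulate an AR(2) process: Yt = 0.5*Y(t-1) + 0.3*Y(t-2) + ut  (hypothetical)
rng = np.random.default_rng(4)
y = np.zeros(500)
for t in range(2, 500):
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + rng.normal()

sel = ar_select_order(y, maxlag=8)        # choose the lags by BIC
res = AutoReg(y, lags=sel.ar_lags).fit()  # fit AR(p) with the selected lags
print(sel.ar_lags, res.params)
```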

Additional independent variables


The whole purpose of time-series analysis is to develop and test explanations for why the
dependent variable varies over time in the way it does. This generally involves using multiple
independent variables in addition to the variables required to deal with autocorrelation. Sometimes
these variables are also lagged to various degrees. It is possible that additional independent
variables will eliminate the effects of lagged variables, if independent variables themselves have
causes that are spread over time. It is obviously preferable to deal with autocorrelation problems
substantively in this way rather than by incorporating lagged variables that have no substantive
meaning (but which might compete for explained variance with substantively meaningful
variables).
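
A minimal sketch of such a model, combining a substantive regressor, its lag, and a lagged dependent variable (Python with pandas and statsmodels; the data and variable names are hypothetical):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical data; in practice substitute your own series
rng = np.random.default_rng(5)
df = pd.DataFrame({"x": rng.normal(size=200)})
df["y"] = (0.5 * df["x"] + 0.3 * df["x"].shift(1).fillna(0)
           + rng.normal(size=200))

df["y_lag1"] = df["y"].shift(1)   # lagged dependent variable
df["x_lag1"] = df["x"].shift(1)   # lagged independent variable

X = sm.add_constant(df[["y_lag1", "x", "x_lag1"]])
res = sm.OLS(df["y"], X, missing="drop").fit()
print(res.params)
```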

Heteroskedasticity
This horrid word relates to the distribution of your variables (and especially your dependent
variable) over time. Very frequently there is more variance in a variable in some parts of the time
series than in other parts. The graph of the inflation rate shown earlier displays severe
heteroskedasticity in the period between the mid-1970s and the early 1980s. Heteroskedasticity
does not bias the coefficient estimates produced by OLS and most other forms of regression, but it
does bias their standard errors and so invalidates inference. A good model needs to find independent
variables that explain differences in the variance of the dependent variable over time, or else the
dependent variable needs to be transformed in some way so as to get rid of the heteroskedasticity.
This can be problematic, but usually is not. In particular, the lagged version of the dependent
variable, so often needed to deal with autocorrelation, may as a by-product also deal with
heteroskedasticity (if the dependent variable varies wildly, so may the immediate past value of that
variable, which then helps to predict the large variations). However that may be, a good model that
predicts well the variations in the dependent variable will also predict the heteroskedastic
component in this variance. Returning to the earlier graph, a good model would predict the spikes
in inflation in the 1970s and 1980s, thus dealing with the heteroskedasticity in the data.
Heteroskedasticity is best identified by actually looking at your variables plotted over time,
as was done with inflation above. The plots generally tell you if heteroskedasticity might be a
problem. There are several formal tests for this condition, one of which is known as the ARCH test,
which is performed in a similar way to the Breusch-Godfrey test for autocorrelation. Good journals
will require time series models to demonstrate that there is no unexplained heteroskedasticity in the
data by reporting the ARCH test results, though in minor journals you may get away without
mentioning this condition.
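
A minimal sketch of the ARCH test, assuming a recent version of statsmodels (the residuals here are simulated and hypothetical; in practice pass the residuals of your fitted model):

```python
import numpy as np
from statsmodels.stats.diagnostic import het_arch

# Hypothetical residuals whose variance grows over time
rng = np.random.default_rng(6)
resid = rng.normal(size=200) * np.linspace(0.5, 3.0, 200)

lm, lm_pval, f, f_pval = het_arch(resid, nlags=4)
print("ARCH LM p-value:", lm_pval)  # small p => unexplained heteroskedasticity
```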

Further reading
If you are serious about using time series analysis in your work and want to be published in good
journals (and have your books taken seriously) then you need to acquire and read Peter Kennedy's
Guide to Econometrics (now in its 5th edition). It is superbly well-written for those without a
mathematical background, and is even amusing in parts. You should make it your business to work
your way through this book, getting whatever help you need to understand it (at least at the
superficial level of the large-text portions). Technicalities dealt with in small text can be omitted
unless you are actually trying to employ the technique concerned. The large text portions of the
first five chapters (only about 20 pages in all) are available in pdf format from Dr. Franklin.
Sage has published a series of something like 200 little green books (each about 50 pages)
on specific social science methods. The books are generally written at a very introductory level
with political scientists and sociologists in mind. The back cover of each volume contains a list of
all the volumes (which are in no particular order). Find the series in the library and pull out the
highest-numbered volume you see there (the volume numbers are on the spine of each book). Then
look for relevant time series volumes, of which there are quite a number. Start with the first and
work your way through.
The STATA Manual contains excellent and generally accessible introductions to the various
techniques available in STATA. These are also accessible from the HELP facility.
A wonderful journal that you should make it your business to follow is Political Analysis
(free with membership of the Political Methodology Section of the APSA). The math is sometimes
opaque, but practice (and frequent reference to Kennedy) can help!
