
Modeling Volatility and Correlation

Price Patterns

Return Patterns

Price vs. Return Patterns


IBM Stock Price, 1959-1996

DM/USD 1959-1996

Return Patterns in the Dow Jones Industrial Average (1916-2003)

Range of daily return movement: Gaussian prediction vs. real movement
- More than 3.4%: predicted 58 days; observed 1,001 days
- More than 4.5%: predicted 6 days; observed 366 days
- More than 7.0%: predicted once in 300,000 years; observed 48 days

Daily S&P 500 Returns for January 1963 - September 2003

Volatility Clustering

Historical volatilities with various windows
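As a rough illustration of how such historical-volatility series are produced, the sketch below computes rolling-window standard deviations of daily log returns for several window lengths. The price series is simulated purely as a placeholder; in practice it would be replaced by actual daily prices.

```python
# Sketch: historical (rolling-window) volatility from daily log returns.
# The price series is simulated only so the example runs; real data would be used instead.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 2000))))  # placeholder prices
returns = np.log(prices).diff().dropna()

# Rolling standard deviations over different windows, annualised with sqrt(252)
for window in (22, 66, 252):          # roughly 1 month, 1 quarter, 1 year of trading days
    vol = returns.rolling(window).std() * np.sqrt(252)
    print(window, float(vol.iloc[-1]))
```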

An Excursion into Non-linearity Land

Motivation: the linear structural (and time series) models cannot explain a number of important features common to much financial data:
- leptokurtosis
- volatility clustering or volatility pooling
- leverage effects

Our traditional structural model could be something like:

yt = β1 + β2x2t + ... + βkxkt + ut,
or more compactly y = Xβ + u.

We also assumed that ut ~ N(0, σ²).

Types of non-linear models

The linear paradigm is a useful one. Many apparently nonlinear relationships can be made linear by a suitable
transformation. On the other hand, it is likely that many
relationships in finance are intrinsically non-linear.

There are many types of non-linear models, e.g.


- ARCH / GARCH
- switching models
- bilinear models

Heteroscedasticity Revisited

An example of a structural model is

yt = β1 + β2x2t + β3x3t + β4x4t + ut, with ut ~ N(0, σu²).

The assumption that the variance of the errors is constant is known as homoscedasticity, i.e. Var(ut) = σu².

What if the variance of the errors is not constant?

- this is known as heteroscedasticity
- it would imply that the standard error estimates could be wrong.

Is the variance of the errors likely to be constant over time?


Not for financial data.

Autoregressive Conditionally Heteroscedastic (ARCH) Models

So use a model which does not assume that the variance is constant.

Recall the definition of the conditional variance of ut:
σt² = Var(ut | ut-1, ut-2, ...) = E[(ut − E(ut))² | ut-1, ut-2, ...]
We usually assume that E(ut) = 0, so
σt² = Var(ut | ut-1, ut-2, ...) = E[ut² | ut-1, ut-2, ...].

What could the current value of the variance of the errors plausibly depend upon? Previous squared error terms.

This leads to the autoregressive conditionally heteroscedastic model for the variance of the errors:

σt² = α0 + α1ut-1²

This is known as an ARCH(1) model.
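To make the connection with volatility clustering concrete, here is a small simulation sketch of an ARCH(1) process; the parameter values are purely illustrative and are not taken from the slides.

```python
# Sketch: simulate an ARCH(1) process and show that squared errors are autocorrelated
# (volatility clustering) even though the errors themselves are not.
import numpy as np

rng = np.random.default_rng(42)
T = 1000
alpha0, alpha1 = 0.1, 0.6          # illustrative values; need alpha0 > 0, 0 <= alpha1 < 1

u = np.zeros(T)
sigma2 = np.zeros(T)
sigma2[0] = alpha0 / (1 - alpha1)  # start at the unconditional variance
for t in range(1, T):
    sigma2[t] = alpha0 + alpha1 * u[t - 1] ** 2
    u[t] = np.sqrt(sigma2[t]) * rng.standard_normal()

print(np.corrcoef(u[1:], u[:-1])[0, 1])            # close to zero
print(np.corrcoef(u[1:] ** 2, u[:-1] ** 2)[0, 1])  # clearly positive
```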

Autoregressive Conditionally Heteroscedastic (ARCH) Models (contd)

The full model would be

yt = β1 + β2x2t + ... + βkxkt + ut, ut ~ N(0, σt²)
where σt² = α0 + α1ut-1²

We can easily extend this to the general case where the error variance depends on q lags of squared errors:

σt² = α0 + α1ut-1² + α2ut-2² + ... + αqut-q²

This is an ARCH(q) model.

Instead of calling the variance σt², in the literature it is usually called ht, so the model is

yt = β1 + β2x2t + ... + βkxkt + ut, ut ~ N(0, ht)
where ht = α0 + α1ut-1² + α2ut-2² + ... + αqut-q²

Testing for ARCH Effects


1. First, run any postulated linear regression of the form given in the equation above, e.g.
   yt = β1 + β2x2t + ... + βkxkt + ut
   saving the residuals, ût.

2. Then square the residuals, and regress them on q own lags to test for ARCH of order q, i.e. run the regression
   ût² = γ0 + γ1ût-1² + γ2ût-2² + ... + γqût-q² + vt
   where vt is an iid error term.
   Obtain R² from this regression.

3. The test statistic is defined as TR² (the number of observations multiplied by the R² from this last regression), and is distributed as a χ²(q).

Testing for ARCH Effects (contd)


4. The null and alternative hypotheses are
   H0: γ1 = 0 and γ2 = 0 and γ3 = 0 and ... and γq = 0
   H1: γ1 ≠ 0 or γ2 ≠ 0 or γ3 ≠ 0 or ... or γq ≠ 0
   If the value of the test statistic is greater than the critical value from the χ²(q) distribution, then reject the null hypothesis.

Note that the ARCH test is also sometimes applied directly to returns instead of to the residuals from Stage 1 above.
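The test in Steps 1-4 is easy to code by hand. The sketch below is one possible implementation, assuming resid holds residuals from a previously estimated mean equation (statsmodels also ships a ready-made ARCH LM test that could be used as a cross-check).

```python
# Sketch: ARCH LM test - regress squared residuals on q own lags, compute TR^2,
# and compare with a chi-squared(q) distribution. Illustration only.
import numpy as np
from scipy import stats

def arch_lm_test(resid, q):
    """Return the TR^2 statistic and its chi-squared(q) p-value."""
    u2 = resid ** 2
    n = len(u2)
    T = n - q
    y = u2[q:]
    # Design matrix: constant plus q lags of the squared residuals
    X = np.column_stack([np.ones(T)] + [u2[q - i:n - i] for i in range(1, q + 1)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    r2 = 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
    stat = T * r2                               # TR^2 statistic
    pval = 1.0 - stats.chi2.cdf(stat, df=q)     # compare with chi-squared(q)
    return stat, pval

# Example with white-noise residuals (no ARCH): the p-value should be large
stat, pval = arch_lm_test(np.random.default_rng(0).standard_normal(500), q=5)
print(stat, pval)
```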

Problems with ARCH(q) Models

How do we decide on q?

The required value of q might be very large

Non-negativity constraints might be violated.

When we estimate an ARCH model, we require αi ≥ 0 for all i = 1, 2, ..., q (since a variance cannot be negative).

A natural extension of an ARCH(q) model which gets around


some of these problems is a GARCH model.

Generalised ARCH (GARCH) Models

Due to Bollerslev (1986). Allow the conditional variance to


be dependent upon previous own lags

The variance equation is now

σt² = α0 + α1ut-1² + βσt-1²    (1)

This is a GARCH(1,1) model, which is like an ARMA(1,1) model for the variance equation.

We could also write

σt-1² = α0 + α1ut-2² + βσt-2²
σt-2² = α0 + α1ut-3² + βσt-3²

Substituting into (1) for σt-1²:

σt² = α0 + α1ut-1² + β(α0 + α1ut-2² + βσt-2²)
    = α0 + α1ut-1² + α0β + α1βut-2² + β²σt-2²

Generalised ARCH (GARCH) Models (contd)

Now substituting in for σt-2²:

σt² = α0 + α1ut-1² + α0β + α1βut-2² + β²(α0 + α1ut-3² + βσt-3²)

σt² = α0 + α1ut-1² + α0β + α1βut-2² + α0β² + α1β²ut-3² + β³σt-3²

σt² = α0(1 + β + β²) + α1ut-1²(1 + βL + β²L²) + β³σt-3²

An infinite number of successive substitutions would yield

σt² = α0(1 + β + β² + ...) + α1ut-1²(1 + βL + β²L² + ...) + β^∞σ0²

So the GARCH(1,1) model can be written as an infinite order ARCH model.

We can again extend the GARCH(1,1) model to a GARCH(p,q):

σt² = α0 + α1ut-1² + α2ut-2² + ... + αqut-q² + β1σt-1² + β2σt-2² + ... + βpσt-p²

σt² = α0 + Σ(i=1 to q) αi ut-i² + Σ(j=1 to p) βj σt-j²

Generalised ARCH (GARCH) Models (contd)

But in general a GARCH(1,1) model will be sufficient to


capture the volatility clustering in the data.

Why is GARCH Better than ARCH?


- more parsimonious - avoids overfitting
- less likely to breach non-negativity constraints

The Unconditional Variance under the GARCH Specification

The unconditional variance of ut is given by

Var(ut) = α0 / (1 − (α1 + β))   when α1 + β < 1

α1 + β ≥ 1 is termed non-stationarity in variance

α1 + β = 1 is termed integrated GARCH

For non-stationarity in variance, the conditional


variance forecasts will not converge on their
unconditional value as the horizon increases.
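As a quick numerical illustration of the condition above, the sketch below uses made-up coefficients (not those of the examples that follow) to check stationarity in variance and evaluate the unconditional variance.

```python
# Illustrative only: hypothetical GARCH(1,1) coefficients
alpha0, alpha1, beta = 0.000005, 0.05, 0.90

persistence = alpha1 + beta
if persistence < 1:
    uncond_var = alpha0 / (1 - persistence)   # Var(u_t) = alpha0 / (1 - (alpha1 + beta))
    print(f"alpha1 + beta = {persistence:.3f}, unconditional variance = {uncond_var:.6f}")
else:
    print("alpha1 + beta >= 1: variance is non-stationary (IGARCH if exactly 1)")
```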

Examples of GARCH (1,1)

Daily data from 2 January 1996 to 6 October 2000 (Carol Alexander, 2001)

σt² = α0 + α1ut-1² + βσt-1²

Estimated coefficients (two series):
- α0: 0.0000098 and 0.000005
- α1: 0.153 and 0.047
- β: 0.714 and 0.949

Examples of GARCH (1,1)

Daily data from 2 January 1996 to 6 October 2000 (Carol Alexander, 2001)

σt² = α0 + α1ut-1² + βσt-1²

Estimated coefficients (two series):
- α0: 0.0000014 and 0.0000328
- α1: 0.052 and 0.157
- β: 0.941 and 0.74

Estimation of ARCH / GARCH Models

Since the model is no longer of the usual linear form, we


cannot use OLS.

We use another technique known as maximum likelihood.

The method works by finding the most likely values of the


parameters given the actual data.

More specifically, we form a log-likelihood function and


maximise it.

Maximum Likelihood (Example)

For wheel A, with p = 1/4, the probability of observing WIN, WIN, LOSS is (1/4)(1/4)(3/4) = 3/64 ≈ 0.047.

For wheel B, with p = 3/4, the probability of observing WIN, WIN, LOSS is (3/4)(3/4)(1/4) = 9/64 ≈ 0.141.

If we had to choose wheel A or wheel B based on the available data, we would choose wheel B because it has a higher probability of having produced the observed data.

Maximum Likelihood (Example)

p̂ = 3/4 is called the maximum likelihood estimate of p.

The maximum likelihood principle seeks the parameter values that maximise the probability, or likelihood, of observing the outcomes actually obtained.

Maximum Likelihood (Example)

The probability of observing WIN, WIN, LOSS is the likelihood L and is:

L(p) = p · p · (1 − p) = p² − p³

Then:

dL(p)/dp = 2p − 3p²

Setting this to zero and solving gives the maximum likelihood estimate:

2p − 3p² = 0  ⟹  p(2 − 3p) = 0  ⟹  p̂ = 2/3 (taking the non-zero root)

Maximum Likelihood

More generally, we have:

P(X = x) = f(x | p) = p^x (1 − p)^(1−x),  x = 0, 1

Assuming independence, we can form the joint probability function:

f(x1, ..., xN | p) = f(x1 | p) × ... × f(xN | p) = p^(Σxi) (1 − p)^(N − Σxi) = L(p | x1, ..., xN)

In the last equality we have indicated that the joint probability function is algebraically equivalent to the likelihood function L(p | x1, ..., xN).

Maximum Likelihood

The value of p that maximises L(p) = p²(1 − p) is the same value of p that maximises the log-likelihood function ln L(p) = 2 ln(p) + ln(1 − p), where ln is the natural logarithm.

For the general problem:

ln L(p) = Σ(i=1 to N) ln f(xi | p) = (Σxi) ln p + (N − Σxi) ln(1 − p)

Maximum Likelihood

The first derivative is:

d ln L(p)/dp = (Σxi)/p − (N − Σxi)/(1 − p)

Setting this to zero:

(Σxi)/p − (N − Σxi)/(1 − p) = 0

Solving, we get:

(1 − p)Σxi − p(N − Σxi) = 0  ⟹  Σxi − pN = 0

Finally, we have:

p̂ = (Σxi)/N = x̄
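As a quick sanity check of this result, a sketch with simulated 0/1 data: numerically maximising the Bernoulli log-likelihood recovers the sample mean.

```python
# Sketch: numerically maximise the Bernoulli log-likelihood and compare with x-bar.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.binomial(1, 0.3, size=500)          # simulated 0/1 outcomes

def neg_loglik(p):
    return -(x.sum() * np.log(p) + (len(x) - x.sum()) * np.log(1 - p))

res = minimize_scalar(neg_loglik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(res.x, x.mean())                      # the two should agree closely
```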

Estimation of ARCH / GARCH Models (contd)

The steps involved in actually estimating an ARCH or GARCH model are as follows:

1. Specify the appropriate equations for the mean and the variance - e.g. an AR(1)-GARCH(1,1) model:

   yt = μ + φyt-1 + ut,  ut ~ N(0, σt²)
   σt² = α0 + α1ut-1² + βσt-1²

2. Specify the log-likelihood function to maximise:

   L = −(T/2) log(2π) − (1/2) Σ(t=1 to T) log(σt²) − (1/2) Σ(t=1 to T) (yt − μ − φyt-1)²/σt²

3. The computer will maximise the function and give parameter values and their standard errors.

Parameter Estimation using Maximum Likelihood

Consider the bivariate regression case with homoscedastic errors for simplicity:

yt = β1 + β2xt + ut

Assuming that ut ~ N(0, σ²), then yt ~ N(β1 + β2xt, σ²), so that the probability density function for a normally distributed random variable with this mean and variance is given by

f(yt | β1 + β2xt, σ²) = (1/(σ√(2π))) exp( −(yt − β1 − β2xt)² / (2σ²) )    (1)

Successive values of yt would trace out the familiar bell-shaped curve.

Assuming that the ut are iid, then the yt will also be iid.

Parameter Estimation using Maximum Likelihood (contd)

Then the joint pdf for all the y's can be expressed as a product of the individual density functions:

f(y1, y2, ..., yT | β1 + β2Xt, σ²) = f(y1 | β1 + β2X1, σ²) · f(y2 | β1 + β2X2, σ²) · ... · f(yT | β1 + β2XT, σ²)
                                   = Π(t=1 to T) f(yt | β1 + β2Xt, σ²)    (2)

Substituting into equation (2) for every yt from equation (1),

f(y1, y2, ..., yT | β1 + β2xt, σ²) = 1/(σ^T (2π)^(T/2)) · exp( −Σ(t=1 to T) (yt − β1 − β2xt)² / (2σ²) )    (3)

Parameter Estimation using Maximum Likelihood (contd)

The typical situation we have is that the xt and yt are given and we want to estimate β1, β2, σ². If this is the case, then f(·) is known as the likelihood function, denoted LF(β1, β2, σ²), so we write

LF(β1, β2, σ²) = 1/(σ^T (2π)^(T/2)) · exp( −Σ(t=1 to T) (yt − β1 − β2xt)² / (2σ²) )    (4)

Maximum likelihood estimation involves choosing parameter values (β1, β2, σ²) that maximise this function.

We want to differentiate (4) w.r.t. β1, β2, σ², but (4) is a product containing T terms.

Parameter Estimation using Maximum Likelihood (contd)

Since the value of x that maximises f(x) is the same as the one that maximises log(f(x)), we can take logs of (4).

Then, using the various laws for transforming functions containing logarithms, we obtain the log-likelihood function, LLF:

LLF = −T log σ − (T/2) log(2π) − (1/2) Σ(t=1 to T) (yt − β1 − β2xt)²/σ²

which is equivalent to

LLF = −(T/2) log σ² − (T/2) log(2π) − (1/2) Σ(t=1 to T) (yt − β1 − β2xt)²/σ²    (5)

Differentiating (5) w.r.t. β1, β2, σ², we obtain

∂LLF/∂β1 = −(1/2) Σ(t) (yt − β1 − β2xt) · 2 · (−1)/σ²    (6)

Parameter Estimation using Maximum Likelihood (contd)
∂LLF/∂β2 = −(1/2) Σ(t) (yt − β1 − β2xt) · 2 · (−xt)/σ²    (7)

∂LLF/∂σ² = −T/(2σ²) + (1/2) Σ(t) (yt − β1 − β2xt)²/σ⁴    (8)

Setting (6)-(8) to zero to locate the maximum, and putting hats above the parameters to denote the maximum likelihood estimators:

From (6),

Σ(yt − β̂1 − β̂2xt) = 0
Σyt − Tβ̂1 − β̂2Σxt = 0
ȳ − β̂1 − β̂2x̄ = 0
β̂1 = ȳ − β̂2x̄    (9)

Parameter Estimation using Maximum Likelihood (contd)

From (7),

Σ(yt − β̂1 − β̂2xt)xt = 0
Σytxt − β̂1Σxt − β̂2Σxt² = 0
Σytxt − (ȳ − β̂2x̄)Tx̄ − β̂2Σxt² = 0
Σytxt − Tx̄ȳ + β̂2Tx̄² − β̂2Σxt² = 0
β̂2(Σxt² − Tx̄²) = Σytxt − Tx̄ȳ

β̂2 = (Σytxt − Tx̄ȳ) / (Σxt² − Tx̄²)    (10)

From (8),

T/(2σ̂²) = (1/(2σ̂⁴)) Σ(yt − β̂1 − β̂2xt)²

Parameter Estimation using Maximum Likelihood (contd)

Rearranging,

σ̂² = (1/T) Σ(yt − β̂1 − β̂2xt)² = (1/T) Σût²    (11)

How do these formulae compare with the OLS estimators?

(9) and (10) are identical to the OLS estimators. (11) is different: the OLS estimator was

s² = (1/(T − k)) Σût²

Therefore the ML estimator of the variance of the disturbances is biased, although it is consistent.
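A small sketch with simulated data makes this concrete: the ML and OLS slope estimates coincide, while the two variance estimators differ only by the factor (T − k)/T.

```python
# Sketch: compare the ML and OLS estimators of the disturbance variance
# in a simple regression, using simulated data (illustrative only).
import numpy as np

rng = np.random.default_rng(1)
T, k = 200, 2                                  # sample size, number of mean parameters
x = rng.normal(size=T)
y = 1.0 + 0.5 * x + rng.normal(scale=0.3, size=T)

X = np.column_stack([np.ones(T), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # same as the ML estimates (9)-(10)
u_hat = y - X @ beta_hat

sigma2_ml = (u_hat ** 2).sum() / T             # ML estimator (11): divides by T
sigma2_ols = (u_hat ** 2).sum() / (T - k)      # OLS estimator: divides by T - k
print(sigma2_ml, sigma2_ols, sigma2_ml / sigma2_ols)  # ratio equals (T - k) / T
```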
But how does this help us in estimating heteroscedastic models?

Estimation of GARCH Models Using Maximum Likelihood

Now we have yt = μ + φyt-1 + ut,  ut ~ N(0, σt²)

σt² = α0 + α1ut-1² + βσt-1²

L = −(T/2) log(2π) − (1/2) Σ(t=1 to T) log(σt²) − (1/2) Σ(t=1 to T) (yt − μ − φyt-1)²/σt²

Unfortunately, the LLF for a model with time-varying variances cannot be maximised analytically, except in the simplest of cases. So a numerical procedure is used to maximise the log-likelihood function. A potential problem: local optima or multimodalities in the likelihood surface.

The way we do the optimisation is as follows (a code sketch is given after the list):

1. Set up the LLF.
2. Use regression to get initial guesses for the mean parameters.
3. Choose some initial guesses for the conditional variance parameters.
4. Specify a convergence criterion - either on the change in the value of the LLF or on the change in the parameter values.
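The sketch below follows these four steps for an AR(1)-GARCH(1,1) model using scipy's numerical optimiser. It is a simplified illustration under the assumptions stated in the comments (e.g. the pre-sample variance is set to the sample variance of the residuals), not a substitute for a dedicated package such as arch.

```python
# Sketch: maximum likelihood estimation of an AR(1)-GARCH(1,1) model.
# Assumption: y is a 1-D NumPy array of returns; the pre-sample conditional
# variance is initialised at the sample variance of the residuals.
import numpy as np
from scipy.optimize import minimize

def neg_llf(params, y):
    mu, phi, alpha0, alpha1, beta = params
    u = y[1:] - mu - phi * y[:-1]                # residuals from the AR(1) mean equation
    T = len(u)
    sigma2 = np.empty(T)
    sigma2[0] = u.var()                          # initial condition (an assumption)
    for t in range(1, T):
        sigma2[t] = alpha0 + alpha1 * u[t - 1] ** 2 + beta * sigma2[t - 1]
    if np.any(sigma2 <= 0):
        return 1e10                              # penalise inadmissible variances
    return 0.5 * (T * np.log(2 * np.pi) + np.sum(np.log(sigma2)) + np.sum(u ** 2 / sigma2))

def fit_ar1_garch11(y):
    # Step 2: initial guesses for the mean parameters from a regression of y_t on y_{t-1}
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    (mu0, phi0), *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    # Step 3: rough starting values for the conditional variance parameters
    start = np.array([mu0, phi0, 0.01 * y.var(), 0.05, 0.90])
    bounds = [(None, None), (-0.999, 0.999), (1e-12, None), (0.0, 1.0), (0.0, 1.0)]
    # Step 4: the tolerance acts as the convergence criterion
    res = minimize(neg_llf, start, args=(y,), method="L-BFGS-B", bounds=bounds, tol=1e-8)
    return res.x, -res.fun                       # parameter estimates and maximised LLF

# Example with placeholder simulated data; real returns would go here
rng = np.random.default_rng(3)
y = rng.normal(scale=0.01, size=1500)
params, llf = fit_ar1_garch11(y)
print(params, llf)
```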

Extensions to the Basic GARCH Model

Since the GARCH model was developed, a huge number of


extensions and variants have been proposed. Three of the
most important examples are EGARCH, GJR, and GARCH-M
models.

Problems with GARCH(p,q) Models:


- Non-negativity constraints may still be violated
- GARCH models cannot account for leverage effects

Possible solutions: the exponential GARCH (EGARCH) model


or the GJR model, which are asymmetric GARCH models.

The EGARCH Model


Suggested by Nelson (1991). The variance equation is given by

log(σt²) = ω + β log(σt-1²) + γ · ut-1/√(σt-1²) + α [ |ut-1|/√(σt-1²) − √(2/π) ]

Advantages of the model
- Since we model log(σt²), then even if the parameters are negative, σt² will be positive.
- We can account for the leverage effect: if the relationship between volatility and returns is negative, γ will be negative.

The GJR Model

Due to Glosten, Jagannathan and Runkle (1993). The variance equation is

σt² = α0 + α1ut-1² + βσt-1² + γut-1²It-1

where It-1 = 1 if ut-1 < 0
           = 0 otherwise

For a leverage effect, we would see γ > 0.

We require α1 + γ ≥ 0 and α1 ≥ 0 for non-negativity.


News Impact Curves


The news impact curve plots the next period volatility (ht) that
would arise from various positive and negative values of ut-1,
given an estimated model.
News impact curves for S&P 500 returns using coefficients from GARCH and GJR model estimates: the value of the conditional variance is plotted against the value of the lagged shock ut-1.
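A news impact curve is straightforward to trace out from estimated coefficients. The sketch below does so for GARCH(1,1) and GJR models; the coefficient values are hypothetical placeholders, not the S&P 500 estimates behind the figure above.

```python
# Sketch: news impact curves for GARCH(1,1) and GJR models.
# Coefficients are hypothetical; sigma2_bar is the value at which the lagged
# conditional variance is held fixed (commonly a long-run variance level).
import numpy as np

alpha0, alpha1, beta, gamma = 0.0001, 0.05, 0.90, 0.10
sigma2_bar = alpha0 / (1 - alpha1 - beta - gamma / 2)   # GJR long-run variance (symmetric shocks)

u_lag = np.linspace(-1.0, 1.0, 201)                     # range of lagged shocks

nic_garch = alpha0 + alpha1 * u_lag**2 + beta * sigma2_bar
nic_gjr = alpha0 + alpha1 * u_lag**2 + beta * sigma2_bar + gamma * u_lag**2 * (u_lag < 0)

# The GJR curve is steeper for negative shocks, reflecting the leverage effect
print(nic_garch[0], nic_gjr[0])    # response to a lagged shock of -1
print(nic_garch[-1], nic_gjr[-1])  # response to a lagged shock of +1
```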

GARCH-in-Mean

We expect a risk to be compensated by a higher return. So why not let the return of a security be partly determined by its risk?

Engle, Lilien and Robins (1987) suggested the ARCH-M specification. A GARCH-M model would be

yt = μ + δσt-1 + ut,  ut ~ N(0, σt²)
σt² = α0 + α1ut-1² + βσt-1²

δ can be interpreted as a sort of risk premium.

It is possible to combine all or some of these models


together to get more complex hybrid models - e.g. an
ARMA-EGARCH(1,1)-M model.

What Use Are GARCH-type Models?

GARCH can model the volatility clustering effect since the


conditional variance is autoregressive. Such models can
be used to forecast volatility.

We could show that

Var(yt | yt-1, yt-2, ...) = Var(ut | ut-1, ut-2, ...)

So modelling σt² will give us models and forecasts for the variance of yt as well.

Variance forecasts are additive over time.
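To illustrate how GARCH(1,1) variance forecasts behave as the horizon grows, and how their additivity is used (e.g. for a multi-day variance), here is a small sketch with hypothetical coefficient values.

```python
# Sketch: s-step-ahead conditional variance forecasts from a GARCH(1,1) model.
# h is the one-step-ahead forecast; longer horizons mean-revert towards the
# unconditional variance, and a multi-day variance forecast is the sum over steps.
import numpy as np

alpha0, alpha1, beta = 0.000005, 0.05, 0.90     # hypothetical estimates
u_last2, sigma2_last = 0.0004, 0.0002           # last squared residual and conditional variance

persistence = alpha1 + beta
uncond_var = alpha0 / (1 - persistence)

h = alpha0 + alpha1 * u_last2 + beta * sigma2_last   # 1-step-ahead forecast
forecasts = [h]
for s in range(2, 11):                               # 2- to 10-step-ahead forecasts
    h = alpha0 + persistence * h                     # recursion for s > 1
    forecasts.append(h)

print(forecasts[-1], uncond_var)      # the long-horizon forecast approaches the unconditional variance
print(sum(forecasts))                 # 10-day variance forecast (forecasts are additive over time)
```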

Questions?
