
Introduction to Analysis

of Financial Data

Martin Tegner

Mathematical Institute
University of Oxford
June 21, 2019
The main goals of this course

To give an overview of tools for data analysis used for financial data


To get an overview of themes in the world of time series
modelling and econometrics
To get familiar with R
To show how to build elementary trading strategies

Suggested literature

“Statistics and Data Analysis for Financial Engineering” by David Ruppert
“Statistical Analysis of Financial Data in S-Plus” by René Carmona
“Analysis of Financial Time Series” by Ruey Tsay
“Quantitative Risk Management” by A. McNeil, R. Frey, P. Embrechts

Outline

Introduction to R
Empirical properties of financial data
Kernel estimators
Test for normality, QQ plots
Regressions
A way of analysing dependence between variables
Linear and non-linear time series
Stationarity
Autocorrelation functions
Modelling

Introduction to R
https://github.com/martnj/R-econometrics

R - introduction

# Creating vectors (data structure which contains objects of the same mode)
x <- c(1,2,3,4,5) # ’c’ for concatenate, gives *numeric* vector ’x’
x # print out ’x’
length(x)
(y <- 1:5) # define vector and *print* output via ’()’
(y[6] <- 6) # append to a vector (alternative: y <- c(y, 6))
numeric(5) # empty numeric of length 5 (default)

# Check if two objects are the same


x == y # component-wise ’equal-to’ operator
identical(x,y) # identical as objects? why not?
class(x) # => x is a *numeric* vector
class(y) # => y is an *integer* vector
all.equal(x,y) # numerical equality; see argument ’tolerance’

# Vector arithmetic: component-wise


2*x + 1 # ’*’ component-wise, ’+1’ adds 1 to all elements
x + x
x*x # component-wise product
x*y # must be of same length

R - introduction
# Some functions
(x <- c(3,4,2))
rev(x) # reverse order
sort(x) # sort in increasing order
sort(x, decreasing=TRUE)
(idx <- order(x)) # create indices that sort x
x[idx] # => sorted
log(x) # (component-wise) logarithms
x^2 # (component-wise) squares
exp(x)
sum(x)
cumsum(x)
prod(x)

# Sequences
seq(from=1,to=7,by=2)
seq(from=1,to=100,length.out=25)
rep(1:3, each=3, times=2)

# Missing values
z <- 1:3; z[5] <- 4 # two statements in one line (’;’-separated)
z # ’not available’ (NA)
c(z, 0/0) # 0/0, 0*Inf, Inf-Inf lead to ’not a number’ (NaN)
class(NaN) # not a number but still of mode ’numeric’
class(NA)
R - introduction

# Matrices
(A <- matrix(1:9, ncol=3)) # NB: fills the matrix by *columns*
(A <- matrix(1:9, ncol=3, byrow=TRUE)) # row-wise

# Some matrix functions


nrow(A) # number of rows
ncol(A)
dim(A) # dimension
diag(A) # diagonal of A
diag(3) # identity 3x3 matrix
(D <- diag(1:3)) # diagonal matrix
D%*%A # matrix multiplication
A*A # element-wise product (i.e. Hadamard product)
log(A)
rowSums(A)
sum(A) # sums all elements

R - introduction

# Random number generation


(X <- rnorm(2)) # generate two N(0,1) random variates
(Y <- rnorm(2))

# Reproducibility:

# Set a ’seed’
X==Y # obviously not equal (here: with probability 1)
set.seed(10) # with set.seed() we can set the seed
X <- rnorm(2) # draw two N(0,1) random variates
set.seed(10) # set the same seed again
Y <- rnorm(2)
all.equal(X, Y) # => TRUE

R - introduction

# Plot Student-t densities with various degrees
# of freedom and compare to the standard normal

x <- seq(-4, 4, length=100)


hx <- dnorm(x)
degf <- c(1, 3, 8, 30)
colors <- c("red", "blue", "darkgreen", "gold", "black")
labels <- c("df=1", "df=3", "df=8", "df=30", "normal")
plot(x, hx, type="l", lty=2, xlab="x value",
ylab="Density", main="Comparison of t Distributions")
for (i in 1:4){
lines(x, dt(x,degf[i]), lwd=2, col=colors[i])
}
legend("topright", inset=.05, title="Distributions",
labels, lwd=2, lty=c(1, 1, 1, 1, 2), col=colors)

R - introduction

# The quantmod library

install.packages("quantmod") # download/install
install.packages(c("TTR","xts","zoo")) # packages required by quantmod
library(quantmod) # load the package
?‘quantmod-package‘

getSymbols("AAPL",src="yahoo") # download Apple stock from Yahoo


class(AAPL)
?‘xts-package‘
dim(AAPL)
names(AAPL)
head(AAPL) # the first 6 rows of the data
tail(AAPL) # last 6 rows of the data

# Financial chart from ’quantmod’


chartSeries(AAPL,theme="white")
chartSeries(AAPL,type=c("auto","candlesticks"),subset="last 2 months")

# Log-returns
rtn = diff(log(AAPL$AAPL.Close))
chartSeries(rtn,theme="white")
Probability background

Randomness

Formal model: (Ω, F, P) probability space

Sample space: ω ∈ Ω represents a possible realisation of the experiment
Sigma-algebra: family of events A ∈ F to which we can assign probabilities
Probability measure: assigns probabilities P(A) to events
Random variable: X : Ω → R (or R^d), often our model
Probability distribution: via the event {ω : X(ω) ≤ x}, a random variable induces FX(x) = P(X ≤ x)
Dynamical model: a collection of random variables {Xt}t≥0 is called a stochastic process
Filtration: when working with stochastic processes, {Ft}t≥0 represents the information flow
Randomness

Denote random variables by upper case letters: X1 , . . . , Xn


Say that X1, . . . , Xn are independent if for all αi ∈ R

P(X1 ≤ α1, . . . , Xn ≤ αn) = P(X1 ≤ α1) · · · P(Xn ≤ αn).

Denote a data sample by (x1, . . . , xn), where we implicitly assume that the observations are realisations of i.i.d. random variables Xi with common distribution F(x)
Distributional properties

Joint distribution: FX,Y(x, y) = P(X ≤ x, Y ≤ y). If a density exists,

FX,Y(x, y) = P(X ≤ x, Y ≤ y) = ∫_{−∞}^{x} ∫_{−∞}^{y} fX,Y(w, z) dz dw

Marginal distribution

FX(x) = FX,Y(x, ∞),  fX(x) = ∫_{−∞}^{∞} fX,Y(x, y) dy

q-th quantile of a random variable

xq = inf{x : q ≤ FX(x)}

Conditional distribution

FX|Y≤y(x) = P(X ≤ x, Y ≤ y) / P(Y ≤ y),  fX|Y(x|y) = fX,Y(x, y) / fY(y)
Distributional properties

p-th moment

E[X^p] = ∫_R x^p f(x) dx  (< ∞ ⇐⇒ X ∈ L^p)

p-th central moment (provided X ∈ L^p)

E[(X − µX)^p] = ∫_R (x − µX)^p f(x) dx

Skewness (symmetry)

SX = E[(X − µX)³ / σX³]

Kurtosis (tail behaviour)

KX = E[(X − µX)⁴ / σX⁴]

SX = 0 and KX = 3 for a Gaussian variable; KX > 3 is called leptokurtic
Conditional expectation

Assume we have observed the outcome of some random variables X = (X1, . . . , Xn) ∈ L² and want to make a statement ξ̂ ∈ σ(X) about the outcome of an unobserved random variable ξ ∈ L²
The best choice in the mean-square (L²) sense, i.e. if we want the error

E[(ξ − ξ̂)²]

to be small, is given by the conditional expectation

ξ̂ = E[ξ|X]

Often, we are interested in the best linear prediction

ξ̂ = α0 + α1 X1 + · · · + αn Xn

with αi ∈ R such that the mean-square error is minimised
If (X, ξ) is jointly Gaussian, then indeed

E[ξ|X] = µξ + ΣξX ΣXX^{−1} (X − µX)
Empirical properties of financial data

Histogram

Histogram: a non-parametric estimator of the density of a


data sample
Advantages: simplicity; works well in regions holding the bulk of the data points
Disadvantages: strong dependence on the position and size of the bins
Can be misleading in regions with only a few data points (extremes); gives no information about the variability within a bin
Kernel density estimator

Given a sample (x1, . . . , xn) from an (unknown) density f(x), the kernel density estimator is

f̂_b(x) = (1/(nb)) Σ_{i=1}^{n} K((x − xi)/b),

where K is a non-negative function that integrates to 1, called the kernel, and b > 0 is called the bandwidth
For the histogram,

Hist(x) = (1/(nb)) Σ_{i=1}^{n} θ(x, xi)

where θ(x, xi) = 1 if x and xi belong to the same bin, and zero otherwise
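As an illustrative sketch (not from the original slides), the estimator above can be computed directly and compared against R's built-in density(); the Gaussian kernel, the simulated sample and the bandwidth b = 0.3 are arbitrary choices for illustration.

# Sketch: kernel density estimate computed from the formula above
set.seed(1)
x.data <- rnorm(200) # sample from a (here known) density
b <- 0.3 # bandwidth
grid <- seq(-4,4,length.out=200)
f.hat <- sapply(grid, function(x) mean(dnorm((x - x.data)/b))/b) # (1/(nb)) sum K((x-xi)/b)
plot(grid,f.hat,type="l",ylab="density")
lines(density(x.data,bw=b),lty=2) # built-in estimator for comparison
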
R - S&P 500 returns

install.packages("Ecdat") # Download package Ecdat


data(package="Ecdat") # Load data from it (SP500),
data(SP500,package="Ecdat")

class(SP500)
names(SP500)
plot(SP500$r500,type="l")
hist(SP500$r500)
hist(SP500$r500,breaks=100)
plot(density(SP500$r500))

install.packages("fBasics")
library(fBasics)
basicStats(SP500)

R - output of basicStats(SP500)

r500
nobs 2783.000000 # Sample size
NAs 0.000000 # No of missing values
Minimum -0.228006
Maximum 0.087089
1. Quartile -0.004845
3. Quartile 0.005766
Mean 0.000418
Median 0.000536
Sum 1.163571
SE Mean 0.000206 # Standard error of sample mean
# SE:= (standard deviation)/sqrt(nobs)
LCL Mean 0.000014 # Lower bound of 95% C.I.
UCL Mean 0.000822
Variance 0.000118
Stdev 0.010863
Skewness -3.495673
Kurtosis 74.400163

Hypothesis testing

Statement of a null hypothesis H0 and decision on a


significance level α, e.g. α = 0.05
Outcome: “reject” or “not reject” null hypothesis
Never possible to “accept”, only possible to state that the
sampled data are not sufficient to reject
Connection to confidence intervals:
e.g., test whether a parameter is zero
if the 95% CI contains zero, fail to reject the null; otherwise reject.
Across 100 such tests, one would expect about 5 true nulls to be (wrongly) deemed statistically significant
Hypothesis testing - example

Let x = (x1, . . . , xN) be a sample from X ∼ N(µX, σX²)

µ̂X = (1/N) Σ_{i=1}^{N} xi  =⇒  µ̂X ∼ N(µX, σX²/N)

H0: µX = 0 (null hypothesis) vs. H1: µX ≠ 0 (alternative)
Under H0:
the test statistic t = √N µ̂X / σ̂X ∼ Student's t with N − 1 degrees of freedom (Student-t since we estimate σ; otherwise normal)
Reject H0 at significance level α if

|t| ≥ q1−α/2

where q1−α/2 is the corresponding quantile of the Student-t (or std. normal) distribution
Or if the p-value is less than α, where p-value = P(|T| > t | H0 is true)
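A minimal sketch of this test in R (simulated data with true mean 0.2, so H0 is false; t.test is base R):

# Sketch: the one-sample t-test above on simulated data
set.seed(1)
x <- rnorm(100,mean=0.2)
(t.stat <- sqrt(length(x))*mean(x)/sd(x)) # test statistic
qt(0.975,df=length(x)-1) # reject H0 at alpha=0.05 if |t| exceeds this
t.test(x,mu=0) # built-in equivalent, reports t and the p-value
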
Testing for normality

Jarque-Bera test
H0: the data {Yi}_{i=1}^{n} come from a standard normal distribution
J-B test: if Y ∼ N(0, 1), then E[Y³] = 0 and E[Y⁴] = 3
Test statistic:

J = (n/6) ((1/n) Σ Yi³)² + (n/24) ((1/n) Σ Yi⁴ − 3)²

For large n, J ∼ χ²₂
A large J (> 6) =⇒ reject H0
The Shapiro-Wilk test looks at order statistics instead
R - test for normality

normalTest(SP500$r500,method=’jb’) # JB-test

# STATISTIC:
# X-squared: 648508.6002
# P VALUE:
# Asymptotic p Value: < 2.2e-16 # Reject normality

hist(SP500$r500,nclass=30) # Histogram
d1 = density(SP500$r500) # Obtain density estimate
range(SP500$r500) # Range of SP500 returns
x = seq(-.25,.1,by=.001)
y1 = dnorm(x,mean(SP500$r500),stdev(SP500$r500))
plot(d1$x,d1$y,xlab="return",ylab="density",type="l")
lines(x,y1,lty=2)
plot(d1$x,d1$y,xlab="return",ylab="density",type="l",xlim=c(-0.05,.05))
lines(x,y1,lty=2)

Q-Q plots

Q-Q plot is a graphical method of comparing two probability


distributions by plotting their quantiles
Given FX and q, the q-quantile is a number πq satisfying

FX(πq) = P(X ≤ πq) = q  ⇒  πq = FX^{−1}(q),

or more generally

πq = inf{x : q ≤ FX(x)}
Q-Q plots
Empirical distribution

F̂n(x) = (1/n) Σ_{i=1}^{n} 1{xi ≤ x}

By the Glivenko-Cantelli theorem, F̂n(x) → F(x) uniformly in x
Order statistic: X(1) ≤ X(2) ≤ . . . ≤ X(n)
Viewing x(k),n as a realisation of X(1),n, . . . , X(n),n we have

x(k),n = π̂q  for  (k − 1)/n < q ≤ k/n

since the sample quantile is

π̂q = inf{x : q ≤ F̂n(x)}
Normal probability plots

If the normality assumption is true, then the q-th sample quantile will be approximately equal to µ + σΦ^{−1}(q):

Φ((πq − µ)/σ) = P((X − µ)/σ ≤ (πq − µ)/σ) = q  ⇒  (πq − µ)/σ = Φ^{−1}(q).

Hence, plotting (x(k),n − µ̂)/σ̂ versus Φ^{−1}(k/(n + 1)) gives a graphical test for normality
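As a sketch, the plot above can be built by hand (qqnorm automates this; the simulated data are an illustration choice):

# Sketch: normal probability plot built from the formula above
set.seed(1)
x <- rnorm(200,mean=1,sd=2)
n <- length(x)
z <- (sort(x) - mean(x))/sd(x) # standardised order statistics
q <- qnorm((1:n)/(n+1)) # theoretical quantiles Phi^{-1}(k/(n+1))
plot(q,z,xlab="theoretical quantiles",ylab="sample quantiles")
abline(0,1,lty=2) # points fall near this line under normality
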
R - Q-Q Plot

# Create sample data


x <- rcauchy(100) # Undefined moments

# Standard normal fit


qqnorm(x)
qqline(x)

# S&P 500 returns


y = (SP500$r500 - mean(SP500$r500))/sd(SP500$r500)
qqnorm(y)
qqline(y)

Linear regression

Linear regression

Y , output variable
X = (X1 , . . . , Xp ), input variables
How does Y relate to X ?
Model assumption:

Y = β0 + β1 X1 + . . . + βp Xp + ε

where ε is an error with E[ε|X1, . . . , Xp] = 0
Regression coefficients

Model

Y = β0 + β1 X1 + . . . + βp Xp + ε

β0: intercept
βj: slope,

βj = ∂/∂Xj E[Y |X1, . . . , Xp].

βj gives the change in the expected value of Y when Xj changes by one unit
Regression - assumptions

Conditional expectation assumed to be linear

E[Y |X] = β0 + β1 X1 + . . . + βp Xp

Uncorrelated additive (Gaussian) noise: ε1, . . . , εn are uncorrelated (independent)
Constant variance: Var(εi) = σε²
Least squares estimation - 1D input variable

n independent observations (x1, y1), . . . , (xn, yn) from

Yi = β0 + β1 Xi + εi

Least squares estimation: find β0 and β1 that minimise

RSS(β) = Σ_{i=1}^{n} (yi − (β0 + β1 xi))²

Solution

β̂1 = Σ_{i=1}^{n} yi (xi − x̄) / Σ_{i=1}^{n} (xi − x̄)²
β̂0 = ȳ − β̂1 x̄

Fitted values ŷi = β̂0 + β̂1 xi
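A minimal sketch checking the closed-form estimates above against lm(), on simulated data (all names are illustration choices):

# Sketch: closed-form least squares vs. lm()
set.seed(1)
x <- rnorm(100)
y <- 1 + 2*x + rnorm(100)
b1 <- sum(y*(x - mean(x)))/sum((x - mean(x))^2)
b0 <- mean(y) - b1*mean(x)
c(b0,b1)
coef(lm(y ~ x)) # should agree
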
Least squares estimation - p-dim input variable

n independent observations (x1, y1), . . . , (xn, yn), where xi = (xi1, . . . , xip)ᵀ, from

Yi = β0 + β1 Xi1 + · · · + βp Xip + εi

Organise in matrices: y = (y1, . . . , yn)ᵀ, X with rows (1, xiᵀ) and β = (β0, . . . , βp)ᵀ, such that

RSS(β) = (y − Xβ)ᵀ(y − Xβ)

If X has full rank, the unique minimising solution is given by

β̂ = (XᵀX)^{−1} Xᵀ y

Fitted values ŷ = Xβ̂
Sampling properties of β̂
Unbiased estimate with variance

Var(β̂) = (XᵀX)^{−1} σε²,  σ̂ε² = (1/(n − p − 1)) Σ_{i=1}^{n} (yi − ŷi)²

If errors are Gaussian

β̂ ∼ N(β, (XᵀX)^{−1} σε²)
(n − p − 1) σ̂ε² ∼ σε² χ²_{n−p−1}

Gives 1 − α confidence intervals/regions

[β̂j − z^{(1−α/2)} √vj σ̂ε ,  β̂j + z^{(1−α/2)} √vj σ̂ε]

Cβ = {β : (β̂ − β)ᵀ XᵀX (β̂ − β) ≤ σ̂ε² χ²_{p−1}^{(1−α/2)}}
Sampling properties of β̂

z^{(1−α/2)} and χ²_{p−1}^{(1−α/2)} are the 1 − α/2 quantiles of the normal and chi-square distributions respectively (the intervals are approximately correct even if errors are non-Gaussian)
The confidence region for β generates a corresponding confidence region for the true regression function f(x) = (1, x)ᵀβ, namely

Cf(x) = {(1, x)ᵀβ : β ∈ Cβ}
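As a sketch, per-coefficient intervals and the estimated covariance above are available directly from a fitted lm object (simulated data as before):

# Sketch: interval estimates for fitted coefficients
set.seed(1)
x <- rnorm(100); y <- 1 + 2*x + rnorm(100)
fit <- lm(y ~ x)
confint(fit,level=0.95) # per-coefficient 1-alpha intervals
vcov(fit) # estimated (X'X)^{-1} * sigma^2
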
Inference about model
Gives a way of drawing inference about β
Consider the null hypothesis βj = 0. Then

tj = β̂j / (σ̂ε √vj),  vj = j-th diagonal element of (XᵀX)^{−1},

is distributed as t_{n−p−1}. Hence, a large |tj| =⇒ reject H0
Similarly, for a group of parameters βg ⊂ β, we may test H0: βg = 0 with the F statistic

F = ((RSS0 − RSS)/(p − p0)) / (RSS/(n − p − 1))

where RSS0 is for the smaller model with βg = 0 and p0 + 1 (non-zero) parameters
Under the null hypothesis, F is distributed as F_{p−p0, n−p−1}
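A sketch of this F-test via nested models and anova() (simulated data; x2 and x3 have zero true coefficients by construction):

# Sketch: F-test for a group of coefficients
set.seed(1)
n <- 200
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y <- 1 + 2*x1 + rnorm(n)
fit0 <- lm(y ~ x1) # smaller model (beta_g = 0)
fit1 <- lm(y ~ x1 + x2 + x3) # full model
anova(fit0,fit1) # F statistic as in the formula above
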
R - Linear regression
# Linear regression

spx = SP500$r500
y = spx[-(1:2)]
X = cbind(rep(1,length(spx)-2),spx[-c(1,length(spx))],
spx[-c(length(spx)-1,length(spx))] )

install.packages("scatterplot3d")
library(scatterplot3d)
pr = par()
scatterplot3d(X[,2],X[,3],y)
par(pr)

m1 <- lm(y~X-1)
summary(m1)

X = cbind(rep(1,length(spx)-2),spx[-c(1,length(spx))] )
plot(X[,2],y)
m2 <- lm(y~X-1)
summary(m2)
R - Linear regression

beta_hat = solve(t(X)%*%X,t(X)%*%y) # least-squares estimate
sig2_eps = sum((y - X%*%beta_hat)^2)/(nrow(X)-2) # residual variance (2 parameters)
beta_cov = solve(t(X)%*%X)*sig2_eps # Var(beta_hat); base R has solve(), not ’inv’
for(i in 1:1000){ # sample from N(beta_hat, beta_cov) and overlay regression lines
beta_rnd = beta_hat + t(chol(beta_cov))%*%rnorm(length(beta_hat))
abline(beta_rnd,col=grey(0.7))
}
Model selection - information criteria
Akaike Information Criterion (AIC) is defined as

AIC = −(2/N) log(max likelihood) + (2/N) × (number of parameters)

Recall the likelihood L(θ|y) = P(y|θ); with d parameters,

AIC = −(2/N) Σ_{i=1}^{N} log P(yi|θ̂) + 2d/N

The first component of AIC measures "goodness of fit"
The second component of AIC penalises the number of parameters
Bayesian information criterion (BIC)

BIC = −(2/N) Σ_{i=1}^{N} log P(yi|θ̂) + (ln(N)/N) × (number of parameters)
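As a sketch, R's AIC() and BIC() return −2 log L plus the penalty without the 1/N scaling above, which does not change the ranking of models (simulated data for illustration):

# Sketch: comparing models by information criteria
set.seed(1)
n <- 200
x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + 2*x1 + rnorm(n)
AIC(lm(y ~ x1)); AIC(lm(y ~ x1 + x2)) # smaller is better
BIC(lm(y ~ x1)); BIC(lm(y ~ x1 + x2)) # BIC penalises extra parameters harder
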
Model quality - goodness of fit

How much of the variation in Y can be predicted if one knows X1, . . . , Xp? (ANOVA)

total SS = Σ_{i=1}^{n} (yi − ȳ)²  (∝ variance of the data)
regression SS = Σ_{i=1}^{n} (ŷi − ȳ)²  (explained sum of squares)
residual SS = Σ_{i=1}^{n} (ŷi − yi)²  (residual sum of squares)
Sums of squares and R 2

Always

total SS = explained SS + residual SS

Define

R² = explained SS / total SS

Thus R² ≤ 1 measures the proportion of the total variation of Y that can be explained by the linear model
If residual SS = 0 =⇒ R² = 1 (i.e., zero prediction error)
If explained SS = 0 (i.e., ŷi = ȳ) there is no point in the regression model
The closer R² is to 1, the better
Part I: linear time series

Time series - definition

Time series data: a sequence of observations in chronological order

x = (x0, x1, . . . , xn)

often assumed to be the observed outcome of a
Stochastic process (time series model): a sequence of random variables with the index representing (discrete) time

X = {Xt}t
Objective

The objective of time series analysis is to find the dynamic


properties of X , i.e. the dependence of Xt on its past values
Xt−1 , Xt−2 , . . .
We try to draw inference about a time series model X from
an observed realisation x

Asset returns

Most financial studies apply to returns data
Returns provide a scale-free summary of the investment opportunity
Returns have more attractive statistical properties than the price itself

Let Pt be the price of an asset at time t > 0
Simple return

Rt = (Pt − Pt−1)/Pt−1  ⇐⇒  1 + Rt = Pt/Pt−1

Log return

rt = log(1 + Rt) = log(Pt/Pt−1) = log Pt − log Pt−1
R - asset price vs. returns

# Mishkin tb3 - three-month Bonds


# T-bill rate (in percent, annual rate)
data(Mishkin ,package="Ecdat")
plot(Mishkin[,4])

y <- diff(log(Mishkin[,4])) # log returns


plot(y)
abline(h=0,col="grey")

Stylized facts of asset returns

Stylised facts are a collection of empirical observations from statistical analysis of financial price data (e.g. log-returns on equities, indices, exchange rates, commodity prices etc.)
Stylized facts often apply to daily log-returns (also to intra-daily, weekly, monthly, tick-by-tick data)
Stylized facts of asset returns

Return time series are not i.i.d. although they show little
serial correlation
Series of absolute or squared returns (∼variance) show
profound serial correlation
Conditional expected returns are close to zero
Extreme returns appear in clusters
Return series are leptokurtic (peaked) and heavy-tailed
(power-like tail)

Stylized facts of asset returns

[Figure: time series of log-returns, 1950–1990]
Time series model - properties

Mean and standard deviation functions

µX(t) = E[Xt],  σX(t) = √Var[Xt]

Autocovariance function

γX(s, t) = Cov[Xs, Xt]

and autocorrelation function (ACF)

ρX(s, t) = Corr[Xs, Xt] = γX(s, t) / (σX(s) σX(t))

Recall that the first two moments fully characterise a Gaussian process
Time series model - properties

Stationarity
“The same type of stochastic behaviour of a stochastic
process from one time period to another”
For instance, financial returns can vary, but their stochastic
properties (mean, std, ...) are often similar in each period
Attention: returns often show stationary behaviour, not asset
prices, which tend to increase over time
Important theme: transforming time series data to obtain
stationary behaviour =⇒ modelling in stationary domain

Strictly and weakly stationary processes

Definition
A stochastic process X = {Xt }t
is strictly stationary if for any n, h ∈ N

P(Xt1 ≤ x1 , . . . , Xtn ≤ xn ) = P(Xt1 +h ≤ x1 , . . . , Xtn +h ≤ xn )

is weakly stationary if
i) E[Xt ] = µX for all t ≥ 0
ii) Var [Xt ] = σX2 < ∞ for all t ≥ 0
iii) γX (s + h, s) = γX (t + h, t) for all t, s, h (auto-covariance is a
function only of the time lag h)

Remarks

Strict stationarity does not imply weak stationarity (e.g. E[Xt²] = ∞)
Weak stationarity does not imply strict stationarity (e.g. E[Xt^p], p > 2, may vary over time)
The autocorrelation function (ACF) of a stationary time series X = {Xt}t is given by

ρX(h) = Corr[Xh, X0] = γX(h)/γX(0),  h ∈ Z
Time average as statistical estimates

If X is a stationary time series then

lim_{n→∞} (1/n) Σ_{i=1}^{n} Xi = const.  (= µX if X is ergodic)

During this course we assume that stationary ⇒ ergodic, hence it makes sense to set

µ̂X = (1/n) Σ_{i=1}^{n} xi

γ̂X(h) = (1/n) Σ_{i=1}^{n−|h|} (xi − µ̂X)(x_{i+|h|} − µ̂X),  ρ̂X(h) = γ̂X(h)/γ̂X(0)
Autocorrelation function (ACF)

# S&P 500 - ACF

getSymbols("SPY",src="yahoo")
head(SPY)
plot(SPY$SPY.Close)
SPY.rtn = diff(log(SPY$SPY.Close))
plot(SPY.rtn)
acf(SPY.rtn[-1]) # plot autocorrelation function
acf(SPY.rtn[-1]^2) # x[-1] : skip the first element

The search for stationarity - decomposition

It is common practice to work with mean-zero time series Xt − µX (tricky if we have µX(t))
Sometimes data x(t) can be described as

x(t) = m(t) + p(t) + x̃(t),

where m(t) is a deterministic monotone function, p(t) a deterministic periodic function and x̃(t) a mean-zero stationary time series
See example "Stationarity: CO2 at Mauna Loa"
The search for stationarity - integration

The difference operator ∇ is given by

∇Xt = Xt − Xt−1

Taking differences can turn a non-stationary time series into a stationary one (remember returns vs. prices!)
We say that a time series is integrated of order one (or that it has one unit root) if its difference is stationary. We denote this class of time series by I(1)
The search for stationarity - integration

The first difference cancels a constant term; the second-order difference cancels a linear trend (m(t) = at + b):

∇²Xt = Xt − 2Xt−1 + Xt−2

Successive application of the difference operator can remove any polynomial trend

X ∈ I(p) means that ∇^p X = {∇^p Xt}t is stationary

The order of integration can be assessed using unit root tests, as in the sketch below
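A sketch of a unit root check with the augmented Dickey-Fuller test; adf.test lives in the 'tseries' package, which is assumed available:

# Sketch: ADF unit root test on a simulated random walk
install.packages("tseries")
library(tseries)
set.seed(1)
x <- cumsum(rnorm(500)) # random walk, I(1)
adf.test(x) # large p-value: cannot reject a unit root
adf.test(diff(x)) # small p-value: the difference looks stationary
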
The search for stationarity - Box-Cox transformations

Stabilisation of variance for X̃t = Xt − µX(t)
In the case when

Var[Xt] = φ(µX(t))² σ²

we can take Yt = Ψ(X̃t), where Ψ′(x) = 1/φ(x) and Ψ is invertible
Yt = Ψ(X̃t) ≈ Ψ(µX̃(t)) + Ψ′(µX(t))(X̃(t) − µX(t)), with approximate variance σ²
Examples:
Ψ(µ) = log µ is recommended when the standard deviation varies like the mean (φ(µ) = µ)
Ψ(µ) = √µ is recommended when the variance varies like the mean (φ(µ) = √µ)
Stationarity

# Stationarity: CO2 at Mauna Loa


data(package="datasets")
data(co2,package="datasets")
plot(co2)
co2.stl= stl(co2,"periodic")
?stl
head(co2.stl$time.series)
plot(co2.stl)
plot(co2.stl$time.series[,3],ylab="CO2 data, remainder")

plot(diff(co2,differences = 2))

First example of time series model - white noise

Definition ((Strict) white noise)

{Xt}t∈Z is a white noise process if it is stationary, has zero mean, ρ(h) = 1{h=0} and γ(0) = σ² < ∞. It is denoted WN(0, σ²).
{Xt}t∈Z is a strict white noise process if it is a sequence of i.i.d. random variables with γ(0) = σ² < ∞ and zero mean. We write SWN(0, σ²).

A time series w0, . . . , wn is said to form white noise if the values are observations of i.i.d. mean-zero random variables {Wt}t∈Z; it is obviously stationary
Cov(Ws, Wt) = E[Ws Wt] = 0 for s ≠ t
E[Wt+i |W1, . . . , Wi] = 0, so there is no predictive power (the modelling process is complete)
R - white noise

# White noise
WN <- rnorm(1024,0,1)
ts.plot(WN)
acf(WN,40,"covariance")
acf(WN,40,"correlation")

# check for normality


qqnorm(WN)
qqline(WN)

Linear time series

Definition
A time series {Xt}t∈Z is said to be linear if it can be written as

Xt = µ + Σ_{i=0}^{∞} Ψi Wt−i,

where µ is the constant mean of X, Ψ0 = 1, and {Wt} is a white noise series. Moreover, Σ_{i=0}^{∞} |Ψi| < ∞ (absolute summability condition).

E[Xt] = µ,  Var[Xt] = σ² Σ_{i=0}^{∞} Ψi².

weakly stationary =⇒ Var[Xt] < ∞ =⇒ Ψi² → 0
Linear time series

Σ_{i=0}^{∞} |Ψi| < ∞ implies that

E[| Σ_{i=0}^{∞} Ψi Wt−i |] < ∞
Linear time series

The ACF of stationary X is

γ(h) = Cov[Xt, Xt−h] = E[(Σ_{i=0}^{∞} Ψi Wt−i)(Σ_{j=0}^{∞} Ψj Wt−h−j)]
     = E[Σ_{i,j=0}^{∞} Ψi Ψj Wt−i Wt−h−j] = Σ_{j=0}^{∞} Ψ_{j+h} Ψj E[W²_{t−h−j}]
     = σ² Σ_{j=0}^{∞} Ψ_{j+h} Ψj
Random walk

We say that {Xn }n is a random walk if there exists a white


noise W such that

Xn = X0 + W1 + . . . + Wn .

E[Xn ] = E[X0 ]
Var[Xn ] = Var[X0 ] + nσ 2
Variance is changing with n =⇒ non-stationarity
But {Xn }n is I (1)
Example of a unit-root non-stationary time series.

R - random Walk

# Random Walk
WN <- rnorm(1024,0,1)
RW <- cumsum(WN)
acf(RW,40,"covariance")
acf(RW,40,"correlation")
acf(diff(RW),40,"correlation")

Auto regressive time series (AR model)

Definition
A mean-zero time series {Xn }n is AR(p) if

Xn = φ1 Xn−1 + φ2 Xn−2 + . . . + φp Xn−p + Wn ,

where φi ∈ R
Subtract the sample mean from the data before trying to model
with (mean-zero) AR processes

AR(1) series

AR(1) series

Xt = φ1 Xt−1 + Wt

We can rewrite this iteratively as

Xt = φ1 Xt−1 + Wt = Wt + φ1 Wt−1 + φ1² Wt−2 + . . . = Σ_{i=0}^{∞} φ1^i Wt−i

Fits the definition of a linear time series (φ1 = 1 =⇒ random walk starting at “minus infinity”)
AR(1) series

Var[Xt] = φ1² Var[Xt−1] + σ²
Weak stationarity assumption: Var[Xt] = Var[Xt−1] < ∞, hence

Var[Xt] = σ² / (1 − φ1²)

provided φ1² < 1. Hence

weak stationarity =⇒ −1 < φ1 < 1

It turns out that the opposite implication is also true.
The ACF is given by

ρ(h) = φ1 ρ(h − 1) =⇒ ρ(h) = φ1^h, since ρ(0) = 1

(hint: multiply Xt by Xt−h and take expectations)
R - AR(1) series

# AR(IMA) process
# order = c(p,d,q) where p is AR(p), d differencing,
# q is MA(q)
x = arima.sim(list(order=c(1,0,0),ar=0.9),n=1000)
ts.plot(x)
acf(x,40,type="correlation")
lines(0.9^(0:40),lty=2)

Estimation: Yule-Walker equations

Assume we have AR(2) model

Xn = φ1 Xn−1 + φ2 Xn−2 + Wn (1)

We want to estimate φ1 , φ2 (and possibly σ)


Multiply (1) by Xn , then Xn−1 , finally by Xn−2
Use the fact that E [Wn Xn−1 ] = 0 and E [Wn Xn−2 ] = 0
We obtain the Yule-Walker equations

γX (0) = φ1 γX (1) + φ2 γX (2) + σ 2


γX (1) = φ1 γX (0) + φ2 γX (1)
γX (2) = φ1 γX (1) + φ2 γX (0)

Solving with sample covariances gives one way of estimating


φ1 , φ2 , σ (method of moments)
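A sketch of the method of moments on simulated AR(2) data, solving the last two Yule-Walker equations with sample autocovariances (parameter values are illustration choices; ar.yw is the built-in Yule-Walker fit):

# Sketch: Yule-Walker estimates for an AR(2)
set.seed(1)
x <- arima.sim(list(order=c(2,0,0),ar=c(0.5,0.3)),n=2000)
g <- acf(x,lag.max=2,type="covariance",plot=FALSE)$acf # gamma(0), gamma(1), gamma(2)
G <- matrix(c(g[1],g[2],g[2],g[1]),2,2)
(phi <- solve(G,g[2:3])) # (phi1, phi2) from the 2nd and 3rd equations
g[1] - sum(phi*g[2:3]) # sigma^2 from the 1st equation
ar.yw(x,order.max=2,aic=FALSE)$ar # built-in Yule-Walker fit, should agree
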
AR(2) process
We have

ρ(1) = φ1 / (1 − φ2)  (2nd eqn)
ρ(h) = φ1 ρ(h − 1) + φ2 ρ(h − 2),  h ≥ 2  (3rd eqn)

Defining the backshift operator by B^k ρ(h) = ρ(h − k), we may write the characteristic equation

(1 − φ1 B − φ2 B²) ρ(h) = 0

The inverses of its solutions are the characteristic roots. For example, from roots ω1 and ω2 we have

(1 − ω1 B)(1 − ω2 B) ρ(h) = 0

Complex roots give rise to seasonality in the time series
Stationarity and characteristics roots

For AR(1) we had ρ(h) = φ1 ρ(h − 1), hence the characteristic equation has the form

1 − φ1 z = 0,  with characteristic root ω = 1/z0 = φ1,

and stationarity iff |φ1| = |ω| < 1
For stationarity of an AR(p) process:
all characteristic roots must be less than 1 in modulus
this ensures that the ACF of the model converges to 0 as the lag h increases
Estimation of AR(p) process

Mean-zero AR(p) is given by

Xn = φ1 Xn−1 + φ2 Xn−2 + . . . + φp Xn−p + Wn ,

ACF satisfies

(1 − φ1 B − φ2 B 2 − . . . − φp B p )ρ(h) = 0, for h > 0.

Setting h = 1, . . . , p gives a set of p equations for the unknowns (φ1, . . . , φp) =⇒
Method of moments estimation (Yule-Walker)
Regression of Xn on (Xn−1, . . . , Xn−p) =⇒ least-squares estimation
Alternatives exist: MLE, prediction error method etc.
Finding the order

The optimal order p of the model can be determined by the AIC criterion or the partial autocorrelation function (PACF)
Partial autocorrelation function

Given (Xt−m+1, Xt−m+2, . . . , Xt), the best linear predictor of Z is

Et^m(Z) = α1 Xt−m+1 + α2 Xt−m+2 + . . . + αm Xt

which minimises the mean square error

E[(Z − (α1 Xt−m+1 + . . . + αm Xt))²]

The k-th partial autocorrelation is defined as

ΦX(k) = Corr[Xt − E_{t−1}^{k−1}(Xt), Xt−k − E_{t−1}^{k−1}(Xt−k)]

In other words: the PACF at lag k is the correlation between Xt and Xt−k after their linear dependence on the intermediate variables (Xt−k+1, . . . , Xt−1) has been removed
The k-th partial autocorrelation Φ(k) can be calculated from the autocorrelation coefficients
If X is an AR(p) process, then Φ(k) = 0 for k > p
R - partial autocorrelation function

# US Gross National Product 1946-2010

data = read.table("Data/q-gnp4710.txt",header=T)
head(data)
tail(data)
gnp = data$VALUE
gnp.r = diff(log(gnp))
tVec = seq(1947,2010,length.out=nrow(data)) # create time-index
plot(tVec,gnp,xlab=’year’,ylab=’GNP’,type="l")
plot(tVec[-1],gnp.r,type="l",xlab="year",ylab="growth"); abline(h=0)
acf(gnp.r,lag=12)
pacf(gnp.r,lag=12,ylim=c(-1,1))

R - AIC criteria for order determination

# Order determination for AR(p)


m1 = arima(gnp.r,order=c(3,0,0)) # fit AR(3) model
m1

# AIC criteria
?ar # ‘ar’ also fits order ’p’!
m2 = ar(gnp.r,method=’mle’)
m2$order # Find the identified order
names(m2)
print(m2$aic,digits=3)
plot(c(0:12),m2$aic,type=’h’,xlab=’order’,ylab=’AIC’)
lines(0:12,m2$aic,lty=1)

Model checking - residuals

Assume we have a time series x1, . . . , xT observed from

Xt = φ1 Xt−1 + . . . + φp Xt−p + Wt

We estimate φ̂i of φi and calculate (1-step) predictions

x̂t = φ̂1 xt−1 + . . . + φ̂p xt−p,  t ≥ p + 1

The residual series is defined as

ŵt = xt − x̂t

Using least squares, we can calculate the estimate

σ̂W² = (1/(T − 2p − 1)) Σ_{t=p+1}^{T} ŵt²
Model checking

A fitted model should always be examined carefully


If the model is adequate the residual series should behave as a
white noise
ACF and Ljung-Box statistics (next slide) can be used to check the closeness of {ŵt} to a white noise
Type tsdiag(m1, gof=12) in R (GNP example)
Ljung-Box

Define the lag-k sample autocorrelation (x̄ denotes the sample mean)

ρ̂k = Σ_{t=k+1}^{T} (xt − x̄)(xt−k − x̄) / Σ_{t=1}^{T} (xt − x̄)²,  0 ≤ k < T − 1.

If {Xt} is an i.i.d. sequence satisfying E[Xt²] < ∞, then ρ̂k is asymptotically normal with mean 0 and variance 1/T.
This allows us to test individual ACF components; H0: ρk = 0, k > 0
Ljung-Box

Ljung-Box test

Q(m) = T(T + 2) Σ_{i=1}^{m} ρ̂i² / (T − i).

Under H0, Q(m) is chi-squared with m − p degrees of freedom
If {Xt} is an i.i.d. sequence, then {|Xt|} is also i.i.d., so it is good practice to apply the ACF and the Ljung-Box test to the absolute series as well
Ljung-Box

# Box-Ljung test

vw = read.table("Data/m-ibm3dx2608.txt",header=T)[,3]
plot(vw,type=’l’)
m3 = arima(vw-mean(vw),order=c(3,0,0)) # fit AR(3)-model

names(m3)
Box.test(m3$residuals,lag=12,type=’Ljung’)
pv = 1 - pchisq(16.352,9) # p-value with 9 dof = #lags - #params
pv # Small p-value => reject H0: white noise

m4 = arima(vw-mean(vw),order=c(3,0,0),fixed=c(NA,0,NA,NA))
m4
Box.test(m4$residuals,lag=12,type=’Ljung’)
pv = 1 - pchisq(16.828,10) # Compute p-value, 10 dof
pv

Prediction with AR(p)

Assume that {Xt}t is AR(p) and we know its coefficients

Xt+1 = φ1 Xt + . . . + φp Xt+1−p + Wt+1

Assume we have observations up until time t, i.e. xs for s ≤ t
Denote X̂t+1|t = Et[Xt+1] (conditional expectation)
Since Et[Wt+1] = 0 and Et[Xs] = Xs for s ≤ t, we get

X̂t+1|t = φ1 Xt + . . . + φp Xt−p+1
et(1) = Xt+1 − X̂t+1|t = Wt+1 (the prediction error)
Var[et(1)] = σW²
Prediction with AR(p)

Next: 2-step predictions

Xt+2 = φ1 Xt+1 + . . . + φp Xt−p+2 + Wt+2

Hence

X̂t+2|t = φ1 X̂t+1|t + φ2 Xt + . . . + φp Xt−p+2
et(2) = φ1 et(1) + Wt+2 = φ1 Wt+1 + Wt+2
Var[et(2)] = (1 + φ1²) σW²
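A sketch of these predictions from a fitted model (simulated data; predict on an arima fit returns both point forecasts and standard errors):

# Sketch: multi-step predictions from a fitted AR(2)
set.seed(1)
x <- arima.sim(list(order=c(2,0,0),ar=c(0.5,0.3)),n=500)
fit <- arima(x,order=c(2,0,0))
predict(fit,n.ahead=2) # $pred and $se; se grows as in Var[e_t(2)] above
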
Moving average time series

We say that X ∼ MA(q) if

Xt = Wt + θ1 Wt−1 + θ2 Wt−2 + . . . + θq Wt−q , θi ∈ R.

MA(q), q < ∞, is stationary by definition

Moving average time series

Variance

Var[Xt] = (1 + θ1² + . . . + θq²) σW²

Autocorrelation function for MA(1):

Xt−h Xt = Xt−h Wt + θ1 Xt−h Wt−1

Taking expectations we obtain

γ(1) = θ1 σW²,  γ(h) = 0 for h > 1

The autocorrelation function of an MA(q) vanishes for lags greater than q (cf. the PACF for AR processes)
Backward shift operator

BXt = Xt−1 and B^k Xt = Xt−k
AR(p) can be written as φ(B)Xt = Wt, where

φ(z) = 1 − φ1 z − φ2 z² − . . . − φp z^p

MA(q) can be written as Xt = θ(B)Wt, where

θ(z) = 1 + θ1 z + θ2 z² + . . . + θq z^q
ARMA and ARIMA

X ∼ ARMA(p, q) if

φ(B)Xt = θ(B)Wt

ARMA(p, q) is stationary if the characteristic roots ωi of φ(z) satisfy |ωi| < 1
X ∼ ARIMA(p, d, q) ⇐⇒ ∇^d X ∼ ARMA(p, q)
Fitting models to data in practice

Transform the data (e.g. using a Box-Cox transformation) to stabilise the variance
Difference successively to remove linear/polynomial trends
Remove seasonal components
Examine the ACF/PACF: is an AR(p) or MA(q) model appropriate?
Try your chosen model(s), and use AIC to search for a better model
Check the residuals from your chosen model by plotting the ACF of the residuals, and do a Ljung-Box test
If the residuals form a white noise, the model fitting stops
Fitting an AR in practice

Compute the autocorrelation function and check if it decays fast
Estimate the coefficients and the variance of the noise
Use the AIC criterion or the PACF to determine the order
Compute residuals and test for white noise to decide if the model gives a good fit
Fitting an MA in practice

Compute the autocorrelation function and check if it vanishes from some point on
Confirm the order with the AIC criterion
Estimate the coefficients by maximum likelihood
Compute residuals and test for white noise
Fitting an ARMA

Attempt to fit AR and compute residuals


Attempt to fit MA to the AR-residuals, or to original data if
the AR fit was not satisfactory
Analyse the residuals

Exponential smoothing

Assume we believe in geometrically decaying weights on past observations,

X̂h+1 ≈ Σ_{j≥1} w^{j−1} Xh+1−j,  w ∈ (0, 1).

Since Σ_{j≥1} w^{j−1} = 1/(1 − w) and we want the weights to sum to one, we define

X̂h+1 = (1 − w) Σ_{j≥1} w^{j−1} Xh+1−j,  w ∈ (0, 1).

Can be analysed using an ARIMA(0,1,1) model
A common technique for forecasting
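A sketch of both routes on simulated data: the ARIMA(0,1,1) view above, and HoltWinters with trend and seasonality switched off, which is classic simple exponential smoothing (series and seed are illustration choices):

# Sketch: exponential smoothing two ways
set.seed(1)
x <- cumsum(rnorm(300)) # a non-stationary level to be smoothed
fit <- arima(x,order=c(0,1,1)) # the MA coefficient relates to the weight w
predict(fit,n.ahead=1)$pred
HoltWinters(x,beta=FALSE,gamma=FALSE) # simple exponential smoothing
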
Regression models with time series errors

Often we are interested in the relationship between two (or more) time series
Consider the linear regression

Yt = α + βXt + et,

where Yt, Xt are two time series and et denotes the error term.
We can use least squares to estimate α and β, but it is common for et to be serially correlated
R - regression models with time series errors

# Regression model with time-series errors

# 1-year / 3-year treasury rates, weekly data


r1 = read.table("Data/w-gs1yr.txt",header=T)[,4]
r3 = read.table("Data/w-gs3yr.txt",header=T)[,4]
tVec = c(1:2467)/52+1962
plot(tVec,r1,xlab="",ylab="rate",type="l")
lines(tVec,r3,lty=1,col=’red’)
legend(x=’topleft’,legend=c(’1year’,’3year’),lty=c(1,1),col=c(’black’,’red’))
plot(r1,r3,type="p")

# Linear regression
lm1 = lm(r3~r1)
summary(lm1)
plot(tVec,lm1$residuals,type=’l’); abline(h=0)
acf(lm1$residuals,lag=36)

R - cont.

# Look at first differences


c1 = diff(r1)
c3 = diff(r3)
plot(tVec[-1],c1,xlab="",ylab="diff(rate)",type="l")
lines(tVec[-1],c3,lty=1,col=’red’)
legend(x=’topleft’,legend=c(’1year’,’3year’),lty=c(1,1),col=c(’black’,’red’))
plot(c1,c3); abline(0,1)

# Linear regression
lm2 = lm(c3~-1+c1) # ’-1’ for no intercept
summary(lm2)
plot(lm2$residuals,type="l")
acf(lm2$residuals,lag=36)
pacf(lm2$residuals,lag=36,ylim=c(-1,1))

Regression models with time series errors

The ACF still shows significant serial correlation, but the magnitudes are much smaller than before
We can fit an ARMA model to the residual errors
Next we try MA(1) and AR(p) as models for the noise
R - cont.

# MA(1) / AR(p) for regression residuals

r = lm2$residuals
m.ma = arima(r,order=c(0,0,1),include.mean=F)

plot(m.ma$residuals); abline(h=0)
acf(m.ma$residuals)
pacf(m.ma$residuals,ylim=c(-1,1))
Box.test(m.ma$residuals,lag=10,type=’Ljung’)
1 - pchisq(50.344,9)

m.ar = ar(r); print(m.ar)


acf(m.ar$resid[-(1:21)])
pacf(m.ar$resid[-(1:21)],ylim=c(-1,1))
Box.test(m.ar$resid[-(1:21)],lag=23,type=’Ljung’)
1 - pchisq(1.7823,1)

Regression models - summary

1. Fit the linear regression model and check serial correlation of the residuals
2. If the residual series is unit-root non-stationary, take the first difference of both the dependent and explanatory variables. Go to step 1
3. If the residual series appears to be stationary, identify an ARMA model for the residuals and modify the linear regression model accordingly
Backtesting

Divide the data into an estimation set and a prediction set
Estimate the model from the estimation set and use the fitted model for 1-step predictions
Step forward through the prediction set
Calculate the mean-square prediction error

(1/(T − h)) Σ_{j=h}^{T−1} [ej(1)]²

as in the sketch below
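A minimal sketch of this scheme on simulated data: refit an AR(2) at each step of the prediction set and accumulate squared 1-step errors (sizes and model order are illustration choices):

# Sketch: rolling 1-step backtest
set.seed(1)
x <- arima.sim(list(order=c(2,0,0),ar=c(0.5,0.3)),n=600)
h <- 500 # size of the estimation set
e <- numeric(length(x)-h)
for (j in h:(length(x)-1)) {
  fit <- arima(x[1:j],order=c(2,0,0)) # refit on data up to time j
  e[j-h+1] <- x[j+1] - predict(fit,n.ahead=1)$pred # 1-step error
}
mean(e^2) # mean-square prediction error
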
Model averaging

Sometimes there are many reasonable models
A good idea is to take an average of the models, with weights that sum to 1 (see the sketch below)
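A small sketch of an equal-weight average of two 1-step forecasts (models and weights are illustration choices):

# Sketch: model averaging of two forecasts
set.seed(1)
x <- arima.sim(list(order=c(2,0,0),ar=c(0.5,0.3)),n=500)
p.ar <- predict(arima(x,order=c(2,0,0)),n.ahead=1)$pred
p.ma <- predict(arima(x,order=c(0,0,2)),n.ahead=1)$pred
0.5*p.ar + 0.5*p.ma # weights sum to one
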
Part II: nonlinear time series

Volatility

A notion of the variability of price returns, often used as a measure of risk
Not directly observable =⇒ we need to estimate it from observed returns
Many different modelling approaches
Volatility - characteristics

Volatility clusters
Volatility jumps are rare
Volatility shows mean reversion
Volatility seems to react differently to big price increases and big drops (the leverage effect)
Volatility models

Volatility as conditional standard deviation of daily returns


Implied volatility
Realised volatility measures (from high frequency data)

Nonlinear AR

We say that {Xt }t is a nonlinear AR(p) process if

Xt = µ(Xt−1 , . . . , Xt−p ) + σ(Xt−1 , . . . , Xt−p )Wt

Let p = 1 and {Wt }t be i.i.d. N(0, 1) then

Xt |Xt−1 ∼ N(µ(Xt−1 ), σ(Xt−1 )2 )

Motivation

Plot the autocorrelation for data with non-linearity: it looks like white noise
Try QQ plots =⇒ looks non-normal?
For Gaussian processes only: no correlation =⇒ lack of dependence
Square the data and plot the ACF
R - Nonlinear AR

# Non-linearity: S&P 500

ts.plot(SP500); abline(h=0)
acf(SP500)
qqnorm(SP500$r500); qqline(SP500$r500)

# Non-normal, zero-correlation does not guarantee independence!


acf(SP500$r500^2)

# Treasury-bill rates
data(Tbrate,package="Ecdat")
Tbill <- Tbrate[,1]
d.Tbill <- diff(Tbill)
qqnorm(d.Tbill)
qqline(d.Tbill)
acf(d.Tbill)
acf(d.Tbill^2)

Nonlinear AR

# Intel data (monthly Intel log-returns):
da = read.table("data/m-intcsp7309.txt",header=T)
head(da)
X = log(da$intc+1)
rtn = ts(X,frequency=12,start=c(1973,1))
plot(rtn,type="l",xlab="year",ylab="Intel log-return"); abline(h=0)
Box.test(X,lag=12,type="Ljung")
# -> can not reject H0: rho(h)=0, h>0

acf(X,lag=24)
acf(abs(X),lag=24)
Box.test(abs(X),lag=12,type="Ljung")
# -> reject H0: rho(h)=0, h>0

Model structure

We have series {Xt } of log-returns


Usually uncorrelated or with minor lower order serial
correlation
We aim to model

µt = E[Xt |Ft−1 ], σt2 = Var[Xt |Ft−1 ]

Building a model

Specify the mean equation (an ARMA model) to remove linear dependence
Use the residuals to test for ARCH effects: consider (Xt − µt)² and use it in a Ljung-Box test (for example)
Build a model for the volatility and perform joint estimation
Check the final model and refine if necessary
ARCH models

{Xt }t is said to be of type ARCH(p) if

Xt = σt Wt,  {Wt}t is white noise,

where

σt² = α0 + Σ_{j=1}^{p} αj X²t−j

Require: α0 > 0 and αj ≥ 0 for j = 1, 2, . . . , p, so that σt² > 0
ARCH(1)

Stationarity

Xt² = σt² Wt²  where  σt² = α0 + α1 X²t−1

Xt² = α0 Wt² + α1 X²t−1 Wt²
    = . . .
    = α0 Σ_{j=0}^{∞} α1^j Wt² W²t−1 · · · W²t−j

E[Xt] = E[E[Xt|Xt−1]] = 0 and E[Xt²] = α0 Σ_{j=0}^{∞} α1^j
Hence, {Xt}t is weakly stationary ⇐⇒ |α1| < 1, and in that case E[Xt²] = α0/(1 − α1)
ARCH(1)

Auto-covariance: note that

E[Xt+h Xt] = E[E[Xt+h Xt |Wt+h−1, Wt+h−2, . . .]]
           = E[E[σt σt+h Wt Wt+h |Wt+h−1, Wt+h−2, . . .]]
           = E[σt σt+h Wt E[Wt+h]] = 0

Thus the ACF is zero for h ≠ 0
Meanwhile, E[X²t+h Xt²] ∝ α1^h
This shows that X is a weak white noise, not a white noise in the strong sense!
ARCH(1)

Higher moments

E[Xt⁴] = 3α0²(1 + α1) / ((1 − α1)(1 − 3α1²))

This implies that 0 ≤ α1² < 1/3
Tails of Xt are heavier than those of a normal distribution:

E[Xt⁴] / (Var[Xt])² = 3(1 − α1²)/(1 − 3α1²) > 3.
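A sketch simulating an ARCH(1) to reproduce these properties (parameter values are illustration choices; α1 = 0.5 keeps 3α1² < 1 so the fourth moment exists):

# Sketch: simulate an ARCH(1) and inspect its properties
set.seed(1)
n <- 2000; a0 <- 0.1; a1 <- 0.5
w <- rnorm(n)
x <- numeric(n); x[1] <- sqrt(a0/(1-a1))*w[1] # start at stationary variance
for (t in 2:n) x[t] <- sqrt(a0 + a1*x[t-1]^2)*w[t]
acf(x) # close to white noise
acf(x^2) # pronounced serial correlation in the squares
qqnorm(x); qqline(x) # heavier tails than normal
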
ARCH pros and cons

Advantages:
The model can produce volatility clusters
X has heavy tails
Weaknesses:
Positive and negative shocks have the same effect on volatility
ARCH models are likely to over-predict the volatility
ARCH and AR

ARCH(p):

Xt = σt Wt  where  σt² = α0 + Σ_{j=1}^{p} αj X²t−j

Define ηt = Xt² − σt²; then we can write

Xt² = α0 + Σ_{j=1}^{p} αj X²t−j + ηt

This is an AR(p) for Xt², except that {ηt} might not be i.i.d.
Hence, to determine the order it is useful to look at the PACF of Xt²
ARCH model checking

Define residuals as

X̃t = Xt / σt

Check if X̃t forms an i.i.d. sequence (Ljung-Box etc.)
R - ARCH fitting to Intel data

# ARCH(p)

# install fGarch from Rmetrics


install.packages(’fGarch’)
library(fGarch)
m1 = garchFit(~garch(3,0),data=X,trace=T) # Fit an ARCH(3) model
?garchFit
summary(m1)
m2 = garchFit(~garch(1,0),data=X,trace=F)
summary(m2)
plot(m2)

pacf(X^2,lag=24,ylim=c(-1,1))
m3 = garchFit(~garch(12,0),data=X,trace=F)
summary(m3)
plot(m3)

GARCH

{Xt}t is said to be of type GARCH(p, q) if

Xt = µt + σt Wt

where

σt² = α0 + Σ_{j=1}^{p} αj X̃²t−j + Σ_{j=1}^{q} θj σ²t−j,

with X̃t = Xt − µt
Hence X̃t |Ft−1 ∼ N(0, σt²)
ARCH, GARCH

Set residuals εt ≡ Xt² − E[Xt²|Xt−1] = Xt² − σt²
ARCH(1):

σt² = α0 + α1 X²t−1
Xt² = α0 + α1 X²t−1 + εt

GARCH(1,1):

σt² = α0 + α1 X²t−1 + β1 σ²t−1
Xt² − εt = α0 + α1 X²t−1 + β1 (X²t−1 − εt−1)
Xt² = α0 + (α1 + β1) X²t−1 − β1 εt−1 + εt
ARCH,GARCH

{Xt}t ∼ ARCH(p) ⇐⇒ {Xt²}t ∼ AR(p)
{Xt}t ∼ GARCH(p, q) ⇐⇒ {Xt²}t ∼ ARMA(p, q)
In practice, low-order GARCH models are often used
Rule of thumb: if the fitted ARCH is of high order, a low-order GARCH is often a good alternative (see the sketch below)
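As a sketch of this rule of thumb, reusing the fGarch library and the return series X loaded in the ARCH-fitting slide above:

# Sketch: GARCH(1,1) as an alternative to a high-order ARCH
m5 = garchFit(~garch(1,1),data=X,trace=F)
summary(m5)
plot(m5)
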
Other models

IGARCH (unit-root GARCH)
GARCH-M (if returns depend on the volatility)
EGARCH (asymmetric effects between positive and negative asset returns)
and many more...
Cointegration

Suppose we could find an asset whose price was stationary (therefore mean-reverting)
Whenever the price is below the mean we buy, and realise a profit when the price reverts to its mean
If the price is above the mean we sell
However, we already know that prices are integrated!
Sometimes we can find two or more assets such that a linear combination of their prices is stationary
If that happens, the vector of coefficients of that linear combination is called a cointegration vector
Example

Consider two time series

Xn^(1) = Sn + εn^(1)  and  Xn^(2) = Sn + εn^(2),  n = 1, . . . , N,

where:
Sn = S0 + W1 + . . . + Wn, S0 = 0, is a random walk with {Wn}n being N(0, 1) white noise,
{εn^(1)} and {εn^(2)} are two independent white noise sequences that are independent of {W}
{Xn^(1)}n and {Xn^(2)}n are random walks plus noise and hence both I(1)
Example cont.

However, the linear combination

Xn^(1) − Xn^(2) = εn^(1) − εn^(2),  n = 1, . . . , N,

is stationary, since it is a white noise with normal distribution.
Cointegration is a state of several time series sharing a common (non-stationary) trend, the remainder being stationary
Example cont.

Consider two time series

Xn^(1) = Sn^1 + εn^(1)  and  Xn^(2) = Sn^2 + εn^(2),  n = 1, . . . , N,

Testing for stationarity is easy if the coefficients of the linear combination are known
Stationarity of a1 Xn^(1) + a2 Xn^(2) is equivalent to

Xn^(2) = β0 + β1 Xn^(1) + εn,

where εn is a stationary time series.
Regression analysis is very useful here
Regression analysis for cointegration

When regressing one I(1) series on another I(1) series we may face the problem that the residuals are integrated processes
However, if the processes are cointegrated, the residuals will be I(0) (previous slide)
The Phillips-Ouliaris (or Dickey-Fuller) test for cointegration is designed to test for stationarity of the residuals
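A sketch of the Phillips-Ouliaris test on a simulated pair sharing a random-walk trend, as in the example above (po.test is in the 'tseries' package, assumed installed):

# Sketch: Phillips-Ouliaris cointegration test
library(tseries)
set.seed(1)
S <- cumsum(rnorm(500)) # common stochastic trend
x1 <- S + rnorm(500); x2 <- S + rnorm(500)
po.test(cbind(x1,x2)) # small p-value: reject "no cointegration"
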
Trading with R

Useful libraries in R

TTR indicators popular in industry


PerformanceAnalytics collection of econometric functions
for performance and risk analysis
quantmod quantitative financial modelling and trading
framework
googleVis provides an interface between R and the Google
chart tools
Rmetrics open development software project for
computational finance

Basic steps

Forecasting: hypothesis, statistics, testing
Explaining data is easy; forecasting is complicated. Data is very noisy and often has little structure, so statistical significance is hard to get
Get data (think of your trading frequency and horizon)
Select tools
Construct a trading rule
Evaluate your strategy
Example

Data: SP500
Tool: we use DVI indicator (momentum indicator) from TTR
library
Trading rule: go long if DVI < 0.5 and short if DVI > 0.5
We always invest all our capital
Check performance of our strategy

Example in R
install.packages("quantmod")
install.packages("PerformanceAnalytics")
library(PerformanceAnalytics)
library(TTR)

# Step 1: Get S&P 500 data from Yahoo


getSymbols("^GSPC") # load(file=’GSPC_for_trading.Rdata’,verbose=T)
tail(GSPC)
chartSeries(GSPC,theme = chartTheme("white"))

# Step 2: Create indicator


# Calculate DVI (momentum indicator)
dvi <- DVI(Cl(GSPC)) # ’Cl’ gives closing price
?DVI
plot(dvi[,3])

# Step 3: Construct your trading rule


# Create signal: (long (short) if DVI is below (above) 0.5)
# ’Lag’ so yesterday’s signal is applied to today’s returns
sig <- Lag(ifelse(dvi[,3] < 0.5, 1, -1),k=1)
Example in R

# Step 4: Equity curve


# Calculate signal-based returns
ret <- ROC(Cl(GSPC))*sig

# Step 5: Evaluate strategy performance


# subset returns to period of interest
ret <- ret["2009-06-02/2010-09-07"]

# Use the PerformanceAnalytics package:


# Cumulative Performance
chart.CumReturns(ret)
# Performance, Drawdowns etc...
table.Drawdowns(ret, top=10)
table.DownsideRisk(ret)
charts.PerformanceSummary(ret)

Example in R

# Compare with long buy-and-hold


ret <- ROC(Cl(GSPC))["2009-06-02/2010-09-07"]
charts.PerformanceSummary(ret)

# Compare with long-only


sig <- Lag(ifelse(dvi[,3] < 0.5, 1, 0))
ret <- ROC(Cl(GSPC))*sig
ret <- ret["2009-06-02/2010-09-07"]
charts.PerformanceSummary(ret)

# Compare with random signal


set.seed(10)
sig = runif(length(Cl(GSPC))) < 0.5
sig = 2*sig-1
ret <- ROC(Cl(GSPC))*sig
ret <- ret["2009-06-02/2010-09-07"]
charts.PerformanceSummary(ret)

RSI - Example in R

# S&P500 index data from Yahoo! Finance


getSymbols("^GSPC", from="2000-01-01", to="2008-12-07")
chartSeries(GSPC,theme = chartTheme("white"))

# Calculate the RSI indicator


rsi <- RSI(Cl(GSPC),2)
plot(rsi)
?RSI

# Create the long (up) and short (dn) signals


sigup <- ifelse(rsi < 10, 1, 0)
sigdn <- ifelse(rsi > 90, -1, 0)
# Lag signals to align with days in market,
# not days signals were generated
sigup <- lag(sigup,1)
sigdn <- lag(sigdn,1)

RSI - Example in R

# Replace missing signals with no position


# (generally just at beginning of series)
sigup[is.na(sigup)] <- 0
sigdn[is.na(sigdn)] <- 0
# Combine both signals into one vector
sig <- sigup + sigdn

# Calculate Close-to-Close returns


ret <- ROC(Cl(GSPC))*sig
charts.PerformanceSummary(ret)

Remarks

Warning! It is often useful to size your position (for example, taking volatility into account)
Try this for the RSI strategy
