
Introduction to Analysis

of Financial Data

Martin Tegner

Mathematical Institute
University of Oxford
June 21, 2019
The main goals of this course

To give an overview of tools for data analysis used for financial data


To get an overview of themes in the world of time series
modelling and econometrics
To get familiar with R
To show how to build elementary trading strategies

Suggested literature

“Statistics and Data Analysis for Financial Engineering” by David Ruppert
“Statistical Analysis of Financial Data in S-Plus” by René Carmona
“Analysis of Financial Time Series” by Ruey Tsay
“Quantitative Risk Management” by A. McNeil, R. Frey, P. Embrechts

Outline

Introduction to R
Empirical properties of financial data
Kernel estimators
Test for normality, QQ plots
Regressions
A way of analysing dependence between variables
Linear and non-linear time series
Stationarity
Autocorrelation functions
Modelling

Introduction to R
https://github.com/martnj/R-econometrics

R - introduction

# Creating vectors (data structure which contains objects of the same mode)
x <- c(1,2,3,4,5) # ’c’ for concatenate, gives *numeric* vector ’x’
x # print out ’x’
length(x)
(y <- 1:5) # define vector and *print* output via ’()’
(y[6] <- 6) # append to a vector (alternative: y <- c(y, 6))
numeric(5) # empty numeric of length 5 (default)

# Check if two objects are the same


x == y # component-wise ’equal-to’ operator
identical(x,y) # identical as objects? why not?
class(x) # => x is a *numeric* vector
class(y) # => y is an *integer* vector
all.equal(x,y) # numerical equality; see argument ’tolerance’

# Vector arithmetic: component-wise


2*x + 1 # ’*’ component-wise, ’+1’ adds 1 to all elements
x + x
x*x # component-wise product
x*y # must be of same length

R - introduction
# Some functions
(x <- c(3,4,2))
rev(x) # reverse order
sort(x) # sort in increasing order
sort(x, decreasing=TRUE)
(idx <- order(x)) # create indices that sort x
x[idx] # => sorted
log(x) # (component-wise) logarithms
x^2 # (component-wise) squares
exp(x)
sum(x)
cumsum(x)
prod(x)

# Sequences
seq(from=1,to=7,by=2)
seq(from=1,to=100,length.out=25)
rep(1:3, each=3, times=2)

# Missing values
z <- 1:3; z[5] <- 4 # two statements in one line (’;’-separated)
z # ’not available’ (NA)
c(z, 0/0) # 0/0, 0*Inf, Inf-Inf lead to ’not a number’ (NaN)
class(NaN) # not a number but still of mode ’numeric’
class(NA)
R - introduction

# Matrices
(A <- matrix(1:9, ncol=3)) # NB: fills the matrix by *columns*
(A <- matrix(1:9, ncol=3, byrow=TRUE)) # row-wise

# Some matrix functions


nrow(A) # number of rows
ncol(A)
dim(A) # dimension
diag(A) # diagonal of A
diag(3) # identity 3x3 matrix
(D <- diag(1:3)) # diagonal matrix
D%*%A # matrix multiplication
A*A # element-wise product (i.e. Hadamard product)
log(A)
rowSums(A)
sum(A) # sums all elements

R - introduction

# Random number generation


(X <- rnorm(2)) # generate two N(0,1) random variates
(Y <- rnorm(2))

# Reproducibility:

# Set a ’seed’
X==Y # obviously not equal (here: with probability 1)
set.seed(10) # with set.seed() we can set the seed
X <- rnorm(2) # draw two N(0,1) random variates
set.seed(10) # set the same seed again
Y <- rnorm(2)
all.equal(X, Y) # => TRUE

R - introduction

# Plot Student-t densities with various degrees
# of freedom and compare to the standard normal

x <- seq(-4, 4, length=100)


hx <- dnorm(x)
degf <- c(1, 3, 8, 30)
colors <- c("red", "blue", "darkgreen", "gold", "black")
labels <- c("df=1", "df=3", "df=8", "df=30", "normal")
plot(x, hx, type="l", lty=2, xlab="x value",
ylab="Density", main="Comparison of t Distributions")
for (i in 1:4){
lines(x, dt(x,degf[i]), lwd=2, col=colors[i])
}
legend("topright", inset=.05, title="Distributions",
labels, lwd=2, lty=c(1, 1, 1, 1, 2), col=colors)

R - introduction

# The quantmod library

install.packages("quantmod") # download/install
install.packages(c("TTR","xts","zoo")) # packages required by quantmod
library(quantmod) # load the package
?‘quantmod-package‘

getSymbols("AAPL",src="yahoo") # download Apple stock from Yahoo


class(AAPL)
?‘xts-package‘
dim(AAPL)
names(AAPL)
head(AAPL) # the first 6 rows of the data
tail(AAPL) # last 6 rows of the data

# Financial chart from ’quantmod’


chartSeries(AAPL,theme="white")
chartSeries(AAPL,type=c("auto","candlesticks"),subset="last 2 months")

# Log-returns
rtn = diff(log(AAPL$AAPL.Close))
chartSeries(rtn,theme="white")
Probability background

Randomness

Formal model: (Ω, F, P) probability space

Sample space: ω ∈ Ω represents a possible realisation of the experiment
Sigma-algebra: family of events A ∈ F to which we can assign probabilities
Probability measure: assigns probabilities P(A) to events
Random variable: X : Ω → R (or R^d), often our model
Probability distribution: via the event {ω : X(ω) ≤ x}, a random variable induces FX(x) = P(X ≤ x)
Dynamical model: a collection of random variables {Xt}t≥0 is called a stochastic process
Filtration: when working with stochastic processes, {Ft}t≥0 represents the information flow
Randomness

Denote random variables by upper case letters: X1 , . . . , Xn


Say that X1, . . . , Xn are independent if for all αi ∈ R

P(X1 ≤ α1, . . . , Xn ≤ αn) = P(X1 ≤ α1) · · · P(Xn ≤ αn).

Denote a data sample by (x1, . . . , xn), where we implicitly assume that the observations are realisations of i.i.d. random variables Xi with common distribution F(x)
Distributional properties

Joint distribution: FX,Y(x, y) = P(X ≤ x, Y ≤ y). If a density exists,

FX,Y(x, y) = P(X ≤ x, Y ≤ y) = ∫_{−∞}^{x} ∫_{−∞}^{y} fX,Y(w, z) dz dw

Marginal distribution

FX(x) = FX,Y(x, ∞),  fX(x) = ∫_{−∞}^{∞} fX,Y(x, y) dy

q-th quantile of a random variable

xq = inf{x : q ≤ FX(x)}

Conditional distribution

FX|Y≤y(x) = P(X ≤ x, Y ≤ y) / P(Y ≤ y),  fX|Y(x|y) = fX,Y(x, y) / fY(y)
Distributional properties

p-th moment

E[X^p] = ∫_R x^p f(x) dx  (< ∞ ⇐⇒ X ∈ L^p)

p-th central moment (provided X ∈ L^p)

E[(X − µX)^p] = ∫_R (x − µX)^p f(x) dx

Skewness (symmetry)

SX = E[(X − µX)³ / σX³]

Kurtosis (tail behaviour)

KX = E[(X − µX)⁴ / σX⁴]

SX = 0 and KX = 3 for a Gaussian variable; KX > 3 is called leptokurtic
Conditional expectation

Assume we have observed the outcome of some random variables X = (X1, . . . , Xn) ∈ L² and want to make a statement ξ̂ ∈ σ(X) about the outcome of an unobserved random variable ξ ∈ L²
The best choice in the mean-square (L²) sense, i.e. if we want the error

E[(ξ − ξ̂)²]

to be small, is given by the conditional expectation

ξ̂ = E[ξ|X]

Often, we are interested in the best linear prediction

ξ̂ = α0 + α1 X1 + · · · + αn Xn

with αi ∈ R such that the mean-square error is minimised
If (X, ξ) is jointly Gaussian, then indeed

E[ξ|X] = µξ + ΣξX ΣXX^{−1} (X − µX)
Empirical properties of financial data

Histogram

Histogram: a non-parametric estimator of the density of a


data sample
Advantages: simplicity; works well in regions holding the bulk of the data points
Disadvantages: strong dependence on the position and size of the bins
Can be misleading in regions with only a few data points (extremes); gives no information about the variability within a bin
Kernel density estimator

Given a sample (x1, . . . , xn) from an (unknown) density f(x), the kernel density estimator is

f̂_b(x) = (1/(nb)) Σ_{i=1}^{n} K((x − xi)/b),

where K is a non-negative function that integrates to 1, called the kernel, and b > 0 is called the bandwidth
For the histogram,

Hist(x) = (1/(nb)) Σ_{i=1}^{n} θ(x, xi)

where θ(x, xi) = 1 if x and xi belong to the same bin, and zero otherwise
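As an illustrative sketch (not from the original slides), the estimator above can be computed directly and compared against R's built-in density(); the Gaussian kernel, the simulated sample and the bandwidth b = 0.3 are arbitrary choices for illustration.

# Sketch: kernel density estimate computed from the formula above
set.seed(1)
x.data <- rnorm(200) # sample from a (here known) density
b <- 0.3 # bandwidth
grid <- seq(-4,4,length.out=200)
f.hat <- sapply(grid, function(x) mean(dnorm((x - x.data)/b))/b) # (1/(nb)) sum K((x-xi)/b)
plot(grid,f.hat,type="l",ylab="density")
lines(density(x.data,bw=b),lty=2) # built-in estimator for comparison
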
R - S&P 500 returns

install.packages("Ecdat") # Download package Ecdat


data(package="Ecdat") # Load data from it (SP500),
data(SP500,package="Ecdat")

class(SP500)
names(SP500)
plot(SP500$r500,type="l")
hist(SP500$r500)
hist(SP500$r500,breaks=100)
plot(density(SP500$r500))

install.packages("fBasics")
library(fBasics)
basicStats(SP500)

R - output of basicStats(SP500)

r500
nobs 2783.000000 # Sample size
NAs 0.000000 # No of missing values
Minimum -0.228006
Maximum 0.087089
1. Quartile -0.004845
3. Quartile 0.005766
Mean 0.000418
Median 0.000536
Sum 1.163571
SE Mean 0.000206 # Standard error of sample mean
# SE:= (standard deviation)/sqrt(nobs)
LCL Mean 0.000014 # Lower bound of 95% C.I.
UCL Mean 0.000822
Variance 0.000118
Stdev 0.010863
Skewness -3.495673
Kurtosis 74.400163

Hypothesis testing

Statement of a null hypothesis H0 and decision on a


significance level α, e.g. α = 0.05
Outcome: “reject” or “not reject” null hypothesis
Never possible to “accept”, only possible to state that the
sampled data are not sufficient to reject
Connection to confidence intervals:
e.g., test whether a parameter is zero
if the 95% CI contains zero, fail to reject the null; otherwise reject.
Across 100 such tests, one would expect about 5 true nulls to be (wrongly) deemed statistically significant
Hypothesis testing - example

Let x = (x1, . . . , xN) be a sample from X ∼ N(µX, σX²)

µ̂X = (1/N) Σ_{i=1}^{N} xi  =⇒  µ̂X ∼ N(µX, σX²/N)

H0: µX = 0 (null hypothesis) vs. H1: µX ≠ 0 (alternative)
Under H0:
the test statistic t = √N µ̂X / σ̂X ∼ Student's t with N − 1 degrees of freedom (Student-t since we estimate σ; otherwise normal)
Reject H0 at significance level α if

|t| ≥ q1−α/2

where q1−α/2 is the corresponding quantile of the Student-t (or std. normal) distribution
Or if the p-value is less than α, where p-value = P(|T| > t | H0 is true)
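A minimal sketch of this test in R (simulated data with true mean 0.2, so H0 is false; t.test is base R):

# Sketch: the one-sample t-test above on simulated data
set.seed(1)
x <- rnorm(100,mean=0.2)
(t.stat <- sqrt(length(x))*mean(x)/sd(x)) # test statistic
qt(0.975,df=length(x)-1) # reject H0 at alpha=0.05 if |t| exceeds this
t.test(x,mu=0) # built-in equivalent, reports t and the p-value
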
Testing for normality

Jarque-Bera test
H0: the data {Yi}_{i=1}^{n} come from a standard normal distribution
J-B test: if Y ∼ N(0, 1), then E[Y³] = 0 and E[Y⁴] = 3
Test statistic:

J = (n/6) ((1/n) Σ Yi³)² + (n/24) ((1/n) Σ Yi⁴ − 3)²

For large n, J ∼ χ²₂
A large J (> 6) =⇒ reject H0
The Shapiro-Wilk test looks at order statistics instead
R - test for normality

normalTest(SP500$r500,method=’jb’) # JB-test

# STATISTIC:
# X-squared: 648508.6002
# P VALUE:
# Asymptotic p Value: < 2.2e-16 # Reject normality

hist(SP500$r500,nclass=30) # Histogram
d1 = density(SP500$r500) # Obtain density estimate
range(SP500$r500) # Range of SP500 returns
x = seq(-.25,.1,by=.001)
y1 = dnorm(x,mean(SP500$r500),stdev(SP500$r500))
plot(d1$x,d1$y,xlab="return",ylab="density",type="l")
lines(x,y1,lty=2)
plot(d1$x,d1$y,xlab="return",ylab="density",type="l",xlim=c(-0.05,.05))
lines(x,y1,lty=2)

Q-Q plots

Q-Q plot is a graphical method of comparing two probability


distributions by plotting their quantiles
Given FX and q, the q-quantile is a number πq satisfying

FX(πq) = P(X ≤ πq) = q  ⇒  πq = FX^{−1}(q),

or more generally

πq = inf{x : q ≤ FX(x)}
Q-Q plots
Empirical distribution

F̂n(x) = (1/n) Σ_{i=1}^{n} 1{xi ≤ x}

By the Glivenko-Cantelli theorem, F̂n(x) → F(x) uniformly in x
Order statistic: X(1) ≤ X(2) ≤ . . . ≤ X(n)
Viewing x(k),n as a realisation of X(1),n, . . . , X(n),n we have

x(k),n = π̂q  for  (k − 1)/n < q ≤ k/n

since the sample quantile is

π̂q = inf{x : q ≤ F̂n(x)}
Normal probability plots

If the normality assumption is true, then the q-th sample quantile will be approximately equal to µ + σΦ^{−1}(q):

Φ((πq − µ)/σ) = P((X − µ)/σ ≤ (πq − µ)/σ) = q  ⇒  (πq − µ)/σ = Φ^{−1}(q).

Hence, plotting (x(k),n − µ̂)/σ̂ versus Φ^{−1}(k/(n + 1)) gives a graphical test for normality
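As a sketch, the plot above can be built by hand (qqnorm automates this; the simulated data are an illustration choice):

# Sketch: normal probability plot built from the formula above
set.seed(1)
x <- rnorm(200,mean=1,sd=2)
n <- length(x)
z <- (sort(x) - mean(x))/sd(x) # standardised order statistics
q <- qnorm((1:n)/(n+1)) # theoretical quantiles Phi^{-1}(k/(n+1))
plot(q,z,xlab="theoretical quantiles",ylab="sample quantiles")
abline(0,1,lty=2) # points fall near this line under normality
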
R - Q-Q Plot

# Create sample data


x <- rcauchy(100) # Undefined moments

# Standard normal fit


qqnorm(x)
qqline(x)

# S&P 500 returns


y = (SP500$r500 - mean(SP500$r500))/sd(SP500$r500)
qqnorm(y)
qqline(y)

Linear regression

Linear regression

Y , output variable
X = (X1 , . . . , Xp ), input variables
How does Y relate to X ?
Model assumption:

Y = β0 + β1 X1 + . . . + βp Xp + ε

where ε is an error with E[ε|X1, . . . , Xp] = 0
Regression coefficients

Model

Y = β0 + β1 X1 + . . . + βp Xp + ε

β0: intercept
βj: slope,

βj = ∂/∂Xj E[Y |X1, . . . , Xp].

βj gives the change in the expected value of Y when Xj changes by one unit
Regression - assumptions

Conditional expectation assumed to be linear

E[Y |X] = β0 + β1 X1 + . . . + βp Xp

Uncorrelated additive (Gaussian) noise: ε1, . . . , εn are uncorrelated (independent)
Constant variance: Var(εi) = σε²
Least squares estimation - 1D input variable

n independent observations (x1, y1), . . . , (xn, yn) from

Yi = β0 + β1 Xi + εi

Least squares estimation: find β0 and β1 that minimise

RSS(β) = Σ_{i=1}^{n} (yi − (β0 + β1 xi))²

Solution

β̂1 = Σ_{i=1}^{n} yi (xi − x̄) / Σ_{i=1}^{n} (xi − x̄)²
β̂0 = ȳ − β̂1 x̄

Fitted values ŷi = β̂0 + β̂1 xi
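A minimal sketch checking the closed-form estimates above against lm(), on simulated data (all names are illustration choices):

# Sketch: closed-form least squares vs. lm()
set.seed(1)
x <- rnorm(100)
y <- 1 + 2*x + rnorm(100)
b1 <- sum(y*(x - mean(x)))/sum((x - mean(x))^2)
b0 <- mean(y) - b1*mean(x)
c(b0,b1)
coef(lm(y ~ x)) # should agree
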
Least squares estimation - p-dim input variable

n independent observations (x1, y1), . . . , (xn, yn), where xi = (xi1, . . . , xip)ᵀ, from

Yi = β0 + β1 Xi1 + · · · + βp Xip + εi

Organise in matrices: y = (y1, . . . , yn)ᵀ, X with rows (1, xiᵀ) and β = (β0, . . . , βp)ᵀ, such that

RSS(β) = (y − Xβ)ᵀ(y − Xβ)

If X has full rank, the unique minimising solution is given by

β̂ = (XᵀX)^{−1} Xᵀ y

Fitted values ŷ = Xβ̂
Sampling properties of β̂
Unbiased estimate with variance

Var(β̂) = (XᵀX)^{−1} σε²,  σ̂ε² = (1/(n − p − 1)) Σ_{i=1}^{n} (yi − ŷi)²

If errors are Gaussian

β̂ ∼ N(β, (XᵀX)^{−1} σε²)
(n − p − 1) σ̂ε² ∼ σε² χ²_{n−p−1}

Gives 1 − α confidence intervals/regions

[β̂j − z^{(1−α/2)} √vj σ̂ε ,  β̂j + z^{(1−α/2)} √vj σ̂ε]

Cβ = {β : (β̂ − β)ᵀ XᵀX (β̂ − β) ≤ σ̂ε² χ²_{p−1}^{(1−α/2)}}
Sampling properties of β̂

z^{(1−α/2)} and χ²_{p−1}^{(1−α/2)} are the 1 − α/2 quantiles of the normal and chi-square distributions respectively (the intervals are approximately correct even if errors are non-Gaussian)
The confidence region for β generates a corresponding confidence region for the true regression function f(x) = (1, x)ᵀβ, namely

Cf(x) = {(1, x)ᵀβ : β ∈ Cβ}
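As a sketch, per-coefficient intervals and the estimated covariance above are available directly from a fitted lm object (simulated data as before):

# Sketch: interval estimates for fitted coefficients
set.seed(1)
x <- rnorm(100); y <- 1 + 2*x + rnorm(100)
fit <- lm(y ~ x)
confint(fit,level=0.95) # per-coefficient 1-alpha intervals
vcov(fit) # estimated (X'X)^{-1} * sigma^2
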
Inference about model
Gives a way of drawing inference about β
Consider the null hypothesis βj = 0. Then

tj = β̂j / (σ̂ε √vj),  vj = j-th diagonal element of (XᵀX)^{−1},

is distributed as t_{n−p−1}. Hence, a large |tj| =⇒ reject H0
Similarly, for a group of parameters βg ⊂ β, we may test H0: βg = 0 with the F statistic

F = ((RSS0 − RSS)/(p − p0)) / (RSS/(n − p − 1))

where RSS0 is for the smaller model with βg = 0 and p0 + 1 (non-zero) parameters
Under the null hypothesis, F is distributed as F_{p−p0, n−p−1}
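A sketch of this F-test via nested models and anova() (simulated data; x2 and x3 have zero true coefficients by construction):

# Sketch: F-test for a group of coefficients
set.seed(1)
n <- 200
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y <- 1 + 2*x1 + rnorm(n)
fit0 <- lm(y ~ x1) # smaller model (beta_g = 0)
fit1 <- lm(y ~ x1 + x2 + x3) # full model
anova(fit0,fit1) # F statistic as in the formula above
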
R - Linear regression
# Linear regression

spx = SP500$r500
y = spx[-(1:2)]
X = cbind(rep(1,length(spx)-2),spx[-c(1,length(spx))],
spx[-c(length(spx)-1,length(spx))] )

install.packages("scatterplot3d")
library(scatterplot3d)
pr = par()
scatterplot3d(X[,2],X[,3],y)
par(pr)

m1 <- lm(y~X-1)
summary(m1)

X = cbind(rep(1,length(spx)-2),spx[-c(1,length(spx))] )
plot(X[,2],y)
m2 <- lm(y~X-1)
summary(m2)
R - Linear regression

beta_hat = solve(t(X)%*%X,t(X)%*%y) # least-squares estimate
sig2_eps = sum((y - X%*%beta_hat)^2)/(nrow(X)-2) # residual variance (2 parameters)
beta_cov = solve(t(X)%*%X)*sig2_eps # Var(beta_hat); base R has solve(), not ’inv’
for(i in 1:1000){ # sample from N(beta_hat, beta_cov) and overlay regression lines
beta_rnd = beta_hat + t(chol(beta_cov))%*%rnorm(length(beta_hat))
abline(beta_rnd,col=grey(0.7))
}
Model selection - information criteria
Akaike Information Criterion (AIC) is defined as

AIC = −(2/N) log(max likelihood) + (2/N) × (number of parameters)

Recall the likelihood L(θ|y) = P(y|θ); with d parameters,

AIC = −(2/N) Σ_{i=1}^{N} log P(yi|θ̂) + 2d/N

The first component of AIC measures "goodness of fit"
The second component of AIC penalises the number of parameters
Bayesian information criterion (BIC)

BIC = −(2/N) Σ_{i=1}^{N} log P(yi|θ̂) + (ln(N)/N) × (number of parameters)
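As a sketch, R's AIC() and BIC() return −2 log L plus the penalty without the 1/N scaling above, which does not change the ranking of models (simulated data for illustration):

# Sketch: comparing models by information criteria
set.seed(1)
n <- 200
x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + 2*x1 + rnorm(n)
AIC(lm(y ~ x1)); AIC(lm(y ~ x1 + x2)) # smaller is better
BIC(lm(y ~ x1)); BIC(lm(y ~ x1 + x2)) # BIC penalises extra parameters harder
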
Model quality - goodness of fit

How much of the variation in Y can be predicted if one knows X1, . . . , Xp? (ANOVA)

total SS = Σ_{i=1}^{n} (yi − ȳ)²  (∝ variance of the data)
regression SS = Σ_{i=1}^{n} (ŷi − ȳ)²  (explained sum of squares)
residual SS = Σ_{i=1}^{n} (ŷi − yi)²  (residual sum of squares)
Sums of squares and R 2

Always

total SS = explained SS + residual SS

Define

R² = explained SS / total SS

Thus R² ≤ 1 measures the proportion of the total variation of Y that can be explained by the linear model
If residual SS = 0 =⇒ R² = 1 (i.e., zero prediction error)
If explained SS = 0 (i.e., ŷi = ȳ) there is no point in the regression model
The closer R² is to 1, the better
Part I: linear time series

Time series - definition

Time series data: a sequence of observations in chronological order

x = (x0, x1, . . . , xn)

often assumed to be the observed outcome of a
Stochastic process (time series model): a sequence of random variables with the index representing (discrete) time

X = {Xt}t
Objective

The objective of time series analysis is to find the dynamic


properties of X , i.e. the dependence of Xt on its past values
Xt−1 , Xt−2 , . . .
We try to draw inference about a time series model X from
an observed realisation x

Asset returns

Most financial studies apply to returns data
Returns provide a scale-free summary of the investment opportunity
Returns have more attractive statistical properties than the price itself

Let Pt be the price of an asset at time t > 0
Simple return

Rt = (Pt − Pt−1)/Pt−1  ⇐⇒  1 + Rt = Pt/Pt−1

Log return

rt = log(1 + Rt) = log(Pt/Pt−1) = log Pt − log Pt−1
R - asset price vs. returns

# Mishkin tb3 - three-month Bonds


# T-bill rate (in percent, annual rate)
data(Mishkin ,package="Ecdat")
plot(Mishkin[,4])

y <- diff(log(Mishkin[,4])) # log returns


plot(y)
abline(h=0,col="grey")

Stylized facts of asset returns

Stylised facts are a collection of empirical observations from statistical analysis of financial price data (e.g. log-returns on equities, indices, exchange rates, commodity prices etc.)
Stylized facts often apply to daily log-returns (also to intra-daily, weekly, monthly, tick-by-tick data)
Stylized facts of asset returns

Return time series are not i.i.d. although they show little
serial correlation
Series of absolute or squared returns (∼variance) show
profound serial correlation
Conditional expected returns are close to zero
Extreme returns appear in clusters
Return series are leptokurtic (peaked) and heavy-tailed
(power-like tail)

Stylized facts of asset returns

[Figure: time series of log-returns, 1950–1990]
Time series model - properties

Mean and standard deviation functions

µX(t) = E[Xt],  σX(t) = √Var[Xt]

Autocovariance function

γX(s, t) = Cov[Xs, Xt]

and autocorrelation function (ACF)

ρX(s, t) = Corr[Xs, Xt] = γX(s, t) / (σX(s) σX(t))

Recall that the first two moments fully characterise a Gaussian process
Time series model - properties

Stationarity
“The same type of stochastic behaviour of a stochastic
process from one time period to another”
For instance, financial returns can vary, but their stochastic
properties (mean, std, ...) are often similar in each period
Attention: returns often show stationary behaviour, not asset
prices, which tend to increase over time
Important theme: transforming time series data to obtain
stationary behaviour =⇒ modelling in stationary domain

Strictly and weakly stationary processes

Definition
A stochastic process X = {Xt }t
is strictly stationary if for any n, h ∈ N

P(Xt1 ≤ x1 , . . . , Xtn ≤ xn ) = P(Xt1 +h ≤ x1 , . . . , Xtn +h ≤ xn )

is weakly stationary if
i) E[Xt ] = µX for all t ≥ 0
ii) Var [Xt ] = σX2 < ∞ for all t ≥ 0
iii) γX (s + h, s) = γX (t + h, t) for all t, s, h (auto-covariance is a
function only of the time lag h)

Remarks

Strict stationarity does not imply weak stationarity (e.g. E[Xt²] = ∞)
Weak stationarity does not imply strict stationarity (e.g. E[Xt^p], p > 2, may vary over time)
The autocorrelation function (ACF) of a stationary time series X = {Xt}t is given by

ρX(h) = Corr[Xh, X0] = γX(h)/γX(0),  h ∈ Z
Time average as statistical estimates

If X is a stationary time series then

lim_{n→∞} (1/n) Σ_{i=1}^{n} Xi = const.  (= µX if X is ergodic)

During this course we assume that stationary ⇒ ergodic, hence it makes sense to set

µ̂X = (1/n) Σ_{i=1}^{n} xi

γ̂X(h) = (1/n) Σ_{i=1}^{n−|h|} (xi − µ̂X)(x_{i+|h|} − µ̂X),  ρ̂X(h) = γ̂X(h)/γ̂X(0)
Autocorrelation function (ACF)

# S&P 500 - ACF

getSymbols("SPY",src="yahoo")
head(SPY)
plot(SPY$SPY.Close)
SPY.rtn = diff(log(SPY$SPY.Close))
plot(SPY.rtn)
acf(SPY.rtn[-1]) # plot autocorrelation function
acf(SPY.rtn[-1]^2) # x[-1] : skip the first element

The search for stationarity - decomposition

It is common practice to work with mean-zero time series Xt − µX (tricky if we have µX(t))
Sometimes data x(t) can be described as

x(t) = m(t) + p(t) + x̃(t),

where m(t) is a deterministic monotone function, p(t) a deterministic periodic function and x̃(t) a mean-zero stationary time series
See example "Stationarity: CO2 at Mauna Loa"
The search for stationarity - integration

The difference operator ∇ is given by

∇Xt = Xt − Xt−1

Taking differences can turn a non-stationary time series into a stationary one (remember returns vs. prices!)
We say that a time series is integrated of order one (or that it has one unit root) if its difference is stationary. We denote this class of time series by I(1)
The search for stationarity - integration

The first difference cancels a constant term; the second-order difference cancels a linear trend (m(t) = at + b):

∇²Xt = Xt − 2Xt−1 + Xt−2

Successive application of the difference operator can remove any polynomial trend

X ∈ I(p) means that ∇^p X = {∇^p Xt}t is stationary

The order of integration can be assessed using unit root tests, as in the sketch below
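A sketch of a unit root check with the augmented Dickey-Fuller test; adf.test lives in the 'tseries' package, which is assumed available:

# Sketch: ADF unit root test on a simulated random walk
install.packages("tseries")
library(tseries)
set.seed(1)
x <- cumsum(rnorm(500)) # random walk, I(1)
adf.test(x) # large p-value: cannot reject a unit root
adf.test(diff(x)) # small p-value: the difference looks stationary
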
The search for stationarity - Box-Cox transformations

Stabilisation of variance for X̃t = Xt − µX(t)
In the case when

Var[Xt] = φ(µX(t))² σ²

we can take Yt = Ψ(X̃t), where Ψ′(x) = 1/φ(x) and Ψ is invertible
Yt = Ψ(X̃t) ≈ Ψ(µX̃(t)) + Ψ′(µX(t))(X̃(t) − µX(t)), with approximate variance σ²
Examples:
Ψ(µ) = log µ is recommended when the standard deviation varies like the mean (φ(µ) = µ)
Ψ(µ) = √µ is recommended when the variance varies like the mean (φ(µ) = √µ)
Stationarity

# Stationarity: CO2 at Mauna Loa


data(package="datasets")
data(co2,package="datasets")
plot(co2)
co2.stl= stl(co2,"periodic")
?stl
head(co2.stl$time.series)
plot(co2.stl)
plot(co2.stl$time.series[,3],ylab="CO2 data, remainder")

plot(diff(co2,differences = 2))

First example of time series model - white noise

Definition ((Strict) white noise)

{Xt}t∈Z is a white noise process if it is stationary, has zero mean, ρ(h) = 1{h=0} and γ(0) = σ² < ∞. It is denoted WN(0, σ²).
{Xt}t∈Z is a strict white noise process if it is a sequence of i.i.d. random variables with γ(0) = σ² < ∞ and zero mean. We write SWN(0, σ²).

A time series w0, . . . , wn is said to form white noise if the values are observations of i.i.d. mean-zero random variables {Wt}t∈Z; it is obviously stationary
Cov(Ws, Wt) = E[Ws Wt] = 0 for s ≠ t
E[Wt+i |W1, . . . , Wi] = 0, so there is no predictive power (the modelling process is complete)
R - white noise

# White noise
WN <- rnorm(1024,0,1)
ts.plot(WN)
acf(WN,40,"covariance")
acf(WN,40,"correlation")

# check for normality


qqnorm(WN)
qqline(WN)

Linear time series

Definition
A time series {Xt}t∈Z is said to be linear if it can be written as

Xt = µ + Σ_{i=0}^{∞} Ψi Wt−i,

where µ is the constant mean of X, Ψ0 = 1, and {Wt} is a white noise series. Moreover, Σ_{i=0}^{∞} |Ψi| < ∞ (absolute summability condition).

E[Xt] = µ,  Var[Xt] = σ² Σ_{i=0}^{∞} Ψi².

weakly stationary =⇒ Var[Xt] < ∞ =⇒ Ψi² → 0
Linear time series

Σ_{i=0}^{∞} |Ψi| < ∞ implies that

E[| Σ_{i=0}^{∞} Ψi Wt−i |] < ∞
Linear time series

The ACF of stationary X is

γ(h) = Cov[Xt, Xt−h] = E[(Σ_{i=0}^{∞} Ψi Wt−i)(Σ_{j=0}^{∞} Ψj Wt−h−j)]
     = E[Σ_{i,j=0}^{∞} Ψi Ψj Wt−i Wt−h−j] = Σ_{j=0}^{∞} Ψ_{j+h} Ψj E[W²_{t−h−j}]
     = σ² Σ_{j=0}^{∞} Ψ_{j+h} Ψj
Random walk

We say that {Xn }n is a random walk if there exists a white


noise W such that

Xn = X0 + W1 + . . . + Wn .

E[Xn ] = E[X0 ]
Var[Xn ] = Var[X0 ] + nσ 2
Variance is changing with n =⇒ non-stationarity
But {Xn }n is I (1)
Example of a unit-root non-stationary time series.

R - random Walk

# Random Walk
WN <- rnorm(1024,0,1)
RW <- cumsum(WN)
acf(RW,40,"covariance")
acf(RW,40,"correlation")
acf(diff(RW),40,"correlation")

Auto regressive time series (AR model)

Definition
A mean-zero time series {Xn }n is AR(p) if

Xn = φ1 Xn−1 + φ2 Xn−2 + . . . + φp Xn−p + Wn ,

where φi ∈ R
Subtract the sample mean from the data before trying to model
with (mean-zero) AR processes

AR(1) series

AR(1) series

Xt = φ1 Xt−1 + Wt

We can rewrite this iteratively as

Xt = φ1 Xt−1 + Wt = Wt + φ1 Wt−1 + φ1² Wt−2 + . . . = Σ_{i=0}^{∞} φ1^i Wt−i

Fits the definition of a linear time series (φ1 = 1 =⇒ random walk starting at “minus infinity”)
AR(1) series

Var[Xt] = φ1² Var[Xt−1] + σ²
Weak stationarity assumption: Var[Xt] = Var[Xt−1] < ∞, hence

Var[Xt] = σ² / (1 − φ1²)

provided φ1² < 1. Hence

weak stationarity =⇒ −1 < φ1 < 1

It turns out that the opposite implication is also true.
The ACF is given by

ρ(h) = φ1 ρ(h − 1) =⇒ ρ(h) = φ1^h, since ρ(0) = 1

(hint: multiply Xt by Xt−h and take expectations)
R - AR(1) series

# AR(IMA) process
# order = c(p,d,q) where p is AR(p), d differencing,
# q is MA(q)
x = arima.sim(list(order=c(1,0,0),ar=0.9),n=1000)
ts.plot(x)
acf(x,40,type="correlation")
lines(0.9^(0:40),lty=2)

Estimation: Yule-Walker equations

Assume we have AR(2) model

Xn = φ1 Xn−1 + φ2 Xn−2 + Wn (1)

We want to estimate φ1 , φ2 (and possibly σ)


Multiply (1) by Xn , then Xn−1 , finally by Xn−2
Use the fact that E [Wn Xn−1 ] = 0 and E [Wn Xn−2 ] = 0
We obtain the Yule-Walker equations

γX (0) = φ1 γX (1) + φ2 γX (2) + σ 2


γX (1) = φ1 γX (0) + φ2 γX (1)
γX (2) = φ1 γX (1) + φ2 γX (0)

Solving with sample covariances gives one way of estimating


φ1 , φ2 , σ (method of moments)
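A sketch of the method of moments on simulated AR(2) data, solving the last two Yule-Walker equations with sample autocovariances (parameter values are illustration choices; ar.yw is the built-in Yule-Walker fit):

# Sketch: Yule-Walker estimates for an AR(2)
set.seed(1)
x <- arima.sim(list(order=c(2,0,0),ar=c(0.5,0.3)),n=2000)
g <- acf(x,lag.max=2,type="covariance",plot=FALSE)$acf # gamma(0), gamma(1), gamma(2)
G <- matrix(c(g[1],g[2],g[2],g[1]),2,2)
(phi <- solve(G,g[2:3])) # (phi1, phi2) from the 2nd and 3rd equations
g[1] - sum(phi*g[2:3]) # sigma^2 from the 1st equation
ar.yw(x,order.max=2,aic=FALSE)$ar # built-in Yule-Walker fit, should agree
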
AR(2) process
We have

ρ(1) = φ1 / (1 − φ2)  (2nd eqn)
ρ(h) = φ1 ρ(h − 1) + φ2 ρ(h − 2),  h ≥ 2  (3rd eqn)

Defining the backshift operator by B^k ρ(h) = ρ(h − k), we may write the characteristic equation

(1 − φ1 B − φ2 B²) ρ(h) = 0

The inverses of its solutions are the characteristic roots. For example, from roots ω1 and ω2 we have

(1 − ω1 B)(1 − ω2 B) ρ(h) = 0

Complex roots give rise to seasonality in the time series
Stationarity and characteristics roots

For AR(1) we had ρ(h) = φ1 ρ(h − 1), hence the characteristic equation has the form

1 − φ1 z = 0,  with characteristic root ω = 1/z0 = φ1,

and stationarity iff |φ1| = |ω| < 1
For stationarity of an AR(p) process:
all characteristic roots must be less than 1 in modulus
this ensures that the ACF of the model converges to 0 as the lag h increases
Estimation of AR(p) process

Mean-zero AR(p) is given by

Xn = φ1 Xn−1 + φ2 Xn−2 + . . . + φp Xn−p + Wn ,

ACF satisfies

(1 − φ1 B − φ2 B 2 − . . . − φp B p )ρ(h) = 0, for h > 0.

Setting h = 1, . . . , p gives a set of p equations for the unknowns (φ1, . . . , φp) =⇒
Method of moments estimation (Yule-Walker)
Regression of Xn on (Xn−1, . . . , Xn−p) =⇒ least-squares estimation
Alternatives exist: MLE, prediction error method etc.
Finding the order

The optimal order p of the model can be determined by the AIC criterion or the partial autocorrelation function (PACF)
Partial autocorrelation function

Given (Xt−m+1, Xt−m+2, . . . , Xt), the best linear predictor of Z is

Et^m(Z) = α1 Xt−m+1 + α2 Xt−m+2 + . . . + αm Xt

which minimises the mean square error

E[(Z − (α1 Xt−m+1 + . . . + αm Xt))²]

The k-th partial autocorrelation is defined as

ΦX(k) = Corr[Xt − E_{t−1}^{k−1}(Xt), Xt−k − E_{t−1}^{k−1}(Xt−k)]

In other words: the PACF at lag k is the correlation between Xt and Xt−k after their linear dependence on the intermediate variables (Xt−k+1, . . . , Xt−1) has been removed
The k-th partial autocorrelation Φ(k) can be calculated from the autocorrelation coefficients
If X is an AR(p) process, then Φ(k) = 0 for k > p
R - partial autocorrelation function

# US Gross National Product 1946-2010

data = read.table("Data/q-gnp4710.txt",header=T)
head(data)
tail(data)
gnp = data$VALUE
gnp.r = diff(log(gnp))
tVec = seq(1947,2010,length.out=nrow(data)) # create time-index
plot(tVec,gnp,xlab=’year’,ylab=’GNP’,type="l")
plot(tVec[-1],gnp.r,type="l",xlab="year",ylab="growth"); abline(h=0)
acf(gnp.r,lag=12)
pacf(gnp.r,lag=12,ylim=c(-1,1))

R - AIC criteria for order determination

# Order determination for AR(p)


m1 = arima(gnp.r,order=c(3,0,0)) # fit AR(3) model
m1

# AIC criteria
?ar # ‘ar’ also fits order ’p’!
m2 = ar(gnp.r,method=’mle’)
m2$order # Find the identified order
names(m2)
print(m2$aic,digits=3)
plot(c(0:12),m2$aic,type=’h’,xlab=’order’,ylab=’AIC’)
lines(0:12,m2$aic,lty=1)

Model checking - residuals

Assume we have a time series x1, . . . , xT observed from

Xt = φ1 Xt−1 + . . . + φp Xt−p + Wt

We estimate φ̂i of φi and calculate (1-step) predictions

x̂t = φ̂1 xt−1 + . . . + φ̂p xt−p,  t ≥ p + 1

The residual series is defined as

ŵt = xt − x̂t

Using least squares, we can calculate the estimate

σ̂W² = (1/(T − 2p − 1)) Σ_{t=p+1}^{T} ŵt²
Model checking

A fitted model should always be examined carefully


If the model is adequate the residual series should behave as a
white noise
ACF and Ljung-Box statistics (next slide) can be used to check the closeness of {ŵt} to a white noise
Type tsdiag(m1, gof=12) in R (GNP example)
Ljung-Box

Define the lag-k sample autocorrelation (x̄ denotes the sample mean)

ρ̂k = Σ_{t=k+1}^{T} (xt − x̄)(xt−k − x̄) / Σ_{t=1}^{T} (xt − x̄)²,  0 ≤ k < T − 1.

If {Xt} is an i.i.d. sequence satisfying E[Xt²] < ∞, then ρ̂k is asymptotically normal with mean 0 and variance 1/T.
This allows us to test individual ACF components; H0: ρk = 0, k > 0
Ljung-Box

Ljung-Box test

Q(m) = T(T + 2) Σ_{i=1}^{m} ρ̂i² / (T − i).

Under H0, Q(m) is chi-squared with m − p degrees of freedom
If {Xt} is an i.i.d. sequence, then {|Xt|} is also i.i.d., so it is good practice to apply the ACF and the Ljung-Box test to the absolute series as well
Ljung-Box

# Box-Ljung test

vw = read.table("Data/m-ibm3dx2608.txt",header=T)[,3]
plot(vw,type=’l’)
m3 = arima(vw-mean(vw),order=c(3,0,0)) # fit AR(3)-model

names(m3)
Box.test(m3$residuals,lag=12,type=’Ljung’)
pv = 1 - pchisq(16.352,9) # p-value with 9 dof = #lags - #params
pv # Small p-value => reject H0: white noise

m4 = arima(vw-mean(vw),order=c(3,0,0),fixed=c(NA,0,NA,NA))
m4
Box.test(m4$residuals,lag=12,type=’Ljung’)
pv = 1 - pchisq(16.828,10) # Compute p-value, 10 dof
pv

Prediction with AR(p)

Assume that {Xt}t is AR(p) and we know its coefficients

Xt+1 = φ1 Xt + . . . + φp Xt+1−p + Wt+1

Assume we have observations up until time t, i.e. xs for s ≤ t
Denote X̂t+1|t = Et[Xt+1] (conditional expectation)
Since Et[Wt+1] = 0 and Et[Xs] = Xs for s ≤ t, we get

X̂t+1|t = φ1 Xt + . . . + φp Xt−p+1
et(1) = Xt+1 − X̂t+1|t = Wt+1 (the prediction error)
Var[et(1)] = σW²
Prediction with AR(p)

Next: 2-step predictions

Xt+2 = φ1 Xt+1 + . . . + φp Xt−p+2 + Wt+2

Hence

X̂t+2|t = φ1 X̂t+1|t + φ2 Xt + . . . + φp Xt−p+2
et(2) = φ1 et(1) + Wt+2 = φ1 Wt+1 + Wt+2
Var[et(2)] = (1 + φ1²) σW²
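A sketch of these predictions from a fitted model (simulated data; predict on an arima fit returns both point forecasts and standard errors):

# Sketch: multi-step predictions from a fitted AR(2)
set.seed(1)
x <- arima.sim(list(order=c(2,0,0),ar=c(0.5,0.3)),n=500)
fit <- arima(x,order=c(2,0,0))
predict(fit,n.ahead=2) # $pred and $se; se grows as in Var[e_t(2)] above
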
Moving average time series

We say that X ∼ MA(q) if

Xt = Wt + θ1 Wt−1 + θ2 Wt−2 + . . . + θq Wt−q , θi ∈ R.

MA(q), q < ∞, is stationary by definition

Moving average time series

Variance

Var[Xt] = (1 + θ1² + . . . + θq²) σW²

Autocorrelation function for MA(1):

Xt−h Xt = Xt−h Wt + θ1 Xt−h Wt−1

Taking expectations we obtain

γ(1) = θ1 σW²,  γ(h) = 0 for h > 1

The autocorrelation function of an MA(q) vanishes for lags greater than q (cf. the PACF for AR processes)
Backward shift operator

BXt = Xt−1 and B^k Xt = Xt−k
AR(p) can be written as φ(B)Xt = Wt, where

φ(z) = 1 − φ1 z − φ2 z² − . . . − φp z^p

MA(q) can be written as Xt = θ(B)Wt, where

θ(z) = 1 + θ1 z + θ2 z² + . . . + θq z^q
ARMA and ARIMA

X ∼ ARMA(p, q) if

φ(B)Xt = θ(B)Wt

ARMA(p, q) is stationary if the characteristic roots ωi of φ(z) satisfy |ωi| < 1
X ∼ ARIMA(p, d, q) ⇐⇒ ∇^d X ∼ ARMA(p, q)
Fitting models to data in practice

Transform the data (e.g. using a Box-Cox transformation) to stabilise the variance
Difference successively to remove linear/polynomial trends
Remove seasonal components
Examine the ACF/PACF: is an AR(p) or MA(q) model appropriate?
Try your chosen model(s), and use AIC to search for a better model
Check the residuals from your chosen model by plotting the ACF of the residuals, and do a Ljung-Box test
If the residuals form a white noise, the model fitting stops
Fitting an AR in practice

Compute the autocorrelation function and check if it decays fast
Estimate the coefficients and the variance of the noise
Use the AIC criterion or the PACF to determine the order
Compute residuals and test for white noise to decide if the model gives a good fit
Fitting an MA in practice

Compute the autocorrelation function and check if it vanishes from some point on
Confirm the order with the AIC criterion
Estimate the coefficients by maximum likelihood
Compute residuals and test for white noise
Fitting an ARMA

Attempt to fit AR and compute residuals


Attempt to fit MA to the AR-residuals, or to original data if
the AR fit was not satisfactory
Analyse the residuals

Exponential smoothing

Assume we believe in geometrically decaying weights on past observations,

X̂h+1 ≈ Σ_{j≥1} w^{j−1} Xh+1−j,  w ∈ (0, 1).

Since Σ_{j≥1} w^{j−1} = 1/(1 − w) and we want the weights to sum to one, we define

X̂h+1 = (1 − w) Σ_{j≥1} w^{j−1} Xh+1−j,  w ∈ (0, 1).

Can be analysed using an ARIMA(0,1,1) model
A common technique for forecasting
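A sketch of both routes on simulated data: the ARIMA(0,1,1) view above, and HoltWinters with trend and seasonality switched off, which is classic simple exponential smoothing (series and seed are illustration choices):

# Sketch: exponential smoothing two ways
set.seed(1)
x <- cumsum(rnorm(300)) # a non-stationary level to be smoothed
fit <- arima(x,order=c(0,1,1)) # the MA coefficient relates to the weight w
predict(fit,n.ahead=1)$pred
HoltWinters(x,beta=FALSE,gamma=FALSE) # simple exponential smoothing
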
Regression models with time series errors

Often we are interested in the relationship between two (or more) time series
Consider the linear regression

Yt = α + βXt + et,

where Yt, Xt are two time series and et denotes the error term.
We can use least squares to estimate α and β, but it is common for et to be serially correlated
R - regression models with time series errors

# Regression model with time-series errors

# 1-year / 3-year treasury rates, weekly data


r1 = read.table("Data/w-gs1yr.txt",header=T)[,4]
r3 = read.table("Data/w-gs3yr.txt",header=T)[,4]
tVec = c(1:2467)/52+1962
plot(tVec,r1,xlab="",ylab="rate",type="l")
lines(tVec,r3,lty=1,col=’red’)
legend(x=’topleft’,legend=c(’1year’,’3year’),lty=c(1,1),col=c(’black’,’red’))
plot(r1,r3,type="p")

# Linear regression
lm1 = lm(r3~r1)
summary(lm1)
plot(tVec,lm1$residuals,type=’l’); abline(h=0)
acf(lm1$residuals,lag=36)

R - cont.

# Look at first differences


c1 = diff(r1)
c3 = diff(r3)
plot(tVec[-1],c1,xlab="",ylab="diff(rate)",type="l")
lines(tVec[-1],c3,lty=1,col=’red’)
legend(x=’topleft’,legend=c(’1year’,’3year’),lty=c(1,1),col=c(’black’,’red’))
plot(c1,c3); abline(0,1)

# Linear regression
lm2 = lm(c3~-1+c1) # ’-1’ for no intercept
summary(lm2)
plot(lm2$residuals,type="l")
acf(lm2$residuals,lag=36)
pacf(lm2$residuals,lag=36,ylim=c(-1,1))

Regression models with time series errors

The ACF still shows significant serial correlation, but the magnitudes are much smaller than before
We can fit an ARMA model to the residual errors
Next we try MA(1) and AR(p) as models for the noise
R - cont.

# MA(1) / AR(p) for regression residuals

r = lm2$residuals
m.ma = arima(r,order=c(0,0,1),include.mean=F)

plot(m.ma$residuals); abline(h=0)
acf(m.ma$residuals)
pacf(m.ma$residuals,ylim=c(-1,1))
Box.test(m.ma$residuals,lag=10,type=’Ljung’)
1 - pchisq(50.344,9)

m.ar = ar(r); print(m.ar)


acf(m.ar$resid[-(1:21)])
pacf(m.ar$resid[-(1:21)],ylim=c(-1,1))
Box.test(m.ar$resid[-(1:21)],lag=23,type=’Ljung’)
1 - pchisq(1.7823,1)

Regression models - summary

1. Fit the linear regression model and check serial correlation of the residuals
2. If the residual series is unit-root non-stationary, take the first difference of both the dependent and explanatory variables. Go to step 1
3. If the residual series appears to be stationary, identify an ARMA model for the residuals and modify the linear regression model accordingly
Backtesting

Divide the data into an estimation set and a prediction set
Estimate the model from the estimation set and use the fitted model for 1-step predictions
Step forward through the prediction set
Calculate the mean-square prediction error

(1/(T − h)) Σ_{j=h}^{T−1} [ej(1)]²

as in the sketch below
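A minimal sketch of this scheme on simulated data: refit an AR(2) at each step of the prediction set and accumulate squared 1-step errors (sizes and model order are illustration choices):

# Sketch: rolling 1-step backtest
set.seed(1)
x <- arima.sim(list(order=c(2,0,0),ar=c(0.5,0.3)),n=600)
h <- 500 # size of the estimation set
e <- numeric(length(x)-h)
for (j in h:(length(x)-1)) {
  fit <- arima(x[1:j],order=c(2,0,0)) # refit on data up to time j
  e[j-h+1] <- x[j+1] - predict(fit,n.ahead=1)$pred # 1-step error
}
mean(e^2) # mean-square prediction error
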
Model averaging

Sometimes there are many reasonable models
A good idea is to take an average of the models, with weights that sum to 1 (see the sketch below)
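A small sketch of an equal-weight average of two 1-step forecasts (models and weights are illustration choices):

# Sketch: model averaging of two forecasts
set.seed(1)
x <- arima.sim(list(order=c(2,0,0),ar=c(0.5,0.3)),n=500)
p.ar <- predict(arima(x,order=c(2,0,0)),n.ahead=1)$pred
p.ma <- predict(arima(x,order=c(0,0,2)),n.ahead=1)$pred
0.5*p.ar + 0.5*p.ma # weights sum to one
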
Part II: nonlinear time series

Volatility

A notion of the variability of price returns, often used as a measure of risk
Not directly observable =⇒ we need to estimate it from observed returns
Many different modelling approaches
Volatility - characteristics

Volatility clusters
Volatility jumps are rare
Volatility shows mean reversion
Volatility seems to react differently to big price increases and big drops (the leverage effect)
Volatility models

Volatility as conditional standard deviation of daily returns


Implied volatility
Realised volatility measures (from high frequency data)

Nonlinear AR

We say that {Xt }t is a nonlinear AR(p) process if

Xt = µ(Xt−1 , . . . , Xt−p ) + σ(Xt−1 , . . . , Xt−p )Wt

Let p = 1 and {Wt }t be i.i.d. N(0, 1) then

Xt |Xt−1 ∼ N(µ(Xt−1 ), σ(Xt−1 )2 )

Motivation

Plot the autocorrelation for data with non-linearity: it looks like white noise
Try QQ plots =⇒ looks non-normal?
For Gaussian processes only: no correlation =⇒ lack of dependence
Square the data and plot the ACF
R - Nonlinear AR

# Non-linearity: S&P 500

ts.plot(SP500); abline(h=0)
acf(SP500)
qqnorm(SP500$r500); qqline(SP500$r500)

# Non-normal, zero-correlation does not guarantee independence!


acf(SP500$r500^2)

# Treasury-bill rates
data(Tbrate,package="Ecdat")
Tbill <- Tbrate[,1]
d.Tbill <- diff(Tbill)
qqnorm(d.Tbill)
qqline(d.Tbill)
acf(d.Tbill)
acf(d.Tbill^2)

Nonlinear AR

# Intel data (monthly Intel log-returns):
da = read.table("data/m-intcsp7309.txt",header=T)
head(da)
X = log(da$intc+1)
rtn = ts(X,frequency=12,start=c(1973,1))
plot(rtn,type="l",xlab="year",ylab="Intel log-return"); abline(h=0)
Box.test(X,lag=12,type="Ljung")
# -> can not reject H0: rho(h)=0, h>0

acf(X,lag=24)
acf(abs(X),lag=24)
Box.test(abs(X),lag=12,type="Ljung")
# -> reject H0: rho(h)=0, h>0

Model structure

We have series {Xt } of log-returns


Usually uncorrelated or with minor lower order serial
correlation
We aim to model

µt = E[Xt |Ft−1 ], σt2 = Var[Xt |Ft−1 ]

Building a model

Specify the mean equation (an ARMA model) to remove linear dependence
Use the residuals to test for ARCH effects: consider (Xt − µt)² and use it in a Ljung-Box test (for example)
Build a model for the volatility and perform joint estimation
Check the final model and refine if necessary
ARCH models

{Xt }t is said to be of type ARCH(p) if

Xt = σt Wt,  {Wt}t is white noise,

where

σt² = α0 + Σ_{j=1}^{p} αj X²t−j

Require: α0 > 0 and αj ≥ 0 for j = 1, 2, . . . , p, so that σt² > 0
ARCH(1)

Stationarity

Xt² = σt² Wt²  where  σt² = α0 + α1 X²t−1

Xt² = α0 Wt² + α1 X²t−1 Wt²
    = . . .
    = α0 Σ_{j=0}^{∞} α1^j Wt² W²t−1 · · · W²t−j

E[Xt] = E[E[Xt|Xt−1]] = 0 and E[Xt²] = α0 Σ_{j=0}^{∞} α1^j
Hence, {Xt}t is weakly stationary ⇐⇒ |α1| < 1, and in that case E[Xt²] = α0/(1 − α1)
ARCH(1)

Auto-covariance: note that

E[Xt+h Xt] = E[E[Xt+h Xt |Wt+h−1, Wt+h−2, . . .]]
           = E[E[σt σt+h Wt Wt+h |Wt+h−1, Wt+h−2, . . .]]
           = E[σt σt+h Wt E[Wt+h]] = 0

Thus the ACF is zero for h ≠ 0
Meanwhile, E[X²t+h Xt²] ∝ α1^h
This shows that X is a weak white noise, not a white noise in the strong sense!
ARCH(1)

Higher moments

E[Xt⁴] = 3α0²(1 + α1) / ((1 − α1)(1 − 3α1²))

This implies that 0 ≤ α1² < 1/3
Tails of Xt are heavier than those of a normal distribution:

E[Xt⁴] / (Var[Xt])² = 3(1 − α1²)/(1 − 3α1²) > 3.
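A sketch simulating an ARCH(1) to reproduce these properties (parameter values are illustration choices; α1 = 0.5 keeps 3α1² < 1 so the fourth moment exists):

# Sketch: simulate an ARCH(1) and inspect its properties
set.seed(1)
n <- 2000; a0 <- 0.1; a1 <- 0.5
w <- rnorm(n)
x <- numeric(n); x[1] <- sqrt(a0/(1-a1))*w[1] # start at stationary variance
for (t in 2:n) x[t] <- sqrt(a0 + a1*x[t-1]^2)*w[t]
acf(x) # close to white noise
acf(x^2) # pronounced serial correlation in the squares
qqnorm(x); qqline(x) # heavier tails than normal
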
ARCH pros and cons

Advantages:
The model can produce volatility clusters
X has heavy tails
Weaknesses:
Positive and negative shocks have the same effect on volatility
ARCH models are likely to over-predict the volatility
ARCH and AR

ARCH(p):

Xt = σt Wt  where  σt² = α0 + Σ_{j=1}^{p} αj X²t−j

Define ηt = Xt² − σt²; then we can write

Xt² = α0 + Σ_{j=1}^{p} αj X²t−j + ηt

This is an AR(p) for Xt², except that {ηt} might not be i.i.d.
Hence, to determine the order it is useful to look at the PACF of Xt²
ARCH model checking

Define residuals as

X̃t = Xt / σt

Check if X̃t forms an i.i.d. sequence (Ljung-Box etc.)
R - ARCH fitting to Intel data

# ARCH(p)

# install fGarch from Rmetrics


install.packages(’fGarch’)
library(fGarch)
m1 = garchFit(~garch(3,0),data=X,trace=T) # Fit an ARCH(3) model
?garchFit
summary(m1)
m2 = garchFit(~garch(1,0),data=X,trace=F)
summary(m2)
plot(m2)

pacf(X^2,lag=24,ylim=c(-1,1))
m3 = garchFit(~garch(12,0),data=X,trace=F)
summary(m3)
plot(m3)

GARCH

{Xt}t is said to be of type GARCH(p, q) if

Xt = µt + σt Wt

where

σt² = α0 + Σ_{j=1}^{p} αj X̃²t−j + Σ_{j=1}^{q} θj σ²t−j,

with X̃t = Xt − µt
Hence X̃t |Ft−1 ∼ N(0, σt²)
ARCH, GARCH

Set residuals εt ≡ Xt² − E[Xt²|Xt−1] = Xt² − σt²
ARCH(1):

σt² = α0 + α1 X²t−1
Xt² = α0 + α1 X²t−1 + εt

GARCH(1,1):

σt² = α0 + α1 X²t−1 + β1 σ²t−1
Xt² − εt = α0 + α1 X²t−1 + β1 (X²t−1 − εt−1)
Xt² = α0 + (α1 + β1) X²t−1 − β1 εt−1 + εt
ARCH,GARCH

{Xt}t ∼ ARCH(p) ⇐⇒ {Xt²}t ∼ AR(p)
{Xt}t ∼ GARCH(p, q) ⇐⇒ {Xt²}t ∼ ARMA(p, q)
In practice, low-order GARCH models are often used
Rule of thumb: if the fitted ARCH is of high order, a low-order GARCH is often a good alternative (see the sketch below)
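As a sketch of this rule of thumb, reusing the fGarch library and the return series X loaded in the ARCH-fitting slide above:

# Sketch: GARCH(1,1) as an alternative to a high-order ARCH
m5 = garchFit(~garch(1,1),data=X,trace=F)
summary(m5)
plot(m5)
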
Other models

IGARCH (unit-root GARCH)
GARCH-M (if returns depend on the volatility)
EGARCH (asymmetric effects between positive and negative asset returns)
and many more...
Cointegration

Suppose we could find an asset whose price was stationary (therefore mean-reverting)
Whenever the price is below the mean we buy, and realise a profit when the price reverts to its mean
If the price is above the mean we sell
However, we already know that prices are integrated!
Sometimes we can find two or more assets such that a linear combination of their prices is stationary
If that happens, the vector of coefficients of that linear combination is called a cointegration vector
Example

Consider two time series

Xn^(1) = Sn + εn^(1)  and  Xn^(2) = Sn + εn^(2),  n = 1, . . . , N,

where:
Sn = S0 + W1 + . . . + Wn, S0 = 0, is a random walk with {Wn}n being N(0, 1) white noise,
{εn^(1)} and {εn^(2)} are two independent white noise sequences that are independent of {W}
{Xn^(1)}n and {Xn^(2)}n are random walks plus noise and hence both I(1)
Example cont.

However, the linear combination

Xn^(1) − Xn^(2) = εn^(1) − εn^(2),  n = 1, . . . , N,

is stationary, since it is a white noise with normal distribution.
Cointegration is a state of several time series sharing a common (non-stationary) trend, the remainder being stationary
Example cont.

Consider two time series

Xn^(1) = Sn^1 + εn^(1)  and  Xn^(2) = Sn^2 + εn^(2),  n = 1, . . . , N,

Testing for stationarity is easy if the coefficients of the linear combination are known
Stationarity of a1 Xn^(1) + a2 Xn^(2) is equivalent to

Xn^(2) = β0 + β1 Xn^(1) + εn,

where εn is a stationary time series.
Regression analysis is very useful here
Regression analysis for cointegration

When regressing one I(1) series on another I(1) series we may face the problem that the residuals are integrated processes
However, if the processes are cointegrated, the residuals will be I(0) (previous slide)
The Phillips-Ouliaris (or Dickey-Fuller) test for cointegration is designed to test for stationarity of the residuals
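A sketch of the Phillips-Ouliaris test on a simulated pair sharing a random-walk trend, as in the example above (po.test is in the 'tseries' package, assumed installed):

# Sketch: Phillips-Ouliaris cointegration test
library(tseries)
set.seed(1)
S <- cumsum(rnorm(500)) # common stochastic trend
x1 <- S + rnorm(500); x2 <- S + rnorm(500)
po.test(cbind(x1,x2)) # small p-value: reject "no cointegration"
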
Trading with R

Useful libraries in R

TTR indicators popular in industry


PerformanceAnalytics collection of econometric functions
for performance and risk analysis
quantmod quantitative financial modelling and trading
framework
googleVis provides an interface between R and the Google
chart tools
Rmetrics open development software project for
computational finance

Basic steps

Forecasting: hypothesis, statistics, testing
Explaining data is easy; forecasting is complicated. Data is very noisy and often has little structure, so statistical significance is hard to get
Get data (think of your trading frequency and horizon)
Select tools
Construct a trading rule
Evaluate your strategy
Example

Data: SP500
Tool: we use DVI indicator (momentum indicator) from TTR
library
Trading rule: go long if DVI < 0.5 and short if DVI > 0.5
We always invest all our capital
Check performance of our strategy

Example in R
install.packages("quantmod")
install.packages("PerformanceAnalytics")
library(PerformanceAnalytics)
library(TTR)

# Step 1: Get S&P 500 data from Yahoo


getSymbols("^GSPC") # load(file=’GSPC_for_trading.Rdata’,verbose=T)
tail(GSPC)
chartSeries(GSPC,theme = chartTheme("white"))

# Step 2: Create indicator


# Calculate DVI (momentum indicator)
dvi <- DVI(Cl(GSPC)) # ’Cl’ gives closing price
?DVI
plot(dvi[,3])

# Step 3: Construct your trading rule


# Create signal: (long (short) if DVI is below (above) 0.5)
# ’Lag’ so yesterday’s signal is applied to today’s returns
sig <- Lag(ifelse(dvi[,3] < 0.5, 1, -1),k=1)
Example in R

# Step 4: Equity curve


# Calculate signal-based returns
ret <- ROC(Cl(GSPC))*sig

# Step 5: Evaluate strategy performance


# subset returns to period of interest
ret <- ret["2009-06-02/2010-09-07"]

# Use the PerformanceAnalytics package:


# Cumulative Performance
chart.CumReturns(ret)
# Performance, Drawdowns etc...
table.Drawdowns(ret, top=10)
table.DownsideRisk(ret)
charts.PerformanceSummary(ret)

Example in R

# Compare with long buy-and-hold


ret <- ROC(Cl(GSPC))["2009-06-02/2010-09-07"]
charts.PerformanceSummary(ret)

# Compare with long-only


sig <- Lag(ifelse(dvi[,3] < 0.5, 1, 0))
ret <- ROC(Cl(GSPC))*sig
ret <- ret["2009-06-02/2010-09-07"]
charts.PerformanceSummary(ret)

# Compare with random signal


set.seed(10)
sig = runif(length(Cl(GSPC))) < 0.5
sig = 2*sig-1
ret <- ROC(Cl(GSPC))*sig
ret <- ret["2009-06-02/2010-09-07"]
charts.PerformanceSummary(ret)

RSI - Example in R

# S&P500 index data from Yahoo! Finance


getSymbols("^GSPC", from="2000-01-01", to="2008-12-07")
chartSeries(GSPC,theme = chartTheme("white"))

# Calculate the RSI indicator


rsi <- RSI(Cl(GSPC),2)
plot(rsi)
?RSI

# Create the long (up) and short (dn) signals


sigup <- ifelse(rsi < 10, 1, 0)
sigdn <- ifelse(rsi > 90, -1, 0)
# Lag signals to align with days in market,
# not days signals were generated
sigup <- lag(sigup,1)
sigdn <- lag(sigdn,1)

RSI - Example in R

# Replace missing signals with no position


# (generally just at beginning of series)
sigup[is.na(sigup)] <- 0
sigdn[is.na(sigdn)] <- 0
# Combine both signals into one vector
sig <- sigup + sigdn

# Calculate Close-to-Close returns


ret <- ROC(Cl(GSPC))*sig
charts.PerformanceSummary(ret)

Remarks

Warning! It is often useful to size your position (for example, taking volatility into account)
Try this for the RSI strategy
