
Lecture notes, lectures 1-8

Econometrics (University of Manchester)


ECON20110/30370 Econometrics
2015/16 - Semester 2 - Week 1

Ralf Becker

February 4, 2016


Table of contents

State of Play
Model assumptions and parameter properties
Example: Basic Econometrics Grades (1)
Testing multiple restrictions
Example 1: Basic Econometrics Grades (2)
p-values - Revision

Overview Semester 2

Auxiliary regressions

Small Sample Properties of OLS Estimators

What next


Assumptions

Assumption 1 The model is linear in parameters.

y_i = β₀ + β₁ x_{1i} + β₂ x_{2i} + ... + β_k x_{ki} + u_i   (1)

or
y = Xβ + u   (2)

Assumption 2
Random samples {y_i, x_{1i}, ..., x_{ki}}. n observations.

Assumption 3
There is variation in the explanatory variables. Absence of perfect
multicollinearity (full rank of X).

Assumptions

Assumption 4 Zero conditional mean.

E[u_i | x_i] = E[u_i | x_{1i}, x_{2i}, ..., x_{ki}] = 0   (3)

or
E[u | X] = 0   (4)

A1 to A4 guarantee that the OLS parameter estimator is unbiased:

E[β̂] = β   (5)

Assumptions

Assumption 5
Homoskedasticity. Constant residual variance.

Var[u_i | x_i] = σ²   (6)

or
Var[u | X] = σ² I   (7)

A1 to A5 (= Gauss-Markov assumptions) guarantee that the OLS
parameter estimator is BLUE (best linear unbiased).

Assumptions
Assumption 6
Normality.

u_i ~ N(0, σ²)   (8)

or
u ~ N(0, σ² I)   (9)

This assumption implies A4 and A5.

Gauss-Markov assumptions + A6 (= classical linear regression
assumptions) guarantee that inference on β can be based on t and
F tests in samples of any size:

(β̂_i - β_i) / se(β̂_i) ~ t_{n-k-1}   (10)
F ~ F_{r, n-k-1}   (see below)

If A6 is not valid, the above inference is justified in large samples
(the associated theory is called asymptotic theory), where ~a denotes
the asymptotic distribution:

(β̂_i - β_i) / se(β̂_i) ~a t_{n-k-1} ≈ N(0, 1)   (11)
F ~a F_{r, n-k-1}

Example: Basic Econometrics Grades (1)

Semester 1 and Semester 2 grades (somewhat randomised) for
Econometrics (gradeexample.csv). 137 observations.

Regress variable sem2_i against sem1_i:

sem2_i = α + β sem1_i + u_i   (12)

Example: Basic Econometrics Grades (1)

We can test whether there is a significant relationship between
Semester 1 and Semester 2 results. Looking at the regression
output the answer is clearly yes. Let's, however, test the following
hypothesis:

H0: β = 1
HA: β < 1

Given A1 - A6 we know that

(β̂ - β) / se(β̂) ~ t_{n-1-1}

The rejection rule is:

Reject H0 if t_calc < -t_crit,α   (-t_crit,α = -1.645 for α = 0.05)

Example: Basic Econometrics Grades (1)

The test statistic is

t_calc = (β̂ - 1) / se(β̂) = (0.9337 - 1) / 0.0697 = -0.9507

Therefore we do not reject H0. What does this mean? For every
extra grade point that you get in Semester 1 (through good revision, of
course!) you will, on average, get another grade point in Semester 2.
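
A minimal R check of this one-sided test, using only the coefficient and standard error reported above (no data file is needed for the calculation):

beta_hat <- 0.9337                 # estimated slope from the regression output
se_beta  <- 0.0697                 # its standard error
df       <- 137 - 2                # n - k - 1 with one regressor and a constant

t_calc <- (beta_hat - 1) / se_beta # about -0.95
p_left <- pt(t_calc, df = df)      # left-tail p-value, about 0.17
c(t_calc = t_calc, p_value = p_left)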


Testing multiple restrictions

The most common test statistic used to test multiple restrictions is

F = [(SSR_r - SSR_u) / r] / [SSR_u / (n - k - 1)]   (13)

F ~ F_{r, n-k-1} if A1 to A6 hold
F ~a F_{r, n-k-1} if A1 to A5 hold

Here r = number of restrictions tested and SSR = sum of squared
residuals of the restricted (r) and unrestricted (u) models.

Example: Basic Econometrics Grades (2)

We are interested in whether the relationship between Semester 1
and Semester 2 grades differs between Year 2 and Year 3 students.
Include a Year 3 dummy in the model:

S2t_i = β₀ + β₁ S1t_i + β₂ Y3s_i + β₃ (S1t_i × Y3s_i) + e_i

Example: Basic Econometrics Grades (2)


Let's test the composite hypothesis that the Semester 1 / Semester 2
relationship is the same for Year 2 and Year 3 students, i.e.

H0: β₂ = β₃ = 0
HA: β₂ and/or β₃ ≠ 0

The test statistic to be used is the F-test

F = [(SSR_r - SSR_u) / r] / [SSR_u / (n - k - 1)] ~ F_{r, n-k-1}

We are testing two restrictions, hence r = 2, and n - k - 1 = 133.
The decision rule is to reject H0 if F > F_{2,133,α} (3.00 at α = 0.05).

F = [(26786.75 - 25437.09) / 2] / (25437.09 / 133) = 3.5284

(Get SSR_r and SSR_u yourself!) We reject H0 at α = 0.05. The
Semester 1 / Semester 2 grade relationship (marginally) varies
between Year 2 and Year 3 students.
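
A sketch of this F-test in R, assuming a data frame grades with columns S2t, S1t and a Year-3 dummy Y3s (these names are illustrative, not necessarily those in gradeexample.csv):

unrestricted <- lm(S2t ~ S1t + Y3s + I(S1t * Y3s), data = grades)
restricted   <- lm(S2t ~ S1t, data = grades)

ssr_u <- sum(resid(unrestricted)^2)
ssr_r <- sum(resid(restricted)^2)
r     <- 2                          # number of restrictions
df2   <- df.residual(unrestricted)  # n - k - 1

F_calc <- ((ssr_r - ssr_u) / r) / (ssr_u / df2)
pf(F_calc, df1 = r, df2 = df2, lower.tail = FALSE)

anova(restricted, unrestricted)     # the same test in one line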

p-values

It is crucial to understand what p-values are, how they are
calculated and how to interpret them. The decision rule, when
using p-values, is:

Reject H0 if p-value < α

The value for α = P(type I error | H0 is true) needs to be set by
the researcher. The p-value is then the probability of getting a test
statistic at least as extreme as the one calculated from the data if
H0 were true!

p-values - Examples
For the above examples the p-values are:

- t-test, left-tailed (one-sided)

  H0: β = 1
  HA: β < 1

  H0 distribution: t_135. Test statistic = -0.9507. p-value = 0.1717
  (or from tables: p-value > 0.1 using 120 d.o.f.). Do not reject H0.

- F-test

  H0: β₂ = β₃ = 0
  HA: β₂ and/or β₃ ≠ 0

  H0 distribution: F_{2,133}. Test statistic = 3.5284. p-value = 0.0321
  (or from tables: 0.01 < p-value < 0.05 using 2 and 120 d.o.f.).
  Do reject H0 at α = 0.05.

Both p-values can be computed directly in R (see the sketch below).
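
A quick sketch of the p-values and critical values quoted above, computed in R instead of read from tables:

pt(-0.9507, df = 135)                               # t-test p-value, about 0.172
qt(0.05, df = 135)                                  # 5% left-tail critical value, about -1.66

pf(3.5284, df1 = 2, df2 = 133, lower.tail = FALSE)  # F-test p-value, about 0.032
qf(0.95, df1 = 2, df2 = 133)                        # 5% critical value, about 3.06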

Overview Semester 2

Auxiliary Regressions
Small Sample Parameter Properties
The Matrix Form
Asymptotic Parameter Properties
Introduction to Time-Series Data
Multicollinearity - Breach of A3
Heteroskedasticity - Breach of A5
Autocorrelation - Breach of A5 for time series data
Specification testing - Breach of A1
Forecasting
Maximum Likelihood
Bayesian Econometrics


Assessment

- Problem Sets: multiple-choice and short answer questions based on
  prior work assignment; you will need RStudio to complete all work.
  PS 1: Deadline in week beginning 22 Feb (2.5%)
  PS 2: Deadline in week beginning 25 Apr (2.5%)
- Mid-Term Exam, Thursday 17 March, 3-4pm (10%)
- Final Exam, 1.5 hours, short answer-type questions (35%)

Auxiliary regressions
Reading: (Wooldridge p176-178)
Later in the course we will encounter helper or auxiliary
regressions.
Some multiple restrictions can easily be tested by auxiliary
regressions. Example:
y = Xβ + u   (14)
β = (β₀ β₁ ... β₄)'   (15)

H0: β₂ = β₃ = β₄ = 0   (16)

1. Estimate the restricted model
   y_i = β₀ + β₁ x_{1i} + u_i   (17)
   and obtain estimated residuals
   ũ_i = y_i - β̃₀ - β̃₁ x_{1i}   (18)
   where the β̃ are OLS estimates from the restricted model.

Auxiliary regressions

2. Regress ũ_i on a constant, x_{1i}, x_{2i}, x_{3i} and x_{4i}. Obtain the R²
   from this regression.

3. Calculate the test statistic LM = nR².
   Under the null hypothesis (16) this test statistic is
   asymptotically χ² distributed with r (= 3) degrees of freedom.
   Right-tailed test.

Idea: regress the residuals from the null (restricted) model on something
with which they should only be correlated if the null hypothesis is not
valid. If the null is valid, the correlation between the regressors in the
auxiliary regression and the null residuals should be close to zero, and
so should the R² and hence the LM statistic.
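
A sketch of this LM test in R, assuming a data frame dat with columns y, x1, x2, x3, x4 (hypothetical names):

restricted  <- lm(y ~ x1, data = dat)
dat$u_tilde <- resid(restricted)

aux <- lm(u_tilde ~ x1 + x2 + x3 + x4, data = dat)  # auxiliary regression
LM  <- nrow(dat) * summary(aux)$r.squared           # LM = n * R^2
pchisq(LM, df = 3, lower.tail = FALSE)              # chi-squared(3), right-tailed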


Small Sample Parameter Properties

Reading: Wooldridge 3.3, 3.4 and 4.1

Let's start with our standard regression model

y_i = β₀ + β₁ x_i + u_i   (19)

You know that

β̂₁ = Cov(y_i, x_i) / Var(x_i)   (20)

and

Var(β̂₁) = σ² / [SST₁ (1 - R₁²)]   (21)

What does it mean for parameter estimates to be unbiased and
efficient?

These are small sample properties; next week: large
sample/asymptotic properties.

Parameter Properties
Unbiasedness

Formally

E(β̂₁) = β₁   (22)

Assume that there is a population with 100,000 members. In this
population the true (but unknown) relationship is (where β₁ = 1.5)

y_i = 0.5 + 1.5 x_i + u_i   (23)

Then each of you (300 students) randomly draws a sample of 100
observations (i.e. 100 (y_i, x_i) pairs) and estimates a regression,
obtaining β̂₀ and β̂₁. What would we expect?

- β̂₁ is a random variable and hence ...
- Each of you would obtain different estimates β̂₀ and β̂₁
- We would then expect that, on average, your estimates of β₁
  would equal 1.5

IF A1 to A4 hold (see the simulation sketch below).
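
A Monte Carlo sketch of this thought experiment in R; the distributions of x and u are assumed here, only the true coefficients come from (23):

set.seed(1)
n_students <- 300                      # number of replications ("students")
n_obs      <- 100                      # sample size per replication

beta1_hat <- replicate(n_students, {
  x <- rnorm(n_obs, mean = 2, sd = 1)  # regressor
  u <- rnorm(n_obs)                    # error term satisfying A4
  y <- 0.5 + 1.5 * x + u               # true relationship (23)
  coef(lm(y ~ x))[2]                   # this student's estimate of beta1
})

mean(beta1_hat)                        # close to the true value 1.5
hist(beta1_hat)                        # the histogram referred to below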

Parameter Properties
Unbiasedness

But note: in practice you have only one of these β̂₁s! And that one
may happen to be somewhere in the tail!

Parameter Properties
Efficiency

Recall that β̂₁ is a r.v.

From the histogram for β̂₁ you can see that it has some variation.

If the OLS estimator is efficient (if the GM assumptions hold!) then
there is no other linear unbiased estimator that has a smaller
variance.

Parameter Properties
Why are they important

- When you estimate a regression on sample data you know
  that you will obtain a draw from a random variable
- You want to know that you are drawing from a distribution
  that is centered around the true value (unbiasedness)
- You want to know that you are drawing from a distribution
  that is not unnecessarily dispersed (efficiency)

What to do next:

- Clips on the matrix form and p-values to consolidate this
  week's material
- Attempt the revision quiz
- Before next week's lecture watch the clips on some basic
  statistical tools we will need for next week's lecture

ECON20110/30370 Econometrics
2015/16 - Semester 2 - Week 2

Ralf Becker

February 1, 2016


Table of contents

Observation Wise Form to Matrix Form

Asymptotic Preliminaries

Asymptotic Properties of OLS Estimators

Random Regressors

Introduction to Time-Series Data


Observation Wise Form to Matrix Form

The same model can be represented in two ways.

Matrix form:
y = Xβ + u   (1)
β̂ = (X'X)⁻¹ X'y   (2)
Var(β̂) = σ² (X'X)⁻¹   (3)

Observation-wise form:
y_i = β₀ + β₁ x_i + u_i   (4)
β̂₁ = Cov(y_i, x_i) / Var(x_i),   β̂₀ = ȳ - β̂₁ x̄   (5)
Var(β̂_j) = σ² / [SST_j (1 - R_j²)]   (6)

The matrix form is much more general as it leaves the number of
columns in X unspecified.
We will use the form that makes life easier (depends on the issue).

Matrix Form

Let X be an (n × q) matrix, y an (n × 1) vector and u an (n × 1)
vector of error terms.

β̂ = (X'X)⁻¹ X'y   (7)
with X'X of dimension (q × n)(n × q) and X'y of dimension (q × n)(n × 1).

β̂ = (β̂₁, β̂₂, ..., β̂_q)', a (q × 1) vector   (8)

If the first column of X is a vector of ones (representing the
constant) then β̂₁ is the estimated constant parameter.

In Semester 1: q = k + 1 (sometimes I use k instead of q)
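
A small R sketch of equation (7), checking the matrix formula against lm() on simulated data (all names and numbers here are illustrative):

set.seed(42)
n <- 50
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)

X <- cbind(1, x)                               # first column of ones = the constant
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y   # (X'X)^(-1) X'y
beta_hat
coef(lm(y ~ x))                                # the same numbers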


Matrix Form
Let X be an (n × q) matrix, y an (n × 1) vector and u an (n × 1)
vector of error terms.

Var(β̂) = σ² (X'X)⁻¹   (9)

Var(β̂) is the (q × q) matrix with Var(β̂₁), Var(β̂₂), ..., Var(β̂_q) on
the diagonal and the covariances Cov(β̂_i, β̂_j) on the off-diagonals   (10)

This matrix is a symmetric matrix.

Matrix Form and averages


We need to understand what terms like X'X are!

With a constant and one regressor,

    | 1  x_1 |
X = | 1  x_2 |
    | .   .  |
    | 1  x_n |

Then

X'X = | (1² + 1² + ... + 1²)     (x_1 + x_2 + ... + x_n)   |
      | (x_1 + x_2 + ... + x_n)  (x_1² + x_2² + ... + x_n²)|

    = | n      Σ x_i  |
      | Σ x_i  Σ x_i² |

Matrix Form and averages

As it turns out it will be convenient to deal with averages.

(1/n) X'X = | n/n           (1/n) Σ x_i  |
            | (1/n) Σ x_i   (1/n) Σ x_i² |

Each element represents an average!

Equally:

(1/n) X'y = | (1/n) Σ y_i       |
            | (1/n) Σ (x_i y_i) |

Asymptotic Preliminaries

- Asymptotic arguments are arguments in which we imagine
  that the sample size, n, goes to infinity, n → ∞.
- The reason we do this is not because it is realistic to increase
  sample sizes to ∞, but because we understand the behaviour
  of some terms if n → ∞, in particular: averages.
- Recall: sample means are random variables!

The following two tools are important:

Theorem (Law of Large Numbers)
Sample means converge to the true mean as n → ∞.
Conditions apply!

Theorem (Central Limit Theorem)
Averages are asymptotically normally distributed as n → ∞.
Conditions apply!

Asymptotic Properties of OLS Estimators

In Semester 1:
- We needed Assumption 6 (error normality) to derive the
  distributions of t and F tests.
- If A6 holds we know the distributions of t and F tests at
  any sample size.

In Semester 2:
- As we relax some assumptions (e.g. the homoskedasticity
  assumption) we lose the ability to derive small sample
  distributions. But we will be able to derive asymptotic
  properties (i.e. as the sample size goes to infinity).
- This means that we can do without A6!

Asymptotic Properties of OLS Estimators


Assuming A1 to A5 hold

Here the basic idea! (Details in the online clips.) Start with the matrix
form model (fixed X):

y = Xβ + u   (11)

What we need is the distribution of

β̂ = (X'X)⁻¹ X'y   (12)

We modify this by substituting for y to get

β̂ = β + (X'X)⁻¹ X'u   (13)

Bring β onto the LHS and augment by 1/n terms:

β̂ - β = ((1/n) X'X)⁻¹ ((1/n) X'u)   (14)

Recall that, as u is a r.v., β̂ is also a r.v.

Asymptotic Properties of OLS Estimators


Consistency

Terms A and B are now averages:

β̂ - β = ((1/n) X'X)⁻¹ ((1/n) X'u)   (15)
 bias        A              B

How does β̂ - β behave for large n?

We want β̂ - β →p 0; then β̂ is said to be consistent. If E(x_i u_i) = 0
(A4) and (x_i u_i) is iid (A2), then a Law of Large Numbers (LLN) is
applicable to B (the sample average converges to the true mean!).

We also need to apply an LLN to A⁻¹ and assume that it (as A is
an average) converges to a constant matrix, say M.

Then B →p 0 and hence β̂ - β →p M · 0 = 0.

Asymptotic Properties of OLS Estimators


Asymptotic Normality

Now we want to know how (β̂ - β) is distributed (for large n).
Recall, it is a r.v.!

Applying a Central Limit Theorem (CLT: averages are
asymptotically normally distributed) (under assumptions, e.g. iid!)
we can establish that

β̂ - β = ((1/n) X'X)⁻¹ ((1/n) X'u)   (16)
          →p M          ~a N(0, P)

This then establishes that

β̂ - β ~a N(0, M P M')   (17)

Small Sample v Asymptotic Properties

In Semester 1:

GM assumptions (A1 to A5) allowed the unbiasedness and efficiency
(BLUE) result.

CLRM assumptions (A1 to A6) allowed the β̂ - β ~ N(0, σ² (X'X)⁻¹)
result.

In Semester 2:

A1 to A4 are sufficient to derive β̂ - β ~a N(0, M P M'). A6 is not
necessary. A5 can be relaxed (using different LLNs and CLTs).

Random Regressors

So far we assumed that X was non-random and fixed. Let's start
from this result:

β̂ = β + (X'X)⁻¹ X'u   (18)

From here we established that β̂ was a r.v. and hence we were
interested in its expectation to establish unbiasedness.

Random Regressors

If X is fixed:

E(β̂) = β   (19)   if E(u) = 0

If X is random:

E(β̂ | X) = β   (20)   if E(u | X) = 0

hence the result is conditioned on the particular set of observations X
we used.

Random Regressors

Can the last result be generalised?

I.e. can we turn the conditional expectation E(β̂ | X) = β into an
unconditional expectation?

E(β̂) = E_X(E(β̂ | X)) = E_X(β) = β   (21)

which is an application of the Law of Iterated Expectations.

Random Regressors
What about our variance formula?

If X is fixed:

Var(β̂) = σ² (X'X)⁻¹   (22)

If X is random:

Var(β̂ | X) = σ² (X'X)⁻¹   (23)

i.e. at this stage the variance formula is valid for the particular X
only.

We need to form expectations across the r.v. X to get Var(β̂):

Var(β̂) = E_X(Var(β̂ | X)) = E_X(σ² (X'X)⁻¹) = σ² E_X((X'X)⁻¹)   (24)

Random Regressors
Variance implementation

We established that

Var(β̂) = σ² E_X((X'X)⁻¹)   (25)

In practice we only have one realisation of X.

How could we possibly obtain E_X(... X ...)?

We use the one observation we have and calculate an average of
what we want over that one observation!

Introduction to Time-Series Data

Data may be sampled across time. (W. chps 10.1, 10.2 and 11)
Example: Phillips curve. Is there a relationship between inflation
(π) and unemployment (un)?

Cross section (CS), all in 2003:        Time series (TS), all UK data:
obs 1: (π_UK, un_UK)                    obs 1: (π_1990, un_1990)
obs 2: (π_Ch, un_Ch)                    obs 2: (π_1991, un_1991)
obs 3: (π_Jap, un_Jap)                  obs 3: (π_1992, un_1992)
...                                     ...

π_i = β₀ + β₁ un_i + u_i,   i = 1, ..., n   (26)
π_t = β₀ + β₁ un_t + u_t,   t = 1, ..., T   (27)

Introduction to Time-Series Data


Some Time-Series Plots

(Figures shown on slide. Source: OECD, Main Economic Indicators.)

Consequences of using Time-Series Data

Are there any consequences for the properties of OLS model
estimates if data are time series?

Assumption 2
Random samples {yi , x1i , ..., xki }. n observations.

The ith observation should be independent from any other


observation.
Can this assumption be maintained for TS data?


Consequences of using Time-Series Data

Often data will trend for the entire series or part of it.

Random sampling does not seem appropriate for all TS data.

The derivation of asymptotic properties of parameter estimates
collapses.

This makes establishing relationships difficult.

Example
Relationship between CO2 emissions (thousand metric tons of
carbon) and global temperature (deviation from 1961-1990
average).


Consequences of using Time-Series Data

(Figure: time-series plots of CO2 emissions and global temperature
from the example above.)

Time-Series Data: Outlook

What do we want to achieve?

- Figure out when we can use TS data in a regression model
- What are the consequences if we employ regression analysis
  inappropriately
- Build a simple univariate TS model (mainly for forecasting
  purposes)

ECON20110/30370 Econometrics
2015/16 - Semester 2 - Lecture Week 3

Ralf Becker

February 13, 2016


Table of contents

Time Series Data


Model assumptions and parameter properties
Consequences of breach of assumption TS1
A real example
A simulated example
Univariate Time-Series models
Introduction
Linking TS features and univariate processes
AR(1) model


Model Setup

I assume that you have watched the online clips in the
Pre-Lecture Section of BB.

Consider the following time-series model

y_t = x_t β + u_t   (1)

where x_t may include x_t = (z_t, z_{t-1}, z_{t-2}, ..., y_{t-1}, y_{t-2}, ...)

Assumptions
Assumption TS1 (as in Wooldridge chapter 11). Assume that
the model is as in (1) and that the draws of (y_t, x_t) for t = 1, ..., T
are stationary and weakly dependent.

Assumption TS2
No perfect correlation between variables in x_t.

Assumption TS3
Zero conditional mean.
E[u_t | x_t] = 0   (2)

Assumption TS4
Homoskedasticity. Constant residual variance.
Var[u_t | x_t] = σ²   (3)

Assumption TS5
No autocorrelation (serial correlation).
Corr(u_t, u_{t-s} | x_t, x_{t-s}) = 0 for all s ≠ 0   (4)

Consequences of breach of assumption TS1


A real example

Consider the following four time series (all annual from 1961 to
2007, TS_Data_SpuriousRegression.wf1):

- Life Expectancy at Birth in Belgium (lifeexp)
- Agriculture, value added (% of GDP) in China (agrval)
- ODA aid per person (constant 2007 US$) in Norway (aid)
- CO2 emission per person (metric tons) in Australia (co2em)

Which ones should be least related? Let's run a regression and
let's see what we get.

Consequences of breach of assumption TS1


A real example - Discussion of results


Consequences of breach of assumption TS1


A simulated example

Consider two series {y_t} and {x_t}. We impose the following:

1. They are simulated series, independent from each other
2. They breach assumption TS1 (they follow a Random Walk model)

y_t = y_{t-1} + u_t,   u_t ~ N(0, 1)   (5)
x_t = x_{t-1} + v_t,   v_t ~ N(0, 1)   (6)

(see online clip)
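
A sketch of this simulation in R; the sample size is chosen arbitrarily, so the exact numbers will differ from the slide:

set.seed(123)
T <- 200
y <- cumsum(rnorm(T))   # y_t = y_{t-1} + u_t
x <- cumsum(rnorm(T))   # x_t = x_{t-1} + v_t, independent of y

summary(lm(y ~ x))      # typically a "significant" slope despite no true relationship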


Consequences of breach of assumption TS1


A simulated example

Figure : Example of simulated random walks



Consequences of breach of assumption TS1


A simulated example

Then we regress y_t on a constant and x_t

y_t = β₀ + β₁ x_t + ε_t

and get

ŷ_t = 16.0 + 0.1697 x_t
      (0.220)  (0.011)

The t-stat is around 16. Can we trust this result? No! TS1 is
breached if the data behave like (5) and (6).

Spurious Regression!

Using nonstationary series in standard regression analysis will cause
problems. The issue is the availability of LLNs and CLTs:
straightforward for iid data; available for weakly dependent
(stationary) data; but not available for nonstationary data.

We need to understand more about the behaviour of TS.

Univariate Time-Series models


Autoregressive Models

To understand a time series' behaviour we often use univariate
models.

Economic variables are the result of very complicated interactions
with many other economic variables.

Abstract from all the other variables and concentrate on the dynamic
features of y_t:

y_t = α₀ + α₁ y_{t-1} + α₂ y_{t-2} + ... + α_k y_{t-k} + u_t   (7)

which is called an autoregressive process.

Univariate Time-Series models


TS Data Features

Features of TS data:
- persistence
- trending
- seasonality

Models of the type (7) can capture all these features.

The key to describing the features of time series are the
autocorrelations ρ_h = Corr(y_t, y_{t-h}) for lags h = 1, 2, 3, ...

Here are the autocorrelations of {y_t} and {x_t} from the spurious
regression example (computed in the sketch below).
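
These autocorrelations can be obtained with acf(); a sketch, reusing the y and x simulated in the random-walk example above:

acf(y, lag.max = 20)    # starts near 1 and decays very slowly: not weakly dependent
acf(x, lag.max = 20)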



Univariate Time-Series models


TS Data Features

Time series for which the autocorrelation starts with values very
close to 1 (for lag h = 1) and decays only very slowly are likely not
to be weakly dependent. The dependence is too strong.

Covariance stationarity and weak dependence are conceptually
quite different, but in practice we will find that most series that are
covariance stationary are also weakly dependent, and vice versa.

Univariate Time-Series models


Another example - US Dollar / UK Pound exchange rate


Univariate Time-Series models


Linking TS features and univariate processes

The key is that the autocorrelations ρ_h of an AR process are
related to the coefficients α_i, i = 1, ..., k in the AR(k) model

y_t = α₀ + α₁ y_{t-1} + α₂ y_{t-2} + ... + α_k y_{t-k} + u_t

- More coefficients allow for more complicated autocorrelation
  functions.

Why are univariate models (such as the AR) so useful?
- We often abstract from complicated interrelations between
  economic time series
- Simple models like this have proven useful for forecasting
- A related model can be used to perform a hypothesis test on
  whether a series is stationary or not

Univariate Time-Series models


AR(1) model

We will look at one particular AR process in detail, the AR(1)
process. The exercise will look at higher-order processes.

We will use it to:
1. relate AR(1) coefficients to the moments of a time series
   (unconditional moments)
2. show how to use an estimated AR model to forecast
   (conditional expectations and forecasting)

Univariate Time-Series models


AR(1) model - Unconditional Moments

One example in a bit more detail: the autoregression of order 1,
AR(1):

y_t = α₀ + α₁ y_{t-1} + u_t   (8)

This is a special case of the AR(k) in (7).

Assume that the innovation term u_t is zero-mean iid with
Var(u_t) = σ_u². Acknowledging that y_t is a random variable, we are
interested in a number of characteristics of this time series, namely
its expectation and variance:

E[y_t] = μ_t   (9)
Var[y_t] = σ_t²   (10)

Univariate Time-Series models


AR(1) model - Unconditional Moments

These are unconditional moments, that is, expectations about y_t
without any knowledge other than the process parameters.
Here they are just stated:

E[y_t] = μ = α₀ / (1 - α₁)   (11)
Var[y_t] = σ² = σ_u² / (1 - α₁²)   (12)

Note that these are independent of time!

Univariate Time-Series models


AR(1) model - Unconditional Moments
Autocovariances and autocorrelations are also related to the
parameters in (8):

ρ₁ = Corr(y_t, y_{t-1}) = α₁;    Cov(y_t, y_{t-1}) = σ² α₁   (13)
ρ₂ = Corr(y_t, y_{t-2}) = α₁²;   Cov(y_t, y_{t-2}) = σ² α₁²   (14)
...
ρ_k = Corr(y_t, y_{t-k}) = α₁^k;  Cov(y_t, y_{t-k}) = σ² α₁^k   (15)

Without any knowledge of previous realisations of y_t we would
forecast y_t to take the value

E[y_t] = μ = α₀ / (1 - α₁)

This does not use all the information, in particular y_{t-1}, y_{t-2},
etc. (persistence!). From elementary statistics: P(A) ≠ P(A|B).
Using all the information will lead to an improved expectation.

Univariate Time-Series models


AR(1) model - Conditional Expectations and Forecasting

Similarities to prediction, but here we are explicitly going outside the
sample range used for estimation.

Today: t. Forecast y_{t+1}, y_{t+2}, etc. using observations y_t,
y_{t-1}, y_{t-2}, etc.

I_t = {y_t, y_{t-1}, y_{t-2}, ...}   (16)

Of course, the unconditional expectation applies here as well,
E[y_{t+1}] = α₀ / (1 - α₁).

Consider E[y_{t+1} | I_t], the expected value of y_{t+1} taking into
account the information available at time t, I_t.

Univariate Time-Series models


AR(1) model - Conditional Expectations

In the AR(1) model, such a prediction is then obtained as follows:

E[y_{t+1} | I_t] = E[α₀ + α₁ y_t + u_{t+1} | I_t]   (17)
               = α₀ + α₁ E[y_t | I_t] + E[u_{t+1} | I_t]
               = α₀ + α₁ y_t + 0

In general, E[y_{t+1} | I_t] ≠ E[y_t].

Univariate Time-Series models


AR(1) model - Conditional Expectations
Example with α₀ = 0.2, α₁ = 0.5:

process: y_t = 0.2 + 0.5 y_{t-1} + u_t   (18)

y_{t-3} = 0.02, y_{t-2} = 0.55, y_{t-1} = 0.71 and y_t = 0.64

It then follows that

E[y_{t+1} | I_t] = 0.2 + 0.5 y_t = 0.52   (19)

E[y_{t+2} | I_t] = E[0.2 + 0.5 y_{t+1} + u_{t+2} | I_t]   (20)
               = 0.2 + 0.5 E[y_{t+1} | I_t] + E[u_{t+2} | I_t]
               = 0.2 + 0.5 E[y_{t+1} | I_t] + 0
               = 0.2 + 0.5 · 0.52 = 0.46

which is unequal to

E(y_{t+1}) = α₀ / (1 - α₁) = 0.2 / (1 - 0.5) = 0.4
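
The same forecasts can be iterated in R; a small sketch using the numbers of the example:

alpha0 <- 0.2
alpha1 <- 0.5
y_t    <- 0.64                            # last observed value

fc   <- numeric(10)
prev <- y_t
for (h in 1:10) {
  fc[h] <- alpha0 + alpha1 * prev         # E[y_{t+h} | I_t]
  prev  <- fc[h]
}
fc                                        # 0.52, 0.46, 0.43, ... converging to 0.4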



Univariate Time-Series models


AR(1) model - Conditional Expectations

It is clear that E[y_{t+k} | I_t] will converge to the unconditional
expectation E(y_{t+k}) (here = 0.4) as k → ∞.

Univariate Time-Series models


AR(1) model - Conditional Expectations

The fact that the conditional forecast converges to the
unconditional expectation indicates that the informational value of
the current process realisations diminishes with increasing forecast
horizon.

This is characteristic of a stationary and weakly dependent process
and is ensured in the AR(1) if |α₁| < 1.

A stationary process reverts to a constant mean (mean-reverting
process).

AR(1): monotonic convergence.
AR(k), k > 1: more complex convergence patterns are possible.

We only consider stationary (weakly dependent) series.

Univariate Time-Series models


Example: UK CPI
Data: UKCPI.csv and RStudio: Week3Practice

Implementation in R

Earlier we stated the AR(1) model

y_t = α₀ + α₁ y_{t-1} + u_t   (21)
E(y_t) = α₀ / (1 - α₁)   (22)

The model that R actually estimates is the demeaned form

(y_t - μ) = α₁ (y_{t-1} - μ) + u_t   (23)
E(y_t) = μ   (24)

The parameters in (21) or (23) can be estimated by OLS
(assuming TS1 holds)!
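
A sketch of both estimation routes on a simulated stationary AR(1) series (the UKCPI.csv data themselves are not used here):

set.seed(7)
T <- 500
y <- arima.sim(model = list(ar = 0.5), n = T) + 0.4   # alpha1 = 0.5, mean mu = 0.4

y_t   <- y[2:T]                    # OLS on form (21): regress y_t on y_{t-1}
y_lag <- y[1:(T - 1)]
coef(lm(y_t ~ y_lag))              # intercept near alpha0 = mu*(1 - alpha1) = 0.2, slope near 0.5

arima(y, order = c(1, 0, 0))       # demeaned form (23); the reported "intercept" is mu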

ECON20110/30370 Econometrics
2015/16 - Semester 2 - Week 4

Ralf Becker

February 19, 2016


Table of contents

Heteroskedasticity
What is it?
Consequences of Heteroskedasticity
Detection
Robust standard errors
Generalised Least Squares (GLS)
Weighted LS and Feasible GLS


Assumptions

Here is our model

y = Xβ + u   (1)

Assumption 5
Homoskedasticity. Constant residual variance.

Var[u_i | x_i] = σ²   (2)

or
Var[u | X] = σ² I   (3)

If this assumption is breached then we are dealing with
heteroskedasticity.

It is a common feature of simple regression models.

Example
House Price Sales - US data, Stockton3.csv
Model the house sales price (sprice_i) as being dependent on the
number of bedrooms (beds_i):

sprice_i = β₀ + β₁ beds_i + u_i   (4)

The deviations from the regression line clearly grow (on average)
with the number of beds (and perhaps drop again for beds_i > 5).
Hence we have an error variance that increases with the value of
the explanatory variable.

Consequences of Heteroskedasticity
... for the OLS estimator

A5 is part of the Gauss-Markov assumptions which established the
BLUE properties of the OLS estimators for a linear model.

If A5 is breached, then:
- Best (efficient): not any longer
- Linear: the estimator hasn't changed, it is still linear
- Unbiased: still valid, needs the ZCM assumptions
- Estimator

As there are LLNs and CLTs for heteroskedastic data, OLS
estimators are still consistent and asymptotically normally
distributed (if all other assumptions are met):

β̂ ~a N(β, (X'X)⁻¹ X'ΩX (X'X)⁻¹)

see online clip

Consequences of Heteroskedasticity
... for inference using OLS

As is obvious from the previous slide, in the presence of
heteroskedasticity

Var(β̂) = (X'X)⁻¹ X'ΩX (X'X)⁻¹   (5)
       ≠ σ² (X'X)⁻¹   (6)

This means that when we calculate t-statistics

t-stat = (β̂_k - β_k) / se(β̂_k)

we need to use the correct variance estimator from which to obtain
se(β̂_k).

Consequences of Heteroskedasticity
... for inference using OLS

What is the distribution of t-tests?

Using the standard variance formula (6):
             small n          asymptotic
Homosk.      t_{n-#pars}      N(0, 1)
Heterosk.    ?                ?

Using the variance formula (5):
             small n          asymptotic
Homosk.      ?                N(0, 1)
Heterosk.    ?                N(0, 1)

We haven't learned yet how to obtain an estimate for Ω in (5).

What next?

- How to detect whether HS is a problem
- How to perform inference with OLS robust to the presence of
  HS (but inefficient)
- How to obtain efficient parameter estimates (Generalised
  Least Squares - GLS)

Detection
Graphical Tools

How could we go about detecting the presence of
heteroskedasticity?

Example: Stockton4.wf1; 2610 home sales in Stockton, CA from
Oct 1, 1996 to Nov 30, 1998.

Regress:
sprice_i = β₀ + β₁ livarea_i + β₂ age_i + u_i

Is there some relation between the residuals û_i and age_i or livarea_i?

Detection
Graphical Tools

(Scatter plot on slide: as livarea_i increases, so does Var(u_i).)

Detection
Using hypothesis tests

More formal detection: heteroskedasticity implies systematic
variation in the residual variance, i.e. some measure of residual
variance is statistically correlated with some variable. That variable
may be one included in the regression (here livarea_i or age_i) or
another variable (e.g. pool_i).

Use an auxiliary regression! What happens if you run

û_i = δ₀ + δ₁ livarea_i + δ₂ age_i + ε_i ?

δ̂₁ reflects corr(û_i, livarea_i). Hence δ̂₁ = 0 (try it yourself!),
because OLS residuals are uncorrelated with the regressors by
construction. This does not express the relationship we could see
in the above scatter.

Detection
Using hypothesis tests

û_i is not a good measure of residual variance;
|û_i| or û_i² are good measures of residual variance.

û_i² = δ₀ + δ₁ livarea_i + ε_i

This is an auxiliary regression!

H0: absence of heteroskedasticity, δ₁ = 0 or R² = 0
HA: heteroskedasticity, δ₁ ≠ 0 and R² > 0

We will again use the following test statistic:

LM = nR² ~a χ²_k

Detection
Using hypothesis tests

In general we will apply the following strategy:

1. Estimate your regression model, say

   sprice_i = β₀ + β₁ livarea_i + β₂ age_i + u_i   (7)

   and save the estimated residuals û_i, i = 1, ..., n.

2. Run an auxiliary regression

   û_i² = δ₀ + z_i δ₁ + ε_i   (8)

   and save the R² of this auxiliary regression. Here z_i is a
   k-dimensional vector with all variables potentially relevant for
   the variation in û_i².

3. Calculate the test statistic LM = nR².

Detection
Using hypothesis tests

4. We test the following hypotheses:

   H0: δ₁ = 0 (nR² ≈ 0), homoskedastic residuals   (9)
   HA: any δ_j ≠ 0 for j = 1, ..., k (nR² > 0), heteroskedastic residuals   (10)

   LM = nR² ~a χ²_k

   Hence reject H0 if nR² > χ²_{k,α,crit}.

   Always a right-tailed test.

This testing principle is due to Breusch and Pagan. They basically
argued that z_i could contain any variable that you suspect is
responsible for the heteroskedasticity.

Detection
Which explanatory variables to use?
In the above procedure it was left unspecified what and how many
variables should be in z_i. Let's refer to this model:

sprice_i = β₀ + β₁ livarea_i + β₂ age_i + u_i   (13)

In EViews the following versions are implemented:
- Breusch-Pagan-Godfrey: z_i = (livarea_i, age_i)
- White test (without cross terms):
  z_i = (livarea_i, age_i, livarea_i², age_i²)
  in some textbooks: z_i = (livarea_i², age_i²)
- White test (with cross terms):
  z_i = (livarea_i, age_i, livarea_i², age_i², livarea_i · age_i)

But you are not restricted to these combinations; you can also
include variables that have not been included in the original
regression.

Detection
Heteroskedasticity test examples in R
library(lmtest)   # for bptest()
reg1 <- lm(SPRICE ~ LIVAREA + AGE, data = hs_data)

Breusch-Pagan Test, the default version
> bptest(reg1)
studentized Breusch-Pagan test
data: reg1;  BP = 192.5768, df = 2, p-value < 2.2e-16

White Test, without cross-products
> bptest(reg1, ~ LIVAREA + I(LIVAREA^2) + AGE + I(AGE^2), data = hs_data)
studentized Breusch-Pagan test
data: reg1;  BP = 278.4171, df = 4, p-value < 2.2e-16

White Test, with cross-products
> bptest(reg1, ~ LIVAREA + I(LIVAREA^2) + AGE + I(AGE^2) + I(AGE * LIVAREA), data = hs_data)
studentized Breusch-Pagan test
data: reg1;  BP = 282.2782, df = 5, p-value < 2.2e-16

Detection
Test for time dependence of residual variance - ARCH
Consider daily exchange rate changes (USD/UKP), dx_t (= 100 ×
log-difference), 4 Jan 1971 to 7 Feb 2014, in usdukp.xls/usdukp.wf1.

We estimate an AR(1) model:

dx_t = 0.0015 + 0.0437 dx_{t-1} + u_t
       (0.0025)  (0.0096)

We save the estimated residuals û_t. There are clear volatility
clusters in time, i.e. a non-constant error variance.

Detection
Test for time dependence of residual variance - ARCH

Could we use the variable time, t, as an explanatory variable in a
Breusch-Pagan test? Only if the variance is consistently increasing
or decreasing.

Volatility clusters: high values for σ_t² are likely to be followed by
further high values for σ_{t+1}², σ_{t+2}², σ_{t+3}², etc.

We do not know the values for σ_{t+j}², but we have the proxies û_{t+j}².

If volatility clusters: high values of û_t² are likely to be followed by
further high values for û_{t+1}², û_{t+2}², û_{t+3}², etc. -
Autoregressive Conditional Heteroskedasticity (ARCH).

Detection
Test for time dependence of residual variance - ARCH

Use the auxiliary regression:

û_t² = δ₀ + δ₁ û_{t-1}² + δ₂ û_{t-2}² + ... + δ_k û_{t-k}² + ε_t

H0: δ₁ = ... = δ_k = 0, homoskedastic residuals
HA: any δ_j ≠ 0 for j = 1, ..., k, heteroskedastic (ARCH) residuals

This delivers the test statistic LM = T·R² ~a χ²_k under the null
hypothesis. Here T is the number of observations in the auxiliary
regression.

ARCH LM Test in R (ArchTest() is in the FinTS package):
> ArchTest(reg2$residuals, lags = 12)
ARCH LM-test; Null hypothesis: no ARCH effects
data: reg2$residuals
Chi-squared = 1004.562, df = 12, p-value < 2.2e-16

Robust (White) standard errors

If there is heteroskedasticity in the error terms then

Var(β̂) = (X'X)⁻¹ X'ΩX (X'X)⁻¹   (14)

and to estimate this we need an estimate for

Ω = diag(σ₁², σ₂², ..., σ_N²)   (15)

i.e. the diagonal matrix with σ₁², σ₂², ..., σ_N² on the diagonal and
zeros elsewhere.

Do we have any information on σ₁², σ₂², etc.? û₁, û₂, ...

Robust (White) standard errors


Can we estimate a variance on the basis of one observation? In
general we cannot. This implies that, in general,

Ω̂ = diag(û₁², û₂², ..., û_N²)   (16)

is not a useful estimate of Ω.

However, we are not really interested in an estimate of Ω but
rather in an estimate of Var(β̂). It turns out that it is alright to use
Ω̂ in the context of estimating Var(β̂). (Halbert White)

Var(β̂) = (X'X)⁻¹ X'Ω̂X (X'X)⁻¹   (17)

See the RStudio work for how this is implemented in R.
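
A sketch of White standard errors with the sandwich and lmtest packages; hs_data and the model are those assumed in the earlier example:

library(sandwich)
library(lmtest)

reg1    <- lm(SPRICE ~ LIVAREA + AGE, data = hs_data)
V_white <- vcovHC(reg1, type = "HC0")    # (X'X)^(-1) X' Omega_hat X (X'X)^(-1)
coeftest(reg1, vcov = V_white)           # t-tests with robust standard errors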

Robust (White) standard errors

Asymptotically

β̂ ~a N(β, (X'X)⁻¹ X'Ω̂X (X'X)⁻¹)

and hence, if Var(β̂_i) is based on (17), t-tests can be used for
inference (using critical values from N(0, 1)).

This is called heteroskedasticity-robust inference / robust standard
errors.

In Wooldridge (Chapter 8.2) an element-wise version is given.

Heteroskedasticity-robust LM tests (using auxiliary regressions) are
also available - no detail required. Wooldridge, pp. 269-271 (4th ed),
264-265 (5th ed).


Generalised Least Squares (GLS)


The idea

If Ω ≠ σ²I, is there an estimator for β which has a smaller variance
than β̂_OLS?

For any symmetric and positive definite matrix, such as Ω, you can
find a non-singular (n × n) matrix P such that

Ω = P P'   (18)

From this it follows that

P⁻¹ Ω P'⁻¹ = I   (19)

Recall the original model

y = Xβ + u   (20)
Var(u) = Ω ≠ σ²I   (21)
β̂ = (X'X)⁻¹ X'y   (22)
Var(β̂) = (X'X)⁻¹ X'ΩX (X'X)⁻¹   (23)

Generalised Least Squares (GLS)


The idea

Transform the model such that the new residuals are homoskedastic.
Recall that if

u ~ N(0, Ω)   (24)

and

v = P⁻¹ u

then

v ~ N(0, P⁻¹ Ω P'⁻¹)   (25)

resulting in

v ~ N(0, I)   (26)

Generalised Least Squares (GLS)


The idea

Hence premultiply equation (20) with P⁻¹:

P⁻¹ y = P⁻¹ Xβ + P⁻¹ u
ỹ = X̃β + v,   Var(v) = I   (27)

OLS can be applied without any modification to the new variables
ỹ = P⁻¹ y and X̃ = P⁻¹ X.

Note that β remained unchanged!

Generalised Least Squares (GLS)


The idea

GLS estimator of β:

β̂_GLS = (X̃'X̃)⁻¹ X̃'ỹ   (28)
      = (X' P⁻¹' P⁻¹ X)⁻¹ X' P⁻¹' P⁻¹ y
      = (X' Ω⁻¹ X)⁻¹ X' Ω⁻¹ y

Var(β̂_GLS) = (X' Ω⁻¹ X)⁻¹   (29)

Model (27) meets all the GM assumptions. Hence β̂_GLS is BLUE.

As in general β̂_GLS ≠ β̂_OLS, it cannot be guaranteed that β̂_OLS
is efficient any longer.

Weighted Least Squares - WLS


GLS - implementation

How to specify P? From (18), with heteroskedastic but uncorrelated
errors,

Ω = diag(σ₁², σ₂², ..., σ_N²)   and   P = diag(σ₁, σ₂, ..., σ_N)   (30)

If one knew Ω, one could specify P. Of course we do not know Ω.

Weighted Least Squares - WLS


GLS - implementation

In certain applications you may suspect that σ_i is proportional to a
certain variable, say z_i, and you could then use the matrix

P = diag(z₁, z₂, ..., z_N)   (31)

This is then essentially what is sometimes (see Wooldridge chapter
8.4) called weighted least squares.

Weighted Least Squares - WLS


GLS - implementation
The root of this name is apparent from

ỹ = P⁻¹ y = (y₁/z₁, y₂/z₂, ..., y_N/z_N)'   (32)

and ỹ is merely a re-weighted version of y. X̃ is re-weighted in the
same fashion. If the first column in X is a column of ones, the first
column of X̃ will be a column with the reciprocals of all the z_i.
Hence, in such a case there would be no constant in (27).

Weighted Least Squares - WLS


GLS - implementation

Note:

- The variable z_i should be strictly positive. Otherwise one
  implicitly uses negative variances (hint: you can always use
  exponentials!).
- Interpret the parameter estimates in the untransformed model.
- R² cannot be compared between the transformed and
  untransformed model.
- In R(Studio) use:
  reg3 <- lm(SPRICE ~ LIVAREA + AGE, data = hs_data, weights = (1/LIVAREA))

Feasible Generalised Least Squares - FGLS

GLS - implementation

Sometimes one suspects that more than one variable drives the
residual variance, Var(u_i).

Argue that Var(u_i) is proportional to some linear combination of
variables z_i.

We need to ensure positivity of the elements to be substituted onto
the diagonal of P.

Reading: Wooldridge (pp 282-284, 4th ed) (pp 276-278, 5th ed)

ECON20110/30370 Econometrics
2015/16 - Semester 2 - Week 5

Ralf Becker

February 29, 2016


Table of contents

Autocorrelation
What is it?
Consequence
Detection
LM test
Extra notes on detection
How to deal with autocorrelation
Newey-West standard errors
Estimation in differences
Empirical Example

Specification Testing
Overview
RESET Test


Assumptions

A common issue in TS data. Recall the TS model

y_t = x_t β + u_t   (1)

and

Assumption TS5
No autocorrelation (serial correlation).

Corr(u_t, u_{t-s} | x_t, x_{t-s}) = 0 for all s ≠ 0

Formally, autocorrelation is the breakdown of assumption TS5.

Why do we see autocorrelation?

When does it make sense to relate residuals?

- Spatial relationship (in CS data): residuals for neighbouring
  regions may be correlated.
  Example: consumption-income relationship.
  Observation units: postcodes in Manchester. Neighbouring
  postcodes tend to belong to similar socio-economic
  backgrounds and may hence display similar deviations from
  predicted patterns.

- Time relationship (in TS data): residuals close to each other in
  time may be related to each other. In this case we also call
  this correlation serial correlation (autocorrelation).

A simple model
The setup

A simple regression set-up, but now the error terms are not iid but
dependent. We specify them as following the one process we know
that induces dependence, an AR(1) process:

y_t = x_t β + u_t   (2)
u_t = ρ u_{t-1} + v_t   (3)
v_t ~ N(0, σ_v²)   (4)

Corr(u_t, u_{t-1}) ≠ 0. The parameter ρ determines how strong this
relationship is. The random term v_t may be normally distributed
with zero mean and variance σ_v².

A simple model
The error term dynamics
If equation (3) is true for u_t it is also true for u_{t-1}:

u_{t-1} = ρ u_{t-2} + v_{t-1}   (5)

Substitution into (3) yields

u_t = ρ (ρ u_{t-2} + v_{t-1}) + v_t   (6)

and after recursive substitution we obtain:

u_t = v_t + ρ v_{t-1} + ρ² v_{t-2} + ρ³ v_{t-3} + ...   (7)

v_t: the shocks to our system, the only source of randomness.
u_t: determined by today's shock v_t but also by all previous shocks
v_{t-1}, v_{t-2}, etc.; it is a compound error term.

If |ρ| > 1, past shocks would have an ever-increasing influence on
today's residual.
If |ρ| < 1, shocks die out (weak dependence). This is the case we
will restrict our attention to initially (see assumption TS1).

A simple model
The error term properties

Let's apply what we know about AR(1) processes. A few properties
of the regression error term (which is a compound shock) u_t:

E(u_t) = 0   (8)
Var(u_t) = σ² = σ_v² / (1 - ρ²)   (9)
Corr(u_t, u_{t-1}) = ρ   (10)
Cov(u_t, u_{t-1}) = σ² ρ   (11)
Corr(u_t, u_{t-k}) = ρ^k   (12)
Cov(u_t, u_{t-k}) = σ² ρ^k   (13)

It is apparent that the condition for TS5 to be valid is ρ = 0.

Why does autocorrelation occur in real data?

- Economic shocks that have a persistent effect. This will
  generate a series of either positive or negative residuals.
  Shocks may cause slow adjustment processes. Extreme case:
  nonstationary variables.

- Misspecified models - functional form (e.g. specification in
  levels with nonstationary data).

- Misspecified models - omitted variable. If the omitted variable
  is persistent, its omission might result in autocorrelated
  residuals. This may be a dynamic misspecification, i.e.
  omitting y_{t-1} or x_{t-1}.

Consequence
Returning to the matrix representation of the model, DGP (2) and
(3) can be written as follows:

y = Xβ + u   (14)
Var(u) = Ω ≠ σ²I   (15)
Var(u) = Ω   (16)

        | 1         ρ         ρ²   ...  ρ^{T-1} |
        | ρ         1         ρ    ...  ρ^{T-2} |
Ω = σ²  | ρ²        ρ         1    ...  ...     |
        | ...       ...       ...  ...  ...     |
        | ρ^{T-1}   ρ^{T-2}   ...  ...  1       |

where T = sample size.

See (11) and (13): the covariances are on the off-diagonals,
Cov(u_t, u_{t-k}) = σ² ρ^k.

Consequence
It is clearly not possible to simplify this to σ²I.
Residual processes other than the AR(1) will result in a different
setup for the variance-covariance matrix Ω.

Is β̂, when estimated by means of OLS, still consistent and
efficient?

Similar to heteroskedasticity: β̂ is still consistent.
The derivation of Var(β̂) remains unchanged:

Var(β̂) = (X'X)⁻¹ X'ΩX (X'X)⁻¹   (17)

which simplifies to

Var(β̂) = σ² (X'X)⁻¹   (18)

only when Ω = σ²I.

Consequence
Under assumptions TS1 to TS5:

β̂ ~a N(β, σ² (X'X)⁻¹)   (19)

Under assumptions TS1 to TS3:

β̂ ~a N(β, (X'X)⁻¹ X'ΩX (X'X)⁻¹)   (20)

Test the following null hypothesis on the ith element of β, β_i, for
example H0: β_i = 0.5. One would typically use a t-test calculated
according to

t = (β̂_i - 0.5) / s_{β̂_i} ~a N(0, 1)   (21)

If one were to incorrectly calculate s_{β̂_i} from equation (19) rather
than from (20), the calculated t-statistic (21) would turn out not to
be asymptotically normally distributed.

Detection of Autocorrelation
Informal tools - Time Series Plots

We have made assumptions for the error terms. The latter,
however, are unobserved. We will use estimated regression
residuals to test the validity of the assumptions on the unobserved
error terms.

Time series plot of residuals: compare the following two regression
residual plots (use data in TSdataSpuriouosRegressions.csv and
usdukp.csv).


Detection of Autocorrelation
Informal tools - Time Series Plots
Panel A: residuals from regressing agrval_t on aid_t.
Panel B: residuals from regressing log(usdukp)_t on log(usdukp)_{t-1}.

The left panel clearly has runs of observations above and below the
mean (of zero): residual u_t is correlated with its predecessor u_{t-1}.
The residuals on the right appear more random.

How could we quantify this in a statistic?

Detection of Autocorrelation
Testing for autocorrelation - LM test

This is sometimes called the Breusch-Godfrey test. It allows for
higher-order autocorrelation.

y_t = x_t β + u_t   (22)
u_t = ρ₁ u_{t-1} + ρ₂ u_{t-2} + ... + ρ_k u_{t-k} + v_t

where x_t may contain a constant and lagged dependent variables.

We want to assess the relationship between the error terms, which
are unobserved. We will instead use the estimated residuals û_t as
proxies and examine the relationship between û_t and its lagged
versions û_{t-1}, û_{t-2}, etc.

Detection of Autocorrelation
Testing for autocorrelation - LM test

Auxiliary regression procedure (see the sketch below):

1. First estimate regression model (22) by OLS,
2. save the residuals û_t, and then
3. run the following auxiliary regression:

   û_t = α + x_t γ + ρ₁ û_{t-1} + ρ₂ û_{t-2} + ... + ρ_k û_{t-k} + v_t

4. Test for residual autoregression of order k:

   H0: ρ₁ = ρ₂ = ... = ρ_k = 0
   HA: any ρ_i ≠ 0 for i = 1, ..., k

   Use the LM test (LM = nR² ~a χ²_k), where n is the number of
   observations in the auxiliary regression (requires conditional
   homoskedasticity of v_t).
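
A sketch of this procedure in R, assuming a data frame ts_data with columns y and x (hypothetical names); bgtest() from the lmtest package automates steps 1 to 4:

library(lmtest)

reg <- lm(y ~ x, data = ts_data)
bgtest(reg, order = 4)                   # LM test against AR(4) in the residuals

# By hand, via the auxiliary regression (bgtest pads initial lags with zeros,
# so the numbers can differ slightly):
u_hat <- resid(reg)
n     <- length(u_hat)
u_lag <- sapply(1:4, function(k) c(rep(NA, k), head(u_hat, n - k)))
aux   <- lm(u_hat ~ ts_data$x + u_lag)   # rows with missing lags are dropped
LM    <- length(resid(aux)) * summary(aux)$r.squared
pchisq(LM, df = 4, lower.tail = FALSE)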


Detection of Autocorrelation
Extra notes on detection

- The auxiliary regressions for autocorrelation require the
  inclusion of all explanatory variables (unlike auxiliary
  regressions for heteroskedasticity).
- The LM test is flexible: the auxiliary regression may include
  lagged residuals for e.g. i = 1, 3, 12 only. There is no need to
  include all k = 12 lags.
- The maximum lag is to be chosen according to the data frequency.
- The LM test is the standard test.

How to deal with autocorrelation


Reasons for the presence of autocorrelation:
- Misspecified models
  - functional form
  - omitted variable
  - use of nonstationary variables
- Genuine AC (persistent shock effects; slow adjustment processes)

General action:
- Fix the model (when the model is misspecified)
  - use the correct functional form
  - include all relevant variables
  - use stationary transformations of the non-stationary variables
- Use estimators which allow for autocorrelated residuals (like
  GLS, BUT not often done for AC)
- Use robust inference procedures (leave parameter estimators
  unchanged - see below)

How to deal with autocorrelation


Robust Inference: Newey-West standard errors

A GLS-type approach (reformulating the model to get nice error
terms) is in principle available, but hardly ever applied.

When autocorrelation is more complex, and/or
x_t includes a lagged dependent variable, and/or
the residuals are conditionally heteroskedastic,

we adapt the variance-covariance matrix to obtain valid large-sample
inference. Recall

Var(β̂) = (X'X)⁻¹ X'ΩX (X'X)⁻¹   (23)

How to deal with autocorrelation


Robust Inference: Newey-West standard errors

Recall the case of heteroskedastic residuals: Ω had the following
structure:

Ω = diag(σ₁², σ₂², ..., σ_T²)   (24)

and was estimated by

Ω̂_White = diag(û₁², û₂², ..., û_T²)   (25)

How to deal with autocorrelation


Robust Inference: Newey-West standard errors
Autocorrelated residuals: Now $\Omega$ has non-zero off-diagonal elements (allowing for autocorrelated residuals):

$\Omega = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} & \sigma_{14} & \cdots & \sigma_{1T} \\ \sigma_{12} & \sigma_2^2 & \sigma_{23} & \sigma_{24} & \cdots & \sigma_{2T} \\ \sigma_{13} & \sigma_{23} & \sigma_3^2 & \sigma_{34} & \cdots & \sigma_{3T} \\ \sigma_{14} & \sigma_{24} & \sigma_{34} & \sigma_4^2 & \cdots & \sigma_{4T} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ \sigma_{1T} & \sigma_{2T} & \sigma_{3T} & \sigma_{4T} & \cdots & \sigma_T^2 \end{pmatrix}$  (26)

and approximating $\sigma_{ij}$ with $\hat\sigma_{ij} = \hat u_i \hat u_j$ would deliver

$\hat\Omega = \begin{pmatrix} \hat u_1^2 & \hat u_1\hat u_2 & \hat u_1\hat u_3 & \hat u_1\hat u_4 & \cdots & \hat u_1\hat u_T \\ \hat u_1\hat u_2 & \hat u_2^2 & \hat u_2\hat u_3 & \hat u_2\hat u_4 & \cdots & \hat u_2\hat u_T \\ \hat u_1\hat u_3 & \hat u_2\hat u_3 & \hat u_3^2 & \hat u_3\hat u_4 & \cdots & \hat u_3\hat u_T \\ \hat u_1\hat u_4 & \hat u_2\hat u_4 & \hat u_3\hat u_4 & \hat u_4^2 & \cdots & \hat u_4\hat u_T \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ \hat u_1\hat u_T & \hat u_2\hat u_T & \hat u_3\hat u_T & \hat u_4\hat u_T & \cdots & \hat u_T^2 \end{pmatrix}$  (27)

How to deal with autocorrelation


Robust Inference: Newey-West standard errors

Note that $\hat\Omega = \hat u \hat u'$ and hence, if this is substituted into $X'\hat\Omega X$, we get the following result:

$X'\hat\Omega X = X'\hat u\,\hat u' X = (X'\hat u)(\hat u' X) = 0$

(since the OLS residuals satisfy $X'\hat u = 0$). Therefore (27) is not useful as an estimator of $\Omega$ in (23).


How to deal with autocorrelation


Robust Inference: Newey-West standard errors

Newey and West propose

$\hat\Omega_{nw} = \begin{pmatrix} \hat u_1^2 & w_1\hat u_1\hat u_2 & w_2\hat u_1\hat u_3 & 0 & \cdots & 0 \\ w_1\hat u_1\hat u_2 & \hat u_2^2 & w_1\hat u_2\hat u_3 & w_2\hat u_2\hat u_4 & \ddots & \vdots \\ w_2\hat u_1\hat u_3 & w_1\hat u_2\hat u_3 & \hat u_3^2 & w_1\hat u_3\hat u_4 & \ddots & 0 \\ 0 & w_2\hat u_2\hat u_4 & w_1\hat u_3\hat u_4 & \hat u_4^2 & \ddots & w_2\hat u_{T-2}\hat u_T \\ \vdots & \ddots & \ddots & \ddots & \ddots & w_1\hat u_{T-1}\hat u_T \\ 0 & \cdots & w_2\hat u_{T-2}\hat u_T & w_1\hat u_{T-1}\hat u_T & & \hat u_T^2 \end{pmatrix}$  (28)


How to deal with autocorrelation


Robust Inference: Newey-West standard errors

A graphical representation of the proposed weights $w_i$ (figure not reproduced): white cells represent 0s, lighter shades illustrate smaller $w_i$; the further away from the diagonal, the smaller the weight. In this illustration there is non-zero weight only for the first two off-diagonals.

$\mathrm{Var}_{nw}(\hat\beta) = (X'X)^{-1}\, X'\hat\Omega_{nw} X\, (X'X)^{-1}$  (29)


How to deal with autocorrelation


Robust Inference: Newey-West standard errors

Note:
- The exposition here differs from Wooldridge although the structure is similar (chapter 12.5). (Reading: Hamilton, Time Series Analysis, p. 219.)
- It also caters for heteroskedastic residuals.
- The parameter estimate remains unchanged.

Inference using $\mathrm{Var}_{nw}(\hat\beta)$ is valid asymptotically only. In the presence of AC and/or HS (testing, say, $H_0: \beta_i = 0.5$):

$t_{OLS} = \dfrac{\hat\beta_i - 0.5}{s_{\hat\beta_i,OLS}} \overset{a}{\sim}\ ?$  (30)

$t_{NW} = \dfrac{\hat\beta_i - 0.5}{s_{\hat\beta_i,nw}} \overset{a}{\sim} N(0,1)$  (31)
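In practice these Newey-West standard errors are most easily obtained from the sandwich and lmtest packages; a short sketch, assuming a fitted time-series regression `fit` (the lag length is only illustrative):

# HAC (Newey-West) inference for an existing lm/dynlm fit (sketch)
library(sandwich)
library(lmtest)

V_nw <- NeweyWest(fit, lag = 4, prewhite = FALSE)  # robust variance-covariance matrix
coeftest(fit, vcov. = V_nw)                        # t-tests with Newey-West standard errors
coeftest(fit)                                      # compare: default OLS standard errors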


How to deal with autocorrelation


Estimation in differences
Common proposal: if there are strongly autocorrelated residuals in

$y_t = \alpha + \beta x_t + u_t$  (32)
$u_t = \rho u_{t-1} + v_t$, where $v_t$ is iid,  (33)

estimate the regression in differenced form using

$\Delta y_t = y_t - y_{t-1}$ and $\Delta x_t = x_t - x_{t-1}$

as dependent and explanatory variables respectively.

To evaluate whether this makes sense, reconstruct $\Delta y_t$ ($= y_t - y_{t-1}$) from (32):

$y_t - y_{t-1} = (\alpha + \beta x_t + u_t) - (\alpha + \beta x_{t-1} + u_{t-1}) = \beta(x_t - x_{t-1}) + (u_t - u_{t-1})$
$\Delta y_t = \beta \Delta x_t + (u_t - u_{t-1})$  (34)

First note that one cannot estimate $\alpha$ from (34).

How to deal with autocorrelation


Estimation in differences

The idea behind this is that perhaps $(u_t - u_{t-1})$ reduces to the iid process $v_t$:

$(u_t - u_{t-1}) = (\rho u_{t-1} + v_t) - u_{t-1}$  (35)
$= (\rho - 1)\, u_{t-1} + v_t$

This, however, is only the case when $\rho = 1$.

This implies nonstationary error terms, something we previously excluded via assumption TS1. If you use nonstationary data in (32), then $\hat\beta$ is not asymptotically $N\big(\beta, \mathrm{Var}(\hat\beta)\big)$ as TS1 is not met.
⇒ spurious regression problem
When using persistent time-series data it is paramount to test the series for stationarity (see the sketch below).
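As a rough illustration of that advice, an R sketch using the adf.test function from the tseries package (y and x are hypothetical series):

# Check persistence/stationarity before regressing (sketch; y and x are hypothetical)
library(tseries)

adf.test(y)              # Augmented Dickey-Fuller test on the level
adf.test(diff(y))        # and on the first difference

# If differencing is warranted, estimate (34) without an intercept:
summary(lm(diff(y) ~ diff(x) - 1))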

A worked example
Wooldridge Ex 11.7

Relating US hourly wage (hrwage) to productivity (output/hour)


(outphr ). Data in EARNS.csv.

Want to establish the elasticity of wage with respect to productivity.
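A minimal R sketch of this regression; the column names hrwage and outphr are taken from the text, while everything else, including the log-log form that lets the slope be read as an elasticity, is an assumption:

# Elasticity of hourly wage w.r.t. productivity via a log-log regression (sketch)
earns <- read.csv("EARNS.csv")
fit   <- lm(log(hrwage) ~ log(outphr), data = earns)
summary(fit)             # the slope estimate is the wage-productivity elasticity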


Specification Testing
Overview

Potential problems:

I Heteroskedasticity
I Autocorrelation
I Omitted variable (specific suspicion about a missing variable)
I Functional form (RESET test)
I Structural change (Chow test)
For heteroskedasticity and autocorrelation the alternatives were
well defined. Formulating a test was straightforward.


Specification Testing
Overview

Omitted variable:

Consider:
$y_t = \alpha + \beta x_t + u_t$  (36)

I Suspected that another variable $z_t$ should be included:

$y_t = \alpha + \beta x_t + \gamma z_t + u_t$  (37)

A simple t-test of $H_0: \gamma = 0$ will do the trick.
I Quadratic rather than a linear relationship between $x_t$ and $y_t$:

$y_t = \alpha + \beta x_t + \gamma x_t^2 + u_t$  (38)

and test $H_0: \gamma = 0$ using a t-test.


Specification Testing
RESET Test

Unspecific alternative? Need a test that raises a flag if something


is wrong.
No need to say what something is beforehand.
RESET (REgression Specification Error Test) by Ramsey.
Apply the following steps:
1. Estimate the model under the $H_0$ that equation (36) is correctly specified $(\hat\alpha, \hat\beta)$.
2. Obtain a series of $\{\hat y_t\}$ as follows:

$\hat y_t = \hat\alpha + \hat\beta x_t$.  (39)


Specification Testing
RESET Test

3. Then estimate the following equation

$y_t = \alpha + \beta x_t + \delta_2 \hat y_t^2 + \delta_3 \hat y_t^3 + u_t$  (40)

and test $H_0: \delta_2 = \delta_3 = 0$ (standard F-test).

Using $\hat u_t$ as the dependent variable delivers exactly the same result.
If H0 is rejected there is some problem with the original model (36).
Not clear what the problem is.
Could be anything (even autocorrelation and heteroskedasticity).
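A small R sketch of these steps, plus the packaged version in lmtest; y and x stand for the variables in the original model (36):

# RESET test (sketch; y and x are the original variables)
fit  <- lm(y ~ x)
yhat <- fitted(fit)
aux  <- lm(y ~ x + I(yhat^2) + I(yhat^3))   # auxiliary regression (40)
anova(fit, aux)                             # F-test of H0: both added terms are zero

# Packaged equivalent:
# library(lmtest); resettest(fit, power = 2:3, type = "fitted")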


ECON20110/30370 Econometrics
2015/16 - Semester 2 - Week 6

Ralf Becker

April 6, 2016


Table of contents

Specification Testing
Overview
RESET Test

Structural Change and Dummy Variables


Structural Change
Dummy Variables
Some more issues on dummy variables


Specification Testing
Overview

Potential problems:

I Heteroskedasticity
I Autocorrelation
I Omitted variable (specific suspicion about a missing variable)
I Functional form (RESET test)
I Structural change (Chow test)
For heteroskedasticity and autocorrelation the alternatives were
well defined. Formulating a test was straightforward.


Specification Testing
Overview

Omitted variable:

Consider:
$y_t = \alpha + \beta x_t + u_t$  (1)

I Suspected that another variable $z_t$ should be included:

$y_t = \alpha + \beta x_t + \gamma z_t + u_t$  (2)

A simple t-test of $H_0: \gamma = 0$ will do the trick.
I Quadratic rather than a linear relationship between $x_t$ and $y_t$:

$y_t = \alpha + \beta x_t + \gamma x_t^2 + u_t$  (3)

and test $H_0: \gamma = 0$ using a t-test.


Specification Testing
RESET Test

Unspecific alternative? Need a test that raises a flag if something


is wrong.
No need to say what something is beforehand.
RESET (REgression Specification Error Test) by Ramsey.
Apply the following steps:
1. Estimate the model under the $H_0$ that equation (1) is correctly specified $(\hat\alpha, \hat\beta)$.
2. Obtain a series of $\{\hat y_t\}$ as follows:

$\hat y_t = \hat\alpha + \hat\beta x_t$.  (4)


Specification Testing
RESET Test

3. Then estimate the following equation

$y_t = \alpha + \beta x_t + \delta_2 \hat y_t^2 + \delta_3 \hat y_t^3 + u_t$  (5)

and test $H_0: \delta_2 = \delta_3 = 0$ (standard F-test).

Using $\hat u_t$ as the dependent variable delivers exactly the same result.
If H0 is rejected there is some problem with the original model (1).
Not clear what the problem is.
Could be anything (even autocorrelation and heteroskedasticity).


Structural Change and Dummy Variables

Reading: Wooldridge (5th ed) pp 235-238 (Chp. 7) and p 437


(Chp. 13).
Relationship between variables may change:
- in time (TS data)
- for different categories of observations (say male and female)

What will we do:
1. How can we detect this?
2. What can we do about it?


Structural Change
An example

I Annualised UK CPI growth (inflation rate), UKCPI.csv.


I Quarterly data from 1988Q2 to 2013Q4
I Let's withhold 2012Q1 to 2013Q4 for forecasting
I Let's label this series $y_t$
I Estimate an AR(4) model (see the R sketch below):

$\hat y_t = 0.542 + 0.041\, y_{t-1} + 0.122\, y_{t-2} - 0.054\, y_{t-3} + 0.690\, y_{t-4}$
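A rough R sketch of how such an AR(4) could be estimated with the dynlm package (the CSV column name is an assumption; any way of constructing the four lags would do):

# AR(4) for quarterly inflation (sketch; the column name 'infl' is hypothetical)
library(dynlm)

ukcpi <- read.csv("UKCPI.csv")
y     <- ts(ukcpi$infl, start = c(1988, 2), frequency = 4)
y_est <- window(y, end = c(2011, 4))    # hold back 2012Q1-2013Q4 for forecasting

ar4 <- dynlm(y_est ~ L(y_est, 1) + L(y_est, 2) + L(y_est, 3) + L(y_est, 4))
summary(ar4)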


Structural Change
An example - Forecasts from full sample estimation

Figure: The data series from 1988Q2 to 2011Q4 and then 8 quarters of forecasts

Structural Change
An example - Forecasts from full sample estimation

Figure: The forecasts and realisations from 2012 and 2013

Structural Change
An example - Forecasts from full sample estimation

I The problem lies in the inclusion of the early sample period


that includes significantly higher inflation.
I This will affect the unconditional mean

$E(y_t) = \dfrac{0.542}{1 - 0.041 - 0.122 + 0.054 - 0.690} = 2.689$  (6)

to which we would expect this stationary process to converge.
I Also, the RESET test has a p-value of 0.0019.
I If the early observations are from a regime that may not be relevant any more, then we may want to exclude these observations.
I Let's re-estimate the model with observations starting from 1992Q1 instead (exclude the first 4 years of data).

Structural Change
An example - Forecasts from 92+ sample estimation

The result is the following:

$\hat y_t = 1.291 - 0.047\, y_{t-1} + 0.052\, y_{t-2} - 0.118\, y_{t-3} + 0.499\, y_{t-4}$
        (0.425)   (0.092)        (0.089)        (0.072)        (0.074)
RSS = 223.233; n = 80  (7)

I Now we have E (yt ) = 2.102, a significantly lower


unconditional expectation
I This implies that the conditional forecasts will come in
somewhat lower.


Structural Change
An example - Forecasts from 92+ sample estimation

Figure: The data series from 2006Q1 to 2013Q4 with forecasts



Structural Change
The Chow Test

Idea: Regress full sample and regress sub-samples. If there is no


change there should be no difference in overall fit.
I Let's use the RSS as a measure of fit
I Full Model - 1989Q2 to 2011Q4: RSSr = 368.7223, n = 91
I Split the same period into two sub-samples:
I Sample 1 - 1989Q2 to 1991Q4: RSS1 = 43.8610, n = 11
I Sample 2 - 1992Q1 to 2011Q4: RSS2 = 223.2332, n = 80
I RSSu = RSS1 + RSS2 = 267.0942
I Difference in fit is 101.6281
I Is this difference significant?


Structural Change
The Chow Test

k = number of parameters estimated (here k = 5: the constant plus four AR coefficients)

$F = \dfrac{(RSS_r - RSS_u)/k}{RSS_u/dof_u}$  (8)
$F \sim F_{k,\,T-2k}$.  (9)

$H_0$: no difference between full- and sub-sample fit
$H_A$: difference between full- and sub-sample fit
This is just a special case of the general F-test.
Example:
$RSS_1 = 43.8610$; $RSS_2 = 223.2332$
$RSS_r = 368.7223$; $RSS_u = 267.0942$
$dof_u = 91 - 2 \cdot 5 = 81$
$F = 6.164$; $F_{crit,0.05} = 2.32$; $F_{crit,0.01} = 3.23$.
Reject $H_0$.
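A short R sketch of this calculation from three separate fits of the same model (full sample and the two sub-samples); the model objects are hypothetical:

# Chow test from residual sums of squares (sketch; fit_full, fit_1, fit_2 are hypothetical)
RSS_r <- sum(resid(fit_full)^2)
RSS_u <- sum(resid(fit_1)^2) + sum(resid(fit_2)^2)

k     <- length(coef(fit_full))      # 5 parameters in the AR(4) with constant
dof_u <- nobs(fit_full) - 2 * k      # 91 - 10 = 81 in the example

F_stat <- ((RSS_r - RSS_u) / k) / (RSS_u / dof_u)
pf(F_stat, df1 = k, df2 = dof_u, lower.tail = FALSE)   # p-value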

Structural Change
The Chow Test - Extra Notes

I Only $F_{k,T-2k}$ distributed if the error variance in the two subsamples is identical.
I If $H_0$ is rejected we do not know whether it is due to a changing intercept or slope (or both) → use dummy variables.
I We need to know the time at which we suspect the break; if we don't, then we need some sort of recursive strategy.
I We need to know that there is one break only.
There are testing strategies which have been developed to deal
with the two latter problems.


Dummy Variables
An Introduction

Reading: Semester 1
I Different attack to the previous example.
I A dummy variable is a variable that takes values of 0 and 1.
I The criterion that decides between 0 and 1 depends on the
problem,
I male - female
I pre - post 1981
I pre - post EMU etc.


Dummy Variables
An Example

I Use house price data, Stockton3.csv


I Consider the following model:

$sprice_i = \beta_0 + \beta_1\, pool_i + u_i$  (10)

I The pool variable is defined as follows:

$pool_i = \begin{cases} 0 & \text{for houses without pool} \\ 1 & \text{for houses with pool} \end{cases}$  (11)


Dummy Variables

$\widehat{sprice}_i = 118210 + 69142\, pool_i$  (12)
            (1345)    (6026)

$R^2 = 0.048$; RSS $= 1.17 \times 10^{13}$, n = 2610  (13)

I Average house price for houses without pool ($pool_i = 0$):

$E(sprice_i \mid pool_i = 0) = 118210 = \hat\beta_0$

I Average house price for houses with pool ($pool_i = 1$):

$E(sprice_i \mid pool_i = 1) = 187352 = \hat\beta_0 + \hat\beta_1$

I Of course this model is hugely misspecified!


Dummy Variables

Consider the following model:

$sprice_i = \beta_0 + \beta_1\, livarea_i + u_i$  (14)

$\widehat{sprice}_i = 30637.55 + 9466.7\, livarea_i$
              (2263.4)    (132.09)
$R^2 = 0.663$; RSS $= 4.14 \times 10^{12}$, n = 2610

Question: Does the price/living area relationship differ between

houses with and without swimming pools?

Two strategies:
1. Estimate two models, one for houses with pool and another
for houses without pool
2. Estimate one model but use the pooli dummy variable

Dummy Variables

Strategy 1:

Pool:    $\widehat{sprice}_i = \hat\gamma_0 + \hat\gamma_1\, livarea_i$  (15)
$R_p^2 = 0.658$; RSS $= 4.51 \times 10^{11}$, n = 130

No pool: $\widehat{sprice}_i = \hat\delta_0 + \hat\delta_1\, livarea_i$  (16)
$R_{np}^2 = 0.647$; RSS $= 3.67 \times 10^{12}$, n = 2480

Strategy 2:

$\widehat{sprice}_i = \hat\beta_0 + \hat\beta_1\, livarea_i + \hat\beta_2\, pool_i + \hat\beta_3\, (livarea_i \times pool_i)$  (17)
$R_{dum}^2 = 0.665$; RSS $= 4.12 \times 10^{12}$, n = 2610


Dummy Variables

Question: How are the parameters in the different strategies


related to each other?
I $\hat\beta_0 = \hat\delta_0$ (the no-pool intercept)
I $\hat\beta_0 + \hat\beta_2 = \hat\gamma_0$ (the pool intercept)
I $\hat\beta_1 = \hat\delta_1$ (the no-pool slope)
I $\hat\beta_1 + \hat\beta_3 = \hat\gamma_1$ (the pool slope)

Therefore $\hat\beta_2$ and $\hat\beta_3$ can be interpreted as follows:

I $\hat\beta_2$ = the difference in the constant between houses with and without pool
I $\hat\beta_3$ = the difference in the slope (effect of living area on house price) between houses with and without pool


Dummy Variables
Testing for significance of Dummy Variables

An F-test of $\beta_2 = \beta_3 = 0$ can be interpreted as a test of whether or not the $sprice_i$/$livarea_i$ relationship differs between houses with and without pool.

$F = \dfrac{(RSS_r - RSS_u)/k}{RSS_u/dof_u}$  (18)

k = number of restrictions
$dof_u$ = degrees of freedom in the unrestricted model
$F \sim F_{k,\,dof_u}$.


Dummy Variables
Testing for significance of Dummy Variables

Dummy variable test:
$RSS_r = 4.14 \times 10^{12}$ from (14)
$RSS_u = 4.12 \times 10^{12}$ from (17)
$k = 2$ as $H_0: \beta_2 = \beta_3 = 0$
$dof_u = 2610 - 4 = 2606$

Chow test:
$RSS_r = 4.14 \times 10^{12}$ from (14)
$RSS_u = 4.12 \times 10^{12}$ from the sum of (15) and (16)
$k = 2$ as each model has 2 coefficients to estimate
$dof_u = (2480 + 130) - (2 + 2) = 2606$

Both versions lead to

$F = \dfrac{(4.14 - 4.12)\times 10^{12} / 2}{4.12 \times 10^{12} / (2610 - 4)} = 6.325$  (19)
$F \sim F_{2,\infty}$, $F_{cv,0.01} = 4.61$  (20)

And therefore we reject $H_0$.
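A compact R sketch of strategy 2 and the corresponding F-test; the variable names sprice, livarea and pool are taken from the text, while the data-frame name is hypothetical:

# Pool/no-pool interaction model and F-test (sketch)
stockton <- read.csv("Stockton3.csv")

restricted   <- lm(sprice ~ livarea, data = stockton)          # model (14)
unrestricted <- lm(sprice ~ livarea * pool, data = stockton)   # model (17): adds pool and livarea:pool

anova(restricted, unrestricted)    # F-test of H0: both dummy terms are zero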



Dummy Variables
Additional Notes

I In cross-section (CS) data, dummies differentiate between two (or more) groups, e.g. in studies where you have a control group.
I Beware of creating a perfect multicollinearity problem by including too many dummy variables (dummy variable trap).
Example:

$pool_i = \begin{cases} 0 & \text{if no pool} \\ 1 & \text{if pool} \end{cases}$  (21)
$nopool_i = \begin{cases} 0 & \text{if pool} \\ 1 & \text{if no pool} \end{cases}$  (22)
$const = 1$  (23)
$nopool_i = const - pool_i$.  (24)

If $pool_i$, $nopool_i$ and a constant are included, then they are perfectly collinear. One has to be left out.

Dummy Variables
Additional Notes

I Impulse dummies, to model once-off effects.


Example: Australia introduced VAT in July 2000. As cars were
to become cheaper, some purchases were delayed from June
into July. A dummy variable capturing such an effect would be


$D_t = \begin{cases} 0 & \text{for } t \le \text{May 2000} \\ -1 & \text{for } t = \text{June 2000} \\ +1 & \text{for } t = \text{July 2000} \\ 0 & \text{for } t > \text{July 2000} \end{cases}$  (25)

I Keep the number of dummy variables as small as possible.


However, if you omit a dummy variable which should be included you are facing an omitted variable problem.


Dummy Variables
Additional Notes

I Rejecting the H0 above may be due to different residual


variance in sub-samples. No easy way to include a dummy for
the variance into an OLS framework. (ML estimation
required)
I Interpretation can be tricky (see Wooldridge examples).
I The dependent variable might be in dummy variable form
(buy or no buy; interest rate change or no change). A whole
different set of models is required. (and ML estimation)


ECON20110/30370 Econometrics
2014/15 - Semester 2 - Week 8

Ralf Becker

April 16, 2016


Table of contents

Maximum Likelihood Estimation


Introduction
Example: Goals
Parameter Estimation
Outlook


Maximum Likelihood Estimation


Introduction
Reading: Wooldridge p 778-779 (4th ed); Thomas p 40-43;
Davidson and MacKinnon 399-404.
Where did the parameter estimate

$\hat\beta = (X'X)^{-1} X'y$  (1)

come from?
1. Minimisation of the residual sum of squares (Least Squares, LS):

$\min_{b}\ (y - Xb)'(y - Xb) = \hat u'\hat u$

or
2. Minimisation of sample moments (Method of Moments, MM):

$X'(y - X\hat\beta) = X'\hat u = 0$.

Maximum Likelihood Estimation


Introduction

For the model


$y = X\beta + u$  (2)
both resulted in the same estimator (1).

A third estimator derivation principle is Maximum Likelihood


(ML). For the model in (2) we will obtain the same estimator.

On other occasions not all three estimators may be obtainable.

Even if more than one estimator is feasible, ML usually has


desirable large-sample properties. Efficient and asymptotically
normally distributed!


Maximum Likelihood Estimation


Example: Goals
ML estimation comes into play when other methods are
inadequate.

Example: Let's say you want to model the number of goals scored in an English Premier League match. Data from all matches from August 2012 to 24 March 2014: 681 matches (EPL 2012to14.csv).


Maximum Likelihood Estimation


Example: Goals

A straightforward way to model this would be to use OLS for ($g_i$ = goals):

$g_i = \mu + u_i$  (3)

with $u_i \sim N(0, \sigma^2)$ and hence

$E(g_i) = \mu$  (4)


Maximum Likelihood Estimation


Example: Goals

Using this estimated model, what would be the following


probabilities?

$P(g_i \ge 1) = 0.846$
$P(g_i \ge 3) = 0.450$
$P(g_i < 0) = 0.056$

Problem: normality is clearly inappropriate here; goals are not a continuous r.v. (and the normal model puts positive probability on negative goal counts).


Maximum Likelihood Estimation


Example: Goals

One way to handle this is to acknowledge that $g_i$ only takes non-negative integer values. Accordingly, assume that $g_i \sim \mathrm{Poisson}(\lambda)$.

The density of a Poisson r.v. is

$f(g_i) = \dfrac{\lambda^{g_i} e^{-\lambda}}{g_i!}$.

The parameter to be estimated here is $\lambda$.

Note: $E(g_i) = \lambda$, $\mathrm{Var}(g_i) = \lambda$.


Parameter Estimation
Compare the empirical histogram with an arbitrary Poisson distribution ($\lambda = 3$):


Parameter Estimation
How do we find the optimal, i.e. the ML, parameter estimate?
The density of a Poisson r.v. is

$f(g_i; \lambda) = \dfrac{\lambda^{g_i} e^{-\lambda}}{g_i!}$.  (5)

The parameter to be estimated here is $\lambda$.
What would be the probability of the first two outcomes (say 0 and 5 goals), given a certain value of $\lambda$?

$L(\lambda; g_1, g_2) \overset{iid}{=} f(g_1; \lambda)\, f(g_2; \lambda)$

This is called the likelihood function.
Here we have a product; it is often more convenient to work with summations, hence take logs:

$\ln L(\lambda; g_1, g_2) \overset{iid}{=} \ln f(g_1; \lambda) + \ln f(g_2; \lambda)$

Parameter Estimation
The first few observations are: 0, 5, 3, 5, 2, ...
Let's assume the parameter was $\lambda = 2$, 5 or 8.

$\lambda = 2$:  $\ln L(\lambda = 2; g_1, g_2) = -2 + (-3.322) = -5.322$
$\lambda = 5$:  $\ln L(\lambda = 5; g_1, g_2) = -6.740$
$\lambda = 8$:  $\ln L(\lambda = 8; g_1, g_2) = -10.390$

The larger the value of ln L (; g1 , g2 ) the more likely that the data
were drawn from the respective distribution.
From which distribution did the data most likely come? Here from the Poisson with $\lambda = 2$.
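These numbers are easy to verify in R with dpois(), which evaluates the Poisson density in (5); a one-line check per parameter value:

# Log-likelihood of the first two observations (0 and 5 goals) at different lambdas
g12 <- c(0, 5)
sum(dpois(g12, lambda = 2, log = TRUE))   # -5.32
sum(dpois(g12, lambda = 5, log = TRUE))   # -6.74
sum(dpois(g12, lambda = 8, log = TRUE))   # -10.39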
We only used the first two datapoints and we did not search over all possible parameter values.

Parameter Estimation

We only used the first two datapoints and we did not search over
all possible parameter values.

The formal problem statement is

$\hat\lambda_{ML} = \underset{\lambda}{\operatorname{argmax}}\ \ln L(\lambda; g_1, g_2, \ldots, g_{681})$

Using the glm function in R: $\hat\lambda_{ML} = 2.779441$.

This coincides with the average number of goals.

$P(g_i \ge 1) = 0.9379$
$P(g_i \ge 3) = 0.5256$
$P(g_i < 0) = 0$
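A minimal sketch of that glm call (the file and column names are assumptions based on the text):

# Unconditional Poisson model for goals per match (sketch; names hypothetical)
epl <- read.csv("EPL 2012to14.csv")
fit <- glm(goals ~ 1, family = poisson, data = epl)

lambda_hat <- exp(coef(fit))                 # glm uses a log link: exp(intercept) = lambda
lambda_hat                                   # about 2.78, the average number of goals

ppois(0, lambda_hat, lower.tail = FALSE)     # P(g >= 1)
ppois(2, lambda_hat, lower.tail = FALSE)     # P(g >= 3)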


Parameter Estimation
Conditional models
Say you want to recognise that the number of goals may depend
on a number of explanatory variables: e.g. whether the match is a
home match for a top team or not, perhaps the Table position
of the teams, the temperature on the day, etc.

How would we adjust (3)?

Let's consider a variable $top_i$ which is a dummy variable

$top_i = \begin{cases} 1 & \text{if top team home match} \\ 0 & \text{if not a top team home match} \end{cases}$  (6)

Then, in an OLS framework, we would adjust the model as follows:

$g_i = \mu + \delta_1\, top_i + u_i$  (7)
$E(g_i \mid top_i) = \mu + \delta_1\, top_i$  (8)

Parameter Estimation
Conditional models
How would we adjust the Poisson model?
We need to adjust the density in (5) as follows:

$f(g_i \mid top_i; \lambda_i) = \dfrac{\lambda_i^{g_i} e^{-\lambda_i}}{g_i!}$.

$\lambda$ now changes across observations, meaning we get varying conditional expectations $E(g_i \mid top_i) = \lambda_i$.
$\lambda_i$ is specified as follows:

$\lambda_i = \exp(\theta_0 + \theta_1\, top_i)$

We use the exp() function to ensure that $\lambda_i$ is positive.
We then find the ML parameter estimates

$(\hat\theta_{0,ML}, \hat\theta_{1,ML}) = \underset{(\theta_0,\theta_1)}{\operatorname{argmax}}\ \ln L(\theta_0, \theta_1; g_1, \ldots, g_{681}, top_1, \ldots, top_{681})$  (9)
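A rough sketch of this conditional fit, again via glm; the dummy top is hypothetical (how it is constructed depends on how teams are coded in the data):

# Conditional Poisson regression of goals on a top-team-home dummy (sketch)
fit_top <- glm(goals ~ top, family = poisson, data = epl)

exp(coef(fit_top))    # exp(theta0) and the multiplicative effect exp(theta1)
predict(fit_top, newdata = data.frame(top = c(0, 1)), type = "response")
                      # fitted conditional means lambda_i for non-top and top home matches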

Outlook

Sometimes the problem in (9) can be solved analytically.


E.g. for (2), where we would find (1) as $\hat\beta_{ML}$.
Often there is no analytical solution → numerical maximisation (sophisticated trial and error).
ML estimation is attractive as:
1. Under very general assumptions ML estimators can often be
shown to be asymptotically normally distributed.
2. ML estimators are available where LS estimators are not:
2.1 truncated models (your data has been pre-selected, e.g.
Manchester students all have at least ABB at A-levels)
2.2 models with binary dependent variables (do you own a bicycle
or not)
2.3 count data (number of soldiers killed by mule-kicks each year
in the Prussian cavalry, Ladislaus von Bortkiewicz, 1898)


ECON20110/30370 Econometrics
2015/16 - Semester 2 - Week 8

Ralf Becker

April 24, 2016


Table of contents

Bayesian Econometrics
Introduction
The Basics
Summary Comparison
An Example
Summary


Bayesian Econometrics
Introduction

Reading: Greene (6th ed) Chapter 18.

Think of the following elements:


I Data, y , including observations for the dependent and
explanatory variables. Most generally all are treated as
random variables.
I Model, this describes how the explanatory variables are linked with each other. Part of this description usually comprises:
I a set of model parameters, and
I a distributional assumption for any error term


Frequentists Econometrics
This is what we have done so far.
I Data, y , are observed.
I Model, we choose a particular model, say M, e.g.

$y = X\beta + u$  (1)

I a set of model parameters, $\beta$, associated with this model. $\beta$ is unknown but assumed to be constant.
I a distributional assumption for any error term, e.g. $u \sim N(0, \sigma^2)$.
Then we obtained an estimate for $\beta$

$\hat\beta = (X'X)^{-1} X'y$  (2)

which (given assumptions) we know to have the following distribution

$\hat\beta \sim N\big(\beta, \sigma^2 (X'X)^{-1}\big)$  (3)

Frequentists Econometrics

We assumed that $\beta$ is fixed but unknown, and established that $\hat\beta$ is a draw from a random variable that is centered around the unknown $\beta$.


Bayesian Econometrics
The crucial difference

If it was possible to pin the difference between frequentist and


bayesian econometrics to one point, it would be the following:

While Frequentists assume that $\beta$ is unknown but fixed, Bayesians assume that $\beta$ is a random variable of which we do not see any draw.

This implies that Bayesians are really looking for $p(\beta \mid y)$, i.e. the probability distribution of $\beta$ conditional on the observed data, $y$ (and the assumed model, M, with associated error distribution).
The observed data (potentially a vector of data) may well be
random as well and be characterised by p(y ).


Bayesian Econometrics
The Basics
Recall the following basic probability rule (where A and B are
events):

$P(A \mid B) = \dfrac{P(A, B)}{P(B)}$  (4)

The same is valid for random variables, a and b, rather than events:

$p(a \mid b) = \dfrac{p(a, b)}{p(b)}$  (5)
$p(b \mid a) = \dfrac{p(a, b)}{p(a)} \;\Rightarrow\; p(a, b) = p(b \mid a)\, p(a)$  (6)

Now substitute the second line into the first and obtain

$p(a \mid b) = \dfrac{p(b \mid a)\, p(a)}{p(b)}$  (7)

Bayesian Econometrics
The Basics
Why is this useful?

$p(a \mid b) = \dfrac{p(b \mid a)\, p(a)}{p(b)}$  (8)

Think of our two random variables, the parameter vector $\theta$ and the data $y$.

$p(\theta \mid y) = \dfrac{p(y \mid \theta)\, p(\theta)}{p(y)}$  (9)

The left hand side is the object of desire for Bayesians, the posterior distribution of $\theta$ conditional on the observed data.

We are not trying to find one best estimate of the unknown $\theta$; rather, we are interested in the posterior distribution $p(\theta \mid y)$.

You could look at the mean of that distribution as your one best estimate of $\theta$.

Bayesian Econometrics
The Basics

As $p(y)$ does not involve $\theta$ we can also state

$p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta)$  (10)

where $\propto$ means "is proportional to". This implies that if we have the terms on the right hand side we can calculate $p(\theta \mid y)$.
I $p(y \mid \theta)$: this is exactly the same as the likelihood function we encountered in the ML section. You need a model M and an error distribution to write this down.
I $p(\theta)$: this is called the prior distribution of the parameter vector. It reflects our knowledge about the parameter vector of interest prior to looking at our data $y$.


Bayesian Econometrics
Summary comparison

Frequentists:

Set model M and error distribution to determine $p(y \mid \theta)$, then

$\hat\theta_{ML} = \underset{\theta}{\operatorname{argmax}}\ p(y \mid \theta)$  (11)

Bayesians:

Set model M and error distribution to determine $p(y \mid \theta)$ and the prior distribution $p(\theta)$, then

$p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta)$  (12)

$p(\theta \mid y)$ needs to be evaluated at each possible/plausible value for $\theta$.
We are not optimising anything, but we are multiplying two distributions.

An Example
Setup
To get a flavour for the calculations required we will work through an example.

Let's assume you want to figure out whether there is a positive temperature trend.

Annual (1850 to 2015) temperature anomaly data, n = 166.
Source: Climate Research Unit, UEA, http://www.cru.uea.ac.uk/cru/data/temperature/

An Example
Setup

If there was a positive temperature trend we would expect the


probability of a year with increasing temperature to be larger than
0.5!

Define a dummy variable yt that takes a value of 1 for years with


temperature increases and 0 otherwise.

$\pi = P(y_t = 1)$  (13)


An Example
Frequentists Approach

I $\pi$ is unknown, but fixed.
I Let's obtain a sample estimate, $\hat\pi = \frac{1}{n}\sum_{t=1}^{n} y_t = 0.5273$
I Do we know a distribution for $\hat\pi$?
I Yes, $\hat\pi \sim N\big(\pi, \pi(1-\pi)/n\big) \approx N(\pi, 0.0389^2)$
I But as we do not know $\pi$ we do not know where this distribution is centered.
I Only once you fix an $H_0$ (e.g. $H_0: \pi = 0.5$) can you perform hypothesis tests on $\hat\pi$.
I Clearly we would be unable to reject $H_0: \pi = 0.5$ at any reasonable significance level.
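A small R sketch of these frequentist calculations, assuming y is the 0/1 vector of temperature-increase indicators:

# Frequentist estimate of the probability of a temperature increase (sketch)
p_hat <- mean(y)                                 # 0.5273 in the example
se    <- sqrt(p_hat * (1 - p_hat) / length(y))   # about 0.0389

z <- (p_hat - 0.5) / se                          # test H0: probability = 0.5
2 * pnorm(abs(z), lower.tail = FALSE)            # two-sided p-value (far from rejection)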


An Example
Bayesian Approach

We apply

$p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta)$  (14)

to our problem:

$p(\pi \mid y) \propto p(y \mid \pi)\, p(\pi)$  (15)

$p(\pi)$ is our best info on $\pi$ before we see the data, $p(\pi \mid y)$ after we've seen the data.

This can be understood as an updating problem (add one year's info, $y_t$, at a time!).


An Example
Bayesian Approach

In Bayesian analysis we need to do calculations at every possible outcome for $\pi$.

For simplicity, discretise the problem and consider the 101 possible values
$\pi_1 = 0.00,\ \pi_2 = 0.01,\ \pi_3 = 0.02,\ \ldots,\ \pi_{100} = 0.99,\ \pi_{101} = 1.00$

$pp(\pi_i \mid y_t) \propto p(y_t \mid \pi_i)\, p(\pi_i)$  (16)

$p(\pi_i \mid y_t) = \dfrac{p(y_t \mid \pi_i)\, p(\pi_i)}{\sum_{j=1}^{101} p(y_t \mid \pi_j)\, p(\pi_j)}$  (17)

where the second line is just a rescaling to ensure that $p(\pi_i \mid y_t)$ is a probability distribution and $\sum_{i=1}^{101} p(\pi_i \mid y_t) = 1$. Before we can start we need a prior distribution $p(\pi_i)$.

An Example
The prior distribution

Let's try the following:

1. $N[0.4, 0.05^2]$
2. $N[0.6, 0.05^2]$
3. $U[0, 1]$

Where do they come from?



An Example
The updating mechanism
Recall, this is what we do.

$pp(\pi_i \mid y_t) \propto p(y_t \mid \pi_i)\, p(\pi_i)$  (18)

$p(\pi_i \mid y_t) = \dfrac{p(y_t \mid \pi_i)\, p(\pi_i)}{\sum_{j=1}^{101} p(y_t \mid \pi_j)\, p(\pi_j)}$  (19)

All that is left is $p(y_t \mid \pi_i)$.

As we are dealing with a binary variable and we assumed that $\pi$ is the probability that $y_t = 1$, we know that

$p(y_t \mid \pi_i) = \pi_i$ if $y_t = 1$  (20)
$p(y_t \mid \pi_i) = (1 - \pi_i)$ if $y_t = 0$  (21)

This means we have everything to perform the updating (recall that (18) and (19) need to be done for every $\pi_i$).
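A compact R sketch of this discretised updating, assuming y is again the 0/1 vector of increase indicators; the N[0.4, 0.05²] prior is evaluated on the grid and renormalised:

# Discretised Bayesian updating for the increase probability (sketch)
grid  <- seq(0, 1, by = 0.01)                 # the 101 grid points
prior <- dnorm(grid, mean = 0.4, sd = 0.05)   # prior 1, evaluated on the grid
prior <- prior / sum(prior)                   # make it sum to 1

post <- prior
for (yt in y) {                               # add one year's information at a time
  lik  <- ifelse(yt == 1, grid, 1 - grid)     # (20) and (21)
  post <- lik * post                          # (18): unnormalised posterior
  post <- post / sum(post)                    # (19): rescale to a proper distribution
}

sum(post[grid > 0.5])                         # posterior probability that the parameter exceeds 0.5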

An Example
The posterior distribution
after 163 annual updates:


An Example
The prior distribution

Once we have the posteriors after all updating we can calculate the
following probabilities:


Bayesian Econometrics
Summary

CONTRAs
I Arbitrary choice of priors
I With uninformative priors (sometimes) same results as
frequentists
I When allowing for continuous parameter distributions we need
to use numerical integration (computationally intensive!)
PROs
I Ability to find probabilities for parameters of interest
I Ability to deal with latent/unobserved random variables
I Computational issues become less important
