
Lecture notes, lectures 1-8

Econometrics (University of Manchester)


ECON20110/30370 Econometrics
2015/16 - Semester 2 - Week 1

Ralf Becker

February 4, 2016


Table of contents

State of Play
Model assumptions and parameter properties
Example: Basic Econometrics Grades (1)
Testing multiple restrictions
Example 1: Basic Econometrics Grades (2)
p-values - Revision

Overview Semester 2

Auxiliary regressions

Small Sample Properties of OLS Estimators

What next


Assumptions

Assumption 1 The model is linear in parameters.

y_i = β₀ + β₁ x_{1i} + β₂ x_{2i} + ... + β_k x_{ki} + u_i   (1)

or
y = Xβ + u   (2)

Assumption 2
Random samples {y_i, x_{1i}, ..., x_{ki}}. n observations.

Assumption 3
There is variation in the explanatory variables. Absence of perfect
multicollinearity (full rank of X).

Assumptions

Assumption 4 Zero conditional mean.

E[u_i | x_i] = E[u_i | x_{1i}, x_{2i}, ..., x_{ki}] = 0   (3)

or
E[u | X] = 0   (4)

A1 to A4 guarantee that the OLS parameter estimator is unbiased:

E[β̂] = β   (5)

Assumptions

Assumption 5
Homoskedasticity. Constant residual variance.

Var[u_i | x_i] = σ²   (6)

or
Var[u | X] = σ² I   (7)

A1 to A5 (= Gauss-Markov assumptions) guarantee that the OLS
parameter estimator is BLUE (best linear unbiased).

Assumptions
Assumption 6
Normality.

u_i ~ N(0, σ²)   (8)

or
u ~ N(0, σ² I)   (9)

This assumption implies A4 and A5.

Gauss-Markov assumptions + A6 (= classical linear regression
assumptions) guarantee that inference on β can be based on t and
F tests in samples of any size:

(β̂_i - β_i) / se(β̂_i) ~ t_{n-k-1}   (10)
F ~ F_{r, n-k-1}   (see below)

If A6 is not valid, the above inference is justified in large samples
(the associated theory is called asymptotic theory), where ~a denotes
the asymptotic distribution:

(β̂_i - β_i) / se(β̂_i) ~a t_{n-k-1} ≈ N(0, 1)   (11)
F ~a F_{r, n-k-1}

Example: Basic Econometrics Grades (1)

Semester 1 and Semester 2 grades (somewhat randomised) for
Econometrics (gradeexample.csv). 137 observations.

Regress variable sem2_i against sem1_i:

sem2_i = α + β sem1_i + u_i   (12)

Example: Basic Econometrics Grades (1)

We can test whether there is a significant relationship between
Semester 1 and Semester 2 results. Looking at the regression
output the answer is clearly yes. Let's, however, test the following
hypothesis:

H0: β = 1
HA: β < 1

Given A1 - A6 we know that

(β̂ - β) / se(β̂) ~ t_{n-1-1}

The rejection rule is:

Reject H0 if t_calc < -t_crit,α   (-t_crit,α = -1.645 for α = 0.05)

Example: Basic Econometrics Grades (1)

The test statistic is

t_calc = (β̂ - 1) / se(β̂) = (0.9337 - 1) / 0.0697 = -0.9507

Therefore we do not reject H0. What does this mean? For every
extra grade point that you get in Semester 1 (through good revision, of
course!) you will, on average, get another grade point in Semester 2.
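
A minimal R check of this one-sided test, using only the coefficient and standard error reported above (no data file is needed for the calculation):

beta_hat <- 0.9337                 # estimated slope from the regression output
se_beta  <- 0.0697                 # its standard error
df       <- 137 - 2                # n - k - 1 with one regressor and a constant

t_calc <- (beta_hat - 1) / se_beta # about -0.95
p_left <- pt(t_calc, df = df)      # left-tail p-value, about 0.17
c(t_calc = t_calc, p_value = p_left)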


Testing multiple restrictions

The most common test statistic used to test multiple restrictions is

F = [(SSR_r - SSR_u) / r] / [SSR_u / (n - k - 1)]   (13)

F ~ F_{r, n-k-1} if A1 to A6 hold
F ~a F_{r, n-k-1} if A1 to A5 hold

Here r = number of restrictions tested and SSR = sum of squared
residuals of the restricted (r) and unrestricted (u) models.

Example: Basic Econometrics Grades (2)

We are interested in whether the relationship between Semester 1
and Semester 2 grades differs between Year 2 and Year 3 students.
Include a Year 3 dummy in the model:

S2t_i = β₀ + β₁ S1t_i + β₂ Y3s_i + β₃ (S1t_i × Y3s_i) + e_i

Example: Basic Econometrics Grades (2)


Let's test the composite hypothesis that the Semester 1 / Semester 2
relationship is the same for Year 2 and Year 3 students, i.e.

H0: β₂ = β₃ = 0
HA: β₂ and/or β₃ ≠ 0

The test statistic to be used is the F-test

F = [(SSR_r - SSR_u) / r] / [SSR_u / (n - k - 1)] ~ F_{r, n-k-1}

We are testing two restrictions, hence r = 2, and n - k - 1 = 133.
The decision rule is to reject H0 if F > F_{2,133,α} (3.00 at α = 0.05).

F = [(26786.75 - 25437.09) / 2] / (25437.09 / 133) = 3.5284

(Get SSR_r and SSR_u yourself!) We reject H0 at α = 0.05. The
Semester 1 / Semester 2 grade relationship (marginally) varies
between Year 2 and Year 3 students.
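
A sketch of this F-test in R, assuming a data frame grades with columns S2t, S1t and a Year-3 dummy Y3s (these names are illustrative, not necessarily those in gradeexample.csv):

unrestricted <- lm(S2t ~ S1t + Y3s + I(S1t * Y3s), data = grades)
restricted   <- lm(S2t ~ S1t, data = grades)

ssr_u <- sum(resid(unrestricted)^2)
ssr_r <- sum(resid(restricted)^2)
r     <- 2                          # number of restrictions
df2   <- df.residual(unrestricted)  # n - k - 1

F_calc <- ((ssr_r - ssr_u) / r) / (ssr_u / df2)
pf(F_calc, df1 = r, df2 = df2, lower.tail = FALSE)

anova(restricted, unrestricted)     # the same test in one line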

p-values

It is crucial to understand what p-values are, how they are
calculated and how to interpret them. The decision rule, when
using p-values, is:

Reject H0 if p-value < α

The value for α = P(type I error | H0 is true) needs to be set by
the researcher. The p-value is then the probability of getting a test
statistic at least as extreme as the one calculated from the data if
H0 were true!

p-values - Examples
For the above examples the p-values are:

- t-test, left-tailed (one-sided)

  H0: β = 1
  HA: β < 1

  H0 distribution: t_135. Test statistic = -0.9507. p-value = 0.1717
  (or from tables: p-value > 0.1 using 120 d.o.f.). Do not reject H0.

- F-test

  H0: β₂ = β₃ = 0
  HA: β₂ and/or β₃ ≠ 0

  H0 distribution: F_{2,133}. Test statistic = 3.5284. p-value = 0.0321
  (or from tables: 0.01 < p-value < 0.05 using 2 and 120 d.o.f.).
  Do reject H0 at α = 0.05.

Both p-values can be computed directly in R (see the sketch below).
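
A quick sketch of the p-values and critical values quoted above, computed in R instead of read from tables:

pt(-0.9507, df = 135)                               # t-test p-value, about 0.172
qt(0.05, df = 135)                                  # 5% left-tail critical value, about -1.66

pf(3.5284, df1 = 2, df2 = 133, lower.tail = FALSE)  # F-test p-value, about 0.032
qf(0.95, df1 = 2, df2 = 133)                        # 5% critical value, about 3.06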

Overview Semester 2

Auxiliary Regressions
Small Sample Parameter Properties
The Matrix Form
Asymptotic Parameter Properties
Introduction to Time-Series Data
Multicollinearity - Breach of A3
Heteroskedasticity - Breach of A5
Autocorrelation - Breach of A5 for time series data
Specification testing - Breach of A1
Forecasting
Maximum Likelihood
Bayesian Econometrics


Assessment

- Problem Sets: multiple-choice and short answer questions based on
  prior work assignment; you will need RStudio to complete all work.
  PS 1: Deadline in week beginning 22 Feb (2.5%)
  PS 2: Deadline in week beginning 25 Apr (2.5%)
- Mid-Term Exam, Thursday 17 March, 3-4pm (10%)
- Final Exam, 1.5 hours, short answer-type questions (35%)

Auxiliary regressions
Reading: (Wooldridge p176-178)
Later in the course we will encounter helper or auxiliary
regressions.
Some multiple restrictions can easily be tested by auxiliary
regressions. Example:
y = Xβ + u   (14)
β = (β₀ β₁ ... β₄)'   (15)

H0: β₂ = β₃ = β₄ = 0   (16)

1. Estimate the restricted model
   y_i = β₀ + β₁ x_{1i} + u_i   (17)
   and obtain estimated residuals
   ũ_i = y_i - β̃₀ - β̃₁ x_{1i}   (18)
   where the β̃ are OLS estimates from the restricted model.

Auxiliary regressions

2. Regress ũ_i on a constant, x_{1i}, x_{2i}, x_{3i} and x_{4i}. Obtain the R²
   from this regression.

3. Calculate the test statistic LM = nR².
   Under the null hypothesis (16) this test statistic is
   asymptotically χ² distributed with r (= 3) degrees of freedom.
   Right-tailed test.

Idea: regress the residuals from the null (restricted) model on something
with which they should only be correlated if the null hypothesis is not
valid. If the null is valid, the correlation between the regressors in the
auxiliary regression and the null residuals should be close to zero, and
so should the R² and hence the LM statistic.
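
A sketch of this LM test in R, assuming a data frame dat with columns y, x1, x2, x3, x4 (hypothetical names):

restricted  <- lm(y ~ x1, data = dat)
dat$u_tilde <- resid(restricted)

aux <- lm(u_tilde ~ x1 + x2 + x3 + x4, data = dat)  # auxiliary regression
LM  <- nrow(dat) * summary(aux)$r.squared           # LM = n * R^2
pchisq(LM, df = 3, lower.tail = FALSE)              # chi-squared(3), right-tailed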


Small Sample Parameter Properties

Reading: Wooldridge 3.3, 3.4 and 4.1

Let's start with our standard regression model

y_i = β₀ + β₁ x_i + u_i   (19)

You know that

β̂₁ = Cov(y_i, x_i) / Var(x_i)   (20)

and

Var(β̂₁) = σ² / [SST₁ (1 - R₁²)]   (21)

What does it mean for parameter estimates to be unbiased and
efficient?

These are small sample properties; next week: large
sample/asymptotic properties.

Parameter Properties
Unbiasedness

Formally

E(β̂₁) = β₁   (22)

Assume that there is a population with 100,000 members. In this
population the true (but unknown) relationship is (where β₁ = 1.5)

y_i = 0.5 + 1.5 x_i + u_i   (23)

Then each of you (300 students) randomly draws a sample of 100
observations (i.e. 100 (y_i, x_i) pairs) and estimates a regression,
obtaining β̂₀ and β̂₁. What would we expect?

- β̂₁ is a random variable and hence ...
- Each of you would obtain different estimates β̂₀ and β̂₁
- We would then expect that, on average, your estimates of β₁
  would equal 1.5

IF A1 to A4 hold (see the simulation sketch below).
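
A Monte Carlo sketch of this thought experiment in R; the distributions of x and u are assumed here, only the true coefficients come from (23):

set.seed(1)
n_students <- 300                      # number of replications ("students")
n_obs      <- 100                      # sample size per replication

beta1_hat <- replicate(n_students, {
  x <- rnorm(n_obs, mean = 2, sd = 1)  # regressor
  u <- rnorm(n_obs)                    # error term satisfying A4
  y <- 0.5 + 1.5 * x + u               # true relationship (23)
  coef(lm(y ~ x))[2]                   # this student's estimate of beta1
})

mean(beta1_hat)                        # close to the true value 1.5
hist(beta1_hat)                        # the histogram referred to below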

Parameter Properties
Unbiasedness

But note: in practice you have only one of these β̂₁s! And that one
may happen to be somewhere in the tail!

Parameter Properties
Efficiency

Recall that β̂₁ is a r.v.

From the histogram for β̂₁ you can see that it has some variation.

If the OLS estimator is efficient (if the GM assumptions hold!) then
there is no other linear unbiased estimator that has a smaller
variance.

Parameter Properties
Why are they important

- When you estimate a regression on sample data you know
  that you will obtain a draw from a random variable
- You want to know that you are drawing from a distribution
  that is centered around the true value (unbiasedness)
- You want to know that you are drawing from a distribution
  that is not unnecessarily dispersed (efficiency)

What to do next:

- Clips on the matrix form and p-values to consolidate this
  week's material
- Attempt the revision quiz
- Before next week's lecture watch the clips on some basic
  statistical tools we will need for next week's lecture

ECON20110/30370 Econometrics
2015/16 - Semester 2 - Week 2

Ralf Becker

February 1, 2016


Table of contents

Observation Wise Form to Matrix Form

Asymptotic Preliminaries

Asymptotic Properties of OLS Estimators

Random Regressors

Introduction to Time-Series Data


Observation Wise Form to Matrix Form

The same model can be represented in two ways.

Matrix form:
y = Xβ + u   (1)
β̂ = (X'X)⁻¹ X'y   (2)
Var(β̂) = σ² (X'X)⁻¹   (3)

Observation-wise form:
y_i = β₀ + β₁ x_i + u_i   (4)
β̂₁ = Cov(y_i, x_i) / Var(x_i),   β̂₀ = ȳ - β̂₁ x̄   (5)
Var(β̂_j) = σ² / [SST_j (1 - R_j²)]   (6)

The matrix form is much more general as it leaves the number of
columns in X unspecified.
We will use the form that makes life easier (depends on the issue).

Matrix Form

Let X be an (n × q) matrix, y an (n × 1) vector and u an (n × 1)
vector of error terms.

β̂ = (X'X)⁻¹ X'y   (7)
with X'X of dimension (q × n)(n × q) and X'y of dimension (q × n)(n × 1).

β̂ = (β̂₁, β̂₂, ..., β̂_q)', a (q × 1) vector   (8)

If the first column of X is a vector of ones (representing the
constant) then β̂₁ is the estimated constant parameter.

In Semester 1: q = k + 1 (sometimes I use k instead of q)
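
A small R sketch of equation (7), checking the matrix formula against lm() on simulated data (all names and numbers here are illustrative):

set.seed(42)
n <- 50
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)

X <- cbind(1, x)                               # first column of ones = the constant
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y   # (X'X)^(-1) X'y
beta_hat
coef(lm(y ~ x))                                # the same numbers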


Matrix Form
Let X be an (n × q) matrix, y an (n × 1) vector and u an (n × 1)
vector of error terms.

Var(β̂) = σ² (X'X)⁻¹   (9)

Var(β̂) is the (q × q) matrix with Var(β̂₁), Var(β̂₂), ..., Var(β̂_q) on
the diagonal and the covariances Cov(β̂_i, β̂_j) on the off-diagonals   (10)

This matrix is a symmetric matrix.

Matrix Form and averages


We need to understand what terms like X'X are!

With a constant and one regressor,

    | 1  x_1 |
X = | 1  x_2 |
    | .   .  |
    | 1  x_n |

Then

X'X = | (1² + 1² + ... + 1²)     (x_1 + x_2 + ... + x_n)   |
      | (x_1 + x_2 + ... + x_n)  (x_1² + x_2² + ... + x_n²)|

    = | n      Σ x_i  |
      | Σ x_i  Σ x_i² |

Matrix Form and averages

As it turns out it will be convenient to deal with averages.

(1/n) X'X = | n/n           (1/n) Σ x_i  |
            | (1/n) Σ x_i   (1/n) Σ x_i² |

Each element represents an average!

Equally:

(1/n) X'y = | (1/n) Σ y_i       |
            | (1/n) Σ (x_i y_i) |

Asymptotic Preliminaries

- Asymptotic arguments are arguments in which we imagine
  that the sample size, n, goes to infinity, n → ∞.
- The reason we do this is not because it is realistic to increase
  sample sizes to ∞, but because we understand the behaviour
  of some terms if n → ∞, in particular: averages.
- Recall: sample means are random variables!

The following two tools are important:

Theorem (Law of Large Numbers)
Sample means converge to the true mean as n → ∞.
Conditions apply!

Theorem (Central Limit Theorem)
Averages are asymptotically normally distributed as n → ∞.
Conditions apply!

Asymptotic Properties of OLS Estimators

In Semester 1:
- We needed Assumption 6 (error normality) to derive the
  distributions of t and F tests.
- If A6 holds we know the distributions of t and F tests at
  any sample size.

In Semester 2:
- As we relax some assumptions (e.g. the homoskedasticity
  assumption) we lose the ability to derive small sample
  distributions. But we will be able to derive asymptotic
  properties (i.e. as the sample size goes to infinity).
- This means that we can do without A6!

Asymptotic Properties of OLS Estimators


Assuming A1 to A5 hold

Here the basic idea! (Details in the online clips.) Start with the matrix
form model (fixed X):

y = Xβ + u   (11)

What we need is the distribution of

β̂ = (X'X)⁻¹ X'y   (12)

We modify this by substituting for y to get

β̂ = β + (X'X)⁻¹ X'u   (13)

Bring β onto the LHS and augment by 1/n terms:

β̂ - β = ((1/n) X'X)⁻¹ ((1/n) X'u)   (14)

Recall that, as u is a r.v., β̂ is also a r.v.

Asymptotic Properties of OLS Estimators


Consistency

Terms A and B are now averages:

β̂ - β = ((1/n) X'X)⁻¹ ((1/n) X'u)   (15)
 bias        A              B

How does β̂ - β behave for large n?

We want β̂ - β →p 0; then β̂ is said to be consistent. If E(x_i u_i) = 0
(A4) and (x_i u_i) is iid (A2), then a Law of Large Numbers (LLN) is
applicable to B (the sample average converges to the true mean!).

We also need to apply an LLN to A⁻¹ and assume that it (as A is
an average) converges to a constant matrix, say M.

Then B →p 0 and hence β̂ - β →p M · 0 = 0.

Asymptotic Properties of OLS Estimators


Asymptotic Normality

Now we want to know how (β̂ - β) is distributed (for large n).
Recall, it is a r.v.!

Applying a Central Limit Theorem (CLT: averages are
asymptotically normally distributed) (under assumptions, e.g. iid!)
we can establish that

β̂ - β = ((1/n) X'X)⁻¹ ((1/n) X'u)   (16)
          →p M          ~a N(0, P)

This then establishes that

β̂ - β ~a N(0, M P M')   (17)

Small Sample v Asymptotic Properties

In Semester 1:

GM assumptions (A1 to A5) allowed the unbiasedness and efficiency
(BLUE) result.

CLRM assumptions (A1 to A6) allowed the β̂ - β ~ N(0, σ² (X'X)⁻¹)
result.

In Semester 2:

A1 to A4 are sufficient to derive β̂ - β ~a N(0, M P M'). A6 is not
necessary. A5 can be relaxed (using different LLNs and CLTs).

Random Regressors

So far we assumed that X was non-random and fixed. Let's start
from this result:

β̂ = β + (X'X)⁻¹ X'u   (18)

From here we established that β̂ was a r.v. and hence we were
interested in its expectation to establish unbiasedness.

Random Regressors

If X is fixed:

E(β̂) = β   (19)   if E(u) = 0

If X is random:

E(β̂ | X) = β   (20)   if E(u | X) = 0

hence the result is conditioned on the particular set of observations X
we used.

Random Regressors

Can the last result be generalised?

I.e. can we turn the conditional expectation E(β̂ | X) = β into an
unconditional expectation?

E(β̂) = E_X(E(β̂ | X)) = E_X(β) = β   (21)

which is an application of the Law of Iterated Expectations.

Random Regressors
What about our variance formula?

If X is fixed:

Var(β̂) = σ² (X'X)⁻¹   (22)

If X is random:

Var(β̂ | X) = σ² (X'X)⁻¹   (23)

i.e. at this stage the variance formula is valid for the particular X
only.

We need to form expectations across the r.v. X to get Var(β̂):

Var(β̂) = E_X(Var(β̂ | X)) = E_X(σ² (X'X)⁻¹) = σ² E_X((X'X)⁻¹)   (24)

Random Regressors
Variance implementation

We established that

Var(β̂) = σ² E_X((X'X)⁻¹)   (25)

In practice we only have one realisation of X.

How could we possibly obtain E_X(... X ...)?

We use the one observation we have and calculate an average of
what we want over that one observation!

Introduction to Time-Series Data

Data may be sampled across time. (W. chps 10.1, 10.2 and 11)
Example: Phillips curve. Is there a relationship between inflation
(π) and unemployment (un)?

Cross section (CS), all in 2003:        Time series (TS), all UK data:
obs 1: (π_UK, un_UK)                    obs 1: (π_1990, un_1990)
obs 2: (π_Ch, un_Ch)                    obs 2: (π_1991, un_1991)
obs 3: (π_Jap, un_Jap)                  obs 3: (π_1992, un_1992)
...                                     ...

π_i = β₀ + β₁ un_i + u_i,   i = 1, ..., n   (26)
π_t = β₀ + β₁ un_t + u_t,   t = 1, ..., T   (27)

Introduction to Time-Series Data


Some Time-Series Plots

(Figures shown on slide. Source: OECD, Main Economic Indicators.)

Consequences of using Time-Series Data

Are there any consequences for the properties of OLS model
estimates if data are time series?

Assumption 2
Random samples {yi , x1i , ..., xki }. n observations.

The ith observation should be independent from any other


observation.
Can this assumption be maintained for TS data?


Consequences of using Time-Series Data

Often data will trend for the entire series or part of it.

Random sampling does not seem appropriate for all TS data.

The derivation of asymptotic properties of parameter estimates
collapses.

This makes establishing relationships difficult.

Example
Relationship between CO2 emissions (thousand metric tons of
carbon) and global temperature (deviation from 1961-1990
average).


Consequences of using Time-Series Data

(Figure: time-series plots of CO2 emissions and global temperature
from the example above.)

Time-Series Data: Outlook

What do we want to achieve?

- Figure out when we can use TS data in a regression model
- What are the consequences if we employ regression analysis
  inappropriately
- Build a simple univariate TS model (mainly for forecasting
  purposes)

ECON20110/30370 Econometrics
2015/16 - Semester 2 - Lecture Week 3

Ralf Becker

February 13, 2016


Table of contents

Time Series Data


Model assumptions and parameter properties
Consequences of breach of assumption TS1
A real example
A simulated example
Univariate Time-Series models
Introduction
Linking TS features and univariate processes
AR(1) model


Model Setup

I assume that you have watched the online clips in the
Pre-Lecture Section of BB.

Consider the following time-series model

y_t = x_t β + u_t   (1)

where x_t may include x_t = (z_t, z_{t-1}, z_{t-2}, ..., y_{t-1}, y_{t-2}, ...)

Assumptions
Assumption TS1 (as in Wooldridge chapter 11). Assume that
the model is as in (1) and that the draws of (y_t, x_t) for t = 1, ..., T
are stationary and weakly dependent.

Assumption TS2
No perfect correlation between variables in x_t.

Assumption TS3
Zero conditional mean.
E[u_t | x_t] = 0   (2)

Assumption TS4
Homoskedasticity. Constant residual variance.
Var[u_t | x_t] = σ²   (3)

Assumption TS5
No autocorrelation (serial correlation).
Corr(u_t, u_{t-s} | x_t, x_{t-s}) = 0 for all s ≠ 0   (4)

Consequences of breach of assumption TS1


A real example

Consider the following four time series (all annual from 1961 to
2007, TS_Data_SpuriousRegression.wf1):

- Life Expectancy at Birth in Belgium (lifeexp)
- Agriculture, value added (% of GDP) in China (agrval)
- ODA aid per person (constant 2007 US$) in Norway (aid)
- CO2 emission per person (metric tons) in Australia (co2em)

Which ones should be least related? Let's run a regression and
let's see what we get.

Consequences of breach of assumption TS1


A real example - Discussion of results


Consequences of breach of assumption TS1


A simulated example

Consider two series {y_t} and {x_t}. We impose the following:

1. They are simulated series, independent from each other
2. They breach assumption TS1 (they follow a Random Walk model)

y_t = y_{t-1} + u_t,   u_t ~ N(0, 1)   (5)
x_t = x_{t-1} + v_t,   v_t ~ N(0, 1)   (6)

(see online clip)
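
A sketch of this simulation in R; the sample size is chosen arbitrarily, so the exact numbers will differ from the slide:

set.seed(123)
T <- 200
y <- cumsum(rnorm(T))   # y_t = y_{t-1} + u_t
x <- cumsum(rnorm(T))   # x_t = x_{t-1} + v_t, independent of y

summary(lm(y ~ x))      # typically a "significant" slope despite no true relationship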


Consequences of breach of assumption TS1


A simulated example

Figure : Example of simulated random walks



Consequences of breach of assumption TS1


A simulated example

Then we regress y_t on a constant and x_t

y_t = β₀ + β₁ x_t + ε_t

and get

ŷ_t = 16.0 + 0.1697 x_t
      (0.220)  (0.011)

The t-stat is around 16. Can we trust this result? No! TS1 is
breached if the data behave like (5) and (6).

Spurious Regression!

Using nonstationary series in standard regression analysis will cause
problems. The issue is the availability of LLNs and CLTs:
straightforward for iid data; available for weakly dependent
(stationary) data; but not available for nonstationary data.

We need to understand more about the behaviour of TS.

Univariate Time-Series models


Autoregressive Models

To understand a time series' behaviour we often use univariate
models.

Economic variables are the result of very complicated interactions
with many other economic variables.

Abstract from all the other variables and concentrate on the dynamic
features of y_t:

y_t = α₀ + α₁ y_{t-1} + α₂ y_{t-2} + ... + α_k y_{t-k} + u_t   (7)

which is called an autoregressive process.

Univariate Time-Series models


TS Data Features

Features of TS data:
- persistence
- trending
- seasonality

Models of the type (7) can capture all these features.

The key to describing the features of time series are the
autocorrelations ρ_h = Corr(y_t, y_{t-h}) for lags h = 1, 2, 3, ...

Here are the autocorrelations of {y_t} and {x_t} from the spurious
regression example (computed in the sketch below).
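
These autocorrelations can be obtained with acf(); a sketch, reusing the y and x simulated in the random-walk example above:

acf(y, lag.max = 20)    # starts near 1 and decays very slowly: not weakly dependent
acf(x, lag.max = 20)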



Univariate Time-Series models


TS Data Features

Time series for which the autocorrelation starts with values very
close to 1 (for lag h = 1) and decays only very slowly are likely not
to be weakly dependent. The dependence is too strong.

Covariance stationarity and weak dependence are conceptually
quite different, but in practice we will find that most series that are
covariance stationary are also weakly dependent, and vice versa.

Univariate Time-Series models


Another example - US Dollar / UK Pound exchange rate


Univariate Time-Series models


Linking TS features and univariate processes

The key is that the autocorrelations ρ_h of an AR process are
related to the coefficients α_i, i = 1, ..., k in the AR(k) model

y_t = α₀ + α₁ y_{t-1} + α₂ y_{t-2} + ... + α_k y_{t-k} + u_t

- More coefficients allow for more complicated autocorrelation
  functions.

Why are univariate models (such as the AR) so useful?
- We often abstract from complicated interrelations between
  economic time series
- Simple models like this have proven useful for forecasting
- A related model can be used to perform a hypothesis test on
  whether a series is stationary or not

Univariate Time-Series models


AR(1) model

We will look at one particular AR process in detail, the AR(1)
process. The exercise will look at higher-order processes.

We will use it to:
1. relate AR(1) coefficients to the moments of a time series
   (unconditional moments)
2. show how to use an estimated AR model to forecast
   (conditional expectations and forecasting)

Univariate Time-Series models


AR(1) model - Unconditional Moments

One example in a bit more detail: the autoregression of order 1,
AR(1):

y_t = α₀ + α₁ y_{t-1} + u_t   (8)

This is a special case of the AR(k) in (7).

Assume that the innovation term u_t is zero-mean iid with
Var(u_t) = σ_u². Acknowledging that y_t is a random variable, we are
interested in a number of characteristics of this time series, namely
its expectation and variance:

E[y_t] = μ_t   (9)
Var[y_t] = σ_t²   (10)

Univariate Time-Series models


AR(1) model - Unconditional Moments

These are unconditional moments, that is, expectations about y_t
without any knowledge other than the process parameters.
Here they are just stated:

E[y_t] = μ = α₀ / (1 - α₁)   (11)
Var[y_t] = σ² = σ_u² / (1 - α₁²)   (12)

Note that these are independent of time!

Univariate Time-Series models


AR(1) model - Unconditional Moments
Autocovariances and autocorrelations are also related to the
parameters in (8):

ρ₁ = Corr(y_t, y_{t-1}) = α₁;    Cov(y_t, y_{t-1}) = σ² α₁   (13)
ρ₂ = Corr(y_t, y_{t-2}) = α₁²;   Cov(y_t, y_{t-2}) = σ² α₁²   (14)
...
ρ_k = Corr(y_t, y_{t-k}) = α₁^k;  Cov(y_t, y_{t-k}) = σ² α₁^k   (15)

Without any knowledge of previous realisations of y_t we would
forecast y_t to take the value

E[y_t] = μ = α₀ / (1 - α₁)

This does not use all the information, in particular y_{t-1}, y_{t-2},
etc. (persistence!). From elementary statistics: P(A) ≠ P(A|B).
Using all the information will lead to an improved expectation.

Univariate Time-Series models


AR(1) model - Conditional Expectations and Forecasting

Similarities to prediction, but here we are explicitly going outside the
sample range used for estimation.

Today: t. Forecast y_{t+1}, y_{t+2}, etc. using observations y_t,
y_{t-1}, y_{t-2}, etc.

I_t = {y_t, y_{t-1}, y_{t-2}, ...}   (16)

Of course, the unconditional expectation applies here as well,
E[y_{t+1}] = α₀ / (1 - α₁).

Consider E[y_{t+1} | I_t], the expected value of y_{t+1} taking into
account the information available at time t, I_t.

Univariate Time-Series models


AR(1) model - Conditional Expectations

In the AR(1) model, such a prediction is then obtained as follows:

E[y_{t+1} | I_t] = E[α₀ + α₁ y_t + u_{t+1} | I_t]   (17)
               = α₀ + α₁ E[y_t | I_t] + E[u_{t+1} | I_t]
               = α₀ + α₁ y_t + 0

In general, E[y_{t+1} | I_t] ≠ E[y_t].

Univariate Time-Series models


AR(1) model - Conditional Expectations
Example with α₀ = 0.2, α₁ = 0.5:

process: y_t = 0.2 + 0.5 y_{t-1} + u_t   (18)

y_{t-3} = 0.02, y_{t-2} = 0.55, y_{t-1} = 0.71 and y_t = 0.64

It then follows that

E[y_{t+1} | I_t] = 0.2 + 0.5 y_t = 0.52   (19)

E[y_{t+2} | I_t] = E[0.2 + 0.5 y_{t+1} + u_{t+2} | I_t]   (20)
               = 0.2 + 0.5 E[y_{t+1} | I_t] + E[u_{t+2} | I_t]
               = 0.2 + 0.5 E[y_{t+1} | I_t] + 0
               = 0.2 + 0.5 · 0.52 = 0.46

which is unequal to

E(y_{t+1}) = α₀ / (1 - α₁) = 0.2 / (1 - 0.5) = 0.4
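
The same forecasts can be iterated in R; a small sketch using the numbers of the example:

alpha0 <- 0.2
alpha1 <- 0.5
y_t    <- 0.64                            # last observed value

fc   <- numeric(10)
prev <- y_t
for (h in 1:10) {
  fc[h] <- alpha0 + alpha1 * prev         # E[y_{t+h} | I_t]
  prev  <- fc[h]
}
fc                                        # 0.52, 0.46, 0.43, ... converging to 0.4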



Univariate Time-Series models


AR(1) model - Conditional Expectations

It is clear that E[y_{t+k} | I_t] will converge to the unconditional
expectation E(y_{t+k}) (here = 0.4) as k → ∞.

Univariate Time-Series models


AR(1) model - Conditional Expectations

The fact that the conditional forecast converges to the
unconditional expectation indicates that the informational value of
the current process realisations diminishes with increasing forecast
horizon.

This is characteristic of a stationary and weakly dependent process
and is ensured in the AR(1) if |α₁| < 1.

A stationary process reverts to a constant mean (mean-reverting
process).

AR(1): monotonic convergence.
AR(k), k > 1: more complex convergence patterns are possible.

We only consider stationary (weakly dependent) series.

Univariate Time-Series models


Example: UK CPI
Data: UKCPI.csv and RStudio: Week3Practice

Implementation in R

Earlier we stated the AR(1) model

y_t = α₀ + α₁ y_{t-1} + u_t   (21)
E(y_t) = α₀ / (1 - α₁)   (22)

The model that R actually estimates is the demeaned form

(y_t - μ) = α₁ (y_{t-1} - μ) + u_t   (23)
E(y_t) = μ   (24)

The parameters in (21) or (23) can be estimated by OLS
(assuming TS1 holds)!
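
A sketch of both estimation routes on a simulated stationary AR(1) series (the UKCPI.csv data themselves are not used here):

set.seed(7)
T <- 500
y <- arima.sim(model = list(ar = 0.5), n = T) + 0.4   # alpha1 = 0.5, mean mu = 0.4

y_t   <- y[2:T]                    # OLS on form (21): regress y_t on y_{t-1}
y_lag <- y[1:(T - 1)]
coef(lm(y_t ~ y_lag))              # intercept near alpha0 = mu*(1 - alpha1) = 0.2, slope near 0.5

arima(y, order = c(1, 0, 0))       # demeaned form (23); the reported "intercept" is mu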

ECON20110/30370 Econometrics
2015/16 - Semester 2 - Week 4

Ralf Becker

February 19, 2016


Table of contents

Heteroskedasticity
What is it?
Consequences of Heteroskedasticity
Detection
Robust standard errors
Generalised Least Squares (GLS)
Weighted LS and Feasible GLS


Assumptions

Here is our model

y = Xβ + u   (1)

Assumption 5
Homoskedasticity. Constant residual variance.

Var[u_i | x_i] = σ²   (2)

or
Var[u | X] = σ² I   (3)

If this assumption is breached then we are dealing with
heteroskedasticity.

It is a common feature of simple regression models.

Example
House Price Sales - US data, Stockton3.csv
Model the house sales price (sprice_i) as being dependent on the
number of bedrooms (beds_i):

sprice_i = β₀ + β₁ beds_i + u_i   (4)

The deviations from the regression line clearly grow (on average)
with the number of beds (and perhaps drop again for beds_i > 5).
Hence we have an error variance that increases with the value of
the explanatory variable.

Consequences of Heteroskedasticity
... for the OLS estimator

A5 is part of the Gauss-Markov assumptions which established the
BLUE properties of the OLS estimators for a linear model.

If A5 is breached, then:
- Best (efficient): not any longer
- Linear: the estimator hasn't changed, it is still linear
- Unbiased: still valid, needs the ZCM assumptions
- Estimator

As there are LLNs and CLTs for heteroskedastic data, OLS
estimators are still consistent and asymptotically normally
distributed (if all other assumptions are met):

β̂ ~a N(β, (X'X)⁻¹ X'ΩX (X'X)⁻¹)

see online clip

Consequences of Heteroskedasticity
... for inference using OLS

As is obvious from the previous slide, in the presence of
heteroskedasticity

Var(β̂) = (X'X)⁻¹ X'ΩX (X'X)⁻¹   (5)
       ≠ σ² (X'X)⁻¹   (6)

This means that when we calculate t-statistics

t-stat = (β̂_k - β_k) / se(β̂_k)

we need to use the correct variance estimator from which to obtain
se(β̂_k).

Consequences of Heteroskedasticity
... for inference using OLS

What is the distribution of t-tests?

Using the standard variance formula (6):
             small n          asymptotic
Homosk.      t_{n-#pars}      N(0, 1)
Heterosk.    ?                ?

Using the variance formula (5):
             small n          asymptotic
Homosk.      ?                N(0, 1)
Heterosk.    ?                N(0, 1)

We haven't learned yet how to obtain an estimate for Ω in (5).

What next?

- How to detect whether HS is a problem
- How to perform inference with OLS robust to the presence of
  HS (but inefficient)
- How to obtain efficient parameter estimates (Generalised
  Least Squares - GLS)

Detection
Graphical Tools

How could we go about detecting the presence of
heteroskedasticity?

Example: Stockton4.wf1; 2610 home sales in Stockton, CA from
Oct 1, 1996 to Nov 30, 1998.

Regress:
sprice_i = β₀ + β₁ livarea_i + β₂ age_i + u_i

Is there some relation between the residuals û_i and age_i or livarea_i?

Detection
Graphical Tools

(Scatter plot on slide: as livarea_i increases, so does Var(u_i).)

Detection
Using hypothesis tests

More formal detection: heteroskedasticity implies systematic
variation in the residual variance, i.e. some measure of residual
variance is statistically correlated with some variable. That variable
may be one included in the regression (here livarea_i or age_i) or
another variable (e.g. pool_i).

Use an auxiliary regression! What happens if you run

û_i = δ₀ + δ₁ livarea_i + δ₂ age_i + ε_i ?

δ̂₁ reflects corr(û_i, livarea_i). Hence δ̂₁ = 0 (try it yourself!),
because OLS residuals are uncorrelated with the regressors by
construction. This does not express the relationship we could see
in the above scatter.

Detection
Using hypothesis tests

û_i is not a good measure of residual variance;
|û_i| or û_i² are good measures of residual variance.

û_i² = δ₀ + δ₁ livarea_i + ε_i

This is an auxiliary regression!

H0: absence of heteroskedasticity, δ₁ = 0 or R² = 0
HA: heteroskedasticity, δ₁ ≠ 0 and R² > 0

We will again use the following test statistic:

LM = nR² ~a χ²_k

Detection
Using hypothesis tests

In general we will apply the following strategy:

1. Estimate your regression model, say

   sprice_i = β₀ + β₁ livarea_i + β₂ age_i + u_i   (7)

   and save the estimated residuals û_i, i = 1, ..., n.

2. Run an auxiliary regression

   û_i² = δ₀ + z_i δ₁ + ε_i   (8)

   and save the R² of this auxiliary regression. Here z_i is a
   k-dimensional vector with all variables potentially relevant for
   the variation in û_i².

3. Calculate the test statistic LM = nR².

Detection
Using hypothesis tests

4. We test the following hypotheses:

   H0: δ₁ = 0 (nR² ≈ 0), homoskedastic residuals   (9)
   HA: any δ_j ≠ 0 for j = 1, ..., k (nR² > 0), heteroskedastic residuals   (10)

   LM = nR² ~a χ²_k

   Hence reject H0 if nR² > χ²_{k,α,crit}.

   Always a right-tailed test.

This testing principle is due to Breusch and Pagan. They basically
argued that z_i could contain any variable that you suspect is
responsible for the heteroskedasticity.

Detection
Which explanatory variables to use?
In the above procedure it was left unspecified what and how many
variables should be in z_i. Let's refer to this model:

sprice_i = β₀ + β₁ livarea_i + β₂ age_i + u_i   (13)

In EViews the following versions are implemented:
- Breusch-Pagan-Godfrey: z_i = (livarea_i, age_i)
- White test (without cross terms):
  z_i = (livarea_i, age_i, livarea_i², age_i²)
  in some textbooks: z_i = (livarea_i², age_i²)
- White test (with cross terms):
  z_i = (livarea_i, age_i, livarea_i², age_i², livarea_i · age_i)

But you are not restricted to these combinations; you can also
include variables that have not been included in the original
regression.

Detection
Heteroskedasticity test examples in R
library(lmtest)   # for bptest()
reg1 <- lm(SPRICE ~ LIVAREA + AGE, data = hs_data)

Breusch-Pagan Test, the default version
> bptest(reg1)
studentized Breusch-Pagan test
data: reg1;  BP = 192.5768, df = 2, p-value < 2.2e-16

White Test, without cross-products
> bptest(reg1, ~ LIVAREA + I(LIVAREA^2) + AGE + I(AGE^2), data = hs_data)
studentized Breusch-Pagan test
data: reg1;  BP = 278.4171, df = 4, p-value < 2.2e-16

White Test, with cross-products
> bptest(reg1, ~ LIVAREA + I(LIVAREA^2) + AGE + I(AGE^2) + I(AGE * LIVAREA), data = hs_data)
studentized Breusch-Pagan test
data: reg1;  BP = 282.2782, df = 5, p-value < 2.2e-16

Detection
Test for time dependence of residual variance - ARCH
Consider daily exchange rate changes (USD/UKP), dx_t (= 100 ×
log-difference), 4 Jan 1971 to 7 Feb 2014, in usdukp.xls/usdukp.wf1.

We estimate an AR(1) model:

dx_t = 0.0015 + 0.0437 dx_{t-1} + u_t
       (0.0025)  (0.0096)

We save the estimated residuals û_t. There are clear volatility
clusters in time, i.e. a non-constant error variance.

Detection
Test for time dependence of residual variance - ARCH

Could we use the variable time, t, as an explanatory variable in a
Breusch-Pagan test? Only if the variance is consistently increasing
or decreasing.

Volatility clusters: high values for σ_t² are likely to be followed by
further high values for σ_{t+1}², σ_{t+2}², σ_{t+3}², etc.

We do not know the values for σ_{t+j}², but we have the proxies û_{t+j}².

If volatility clusters: high values of û_t² are likely to be followed by
further high values for û_{t+1}², û_{t+2}², û_{t+3}², etc. -
Autoregressive Conditional Heteroskedasticity (ARCH).

Detection
Test for time dependence of residual variance - ARCH

Use the auxiliary regression:

û_t² = δ₀ + δ₁ û_{t-1}² + δ₂ û_{t-2}² + ... + δ_k û_{t-k}² + ε_t

H0: δ₁ = ... = δ_k = 0, homoskedastic residuals
HA: any δ_j ≠ 0 for j = 1, ..., k, heteroskedastic (ARCH) residuals

This delivers the test statistic LM = T·R² ~a χ²_k under the null
hypothesis. Here T is the number of observations in the auxiliary
regression.

ARCH LM Test in R (ArchTest() is in the FinTS package):
> ArchTest(reg2$residuals, lags = 12)
ARCH LM-test; Null hypothesis: no ARCH effects
data: reg2$residuals
Chi-squared = 1004.562, df = 12, p-value < 2.2e-16

Robust (White) standard errors

If there is heteroskedasticity in the error terms then

Var(β̂) = (X'X)⁻¹ X'ΩX (X'X)⁻¹   (14)

and to estimate this we need an estimate for

Ω = diag(σ₁², σ₂², ..., σ_N²)   (15)

i.e. the diagonal matrix with σ₁², σ₂², ..., σ_N² on the diagonal and
zeros elsewhere.

Do we have any information on σ₁², σ₂², etc.? û₁, û₂, ...

Robust (White) standard errors


Can we estimate a variance on the basis of one observation? In
general we cannot. This implies that, in general,

Ω̂ = diag(û₁², û₂², ..., û_N²)   (16)

is not a useful estimate of Ω.

However, we are not really interested in an estimate of Ω but
rather in an estimate of Var(β̂). It turns out that it is alright to use
Ω̂ in the context of estimating Var(β̂). (Halbert White)

Var(β̂) = (X'X)⁻¹ X'Ω̂X (X'X)⁻¹   (17)

See the RStudio work for how this is implemented in R.
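
A sketch of White standard errors with the sandwich and lmtest packages; hs_data and the model are those assumed in the earlier example:

library(sandwich)
library(lmtest)

reg1    <- lm(SPRICE ~ LIVAREA + AGE, data = hs_data)
V_white <- vcovHC(reg1, type = "HC0")    # (X'X)^(-1) X' Omega_hat X (X'X)^(-1)
coeftest(reg1, vcov = V_white)           # t-tests with robust standard errors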

Robust (White) standard errors

Asymptotically

β̂ ~a N(β, (X'X)⁻¹ X'Ω̂X (X'X)⁻¹)

and hence, if Var(β̂_i) is based on (17), t-tests can be used for
inference (using critical values from N(0, 1)).

This is called heteroskedasticity-robust inference / robust standard
errors.

In Wooldridge (Chapter 8.2) an element-wise version is given.

Heteroskedasticity-robust LM tests (using auxiliary regressions) are
also available - no detail required. Wooldridge, pp. 269-271 (4th ed),
264-265 (5th ed).


Generalised Least Squares (GLS)


The idea

If Ω ≠ σ²I, is there an estimator for β which has a smaller variance
than β̂_OLS?

For any symmetric and positive definite matrix, such as Ω, you can
find a non-singular (n × n) matrix P such that

Ω = P P'   (18)

From this it follows that

P⁻¹ Ω P'⁻¹ = I   (19)

Recall the original model

y = Xβ + u   (20)
Var(u) = Ω ≠ σ²I   (21)
β̂ = (X'X)⁻¹ X'y   (22)
Var(β̂) = (X'X)⁻¹ X'ΩX (X'X)⁻¹   (23)

Generalised Least Squares (GLS)


The idea

Transform the model such that the new residuals are homoskedastic.
Recall that if

u ~ N(0, Ω)   (24)

and

v = P⁻¹ u

then

v ~ N(0, P⁻¹ Ω P'⁻¹)   (25)

resulting in

v ~ N(0, I)   (26)

Generalised Least Squares (GLS)


The idea

Hence premultiply equation (20) with P⁻¹:

P⁻¹ y = P⁻¹ Xβ + P⁻¹ u
ỹ = X̃β + v,   Var(v) = I   (27)

OLS can be applied without any modification to the new variables
ỹ = P⁻¹ y and X̃ = P⁻¹ X.

Note that β remained unchanged!

Generalised Least Squares (GLS)


The idea

GLS estimator of β:

β̂_GLS = (X̃'X̃)⁻¹ X̃'ỹ   (28)
      = (X' P⁻¹' P⁻¹ X)⁻¹ X' P⁻¹' P⁻¹ y
      = (X' Ω⁻¹ X)⁻¹ X' Ω⁻¹ y

Var(β̂_GLS) = (X' Ω⁻¹ X)⁻¹   (29)

Model (27) meets all the GM assumptions. Hence β̂_GLS is BLUE.

As in general β̂_GLS ≠ β̂_OLS, it cannot be guaranteed that β̂_OLS
is efficient any longer.

Weighted Least Squares - WLS


GLS - implementation

How to specify P? From (18), with heteroskedastic but uncorrelated
errors,

Ω = diag(σ₁², σ₂², ..., σ_N²)   and   P = diag(σ₁, σ₂, ..., σ_N)   (30)

If one knew Ω, one could specify P. Of course we do not know Ω.

Weighted Least Squares - WLS


GLS - implementation

In certain applications you may suspect that σ_i is proportional to a
certain variable, say z_i, and you could then use the matrix

P = diag(z₁, z₂, ..., z_N)   (31)

This is then essentially what is sometimes (see Wooldridge chapter
8.4) called weighted least squares.

Weighted Least Squares - WLS


GLS - implementation
The root of this name is apparent from

ỹ = P⁻¹ y = (y₁/z₁, y₂/z₂, ..., y_N/z_N)'   (32)

and ỹ is merely a re-weighted version of y. X̃ is re-weighted in the
same fashion. If the first column in X is a column of ones, the first
column of X̃ will be a column with the reciprocals of all the z_i.
Hence, in such a case there would be no constant in (27).

Weighted Least Squares - WLS


GLS - implementation

Note:

- The variable z_i should be strictly positive. Otherwise one
  implicitly uses negative variances (hint: you can always use
  exponentials!).
- Interpret the parameter estimates in the untransformed model.
- R² cannot be compared between the transformed and
  untransformed model.
- In R(Studio) use:
  reg3 <- lm(SPRICE ~ LIVAREA + AGE, data = hs_data, weights = (1/LIVAREA))

Feasible Generalised Least Squares - FGLS

GLS - implementation

Sometimes one suspects that more than one variable drives the
residual variance, Var(u_i).

Argue that Var(u_i) is proportional to some linear combination of
variables z_i.

We need to ensure positivity of the elements to be substituted onto
the diagonal of P.

Reading: Wooldridge (pp 282-284, 4th ed) (pp 276-278, 5th ed)

ECON20110/30370 Econometrics
2015/16 - Semester 2 - Week 5

Ralf Becker

February 29, 2016


Table of contents

Autocorrelation
What is it?
Consequence
Detection
LM test
Extra notes on detection
How to deal with autocorrelation
Newey-West standard errors
Estimation in differences
Empirical Example

Specification Testing
Overview
RESET Test


Assumptions

A common issue in TS data. Recall the TS model

y_t = x_t β + u_t   (1)

and

Assumption TS5
No autocorrelation (serial correlation).

Corr(u_t, u_{t-s} | x_t, x_{t-s}) = 0 for all s ≠ 0

Formally, autocorrelation is the breakdown of assumption TS5.

Why do we see autocorrelation?

When does it make sense to relate residuals?

- Spatial relationship (in CS data): residuals for neighbouring
  regions may be correlated.
  Example: consumption-income relationship.
  Observation units: postcodes in Manchester. Neighbouring
  postcodes tend to belong to similar socio-economic
  backgrounds and may hence display similar deviations from
  predicted patterns.

- Time relationship (in TS data): residuals close to each other in
  time may be related to each other. In this case we also call
  this correlation serial correlation (autocorrelation).

A simple model
The setup

A simple regression set-up, but now the error terms are not iid but
dependent. We specify them as following the one process we know
that induces dependence, an AR(1) process:

y_t = x_t β + u_t   (2)
u_t = ρ u_{t-1} + v_t   (3)
v_t ~ N(0, σ_v²)   (4)

Corr(u_t, u_{t-1}) ≠ 0. The parameter ρ determines how strong this
relationship is. The random term v_t may be normally distributed
with zero mean and variance σ_v².

A simple model
The error term dynamics
If equation (3) is true for u_t it is also true for u_{t-1}:

u_{t-1} = ρ u_{t-2} + v_{t-1}   (5)

Substitution into (3) yields

u_t = ρ (ρ u_{t-2} + v_{t-1}) + v_t   (6)

and after recursive substitution we obtain:

u_t = v_t + ρ v_{t-1} + ρ² v_{t-2} + ρ³ v_{t-3} + ...   (7)

v_t: the shocks to our system, the only source of randomness.
u_t: determined by today's shock v_t but also by all previous shocks
v_{t-1}, v_{t-2}, etc.; it is a compound error term.

If |ρ| > 1, past shocks would have an ever-increasing influence on
today's residual.
If |ρ| < 1, shocks die out (weak dependence). This is the case we
will restrict our attention to initially (see assumption TS1).

A simple model
The error term properties

Let's apply what we know about AR(1) processes. A few properties
of the regression error term (which is a compound shock) u_t:

E(u_t) = 0   (8)
Var(u_t) = σ² = σ_v² / (1 - ρ²)   (9)
Corr(u_t, u_{t-1}) = ρ   (10)
Cov(u_t, u_{t-1}) = σ² ρ   (11)
Corr(u_t, u_{t-k}) = ρ^k   (12)
Cov(u_t, u_{t-k}) = σ² ρ^k   (13)

It is apparent that the condition for TS5 to be valid is ρ = 0.

Why does autocorrelation occur in real data?

- Economic shocks that have a persistent effect. This will
  generate a series of either positive or negative residuals.
  Shocks may cause slow adjustment processes. Extreme case:
  nonstationary variables.

- Misspecified models - functional form (e.g. specification in
  levels with nonstationary data).

- Misspecified models - omitted variable. If the omitted variable
  is persistent, its omission might result in autocorrelated
  residuals. This may be a dynamic misspecification, i.e.
  omitting y_{t-1} or x_{t-1}.

Consequence
Returning to the matrix representation of the model, DGP (2) and
(3) can be written as follows:

y = Xβ + u   (14)
Var(u) = Ω ≠ σ²I   (15)
Var(u) = Ω   (16)

        | 1         ρ         ρ²   ...  ρ^{T-1} |
        | ρ         1         ρ    ...  ρ^{T-2} |
Ω = σ²  | ρ²        ρ         1    ...  ...     |
        | ...       ...       ...  ...  ...     |
        | ρ^{T-1}   ρ^{T-2}   ...  ...  1       |

where T = sample size.

See (11) and (13): the covariances are on the off-diagonals,
Cov(u_t, u_{t-k}) = σ² ρ^k.

Consequence
It is clearly not possible to simplify this to σ²I.
Residual processes other than the AR(1) will result in a different
setup for the variance-covariance matrix Ω.

Is β̂, when estimated by means of OLS, still consistent and
efficient?

Similar to heteroskedasticity: β̂ is still consistent.
The derivation of Var(β̂) remains unchanged:

Var(β̂) = (X'X)⁻¹ X'ΩX (X'X)⁻¹   (17)

which simplifies to

Var(β̂) = σ² (X'X)⁻¹   (18)

only when Ω = σ²I.

Consequence
Under assumptions TS1 to TS5:

β̂ ~a N(β, σ² (X'X)⁻¹)   (19)

Under assumptions TS1 to TS3:

β̂ ~a N(β, (X'X)⁻¹ X'ΩX (X'X)⁻¹)   (20)

Test the following null hypothesis on the ith element of β, β_i, for
example H0: β_i = 0.5. One would typically use a t-test calculated
according to

t = (β̂_i - 0.5) / s_{β̂_i} ~a N(0, 1)   (21)

If one were to incorrectly calculate s_{β̂_i} from equation (19) rather
than from (20), the calculated t-statistic (21) would turn out not to
be asymptotically normally distributed.

Detection of Autocorrelation
Informal tools - Time Series Plots

We have made assumptions for the error terms. The latter,
however, are unobserved. We will use estimated regression
residuals to test the validity of the assumptions on the unobserved
error terms.

Time series plot of residuals: compare the following two regression
residual plots (use data in TSdataSpuriouosRegressions.csv and
usdukp.csv).


Detection of Autocorrelation
Informal tools - Time Series Plots
Panel A: residuals from regressing agrval_t on aid_t.
Panel B: residuals from regressing log(usdukp)_t on log(usdukp)_{t-1}.

The left panel clearly has runs of observations above and below the
mean (of zero): residual u_t is correlated with its predecessor u_{t-1}.
The residuals on the right appear more random.

How could we quantify this in a statistic?

Detection of Autocorrelation
Testing for autocorrelation - LM test

This is sometimes called the Breusch-Godfrey test. It allows for
higher-order autocorrelation.

y_t = x_t β + u_t   (22)
u_t = ρ₁ u_{t-1} + ρ₂ u_{t-2} + ... + ρ_k u_{t-k} + v_t

where x_t may contain a constant and lagged dependent variables.

We want to assess the relationship between the error terms, which
are unobserved. We will instead use the estimated residuals û_t as
proxies and examine the relationship between û_t and its lagged
versions û_{t-1}, û_{t-2}, etc.

Detection of Autocorrelation
Testing for autocorrelation - LM test

Auxiliary regression procedure (see the sketch below):

1. First estimate regression model (22) by OLS,
2. save the residuals û_t, and then
3. run the following auxiliary regression:

   û_t = α + x_t γ + ρ₁ û_{t-1} + ρ₂ û_{t-2} + ... + ρ_k û_{t-k} + v_t

4. Test for residual autoregression of order k:

   H0: ρ₁ = ρ₂ = ... = ρ_k = 0
   HA: any ρ_i ≠ 0 for i = 1, ..., k

   Use the LM test (LM = nR² ~a χ²_k), where n is the number of
   observations in the auxiliary regression (requires conditional
   homoskedasticity of v_t).
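
A sketch of this procedure in R, assuming a data frame ts_data with columns y and x (hypothetical names); bgtest() from the lmtest package automates steps 1 to 4:

library(lmtest)

reg <- lm(y ~ x, data = ts_data)
bgtest(reg, order = 4)                   # LM test against AR(4) in the residuals

# By hand, via the auxiliary regression (bgtest pads initial lags with zeros,
# so the numbers can differ slightly):
u_hat <- resid(reg)
n     <- length(u_hat)
u_lag <- sapply(1:4, function(k) c(rep(NA, k), head(u_hat, n - k)))
aux   <- lm(u_hat ~ ts_data$x + u_lag)   # rows with missing lags are dropped
LM    <- length(resid(aux)) * summary(aux)$r.squared
pchisq(LM, df = 4, lower.tail = FALSE)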


Detection of Autocorrelation
Extra notes on detection

- The auxiliary regressions for autocorrelation require the
  inclusion of all explanatory variables (unlike auxiliary
  regressions for heteroskedasticity).
- The LM test is flexible: the auxiliary regression may include
  lagged residuals for e.g. i = 1, 3, 12 only. There is no need to
  include all k = 12 lags.
- The maximum lag is to be chosen according to the data frequency.
- The LM test is the standard test.

How to deal with autocorrelation


Reasons for the presence of autocorrelation:
- Misspecified models
  - functional form
  - omitted variable
  - use of nonstationary variables
- Genuine AC (persistent shock effects; slow adjustment processes)

General action:
- Fix the model (when the model is misspecified)
  - use the correct functional form
  - include all relevant variables
  - use stationary transformations of the non-stationary variables
- Use estimators which allow for autocorrelated residuals (like
  GLS, BUT not often done for AC)
- Use robust inference procedures (leave parameter estimators
  unchanged - see below)

How to deal with autocorrelation


Robust Inference: Newey-West standard errors

A GLS-type approach (reformulating the model to get nice error
terms) is in principle available, but hardly ever applied.

When autocorrelation is more complex, and/or
x_t includes a lagged dependent variable, and/or
the residuals are conditionally heteroskedastic,

we adapt the variance-covariance matrix to obtain valid large-sample
inference. Recall

Var(β̂) = (X'X)⁻¹ X'ΩX (X'X)⁻¹   (23)

How to deal with autocorrelation


Robust Inference: Newey-West standard errors

Recall the case of heteroskedastic residuals: Ω had the following
structure:

Ω = diag(σ₁², σ₂², ..., σ_T²)   (24)

and was estimated by

Ω̂_White = diag(û₁², û₂², ..., û_T²)   (25)

How to deal with autocorrelation


Robust Inference: Newey-West standard errors
Autocorrelated residuals: Now $\Omega$ has non-zero off-diagonal elements (allowing for autocorrelated residuals):

$\Omega = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} & \sigma_{14} & \cdots & \sigma_{1T} \\ \sigma_{12} & \sigma_2^2 & \sigma_{23} & \sigma_{24} & \cdots & \sigma_{2T} \\ \sigma_{13} & \sigma_{23} & \sigma_3^2 & \sigma_{34} & \cdots & \sigma_{3T} \\ \sigma_{14} & \sigma_{24} & \sigma_{34} & \sigma_4^2 & \cdots & \sigma_{4T} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ \sigma_{1T} & \sigma_{2T} & \sigma_{3T} & \sigma_{4T} & \cdots & \sigma_T^2 \end{pmatrix}$  (26)

and approximating $\sigma_{ij}$ with $\hat\sigma_{ij} = \hat u_i \hat u_j$ would deliver

$\hat\Omega = \begin{pmatrix} \hat u_1^2 & \hat u_1\hat u_2 & \hat u_1\hat u_3 & \hat u_1\hat u_4 & \cdots & \hat u_1\hat u_T \\ \hat u_1\hat u_2 & \hat u_2^2 & \hat u_2\hat u_3 & \hat u_2\hat u_4 & \cdots & \hat u_2\hat u_T \\ \hat u_1\hat u_3 & \hat u_2\hat u_3 & \hat u_3^2 & \hat u_3\hat u_4 & \cdots & \hat u_3\hat u_T \\ \hat u_1\hat u_4 & \hat u_2\hat u_4 & \hat u_3\hat u_4 & \hat u_4^2 & \cdots & \hat u_4\hat u_T \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ \hat u_1\hat u_T & \hat u_2\hat u_T & \hat u_3\hat u_T & \hat u_4\hat u_T & \cdots & \hat u_T^2 \end{pmatrix}$  (27)

How to deal with autocorrelation


Robust Inference: Newey-West standard errors

Note that $\hat\Omega = \hat u \hat u'$ and hence, if this is substituted into $X'\hat\Omega X$, we get the following result:

$X'\hat\Omega X = X'\hat u\,\hat u' X = (X'\hat u)(\hat u' X) = 0$

(since the OLS residuals satisfy $X'\hat u = 0$). Therefore (27) is not useful as an estimator of $\Omega$ in (23).


How to deal with autocorrelation


Robust Inference: Newey-West standard errors

Newey and West propose

$\hat\Omega_{nw} = \begin{pmatrix} \hat u_1^2 & w_1\hat u_1\hat u_2 & w_2\hat u_1\hat u_3 & 0 & \cdots & 0 \\ w_1\hat u_1\hat u_2 & \hat u_2^2 & w_1\hat u_2\hat u_3 & w_2\hat u_2\hat u_4 & \ddots & \vdots \\ w_2\hat u_1\hat u_3 & w_1\hat u_2\hat u_3 & \hat u_3^2 & w_1\hat u_3\hat u_4 & \ddots & 0 \\ 0 & w_2\hat u_2\hat u_4 & w_1\hat u_3\hat u_4 & \hat u_4^2 & \ddots & w_2\hat u_{T-2}\hat u_T \\ \vdots & \ddots & \ddots & \ddots & \ddots & w_1\hat u_{T-1}\hat u_T \\ 0 & \cdots & w_2\hat u_{T-2}\hat u_T & w_1\hat u_{T-1}\hat u_T & & \hat u_T^2 \end{pmatrix}$  (28)


How to deal with autocorrelation


Robust Inference: Newey-West standard errors

A graphical representation of the proposed weights $w_i$ (figure not reproduced): white cells represent 0s, lighter shades illustrate smaller $w_i$; the further away from the diagonal, the smaller the weight. In this illustration there is non-zero weight only for the first two off-diagonals.

$\mathrm{Var}_{nw}(\hat\beta) = (X'X)^{-1}\, X'\hat\Omega_{nw} X\, (X'X)^{-1}$  (29)


How to deal with autocorrelation


Robust Inference: Newey-West standard errors

Note:
- The exposition here differs from Wooldridge although the structure is similar (chapter 12.5). (Reading: Hamilton, Time Series Analysis, p. 219.)
- It also caters for heteroskedastic residuals.
- The parameter estimate remains unchanged.

Inference using $\mathrm{Var}_{nw}(\hat\beta)$ is valid asymptotically only. In the presence of AC and/or HS (testing, say, $H_0: \beta_i = 0.5$):

$t_{OLS} = \dfrac{\hat\beta_i - 0.5}{s_{\hat\beta_i,OLS}} \overset{a}{\sim}\ ?$  (30)

$t_{NW} = \dfrac{\hat\beta_i - 0.5}{s_{\hat\beta_i,nw}} \overset{a}{\sim} N(0,1)$  (31)
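In practice these Newey-West standard errors are most easily obtained from the sandwich and lmtest packages; a short sketch, assuming a fitted time-series regression `fit` (the lag length is only illustrative):

# HAC (Newey-West) inference for an existing lm/dynlm fit (sketch)
library(sandwich)
library(lmtest)

V_nw <- NeweyWest(fit, lag = 4, prewhite = FALSE)  # robust variance-covariance matrix
coeftest(fit, vcov. = V_nw)                        # t-tests with Newey-West standard errors
coeftest(fit)                                      # compare: default OLS standard errors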


How to deal with autocorrelation


Estimation in differences
Common proposal: if there are strongly autocorrelated residuals in

$y_t = \alpha + \beta x_t + u_t$  (32)
$u_t = \rho u_{t-1} + v_t$, where $v_t$ is iid,  (33)

estimate the regression in differenced form using

$\Delta y_t = y_t - y_{t-1}$ and $\Delta x_t = x_t - x_{t-1}$

as dependent and explanatory variables respectively.

To evaluate whether this makes sense, reconstruct $\Delta y_t$ ($= y_t - y_{t-1}$) from (32):

$y_t - y_{t-1} = (\alpha + \beta x_t + u_t) - (\alpha + \beta x_{t-1} + u_{t-1}) = \beta(x_t - x_{t-1}) + (u_t - u_{t-1})$
$\Delta y_t = \beta \Delta x_t + (u_t - u_{t-1})$  (34)

First note that one cannot estimate $\alpha$ from (34).

How to deal with autocorrelation


Estimation in differences

The idea behind this is that perhaps $(u_t - u_{t-1})$ reduces to the iid process $v_t$:

$(u_t - u_{t-1}) = (\rho u_{t-1} + v_t) - u_{t-1}$  (35)
$= (\rho - 1)\, u_{t-1} + v_t$

This, however, is only the case when $\rho = 1$.

This implies nonstationary error terms, something we previously excluded via assumption TS1. If you use nonstationary data in (32), then $\hat\beta$ is not asymptotically $N\big(\beta, \mathrm{Var}(\hat\beta)\big)$ as TS1 is not met.
⇒ spurious regression problem
When using persistent time-series data it is paramount to test the series for stationarity (see the sketch below).
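As a rough illustration of that advice, an R sketch using the adf.test function from the tseries package (y and x are hypothetical series):

# Check persistence/stationarity before regressing (sketch; y and x are hypothetical)
library(tseries)

adf.test(y)              # Augmented Dickey-Fuller test on the level
adf.test(diff(y))        # and on the first difference

# If differencing is warranted, estimate (34) without an intercept:
summary(lm(diff(y) ~ diff(x) - 1))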

A worked example
Wooldridge Ex 11.7

Relating US hourly wage (hrwage) to productivity (output/hour)


(outphr ). Data in EARNS.csv.

Want to establish the elasticity of wage with respect to productivity.
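A minimal R sketch of this regression; the column names hrwage and outphr are taken from the text, while everything else, including the log-log form that lets the slope be read as an elasticity, is an assumption:

# Elasticity of hourly wage w.r.t. productivity via a log-log regression (sketch)
earns <- read.csv("EARNS.csv")
fit   <- lm(log(hrwage) ~ log(outphr), data = earns)
summary(fit)             # the slope estimate is the wage-productivity elasticity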


Specification Testing
Overview

Potential problems:

I Heteroskedasticity
I Autocorrelation
I Omitted variable (specific suspicion about a missing variable)
I Functional form (RESET test)
I Structural change (Chow test)
For heteroskedasticity and autocorrelation the alternatives were
well defined. Formulating a test was straightforward.


Specification Testing
Overview

Omitted variable:

Consider:
$y_t = \alpha + \beta x_t + u_t$  (36)

I Suspected that another variable $z_t$ should be included:

$y_t = \alpha + \beta x_t + \gamma z_t + u_t$  (37)

A simple t-test of $H_0: \gamma = 0$ will do the trick.
I Quadratic rather than a linear relationship between $x_t$ and $y_t$:

$y_t = \alpha + \beta x_t + \gamma x_t^2 + u_t$  (38)

and test $H_0: \gamma = 0$ using a t-test.


Specification Testing
RESET Test

Unspecific alternative? Need a test that raises a flag if something


is wrong.
No need to say what something is beforehand.
RESET (REgression Specification Error Test) by Ramsey.
Apply the following steps:
1. Estimate the model under the $H_0$ that equation (36) is correctly specified $(\hat\alpha, \hat\beta)$.
2. Obtain a series of $\{\hat y_t\}$ as follows:

$\hat y_t = \hat\alpha + \hat\beta x_t$.  (39)


Specification Testing
RESET Test

3. Then estimate the following equation

$y_t = \alpha + \beta x_t + \delta_2 \hat y_t^2 + \delta_3 \hat y_t^3 + u_t$  (40)

and test $H_0: \delta_2 = \delta_3 = 0$ (standard F-test).

Using $\hat u_t$ as the dependent variable delivers exactly the same result.
If H0 is rejected there is some problem with the original model (36).
Not clear what the problem is.
Could be anything (even autocorrelation and heteroskedasticity).
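A small R sketch of these steps, plus the packaged version in lmtest; y and x stand for the variables in the original model (36):

# RESET test (sketch; y and x are the original variables)
fit  <- lm(y ~ x)
yhat <- fitted(fit)
aux  <- lm(y ~ x + I(yhat^2) + I(yhat^3))   # auxiliary regression (40)
anova(fit, aux)                             # F-test of H0: both added terms are zero

# Packaged equivalent:
# library(lmtest); resettest(fit, power = 2:3, type = "fitted")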


ECON20110/30370 Econometrics
2015/16 - Semester 2 - Week 6

Ralf Becker

April 6, 2016


Table of contents

Specification Testing
Overview
RESET Test

Structural Change and Dummy Variables


Structural Change
Dummy Variables
Some more issues on dummy variables


Specification Testing
Overview

Potential problems:

I Heteroskedasticity
I Autocorrelation
I Omitted variable (specific suspicion about a missing variable)
I Functional form (RESET test)
I Structural change (Chow test)
For heteroskedasticity and autocorrelation the alternatives were
well defined. Formulating a test was straightforward.


Specification Testing
Overview

Omitted variable:

Consider:
$y_t = \alpha + \beta x_t + u_t$  (1)

I Suspected that another variable $z_t$ should be included:

$y_t = \alpha + \beta x_t + \gamma z_t + u_t$  (2)

A simple t-test of $H_0: \gamma = 0$ will do the trick.
I Quadratic rather than a linear relationship between $x_t$ and $y_t$:

$y_t = \alpha + \beta x_t + \gamma x_t^2 + u_t$  (3)

and test $H_0: \gamma = 0$ using a t-test.


Specification Testing
RESET Test

Unspecific alternative? Need a test that raises a flag if something


is wrong.
No need to say what something is beforehand.
RESET (REgression Specification Error Test) by Ramsey.
Apply the following steps:
1. Estimate the model under the $H_0$ that equation (1) is correctly specified $(\hat\alpha, \hat\beta)$.
2. Obtain a series of $\{\hat y_t\}$ as follows:

$\hat y_t = \hat\alpha + \hat\beta x_t$.  (4)


Specification Testing
RESET Test

3. Then estimate the following equation

$y_t = \alpha + \beta x_t + \delta_2 \hat y_t^2 + \delta_3 \hat y_t^3 + u_t$  (5)

and test $H_0: \delta_2 = \delta_3 = 0$ (standard F-test).

Using $\hat u_t$ as the dependent variable delivers exactly the same result.
If H0 is rejected there is some problem with the original model (1).
Not clear what the problem is.
Could be anything (even autocorrelation and heteroskedasticity).


Structural Change and Dummy Variables

Reading: Wooldridge (5th ed) pp 235-238 (Chp. 7) and p 437


(Chp. 13).
Relationship between variables may change:
- in time (TS data)
- for different categories of observations (say male and female)

What will we do:
1. How can we detect this?
2. What can we do about it?


Structural Change
An example

I Annualised UK CPI growth (inflation rate), UKCPI.csv.


I Quarterly data from 1988Q2 to 2013Q4
I Let's withhold 2012Q1 to 2013Q4 for forecasting
I Let's label this series $y_t$
I Estimate an AR(4) model (see the R sketch below):

$\hat y_t = 0.542 + 0.041\, y_{t-1} + 0.122\, y_{t-2} - 0.054\, y_{t-3} + 0.690\, y_{t-4}$
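A rough R sketch of how such an AR(4) could be estimated with the dynlm package (the CSV column name is an assumption; any way of constructing the four lags would do):

# AR(4) for quarterly inflation (sketch; the column name 'infl' is hypothetical)
library(dynlm)

ukcpi <- read.csv("UKCPI.csv")
y     <- ts(ukcpi$infl, start = c(1988, 2), frequency = 4)
y_est <- window(y, end = c(2011, 4))    # hold back 2012Q1-2013Q4 for forecasting

ar4 <- dynlm(y_est ~ L(y_est, 1) + L(y_est, 2) + L(y_est, 3) + L(y_est, 4))
summary(ar4)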


Structural Change
An example - Forecasts from full sample estimation

Figure: The data series from 1988Q2 to 2011Q4 and then 8 quarters of forecasts

Structural Change
An example - Forecasts from full sample estimation

Figure: The forecasts and realisations from 2012 and 2013

Structural Change
An example - Forecasts from full sample estimation

I The problem lies in the inclusion of the early sample period


that includes significantly higher inflation.
I This will affect the unconditional mean

$E(y_t) = \dfrac{0.542}{1 - 0.041 - 0.122 + 0.054 - 0.690} = 2.689$  (6)

to which we would expect this stationary process to converge.
I Also, the RESET test has a p-value of 0.0019.
I If the early observations are from a regime that may not be relevant any more, then we may want to exclude these observations.
I Let's re-estimate the model with observations starting from 1992Q1 instead (exclude the first 4 years of data).

Structural Change
An example - Forecasts from 92+ sample estimation

The result is the following:

$\hat y_t = 1.291 - 0.047\, y_{t-1} + 0.052\, y_{t-2} - 0.118\, y_{t-3} + 0.499\, y_{t-4}$
        (0.425)   (0.092)        (0.089)        (0.072)        (0.074)
RSS = 223.233; n = 80  (7)

I Now we have E (yt ) = 2.102, a significantly lower


unconditional expectation
I This implies that the conditional forecasts will come in
somewhat lower.


Structural Change
An example - Forecasts from 92+ sample estimation

Figure: The data series from 2006Q1 to 2013Q4 with forecasts



Structural Change
The Chow Test

Idea: Regress full sample and regress sub-samples. If there is no


change there should be no difference in overall fit.
I Let's use the RSS as a measure of fit
I Full Model - 1989Q2 to 2011Q4: RSSr = 368.7223, n = 91
I Split the same period into two sub-samples:
I Sample 1 - 1989Q2 to 1991Q4: RSS1 = 43.8610, n = 11
I Sample 2 - 1992Q1 to 2011Q4: RSS2 = 223.2332, n = 80
I RSSu = RSS1 + RSS2 = 267.0942
I Difference in fit is 101.6281
I Is this difference significant?


Structural Change
The Chow Test

k = number of parameters estimated (here k = 5: the constant plus four AR coefficients)

$F = \dfrac{(RSS_r - RSS_u)/k}{RSS_u/dof_u}$  (8)
$F \sim F_{k,\,T-2k}$.  (9)

$H_0$: no difference between full- and sub-sample fit
$H_A$: difference between full- and sub-sample fit
This is just a special case of the general F-test.
Example:
$RSS_1 = 43.8610$; $RSS_2 = 223.2332$
$RSS_r = 368.7223$; $RSS_u = 267.0942$
$dof_u = 91 - 2 \cdot 5 = 81$
$F = 6.164$; $F_{crit,0.05} = 2.32$; $F_{crit,0.01} = 3.23$.
Reject $H_0$.
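A short R sketch of this calculation from three separate fits of the same model (full sample and the two sub-samples); the model objects are hypothetical:

# Chow test from residual sums of squares (sketch; fit_full, fit_1, fit_2 are hypothetical)
RSS_r <- sum(resid(fit_full)^2)
RSS_u <- sum(resid(fit_1)^2) + sum(resid(fit_2)^2)

k     <- length(coef(fit_full))      # 5 parameters in the AR(4) with constant
dof_u <- nobs(fit_full) - 2 * k      # 91 - 10 = 81 in the example

F_stat <- ((RSS_r - RSS_u) / k) / (RSS_u / dof_u)
pf(F_stat, df1 = k, df2 = dof_u, lower.tail = FALSE)   # p-value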

Structural Change
The Chow Test - Extra Notes

I Only $F_{k,T-2k}$ distributed if the error variance in the two subsamples is identical.
I If $H_0$ is rejected we do not know whether it is due to a changing intercept or slope (or both) → use dummy variables.
I We need to know the time at which we suspect the break; if we don't, then we need some sort of recursive strategy.
I We need to know that there is one break only.
There are testing strategies which have been developed to deal
with the two latter problems.


Dummy Variables
An Introduction

Reading: Semester 1
I Different attack to the previous example.
I A dummy variable is a variable that takes values of 0 and 1.
I The criterion that decides between 0 and 1 depends on the
problem,
I male - female
I pre - post 1981
I pre - post EMU etc.


Dummy Variables
An Example

I Use house price data, Stockton3.csv


I Consider the following model:

$sprice_i = \beta_0 + \beta_1\, pool_i + u_i$  (10)

I The pool variable is defined as follows:

$pool_i = \begin{cases} 0 & \text{for houses without pool} \\ 1 & \text{for houses with pool} \end{cases}$  (11)


Dummy Variables

$\widehat{sprice}_i = 118210 + 69142\, pool_i$  (12)
            (1345)    (6026)

$R^2 = 0.048$; RSS $= 1.17 \times 10^{13}$, n = 2610  (13)

I Average house price for houses without pool ($pool_i = 0$):

$E(sprice_i \mid pool_i = 0) = 118210 = \hat\beta_0$

I Average house price for houses with pool ($pool_i = 1$):

$E(sprice_i \mid pool_i = 1) = 187352 = \hat\beta_0 + \hat\beta_1$

I Of course this model is hugely misspecified!


Dummy Variables

Consider the following model:

$sprice_i = \beta_0 + \beta_1\, livarea_i + u_i$  (14)

$\widehat{sprice}_i = 30637.55 + 9466.7\, livarea_i$
              (2263.4)    (132.09)
$R^2 = 0.663$; RSS $= 4.14 \times 10^{12}$, n = 2610

Question: Does the price/living area relationship differ between

houses with and without swimming pools?

Two strategies:
1. Estimate two models, one for houses with pool and another
for houses without pool
2. Estimate one model but use the pooli dummy variable

Dummy Variables

Strategy 1:

Pool:    $\widehat{sprice}_i = \hat\gamma_0 + \hat\gamma_1\, livarea_i$  (15)
$R_p^2 = 0.658$; RSS $= 4.51 \times 10^{11}$, n = 130

No pool: $\widehat{sprice}_i = \hat\delta_0 + \hat\delta_1\, livarea_i$  (16)
$R_{np}^2 = 0.647$; RSS $= 3.67 \times 10^{12}$, n = 2480

Strategy 2:

$\widehat{sprice}_i = \hat\beta_0 + \hat\beta_1\, livarea_i + \hat\beta_2\, pool_i + \hat\beta_3\, (livarea_i \times pool_i)$  (17)
$R_{dum}^2 = 0.665$; RSS $= 4.12 \times 10^{12}$, n = 2610


Dummy Variables

Question: How are the parameters in the different strategies


related to each other?
I $\hat\beta_0 = \hat\delta_0$ (the no-pool intercept)
I $\hat\beta_0 + \hat\beta_2 = \hat\gamma_0$ (the pool intercept)
I $\hat\beta_1 = \hat\delta_1$ (the no-pool slope)
I $\hat\beta_1 + \hat\beta_3 = \hat\gamma_1$ (the pool slope)

Therefore $\hat\beta_2$ and $\hat\beta_3$ can be interpreted as follows:

I $\hat\beta_2$ = the difference in the constant between houses with and without pool
I $\hat\beta_3$ = the difference in the slope (effect of living area on house price) between houses with and without pool


Dummy Variables
Testing for significance of Dummy Variables

An F-test of $\beta_2 = \beta_3 = 0$ can be interpreted as a test of whether or not the $sprice_i$/$livarea_i$ relationship differs between houses with and without pool.

$F = \dfrac{(RSS_r - RSS_u)/k}{RSS_u/dof_u}$  (18)

k = number of restrictions
$dof_u$ = degrees of freedom in the unrestricted model
$F \sim F_{k,\,dof_u}$.


Dummy Variables
Testing for significance of Dummy Variables

Dummy variable test:
$RSS_r = 4.14 \times 10^{12}$ from (14)
$RSS_u = 4.12 \times 10^{12}$ from (17)
$k = 2$ as $H_0: \beta_2 = \beta_3 = 0$
$dof_u = 2610 - 4 = 2606$

Chow test:
$RSS_r = 4.14 \times 10^{12}$ from (14)
$RSS_u = 4.12 \times 10^{12}$ from the sum of (15) and (16)
$k = 2$ as each model has 2 coefficients to estimate
$dof_u = (2480 + 130) - (2 + 2) = 2606$

Both versions lead to

$F = \dfrac{(4.14 - 4.12)\times 10^{12} / 2}{4.12 \times 10^{12} / (2610 - 4)} = 6.325$  (19)
$F \sim F_{2,\infty}$, $F_{cv,0.01} = 4.61$  (20)

And therefore we reject $H_0$.
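A compact R sketch of strategy 2 and the corresponding F-test; the variable names sprice, livarea and pool are taken from the text, while the data-frame name is hypothetical:

# Pool/no-pool interaction model and F-test (sketch)
stockton <- read.csv("Stockton3.csv")

restricted   <- lm(sprice ~ livarea, data = stockton)          # model (14)
unrestricted <- lm(sprice ~ livarea * pool, data = stockton)   # model (17): adds pool and livarea:pool

anova(restricted, unrestricted)    # F-test of H0: both dummy terms are zero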



Dummy Variables
Additional Notes

I In cross-section (CS) data, dummies differentiate between two (or more) groups, e.g. in studies where you have a control group.
I Beware of creating a perfect multicollinearity problem by including too many dummy variables (dummy variable trap).
Example:

$pool_i = \begin{cases} 0 & \text{if no pool} \\ 1 & \text{if pool} \end{cases}$  (21)
$nopool_i = \begin{cases} 0 & \text{if pool} \\ 1 & \text{if no pool} \end{cases}$  (22)
$const = 1$  (23)
$nopool_i = const - pool_i$.  (24)

If $pool_i$, $nopool_i$ and a constant are included, then they are perfectly collinear. One has to be left out.

Dummy Variables
Additional Notes

I Impulse dummies, to model once-off effects.


Example: Australia introduced VAT in July 2000. As cars were
to become cheaper, some purchases were delayed from June
into July. A dummy variable capturing such an effect would be


$D_t = \begin{cases} 0 & \text{for } t \le \text{May 2000} \\ -1 & \text{for } t = \text{June 2000} \\ +1 & \text{for } t = \text{July 2000} \\ 0 & \text{for } t > \text{July 2000} \end{cases}$  (25)

I Keep the number of dummy variables as small as possible.


However, if you omit a dummy variable which should be included you are facing an omitted variable problem.


Dummy Variables
Additional Notes

I Rejecting the H0 above may be due to different residual


variance in sub-samples. No easy way to include a dummy for
the variance into an OLS framework. (ML estimation
required)
I Interpretation can be tricky (see Wooldridge examples).
I The dependent variable might be in dummy variable form
(buy or no buy; interest rate change or no change). A whole
different set of models is required. (and ML estimation)


ECON20110/30370 Econometrics
2014/15 - Semester 2 - Week 8

Ralf Becker

April 16, 2016


Table of contents

Maximum Likelihood Estimation


Introduction
Example: Goals
Parameter Estimation
Outlook


Maximum Likelihood Estimation


Introduction
Reading: Wooldridge p 778-779 (4th ed); Thomas p 40-43;
Davidson and MacKinnon 399-404.
Where did the parameter estimate

$\hat\beta = (X'X)^{-1} X'y$  (1)

come from?
1. Minimisation of the residual sum of squares (Least Squares, LS):

$\min_{b}\ (y - Xb)'(y - Xb) = \hat u'\hat u$

or
2. Minimisation of sample moments (Method of Moments, MM):

$X'(y - X\hat\beta) = X'\hat u = 0$.

Maximum Likelihood Estimation


Introduction

For the model


$y = X\beta + u$  (2)
both resulted in the same estimator (1).

A third estimator derivation principle is Maximum Likelihood


(ML). For the model in (2) we will obtain the same estimator.

On other occasions not all three estimators may be obtainable.

Even if more than one estimator is feasible, ML usually has


desirable large-sample properties. Efficient and asymptotically
normally distributed!


Maximum Likelihood Estimation


Example: Goals
ML estimation comes into play when other methods are
inadequate.

Example: Let's say you want to model the number of goals scored in an English Premier League match. Data from all matches from August 2012 to 24 March 2014: 681 matches (EPL 2012to14.csv).


Maximum Likelihood Estimation


Example: Goals

A straightforward way to model this would be to use OLS for ($g_i$ = goals):

$g_i = \mu + u_i$  (3)

with $u_i \sim N(0, \sigma^2)$ and hence

$E(g_i) = \mu$  (4)


Maximum Likelihood Estimation


Example: Goals

Using this estimated model, what would be the following


probabilities?

$P(g_i \ge 1) = 0.846$
$P(g_i \ge 3) = 0.450$
$P(g_i < 0) = 0.056$

Problem: normality is clearly inappropriate here; goals are not a continuous r.v. (and the normal model puts positive probability on negative goal counts).


Maximum Likelihood Estimation


Example: Goals

One way to handle this is to acknowledge that $g_i$ only takes non-negative integer values. Accordingly, assume that $g_i \sim \mathrm{Poisson}(\lambda)$.

The density of a Poisson r.v. is

$f(g_i) = \dfrac{\lambda^{g_i} e^{-\lambda}}{g_i!}$.

The parameter to be estimated here is $\lambda$.

Note: $E(g_i) = \lambda$, $\mathrm{Var}(g_i) = \lambda$.


Parameter Estimation
Compare the empirical histogram with an arbitrary Poisson distribution ($\lambda = 3$):


Parameter Estimation
How do we find the optimal, i.e. the ML, parameter estimate?
The density of a Poisson r.v. is

$f(g_i; \lambda) = \dfrac{\lambda^{g_i} e^{-\lambda}}{g_i!}$.  (5)

The parameter to be estimated here is $\lambda$.
What would be the probability of the first two outcomes (say 0 and 5 goals), given a certain value of $\lambda$?

$L(\lambda; g_1, g_2) \overset{iid}{=} f(g_1; \lambda)\, f(g_2; \lambda)$

This is called the likelihood function.
Here we have a product; it is often more convenient to work with summations, hence take logs:

$\ln L(\lambda; g_1, g_2) \overset{iid}{=} \ln f(g_1; \lambda) + \ln f(g_2; \lambda)$

Parameter Estimation
The first few observations are: 0, 5, 3, 5, 2, ...
Let's assume the parameter was $\lambda = 2$, 5 or 8.

$\lambda = 2$:  $\ln L(\lambda = 2; g_1, g_2) = -2 + (-3.322) = -5.322$
$\lambda = 5$:  $\ln L(\lambda = 5; g_1, g_2) = -6.740$
$\lambda = 8$:  $\ln L(\lambda = 8; g_1, g_2) = -10.390$

The larger the value of ln L (; g1 , g2 ) the more likely that the data
were drawn from the respective distribution.
From which distribution did the data most likely come? Here from the Poisson with $\lambda = 2$.
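These numbers are easy to verify in R with dpois(), which evaluates the Poisson density in (5); a one-line check per parameter value:

# Log-likelihood of the first two observations (0 and 5 goals) at different lambdas
g12 <- c(0, 5)
sum(dpois(g12, lambda = 2, log = TRUE))   # -5.32
sum(dpois(g12, lambda = 5, log = TRUE))   # -6.74
sum(dpois(g12, lambda = 8, log = TRUE))   # -10.39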
We only used the first two datapoints and we did not search over all possible parameter values.

Parameter Estimation

We only used the first two datapoints and we did not search over
all possible parameter values.

The formal problem statement is

$\hat\lambda_{ML} = \underset{\lambda}{\operatorname{argmax}}\ \ln L(\lambda; g_1, g_2, \ldots, g_{681})$

Using the glm function in R: $\hat\lambda_{ML} = 2.779441$.

This coincides with the average number of goals.

$P(g_i \ge 1) = 0.9379$
$P(g_i \ge 3) = 0.5256$
$P(g_i < 0) = 0$
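A minimal sketch of that glm call (the file and column names are assumptions based on the text):

# Unconditional Poisson model for goals per match (sketch; names hypothetical)
epl <- read.csv("EPL 2012to14.csv")
fit <- glm(goals ~ 1, family = poisson, data = epl)

lambda_hat <- exp(coef(fit))                 # glm uses a log link: exp(intercept) = lambda
lambda_hat                                   # about 2.78, the average number of goals

ppois(0, lambda_hat, lower.tail = FALSE)     # P(g >= 1)
ppois(2, lambda_hat, lower.tail = FALSE)     # P(g >= 3)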


Parameter Estimation
Conditional models
Say you want to recognise that the number of goals may depend
on a number of explanatory variables: e.g. whether the match is a
home match for a top team or not, perhaps the Table position
of the teams, the temperature on the day, etc.

How would we adjust (3)?

Let's consider a variable $top_i$ which is a dummy variable

$top_i = \begin{cases} 1 & \text{if top team home match} \\ 0 & \text{if not a top team home match} \end{cases}$  (6)

Then, in an OLS framework, we would adjust the model as follows:

$g_i = \mu + \delta_1\, top_i + u_i$  (7)
$E(g_i \mid top_i) = \mu + \delta_1\, top_i$  (8)

Parameter Estimation
Conditional models
How would we adjust the Poisson model?
We need to adjust the density in (5) as follows:

$f(g_i \mid top_i; \lambda_i) = \dfrac{\lambda_i^{g_i} e^{-\lambda_i}}{g_i!}$.

$\lambda$ now changes across observations, meaning we get varying conditional expectations $E(g_i \mid top_i) = \lambda_i$.
$\lambda_i$ is specified as follows:

$\lambda_i = \exp(\theta_0 + \theta_1\, top_i)$

We use the exp() function to ensure that $\lambda_i$ is positive.
We then find the ML parameter estimates

$(\hat\theta_{0,ML}, \hat\theta_{1,ML}) = \underset{(\theta_0,\theta_1)}{\operatorname{argmax}}\ \ln L(\theta_0, \theta_1; g_1, \ldots, g_{681}, top_1, \ldots, top_{681})$  (9)
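A rough sketch of this conditional fit, again via glm; the dummy top is hypothetical (how it is constructed depends on how teams are coded in the data):

# Conditional Poisson regression of goals on a top-team-home dummy (sketch)
fit_top <- glm(goals ~ top, family = poisson, data = epl)

exp(coef(fit_top))    # exp(theta0) and the multiplicative effect exp(theta1)
predict(fit_top, newdata = data.frame(top = c(0, 1)), type = "response")
                      # fitted conditional means lambda_i for non-top and top home matches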

Outlook

Sometimes the problem in (9) can be solved analytically.


E.g. for (2), where we would find (1) as $\hat\beta_{ML}$.
Often there is no analytical solution → numerical maximisation (sophisticated trial and error).
ML estimation is attractive as:
1. Under very general assumptions ML estimators can often be
shown to be asymptotically normally distributed.
2. ML estimators are available where LS estimators are not:
2.1 truncated models (your data has been pre-selected, e.g.
Manchester students all have at least ABB at A-levels)
2.2 models with binary dependent variables (do you own a bicycle
or not)
2.3 count data (number of soldiers killed by mule-kicks each year
in the Prussian cavalry, Ladislaus von Bortkiewicz, 1898)


ECON20110/30370 Econometrics
2015/16 - Semester 2 - Week 8

Ralf Becker

April 24, 2016


Table of contents

Bayesian Econometrics
Introduction
The Basics
Summary Comparison
An Example
Summary


Bayesian Econometrics
Introduction

Reading: Greene (6th ed) Chapter 18.

Think of the following elements:


I Data, y , including observations for the dependent and
explanatory variables. Most generally all are treated as
random variables.
I Model, this describes how the explanatory variables are linked with each other. Part of this description usually comprises:
I a set of model parameters, and
I a distributional assumption for any error term


Frequentists Econometrics
This is what we have done so far.
I Data, y , are observed.
I Model, we choose a particular model, say M, e.g.

$y = X\beta + u$  (1)

I a set of model parameters, $\beta$, associated with this model. $\beta$ is unknown but assumed to be constant.
I a distributional assumption for any error term, e.g. $u \sim N(0, \sigma^2)$.
Then we obtained an estimate for $\beta$

$\hat\beta = (X'X)^{-1} X'y$  (2)

which (given assumptions) we know to have the following distribution

$\hat\beta \sim N\big(\beta, \sigma^2 (X'X)^{-1}\big)$  (3)

Frequentists Econometrics

We assumed that $\beta$ is fixed but unknown, and established that $\hat\beta$ is a draw from a random variable that is centered around the unknown $\beta$.


Bayesian Econometrics
The crucial difference

If it was possible to pin the difference between frequentist and


bayesian econometrics to one point, it would be the following:

While Frequentists assume that $\beta$ is unknown but fixed, Bayesians assume that $\beta$ is a random variable of which we do not see any draw.

This implies that Bayesians are really looking for $p(\beta \mid y)$, i.e. the probability distribution of $\beta$ conditional on the observed data, $y$ (and the assumed model, M, with associated error distribution).
The observed data (potentially a vector of data) may well be
random as well and be characterised by p(y ).


Bayesian Econometrics
The Basics
Recall the following basic probability rule (where A and B are
events):

$P(A \mid B) = \dfrac{P(A, B)}{P(B)}$  (4)

The same is valid for random variables, a and b, rather than events:

$p(a \mid b) = \dfrac{p(a, b)}{p(b)}$  (5)
$p(b \mid a) = \dfrac{p(a, b)}{p(a)} \;\Rightarrow\; p(a, b) = p(b \mid a)\, p(a)$  (6)

Now substitute the second line into the first and obtain

$p(a \mid b) = \dfrac{p(b \mid a)\, p(a)}{p(b)}$  (7)

Bayesian Econometrics
The Basics
Why is this useful?

$p(a \mid b) = \dfrac{p(b \mid a)\, p(a)}{p(b)}$  (8)

Think of our two random variables, the parameter vector $\theta$ and the data $y$.

$p(\theta \mid y) = \dfrac{p(y \mid \theta)\, p(\theta)}{p(y)}$  (9)

The left hand side is the object of desire for Bayesians, the posterior distribution of $\theta$ conditional on the observed data.

We are not trying to find one best estimate of the unknown $\theta$; rather, we are interested in the posterior distribution $p(\theta \mid y)$.

You could look at the mean of that distribution as your one best estimate of $\theta$.

Bayesian Econometrics
The Basics

As $p(y)$ does not involve $\theta$ we can also state

$p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta)$  (10)

where $\propto$ means "is proportional to". This implies that if we have the terms on the right hand side we can calculate $p(\theta \mid y)$.
I $p(y \mid \theta)$: this is exactly the same as the likelihood function we encountered in the ML section. You need a model M and an error distribution to write this down.
I $p(\theta)$: this is called the prior distribution of the parameter vector. It reflects our knowledge about the parameter vector of interest prior to looking at our data $y$.


Bayesian Econometrics
Summary comparison

Frequentists:

Set model M and error distribution to determine $p(y \mid \theta)$, then

$\hat\theta_{ML} = \underset{\theta}{\operatorname{argmax}}\ p(y \mid \theta)$  (11)

Bayesians:

Set model M and error distribution to determine $p(y \mid \theta)$ and the prior distribution $p(\theta)$, then

$p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta)$  (12)

$p(\theta \mid y)$ needs to be evaluated at each possible/plausible value for $\theta$.
We are not optimising anything, but we are multiplying two distributions.

An Example
Setup
To get a flavour for the calculations required we will work through an example.

Let's assume you want to figure out whether there is a positive temperature trend.

Annual (1850 to 2015) temperature anomaly data, n = 166.
Source: Climate Research Unit, UEA, http://www.cru.uea.ac.uk/cru/data/temperature/

An Example
Setup

If there was a positive temperature trend we would expect the


probability of a year with increasing temperature to be larger than
0.5!

Define a dummy variable yt that takes a value of 1 for years with


temperature increases and 0 otherwise.

$\pi = P(y_t = 1)$  (13)


An Example
Frequentists Approach

I $\pi$ is unknown, but fixed.
I Let's obtain a sample estimate, $\hat\pi = \frac{1}{n}\sum_{t=1}^{n} y_t = 0.5273$
I Do we know a distribution for $\hat\pi$?
I Yes, $\hat\pi \sim N\big(\pi, \pi(1-\pi)/n\big) \approx N(\pi, 0.0389^2)$
I But as we do not know $\pi$ we do not know where this distribution is centered.
I Only once you fix an $H_0$ (e.g. $H_0: \pi = 0.5$) can you perform hypothesis tests on $\hat\pi$.
I Clearly we would be unable to reject $H_0: \pi = 0.5$ at any reasonable significance level.
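A small R sketch of these frequentist calculations, assuming y is the 0/1 vector of temperature-increase indicators:

# Frequentist estimate of the probability of a temperature increase (sketch)
p_hat <- mean(y)                                 # 0.5273 in the example
se    <- sqrt(p_hat * (1 - p_hat) / length(y))   # about 0.0389

z <- (p_hat - 0.5) / se                          # test H0: probability = 0.5
2 * pnorm(abs(z), lower.tail = FALSE)            # two-sided p-value (far from rejection)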


An Example
Bayesian Approach

We apply

$p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta)$  (14)

to our problem:

$p(\pi \mid y) \propto p(y \mid \pi)\, p(\pi)$  (15)

$p(\pi)$ is our best info on $\pi$ before we see the data, $p(\pi \mid y)$ after we've seen the data.

This can be understood as an updating problem (add one year's info, $y_t$, at a time!).


An Example
Bayesian Approach

In Bayesian analysis we need to do calculations at every possible outcome for $\pi$.

For simplicity, discretise the problem and consider the 101 possible values
$\pi_1 = 0.00,\ \pi_2 = 0.01,\ \pi_3 = 0.02,\ \ldots,\ \pi_{100} = 0.99,\ \pi_{101} = 1.00$

$pp(\pi_i \mid y_t) \propto p(y_t \mid \pi_i)\, p(\pi_i)$  (16)

$p(\pi_i \mid y_t) = \dfrac{p(y_t \mid \pi_i)\, p(\pi_i)}{\sum_{j=1}^{101} p(y_t \mid \pi_j)\, p(\pi_j)}$  (17)

where the second line is just a rescaling to ensure that $p(\pi_i \mid y_t)$ is a probability distribution and $\sum_{i=1}^{101} p(\pi_i \mid y_t) = 1$. Before we can start we need a prior distribution $p(\pi_i)$.

An Example
The prior distribution

Let's try the following:

1. $N[0.4, 0.05^2]$
2. $N[0.6, 0.05^2]$
3. $U[0, 1]$

Where do they come from?



An Example
The updating mechanism
Recall, this is what we do.

$pp(\pi_i \mid y_t) \propto p(y_t \mid \pi_i)\, p(\pi_i)$  (18)

$p(\pi_i \mid y_t) = \dfrac{p(y_t \mid \pi_i)\, p(\pi_i)}{\sum_{j=1}^{101} p(y_t \mid \pi_j)\, p(\pi_j)}$  (19)

All that is left is $p(y_t \mid \pi_i)$.

As we are dealing with a binary variable and we assumed that $\pi$ is the probability that $y_t = 1$, we know that

$p(y_t \mid \pi_i) = \pi_i$ if $y_t = 1$  (20)
$p(y_t \mid \pi_i) = (1 - \pi_i)$ if $y_t = 0$  (21)

This means we have everything to perform the updating (recall that (18) and (19) need to be done for every $\pi_i$).
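A compact R sketch of this discretised updating, assuming y is again the 0/1 vector of increase indicators; the N[0.4, 0.05²] prior is evaluated on the grid and renormalised:

# Discretised Bayesian updating for the increase probability (sketch)
grid  <- seq(0, 1, by = 0.01)                 # the 101 grid points
prior <- dnorm(grid, mean = 0.4, sd = 0.05)   # prior 1, evaluated on the grid
prior <- prior / sum(prior)                   # make it sum to 1

post <- prior
for (yt in y) {                               # add one year's information at a time
  lik  <- ifelse(yt == 1, grid, 1 - grid)     # (20) and (21)
  post <- lik * post                          # (18): unnormalised posterior
  post <- post / sum(post)                    # (19): rescale to a proper distribution
}

sum(post[grid > 0.5])                         # posterior probability that the parameter exceeds 0.5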

An Example
The posterior distribution
after 163 annual updates:


An Example
The prior distribution

Once we have the posteriors after all updating we can calculate the
following probabilities:


Bayesian Econometrics
Summary

CONTRAs
I Arbitrary choice of priors
I With uninformative priors (sometimes) same results as
frequentists
I When allowing for continuous parameter distributions we need
to use numerical integration (computationally intensive!)
PROs
I Ability to find probabilities for parameters of interest
I Ability to deal with latent/unobserved random variables
I Computational issues become less important
