Chapter 10: Multicollinearity
Iris Wang
iris.wang@kau.se
Econometric problems
Multicollinearity
What does it mean? A high degree of correlation amongst the
explanatory variables
What are its consequences? It may be difficult to separate out
the effects of the individual regressors. Standard errors may
be inflated and t-values depressed.
Note: a symptom may be high R² but low t-values
How can you detect the problem? Examine the correlation
matrix of regressors ‐ also carry out auxiliary regressions
amongst the regressors.
Look at the Variance‐inflating factor (VIF)
NOTE:
be careful not to apply t tests mechanically without checking for
multicollinearity
multicollinearity is a data problem, not a misspecification problem
Variance-inflating factor (VIF)
Multicollinearity inflates the variance of an estimator
VIF_j = 1/(1 − R_j²)
where R_j² measures the R² from a regression of
X_j on the other X variable(s)
⇒ serious multicollinearity problem if VIF_j > 5
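The VIF formula above can be computed directly: regress each regressor on the others and plug the resulting R² into 1/(1 − R²). A minimal NumPy sketch; the simulated data and the check against the slide's rule-of-thumb threshold of 5 are illustrative, not from the slides:

```python
import numpy as np

def vif(X, j):
    """Variance-inflating factor for column j of regressor matrix X.

    Regress X[:, j] on the remaining columns (plus an intercept),
    take that regression's R-squared, and return 1 / (1 - R^2).
    """
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    Z = np.column_stack([np.ones(len(y)), others])  # add intercept
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    r2 = 1.0 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
    return 1.0 / (1.0 - r2)

# Two nearly collinear regressors: expect a large VIF for each.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.05, size=200)   # almost a copy of x1
X = np.column_stack([x1, x2])
print(vif(X, 0))   # far above the rule-of-thumb threshold of 5
```

With independent regressors the same function returns a value close to 1, which is why the VIF doubles as a quick diagnostic across all columns of X.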
Econometric problems
Heteroskedasticity
What does it mean? The variance of the error term is not
constant
What are its consequences? The least squares results
are no longer efficient, and t-test and F-test results may
be misleading
How can you detect the problem? Plot the residuals against
each of the regressors or use one of the more formal tests
How can you remedy the problem? Respecify the model –
look for other missing variables; perhaps take logs or
choose some other appropriate functional form; or make
sure relevant variables are expressed “per capita”
The Homoskedastic Case
The Heteroskedastic Case
The consequences of heteroskedasticity
OLS estimators are still unbiased (unless there are also
omitted variables)
However, OLS estimators are no longer efficient or
minimum variance
The formulae used to estimate the coefficient standard
errors are no longer correct
• so the t-tests will be misleading
• confidence intervals based on these standard errors will be
wrong
Detecting heteroskedasticity
Visual inspection of scatter diagram or the
residuals
Goldfeld‐Quandt test
suitable for a simple form of heteroskedasticity
Goldfeld-Quandt test (JASA, 1965)
P. 382: Suppose it looks as if σ_ui² = σ²X_i²,
i.e. the error variance is proportional to the
square of one of the X’s
Rank the data according to the variable and
conduct an F test using RSS2/RSS1
where these RSS are based on regressions using
the first and last [n‐c]/2 observations [c is a
central section of data usually about 25% of n]
Reject H0 of homoskedasticity if Fcal > Ftables
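The mechanics of the test can be sketched in a few lines of NumPy: sort by the suspect regressor, drop the central ~25% of observations, and form F = RSS2/RSS1 from the two outer subsamples. The simulated data below (error s.d. proportional to X) are illustrative assumptions:

```python
import numpy as np

def rss(y, x):
    """Residual sum of squares from an OLS fit of y on [1, x]."""
    Z = np.column_stack([np.ones(len(x)), x])
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    e = y - Z @ b
    return e @ e

def goldfeld_quandt(y, x, drop_frac=0.25):
    """Sort by x, drop the central drop_frac of observations,
    and return F = RSS2/RSS1 from the two outer subsamples."""
    order = np.argsort(x)
    y, x = y[order], x[order]
    n = len(x)
    c = int(drop_frac * n)       # central section, ~25% of n
    m = (n - c) // 2
    return rss(y[-m:], x[-m:]) / rss(y[:m], x[:m])

# Error s.d. grows with x, so F should be well above 1
rng = np.random.default_rng(1)
x = rng.uniform(1.0, 10.0, size=200)
y = 2.0 + 3.0 * x + rng.normal(scale=x, size=200)
print(goldfeld_quandt(y, x))
```

Under homoskedasticity the same statistic hovers around 1, and the decision rule is the F comparison stated on the slide.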
Remedies
Respecification of the model
Include relevant omitted variable(s)
Express model in log-linear form or some other
appropriate functional form
Express variables in per capita form
Where respecification won’t solve the problem
use robust heteroskedasticity-consistent standard
errors (due to Hal White, Econometrica 1980)
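White's correction replaces the usual OLS variance formula with the "sandwich" (X′X)⁻¹ X′ diag(e²) X (X′X)⁻¹, where e are the OLS residuals. A minimal NumPy sketch of the basic (HC0) variant; the simulated data are illustrative:

```python
import numpy as np

def hc0_standard_errors(y, X):
    """OLS coefficients and White's heteroskedasticity-consistent
    (HC0) standard errors via the sandwich formula
    Var(b) = (X'X)^-1 X' diag(e^2) X (X'X)^-1."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    meat = X.T @ (X * e[:, None] ** 2)
    cov = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(2)
x = rng.uniform(1.0, 10.0, size=500)
X = np.column_stack([np.ones(500), x])
y = 1.0 + 2.0 * x + rng.normal(scale=x, size=500)  # heteroskedastic errors
beta, se = hc0_standard_errors(y, X)
print(beta, se)
```

The point estimates are the usual OLS ones; only the standard errors change, which is why robust errors are the fallback when respecification fails.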
Basic Econometrics, Spring 2012
Iris Wang
iris.wang@kau.se
Chapter 11: Heteroskedasticity
Definition:
Heteroskedasticity occurs when the constant-variance
assumption, i.e. Var(ui|Xi) = σ², fails. This happens when the
variance of the error term (ui) changes across different values
of Xi.
Example: Savingsi = α0 + α1·incomei + ui. Heteroskedasticity is
present if the variance of the unobserved factors affecting
savings (ui) increases with income.
Chapter 11: Heteroskedasticity
Outline
1. Consequences of Heteroskedasticity
2. Testing for Heteroskedasticity
1. Consequences of Heteroskedasticity
• OLS is unbiased and consistent under the following 4
assumptions:
– Linear in parameters
– Random sampling
– No perfect collinearity
– Zero conditional mean (E(u|X)=0)
• Homoskedasticity assumption (MLR.4) stating constant error
variance (Var(u|X)= σ2) plays no role in showing that OLS is
unbiased & consistent
1. Consequences of Heteroskedasticity (cont’d)
• However, the estimators of the variances, Var(βj), are biased without
homoskedasticity
– OLS standard errors are biased
– Standard confidence interval, t, and F statistics which are based on
standard errors are no longer valid.
– t & F statistics no longer have t & F distribution resp.
– And this is not resolved in large samples
• OLS is no longer BLUE and asymptotically efficient
– It is possible to find estimates that are more efficient than OLS (e.g.
GLS, Generalized Least Squares)
• Solutions involve using:
i. Generalized least squares (GLS)
ii. Weighted least squares (WLS) is a special case of GLS, p.373
Weighted Least Squares (WLS)
• Aim: to specify the form of heteroskedasticity detected and use the weighted
least squares estimator.
– If we have correctly specified the form of the variance, then WLS is
more efficient than OLS
– If we used the wrong form of variance, WLS will be biased but it is
generally consistent as long as E(u|X)=0.
– But, efficiency of WLS is not guaranteed when using the wrong form of
variance (i.e. the bias will improve with large N).
• We use this to transform the original regression equation into one with a
homoskedastic error term
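For instance, under the variance form from the Goldfeld-Quandt slide, Var(ui) = σ²Xi², dividing the whole equation through by Xi gives yi/xi = β1 + β0·(1/xi) + ui/xi, whose error is homoskedastic; OLS on the transformed data is then the WLS estimator. A sketch under that assumed variance form (simulated data are illustrative):

```python
import numpy as np

# True model: y = 2 + 3*x + u, with Var(u_i) = (0.5 * x_i)^2,
# i.e. the error s.d. is proportional to x.
rng = np.random.default_rng(3)
x = rng.uniform(1.0, 10.0, size=300)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5 * x, size=300)

# Transformed regression: y/x = b1 + b0*(1/x) + (u/x).
# The error u/x now has constant s.d. 0.5.
Z = np.column_stack([np.ones(300), 1.0 / x])   # columns: b1, b0
coef, *_ = np.linalg.lstsq(Z, y / x, rcond=None)
b1_wls, b0_wls = coef
print(b0_wls, b1_wls)   # should be near 2 and 3
```

Note the roles of the columns swap under the transformation: the intercept of the transformed regression is the original slope, and the coefficient on 1/x is the original intercept.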
2. Testing for Heteroskedasticity
• Why test for heteroskedasticity?
– First, unless there is evidence of heteroskedasticity, many prefer to use
the usual t statistics under OLS
– This is because the usual t statistics have exact t distribution under
the assumptions of homoskedasticity & normally distributed errors.
– Second, if heteroskedasticity is present, it is possible to obtain a better
estimator than OLS when the form of heteroskedasticity is known.
• In the regression model:
Y= β0+β1x1+…+βkxk +u
• We assume that E(u|x1, …, xk) = 0 ⇒ OLS is unbiased and
consistent.
• In order to test for violation of the homoskedasticity
assumption, we want to test the null hypothesis:
Ho: Var(u| x1, …, xk )=σ2
2. Testing for Heteroskedasticity (cont’d)
• To test the null hypothesis above, we test whether expected
value of u2 is related to one or more of the explanatory variables.
• If we reject Ho, then heteroskedasticity is a problem & needs to
be solved.
• Two types heteroskedasticity tests:
A. Goldfeld‐Quandt Test for heteroskedasticity, p.382
B. White’s General Heteroskedasticity Test, p.386
• Once we reject Ho of homoskedasticity, we should treat the
heteroskedasticity problem
B. White heteroskedasticity test
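White's test regresses the squared OLS residuals on the regressors, their squares, and cross products; n·R² from that auxiliary regression is asymptotically chi-square with degrees of freedom equal to the number of auxiliary slope terms. A single-regressor NumPy sketch (the simulated data are illustrative):

```python
import numpy as np

def white_lm(y, x):
    """White's LM statistic for a single-regressor model:
    regress squared OLS residuals on [1, x, x^2]; LM = n * R^2,
    compared against a chi-square with 2 degrees of freedom."""
    X = np.column_stack([np.ones(len(x)), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e2 = (y - X @ b) ** 2
    Z = np.column_stack([np.ones(len(x)), x, x ** 2])
    g, *_ = np.linalg.lstsq(Z, e2, rcond=None)
    resid = e2 - Z @ g
    r2 = 1.0 - resid @ resid / ((e2 - e2.mean()) @ (e2 - e2.mean()))
    return len(x) * r2

rng = np.random.default_rng(4)
x = rng.uniform(1.0, 10.0, size=400)
y = 1.0 + 2.0 * x + rng.normal(scale=x, size=400)  # heteroskedastic
print(white_lm(y, x))   # compare with the chi-square(2) 5% value, ~5.99
```

With several regressors the auxiliary design matrix simply gains the extra levels, squares, and cross products, and the degrees of freedom grow accordingly.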
Autocorrelation
Iris Wang
iris.wang@kau.se
Econometric problems
Topics to be covered
Overview of autocorrelation
First-order autocorrelation and the Durbin-Watson test
Higher-order autocorrelation and the Breusch-Godfrey test
Dealing with autocorrelation
Examples and practical illustrations
Autocorrelated series and autocorrelated disturbances
Overview of autocorrelation
What is meant by autocorrelation?
The error terms are not independent from observation to observation – ut
depends on one or more past values of u
What are its consequences?
The least squares estimators are no longer “efficient” (i.e. they don’t have the
lowest variance). More seriously, autocorrelation may be a symptom of
model misspecification
How can you detect the problem?
Plot the residuals against time or their own lagged values, calculate the
Durbin‐Watson statistic or use some other tests of autocorrelation such as
the Breusch‐Godfrey (BG) test
How can you remedy the problem?
Consider possible re-specification of the model: a different functional form,
missing variables, lags etc. If all else fails you could correct for autocorrelation
by using the Cochrane-Orcutt procedure or Autoregressive Least Squares
First‐order autocorrelation
The sources of autocorrelation
The consequences of autocorrelation
Detecting autocorrelation
The Durbin-Watson test
More on the Durbin-Watson statistic
Using the Durbin-Watson statistic
Durbin-Watson critical values
The Breusch-Godfrey (BG) test
The Breusch-Godfrey test continued
Dealing with autocorrelation
How should you deal with a problem of autocorrelation?
Consider possible re‐specification of the model:
a different functional form,
the inclusion of additional explanatory variables,
the inclusion of lagged variables (independent and
dependent)
If all else fails you can correct for autocorrelation by
using Autoregressive Least Squares
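The Cochrane-Orcutt correction mentioned in this chapter estimates ρ from the OLS residuals, quasi-differences the data (yt − ρyt−1, xt − ρxt−1), re-fits, and iterates. A minimal sketch; the AR(1) data with ρ = 0.7 are simulated for illustration:

```python
import numpy as np

def cochrane_orcutt(y, x, iters=10):
    """Iterate: estimate rho from residuals, quasi-difference, re-fit OLS."""
    rho = 0.0
    for _ in range(iters):
        ys = y[1:] - rho * y[:-1]            # quasi-differenced data
        xs = x[1:] - rho * x[:-1]
        Z = np.column_stack([np.ones(len(ys)), xs])
        c, *_ = np.linalg.lstsq(Z, ys, rcond=None)
        b0, b1 = c[0] / (1.0 - rho), c[1]    # intercept back on original scale
        e = y - b0 - b1 * x                  # residuals on the original data
        rho = (e[1:] @ e[:-1]) / (e[:-1] @ e[:-1])
    return b0, b1, rho

# Regression with AR(1) errors, rho = 0.7
rng = np.random.default_rng(5)
n = 500
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.7 * u[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + u
b0, b1, rho = cochrane_orcutt(y, x)
print(b0, b1, rho)   # rho should settle near 0.7
```

The first pass (ρ = 0) is plain OLS; each later pass refines ρ, which is the sense in which the procedure "corrects for" AR(1) disturbances rather than removing their cause.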
Quick questions and answers
Question 1:
What is the problem of autocorrelation?
Answer:
Autocorrelation is the problem where
the disturbances in a regression model are
not independent of one another
from observation to observation
(it is mainly a problem for models
estimated using time series data)
Question 2:
Is serial correlation the same as
autocorrelation?
Answer:
Yes. Serially correlated disturbances or
errors are the same as autocorrelated
ones.
Question 3:
What is meant by AR(1) errors?
Answer:
This means that the errors or
disturbances follow a first-order
autoregressive pattern:
ut = ρut−1 + εt
Question 4:
What is the best known test for
AR(1) disturbances?
Answer:
The Durbin-Watson test. The null
hypothesis of no autocorrelation
(serial independence) is H0: ρ = 0
Question 5:
What is the range of possible
values for the DW statistic?
Answer:
0≤ DW ≤ 4.
If there is no autocorrelation you
would expect to get a DW stat of
around 2.
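The statistic itself is DW = Σ(et − et−1)² / Σet², which is approximately 2(1 − ρ̂); that is why its range is 0 to 4 and its value sits near 2 under no autocorrelation. A quick numerical check (the simulated series are illustrative):

```python
import numpy as np

def durbin_watson(e):
    """DW = sum (e_t - e_{t-1})^2 / sum e_t^2, roughly 2*(1 - rho_hat)."""
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(6)
noise = rng.normal(size=1000)             # no autocorrelation -> DW near 2
ar1 = np.zeros(1000)
for t in range(1, 1000):
    ar1[t] = 0.8 * ar1[t - 1] + noise[t]  # positive AR(1) -> DW well below 2
print(durbin_watson(noise), durbin_watson(ar1))
```

Values near 0 indicate strong positive autocorrelation and values near 4 strong negative autocorrelation, with the inconclusive bands (dL, dU) taken from tables.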
Question 6:
What are the three main
limitations of the DW test?
Answer:
1. It only tests for AR(1) errors
2. It has regions where the test is
inconclusive (between dL and dU)
3. The DW statistic is biased towards 2
in models with a lagged dependent
variable.
Question 7:
How do you test for higher-order
autocorrelated errors?
Answer:
Using the Breusch‐Godfrey (BG)
test
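The BG test regresses the OLS residuals on the original regressors plus p lagged residuals; (n − p)·R² from that auxiliary regression is asymptotically chi-square with p degrees of freedom. A single-regressor sketch (simulated AR(1) data for illustration):

```python
import numpy as np

def breusch_godfrey_lm(y, x, p=1):
    """LM = (n - p) * R^2 from regressing OLS residuals on
    [1, x, e_{t-1}, ..., e_{t-p}]; asymptotically chi-square(p)."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    # lagged-residual columns; the first p observations are lost
    lags = np.column_stack([e[p - j : n - j] for j in range(1, p + 1)])
    Z = np.column_stack([X[p:], lags])
    target = e[p:]
    g, *_ = np.linalg.lstsq(Z, target, rcond=None)
    resid = target - Z @ g
    tss = (target - target.mean()) @ (target - target.mean())
    return (n - p) * (1.0 - resid @ resid / tss)

# AR(1) errors with rho = 0.6: LM should exceed the chi-square(1)
# 5% critical value of about 3.84
rng = np.random.default_rng(7)
n = 400
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.6 * u[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + u
print(breusch_godfrey_lm(y, x, p=1))
```

Setting p = 4 or p = 12 gives the quarterly and monthly checks discussed below, which is the BG test's advantage over Durbin-Watson.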
Question 9:
How do I know what order of
autocorrelation to test for?
Answer:
With annual data a first order test is
probably enough, with quarterly or
monthly data check for AR(4) or
AR(12) errors if you have enough data.
If in doubt repeat the test for a
number of different maximum lags
Question 10:
What should I do if my model exhibits
autocorrelation?
Answer:
In the first instance try model re-
specification (additional lagged values
of variables or a log transformation of
some series). If this doesn’t deal with
the problem, use Autoregressive Least
Squares rather than OLS estimation.