FEUNL
Var(β̂_j) = ( Σ_i r̂²_ij · û²_i ) / SSR²_j

where r̂_ij denotes the ith residual from regressing x_j on all other independent variables, and SSR_j is the sum of squared residuals from that regression (see Section 8.2).
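The formula above can be checked numerically: it coincides with the matrix ("sandwich") form of the White estimator. A minimal sketch on simulated data (the data and names are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
u = rng.normal(size=n) * (1 + x1**2)          # heteroskedastic errors
y = 1 + 2 * x1 - x2 + u

X = np.column_stack([np.ones(n), x1, x2])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
uhat = y - X @ beta

# Sandwich form: (X'X)^-1 X' diag(uhat^2) X (X'X)^-1
XtX_inv = np.linalg.inv(X.T @ X)
V = XtX_inv @ (X.T * uhat**2) @ X @ XtX_inv

# Residual-regression form for the slope on x1:
# sum_i r_i1^2 * uhat_i^2 / SSR_1^2, where r_i1 are the residuals from
# regressing x1 on the remaining regressors (including the constant)
Z = np.column_stack([np.ones(n), x2])
r = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
var_b1 = np.sum(r**2 * uhat**2) / np.sum(r**2) ** 2

assert np.isclose(var_b1, V[1, 1])            # the two forms coincide
```

The square root of var_b1 is the robust standard error EViews reports for that coefficient.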
In EViews, this can be done by choosing White standard errors in the equation's Options menu.
If you find any typo in these notes, please e-mail me so I can correct it.
Robust standard errors and t statistics are appropriate as the sample size increases. We don't always use these robust standard errors because, in small samples, the robust t statistics can depart a lot from the t distribution.
Hence, it is important to know whether or not there is heteroskedasticity in our sample. Let's work through a few examples from the book. Take the example on the demand for cigarettes from Chapter 8, and open the corresponding workfile.
We wish to estimate the demand for cigarettes, measured by the number of cigarettes smoked per day, as a function of income, the price of a pack of cigarettes, education, age, age squared, and the presence of a restaurant smoking ban in the state where the surveyed person lives.
- Neither income nor cigarette price is significant, and their impacts would be small anyway (e.g., if income increases by 10%, cigs increases by (0.880/100)*10 = 0.088 cigarettes per day);
- education reduces smoking;
- smoking increases with age up until approximately 42.83 years (basically, maximize cigarettes smoked with respect to age: differentiate the age and age-squared part and set the derivative equal to zero). After that, it falls.
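The turning-point calculation can be checked directly: maximizing b_age·age + b_agesq·age² gives age* = −b_age/(2·b_agesq). Using, for illustration, the age coefficients from the weighted regression reported further below:

```python
# Turning point of b_age*age + b_agesq*age^2:
# set b_age + 2*b_agesq*age = 0  ->  age* = -b_age / (2*b_agesq)
b_age, b_agesq = 0.481948, -0.005627     # coefficients from the WLS output below
age_star = -b_age / (2 * b_agesq)        # about 42.8 years
```

(The 42.83 quoted above comes from the OLS coefficients; the small gap is rounding in the reported figures.)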
But now, a very important question: is there heteroskedasticity? If so, the usual standard errors and t statistics will be wrong and OLS will not be efficient. We will perform just a couple of tests to check for heteroskedasticity; see the other tests available in EViews.
First, let's run the Breusch-Pagan test for heteroskedasticity:
1) Estimate the model by OLS and keep the squared OLS residuals.
2) Run an auxiliary regression of the squared OLS residuals on the independent variables. Keep the R-squared from this regression.
3) Form either the F statistic (distributed F(k, n−k−1)) or the LM statistic (n·R², distributed chi-square with k degrees of freedom). If the p-value is greater than 5%, we do not reject the null of homoskedasticity.
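The three steps can be sketched by hand on simulated data (a generic illustration, not the cigarette data; scipy supplies the p-values):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, k = 807, 2
X = rng.normal(size=(n, k))
u = rng.normal(size=n) * np.exp(0.5 * X[:, 0])     # heteroskedastic errors
y = 1 + X @ np.array([2.0, -1.0]) + u

# 1) OLS, keep the squared residuals
Xc = np.column_stack([np.ones(n), X])
uhat = y - Xc @ np.linalg.lstsq(Xc, y, rcond=None)[0]
usq = uhat**2

# 2) auxiliary regression of uhat^2 on the regressors; keep its R-squared
fit = Xc @ np.linalg.lstsq(Xc, usq, rcond=None)[0]
r2 = 1 - np.sum((usq - fit) ** 2) / np.sum((usq - usq.mean()) ** 2)

# 3) LM = n*R^2 ~ chi2(k); F = (R^2/k) / ((1-R^2)/(n-k-1)) ~ F(k, n-k-1)
lm = n * r2
lm_p = stats.chi2.sf(lm, k)
F = (r2 / k) / ((1 - r2) / (n - k - 1))
F_p = stats.f.sf(F, k, n - k - 1)
assert lm_p < 0.05 and F_p < 0.05   # here we reject homoskedasticity
```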
In EViews, this is very easy to do.
Behold how many options you have for running a heteroskedasticity test!
Both the F test and the LM test (obs*R-squared of the auxiliary regression) lead to rejection of the null of homoskedasticity.
You should check that EViews is doing this right. How? Generate the residuals yourself
and perform the regression as usual (New Object/Equation, etc.). You will get the same
output as above.
The White test for heteroskedasticity takes into account the possibility that the variance structure is richer: the squares and cross-products of the independent variables are also included on the right-hand side. Alternatively, whenever you have too many independent variables, you can use the fitted values of the dependent variable and their squares as the auxiliary regressors.
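This special case of the White test (regress û² on the fitted values ŷ and ŷ²) can be sketched the same way on simulated data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 807
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
u = rng.normal(size=n) * np.exp(0.3 * X[:, 1])     # heteroskedastic errors
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + u

beta = np.linalg.lstsq(X, y, rcond=None)[0]
yhat = X @ beta
usq = (y - yhat) ** 2

# auxiliary regression: uhat^2 on yhat and yhat^2 (2 restrictions tested)
Z = np.column_stack([np.ones(n), yhat, yhat**2])
fit = Z @ np.linalg.lstsq(Z, usq, rcond=None)[0]
r2 = 1 - np.sum((usq - fit) ** 2) / np.sum((usq - usq.mean()) ** 2)
lm = n * r2                       # ~ chi2(2) under homoskedasticity
p = stats.chi2.sf(lm, 2)          # small p-value: reject homoskedasticity
```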
F-statistic            2.159258    Prob. F(25,781)         0.0009
Obs*R-squared         52.17245    Prob. Chi-Square(25)    0.0011
Scaled explained SS  110.0813     Prob. Chi-Square(25)    0.0000
ĥ = exp(ĝ), where ĝ are the fitted values from regressing log(û²) on the independent variables, and we use 1/ĥ as weights (equivalently, divide the whole equation through by √ĥ).
Since we have to estimate h, FGLS will not be unbiased but it is consistent and
asymptotically more efficient than OLS.
If cigs_residsq stands for the estimated h, we have to divide the model through by square root(h), i.e., multiply every term by 1/square root(h). Why? See the book (there is a univariate example there for savings; start from there).
We will get:

Dependent Variable: CIGS/SQR(CIGS_RESIDSQF)
Method: Least Squares
Date: 10/12/09   Time: 17:06
Sample: 1 807
Included observations: 807

Variable                          Coefficient  Std. Error  t-Statistic  Prob.
1/SQR(CIGS_RESIDSQF)                5.635471    17.80314     0.316544   0.7517
LOG(INCOME)/SQR(CIGS_RESIDSQF)      1.295239     0.437012    2.963855   0.0031
LOG(CIGPRIC)/SQR(CIGS_RESIDSQF)    -2.940314     4.460145   -0.659242   0.5099
EDUC/SQR(CIGS_RESIDSQF)            -0.463446     0.120159   -3.856953   0.0001
AGE/SQR(CIGS_RESIDSQF)              0.481948     0.096808    4.978378   0.0000
AGE^2/SQR(CIGS_RESIDSQF)           -0.005627     0.000939   -5.989706   0.0000
RESTAURN/SQR(CIGS_RESIDSQF)        -3.461064     0.795505   -4.350776   0.0000

R-squared             0.002751    Mean dependent var      0.966192
Adjusted R-squared   -0.004728    S.D. dependent var      1.574979
S.E. of regression    1.578698    Akaike info criterion   3.759715
Sum squared resid     1993.831    Schwarz criterion       3.800425
Log likelihood       -1510.045    Hannan-Quinn criter.    3.775347
Durbin-Watson stat    2.049719
We could also use Menu Options/WLS and write down the appropriate weighting scheme.
2. A little bit on time series: just a few issues (examples from chapters 10 to 12)
2.1 Take the workfile about housing investment and prices.
There are a lot of interesting things you can do now. You can take a series and study its evolution over time. Take the housing price index, for instance.
You can actually see several graphs at the same time if you select a Group of variables.
The log of the price seems to be significant. You may think this is OK, but it is not.
Both variables are trending throughout the sample.
If you take a look at the residuals, you can see whether what you're doing makes sense or not.
They are not stationary (there are formal tests to see this, namely unit root tests like the Dickey-Fuller or Phillips-Perron tests, and you can always look at the correlogram of the residuals). This means we should rethink our specification. Our previous regression was spurious.
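Spuriousness is easy to reproduce by simulation: regressing one random walk on another, completely independent, random walk produces a "significant" t statistic far more often than the nominal 5%. A sketch:

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 100, 500
rejections = 0
for _ in range(reps):
    # two independent random walks -- no true relationship at all
    y = np.cumsum(rng.normal(size=n))
    x = np.cumsum(rng.normal(size=n))
    X = np.column_stack([np.ones(n), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    s2 = resid @ resid / (n - 2)
    se_b1 = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    if abs(beta[1] / se_b1) > 1.96:       # nominal 5% two-sided test
        rejections += 1
share = rejections / reps                 # far above 0.05
```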
We now add a linear trend to take account of the trending behaviour of LINVPC.
LPRICE is no longer significant. We conclude that there are other factors beyond the price, captured by the linear trend, that seem to be important.²
Notice that these other factors are not modelled just by adding a linear trend.
Moreover, the fact that a linear trend appears to be informative shouldn't prompt you to get carried away and start obsessively adding a huge train of trend terms (linear, quadratic, ...).
Variable      Coefficient  Std. Error  t-Statistic  Prob.
C              -0.913060    0.135613   -6.732815    0.0000
LPRICE         -0.380961    0.678835   -0.561198    0.5779
T               0.009829    0.003512    2.798445    0.0079

R-squared            0.340765    Mean dependent var      -0.666155
Adjusted R-squared   0.306959    S.D. dependent var       0.172543
S.E. of regression   0.143641    Akaike info criterion   -0.974252
Sum squared resid    0.804675    Schwarz criterion       -0.850133
Log likelihood       23.45930    F-statistic             10.07976
Durbin-Watson stat   1.048727    Prob(F-statistic)        0.000296
The Durbin-Watson test, valid under the classical assumptions, is based on the OLS residuals, and one can show that DW is approximately 2(1 − ρ̂), where ρ̂ is the first-order autocorrelation coefficient of the residuals.

² You should always test the residuals to see if they're well-behaved. In this case, they are still nonstationary. In practical work, you should keep looking for a correct specification.
Once you find out that there is first-order serial correlation, you can transform the model to take this into account:
1) Estimate the original model and keep the estimated residuals û_t.
2) Run the regression of û_t on û_{t−1} to compute the correlation coefficient ρ̂.
3) For every variable x_t (and for the dependent variable), compute the quasi-differenced variable x_t − ρ̂·x_{t−1}.
4) Apply OLS to the equation with the quasi-differenced variables. The usual standard errors, t statistics and F statistics are asymptotically valid.⁴
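The four steps can be sketched on simulated data (a single Cochrane-Orcutt iteration; note how the constant gets transformed into 1 − ρ̂):

```python
import numpy as np

rng = np.random.default_rng(5)
n, rho = 200, 0.7
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):                     # AR(1) errors
    u[t] = rho * u[t - 1] + rng.normal()
y = 1 + 2 * x + u

# 1) OLS on the original model, keep the residuals
X = np.column_stack([np.ones(n), x])
uhat = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

# 2) regress uhat_t on uhat_{t-1} (no intercept) to get rho_hat
rho_hat = (uhat[1:] @ uhat[:-1]) / (uhat[:-1] @ uhat[:-1])

# 3) quasi-difference y and every regressor (constant becomes 1 - rho_hat)
y_star = y[1:] - rho_hat * y[:-1]
X_star = np.column_stack([np.full(n - 1, 1 - rho_hat),
                          x[1:] - rho_hat * x[:-1]])

# 4) OLS on the transformed equation
beta_co = np.linalg.lstsq(X_star, y_star, rcond=None)[0]
```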
³ The regressors appear because we are not assuming strict exogeneity of the regressors. If we had strict exogeneity, we would only need to regress the residuals on their lagged values; the regressors wouldn't be needed. See the book on this.
⁴ This is known as Cochrane-Orcutt estimation, which omits the first observation. If you transform the first observation so that it can be included in the regression, the procedure is called Prais-Winsten estimation.
Alternatively, you can estimate the model as usual but correct the standard errors at the end. This may be better than simple FGLS.
Just pick the Newey-West option. You will be correcting for both heteroskedasticity and autocorrelation.
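The Newey-West correction itself is simple to compute by hand, which helps to see what EViews is doing: a Bartlett-weighted sum of residual autocovariances inside the usual sandwich formula. A sketch on simulated data (the lag choice of 4 is arbitrary):

```python
import numpy as np

def newey_west_se(X, resid, lags):
    """HAC standard errors: Bartlett-weighted autocovariances in a sandwich."""
    n, k = X.shape
    Xu = X * resid[:, None]                 # score contributions x_t * u_t
    S = Xu.T @ Xu / n                       # lag-0 term (the White part)
    for L in range(1, lags + 1):
        w = 1 - L / (lags + 1)              # Bartlett kernel weight
        G = Xu[L:].T @ Xu[:-L] / n          # lag-L autocovariance
        S += w * (G + G.T)
    Q_inv = np.linalg.inv(X.T @ X / n)
    V = Q_inv @ S @ Q_inv / n
    return np.sqrt(np.diag(V))

# demo with autocorrelated, heteroskedastic errors
rng = np.random.default_rng(6)
n = 300
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.5 * u[t - 1] + rng.normal() * (1 + 0.5 * x[t] ** 2)
y = 1 + 2 * x + u
X = np.column_stack([np.ones(n), x])
resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
se_hac = newey_west_se(X, resid, lags=4)
```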
Notice that you can always test for heteroskedasticity as in the cross-section case (just check the options under Heteroskedasticity tests; they are the same ones as before). However, for these tests to be valid, the errors should not be autocorrelated; also, for the F statistic of the Breusch-Pagan test to be valid, the residuals of the auxiliary regression should themselves be serially uncorrelated and homoskedastic.
AR(1)
We can write it as y_t = ρ·y_{t−1} + ε_t, where the error ε_t is white noise (mean zero, constant variance). If ρ = 1, you have what is known as a random walk: a typically nonstationary, highly persistent process. We say that such a process has a stochastic trend, as opposed to a deterministic trend, which arises when a linear trend term directly drives the variable.
Compare the highly persistent random walk above to a stationary, mildly persistent AR(1).
If you add a constant to the random walk, you get the random walk with drift
y_t = a + y_{t−1} + ε_t
Notice how the drift a defines a linear trend behaviour in the series!
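All three processes are easy to simulate side by side (the parameter values ρ = 0.5 for the stationary AR(1) and a = 0.5 for the drift are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
eps = rng.normal(size=(3, n))

rw = np.cumsum(eps[0])                    # y_t = y_{t-1} + e_t (random walk)
ar = np.zeros(n)                          # y_t = 0.5*y_{t-1} + e_t (stationary)
for t in range(1, n):
    ar[t] = 0.5 * ar[t - 1] + eps[1][t]
drift = np.cumsum(0.5 + eps[2])           # y_t = 0.5 + y_{t-1} + e_t

# the drift dominates: the series grows roughly linearly at rate a = 0.5,
# while the random walk wanders far more than the stationary AR(1)
```

Plotting the three series makes the stochastic-trend versus deterministic-trend distinction visible at a glance.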
MA(1)
Here, we model the error term and make it richer than it was before:
u_t = ε_t + θ·ε_{t−1}
It is clearly stationary (once again, this can be tested through the so-called unit root
tests). Actually, a pure MA process is always stationary.
You can create other processes yourself, e.g., combining AR and MA parts to get ARMA models.
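An MA(1), and an ARMA(1,1) built on top of it, can be simulated the same way (θ = 0.8 and an AR coefficient of 0.5 are arbitrary choices); the lag-1 autocorrelation of the MA(1) should be close to the theoretical θ/(1+θ²) and essentially zero beyond lag 1:

```python
import numpy as np

rng = np.random.default_rng(8)
n, theta = 20000, 0.8
eps = rng.normal(size=n + 1)
ma = eps[1:] + theta * eps[:-1]           # u_t = e_t + theta*e_{t-1}

def acf(x, lag):
    """Sample autocorrelation at a given lag."""
    x = x - x.mean()
    return (x[lag:] @ x[:-lag]) / (x @ x)

# theory: corr(u_t, u_{t-1}) = theta/(1+theta^2), zero at lag 2 and beyond
rho1 = acf(ma, 1)
rho2 = acf(ma, 2)

# ARMA(1,1): an AR part on top of the MA(1) error
arma = np.zeros(n)
for t in range(1, n):
    arma[t] = 0.5 * arma[t - 1] + ma[t]
```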