
Multiple Regression Analysis

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon$

2013/11/27 Chia-Hsin Chen
Assumptions of OLS Estimator
Regression Residual
A regression residual, $\hat{\varepsilon}$, is defined as the difference between an observed y value and its corresponding predicted value:

$\hat{\varepsilon} = y - \hat{y} = y - (\hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \dots + \hat{\beta}_k x_k)$
Assumptions of OLS Estimator
1) $E(\varepsilon_i) = 0$ (unbiasedness)
2) $\text{Var}(\varepsilon_i)$ is constant (homoscedasticity)
3) $\text{Cov}(\varepsilon_i, \varepsilon_j) = 0$ (independent error terms)
4) $\text{Cov}(\varepsilon_i, X_i) = 0$ (error terms unrelated to the Xs)
$\varepsilon_i \sim \text{iid}(0, \sigma^2)$
Gauss-Markov Theorem: If these conditions hold, OLS is the best linear unbiased estimator (BLUE).
Additional assumption: the $\varepsilon_i$ are normally distributed.
Three illnesses in regression
1) Multicollinearity: strong relationship among the explanatory variables.
2) Heteroscedasticity: changing variance.
3) Autocorrelated error terms: a symptom of specification error.
Checking the Regression Assumptions
Assumptions of the model:
1) linear conditional mean
2) constant variance (homoskedasticity), normal errors
3) independent error terms
So we should see:
a pattern of constant variation around a line
very few points more than 2 standard deviations away from the central linear relationship.
How can we be sure of this when using real data? We must perform some basic diagnostic procedures to ensure that the model holds.
If the model assumptions are violated:
Prediction can be systematically biased
Standard errors and t-tests are wrong
Someone may be able to beat you with a different and better model
All of the assumptions of the model are really statements about the regression error terms ($\varepsilon$).
How can we detect violations of the model?
Example: Data Set 1
Example: Data Set 2
Example: Data Set 3
Example: Data Set 4
Residual Analysis
Properties of Regression Residuals
The mean of the residuals is equal to 0.
This property follows from the fact that the sum of the differences between the observed y values and their least squares predicted values is equal to 0:

$\sum \text{Residuals} = \sum (y - \hat{y}) = 0$
Properties of Regression Residuals
The standard deviation of the residuals is equal to the standard deviation of the fitted regression model.
This property follows from the fact that the sum of the squared residuals is equal to SSE, which when divided by the error degrees of freedom is equal to the variance of the fitted regression model, $s^2$.
Properties of Regression Residuals
The square root of the variance is both the standard deviation of the residuals and the standard deviation of the regression model:

$\sum (\text{Residuals})^2 = \sum (y - \hat{y})^2 = \text{SSE}$

$s = \sqrt{\dfrac{\text{SSE}}{n - (k + 1)}}$
Regression Outlier
A regression outlier is a residual that is larger
than 3s (in absolute value).
Heteroskedasticity
Heteroskedasticity: the error terms do not all have the same variance.
Consequences of Using OLS when Heteroscedasticity Is Present
OLS estimation still gives unbiased coefficient
estimates, but they are no longer BLUE.
This implies that if we still use OLS in the
presence of heteroscedasticity, our standard
errors could be inappropriate and hence any
inferences we make could be misleading.
Whether the standard errors calculated using the
usual formulae are too big or too small will
depend upon the form of the heteroscedasticity.

Detection of Heteroscedasticity
Graphical methods
Formal tests: there are many of them, e.g.
Goldfeld-Quandt (GQ) test
White's test
Graphical analysis of residuals
To check for homoscedasticity (constant variance):
Produce a scatterplot of the standardized residuals
against the fitted values.
Produce a scatterplot of the standardized residuals
against each of the independent variables.
If assumptions are satisfied, residuals should vary
randomly around zero and the spread of the
residuals should be about the same throughout
the plot (no systematic patterns.)
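A minimal Python sketch of the diagnostic plots described above, assuming statsmodels and matplotlib; the data and variable names here are simulated and purely illustrative:

    import numpy as np
    import matplotlib.pyplot as plt
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    x1, x2 = rng.normal(size=(2, 100))
    y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=100)
    X = sm.add_constant(np.column_stack([x1, x2]))

    res = sm.OLS(y, X).fit()
    std_resid = res.get_influence().resid_studentized_internal   # standardized residuals

    fig, axes = plt.subplots(1, 3, figsize=(12, 3))
    axes[0].scatter(res.fittedvalues, std_resid)      # residuals vs. fitted values
    axes[1].scatter(x1, std_resid)                    # residuals vs. each independent variable
    axes[2].scatter(x2, std_resid)
    for ax in axes:
        ax.axhline(0.0, linestyle="--")
    plt.show()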
First, plot Y against X (works only when you have one X).
1. Plot of Residuals vs. Fitted Values
Useful for:
detection of non-linear relationships
detection of non-constant variances
What should this look like?
1. Residuals should be evenly distributed
around the mean
2. No relationship between the mean of the
residual and the level of fitted value
A key assumption is that the
regression model is a linear
function. This is not always true.
This will show up even more
prominently in the residuals vs.
fitted plot
[Residuals vs. fitted plot.] There should be no relationship between the average value of the residuals and the fitted values.
Heteroskedasticity (different variances)
The key is a systematic pattern of variation in the residuals.
Heteroskedasticity: Examples
Homoscedasticity is probably violated if:
The residuals increase or decrease in average magnitude with the fitted values (an indication that the variance of the residuals is not constant).
The points in the plot lie on a curve around zero, rather than fluctuating randomly.
A few points in the plot lie a long way from the rest of the points.
Residual Plot for Functional Form
[Two plots of $\hat{e}$ against x: a curved pattern signals the need to add an $x^2$ term; a patternless band around 0 indicates correct specification.]
Residual Plot for Equal Variance
[Two plots of $\hat{e}$ against x: a fan-shaped spread indicates unequal variance; an even band indicates correct specification. Standardized residuals are typically used.]
Residual Plot for Independence
[Two plots of $\hat{e}$ against x: a pattern that tracks the order of data collection indicates dependence; a patternless band indicates correct specification. The plots reflect the sequence in which the data were collected.]
Testing the Normality Assumption
Normality of residuals.
Normality is not required in order to obtain unbiased estimates of the regression coefficients.
But normality of the residuals is necessary for some of the other tests, for example the Breusch-Pagan test of heteroscedasticity, the Durbin-Watson test of autocorrelation, etc.
Non-normal residuals cause the following:
t-tests and other associated statistics may no longer be t distributed.
Least squares estimates are extremely sensitive to large $\varepsilon_i$, and it may be possible to improve on least squares.
The linear functional form may be incorrect and various transformations of the dependent variable may be necessary.
Normality
The random errors are regarded as a random sample from a $N(0, \sigma^2)$ distribution, so we can check this assumption by checking whether the residuals might have come from a normal distribution.
We should look at the standardized residuals.
Options for looking at the distribution:
Histogram
Normal plot of residuals
How can we detect departures from normality?
Characteristics of the normal distribution : thin tails
and symmetry.
The most basic analysis would be to graph the
histogram of the standardized residuals.
Neither of these plots looks particularly symmetric.
Histogram
Example
Histogram with a normal curve for recent 61 observations
of the monthly stock rate of return of Exxon
The histogram uses only 61
observations, whereas the normal
curve superimposed depicts the
histogram using infinitely many
observations. Therefore, sampling
errors should show up as gaps
between the two curves.
This is not effective for revealing a subtle but systematic
departure of the histogram from normality.
Normal Q-Q Plot of Residuals
A normal probability plot is found by plotting the quantiles of the observed residuals against the corresponding quantiles of a standard normal distribution N(0, 1).
1) The first step is to sort the data from the lowest to the highest. Let n be the number of observations. Then the lowest observation, denoted x(1), is taken as the (1/(n+1))-th quantile of the data.
2) The next step is to determine for each observation the corresponding quantile of the normal distribution that has the same mean and standard deviation as the data. The following Excel function is a convenient way to determine the normal (i/(n+1))-th quantile, denoted x(i):
x(i) = NORMINV(i/(n+1), sample mean, sample standard deviation).
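A hedged Python sketch of the same computation (scipy's norm.ppf plays the role of NORMINV); the residual series here is simulated purely for illustration:

    import numpy as np
    from scipy import stats

    resid = np.random.default_rng(0).normal(size=61)   # stand-in for the model residuals
    x_sorted = np.sort(resid)                          # step 1: sort from lowest to highest
    n = len(x_sorted)
    p = np.arange(1, n + 1) / (n + 1)                  # plotting positions i/(n+1)
    q = stats.norm.ppf(p, loc=resid.mean(), scale=resid.std(ddof=1))
    # Plotting x_sorted against q gives the normal probability plot; an approximately
    # straight line is consistent with normally distributed errors.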

If the plot shows a straight line, it is
reasonable to assume that the observed
sample comes from a normal distribution.
If the points deviate a lot from a straight line,
there is evidence against the assumption that
the random errors are an independent sample
from a normal distribution.
[Normal Q-Q plot: sorted values x(i) plotted against the normal i/(n+1) quantiles.]
Normal Q-Q Plot
Q-Q plots plot the quantiles of a variable against the quantiles of a normal distribution.
The Q-Q plot is sensitive to non-normality near the tails.
1) The data are arranged from smallest to largest.
2) The percentile of each data value is determined.
3) The z-score of each data value is calculated.
4) The z-scores are plotted against the percentiles of the data values.
Departures from this straight line indicate departures from normality.
The P-P plot is sensitive to non-normality in the middle range of the data.
P-P plot graphs: [figures omitted].
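A short sketch of the same Q-Q procedure using scipy's built-in probplot (it sorts the data and plots it against normal quantiles); the heavy-tailed residuals are simulated, matching the t-distribution example that follows:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats

    resid = np.random.default_rng(1).standard_t(df=3, size=200)   # heavy-tailed stand-in residuals
    stats.probplot(resid, dist="norm", plot=plt)                  # normal Q-Q plot
    plt.show()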
Example
Suppose we simulate some data from a t distribution with only 3 degrees of freedom.
Jarque-Bera test of normality
The test is based on the sample skewness and kurtosis of the residuals:

$S = \dfrac{n}{(n-1)(n-2)} \sum_i \left( \dfrac{x_i - \bar{x}}{s} \right)^3, \qquad K = \dfrac{\sum_i (x_i - \bar{x})^4 / n}{s^4}$

Skewness
[Density plots: skewed to the right, S > 0; skewed to the left, S < 0.]

$S = \dfrac{n}{(n-1)(n-2)} \sum_i \left( \dfrac{x_i - \bar{x}}{s} \right)^3$

Kurtosis
[Density plot f(X) illustrating kurtosis.]

$K = \dfrac{\sum_i (x_i - \bar{x})^4 / n}{s^4}$
What can we do if we find evidence of non-normality?
What is the pattern in the plot of residuals?
Check alternative (non-linear) specifications that are appropriate.
Deviations from normality could be due to outliers. Find the reasons for the outliers.
Data error? Correct the entry.
If it is not a data error, and there is a valid reason for that observation, then one could use a dummy variable for that observation.
Outliers
Sometimes these outlying observations are
associated with large residuals (Type A).
In other cases, these observations are influential and
draw the regression line close to them (Type B).
Looking at the residuals can help weed out Type A outliers, but not Type B.
What should you do about outliers?
Investigate data errors, changes in
measurement,
structural changes in the environment
Consider deleting only if you have a good reason.
What is an outlier?
an unusual observation which is not likely to reoccur.
If it is likely to reoccur, you will fool yourself by
deletion. You will think that the model fits and
predicts better than it really does.
Steps in a
Residual Analysis
1. Check for a misspecified model by plotting
the residuals against each of the quantitative
independent variables.
Analyze each plot, looking for a curvilinear
trend. This shape signals the need for a
quadratic term in the model. Try a second-
order term in the variable against which the
residuals are plotted.
Steps in a Residual Analysis
2. Examine the residual plots for outliers. Draw
lines on the residual plots at 2- and 3-
standard-deviation distances below and
above the 0 line.
Examine residuals outside the 3-standard-
deviation lines as potential outliers and check
to see that no more than 5% of the residuals
exceed the 2-standard-deviation lines.
Steps in a Residual Analysis
Determine whether each outlier can be explained as an error in data collection or transcription, corresponds to a member of a population different from that of the remainder of the sample, or simply represents an unusual observation.
If the observation is determined to be an error, fix it or remove it. Even if you cannot determine the cause, you may want to rerun the regression analysis without the observation to determine its effect on the analysis.
Steps in a
Residual Analysis
3. Check for nonnormal errors by plotting a
frequency distribution of the residuals, using
a stem-and-leaf display or a histogram.
Check to see if obvious departures from
normality exist. Extreme skewness of the
frequency distribution may be due to outliers
or could indicate the need for a
transformation of the dependent variable.
Steps in a
Residual Analysis
4. Check for unequal error variances by plotting the residuals against the predicted values, $\hat{y}$.
If you detect a cone-shaped pattern or some other pattern that indicates that the variance of $\varepsilon$ is not constant, refit the model using an appropriate variance-stabilizing transformation on y, such as ln(y). (Consult the references for other useful variance-stabilizing transformations.)
Goldfeld-Quandt (GQ) test
The Goldfeld-Quandt (GQ) test is carried out as follows.
1) Split the total sample of length T into two sub-samples of length $T_1$ and $T_2$. The regression model is estimated on each sub-sample and the two residual variances are calculated.
2) The null hypothesis is that the variances of the disturbances are equal, $H_0: \sigma_1^2 = \sigma_2^2$.
The GQ Test (Cont'd)
3) The test statistic, denoted GQ, is simply the ratio of the two residual variances, where the larger of the two variances must be placed in the numerator:

$GQ = \dfrac{s_1^2}{s_2^2}$

4) The test statistic is distributed as $F(T_1 - k, T_2 - k)$ under the null of homoscedasticity.
A problem with the test is that the choice of where to split the sample is usually arbitrary and may crucially affect the outcome of the test.
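A hedged sketch of the GQ test via statsmodels' het_goldfeldquandt; the data are simulated so that the error variance grows with x, and names are illustrative:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.diagnostic import het_goldfeldquandt

    rng = np.random.default_rng(0)
    x = np.sort(rng.uniform(0.5, 10.0, 100))           # sorted by the suspect variable
    y = 1.0 + 2.0 * x + rng.normal(scale=0.5 * x)      # error variance grows with x
    X = sm.add_constant(x)

    gq_stat, p_value, ordering = het_goldfeldquandt(y, X)   # splits the sample in two
    print(gq_stat, p_value, ordering)                  # small p-value suggests heteroscedasticity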
White's Test: Detection of Heteroscedasticity
White's general test for heteroscedasticity is one of the best approaches because it makes few assumptions about the form of the heteroscedasticity.
The test is carried out as follows:
1) Assume that the regression we carried out is
$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + u_t$
and we want to test $H_0: \text{Var}(u_t) = \sigma^2$. We estimate the model, obtaining the residuals $\hat{u}_t$.
2) Then run the auxiliary regression

$\hat{u}_t^2 = \alpha_1 + \alpha_2 x_{2t} + \alpha_3 x_{3t} + \alpha_4 x_{2t}^2 + \alpha_5 x_{3t}^2 + \alpha_6 x_{2t} x_{3t} + v_t$

3) Obtain $R^2$ from the auxiliary regression and multiply it by the number T of observations. It can be shown that
$T R^2 \sim \chi^2(m)$
where m is the number of regressors in the auxiliary regression excluding the constant term.
4) If the $\chi^2$ test statistic from step 3 is greater than the corresponding value from the statistical table, then reject the null hypothesis that the disturbances are homoscedastic.
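A hedged sketch of White's test via statsmodels' het_white, which internally runs the auxiliary regression described above; the data are simulated and names are illustrative:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.diagnostic import het_white

    rng = np.random.default_rng(1)
    x2, x3 = rng.normal(size=(2, 200))
    u = rng.normal(scale=np.exp(0.5 * x2))             # error variance depends on x2
    y = 1.0 + 0.5 * x2 - 0.3 * x3 + u
    X = sm.add_constant(np.column_stack([x2, x3]))     # regressors including constant

    results = sm.OLS(y, X).fit()
    lm_stat, lm_pvalue, f_stat, f_pvalue = het_white(results.resid, X)
    print(lm_stat, lm_pvalue)                          # LM statistic is T*R^2 from the auxiliary regression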
How Do We Deal with Heteroscedasticity?
Generalised Least Squares (GLS)
If the form (i.e. the cause) of the heteroscedasticity is known, then we can use an estimation method which takes this into account (called generalised least squares, GLS).
A simple illustration of GLS is as follows. Suppose that the error variance is related to another variable $z_t$ by

$\text{var}(u_t) = \sigma^2 z_t^2$

To remove the heteroscedasticity, divide the regression equation by $z_t$. Then

$\dfrac{y_t}{z_t} = \beta_1 \dfrac{1}{z_t} + \beta_2 \dfrac{x_{2t}}{z_t} + \beta_3 \dfrac{x_{3t}}{z_t} + v_t$

where $v_t = \dfrac{u_t}{z_t}$ is the error term.
So the disturbances from the new regression equation will be homoscedastic:

$\text{var}(v_t) = \text{var}\!\left(\dfrac{u_t}{z_t}\right) = \dfrac{\text{var}(u_t)}{z_t^2} = \dfrac{\sigma^2 z_t^2}{z_t^2} = \sigma^2$
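When var($u_t$) is proportional to $z_t^2$ as above, this GLS transformation is equivalent to weighted least squares with weights $1/z_t^2$. A hedged sketch with simulated data and illustrative names:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    n = 200
    z = rng.uniform(0.5, 3.0, n)                       # variable driving the variance
    x2, x3 = rng.normal(size=(2, n))
    u = rng.normal(scale=z)                            # sd(u_t) proportional to z_t
    y = 1.0 + 0.5 * x2 - 0.3 * x3 + u
    X = sm.add_constant(np.column_stack([x2, x3]))

    gls_res = sm.WLS(y, X, weights=1.0 / z**2).fit()   # weights = 1 / z_t^2
    print(gls_res.params)
    print(gls_res.bse)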
Other Corrections for Heteroskedasticity
Use White's heteroscedasticity-consistent standard error estimates.
The effect of using White's correction is that in general the standard errors for the slope coefficients are increased relative to the usual OLS standard errors.
This makes us more conservative in hypothesis testing, so that we would need more evidence against the null hypothesis before we would reject it.
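A hedged sketch of White-type robust standard errors in statsmodels (via the cov_type argument); the heteroscedastic data are simulated and names are illustrative:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    x = rng.normal(size=200)
    y = 1.0 + 0.5 * x + rng.normal(scale=np.exp(0.5 * x))   # heteroscedastic errors
    X = sm.add_constant(x)

    ols_res = sm.OLS(y, X).fit()                    # usual OLS standard errors
    robust_res = sm.OLS(y, X).fit(cov_type="HC1")   # White-type robust standard errors
    print(ols_res.bse)
    print(robust_res.bse)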
Example
[Output omitted.]
Reasons for the Log Transform

$\dfrac{\Delta Y}{Y} = \beta_1 \Delta X$
Multicollinearity
This problem occurs when the explanatory variables are very highly correlated with each other.
Perfect multicollinearity: cannot estimate all the coefficients.
e.g. suppose $x_3 = 2x_2$ and the model is $y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \beta_4 x_{4t} + u_t$
Problems if Near Multicollinearity is Present but Ignored
$R^2$ will be high but the individual coefficients will have high standard errors.
The regression becomes very sensitive to small changes in the specification.
Thus confidence intervals for the parameters will be very wide, and significance tests might therefore give inappropriate conclusions.
The OLS estimator and its covariance matrix:

$b = (X'X)^{-1}X'Y = \beta + (X'X)^{-1}X'e$

$\text{Cov}(b) = E[(b - E(b))(b - E(b))'] = E[(b - \beta)(b - \beta)']$
$= E[(X'X)^{-1}X'ee'X(X'X)^{-1}]$
$= (X'X)^{-1}X'E(ee')X(X'X)^{-1}$
$= \sigma^2 (X'X)^{-1}X'X(X'X)^{-1} = \sigma^2 (X'X)^{-1}$

$\text{Var}(\hat{\beta}_j) = \sigma^2 c_{jj}$ and $SE(\hat{\beta}_j) = \hat{\sigma}\sqrt{c_{jj}}$, where $c_{jj}$ is the (j, j) component of $(X'X)^{-1}$.

$\hat{\sigma}^2 = \text{MSE} = \dfrac{\text{SSE}}{n - k - 1} = \dfrac{\sum_{i=1}^{n} e_i^2}{n - k - 1} \approx \sigma^2$

Use $t = \dfrac{\hat{\beta}_j - \beta_j}{SE(\hat{\beta}_j)} = \dfrac{\hat{\beta}_j - \beta_j}{\hat{\sigma}\sqrt{c_{jj}}} \sim t(n - k - 1)$

For the fitted values:

$\text{Var}(\hat{Y}) = E[\hat{Y} - E(\hat{Y})]^2 = E[X\hat{\beta} - X\beta]^2$
$= E\{[X(\hat{\beta} - \beta)][X(\hat{\beta} - \beta)]'\} = X\,E[(\hat{\beta} - \beta)(\hat{\beta} - \beta)']\,X'$
$= X\,\text{Cov}(\hat{\beta})\,X' = \sigma^2 X(X'X)^{-1}X'$

$SE(\hat{Y}) = \hat{\sigma}\sqrt{X(X'X)^{-1}X'}$

$\text{Cov}(\hat{\beta}) = \sigma^2 (X'X)^{-1}$, with $\hat{\sigma}^2 = \dfrac{\sum_{i=1}^{n} e_i^2}{n - k - 1} \approx \sigma^2$
Multicollinearity
In multiple regression analysis, one is often
concerned with the nature and significance of the
relations between the explanatory variables and the
response variable.
Questions that are frequently asked are:
What is the relative importance of the effects of the
different independent variables?
What is the magnitude of the effect of a given independent
variable on the dependent variable?
Multicollinearity
(A) Can any independent variable be dropped from the
model because it has little or no effect on the dependent
variable?
(B) Should any independent variables not yet included
in the model be considered for possible inclusion?
Simple answers can be given to these questions if
(A) The independent variables in the model are
uncorrelated among themselves.
(B) They are uncorrelated with any other independent
variables that are related to the dependent variable but
omitted from the model.

Multicollinearity
Some key problems that typically arise when the
explanatory variables being considered for the
regression model are highly correlated among
themselves are:
1. Adding or deleting an explanatory variable changes the
regression coefficients.
2. The estimated standard deviations of the regression
coefficients become large when the explanatory
variables in the regression model are highly
correlated with each other.
3. The estimated regression coefficients individually may
not be statistically significant even though a definite
statistical relation exists between the response
variable and the set of explanatory variables.
Why?
Problems with multicollinearity
Collinear variables can have coefficients with large standard errors.
Collinear variables can have insignificant t's, but very significant F's.
Getting a larger sample doesn't necessarily help much.
Strictly speaking, multicollinearity is not a violation of the model assumptions: least squares and the least squares standard errors remain valid, but the estimates can be very imprecise.
Multicollinearity
(strong relationship among the explanatory variables themselves)
Variances of regression coefficients are inflated (so individual t ratios are smaller).
Regression coefficients may be different from their true values, even in sign.
Adding or removing variables produces large changes in coefficients.
Removing a data point may cause large changes in coefficient estimates or signs.
In some cases, the F ratio may be significant and $R^2$ may be very high even though all the t ratios are insignificant (suggesting no significant relationship).
Solutions to the Multicollinearity Problem
Drop a collinear variable from the
regression
Combine collinear variables (e.g. use their
sum as one variable)

Measuring Multicollinearity
The easiest way to measure the extent of multicollinearity is simply to look at the matrix of correlations between the individual variables, e.g.

Corr   x2    x3    x4
x2     -     0.2   0.8
x3     0.2   -     0.3
x4     0.8   0.3   -

But there is another problem: 3 or more variables may be linearly related, e.g. $x_{2t} + x_{3t} = x_{4t}$.
Note that high correlation between y and one of the x's is not multicollinearity.
Multicollinearity Diagnostics
A formal method of detecting the presence of multicollinearity that is widely used is the Variance Inflation Factor (VIF).
It measures how much the variances of the estimated regression coefficients are inflated compared to when the independent variables are not linearly related:

$VIF_j = \dfrac{1}{1 - R_j^2}, \qquad j = 1, 2, \dots, k$

where $R_j^2$ is the coefficient of determination from the regression of the jth independent variable on the remaining k − 1 independent variables.
Multicollinearity Diagnostics
A VIF near 1 suggests that multicollinearity is not a problem
for the independent variables.
Its estimated coefficient and associated t value will not change much as
the other independent variables are added or deleted from the regression
equation.
A VIF much greater than 1 indicates the presence of
multicollinearity. A maximum VIF value in excess of 10 is
often taken as an indication that the multicollinearity may be
unduly influencing the least square estimates.
the estimated coefficient attached to the variable is unstable and its
associated t statistic may change considerably as the other independent
variables are added or deleted.
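A hedged sketch of computing VIFs with statsmodels' variance_inflation_factor; the data are simulated so that one variable is nearly collinear with another, and names are illustrative:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    rng = np.random.default_rng(3)
    x2 = rng.normal(size=100)
    x3 = rng.normal(size=100)
    x4 = 0.9 * x2 + 0.1 * rng.normal(size=100)         # x4 nearly collinear with x2
    df = pd.DataFrame({"x2": x2, "x3": x3, "x4": x4})

    X = sm.add_constant(df)
    vifs = {col: variance_inflation_factor(X.values, i)
            for i, col in enumerate(X.columns) if col != "const"}
    print(vifs)                                        # values well above 1 flag collinearity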

Multicollinearity Diagnostics
The simple correlation coefficient between all pairs of
explanatory variables (i.e., X
1
, X
2
, , X
k
) is helpful
in selecting appropriate explanatory variables for a
regression model and is also critical for examining
multicollinearity.
While it is true that a correlation very close to +1
or 1 does suggest multicollinearity, it is not true
(unless there are only two explanatory variables)
to infer multicollinearity does not exist when there
are no high correlations between any pair of
explanatory variables.
Example:Sales Forecasting
Pearson Correlation Coefficients, N = 20
Prob > |r| under H0: Rho=0

SUBSCRIB ADRATE SIGNAL APIPOP COMPETE

SUBSCRIB 1.00000 -0.02848 0.44762 0.90447 0.79832
SUBSCRIB 0.9051 0.0478 <.0001 <.0001

ADRATE -0.02848 1.00000 -0.01021 0.32512 0.34147
ADRATE 0.9051 0.9659 0.1619 0.1406

SIGNAL 0.44762 -0.01021 1.00000 0.45303 0.46895
SIGNAL 0.0478 0.9659 0.0449 0.0370

APIPOP 0.90447 0.32512 0.45303 1.00000 0.87592
APIPOP <.0001 0.1619 0.0449 <.0001

COMPETE 0.79832 0.34147 0.46895 0.87592 1.00000
COMPETE <.0001 0.1406 0.0370 <.0001

Example: Sales Forecasting
SUBSCRIBE = 96.28 + 0.25 ADRATE + 0.495 APIPOP
SUBSCRIBE = 51.42 + 0.27 ADRATE - 0.02 SIGNAL + 0.44 APIPOP + 16.23 COMPETE
SUBSCRIBE = 51.32 + 0.26 ADRATE + 0.43 APIPOP + 13.92 COMPETE
Example: Sales Forecasting
VIF calculation:
Fit the model
APIPOP = $\beta_0$ + $\beta_1$ SIGNAL + $\beta_2$ ADRATE + $\beta_3$ COMPETE + $\varepsilon$
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.878054
R Square 0.770978
Adjusted R Square 0.728036
Standard Error 264.3027
Observations 20
ANOVA
df SS MS F Significance F
Regression 3 3762601 1254200 17.9541 2.25472E-05
Residual 16 1117695 69855.92
Total 19 4880295
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept -472.685 139.7492 -3.38238 0.003799 -768.9402258 -176.43
Compete 159.8413 28.29157 5.649786 3.62E-05 99.86587622 219.8168
ADRATE 0.048173 0.149395 0.322455 0.751283 -0.268529713 0.364876
Signal 0.037937 0.083011 0.457012 0.653806 -0.138038952 0.213913
Example:Sales Forecasting
Fit the model
COMPETE = $\beta_0$ + $\beta_1$ ADRATE + $\beta_2$ APIPOP + $\beta_3$ SIGNAL + $\varepsilon$
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.882936
R Square 0.779575
Adjusted R Square 0.738246
Standard Error 1.34954
Observations 20
ANOVA
df SS MS F Significance F
Regression 3 103.0599 34.35329 18.86239 1.66815E-05
Residual 16 29.14013 1.821258
Total 19 132.2
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 3.10416 0.520589 5.96278 1.99E-05 2.000559786 4.20776
ADRATE 0.000491 0.000755 0.649331 0.525337 -0.001110874 0.002092
Signal 0.000334 0.000418 0.799258 0.435846 -0.000552489 0.001221
APIPOP 0.004167 0.000738 5.649786 3.62E-05 0.002603667 0.005731
Example:Sales Forecasting
Fit the model
SIGNAL = $\beta_0$ + $\beta_1$ ADRATE + $\beta_2$ APIPOP + $\beta_3$ COMPETE + $\varepsilon$
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.512244
R Square 0.262394
Adjusted R Square 0.124092
Standard Error 790.8387
Observations 20
ANOVA
df SS MS F Significance F
Regression 3 3559789 1186596 1.897261 0.170774675
Residual 16 10006813 625425.8
Total 19 13566602
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 5.171093 547.6089 0.009443 0.992582 -1155.707711 1166.05
APIPOP 0.339655 0.743207 0.457012 0.653806 -1.235874129 1.915184
Compete 114.8227 143.6617 0.799258 0.435846 -189.7263711 419.3718
ADRATE -0.38091 0.438238 -0.86919 0.397593 -1.309935875 0.548109
Example:Sales Forecasting
Fit the model
ADRATE = $\beta_0$ + $\beta_1$ SIGNAL + $\beta_2$ APIPOP + $\beta_3$ COMPETE + $\varepsilon$
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.399084
R Square 0.159268
Adjusted R Square 0.001631
Standard Error 440.8588
Observations 20
ANOVA
df SS MS F Significance F
Regression 3 589101.7 196367.2 1.010346 0.413876018
Residual 16 3109703 194356.5
Total 19 3698805
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 253.7304 298.6063 0.849716 0.408018 -379.2865355 886.7474
Signal -0.11837 0.136186 -0.86919 0.397593 -0.407073832 0.170329
APIPOP 0.134029 0.415653 0.322455 0.751283 -0.747116077 1.015175
Compete 52.3446 80.61309 0.649331 0.525337 -118.5474784 223.2367
Example:Sales Forecasting
VIF calculation Results:




There is no significant multicollinearity.

Variable R- Squared VIF
ADRATE 0.159268 1.19
COMPETE 0.779575 4.54
SIGNAL 0.262394 1.36
APIPOP 0.770978 4.36
Example: Multicollinearity
[Output omitted.]
Outliers
Outliers = unusual observations
How can we find unusual observations?
Causes for outliers
Out-of-sample predictions
[Example plots omitted.]
Out-of-sample extrapolation
Non-Linear Functional Forms
The standard regression model is a linear conditional mean model.
In many situations in practice, it is desirable to have some flexibility to specify non-linear regression functions.
The standard linear regression model can be "tricked" into displaying non-linearity by two techniques; one of them is to take logs of Y and/or X:

Log-log model, $\log(Y) = \beta_0 + \beta_1 \log(X)$: $\beta_1 = \dfrac{\Delta \log(Y)}{\Delta \log(X)} = \dfrac{\Delta Y / Y}{\Delta X / X}$

Semi-log model, $\log(Y) = \beta_0 + \beta_1 X$: $\beta_1 = \dfrac{\Delta \log(Y)}{\Delta X} = \dfrac{\Delta Y / Y}{\Delta X}$
Example: Brain Weight Data
[Plot on the log scale: Ln(64) = 4.1589; labels 6.4 g and 64 g.]
Generalized Linear Models
Normal regression: $y \sim N(\eta, \sigma^2)$, with linear predictor $\eta = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots$
(Each GLM combines a distributional specification, a link function and a linear predictor.)
Poisson regression: $Y \sim \text{Poisson}(\mu)$, $E(Y) = \mu = \exp(\eta)$, $\eta = \beta_0 + \beta_1 x_1 + \dots$
Logistic regression: $Y \sim \text{Binomial}(n, \pi)$, $E(Y) = n\pi$, $\pi = \exp(\eta)/[1 + \exp(\eta)]$, $\eta = \beta_0 + \beta_1 x_1 + \dots$
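A hedged sketch of these three specifications using statsmodels' GLM families (default links); the data are simulated and names are illustrative:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    x = rng.normal(size=300)
    X = sm.add_constant(x)

    y_norm = 1.0 + 0.5 * x + rng.normal(size=300)
    y_pois = rng.poisson(np.exp(0.2 + 0.5 * x))
    y_bin = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.2 + 0.5 * x))))

    normal_fit = sm.GLM(y_norm, X, family=sm.families.Gaussian()).fit()   # identity link
    poisson_fit = sm.GLM(y_pois, X, family=sm.families.Poisson()).fit()   # log link
    logit_fit = sm.GLM(y_bin, X, family=sm.families.Binomial()).fit()     # logit link
    print(normal_fit.params, poisson_fit.params, logit_fit.params)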
Dummy Variables
Categorical Explanatory Variables
in Regression Models
Categorical independent variables can
be incorporated into a regression model
by converting them into 0/1 (dummy)
variables
For binary variables, code dummies 0
for no and 1 for yes
Dummy Variables, More than two levels
For categorical variables with k categories, use k − 1 dummy variables.
SMOKE2 has three levels, initially coded
0 = non-smoker, 1 = former smoker, 2 = current smoker
Use k − 1 = 3 − 1 = 2 dummy variables to code this information, as in the sketch below:
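A minimal pandas sketch of the same k − 1 dummy coding; the column names and example values are illustrative:

    import pandas as pd

    smoke2 = pd.Series(
        pd.Categorical(["non-smoker", "former smoker", "current smoker", "non-smoker"],
                       categories=["non-smoker", "former smoker", "current smoker"]),
        name="SMOKE2")
    dummies = pd.get_dummies(smoke2, prefix="SMOKE2", drop_first=True)
    # drop_first=True keeps k - 1 = 2 columns; "non-smoker" is the reference level
    print(dummies)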
Example
Childhood respiratory health survey:
Binary explanatory variable (SMOKE) is coded 0 for non-smoker and 1 for smoker.
Response variable Forced Expiratory Volume (FEV) is measured in liters/second.
The mean FEV in nonsmokers is 2.566.
The mean FEV in smokers is 3.277.
Example, cont.
Regress FEV on SMOKE; the least squares regression line is
$\widehat{FEV} = 2.566 + 0.711X$
Intercept (2.566) = the mean FEV of group 0.
Slope = the mean difference in FEV = 3.277 − 2.566 = 0.711.
$t_{stat}$ = 6.464 with 652 df, P ≈ 0.000 (same as the equal variance t test).
The 95% CI for the slope is 0.495 to 0.927 (same as the 95% CI for $\mu_1 - \mu_0$).
Dummy Variable SMOKE
[Plot: the regression line passes through the group means; b = 3.277 − 2.566 = 0.711.]
Smoking increases FEV?
Children who smoked had higher mean FEV
How can this be true given what we know
about the deleterious respiratory effects of
smoking?
ANS: Smokers were older than the
nonsmokers.
AGE confounded the relationship between
SMOKE and FEV
A multiple regression model can be used to
adjust for AGE in this situation
Multiple Regression Coefficients
Rely on software to calculate multiple regression statistics.
Example
The multiple regression model is:
FEV = 0.367 − 0.209(SMOKE) + 0.231(AGE)
(intercept a, slope $b_1$ for SMOKE, slope $b_2$ for AGE)
Multiple Regression Coefficients, cont.
The slope coefficient for SMOKE is −0.209, suggesting that smokers have 0.209 lower FEV on average compared to non-smokers (after adjusting for age).
The slope coefficient for AGE is 0.231, suggesting that each year of age is associated with an increase of 0.231 FEV units on average (after adjusting for SMOKE).
Coefficients (dependent variable: fev)

Model 1       B        Std. Error   Standardized Beta   t        Sig.
(Constant)    .367     .081                             4.511    .000
smoke         -.209    .081         -.072               -2.588   .010
age           .231     .008         .786                28.176   .000
Inference About the Coefficients
Inferential statistics are calculated for each regression coefficient. For example, in testing
$H_0: \beta_1 = 0$ (SMOKE coefficient controlling for AGE)
$t_{stat} = -2.588$ and P = 0.010
df = n − k − 1 = 654 − 2 − 1 = 651
Inference About the Coefficients
The 95% confidence interval for the slope of SMOKE controlling for AGE is −0.368 to −0.050.

95% Confidence Interval for B (dependent variable: fev)
Model 1       Lower Bound   Upper Bound
(Constant)    .207          .527
smoke         -.368         -.050
age           .215          .247
Comparing the Slopes of Two or More Regression Lines
Use of dummy (or classification) variables in regression.
Suppose we have a quantitative explanatory variable, $x_1$, and two possible regression lines: one for situation 1 (location A), the other for situation 2 (location B):
Location A: $y = \beta_0 + \beta_1 x_1$
Location B: $y = \beta_2 + \beta_3 x_1$
$H_0: \beta_0 = \beta_2$ (equal intercepts)
$H_0: \beta_1 = \beta_3$ (equal slopes)
Reformulate the model. Define a new variable, $x_2$, such that
$x_2 = 0$ for situation 1 (Location A)
$x_2 = 1$ for situation 2 (Location B)
Then use multiple regression:

$\hat{y} = \tau_0 + \tau_1 x_1 + \tau_2 x_2 + \tau_3 x_1 x_2$

When $x_2 = 0$ we have: $\hat{y} = \tau_0 + \tau_1 x_1$ (i.e. $\beta_0 + \beta_1 x_1$)
When $x_2 = 1$ we have: $\hat{y} = (\tau_0 + \tau_2) + (\tau_1 + \tau_3) x_1$ (i.e. $\beta_2 + \beta_3 x_1$)
A test of $\tau_2 = 0$ is equivalent to no intercept difference.
A test of $\tau_3 = 0$ is equivalent to no slope difference.
Tests are based on reduction (drop) sums of squares, as previously defined.
[Four panels plotting y against $x_1$, one for each combination of the dummy coefficients:
$\tau_2 \neq 0$, $\tau_3 \neq 0$: $y = \tau_0 + \tau_1 x$ and $y = (\tau_0 + \tau_2) + (\tau_1 + \tau_3)x$ (different intercepts and slopes)
$\tau_2 \neq 0$, $\tau_3 = 0$: $y = \tau_0 + \tau_1 x$ and $y = (\tau_0 + \tau_2) + \tau_1 x$ (different intercepts, same slope)
$\tau_2 = 0$, $\tau_3 \neq 0$: $y = \tau_0 + \tau_1 x$ and $y = \tau_0 + (\tau_1 + \tau_3)x$ (same intercept, different slopes)
$\tau_2 = 0$, $\tau_3 = 0$: both groups follow $y = \tau_0 + \tau_1 x$ (coincident lines)]
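A hedged sketch of fitting this dummy-interaction model with the statsmodels formula API; the data for the two "locations" are simulated and names are illustrative:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(5)
    n = 100
    x1 = rng.uniform(0.0, 10.0, n)
    x2 = rng.integers(0, 2, n)                          # 0 = location A, 1 = location B
    y = 2.0 + 1.5 * x1 + 1.0 * x2 + 0.5 * x1 * x2 + rng.normal(size=n)
    df = pd.DataFrame({"y": y, "x1": x1, "x2": x2})

    fit = smf.ols("y ~ x1 * x2", data=df).fit()         # expands to x1 + x2 + x1:x2
    print(fit.summary().tables[1])                      # t-tests on x2 and x1:x2 correspond to
                                                        # the intercept- and slope-difference tests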
Example
[Example output omitted.]
Autocorrelation in the Errors
We assumed the errors are independent, that is,
$\text{Cov}(u_i, u_j) = 0$ for $i \neq j$
This is essentially the same as saying there is no pattern in the errors.
Obviously we never observe the actual u's, so we use their sample counterpart, the residuals.
If there are patterns in the residuals from a model, we say that they are autocorrelated.
Some stereotypical patterns we may find in the residuals:
Positive Autocorrelation
[Plots of $\hat{u}_t$ against $\hat{u}_{t-1}$ and of $\hat{u}_t$ over time.]
Positive autocorrelation is indicated by a cyclical residual plot over time.
Negative Autocorrelation
[Plots of $\hat{u}_t$ against $\hat{u}_{t-1}$ and of $\hat{u}_t$ over time.]
Negative autocorrelation is indicated by an alternating pattern where the residuals cross the time axis more frequently than if they were distributed randomly.
No pattern in residuals - No autocorrelation
[Plots of $\hat{u}_t$ against $\hat{u}_{t-1}$ and of $\hat{u}_t$ over time.]
No pattern in the residuals at all: this is what we would like to see.
Detecting Autocorrelation: The Durbin-Watson Test
The Durbin-Watson (DW) test is a test for first order autocorrelation, i.e. it assumes that the relationship is between an error and the previous one:
$u_t = \rho u_{t-1} + v_t$, where $v_t \sim N(0, \sigma_v^2)$.
The DW test statistic actually tests
$H_0: \rho = 0$ and $H_1: \rho \neq 0$
The test statistic is calculated by

$DW = \dfrac{\sum_{t=2}^{T} (\hat{u}_t - \hat{u}_{t-1})^2}{\sum_{t=2}^{T} \hat{u}_t^2}$
The Durbin-Watson Test: Critical Values
We can also write
$DW \approx 2(1 - \hat{\rho})$
where $\hat{\rho}$ is the estimated correlation coefficient. Since $\hat{\rho}$ is a correlation, it implies that $0 \le DW \le 4$.
If $\hat{\rho} = 0$, DW = 2. So roughly speaking, do not reject the null hypothesis if DW is near 2, i.e. there is little evidence of autocorrelation.
DW has 2 critical values, an upper critical value ($d_U$) and a lower critical value ($d_L$), and there is also an intermediate region where we can neither reject nor not reject $H_0$.
The Durbin-Watson Test: Interpreting the Results
[Decision-rule diagram omitted: DW values near 0 indicate positive autocorrelation, values near 4 indicate negative autocorrelation, and values near 2 indicate no autocorrelation, with inconclusive regions between $d_L$ and $d_U$ (and between $4 - d_U$ and $4 - d_L$).]
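A hedged sketch of computing the DW statistic from OLS residuals with statsmodels; the AR(1) errors are simulated and names are illustrative:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.stattools import durbin_watson

    rng = np.random.default_rng(6)
    T = 200
    x = rng.normal(size=T)
    u = np.zeros(T)
    for t in range(1, T):                               # AR(1) errors with rho = 0.7
        u[t] = 0.7 * u[t - 1] + rng.normal()
    y = 1.0 + 0.5 * x + u

    res = sm.OLS(y, sm.add_constant(x)).fit()
    print(durbin_watson(res.resid))                     # well below 2 -> positive autocorrelation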
Conditions which Must be Fulfilled for DW to be a Valid Test
1. Constant term in regression
2. Regressors are non-stochastic
3. No lags of dependent variable


Another Test for Autocorrelation: The Breusch-Godfrey Test
It is a more general test for r-th order autocorrelation:

$u_t = \rho_1 u_{t-1} + \rho_2 u_{t-2} + \rho_3 u_{t-3} + \dots + \rho_r u_{t-r} + v_t$, with $v_t \sim N(0, \sigma_v^2)$

The null and alternative hypotheses are:
$H_0: \rho_1 = 0$ and $\rho_2 = 0$ and ... and $\rho_r = 0$
$H_1: \rho_1 \neq 0$ or $\rho_2 \neq 0$ or ... or $\rho_r \neq 0$
The test is carried out as follows:
1) Estimate the linear regression using OLS and obtain the residuals, $\hat{u}_t$.
2) Regress $\hat{u}_t$ on all of the regressors from stage 1 (the x's) plus $\hat{u}_{t-1}, \hat{u}_{t-2}, \dots, \hat{u}_{t-r}$. Obtain $R^2$ from this regression.
3) It can be shown that $(T - r) R^2 \sim \chi^2(r)$.
If the test statistic exceeds the critical value from the statistical tables, reject the null hypothesis of no autocorrelation.
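A hedged sketch of the Breusch-Godfrey test via statsmodels' acorr_breusch_godfrey; the AR(1) errors are simulated and names are illustrative:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.diagnostic import acorr_breusch_godfrey

    rng = np.random.default_rng(7)
    T = 200
    x = rng.normal(size=T)
    u = np.zeros(T)
    for t in range(1, T):                               # AR(1) errors
        u[t] = 0.6 * u[t - 1] + rng.normal()
    y = 1.0 + 0.5 * x + u

    res = sm.OLS(y, sm.add_constant(x)).fit()
    lm_stat, lm_pvalue, f_stat, f_pvalue = acorr_breusch_godfrey(res, nlags=3)   # r = 3
    print(lm_stat, lm_pvalue)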
Consequences of Ignoring Autocorrelation if it is Present
The coefficient estimates derived using OLS are still unbiased, but they are inefficient, i.e. they are not BLUE, even in large sample sizes.
Thus, if the standard error estimates are inappropriate, there exists the possibility that we could make the wrong inferences.
$R^2$ is likely to be inflated relative to its correct value for positively correlated residuals.
Remedies for Autocorrelation

If the form of the autocorrelation is known, we could use a GLS
procedure i.e. an approach that allows for autocorrelated residuals
e.g., Cochrane-Orcutt.
But such procedures that correct for autocorrelation require
assumptions about the form of the autocorrelation.
If these assumptions are invalid, the cure would be more dangerous
than the disease! - see Hendry and Mizon (1978).
However, it is unlikely to be the case that the form of the
autocorrelation is known, and a more modern view is that residual
autocorrelation presents an opportunity to modify the regression.

Dynamic Models
All of the models we have considered so far have been static, e.g.

$y_t = \beta_1 + \beta_2 x_{2t} + \dots + \beta_k x_{kt} + u_t$

But we can easily extend this analysis to the case where the current value of $y_t$ depends on previous values of y or of one of the x's, e.g.

$y_t = \beta_1 + \beta_2 x_{2t} + \dots + \beta_k x_{kt} + \gamma_1 y_{t-1} + \gamma_2 x_{2t-1} + \dots + \gamma_k x_{kt-1} + u_t$

We could extend the model even further by adding extra lags, e.g. $x_{2t-2}$, $y_{t-3}$.

Why Might we Want/Need To Include Lags
in a Regression?

Inertia of the dependent variable
Over-reactions
Measuring time series as overlapping moving averages

However, other problems with the regression could cause the null
hypothesis of no autocorrelation to be rejected:
Omission of relevant variables, which are themselves autocorrelated.
If we have committed a misspecification error by using an
inappropriate functional form.
Autocorrelation resulting from unparameterised seasonality.


Models in First Difference Form
Another way to sometimes deal with the problem of autocorrelation is to switch to a model in first differences.
Denote the first difference of $y_t$, i.e. $y_t - y_{t-1}$, as $\Delta y_t$; similarly for the x-variables, $\Delta x_{2t} = x_{2t} - x_{2t-1}$, etc.
The model would now be

$\Delta y_t = \beta_1 + \beta_2 \Delta x_{2t} + \dots + \beta_k \Delta x_{kt} + u_t$

Sometimes the change in y is purported to depend on previous values of y or $x_t$ as well as changes in x:

$\Delta y_t = \beta_1 + \beta_2 \Delta x_{2t} + \beta_3 x_{2t-1} + \beta_4 y_{t-1} + u_t$
The Long Run Static Equilibrium Solution

One interesting property of a dynamic model is its long run or static
equilibrium solution.

Equilibrium implies that the variables have reached some steady state and are no longer changing, i.e. if y and x are in equilibrium, we can say
$y_t = y_{t+1} = \dots = y$ and $x_t = x_{t+1} = \dots = x$
Consequently, $\Delta y_t = y_t - y_{t-1} = y - y = 0$, etc.
So the way to obtain a long run static solution is:
1. Remove all time subscripts from variables
2. Set error terms equal to their expected values, $E(u_t) = 0$
3. Remove first difference terms altogether
4. Gather terms in x together and gather terms in y together.
These steps can be undertaken in any order.
The Long Run Static Equilibrium Solution: An Example
If our model is

$\Delta y_t = \beta_1 + \beta_2 \Delta x_{2t} + \beta_3 x_{2t-1} + \beta_4 y_{t-1} + u_t$

then the static solution would be given by

$0 = \beta_1 + \beta_3 x_{2t-1} + \beta_4 y_{t-1}$

$\beta_4 y_{t-1} = -\beta_1 - \beta_3 x_{2t-1}$

$y = -\dfrac{\beta_1}{\beta_4} - \dfrac{\beta_3}{\beta_4} x_2$
Problems with Adding Lagged Regressors
to Cure Autocorrelation

Inclusion of lagged values of the dependent variable violates the
assumption that the RHS variables are non-stochastic.

What does an equation with a large number of lags actually mean?

Note that if there is still autocorrelation in the residuals of a model
including lags, then the OLS estimators will not even be consistent.




Parameter Stability Tests

Parameter Stability Tests
So far, we have estimated regressions such as

$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + u_t$

We have implicitly assumed that the parameters ($\beta_1$, $\beta_2$ and $\beta_3$) are constant for the entire sample period.
We can test this implicit assumption using parameter stability tests. The idea is essentially to split the data into sub-periods and then to estimate up to three models, for each of the sub-parts and for all the data, and then to compare the RSS of the models.
There are two types of test we can look at:
- Chow test (analysis of variance test)
- Predictive failure tests
The Chow Test
The steps involved are:
1. Split the data into two sub-periods. Estimate the regression over the whole period and then for the two sub-periods separately (3 regressions). Obtain the RSS for each regression.
2. The restricted regression is now the regression for the whole period, while the unrestricted regression comes in two parts: one for each of the sub-samples.
We can thus form an F-test which is based on the difference between the RSSs.
The statistic is

$\dfrac{RSS - (RSS_1 + RSS_2)}{RSS_1 + RSS_2} \times \dfrac{T - 2k}{k}$
The Chow Test (cont'd)

where:
RSS = RSS for whole sample
RSS
1
= RSS for sub-sample 1
RSS
2
= RSS for sub-sample 2
T = number of observations
2k = number of regressors in the unrestricted regression (since it comes
in two parts)
k = number of regressors in (each part of the) unrestricted regression

3. Perform the test. If the value of the test statistic is greater than the
critical value from the F-distribution, which is an F(k, T-2k), then reject
the null hypothesis that the parameters are stable over time.
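A hedged sketch of computing the Chow statistic from three OLS fits; the data and break point are simulated (not the Glaxo example that follows), and names are illustrative:

    import numpy as np
    import statsmodels.api as sm
    from scipy import stats

    rng = np.random.default_rng(8)
    T, k = 144, 2                                       # k regressors including the constant
    x = rng.normal(size=T)
    beta = np.where(np.arange(T) < 82, 1.2, 1.5)        # slope shifts after observation 82
    y = 0.3 + beta * x + rng.normal(scale=0.1, size=T)
    X = sm.add_constant(x)

    rss = sm.OLS(y, X).fit().ssr                        # restricted: whole sample
    rss1 = sm.OLS(y[:82], X[:82]).fit().ssr             # unrestricted part 1
    rss2 = sm.OLS(y[82:], X[82:]).fit().ssr             # unrestricted part 2

    chow = (rss - (rss1 + rss2)) / (rss1 + rss2) * (T - 2 * k) / k
    p_value = stats.f.sf(chow, k, T - 2 * k)            # compare with F(k, T - 2k)
    print(chow, p_value)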
A Chow Test Example

Consider the following regression for the CAPM | (again) for the
returns on Glaxo.

Say that we are interested in estimating Beta for monthly data from
1981-1992. The model for each sub-period is

1981M1 - 1987M10: 0.24 + 1.2$R_{Mt}$, T = 82, $RSS_1$ = 0.03555
1987M11 - 1992M12: 0.68 + 1.53$R_{Mt}$, T = 62, $RSS_2$ = 0.00336
1981M1 - 1992M12: 0.39 + 1.37$R_{Mt}$, T = 144, RSS = 0.0434

A Chow Test Example - Results
The null hypothesis is
$H_0: \alpha_1 = \alpha_2$ and $\beta_1 = \beta_2$
The unrestricted model is the model where this restriction is not imposed.

$\text{Test statistic} = \dfrac{0.0434 - (0.0355 + 0.00336)}{0.0355 + 0.00336} \times \dfrac{144 - 4}{2} = 7.698$

Compare with the 5% critical value F(2, 140) = 3.06.
We reject $H_0$ at the 5% level and say that we reject the restriction that the coefficients are the same in the two periods.
The Predictive Failure Test
A problem with the Chow test is that we need to have enough data to do the regression on both sub-samples, i.e. $T_1 \gg k$, $T_2 \gg k$.
An alternative formulation is the predictive failure test.
What we do with the predictive failure test is estimate the regression over a long sub-period (i.e. most of the data) and then predict values for the other period and compare the two.
To calculate the test:
- Run the regression for the whole period (the restricted regression) and obtain the RSS.
- Run the regression for the large sub-period and obtain the RSS (called $RSS_1$). Note we call the number of observations $T_1$ (even though it may come second).

$\text{Test Statistic} = \dfrac{RSS - RSS_1}{RSS_1} \times \dfrac{T_1 - k}{T_2}$

where $T_2$ = number of observations we are attempting to predict. The test statistic will follow an $F(T_2, T_1 - k)$.
Backwards versus Forwards Predictive Failure Tests

There are 2 types of predictive failure tests:

- Forward predictive failure tests, where we keep the last few
observations back for forecast testing, e.g. we have observations for
1970Q1-1994Q4. So estimate the model over 1970Q1-1993Q4 and
forecast 1994Q1-1994Q4.

- Backward predictive failure tests, where we attempt to back-cast
the first few observations, e.g. if we have data for 1970Q1-1994Q4,
and we estimate the model over 1971Q1-1994Q4 and backcast
1970Q1-1970Q4.


Predictive Failure Tests - An Example
We have the following models estimated for the CAPM $\beta$ on Glaxo:
1980M1 - 1991M12: 0.39 + 1.37$R_{Mt}$, T = 144, RSS = 0.0434
1980M1 - 1989M12: 0.32 + 1.31$R_{Mt}$, $T_1$ = 120, $RSS_1$ = 0.0420
Can this regression adequately forecast the values for the last two years?

$\text{Test Statistic} = \dfrac{0.0434 - 0.0420}{0.0420} \times \dfrac{120 - 2}{24} = 0.164$

Compare with F(24, 118) = 1.66.
So we do not reject the null hypothesis that the model can adequately predict the last few observations.
Omission of an Important Variable or
Inclusion of an Irrelevant Variable

Omission of an Important Variable
Consequence: The estimated coefficients on all the other variables will be
biased and inconsistent unless the excluded variable is uncorrelated with
all the included variables.

Even if this condition is satisfied, the estimate of the coefficient on the
constant term will be biased.

The standard errors will also be biased.

Inclusion of an Irrelevant Variable

Coefficient estimates will still be consistent and unbiased, but the
estimators will be inefficient.


How do we decide the sub-parts to use?
As a rule of thumb, we could use all or some of the following:
- Plot the dependent variable over time and split the data according to any obvious structural changes in the series, e.g.
[Time series plot of the Value of Series ($y_t$) against the Sample Period, showing an apparent structural break.]
- Split the data according to any known important historical events (e.g. stock market crash, new government elected).
- Use all but the last few observations and do a predictive failure test on those.
A Strategy for Building Econometric Models

Our Objective:

To build a statistically adequate empirical model which
- satisfies the assumptions of the CLRM
- is parsimonious
- has the appropriate theoretical interpretation
- has the right shape - i.e.
- all signs on coefficients are correct
- all sizes of coefficients are correct
- is capable of explaining the results of all competing models


2 Approaches to Building Econometric Models

There are 2 popular philosophies of building econometric models: the
specific-to-general and general-to-specific approaches.

Specific-to-general was used almost universally until the mid 1980s,
and involved starting with the simplest model and gradually adding to it.

Little, if any, diagnostic testing was undertaken. But this meant that all
inferences were potentially invalid.

An alternative and more modern approach to model building is the LSE
or Hendry general-to-specific methodology.

The advantages of this approach are that it is statistically sensible and also
the theory on which the models are based usually has nothing to say about
the lag structure of a model.
The General-to-Specific Approach
First step is to form a large model with lots of variables on the right hand
side
This is known as a GUM (generalised unrestricted model)
At this stage, we want to make sure that the model satisfies all of the
assumptions of the CLRM
If the assumptions are violated, we need to take appropriate actions to remedy
this, e.g.
- taking logs
- adding lags
- dummy variables
We need to do this before testing hypotheses
Once we have a model which satisfies the assumptions, it could be very big
with lots of lags & independent variables

The General-to-Specific Approach:
Reparameterising the Model

The next stage is to reparameterise the model by
- knocking out very insignificant regressors
- some coefficients may be insignificantly different from each other,
so we can combine them.

At each stage, we need to check the assumptions are still OK.

Hopefully at this stage, we have a statistically adequate empirical model
which we can use for
- testing underlying financial theories
- forecasting future values of the dependent variable
- formulating policies, etc.


Regression Analysis In Practice - A Further Example:
Determinants of Sovereign Credit Ratings

Cantor and Packer (1996)

Financial background:
What are sovereign credit ratings and why are we interested in them?

Two ratings agencies (Moody's and Standard and Poor's) provide credit ratings for many governments.
Each possible rating is denoted by a grading:
Moody's: Aaa, ..., B3
Standard and Poor's: AAA, ..., B-

Purposes
- to attempt to explain and model how the ratings agencies arrived at
their ratings.

- to use the same factors to explain the spreads of sovereign yields
above a risk-free proxy

- to determine what factors affect how the sovereign yields react to
ratings announcements

Determinants of Sovereign Ratings

Data
Quantifying the ratings (dependent variable): Aaa/AAA=16, ... , B3/B-=1

Explanatory variables (units of measurement):
- Per capita income in 1994 (thousands of dollars)
- Average annual GDP growth 1991-1994 (%)
- Average annual inflation 1992-1994 (%)
- Fiscal balance: Average annual government budget surplus as a
proportion of GDP 1992-1994 (%)
- External balance: Average annual current account surplus as a proportion
of GDP 1992-1994 (%)
- External debt: Foreign currency debt as a proportion of exports 1994 (%)
- Dummy for economic development
- Dummy for default history
Income and inflation are transformed to their logarithms.


The model: Linear and estimated using OLS




Dependent variable: sovereign credit rating

Explanatory Variable   Expected sign   Average Rating       Moody's Rating       S&P Rating           Moody's/S&P Difference
Intercept              ?               1.442 (0.663)        3.408 (1.379)        -0.524 (-0.223)      3.932** (2.521)
Per capita income      +               1.242*** (5.302)     1.027*** (4.041)     1.458*** (6.048)     -0.431*** (-2.688)
GDP growth             +               0.151 (1.935)        0.130 (1.545)        0.171** (2.132)      -0.040 (0.756)
Inflation              -               -0.611*** (-2.839)   -0.630*** (-2.701)   -0.591*** (2.671)    -0.039 (-0.265)
Fiscal Balance         +               0.073 (1.324)        0.049 (0.818)        0.097* (1.71)        -0.048 (-1.274)
External Balance       +               0.003 (0.314)        0.006 (0.535)        0.001 (0.046)        0.006 (0.779)
External Debt          -               -0.013*** (-5.088)   -0.015*** (-5.365)   -0.011*** (-4.236)   -0.004*** (-2.133)
Development dummy      +               2.776*** (4.25)      2.957*** (4.175)     2.595*** (3.861)     0.362 (0.81)
Default dummy          -               -2.042*** (-3.175)   -1.63** (-2.097)     -2.622*** (-3.962)   1.159*** (2.632)
Adjusted R^2                           0.924                0.905                0.926                0.836

Notes: t-ratios in parentheses; *, **, and *** indicate significance at the 10%, 5% and 1% levels respectively. Source: Cantor and Packer (1996). Reprinted with permission from Institutional Investor.
Interpreting the Model

From a statistical perspective
Virtually no diagnostics

Adjusted R
2
is high

Look at the residuals: actual rating - fitted rating

From a financial perspective
Do the coefficients have their expected signs and sizes?

Do Ratings Add to Publicly Available Information?
Now the dependent variable is
- Log (Yield on the sovereign bond - yield on a US treasury bond)

Do Ratings Add to Publicly Available Information? Results

Dependent Variable: Log (yield spread)

Variable            Expected Sign   (1)                   (2)                  (3)
Intercept           ?               2.105*** (16.148)     0.466 (0.345)        0.074 (0.071)
Average Rating      -               -0.221*** (-19.175)                        -0.218*** (-4.276)
Per capita income   -                                     -0.144 (-0.927)      0.226 (1.523)
GDP growth          -                                     -0.004 (-0.142)      0.029 (1.227)
Inflation           +                                     0.108 (1.393)        -0.004 (-0.068)
Fiscal Balance      -                                     -0.037 (-1.557)      -0.02 (-1.045)
External Balance    -                                     -0.038 (-1.29)       -0.023 (-1.008)
External Debt       +                                     0.003*** (2.651)     0.000 (0.095)
Development dummy   -                                     -0.723*** (-2.059)   -0.38 (-1.341)
Default dummy       +                                     0.612*** (2.577)     0.085 (0.385)
Adjusted R^2                        0.919                 0.857                0.914

Notes: t-ratios in parentheses; *, **, and *** indicate significance at the 10%, 5% and 1% levels respectively. Source: Cantor and Packer (1996). Reprinted with permission from Institutional Investor.
What Determines How the Market Reacts
to Ratings Announcements?
The sample: Every announcement of a ratings change that occurred
between 1987 and 1994 - 79 such announcements spread over 18
countries.

39 were actual ratings changes

40 were watchlist / outlook changes

The dependent variable: changes in the relative spreads over the US
T-bond over a 2-day period at the time of the announcement.
What Determines How the Market Reacts
to Ratings Announcements? Explanatory variables.


0 /1 dummies for

- Whether the announcement was positive
- Whether there was an actual ratings change
- Whether the bond was speculative grade
- Whether there had been another ratings announcement in the previous 60 days.

and

- The change in the spread over the previous 60 days.
- The ratings gap between the announcing and the other agency


What Determines How the Market Reacts
to Ratings Announcements? Results


Dependent Variable: Log Relative Spread

Independent variable                                  Coefficient (t-ratio)
Intercept                                             -0.02 (-1.4)
Positive announcements                                0.01 (0.34)
Ratings changes                                       -0.01 (-0.37)
Moody's announcements                                 0.02 (1.51)
Speculative grade                                     0.03** (2.33)
Change in relative spreads from day -60 to day -1     -0.06 (-1.1)
Rating gap                                            0.03* (1.7)
Other rating announcements from day -60 to day -1     0.05** (2.15)
Adjusted R^2                                          0.12

Note: * and ** denote significance at the 10% and 5% levels respectively. Source: Cantor and Packer (1996). Reprinted with permission from Institutional Investor.
Conclusions

6 factors appear to play a big role in determining sovereign credit
ratings - incomes, GDP growth, inflation, external debt, industrialised
or not, and default history.

The ratings provide more information on yields than all of the macro
factors put together.

We cannot determine well what factors influence how the markets will
react to ratings announcements.


Comments on the Paper

Only 49 observations for first set of regressions and 35 for yield
regressions and up to 10 regressors

No attempt at reparameterisation

Little attempt at diagnostic checking

Where did the factors (explanatory variables) come from?



Example:
Simple Regression Results



Multiple Regression Results



Check the size and significance level of the
coefficients, the F-value, the R-Square, etc. You
will see what the net of effects are.

                   Coefficients    Standard Error   t Stat
Intercept (b0)     165.0333581     16.50316094      10.000106
Lotsize (b1)       6.931792143     2.203156234      3.1463008
F-Value 9.89
Adjusted R Square 0.108
Standard Error 36.34
Coefficients Standard Error t Stat
Intercept 59.32299284 20.20765695 2.935669
Lotsize 3.580936283 1.794731507 1.995249
Rooms 18.25064446 2.681400117 6.806386
F-Value 31.23
Adjusted R Square 0.453
Standard Error 28.47
Using the Equation to Make Predictions
Predict the appraised value at the average lot size (7.24) and average number of rooms (7.12):
App. Val. = 59.32 + 3.58(7.24) + 18.25(7.12) = 215.18, i.e. $215,180
What is the total effect of a 2,000 sf increase in lot size (2 units) and 2 additional rooms?
Increase in app. value = (3.58)(2) + (18.25)(2) = 43.66, i.e. $43,660
Coefficient of Multiple Determination, $r^2$, and Adjusted $r^2$
Reports the proportion of total variation in Y explained by all X variables taken together (the model):

$r^2_{Y.12 \dots k} = \dfrac{SSR}{SST} = \dfrac{\text{regression sum of squares}}{\text{total sum of squares}}$

Adjusted $r^2$
$r^2$ never decreases when a new X variable is added to the model. This can be a disadvantage when comparing models.
What is the net effect of adding a new variable? We lose a degree of freedom when a new X variable is added. Did the new X variable add enough explanatory power to offset the loss of one degree of freedom?
The adjusted $r^2$ shows the proportion of variation in Y explained by all X variables, adjusted for the number of X variables used:

$r^2_{adj} = 1 - (1 - r^2_{Y.12 \dots k}) \left( \dfrac{n - 1}{n - k - 1} \right)$

(where n = sample size, k = number of independent variables)
It penalizes excessive use of unimportant independent variables.
It is smaller than $r^2$.
It is useful in comparing among models.
Multiple Regression Assumptions
Assumptions:
The errors are normally distributed.
Errors have a constant variance.
The model errors are independent.
Errors (residuals) from the regression model: $e_i = Y_i - \hat{Y}_i$
These residual plots are used in multiple regression:
Residuals vs. $\hat{Y}_i$
Residuals vs. $X_{1i}$
Residuals vs. $X_{2i}$
Residuals vs. time (if time series data)
Two variable model

$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$

[3-D scatter diagram: the fitted regression plane in $(X_1, X_2, Y)$ space, with a sample observation $Y_i$, its fitted value $\hat{Y}_i$ at $(x_{1i}, x_{2i})$, and the residual $e_i = (Y_i - \hat{Y}_i)$.]

The best fit equation, $\hat{Y}$, is found by minimizing the sum of squared errors, $\sum e^2$.
Are Individual Variables Significant?
Use t-tests of individual variable slopes.
Shows if there is a linear relationship between the variable $X_i$ and Y. Hypotheses:
$H_0: \beta_i = 0$ (no linear relationship)
$H_1: \beta_i \neq 0$ (linear relationship does exist between $X_i$ and Y)
Test statistic (with n − k − 1 degrees of freedom):

$t = \dfrac{b_i - 0}{S_{b_i}}$

Confidence interval for the population slope $\beta_i$:

$b_i \pm t_{n-k-1} S_{b_i}$
Is the Overall Model Significant?
F-Test for Overall Significance of the Model
Shows if there is a linear relationship between all of the X variables considered together and Y.
Use the F test statistic. Hypotheses:
$H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$ (no linear relationship)
$H_1$: at least one $\beta_i \neq 0$ (at least one independent variable affects Y)
Test statistic:

$F = \dfrac{MSR}{MSE} = \dfrac{SSR / k}{SSE / (n - k - 1)}$
Testing Portions of the Multiple Regression Model
To find out if inclusion of an individual $X_j$, or a set of X's, significantly improves the model, given that the other independent variables are included in the model.
Two measures:
1. Partial F-test criterion
2. The coefficient of partial determination
Contribution of a Single Independent Variable $X_j$:

$SSR(X_j \mid \text{all variables except } X_j) = SSR(\text{all variables}) - SSR(\text{all variables except } X_j)$

Measures the contribution of $X_j$ in explaining the total variation in Y (SST).
Consider here a 3-variable model:

$SSR(X_1 \mid X_2 \text{ and } X_3) = SSR(X_1, X_2, X_3) - SSR(X_2 \text{ and } X_3)$

(The full model gives $SSR_{UR}$, the unrestricted model; the reduced model gives $SSR_R$, the restricted model.)
The Partial F-Test Statistic
Consider the hypothesis test:
$H_0$: variable $X_j$ does not significantly improve the model after all other variables are included
$H_1$: variable $X_j$ significantly improves the model after all other variables are included

$F = \dfrac{(SSR_{UR} - SSR_R) / (\text{number of restrictions})}{MSE_{UR}}, \qquad MSE_{UR} = \dfrac{SSE_{UR}}{n - k - 1}$

Note that the numerator is the contribution of $X_j$ to the regression.
If the actual F statistic is greater than the critical F, then the conclusion is: reject $H_0$; adding $X_1$ does improve the model.
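A hedged sketch of the partial F-test as a comparison of nested OLS models using statsmodels' anova_lm; the data and variable names are simulated and illustrative:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    rng = np.random.default_rng(9)
    n = 100
    df = pd.DataFrame({"x1": rng.normal(size=n),
                       "x2": rng.normal(size=n),
                       "x3": rng.normal(size=n)})
    df["y"] = 1.0 + 2.0 * df.x1 + 0.5 * df.x2 + rng.normal(size=n)

    restricted = smf.ols("y ~ x2 + x3", data=df).fit()         # model without x1
    unrestricted = smf.ols("y ~ x1 + x2 + x3", data=df).fit()  # model with x1
    print(anova_lm(restricted, unrestricted))                  # partial F-test for adding x1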
Coefficient of Partial Determination for one or a set of variables
Measures the proportion of total variation in the dependent variable (SST) that is explained by $X_j$ while controlling for (holding constant) the other explanatory variables:

$r^2_{Yj.(\text{all variables except } j)} = \dfrac{SSR_{UR} - SSR_R}{SST - SSR_R}$
Too complicated to do by hand!
Why and how does it work?