
COMPLETE BUSINESS STATISTICS
by
AMIR D. ACZEL
&
JAYAVEL SOUNDERPANDIAN
6th edition

Chapter 11

Multiple Regression

## 11 Multiple Regression (1)

• Using Statistics
• The k-Variable Multiple Regression Model
• The F Test of a Multiple Regression Model
• How Good is the Regression
• Tests of the Significance of Individual Regression Parameters
• Testing the Validity of the Regression Model
• Using the Multiple Regression Model for Prediction

## 11 Multiple Regression (2)

• Qualitative Independent Variables
• Polynomial Regression
• Nonlinear Models and Transformations
• Multicollinearity
• Residual Autocorrelation and the Durbin-Watson Test
• Partial F Tests and Variable Selection Methods
• Multiple Regression Using the Solver
• The Matrix Approach to Multiple Regression Analysis

## 11 LEARNING OBJECTIVES (1)

After studying this chapter you should be able to:
• Determine whether multiple regression would be applicable to a given instance
• Formulate a multiple regression model
• Carry out a multiple regression using a spreadsheet template
• Test the validity of a multiple regression by analyzing residuals
• Carry out hypothesis tests about the regression coefficients
• Compute a prediction interval for the dependent variable

## 11 LEARNING OBJECTIVES (2)

After studying this chapter you should be able to:
• Use indicator variables in a multiple regression
• Carry out a polynomial regression
• Conduct a Durbin-Watson test for autocorrelation in residuals
• Conduct a partial F test
• Determine which independent variables are to be included in a multiple regression model
• Solve multiple regression problems using the Solver macro

## 11-1 Using Statistics

[Figure: a line in two dimensions, with intercept β0 and slope β1, and a plane in three dimensions over the (x1, x2) axes]

Any two points (A and B), or an intercept and slope (β0 and β1), define a line on a two-dimensional surface. Any three points (A, B, and C), or an intercept and the coefficients of x1 and x2 (β0, β1, and β2), define a plane in a three-dimensional surface.

## 11-2 The k-Variable Multiple Regression Model

The population regression model of a dependent variable, Y, on a set of k independent variables, X1, X2, . . . , Xk, is given by:

Y = β0 + β1X1 + β2X2 + . . . + βkXk + ε

where β0 is the Y-intercept of the regression surface and each βi, i = 1, 2, ..., k, is the slope of the regression surface (sometimes called the response surface) with respect to Xi.

[Figure: the regression surface y = β0 + β1x1 + β2x2 + ε over the (x1, x2) plane]

Model assumptions:
1. ε ~ N(0, σ²), independent of other errors.
2. The variables Xi are uncorrelated with the error term.

## Simple and Multiple Least-Squares Regression

[Figure: a fitted line y = b0 + b1x in two dimensions, and a fitted plane y = b0 + b1x1 + b2x2 in three dimensions]

In a simple regression model, the least-squares estimators minimize the sum of squared errors from the estimated regression line. In a multiple regression model, the least-squares estimators minimize the sum of squared errors from the estimated regression plane.

## The Estimated Regression Relationship

The estimated regression relationship:

Ŷ = b0 + b1X1 + b2X2 + . . . + bkXk

where Ŷ is the predicted value of Y, the value lying on the estimated regression surface. The terms bi, for i = 0, 1, ..., k, are the least-squares estimates of the population regression parameters βi.

The actual, observed value of Y is the predicted value plus an error:

yj = b0 + b1x1j + b2x2j + . . . + bkxkj + ej,  j = 1, ..., n.

## Least-Squares Estimation: The 2-Variable Normal Equations

Minimizing the sum of squared errors with respect to the estimated coefficients b0, b1, and b2 yields the following normal equations, which can be solved for b0, b1, and b2:

Σy = nb0 + b1Σx1 + b2Σx2

Σx1y = b0Σx1 + b1Σx1² + b2Σx1x2

Σx2y = b0Σx2 + b1Σx1x2 + b2Σx2²

## Example 11-1

| Y | X1 | X2 | X1X2 | X1² | X2² | X1Y | X2Y |
|---:|---:|---:|---:|---:|---:|---:|---:|
| 72 | 12 | 5 | 60 | 144 | 25 | 864 | 360 |
| 76 | 11 | 8 | 88 | 121 | 64 | 836 | 608 |
| 78 | 15 | 6 | 90 | 225 | 36 | 1170 | 468 |
| 70 | 10 | 5 | 50 | 100 | 25 | 700 | 350 |
| 68 | 11 | 3 | 33 | 121 | 9 | 748 | 204 |
| 80 | 16 | 9 | 144 | 256 | 81 | 1280 | 720 |
| 82 | 14 | 12 | 168 | 196 | 144 | 1148 | 984 |
| 65 | 8 | 4 | 32 | 64 | 16 | 520 | 260 |
| 62 | 8 | 3 | 24 | 64 | 9 | 496 | 186 |
| 90 | 18 | 10 | 180 | 324 | 100 | 1620 | 900 |
| **743** | **123** | **65** | **869** | **1615** | **509** | **9382** | **5040** |

Normal equations:

743 = 10b0 + 123b1 + 65b2
9382 = 123b0 + 1615b1 + 869b2
5040 = 65b0 + 869b1 + 509b2

Solution:

b0 = 47.164942
b1 = 1.5990404
b2 = 1.1487479

Estimated regression equation:

Ŷ = 47.164942 + 1.5990404 X1 + 1.1487479 X2
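The solution above can be reproduced numerically. A minimal sketch in Python with NumPy (not the text's spreadsheet template): the three normal equations are (X′X)b = X′y, where X carries a leading column of ones for the intercept.

```python
import numpy as np

# Data from Example 11-1 (y regressed on two predictors x1 and x2).
x1 = np.array([12, 11, 15, 10, 11, 16, 14, 8, 8, 18], dtype=float)
x2 = np.array([5, 8, 6, 5, 3, 9, 12, 4, 3, 10], dtype=float)
y = np.array([72, 76, 78, 70, 68, 80, 82, 65, 62, 90], dtype=float)

# The normal equations in matrix form: (X'X) b = X'y.
X = np.column_stack([np.ones_like(x1), x1, x2])
b0, b1, b2 = np.linalg.solve(X.T @ X, X.T @ y)
print(b0, b1, b2)  # approximately 47.164942, 1.5990404, 1.1487479
```

The same estimates come out of `np.linalg.lstsq(X, y)`, which is numerically more stable for larger problems.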

## Decomposition of the Total Deviation in a Multiple Regression Model

[Figure: for a point above the regression plane, the total deviation Y − Ȳ splits into the regression deviation Ŷ − Ȳ and the error deviation Y − Ŷ]

Total Deviation = Regression Deviation + Error Deviation
SST = SSR + SSE

## 11-3 The F Test of a Multiple Regression Model

A statistical test for the existence of a linear relationship between Y and any or all of the independent variables X1, X2, ..., Xk:

H0: β1 = β2 = ... = βk = 0
H1: Not all the βi (i = 1, 2, ..., k) are equal to 0

| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F Ratio |
|---|---|---|---|---|
| Regression | SSR | k | MSR = SSR/k | F = MSR/MSE |
| Error | SSE | n − (k + 1) | MSE = SSE/(n − (k + 1)) | |
| Total | SST | n − 1 | MST = SST/(n − 1) | |

## Using the Template: Analysis of Variance Table (Example 11-1)

[Figure: F distribution with 2 and 7 degrees of freedom; the test statistic F = 86.34 lies far beyond the critical point F(0.01) = 9.55]

The test statistic, F = 86.34, is greater than the critical point of F(2, 7) for any common level of significance (p-value ≈ 0), so the null hypothesis is rejected, and we might conclude that the dependent variable is related to one or more of the independent variables.

## 11-4 How Good is the Regression

The mean square error is an unbiased estimator of the variance of the population errors ε, denoted by σ²:

MSE = SSE / (n − (k + 1)) = Σ(y − ŷ)² / (n − (k + 1))

Standard error of estimate:

s = √MSE

The multiple coefficient of determination, R², measures the proportion of the variation in the dependent variable that is explained by the combination of the independent variables in the multiple regression model:

R² = SSR/SST = 1 − SSE/SST

SST = SSR + SSE, and R² = SSR/SST = 1 − SSE/SST.

The adjusted multiple coefficient of determination, R²(adj), is the coefficient of determination with the SSE and SST divided by their respective degrees of freedom:

R²(adj) = 1 − [SSE/(n − (k + 1))] / [SST/(n − 1)]

Example 11-1: s = 1.911, R-sq = 96.1%
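These fit statistics can be checked directly against the Example 11-1 data. A sketch (using NumPy; the data are as tabulated earlier), reproducing the quoted s = 1.911 and R-sq = 96.1%:

```python
import numpy as np

# Example 11-1: n = 10 observations, k = 2 predictors.
x1 = np.array([12, 11, 15, 10, 11, 16, 14, 8, 8, 18], dtype=float)
x2 = np.array([5, 8, 6, 5, 3, 9, 12, 4, 3, 10], dtype=float)
y = np.array([72, 76, 78, 70, 68, 80, 82, 65, 62, 90], dtype=float)

n, k = len(y), 2
X = np.column_stack([np.ones(n), x1, x2])
b = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ b

sse = np.sum((y - y_hat) ** 2)
sst = np.sum((y - y.mean()) ** 2)
mse = sse / (n - (k + 1))        # unbiased estimator of the error variance
s = np.sqrt(mse)                 # standard error of estimate
r_sq = 1 - sse / sst             # multiple coefficient of determination
adj_r_sq = 1 - (sse / (n - (k + 1))) / (sst / (n - 1))
f = (r_sq / (1 - r_sq)) * ((n - (k + 1)) / k)   # equivalent to MSR/MSE

print(s, r_sq, adj_r_sq, f)  # s ≈ 1.911, R-sq ≈ 96.1%, F ≈ 86.3
```

The last line also verifies the identity between F and R² shown in the ANOVA summary that follows.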

## Measures of Performance in Multiple Regression and the ANOVA Table

| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F Ratio |
|---|---|---|---|---|
| Regression | SSR | k | MSR = SSR/k | F = MSR/MSE |
| Error | SSE | n − (k + 1) = n − k − 1 | MSE = SSE/(n − (k + 1)) | |
| Total | SST | n − 1 | MST = SST/(n − 1) | |

R² = SSR/SST = 1 − SSE/SST

F = [R² / (1 − R²)] · [(n − (k + 1)) / k]

R²(adj) = 1 − [SSE/(n − (k + 1))] / [SST/(n − 1)] = 1 − MSE/MST

## 11-5 Tests of the Significance of Individual Regression Parameters

Hypothesis tests about individual regression slope parameters:

(1) H0: β1 = 0,  H1: β1 ≠ 0
(2) H0: β2 = 0,  H1: β2 ≠ 0
...
(k) H0: βk = 0,  H1: βk ≠ 0

Test statistic for test i:

t(n − (k + 1)) = (bi − 0) / s(bi)
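Each t statistic divides the estimate bi by its standard error s(bi), the square root of the corresponding diagonal element of MSE·(X′X)⁻¹; that formula is standard least-squares theory rather than something spelled out in the slides. A sketch for Example 11-1:

```python
import numpy as np

# t statistics for the individual coefficients in Example 11-1.
x1 = np.array([12, 11, 15, 10, 11, 16, 14, 8, 8, 18], dtype=float)
x2 = np.array([5, 8, 6, 5, 3, 9, 12, 4, 3, 10], dtype=float)
y = np.array([72, 76, 78, 70, 68, 80, 82, 65, 62, 90], dtype=float)

n, k = len(y), 2
X = np.column_stack([np.ones(n), x1, x2])
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
mse = np.sum((y - X @ b) ** 2) / (n - (k + 1))

# s(b_i) is the square root of the i-th diagonal element of MSE * (X'X)^-1.
se = np.sqrt(mse * np.diag(XtX_inv))
t = b / se          # test statistic for H0: beta_i = 0
print(np.round(t, 3))
```

Each t is compared against t(α/2) with n − (k + 1) = 7 degrees of freedom.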

## Regression Results for Individual Parameters (Interpret the Table)

| Variable | Coefficient Estimate | Standard Error | t-Statistic |
|---|---:|---:|---:|
| Constant | 53.12 | 5.43 | 9.783* |
| X1 | 2.03 | 0.22 | 9.227* |
| X2 | 5.60 | 1.30 | 4.308* |
| X3 | 10.35 | 6.88 | 1.504 |
| X4 | 3.45 | 2.70 | 1.259 |
| X5 | −4.25 | 0.38 | −11.184* |

n = 150, t(0.025) = 1.96. Coefficients whose t-statistic exceeds 1.96 in absolute value (marked *) are significantly different from zero at the 0.05 level.


## 11-6 Testing the Validity of the Regression Model: Residual Plots

[Figure: residuals vs M1 (Example 11-2)]

It appears that the residuals are randomly distributed, with no pattern and with equal variance, as M1 increases.

## 11-6 Testing the Validity of the Regression Model: Residual Plots

[Figure: residuals vs Price (Example 11-2)]

It appears that the residuals increase as Price increases: the variance of the residuals is not constant.

## Normal Probability Plot for the Residuals: Example 11-2

[Figure: normal probability plot of the residuals]

A linear trend indicates that the residuals are normally distributed.

## Investigating the Validity of the Regression: Outliers and Influential Observations

[Figure, left panel (Outliers): an outlier pulls the regression line away from the regression line fitted without it. Right panel (Influential Observations): a point with a large value of xi determines the regression line fitted to all the data, even though there is no relationship in the remaining cluster.]

## Possible Relation in the Region between the Available Cluster of Data and the Far Point

[Figure: the original cluster of data, a point with a large value of xi, and some of the possible data between them. A more appropriate curvilinear relationship is seen when the in-between data are known.]

## Outliers and Influential Observations: Example 11-2

Unusual Observations

| Obs. | M1 | EXPORTS | Fit | Stdev.Fit | Residual | St.Resid |
|---:|---:|---:|---:|---:|---:|---:|
| 1 | 5.10 | 2.6000 | 2.6420 | 0.1288 | −0.0420 | −0.14 X |
| 2 | 4.90 | 2.6000 | 2.6438 | 0.1234 | −0.0438 | −0.14 X |
| 25 | 6.20 | 5.5000 | 4.5949 | 0.0676 | 0.9051 | 2.80R |
| 26 | 6.30 | 3.7000 | 4.6311 | 0.0651 | −0.9311 | −2.87R |
| 50 | 8.30 | 4.3000 | 5.1317 | 0.0648 | −0.8317 | −2.57R |
| 67 | 8.20 | 5.6000 | 4.9474 | 0.0668 | 0.6526 | 2.02R |

R denotes an obs. with a large st. resid.
X denotes an obs. whose X value gives it large influence.

## 11-7 Using the Multiple Regression Model for Prediction

[Figure: the estimated regression plane for Example 11-1, with Sales plotted over the two independent variables (Promotions on one axis)]

A (1 − α)100% prediction interval for a value of Y given values of Xi:

ŷ ± t(α/2, n − (k + 1)) √(s²(ŷ) + MSE)

A (1 − α)100% prediction interval for the conditional mean of Y given values of Xi:

ŷ ± t(α/2, n − (k + 1)) s[Ê(Y)]
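A sketch of the first interval for the Example 11-1 fit; the new point (x1, x2) = (10, 5) and the critical value t(0.025, 7) = 2.365 (from tables) are illustrative assumptions, not values given in the text:

```python
import numpy as np

# 95% prediction interval for Y at a new point, Example 11-1 fit.
x1 = np.array([12, 11, 15, 10, 11, 16, 14, 8, 8, 18], dtype=float)
x2 = np.array([5, 8, 6, 5, 3, 9, 12, 4, 3, 10], dtype=float)
y = np.array([72, 76, 78, 70, 68, 80, 82, 65, 62, 90], dtype=float)

n, k = len(y), 2
X = np.column_stack([np.ones(n), x1, x2])
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
mse = np.sum((y - X @ b) ** 2) / (n - (k + 1))

x0 = np.array([1.0, 10.0, 5.0])       # new point (with intercept term)
y_hat = x0 @ b                        # point prediction, about 68.9
s2_yhat = mse * (x0 @ XtX_inv @ x0)   # variance of the estimated mean at x0
t_crit = 2.365                        # t(0.025, 7), assumed from tables

half_width = t_crit * np.sqrt(s2_yhat + mse)
print(y_hat - half_width, y_hat + half_width)
```

Dropping the `+ mse` term inside the square root gives the narrower interval for the conditional mean E(Y), as in the second formula above.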

## 11-8 Qualitative (or Categorical) Independent Variables (in Regression)

An indicator (dummy, binary) variable of qualitative level A:

Xh = 1 if level A is obtained, 0 if level A is not obtained

Example 11-3:

| MOVIE | EARN | COST | PROM | BOOK |
|---:|---:|---:|---:|---:|
| 1 | 28 | 4.2 | 1.0 | 0 |
| 2 | 35 | 6.0 | 3.0 | 1 |
| 3 | 50 | 5.5 | 6.0 | 1 |
| 4 | 20 | 3.3 | 1.0 | 0 |
| 5 | 75 | 12.5 | 11.0 | 1 |
| 6 | 60 | 9.6 | 8.0 | 1 |
| 7 | 15 | 2.5 | 0.5 | 0 |
| 8 | 45 | 10.8 | 5.0 | 0 |
| 9 | 50 | 8.4 | 3.0 | 1 |
| 10 | 34 | 6.6 | 2.0 | 0 |
| 11 | 48 | 10.7 | 1.0 | 1 |
| 12 | 82 | 11.0 | 15.0 | 1 |
| 13 | 24 | 3.5 | 4.0 | 0 |
| 14 | 50 | 6.9 | 10.0 | 0 |
| 15 | 58 | 7.8 | 9.0 | 1 |
| 16 | 63 | 10.1 | 10.0 | 0 |
| 17 | 30 | 5.0 | 1.0 | 1 |
| 18 | 37 | 7.5 | 5.0 | 0 |
| 19 | 45 | 6.4 | 8.0 | 1 |
| 20 | 72 | 10.0 | 12.0 | 1 |
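With the indicator coded 0/1 as above, the Example 11-3 regression of earnings on cost, promotion, and the BOOK dummy is ordinary least squares like any other; a sketch (the interpretation of the fitted coefficients is mine, not quoted from the text):

```python
import numpy as np

# Example 11-3: movie earnings on production cost, promotion cost,
# and a 0/1 indicator of whether the movie is based on a book.
cost = np.array([4.2, 6.0, 5.5, 3.3, 12.5, 9.6, 2.5, 10.8, 8.4, 6.6,
                 10.7, 11.0, 3.5, 6.9, 7.8, 10.1, 5.0, 7.5, 6.4, 10.0])
prom = np.array([1.0, 3.0, 6.0, 1.0, 11.0, 8.0, 0.5, 5.0, 3.0, 2.0,
                 1.0, 15.0, 4.0, 10.0, 9.0, 10.0, 1.0, 5.0, 8.0, 12.0])
book = np.array([0, 1, 1, 0, 1, 1, 0, 0, 1, 0,
                 1, 1, 0, 0, 1, 0, 1, 0, 1, 1], dtype=float)
earn = np.array([28, 35, 50, 20, 75, 60, 15, 45, 50, 34,
                 48, 82, 24, 50, 58, 63, 30, 37, 45, 72], dtype=float)

# The indicator enters the model exactly like any other regressor;
# its coefficient shifts the intercept for book-based movies.
X = np.column_stack([np.ones_like(earn), cost, prom, book])
b = np.linalg.lstsq(X, earn, rcond=None)[0]
print(b)  # [intercept, cost slope, promotion slope, book shift]
```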

## Picturing Qualitative Variables in Regression

[Figure, left: two parallel lines, one for X2 = 0 with intercept b0 and one for X2 = 1 with intercept b0 + b2. Right: two parallel planes shifted by b3.]

A regression with one quantitative variable (X1) and one qualitative variable (X2):

y = b0 + b1x1 + b2x2

A multiple regression with two quantitative variables (X1 and X2) and one qualitative variable (X3):

y = b0 + b1x1 + b2x2 + b3x3

## Picturing Qualitative Variables in Regression: Three Categories and Two Dummy Variables

[Figure: three parallel lines, for (X2, X3) = (0, 0) with intercept b0, for (X2, X3) = (1, 0) with intercept b0 + b2, and for (X2, X3) = (0, 1) with intercept b0 + b3]

A qualitative variable with r levels or categories is represented with (r − 1) 0/1 (dummy) variables.

A regression with one quantitative variable (X1) and two qualitative variables (X2 and X3):

y = b0 + b1x1 + b2x2 + b3x3

| Category | X2 | X3 |
|---|---:|---:|
| Drama | 0 | 1 |
| Romance | 1 | 0 |

## Using Qualitative Variables in Regression: Example 11-4

Salary = 8547 + 949 Education + 1258 Experience − 3256 Gender

| Term | Coefficient | (SE) | (t) |
|---|---:|---:|---:|
| Intercept | 8547 | 32.6 | 262.2 |
| Education | 949 | 45.1 | 21.0 |
| Experience | 1258 | 78.5 | 16.0 |
| Gender | −3256 | 212.4 | −15.3 |

Gender = 1 if Female, 0 if Male.

On average, female salaries are $3256 below male salaries.

## Interactions between Quantitative and Qualitative Variables: Shifting Slopes

[Figure: the line for X2 = 0 has intercept b0 and slope b1; the line for X2 = 1 has intercept b0 + b2 and slope b1 + b3]

A regression with interaction between a quantitative variable (X1) and a qualitative variable (X2):

y = b0 + b1x1 + b2x2 + b3x1x2

## 11-9 Polynomial Regression

One-variable polynomial regression model:

Y = β0 + β1X + β2X² + β3X³ + . . . + βmXᵐ + ε

where m is the degree of the polynomial, the highest power of X appearing in the equation. The degree of the polynomial is the order of the model.

[Figure: a first-order (straight-line) fit y = b0 + b1X; a second-order fit y = b0 + b1X + b2X² with b2 < 0; and a third-order fit y = b0 + b1X + b2X² + b3X³]
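A polynomial model is still linear in its coefficients, so it can be fit by ordinary least squares on powers of X. A sketch on simulated data (the data are an assumption for illustration, not from the text):

```python
import numpy as np

# Simulated quadratic trend with noise (illustrative data).
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 + 1.5 * x - 0.3 * x**2 + rng.normal(0, 0.5, x.size)

# Second-order (quadratic) model: Y = b0 + b1*X + b2*X^2 + e.
# The design matrix simply adds a column for each power of X.
X = np.column_stack([np.ones_like(x), x, x**2])
b = np.linalg.lstsq(X, y, rcond=None)[0]
print(b)  # estimates should be near (2.0, 1.5, -0.3)
```

Higher-order terms are added the same way, at the cost of increasing collinearity between the power columns.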

## Polynomial Regression: Other Variables and Cross-Product Terms

| Variable | Estimate | Standard Error | T-statistic |
|---|---:|---:|---:|
| X1 | 2.34 | 0.92 | 2.54 |
| X2 | 3.11 | 1.05 | 2.96 |
| X1² | 4.22 | 1.00 | 4.22 |
| X2² | 3.57 | 2.12 | 1.68 |
| X1X2 | 2.77 | 2.30 | 1.20 |

## 11-10 Nonlinear Models and Transformations

The multiplicative model:

Y = β0 X1^β1 X2^β2 X3^β3 ε

The logarithmic transformation:

log Y = log β0 + β1 log X1 + β2 log X2 + β3 log X3 + log ε

The exponential model:

Y = β0 e^(β1X) ε

The logarithmic transformation:

log Y = log β0 + β1X + log ε
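A sketch of the exponential-model transformation on simulated data (the data and parameter values are assumptions, not from the text): take logs, fit a straight line by ordinary least squares, then transform the intercept back.

```python
import numpy as np

# Simulated exponential model Y = beta0 * exp(beta1 * X) * eps,
# with multiplicative lognormal errors (illustrative data).
rng = np.random.default_rng(1)
x = np.linspace(1, 5, 40)
y = 3.0 * np.exp(0.8 * x) * np.exp(rng.normal(0, 0.1, x.size))

# Log transformation gives log Y = log beta0 + beta1 * X + log eps,
# which is linear in the parameters.
X = np.column_stack([np.ones_like(x), x])
a, b1 = np.linalg.lstsq(X, np.log(y), rcond=None)[0]
beta0, beta1 = np.exp(a), b1
print(beta0, beta1)  # estimates should be near (3.0, 0.8)
```

The same trick handles the multiplicative model: regress log Y on the logs of the predictors.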

## Plots of Transformed Variables

[Figure: four panels. Simple regression of Sales on Advertising: Y = 6.59271 + 1.19176X, R-squared = 0.895. Regression of Sales on Log(Advertising): Y = 3.66825 + 6.784X, R-squared = 0.978. Regression on the log scale (LOGSALE): Y = 1.70082 + 0.553136X, R-squared = 0.947. A plot of the residuals (RESIDS).]

## Variance Stabilizing Transformations

• Square root transformation: Y′ = √Y
  Useful when the variance of the regression errors is approximately proportional to the conditional mean of Y
• Logarithmic transformation: Y′ = log(Y)
  Useful when the variance of the regression errors is approximately proportional to the square of the conditional mean of Y
• Reciprocal transformation: Y′ = 1/Y
  Useful when the variance of the regression errors is approximately proportional to the fourth power of the conditional mean of Y

## Regression with Dependent Indicator Variables

The logistic function:

E(Y|X) = e^(β0 + β1X) / (1 + e^(β0 + β1X))

Transformation to linearize the logistic function:

p′ = log[p / (1 − p)]

[Figure: the S-shaped logistic function, rising from 0 to 1 as x increases]
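A quick check that the logit transformation linearizes the logistic function; the parameter values are illustrative assumptions:

```python
import numpy as np

# If p = e^(b0 + b1*x) / (1 + e^(b0 + b1*x)), then
# log(p / (1 - p)) = b0 + b1*x exactly.
def logistic(z):
    return np.exp(z) / (1 + np.exp(z))

def logit(p):
    return np.log(p / (1 - p))

b0, b1 = -2.0, 0.5          # illustrative parameters (an assumption)
x = np.array([0.0, 2.0, 4.0, 6.0, 8.0])
p = logistic(b0 + b1 * x)
print(logit(p))             # recovers the linear predictor b0 + b1*x
```

In practice the transformation is applied to observed proportions p, and the linearized model is then fit by least squares (or, more commonly today, by maximum-likelihood logistic regression).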

## 11-11 Multicollinearity

[Figure: four diagrams of the information content of two predictors x1 and x2]

Orthogonal X variables provide information from independent sources: no multicollinearity. Perfectly collinear X variables provide identical information content: no regression is possible. With some degree of collinearity, problems with regression depend on the degree of collinearity. A high degree of negative collinearity also causes problems with regression.

## Effects of Multicollinearity

• Variances of regression coefficients are inflated.
• Magnitudes of regression coefficients may be different from what is expected.
• Signs of regression coefficients may not be as expected.
• Adding or removing variables produces large changes in coefficients.
• Removing a data point may cause large changes in coefficient estimates or signs.
• In some cases, the F ratio may be significant while the t ratios are not.

## Detecting the Existence of Multicollinearity: Correlation Matrix of Independent Variables and Variance Inflation Factors

The variance inflation factor associated with Xh:

VIF(Xh) = 1 / (1 − Rh²)

where Rh² is the R² value obtained for the regression of Xh on the other independent variables.

[Figure: VIF as a function of Rh², rising sharply toward infinity as Rh² approaches 1.0]
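A sketch of computing VIFs by running the auxiliary regressions directly, on simulated data with two deliberately collinear predictors (an assumption, not from the text):

```python
import numpy as np

# Simulated predictors: x2 is strongly collinear with x1, x3 is not.
rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.2 * rng.normal(size=n)   # nearly a copy of x1
x3 = rng.normal(size=n)                     # independent of the others

def vif(target, others):
    # R^2 from regressing `target` on `others` (plus an intercept),
    # then VIF = 1 / (1 - R^2).
    X = np.column_stack([np.ones(len(target))] + others)
    resid = target - X @ np.linalg.lstsq(X, target, rcond=None)[0]
    r_sq = 1 - resid.var() / target.var()
    return 1 / (1 - r_sq)

print(vif(x1, [x2, x3]), vif(x3, [x1, x2]))  # large for x1, near 1 for x3
```

A VIF above roughly 5 (some texts use 10) flags a variable whose information is largely duplicated by the others.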

Observation: The VIF (Variance Inflation Factor) values for the variables Lend and Price are both greater than 5. This indicates that some degree of multicollinearity exists with respect to these two variables.

## Solutions to the Multicollinearity Problem

• Drop a collinear variable from the regression
• Change the sampling plan to include elements outside the multicollinearity range
• Transformations of variables
• Ridge regression

## 11-12 Residual Autocorrelation and the Durbin-Watson Test

An autocorrelation is a correlation of the values of a variable with values of the same variable lagged one or more periods back. Consequences of autocorrelation include inaccurate estimates of variances and inaccurate predictions.

The Durbin-Watson test (first-order autocorrelation):

H0: ρ1 = 0
H1: ρ1 ≠ 0

The Durbin-Watson test statistic:

d = [Σ from i = 2 to n of (ei − ei−1)²] / [Σ from i = 1 to n of ei²]

Lagged residuals:

| i | ei | ei−1 | ei−2 | ei−3 | ei−4 |
|---:|---:|---:|---:|---:|---:|
| 1 | 1.0 | * | * | * | * |
| 2 | 0.0 | 1.0 | * | * | * |
| 3 | −1.0 | 0.0 | 1.0 | * | * |
| 4 | 2.0 | −1.0 | 0.0 | 1.0 | * |
| 5 | 3.0 | 2.0 | −1.0 | 0.0 | 1.0 |
| 6 | −2.0 | 3.0 | 2.0 | −1.0 | 0.0 |
| 7 | 1.0 | −2.0 | 3.0 | 2.0 | −1.0 |
| 8 | 1.5 | 1.0 | −2.0 | 3.0 | 2.0 |
| 9 | 1.0 | 1.5 | 1.0 | −2.0 | 3.0 |
| 10 | −2.5 | 1.0 | 1.5 | 1.0 | −2.0 |
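The statistic d for the ten residuals in the table can be computed directly; a sketch:

```python
import numpy as np

# Durbin-Watson statistic for the residuals tabulated above.
e = np.array([1.0, 0.0, -1.0, 2.0, 3.0, -2.0, 1.0, 1.5, 1.0, -2.5])

# np.diff(e) gives the successive differences e_i - e_{i-1}, i = 2..n.
d = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
print(round(d, 2))  # 1.99
```

Values of d near 2 are consistent with no first-order autocorrelation; the decision regions use the dL and dU critical points tabulated below.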

## Critical Points of the Durbin-Watson Statistic: α = 0.05, n = Sample Size, k = Number of Independent Variables

| n | dL (k=1) | dU (k=1) | dL (k=2) | dU (k=2) | dL (k=3) | dU (k=3) | dL (k=4) | dU (k=4) | dL (k=5) | dU (k=5) |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 15 | 1.08 | 1.36 | 0.95 | 1.54 | 0.82 | 1.75 | 0.69 | 1.97 | 0.56 | 2.21 |
| 16 | 1.10 | 1.37 | 0.98 | 1.54 | 0.86 | 1.73 | 0.74 | 1.93 | 0.62 | 2.15 |
| 17 | 1.13 | 1.38 | 1.02 | 1.54 | 0.90 | 1.71 | 0.78 | 1.90 | 0.67 | 2.10 |
| 18 | 1.16 | 1.39 | 1.05 | 1.53 | 0.93 | 1.69 | 0.82 | 1.87 | 0.71 | 2.06 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 65 | 1.57 | 1.63 | 1.54 | 1.66 | 1.50 | 1.70 | 1.47 | 1.73 | 1.44 | 1.77 |
| 70 | 1.58 | 1.64 | 1.55 | 1.67 | 1.52 | 1.70 | 1.49 | 1.74 | 1.46 | 1.77 |
| 75 | 1.60 | 1.65 | 1.57 | 1.68 | 1.54 | 1.71 | 1.51 | 1.74 | 1.49 | 1.77 |
| 80 | 1.61 | 1.66 | 1.59 | 1.69 | 1.56 | 1.72 | 1.53 | 1.74 | 1.51 | 1.77 |
| 85 | 1.62 | 1.67 | 1.60 | 1.70 | 1.57 | 1.72 | 1.55 | 1.75 | 1.52 | 1.77 |
| 90 | 1.63 | 1.68 | 1.61 | 1.70 | 1.59 | 1.73 | 1.57 | 1.75 | 1.54 | 1.78 |
| 95 | 1.64 | 1.69 | 1.62 | 1.71 | 1.60 | 1.73 | 1.58 | 1.75 | 1.56 | 1.78 |
| 100 | 1.65 | 1.69 | 1.63 | 1.72 | 1.61 | 1.74 | 1.59 | 1.76 | 1.57 | 1.78 |

[Diagram: decision regions of the Durbin-Watson statistic on the interval from 0 to 4. From 0 to dL: Positive Autocorrelation. From dL to dU: Test is Inconclusive. From dU to 4 − dU: No Autocorrelation. From 4 − dU to 4 − dL: Test is Inconclusive. From 4 − dL to 4: Negative Autocorrelation.]

For n = 67, k = 4:  dU ≈ 1.73, so 4 − dU ≈ 2.27;  dL ≈ 1.47, so 4 − dL ≈ 2.53 < 2.58.

H0 is rejected, and we conclude there is negative first-order autocorrelation.

## 11-13 Partial F Tests and Variable Selection Methods

Full model:

Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + ε

Reduced model:

Y = β0 + β1X1 + β2X2 + ε

Partial F test:

H0: β3 = β4 = 0
H1: β3 and β4 not both 0

Partial F statistic:

F(r, n − (k + 1)) = [(SSE_R − SSE_F) / r] / MSE_F

where SSE_R is the sum of squared errors of the reduced model, SSE_F is the sum of squared errors of the full model; MSE_F is the mean square error of the full model [MSE_F = SSE_F / (n − (k + 1))]; r is the number of variables dropped from the full model.
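A sketch of the partial F computation on simulated data (the data, sample size, and choice of dropped variables are assumptions, not from the text): fit the full and reduced models, then compare their error sums of squares.

```python
import numpy as np

# Full model: intercept + 4 predictors; reduced model keeps only the
# first 2 predictors, so r = 2 variables are dropped.
rng = np.random.default_rng(3)
n, k, r = 100, 4, 2
X_full = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
# True coefficients put zero weight on the dropped predictors.
y = X_full @ np.array([1.0, 2.0, -1.0, 0.0, 0.0]) + rng.normal(size=n)

def sse(X, y):
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    return np.sum((y - X @ b) ** 2)

sse_f = sse(X_full, y)
sse_r = sse(X_full[:, :3], y)            # intercept + first two predictors
mse_f = sse_f / (n - (k + 1))
partial_f = ((sse_r - sse_f) / r) / mse_f
print(partial_f)  # compare with the F(r, n - (k + 1)) critical value
```

Since the dropped predictors truly contribute nothing here, the partial F should typically fall below the critical value, and the reduced model would be retained.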

## Variable Selection Methods

• All possible regressions
  Run regressions with all possible combinations of independent variables and select the best model

A p-value of 0.001 indicates that we should reject the null hypothesis H0: the slopes for Lend and Exch. are zero.

## Variable Selection Methods

• Stepwise procedures
  Forward selection
  • Add one variable at a time to the model, on the basis of its F statistic
  Backward elimination
  • Remove one variable at a time, on the basis of its F statistic
  Stepwise regression
  • Adds variables to the model and subtracts variables from the model, on the basis of the F statistic

## Stepwise Regression

[Flowchart:]
1. Is there at least one candidate variable with p-value < Pin? If no, stop.
2. Enter the most significant (smallest p-value) variable into the model.
3. Calculate the partial F for all variables in the model.
4. Is there a variable with p-value > Pout? If yes, remove that variable and return to step 3; if no, return to step 1.

## Stepwise Regression: Using the Computer (MINITAB)

MTB > STEPWISE 'EXPORTS' PREDICTORS 'M1' 'LEND' 'PRICE' 'EXCHANGE'

Stepwise Regression

F-to-Enter: 4.00  F-to-Remove: 4.00
Response is EXPORTS on 4 predictors, with N = 67

| Step | 1 | 2 |
|---|---:|---:|
| Constant | 0.9348 | −3.4230 |
| M1 | 0.520 | 0.361 |
| T-Ratio | 9.89 | 9.21 |
| PRICE | | 0.0370 |
| T-Ratio | | 9.05 |
| S | 0.495 | 0.331 |
| R-Sq | 60.08 | 82.48 |

## Using the Computer: MINITAB

MTB > REGRESS 'EXPORTS' 4 'M1' 'LEND' 'PRICE' 'EXCHANGE';
SUBC> vif;
SUBC> dw.

Regression Analysis

The regression equation is
EXPORTS = −4.02 + 0.368 M1 + 0.0047 LEND + 0.0365 PRICE + 0.27 EXCHANGE

| Predictor | Coef | Stdev | t-ratio | p | VIF |
|---|---:|---:|---:|---:|---:|
| Constant | −4.015 | 2.766 | −1.45 | 0.152 | |
| M1 | 0.36846 | 0.06385 | 5.77 | 0.000 | 3.2 |
| LEND | 0.00470 | 0.04922 | 0.10 | 0.924 | 5.4 |
| PRICE | 0.036511 | 0.009326 | 3.91 | 0.000 | 6.3 |
| EXCHANGE | 0.268 | 1.175 | 0.23 | 0.820 | 1.4 |

s = 0.3358  R-sq = 82.5%  R-sq(adj) = 81.4%

Analysis of Variance

| SOURCE | DF | SS | MS | F | p |
|---|---:|---:|---:|---:|---:|
| Regression | 4 | 32.9463 | 8.2366 | 73.06 | |

## Using the Computer: SAS (continued)

Parameter Estimates

| Variable | DF | Parameter Estimate | Standard Error | T for H0: Parameter=0 | Prob > \|T\| |
|---|---:|---:|---:|---:|---:|
| INTERCEP | 1 | −4.015461 | 2.76640057 | −1.452 | 0.1517 |
| M1 | 1 | 0.368456 | 0.06384841 | 5.771 | 0.0001 |
| LEND | 1 | 0.004702 | 0.04922186 | 0.096 | 0.9242 |
| PRICE | 1 | 0.036511 | 0.00932601 | 3.915 | 0.0002 |
| EXCHANGE | 1 | 0.267896 | 1.17544016 | 0.228 | 0.8205 |

| Variable | DF | Variance Inflation |
|---|---:|---:|
| INTERCEP | 1 | 0.00000000 |
| M1 | 1 | 3.20719533 |
| LEND | 1 | 5.35391367 |
| PRICE | 1 | 6.28873181 |
| EXCHANGE | 1 | 1.38570639 |