
11- 1
COMPLETE
BUSINESS
STATISTICS
by
AMIR D. ACZEL
&
JAYAVEL SOUNDERPANDIAN
6th edition.
11- 2
Chapter 11

Multiple
Regression
11- 3
11 Multiple Regression (1)
• Using Statistics
• The k-Variable Multiple Regression Model
• The F Test of a Multiple Regression Model
• How Good is the Regression
• Tests of the Significance of Individual Regression
Parameters
• Testing the Validity of the Regression Model
• Using the Multiple Regression Model for
Prediction
11- 4
11 Multiple Regression (2)
• Qualitative Independent Variables
• Polynomial Regression
• Nonlinear Models and Transformations
• Multicollinearity
• Residual Autocorrelation and the Durbin-Watson
Test
• Partial F Tests and Variable Selection Methods
• Multiple Regression Using the Solver
• The Matrix Approach to Multiple Regression
Analysis
11- 5
11 LEARNING OBJECTIVES (1)
After studying this chapter you should be able to:
• Determine whether multiple regression would be applicable to a given
instance
• Formulate a multiple regression model
• Carry out a multiple regression using a spreadsheet template
• Test the validity of a multiple regression by analyzing residuals
• Carry out hypothesis tests about the regression coefficients
• Compute a prediction interval for the dependent variable
11- 6
11 LEARNING OBJECTIVES (2)
After studying this chapter you should be able to:
• Use indicator variables in a multiple regression
• Carry out a polynomial regression
• Conduct a Durbin-Watson test for autocorrelation
in residuals
• Conduct a partial F-test
• Determine which independent variables are to be
included in a multiple regression model
• Solve multiple regression problems using the
Solver macro
11- 7
11-1 Using Statistics

[Figure: a line in the (x, y) plane and a plane in (x1, x2, y) space]

Any two points (A and B), or an intercept and slope (β0 and β1), define a line on a two-dimensional surface. Any three points (A, B, and C), or an intercept and the coefficients of x1 and x2 (β0, β1, and β2), define a plane in three-dimensional space.
11- 8
11-2 The k-Variable Multiple Regression Model

The population regression model of a dependent variable, Y, on a set of k independent variables, X1, X2, . . . , Xk, is given by:

Y = β0 + β1X1 + β2X2 + . . . + βkXk + ε

where β0 is the Y-intercept of the regression surface and each βi, i = 1, 2, . . . , k, is the slope of the regression surface (sometimes called the response surface) with respect to Xi.

Model assumptions:
1. ε ~ N(0, σ²), independent of other errors.
2. The variables Xi are uncorrelated with the error term.

[Figure: the regression surface y = β0 + β1x1 + β2x2 + ε over the (x1, x2) plane]
11- 9
Simple and Multiple Least-Squares Regression

[Figure: estimated regression line y = b0 + b1x and estimated regression plane y = b0 + b1x1 + b2x2]

In a simple regression model, the least-squares estimators minimize the sum of squared errors from the estimated regression line. In a multiple regression model, the least-squares estimators minimize the sum of squared errors from the estimated regression plane.
11- 10
The Estimated Regression Relationship

The estimated regression relationship:

Ŷ = b0 + b1X1 + b2X2 + . . . + bkXk

where Ŷ is the predicted value of Y, the value lying on the estimated regression surface. The terms bi, for i = 0, 1, . . . , k, are the least-squares estimates of the population regression parameters βi.

The actual, observed value of Y is the predicted value plus an error:

yj = b0 + b1x1j + b2x2j + . . . + bkxkj + ej,   j = 1, . . . , n.
11- 11
Least-Squares Estimation:
The 2-Variable Normal Equations
Minimizing the sum of squared errors with respect to the
estimated coefficients b0, b1, and b2 yields the following
normal equations which can be solved for b0, b1, and b2.
∑y  = nb0   + b1∑x1   + b2∑x2

∑x1y = b0∑x1 + b1∑x1²  + b2∑x1x2

∑x2y = b0∑x2 + b1∑x1x2 + b2∑x2²
11- 12
Example 11-1

  Y    X1   X2   X1X2   X1²   X2²   X1Y   X2Y
 72    12    5     60   144    25    864   360
 76    11    8     88   121    64    836   608
 78    15    6     90   225    36   1170   468
 70    10    5     50   100    25    700   350
 68    11    3     33   121     9    748   204
 80    16    9    144   256    81   1280   720
 82    14   12    168   196   144   1148   984
 65     8    4     32    64    16    520   260
 62     8    3     24    64     9    496   186
 90    18   10    180   324   100   1620   900
---------------------------------------------
743   123   65    869  1615   509   9382  5040

Normal Equations:
 743 = 10b0  + 123b1  + 65b2
9382 = 123b0 + 1615b1 + 869b2
5040 = 65b0  + 869b1  + 509b2

b0 = 47.164942
b1 = 1.5990404
b2 = 1.1487479

Estimated regression equation:

Ŷ = 47.164942 + 1.5990404 X1 + 1.1487479 X2
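The normal equations above can also be solved numerically. A minimal sketch in Python with NumPy (not part of the original slides; it simply reproduces the hand computation):

import numpy as np

# Example 11-1 data from the table above
y  = np.array([72, 76, 78, 70, 68, 80, 82, 65, 62, 90], dtype=float)
x1 = np.array([12, 11, 15, 10, 11, 16, 14,  8,  8, 18], dtype=float)
x2 = np.array([ 5,  8,  6,  5,  3,  9, 12,  4,  3, 10], dtype=float)
n = len(y)

# Coefficient matrix and right-hand side of the three normal equations
A = np.array([[n,        x1.sum(),      x2.sum()],
              [x1.sum(), (x1**2).sum(), (x1*x2).sum()],
              [x2.sum(), (x1*x2).sum(), (x2**2).sum()]])
rhs = np.array([y.sum(), (x1*y).sum(), (x2*y).sum()])

b0, b1, b2 = np.linalg.solve(A, rhs)
print(b0, b1, b2)                            # about 47.165, 1.599, 1.149

# Equivalent least-squares fit on the design matrix [1, X1, X2]
X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)    # same estimates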
11- 13
Example 11-1: Using the Template

Regression results for Alka-Seltzer sales
11- 14
Decomposition of the Total Deviation in a Multiple Regression Model

[Figure: a data point, the regression surface, and the mean of y]
Total deviation: y − ȳ
Regression deviation: ŷ − ȳ
Error deviation: y − ŷ

Total Deviation = Regression Deviation + Error Deviation
SST = SSR + SSE
11- 15
11-3 The F Test of a Multiple Regression Model

A statistical test for the existence of a linear relationship between Y and any or all of the independent variables X1, X2, . . . , Xk:

H0: β1 = β2 = . . . = βk = 0
H1: Not all the βi (i = 1, 2, . . . , k) are equal to 0

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square                 F Ratio
Regression            SSR              k                    MSR = SSR / k               F = MSR / MSE
Error                 SSE              n - (k + 1)          MSE = SSE / (n - (k + 1))
Total                 SST              n - 1                MST = SST / (n - 1)
11- 16
Using the Template: Analysis of Variance Table (Example 11-1)

[Figure: F distribution with 2 and 7 degrees of freedom; α = 0.01, F0.01 = 9.55, test statistic = 86.34]

The test statistic, F = 86.34, is greater than the critical point of F(2, 7) for any common level of significance (p-value ≈ 0), so the null hypothesis is rejected, and we might conclude that the dependent variable is related to one or more of the independent variables.
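The critical point and the near-zero p-value quoted above can be verified with SciPy's F distribution. A brief sketch (assuming SciPy is available; not part of the original slides):

from scipy import stats

k, n = 2, 10                             # Example 11-1: 2 predictors, 10 observations
df1, df2 = k, n - (k + 1)                # 2 and 7 degrees of freedom

f_crit  = stats.f.ppf(0.99, df1, df2)    # critical point at alpha = 0.01
p_value = stats.f.sf(86.34, df1, df2)    # right-tail p-value of the test statistic

print(round(f_crit, 2))                  # about 9.55
print(p_value)                           # essentially 0, so H0 is rejected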
11- 17
11-4 How Good is the Regression

[Figure: data points and errors y − ŷ around the estimated regression plane]

The mean square error is an unbiased estimator of the variance of the population errors, ε, denoted by σ²:

MSE = SSE / (n - (k + 1)) = ∑(y − ŷ)² / (n - (k + 1))

Standard error of estimate:  s = √MSE

The multiple coefficient of determination, R², measures the proportion of the variation in the dependent variable that is explained by the combination of the independent variables in the multiple regression model:

R² = SSR / SST = 1 - SSE / SST
11- 18
Decomposition of the Sum of Squares and the Adjusted Coefficient of Determination

SST = SSR + SSE

R² = SSR / SST = 1 - SSE / SST

The adjusted multiple coefficient of determination is the coefficient of determination with the SSE and SST divided by their respective degrees of freedom:

Adjusted R² = 1 - [SSE / (n - (k + 1))] / [SST / (n - 1)]

Example 11-1:  s = 1.911   R-sq = 96.1%   R-sq(adj) = 95.0%
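A small sketch of the adjusted-R² computation in Python (not from the slides; it uses the algebraically equivalent form written in terms of R² rather than SSE and SST):

def adjusted_r2(r2, n, k):
    # Equivalent to 1 - [SSE/(n-(k+1))] / [SST/(n-1)]
    return 1 - (1 - r2) * (n - 1) / (n - (k + 1))

# Example 11-1: R-sq = 96.1% with n = 10 observations and k = 2 predictors
print(round(adjusted_r2(0.961, 10, 2), 3))   # about 0.950, i.e. R-sq(adj) = 95.0%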
11- 19
Measures of Performance in Multiple Regression and the ANOVA Table

Source of Variation   Sum of Squares   Degrees of Freedom           Mean Square                 F Ratio
Regression            SSR              k                            MSR = SSR / k               F = MSR / MSE
Error                 SSE              n - (k + 1) = n - k - 1      MSE = SSE / (n - (k + 1))
Total                 SST              n - 1                        MST = SST / (n - 1)

R² = SSR / SST = 1 - SSE / SST

F = [R² / (1 - R²)] · [(n - (k + 1)) / k]

Adjusted R² = 1 - [SSE / (n - (k + 1))] / [SST / (n - 1)] = 1 - MSE / MST
11- 20
11-5 Tests of the Significance of Individual Regression Parameters

Hypothesis tests about individual regression slope parameters:

(1)  H0: β1 = 0    H1: β1 ≠ 0
(2)  H0: β2 = 0    H1: β2 ≠ 0
 . . .
(k)  H0: βk = 0    H1: βk ≠ 0

Test statistic for test i:   t(n - (k + 1)) = (bi - 0) / s(bi)
11- 21
Regression Results for Individual Parameters (Interpret the Table)

Variable    Coefficient Estimate   Standard Error   t-Statistic
Constant    53.12                  5.43               9.783 *
X1           2.03                  0.22               9.227 *
X2           5.60                  1.30               4.308 *
X3          10.35                  6.88               1.504
X4           3.45                  2.70               1.259
X5          -4.25                  0.38             -11.184 *

n = 150,  t0.025 = 1.96   (* marks a coefficient significantly different from 0 at the 5% level)
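The t-statistics and significance flags in the table come from dividing each estimate by its standard error and comparing with the critical value. A minimal Python sketch (illustrative; small differences from the table are rounding in the reported estimates):

t_crit = 1.96   # t(0.025) for n = 150 observations

# (coefficient estimate, standard error) pairs from the table above
coefficients = {
    "Constant": (53.12, 5.43),
    "X1":       ( 2.03, 0.22),
    "X2":       ( 5.60, 1.30),
    "X3":       (10.35, 6.88),
    "X4":       ( 3.45, 2.70),
    "X5":       (-4.25, 0.38),
}

for name, (b, se) in coefficients.items():
    t = b / se
    flag = "*" if abs(t) > t_crit else ""
    print(f"{name:8s} t = {t:8.3f} {flag}")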
11- 22
Example 11-1: Using the Template

Regression results for Alka-Seltzer sales
11- 23
Using the Template: Example 11-2

Regression results for Exports to Singapore
11- 24
11-6 Testing the Validity of the Regression Model: Residual Plots

[Residual plot: residuals vs M1 (Example 11-2)]

It appears that the residuals are randomly distributed, with no pattern and with equal variance, as M1 increases.
11- 25
11-6 Testing the Validity of the Regression Model: Residual Plots

[Residual plot: residuals vs Price (Example 11-2)]

It appears that the residuals are increasing as the Price increases. The variance of the residuals is not constant.
11- 26
Normal Probability Plot for the Residuals: Example 11-2

[Normal probability plot of the residuals]

Linear trend indicates residuals are normally distributed.
11- 27
Investigating the Validity of the Regression: Outliers and Influential Observations

[Figure, left panel (Outliers): the regression line without the outlier vs. the regression line with the outlier included]
[Figure, right panel (Influential Observations): a cluster of points with no relationship plus a point with a large value of xi; the regression line when all data are included is pulled toward that point]
11- 28
Possible Relation in the Region between the Available Cluster of Data and the Far Point

[Figure: the original cluster of data, a point with a large value of xi, and some of the possible data between the original cluster and the far point; a more appropriate curvilinear relationship is seen when the in-between data are known]
11- 29
Outliers and Influential Observations: Example 11-2

Unusual Observations
Obs.   M1     EXPORTS   Fit      Stdev.Fit   Residual   St.Resid
  1    5.10   2.6000    2.6420   0.1288      -0.0420    -0.14 X
  2    4.90   2.6000    2.6438   0.1234      -0.0438    -0.14 X
 25    6.20   5.5000    4.5949   0.0676       0.9051     2.80 R
 26    6.30   3.7000    4.6311   0.0651      -0.9311    -2.87 R
 50    8.30   4.3000    5.1317   0.0648      -0.8317    -2.57 R
 67    8.20   5.6000    4.9474   0.0668       0.6526     2.02 R

R denotes an obs. with a large st. resid.
X denotes an obs. whose X value gives it large influence.
11- 30
11-7 Using the Multiple Regression Model for Prediction

[Figure: estimated regression plane for Example 11-1; Sales plotted against Advertising (8.00 to 18.00) and Promotions (3 to 12), with fitted Sales ranging from about 63.42 to 89.76]
11- 31
Prediction in Multiple Regression

A (1 - α)100% prediction interval for a value of Y given values of Xi:

ŷ ± t(α/2, n-(k+1)) · √[s²(ŷ) + MSE]

A (1 - α)100% prediction interval for the conditional mean of Y given values of Xi:

ŷ ± t(α/2, n-(k+1)) · s[Ê(Y)]
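A small illustration of the first interval in Python using SciPy's t distribution (the numerical inputs are hypothetical, chosen only to show the computation):

from math import sqrt
from scipy import stats

y_hat, s_yhat, mse = 10.0, 0.3, 1.2   # hypothetical point prediction, s(y-hat), MSE
n, k, alpha = 30, 3, 0.05

t_crit = stats.t.ppf(1 - alpha / 2, df=n - (k + 1))
half_width = t_crit * sqrt(s_yhat**2 + mse)

print(f"{1 - alpha:.0%} prediction interval: "
      f"({y_hat - half_width:.2f}, {y_hat + half_width:.2f})")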
11- 32
11-8 Qualitative (or Categorical) Independent Variables (in Regression)

An indicator (dummy, binary) variable of qualitative level A:

Xh = 1 if level A is obtained
   = 0 if level A is not obtained

EXAMPLE 11-3
MOVIE   EARN   COST   PROM   BOOK
  1      28     4.2    1.0    0
  2      35     6.0    3.0    1
  3      50     5.5    6.0    1
  4      20     3.3    1.0    0
  5      75    12.5   11.0    1
  6      60     9.6    8.0    1
  7      15     2.5    0.5    0
  8      45    10.8    5.0    0
  9      50     8.4    3.0    1
 10      34     6.6    2.0    0
 11      48    10.7    1.0    1
 12      82    11.0   15.0    1
 13      24     3.5    4.0    0
 14      50     6.9   10.0    0
 15      58     7.8    9.0    1
 16      63    10.1   10.0    0
 17      30     5.0    1.0    1
 18      37     7.5    5.0    0
 19      45     6.4    8.0    1
 20      72    10.0   12.0    1
11- 33
Picturing Qualitative Variables in Regression

[Left figure: two parallel lines against X1 -- one for X2 = 0 with intercept b0 and one for X2 = 1 with intercept b0 + b2]
[Right figure: two parallel regression planes over (x1, x2), shifted by b3]

A regression with one quantitative variable (X1) and one qualitative variable (X2):
ŷ = b0 + b1x1 + b2x2

A multiple regression with two quantitative variables (X1 and X2) and one qualitative variable (X3):
ŷ = b0 + b1x1 + b2x2 + b3x3
11- 34
Picturing Qualitative Variables in Regression: Three Categories and Two Dummy Variables

[Figure: three parallel lines against X1 -- one for X2 = 0 and X3 = 0 (intercept b0), one for X2 = 1 and X3 = 0 (intercept b0 + b2), and one for X2 = 0 and X3 = 1 (intercept b0 + b3)]

A qualitative variable with r levels or categories is represented with (r - 1) 0/1 (dummy) variables.

A regression with one quantitative variable (X1) and two qualitative variables (X2 and X3):
ŷ = b0 + b1x1 + b2x2 + b3x3

Category     X2   X3
Adventure     0    0
Drama         0    1
Romance       1    0
11- 35
Using Qualitative Variables in Regression: Example 11-4

Salary = 8547 + 949 Education + 1258 Experience - 3256 Gender
(SE)     (32.6)  (45.1)          (78.5)           (212.4)
(t)      (262.2) (21.0)          (16.0)           (-15.3)

Gender = 1 if Female, 0 if Male

On average, female salaries are $3256 below male salaries.
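The fitted equation can be used directly to compare predicted salaries. A tiny Python sketch (the education and experience values are made up purely for illustration):

def predicted_salary(education, experience, female):
    # Fitted equation from Example 11-4; Gender = 1 for female, 0 for male
    return 8547 + 949 * education + 1258 * experience - 3256 * (1 if female else 0)

# Same education and experience, different gender
salary_male   = predicted_salary(education=12, experience=5, female=False)
salary_female = predicted_salary(education=12, experience=5, female=True)
print(salary_male - salary_female)   # 3256 -- the dummy coefficient is the average gap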
11- 36
Interactions between Quantitative and Qualitative Variables: Shifting Slopes

[Figure: a line for X2 = 0 with intercept b0 and slope b1, and a line for X2 = 1 with intercept b0 + b2 and slope b1 + b3]

A regression with interaction between a quantitative variable (X1) and a qualitative variable (X2):
ŷ = b0 + b1x1 + b2x2 + b3x1x2
11- 37
11-9 Polynomial Regression

One-variable polynomial regression model:

Y = β0 + β1X + β2X² + β3X³ + . . . + βmX^m + ε

where m is the degree of the polynomial - the highest power of X appearing in the equation. The degree of the polynomial is the order of the model.

[Figure: fitted curves of increasing order -- a line y = b0 + b1X, a quadratic y = b0 + b1X + b2X² (with b2 < 0), and a cubic y = b0 + b1X + b2X² + b3X³]
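Polynomial terms are just extra columns in the design matrix, so the usual least-squares machinery applies. A short NumPy sketch on made-up data (illustrative only; not from the slides):

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 40)
y = 5 + 2 * x - 0.3 * x**2 + rng.normal(0, 1, size=x.size)   # synthetic data

# Second-order (m = 2) polynomial regression: columns 1, X, X^2
X = np.column_stack([np.ones_like(x), x, x**2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)   # estimates of beta0, beta1, beta2 (roughly 5, 2, -0.3)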
11- 38
Polynomial Regression: Example 11-5
11- 39
Polynomial Regression: Other Variables and Cross-Product Terms

Variable   Estimate   Standard Error   T-statistic
X1         2.34       0.92             2.54
X2         3.11       1.05             2.96
X1²        4.22       1.00             4.22
X2²        3.57       2.12             1.68
X1X2       2.77       2.30             1.20
11- 40
11-10 Nonlinear Models and Transformations

The multiplicative model:

Y = β0 X1^β1 X2^β2 X3^β3 ε

The logarithmic transformation:

log Y = log β0 + β1 log X1 + β2 log X2 + β3 log X3 + log ε
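After taking logs the multiplicative model is linear in the parameters, so it can be fitted by ordinary least squares on the transformed variables. A minimal NumPy sketch on synthetic data (illustrative; not from the slides):

import numpy as np

rng = np.random.default_rng(1)
x1 = rng.uniform(1, 10, 50)
x2 = rng.uniform(1, 10, 50)
y = 2.0 * x1**0.8 * x2**1.5 * rng.lognormal(0, 0.05, 50)   # multiplicative model

# Regress log Y on log X1 and log X2
X = np.column_stack([np.ones(50), np.log(x1), np.log(x2)])
b, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
print(np.exp(b[0]), b[1], b[2])   # roughly beta0 = 2.0, beta1 = 0.8, beta2 = 1.5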
11- 41
Transformations: Exponential Model

The exponential model:

Y = β0 e^(β1X) ε

The logarithmic transformation:

log Y = log β0 + β1X + log ε
11- 42
Plots of Transformed Variables

[Four panels:
 Simple Regression of Sales on Advertising:       Y = 6.59271 + 1.19176 X,    R-Squared = 0.895
 Regression of Sales on Log(Advertising):         Y = 3.66825 + 6.784 X,      R-Squared = 0.978
 Regression of Log(Sales) on Log(Advertising):    Y = 1.70082 + 0.553136 X,   R-Squared = 0.947
 Residual Plots: Sales vs Log(Advertising)]
11- 43
Variance Stabilizing Transformations

• Square root transformation: Y′ = √Y
  - Useful when the variance of the regression errors is approximately proportional to the conditional mean of Y
• Logarithmic transformation: Y′ = log(Y)
  - Useful when the variance of regression errors is approximately proportional to the square of the conditional mean of Y
• Reciprocal transformation: Y′ = 1/Y
  - Useful when the variance of the regression errors is approximately proportional to the fourth power of the conditional mean of Y
11- 44
Regression with Dependent Indicator Variables

The logistic function:

E(Y|X) = e^(β0 + β1X) / (1 + e^(β0 + β1X))

Transformation to linearize the logistic function:

p′ = log[p / (1 - p)]

[Figure: the S-shaped logistic function, rising from 0 to 1 as x increases]
11- 45
11-11 Multicollinearity

[Figure: four diagrams of pairs of X variables]
Orthogonal X variables provide information from independent sources. No multicollinearity.
Perfectly collinear X variables provide identical information content. No regression.
Some degree of collinearity: problems with regression depend on the degree of collinearity.
A high degree of negative collinearity also causes problems with regression.
11- 46
Effects of Multicollinearity

• Variances of regression coefficients are inflated.
• Magnitudes of regression coefficients may be different from what is expected.
• Signs of regression coefficients may not be as expected.
• Adding or removing variables produces large changes in coefficients.
• Removing a data point may cause large changes in coefficient estimates or signs.
• In some cases, the F ratio may be significant while the t ratios are not.
11- 47
Detecting the Existence of Multicollinearity:
Correlation Matrix of Independent Variables and
Variance Inflation Factors
11- 48
Variance Inflation Factor

The variance inflation factor associated with Xh:

VIF(Xh) = 1 / (1 - Rh²)

where Rh² is the R² value obtained for the regression of Xh on the other independent variables.

[Figure: relationship between VIF and Rh² -- VIF rises slowly for small Rh² and grows without bound as Rh² approaches 1]
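The VIF can be computed exactly as defined above: regress each Xh on the remaining independent variables and use the resulting R². A compact NumPy sketch (illustrative; it assumes the predictors are the columns of a matrix X):

import numpy as np

def vif(X):
    # Variance inflation factor for each column of the predictor matrix X
    n, k = X.shape
    out = np.empty(k)
    for h in range(k):
        target = X[:, h]
        others = np.column_stack([np.ones(n), np.delete(X, h, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)   # auxiliary regression
        resid = target - others @ coef
        r2_h = 1 - resid @ resid / np.sum((target - target.mean())**2)
        out[h] = 1 / (1 - r2_h)
    return out

As the next slide notes, values above about 5 (as for Lend and Price in Example 11-2) suggest some degree of multicollinearity.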
11- 49
Variance Inflation Factor (VIF)

Observation: The VIF (Variance Inflation Factor) values for the variables Lend and Price are both greater than 5. This would indicate that some degree of multicollinearity exists with respect to these two variables.
11- 50
Solutions to the Multicollinearity Problem

• Drop a collinear variable from the regression
• Change the sampling plan to include elements outside the multicollinearity range
• Transformations of variables
• Ridge regression
11- 51
11-12 Residual Autocorrelation and the Durbin-Watson Test

An autocorrelation is a correlation of the values of a variable with values of the same variable lagged one or more periods back. Consequences of autocorrelation include inaccurate estimates of variances and inaccurate predictions.

The Durbin-Watson test (first-order autocorrelation):
H0: ρ1 = 0
H1: ρ1 ≠ 0

The Durbin-Watson test statistic:

d = Σ(i=2 to n) (ei - ei-1)² / Σ(i=1 to n) ei²

Lagged Residuals
  i    ei     ei-1   ei-2   ei-3   ei-4
  1    1.0    *      *      *      *
  2    0.0    1.0    *      *      *
  3   -1.0    0.0    1.0    *      *
  4    2.0   -1.0    0.0    1.0    *
  5    3.0    2.0   -1.0    0.0    1.0
  6   -2.0    3.0    2.0   -1.0    0.0
  7    1.0   -2.0    3.0    2.0   -1.0
  8    1.5    1.0   -2.0    3.0    2.0
  9    1.0    1.5    1.0   -2.0    3.0
 10   -2.5    1.0    1.5    1.0   -2.0
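The statistic is easy to compute once the residuals are in hand. A short NumPy sketch using the residual series from the lagged-residuals table above (the value of about 1.99 is simply the arithmetic for these illustrative residuals, not a figure quoted on the slide):

import numpy as np

e = np.array([1.0, 0.0, -1.0, 2.0, 3.0, -2.0, 1.0, 1.5, 1.0, -2.5])

# Durbin-Watson statistic: sum of squared first differences over sum of squares
d = np.sum(np.diff(e)**2) / np.sum(e**2)
print(round(d, 2))   # about 1.99 -- close to 2, i.e. little first-order autocorrelation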
11- 52
Critical Points of the Durbin-Watson Statistic: α = 0.05,
n = Sample Size, k = Number of Independent Variables

        k = 1          k = 2          k = 3          k = 4          k = 5
  n    dL    dU       dL    dU       dL    dU       dL    dU       dL    dU
 15   1.08  1.36     0.95  1.54     0.82  1.75     0.69  1.97     0.56  2.21
 16   1.10  1.37     0.98  1.54     0.86  1.73     0.74  1.93     0.62  2.15
 17   1.13  1.38     1.02  1.54     0.90  1.71     0.78  1.90     0.67  2.10
 18   1.16  1.39     1.05  1.53     0.93  1.69     0.82  1.87     0.71  2.06
  .     .     .        .     .        .     .        .     .        .     .
 65   1.57  1.63     1.54  1.66     1.50  1.70     1.47  1.73     1.44  1.77
 70   1.58  1.64     1.55  1.67     1.52  1.70     1.49  1.74     1.46  1.77
 75   1.60  1.65     1.57  1.68     1.54  1.71     1.51  1.74     1.49  1.77
 80   1.61  1.66     1.59  1.69     1.56  1.72     1.53  1.74     1.51  1.77
 85   1.62  1.67     1.60  1.70     1.57  1.72     1.55  1.75     1.52  1.77
 90   1.63  1.68     1.61  1.70     1.59  1.73     1.57  1.75     1.54  1.78
 95   1.64  1.69     1.62  1.71     1.60  1.73     1.58  1.75     1.56  1.78
100   1.65  1.69     1.63  1.72     1.61  1.74     1.59  1.76     1.57  1.78
11- 53
Using the Durbin-Watson Statistic

[Decision scale from 0 to 4:
  0 to dL: Positive Autocorrelation
  dL to dU: Test is Inconclusive
  dU to 4 - dU: No Autocorrelation
  4 - dU to 4 - dL: Test is Inconclusive
  4 - dL to 4: Negative Autocorrelation]

For n = 67, k = 4:   dU ≈ 1.73, so 4 - dU ≈ 2.27
                     dL ≈ 1.47, so 4 - dL ≈ 2.53 < 2.58

H0 is rejected, and we conclude there is negative first-order autocorrelation.
11- 54
11-13 Partial F Tests and Variable Selection Methods

Full model:
Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + ε

Reduced model:
Y = β0 + β1X1 + β2X2 + ε

Partial F test:
H0: β3 = β4 = 0
H1: β3 and β4 not both 0

Partial F statistic:

F(r, n - (k + 1)) = [(SSE_R - SSE_F) / r] / MSE_F

where SSE_R is the sum of squared errors of the reduced model, SSE_F is the sum of squared errors of the full model, MSE_F is the mean square error of the full model [MSE_F = SSE_F / (n - (k + 1))], and r is the number of variables dropped from the full model.
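A minimal sketch of the partial F computation in Python (the SSE values and degrees of freedom below are hypothetical, chosen only to illustrate the formula):

from scipy import stats

sse_reduced, sse_full = 40.0, 30.0   # hypothetical sums of squared errors
n, k, r = 50, 4, 2                   # k predictors in the full model, r dropped

mse_full = sse_full / (n - (k + 1))
partial_f = ((sse_reduced - sse_full) / r) / mse_full
p_value = stats.f.sf(partial_f, r, n - (k + 1))

print(partial_f, p_value)   # small p-value: reject H0 that the dropped slopes are zero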
11- 55
Variable Selection Methods

• All possible regressions
  - Run regressions with all possible combinations of independent variables and select the best model

A p-value of 0.001 indicates that we should reject the null hypothesis H0: the slopes for Lend and Exch. are zero.
11- 56
Variable Selection Methods

• Stepwise procedures
  - Forward selection
    • Add one variable at a time to the model, on the basis of its F statistic
  - Backward elimination
    • Remove one variable at a time, on the basis of its F statistic
  - Stepwise regression
    • Adds variables to the model and subtracts variables from the model, on the basis of the F statistic
11- 57
Stepwise Regression

[Flowchart:
 1. Compute the F statistic for each variable not in the model.
 2. Is there at least one variable with p-value < Pin? If no, stop.
 3. If yes, enter the most significant (smallest p-value) variable into the model.
 4. Calculate the partial F for all variables in the model.
 5. Is there a variable with p-value > Pout? If yes, remove that variable. Return to step 1.]
11- 58
Stepwise Regression: Using the Computer (MINITAB)

MTB > STEPWISE 'EXPORTS' PREDICTORS 'M1' 'LEND' 'PRICE' 'EXCHANGE'

Stepwise Regression

F-to-Enter: 4.00   F-to-Remove: 4.00
Response is EXPORTS on 4 predictors, with N = 67

Step          1         2
Constant    0.9348   -3.4230
M1          0.520     0.361
T-Ratio     9.89      9.21
PRICE                 0.0370
T-Ratio               9.05

S           0.495     0.331
R-Sq        60.08     82.48
11- 59
Using the Computer: MINITAB

MTB > REGRESS 'EXPORTS' 4 'M1' 'LEND' 'PRICE' 'EXCHANGE';
SUBC> vif;
SUBC> dw.

Regression Analysis

The regression equation is
EXPORTS = -4.02 + 0.368 M1 + 0.0047 LEND + 0.0365 PRICE + 0.27 EXCHANGE

Predictor    Coef       Stdev      t-ratio   p       VIF
Constant    -4.015      2.766      -1.45     0.152
M1           0.36846    0.06385     5.77     0.000   3.2
LEND         0.00470    0.04922     0.10     0.924   5.4
PRICE        0.036511   0.009326    3.91     0.000   6.3
EXCHANGE     0.268      1.175       0.23     0.820   1.4

s = 0.3358   R-sq = 82.5%   R-sq(adj) = 81.4%

Analysis of Variance
SOURCE        DF   SS        MS       F       p
Regression     4   32.9463   8.2366   73.06
11- 60
Using the Computer: SAS (continued)

Parameter Estimates
                 Parameter     Standard        T for H0:
Variable    DF   Estimate      Error           Parameter=0   Prob > |T|
INTERCEP     1   -4.015461     2.76640057      -1.452        0.1517
M1           1    0.368456     0.06384841       5.771        0.0001
LEND         1    0.004702     0.04922186       0.096        0.9242
PRICE        1    0.036511     0.00932601       3.915        0.0002
EXCHANGE     1    0.267896     1.17544016       0.228        0.8205

                 Variance
Variable    DF   Inflation
INTERCEP     1   0.00000000
M1           1   3.20719533
LEND         1   5.35391367
PRICE        1   6.28873181
EXCHANGE     1   1.38570639