
12-1 of 43

Chapter Twelve
Multiple Regression

McGraw-Hill/Irwin

Copyright 2003 by The McGraw-Hill Companies, Inc. All rights reserved.

Multiple Regression

Part 1: Basic Multiple Regression
12.1  The Linear Regression Model
12.2  The Least Squares Point Estimates
12.3  The Mean Squared Error and the Standard Error
12.4  Model Utility: R2, Adjusted R2, and the F Test
12.5  Testing the Significance of an Independent Variable
12.6  Confidence Intervals and Prediction Intervals

Part 2: Using Squared and Interaction Terms
12.7  The Quadratic Regression Model
12.8  Interaction

Multiple Regression

Part 3: Dummy Variables and Advanced Statistical Inferences
12.9   Dummy Variables to Model Qualitative Variables
12.10  The Partial F Test: Testing a Portion of a Model

Part 4: Model Building and Model Diagnostics
12.11  Model Building and the Effects of Multicollinearity
12.12  Residual Analysis in Multiple Regression
12.13  Diagnostics for Detecting Outlying and Influential Observations

12.1 The Linear Regression Model

The linear regression model relating y to x1, x2, ..., xk is

y = μ_{y|x1, x2, ..., xk} + ε = β0 + β1x1 + β2x2 + ... + βkxk + ε

where

μ_{y|x1, x2, ..., xk} = β0 + β1x1 + β2x2 + ... + βkxk is the mean value of the dependent variable y when the values of the independent variables are x1, x2, ..., xk.

β0, β1, β2, ..., βk are the regression parameters relating the mean value of y to x1, x2, ..., xk.

ε is an error term that describes the effects on y of all factors other than the independent variables x1, x2, ..., xk.

Example: The Linear Regression Model

Example 12.1: The Fuel Consumption Case

Week   Average Hourly       Chill Index, x2   Fuel Consumption,
       Temperature, x1 (°F)                   y (MMcf)
1      28.0                 18                12.4
2      28.0                 14                11.7
3      32.5                 24                12.4
4      39.0                 22                10.8
5      45.9                 8                 9.4
6      57.8                 16                9.5
7      58.1                 1                 8.0
8      62.5                 0                 7.5

y = β0 + β1x1 + β2x2 + ε

The Linear Regression Model Illustrated

Example 12.1: The Fuel Consumption Case
(The original slide shows this example's fitted regression plane as a figure.)

The Regression Model Assumptions

Model:  y = μ_{y|x1, x2, ..., xk} + ε = β0 + β1x1 + β2x2 + ... + βkxk + ε

Assumptions about the model error terms, the ε's:

Mean Zero: The mean of the error terms is equal to 0.
Constant Variance: The variance of the error terms is the same for every combination of values of x1, x2, ..., xk.
Normality: The error terms follow a normal distribution for every combination of values of x1, x2, ..., xk.
Independence: The values of the error terms are statistically independent of each other.

12.2 Least Squares Estimates and Prediction

Estimation/Prediction Equation:

ŷ = b0 + b1x01 + b2x02 + ... + bkx0k

ŷ is the point estimate of the mean value of the dependent variable when the values of the independent variables are x01, x02, ..., x0k. It is also the point prediction of an individual value of the dependent variable when the values of the independent variables are x01, x02, ..., x0k.

b1, b2, ..., bk are the least squares point estimates of the parameters β1, β2, ..., βk.

x01, x02, ..., x0k are specified values of the independent predictor variables x1, x2, ..., xk.
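The least squares point estimates above can be computed directly. Not part of the original slides: a minimal numpy sketch that fits the fuel consumption data from Example 12.1 by least squares and recovers the Minitab coefficients shown on the next slide.

```python
import numpy as np

# Fuel consumption data from Example 12.1
temp  = np.array([28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5])
chill = np.array([18, 14, 24, 22, 8, 16, 1, 0], dtype=float)
y     = np.array([12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5])

# Design matrix with an intercept column; least squares solves X b ~= y
X = np.column_stack([np.ones_like(y), temp, chill])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

print(b)  # approximately [13.1087, -0.0900, 0.0825]
```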

Example: Least Squares Estimation

Example 12.3: The Fuel Consumption Case — Minitab Output

FuelCons = 13.1 - 0.0900 Temp + 0.0825 Chill

Predictor   Coef       StDev     T       P
Constant    13.1087    0.8557    15.32   0.000
Temp        -0.09001   0.01408   -6.39   0.001
Chill       0.08249    0.02200   3.75    0.013

S = 0.3671   R-Sq = 97.4%   R-Sq(adj) = 96.3%

Analysis of Variance
Source           DF   SS       MS       F       P
Regression       2    24.875   12.438   92.30   0.000
Residual Error   5    0.674    0.135
Total            7    25.549

Predicted Values (Temp = 40, Chill = 10)
Fit      StDev Fit   95.0% CI          95.0% PI
10.333   0.170       (9.895, 10.771)   (9.293, 11.374)
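The fitted value in the output can be checked by plugging the specified predictor values into the prediction equation. A short arithmetic check, not part of the original slides:

```python
# Point prediction from the Minitab coefficients at Temp = 40, Chill = 10
b0, b1, b2 = 13.1087, -0.09001, 0.08249
y_hat = b0 + b1 * 40 + b2 * 10
print(round(y_hat, 3))  # 10.333, matching the "Fit" value in the output
```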

Example: Point Predictions and Residuals

Example 12.3: The Fuel Consumption Case

Week   Temp,      Chill      Observed Fuel     Predicted Fuel Consumption      Residual
       x1 (°F)    Index, x2  Consumption,      13.1087 - .0900x1 + .0825x2     e = y - pred
                             y (MMcf)
1      28.0       18         12.4              12.0733                         0.3267
2      28.0       14         11.7              11.7433                         -0.0433
3      32.5       24         12.4              12.1631                         0.2369
4      39.0       22         10.8              11.4131                         -0.6131
5      45.9       8          9.4               9.6372                          -0.2372
6      57.8       16         9.5               9.2260                          0.2740
7      58.1       1          8.0               7.9616                          0.0384
8      62.5       0          7.5               7.4831                          0.0169

12.3 Mean Square Error and Standard Error

SSE = Σeᵢ² = Σ(yᵢ - ŷᵢ)²               Sum of Squared Errors

s² = MSE = SSE / (n - (k+1))           Mean Square Error, point estimate of residual variance

s = √MSE = √(SSE / (n - (k+1)))        Standard Error, point estimate of residual standard deviation

Example 12.3: The Fuel Consumption Case

Analysis of Variance
Source           DF   SS       MS       F       P
Regression       2    24.875   12.438   92.30   0.000
Residual Error   5    0.674    0.135
Total            7    25.549

s² = MSE = SSE / (n - (k+1)) = 0.674 / (8 - 3) = 0.1348
s = √s² = √0.1348 = 0.3671
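These quantities follow directly from the residual column of the previous slide. Not part of the original slides: a short sketch computing SSE, MSE, and the standard error from those residuals.

```python
import math

# Residuals from the Example 12.3 table (n = 8 weeks, k = 2 predictors)
e = [0.3267, -0.0433, 0.2369, -0.6131, -0.2372, 0.2740, 0.0384, 0.0169]
n, k = 8, 2

sse = sum(ei ** 2 for ei in e)   # sum of squared errors
mse = sse / (n - (k + 1))        # s^2, point estimate of the residual variance
s = math.sqrt(mse)               # standard error

print(sse, mse, s)  # approximately 0.674, 0.1348, 0.3671
```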

12.4 The Multiple Coefficient of Determination

The multiple coefficient of determination R² is

R² = Explained variation / Total variation

R² is the proportion of the total variation in y explained by the linear regression model.

Total variation = Explained variation + Unexplained variation

Total variation = Σ(yᵢ - ȳ)²          Total Sum of Squares (SSTO)
Explained variation = Σ(ŷᵢ - ȳ)²      Regression Sum of Squares (SSR)
Unexplained variation = Σ(yᵢ - ŷᵢ)²   Error Sum of Squares (SSE)

Multiple correlation coefficient:  R = √R²

The Adjusted R2

The adjusted multiple coefficient of determination is

R̄² = (R² - k/(n-1)) · (n-1) / (n - (k+1))

Example 12.3: The Fuel Consumption Case

S = 0.3671   R-Sq = 97.4%   R-Sq(adj) = 96.3%

Analysis of Variance
Source           DF   SS       MS       F       P
Regression       2    24.875   12.438   92.30   0.000
Residual Error   5    0.674    0.135
Total            7    25.549

R² = 24.875 / 25.549 = 0.974

R̄² = (0.974 - 2/(8-1)) · (8-1) / (8 - (2+1)) = 0.963
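Both coefficients can be reproduced from the ANOVA sums of squares. A brief arithmetic check, not part of the original slides:

```python
# R^2 and adjusted R^2 from the ANOVA table of Example 12.3
ssr, ssto = 24.875, 25.549   # regression and total sums of squares
n, k = 8, 2

r2 = ssr / ssto
r2_adj = (r2 - k / (n - 1)) * (n - 1) / (n - (k + 1))
print(round(r2, 3), round(r2_adj, 3))  # 0.974 0.963
```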

F Test for the Linear Regression Model

To test H0: β1 = β2 = ... = βk = 0 versus
Ha: At least one of β1, β2, ..., βk is not equal to 0

Test Statistic:

F(model) = [(Explained variation)/k] / [(Unexplained variation)/(n - (k+1))]

Reject H0 in favor of Ha if:
F(model) > F_α, or
p-value < α

F_α is based on k numerator and n-(k+1) denominator degrees of freedom.

Example: F Test for Linear Regression

Example 12.5: The Fuel Consumption Case — Minitab Output

Analysis of Variance
Source           DF   SS       MS       F       P
Regression       2    24.875   12.438   92.30   0.000
Residual Error   5    0.674    0.135
Total            7    25.549

Test Statistic:

F(model) = [(Explained variation)/k] / [(Unexplained variation)/(n - (k+1))] = (24.875/2) / (0.674/(8-3)) = 92.30

Reject H0 at the α = 0.05 level of significance, since
F(model) = 92.30 > 5.79 = F.05  and  p-value = 0.000 < 0.05

F.05 is based on 2 numerator and 5 denominator degrees of freedom.
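The F statistic itself is a one-line computation from the sums of squares. Not part of the original slides:

```python
# F(model) from the ANOVA sums of squares in Example 12.5
explained, unexplained = 24.875, 0.674
n, k = 8, 2

f_model = (explained / k) / (unexplained / (n - (k + 1)))
print(round(f_model, 1))  # 92.3
```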

12.5 Testing the Significance of an Independent Variable

If the regression assumptions hold, we can reject H0: βj = 0 at the α level of significance (probability of Type I error equal to α) if and only if the appropriate rejection point condition holds or, equivalently, if the corresponding p-value is less than α.

Test Statistic:  t = bj / s_bj

Alternative       Reject H0 if:                 p-Value
Ha: βj > 0        t > t_α                       Area under t distribution right of t
Ha: βj < 0        t < -t_α                      Area under t distribution left of t
Ha: βj ≠ 0        |t| > t_α/2, that is,         Twice area under t distribution right of |t|
                  t > t_α/2 or t < -t_α/2

100(1-α)% Confidence Interval for βj:  [bj ± t_α/2 · s_bj]

t_α, t_α/2 and p-values are based on n - (k+1) degrees of freedom.

Example: Testing and Estimation for the β's

Example 12.6: The Fuel Consumption Case — Minitab Output

Predictor   Coef       StDev     T       P
Constant    13.1087    0.8557    15.32   0.000
Temp        -0.09001   0.01408   -6.39   0.001
Chill       0.08249    0.02200   3.75    0.013

Test:

t = b2 / s_b2 = 0.08249 / 0.02200 = 3.75 > 2.571 = t.025

p-value = 2 · P(t > 3.75) = 0.013

Interval:

[b2 ± t_α/2 · s_b2] = [0.08249 ± (2.571)(0.02200)] = [0.08249 ± 0.05656] = [0.02593, 0.13905]

Chill is significant at the α = 0.05 level, but not at α = 0.01.

t, t_α/2 and p-values are based on 5 degrees of freedom.
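The t statistic and confidence interval for Chill reduce to simple arithmetic. Not part of the original slides:

```python
# t statistic and 95% CI for the Chill coefficient (5 df, t.025 = 2.571)
b2, s_b2, t_crit = 0.08249, 0.02200, 2.571

t_stat = b2 / s_b2
half = t_crit * s_b2
ci = (b2 - half, b2 + half)
print(round(t_stat, 2), ci)  # 3.75 and approximately (0.02593, 0.13905)
```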

12.6 Confidence and Prediction Intervals

Prediction:

ŷ = b0 + b1x01 + b2x02 + ... + bkx0k

If the regression assumptions hold,

100(1-α)% confidence interval for the mean value of y:
[ŷ ± t_α/2 · s_ŷ],  where s_ŷ = s · √(Distance value)

100(1-α)% prediction interval for an individual value of y:
[ŷ ± t_α/2 · s_pred],  where s_pred = s · √(1 + Distance value)

The distance value requires matrix algebra; see Appendix G on the CD-ROM.

t_α/2 is based on n-(k+1) degrees of freedom.

Example: Confidence and Prediction Intervals

Example 12.9: The Fuel Consumption Case — Minitab Output

FuelCons = 13.1 - 0.0900 Temp + 0.0825 Chill

Predicted Values (Temp = 40, Chill = 10)
Fit      StDev Fit   95.0% CI          95.0% PI
10.333   0.170       (9.895, 10.771)   (9.293, 11.374)

95% Confidence Interval:
[ŷ ± t_α/2 · s · √(Distance value)] = [10.333 ± (2.571)(0.3671)√0.2144515] = [10.333 ± 0.438] = [9.895, 10.771]

95% Prediction Interval:
[ŷ ± t_α/2 · s · √(1 + Distance value)] = [10.333 ± (2.571)(0.3671)√1.2144515] = [10.333 ± 1.041] = [9.292, 11.374]
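Both interval half-widths follow from the same formula, differing only in the "1 +" under the square root. A small check, not part of the original slides (small differences in the last digit come from rounding in the slide's intermediate values):

```python
import math

# 95% CI and PI half-widths at Temp = 40, Chill = 10 (t.025 = 2.571 on 5 df)
y_hat, s, t_crit, dist = 10.333, 0.3671, 2.571, 0.2144515

ci_half = t_crit * s * math.sqrt(dist)       # for the mean value of y
pi_half = t_crit * s * math.sqrt(1 + dist)   # for an individual value of y
print(ci_half, pi_half)  # approximately 0.438 and 1.041
```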


12.7 The Quadratic Regression Model

Model:  y = β0 + β1x + β2x² + ε

Example: Quadratic Regression Model

Example 12.11: The Gasoline Additive Case

Units of       Mileage,
Additive, x    y (MPG)
0              25.8
0              26.1
0              25.4
1              29.6
1              29.2
1              29.8
2              32.0
2              31.4
2              31.7
3              31.7
3              31.5
3              31.2
4              29.4
4              29.0
4              29.5

Example: Quadratic Regression Model (continued)

Example 12.11: The Gasoline Additive Case — Minitab Output

Mileage = 25.7 + 4.98 Units - 1.02 UnitsSq

Predictor   Coef       StDev     T        P
Constant    25.7152    0.1554    165.43   0.000
Units       4.9762     0.1841    27.02    0.000
UnitsSq     -1.01905   0.04414   -23.09   0.000

S = 0.2861   R-Sq = 98.6%   R-Sq(adj) = 98.3%

Analysis of Variance
Source           DF   SS       MS       F        P
Regression       2    67.915   33.958   414.92   0.000
Residual Error   12   0.982    0.082
Total            14   68.897

Predicted Values (Units = 2.44, UnitsSq = (2.44)(2.44) = 5.9536)
Fit       StDev Fit   95.0% CI             95.0% PI
31.7901   0.1111      (31.5481, 32.0322)   (31.1215, 32.4588)

ŷ = 25.7152 + 4.9762(2.44) - 1.01905(2.44)² = 31.7901 mpg
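The prediction above evaluates the fitted parabola at x = 2.44, and because the coefficient on x² is negative, the parabola has a maximum at its vertex, x = -b1/(2·b2). A small check, not part of the original slides:

```python
# Prediction and mileage-maximizing additive level from the fitted quadratic
b0, b1, b2 = 25.7152, 4.9762, -1.01905

x = 2.44
y_hat = b0 + b1 * x + b2 * x ** 2
x_opt = -b1 / (2 * b2)   # vertex of the parabola
print(y_hat, round(x_opt, 2))  # approximately 31.7901 mpg at x_opt = 2.44
```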


12.8 Interaction

Example 12.13: The Bonner Frozen Foods Case

Sales    Radio and TV      Print             Sales
Region   Expenditures x1   Expenditures x2   Volume y
1        1                 1                 3.27
2        1                 2                 8.38
3        1                 3                 11.28
4        1                 4                 14.5
5        1                 5                 19.63
6        2                 1                 5.84
7        2                 2                 10.01
8        2                 3                 12.46
9        2                 4                 16.67
10       2                 5                 19.83
11       3                 1                 8.51
12       3                 2                 10.14
13       3                 3                 14.75
14       3                 4                 17.99
15       3                 5                 19.85
16       4                 1                 9.46
17       4                 2                 12.61
18       4                 3                 15.5
19       4                 4                 17.68
20       4                 5                 21.02
21       5                 1                 12.23
22       5                 2                 13.58
23       5                 3                 16.77
24       5                 4                 20.56
25       5                 5                 21.05

Modeling Interaction

Model:  y = β0 + β1x1 + β2x2 + β3x1x2 + ε     (x1x2 is a cross-product, or interaction, term)

Example 12.13: The Bonner Frozen Foods Case — Minitab Output

Sales = - 2.35 + 2.36 RadioTV + 4.18 Print - 0.349 Interact

Predictor   Coef       StDev     T       P
Constant    -2.3497    0.6883    -3.41   0.003
RadioTV     2.3611     0.2075    11.38   0.000
Print       4.1831     0.2075    20.16   0.000
Interact    -0.34890   0.06257   -5.58   0.000

S = 0.6257   R-Sq = 98.6%   R-Sq(adj) = 98.4%

Analysis of Variance
Source           DF   SS       MS       F        P
Regression       3    590.41   196.80   502.67   0.000
Residual Error   21   8.22     0.39
Total            24   598.63

Predicted Values (RadioTV = 2, Print = 5, Interact = (2)(5) = 10)
Fit      StDev Fit   95.0% CI           95.0% PI
19.799   0.265       (19.247, 20.351)   (18.385, 21.213)
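The fitted value in the output follows from plugging the predictor values, including the cross-product, into the interaction model. A brief check, not part of the original slides:

```python
# Fitted interaction model prediction at RadioTV = 2, Print = 5
b0, b1, b2, b3 = -2.3497, 2.3611, 4.1831, -0.34890

x1, x2 = 2, 5
y_hat = b0 + b1 * x1 + b2 * x2 + b3 * (x1 * x2)  # interaction term x1*x2 = 10
print(round(y_hat, 3))  # 19.799, matching the "Fit" value
```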

12.9 Using Dummy Variables to Model Qualitative Independent Variables

Example 12.15: The Electronics World Case

Store   Number of       Location   Location    Sales
        Households, x              Dummy, DM   Volume, y
1       161             Street     0           157.27
2       99              Street     0           93.28
3       135             Street     0           136.81
4       120             Street     0           123.79
5       164             Street     0           153.51
6       221             Mall       1           241.74
7       179             Mall       1           201.54
8       204             Mall       1           206.71
9       214             Mall       1           229.78
10      101             Mall       1           135.22

Location Dummy Variable:

DM = 1 if a store is in a mall location
     0 otherwise
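Constructing the dummy variable and fitting the model is mechanical. Not part of the original slides: a numpy sketch that builds DM from the location labels and recovers the Minitab coefficients shown on the next slide.

```python
import numpy as np

# Electronics World data from Example 12.15
households = np.array([161, 99, 135, 120, 164, 221, 179, 204, 214, 101], dtype=float)
location = ["Street"] * 5 + ["Mall"] * 5
sales = np.array([157.27, 93.28, 136.81, 123.79, 153.51,
                  241.74, 201.54, 206.71, 229.78, 135.22])

# DM = 1 for mall stores, 0 otherwise
dm = np.array([1.0 if loc == "Mall" else 0.0 for loc in location])

X = np.column_stack([np.ones_like(sales), households, dm])
b, *_ = np.linalg.lstsq(X, sales, rcond=None)
print(b)  # approximately [17.360, 0.851, 29.216]
```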

Example: Regression with a Dummy Variable

Example 12.15: The Electronics World Case — Minitab Output

Sales = 17.4 + 0.851 Households + 29.2 DM

Predictor   Coef      StDev     T       P
Constant    17.360    9.447     1.84    0.109
Househol    0.85105   0.06524   13.04   0.000
DM          29.216    5.594     5.22    0.001

S = 7.329   R-Sq = 98.3%   R-Sq(adj) = 97.8%

Analysis of Variance
Source           DF   SS      MS      F        P
Regression       2    21412   10706   199.32   0.000
Residual Error   7    376     54
Total            9    21788

12.10 The Partial F Test: Testing the Significance of a Portion of a Regression Model

Complete model:  y = β0 + β1x1 + ... + βgxg + βg+1xg+1 + ... + βkxk + ε
Reduced model:   y = β0 + β1x1 + ... + βgxg + ε

To test H0: βg+1 = βg+2 = ... = βk = 0 versus
Ha: At least one of βg+1, βg+2, ..., βk is not equal to 0

Partial F Statistic:

F = [(SSE_R - SSE_C)/(k - g)] / [SSE_C/(n - (k+1))]

Reject H0 in favor of Ha if:
F > F_α, or
p-value < α

F_α is based on k-g numerator and n-(k+1) denominator degrees of freedom.
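The partial F statistic needs only the two error sums of squares and the dimensions of the two models. Not part of the original slides: a minimal sketch of the formula; the numbers in the example call are hypothetical, chosen only for illustration.

```python
# Partial F statistic from the complete- and reduced-model SSEs
def partial_f(sse_r, sse_c, n, k, g):
    """[(SSE_R - SSE_C)/(k - g)] / [SSE_C/(n - (k + 1))]."""
    return ((sse_r - sse_c) / (k - g)) / (sse_c / (n - (k + 1)))

# Hypothetical example: n = 20 observations, complete model with k = 4
# predictors, reduced model keeping g = 2 of them
f = partial_f(sse_r=40.0, sse_c=25.0, n=20, k=4, g=2)
print(f)  # (15/2) / (25/15) = 4.5
```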

12.11 Model Building and the Effects of Multicollinearity

Example: The Sales Territory Performance Case

Sales     Time     MktPoten   Adver      MktShare   Change   Accts    WkLoad   Rating
3669.88   43.10    74065.11   4582.88    2.51       0.34     74.86    15.05    4.9
3473.95   108.13   58117.30   5539.78    5.51       0.15     107.32   19.97    5.1
2295.10   13.82    21118.49   2950.38    10.91      -0.72    96.75    17.34    2.9
4675.56   186.18   68521.27   2243.07    8.27       0.17     195.12   13.40    3.4
6125.96   161.79   57805.11   7747.08    9.15       0.50     180.44   17.64    4.6
2134.94   8.94     37806.94   402.44     5.51       0.15     104.88   16.22    4.5
5031.66   365.04   50935.26   3140.62    8.54       0.55     256.10   18.80    4.6
3367.45   220.32   35602.08   2086.16    7.07       -0.49    126.83   19.86    2.3
6519.45   127.64   46176.77   8846.25    12.54      1.24     203.25   17.42    4.9
4876.37   105.69   42053.24   5673.11    8.85       0.31     119.51   21.41    2.8
2468.27   57.72    36829.71   2761.76    5.38       0.37     116.26   16.32    3.1
2533.31   23.58    33612.67   1991.85    5.43       -0.65    142.28   14.51    4.2
2408.11   13.82    21412.79   1971.52    8.48       0.64     89.43    19.35    4.3
2337.38   13.82    20416.87   1737.38    7.80       1.01     84.55    20.02    4.2
4586.95   86.99    36272.00   10694.20   10.34      0.11     119.51   15.26    5.5
2729.24   165.85   23093.26   8618.61    5.15       0.04     80.49    15.87    3.6
3289.40   116.26   26879.59   7747.89    6.64       0.68     136.58   7.81     3.4
2800.78   42.28    39571.96   4565.81    5.45       0.66     78.86    16.00    4.2
3264.20   52.84    51866.15   6022.70    6.31       -0.10    136.58   17.44    3.6
3453.62   165.04   58749.82   3721.10    6.35       -0.03    138.21   17.98    3.1
1741.45   10.57    23990.82   860.97     7.37       -1.63    75.61    20.99    1.6
2035.75   13.82    25694.86   3571.51    8.39       -0.43    102.44   21.66    3.4
1578.00   8.13     23736.35   2845.50    5.15       0.04     76.42    21.46    2.7
4167.44   58.54    34314.29   5060.11    12.88      0.22     136.58   24.78    2.8
2799.97   21.14    22809.53   3552.00    9.14       -0.74    88.62    24.96    3.9

Correlation Matrix

Example: The Sales Territory Performance Case
(The original slide shows the correlation matrix of these variables as a figure.)

Multicollinearity

Multicollinearity refers to the condition where the independent variables (or predictors) in a model are dependent, related, or correlated with each other.

Effects:
Hinders the ability to use the bj's, t statistics, and p-values to assess the relative importance of predictors.
Does not hinder the ability to predict the dependent (or response) variable.

Detection:
Scatter Plot Matrix
Correlation Matrix
Variance Inflation Factors (VIF)

Variance Inflation Factors (VIF)

The variance inflation factor for the jth independent (or predictor) variable xj is

VIF_j = 1 / (1 - R_j²)

where R_j² is the multiple coefficient of determination for the regression model relating xj to the other predictors x1, ..., xj-1, xj+1, ..., xk:

x_j = β0 + β1x1 + β2x2 + ... + βj-1xj-1 + βj+1xj+1 + ... + βkxk + ε

Notes:
VIF_j = 1 implies xj is not related to the other predictors.
max(VIF_j) > 10 suggests severe multicollinearity.
mean(VIF_j) substantially greater than 1 suggests severe multicollinearity.
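The definition above can be computed directly by regressing each predictor on the others. Not part of the original slides: a numpy sketch on small made-up data, where the third predictor is built to be nearly the sum of the first two so that the inflated VIFs are visible.

```python
import numpy as np

# Made-up data for illustration: x3 is nearly x1 + x2, so it is collinear
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + rng.normal(scale=0.1, size=n)
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing x_j on the others."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    y_hat = A @ np.linalg.lstsq(A, y, rcond=None)[0]
    r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)
    return 1.0 / (1.0 - r2)

print([round(vif(X, j), 1) for j in range(3)])  # the near-collinear x3 shows VIF >> 10
```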

Example: Variance Inflation Factors (VIF)

Example: The Sales Territory Performance Case — MegaStat Output
(The original slide shows the VIF table as MegaStat output.)

max(VIF_j) = 5.639, mean(VIF_j) = 2.667: probably not severe multicollinearity.

12.12 Residual Analysis in Multiple Regression

For an observed value of yi, the residual is

e_i = y_i - ŷ_i = y_i - (b0 + b1x_i1 + ... + bk x_ik)

If the regression assumptions hold, the residuals should look like a random sample from a normal distribution with mean 0 and variance σ².

Residual Plots:
Residuals versus each independent variable
Residuals versus predicted ŷ's
Residuals in time order (if the response is a time series)
Histogram of residuals
Normal plot of the residuals

Nonconstant Variance: Remedial Measures

Example 13.1: The QHIC Case

Dividing the model by x gives the transformed model

y/x = β0(1/x) + β1 + β2x + ε/x

Minitab Output (no constant term):

Upkeep/V = - 53.5 1/V + 3.41 One + 0.0112 Value

Predictor    Coef       SE Coef    T       P
Noconstant
1/V          -53.50     83.20      -0.64   0.524
One          3.409      1.321      2.58    0.014
Value        0.011224   0.004627   2.43    0.020

Predicted Values (1/V = 0.004545, One = 1, Value = 220)
Fit     SE Fit   95.0% CI         95.0% PI
5.635   0.162    (5.306, 5.964)   (3.994, 7.276)

Plots: residuals versus fitted values, x, and predicted responses.

12.13 Diagnostics for Detecting Outlying and Influential Observations

Observation 1: Outlying with respect to its y value
Observation 2: Outlying with respect to its x value
Observation 3: Outlying with respect to its x value, and its y value is not consistent with the regression relationship (influential)

Example: Influence Diagnostics

Hospital Labor Needs Case. Model relating y to x1, x2, x3, where
y = monthly labor hours required
x1 = monthly X-ray exposures
x2 = monthly occupied bed days
x3 = average length of patient stay (days)

Observation   Hours        Predicted    Residual    Leverage   Studentized   Studentized        Cook's D
                                                               Residual      Deleted Residual
1             566.520      688.409      -121.889    0.121      -0.211        -0.203             0.002
2             696.820      721.848      -25.028     0.226      -0.046        -0.044             0.000
3             1,033.150    965.393      67.757      0.130      0.118         0.114              0.001
4             1,603.620    1,172.464    431.156     0.159      0.765         0.752              0.028
5             1,611.370    1,526.780    84.590      0.085      0.144         0.138              0.000
6             1,613.270    1,993.869    -380.599    0.112      -0.657        -0.642             0.014
7             1,854.170    1,676.558    177.612     0.084      0.302         0.291              0.002
8             2,160.550    1,791.405    369.145     0.083      0.627         0.612              0.009
9             2,305.580    2,798.761    -493.181    0.085      -0.838        -0.828             0.016
10            3,503.930    4,191.333    -687.403    0.120      -1.192        -1.214             0.049
11            3,571.890    3,190.957    380.933     0.077      0.645         0.630              0.009
12            3,741.400    4,364.502    -623.102    0.177      -1.117        -1.129             0.067
13            4,026.520    4,364.229    -337.709    0.064      -0.568        -0.553             0.006
14            10,343.810   8,713.307    1,630.503   0.146      2.871         4.558              0.353
15            11,732.170   12,080.864   -348.694    0.682      -1.005        -1.006             0.541
16            15,414.940   15,133.026   281.914     0.785      0.990         0.989              0.897
17            18,854.450   19,260.453   -406.003    0.863      -1.786        -1.975             5.033

Leverage Values

Leverage h_i = distance value

An observation is outlying with respect to its x values if it has a large leverage value, greater than 2(k+1)/n.

Hospital Labor Needs Case: n = 17, k = 3, 2(3+1)/17 = 0.4706
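Applying the leverage rule to the table is a one-pass comparison. Not part of the original slides: a short sketch that flags the high-leverage observations.

```python
# Flagging high-leverage observations in the hospital labor needs case
leverage = [0.121, 0.226, 0.130, 0.159, 0.085, 0.112, 0.084, 0.083, 0.085,
            0.120, 0.077, 0.177, 0.064, 0.146, 0.682, 0.785, 0.863]
n, k = 17, 3

threshold = 2 * (k + 1) / n   # 2(k+1)/n
outlying = [i + 1 for i, h in enumerate(leverage) if h > threshold]
print(round(threshold, 4), outlying)  # 0.4706 [15, 16, 17]
```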

Residuals and Studentized Residuals

Residual: e_i

Studentized Residual = Residual / Residual Standard Error = e_i / (s · √(1 - h_i))

An observation is outlying with respect to its y value if it has a large studentized (or standardized) residual, |StRes| greater than 2.
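The studentized residuals can be reproduced from the residual and leverage columns of the diagnostics table. Not part of the original slides (the last digit differs slightly from the table because the table was computed from unrounded data):

```python
import math

# Studentized residuals e_i / (s * sqrt(1 - h_i)) for the hospital labor case
residuals = [-121.889, -25.028, 67.757, 431.156, 84.590, -380.599, 177.612,
             369.145, -493.181, -687.403, 380.933, -623.102, -337.709,
             1630.503, -348.694, 281.914, -406.003]
leverage = [0.121, 0.226, 0.130, 0.159, 0.085, 0.112, 0.084, 0.083, 0.085,
            0.120, 0.077, 0.177, 0.064, 0.146, 0.682, 0.785, 0.863]
n, k = 17, 3

sse = sum(e ** 2 for e in residuals)
s = math.sqrt(sse / (n - (k + 1)))
stres = [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]
print(round(stres[13], 2))  # observation 14: approximately 2.87 (table: 2.871), the only |StRes| > 2
```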


Studentized Deleted Residuals

Deleted residual: d_i = e_i / (1 - h_i)

Studentized Deleted Residual = Deleted Residual / Deleted Residual Standard Error:

t_i = e_i · √[(n - k - 2) / (SSE(1 - h_i) - e_i²)]

An observation is outlying with respect to its y value if it has a large studentized deleted residual, |tRes| greater than t_α/2 (with n-k-2 degrees of freedom).

Hospital Labor Needs Case: n-k-2 = 17-3-2 = 12, so |tRes| > t.025 = 2.179
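The studentized deleted residual for observation 14 follows from the same residual and leverage columns. Not part of the original slides (the last digit differs slightly from the table because the table was computed from unrounded data):

```python
import math

# Studentized deleted residual t_i = e_i * sqrt((n-k-2) / (SSE(1-h_i) - e_i^2))
residuals = [-121.889, -25.028, 67.757, 431.156, 84.590, -380.599, 177.612,
             369.145, -493.181, -687.403, 380.933, -623.102, -337.709,
             1630.503, -348.694, 281.914, -406.003]
leverage = [0.121, 0.226, 0.130, 0.159, 0.085, 0.112, 0.084, 0.083, 0.085,
            0.120, 0.077, 0.177, 0.064, 0.146, 0.682, 0.785, 0.863]
n, k = 17, 3

sse = sum(e ** 2 for e in residuals)

def t_deleted(e, h):
    return e * math.sqrt((n - k - 2) / (sse * (1 - h) - e ** 2))

t14 = t_deleted(residuals[13], leverage[13])
print(t14)  # approximately 4.558, well beyond t.025 = 2.179
```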


Cook's Distance

Cook's Distance:  D_i = [e_i² / ((k+1)s²)] · [h_i / (1 - h_i)²]

An observation is influential with respect to the estimated regression parameters b0, b1, ..., bk if it has a large Cook's distance, D_i greater than F.50 (with k+1 and n-(k+1) degrees of freedom).

Hospital Labor Needs Case: k+1 = 4, n-(k+1) = 17-4 = 13, so D_i > F.50 = 0.8845

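Cook's distances can likewise be reproduced from the residual and leverage columns. Not part of the original slides (values differ slightly in the last digits from the table, which was computed from unrounded data):

```python
# Cook's distance D_i = [e_i^2 / ((k+1) s^2)] * [h_i / (1-h_i)^2]
residuals = [-121.889, -25.028, 67.757, 431.156, 84.590, -380.599, 177.612,
             369.145, -493.181, -687.403, 380.933, -623.102, -337.709,
             1630.503, -348.694, 281.914, -406.003]
leverage = [0.121, 0.226, 0.130, 0.159, 0.085, 0.112, 0.084, 0.083, 0.085,
            0.120, 0.077, 0.177, 0.064, 0.146, 0.682, 0.785, 0.863]
n, k = 17, 3

sse = sum(e ** 2 for e in residuals)
s2 = sse / (n - (k + 1))
cooks = [(e ** 2 / ((k + 1) * s2)) * (h / (1 - h) ** 2)
         for e, h in zip(residuals, leverage)]

print(cooks[16])  # observation 17: approximately 5.03 (table: 5.033), far above F.50 = 0.8845
```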

Multiple Regression — Summary

Part 1: Basic Multiple Regression
12.1  The Linear Regression Model
12.2  The Least Squares Point Estimates
12.3  The Mean Squared Error and the Standard Error
12.4  Model Utility: R2, Adjusted R2, and the F Test
12.5  Testing the Significance of an Independent Variable
12.6  Confidence Intervals and Prediction Intervals

Part 2: Using Squared and Interaction Terms
12.7  The Quadratic Regression Model
12.8  Interaction

Part 3: Dummy Variables and Advanced Statistical Inferences
12.9   Dummy Variables to Model Qualitative Variables
12.10  The Partial F Test: Testing a Portion of a Model

Part 4: Model Building and Model Diagnostics
12.11  Model Building and the Effects of Multicollinearity
12.12  Residual Analysis in Multiple Regression
12.13  Diagnostics for Detecting Outlying and Influential Observations
