Chapter Twelve
Multiple Regression
McGraw-Hill/Irwin
Copyright 2003 by The McGraw-Hill Companies, Inc. All rights reserved.
Multiple Regression

Part 1. Basic Multiple Regression
12.1 The Linear Regression Model
12.2 The Least Squares Point Estimates
12.3 The Mean Squared Error and the Standard Error
12.4 Model Utility: R², Adjusted R², and the F Test
12.5 Testing Significance of an Independent Variable
12.6 Confidence Intervals and Prediction Intervals

Part 2. Using Squared and Interaction Terms
12.7 The Quadratic Regression Model
12.8 Interaction

Part 3. Dummy Variables and Advanced Statistical Inferences
12.9 Dummy Variables to Model Qualitative Variables
12.10 The Partial F Test: Testing a Portion of a Model

Part 4. Model Building and Model Diagnostics
12.11 Model Building and the Effects of Multicollinearity
12.12 Residual Analysis in Multiple Regression
12.13 Diagnostics for Detecting Outlying and Influential Observations
12.1 The Linear Regression Model

The linear regression model relating y to x1, x2, ..., xk is

$y = \mu_{y|x_1, x_2, \ldots, x_k} + \varepsilon = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$

where
$\mu_{y|x_1, x_2, \ldots, x_k} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$ is the mean value of the dependent variable y when the values of the independent variables are x1, x2, ..., xk.
$\beta_0, \beta_1, \beta_2, \ldots, \beta_k$ are the regression parameters relating the mean value of y to x1, x2, ..., xk.
$\varepsilon$ is an error term that describes the effects on y of all factors other than the independent variables x1, x2, ..., xk.
Example: The Linear Regression Model

Example 12.1: The Fuel Consumption Case

Week   Average Hourly Temperature, x1 (°F)   Chill Index, x2   Fuel Consumption, y (MMcf)
1      28.0                                  18                12.4
2      28.0                                  14                11.7
3      32.5                                  24                12.4
4      39.0                                  22                10.8
5      45.9                                  8                 9.4
6      57.8                                  16                9.5
7      58.1                                  1                 8.0
8      62.5                                  0                 7.5

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$

[Figure: The linear regression model illustrated]
The Regression Model Assumptions

Model: $y = \mu_{y|x_1, x_2, \ldots, x_k} + \varepsilon = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$

Assumptions about the model error terms, the $\varepsilon$'s:

Mean Zero: The mean of the error terms is equal to 0.
Constant Variance: The variance of the error terms, $\sigma^2$, is the same for every combination of values of x1, x2, ..., xk.
Normality: The error terms follow a normal distribution for every combination of values of x1, x2, ..., xk.
Independence: The values of the error terms are statistically independent of each other.
12.2 Least Squares Estimates and Prediction

Estimation/Prediction Equation:

$\hat{y} = b_0 + b_1 x_{01} + b_2 x_{02} + \cdots + b_k x_{0k}$

is the point estimate of the mean value of the dependent variable when the values of the independent variables are x01, x02, ..., x0k. It is also the point prediction of an individual value of the dependent variable when the values of the independent variables are x01, x02, ..., x0k.
b0, b1, b2, ..., bk are the least squares point estimates of the parameters $\beta_0, \beta_1, \beta_2, \ldots, \beta_k$.
x01, x02, ..., x0k are specified values of the independent predictor variables x1, x2, ..., xk.
Example: Least Squares Estimation

Minitab Output

FuelCons = 13.1 - 0.0900 Temp + 0.0825 Chill

Predictor   Coef       StDev     T       P
Constant    13.1087    0.8557    15.32   0.000
Temp        -0.09001   0.01408   -6.39   0.001
Chill       0.08249    0.02200   3.75    0.013

S = 0.3671   R-Sq = 97.4%   R-Sq(adj) = 96.3%

Analysis of Variance
Source           DF   SS       MS       F       P
Regression       2    24.875   12.438   92.30   0.000
Residual Error   5    0.674    0.135
Total            7    25.549

Predicted Values (Temp = 40, Chill = 10)
Fit      StDev Fit   95.0% CI          95.0% PI
10.333   0.170       (9.895, 10.771)   (9.293, 11.374)
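
The least squares estimates reported by Minitab can be reproduced, at least approximately, by solving the normal equations $(X'X)b = X'y$ for the fuel consumption data. The sketch below is one way to do that using only the Python standard library; the helper names `solve` and `least_squares` are illustrative choices and are not part of the original example.

```python
# Fuel consumption data from Example 12.1 (8 weeks).
temp = [28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5]
chill = [18, 14, 24, 22, 8, 16, 1, 0]
fuel = [12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def least_squares(rows, y):
    """Return the coefficient vector b that solves the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Design matrix: a leading 1 for the intercept, then Temp and Chill.
X = [[1.0, t, c] for t, c in zip(temp, chill)]
b0, b1, b2 = least_squares(X, fuel)
print(f"b0 = {b0:.4f}, b1 = {b1:.5f}, b2 = {b2:.5f}")

# Point estimate / point prediction at Temp = 40, Chill = 10.
fit = b0 + b1 * 40 + b2 * 10
print(f"fit at Temp = 40, Chill = 10: {fit:.3f}")
```

Solving the normal equations directly is fine for a small, well-behaved data set like this one; statistical software typically uses more numerically stable decompositions.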
Example: Point Predictions and Residuals

Predicted fuel consumption: $\hat{y} = 13.1087 - 0.0900 x_1 + 0.0825 x_2$. Residual: $e = y - \hat{y}$.

Week   Temperature, x1 (°F)   Chill Index, x2   Observed y (MMcf)   Predicted ŷ   Residual e
1      28.0                   18                12.4                12.0733       0.3267
2      28.0                   14                11.7                11.7433       -0.0433
3      32.5                   24                12.4                12.1631       0.2369
4      39.0                   22                10.8                11.4131       -0.6131
5      45.9                   8                 9.4                 9.6372        -0.2372
6      57.8                   16                9.5                 9.2260        0.2740
7      58.1                   1                 8.0                 7.9616        0.0384
8      62.5                   0                 7.5                 7.4831        0.0169
12.3 Mean Square Error and Standard Error

Sum of Squared Errors: $SSE = \sum e_i^2 = \sum (y_i - \hat{y}_i)^2$

Mean Square Error, point estimate of residual variance $\sigma^2$: $s^2 = MSE = \frac{SSE}{n-(k+1)}$

Standard Error, point estimate of residual standard deviation $\sigma$: $s = \sqrt{MSE} = \sqrt{\frac{SSE}{n-(k+1)}}$

Example 12.3 The Fuel Consumption Case

Source           DF   SS       MS       F       P
Regression       2    24.875   12.438   92.30   0.000
Residual Error   5    0.674    0.135
Total            7    25.549

$s^2 = MSE = \frac{SSE}{n-(k+1)} = \frac{0.674}{8-3} = 0.1348$, $s = \sqrt{s^2} = \sqrt{0.1348} = 0.3671$
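
As a quick check of the arithmetic, SSE, MSE, and s can be recomputed from the observed and predicted fuel consumption values listed for the eight weeks. This is a minimal sketch; because the predicted values are rounded to four decimals, the results agree with the Minitab output only up to rounding.

```python
import math

# Observed and predicted fuel consumption (MMcf) for weeks 1-8.
observed = [12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5]
predicted = [12.0733, 11.7433, 12.1631, 11.4131, 9.6372, 9.2260, 7.9616, 7.4831]

n = len(observed)  # number of observations
k = 2              # number of independent variables (Temp, Chill)

# Sum of squared errors: sum of squared residuals e_i = y_i - yhat_i.
sse = sum((y - yhat) ** 2 for y, yhat in zip(observed, predicted))

# Mean square error and standard error use n - (k + 1) degrees of freedom.
mse = sse / (n - (k + 1))
s = math.sqrt(mse)

print(f"SSE = {sse:.3f}, MSE = {mse:.4f}, s = {s:.4f}")
```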
12.4 The Multiple Coefficient of Determination

The multiple coefficient of determination R² is

$R^2 = \frac{\text{Explained variation}}{\text{Total variation}}$

R² is the proportion of the total variation in y explained by the linear regression model.

Total variation = Explained variation + Unexplained variation

Total variation = $\sum (y_i - \bar{y})^2$, the Total Sum of Squares (SSTO)
Explained variation = $\sum (\hat{y}_i - \bar{y})^2$, the Regression Sum of Squares (SSR)
Unexplained variation = $\sum (y_i - \hat{y}_i)^2$, the Error Sum of Squares (SSE)

Multiple correlation coefficient: $R = \sqrt{R^2}$
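The Adjusted R²

The adjusted multiple coefficient of determination is

$\bar{R}^2 = \left(R^2 - \frac{k}{n-1}\right)\left(\frac{n-1}{n-(k+1)}\right)$

Example 12.3 The Fuel Consumption Case

S = 0.3671   R-Sq = 97.4%   R-Sq(adj) = 96.3%

Source           DF   SS       MS       F       P
Regression       2    24.875   12.438   92.30   0.000
Residual Error   5    0.674    0.135
Total            7    25.549

$R^2 = \frac{24.875}{25.549} = 0.974$, $\bar{R}^2 = \left(0.974 - \frac{2}{8-1}\right)\left(\frac{8-1}{8-(2+1)}\right) = 0.963$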
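
The R² and adjusted R² calculations can be checked directly from the sums of squares in the analysis of variance table. The sketch below also evaluates an algebraically equivalent form, $1 - \frac{SSE/(n-(k+1))}{SSTO/(n-1)}$, which is not stated in the slides but follows from the formula given there.

```python
# Sums of squares from the fuel consumption ANOVA table.
ssr = 24.875   # explained variation (Regression SS)
sse = 0.674    # unexplained variation (Residual Error SS)
ssto = 25.549  # total variation (Total SS)
n, k = 8, 2    # observations and independent variables

r2 = ssr / ssto

# Adjusted R-squared as written in the slides.
adj_r2 = (r2 - k / (n - 1)) * ((n - 1) / (n - (k + 1)))

# Equivalent form: one minus the ratio of the two variance estimates.
adj_r2_alt = 1 - (sse / (n - (k + 1))) / (ssto / (n - 1))

print(f"R-Sq = {r2:.3f}, R-Sq(adj) = {adj_r2:.3f}, alternative form = {adj_r2_alt:.3f}")
```

The adjustment penalizes R² for the number of independent variables, so adding a predictor raises the adjusted value only if it reduces the mean square error.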
F Test for Linear Regression Model

To test $H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$ versus
$H_a$: At least one of $\beta_1, \beta_2, \ldots, \beta_k$ is not equal to 0

Test Statistic:

$F(\text{model}) = \frac{(\text{Explained variation})/k}{(\text{Unexplained variation})/[n-(k+1)]}$

Reject H0 in favor of Ha if:
$F(\text{model}) > F_\alpha$ or
p-value < $\alpha$

$F_\alpha$ is based on k numerator and n-(k+1) denominator degrees of freedom.
Example: F Test for Linear Regression

Minitab Output

Source           DF   SS       MS       F       P
Regression       2    24.875   12.438   92.30   0.000
Residual Error   5    0.674    0.135
Total            7    25.549

Test Statistic:

$F(\text{model}) = \frac{(\text{Explained variation})/k}{(\text{Unexplained variation})/[n-(k+1)]} = \frac{24.875/2}{0.674/(8-3)} = 92.30$

F-test at the $\alpha$ = 0.05 level of significance: reject H0, since
$F(\text{model}) = 92.30 > 5.79 = F_{.05}$ and
p-value = 0.000 < 0.05 = $\alpha$

$F_\alpha$ is based on 2 numerator and 5 denominator degrees of freedom.
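
The F(model) statistic can be recomputed from the ANOVA sums of squares as a hedged check. In the sketch below the critical value 5.79 is taken from the example rather than computed. The p-value line uses a closed form that holds only when the numerator degrees of freedom equal 2, $P(F > f) = (1 + 2f/d_2)^{-d_2/2}$; that shortcut is an addition for illustration and is not part of the original slides.

```python
# Sums of squares from the fuel consumption ANOVA table.
ssr = 24.875  # explained variation
sse = 0.674   # unexplained variation
n, k = 8, 2

df_num = k            # numerator degrees of freedom
df_den = n - (k + 1)  # denominator degrees of freedom

f_model = (ssr / df_num) / (sse / df_den)

f_crit = 5.79  # F.05 with 2 and 5 d.f., as quoted in the example
reject_h0 = f_model > f_crit

# Closed-form upper-tail probability, valid ONLY because df_num == 2 here.
p_value = (1 + 2 * f_model / df_den) ** (-df_den / 2)

print(f"F(model) = {f_model:.2f}, reject H0: {reject_h0}, p-value = {p_value:.6f}")
```

Working from the rounded sums of squares gives a statistic slightly below Minitab's 92.30, which is computed from unrounded values.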
12.5 Testing Significance of the Independent Variable

If the regression assumptions hold, we can reject $H_0: \beta_j = 0$ at the $\alpha$ level of significance (probability of Type I error equal to $\alpha$) if and only if the appropriate rejection point condition holds or, equivalently, if the corresponding p-value is less than $\alpha$.

Alternative             Reject H0 if:                                                               p-Value
$H_a: \beta_j > 0$      $t > t_\alpha$                                                              Area under t distribution right of t
$H_a: \beta_j < 0$      $t < -t_\alpha$                                                             Area under t distribution left of t
$H_a: \beta_j \neq 0$   $|t| > t_{\alpha/2}$, that is, $t > t_{\alpha/2}$ or $t < -t_{\alpha/2}$    Twice area under t distribution right of |t|

Test Statistic: $t = \frac{b_j}{s_{b_j}}$

100(1-$\alpha$)% Confidence Interval for $\beta_j$: $[b_j \pm t_{\alpha/2}\, s_{b_j}]$

$t_\alpha$, $t_{\alpha/2}$ and p-values are based on n-(k+1) degrees of freedom.
Example: Testing and Estimation for $\beta$'s

Minitab Output

Predictor   Coef       StDev     T       P
Constant    13.1087    0.8557    15.32   0.000
Temp        -0.09001   0.01408   -6.39   0.001
Chill       0.08249    0.02200   3.75    0.013

Test:
$t = \frac{b_2}{s_{b_2}} = \frac{0.08249}{0.02200} = 3.75 > 2.571 = t_{.025}$
p-value = 2 × P(t > 3.75) = 0.013

Interval:
$[b_2 \pm t_{\alpha/2}\, s_{b_2}] = [0.08249 \pm (2.571)(0.02200)] = [0.08249 \pm 0.05656] = [0.02593, 0.13905]$

Chill is significant at the $\alpha$ = 0.05 level, but not at $\alpha$ = 0.01.

$t_\alpha$, $t_{\alpha/2}$ and p-values are based on 5 degrees of freedom.
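
The t statistic and confidence interval for the Chill coefficient can be verified with a few lines of arithmetic. This sketch takes the coefficient, its standard error, and the rejection point 2.571 straight from the example; nothing is looked up from a t table in code.

```python
# Chill coefficient and its standard error from the Minitab output.
b2 = 0.08249
s_b2 = 0.02200

t_crit = 2.571  # t.025 with 5 degrees of freedom, as quoted in the example

# Test statistic for H0: beta_2 = 0.
t_stat = b2 / s_b2

# Two-sided test at alpha = 0.05: reject when |t| exceeds the rejection point.
reject_h0 = abs(t_stat) > t_crit

# 95% confidence interval for beta_2.
half_width = t_crit * s_b2
lower, upper = b2 - half_width, b2 + half_width

print(f"t = {t_stat:.2f}, reject H0 at 0.05: {reject_h0}")
print(f"95% CI for beta_2: [{lower:.5f}, {upper:.5f}]")
```

Because the interval excludes 0, it leads to the same conclusion as the two-sided test at the 0.05 level.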
12.6 Confidence and Prediction Intervals

Prediction: $\hat{y} = b_0 + b_1 x_{01} + b_2 x_{02} + \cdots + b_k x_{0k}$

If the regression assumptions hold,

100(1-$\alpha$)% confidence interval for the mean value of y:
$[\hat{y} \pm t_{\alpha/2}\, s_{\hat{y}}]$, where $s_{\hat{y}} = s\sqrt{\text{Distance value}}$

100(1-$\alpha$)% prediction interval for an individual value of y:
$[\hat{y} \pm t_{\alpha/2}\, s_{(y-\hat{y})}]$, where $s_{(y-\hat{y})} = s\sqrt{1 + \text{Distance value}}$

The distance value requires matrix algebra; see Appendix G on the CD-ROM.

$t_{\alpha/2}$ is based on n-(k+1) degrees of freedom.
Example: Confidence and Prediction Intervals

Minitab Output

FuelCons = 13.1 - 0.0900 Temp + 0.0825 Chill

Predicted Values (Temp = 40, Chill = 10)
Fit      StDev Fit   95.0% CI          95.0% PI
10.333   0.170       (9.895, 10.771)   (9.293, 11.374)

95% Confidence Interval:
$[\hat{y} \pm t_{\alpha/2}\, s\sqrt{\text{Distance value}}] = [10.333 \pm (2.571)(0.3671)\sqrt{0.2144515}] = [10.333 \pm 0.438] = [9.895, 10.771]$

95% Prediction Interval:
$[\hat{y} \pm t_{\alpha/2}\, s\sqrt{1 + \text{Distance value}}] = [10.333 \pm (2.571)(0.3671)\sqrt{1 + 0.2144515}] = [10.333 \pm 1.041] = [9.292, 11.374]$
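
Given the distance value, both intervals reduce to simple arithmetic. The following sketch recomputes them for Temp = 40 and Chill = 10 using the quantities quoted in the example (fit 10.333, s = 0.3671, $t_{.025}$ = 2.571, distance value 0.2144515); the function name `interval` is an illustrative choice.

```python
import math

fit = 10.333          # point estimate / prediction at Temp = 40, Chill = 10
s = 0.3671            # standard error from the Minitab output
t_crit = 2.571        # t.025 with 5 degrees of freedom
distance = 0.2144515  # distance value quoted in the example


def interval(center, half_width):
    """Return the (lower, upper) bounds of center +/- half_width."""
    return center - half_width, center + half_width


# Confidence interval for the mean value of y.
ci_half = t_crit * s * math.sqrt(distance)
ci = interval(fit, ci_half)

# Prediction interval for an individual value of y: the extra 1 under the
# square root accounts for the variability of a single observation.
pi_half = t_crit * s * math.sqrt(1 + distance)
pi = interval(fit, pi_half)

print(f"95% CI: [{ci[0]:.3f}, {ci[1]:.3f}] (half-width {ci_half:.3f})")
print(f"95% PI: [{pi[0]:.3f}, {pi[1]:.3f}] (half-width {pi_half:.3f})")
```

The results match the Minitab intervals up to rounding in the last digit, since Minitab works from unrounded inputs.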
12.7 The Quadratic Regression Model

Model: $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$
Example: Quadratic Regression Model

Example 12.11: The Gasoline Additive Case

Units of Additive, x   Mileage, y (MPG)
0                      25.8
0                      26.1
0                      25.4
1                      29.6
1                      29.2
1                      29.8
2                      32.0
2                      31.4
2                      31.7
3                      31.7
3                      31.5
3                      31.2
4                      29.4
4                      29.0
4                      29.5
Example: Quadratic Regression Model

Example 12.11: The Gasoline Additive Case

Minitab Output

Mileage = 25.7 + 4.98 Units - 1.02 UnitsSq

Predictor   Coef       StDev     T        P
Constant    25.7152    0.1554    165.43   0.000
Units       4.9762     0.1841    27.02    0.000
UnitsSq     -1.01905   0.04414   -23.09   0.000

S = 0.2861   R-Sq = 98.6%   R-Sq(adj) = 98.3%

Source           DF   SS       MS       F        P
Regression       2    67.915   33.958   414.92   0.000
Residual Error   12   0.982    0.082
Total            14   68.897

Predicted Values (Units = 2.44, UnitsSq = (2.44)(2.44) = 5.9536)
Fit       StDev Fit   95.0% CI             95.0% PI
31.7901   0.1111      (31.5481, 32.0322)   (31.1215, 32.4588)

$\hat{y} = 25.7152 + 4.9762(2.44) - 1.01905(2.44)^2 = 31.7901$ mpg
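
The fitted quadratic can be evaluated at any number of additive units, and setting its derivative to zero gives the additive level with the highest predicted mileage, $x^* = -b_1/(2 b_2)$. That the example's value of 2.44 units corresponds to this maximum is an inference from the coefficients, not something the slides state. A minimal sketch:

```python
# Least squares estimates from the Minitab output.
b0, b1, b2 = 25.7152, 4.9762, -1.01905


def predicted_mileage(units):
    """Evaluate the fitted quadratic at a given number of additive units."""
    return b0 + b1 * units + b2 * units ** 2


# Point prediction used in the example.
fit = predicted_mileage(2.44)

# Because b2 < 0 the parabola opens downward, so the vertex is a maximum.
x_star = -b1 / (2 * b2)

print(f"predicted mileage at 2.44 units: {fit:.4f} mpg")
print(f"units maximizing predicted mileage: {x_star:.3f}")
```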
12.8 Interaction

Example 12.13: The Bonner Frozen Foods Case

Sales Region   Radio and TV Expenditures, x1   Print Expenditures, x2   Sales Volume, y
1              1                               1                        3.27
2              1                               2                        8.38
3              1                               3                        11.28
4              1                               4                        14.50
5              1                               5                        19.63
6              2                               1                        5.84
7              2                               2                        10.01
8              2                               3                        12.46
9              2                               4                        16.67
10             2                               5                        19.83
11             3                               1                        8.51
12             3                               2                        10.14
13             3                               3                        14.75
14             3                               4                        17.99
15             3                               5                        19.85
16             4                               1                        9.46
17             4                               2                        12.61
18             4                               3                        15.50
19             4                               4                        17.68
20             4                               5                        21.02
21             5                               1                        12.23
22             5                               2                        13.58
23             5                               3                        16.77
24             5                               4                        20.56
25             5                               5                        21.05
Modeling Interaction

Model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon$

$x_1 x_2$ is a cross-product or interaction term.

Example 12.13: The Bonner Frozen Foods Case

Minitab Output

Sales = - 2.35 + 2.36 RadioTV + 4.18 Print - 0.349 Interact

Predictor   Coef       StDev     T       P
Constant    -2.3497    0.6883    -3.41   0.003
RadioTV     2.3611     0.2075    11.38   0.000
Print       4.1831     0.2075    20.16   0.000
Interact    -0.34890   0.06257   -5.58   0.000

S = 0.6257   R-Sq = 98.6%   R-Sq(adj) = 98.4%

Source           DF   SS       MS       F        P
Regression       3    590.41   196.80   502.67   0.000
Residual Error   21   8.22     0.39
Total            24   598.63

Predicted Values (RadioTV = 2, Print = 5, Interact = (2)(5) = 10)
Fit      StDev Fit   95.0% CI           95.0% PI
19.799   0.265       (19.247, 20.351)   (18.385, 21.213)
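
With an interaction term, the effect of one variable depends on the level of the other: in the fitted model the slope for Print is $b_2 + b_3 x_1$. The sketch below reproduces the point prediction at RadioTV = 2, Print = 5 and evaluates that slope at two RadioTV levels. The function names are illustrative and not part of the original example.

```python
# Least squares estimates from the Minitab output.
b0, b1, b2, b3 = -2.3497, 2.3611, 4.1831, -0.34890


def predicted_sales(radio_tv, print_exp):
    """Point prediction from the fitted interaction model."""
    return b0 + b1 * radio_tv + b2 * print_exp + b3 * radio_tv * print_exp


def print_slope(radio_tv):
    """Change in predicted sales per unit of Print, holding RadioTV fixed."""
    return b2 + b3 * radio_tv


fit = predicted_sales(2, 5)
print(f"predicted sales at RadioTV = 2, Print = 5: {fit:.3f}")

# The negative interaction coefficient means Print spending adds less to
# predicted sales when RadioTV spending is already high.
for level in (1, 5):
    print(f"Print slope at RadioTV = {level}: {print_slope(level):.4f}")
```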
12.9 Using Dummy Variables to Model Qualitative Independent Variables

Example 12.15: The Electronics World Case

Store   Number of Households, x   Location   Location Dummy, DM   Sales Volume, y
1       161                       Street     0                    157.27
2       99                        Street     0                    93.28
3       135                       Street     0                    136.81
4       120                       Street     0                    123.79
5       164                       Street     0                    153.51
6       221                       Mall       1                    241.74
7       179                       Mall       1                    201.54
8       204                       Mall       1                    206.71
9       214                       Mall       1                    229.78
10      101                       Mall       1                    135.22

Location Dummy Variable:
DM = 1 if a store is in a mall location
DM = 0 otherwise
Example: Regression with a Dummy Variable

Example 12.15: The Electronics World Case

Minitab Output

Sales = 17.4 + 0.851 Households + 29.2 DM

Predictor   Coef      StDev     T       P
Constant    17.360    9.447     1.84    0.109
Househol    0.85105   0.06524   13.04   0.000
DM          29.216    5.594     5.22    0.001

S = 7.329   R-Sq = 98.3%   R-Sq(adj) = 97.8%

Source           DF   SS      MS      F        P
Regression       2    21412   10706   199.32   0.000
Residual Error   7    376     54
Total            9    21788
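
In this fitted model the dummy variable shifts the intercept: for the same number of households, a mall store's predicted sales exceed a street store's by the DM coefficient. The sketch below illustrates this. The household count of 200 is a hypothetical input chosen for illustration and does not come from the data.

```python
# Least squares estimates from the Minitab output.
b0 = 17.360           # constant
b_households = 0.85105
b_mall = 29.216       # coefficient of the location dummy DM


def predicted_sales(households, in_mall):
    """Point prediction; in_mall is True for a mall store, False for a street store."""
    dm = 1 if in_mall else 0
    return b0 + b_households * households + b_mall * dm


households = 200  # hypothetical value, for illustration only
street = predicted_sales(households, in_mall=False)
mall = predicted_sales(households, in_mall=True)

print(f"street store: {street:.2f}, mall store: {mall:.2f}")
print(f"difference attributable to location: {mall - street:.3f}")
```

Because the model has no Households × DM interaction term, this difference is the same at every household count.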
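12.10 The Partial F Test: Testing the Significance of a Portion of a Regression Model

Complete model: $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_g x_g + \beta_{g+1} x_{g+1} + \cdots + \beta_k x_k + \varepsilon$
Reduced model: $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_g x_g + \varepsilon$

To test $H_0: \beta_{g+1} = \beta_{g+2} = \cdots = \beta_k = 0$ versus
$H_a$: At least one of $\beta_{g+1}, \beta_{g+2}, \ldots, \beta_k$ is not equal to 0

Partial F Statistic:

$F = \frac{(SSE_R - SSE_C)/(k-g)}{SSE_C/[n-(k+1)]}$

where $SSE_R$ and $SSE_C$ are the error sums of squares of the reduced and complete models.

Reject H0 in favor of Ha if:
$F > F_\alpha$ or
p-value < $\alpha$

$F_\alpha$ is based on k-g numerator and n-(k+1) denominator degrees of freedom.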
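
The partial F statistic is straightforward to compute once both models have been fitted. The sketch below defines a small helper (the name `partial_f` is an illustrative choice) and applies it to the fuel consumption case with the reduced model containing only the intercept (g = 0). In that special case $SSE_R$ equals the total variation, so the partial F statistic coincides with the overall F(model) statistic.

```python
def partial_f(sse_reduced, sse_complete, n, k, g):
    """Partial F statistic for testing beta_{g+1} = ... = beta_k = 0.

    n is the number of observations, k the number of independent variables
    in the complete model, and g the number in the reduced model.
    """
    numerator = (sse_reduced - sse_complete) / (k - g)
    denominator = sse_complete / (n - (k + 1))
    return numerator / denominator


# Fuel consumption case: complete model uses Temp and Chill (k = 2).
# With an intercept-only reduced model (g = 0), SSE_R is the total variation.
sse_r = 25.549  # Total SS from the ANOVA table
sse_c = 0.674   # Residual Error SS of the complete model

f_partial = partial_f(sse_r, sse_c, n=8, k=2, g=0)
print(f"partial F with g = 0: {f_partial:.2f}")
```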
12.11 Model Building and the Effects of Multicollinearity

Example: The Sales Territory Performance Case

Sales     Time     MktPoten   Adver      MktShare   Change   Accts    WkLoad   Rating
3669.88   43.10    74065.11   4582.88    2.51       0.34     74.86    15.05    4.9
3473.95   108.13   58117.30   5539.78    5.51       0.15     107.32   19.97    5.1
2295.10   13.82    21118.49   2950.38    10.91      -0.72    96.75    17.34    2.9
4675.56   186.18   68521.27   2243.07    8.27       0.17     195.12   13.40    3.4
6125.96   161.79   57805.11   7747.08    9.15       0.50     180.44   17.64    4.6
2134.94   8.94     37806.94   402.44     5.51       0.15     104.88   16.22    4.5
5031.66   365.04   50935.26   3140.62    8.54       0.55     256.10   18.80    4.6
3367.45   220.32   35602.08   2086.16    7.07       -0.49    126.83   19.86    2.3
6519.45   127.64   46176.77   8846.25    12.54      1.24     203.25   17.42    4.9
4876.37   105.69   42053.24   5673.11    8.85       0.31     119.51   21.41    2.8
2468.27   57.72    36829.71   2761.76    5.38       0.37     116.26   16.32    3.1
2533.31   23.58    33612.67   1991.85    5.43       -0.65    142.28   14.51    4.2
2408.11   13.82    21412.79   1971.52    8.48       0.64     89.43    19.35    4.3
2337.38   13.82    20416.87   1737.38    7.80       1.01     84.55    20.02    4.2
4586.95   86.99    36272.00   10694.20   10.34      0.11     119.51   15.26    5.5
2729.24   165.85   23093.26   8618.61    5.15       0.04     80.49    15.87    3.6
3289.40   116.26   26879.59   7747.89    6.64       0.68     136.58   7.81     3.4
2800.78   42.28    39571.96   4565.81    5.45       0.66     78.86    16.00    4.2
3264.20   52.84    51866.15   6022.70    6.31       -0.10    136.58   17.44    3.6
3453.62   165.04   58749.82   3721.10    6.35       -0.03    138.21   17.98    3.1
1741.45   10.57    23990.82   860.97     7.37       -1.63    75.61    20.99    1.6
2035.75   13.82    25694.86   3571.51    8.39       -0.43    102.44   21.66    3.4
1578.00   8.13     23736.35   2845.50    5.15       0.04     76.42    21.46    2.7
4167.44   58.54    34314.29   5060.11    12.88      0.22     136.58   24.78    2.8
2799.97   21.14    22809.53   3552.00    9.14       -0.74    88.62    24.96    3.9

[Figure: Correlation matrix]
Multicollinearity

Multicollinearity refers to the condition where the independent variables (or predictors) in a model are dependent, related, or correlated with each other.

Effects
Hinders the ability to use the $b_j$'s, t statistics, and p-values to assess the relative importance of predictors.
Does not hinder the ability to predict the dependent (or response) variable.

Detection
Scatter Plot Matrix
Correlation Matrix
Variance Inflation Factors (VIF)
Variance Inflation Factors (VIF)

The variance inflation factor for the jth independent (or predictor) variable $x_j$ is

$VIF_j = \frac{1}{1 - R_j^2}$

where $R_j^2$ is the multiple coefficient of determination for the regression model relating $x_j$ to the other predictors $x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_k$:

$x_j = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_{j-1} x_{j-1} + \beta_{j+1} x_{j+1} + \cdots + \beta_k x_k + \varepsilon$

Notes:
$VIF_j = 1$ implies $x_j$ is not related to the other predictors.
$\max(VIF_j) > 10$ suggests severe multicollinearity.
$\text{mean}(VIF_j)$ substantially greater than 1 suggests severe multicollinearity.
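
The VIF formula maps each $R_j^2$ to an inflation factor, and it can be inverted to see how strongly a predictor is explained by the others. The sketch below shows both directions; the function names are illustrative, and the inputs 0.0 and 0.9 are chosen only to show the two benchmark cases from the notes (VIF = 1 and VIF = 10).

```python
def vif(r2_j):
    """Variance inflation factor for a predictor whose regression on the
    other predictors has coefficient of determination r2_j."""
    return 1.0 / (1.0 - r2_j)


def r2_from_vif(vif_j):
    """Invert the VIF formula to recover R_j squared."""
    return 1.0 - 1.0 / vif_j


print(f"R_j^2 = 0.0 gives VIF = {vif(0.0):.1f}")  # predictor unrelated to the others
print(f"R_j^2 = 0.9 gives VIF = {vif(0.9):.1f}")  # the rule-of-thumb cutoff of 10

# The largest VIF reported for the sales territory example is 5.639.
print(f"VIF = 5.639 corresponds to R_j^2 = {r2_from_vif(5.639):.4f}")
```

Read this way, the cutoff of 10 says that a predictor is a concern once the other predictors explain more than 90% of its variation.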
Example: Variance Inflation Factors (VIF)

[MegaStat output]

max(VIFj) = 5.639 and mean(VIFj) = 2.667, so multicollinearity is probably not severe.
12.12 Residual Analysis in Multiple Regression

For an observed value of $y_i$, the residual is

$e_i = y_i - \hat{y}_i = y_i - (b_0 + b_1 x_{i1} + \cdots + b_k x_{ik})$

If the regression assumptions hold, the residuals should look like a random sample from a normal distribution with mean 0 and variance $\sigma^2$.

Residual Plots
Residuals versus each independent variable
Residuals versus predicted y's
Residuals in time order (if the response is a time series)
Histogram of residuals
Normal plot of the residuals
Nonconstant Variance: Remedial Measures

Example 13.1: The QHIC Case

$\frac{y}{x} = \beta_0 \frac{1}{x} + \beta_1 + \beta_2 x + \frac{\varepsilon}{x}$

Upkeep/V = - 53.5 1/V + 3.41 One + 0.0112 Value

Predictor    Coef       SE Coef    T       P
Noconstant
1/V          -53.50     83.20      -0.64   0.524
One          3.409      1.321      2.58    0.014
Value        0.011224   0.004627   2.43    0.020

Predicted Values (1/V = 0.004545, One = 1, Value = 220)
Fit     SE Fit   95.0% CI         95.0% PI
5.635   0.162    (5.306, 5.964)   (3.994, 7.276)

[Plots: residuals versus x and versus predicted responses]
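
The transformed model predicts the ratio Upkeep/Value, so a prediction on the original scale is obtained by multiplying the fitted ratio by the home value. The sketch below reproduces the fitted ratio at Value = 220 from the reported coefficients. The back-transformation to an upkeep figure and its interval is an assumption about how the output is meant to be used, since the slides show only the ratio-scale output.

```python
# Coefficients of the transformed (no-constant) model from the Minitab output.
b_inv_value = -53.50  # coefficient of 1/V
b_one = 3.409         # coefficient of the constant column "One"
b_value = 0.011224    # coefficient of Value

value = 220.0  # home value used in the example's prediction

# Fitted value of Upkeep/Value at this home value.
fit_ratio = b_inv_value * (1.0 / value) + b_one * 1.0 + b_value * value

# Assumed back-transformation: multiply the ratio-scale results by Value.
fit_upkeep = fit_ratio * value
pi_ratio = (3.994, 7.276)  # 95% PI on the ratio scale, from the output
pi_upkeep = tuple(bound * value for bound in pi_ratio)

print(f"fitted Upkeep/Value: {fit_ratio:.3f}")
print(f"implied upkeep prediction: {fit_upkeep:.1f}")
print(f"implied 95% PI for upkeep: ({pi_upkeep[0]:.1f}, {pi_upkeep[1]:.1f})")
```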
12.13 Diagnostics for Detecting Outlying and Influential Observations

Observation 1: Outlying with respect to y value
Observation 2: Outlying with respect to x value
Observation 3: Outlying with respect to x value and y value not consistent with regression relationship (Influential)
Example: Influence Diagnostics

Hospital Labor Needs Case, Model:
y = monthly labor hours required
x1 = monthly X-ray exposures
x2 = monthly occupied bed days
x3 = average length of patient stay (days)

Observation   Hours        Predicted    Residual    Leverage   Studentized Residual   Studentized Deleted Residual   Cook's D
1             566.520      688.409      -121.889    0.121      -0.211                 -0.203                         0.002
2             696.820      721.848      -25.028     0.226      -0.046                 -0.044                         0.000
3             1,033.150    965.393      67.757      0.130      0.118                  0.114                          0.001
4             1,603.620    1,172.464    431.156     0.159      0.765                  0.752                          0.028
5             1,611.370    1,526.780    84.590      0.085      0.144                  0.138                          0.000
6             1,613.270    1,993.869    -380.599    0.112      -0.657                 -0.642                         0.014
7             1,854.170    1,676.558    177.612     0.084      0.302                  0.291                          0.002
8             2,160.550    1,791.405    369.145     0.083      0.627                  0.612                          0.009
9             2,305.580    2,798.761    -493.181    0.085      -0.838                 -0.828                         0.016
10            3,503.930    4,191.333    -687.403    0.120      -1.192                 -1.214                         0.049
11            3,571.890    3,190.957    380.933     0.077      0.645                  0.630                          0.009
12            3,741.400    4,364.502    -623.102    0.177      -1.117                 -1.129                         0.067
13            4,026.520    4,364.229    -337.709    0.064      -0.568                 -0.553                         0.006
14            10,343.810   8,713.307    1,630.503   0.146      2.871                  4.558                          0.353
15            11,732.170   12,080.864   -348.694    0.682      -1.005                 -1.006                         0.541
16            15,414.940   15,133.026   281.914     0.785      0.990                  0.989                          0.897
17            18,854.450   19,260.453   -406.003    0.863      -1.786                 -1.975                         5.033
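Leverage Values

The leverage value $h_i$ of an observation is its distance value.

An observation is outlying with respect to x if it has a large leverage, greater than 2(k+1)/n.

Hospital Labor Needs Case: n = 17, k = 3, 2(3+1)/17 = 0.4706. Observations 15, 16, and 17 (leverages 0.682, 0.785, and 0.863) exceed this value.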
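Residuals and Studentized Residuals

$\text{Studentized Residual} = \frac{\text{Residual}}{\text{Residual Standard Error}} = \frac{e_i}{s\sqrt{1-h_i}}$

An observation is outlying with respect to y if it has a large studentized (or standardized) residual, |StRes| greater than 2.

Hospital Labor Needs Case: only observation 14 (studentized residual 2.871) exceeds 2 in absolute value.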
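Studentized Deleted Residuals

$\text{Studentized Deleted Residual} = \frac{\text{Deleted Residual}}{\text{Deleted Residual Standard Error}} = \frac{d_i}{s_{d_i}} = e_i \sqrt{\frac{n-k-2}{SSE(1-h_i) - e_i^2}}$

An observation is outlying with respect to y if it has a large studentized deleted residual, |tRes| greater than $t_{\alpha/2}$ [with (n-k-2) d.f.].

Hospital Labor Needs Case: with 17-3-2 = 12 d.f., an observation is outlying if |tRes| > $t_{.025}$ = 2.179. Only observation 14 (studentized deleted residual 4.558) meets this condition.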
Cook's Distance

$\text{Cook's Distance} = D_i = \frac{e_i^2}{(k+1)s^2} \cdot \frac{h_i}{(1-h_i)^2}$

An observation is influential with respect to the estimated regression parameters b0, b1, ..., bk if it has a large Cook's distance, $D_i$ greater than $F_{.50}$ [with k+1 and n-(k+1) d.f.].

Hospital Labor Needs Case: with (3+1) = 4 and (17-3-1) = 13 d.f., an observation is influential if $D_i > F_{.50}$ = 0.8845. Observations 16 ($D_i$ = 0.897) and 17 ($D_i$ = 5.033) meet this condition.
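
The leverage, studentized residual, studentized deleted residual, and Cook's distance rules can all be applied from just the residuals and leverage values of the Hospital Labor Needs case. The sketch below computes SSE from the tabled residuals and then applies the formulas given in this section. Because the tabled leverages are rounded to three decimals, the recomputed diagnostics match the tabled ones only approximately. The cutoffs 2.179 and 0.8845 are taken from the text, and the helper name `flagged` is illustrative.

```python
import math

# Residuals e_i and leverages h_i for the 17 observations (Hospital Labor Needs case).
residuals = [-121.889, -25.028, 67.757, 431.156, 84.590, -380.599, 177.612,
             369.145, -493.181, -687.403, 380.933, -623.102, -337.709,
             1630.503, -348.694, 281.914, -406.003]
leverages = [0.121, 0.226, 0.130, 0.159, 0.085, 0.112, 0.084, 0.083, 0.085,
             0.120, 0.077, 0.177, 0.064, 0.146, 0.682, 0.785, 0.863]
n, k = 17, 3

sse = sum(e * e for e in residuals)
s2 = sse / (n - (k + 1))  # mean square error
s = math.sqrt(s2)         # standard error

pairs = list(zip(residuals, leverages))
studentized = [e / (s * math.sqrt(1 - h)) for e, h in pairs]
deleted = [e * math.sqrt((n - k - 2) / (sse * (1 - h) - e * e)) for e, h in pairs]
cooks = [(e * e) / ((k + 1) * s2) * h / (1 - h) ** 2 for e, h in pairs]


def flagged(values, cutoff):
    """Return 1-based observation numbers whose absolute value exceeds cutoff."""
    return [i + 1 for i, v in enumerate(values) if abs(v) > cutoff]


leverage_cutoff = 2 * (k + 1) / n  # 2(k+1)/n
t_cutoff = 2.179                   # t.025 with n-k-2 = 12 d.f., from the text
f_cutoff = 0.8845                  # F.50 with 4 and 13 d.f., from the text

print("s =", round(s, 2))
print("high leverage:", flagged(leverages, leverage_cutoff))
print("large studentized residual:", flagged(studentized, 2))
print("large studentized deleted residual:", flagged(deleted, t_cutoff))
print("large Cook's distance:", flagged(cooks, f_cutoff))
```

Observation 14 stands out on the residual-based measures while observations 15 to 17 stand out on leverage, which is why the two kinds of diagnostics are usually read together.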
Multiple Regression

Summary:

Part 1. Basic Multiple Regression
12.1 The Linear Regression Model
12.2 The Least Squares Point Estimates
12.3 The Mean Squared Error and the Standard Error
12.4 Model Utility: R², Adjusted R², and the F Test
12.5 Testing Significance of an Independent Variable
12.6 Confidence Intervals and Prediction Intervals

Part 2. Using Squared and Interaction Terms
12.7 The Quadratic Regression Model
12.8 Interaction

Part 3. Dummy Variables and Advanced Statistical Inferences
12.9 Dummy Variables to Model Qualitative Variables
12.10 The Partial F Test: Testing a Portion of a Model

Part 4. Model Building and Model Diagnostics
12.11 Model Building and the Effects of Multicollinearity
12.12 Residual Analysis in Multiple Regression
12.13 Diagnostics for Detecting Outlying and Influential Observations
