Chapter Twelve
Multiple Regression
McGraw-Hill/Irwin
Multiple Regression
Part 1. Basic Multiple Regression
12.1
12.2
12.3
Multiple Regression
Part 3. Dummy Variables and Advanced Statistical Inferences
12.9 Dummy Variables to Model Qualitative Variables
12.10 The Partial F Test: Testing a Portion of a Model
$\beta_0, \beta_1, \beta_2, \ldots, \beta_k$ are the regression parameters relating the mean value of $y$ to $x_1, x_2, \ldots, x_k$.
$\varepsilon$ is an error term that describes the effects on $y$ of all factors other than the independent variables $x_1, x_2, \ldots, x_k$.
Average Hourly Temperature, x1 (°F)   Chill Index, x2   Fuel Consumption, y (MMcf)
28.0                                  18                12.4
28.0                                  14                11.7
32.5                                  24                12.4
39.0                                  22                10.8
45.9                                   8                 9.4
57.8                                  16                 9.5
58.1                                   1                 8.0
62.5                                   0                 7.5

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$
Minitab Output
FuelCons = 13.1 - 0.0900 Temp + 0.0825 Chill

Predictor   Coef       StDev     T       P
Constant    13.1087    0.8557    15.32   0.000
Temp        -0.09001   0.01408   -6.39   0.001
Chill       0.08249    0.02200   3.75    0.013

S = 0.3671   R-Sq = 97.4%   R-Sq(adj) = 96.3%

Analysis of Variance
Source           DF   SS       MS       F       P
Regression       2    24.875   12.438   92.30   0.000
Residual Error   5    0.674    0.135
Total            7    25.549

95.0% PI: (9.293, 11.374)
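As a check on the output above, the least squares point estimates can be recomputed from the eight observations. The following is a minimal sketch in plain Python that solves the normal equations directly; the helper names `solve` and `least_squares` are illustrative assumptions, not part of Minitab or of the original slides.

```python
# Fuel consumption data from the table above.
temp = [28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5]
chill = [18, 14, 24, 22, 8, 16, 1, 0]
fuel = [12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5]

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def least_squares(columns, y):
    """Return [b0, b1, ..., bk] from the normal equations (X'X) b = X'y."""
    rows = [[1.0] + [float(c[i]) for c in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

b = least_squares([temp, chill], fuel)
fitted = [b[0] + b[1] * t + b[2] * c for t, c in zip(temp, chill)]
sse = sum((y - f) ** 2 for y, f in zip(fuel, fitted))
print([round(v, 4) for v in b], round(sse, 3))
```

The printed coefficients and SSE should agree with the Coef column and the Residual Error sum of squares in the Minitab output, up to rounding.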
Predicted Fuel Consumption = 13.1087 - .0900x1 + .0825x2; Residual e = y - pred

Average Hourly        Chill       Observed Fuel    Predicted Fuel   Residual
Temperature, x1 (°F)  Index, x2   Consumption, y   Consumption      e = y - pred
                                  (MMcf)
28.0                  18          12.4             12.0733           0.3267
28.0                  14          11.7             11.7433          -0.0433
32.5                  24          12.4             12.1631           0.2369
39.0                  22          10.8             11.4131          -0.6131
45.9                   8           9.4              9.6372          -0.2372
57.8                  16           9.5              9.2260           0.2740
58.1                   1           8.0              7.9616           0.0384
62.5                   0           7.5              7.4831           0.0169
$SSE = \sum e_i^2 = \sum (y_i - \hat{y}_i)^2$
$s^2 = MSE = \dfrac{SSE}{n-(k+1)}$
$s = \sqrt{MSE} = \sqrt{\dfrac{SSE}{n-(k+1)}}$
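The formulas above can be applied to the eight residuals listed in the fuel consumption table. This is a small sketch under the assumption that the rounded residuals from that table are accurate enough to reproduce the Minitab values of SSE, MSE, and S.

```python
import math

# Residuals e = y - pred from the fuel consumption table (rounded to 4 decimals).
residuals = [0.3267, -0.0433, 0.2369, -0.6131, -0.2372, 0.2740, 0.0384, 0.0169]
n, k = 8, 2  # 8 observations, 2 independent variables

sse = sum(e ** 2 for e in residuals)   # unexplained variation
mse = sse / (n - (k + 1))              # mean square error, s^2
s = math.sqrt(mse)                     # standard error

print(round(sse, 3), round(mse, 3), round(s, 4))
```

The results should be close to the Residual Error SS, the Residual Error MS, and S reported by Minitab.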
From the Minitab analysis of variance: SSE = 0.674 and MSE = 0.135.
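$R^2 = \dfrac{\text{Explained variation}}{\text{Total variation}}$

The Adjusted R2
The adjusted multiple coefficient of determination is
$\bar{R}^2 = \left(R^2 - \dfrac{k}{n-1}\right)\left(\dfrac{n-1}{n-(k+1)}\right)$

For the fuel consumption model (Minitab reports R-Sq = 97.4% and R-Sq(adj) = 96.3%):
$R^2 = \dfrac{24.875}{25.549} = 0.974$
$\bar{R}^2 = \left(0.974 - \dfrac{2}{8-1}\right)\left(\dfrac{8-1}{8-(2+1)}\right) = 0.963$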
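The arithmetic for the two coefficients of determination can be reproduced directly from the sums of squares. This is an illustrative sketch; the variable names are assumptions.

```python
# Sums of squares from the Minitab analysis of variance for the fuel model.
explained_variation = 24.875   # Regression SS
total_variation = 25.549       # Total SS
n, k = 8, 2                    # observations, independent variables

r_sq = explained_variation / total_variation
adj_r_sq = (r_sq - k / (n - 1)) * ((n - 1) / (n - (k + 1)))

print(round(r_sq, 3), round(adj_r_sq, 3))
```

Because the adjustment subtracts k/(n - 1) before rescaling, adding an unimportant independent variable can lower the adjusted value even though the unadjusted value never decreases.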
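$F(\text{model}) = \dfrac{(\text{Explained variation})/k}{(\text{Unexplained variation})/[n-(k+1)]}$

Minitab (fuel consumption model): Regression SS = 24.875, Residual Error SS = 0.674, Total SS = 25.549, F = 92.30, P = 0.000.

Test Statistic:
$F(\text{model}) = \dfrac{(\text{Explained variation})/k}{(\text{Unexplained variation})/[n-(k+1)]} = \dfrac{24.875/2}{0.674/(8-3)} = 92.30$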
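The overall F statistic can be recomputed from the rounded sums of squares. The sketch below also computes a p-value using a closed form for the upper tail of the F distribution that holds only when the numerator has exactly 2 degrees of freedom, as it does here; that shortcut is an assumption of this example and is not taken from the slides.

```python
# Sums of squares from the Minitab analysis of variance for the fuel model.
explained_variation = 24.875
unexplained_variation = 0.674
n, k = 8, 2

df_model = k
df_error = n - (k + 1)

f_model = (explained_variation / df_model) / (unexplained_variation / df_error)

# Upper-tail probability of an F(2, df_error) variable:
# P(F > f) = (1 + 2 f / df_error) ** (-df_error / 2). Valid only for 2 numerator df.
assert df_model == 2
p_value = (1 + 2 * f_model / df_error) ** (-df_error / 2)

print(round(f_model, 2), round(p_value, 4))
```

The F value computed from the rounded sums of squares differs slightly from the 92.30 that Minitab obtains from unrounded values.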
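Testing the significance of the independent variable $x_j$: $H_0: \beta_j = 0$

Alternative hypothesis     Reject H0 if:
$H_a: \beta_j > 0$         $t > t_{\alpha}$
$H_a: \beta_j \neq 0$      $|t| > t_{\alpha/2}$, that is, $t > t_{\alpha/2}$ or $t < -t_{\alpha/2}$
$H_a: \beta_j < 0$         $t < -t_{\alpha}$

Equivalently, reject H0 if the p-value is less than $\alpha$.

Test Statistic
$t = \dfrac{b_j}{s_{b_j}}$
$t_{\alpha}$ and $t_{\alpha/2}$ are based on $n-(k+1)$ degrees of freedom.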
Confidence interval for $\beta_j$: $[b_j \pm t_{\alpha/2}\, s_{b_j}]$

Minitab Output
Predictor   Coef       StDev     T       P
Constant    13.1087    0.8557    15.32   0.000
Temp        -0.09001   0.01408   -6.39   0.001
Chill       0.08249    0.02200   3.75    0.013

Test: $t = b_2 / s_{b_2} = 0.08249 / 0.02200 = 3.75$

Interval:
$[b_2 \pm t_{\alpha/2}\, s_{b_2}]$
$= [0.08249 \pm (2.571)(0.02200)]$
$= [0.08249 \pm 0.05656]$
$= [0.02593, 0.13905]$
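The same t statistics and 95% intervals can be computed for all three coefficients. This sketch reuses the critical value 2.571 that appears in the interval for b2 above; the dictionary layout and names are illustrative assumptions.

```python
# (Coef, StDev) pairs copied from the Minitab output for the fuel model.
coefficients = {
    "Constant": (13.1087, 0.8557),
    "Temp": (-0.09001, 0.01408),
    "Chill": (0.08249, 0.02200),
}
t_crit = 2.571  # t(alpha/2) for alpha = 0.05 with n - (k + 1) = 5 degrees of freedom

results = {}
for name, (b, sb) in coefficients.items():
    t = b / sb                     # test statistic for H0: beta_j = 0
    half_width = t_crit * sb       # half-width of the 95% confidence interval
    results[name] = {
        "t": t,
        "interval": (b - half_width, b + half_width),
        "reject_h0": abs(t) > t_crit,  # two-sided test at alpha = 0.05
    }

for name, r in results.items():
    low, high = r["interval"]
    print(f"{name}: t = {r['t']:.2f}, 95% CI = [{low:.5f}, {high:.5f}], reject H0: {r['reject_h0']}")
```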
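Confidence interval for the mean value of y:
$[\hat{y} \pm t_{\alpha/2}\, s_{\hat{y}}]$, where $s_{\hat{y}} = s\sqrt{\text{Distance value}}$, that is, $[\hat{y} \pm t_{\alpha/2}\, s\sqrt{\text{Distance value}}]$

Prediction interval for an individual value of y:
$[\hat{y} \pm t_{\alpha/2}\, s_{(y-\hat{y})}]$, where $s_{(y-\hat{y})} = s\sqrt{1 + \text{Distance value}}$, that is, $[\hat{y} \pm t_{\alpha/2}\, s\sqrt{1 + \text{Distance value}}]$

Fuel consumption example:
95% confidence interval: [9.895, 10.771]
95% prediction interval: [10.333 ± 1.041] = [9.292, 11.374]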
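The two interval formulas differ only in the extra 1 under the square root. The sketch below illustrates this by backing out the distance value from the reported confidence interval and then rebuilding the prediction interval from it. The distance value itself is not given in the slides, so `distance_value` is a back-calculated assumption.

```python
import math

y_hat = 10.333            # point prediction from the text
s = 0.3671                # standard error from the Minitab output
t_crit = 2.571            # t(0.025) with 5 degrees of freedom
ci = (9.895, 10.771)      # 95% confidence interval from the text

# Confidence interval half-width is t * s * sqrt(D); solve for the distance value D.
ci_half = (ci[1] - ci[0]) / 2
distance_value = (ci_half / (t_crit * s)) ** 2

# Prediction interval half-width is t * s * sqrt(1 + D).
pi_half = t_crit * s * math.sqrt(1 + distance_value)
pi = (y_hat - pi_half, y_hat + pi_half)

print(round(distance_value, 4), round(pi_half, 3), [round(v, 3) for v in pi])
```

The rebuilt prediction interval should agree with the one reported above to within rounding.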
$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$

Mileage, y (MPG):
25.8, 26.1, 25.4, 29.6, 29.2, 29.8, 32.0, 31.4, 31.7, 31.7, 31.5, 31.2, 29.4, 29.0, 29.5
Minitab Output
Mileage = 25.7 + 4.98 Units - 1.02 UnitsSq

Predictor   Coef       StDev     T        P
Constant    25.7152    0.1554    165.43   0.000
Units       4.9762     0.1841    27.02    0.000
UnitsSq     -1.01905   0.04414   -23.09   0.000

S = 0.2861   R-Sq = 98.6%   R-Sq(adj) = 98.3%

Analysis of Variance
Source           DF   SS       MS       F        P
Regression       2    67.915   33.958   414.92   0.000
Residual Error   12   0.982    0.082
Total            14   68.897

Predicted Values (Units = 2.44, UnitsSq = (2.44)(2.44) = 5.9536)
Fit       StDev Fit   95.0% CI             95.0% PI
31.7901   0.1111      (31.5481, 32.0322)   (31.1215, 32.4588)
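The predicted value and both intervals in the output above can be rebuilt from the reported coefficients, S, and StDev Fit. In this sketch the critical value 2.179 (t with 12 degrees of freedom, upper 2.5% point) comes from a standard t table rather than from the slides, so treat it as an assumption. Since StDev Fit equals s times the square root of the distance value, the prediction standard error can be written as the square root of S squared plus StDev Fit squared.

```python
import math

# Values copied from the Minitab output for the quadratic mileage model.
b0, b1, b2 = 25.7152, 4.9762, -1.01905
s = 0.2861            # standard error S
stdev_fit = 0.1111    # StDev Fit at Units = 2.44
t_crit = 2.179        # assumption: t(0.025) with 12 degrees of freedom

units = 2.44
fit = b0 + b1 * units + b2 * units ** 2

ci_half = t_crit * stdev_fit
pi_half = t_crit * math.sqrt(s ** 2 + stdev_fit ** 2)

ci = (fit - ci_half, fit + ci_half)
pi = (fit - pi_half, fit + pi_half)

print(round(fit, 4), [round(v, 4) for v in ci], [round(v, 4) for v in pi])
```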
12.8 Interaction
Example 12.13: The Bonner Frozen Foods Case
Sales     Radio and TV        Print               Sales
Region    Expenditures, x1    Expenditures, x2    Volume, y
1         1                   1                    3.27
2         1                   2                    8.38
3         1                   3                   11.28
4         1                   4                   14.5
5         1                   5                   19.63
6         2                   1                    5.84
7         2                   2                   10.01
8         2                   3                   12.46
9         2                   4                   16.67
10        2                   5                   19.83
11        3                   1                    8.51
12        3                   2                   10.14
13        3                   3                   14.75
14        3                   4                   17.99
15        3                   5                   19.85
16        4                   1                    9.46
17        4                   2                   12.61
18        4                   3                   15.5
19        4                   4                   17.68
20        4                   5                   21.02
21        5                   1                   12.23
22        5                   2                   13.58
23        5                   3                   16.77
24        5                   4                   20.56
25        5                   5                   21.05
Modeling Interaction
Minitab Output
Sales = - 2.35 + 2.36 RadioTV + 4.18 Print - 0.349 Interact

Predictor   Coef       StDev     T       P
Constant    -2.3497    0.6883    -3.41   0.003
RadioTV     2.3611     0.2075    11.38   0.000
Print       4.1831     0.2075    20.16   0.000
Interact    -0.34890   0.06257   -5.58   0.000

S = 0.6257   R-Sq = 98.6%   R-Sq(adj) = 98.4%

Analysis of Variance
Source           DF   SS       MS       F        P
Regression       3    590.41   196.80   502.67   0.000
Residual Error   21   8.22     0.39
Total            24   598.63

Predicted Values (RadioTV = 2, Print = 5, Interact = (2)(5) = 10)
Fit      StDev Fit   95.0% CI           95.0% PI
19.799   0.265       (19.247, 20.351)   (18.385, 21.213)
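With an interaction term in the model, the change in predicted sales per unit of radio and TV expenditure depends on the level of print expenditure. The sketch below evaluates the fitted equation from the output above; the function names `predict_sales` and `radio_tv_slope` are illustrative assumptions.

```python
# Coefficients copied from the Minitab output for the interaction model.
b0, b_radio_tv, b_print, b_interact = -2.3497, 2.3611, 4.1831, -0.34890

def predict_sales(radio_tv, print_exp):
    """Point prediction of sales volume; the interaction term is the product x1 * x2."""
    return b0 + b_radio_tv * radio_tv + b_print * print_exp + b_interact * radio_tv * print_exp

def radio_tv_slope(print_exp):
    """Change in predicted sales per unit increase in RadioTV at a fixed Print level."""
    return b_radio_tv + b_interact * print_exp

fit = predict_sales(2, 5)
print(round(fit, 3))
for level in (1, 3, 5):
    print(level, round(radio_tv_slope(level), 4))
```

Because the interaction coefficient is negative, the estimated effect of additional radio and TV spending shrinks as print spending rises.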
Number of        Location   Location     Sales
Households, x               Dummy, DM    Volume, y
161              Street     0            157.27
99               Street     0             93.28
135              Street     0            136.81
120              Street     0            123.79
164              Street     0            153.51
221              Mall       1            241.74
179              Mall       1            201.54
204              Mall       1            206.71
214              Mall       1            229.78
101              Mall       1            135.22
Minitab Output
Sales = 17.4 + 0.851 Households + 29.2 DM

Predictor   Coef      StDev     T       P
Constant    17.360    9.447     1.84    0.109
Househol    0.85105   0.06524   13.04   0.000
DM          29.216    5.594     5.22    0.001

S = 7.329   R-Sq = 98.3%   R-Sq(adj) = 97.8%

Analysis of Variance
Source           DF   SS      MS      F        P
Regression       2    21412   10706   199.32   0.000
Residual Error   7    376     54
Total            9    21788
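In the fitted equation above, the dummy variable shifts the intercept for mall locations while the slope on households stays the same. This is a minimal sketch of that reading; `predict_sales` is an illustrative name, and the 161-household input is simply the first row of the data table.

```python
# Coefficients copied from the Minitab output for the dummy variable model.
b0, b_households, b_dm = 17.360, 0.85105, 29.216

def predict_sales(households, mall):
    """Point prediction of sales volume; DM = 1 for a mall location, 0 for a street location."""
    dm = 1 if mall else 0
    return b0 + b_households * households + b_dm * dm

street = predict_sales(161, mall=False)
mall = predict_sales(161, mall=True)
print(round(street, 3), round(mall, 3), round(mall - street, 3))
```

For any fixed number of households, the difference between the two predictions equals the DM coefficient.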
Time     MktPoten   Adver      MktShare   Change   Accts    WkLoad   Rating
43.10    74065.11   4582.88    2.51       0.34     74.86    15.05    4.9
108.13   58117.30   5539.78    5.51       0.15     107.32   19.97    5.1
13.82    21118.49   2950.38    10.91      -0.72    96.75    17.34    2.9
186.18   68521.27   2243.07    8.27       0.17     195.12   13.40    3.4
161.79   57805.11   7747.08    9.15       0.50     180.44   17.64    4.6
8.94     37806.94   402.44     5.51       0.15     104.88   16.22    4.5
365.04   50935.26   3140.62    8.54       0.55     256.10   18.80    4.6
220.32   35602.08   2086.16    7.07       -0.49    126.83   19.86    2.3
127.64   46176.77   8846.25    12.54      1.24     203.25   17.42    4.9
105.69   42053.24   5673.11    8.85       0.31     119.51   21.41    2.8
57.72    36829.71   2761.76    5.38       0.37     116.26   16.32    3.1
23.58    33612.67   1991.85    5.43       -0.65    142.28   14.51    4.2
13.82    21412.79   1971.52    8.48       0.64     89.43    19.35    4.3
13.82    20416.87   1737.38    7.80       1.01     84.55    20.02    4.2
86.99    36272.00   10694.20   10.34      0.11     119.51   15.26    5.5
165.85   23093.26   8618.61    5.15       0.04     80.49    15.87    3.6
116.26   26879.59   7747.89    6.64       0.68     136.58   7.81     3.4
42.28    39571.96   4565.81    5.45       0.66     78.86    16.00    4.2
52.84    51866.15   6022.70    6.31       -0.10    136.58   17.44    3.6
165.04   58749.82   3721.10    6.35       -0.03    138.21   17.98    3.1
10.57    23990.82   860.97     7.37       -1.63    75.61    20.99    1.6
13.82    25694.86   3571.51    8.39       -0.43    102.44   21.66    3.4
8.13     23736.35   2845.50    5.15       0.04     76.42    21.46    2.7
58.54    34314.29   5060.11    12.88      0.22     136.58   24.78    2.8
21.14    22809.53   3552.00    9.14       -0.74    88.62    24.96    3.9
Correlation Matrix
Example: The Sales Territory Performance Case
Multicollinearity
Multicollinearity refers to the condition where the independent
variables (or predictors) in a model are dependent, related,
or correlated with each other.
Effects
Hinders the ability to use the $b_j$ values, t statistics, and p-values to
assess the relative importance of predictors.
Does not hinder ability to predict the dependent (or
response) variable.
Detection
Scatter Plot Matrix
Correlation Matrix
Variance Inflation Factors (VIF)
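$VIF_j = \dfrac{1}{1 - R_j^2}$
where $R_j^2$ is the multiple coefficient of determination for the regression model
$x_j = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_{j-1} x_{j-1} + \beta_{j+1} x_{j+1} + \cdots + \beta_k x_k + \varepsilon$
Notes:
$VIF_j = 1$ implies that $x_j$ is not related to the other predictors.
$\max(VIF_j) > 10$ suggests severe multicollinearity.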
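When a model has only two predictors, regressing one on the other gives an R-squared equal to their squared correlation, so both variance inflation factors equal 1/(1 - r^2). The sketch below applies this special case to the Temp and Chill columns of the fuel consumption data from earlier in the chapter. It is an illustration of the formula only, not output from Minitab or MegaStat, and the helper name `correlation` is an assumption.

```python
import math

# Predictors from the fuel consumption case.
temp = [28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5]
chill = [18, 14, 24, 22, 8, 16, 1, 0]

def correlation(x, y):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r = correlation(temp, chill)
r_sq_j = r ** 2              # R^2 from regressing one predictor on the other
vif = 1.0 / (1.0 - r_sq_j)   # same value for both predictors when k = 2

print(round(r, 3), round(vif, 2))
```

With more than two predictors, each VIF requires its own regression of that predictor on all the others.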
MegaStat
$\dfrac{y}{x} = \beta_0 \dfrac{1}{x} + \beta_1 + \beta_2 x$ (plus an error term)
                                                                               Studentized
                                                                 Studentized   Deleted
Observation   Hours        Predicted    Residual    Leverage    Residual      Residual      Cook's D
1             566.520      688.409      -121.889    0.121       -0.211        -0.203        0.002
2             696.820      721.848      -25.028     0.226       -0.046        -0.044        0.000
3             1,033.150    965.393      67.757      0.130       0.118         0.114         0.001
4             1,603.620    1,172.464    431.156     0.159       0.765         0.752         0.028
5             1,611.370    1,526.780    84.590      0.085       0.144         0.138         0.000
6             1,613.270    1,993.869    -380.599    0.112       -0.657        -0.642        0.014
7             1,854.170    1,676.558    177.612     0.084       0.302         0.291         0.002
8             2,160.550    1,791.405    369.145     0.083       0.627         0.612         0.009
9             2,305.580    2,798.761    -493.181    0.085       -0.838        -0.828        0.016
10            3,503.930    4,191.333    -687.403    0.120       -1.192        -1.214        0.049
11            3,571.890    3,190.957    380.933     0.077       0.645         0.630         0.009
12            3,741.400    4,364.502    -623.102    0.177       -1.117        -1.129        0.067
13            4,026.520    4,364.229    -337.709    0.064       -0.568        -0.553        0.006
14            10,343.810   8,713.307    1,630.503   0.146       2.871         4.558         0.353
15            11,732.170   12,080.864   -348.694    0.682       -1.005        -1.006        0.541
16            15,414.940   15,133.026   281.914     0.785       0.990         0.989         0.897
17            18,854.450   19,260.453   -406.003    0.863       -1.786        -1.975        5.033
Leverage Values
$h_i$ = leverage value of observation $i$
Leverage = distance value
Studentized Residual
$\text{Studentized residual} = \dfrac{e_i}{s\sqrt{1 - h_i}}$
Studentized Deleted Residual
$\dfrac{\text{Deleted Residual}}{\text{Deleted Residual Standard Error}} = \dfrac{d_i}{s_{d_i}} = e_i \sqrt{\dfrac{n-k-2}{SSE(1-h_i) - e_i^2}}$
Cook's Distance
$\text{Cook's Distance} = D_i = \dfrac{e_i^2}{(k+1)s^2} \cdot \dfrac{h_i}{(1-h_i)^2}$
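The three diagnostics (studentized residual, studentized deleted residual, and Cook's distance) can be checked against the table for observation 14, the row with the largest studentized residual. The slides do not state the standard error s or the number of independent variables k for this model, so this sketch assumes k = 3 and back-solves s from the tabulated studentized residual. Both are assumptions, as are the function names.

```python
import math

def studentized_residual(e, h, s):
    """Residual divided by its standard error s * sqrt(1 - h)."""
    return e / (s * math.sqrt(1.0 - h))

def studentized_deleted_residual(e, h, sse, n, k):
    """Deleted residual divided by its standard error, computed without refitting."""
    return e * math.sqrt((n - k - 2) / (sse * (1.0 - h) - e ** 2))

def cooks_distance(e, h, s, k):
    """Cook's distance measure D for one observation."""
    return (e ** 2 / ((k + 1) * s ** 2)) * (h / (1.0 - h) ** 2)

n = 17   # observations in the table
k = 3    # assumption: number of independent variables (not stated in the slides)

# Observation 14 from the table: residual, leverage, and studentized residual.
e14, h14, stud14 = 1630.503, 0.146, 2.871

# Assumption: back-solve s from the tabulated studentized residual, then SSE = s^2 [n - (k + 1)].
s = e14 / (stud14 * math.sqrt(1.0 - h14))
sse = s ** 2 * (n - (k + 1))

t14 = studentized_deleted_residual(e14, h14, sse, n, k)
d14 = cooks_distance(e14, h14, s, k)
print(round(s, 1), round(t14, 3), round(d14, 3))
```

Under these assumptions the computed values should land close to the table entries for observation 14 (studentized deleted residual 4.558, Cook's D 0.353), with small differences caused by rounding in the tabulated inputs.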
Multiple Regression: Summary
Part 1. Basic Multiple Regression
12.1
12.2
12.3
Multiple Regression
Part 3. Dummy Variables and Advanced Statistical Inferences
12.9 Dummy Variables to Model Qualitative Variables
12.10 The Partial F Test: Testing a Portion of a Model