and planning
Multiple Regression Analysis
Introduction
Multiple Regression
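$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$
$\beta_0, \beta_1, \ldots, \beta_k$ are parameters
$x_1, x_2, \ldots, x_k$ are known constants
$\varepsilon$, the error terms, are independent $N(0, \sigma^2)$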
to the data.
The least-squares method chooses the bs that
make the sum of squares of the residuals as small
as possible.
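The least-squares idea can be sketched in a few lines of Python. This is a minimal illustration, not the slides' own computation: the helper names (`solve_linear_system`, `least_squares`, `sse`) and the toy data are assumptions made for the example, and the coefficients are obtained by solving the normal equations $(X'X)b = X'y$.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def least_squares(xs, ys):
    """Return [b0, b1, ..., bk] minimising the sum of squared residuals."""
    design = [[1.0] + list(row) for row in xs]  # leading 1 for the intercept
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def sse(xs, ys, b):
    """Sum of squared residuals for coefficient vector b."""
    total = 0.0
    for row, y in zip(xs, ys):
        fitted = b[0] + sum(bj * xj for bj, xj in zip(b[1:], row))
        total += (y - fitted) ** 2
    return total


# Hypothetical toy data generated exactly from y = 1 + 2*x1 + 3*x2.
xs = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
ys = [1, 3, 4, 6, 8]
b = least_squares(xs, ys)
print([round(v, 6) for v in b], round(sse(xs, ys, b), 6))
```

Any other choice of coefficients gives a larger sum of squared residuals on the same data, which is what "as small as possible" means here.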
$$MSE = \frac{SSE}{n-k-1} = \frac{1}{n-k-1}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
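$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
$$SST = SSR + SSE$$
df: $n - 1 = k + (n - k - 1)$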
| Source | Sum of Squares | df | Mean Square | F-test |
|---|---|---|---|---|
| Regression | SSR | k | MSR = SSR/k | MSR/MSE |
| Error | SSE | n-k-1 | MSE = SSE/(n-k-1) | |
| Total | SST | n-1 | | |
Reject $H_0$ if $F > F(\alpha;\, k,\, n-k-1)$
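As a rough check of the ANOVA arithmetic, the sketch below recomputes the F statistic from SSR, SSE, n and k. The input figures are taken from the two-predictor sales forecasting output in this deck; the function name `anova_summary` is hypothetical, and the critical value 3.59 is an assumed table lookup for F(0.05; 2, 17).

```python
def anova_summary(ssr, sse, n, k):
    """Mean squares, F statistic and R-squared from the sums of squares."""
    msr = ssr / k                # regression mean square
    mse = sse / (n - k - 1)      # error mean square
    sst = ssr + sse              # total sum of squares
    r2 = ssr / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return {"MSR": msr, "MSE": mse, "F": msr / mse, "R2": r2, "adj_R2": adj_r2}


# SSR and SSE from the model with AD_Rate and APIPOP (n = 20, k = 2).
summary = anova_summary(ssr=1088939.802, sse=316376.0474, n=20, k=2)

F_CRITICAL = 3.59  # assumed tabulated value of F(0.05; 2, 17)
reject_h0 = summary["F"] > F_CRITICAL

print(round(summary["F"], 4), reject_h0)
```

The computed F agrees with the 29.2562866 reported in that output, so the overall regression is significant at the 5% level under this table value.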
Interval estimation of $\beta_i$
$$b_i \pm t(\alpha/2;\, n-k-1)\, s(b_i)$$
Where
$$s(b_i) = \sqrt{\frac{MSE}{\sum (x - \bar{x})^2}}$$
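To test:
$$H_0: \beta_i = 0 \qquad H_a: \beta_i \neq 0$$
compute $t = b_i / s(b_i)$ and reject $H_0$ if
$$t > t(\alpha/2;\, n-k-1) \quad \text{or} \quad t < -t(\alpha/2;\, n-k-1)$$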
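A small sketch of the t-test and the interval estimate for one coefficient. The coefficient and standard error are the AD_Rate values from the two-predictor sales output in this deck; the function names are hypothetical, and the critical value 2.1098 is an assumed table lookup for t(0.025; 17).

```python
def t_statistic(b, se):
    """Test statistic for H0: beta_i = 0."""
    return b / se


def confidence_interval(b, se, t_crit):
    """Two-sided interval b +/- t_crit * s(b)."""
    half_width = t_crit * se
    return (b - half_width, b + half_width)


T_CRIT = 2.1098  # assumed tabulated t(0.025; 17), i.e. n - k - 1 = 20 - 2 - 1

b_adrate, se_adrate = -0.254280696, 0.075014548
t_stat = t_statistic(b_adrate, se_adrate)
lower, upper = confidence_interval(b_adrate, se_adrate, T_CRIT)
reject_h0 = abs(t_stat) > T_CRIT

print(round(t_stat, 4), round(lower, 4), round(upper, 4), reject_h0)
```

The interval excludes zero exactly when the two-sided test rejects $H_0$ at the same level, which is why the two checks always agree.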
Sales Forecasting
Multiple regression is a popular technique for predicting
product sales with the help of other variables that are likely to
have a bearing on sales.
Example
The growth of cable television has created vast new potential
in the home entertainment business. The following table gives
the values of several variables measured in a random sample of
20 local television stations which offer their programming to
cable subscribers. A TV industry analyst wants to build a
statistical model for predicting the number of subscribers that a
cable station can expect.
Example:Sales Forecasting
SUMMARY OUTPUT

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.884267744 |
| R Square | 0.781929444 |
| Adjusted R Square | 0.723777295 |
| Standard Error | 142.9354188 |
| Observations | 20 |

ANOVA

| Source | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 4 | 1098857.84 | 274714.4601 | 13.44626923 | 7.52E-05 |
| Residual | 15 | 306458.0092 | 20430.53395 | | |
| Total | 19 | 1405315.85 | | | |

| Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 51.42007002 | 98.97458277 | 0.51952803 | 0.610973806 | -159.539 | 262.3795 |
| AD_Rate | -0.267196347 | 0.081055107 | -3.296477624 | 0.004894126 | -0.43996 | -0.09443 |
| Signal | -0.020105139 | 0.045184758 | -0.444954014 | 0.662706578 | -0.11641 | 0.076204 |
| APIPOP | 0.440333955 | 0.135200486 | 3.256896248 | 0.005307766 | 0.152161 | 0.728507 |
| Compete | 16.230071 | 26.47854322 | 0.61295181 | 0.549089662 | -40.2076 | 72.66778 |
Example:Sales Forecasting
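SUMMARY OUTPUT

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.882638739 |
| R Square | 0.779051144 |
| Adjusted R Square | 0.737623233 |
| Standard Error | 139.3069743 |
| Observations | 20 |

ANOVA

| Source | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 3 | 1094812.92 | 364937.64 | 18.80498277 | 1.69966E-05 |
| Residual | 16 | 310502.9296 | 19406.4331 | | |
| Total | 19 | 1405315.85 | | | |

| Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 51.31610447 | 96.4618242 | 0.531983558 | 0.602046756 | -153.1737817 | 255.806 |
| AD_Rate | -0.259538026 | 0.077195983 | -3.36206646 | 0.003965102 | -0.423186162 | -0.09589 |
| APIPOP | 0.433505145 | 0.130916687 | 3.311305499 | 0.004412929 | 0.15597423 | 0.711036 |
| Compete | 13.92154404 | 25.30614013 | 0.550125146 | 0.589831583 | -39.72506442 | 67.56815 |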
Example:Sales Forecasting
SUMMARY OUTPUT

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.8802681 |
| R Square | 0.774871928 |
| Adjusted R Square | 0.748386273 |
| Standard Error | 136.4197776 |
| Observations | 20 |

ANOVA

| Source | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 2 | 1088939.802 | 544469.901 | 29.2562866 | 3.13078E-06 |
| Residual | 17 | 316376.0474 | 18610.35573 | | |
| Total | 19 | 1405315.85 | | | |

| Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 96.28121395 | 50.16415506 | 1.919322948 | 0.07188916 | -9.556049653 | 202.1184776 |
| AD_Rate | -0.254280696 | 0.075014548 | -3.389751739 | 0.003484198 | -0.41254778 | -0.096013612 |
| APIPOP | 0.495481252 | 0.065306012 | 7.587069489 | 7.45293E-07 | 0.357697418 | 0.633265086 |
Example:Sales Forecasting
All the variables in the model are statistically significant, so our final model is:
Final Model
$$\widehat{SUBSCRIBE} = 96.28 - 0.25\,ADRATE + 0.50\,APIPOP$$
Multicollinearity
Some key problems that typically arise when the
explanatory variables being considered for the regression
model are highly correlated among themselves are:
1.
2.
3.
Multicollinearity Diagnostics
$$VIF_j = \frac{1}{1 - R_j^2}, \qquad j = 1, 2, \ldots, k$$
Example:Sales Forecasting
Pearson Correlation Coefficients, N = 20
Prob > |r| under H0: Rho=0 (shown in parentheses)

| | SUBSCRIB | ADRATE | KILOWATT | APIPOP | COMPETE |
|---|---|---|---|---|---|
| SUBSCRIB | 1.00000 | -0.02848 (0.9051) | 0.44762 (0.0478) | 0.90447 (<.0001) | 0.79832 (<.0001) |
| ADRATE | -0.02848 (0.9051) | 1.00000 | -0.01021 (0.9659) | 0.32512 (0.1619) | 0.34147 (0.1406) |
| KILOWATT | 0.44762 (0.0478) | -0.01021 (0.9659) | 1.00000 | 0.45303 (0.0449) | 0.46895 (0.0370) |
| APIPOP | 0.90447 (<.0001) | 0.32512 (0.1619) | 0.45303 (0.0449) | 1.00000 | 0.87592 (<.0001) |
| COMPETE | 0.79832 (<.0001) | 0.34147 (0.1406) | 0.46895 (0.0370) | 0.87592 (<.0001) | 1.00000 |
Example:Sales Forecasting
$$SUBSCRIBE = 51.42 - 0.27\,ADRATE - 0.02\,SIGNAL + 0.44\,APIPOP + 16.23\,COMPETE$$
VIF calculation:
Fit the model
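SUMMARY OUTPUT (dependent variable: APIPOP)

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.878054 |
| R Square | 0.770978 |
| Adjusted R Square | 0.728036 |
| Standard Error | 264.3027 |
| Observations | 20 |

ANOVA

| Source | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 3 | 3762601 | 1254200 | 17.9541 | 2.25472E-05 |
| Residual | 16 | 1117695 | 69855.92 | | |
| Total | 19 | 4880295 | | | |

| Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | -472.685 | 139.7492 | -3.38238 | 0.003799 | -768.9402258 | -176.43 |
| Compete | 159.8413 | 28.29157 | 5.649786 | 3.62E-05 | 99.86587622 | 219.8168 |
| ADRATE | 0.048173 | 0.149395 | 0.322455 | 0.751283 | -0.268529713 | 0.364876 |
| Signal | 0.037937 | 0.083011 | 0.457012 | 0.653806 | -0.138038952 | 0.213913 |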
Example:Sales Forecasting
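SUMMARY OUTPUT (dependent variable: COMPETE)

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.882936 |
| R Square | 0.779575 |
| Adjusted R Square | 0.738246 |
| Standard Error | 1.34954 |
| Observations | 20 |

ANOVA

| Source | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 3 | 103.0599 | 34.35329 | 18.86239 | 1.66815E-05 |
| Residual | 16 | 29.14013 | 1.821258 | | |
| Total | 19 | 132.2 | | | |

| Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 3.10416 | 0.520589 | 5.96278 | 1.99E-05 | 2.000559786 | 4.20776 |
| ADRATE | 0.000491 | 0.000755 | 0.649331 | 0.525337 | -0.001110874 | 0.002092 |
| Signal | 0.000334 | 0.000418 | 0.799258 | 0.435846 | -0.000552489 | 0.001221 |
| APIPOP | 0.004167 | 0.000738 | 5.649786 | 3.62E-05 | 0.002603667 | 0.005731 |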
Example:Sales Forecasting
SUMMARY OUTPUT (dependent variable: SIGNAL)

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.512244 |
| R Square | 0.262394 |
| Adjusted R Square | 0.124092 |
| Standard Error | 790.8387 |
| Observations | 20 |

ANOVA

| Source | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 3 | 3559789 | 1186596 | 1.897261 | 0.170774675 |
| Residual | 16 | 10006813 | 625425.8 | | |
| Total | 19 | 13566602 | | | |

| Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 5.171093 | 547.6089 | 0.009443 | 0.992582 | -1155.707711 | 1166.05 |
| APIPOP | 0.339655 | 0.743207 | 0.457012 | 0.653806 | -1.235874129 | 1.915184 |
| Compete | 114.8227 | 143.6617 | 0.799258 | 0.435846 | -189.7263711 | 419.3718 |
| ADRATE | -0.38091 | 0.438238 | -0.86919 | 0.397593 | -1.309935875 | 0.548109 |
Example:Sales Forecasting
SUMMARY OUTPUT (dependent variable: ADRATE)

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.399084 |
| R Square | 0.159268 |
| Adjusted R Square | 0.001631 |
| Standard Error | 440.8588 |
| Observations | 20 |

ANOVA

| Source | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 3 | 589101.7 | 196367.2 | 1.010346 | 0.413876018 |
| Residual | 16 | 3109703 | 194356.5 | | |
| Total | 19 | 3698805 | | | |

| Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 253.7304 | 298.6063 | 0.849716 | 0.408018 | -379.2865355 | 886.7474 |
| Signal | -0.11837 | 0.136186 | -0.86919 | 0.397593 | -0.407073832 | 0.170329 |
| APIPOP | 0.134029 | 0.415653 | 0.322455 | 0.751283 | -0.747116077 | 1.015175 |
| Compete | 52.3446 | 80.61309 | 0.649331 | 0.525337 | -118.5474784 | 223.2367 |
Example:Sales Forecasting
| Variable | R-Squared | VIF |
|---|---|---|
| ADRATE | 0.159268 | 1.19 |
| COMPETE | 0.779575 | 4.54 |
| SIGNAL | 0.262394 | 1.36 |
| APIPOP | 0.770978 | 4.36 |
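The VIF column follows directly from the R-squared of each auxiliary regression (each explanatory variable regressed on the others). The sketch below applies $VIF_j = 1/(1 - R_j^2)$ to the four R-squared values reported in this example; the function name `vif` and the dictionary layout are illustrative assumptions.

```python
def vif(r_squared):
    """Variance inflation factor from an auxiliary-regression R-squared."""
    if not 0 <= r_squared < 1:
        raise ValueError("R-squared must lie in [0, 1)")
    return 1.0 / (1.0 - r_squared)


# R-squared of each explanatory variable regressed on the other three.
aux_r2 = {
    "ADRATE": 0.159268,
    "COMPETE": 0.779575,
    "SIGNAL": 0.262394,
    "APIPOP": 0.770978,
}

vifs = {name: vif(r2) for name, r2 in aux_r2.items()}
for name, value in vifs.items():
    print(f"{name:8s} R2 = {aux_r2[name]:.6f}  VIF = {value:.4f}")
```

COMPETE and APIPOP have the largest VIFs, which is consistent with their pairwise correlation of 0.87592 in the correlation matrix.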
Indicator variables
Indicator, or dummy, variables are used to
determine the relationship between qualitative
independent variables and a dependent variable.
Indicator variables take on the values 0
and 1.
For the insurance innovation example, where the
qualitative variable has two classes, we might
define the indicator variable x2 as follows:
$$x_2 = \begin{cases} 1 & \text{if stock company} \\ 0 & \text{otherwise} \end{cases}$$
Indicator variables
Where:
$x_1$ = size of firm
$$x_2 = \begin{cases} 1 & \text{if stock company} \\ 0 & \text{otherwise} \end{cases}$$
Mutual firms
Stock firms
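| Y | Size | Type |
|---|---|---|
| 17 | 151 | Mutual |
| 26 | 92 | Mutual |
| 21 | 175 | Mutual |
| 30 | 31 | Mutual |
| 22 | 104 | Mutual |
| | 277 | Mutual |
| 12 | 210 | Mutual |
| 19 | 120 | Mutual |
| | 290 | Mutual |
| 16 | 238 | Stock |
| 28 | 164 | Stock |
| 15 | 272 | Stock |
| 11 | 295 | Stock |
| 38 | 68 | Stock |
| 31 | 85 | Stock |
| 21 | 224 | Stock |
| 20 | 166 | Stock |
| 13 | 305 | Stock |
| 30 | 124 | Stock |
| 14 | 246 | Stock |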
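SUMMARY OUTPUT

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.95993655 |
| R Square | 0.92147818 |
| Adjusted R Square | 0.91224031 |
| Standard Error | 2.78630562 |
| Observations | 20 |

ANOVA

| Source | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 2 | 1548.820517 | 774.4103 | 99.75016 | 4.04966E-10 |
| Residual | 17 | 131.979483 | 7.763499 | | |
| Total | 19 | 1680.8 | | | |

| Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 33.8698658 | 1.562588138 | 21.67549 | 8E-14 | 30.57308841 | 37.16664321 |
| Size | -0.10608882 | 0.007799653 | -13.6017 | 1.45E-10 | -0.122544675 | -0.089632969 |
| type of firm | 8.76797549 | 1.286421264 | 6.815789 | 3.01E-06 | 6.053860079 | 11.4820909 |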
Interpretation ?
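One possible reading of the fitted coefficients, sketched in Python: the indicator shifts the intercept while both firm types share the same slope on size. The function name `predict` and the example size of 100 are assumptions for illustration; the coefficients are those reported in the output for this example.

```python
# Fitted coefficients from the regression output for this example.
B0 = 33.8698658     # intercept
B1 = -0.10608882    # slope on x1 (size of firm)
B2 = 8.76797549     # coefficient on x2 (1 if stock company, 0 otherwise)


def predict(size, stock):
    """Fitted response for a firm of the given size and type (stock: 0 or 1)."""
    return B0 + B1 * size + B2 * stock


size = 100  # hypothetical firm size, chosen only for illustration
mutual_fit = predict(size, stock=0)
stock_fit = predict(size, stock=1)

# The two fitted lines are parallel: the gap equals B2 at every size.
gap = stock_fit - mutual_fit
print(round(mutual_fit, 4), round(stock_fit, 4), round(gap, 4))
```

Under this model, a stock firm's fitted response is about 8.77 units higher than that of a mutual firm of the same size, and each additional unit of size lowers the fitted response by about 0.106 for either type.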
[Figure: time-series chart; date axis from Mar-90 to Nov-98, value axis from 50 to 400.]
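| Date | PHS | MR | Q2 | Q3 | Q4 |
|---|---|---|---|---|---|
| 31-Mar-90 | 217 | 10.1202 | 0 | 0 | 0 |
| 30-Jun-90 | 271.3 | 10.3372 | 1 | 0 | 0 |
| 30-Sep-90 | 233 | 10.1033 | 0 | 1 | 0 |
| 31-Dec-90 | 173.6 | 9.9547 | 0 | 0 | 1 |
| 31-Mar-91 | 146.7 | 9.5008 | 0 | 0 | 0 |
| 30-Jun-91 | 254.1 | 9.5265 | 1 | 0 | 0 |
| 30-Sep-91 | 239.8 | 9.2755 | 0 | 1 | 0 |
| 31-Dec-91 | 199.8 | 8.6882 | 0 | 0 | 1 |
| 31-Mar-92 | 218.5 | 8.7098 | 0 | 0 | 0 |
| 30-Jun-92 | 296.4 | 8.6782 | 1 | 0 | 0 |
| 30-Sep-92 | 276.4 | 8.0085 | 0 | 1 | 0 |
| 31-Dec-92 | 238.8 | 8.2052 | 0 | 0 | 1 |
| 31-Mar-93 | 213.2 | 7.7332 | 0 | 0 | 0 |
| 30-Jun-93 | 323.7 | 7.4515 | 1 | 0 | 0 |
| 30-Sep-93 | 309.3 | 7.0778 | 0 | 1 | 0 |
| 31-Dec-93 | 279.4 | 7.0537 | 0 | 0 | 1 |
| 31-Mar-94 | 252.6 | 7.2958 | 0 | 0 | 0 |
| 30-Jun-94 | 354.2 | 8.4370 | 1 | 0 | 0 |
| 30-Sep-94 | 325.7 | 8.5882 | 0 | 1 | 0 |
| 31-Dec-94 | 265.9 | 9.0977 | 0 | 0 | 1 |
| 31-Mar-95 | 214.2 | 8.8123 | 0 | 0 | 0 |
| 30-Jun-95 | 296.7 | 7.9470 | 1 | 0 | 0 |
| 30-Sep-95 | 308.2 | 7.7012 | 0 | 1 | 0 |
| 31-Dec-95 | 257.2 | 7.3508 | 0 | 0 | 1 |
| 31-Mar-96 | 240 | 7.2430 | 0 | 0 | 0 |
| 30-Jun-96 | 344.5 | 8.1050 | 1 | 0 | 0 |
| 30-Sep-96 | 324 | 8.1590 | 0 | 1 | 0 |
| 31-Dec-96 | 252.4 | 7.7102 | 0 | 0 | 1 |
| 31-Mar-97 | 237.8 | 7.7905 | 0 | 0 | 0 |
| 30-Jun-97 | 324.5 | 7.9255 | 1 | 0 | 0 |
| 30-Sep-97 | 314.6 | 7.4692 | 0 | 1 | 0 |
| 31-Dec-97 | 256.8 | 7.1980 | 0 | 0 | 1 |
| 31-Mar-98 | 258.4 | 7.0547 | 0 | 0 | 0 |
| 30-Jun-98 | 360.4 | 7.0938 | 1 | 0 | 0 |
| 30-Sep-98 | 348 | 6.8657 | 0 | 1 | 0 |
| 31-Dec-98 | 304.6 | 6.7633 | 0 | 0 | 1 |
| 31-Mar-99 | 294.1 | 6.8805 | 0 | 0 | 0 |
| 30-Jun-99 | 377.1 | 7.2037 | 1 | 0 | 0 |
| 30-Sep-99 | 355.6 | 7.7990 | 0 | 1 | 0 |
| 31-Dec-99 | 308.1 | 7.8338 | 0 | 0 | 1 |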
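SUMMARY OUTPUT

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.885398221 |
| R Square | 0.78393001 |
| Adjusted R Square | 0.759236296 |
| Standard Error | 26.4498851 |
| Observations | 40 |

ANOVA

| Source | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 4 | 88837.93624 | 22209.48406 | 31.74613731 | 3.33637E-11 |
| Residual | 35 | 24485.87476 | 699.5964217 | | |
| Total | 39 | 113323.811 | | | |

| Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 473.0650749 | 35.54169837 | 13.31014264 | 2.93931E-15 | 400.9115031 | 545.2186467 |
| MR | -30.04838192 | 4.257226391 | -7.058206249 | 3.21421E-08 | -38.69102153 | -21.40574231 |
| Q2 | 95.74106935 | 11.84748487 | 8.081130334 | 1.6292E-09 | 71.689367 | 119.7927717 |
| Q3 | 73.92904763 | 11.82881519 | 6.249911462 | 3.62313E-07 | 49.91524679 | 97.94284847 |
| Q4 | 20.54778131 | 11.84139803 | 1.73524961 | 0.091495355 | -3.491564078 | 44.5871267 |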
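The seasonal model can be applied as in the sketch below, which builds the three quarterly dummy variables (the first quarter is the base category) and evaluates the fitted equation. The coefficients are those in the output for this model; the function names `quarter_dummies` and `forecast_phs` are hypothetical.

```python
COEF = {
    "Intercept": 473.0650749,
    "MR": -30.04838192,
    "Q2": 95.74106935,
    "Q3": 73.92904763,
    "Q4": 20.54778131,
}


def quarter_dummies(quarter):
    """0/1 indicators for quarters 2 to 4; quarter 1 is the base category."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3 or 4")
    return {"Q2": int(quarter == 2), "Q3": int(quarter == 3), "Q4": int(quarter == 4)}


def forecast_phs(mr, quarter):
    """Fitted PHS for a given mortgage rate (MR) and calendar quarter."""
    dummies = quarter_dummies(quarter)
    seasonal = sum(COEF[name] * value for name, value in dummies.items())
    return COEF["Intercept"] + COEF["MR"] * mr + seasonal


# First observation in the data: 31-Mar-90, MR = 10.1202 (actual PHS = 217).
q1_fit = forecast_phs(10.1202, quarter=1)
q2_fit = forecast_phs(10.1202, quarter=2)
print(round(q1_fit, 2), round(q2_fit - q1_fit, 2))
```

Each dummy coefficient is the estimated difference from the first quarter at the same mortgage rate, so the second-quarter fit sits about 95.74 above the first-quarter fit.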
[Figure: Private Housing Starts (PHS) with a Simple Regression Forecast (PHSF1) and a Multiple Regression Forecast (PHSF2), in Thousands of Units. Series plotted: PHS, PHSF1, PHSF2; date axis from Mar-90 to Nov-98; value axis from 50 to 350.]
Where
$$\varepsilon_t = \rho\,\varepsilon_{t-1} + \nu_t$$
$\varepsilon_t$ = error at time t
$\rho$ = the parameter that measures correlation between adjacent error terms
$\nu_t$ = normally distributed error terms with mean zero and variance $\sigma^2$
Example
$$\varepsilon_t = \rho\,\varepsilon_{t-1} + \nu_t$$
$$DW = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$$
Where
$e_t = y_t - \hat{y}_t$ = the residual for time period t
$e_{t-1} = y_{t-1} - \hat{y}_{t-1}$ = the residual for time period t - 1
$$r_1(e) = \frac{\sum_{t=2}^{n} e_t\, e_{t-1}}{\sum_{t=1}^{n} e_t^2}$$
Decision rule:
Example
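| Year | Quarter | t | CompSale | InduSale |
|---|---|---|---|---|
| 1983 | 1 | 1 | 20.96 | 127.3 |
| 1983 | 2 | 2 | 21.4 | 130 |
| 1983 | 3 | 3 | 21.96 | 132.7 |
| 1983 | 4 | 4 | 21.52 | 129.4 |
| 1984 | 1 | 5 | 22.39 | 135 |
| 1984 | 2 | 6 | 22.76 | 137.1 |
| 1984 | 3 | 7 | 23.48 | 141.2 |
| 1984 | 4 | 8 | 23.66 | 142.8 |
| 1985 | 1 | 9 | 24.1 | 145.5 |
| 1985 | 2 | 10 | 24.01 | 145.3 |
| 1985 | 3 | 11 | 24.54 | 148.3 |
| 1985 | 4 | 12 | 24.3 | 146.4 |
| 1986 | 1 | 13 | 25 | 150.2 |
| 1986 | 2 | 14 | 25.64 | 153.1 |
| 1986 | 3 | 15 | 26.36 | 157.3 |
| 1986 | 4 | 16 | 26.98 | 160.7 |
| 1987 | 1 | 17 | 27.52 | 164.2 |
| 1987 | 2 | 18 | 27.78 | 165.6 |
| 1987 | 3 | 19 | 28.24 | 168.7 |
| 1987 | 4 | 20 | 28.78 | 171.7 |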
Example
Blaisdell Company Example
[Figure: company sales ($ millions, axis from 0 to 35) plotted against industry sales ($ millions, axis from 0 to 200).]
Example
$$DW = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$$
Example
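| Year | Quarter | t | et | et - et-1 | (et - et-1)^2 | et^2 |
|---|---|---|---|---|---|---|
| 1983 | 1 | 1 | -0.02605 | | | 0.000679 |
| 1983 | 2 | 2 | -0.06202 | -0.03596 | 0.001293 | 0.003846 |
| 1983 | 3 | 3 | 0.022021 | 0.084036 | 0.007062 | 0.000485 |
| 1983 | 4 | 4 | 0.163754 | 0.141733 | 0.020088 | 0.026815 |
| 1984 | 1 | 5 | 0.04657 | -0.11718 | 0.013732 | 0.002169 |
| 1984 | 2 | 6 | 0.046377 | -0.00019 | 3.76E-08 | 0.002151 |
| 1984 | 3 | 7 | 0.043617 | -0.00276 | 7.61E-06 | 0.001902 |
| 1984 | 4 | 8 | -0.05844 | -0.10205 | 0.010415 | 0.003415 |
| 1985 | 1 | 9 | -0.0944 | -0.03596 | 0.001293 | 0.008911 |
| 1985 | 2 | 10 | -0.14914 | -0.05474 | 0.002997 | 0.022243 |
| 1985 | 3 | 11 | -0.14799 | 0.001152 | 1.33E-06 | 0.021901 |
| 1985 | 4 | 12 | -0.05305 | 0.094937 | 0.009013 | 0.002815 |
| 1986 | 1 | 13 | -0.02293 | 0.030125 | 0.000908 | 0.000526 |
| 1986 | 2 | 14 | 0.105852 | 0.12878 | 0.016584 | 0.011205 |
| 1986 | 3 | 15 | 0.085464 | -0.02039 | 0.000416 | 0.007304 |
| 1986 | 4 | 16 | 0.106102 | 0.020638 | 0.000426 | 0.011258 |
| 1987 | 1 | 17 | 0.029112 | -0.07699 | 0.005927 | 0.000848 |
| 1987 | 2 | 18 | 0.042316 | 0.013204 | 0.000174 | 0.001791 |
| 1987 | 3 | 19 | -0.04416 | -0.08648 | 0.007478 | 0.00195 |
| 1987 | 4 | 20 | -0.03301 | 0.011152 | 0.000124 | 0.00109 |
| Total | | | | | 0.097941 | 0.133302 |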
Example
$$DW = \frac{0.09794}{0.13330} = 0.735$$
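The same Durbin-Watson value can be recomputed from the residuals listed for the Blaisdell Company example. This is a minimal sketch; the function name `durbin_watson` is an assumption, and because the residuals are entered at the rounded precision shown in the table, the result matches the slide only to about three decimals.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Residuals e_t for t = 1..20, as listed in the worked table.
residuals = [
    -0.02605, -0.06202, 0.022021, 0.163754, 0.04657,
    0.046377, 0.043617, -0.05844, -0.0944, -0.14914,
    -0.14799, -0.05305, -0.02293, 0.105852, 0.085464,
    0.106102, 0.029112, 0.042316, -0.04416, -0.03301,
]

dw = durbin_watson(residuals)
print(round(dw, 3))
```

A value well below 2, as here, points toward positive first-order autocorrelation; the formal decision still depends on comparing DW with tabulated lower and upper bounds for the given n and number of predictors.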
$$Y = \beta_0 + \beta_1 (1/X)$$
$$Y = e^{\beta_0} X^{\beta_1}$$
SUMMARY OUTPUT

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.943791346 |
| R Square | 0.890742104 |
| Adjusted R Square | 0.874187878 |
| Standard Error | 19.05542121 |
| Observations | 39 |

ANOVA

| Source | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 5 | 97690.01942 | 19538 | 53.80753 | 6.51194E-15 |
| Residual | 33 | 11982.59955 | 363.1091 | | |
| Total | 38 | 109672.619 | | | |

| Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | -31.06403714 | 105.1938477 | -0.2953 | 0.769613 | -245.0826992 | 182.9546249 |
| MR | -20.1992545 | 4.124906847 | -4.8969 | 2.5E-05 | -28.59144723 | -11.80706176 |
| Q2 | 97.03478074 | 8.900711541 | 10.90191 | 1.78E-12 | 78.9261326 | 115.1434289 |
| Q3 | 75.40017073 | 8.827185877 | 8.541813 | 7.17E-10 | 57.44111179 | 93.35922967 |
| Q4 | 20.35306822 | 8.83373887 | 2.304015 | 0.027657 | 2.380677107 | 38.32545934 |
| DPI | 0.022407799 | 0.004356973 | 5.142974 | 1.21E-05 | 0.013543464 | 0.031272134 |
[Figure: plot of PHS and DPI; axis ranges 17500 to 21500 and 0 to 400.]
SUMMARY OUTPUT

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.97778626 |
| R Square | 0.956065971 |
| Adjusted R Square | 0.946145384 |
| Standard Error | 12.46719572 |
| Observations | 39 |

ANOVA

| Source | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 7 | 104854.2589 | 14979.17985 | 96.37191 | 3.07085E-19 |
| Residual | 31 | 4818.360042 | 155.4309691 | | |
| Total | 38 | 109672.619 | | | |

| Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 716.5926532 | 1017.664989 | 0.704153784 | 0.486593 | -1358.949934 | 2792.13524 |
| MR | -13.65521724 | 3.093504134 | -4.414158396 | 0.000114 | -19.96446404 | -7.345970448 |
| Q2 | 106.9813297 | 6.069780998 | 17.62523718 | 1.04E-17 | 94.60192287 | 119.3607366 |
| Q3 | 27.72122303 | 9.111432565 | 3.042465916 | 0.004748 | 9.138323433 | 46.30412262 |
| Q4 | -13.37855186 | 7.653050858 | -1.748133144 | 0.09034 | -28.98706069 | 2.22995698 |
| DPI | -0.060399279 | 0.104412354 | -0.578468704 | 0.567127 | -0.273349798 | 0.15255124 |
| DPI SQUARED | 0.000335974 | 0.000536397 | 0.626354647 | 0.535668 | -0.000758014 | 0.001429963 |
| LPHS | 0.655786939 | 0.097265424 | 6.742241114 | 1.51E-07 | 0.457412689 | 0.854161189 |
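| Date | PHS | MR | LPHS | Q2 | Q3 | Q4 | DPI | DPI SQUARED |
|---|---|---|---|---|---|---|---|---|
| 30-Jun-90 | 271.3 | 10.3372 | 217 | 1 | 0 | 0 | 18063 | 1,631,359.85 |
| 30-Sep-90 | 233 | 10.1033 | 271.3 | 0 | 1 | 0 | 18031 | 1,625,584.81 |
| 31-Dec-90 | 173.6 | 9.9547 | 233 | 0 | 0 | 1 | 17856 | 1,594,183.68 |
| 31-Mar-91 | 146.7 | 9.5008 | 173.6 | 0 | 0 | 0 | 17748 | 1,574,957.52 |
| 30-Jun-91 | 254.1 | 9.5265 | 146.7 | 1 | 0 | 0 | 17861 | 1,595,076.61 |
| 30-Sep-91 | 239.8 | 9.2755 | 254.1 | 0 | 1 | 0 | 17816 | 1,587,049.28 |
| 31-Dec-91 | 199.8 | 8.6882 | 239.8 | 0 | 0 | 1 | 17811 | 1,586,158.61 |
| 31-Mar-92 | 218.5 | 8.7098 | 199.8 | 0 | 0 | 0 | 18000 | 1,620,000.00 |
| 30-Jun-92 | 296.4 | 8.6782 | 218.5 | 1 | 0 | 0 | 18085 | 1,635,336.13 |
| 30-Sep-92 | 276.4 | 8.0085 | 296.4 | 0 | 1 | 0 | 18036 | 1,626,486.48 |
| 31-Dec-92 | 238.8 | 8.2052 | 276.4 | 0 | 0 | 1 | 18330 | 1,679,944.50 |
| 31-Mar-93 | 213.2 | 7.7332 | 238.8 | 0 | 0 | 0 | 17975 | 1,615,503.13 |
| 30-Jun-93 | 323.7 | 7.4515 | 213.2 | 1 | 0 | 0 | 18247 | 1,664,765.05 |
| 30-Sep-93 | 309.3 | 7.0778 | 323.7 | 0 | 1 | 0 | 18246 | 1,664,582.58 |
| 31-Dec-93 | 279.4 | 7.0537 | 309.3 | 0 | 0 | 1 | 18413 | 1,695,192.85 |
| 31-Mar-94 | 252.6 | 7.2958 | 279.4 | 0 | 0 | 0 | 18154 | 1,647,838.58 |
| 30-Jun-94 | 354.2 | 8.4370 | 252.6 | 1 | 0 | 0 | 18409 | 1,694,456.41 |
| 30-Sep-94 | 325.7 | 8.5882 | 354.2 | 0 | 1 | 0 | 18493 | 1,709,955.25 |
| 31-Dec-94 | 265.9 | 9.0977 | 325.7 | 0 | 0 | 1 | 18667 | 1,742,284.45 |
| 31-Mar-95 | 214.2 | 8.8123 | 265.9 | 0 | 0 | 0 | 18834 | 1,773,597.78 |
| 30-Jun-95 | 296.7 | 7.9470 | 214.2 | 1 | 0 | 0 | 18798 | 1,766,824.02 |
| 30-Sep-95 | 308.2 | 7.7012 | 296.7 | 0 | 1 | 0 | 18871 | 1,780,573.21 |
| 31-Dec-95 | 257.2 | 7.3508 | 308.2 | 0 | 0 | 1 | 18942 | 1,793,996.82 |
| 31-Mar-96 | 240 | 7.2430 | 257.2 | 0 | 0 | 0 | 19071 | 1,818,515.21 |
| 30-Jun-96 | 344.5 | 8.1050 | 240 | 1 | 0 | 0 | 19081 | 1,820,422.81 |
| 30-Sep-96 | 324 | 8.1590 | 344.5 | 0 | 1 | 0 | 19161 | 1,835,719.61 |
| 31-Dec-96 | 252.4 | 7.7102 | 324 | 0 | 0 | 1 | 19152 | 1,833,995.52 |
| 31-Mar-97 | 237.8 | 7.7905 | 252.4 | 0 | 0 | 0 | 19331 | 1,868,437.81 |
| 30-Jun-97 | 324.5 | 7.9255 | 237.8 | 1 | 0 | 0 | 19315 | 1,865,346.13 |
| 30-Sep-97 | 314.6 | 7.4692 | 324.5 | 0 | 1 | 0 | 19385 | 1,878,891.13 |
| 31-Dec-97 | 256.8 | 7.1980 | 314.6 | 0 | 0 | 1 | 19478 | 1,896,962.42 |
| 31-Mar-98 | 258.4 | 7.0547 | 256.8 | 0 | 0 | 0 | 19632 | 1,927,077.12 |
| 30-Jun-98 | 360.4 | 7.0938 | 258.4 | 1 | 0 | 0 | 19719 | 1,944,194.81 |
| 30-Sep-98 | 348 | 6.8657 | 360.4 | 0 | 1 | 0 | 19905 | 1,980,963.41 |
| 31-Dec-98 | 304.6 | 6.7633 | 348 | 0 | 0 | 1 | 20194 | 2,038,980.00 |
| 31-Mar-99 | 294.1 | 6.8805 | 304.6 | 0 | 0 | 0 | 20377 | 2,076,010.87 |
| 30-Jun-99 | 377.1 | 7.2037 | 294.1 | 1 | 0 | 0 | 20472 | 2,095,440.74 |
| 30-Sep-99 | 355.6 | 7.7990 | 377.1 | 0 | 1 | 0 | 20756 | 2,153,982.23 |
| 31-Dec-99 | 308.1 | 7.8338 | 355.6 | 0 | 0 | 1 | 21124 | 2,231,020.37 |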