Chapter 12

REGRESSION ANALYSIS II
MULTIPLE LINEAR REGRESSION AND
OTHER TOPICS


12.1 (a) The scatter diagram and fitted line are illustrated below:

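(b) Observe that
\[
\bar{x} = \frac{\sum x}{n} = \frac{25.5}{7} = 3.6429, \qquad S_{xx} = \sum (x - \bar{x})^2 = 38.3571,
\]
\[
\bar{y} = \frac{\sum y}{n} = \frac{14.0}{7} = 2.0, \qquad S_{xy} = \sum (x - \bar{x})(y - \bar{y}) = -20.2,
\]
\[
S_{yy} = \sum (y - \bar{y})^2 = 12.64.
\]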


So,
\[
\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = -0.5266, \qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 2.0 - (-0.5266)(3.6429) = 3.9184.
\]
The fitted line is $\hat{y} = 3.92 - 0.53x$, which is graphed in part (a).
(c) The proportion of y variability explained is given by
\[
r^2 = \frac{S_{xy}^2}{S_{xx} S_{yy}} = 0.842.
\]

12.2 (a) The scatter diagram and fitted line are illustrated below:

(b) The transformed data are:

x           0.5     1       2       4       5       6       7
y' = 1/y    0.2174  0.3125  0.4762  0.5882  1.1111  1.4286  1.2500

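Observe that
\[
\bar{x} = \frac{\sum x}{n} = \frac{25.5}{7} = 3.6429, \qquad S_{xx} = \sum (x - \bar{x})^2 = 38.3571,
\]
\[
\bar{y} = \frac{\sum y}{n} = \frac{14.0}{7} = 2.0, \qquad S_{xy'} = \sum (x - \bar{x})(y' - \bar{y}') = 6.9912,
\]
\[
\bar{y}' = \frac{\sum y'}{n} = 0.7691, \qquad S_{y'y'} = \sum (y' - \bar{y}')^2 = 1.415.
\]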


So,
\[
\hat{\beta}_1 = \frac{S_{xy'}}{S_{xx}} = \frac{6.9912}{38.3571} = 0.1823, \qquad
\hat{\beta}_0 = \bar{y}' - \hat{\beta}_1 \bar{x} = 0.7691 - (0.1823)(3.6429) = 0.1050.
\]
The fitted line is $\hat{y}' = 0.105 + 0.182x$, which is graphed in part (a).
(c) The proportion of $y'$ variability explained is given by
\[
r^2 = \frac{S_{xy'}^2}{S_{xx} S_{y'y'}} = \frac{(6.9912)^2}{(38.3571)(1.415)} = 0.9005,
\]
which is a fairly high proportion. The fit is good, as shown in part (a).
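
The hand calculation for 12.2 can be checked numerically. The following is a minimal sketch that recomputes the slope, intercept, and $r^2$ from the tabulated transformed data; the helper name `least_squares` is hypothetical and introduced only for illustration. Because the tabulated $y'$ values are rounded to four decimals, the results should agree with the values above only up to rounding.

```python
# Transformed data from 12.2(b): x and y' = 1/y (values as tabulated, rounded).
x = [0.5, 1, 2, 4, 5, 6, 7]
y_prime = [0.2174, 0.3125, 0.4762, 0.5882, 1.1111, 1.4286, 1.2500]


def least_squares(xs, ys):
    """Return (intercept, slope, r_squared) for a simple linear fit.

    Hypothetical helper: it uses the same S_xx, S_xy, S_yy sums as the
    hand calculation.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in xs)
    s_yy = sum((yi - y_bar) ** 2 for yi in ys)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    r_squared = s_xy ** 2 / (s_xx * s_yy)
    return intercept, slope, r_squared


b0, b1, r2 = least_squares(x, y_prime)
print(f"intercept = {b0:.4f}, slope = {b1:.4f}, r^2 = {r2:.4f}")
```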




12.3 (a) $y' = \dfrac{1}{y^{1/3}}, \quad x' = x$
(b) $y' = \dfrac{1}{y}, \quad x' = \dfrac{1}{1 + x}$


12.4 (a) Using Minitab, enter the data into columns C1 and C2. The output is as
follows:

The scatter diagram and fitted line are illustrated below:


(b) We test the hypotheses
\[
H_0: \beta_1 = 0 \quad \text{versus} \quad H_1: \beta_1 > 0.
\]
Since $H_1$ is right-sided and $\alpha = 0.05$, the rejection region is $R: T > t_{0.05} = 1.746$ (for d.f. = 16). The value of the observed t is
\[
t = \frac{\hat{\beta}_1 - 0}{s / \sqrt{S_{xx}}} = \frac{1453.90}{86.40} = 16.83,
\]
which lies in R. Hence, $H_0$ is rejected at $\alpha = 0.05$. The evidence is very strong that log (Failure Time) depends on 1/Temperature. The expected value of log (Failure Time) decreases as x decreases. Thus, increasing the temperature reduces x and hence reduces the life of the insulation.
(c) Since $r^2 = 0.947$, the straight line explains most of the variation in y.

12.5 (a) The scatter diagram is shown in the figure below (labeled (i)).
(b) The scatter diagram of the transformed data, $x' = \log x$ and $y' = \log y$, reveals a more nearly linear relationship; this is illustrated in the figure below (labeled (ii)).


Using Minitab, the original data are entered into columns C1 and C2; the output is shown below.

In order to evaluate the mean and sum of squares for log(x), we also
calculate the following using Minitab:






(c) Since $S_{xx} = (n - 1)(\text{Standard Deviation})^2 = 19(1.0059)^2 = 19.225$ and $t_{0.025} = 2.101$ for d.f. = 18, a 95% confidence interval for $\beta_1$ is given by
\[
\hat{\beta}_1 \pm 2.101 \frac{s}{\sqrt{S_{xx}}} = -0.8993 \pm 2.101(0.2954) \quad \text{or} \quad (-1.52, -0.28).
\]
(d) At $x = 18$, $\hat{y} = 6.14 - 0.899 \log_e(18) = 3.54$. Since $S_{xx} = 19.225$, a 95% confidence interval is given by
\[
3.54 \pm 2.101(1.295)\sqrt{\frac{1}{20} + \frac{(2.8904 - 3.2107)^2}{19.225}} = 3.54 \pm 0.64
\]
or (2.90, 4.18).
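
The interval in 12.5(d) follows the usual pattern for the mean response in simple linear regression, applied here on the $\log_e x$ scale. As a rough check, the sketch below plugs the quoted quantities into that formula; the function name `mean_response_ci` is hypothetical, and the inputs (fitted value 3.54, $t_{0.025} = 2.101$, $s = 1.295$, mean of $\log_e x$ equal to 3.2107, $S_{xx} = 19.225$) are taken from the solution as given.

```python
import math


def mean_response_ci(y_hat, t_crit, s, n, x_new, x_bar, s_xx):
    """Confidence interval for the mean response at x_new (hypothetical helper).

    Half-width = t * s * sqrt(1/n + (x_new - x_bar)^2 / S_xx).
    """
    half_width = t_crit * s * math.sqrt(1 / n + (x_new - x_bar) ** 2 / s_xx)
    return y_hat - half_width, y_hat + half_width


# Quantities quoted in 12.5(c) and (d); the predictor is log_e(x).
lower, upper = mean_response_ci(
    y_hat=3.54, t_crit=2.101, s=1.295, n=20,
    x_new=math.log(18), x_bar=3.2107, s_xx=19.225,
)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```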

12.6 (a) $\hat{y} = 38.413 + 0.0166(1200) + 1.008(13) = 71.44$
(b) $\hat{y} = 38.413 + 0.0166(1200) + 1.008(15) = 73.45$
(c) $\hat{y} = 38.413 + 0.0166(1400) + 1.008(15) = 76.77$
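
The three predictions in 12.6 are evaluations of one fitted equation at different predictor values. A small sketch, assuming the coefficients 38.413, 0.0166, and 1.008 quoted above; the function name `predict` and its parameter names are illustrative only.

```python
def predict(x1, x2, b0=38.413, b1=0.0166, b2=1.008):
    """Evaluate the fitted plane b0 + b1*x1 + b2*x2 (coefficients from 12.6)."""
    return b0 + b1 * x1 + b2 * x2


# Predictor settings for parts (a), (b), and (c).
settings = [(1200, 13), (1200, 15), (1400, 15)]
predictions = [predict(x1, x2) for x1, x2 in settings]
for (x1, x2), y_hat in zip(settings, predictions):
    print(f"x1 = {x1}, x2 = {x2}: predicted y = {y_hat:.2f}")
```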

12.7 When $x_1 = 3$ and $x_2 = -2$, the mean of the response Y is
\[
E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 = -2 - 1(3) + 3(-2) = -11.
\]

12.8 (a) $s^2 = \dfrac{\text{SSE}}{n - 3} = \dfrac{167.71}{51 - 3} = 3.494$. So, $s = 1.869$, which is associated with $n - 3 = 48$ degrees of freedom.
(b) Total SS = SS due to regression + SSE = 2538.7 + 167.71 = 2706.41.
\[
R^2 = \frac{\text{SS due to regression}}{\text{Total SS}} = \frac{2538.7}{2706.41} = 0.938
\]
We see that 93.8% of the total variability in y is explained by regression on $x_1$ and $x_2$.
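
The arithmetic in 12.8 can be reproduced directly from the two sums of squares. A minimal sketch, assuming $n = 51$ observations and three estimated parameters (intercept plus two slopes), as in the solution; the variable names are illustrative.

```python
import math

sse = 167.71      # residual (error) sum of squares
ss_reg = 2538.7   # sum of squares due to regression
n = 51            # number of observations
n_params = 3      # beta_0, beta_1, beta_2

s_squared = sse / (n - n_params)   # estimate of the error variance
s = math.sqrt(s_squared)
total_ss = ss_reg + sse
r_squared = ss_reg / total_ss      # proportion of variability explained

print(f"s^2 = {s_squared:.3f}, s = {s:.3f}, d.f. = {n - n_params}")
print(f"Total SS = {total_ss:.2f}, R^2 = {r_squared:.3f}")
```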

12.9 (a) Since $t_{0.05} = 1.68$ for d.f. = 48, the 90% confidence interval for $\beta_2$ is
\[
\hat{\beta}_2 \pm t_{0.05}\, \text{SE}(\hat{\beta}_2) = 1.008 \pm 1.68(0.0977) \quad \text{or} \quad (0.844, 1.172).
\]
(b) Given that $\alpha = 0.05$ and $H_1$ is right-sided, the rejection region is $R: T > t_{0.05} = 1.68$ for d.f. = 48. Since the observed value of t is
\[
t = \frac{\hat{\beta}_1 - 0.0140}{\text{SE}(\hat{\beta}_1)} = \frac{0.0166 - 0.0140}{0.00107} = 2.430,
\]
which lies in R, we reject the null hypothesis $H_0: \beta_1 = 0.0140$ in favor of $H_1: \beta_1 > 0.0140$ at $\alpha = 0.05$.
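
The test in 12.9(b) compares a coefficient with a nonzero hypothesized value, so the hypothesized value is subtracted before dividing by the standard error. A short sketch using the numbers quoted above; the critical value 1.68 is taken from the solution rather than computed, and the function name `t_statistic` is hypothetical.

```python
def t_statistic(estimate, hypothesized, std_err):
    """t = (estimate - hypothesized value) / standard error."""
    return (estimate - hypothesized) / std_err


t_obs = t_statistic(estimate=0.0166, hypothesized=0.0140, std_err=0.00107)
t_crit = 1.68  # t_{0.05} for 48 d.f., as quoted in the solution

# Right-sided test: reject H0 when the observed t exceeds the critical value.
reject_h0 = t_obs > t_crit
print(f"t = {t_obs:.3f}, reject H0: {reject_h0}")
```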

12.10 (a) & (b) Using Minitab, we obtain the following output:

(c) $\hat{y} = 12.014 + 0.5555(2.5) + 0.10722(127) = 27.02$


(d) From the output, the proportion of y variability explained is 52.0%. Alternatively,
\[
R^2 = \frac{\text{SS due to regression}}{\text{Total SS}} = \frac{36.259}{69.787} = 0.520,
\]
or 52.0%.
(e) The confidence intervals for $\beta_0, \beta_1, \beta_2$ are:
\[
\beta_0: \ 12.014 \pm 2.101(5.071)
\]
\[
\beta_1: \ 0.5555 \pm 2.101(0.1770)
\]
\[
\beta_2: \ 0.1072 \pm 2.101(0.03775)
\]



12.11 (a) $\hat{\beta}_0 = 45.3, \quad \hat{\beta}_1 = -3.22, \quad \hat{\beta}_2 = -0.02066$
(b) $\hat{y} = 45.3 - 3.22x_1 - 0.02066x_2$.
(c) The proportion of y variability explained is $R^2 = 0.795$.
(d) $s^2 = \text{MSE} = \dfrac{\text{SSE}}{n - 3} = 11.58$


12.12 (a) 16 laptops (= total df + 1)
(b) $\hat{\beta}_0 = -258, \quad \hat{\beta}_1 = 61.71, \quad \hat{\beta}_2 = 20.7$
(c) $\hat{y} = -258 + 61.71x_1 + 20.7x_2$. All terms are significant.
(d) The proportion of y variability explained is $R^2 = 0.723$.

12.13 (a) Given that $\alpha = 0.05$ and $H_1$ is two-sided, the rejection region is $R: |T| > t_{0.025} = 2.086$ for d.f. = 20. Since the observed value of t is
\[
t = \frac{\hat{\beta}_1 - 0}{\text{SE}(\hat{\beta}_1)} = \frac{-3.22 - 0}{0.4562} = -7.058,
\]
which lies in R, we reject the null hypothesis $H_0: \beta_1 = 0$ in favor of $H_1: \beta_1 \neq 0$ at $\alpha = 0.05$.
(b) Given that $\alpha = 0.05$ and $H_1$ is two-sided, the rejection region is $R: |T| > t_{0.025} = 2.086$ for d.f. = 20. Since the observed value of t for $\hat{\beta}_2$ is $-2.414$, which does lie in R, we reject the null hypothesis $H_0: \beta_2 = 0$ in favor of $H_1: \beta_2 \neq 0$ at $\alpha = 0.05$.
(c) $\hat{y} = 45.3 - 3.22(3.2) - 0.0207(2.0) = 34.955$
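(d) A 90% confidence interval for $\beta_0$, which does not include 0, is given by
\[
\hat{\beta}_0 \pm t_{0.05}\, \text{SE}(\hat{\beta}_0) = 45.3 \pm 1.725(2.345) \quad \text{or} \quad (41.255, 49.345),
\]
where $t_{0.05} = 1.725$ for d.f. = 20.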

12.14 (a) Given that $\alpha = 0.05$ and $H_1$ is two-sided, the rejection region is $R: |T| > t_{0.025} = 2.160$ for d.f. = 13. Since the observed value of t is
\[
t = \frac{\hat{\beta}_1 - 0}{\text{SE}(\hat{\beta}_1)} = \frac{61.7 - 0}{13.03} = 4.74,
\]
which lies in R, we reject the null hypothesis $H_0: \beta_1 = 0$ in favor of $H_1: \beta_1 \neq 0$ at $\alpha = 0.05$.
(b) Given that $\alpha = 0.05$ and $H_1$ is two-sided, the rejection region is $R: |T| > t_{0.025} = 2.160$ for d.f. = 13. Since the observed value of t for $\hat{\beta}_2$ is 2.88, which lies in R, we reject the null hypothesis $H_0: \beta_2 = 0$ in favor of $H_1: \beta_2 \neq 0$ at $\alpha = 0.05$.
(c) $\hat{y} = -258 + 61.7(2) + 20.7(16.5) = 206.95$
(d) A 90% confidence interval for $\beta_0$, which does not include 0, is given by
\[
\hat{\beta}_0 \pm t_{0.05}\, \text{SE}(\hat{\beta}_0) = -258 \pm 1.771(117.9) \quad \text{or} \quad (-466.801, -49.199).
\]

12.15 (a) The scatter diagram of y versus $x' = \log_{10} x$ is shown below:

(b) The Minitab output is given below. The fitted line is $\hat{y} = 46.55 - 11.77 \log_{10} x$ (which is shown in part (a)).




(c) From the Minitab output in part (b), $s / \sqrt{S_{x'x'}} = 2.788$. You could also obtain the same result from direct calculation using $S_{x'x'} = 1.1862$. A 90% confidence interval for $\beta_1$ is given by
\[
\hat{\beta}_1 \pm 1.943 \frac{s}{\sqrt{S_{x'x'}}} = -11.77 \pm 1.943(2.788) \quad \text{or} \quad (-17.18, -6.36).
\]
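(d) At $x = 300$, $\hat{y} = 46.55 - 11.772 \log_{10}(300) = 17.39$. Since $\bar{x}' = 2.5739$ and $S_{x'x'} = 1.1862$, a 95% confidence interval for the expected y-value at $x = 300$ (that is, $x' = \log_{10}(300) = 2.4771$) is given by
\[
\hat{y} \pm t_{0.025}\, s \sqrt{\frac{1}{n} + \frac{(x' - \bar{x}')^2}{S_{x'x'}}}
= 17.39 \pm 2.447(3.031)\sqrt{\frac{1}{8} + \frac{(2.4771 - 2.5739)^2}{1.1862}}
= 17.39 \pm 2.70
\]
or (14.69, 20.09).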

12.16 (a) Note that $\dfrac{1}{y} = 1 + a e^{bx}$, so that $\log\left(\dfrac{1}{y} - 1\right) = \log(a) + bx$, where the right side of the equation is a linear function of x. Thus, the required linearizing transformation is $y' = \log\left(\dfrac{1}{y} - 1\right)$.
(b) Note that $\log_e(\log_e y) = \log_e(a) + b \log_e(x)$. Thus, the required linearizing transformation is $x' = \log_e(x)$ and $y' = \log_e(\log_e y)$.
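
To see why the transformation in 12.16(a) linearizes the relation, one can generate noise-free data from $1/y = 1 + a e^{bx}$, apply $y' = \log(1/y - 1)$, and fit a straight line; the slope should recover $b$ and the intercept should recover $\log(a)$. The sketch below does this with made-up parameter values ($a = 2$, $b = 0.5$) and made-up x values that are assumptions for illustration and do not come from the exercise.

```python
import math

# Hypothetical parameter values and x grid, chosen only for illustration.
a_true, b_true = 2.0, 0.5
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]

# Noise-free responses from the model 1/y = 1 + a * exp(b * x).
ys = [1 / (1 + a_true * math.exp(b_true * x)) for x in xs]

# Linearizing transformation: y' = log(1/y - 1) = log(a) + b * x.
y_prime = [math.log(1 / y - 1) for y in ys]

# Ordinary least squares fit of y' on x.
n = len(xs)
x_bar = sum(xs) / n
yp_bar = sum(y_prime) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_xy = sum((x - x_bar) * (yp - yp_bar) for x, yp in zip(xs, y_prime))

b_hat = s_xy / s_xx                            # estimate of b
a_hat = math.exp(yp_bar - b_hat * x_bar)       # exp(intercept) estimates a

print(f"b_hat = {b_hat:.6f}, a_hat = {a_hat:.6f}")
```

With real data the transformed points would scatter about the line rather than fall exactly on it, so the recovered values would only approximate $a$ and $b$.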

12.17 (a) & (b) The scatter diagram of y versus x, shown in part (i) of the figure below, reveals a relation along a curve, and that of $y' = \log_{10}(y)$ versus x, shown in part (ii) of the figure, looks like a straight-line relation.

(c) The fitted line $\widehat{\log_{10}(y)} = 1.16 + 0.0305x$ is shown in part (ii) of the figure in part (a). The Minitab output is shown below:


12.18 (a) (i) $\hat{y} = -0.3780 + 0.1826(5) + 0.3401(7) = 2.9157$
(ii) $\hat{y} = -0.3780 + 0.1826(11) + 0.3401(15) = 6.7321$
(b) Since five regression parameters are estimated, we have
\[
\text{d.f.} = n - 2 = 20 - 2 = 18, \qquad
s^2 = \frac{\text{SSE}}{20 - 2} = \frac{113.15}{18} = 6.286, \qquad
s = \sqrt{6.286} = 2.507.
\]

(c) Regression SS + Residual SS = Total SS, so in this case we have

245.40 + 113.15 = 358.55,
and
\[
R^2 = \frac{\text{Regression SS}}{\text{Total SS}} = \frac{245.40}{358.55} = 0.6844.
\]

12.19 (a) A 90% confidence interval for $\beta_1$ is given by
\[
\hat{\beta}_1 \pm t_{0.05}\, \text{SE}(\hat{\beta}_1) = 0.1826 \pm 1.752(0.1085) \quad \text{or} \quad (-0.0075, 0.3727).
\]


(b) Given that $\alpha = 0.05$ and $H_1$ is two-sided, the rejection region is $R: |T| > t_{0.025} = 2.101$ for d.f. = 15. Since the observed value of t for $\hat{\beta}_2$ is
\[
t = \frac{\hat{\beta}_2 - 0.10}{\text{SE}(\hat{\beta}_2)} = \frac{0.3401 - 0.10}{0.0451} = 5.324,
\]
which lies in R, we reject the null hypothesis $H_0: \beta_2 = 0.10$ in favor of $H_1: \beta_2 \neq 0.10$ at $\alpha = 0.05$.

12.20 We plot the residuals versus the predicted value $\hat{y}$. The plot shows that the residuals tend to have higher variability with increasing values of $\hat{y}$. This indicates a violation of the assumption of constant variance.

12.21 (a) We plot the residuals versus the predicted value $\hat{y}$ and versus the time order in parts (i) and (ii), respectively, of the figure below.

(b) Plot (i) does not seem to signify any appreciable violation of the assumptions. Plot (ii) of residual versus time order, however, exhibits a distinct pattern. The residuals tend to increase steadily over time. This indicates a possible violation of the independence assumption.

12.22 The residual 1.3 at $x = 2$ has a much larger magnitude than the other residuals.
The corresponding y observation should be investigated for either a possible
error in recording or some other unanticipated circumstance.

12.23 Looking at the residuals in time order shown in the figure below, a distinct
pattern is apparent. The residuals decrease quite systematically until about the
year 15, and then they steadily increase. Residuals adjacent in time have similar
magnitude. This pattern casts serious doubt on the independence assumption.
Violation of independence is frequent in time series data such as these.

12.24 (a) The Minitab output is as follows:



(b) The least squares fit to a quadratic function reduces the residual sum of squares from 14.606 to 12.7044, but the p-value for testing $H_0: \beta_2 = 0$ is large, so the quadratic term is not needed. The Minitab output is shown below:


(c) The proportion of y variability explained is $R^2 = 0.754$, or 75.4%.
(d) For the straight line regression $s = 1.274$, and for the quadratic fit $s = \sqrt{\text{Error MS}} = \sqrt{1.5881} = 1.260$.

12.25 Use Minitab to find a quadratic fit for these data:
[Figure: Fitted Line Plot of y versus x. C2 = 81.19 + 0.9983 C1 + 0.009929 C1**2; S = 3.30271, R-Sq = 99.8%, R-Sq(adj) = 99.8%.]

The regression equation is
C2 = 81.19 + 0.9983 C1 + 0.009929 C1**2

S = 3.30271 R-Sq = 99.8% R-Sq(adj) = 99.8%

Analysis of Variance

Source      DF       SS       MS        F      P
Regression   2  53589.8  26794.9  2456.47  0.000
Error        8     87.3     10.9
Total       10  53677.1

Sequential Analysis of Variance

Source     DF       SS       F      P
Linear      1  52744.0  508.73  0.000
Quadratic   1    845.8   77.54  0.000

(b) The proportion of y variability explained is $R^2 = 0.998$, or 99.8%.
(c) Given that $\alpha = 0.05$ and $H_1$ is left-sided, the rejection region is $R: T < -t_{0.05} = -1.860$ for d.f. = 8. Since the observed value of t is
\[
t = \frac{\hat{\beta}_1 - 0}{\text{SE}(\hat{\beta}_1)} = \frac{0.9983}{0.1389} = 7.19,
\]
which does not lie in R, we do not reject the null hypothesis $H_0: \beta_1 = 0$ in favor of $H_1: \beta_1 < 0$ at $\alpha = 0.05$.

12.26 (a) & (c) Using Minitab, enter the data into columns C1, C2, and C3. The
output is shown below:

Regression Analysis: C3 versus C2, C1

The regression equation is
C3 = 18.8 - 0.0445 C2 - 1.19 C1

Predictor      Coef  SE Coef      T      P
Constant    18.8092   0.7942  23.68  0.000
C2         -0.04452  0.01683  -2.65  0.038
C1          -1.1892   0.2040  -5.83  0.001

S = 1.05520   R-Sq = 95.1%   R-Sq(adj) = 93.5%

Analysis of Variance

Source          DF       SS      MS      F      P
Regression       2  129.728  64.864  58.26  0.000
Residual Error   6    6.681   1.113
Total            8  136.409

Source  DF  Seq SS
C2       1  91.900
C1       1  37.828


The fitted equation is $\hat{y} = 18.8 - 0.0445x_2 - 1.19x_1$, where $x_1$ is age (C1) and $x_2$ is odometer mileage (C2), with $R^2 = 0.951$. This means that the mean price of the used car decreases by about 1,190 dollars if age increases by one year and odometer mileage remains fixed. Note that $\hat{\beta}_2$ is negative: the mean price still decreases by about 44.5 dollars if odometer mileage increases by one thousand miles while age remains fixed. From the Minitab output, both variables play a significant role. Moreover, the high $R^2$ value signifies a good fit.

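(b) With $t_{0.025} = 2.447$ for d.f. = 6, the 95% confidence intervals are given by:
\[
\hat{\beta}_1 \pm t_{0.025}\, \text{SE}(\hat{\beta}_1) = -1.1892 \pm 2.447(0.2040)
\]
\[
\hat{\beta}_2 \pm t_{0.025}\, \text{SE}(\hat{\beta}_2) = -0.04452 \pm 2.447(0.0168)
\]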
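
The interval endpoints in 12.26(b) can be evaluated from the Coef and SE Coef columns of the Minitab output. The sketch below assumes the standard t-table value $t_{0.025} \approx 2.447$ for 6 degrees of freedom (an assumption of this sketch, entered by hand rather than computed) and uses illustrative variable names.

```python
# Coefficient estimates and standard errors from the Minitab output in 12.26.
coefficients = {
    "C1 (age)": (-1.1892, 0.2040),
    "C2 (mileage)": (-0.04452, 0.01683),
}

# Assumed table value of t_{0.025} for 6 d.f. (residual error DF in the output).
t_crit = 2.447

intervals = {}
for name, (estimate, std_err) in coefficients.items():
    half_width = t_crit * std_err
    intervals[name] = (estimate - half_width, estimate + half_width)

for name, (lower, upper) in intervals.items():
    print(f"{name}: ({lower:.4f}, {upper:.4f})")
```

Both intervals computed this way lie entirely below zero, which is in line with the small p-values reported for C1 and C2 in the output.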

12.27 (a) The Minitab output is as follows:

From the output, note that the fitted line is $\hat{y} = -0.167 + 0.237x$ and $r^2 = 0.925$. The constant term could be dropped and the model re-fit.
(b) Since $t_{0.025} = 2.306$ for d.f. = 8, a 95% confidence interval for $\beta_1$ is given by
\[
\hat{\beta}_1 \pm 2.306 \frac{s}{\sqrt{S_{xx}}} = 0.23703 \pm 2.306(0.02383) \quad \text{or} \quad (0.182, 0.292).
\]
(c) Since $\bar{x} = 37$ and $S_{xx} = 1610$, a 95% confidence interval (for $x = 45$) is given by
\[
[-0.167 + 0.237(45)] \pm 2.306(0.9563)\sqrt{\frac{1}{10} + \frac{(45 - 37)^2}{1610}}
\]
or (9.67, 11.32).
12.28 (a) & (b) In part (i) of the figure below, the scatter diagram shows a relationship, along a curve, between the diameters and the heights of sugar maple trees. Using the transformations $x' = \log x$ and $y' = \log y$, the scatter diagram of the transformed data reveals a linear relationship, as shown in part (ii) of the figure below:

(c) Using Minitab, the original data are entered into columns C1 and C2; the output is shown below.
The fitted line, namely $\widehat{\log_e(\text{height})} = 3.07 + 0.465 \log_e(\text{diameter})$, is shown in the above figure (part (ii)).




(d) The proportion of variance of $\log_e(y)$ explained is $r^2 = 0.930$.

12.29 (a) $\hat{y} = 50.4 + 0.1907x_2$ and $r^2 = 0.03$. The Minitab output is shown below:

(b)
1 2 3
92.32 0.583 0.1494 35.07 y x x x = + + and
2
0.586 R = . The Minitab
output is shown below:


(c) Even three variables do not predict well. In fact, the GPA ($x_3$) could predict almost as well by itself. We summarize the results in the following table:

Predictor   $x_3$    $x_3$ and $x_1$    $x_3$, $x_1$, and $x_2$
$R^2$       0.495    0.570              0.586

12.30 (a) The Minitab output is shown on the top of the next page.
(b) At $x = 160$, the expected CLEP score is
$\hat{y} = 991 - 9.26(160) + 0.0432(160)^2 = 615.3$.
(c) We summarize the $R^2$ values of various predictors in the following table:

Predictor   $x$      $x^2$    $x$ and $x^2$
$R^2$       0.691    0.719    0.749

In general, none of the three models fits the data well, since $R^2$ has only moderate values. The simple linear fit does almost as well as the two quadratic fits.




12.31 The design matrix X of the model $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + e$ is
\[
X = \begin{bmatrix}
1 & 1 & 14 \\
1 & 2 & 44 \\
1 & 2 & 20 \\
1 & 4 & 36 \\
1 & 4 & 66 \\
1 & 5 & 59 \\
1 & 7 & 100 \\
1 & 7 & 95 \\
1 & 8 & 38
\end{bmatrix}
\]









12.32 The design matrix X of the model $Y = \beta_0 + \beta_1 x + \beta_2 x^2 + e$ is
\[
X = \begin{bmatrix}
1 & 10 & 100 \\
1 & 20 & 400 \\
1 & 30 & 900 \\
1 & 40 & 1600 \\
1 & 50 & 2500 \\
1 & 60 & 3600 \\
1 & 70 & 4900 \\
1 & 80 & 6400 \\
1 & 90 & 8100 \\
1 & 100 & 10000 \\
1 & 110 & 12100
\end{bmatrix}
\]
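
The design matrix for the quadratic model in 12.32 has one row per observation, with a leading 1 for the intercept followed by $x$ and $x^2$. A minimal sketch that builds it from the x values 10, 20, ..., 110 read off the second column; the variable names are illustrative.

```python
# x values for the 11 observations (second column of the design matrix).
xs = list(range(10, 111, 10))

# Each row is [1, x, x^2]: intercept column, linear term, quadratic term.
design_matrix = [[1, x, x ** 2] for x in xs]

for row in design_matrix:
    print(row)
```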



12.33 Using Minitab, the data for gender, initial and final number of sit-ups are
entered into columns C1, C8, and C9, respectively. The fitted equation is

final number = 9.999 + 0.155 (gender) + 0.9015 (initial number)

With the high p-value of 0.899 shown in the output (below), we cannot reject
the hypothesis that the coefficient of predictor gender is zero. In fact, a simple
linear fit with initial number of sit-ups as predictor has $r^2 = 0.715$.



12.34 Using Minitab, enter the data into 3 columns. The output is as follows:

Regression Analysis

The regression equation is
posttest run = 146 + 0.631 pretest run + 27.5 gender

Predictor     Coef    StDev      T      P
Constant    145.61    26.53   5.49  0.000
pretest    0.63084  0.04349  14.51  0.000
gender       27.55    11.23   2.45  0.016

S = 42.36   R-Sq = 82.3%   R-Sq(adj) = 81.8%

Analysis of Variance

Source          DF      SS      MS       F      P
Regression       2  649204  324602  180.88  0.000
Residual Error  78  139978    1795
Total           80  789183

Source   DF  Seq SS
pretest   1  638408
gender    1   10797

Unusual Observations
Obs  pretest  posttest     Fit  StDev Fit    Resid  St Resid
  9     1065    960.00  872.55      13.95    87.45    2.19 R
 13     1080    806.00  882.02      14.53   -76.02   -1.91 X
 16      945    700.00  796.85       9.63   -96.85   -2.35 R
 41      905    636.00  771.62       8.42  -135.62   -3.27 R
 56      750    565.00  673.84       6.60  -108.84   -2.60 R

R denotes an observation with a large standardized residual
X denotes an observation whose X value gives it large influence.

So, the fitted equation is:

posttest run = 146 + 0.631 pretest run + 27.5 gender
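
The summary line of the Minitab output in 12.34 (S, R-Sq, R-Sq(adj)) and the F ratio can all be recovered from the analysis of variance table. The sketch below does so using the usual definitions, with the sums of squares and degrees of freedom copied from the output; the variable names are illustrative, and small differences from the printed values are expected because the table entries are rounded.

```python
import math

# Entries copied from the Analysis of Variance table in 12.34.
ss_reg, df_reg = 649204, 2
ss_err, df_err = 139978, 78
ss_total, df_total = 789183, 80

ms_reg = ss_reg / df_reg
ms_err = ss_err / df_err

s = math.sqrt(ms_err)                          # residual standard deviation S
r_sq = ss_reg / ss_total                       # R-Sq
r_sq_adj = 1 - ms_err / (ss_total / df_total)  # R-Sq(adj)
f_stat = ms_reg / ms_err                       # F ratio for the regression

print(f"S = {s:.2f}, R-Sq = {100 * r_sq:.1f}%, R-Sq(adj) = {100 * r_sq_adj:.1f}%")
print(f"F = {f_stat:.2f}")
```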

The plot of the residuals versus fitted value is shown on the following page:

[Figure: Residual plots for pretest run versus posttest run. Panels: Histogram of Residuals; I Chart of Residuals (by observation number); Residuals vs. Fits; Normal Plot of Residuals (against normal score).]

[Figure: Residual plots for gender versus posttest run. Panels: Histogram of Residuals; I Chart of Residuals (by observation number); Residuals vs. Fits; Normal Plot of Residuals (against normal score).]
