Chapter 12
REGRESSION ANALYSIS II
MULTIPLE LINEAR REGRESSION AND
OTHER TOPICS
12.1 (a) The scatter diagram and fitted line are illustrated below:
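(b) Observe that
$\bar{x} = \frac{\sum x}{n} = \frac{25.5}{7} = 3.6429$, $S_{xx} = \sum (x - \bar{x})^2 = 38.3571$,
$\bar{y} = \frac{\sum y}{n} = \frac{14.0}{7} = 2.0$, $S_{xy} = \sum (x - \bar{x})(y - \bar{y}) = -20.2$,
$S_{yy} = \sum (y - \bar{y})^2 = 12.64$.
So,
$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = -0.5266$, $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 2.0 - (-0.5266)(3.6429) = 3.9184$.
The fitted line is $\hat{y} = 3.92 - 0.53x$, which is graphed in part (a).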
(c) Proportion of y variability explained is given by
$r^2 = \frac{S_{xy}^2}{S_{xx} S_{yy}} = 0.842$.
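The arithmetic in parts (b) and (c) can be checked with a short script. This is a minimal sketch that starts from the summary statistics quoted in part (b); the function name `fit_from_sums` is an illustrative choice, and $S_{xy}$ is taken as negative, which is what the downward-sloping fitted line requires.

```python
def fit_from_sums(x_bar, y_bar, s_xx, s_xy, s_yy):
    """Return (intercept, slope, r_squared) of a least squares line
    computed from the usual summary statistics."""
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    r_squared = s_xy ** 2 / (s_xx * s_yy)
    return intercept, slope, r_squared


# Summary statistics from part (b); S_xy is assumed negative.
b0, b1, r2 = fit_from_sums(25.5 / 7, 14.0 / 7, 38.3571, -20.2, 12.64)
print(f"intercept = {b0:.4f}, slope = {b1:.4f}, r^2 = {r2:.3f}")
```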
12.2 (a) The scatter diagram and fitted line are illustrated below:
(b) The transformed data are:
x            0.5     1       2       4       5       6       7
y' = 1/y     0.2174  0.3125  0.4762  0.5882  1.1111  1.4286  1.2500
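Observe that
$\bar{x} = \frac{\sum x}{n} = \frac{25.5}{7} = 3.6429$, $S_{xx} = \sum (x - \bar{x})^2 = 38.3571$,
$\bar{y} = \frac{\sum y}{n} = \frac{14.0}{7} = 2.0$, $S_{xy'} = \sum (x - \bar{x})(y' - \bar{y}') = 6.9912$,
$\bar{y}' = \frac{\sum y'}{n} = 0.7691$, $S_{y'y'} = \sum (y' - \bar{y}')^2 = 1.415$.
So,
$\hat{\beta}_1 = \frac{S_{xy'}}{S_{xx}} = \frac{6.9912}{38.3571} = 0.1823$,
$\hat{\beta}_0 = \bar{y}' - \hat{\beta}_1 \bar{x} = 0.7691 - (0.1823)(3.6429) = 0.1050$.
The fitted line is $\hat{y}' = 0.105 + 0.182x$, which is graphed in part (a).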
(c) Proportion of y' variability explained is given by
$r^2 = \frac{S_{xy'}^2}{S_{xx} S_{y'y'}} = \frac{(6.9912)^2}{(38.3571)(1.415)} = 0.9005$, which is a fairly high
proportion. This fit is good, as is shown in part (a).
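The fit to the transformed data can be reproduced from the table in part (b). The sketch below is a plain least squares computation on the tabulated (x, 1/y) pairs; the helper name `least_squares` is hypothetical, and because the tabulated values are rounded to four decimals, the results agree with the solution only to about three decimals.

```python
def least_squares(xs, ys):
    """Return (intercept, slope, r_squared) for a simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((a - x_bar) ** 2 for a in xs)
    s_yy = sum((b - y_bar) ** 2 for b in ys)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope, s_xy ** 2 / (s_xx * s_yy)


x = [0.5, 1, 2, 4, 5, 6, 7]
y_prime = [0.2174, 0.3125, 0.4762, 0.5882, 1.1111, 1.4286, 1.2500]  # 1/y

b0_t, b1_t, r2_t = least_squares(x, y_prime)
print(f"intercept = {b0_t:.3f}, slope = {b1_t:.3f}, r^2 = {r2_t:.3f}")
```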
12.3 (a) $y' = \frac{1}{y^{1/3}}$, $x' = x$
(b) $y' = \frac{1}{y}$, $x' = \frac{1}{1 + x}$
12.4 (a) Using Minitab, enter the data into columns C1 and C2. The output is as
follows:
The scatter diagram and fitted line are illustrated below:
(b) We test the hypotheses:
$H_0: \beta_1 = 0$ versus $H_1: \beta_1 > 0$
Since $H_1$ is right-sided and $\alpha = 0.05$, the rejection region is
$R: T > t_{0.05} = 1.746$ (for d.f. = 16). The value of the observed t is
$t = \frac{\hat{\beta}_1 - 0}{s / \sqrt{S_{xx}}} = \frac{1453.90}{86.40} = 16.83$,
which lies in R. Hence, $H_0$ is rejected at $\alpha = 0.05$. The evidence is very
strong that log (Failure Time) depends on 1/Temperature. The expected
value of log (Failure Time) decreases as x decreases. Thus, increasing
the temperature reduces x and hence reduces the life of the insulation.
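The test in part (b) reduces to one division and one comparison. The sketch below assumes that 1453.90 is the estimated slope and 86.40 is its estimated standard error $s / \sqrt{S_{xx}}$, as the displayed ratio suggests; the critical value 1.746 is copied from the solution rather than computed.

```python
beta1_hat = 1453.90   # estimated slope (assumed reading of the numerator)
se_beta1 = 86.40      # estimated standard error, s / sqrt(S_xx)
t_crit = 1.746        # upper 5% point of t with 16 d.f., from the solution

# Test statistic for H0: beta1 = 0 against the right-sided alternative.
t_obs = (beta1_hat - 0.0) / se_beta1
reject_h0 = t_obs > t_crit

print(f"t = {t_obs:.2f}, reject H0: {reject_h0}")
```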
(c) Since $r^2 = 0.947$, the straight line explains most of the variation in y.
12.5 (a) The scatter diagram is shown in the figure below (labeled as (i)).
(b) The scatter diagram of the transformed data, $x' = \log x$ and $y' = \log y$,
reveals a more nearly linear relationship; this is illustrated in the figure
below (labeled as (ii)).
Using Minitab, the original data are entered into columns C1 and C2; the
output is shown below.
In order to evaluate the mean and sum of squares for log(x), we also
calculate the following using Minitab:
(c) Since $S_{x'x'} = (n - 1)(\text{Standard Deviation})^2 = 19(1.0059)^2 = 19.225$ and
$t_{0.025} = 2.101$ for d.f. = 18, a 95% confidence interval for $\beta_1$ is given by
$\hat{\beta}_1 \pm t_{0.025} \frac{s}{\sqrt{S_{x'x'}}}$,
or (2.90, 4.18).
12.6 (a) $\hat{y} = 38.413 + 0.0166(1200) + 1.008(13) = 71.44$
(b) $\hat{y} = 38.413 + 0.0166(1400 - 200) + 1.008(15) = 38.413 + 0.0166(1200) + 1.008(15) = 73.45$
(c) $\hat{y} = 38.413 + 0.0166(1400) + 1.008(15) = 76.77$
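The three predictions differ only in the values plugged into the fitted equation, so they can be generated by one small function. This is a sketch; the name `predict` and its default arguments are illustrative, with the coefficients copied from the solution.

```python
def predict(x1, x2, b0=38.413, b1=0.0166, b2=1.008):
    """Evaluate the fitted plane y_hat = b0 + b1*x1 + b2*x2."""
    return b0 + b1 * x1 + b2 * x2


# Settings (x1, x2) used in parts (a), (b), and (c).
for label, (x1, x2) in zip("abc", [(1200, 13), (1200, 15), (1400, 15)]):
    print(f"({label}) y_hat = {predict(x1, x2):.2f}")
```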
12.7 When $x_1 = 3$ and $x_2 = -2$, the mean of the response Y is
$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 = -2 - 1(3) + 3(-2) = -11$.
12.8 (a) $s^2 = \frac{\text{SSE}}{n - 3} = \frac{167.71}{51 - 3} = 3.494$. So, $s = 1.869$, which is associated with
$n - 3 = 48$ degrees of freedom.
(b) Total SS = SS due to regression + SSE = 2538.7 + 167.71 = 2706.41.
$R^2 = \frac{\text{SS due to regression}}{\text{Total SS}} = \frac{2538.7}{2706.41} = 0.938$
We see that 93.8% of the total variability in y is explained by regression
on $x_1$ and $x_2$.
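Both parts follow directly from the sums of squares. The sketch below recomputes them, assuming n = 51 observations and two predictors (so three estimated coefficients), as used in part (a); the variable names are illustrative.

```python
import math

sse = 167.71          # residual (error) sum of squares
ss_reg = 2538.7       # sum of squares due to regression
n, k = 51, 2          # sample size and number of predictors

df_error = n - (k + 1)        # one coefficient per predictor plus intercept
s2 = sse / df_error           # estimate of the error variance
s = math.sqrt(s2)

total_ss = ss_reg + sse
r_squared = ss_reg / total_ss

print(f"d.f. = {df_error}, s^2 = {s2:.3f}, s = {s:.3f}")
print(f"Total SS = {total_ss:.2f}, R^2 = {r_squared:.3f}")
```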
12.9 (a) Since $t_{0.025} = 2.011$ for d.f. = 48, the 95% confidence interval for $\beta_2$ is
$\hat{\beta}_2 \pm t_{0.025}\,\text{SE}(\hat{\beta}_2) = 1.008 \pm 2.011(0.0977)$ or (0.812, 1.204)
(b) Given that $\alpha = 0.05$ and $H_1$ is right-sided, the rejection region is
$R: T > t_{0.05} = 1.68$ for d.f. = 48. Since the observed value of t is
$t = \frac{\hat{\beta}_1 - 0.0140}{\text{SE}(\hat{\beta}_1)}$, which lies in R, we reject the null
hypothesis $H_0: \beta_1 = 0.0140$ in favor of $H_1: \beta_1 > 0.0140$ at $\alpha = 0.05$.
12.10 (a) & (b) Using Minitab, we obtain the following output:
(c) $\hat{y} = 12.014 + 0.5555(2.5) + 0.10722(127) = 27.02$
(d) From the output, the proportion of y variability explained is 52.0%.
Alternatively,
$R^2 = \frac{\text{SS due to regression}}{\text{Total SS}} = \frac{36.259}{69.787} = 0.520$, or 52.0%.
(e) The confidence intervals for $\beta_0, \beta_1, \beta_2$ are:
$\beta_0$: $12.014 \pm 2.101(5.071)$
$\beta_1$: $0.5555 \pm 2.101(0.1770)$
$\beta_2$: $0.1072 \pm 2.101(0.03775)$
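The intervals in part (e) are left in "estimate ± multiplier × standard error" form. A short sketch such as the following evaluates the endpoints; the helper `conf_interval` and the dictionary layout are illustrative choices, and the multiplier 2.101 is simply the value used in the solution.

```python
def conf_interval(estimate, std_error, t_crit):
    """Return (lower, upper) for estimate +/- t_crit * std_error."""
    half_width = t_crit * std_error
    return estimate - half_width, estimate + half_width


# (estimate, standard error) pairs copied from part (e).
coefficients = {
    "beta0": (12.014, 5.071),
    "beta1": (0.5555, 0.1770),
    "beta2": (0.1072, 0.03775),
}

intervals = {
    name: conf_interval(est, se, 2.101)
    for name, (est, se) in coefficients.items()
}

for name, (lower, upper) in intervals.items():
    print(f"{name}: ({lower:.4f}, {upper:.4f})")
```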
12.11 (a) $\hat{\beta}_0 = 45.3$, $\hat{\beta}_1 = -3.22$, $\hat{\beta}_2 = -0.02066$
(b) $\hat{y} = 45.3 - 3.22 x_1 - 0.02066 x_2$.
(c) The proportion of y variability explained is $R^2 = 0.795$.
(d) $s^2 = \text{MSE} = \frac{\text{SSE}}{n - 3} = 11.58$
12.12 (a) 16 laptops (= total df + 1)
(b) $\hat{\beta}_0 = -258$, $\hat{\beta}_1 = 61.71$, $\hat{\beta}_2 = 20.7$
(c) $\hat{y} = -258 + 61.71 x_1 + 20.7 x_2$. All terms are significant.
(d) The proportion of y variability explained is $R^2 = 0.723$.
12.13 (a) Given that $\alpha = 0.05$ and $H_1$ is two-sided, the rejection region is
$R: |T| > t_{0.025} = 2.086$ for d.f. = 20. Since the observed value of t is
$t = \frac{\hat{\beta}_1 - 0}{\text{SE}(\hat{\beta}_1)} = \frac{-3.22 - 0}{0.4562} = -7.058$, which lies in R, we reject the null
hypothesis $H_0: \beta_1 = 0$ in favor of $H_1: \beta_1 \neq 0$ at $\alpha = 0.05$.
(b) Given that $\alpha = 0.05$ and $H_1$ is two-sided, the rejection region is
$R: |T| > t_{0.025} = 2.086$ for d.f. = 20. Since the observed value of t for $\beta_2$ is
$-2.414$, which does lie in R, we reject the null hypothesis $H_0: \beta_2 = 0$ in
favor of $H_1: \beta_2 \neq 0$ at $\alpha = 0.05$.
(c) $\hat{y} = 45.3 - 3.22(3.2) - 0.0207(2.0) = 34.955$
(d) A 90% confidence interval for $\beta_0$, which does not include 0, is given by
$\hat{\beta}_0 \pm t_{0.05}\,\text{SE}(\hat{\beta}_0) = 45.3 \pm 1.725(2.345)$ or (41.255, 49.345), using $t_{0.05} = 1.725$ for d.f. = 20.
12.14 (a) Given that $\alpha = 0.05$ and $H_1$ is two-sided, the rejection region is
$R: |T| > t_{0.025} = 2.160$ for d.f. = 13. Since the observed value of t is
$t = \frac{\hat{\beta}_1 - 0}{\text{SE}(\hat{\beta}_1)} = \frac{61.7 - 0}{13.03} = 4.74$, which lies in R, we reject the null hypothesis
$H_0: \beta_1 = 0$ in favor of $H_1: \beta_1 \neq 0$ at $\alpha = 0.05$.
(b) Given that $\alpha = 0.05$ and $H_1$ is two-sided, the rejection region is
$R: |T| > t_{0.025} = 2.160$ for d.f. = 13. Since the observed value of t for $\beta_2$ is
2.88, which lies in R, we reject the null hypothesis $H_0: \beta_2 = 0$ in favor of $H_1: \beta_2 \neq 0$ at
$\alpha = 0.05$.
(c) $\hat{y} = -258 + 61.7(2) + 20.7(16.5) = 206.95$
(d) A 90% confidence interval for $\beta_0$, which does not include 0, is given by
$\hat{\beta}_0 \pm t_{0.05}\,\text{SE}(\hat{\beta}_0) = -258 \pm 1.771(117.9)$ or $(-466.801, -49.199)$.
12.15 (a) The scatter diagram of y versus $x' = \log_{10} x$ is shown below:
(b) The Minitab output is given below. The fitted line is
$\hat{y} = 46.55 - 11.77 \log_{10} x$ (which is shown in part (a)).
(c) From the Minitab output in part (b), $\frac{s}{\sqrt{S_{x'x'}}} = 2.788$. You could also
obtain the same result from direct calculation using $S_{x'x'} = 1.1862$. A 90%
confidence interval for $\beta_1$ is given by
$\hat{\beta}_1 \pm t_{0.05} \frac{s}{\sqrt{S_{x'x'}}}$, or $(-17.18, -6.36)$.
(d) At $x = 300$, $\hat{y} = 46.55 - 11.772 \log_{10}(300) = 17.39$, where $x' = \log_{10}(300) = 2.4771$. Since $\bar{x}' = 2.5739$
and $S_{x'x'} = 1.1862$, a 95% confidence interval for the expected y-value at
$x = 300$ is given by
$\hat{y} \pm t_{0.025}\, s \sqrt{\frac{1}{n} + \frac{(x' - \bar{x}')^2}{S_{x'x'}}} = 17.39 \pm 2.447(3.031) \sqrt{\frac{1}{8} + \frac{(2.4771 - 2.5739)^2}{1.1862}} = 17.39 \pm 2.70$
or (14.69, 20.09).
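The interval for the expected response at a given x can be evaluated numerically. The sketch below uses the standard formula $\hat{y} \pm t\, s \sqrt{1/n + (x' - \bar{x}')^2 / S_{x'x'}}$ with the quantities quoted in part (d); `mean_response_ci` is a hypothetical helper name, and $\log_{10}(300)$ is evaluated with `math.log10` rather than typed in by hand.

```python
import math


def mean_response_ci(y_hat, t_crit, s, n, x_new, x_bar, s_xx):
    """Confidence interval for the mean response at x_new
    in a simple linear regression."""
    half_width = t_crit * s * math.sqrt(1 / n + (x_new - x_bar) ** 2 / s_xx)
    return y_hat - half_width, y_hat + half_width


x_new = math.log10(300)               # predictor on the log10 scale
y_hat = 46.55 - 11.772 * x_new        # fitted value from part (b)

lower, upper = mean_response_ci(
    y_hat, t_crit=2.447, s=3.031, n=8, x_new=x_new, x_bar=2.5739, s_xx=1.1862
)
print(f"y_hat = {y_hat:.2f}, interval = ({lower:.2f}, {upper:.2f})")
```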
12.16 (a) Note that $\frac{1}{y} = 1 + a e^{bx}$, so that $\log\left(\frac{1}{y} - 1\right) = \log(a) + bx$, where the
right-side of the equation is a linear function of x. Thus, the required
linearizing transformation is $y' = \log\left(\frac{1}{y} - 1\right)$.
(b) Note that $\log_e(\log_e y) = \log_e(a) + b \log_e(x)$. Thus, the required linearizing
transformation is $x' = \log_e(x)$ and $y' = \log_e(\log_e y)$.
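A linearizing transformation can be checked numerically: data generated exactly from the curve should fall on a straight line after transforming. The sketch below does this for the relation in part (a), $1/y = 1 + a e^{bx}$, with hypothetical parameter values a = 2.0 and b = 0.5 that are not part of the exercise; the fitted line should recover intercept $\log(a)$ and slope b.

```python
import math


def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((u - x_bar) ** 2 for u in xs)
    s_xy = sum((u - x_bar) * (v - y_bar) for u, v in zip(xs, ys))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


a_true, b_true = 2.0, 0.5                       # hypothetical parameters
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [1 / (1 + a_true * math.exp(b_true * u)) for u in xs]

# Transformation from part (a): y' = log(1/y - 1) = log(a) + b*x.
ys_transformed = [math.log(1 / v - 1) for v in ys]

intercept, slope = least_squares(xs, ys_transformed)
print(f"intercept = {intercept:.4f} (log a = {math.log(a_true):.4f})")
print(f"slope     = {slope:.4f} (b = {b_true})")
```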
12.17 (a) & (b) The scatter diagram of y versus x shown in the figure (part (i))
below reveals a relation along a curve, and that of $y' = \log_{10}(y)$ versus x
shown in the figure (part (ii)) looks like a straight line relation.
(c) The fitted line $\hat{y}' = 1.16 + 0.0305x$, where $y' = \log_{10}(y)$, is shown in part (ii) of the
figure in part (a). The Minitab output is shown below:
12.18 (a) (i) $\hat{y} = -0.3780 + 0.1826(5) + 0.3401(7) = 2.9157$
(ii) $\hat{y} = -0.3780 + 0.1826(11) + 0.3401(15) = 6.7321$
(b) Since five regression parameters are estimated, we have
d.f. $= n - 2 = 20 - 2 = 18$,
$s^2 = \frac{\text{SSE}}{20 - 2} = \frac{113.15}{18} = 6.286$,
$s = \sqrt{6.286} = 2.507$.
(c) Regression SS + Residual SS = Total SS, so in this case we have
245.40 + 113.15 = 358.55,
and
$R^2 = \frac{\text{Regression SS}}{\text{Total SS}} = \frac{245.40}{358.55} = 0.6844$.
12.19 (a) A 90% confidence interval for $\beta_1$ is given by
$\hat{\beta}_1 \pm t_{0.05}\,\text{SE}(\hat{\beta}_1) = 0.1826 \pm 1.752(0.1085)$ or (-0.0075, 0.3727).
(b) Given that $\alpha = 0.05$ and $H_1$ is two-sided, the rejection region is
$R: |T| > t_{0.025} = 2.101$ for d.f. = 15. Since the observed value of t for $\beta_2$ is
$t = \frac{\hat{\beta}_2 - 0.10}{\text{SE}(\hat{\beta}_2)}$ with $\text{SE}(\hat{\beta}_2) = 0.0451$, which lies in R, we reject the null
hypothesis $H_0: \beta_2 = 0.10$ in favor of $H_1: \beta_2 \neq 0.10$ at $\alpha = 0.05$.
12.20 We plot the residual versus the predicted value $\hat{y}$. The plot shows that the
residuals tend to have higher variability with increasing values of $\hat{y}$. This
indicates a violation of the assumption of constant variance.
12.21 (a) We plot the residual versus the predicted value $\hat{y}$ and the time order,
respectively, in parts (i) and (ii) of the figure below.
(b) The plot (i) does not seem to signify any appreciable violation of the
assumptions. The plot (ii) of residual versus time order, however, exhibits
a distinct pattern. The residuals tend to steadily increase in time. This
indicates a possible violation of the independence assumption.
12.22 The residual 1.3 at $x = 2$ has a much larger magnitude than the other residuals.
The corresponding y observation should be investigated for either a possible
error in recording or some other unanticipated circumstance.
12.23 Looking at the residuals in time order shown in the figure below, a distinct
pattern is apparent. The residuals decrease quite systematically until about the
year 15, and then they steadily increase. Residuals adjacent in time have similar
magnitude. This pattern casts serious doubt on the independence assumption.
Violation of independence is frequent in time series data such as these.
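One simple way to quantify "residuals adjacent in time have similar magnitude" is the lag-1 autocorrelation of the residual series, which is near zero for independent errors and close to 1 when neighbors track each other. This is a sketch only: the statistic is a common diagnostic swapped in for the visual check used here, and the V-shaped residual series is hypothetical, built to mimic a decline until about period 15 followed by a steady rise.

```python
def lag1_autocorrelation(residuals):
    """Sample lag-1 autocorrelation of a series of residuals in time order."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    numerator = sum(dev[t] * dev[t + 1] for t in range(n - 1))
    denominator = sum(d * d for d in dev)
    return numerator / denominator


# Hypothetical residuals: decrease until period 15, then increase.
residuals = [abs(t - 15) - 7.5 for t in range(1, 31)]

r1 = lag1_autocorrelation(residuals)
print(f"lag-1 autocorrelation = {r1:.3f}")
```

A value this far from zero would support the doubt about independence; a series that flips sign at every step would instead give a negative value.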
12.24 (a) The Minitab output is as follows:
(b) The least squares fit to a quadratic function reduces the residual sum of
squares from 14.606 to 12.7044, but the p-value for testing $H_0: \beta_2 = 0$ is
large, so the quadratic term is not needed. The Minitab output is shown
below:
(c) The proportion of y variability explained is $R^2 = 0.754$, or 75.4%.
(d) For the straight line regression $s = 1.274$, and for the quadratic fit
$s = \sqrt{\text{Error MS}} = \sqrt{1.5881} = 1.260$.
12.25 Use Minitab to find a quadratic fit for these data:
[Figure: Fitted Line Plot of y versus x. C2 = 81.19 + 0.9983 C1 + 0.009929 C1**2; S = 3.30271, R-Sq = 99.8%, R-Sq(adj) = 99.8%]
The regression equation is
C2 = 81.19 + 0.9983 C1 + 0.009929 C1**2

S = 3.30271   R-Sq = 99.8%   R-Sq(adj) = 99.8%

Analysis of Variance
Source       DF       SS        MS        F      P
Regression    2  53589.8   26794.9  2456.47  0.000
Error         8     87.3      10.9
Total        10  53677.1

Sequential Analysis of Variance
Source       DF       SS        F      P
Linear        1  52744.0   508.73  0.000
Quadratic     1    845.8    77.54  0.000
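The entries of the analysis of variance tables are tied together by a few identities, which the following sketch checks. It assumes the error mean square equals $S^2$ from the output (the printed 10.9 is that value rounded), and the function name `fitted` is illustrative.

```python
s = 3.30271                       # residual standard deviation from the output
mse = s ** 2                      # error mean square (printed as 10.9)

ss_linear, ss_quadratic = 52744.0, 845.8
ss_reg = ss_linear + ss_quadratic     # sequential SS add up to regression SS
ss_total = 53677.1
df_reg = 2

ms_reg = ss_reg / df_reg
f_reg = ms_reg / mse              # overall F statistic
f_quad = ss_quadratic / mse       # F for adding the quadratic term
r_sq = ss_reg / ss_total


def fitted(x):
    """Fitted quadratic from the regression equation in the output."""
    return 81.19 + 0.9983 * x + 0.009929 * x ** 2


print(f"Regression SS = {ss_reg:.1f}, MS = {ms_reg:.1f}")
print(f"F (regression) = {f_reg:.2f}, F (quadratic) = {f_quad:.2f}")
print(f"R-Sq = {100 * r_sq:.1f}%")
```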
(b) The proportion of y variability explained is $R^2 = 0.998$, or 99.8%.
(c) Given that $\alpha = 0.05$ and $H_1$ is left-sided, the rejection region is
$R: T < -t_{0.05} = -1.860$ for d.f. = 8. The observed value of t is
$t = \frac{\hat{\beta}_1 - 0}{\text{SE}(\hat{\beta}_1)} = \frac{0.9983}{0.1389} = 7.19$.
is
negative. The mean price will still decrease by about 44.5 dollars if odometer
mileage increases by one thousand miles while age remains fixed. From the
Minitab output, both variables play a significant role. Moreover, the high
$R^2$ value signifies a good fit.
(b) The 95% confidence intervals are given by:
$\hat{\beta}_1 \pm t_{0.025}\,\text{SE}(\hat{\beta}_1) = -1.1892 \pm 2.445(0.2040)$
$\hat{\beta}_2 \pm t_{0.025}\,\text{SE}(\hat{\beta}_2) = -0.04452 \pm 2.445(0.0168)$
12.27 (a) The Minitab output is as follows:
From the output, note that the fitted line is $\hat{y} = 0.167 + 0.237x$ and
$r^2 = 0.925$. The constant term could be dropped and the model re-fit.
(b) Since $t_{0.025} = 2.306$ for d.f. = 8, a 95% confidence interval for $\beta_1$ is
(9.67, 11.32).
12.28 (a) & (b) In part (i) of the figure below, the scatter diagram shows a
relationship, along a curve, between the diameters and the heights
of sugar maple trees. Using the transformations
$x' = \log x$ and $y' = \log y$, the scatter diagram of the transformed
data reveals a linear relationship, as shown in part (ii) of the figure
below:
(c) Using Minitab, the original data are entered into columns C1 and C2; the
output is shown below.
The fitted line, namely