© All Rights Reserved

Chapter 4

4.1 (a) Yes, the scatterplot below (left) shows a linear relationship between the cube root of weight, $\sqrt[3]{\text{weight}}$, and length.

[Figure: scatterplot of $\sqrt[3]{\text{weight}}$ versus length (cm), left; residual plot versus length (cm), right.]

(b) Let $x$ = length and $y = \sqrt[3]{\text{weight}}$. The least-squares regression line is $\hat{y} = -0.0220 + 0.2466x$. The intercept of $-0.0220$ clearly has no practical interpretation in this situation, since weight and the cube root of weight must be positive. The slope 0.2466 indicates that for every 1 cm increase in length, the cube root of weight will increase, on average, by 0.2466. (c) $\sqrt[3]{\widehat{\text{weight}}} = -0.0220 + 0.2466 \times 36 \approx 8.8556$, so the predicted weight is $8.8556^3 \approx 694.5$ g. The predicted weight with this model is slightly higher than the predicted weight of 689.9 g with the model in Example 4.2. (d) The residual plot above (right) shows the residuals are negative for lengths below 17 cm, positive for lengths between 18 cm and 27 cm, and have no clear pattern for lengths above 28 cm. (e) Nearly all (99.88%) of the variation in the cube root of the weight can be explained by the linear relationship with the length.
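The back-transformation in part (c) can be checked numerically. The sketch below assumes the intercept is -0.0220, the sign that reproduces the 8.8556 quoted in part (c); the function name `predict_weight` is illustrative and not part of the original solution.

```python
def predict_weight(length_cm, intercept=-0.0220, slope=0.2466):
    """Predict weight (g) from length (cm) via the cube-root model."""
    cube_root = intercept + slope * length_cm  # prediction on the transformed scale
    return cube_root ** 3                      # undo the cube-root transformation

weight_36 = predict_weight(36)
print(round(weight_36, 1))
```

Cubing the fitted value is what converts a prediction for the cube root of weight back into grams.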

4.2 (a) The scatterplot below (left) shows a positive association between length and period with one very unusual point (106.5, 2.115) in the top right corner.

[Figure: scatterplot of period (s) versus length (cm), left; residual plot versus length (cm), right.]

(b) The residual plot above (right) shows that the residuals tend to be small or negative for small lengths and then get larger for lengths between 40 and 50 cm. The residual for the one very large length is negative again. Even though the value of $r^2$ is 0.983, the residual plot suggests that a model with some curvature (or a linear model after a transformation) might be better. (c) The information from the physics student suggests that there should be a linear relationship between period and $\sqrt{\text{length}}$. (d) A scatterplot (left) and residual plot (right) are shown below for the transformed data. The least-squares regression line for the transformed data is $\hat{y} = -0.0858 + 0.210\sqrt{\text{length}}$. The value of $r^2$ is slightly higher, 0.986 versus 0.983, and the residual plot looks better, although the residuals for the three smallest lengths are positive and the residuals for the next six lengths are negative.

[Figure: scatterplot of period versus square root of length, left; residual plot versus square root of length, right.]

(e) According to the theoretical relationship, the slope in the model for (d) should be $\frac{2\pi}{\sqrt{980}} \approx 0.2007$. The estimated model appears to agree with the theoretical relationship because the estimated slope is 0.210, an absolute difference of about 0.0093. (f) The predicted period of an 80-centimeter pendulum is $\hat{y} = -0.0858 + 0.210\sqrt{80} \approx 1.7925$ seconds.

4.3 (a) A scatterplot is shown below (left). The relationship is strong, negative and slightly nonlinear (or curved), with no outliers.

[Figure: scatterplot of pressure (atmospheres) versus volume (cubic cm), left; scatterplot of pressure (atmospheres) versus 1/volume, right.]

(b) Yes, the scatterplot for the transformed data (above on the right) shows a clear linear relationship. (c) The least-squares regression equation is $\hat{P} = 0.3677 + 15.8994(1/V)$. The square of the correlation coefficient, $r^2 = 0.9958$, indicates almost a perfect fit. The residual plot (below) shows a definite pattern, which should be of some concern, but the model still provides a good fit.

[Figure: residual plot versus 1/volume.]

(d) Letting $y = 1/P$, the least-squares regression line is $\hat{y} = 0.1002 + 0.0398V$. The scatterplot (below on the left), the value of $r^2 = 0.9997$, and the residual plot (below on the right) indicate that the linear model provides an excellent fit for the transformed data. This transformation also achieves linearity because $V = k/P$.

[Figure: scatterplot of 1/pressure versus volume (cubic cm), left; residual plot versus volume (cubic cm), right.]

(e) When the gas volume is 15 cm³ the model in part (c) predicts the pressure to be $\hat{P} = 0.3677 + 15.8994(1/15) \approx 1.4277$ atmospheres, and the model in part (d) predicts the reciprocal of pressure to be $0.1002 + 0.0398(15) = 0.6972$ or $\hat{P} = 1/0.6972 \approx 1.4343$ atmospheres. The predictions are the same to the nearest one-hundredth of an atmosphere.
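As a quick check of part (e), the two fitted models can be evaluated side by side. This is a minimal sketch using only the coefficients quoted in parts (c) and (d); the function names are hypothetical.

```python
def pressure_from_inverse_volume(volume):
    """Model (c): P = 0.3677 + 15.8994 * (1 / V)."""
    return 0.3677 + 15.8994 / volume

def pressure_from_inverse_pressure(volume):
    """Model (d): 1/P = 0.1002 + 0.0398 * V, so P is the reciprocal."""
    return 1.0 / (0.1002 + 0.0398 * volume)

p_c = pressure_from_inverse_volume(15)
p_d = pressure_from_inverse_pressure(15)
print(round(p_c, 4), round(p_d, 4))
```

Model (d) predicts 1/P, so its output has to be inverted before it can be compared with the prediction from model (c).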

4.4 (a) The scatterplot below (left) shows that the relationship between period² and length is roughly linear.

[Figure: scatterplot of period squared versus length (cm), left; residual plot versus length (cm), right.]

(b) The least-squares regression line for the transformed data $y = \text{period}^2$ and $x$ = length is $\hat{y} = -0.1547 + 0.0428x$. The value of $r^2 = 0.992$ and the residual plot above (right) indicate that the linear model provides a good fit for the transformed data. As we noticed in Exercise 4.2 part (d), the residual plot looks better, but there is still a pattern with the residuals for the three smallest lengths being positive and the residuals for the next six lengths being negative. (c) According to the theoretical relationship, the slope in the model should be $\frac{4\pi^2}{980} \approx 0.0403$. The estimated model appears to agree with the theoretical relationship because the estimated slope is 0.0428, an absolute difference of about 0.0025. (d) The predicted value of period² for an 80-centimeter pendulum is $\hat{y} = -0.1547 + 0.0428 \times 80 \approx 3.2693$, or a period of $\sqrt{3.2693} \approx 1.8081$ seconds. The two models provide very similar predicted values, with an absolute difference of only 0.0156.
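The theoretical slopes and the two 80 cm predictions from Exercises 4.2 and 4.4 can be reproduced with a few lines of arithmetic. The sketch below assumes both fitted intercepts are negative (-0.0858 and -0.1547), the signs that reproduce the quoted predictions of 1.7925 and 3.2693; the variable names are illustrative.

```python
import math

g = 980.0  # acceleration due to gravity in cm/s^2, as used in the solution

# Theoretical slopes: period = (2*pi/sqrt(g)) * sqrt(length),
# and period^2 = (4*pi^2/g) * length
slope_sqrt_model = 2 * math.pi / math.sqrt(g)
slope_squared_model = 4 * math.pi ** 2 / g

# Fitted models quoted in Exercises 4.2(d) and 4.4(b)
length = 80.0
period_sqrt_model = -0.0858 + 0.210 * math.sqrt(length)
period_squared_model = math.sqrt(-0.1547 + 0.0428 * length)

print(round(slope_sqrt_model, 4), round(slope_squared_model, 4))
print(round(period_sqrt_model, 4), round(period_squared_model, 4))
```

The second model predicts period squared, so a square root is needed before comparing it with the first model's prediction.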

4.5 (a) A scatterplot is shown below (left). The relationship is strong, negative and nonlinear (or curved).

[Figure: scatterplot of light intensity versus depth (meters), left; scatterplot of ln(light intensity) versus depth (meters), right.]

(b) [...] 22.78/31.78) are all 0.717. Since the ratios are all the same, the exponential model is appropriate. (c) Yes, the scatterplot (above on the right) shows that the transformation achieves linearity. (d) If $x$ = Depth and $y$ = ln(Light Intensity), then the least-squares regression line is $\hat{y} = 6.7891 - 0.3330x$. The intercept 6.7891 provides an estimate for the average value of the natural log of the light intensity at the surface of the lake. The slope, $-0.3330$, indicates that the natural log of the light intensity decreases on average by 0.3330 for each one meter increase in depth. (e) The residual plot below (left) shows that the linear model on the transformed data is appropriate. (Some students may suggest that there is one unusually large residual, but they need to look carefully at the scale on the y-axis. All of the residuals are extremely small.) (f) If $x$ = Depth and $y$ = Light Intensity, then the model after the inverse transformation is $\hat{y} = e^{6.7891}e^{-0.333x}$ or $\hat{y} = 888.1139 \times 0.7168^x$. The scatterplot below (right) shows that the exponential model is excellent for these data.

[Figure: residual plot versus depth (meters), left; scatterplot of light intensity (lumens) versus depth (meters), right.]

(g) At 22 m, the predicted light intensity is $\hat{y} = 888.1139e^{-0.333 \times 22} \approx 0.5846$ lumens. No, the absolute difference between the observed light intensity 0.58 and the predicted light intensity 0.5846 is very small (0.0046 lumens) because the model provides an excellent fit.
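The inverse transformation in parts (f) and (g) can be sketched as follows. The code assumes the slope on the log scale is negative (-0.3330), consistent with the decreasing relationship described in part (d); `predict_intensity` is a hypothetical helper name.

```python
import math

ln_intercept, ln_slope = 6.7891, -0.3330  # fitted line for ln(light intensity) vs depth

# Back-transform: intensity = a * b**depth with a = e^intercept, b = e^slope
a = math.exp(ln_intercept)
b = math.exp(ln_slope)

def predict_intensity(depth_m):
    """Predicted light intensity (lumens) at a given depth in meters."""
    return a * b ** depth_m

print(round(a, 1), round(b, 4), round(predict_intensity(22), 4))
```

Writing the model as a × b^depth makes the constant ratio from part (b) visible: each additional meter multiplies the predicted intensity by b.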

4.6 (a) A scatterplot is shown below (left).

[Figure: scatterplot of acres versus year, left; scatterplot of log(acres) versus year, right.]

(b) The ratios are 226,260/63,042 = 3.5890, 907,075/226,260 = 4.0090, and 2,826,095/907,075 = 3.1156. (c) The transformed values of y are 4.7996, 5.3546, 5.9576, and 6.4512. A scatterplot of the logarithms against year is shown above (right). (d) Minitab output is shown below.

The regression equation is
log(Acres) = - 1095 + 0.556 year

Predictor      Coef    SE Coef        T       P
Constant   -1094.51      29.26   -37.41   0.001
year        0.55577    0.01478    37.60   0.001

S = 0.0330502   R-Sq = 99.9%   R-Sq(adj) = 99.8%

(e) If $x$ = year and $y$ = acres, then the model after the inverse transformation is $\hat{y} = 10^{-1094.51} \times 10^{0.5558x}$. The coefficient of $10^{0.5558x}$ is 0.0000 (rounded to 4 decimal places), so all of the predicted values would be 0. (Note: If properties of exponents are not used to simplify the right-hand side, then some calculators will be able to do the calculations without having serious overflow problems.) (f) The least-squares regression line of log(acres) on years since 1977 is $\hat{y} = 4.2513 + 0.5558x$. (g) The residual plot below shows no clear pattern, so the linear regression model on the transformed data is appropriate.

[Figure: residual plot versus years since 1977, left; scatterplot of acres with the exponential model superimposed, right.]

(h) If $x$ = years since 1977 and $y$ = acres, then the model after the inverse transformation is $\hat{y} = 10^{4.2513} \times 10^{0.5558x} \approx 17{,}836.1042 \times 10^{0.5558x}$. A scatterplot with the exponential model superimposed is shown above (right). The exponential model provides an excellent fit. (i) The predicted number of acres defoliated in 1982 (5 years since 1977) is $\hat{y} \approx 17{,}836.1042 \times 10^{0.5558 \times 5} = 10{,}722{,}597.42$ acres.
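The numerical issue in part (e) and the prediction in part (i) can both be illustrated in a few lines. This is a sketch under the assumption that the model in parts (f) through (i) uses years since 1977 as the explanatory variable and that the Minitab constant is -1094.51; `predict_acres` is a hypothetical name.

```python
def predict_acres(years_since_1977, intercept=4.2513, slope=0.5558):
    """Back-transform the base-10 log-linear model to acres defoliated."""
    return 10 ** (intercept + slope * years_since_1977)

# With calendar year as x, the leading coefficient 10^(-1094.51) is too small
# for double-precision arithmetic and underflows to 0.0, as noted in part (e).
tiny_coefficient = 10 ** -1094.51

print(tiny_coefficient, round(predict_acres(5)))
```

Keeping the exponents together, as `predict_acres` does, avoids forming the tiny coefficient on its own, which is the point of the note about not simplifying the right-hand side.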

4.7 (a) If $y$ = number of transistors and $x$ = number of years since 1970, then $y(1) = ab^1 = 2250$ and $y(4) = ab^4 = 9000$, so $a = \frac{2250}{\sqrt[3]{4}} \approx 1417.4112$ and $b = \left(\frac{9000}{1417.4112}\right)^{0.25} \approx 1.5874$. This model predicts the number of transistors in year $x$ after 1970 to be $\hat{y} = 1417.4112 \times 1.5874^x$. (b) Using the natural logarithm transformation on both sides of the model in (a) produces the line $\ln \hat{y} = 7.2566 + 0.4621x$. (c) The slope for Moore's model (0.4621) is larger than the estimated slope in Example 4.6 (0.332), so the actual transistor counts have grown more slowly than Moore's law suggests.
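Solving for the two constants of an exponential curve through two points, as in part (a), can be sketched as below. Only the two points (1, 2250) and (4, 9000) come from the solution; the variable names are illustrative.

```python
import math

# Two points on the curve y = a * b**x, with x in years since 1970
x1, y1 = 1, 2250
x2, y2 = 4, 9000

# Dividing the two equations eliminates a: b**(x2 - x1) = y2 / y1
b = (y2 / y1) ** (1 / (x2 - x1))
a = y1 / b ** x1

# Taking natural logs gives the line ln y = ln a + (ln b) * x
print(round(a, 4), round(b, 4), round(math.log(a), 4), round(math.log(b), 4))
```

The last two printed values are the intercept and slope of the line in part (b).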

4.8 (a) According to the claim, the number of children killed doubled every year after 1950.

Year               1951  1952  1953  1954  1955  1956  1957  1958  1959  1960
Number of deaths      2     4     8    16    32    64   128   256   512  1024

(b) A scatterplot showing the exponential relationship is shown below (left).

[Figure: scatterplot of number of deaths versus year, left; scatterplot of log(number of deaths) versus year, right.]

(c) According to the paper, the number of children killed $x$ years after 1950 is $2^x$. Thus, $2^{45} = 3.5184 \times 10^{13}$ or approximately 35 trillion children were killed in 1995. This is clearly a mistake. (d) A scatterplot of the logarithms against year (above on the right) shows a strong, positive linear relationship. (e) The least-squares regression line for predicting the logarithm of $y$ = deaths from $x$ = year is approximately $\hat{y} = -587.0 + 0.301x$. Thus, the predicted value in 1995 is $\hat{y} = -587.0 + 0.301 \times 1995 \approx 13.495$. As a check, $\log(2^{45}) \approx 13.5463$. The absolute difference in these two predictions, 0.0513, is relatively small.
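The check in part (e) amounts to comparing the exact logarithm of $2^{45}$ with the value from the fitted line. A minimal sketch, assuming the intercept is -587.0 (the sign that gives the quoted 13.495):

```python
import math

deaths_1995 = 2 ** 45  # doubling every year for 45 years after 1950
log_exact = math.log10(deaths_1995)

# Fitted line for log10(deaths) versus calendar year, as quoted in part (e)
log_predicted = -587.0 + 0.301 * 1995

print(deaths_1995, round(log_exact, 4), round(log_predicted, 3))
```

The slope 0.301 is essentially $\log_{10} 2$, which is what doubling every year looks like on the base-10 log scale.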

4.9 (a) A scatterplot is shown below.

[Figure: scatterplot of population versus time (since 1790), left; scatterplot of log(population) versus time (since 1790), right.]

(b) In the scatterplot above (right), the transformed data appear to be linear from 0 to 90 (or 1790 to about 1880), and then linear again, but with a smaller slope. The linear trend indicates that the exponential model is still appropriate and the smaller slope reflects a slower growth rate. (c) The least-squares regression line for predicting $y$ = log(population) from $x$ = time since 1790 is $\hat{y} = 1.329 + 0.0054x$. Transforming back to the original variables, the estimated population size is $21.3304 \times 1.0125^x$. A scatterplot with this regression line is shown below (left). (d) The residual plot (below on the right) shows random scatter and $r^2 = 0.995$, so the exponential model provides an excellent fit.

[Figure: scatterplot of log(population) versus time (since 1790) with the regression line, left; residual plot versus time (since 1790), right.]

(e) The predicted population in 2010 is $\hat{y} = 1.329 + 0.0054 \times 220 \approx 2.517$ or about $10^{2.517} = 328.8516$ million people. The prediction is probably too low, because these estimates usually do not include homeless people and illegal immigrants.

4.10 (a) A scatterplot of distance versus height is shown below (left).

[Figure: scatterplot of distance versus height, left; scatterplot of distance versus square root of height, right.]

(b) The curve tends to bow downward, which resembles a power curve $x^p$ with $p < 1$. Since we want to pull in the right tail of the distribution, we should apply a transformation $x^p$ with $p < 1$. (c) A scatterplot of distance against the square root of height (shown above, right) straightens the graph quite nicely.

4.11 (a) Let $x$ = Body weight in kg and $y$ = Life span in years. Scatterplots of the original data (left) and the transformed data (right), after taking the logarithms of both variables, are shown below. The linear trend in the scatterplot for the transformed data suggests that the power model is appropriate.

[Figure: scatterplot of life span versus weight (kg), left; scatterplot of log(life span) versus log(weight), right.]

(b) The least-squares regression line for the transformed data is $\log \hat{y} = 0.7617 + 0.2182\log(x)$. The residual plot (below on the left) shows fairly random scatter about zero and $r^2 = 0.7117$. Thus, 71.17% of the variation in the log of the life spans is explained by the linear relationship with the log of the body weight.

[Figure: residual plot versus log(weight), left; scatterplot of life span versus transformed weight, right.]

(c) The inverse transformation gives the estimated power model $\hat{y} = 10^{0.7617}x^{0.2182} \approx 5.7770x^{0.2182}$. (d) This model predicts the average life span for humans to be $\hat{y} \approx 5.7770 \times 65^{0.2182} = 14.3642$ years, considerably shorter than the expected life span of humans. (e) According to the biologists, the power model is $y = ax^{0.2}$. The easiest and best option is to plot a graph of $(\text{weight}^{0.2}, \text{lifespan})$ and then fit a least-squares regression line using the transformed weight as the explanatory variable. The scatterplot (above on the right) shows that this model provides a good fit for the data. The least-squares regression line is $\hat{y} = -2.70 + 7.95x^{0.2}$ with a predicted average life span of $\hat{y} = -2.7 + 7.95 \times 65^{0.2} \approx 15.62$ years for humans. Note: Students may try some other models, which are not as good. For example, raising both sides of the equation to the fifth power, the model becomes $y^5 = a^5x$, which is a linear regression model with no intercept parameter (or an intercept of zero). After transforming life span $y$ to $y^5$, the estimated model is $\hat{y}^5 = 30{,}835x$. This model predicts the average life span of humans to be $\hat{y} = (30{,}835 \times 65)^{0.2}$ [...] transformed data is $\hat{y}^5 = 1389463 + 30{,}068x$ with a predicted average life span of $\hat{y} = (1389463 + 30068 \times 65)^{0.2}$ [...]

4.12 (a) The power model would be more appropriate for these data. The scatterplot of the log of cost versus diameter (below on the left) is linear, but the plot of the log of cost versus the log of diameter (below on the right) shows almost a perfect straight line.

[Figure: scatterplot of log(cost) versus diameter (inches), left; scatterplot of log(cost) versus log(diameter), right.]

(b) Let $y$ = the cost of the pizza and $x$ = the diameter of the pizza. The least-squares regression line is $\log \hat{y} = -1.5118 + 2.1150\log x$. The inverse transformation gives the estimated power model $\hat{y} = 10^{-1.5118}x^{2.115} \approx 0.0308x^{2.115}$. (c) According to this model, the predicted costs of the four different size pizzas are \$4.01, \$5.90, \$8.18, and \$13.91, from smallest to largest. There are only slight differences between the predicted costs for the model and the actual costs, so an adjustment does not appear to be necessary based on this model. (d) According to our estimated power model in part (b), the predicted cost for the new soccer team pizza is $\hat{y} = 0.0308 \times 24^{2.115} \approx \$25.57$. (e) An alternative model is based on setting the cost proportional to the area, or the power model of the form $\text{cost} \propto (\pi/4)x^2$. Most students will square the diameter and then fit a linear model to obtain the least-squares regression line $\hat{y} = 0.506 + 0.0445x^2$. The estimated price of the soccer team pizza is [...] Using least-squares with no intercept, the value of $b$ is estimated to be 0.2046, so the predicted cost of the soccer team pizza is $\hat{y} = (0.2046 \times 24)^2 \approx \$24.11$.

4.13 (a) As height increases, weight increases. Since weight is a 3-dimensional characteristic and height is 1-dimensional, weight should be proportional to the cube of the height. A model of the form $\text{weight} = a(\text{height})^b$ would be a good place to start. (b) A scatterplot of the response variable $y$ = weight versus the explanatory variable $x$ = height is shown below.

[Figure: scatterplot of weight (pounds) versus height (inches).]

(c) Calculate the logarithms of the heights and the logarithms of the weights. The least-squares regression line for the transformed data is $\log \hat{y} = -1.3912 + 2.0029\log x$. $r^2 = 0.9999$; almost all (99.99%) of the variation in log of weight is explained by the linear relationship with log of height. (d) The residual plot below for the transformed data shows that the residuals are very close to zero with no discernible pattern. This model clearly fits the transformed data very well.

[Figure: residual plot versus log(height).]

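The inverse transformation gives the estimated power model $\hat{y} = 10^{-1.3912}x^{2.0029} \approx 0.0406x^{2.0029}$. The predicted weight of a 5′10″ (70″) adult is $\hat{y} = 0.0406 \times 70^{2.0029} \approx 201.4062$ lbs, and the predicted weight of a 7′ (84″) adult is [...]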

4.14 Who? The individuals are hearts from various mammals. What? The response variable $y$ is the weight of the heart (in grams) and the explanatory variable $x$ is the length of the left ventricle (in cm). Why? The data were collected to explore the relationship in these two quantitative measurements for hearts of mammals. When, where, how, and by whom? The data were originally collected back in 1927 by researchers studying the physiology of the heart. Graphs: A scatterplot of the original data is shown below (left). The nonlinear trend in the scatterplot makes sense because the heart weight is a 3-dimensional characteristic which should be proportional to the cube of the length of the cavity of the left ventricle. A scatterplot, after transforming the data by taking the logarithms of both variables, shows a clear linear trend (below, right) so the power model is appropriate.

[Figure: scatterplot of heart weight versus cavity length (cm), left; scatterplot of log(heart weight) versus log(cavity length), right.]

Numerical Summaries: The correlation between log of cavity length and log of heart weight is 0.997, indicating a near perfect association. Model: The power model is $\text{weight} = a \times \text{length}^b$. After taking the logarithms of both variables, the least-squares regression line is $\log \hat{y} = -0.1364 + 3.1387\log x$. Approximately 99.3% of the variation in the log of heart weight is explained by the linear relationship with log of cavity length. The residual plot below suggests that there may be a little bit of curvature remaining, but nothing to get overly concerned about.

[Figure: residual plot versus log(cavity length).]

The inverse transformation gives the estimated power model $\hat{y} = 10^{-0.1364}x^{3.1387} \approx 0.7305x^{3.1387}$, which provides a good fit for these data.

4.15 (a) The scatterplot below (left) shows that the relationship between $y$ = distance and $x$ = time is strong, positive, and nonlinear (curved).

[Figure: scatterplot of distance (cm) versus time (seconds), left; residual plot versus time squared, right.]

(b) The least-squares regression line for the transformed data is $\hat{y} = 0.990 + 490.416x^2$. (c) The residual plot above (right) shows random scatter and $r^2 = 0.9984$, so 99.84% of the variability in the distance fallen is explained with this linear model. (d) Yes, the scatterplot below (left) shows that this transformation does a very good job creating a linear trend. The least-squares regression line for the transformed data is $\widehat{\sqrt{y}} = 0.1046 + 22.0428x$.

[Figure: scatterplot of the transformed distance versus time (seconds), left; residual plot versus time (seconds), right.]

(e) The residual plot above (right) shows no obvious pattern and $r^2 = 0.9986$. This is an excellent model. (f) The predicted distance that an object had fallen after 0.47 seconds is 109.32 cm using the model from (b) and 109.51 cm using the model from (d). There is very little difference in the predicted values, but most students will probably pick the prediction from (d) because $r^2$ is a little higher and the residual plot shows less variability about the regression line.
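The two predictions in part (f) can be reproduced directly from the fitted coefficients. The sketch assumes that the response in model (d) is the square root of distance, which is the reading that yields the quoted 109.51 cm; the function names are hypothetical.

```python
def distance_quadratic(t):
    """Model (b): distance = 0.990 + 490.416 * t**2."""
    return 0.990 + 490.416 * t ** 2

def distance_sqrt(t):
    """Model (d): sqrt(distance) = 0.1046 + 22.0428 * t, squared to undo the root."""
    return (0.1046 + 22.0428 * t) ** 2

t = 0.47
print(round(distance_quadratic(t), 2), round(distance_sqrt(t), 2))
```

Both models express distance as roughly quadratic in time; they differ only in which variable is transformed before the line is fitted.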

4.16 (a) We are given the model $\ln \hat{y} = -2.00 + 2.42\ln x$. Using properties of logarithms, the power model is $e^{\ln y} = e^{-2.00 + 2.42\ln x}$ or $y = e^{-2.00}x^{2.42}$. (b) The estimated biomass of a tree with a diameter of 30 cm is $\hat{y} = e^{-2.00} \times 30^{2.42} \approx 508.2115$ kg.
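The equivalence between the log-scale line and the power model can be confirmed numerically. This sketch assumes the intercept on the log scale is -2.00, the sign that reproduces the 508.2115 kg quoted in part (b); `biomass_kg` is an illustrative name.

```python
import math

def biomass_kg(diameter_cm, ln_intercept=-2.00, ln_slope=2.42):
    """Power model y = e^ln_intercept * x^ln_slope, from ln y = ln_intercept + ln_slope * ln x."""
    return math.exp(ln_intercept) * diameter_cm ** ln_slope

direct = biomass_kg(30)
via_logs = math.exp(-2.00 + 2.42 * math.log(30))  # same prediction made on the log scale
print(round(direct, 2), round(via_logs, 2))
```

The two routes agree because $e^{a + b\ln x} = e^a x^b$.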

4.17 Who? The individuals are carnivores. What? The response variable $y$ is a measure of abundance and the explanatory variable $x$ is the size of the carnivore. Why? Ecologists were interested in learning more about nature's patterns. When, where and how? The data were collected before 2002 (the publication date) by relating the body mass of the carnivore to the number of carnivores. Rather than simply counting the total number of observed carnivores, the researchers created a measure of abundance based on a count relative to the size of prey in an area. Graphs: A scatterplot of $y$ = abundance versus $x$ = body mass (on the left below) shows a nonlinear relationship. Using the log transformation for both variables provides a moderately strong, negative, linear relationship (see the scatterplot below on the right).

[Figure: scatterplot of abundance versus body mass (kg), left; scatterplot of log(abundance) versus log(body mass), right.]

Numerical Summaries: The correlation between log body mass and log abundance is $-0.912$. Model: The least-squares regression line for the transformed data is $\log \hat{y} = 1.9503 - 1.0481\log x$, with an $r^2 = 0.8325$ and a residual plot (below) showing no obvious patterns.

[Figure: residual plot versus log(body mass).]

The inverse transformation gives the estimated power model $\hat{y} = 10^{1.9503}x^{-1.0481} \approx 89.1867x^{-1.0481}$, which provides a good fit for these data.

4.18 Let $x$ = the breeding length (the length at which 50% of females first reproduce) and $y$ = the asymptotic body length. The scatterplot (left) and residual plot (right) below show that the linear model does not provide a great fit for these body measurements of this fish species. Most of the residuals are negative for breeding lengths below 30 cm and above 150 cm.

[Figure: scatterplot of asymptotic body length versus breeding length (cm), left; residual plot versus breeding length, right.]

Applying the log transformation to both lengths produces better results. The scatterplot (left) and residual plot (right) below show that a linear model provides a very good fit. The least-squares regression model for the transformed data is $\log \hat{y} = 0.3011 + 0.9520\log x$, with an $r^2 = 0.898$ and a residual plot with very little structure, although most of the residuals are still negative when the explanatory variable is above 1.9.

[Figure: scatterplot of the log-transformed lengths versus log(breeding length), left; residual plot versus log(breeding length), right.]

The inverse transformation gives the estimated power model $\hat{y} = 10^{0.3011}x^{0.952} \approx 2.0003x^{0.952}$, which provides a good fit for these data.

4.19 (a) Scatterplots of the original data (left) and the transformed data (right) are shown below.

[Figure: scatterplot of mean colony size versus time (hours), left; scatterplot of the transformed data versus time (hours), right.]

(b) The first phase is from 0 to 6 hours when the mean colony size actually decreases. This decrease is hard to see on the graph of the original data, but is more obvious on the graph of the transformed data. In the second phase, from 6 to 24 hours, the mean colony size increases exponentially. Both graphs show this phase clearly, but it is most noticeable from the linear trend on the graph of the transformed data for this time period. At 36 hours, mean growth is in the third phase where growth is still occurring, but at a lower rate than the previous phase. The point in the top right corner of both graphs clearly shows the new phase because this point does not fit the pattern for phase two. (c) Let $y$ = mean colony size and $x$ = time. The least-squares regression line for the transformed data is $\log \hat{y} = -0.5942 + 0.0851x$. Using the inverse transformation, the predicted size of a colony 10 hours after inoculation is $\hat{y} = 10^{-0.5942} \times 10^{0.0851 \times 10} = 10^{0.2568} \approx 1.8063$.

4.20 The correlation for time (hours 6–24) and log(mean colony size) is $r = 0.9915$. The correlation for time (hours 6–24) and log(individual colony size) is $r = 0.9846$. As expected, the correlation for the individual colony size is smaller than the correlation for the mean colony size because individual measurements have more variability. The scatterplots below show the differences in the relationships for mean colony sizes (left) and individual colony sizes (right).

[Figure: scatterplots of log(colony size) versus time (hours) for mean colony sizes, left, and individual colony sizes, right.]

4.21 (a) $\text{Weight} = c_1(\text{height})^3$ and $\text{strength} = c_2(\text{height})^2$, so $\text{strength} = c_3(\text{weight})^{2/3}$, where $c_1$, $c_2$, and $c_3$ are arbitrary constants. (b) The graph of $y = x^{2/3}$ below shows that strength does not increase linearly with body weight, as would be the case if a person 1 million times as heavy as an ant could lift 1 million times more than the ant. Strength increases more slowly. For example, if weight is multiplied by 1000, strength will increase by a factor of $1000^{2/3} = 100$.

[Figure: graph of strength versus weight, $y = x^{2/3}$.]

4.22 (a) Answers will vary. (b) The population of cancer cells after $n - 1$ years is $P = P_0(7/6)^{n-1}$. The population of cancer cells after $n$ years is $P = P_0(7/6)^{n-1} + (1/6)\left(P_0(7/6)^{n-1}\right) = P_0(7/6)^n$. (c) Answers will vary, but the exponential model should provide a good fit for the data collected.

4.23 (a) The sum of the six counts is 10 + 9 + 24 + 61 + 206 + 548 = 858 people. (b) The sum of the top row shows 10 + 9 + 24 = 43 people had arthritis. (c) The marginal distribution of participation in soccer is shown below.

           Elite   Non-elite   Did not play
Count      71      215         572
Percent    8.3%    25.1%       66.7%

(d) The percent of each group who have arthritis is 14.08% for the elite soccer players, 4.19% for the non-elite soccer players and 4.20% for the people who did not play. This suggests an association between playing elite soccer and developing arthritis.
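The marginal and conditional percents in this exercise follow from the six counts listed in parts (a) and (b). A minimal sketch of the arithmetic, with illustrative variable names:

```python
# Counts from the two-way table: rows are arthritis status, columns are soccer groups
groups = ["Elite", "Non-elite", "Did not play"]
arthritis = [10, 9, 24]
no_arthritis = [61, 206, 548]

column_totals = [a + n for a, n in zip(arthritis, no_arthritis)]
grand_total = sum(column_totals)

# Marginal distribution of soccer participation (percent of all people)
marginal = [100 * t / grand_total for t in column_totals]

# Conditional percent with arthritis within each group
conditional = [100 * a / t for a, t in zip(arthritis, column_totals)]

for g, m, c in zip(groups, marginal, conditional):
    print(f"{g}: {m:.1f}% of sample, {c:.2f}% with arthritis")
```

The marginal percents divide by the grand total, while the conditional percents divide by each group's own total, which is what allows groups of very different sizes to be compared.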

4.24 The percents should add to 100% because they provide a breakdown of all participants according to one categorical variable. The sum is 8.3% + 25.1% + 66.7% = 100.1%. If one more decimal place is included in each of the percents, then the sum is 8.28% + 25.06% + 66.67% = 100.01%. The percents do not add to 100% because of rounding.

4.25 (a) The sum of the six counts is 5375 students. (b) The proportion of these students who smoke is 1004/5375 = 0.1868, so the percent of smokers is 18.68%. (c) The marginal distribution of parents' smoking behavior is shown below.

           Neither parent smokes   One parent smokes   Both parents smoke
Count      1356                    2239                1780
Percent    25.23%                  41.66%              33.12%

(d) The three conditional distributions are shown in the table below.

                         Neither parent smokes   One parent smokes   Both parents smoke
Student does not smoke   86.14%                  81.42%              77.53%
Student smokes           13.86%                  18.58%              22.47%

The conditional distributions reveal what many people expect: parents have a substantial influence on their children. Students that smoke are more likely to come from families where one or more of their parents smoke.

4.26 (a) The two-way table is shown below. (b) The percents of eggs in each group that hatched are 59.26% in a cold nest, 67.86% in a neutral nest, and 72.12% in a hot nest. The percents indicate that hatching increases with temperature. The cold nest did not prevent hatching, but made it less likely.

              Cold   Neutral   Hot
Hatched       16     38        75
Not hatched   11     18        29
Total         27     56        104

4.27 (a) The two conditional distributions are shown in the table below. The biggest difference between men and women is in Administration: a higher percentage of women chose this major. A greater percent of men chose the other fields, especially finance. (b) A total of 386 students responded, so 722 - 386 = 336 did not respond. About 46.54% of the students did not respond.

                 Female    Male
Accounting       30.22%    34.78%
Administration   40.44%    24.84%
Economics         2.22%     3.73%
Finance          27.11%    36.65%

4.28 Two examples are shown below. In general, choose a to be any number from 0 to 50, and then all the other entries can be determined.

Example 1:
25   25
35   15

Example 2:
10   40
50    0

Note: This is why we say that such a table has one degree of freedom: We can make one (nearly) arbitrary choice for the value of a, and then have no more decisions to make.

4.29 (a) The two-way table is shown below. (b) Overall, 11.88% of white defendants and

10.24% of black defendants receive the death penalty. For white victims, 12.58% of white

defendants and 17.46% of black defendants receive the death penalty. For black victims, 0% of

white defendants and 5.83% of black defendants receive the death penalty. (c) The death penalty

is more likely when the victim was white (14.02%) rather than black (5.36%). Because most

convicted killers are of the same race as their victims, whites are more often sentenced to death.

                   Death penalty   No death penalty
White defendant         19               141
Black defendant         17               149

4.30 (a) The two-way table is shown below. (b) Overall, 70% of male applicants are admitted,

while only 56% of females are admitted. (c) In the business school, 80% of male applicants are

admitted, compared with 90% of females. In the law school, 10% of males are admitted,

compared with 33.33% of females. (d) Six out of 7 men apply to the business school, which

admits 82.5% of all applicants, while 3 out of 5 women apply to the law school, which admits

only 27.5% of its applicants.

          Admit   Deny
Male        490    210
Female      280    220


4.31 The table below gives the two marginal distributions. The marginal distribution of marital
status is found by taking, e.g., 337/8235 ≈ 4.1%. The marginal distribution of job grade is found
by taking, e.g., 955/8235 ≈ 11.6%.

Single    Married   Divorced   Widowed
 4.1%      93.9%      1.5%       0.5%

Grade 1   Grade 2   Grade 3    Grade 4
 11.6%     51.5%     30.2%       6.7%

As rounded here, both sets of percents add up to 100%. If students round to the nearest whole

percent, the marital status numbers add up to 101%. If they round to two places after the

decimal, the job grade percents add up to 100.01%.

4.32 The percent of single men in grade 1 jobs is 58/337 ≈ 17.21%. The percent of grade 1 jobs
held by single men is 58/955 ≈ 6.07%.

4.33 Divide the entries in the first column by the first column total; e.g., 17.21% ≈ 58/337.
These should add to 100% (except for rounding error). The percentages in the table below add to
100.01%.

Job grade   % of single men
1           17.21%
2           65.88%
3           14.84%
4            2.08%

If the percents are rounded to the nearest tenth, 17.2%, 65.9%, 14.8%, and 2.1%, then they add to

100%.

4.34 (a) We need to compute percents to account for the fact that the study included many more

married men than single men, so that we would expect their numbers to be higher in every job

grade (even if marital status had no relationship with job level). (b) A table of percents is below;

descriptions of the relationship may vary. Single and widowed men had higher percents of grade

1 jobs; single men had the lowest (and widowed men the highest) percents of grade 4 jobs.

Job grade   Single   Married   Divorced   Widowed
1           17.21%   11.31%    11.90%     19.05%
4            2.08%    6.90%     5.56%      9.52%

4.35 Age is the main lurking variable: Married men would generally be older than single men,

so they would have been in the work force longer, and therefore had more time to advance

in their careers.

4.36 (a) A bar graph is shown below: 58.33% of desipramine users did not have a relapse,
while 25.0% of lithium users and 16.7% of those who received a placebo succeeded in breaking
their addictions. (b) Because random assignment was used, there is statistical evidence for
causation (though there are other questions we need to consider before we can reach that
conclusion).

[Figure: bar graph of the percent of users who did not relapse, by treatment (Desipramine, Lithium, Placebo).]

4.37 (a) To find the marginal distribution of opinion, we need to know the total numbers of

people with each opinion: 49/133 ≈ 36.84% said higher, 32/133 ≈ 24.06% said the same,
and 52/133 ≈ 39.10% said lower. The numbers are summarized in the first table below. The
main finding is probably that about 39% of users think the recycled product is of lower quality.
This is a serious barrier to sales. (b) There were 36 buyers and 97 nonbuyers among the
respondents, so (for example) 20/36 ≈ 55.56% of buyers rated the quality as higher. Similar

arithmetic with the buyers and nonbuyers rows gives the two conditional distributions of opinion,

shown in the second table below. We see that buyers are much more likely to consider recycled

filters higher in quality, though 25% still think they are lower in quality. We cannot draw any

conclusion about causation: It may be that some people buy recycled filters because they start

with a high opinion of recycled products, or it may be that use persuades people that the quality

is high.

Higher   The same   Lower
36.84%   24.06%     39.10%

            Higher   The same   Lower
Buyers      55.56%   19.44%     25.00%
Nonbuyers   29.90%   25.77%     44.33%

4.38 (a) The two-way table is shown below. (b) The overall batting averages are 0.240 for Joe
and 0.260 for Moe. Moe has the better overall batting average.

       Hit   No hit
Joe    120    380
Moe    130    370

(c) Two separate tables, one for each type of pitcher, are shown below. Against left-handed
pitchers, Joe's batting average is 0.200 and Moe's batting average is 0.100. Against right-handed
pitchers, Joe's batting average is 0.400 and Moe's batting average is 0.300. Joe is better
against both kinds of pitchers.

Left-handed pitchers        Right-handed pitchers
       Hit   No hit                Hit   No hit
Joe     80    320           Joe     40     60
Moe     10     90           Moe    120    280

(d) Both players do better against right-handed pitchers than against left-handed pitchers. Joe

spent 80% of his at-bats facing left-handers, while Moe only faced left-handers 20% of the time.
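The reversal in parts (b) through (d) can be verified with a sketch like the following. It assumes the hit and at-bat counts from the tables above; `at_bats`, `batting_average`, and `overall_average` are illustrative names, not part of the exercise.

```python
# (hits, at-bats) for each player against each type of pitcher, from the tables in 4.38.
at_bats = {
    "Joe": {"left": (80, 400), "right": (40, 100)},
    "Moe": {"left": (10, 100), "right": (120, 400)},
}


def batting_average(hits, total):
    """Batting average = hits / at-bats."""
    return hits / total


def overall_average(splits):
    """Pool the hits and at-bats over both pitcher types before dividing."""
    hits = sum(h for h, _ in splits.values())
    total = sum(n for _, n in splits.values())
    return batting_average(hits, total)


for player, splits in at_bats.items():
    by_pitcher = {side: batting_average(*pair) for side, pair in splits.items()}
    print(player, by_pitcher, round(overall_average(splits), 3))

joe, moe = at_bats["Joe"], at_bats["Moe"]
# Joe is ahead within each pitcher type, yet Moe is ahead after pooling.
joe_wins_each_split = all(
    batting_average(*joe[side]) > batting_average(*moe[side])
    for side in ("left", "right")
)
moe_wins_overall = overall_average(moe) > overall_average(joe)
```

The pooled averages weight each split by its number of at-bats, so Joe's average is pulled toward his weaker left-handed results and Moe's toward his stronger right-handed results.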


4.39 Examples will vary, of course; one very simplistic possibility is shown below. The key is

to be sure that there is a lower percentage of overweight people among the smokers than among

the nonsmokers.

Combined (all people)
                  Early death
                  Yes    No
Overweight         41    59
Not overweight     50    50

Smokers
                  Early death
                  Yes    No
Overweight         10     0
Not overweight     40    20

Nonsmokers
                  Early death
                  Yes    No
Overweight         31    59
Not overweight     10    30

4.40 Who? The individuals are students. What? The categorical variables of interest are
educational level or degree (Associate's, Bachelor's, Master's, Professional, or Doctor's) and
gender (male or female). Why? The researchers were interested in checking if the participation
of women changes with level of degree. When, where, how, and by whom? These projections,
in thousands, were made for 2005-2006 by the National Center for Education Statistics. Graphs:
The conditional distributions of gender for each degree level are shown in the bar graph below (left).
The conditional distributions of degree level for each gender are shown in the bar graph below
(right).

[Figures: bar graph of the percent female and male within each degree level (left); bar graph of the percent at each degree level within each gender (right).]

Numerical summaries: The software output below from Minitab provides the joint distribution,

marginal distributions, and conditional distributions in one consolidated table. The first entry in

each cell is the count, the second entry is the % of the row (or the conditional distribution of

gender for each type of degree), the third entry is the % of the column (or the conditional

distribution of degree for each gender), and the fourth entry is the overall %.

Rows: Degree   Columns: Gender

                Female     Male      All

Associate's        431      244      675
                 63.85    36.15   100.00
                 26.85    21.90    24.83
                15.851    8.974   24.825

Bachelor's         813      584     1397
                 58.20    41.80   100.00
                 50.65    52.42    51.38
                29.901   21.478   51.379

Doctor's            21       24       45
                 46.67    53.33   100.00
                  1.31     2.15     1.66
                 0.772    0.883    1.655

Master's           298      215      513
                 58.09    41.91   100.00
                 18.57    19.30    18.87
                10.960    7.907   18.867

Professional        42       47       89
                 47.19    52.81   100.00
                  2.62     4.22     3.27
                 1.545    1.729    3.273

All               1605     1114     2719
                 59.03    40.97   100.00
                100.00   100.00   100.00
                59.029   40.971  100.000

Cell Contents:   Count
                 % of Row
                 % of Column
                 % of Total

Interpretation: Women earn a majority of associate's, bachelor's, and master's degrees, but fall
slightly below 50% for professional and doctoral degrees. The distributions of degree level are
very similar for females and males.

4.41 No. Rich nations have more TV sets than poor nations. Rich nations also have longer life

expectancies because they have better nutrition, clean water, and better health care. There is a
common-response relationship between TV sets and length of life.

[Diagram: common response. Wealth affects both x = number of TV sets and y = average life span.]

4.42 In this case, there may be a causative effect, but in the direction opposite to the one

suggested: People who are overweight are more likely to be on diets, and so choose artificial

sweeteners over sugar. (Also, heavier people are at a higher risk to develop diabetes; if they do,

they are likely to switch to artificial sweeteners.)

[Diagram: use of sweeteners and weight gain, with causation possibly running from weight gain to use of sweeteners.]

4.43 No. The number of hours standing up is a confounding variable in this case. The diagram

below illustrates the confounding between exposure to chemicals and standing up.

[Diagram: confounding. Exposure to chemicals and time standing up are confounded in their possible effect (marked "?") on miscarriages.]

4.44 Well-off people tend to have more cars. They also tend to live longer, probably because

they are better educated, take better care of themselves, and get better medical care. The cars

have nothing to do with it. The relationship between number of cars and length of life is

common response.

[Diagram: common response. Wealth affects both number of cars and length of life.]

4.45 It could be that children with lower intelligence watch many hours of television and get
lower grades as well. It could also be that children come from lower socio-economic households where
parents are less likely to limit television viewing and are unable to help their children with their
schoolwork because the parents themselves lack education. The variables number of hours
watching television and grade point average change in common response to socio-economic
status or IQ.

[Diagram: common response. IQ or socioeconomic status affects both number of hours spent watching TV and GPA.]

4.46 Single men tend to have a different value system than married men. They have many

interests, but getting married and earning a substantial amount of money are not among their top

priorities. Confounding is the best term to describe the relationship between marital status and

income.

[Diagram: confounding. Marital status and values are confounded in their possible effect (marked "?") on annual income.]

4.47 The effects of coaching are confounded with those of experience. A student who has taken

the SAT once may improve his or her score on the second attempt because of increased


familiarity with the test. The student may also have increased knowledge from additional math

and science courses.

[Diagram: confounding. The coaching course and experience may both affect SAT score.]

4.48 A reasonable explanation is that the cause-and-effect relationship goes in the other

direction: Doing well makes students feel good about themselves, rather than vice versa.

[Diagram: self-esteem and quality of work, with causation possibly running from quality of work to self-esteem.]

CASE CLOSED!

1. (a) Let y = premium and x = age. Scatterplots of the original data (left) and transformed data

(right) after taking the logarithms of both variables are shown below. The plot of the original

data shows a strong nonlinear relationship. The plot for the transformed data shows a clear

linear trend, so the power model is appropriate.

[Figures: scatterplot of Premium ($) versus Age (years) (left); scatterplot of Log(Premium) versus Log(Age) (right).]

(b) A scatterplot of the logarithm of premium versus age is shown below (left). The linear trend

suggests that the exponential model is appropriate.

[Figures: scatterplot of Log(Premium) versus Age (left); residual plot of Residual versus Age (right).]

(c) Since the association between the log of premium and age is nearly perfect, the exponential
model is most appropriate. The least-squares regression line for the transformed data is
$\log \hat{y} = -0.0275 + 0.0373x$. Using the inverse transformation, the predicted premium is
$\hat{y} = 10^{-0.0275} \times 10^{0.0373x} \approx 0.9386 \times 10^{0.0373x}$. (d) The predicted monthly premiums are
$\hat{y} = 0.9386 \times 10^{0.0373 \times 58} \approx \$136.74$ for a 58-year-old and
$\hat{y} = 0.9386 \times 10^{0.0373 \times 68} \approx \$322.76$ for a
68-year-old. (e) You should feel very comfortable with these predictions. The residual plot
above (right) shows no clear patterns and $r^2 = 99.9\%$, so the exponential model provides an
excellent fit.
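The back-transformation in parts (c) and (d) can be sketched in a few lines. The coefficients are the ones reported above; the names `INTERCEPT`, `SLOPE`, and `predict_premium` are assumptions made for illustration only.

```python
# Fitted line on the log10 scale, from part (c): log10(y_hat) = INTERCEPT + SLOPE * age.
INTERCEPT = -0.0275
SLOPE = 0.0373


def predict_premium(age):
    """Undo the log10 transformation to get the predicted monthly premium in dollars."""
    return 10 ** (INTERCEPT + SLOPE * age)


premium_58 = predict_premium(58)
premium_68 = predict_premium(68)
print(f"age 58: ${premium_58:.2f}")
print(f"age 68: ${premium_68:.2f}")
```

Results can differ from the values in part (d) by a cent or two, because the text rounds $10^{-0.0275}$ to 0.9386 before multiplying.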

2. (a) The entries in each column are only from these six selected causes of death. There are

other causes of death so the total number of deaths in each age group is higher than the sum of

the deaths for these six causes. (b) Percents should be used to compare the age groups because

the age groups contain different numbers of individuals. (c) The conditional distributions are

shown in the table below. Each entry is obtained by dividing the count for that cause of death by

the appropriate column total.

                15 to 24 years   25 to 44 years   45 to 64 years
Accidents           45.32%           21.60%            5.42%
AIDS                 0.52%            5.34%            1.35%
Cancer               4.93%           14.77%           33.16%
Heart disease        3.28%           12.63%           23.27%
Homicide            15.59%            5.71%            0.63%
Suicide             11.87%            8.73%            2.30%

[Figures: bar graphs of the percent of deaths by cause (Accidents, AIDS, Cancer, Heart disease, Homicide, Suicide) for the three age groups.]

(d) The leading cause of death for the youngest age group is accidents, followed by homicide

and suicide. For the middle age group, accidents are still the leading cause of death, but cancer

and heart disease are second and third, respectively. For the oldest age group, cancer is the

leading cause of death, with heart disease running a close second.

3. (a) The chance of dying for men over 65 who walk at least 2 miles a day is half that of men

who do not exercise. (b) Individuals who exercise regularly have many other habits and

characteristics that could contribute to longer lives.

4.49 Spending more time watching TV means that less time is spent on other activities. Answers

will vary, but some possible lurking variables are: the amount of time parents spend at home, the

amount of exercise and the economy. For example, parents of heavy TV watchers may not

spend as much time at home as other parents. Heavy TV watchers may not get as much exercise

as other adolescents. As the economy has grown over the past 20 years, more families can afford

TV sets (many homes now contain more than two TV sets), and as a result, TV viewing has

increased and children have less physical work to do in order to make ends meet.

4.50 (a) Let y = intensity and x = distance. A scatterplot of the original data is shown below

(left). The data appear to follow a power law model of the form $y = ax^b$, where $b$ is some

negative number.

[Figures: scatterplot of Intensity (candelas) versus Distance (meters) (left); scatterplot of Log(Intensity) versus Log(Distance) (right).]

(b) A scatterplot of the transformed data (above on the right), after taking the logarithms of both
variables, shows a clear linear trend, so the power model is appropriate. The least-squares
regression line for the transformed data is $\log \hat{y} = -0.5235 - 2.0126 \log x$. (c) The residual plot
below shows no obvious patterns and $r^2 = 99.9\%$, so this linear model on the transformed data
provides an excellent fit.

[Figures: residual plot of Residual versus Log(Distance) (left); Intensity (candelas) versus Distance (meters), showing observed intensity and the predicted curve (right).]

(d) Using the inverse transformation to find the predicted intensity gives
$\hat{y} = 10^{-0.5235} x^{-2.0126} \approx 0.2996\,x^{-2.0126}$. The plot of the original data with this model is shown above
(right). (e) The predicted intensity of the 100-watt bulb at 2.1 meters is
$\hat{y} = 0.2996 \times 2.1^{-2.0126} \approx 0.0673$ candelas.
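As a check on parts (d) and (e), the power model can be evaluated directly. This is a minimal sketch using the coefficients reported in part (b); `INTERCEPT`, `SLOPE`, and `predict_intensity` are illustrative names, not from the text.

```python
import math

# Fitted line on the log-log scale, from part (b):
# log10(y_hat) = INTERCEPT + SLOPE * log10(x).
INTERCEPT = -0.5235
SLOPE = -2.0126


def predict_intensity(distance_m):
    """Back-transform the log-log fit into the power model y_hat = 10**a * x**b."""
    return 10 ** INTERCEPT * distance_m ** SLOPE


coefficient = 10 ** INTERCEPT
intensity_at_2_1 = predict_intensity(2.1)

# The same prediction computed on the log scale, as a consistency check.
log_scale_prediction = 10 ** (INTERCEPT + SLOPE * math.log10(2.1))

print(f"coefficient a = {coefficient:.4f}")
print(f"predicted intensity at 2.1 m = {intensity_at_2_1:.4f} candelas")
```

Because $10^{a + b \log x} = 10^a x^b$, the two ways of computing the prediction agree up to floating-point error.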

4.51 (a) Yes, this transformation achieves linearity; see the scatterplot below.

[Figure: scatterplot of Intensity (candelas) versus 1/(Distance-squared).]

(b) Let x = distance and y = intensity. The least-squares regression line for the transformed data
is $\hat{y} = -0.0006 + 0.30\left(\frac{1}{x^2}\right)$. (c) The predicted intensity of the 100-watt bulb at 2.1 meters is
$\hat{y} = -0.0006 + 0.30\left(\frac{1}{2.1^2}\right) \approx 0.0674$ candelas. (d) Writing the model from part (d) of Exercise
4.50 in a slightly different form shows that the models are very similar: $\hat{y} = -0.0006 + \frac{0.3}{2.1^2}$
versus $\hat{y} \approx \frac{0.3}{2.1^{2.0126}}$. The absolute difference in the predicted values is 0.0001. Thus, the
two models give nearly identical predictions.

4.52 The explanatory variable is the amount of herbal tea and the response variable is a measure

of health and attitude. The most important lurking variable is social interaction: many of the

nursing home residents may have been lonely before the students started visiting.

4.53 (a) The column sums are shown below.

Single:    10,949 + 7,653 + 4,009 + 720 = 23,331
Married:   2,472 + 19,640 + 32,183 + 8,539 = 62,834
Widowed:   16 + 228 + 2,312 + 8,732 = 11,288
Divorced:  155 + 2,904 + 7,898 + 1,703 = 12,660

The sum of these column totals is 23,331 + 62,834 + 11,288 + 12,660 = 110,113, which is not

equal to 110,115. The difference is due to rounding. (b) The marginal distributions, conditional

distributions, and joint distribution are shown in the software output from Minitab below.

Rows: Age   Columns: Marital Status

          divorced   married    single   widowed       All

15-24          155      2472     10949        16     13592
              1.14     18.19     80.55      0.12    100.00
              1.22      3.93     46.93      0.14     12.34
             0.141     2.245     9.943     0.015    12.344

25-39         2904     19640      7653       228     30425
              9.54     64.55     25.15      0.75    100.00
             22.94     31.26     32.80      2.02     27.63
             2.637    17.836     6.950     0.207    27.631

40-64         7898     32183      4009      2312     46402
             17.02     69.36      8.64      4.98    100.00
             62.39     51.22     17.18     20.48     42.14
             7.173    29.227     3.641     2.100    42.140

65+           1703      8539       720      8732     19694
              8.65     43.36      3.66     44.34    100.00
             13.45     13.59      3.09     77.36     17.89
             1.547     7.755     0.654     7.930    17.885

All          12660     62834     23331     11288    110113
             11.50     57.06     21.19     10.25    100.00
            100.00    100.00    100.00    100.00    100.00
            11.497    57.063    21.188    10.251   100.000

Cell Contents:   Count
                 % of Row
                 % of Column
                 % of Total

The table below provides just the marginal distribution for marital status.

Single   Married   Widowed   Divorced
21.19%   57.06%    10.25%    11.50%

A bar chart of the marginal distribution is shown below.

[Figure: bar chart of Percent versus Marital status (Single, Married, Widowed, Divorced).]

(c) The two conditional distributions are shown in the table below.

Age     Single   Married   Widowed   Divorced
15–24   80.55%   18.19%     0.12%     1.14%
40–64    8.64%   69.36%     4.98%    17.02%

Among the younger women, more than 4 out of 5 have not yet married, and those who are
married have had little time to become widowed or divorced. Most of the older group is or has
been married; only about 8.64% are still single. (d) Among single women, 46.93% are 15–24,
32.8% are 25–39, 17.18% are 40–64, and 3.09% are 65 or older.
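The marginal and conditional distributions in parts (b) through (d) all come from the same counts, as the sketch below illustrates. It assumes the counts listed in part (a); `counts`, `percents`, and the result names are hypothetical helpers introduced here.

```python
# Counts from 4.53: age group -> marital status -> count.
counts = {
    "15-24": {"single": 10949, "married": 2472, "widowed": 16, "divorced": 155},
    "25-39": {"single": 7653, "married": 19640, "widowed": 228, "divorced": 2904},
    "40-64": {"single": 4009, "married": 32183, "widowed": 2312, "divorced": 7898},
    "65+": {"single": 720, "married": 8539, "widowed": 8732, "divorced": 1703},
}
statuses = ["single", "married", "widowed", "divorced"]

# Column totals (part a) and their sum.
column_totals = {s: sum(row[s] for row in counts.values()) for s in statuses}
grand_total = sum(column_totals.values())


def percents(values):
    """Convert a dict of counts to percents of their own total, rounded to 2 places."""
    total = sum(values.values())
    return {key: round(100 * value / total, 2) for key, value in values.items()}


# Part (b): marginal distribution of marital status.
marginal_status = percents(column_totals)
# Part (c): conditional distribution of marital status for one age group (a row).
status_given_young = percents(counts["15-24"])
# Part (d): conditional distribution of age among single women (a column).
age_given_single = percents({age: row["single"] for age, row in counts.items()})

print(marginal_status)
print(status_given_young)
print(age_given_single)
```

The only difference between the three distributions is which total serves as the denominator: the grand total, a row total, or a column total.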

4.54 (a) The scatterplots below show a strong nonlinear relationship for the original data (left)

and a nearly perfect, negative linear association for the transformed data (right).

[Figures: scatterplot of Height (feet) versus Bounce (left); scatterplot of Log(Height) versus Bounce (right).]

Not only is the linear association between the log(height) and bounce stronger than the linear
association between the logarithms of both variables, but there is also a value of zero for the
bounce number, which means that the logarithm cannot be used for this point. The exponential
model is more appropriate for predicting y = height from x = bounce number. (b) The least-squares
regression line for the transformed data is $\log \hat{y} = 0.4610 - 0.1191x$. The residual plot
below shows that the first two residuals are positive and the next three residuals are negative, but
the residuals are all very small. The value of $r^2$ is 0.998, which indicates that 99.8% of the
variability in log(height) is explained by the linear relationship with bounce. This model provides an
excellent fit.

[Figure: residual plot of Residual versus Bounce.]

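Using the inverse transformation, $\hat{y} = 10^{0.4610} \times 10^{-0.1191x} \approx 2.8907 \times 10^{-0.1191x}$. The predicted height on the 7th bounce is
$\hat{y} = 2.8907 \times 10^{-0.1191 \times 7} \approx 0.4239$ feet.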

4.55 The lurking variable is temperature or season. More flu cases occur in winter when less ice

cream is sold, and fewer flu cases occur in the summer when more ice cream is sold. This is an

example of common response.

[Diagram: common response. Season or temperature affects both the number of flu cases reported and the amount of ice cream sold.]

4.56 Who? The individuals are randomly selected people from three different locations. What?
The response variable is whether or not the individual suffered from CHD and the explanatory
variable is a measure of how prone an individual is to sudden anger. Both variables are
categorical, with CHD being yes or no and the level of anger being classified as low, moderate,
or high. Why? The researchers wanted to see if there was an association between these two
categorical variables. When, where, how, and by whom? In the late 1990s a random sample of
almost 13,000 people was followed for four years. The Spielberger Trait Anger Scale was used
to classify the level of anger and medical records were used for CHD. Graphs: A bar graph of
the conditional distributions of CHD for each level of anger is shown below (left). To see the
increase in the percent of individuals with CHD in each group, a separate bar graph is shown
(right). Notice how the change in scale changes your impression of the effect.

[Figures: bar graph of Percent by CHD (yes, no) within each Anger level (low, moderate, high) (left); bar graph of the percent with CHD for each Anger level on an expanded scale (right).]

Numerical summaries: The software output below from Minitab shows the marginal

distributions, conditional distributions, and joint distribution.

Rows: CHD   Columns: Anger

          high      low   moderate      All

No         606     3057       4621     8284
          7.32    36.90      55.78   100.00
         95.73    98.30      97.67    97.76
         7.151   36.075     54.532   97.758

Yes         27       53        110      190
         14.21    27.89      57.89   100.00
          4.27     1.70       2.33     2.24
         0.319    0.625      1.298    2.242

All        633     3110       4731     8474
          7.47    36.70      55.83   100.00
        100.00   100.00     100.00   100.00
         7.470   36.700     55.830  100.000

Cell Contents:   Count
                 % of Row
                 % of Column
                 % of Total

The most important numbers for comparison are the percents of each anger group
that experienced CHD: 53/3110 ≈ 1.70% of the low-anger group, 110/4731 ≈ 2.33% of the
moderate-anger group, and 27/633 ≈ 4.27% of the high-anger group.

Interpretation: Risk of CHD increases with proneness to sudden anger. It might be good to
point out to students that results like these are typically reported in the media with a reference to
the relative risk of CHD; for example, because 4.3%/1.7% ≈ 2.5, we might read that subjects in the
high-anger group had 2.5 times the risk of those in the low-anger group.
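The relative risk mentioned above is just a ratio of two conditional proportions. A minimal sketch, assuming the counts from the Minitab output; `chd`, `risk`, and `relative_risk` are illustrative names:

```python
# CHD counts by anger level from the output in 4.56: (cases, group size).
chd = {"low": (53, 3110), "moderate": (110, 4731), "high": (27, 633)}


def risk(cases, size):
    """Proportion of a group that experienced CHD."""
    return cases / size


def relative_risk(group, baseline="low"):
    """Risk in `group` divided by the risk in the baseline group."""
    return risk(*chd[group]) / risk(*chd[baseline])


rr_high = relative_risk("high")
rr_moderate = relative_risk("moderate")
print(f"high vs. low: {rr_high:.2f}")
print(f"moderate vs. low: {rr_moderate:.2f}")
```

A relative risk near 2.5 sounds dramatic, but the absolute risks (about 1.7% and 4.3%) are both small, which is the same change-of-scale point made by the two bar graphs.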

4.57 Who? The individuals are cultures of marine bacteria. What? The two quantitative

variables are x = time (minutes) and y = count (number of surviving bacteria in hundreds). Why?

Researchers wanted to see if the bacteria would decay exponentially over time when exposed to

X-rays. When, where, how, and by whom? It is not clear when or where the data were collected,

but the counts were obtained after exposing cultures to X-rays for different lengths of time.


Graphs: Scatterplots below show the original data (left) and the transformed data (right) after

taking the logarithm of count. Both plots suggest that the exponential decay model is appropriate

for these data.

[Figures: scatterplot of Count versus Time (minutes) (left); scatterplot of Log(Count) versus Time (minutes) (right).]

Numerical summaries: The least-squares regression line for the transformed data is
$\log \hat{y} = 2.5941 - 0.0949x$. Using the inverse transformation, the predicted count is
$\hat{y} = 10^{2.5941} \times 10^{-0.0949x} \approx 392.7354 \times 10^{-0.0949x}$. Interpretation: The residual plot below shows no
clear pattern and $r^2 = 98.8\%$, so the exponential decay model provides an excellent model for
the number of surviving bacteria after exposure to X-rays.

[Figure: residual plot of Residual versus Time (minutes).]

4.58 (a) The two-way table below was obtained by adding the corresponding entries for each
age group. The proportion of smokers who stayed alive for 20 years is 443/582 ≈ 0.7612 or
76.12% and the proportion of nonsmokers who stayed alive is 502/732 ≈ 0.6858 or 68.58%.

        Smoker   Not
Dead      139    230
Alive     443    502

(b) For the youngest group, 269/288 or 93.40% of the smokers and 327/340 or 96.18% of the

nonsmokers survived. For the middle group, 167/245 or 68.16% of the smokers and 147/199 or

73.87% of the nonsmokers survived. For the oldest group, 7/49 or 14.29% of the smokers and

28/193 or 14.51% of the nonsmokers survived. The results are reversed when the data for the

three age groups are combined. (c) The percents of smokers in the three age groups are

288/628 × 100 ≈ 45.86% for the youngest group, 245/444 × 100 ≈ 55.18% for the middle-aged
group, but only 49/242 × 100 ≈ 20.25% for the oldest group.
