
STA408: Statistics for Science and Engineering

Tutorial 5

Correlation Coefficient
1. The data below show the ages (in years) of the husbands and wives of six couples. Find the product
moment correlation coefficient of the data and interpret the value.
Ages (in years) of husbands and wives of six couples
Husband’s age 43 57 28 19 35 39
Wife’s age 37 51 32 20 33 38
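
As a way of checking a hand calculation for Question 1, the following is a minimal sketch that evaluates the product moment correlation coefficient as SSxy / sqrt(SSxx * SSyy). The function name `pearson_r` and the variable names are illustrative choices, not part of the tutorial.

```python
from math import sqrt

# Ages (in years) from the table in Question 1
husband = [43, 57, 28, 19, 35, 39]
wife = [37, 51, 32, 20, 33, 38]


def pearson_r(x, y):
    """Product moment correlation: SSxy / sqrt(SSxx * SSyy)."""
    n = len(x)
    ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    ss_xx = sum(a * a for a in x) - sum(x) ** 2 / n
    ss_yy = sum(b * b for b in y) - sum(y) ** 2 / n
    return ss_xy / sqrt(ss_xx * ss_yy)


r = pearson_r(husband, wife)
print(round(r, 3))
```

A value close to +1 would indicate a strong positive linear relationship between the two ages.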

2. An auto manufacturing company wanted to investigate how the price of one of its car models
depreciates with age. The research department at the company took a sample of eight cars of this
model and collected the following information on the ages (in years) and prices (in hundreds of
dollars) of these cars. The following are some summary statistics.
$\sum x = 42$, $\sum y = 1133$, $\sum x^2 = 264$, $\sum y^2 = 213{,}565$, $\sum xy = 4450$
(a) Show that the Pearson correlation coefficient is −0.986. Explain the meaning of the value.
(b) What will happen to the prices of the cars when the ages increase?
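
When only the sums are given, the coefficient can be computed directly from them. The sketch below assumes n = 8 (the eight cars in the sample) and uses illustrative variable names. With these sums SSxy comes out negative, so the coefficient carries a negative sign with magnitude 0.986, which is what a price that falls with age would produce.

```python
from math import sqrt

n = 8  # eight cars in the sample
sum_x, sum_y = 42, 1133
sum_x2, sum_y2, sum_xy = 264, 213565, 4450

# Corrected sums of squares and cross products
ss_xy = sum_xy - sum_x * sum_y / n
ss_xx = sum_x2 - sum_x ** 2 / n
ss_yy = sum_y2 - sum_y ** 2 / n

r = ss_xy / sqrt(ss_xx * ss_yy)
print(ss_xy, ss_xx, ss_yy, round(r, 3))
```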

3. The following table gives information on the number of megapixels and the prices of nine randomly
selected point-and-shoot digital cameras that were available on bestbuy.com. Calculate the value
of the Pearson correlation coefficient and interpret the value. (0.933)
Megapixels 10.3 10.2 7.0 9.1 10.0 12.1 8.0 5.0 14.7
Price (RM) 130 150 62 160 200 280 125 60 400

Regression & Miscellaneous Questions


4. A diabetic is interested in determining how the amount of aerobic exercise impacts his blood sugar.
When his blood sugar reaches 170 mg/dL, he goes for a run at a pace of 10 minutes per mile. On
different days, he runs different distances and measures his blood sugar after completing his run.
Note: The preferred blood sugar level is in the range of 80 to 120 mg/dL. Levels that are too low or
too high are extremely dangerous. The scatter diagram of the recorded data and the
Minitab output are given below.

[Scatter diagram: Distance vs Blood sugar. Horizontal axis: Distance (miles), 2.0 to 4.5; vertical axis: Blood Sugar (mg/dL), 70 to 150.]
STA408 Tutorial 5 Chapter 5: Bivariate Analysis

Regression Analysis: Blood Sugar (mg/dL) versus Distance (miles)

Analysis of Variance

Source              DF   Adj SS   Adj MS  F-Value  P-Value
Regression           1  5632.46  5632.46   245.73    0.000
  Distance (miles)   1  5632.46  5632.46   245.73    0.000
Error               10   229.21    22.92
  Lack-of-Fit        4    40.21    10.05     0.32    0.856
  Pure Error         6   189.00    31.50
Total               11  5861.67

Model Summary

      S    R-sq  R-sq(adj)  R-sq(pred)
4.78758  96.09%     95.70%      94.18%

Coefficients

Term                Coef  SE Coef  T-Value  P-Value   VIF
Constant          191.62     5.44    35.23    0.000
Distance (miles)  -25.37     1.62   -15.68    0.000  1.00

(a) Based on the scatter diagram, state the relationship between distance run and blood sugar
level.
(b) Identify the independent and dependent variables.
(c) Based on the output, state the regression equation. What method has been used to estimate
the coefficients?
(d) State the regression coefficients and explain what they mean.
(e) State the value of the coefficient of determination and interpret the value.
(f) Determine the value of the correlation coefficient and interpret the value.
(g) Estimate the blood sugar level if he runs a distance of 3.5 miles. (102.825 mg/dL)
(h) Can we conclude that the linear regression model is significant at the 1% significance level?
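
Parts (f) to (h) can be read off the Minitab output with a few lines of arithmetic. The sketch below copies the coefficients, R-sq, and regression P-Value as printed; the names `predict`, `r`, and `significant` are illustrative. It relies on two standard facts for simple linear regression: r has the same sign as the slope, and the model is declared significant when the P-Value is below the chosen level.

```python
from math import copysign, sqrt

intercept, slope = 191.62, -25.37  # Coef column of the Minitab output
r_squared = 0.9609                 # R-sq = 96.09%
p_value, alpha = 0.000, 0.01       # P-Value as printed, 1% level


def predict(distance_miles):
    """Estimated blood sugar (mg/dL) after running the given distance."""
    return intercept + slope * distance_miles


# r is the square root of R-sq, carrying the sign of the slope
r = copysign(sqrt(r_squared), slope)
estimate = predict(3.5)
significant = p_value < alpha

print(round(r, 3), round(estimate, 3), significant)
```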

5. An auto manufacturing company wanted to investigate how the price of one of its car models
depreciates with age. The research department at the company took a sample of eight cars of this
model and collected the following information on the ages (in years) and prices (in hundreds of
dollars) of these cars. The following are some summary statistics.
$\sum x = 42$, $\sum y = 1133$, $\sum x^2 = 264$, $\sum y^2 = 213{,}565$, $\sum xy = 4450$
(a) Identify the independent and dependent variables in this analysis.
(b) Show that the Pearson correlation coefficient is −0.986. (Use $r = \dfrac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}}$.)
(c) What percentage of the variation in prices is explained by the age of the cars? ($R^2 = 0.9722$)
(d) Name the statistic used to determine the answer in (c). (Coefficient of determination)
(e) Find the slope and 𝑦-intercept of the regression equation. (𝑏 = −34.4425, 𝑎 = 322.4481)
(f) Based on the calculation in (e), write the complete estimated regression equation and
interpret the slope in the context of the problem.
(g) Predict the price of a car that is 7 years old. (81.35 hundreds of dollars)
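
The slope, intercept, and prediction in parts (e) to (g) follow from the same sums. This is a minimal sketch assuming n = 8 and the least squares formulas b = SSxy / SSxx and a = ȳ − b x̄; the variable names are illustrative.

```python
n = 8  # eight cars in the sample
sum_x, sum_y, sum_x2, sum_xy = 42, 1133, 264, 4450

ss_xy = sum_xy - sum_x * sum_y / n
ss_xx = sum_x2 - sum_x ** 2 / n

b = ss_xy / ss_xx                  # slope: change in price per extra year of age
a = sum_y / n - b * (sum_x / n)    # y-intercept
price_at_7 = a + b * 7             # predicted price (hundreds of dollars)

print(round(b, 4), round(a, 4), round(price_at_7, 2))
```

Small differences in the last decimal place of the intercept can appear depending on whether the slope is rounded before it is reused.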


6. The following table gives information on the number of megapixels and the prices of nine randomly
selected point-and-shoot digital cameras that were available on bestbuy.com. The scatter diagram
and Minitab output are shown below.

Megapixels 10.3 10.2 7.0 9.1 10.0 12.1 8.0 5.0 14.7
Price (RM) 130 150 62 160 200 280 125 60 400

[Scatter diagram: Prices (RM) vs Number of Megapixels. Horizontal axis: Megapixels, 5.0 to 15.0; vertical axis: Price (RM), 50 to 400.]

Regression Analysis: Price (RM) versus Megapixels

Analysis of Variance

Source        DF  Adj SS  Adj MS  F-Value  P-Value
Regression     1   81496   81496    46.99    0.000
  Megapixels   1   81496   81496    46.99    0.000
Error          7   12141    1734
Total          8   93637

Model Summary

      S    R-sq  R-sq(adj)  R-sq(pred)
41.6463  87.03%     85.18%      71.76%

Coefficients

Term          Coef  SE Coef  T-Value  P-Value   VIF
Constant    -168.5     51.9    -3.25    0.014
Megapixels   35.68     5.21     6.85    0.000  1.00

Based on the above information, answer the following questions.
(a) Identify the independent and dependent variables.
(b) By referring to the scatter diagram, describe the relationship between the two variables.
(c) Using the data set above, show that Pearson’s correlation coefficient is 0.933 and interpret
the value.
(d) State the coefficient of determination. Comment on this value.
(e) Write the regression equation.
(f) Show that the slope is 35.68 and give the interpretation in the context of the problem.
Use $b = \dfrac{SS_{xy}}{SS_{xx}}$ to show.
(g) Estimate the price of the digital camera if the number of megapixels is 11.
(h) Test at 1% significance level whether the linear regression model is significant.
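
For part (f), the slope can be checked from the raw camera data using the deviation form of SSxy and SSxx. The sketch below is one way to do this; variable names are illustrative, and the results should agree with the Coef column of the Minitab output up to rounding.

```python
megapixels = [10.3, 10.2, 7.0, 9.1, 10.0, 12.1, 8.0, 5.0, 14.7]
price = [130, 150, 62, 160, 200, 280, 125, 60, 400]

n = len(megapixels)
x_bar = sum(megapixels) / n
y_bar = sum(price) / n

# Deviation form: SSxy = sum((x - x_bar)(y - y_bar)), SSxx = sum((x - x_bar)^2)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(megapixels, price))
ss_xx = sum((x - x_bar) ** 2 for x in megapixels)

b = ss_xy / ss_xx       # slope: change in price (RM) per extra megapixel
a = y_bar - b * x_bar   # intercept

print(round(b, 2), round(a, 1))
```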


7. The data of annual energy consumption in billions of BTU for both natural gas and coal for the
randomly selected states are given as follows.
$\sum x = 2256$, $\sum y = 3088$, $\sum xy = 1283269$, $\sum x^2 = 1079380$, $\sum y^2 = 1690322$
The Minitab output for the data is shown below.
Regression Analysis: Coal versus Gas

Analysis of Variance

Source      DF  Adj SS  Adj MS  F-Value  P-Value
Regression   1   64590   64590    7.089    0.056
  Gas        1   64590   64590
Error        4   36442    9111
Total        5  101031

Model Summary

      S    R-sq  R-sq(adj)  R-sq(pred)
95.4486  63.93%     54.91%      26.46%

Coefficients

Term       Coef  SE Coef  T-Value  P-Value   VIF
Constant  315.9     84.2     3.75    0.020
Gas       0.529    0.199     2.66    0.056  1.00

(a) State the independent and dependent variables.
(b) Write down the regression equation.
(c) Show by calculation that the slope value is 0.529 and interpret its value in the context of the
problem.
(d) State the coefficient of determination and explain its meaning.
(e) Determine the correlation coefficient and interpret the value.
(f) Perform a test to determine whether the linear regression is significant. Use 𝛼 = 0.05.
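
The F test in part (f) can be traced through the ANOVA table. The sketch below recomputes F = MS(Regression) / MS(Error) and compares it with a critical value. The critical value 7.71 for F(0.05; 1, 4) is an assumption taken from a standard F table, not from the Minitab output; the P-Value route is shown alongside it and should lead to the same decision.

```python
ms_regression = 64590  # Adj MS, Regression row
ms_error = 9111        # Adj MS, Error row
p_value = 0.056        # P-Value, Regression row
alpha = 0.05

f_stat = ms_regression / ms_error

# Assumed from a standard F table: upper 5% point of F with (1, 4) df
f_critical = 7.71

reject_by_critical_value = f_stat > f_critical
reject_by_p_value = p_value < alpha

print(round(f_stat, 3), reject_by_critical_value, reject_by_p_value)
```

Both checks fail to reject the null hypothesis, so at the 5% level there is not enough evidence that the linear model is significant.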

8. The experience (in years) and monthly salaries (in hundreds of RM) of nine randomly selected
secretaries are tabulated in the table below.
Experience 14 3 5 6 4 9 18 5 16
Monthly salary 62 29 37 43 35 60 67 32 60

Below is the Minitab output of the above data.


Regression Analysis: Monthly Salary versus Experience

Analysis of Variance

Source         DF   Adj SS   Adj MS  F-Value  P-Value
Regression      1  1526.55  1526.55    43.62    0.000
  Experience    1  1526.55  1526.55    43.62    0.000
Error           7   245.00    35.00
  Lack-of-Fit   6   232.50    38.75     3.10    0.409
  Pure Error    1    12.50    12.50
Total           8  1771.56

Model Summary

      S    R-sq  R-sq(adj)  R-sq(pred)
5.91612  86.17%     84.19%      79.99%

Coefficients

Term         Coef  SE Coef  T-Value  P-Value   VIF
Constant    25.55     3.83     6.68    0.000
Experience  2.438    0.369     6.60    0.000  1.00


Based on the Minitab output, answer the following questions.


(a) State the independent and dependent variables.
(b) Write down the regression equation.
(c) Show by calculation that the slope value is 2.438 and interpret its value in the context of the
problem.
(d) Determine the coefficient of correlation.
(e) State the coefficient of determination and explain its meaning.
(f) Based on the regression equation, estimate the monthly salary of a secretary who has
worked for 10 years.
(g) Perform a test to determine whether the linear regression model is significant. Use 1% level
of significance.
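
An equivalent way to carry out the test in part (g) is a t test on the slope, since in simple linear regression the squared t statistic equals the F statistic. The sketch below uses the rounded Coef and SE Coef values from the output, so the results match the printed T-Value and F-Value only up to rounding. The critical value 3.499 for a two-sided test at the 1% level with 7 degrees of freedom is an assumption taken from a standard t table.

```python
slope, se_slope = 2.438, 0.369  # Coef and SE Coef for Experience
alpha = 0.01

t_stat = slope / se_slope
f_from_t = t_stat ** 2          # should be close to the F-Value in the ANOVA table

# Assumed from a standard t table: t(0.005; 7 df)
t_critical = 3.499
significant = abs(t_stat) > t_critical

print(round(t_stat, 2), round(f_from_t, 2), significant)
```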

9. One end A of an elastic string was attached to a horizontal bar and a mass, m grams, was attached
to the other end B. The mass was suspended freely and allowed to settle vertically below A. The
length AB, l mm, was recorded for various masses as follows.

m   100  200  300  400  500  600
l   228  236  256  278  285  301

Below is the Minitab output of the above data.


Regression Analysis: l versus m

Analysis of Variance

Source      DF   Adj SS   Adj MS  F-Value  P-Value
Regression   1  4073.66  4073.66   213.44    0.000
  m          1  4073.66  4073.66   213.44    0.000
Error        4    76.34    19.09
Total        5  4150.00

Model Summary

      S    R-sq  R-sq(adj)  R-sq(pred)
4.36872  98.16%     97.70%      96.47%

Coefficients

Term        Coef  SE Coef  T-Value  P-Value   VIF
Constant  210.60     4.07    51.78    0.000
m         0.1526   0.0104    14.61    0.000  1.00

(a) Calculate the correlation coefficient using the data given in the table above and interpret the
value.
(b) Write down the regression equation and interpret the value of the slope.
(c) State the coefficient of determination and explain its meaning.
(d) Based on the regression equation, estimate the length AB when a mass of 250 grams is
attached to the other end B.
(e) Perform a test to determine whether the linear regression model is significant. Use 1% level
of significance.
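
Parts (c) and (d) can be cross-checked against the ANOVA table, because the coefficient of determination is the regression sum of squares divided by the total sum of squares. The following is a short sketch using the values printed in the output; the variable names are illustrative.

```python
ss_regression = 4073.66  # Adj SS, Regression row
ss_total = 4150.00       # Adj SS, Total row
intercept, slope = 210.60, 0.1526  # Coef column

# Proportion of the variation in l explained by m
r_squared = ss_regression / ss_total

# Estimated length AB (mm) for a mass of 250 grams
length_at_250 = intercept + slope * 250

print(round(r_squared, 4), round(length_at_250, 2))
```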

10. A company manufactures an electronic device to be used in a very wide temperature range. The
company knows that increased temperature shortens the lifetime of the device. The following data
were recorded.

Temperature (C) 10 20 30 40 50 60 70 80 90
Life time (hours) 420 365 285 220 176 117 69 34 5


Below is the Minitab output of the above data.


Regression Analysis: Lifetime versus Temperature

Analysis of Variance

Source         DF  Adj SS  Adj MS  F-Value  P-Value
Regression      1  169389  169389   431.51    0.000
  Temperature   1  169389  169389   431.51    0.000
Error           7    2748     393
Total           8  172137

Model Summary

      S    R-sq  R-sq(adj)  R-sq(pred)
19.8128  98.40%     98.18%      96.85%

Coefficients

Term           Coef  SE Coef  T-Value  P-Value   VIF
Constant      453.6     14.4    31.51    0.000
Temperature  -5.313    0.256   -20.77    0.000  1.00

(a) Calculate the correlation coefficient using the data given in the table above and interpret the
value.
(b) Write down the regression equation and interpret the value of the slope.
(c) State the coefficient of determination and explain its meaning.
(d) Based on the regression equation, estimate the lifetime of the device when the temperature
used is 65 °C.
(e) Perform a test to determine whether the linear regression model is significant. Use 1% level
of significance.

11. A study on the amount of rainfall and the quantity of air pollution removed produced the following
data.
Daily rainfall (in 0.01 cm)   4.3  4.5  5.9  5.6  6.1  5.2  3.8  2.1  7.5
Particulate removed (g/m³)    116  118  126  121  132  118  114  108  141

Below is the Minitab output of the above data.


Regression Analysis: Particulate Removed versus Daily Rainfall

Analysis of Variance

Source            DF  Adj SS  Adj MS  F-Value  P-Value
Regression         1  711.96  711.96    54.02    0.000
  Daily Rainfall   1  711.96  711.96    54.02    0.000
Error              7   92.26   13.18
Total              8  804.22

Model Summary

      S    R-sq  R-sq(adj)  R-sq(pred)
3.63041  88.53%     86.89%      73.27%

Coefficients

Term             Coef  SE Coef  T-Value  P-Value   VIF
Constant        91.16     4.31    21.15    0.000
Daily Rainfall  6.080    0.827     7.35    0.000  1.00


Answer the questions below by referring to the above Minitab output.


(a) Write down the least squares regression equation and interpret the value of the slope.
(b) Determine the correlation coefficient and interpret the value.
(c) Based on the regression equation, estimate the quantity of particulate removed when the
amount of rainfall is 0.07 cm.
(d) Perform a test to determine whether the linear regression model is significant. Use 5% level
of significance.
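
One point that is easy to miss in part (c) is the unit of the predictor: the rainfall values in the table are recorded in units of 0.01 cm, so 0.07 cm corresponds to x = 7. The sketch below makes that conversion explicit before applying the fitted equation, and also derives r from R-sq with the sign of the slope. The variable names are illustrative.

```python
from math import copysign, sqrt

intercept, slope = 91.16, 6.080  # Coef column of the Minitab output
r_squared = 0.8853               # R-sq = 88.53%

rainfall_cm = 0.07
x = rainfall_cm / 0.01           # convert to the table's unit (0.01 cm)

particulate_removed = intercept + slope * x
r = copysign(sqrt(r_squared), slope)  # r takes the sign of the slope

print(round(x, 2), round(particulate_removed, 2), round(r, 3))
```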

12. Determine whether each of the following statements is TRUE or FALSE.


(a) The best fit regression line passes through the mean values of the data set.
(b) Pearson’s correlation coefficient, r, measures the strength of linear relationship between an
independent variable and a dependent variable.
(c) The coefficient of determination, $R^2$, denotes the proportion of variation in the predictor
variable that can be explained by the response variable.
(d) Simple linear regression analyses the linear relationship between the dependent and
response variables.
(e) A weak correlation coefficient is indicated by a negative value.

(f) The constant term in a regression equation can be interpreted as the value of the predictor
variable when the value of the explanatory variable is zero.
(Answer: (a) TRUE, (b) FALSE, (c) FALSE, (d) FALSE, (e) FALSE, (f) FALSE)
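
Statement (a) can be illustrated numerically. The sketch below fits a least squares line to the experience and salary data from Question 8 (chosen here only as a convenient example) and checks that the fitted value at the mean of x equals the mean of y. This follows directly from the intercept formula a = ȳ − b x̄, and the residuals sum to zero for the same reason.

```python
from statistics import mean

experience = [14, 3, 5, 6, 4, 9, 18, 5, 16]
salary = [62, 29, 37, 43, 35, 60, 67, 32, 60]

x_bar, y_bar = mean(experience), mean(salary)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(experience, salary))
ss_xx = sum((x - x_bar) ** 2 for x in experience)

b = ss_xy / ss_xx      # least squares slope
a = y_bar - b * x_bar  # least squares intercept

fitted_at_mean = a + b * x_bar
residual_sum = sum(y - (a + b * x) for x, y in zip(experience, salary))

print(round(fitted_at_mean - y_bar, 9), round(residual_sum, 9))
```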
