
IS/BM-B

04/05/2020

DEPARTMENT OF STATISTICS AND ACTUARIAL SCIENCE


STAT 334: STATISTICAL METHODS II
EXERCISE THREE

1. Terence and Associates surveyed retired senior executives who had returned to work. They found
that after returning to work, 38% were employed by another organization, 30% were self-employed,
23% were either freelancing or consulting, and 9% had formed their own companies. To see if these
percentages are consistent with those of Issah Corporates, a local researcher surveyed 700 retired
executives who had returned to work and found that 322 were working for another company, 185
were self-employed, 126 were either freelancing or consulting, and 67 had formed their own
companies. At α = 0.10, test the claim that the percentages are the same for those people in
Allegheny County.

2. Human blood is grouped into four types: A, B, AB, and O. The percents of Ghanaians with each
type are as follows: O, 48%; A, 35%; B, 12%; and AB, 5%. At a recent blood donation exercise at
University of Ghana, the donors were classified as shown below. At the 0.05 level of significance, is
there sufficient evidence to conclude that the proportions differ from those stated above?

Blood type   O     A     B     AB
Donors       228   147   96    39
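
The goodness-of-fit statistic for a table like this one can be sketched in a few lines of Python. This is a minimal illustration rather than a full solution: the function name `chi_square_gof` is hypothetical, and the critical value for the resulting degrees of freedom still has to be read from a chi-square table.

```python
def chi_square_gof(observed, proportions):
    """Return (statistic, degrees of freedom) for a chi-square goodness-of-fit test.

    Assumes the hypothesised proportions sum to 1 and that no parameters
    were estimated from the data.
    """
    n = sum(observed)
    expected = [n * p for p in proportions]  # expected counts under H0
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1


# Donor counts and hypothesised proportions from Question 2 (O, A, B, AB).
observed = [228, 147, 96, 39]
proportions = [0.48, 0.35, 0.12, 0.05]
stat, df = chi_square_gof(observed, proportions)
print(f"chi-square = {stat:.3f}, df = {df}")
```

The same function applies to Questions 1, 3 and 5 once the hypothesised proportions are written as fractions that sum to 1.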

3. The data below show the number of road accidents by the day of the week on which they occurred
in the first quarter of 2017, according to a report in Ghana.

Day               Monday   Tuesday   Wednesday   Thursday   Friday   Saturday   Sunday
No. of Accidents  12       25        18          20         32       14         8

At the 5% significance level, do the data provide sufficient evidence to conclude that road accidents
are more likely to occur on some days than on others?

4. It has been accepted globally that data collected on living things follow the normal distribution.
This has been the basis for most conclusions drawn on analysis of bio-data in biology and
medicine. The following table shows the distribution of the weights (in grams) of malignant
tumours removed from 80 patients at a public hospital.

Weight (grams)    20 – 22   23 – 25   26 – 28   29 – 31   32 – 34   35 – 37   38 – 40
No. of Patients   8         11        12        19        14        10        6

Suppose that, as a statistician, you are asked to check whether the globally accepted statement holds
for the above data. What conclusion would you reach at the 0.05 significance level?

5. According to a genetic theory, the number of colour strains red, yellow, blue and white in a
certain flower should appear in proportions 4 : 12 : 5 : 4. Observed frequencies of red, yellow, blue
and white strains among 800 plants are n1 = 110, n2 = 410, n3 = 150 and n4 = 130, respectively. It
is required to test the hypothesis that the number of colour strains in the flower is distributed
according to the genetic theory. Test at 0.1 level of significance.

STAT 334/Exx3 Page 1 of 7 IS/BM-B/2020


6. The following are data on the proportion of loans defaulted in a year in 400 financial institutions:

Proportion of loans defaulted   No. of financial institutions
0.195-0.295                     145
0.295-0.395                     96
0.395-0.495                     67
0.495-0.595                     32
0.595-0.695                     28
0.695-0.795                     13
0.795-0.895                     12
0.895-0.995                     7
Total                           400

Test at the 5% level of significance whether these data come from a truncated exponential distribution
with density function $f(x) = 4e^{-4(x - 0.190)}$, $x > 0.190$.

7. The table below shows the lifespan distribution (in years) of a certain organism.

Lifespan    <0.125   0.125-0.250   0.250-0.375   0.375-0.500   0.500-0.625   0.625-0.750   0.750-0.875   >0.875
Frequency   7        29            69            100           133           132           123           57

(a) Based on the table, can it be said that the lifespan of the organism is uniformly distributed
over the interval [0, 1]? Test at the 10% level of significance.
(b) Test the claim that the lifespan follows the Beta distribution with parameters 3 and
2 at the 10% significance level.
Hint: If X ∼ B(α, β), then $f(x) = \dfrac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, x^{\alpha - 1} (1 - x)^{\beta - 1}$, $0 \le x < 1$.

8. The number of weekly deaths in a district recorded for 500 weeks is tabulated below.

Number of deaths   0    1     2    3    4    5    6    7   8   9   10   12
Frequency          76   150   95   68   48   30   13   9   5   3   2    1

Test at the 1% significance level the claim that the number of weekly deaths follows a Poisson
distribution with mean 3.5.
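
As a starting point for Question 8, the expected frequencies under the hypothesised Poisson model can be sketched as follows. This is an illustration only: the helper `poisson_pmf` is a hypothetical name, and pooling the upper tail into a single "10 or more" class is an assumption made here, not something the question prescribes. Classes with small expected counts are commonly pooled further before the statistic is computed.

```python
import math


def poisson_pmf(k, mean):
    """P(X = k) for a Poisson distribution with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


n_weeks = 500
mean = 3.5  # hypothesised mean from Question 8

# Expected frequencies for 0..9 deaths; the last class pools "10 or more"
# (an assumption of this sketch) so that the expected counts sum to 500.
expected = [n_weeks * poisson_pmf(k, mean) for k in range(10)]
expected.append(n_weeks - sum(expected))

for k, e in enumerate(expected):
    label = str(k) if k < 10 else "10+"
    print(f"{label:>3}: {e:8.2f}")
```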

9. Do doctors and nurses feel differently about a certain new postoperative procedure? To answer
this question, a researcher selects a sample of nurses and doctors, asks them to choose between
the old and the new procedure, and tabulates the data as shown below.

                 Preference
Group     New Procedure   Old Procedure   Indifferent
Nurses    100             80              20
Doctors   50              120             30

Based on the data, is opinion on the procedure independent of profession? Test at the 5% significance
level.
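
The chi-square statistic for a test of independence can be sketched as below. The function name `chi_square_independence` is hypothetical, and the sketch only computes the statistic and its degrees of freedom; the comparison with the tabulated critical value is left to the reader.

```python
def chi_square_independence(table):
    """Return (statistic, degrees of freedom) for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: (row total * column total) / n
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Rows: nurses, doctors; columns: new procedure, old procedure, indifferent.
table = [[100, 80, 20], [50, 120, 30]]
stat, df = chi_square_independence(table)
print(f"chi-square = {stat:.3f}, df = {df}")
```

The same function can be reused for the tables in Questions 10 to 13 by changing `table`.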



10. In a classic study of peptic ulcer, blood types were determined for 1,655 ulcer patients. The
accompanying table shows the data for these patients and for an independently chosen group of
10,000 healthy controls from the same city.

Blood Type   Ulcer Patients   Controls
O            911              4578
A            579              4219
B            124              890
AB           41               313
Total        1,655            10,000

Carry out a chi-square test at α = 0.01 of whether the proportion of ulcer patients is the same across
blood groups.

11. The data below are the responses obtained from 300 university lecturers to the question of whether
teaching, research or total performance is the most important basis for assessment of academic
performance.

                             College
Mode                Applied Science   Social Science   Arts
Teaching            40                20               40
Research            30                60               30
Total Performance   20                20               40
Total               90                100              110

(a) Give an outline of a test that can be used to determine whether the data support the
hypothesis that there is no difference in opinion among the lecturers in the three colleges.
(b) Carry out the test at the 5% level of significance and draw appropriate conclusions.

12. A study is being conducted to determine whether the age of the customer is related to the type of
movie he or she rents. A sample of renters gives the data shown here.

Age     Documentary   Comedy   Mystery
12–20   28            18       11
21–29   25            34       14
30–38   15            37       63
39–47   12            47       21
48+     10            55       15

(a) At the 5% level of significance, is the type of movie selected related to the customer’s age?
(b) Compute Pearson’s, Cramér’s and Tschuprow’s contingency coefficients and interpret your
result in each case.
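
For part (b), the three coefficients are usually defined from the chi-square statistic $\chi^2$ of an $r \times c$ table with $n$ observations. The symbols $C$, $V$ and $T$ below follow a common convention and are assumed here rather than taken from the course notes:

```latex
\begin{align*}
C &= \sqrt{\frac{\chi^2}{\chi^2 + n}} && \text{(Pearson)} \\
V &= \sqrt{\frac{\chi^2}{n \, \min(r - 1,\; c - 1)}} && \text{(Cram\'er)} \\
T &= \sqrt{\frac{\chi^2}{n \sqrt{(r - 1)(c - 1)}}} && \text{(Tschuprow)}
\end{align*}
```

For the table in this question, $r = 5$ and $c = 3$.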



13. A survey of students in various age groups revealed the following data on volunteer practices. At
α = 0.10, can it be concluded that the proportions of volunteers are the same for each group?

                         Age
Volunteer   18   19    20   21   22
Yes         26   33    30   62   13
No          79   105   68   98   62

14. A crop researcher is interested in the relationship between yield (y) and the
number of irrigations (x). A random sample of yields and the associated numbers of irrigations is
tabulated below.

y   24   23   27   26   25   26   29   27   31
x   2    4    6    2    4    6    2    4    6

Assuming the linear regression model $y = \beta_0 + \beta_1 x + \varepsilon$:

(a) Find the OLS estimates of $\beta_0$ and $\beta_1$ and interpret your results.
(b) Test the hypothesis that the regression model is not significant at α = 0.05.
(c) Compute the coefficient of determination and interpret your result.
(d) Construct a 95% confidence interval for the slope parameter and interpret it. How does this
interval compare with your result in (b)?
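
The least-squares estimates for a simple linear regression can be sketched directly from the corrected sums of squares. The function name `simple_ols` is hypothetical, and the sketch covers only the point estimates and the coefficient of determination, not the tests or intervals asked for above.

```python
def simple_ols(x, y):
    """Return (intercept, slope, r_squared) for the least-squares line y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    r_squared = s_xy ** 2 / (s_xx * s_yy)
    return intercept, slope, r_squared


# Data from Question 14: x = number of irrigations, y = yield.
x = [2, 4, 6, 2, 4, 6, 2, 4, 6]
y = [24, 23, 27, 26, 25, 26, 29, 27, 31]
b0, b1, r2 = simple_ols(x, y)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, R^2 = {r2:.4f}")
```

The same function can be applied to the raw data in Questions 17 and 18.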

15. To investigate the relationship between Corporate Social Responsibility (CSR) and Financial Profit (FP)
of Ghanaian business establishments, a researcher collected data on CSR and FP for seven
randomly selected business establishments listed on the Ghanaian stock exchange, with the
following summary results:
$\sum (x - \bar{x})^2 = 8{,}014.30$, $\sum (y - \bar{y})^2 = 606.80$, $\sum (x - \bar{x})(y - \bar{y}) = 2{,}117.10$

(a) Construct a 95% confidence interval for the slope parameter of a simple linear regression of
CSR on FP.
(b) Is this estimate of this slope parameter significant at 5% level of significance? Give a reason
for your answer.
(c) Comment on the amount of variation in CSR that is attributable to its relationship with FP.

16. A personnel manager of a large corporation thinks that there is a relation between absenteeism and
age, and would like to use the age of a worker, X, to predict the number of days absent, Y, during a
calendar year. A random sample of 12 employees was selected with the following summaries:

$\sum X = 301$, $\sum Y = 530$, $\sum XY = 14560$, $\sum X^2 = 6015$, $\sum Y^2 = 29900$

It is required to fit the linear regression model $E(Y \mid X) = \beta_0 + \beta_1 (X - \bar{X})$.

(a) Calculate the least squares estimates of the regression line and give practical interpretations
of your estimates.
(b) By constructing an ANOVA table, test the significance of the regression model at the 10% level of
significance.
(c) Construct 95% confidence intervals for the intercept and the slope parameters.
(d) Predict the number of days of absence for a worker who is 24 years old and obtain a 95% confidence
interval for your estimate.



17. A lecturer is interested in the relationship between the number of times a student misses a lecture
and his/her final exam mark. The table below shows a random sample of students with the
number of absent times and corresponding final mark.

Number of absences   6    2    15   9    12   5    8    4    5    7    10
Final grade (%)      82   86   43   74   58   90   78   84   80   75   60

Assuming a linear relationship between the two variables;

(a) Fit a regression model by choosing an appropriate outcome variable and predictor.
(b) Construct a table showing the observed responses and the corresponding fitted values and
residuals. Hence, calculate the sum of squares due to error, SSE, and the sum of
squares due to regression, SSR.
(c) Test the hypothesis that the slope parameter is not significant at the 1% significance level.
(d) Plot the residuals against the fitted values and use the plot to check the
assumptions of simple linear regression.
(e) Predict the final grade of a student who misses lectures 11 times and construct a 90%
confidence interval for your estimate.

18. An emergency service wishes to see whether a relationship exists between the outside
temperature (°F) and the number of emergency calls it receives in a 7-hour period. The data are
shown below.

Temperature (x)    68   74   82   88   93   99   101   75   80
No. of calls (y)   7    4    8    10   11   9    13    5    8

(a) Compute the Pearson correlation and test at the 5% significance level whether it is significant.
Interpret your result.
(b) Assuming the model $y_i = \alpha + \beta(x_i - \bar{x}) + \varepsilon_i$:
i. Fit a regression model and interpret your parameter estimates practically.
ii. By partitioning the total sum of squares, test whether the regression model is significant
at 5%.
iii. Construct a 99% confidence interval for the predicted number of calls at 95°F.
iv. Compute the coefficient of determination ($R^2$).
(c) Also fit the regression of x on y and
i. Compute the $R^2$ and compare it to (b)(iv).
ii. Find the product of the slope parameter estimates in (b) and (c) and compare your
result to $R^2$. Comment on your result.

19. (a) Assuming a linear regression model of the form $Y_i = \beta X_i + \varepsilon_i$, find the OLS estimator of $\beta$.
(b) In linear regression, the test for the overall model is done using the test statistic
$F = \dfrac{SSR}{p} \div \dfrac{SSE}{n - p - 1}$, whilst the coefficient of determination is $R^2 = \dfrac{SSR}{SST}$, where $p$
is the number of predictors in the model and the other terms have their usual meaning. Use
these relations to express $F$ in terms of $R^2$.



20. The data below show the weekly revenue from a soft drink and the corresponding expenditure on TV
and newspaper adverts for the soft drink.

Weekly Revenue (y) (GH¢1000s)      102   90   95    92    95    98    100   94    98   85    88    96
TV Advert (x1) (GH¢1000s)          5     2    4     2.5   3     3.5   2.5   3     4    1.5   1.8   4
Newspaper Advert (x2) (GH¢1000s)   1.5   2    1.5   2.5   3.3   2.3   4.2   2.5   2    0.5   1     2

Assuming the model $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$:

(a) Write down the matrices $X'X$ and $X'Y$.
(b) Fit the linear regression of Y on X1 and X2 and give a practical interpretation of the regression
estimates.
(c) Construct an ANOVA table to test for the overall significance of the model at the 5% significance
level.
(d) Test at the 0.05 level of significance whether each of TV Advert and Newspaper Advert is
significant in predicting weekly revenue of the soft drink.
(e) Compute the adjusted $R^2$ and comment on it.
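
For part (a), the cross-product matrices can be built from a design matrix with a leading column of ones. The sketch below uses plain Python lists; the helper names `transpose` and `matmul` are hypothetical, and solving the normal equations for the estimates is left out.

```python
def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


# Data from Question 20 (all values in GH¢1000s).
y = [102, 90, 95, 92, 95, 98, 100, 94, 98, 85, 88, 96]
x1 = [5, 2, 4, 2.5, 3, 3.5, 2.5, 3, 4, 1.5, 1.8, 4]
x2 = [1.5, 2, 1.5, 2.5, 3.3, 2.3, 4.2, 2.5, 2, 0.5, 1, 2]

# Design matrix with a leading column of ones for the intercept.
X = [[1.0, a, b] for a, b in zip(x1, x2)]
Y = [[v] for v in y]

XtX = matmul(transpose(X), X)  # 3 x 3 matrix X'X
XtY = matmul(transpose(X), Y)  # 3 x 1 vector X'Y

for row in XtX:
    print(row)
print(XtY)
```

The first row of `XtX` holds n, the sum of x1 and the sum of x2, which gives a quick check against hand calculation.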

21. The following data show the price (in $1000s), horsepower, and half-mile speed (in km/h) for 15
popular Honda cars. Suppose that the price of each sports and GT car is also available. The
complete data set is as follows:

Price   Horsepower   Speed
25.04   195          90.70
93.76   290          108.00
40.90   189          93.20
24.87   305          103.20
50.14   345          102.10
69.74   450          116.20
23.20   225          91.70
26.38   195          89.70
44.99   215          93.00
42.76   185          92.30
47.52   320          99.00
25.07   155          84.60
27.77   305          103.20
45.56   201          93.20
40.99   320          105.00

(a) Fit a linear regression model predicting the speed using horsepower and price. Interpret your
regression parameter estimates practically.
(b) Conduct an ANOVA to test for the overall regression at α = 0.05 and hence compute the
adjusted $R^2$ and interpret it.
(c) Conduct individual tests on the independent variables to check which of them is significant in
predicting the speed of the car.



22. A financial analyst would like to understand the relationship among common stockholders’ equity
per share (y), total revenue in billions of cedis (x), and total assets in billions of cedis (z). Data on
these variables from 2000 to 2010 are summarized below.

$\sum y = 113.43$, $\sum x = 138.2247$, $\sum z = 92.8077$, $\sum y^2 = 1231.4320$, $\sum x^2 = 4184.2187$,
$\sum z^2 = 977.0069$, $\sum yx = 1463.6880$, $\sum yz = 959.5678$, $\sum xz = 1132.1230$

(a) Fit a multiple linear regression of y on x and z. Interpret the regression estimates.
(b) Construct an analysis of variance table for your regression model in part (a) above. Would
you conclude that the test for overall regression is significant at α = 0.05?
(c) Compute the adjusted $R^2$ and interpret your result.
(d) If the test in part (b) above is significant, test at α = 0.05 whether total revenue is
significantly related to stockholders’ equity.
