
Practice Exam 1 Solutions

Final Examination


Directions

The exam will end 3 hours after it begins. The exam is divided into two parts.
The first part is multiple choice. Please answer the multiple choice questions on the exam
by circling the best answer (some rounding occurs in several places). The second part of
the exam consists of several problems. Please answer these problems in the space
provided on the exam (you may use the backs of the sheets if necessary). You will get
partial credit for these problems provided that your answers are organized and legible so
that your train of thought can be easily followed. All answers must also be transferred to
the answer sheet to be fully counted.
Good Luck

DON'T EVEN THINK ABOUT PANICKING

By Printing my name below I acknowledge that Harvard has an honor code and that I
will adhere to it. Failure to abide by the honor code could result in failing this course and
having to wash Professor Parzen's car with my toothbrush.


NAME: ______________________________________________ (-50 if not printed)















Multiple Choice (3 points each)

1) A hypothesis test is used to prevent a machine from under-filling or overfilling
quart bottles of beer. On the basis of a sample, the null hypothesis is rejected and
the machine is shut down for inspection. A thorough examination reveals there is
nothing wrong with the filling machine. From a statistical point of view:

a. A correct decision was made.
b. A Type I and Type II error were made.
c. A Type I error was made.
d. A Type II error was made.

2) The median waiting time for patients to see a doctor at a local emergency room is
much smaller than the mean waiting time. Which of the following is most
consistent with this information (circle one):

a. A histogram of the waiting times would be symmetric.
b. A histogram of the waiting times would be left-skewed.
c. A histogram of the waiting times would be right-skewed.

3) A student is studying very hard in a fluid dynamics course, but he knows he will
either pass or not pass. Suppose this student, Jack Daniels, has a probability of
0.90 for studying the night before the exam. Also, he has a probability of 0.75 for
passing the exam. If the probability of passing the exam, given that he studied the
night before, is 0.82, what may you conclude?

a. The probability of Jack not passing the exam is 0.10.
b. P(Jack studies OR Jack passes) is greater than 0.75.
c. P(Jack does not pass AND Jack studies) is greater than 0.5.
d. P(Jack passes AND Jack studies) is greater than 0.75.
e. None of the above

4) Suppose a computer processor yielded the following random sample of binary
digits:
0101001101010100010000010101010001011010
01010010010101010100110001011010

Is the computer processor yielding an even distribution of ones and zeros? If the
above sample contains 72 digits, of which thirty are ones, what is the value of
the test statistic to answer this question?

a. t = -1.41
b. t = -1.65
c. t = 1.65
d. t = 1.41
e. None of the above
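
As a check on the arithmetic in question 4, here is a minimal sketch of the one-proportion test statistic. It assumes the null hypothesis is an even split (p = 0.5) and uses the usual large-sample formula; the answer choices label the statistic "t", and the function name is illustrative only.

```python
import math

def one_proportion_statistic(successes, n, p0):
    """Large-sample test statistic for H0: p = p0."""
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)  # uses the null value p0
    return (p_hat - p0) / standard_error

# 30 ones among 72 digits, tested against an even split (p0 = 0.5)
statistic = one_proportion_statistic(30, 72, 0.5)
print(round(statistic, 2))
```

The standard error is computed from the hypothesized proportion rather than the sample proportion, which is the convention for a test (as opposed to a confidence interval).
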
5) If Steve and Doug Butabi want to find the proportion of people who believe they
can move their heads graciously, how large of a sample size is required so that the
margin of error is at most 2 percentage points with a 95% confidence level?

a. 3382
b. 4148
c. 34
d. 266
e. None of the above
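
The sample-size calculation in question 5 can be sketched as follows. This assumes the conservative planning value p = 0.5 (no prior estimate of the proportion is given) and the critical value 1.96; exact fractions are used so that rounding up is not affected by floating-point error. The function name is hypothetical.

```python
import math
from fractions import Fraction

def sample_size_for_proportion(margin, z, p=Fraction(1, 2)):
    """Smallest n satisfying z * sqrt(p * (1 - p) / n) <= margin."""
    return math.ceil(z**2 * p * (1 - p) / margin**2)

# Margin of error of 2 percentage points at 95% confidence (z = 1.96)
n_required = sample_size_for_proportion(Fraction(2, 100), Fraction(196, 100))
print(n_required)
```
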

6) In the Land of Chocolate, three friends (Hershey, Nougat, and B.C.) all have to
make tough decisions about the next phase of their life where they have ONLY
two options: going to college or working in the nearby sugar mines of Dos
Catorce. Suppose the following probabilities are true: P(Hershey goes to college)
= 0.2 = P(Nougat works at the sugar mines), P(B.C. goes to college) = 0.7. If
each of them makes their choice independent of the others, what is the probability
of Hershey and Nougat going to college while B.C. works at the sugar mines?

a. 0.112
b. 0.048
c. 0.028
d. None of the above

7) The following diagram comes from a famous piece of music.



A random sample of 81 people indicated that 19 people knew the piece of music
from the diagram alone. A random sample of 200 people (independent from the
first) indicated that 175 people knew the piece of music when it was played on a
piano. Construct a 95% confidence interval for the population proportion of
people who know the piece of music from the diagram alone.

a) (.14,.33)
b) (.23,.56)
c) (.12,.23)
d) (.32,.44)
e) None of the above
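
A minimal sketch of the interval asked for in question 7, assuming the standard large-sample (Wald) interval with z = 1.96. Only the first sample (19 of 81) is relevant to recognition from the diagram alone; the function name is illustrative.

```python
import math

def proportion_confidence_interval(successes, n, z=1.96):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = proportion_confidence_interval(19, 81)
print(round(low, 2), round(high, 2))
```
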




Below you are given the graphs of two normal density curves, both with the same mean.
Use these density curves to answer the following questions:





8) The area under each of these curves is equal to 1.

a) True
b) False

9) Which curve has the larger standard deviation?

a) Graph A
b) Graph B

10) Which distribution has a smaller percent of its data between 35 and 40 units?

a) Graph A
b) Graph B


11) Give a rough estimate of the standard deviation of the density curve in Graph B

a) 5
b) 10
c) 15
d) 20
e) None of the above







12) An instructor gives the same y versus x data as given below to four students.



They each come up with four different answers for the straight line regression
model. Only one is correct. The correct model is

a. y = 60x - 1200
b. y = 30x - 200
c. y = 139.43 + 29.684x
d. y = 1 + 22.782x

13) A scientist finds that regressing the y versus x data given below results in the
coefficient of determination for the straight-line regression model to be one.


The missing value for y at x = 17 most nearly is

a. -2.444
b. 2.000
c. 6.889
d. 34.00


14) Suppose Z is a standard normal random variable. Then what is the probability that
X=1+2Z will be less than 3?

a) .1587
b) .3413
c) .8413
d) .0013
e) None of the Above.
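
Question 14 reduces to a standard normal probability once X is rewritten in terms of Z. The sketch below assumes only the standard library; the normal CDF is built from `math.erf`, and the helper name is illustrative.

```python
import math

def standard_normal_cdf(z):
    """P(Z < z) for a standard normal Z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# X = 1 + 2Z < 3 is the same event as Z < (3 - 1) / 2 = 1
z_cutoff = (3 - 1) / 2
probability = standard_normal_cdf(z_cutoff)
print(round(probability, 4))
```
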






15) Suppose that X is a binomial random variable with n=3, p=.22. What is the
probability that X will take the value 2?

a) .886744
b) .113256
c) .037752
d) .962248
e) None of the Above.
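
The binomial probability in question 15 can be checked directly from the probability mass function. This is a small sketch using `math.comb`; the function name is illustrative.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

prob_two = binomial_pmf(2, 3, 0.22)
print(round(prob_two, 6))
```
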




16) Consider a game where you win $9 with probability .1 and lose $1 with
probability .9. What is your expected profit for this game?

a) $1
b) $1
c) 0
d) $2
e) None of the Above.



17) Suppose that X is a random variable taking the value 5 with probability .4, and
taking the value 5 with probability .6. What is the standard deviation of X?

a) 10
b) 4
c) 24
d) 24
e) None of the Above.



18) Suppose that X is a random variable with a binomial distribution. If the expected
value of X is 50 and the variance of X is 25 then the distribution of X must be:

a) Symmetric
b) Skewed Right
c) Skewed Left
d) It cannot be determined from the given information.



19) If two random variables X and Y have a negative covariance, then:

a) high values of x tend to be associated with high values of y and low values
of x tend to be associated with low values of y.
b) high values of x tend to be associated with low values of y and low
values of x tend to go with high values of y.
c) negative values of x tend to go with negative y values, and vice versa.
d) the expected value of x times y is less than zero.



20) A management-consultant firm uses a regression model where X1 stands for
previous experience, X2 for number of years at current job, and X3 for score on a
job-aptitude test. These variables are used in a regression model to predict job
satisfaction. Job satisfaction ranges from 1 to 20, with 20 indicating that an
employee is satisfied with every aspect of his or her job. The prediction equation
is Yhat = 1.7 - 0.15 X1 + 0.25 X2 + 0.14 X3. What would the consulting firm
predict for the job satisfaction of an employee who has 15 years of prior
experience, 10 years of employment at the present job, and an aptitude test score
of 85?
a. 14.83
b. 13.85
c. 17.79
d. 15.12
e. None of the above
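
The prediction in question 20 is a direct substitution into the fitted equation. The sketch below assumes the coefficient on X1 is negative (-0.15); that sign is an assumption, chosen because it is the reading that reproduces one of the listed answer choices.

```python
def predict_job_satisfaction(x1, x2, x3):
    """Fitted equation from question 20; the -0.15 sign on X1 is assumed."""
    return 1.7 - 0.15 * x1 + 0.25 * x2 + 0.14 * x3

# 15 years prior experience, 10 years at current job, aptitude score 85
prediction = predict_job_satisfaction(15, 10, 85)
print(round(prediction, 2))
```
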














21) The average cost of tuition, room and board at small private liberal arts colleges is
reported to be $8,500 per term, but a financial administrator believes that the
average cost is higher. A study was conducted using a sample of 150 small liberal
arts colleges. The computer output below was obtained. Let alpha = 0.05.

Hypothesis test results:
μ : population mean
H0 : μ = 8500
HA : μ > 8500




Based on the output, the conclusion should be

a. the true average cost is higher than $8,500.
b. the true average cost is lower than $8,500.
c. the true average cost is equal to $8,500.
d. the true average cost is equal to $8,708.90.





22) In developing a confidence interval for a population mean, a sample size of 40
observations was used. The CI was 17.25 +/- 2.42. Had the sample size been 160
instead of 40, the CI would have been

a) 17.25 +/- 1.68
b) 17.25 +/- 1.21
c) 69.00 +/- 9.68
d) 17.25 +/- 9.68
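
The effect of sample size on the margin of error in question 22 can be sketched as below. It assumes the sample standard deviation and the critical value stay the same, so the margin scales with 1/sqrt(n); the function name is illustrative.

```python
import math

def rescaled_margin(margin, n_old, n_new):
    """Margin of error after changing the sample size, all else equal."""
    return margin * math.sqrt(n_old / n_new)

# Quadrupling n from 40 to 160 halves the margin of error
new_margin = rescaled_margin(2.42, 40, 160)
print(round(new_margin, 2))
```

The center of the interval (17.25) is unaffected by this calculation; only the width changes.
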

23) After fitting a regression model, if the sum of the residuals equals 0
(e_1 + e_2 + ... + e_n = 0)

(a) The model is fitting well.
(b) We have reason to doubt the normality assumption.
(c) It tells us nothing.
(d) The slope parameter must be zero.



Computer output for question 21:

Mean   Sample Mean   Std. Err.   DF    T-Stat      P-value
μ      8708.9        96.36292    149   2.1678462   0.0159

24) If the coefficient of determination R-sq = 100%, then

(a) none of the variability in the observations is explained by the model fit.
(b) all observations fall on the fitted line exactly.
(c) the model is not true.
(d) a quadratic model would fit the data better.

25) In testing the hypothesis Ho : μ = 75 vs Ha : μ ≠ 75, the following information is
known: n = 64, x-bar = 72, and s = 10. The computed test statistic is equal to

a) 1.96
b) 2.4
c) -2.4
d) -1.96

26) Suppose the 95% confidence interval for the true population proportion p is (0.36,
0.54). Based on this confidence interval alone, in which of the following set(s) of
hypotheses would the null hypothesis be rejected (at the 0.05 significance level)?

a) Ho : p = 0.3 versus Ha : p ≠ 0.3
b) Ho : p = 0.4 versus Ha : p ≠ 0.4
c) Ho : p = 0.5 versus Ha : p ≠ 0.5
d) All of the above.

27) Your boss asks you to calculate a 99% confidence interval instead of a 90%
confidence interval. What is an advantage and a disadvantage of this action?

a) The advantage is higher confidence. The disadvantage is a wider interval.
b) The advantage is higher confidence. The disadvantage is a narrower interval.
c) The advantage is lower confidence. The disadvantage is a wider interval.
d) The advantage is lower confidence. The disadvantage is a narrower interval.









Based on a random sample of 1000 high school students, 280 of them said they are
current smokers. The 90% confidence interval for the true proportion of all high school
students that are current smokers is (0.26, 0.30).

28) Does the sample proportion lie in the interval (0.26, 0.30)?

a) Yes
b) No
c) Can't tell

29) Does the population proportion lie in the interval (0.26, 0.30)?

a) Yes
b) No
c) Can't tell

30) If we use a 95% confidence level instead of a 90% confidence level, will the
confidence interval calculation from the same data produce an interval narrower
than (0.26, 0.30)?

a) Yes
b) No
c) Can't tell

31) Will the sample proportion for a future sample of 1000 high school students lie in
the interval (0.26,0.30)?

a) Yes
b) No
c) Can't tell










The scatterplot below displays information for 50 states for the year 2000 with regard to
the variables: M.D.s per 100,000 which represents the number of doctors per 100,000
residents and Percent Poverty, the percentage of the population considered to be living in
poverty. The R-sq value is 5.6% and the least squares regression line is

M.D.s per 100,000 = 279.3 - 4.175 (Percent Poverty)



32) Which of these options better interprets the value of the slope?

a) For each additional percent in poverty the estimated number of M.D.s per
100,000 goes down by 4.175 on the average.
b) For each additional percent in poverty the estimated number of M.D.s per
100,000 goes up by 4.175 on the average.
c) For each additional M.D. the estimated percent in poverty goes down by 4.175%
on the average.
d) For each additional M.D. the estimated percent in poverty goes up by 4.175% on
the average.
e) For every 1 M.D. the percent in poverty goes down by 4.175%.





33) In the year 2000 the percent in poverty for Tennessee was 13.4. According to the
model (or regression equation), how many doctors would we have expected per
100,000 people?

a) About 279
b) About 275
c) About 223
d) About 335
e) About 250

34) Which of these statements is the best interpretation of R-sq in this example?

a) 5.6% of the people living in poverty have enough M.D.s
b) Only 5.6% of the variability in the number of M.D.s per 100,000 is
explained by the percent of the population living in poverty.
c) Only 5.6% of the M.D.s live in poverty.
d) Only 5.6% of the people living in poverty have no M.D.s

35) In the year 2000 the District of Columbia had 23.5% in poverty and 702 M.D.s
per 100,000. If this data point was added to the scatterplot, it would be

a) a residual.
b) negatively correlated with the data.
c) an outlier and influential observation.
d) a weak influence on the least-squares regression line.
e) a lurking variable.



A person's muscle mass is expected to decrease with age. To explore this relationship in
women, a nutritionist randomly selected 15 women from each 10-year age group,
beginning with age 40 and ending with age 79. The observations and least-squares
regression line appear in the scatterplot and the R-sq value is 75%.



36) Which of the following statements is the most accurate?

(A) For each additional year of age the estimated mean muscle mass increases and
decreases.
(B) The relationship between age and muscle mass is weak because the
correlation is negative. Higher muscle mass goes with both lower and higher age.
(C) The scatterplot shows a negative direction, with higher muscle mass
going with lower age. The plot is generally straight with a moderate amount
of scatter.
(D) The relationship between age and muscle mass is weak because R-sq=75% is
a small number compared to the intercept of 156.35.
(E) The correlation between age and muscle mass turns out to be -0.866. This is
an indication that age is causing muscle mass to decrease with time.


37) Which is the most appropriate statement regarding the interpretation of the
intercept?

(A) For each additional year of age the estimated mean muscle mass decreases by
approximately 1.19 MMIs.
(B) The average muscle mass is 156.35 MMI for women at age 0.
(C) The minimum muscle mass is 156.35 MMI.
(D) For each additional year of age muscle mass decreases by approximately
156.35 MMIs.
(E) We cannot interpret the intercept here since it does not make sense that a
newborn female child would have a muscle mass index of 156.35.

38) The following probability density curve represents waiting times at a customer
service counter at a national department store. The mean waiting time is 5 minutes
with standard deviation 5 minutes. If we took all possible samples of size n=100,
how would you describe the sampling distribution of the resulting sample means?





(A) Shape = right skewed, mean = 5, standard deviation = 5
(B) Shape = same as above graph, mean = 5, standard deviation = 0.5
(C) Shape = approximately normal, mean = 5, standard deviation = 0.5
(D) Shape = approximately normal, mean = 5, standard deviation = 5
(E) Shape = binomial, n =100; p = .05
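
The sampling-distribution idea in question 38 can be illustrated with a small simulation. The density curve is not reproduced here, so the sketch assumes exponential waiting times, a right-skewed distribution whose mean and standard deviation are both 5; the seed and the number of repetitions are arbitrary choices.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

def mean_of_sample(n, mean_wait=5.0):
    # Assumption: exponential waiting times, for which mean = sd = mean_wait
    return statistics.fmean(rng.expovariate(1 / mean_wait) for _ in range(n))

# Draw many samples of size 100 and record each sample mean
sample_means = [mean_of_sample(100) for _ in range(4000)]

center = statistics.fmean(sample_means)  # should be close to 5
spread = statistics.stdev(sample_means)  # should be close to 5 / sqrt(100) = 0.5
print(round(center, 2), round(spread, 2))
```

A histogram of `sample_means` would look roughly bell-shaped even though the individual waiting times are strongly skewed.
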

Hy-Vee Inc. collected data to measure the impact of television advertising on the price
which customers expect to pay for a deluxe pre-packaged dinner sold in Hy-Vee
grocery stores. For each local TV market, Hy-Vee determined two marketing inputs:

x1 = Number of one-week TV promotions
x2 = Advertised discount (in percent) for price of the dinner

In particular, Hy-Vee used x1 = 1, 3, 5, and 7 promotions in combination with x2 = 10%,
20%, 30%, and 40% discounts. Hy-Vee advertised in 10 local TV markets for each of the
(4x4) = 16 combinations of x1 and x2, for a total of 160 markets. Hy-Vee also conducted
a post-advertising customer survey in each market to measure

y = Expected price for the dinner, in dollars

Here is the resulting computer output:




39) Which of the following conclusions is supported by the output?

(a) Promotions is linearly related to Price.
(b) Discount is linearly related to Price, after accounting for Promotions.
(c) The regression assumptions are satisfied.
(d) Neither Promotions nor Discount is linearly related to Price.
(e) The price of beer is likely to fall now that the national elections are over.





40) Interpret the slope for Promotions.

(a) Promotions decrease on average by 0.102 for each one-dollar increase in
expected price.
(b) Expected price decreases on average by $0.102 for each additional promotion.
(c) Promotions decrease on average by 0.102 for each one-dollar increase in
expected price, when discount is held constant.
(d) Expected price decreases on average by $0.102 for each additional
promotion, when discount is held constant.
(e) Expected price decreases on average by $0.0174 for each additional
promotion, when discount is held constant.

41) Suppose that Hy-Vee plans an ad campaign which features two promotions of a
35% price discount in each local market. Estimate the mean expected price with
95% certainty.

(a) $4.31
(b) ($4.24, $4.37)
(c) ($3.78, $4.84)
(d) Stop! I'm too tired to calculate this.

42) Suppose that the goal of the ad campaign described in the previous question is for
customers to expect the price to be at most $4.35, on average. Which modification
to the ad campaign should be recommended to help Hy-Vee achieve its goal?

(a) Feature a 10% price discount instead of a 35% discount.
(b) Feature a 30% price discount instead of a 35% discount.
(c) Run seven promotions in each market.
(d) Run a single promotion in each market.
(e) None of the modifications is recommended.

Short Answers

1) (9 points) In a recent study, 928 women were asked about their smoking habits
during pregnancy and then again five years later. The data are summarized in the
table below.



a) What is the approximate probability that a randomly chosen woman smoked 5
years after pregnancy?

(230+95)/928


b) If a randomly selected woman smoked during pregnancy, what is the probability
that she smoked 5 years after pregnancy?

P(5 yrs later|smoked during) = 230/271





c) Are the events Smoking during Pregnancy and Smoking Five Years Later
independent or dependent? Explain.


Dependent. The conditional probability in (b) does not equal the unconditional
probability in (a).
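
The comparison behind parts (a) through (c) can be sketched numerically. The data table is not reproduced here, so the counts below are taken from the worked answers above (230 + 95 women smoking five years later out of 928; 230 of the 271 who smoked during pregnancy); the variable names are illustrative.

```python
total_women = 928
smoked_later = 230 + 95   # numerator used in part (a)
smoked_during = 271       # denominator used in part (b)
smoked_both = 230         # smoked during pregnancy and five years later

p_later = smoked_later / total_women
p_later_given_during = smoked_both / smoked_during

# Independence would require P(later | during) = P(later)
independent = abs(p_later - p_later_given_during) < 1e-9
print(round(p_later, 3), round(p_later_given_during, 3), independent)
```
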






2) (21 points) Consider the following multiple regression computer output and then
answer the questions on the following pages.


Multiple linear regression results
Dependent Variable: var13
Independent Variable(s): var1, var2, var3, var4, var5, var6, var7, var8, var9, var10,
var11, var12

Parameter estimates:

Variable    Estimate        Std. Err.       Tstat           P-value
Intercept   -0.057636276    0.048596717     -1.1860118      0.236
var1        -0.001675507    0.028293129     -0.059219576    0.9528
var2        4.140445E-5     1.442341E-4     0.28706425      0.7741
var3        0.0025728503    0.0026423088    0.973713        0.3305
var4        0.0386679       0.016722031     2.3123925       0.021
var5        -0.002243308    0.0021889468    -1.0248344      0.3058
var6        0.029505625     0.02173865      1.3572887       0.1751
var7        0.06405778      0.027510468     2.3284876       0.0202
var8        0.088917315     0.021120988     4.2099032       <0.0001
var9        -0.04077969     0.027402725     -1.4881619      0.1372
var10       -0.07517062     0.02721936      -2.76166        0.0059
var11       -0.030563422    0.02852621      -1.0714155      0.2843
var12       -0.012943288    0.0255786       -0.50602025     0.613

Analysis of variance table for multiple regression model:

Source   DF     SS            MS            F-stat       P-value
Model    12                   1.9961203     18.363802    <0.0001
Error    709    77.06733      0.10869864
Total    721    101.020775

Root MSE (also called s_e): 0.32969475
R-squared (adjusted): 0.2242

a) Which variable is the most important variable in the model?

Var8 since it has the lowest p-value

b) Which variable would be removed first when performing a backwards
stepwise regression?

Var1 since it has the highest p-value.

c) What would happen to the value of R-sq when you remove the variable in
part (b) (circle one answer)

Go Up Go Down

d) What is the coefficient of determination (R-sq) for the full model?

R-sq = SSR/SST = 1 - (SSE/SST) = 1 - (77.067/101.02)

e) Compute a 95% confidence interval for var10.

-0.07517062 +/- 1.96*(0.02721936)

f) Do we need var2 in the model? Explain.

No, the p-value is above .05


g) Test the null hypothesis that var8 equals 0.1

Ho: var8=0.1 Ha: no it doesn't

t = (0.088917315-0.1)/ 0.021120988 = -0.5247

Since |t|<1.96 we fail to reject the null hypothesis.
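
The calculations in parts (d), (e), and (g) can be reproduced from the printed output. This is a sketch that assumes the critical value 1.96 used in the worked answers; the variable names are illustrative.

```python
# Part (d): R-sq from the error and total sums of squares
sse, sst = 77.06733, 101.020775
r_squared = 1 - sse / sst

# Part (e): 95% confidence interval for the var10 coefficient
estimate_var10, std_err_var10 = -0.07517062, 0.02721936
ci_low = estimate_var10 - 1.96 * std_err_var10
ci_high = estimate_var10 + 1.96 * std_err_var10

# Part (g): test of Ho: var8 coefficient = 0.1
t_var8 = (0.088917315 - 0.1) / 0.021120988
reject_null = abs(t_var8) > 1.96

print(round(r_squared, 4))
print(round(ci_low, 4), round(ci_high, 4))
print(round(t_var8, 4), reject_null)
```
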

Practice Exam 2 Solutions


Final Examination

Directions

The exam will end 3 hours after it begins. The exam is divided into three parts.
The first and second parts are true-false and multiple choice, respectively. Please
answer the true-false and multiple choice questions on the exam by circling the
best answer. There will be some partial credit for the multiple-choice questions
as long as some credible work is shown. The third part of the exam consists of
several problems. Please answer these problems in the space provided on the
exam (you may use the backs of the sheets if necessary). You will get partial
credit for these problems provided that your answers are organized and legible
so that your train of thought can be easily followed.

Unless stated, all confidence intervals and hypothesis tests should be calculated at
the 95% confidence level (use 1.96).

A note on re-grade requests: Only written requests will be considered. Clerical
errors will be changed without question, but other inquiries will result in a
re-grade of the entire exam.

GOOD LUCK

By signing my name here I acknowledge that the GBS has an honor code and I will abide by it.


___________________________________________


NAME (PLEASE PRINT) :

_______________________________________________________________ (-100 if
not printed)










Multiple Choice (5 points each)



1) Consider the following sample data:

25 11 6 4 2 17 9 6

For these data the median is:

a. 7.5
b. 3.5
c. 10.
d. None of the above.

2) The owner of a fish market has an assistant who has determined that the weights of
catfish are normally distributed, with mean of 3.2 pounds and standard deviation of 0.8
pound. If a sample of 64 fish yields a mean of 3.4 pounds, what is the probability of
obtaining a sample mean this large or larger?

a) 0.0001
b) 0.0013
c) 0.0228
d) 0.4987
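
Question 2 uses the sampling distribution of the sample mean. The sketch below assumes the stated population values (mean 3.2, standard deviation 0.8) and builds the normal CDF from `math.erf`; the helper name is illustrative.

```python
import math

def standard_normal_cdf(z):
    """P(Z < z) for a standard normal Z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

population_mean, population_sd, n = 3.2, 0.8, 64
standard_error = population_sd / math.sqrt(n)      # sd of the sample mean
z_score = (3.4 - population_mean) / standard_error
upper_tail = 1 - standard_normal_cdf(z_score)      # P(sample mean >= 3.4)
print(round(z_score, 2), round(upper_tail, 4))
```
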

3) In the construction of confidence intervals, if all other quantities are unchanged, an
increase in the sample size will lead to a ______ interval.

a) narrower
b) wider
c) less significant
d) biased

4) A major department store chain is interested in estimating the average amount its credit
card customers spent on their first visit to the chain's new store in the mall. Fifteen credit
card accounts were randomly sampled and analyzed with the following results: X-bar =
$50.50 and s^2 = 400. Construct a 95% confidence interval for the average amount its
credit card customers spent on their first visit to the chain's new store in the mall
assuming that the amount spent follows a normal distribution.

a) $50.50 +/- $9.09
b) $50.50 +/- $10.12
c) $50.50 +/- $11.00
d) $50.50 +/- $11.08
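
With only 15 observations and an estimated variance, question 4 calls for a t interval. The sketch below assumes the tabled critical value 2.145 for 14 degrees of freedom at 95% confidence (the standard library has no t quantile function, so the value is entered by hand).

```python
import math

sample_mean = 50.50
sample_variance = 400
n = 15
t_critical = 2.145  # t table value for 14 degrees of freedom, 95% confidence

standard_error = math.sqrt(sample_variance) / math.sqrt(n)
margin = t_critical * standard_error
print(round(sample_mean - margin, 2), round(sample_mean + margin, 2))
```
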





5) In the annual report, a major food chain stated that the distribution of daily sales at their Detroit stores
is known to be bell-shaped, and that 95 percent of all daily sales fell between $19,200 and $36,400.
Based on this information, what were the mean sales?

a. Around $20,000
b. Close to $30,000
c. Approximately $27,800
d. Can't be determined without more information.


6) For some positive value of X, the probability that a standard normal variable is between 0
and +2X is 0.1255. The value of X is

a) 0.99
b) 0.40
c) 0.32
d) 0.16
e) None of the above

7) If we know that the length of time it takes a college student to find a parking spot in the
library parking lot follows a normal distribution with a mean of 3.5 minutes and a
standard deviation of 1 minute, find the probability that a randomly selected college
student will find a parking spot in the library parking lot in less than 3 minutes.

a) 0.3551
b) 0.3085
c) 0.2674
d) 0.1915
e) None of the above

8) The Central Limit Theorem is important in statistics because

a) for a large n, it says the population is approximately normal.
b) for any population, it says the sampling distribution of the sample mean is
approximately normal, regardless of the sample size.
c) for a large n, it says the sampling distribution of the sample mean is
approximately normal, regardless of the shape of the population.
d) for any sized sample, it says the sampling distribution of the sample mean is
approximately normal.

9) It is believed that the number of people who attend a Mardi Gras parade each year depends on the
temperature that day. A regression has been conducted on a sample of years where the
temperature ranged from 28 to 64 degrees and the number of people attending ranged from 8400 to
14,600. The regression equation was found to be y-hat = 2378 + 191x. Which of the following is
true?

a. The average change in parade attendance is an additional 2378 people per one degree
increase in temperature.
b. The average change in parade attendance is an additional 191 people per one
degree increase in temperature.
c. If the temperature is 75 degrees, we can expect that 16,703 people will attend.
d. If the temperature is 0 degrees this year, then we should expect 2378 people to attend.

10) In analyzing the residuals to determine whether the simple regression analysis satisfies
the regression assumptions, which of the following is the best answer?

a. The histogram of the residuals should be approximately bell shaped
b. The scatter plot of the residuals against the dependent variable should
illustrate that the variation in residuals is the same over all levels of y (should
have no patterns).
c. Neither a nor b are true
d. Both a and b are true

11) Assume that after running a regression you have calculated a prediction of y-hat = 110.
Also assume that n = 201 and that s_e = 4.5. Find the approximate 95% prediction
interval.

a. About 101 to 119
b. About 109.4 to 110.6
c. About 105.5 to 104.5
d. About 98.4 to 121.6

12) Residual analysis is conducted to check whether regression assumptions are met. Which
of the following is not an assumption made in simple linear regression?

a. Errors are independent of each other
b. Errors are normally distributed
c. Errors are linearly related to x
d. Errors have constant variance

13) The following regression output was generated based on a sample of utility customers.
The dependent variable was the dollar amount of the monthly bill and the independent
variable was the size of the house in square feet.

Based on this regression output, which of the following statements is not true?
a. The number of square feet in the house explains only about 2 percent of the
variation in the monthly power bill
b. At the usual alpha level equal to 0.05, there is no basis for rejecting the
hypothesis that the slope coefficient is equal to zero
c. The average increase in the monthly power bill is about 66.4 for each
additional square foot of space in the house
d. The total number of observations is 30.

14) In an effort to estimate the mean dollars spent per visit by customers of a food store, the
manager has selected a random sample of 100 cash register receipts. The mean of these
was $45.67 with a sample standard deviation equal to $12.30. Assuming that he wants to
develop a 95 percent confidence interval estimate, the upper limit of the confidence
interval estimate is:

a. about $2.02.
b. approximately $65.90.
c. about $48.08
d. None of the above.


15) A random sample of 340 people in Chicago showed that 66 listened to WJKT 1450, a
radio station in South Chicago Heights. Based on this sample information, what is the
point estimate for the proportion of people in Chicago that listen to WJKT 1450?

a. 0.231
b. 0.194
c. 0.51
d. 66
e. None of the above


16) The finishing process on new furniture leaves slight blemishes. The table below displays
a manager's probability assessment of the number of blemishes in the finish of new
furniture.

Number of Blemishes   0      1      2      3      4      5
Probability           0.34   0.25   0.19   0.11   0.07   0.04

On average, how many defects would we expect on a piece of furniture?

A) 0.28
B) 0.85
C) 1.44
D) 0.77
E) None of the above
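
The expected number of blemishes in question 16 is the probability-weighted sum of the outcomes. A minimal sketch using the table values; the variable names are illustrative.

```python
blemish_probabilities = {0: 0.34, 1: 0.25, 2: 0.19, 3: 0.11, 4: 0.07, 5: 0.04}

# A valid probability distribution must sum to 1
total_probability = sum(blemish_probabilities.values())

# E[X] = sum of (value * probability)
expected_blemishes = sum(k * p for k, p in blemish_probabilities.items())
print(round(total_probability, 2), round(expected_blemishes, 2))
```
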


17) In a recent survey, 70% of human resource directors thought that it was very important
for business students to take a course in business ethics. For a sample of 12 human
resource directors, what is the probability that at least one of them does not think it very
important for business students to take a business ethics course?

A) 0.9833
B) 0.9521
C) 0.9862
D) 0.9714
E) None of the above
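
Question 17 is easiest through the complement: "at least one does not" is the opposite of "all twelve do". The sketch assumes the twelve directors respond independently, each with probability 0.70 of rating the course very important.

```python
p_very_important = 0.70
n_directors = 12

# P(all 12 think it is very important), assuming independence
p_all_agree = p_very_important ** n_directors

# Complement: at least one director does not think it is very important
p_at_least_one_disagrees = 1 - p_all_agree
print(round(p_at_least_one_disagrees, 4))
```
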

18) Which of the following statements regarding a binomial experiment is false, where n is
the number of trials, and p is the probability of success in each trial?

A) The n trials are independent.
B) The standard deviation is np(1 - p).
C) The mean is np.
D) There are only two possible outcomes.


19) Woof Chow Dog Food Company believes that it has a market share of 25%. They survey
n = 100 dog owners and ask whether or not Woof Chow is their regular brand of dog
food, and 23 people say yes. Based upon this information, what is the value of the test
statistic?

a. -0.462
b. -0.475
c. 0.462
d. 0.475
e. None of the above


20) A company that makes shampoo wants to test whether the average amount of shampoo
per bottle is 16 ounces. The standard deviation is known to be 0.20 ounces. Assuming
that the hypothesis test is to be performed using 0.05 level of significance and a random
sample of n = 64 bottles, how large could the sample mean be before they would reject
the null hypothesis [i.e., testing Ho : μ = 16 vs Ha : μ ≠ 16]?

a. 16.2 ounces
b. 16.049 ounces
c. 15.8 ounces
d. 16.041 ounces


21) The managers of a local golf course have recently conducted a study of the types of golf
balls used by golfers based on handicap. A joint frequency table for the 100 golfers
covered in the survey is shown below:

                        Type of Golf Ball
Handicap     Strata   Titleist   Nike   Other
< 2          5        8          3      2
2 to < 10    8        7          9      10
> 10         7        8          10     23

If a player comes to the course using a Nike golf ball, the probability that he or she has a
handicap of at least 10 is:

a. 0.223.
b. 0.48.
c. 0.455.
d. 0.108.
e. None of the above
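
The conditional probability in question 21 can be read off the joint frequency table. The sketch below assumes the four numeric columns are the ball types Strata, Titleist, Nike, and Other (with "Handicap" labelling the rows), a reading under which the counts total the stated 100 golfers.

```python
# Rows are handicap groups; the column assignment is an assumption (see above)
golfers = {
    "< 2":       {"Strata": 5, "Titleist": 8, "Nike": 3,  "Other": 2},
    "2 to < 10": {"Strata": 8, "Titleist": 7, "Nike": 9,  "Other": 10},
    "> 10":      {"Strata": 7, "Titleist": 8, "Nike": 10, "Other": 23},
}

total_golfers = sum(sum(row.values()) for row in golfers.values())
nike_total = sum(row["Nike"] for row in golfers.values())

# P(handicap in the top group | Nike) = count in that cell / Nike column total
p_high_handicap_given_nike = golfers["> 10"]["Nike"] / nike_total
print(total_golfers, nike_total, round(p_high_handicap_given_nike, 3))
```
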
22) Suppose we want to test Ho : μ = 30 vs Ha : μ < 30. Which of the following possible
sample results based on a sample of size 36 gives the strongest evidence to reject Ho in
favor of Ha?

a) X-bar = 28, s = 6
b) X-bar = 27, s = 4
c) X-bar = 32, s = 2
d) X-bar = 26, s = 9

23) How many Kleenex should the Kimberly Clark Corporation package of tissues contain?
Researchers determined that 60 tissues is the average number of tissues used during a
cold. Suppose a random sample of 100 Kleenex users yielded the following data on the
number of tissues used during a cold: X-bar = 52, s = 22. Using the sample information
provided, calculate the value of the test statistic for testing Ho : μ = 60 vs Ha : μ ≠ 60.

a) t = (52 - 60)/22
b) t = (52 - 60)/(22/100)
c) t = (52 - 60)/(22/100^2)
d) t = (52 - 60)/(22/10)

24) The owner of a local nightclub has recently surveyed a random sample of n = 250
customers of the club. She would now like to determine whether or not the mean age of
her customers is over 30. If so, she plans to alter the entertainment to appeal to an older
crowd. If not, no entertainment changes will be made. Suppose she found that the sample
mean was 30.45 years and the sample standard deviation was 5 years. If she wants to be
95% confident in her decision, what conclusion can she make?

a) There is not sufficient evidence that the mean age of her customers is over 30.
b) There is sufficient evidence that the mean age of her customers is over 30.
c) There is not sufficient evidence that the mean age of her customers is not over 30.
d) There is sufficient evidence that the mean age of her customers is not over 30.



25) A survey claims that 9 out of 10 doctors recommend aspirin for their patients with
headaches. To test this claim against the alternative that the actual proportion of doctors
who recommend aspirin is less than 0.90, a random sample of 100 doctors results in 83
who indicate that they recommend aspirin. The value of the test statistic in this problem is
approximately equal to:

a) 4.12
b) 2.33
c) 1.86
d) 0.07
e) None of the above
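
A minimal sketch of the one-proportion z statistic for this question, assuming the standard error is computed under the null value p0 = 0.90. The variable names are illustrative.

```python
import math

p0, n, successes = 0.90, 100, 83

p_hat = successes / n                    # sample proportion
se_null = math.sqrt(p0 * (1 - p0) / n)   # standard error under Ho
z = (p_hat - p0) / se_null

print(round(z, 2))
```

The statistic is negative because the sample proportion is below 0.90; the answer choices appear to list its magnitude.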


A student claims that he can correctly identify whether a person is a business major or an
agriculture major by the way the person dresses. Suppose in actuality that if someone is a
business major, he can correctly identify that person as a business major 87% of the time.
When a person is an agriculture major, the student will incorrectly identify that person as a
business major 16% of the time. Presented with one person and asked to identify the major of
this person (who is either a business or agriculture major), he considers this to be a hypothesis
test with the null hypothesis being that the person is a business major and the alternative that
the person is an agriculture major.

26) Referring to the above, what would be a Type I error?

a) Saying that the person is a business major when in fact the person is a business major.
b) Saying that the person is a business major when in fact the person is an agriculture major.
c) Saying that the person is an agriculture major when in fact the person is a business
major.
d) Saying that the person is an agriculture major when in fact the person is an agriculture
major.

27) Referring to the above, what would be a Type II error?

a) Saying that the person is a business major when in fact the person is a business major.
b) Saying that the person is a business major when in fact the person is an agriculture
major.
c) Saying that the person is an agriculture major when in fact the person is a business major.
d) Saying that the person is an agriculture major when in fact the person is an agriculture
major.


Health care issues are receiving much attention in both academic and political arenas. A
sociologist recently conducted a survey of citizens over 60 years of age whose net worth is too
high to qualify for Medicaid and have no private health insurance. The descriptive statistics for
the ages of 25 uninsured senior citizens were as follows:


28) Which of the following is the best correct statement?

a) One fourth of the senior citizens sampled are below 66 years of age.
b) The middle 50% of the senior citizens sampled are between 66 and 73.0 years of age.
c) The average age of senior citizens sampled is 73.5 years of age.
d) All of the above are correct.

29) Which of the following is the best correct statement?

a) One fourth of the senior citizens sampled are below 64 years of age.
b) The middle 50% of the senior citizens sampled are between 66 and 73.0 years of age.
c) 25% of the senior citizens sampled are older than 81 years of age.
d) All of the above are correct.

30) To explain personal consumption (CONS) measured in dollars, data is collected for

INC: personal income in dollars
CRDTLIM: $1 plus the credit limit in dollars available to the individual
APR: average annualized percentage interest rate for borrowing for the individual
ADVT: per person advertising expenditure in dollars by manufacturers in the city where
the individual lives
SEX: gender of the individual; 1 if female, 0 if male

A regression analysis was performed with CONS as the dependent variable and
ln(CRDTLIM), ln(APR), ln(ADVT), and SEX as the independent variables. The
estimated model was


What is the correct interpretation for the estimated coefficient for SEX?

a. Holding everything else fixed, personal consumption for females is
estimated to be $0.39 higher than males on the average.
b. Holding everything else fixed, personal consumption for males is estimated
to be $0.39 higher than females on the average.
c. Holding everything else fixed, personal consumption for females is estimated
to be 0.39% higher than males on the average.
d. Holding everything else fixed, personal consumption for males is estimated
to be 0.39% higher than females on the average.



A school superintendent is interested in what factors affect the sixth grade proficiency test in her
state. She obtained the data on percentage of students passing the proficiency test (% Passing),
daily average of the percentage of students attending class (% Attendance), average teacher salary
in dollars (Salaries), and instructional spending per pupil in dollars (Spending) of 47 schools in
the state. The following is the multiple regression output with Y = % Passing as the dependent
variable, X1 = % Attendance, X2 = Salaries and X3 = Spending.



31) Which of the following is a correct statement?

a. The average percentage of students passing the proficiency test is estimated
to go up by 8.50% when daily average of percentage of students attending
class increases by 1%.
b. The daily average of the percentage of students attending class is expected to
go up by an estimated 8.50% when the percentage of students passing the
proficiency test increases by 1%.
c. The average percentage of students passing the proficiency test is
estimated to go up by 8.50% when daily average of the percentage of
students attending class increases by 1% holding constant the effects of
all the remaining independent variables.
d. The daily average of the percentage of students attending class is expected to
go up by an estimated 8.50% when the percentage of students passing the
proficiency test increases by 1% holding constant the effects of all the
remaining independent variables.





32) Which of the following is a correct statement based on the previous output?

a. 62.88% of the total variation in the percentage of students passing the
proficiency test can be explained by daily average of the percentage of
students attending class, average teacher salary, and instructional
spending per pupil.
b. 62.88% of the total variation in the percentage of students passing the
proficiency test can be explained by daily average of the percentage of
students attending class, average teacher salary, and instructional spending
per pupil after adjusting for the number of predictors and sample size.
c. 62.88% of the total variation in the percentage of students passing the
proficiency test can be explained by daily average of the percentage of
students attending class holding constant the effect of average teacher salary,
and instructional spending per pupil.
d. 62.88% of the total variation in the percentage of students passing the
proficiency test can be explained by daily average of the percentage of
students attending class after adjusting for the effect of average teacher
salary, and instructional spending per pupil.


33) The average length of stay in a hospital is useful for planning purposes. Suppose that the
following is the distribution of the length of stay in a hospital (in days) after a minor
operation:

Days 2 3 4 5 6
Probability .05 .20 .40 .20 ?

The average length of stay is:

a) 0.15 days
b) 0.20 days
c) 4.0 days
d) 4.2 days
e) 4.3 days
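
The calculation can be sketched in two steps: the missing probability is whatever makes the distribution sum to 1, and the mean is the probability-weighted sum of the days. The variable names below are illustrative.

```python
days = [2, 3, 4, 5, 6]
known_probs = [0.05, 0.20, 0.40, 0.20]

# The probabilities must sum to 1, which pins down the missing entry for 6 days.
missing = 1 - sum(known_probs)
probs = known_probs + [missing]

# Expected value: sum of (value * probability).
mean_stay = sum(d * p for d, p in zip(days, probs))

print(round(missing, 2), round(mean_stay, 2))
```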


34) The sale of luxury boats has been found to be extremely dependent on whether or not
consumers think the economy is shrinking. The specific question is whether or not more
than half of all consumers think the economy has entered a recession. In a sample of 1600
randomly selected consumers, 845 answered they believed the economy was in a
recession. A 95% confidence interval for the true population proportion of consumers
who believed the economy was in recession is closest to:

a. (0.5176, 0.5486)
b. (0.4397, 0.5040)
c. (0.5036, 0.5526)
d. (0.4960, 0.5603)

(counted both as correct)
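
A sketch of the interval calculation, assuming the usual large-sample formula p-hat +/- z*sqrt(p-hat(1 - p-hat)/n) with z = 1.96. The variable names are illustrative.

```python
import math

n, x = 1600, 845

p_hat = x / n
se = math.sqrt(p_hat * (1 - p_hat) / n)

z = 1.96  # assumption: conventional 95% normal multiplier
lower, upper = p_hat - z * se, p_hat + z * se

print(round(p_hat, 4), round(lower, 4), round(upper, 4))
```

The whole interval lies above 0.5 under these assumptions, which bears on the question of whether more than half of consumers think the economy is in a recession.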
The next 2 questions are based on the following information:
An insurance company analyst is interested in analyzing the dollar value of damage
in automobile accidents. She collects data from 115 accidents, and records the
amount of damage as well as the age of the driver. The results of her regression
analysis are listed below.


SUMMARY OUTPUT


Regression Statistics
Multiple R          0.187
R Square            0.035
Adjusted R Square   0.026
Standard Error      5652.090   (what we call s_e)
Observations        115.000

ANOVA
             Df    SS               MS               F       Significance F
Regression   1     130433116.219    130433116.219    4.083   0.046
Residual     113   3609911959.868   31946123.539
Total        114   3740345076.087

             Coefficients   Standard Error   t Stat   P-value
Intercept    10725.802      1535.215         6.987    0.000
Age          69.964         34.625           2.021    0.046


35) How would you best explain the y-intercept in this situation?

A) For each additional 1-year increase in the age of the driver, we would expect damage
to increase by $10,726.
B) For each additional 1-year increase in the age of the driver, we would expect damage
to increase by $70.
C) It makes no sense to explain the intercept in this situation, since we can not have
a driver with age of zero.
D) The average amount of damage was $10,726.


36) On average, what would be the dollar value of an accident involving a 25-year-old
driver?

A) $11,836.56
B) $10,795.47
C) $13,372.58
D) $12,474.90
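
The prediction comes from plugging age = 25 into the fitted line, damage = intercept + slope * age, using the coefficients from the regression output. A minimal sketch; the function name predicted_damage is illustrative.

```python
# Coefficients taken from the regression output above.
intercept = 10725.802
slope_age = 69.964

def predicted_damage(age):
    """Fitted dollar value of damage for a driver of the given age."""
    return intercept + slope_age * age

prediction = predicted_damage(25)
print(round(prediction, 2))
```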




37) The residual is defined as the difference between the:

A) actual value of y and the estimated value of y
B) actual value of x and the estimated value of x
C) actual value of y and the estimated value of x
D) actual value of x and the estimated value of y


Use the following graph for the next question



38) The value of a to the nearest hundredth is

a) 0.973
b) 0.975
c) 1.36
d) 2
e) 2.1
f) None of the above

39) A teacher determined that the class average on an exam is 65 and the standard deviation
is 7. After the class completes a review assignment, he adjusts the marks by adding 10 to
each exam. An analysis of the two sets of marks showed that

a) The standard deviation decreased while the mean increased
b) The standard deviation increased and the mean increased
c) The standard deviation stayed the same while the mean increased
d) The standard deviation and the mean stayed the same.


40) The number of hours for which a lightbulb works before it burns out is normally distributed
with mean 5000 hours and standard deviation 200 hours. What is the probability that a lightbulb
burns out in under 4800 hours?
a) 0.2345
b) 0.1587
c) 0.0124
d) 0.3245
e) None of the above


41) The number of hours for which a lightbulb works before it burns out is normally distributed
with mean 5000 hours and standard deviation 200 hours. Suppose you have a room where you
install five of these lightbulbs at the same time. What is the probability that no lightbulbs burn
out before 4800 hours (use your answer from the question above)?

a) .0001
b) .0002
c) .0003
d) .1587
e) None of the above
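
A sketch covering the two lightbulb questions. It assumes the five bulbs fail independently and reads the second question as asking for the probability that none of the five burns out before 4800 hours. The helper normal_cdf is an illustrative name; it uses math.erf from the Python standard library.

```python
import math

def normal_cdf(x, mean, sd):
    """P(X < x) for a normal random variable with the given mean and sd."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

# Single bulb: P(lifetime < 4800), i.e. P(Z < -1).
p_early = normal_cdf(4800, mean=5000, sd=200)

# Five independent bulbs: each must survive past 4800 hours.
p_none_early = (1 - p_early) ** 5

print(round(p_early, 4), round(p_none_early, 4))
```

If the bulbs were not independent (for example, if they shared a faulty circuit), the product rule in the last step would not apply.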


42) To determine the reliability of experts used in interpreting the results of polygraph
examinations in criminal investigations, 280 cases were studied. The results were:


If the hypotheses were Ho: suspect is innocent versus Ha: suspect is guilty, then
the probability of making a wrong decision is:

a. 0.05
b. .086
c. .067
d. .032
e. None of the above

43) In a multiple regression analysis involving 40 observations and 5 independent variables,
SST = 350 and SSE = 30. The coefficient of determination (R^2) is:

a) .9408
b) .8571
c) .9143
d) .8529
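
The coefficient of determination follows from the sums of squares alone: R^2 = 1 - SSE/SST = SSR/SST. A minimal sketch with the numbers from the question; the variable names are illustrative.

```python
sst = 350   # total sum of squares
sse = 30    # error (residual) sum of squares

ssr = sst - sse              # regression sum of squares
r_squared = 1 - sse / sst    # same as ssr / sst

print(ssr, round(r_squared, 4))
```

The number of observations and predictors is not needed for R^2 itself; it would only matter for the adjusted R^2.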


(The next 2 questions are based on the following information.)

Sam Pull attended the Career Showcase at the University of Chicago. The following table depicts
the pdf of the random variable Y representing the number of job offers Sam receives.

y p(y)
0 0.07
1 0.53
2 0.26
3 0.11
4 0.03

44) The expected number of job offers Sam receives is

a) 1.07
b) 1.50
c) 1.93
d) 2.00
e) None of the above

45) Sam's parents give him $1000 for just trying to interview, plus $50 for each job offer he
receives. What is the expected amount of money Sam receives?

a) $2050
b) $1053.5
c) $1075
d) $1100
e) None of the above
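
Both job-offer questions can be checked with a short sketch: the expected number of offers is the probability-weighted sum of y, and the expected money uses linearity of expectation, E(1000 + 50Y) = 1000 + 50 E(Y). The names pmf, expected_offers and expected_money are illustrative.

```python
# Distribution of Y, the number of job offers, from the table above.
pmf = {0: 0.07, 1: 0.53, 2: 0.26, 3: 0.11, 4: 0.03}

expected_offers = sum(y * p for y, p in pmf.items())

# Linearity of expectation: fixed $1000 plus $50 per offer.
expected_money = 1000 + 50 * expected_offers

print(round(expected_offers, 2), round(expected_money, 2))
```
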
Short Answer (points as marked)

1. (28 points, 7 points each) At a semiconductor plant, 60% of the workers are skilled and
80% of the workers are full-time. Ninety percent (90%) of the skilled workers are full-time.



a) What is the probability that an employee selected at random is a skilled full-time
employee ?



P(S) = 0.6 P(F) = 0.8 and P(F|S) = 0.90

Hence P(F and S) = (0.9)(0.6) = 0.54


b) What is the probability that an employee selected at random is a skilled worker or
a full-time worker ?


P(S or F) = P(S) + P(F) - P(S and F) = 0.6 + 0.8 - 0.54 = 0.86


c) What percentage of the full-time workers are skilled ?

P(S|F) = P(S and F)/P(F) = .54/.8 = .675


d) Is being a skilled worker and being a full-time worker independent or dependent
events ? (explain).


Dependent. P(S)P(F) = (0.6)(0.8) = 0.48, which does not equal P(S and F) = 0.54.
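
The four parts of this problem can be reproduced with a short sketch built on the multiplication rule, the addition rule, and the definition of conditional probability. The variable names are illustrative.

```python
import math

p_s = 0.6           # P(skilled)
p_f = 0.8           # P(full-time)
p_f_given_s = 0.9   # P(full-time | skilled)

p_s_and_f = p_f_given_s * p_s       # a) multiplication rule
p_s_or_f = p_s + p_f - p_s_and_f    # b) addition rule
p_s_given_f = p_s_and_f / p_f       # c) conditional probability

# d) independence would require P(S)P(F) = P(S and F)
independent = math.isclose(p_s * p_f, p_s_and_f)

print(round(p_s_and_f, 3), round(p_s_or_f, 3), round(p_s_given_f, 3), independent)
```
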
2. (48 points, 6 points each) We have data on the sales of 950 single-family homes in
Springfield, MA. We wish to explain and predict the price of a single-family home (the Y
variable, in thousands of dollars) using the following predictor variables:

Variable Name Description
s_p Sale price in dollars (response variable)
inv Sale date inventory of homes on market
bath Number of bathrooms
ltsz Lot size in acres
hssz Sq. ft. of living area
bsemt 1 if basement, 0 otherwise
a_c 1 if central a/c, 0 otherwise
f_place 1 if fireplace, 0 otherwise
garsz_a 1 if garage, 0 otherwise
dw 1 if dishwasher, 0 otherwise
dr 1 if dining room, 0 otherwise
fr 1 if family room, 0 otherwise
age5 1 if age <= 5 yrs, 0 otherwise
stl10 1 if 1 story house, 0 otherwise
bdrms Number of bedrooms



a. Besides the intercept, which variables are important in explaining selling price ?
(explain).

All variables with |t| > 1.96 or p-value < .05




b. If you were doing backward stepwise regression, which variable would be removed first ?
(explain)


dw; lowest absolute value of the test statistic.



Sometimes called s_e

c. How much more does a house with a fireplace go for (everything else being equal)?


$9019.883


d. Do you even need the dishwasher variable in the model ? Explain.

No, as the |t|<1.96 or pvalue>.05 so this variable is not significant.


e. What is the value of R^2?

R^2 = SSR/SST = 4.8714/8.0748 = 0.6032




f. Test the hypothesis H0: β_hssz = 10 versus Ha: β_hssz ≠ 10

Test statistic is t = (11.446 - 10)/2.311 = 0.625

Since 0.625 < 1.96 we fail to reject the null hypothesis.
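
A sketch of the same test in code, using the rounded coefficient and standard error quoted in the solution, so the last digit may differ slightly from a calculation on unrounded output. The two-sided 5% cutoff 1.96 is the one used above; the variable names are illustrative.

```python
coef_hssz = 11.446   # estimated coefficient for hssz, as quoted in the solution
se_hssz = 2.311      # its standard error, as quoted in the solution
null_value = 10

t = (coef_hssz - null_value) / se_hssz

# Two-sided test at the 5% level with the large-sample cutoff 1.96.
reject = abs(t) > 1.96

print(round(t, 2), reject)
```

Note that the statistic is measured from the hypothesized value 10, not from 0, so it differs from the t Stat column that regression software reports for this coefficient.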






g. For the model as is, if we use it for predictions, how accurate would our predictions be
(put the appropriate units on your answer) ?

+/- 1.96 s_e = +/- 1.96*18509.72 = +/- $36279.06





h. Give a 95% confidence interval for the intercept in the model.


-2606.02 +/- 1.96*(5166.2554)