, 2013
License: Unless otherwise noted, this material is made available under the terms of the Creative
Commons Attribution-NonCommercial-Share Alike 3.0 Unported License:
http://creativecommons.org/licenses/by-nc-sa/3.0/
The University of Michigan Open.Michigan initiative has reviewed this material in accordance with U.S.
Copyright Law and has tried to maximize your ability to use, share, and adapt it. The attribution key
provides information about how you may share and adapt this material.
Copyright holders of content included in this material should contact open.michigan@umich.edu with any
questions, corrections, or clarification regarding the use of content.
For more information about how to attribute these materials visit:
http://open.umich.edu/education/about/terms-of-use. Some materials are used with permission from the
copyright holders. You may need to obtain new permission to use those materials for other uses. This
includes all content from:
Mind on Statistics
Utts/Heckard, 4th Edition, Cengage L, 2012
Text Only: ISBN 9781285135984
Bundled version: ISBN 9780538733489
SPSS and its associated programs are trademarks of SPSS Inc. for its proprietary
computer software. Other product names mentioned in this resource are used for identification purposes
only and may be trademarks of their respective companies.
Attribution Key
For more information see: http://open.umich.edu/wiki/AttributionPolicy
Content the copyright holder, author, or law permits you to use, share and adapt:
Creative Commons Attribution-NonCommercial-Share Alike License
Public Domain Self Dedicated: Works that a copyright holder
has dedicated to the public domain.
Make Your Own Assessment
Content Open.Michigan believes can be used, shared, and adapted because it is ineligible for copyright.
Public Domain Ineligible. Works that are ineligible for
copyright protection in the U.S. (17 USC 102(b)) *laws in your
jurisdiction may differ.
Content Open.Michigan has used under a Fair Use determination
Fair Use: Use of works that is determined to be Fair consistent
with the U.S. Copyright Act (17 USC 107) *laws in your jurisdiction may differ.
Our determination DOES NOT mean that all uses of this third-party content are Fair Uses and we DO
NOT guarantee that your use of the content is Fair. To use this content you should conduct your own
independent analysis to determine whether or not your use will be Fair.
Statistics 250 Lab Workbook
Fall 2013 - Winter 2014
Weekly Labs, In-Lab Projects, Supplements, and Old Exams for Review
Used in all lab sections of Stat 250
Note to Students
Welcome to Statistics 250 at the University of Michigan! This lab
workbook is designed for you to use in lab and as extra
preparation for exams. In the workbook, you will find the following
materials:
Supplemental Material: great summaries for reference throughout the term:
1. SPSS Commands Reference
2. Notation Sheet
3. Name That Scenario
4. Editing Charts in SPSS
5. Important Notes for Hypothesis Testing
6. Interpretation Examples
7. Summary of T-tests and Name That Scenario Practice for Means
8. Regression Output in SPSS
Weekly Labs (numbered 1 to 13): each lab contains the following parts:
o Lab Background: objective and brief overview material, which is good to take a couple of minutes to read before you come to lab each week.
o Warm-Up Activity: quick questions for you to do before the In-Lab Project, usually a quick review of concepts you have seen in lecture.
o ILP (In-Lab Project): one or more activities that you will work on in lab, in groups. A copy of the ILP will be provided to each group for turning in at the end of the lab period.
o Cool-Down Activity: questions for you to do after the ILP for further reflection and application of the concepts covered in the ILP.
o Example Exam Questions: old exam questions on the lab topic(s) for additional practice.
Old Exams: complete sets of actual old exams for studying. Be sure to refer to CTools to see if any problems on these old exams are not relevant for your particular upcoming exam (due to differences in the semester schedule). This information, in addition to solutions, will be posted on CTools in the Review Info folder under the Resources tab closer to each exam date.
The Labs are designed to be interactive and to provide you with a
complete example for each concept.
Completing the
Summary Measures (Population Notation | Sample Notation | Notation used in SPSS)
Mean: μ (read as mu) | x̄ (x-bar) | Mean
Proportion: p | p̂ (p-hat)
Standard deviation: σ (sigma) | s | Std. Deviation
Variance: σ² | s² | Variance
Sample size: n (SPSS: N)

Confidence Intervals
Multipliers: z* (z-star), t* (t-star)
Margin of error: m, m.e.

Hypothesis Testing
Test statistics: z, t, F, χ² (chi-square) (SPSS: t, F, Chi-square)
Note: t, F, and χ² statistics have degrees of freedom (abbreviated df) associated with them. Look for these on your Formula Card.
Significance level: α (alpha)
p-value: p-value (SPSS: Sig.)

Analysis of Variance (abbreviated ANOVA)
Sum of squares for groups: SSG (SPSS: Between Groups row, in the column labeled Sum of Squares)
Sum of squares for error: SSE (SPSS: Within Groups row, in the column labeled Sum of Squares)
Mean square for groups: MSG (SPSS: Between Groups row, in the column labeled Mean Square)
Mean square error: MSE (SPSS: Within Groups row, in the column labeled Mean Square)

Regression (Population Notation | Sample Notation | Notation used in SPSS)
Response (dependent) variable: y | y | given by name of y-variable
Predicted (estimated) response: E(y) (expected value of y) | ŷ (y-hat)
Explanatory (independent) variable: x | x | given by name of x-variable
y-intercept: β₀ (beta-not) | b₀ | B (look in the row labeled (Constant))
Slope: β₁ (beta-one) | b₁ | B (look in the row labeled with the name of the x-variable)
Coefficient of correlation: r (SPSS: R, Beta)
Coefficient of determination: r² (SPSS: R Square)
Error terms vs. residuals: ε (error terms) | e (residuals) | Unstandardized residuals
How many variables are there? (One / Two)
What type of variable(s)? (Categorical / Quantitative)
Number of populations? (One / Two / More than two)

One categorical variable
o One population: 1-sample inference for a population proportion (p) (Labs 5 and 6); Chi-square: Goodness of Fit (Lab 13)
o Two populations: 2 independent samples inference for the difference between 2 population proportions (p₁ − p₂) (Lab 6); Chi-square: Homogeneity (Lab 13)
o More than two populations: Chi-square: Homogeneity (Lab 13)
One quantitative variable
o One population: 1-sample inference for a population mean (μ) (Lab 8)
o Paired samples: inference for a population mean difference (μD) (Lab 9)
o Two populations: 2 independent samples inference for the difference between 2 population means (μ₁ − μ₂) (Lab 10)
o More than two populations: ANOVA (μi, where there is one μi for each population) (Lab 11)
Two categorical variables (relationship): Chi-square: Independence (Lab 13)
Two quantitative variables (relationship): Regression (β₁) (Lab 12)
For boxplots, if there are any points denoted as outliers, you can
identify them by looking at their case label number in the default
output. The Chart Editor provides a special mode for identifying
individual cases whose data labels you want to display. This is the
data label mode, and when you are in data label mode, you can't
change anything else in the chart. From the menus, choose
Elements> Data Label Mode. The cursor changes shape to
indicate that you are in data label mode. Click the data element
for which you want to display the case label.
If there are
overlapping data elements in the spot that you click, the Chart
Editor displays the Select Data Element to Label dialog box.
This dialog allows you to select the specific data element or
elements for which you want to display data labels. The Chart
Editor displays the data label in a default position related to the
data element. When you are finished choosing data elements,
from the menus choose Elements> Data Label Mode again, and
the cursor changes back to the arrow to indicate that you are no
longer in Data Label Mode.
The Options menu lets you customize your chart further. You may
add a title or text box from this menu. Text boxes can appear
anywhere in a chart.
From the Chart Editor menus, select
Options> Text Box or Options> Title depending on which you
want.
For titles, the Chart Editor creates the title box and automatically positions it in the top center of the chart. Type the text and press Enter when you are finished typing. To enter line breaks, press Shift+Enter. If necessary, use the Text tab to format the text. For text boxes, you can drag and drop to reposition them. You may need to resize the graph so the text box will not cover up part of the graph. You can also copy the plots into MS Word or another text editor and then type in your name and title within the document.
Saving Output Boxes and Graphs
Images and other output from SPSS can often be copied and then pasted into a document by selecting the desired output, right-clicking, and choosing Copy.
1. To save an output box, such as a table of descriptive
statistics, first have the location where you would like to store
the output open. Then right-click on the output table and
select Copy. The table can then be pasted into a document or
text-field (such as those in your PreLab assignments or
homework). If you are pasting into a Word document and if the
output does not appear to format correctly, it may be a good
idea to choose Paste Special and paste as an image. Your
Supplement 6: Interpretation
Examples
In 1980, Bausch and Lomb Corporation developed a new type of extended-life contact lens made of silicone, which it claimed had a useful life of more than 4 years. During the research and development period, a random sample of 6 contact wearers was asked to wear the new contact lenses and record how long they lasted. The average useful life of the six pairs of lenses was 4.6 years, with a standard deviation of 0.49 years.
a. Interpretation of the Standard Deviation:
The average distance of the observed useful lives of these
lenses from their mean useful life of 4.6 years is about 0.49
years.
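b. $SE(\bar{x}) = \dfrac{s}{\sqrt{n}} = \dfrac{0.49}{\sqrt{6}} \approx 0.200$
population mean useful life is more than 4 years.
$t = \dfrac{\bar{x} - \mu_0}{SE(\bar{x})} = \dfrac{4.6 - 4}{0.200} = 3.00$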
The p-value for this test is the probability of getting a t-test statistic at least as extreme as the observed test statistic, assuming the null hypothesis is true.
NOTE: These interpretations can be extended to any test and confidence interval, adjusting for the different parameters, different directions of extreme, different test statistics, etc.
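The contact-lens numbers above can be reproduced with a short Python sketch that uses only the summary values given (n = 6, mean 4.6 years, s = 0.49 years, test value 4 years). The p-value assumes the one-sided alternative "mean useful life is more than 4 years" and is approximated by numerically integrating the t density with Simpson's rule; that integration helper (`t_upper_tail`) is a hypothetical stand-in for a t-table or SPSS, used here only to stay within the standard library.

```python
import math

def t_upper_tail(t: float, df: int, upper: float = 200.0, steps: int = 200_000) -> float:
    """Approximate P(T >= t) for Student's t with `df` degrees of freedom.

    Integrates the t density from t to `upper` with Simpson's rule; the tail
    beyond `upper` is negligible for this example. `steps` must be even.
    """
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x: float) -> float:
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    h = (upper - t) / steps
    total = pdf(t) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(t + i * h)
    return total * h / 3

# Summary values from the contact-lens example
n, x_bar, s, mu0 = 6, 4.6, 0.49, 4.0

se = s / math.sqrt(n)                      # SE(x-bar), about 0.200
t_stat = (x_bar - mu0) / se                # about 3.00
p_value = t_upper_tail(t_stat, df=n - 1)   # one-sided: Ha says mean life > 4 years

print(f"SE = {se:.3f}, t = {t_stat:.2f}, df = {n - 1}, p-value = {p_value:.4f}")
```

In the lab, SPSS reports this p-value directly; the sketch only shows where each number comes from.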
Having just reviewed the three main t-test inference scenarios, you
should understand the testing procedures and be able to interpret the
results of a test. However, it is important to know when each scenario
applies. Read each of the following inference scenarios and determine
which of the three t-test procedures would be most appropriate: the one-sample t-test, the paired t-test, or the two-independent-samples t-test.
1. A researcher is studying the effect of a new teaching technique for
middle school students. One class of 30 students is taught using the
new technique and their mean score on a standardized test is
compared to the mean score of another class of 27 students who
were taught using the old technique.
2. A company claims that the economy size version of their product contains 32 ounces.
A consumer group decides to test the claim by examining a random
sample of 100 economy size boxes of the product, since they have
received reports that the boxes contain less than the 32 ounces
claimed.
3. At some universities, athletic departments have come under fire for
low academic achievement among their athletes. An athletic director
decides to test whether or not athletes do in fact have lower GPAs. A
random sample of 200 student athletes and a random sample of 500
non-athlete students are taken and their GPAs are recorded.
4. As part of a biology project, some high school students compare heart
rates of 40 of their classmates before and after running a mile. They
want to see if the heart rate of students their age is faster after
running a mile than before, on average.
5. A hospital is studying patient costs; they decide to follow 500 surgery patients' hospital and medical bills for a year after surgery and compare them to the estimated costs provided to the patients before surgery. They want to see if the estimated and actual costs are comparable on average.
There are four parts to the default regression output. Use the scroll bar at the right edge of the Output Window to scroll up to the top of the regression output. The first section just reminds you which variable was entered as the explanatory x variable; for this example, the explanatory variable is DNA.
The second section has the heading Model Summary. The Model
Summary starts with the correlation between the two variables, R,
which is the absolute value of the correlation coefficient, r.
You need to look at the sign of the slope of the regression line to
determine if you need to put a minus sign in front of this value to
correctly report the correlation coefficient. (The actual value of
the correlation coefficient is also reported in the last section of
regression output, under the column heading Beta.)
The
correlation coefficient measures the strength of the linear
association between the two variables. The closer it is to +1 or -1,
the stronger the linear association. The square of the correlation,
the R Square quantity, has a useful interpretation in regression.
It is often called the coefficient of determination and measures
the proportion of the variation in the response that can be
explained by the linear regression of y on x. Thus, it is a measure
of how well the linear regression model fits the data. The Std. Error of the Estimate gives the value of s, the estimate of σ, the population standard deviation of the error terms (the spread of y around the population regression line).
Model Summary
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .856 | .732 | .699 | 4.851
The third part of the output contains the ANOVA table for
regression, used for assessing if the slope is significantly
Model 1 | Sum of Squares | df | Mean Square | F | Sig.
Regression | 515.141 | 1 | 515.141 | 21.894 | .002
Residual | 188.228 | 8 | 23.528 | |
Total | 703.369 | 9 | | |
Model 1 | Unstandardized Coefficients: B | Unstandardized Coefficients: Std. Error | Standardized Coefficients: Beta | t | Sig.
(Constant) | -.548 | 8.193 | | -.067 | .948
DNA | .167 | .036 | .856 | 4.679 | .002
The t-statistic for the slope, in the second row, is a test of the significance of the model with x versus the model without x, that is, for testing $H_0: \beta_1 = 0$ versus $H_a: \beta_1 \neq 0$. The t-statistic for the y-intercept, in the first row, is a test of whether the y-intercept ($\beta_0$) is different from zero. This test is not often of interest unless a value of 0 for the y-intercept is meaningful and of interest. For example, if x = amount of soap used and y = height of the suds, then an intercept value of 0 is meaningful, as no soap would lead to no suds. The column labeled Sig. gives the two-sided p-value for the corresponding hypothesis test.
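The slope row of the Coefficients table can be checked in a few lines of Python, as a sketch under two stated assumptions: t* = 2.306 is the 95% multiplier for df = 8 read from a t-table, and the rounded B and Std. Error give a t slightly different from the printed 4.679 (SPSS uses unrounded values). In simple regression the slope's t squared equals the ANOVA F, and the Std. Error column feeds the confidence interval described in the next paragraph.

```python
# Values from the Coefficients and ANOVA tables above (x-variable DNA)
b1, se_b1 = 0.167, 0.036     # B and Std. Error in the DNA row
t_reported = 4.679           # t printed by SPSS (computed from unrounded B and SE)
f_reported = 21.894          # F from the ANOVA table
df_residual = 8              # Residual df = n - 2

# Rounded table values give a t close to, but not exactly, the printed one
t_from_table = b1 / se_b1    # about 4.64
# In simple regression, the slope's t squared equals the ANOVA F
t_squared = t_reported ** 2  # about 21.893, matching F up to rounding

# 95% confidence interval for beta1: b1 +/- t* x SE(b1)
t_star = 2.306               # t* for 95% confidence with df = 8, read from a t-table
margin = t_star * se_b1
ci = (b1 - margin, b1 + margin)

print(f"t from rounded values = {t_from_table:.2f}, t^2 = {t_squared:.3f}, F = {f_reported}")
print(f"95% CI for the slope: ({ci[0]:.3f}, {ci[1]:.3f})")
```

Since the interval excludes 0, it agrees with the small Sig. value (.002) for the slope.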
SPSS also provides the information to calculate confidence
intervals for the parameter estimates. The column labeled Std.
Error provides standard errors (estimated standard deviations)
of the parameter estimates and is the quantity that is multiplied
by the appropriate t* value in computing the half-width of the
confidence interval. Recall that you can request SPSS to produce
Warm-Up: Categorical and Quantitative Variables
Take a few minutes to recall the two types of variables that have
been introduced in class: categorical and quantitative. For each of
the following variables, determine whether it is a categorical or
quantitative variable. Recall that numerical summaries such as
mean and median can only be computed for quantitative
variables.
Cell Phone Model (iPhone, Android): Categorical / Quantitative
Number of countries visited in a year: Categorical / Quantitative
Credit Hours enrolled per semester: Categorical / Quantitative
Eye color: Categorical / Quantitative
This web site contains a Java applet that will help you
understand the relationship between the mean and the
median.
findings here.
You have seen that the mean is more sensitive to outliers than the
median. For a data set that contains several outliers, which
measure of center would you choose to report? What measure of
spread? Explain.
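To see the sensitivity to outliers numerically, here is a small Python sketch on hypothetical data (not from the lab): nine typical values, then the same values with one extreme outlier added. It prints the mean, median, standard deviation, and IQR for each version. Python's default quartile rule may differ slightly from SPSS.

```python
import statistics

# Hypothetical data: nine typical values, then the same values plus one extreme outlier
values = [12, 13, 13, 14, 15, 15, 16, 17, 18]
with_outlier = values + [95]

for label, data in [("without outlier", values), ("with outlier", with_outlier)]:
    q1, _, q3 = statistics.quantiles(data, n=4)
    print(f"{label}: mean = {statistics.mean(data):.1f}, median = {statistics.median(data)}, "
          f"SD = {statistics.stdev(data):.1f}, IQR = {q3 - q1:.1f}")
```

The median stays at 15 while the mean jumps, and the SD grows far more than the IQR.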
You are the manager of a local grocery store who is put in charge
of setting the prices for your stock. You will determine the prices
for each product by examining the prices of your competitors in
the neighborhood. Suppose your neighborhood consists mainly of
chain store supermarkets along with 2 high-end grocery stores.
You want to set your prices low enough to attract customers but
high enough so you will make a profit. How would you use these
measures of center to help you determine the prices?
Example Exam Question on Mean and Median
b.
3.01
FALSE
Warm-Up: Matching
____ iii.
Mean
categorical variable
C.
____ iv.
Median
quantitative variable
Examine
distribution
of
D. Examine distribution of a
E. Measure of spread
____ vi.
outliers
IQR
Standard Deviation:
Median:
Q1:
Q3:
IQR: Q3-Q1 =
Min:
Max:
Range: Max-Min =
9. To obtain numerical summaries or any graph (except boxplots)
for current salary by minority status, we need to split the data
file. Use Data> Split File and choose Organize output by
groups.
Doing this successfully does not produce any
noticeable changes in the SPSS windows; there will only be two
short lines of output in the Output window confirming the split.
The grouping variable is minority classification.
Obtain
descriptive statistics for current salary by minority status (once
the data is split, just generate descriptive statistics). List some
of your findings below.
Minority:
Non-Minority:
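Splitting the file and then requesting descriptives amounts to computing the same summaries once per group. A Python sketch of that idea on hypothetical (salary, minority status) records, which are NOT the lab's data file, also covers the worksheet quantities above (IQR = Q3 - Q1, Range = Max - Min). The quartiles use Python's "exclusive" rule; SPSS output may differ slightly depending on its quartile definition.

```python
import statistics

# Hypothetical records (salary, minority status); these are NOT the lab's data file
records = [
    (28000, "Minority"), (31000, "Minority"), (26500, "Minority"), (35000, "Minority"),
    (30000, "Non-Minority"), (42000, "Non-Minority"), (38500, "Non-Minority"),
    (57000, "Non-Minority"), (33000, "Non-Minority"),
]

# Like Data > Split File: collect the salaries for each group separately
groups: dict[str, list[float]] = {}
for salary, status in records:
    groups.setdefault(status, []).append(salary)

def describe(data: list[float]) -> dict[str, float]:
    """Numerical summaries requested in the lab, for one group."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")
    return {
        "mean": statistics.mean(data),
        "sd": statistics.stdev(data),
        "median": median,
        "Q1": q1,
        "Q3": q3,
        "IQR": q3 - q1,                  # IQR = Q3 - Q1
        "min": min(data),
        "max": max(data),
        "range": max(data) - min(data),  # Range = Max - Min
    }

for status, salaries in groups.items():
    print(status, {name: round(value, 1) for name, value in describe(salaries).items()})
```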
group, you need to go back to the Split File dialog box and
choose Analyze all cases, do not create groups. Again, the
only change will be one line of output confirming the data has
been un-split.
11. Create side-by-side boxplots for current salary. The data file
should NOT be split to create these. Use Graphs> Legacy
Dialogs> Boxplot with Simple and Summaries for groups
of cases. Minority Status is the variable for the category axis,
and current salary is the variable.
How does the distribution for current salary compare for
minorities versus non-minorities (based on the side-by-side
boxplots, histograms, and descriptives)?
Cool-Down: Check Your Understanding about Std Dev
Incorrect
because
___________________________
_____________________________________________________________
Incorrect
mean
height
by
because
___________________________
_____________________________________________________________
3. The average distance between the height values is roughly
2 inches.
Correct
Incorrect
because
___________________________
_____________________________________________________________
b. A student provided the following incorrect interpretation of
standard deviation.
68% of the height values are within 2 inches of the mean
height
Why is this interpretation incorrect in general? What graph of
the height data would you make to check if the statement
could be correct? What would you look for in the graph?
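The "68% within one standard deviation" figure comes from the Empirical Rule, which holds approximately only for bell-shaped data. A Python sketch with a small hypothetical, strongly right-skewed data set shows a very different fraction.

```python
import statistics

# Hypothetical, strongly right-skewed data (not bell-shaped)
data = [1, 1, 1, 1, 1, 1, 1, 1, 1, 10]
mean = statistics.mean(data)   # 1.9
sd = statistics.stdev(data)    # sample SD, about 2.846

# Fraction of observations within one SD of the mean
within_one_sd = sum(1 for x in data if abs(x - mean) <= sd) / len(data)
print(f"mean = {mean:.2f}, SD = {sd:.3f}, fraction within 1 SD = {within_one_sd:.0%}")
```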
[Figure: Experiment, side-by-side boxplots of Grades for the No and Yes groups]
What is (approx.) the IQR for the grade scores of children who
do eat breakfast?
___________ points
this sentence.
The highest grade scored by one of the children not eating
breakfast is (approx) equal to the ______________________ for the
children who do eat breakfast.
h. True or false: The symmetry in the boxplot for the children not eating breakfast implies that the histogram of the same data is also symmetric.
Circle one: True
False
Explain:
Below are two sequence plots; in the first plot the observations
appear to support that the underlying process that generated the
observations is stable, but that is not the case for the observations
in the second plot on the right. In this case, there appears to be
an increasing trend, thus the underlying process does not appear
to be stable; the observations should not be considered a random
sample.
Q-Q Plots: Later in this class, we will see that the assumption of
a normal model for a population of responses will be
needed in order to perform certain inference procedures.
Previously, we have seen that a histogram can be used to get an
idea of the shape of a distribution. However, there are more
sensitive tools for checking whether the shape is close to a normal
(bell-shaped) model.
A particularly useful plot for checking normality is the QQ Plot, which is a plot of the percentiles (or quantiles) of a standard normal distribution against the corresponding percentiles of the observed data. If the observations follow an approximately normal distribution, the resulting plot should be roughly a straight line with a positive slope. Deviations from this indicate possible departures from a normal distribution.
At the right is an example of a Q-Q Plot showing strong support that the data come from a population with an approximately normal distribution.
The Q-Q plot on the left indicates the existence of two clusters of
observations. The Q-Q plot in the center shows an example where
the shape of the distribution appears to be skewed right. The Q-Q
plot on the right shows evidence of an underlying distribution that
has shorter tails compared to those of a normal distribution.
Note: It is only important that you can see the departures in the
above graphs and not as important to know if the departure
implies skewed left versus skewed right and so on. A histogram
would allow you to see the shape and type of departure from
normality.
Finally, we consider an example Q-Q plot (shown at the right) that appears normal with the exception of one data point. In this case, we would say the Q-Q plot shows evidence of an underlying distribution which is approximately normal except for one large outlier that should be further investigated. Note that outliers could appear in either the upper or lower tail.
identically distributed
normal ,
should not
In this first part of the In-Lab Project, you will look at more
examples of time plots to help learn how to better read such
graphs for assessing whether our data appear to be stable and
support the random sample condition.
Task 1: Go to the Stat 250 Prelab Site and find the Time Series
tab along the top. Download the timeseries.rdata file script you will
be using in this part of the ILP. When you double click on this
script file it should open up the R program (on all campus
machines, and you can download R to your computer free too).
1. Begin the program by entering the following command.
timeseries()
Task 2: Go to the Stat 250 Prelab Site and find the QQ Plot tab
along the top. Download the qqplot.rdata file script you will be
using in this part of the ILP. When you double click on this script
file it should open up the R program (on all campus machines, and
you can download R to your computer free too).
1. Begin the program by entering the following command.
qqplot()
2. Select your sample size by entering a number between 1 and
10000.
3. Select the type of distribution from which you would like an example QQ plot to be generated.
4. Once your QQ plots and the corresponding histograms have been created, you will be asked if you want to save your plots to the desktop as an image. Answer, and you will then be asked to select another sample size. Try creating QQ plots for many different distributions and sample sizes.
5. Sketch below one of the resulting QQ plots and Histograms for
a sample of 1000 observations from a skewed right distribution.
QQ plot:
Histogram:
1. Create a histogram and a Q-Q plot for the IQ values. Q-Q plots
are created via Analyze> Descriptive Statistics> Q-Q plots.
Provide rough sketches.
Histogram:
QQ Plot:
[Figure panels: Histogram A, Histogram B, Histogram (label incomplete); Boxplot A, Boxplot B, Boxplot C; QQ Plot A, QQ Plot B]
[Figure: QQ Plot; sequence plot of PHSLEV against Sequence number]
Formula Card
X has a
___________________________ distribution
X has a
____________________________ distribution
P(smiled
given
male)
P(male given smiled)
ii. Find the probability selected
appropriate conclusion.
P(male)
above
and
circle
the
b. Julie is one of his current students and she studies statistics for
about 6 hours per week. What is her corresponding z-score?
ii
True
False
Suppose that the amount of time spent waiting for your bus to campus each day is a uniform random variable between 0 and 20 minutes.
a. Sketch a picture of the model for waiting time for the bus.
Provide labels for each axis and some values along each axis.
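For a uniform model, probabilities are rectangle areas under a flat density of height 1/(b - a). A Python sketch for the 0 to 20 minute waiting time follows; the "more than 15 minutes" question is an illustrative example, not one of the workbook's parts.

```python
# Waiting time X is uniform between a = 0 and b = 20 minutes (from the problem)
a, b = 0.0, 20.0
height = 1 / (b - a)   # flat density: 1/20 = 0.05 per minute, so the total area is 1

def prob_between(lo: float, hi: float) -> float:
    """P(lo < X < hi): the area of the rectangle over [lo, hi], clipped to [a, b]."""
    lo, hi = max(lo, a), min(hi, b)
    return max(hi - lo, 0.0) * height

mean_wait = (a + b) / 2   # center of the interval, 10 minutes

# Illustrative question (not one of the workbook's parts): wait more than 15 minutes?
print(f"height = {height}, P(X > 15) = {prob_between(15, 20):.2f}, mean = {mean_wait}")
```

For the sketch in part a, the density height of 0.05 is the value to mark on the vertical axis.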
Final answer:
________________
f.
Circle one:
Not Correct
2. The 95% confidence level means that with this method, and for
Not Correct
ILP: Calculating CI for Population Proportion
__.
Cool-Down: More Interpretations of Confidence Levels
Not Correct
Not Correct
3.
4.
Not Correct
5.
Not Correct
The data has a 95% confidence level that the true proportion
of families that ate dinner together last Sunday night is within
the calculated confidence interval of (0.56, 0.70).
Correct
6.
Not Correct
Not Correct
Population
standard error?
i. If repeated samples of 100 alumni were obtained, we would
estimate the resulting sample proportions to be about 0.046
from the true population proportion p on average.
ii. If repeated samples of 100 alumni were obtained, we would
estimate the sample proportion of 0.70 to differ from the
true population proportion p by about 0.046.
iii. We would estimate that our sample proportion of 0.70 is
about 0.046 away from the true population proportion p on
average.
2. Construct a CI for population proportion p and Find the
Sample Size.
a.
Give a (general) 95% confidence interval estimate for the
population proportion of alumni in favor of firing the coach.
b.
c.
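Using the values implied by the standard-error statements above (sample proportion 0.70 from n = 100 alumni), a Python sketch computes the standard error, a 95% confidence interval with z* = 1.96, and a required sample size from $n = (z^*/m)^2 \, p^*(1 - p^*)$. The conservative guess p* = 0.5 and the target margin of error of 0.03 are hypothetical choices for illustration.

```python
import math

def ci_proportion(p_hat: float, n: int, z_star: float = 1.96) -> tuple[float, float]:
    """Confidence interval p-hat +/- z* x SE(p-hat)."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)      # standard error of p-hat
    return p_hat - z_star * se, p_hat + z_star * se

def sample_size(margin: float, z_star: float = 1.96, p_star: float = 0.5) -> int:
    """Smallest n whose margin of error z* x sqrt(p*(1 - p*)/n) is at most `margin`."""
    return math.ceil((z_star / margin) ** 2 * p_star * (1 - p_star))

se = math.sqrt(0.70 * 0.30 / 100)    # about 0.046, as in the statements above
low, high = ci_proportion(0.70, 100)
print(f"SE = {se:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
print(sample_size(0.03))             # hypothetical target margin of error of 0.03
```

Rounding the sample size up (never down) guarantees the margin of error is no larger than the target.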
f.
(uses > or <; called a one-sided test). Notice that we never test
for equality in Ha.
The purpose of a significance test is to assess whether or not the
observed data are consistent with the null hypothesis (within the
reasonable bounds of sampling variability). If the data seem
unlikely to occur if the null hypothesis is assumed true, then we
would reject the statement made in the null hypothesis. To help us
make this decision, we use a test statistic, which represents a
summary of the data. More specifically, a test statistic is a
random variable related to the hypotheses of interest that has a
known probability distribution (under the null hypothesis), and will
be examined for evidence in favor of or against H0.
In hypothesis testing, another frequently reported value is the p-value, a number that is used to indicate the degree of significance
of the data. The p-value is the probability of getting a test
statistic as extreme or more extreme than the observed value of
the test statistic, assuming the null hypothesis is true. Here
extreme means in the direction of Ha, or providing more
evidence against H0 .
We must decide in advance how much evidence against H0 we will require for rejection. This designated amount of evidence is called the level of significance, denoted by α (alpha). Common values of α are 0.01, 0.05, and 0.10. If the p-value is less than or equal to α, we make the decision to reject H0. If we reject the null hypothesis, the results of the test are said to be statistically significant at level α. A significant result in the statistical sense does not necessarily imply an important result in the practical sense. It simply means that such a difference from the null hypothesis is not likely to happen just by chance.
Of course, no procedure is perfect, and as such, there are two types of errors possible during hypothesis testing. If the null hypothesis is true but the decision is to reject H0, then we say that a Type I error has occurred. A Type II error occurs when the alternative hypothesis is true, but we fail to reject H0. Each type of error has a probability of occurring. If the null hypothesis is true, the level of significance, α, is also the probability of a Type I error, while the probability of a Type II error is denoted by β.
Another important component of a hypothesis test is its power. The power of a test measures its ability to detect an alternative hypothesis when it is true. Power of a particular test is calculated as the probability that the test will reject H0 when the alternative hypothesis is true. Since we just learned that β is the probability that we do NOT reject H0 when Ha is true, we can see that power is represented by 1 − β.
Truth | Decision Made | Result | Associated Probability
H0 True | Reject H0 | Type I Error | α
H0 True | Do Not Reject H0 | Correct Decision | 1 − α
Ha True | Reject H0 | Correct Decision | 1 − β = power
Ha True | Do Not Reject H0 | Type II Error | β
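The relationship between α, β, and power can be made concrete with a Python sketch for one case: a one-sided z-test for a proportion, H0: p = p0 versus Ha: p > p0, using the normal approximation for the sample proportion. The scenario values (p0 = 0.5, true p = 0.6, α = 0.05, and the sample sizes) are hypothetical. When the true p equals p0, the computed "power" equals α, the Type I error probability.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def power_greater(p0: float, p_true: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of the one-sided z-test of H0: p = p0 versus Ha: p > p0.

    Uses the normal approximation for p-hat both under H0 and at p_true.
    """
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    cutoff = p0 + z_alpha * math.sqrt(p0 * (1 - p0) / n)   # reject H0 when p-hat >= cutoff
    sd_true = math.sqrt(p_true * (1 - p_true) / n)         # SD of p-hat when p = p_true
    return 1 - STD_NORMAL.cdf((cutoff - p_true) / sd_true)

# Hypothetical scenario: H0: p = 0.5, true p = 0.6, alpha = 0.05
for n in (50, 100, 200):
    power = power_greater(0.5, 0.6, n)
    print(f"n = {n}: power = 1 - beta = {power:.3f}, Type II error beta = {1 - power:.3f}")
```

With α held fixed, increasing n raises the power, which is one way to reduce β.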
A Few Notes:
1. In practice we want to protect the status quo, so we are usually most concerned with Type I error. We set Type I error at a constant value by fixing the significance level α.
2. Most tests we describe have the smallest β for a given α.
3. For a fixed sample size n, there is a tradeoff between α and β.
$np_0 \ge 10$ and $n(1 - p_0) \ge 10$.
We also must
$z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$
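The z-test for a population proportion can be sketched in Python as follows, checking the conditions np0 ≥ 10 and n(1 - p0) ≥ 10 before computing the statistic. The function name and the data (60 successes out of 100, testing p0 = 0.5) are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist

def one_proportion_z_test(successes: int, n: int, p0: float, alternative: str = "greater"):
    """z-test for H0: p = p0, using the large-sample conditions from the text."""
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError("need n*p0 >= 10 and n*(1 - p0) >= 10")
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)   # SE uses p0, not p-hat
    std_normal = NormalDist()
    if alternative == "greater":        # Ha: p > p0
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":         # Ha: p < p0
        p_value = std_normal.cdf(z)
    else:                               # Ha: p != p0 (two-sided)
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p_value

# Hypothetical data: 60 successes in n = 100, testing H0: p = 0.5 vs Ha: p > 0.5
z, p_value = one_proportion_z_test(60, 100, 0.5)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

Note that the standard error in the z-statistic uses p0, because the test statistic is computed assuming H0 is true.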
Warm-Up: Stating the Hypotheses and Defining the Parameter of Interest
For each example below fill in the hypotheses and define the
parameter of interest.
1. The Detroit Tigers advertising team believes that about 70% of
Ann Arbor residents will attend a Tigers game this season.
Suppose the general manager wants to cut back on advertising
in the area and speculates that Ann Arbor residents are less
likely to attend a Detroit Tigers game this season than
previously assumed.
Ho:__________________________ Ha:___________________________
Let _
represent
represent
question.
What proportion of UM students____________________________?
2.
3. (a)
The estimate in part (2) is a sample proportion.
About how far away from the population proportion would you
expect such estimates to be, on average? (i.e. report the
standard error of the sample proportion)
enough,
calculate
the
Fail to Reject H0
Also write out your real world conclusion in the context of the
problem.
False
False
3. If the test of H0: p = 0.75 versus Ha: p > 0.75 resulted in a p-value of 0.083, then the probability that H0 is true is 0.083.
True
False
False
False
False
Example Exam Question on Hypothesis Testing for Population Proportion
b. Provide the value of the test statistic (including its symbol) and the corresponding p-value. Include a sketch to show how the p-value is found.
Sketch (include labels):
Symbol
Test Statistic =
=
_______________
p-value = _________________________
c. Give the decision and corresponding conclusion by circling the
appropriate statements:
The statistical decision at a 5% significance level is:
Reject Ho    Fail to reject Ho
Therefore, there (is / is not) sufficient evidence to say that a majority of all students (represented by this sample) would vote for Obama.
d. If this decision in part (c) is incorrect, it would be a (circle one) Type 1 error / Type 2 error.
(possible values of the statistic) vary and how close they tend to
be to the parameter value.
With a large number of samples, you can assess whether the value of the statistic (e.g., sample mean, x̄) will frequently be close to the true value of the population parameter (e.g., population mean, μ), and if so, how close on average. This can be seen more easily through some pictures:
Formula Card:
In this activity, you will observe the effects that sample size and the distribution of the population you are sampling from have on the sampling distribution of the sample mean. The sampling distribution of the sample mean, x̄, is the distribution of the sample mean values for all possible samples of the same size from the same population.
Open the Lab 7 Sampling Distribution applet from the Links
to Lab Applets folder on the Stat 250 CTools site (Resources
Lab Info folder).
Alternatively, the original applet can be found at:
http://onlinestatbook.com/stat_sim/sampling_dist/inde
x.html
1,000 Samples,
population.
or 10,000
Samples
this
n = 5:
n = 25:
d.
e.
f.
2. Clear the lower three graphs and then select the Skewed
distribution as a parent population.
a. Select Mean (sample mean) as the statistic of interest in
both the 3rd and 4th histograms, sample size n = 5 for the 3rd
histogram, and n = 25 for the 4th. Do about 5 animated
samples, and then take 10,000 samples at once. Draw
rough sketches of each of the distributions of the sample
means. Make sure to label both axes.
n = 5:
n = 25:
f.
If the parent population is NOT normal … of __________ and standard deviation of _____________.
The result in 1(a) is known as the Sampling Distribution of
the Sample Mean. The result in 1(b) is known as the Central
Limit Theorem. While you should note that there are several
similarities between them, make sure you can see and
understand the difference between the two results.
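The contrast between normal and skewed parent populations can also be reproduced outside the applet. This Python sketch uses an exponential population (mean 1, skewness 2) as a stand-in for the applet's "Skewed" option, which is an assumption, and compares the skewness of 20,000 simulated sample means for n = 5 and n = 25. Theory predicts a skewness of about 2/√n (roughly 0.89 and 0.40), so the distribution of x̄ moves toward normal as n grows, as the Central Limit Theorem describes.

```python
import random
import statistics


def sample_means(n, reps, rng):
    """Sample means of `reps` samples of size n from an exponential (mean 1) population."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]


def skewness(xs):
    """Population-style skewness: mean of cubed z-scores (0 for a symmetric distribution)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs, m)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)


rng = random.Random(42)
skew_n5 = skewness(sample_means(5, 20000, rng))
skew_n25 = skewness(sample_means(25, 20000, rng))
print(f"n = 5: skewness {skew_n5:.2f};  n = 25: skewness {skew_n25:.2f}")
```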
2. Fill out the chart below to further summarize your findings
regarding the sampling distribution of the sample mean based
on the CLT.
Sample Settings | Will the sampling distribution of the sample mean be approximately normal?
n = 10, Parent Population Normal |
n = 10, Parent Population NOT Normal |
Example Exam Question on the Sampling Distribution of the Sample Mean
c. The radio station can afford to pay for a total of 67,000 gallons.
What is the probability that the total number of gallons for a
random sample of 50 homes will exceed 67,000 gallons?
Hint: Think about how a total and an average are related.
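The hint can be turned into arithmetic: a total for 50 homes exceeds 67,000 gallons exactly when their average exceeds 67,000/50 = 1,340 gallons. The per-home mean and standard deviation are not shown in this excerpt, so the values below (μ = 1,300 and σ = 150 gallons) are hypothetical placeholders, and the sketch assumes the sample mean is approximately normal. It also checks that working with the total directly (mean nμ, standard deviation σ√n) gives the same answer.

```python
from statistics import NormalDist


def prob_total_exceeds(total_cutoff, n, mu, sigma):
    """P(sum of n values > total_cutoff), using the approximately normal sampling
    distribution of the sample mean: mean mu, standard deviation sigma / sqrt(n)."""
    xbar_cutoff = total_cutoff / n              # total > T  <=>  mean > T / n
    se_mean = sigma / n ** 0.5
    return 1 - NormalDist(mu, se_mean).cdf(xbar_cutoff)


def prob_total_exceeds_direct(total_cutoff, n, mu, sigma):
    """Same probability from the distribution of the total: mean n*mu, SD sigma*sqrt(n)."""
    return 1 - NormalDist(n * mu, sigma * n ** 0.5).cdf(total_cutoff)


# Hypothetical per-home values (the real mu and sigma are not shown in this excerpt).
mu, sigma, n = 1300, 150, 50
p_mean = prob_total_exceeds(67_000, n, mu, sigma)
p_total = prob_total_exceeds_direct(67_000, n, mu, sigma)
print(round(p_mean, 4), round(p_total, 4))
```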
$t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$
The test statistic tells us how many standard errors the sample
mean, x̄, is from the test value, μ0.
In hypothesis testing, another frequently reported value is the p-value, a number that is used to indicate the degree of significance
of the data. The p-value is the probability of getting a test statistic
as extreme as or more extreme than the observed value of the test
statistic, assuming the null hypothesis is true. Here "extreme"
means in the direction of Ha, or providing more evidence against
H0. Our test statistic, t, has a t-distribution with n − 1 degrees of
freedom (df), which is written as t(n − 1).
We must decide in advance how much evidence against H0 we will
require for rejection. This designated amount of evidence is called
the level of significance, denoted by α (alpha). Common values of α
are 0.01, 0.05, and 0.10. If the p-value is less than or equal to α,
we make the decision to reject H0. If we reject the null hypothesis,
the results of the test are said to be statistically significant at level
α.
A significant result in the statistical sense does not
necessarily imply an important result in the practical sense. It
simply means that such a difference from the null hypothesis is not
likely to happen just by chance.
Of course, no procedure is perfect, and as such, there are two
types of errors possible during hypothesis testing. If the null
hypothesis is true but the decision is to reject H0, then we say that
a Type I error has occurred. A Type II error occurs when the
alternative hypothesis is true, but we fail to reject H0. Each type of
error has a probability of occurring. If the null hypothesis is true,
the level of significance, α, is also the probability of a Type I error,
while the probability of a Type II error is denoted by β.
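The statement that α is the probability of a Type I error (when H0 is true) can be checked by simulation. This sketch generates data for which H0 really is true and counts how often a one-sided test at α = 0.05 rejects. To keep it short it uses a z-test with a known σ instead of the t-test, and the population values (μ0 = 50, σ = 10, n = 20) are hypothetical; the rejection rate should come out near 0.05.

```python
import random
import statistics
from statistics import NormalDist


def type_i_error_rate(alpha=0.05, mu0=50.0, sigma=10.0, n=20, reps=20000, seed=1):
    """Fraction of simulated samples (drawn with H0 true) for which H0 is rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)      # one-sided test, Ha: mu > mu0
    se = sigma / n ** 0.5
    rejections = 0
    for _ in range(reps):
        xbar = statistics.fmean(rng.gauss(mu0, sigma) for _ in range(n))  # H0 is true
        if (xbar - mu0) / se >= z_crit:            # equivalently, p-value <= alpha
            rejections += 1
    return rejections / reps


rate = type_i_error_rate()
print(f"Estimated Type I error rate: {rate:.3f}")   # should be close to alpha = 0.05
```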
Formula Card:
5%
10%
If the significance level were 5%, for which of the following p-values
would your result be statistically significant?
0.0067
0.043
0.22
0.14
in the
One
Two
More
than two
Two
Quantitative
[Figure: plots of SAL (axis labels: Observed Value; Sequence number)]
Test-Statistic:
e. Generate the t-test output using Analyze> Compare
Means> One-Sample T Test. Enter a test value of
_______ (this is the value you are testing against from the
null hypothesis).
f. What is the value of the test statistic? _____ =
_______________
g. Provide an interpretation of the test statistic value. (See
Supplement 6.)
4. Decision:
What is your decision at a 5% significance level? (circle one)  Reject H0     Fail to Reject H0
Remember: Reject H0 means the results are statistically significant; Fail to Reject H0 means the results are not statistically significant.
1.28
2.58
3.00
20
0.108
0.075
0.057
0.043
1. H0: μ = 20,
Ha: μ > 20,
Do Not Reject H0
Sketch:
0.030
0.015
t = 2.58
0.009
0.004
Reject H0
t = 2.12
t = -3.20
Reject H0
Do Not
4. H0: μ = 6, Ha: μ ≠ 6,
Reject H0
Sketch:
t = -1.87
Reject H0
Do Not
Reject H0
Do Not
-2.205
2.205
b.
Distribution of the t-test statistic assuming the null
hypothesis is true:
t(8)
c.
t(9)
t(16)
d.
p-value:
0.0259
1-0.059
0.0295
e.
Decision at a 5% significance level:
Reject H0
0.059
1-
Reject H0
Fail to
[Figure: time plot of WEIGHT versus Order number]
Formula Card:
b. Ha: μd < 0
c. Ha: d
One
Two
More
than two
Two
Categorical
Quantitative
Hypothesis Test: You can now implement the Five Steps for
hypothesis testing.
1. State the Hypotheses:
H0: ___________ = ___________ and Ha: ___________
___________ , where __________ represents:
Test-Statistic:
d. Generate the paired t-test output using Analyze>
Compare Means> Paired-Samples T Test.
What is the value of the test statistic? ____ = ____________
e. What is the distribution of the test statistic if the null
hypothesis is true?
Note:
This is not the same as the distribution of the
population that the data were drawn from, and will be the
model used to find the p-value.
3. Calculate the p-Value:
a. What is the SPSS reported p-value? _____________ .
Is this the p-value we want? ________
4. Decision:
What is your decision at a 5% significance level? (circle one)  Reject H0     Fail to Reject H0
Remember: Reject H0 means the results are statistically significant; Fail to Reject H0 means the results are not statistically significant.
Ha:_______________________
H0:_______________________
Ha:_______________________
Ha:_______________________
You enter the above data into SPSS and conduct a paired t-test to
obtain the following output.
[SPSS output: Paired Samples Test (Paired Differences: Mean, Std. Deviation, Std. Error Mean, 95% Confidence Interval of the Difference (Lower, Upper); t; df; Sig. (2-tailed))]
c. Report the p-value for the test in (b) and decision at a 10%
significance level.
p-value: _______________ Decision: (circle)
Fail to Reject H0
Reject H0
Formula Card:
and
No
Ha: ___________ =
One
Two
More
than two
Two
Categorical
Quantitative
Hypothesis Test: You can now implement the Five Steps for
hypothesis testing.
1. State the Hypotheses:
H0: ___________ = ___________ and Ha: ___________
___________ , where __________ represents:
iii. Note there is no time order for this data. If there were,
since you need EACH sample to be a random sample,
how many time plots would you need to make to check
this assumption? _______ time plot(s)
Side-by-side Boxplots:
Side-by-side boxplots show that the IQRs are: similar
not similar
similar
not similar
Ha: ______________________
can
cannot
A Note on Output: Be careful not to confuse the Levene's test p-value with the p-value from the two independent samples t-test,
since your output will contain both. The p-value for Levene's test is
labeled Sig. in the Levene's test section of the output. Further, note
that the first line of output corresponds to the pooled procedure,
while the second line corresponds to using an unpooled procedure.
g. Based on the version of the test you decided to conduct
(pooled or general) in part (f), what is the value of the test
statistic? Make sure you are looking at the correct line of
output (see note).
4. Decision:
What is your decision at a 5% significance level? (circle one)  Reject H0     Fail to Reject H0
Remember: Reject H0 means the results are statistically significant; Fail to Reject H0 means the results are not statistically significant.
No
Explain.
b. Did your conclusion here match the one you made in part 4?
No
2. T F   The assumption required by the pooled test is that the samples have equal variances.
3. Consider the following sets of boxplots of scores between two
age groups.
Set 1
Set 2
Set
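Group Statistics
Group | N | Mean | Std. Deviation | Std. Error Mean
Control | 5 | 46.80 | 3.42 | 1.53
Experimental | 8 | 38.38 | 4.78 | 1.69
Independent Samples Test: Run Time
Variances | Levene's F | Levene's Sig. | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference
Equal variances assumed | 1.09 | .32 | 3.41 | 11 | .006 | 8.42 | 2.47
Equal variances not assumed | | | 3.70 | 10.653 | .004 | 8.42 | 2.28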
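The two t values in this output can be reproduced from the group summary statistics alone. The sketch below is an illustration (SPSS works from the raw data): it uses the pooled standard error with n1 + n2 − 2 df and the unpooled standard error with the Welch-Satterthwaite df. Because the inputs are rounded to two decimals, the results agree with the printed output only to about two decimal places.

```python
import math


def two_sample_t(m1, s1, n1, m2, s2, n2):
    """Pooled and unpooled (Welch) two-sample t statistics from summary statistics."""
    diff = m1 - m2
    # Pooled: assumes equal population variances, df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    se_pooled = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    # Unpooled: separate variances, Welch-Satterthwaite degrees of freedom
    v1, v2 = s1**2 / n1, s2**2 / n2
    se_unpooled = math.sqrt(v1 + v2)
    df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return {
        "pooled": (diff / se_pooled, n1 + n2 - 2, se_pooled),
        "unpooled": (diff / se_unpooled, df_welch, se_unpooled),
    }


# Control vs. Experimental summary statistics from the Group Statistics table.
results = two_sample_t(46.80, 3.42, 5, 38.38, 4.78, 8)
for version, (t, df, se) in results.items():
    print(f"{version:>8}: t = {t:.2f}, df = {df:.3f}, SE = {se:.2f}")
```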
Ha: _______________________
Report the results for testing about the population mean times:
Test statistic: ______________
p-value: ____________________
Decision: (circle one)
Fail to reject H0
Reject H0
Thus
Formula Card:
#4
i.
One
Two
More
than two
Two
Quantitative
Quantitative
Hypothesis Test: You can now implement the Five Steps for
hypothesis testing.
1. State the Hypotheses:
H0: ___________ = ___________ and Ha: ___________
___________ , where __________ represents:
iii. Note there is no time order for this data. If there were,
since you need EACH sample to be a random sample,
how many time plots would you need to make to check
this assumption? _______ time plot(s)
iv. Construct the Q-Q plots necessary to check the
assumption about normally distributed populations.
(Recall that in order to split a data file, the command is:
Data> Split File.) Does it appear that the assumption
that each sample comes from a normally distributed
population is met? Why?
will
be
Test-Statistic:
f. Generate the ANOVA output using Analyze> Compare
Means> One-Way ANOVA. (Make sure you have turned
off any split file features you may have been using in your
assumptions checks.)
Under Options, select the
Descriptive (which gives you sample means and standard
deviations) and the Homogeneity of variance test
(Levene's test) options.
i.
not valid
to ____________ .
g. What is the value of the test statistic? ____ = ____________
Note:
This is not the same as the distribution of the
population that the data were drawn from, and will be the
model used to find the p-value.
4. Decision:
What is your decision at a 5% significance level? (circle one)  Reject H0     Fail to Reject H0
Remember: Reject H0 means the results are statistically significant; Fail to Reject H0 means the results are not statistically significant.
equal not
equal.
unpooled
t-test, and
population
sample
variances.
One way to check this assumption is to use Levene's test and see
if the p-value is (circle one) greater than / less than or equal to 0.10 (or
any reasonable significance level).
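Levene's test is, at its core, a one-way ANOVA on the absolute deviations of each observation from its group's center. This sketch uses deviations from the group means (some software also offers a median-centered variant) and returns only the F statistic and its degrees of freedom; the p-value would then come from the F(k − 1, N − k) distribution, for example as reported by SPSS. The two small groups are hypothetical.

```python
import statistics


def levene_f(*groups):
    """Mean-centered Levene statistic: one-way ANOVA F computed on |x - group mean|."""
    # Absolute deviations from each group's own mean
    devs = []
    for g in groups:
        center = statistics.fmean(g)
        devs.append([abs(x - center) for x in g])
    k = len(devs)
    n_total = sum(len(d) for d in devs)
    grand_mean = statistics.fmean([v for d in devs for v in d])
    group_means = [statistics.fmean(d) for d in devs]
    ss_between = sum(len(d) * (m - grand_mean) ** 2 for d, m in zip(devs, group_means))
    ss_within = sum((v - m) ** 2 for d, m in zip(devs, group_means) for v in d)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n_total - k))
    return f_stat, (k - 1, n_total - k)


# Hypothetical groups: the second group is more spread out than the first.
f_stat, dfs = levene_f([1, 3, 5], [2, 6, 10])
print(f"Levene F = {f_stat:.3f} with df = {dfs}")
```

A large F (small p-value) indicates unequal spreads, which argues against the pooled procedure.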
Source | Sum of Squares | df | Mean Square | F | Sig.
Between Groups | 601.916 | 2 | 300.958 | 5.421 | .007
Within Groups | 3331.037 | 60 | 55.517 | |
Total | 3932.953 | 62 | | |
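The entries of the ANOVA table are linked by simple arithmetic: each mean square is a sum of squares divided by its df, and F is the ratio of the two mean squares. This short sketch recomputes the table from its sums of squares and df as a consistency check; it also recovers the number of groups (k = df between + 1 = 3) and the total sample size (N = df total + 1 = 63).

```python
def anova_from_sums_of_squares(ss_between, df_between, ss_within, df_within):
    """Mean squares and F statistic for a one-way ANOVA table."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within


ms_b, ms_w, f_stat = anova_from_sums_of_squares(601.916, 2, 3331.037, 60)
k_groups = 2 + 1                    # df between = k - 1
n_total = 2 + 60 + 1                # df total = N - 1
ss_total = 601.916 + 3331.037       # should match the Total row, 3932.953
print(round(ms_b, 3), round(ms_w, 3), round(f_stat, 3), k_groups, n_total)
```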
Multiple Comparisons
Dependent Variable: Gain in Weight
Tukey HSD
(I) Condition | (J) Condition | Mean Difference (I-J)
Control | Cog Behav | -3.65
Control | Family | -8.29
Cog Behav | Control | 3.65
Cog Behav | Family | -4.64
Family | Control | 8.29
Family | Cog Behav | 4.64
2.
3.
4.
Formula Card:
Plot 1
Plot 2
Plot 3
One
Two
More
than two
Two
Categorical
Quantitative
Quantitative
Task: Fit a linear model to the data. Refer to the first part of the
ILP for response/explanatory variables. If you have questions
about the regression output after the ILP activities, refer to
Supplement 8 at the beginning of this workbook for more details.
3. Interpret the estimated slope b1. Clearly explain what the slope
says about the change in the teen birth rate.
6. Use the regression line to predict the teen birth rate for New
Mexico (with a poverty rate of 25.3%) and for Michigan (with a
poverty rate of 12.2%). How do they compare to the observed
TeenBrth values for the states of New Mexico and Michigan?
ILP:
Hypothesis Test: You can now implement the Five Steps for
hypothesis testing.
1. State the Hypotheses:
H0: ___________ = ___________ and Ha: ___________
___________ , where __________ represents:
b. Give the value for each test statistic listed in part (a).
c. Which of the test statistics in part a would not be
appropriate for conducting a one-sided version of the
alternative hypothesis?
3. Calculate the p-Value:
What is the SPSS reported p-value for both test statistics?
________________
Are these the p-values you want? ______________________
4. Decision:
What is your decision at a 5% significance level? (circle one)  Reject H0     Fail to Reject H0
Remember: Reject H0 means the results are statistically significant; Fail to Reject H0 means the results are not statistically significant.
5.
Construct a Q-Q plot of the residuals.
pattern of the plot.
Why would you not want to use this model to predict the teen birth
rate for a state that has a poverty rate of 2%?
R | R Square | Adjusted R Square | Std. Error of the Estimate
.598 | .358 | .318 | 6.807
ANOVA
Model 1 | Sum of Squares | df | Mean Square | F | Sig.
Regression | 413.01 | 1 | 413.01 | 8.91 | .009
Residual | 741.43 | 16 | 46.34 | |
Total | 1154.44 | 17 | | |
Coefficients
Model 1 | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig.
(Constant) | 61.05 | 6.08 | | 10.05 | .000
Age | -.38 | .13 | -.598 | -2.99 | .009
We also have: $\bar{x} = 46.22$ and $S_{xx} = \sum (x - \bar{x})^2 = 2883.131$.
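Several numbers in these tables can be recomputed from each other, which is a useful check on how to read them. The sketch below uses only values printed above (the sums of squares, the residual df, Sxx = 2883.131, and the rounded slope b1 = −0.38) to recover R², adjusted R², the standard error of the estimate s, F, SE(b1) = s/√Sxx, and t = b1/SE(b1). Because b1 is printed to two decimals, the recomputed t is only close to SPSS's −2.99.

```python
import math

ss_reg, ss_res, df_res = 413.01, 741.43, 16
ss_total = ss_reg + ss_res                     # 1154.44
n = df_res + 2                                 # simple regression: residual df = n - 2

r_squared = ss_reg / ss_total                  # proportion of variation explained
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - 2)
s = math.sqrt(ss_res / df_res)                 # Std. Error of the Estimate (root MSE)
f_stat = ss_reg / (ss_res / df_res)            # MS regression / MS residual (df 1 and 16)

sxx = 2883.131
b1 = -0.38                                     # rounded slope from the Coefficients table
se_b1 = s / math.sqrt(sxx)                     # standard error of the slope
t_b1 = b1 / se_b1                              # in simple regression, t**2 is about F

print(f"R^2 = {r_squared:.3f}, adj R^2 = {adj_r_squared:.3f}, s = {s:.3f}")
print(f"F = {f_stat:.2f}, SE(b1) = {se_b1:.3f}, t = {t_b1:.2f}")
```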
H0:_____________________
p-value:
Fail to reject H0
H0
Reject
No
Explain:
[Figure: residual plot (Residual versus Age)]
$\chi^2 = \sum \dfrac{(O_i - E_i)^2}{E_i}$
so
Formula Card:
___________________________________________
2. A national organization wants to compare the distribution of
level of highest education completed (high school, college,
masters, doctoral) for Republicans versus Democrats.
___________________________________________
3. A preservation society has the percentages of five main types
of fish in the river from 10 years ago. After noticing an
imbalance recently, they add some fish from hatcheries to the
river. How can they determine if they restored the ecosystem
from a new sample of fish?
___________________________________________
[Figure: null percentages and observed counts for the categories Falls, Drowning, Fire, Poison, and Other]
Category | Null % | Observed | Expected
Motor Vehicle | 45% | 442 |
Falls | 15% | 161 |
Drowning | 4% | 42 |
Fire | 3% | 33 |
Poison | 16% | 162 |
Other | 17% | 150 |
Total | 100% | 990 |
Test Statistics
Category: Chi-Square = 3.663, df = 5, Asymp. Sig. = .599
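The chi-square statistic and p-value in this output can be reproduced with χ² = Σ(Oᵢ − Eᵢ)²/Eᵢ, where each expected count is the total (990) times the null proportion. This standard-library sketch computes the upper-tail chi-square probability with the series expansion of the regularized incomplete gamma function; it is an illustration, not the method SPSS uses internally, and is intended for moderate statistic values like these.

```python
import math


def chi2_upper_tail(x, df):
    """P(X >= x) for X ~ chi-square(df), using the series for the regularized
    lower incomplete gamma function P(df/2, x/2). Suitable for moderate x."""
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-15 * total:
        term *= z / (a + k)
        total += term
        k += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def chi_square_gof(observed, null_props):
    """Goodness-of-fit statistic, df, p-value, and expected counts."""
    n = sum(observed)
    expected = [n * p for p in null_props]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return stat, df, chi2_upper_tail(stat, df), expected


categories = ["Motor Vehicle", "Falls", "Drowning", "Fire", "Poison", "Other"]
observed = [442, 161, 42, 33, 162, 150]
null_props = [0.45, 0.15, 0.04, 0.03, 0.16, 0.17]
stat, df, p, expected = chi_square_gof(observed, null_props)
for name, e in zip(categories, expected):
    print(f"{name:>13}: expected {e:.1f}")
print(f"chi-square = {stat:.3f}, df = {df}, p-value = {p:.3f}")
```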
a. The p-value is _____________ .
b. The expected value of the test statistic assuming the H0 is
true is _______ .
c. The large p-value is consistent with the fact that our
observed test statistic value is (circle one) greater than / less than
the expected test statistic value (under the null
hypothesis).
4. Decision:
What is your decision at a 5% significance level? (circle one)  Reject H0     Fail to Reject H0
Remember: Reject H0 means the results are statistically significant; Fail to Reject H0 means the results are not statistically significant.
Degree | Female | Male
Bachelor | 642 | 522
Master | 227 | 179
Professional | 32 | 45
Doctorate | 18 | 27
Use a 1%
Hypothesis Test: You can now implement the Five Steps for
hypothesis testing.
1. State the Null Hypotheses:
H0: __________________ _________________________________________
2. Checking the Assumptions and Computing the Test
Statistic
a. Show how the expected count 531.8 (first cell for males)
was computed.
b.
Test | Value | df | Asymp. Sig. (2-sided)
Pearson Chi-Square | 9.514a | 3 | .023
Likelihood Ratio | 9.485 | 3 | .023
Linear-by-Linear Association | 5.099 | 1 | .024
N of Valid Cases | 1692 | |
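The expected counts, statistic, df, and p-value for the Pearson chi-square test can be recomputed from the gender-by-degree counts. In this sketch each expected count is (row total × column total) / grand total, and the upper-tail probability is computed with a series expansion of the incomplete gamma function, a plain-Python illustration rather than SPSS's own routine.

```python
import math


def chi2_upper_tail(x, df):
    """P(X >= x) for X ~ chi-square(df) via the regularized incomplete gamma series."""
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-15 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return max(0.0, 1.0 - total * math.exp(a * math.log(z) - z - math.lgamma(a)))


def chi_square_independence(table):
    """Pearson chi-square test of independence for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    expected = [[r * c / grand for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_upper_tail(stat, df), expected


# Rows: Bachelor, Master, Professional, Doctorate; columns: Female, Male.
table = [[642, 522], [227, 179], [32, 45], [18, 27]]
stat, df, p, expected = chi_square_independence(table)
print(f"Expected count, Bachelor/Male: {expected[0][1]:.1f}")
print(f"Pearson chi-square = {stat:.3f}, df = {df}, p-value = {p:.3f}")
```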
Remember: Reject H0 means the results are statistically significant; Fail to Reject H0 means the results are not statistically significant.
5. Conclusion:
What is your conclusion at a 1% significance level?
5% instead of 1%?
3% instead of 1%?
2.3% instead of 1%?
2% instead of 1%?
________
________
________
Example Exam Question on Chi-Square Tests
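Test | Value | df | Asymp. Sig.
Pearson Chi-Square | 8.195a | 2 | .017
Likelihood Ratio | 8.298 | 2 | .016
Linear-by-Linear Association | 8.067 | 1 | .005
N of Valid Cases | 96 | |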
Exam 1 Questions
response
explanatory
variable.
d. Fill in the blank. Suppose the younger patients generally respond better to
acupuncture and that the patients in the untreated group were mostly
older (over 50), while those in the real and sham acupuncture groups were
generally younger.
[1]
The variable age would be an example of a(n) ______________ variable.
2. The following graph shows the distribution for the color of jelly beans in bags produced by the JB Company.
Which of the following description(s) are appropriate for this distribution? (Circle all that apply.) [2]
Symmetric
Uniform
Right-skewed
Left-skewed
None of these is appropriate
population statistic
population parameter
4. A law firm in the Dallas-Fort Worth area prepares contracts and other
legal documents and uses a courier service to deliver the documents to its
many clients. Recently a partner reported that a few complaints have come
in from some of their best clients about delayed contract deliveries. The
current courier service used is Metro Delivery. There are two other
courier services that have opened up in the past 2 years. Should they keep
using Metro or consider a new one?
To help address this question, a study was conducted over a one-month period
in which all three couriers were used. When a delivery was required, one of
the couriers was randomly selected. One of the responses of interest was
the total delivery time, that is, the time (in minutes) from when the order
is phoned in to when the documents are delivered to the destination.
Below are the boxplots and some additional numerical summaries for
comparing the total delivery times for the three couriers.
a.
What type of variable is total delivery time? Circle one:
[1]
Categorical
Quantitative
b. Compute the approximate IQR of the delivery times for DFW Express.
[2]
IQR = ___________________________________________
c.
[2]
Give the value of the upper boundary (or fence) that would be used to show
why the longest delivery time for DFW Express is an outlier. Show all work.
Upper boundary or fence = ______________________
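The 1.5 × IQR rule behind the upper boundary can be written as a small helper. The quartiles below (Q1 = 30 and Q3 = 42 minutes) and the delivery times are hypothetical values of the kind one might read off a boxplot; they are not the DFW Express numbers.

```python
def iqr_fences(q1, q3):
    """IQR plus the lower and upper fences used by the 1.5 x IQR outlier rule."""
    iqr = q3 - q1
    return iqr, q1 - 1.5 * iqr, q3 + 1.5 * iqr


def outliers(values, q1, q3):
    """Values lying outside the fences."""
    _, lower, upper = iqr_fences(q1, q3)
    return [v for v in values if v < lower or v > upper]


# Hypothetical quartiles (minutes) and delivery times.
iqr, lower, upper = iqr_fences(q1=30, q3=42)
print(iqr, lower, upper)                       # 12 12.0 60.0
print(outliers([28, 35, 41, 58, 63], 30, 42))  # [63]
```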
d. Fill in the blank. The Q1 of the delivery times by Metro is equal to the
[1]
e. Fill in the blank. For Carborne Carrier, 25% of the deliveries took longer
[2]
than ________________.
f.
[2]
_______________
g.
5. A survey was sponsored by the Ann Arbor News. A random sample of 200
Ann Arbor residents resulted in 110 stating they support extending the Art
Fair by one day.
a.
[1]
What is the value of the population proportion of Ann Arbor residents who
support extending the Art Fair by one day?
d. Suppose the survey will be repeated and the director of the Art Fair would
like to ensure a 95% margin of error of at most 3%. What is the minimum
number of such people that should be surveyed?
[2]
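Solving the margin-of-error formula for n gives a quick way to size the repeated survey. The sketch shows two versions, since which rule the course intends is an assumption here: the conservative rule of thumb that the 95% margin of error is about 1/√n (so n = 1/m²), and the z-based formula n = z*² p*(1 − p*)/m² with the conservative guess p* = 0.5. In both, n is rounded up to the next whole person.

```python
import math
from statistics import NormalDist


def n_rule_of_thumb_95(margin):
    """Smallest n with 1/sqrt(n) <= margin (conservative 95% rule of thumb)."""
    return math.ceil(1 / margin**2)


def n_for_margin(margin, confidence=0.95, p_guess=0.5):
    """Smallest n with z* * sqrt(p(1 - p)/n) <= margin, using the guess p_guess."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # e.g., 1.96 for 95%
    return math.ceil(z**2 * p_guess * (1 - p_guess) / margin**2)


print(n_rule_of_thumb_95(0.03))       # 1112
print(n_for_margin(0.03))             # 1068
print(n_for_margin(0.04, 0.99))       # 1037
```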
c.
True
[2]
False
7. Suppose that the amount of time spent waiting for your bus to our campus
each day is a uniform random variable between 0 and 20 minutes.
a.
[3]
Sketch a picture of the model for waiting time for the bus. Provide labels
for each axis and some values along each axis.
A is the event that you wait at least 10 minutes; that is, your waiting time is
in the interval [10, 20].
B is the event that you wait at most 15 minutes; that is, your waiting time is
in the interval [0, 15].
C is the event that you wait at most 10 minutes; that is, your waiting time is
in the interval [0, 10].
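For a uniform model, a probability is the length of the interval divided by the length of the whole range (here 20 minutes), which is the same as an area under the flat density curve of height 1/20. A minimal sketch, using demonstration intervals chosen only for illustration:

```python
def uniform_prob(lo, hi, a=0.0, b=20.0):
    """P(lo <= X <= hi) for X ~ Uniform(a, b): overlap length divided by (b - a)."""
    overlap = max(0.0, min(hi, b) - max(lo, a))
    return overlap / (b - a)


density_height = 1 / (20.0 - 0.0)              # height of the flat density curve
print(density_height)                          # 0.05
print(uniform_prob(5, 8))                      # 3 / 20 = 0.15
print(uniform_prob(15, 30))                    # only [15, 20] counts: 0.25
```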
Answer the questions based on the information given above. Show all work.
b. What is P(A)?
[1]
Final answer: ________________
Yes
No
Explain briefly.
Probability
0.05
0.10
0.20
[2]
Explain briefly why the owner is not wise to buy 6 magazines at the
beginning of each week.
Note: long-winded explanations will be penalized!
[4]
b. Suppose that 35 of the 100 UM students surveyed stated they have their
own laptop, for a sample proportion of 0.35. If the rate of students that
have their own laptop is indeed 30%, how likely would it be to get a sample
proportion of 0.35 or larger? Show all work.
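Under the stated rate of 30%, the sampling distribution of p̂ for n = 100 is approximately normal (np = 30 and n(1 − p) = 70 are both at least 10) with mean 0.30 and standard deviation √(0.30 × 0.70/100) ≈ 0.046. A short sketch of that calculation:

```python
import math
from statistics import NormalDist


def prob_phat_at_least(cutoff, p, n):
    """Normal approximation to P(p_hat >= cutoff) when the population proportion is p."""
    if n * p < 10 or n * (1 - p) < 10:
        raise ValueError("normal approximation conditions not met")
    sd = math.sqrt(p * (1 - p) / n)            # standard deviation of p_hat
    return 1 - NormalDist(p, sd).cdf(cutoff)


prob = prob_phat_at_least(0.35, 0.30, 100)
z = (0.35 - 0.30) / math.sqrt(0.30 * 0.70 / 100)
print(f"z = {z:.2f}, P(p_hat >= 0.35) = {prob:.3f}")
```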
[3]
[3]
b. What is the probability that the score for the first randomly selected student will
be at least 58 points?
[3]
[2]
Given that the first randomly selected student has a score of at least 58
points, what is the probability that the next randomly selected student
scores below 58 points?
[2]
[3]
ii. What is the probability that exactly 2 of the 6 students will have a
high level of stress (a score of at least 58 points)?
a categorical variable
variable
c.
a quantitative variable
a normal
a statistic
a sample
a population
Education Level | Convicted | Not Convicted | Total
At least 10 years | 50 | 150 | 200
Less than 10 years | 135 | 165 | 300
Total | 185 | 315 | 500
a. [2]
ii. Find the probability you circled and circle the appropriate conclusion.
The probability = ______________________________________
Thus it appears that conviction status (is / is not) independent of education level.
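Independence can be checked by comparing a conditional probability with the corresponding overall probability. This sketch uses the counts from the education-by-conviction table; the dictionary key "less than 10 years" for the second education level is a label chosen for illustration.

```python
# Counts from the education-by-conviction table.
convicted = {"at least 10 years": 50, "less than 10 years": 135}
row_totals = {"at least 10 years": 200, "less than 10 years": 300}
grand_total = 500

p_convicted = sum(convicted.values()) / grand_total            # 185 / 500
p_conv_given_educ = {k: convicted[k] / row_totals[k] for k in convicted}

print(f"P(convicted) = {p_convicted:.2f}")
for level, prob in p_conv_given_educ.items():
    print(f"P(convicted | {level}) = {prob:.2f}")
# If the conditional probabilities differ noticeably from P(convicted),
# conviction status and education level are not independent.
```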
3. Based on the graph adapted from the USA Today graph "Seeking a cure", is it appropriate to say that the distribution of the number diagnosed with diabetes is right skewed?
Circle one: Yes     No
[2] Briefly explain:
4. The scores for a recent pretest were recorded for the two sections of a
class (all scores were integer values). A summary of the two score
distributions is provided below.
ii.
The average difference between the individual scores is roughly 3.2 points.
iii. We expect 95% of the scores to be within 3.2 points of the mean.
iv. The individual scores vary from the mean by about 3.2 points, on average.
d. At the right we have one more plot, but the title is missing the letter for
the class. Which class is this a plot of?
[2]
Clearly circle your answer:
Class A
Class B
a.
yes
no
Does the interval from part (c) provide evidence that a minority of all
adults prefer the current bowl game system?
yes
no
Consider the following statements below. Clearly circle all which correctly
explain the meaning of the 90% confidence level.
i
ii
[2]
The 90% confidence interval for the population proportion of all adult
college football fans who would prefer a playoff tournament to the current
system was found to be (0.67,0.72).
Which of the following is a correct interpretation of this 90% confidence
interval? Clearly circle your one answer.
i
ii
iii
iv
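The meaning of a confidence level (a statement about the long-run success rate of the method, not about one particular interval) can be demonstrated by simulation. This sketch assumes a hypothetical true proportion of 0.70 and samples of size 400, builds the usual 90% interval p̂ ± z*√(p̂(1 − p̂)/n) for each of 4,000 samples, and reports the fraction of intervals that contain the true value; it should be close to 0.90.

```python
import math
import random
from statistics import NormalDist


def coverage_rate(p_true=0.70, n=400, confidence=0.90, reps=4000, seed=7):
    """Fraction of simulated confidence intervals that capture p_true (hypothetical values)."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.645 for 90%
    hits = 0
    for _ in range(reps):
        successes = sum(rng.random() < p_true for _ in range(n))
        p_hat = successes / n
        margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p_true <= p_hat + margin:
            hits += 1
    return hits / reps


rate = coverage_rate()
print(f"Share of 90% intervals containing the true proportion: {rate:.3f}")
```

Any single interval either contains the true proportion or it does not; the 90% describes how often the procedure succeeds over many samples.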
Final answer:_________________
6. A study at the United States Postal Service suggests that the time taken
to serve an individual customer at a post office is normally distributed with
a mean of 4 minutes and standard deviation of 45 seconds.
a.
[4] 20% of the customers will have a service time shorter than _____ minutes.
b. What is the probability that a customer will take longer than 5 minutes?
[3]
[3]
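Both parts use the normal model with mean 4 minutes and standard deviation 45 seconds; the units must match, so σ = 45/60 = 0.75 minutes. Python's `statistics.NormalDist` gives the 20th percentile and the upper-tail probability directly:

```python
from statistics import NormalDist

service_time = NormalDist(mu=4, sigma=45 / 60)   # minutes; 45 seconds = 0.75 minutes

q20 = service_time.inv_cdf(0.20)                 # 20th percentile
p_over_5 = 1 - service_time.cdf(5)               # P(service time > 5 minutes)
print(f"20th percentile: {q20:.2f} minutes")
print(f"P(time > 5 minutes): {p_over_5:.4f}")
```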
7. Suppose that among all of the many cars parked at the stadium on a game
day, 20% are out-of-state. Let X represent the number of cars in a random
sample of size 6 that are out-of-state. The probability distribution for X is
given below.
X = x | 0 | 1 | 2 | 3 | 4 | 5 | 6
P(X = x) | 0.26214 | 0.39322 | 0.24576 | 0.08192 | 0.01536 | 0.001536 | 0.000064
a.
What is the distribution for the random variable X = the number of out-of-state cars in a random sample of 6 cars? (Be specific.)
[3]
b. What is the probability of finding at least 1 but no more than 4 out-of-state cars in your random sample of 6 cars?
[2]
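The probabilities in the table match the Binomial(n = 6, p = 0.2) formula P(X = k) = C(6, k)(0.2)^k(0.8)^(6 − k), which this sketch regenerates. Summing entries answers "between" questions such as P(1 ≤ X ≤ 4).

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 6, 0.2
pmf = {k: binom_pmf(k, n, p) for k in range(n + 1)}
for k, prob in pmf.items():
    print(k, round(prob, 6))

p_1_to_4 = sum(pmf[k] for k in range(1, 5))      # P(1 <= X <= 4)
print(f"P(1 <= X <= 4) = {p_1_to_4:.5f}")
```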
c.
[2]
[2] Based on the above, circle all of the statements below that are true.
i.
With such a large sample size, there would be a very large margin of
error, so this poll gives very accurate information about the population
of all baseball fans' opinion on Bonds' achievements.
ii. Since the sample was a volunteer sample, it may not give information
that is representative of the population of all baseball fans.
iii. The responses are not independent, therefore this data should not be
used to create a confidence interval.
10. The overbooking problem: Airlines find that a small percentage of ticket
holders fail to show up to board a flight. Assume that the percentage is
10%. As a result, the airlines often sell more tickets than the capacity of
the plane. Suppose for planes with 120 seats, the airline sells 130 tickets.
Let X be the number of passengers out of 130 that actually show up.
a.
[2]
If 130 tickets are sold, how many passengers are expected to actually show
up?
If 130 tickets are sold, what is the probability that every passenger that
actually shows up will get a seat? Show all work.
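If each ticket holder independently shows up with probability 0.9 (an assumption implicit in the problem), X follows a Binomial(130, 0.9) distribution. Its mean is 130 × 0.9, and everyone who shows up gets a seat exactly when X ≤ 120. A short exact calculation:

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summing the exact probabilities."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


n, p_show = 130, 0.9
expected_show = n * p_show                       # mean of the binomial, n * p
p_everyone_seated = binom_cdf(120, n, p_show)    # at most 120 passengers show up
print(f"Expected number who show up: {expected_show:.0f}")
print(f"P(X <= 120) = {p_everyone_seated:.4f}")
```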
Statistics
Heart Rate
N
Valid
Missing
Mean
Minimum
Maximum
1000
0
110
65.0
155
Observed value
quantitative
quantitative discrete
quantitative discrete
quantitative
quantitative discrete
quantitative
[Figure: plot of Number of Friends]
Statistics: Number of Friends
Mean | 226
Std. Deviation | 87
i. [2]
The shortest list had about ____ friends on the social
networking account, while 75% of the lists had at least _____
friends.
ii. Report the average number of friends on a social
networking account. Include the appropriate symbol in your
answer.
[2]
=
iii. The standard deviation reported above is 87. Consider the
following interpretations of this value and clearly circle the
correct interpretation(s).
[2]
Ho:________________________Ha:___________________________
where the parameter _____ is defined (in the context of this
problem) as
b. Provide the value of the test statistic (including its symbol) and
the corresponding p-value. Include a sketch to show how the p-value is found.
[5]
Sketch (include labels):
Symbol
Test Statistic = ______ = ______________________
p-value= ________________________
c. Give the decision and corresponding conclusion by circling the
appropriate statements:
[2]
The statistical decision at a 5% significance level is:
Reject Ho     Fail to reject Ho
Therefore, there (is / is not) sufficient evidence to say that,
ii. How many red candies would you expect to find in this bag?
[1]
Final Answer: ____________________
iii. What is the probability of finding at least one red candy?
[3]
Yes
No
Explain briefly:
Yes
No
[2]
This is an (circle one): observational study / experiment.
the
(circle
one)
response
response
11.
Bought
popcorn?
Yes
No
Symbol
Test Statistic = ______ = __________________
p-value = ________________________
d. Give the decision by circling the appropriate statements:
[1] The statistical decision at a 5% significance level is: Reject Ho
Fail to reject Ho
e. You are asked to report the conclusion of this study to the Vice
President of your company. With the significance level of 5%,
what should your conclusion be? As your boss appreciates
brevity, provide 1-2 well-structured sentences.
[2]
f.
confounding
the
3. Application levels: This graph was part of the St. Olaf Annual
Report, in the section providing information about applications
to the college.
Based on this graph, is it appropriate to say the distribution of
application levels is skewed to the left?
[2] Circle one:
Briefly explain:
Yes
No
categorical,
[1 pt each]
How long would you say dinner usually lasts when your
family eats dinner together (in minutes)?
categorical
quantitative discrete
quantitative continuous
In the last week, how many evenings did your family eat
dinner together?
categorical
quantitative discrete
quantitative continuous
Prep_Time (minutes): Mean = 35.6, Std. Deviation = 8.9
Stay-at-home Mom
212
Working Mom
198
402
600
because
b.
[2]
c.
[3]
[2]
Final Answer: ______________
c. Which of the following are valid interpretations of the 99%
confidence level? (Circle all that apply.) [2]
If the sampling procedure were repeated many times,
then approximately 99% of the resulting intervals
would contain the population proportion of all
customers that prefer tea over coffee.
There is about a 99% chance that the population
proportion of all customers that prefer tea over coffee
lies in the interval (0.32, 0.48).
For repeated samples of 250 customers from the same
population, the proportion of all customers that prefer
tea over coffee will fall in the interval (0.32, 0.48) 99%
of the time.
d. What sample size would be required to estimate the population
proportion with 99% confidence and a margin of error of 4%?
Show your work.
[2]
Final Answer: ______________
9. Got the vote? A politician claims that she will receive more
than 70% of the vote in an upcoming election. To test this
claim, a random sample of 20 voters will be surveyed. The
significance level is set at 10%.
________________
c. True or False? One of the conditions required for the Z test for
the above hypotheses to be valid is that the model for the
original population be normal.
[2]
True
False
12.
Decision Making: A researcher prepares a Null Hypothesis
(H0) and an Alternative Hypothesis (Ha), runs an experiment,
and then makes a decision to reject or fail to reject the Null
Hypothesis.
In each case, use the given information to
determine which decision was made and circle it.
a. The p-value was 0.08852 and the significance level was 0.05.
[1]
Decision: (circle one)
Reject H0     Fail to Reject H0
b. A Type I error was made.
[1]
Decision: (circle one)
Reject H0     Fail to Reject H0
13.
Weights of Male High School Seniors: Suppose the
weights for high school male seniors are normally distributed
with an average of 166 pounds and a standard deviation of 7
pounds. Use this model to address the following questions.
a. Jim weighs 175 pounds. How many standard deviations is he
from the mean?
[2]
Exam 2 Questions
a.
Pair 1: Nighttime - Daytime
Mean | Std. Deviation | Std. Error Mean | 95% CI of the Difference, Lower | 95% CI of the Difference, Upper | t | df | Sig. (2-tailed)
-.006375 | .006545 | .002314 | -.011847 | -.000903 | -2.755 | 7 | .028
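The numbers in this paired-samples output fit together as follows: the standard error is s_d/√n with n = df + 1 = 8 pairs, t is the mean difference divided by that standard error, and the 95% interval is the mean difference ± t* × SE. The sketch below hard-codes t* = 2.365, the 97.5th percentile of t(7) taken from a t-table, as an assumed input.

```python
import math

mean_d, sd_d, df = -0.006375, 0.006545, 7
n_pairs = df + 1                                  # paired t-test: df = n - 1

se_d = sd_d / math.sqrt(n_pairs)                  # Std. Error Mean
t_stat = mean_d / se_d                            # t for H0: mu_d = 0
t_star = 2.365                                    # assumed table value for t(7), 95% confidence
lower, upper = mean_d - t_star * se_d, mean_d + t_star * se_d
print(f"SE = {se_d:.6f}, t = {t_stat:.3f}, 95% CI = ({lower:.6f}, {upper:.6f})")
```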
e. Give the p-value corresponding to the hypotheses in (b) and the observed
test statistic value in (c).
[2]
p-value = ______________
f. At the 5% significance level, the decision is: (circle one)
[1]
reject H0     fail to reject H0
g. Conclusion: There is (circle one) sufficient / insufficient
evidence to conclude that the batting averages for players appear to be
different for daytime versus nighttime games, on average. [1]
2. For a certain headache pain-reliever, about 25% of patients taking this
medicine experience nausea.
A new formulation of the headache pain-reliever has been approved which
should lower this nausea incidence rate. The research team planned a study
to test this theory using a 5% level of significance.
a. State the hypotheses to be tested and identify the parameter of interest.
[4]
H0: ______________________ Ha: _______________________
where the parameter ____________ represents
c.
[6] i.
p-value: _____________
iii. Circle the decision and give a one sentence real-world conclusion.
Decision:
reject H0
fail to reject H0
Conclusion:
d. Based on your decision in part (c), what mistake might have been made?
(circle one) [1]
Type I error
Type II error
3. Company records show that drivers get an average of 32,500 miles on a set
of All-Weather tires before an unacceptable level of wear on tires results.
Hoping to improve that figure, the company has added a new polymer to the
rubber that should help protect the tires from wear caused by extreme
weather. The results of a preliminary study from twenty randomly
selected drivers who tested the new tires gave a sample mean of 33,824
miles and a standard error of the sample mean of 323 miles. Let μ =
population mean tire mileage with the added polymer. Assume α = 0.05 and
that all necessary conditions are met to carry out the procedure.
a. Clearly state the hypotheses to be tested.
[2]
H0: ______________________ Ha: _______________________
b.
[4]
Compute the value of the test statistic and give (bounds for) the
corresponding p-value. Show all work.
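With the sample mean, the test value, and the standard error all given, the test statistic is t = (33,824 − 32,500)/323 with n − 1 = 19 df. The sketch below computes it and approximates the one-sided p-value (for Ha: μ > 32,500) by numerically integrating the t(19) density with Simpson's rule; this is a standard-library illustration, and in practice a t-table gives bounds while software gives the exact value.

```python
import math


def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for T ~ t(df): Simpson's rule on [0, |t|] plus symmetry about 0."""
    a = abs(t)
    h = a / steps
    area = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3                                  # area = P(0 <= T <= |t|)
    return 0.5 - area if t >= 0 else 0.5 + area


xbar, mu0, se, n = 33824, 32500, 323, 20
t_stat = (xbar - mu0) / se          # number of standard errors above 32,500
df = n - 1
p_value = t_upper_tail(t_stat, df)  # one-sided p-value for Ha: mu > 32,500
print(f"t = {t_stat:.3f}, df = {df}, p-value = {p_value:.5f}")
```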
[2]
Our results have proven that the new polymer increases the tire mileage.
Our results could have been due to chance.
Our results are inconsistent with the hypothesis that the population mean tire
mileage with the new polymer is 32,500 miles.
Oh my goodness, we got a p-value of nearly zero which proves the null is false!
d. The company would like to conduct a larger experiment (with more drivers)
and keep the significance level at 5%.
The features of the new
experiment, before carrying it out, are: (Circle all that apply)
[2]
Higher statistical power.
Lower p-value.
Lower Type II probability.
None of the above
Group Statistics: StudyHrs
Gender | N | Mean | Std. Deviation | Std. Error Mean
Female | 52 | 15.000 | 9.852 | 1.366
Male | 40 | 12.925 | 9.135 | 1.444
Independent Samples Test: StudyHrs
Variances | Levene's F | Levene's Sig. | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 90% CI Lower | 90% CI Upper
Equal variances assumed | .216 | .643 | 1.033 | 90 | .304 | 2.075 | 2.008 | -1.262 | 5.412
Equal variances not assumed | | | 1.044 | 86.843 | .300 | 2.075 | 1.988 | -1.231 | 5.381
a.
Jackie would like to compare the mean study time for female students to
the mean for male students using a 90% confidence interval.
Which confidence interval should she use? (circle one)
Unpooled
Pooled
[4] Give two reasons that support your choice (provide all details).
#1:
#2:
Report the confidence interval here: ___________________________
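The pooled row of the SPSS output can be reproduced from the group summaries, which shows where the pooled standard error and interval come from. A minimal sketch, assuming the usual pooled-variance formulas with df = n1 + n2 - 2; `t_pdf`, `t_cdf` and `t_quantile` are hypothetical helpers (Simpson's-rule integration plus bisection), not a library API. It also computes the ratio of the sample standard deviations, which, together with the Levene Sig. of .643, is the kind of evidence usually cited for pooling.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x) via Simpson's rule on [0, |x|] plus symmetry."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df):
    """Value q with P(T <= q) = p, found by bisection."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


n1, mean1, sd1 = 52, 15.000, 9.852   # females
n2, mean2, sd2 = 40, 12.925, 9.135   # males

df = n1 + n2 - 2
sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)  # pooled SD
se_pooled = sp * math.sqrt(1 / n1 + 1 / n2)
t_star = t_quantile(0.95, df)        # 90% two-sided -> 95th percentile
diff = mean1 - mean2
lower, upper = diff - t_star * se_pooled, diff + t_star * se_pooled
sd_ratio = max(sd1, sd2) / min(sd1, sd2)

print(f"SE = {se_pooled:.3f}, 90% CI = ({lower:.3f}, {upper:.3f}), SD ratio = {sd_ratio:.2f}")
```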
b. Based on the confidence interval, does there appear to be a difference
between the mean study time for these two college student populations?
[3] Circle one:
Yes
No
Explain using one simple sentence:
c.
[2]
False
ii. If this study were repeated many times and if the null hypothesis of no
difference between the population mean sales were true, we would see
a t-test statistic of 2.8 or larger in only 0.5% of the repetitions.
True False
iii. For this study the sample mean sales for baker 1 stores differed from
the sample mean sales for baker 2 stores by 2.8 standard errors (of
the difference in sample means).
True False
iv. The probability that H0 is true is 0.005.
True False
d. Circle the statistical decision and real-world conclusion that is appropriate.
[2] i.
Reject H0 (the population mean sales of pastries for baker 1 do not appear to
be higher than the population mean sales of pastries for baker 2)
ii. Reject H0 (population mean sales of pastries for baker 1 do appear to be higher
than the population mean sales of pastries for baker 2)
iii. Fail to reject H0 (there was insufficient evidence to demonstrate the
population mean sales of pastries for baker 1 is higher than the population mean
sales of pastries for baker 2)
iv. Fail to reject H0 (the population mean sales of pastries for baker 1 is equal to
the population mean sales of pastries for baker 2)
p-value: ______________________
7. A college dean plans a student survey to estimate the percentage of all
currently enrolled students who plan to take classes during the next spring
half term. She wants to be 90% confident that the margin of error will
be at most 5%.
a.
[3]
Final
answer:
_________________
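The required sample size follows from the margin-of-error formula for a proportion, m = z* sqrt(p*(1 - p*)/n), solved for n. This minimal sketch assumes the conservative planning value p* = 0.5 (no earlier estimate is given) and takes z* from Python's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

confidence = 0.90
margin = 0.05     # desired maximum margin of error
p_planning = 0.5  # conservative guess: maximizes p(1 - p)

z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.645
n_exact = z_star**2 * p_planning * (1 - p_planning) / margin**2
n_required = math.ceil(n_exact)  # always round up

print(f"z* = {z_star:.3f}, n = {n_exact:.1f}, round up to {n_required}")
```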
b. Suppose the survey is conducted and the resulting interval was found to be
(0.20, 0.30).
[2]
Based on the survey results, with 90% confidence, we would estimate the
sample proportion of currently enrolled students who plan to take classes
during the next spring half term to be between 0.20 and 0.30.
[Figure: Normal Q-Q Plot of Life_Time (Observed Value vs. Expected Normal Value)]
8. A manufacturer of light bulbs selects at random 100 light bulbs from their
production line and measures the lifetime of each light bulb in hours (that
is, how long the light bulb stays on before it burns out). The manufacturer
would like to estimate the population mean lifetime of its light bulbs.
A histogram and a Q-Q plot of the lifetimes of this random sample are
shown below.
Yes
No
It tells us that the distribution of the sample mean will be same as the
population distribution.
b. If we were testing H0: p = 0.5 versus Ha: p > 0.5 and the experiment
resulted in a sample proportion of
False
True
False
True
False
False
10. For each of the following research questions determine whether the
primary method for analyzing the data should be a confidence interval or a
hypothesis test. If the primary method is a confidence interval, circle the
words Confidence interval, and state what the confidence interval is for
by giving the symbol on the blank line. If a hypothesis test should be done,
circle the words Hypothesis Test, and write out the corresponding null
and alternative hypothesis using the appropriate notation.
[6 points]
a.
Ha:________________
b. Research question: Is the mean normal human body temperature less than
98.6 degrees Fahrenheit?
Primary Analysis Method: Confidence Interval for ______________
Hypothesis Test with H0:_______________
c.
a.
[2]
Ha:________________
Ha:________________
c.
[2]
Fail to reject H0
e.
[Figure: histogram of Race Time (hours), with Frequency on the vertical axis]
H0:_________________
Ha:________________
N    Mean    Std. Deviation   Std. Error Mean
36   1.330   .124             .021

One-Sample Test (Test Value = 1.40)
t        df   Sig. (2-tailed)   Mean Difference   95% Confidence Interval of the Difference (Lower, Upper)
-3.361   35   .002              -.070             (-.112, -.028)

c. What is the distribution of the test statistic if the new wax system does
not lead to faster race times on average?
[2]
d. The value of the test statistic is missing in the SPSS output. Provide its
value and the corresponding p-value for testing the hypotheses in part (a).
[2] Test Statistic Value t = ___________
p-value = _____________
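Part (d) asks for the p-value for the hypotheses in part (a), while SPSS reports a two-sided value. A minimal sketch, assuming those hypotheses are H0: μ = 1.40 versus Ha: μ < 1.40 (faster means a lower mean race time): because the observed t is negative, in the direction of Ha, the one-sided p-value is the lower tail, which is half of the two-sided Sig. Under H0 the statistic follows a t distribution with df = 36 - 1 = 35. `t_pdf` and `t_cdf` are hypothetical helpers that integrate the t density numerically.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x) via Simpson's rule on [0, |x|] plus symmetry."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


t_stat = -3.361  # from the One-Sample Test output
df = 35          # n - 1 with n = 36

p_one_sided = t_cdf(t_stat, df)  # lower tail, for Ha: mu < 1.40
p_two_sided = 2 * p_one_sided    # what SPSS reports as Sig. (2-tailed)

print(f"one-sided p = {p_one_sided:.4f}, two-sided p = {p_two_sided:.4f}")
```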
e.
Based on the test results, which of the following is the appropriate real-world conclusion?
[1] Clearly circle your answer.
i.
The new wax system significantly decreased the average race time
supporting its use by the Swedish Nordic Ski Team.
ii.
3. Data was collected on the height of male college students and their
fathers. Suppose we are interested in testing the hypothesis that sons are
taller than their fathers, on average using a 5% significance level. The
researcher's assistant had not taken Stat 350 and did not know which test
to perform, so he generated SPSS output for both the paired and the
independent samples t-tests. You will need to determine which output is
appropriate to use as you conduct the test below.
Paired Samples Test
Pair 1: Sons - Fathers (Paired Differences)
Mean    Std. Deviation   Std. Error Mean   95% Confidence Interval of the Difference (Lower, Upper)   t       df   Sig. (2-tailed)
1.556   5.090            .584              (.393, 2.719)                                               2.665   75   .009

height (1 = Sons, 2 = Fathers)
                              F       Sig.   t       df        Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI of the Difference (Lower, Upper)
Equal variances assumed       4.138   .044   2.120   150       .036              1.556             .734                    (.106, 3.006)
Equal variances not assumed                  2.120   116.202   .036              1.556             .734                    (.102, 3.010)
[8]
Ha:_____________________
Give the value of the test statistic and the corresponding p-value.
Test Statistic = _____________ p-value = _______________
Based on the p-value, your decision would be (circle one):
reject H0
do not reject H0.
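Because each son is paired with his own father, the paired output is the one that matches the design. A minimal sketch, assuming Ha: μd > 0 where d = son's height - father's height: the statistic is the mean difference divided by its standard error, and since it is positive (the direction of Ha) the one-sided p-value is the upper tail, half of the reported two-sided .009. `t_pdf` and `t_cdf` are hypothetical helpers that integrate the t density numerically.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x) via Simpson's rule on [0, |x|] plus symmetry."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


mean_diff = 1.556  # mean of the Sons - Fathers differences
se_diff = 0.584    # standard error of the mean difference
n_pairs = 76       # df = 75 in the output

t_stat = mean_diff / se_diff
p_one_sided = 1 - t_cdf(t_stat, n_pairs - 1)  # upper tail, for Ha: mu_d > 0

print(f"t = {t_stat:.3f}, one-sided p = {p_one_sided:.4f}")
```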
Question 3 continued
Based on the decision, the conclusion would be:
There is (circle one) sufficient insufficient evidence that sons are
taller than their fathers, on average.
b. Identify the cell that represents a Type 1 Error (circle your answer).
[1]
Cell A
Cell B
Cell C
Cell D
c.
Identify the cell that represents a Type 2 Error (circle your answer).
[1]
d.
[1]
Cell A
Cell B
Cell C
Cell D
Doctor
Doctor
Doctor
Doctor
Description:
c. Provide the test statistic and p-value for testing the hypotheses.
[4]
Show all work.
p-value: _______________
f.
Type 1 error
Beta
Type 2 error
Level of significance
Power
g. What could the news channel do to produce a test with higher power?
[1] Circle all correct answers.
Sample more students
Repeat the study many times
Use a 1% significance level
a.
0.227
[2]
Flavor
Region   N    Mean    Std. Deviation   Std. Error Mean
1        17   4.376   .880             .213
2        12   5.617   .915             .264

Flavor                        F      Sig.   t       df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI of the Difference (Lower, Upper)
Equal variances assumed       .028   .868   -3.68   27       .0010             -1.240            .337                    (-1.932, -.548)
Equal variances not assumed                 -3.65   23.236   .0013             -1.240            .340                    (-1.942, -.538)
a.
[4]
Because
Because
b. Which of the following values must be the pooled estimate of the common
population standard deviation?
[1] Circle one:
0.028
0.868
0.337
0.894
0.799
c.
The pooled 95% confidence interval for the difference in the population
mean flavor rating for region 1 versus region 2 is given by (-1.932, -0.548).
[2 points each]
i.
ii.
True
True
False
False
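For part (b), the pooled standard deviation is a weighted combination of the two sample standard deviations, so it must lie between .880 and .915. A minimal sketch of the formula sp = sqrt(((n1 - 1)s1² + (n2 - 1)s2²)/(n1 + n2 - 2)), using only the summary statistics in the Flavor output; multiplying sp by sqrt(1/n1 + 1/n2) reproduces the pooled Std. Error Difference of .337.

```python
import math

n1, sd1 = 17, 0.880  # region 1
n2, sd2 = 12, 0.915  # region 2

sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
se_diff = sp * math.sqrt(1 / n1 + 1 / n2)

choices = [0.028, 0.868, 0.337, 0.894, 0.799]
closest = min(choices, key=lambda c: abs(c - sp))

print(f"sp = {sp:.3f}, SE of difference = {se_diff:.3f}, closest choice = {closest}")
```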
HT
CI
b. In a recent survey, college students were asked the amount of time they
spend watching television on a typical day and the amount of time they
spend surfing on the Internet on a typical day. The researchers were
interested in determining whether the time spent surfing the Internet was
higher than the time spent watching television, on average.
Circle one:
c.
HT
CI
A study was done by randomly assigning 200 volunteers with sore throats
to either drink a cup of herbal tea or use a throat lozenge to ease their
pain. Each subject reported whether or not they experienced any relief.
The researchers were interested in assessing if the tea was more
effective as compared to the throat lozenge.
Circle one:
HT
CI
d. The state lottery office claims that the average household income of those
people playing the lottery is greater than $37,000. A sample of 25
households will be obtained and the incomes recorded to check
this claim.
Circle one:
e.
CI
f.
HT
HT
CI
Men and women shop at a retail clothing store. The manager would like to
estimate how much more (or less), on average, a woman spends on a typical
purchase occasion than a man.
Circle one:
HT
CI
a. You decide to perform the pooled version of the test, so you are
assuming σ1² = σ2². Give an estimate of that common population
variance. Show work.
[3]
Final
Answer
=
_____________________
b. If there is no difference between the population mean aldrin
concentrations, bottom versus mid-depth, what is the
distribution that allows us to calculate the p-value?
[2]
Final Answer = _____________________________________
c. Give the test statistic value and corresponding p-value for
testing the hypotheses about the mean aldrin concentrations.
[3]
Test Statistic Value: ________________ p-value:
___________________
d. Circle the most appropriate conclusion at the 10% level:
[2]
Ha:_______________________________
b. Report the value of the test statistic and the corresponding p-value. Show all work.
[5]
p-value:
No
No
Pair 1: AfterWeight - BeforeWeight
Mean       Std. Deviation   Std. Error Mean   t        df   Sig. (2-tailed)
-2.06250   2.26477          .56619            -3.643   15   .002
a.
[2]
b. State the appropriate null and alternative hypotheses to be
tested.
[2]
Ho:______________________ Ha:__________________________
c. Before reporting the test results, the data were examined.
Consider the plot shown at the right and clearly circle the
appropriate statement below:
[2]
True
False
c. The test statistic for testing the above hypotheses is 2.34 and
the corresponding p-value is 0.013. Which of the following are
5. True or False?
a. If a sample size is large enough, then the shape of the
histogram of the sample data will be approximately normal, even
if the population distribution is not normal.
[2]
Circle one:
True
False
b. If the null hypothesis is actually true, then it is not possible to
make a Type I error.
[2] Circle one:
True
False
[4]
Final
Answer
=
________________
b. Give the name of the statistical result that allowed you to
answer part (a):
[2]
10.
Waiting for that Bus: A public bus company official
claims that the mean waiting time for bus number 14 during
peak hours is less than 10 minutes. Karen will take bus number
14 during peak hours on 18 different occasions and use the
results to test the official's claim.
a. In one simple sentence, explain what is wrong with the null
hypothesis
H0: x̄ = 10.
[2]
b. Karen's sample mean waiting time was 2.6 standard errors
below the null value of 10 minutes. What test statistic would
she use to perform this test? Provide the symbol and value of
the test statistic.
[3]
Test statistic: ________ = __________________________________
c. Suppose the p-value was found to be 0.01 and the decision was
to reject the null hypothesis. What is the smallest possible
value for the significance level of the test that this researcher
could have used?
[2]
Final answer: __________________
11.
Maize and Blue Marbles: A statistician is going to play
the game "What is in the Box?". He will win a monetary prize if
he can correctly identify the contents of a box (which he cannot
see inside). There are two possibilities for the contents of the
box shown below as competing hypotheses:
H0: The box contains eight maize marbles and two blue
marbles.
Ha: The box contains five maize marbles and five blue marbles.
The box of marbles will be thoroughly mixed. The statistician
will be allowed to select just 1 marble out of the box (without
looking at the full contents) and observe its color. He must
make his decision based on the color of that one selected
marble. He has picked the following reasonable decision rule to
use:
iii.
[2]
iv.
Wider
Narrower
Can't Decide
Reject H0
the hypotheses in part (a) is
fail to reject H0
[2]
i.
[1]
ii.
[1]
Survived?   Yes   No   Total
            48    12   60
            88    12   100
[2]
Final answer:
_______________
e. At the 5% level, do the data provide sufficient evidence to say
the population survival rate of emergency surgeries is less than
the population survival rate of non-emergency surgeries?
[2] Circle one:
Yes
No
f. If there is no difference between the two population survival
rates, and we were to repeat this study many, many times,
each time conducting the same hypothesis test at the same
significance level, then we would expect to reject H0 (circle
one)
[2] ... about 95% of the time. ... about 5% of the time.
Cannot be determined.
5. Does it make sense? Decide whether each statement is true
or false.
[2 points each].
a. If the model for employee salaries at a company is strongly skewed right,
and a large number of employees are selected randomly, then the
model for employee salaries will become approximately normal.
True
False
b. The p-value is the probability that the null hypothesis H0 is
correct.
True
False
6. Which Technique? For each scenario below, determine if the
most appropriate statistical analysis technique is conducting a
hypothesis test (HT) or constructing a confidence interval (CI).
Then also give the symbol for the corresponding parameter
that is of interest. [2 points each]
a. A fitness instructor is working on a project which involved
having a random sample of HS students participate in a fitness
course during their last semester in HS. For each student it
was determined whether or not their fitness level improved.
The instructor would like to assess if the improvement rate for
all male students is higher than that for all female students.
Circle one:
_________________
HT
CI
Parameter of interest:
HT
CI
Parameter of interest:
HT
CI
Parameter of interest:
[Figure: blank axis with tick marks from 4.0 to 5.0 in steps of 0.1, captioned "Label for x-axis"]
Yes
No
Type 1
possible
Type 2
No Error is
Final Exam
Questions
a.
[2]
b.
[2]
[2]
Thus, the distribution of overall response (circle one): does does not
appear to be the same for the two treatment populations.
2. Trash Bag Strength: Jay Krug works for a distribution company that
purchases trash bags from a vendor. The bags are required to have a
breaking strength of at least 47 pounds. The bag vendor claims that its
production process is relatively stable and produces bags with a mean
breaking strength of 53 pounds and a standard deviation of 4.2 pounds. A
week ago Jay consulted with you asking how he should go about checking
this claim. You suggested obtaining some data.
So Jay strikes an agreement with the vendor that permits him to
sample from the vendor's production process. A sample of 49 bags was
randomly selected and the breaking strength of each was measured. The
mean breaking strength for the sample was found to be 2 pounds below the
process mean breaking strength cited by the vendor. Jay does remember
that sample means are statistics which vary from sample to sample, but he
is not sure whether his result reflects more variation than would be expected by chance. He
seeks your help in interpreting his result.
a.
What was the observed sample mean breaking strength for the 49
randomly selected bags?
[1]
Final answer: ___________________
b. Suppose the bag vendor's claim is true and many random samples of 49 bags
were obtained and for each sample, the sample mean was computed. What
is the distribution (model) for the possible values for the sample mean?
[3] Be specific.
c.
Assuming the bag vendor's claim is true, what is the probability of obtaining
a sample mean as far or even farther below the process mean than that
observed by Jay? Show all work.
[3]
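Parts (a) through (c) can be checked numerically. A minimal sketch, assuming the vendor's claim (μ = 53, σ = 4.2 pounds) is true so that, with n = 49, the sample mean is approximately normal with mean 53 and standard deviation σ/√n = 0.6 pounds; it uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

mu, sigma, n = 53, 4.2, 49   # vendor's claimed mean and SD (pounds), sample size
x_bar = mu - 2                # observed mean was 2 pounds below the claim

se = sigma / n**0.5           # standard deviation of the sample mean
sampling_dist = NormalDist(mu, se)
z = (x_bar - mu) / se
p_lower = sampling_dist.cdf(x_bar)  # P(sample mean <= observed value)

print(f"x_bar = {x_bar}, SE = {se:.2f}, z = {z:.2f}, probability = {p_lower:.5f}")
```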
Since the two sample sizes above are different, it will be helpful to report
the two sample proportions. Give these two sample proportions.
Proportion of respondents in a rural area that have a natural tree: ______
Proportion of respondents in an urban area that have a natural tree: _____
b. Before the hypothesis test can be performed, you remember that there
are some conditions that need to be assessed. We can assume that these
two samples are indeed independent random samples from the larger
populations of interest. So you have another condition that should be
checked before computing the test statistic. Perform this check.
[2]
c.
[5]
You are ready to perform the hypothesis test. Provide the value of the
observed test statistic and the corresponding p-value. Show all work.
p-value = __________________
yes
no
4. The following graph based on a USA Today article shows the distribution
of favorite style of brownies.
a.
[1]
Symmetric
Uniform
Right-skewed Left-skewed
None is appropriate
b. A student in Stats 350 saw this distribution and wanted to assess if this
stated distribution applies to the population of UM undergraduates. She
takes a random sample of 100 such students and records their responses.
State the appropriate null hypothesis to be tested for assessing whether the USA
Today stated model applies to the population of UM undergraduates.
For expressing the hypothesis, let 1 = fudgy, 2 = cakey, 3 = both same,
4 = neither, and 5 = don't know.
[2]
H0: ________________________________________________
c.
[2]
If the null hypothesis were true, what is the expected value of the test
statistic?
5. Snack Foods: A maker of crackers and other snack food was market-testing two versions of a microwaved snack called Version K and Version
S. A total of 200 adult volunteers were available for a study. Assume
these represent a random sample of potential adult consumers. The 200
adults filled out a questionnaire and the results were used to compute a
Target Customer Score (TCS) based on how desirable a person would be as a
customer. Then the TCS scores were ordered from smallest to largest to
form pairs of similar customers. Within each customer pair, one member was
randomly assigned to taste snack version K and the other to taste
snack version S.
The primary response was the approval rating for the tasted snack (with a
higher rating indicating higher approval). The differences in approval
rating (Version K - Version S) were computed for each customer pair and
are to be used to assess if there is adequate information to say that one
snack version is preferred over the other. A 5% significance level is to be
used. Some SPSS output is provided next.
Paired Differences, Pair 1
Mean    Std. Deviation   Std. Error Mean   95% Confidence Interval of the Difference (Lower, Upper)   t       df   Sig. (2-tailed)
-1.66   7.07             .71               (-3.06, -.26)                                               -2.35   99   .021
Circle each of the following statements that are an assumption required for
performing the test.
[2]
i.
the standard deviation for the ratings for the version K snack is
equal to the standard deviation for the ratings for the version S snack.
ii.
iii.
iv.
v.
the ratings for the version K snack are independent of the ratings
for the version S snack.
d. Is there conclusive evidence that one snack version is preferred over the
other at the 5% level?
[2] Select just one and complete the statement.
Yes, and snack version ________ appears to be preferred.
No, the two snacks appear to be equally preferred because
_____________________________________.
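The paired t statistic is the mean of the K - S differences divided by its standard error, with df = 100 - 1 = 99 because the 200 adults form 100 pairs. A minimal sketch, assuming a two-sided Ha: μd ≠ 0 (the study asks whether either version is preferred); `t_pdf`, `t_cdf` and `t_quantile` are hypothetical helpers based on numerical integration and bisection. Small discrepancies from the SPSS values (-2.35 and (-3.06, -.26)) come from the rounded inputs.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x) via Simpson's rule on [0, |x|] plus symmetry."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df):
    """Value q with P(T <= q) = p, found by bisection."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


mean_diff = -1.66  # mean of Version K - Version S differences
se = 0.71          # standard error of the mean difference
n_pairs = 100
df = n_pairs - 1

t_stat = mean_diff / se
p_two_sided = 2 * t_cdf(-abs(t_stat), df)
t_star = t_quantile(0.975, df)
lower, upper = mean_diff - t_star * se, mean_diff + t_star * se

print(f"t = {t_stat:.2f}, p = {p_two_sided:.3f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

A negative mean difference with an interval entirely below 0 points toward version S having the higher approval ratings.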
6. An advisor looks at the schedules of his 200 students to see how many
math and science courses each has registered for. Answer the following
questions based on his summary table below.
          0 Science   1 Science   2 Science   Totals
0 Math    44          36          20          100
1 Math    24          40          0           64
2 Math    16          12          8           36
Totals    84          88          28          200
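Two variables are independent only if every cell's joint proportion equals the product of its row and column proportions. A minimal sketch of that check for the advisor's table, assuming the (incomplete) question asks whether the number of math courses and the number of science courses are independent for these 200 students.

```python
import math

# Rows: number of math courses; columns: 0, 1, 2 science courses
counts = {
    0: [44, 36, 20],
    1: [24, 40, 0],
    2: [16, 12, 8],
}

total = sum(sum(row) for row in counts.values())
row_totals = {m: sum(row) for m, row in counts.items()}
col_totals = [sum(counts[m][s] for m in counts) for s in range(3)]

independent = True
for m, row in counts.items():
    for s, count in enumerate(row):
        joint = count / total
        product = (row_totals[m] / total) * (col_totals[s] / total)
        if not math.isclose(joint, product):
            independent = False
            print(f"{m} math, {s} science: joint {joint:.3f} vs product {product:.3f}")

print("independent" if independent else "not independent")
```

A single mismatched cell is enough; for example, P(0 math and 0 science) = 44/200 = 0.22, whereas (100/200)(84/200) = 0.21.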
Independent
[2]
Final answer: ________________
Yes
No
c.
[2]
Since this data was collected over a series of days, the manager
remembered (from his days in Stats 350) that he should examine the data
using time plots (one for each policy). Each plot showed approximately a
random scatter with no apparent pattern in roughly a horizontal band. If a
pattern had been found in any of these plots (say a downward trend), which
assumption underlying ANOVA would be called into question? (Be specific.)
Assumption:
ANOVA
wait             Sum of Squares   df   Mean Square   F       Sig.
Between Groups   152.445          2    76.223        9.752   .0005
Within Groups    257.922          33   7.816
Total            410.367          35
d. Which of the following statements is true about the value of 7.816 in the
ANOVA table?
[2] Circle one:
It is a parameter of ANOVA.
It is an estimate of the common population variance.
It is an estimate of the common population standard deviation.
It is an estimate of the common sample standard deviation.
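Each mean square is a sum of squares divided by its degrees of freedom, and F is the ratio of the between-groups to the within-groups mean square. A minimal sketch reproducing the ANOVA table; the within-groups mean square (7.816) estimates the common population variance, and its square root estimates the common population standard deviation.

```python
import math

ss_between, df_between = 152.445, 2
ss_within, df_within = 257.922, 33

ms_between = ss_between / df_between
ms_within = ss_within / df_within     # estimate of the common variance
f_stat = ms_between / ms_within
pooled_sd = math.sqrt(ms_within)      # estimate of the common SD

print(f"MSB = {ms_between:.3f}, MSW = {ms_within:.3f}, F = {f_stat:.3f}, s_p = {pooled_sd:.3f}")
```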
e.
[2]
What hypotheses are being tested with the F statistic given in the above
ANOVA table?
H0: __________________________________________________
Ha: __________________________________________________
f.
Yes
No
g.
[2] Use the results and clearly circle all pairs that are significantly different
(at a 5% level).
Multiple Comparisons
Dependent Variable: wait
Tukey HSD
(I) policy   (J) policy   Mean Difference (I-J)   Std. Error   Sig.
1            2            4.91                    1.13         .0003
1            3            1.13                    1.13         .5811
2            1            -4.91                   1.13         .0003
2            3            -3.78                   1.25         .0130
3            1            -1.13                   1.13         .5811
3            2            3.78                    1.25         .0130
Descriptive Statistics
                      N    Minimum   Maximum   Mean   Std. Deviation
GPA                   25   1.40      4.00      2.59   .7769
Entrance Test Score   25   3.50      6.30      5.03   .7385

Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .645   .417       .391                .606

ANOVA
Model 1      Sum of Squares   df   Mean Square   F        Sig.
Regression   6.04             1    6.035         16.425   .0005
Residual     8.45             23   .367
Total        14.49            24

Coefficients
Model 1               Unstandardized B   Std. Error   Standardized Beta   t       Sig.
(Constant)            -.83               .851                             -.971   .342
Entrance Test Score   .68                .168         .645                4.05    .0005
Question 8 continued:
a.
What is the overall range of the GPA values for this sample of
25 students?
[1]
c.
[2]
d.
[1]
e.
By how much would you expect a student's GPA at the end of the
freshman year to change for a two-point increase in the entrance test score?
[2]
What is the predicted GPA value for a student who got the lowest
score on the entrance test?
[2]
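The fitted line from the Coefficients table is predicted GPA = -0.83 + 0.68 × (entrance test score). A minimal sketch of the arithmetic for these parts, using the rounded coefficients as printed (so results carry a little rounding error); the lowest entrance score, 3.50, and the GPA extremes come from the Descriptive Statistics table.

```python
b0, b1 = -0.83, 0.68          # intercept and slope from the Coefficients table
gpa_min, gpa_max = 1.40, 4.00
score_min = 3.50              # lowest entrance test score in the sample

gpa_range = gpa_max - gpa_min
change_for_two_points = 2 * b1
predicted_at_min_score = b0 + b1 * score_min

print(f"range = {gpa_range:.2f}, change = {change_for_two_points:.2f}, "
      f"predicted GPA = {predicted_at_min_score:.2f}")
```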
Question 8 continued:
g.
Compute the approximate residual for the student from part (f).
[2]
[1]
i.
[4]
Ha: ____________________
The plot at the right is used to check the assumption that: (circle one)
Question 8 continued:
k.
A 95% confidence interval for the mean GPA at the end of the
freshman year for all students that have a certain entrance test score will
be the narrowest when the entrance test score is: (circle one)
[1]
l.
2.59
3.50
5.03
6.30
The study is criticized for not taking into account the socioeconomic
background of the entering freshman class. The socioeconomic background
would be an example of: (circle all that apply)
[1]
a response variable
a confounding variable
a dependent variable
a normal variable
True
False
                                        Total
Single                41    27    12    80
Married               49    50    21    120
Widowed or divorced   42    33    25    100
Total                 132   110   58    300
The researcher's goal was to assess if the distribution of the primary well
being source is the same for the three populations. Give the name of the
statistical test that should be performed.
Chi-Square Tests
                               Value    df   Asymp. Sig.
Pearson Chi-Square             5.337a   4    .254
Likelihood Ratio               5.210    4    .266
Linear-by-Linear Association   2.794         .095
N of Valid Cases               300
[3]
Test Statistic Value: _____________
p-value: ________________
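The Pearson statistic compares each observed count with the count expected under homogeneity, (row total × column total)/grand total. A minimal sketch recomputing it from the 3 × 3 table of counts; the p-value uses the closed-form chi-square tail for an even number of degrees of freedom, P(X > x) = e^(-x/2) Σ_{j < df/2} (x/2)^j / j!, which is exact for df = 4. The function name `chi2_sf_even_df` is hypothetical.

```python
import math

# Rows: Single, Married, Widowed or divorced; columns: primary well-being source
observed = [
    [41, 27, 12],
    [49, 50, 21],
    [42, 33, 25],
]


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with an even df."""
    if df % 2:
        raise ValueError("this closed form needs an even df")
    half = x / 2
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

stat = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        stat += (count - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = chi2_sf_even_df(stat, df)

print(f"chi-square = {stat:.3f}, df = {df}, p-value = {p_value:.3f}")
```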
Ha: _____________________
b. Which of the following are required conditions for this test to be valid?
[2] Clearly circle all that apply.
Each sample is a random sample.
The two samples are independent.
Each population has a normal model for the response.
The two population variances are equal.
c.
[2]
Compute the value of the test statistic. Report its symbol and value and
show all work.
1 = Winter   28
2 = Spring   33
3 = Summer   16
4 = Fall     51
The observed test statistic value is 19.8. Sketch a picture to show the
p-value of the test and then give the corresponding bounds for the p-value.
Circle one:
Reject H0
Fail to reject H0
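The question does not restate H0, but the reported statistic of 19.8 matches a goodness-of-fit test of H0: p1 = p2 = p3 = p4 = 0.25 (all four seasons equally likely), which this minimal sketch assumes. With four categories df = 3, and the chi-square tail for df = 3 has the closed form erfc(√(x/2)) + √(2x/π) e^(-x/2), evaluated here with `math.erfc`.

```python
import math

observed = {"Winter": 28, "Spring": 33, "Summer": 16, "Fall": 51}
n = sum(observed.values())
expected = n / len(observed)  # assumed H0: each season has probability 0.25

stat = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1

# Upper-tail probability of a chi-square variable with df = 3
p_value = math.erfc(math.sqrt(stat / 2)) + math.sqrt(2 * stat / math.pi) * math.exp(-stat / 2)

print(f"n = {n}, chi-square = {stat:.2f}, df = {df}, p-value = {p_value:.5f}")
```

A table-based answer would state the bound p < 0.001 (the df = 3 critical value for an upper area of 0.001 is 16.27).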
Type II Error
Cannot be determined
False
The normality of the data is assured due to the central limit theorem
since our sample size is n = 33.
True False
e.
Based on the results of the test in part (b), the following three multiple
comparisons using Tukey's method are also provided from SPSS.
[2] Is there a significant difference between 1 and 2, the high and low dose
of the new treatment, at the 5% level? Circle one: Yes
No
Explain (be specific):
5. Do the major league baseball (MLB) teams that spend more money on
players' salaries win more games? To investigate this question, a sports
fanatic regressed a team's winning rate (proportion of games that were won
by a team during the regular season) on the corresponding team's total of
the players' salaries (in millions of dollars) for a random sample of 20 MLB
teams from the 2007 season. The SPSS output for this regression is
provided. Use this output to answer the following questions.
Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .689   .474       .445                .04366

ANOVA
Model 1      Sum of Squares   df   Mean Square   F       Sig.
Regression   .0309            1    .0309         16.23   .0008
Residual     .0343            18   .0019
Total        .0652            19

Coefficients
Model 1                     Unstandardized B   Std. Error   Standardized Beta   t        Sig.
(Constant)                  .4335              .02204                           19.670   .0000
2007 Salary (in millions)   .0010              .00025       .689                4.029    .0008
d. Based on this equation, what is the predicted winning rate for the Boston
Red Sox, a team whose player salaries totaled $143,000,000?
[2]
The actual winning rate for Boston was 0.593. What is the value of the
residual for the Red Sox?
[2]
Complete the sentence: Actual winning rates are expected to vary around
the regression line by about ___________________ on average.
[2]
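Parts (d) through (f) use the fitted line predicted winning rate = 0.4335 + 0.0010 × (salary in millions of dollars). A minimal sketch with the rounded coefficients from the Coefficients table; salary must be entered in millions, so $143,000,000 becomes 143. The typical size of a residual is the Std. Error of the Estimate, .04366.

```python
b0, b1 = 0.4335, 0.0010   # intercept and slope (salary in millions of dollars)
salary_millions = 143     # Boston: $143,000,000
actual_rate = 0.593

predicted_rate = b0 + b1 * salary_millions
residual = actual_rate - predicted_rate  # residual = observed - predicted
typical_deviation = 0.04366              # Std. Error of the Estimate (s)

print(f"predicted = {predicted_rate:.4f}, residual = {residual:.4f}, s = {typical_deviation}")
```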
g.
[2]
A prediction interval for the winning rate of the Boston Red Sox would be
(circle one)
wider
narrower
than a confidence interval for the mean winning rate of all teams who spent
$143,000,000 on players' salaries.
h. Report the test statistic and the p-value for testing if salary significantly
increases the winning rate. Then complete the conclusion by circling the
appropriate phrase.
[4]
i.
Does this imply that spending more money on players' salaries will cause
teams to win more games? Clearly circle the one correct answer.
[2]
j.
[4]
[Figure: Normal Q-Q Plot of Unstandardized Residual (Observed Value vs. Expected Normal Value)]
(ii) Another plot that is used to check assumptions regarding our linear
regression model is a plot of the residuals against the independent
variable x. Which of the following plots would support that our linear
model seems reasonable? Clearly circle your answer.
Story Type       N    Mean   Std. Deviation   Std. Error Mean
Ordinary Story   16   6.1    2.17             .54
Own-Name Story   14   9.4    2.87             .77
a. Complete the sentence:
[2]
Of the children in the Own-Name Story group, 25% read for at least
_________ minutes.
c.
A 95% confidence interval for the difference between the mean reading
time for the ordinary story population and the mean reading time for the
own-name story population was found to be: (-5.3, -1.5). Circle the one
answer below that correctly completes the interpretation of the 95%
confidence level:
[2]
If we repeated this experiment with the same design,
we would expect 95% of the resulting confidence
intervals to contain the
difference in the population means of 3.4 minutes.
If the 99% confidence interval were constructed with the same data,
would the resulting interval contain the value of 0?
Yes
No
Can't Tell
Yes
No
Can't Tell
9. About 10 years ago, 41% of companies had their own ethics code. A survey
will be conducted today using a random sample of 15 companies and
recording whether or not they have their own ethics code. The data is to
be used to assess if there is evidence that the rate of companies with
their own ethics code has increased from the previous rate of 10 years ago.
a.
[3]
b. If the study were to be repeated with a larger sample size, what effect
would this have in terms of the power of the test? Circle one:
[1]
Increases
Decreases
No change
d. The relationship between opinion on gun control and income earned last
year (in thousands of dollars).
11. Suppose a production line manager monitored the weights of one-pound (16-ounce)
cans of nuts at monthly intervals to test H0: μ = 16 ounces versus
Ha: μ < 16 ounces by taking a random sample of cans from a day's production.
The results for the sampled cans on a day this past November gave a mean
of 16.24 ounces and a standard deviation of 1.81 ounces.
a.
[2]
Give the
_______ = __________________________________
b. If the null hypothesis is true, what is the expected value of the test
statistic?
[2]
c. The only feasible value for the p-value is: (circle one)
[2]
0.04
0.09
0.24
0.38
0.74
12. The Personnel Department for a company reports that 76% of all of its
2000 employees are registered for using the on-site Health Club. In one
brief sentence, explain why it would be inappropriate to construct a 95%
confidence interval using the value of 0.76 as its center.
[2]
[2]
why this
at the 5%
numerical
make this
(I) SHELF   (J) SHELF   Mean Difference (I-J)   Std. Error   Sig.   95% Confidence Interval (Lower Bound, Upper Bound)
1           2           -4.8190                 1.2857       .001   (-7.894, -1.744)
1           3           -1.7278                 1.1476       .294   (-4.473, 1.017)
2           1           4.8190                  1.2857       .001   (1.744, 7.894)
2           3           3.0913                  1.1299       .021   (.389, 5.794)
3           1           1.7278                  1.1476       .294   (-1.017, 4.473)
3           2           -3.0913                 1.1299       .021   (-5.794, -.389)
x (cars washed)   0     1     2     3     4
Probability       0.1   0.2   0.2   ___   0.1
a. Find the missing probability in the table. What is the
probability that Drew washes 3 cars on a Wednesday
afternoon between 4 and 5 PM?
[2]
Final Answer: _______________
b. Drew is paid $7 for each car that he washes. What are his
expected earnings for a Wednesday afternoon between
4:00 and 5:00 PM? Show your work and include your
units.
[3]
Final Answer: _______________
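The probabilities in a discrete distribution must sum to 1, and the expected value is Σ x·P(x). A minimal sketch, assuming the table pairs 0, 1, 2 and 4 cars with probabilities 0.1, 0.2, 0.2 and 0.1 (the value for 3 cars is the missing one); expected earnings are $7 times the expected number of cars.

```python
probs = {0: 0.1, 1: 0.2, 2: 0.2, 4: 0.1}  # cars washed -> probability
probs[3] = 1 - sum(probs.values())       # missing probability

expected_cars = sum(cars * p for cars, p in probs.items())
pay_per_car = 7                          # dollars
expected_earnings = pay_per_car * expected_cars

print(f"P(3 cars) = {probs[3]:.1f}, E(cars) = {expected_cars:.1f}, "
      f"expected earnings = ${expected_earnings:.2f}")
```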
Consider the following two graphs. Each was made on
data collected for performing a two-independent samples
t-test to assess if there was a significant difference
between the two population means.
Data Set A: Data Set A:
Data Set B: Data Set B:
3.
Which data set would result in a larger p-value for the t-test?
[2] Circle one:
Data Set A
Data Set B
4. Henry is totally unprepared for a three-question multiple-choice quiz. He decides to randomly answer each question
and hope for the best. Since he is guessing, his response
on one question does not depend on his response to
another question. There is only one correct answer for
[2]
Final Answer: _______________
Henry needs to get at least one question correct to
pass the quiz. What is the probability that Henry passes?
[2]
Final Answer:
______________
d.
[2]
c.
[2]
Final Answer:
____________________
d. The 10th movie in the sample had a running time of 1
hours and residual of 5.351. Find the actual final budget
for this movie. Show all work. Include your units.
[3]
Final Answer:
___________________
e. What is the value of the correlation between the running
time of a movie and its final budget?
[1]
Final Answer: ___________________
f. The statistician knows she
should examine the data graphically to
check various assumptions before
assessing significance of the
regression equation. One of the
plots she produced is given.
Consider the following three
statements. Circle all that are
correct.
[2]
Circle one:
(12, 147)    (45, 113)
Circle one:
(38, 121)    (29, 129)
Final answer: ___________________
d. Report the p-value of this test.
[2]
Final answer: _________________________
Type I Error
Type II Error
Cannot be determined
Statistic
Parameter
False
False
False
False
Score of 650
z-test
t-test
goodness-of-fit
homogeneity
independence
[2]
Final answer = _______________
[2]
Final answer = _______________
[5]
c. ______
d. ______
C. one-way ANOVA
D. chi-square test of homogeneity
E. chi-square test of independence
F. regression analysis
[2]
b. Give the symbol and value for the test statistic to test the
hypotheses in (a).
[2]
_______ = _____________________
c. Provide the p-value for testing the hypotheses stated in (a),
circle your decision at the 5% level, and state your conclusion
in the context of the problem.
[3] p-value = ______________
Decision? (circle one): Fail to Reject H0 Reject H0
Conclusion:
d. The plot below was also produced. Clearly state the assumption being assessed through this plot.
[2]
6. Tires: A consumer advocacy group wants to assess the difference between how long Michelin and Goodyear tires last, on average. A random sample of 30 Michelin tires was obtained, and an independent random sample of 35 Goodyear tires was obtained. The number of miles each tire lasted was recorded. The following table summarizes the data that were collected (group 1 = Michelin, group 2 = Goodyear).

                                Group 1    Group 2
Sample Mean                      3120       3050
Sample Size                        30         35
Sample Standard Deviation         200        195
a. The pooled confidence interval assumes that the two
populations have equal variances. Citing appropriate evidence,
comment on the validity of this assumption.
[1]
b. An estimate of the common population standard deviation
using the above results is 197.3. Provide a 95% pooled
confidence interval estimate for the difference between the two
population mean miles. Show all work.
[4]
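This is a minimal sketch of parts a and b computed from the summary table.

For part a, it applies a common rule of thumb, assumed here and not stated in the question: equal variances are reasonable when the larger sample SD is no more than about twice the smaller.

For part b, it computes:
- the pooled SD;
- the standard error s_p * sqrt(1/n1 + 1/n2);
- the interval (mean1 - mean2) ± t* × SE, with df = n1 + n2 - 2 = 63.

The multiplier t* = 1.998 is an approximate t-table value. It is hardcoded because Python's standard library has no t quantile function.

```python
import math

# Summary statistics from the table (group 1 = Michelin, group 2 = Goodyear).
n1, mean1, s1 = 30, 3120, 200
n2, mean2, s2 = 35, 3050, 195

# Part a: rule-of-thumb check (ratio of larger to smaller sample SD at most about 2).
sd_ratio = max(s1, s2) / min(s1, s2)

# Part b: pooled standard deviation with df = n1 + n2 - 2.
df = n1 + n2 - 2
s_p = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)

# Standard error of the difference in sample means under the pooled model.
se = s_p * math.sqrt(1 / n1 + 1 / n2)

T_STAR = 1.998  # assumed: approximate t* for 95% confidence with df = 63
diff = mean1 - mean2
lower, upper = diff - T_STAR * se, diff + T_STAR * se
print(f"ratio={sd_ratio:.3f}, s_p={s_p:.1f}, SE={se:.2f}, CI=({lower:.1f}, {upper:.1f}) miles")
```

Under these assumptions, the pooled SD agrees with the 197.3 quoted in part b, and the resulting interval contains 0.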
Reject H0
Not Enough
a. Consider the following statements and clearly circle all that are
correct:
[2]
With each 1-lb increase in car weight, we would expect to see a decrease of 5.378 miles per gallon in fuel efficiency.
With each 2,000-lb increase in car weight, we would expect to see a decrease of 10.756 miles per gallon in fuel efficiency.
Based on the analysis above, about 57.4% of the variation
in fuel efficiency can be explained by the linear relationship
with car weight.
Based on the analysis above, about 33.0% of the variation
in fuel efficiency can be explained by the linear relationship
with car weight.
b. Give the correlation between fuel efficiency and weight.
[2]
Final Answer: _____________________
c. Give the equation of the estimated least squares regression
line for predicting fuel efficiency from weight.
[2]
Final Answer: ______________________________________________
d. The unstandardized coefficient value of 41.764 given in the
SPSS output is an example of:
[2] Circle all that are correct:
a sample statistic
a population parameter
a test statistic
a slope of the regression line for the population
an intercept of the regression line for the sample
e. The actual fuel efficiency of the car in the data set that weighs 4,054 lb was 15.5 miles per gallon. Compute the residual for this observation. Show all work and include the units.
[2]
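This is a sketch of the residual calculation in part e. It assumes the fitted line is predicted mpg = 41.764 - 5.378w, where 41.764 is taken as the intercept and w is weight in thousands of pounds. The units of w are inferred from the statement that pairs a 2,000-lb increase with a 10.756 mpg decrease. Check them against the original SPSS output. The helper names are illustrative.

```python
# Assumed fitted line (check against the original output):
#   predicted mpg = 41.764 - 5.378 * (weight in 1000s of lb)
INTERCEPT = 41.764
SLOPE_PER_1000_LB = -5.378

def predicted_mpg(weight_lb):
    """Predicted fuel efficiency (mpg) for a car weighing weight_lb pounds."""
    return INTERCEPT + SLOPE_PER_1000_LB * (weight_lb / 1000)

def residual(actual_mpg, weight_lb):
    """Residual = actual - predicted, in miles per gallon."""
    return actual_mpg - predicted_mpg(weight_lb)

e = residual(15.5, 4054)
print(f"predicted = {predicted_mpg(4054):.3f} mpg, residual = {e:.3f} mpg")
```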
[1]
[1] The plot at the right is used to check the assumption that: (circle one)
are randomly distributed.
values are randomly distributed.
terms have constant variance.
constant variance.
b. There are 4 missing values in the ANOVA table. Find the values
and enter them clearly in the table.
[2]
c. If there were no difference in the average scores for the three instruction methods, what would be the distribution of the test statistic for testing the hypotheses in part (a)?
[2]
Final Answer: ____________________________
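The ANOVA table itself was lost, so this sketch fills in a one-way ANOVA table from hypothetical sums of squares. It uses:
- df(groups) = k - 1;
- df(error) = N - k;
- MS = SS/df;
- F = MS(groups)/MS(error).

With k = 3 instruction methods and the usual one-way ANOVA conditions, the F statistic under H0 follows an F distribution with k - 1 = 2 and N - 3 degrees of freedom. The function name and the numbers are illustrative.

```python
def complete_one_way_anova(ss_groups, ss_error, k, n_total):
    """Fill in the df, MS, and F entries of a one-way ANOVA table."""
    df_groups = k - 1            # between-groups degrees of freedom
    df_error = n_total - k       # within-groups (error) degrees of freedom
    ms_groups = ss_groups / df_groups
    ms_error = ss_error / df_error
    return {
        "df_groups": df_groups,
        "df_error": df_error,
        "df_total": n_total - 1,
        "ss_total": ss_groups + ss_error,
        "ms_groups": ms_groups,
        "ms_error": ms_error,
        "F": ms_groups / ms_error,
    }

# Hypothetical inputs: 3 instruction methods, 30 students in total.
table = complete_one_way_anova(ss_groups=60.0, ss_error=270.0, k=3, n_total=30)
print(table)
```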
(circle one)
Reject H0
Fail to Reject H0
[3]
10. Name that Scenario: One important aspect of Statistics is understanding which statistical methods or procedures are appropriate for addressing the research problem or question of interest.
The Janus Sisters make cupcakes and are considering
providing their specialty treats for individual sale at local coffee
shops. Before they embark on this endeavor, they have many
plans to formulate and decisions to make. They need your help
in deciding which statistical method to use to address their
questions. For each question, select the letter corresponding to
the statistical analysis technique most appropriate for
addressing that question.
[2 points each]
_____ 1.
_____ 2.
_____ 3.
A. Simple linear regression
B. 1-sample t-test for a population mean
C. Paired t-test for a population mean difference
D. 2-sample t-test for the comparison of two population means
E. 1-sample Z-test for a population proportion
F. 2-sample Z-test for the comparison of two population proportions
H. Chi-squared test of independence
I. Chi-squared test of homogeneity
_____ 5.