
Solved problems
1. Central Tendency and Variability Measures
2. Discrete and Continuous Probability Distributions
3. Sampling and Sampling Distributions
4. Confidence Intervals
5. Simple random survey
6. Stratified survey
7. Hypothesis testing
Practice Problems
Appendix 1
Table 1. Probabilities for the normal distribution for z-score values less than zero
Table 2. Probabilities for the normal distribution for z-score values greater than zero
Appendix 2
Critical values for the Student distribution

Solved problems

1. Central Tendency and Variability Measures

The table below lists the scores registered for the 2008 Masters Golf Tournament. Based on the
data it provides, compute the following indicators and give their interpretation:
a) Mean, median and modal score for the entire tournament;
b) The coefficient of variation;
c) The range within which at least 75% of the scores lie.

Final Score   Frequency
280-283       2
284-287       8
288-291       14
292-295       15
296-299       5
300-303       2

Solution:
a) Mean, median and modal score for the entire tournament;

Final Score   Frequency (ni)   xi                     xi*ni              Cumulated frequencies
280-283       2                (280+283)/2 = 281.5    281.5*2 = 563.0    2
284-287       8                285.5                  2284.0             10
288-291       14               289.5                  4053.0             24
292-295       15               293.5                  4402.5             39
296-299       5                297.5                  1487.5             44
300-303       2                301.5                  603.0              46
Total         46               ---                    13393.0            ---

Mean:

$$ \bar{x} = \frac{\sum x_i n_i}{\sum n_i} = \frac{13393}{46} = 291.15 $$

On average, a player scored 291 points.

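Median:
For computing the median value, it is necessary to determine its position:
$$ \text{position} = \frac{\sum n_i + 1}{2} = \frac{46 + 1}{2} = 23.5 $$

The interval that contains the median value is the first interval for which the cumulated frequency
is higher than the position of the median, that is 288-291 (cumulated frequency 24).

$$ Me = x_0 + h \cdot \frac{\frac{\sum n_i + 1}{2} - F_{Me-1}}{n_{Me}} = 288 + 3 \cdot \frac{23.5 - 10}{14} = 290.89 \approx 291 $$

Here x_0 = 288 is the lower limit of the median interval, h = 291 - 288 = 3 is its width, F_{Me-1} = 10 is
the cumulated frequency of the previous interval and n_{Me} = 14 is the frequency of the median interval.

Half the players scored more than 291 points and half scored less.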
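Mode:
The interval that contains the mode is the interval with the highest frequency, that is 292-295
(15 players).

$$ \Delta_1 = 15 - 14 = 1, \qquad \Delta_2 = 15 - 5 = 10 $$
$$ Mo = x_0 + h \cdot \frac{\Delta_1}{\Delta_1 + \Delta_2} = 292 + 3 \cdot \frac{1}{1 + 10} = 292.27 \approx 292 $$

Here Δ1 is the difference between the frequency of the modal interval and that of the previous
interval, and Δ2 is the difference to the frequency of the following interval.
Most players scored 292 points.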

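b) The coefficient of variation;

Final Score   Frequency (ni)   xi      (xi - mean)^2 * ni
280-283       2                281.5   (281.5 - 291.15)^2 * 2 = 186.2450
284-287       8                285.5   255.3800
288-291       14               289.5   38.1150
292-295       15               293.5   82.8375
296-299       5                297.5   201.6125
300-303       2                301.5   214.2450
Total         46               ---     978.4350

Variance:
$$ \sigma^2 = \frac{\sum (x_i - \bar{x})^2 n_i}{\sum n_i} = \frac{978.435}{46} = 21.27 $$

Standard deviation:
$$ \sigma = \sqrt{\sigma^2} = \sqrt{21.27} = 4.61 $$

Coefficient of variation:
$$ CV = \frac{\sigma}{\bar{x}} \cdot 100 = \frac{4.61}{291.15} \cdot 100 = 1.58\% $$

The coefficient of variation is very small, so the population is homogeneous and the mean is
representative.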

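c) The range within which at least 75% of the scores lie.

According to Chebyshev's theorem, at least $(1 - 1/k^2) \cdot 100\%$ of the values of a population will fall
within k standard deviations from the mean, regardless of how the population is distributed. So, at
least 75% of the values will fall within 2 standard deviations from the mean:
$$ 1 - \frac{1}{k^2} = 0.75 \Rightarrow \frac{1}{k^2} = 0.25 \Rightarrow k^2 = 4 \Rightarrow k = 2 $$
Taking this into account, at least 75% of the scores will fall between 281 points and 301 points
(limits rounded outward to whole points):
$$ \bar{x} - 2\sigma = 291.15 - 2 \cdot 4.61 = 281.93 $$
$$ \bar{x} + 2\sigma = 291.15 + 2 \cdot 4.61 = 300.37 $$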
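
The grouped-data indicators above can be checked with a short script. This is only a sketch: it assumes the frequencies of the solution table (46 scores), uses class midpoints, and divides the squared deviations by the total frequency (population variance), which is an assumption about the convention used.

```python
import math

# (lower limit, upper limit, frequency) taken from the solution table
classes = [(280, 283, 2), (284, 287, 8), (288, 291, 14),
           (292, 295, 15), (296, 299, 5), (300, 303, 2)]

n = sum(f for _, _, f in classes)
midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]
freqs = [f for _, _, f in classes]

mean = sum(x * f for x, f in zip(midpoints, freqs)) / n
# Assumption: population variance (divide by n, not n - 1)
variance = sum((x - mean) ** 2 * f for x, f in zip(midpoints, freqs)) / n
std = math.sqrt(variance)
cv = std / mean * 100  # coefficient of variation, in percent

k = 2  # number of standard deviations in Chebyshev's theorem
chebyshev_share = 1 - 1 / k ** 2
interval = (mean - k * std, mean + k * std)

print(round(mean, 2), round(std, 2), round(cv, 2))
print(chebyshev_share, [round(v, 2) for v in interval])
```
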

2. Discrete and Continuous Probability Distributions

2.1. A class of students took a multiple choice test with 10 questions, each with four choices.
a) Calculate the probability that a student who guesses randomly will answer exactly five
questions correctly;
b) Calculate the probability that a student who guesses randomly will answer less than five
questions correctly;
c) Calculate the probability that a student who guesses randomly will answer more than five
questions correctly;
d) Assuming each student in the class guesses randomly on each question, calculate the mean,
variance and standard deviation for this distribution.
Solution:
Given that:
There are two possible outcomes (the student is either right or wrong),
There are a finite number of experiments (10 questions in the test),
Each experiment is independent (answering a question is not influenced by the answer to
another question),
the distribution is binomial.

a) Calculate the probability that a student who guesses randomly will answer exactly
five questions correctly;
Each question has four possible answers, but only one of them is correct, so the probability of
guessing the right answer is $p = 1/4 = 0.25$. It follows that the probability of not guessing the
correct answer is $q = 1 - 0.25 = 0.75$.
The probability that a student guessing randomly will answer exactly five questions correctly is 5.84%:
$$ P(x) = \frac{n!}{x!\,(n-x)!} \, p^x q^{n-x} $$
$$ P(5) = \frac{10!}{5!\,5!} \cdot 0.25^5 \cdot 0.75^5 = 252 \cdot 0.000977 \cdot 0.237305 = 0.0584 $$

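b) Calculate the probability that a student who guesses randomly will answer less than
five questions correctly;
In order to compute the probability that a student randomly guessing will answer less than five
questions correctly, it is necessary to compute the probabilities for answering 0, 1, 2, 3 and 4
questions correctly and add them up:
$$ P(0) = \frac{10!}{0!\,10!} \cdot 0.25^0 \cdot 0.75^{10} = 0.0563 $$
$$ P(1) = \frac{10!}{1!\,9!} \cdot 0.25^1 \cdot 0.75^9 = 10 \cdot 0.25 \cdot 0.0751 = 0.1877 $$
$$ P(2) = \frac{10!}{2!\,8!} \cdot 0.25^2 \cdot 0.75^8 = 45 \cdot 0.25^2 \cdot 0.75^8 = 0.2816 $$
$$ P(3) = \frac{10!}{3!\,7!} \cdot 0.25^3 \cdot 0.75^7 = 120 \cdot 0.25^3 \cdot 0.75^7 = 0.2503 $$
$$ P(4) = \frac{10!}{4!\,6!} \cdot 0.25^4 \cdot 0.75^6 = 210 \cdot 0.25^4 \cdot 0.75^6 = 0.1460 $$
$$ P(0) + P(1) + P(2) + P(3) + P(4) = 0.0563 + 0.1877 + 0.2816 + 0.2503 + 0.1460 = 0.9219 $$
The probability that a student will guess the answer for less than five questions is 92.19%.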

c) Calculate the probability that a student who guesses randomly will answer more than
five questions correctly;
In order to compute the probability that a student randomly guessing will answer more than five
questions correctly, we simply subtract the probabilities computed in a) and b) from 1:
$$ P(6) + P(7) + P(8) + P(9) + P(10) = 1 - [P(0) + P(1) + P(2) + P(3) + P(4) + P(5)] $$
$$ = 1 - (0.9219 + 0.0584) = 1 - 0.9803 = 0.0197 $$
The probability that a student will guess the answer for more than five questions is 1.97%.

d) Assuming each student in the class guesses randomly on each question, calculate the
mean, variance and standard deviation for this distribution.
Mean: $\mu = np = 10 \cdot 0.25 = 2.5$
Variance: $\sigma^2 = np(1 - p) = 10 \cdot 0.25 \cdot 0.75 = 1.875$
Standard deviation: $\sigma = \sqrt{\sigma^2} = \sqrt{1.875} = 1.369$
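
A minimal sketch for checking the binomial results of problem 2.1 with the Python standard library. The helper name `binom_pmf` is introduced here for illustration only.

```python
from math import comb, sqrt

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.25  # 10 questions, 1 correct answer out of 4

p_exactly_5 = binom_pmf(5, n, p)
p_less_5 = sum(binom_pmf(k, n, p) for k in range(5))   # k = 0..4
p_more_5 = 1 - p_less_5 - p_exactly_5                  # k = 6..10

mean = n * p
variance = n * p * (1 - p)
std = sqrt(variance)

print(round(p_exactly_5, 4), round(p_less_5, 4), round(p_more_5, 4))
print(mean, variance, round(std, 3))
```
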

2.2. The selling prices for homes in a community follow a normal distribution with an average
price of $276k and a standard deviation of $32k. Based on this information, calculate the
probability that the next house in the community will sell for:
a) more than $206k;
b) less than $220k;
c) more than $250k, but less than $350k;
d) a price ranging between $276k and $325k.

Solution:
Given that the data is normally distributed, in order to determine the probabilities, it will be
necessary to compute the z-scores:

$$ z = \frac{x - \mu}{\sigma} $$

and compare them to the values in the normal distribution table.

a) more than $206k;

$$ z_{206} = \frac{206 - 276}{32} = \frac{-70}{32} = -2.19 $$
Table 1 in Appendix 1 shows the probability that the next house will be sold for less than $206k.
However, the problem asks for the probability that the next house will be sold for more than $206k,
which means it is necessary to subtract the probability in the table from 1.
$$ P(x > 206) = 1 - P(x < 206) = 1 - P(z < -2.19) = 1 - 0.0143 = 0.9857 $$
The probability that the next house in the community will sell for more than $206k is 98.57%.

b) less than $220k;

$$ z_{220} = \frac{220 - 276}{32} = \frac{-56}{32} = -1.75 $$
Table 1 in Appendix 1 shows the probability that the next house will be sold for less than $220k.
$$ P(x < 220) = P(z < -1.75) = 0.0401 $$
The probability that the next house in the community will sell for less than $220k is 4.01%.

c) more than $250k, but less than $350k;

$$ z_{250} = \frac{250 - 276}{32} = \frac{-26}{32} = -0.81 $$
$$ z_{350} = \frac{350 - 276}{32} = \frac{74}{32} = 2.31 $$

According to Table 1 in Appendix 1, the probability that the next house will be sold for less than
$250k is 0.2090.
In Table 2 in Appendix 1, we find the probability that the next house will be sold for less than
$350k, which is 0.9896.
By subtracting the probability that the next house will be sold for less than $250k from the
probability that the next house will be sold for less than $350k, what remains is the probability that
the selling price of the next house will be greater than $250k, but less than $350k:
$$ P(250 < x < 350) = P(x < 350) - P(x < 250) = 0.9896 - 0.2090 = 0.7806 $$
The probability that the next house in the community will sell for more than $250k, but less than
$350k is 78.06%.

d) a price ranging between $276k and $325k.

Because $276k is the mean price, the z-score for it will be equal to 0, which means there is a 0.5
probability that a given value will be less than it and a 0.5 probability that it will be greater.
$$ z_{325} = \frac{325 - 276}{32} = \frac{49}{32} = 1.53 $$
In Table 2 in Appendix 1, we find the probability that the next house will be sold for less than
$325k, which is 0.9370. So, in order to determine the probability that the next house will sell for
more than the average price, but less than $325k, it is necessary to subtract the probability that the
house will sell for less than the average price from the probability that it will sell for less than
$325k:
$$ P(276 < x < 325) = P(x < 325) - P(x < 276) = 0.9370 - 0.5000 = 0.4370 $$
The probability that the next house in the community will sell for more than $276k, but less than
$325k is 43.70%.
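
The four probabilities of problem 2.2 can be cross-checked without a printed table. The sketch below uses `statistics.NormalDist` from the Python standard library; because it does not round the z-scores to two decimals, its results may differ from the table-based ones in the third or fourth decimal.

```python
from statistics import NormalDist

# Selling prices in thousands of dollars: mean 276, standard deviation 32
prices = NormalDist(mu=276, sigma=32)

p_more_206 = 1 - prices.cdf(206)
p_less_220 = prices.cdf(220)
p_250_350 = prices.cdf(350) - prices.cdf(250)
p_276_325 = prices.cdf(325) - prices.cdf(276)

for label, value in [("> 206k", p_more_206), ("< 220k", p_less_220),
                     ("250k..350k", p_250_350), ("276k..325k", p_276_325)]:
    print(label, round(value, 4))
```
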
3. Sampling and Sampling Distributions

3.1. A 2015 report claims that in the EU the average annual milk consumption was 88.6 litres per
person, with a standard deviation of 26.9 litres per person.
a) If a random sample of 40 people is selected, what is the probability that their average annual
milk consumption is more than 80 litres per person, but less than 95 litres per person?
b) What is the probability that, for the same sample of 40 people, their average annual
consumption is between 85 and 90 litres per person? Compare the results with the ones
obtained previously and explain the difference.
c) Calculate the probability for the same limits (85 and 90 litres per person), but in case of a
sample of 60 people. What changes and how? Explain the differences.
d) Assuming that the sample mean resulted from the 60 people interviewed is 78 litres, how
likely is it that the true population mean is still 88.6 litres?
e) Knowing that the population in a community in 2015 was 7431 people, calculate the
probability for the same limits (85 and 90 litres per person) in a sample of 500 people.

Solution:
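a) If a random sample of 40 people is selected, what is the probability that their average
annual milk consumption is more than 80 litres per person, but less than 95 litres per
person?
$$ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{26.9}{\sqrt{40}} = 4.2533 $$
$$ z_{80} = \frac{80 - 88.6}{4.2533} = -2.02 \Rightarrow P(z < -2.02) = 0.0217 $$
$$ z_{95} = \frac{95 - 88.6}{4.2533} = 1.50 \Rightarrow P(z < 1.50) = 0.9332 $$
$$ P(80 < \bar{x} < 95) = P(z < 1.50) - P(z < -2.02) = 0.9332 - 0.0217 = 0.9115 $$
Based on a sample of 40 respondents, the probability that the average annual milk consumption
is more than 80 litres per person, but less than 95 litres per person is 91.15%.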

b) What is the probability that, for the same sample of 40 people, their average annual
consumption is between 85 and 90 litres per person? Compare the results with the
ones obtained previously and explain the difference.
$$ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{26.9}{\sqrt{40}} = 4.2533 $$
$$ z_{85} = \frac{85 - 88.6}{4.2533} = -0.85 \Rightarrow P(z < -0.85) = 0.1977 $$
$$ z_{90} = \frac{90 - 88.6}{4.2533} = 0.33 \Rightarrow P(z < 0.33) = 0.6293 $$
$$ P(85 < \bar{x} < 90) = P(z < 0.33) - P(z < -0.85) = 0.6293 - 0.1977 = 0.4316 $$
Based on a sample of 40 respondents, the probability that the average annual milk consumption
is more than 85 litres per person, but less than 90 litres per person is 43.16%.
Narrowing the range within which the mean might fall reduces the probability that the mean lies
within the given limits, because a value is more likely to be found within a wider interval than
within a narrower one.
c) Calculate the probability for the same limits (85 and 90 litres per person), but in case
of a sample of 60 people. What changes and how? Explain the differences.
$$ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{26.9}{\sqrt{60}} = 3.4728 $$
$$ z_{85} = \frac{85 - 88.6}{3.4728} = -1.04 \Rightarrow P(z < -1.04) = 0.1492 $$
$$ z_{90} = \frac{90 - 88.6}{3.4728} = 0.40 \Rightarrow P(z < 0.40) = 0.6554 $$
$$ P(85 < \bar{x} < 90) = P(z < 0.40) - P(z < -1.04) = 0.6554 - 0.1492 = 0.5062 $$
Based on a sample of 60 respondents, the probability that the average annual milk consumption
is more than 85 litres per person, but less than 90 litres per person is 50.62%.
Increasing the sample size reduces the standard error, so the probability that a sample mean lies
within a given interval around the population mean increases. With a larger sample, the sample mean
becomes more representative, which means less variation around the population mean and more
certainty that the result obtained is not due to chance.

d) Assuming that the sample mean resulted from the 60 people interviewed is 78 litres,
how likely is it that the true population mean is still 88.6 litres?
$$ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{26.9}{\sqrt{60}} = 3.4728 $$
$$ z_{78} = \frac{78 - 88.6}{3.4728} = -3.05 \Rightarrow P(z < -3.05) = 0.0011 $$
If the true mean of the population is 88.6 litres per person, then there is only a 0.11% probability
that a sample of 60 people would have an average annual milk consumption of 78 litres or less.
Such a sample mean would therefore be strong evidence that the true population mean is no longer
88.6 litres per person.

e) Knowing that the population in a community in 2015 was 7431 people, calculate the
probability for the same limits (85 and 90 litres per person) in a sample of 500 people.
Since the sample size is larger than 5% of the population (500/7431 = 0.0673 > 0.05), it is necessary
to use the finite population correction factor in computing the standard error of the mean:
$$ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} = \frac{26.9}{\sqrt{500}} \sqrt{\frac{7431 - 500}{7431 - 1}} = 1.2030 \cdot 0.9658 = 1.1619 $$
$$ z_{85} = \frac{85 - 88.6}{1.1619} = -3.10 \Rightarrow P(z < -3.10) = 0.0010 $$
$$ z_{90} = \frac{90 - 88.6}{1.1619} = 1.20 \Rightarrow P(z < 1.20) = 0.8849 $$
$$ P(85 < \bar{x} < 90) = P(z < 1.20) - P(z < -3.10) = 0.8849 - 0.0010 = 0.8839 $$
The probability that a sample of 500 respondents from this community has an average annual
consumption between 85 and 90 litres per person is 88.39%.
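
The calculations in problem 3.1 all follow the same pattern: compute the standard error of the mean (with the finite population correction when the sample exceeds 5% of the population), then take a difference of two normal probabilities. A sketch of that pattern follows; the function names `standard_error` and `prob_mean_between` are hypothetical helpers, and the results are unrounded, so they can differ slightly from values read from a z-table.

```python
from math import sqrt
from statistics import NormalDist

def standard_error(sigma, n, N=None):
    """Standard error of the mean; applies the finite population
    correction when the population size N is given and n/N > 5%."""
    se = sigma / sqrt(n)
    if N is not None and n / N > 0.05:
        se *= sqrt((N - n) / (N - 1))
    return se

def prob_mean_between(lo, hi, mu, sigma, n, N=None):
    """P(lo < sample mean < hi) under the normal sampling distribution."""
    dist = NormalDist(mu, standard_error(sigma, n, N))
    return dist.cdf(hi) - dist.cdf(lo)

mu, sigma = 88.6, 26.9  # litres per person per year

p_a = prob_mean_between(80, 95, mu, sigma, 40)
p_b = prob_mean_between(85, 90, mu, sigma, 40)
p_c = prob_mean_between(85, 90, mu, sigma, 60)
p_e = prob_mean_between(85, 90, mu, sigma, 500, N=7431)

print([round(p, 4) for p in (p_a, p_b, p_c, p_e)])
```

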
3.2. Quality control programs often establish control limits that are three standard deviations from
the target mean of a process. If the mean of a sample taken from the process is within the control
limits, the process is deemed satisfactory. A process is designed to fill bottles with 16 ounces of
soda with a standard deviation of 0.5 ounces. Determine the control limits above and below the
mean for this process using a sample size of n = 30.

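Solution:
$$ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.5}{\sqrt{30}} = 0.091 $$
$$ 3\sigma_{\bar{x}} = 3 \cdot 0.091 = 0.273 $$
Lower control limit:
$$ \mu - 3\sigma_{\bar{x}} = 16 - 0.273 = 15.727 $$
Upper control limit:
$$ \mu + 3\sigma_{\bar{x}} = 16 + 0.273 = 16.273 $$

If a 30-bottle sample is collected, the process is considered satisfactory if the sample mean is
between 15.727 ounces and 16.273 ounces.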

3.3. According to a 2011 study, 27.7% of high school students smoke. Random samples are
selected from a high school that has 632 students.
a) If a random sample of 31 students is selected, what is the probability that fewer than 10 of
the students smoke?
b) If a random sample of 60 students is selected, what is the probability that more than 19 of
the students smoke?
c) If a random sample of 110 students contains 20 smokers, does this result support the 2011
study?

Solution:
In order to use the normal approximation of the binomial distribution, it is necessary that the data
meets two criteria, checked here for the smallest sample (n = 31):
$$ np \geq 5: \quad 31 \cdot 0.277 = 8.587 \geq 5 $$
$$ n(1 - p) \geq 5: \quad 31 \cdot (1 - 0.277) = 22.413 \geq 5 $$
a) If a random sample of 31 students is selected, what is the probability that fewer than
10 of the students smoke?
Since the sample size is less than 5% of the population (31/632 = 0.049 < 0.05), it is not
necessary to use the finite population correction factor, so the standard error of the proportion
will be:
$$ \sigma_p = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{0.277 \cdot 0.723}{31}} = 0.0804 $$

In the sample of 31 students selected, 10 students represent a proportion of 10/31 = 0.3226, so
the z-score for this proportion will be:
$$ z_{0.3226} = \frac{p_s - p}{\sigma_p} = \frac{0.3226 - 0.277}{0.0804} = 0.57 $$
According to Table 2 in Appendix 1, the probability that, in a sample of 31 students, fewer
than 10 of them smoke is 0.7157.

b) If a random sample of 60 students is selected, what is the probability that more than
19 of the students smoke?
By increasing the sample size, the sample becomes more than 5% of the population (60/632 = 0.095),
which means it will be necessary to apply the finite population correction factor, so the standard
error of the proportion becomes:
$$ \sigma_p = \sqrt{\frac{p(1 - p)}{n}} \sqrt{\frac{N - n}{N - 1}} = \sqrt{\frac{0.277 \cdot 0.723}{60}} \sqrt{\frac{632 - 60}{632 - 1}} = 0.0550 $$

The proportion corresponding to 19 of the 60 students selected is 19/60 = 0.3167, so the z-score
will be:
$$ z_{0.3167} = \frac{p_s - p}{\sigma_p} = \frac{0.3167 - 0.277}{0.0550} = 0.72 $$
According to Table 2 in Appendix 1, the probability associated with a z-score of 0.72 is 0.7642,
which is the probability that fewer than 19 of the 60 students selected smoke. Therefore, the
probability that more than 19 students smoke will be 1 - 0.7642 = 0.2358.

c) If a random sample of 110 students contains 20 smokers, does this result support the
2011 study?
The sample size is again more than 5% of the population, so the new standard error of the
proportion will be:
$$ \sigma_p = \sqrt{\frac{p(1 - p)}{n}} \sqrt{\frac{N - n}{N - 1}} = \sqrt{\frac{0.277 \cdot 0.723}{110}} \sqrt{\frac{632 - 110}{632 - 1}} = 0.0388 $$

The proportion in the sample is 20/110 = 0.1818, so the z-score will be:
$$ z_{0.1818} = \frac{p_s - p}{\sigma_p} = \frac{0.1818 - 0.277}{0.0388} = -2.45 \Rightarrow P(z < -2.45) = 0.0071 $$
If the actual proportion of high school students who smoke is 27.7%, then there is only a 0.71%
chance that a sample of 110 students will include 20 or fewer smokers, therefore the results do
not support the study.
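
The same reasoning for proportions can be sketched in code. The helper `se_proportion` is a hypothetical name; it applies the finite population correction only when the sample exceeds 5% of the population, as in the solution. Probabilities are computed from unrounded z-scores, so they may differ slightly from the table-based values.

```python
from math import sqrt
from statistics import NormalDist

def se_proportion(p, n, N=None):
    """Standard error of a sample proportion, with the finite
    population correction when N is given and n/N > 5%."""
    se = sqrt(p * (1 - p) / n)
    if N is not None and n / N > 0.05:
        se *= sqrt((N - n) / (N - 1))
    return se

p, N = 0.277, 632          # claimed share of smokers, school size
std_normal = NormalDist()  # standard normal distribution

# a) fewer than 10 smokers out of 31
p_a = std_normal.cdf((10 / 31 - p) / se_proportion(p, 31, N))
# b) more than 19 smokers out of 60
p_b = 1 - std_normal.cdf((19 / 60 - p) / se_proportion(p, 60, N))
# c) 20 or fewer smokers out of 110
p_c = std_normal.cdf((20 / 110 - p) / se_proportion(p, 110, N))

print(round(p_a, 4), round(p_b, 4), round(p_c, 4))
```

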
4. Confidence Intervals

4.1. Data collected from a random sample of 35 teenagers showed an average of 7.3 hours of sleep
per night. The population standard deviation is 1.8 hours.
a) Calculate a 95% confidence interval for the mean;
b) Calculate a 98% confidence interval for the mean and explain the difference between the
confidence intervals obtained in a) and b);
c) A recent report claims that teenagers sleep an average of 7.8 hours per night. Based on the
result obtained in b), is this claim valid?
d) Calculate the minimum sample size needed to identify a 95% confidence interval for the
mean, assuming a 0.40 hour margin of error.

Solution:
a) Calculate a 95% confidence interval for the mean;
Confidence intervals are computed based on the sample average, the standard error of the mean
and the z-score associated with the given confidence level.
Since in this case the interest is to determine an interval centred on the sample mean, the probability
of a value being less than the mean must be equal to that of being more than the mean. Allowing
for a 5% chance that the true mean does not lie within the interval, but not specifying on which
side it could be, means it is necessary to account for half the chance on the right of the confidence
interval and half on its left. For this reason, the value that must be found in the table is not 0.95
(1 - 0.05), but 0.975 (1 - 0.025), so the z-score will be 1.96.
The standard error of the mean is computed using the central limit theorem, since n > 30 and σ is
known:
$$ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1.8}{\sqrt{35}} = 0.3043 $$

The margin of error is obtained by multiplying the z-score and the standard error of the mean:
$$ E = z \cdot \sigma_{\bar{x}} = 1.96 \cdot 0.3043 = 0.5964 $$
So the confidence interval will be defined by the following limits:
$$ \bar{x} - E = 7.3 - 0.5964 = 6.7036 $$
$$ \bar{x} + E = 7.3 + 0.5964 = 7.8964 $$

The actual average number of sleep hours per night for a teenager lies between 6.7 and 7.9 hours,
with a confidence level of 95%.

b) Calculate a 98% confidence interval for the mean and explain the difference in the
confidence intervals resulted in a) and b);
The same reasoning for determining the z-score applies here as well, so the value of z for a 98%
confidence level will be 2.33.
Since changing the confidence level does not affect the standard error of the mean, but affects the
margin of error, the former does not need to be computed again, but the latter does:
$$ E = z \cdot \sigma_{\bar{x}} = 2.33 \cdot 0.3043 = 0.7090 $$
So the confidence interval for a confidence level of 98% has the following limits:
$$ \bar{x} - E = 7.3 - 0.7090 = 6.5910 $$
$$ \bar{x} + E = 7.3 + 0.7090 = 8.0090 $$

In order to become more confident that the interval includes the true population mean, the
interval itself needs to be wider. Thus, the larger confidence level (98%) requires a wider
confidence interval.

c) A recent report claims that teenagers sleep an average of 7.8 hours per night. Based
on the result obtained in b), is this claim valid?
According to the results obtained in b), there is 98% confidence that the true mean of the population
lies within (6.59; 8.01) hours of sleep. Since the report's claim of 7.8 hours of sleep per night falls
within the confidence interval, the evidence in the sample supports the claim.

d) Calculate the minimum sample size needed to identify a 95% confidence interval for
the mean, assuming a 0.40 hour margin of error.
Given the margin of error, the population standard deviation and the confidence level, the sample
size can be determined based on the following formula:
$$ n = \left( \frac{z \cdot \sigma}{E} \right)^2 = \left( \frac{1.96 \cdot 1.8}{0.40} \right)^2 = 77.79 $$
In order to provide a 95% confidence interval with a margin of error of 0.4 hours of sleep per
night, a minimum sample size of 78 teenagers is required.
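
As a sketch, the z-based interval and the minimum sample size of problem 4.1 can be computed as follows. The function names are illustrative; the critical z-score is obtained from `NormalDist.inv_cdf` rather than from the printed table, so the limits agree with the hand calculation only up to rounding.

```python
from math import sqrt, ceil
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical z-score, e.g. about 1.96 for 0.95."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def mean_ci(xbar, sigma, n, confidence):
    """Confidence interval for the mean when sigma is known."""
    margin = z_critical(confidence) * sigma / sqrt(n)
    return xbar - margin, xbar + margin

def min_sample_size(sigma, margin, confidence):
    """Smallest n whose margin of error does not exceed `margin`."""
    return ceil((z_critical(confidence) * sigma / margin) ** 2)

lo95, hi95 = mean_ci(7.3, 1.8, 35, 0.95)
lo98, hi98 = mean_ci(7.3, 1.8, 35, 0.98)
n_min = min_sample_size(1.8, 0.40, 0.95)

print(round(lo95, 2), round(hi95, 2))
print(round(lo98, 2), round(hi98, 2))
print(n_min)
```
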

4.2. A random sample of 20 paperback novels averages 425.1 pages in length. Assuming that the
page count for all paperback novels is normally distributed with a standard deviation of 92.8 pages,
calculate:
a) A 95% confidence interval for the mean;
b) A 99% confidence interval for the mean;
c) The minimum sample size needed to identify a 98% confidence interval for the mean,
assuming a 52 pages margin of error.

Solution:
a) A 95% confidence interval for the mean;
The z-score for 95% confidence level is 1.96.
Given that the population is normally distributed, the sample means will also be normally
distributed, even though the sample is small (n = 20).
The standard error of the mean is computed using the same formula as before:
$$ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{92.8}{\sqrt{20}} = 20.75 $$

The margin of error is obtained by multiplying the z-score and the standard error of the mean:
$$ E = z \cdot \sigma_{\bar{x}} = 1.96 \cdot 20.75 = 40.67 $$
So the confidence interval will be defined by the following limits:
$$ \bar{x} - E = 425.1 - 40.67 = 384.43 $$
$$ \bar{x} + E = 425.1 + 40.67 = 465.77 $$

b) A 99% confidence interval for the mean;

The z-score for 99% confidence level is 2.57.
Changing the confidence level does not affect the standard error of the mean.
The new margin of error will be:
$$ E = z \cdot \sigma_{\bar{x}} = 2.57 \cdot 20.75 = 53.33 $$
So the confidence interval for a confidence level of 99% has the following limits:
$$ \bar{x} - E = 425.1 - 53.33 = 371.77 $$
$$ \bar{x} + E = 425.1 + 53.33 = 478.43 $$

c) The minimum sample size needed to identify a 98% confidence interval for the mean,
assuming a 52 pages margin of error.
$$ n = \left( \frac{z \cdot \sigma}{E} \right)^2 = \left( \frac{2.33 \cdot 92.8}{52} \right)^2 = 17.29 $$

In order to provide a 98% confidence interval with a margin of error of 52 pages, a minimum
sample size of 18 books is required.

4.3. The data below refers to the amount of trash generated by ten households (in pounds per day).
Assume that the population is normally distributed.
Pounds of Trash
3.9 4.6 15.6 10.5 16.0 6.7 12.0 9.2 13.8 16.8

a) Construct a 95% confidence interval for the mean based on the sample;
b) Construct a 90% confidence interval for the mean;
c) Construct a 80% confidence interval for the mean.

Solution:
a) A 95% confidence interval for the mean;
When the population standard deviation σ is unknown, the sample standard deviation s is used in
its place as an approximation. When you substitute s for σ, the Student's t-distribution (or, more
simply, the t-distribution) is used in lieu of the normal distribution.
When the sample size is less than 30, the population needs to be normally distributed when using
the t-distribution. When the sample size is 30 or more, the normal distribution can be used as an
approximation to the t-distribution, even if the population is not normally distributed.

In order to construct the confidence interval, it is first necessary to compute the sample mean and
standard deviation.
$$ \bar{x} = \frac{\sum x_i}{n} = \frac{109.1}{10} = 10.91 $$
$$ s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{201.909}{9}} = 4.7365 $$

The standard error of the mean is computed using the sample standard deviation:
$$ \sigma_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{4.7365}{\sqrt{10}} = 1.4978 $$

Rather than use z-scores to calculate the boundaries of the confidence interval, the critical
t-scores from Appendix 2 will be used. To identify the appropriate t-score, first locate the column
for the confidence level indicated by the problem (in this case, 95%).
To locate the appropriate row within that column, calculate the degrees of freedom (df) according
to the following formula: df = n - 1. In this problem, df = 10 - 1 = 9. In the table in Appendix 2,
the value at the intersection of that column and row is tc = 2.262.
The margin of error is obtained by multiplying the t-score and the standard error of the mean:
$$ E = t_c \cdot \sigma_{\bar{x}} = 2.262 \cdot 1.4978 = 3.39 $$
So the confidence interval will be defined by the following limits:
$$ \bar{x} - E = 10.91 - 3.39 = 7.52 $$
$$ \bar{x} + E = 10.91 + 3.39 = 14.30 $$

b) A 90% confidence interval for the mean;

The t-score for 90% confidence level is 1.833.
The new margin of error will be:
$$ E = t_c \cdot \sigma_{\bar{x}} = 1.833 \cdot 1.4978 = 2.75 $$
So the confidence interval for a confidence level of 90% has the following limits:
$$ \bar{x} - E = 10.91 - 2.75 = 8.16 $$
$$ \bar{x} + E = 10.91 + 2.75 = 13.66 $$

c) An 80% confidence interval for the mean;

The t-score for 80% confidence level is 1.383.
The new margin of error will be:
$$ E = t_c \cdot \sigma_{\bar{x}} = 1.383 \cdot 1.4978 = 2.07 $$
So the confidence interval for a confidence level of 80% has the following limits:
$$ \bar{x} - E = 10.91 - 2.07 = 8.84 $$
$$ \bar{x} + E = 10.91 + 2.07 = 12.98 $$
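
The three t-based intervals of problem 4.3 can be reproduced with the standard library. Since the standard library has no t-distribution, this sketch takes the critical values for df = 9 directly from the solution text (Appendix 2) instead of computing them.

```python
from math import sqrt
from statistics import mean, stdev

trash = [3.9, 4.6, 15.6, 10.5, 16.0, 6.7, 12.0, 9.2, 13.8, 16.8]

# Critical t-scores for df = 9, copied from the solution (Appendix 2)
t_critical = {0.95: 2.262, 0.90: 1.833, 0.80: 1.383}

xbar = mean(trash)
s = stdev(trash)              # sample standard deviation (n - 1)
se = s / sqrt(len(trash))     # standard error of the mean

intervals = {conf: (xbar - t * se, xbar + t * se)
             for conf, t in t_critical.items()}

for conf, (lo, hi) in intervals.items():
    print(f"{conf:.0%}: ({lo:.2f}; {hi:.2f})")
```
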

4.4. For a sample of 60 high school teachers an average annual salary of $52,113 and a sample
standard deviation of $7,804 were determined based on the data collected.
a) Construct a 97% confidence interval for the mean based on the sample;
b) Construct a 92% confidence interval for the mean.

Solution:
a) Construct a 97% confidence interval for the mean based on the sample;
The t-distribution should be used whenever the sample standard deviation s is used in place of the
population standard deviation σ. However, when the sample size reaches 30 or more, the t-score
values become very close to the z-score values from the normal distribution. Thus, the normal
distribution becomes a good approximation to the t-distribution when n ≥ 30.
Most published tables for the t-distribution only show probabilities up to a sample size of 30 and
then only in increments beyond this value. Because the normal distribution table does not depend
on sample size, it is more convenient to substitute the z-score for the t-score when n ≥ 30.

The standard error of the mean is computed using the sample standard deviation:
$$ \sigma_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{7{,}804}{\sqrt{60}} = 1{,}007.49 $$

The z-score for 97% confidence level is 2.17 (the table value closest to 0.985).
The margin of error is obtained by multiplying the z-score and the standard error of the mean:
$$ E = z \cdot \sigma_{\bar{x}} = 2.17 \cdot 1{,}007.49 = 2{,}186.25 $$
So the confidence interval will be defined by the following limits:
$$ \bar{x} - E = 52{,}113 - 2{,}186.25 = 49{,}926.75 $$
$$ \bar{x} + E = 52{,}113 + 2{,}186.25 = 54{,}299.25 $$

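b) A 92% confidence interval for the mean;

The z-score for 92% confidence level is 1.75 (the table value closest to 0.96).
The new margin of error will be:
$$ E = z \cdot \sigma_{\bar{x}} = 1.75 \cdot 1{,}007.49 = 1{,}763.11 $$
So the confidence interval for a confidence level of 92% has the following limits:
$$ \bar{x} - E = 52{,}113 - 1{,}763.11 = 50{,}349.89 $$
$$ \bar{x} + E = 52{,}113 + 1{,}763.11 = 53{,}876.11 $$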

4.5. A nationwide organization examines the salaries of its administrative assistants and
determines that the minimum and maximum annual salaries are $29,500 and $68,300, respectively.
What is the minimum sample size needed to construct a 95% confidence interval for the mean
salary, assuming a $1,500 margin of error?

Solution:
The critical z-score that corresponds to a 95% confidence interval is 1.96.
Consider the minimum sample size formula:

$$ n = \left( \frac{z \cdot \sigma}{E} \right)^2 $$

This equation requires knowing the population standard deviation. If σ is unknown, it can be
estimated using the range R of the population:

$$ \sigma \approx \frac{R}{6} $$

This estimate is based on the empirical rule, which states that 99.7% of a population's values lie
within three standard deviations of the mean: three standard deviations less than the mean and
three standard deviations greater than the mean, for a total of six.
So, the estimated standard deviation will be:
$$ \sigma \approx \frac{68{,}300 - 29{,}500}{6} = \frac{38{,}800}{6} = 6{,}466.67 $$

and the minimum required sample size will be:

$$ n = \left( \frac{1.96 \cdot 6{,}466.67}{1{,}500} \right)^2 = 71.40 $$
A minimum sample of 72 administrative assistant salaries is required.
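
A short sketch of the range-based estimate used above. The z-score is the value stated in the solution, and the variable names are illustrative.

```python
from math import ceil

z = 1.96                                 # critical z-score for 95% (from the text)
salary_min, salary_max = 29_500, 68_300  # annual salaries in dollars
margin = 1_500                           # required margin of error

# Empirical-rule estimate: the range spans about six standard deviations
sigma_est = (salary_max - salary_min) / 6
n_min = ceil((z * sigma_est / margin) ** 2)

print(round(sigma_est, 2), n_min)
```
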

4.6. A random sample of 415 women between the ages of 40 and 45 contains 330 mothers.
Construct a 92% confidence interval to estimate the true proportion of women in that age range
who are mothers.

Solution:
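Calculate the sample proportion:

$$ p_s = \frac{330}{415} = 0.7952 $$

Approximate the standard error of the proportion:
$$ \sigma_p = \sqrt{\frac{p_s (1 - p_s)}{n}} = \sqrt{\frac{0.7952 \cdot 0.2048}{415}} = 0.0198 $$

Calculate the margin of error for z-score = 1.75:
$$ E = z \cdot \sigma_p = 1.75 \cdot 0.0198 = 0.0347 $$
Calculate the boundaries of the confidence interval:
$$ p_s - E = 0.7952 - 0.0347 = 0.7605 $$
$$ p_s + E = 0.7952 + 0.0347 = 0.8299 $$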

4.7. Determine the minimum sample size required to construct a 92% confidence interval to
estimate the proportion of women between the ages of 35 and 45 who are mothers, assuming a
margin of error equal to 0.03.

Solution:
Because there is no information available to estimate the population proportion, use ps = 0.50 to
provide a conservative estimate of sample size.
Calculate the minimum sample size:
$$ n = p_s (1 - p_s) \left( \frac{z}{E} \right)^2 = 0.50 \cdot 0.50 \cdot \left( \frac{1.75}{0.03} \right)^2 = 850.69 $$
A sample size of 851 women in that age range will provide a 92% confidence
interval with a margin of error equal to 0.03.
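
The proportion interval of problem 4.6 and the sample size of problem 4.7 can be sketched as below. The z-score for 92% is the table value used in the solutions, not a computed one.

```python
from math import sqrt, ceil

z_92 = 1.75  # critical z-score for 92%, as used in the solutions

# Problem 4.6: confidence interval for a proportion
n, mothers = 415, 330
p_s = mothers / n
se = sqrt(p_s * (1 - p_s) / n)
margin = z_92 * se
ci = (p_s - margin, p_s + margin)

# Problem 4.7: conservative sample size with p_s = 0.50
target_margin = 0.03
n_min = ceil(0.50 * 0.50 * (z_92 / target_margin) ** 2)

print(round(ci[0], 4), round(ci[1], 4), n_min)
```

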
5. Simple random survey

5.1. A survey was carried out among the 2000 employees of a company. The sample of 300 persons
was obtained by using random selection without replacement. The summarised results for two
questions in the survey are presented in the table below. Starting from the data collected:
a) estimate the mean age of the employees for a confidence level of 95%;
b) if the selection had been made using random selection with replacement, what would have
been the minimum necessary sample size for the same margin of error and confidence
level?
c) if the margin of error were to be diminished by 25%, what would have been the minimum
necessary sample size for the same confidence level?
d) estimate the proportion of employees who have a good and very good opinion regarding
the activity of the union.
Q1. Age
[20-30) [30-40) [40-50) [50-60) [60-70) Total
20 40 160 70 10 300
Q2. What is your opinion regarding the activity of the union?
Very good Good Neither good, nor bad Bad Very bad Total
80 150 10 50 10 300

Solution:
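Age interval          [20-30)   [30-40)   [40-50)   [50-60)   [60-70)   Total
ni                    20        40        160       70        10        300
xi                    25        35        45        55        65        ---
xi*ni                 500       1400      7200      3850      650       13600
(xi - mean)^2 * ni    8268.89   4271.11   17.78     6541.11   3867.78   22966.67

a) estimate the mean age of the employees for a confidence level of 95%;
It is necessary to compute the sample mean:

$$ \bar{x} = \frac{\sum x_i n_i}{\sum n_i} = \frac{13600}{300} = 45.33 $$

And the standard deviation:
$$ s = \sqrt{\frac{\sum (x_i - \bar{x})^2 n_i}{n - 1}} = \sqrt{\frac{22966.67}{299}} = \sqrt{76.81} = 8.76 $$

Then, we verify if we need to apply the finite population correction factor:

$$ \frac{n}{N} = \frac{300}{2000} = 0.15 > 0.05 $$

The standard deviation of the population is unknown, so the sample standard deviation will be
used instead for computing the standard error of the mean, together with the finite population
correction factor:
$$ \sigma_{\bar{x}} = \frac{s}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} = \frac{8.76}{\sqrt{300}} \sqrt{\frac{2000 - 300}{2000 - 1}} = 0.4666 $$

The z-score for a confidence level of 95% is 1.96.
The margin of error will be:
$$ E = z \cdot \sigma_{\bar{x}} = 1.96 \cdot 0.4666 = 0.915 $$
So the confidence interval for the mean age will have the following limits:
$$ \bar{x} - E = 45.33 - 0.915 = 44.42 $$
$$ \bar{x} + E = 45.33 + 0.915 = 46.25 $$

The mean age of the 2000 employees of the company is at least 44.42 years, but not more than
46.25 years. The estimation has a confidence level of 95%.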

b) if the selection had been made using random selection with replacement, what would
have been the minimum necessary sample size for the same margin of error and
confidence level?
The minimum sample size for a simple random selection with replacement is computed based on
the formula:
$$ n = \frac{z^2 s^2}{E^2} = \frac{1.96^2 \cdot 76.81}{0.915^2} = 352.4 $$
A minimum sample of 353 employees would have been required.

c) if the margin of error were to be diminished by 25%, what would have been the
minimum necessary sample size for the same confidence level?
By diminishing the margin of error by 25%, the new margin of error will be:
$$ E' = (1 - 0.25) \cdot E = 0.75 \cdot 0.915 = 0.686 $$
Based on the new margin of error, the minimum required sample size for simple random selection
without replacement is:
$$ n = \frac{z^2 s^2}{E'^2 + \frac{z^2 s^2}{N}} = \frac{1.96^2 \cdot 76.81}{0.686^2 + \frac{1.96^2 \cdot 76.81}{2000}} = 477.4 $$
A minimum sample of 478 employees would have been required. By diminishing the margin of
error, the precision of the estimation increases, which means a larger sample will be required.

d) estimate the proportion of employees who have a good and very good opinion
regarding the activity of the union.
The proportion of respondents who had a good and very good opinion regarding the activity of the
union is:
$$ p_s = \frac{80 + 150}{300} = \frac{230}{300} = 0.7667 $$

The standard error of the proportion is:
$$ \sigma_p = \sqrt{\frac{p_s (1 - p_s)}{n}} \sqrt{\frac{N - n}{N - 1}} = \sqrt{\frac{0.7667 \cdot 0.2333}{300}} \sqrt{\frac{2000 - 300}{2000 - 1}} = 0.0225 $$

The margin of error is:
$$ E = z \cdot \sigma_p = 1.96 \cdot 0.0225 = 0.0441 $$
So the confidence interval for the proportion of employees with a good and very good opinion
regarding the activity of the union will have the following limits:
$$ p_s - E = 0.7667 - 0.0441 = 0.7226 $$
$$ p_s + E = 0.7667 + 0.0441 = 0.8108 $$
At least 72.26%, but not more than 81.08% of the employees have a good and very good
opinion about the activity of the union.
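
The steps of problem 5.1 for the mean age can be sketched as follows. The sketch assumes the sample variance is computed with n - 1 in the denominator and that the correction factor is sqrt((N - n)/(N - 1)); both are assumptions about the convention used, and intermediate values are not rounded.

```python
from math import sqrt, ceil

N, n, z = 2000, 300, 1.96          # population, sample, z for 95%
midpoints = [25, 35, 45, 55, 65]   # centres of the age intervals
counts = [20, 40, 160, 70, 10]

mean = sum(x * c for x, c in zip(midpoints, counts)) / n
# Assumption: sample variance with n - 1 in the denominator
s2 = sum((x - mean) ** 2 * c for x, c in zip(midpoints, counts)) / (n - 1)

# Standard error with the finite population correction (n/N = 0.15 > 0.05)
se = sqrt(s2 / n) * sqrt((N - n) / (N - 1))
margin = z * se

# b) selection with replacement, same margin of error
n_with_repl = ceil(z ** 2 * s2 / margin ** 2)

# c) selection without replacement, margin of error reduced by 25%
new_margin = 0.75 * margin
n_without_repl = ceil(z ** 2 * s2 / (new_margin ** 2 + z ** 2 * s2 / N))

print(round(mean, 2), round(margin, 3), n_with_repl, n_without_repl)
```

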
6. Stratified survey

6.1. A survey is to be carried out among the students of the CSIE Faculty in order to analyse the
quality of the learning process. Because the characteristics of each year of study are of interest,
a stratified survey will be used. The information necessary for applying the survey
plan was taken from the Faculty secretariat and is shown in the table below. Based on these data:
a) compute the minimum sample size required for a confidence level of 95% and a marginal
error of 3%;
b) because the sample size computed in a) incurs too high costs, it was proposed that it be
reduced by 10%. What is the marginal error that must be accepted in this case?
c) compute the sample size for each stratum starting from the total sample size computed in
b);
d) if the survey would be simple random without replacement, what would the minimum
necessary sample size be for the same confidence level and margin of error used in b)?
Year of study Number of students Students with no failed exams (%)
I 600 20
II 575 63
III 500 45
Master 450 86
Total 2125 ---

Solution:
Year of study   Ni     pi (%)   si^2 = pi(1-pi)   Ni*si^2   ni (proportional sample)   Ni*si    ni (optimal sample)   pi*Ni
I               600    20       0.1600            96.00     149                        240.00   137                   120.00
II              575    63       0.2331            134.03    143                        277.61   159                   362.25
III             500    45       0.2475            123.75    125                        248.75   142                   225.00
Master          450    86       0.1204            54.18     112                        156.14   90                    387.00
Total           2125   ---      ---               407.96    529                        922.50   528                   1094.25

a) compute the minimum sample size required for a confidence level of 95% and a
marginal error of 3%;
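The variance for the qualitative (binary) characteristic in stratum i is computed as follows:
\[ s_i^2 = p_i (1 - p_i) \]
The variance to be used for computing the minimum sample size is the mean of the stratum
variances, weighted by stratum size:
\[ \bar{s}^2 = \frac{\sum_i N_i s_i^2}{\sum_i N_i} = \frac{407.96}{2125} \approx 0.19 \]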

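The minimum sample size required for stratified sampling is:

\[ n = \frac{z^2 \bar{s}^2}{\Delta^2 + \frac{z^2 \bar{s}^2}{N}} = \frac{1.96^2 \cdot 0.19}{0.03^2 + \frac{1.96^2 \cdot 0.19}{2125}} = 586.98 \approx 587 \]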
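
The sample size calculation in a) can be checked with a short script. This is a minimal sketch, not part of the original solution: the variable names are illustrative, and rounding the mean variance to two decimals is an assumption made so that the result agrees with the figures used later in the solution (the unrounded variance, about 0.192, would give a slightly larger sample).

```python
import math

# Strata from the table: (number of students N_i, share with no failed exams p_i)
strata = [(600, 0.20), (575, 0.63), (500, 0.45), (450, 0.86)]

z = 1.96       # z-score for a 95% confidence level
delta = 0.03   # margin of error

N = sum(N_i for N_i, _ in strata)

# Size-weighted mean of the stratum variances p_i * (1 - p_i)
mean_var = sum(N_i * p_i * (1 - p_i) for N_i, p_i in strata) / N

# Assumption: the variance is rounded to two decimals before being used
mean_var_rounded = round(mean_var, 2)

# Minimum sample size with the finite population term z^2 * s^2 / N
n = z**2 * mean_var_rounded / (delta**2 + z**2 * mean_var_rounded / N)
n_min = math.ceil(n)   # sample sizes are rounded up

print(N, round(mean_var, 4), n_min)
```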

b) because the sample size computed in a) incurs too high costs, it was proposed that it
be reduced by 10%. What is the marginal error that must be accepted in this case?
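Reducing the sample size by 10 percent gives a new sample size of:
\[ n' = 0.9 \cdot n = 0.9 \cdot 587 = 528.3 \approx 529 \]
The new margin of error will be:
\[ \Delta' = z \sqrt{\frac{\bar{s}^2}{n'} \left(1 - \frac{n'}{N}\right)} = 1.96 \sqrt{\frac{0.19}{529} \left(1 - \frac{529}{2125}\right)} = 0.0322 \]

If the sample size is reduced by 10%, the margin of error that must be accepted is 3.22%.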
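
The margin of error in b) follows from solving the sample size formula for the error term. The sketch below is illustrative only: the function name `margin_of_error` is hypothetical, and the inputs (a mean stratum variance of 0.19 and an initial sample of 587) are assumed values consistent with the 529 respondents and the 3.22% reported in the solution.

```python
import math

def margin_of_error(z, variance, n, N):
    """Margin of error for sampling without replacement from a finite population."""
    return z * math.sqrt(variance / n * (1 - n / N))

n_initial = 587                          # assumed result of part a)
n_reduced = math.ceil(0.9 * n_initial)   # 528.3 rounded up to a whole respondent

delta_new = margin_of_error(z=1.96, variance=0.19, n=n_reduced, N=2125)
print(n_reduced, round(delta_new, 4))
```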

c) compute the sample size for each stratum starting from the total sample size
computed in b);
Stratum sample size for the simple stratified sample (equal allocation across the k = 4 strata):

\[ n_i = \frac{n}{k} = \frac{529}{4} = 132.25 \]

Since the total sample size is 529 persons, rounding each stratum sample size to the nearest
integer would lose one respondent, so three of the strata will have 132 respondents and the
fourth will have 133 respondents.

Stratum sample size for the proportional stratified sample:

\[ n_i = n \cdot \frac{N_i}{N} \]

The results are in the table at the beginning of the solution.
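
Proportional allocation rarely produces whole numbers, so a rounding rule is needed to keep the total at 529. The sketch below uses the largest-remainder rule; this is an assumption (the solution does not say how it rounded), and the function name `proportional_allocation` is hypothetical. With the data of this problem the rule gives the same values as the proportional column of the table.

```python
def proportional_allocation(n, sizes):
    """Split n across strata in proportion to their sizes.

    Uses the largest-remainder rule so that the parts sum exactly to n.
    """
    N = sum(sizes)
    # Integer part of n * N_i / N for each stratum
    parts = [n * N_i // N for N_i in sizes]
    # Fractional part of each quota, kept as an integer numerator over N
    remainders = [n * N_i % N for N_i in sizes]
    leftover = n - sum(parts)
    # Hand the leftover respondents to the strata with the largest remainders
    order = sorted(range(len(sizes)), key=lambda i: remainders[i], reverse=True)
    for i in order[:leftover]:
        parts[i] += 1
    return parts

sizes = [600, 575, 500, 450]   # years I, II, III and Master
allocation = proportional_allocation(529, sizes)
print(allocation, sum(allocation))
```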

d) if the survey would be simple random without replacement, what would the minimum
necessary sample size be for the same confidence level and margin of error used in
b)?
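For simple random sampling, it is necessary to calculate the total variance:
\[ s^2 = p(1 - p) = 0.5149 \, (1 - 0.5149) = 0.2498 \]
where
\[ p = \frac{\sum_i p_i N_i}{\sum_i N_i} = \frac{1094.25}{2125} = 0.5149 \]

So the minimum sample size for a simple random selection without replacement will be:

\[ n = \frac{z^2 s^2}{\Delta^2 + \frac{z^2 s^2}{N}} = \frac{1.96^2 \cdot 0.2498}{0.0322^2 + \frac{1.96^2 \cdot 0.2498}{2125}} = 644.7 \approx 645 \]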

For a survey with a sample selected using simple random sampling, producing an estimate
with a confidence level of 95% and a margin of error of 3.22% would require 116 more
respondents than the stratified design (645 instead of 529).
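
The comparison in d) can be reproduced as follows. This is a sketch under the same assumptions as the solution (z = 1.96, margin of error 0.0322, stratified sample of 529); the variable names are illustrative.

```python
import math

sizes = [600, 575, 500, 450]       # N_i for years I, II, III and Master
shares = [0.20, 0.63, 0.45, 0.86]  # p_i, share of students with no failed exams
z, delta = 1.96, 0.0322
n_stratified = 529                 # sample size used in b) and c)

N = sum(sizes)
# Overall proportion: size-weighted mean of the stratum proportions
p = sum(N_i * p_i for N_i, p_i in zip(sizes, shares)) / N
# Total variance of the binary characteristic
total_var = p * (1 - p)

# Minimum sample size for simple random sampling without replacement
n_srs = math.ceil(z**2 * total_var / (delta**2 + z**2 * total_var / N))

print(round(p, 4), round(total_var, 4), n_srs, n_srs - n_stratified)
```

The total variance (about 0.25) is larger than the mean within-stratum variance (about 0.19), which is why the simple random sample has to be larger for the same precision.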
7. Hypothesis testing

7.1. According to a claim, the average SAT math score for graduating high school students in the
state of Virginia has recently exceeded 500. A sample of 70 students from Virginia had an average
SAT math score of 530. Assuming that the population standard deviation for Virginia students'
math SAT scores is 125, solve the following tasks:
a) Test the claim at the α = 0.05 significance level;
b) Verify the previous answer by comparing the sample mean to the critical sample mean;
c) Verify the answer in a) by comparing the p-value to the level of significance α = 0.05.

Solution:

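a) Test the claim at the α = 0.05 significance level;

The data to be used for testing the hypothesis:
the value against which we test: a = 500
sample size: n = 70
sample mean: x̄ = 530
population standard deviation: σ = 125

1) Identifying the hypotheses:
\[ H_0: \mu \le 500 \]
\[ H_1: \mu > 500 \]
2) Defining the critical region(s):
[Figure: standard normal curve. Do not reject H0 to the left of the critical value z_α = 1.645; reject H0 to the right of it.]
3) Computing the test statistic:
\[ z_{calc} = \frac{\bar{x} - a}{\sigma / \sqrt{n}} = \frac{530 - 500}{125 / \sqrt{70}} = 2.01 \]
4) Decision rule:
\[ z_{calc} = 2.01 > z_{\alpha} = 1.645 \Rightarrow \text{reject } H_0 \]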
5) Final conclusion (interpretation):
There is sufficient evidence to support the claim that the average SAT math score of Virginia
students has recently exceeded 500.

b) Verify the previous answer by comparing the sample mean to the critical sample
mean;
The critical sample mean is the sum of the population mean and the product of the critical z-score
and the standard error of the mean. In order to reject the null hypothesis, the sample mean needs
to be greater than the critical sample mean.

\[ \bar{x}_c = a + z_{\alpha} \frac{\sigma}{\sqrt{n}} = 500 + 1.645 \cdot \frac{125}{\sqrt{70}} = 524.58 \]

Because the sample mean is greater than the critical sample mean, the null hypothesis is
rejected, which supports the conclusion reached in a).

c) Verify the answer in a) by comparing the p-value to the level of significance α = 0.05.
Calculate the probability that a random sample has a mean that is z = 2.01 standard errors or
more above the hypothesized population mean:
\[ p\text{-value} = P(z > 2.01) = 1 - P(z \le 2.01) = 1 - 0.9778 = 0.0222 \]
The null hypothesis is rejected when the level of significance is greater than or equal to 0.0222.
Here, 0.05 > 0.0222, so the null hypothesis is rejected.
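
The three approaches of problem 7.1 (test statistic, critical sample mean, p-value) can be checked together. The sketch below uses `statistics.NormalDist` from the Python standard library in place of the z-table; because it does not round z to two decimals, the p-value and the critical mean may differ from the table-based figures in the last digit.

```python
from math import sqrt
from statistics import NormalDist

a, n, x_bar, sigma, alpha = 500, 70, 530, 125, 0.05
std_normal = NormalDist()                  # standard normal distribution

se = sigma / sqrt(n)                       # standard error of the mean
z_calc = (x_bar - a) / se                  # test statistic
z_crit = std_normal.inv_cdf(1 - alpha)     # right-tail critical value
x_crit = a + z_crit * se                   # critical sample mean
p_value = 1 - std_normal.cdf(z_calc)       # area to the right of z_calc

# The three decision rules are equivalent and must agree
reject_by_z = z_calc > z_crit
reject_by_mean = x_bar > x_crit
reject_by_p = p_value < alpha

print(round(z_calc, 2), round(z_crit, 3), round(x_crit, 2), round(p_value, 4))
print(reject_by_z, reject_by_mean, reject_by_p)
```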

7.2. A professor claims the average class size at a university is greater than 35 students because a
random sample of 18 classes contained an average of 38.1 students. Assuming that the class size
distribution is normal with a population standard deviation of 7.6 students, solve the following
tasks:
a) Test the claim at the α = 0.01 significance level;
b) Verify the previous answer by comparing the sample mean to the critical sample mean;
c) Verify the answer in a) by comparing the p-value to the level of significance α = 0.01.

Solution:
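a) Test the claim at the α = 0.01 significance level;
The data to be used for testing the hypothesis:
the value against which we test: a = 35
sample size: n = 18
sample mean: x̄ = 38.1
population standard deviation: σ = 7.6

1) Identifying the hypotheses:
\[ H_0: \mu \le 35 \]
\[ H_1: \mu > 35 \]
2) Defining the critical region(s):
[Figure: standard normal curve. Do not reject H0 to the left of the critical value z_α = 2.33; reject H0 to the right of it.]
3) Computing the test statistic:
\[ z_{calc} = \frac{\bar{x} - a}{\sigma / \sqrt{n}} = \frac{38.1 - 35}{7.6 / \sqrt{18}} = 1.73 \]
4) Decision rule:
\[ z_{calc} = 1.73 < z_{\alpha} = 2.33 \Rightarrow \text{do not reject } H_0 \]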
5) Final conclusion (interpretation):
There is not sufficient evidence to support the claim that the average class size at a university
is greater than 35 students.

b) Verify the previous answer by comparing the sample mean to the critical sample
mean;
\[ \bar{x}_c = a + z_{\alpha} \frac{\sigma}{\sqrt{n}} = 35 + 2.33 \cdot \frac{7.6}{\sqrt{18}} = 39.17 \]

Because the sample mean is less than the critical sample mean, the null hypothesis is not
rejected, which supports the conclusion reached in a).

c) Verify the answer in a) by comparing the p-value to the level of significance α = 0.01.
Calculate the probability that a random sample has a mean that is z = 1.73 standard errors or
more above the hypothesized population mean:
\[ p\text{-value} = P(z > 1.73) = 1 - P(z \le 1.73) = 1 - 0.9582 = 0.0418 \]
The null hypothesis is rejected when the level of significance is greater than or equal to 0.0418.
Here, 0.01 < 0.0418, so the null hypothesis is not rejected.

7.3. An auditor claims that the average annual salary of a project manager at a construction
company exceeds $82,000. A random sample of 20 project managers had an average salary of
$89,600, with a sample standard deviation of $12,700. Assuming that the salaries of the managers
are normally distributed, solve the following tasks:
a) Test the claim at the α = 0.01 significance level;
b) Verify the previous answer by comparing the sample mean to the critical sample mean;

Solution:
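a) Test the claim at the α = 0.01 significance level;
The data to be used for testing the hypothesis:
the value against which we test: a = 82,000
sample size: n = 20
sample mean: x̄ = 89,600
sample standard deviation: s = 12,700

Because the population standard deviation is unknown and the sample is small (n < 30), the
Student t distribution with n - 1 = 19 degrees of freedom is used.

1) Identifying the hypotheses:
\[ H_0: \mu \le 82{,}000 \]
\[ H_1: \mu > 82{,}000 \]
2) Defining the critical region(s):
[Figure: Student t curve with 19 degrees of freedom. Do not reject H0 to the left of the critical value t_{0.01;19} = 2.539; reject H0 to the right of it.]
3) Computing the test statistic:
\[ t_{calc} = \frac{\bar{x} - a}{s / \sqrt{n}} = \frac{89{,}600 - 82{,}000}{12{,}700 / \sqrt{20}} = 2.68 \]
4) Decision rule:
\[ t_{calc} = 2.68 > t_{0.01;19} = 2.539 \Rightarrow \text{reject } H_0 \]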
5) Final conclusion (interpretation):
There is sufficient evidence to support the claim that the average annual salary of a project
manager at a construction company exceeds $82,000.

b) Verify the previous answer by comparing the sample mean to the critical sample
mean;
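\[ \bar{x}_c = a + t_{\alpha;n-1} \frac{s}{\sqrt{n}} = 82{,}000 + 2.539 \cdot \frac{12{,}700}{\sqrt{20}} = 89{,}210.27 \]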

Because the sample mean is greater than the critical sample mean, the null hypothesis is
rejected, which supports the conclusion reached in a).
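
For problem 7.3 the same check can be done with the t statistic. The Python standard library has no Student t distribution, so this sketch takes the critical value 2.539 (α = 0.01, 19 degrees of freedom) as a constant read from the table of critical values rather than computing it.

```python
from math import sqrt

a, n, x_bar, s = 82_000, 20, 89_600, 12_700
t_crit = 2.539   # t(0.01; 19), read from the Student table, not computed here

se = s / sqrt(n)              # estimated standard error of the mean
t_calc = (x_bar - a) / se     # test statistic
x_crit = a + t_crit * se      # critical sample mean

print(round(t_calc, 2), round(x_crit, 2))
print("reject H0" if t_calc > t_crit else "do not reject H0")
```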

7.4. A wireless phone company claims that its customers' cell phone bills average less than $100
per month. A random sample of 75 customers reported an average monthly bill of $94.25 with a
sample standard deviation of $17.38.
a) Test the claim at the α = 0.05 significance level;
b) Verify the previous answer by comparing the sample mean to the critical sample mean;
c) Verify the answer in a) by comparing the p-value to the level of significance α = 0.05.

Solution:
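a) Test the claim at the α = 0.05 significance level;
The data to be used for testing the hypothesis:
the value against which we test: a = 100
sample size: n = 75
sample mean: x̄ = 94.25
sample standard deviation: s = 17.38

1) Identifying the hypotheses:
\[ H_0: \mu \ge 100 \]
\[ H_1: \mu < 100 \]
2) Defining the critical region(s):
[Figure: standard normal curve. Reject H0 to the left of the critical value -z_α = -1.645; do not reject H0 to the right of it.]
3) Computing the test statistic:
\[ z_{calc} = \frac{\bar{x} - a}{s / \sqrt{n}} = \frac{94.25 - 100}{17.38 / \sqrt{75}} = -2.87 \]
4) Decision rule:
\[ z_{calc} = -2.87 < -z_{\alpha} = -1.645 \Rightarrow \text{reject } H_0 \]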
5) Final conclusion (interpretation):
There is sufficient evidence to support the claim that the average cell phone bill is less than $100
per month.

b) Verify the previous answer by comparing the sample mean to the critical sample
mean;
\[ \bar{x}_c = a + (-z_{\alpha}) \frac{s}{\sqrt{n}} = 100 + (-1.645) \cdot \frac{17.38}{\sqrt{75}} = 96.70 \]

Because the sample mean is less than the critical sample mean, the null hypothesis is rejected,
which supports the conclusion reached in a).

c) Verify the answer in a) by comparing the p-value to the level of significance α = 0.05.
Calculate the probability that a random sample has a mean that is 2.87 standard errors or
more below the hypothesized population mean (z = -2.87):
\[ p\text{-value} = P(z < -2.87) = 0.0021 \]
The null hypothesis is rejected when the level of significance is greater than or equal to 0.0021.
Here, 0.05 > 0.0021, so the null hypothesis is rejected.
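
In a left-tailed test such as 7.4, the critical value is negative and the p-value is the area to the left of the test statistic. A minimal sketch, with a hypothetical helper name `left_tailed_z_test`; since n = 75 is large, the sample standard deviation is used in place of σ, as in the solution.

```python
from math import sqrt
from statistics import NormalDist

def left_tailed_z_test(a, n, x_bar, s, alpha):
    """Large-sample z-test of H0: mu >= a against H1: mu < a."""
    std_normal = NormalDist()
    se = s / sqrt(n)                      # standard error of the mean
    z_calc = (x_bar - a) / se             # test statistic (negative here)
    z_crit = std_normal.inv_cdf(alpha)    # negative critical value, -z_alpha
    x_crit = a + z_crit * se              # critical sample mean
    p_value = std_normal.cdf(z_calc)      # area to the left of z_calc
    return z_calc, z_crit, x_crit, p_value

z_calc, z_crit, x_crit, p_value = left_tailed_z_test(
    a=100, n=75, x_bar=94.25, s=17.38, alpha=0.05
)
print(round(z_calc, 2), round(z_crit, 3), round(x_crit, 2), round(p_value, 4))
print("reject H0" if z_calc < z_crit else "do not reject H0")
```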

7.5. A researcher claims that the proportion of U.S. households with at least one pet is not equal
to 70%. A random sample of 120 households contained 90 that owned at least one pet.
a) Test the claim at the α = 0.10 significance level;
b) Verify the previous answer by comparing the sample proportion to the critical sample proportions;
c) Verify the answer in a) by comparing the p-value to the level of significance α = 0.10.

Solution:
a) Test the claim at the α = 0.10 significance level;
The data to be used for testing the hypothesis:
the value against which we test: a = 0.70
sample size: n = 120
number fulfilling the condition: m = 90
sample proportion: p = m / n = 90 / 120 = 0.75
standard deviation under H0: √(a(1 - a)) = √(0.70 · 0.30) = 0.4583

1) Identifying the hypotheses:
\[ H_0: \pi = 0.70 \]
\[ H_1: \pi \ne 0.70 \]
2) Defining the critical region(s):
Given that in this case the claim only asserts that the proportion is different, with no indication
of the direction (greater or less than), there will be two critical regions, one for each of the
two possibilities. The two critical regions have the same probability, so the significance level
must be divided in half, with α/2 = 0.05 in the right tail and α/2 = 0.05 in the left tail.
[Figure: standard normal curve. Reject H0 to the left of -z_{α/2} = -1.645 and to the right of z_{α/2} = 1.645; do not reject H0 between the two critical values.]
3) Computing the test statistic:
\[ z_{calc} = \frac{p - a}{\sqrt{a(1 - a)/n}} = \frac{0.75 - 0.70}{\sqrt{0.70 \cdot 0.30 / 120}} = 1.20 \]

4) Decision rule:
\[ |z_{calc}| = 1.20 < z_{\alpha/2} = 1.645 \Rightarrow \text{do not reject } H_0 \]
5) Final conclusion (interpretation):
There is not sufficient evidence to support the claim that the proportion of U.S. households with
at least one pet is not equal to 70%.

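b) Verify the previous answer by comparing the sample proportion to the critical sample
proportions;
Since the null hypothesis can be rejected in two regions, it is necessary to compute two critical
proportions.
\[ p_{c,lower} = a + (-z_{\alpha/2}) \sqrt{\frac{a(1 - a)}{n}} = 0.70 + (-1.645) \sqrt{\frac{0.70 \cdot 0.30}{120}} = 0.6312 \]

\[ p_{c,upper} = a + z_{\alpha/2} \sqrt{\frac{a(1 - a)}{n}} = 0.70 + 1.645 \sqrt{\frac{0.70 \cdot 0.30}{120}} = 0.7688 \]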

Because the sample proportion (0.75) is between the two critical sample proportions, the null
hypothesis is not rejected, which supports the conclusion reached in a).

c) Verify the answer in a) by comparing the p-value to the level of significance α = 0.10.
Because the test is two-tailed, calculate the probability that a random sample has a proportion
that is 1.20 standard errors or more away from the hypothesized proportion, in either direction:
\[ p\text{-value} = 2 \cdot P(z > 1.20) = 2 \, (1 - P(z < 1.20)) = 2 \, (1 - 0.8849) = 0.2302 \]
The null hypothesis is rejected when the level of significance is greater than or equal to 0.2302.
Here, 0.10 < 0.2302, so the null hypothesis is not rejected.
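
The two-tailed proportion test of problem 7.5 can be sketched as below. The standard error is computed from the hypothesized proportion a = 0.70, which is the choice consistent with the p-value 0.2302 in the solution. Because z is not rounded to 1.20 here, the computed p-value is slightly larger than the table-based one.

```python
from math import sqrt
from statistics import NormalDist

a, n, m, alpha = 0.70, 120, 90, 0.10
std_normal = NormalDist()

p_hat = m / n                                  # sample proportion
se = sqrt(a * (1 - a) / n)                     # standard error under H0
z_calc = (p_hat - a) / se                      # test statistic
z_crit = std_normal.inv_cdf(1 - alpha / 2)     # alpha is split between two tails

# Critical sample proportions on either side of the hypothesized value
lower, upper = a - z_crit * se, a + z_crit * se

# Two-tailed p-value: twice the area beyond |z_calc|
p_value = 2 * (1 - std_normal.cdf(abs(z_calc)))

print(round(z_calc, 2), round(lower, 4), round(upper, 4), round(p_value, 4))
print("reject H0" if abs(z_calc) > z_crit else "do not reject H0")
```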
Practice Problems

1. A process fills boxes with a mean of 340 grams of cereal, with a standard deviation of 20
grams. Assuming the probability distribution for this population is unknown, if a store purchases
600 boxes of cereal, what is the probability that a sample of 50 boxes from the order will average
less than 336 grams?

2. A nationwide poll claims that the president of Romania has less than a 64% approval rating.
In a random sample of 125 people, 74 people gave the president a positive approval rating. Test
the claim at the α = 0.02 significance level and verify your answer using the p-value.

3. A researcher claims that the average college student spends more than 16 hours on the Internet
per month. A random sample of 60 college students spent an average of 17.3 hours online per
month, with a sample standard deviation of 5.3 hours. Test the claim at the α = 0.02 significance
level. Verify your answer by comparing the p-value and the significance level.

4. A computer company claims its laptop batteries average more than 3.5 hours of use per
charge. A sample of 45 batteries lasted an average of 3.72 hours. Assuming the population standard
deviation is 0.7 hours, test the company's claim at the α = 0.10 significance level and at the α = 0.01
significance level. Verify your answer by comparing the p-value to the two levels of significance.

5. A study conducted in 2010 reported 71.3% of men between the ages of 45 and 54 are
considered overweight.
a. If a random sample of 90 men in this age group is selected, what is the probability that
more than 70% of them will be overweight?
b. A recent sample of 22 men from this age group included 18 who were considered
overweight. Is this sufficient evidence to conclude that the proportion of overweight
men from this age group is still 71.3%?

6. 20 undergraduate students in a random sample worked an average of 13.5 hours per week for
a university. Assuming the population is normally distributed with a standard deviation of 5 hours
per week, test the claim that the average student works less than 15 hours per week at the α = 0.02
significance level. Verify your answer by comparing the p-value and the significance level.

7. A survey was carried out among the visitors of a tourism fair. The sample of 400 respondents
was obtained by simple random sampling. The summarised results for two of the questions are
presented in the table below. Based on the collected data and for a confidence level of 95%:
a. estimate the mean age of the visitors;
b. compute the sample size necessary for diminishing the margin of error by 25%;
c. compute the margin of error to be accepted if reducing the sample size by 20%;
d. estimate the proportion of visitors who have a good or very good opinion regarding
accommodation conditions.
Q1. Age
Less than 30 [30-40) [40-50) [50-60) 60 and more Total
60 100 160 70 10 400
Q2. What is your opinion regarding the accommodation conditions?
Very good Good None Bad Very bad Total
70 190 30 50 60 400

8. An insurance company claims that the average automobile on the road today is less than 6
years old. A random sample of 15 cars had an average age of 5.4 years with a sample standard
deviation of 1.1 years. Assuming the population is normally distributed, test the claim at the α = 0.05
significance level. Verify your answer by comparing the p-value and the significance level.

9. A study claims that the average annual tuition for private high schools is more than $7,000.
A random sample of 55 private high schools had an average annual tuition of $7,225 and a sample
standard deviation of $1,206. Test the claim at the α = 0.10 significance level and verify your
answer using the p-value.

10. A survey with the aim of analysing the working environment will be carried out among the
1010 employees of an institution. Because the management is interested in the analysis at
department level, it was decided that the survey will be stratified. The information necessary for
the stratified sampling plan was taken from the human resources department and is presented in
the table below. For a confidence level of 95%:
a. calculate the sample size, given a margin of error of 3% for the proportion of employees with
higher education and of 200 lei for the mean salary;
b. because the sample size computed in a) incurs too high costs, it was proposed that it be
reduced by 20%. What is the marginal error that must be accepted in this case for each
of the two variables of interest?
c. compute the sample size for each stratum starting from the total sample size computed
in b);
d. if the survey would be simple random without replacement, what would the minimum
necessary sample size be for the same confidence level and margin of error used in b)?
Department Employees Employees with higher education Average salary Variance
I 80 92% 5764 456
II 190 63% 1552 100
III 110 75% 2556 215
IV 150 86% 3724 324
V 260 15% 1200 89
VI 220 19% 1325 111
11. A government bureau claims that more than 50% of U.S. tax returns were filed electronically
last year. A random sample of 150 tax returns for last year contained 86 that were filed
electronically. Test the claim at the α = 0.05 significance level and verify your answer using the
p-value.
Appendix 1
Table 1. Probabilities for the normal distribution for z-score values less than zero
Table 2. Probabilities for the normal distribution for z-score values greater than zero
Appendix 2
Critical values for the Student distribution
