
Chapter 1 Review of Basic Concepts

2. SAMPLING AND SAMPLING DISTRIBUTION

SAMPLING DISTRIBUTION OF SAMPLE MEANS


Suppose we have a population X = {1, 2, 3, 4, 5}. The population mean is $\mu = 3$ and the population standard
deviation is $\sigma = \sqrt{2} \approx 1.414$. (Verify this!) If we take samples of size two with replacement, the
following table lists all possible samples. Notice that the mean of the sample means = 3 = population mean. The
standard deviation of the sample means (usually called the standard error) is 1, which is equal to “the population
standard deviation divided by the square root of the sample size”, i.e., $\sqrt{2}/\sqrt{2} = 1$. (Verify this!)
The picture below is the frequency histogram of the sample means. Does it look like a bell-shaped distribution?

x1    x2    $\bar{x}$    $\mu$    $(\bar{x} - \mu)^2$
1     1     1            3        4
1     2     1.5          3        2.25
1     3     2            3        1
1     4     2.5          3        0.25
1     5     3            3        0
2     1     1.5          3        2.25
2     2     2            3        1
2     3     2.5          3        0.25
2     4     3            3        0
2     5     3.5          3        0.25
3     1     2            3        1
3     2     2.5          3        0.25
3     3     3            3        0
3     4     3.5          3        0.25
3     5     4            3        1
4     1     2.5          3        0.25
4     2     3            3        0
4     3     3.5          3        0.25
4     4     4            3        1
4     5     4.5          3        2.25
5     1     3            3        0
5     2     3.5          3        0.25
5     3     4            3        1
5     4     4.5          3        2.25
5     5     5            3        4
Sum =       75           75       25

[Figure: frequency histogram of the sample means. Horizontal axis: Sample Means (1.0 to 5.0); vertical axis: Frequency (0 to 5).]
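
The table can also be checked by brute force. The following Python sketch (the variable names are illustrative and not part of the notes) enumerates all 25 ordered samples of size two drawn with replacement, then recomputes the mean of the sample means, the sum of squared deviations, the standard error, and the histogram frequencies.

```python
from collections import Counter
from itertools import product
from math import sqrt

population = [1, 2, 3, 4, 5]

# All 5**2 = 25 ordered samples of size two, drawn with replacement.
samples = list(product(population, repeat=2))
sample_means = [sum(s) / len(s) for s in samples]

# Mean of the sample means.
mean_of_means = sum(sample_means) / len(sample_means)

# Standard error: standard deviation of the sample means, dividing by the
# number of samples because every possible sample is listed.
sum_sq_dev = sum((m - mean_of_means) ** 2 for m in sample_means)
standard_error = sqrt(sum_sq_dev / len(sample_means))

# Frequency of each distinct sample mean (the heights of the histogram bars).
frequencies = Counter(sample_means)

print(mean_of_means, sum_sq_dev, standard_error)
for value in sorted(frequencies):
    print(value, frequencies[value])
```

The frequencies rise from 1 at a sample mean of 1.0 to 5 at 3.0 and fall back to 1 at 5.0, which is the triangular shape seen in the histogram.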

If we take a sample of size three with replacement, the total number of possible samples = $5^3 = 125$. Following
is the list of all possible sample means:

1.00 1.33 1.67 2.00 2.33 1.33 1.67 2.00 2.33 2.67 1.67 2.00 2.33 2.67 3.00 2.00 2.33 2.67 3.00 3.33 2.33
2.67 3.00 3.33 3.67 1.33 1.67 2.00 2.33 2.67 1.67 2.00 2.33 2.67 3.00 2.00 2.33 2.67 3.00 3.33 2.33 2.67
3.00 3.33 3.67 2.67 3.00 3.33 3.67 4.00 1.67 2.00 2.33 2.67 3.00 2.00 2.33 2.67 3.00 3.33 2.33 2.67 3.00
3.33 3.67 2.67 3.00 3.33 3.67 4.00 3.00 3.33 3.67 4.00 4.33 2.00 2.33 2.67 3.00 3.33 2.33 2.67 3.00 3.33
3.67 2.67 3.00 3.33 3.67 4.00 3.00 3.33 3.67 4.00 4.33 3.33 3.67 4.00 4.33 4.67 2.33 2.67 3.00 3.33 3.67
2.67 3.00 3.33 3.67 4.00 3.00 3.33 3.67 4.00 4.33 3.33 3.67 4.00 4.33 4.67 3.67 4.00 4.33 4.67 5.00

The mean of these sample means = 3 = population mean, and the standard error of these sample means =
0.816497 = $\sigma/\sqrt{3} = \sqrt{2/3}$. The distribution of the sample means is shown in the next figure.

You can verify the following (although it may be tedious to do so). If samples of size four were taken with
replacement, then there would be $5^4 = 625$ different possible samples. The mean of these sample means = 3 and
the standard error of these sample means = $\sqrt{2}/\sqrt{4} = 1/\sqrt{2} \approx 0.7071$.
From these examples, we observed the following:

1. The distribution of the sample means is approximately bell-shaped (normal).


2. Regardless of the sample size, the mean of the sample means is always equal to the population mean.
3. The standard error of the sample mean = the population standard deviation divided by the square root of
the sample size, i.e., $\sigma_{\bar{x}} = \sigma/\sqrt{n}$. Hence, as the sample size increases, the standard
error of the sample means decreases. That is, as the sample size increases, the sample means cluster more
closely around the population mean.
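
Observations 2 and 3 can be checked for several sample sizes at once. The sketch below is an illustration only (the helper name `sampling_summary` is made up for this example). It enumerates every ordered sample of size n drawn with replacement and prints the resulting standard error next to $\sigma/\sqrt{n}$.

```python
from itertools import product
from math import sqrt

def sampling_summary(population, n):
    """Return (mean of sample means, standard error) over all ordered
    samples of size n drawn with replacement."""
    means = [sum(s) / n for s in product(population, repeat=n)]
    centre = sum(means) / len(means)
    se = sqrt(sum((m - centre) ** 2 for m in means) / len(means))
    return centre, se

population = [1, 2, 3, 4, 5]
mu = sum(population) / len(population)
# Population standard deviation (divide by N, not N - 1).
sigma = sqrt(sum((x - mu) ** 2 for x in population) / len(population))

for n in (2, 3, 4):
    centre, se = sampling_summary(population, n)
    # Columns: n, mean of sample means, enumerated SE, sigma / sqrt(n)
    print(n, round(centre, 6), round(se, 6), round(sigma / sqrt(n), 6))
```

For each n the last two columns should agree, and both shrink as n grows.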

Now, let’s consider the same population X = {1, 2, 3, 4, 5}. However, we will now take a sample of size two
without replacement. The following table lists all possible samples. Notice that the mean of the sample means
= 3 = population mean. The standard error of the sample means is 0.866025. The picture on the right is the
frequency of the sample means. Does it look like a bell-shaped distribution?

x1    x2    Mean
1     2     1.5
1     3     2
1     4     2.5
1     5     3
2     3     2.5
2     4     3
2     5     3.5
3     4     3.5
3     5     4
4     5     4.5

[Figure: frequency histogram of the sample means. Horizontal axis: sample means (1.5 to 4.5); vertical axis: frequency (0 to 2).]

Now, if we take samples of size 3 without replacement, the following table lists all the possible samples. Note
that the mean of the sample means = $\mu$ = 3, and the standard error of the sample means = 0.57735.

x1    x2    x3    Mean
1     2     3     2.00
1     2     4     2.33
1     2     5     2.67
1     3     4     2.67
1     3     5     3.00
1     4     5     3.33
2     3     4     3.00
2     3     5     3.33
2     4     5     3.67
3     4     5     4.00

[Figure: frequency histogram of the sample means. Horizontal axis: Sample Means (2 to 4); vertical axis: Frequency (0 to 2.5).]

Notice that, when samples are taken without replacement, the first two of the three observations we made
previously still hold. As for the last one, regarding the relationship between the population standard deviation
$\sigma$ and the standard error of the sample means $\sigma_{\bar{x}}$, we now have the following rule:

When samples are taken without replacement from a finite population of size N, the relationship between the
population standard deviation $\sigma$ and the standard error of the sample means $\sigma_{\bar{x}}$ is

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}},$$

where $\sqrt{\frac{N - n}{N - 1}}$ is called the finite population correction factor.

The case of sampling from a finite population without replacement may be theoretically interesting to know, but
it is not an important case in real applications for the following reasons:

1. If the population has only a few values, we will probably take a census rather than a sample.
2. If the population has a finite but large number of different values and the sample size is much smaller than
the population size (which is often the case in real-life applications), then the finite population
correction factor will be close to 1, so there is little difference between sampling with replacement and
sampling without replacement.
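
To see how quickly the correction factor approaches 1, the short sketch below evaluates it for a few population and sample sizes. The (N, n) pairs are arbitrary illustrative choices, not values from the notes.

```python
from math import sqrt

def fpc(N, n):
    """Finite population correction factor sqrt((N - n) / (N - 1))."""
    return sqrt((N - n) / (N - 1))

# Arbitrary (population size, sample size) pairs chosen for illustration.
for N, n in [(5, 2), (100, 10), (10_000, 100), (1_000_000, 1_000)]:
    print(N, n, round(fpc(N, n), 4))
```

When n is a small fraction of N, the factor differs from 1 only in the third or fourth decimal place, so $\sigma/\sqrt{n}$ alone is an adequate approximation.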

Examples:
1 Suppose samples of size three are taken with replacement from the population X = {1, 2, 3, 4, 5, 6, 7, 8, 9,
10}. Notice that, for this population, $\mu = 5.5$ and $\sigma^2 = 8.25$.
a) How many possible samples are there?
b) What is the mean of the sample means?
c) What is the standard error of the sample means?
2 How would your answers to Question 1 change if samples were taken without replacement?

THEORIES ABOUT SAMPLING DISTRIBUTION OF SAMPLE MEANS


The following are two important results about the distribution of sample means. The second one, called the
Central Limit Theorem, is in fact one of the most important results in Statistics.

1 If the original distribution of the population is normal, then the sampling distribution of the sample means
will be normal with mean (of the sample means) equal to the population mean, i.e., $\mu_{\bar{x}} = \mu$, and
standard error of the sample means equal to the population standard deviation divided by the square root of the
sample size, i.e., $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.
2 Central Limit Theorem: For a large enough sample size, the sampling distribution of the sample means
will be approximately normal with mean (of the sample means) equal to the population mean, i.e.,
$\mu_{\bar{x}} = \mu$, and standard error of the sample means equal to the population standard deviation divided
by the square root of the sample size, i.e., $\sigma_{\bar{x}} = \sigma/\sqrt{n}$, regardless of the probability
distribution of the original population.

Here, the “standard error” of the sample means is simply “standard deviation” of the sample means. The term
“standard error” is always used to represent the “standard deviation” of sample statistics so that people know
that we are talking about the distribution of sample statistics.

How big a sample size is considered big enough? The rule of thumb is $n \ge 30$, although it actually depends on
the original distribution of the population. If the original distribution is symmetric (e.g., a uniform
distribution), n does not have to be very large for the Central Limit Theorem to work. However, if the original
distribution is skewed, then it may require quite a large n in order for the Central Limit Theorem to work.
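
One way to see the effect of skewness is to track how asymmetric the sampling distribution remains as n grows. The sketch below is an illustration under stated assumptions: the population `population_pmf` is a made-up right-skewed distribution, and skewness (the third standardized moment) is used as a rough measure of departure from the bell shape. The distribution of the sum of n draws is computed exactly by repeated convolution, and the sample mean has the same skewness as the sum because dividing by n is only a change of scale.

```python
from collections import defaultdict
from math import sqrt

def convolve(pmf_a, pmf_b):
    """Exact distribution of the sum of two independent discrete variables."""
    out = defaultdict(float)
    for a, pa in pmf_a.items():
        for b, pb in pmf_b.items():
            out[a + b] += pa * pb
    return dict(out)

def skewness(pmf):
    """Third standardized moment of a discrete distribution."""
    mean = sum(v * p for v, p in pmf.items())
    var = sum((v - mean) ** 2 * p for v, p in pmf.items())
    third = sum((v - mean) ** 3 * p for v, p in pmf.items())
    return third / var ** 1.5

# Hypothetical right-skewed population: mostly small values, a few large ones.
population_pmf = {0: 0.6, 1: 0.3, 10: 0.1}

total = dict(population_pmf)          # distribution of the sum for n = 1
skew_by_n = {1: skewness(total)}
for n in range(2, 31):
    total = convolve(total, population_pmf)
    skew_by_n[n] = skewness(total)    # equals the skewness of the sample mean

for n in (1, 5, 30):
    # Columns: n, exact skewness, population skewness / sqrt(n)
    print(n, round(skew_by_n[n], 3), round(skew_by_n[1] / sqrt(n), 3))
```

The skewness of the sample mean falls in proportion to $1/\sqrt{n}$, so for a strongly skewed population such as this one it is still noticeably above zero at n = 30, whereas for a symmetric population it is zero for every n.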

Examples:
1 Given that men’s weights are normally distributed with a mean of 160 lbs and a standard deviation of 25
lbs, find the probability that
a) If 1 man is randomly selected, his weight is greater than 175 lb.
b) If 25 different men are randomly selected, their mean weight is greater than 167 lb.
c) If 100 different men are randomly selected, their mean weight is between 156 lb and 164 lb.
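
As a sketch of how the first result above applies to Example 1, the following uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The variable names are illustrative. The only change between parts is the standard deviation: $\sigma$ for a single man, and $\sigma/\sqrt{n}$ for the mean of n men.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 160, 25  # population mean and standard deviation, in lb

# a) One man: individual weights follow a normal distribution with SD 25.
p_a = 1 - NormalDist(mu, sigma).cdf(175)

# b) Mean of 25 men: standard error = 25 / sqrt(25) = 5.
p_b = 1 - NormalDist(mu, sigma / sqrt(25)).cdf(167)

# c) Mean of 100 men: standard error = 25 / sqrt(100) = 2.5.
mean_100 = NormalDist(mu, sigma / sqrt(100))
p_c = mean_100.cdf(164) - mean_100.cdf(156)

print(round(p_a, 4), round(p_b, 4), round(p_c, 4))
```

The three cases correspond to z-values of 0.6, 1.4, and ±1.6, so the same answers can be read from a standard normal table.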

2 Let X be a population with the following elements: {0, 1, 2, …, 20}. The mean of X is 10 and variance is
36.67 (hence, standard deviation = 6.0553). Find the probability that
a) If one number is randomly selected, it is between 9 and 11.
b) If 49 random numbers are taken with replacement, the mean of these numbers is between 9 and 11.
c) If 100 random numbers are taken with replacement, the mean of these numbers is between 9 and 11.
d) If 10,000 random numbers are taken with replacement, the mean of these numbers is between 9 and 11.
e) If 64 random numbers are taken with replacement, the mean of these numbers is greater than 12.
f) If 64 random numbers are taken with replacement, the mean of these numbers is less than 7.

3 Suppose the family incomes for all the families in a major city (with over one million families) are known
to have a mean of $40,000 per month, with a standard deviation of $18,000. Find the probability that
a) If 49 families are selected, their average income is below $35,000.
b) If 100 families are selected, their average income is above $45,000.
c) If 10,000 families are selected, their average income is either below $39,500 or above $40,500.
d) Suppose now we don’t really know the true average family income of the city (assume we do know the
standard deviation), and we would like to know it. Based on your answer in c), do you think we really
need to take a census in order to find the true average family income of the city? Or can we
comfortably take a large enough sample, and then use the sample mean as an estimate of the
population mean?

SAMPLING DISTRIBUTION OF THE SAMPLE MEAN WHEN $\sigma$ IS UNKNOWN

By the Central Limit Theorem, when the sample size is large, the sampling distribution of $\bar{x}$ will be
approximately normal with $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n}$. Hence, the standardized
sample statistic,

$$Z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}},$$

will then follow a standard Z distribution. Notice that this requires $\sigma$ to be known.
What happens if $\sigma$ is unknown? In that case, we usually use the sample standard deviation s to estimate
$\sigma$. But then, the standardized sample statistic,

$$\frac{\bar{x} - \mu}{s/\sqrt{n}},$$

will no longer follow the Z distribution. In fact, this variable is referred to as t, and its corresponding
distribution is called the Student t-distribution (or simply the t-distribution) with $n - 1$ degrees of freedom.
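
A minimal sketch of computing the t statistic from data follows. The sample values and the hypothesized population mean are invented for illustration and do not come from the notes. Note that `statistics.stdev` divides by n − 1, which is the sample standard deviation s used in the formula.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample and hypothesized population mean (not from the notes).
sample = [158, 172, 165, 149, 181, 160, 170, 155]
mu = 160

n = len(sample)
x_bar = mean(sample)
s = stdev(sample)                    # sample standard deviation (n - 1 divisor)
t = (x_bar - mu) / (s / sqrt(n))     # standardized statistic with s in place of sigma
df = n - 1                           # degrees of freedom

print(round(x_bar, 3), round(s, 3), round(t, 3), df)
```

The resulting t value would then be compared against the t-distribution with df = n − 1 degrees of freedom rather than against the Z distribution.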

The Student t Distribution


The t distribution was introduced originally by W. S. Gosset, who published his paper under the pen name
“Student,” since the company for which he worked, a brewery, did not permit publication by employees. The
Student t distribution is very similar to the standardized normal distribution Z, except that it has heavier tails. It
is bell-shaped and symmetric. The Student t distribution has one parameter, called the degrees of freedom,
which determines the exact shape of the distribution. Like the Z distribution, the mean of t is 0. Unlike the Z
distribution, whose standard deviation is 1, the standard deviation of a t variable with n degrees of freedom is
$\sqrt{n/(n - 2)}$ for n > 2. As the degrees of freedom become larger, the t-distribution becomes closer to the Z
distribution. In the limit, as the degrees of freedom approach infinity, the t-distribution becomes identical to
the Z distribution. There is a close relationship between the degrees of freedom of a t variable and the sample
size n – when the sample size becomes larger, the degrees of freedom of t also become larger. Hence, when the
sample size n is large, we can often use Z distribution to approximate the t distribution.
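
The convergence toward the Z distribution can be seen in the standard deviation formula alone. The sketch below simply evaluates $\sqrt{n/(n - 2)}$ for a few degrees of freedom. The function name `t_std_dev` and the chosen values are illustrative.

```python
from math import sqrt

def t_std_dev(df):
    """Standard deviation of a t variable; the formula applies only for df > 2."""
    if df <= 2:
        raise ValueError("the formula sqrt(df / (df - 2)) requires df > 2")
    return sqrt(df / (df - 2))

for df in (3, 5, 10, 30, 100, 1000):
    print(df, round(t_std_dev(df), 4))
```

The values decrease toward 1, the standard deviation of Z, as the degrees of freedom increase.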
