The population is the set of entities under study. For example, suppose we are interested in
the mean height of men. "Men" here is a hypothetical population because it includes all men
that have lived, are alive and will
live in the future. I like this example because it drives home the point that we, as analysts,
choose the population that we wish to study. Typically it is impossible to survey/measure
the entire population because not all members are observable (e.g. men who will exist in
the future). Even when it is possible to enumerate the entire population, it is often costly to
do so and would take a great deal of time. In the example above we have a population, "men",
and a parameter of interest, their mean height.

Instead, we could take a subset of this population, called a sample, and use this sample to
draw inferences about the population under study, given some conditions (for instance, that
the sample is representative of the population). Thus we could measure the mean height of men
in a sample of the population, which we call a statistic, and use this to draw inferences
about the parameter of interest in the population. It is an inference because there will be
some uncertainty and inaccuracy involved in drawing conclusions about the population based
upon a sample. This should be obvious: we have fewer members in our sample than in our
population, so we have lost some information.

There are many ways to select a sample and the study of this is called sampling theory. A
commonly used method is called Simple Random Sampling (SRS). In SRS each member of
the population has an equal probability of being included in the sample, hence the term
"random". There are many other sampling methods, e.g. stratified sampling and cluster
sampling, which all have their advantages and disadvantages.
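
To make SRS concrete, here is a minimal sketch in Python. The population of 1,000 heights is simulated (made-up numbers, not real data), and the helper name simple_random_sample is my own, not something from a library. It relies on random.sample from the standard library, which draws without replacement so that every member is equally likely to be picked.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; every member is equally likely to be chosen."""
    rng = random.Random(seed)          # a seed only makes the draw reproducible
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical population: simulated heights (cm) of 1,000 men.
gen = random.Random(0)
heights = [gen.gauss(175, 7) for _ in range(1000)]

sample = simple_random_sample(heights, 50, seed=1)
sample_mean = sum(sample) / len(sample)   # the statistic computed from the sample
print(f"sample mean height: {sample_mean:.1f} cm")
```

The sample mean printed here is a statistic; the mean of all 1,000 simulated heights plays the role of the population parameter.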

It is important to remember that the sample we draw from the population is only one of a
large number of potential samples. If ten researchers were all studying the same population
and drawing their own samples, they might obtain different answers. Returning to our earlier
example, each of the ten researchers may come up with a different mean height of men, i.e.
the statistic in question (mean height) varies from sample to sample. It has a distribution,
called a sampling distribution. We can use this distribution to understand the uncertainty in
our estimate of the population parameter.
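
The ten-researchers idea is easy to simulate. The sketch below assumes a made-up population of 10,000 simulated heights and a sample size of 100 per researcher; both numbers, and the helper name sample_mean, are my own choices for illustration. Repeating the draw many times gives an empirical approximation of the sampling distribution of the mean.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: 10,000 simulated heights in cm (not real data).
population = [rng.gauss(175, 7) for _ in range(10_000)]

def sample_mean(pop, n, rng):
    """Mean of one simple random sample of size n."""
    return statistics.fmean(rng.sample(pop, n))

# Ten researchers, each drawing their own sample of 100 men.
researcher_means = [sample_mean(population, 100, rng) for _ in range(10)]
print([round(m, 2) for m in researcher_means])

# Many repeated samples approximate the sampling distribution of the mean.
many_means = [sample_mean(population, 100, rng) for _ in range(2000)]
spread = statistics.stdev(many_means)   # how much the statistic varies
print(f"spread of the sample means: {spread:.2f} cm")
```

The ten means differ from one another even though every researcher studied the same population, and the spread of the 2,000 means is a direct measure of that sample-to-sample variation.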

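The sampling distribution of the sample mean is approximately normal when the sample is
reasonably large (this is the central limit theorem), with a standard deviation equal to the
population standard deviation divided by the square root of the sample size. In practice the
population standard deviation is unknown, so it is estimated by the sample standard deviation.
Because this could easily be confused with the standard deviation of the sample, it is more
common to call the standard deviation of the sampling distribution the standard error.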
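
As a small worked example, the standard error of the mean is usually estimated as the sample standard deviation divided by the square root of the sample size. The five heights below are made up so the arithmetic is easy to follow, and the function name standard_error is just for illustration.

```python
import math
import statistics

def standard_error(sample):
    """Estimated standard error of the mean: s / sqrt(n)."""
    n = len(sample)
    s = statistics.stdev(sample)   # sample standard deviation (n - 1 denominator)
    return s / math.sqrt(n)

heights = [170.0, 172.0, 174.0, 176.0, 178.0]   # made-up heights in cm
se = standard_error(heights)
# s = sqrt(40 / 4) = sqrt(10), so se = sqrt(10) / sqrt(5) = sqrt(2), about 1.414
print(f"standard error: {se:.3f} cm")
```

Note the difference between the two quantities: the standard deviation (about 3.16 cm here) describes how much individual heights vary, while the standard error (about 1.41 cm) describes how much the sample mean would vary from sample to sample.
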

The population is the whole set of values, or individuals, you are interested in. The
sample is a subset of the population, and is the set of values you actually use.

So, for example, if you want to know the average height of the residents of China, that is
your population, i.e. the population of China. The thing is, this is quite a large number,
and you wouldn't be able to get data for everyone there. So you draw a sample, that is,
you get some observations, or the height of some of the people in China (a subset of the
population, the sample) and do your inference based on that.

The population is everything in the group of study. For example, if you are studying the
price of Apple's shares, it is the historical, current, and even all future stock prices. Or, if you
run an egg factory, it is all the eggs produced by the factory.

You don't always have to sample and do statistical tests. If your population is your
immediate living family, you don't need to sample, as the population is small.

Sampling is popular for a variety of reasons:

it is cheaper than a census (measuring the whole population)
you don't have access to future data, so must sample the past
you have to destroy some items by testing them, and don't want to destroy them all (say, eggs