You need to use inferential statistics, which uses sample results (the “some specifics”) to
estimate (that is, to make “broader generalizations” about) unknown population parameters such as a
population mean or a population proportion. Note that statisticians use the word estimate in
the same sense as in everyday usage: something you are reasonably certain about but cannot
flatly say is absolutely correct.
You estimate population parameters by using either point estimates or interval estimates.
A point estimate is the value of a single sample statistic, such as a sample mean.
A confidence interval estimate is a range of numbers, called an interval, constructed around the
point estimate.
The confidence interval is constructed such that the probability that the interval includes the
population parameter is known.
When you construct a confidence interval estimate, you indicate the confidence of correctly
estimating the value of the population parameter, µ. This allows you to say that there is a specified
confidence that µ is somewhere in the range of numbers defined by the interval.
For some samples, the interval estimate for µ is correct, but for others it is incorrect.
To resolve this problem of sometimes having an interval that provides a correct estimate and
sometimes having an interval that does not, you need to determine the proportion of samples
producing intervals that result in correct statements about the population mean, µ.
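The proportion of intervals that give a correct statement about µ can be checked by simulation. The sketch below assumes a normal population with hypothetical values µ = 100, σ = 10, and n = 25 (none of these come from the text), builds the interval X̄ ± Z·σ/√n from each of many random samples, and counts how often it contains µ. For a 95% level the proportion should land close to 0.95; the function name is illustrative.

```python
import math
import random
from statistics import NormalDist, fmean

def coverage_rate(mu, sigma, n, confidence=0.95, trials=10_000, seed=1):
    """Fraction of simulated intervals X̄ ± Z·σ/√n that contain the true mean mu."""
    rng = random.Random(seed)                        # fixed seed so runs are repeatable
    z = NormalDist().inv_cdf((1 + confidence) / 2)   # upper critical value, about 1.96 for 95%
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(sample)                         # point estimate from this sample
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1                                # this interval is a correct statement
    return hits / trials

# Hypothetical population parameters, chosen only for illustration
rate = coverage_rate(mu=100, sigma=10, n=25)
print(f"Proportion of intervals containing mu: {rate:.3f}")
```

Any single interval either contains µ or it does not; the confidence level describes the long-run proportion of samples whose intervals do.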
A sample of boxes had a sample mean of 362.3 grams. The interval constructed to estimate µ is
The sampling distribution of the sample mean, X̄, is normally distributed if the population for your
characteristic of interest, X, follows a normal distribution.
If the population of X does not follow a normal distribution, the Central Limit Theorem
almost always ensures that X̄ is approximately normally distributed when n is large.
However, when dealing with a small sample size and a population that does not follow a normal
distribution, the sampling distribution of X̄ is not normally distributed, and therefore
the confidence interval discussed in this section is inappropriate.
In practice, however, as long as the sample size is large enough and the population is not very
skewed, you can use the confidence interval defined in Equation (8.1) to estimate the
population mean when the population standard deviation, σ, is known.
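Assuming Equation (8.1) is the usual interval for a mean with σ known, X̄ ± Z_{α/2}·σ/√n, it can be sketched as below. The sample values and σ = 15 are hypothetical, and `z_interval` is an illustrative name, not a library function.

```python
import math
from statistics import NormalDist, fmean

def z_interval(data, sigma, confidence=0.95):
    """Confidence interval for the mean when the population sigma is known."""
    n = len(data)
    xbar = fmean(data)                               # point estimate of mu
    z = NormalDist().inv_cdf((1 + confidence) / 2)   # upper critical value Z_{alpha/2}
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical fill weights in grams; sigma is assumed known to be 15
weights = [358.0, 371.5, 362.0, 349.5, 366.0, 375.0, 355.5, 360.5]
low, high = z_interval(weights, sigma=15)
print(f"95% CI for mu: {low:.2f} to {high:.2f}")
```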
More significantly, in virtually all real-world business situations, you would never know the
standard deviation of the population.
Writing under the pen name “Student,” William S. Gosset solved this problem by developing
what today is known as the Student’s t distribution, or the t distribution, for short.
Properties of the t Distribution
The t distribution is very similar in appearance to the standardized normal distribution. Both
distributions are symmetrical and bell-shaped, with the mean and the median equal to zero.
However, the t distribution has more area in the tails and less in the center than does the
standardized normal distribution. This is because S is used to estimate the unknown σ, which
makes the values of t more variable than those for Z.
The degrees of freedom, n − 1, are directly related to the sample size, n.
As the sample size and degrees of freedom increase, S becomes a better estimate of σ and the t
distribution gradually approaches the standardized normal distribution, until the two are
virtually identical.
With a sample size of about 120 or more, S estimates σ closely enough that there is little
difference between the t and Z distributions.
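The convergence of t toward Z can be checked numerically. Python's standard library has no t distribution, so this sketch assumes a hand-rolled approach: it integrates the t density with Simpson's rule and inverts the cumulative probability by bisection. The function names are hypothetical; in practice a library routine such as SciPy's `scipy.stats.t.ppf` would normally be used instead. For 99 degrees of freedom it should give about 1.9842, and at 120 degrees of freedom the value is within about 0.02 of Z.

```python
import math
from statistics import NormalDist

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: 0.5 plus Simpson's rule on the t density over [0, x]."""
    # Normalizing constant of the t density (gamma ratio taken via lgamma for stability)
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def density(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    h = x / steps                      # steps must be even for Simpson's rule
    total = density(0.0) + density(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return 0.5 + total * h / 3         # symmetry: half the area lies below 0

def t_critical(cum_prob, df):
    """Value t with P(T <= t) = cum_prob (cum_prob > 0.5), found by bisection."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < cum_prob:    # widen the bracket until it contains the answer
        lo, hi = hi, hi * 2
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < cum_prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z = NormalDist().inv_cdf(0.975)
for df in (4, 9, 29, 99, 120):
    print(f"df={df:>3}: t = {t_critical(0.975, df):.4f}   (Z = {z:.4f})")
```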
In practice, however, when the sample size is large enough and the population is not very
skewed, in most cases you can use the t distribution to estimate the population mean when σ is
unknown.
When dealing with a small sample size and a skewed population distribution, the confidence
interval estimate may not provide a valid estimate of the population mean.
To assess the assumption of normality, you can evaluate the shape of the sample data by
constructing a histogram, stem-and-leaf display, boxplot, or normal probability plot. However,
the ability of any of these graphs to help you evaluate normality is limited when you have a
small sample size.
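The points of a normal probability plot can be computed without a plotting library: pair each ordered data value with the standard normal quantile at its plotting position, and check whether the pairs fall roughly on a straight line. This sketch assumes the plotting position i/(n + 1); other conventions exist. The sample values and function name are hypothetical, and with only eight points the caution about small samples above applies.

```python
from statistics import NormalDist

def normal_plot_points(data):
    """(normal quantile, ordered value) pairs for a normal probability plot.

    The plotting position i / (n + 1) is an assumption; other conventions exist.
    """
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()
    return [(std_normal.inv_cdf(i / (n + 1)), x) for i, x in enumerate(ordered, start=1)]

# Hypothetical small sample; points close to a straight line are
# consistent with data drawn from a normal population.
sample = [12.1, 9.8, 11.4, 10.6, 13.0, 8.9, 10.2, 11.9]
for z, x in normal_plot_points(sample):
    print(f"{z:6.3f}  {x:5.1f}")
```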
Because t is a symmetrical distribution with a mean of 0, if the upper-tail value (upper 0.025) is +1.9842, the
value for the lower-tail area (lower 0.025) is -1.9842.
A t value of -1.9842 means that the probability that t is less than -1.9842 is 0.025, or 2.5%.
Note that for a 95% confidence interval, you will always have a cumulative probability of 0.975
and an upper-tail area of 0.025. Similarly, for a 99% confidence interval, you will have 0.995 and
0.005, and for a 90% confidence interval you will have 0.95 and 0.05.
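The pattern above holds for any confidence level: the upper-tail area is (1 − confidence)/2, and the cumulative probability is 1 minus that area. By symmetry, the lower critical value is the negative of the upper one. A small sketch (the function name is hypothetical):

```python
def tail_areas(confidence):
    """Cumulative probability and upper-tail area for a two-sided critical value."""
    upper_tail = (1 - confidence) / 2      # alpha / 2
    cumulative = 1 - upper_tail            # area below the upper critical value
    # Rounding hides floating-point noise such as 0.025000000000000022
    return round(cumulative, 10), round(upper_tail, 10)

for level in (0.90, 0.95, 0.99):
    print(level, tail_areas(level))
```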
In order to compute S² you first need to know X̄. Therefore, only n − 1 of the sample values are
free to vary. This means that you have n − 1 degrees of freedom.
For example, suppose a sample of five values has a mean of 20. How many values do you need
to know before you can determine the remainder of the values? The fact that $n = 5$ and
$\bar{X} = 20$ also tells you that
$$\sum_{i=1}^{n} X_i = n\bar{X} = (5)(20) = 100$$
Thus, when you know four of the values, the fifth one is not free to vary because the sum must
be 100. For example, if four of the values are 18, 24, 19, and 16, the fifth value must be 23 so
that the sum is 100.
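The same arithmetic as a short sketch: once n − 1 values and the mean are fixed, the last value is n·X̄ minus the sum of the known values. The function name is hypothetical.

```python
def last_value(n, mean, known_values):
    """Value forced by the sample mean once n - 1 values are known."""
    if len(known_values) != n - 1:
        raise ValueError("need exactly n - 1 known values")
    return n * mean - sum(known_values)   # all n values must sum to n * mean

print(last_value(5, 20, [18, 24, 19, 16]))  # prints 23
```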
The interpretation of the confidence interval when σ is unknown is the same as when σ is
known.
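A minimal sketch of the σ-unknown interval X̄ ± t_{α/2}·S/√n, assuming the caller supplies the t critical value with n − 1 degrees of freedom (from a t table or a statistics library). The function name is hypothetical, and the data are the five illustrative values from the degrees-of-freedom example, treated here as a sample; for 95% confidence and 4 degrees of freedom the table value is 2.7764.

```python
import math
from statistics import fmean, stdev

def t_interval(data, t_crit):
    """Confidence interval for the mean when sigma is unknown.

    t_crit is the upper critical value with n - 1 degrees of freedom,
    looked up for the chosen confidence level.
    """
    n = len(data)
    xbar = fmean(data)                 # point estimate of mu
    s = stdev(data)                    # sample standard deviation S (divisor n - 1)
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

low, high = t_interval([18, 24, 19, 16, 23], t_crit=2.7764)
print(f"95% CI for mu: {low:.2f} to {high:.2f}")
```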
The concept of a confidence interval also applies to categorical data. With categorical data, you
want to estimate the proportion of items in a population having a certain characteristic of
interest.
The unknown population proportion is represented by the Greek letter π.
The point estimate for π is the sample proportion, $p = \frac{X}{n}$, where n is the sample size and X is the
number of items in the sample having the characteristic of interest.
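As a sketch, assuming the usual large-sample interval p ± Z·√(p(1 − p)/n) (the formula is not given in this section) and hypothetical counts of 35 items with the characteristic out of 200 sampled:

```python
import math
from statistics import NormalDist

def proportion_interval(x, n, confidence=0.95):
    """Large-sample confidence interval for a population proportion pi."""
    p = x / n                                        # sample proportion, the point estimate
    z = NormalDist().inv_cdf((1 + confidence) / 2)   # upper critical value Z_{alpha/2}
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

# Hypothetical: 35 of 200 sampled items have the characteristic of interest
low, high = proportion_interval(35, 200)
print(f"p = {35 / 200:.3f}, 95% CI for pi: {low:.3f} to {high:.3f}")
```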
In the business world, sample sizes are determined prior to data collection to ensure that the
confidence interval is narrow enough to be useful in making decisions.
Solving for n gives the sample size needed to construct the appropriate confidence interval
estimate for the mean. “Appropriate” means that the resulting interval will have an acceptable
amount of sampling error.
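Assuming σ is known (or estimated from a pilot study) and the acceptable sampling error e is the half-width of the interval, setting e = Z_{α/2}·σ/√n and solving gives n = Z²σ²/e², rounded up to the next whole number. A minimal sketch with hypothetical planning values (σ = 20, e = 5):

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, error, confidence=0.95):
    """Smallest n whose interval half-width Z * sigma / sqrt(n) is at most `error`."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return math.ceil((z * sigma / error) ** 2)   # always round up, never down

# Hypothetical planning values
print(sample_size_for_mean(sigma=20, error=5))  # prints 62
```

Rounding up matters: rounding 61.46 down to 61 would give a half-width slightly larger than the acceptable error of 5.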