
Chapter 8 Confidence Interval Estimation

 You need to use inferential statistics, which uses sample results (the “some specifics”) to
estimate (the making of “broader generalizations”) unknown population parameters such as a
population mean or a population proportion. Note that statisticians use the word estimate in
the same sense of the everyday usage: something you are reasonably certain about but cannot
flatly say is absolutely correct.
 You estimate population parameters by using either point estimates or interval estimates.
 A point estimate is the value of a single sample statistic, such as a sample mean.
 A confidence interval estimate is a range of numbers, called an interval, constructed around the
point estimate.
 The confidence interval is constructed such that the probability that the interval includes the
population parameter is known.
 When you construct a confidence interval estimate, you indicate the confidence of correctly
estimating the value of the population parameter. This allows you to say that there is a specified
confidence that the population parameter is somewhere in the range of numbers defined by the interval.

Confidence Interval Estimate for the Mean (σ Known)


 For some samples, the interval estimate for µ is correct, but for others it is incorrect.
 To resolve this problem of sometimes having an interval that provides a correct estimate and
sometimes having an interval that does not, you need to determine the proportion of samples
producing intervals that result in correct statements about the population mean, µ.
 A sample of boxes had a sample mean of 362.3 grams. The interval constructed to estimate µ is
362.3 ± 5.88, or 356.42 ≤ µ ≤ 368.18.

The interval from 356.42 to 368.18 is referred to as a 95% confidence interval.


“I am 95% confident that the mean amount of cereal in the population of boxes is somewhere
between 356.42 and 368.18 grams.”
 The variation of sample statistics from sample to sample is called sampling error. The size of the
sampling error is primarily based on the amount of variation in the population and on the
sample size. Large samples have less sampling error than small samples, but large samples cost
more to select.
 Although 3 of the 20 intervals did not include the population mean, if you had selected all the
possible samples of n = 10 from a population of N = 200, 95% of the intervals would include the
population mean.
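
The long-run meaning of "95% of the intervals would include the population mean" can be checked by simulation. The sketch below is only an illustration under stated assumptions: it draws samples from a normal population with hypothetical values µ = 368 and σ = 15 (not the finite population of N = 200 described above), and it builds each interval with the standard σ-known half-width Z·σ/√n.

```python
import random
from statistics import NormalDist, mean

def coverage(mu, sigma, n, confidence, trials, seed=0):
    """Fraction of simulated sigma-known intervals that contain mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        # Assumption: the population is normal with the given mu and sigma.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = mean(sample)
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials

# Hypothetical population values, chosen only for illustration.
rate = coverage(mu=368.0, sigma=15.0, n=10, confidence=0.95, trials=5000)
print(f"share of intervals that include mu: {rate:.3f}")
```

Any single batch of 20 intervals can miss more or fewer times than expected; only the long-run share settles near 0.95.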
 In general, the level of confidence is symbolized by (1 − α) × 100%, where α is the proportion in
the tails of the distribution that is outside the confidence interval. The proportion in the upper
tail of the distribution is α/2, and the proportion in the lower tail of the distribution is α/2.

 The value of Zα/2 needed for constructing a confidence interval is called the critical value for
the distribution.
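
As a small sketch of how a critical value follows from the confidence level, the code below finds the Z value that leaves α/2 in the upper tail of the standardized normal distribution. The function name `z_critical` is made up for this illustration; `NormalDist.inv_cdf` is part of Python's standard `statistics` module.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Z value that leaves alpha/2 in the upper tail of the standard normal."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: Z = {z_critical(level):.4f}")
```

The three levels give roughly 1.645, 1.960, and 2.576, the values usually read from a normal table.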

 The sampling distribution of the sample mean, X̄, is normally distributed if the population for your
characteristic of interest, X, follows a normal distribution.
 If the population of X does not follow a normal distribution, the Central Limit Theorem
almost always ensures that the sample mean is approximately normally distributed when n is large.
 However, when dealing with a small sample size and a population that does not follow a normal
distribution, the sampling distribution of the sample mean is not normally distributed, and therefore
the confidence interval discussed in this section is inappropriate.
 In practice, however, as long as the sample size is large enough and the population is not very
skewed, you can use the confidence interval defined in Equation (8.1) to estimate the
population mean when the population standard deviation is known.
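
The σ-known interval is commonly written as X̄ ± Zα/2 · σ/√n. The sketch below applies that form to the cereal example. Note the assumptions: these notes give only the sample mean (362.3) and the resulting interval, so σ = 15 and n = 25 are assumed values, used here because they reproduce the interval quoted earlier.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

# sigma = 15 and n = 25 are assumptions, not values stated in these notes.
lower, upper = z_interval(xbar=362.3, sigma=15, n=25)
print(f"{lower:.2f} to {upper:.2f}")  # 356.42 to 368.18
```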

Can You Ever Know the Population Standard Deviation?

 More significantly, in virtually all real-world business situations, you would never know the
standard deviation of the population.

Confidence Interval Estimate for the Mean (σ Unknown)

 Writing under the pen name “Student,” William S. Gosset solved this problem by developing
what today is known as the Student’s t distribution, or the t distribution, for short.

Properties of the t Distribution

 The t distribution is very similar in appearance to the standardized normal distribution. Both
distributions are symmetrical and bell-shaped, with the mean and the median equal to zero.
However, the t distribution has more area in the tails and less in the center than does the
standardized normal distribution. This is because S is used to estimate the unknown σ, so the
values of t are more variable than those for Z.
 The degrees of freedom, n − 1, are directly related to the sample size, n.
 As the sample size and degrees of freedom increase, S becomes a better estimate of σ and the t
distribution gradually approaches the standardized normal distribution, until the two are
virtually identical.
 With a sample size of about 120 or more, S estimates σ closely enough so that there is little
difference between the t and Z distributions.
 In practice, however, when the sample size is large enough and the population is not very
skewed, in most cases you can use the t distribution to estimate the population mean when σ is
unknown.
 When dealing with a small sample size and a skewed population distribution, the confidence
interval estimate may not provide a valid estimate of the population mean.
 To assess the assumption of normality, you can evaluate the shape of the sample data by
constructing a histogram, stem-and-leaf display, boxplot, or normal probability plot. However,
the ability of any of these graphs to help you evaluate normality is limited when you have a
small sample size.
 Because t is a symmetrical distribution with a mean of 0, if the upper-tail value is +1.9842, the
value for the lower-tail area (lower 0.025) is -1.9842.
 A t value of -1.9842 means that the probability that t is less than -1.9842 is 0.025, or 2.5%.
 Note that for a 95% confidence interval, you will always have a cumulative probability of 0.975
and an upper-tail area of 0.025. Similarly, for a 99% confidence interval, you will have 0.995 and
0.005, and for a 90% confidence interval you will have 0.95 and 0.05.
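
The tail areas above, and the way a table value of t enters the interval, can be sketched as follows. The interval form X̄ ± t · S/√n is the usual σ-unknown counterpart of the Z interval. The sample figures (X̄ = 50, S = 10, n = 100) are hypothetical, and 1.9842 is taken to be the table value for 99 degrees of freedom, which is an assumption because the notes do not state the sample size behind it.

```python
from math import sqrt

def tail_areas(confidence):
    """Return (cumulative probability, upper-tail area) for a confidence level."""
    alpha = 1 - confidence
    return 1 - alpha / 2, alpha / 2

def t_interval(xbar, s, n, t_crit):
    """X-bar +/- t * S / sqrt(n); t_crit is looked up for n - 1 degrees of freedom."""
    half_width = t_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width

for level in (0.90, 0.95, 0.99):
    cumulative, upper_tail = tail_areas(level)
    print(f"{level:.0%}: cumulative {cumulative:.3f}, upper tail {upper_tail:.3f}")

# Hypothetical sample; 1.9842 assumed to be the table value for 99 df.
lower, upper = t_interval(xbar=50.0, s=10.0, n=100, t_crit=1.9842)
print(f"{lower:.4f} to {upper:.4f}")
```

The critical value is passed in rather than computed because Python's standard library has no t distribution; a table or a statistics package supplies it.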

The Concept of Degrees of Freedom

 To compute S², you first need to know X̄. Therefore, only n − 1 of the sample values are
free to vary. This means that you have n − 1 degrees of freedom.
 For example, suppose a sample of five values has a mean of 20. How many values do you need
to know before you can determine the remainder of the values? The fact that n = 5 and X̄ = 20
also tells you that ΣXi = 100, because ΣXi / n = X̄.
 Thus, when you know four of the values, the fifth one is not free to vary because the sum must
be 100. For example, if four of the values are 18, 24, 19, and 16, the fifth value must be 23 so
that the sum is 100.
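
The same arithmetic as a short sketch: once the mean and n − 1 of the values are fixed, the last value is forced. The function name `last_value` is made up for this illustration.

```python
def last_value(known_values, n, mean):
    """Value forced on the final observation once the other n - 1 are known."""
    if len(known_values) != n - 1:
        raise ValueError("exactly n - 1 values must be known")
    # The n values must sum to n * mean, so the remainder is determined.
    return n * mean - sum(known_values)

fifth = last_value([18, 24, 19, 16], n=5, mean=20)
print(fifth)  # 23
```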

The Confidence Interval Statement

 The interpretation of the confidence interval when σ is unknown is the same as when σ is
known.

Confidence Interval Estimate for the Proportion

 The concept of a confidence interval also applies to categorical data. With categorical data, you
want to estimate the proportion of items in a population having a certain characteristic of
interest.
 The unknown population proportion is represented by the Greek letter π.
 The point estimate for π is the sample proportion, p = X/n, where n is the sample size and X is the
number of items in the sample having the characteristic of interest.
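
A minimal sketch of the point estimate and an interval around it follows. The interval form p ± Zα/2 · √(p(1 − p)/n) is the usual large-sample formula and is not stated in these notes, so treat it as an assumption. The counts X = 10 and n = 100 are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(x, n, confidence=0.95):
    """Return (p, lower, upper) using the large-sample normal approximation."""
    p = x / n  # point estimate of the population proportion
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width

# Hypothetical sample: 10 of 100 items have the characteristic of interest.
p, lower, upper = proportion_interval(x=10, n=100)
print(f"p = {p:.2f}, interval {lower:.4f} to {upper:.4f}")
```

The normal approximation is generally considered reasonable only when both X and n − X are not too small.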

Determining Sample Size

 In the business world, sample sizes are determined prior to data collection to ensure that the
confidence interval is narrow enough to be useful in making decisions.

Sample Size Determination for the Mean


 The amount added to or subtracted from X̄ is equal to half the width of the interval. This quantity
represents the amount of imprecision in the estimate that results from sampling error. The
sampling error, e, is defined as e = Zα/2 · σ/√n.

 Solving for n gives the sample size needed to construct the appropriate confidence interval
estimate for the mean. “Appropriate” means that the resulting interval will have an acceptable
amount of sampling error.

 To compute the sample size, you must know three factors:

1. The desired confidence level, which determines Zα/2, the critical value from the
standardized normal distribution
2. The acceptable sampling error, e
3. The standard deviation, σ
 You use Z instead of t because, to determine the critical value of t, you need to know the sample
size, but you do not know it yet. For most studies, the sample size needed is large enough that
the standardized normal distribution is a good approximation of the t distribution.
 For the sampling error, you should think not of how much sampling error you would like to have
(you really do not want any error) but of how much you can tolerate when reaching conclusions
from the confidence interval.
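
Solving e = Zα/2 · σ/√n for n gives n = (Zα/2 · σ / e)², rounded up to the next whole number so that the sampling error does not exceed e. The sketch below uses the three factors listed above. The inputs σ = 25 and e = 5 are hypothetical, and the function name is made up for this illustration.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, e, confidence=0.95):
    """Smallest n whose sigma-known interval has half-width at most e."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Always round up: rounding down would leave the error above e.
    return ceil((z * sigma / e) ** 2)

# Hypothetical planning values: sigma = 25, tolerable error e = 5.
n = sample_size_for_mean(sigma=25, e=5)
print(n)  # 97
```

Halving the tolerable error roughly quadruples the required sample size, because e enters the formula squared.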

Sample Size Determination for the Proportion
