
Basic Statistical Inference

 Using probability distributions to make statistical inferences

 Confidence interval estimation of population parameters

 Hypothesis tests concerning population parameters


Confidence Intervals for a Population Mean
 A confidence interval for μ based on the t-distribution:
 Let {y_1, y_2, …, y_n} be a randomly selected sample of n values from a population.
 We can use the sample mean ȳ and the sample standard deviation s as the point estimates of the population mean μ and standard deviation σ.
 The confidence interval is constructed around the sample mean ȳ so that we are reasonably sure, or confident, that this interval contains μ.
 In general, we let 100(1 - α)% denote our level of confidence.
 For example, if α = .05, then we are 100(1 - α)% = 100(1 - .05)% = 95% confident.
 A 100(1 - α)% confidence interval for μ is:
$$\left[\bar{y} \pm t_{[\alpha/2]}^{(n-1)} \frac{s}{\sqrt{n}}\right] = \left[\bar{y} - t_{[\alpha/2]}^{(n-1)} \frac{s}{\sqrt{n}},\ \bar{y} + t_{[\alpha/2]}^{(n-1)} \frac{s}{\sqrt{n}}\right]$$
 Here, $t_{[\alpha/2]}^{(n-1)}$ is the point on the scale of the t-distribution having n - 1 degrees of freedom so that the area under this curve to the right of this point is α/2.
 The 100(1 - α)% confidence interval for μ says that we are 100(1 - α)% confident that μ is greater than or equal to the lower bound and less than or equal to the upper bound:
$$\bar{y} - t_{[\alpha/2]}^{(n-1)} \frac{s}{\sqrt{n}} \le \mu \le \bar{y} + t_{[\alpha/2]}^{(n-1)} \frac{s}{\sqrt{n}}$$

 Example 1: Consider the example from Chapter 2, in which a sample of n = 5 Hawks is randomly selected and tested for gasoline mileage. Calculate a 95% confidence interval for μ from the sample {y_1, y_2, y_3, y_4, y_5} = {30.7, 31.8, 30.2, 32.0, 31.3}.
 What about a 99% confidence interval?
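Example 1 can be worked through numerically. The sketch below is one possible calculation, not the slide's own solution. The t points 2.776 and 4.604 (for 4 degrees of freedom) are taken from a standard t table rather than from this text, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

mileages = [30.7, 31.8, 30.2, 32.0, 31.3]
n = len(mileages)
y_bar = mean(mileages)          # sample mean
s = stdev(mileages)             # sample standard deviation (divisor n - 1)
std_error = s / sqrt(n)         # s / sqrt(n)

# Assumed table values of t_{[alpha/2]}^{(n-1)} for n - 1 = 4 degrees of freedom.
T_POINTS = {0.95: 2.776, 0.99: 4.604}

intervals = {}
for level, t_point in T_POINTS.items():
    half_width = t_point * std_error
    intervals[level] = (y_bar - half_width, y_bar + half_width)
    lower, upper = intervals[level]
    print(f"{level:.0%} interval for mu: [{lower:.3f}, {upper:.3f}]")
```

The 99% interval is wider than the 95% interval because a higher level of confidence requires a larger t point.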

Three possible samples from the population of all possible samples of five mileages that could have been randomly selected from the infinite population of all Hawk mileages.

The population of all mileages is assumed to be normally distributed with mean µ and standard deviation σ equal to 31.5 and .8, respectively.
Two of the three 95% confidence intervals for µ = 31.5 in Figure 3.1 contain µ.

The interpretation of 95% confidence: 95 percent of the 95% confidence intervals for µ in the population of all such intervals contain µ = 31.5, while 5 percent of the confidence intervals in this population do not contain µ.

Thus after computing a 95% confidence interval for µ, we can be 95% confident that µ is contained in our interval, because 95 percent of the intervals in the population of all possible 95% confidence intervals for µ contain µ and because we have obtained one of the confidence intervals in this population.
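This interpretation can be checked by simulation. The sketch below is illustrative only. It repeatedly draws samples of five mileages from the assumed normal population (µ = 31.5, σ = .8) and records how often the 95% t-based interval contains µ. The t point 2.776 is a standard table value, and the seed and number of repetitions are arbitrary choices.

```python
import random
from math import sqrt
from statistics import fmean, stdev

random.seed(1)                     # arbitrary seed, for reproducibility
MU, SIGMA, N = 31.5, 0.8, 5        # population assumed in the Hawk illustration
T_POINT = 2.776                    # assumed table value of t_{[.025]}^{(4)}
REPS = 20000                       # arbitrary number of simulated samples

def interval_contains_mu():
    """Draw one sample of N mileages and test whether its 95% interval covers MU."""
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    half_width = T_POINT * stdev(sample) / sqrt(N)
    return abs(fmean(sample) - MU) <= half_width

coverage = sum(interval_contains_mu() for _ in range(REPS)) / REPS
print(f"fraction of intervals containing mu: {coverage:.3f}")
```

The printed fraction should be close to .95, which is what "95% confident" refers to.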
The Derivation of the Interval
 The population of all possible sample means (that is, point estimates of μ):
1. Has mean $\mu_{\bar{y}} = \mu$.
2. Has variance $\sigma_{\bar{y}}^2 = \sigma^2 / n$ (if the population sampled is infinite).
3. Has standard deviation $\sigma_{\bar{y}} = \sigma / \sqrt{n}$ (if the population sampled is infinite).
4. Has a normal distribution (if the population sampled has a normal distribution).
 Result 1 says that the mean of all possible sample means equals μ, the population mean.
 For this reason, when we use the sample mean ȳ as the point estimate of μ, we are using an unbiased estimation procedure.
 Result 2, $\sigma_{\bar{y}}^2 = \sigma^2 / n$, says that the division by n makes $\sigma_{\bar{y}}^2$ smaller than $\sigma^2$ whenever n > 1.
 Furthermore, this result says that the larger the sample size is, the smaller $\sigma_{\bar{y}}^2$ is.
 That is, when n is larger, more sample values are used to compute each possible sample mean.
 This results in the sample means being clustered even more closely around μ.
 Results 1, 2, and 3 imply that if the population that is sampled is normally distributed, then the population of all possible values of
$$\frac{\bar{y} - \mu_{\bar{y}}}{\sigma_{\bar{y}}} = \frac{\bar{y} - \mu}{\sigma / \sqrt{n}}$$
has a standard normal distribution.
 We estimate $\sigma_{\bar{y}} = \sigma / \sqrt{n}$ by $s_{\bar{y}} = s / \sqrt{n}$, which is called the standard error of the estimate ȳ.
 Then it can be proven that if the population that is sampled is normally distributed, the population of all possible values of
$$\frac{\bar{y} - \mu_{\bar{y}}}{s_{\bar{y}}} = \frac{\bar{y} - \mu}{s / \sqrt{n}}$$
has a t-distribution with n - 1 degrees of freedom.
 This implies that
$$P\left(-t_{[\alpha/2]}^{(n-1)} \le \frac{\bar{y} - \mu}{s / \sqrt{n}} \le t_{[\alpha/2]}^{(n-1)}\right)$$
is the area under the curve of the t-distribution having n - 1 degrees of freedom between $-t_{[\alpha/2]}^{(n-1)}$ and $t_{[\alpha/2]}^{(n-1)}$.
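Because the area to the right of $t_{[\alpha/2]}^{(n-1)}$ is α/2 and the t-distribution is symmetric about zero, the area between the two points is 1 - α. Rearranging the inequalities then gives the confidence interval stated earlier. A sketch of the algebra, where $t$ is shorthand (introduced here only for brevity) for $t_{[\alpha/2]}^{(n-1)}$:

```latex
\begin{align*}
1 - \alpha
  &= P\left(-t \le \frac{\bar{y} - \mu}{s/\sqrt{n}} \le t\right) \\
  &= P\left(-t\,\frac{s}{\sqrt{n}} \le \bar{y} - \mu \le t\,\frac{s}{\sqrt{n}}\right) \\
  &= P\left(\bar{y} - t\,\frac{s}{\sqrt{n}} \le \mu \le \bar{y} + t\,\frac{s}{\sqrt{n}}\right)
\end{align*}
```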
Confidence Intervals Based on Normal Distribution
 The Central Limit Theorem states that:
 If the sample size n is large (at least 30), then the population of all possible sample means approximately has a normal distribution (with mean $\mu_{\bar{y}} = \mu$ and standard deviation $\sigma_{\bar{y}} = \sigma / \sqrt{n}$), no matter what probability distribution describes the population sampled.
 Therefore, if n is large, the population of all possible values of
$$\frac{\bar{y} - \mu_{\bar{y}}}{s_{\bar{y}}} = \frac{\bar{y} - \mu}{s / \sqrt{n}}$$
approximately has a standard normal distribution.
 This implies that
$$\left[\bar{y} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right] \quad \text{and} \quad \left[\bar{y} \pm z_{\alpha/2} \frac{s}{\sqrt{n}}\right]$$
are approximately correct 100(1 - α)% confidence intervals for μ, no matter what probability distribution describes the population sampled.
 The second interval follows from the first by approximating σ by s.

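 The population of all possible values of $\frac{\bar{y} - \mu}{\sigma / \sqrt{n}}$ approximately has a standard normal distribution; therefore,
$$P\left(-z_{\alpha/2} \le \frac{\bar{y} - \mu}{\sigma / \sqrt{n}} \le z_{\alpha/2}\right) \approx 1 - \alpha$$
 This probability is the area under the curve of the standard normal distribution between $-z_{\alpha/2}$ and $z_{\alpha/2}$.
 Rearranging the inequalities, we find that
$$P\left[\bar{y} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{y} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right] \approx 1 - \alpha$$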
 This implies that $\left[\bar{y} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right]$ is an approximately correct 100(1 - α)% confidence interval for μ.
 When we do not know the true value of the population standard deviation σ, we should use the 100(1 - α)% confidence interval for μ based on the normal distribution, $\left[\bar{y} \pm z_{\alpha/2} \frac{s}{\sqrt{n}}\right]$, if the sample size n is large (say, at least 30).
 If the sample size n is small and the population sampled is normally distributed (or at least mound-shaped), we should use the 100(1 - α)% confidence interval for μ based on the t-distribution, $\left[\bar{y} \pm t_{[\alpha/2]}^{(n-1)} \frac{s}{\sqrt{n}}\right]$.
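As a rough illustration of the large-sample interval, the sketch below computes $\bar{y} \pm z_{\alpha/2}\, s/\sqrt{n}$ with Python's standard library. The function name `z_interval` and the summary statistics passed to it (n = 50, ȳ = 31.2, s = 0.75) are hypothetical and do not come from the slides.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(y_bar, s, n, alpha=0.05):
    """Approximate 100(1 - alpha)% interval: y_bar +/- z_{alpha/2} * s / sqrt(n)."""
    # z_{alpha/2} is the point with area alpha/2 to its right under the standard normal curve.
    z_point = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z_point * s / sqrt(n)
    return y_bar - half_width, y_bar + half_width

# Hypothetical large-sample summary statistics (n >= 30).
lower, upper = z_interval(y_bar=31.2, s=0.75, n=50)
print(f"approximate 95% interval for mu: [{lower:.3f}, {upper:.3f}]")
```

For a small sample from a roughly normal population, the same structure applies with the z point replaced by $t_{[\alpha/2]}^{(n-1)}$ taken from a t table.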
An Illustration of the Central Limit Theorem

(b) Corresponding populations of all possible sample means for different sample sizes
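A small simulation can illustrate the same idea numerically. The sketch below is only an illustration under assumed settings. It samples from an exponential population with mean 1 and standard deviation 1 (a skewed, non-normal choice made here, not taken from the figure) and compares the standard deviation of the simulated sample means with σ/√n.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

random.seed(0)          # arbitrary seed, for reproducibility
REPS = 5000             # arbitrary number of simulated samples per sample size

def simulated_sample_means(n):
    """Return REPS sample means, each from n draws of an exponential(1) population."""
    return [fmean([random.expovariate(1.0) for _ in range(n)]) for _ in range(REPS)]

results = {}
for n in (2, 5, 30):
    means = simulated_sample_means(n)
    results[n] = (fmean(means), pstdev(means))
    print(f"n={n:2d}  mean of means={results[n][0]:.3f}  "
          f"sd of means={results[n][1]:.3f}  sigma/sqrt(n)={1 / sqrt(n):.3f}")
```

As n grows, the simulated means stay centered on the population mean while their spread shrinks toward σ/√n.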
Sample Size Determination
 Suppose we wish to determine the sample size n so that we are 100(1 - α)% confident that ȳ is within B units of μ.
 We can determine the necessary sample size as follows:
$$z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = B, \quad \text{therefore} \quad z_{\alpha/2}\, \sigma = \sqrt{n}\, B, \quad \text{and} \quad n = \left(\frac{z_{\alpha/2}\, \sigma}{B}\right)^2$$

 The population standard deviation σ is probably unknown, so we must often find an estimate of σ.
 From Chebyshev's Theorem, σ ≈ R/4, where R is the range of the values in the population to be sampled.
 Alternatively, we can calculate the standard deviation, denoted by $s_p$, of a preliminary sample of $n_p$ values randomly selected from the population.
 If $n_p - 1$ is at least 30, we calculate n by the formula
$$n = \left(\frac{z_{\alpha/2}\, s_p}{B}\right)^2$$
otherwise,
$$n = \left(\frac{t_{[\alpha/2]}^{(n_p-1)}\, s_p}{B}\right)^2$$
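The z-based sample size formula can be sketched in a few lines. The helper name `sample_size`, the bound B = 0.3, and rounding up to the next whole number are assumptions made for this illustration. The value 0.8 reuses the standard deviation assumed for the Hawk mileage population.

```python
from math import ceil
from statistics import NormalDist

def sample_size(sigma_estimate, B, alpha=0.05):
    """Smallest whole n with z_{alpha/2} * sigma_estimate / sqrt(n) <= B."""
    z_point = NormalDist().inv_cdf(1 - alpha / 2)
    # n = (z * sigma / B)^2, rounded up because n must be a whole number (assumption).
    return ceil((z_point * sigma_estimate / B) ** 2)

# Hypothetical target: 95% confident that y_bar is within 0.3 units of mu.
n_required = sample_size(sigma_estimate=0.8, B=0.3)
print(f"required sample size: {n_required}")
```

Halving B roughly quadruples the required n, because B enters the formula squared.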
Hypothesis Testing for a Population Mean
TESTING null hypothesis H0: μ = c, VERSUS alternative hypothesis H1: μ ≠ c.
 The classical approach to testing these hypotheses utilizes the test statistic
$$t = \frac{\bar{y} - c}{s / \sqrt{n}}$$
where ȳ and s are the mean and standard deviation of a sample of size n that has been randomly selected.
 The test statistic t measures the distance between ȳ and c (the value that makes H0 true), in units of the standard error s/√n.
 A test statistic nearly or exactly zero provides little or no evidence to support rejecting H0 in favor of H1.
 A positive test statistic substantially greater than zero provides evidence to support rejecting H0 in favor of H1.
 Similarly, a negative test statistic substantially less than zero provides evidence to support rejecting H0 in favor of H1.
 To decide how large in absolute value the test statistic must be, we consider the errors that can be made in hypothesis testing.
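The test statistic is simple to compute. The sketch below reuses the five Hawk mileages and, purely for illustration, takes c = 31.5 (the population mean assumed in the Hawk illustration). The function name `t_statistic` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample, c):
    """Return t = (y_bar - c) / (s / sqrt(n)) for testing H0: mu = c."""
    n = len(sample)
    return (mean(sample) - c) / (stdev(sample) / sqrt(n))

mileages = [30.7, 31.8, 30.2, 32.0, 31.3]
t = t_statistic(mileages, c=31.5)   # c = 31.5 is an illustrative choice
print(f"t = {t:.3f}")
```

Here the sample mean lies less than one standard error below c, so t is a small negative number and offers little evidence against H0.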
Example: The G & B Corporation produces a 16-ounce bottle of Gem Shampoo. The bottles are filled by an automated bottle-filling process. If, for a particular adjustment of the bottle-filling process, this process is substantially overfilling or underfilling bottles, then it must be shut down and readjusted. Overfilling results in lost profits for G & B, while underfilling is unfair to consumers. For a given adjustment of the bottle-filling process we consider the infinite population of all bottles that could potentially be produced.
For each bottle there is a corresponding bottle fill (measured in ounces). We let µ denote the mean of the infinite population of all the bottle fills that could potentially be produced by (the particular adjustment of) the bottle-filling process. G & B has decided that it will shut down and readjust the process if it can be very certain that it should reject the null hypothesis H0: µ = 16 in favor of the alternative hypothesis H1: µ ≠ 16.
H0 says that the mean bottle fill is at the appropriate level.
H1 says that the mean fill is above or below the desired 16 ounces.

1. The probability
$$P\left(-t_{[\alpha/2]}^{(n-1)} \le t \le t_{[\alpha/2]}^{(n-1)}\right) = 1 - \alpha$$

2. The probability that


 TESTING null hypothesis H0: μ = c, VERSUS alternative hypothesis H1: μ ≠ c by using a confidence interval

Example: Consider the Gem Shampoo bottle fill problem. Setting the probability of a Type I error equal to α, we can reject H0: µ = 16 in favor of the alternative hypothesis H1: µ ≠ 16 if and only if the 100(1 - α)% confidence interval for µ does not contain 16.
Calculate α
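The equivalence between the interval-based rule and a rule based on the size of the test statistic can be checked on data. The sketch below is an illustration only. The six bottle fills are hypothetical (the slides give no data), the function names are invented here, and 2.571 is the standard table value of $t_{[.025]}^{(5)}$.

```python
from math import sqrt
from statistics import mean, stdev

def reject_by_statistic(sample, c, t_point):
    """Reject H0: mu = c when |t| exceeds the t point."""
    n = len(sample)
    t = (mean(sample) - c) / (stdev(sample) / sqrt(n))
    return abs(t) > t_point

def reject_by_interval(sample, c, t_point):
    """Reject H0: mu = c when the confidence interval does not contain c."""
    n = len(sample)
    y_bar = mean(sample)
    half_width = t_point * stdev(sample) / sqrt(n)
    return not (y_bar - half_width <= c <= y_bar + half_width)

fills = [16.1, 15.9, 16.2, 16.0, 16.3, 16.1]   # hypothetical bottle fills (ounces)
T_POINT = 2.571                                 # assumed table value, 5 degrees of freedom

print(reject_by_statistic(fills, 16, T_POINT), reject_by_interval(fills, 16, T_POINT))
```

Both rules give the same decision because |t| > t point is an algebraic rearrangement of "c lies outside ȳ ± t point × s/√n".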
 Considerations In Setting α And The Distinction Between Statistical And Practical Significance
Testing H0: μ = c Versus H1: μ > c
 TESTING null hypothesis H0: μ = c, VERSUS alternative hypothesis H1: μ > c by using a Rejection Point
A Lower One-Sided Confidence Interval
 TESTING null hypothesis H0: μ = c, VERSUS alternative hypothesis H1: μ > c

 Example: Consider the sample of n = 5 mileages. Calculate a lower one-sided 95% confidence interval for the mean mileage μ based on this sample.
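One way to carry out this calculation is sketched below. It assumes the usual form of a lower one-sided interval, $[\bar{y} - t_{[\alpha]}^{(n-1)}\, s/\sqrt{n},\ \infty)$, which this text does not spell out. It also uses the standard table value $t_{[.05]}^{(4)} = 2.132$.

```python
from math import sqrt
from statistics import mean, stdev

mileages = [30.7, 31.8, 30.2, 32.0, 31.3]
n = len(mileages)
y_bar = mean(mileages)
s = stdev(mileages)

# One-sided interval: all of alpha = .05 goes in one tail, so the t point
# is t_{[.05]}^{(4)} (assumed table value) rather than t_{[.025]}^{(4)}.
T_POINT_ONE_SIDED = 2.132
lower_bound = y_bar - T_POINT_ONE_SIDED * s / sqrt(n)
print(f"lower one-sided 95% interval for mu: [{lower_bound:.3f}, infinity)")
```

The one-sided lower bound is higher than the lower end of the two-sided 95% interval for the same sample, since the one-sided t point is smaller.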
