Confidence Intervals

In statistical inference, one wishes to estimate population parameters using observed sample data.

A confidence interval gives an estimated range of values which is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data. The confidence level can be set to any probability, the most common choices being 95% and 99%. The common notation for the parameter in question is $\theta$. Often, this parameter is the population mean $\mu$, which is estimated through the sample mean $\bar{x}$.

The level C of a confidence interval gives the probability that the interval produced by the method employed includes the true value of the parameter $\theta$.

Example No. 1

Suppose a student measuring the boiling temperature of a certain liquid observes the readings (in degrees Celsius) 102.5, 101.7, 103.1, 100.9, 100.5, and 102.2 on 6 different samples of the liquid. He calculates the sample mean to be 101.82. If he knows that the standard deviation for this procedure is 1.2 degrees, what is the confidence interval for the population mean at a 95% confidence level?

In other words, the student wishes to estimate the true mean boiling temperature of the liquid using the results of his measurements. If the measurements follow a normal distribution, then the sample mean will have the distribution $N(\mu, \sigma/\sqrt{n})$. Since the sample size is 6, the standard deviation of the sample mean is equal to $1.2/\sqrt{6} = 0.49$.

The selection of a confidence level for an interval determines the probability that the confidence interval produced will contain the true parameter value. Common choices for the confidence level C are 0.90, 0.95, and 0.99. These levels correspond to percentages of the area of the normal density curve. For example, a 95% confidence interval covers 95% of the normal curve; the probability of observing a value outside of this area is 0.05. Because the normal curve is symmetric, half of that area is in the left tail of the curve, and the other half is in the right tail. For a confidence interval with level C, the area in each tail of the curve is equal to (1-C)/2. For a 95% confidence interval, the area in each tail is equal to 0.05/2 = 0.025.

The value z* representing the point on the standard normal density curve such that the probability of observing a value greater than z* is equal to p is known as the upper p critical value of the standard normal distribution. For example, if p = 0.025, the value z* such that P(Z > z*) = 0.025, or P(Z < z*) = 0.975, is equal to 1.96. For a confidence interval with level C, the value p is equal to (1-C)/2. A 95% confidence interval for the standard normal distribution, then, is the interval (-1.96, 1.96), since 95% of the area under the curve falls within this interval.

7.1.1 Confidence Intervals for Unknown Mean and Known Standard Deviation

For a population with unknown mean $\mu$ and known standard deviation $\sigma$, a confidence interval for the population mean, based on a simple random sample (SRS) of size n, is

$$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}},$$

where z* is the upper (1-C)/2 critical value for the standard normal distribution. For the boiling-temperature example, the 95% confidence interval is $101.82 \pm 1.96(0.49) = 101.82 \pm 0.96$, or (100.86, 102.78).

An increase in sample size will decrease the length of the confidence interval without reducing the level of confidence. This is because the standard deviation of the sample mean, $\sigma/\sqrt{n}$, decreases as n increases. The margin of error m of a confidence interval is defined to be the value added to or subtracted from the sample mean, which determines the length of the interval:

$$m = z^* \frac{\sigma}{\sqrt{n}}$$

7.1.2 Confidence Intervals for Unknown Mean and Unknown Standard Deviation

In most practical research, the standard deviation for the population of interest is not known. In this case, the standard deviation $\sigma$ is replaced by the sample standard deviation s, and the resulting estimate $s/\sqrt{n}$ of the standard deviation of the sample mean is known as the standard error. Since the standard error is only an estimate of the true value $\sigma/\sqrt{n}$, the distribution of the sample mean can no longer be treated as normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. Instead, the standardized sample mean $(\bar{x} - \mu)/(s/\sqrt{n})$ follows the t distribution. The t distribution is described by its degrees of freedom. For a sample of size n, the t distribution will have n-1 degrees of freedom. The notation for a t distribution with k degrees of freedom is t(k). As the sample size n increases, the t distribution becomes closer to the normal distribution, since the standard error approaches the true standard deviation $\sigma/\sqrt{n}$ for large n.

For a population with unknown mean $\mu$ and unknown standard deviation, a confidence interval for the population mean, based on a simple random sample (SRS) of size n, is

$$\bar{x} \pm t^* \frac{s}{\sqrt{n}},$$

where t* is the upper (1-C)/2 critical value for the t distribution with n-1 degrees of freedom, t(n-1).

7.2 Prediction Intervals

Predicting the next future observation with a 100(1-α)% prediction interval

Suppose that $X_1, X_2, \ldots, X_n$ is a random sample from a normal population. We wish to predict the value $X_{n+1}$, a single future observation. A point prediction of $X_{n+1}$ is $\bar{X}$, the sample mean. The prediction error is $X_{n+1} - \bar{X}$. The expected value of the prediction error is

$$E(X_{n+1} - \bar{X}) = \mu - \mu = 0$$

and the variance of the prediction error is

$$V(X_{n+1} - \bar{X}) = \sigma^2 + \frac{\sigma^2}{n} = \sigma^2 \left(1 + \frac{1}{n}\right)$$

because the future observation $X_{n+1}$ is independent of the mean of the current sample $\bar{X}$. The prediction error is normally distributed. Therefore,

$$Z = \frac{X_{n+1} - \bar{X}}{\sigma \sqrt{1 + \frac{1}{n}}}$$

has a standard normal distribution. Replacing $\sigma$ with S results in

$$T = \frac{X_{n+1} - \bar{X}}{S \sqrt{1 + \frac{1}{n}}}$$

which has a t distribution with n-1 degrees of freedom. Manipulating T as we have done previously in the development of a CI leads to a prediction interval on the future observation $X_{n+1}$.

Definition:
A 100(1-α)% prediction interval on a single future observation from a normal distribution is given by

$$\bar{x} - t_{\alpha/2,\,n-1}\, s \sqrt{1 + \frac{1}{n}} \;\le\; X_{n+1} \;\le\; \bar{x} + t_{\alpha/2,\,n-1}\, s \sqrt{1 + \frac{1}{n}}$$

Example No. 1:

Reconsider the tensile adhesion tests on specimens of U-700 alloy described in Example 8-4. The load at failure for n = 22 specimens was observed, and the sample mean $\bar{x}$ and sample standard deviation $s$ were computed, together with a 95% confidence interval on $\mu$. We plan to test a twenty-third specimen. A 95% prediction interval on the load at failure for this specimen is

$$\bar{x} - t_{0.025,\,21}\, s \sqrt{1 + \frac{1}{22}} \;\le\; X_{23} \;\le\; \bar{x} + t_{0.025,\,21}\, s \sqrt{1 + \frac{1}{22}}$$

Notice that the prediction interval is considerably longer than the CI.

7.3 Tolerance Interval

Consider a population of semiconductor processors. Suppose that the speed of these processors has a normal distribution with mean $\mu = 600$ megahertz and standard deviation $\sigma = 30$ megahertz. Then the interval from 600 - 1.96(30) = 541.2 to 600 + 1.96(30) = 658.8 megahertz captures the speed of 95% of the processors in this population, because the interval from -1.96 to 1.96 captures 95% of the area under the standard normal curve. The interval from $\mu - z_{\alpha/2}\sigma$ to $\mu + z_{\alpha/2}\sigma$ is called a tolerance interval.

If $\mu$ and $\sigma$ are unknown and are replaced by the sample estimates $\bar{x}$ and s, an interval intended to capture a specific percentage of the values of a population will probably contain less than this percentage, because of sampling variability in $\bar{x}$ and s. A tolerance interval for capturing at least γ% of the values in a normal distribution with confidence level 100(1-α)% is

$$(\bar{x} - ks,\; \bar{x} + ks)$$

where k is a tolerance interval factor found in Appendix Table XI. Values are given for γ = 90%, 95%, and 99% and for 95% and 99% confidence.

One-sided tolerance bounds can also be computed. The tolerance factors for these bounds are also given in Appendix Table XI.

Example No. 1:

Let's reconsider the tensile adhesion tests. The load at failure for n = 22 specimens was observed, and the sample mean $\bar{x}$ and sample standard deviation $s$ were computed. We want to find a tolerance interval for the load at failure that includes 90% of the values in the population with 95% confidence. The tolerance factor k for n = 22, γ = 90%, and 95% confidence is read from Appendix Table XI. The desired tolerance interval is $(\bar{x} - ks,\; \bar{x} + ks)$, which reduces to (23.67, 39.75). We can be 95% confident that at least 90% of the values of load at failure for this particular alloy lie between 23.67 and 39.75 megapascals.