
CHAPTER 6 - ESTIMATION

OVERVIEW

Estimation is concerned with estimating the values of specific population parameters.

Hypothesis Testing is concerned with testing whether the value of a population parameter is
equal to some specific value. (Discussed more in the latter part of the book.)
• Sampling Methods
• Estimation of the Mean of a Distribution
• Central Limit Theorem (CLT)
• Interval Estimation (Confidence Interval)
• Estimation for the Binomial Distribution
SAMPLING METHODS
Population – every member of a group
Sample – subset of the population, has similar characteristics to the population.

Random Sample is a selection of some members of the population such that
each member is independently chosen and has a known nonzero probability of
being selected.

Simple Random Sample is a random sample in which each group member has
the same probability of being selected.

More details on pages 155 & 156.


More details in Section 6.4.

SAMPLING METHODS (contd.)


Cluster Sampling is when a population is subdivided on similar characteristics and
all the members of the randomly selected subgroups are chosen to be in the sample.

Stratified Sampling is when a population is subdivided into subgroups (strata) on similar
characteristics and random members of every subgroup are chosen to be in the sample.
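The three sampling schemes can be contrasted with a small sketch. The population, its subgroup labels, and the sample sizes below are made up for illustration. The stratified draw takes random members from every subgroup, which is the usual convention and an assumption of this sketch.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 12 members in 3 subgroups of 4 (labels are made up).
population = {
    "A": ["a1", "a2", "a3", "a4"],
    "B": ["b1", "b2", "b3", "b4"],
    "C": ["c1", "c2", "c3", "c4"],
}
everyone = [member for group in population.values() for member in group]

# Simple random sample: every member has the same chance of being selected.
simple = rng.sample(everyone, 4)

# Cluster sample: pick whole subgroups at random and keep all of their members.
chosen_clusters = rng.sample(sorted(population), 2)
cluster = [member for name in chosen_clusters for member in population[name]]

# Stratified sample: draw random members from every subgroup.
stratified = [m for group in population.values() for m in rng.sample(group, 2)]

print(simple)
print(cluster)
print(stratified)
```

Note that the cluster sample contains entire subgroups, while the stratified sample contains a few members of each subgroup.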

Blinding prevents biased reporting of outcomes by the patient and/or the physician.
Blinding is not always feasible in some research settings.
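Single – Blind: the patient is blinded.
Double – Blind: the patient and the physician are blinded.
Triple – Blind: the patient, the physician, and the researcher are blinded.
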
Example: Consider rolling a die five times and finding the sample mean.
What is the behavior we see when this process continues indefinitely?

All outcomes are equally likely, so the population mean is 3.5; the mean of the
sample means over 10,000 trials is 3.49. If continued indefinitely, the mean of the
sample means will converge to 3.5. Also notice the distribution of the sample means
is approximately normal.
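The die experiment can be imitated with a short simulation. This is a minimal sketch; the seed and the helper name `sample_mean_of_rolls` are illustrative choices, and the exact average depends on the seed.

```python
import random
from statistics import fmean

rng = random.Random(1)  # fixed seed so the run is reproducible

def sample_mean_of_rolls(n_rolls=5):
    """Roll a fair die n_rolls times and return the sample mean."""
    return sum(rng.randint(1, 6) for _ in range(n_rolls)) / n_rolls

# Repeat the five-roll experiment 10,000 times.
sample_means = [sample_mean_of_rolls() for _ in range(10_000)]

# The average of the sample means should sit close to the population mean 3.5.
mean_of_means = fmean(sample_means)
print(round(mean_of_means, 2))
```

A histogram of `sample_means` would show the roughly bell-shaped pattern described above.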

Thus, the mean of the sample means targets the population mean. The distribution of
the sample means is known as the Sampling Distribution of $\bar{X}$.

$$E(\bar{X}) = \mu$$

In other words, $\bar{X}$ is an unbiased estimator of μ.

The larger the sample size, the more precise an estimator $\bar{X}$ is.
Central Limit Theorem (CLT)

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$


Many times the distributions we encounter are not normal. In such cases, the CLT can be applied: for sufficiently large n, the sample mean is still approximately normally distributed.
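A simulation gives a feel for the CLT with a clearly non-normal population. The sketch below assumes an exponential population with mean 1 and standard deviation 1, and samples of size 30; both choices are illustrative and not taken from the slides.

```python
import random
from math import sqrt
from statistics import fmean, pvariance

rng = random.Random(2)  # fixed seed for reproducibility

N = 30            # size of each sample (assumed for illustration)
TRIALS = 20_000   # number of repeated samples
MU = SIGMA = 1.0  # an exponential distribution with rate 1 has mean 1 and sd 1

def sample_mean():
    """Mean of N draws from a strongly right-skewed (non-normal) population."""
    return sum(rng.expovariate(1.0) for _ in range(N)) / N

sample_means = [sample_mean() for _ in range(TRIALS)]

mean_of_means = fmean(sample_means)     # should be close to MU
var_of_means = pvariance(sample_means)  # should be close to SIGMA**2 / N

# If the sample means are roughly normal, about 95% of them should fall
# within 1.96 standard errors of MU.
half_width = 1.96 * SIGMA / sqrt(N)
share_within = sum(abs(m - MU) < half_width for m in sample_means) / TRIALS

print(mean_of_means, var_of_means, SIGMA**2 / N, share_within)
```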
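
$$\bar{X} \sim N\left(112, \frac{(20.6)^2}{10}\right), \quad \mu = 112, \quad \frac{\sigma}{\sqrt{n}} = \frac{20.6}{\sqrt{10}} = 6.5143$$

$$P(98 < \bar{X} < 126) = P\left(\frac{98 - 112}{6.51} < z < \frac{126 - 112}{6.51}\right) = P(-2.15 < z < 2.15)$$

$$= \Phi(2.15) - \Phi(-2.15) = 0.9842 - 0.0158 = 0.9684$$
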

Thus, if the CLT holds, 96.8% of the samples of size 10 would be expected to have a
mean birthweight between 98 and 126 oz.
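The same probability can be checked numerically. This sketch uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative. Because it does not round z to 2.15 first, the result can differ from the table-based value in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 112, 20.6, 10

# Standard error of the sample mean: sigma / sqrt(n).
se = sigma / sqrt(n)

# Sampling distribution of the mean under the CLT: N(mu, sigma^2 / n).
sampling_dist = NormalDist(mu, se)

# P(98 < X-bar < 126) as a difference of two cumulative probabilities.
prob = sampling_dist.cdf(126) - sampling_dist.cdf(98)

print(round(se, 4), round(prob, 3))
```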
An estimator $\hat{\theta}$ of a parameter $\theta$ is unbiased when $E(\hat{\theta}) = \theta$, in the same way that $\bar{X}$ is an unbiased estimator of μ.

This in turn makes $\bar{X}$ a point estimate of μ.

A point estimate is a single value used to estimate a population parameter.

We frequently wish to obtain an interval estimate, or confidence interval, rather than a single value.

A confidence interval (CI) gives us a range of plausible values for a population parameter,
that is, a range of values used to estimate the true value of the population parameter.
Before we calculate a confidence interval (CI), we need to first go over the components
of a CI.

Confidence level (CL = 1 – α): how confident do you want to be that the population
parameter is in the interval?
Common CLs: 90%, 95%, or 99%

Critical value ($z_c$) is the value that separates the likely from the unlikely.
Find the critical value that corresponds to 95%.
$$0.95 = 1 - \alpha$$
$$\alpha = 0.05$$
Because there are two tails we need to divide by 2.
$$\frac{\alpha}{2} = \frac{0.05}{2} = 0.025$$
Now we use the Normal Distribution table to find the z-score that corresponds to an
area of 0.025 to the left: z = -1.96. By symmetry, the two critical values are
$\pm 1.96$, so $z_c = 1.96$.
Confidence Interval for μ

σ is known: Normal Distribution

$$E = z_c \cdot \frac{\sigma}{\sqrt{n}}$$

σ is unknown: Student’s T - Distribution

$$E = t_c \cdot \frac{s}{\sqrt{n}}$$

In both cases:

$$CI: \bar{x} - E < \mu < \bar{x} + E$$


Margin of Error (E): maximum likely difference between the sample statistic
and population parameter.

Most of the time, we are not given σ, so we will need to use the T – Distribution.
T - Distribution

Because σ is unknown, it is reasonable to estimate σ by the sample standard deviation s. The
problem with this is that the statistic $(\bar{X} - \mu)/(s/\sqrt{n})$ is no longer normally distributed.

To resolve this, we have the Student’s T Distribution (T – Distribution), which was developed by
William Gosset while working for the Guinness Brewery in Ireland. He found that the shape of the
distribution depends on the sample size n.

Thus, the t distribution is not a unique distribution but is instead a family of distributions
indexed by a parameter referred to as the degrees of freedom (df) of the distribution.

Degrees of Freedom (df = n-1): For a collection of sample data, the number of sample values
that can vary after certain restrictions have been imposed on all data values.

Example: There are 9 fielding positions in baseball. Let’s say we have 9 players, we can freely put
8 players in any position until we get to the ninth player. That last player will have a
predetermined position.
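The same idea can be seen numerically for a sample: once the sample mean is fixed, the deviations from it must sum to zero, so only n - 1 of them are free and the last one is predetermined. The data values below are hypothetical.

```python
from statistics import fmean

data = [4.0, 7.0, 1.0, 8.0]  # hypothetical sample, n = 4
xbar = fmean(data)

# Deviations of each value from the sample mean.
deviations = [x - xbar for x in data]

# The first n - 1 deviations fix the last one, because all n must sum to zero.
implied_last = -sum(deviations[:-1])

print(deviations, implied_last)
```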
$t_{df,u}$: where df = degrees of freedom and u indicates the 100 × u-th percentile

$$t_{20,0.95} = 1.725$$

$$t_{23,0.95} = 1.714$$

As n increases in size, the t distribution converges to an $N(0,1)$ distribution.


CONFIDENCE INTERVALS
The following two confidence intervals are calculated from the data set for Sample 1.
More details can be found on page 161.

95% CI:

$$n = 10, \quad \bar{x} = 116.90, \quad s = 21.70, \quad t_{9,0.975} = 2.262$$

$$E = t_{9,0.975} \cdot \frac{s}{\sqrt{n}} = 2.262 \cdot \frac{21.70}{\sqrt{10}} = 15.52217$$

$$116.90 - 15.52217 < \mu < 116.90 + 15.52217$$

$$101.378 < \mu < 132.422$$


Interpretation: 95% of such intervals that could be constructed from repeated random
samples of size n contain parameter μ.

The important part to understand is that intervals depend on the sample mean and
sample variance and vary from sample to sample. Also the parameter μ is a fixed
unknown constant. Therefore, we can’t say there is a 95% chance that the parameter μ
will fall within a particular 95% CI.
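The repeated-sampling interpretation can be illustrated by simulation. The sketch below assumes a normal population with made-up "true" values μ = 112 and σ = 20.6 (used only so that coverage can be checked) and reuses the table value 2.262 for the critical value with 9 degrees of freedom.

```python
import random
from math import sqrt
from statistics import fmean, stdev

rng = random.Random(3)  # fixed seed for reproducibility

MU, SIGMA = 112.0, 20.6  # hypothetical true population values (simulation only)
N = 10                   # sample size, as in the example
T_CRIT = 2.262           # t_{9,0.975} from the t table

def interval_covers_mu():
    """Draw one sample, build its 95% CI, and report whether it contains MU."""
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    xbar, s = fmean(sample), stdev(sample)
    e = T_CRIT * s / sqrt(N)
    return xbar - e < MU < xbar + e

trials = 10_000
coverage = sum(interval_covers_mu() for _ in range(trials)) / trials
print(coverage)
```

Each simulated interval either contains μ or it does not; the 95% describes the long-run share of intervals that do.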
99% CI:

$$n = 10, \quad \bar{x} = 116.90, \quad s = 21.70, \quad t_{9,0.995} = 3.250$$

$$E = t_{9,0.995} \cdot \frac{s}{\sqrt{n}} = 3.250 \cdot \frac{21.70}{\sqrt{10}} = 22.30196$$

$$116.90 - 22.30196 < \mu < 116.90 + 22.30196$$

$$94.598 < \mu < 139.202$$

Interpretation: 99% of such intervals that could be constructed from
repeated random samples of size n contain parameter μ.
Notice that the 99% CI (94.6, 139.2) is wider than the 95% CI (101.4, 132.4).

The rationale for this difference is that the higher the level of confidence desired that
μ lies within the interval, the wider the CI must be.
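Both intervals can be reproduced with one small helper, which makes the width comparison explicit. The function name `t_interval` is an illustrative choice, and the critical values are the table values quoted in the example.

```python
from math import sqrt

def t_interval(xbar, s, n, t_crit):
    """Return (lower, upper) for the CI xbar +/- t_crit * s / sqrt(n)."""
    e = t_crit * s / sqrt(n)  # margin of error
    return xbar - e, xbar + e

ci95 = t_interval(116.90, 21.70, 10, 2.262)  # t_{9,0.975}
ci99 = t_interval(116.90, 21.70, 10, 3.250)  # t_{9,0.995}

width95 = ci95[1] - ci95[0]
width99 = ci99[1] - ci99[0]

print([round(v, 3) for v in ci95], round(width95, 3))
print([round(v, 3) for v in ci99], round(width99, 3))
```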

You can also see a change in the interval if you change the sample size to 100.

Find the 95% CI where n = 100.
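One way to work this exercise is sketched below. It assumes the sample mean and sample standard deviation stay at 116.90 and 21.70 when n = 100, which the slides do not state, and it uses the table value of about 1.984 for the 97.5th percentile of the t distribution with 99 degrees of freedom.

```python
from math import sqrt

xbar, s = 116.90, 21.70  # assumed unchanged from the n = 10 example

# Margin of error for n = 10 (t_{9,0.975} = 2.262) and n = 100 (t_{99,0.975} ~ 1.984).
e_n10 = 2.262 * s / sqrt(10)
e_n100 = 1.984 * s / sqrt(100)

ci_n100 = (xbar - e_n100, xbar + e_n100)

print(round(e_n10, 3), round(e_n100, 3))
print(tuple(round(v, 3) for v in ci_n100))
```

Under these assumptions the margin of error shrinks to less than a third of its n = 10 value, so the interval is much narrower.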

For more practice, try example 6.37 on page 180.


Estimation for the Binomial Distribution
Example: Consider rolling a die five times and finding the sample proportion of odd
numbers. What is the behavior we see when this process continues indefinitely?

$$\hat{p} = \frac{x}{n} \quad \text{(sample proportion)}$$

$p$: (population proportion)

All outcomes are equally likely, so the population proportion of odd
numbers is 0.50; the mean of the sample proportions over 10,000 trials is 0.50. If
continued indefinitely, the mean of sample proportions will be 0.50. Also, notice the
distribution is “approximately normal.”
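The proportion experiment can be simulated in the same spirit. This is a minimal sketch; the seed and the helper name `sample_proportion_odd` are illustrative, and the exact average depends on the seed.

```python
import random
from statistics import fmean

rng = random.Random(4)  # fixed seed for reproducibility

def sample_proportion_odd(n_rolls=5):
    """Roll a fair die n_rolls times and return the proportion of odd outcomes."""
    odd_count = sum(rng.randint(1, 6) % 2 for _ in range(n_rolls))
    return odd_count / n_rolls

# Repeat the five-roll experiment 10,000 times.
proportions = [sample_proportion_odd() for _ in range(10_000)]

# The average sample proportion should be close to the population proportion 0.50.
mean_proportion = fmean(proportions)
print(round(mean_proportion, 2))
```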
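Confidence Interval for Proportions ($p$)
The sample proportion approximately follows a normal distribution, so we use z-scores when finding the CI for proportions.

$$E = z \cdot \sqrt{\frac{\hat{p} \cdot \hat{q}}{n}}, \quad \text{where } \hat{q} = 1 - \hat{p}$$

$$CI: \hat{p} - E < p < \hat{p} + E$$
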

More details can be found in Section 6.8.

$$\hat{p} = 0.040, \quad \alpha = 0.05, \quad z_{1-\alpha/2} = 1.96, \quad n = 10{,}000$$

We have that $n\hat{p}\hat{q} = 10{,}000(0.040)(0.96) = 384 > 5$, which is a requirement for
using CIs for the binomial parameter p.

$$E = 1.96 \cdot \sqrt{\frac{(0.040)(0.96)}{10{,}000}} = 0.00384$$

$$0.040 - 0.00384 < p < 0.040 + 0.00384$$

$$0.036 < p < 0.044$$
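The calculation above can be wrapped in a small function that also checks the sample-size condition. The function name `proportion_ci` and the choice to raise an error when the condition fails are assumptions of this sketch.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(p_hat, n, confidence_level=0.95):
    """Normal-approximation CI for a binomial proportion p."""
    q_hat = 1 - p_hat
    # Condition used in the text for the normal approximation to be reasonable.
    if n * p_hat * q_hat < 5:
        raise ValueError("n * p_hat * q_hat is below 5; normal approximation not advised")
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    e = z * sqrt(p_hat * q_hat / n)  # margin of error
    return p_hat - e, p_hat + e

low, high = proportion_ci(0.040, 10_000)
print(round(low, 3), round(high, 3))
```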
