Вы находитесь на странице: 1из 40

STAT 101

Dr. Kari Lock Morgan


9/13/12

Confidence Intervals:
Sampling Distribution

SECTIONS 3.1, 3.2


• Sampling Distributions (3.1)
• Confidence Intervals (3.2)

Statistics: Unlocking the Power of Data Lock5


The Big Picture

Population Sampling

Sample
Statistical
Inference
Statistics: Unlocking the Power of Data Lock5
Statistical Inference

Statistical inference is the process of


drawing conclusions about the entire
population based on information in a sample.

Statistics: Unlocking the Power of Data Lock5


Statistic and Parameter
A parameter is a number that describes
some aspect of a population.
A statistic is a number that is computed
from data in a sample.

 We usually have a sample statistic and want to


use it to make inferences about the population
parameter

Statistics: Unlocking the Power of Data Lock5


The Big Picture

Population Sampling
PARAMETERS

Sample
STATISTICS
Statistical
Inference
Statistics: Unlocking the Power of Data Lock5
Parameter versus Statistic

mu x-bar

p-hat

sigma

rho

Statistics: Unlocking the Power of Data Lock5


Election Polls
 Over the weekend (9/7/12 – 9/9/12), 1000
registered voters were asked who they plan to
vote for in the 2012 presidential election

 What proportion of voters plan to vote for


Obama?
𝑝ො = 0.50 p = ???
http://www.politico.com/p/2012-election/polls/president
Statistics: Unlocking the Power of Data Lock5
Point Estimate

We use the statistic from a sample as


a point estimate for a population
parameter.

 Point estimates will not match population


parameters exactly, but they are our best
guess, given the data

Statistics: Unlocking the Power of Data Lock5


Election Polls
 Actually, several polls were conducted over
the weekend (9/7/12 – 9/9/12):

http://www.politico.com/p/2012-election/polls/president
Statistics: Unlocking the Power of Data Lock5
IMPORTANT POINTS
• Sample statistics vary from sample to sample.
(they will not match the parameter exactly)
• KEY QUESTION: For a given sample statistic,
what are plausible values for the population
parameter? How much uncertainty surrounds
the sample statistic?
• KEY ANSWER: It depends on how much the
statistic varies from sample to sample!

Statistics: Unlocking the Power of Data Lock5


Reese’s Pieces
• What proportion of Reese’s pieces are orange?
• Take a random sample of 10 Reese’s pieces
• What is your sample proportion?  class
dotplot
• Give a range of plausible values for the
population proportion

Statistics: Unlocking the Power of Data Lock5


Sampling Distribution

A sampling distribution is the


distribution of sample statistics
computed for different samples of the
same size from the same population.

 A sampling distribution shows us how the


sample statistic varies from sample to sample

Statistics: Unlocking the Power of Data Lock5


Sampling Distribution

In the Reese’s Pieces sampling distribution,


what does each dot represent?

a) One Reese’s piece


b) One sample statistic

Statistics: Unlocking the Power of Data Lock5


Reese’s Pieces
• www.lock5stat.com/statkey

Statistics: Unlocking the Power of Data Lock5


Sample Size Matters!

As the sample size increases, the


variability of the sample statistics
tends to decrease and the sample
statistics tend to be closer to the true
value of the population parameter

 For larger sample sizes, you get less variability


in the statistics, so less uncertainty in your
estimates
Statistics: Unlocking the Power of Data Lock5
Random Samples
• If you take random samples, the sampling
distribution will be centered around the true
population parameter

• If sampling bias exists (if you do not take


random samples), your sampling distribution
may give you bad information about the true
parameter

Statistics: Unlocking the Power of Data Lock5


Lincoln’s Gettysburg Address

Statistics: Unlocking the Power of Data Lock5


Interval Estimate

An interval estimate gives a range


of plausible values for a population
parameter.

Statistics: Unlocking the Power of Data Lock5


Margin of Error
One common form for an interval estimate is
statistic ± margin of error
where the margin of error reflects the
precision of the sample statistic as a point
estimate for the parameter.

 How do we determine the margin of error???

Statistics: Unlocking the Power of Data Lock5


Sampling Distribution
• We can use the spread of the sampling
distribution to determine the margin of error
for a statistic

Statistics: Unlocking the Power of Data Lock5


Margin of Error

The higher the standard deviation of the


sampling distribution, the
a) higher
b) lower

the margin of error.

Statistics: Unlocking the Power of Data Lock5


Election Polling

 Why is the margin of error smaller for the


Gallup poll than the ABC news poll?
http://www.realclearpolitics.com/epolls/2012/president/us/general_election_romney_vs_obama-1171.html

Statistics: Unlocking the Power of Data Lock5


Interval Estimate
 Using the Gallup poll, calculate an interval
estimate for the proportion of registered
voters who plan to vote for Obama.

Statistics: Unlocking the Power of Data Lock5


Election Polling

Statistics: Unlocking the Power of Data Lock5


Confidence Interval
A confidence interval for a parameter is an
interval computed from sample data by a
method that will capture the parameter for
a specified proportion of all samples

 The success rate (proportion of all samples


whose intervals contain the parameter) is known
as the confidence level
 A 95% confidence interval will contain the true
parameter for 95% of all samples
Statistics: Unlocking the Power of Data Lock5
Confidence Intervals
 www.lock5stat.com/StatKey

 The parameter is fixed

 The statistic is random


(depends on the sample)
 The interval is random
(depends on the statistic)

Statistics: Unlocking the Power of Data Lock5


Sampling Distribution
If you had access to the sampling
distribution, how would you find the
margin of error to ensure that intervals of
the form
statistic ± margin of error
would capture the parameter for 95% of
all samples?

Statistics: Unlocking the Power of Data Lock5


Standard Error

The standard error of a statistic, SE,


is the standard deviation of the
sample statistic

 The standard error can be calculated as the


standard deviation of the sampling
distribution

Statistics: Unlocking the Power of Data Lock5


95% Confidence Interval

If the sampling distribution is relatively


symmetric and bell-shaped, a 95%
confidence interval can be estimated using
statistic ± 2 × SE

Statistics: Unlocking the Power of Data Lock5


Economy
A survey of 1,502 Americans in January 2012 found
that 86% consider the economy a “top priority” for
the president and congress this year.
The standard error for this statistic is 0.01.
What is the 95% confidence interval for the true
proportion of all Americans that considered the
economy a “top priority” at that time?
statistic ± 2×SE
(a) (0.85, 0.87) 0.86 ± 2×0.01
(b) (0.84, 0.88) 0.86 ± 0.02
(c) (0.82, 0.90) (0.84, 0.88)
http://www.people-press.org/2012/01/23/public-priorities-deficit-rising-terrorism-
slipping/
Statistics: Unlocking the Power of Data Lock5
Interpreting a Confidence Interval
 95% of all samples yield intervals that contain
the true parameter, so we say we are “95%
sure” or “95% confident” that one interval
contains the truth.
 “We are 95% confident that the true proportion
of all Americans that considered the economy a
‘top priority’ in January 2012 is between 0.84
and 0.88”

Statistics: Unlocking the Power of Data Lock5


Reese’s Pieces

The standard error for 𝑝,


ො the proportion of
orange Reese’s Pieces in a random sample of
10, is closest to

a) 0.05
b) 0.15
c) 0.25
d) 0.35

Statistics: Unlocking the Power of Data Lock5


Reese’s Pieces
Each of you will create a 95% confidence interval
based off your sample. If you all sampled
randomly, and all create your CI correctly, what
percentage of your intervals do you expect to
include the true p?
a) 95%
b) 5%
c) All of them
d) None of them

Statistics: Unlocking the Power of Data Lock5


Reese’s Pieces
 Use StatKey to more precisely estimate the SE

 Use this estimated SE and your 𝑝ො to create a


95% confidence interval based on your data.
 Come up to the board and draw your interval

 How many include (our best guess at) the


truth?

Statistics: Unlocking the Power of Data Lock5


Reese’s Pieces

Did your 95% confidence interval include the


true p?

a) Yes
b) No

Statistics: Unlocking the Power of Data Lock5


Confidence Intervals
If context were added, which of the following
would be an appropriate interpretation for a 95%
confidence interval:

a)“we are 95% sure the interval contains the parameter”


b)“there is a 95% chance the interval contains the parameter”
c)Both (a) and (b)
d)Neither (a) or (b)
95% of all samples yield intervals that contain the true parameter, so we say
we are “95% sure” or “95% confident” that one interval contains the truth.

We can’t make probabilistic statements such as (b) because the interval either
contains the truth or it doesn’t, and also the 95% pertains to all intervals
that could be generated, not just the one you’ve created.
Statistics: Unlocking the Power of Data Lock5
Confidence Intervals
Sample statistic ± ME
Population

Sample

Sample
...
Sample Margin of Error (ME)
Sample Sample (95% CI: ME = 2×SE)

Sampling Distribution
Calculate statistic
for each sample Standard Error (SE):
standard deviation of
sampling distribution

Statistics: Unlocking the Power of Data Lock5


Summary
• To create a plausible range of values for a
parameter:
o Take many random samples from the population,
and compute the sample statistic for each sample
o Compute the standard error as the standard
deviation of all these statistics
o Use statistic  2SE

• One small problem…

Statistics: Unlocking the Power of Data Lock5


Reality

… WE ONLY HAVE ONE SAMPLE!!!!

• How do we know how much sample


statistics vary, if we only have one
sample?!?
… to be continued

Statistics: Unlocking the Power of Data Lock5


To Do
 Read Sections 3.1, 3.2

 Do Homework 2 (due Tuesday, 9/18)

Statistics: Unlocking the Power of Data Lock5

Вам также может понравиться