May 31, 2018

Topic 9

Sampling Distributions

The Sampling Distribution of the Sample

Mean

• Population Distribution vs. Sampling

Distribution

• The Mean and Standard Deviation of the

Sample Mean

• Sampling Distribution of a Sample Mean

• Central Limit Theorem

Statistical Inference

• Data generated from random sampling or

randomized comparative experiments

▪ Do a statistical analysis on the sample data

▪ Using the laws of probability to answer the question

“What would happen if we did this very many times?”

• Statistical Inference:

▪ Using the sample to infer something about the

population based on the above

▪ Reasoning rests on asking “How often would this

method give a correct answer if I used it very many

times?”

Sampling Terminology

• Parameter

– a number that describes the population

– in practice, its value is unknown

– For example:

• µ, population mean

• σ, population standard deviation

• p, population proportion

• Statistic

– a known value calculated from a sample

– a statistic is often used to estimate a parameter

– For example:

• x̄, sample mean

• s, sample standard deviation

• p̂, sample proportion

Sampling Distributions Topic 9 5

Sampling Terminology (continued)

x̄, the sample mean, estimates µ, the population mean

s, the sample standard deviation, estimates σ, the population standard deviation

p̂, the sample proportion, estimates p, the population proportion

Parameters come from the population

Population and Sample

The process of statistical inference involves using information from a

sample to draw conclusions about a wider population.

Different random samples yield different statistics. We need to be able to

describe the sampling distribution of possible statistic values in order to

perform statistical inference.

We can think of a statistic as a random variable because it takes numerical

values that describe the outcomes of the random sampling process.

[Diagram: a representative sample is taken from the population; the sample is then used to draw conclusions about the population.]

Sampling Terminology (continued)

• Variability

– different samples from the same population may yield

different values of the sample statistic

• Sampling Distribution

– tells what values a statistic takes and how often it takes

those values in repeated sampling

• Sampling Distribution of a Statistic

– The distribution of values taken by the statistic in all

possible samples of the same size from the same

population.

Parameter vs. Statistic

The mean of a population is denoted by µ – this

is a parameter.

The mean of a sample is denoted by x̄ (“x-bar”) – this is a statistic. x̄ is used to estimate µ.

The true proportion of a population with a certain trait is denoted by p – this is a parameter.

The proportion of a sample with that trait is denoted by p̂ (“p-hat”) – this is a statistic. p̂ is used to estimate p.

The Law of Large Numbers

Law of Large Numbers – as the sample size increases, the sample mean tends to get closer to the population mean. That is, the difference between the sample mean and the population mean tends to become smaller (i.e., approaches zero).

In symbols: as n increases, x̄ gets closer to µ.
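The Law of Large Numbers can be watched in a small simulation. The sketch below is only an illustration: the population (a fair six-sided die, so µ = 3.5) and the helper name `running_means` are assumptions made for the example and do not come from the slides.

```python
import random

def running_means(draws):
    """Return the mean of the first k draws, for k = 1..len(draws)."""
    means, total = [], 0.0
    for k, value in enumerate(draws, start=1):
        total += value
        means.append(total / k)
    return means

rng = random.Random(0)   # fixed seed so the run is repeatable
MU = 3.5                 # mean of the assumed population (a fair die)
draws = [rng.randint(1, 6) for _ in range(100_000)]
means = running_means(draws)

# The gap |xbar - mu| tends to shrink as the sample size grows.
for n in (10, 100, 1_000, 10_000, 100_000):
    xbar = means[n - 1]
    print(f"n = {n:>6}: xbar = {xbar:.4f}, |xbar - mu| = {abs(xbar - MU):.4f}")
```

The gap does not have to shrink at every single step, since any one sample can be unlucky, but it tends toward zero as n grows.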

Example: Sampling from a Distribution

We illustrate this for the sample mean with an example.

[Figure: Population Distribution From Which We Will Take Samples. Bar chart of Population Proportion (0.00 to 0.28) against Values of X (0 to 11); population mean = 2.56.]

We will be taking samples from this population distribution. It does not look like a normal curve.

Example: Sampling from a Distribution (cont.)

Here are the distributions of the measurements in four different random samples from the population on the previous slide. Each sample is of size 25.

Each of these distributions looks somewhat like the “parent” population distribution (i.e., the distribution being sampled from), but none look exactly like it. The samples also differ from each other.

[Figure: four histograms titled Sample Distribution for Sample 1 through Sample 4; x-axis: Values of X (0 to 11); y-axis: Number of Data Points.]

Example: Sampling from a Distribution (cont.)

• Now here is the distribution of the measurements in a sample of

size 1000.

• This distribution looks much more like the parent population.

Conclusion: The distribution of measurements in a large sample looks like the distribution in the parent population. (Law of Large Numbers)

[Figure: histogram of the sample of size 1000, showing the distribution within a single sample; x-axis: Values of X (0 to 11); y-axis: Number of Data Points.]
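The contrast between a sample of 25 and a sample of 1000 can be sketched numerically. The population proportions below are hypothetical (the slides show the population only as a chart), and `sample_proportions` and `max_gap` are illustrative helper names.

```python
import random
from collections import Counter

# Hypothetical right-skewed population on the values 0..11 (an assumption;
# the slides do not list the actual population proportions).
VALUES = list(range(12))
WEIGHTS = [0.10, 0.22, 0.24, 0.16, 0.10, 0.07,
           0.04, 0.03, 0.02, 0.01, 0.005, 0.005]

def sample_proportions(n, rng):
    """Draw n values and return the observed proportion of each value."""
    counts = Counter(rng.choices(VALUES, weights=WEIGHTS, k=n))
    return [counts[v] / n for v in VALUES]

def max_gap(n, rng):
    """Largest absolute gap between sample and population proportions."""
    observed = sample_proportions(n, rng)
    return max(abs(o - w) for o, w in zip(observed, WEIGHTS))

rng = random.Random(3)  # fixed seed so the run is repeatable
gap_small = max_gap(25, rng)
gap_large = max_gap(1000, rng)
print(f"largest gap, n = 25:   {gap_small:.3f}")
print(f"largest gap, n = 1000: {gap_large:.3f}")
```

A smaller gap means the histogram of the sample sits closer to the population bar chart, which is what the size-1000 slide shows.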

Sampling Variability

Different random samples yield different statistics. This basic

fact is called sampling variability: the value of a statistic varies

in repeated random sampling.

To make sense of sampling variability, we ask, “What would

happen if we took many samples?”

[Diagram: one population with many samples drawn from it, followed by a question mark.]

Distribution of the Sample Mean

● Suppose we take a series of different random samples.

▪ Sample 1 – we compute sample mean x̄₁

▪ Sample 2 – we compute sample mean x̄₂

▪ Sample 3 – we compute sample mean x̄₃

▪ etc.

● Each time we sample, we may get a different result. The

sample mean x̄ is a random variable! Therefore, the sample

mean has a mean, a standard deviation, and a probability

distribution.

▪ Since the sample mean is determined by chance, there is

variability in our point estimates.

▪ This variability leads to uncertainty as to whether our estimates

are correct.

▪ So we need some way to indicate the reliability of statements

made about a population based on sample data.

Distribution of the Sample Mean

• Example of a Sampling Distribution of the Sample Mean

From one population we draw 15 samples:

Sample 1: n₁ = 10, x̄₁ = 23.3

Sample 2: n₂ = 10, x̄₂ = 22.7

Sample 3: n₃ = 10, x̄₃ = 23.8

...

Sample 15: n₁₅ = 10, x̄₁₅ = 23.2

• Since we do not know the value of x̄ in advance when we take a sample of 10 from our population, x̄ is a random variable. And x̄ will have a distribution with a mean, $\mu_{\bar{x}}$, and a standard deviation, $\sigma_{\bar{x}}$. This distribution is called the Sampling Distribution of the Sample Mean.

• We can estimate $\mu_{\bar{x}}$ by taking the mean of the 15 values of x̄ from our 15 different samples above: (x̄₁ + x̄₂ + … + x̄₁₅)/15
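The "mean of the 15 sample means" calculation can be sketched in code. Only four of the 15 sample means appear on the slide, so this sketch draws its own samples from a hypothetical population (values near 23, an assumption); `sample_means` is an illustrative helper name.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population (not from the slides): 10,000 values centered near 23.
population = [rng.gauss(23.0, 2.0) for _ in range(10_000)]

def sample_means(population, n, num_samples, rng):
    """Draw num_samples simple random samples of size n; return each sample mean."""
    return [statistics.fmean(rng.sample(population, n))
            for _ in range(num_samples)]

xbars = sample_means(population, n=10, num_samples=15, rng=rng)
estimate = statistics.fmean(xbars)   # (xbar_1 + ... + xbar_15) / 15

print("sample means:", [round(x, 1) for x in xbars])
print(f"mean of the 15 sample means: {estimate:.2f}")
print(f"population mean:             {statistics.fmean(population):.2f}")
```

With only 15 samples the estimate is itself noisy; taking more samples traces out the sampling distribution more fully.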

Distribution of the Sample Mean

• Show example in EXCEL

Distribution of the Sample Mean (cont.)

Terminology: The probability distribution of a statistic is called a sampling distribution.

● The sampling distribution lets us make statements about the values the sample mean takes on and the likelihood that our estimates of the population mean based on the sample mean are accurate.

● The sampling distribution of the sample mean depends

upon:

▪ Sample size n.

▪ Mean μ of the population.

▪ Standard deviation σ of the population.

▪ Shape of the population distribution.

Distribution of the Sample Mean (cont.)

If you repeatedly take simple random samples of size n from

a population and calculate the sample mean for each sample,

the distribution of this accumulation of sample means is the

sampling distribution of the sample mean.

Important Property:

The sampling distribution of the sample mean does not

necessarily look like the population distribution.

Example: Sampling Distribution

We illustrate the sampling distribution of the sample mean with an example.

[Figure: Population Distribution From Which We Will Take Samples. Bar chart of Population Proportion (0.00 to 0.28) against Values of X (0 to 11); population mean = 2.56.]

We will be taking samples from this population distribution. It does not look like a normal curve.

Example: Sampling Distribution (cont.)

Now suppose that we took a sample of size 10 and computed the

sample mean as a sample statistic and did this many times. The

sampling distribution of this random variable looks like this.

This looks somewhat like a normal curve and not at all like the parent population.

[Figure: Probability Dist'n of Sample Mean for Samples of Size 10, showing the distribution across many samples; μ = 2.56; x-axis: Values of the Sample Mean (1 to 6); y-axis: Probability Density. Values in the original population are 0 to 11.]

Example: Sampling Distribution (cont.)

[Figure: two panels, Probability Dist'n of Sample Mean for Samples of Size 40 and Probability Dist'n of Sample Mean for Samples of Size 160; μ = 2.56 in both; y-axis: Probability Density.]

Sample size 40: The sampling distribution for the sample mean looks more like a normal curve than for a sample of size 10. Values of the sample mean range from approximately 1.5 to 4.

Sample size 160: The sampling distribution for the sample mean looks almost exactly like a normal curve. Values of the sample mean range from approximately 1.75 to 3.25. (A much tighter distribution: variation decreases as sample size gets larger.)

Behavior of Sampling Distribution

What can we take from our sampling distribution example

above?

1) The distribution of measurements in a sample looks like the

distribution in the parent population, NOT necessarily like a

Normal curve.

2) The sampling distribution of the sample mean looks more and more like a normal

curve as the sample size increases, even though the parent

population is definitely NOT normal.

3) As the sample size increases, the sample mean gets closer to the

population mean, i.e., the difference between the sample mean and

the population mean tends to become smaller (i.e., approaches zero).

(Law of Large Numbers!)

4) The spread in the histograms for the sampling distribution of the

sample mean is getting smaller for larger sample sizes.

(Law of Large Numbers!)


Mean and Standard Deviation of the Sampling

Distribution of the Sample Means

Mean of a sampling distribution of a sample mean

There is no tendency for a sample mean to fall systematically above or below µ, even if the distribution of the raw data is skewed. Thus, the mean of the sampling distribution is an unbiased estimate of the population mean µ.

Standard deviation of a sampling distribution of a sample mean

The standard deviation of the sampling distribution measures how much the sample statistic varies from sample to sample. It is smaller than the standard deviation of the population by a factor of √n.

Mean and Standard Deviation of the Sampling

Distribution of the Sample Means

• If numerous samples of size n are taken from a population with mean µ and standard deviation σ

• If the mean, x̄, is taken for each of these samples of size n, then x̄ is now a random variable (sampling distribution of the sample means)

• The mean of this sampling distribution of the sample mean, $\mu_{\bar{x}}$, equals µ (where µ is the population mean), and the standard deviation, $\sigma_{\bar{x}}$, of the sampling distribution of the sample mean, called the standard error, is $\sigma/\sqrt{n}$ (where σ is the population standard deviation)

Summary of Properties of Sampling Distribution

The sampling distribution of the sample mean has several

important properties:

● If a simple random sample of size n is drawn from any large population,

then the sampling distribution of the sample mean has:

▪ Mean: $\mu_{\bar{x}} = \mu$

(The mean of the sampling distribution of the sample mean

equals the population mean.)

▪ Standard deviation, called the standard error of the mean:

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$

(As the sample size increases, the standard error of the sample mean gets

smaller.)

● In addition, if the population is normally distributed, then, the sampling

distribution is normally distributed.

Terminology: The Standard Deviation of a statistic is called its Standard Error.
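The two properties above (mean µ and standard error σ/√n) can be checked by simulation. This is a sketch under stated assumptions: the population is taken to be exponential with mean 1, so µ = 1 and σ = 1, a right-skewed choice that is not from the slides. `simulate_xbars` is an illustrative helper.

```python
import math
import random
import statistics

rng = random.Random(2)   # fixed seed so the run is repeatable
MU, SIGMA = 1.0, 1.0     # mean and sd of the assumed exponential population

def simulate_xbars(n, reps, rng):
    """Return reps sample means, each from a fresh random sample of size n."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

results = {}
for n in (4, 16, 64):
    xbars = simulate_xbars(n, reps=10_000, rng=rng)
    results[n] = (statistics.fmean(xbars), statistics.pstdev(xbars))
    mean_xbar, sd_xbar = results[n]
    print(f"n = {n:>2}: mean of xbar = {mean_xbar:.3f} (mu = {MU}), "
          f"sd of xbar = {sd_xbar:.3f} (sigma/sqrt(n) = {SIGMA / math.sqrt(n):.3f})")
```

Quadrupling n should roughly halve the standard error, which is the sense in which averages are less variable than individual observations.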

Mean and Standard Deviation of the Sampling

Distribution of the Sample Means

Since the mean of the random variable x̄ is µ (that is, $\mu_{\bar{x}} = \mu$), we say that x̄ is an unbiased estimator of µ.

Individual observations have standard deviation σ, but sample means from samples of size n have standard deviation $\sigma/\sqrt{n}$ (called the standard error, that is, $\sigma_{\bar{x}} = \sigma/\sqrt{n}$).

Averages are less variable than individual observations.

Notation for Sampling Distribution of Sample

Means

If a population has the N(µ, σ) distribution, then the sampling distribution of the sample mean, x̄, of n independent observations has the N(µ, σ/√n) distribution.

“If individual observations have a Normal distribution, then so does the sample mean.”

Central Limit Theorem

● If our population has a normal distribution, then the sampling distribution of the sample mean is normal.

● But many populations do not have a normal distribution. What can we do?

● Would it not be convenient if the sampling distribution of the sample mean were normal, even when the population distribution is not? This is almost true …

Central Limit Theorem

Most population distributions are not Normal. What is the shape of the sampling distribution of sample means when the population distribution isn’t Normal?

As the sample size increases, the distribution of sample means changes its shape: it looks less like that of the population and more like a Normal distribution!

When the sample is large enough, the distribution of sample means is very close to Normal, no matter what shape the population distribution has, as long as the population has a finite standard deviation.

Regardless of the shape of the population distribution, the sampling distribution of the sample mean becomes approximately normal as the sample size n increases.

Central Limit Theorem (cont.)

Draw an SRS of size n from any population with mean µ and finite standard deviation σ. The central limit theorem (CLT) says that when n is large, the sampling distribution of the sample mean x̄ is approximately Normal:

$\bar{x}$ is approximately $N\left(\mu, \dfrac{\sigma}{\sqrt{n}}\right)$

Central Limit Theorem (cont.)

Summary:

● If the random variable X (i.e., the population) is normally

distributed, then the sampling distribution of the sample mean is

normally distributed for any sample size.

● For all other random variables X (i.e., other populations), the

sampling distribution of the sample mean is approximately

normally distributed if n is 30 or higher. (The convention in our

class for n large enough)
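One way to see the central limit theorem at work is to track how lopsided the sampling distribution is as n grows. The sketch below assumes an exponential population (strongly right-skewed, not from the slides) and measures the skewness of simulated sample means. A Normal distribution has skewness 0. The helper names are illustrative.

```python
import random
import statistics

def skewness(data):
    """Average cubed z-score; 0 for a perfectly symmetric distribution."""
    m = statistics.fmean(data)
    s = statistics.pstdev(data)
    return statistics.fmean(((x - m) / s) ** 3 for x in data)

def sample_mean_skewness(n, reps, rng):
    """Skewness of reps simulated sample means, each from a sample of size n."""
    xbars = [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
             for _ in range(reps)]
    return skewness(xbars)

rng = random.Random(4)  # fixed seed so the run is repeatable
skew_by_n = {n: sample_mean_skewness(n, 10_000, rng) for n in (1, 10, 100)}

for n, skew in skew_by_n.items():
    print(f"n = {n:>3}: skewness of the sampling distribution = {skew:.2f}")
```

The skewness shrinks toward 0 as n increases. The n ≥ 30 convention is a rule of thumb: very skewed populations may need larger samples before the Normal approximation is good.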

Old Faithful Example

The most famous geyser in the world, Old Faithful in Yellowstone Nat’l

Park, has a mean time between eruptions of 85 minutes and a

standard deviation of 21.25 minutes. The distribution of the time

interval between eruptions is not normal.

a) What is the probability that a randomly selected time interval will be less

than 75 minutes?

b) What is the probability that a random sample of 20 time intervals will have

a mean less than 75 minutes?

c) What is the probability that a random sample of 30 time intervals will have

a mean less than 75 minutes?

d) What is the probability that a random sample of 30 time intervals will have

a mean greater than 100 minutes?

e) What is the probability that a random sample of 30 time intervals will have

a mean between 75 and 90 minutes?

Old Faithful Example

The most famous geyser in the world, Old Faithful in Yellowstone Nat’l

Park, has a mean time between eruptions of 85 minutes and a

standard deviation of 21.25 minutes. The distribution of the time

interval between eruptions is not normal.

a) What is the probability that a randomly selected time interval will be less than 75 minutes? Answer: cannot be done with the information given (a single interval follows the population distribution, which is not normal and is otherwise unspecified).

b) What is the probability that a random sample of 20 time intervals will have a mean less than 75 minutes? Answer: cannot be done with the information given (the population is not normal and n = 20 is below the n ≥ 30 convention for applying the central limit theorem).

c) What is the probability that a random sample of 30 time intervals will have a mean less than 75 minutes? Answer: 0.0050

d) What is the probability that a random sample of 30 time intervals will have a mean greater than 100 minutes? Answer: 0.0001

e) What is the probability that a random sample of 30 time intervals will have a mean between 75 and 90 minutes? Answer: 0.8963
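Answers (c) through (e) follow from treating x̄ as approximately N(85, 21.25/√30). A minimal sketch of that calculation, using only the standard library; `normal_cdf` is a helper written for this example, not something named in the slides.

```python
import math

def normal_cdf(x, mean=0.0, sd=1.0):
    """P(X <= x) for a Normal(mean, sd) random variable."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

MU, SIGMA, N = 85.0, 21.25, 30
se = SIGMA / math.sqrt(N)          # standard error of the sample mean

p_c = normal_cdf(75, MU, se)                            # P(xbar < 75)
p_d = 1.0 - normal_cdf(100, MU, se)                     # P(xbar > 100)
p_e = normal_cdf(90, MU, se) - normal_cdf(75, MU, se)   # P(75 < xbar < 90)

print(f"standard error = {se:.4f}")
print(f"(c) {p_c:.4f}   (d) {p_d:.4f}   (e) {p_e:.4f}")
```

Parts (a) and (b) are left out on purpose: with a non-normal population and n below 30, the Normal model used here is not justified.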

Summary: Distribution of the Sample Mean

The sample mean is a random variable with a probability

distribution called the sampling distribution.

– The mean of the sampling distribution is equal to the mean of the population: $\mu_{\bar{x}} = \mu$.

– The standard deviation of the sampling distribution (i.e., the standard error, $\sigma_{\bar{x}}$) is equal to $\sigma_{\bar{x}} = \sigma/\sqrt{n}$, where

σ = population standard deviation and n = sample size.

– Shape of the sampling distribution:

• If the sample size n is sufficiently large (30 or more is a

good rule of thumb), then this distribution is

approximately normal (regardless of the shape of the

population distribution).

• If the population is normally distributed, then the

sampling distribution is normal for any sample size.


A Few More Facts

Any linear combination of independent

Normal random variables is also Normally

distributed.

More generally, the central limit theorem

notes that the distribution of a sum or

average of many small random quantities is

close to Normal.

Finally, the central limit theorem also applies

to discrete random variables.

The Sampling Distribution of the Sample

Mean

What Was Covered

• Population Distribution vs. Sampling

Distribution

• The Mean and Standard Deviation of the

Distribution of the Sample Mean

• Sampling Distribution of a Sample Mean

• Central Limit Theorem

Topic 9 cont.

Proportion

Sampling Distributions for Counts and

Proportions

▪ Binomial Distributions for Sample Counts

▪ Binomial Distributions in Statistical Sampling

▪ Finding Binomial Probabilities

▪ Binomial Mean and Standard Deviation

▪ Sample Proportions

▪ Normal Approximation for Counts and

Proportions

Binomial Probability Distribution

• Binomial Probability Distribution

▪ A special discrete probability distribution

▪ Describes probabilities for experiments that have

two mutually exclusive outcomes (the experiment

has only two outcomes)

▪ One outcome is called a success

▪ The word “success” does not mean that this is a “good” outcome or that we want this to be the outcome. It is the outcome that we are looking for.

▪ For example, if we are studying a cobra strike on an animal, a success is that the animal dies.

▪ The other outcome is called a failure

▪ In the above example, the failure would be that the animal does not die.

Definition of a Binomial Experiment

Definition: A binomial experiment is an experiment

with the following characteristics:

1. The experiment is performed a fixed number of

times, n; each time is called a trial. So there are a

fixed number of trials.

2. The n trials are independent. (Outcome of one trial

will not affect the outcome of any other trials.)

3. Each trial has only two possible outcomes, usually

called success and failure.

4. The probability of success is the same for each trial

of the experiment.


Notation for a Binomial Experiment

Notation used for binomial experiments:

• The number of trials is represented by n.

• The probability of a success (in a single trial) is

represented by p.

The total number of successes in n trials is represented by

the random variable X. (X is called a binomial random

variable.)

Because there cannot be a negative number of successes,

and because there cannot be more than n successes (out

of n attempts):

0 ≤ x ≤ n for every possible value x of X

• The probability distribution of a binomial random variable is called a binomial distribution.

Binomial Setting

Example 1

In a shipment of 100 televisions, how many are defective? The probability of a television being defective is 0.08.

n = 100

p = 0.08

X = number of defective televisions in the shipment of 100

X is a binomial random variable since the trials (televisions) are independent, there is a fixed number of televisions, n = 100, and the probability of success (being defective) is the same for each television (p = 0.08).

X ~ Binomial(100, 0.08)

Binomial Setting

Example 2

A new procedure for treating breast cancer is used on 25 patients. How many patients are cured? The procedure has a probability of 0.76 of curing a patient.

n = 25

p = 0.76

X = number of patients cured out of the 25

X is a binomial random variable since the trials (patients with breast cancer) are independent, there is a fixed number of patients, n = 25, and the probability of success (being cured) is the same for each patient (p = 0.76).

X ~ Binomial(25, 0.76)

Summary for the Binomial Distribution

• Binomial Experiment

▪ Fixed number of trials, n

▪ Only two outcomes for each trial, success or failure

▪ The n trials are independent

▪ The probability of a success, p, is the same for each trial

• Let X = the count of successes in a binomial setting. The

distribution of X is the binomial distribution with

parameters n and p. X~Binomial (n, p)

▪ n is the number of trials/observations

▪ p is the probability of a success on any one observation

(p must be the same for each trial)

▪ The random variable X takes on whole values between 0

and n


Binomial Probabilities

• Find the probability that a binomial random variable X takes any particular value. X is the number of successes out of n trials/observations.

– P(x successes out of n observations) = P(X = x)

– There are three ways to calculate binomial

probabilities:

• Using the Binomial Formula

• Using Binomial Tables

• Using a statistical software package
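The slides name the binomial formula without displaying it. For reference, the standard form is given below, with one worked value for n = 10 and p = 0.1 (the setting of the switch case study); the worked value is our own arithmetic rather than a figure from the slides.

```latex
P(X = x) = \binom{n}{x}\, p^{x} (1-p)^{n-x}, \qquad x = 0, 1, \dots, n

% Worked value with n = 10, p = 0.1, x = 1:
P(X = 1) = \binom{10}{1} (0.1)^{1} (0.9)^{9} = 10 \times 0.1 \times 0.38742 \approx 0.3874
```

Here $\binom{n}{x} = \dfrac{n!}{x!\,(n-x)!}$ counts the ways to place x successes among n trials.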

Mean and Standard Deviation

• If X has the binomial distribution with n

observations and probability p of success

on each observation, then the mean and

standard deviation of X are

$\mu = np$

$\sigma = \sqrt{np(1-p)}$

Case Study

Inspecting Switches for lot of 10 Switches

The number X of bad switches has approximately the

binomial distribution with n=10 and p=0.1. Find the mean

and standard deviation of this distribution.

µ = np = (10)(0.1) = 1

the probability of each being bad is one tenth; so we

expect (on average) to get 1 bad one out of the 10

sampled

$\sigma = \sqrt{np(1-p)} = \sqrt{(10)(0.1)(1 - 0.1)} = \sqrt{0.9} = 0.9487$

Case Study

Inspecting Switches for lot of 10 Switches

The CDF from MINITAB

Cumulative Distribution Function

x P( X <= x )

0 0.34868

1 0.73610

2 0.92981

3 0.98720

4 0.99837

5 0.99985

6 0.99999

7 1.00000
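The cumulative probabilities above can be reproduced from the binomial formula. The following sketch uses only the Python standard library in place of MINITAB; `binom_pmf` is a helper written for this example.

```python
import math

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

N_TRIALS, P_BAD = 10, 0.1   # lot of 10 switches, each bad with probability 0.1

mean = N_TRIALS * P_BAD
sd = math.sqrt(N_TRIALS * P_BAD * (1 - P_BAD))

# Build the cumulative distribution P(X <= x) by accumulating the pmf.
cdf, running = [], 0.0
for x in range(N_TRIALS + 1):
    running += binom_pmf(x, N_TRIALS, P_BAD)
    cdf.append(running)

print(f"mean = {mean:.1f}, sd = {sd:.4f}")
print("x  P(X <= x)")
for x in range(8):
    print(f"{x}  {cdf[x]:.5f}")
```

The printed column should agree with the table to five decimal places, and the mean and standard deviation match the values computed in the case study.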

Case Study

Inspecting Switches

Probability Histogram

n=10, p=0.1

Normal Approximation for Binomial

Distributions

As n gets larger, something interesting happens to the shape

of a binomial distribution.

Suppose that X has the binomial distribution with n trials and success

probability p. When n is large, the distribution of X is approximately Normal

with mean and standard deviation

$\mu_X = np$, $\sigma_X = \sqrt{np(1-p)}$

As a rule of thumb, we will use the Normal approximation when n is so

large that np ≥ 10 and n(1 – p) ≥ 10.


Example

Sample surveys show that fewer people enjoy shopping than in the past. A survey asked a

nationwide random sample of 2500 adults if they agreed or disagreed that “I like buying

new clothes, but shopping is often frustrating and time-consuming.” Suppose that exactly

60% of all adult U.S. residents would say “Agree” if asked the same question. Let X = the

number in the sample who agree. Estimate the probability that 1520 or more of the

sample agree.

1. Verify that X is approximately a binomial random variable.

Success = agree, Failure = don’t agree

n = 2500 trials of the chance process.

The probability of selecting an adult who agrees is p = 0.60.

Each trial is independent

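2. Check the conditions for using a Normal approximation.

Since np = 2500(0.60) = 1500 and n(1 – p) = 2500(0.40) = 1000 are both at least 10, we may use the Normal approximation.

3. Calculate P(X ≥ 1520) using the Normal approximation.

$\mu = np = 2500(0.60) = 1500$

$\sigma = \sqrt{np(1-p)} = \sqrt{2500(0.60)(0.40)} = 24.49$

$z = \dfrac{1520 - 1500}{24.49} = 0.82$

$P(X \ge 1520) = P(Z \ge 0.82) = 1 - 0.7939 = 0.2061$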
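The Normal approximation in this example can be compared with the exact binomial tail. The sketch below is our own check, not part of the slides; it sums the binomial probabilities on the log scale (via `math.lgamma`) because terms such as 0.6^1520 are too small for ordinary floating point. Helper names are illustrative.

```python
import math

def normal_cdf(z):
    """P(Z <= z) for a standard Normal random variable."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binom_log_pmf(k, n, p):
    """log P(X = k) for X ~ Binomial(n, p), computed with log-gamma."""
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_comb + k * math.log(p) + (n - k) * math.log(1 - p)

n, p, cutoff = 2500, 0.60, 1520
mu = n * p
sigma = math.sqrt(n * p * (1 - p))

z = (cutoff - mu) / sigma
p_normal = 1.0 - normal_cdf(z)          # Normal approximation, unrounded z
p_exact = sum(math.exp(binom_log_pmf(k, n, p)) for k in range(cutoff, n + 1))

print(f"mu = {mu:.0f}, sigma = {sigma:.2f}, z = {z:.4f}")
print(f"Normal approximation: {p_normal:.4f}")
print(f"Exact binomial tail:  {p_exact:.4f}")
```

With z left unrounded the approximation comes out near 0.207 rather than the table-based 0.2061; the difference is only the rounding of z to 0.82.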

Sampling Distribution of a Sample Proportion

There is an important connection between the sample proportion p̂ and the number of “successes” X in the sample:

$\hat{p} = \dfrac{\text{count of successes in sample}}{\text{size of sample}} = \dfrac{X}{n}$

Proportions

• The proportion of a population that has some

outcome (“success”) is p.

• The proportion of successes in a sample is

measured by the sample proportion:

$\hat{p} = \dfrac{\text{number of successes in the sample}}{\text{total number of observations in the sample}} = \dfrac{x}{n}$

“p-hat” (sample proportion) is the point estimate

for p (population proportion)

So it is no more than the binomial variable

divided by sample size


Point Estimate, p̂

Example

• A polling company polls 1200 people to see if they support a certain issue. 408 of the 1200 supported the issue. What is the sample proportion?

• Let X = number of people who supported the issue (the number of successes in the sample)

• Let n = number in our sample

• Then $\hat{p} = \dfrac{x}{n} = \dfrac{408}{1200} = 0.34$

Distribution of the Sample Proportion

Because the values of the sample proportion vary from sample

to sample, it is a random variable.

So, we have the same questions for the sample proportion as we

had for the sample mean:

• What is the mean of the sample proportion?

• What is the standard deviation of the sample proportion?

• What is the sampling distribution of the sample proportion?

• Can we apply the Central Limit Theorem to approximate these

with normal distributions?

Distribution of the Sample Proportion

Because the values of the sample proportion vary from sample to sample, it is a random variable.

So, we have the same questions for the sample proportion as we had for the sample mean:

• What is the mean of the sample proportion? p

• What is the standard deviation of the sample proportion?

standard error = $\sqrt{\dfrac{p(1-p)}{n}}$

(again nothing more than the binomial standard deviation divided by n: $\dfrac{\sqrt{np(1-p)}}{n} = \sqrt{\dfrac{p(1-p)}{n}}$)

• What is the sampling distribution of the sample proportion if np(1−p) ≥ 10, or if the numbers of successes and failures are both 15 or more (the conditions that allow us to approximate the binomial with the normal distribution)?

$\hat{p} \sim N\left(\text{mean: } \mu_{\hat{p}} = p,\ \text{standard error: } \sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}}\right)$

Inference about a Proportion

Simple Conditions

If the sample size n (provided that n ≤ 0.05N, where N is the population size) is sufficiently large and the population proportion p isn’t close to either 0 or 1 (i.e., np(1−p) ≥ 10), then the sampling distribution of p̂ is approximately normal. In this book it is simplified to: the numbers of successes and failures in the sample are both ≥ 15.

Aerobics Example

Assume that 80% of the people taking aerobics classes

are female and a simple random sample of n= 100

students is taken.

a) Describe the sampling distribution of the sample proportion of female students.

b) What is the probability that at most 75% of the students in the sample are female?

c) What is the probability that at least 90 of the students in the sample are female? Would that be unusual?

Aerobics Example

P(success) = 0.8 P (failure) = 1 – 0.8 = 0.2 n = 100

a) number of successes = 0.8*100 = 80 ≥ 15

number of failures = 0.2*100 = 20 ≥ 15

$\hat{p} \sim N\left(0.8,\ \sqrt{\dfrac{0.8 \times 0.2}{100}} = 0.04\right)$

b) $P(\hat{p} \le 0.75) = P\left(Z \le \dfrac{0.75 - 0.80}{\sqrt{0.8 \times 0.2 / 100}}\right) = P(Z \le -1.25) = 0.1056$
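The aerobics calculations can be sketched in a few lines. Part (b) reproduces the value above; the value for part (c) is our own computation under the same Normal model and does not appear in the slides. `normal_cdf` is a helper written for this example.

```python
import math

def normal_cdf(x, mean=0.0, sd=1.0):
    """P(X <= x) for a Normal(mean, sd) random variable."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

p, n = 0.8, 100

# Conditions: expected successes and failures are both at least 15.
successes, failures = n * p, n * (1 - p)
conditions_ok = successes >= 15 and failures >= 15

se = math.sqrt(p * (1 - p) / n)            # standard error of p-hat

p_b = normal_cdf(0.75, p, se)              # (b) P(p-hat <= 0.75)
p_c = 1.0 - normal_cdf(0.90, p, se)        # (c) P(p-hat >= 0.90), i.e. 90 of 100

print(f"conditions met: {conditions_ok}, standard error = {se:.2f}")
print(f"(b) P(p-hat <= 0.75) = {p_b:.4f}")
print(f"(c) P(p-hat >= 0.90) = {p_c:.4f}")
```

Under this model the probability in (c) is well below 0.05, so by the common convention that treats such events as unusual, a sample with 90 or more female students would be unusual.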

Sampling Distributions for Counts and

Proportions

▪ Binomial Distributions for Sample Counts

▪ Binomial Distributions in Statistical Sampling

▪ Finding Binomial Probabilities

▪ Binomial Mean and Standard Deviation

▪ Sample Proportions

▪ Normal Approximation for Counts and

Proportions