Вы находитесь на странице: 1из 8

Chapter 7 Sampling and Sampling Distribution

Example:

There are 2500 managers in the electronic associates. The average annual salary of
2500 managers is $51800. That is,
2500

∑y i
µ= i =1
= 51800 = population mean,
2500
where y i is the i’th manager’s annual salary. Also, assume the population variance of
the salary data is
2500 2500

∑ ( yi − µ ) ∑(y − 51800)
2 2
i
.
σ2 = i =1
= i =1
= 40002
2500 2500
Assume that 1500 of the 2500 managers have completed the training program. Then,
1500
p= = 0.6 = population proportion of completing the program
2500

Objective: we try to just use part of the data (because it is too costly and time
consuming to use all the data) and thus obtain the accurate guess for the
population mean, variance and proportion.

7.1 Simple Random Sampling

Simple random sampling from finite population:


A simple random sample of size n from a finite population of size N is a sample
selected such that each possible sample of size n has the same probability of
being selected.
N  N!
Note: the total number of random samples is 
n
= . Thus, the
  n!( N − n )!
1
probability of a specific random sample being selected is 

N .
n 
 
Note: the above sampling is also called a random sample without replacement
since we did not place a selected element back into the population. That is, the
sample element can not be selected twice.

1
Simple random sampling from infinite population:
 Each element selected comes from the same population.
 Each element is selected independently.

7.2 Point Estimate

Example:

Suppose we randomly select 30 managers as a sample. Let x1 , x 2 ,  , x30 be the


annual salaries of the 30 managers (a sample). Suppose 19 of them have completed
the training program. Thus,
30

∑x i
x= i =1
= 51814 = sample mean
30
and
19
p= = 0.63 = sample proportion
30

Note: x and p are sensible estimates of µ = 51800 and p = 0.6 .

Point estimate:
A point estimate is a statistic based on a sample of size n (not necessary to be
simple random sample) from a finite population of size N. Suppose x1 , x 2 ,  , x n
is the sample. Then,
n

∑x i
x= i =1
= sample mean
n
n

∑( x − x)
2
i
s2 = i =1
= sample variance
n −1
and the sample proportion p are point estimates of the population mean µ , the
population variance σ 2 , and the population proportion p, respectively.

Note: x , s 2 and p are not the only estimates. They are just some sensible

2
estimates.

7.3 Sampling Distributions

Example:

Suppose we sample 500 times and let


x11 , x12 ,  , x30
1
⇒x1
2
x ,
1
2
x , , x
2
2
30 ⇒x2
   
x500
1 ,x500
2 , , x 500
30 ⇒x500
be 500 simple random samples of 30 managers. Let X be the random variable
representing the average salary of a random sample of 30 managers. Then,
 2500 
x1 , x 2 ,  , x500 are 500 possible values of X . Note there are . 
 30 
 possible x
 
for random variable X .

Properties of X :

Let X be the random variable representing the average of a random sample.


Then,
1. E ( X ) = µ ≡ the population mean
2. For finite population with population size N and the sample size n, then the
standard deviation of X is
N −n σ
Var ( X ) = E ( X − µ) 2 = σ X = .
N −1 n
For infinite population (the infinite population size),
σ
Var ( X ) = σ X = .
n

n N −n N −n σ σ
Note: as ≤ 0.05 , then ≈ 1 . Thus, σ X = ≈ . That is, the
N N −1 N −1 n n
standard deviation of X for the finite population is approximately equal to the one
for the infinite population.

3
Sampling Distribution of X :

As n ≥ 30 or the population is normally distributed, then


X ≈ N ( µ, σ X2 )
σ2
, where σ X2 = . Thus, for some constants − ∞ ≤ c ≤ d ≤ ∞ ,
n
c −µ X −µ d −µ
P (c ≤ X ≤ d ) = P (c − µ ≤ X − µ ≤ d − µ ) = P ( ≤ ≤ )
σX σX σX
c −µ d −µ
≈ P( ≤Z ≤ ),
σX σX

where Z is the standard normal random variable.

Example:

What is the probability of the difference between the sample mean and the population
mean will be less or equal to 500 as the sample size n = 30 ?

[solution]

σ 4000
µ = 51800 , σ X ≈ = = 730 .30 . Thus,
n 30
X − 51800
X ≈ N (51800 , (730 .30 ) 2 ) ⇒ ≈ Z ≡ the standard normal random variable
730 .30

⇒P ( X −51800 ≤ 500 ) = P ( −500 ≤ X −51800 ≤ 500 )


−500 X −51800 500
= P( ≤ ≤ )
730 .30 730 .30 730 .30
≈ P ( −0.68 ≤ Z ≤ 0.68 ) = 0.5036

⇒ There is 50.36% chance that the difference between the sample mean and the
population mean is not more than 500.

Sample Size and Sampling Distribution:

σ
Since σ X = , increasing the sample size will decrease the standard error!!
n

4
Thus, the larger the sample size is, the larger P (c ≤ X ≤ d ) is (since the interval
c − µ d − µ 
 ,  is larger than the one with smaller sample size)!!
 σX σX 

Example:

What is the probability of the difference between the sample mean and the population
mean will be less or equal to 500 as the sample size n = 100 ?

[solution]

σ 4000
µ = 51800 , σ X ≈ = = 400 ≤ 730 .30 ( n = 30 ) . Thus,
n 100
X − 51800
X ≈ N (51800 , ( 400 ) 2 ) ⇒ ≈ Z ≡ the standard normal random variable
400

⇒P ( X −51800 ≤ 500 ) = P (51300 ≤ X ≤ 52300 )


51300 −51800 X −51800 52300 −51800
= P( ≤ ≤ )
400 400 400
≈ P ( −1.25 ≤ Z ≤1.25 ) = 0.7888 ≥ 0.5036 ( n = 30 )

⇒ As the sample size increases to 100, there is 78.88% chance that the difference
between the sample mean and the population mean is not more than 500. That is, the
larger sample size will provide a higher probability that the value of the sample mean
will be within a specific distance of the population mean.

Properties of P :

Let P be the random variable representing the proportion of a random sample


(the sample proportion of a random sample is one possible value of P ). Then,
1. E ( P ) = p ≡ the population proportion
2. For finite population with population size N and the sample size n, then the
standard deviation of P is
N −n p(1 − p)
Var ( P ) = E ( P − p) 2 = σ P = .
N −1 n
For infinite population (the infinite population size),
p (1 − p )
Var ( P ) = σ P = .
n

5
Note:
n N −n N −n p (1 − p ) p (1 − p )
As ≤ 0.05 , then ≈ 1 . Thus, σ X = ≈ .
N N −1 N −1 n n
That is, the standard deviation of P for the finite population is approximately
equal to the one for the infinite population.

Sampling Distribution of P :

As np ≥ 5 and n(1 − p ) ≥ 5,
P ≈ N ( p, σ P2 )
p (1 − p )
, where σ P2 = . Thus, for some constants 0 ≤ c ≤ d ≤ 1
n
c−p P−p d−p
P (c ≤ P ≤ d ) = P (c − p ≤ P − p ≤ d − p ) = P ( ≤ ≤ )
σP σP σP
c−p d−p
≈ P( ≤Z ≤ ),
σP σP

where Z is the standard normal random variable.

p (1 − p )
Note: Since σ P = , increasing the sample size will decrease the standard
n
error!! Thus, the larger the sample size is, the larger P (c ≤ P ≤ d ) is (since the
c − p d − p 
interval  ,  is larger than the one with smaller sample size)!!
 σP σP 

Example:

What is the probability of the difference between the sample proportion and the
population proportion will be less or equal to 0.05 as the sample size n = 30 ? What
is the probability as we increase the sample size to 100?

[solution]

0.6(1 − 0.6)
p = 0.6, σ P ≈ = 0.0894 . Thus,
30

6
P − 0 .6
P ≈ N (0.6, (0.0894 ) 2 ) ⇒ ≈ Z ≡ the standard normal random variable
0.0894

⇒P ( P −0.6 ≤ 0.05 ) = P ( −0.05 ≤ P −0.6 ≤ 0.05 )


−0.05 P −0.6 0.05
= P( ≤ ≤ )
0.0894 0.0894 0.0894
≈ P ( −0.56 ≤ Z ≤ 0.56 ) = 0.4246

⇒ There is 42.46% chance that the difference between the sample proportion and the
population proportion is not more than 0.05 as . n = 30 .

As sample size is increased to 100, then


0.6(1 − 0.6)
p = 0.6, σ P ≈ = 0.0490 ≤ 0.0894 ( n = 30 ). Thus,
100
P − 0 .6
P ≈ N (0.6, (0.0490 ) 2 ) ⇒ ≈ Z ≡ the standard normal random variable
0.0490

⇒P ( P −0.6 ≤ 0.05 ) = P (0.55 ≤ P ≤ 0.65 )


0.55 −0.6 P −0.6 0.65 −0.6
= P( ≤ ≤ )
0.0490 0.0490 0.0490
≈ P ( −1.02 ≤ Z ≤1.02 ) = 0.6922 ≥ 0.4246 ( n = 30 )

⇒ There is 69.22% chance that the difference between the sample proportion and the
population proportion is not more than 0.05 n = 100 . That is, the larger sample size
will provide a higher probability that the value of the sample proportion will be within
a specific distance of the population proportion.

7.4 Other Sampling Methods

1. Stratified Random Sampling:


 The population is divided into groups of elements called strata according to
some “characteristic” of the data.
 A simple random sample is taken from each stratum.

How to determine the sample size in each stratum?


 According to the size of the stratum.
 According to the variance of each stratum.

2. Cluster Sampling:

7
 The population is divided into several separate groups of elements called
clusters.
 A simple random sample of the clusters is taken. All elements within these
sampled or “selected” clusters are in the sample.

Note: one of the primary applications of cluster sampling is area sampling, where
clusters are city blocks or other well-defined areas!!

3. Systematic Sampling:
N
 Select randomly one of the first elements, where n and N are the
n
sample size and the population size, respectively.
N 
 Starting from the first selected element, select every   ’th element after
n
the first element.

Example:

Suppose n = 50 , N = 5000 , and y1 , y 2 ,  , y 5000 are the elements in the population.


N 5000
Since = = 100 , by using systematic sampling, we should select randomly
n 50
one from the first 100 elements first. Suppose the third element is selected, i.e.
x1 = y 3 . Then, select every 100’th element after y 3 , thus
x 2 = y103 , x3 = y 203 ,  , x 50 = y 4903 .

Вам также может понравиться