
Sampling Distributions

CHAPTER 6
Sample Mean and
Variance
 A statistic is a function of sample observations that contains no
unknown parameters.

 The sampling distribution describes the probabilities associated with a
statistic when a random sample is drawn from a population.

 The sampling distribution is the probability distribution or
probability density function of the statistic.

 Deriving the sampling distribution is the first step in calculating a
confidence interval or carrying out a hypothesis test for a parameter.
Example
 Suppose that X1, ..., Xn are a simple random sample from a normally
distributed population with expected value µ and known variance σ².
Then the sample mean x̄ is a statistic used to give information about
the population parameter µ; x̄ is normally distributed with expected
value µ and variance σ²/n.

 Principle of centrality for sampling distributions of means: the sample
means tend to center around the population mean.

 Principle of variability: the variability among the sample means
decreases as the sample size increases.
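Both principles can be checked with a short simulation (a sketch in stdlib Python; the Normal(µ = 50, σ = 10) population, the sample sizes, and the function name are arbitrary choices for illustration):

```python
import random
import statistics

random.seed(0)
MU, SIGMA = 50.0, 10.0  # hypothetical population parameters

def sample_means(n, reps=2000):
    """Draw `reps` samples of size n and return their sample means."""
    return [statistics.mean(random.gauss(MU, SIGMA) for _ in range(n))
            for _ in range(reps)]

for n in (5, 25, 100):
    means = sample_means(n)
    # Means center near MU; their spread shrinks roughly like SIGMA/sqrt(n)
    print(n, round(statistics.mean(means), 2), round(statistics.stdev(means), 2))
```

The printed standard deviations fall as n grows, tracking σ/√n (≈ 4.47, 2.0, 1.0), while the averages stay close to µ = 50.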
 For a random variable X, the expected value is E(X) = µ and the
variance is Var(X) = σ².

 Since the random variable X can take on any of N values with
probability 1/N, the mean and variance become:

        µ = (1/N) Σ xᵢ            (sum over i = 1, ..., N)

        σ² = (1/N) Σ (xᵢ − µ)²
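As a quick illustration, the two formulas applied to a small made-up population of N = 8 values:

```python
# Population mean and variance using the 1/N formulas above.
# `population` is a made-up finite population for illustration.
population = [2, 4, 4, 4, 5, 5, 7, 9]
N = len(population)

mu = sum(population) / N
sigma2 = sum((x - mu) ** 2 for x in population) / N

print(mu, sigma2)  # 5.0 4.0
```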

 These random variables can be considered as elements of a random
sample from an infinite population having a probability distribution
with mean µ and variance σ².

 The value xᵢ denotes the ith possible value of X.
 For a sample of size n, the sample mean is

        x̄ = (1/n) Σ xᵢ           (sum over i = 1, ..., n)

where xᵢ is the ith sample observation.

 To measure the variability of the sample, we might try the sample
analogue of the population variance, namely

        s² = (1/n) Σ (xᵢ − x̄)²
 We might say that x̄ is a good estimator of µ.

 The sample variance above is not a very good estimator of the
population variance. Whereas random samples tend to center at their
population mean, they tend to have less variability than the population
they came from. To compensate for this, we change the denominator from
n to n − 1:

        s² = (1/(n − 1)) Σ (xᵢ − x̄)²
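The effect of the n versus n − 1 denominator can be seen by simulation (stdlib Python; the Normal(0, σ² = 4) population and sample size n = 5 are arbitrary illustrative choices):

```python
import random

random.seed(1)
SIGMA2 = 4.0   # hypothetical population variance (Normal with sd 2)
n, reps = 5, 50000

biased, unbiased = 0.0, 0.0
for _ in range(reps):
    xs = [random.gauss(0, 2) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    biased += ss / n          # divides by n: tends to undershoot
    unbiased += ss / (n - 1)  # divides by n-1: unbiased

print(round(biased / reps, 2))    # close to (n-1)/n * 4 = 3.2
print(round(unbiased / reps, 2))  # close to 4.0
```

The n-denominator version averages about (n − 1)/n times σ², which is exactly the bias the n − 1 correction removes.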
The sampling distribution of the mean is the probability distribution of
the mean of a random sample. Its mean and variance are easily
calculated: E(X̄) = µ and Var(X̄) = σ²/n.

The central limit theorem states that the sampling distribution of the
mean, for any set of independent and identically distributed random
variables, will tend towards the normal distribution as the sample size
gets larger. This may be restated as follows:
Central Limit Theorem
 Central Limit Theorem: when n is sufficiently large, the sampling
distribution of X̄ is well approximated by a normal curve, even when the
population distribution is not itself normal.

 We sometimes abbreviate the CLT to the phrase "X̄ is asymptotically
normally distributed with mean µ and variance σ²/n".

 Therefore,

        P(X̄ ≤ b) = P( (X̄ − µ)/(σ/√n) ≤ (b − µ)/(σ/√n) ) ≈ P( Z ≤ (b − µ)/(σ/√n) )

where Z is a standard normal random variable.
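This approximation is easy to compute, since the standard normal CDF Φ can be written in terms of the error function. A small sketch (the function names are my own, and the numbers anticipate the example that follows):

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_mean_at_most(b, mu, sigma, n):
    """CLT approximation of P(sample mean <= b)."""
    return phi((b - mu) / (sigma / math.sqrt(n)))

# Illustrative numbers: mu = 14, sigma = 2, n = 100
print(round(prob_mean_at_most(14.5, 14, 2, 100), 4))  # 0.9938
```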
Example
 The fracture strengths of a certain type of glass average 14 (in
thousands of pounds per square inch) and have a standard deviation of
2.
 A) What is the probability that the average fracture strength for 100
pieces of this glass exceeds 14.5?

 B) Find an interval that includes the average fracture strength for 100
pieces of this glass with probability 0.95.


Example
 The average strength X̄ has approximately a normal distribution with
mean µ = 14 and standard deviation σ/√n = 2/√100 = 0.2.

 Thus,

        P(X̄ > 14.5) = P( (X̄ − µ)/(σ/√n) > (14.5 − µ)/(σ/√n) )
                     ≈ P( Z > (14.5 − 14)/0.2 ) = P(Z > 2.5) = 0.5 − 0.4938 = 0.0062

 The probability of seeing an average value (n = 100) more than 0.5 unit
above the population mean is, in this case, very small.
 B) We have seen that

        P( µ − 1.96 σ/√n ≤ X̄ ≤ µ + 1.96 σ/√n ) = 0.95

for a normally distributed X̄.

 In this problem,

        µ − 1.96 σ/√n = 14 − 1.96 (2/√100) ≈ 13.6
and
        µ + 1.96 σ/√n = 14 + 1.96 (2/√100) ≈ 14.4

 Approximately 95% of the sample mean fracture strengths, for samples of
size 100, should lie between 13.6 and 14.4.
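Both parts of the fracture-strength example can be verified numerically (stdlib Python; `phi` is the standard normal CDF via the error function):

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma, n = 14.0, 2.0, 100
se = sigma / math.sqrt(n)  # standard error of the mean: 0.2

# Part A: P(Xbar > 14.5)
print(round(1 - phi((14.5 - mu) / se), 4))  # 0.0062

# Part B: central 95% interval for the sample mean
lo, hi = mu - 1.96 * se, mu + 1.96 * se
print(round(lo, 3), round(hi, 3))  # 13.608 14.392
```

The interval endpoints round to the 13.6 and 14.4 quoted above.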
Sampling Distribution of Sums
The Normal Approximation to the Binomial
Distribution

 Requirements for a binomial distribution:
  - a fixed number of trials
  - trials are independent
  - each trial has 2 possible outcomes
  - probabilities remain constant (p and q, q = 1 − p)

 Formula: P(X = x) = C(n, x) p^x q^(n−x)

 Mean = np
 Variance = npq
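A minimal sketch of the formula, mean, and variance in stdlib Python (the values n = 10, p = 0.3 are arbitrary illustrative choices):

```python
import math

def binom_pmf(n, x, p):
    """P(X = x) = C(n, x) p^x (1-p)^(n-x)."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.3          # hypothetical trial count and success probability
mean = n * p            # np
var = n * p * (1 - p)   # npq

print(round(binom_pmf(n, 3, p), 4))  # 0.2668
print(mean, round(var, 2))           # 3.0 2.1
```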
Example
 54% of people have answering machines. Sample 1000 households. What
is the probability that more than 556 have answering machines?

 P(X > 556) = P(557) + P(558) + P(559) + ... + P(999) + P(1000)

 We are going to use the normal distribution to approximate the binomial
distribution.
Requirements for Using a Normal Distribution as an
Approximation to a Binomial Distribution:

 If np ≥ 5 and nq ≥ 5, then the binomial random variable is
approximately normally distributed with mean µ = np and standard
deviation σ = √(npq).

 Continuity correction factor

 We are using a continuous model to approximate a discrete model, so we
need to make an adjustment for continuity. This is the continuity
correction factor.

 Because the normal distribution can take all real numbers (is
continuous) but the binomial distribution can only take integer values
(is discrete), a normal approximation to the binomial should identify
the binomial event "8" with the normal interval "(7.5, 8.5)" (and
similarly for other integer values).
Continuity Correction Factor

        P(X ≤ x) ≈ Φ( (x + 0.5 − np) / √(np(1 − p)) )

        P(X ≥ x) ≈ 1 − Φ( (x − 0.5 − np) / √(np(1 − p)) )

where Φ is the standard normal CDF; the ± 0.5 is the continuity
correction factor.
 Example: if n = 20 and p = .25, what is the probability that X is
greater than or equal to 8?

 The normal approximation without the continuity correction factor
yields z = (8 − 20 × .25)/(20 × .25 × .75)^.5 = 1.55, hence P(X ≥ 8) ≈
.0606.

 The continuity correction factor requires us to use 7.5 in order to
include 8, since the inequality is weak and we want the region to the
right: z = (7.5 − 5)/(20 × .25 × .75)^.5 = 1.29, hence the area under
the normal curve is .0985.

 The exact binomial probability is .1019. Hence for small n, the
continuity correction factor gives a much better answer.
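The three numbers in this example can be reproduced directly (stdlib Python; `phi` is the standard normal CDF, and the exact tail uses `math.comb`):

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, p = 20, 0.25
mu = n * p                          # 5.0
sigma = math.sqrt(n * p * (1 - p))  # sqrt(3.75) ~ 1.936

no_correction = 1 - phi((8 - mu) / sigma)      # ~ .061
with_correction = 1 - phi((7.5 - mu) / sigma)  # ~ .098

# Exact binomial tail P(X >= 8) for comparison
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(8, n + 1))

print(round(no_correction, 4), round(with_correction, 4), round(exact, 4))
```

The corrected value sits much closer to the exact tail, as the slide argues.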
Example: 54% of people have answering machines. Sample 1000 households.
Estimate the probability that more than 556 have answering machines.

 1) Test if the normal approximation is appropriate:
    np = .54 × 1000 = 540 and nq = .46 × 1000 = 460,
    and both are greater than 5.

 2) Find the mean and the standard deviation:
    µ = np = 540 and σ = √(npq) = √248.4 ≈ 15.76.

 3) Draw the normal curve and identify the region representing the
probability to be found.
Example
 4) Find the continuity correction factor.

 5) Estimate the probability:

 P(N > 556) = P(N ≥ 557) ≈ P( Z ≥ (556.5 − 540)/15.76 )
            = P(Z ≥ 1.05) = 0.5 − 0.3531 = 0.1469

What about the probability that fewer than 519 have answering machines?

 P(N < 519) = P(N ≤ 518) ≈ P( Z ≤ (518.5 − 540)/15.76 )
            = P(Z ≤ −1.36) = 0.5 − 0.4131 = 0.0869
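As a check on the approximation (note that √(npq) = √(1000 · 0.54 · 0.46) ≈ 15.76), the normal tails can be compared against the exact binomial tails, which `math.comb` makes feasible even for n = 1000:

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, p = 1000, 0.54
mu = n * p                          # 540
sigma = math.sqrt(n * p * (1 - p))  # sqrt(248.4) ~ 15.76

# P(N > 556) = P(N >= 557); the continuity correction puts the cut at 556.5
approx_gt = 1 - phi((556.5 - mu) / sigma)

# P(N < 519) = P(N <= 518); the continuity correction puts the cut at 518.5
approx_lt = phi((518.5 - mu) / sigma)

# Exact binomial tails for comparison
pmf = [math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
print(round(approx_gt, 4), round(sum(pmf[557:]), 4))
print(round(approx_lt, 4), round(sum(pmf[:519]), 4))
```

For a sample this large the normal approximation and the exact tails agree to two or three decimal places.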
Sampling Distributions

 • A sample statistic used to estimate an unknown population parameter
is called an estimator; its observed value is an estimate.

 • The discrepancy between the estimate and the true parameter value is
known as sampling error.

 • A statistic is a random variable with a probability distribution,
called the sampling distribution, which is generated by repeated
sampling.

 • We use the sampling distribution of a statistic to assess the
sampling error in an estimate.
The Sampling Distribution of the Sample
Variance

There is no analog of the CLT for S² that gives an approximation for
large samples from an arbitrary distribution.

The exact distribution of S² can be derived when the Xᵢ are i.i.d.
normal.

The sampling distribution of S² is needed to infer the variability of a
population from the variability of its sample. The simplest case is when
the population has a normal distribution.
Chi-Square Distribution
 Theorem. If S² is the variance of a random sample of size n taken
from a normal population having variance σ², then the statistic

        χ² = (n − 1)S²/σ²

has a chi-squared distribution with v = n − 1 degrees of freedom.

Mean of S²
 The χ² distribution has mean equal to its degrees of freedom, n − 1,
and variance equal to twice its degrees of freedom, 2(n − 1).

 From this we find the mean and variance of S².

        E[ (n − 1)S²/σ² ] = n − 1

therefore
        (n − 1) E(S²)/σ² = n − 1

and
        E(S²) = (n − 1) σ²/(n − 1) = σ²
Variance of S²

        V[ (n − 1)S²/σ² ] = 2(n − 1)

therefore
        ( (n − 1)/σ² )² V(S²) = 2(n − 1)

and
        V(S²) = 2(n − 1) σ⁴/(n − 1)² = 2σ⁴/(n − 1)
Example
 For a certain launching mechanism, the distances by which the
projectile misses the target center have a normal distribution with
variance σ² = 100 square meters. An experiment involving n = 25
launches is to be conducted. Let S² denote the sample variance of the
distances between the impact of the projectile and the target center.

 Approximate P(S² > 50). Find E(S²) and V(S²).

 Let U = (n − 1)S²/σ², which has a χ²(24) distribution for n = 25.

 P(S² > 50) = P( (n − 1)S²/σ² > 24 × 50/100 ) = P(U > 12), which from
Table 6 is a little larger than 0.975.
Example
 We know that E(S²) = σ² = 100,
 and
 V(S²) = 2σ⁴/(n − 1) = 2(100)²/24 = 10000/12 ≈ 833.3.
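A Monte Carlo sketch of this example (stdlib Python; the seed and replication count are arbitrary): repeatedly drawing samples of n = 25 from a normal population with σ² = 100 and computing S² should reproduce all three answers approximately.

```python
import random
import statistics

random.seed(2)
SIGMA = 10.0       # population sd, so sigma^2 = 100
n, reps = 25, 20000

# Repeatedly draw samples of size n and record the sample variance S^2
s2_values = []
for _ in range(reps):
    xs = [random.gauss(0.0, SIGMA) for _ in range(n)]
    s2_values.append(statistics.variance(xs))  # n-1 denominator

frac = sum(s2 > 50 for s2 in s2_values) / reps
print(round(frac, 3))                          # a bit above 0.975
print(round(statistics.mean(s2_values), 1))    # near E(S^2) = 100
print(round(statistics.variance(s2_values)))   # near 2*100^2/24 ~ 833
```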
