Comparing The Hypergeometric Distribution To The Binomial When The Population Size N Is Large

STA 4321
Comparing the Hypergeometric Distribution to the Binomial when the Population Size N is Large
Brett Presnell
In the experiment leading to the hypergeometric distribution, we are essentially counting
the number of “successes” (red balls) drawn when a sample of size n is drawn without
replacement from a finite population consisting of r “successes” and N − r “failures” (white
balls). If we think of drawing the n balls one at a time (without replacement), then we
have a sequence of n trials, with each trial yielding either a success (red ball) or a failure
(white ball). So why doesn’t Y, the number of successes in the n trials, follow a binomial
distribution?¹
Recall that the binomial distribution requires the trials to be independent with the
probability of success being the same from trial to trial. When we sample without
replacement, however, although the probability of success is the same for each trial, the trials are
dependent. In other words, it is the independence assumption of the binomial experiment
that fails to hold when sampling without replacement from a finite population, as can be
seen in the following simple example.

Example. Suppose N = 10, r = 4, and n = 2, and let
$S_i$ = {ith ball drawn is red} and $F_i$ = {ith ball drawn is white}, i = 1, 2.
Obviously $P(S_1) = 4/10$. And what is $P(S_2)$? Well,
$$P(S_2 \mid S_1) = \frac{3}{9} \quad\text{and}\quad P(S_2 \mid F_1) = \frac{4}{9},$$
so by the Law of Total Probability,
$$P(S_2) = P(S_1)P(S_2 \mid S_1) + P(F_1)P(S_2 \mid F_1) = \frac{4}{10}\cdot\frac{3}{9} + \frac{6}{10}\cdot\frac{4}{9} = \frac{4}{10}\underbrace{\left(\frac{3}{9} + \frac{6}{9}\right)}_{=1} = \frac{4}{10},$$
and thus $P(S_1) = P(S_2)$, i.e., the probability of success is the same in each trial. However,
$$P(S_2 \mid S_1) = \frac{3}{9} \neq P(S_2) = \frac{4}{10},$$
so $S_1$ and $S_2$ are dependent.
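
The numbers in this example can be checked by brute force. The following is a minimal sketch, not part of the original notes: it lists every equally likely ordered pair of distinct balls and counts outcomes. The function name `two_draw_probabilities` and the convention that balls 0, ..., r − 1 are the red ones are assumptions made only for illustration.

```python
from fractions import Fraction
from itertools import permutations


def two_draw_probabilities(N, r):
    """Enumerate all ordered draws of 2 distinct balls from N balls.

    Assumption for illustration: balls 0..r-1 are red, the rest are white.
    """
    is_red = [i < r for i in range(N)]
    # Every ordered pair of distinct balls is equally likely.
    pairs = list(permutations(range(N), 2))
    total = len(pairs)
    s1 = sum(1 for a, b in pairs if is_red[a])                 # first ball red
    s2 = sum(1 for a, b in pairs if is_red[b])                 # second ball red
    both = sum(1 for a, b in pairs if is_red[a] and is_red[b])  # both red
    return {
        "P(S1)": Fraction(s1, total),
        "P(S2)": Fraction(s2, total),
        "P(S2|S1)": Fraction(both, s1),
    }


probs = two_draw_probabilities(10, 4)
for name, value in probs.items():
    print(name, "=", value)
```

`Fraction` reduces its results, so 4/10 is displayed as 2/5 and 3/9 as 1/3. The two unconditional probabilities agree while the conditional one differs, which is exactly the dependence described above.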

¹ Of course, if we sample with replacement, then the trials are independent with success probability r/N
in each trial, and Y follows a binomial distribution rather than a hypergeometric distribution.
Interestingly, if the population size N is large relative to the sample size n (and if r is
not too close to 0 or N ), then the dependence between draws is weak.
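
For the first two draws, "weak" can be made concrete with a short calculation. This is a sketch added for illustration, using the events $S_1$ and $S_2$ from the example above; the shorthand $p = r/N$ is an assumption of this sketch and is treated as fixed while N grows.

$$P(S_2) - P(S_2 \mid S_1) = \frac{r}{N} - \frac{r-1}{N-1} = \frac{r(N-1) - N(r-1)}{N(N-1)} = \frac{N-r}{N(N-1)} = \frac{1-p}{N-1}.$$

So the gap between the conditional and unconditional success probabilities shrinks roughly like 1/N.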

Example. Suppose that in the last example N = 100 and r = 40, but still n = 2. Then
we still have $P(S_1) = 0.4 = P(S_2)$, but now
$$P(S_2 \mid S_1) = \frac{39}{99} = 0.393939\ldots$$
is much closer to $P(S_2) = 0.4$. And if we increase N to 1000 and r to 400, then still
$P(S_1) = 0.4 = P(S_2)$, while
$$P(S_2 \mid S_1) = \frac{399}{999} = 0.399399399\ldots \approx P(S_2) = 0.4.$$

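
The same comparison can be repeated for any population size. The sketch below is an illustration rather than anything from the original notes: the helper name `dependence_gap` and the extra case N = 10000 are assumptions, and r is chosen so that r/N = 0.4 throughout.

```python
from fractions import Fraction


def dependence_gap(N, r):
    """Return P(S2), P(S2|S1), and their difference for two draws
    without replacement from N balls, r of which are red."""
    p_s2 = Fraction(r, N)                   # unconditional, same as P(S1)
    p_s2_given_s1 = Fraction(r - 1, N - 1)  # one red ball already removed
    return p_s2, p_s2_given_s1, p_s2 - p_s2_given_s1


for N in (10, 100, 1000, 10000):
    r = 4 * N // 10  # keep r/N = 0.4, as in the examples
    p_s2, cond, gap = dependence_gap(N, r)
    print(f"N = {N:>5}  P(S2|S1) = {float(cond):.6f}  gap = {float(gap):.6f}")
```

Each tenfold increase in N shrinks the gap by roughly a factor of ten, which is one way to see the dependence between draws weakening.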
With the dependence between draws becoming weaker as N grows larger, we might
expect that when N is large (relative to n), the hypergeometric distribution can be well
approximated by the binomial, and this is indeed the case, as demonstrated by the example
in Table 1 and the corresponding Figure 1.

Table 1: The probability functions of the hypergeometric distribution with n = 5 and
r/N = 0.4 for various values of N, and of the binomial distribution with n = 5 and p = 0.4.
The hypergeometric columns give $\binom{r}{y}\binom{N-r}{5-y} / \binom{N}{5}$; the binomial column gives $\binom{5}{y}(.4)^y(.6)^{5-y}$.

         Hypergeometric                                Binomial
     N = 10    N = 20    N = 100    N = 1000
 y   r = 4     r = 8     r = 40     r = 400            p = 0.4
 0   .0238     .0511     .0725      .0772              .0778
 1   .2381     .2554     .2591      .2592              .2592
 2   .4762     .3973     .3545      .3465              .3456
 3   .2381     .2384     .2323      .2306              .2304
 4   .0238     .0542     .0728      .0764              .0768
 5   .0000     .0036     .0087      .0101              .0102
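
The entries of Table 1 can be recomputed directly from the two probability functions. What follows is a minimal sketch using only Python's standard library; the function names `hypergeom_pmf` and `binom_pmf` and the closing "largest absolute difference" summary are illustrative additions, not part of the original notes.

```python
from math import comb


def hypergeom_pmf(y, N, r, n):
    """P(Y = y) when n balls are drawn without replacement
    from N balls of which r are red."""
    # comb(a, b) returns 0 when b > a, so impossible counts get probability 0.
    return comb(r, y) * comb(N - r, n - y) / comb(N, n)


def binom_pmf(y, n, p):
    """P(Y = y) for n independent trials with success probability p."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)


n, p = 5, 0.4
populations = [(10, 4), (20, 8), (100, 40), (1000, 400)]  # (N, r) with r/N = 0.4

# One row per y: the four hypergeometric columns, then the binomial column.
for y in range(n + 1):
    row = [hypergeom_pmf(y, N, r, n) for N, r in populations]
    row.append(binom_pmf(y, n, p))
    print(y, "  ".join(f"{v:.4f}" for v in row))

# Largest absolute difference from the binomial, for each population size.
for N, r in populations:
    max_diff = max(
        abs(hypergeom_pmf(y, N, r, n) - binom_pmf(y, n, p)) for y in range(n + 1)
    )
    print(f"N = {N:>4}: max |hypergeometric - binomial| = {max_diff:.4f}")
```

The second loop condenses each column of the table into a single number, which makes it easy to see the hypergeometric probabilities approaching the binomial ones as N grows.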

[Figure 1: four panels, titled N = 10, r = 4; N = 20, r = 8; N = 100, r = 40; and N = 1000, r = 400. Each panel plots p(y) on the vertical axis (0.0 to 0.5) against y = 0, 1, ..., 5.]

Figure 1: The probability functions of the hypergeometric distribution (red) with n = 5
and r/N = 0.4 for various values of N and of the binomial distribution (blue) with n = 5
and p = 0.4.
