Probability Distribution

BIOSTATISTICS
BIOL 350
LECTURER: DR. GORDON LIGHTBOURN
5.3 THE POISSON DISTRIBUTION

CONTINUATION OF CHAPTER 5

Quite frequently we study cases where sample size k
is very large, and one of the events (probability q) is
much more frequent than the other (probability p).
Example expression: $(0.001 + 0.999)^{1000}$
The expansion of this binomial would be quite tiresome!!
In cases like the above we are generally interested in
one tail of the distribution only.
This is the tail represented by the terms:
$\binom{k}{0} p^0 q^k$,
$\binom{k}{1} p^1 q^{k-1}$,
$\binom{k}{2} p^2 q^{k-2}$,
$\binom{k}{3} p^3 q^{k-3}$, ...

The first term represents no rare events and k frequent
events (in a sample of k events).
The second term represents 1 rare event and k-1 frequent
events.
The third term represents 2 rare events and k-2 frequent
events.
And so forth.
Expressions of the form $\binom{k}{i}$ are the binomial
coefficients.
We could use the binomial to compute the frequencies, as
in 5.2; however, it is much easier to use another
distribution, the **Poisson distribution**.

The Poisson may be used to approximate the
binomial when the probability of the rare event is p < 0.1
and the product of sample size and probability is kp < 5.
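As a rough numerical check of this approximation, the sketch below compares the first few binomial terms of (0.001 + 0.999)^1000 with the matching Poisson terms for a mean of kp = 1. It uses only the Python standard library; the function names are illustrative and do not come from the lecture.

```python
from math import comb, exp, factorial

def binomial_term(k, p, r):
    """Probability of exactly r rare events in a sample of k (binomial)."""
    return comb(k, r) * p**r * (1 - p)**(k - r)

def poisson_term(mu, r):
    """Probability of exactly r rare events when the mean is mu (Poisson)."""
    return exp(-mu) * mu**r / factorial(r)

k, p = 1000, 0.001   # the example (0.001 + 0.999)^1000
mu = k * p           # Poisson mean, here 1.0

# Print count, binomial probability, Poisson probability side by side.
for r in range(5):
    print(r, round(binomial_term(k, p, r), 5), round(poisson_term(mu, r), 5))
```

With p this small and kp = 1, the two columns should differ only in the fourth decimal place or beyond.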
The Poisson distribution is also a discrete frequency
distribution of the number of times a rare event
occurs.
We will use the Poisson distribution to study either a
spatial or temporal sample.
Examples: Spatial: the number of moss plants in a
sample quadrat.
Temporal: the number of mutations occurring in a
genetic strain in the time interval of one month.

The Poisson variable, Y, will be the number of rare
events per sample.
It can assume discrete (integer) values from 0 to $\infty$.
The variable must have two properties:
1. Its mean must be small relative to the maximum
number of events per sampling unit. This means
the event should be rare.
2. An occurrence of the event must be independent
of prior occurrences within the sampling unit. This
means the event should be random.

If the occurrence of one event enhances the
probability of a second such event, we obtain
clumping or contagious distributions.
If the occurrence of one event impedes that of a
second such event in the sampling unit, we obtain a
repulsed (spatially or temporally uniform) distribution.
The Poisson distribution can be used as a test for
randomness or independence of distribution, both
spatially and temporally.

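The Poisson series can be represented by:
$$\frac{1}{e^{\mu}},\ \frac{\mu}{1!\,e^{\mu}},\ \frac{\mu^2}{2!\,e^{\mu}},\ \frac{\mu^3}{3!\,e^{\mu}},\ \frac{\mu^4}{4!\,e^{\mu}},\ \ldots,\ \frac{\mu^r}{r!\,e^{\mu}},\ \ldots \qquad (5.11)$$
which are the relative expected frequencies corresponding to
the following counts of the rare event Y:
0, 1, 2, 3, 4, ..., r, ...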
The first term represents the relative expected frequency of
samples containing no rare events (0).
The second term, one rare event.
The third term, two rare events.
The fourth term, three rare events.
And so forth.
Explanation of the term $e^{\mu}$: e is the base of the natural logarithm
(approximately 2.71828) and $\mu$ is the parametric mean of the distribution.
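A minimal sketch of expression 5.11 in Python, assuming an illustrative mean of 1.8 (the value used for the yeast data of Table 5.5). The function name `poisson_series` is hypothetical.

```python
from math import exp, factorial

def poisson_series(mu, max_count):
    """Relative expected frequencies for counts 0..max_count (expression 5.11)."""
    return [mu**r / (factorial(r) * exp(mu)) for r in range(max_count + 1)]

terms = poisson_series(1.8, 9)
for r, term in enumerate(terms):
    print(f"Y = {r}: {term:.4f}")

# Over all possible counts the terms sum to 1; the first ten
# already account for nearly all of the distribution when mu = 1.8.
print("sum of first ten terms:", round(sum(terms), 4))
```

Because the terms are relative frequencies, multiplying each by the number of samples n gives the absolute expected frequencies.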
TABLE 5.5 YEAST CELLS IN 400 SQUARES
OF A HEMACYTOMETER
$\bar{Y}$ = 1.8 cells per square; n = 400 squares sampled.
__________________________________________________________________________
(1)            (2)            (3)            (4)
Number of      Observed       Absolute       Deviation from
cells per      frequencies    expected       expectation
square, Y                     frequencies    (observed - expected)
__________________________________________________________________________
0              75             66.1           +
1              103            119.0          -
2              121            107.1          +
3              54             64.3           -
4              30             28.9           +
5              13             10.4           +
6              2              3.1            -
7              1              0.8            +
8              0              0.2            -
9+             1              0.0            +
Total          400            399.9
__________________________________________________________________________
Classes 5 through 9+ are bracketed (pooled): observed 17, expected 14.5, deviation +.
EXAMPLE 5.5
Distribution of yeast cells in 400 squares of a
haemocytometer.
Column (1) lists the number of yeast cells observed in
each haemocytometer square.
Column (2) gives the observed frequency: the
number of squares containing a given number of
yeast cells.
Note 75 squares contain no (0) yeast cells.
Most squares held either one or two cells.
Only 17 squares contained 5 or more yeast cells.
Why would we expect this frequency distribution to
be distributed in Poisson fashion?
We have a relatively rare event.
On average there are 1.8 cells per square.
Relative to the amount of space, the number found is
very low.
We expect the occurrence of individual yeast cells in
a square is independent of the occurrence of other
yeast cells.
The mean of the rare events is the only quantity we
need to know to calculate the relative expected
frequencies (of a Poisson distribution).
We do not know the parametric mean of the yeast
cells.
We employ an estimate (the sample mean) and
calculate the expected frequencies, where $\mu$ is set equal to
the sample mean $\bar{Y}$ of Table 5.5.
It is convenient to rewrite expression 5.11 as:
$$\hat{f}_i = \hat{f}_{i-1}\left(\frac{\bar{Y}}{i}\right) \quad \text{for } i = 1, 2, \ldots, \text{ where } \hat{f}_0 = e^{-\bar{Y}} \qquad (5.12)$$
Note that the parametric mean $\mu$ has been replaced by
the sample mean $\bar{Y}$.
Expression 5.12 yields relative expected frequencies.
For absolute expected frequencies, set the first term to
$$\hat{f}_0 = \frac{n}{e^{\bar{Y}}}$$
and proceed as before.
We list the expected frequencies in column (3) of Table 5.5.
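The recursion of expression 5.12 can be sketched in a few lines of Python. The function name `expected_frequencies` is an assumption made for illustration; the inputs (mean 1.8, n = 400) are those of Table 5.5.

```python
from math import exp

def expected_frequencies(mean, n, max_count):
    """Absolute expected Poisson frequencies for counts 0..max_count.

    Starts from f0 = n / e**mean and applies f_i = f_(i-1) * mean / i.
    """
    freqs = [n / exp(mean)]
    for i in range(1, max_count + 1):
        freqs.append(freqs[-1] * mean / i)
    return freqs

# Yeast data of Table 5.5: sample mean 1.8 cells per square, n = 400 squares.
yeast_expected = expected_frequencies(1.8, 400, 8)
print([round(f, 1) for f in yeast_expected])
```

Rounded to one decimal place, the values for classes 0 through 8 should agree with column (3) of Table 5.5.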

What have we learnt?
When comparing the observed frequencies with the
expected frequencies, we see a good fit (mean 1.8).
No clear pattern of deviation from expected is observed.
The biological interpretation: the yeast cells seem to be
randomly dispersed in the counting chamber, indicating
thorough mixing of the suspension.
Note that in Table 5.5 we group the low frequencies at one
tail of the curve, uniting them by means of a bracket. For a
goodness of fit test, no expected frequency should be
less than 5.
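The bracketing in Table 5.5 can be mimicked with a small helper that pools classes from the upper tail until the pooled expected frequency reaches 5. This is only a sketch: the name `pool_tail` and the choice to pool the upper tail alone are assumptions, not a procedure stated in the lecture.

```python
def pool_tail(observed, expected, minimum=5.0):
    """Pool classes from the upper tail until the pooled expected
    frequency reaches `minimum`. Returns (observed, expected) lists
    whose last entry is the pooled class."""
    obs, exp_ = list(observed), list(expected)
    pooled_obs, pooled_exp = 0, 0.0
    while exp_ and pooled_exp < minimum:
        pooled_obs += obs.pop()
        pooled_exp += exp_.pop()
    return obs + [pooled_obs], exp_ + [pooled_exp]

# Columns (2) and (3) of Table 5.5, classes 0 through 9+.
observed = [75, 103, 121, 54, 30, 13, 2, 1, 0, 1]
expected = [66.1, 119.0, 107.1, 64.3, 28.9, 10.4, 3.1, 0.8, 0.2, 0.0]

obs_pooled, exp_pooled = pool_tail(observed, expected)
print(obs_pooled[-1], round(exp_pooled[-1], 1))
```

For the yeast data this pools classes 5 through 9+, giving the bracketed values of 17 observed and 14.5 expected.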
Poisson distribution facts:
To compute expected frequencies we need to know only
one parameter: the mean of the distribution.
The mean completely defines the shape of a given
Poisson distribution.
We have a simple relation between the mean and the variance: $\mu = \sigma^2$.
The variance is equal to the mean.
In our example, variance = 1.965, not much larger than
the mean 1.80, indicating that the yeast cells are
distributed approximately in Poisson fashion.
The coefficient of dispersion: $CD = s^2 / \bar{Y}$
This value will be near 1 in distributions that are essentially
Poisson, > 1 in clumped samples, and < 1 in cases of
repulsion.
In the yeast cell example, CD = 1.092
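The mean, variance and coefficient of dispersion quoted for the yeast data can be recomputed from columns (1) and (2) of Table 5.5. The sketch below assumes the open class "9+" is treated as exactly 9 and uses the n - 1 divisor for the variance; the function name is hypothetical.

```python
def dispersion_summary(counts, freqs):
    """Return (mean, variance, CD) for a frequency table.

    counts: class values Y; freqs: observed frequencies f.
    The variance uses the n - 1 divisor (sample variance).
    """
    n = sum(freqs)
    mean = sum(y * f for y, f in zip(counts, freqs)) / n
    sum_sq = sum(f * (y - mean) ** 2 for y, f in zip(counts, freqs))
    variance = sum_sq / (n - 1)
    return mean, variance, variance / mean

# Table 5.5; the open class "9+" is treated as exactly 9 (an assumption).
counts = list(range(10))
freqs = [75, 103, 121, 54, 30, 13, 2, 1, 0, 1]

mean, variance, cd = dispersion_summary(counts, freqs)
print(round(mean, 3), round(variance, 3), round(cd, 3))
```

Under these assumptions the script reproduces the values in the text: mean 1.8, variance 1.965 and CD 1.092.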
Figure 5.3 will give you an idea of the shape of the Poisson
distribution of different means.
For the low value of $\mu$ = 0.5, the frequency polygon is
extremely L-shaped, but with an increase in the value of $\mu$,
the distribution becomes humped and eventually nearly
symmetrical.
FIGURE 5.3
TABLE 5.6 NUMBER OF MOSS SHOOTS (HYPNUM
SCHREBERI) PER QUADRAT ON CHINA CLAY
RESIDUES (MICA)
__________________________________________________________________________
(1)            (2)            (3)            (4)
Number of      Observed       Absolute       Deviation from
moss shoots    frequencies    expected       expectation
per quadrat                   frequencies    (observed - expected)
__________________________________________________________________________
0              100            77.7           +
1              9              37.6           -
2              6              9.1            -
3              8              1.5            +
4              1              0.2            +
5              0              0.0            0
6+             2              0.0            +
Total          126            126.1
__________________________________________________________________________
Classes 2 through 6+ are bracketed (pooled): observed 17, expected 10.8, deviation +.
$\bar{Y}$ = 0.4841
$s^2$ = 1.308
CD = 2.702
TABLE 5.6
The first example is from an ecological study of
mosses of the species Hypnum schreberi invading
mica residue of china clay. The ecologist laid out 126
quadrats. In each quadrat they counted the number
of moss shoots. Expected frequencies are calculated
using the mean number of moss shoots, $\bar{Y}$ = 0.4841, as
an estimate of $\mu$.
We expect only 78 quadrats without a moss plant, but we
find 100.
We also expect 1.7 quadrats containing 3 or more
moss shoots, but we find 11.
The center classes are less frequent than expected.
Instead of the nearly 38 quadrats expected to hold one
moss plant each, we find only 9.
This case illustrates clumping, which was also
encountered in the binomial distribution.
The sample variance $s^2$ = 1.308, much larger than the
mean $\bar{Y}$ = 0.4841, yields a coefficient of dispersion CD =
2.702.
Biological explanation: the protonemata, or spores,
of the moss were carried in by water and deposited
at random, but each protonema gave rise to a
number of upright shoots, so counts of the latter
indicated a clumped distribution.
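The mechanism in this explanation can be illustrated with a small simulation. Everything numerical here is an assumption chosen for illustration, not data from the study: 2000 simulated quadrats, a mean of 0.16 protonemata per quadrat, and exactly 3 shoots per protonema. The Poisson sampler uses Knuth's multiplication method from the standard library only.

```python
import random
from math import exp

def poisson_sample(rng, mean):
    """Draw one Poisson-distributed count (Knuth's multiplication method)."""
    limit, count, product = exp(-mean), 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def coefficient_of_dispersion(values):
    """Sample variance divided by sample mean."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return variance / mean

rng = random.Random(1)       # fixed seed so the run is repeatable
quadrats = 2000              # hypothetical number of simulated quadrats
shoots_per_protonema = 3     # assumed, not taken from the study

# Protonemata land at random: a Poisson count per quadrat.
protonemata = [poisson_sample(rng, 0.16) for _ in range(quadrats)]
# Each protonema produces several shoots, so shoot counts come in clumps.
shoots = [shoots_per_protonema * p for p in protonemata]

cd_protonemata = coefficient_of_dispersion(protonemata)
cd_shoots = coefficient_of_dispersion(shoots)
print(round(cd_protonemata, 2), round(cd_shoots, 2))
```

The protonemata counts should give a CD near 1, while the shoot counts give a CD three times as large, because multiplying every count by 3 multiplies the variance by 9 and the mean by 3.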
TABLE 5.7 POTENTILLA (WEED) SEEDS IN 98
QUARTER-OUNCE SAMPLES OF GRASS SEEDS
(PHLEUM PRATENSE)
__________________________________________________________________________
(1)              (2)            (3)            (4)
Number of weed   Observed       Poisson        Deviation from
seeds per        frequencies    expected       expectation
sample of seeds                 frequencies    (observed - expected)
__________________________________________________________________________
0                37             31.3           +
1                32             35.7           -
2                16             20.4           -
3                9              7.8            +
4                2              2.2            -
5                0              0.5            -
6                1              0.1            +
7+               1              0.0            +
Total            98             98.0
__________________________________________________________________________
Classes 3 through 7+ are bracketed (pooled): observed 13, expected 10.6, deviation +.
$\bar{Y}$ = 1.1429
$s^2$ = 1.711
CD = 1.497
TABLE 5.7
The second example tests the randomness of
distribution of weed seeds in samples of grass seed.
We can estimate k (which is several thousand), and
q, which represents the large proportion of grass
seeds, as compared with p, the small proportion of
weed seed.
The data are structured as in a binomial distribution
with alternative states: weed seed and grass seed.
Only the number of weed seeds must be considered.
This is a binomial in which the frequency of one
outcome is very much smaller than that of the other,
and the sample size is large.
We can use the Poisson distribution as a useful
approximation of the binomial frequencies for the tail of
the distribution.
We use the average number of weed seeds per sample of
seeds as our estimate of the mean and calculate Poisson
frequencies from the mean.
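A sketch of that calculation for Table 5.7, using the mean of 1.1429 weed seeds per sample and n = 98. The open-ended last class (7 or more) is given whatever frequency is left over, which is an assumption about how the final row was obtained; the function name is hypothetical.

```python
from math import exp

def poisson_expected(mean, n, last_class):
    """Absolute expected frequencies for classes 0..last_class, where the
    last class is open-ended (last_class or more) and takes the remainder."""
    freqs = [n * exp(-mean)]
    for i in range(1, last_class):
        freqs.append(freqs[-1] * mean / i)
    freqs.append(n - sum(freqs))   # open class: everything left over
    return freqs

# Table 5.7: mean 1.1429 weed seeds per sample, n = 98 samples, classes 0..7+.
weed_expected = poisson_expected(1.1429, 98, 7)
for count, f in enumerate(weed_expected):
    label = f"{count}+" if count == 7 else str(count)
    print(label, round(f, 1))
```

Giving the remainder to the open class makes the expected frequencies add up to n, apart from floating-point rounding.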
Although the pattern of deviations and the coefficient of
dispersion indicate clumping, this tendency is not
pronounced and we do not have sufficient evidence to
suggest this is not a Poisson distribution.
We conclude the seeds are randomly distributed throughout
the sample.
If clumping had been found, it might mean that weed
seeds stuck together for some physical reason.
TABLE 5.9 AZUKI BEAN WEEVILS (CALLOSOBRUCHUS
CHINENSIS) EMERGING FROM 112 AZUKI BEANS
(PHASEOLUS RADIATUS)
__________________________________________________________________________
(1)                (2)            (3)            (4)
Number of          Observed       Poisson        Deviation from
weevils emerging   frequencies    expected       expectation
per bean                          frequencies    (observed - expected)
__________________________________________________________________________
0                  61             70.4           -
1                  50             32.7           +
2                  1              7.6            -
3                  0              1.2            -
4+                 0              0.1            -
Total              112            112.0
__________________________________________________________________________
Classes 2 through 4+ are bracketed (pooled): observed 1, expected 8.9, deviation -.
$\bar{Y}$ = 0.4643
$s^2$ = 0.269
CD = 0.579

This distribution is taken from an experimental
study of populations of the azuki bean weevil.
The number of holes in beans (emergence) is a good
measure of the number of adults that have emerged.
The rare event in this case is the presence of a weevil in the
bean.
The distribution is strongly repulsed, a far rarer
occurrence.
There are many more beans containing one weevil
than the Poisson distribution would predict.
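The pattern of deviations behind this statement can be read off columns (2) and (3) of Table 5.9. A short sketch, with hypothetical helper names:

```python
def deviations(observed, expected):
    """Observed minus expected frequency for each class."""
    return [o - e for o, e in zip(observed, expected)]

def sign(value):
    """'+' for an excess, '-' for a deficit, '0' for exact agreement."""
    return "+" if value > 0 else "-" if value < 0 else "0"

# Columns (2) and (3) of Table 5.9, classes 0, 1, 2, 3 and 4+.
observed = [61, 50, 1, 0, 0]
expected = [70.4, 32.7, 7.6, 1.2, 0.1]

devs = deviations(observed, expected)
print([sign(d) for d in devs])   # one sign per class
print(round(devs[1], 1))         # excess of single-weevil beans
```

Only the class of beans with exactly one weevil shows an excess; empty beans and beans with two or more weevils are all fewer than expected, which is the signature of repulsion.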

Biological explanation:
It was found that the adult female weevils tended to
deposit their eggs evenly rather than randomly over the
available beans.
This prevents too many eggs being placed on any one
bean and precludes heavy competition among the
developing larvae.
A contributing factor was competition between
larvae feeding on the same bean, generally resulting
in all but one being killed or driven away.
TABLE 5.10 MEN KILLED BY BEING KICKED BY A HORSE IN 10
PRUSSIAN ARMY CORPS IN THE COURSE OF 20 YEARS
__________________________________________________________________________
(1)                (2)            (3)            (4)
Number of men      Observed       Poisson        Deviation from
killed per year    frequencies    expected       expectation
per army corps                    frequencies    (observed - expected)
__________________________________________________________________________
0                  109            108.7          +
1                  65             66.3           -
2                  22             20.2           +
3                  3              4.1            -
4                  1              0.6            +
5+                 0              0.1            -
Total              200            200.0
__________________________________________________________________________
Classes 3 through 5+ are bracketed (pooled): observed 4, expected 4.8, deviation -.
$\bar{Y}$ = 0.610
$s^2$ = 0.611
CD = 1.002

Table 5.10 is a frequency distribution of men killed by
being kicked by a horse in 10 Prussian army corps
over 20 years.
The basic sampling unit is temporal: one army corps
per year.
The value of 0.610 men killed per army corps per year is
the mean frequency of the rare event.
If we knew the number of men in each army corps,
we could calculate the probability of not being killed
in one year.
This would give us a binomial, which the Poisson
distribution approximates.

Knowing the sample size is large, however, we can
treat the example with the Poisson model, using
the observed mean number of men killed per army
corps per year as an estimate of $\mu$.
This example shows a nearly perfect fit of observed to expected frequencies.
What would clumping mean?
Poor discipline in a particular corps, or a particularly
vicious horse that killed several men before the corps
got rid of it.
Repulsion might mean the men in a corps were
careless until someone had been killed, after which
they became more careful for a while.
