
Binomial, Negative Binomial and Multinomial Distribution
Introduction
The binomial distribution arises in many real-life situations. When a coin is tossed, the number of heads in a given number of tosses follows a binomial distribution. A distribution is said to be binomial if it satisfies the five conditions given below (adapted from Wackerly, Mendenhall and Scheaffer 2008).
1. There is a fixed number, n, of identical trials.
2. For each trial, there are only two possible outcomes (success/failure).
3. The probability of success, p, remains the same for each trial.
4. The trials are independent of each other.
5. The random variable Y = the number of successes observed for the n
trials.
If all the above conditions are satisfied, then the binomial p.m.f. is given
by
f(y : n, p) = \binom{n}{y} p^y (1-p)^{n-y}
The above distribution can arise both in the case of an infinite population and in the case of a finite population sampled with replacement. If a finite population is sampled without replacement, then two of the five conditions given above (3 and 4) are not satisfied, and one needs to take care of that. One can argue that when the finite population size N is very large in comparison to n, sampling has very little effect on p, and thus one can treat the resulting distribution as binomial. But even with very large N the independence criterion cannot strictly be met. Hence, in the case of a finite population sampled without replacement, the resulting distribution is known as the hypergeometric distribution, and its p.m.f. is given by
f(y : N, n, r) = \frac{\binom{r}{y} \binom{N-r}{n-y}}{\binom{N}{n}}, \qquad y \in [\max(0, n - (N - r)), \min(r, n)]
where N is the finite population size and r is the total number of successes in N. The hypergeometric distribution will be discussed in the next chapter. In this chapter we limit ourselves to an infinite population or a finite population sampled with replacement.

Binomial Distribution
The probability mass function of the binomial distribution is given by

f(y : n, p) = \binom{n}{y} p^y (1-p)^{n-y}
Graph: 1 below represents the binomial probability mass function for a few values of n and p.


The cumulative distribution function is given by


F(y : n, p) = \sum_{i=0}^{y} \binom{n}{i} p^i (1-p)^{n-i}
Graph: 2 below represents the binomial cumulative distribution function for a few values of n and p.


Graph: 1

Graph: 2
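
As an illustrative aside (not part of the original text), the p.m.f. and c.d.f. above can be evaluated numerically. The sketch below uses scipy.stats.binom; the values of n and p are arbitrary choices for illustration.

```python
from scipy.stats import binom

n, p = 20, 0.3  # arbitrary illustrative values

# f(y : n, p) = C(n, y) p^y (1-p)^(n-y) for a few values of y
for y in range(5):
    print(f"f({y}) = {binom.pmf(y, n, p):.4f}")

# F(y : n, p) = sum_{i=0}^{y} C(n, i) p^i (1-p)^(n-i)
print(f"F(5) = {binom.cdf(5, n, p):.4f}")
```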

Mean of the Binomial Distribution


E(y) = \sum_{y=0}^{n} y \binom{n}{y} p^y (1-p)^{n-y}

Since the y = 0 term vanishes,

E(y) = \sum_{y=1}^{n} y \frac{n!}{y!(n-y)!} p^y (1-p)^{n-y} = \sum_{y=1}^{n} \frac{n!}{(y-1)!(n-y)!} p^y (1-p)^{n-y}

Let x = y - 1 and m = n - 1. Substituting y = x + 1 and n = m + 1 into the last sum (and using the fact that the limits x = 0 and x = m correspond to y = 1 and y = n, respectively),

E(y) = \sum_{x=0}^{m} \frac{(m+1)!}{x!(m-x)!} p^{x+1} (1-p)^{m-x}

E(y) = (m+1) p \sum_{x=0}^{m} \frac{m!}{x!(m-x)!} p^x (1-p)^{m-x}

E(y) = np \sum_{x=0}^{m} \frac{m!}{x!(m-x)!} p^x (1-p)^{m-x}

Using the binomial theorem,

(a+b)^m = \sum_{x=0}^{m} \frac{m!}{x!(m-x)!} a^x b^{m-x}

and putting a = p and b = 1 - p gives

(a+b)^m = (p + 1 - p)^m = \sum_{x=0}^{m} \frac{m!}{x!(m-x)!} p^x (1-p)^{m-x} = 1

From the above we see that

E(y) = np \sum_{x=0}^{m} \frac{m!}{x!(m-x)!} p^x (1-p)^{m-x} = np
Similarly, for the second factorial moment,

E(y(y-1)) = \sum_{y=0}^{n} y(y-1) \binom{n}{y} p^y (1-p)^{n-y} = \sum_{y=0}^{n} y(y-1) \frac{n!}{y!(n-y)!} p^y (1-p)^{n-y}

Since the y = 0 and y = 1 terms vanish, this implies

E(y(y-1)) = \sum_{y=2}^{n} \frac{n!}{(y-2)!(n-y)!} p^y (1-p)^{n-y}

E(y(y-1)) = n(n-1) p^2 \sum_{y=2}^{n} \frac{(n-2)!}{(y-2)!(n-y)!} p^{y-2} (1-p)^{n-y}

Let x = y - 2 and m = n - 2. Substituting y = x + 2 and n = m + 2 into the last sum (and using the fact that the limits x = 0 and x = m correspond to y = 2 and y = n, respectively),

E(y(y-1)) = n(n-1) p^2 \sum_{x=0}^{m} \frac{m!}{x!(m-x)!} p^x (1-p)^{m-x}

Using the binomial theorem as above, the sum equals 1, so

E(y(y-1)) = n(n-1) p^2

Variance of the Binomial Distribution

\sigma^2 = E(y^2) - E(y)^2 = E(y(y-1)) + E(y) - E(y)^2

\sigma^2 = n(n-1) p^2 + np - (np)^2

\sigma^2 = n^2 p^2 - n p^2 + np - n^2 p^2

\sigma^2 = np - n p^2 = np(1-p)
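
As a minimal numerical check (an illustrative sketch, with n and p chosen arbitrarily), the mean and variance derived above can be verified by summing the p.m.f. directly:

```python
from math import comb

n, p = 12, 0.4  # arbitrary illustrative values

pmf = [comb(n, y) * p**y * (1 - p)**(n - y) for y in range(n + 1)]

mean = sum(y * f for y, f in zip(range(n + 1), pmf))
var = sum(y**2 * f for y, f in zip(range(n + 1), pmf)) - mean**2

print(mean, n * p)           # both 4.8
print(var, n * p * (1 - p))  # both 2.88
```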
Mode
f(y : n, p) = \binom{n}{y} p^y (1-p)^{n-y}

f(y+1 : n, p) = \binom{n}{y+1} p^{y+1} (1-p)^{n-y-1}

From the above relation we can calculate

\frac{f(y+1 : n, p)}{f(y : n, p)} = \frac{n-y}{y+1} \cdot \frac{p}{1-p}

so that

f(y+1 : n, p) > f(y : n, p) \quad \text{if } y < np + p - 1

f(y+1 : n, p) < f(y : n, p) \quad \text{if } y > np + p - 1

In the above two cases the binomial distribution is unimodal. But if np + p - 1 is an integer, the distribution is bimodal (provided p \neq 0, 1), with the two modes at np + p - 1 and np + p. If p = 0 the mode is at y = 0, and in the case of p = 1 the mode is at y = n. Graph: 3 below represents the cases p = 0 and p = 1, and we can see from the graph that for p = 0 the mode is at y = 0 and for p = 1 the mode is at y = n.

Graph: 3
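
A small sketch (arbitrary n and p, chosen to avoid the boundary case where np + p - 1 is an integer) comparing the location of the maximum of the p.m.f. with the value ⌊np + p⌋ implied by the ratio argument above:

```python
from math import floor
import numpy as np
from scipy.stats import binom

# arbitrary illustrative (n, p) pairs, none of which hit the bimodal case
for n, p in [(10, 0.3), (15, 0.4), (20, 0.7)]:
    ys = np.arange(n + 1)
    empirical_mode = ys[np.argmax(binom.pmf(ys, n, p))]
    print(n, p, empirical_mode, floor(n * p + p))  # last two columns agree
```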

Estimation of Binomial Parameters (known n)

ML(p : n, y) = \binom{n}{y} p^y (1-p)^{n-y} = \frac{n!}{y!(n-y)!} p^y (1-p)^{n-y}

\ell(p : n, y) = \log n! - \log(y!) - \log((n-y)!) + y \log(p) + (n-y) \log(1-p)

To maximize the log-likelihood, we differentiate \ell(p : n, y) with respect to p and set the derivative to zero:

\frac{d\ell(p : n, y)}{dp} = \frac{y}{p} - \frac{n-y}{1-p} = 0

\hat{p} = \frac{y}{n}, \qquad p \in (0, 1)

The same estimate would be obtained from n Bernoulli trials. This is not surprising, as the binomial distribution is the result of n independent Bernoulli trials.
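
A brief sketch of the estimator on simulated data (the sample size and true p below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p_true = 50, 0.3  # arbitrary illustrative values

y = rng.binomial(n, p_true)  # one observed success count
p_hat = y / n                # the ML estimate derived above
print(y, p_hat)
```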
Normal Approximation of Binomial
The normal approximation to the binomial distribution (based on the De Moivre–Laplace theorem) is

\Pr\left( \alpha < \frac{x - np}{\sqrt{npq}} \le \beta \right) \approx \frac{1}{\sqrt{2\pi}} \int_{\alpha}^{\beta} e^{-u^2/2} \, du = F(\beta) - F(\alpha)

where q = 1 - p, u \sim N(0, 1), and F represents the standard normal cumulative distribution function.
This is a relatively crude approximation, but it can be useful when n is
large. Numerical comparisons have been published in a number of
textbooks (e.g., Hald, 1952). A marked improvement is obtained by the
use of a continuity correction. The following normal approximation is used
widely on account of its simplicity:
\Pr(X \le x) \approx F\left( \frac{x + 0.5 - np}{\sqrt{npq}} \right)

Its accuracy for various values of n and p was assessed by Raff (1956) and by Peizer and Pratt (1968), who used the absolute and the relative error, respectively. Various rules of thumb for its use have been recommended in standard textbooks. Two such rules of thumb are:
1. use when np(1-p) > 9, and
2. use when np > 9 for 0 < p \le 0.5 \le q.
The corresponding approximation for an individual binomial probability is

\Pr(X = x) \approx \frac{1}{\sqrt{2\pi}} \int_{(x - 0.5 - np)/\sqrt{npq}}^{(x + 0.5 - np)/\sqrt{npq}} e^{-u^2/2} \, du

Graph: 4 below represents the binomial probability mass function for n = 100, 200, 300. We can see from the graph that as n increases, the binomial probability mass function converges to the normal distribution, so for large values of n the binomial distribution can be approximated by the normal distribution. The binomial distribution can also be approximated by the Poisson distribution for large n and small p; this will be explored in the next chapter.
Graph: 4
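
A short sketch comparing the exact binomial c.d.f. with the continuity-corrected normal approximation; n, p and x below are arbitrary choices satisfying the np(1 - p) > 9 rule of thumb:

```python
from math import sqrt
from scipy.stats import binom, norm

n, p = 100, 0.3  # n p (1 - p) = 21 > 9, so the rule of thumb is met
x = 25

exact = binom.cdf(x, n, p)
approx = norm.cdf((x + 0.5 - n * p) / sqrt(n * p * (1 - p)))  # continuity correction
print(exact, approx)  # the two values are close
```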

Example: 1 A student takes a multiple choice test with 20 questions, each with 5 choices (only one of which is correct). Suppose that the student blindly answers the questions. Let X denote the number of questions that the student answers correctly. Find each of the following:
a. The probability mass function of X
b. The mean of X
c. The variance of X
d. The probability that the student answers at least 12 questions correctly (the score that she needs to pass).

Ans.
a. The probability p of answering a question correctly is p = 1/5. The probability mass function of X is

P(X = x) = \binom{20}{x} \left(\frac{1}{5}\right)^x \left(\frac{4}{5}\right)^{20-x}

b. The mean of X is np = 20 \cdot \frac{1}{5} = 4.

c. The variance of X is np(1-p) = 20 \cdot \frac{1}{5} \cdot \frac{4}{5} = \frac{16}{5}.

d. The probability that the student answers at least 12 questions correctly is

P(X \ge 12) = P(X = 12) + P(X = 13) + \dots + P(X = 20) = 0.000102
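
The answers above can be checked with a few lines of Python (a sketch using scipy.stats.binom):

```python
from scipy.stats import binom

X = binom(20, 1 / 5)

print(X.mean())  # 4.0
print(X.var())   # 3.2 (= 16/5)
print(X.sf(11))  # P(X >= 12) = 1 - P(X <= 11), approximately 0.000102
```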

Example: 2 Suppose that in a certain district, 40% of the registered voters prefer candidate A. A random sample of 50 registered voters is selected. Let Z denote the number in the sample who prefer A. Answer each of the following:
a. The probability mass function of Z
b. The mean of Z
c. The variance of Z
d. The probability that Z is less than or equal to 19.
e. The normal approximation to the probability in (d).

Ans.
a. The probability p that a sampled voter prefers candidate A is p = 2/5. The probability mass function of Z is

P(Z = z) = \binom{50}{z} \left(\frac{2}{5}\right)^z \left(\frac{3}{5}\right)^{50-z}

b. The mean of Z is np = 50 \cdot \frac{2}{5} = 20.

c. The variance of Z is np(1-p) = 50 \cdot \frac{2}{5} \cdot \frac{3}{5} = 12.

d. P(Z \le 19) = P(Z = 0) + P(Z = 1) + \dots + P(Z = 19) = 0.446476379

e. Approximately, Z \sim N(20, 12). With the continuity correction,

P(Z \le 19) \approx F\left( \frac{19.5 - 20}{\sqrt{12}} \right) = 0.442616957

where F is the standard normal cumulative distribution function.
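
A sketch checking parts (d) and (e) numerically:

```python
from math import sqrt
from scipy.stats import binom, norm

n, p = 50, 2 / 5

exact = binom.cdf(19, n, p)                                # approximately 0.4465
approx = norm.cdf((19.5 - n * p) / sqrt(n * p * (1 - p)))  # approximately 0.4426
print(exact, approx)
```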

Negative Binomial Distribution:


One of the earliest descriptions of the negative binomial distribution is due to Pascal (1679), who discussed many special types of negative binomial distribution. Montmort (1713) derived the distribution of the number of tosses of a fair coin required to obtain a fixed number of heads. A very clear interpretation of the p.m.f. as a density function was given by Galloway (1839, pp. 37–38) in his discussion of the problem of points. Meyer (1879, p. 204) obtained the p.m.f. as the probability of exactly j male births in a birth sequence containing a fixed number of female births; he assumed a known constant probability of a male birth. He also gave the c.d.f. in a form that we now recognize as the upper tail of an F distribution (equivalent to an incomplete beta function; see Section 5.6).
Negative Binomial Distribution
In a sequence of independent Bernoulli trials, let the random variable X denote the trial at which the r-th success occurs, where r is a fixed integer. In this case the random variable X is said to follow a negative binomial distribution, and its p.m.f. is given by

f(X = x : r, p) = \binom{x-1}{r-1} p^r (1-p)^{x-r}, \qquad x = r, r+1, \dots

The negative binomial distribution is sometimes defined in terms of the random variable Y = the number of failures before the r-th success. This formulation is statistically equivalent to the one given above in terms of X = the trial at which the r-th success occurs, since Y = X - r. The alternative form of the negative binomial distribution is

f(X = y + r : r, p) = \binom{r+y-1}{y} p^r (1-p)^y

The geometric distribution is a special case of the negative binomial distribution, with r = 1.
Graph: 5 below represents the negative binomial probability mass function for a few values of r and p.

Graph: 5
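
As an illustrative sketch of this p.m.f. (with arbitrary r and p): note that scipy.stats.nbinom is parameterized by Y = the number of failures before the r-th success, so the trial-number form used here requires the shift X = Y + r.

```python
from math import comb
from scipy.stats import nbinom

r, p = 3, 0.4  # arbitrary illustrative values

for x in range(r, r + 5):
    direct = comb(x - 1, r - 1) * p**r * (1 - p)**(x - r)  # f(X = x : r, p)
    print(x, direct, nbinom.pmf(x - r, r, p))  # the last two columns agree
```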

Graph: 6 below represents the negative binomial cumulative distribution function for a few values of r and p.
Graph: 6

Mean and variance of Negative Binomial Distribution:


We will use moment generating functions to calculate the first few moments of the negative binomial distribution, and from these we can obtain the mean and variance. Moment generating functions can be used in the same way for other discrete distributions; it is left as an exercise for the reader to use them to obtain the mean and variance of the binomial distribution.
Moment Generating Functions
Let X be a discrete random variable with probability mass function f(x) and
support S. Then:
M(t) = E(e^{tX}) = \sum_{x \in S} e^{tx} f(x)

is the moment generating function of X as long as the summation is finite for some interval of t around 0. That is, M(t) is the moment generating function ("m.g.f.") of X if there is a positive number h such that the above summation exists and is finite for -h < t < h.
The nth derivative of the moment generating function at t = 0 gives the nth moment about the origin. Once we have obtained the moments about the origin, they can be used to find the mean, variance and other statistics.
Moment generating function for Negative Binomial
M(t) = \sum_{x=r}^{\infty} e^{tx} \binom{x-1}{r-1} (1-p)^{x-r} p^r

Multiplying and dividing by e^{tr}, and taking p^r e^{tr} out of the summation (as it is independent of x), gives

M(t) = p^r e^{tr} \sum_{x=r}^{\infty} e^{t(x-r)} \binom{x-1}{r-1} (1-p)^{x-r}

M(t) = (p e^t)^r \sum_{x=r}^{\infty} \binom{x-1}{r-1} \left[ (1-p) e^t \right]^{x-r}

Now substitute k = x - r, so that x = k + r:

M(t) = (p e^t)^r \sum_{k=0}^{\infty} \binom{k+r-1}{r-1} \left[ (1-p) e^t \right]^k

Using the negative binomial series (the property that the negative binomial probabilities sum to one),

(1-w)^{-r} = \sum_{k=0}^{\infty} \binom{k+r-1}{r-1} w^k, \qquad |w| < 1

with w = (1-p) e^t, we can write the moment generating function as

M(t) = (p e^t)^r \left[ 1 - (1-p) e^t \right]^{-r}

This can be simplified as

M(t) = \left( \frac{p e^t}{1 - (1-p) e^t} \right)^r, \qquad t < -\log(1-p)
Mean of the Negative Binomial Distribution

Differentiating M(t) = (p e^t)^r \left[ 1 - (1-p) e^t \right]^{-r} with respect to t,

\frac{dM(t)}{dt} = r (p e^t)^{r-1} p e^t \left[ 1 - (1-p) e^t \right]^{-r} + r (p e^t)^r \left[ 1 - (1-p) e^t \right]^{-r-1} (1-p) e^t

Evaluating at t = 0 and using 1 - (1-p) = p,

M'(0) = r p^{r-1} \cdot p \cdot p^{-r} + r p^r \cdot p^{-r-1} (1-p) = r + \frac{r(1-p)}{p} = \frac{r(p + 1 - p)}{p}

\mu = E(X) = M'(0) = \frac{r}{p}
Variance of the Negative Binomial Distribution

The variance of the negative binomial distribution can be obtained in a similar fashion, using the formula \sigma^2 = M''(0) - (M'(0))^2. Differentiating M'(t) once more and evaluating at t = 0 (simplifying as above) gives

M''(0) = \frac{r(r + 1 - p)}{p^2}, \qquad M'(0) = \frac{r}{p}

\sigma^2 = M''(0) - (M'(0))^2 = \frac{r(r + 1 - p)}{p^2} - \frac{r^2}{p^2}

Solving this gives

\sigma^2 = \frac{r(1-p)}{p^2}
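
A quick numerical check of these moments (arbitrary illustrative r and p; scipy's nbinom counts failures, so its mean must be shifted by r to match E(X) = r/p):

```python
from scipy.stats import nbinom

r, p = 3, 0.4  # arbitrary illustrative values

mean_y, var_y = nbinom.stats(r, p, moments='mv')  # moments of Y = X - r

print(mean_y + r, r / p)          # E(X) = E(Y) + r = r/p = 7.5
print(var_y, r * (1 - p) / p**2)  # Var(X) = Var(Y) = r(1-p)/p^2 = 11.25
```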
Mode of the Negative Binomial Distribution

Let t = 1 + \frac{r-1}{p}. The negative binomial distribution is unimodal if \frac{r-1}{p} is not an integer. If t is an integer, then the distribution has two modes, at t - 1 and t.

Estimation of Negative Binomial Parameters:

f(X = x : r, p) = \binom{x-1}{r-1} p^r (1-p)^{x-r}

f(Y = y : r, p) = \binom{y+r-1}{y} p^r (1-p)^y

Writing the binomial coefficient in terms of gamma functions,

f(Y = y : r, p) = \frac{\Gamma(y+r)}{\Gamma(y+1)\,\Gamma(r)} p^r (1-p)^y

ML(r, p : y) = \frac{\Gamma(y+r)}{\Gamma(y+1)\,\Gamma(r)} p^r (1-p)^y

\ell(r, p : y) = \log \Gamma(y+r) - \log \Gamma(y+1) - \log \Gamma(r) + r \log(p) + y \log(1-p)

Setting the partial derivatives of \ell(r, p : y) to zero,

\frac{\partial \ell(r, p : y)}{\partial p} = \frac{r}{p} - \frac{y}{1-p} = 0 \implies \hat{p} = \frac{r}{r+y}

\frac{\partial \ell(r, p : y)}{\partial r} = 0 cannot be solved in closed form. This makes the estimation of the negative binomial parameters difficult. If a numerical solution is desired, an iterative technique such as Newton's method can be used; a sketch of one such approach follows.

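As a sketch of such a numerical solution (using a bounded scalar optimizer rather than Newton's method, on simulated data with arbitrary parameters): for a sample y_1, ..., y_m in the failures form, the same calculation as above gives the profile estimate \hat{p} = r/(r + \bar{y}), which can be substituted into the log-likelihood before maximizing over r.

```python
import numpy as np
from scipy.special import gammaln
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
y = rng.negative_binomial(n=5, p=0.3, size=2000)  # failures before the 5th success

def neg_loglik(r):
    p = r / (r + y.mean())  # profile estimate of p for this r
    return -np.sum(gammaln(y + r) - gammaln(y + 1) - gammaln(r)
                   + r * np.log(p) + y * np.log1p(-p))

res = minimize_scalar(neg_loglik, bounds=(0.01, 50), method='bounded')
r_hat = res.x
print(r_hat, r_hat / (r_hat + y.mean()))  # estimates of r and p, near 5 and 0.3
```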

Example: 3 An oil company conducts a geological study that indicates that an exploratory oil well should have a 20% chance of striking oil. What is the probability that the first strike comes on the third well drilled? What is the probability that the third strike comes on the seventh well drilled? What is the mean and variance of the number of wells that must be drilled if the oil company wants to set up three producing wells?

Ans. Using

f(X = x : r, p) = \binom{x-1}{r-1} p^r (1-p)^{x-r}

For the first strike on the third well (r = 1, x = 3):

p = \binom{2}{0} (0.2)^1 (0.8)^2 = 0.128

For the third strike on the seventh well (r = 3, x = 7):

p = \binom{6}{2} (0.2)^3 (0.8)^4 = 0.049

\mu = E(X) = \frac{r}{p} = \frac{3}{0.20} = 15

\sigma^2 = Var(X) = \frac{r(1-p)}{p^2} = \frac{3 \times 0.80}{0.20 \times 0.20} = 60
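
A sketch verifying the three parts of the answer:

```python
from math import comb

p = 0.2

# first strike on the third well (r = 1, x = 3)
print(comb(2, 0) * p**1 * (1 - p)**2)  # 0.128

# third strike on the seventh well (r = 3, x = 7)
print(comb(6, 2) * p**3 * (1 - p)**4)  # approximately 0.049

r = 3
print(r / p, r * (1 - p) / p**2)  # mean 15.0, variance 60.0
```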
Example: 4 A standard, fair die is thrown until 3 aces occur. Let X denote the number of throws. Find each of the following:
a. The probability mass function of X
b. The mean of X
c. The variance of X
d. The probability that at least 20 throws will be needed.

Ans.
a. f(X = x : r, p) = \binom{x-1}{r-1} p^r (1-p)^{x-r} = \binom{x-1}{2} \left(\frac{1}{6}\right)^3 \left(\frac{5}{6}\right)^{x-3}

b. \mu = E(X) = \frac{r}{p} = \frac{3}{1/6} = 18

c. \sigma^2 = Var(X) = \frac{r(1-p)}{p^2} = \frac{3 \times 5/6}{(1/6)^2} = 90

d. P(X \ge 20) = 0.3643
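
A sketch checking these answers with scipy.stats.nbinom (which counts the failures Y = X - r, so P(X >= 20) = P(Y >= 17)):

```python
from scipy.stats import nbinom

r, p = 3, 1 / 6

print(nbinom.sf(16, r, p))        # P(Y >= 17) = P(X >= 20), approximately 0.3643
print(r / p, r * (1 - p) / p**2)  # mean 18.0, variance 90.0
```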
Many real-life variables are found to exhibit negative binomial dispersion. Negative binomial regression analysis is used for modelling over-dispersed count data. The following two examples show settings where the negative binomial distribution arises.

Example A. School administrators study the attendance behavior of high school juniors at two schools. Predictors of the number of days of absence include the type of program in which the student is enrolled and a standardized test in math.

Example B. A health-related researcher is studying the number of hospital visits in the past 12 months by senior citizens in a community, based on the characteristics of the individuals and the types of health plans under which each one is covered.
Normal Approximation of the Negative Binomial Distribution:

When the number of successes r is large, the negative binomial distribution (in the form Y = the number of failures before the r-th success) converges to a normal distribution with

\mu = \frac{r(1-p)}{p}, \qquad \sigma^2 = \frac{r(1-p)}{p^2}

Multinomial Distribution
The multinomial distribution is an extension of the binomial distribution. In the binomial case only two outcomes are possible: for example, a coin shows either head or tail, and a die can be classified as showing either "1" or "not 1" (or "5" or "not 5"). In the multinomial case the number of outcomes k is greater than 2, with outcome i occurring with probability p_i. The axioms of probability require that

\sum_{i=1}^{k} p_i = 1

The probability mass function of the multinomial distribution is

f(x_1, x_2, \dots, x_k : n, p_1, p_2, \dots, p_k) = \frac{n!}{x_1! \, x_2! \, x_3! \cdots x_k!} \, p_1^{x_1} p_2^{x_2} p_3^{x_3} \cdots p_k^{x_k}

where

\sum_{i=1}^{k} p_i = 1 \qquad \text{and} \qquad \sum_{i=1}^{k} x_i = n

Each of the x_i individually follows a binomial distribution. Based on this, we can write the mean and variance of each x_i:

\mu_i = E(x_i) = n p_i

\sigma_i^2 = Var(x_i) = n p_i (1 - p_i)

Cov(x_i, x_j) = -n p_i p_j

Cor(x_i, x_j) = -\sqrt{\frac{p_i p_j}{(1 - p_i)(1 - p_j)}}
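
An illustrative sketch of these quantities using scipy.stats.multinomial (n and p are arbitrary choices):

```python
from scipy.stats import multinomial

n = 10
p = [0.2, 0.3, 0.5]  # arbitrary illustrative probabilities

m = multinomial(n, p)
print(m.pmf([2, 3, 5]))  # P(x1 = 2, x2 = 3, x3 = 5)

cov = m.cov()
print(cov[0, 0], n * p[0] * (1 - p[0]))  # Var(x_i) = n p_i (1 - p_i)
print(cov[0, 1], -n * p[0] * p[1])       # Cov(x_i, x_j) = -n p_i p_j
```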

Example: Suppose that we throw 10 standard, fair dice. Find the probability of each of the following events:
a. Scores 1 and 6 occur once each and the other scores occur twice each.
b. There are 4 even scores and 6 odd scores.

Ans.
a. p = \frac{10!}{1! \, 1! \, 2! \, 2! \, 2! \, 2!} \left(\frac{1}{6}\right)^1 \left(\frac{1}{6}\right)^1 \left(\frac{1}{6}\right)^2 \left(\frac{1}{6}\right)^2 \left(\frac{1}{6}\right)^2 \left(\frac{1}{6}\right)^2 = 0.00375

b. p = \frac{10!}{6! \, 4!} \left(\frac{1}{2}\right)^4 \left(\frac{1}{2}\right)^6 = 0.205

Example: 5 Suppose that two chess players had played numerous games and it was determined that the probability that Player A would win is 0.40, the probability that Player B would win is 0.35, and the probability that the game would end in a draw is 0.25. If these two chess players played 12 games, what is the probability that Player A would win 7 games, Player B would win 2 games, and the remaining 3 games would be drawn?

Ans.

p = \frac{12!}{7! \, 2! \, 3!} (0.40)^7 (0.35)^2 (0.25)^3 = 0.0248

Example: 6 In a certain town, 40% of the eligible voters prefer candidate A, 10% prefer candidate B, and the remaining 50% have no preference. You randomly sample 10 eligible voters. What is the probability that 4 will prefer candidate A, 1 will prefer candidate B, and the remaining 5 will have no preference?

Ans.

p = \frac{10!}{4! \, 1! \, 5!} (0.40)^4 (0.10)^1 (0.50)^5 = 0.1008
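
A sketch verifying the probabilities computed in the dice example and in Examples 5 and 6:

```python
from scipy.stats import multinomial

# dice example (a): scores 1 and 6 once each, the other scores twice each
print(multinomial.pmf([1, 2, 2, 2, 2, 1], n=10, p=[1 / 6] * 6))  # approx. 0.00375

# Example 5: chess games
print(multinomial.pmf([7, 2, 3], n=12, p=[0.40, 0.35, 0.25]))    # approx. 0.0248

# Example 6: voters
print(multinomial.pmf([4, 1, 5], n=10, p=[0.40, 0.10, 0.50]))    # approx. 0.1008
```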

Maximum Likelihood estimation of Multinomial Parameters


In the game of Pokemon (Red and Blue versions, anyway) you must choose between three types: F, G, and W (fire, grass, and water). Amongst the population of Pokemon players, there is a probability p_1 that they choose F, a probability p_2 that they choose G, and a probability p_3 that they choose W. Suppose we take some data and find that, out of 20 people, 10 choose F, 6 choose G, and 4 choose W. Find the maximum likelihood estimates of p_1, p_2 and p_3.
Ans.
L(p_1, p_2 \mid 10, 6, 4) = f(10, 6, 4 \mid p_1, p_2) = \frac{20!}{10! \, 6! \, 4!} p_1^{10} p_2^{6} (1 - p_1 - p_2)^4

To find the parameters we have to maximize the likelihood function. It is always easier to maximize the log of the likelihood function, so we take the log of the above likelihood:

\ell = \log\left( \frac{20!}{10! \, 6! \, 4!} p_1^{10} p_2^{6} (1 - p_1 - p_2)^4 \right)

\ell = \log\left( \frac{20!}{10! \, 6! \, 4!} \right) + 10 \log(p_1) + 6 \log(p_2) + 4 \log(1 - p_1 - p_2)

Next we set the partial derivatives of \ell with respect to p_1 and p_2 to zero:

\frac{\partial \ell}{\partial p_1} = \frac{10}{p_1} - \frac{4}{1 - p_1 - p_2} = 0

\frac{\partial \ell}{\partial p_2} = \frac{6}{p_2} - \frac{4}{1 - p_1 - p_2} = 0

Solving the above equations gives

\hat{p}_1 = \frac{1}{2}, \qquad \hat{p}_2 = \frac{3}{10}

From the above we can obtain

\hat{p}_3 = 1 - \hat{p}_1 - \hat{p}_2 = \frac{1}{5}
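
A sketch confirming numerically that the closed-form estimates \hat{p}_i = x_i / n maximize the log-likelihood, by comparing it against a couple of nearby probability vectors:

```python
import numpy as np

counts = np.array([10, 6, 4])  # F, G, W choices out of 20 players
n = counts.sum()

p_hat = counts / n  # closed-form ML estimates: [0.5, 0.3, 0.2]

def loglik(p):
    # log-likelihood up to the additive constant log(20!/(10!6!4!))
    return np.sum(counts * np.log(p))

print(loglik(p_hat))                        # the largest of the three values
print(loglik(np.array([0.45, 0.35, 0.20])))
print(loglik(np.array([1/3, 1/3, 1/3])))
```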

Summary:
We have discussed the binomial, negative binomial and multinomial distributions in this chapter. We will end the chapter with one real-life example that helps distinguish the binomial and negative binomial distributions. Suppose you have a pack of cards and you draw one card. You put this card back in the pack, shuffle the pack and draw one card again, and repeat the process. If the question is how many draws are required to draw two hearts, the resulting distribution is negative binomial. But if you are asked to find the probability of getting two hearts in five draws, then the distribution is binomial. The multinomial distribution can be understood as an extension of the binomial distribution to more than two outcomes.
