
The Poisson Distribution

Siméon-Denis Poisson (1781-1840) was one hell of a smart cookie. The link below takes you to his biography, courtesy of the mathematics department at the University of St. Andrews in Scotland, my undergraduate Alma Mater: http://www-groups.dcs.st-and.ac.uk/~history/Biographies/Poisson.html

Today, Poisson would be regarded as a physicist, but in his time such academic distinctions did not really exist. He and his fellows were not merely jacks-of-all-trades, masters of none; they were in fact masters of all. The discovery of the distribution that bears his name came late in his career (1837). Poisson became interested in the Binomial distribution and wondered what would happen if the number of trials (n in the Binomial distribution) were allowed to tend to infinity. If you think about this, you can see that by un-constraining the number of trials, we have a distribution tailor-made for improbable or rare events.

Definition: The Poisson distribution is a discrete distribution that applies to events that occur over a period or interval. The interval could be time, distance, volume, or whatever.¹

1. The random variable in question, X, is the number of occurrences over some interval. For example, the number of fatal airline crashes in Peru per year.
2. The occurrences must be random.
3. The occurrences must be independent. In other words, if last year was an unusually bad year for fatal airline crashes in Peru, this portends nothing for this year.
4. The occurrences must be uniformly distributed over the interval being used. (This is a technical requirement to avoid corner solutions; don't worry about it.)

¹ Although the Poisson distribution has nothing to do with the harmonic mean, they are both interested in rates: MPG, PSI, MPH, etc.

The Poisson probabilities are generated in the following manner.


P(X) = \frac{\lambda^X e^{-\lambda}}{X!}

Where:

X = the number of occurrences of an event
λ = the mean of the distribution
e = the exponential constant (the base of natural logarithms); e ≈ 2.71828 (it's on your calculators, the reverse of Ln)
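This formula translates directly into a few lines of code, if you'd rather check your calculator's work by machine. Here is a minimal sketch of my own, in Python (the function name poisson_pmf is just an illustrative choice, not something from this handout):

import math

def poisson_pmf(x, lam):
    # P(X = x) = (lam^x * e^(-lam)) / x!
    return (lam ** x) * math.exp(-lam) / math.factorial(x)

# Quick sanity check with lam = 2 (the mean used in the comparison below):
print(poisson_pmf(0, 2))  # 0.1353...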

_______________________________________________________________________
You don't need to know this

Although this formula bears little resemblance to the Binomial formula, you'll just have to trust me. The proof is only three lines, but it involves limits: as some terms are allowed to tend to infinity, while all others tend to one (1), then:
\lim_{n \to \infty} \left(1 - \frac{\lambda}{n}\right)^n = e^{-\lambda}

Therefore,
P(X) = \frac{\lambda^X e^{-\lambda}}{X!}

I told you, you wouldn't want to know!
______________________________________________________________________
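If you'd like to see the limit at work without wading through the proof, you can watch {1 - (λ/n)}^n creep toward e^(-λ) numerically. A throwaway sketch (mine, not the handout's), with λ = 2:

import math

lam = 2
for n in (10, 100, 1_000, 100_000):
    # (1 - lam/n)^n approaches e^(-lam) as n grows
    print(n, (1 - lam / n) ** n)
print("e^(-lam) =", math.exp(-lam))  # 0.1353...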

The following example demonstrates that the Binomial and Poisson distributions are closely related.

Example:

Suppose that some random variable X is distributed in Binomial fashion, with p = 0.1 (so (1-p) = 0.9) and n = 20. Let's compare the Binomial and Poisson probabilities for X ≤ 3.

Binomial (n = 20, p = 0.1):

P(X=0) = \frac{20!}{0!(20-0)!}(0.1)^0(0.9)^{20} = 0.1216
P(X=1) = \frac{20!}{1!(20-1)!}(0.1)^1(0.9)^{19} = 0.2702
P(X=2) = \frac{20!}{2!(20-2)!}(0.1)^2(0.9)^{18} = 0.2852
P(X=3) = \frac{20!}{3!(20-3)!}(0.1)^3(0.9)^{17} = 0.1901
                                         ________
                                  Total:   0.8671

Poisson (λ = 2):

P(X=0) = \frac{2^0 e^{-2}}{0!} = 0.1353
P(X=1) = \frac{2^1 e^{-2}}{1!} = 0.2707
P(X=2) = \frac{2^2 e^{-2}}{2!} = 0.2707
P(X=3) = \frac{2^3 e^{-2}}{3!} = 0.1804
                                ________
                         Total:   0.8571
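The whole table can be reproduced by machine. A sketch of my own (it leans on Python's math.comb for the binomial coefficient):

import math

n, p = 20, 0.1
lam = n * p  # = 2, the matching Poisson mean

for x in range(4):
    binom = math.comb(n, x) * p**x * (1 - p)**(n - x)
    pois = lam**x * math.exp(-lam) / math.factorial(x)
    print(f"P(X={x}):  Binomial {binom:.4f}   Poisson {pois:.4f}")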

So, with n = 20 in the Binomial distribution (that's quite large for the number of trials in a Binomial distribution), the probabilities are remarkably close. A couple of things to note here:

a) λ is the mean of the Poisson distribution, in this case two (λ = 2). My choice was not arbitrary. We know that the mean of a Binomial distribution is given by n·p. In our case n = 20 and p = 0.1, so the mean is (20 × 0.1) = 2 in the Binomial distribution, which is equal to λ in the Poisson distribution.

b) In the Poisson distribution, P(X=λ) will always equal P(X=λ-1). To see why, observe:

P(X=\lambda-1) = \frac{\lambda^{\lambda-1} e^{-\lambda}}{(\lambda-1)!}

Post-multiply P(X=λ-1) by λ/λ (which is just 1):

P(X=\lambda-1) = \frac{\lambda^{\lambda} e^{-\lambda}}{\lambda!} = P(X=\lambda)
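You can also verify the identity numerically for any whole-number λ. A quick sketch (my own; λ = 4 is an arbitrary choice):

import math

lam = 4  # must be a whole number for X = lam - 1 to be a valid count
p_at_lam = lam**lam * math.exp(-lam) / math.factorial(lam)
p_at_lam_minus_1 = lam**(lam - 1) * math.exp(-lam) / math.factorial(lam - 1)
print(p_at_lam, p_at_lam_minus_1)  # both 0.1954...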

Another Example:

A certain tree has seedlings randomly dispersed in a large area. The mean density of seedlings is five (5) per square yard. If a forester randomly locates a 1-square-yard region in the forest, find the following probabilities:

(i) No seedlings in the selected region: P(X=0)
(ii) At least one seedling in the selected region: P(X≥1)
(iii) More than two seedlings: P(X>2)
(iv) Exactly five (5) seedlings: P(X=5)

Tip of the Day: Notice that the term e^{-λ} is common to all the probabilities in the Poisson distribution, and it is also the solution to P(X=0). So write it down or put it in your calculator's memory.

Another Tip of the Day: Students often ask how they can tell a Poisson distribution question from a Binomial distribution question. Two ways. Firstly, in a Binomial question the number of trials is fixed (n = something). Secondly, Poisson questions have: a) an average that is either given or implied, and b) code words that imply that rates are involved, words like per yard (above), per hour, per gallon, per inch, etc.

Solution: We are given a mean of five (5), therefore λ = 5.

(i) P(X=0) = \frac{5^0 e^{-5}}{0!} = e^{-5} = 0.0067

(ii) P(X≥1). Previously I have told you that the Poisson distribution is the Binomial distribution with n set to infinity (∞). So surely P(X≥1) cannot be computed, because it is equal to P(X=1) + P(X=2) + P(X=3) + ... + P(X=12,657) + ... + P(X=564,739), and on, and on, and on, until the Chicago Cubs win the World Series.

Whilst this is true, two points need to be reinforced. Firstly, it does not negate the fact that all the probabilities must still sum to one (1). Secondly, the probabilities become very small, very quickly, and they tend to the insignificant. For example, in the case where (as above) the mean equals five (5):

P(X=20) = \frac{5^{20} e^{-5}}{20!} = 0.000000264

It's rather like that stupid story of a frog sitting on a lily leaf in the middle of a pond. The shore is twenty feet away, but the frog can only leap half the distance of its previous jump, so the frog never gets ashore.

First leap = 10
Second leap = 5
Third leap = 2.5
Fourth leap = 1.25
Fifth leap = 0.625
Sixth leap = 0.3125
Seventh leap = 0.1563
And so on.

So back to (ii):

P(X≥1) = 1 - P(X=0) = 1 - 0.0067 = 0.9933

(iii) P(X>2) = 1 - [P(X=0) + P(X=1) + P(X=2)]. Careful: "more than two" means X = 3, 4, 5, ..., so P(X=2) must be subtracted as well. We know P(X=0) = 0.0067, so we need P(X=1) and P(X=2):

P(X=1) = \frac{5^1 e^{-5}}{1!} = 0.0337
P(X=2) = \frac{5^2 e^{-5}}{2!} = 0.0842

P(X>2) = 1 - 0.0067 - 0.0337 - 0.0842 = 0.8754

(iv) P(X=5) = \frac{5^5 e^{-5}}{5!} = 0.1755
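All four answers can be checked with a few lines of code. A sketch of my own:

import math

lam = 5  # mean seedlings per square yard

def pmf(x):
    return lam**x * math.exp(-lam) / math.factorial(x)

print(pmf(0))                          # (i)   0.0067
print(1 - pmf(0))                      # (ii)  0.9933
print(1 - (pmf(0) + pmf(1) + pmf(2)))  # (iii) 0.8754
print(pmf(5))                          # (iv)  0.1755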

The Importance of the Assumption of Independence

When you get to QMX 212, Ann Lehman will show you how to formally test for independence. However, simple logic will often guide you. (The following two sets of data are real, not made up to make a point.)

Example: In the late 1980s and 1990 there was an average of 116 homicide deaths per year in Richmond, Virginia. Find the probabilities for the number of homicides on a randomly selected date. Furthermore, given your probabilities, predict the actual numbers for 1991 (in other words, multiply the probabilities by 365 days).

First we need λ: λ = 116/365 = 0.3178

Now let's run the distribution until the probabilities become insignificant. For each X we compute P(X), the predicted number of days (P(X) × 365), and the actual 1991 count.

P(X=0) = \frac{(0.3178)^0 e^{-0.3178}}{0!} = 0.7277     Predicted: 265.61     Actual: 268

So, there were 268 days when there were no homicides in Richmond, VA, and we predicted 265.61. Not bad!

P(X=1) = \frac{(0.3178)^1 e^{-0.3178}}{1!} = 0.2313     Predicted: 84.42     Actual: 79

So, there were 79 days when there was one (1) homicide in Richmond, VA, and we predicted 84.42. Still pretty good!

P(X=2) = \frac{(0.3178)^2 e^{-0.3178}}{2!} = 0.0368     Predicted: 13.41     Actual: 17

We predicted 13.41; the actual was 17.

P(X=3) = \frac{(0.3178)^3 e^{-0.3178}}{3!} = 0.0039     Predicted: 1.42     Actual: 1

Hmm, if we round our prediction down, we nail it! Let's try one more.

P(X=4) = \frac{(0.3178)^4 e^{-0.3178}}{4!} = 0.0003     Predicted: 0.1129     Actual: 0

So let's think about our results. We started off pretty well: for zero- and one-homicide days we were very good, but thereafter our accuracy began to wane. Observe the percentage difference between the actual number of homicides and our prediction.

Homicides         Actual   Predicted   Percentage Difference
No homicides      268      265.61      (268 - 265.61)/265.61 = 0.9%
One homicide      79       84.42       (84.42 - 79)/79 = 6.86%
Two homicides     17       13.41       (17 - 13.41)/13.41 = 26.8%
Three homicides   1        1.42        (1.42 - 1)/1 = 42.1%
Four homicides    0        0.1129      Too small to bother with
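The whole prediction exercise is a short loop. A sketch of my own, with the 1991 actual counts typed in from the text:

import math

lam = 116 / 365  # about 0.3178 homicides per day
actual = {0: 268, 1: 79, 2: 17, 3: 1, 4: 0}  # 1991 counts from the text

for x in range(5):
    p = lam**x * math.exp(-lam) / math.factorial(x)
    print(f"X={x}:  P={p:.4f}  predicted {p * 365:6.2f}  actual {actual[x]}")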

The problem is that we have almost certainly violated the assumption that murders are independent of each other. Obviously, with no-murder days, the question of dependence is moot. The same thing can be said of one-murder days. However, when we get to two- and three-murder days, it is possible, perhaps even probable, that those murders were in some way connected. Maybe someone killed two people during a bank heist. Maybe three people were killed in a drive-by shooting. You will learn how to measure dependence in QMX 212, but the above example should alert you to the fact that common sense often serves us well.

The following example is also generated from real data. The data come from the U-Boot (U-Boat) Archiv (Archive) in Cuxhaven-Altenbruch, Germany. Not surprisingly, it concerns U-Boat losses in May 1943. To give the example a little bit of historical perspective: the Battle of the Atlantic, which went on for most of the Second World War, was a very close-run thing. Britain hasn't been self-sufficient in anything for centuries, so when war broke out, it was essential to get food and munitions shipped to Britain. The Germans, with considerable success, sent their U-Boats out into the Atlantic to sink any vessel that might be headed to or from the UK. By March 1943, losses to U-Boats were exceeding the rate at which new ships could be built. Fortunately, British code-breakers working out of Bletchley Park (a manor house

near London), broke the Enigma Naval code and were able to read German communications in more or less real-time. This advantage resulted in forty (40) U-Boats being sunk in May 1943, which constituted over 25% of all operational U-Boats. This was such an unsustainable loss that Grand Admiral Karl Doenitz, head of the Kriegsmarine, and later to succeed Hitler for the last eight days of the war (after Hitler's suicide), withdrew from the Atlantic all but a token force of submarines.

So let's see how the Poisson distribution fares. May has thirty-one (31) days, and forty (40) U-Boats were lost in that month, so λ = 40/31 = 1.2903. Let X equal the number of U-Boats sunk on a given day in May 1943. For each X we compute P(X), the predicted number of days with X sinkings (P(X) × 31 days), and the actual number of such days.

P(X=0) = \frac{1.2903^0 e^{-1.2903}}{0!} = 0.2752     Predicted: 8.53     Actual: 10

P(X=1) = \frac{1.2903^1 e^{-1.2903}}{1!} = 0.3551     Predicted: 11.02     Actual: 11

P(X=2) = \frac{1.2903^2 e^{-1.2903}}{2!} = 0.2291     Predicted: 7.10     Actual: 3

P(X=3) = \frac{1.2903^3 e^{-1.2903}}{3!} = 0.0985     Predicted: 3.05     Actual: 5

P(X=4) = \frac{1.2903^4 e^{-1.2903}}{4!} = 0.0318     Predicted: 0.9852

So, let's see how well we did.

Sunk per Day   Actual   Predicted   Percentage Difference
0              10       8.53        (10 - 8.53)/8.53 = 17.23%
1              11       11.02       (11.02 - 11)/11 = 0.18%
2              3        7.10        (7.10 - 3)/3 = 136.67%
3              5        3.05        (5 - 3.05)/3.05 = 63.93%
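The same loop as before, with λ = 40/31 and 31 days, reproduces the U-Boat table. A sketch (mine):

import math

lam = 40 / 31  # about 1.2903 U-Boats sunk per day
actual = {0: 10, 1: 11, 2: 3, 3: 5}  # days in May 1943, from the text

for x in range(4):
    p = lam**x * math.exp(-lam) / math.factorial(x)
    print(f"X={x}:  P={p:.4f}  predicted {p * 31:5.2f}  actual {actual[x]}")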

Now, with the exception of the eleven days in May when one (1) U-Boat was sunk, these results are pretty dismal. Why? The reason is, once again, a lack of independence. In order to thwart U-Boat attacks, the Allies instituted a convoy system: merchant vessels were herded together and protected by destroyers. In a similar vein, U-Boats did not operate independently of each other; they formed themselves into what have become known as Wolf Packs, many U-Boats attacking a convoy simultaneously. Thus, if a Wolf Pack located and attacked a heavily defended convoy on a particular day, their losses would not be independent of each other. Similarly, since U-Boats did not operate independently, when a Wolf Pack failed to locate a convoy, losses would be zero.

The Mean and Variance of the Poisson Distribution

We already know that the mean of a Poisson distribution is λ. It so happens that the variance of the Poisson distribution is also λ. The derivations of the mean and variance are, like those of the Binomial distribution, beyond the scope of this course. However, if anyone is interested, I can send them to you. They're not particularly difficult to follow, yet ingenious in their own way.
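If you would rather convince yourself empirically than read the derivations, a simulation does the trick. This sketch (my own; it assumes numpy is available) draws a million Poisson observations and compares the sample mean and variance to λ:

import numpy as np

lam = 5
rng = np.random.default_rng(seed=42)
sample = rng.poisson(lam, size=1_000_000)
# For a Poisson distribution, both should come out close to lam
print("mean:    ", sample.mean())
print("variance:", sample.var())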
