
Simulation: Useful Distributions
Outline
Discrete Random Distributions
Poisson Process
Exponential Distribution
Gaussian (normal) distribution
Estimation Process and Central Limit
Theorem
Pareto Distribution
Bernoulli Distribution
An indicator random variable for a success/fail experiment:
success with probability p, failure with probability q = 1 - p.
A sequence of coin flips gives IID Bernoulli RVs.
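As a quick illustration (not part of the original slides; the parameter value is arbitrary), a minimal Python sketch that draws IID Bernoulli samples and checks that the empirical mean and variance approach p and pq:

```python
import random

def bernoulli(p: float) -> int:
    """Return 1 (success) with probability p, else 0 (failure)."""
    return 1 if random.random() < p else 0

p = 0.3                                   # arbitrary success probability
samples = [bernoulli(p) for _ in range(100_000)]

# The empirical mean of IID Bernoulli(p) samples should be close to p,
# and the empirical variance close to p * q = p * (1 - p).
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(f"empirical mean {mean:.4f}  (expected {p})")
print(f"empirical var  {var:.4f}  (expected {p * (1 - p):.4f})")
```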
Geometric Distribution
Repeated trials: X = the number of trials until some event (success) occurs,
P(X = k) = q^(k-1) p,   k = 1, 2, ...
(Remember that for a geometric progression sum_{k=m}^{infinity} a r^k = a r^m / (1 - r); here a = 1, m = 1, so sum_{k=1}^{infinity} r^k = r / (1 - r).)
Differentiating this formula with respect to r allows us to derive the formula for E(X) = 1/p.
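A small illustrative sketch (arbitrary p, not from the slides) that simulates the geometric distribution by counting trials until the first success and compares the empirical mean with E(X) = 1/p:

```python
import random

def geometric(p: float) -> int:
    """Number of Bernoulli(p) trials up to and including the first success."""
    trials = 1
    while random.random() >= p:    # failure with probability q = 1 - p
        trials += 1
    return trials

p = 0.25                           # arbitrary success probability
samples = [geometric(p) for _ in range(100_000)]
print(f"empirical mean {sum(samples) / len(samples):.3f}  vs  E(X) = 1/p = {1 / p}")
```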
Geometric Distribution: Memoryless property
The coin does not remember that it already came up tails l times:
P(X > l + k | X > l) = P(X > k)
Proof: P(X > l + k | X > l) = P(X > l + k) / P(X > l) = q^(l+k) / q^l = q^k = P(X > k)
Binomial Distribution
Repeated Trials:
Repeat the same Bernoulli experiment n times.
(Experiments are independent of each other)
The number of times the event happens among the n trials has a Binomial distribution
(e.g., the number of heads in n coin tosses, or the number of arrivals in n time slots).
The Binomial is the sum of n IID Bernoulli RVs.
Binomial Distribution
Repeated Trials:
Repeat the same Bernoulli experiment n times.
(Experiments are independent of each other)
The number of times the Bernoulli experiment succeeds among the n trials has a Binomial distribution.
Consider tossing a coin twice. The possible outcomes are:
no heads: P(k = 0) = q^2
one head: P(k = 1) = qp + pq = 2pq (toss 1 is a tail and toss 2 is a head, or toss 1 is a head and toss 2 is a tail; we don't care which of the tosses is a head, so there are two outcomes that give one head)
two heads: P(k = 2) = p^2
Binomial Distribution
We want the probability distribution function P(k, n, p), where:
k = number of successes (e.g., number of heads in a coin toss)
n = number of trials (e.g., number of coin tosses)
p = probability of a success (e.g., 0.5 for a head)
If we look at the three choices for the coin-flip example, each term is of the form C(n, k) p^k q^(n-k).
The binomial coefficient takes into account the number of ways an outcome can occur without regard to order:
C(n, k) = (n choose k) = n! / (k! (n - k)!)
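The formula can be evaluated directly. The sketch below (an illustration with arbitrary n and p) computes P(k, n, p) from the binomial coefficient and checks it against a brute-force simulation of repeated coin tosses:

```python
import math
import random
from collections import Counter

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(k, n, p) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p, trials = 10, 0.5, 200_000
# Simulate: each experiment is n coin tosses; count the heads.
counts = Counter(sum(random.random() < p for _ in range(n)) for _ in range(trials))

for k in range(n + 1):
    print(f"k={k:2d}  exact {binomial_pmf(k, n, p):.4f}  simulated {counts[k] / trials:.4f}")
```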
Mean and Variance of Binomial
The PMF is normalized:
  sum_{k=0}^{n} P(k, n, p) = 1

Mean:
  u = E(K) = sum_{k=0}^{n} k P(k, n, p) = sum_{k=0}^{n} k C(n, k) p^k q^(n-k) = np

Variance:
  sigma^2 = sum_{k=0}^{n} (k - u)^2 P(k, n, p) = npq
Binomial - Example
[Figure: Binomial PMFs with p = 0.2 for n = 4, 10, 20, and 40.]
Binomial Example (ball-bin)
There are B bins, and n balls are randomly dropped into the bins.
p_i: probability that a ball goes to bin i
K_i: number of balls in bin i after n drops
K_i has a Binomial(n, p_i) distribution.
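A short sketch of the ball-bin example (the bin count, ball count, and uniform assignment are arbitrary choices for illustration): it drops n balls uniformly into B bins, so p_i = 1/B, and compares the count in one bin with the Binomial(n, p_i) mean and variance:

```python
import random

B, n, runs = 10, 100, 20_000          # bins, balls per run, repetitions
p_i = 1 / B                           # each ball lands in bin i with probability 1/B

# For each run, count how many of the n balls fall into bin 0.
counts = []
for _ in range(runs):
    counts.append(sum(1 for _ in range(n) if random.randrange(B) == 0))

mean = sum(counts) / runs
var = sum((k - mean) ** 2 for k in counts) / runs
print(f"K_0 mean {mean:.2f}  vs  n*p_i         = {n * p_i:.2f}")
print(f"K_0 var  {var:.2f}  vs  n*p_i*(1-p_i) = {n * p_i * (1 - p_i):.2f}")
```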
Outline
Discrete Random Distributions
Poisson Process
Exponential Distribution
Gaussian (normal) distribution
Estimation Process and Central Limit
Theorem
Pareto Distribution
Poisson Process
Observing discrete events in a continuous interval
of time or space
Number of army soldiers killed by horse kicks (a classic early application; the distribution itself was introduced by Siméon Denis Poisson in 1837)
Number of bad checks per day received by a certain bank
Number of radioactive particles per millisecond passing through a Geiger counter
Number of customers arriving at a bank
Number of packets arriving at the buffer of a network router
Criteria for a Poisson distribution:
1. The probability of observing a single event over a small interval (of time or space) is approximately proportional to the size of that interval (e.g., phone calls).
2. The probability of two or more events occurring in a small enough interval is negligible.
3. The probability of an event within a given interval does not change across intervals (homogeneous).
4. The probability of an event in one interval is independent of the probability of an event in any other non-overlapping interval.
Nature of the Poisson distribution
In several of the examples above the events are outcomes of discrete trials, i.e., a binomial distribution with parameters n and p = lambda/n.
As the number of trials grows (n to infinity), the binomial distribution approaches the Poisson distribution.
Law of rare events: each of the individual Bernoulli events rarely triggers.
The total count of successes in a Poisson process need not be rare if the parameter lambda is not small.
Example: the number of telephone calls to a busy switchboard in one hour follows a Poisson distribution.
The events appear frequent to the operator.
The events are rare from the point of view of the average member of the population, who is very unlikely to call that switchboard in that hour.
Poisson as limit of Binomial
Poisson(lambda) is the limit of Binomial(n, p) as n goes to infinity with np = lambda held fixed.
Poisson Distribution - Derivation
Start from the binomial PMF and substitute p = lambda/n:

  p(k) = C(n, k) p^k (1 - p)^(n-k) = [n! / (k! (n-k)!)] (lambda/n)^k (1 - lambda/n)^(n-k)

Taking the limit as n goes to infinity:

  lim p(k) = lim [n (n-1) ... (n-k+1) / n^k] (lambda^k / k!) (1 - lambda/n)^n (1 - lambda/n)^(-k)

Note: for every fixed k, n (n-1) ... (n-k+1) / n^k goes to 1 and (1 - lambda/n)^(-k) goes to 1, while (1 - lambda/n)^n goes to e^(-lambda) (from calculus, lim (1 + a/n)^n = e^a). Therefore

  lim p(k) = lambda^k e^(-lambda) / k!,   k = 0, 1, 2, ...

Using the series expansion of the exponential function, e^x = sum_{i=0}^{infinity} x^i / i!, we get

  sum_{k=0}^{infinity} p(k) = e^(-lambda) sum_{k=0}^{infinity} lambda^k / k! = e^(-lambda) e^(lambda) = 1,

so this is a "legitimate" probability distribution.
Poisson Distribution - Expectations
PMF: f(k) = lambda^k e^(-lambda) / k!,   k = 0, 1, 2, ...

Mean:
  E(K) = sum_{k=0}^{infinity} k lambda^k e^(-lambda) / k! = lambda e^(-lambda) sum_{k=1}^{infinity} lambda^(k-1) / (k-1)! = lambda e^(-lambda) e^(lambda) = lambda

Second factorial moment:
  E(K(K-1)) = sum_{k=0}^{infinity} k (k-1) lambda^k e^(-lambda) / k! = lambda^2 e^(-lambda) sum_{k=2}^{infinity} lambda^(k-2) / (k-2)! = lambda^2 e^(-lambda) e^(lambda) = lambda^2

Therefore
  E(K^2) = E(K(K-1)) + E(K) = lambda^2 + lambda
  Var(K) = E(K^2) - [E(K)]^2 = lambda^2 + lambda - lambda^2 = lambda
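As a quick numerical cross-check (illustrative only; lambda and the truncation point are arbitrary), the truncated sums below reproduce E(K) = lambda and Var(K) = lambda directly from the PMF:

```python
import math

lam = 4.0       # arbitrary rate parameter
kmax = 100      # truncation point; the tail mass beyond it is negligible

pmf = [math.exp(-lam) * lam**k / math.factorial(k) for k in range(kmax)]
mean = sum(k * pk for k, pk in enumerate(pmf))
second = sum(k * k * pk for k, pk in enumerate(pmf))

print(f"sum of PMF {sum(pmf):.6f}   (should be ~1)")
print(f"E(K)       {mean:.6f}   (should be lambda = {lam})")
print(f"Var(K)     {second - mean**2:.6f}   (should be lambda = {lam})")
```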
Poisson Distribution PMF
For an interval of length t, the probability of n arrivals is

  P_n(t) = (lambda t)^n e^(-lambda t) / n!
Poisson Process: Number of Arrivals
Let N(t) denote the number of arrivals by time t.
The rate lambda of a Poisson process is the average number of events per unit time (over a long time).
The Poisson Arrival Model
A Poisson process is a sequence of arrival events "randomly spaced in time."
The rate lambda of a Poisson process is the average number of events per unit time (over a long time).
Poisson Distribution Example
Fifty dots distributed randomly over 50 intervals, so lambda = 1 dot per interval on average.
Poisson Graphs
[Figure: Poisson PMF P(n) versus n for lambda = 0.5, 1, 4, and 10.]
Poisson and Binomial
[Figure: Poisson(4) PMF compared with Binomial(n, p) PMFs having np = 4: n = 5, p = 0.8; n = 10, p = 0.4; n = 20, p = 0.2; n = 50, p = 0.08.]
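The convergence suggested by the figure can be checked numerically. The sketch below (illustrative; the value lambda = 4 and the list of n are arbitrary) compares the Binomial(n, lambda/n) PMF with the Poisson(lambda) PMF:

```python
import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 4.0
for n in (5, 10, 20, 50, 1000):
    p = lam / n
    # Largest absolute gap between the two PMFs over k = 0..10.
    gap = max(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(11))
    print(f"n={n:5d}, p={p:.3f}: max |Binomial - Poisson| = {gap:.4f}")
```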
Interarrival Times of a Poisson Process (1/2)
Pick an arbitrary starting point in time (call it 0).
Let tau be the time until the next arrival. Our goal is the CDF of the random variable tau:
  F_tau(t) = P(tau <= t) = 1 - P(tau > t)
But P(tau > t) is just the probability that no arrivals occur in (0, t):
  F_tau(t) = 1 - P(tau > t) = 1 - P_0(t) = 1 - (lambda t)^0 e^(-lambda t) / 0! = 1 - e^(-lambda t)
Interarrival Times of a Poisson Process (2/2)
So the CDF and PDF of the interarrival time are
  F_tau(t) = 1 - e^(-lambda t)
  p_tau(t) = lambda e^(-lambda t)
Conclusion: the interarrival time tau has an exponential distribution with parameter lambda.
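To see this connection in code (a sketch with an arbitrary rate and window, not from the slides), the fragment below builds a Poisson process from exponential interarrival gaps and checks that the number of arrivals in (0, t) has mean and variance close to lambda*t:

```python
import random

lam, t, runs = 2.0, 10.0, 20_000     # rate, observation window, repetitions

def arrivals_in(t, lam):
    """Count arrivals in (0, t) by accumulating exponential interarrival gaps."""
    clock, count = 0.0, 0
    while True:
        clock += random.expovariate(lam)    # interarrival ~ Exponential(lam)
        if clock > t:
            return count
        count += 1

counts = [arrivals_in(t, lam) for _ in range(runs)]
mean = sum(counts) / runs
var = sum((c - mean) ** 2 for c in counts) / runs
print(f"mean arrivals {mean:.2f}  vs  lambda*t = {lam * t}")
print(f"var  arrivals {var:.2f}  vs  lambda*t = {lam * t}  (Poisson: mean = variance)")
```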
Outline
Discrete Random Distributions
Poisson Process
Exponential Distribution
Gaussian (normal) distribution
Estimation Process and Central Limit
Theorem
Pareto Distribution
Exponential Distribution
The interarrival times of a Poisson process have an exponential distribution with parameter lambda.
Also used to describe the service times of queues, with service rate mu: density p(t) = mu e^(-mu t), t >= 0.
Exponential Distribution: Memoryless Property
The exponential distribution is memoryless
What occurs after time t is independent of what occurred
prior to time t
For service times: the additional time needed to complete a customer's service in progress is independent of when the service started.
Memoryless Property Proof
Past history has no influence on the future:
  P{X > x + t | X > t} = P{X > x}
Proof:
  P{X > x + t | X > t} = P{X > x + t, X > t} / P{X > t} = P{X > x + t} / P{X > t}
                       = e^(-mu(x+t)) / e^(-mu t) = e^(-mu x) = P{X > x}
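A small empirical check of this property (illustrative only; the rate mu and the offsets x and t are arbitrary): the conditional tail probability should match the unconditional one:

```python
import random

mu, x, t, runs = 1.0, 0.7, 1.5, 500_000      # rate, offsets, sample size
samples = [random.expovariate(mu) for _ in range(runs)]

survivors = [s for s in samples if s > t]                     # condition on X > t
p_cond = sum(s > x + t for s in survivors) / len(survivors)   # P(X > x+t | X > t)
p_uncond = sum(s > x for s in samples) / runs                 # P(X > x)
print(f"P(X > x+t | X > t) = {p_cond:.4f}")
print(f"P(X > x)           = {p_uncond:.4f}   (memorylessness: the two should agree)")
```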
Memoryless Property
Reliability: Amount of time a component has been in
service has no effect on the amount of time until it fails
Inter-event times: Amount of time since the last event
contains no information about the amount of time until the
next event
Service times: Amount of remaining service time is
independent of the amount of service time elapsed so far
P{X > x + t | X > t} = P{X > x}
Poisson Process in Queuing Models
The Poisson and exponential distributions
are often used in queuing models
The arrival process is Poisson
The number of arrivals, e.g., packet
transmission attempts, during a given interval
of time follows a Poisson distribution
The service process is exponential
The time to service a customer, e.g., transmit
a packet, is exponentially distributed
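These two ingredients are enough for a toy single-server queue in the M/M/1 style. The sketch below is illustrative only (the rates are assumptions, with lambda < mu) and estimates the average time a customer spends in the system:

```python
import random

lam, mu, n_customers = 0.8, 1.0, 200_000   # arrival rate, service rate (lam < mu)

clock = 0.0            # arrival time of the current customer
server_free_at = 0.0   # time when the server finishes its current work
total_time = 0.0       # accumulated time-in-system over all customers

for _ in range(n_customers):
    clock += random.expovariate(lam)          # Poisson arrivals: exponential gaps
    start = max(clock, server_free_at)        # wait if the server is still busy
    server_free_at = start + random.expovariate(mu)   # exponential service time
    total_time += server_free_at - clock      # waiting time + service time

# For an M/M/1 queue the mean time in the system is 1 / (mu - lam).
print(f"simulated mean time in system {total_time / n_customers:.2f}")
print(f"theoretical 1/(mu - lam)      {1 / (mu - lam):.2f}")
```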
Properties of the Poisson Process (1)
1) Interarrival times are independent and exponentially distributed with parameter lambda (the rate parameter).
If t_n is the time of the n-th arrival, the intervals tau_n = t_(n+1) - t_n have the cumulative distribution function (CDF) P(tau_n <= s) = 1 - e^(-lambda s) and are mutually independent.
Properties of the Poisson Process (2)
2) If k independent Poisson processes A_1, ..., A_k with rates lambda_1, ..., lambda_k are merged into a single process A = A_1 + ... + A_k, the combined process A is Poisson with rate lambda = lambda_1 + ... + lambda_k.
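A sketch of this merging property (rates and horizon chosen arbitrarily for illustration): two independent exponential-interarrival streams are combined, and the merged stream should have rate lambda_1 + lambda_2:

```python
import random

lam1, lam2, t_end = 1.0, 3.0, 10_000.0     # rates of the two processes, time horizon

def arrival_times(lam, t_end):
    """Arrival times of a Poisson process of rate lam on (0, t_end)."""
    times, clock = [], 0.0
    while True:
        clock += random.expovariate(lam)
        if clock > t_end:
            return times
        times.append(clock)

merged = sorted(arrival_times(lam1, t_end) + arrival_times(lam2, t_end))
gaps = [b - a for a, b in zip(merged, merged[1:])]
print(f"merged mean interarrival {sum(gaps) / len(gaps):.4f}  "
      f"vs 1/(lam1 + lam2) = {1 / (lam1 + lam2):.4f}")
```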
Properties of the Poisson Process (3)
3) If a Poisson process (with rate lambda) is split into k other processes by independently assigning each arrival to the i-th process with probability p_i, where p_1 + ... + p_k = 1, the resulting k processes are Poisson with rates p_i lambda.
Outline
Discrete Random Distributions
Poisson Process
Exponential Distribution
Gaussian (normal) distribution
Estimation Process and Central Limit
Theorem
Pareto Distribution
Gaussian distribution
Perhaps the most used distribution in all of science. Sometimes it is called the bell-shaped curve or normal distribution.
Continuous distribution with pdf
  p(x) = [1 / (sqrt(2 pi) sigma)] e^( -(x - u)^2 / (2 sigma^2) )
u = mean of the distribution (also the location of the mode and median)
sigma^2 = variance of the distribution
x is a continuous variable (-infinity < x < infinity)
Gaussian distribution
The probability P of y being in the range [a, b] is given by an integral over the Gaussian pdf:
  P(a < y < b) = integral from a to b of [1 / (sqrt(2 pi) sigma)] e^( -(y - u)^2 / (2 sigma^2) ) dy
and the integral over the whole real line equals 1:
  P(-infinity < y < infinity) = integral from -infinity to infinity of [1 / (sqrt(2 pi) sigma)] e^( -(y - u)^2 / (2 sigma^2) ) dy = 1
Gaussian distribution
We often talk about a measurement being a certain number of standard deviations (sigma) away from the mean (u) of the Gaussian.
The probability for a measurement to fall within u +/- n sigma is the area inside that region; the complementary (tail) probabilities are:
n      Prob. of exceeding +/- n sigma
0.67   0.5
1      0.32
2      0.05
3      0.003
4      0.00006
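The tail values in the table can be reproduced with the complementary error function, since P(|X - u| > n sigma) = erfc(n / sqrt(2)); a short standard-library sketch:

```python
import math

# For a Gaussian, P(|X - u| > n*sigma) = erfc(n / sqrt(2)).
for n in (0.67, 1, 2, 3, 4):
    tail = math.erfc(n / math.sqrt(2))
    print(f"n = {n:4}:  P(|X - u| > n*sigma) = {tail:.5f}")
```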
Gaussian distribution
The 3-sigma rule: 99.7% of the trial values fall within the range (u - 3 sigma) to (u + 3 sigma).
Gaussian distribution - Main Usage: Statistics
Sampling: inferring properties (e.g., means, variances) of the population (also known as parameters) from the sample properties (a.k.a. statistics).
These inferences depend upon theorems like the Central Limit Theorem and properties of the normal distribution.
The distributions of the sample mean (or sample stdev, etc.) are called sampling distributions.
Powerful idea: the sampling distribution of the sample mean is a normal distribution (under mild restrictions): the Central Limit Theorem (CLT)!
Outline
Discrete Random Distributions
Poisson Process
Exponential Distribution
Gaussian (normal) distribution
Estimation Process and Central Limit
Theorem
Pareto Distribution
Estimation Process
[Diagram: a population whose mean u is unknown; a random sample is drawn from it, with sample mean X = 50; from the sample we infer: "I am 95% confident that u is between 40 and 60."]
Key Terms
1. Population (Universe)
All Items of Interest
2. Sample
Portion of Population
3. Parameter
Summary Measure about Population
4. Statistic
Summary Measure about Sample
Strong Law of Large Numbers
X_i are IID (independent, identically distributed) random variables with E[X_i] = u.
With probability 1,
  (X_1 + X_2 + ... + X_n) / n converges to u as n goes to infinity,
i.e., the sample average converges to the true mean.
This is the foundation of using a large number of simulation runs to obtain average results.
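An illustrative sketch of the law in action (the exponential distribution with mean 2 is an arbitrary choice): the running average settles down to the true mean as n grows:

```python
import random

true_mean = 2.0        # u: mean of the sampled distribution (Exponential with mean 2)
total = 0.0
for n in range(1, 1_000_001):
    total += random.expovariate(1 / true_mean)
    if n in (10, 100, 1_000, 10_000, 100_000, 1_000_000):
        print(f"n = {n:9d}:  running average = {total / n:.4f}   (u = {true_mean})")
```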
Expected Value of Sample Mean
Remember
E(X+Y) = E(X) + E(Y)
Therefore
  E(Xbar) = E( (1/n) sum_i X_i ) = (1/n) sum_i E(X_i) = (1/n) (n u) = u
Variance of Sample Mean
Remember :
Var(X + Y) = Var(X) + Var (Y) (for indep. X, Y)
Therefore
  Var( sum_i X_i ) = sum_i Var(X_i)   (for independent X_i)
  Var(k X) = k^2 Var(X)

  Var(Xbar) = Var( (1/n) sum_i X_i ) = (1/n^2) sum_i Var(X_i) = (1/n^2) n Var(X) = Var(X) / n

so the standard deviation of the sample mean is sigma_Xbar = sigma_X / sqrt(n).
Central Limit Theorem
When randomly sampling from any population with mean u and standard deviation sigma, when n is large enough the sampling distribution of Xbar is approximately normal: Xbar ~ N(u, sigma^2/n), i.e., its standard deviation is sigma / sqrt(n).
Equivalently, for X_i IID (independent, identically distributed) random variables with E[X_i] = u and Var[X_i] = sigma^2,
  (X_1 + X_2 + ... + X_n - n u) / (sigma sqrt(n)) ~ N(0, 1)   approximately.
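An illustrative check of the standardized form (a uniform parent distribution and n = 30 are arbitrary choices, not from the slides): the fraction of standardized sums falling within one standard deviation should approach the normal value of about 0.683:

```python
import math
import random

n, runs = 30, 50_000
u, sigma = 0.5, math.sqrt(1 / 12)            # mean and stdev of Uniform(0, 1)

def standardized_sum():
    s = sum(random.random() for _ in range(n))
    return (s - n * u) / (sigma * math.sqrt(n))

z = [standardized_sum() for _ in range(runs)]
inside = sum(abs(v) < 1 for v in z) / runs
print(f"fraction with |Z| < 1: {inside:.3f}   (N(0,1) gives about 0.683)")
```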
Central Limit Theorem Example 1
The bi-modal, parabolic, probability density is
obviously non-Normal. Call that the parent
distribution.
Central Limit Theorem Example 1
To compute an average, Xbar, n samples are
drawn, at random, from the parent distribution
and averaged.
Then another sample of n is drawn and another
value of Xbar computed.
This process is repeated over and over, and averages of n samples are computed.
Central Limit Theorem Example 1
The distribution of averages of n samples, n = 232
Central Limit Theorem Example 1
The distribution of averages of n samples, n = 232
Central Limit Theorem Example 2
The Triangular probability density
Central Limit Theorem Example 2
The distribution of averages of n samples, n = 232
Normal Distribution
The uniform distribution looks nothing like the bell-shaped (Gaussian) curve, and it has a large spread (sigma).
Yet the sum of r.v.s drawn from a uniform distribution looks remarkably normal after very few samples, and the corresponding sample mean has a decreasing spread: the central limit tendency.
Normal Distribution
Goal: estimate the mean of a Uniform distribution (distributed between 1 and 6).
Sample mean = (sum of samples) / N.
Since the sum is distributed approximately normally, so is the sample mean (i.e., the sampling distribution is normal), with decreasing dispersion as N grows.
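A sketch of this estimation (a continuous Uniform(1, 6) with mean 3.5 is assumed for illustration): the sampling distribution of the sample mean stays centred on the true mean while its spread shrinks like sigma / sqrt(n):

```python
import random
import statistics

true_mean = 3.5                      # mean of Uniform(1, 6)
for n in (5, 50, 500):
    # Sampling distribution of the sample mean for this n (5,000 repetitions).
    means = [statistics.fmean(random.uniform(1, 6) for _ in range(n))
             for _ in range(5_000)]
    print(f"n = {n:4d}: mean of sample means = {statistics.fmean(means):.3f} "
          f"(true {true_mean}), spread = {statistics.pstdev(means):.3f}")
```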
Normal Distribution: Tail Probability
The sample mean is a Gaussian r.v. with mean u and standard deviation s = sigma / sqrt(n).
With a larger number of samples, the average of the sample means is an excellent estimate of the true mean.
If sigma (of the original distribution) is known, invalid mean estimates can be rejected with high confidence.
Many Experiments: Coin Tosses
[Figure: sample mean = total heads / number of tosses, plotted against the number of tosses n (up to about 125); the estimates settle toward the true head probability (0.5 for a fair coin).]
The spread of the sample mean shrinks with the number of trials by a factor of sqrt(n): sigma_Xbar = sigma / sqrt(n).
Corollary: more experiments (n) is good, but not great (why? sqrt(n) doesn't grow fast).
Gaussian, Poisson, Binomial
The Gaussian distribution can be derived from the
binomial (or Poisson) assuming:
p is finite
N is very large
we have a continuous variable rather than a discrete variable
[Figures: Binomial PMFs with p = 0.2 for n = 4, 10, 20, 40, and Poisson PMFs P(n) for lambda = 0.5, 1, 4, 10; both families approach a bell shape as n or lambda grows.]
Outline
Discrete Random Distributions
Poisson Process
Exponential Distribution
Gaussian (normal) distribution
Estimation Process and Central Limit
Theorem
Pareto Distribution
Pareto distribution
x_m is the location (scale) parameter and k is the shape parameter.
  PDF: p_k(x) = k x_m^k / x^(k+1)   for x >= x_m
  CDF: F_k(x) = 1 - (x_m / x)^k     for x >= x_m
As k goes to infinity, p_k(x) tends to the Dirac delta function delta(x - x_m).
[Figures: the PDF and CDF for x_m = 1 and several values of k.]
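For simulation purposes, Pareto variates can be drawn by inverse-transform sampling from the CDF above; the sketch below is illustrative (x_m and k are arbitrary) and checks the mean k*x_m/(k - 1), which exists for k > 1:

```python
import random

def pareto(x_m: float, k: float) -> float:
    """Inverse-transform sample: solve F(x) = 1 - (x_m / x)^k with U ~ Uniform(0, 1]."""
    u = 1.0 - random.random()        # in (0, 1], avoids division by zero
    return x_m / u ** (1 / k)

x_m, k, n = 1.0, 2.5, 200_000
samples = [pareto(x_m, k) for _ in range(n)]

# For k > 1 the mean is k * x_m / (k - 1); the heavy tail shows up as rare huge values.
print(f"empirical mean {sum(samples) / len(samples):.3f}  "
      f"vs k*x_m/(k - 1) = {k * x_m / (k - 1):.3f}")
print(f"largest sample  {max(samples):.1f}")
```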
Pareto distribution
Long-tail distribution:
A vast population of events occurs rarely (yellow in the figure) and has low amplitude on some scale (e.g., sales), while a small population of events occurs very often (red) and has high amplitude.
The huge population of rare events is referred to as the long tail.
In many cases the rare events are so much greater in number that they comprise the majority.
Pareto distribution
Allocation of wealth among individuals: a larger portion of the wealth of any society is owned by a smaller percentage of the people (the Pareto principle or "80-20 rule": 20% of the population owns 80% of the wealth).
Frequencies of words in longer texts.
File size distribution of Internet traffic that uses the TCP protocol (many smaller files, few larger ones).
A superposition of many Pareto-distributed ON-OFF sources can be used to generate self-similar traffic (to be explained later in this course).
The "probability," or fraction of the population, that owns a small amount of wealth per person is rather high, and then decreases steadily as wealth increases.