
Moment Generating Function and the Central Limit Theorem
Motivation
Moment generating function:
- Systematic calculation of all moments
- Recovery of the pdf from the moments

Central Limit Theorem:
- Gaussianity of appropriate sums of random variables
- Characterization of noise
- Important approximations
Moments of Sums
If $W = X_1 + \cdots + X_n$, then

$$E[W] = E[X_1] + \cdots + E[X_n]$$

$$\mathrm{Var}(W) = \sum_{i=1}^{n} \mathrm{Var}(X_i) + 2 \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \mathrm{Cov}(X_i, X_j)$$

When the $X_i$ are uncorrelated,

$$\mathrm{Var}(W) = \mathrm{Var}(X_1) + \cdots + \mathrm{Var}(X_n)$$
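A simulation sketch of this identity (illustrative only; the shared-component construction below is my choice for inducing correlation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw n correlated variables: X_i = S + E_i shares a common component S,
# so Cov(X_i, X_j) = Var(S) for i != j.
n, trials = 4, 200_000
S = rng.normal(0, 1, size=trials)            # common component
E = rng.normal(0, 1, size=(trials, n))       # independent components
X = S[:, None] + E                           # X_i = S + E_i
W = X.sum(axis=1)

# Formula: Var(W) = sum_i Var(X_i) + 2 * sum_{i<j} Cov(X_i, X_j)
var_terms = sum(np.var(X[:, i]) for i in range(n))
cov_terms = 2 * sum(np.cov(X[:, i], X[:, j])[0, 1]
                    for i in range(n) for j in range(i + 1, n))
print(np.var(W), var_terms + cov_terms)      # the two numbers should agree
```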
Sum of Two RV
Take two random variables X and Y; we wish to find the pdf of the sum. Define:

$$W = X + Y, \qquad V = Y$$

Then use the Jacobian method to find $f_{W,V}(w, v)$. It is easy to see that $|J| = 1$, therefore:

$$f_{W,V}(w, v) = f_{X,Y}(x, y) = f_{X,Y}(w - y, y)$$

Integrate to find the marginal:

$$f_W(w) = \int_{-\infty}^{\infty} f_{X,Y}(w - y, y) \, dy$$

This is an integration of $f_{X,Y}$ along a diagonal line.
Example
Random variables X, Y are distributed as

$$f_{X,Y}(x, y) = \begin{cases} 2 & 0 \le x, y \le 1,\ x + y \le 1 \\ 0 & \text{else} \end{cases}$$

Find the pdf of the sum.

Using the previous result, for $0 \le w \le 1$:

$$f_W(w) = \int_{-\infty}^{\infty} f_{X,Y}(w - y, y) \, dy = \int_0^w 2 \, dy = 2w$$
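As a quick numerical check of this result (a sketch I added; scipy and the sample points are my choices), the marginalization integral can be evaluated directly:

```python
from scipy.integrate import quad

def f_xy(x, y):
    # joint pdf: equals 2 on the triangle 0 <= x, y with x + y <= 1
    return 2.0 if (x >= 0 and y >= 0 and x + y <= 1) else 0.0

for w in [0.2, 0.5, 0.9]:
    fw, _ = quad(lambda y: f_xy(w - y, y), 0, 1, points=[w])
    print(w, fw, 2 * w)   # numerical integral vs the closed form 2w
```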
Sum of Independent RV
If X, Y are independent and W = X + Y, then:

$$f_W(w) = \int_{-\infty}^{\infty} f_{X,Y}(w - y, y) \, dy = \int_{-\infty}^{\infty} f_X(w - y) f_Y(y) \, dy = (f_X * f_Y)(w)$$

The pdf of a sum of independent r.v.s is given by the convolution of the individual pdfs.

In the discrete case:

$$P_W(w) = \sum_{k=-\infty}^{\infty} P_X(k) \, P_Y(w - k)$$

NOTE: Naturally, the pmf values may be zero for some of these k.
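For a concrete instance (my example, not from the slides), the pmf of the sum of two fair dice is exactly this discrete convolution:

```python
import numpy as np

# pmf of a fair die on {1,...,6}; index k of the array holds P(X = k)
die = np.array([0, 1, 1, 1, 1, 1, 1]) / 6.0

# P_W(w) = sum_k P_X(k) P_Y(w - k) is a discrete convolution
pmf_sum = np.convolve(die, die)
for w, p in enumerate(pmf_sum):
    if p > 0:
        print(w, round(p, 4))   # w = 2..12 with the familiar triangular shape
```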
Example
Random variable X is Gaussian (0, 1) and random variable Y is Gaussian with mean 2 and variance 1. Find the pdf of X + Y.
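Since the preceding result applies to independent variables, assume X and Y are independent; the sum should then be Gaussian with mean 2 and variance 2. A numerical-convolution sketch (grid spacing and ranges are my choices) to confirm:

```python
import numpy as np
from scipy.stats import norm

dx = 0.01
x = np.arange(-10, 10, dx)

# convolve the two pdfs on a grid; the dx factor approximates the integral
fw = np.convolve(norm.pdf(x, 0, 1), norm.pdf(x, 2, 1)) * dx
w = -20 + np.arange(fw.size) * dx            # support of the convolution grid

# compare with the Gaussian pdf of mean 2, variance 2 at a few points
for wi in [0.0, 2.0, 4.0]:
    i = int(np.argmin(np.abs(w - wi)))
    print(wi, fw[i], norm.pdf(wi, 2, np.sqrt(2)))
```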
Example
Random variables X, Y are distributed according to:

$$f_{X,Y}(x, y) = \begin{cases} 2 e^{-x} e^{-2y} & x, y > 0 \\ 0 & \text{else} \end{cases}$$

Find the pdf of W = X + Y.
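One way to work this exercise symbolically (a sketch, assuming the factorization $f_X(x) = e^{-x}$, $f_Y(y) = 2e^{-2y}$ into independent exponential marginals that the joint pdf implies):

```python
import sympy as sp

w, y = sp.symbols('w y', positive=True)

# with f_X(x) = e^{-x} and f_Y(y) = 2 e^{-2y}, both arguments are positive
# only for 0 <= y <= w, so the convolution integral runs over that range
f_w = sp.integrate(sp.exp(-(w - y)) * 2 * sp.exp(-2 * y), (y, 0, w))
print(sp.simplify(f_w))   # expect 2*(exp(-w) - exp(-2*w))
```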
Moment Generating Function
Definition: For random variable X, the moment generating function is:

$$\phi_X(s) = E\left[e^{sX}\right]$$

For a continuous r.v., this is (almost) the Laplace transform of the pdf:

$$\phi_X(s) = \int_{-\infty}^{\infty} e^{sx} f_X(x) \, dx$$

For discrete distributions:

$$\phi_X(s) = \sum_i e^{s x_i} P_X(x_i)$$
Example
Find the MGF of the Bernoulli distribution:

$$P_X(x) = \begin{cases} p & x = 1 \\ 1 - p & x = 0 \end{cases}$$

$$\phi_X(s) = (1 - p) e^{0} + p e^{s} = 1 - p + p e^{s}$$
Example
Find the MGF of the exponential r.v.:

$$f_X(x) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0 \\ 0 & \text{else} \end{cases}$$

$$\phi_X(s) = \int_0^{\infty} e^{sx} \, \lambda e^{-\lambda x} \, dx = \lambda \int_0^{\infty} e^{-x(\lambda - s)} \, dx = \frac{\lambda}{\lambda - s} \left[-e^{-x(\lambda - s)}\right]_0^{\infty} = \frac{\lambda}{\lambda - s} \quad (s < \lambda)$$
Moments via MGF
$$\frac{d\phi_X(s)}{ds} = \frac{d}{ds} \int e^{sx} f_X(x) \, dx = \int x e^{sx} f_X(x) \, dx$$

Setting s = 0, we get:

$$\left.\frac{d\phi_X(s)}{ds}\right|_{s=0} = E[X]$$

If we take n derivatives, we get n powers of x, therefore:

Theorem: A random variable X with MGF $\phi_X(s)$ has moments

$$E[X^n] = \left.\frac{d^n \phi_X(s)}{ds^n}\right|_{s=0}$$
Why Moments from MGF?
Each moment requires an integral. If we need multiple moments, we can take one integral to get the MGF, and then calculate all moments with derivatives (easier than integrals).

Example: Find the first four moments of an exponential random variable with parameter λ.
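This workflow can be mirrored symbolically (a sympy sketch I added; it assumes s < λ for the integral to converge): one integral for the MGF, then four derivatives.

```python
import sympy as sp

s, x, lam = sp.symbols('s x lam', positive=True)

# one integral gives the MGF of an exponential(lam) variable (valid for s < lam)
mgf = sp.integrate(sp.exp(s * x) * lam * sp.exp(-lam * x),
                   (x, 0, sp.oo), conds='none')
print(sp.simplify(mgf))               # lam/(lam - s)

# then each moment is just a derivative at s = 0: E[X^n] = n!/lam^n
for n in range(1, 5):
    print(n, sp.simplify(sp.diff(mgf, s, n).subs(s, 0)))
```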
MGF of Independent Sums
Theorem: If X, Y are independent,

$$\phi_{X+Y}(s) = \phi_X(s) \, \phi_Y(s)$$

Proof:

$$\phi_{X+Y}(s) = E\left[e^{s(X+Y)}\right] = E\left[e^{sX} e^{sY}\right] = E\left[e^{sX}\right] E\left[e^{sY}\right] = \phi_X(s) \, \phi_Y(s)$$
Application of MGF Properties
Using the MGF properties, it is easy to show that (a symbolic check of the first item appears below):

- The sum of n i.i.d. Bernoulli-p variables is a binomial (n, p).
- The sum of n i.i.d. geometric-p variables is a Pascal (n, p).
- The sum of n independent Poisson-$\lambda_i$ random variables is a Poisson ($\sum \lambda_i$).
- The sum of n independent Gaussians is a Gaussian (but this is also true for non-independent Gaussians).
- The sum of n i.i.d. exponential variables is an Erlang.
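As an illustrative check of the first claim, here is a small sympy sketch (my own, not from the slides): raising the Bernoulli MGF to the n-th power reproduces the binomial(n, p) MGF built term by term from the pmf.

```python
import sympy as sp

s, p = sp.symbols('s p')
k = sp.symbols('k', integer=True, nonnegative=True)
n = 5   # a small concrete n keeps the symbolic check fast

bernoulli_mgf = 1 - p + p * sp.exp(s)

# binomial(n, p) MGF assembled directly from its pmf
binomial_mgf = sp.Sum(sp.binomial(n, k) * p**k * (1 - p)**(n - k) * sp.exp(s * k),
                      (k, 0, n)).doit()

# the difference expands to zero, confirming phi_{X1+...+Xn} = phi_X^n
print(sp.expand(bernoulli_mgf**n - binomial_mgf))
```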
Characteristic Function (optional)
The characteristic function of variable X is defined as:

$$\Phi_X(\omega) = \int_{-\infty}^{\infty} e^{j\omega x} f_X(x) \, dx$$

The characteristic function is related to the Fourier transform of the pdf, just like the MGF is related to the Laplace transform.

$$E[X^n] = \frac{1}{j^n} \left.\frac{d^n}{d\omega^n} \Phi_X(\omega)\right|_{\omega=0}$$

No two pdfs can share the same characteristic function. All pdfs have characteristic functions, but not all pdfs have MGFs. Therefore the characteristic function is a powerful tool.
Sum of I.I.D. Variables
We wish to investigate sums of a large group of i.i.d. variables whose distribution we may not know. Is that possible?

Consider the i.i.d. sum $W = X_1 + \cdots + X_n$ as $n \to \infty$.

$\mu_W = n \mu_X$ and $\mathrm{Var}(W) = n \, \mathrm{Var}(X)$. Both go to infinity. Oops!

To make the analysis more meaningful, we subtract the means and divide by $\sqrt{n}$ to keep the variance from exploding.
Central Limit Theorem
Theorem: Consider an i.i.d. sequence $X_1, X_2, \ldots$ with mean $\mu_X$ and variance $\sigma_X^2$. Then the random variables

$$Z_n = \frac{1}{\sigma_X \sqrt{n}} \sum_{i=1}^{n} (X_i - \mu_X)$$

have the property:

$$\lim_{n \to \infty} F_{Z_n}(z) = \Phi(z)$$

This means, if $W_n = X_1 + \cdots + X_n$, for large n:

$$F_{W_n}(w) \approx \Phi\left(\frac{w - n \mu_X}{\sigma_X \sqrt{n}}\right)$$
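A simulation sketch of the theorem (the uniform distribution and the sample sizes are my choices):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, trials = 30, 100_000

# standardized sums of i.i.d. uniforms on [0, 1]: mu = 1/2, sigma^2 = 1/12
X = rng.uniform(0, 1, size=(trials, n))
Z = (X.sum(axis=1) - n * 0.5) / np.sqrt(n / 12)

# the empirical CDF of Z_n should be close to Phi at every point
for z in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    print(z, np.mean(Z <= z), norm.cdf(z))
```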
Practical Application of CLT
Problem: Find probabilities involving a sum of i.i.d. variables $W = X_1 + \cdots + X_n$.

Solution: Find $\mu_X$ and $\sigma_X^2$, then use the CLT.

Interesting Note: The CLT also applies to discrete variables. Even though a sum of discrete variables is always discrete, the CDF approaches a Gaussian. This is enough to make approximations involving probabilities.
Example 1
Consider $X_i$ to be uniformly distributed over $[-1, 1]$. What is the probability that $W = X_1 + \cdots + X_{16}$ takes values in the interval $[-1, 1]$?

$$\mu_X = 0, \qquad \mathrm{Var}(X) = \frac{1}{3}$$

(Here $\sigma_W = \sqrt{16 \cdot 1/3} = 4/\sqrt{3}$.)

$$P(-1 < W \le 1) = F_W(1) - F_W(-1) \approx \Phi\left(\frac{1 - 0}{4/\sqrt{3}}\right) - \Phi\left(\frac{-1 - 0}{4/\sqrt{3}}\right) = 2\,\Phi\left(\frac{\sqrt{3}}{4}\right) - 1 = 0.3328$$
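A Monte Carlo cross-check (my sketch): the simulation estimates the exact probability, which should land near the CLT value above.

```python
import numpy as np

rng = np.random.default_rng(0)

# one million draws of W = X_1 + ... + X_16, X_i uniform on [-1, 1]
W = rng.uniform(-1, 1, size=(1_000_000, 16)).sum(axis=1)
print(np.mean((-1 < W) & (W <= 1)))   # expect a value near 0.33
```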
Example 2
We flip a fair coin a thousand times. What are the chances that we will have more than 510 heads?

Denote by X the flip of a coin and the number of heads by

$$A = X_1 + X_2 + \cdots + X_{1000}$$

We know $\mu_X = 0.5$ and $\mathrm{Var}(X) = 0.25$.

$$P(A > 510) = 1 - F_A(510) \approx 1 - \Phi\left(\frac{510 - 1000 \cdot 0.5}{0.5 \sqrt{1000}}\right) = 1 - \Phi(0.63) = 0.2643$$

Question: What would be the probability of exactly 510 heads? Does this lead to problems?
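For comparison, the exact binomial tail is one line with scipy (a sketch I added):

```python
from scipy.stats import binom, norm

exact = binom.sf(510, 1000, 0.5)     # P(A > 510), exact binomial tail
clt = 1 - norm.cdf((510 - 500) / (0.5 * 1000 ** 0.5))   # plain CLT estimate
print(exact, clt)                    # the CLT value is close to the slide's 0.2643
```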
Example 3
DeMoivre-Laplace Formula: For a binomial (n, p) variable K,

$$P(k_1 \le K \le k_2) \approx \Phi\left(\frac{k_2 - np + 0.5}{\sqrt{np(1-p)}}\right) - \Phi\left(\frac{k_1 - np - 0.5}{\sqrt{np(1-p)}}\right)$$

IDEA: on whichever side is included, we use a margin of 0.5. This avoids problems with approximating discrete variables.

Example: Calculate P(K = 8) for a binomial (20, 0.4).

Using the previous formula, the answer is zero!

Using the DeMoivre-Laplace formula,

$$P(8 \le K \le 8) \approx \Phi\left(\frac{0.5}{\sqrt{4.8}}\right) - \Phi\left(\frac{-0.5}{\sqrt{4.8}}\right) = 0.1803$$

The exact answer using the binomial formula is 0.1797.
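Both numbers on this slide can be reproduced directly (an illustrative scipy sketch):

```python
from scipy.stats import binom, norm

n, p, k = 20, 0.4, 8
sigma = (n * p * (1 - p)) ** 0.5     # sqrt(4.8)

# continuity-corrected CLT approximation vs the exact pmf value
approx = norm.cdf((k - n * p + 0.5) / sigma) - norm.cdf((k - n * p - 0.5) / sigma)
print(approx, binom.pmf(k, n, p))    # roughly 0.180 vs 0.1797
```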
Laplace and De Moivre
Abraham de Moivre (1667-1754): De Moivre formula $(\cos x + i \sin x)^n = \cos nx + i \sin nx$, Gaussian probabilities.

Pierre-Simon Laplace (1749-1827): Laplace transform, scalar potentials, Laplace equation (PDE), celestial mechanics.

De Moivre's formula predates the Euler formula $e^{ix} = \cos x + i \sin x$. Laplace almost predicted the existence of black holes!
Advanced Topics
Random Sums of Independent Variables
We draw a random integer N according to some distribution, then form:

$$W = X_1 + \cdots + X_N$$

using i.i.d. variables $X_i$ (drawn independently of N).

The MGF of the sum is:

$$\phi_W(s) = \phi_N\left(\ln \phi_X(s)\right)$$

For proof, see your textbook. This MGF can be used to calculate probabilities involving random sums.
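An illustrative Monte Carlo check of this identity (the choices of N ~ Poisson(3) and X_i ~ Bernoulli(0.4) are mine): both sides of $\phi_W(s) = \phi_N(\ln \phi_X(s))$ should agree.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, p, s = 3.0, 0.4, 0.7
trials = 500_000

# W = X_1 + ... + X_N with N ~ Poisson(lam) and X_i ~ Bernoulli(p);
# a binomial(N, p) draw is exactly the sum of N Bernoulli(p) variables
N = rng.poisson(lam, size=trials)
W = rng.binomial(N, p)

phi_X = 1 - p + p * np.exp(s)                     # Bernoulli MGF at s
phi_N = lambda t: np.exp(lam * (np.exp(t) - 1))   # Poisson MGF

# Monte Carlo estimate of E[e^{sW}] vs phi_N(ln phi_X(s))
print(np.mean(np.exp(s * W)), phi_N(np.log(phi_X)))
```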
Chernoff Bound

This is used to bound the tail of a probability distribution.

Theorem: For a random variable X,

$$P(X \ge c) \le \min_{s \ge 0} e^{-sc} \, \phi_X(s)$$

IDEA: the probability is hard to calculate, but the bound is easier.

Proof: Using the unit step $u(\cdot)$ and the bound $u(x - c) \le e^{s(x-c)}$ for $s \ge 0$:

$$P(X \ge c) = \int_c^{\infty} f_X(x) \, dx = \int_{-\infty}^{\infty} u(x - c) f_X(x) \, dx \le \int_{-\infty}^{\infty} e^{s(x-c)} f_X(x) \, dx = e^{-sc} \int_{-\infty}^{\infty} e^{sx} f_X(x) \, dx = e^{-sc} \, \phi_X(s)$$
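A numeric sketch for an exponential(1) variable, where the exact tail is $e^{-c}$ and $\phi_X(s) = 1/(1-s)$ for $s < 1$ (the grid minimization below stands in for the calculus):

```python
import numpy as np

# exponential(1): exact tail P(X >= c) = e^{-c}, MGF phi_X(s) = 1/(1 - s), s < 1
s_grid = np.linspace(0.0, 0.999, 10_000)

def chernoff_bound(c):
    # minimize e^{-sc} * phi_X(s) over the grid of admissible s values
    return np.min(np.exp(-s_grid * c) / (1.0 - s_grid))

for c in [2, 5, 10]:
    print(c, chernoff_bound(c), np.exp(-c))   # bound always sits above the exact tail
```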