The law of large numbers and the central limit theorem were concerned with the behavior of $S_n/n$ for large $n$. The law of large numbers tells us what $S_n/n$ converges to and does not deal with any deviation. The central limit theorem states that
$$\frac{S_n - n\mu}{\sigma\sqrt{n}} \xrightarrow{d} N(0,1)$$
so it gives the probabilities of events such as $\left\{\frac{S_n}{n} < \mu + a/\sqrt{n}\right\}$. In this case, we can find the probability that the deviation is of the order $1/\sqrt{n}$, which is not large. Thus the CLT is not applicable for finding the probability of a large deviation of the mean. Large deviation theory deals with tail probabilities of the form $P\left(\frac{S_n}{n} \ge a\right)$.
MGF and CGF
Recall that the moment generating function (MGF) $M_X(s)$ of a random variable $X$ is defined by
$$M_X(s) = E e^{sX} = \int_{-\infty}^{\infty} e^{sx}\, dF_X(x)$$
where $s$ is a real variable. Unlike the characteristic function, the MGF may not exist for all random variables and all values of $s$. If $M_X(s)$ exists in a neighbourhood of $s = 0$, it may be conveniently used to generate the moments as
$$E X^k = M_X^{(k)}(s)\Big|_{s=0}, \qquad k = 1, 2, \ldots$$
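The moment-generation property can be checked numerically. Below is a minimal sketch, assuming $X \sim \text{Bernoulli}(p)$, for which the MGF is $p e^s + (1-p)$ and every moment $EX^k$ equals $p$; the helper `derivative_at_zero` is illustrative, not part of the notes.

```python
import math

# Check E[X^k] = M_X^{(k)}(0), assuming X ~ Bernoulli(p) so that
# M_X(s) = p e^s + (1 - p) and E[X^k] = p for every k >= 1.
p = 0.3

def mgf(s: float) -> float:
    return p * math.exp(s) + (1.0 - p)

def derivative_at_zero(f, k: int, h: float = 1e-3) -> float:
    """k-th derivative of f at 0 via a central finite difference."""
    return sum((-1) ** i * math.comb(k, i) * f((k / 2 - i) * h)
               for i in range(k + 1)) / h ** k

for k in (1, 2, 3):
    print(k, derivative_at_zero(mgf, k))   # each ≈ E[X^k] = p = 0.3
```

The finite difference is only a stand-in for symbolic differentiation, but it agrees with $EX^k = p$ to several decimal places.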
The cumulant generating function (CGF) $K_X(s)$ is defined by
$$K_X(s) = \log_e\left(M_X(s)\right) = \log_e\left(E e^{sX}\right)$$
where $M_X(s)$ is the moment generating function. Wherever $M_X(s)$ exists it is strictly positive, so $K_X(s)$ also exists there.
As $M_X(0) = 1$, we get $K_X(0) = 0$. The Taylor series expansion of $K_X(s)$ about the origin gives
$$K_X(s) = \sum_{n=1}^{\infty} k_X(n)\, \frac{s^n}{n!}$$
where the $n$th coefficient $k_X(n)$ is called the $n$th cumulant of the random variable $X$. From the above expression, we get
$$k_X(n) = \frac{d^n}{ds^n} K_X(s)\bigg|_{s=0} = \frac{d^n}{ds^n} \log_e\left(M_X(s)\right)\bigg|_{s=0}$$
In particular,
$$k_X(1) = \frac{d K_X(s)}{ds}\bigg|_{s=0} = \mu_X$$
$$k_X(2) = \frac{d^2 K_X(s)}{ds^2}\bigg|_{s=0} = \sigma_X^2$$
$$k_X(3) = \frac{d^3 K_X(s)}{ds^3}\bigg|_{s=0} = EX^3 - 3\mu_X EX^2 + 2\mu_X^3 = E(X - \mu_X)^3$$
$$k_X(4) = \frac{d^4 K_X(s)}{ds^4}\bigg|_{s=0} = E(X - \mu_X)^4 - 3\sigma_X^4$$
Thus the first two cumulants are the mean and the variance, and the third cumulant is the third central moment.
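These cumulant formulas can also be verified numerically. A minimal sketch, again assuming $X \sim \text{Bernoulli}(p)$, for which $k_X(1) = p$, $k_X(2) = p(1-p)$ and $k_X(3) = p(1-p)(1-2p)$ are known in closed form; the helper `derivative_at_zero` is illustrative, not from the notes.

```python
import math

# Check the first cumulants by differentiating K_X(s) = log M_X(s),
# assuming X ~ Bernoulli(p): k1 = p, k2 = p(1-p), k3 = p(1-p)(1-2p).
p = 0.3

def cgf(s: float) -> float:
    """K_X(s) = log M_X(s) for the Bernoulli MGF p e^s + (1 - p)."""
    return math.log(p * math.exp(s) + (1.0 - p))

def derivative_at_zero(f, k: int, h: float = 1e-3) -> float:
    """k-th derivative of f at 0 via a central finite difference."""
    return sum((-1) ** i * math.comb(k, i) * f((k / 2 - i) * h)
               for i in range(k + 1)) / h ** k

print(derivative_at_zero(cgf, 1))   # ≈ 0.3   = p        (mean)
print(derivative_at_zero(cgf, 2))   # ≈ 0.21  = p(1-p)   (variance)
print(derivative_at_zero(cgf, 3))   # ≈ 0.084 = p(1-p)(1-2p)
```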
Cramér's Theorem
Let $X_1, X_2, \ldots, X_n$ be iid random variables with mean $\mu$ and with MGF $M_X(s)$ which is finite in a neighbourhood of $s = 0$. Then for any $a > \mu$,
$$\lim_{n\to\infty} \frac{1}{n}\log_e P\left(\frac{S_n}{n} > a\right) = -l^*(a)$$
where $S_n = \sum_{i=1}^{n} X_i$ and $l^*(a) = \sup_{s>0}\left(sa - \log_e M_X(s)\right)$.

Note that
$$M_{S_n}(s) = E e^{sS_n} = E e^{s\sum_{i=1}^{n} X_i} = E\prod_{i=1}^{n} e^{sX_i} = \prod_{i=1}^{n} E e^{sX_i} = \left(M_X(s)\right)^n$$
using the independence of the $X_i$'s. By Markov's inequality applied to $e^{sS_n}$, for every $s > 0$,
$$P\left(\frac{S_n}{n} > a\right) \le e^{-sna}\left(M_X(s)\right)^n = e^{-n\left(sa - \log_e M_X(s)\right)}$$
and optimizing over $s > 0$ gives the Chernoff bound
$$P\left(\frac{S_n}{n} > a\right) \le e^{-n l^*(a)}$$
so that
$$\frac{1}{n}\log_e P\left(\frac{S_n}{n} > a\right) \le -l^*(a).$$
It remains to prove the matching lower bound
$$\liminf_{n\to\infty} \frac{1}{n}\log_e P\left(\frac{S_n}{n} > a\right) \ge -l^*(a).$$
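The Chernoff upper bound can be compared against an exact tail. Below is a sketch assuming $X_i \sim \text{Bernoulli}(p)$, so that $S_n$ is Binomial($n$, $p$) and the tail can be summed exactly; the grid search for $l^*(a)$ is a crude stand-in for the supremum.

```python
import math

# Check P(S_n/n > a) <= exp(-n l*(a)), assuming X_i ~ Bernoulli(p).
p, a, n = 0.3, 0.5, 60

def log_mgf(s: float) -> float:
    return math.log(p * math.exp(s) + (1.0 - p))

# l*(a) = sup_{s>0} (s a - log M_X(s)), approximated on a grid over (0, 10].
l_star = max(s * a - log_mgf(s) for s in (i * 1e-3 for i in range(1, 10001)))

# Exact tail: P(S_n > n a) = sum_{k > n a} C(n, k) p^k (1-p)^(n-k).
tail = sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
           for k in range(math.floor(n * a) + 1, n + 1))

print(tail, math.exp(-n * l_star))   # the exact tail sits below exp(-n l*(a))
```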
For this, consider a set of new random variables $Y_1, Y_2, \ldots, Y_n$ obtained by mapping $X_i$ to $Y_i$ such that the common distribution function of the $Y_i$'s is given by
$$F_Y(y) = \frac{1}{M_X(s^*)}\int_{-\infty}^{y} e^{s^* u}\, dF_X(u)$$
where $s^*$ is the point at which the supremum defining $l^*(a)$ is attained. As the $X_i$'s are iid random variables, so also are the $Y_i$'s. We can verify that $F_Y(y)$ is a valid distribution function. We also observe that
$$dF_Y(u) = \frac{e^{s^* u}\, dF_X(u)}{M_X(s^*)}$$
so that
$$M_Y(s) = \int e^{su}\, dF_Y(u) = \frac{1}{M_X(s^*)}\int e^{(s + s^*)u}\, dF_X(u) = \frac{M_X(s + s^*)}{M_X(s^*)}.$$
In particular, the mean of $Y_i$ is
$$E Y_i = \frac{d M_Y(s)}{ds}\bigg|_{s=0} = \frac{M_X'(s^*)}{M_X(s^*)} = a$$
since at the maximizing point $s^*$ the derivative of $sa - \log_e M_X(s)$ vanishes.
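A concrete check of the tilted mean, assuming $X \sim \text{Bernoulli}(p)$ and a target $a > p$: under $F_Y$ the mass on $1$ is $p e^{s^*}/M_X(s^*)$, which at the maximizing point equals $a$. The closed form for $s^*$ used below is specific to the Bernoulli case.

```python
import math

# Check that the tilted Bernoulli has mean a, assuming X ~ Bernoulli(p).
p, a = 0.3, 0.5

def mgf(s: float) -> float:
    return p * math.exp(s) + (1.0 - p)

# Maximizer of s a - log M_X(s); closed form for the Bernoulli case.
s_star = math.log(a * (1 - p) / (p * (1 - a)))

tilted_p1 = p * math.exp(s_star) / mgf(s_star)   # P(Y = 1), the mean of Y
print(s_star, tilted_p1)                         # tilted mean ≈ a = 0.5
```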
Similarly, define $\tilde S_n = \sum_{i=1}^{n} Y_i$. Now
$$M_{\tilde S_n}(s) = E e^{s\sum_{i=1}^{n} Y_i} = \prod_{i=1}^{n} E e^{sY_i} = \left(E e^{sY}\right)^n = \left(M_Y(s)\right)^n = \frac{\left(M_X(s + s^*)\right)^n}{\left(M_X(s^*)\right)^n} = \frac{M_{S_n}(s + s^*)}{\left(M_X(s^*)\right)^n}$$
Equivalently,
$$\int e^{su}\, dF_{\tilde S_n}(u) = \frac{1}{\left(M_X(s^*)\right)^n}\int e^{(s + s^*)u}\, dF_{S_n}(u)$$
so that
$$dF_{\tilde S_n}(u) = \frac{e^{s^* u}\, dF_{S_n}(u)}{\left(M_X(s^*)\right)^n}.$$
Using the above relationship, probabilities involving $S_n$ can be studied in terms of probabilities involving $\tilde S_n$.
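The change-of-measure identity at the level of $S_n$ can be verified pointwise in the Bernoulli case, where both sides are binomial pmfs. A sketch; the tilt value $s^* = 0.8$ is arbitrary here, since the identity holds for every tilt.

```python
import math

# Check dF_{tilde S_n}(u) = e^{s* u} dF_{S_n}(u) / M_X(s*)^n, assuming
# X_i ~ Bernoulli(p) so that S_n is Binomial(n, p).  Tilting that pmf should
# give exactly the Binomial(n, p_tilt) pmf with p_tilt = p e^{s*} / M_X(s*).
p, n, s_star = 0.3, 20, 0.8

m = p * math.exp(s_star) + (1 - p)   # M_X(s*)
p_tilt = p * math.exp(s_star) / m    # success probability of the Y_i's

for k in range(n + 1):
    pmf_sn = math.comb(n, k) * p ** k * (1 - p) ** (n - k)   # dF_{S_n}
    tilted = math.exp(s_star * k) * pmf_sn / m ** n          # tilted pmf
    pmf_tilde = math.comb(n, k) * p_tilt ** k * (1 - p_tilt) ** (n - k)
    assert abs(tilted - pmf_tilde) < 1e-12
print("tilted Binomial(n, p) pmf equals the Binomial(n, p_tilt) pmf")
```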
Suppose $b > a$. We have
$$P\left(\frac{S_n}{n} > a\right) = P(S_n > na) = \int_{na}^{\infty} dF_{S_n}(u) = \left(M_X(s^*)\right)^n \int_{na}^{\infty} e^{-s^* u}\, dF_{\tilde S_n}(u)$$
$$\ge \left(M_X(s^*)\right)^n \int_{na}^{nb} e^{-s^* u}\, dF_{\tilde S_n}(u) \ge \left(M_X(s^*)\right)^n e^{-s^* nb} \int_{na}^{nb} dF_{\tilde S_n}(u)$$
since $e^{-s^* u} \ge e^{-s^* nb}$ for $u \le nb$. Taking logarithms and dividing by $n$,
$$\frac{1}{n}\log_e P\left(\frac{S_n}{n} > a\right) \ge -\left(s^* b - \log_e M_X(s^*)\right) + \frac{1}{n}\log_e\left(P(\tilde S_n \le nb) - P(\tilde S_n \le na)\right).$$
Now according to the CLT, since each $Y_i$ has mean $a$, $P(\tilde S_n \le nb) - P(\tilde S_n \le na) \to \frac{1}{2}$ as $n \to \infty$ for any fixed $b > a$, so the last term vanishes in the limit. Therefore, as $n$ becomes large and noting that we can take $b$ arbitrarily close to $a$,
$$\liminf_{n\to\infty} \frac{1}{n}\log_e P\left(\frac{S_n}{n} > a\right) \ge -\left(s^* a - \log_e M_X(s^*)\right) = -l^*(a).$$
Combining the lower and upper bounds,
$$P\left(\frac{S_n}{n} \ge a\right) \approx e^{-n l^*(a)}$$
in the sense that $\frac{1}{n}\log_e P\left(\frac{S_n}{n} \ge a\right) \to -l^*(a)$.
n
Xi = 1
=0
M X ( s ) = Ee sX = pe s + (1 p )
log M X ( s ) = log (1 p + pe s )
l ( s ) = as log (1 p + pe s )
a (1 p )
p(1 a )
1 a
a
Then, l * (a ) = a log + (1 a) log
p
1 p
l ( s ) is maximum at s = log
In fact, S n =
X
i =1
1 a
a
*
+ (1 a) log
, so we can call l (a ) is the relative entropy or Kullback
p
1
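The rate in this example can be compared with the exact binomial tail. A sketch, computing the tail in log-space (via `lgamma`) to avoid underflow for large $n$; the helper names are illustrative only.

```python
import math

# Check that -(1/n) log P(S_n/n >= a) approaches l*(a), the KL divergence
# between Bernoulli(a) and Bernoulli(p), for the Bernoulli example.
p, a = 0.3, 0.5

def kl(a: float, p: float) -> float:
    """l*(a) = a log(a/p) + (1-a) log((1-a)/(1-p))."""
    return a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))

def log_binom_pmf(n: int, k: int) -> float:
    """log of the Binomial(n, p) pmf, via lgamma to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def rate(n: int) -> float:
    """-(1/n) log P(S_n/n >= a), with the tail summed in log-space."""
    logs = [log_binom_pmf(n, k) for k in range(math.ceil(n * a), n + 1)]
    mx = max(logs)
    log_tail = mx + math.log(sum(math.exp(v - mx) for v in logs))
    return -log_tail / n

for n in (50, 500, 5000):
    print(n, rate(n))          # decreases toward l*(a) as n grows
print("l*(a) =", kl(a, p))     # ≈ 0.0872
```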