Solutions Manual
The creation of this solution manual was one of the most important improvements in the second edition of Probability: Theory and Examples. The solutions are not intended to be as polished as the proofs in the book, but are supposed to give enough of the details so that little is left to the reader's imagination. It is inevitable that some of the many solutions will contain errors. If you find mistakes or better solutions send them via email to rtd1@cornell.edu or via post to Rick Durrett, Dept. of Math., 523 Malott Hall, Cornell U., Ithaca NY 14853.
Rick Durrett
Contents
1 Laws of Large Numbers
1. Basic Definitions
2. Random Variables
3. Expected Value
4. Independence
5. Weak Laws of Large Numbers
6. Borel-Cantelli Lemmas
7. Strong Law of Large Numbers
8. Convergence of Random Series
9. Large Deviations
2 Central Limit Theorems
1. The De Moivre-Laplace Theorem
2. Weak Convergence
3. Characteristic Functions
4. Central Limit Theorems
6. Poisson Convergence
7. Stable Laws
8. Infinitely Divisible Distributions
9. Limit theorems in $R^d$
3 Random walks
1. Stopping Times
4. Renewal theory
4 Martingales
1. Conditional Expectation
2. Martingales, Almost Sure Convergence
3. Examples
4. Doob's Inequality, $L^p$ Convergence
5. Uniform Integrability, Convergence in $L^1$
6. Backwards Martingales
7. Optional Stopping Theorems
5 Markov Chains
1. Definitions and Examples
2. Extensions of the Markov Property
3. Recurrence and Transience
4. Stationary Measures
5. Asymptotic Behavior
6. General State Space
6 Ergodic Theorems
1. Definitions and Examples
2. Birkhoff's Ergodic Theorem
3. Recurrence
6. A Subadditive Ergodic Theorem
7. Applications
7 Brownian Motion
1. Definition and Construction
2. Markov Property, Blumenthal's 0-1 Law
3. Stopping Times, Strong Markov Property
4. Maxima and Zeros
5. Martingales
6. Donsker's Theorem
7. CLTs for Dependent Variables
8. Empirical Distributions, Brownian Bridge
9. Laws of the Iterated Logarithm
Appendix: Measure Theory
1. Lebesgue-Stieltjes Measures
2. Carathéodory's Extension Theorem
3. Completion, etc.
4. Integration
5. Properties of the Integral
6. Product Measures, Fubini's Theorem
8. Radon-Nikodym Theorem
1 Laws of Large Numbers
1.1. Basic Definitions
1.1. (i) $A$ and $B - A$ are disjoint with $B = A \cup (B - A)$ so $P(A) + P(B - A) = P(B)$ and rearranging gives the desired result.
(ii) Let $A'_n = A_n \cap A$, $B_1 = A'_1$ and for $n > 1$, $B_n = A'_n - \bigcup_{m=1}^{n-1} A'_m$. Since the $B_n$ are disjoint and have union $A$ we have using (i) and $B_m \subset A_m$
$$P(A) = \sum_{m=1}^{\infty} P(B_m) \le \sum_{m=1}^{\infty} P(A_m)$$
(iii) Let $B_n = A_n - A_{n-1}$. Then the $B_n$ are disjoint and have $\bigcup_{m=1}^{\infty} B_m = A$, $\bigcup_{m=1}^{n} B_m = A_n$ so
$$P(A) = \sum_{m=1}^{\infty} P(B_m) = \lim_{n\to\infty} \sum_{m=1}^{n} P(B_m) = \lim_{n\to\infty} P(A_n)$$
(iv) $A_n^c \uparrow A^c$ so (iii) implies $P(A_n^c) \uparrow P(A^c)$. Since $P(B^c) = 1 - P(B)$ it follows that $P(A_n) \downarrow P(A)$.
1.2. (i) Suppose $A \in \mathcal{F}_i$ for all $i$. Then since each $\mathcal{F}_i$ is a $\sigma$-field, $A^c \in \mathcal{F}_i$ for each $i$. Suppose $A_1, A_2, \ldots$ is a countable sequence of disjoint sets that are in $\mathcal{F}_i$ for all $i$. Then since each $\mathcal{F}_i$ is a $\sigma$-field, $A = \bigcup_m A_m \in \mathcal{F}_i$ for each $i$.
(ii) We take the intersection of all the $\sigma$-fields containing $\mathcal{A}$. The collection of all subsets of $\Omega$ is a $\sigma$-field so the collection is not empty.
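1.3. It suffices to show that if $\mathcal{F}$ is the $\sigma$-field generated by $(a_1, b_1) \times \cdots \times (a_n, b_n)$, then $\mathcal{F}$ contains (i) the open sets and (ii) all sets of the form $A_1 \times \cdots \times A_n$ where $A_i \in \mathcal{R}$. For (i) note that if $G$ is open and $x \in G$ then there is a set of the form $(a_1, b_1) \times \cdots \times (a_n, b_n)$ with $a_i, b_i \in Q$ that contains $x$ and lies in $G$, so any open set is a countable union of our basic sets. For (ii) fix $A_2, \ldots, A_n$ and let $\mathcal{G} = \{A : A \times A_2 \times \cdots \times A_n \in \mathcal{F}\}$. Since $\mathcal{F}$ is a $\sigma$-field it is easy to see that if $\Omega \in \mathcal{G}$ then $\mathcal{G}$ is a $\sigma$-field so if $\mathcal{G} \supset \mathcal{A}$ then $\mathcal{G} \supset \sigma(\mathcal{A})$. From the last result it follows that if $A_1 \in \mathcal{R}$, $A_1 \times (a_2, b_2) \times \cdots \times (a_n, b_n) \in \mathcal{F}$. Repeating the last argument $n - 1$ more times proves (ii).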
1.4. It is clear that if $A \in \mathcal{F}$ then $A^c \in \mathcal{F}$. Now let $A_i$ be a countable collection of sets. If $A_i^c$ is countable for some $i$ then $(\bigcup_i A_i)^c$ is countable. On the other hand if $A_i$ is countable for each $i$ then $\bigcup_i A_i$ is countable. To check additivity of $P$ now, suppose the $A_i$ are disjoint. If $A_i^c$ is countable for some $i$ then $A_j$ is countable for all $j \ne i$ so $\sum_k P(A_k) = 1 = P(\bigcup_k A_k)$. On the other hand if $A_i$ is countable for each $i$ then $\bigcup_i A_i$ is and $\sum_k P(A_k) = 0 = P(\bigcup_k A_k)$.
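1.5. The sets of the form $(a_1, b_1) \times \cdots \times (a_d, b_d)$ where $a_i, b_i \in Q$ form a countable collection that generates $\mathcal{R}^d$.
1.6. If $B \in \mathcal{R}$ then $\{Z \in B\} = (\{X \in B\} \cap A) \cup (\{Y \in B\} \cap A^c) \in \mathcal{F}$.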
1.7.
$$P(\chi \ge 4) \le (2\pi)^{-1/2}\, 4^{-1} e^{-8} = 3.3458 \times 10^{-5}$$
The lower bound is 15/16 of the upper bound, i.e., $3.137 \times 10^{-5}$.
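As a numerical sanity check on 1.7, the sketch below evaluates both bounds at $x = 4$ and compares them with the exact tail. It assumes the bounds have the form $(x^{-1} - x^{-3})\varphi(x) \le P(\chi \ge x) \le x^{-1}\varphi(x)$, with $\varphi$ the standard normal density; this form is inferred from the 15/16 ratio rather than quoted from the text, and the function names are illustrative only.

```python
import math

def normal_tail(x):
    """P(chi >= x) for a standard normal chi, via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def tail_bounds(x):
    """Return (lower, upper) = ((1/x - 1/x**3) * phi(x), phi(x) / x)."""
    phi = math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
    return (1.0 / x - 1.0 / x**3) * phi, phi / x

lower, upper = tail_bounds(4.0)
exact = normal_tail(4.0)
print(f"lower = {lower:.4e}, exact = {exact:.4e}, upper = {upper:.4e}")
```

The exact tail should land between the two bounds, and the ratio lower/upper should equal $1 - 1/16$.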
1.8. The intervals $(F(x-), F(x))$, $x \in R$ are disjoint and each one that is nonempty contains a rational number.
1.9. Let $\hat F^{-1}(x) = \sup\{y : F(y) \le x\}$ and note that $F(\hat F^{-1}(x)) = x$ when $F$ is continuous. This inverse wears a hat since it is different from the one defined in the proof of (1.2). To prove the result now note that
$$P(F(X) \le x) = P(X \le \hat F^{-1}(x)) = F(\hat F^{-1}(x)) = x$$
1.10. If $y \in (g(\alpha), g(\beta))$ then $P(g(X) \le y) = P(X \le g^{-1}(y)) = F(g^{-1}(y))$. Differentiating with respect to $y$ gives the desired result.
1.11. If $g(x) = e^x$ then $g^{-1}(x) = \log x$ and $g'(g^{-1}(x)) = x$ so using the formula in the previous exercise gives $(2\pi)^{-1/2} e^{-(\log x)^2/2}/x$.
1.12. (i) Let $F(x) = P(X \le x)$. $P(X^2 \le y) = F(\sqrt{y}) - F(-\sqrt{y})$ for $y > 0$. Differentiating we see that $X^2$ has density function
$$(f(\sqrt{y}) + f(-\sqrt{y}))/2\sqrt{y}$$
(ii) In the case of the normal this reduces to $(2\pi y)^{-1/2} e^{-y/2}$.
1.2. Random Variables
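2.1. Let $\mathcal{G}$ be the smallest $\sigma$-field containing $X^{-1}(\mathcal{A})$. Since $\sigma(X)$ is a $\sigma$-field containing $X^{-1}(\mathcal{A})$, we must have $\mathcal{G} \subset \sigma(X)$ and hence $\mathcal{G} = \{\{X \in B\} : B \in \mathcal{F}\}$ for some $\mathcal{S} \supset \mathcal{F} \supset \mathcal{A}$. However, if $\mathcal{G}$ is a $\sigma$-field then we can assume $\mathcal{F}$ is. Since $\mathcal{A}$ generates $\mathcal{S}$, it follows that $\mathcal{F} = \mathcal{S}$.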
2.2. If $\{X_1 + X_2 < x\}$ then there are rational numbers $r_i$ with $r_1 + r_2 < x$ and $X_i < r_i$ so
$$\{X_1 + X_2 < x\} = \bigcup_{r_1, r_2 \in Q:\, r_1 + r_2 < x} \{X_1 < r_1\} \cap \{X_2 < r_2\} \in \mathcal{F}$$
2.3. Let $\Omega_0 = \{\omega : X_n(\omega) \to X(\omega)\}$. If $\omega \in \Omega_0$ it follows from the definition of continuity that $f(X_n(\omega)) \to f(X(\omega))$. Since $P(\Omega_0) = 1$ the desired result follows.
2.4. (i) If $G$ is an open set then $f^{-1}(G)$ is open and hence measurable. Now use $\mathcal{A}$ = the collection of open sets in (2.1).
(ii) Let $G$ be an open set and let $f(x)$ be the distance from $x$ to the complement of $G$, i.e., $\inf\{|x - y| : y \in G^c\}$. $f$ is continuous and $\{f > 0\} = G$, so we need all the open sets to make all the continuous functions measurable.
2.5. If $f$ is l.s.c. and $x_n$ is a sequence of points that converge to $x$ and have $f(x_n) \le a$ then $f(x) \le a$, i.e., $\{x : f(x) \le a\}$ is closed. To argue the converse note that if $\{y : f(y) > a\}$ is open for each $a \in R$ and $f(x) > a$ then it is impossible to have a sequence of points $x_n \to x$ with $f(x_n) \le a$ so $\liminf_{y \to x} f(y) \ge a$ and since $a < f(x)$ is arbitrary, $f$ is l.s.c.
The measurability of l.s.c. functions now follows from Example 2.1. For the other type note that if $f$ is u.s.c. then $-f$ is measurable since it is l.s.c., so $f = -(-f)$ is.
2.6. In view of the previous exercise we can show f
is l.s.c. by showing {x :
f
(x) < a} is
open for each a R so f
B
2m+1,n+1
. If we write $f_n(x)$ out in binary then as $n \to \infty$ we get more digits in the expansion but don't change any of the old ones so $\lim_n f_n(x) = f(x)$ exists. Since $|f_n(X(\omega)) - Y(\omega)| \le 2^{-n}$ and $f_n(X(\omega)) \to f(X(\omega))$ for all $\omega$, $Y = f(X)$.
1.3. Expected Value
3.1. $X - Y \ge 0$ so $E|X - Y| = E(X - Y) = EX - EY = 0$ and using (3.4) it follows that $P(|X - Y| \ge \epsilon) = 0$ for all $\epsilon > 0$.
3.2. (3.1c) is trivial if $EX = \infty$ or $EY = -\infty$. When $EX^+ < \infty$ and $EY^- < \infty$, we have $E|X|, E|Y| < \infty$ since $EX^- \le EY^-$ and $EX^+ \ge EY^+$.
To prove (3.1a) we can without loss of generality suppose $EX^-, EY^- < \infty$ and also that $EX^+ = \infty$ (for if $E|X|, E|Y| < \infty$ the result follows from the theorem). In this case, $E(X + Y)^- \le EX^- + EY^- < \infty$ and $E(X + Y)^+ \ge EX^+ - EY^- = \infty$ so $E(X + Y) = \infty = EX + EY$.
To prove (3.1b) we note that it is easy to see that if $a \ne 0$, $E(aX) = aEX$. To complete the proof now it suffices to show that if $EY = \infty$ then $E(Y + b) = \infty$, which is obvious if $b \ge 0$ and easy to prove by contradiction if $b < 0$.
3.3. Recall the proof of (5.2) in the Appendix. We let $\ell(x) \le \varphi(x)$ be a linear function with $\ell(EX) = \varphi(EX)$ and note that $E\varphi(X) \ge E\ell(X) = \ell(EX)$. If equality holds then Exercise 3.1 implies that $\varphi(X) = \ell(X)$ a.s. When $\varphi$ is strictly convex we have $\varphi(x) > \ell(x)$ for $x \ne EX$ so we must have $X = EX$ a.s.
3.4. There is a linear function
$$\psi(x) = \varphi(EX_1, \ldots, EX_n) + \sum_{i=1}^{n} a_i (x_i - EX_i)$$
so that $\varphi(x) \ge \psi(x)$ for all $x$. Taking expected values and using (3.1c) now gives the desired result.
3.5. (i) Let $P(X = a) = P(X = -a) = b^2/2a^2$, $P(X = 0) = 1 - (b^2/a^2)$.
(ii) As $a \to \infty$ we have $a^2 1_{(|X| \ge a)} \to 0$ a.s. Since all these random variables are smaller than $X^2$, the desired result follows from the dominated convergence theorem.
3.6. (i) First note that $EY = EX$ and $\mathrm{var}(Y) = \mathrm{var}(X)$ implies that $EY^2 = EX^2$ and since $\varphi(x) = (x + b)^2$ is a quadratic that $E\varphi(Y) = E\varphi(X)$. Applying (3.4) we have
$$P(Y \ge a) \le E\varphi(Y)/(a + b)^2 = E\varphi(X)/(a + b)^2 = p$$
(ii) By (i) we want to find $p, b > 0$ so that $ap - b(1 - p) = 0$ and $a^2 p + b^2 (1 - p) = \sigma^2$. Looking at the answer we can guess $p = \sigma^2/(\sigma^2 + a^2)$, pick $b = \sigma^2/a$ so that $EX = 0$ and then check that $EX^2 = \sigma^2$.
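The check at the end of 3.6(ii) is pure arithmetic, so it can be confirmed in exact rational arithmetic. The sketch below is only an illustration: the function name and the sample values $a = 3$, $\sigma^2 = 5$ are arbitrary choices, not taken from the text.

```python
from fractions import Fraction

def two_point_check(a, sigma2):
    """Return (p, b, mean, second moment) for the guessed two-point law in 3.6(ii)."""
    p = sigma2 / (sigma2 + a * a)   # guessed mass at a
    b = sigma2 / a                  # the other atom sits at -b
    mean = a * p - b * (1 - p)
    second_moment = a * a * p + b * b * (1 - p)
    return p, b, mean, second_moment

p, b, mean, second_moment = two_point_check(Fraction(3), Fraction(5))
print(p, b, mean, second_moment)
```

With exact fractions the mean comes out as 0 and the second moment as $\sigma^2$, with no rounding to worry about.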
3.7. (i) Let $P(X = n) = P(X = -n) = 1/2n^2$, $P(X = 0) = 1 - 1/n^2$ for $n \ge 1$.
(ii) Let $P(X = 1 - \epsilon) = 1 - 1/n$ and $P(X = 1 + b) = 1/n$ for $n \ge 2$. To have $EX = 1$, $\mathrm{var}(X) = \sigma^2$ we need
$$-\epsilon(1 - 1/n) + b(1/n) = 0$$
$$\epsilon^2 (1 - 1/n) + b^2 (1/n) = \sigma^2$$
The first equation implies $\epsilon = b/(n - 1)$. Using this in the second we get
$$\sigma^2 = b^2 \frac{1}{n(n - 1)} + b^2 \frac{1}{n} = \frac{b^2}{n - 1}$$
3.8. Cauchy-Schwarz implies
$$\left( E Y 1_{(Y > a)} \right)^2 \le EY^2\, P(Y > a)$$
The left hand side is larger than $(EY - a)^2$ so rearranging gives the desired result.
3.9. $EX_n^{2/\alpha} = n^2 (1/n - 1/(n + 1)) = n/(n + 1) \le 1$. If $Y \ge X_n$ for all $n$ then $EY \ge \sum_{n=1}^{\infty} n^{\alpha - 1}/(n + 1) = \infty$ since $\alpha > 1$.
3.10. If $g = 1_A$ this follows from the definition. Linearity of integration extends the result to simple functions, and then monotone convergence gives the result for nonnegative functions. Finally by taking positive and negative parts we get the result for integrable functions.
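3.11. To see that $1_A = 1 - \prod_{i=1}^{n} (1 - 1_{A_i})$ note that the product is zero if and only if $\omega \in A_i$ for some $i$. Expanding out the product gives
$$1 - \prod_{i=1}^{n} (1 - 1_{A_i}) = \sum_{i=1}^{n} 1_{A_i} - \sum_{i<j} 1_{A_i} 1_{A_j} + \cdots + (-1)^{n-1} \prod_{j=1}^{n} 1_{A_j}$$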
3.12. The first inequality should be clear. To prove the second it suffices to show
$$1_A \ge \sum_{i=1}^{n} 1_{A_i} - \sum_{i<j} 1_{A_i} 1_{A_j}$$
To do this we observe that if $\omega$ is in exactly $m$ of the sets $A_i$ then the right hand side is $m - \binom{m}{2}$ which is $\le 1$ for all $m \ge 1$. For the third inequality it suffices to show
$$1_A \le \sum_{i=1}^{n} 1_{A_i} - \sum_{i<j} 1_{A_i} 1_{A_j} + \sum_{i<j<k} 1_{A_i} 1_{A_j} 1_{A_k}$$
This time if $\omega$ is in exactly $m$ of the sets $A_i$ then the right hand side is
$$m - \frac{m(m - 1)}{2} + \frac{m(m - 1)(m - 2)}{6}$$
We want to show this to be $\ge 1$ when $m \ge 1$. When $m \ge 5$ the third term is $\ge$ the second and this is true. Computing the value when $m = 1, 2, 3, 4$ gives 1, 1, 1, 2 and completes the proof.
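The two counting claims in 3.12 can be checked mechanically for a range of $m$. The sketch below does so for $1 \le m \le 50$; the helper name and the cutoff 50 are illustrative choices.

```python
from math import comb

def bonferroni_sides(m):
    """Right-hand sides of the second and third inequalities when omega lies in exactly m sets."""
    second = m - comb(m, 2)
    third = m - comb(m, 2) + comb(m, 3)
    return second, third

values = [bonferroni_sides(m) for m in range(1, 51)]
first_four_third = [third for _, third in values[:4]]
print(first_four_third)
```

Every `second` value should be at most 1 and every `third` value at least 1, which is what the indicator inequalities need on the event that $\omega$ lies in at least one $A_i$.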
3.13. If $0 < j < k$ then $|x|^j \le 1 + |x|^k$ so $E|X|^k < \infty$ implies $E|X|^j < \infty$. To prove the inequality note that $\varphi(x) = |x|^{k/j}$ is convex and apply (3.2) to $|X|^j$.
3.14. Jensen's inequality implies $\varphi(EX) \le E\varphi(X)$ so the desired result follows by noting $E\varphi(X) = \sum_{m=1}^{n} p(m) y_m$ and
$$\varphi(EX) = \exp\left( \sum_{m=1}^{n} p(m) \log y_m \right) = \prod_{m=1}^{n} y_m^{p(m)}$$
3.15. Let $Y_n = X_n + X_1^-$. Then $Y_n \ge 0$ and $Y_n \uparrow X + X_1^-$ so the monotone convergence theorem implies $E(X_n + X_1^-) \uparrow E(X + X_1^-)$. Using (3.1a) now it follows that $EX_n + EX_1^- \uparrow EX + EX_1^-$. The assumption that $EX_1^- < \infty$ allows us to subtract $EX_1^-$ and get the desired result.
3.16. $(y/X) 1_{(X > y)} \le 1$ and converges to 0 a.s. as $y \to \infty$ so the first result follows from the bounded convergence theorem. To prove the second result, we use our first observation to see that if $0 < y < \epsilon$
$$E(y/X; X > y) \le P(0 < X < \epsilon) + E(y/X; X \ge \epsilon)$$
On $\{X \ge \epsilon\}$, $y/X \le y/\epsilon \le 1$ and $y/X \to 0$ so the bounded convergence theorem implies
$$\limsup_{y \to 0} E(y/X; X > y) \le P(0 < X < \epsilon)$$
and the desired result follows since $\epsilon$ is arbitrary.
3.17. Let $Y_N = \sum_{n=0}^{N} X_n$. Using the monotone convergence theorem, the linearity of expectation, and the definition of the infinite sum of a sequence of nonnegative numbers
$$E\left( \sum_{n=0}^{\infty} X_n \right) = E \lim_{N\to\infty} Y_N = \lim_{N\to\infty} EY_N = \lim_{N\to\infty} \sum_{n=0}^{N} EX_n = \sum_{n=0}^{\infty} EX_n$$
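3.18. Let $Y_n = |X| 1_{A_n}$. Jensen's inequality and the previous exercise imply
$$\sum_{n=0}^{\infty} |E(X; A_n)| \le \sum_{n=0}^{\infty} EY_n = E \sum_{n=0}^{\infty} Y_n \le E|X| < \infty$$
Let $B_n = \bigcup_{m=0}^{n} A_m$, and $X_n = X 1_{B_n}$. As $n \to \infty$, $X 1_{B_n} \to X 1_A$ and $E|X| < \infty$ so the dominated convergence theorem and the linearity of expectation imply
$$E(X; A) = \lim_{n\to\infty} E(X; B_n) = \lim_{n\to\infty} \sum_{m=0}^{n} E(X; A_m)$$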
1.4. Independence
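4.1. (i) If $A \in \sigma(X)$ then it follows from the definition of $\sigma(X)$ that $A = \{X \in C\}$ for some $C \in \mathcal{R}$. Likewise if $B \in \sigma(Y)$ then $B = \{Y \in D\}$ for some $D \in \mathcal{R}$, so using these facts and the independence of $X$ and $Y$,
$$P(A \cap B) = P(X \in C, Y \in D) = P(X \in C) P(Y \in D) = P(A) P(B)$$
(ii) Conversely if $X \in \mathcal{F}$, $Y \in \mathcal{G}$ and $C, D \in \mathcal{R}$ it follows from the definition of measurability that $\{X \in C\} \in \mathcal{F}$, $\{Y \in D\} \in \mathcal{G}$. Since $\mathcal{F}$ and $\mathcal{G}$ are independent, it follows that $P(X \in C, Y \in D) = P(X \in C) P(Y \in D)$.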
4.2. (i) Subtracting $P(A \cap B) = P(A) P(B)$ from $P(B) = P(B)$ shows $P(A^c \cap B) = P(A^c) P(B)$. The second and third conclusions follow by applying the first one to the pairs of independent events $(B, A)$ and $(A, B^c)$.
(ii) If $C, D \in \mathcal{R}$ then $\{1_A \in C\} \in \{\emptyset, A, A^c, \Omega\}$ and $\{1_B \in D\} \in \{\emptyset, B, B^c, \Omega\}$, so there are 16 things to check. When either set involved is $\emptyset$ or $\Omega$ the equality holds, so there are only four cases to worry about and they are all covered by (i).
4.3. (i) Let $B_1 = A_1^c$ and $B_i = A_i$ for $i > 1$. If $I \subset \{1, \ldots, n\}$ does not contain 1 it is clear that $P(\bigcap_{i \in I} B_i) = \prod_{i \in I} P(B_i)$. Suppose now that $1 \in I$ and let $J = I - \{1\}$. Subtracting $P(\bigcap_{i \in I} A_i) = \prod_{i \in I} P(A_i)$ from $P(\bigcap_{i \in J} A_i) = \prod_{i \in J} P(A_i)$ gives $P(A_1^c \cap \bigcap_{i \in J} A_i) = P(A_1^c) \prod_{i \in J} P(A_i)$.
(ii) Iterating (i) we see that if $B_i \in \{A_i, A_i^c\}$ then $B_1, \ldots, B_n$ are independent. Thus if $C_i \in \{A_i, A_i^c, \Omega\}$, $P(\bigcap_{i=1}^{n} C_i) = \prod_{i=1}^{n} P(C_i)$. The last equality holds trivially if some $C_i = \emptyset$, so noting $1_{A_i} \in \{\emptyset, A_i, A_i^c, \Omega\}$ the desired result follows.
4.4. Let $c_m = \int g_m(x_m)\, dx_m$. If some $c_m = 0$ then $g_m = 0$ and hence $f = 0$ a.e., a contradiction. Integrating over the whole space we have $1 = \prod_{m=1}^{n} c_m$ so each $c_m < \infty$. Let $f_m(x) = g_m(x)/c_m$ and $F_m(y) = \int_{-\infty}^{y} f_m(x)\, dx$ for $-\infty < y \le \infty$. Integrating over $\{x : x_m \le y_m, 1 \le m \le n\}$ we have
$$P(X_m \le y_m, 1 \le m \le n) = \prod_{m=1}^{n} F_m(y_m)$$
Taking $y_k = \infty$ for $k \ne m$, it follows that $F_m(y_m) = P(X_m \le y_m)$ and we have checked (4.3).
4.5. The first step is to prove the stronger condition: if $I \subset \{1, \ldots, n\}$ then
$$P(X_i = x_i, i \in I) = \prod_{i \in I} P(X_i = x_i)$$
To prove this, note that if $|I| = n - 1$ this follows by summing over the possible values for the missing index and then use induction. Since $P(X_i \in S_i^c) = 0$, we can check independence by showing that if $A_i \subset S_i$ then
$$P(X_i \in A_i, 1 \le i \le n) = \prod_{i=1}^{n} P(X_i \in A_i)$$
To do this we let $\mathcal{A}_i$ consist of $\Omega$ and all the sets $\{X_i = x\}$ with $x \in S_i$. Clearly, $\mathcal{A}_i$ is a $\pi$-system that contains $\Omega$. Using (4.2) it follows that $\sigma(\mathcal{A}_1), \ldots, \sigma(\mathcal{A}_n)$ are independent. Since for any subset $B_i$ of $S_i$, $\{X_i \in B_i\}$ is in $\sigma(\mathcal{A}_i)$ the desired result follows.
4.6. $EX_n = \int_0^1 \sin(2\pi n x)\, dx = -(2\pi n)^{-1} \cos(2\pi n x) \big|_0^1 = 0$. Integrating by parts twice
$$EX_m X_n = \int_0^1 \sin(2\pi m x) \sin(2\pi n x)\, dx = \frac{m}{n} \int_0^1 \cos(2\pi m x) \cos(2\pi n x)\, dx = \frac{m^2}{n^2} \int_0^1 \sin(2\pi m x) \sin(2\pi n x)\, dx$$
so if $m \ne n$, $EX_m X_n = 0$. To see that $X_m$ and $X_n$ are not independent note that $X_m(x) = 0$ when $x = k/2m$, $0 \le k < 2m$ and on this set $X_n(x)$ takes on the values $V_n = \{y_0, y_1, \ldots, y_{2m-1}\}$. Let $[a, b] \subset [-1, 1] - V_n$ with $a < b$. Continuity of sin implies that if $\epsilon > 0$ is sufficiently small, we have
$$P(X_m \in [0, \epsilon], X_n \in [a, b]) = 0 < P(X_m \in [0, \epsilon]) P(X_n \in [a, b])$$
4.7. (i) Using (4.9) with $z = 0$ and then with $z < 0$ and letting $z \uparrow 0$ and using the bounded convergence theorem, we have
$$P(X + Y \le 0) = \int F(-y)\, dG(y)$$
$$P(X + Y < 0) = \int F(-y-)\, dG(y)$$
where $F(-y-)$ is the left limit at $-y$. Subtracting the two expressions we have
$$P(X + Y = 0) = \int \mu(\{-y\})\, dG(y) = \sum_y \mu(\{-y\}) \nu(\{y\})$$
since the integrand is only positive for at most countably many $y$.
(ii) Applying the result in (i) with $Y$ replaced by $-Y$ and noting $\mu(\{x\}) = 0$ for all $x$ gives the desired result.
4.8. The result is trivial for $n = 1$. If $n > 1$, let $Y_1 = X_1 + \cdots + X_{n-1}$ which is gamma$(n - 1, \lambda)$ by induction, and let $Y_2 = X_n$ which is gamma$(1, \lambda)$. Then use Example 4.3.
4.9. Suppose $Y_1$ = normal$(0, a)$ and $Y_2$ = normal$(0, b)$. Then (4.10) implies
$$f_{Y_1 + Y_2}(z) = \frac{1}{2\pi\sqrt{ab}} \int e^{-x^2/2a}\, e^{-(z - x)^2/2b}\, dx$$
Dropping the constant in front, the integral can be rewritten as
$$\int \exp\left( -\frac{bx^2 + ax^2 - 2axz + az^2}{2ab} \right) dx$$
$$= \int \exp\left( -\frac{a + b}{2ab} \left\{ x^2 - \frac{2a}{a + b} xz + \frac{a}{a + b} z^2 \right\} \right) dx$$
$$= \int \exp\left( -\frac{a + b}{2ab} \left\{ \left( x - \frac{a}{a + b} z \right)^2 + \frac{ab}{(a + b)^2} z^2 \right\} \right) dx$$
since $-\{a/(a + b)\}^2 + \{a/(a + b)\} = ab/(a + b)^2$. Factoring out the term that does not depend on $x$, the last integral
$$= \exp\left( -\frac{z^2}{2(a + b)} \right) \int \exp\left( -\frac{a + b}{2ab} \left( x - \frac{a}{a + b} z \right)^2 \right) dx$$
$$= \exp\left( -\frac{z^2}{2(a + b)} \right) \sqrt{2\pi ab/(a + b)}$$
since the last integral is the normal density with parameters $\mu = az/(a + b)$ and $\sigma^2 = ab/(a + b)$ without its proper normalizing constant. Reintroducing the constant we dropped at the beginning,
$$f_{Y_1 + Y_2}(z) = \frac{1}{2\pi\sqrt{ab}} \sqrt{2\pi ab/(a + b)}\, \exp\left( -\frac{z^2}{2(a + b)} \right)$$
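The final expression in 4.9 simplifies to the normal$(0, a + b)$ density $(2\pi(a + b))^{-1/2} \exp(-z^2/2(a + b))$. A rough numerical confirmation is sketched below: the convolution integral is approximated by a midpoint rule and compared with that density. The integration window, step count, and sample values $a = 2$, $b = 3$, $z = 1.5$ are arbitrary assumptions made for illustration.

```python
import math

def normal_density(x, var):
    """Density of normal(0, var) at x; var is the variance."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def convolved_density(z, a, b, half_width=40.0, steps=20000):
    """Midpoint-rule approximation of the integral of f_a(x) f_b(z - x) dx."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        x = -half_width + (i + 0.5) * h
        total += normal_density(x, a) * normal_density(z - x, b)
    return total * h

approx = convolved_density(1.5, 2.0, 3.0)
target = normal_density(1.5, 5.0)
print(approx, target)
```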
4.10. It is clear that $h(\rho(x, y))$ is symmetric and vanishes only when $x = y$. To check the triangle inequality, we note that
$$h(\rho(x, y)) + h(\rho(y, z)) = \int_0^{\rho(x, y)} h'(u)\, du + \int_0^{\rho(y, z)} h'(u)\, du \ge \int_0^{\rho(x, y) + \rho(y, z)} h'(u)\, du \ge \int_0^{\rho(x, z)} h'(u)\, du$$
$$\sum_m P(X = m, Y = n - m) = \sum_m P(X = m) P(Y = n - m)$$
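4.16. Using 4.15, some arithmetic and then the binomial theorem
$$P(X + Y = n) = \sum_{m=0}^{n} e^{-\lambda} \frac{\lambda^m}{m!}\, e^{-\mu} \frac{\mu^{n-m}}{(n - m)!} = e^{-(\lambda + \mu)} \frac{1}{n!} \sum_{m=0}^{n} \frac{n!}{m!(n - m)!} \lambda^m \mu^{n-m} = e^{-(\lambda + \mu)} \frac{(\lambda + \mu)^n}{n!}$$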
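The identity in 4.16 is easy to confirm numerically. The sketch below convolves two Poisson mass functions term by term and compares the result with the Poisson mass function for the summed rate; the rates 1.3 and 2.1 and the helper names are arbitrary illustrative choices.

```python
import math

def poisson_pmf(k, rate):
    """P(N = k) for a Poisson random variable N with the given rate."""
    return math.exp(-rate) * rate**k / math.factorial(k)

def convolution_pmf(n, lam, mu):
    """P(X + Y = n) computed directly from the convolution sum."""
    return sum(poisson_pmf(m, lam) * poisson_pmf(n - m, mu) for m in range(n + 1))

pairs = [(convolution_pmf(n, 1.3, 2.1), poisson_pmf(n, 3.4)) for n in range(15)]
print(pairs[:3])
```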
4.17. (i) Using 4.15, some arithmetic and the observation that in order to pick $k$ objects out of $n + m$ we must pick $j$ from the first $n$ for some $0 \le j \le k$ we have
$$P(X + Y = k) = \sum_{j=0}^{k} \binom{n}{j} p^j (1 - p)^{n-j} \binom{m}{k - j} p^{k-j} (1 - p)^{m-(k-j)}$$
$$= p^k (1 - p)^{n+m-k} \sum_{j=0}^{k} \binom{n}{j} \binom{m}{k - j} = \binom{n + m}{k} p^k (1 - p)^{n+m-k}$$
(ii) Let $\xi_1, \xi_2, \ldots$ be independent Bernoulli($p$). We will prove by induction that $S_n = \xi_1 + \cdots + \xi_n$ has a Binomial($n, p$) distribution. This is trivial if $n = 1$. To do the induction step note $X = S_{n-1}$ and $Y = \xi_n$ and use (i).
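The counting observation in 4.17(i) is Vandermonde's identity, $\sum_j \binom{n}{j}\binom{m}{k-j} = \binom{n+m}{k}$. A brute-force check over small parameters is sketched below; the range of $n$ and $m$ is an arbitrary choice for illustration.

```python
from math import comb

def vandermonde_sum(n, m, k):
    """Sum over j of C(n, j) * C(m, k - j); comb returns 0 when the lower index exceeds the upper."""
    return sum(comb(n, j) * comb(m, k - j) for j in range(k + 1))

mismatches = [
    (n, m, k)
    for n in range(7)
    for m in range(7)
    for k in range(n + m + 1)
    if vandermonde_sum(n, m, k) != comb(n + m, k)
]
print(mismatches)
```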
4.18. (a) When $k = 0, 1, 2, 3, 4$, $P(X + Y = k) = 1/9, 2/9, 3/9, 2/9, 1/9$.
(b) We claim that the joint distribution must be
X\Y    0          1          2
2      a          2/9 - a    1/9
1      2/9 - a    1/9        a
0      1/9        a          2/9 - a
where $0 \le a \le 2/9$. To prove this let $a_{ij} = P(X = i, Y = j)$. $P(X + Y = 0) = 1/9$ implies $a_{00} = 1/9$. Let $a_{01} = a$. $P(X = 0) = 1/3$ implies $a_{02} = 2/9 - a$. $P(X + Y = 1) = 2/9$ implies $a_{10} = 2/9 - a$. $P(Y = 0) = 1/3$ implies $a_{20} = a$. $P(X + Y = 2) = 1/3$ implies $a_{11} = 1/9$. Using the fact that the row and column sums are 1/3 one can now fill in the rest of the table.
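The table in 4.18(b) can be checked in exact arithmetic: for any admissible $a$ the marginals should be uniform on $\{0, 1, 2\}$ and the law of $X + Y$ should match part (a). The sketch below does this for a few values of $a$; the function names and the chosen values are illustrative.

```python
from fractions import Fraction

NINTH = Fraction(1, 9)

def joint_table(a):
    """Joint probabilities a_ij = P(X = i, Y = j) from 4.18(b), keyed by (i, j)."""
    return {
        (0, 0): NINTH, (0, 1): a, (0, 2): 2 * NINTH - a,
        (1, 0): 2 * NINTH - a, (1, 1): NINTH, (1, 2): a,
        (2, 0): a, (2, 1): 2 * NINTH - a, (2, 2): NINTH,
    }

def sum_distribution(table):
    """Distribution of X + Y under the given joint table."""
    dist = {k: Fraction(0) for k in range(5)}
    for (i, j), prob in table.items():
        dist[i + j] += prob
    return dist

def marginals(table):
    """Row (X) and column (Y) marginals as two lists of length 3."""
    rows = [sum(table[(i, j)] for j in range(3)) for i in range(3)]
    cols = [sum(table[(i, j)] for i in range(3)) for j in range(3)]
    return rows, cols

expected = {0: NINTH, 1: 2 * NINTH, 2: 3 * NINTH, 3: 2 * NINTH, 4: NINTH}
test_values = [Fraction(0), Fraction(1, 18), NINTH, 2 * NINTH]
results = [(sum_distribution(joint_table(a)), marginals(joint_table(a))) for a in test_values]
```

Taking $a = 1/9$ gives the independent case; any other admissible $a$ gives a dependent pair with the same distribution for the sum.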
4.19. If we let $h(x, y) = 1_{(xy \le z)}$ in (4.7) then it follows that
$$P(XY \le z) = \iint 1_{(xy \le z)}\, dF(x)\, dG(y) = \int F(z/y)\, dG(y)$$
4.20. Let $i_1, i_2, \ldots, i_n \in \{0, 1\}$ and $x = \sum_{m=1}^{n} i_m 2^{-m}$.
$$P(Y_1 = i_1, \ldots, Y_n = i_n) = P(\omega \in [x, x + 2^{-n})) = 2^{-n}$$
1.5. Weak Laws of Large Numbers
5.1. First note that $\mathrm{var}(X_m)/m \to 0$ implies that for any $\epsilon > 0$ there is an $A < \infty$ so that $\mathrm{var}(X_m) \le A + \epsilon m$. Using this estimate and the fact that $\sum_{m=1}^{n} m \le \sum_{m=1}^{n} (2m - 1) = n^2$
$$E(S_n/n - \nu_n)^2 = \frac{1}{n^2} \sum_{m=1}^{n} \mathrm{var}(X_m) \le A/n + \epsilon$$
Since $\epsilon$ is arbitrary this shows the $L^2$ convergence of $S_n/n - \nu_n$ to 0, and convergence in probability follows from (5.3).
5.2. Let $\epsilon > 0$ and pick $K$ so that if $k \ge K$ then $r(k) \le \epsilon$. Noting that Cauchy-Schwarz implies $EX_i X_j \le (EX_i^2 EX_j^2)^{1/2} = EX_k^2 = r(0)$ and breaking the sum into $|i - j| \le K$ and $|i - j| > K$ we have
$$ES_n^2 = \sum_{1 \le i, j \le n} EX_i X_j \le n(2K + 1) r(0) + n^2 \epsilon$$
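$$\sum_{k=n+1}^{\infty} \frac{C}{k^2 \log k} \le \frac{C}{n \log n}$$
so $nP(|X_i| > n) \to 0$ and (5.6) can be applied.
$$E|X_i| = \sum_{k=2}^{\infty} C/k \log k = \infty$$
but the truncated mean
$$\mu_n = EX_i 1_{(|X_i| \le n)} = \sum_{k=2}^{n} (-1)^k \frac{C}{k \log k} \to \sum_{k=2}^{\infty} (-1)^k \frac{C}{k \log k}$$
since the latter is an alternating series with decreasing terms (for $k \ge 3$).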
5.5. $nP(X_i > n) = e/\log n \to 0$ so (5.6) can be applied. The truncated mean
$$\mu_n = EX_i 1_{(X_i \le n)} = \int_e^n \frac{e}{x \log x}\, dx = e \log\log x \Big|_e^n = e \log\log n$$
so $S_n/n - e \log\log n \to 0$ in probability.
5.6. Clearly, $X = \sum_{n=1}^{X} 1 = \sum_{n=1}^{\infty} 1_{(X \ge n)}$ so taking expected values proves (i). For (ii) we consider the squares $[0, k]^2$ to get $X^2 = \sum_{n=1}^{\infty} (2n - 1) 1_{(X \ge n)}$ and then take expected values to get the desired formula.
5.7. Note $H(X) = \int h(y) 1_{(X \ge y)}\, dy$ and take expected values.
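The two pointwise identities behind 5.6 can be checked directly for nonnegative integer values of $X$. In the sketch below the infinite sums are truncated at a cutoff larger than every value tested; the cutoff and the helper names are illustrative assumptions.

```python
def sum_of_indicators(x, upper=1000):
    """Sum over n = 1..upper of 1_(x >= n); equals x whenever 0 <= x <= upper."""
    return sum(1 for n in range(1, upper + 1) if x >= n)

def weighted_sum_of_indicators(x, upper=1000):
    """Sum over n = 1..upper of (2n - 1) 1_(x >= n); equals x**2 whenever 0 <= x <= upper."""
    return sum(2 * n - 1 for n in range(1, upper + 1) if x >= n)

checks = [(sum_of_indicators(x), weighted_sum_of_indicators(x), x) for x in range(200)]
```

The weights $2n - 1 = n^2 - (n - 1)^2$ are the areas of the L-shaped strips added when the square $[0, n - 1]^2$ grows to $[0, n]^2$.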
5.8. Let $m(n) = \inf\{m : 2^{-m} m^{-3/2} \le n^{-1}\}$, $b_n = 2^{m(n)}$. Replacing $k(k + 1)$ by $m(m + 1)$ and summing we have
$$P(X_i > 2^m) \le \sum_{k=m+1}^{\infty} \frac{1}{2^k m(m + 1)} = \frac{2^{-m}}{m(m + 1)}$$
$$nP(X_i > b_n) \le n 2^{-m(n)}/m(n)(m(n) + 1) \le (m(n) + 1)^{-1/2} \to 0$$
To check (ii) in (5.5) now, we let $\bar X = X 1_{(|X| \le 2^{m(n)})}$ and observe
$$E\bar X^2 \le 1 + \sum_{k=1}^{m(n)} 2^{2k} \cdot \frac{1}{2^k k(k + 1)}$$
To estimate the sum divide it into $k \ge m(n)/2$ and $1 \le k < m(n)/2$ and replace $k$ by the smallest value in each piece to get
$$\le 1 + \sum_{k=1}^{m(n)/2} 2^k + \frac{4}{m(n)^2} \sum_{k=m(n)/2}^{m(n)} 2^k \le 1 + 2 \cdot 2^{m(n)/2} + 8 \cdot 2^{m(n)}/m(n)^2 \le C 2^{m(n)}/m(n)^2$$
Using this inequality it follows that
$$\frac{nE\bar X^2}{b_n^2} \le \frac{C 2^{m(n)}}{m(n)^2} \cdot \frac{n}{2^{2m(n)}} \le \frac{C}{m(n)^{1/2}} \to 0$$
The last detail is to compute
$$a_n = E(\bar X) = -\sum_{k=m(n)+1}^{\infty} (2^k - 1) \frac{1}{2^k k(k + 1)} = -\sum_{k=m(n)+1}^{\infty} \left( \frac{1}{k} - \frac{1}{k + 1} \right) + \sum_{k=m(n)+1}^{\infty} \frac{1}{2^k k(k + 1)}$$
$$= -\frac{1}{m(n) + 1} + \sum_{k=m(n)+1}^{\infty} \frac{1}{2^k k(k + 1)} \sim -\frac{1}{m(n)} \sim -\frac{1}{\log_2 n}$$
From the definition of $b_n$ it follows that $2^{m(n)-1} \le n/m^{3/2} \sim n/(\log_2 n)^{3/2}$ so (5.5) implies
$$\frac{S_n + n/\log_2 n}{n/(\log_2 n)^{3/2}} \to 0$$
5.9. $n\mu(s)/s \to 0$ as $s \to \infty$ and for large $n$ we have $n\mu(1) > 1$, so we can define $b_n = \inf\{s \ge 1 : n\mu(s)/s \le 1\}$. Since $n\mu(s)/s$ only jumps up (at atoms of $F$), we have $n\mu(b_n) = b_n$. To check the assumptions of (5.5) now, we note that $n = b_n/\mu(b_n)$ so
$$nP(|X_k| > b_n) = \frac{b_n (1 - F(b_n))}{\mu(b_n)} = \frac{1}{\nu(b_n)} \to 0$$
since $b_n \to \infty$ as $n \to \infty$. To check (ii), we observe
$$\int_0^{b_n} \mu(x)\, dx \le b_n \mu(b_n) = b_n^2/n$$
So using (5.7) with $p = 2$
$$b_n^{-2}\, nE\bar X_{n,k}^2 \le \frac{\int_0^{b_n} 2x(1 - F(x))\, dx}{\int_0^{b_n} \mu(x)\, dx} \to 0$$
since $\nu(s) \to \infty$ as $s \to \infty$. To derive the desired result now we note that $a_n = n\mu(b_n) = b_n$.
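1.6. Borel-Cantelli Lemmas
6.1. Let $\epsilon > 0$. Pick $N$ so that $P(|X| > N) \le \epsilon$, then pick $\delta < 1$ so that if $x, y \in [-N - 1, N + 1]$ and $|x - y| < \delta$ then $|f(x) - f(y)| < \epsilon$.
$$P(|f(X_n) - f(X)| > \epsilon) \le P(|X| > N) + P(|X - X_n| > \delta)$$
so $\limsup_{n\to\infty} P(|f(X_n) - f(X)| > \epsilon) \le \epsilon$. Since $\epsilon$ is arbitrary the desired result follows.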
6.2. Pick $n_k$ so that $EX_{n_k} \to \liminf_{n\to\infty} EX_n$. By (6.2) there is a further subsequence $X_{n(m_k)}$ so that $X_{n(m_k)} \to X$ a.s. Using Fatou's lemma and the choice of $n_k$ it follows that
$$EX \le \liminf_{k\to\infty} EX_{n(m_k)} = \liminf_{n\to\infty} EX_n$$
6.3. If $X_{n(m)}$ is a subsequence there is a further subsequence so that $X_{n(m_k)} \to X$ a.s. We have $EX_{n(m_k)} \to EX$ by (a) (3.7) or (b) (3.8). Using (6.3) it follows that $EX_n \to EX$.
6.4. Let $\varphi(z) = |z|/(1 + |z|)$. (i) Since $\varphi(z) > 0$ for $z \ne 0$, $E\varphi(X - Y) = 0$ implies $\varphi(X - Y) = 0$ a.s. and hence $X = Y$ a.s. (ii) is obvious. (iii) follows by noting that Exercise 4.10 implies $\varphi(X - Y) + \varphi(Y - Z) \ge \varphi(X - Z)$ and then taking expected value. To check (b) note that if $X_n \to X$ in probability then since $\varphi \le 1$, Exercise 6.3 implies $d(X_n, X) = E\varphi(X_n - X) \to 0$. To prove the converse let $\epsilon > 0$ and note that Chebyshev's inequality implies
$$P(|X_n - X| > \epsilon) \le d(X_n, X)/\varphi(\epsilon) \to 0$$
6.5. Pick $N_k$ so that if $m, n \ge N_k$ then $d(X_m, X_n) \le 2^{-k}$. Given a subsequence $X_{n(m)}$ pick $m_k$ increasing so that $n(m_k) \ge N_k$. Using Chebyshev's inequality with $\varphi(z) = z/(1 + z)$ we have
$$P(|X_{n(m_k)} - X_{n(m_{k+1})}| > k^{-2}) \le (k^2 + 1) 2^{-k}$$
The right hand side is summable so the Borel-Cantelli lemma implies that for large $k$, we have $|X_{n(m_k)} - X_{n(m_{k+1})}| \le k^{-2}$. Since $\sum_k k^{-2} < \infty$ this and the triangle inequality imply that $X_{n(m_k)}$ converges a.s. to a limit $X$. To see
that the limit does not depend on the subsequence note that if X
n
(m
k
)
X
(m
k
)
) 0, and the bounded
convergence theorem implies d(X, X
2
(EX
n
)
2
Bn
2
(a
2
/2)n
2
= Cn
2
If we let n
k
= [k
2/(2)
] + 1 and T
k
= X
n
k
then the last result says
P(T
k
ET
k
 > ET
k
) Ck
2
so the Borel Cantelli lemma implies T
k
/ET
k
1 almost surely. Since we have
ET
k+1
/ET
k
1 the rest of the proof is the same as in the proof of (6.8).
6.8. Exercise 4.16 implies that we can subdivide $X_n$ with large $\lambda_n$ into several independent Poissons with mean $\le 1$ so we can suppose without loss of generality that $\lambda_n \le 1$. Once we do this and notice that for a Poisson $\mathrm{var}(X_m) = EX_m$ the proof is almost the same as that of (6.8).
6.9. The events $\{\ell_n = 0\} = \{X_n = 0\}$ are independent and have probability 1/2, so the second Borel-Cantelli lemma implies that $P(\ell_n = 0 \text{ i.o.}) = 1$. To prove the other result let $r_1 = 1$, $r_2 = 2$ and $r_n = r_{n-1} + [\log_2 n]$. Let $A_n = \{X_m = 1 \text{ for } r_{n-1} < m \le r_n\}$. $P(A_n) \ge 1/n$, so it follows from the second Borel-Cantelli lemma that $P(A_n \text{ i.o.}) = 1$, and hence $\ell_{r_n} \ge [\log_2 n]$ i.o. Since $r_n \le n \log_2 n$ we have
$$\frac{\ell_{r_n}}{\log_2(r_n)} \ge \frac{[\log_2 n]}{\log_2 n + \log_2 \log_2 n}$$
infinitely often and the desired result follows.
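6.10. Pick $\epsilon_n \downarrow 0$ and pick $c_n$ so that $P(|X_n| > \epsilon_n c_n) \le 2^{-n}$. Since $\sum_n 2^{-n} < \infty$, the Borel-Cantelli lemma implies $P(|X_n/c_n| > \epsilon_n \text{ i.o.}) = 0$.
6.11. (i) Let $B_n = A_n^c \cap A_{n+1}$ and note that as $n \to \infty$
$$P\left( \bigcup_{m=n}^{\infty} A_m \right) \le P(A_n) + \sum_{m=n}^{\infty} P(B_m) \to 0$$
(ii) Let $A_n = [0, \epsilon_n)$ where $\epsilon_n \downarrow 0$ and $\sum_n \epsilon_n = \infty$. The Borel-Cantelli lemma cannot be applied but $P(A_n) \to 0$ and $P(A_n^c \cap A_{n+1}) = 0$ for all $n$.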
6.12. Since the events $A_m^c$ are independent
$$P\left( \bigcap_{m=1}^{n} A_m^c \right) = \prod_{m=1}^{n} (1 - P(A_m))$$
If $P(\bigcup_m A_m) = 1$ then the infinite product is 0, but when $P(A_m) < 1$ for all $m$ this implies $\sum P(A_m) = \infty$ (see Lemma) and the result follows from the second Borel-Cantelli lemma.
Lemma. If $P(A_m) < 1$ for all $m$ and $\sum_m P(A_m) < \infty$ then
$$\prod_{m=1}^{\infty} (1 - P(A_m)) > 0$$
To prove this note that if $\sum_{k=1}^{n} P(A_k) < 1$ and $\pi_m = \prod_{k=1}^{m} (1 - P(A_k))$ then
$$1 - \pi_n = \sum_{m=1}^{n} (\pi_{m-1} - \pi_m) \le \sum_{k=1}^{n} P(A_k) < 1$$
so if $\sum_{m=M}^{\infty} P(A_m) < 1$ then $\prod_{m=M}^{\infty} (1 - P(A_m)) > 0$. If $P(A_m) < 1$ for all $m$ then $\prod_{m=1}^{M} (1 - P(A_m)) > 0$ and the desired result follows.
6.13. If $\sum_n P(X_n > A) < \infty$ then $P(X_n > A \text{ i.o.}) = 0$ and $\sup_n X_n < \infty$. Conversely, if $\sum_n P(X_n > A) = \infty$ for all $A$ then $P(X_n > A \text{ i.o.}) = 1$ for all $A$ and $\sup_n X_n = \infty$.
6.14. Note that if $0 < \epsilon < 1$ then $P(|X_n| > \epsilon) = p_n$. (i) then follows immediately, and (ii) from the fact that the two Borel-Cantelli lemmas imply $P(|X_n| > \epsilon \text{ i.o.})$ is 0 or 1 according as $\sum_n p_n < \infty$ or $= \infty$.
6.15. The answers are (i) EY
i
 < , (ii) EY
+
i
< , (iii) nP(Y
i
> n) 0, (iv)
P(Y
i
 < ) = 1.
(i) If EY
i
 < then
n
P(Y
n
 > n) < for all n so Y
n
/n 0 a.s.
Conversely if EY
i
 = then
n
P(Y
n
 > n) = so Y
n
/n 1 i.o.
(ii) If EY
+
i
< then
n
P(Y
n
> n) < for all n so limsup
n
Y
+
n
/n 0
a.s., and it follows that max
1mn
Y
m
/n 0 Conversely if EY
+
i
= then
n
P(Y
n
> n) = so Y
n
/n 1 i.o.
(iii) P(max
1mn
Y
m
n) nP(Y
i
n) 0. Now, if nP(Y
i
> n) 0 we
can nd a (0, 1), n
k
and m
k
n
k
so that m
k
P(Y
i
> n
k
) . Using
the second Bonferroni inequality we have
P
_
max
1mm
k
Y
m
> n
k
_
m
k
P(Y
i
> n
k
)
_
m
k
2
_
P(Y
i
> n
k
)
2
2
/2 > 0
(iv) P(Y
n
/n > ) = P(Y
n
 > n) 0 if P(Y
i
 < ) = 1.
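6.16. Note that we can pick $\delta_n \to 0$ so that $P(|X_n - X| > \delta_n) \to 0$. Let $\omega \in \Omega$ with $P(\omega) = p > 0$. For large $n$ we have $P(|X_n - X| > \delta_n) \le p/2$ so $|X_n(\omega) - X(\omega)| \le \delta_n \to 0$. If $\Omega_0 = \{\omega : P(\{\omega\}) > 0\}$ then $P(\Omega_0) = 1$ so we have proved the desired result.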
6.17. If $m$ is an integer $P(X_n \ge 2^m) = 2^{-m+1}$ so taking $x_n = \log_2(K n \log_2 n)$ and $m_n = [x_n] + 1 \le x_n + 1$ we have $P(X_n > 2^{x_n}) \ge 2^{-x_n} = 1/K n \log_2 n$. Since $\sum_n 1/n \log_2 n = \infty$ the second Borel-Cantelli lemma implies that with probability one $X_n > 2^{x_n}$ i.o. Since $K$ is arbitrary the desired result follows.
6.18. (i) $P(X_n \ge \log n) = 1/n$ and these events are independent so the second Borel-Cantelli lemma implies $P(X_n \ge \log n \text{ i.o.}) = 1$. On the other hand $P(X_n \ge (1 + \epsilon) \log n) = 1/n^{1+\epsilon}$ so the first Borel-Cantelli lemma implies $P(X_n \ge (1 + \epsilon) \log n \text{ i.o.}) = 0$.
(ii) The first result implies that if $\epsilon > 0$ then $X_n \le (1 + \epsilon) \log n$ for large $n$ so $\limsup_{n\to\infty} M_n/\log n \le 1$. On the other hand if $\epsilon > 0$
$$P(M_n < (1 - \epsilon) \log n) = (1 - n^{-(1-\epsilon)})^n \le e^{-n^{\epsilon}}$$
kn
1
A
k
and Y
n
= X
n
/EX
n
. Our hypothesis implies
limsup
n
1/EY
2
n
=
Letting a = in Exercise 3.8 and noting EY
n
= 1 we have
P(Y
n
> ) (1 )
2
/EY
2
n
so using the denition of Y
n
and Exercise 6.6 we have
P(A
n
i.o.) P(limsup Y
n
> ) limsup P(Y
n
> ) (1 )
2
R
t
t
S
N(t)+1
S
N(t)
+T
N(t)
To handle the lefthand side we note
S
N(t)
N(t)
N(t) + 1
S
N(t)+1
+T
N(t)+1
N(t)
N(t) + 1
EX
1
1
EX
1
+EY
1
1
A similar argument handles the righthand side and completes the proof.
7.3. Our assumptions imply $|X_n| = U_1 \cdots U_n$ where the $U_i$ are i.i.d. with $P(U_i \le r) = r^2$ for $0 \le r \le 1$.
$$\frac{1}{n} \log |X_n| = \frac{1}{n} \sum_{m=1}^{n} \log U_m \to E \log U_m$$
by the strong law of large numbers. To compute the constant we note
$$E \log U_m = \int_0^1 2r \log r\, dr = (r^2 \log r - r^2/2) \Big|_0^1 = -1/2$$
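The constant in 7.3 can be confirmed by numerical integration. The sketch below approximates $\int_0^1 2r \log r\, dr$ with a midpoint rule, which never evaluates the integrand at $r = 0$; the step count and function name are arbitrary illustrative choices.

```python
import math

def expected_log_u(steps=100000):
    """Midpoint-rule approximation of the integral of 2 r log r over [0, 1]."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * h          # midpoint of the i-th subinterval, always > 0
        total += 2.0 * r * math.log(r)
    return total * h

value = expected_log_u()
print(value)
```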
7.4. (i) The strong law of large numbers implies
$$n^{-1} \log W_n \to c(p) = E \log(ap + (1 - p) V_n)$$
(ii) Differentiating we have
$$c'(p) = E\left( \frac{a - V_n}{ap + (1 - p) V_n} \right)$$
$$c''(p) = -E\left( \frac{(a - V_n)^2}{(ap + (1 - p) V_n)^2} \right) < 0$$
(iii) In order to have a maximum in (0, 1) we need c
nm
S
n
 m
p
_
Cm
/m
2p
When (2p 1) > 1 the right hand side is summable and the desired result
follows from the BorelCantelli lemma.
8.2. $E|X|^p = \infty$ implies $\sum_{n=1}^{\infty} P(|X_n| > n^{1/p}) = \infty$ which in turn implies that $|X_n| \ge n^{1/p}$ i.o. The desired result now follows from
$$\max\{|S_{n-1}|, |S_n|\} \ge |X_n|/2$$
8.3. $Y_n = X_n \sin(n\pi t)/n$ has mean 0 and variance $\le 1/n^2$. Since $\sum_{n=1}^{\infty} \mathrm{var}(Y_n) < \infty$ the desired result follows from (8.3).
8.4. (i) follows from (8.3) and (8.5). For (ii) let
$$P(X_n = n) = P(X_n = -n) = \sigma_n^2/2n^2 \qquad P(X_n = 0) = 1 - \sigma_n^2/n^2$$
$\sum_{n=1}^{\infty} \sigma_n^2/2n^2 = \infty$ implies $P(X_n \ge n \text{ i.o.}) = 1$.
8.5. To prove that (i) is equivalent to (ii) we use Kolmogorov's three series theorem (8.4) with $A = 1$ and note that if $Y_n = X_n 1_{(X_n \le 1)}$ then $\mathrm{var}(Y_n) \le EY_n^2 \le EY_n$. To see that (ii) is equivalent to (iii) note
$$\frac{X_n}{1 + X_n} \le X_n 1_{(X_n \le 1)} + 1_{(X_n > 1)} \le \frac{2X_n}{1 + X_n}$$
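8.6. We check the convergence of the three series in (8.4)
$$\sum_{n=1}^{\infty} P(|X_n| > 1) \le \sum_{n=1}^{\infty} E|X_n| 1_{(|X_n| > 1)} < \infty$$
Let $Y_n = X_n 1_{(|X_n| \le 1)}$. $EX_n = 0$ implies $EY_n = -EX_n 1_{(|X_n| > 1)}$ so
$$\sum_{n=1}^{\infty} |EY_n| \le \sum_{n=1}^{\infty} E|X_n| 1_{(|X_n| > 1)} < \infty$$
Last and easiest we have
$$\sum_{n=1}^{\infty} \mathrm{var}(Y_n) \le \sum_{n=1}^{\infty} E|X_n|^2 1_{(|X_n| \le 1)} < \infty$$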
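8.7. We check the convergence of the three series in (8.4).
$$\sum_{n=1}^{\infty} P(|X_n| > 1) \le \sum_{n=1}^{\infty} E|X_n|^{p(n)} < \infty$$
Let $Y_n = X_n 1_{(|X_n| \le 1)}$. If $0 < p(n) \le 1$, $|Y_n| \le |X_n|^{p(n)}$ so $|EY_n| \le E|X_n|^{p(n)}$. If $p(n) > 1$ then $EX_n = 0$ implies $EY_n = -EX_n 1_{(|X_n| > 1)}$ so we again have $|EY_n| \le E|X_n|^{p(n)}$ and it follows that
$$\sum_{n=1}^{\infty} |EY_n| \le \sum_{n=1}^{\infty} E|X_n|^{p(n)} < \infty$$
Last and easiest we have
$$\sum_{n=1}^{\infty} \mathrm{var}(Y_n) \le \sum_{n=1}^{\infty} EY_n^2 \le \sum_{n=1}^{\infty} E|X_n|^{p(n)} < \infty$$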
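8.8. If $E \log^+ |X_1| = \infty$ then for any $K < \infty$, $\sum_{n=1}^{\infty} P(\log^+ |X_n| > Kn) = \infty$, so $|X_n| > e^{Kn}$ i.o. and the radius of convergence is 0.
If $E \log^+ |X_1| < \infty$ then for any $\epsilon > 0$, $\sum_{n=1}^{\infty} P(\log^+ |X_n| > \epsilon n) < \infty$, so $|X_n| \le e^{\epsilon n}$ for large $n$ and the radius of convergence is $\ge e^{-\epsilon}$. If the $X_n$ are not $\equiv 0$ then $P(|X_n| > \delta \text{ i.o.}) = 1$ and $\sum_{n=1}^{\infty} |X_n| \cdot 1^n = \infty$.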
8.9. Let $A_k = \{|S_{m,k}| > 2a, |S_{m,j}| \le 2a, m \le j < k\}$ and let $G_k = \{|S_{k,n}| \le a\}$. Since the $A_k$ are disjoint, $A_k \cap G_k \subset \{|S_{m,n}| > a\}$, and $A_k$ and $G_k$ are independent
$$P(|S_{m,n}| > a) \ge \sum_{k=m+1}^{n} P(A_k \cap G_k) = \sum_{k=m+1}^{n} P(A_k) P(G_k) \ge \min_{m < k \le n} P(G_k) \sum_{k=m+1}^{n} P(A_k)$$
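8.10. Let $S_{k,n} = S_n - S_k$. Convergence of $S_n$ to $S_\infty$ in probability and $|S_{k,n}| \le |S_k - S_\infty| + |S_\infty - S_n|$ imply
$$\min_{m \le k \le n} P(|S_{k,n}| \le a) \to 1$$
as $m, n \to \infty$. Since $P(|S_{m,n}| > a) \to 0$, ($\ast$) implies
$$P\left( \max_{m < j \le n} |S_{m,j}| > 2a \right) \to 0$$
As at the end of the proof of (8.3) this implies that with probability 1, $S_m(\omega)$ is a Cauchy sequence and converges a.s.
8.11. Let $S_{k,n} = S_n - S_k$. Convergence of $S_n/n$ to 0 in probability and $|S_{k,n}| \le |S_k| + |S_n|$ imply that if $\epsilon > 0$ then
$$\min_{0 \le k \le n} P(|S_{k,n}| \le n\epsilon) \to 1$$
as $n \to \infty$. Since $P(|S_n| > n\epsilon) \to 0$, ($\ast$) with $m = 0$ implies
$$P\left( \max_{0 < j \le n} |S_j| > 2n\epsilon \right) \to 0$$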
8.12. (i) Let S
k,n
= S
n
S
k
. Convergence of S
n
/a(n) to 0 in probability and
S
k,
 S
k
 +S
2
(log
2
n)
1+2
Noting a(2
n
) = 2
n/2
n
1/2+
we have
P(S(2
n
) > 2
n/2
n
1/2+
)
2
2
n
12
and the desired result follows from the BorelCantell lemma.
1.9. Large Deviations
9.1. Taking $n = 1$ in (9.2) we see that $\gamma(a) = -\infty$ implies $P(X_1 \ge a) = 0$. If $S_n \ge na$ then $X_m \ge a$ for some $m \le n$ so (b) implies (c). Finally if $P(S_n \ge na) = 0$ for all $n$ then $\gamma(a) = -\infty$.
9.2. Suppose $n = km$ where $m\lambda$ is an integer.
$$P(S_n \ge n\{\lambda a + (1 - \lambda) b\}) \ge P(S_{n\lambda} \ge n\lambda a)\, P(S_{n(1-\lambda)} \ge n(1 - \lambda) b)$$
Taking $(1/n) \log$ of both sides and letting $k \to \infty$ gives
$$\gamma(\lambda a + (1 - \lambda) b) \ge \lambda \gamma(a) + (1 - \lambda) \gamma(b)$$
If, without loss of generality $a < b$ then letting $q_n \uparrow \lambda$ where $q_n$ are rationals and using monotonicity extends the result to irrational $\lambda$. For a concave function $f$, increasing $a$ or $h > 0$ decreases $(f(a + h) - f(a))/h$. From this observation the Lipschitz continuity follows easily.
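9.3. Since $P(X \le x_o) = 1$, $Ee^{\theta X} < \infty$ for all $\theta > 0$. Since $F_\theta$ is concentrated on $(-\infty, x_o]$ it is clear that its mean $\mu_\theta = \varphi'(\theta)/\varphi(\theta) \le x_o$. On the other hand if $\delta > 0$, then $P(X \ge x_o - \delta) = c_\delta > 0$, $Ee^{\theta X} \ge c_\delta e^{\theta(x_o - \delta)}$, and hence
$$F_\theta(x_o - 2\delta) = \frac{1}{\varphi(\theta)} \int_{-\infty}^{x_o - 2\delta} e^{\theta x}\, dF(x) \le \frac{e^{\theta(x_o - 2\delta)}}{c_\delta e^{\theta(x_o - \delta)}} = e^{-\theta\delta}/c_\delta \to 0$$
Since $\delta > 0$ is arbitrary it follows that $\mu_\theta \to x_o$.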
9.4. If we let $\chi$ have the standard normal distribution then for $a > 0$
\[ P(S_n \ge na) = P(\chi \ge a\sqrt{n}) \sim \frac{1}{\sqrt{2\pi}} (a\sqrt{n})^{-1} \exp(-a^2 n/2) \]
so $(1/n)\log P(S_n \ge na) \to -a^2/2$.
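As a numerical illustration of the limit in 9.4, the following sketch evaluates $(1/n)\log P(\chi \ge a\sqrt{n})$ with the complementary error function. The function name `gaussian_tail_rate` is hypothetical; only the Python standard library is assumed.

```python
import math

def gaussian_tail_rate(a, n):
    """Return (1/n) * log P(chi >= a*sqrt(n)) for a standard normal chi."""
    x = a * math.sqrt(n)
    tail = 0.5 * math.erfc(x / math.sqrt(2.0))  # P(chi >= x)
    return math.log(tail) / n

for n in (10, 100, 1000):
    # the values should drift toward -a^2/2 = -0.5 as n grows
    print(n, gaussian_tail_rate(1.0, n))
```

The polynomial prefactor only contributes a term of order $(\log n)/n$, which is why the convergence is slow but visible.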
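9.5.
\[ Ee^{\theta X} = \sum_{n=0}^\infty e^{-1} e^{\theta n}/n! = \exp(e^\theta - 1) \]
so $\kappa(\theta) = e^\theta - 1$, $\varphi'(\theta)/\varphi(\theta) = \kappa'(\theta) = e^\theta$, and $\theta_a = \log a$. Plugging in gives
\[ \gamma(a) = -a\theta_a + \kappa(\theta_a) = -a\log a + a - 1 \]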
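The closed form in 9.5 can be checked against a brute-force minimization of $\kappa(\theta) - a\theta$ over a grid of $\theta$ values. This is only a sketch: the function names and the grid bounds are illustrative choices, and the grid search assumes the minimizer $\log a$ lies in $[-5, 5]$.

```python
import math

def poisson_rate_closed_form(a):
    # gamma(a) = -a log a + a - 1 for a Poisson variable with mean 1
    return -a * math.log(a) + a - 1.0

def poisson_rate_by_search(a, lo=-5.0, hi=5.0, steps=200001):
    # minimize kappa(theta) - a*theta with kappa(theta) = e^theta - 1
    best = float("inf")
    for i in range(steps):
        theta = lo + (hi - lo) * i / (steps - 1)
        best = min(best, math.exp(theta) - 1.0 - a * theta)
    return best

print(poisson_rate_closed_form(2.0), poisson_rate_by_search(2.0))
```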
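9.6. $1 + x \le e^x$ with $x = \varphi(\theta) - 1$ gives $\varphi(\theta) \le \exp(\varphi(\theta) - 1)$. To prove the other inequality, we note that
\[ \varphi(\theta) - 1 = \frac{e^\theta - 2 + e^{-\theta}}{2} = \sum_{n=1}^\infty \frac{\theta^{2n}}{(2n)!} \le \beta\theta^2 \]
(9.3) implies $P(S_n \ge na) \le \exp(-n\{a\theta - \beta\theta^2\})$. Taking $\theta = a/2\beta$ to minimize the upper bound the desired result follows.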
9.7. Since $\gamma(a)$ is decreasing and $\ge \log P(X = x_o)$ for all $a < x_o$ we have only to show that $\limsup \gamma(a) \le \log P(X = x_o)$. To do this we begin by observing that the computation for coin flips shows that the result is true for distributions that have a two point support. Now if we let $\bar X_i = x_o - \delta$ when $X_i \le x_o - \delta$ and $\bar X_i = x_o$ when $x_o - \delta < X_i \le x_o$ then $\bar S_n \ge S_n$ and hence $\bar\gamma(a) \ge \gamma(a)$ but $\bar\gamma(a) \downarrow \log P(\bar X_i = x_o) = \log P(x_o - \delta < X_i \le x_o)$. Since $\delta$ is arbitrary the desired result follows.
9.8. Clearly, $P(S_n \ge na) \ge P(S_{n-1} \ge -n\epsilon)\,P(X_n \ge n(a + \epsilon))$. The fact that $Ee^{\theta X} = \infty$ for all $\theta > 0$ implies $\limsup_{n\to\infty} (1/n)\log P(X_n > na) = 0$, and the desired conclusion follows as in the proof of (9.6).
9.9. Let $p_n = P(X_i > (a + \epsilon)n)$. $E|X_i| < \infty$ implies $P\left( \max_{i \le n} X_i > n(a + \epsilon) \right) \le np_n \to 0$ and hence $P(F_n) = np_n(1 - p_n)^{n-1} \sim np_n$
. Breaking the event F
n
into disjoint
pieces according to the index of the large value, and noting
P
_
S
n1
 < n
max
in
X
i
n(a +)
_
0
by the weak law of large numbers and the fact that the conditioning event has
a probability 1 the desired result follows.
2 Central Limit Theorems
2.1. The De Moivre-Laplace Theorem
1.1. Since $\log(1 + x)/x \to 1$ as $x \to 0$, it follows that given an $\epsilon > 0$ there is a $\delta > 0$ so that if $|x| < \delta$ then $|\log(1 + x) - x| \le \epsilon|x|$. From this it is easy to see that our assumptions imply
\[ \sum_{j=1}^n \log(1 + c_{j,n}) \to \lambda \]
and the desired result follows.
1.2. Applying Stirling's formula to n! we have
2nP(S
n
= n +m) =
2ne
n
n
n+m
/(n +m)!
n!n
m
(n +m)!
=
_
m
k=1
1 +
k
n
_
1
m
k=1
k m
2
/2 so if m x
n k
2n
log
_
1
k
n
_
+o(1)
1 +a
2
log(1 +a)
1 a
2
log(1 a)
when $k/n \to a$. Now if $k/n \to a > 0$ we have
\[ P(S_{2n} = 2k + 2) = \frac{n - k}{n + k + 1} P(S_{2n} = 2k) \sim \frac{1 - a}{1 + a} P(S_{2n} = 2k) \]
and summing a geometric series we have $P(S_{2n} \ge 2k) \le C\,P(S_{2n} = 2k)$.
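1.4. $P(S_n = k) = e^{-n} n^k/k!$ and $k! \sim k^k e^{-k}\sqrt{2\pi k}$ so
\[ P(S_n = k) \sim e^{-n+k} \left( \frac{n}{k} \right)^k \Big/ \sqrt{2\pi k} \]
and if $k/n \to a$ we have
\[ \frac{1}{n}\log P(S_n = k) = -\frac{n - k}{n} - \frac{k}{n}\log\left( \frac{k}{n} \right) + o(1) \to a - 1 - a\log a \]
Now if $k/n \to a > 1$ we have
\[ P(S_n = k + 1) = \frac{n}{k + 1} P(S_n = k) \sim \frac{1}{a} P(S_n = k) \]
and the result follows as in Exercise 1.3.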
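The local limit in 1.4 can be illustrated numerically. The sketch below computes $\log P(S_n = k)$ for a Poisson variable with mean $n$ through `math.lgamma` (to avoid overflow in $k!$) and compares $(1/n)\log P(S_n = k)$ with $a - 1 - a\log a$. The helper names are hypothetical.

```python
import math

def poisson_log_pmf(n, k):
    # log P(S_n = k) for S_n ~ Poisson(n): -n + k log n - log k!
    return -n + k * math.log(n) - math.lgamma(k + 1)

def local_rate(a, n):
    # (1/n) log P(S_n = k) with k the integer nearest to a*n
    k = int(round(a * n))
    return poisson_log_pmf(n, k) / n

def limit(a):
    return a - 1.0 - a * math.log(a)

for n in (10**2, 10**4, 10**6):
    print(n, local_rate(2.0, n), limit(2.0))
```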
2.2. Weak Convergence
2.1. Let $f_n(x) = 2$ if $x \in [m/2^n, (m+1)/2^n)$ and $0 \le m < 2^n$ is an even integer.
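2.2. As $n \to \infty$
(i) $P(M_n \le yn^{1/\alpha}) = (1 - y^{-\alpha} n^{-1})^n \to \exp(-y^{-\alpha})$
(ii) $P(M_n \le yn^{-1/\beta}) = (1 - |y|^\beta n^{-1})^n \to \exp(-|y|^\beta)$
(iii) $P(M_n \le \log n + y) = (1 - e^{-y} n^{-1})^n \to \exp(-e^{-y})$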
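Case (iii) of 2.2 is easy to check numerically, since the distribution function of the maximum is available in closed form for every $n$. The sketch below compares it with the limit $\exp(-e^{-y})$; the function names are illustrative, and the formula is used only for $\log n + y \ge 0$.

```python
import math

def max_exponential_cdf(n, y):
    # P(M_n <= log n + y) for n i.i.d. exponential(1) variables
    return (1.0 - math.exp(-y) / n) ** n

def gumbel_cdf(y):
    # limiting distribution function exp(-e^{-y})
    return math.exp(-math.exp(-y))

for n in (10, 1000, 10**6):
    print(n, max_exponential_cdf(n, 0.5), gumbel_cdf(0.5))
```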
2.3. (i) From the asymptotic formula it follows that
\[ \lim_{x\to\infty} \frac{P(X_i > x + (\theta/x))}{P(X_i > x)} = \lim_{x\to\infty} \frac{x}{x + (\theta/x)} \exp(-\theta - \{\theta^2/2x^2\}) = e^{-\theta} \]
(ii) Let $p_n = P(X_i > b_n + (x/b_n))$ and note that the definition of $b_n$ and (i) imply $np_n \to e^{-x}$ so
\[ P(b_n(M_n - b_n) \le x) = (1 - p_n)^n \to \exp(-e^{-x}) \]
(iii) By (1.4) we have
\[ P(X_i > (2\log n)^{1/2}) \sim \frac{1}{\sqrt{2\pi}} \cdot \frac{1}{(2\log n)^{1/2}} \cdot \frac{1}{n} \]
so for large $n$, $b_n \le (2\log n)^{1/2}$. On the other hand
\[ P(X_i > \{2\log n - 2\log\log n\}^{1/2}) \sim \frac{1}{\sqrt{2\pi}} \cdot \frac{1}{(2\log n)^{1/2}} \cdot \frac{\log n}{n} \]
so for large $n$, $b_n \ge (2\log n - 2\log\log n)^{1/2}$. From (ii) we see that if $x_n \to \infty$ and $y_n \to -\infty$
\[ P\left( \frac{y_n}{b_n} \le M_n - b_n \le \frac{x_n}{b_n} \right) \to 1 \]
Taking $x_n, y_n = o(b_n)$ the desired result follows.
2.4. Let Y
n
d
= X
n
with Y
n
Y
a.s. 0 g(Y
n
) g(Y
a.s. 0 g(Y
n
) g(Y
m=1
1
Xm()x
(7.4) implies that $\sup_x |F_n(x) - F(x)| \to 0$ with probability one. Pick a good outcome $\omega_0$, let $x_{n,m} = X_m(\omega_0)$ and $a_{n,m} = 1/n$.
2.8. Suppose first that integer valued X
n
X
= k)
To prove the converse let $\epsilon > 0$ and find points $I = \{x_1, \ldots, x_j\}$ so that $P(X_\infty \in I) \ge 1 - \epsilon$. Pick $N$ so that if $n \ge N$ then $|P(X_n = x_i) - P(X_\infty = x_i)| \le \epsilon/j$. Now let $m$ be an integer, let $I_m = I \cap (-\infty, m]$, and let $J_m$ be the integers $\le m$ not in $I_m$. The triangle inequality implies that if $n \ge N$ then
\[ |P(X_n \in I_m) - P(X_\infty \in I_m)| \le \epsilon \]
The choice of $x_1, \ldots, x_j$ implies $P(X_\infty \in J_m) \le \epsilon$ while the convergence for all $x_i$ implies that $P(X_n \in J_m) \le 2\epsilon$ for $n \ge N$. Combining the last three
inequalities implies P(X
n
m) P(X
n
m=1
Y
2
m
Y
i
almost surely so the desired result follows from
Exercise 2.9.
2.13. EY
n
1 implies EY
n
C so (2.7) implies that the sequence is tight.
Suppose
n(k)
, and let Y be a random variable with distribution . Exer
cise 2.5 implies that if < then EY
= 1. If we let (, ) we have
EY
= 1 = (EY
)
/
so for the random variable Y
Y
(t) dt
=
h
2
_
/h
/h
e
ita
X
(t) dt
3.3. If $X$ has ch.f. $\varphi$ then $-X$ has ch.f. $\bar\varphi$. If $\varphi$ is real, $\varphi = \bar\varphi$ so the inversion formula (3.3) implies $X \overset{d}{=} -X$.
3.4. By Example 3.3 and (3.1f), $X_1 + X_2$ has ch.f. $\exp(-(\sigma_1^2 + \sigma_2^2)t^2/2)$.
3.5. Examples 3.4 and 3.6 have this property since their density functions are
discontinuous.
3.6. Example 3.4 implies that the $X_i$ have ch.f. $(\sin t)/t$, so (3.1f) implies that $X_1 + \cdots + X_n$ has ch.f. $(\sin t/t)^n$. When $n \ge 2$ this is integrable so (3.3) implies that
\[ f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} (\sin t/t)^n e^{-itx}\,dt \]
Since $\sin t$ and $t$ are both odd, the quotient is even and we can simplify the last integral to get the indicated formula.
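A quick numerical check of the inversion step in 3.6, under the assumption that the simplified formula is $f(x) = \frac{1}{\pi}\int_0^\infty (\sin t/t)^n \cos(tx)\,dt$ (the real part of the integral above, using evenness). For $n = 2$ the sum of two uniforms on $(-1, 1)$ has the triangular density $(2 - |x|)/4$ on $[-2, 2]$. The truncation point `T` and the step count are arbitrary choices in this sketch.

```python
import math

def density_sum_uniform(x, n, T=2000.0, steps=400000):
    """Midpoint-rule approximation of (1/pi) * int_0^T (sin t / t)^n cos(t x) dt."""
    h = T / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h  # midpoints avoid the removable singularity at t = 0
        total += (math.sin(t) / t) ** n * math.cos(t * x)
    return total * h / math.pi

def triangular_density(x):
    # exact density of U1 + U2 with U1, U2 independent uniform on (-1, 1)
    return max(2.0 - abs(x), 0.0) / 4.0

print(density_sum_uniform(0.0, 2), triangular_density(0.0))
print(density_sum_uniform(1.0, 2), triangular_density(1.0))
```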
3.7. $X - Y$ has ch.f. $\varphi \cdot \bar\varphi = |\varphi|^2$. The first equality follows by taking $a = 0$ in Exercise 3.2. The second from Exercise 4.7 in Chapter 1.
3.8. Example 3.9 and (3.1f) imply that $X_1 + \cdots + X_n$ has ch.f. $\exp(-n|t|)$, so (3.1e) implies $(X_1 + \cdots + X_n)/n$ has ch.f. $\exp(-|t|)$ and hence a Cauchy distribution.
3.9. $X_n$ has ch.f. $\varphi_n(t) = \exp(-\sigma_n^2 t^2/2)$. By taking logs we see that $\varphi_n(1)$ has a limit if and only if $\sigma_n^2 \to \sigma^2 \in [0, \infty]$. However $\sigma^2 = \infty$ is ruled out by the remark after (3.4).
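3.10. Let $\varphi_n(t) = Ee^{itX_n}$ and $\psi_n(t) = Ee^{itY_n}$. Since $X_n \Rightarrow X_\infty$ and $Y_n \Rightarrow Y_\infty$, we have $\varphi_n(t) \to \varphi_\infty(t)$ and $\psi_n(t) \to \psi_\infty(t)$. $X_n + Y_n$ has ch.f. $\varphi_n(t)\psi_n(t)$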
which
(t)
, the ch.f. of S
.
3.12. By Example 3.1 and (3.1e), $\cos(t/2^m)$ is the ch.f. of a r.v. $X_m$ with $P(X_m = 1/2^m) = P(X_m = -1/2^m) = 1/2$, so Exercise 3.11 implies $S_\infty = \sum_{m=1}^\infty X_m$ has ch.f. $\prod_{m=1}^\infty \cos(t/2^m)$. If we let $Y_m = (2^m X_m + 1)/2$ then
\[ S_\infty = \sum_{m=1}^\infty \left( -\frac{1}{2^m} + \frac{2Y_m}{2^m} \right) = -1 + 2\sum_{m=1}^\infty Y_m/2^m \]
The $Y_m$ are i.i.d. with $P(Y_m = 0) = P(Y_m = 1) = 1/2$ so thinking about binary digits of a point chosen at random from $(0, 1)$, we see $\sum_{m=1}^\infty Y_m/2^m$ is uniform
on (0, 1). Thus S
j=1
_
1 +e
it23
j
2
_
(3
k
) =
j=1
_
1 +e
i23
kj
2
_
=
m=1
_
1 +e
i23
m
2
_
= ()
3.14. We prove the result by induction on $n$ by checking the conditions of (9.1) in the appendix. To make the notation agree we write
\[ \varphi^{(n)}(x) = \int (is)^n e^{ixs}\,\mu(ds) \]
so $f(x, s) = (is)^n e^{ixs}$. Since $|(is)^n e^{ixs}| = |s|^n$ and $E|X|^n < \infty$, (i) holds. Clearly, (ii) $\partial f/\partial x = (is)^{n+1} e^{ixs}$ is a continuous function of $x$. The dominated convergence theorem implies
\[ x \to \int (is)^{n+1} e^{ixs}\,\mu(ds) \]
is a continuous function so (iii) holds. Finally,
\[ \int_{-\delta}^{\delta} E|X|^{n+1}\,d\theta < \infty \]
so (iv) holds and the desired result follows from (9.1) in the Appendix.
3.15. $\varphi(t) = e^{-t^2/2} = \sum_{n=0}^\infty (-1)^n t^{2n}/(2^n n!)$. In this form it is easy to see that $\varphi^{(2n)}(0) = (-1)^n (2n)!/(2^n n!)$. The desired result now follows by observing that $E|X|^n < \infty$ for all $n$ and using the previous exercise.
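Since $\varphi^{(2n)}(0) = i^{2n} EX^{2n} = (-1)^n EX^{2n}$, the computation in 3.15 gives $EX^{2n} = (2n)!/(2^n n!)$. The sketch below checks that this agrees with the familiar product $1 \cdot 3 \cdot 5 \cdots (2n-1)$; the function names are illustrative.

```python
import math

def normal_even_moment(n):
    # E X^{2n} = (2n)! / (2^n n!) for a standard normal X
    return math.factorial(2 * n) // (2 ** n * math.factorial(n))

def odd_double_factorial(n):
    # (2n-1)!! = 1 * 3 * 5 * ... * (2n-1), with the empty product equal to 1
    prod = 1
    for k in range(1, 2 * n, 2):
        prod *= k
    return prod

print([normal_even_moment(n) for n in range(6)])
```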
3.16. (i) Let $X_i$ be a r.v. with ch.f. $\varphi_i$. (3.1d) and (3.7) with $n = 0$ imply
\[ |\varphi_i(t + h) - \varphi_i(t)| \le E|e^{ihX_i} - 1| \le E\min(|hX_i|, 2) \le E(|hX_i|; |X_i| \le |h|^{-1/2}) + 2P(|X_i| > |h|^{-1/2}) \]
The first expected value is $\le |h|^{1/2}$, the second term goes to 0 as $h \to 0$ by tightness.
(ii) Without loss of generality we can assume the compact set is $[-K, K]$. Let $\epsilon > 0$ and pick $\delta > 0$ so that if $|h| < \delta$ then $|\varphi_i(t + h) - \varphi_i(t)| < \epsilon$ for all $i$. Let $m > 1/\delta$ be an integer. Since
n
(t) < 2
for t [K, K].
(iii) $X_n = 1/n$ has ch.f. $e^{it/n}$ that converges to 1 pointwise but not uniformly.
3.17. (i) E exp(itS
n
/n) = (t/n)
n
. If
2
/2 +o(t
2
) and
we have EX = 0 and EX
2
= 2c. If (t) = 1 +o(t
2
) then c = 0 and X 0
3.20. (3.4) shows that $Y_n \Rightarrow 0$ implies $\varphi_n(t) \to 1$ for all $t$. Conversely if $\varphi_n(t) \to 1$ for $|t| < \delta$ then it follows from (3.5) that the sequence $Y_n$ is tight. Part (i) of (3.4) implies that any subsequential limit has a ch.f. that is $= 1$ on $(-\delta, \delta)$ and hence by the previous exercise must be $\equiv 1$. We have shown now that any subsequence has a further subsequence that $\Rightarrow 0$ so we have $Y_n \Rightarrow 0$ by the last paragraph of the proof of (3.4).
3.21. If $S_n$ converges in distribution then $\varphi_n(t) = E\exp(itS_n) \to \varphi(t)$ which is a ch.f. and hence has $|\varphi(t) - 1| < 1/2$ for $t \in [-\delta, \delta]$. If $m < n$ let
\[ \varphi_{m,n}(t) = E\exp(it(S_n - S_m)) = \varphi_n(t)/\varphi_m(t) \]
when $\varphi_m(t) \ne 0$. Combining our results we see that if $m, n \to \infty$ then $\varphi_{m,n} \to 1$ for $t \in [-\delta, \delta]$. Using the previous exercise now we can conclude that if $m, n \to \infty$ then $S_n - S_m \to 0$ in probability. Using Exercise 6.4 in Chapter 1 now we can conclude that there is a random variable $S_\infty$ with $S_n \to S_\infty$ in probability.
3.22. By Polya's criterion, (3.10), it suffices to show that $\varphi(t) = \exp(-t^\alpha)$ is convex on $(0, \infty)$. To do this we note
\[ \varphi'(t) = -\alpha t^{\alpha-1}\exp(-t^\alpha) \]
\[ \varphi''(t) = (\alpha^2 t^{2\alpha-2} - \alpha(\alpha - 1)t^{\alpha-2})\exp(-t^\alpha) \]
which is $> 0$ since $\alpha \le 1$.
3.23. (3.1f) implies that $X_1 + \cdots + X_n$ has ch.f. $\exp(-n|t|^\alpha)$, so (3.1e) implies $(X_1 + \cdots + X_n)/n^{1/\alpha}$ has ch.f. $\exp(-|t|^\alpha)$.
3.24. Let $\varphi_2(t) = \varphi_1(t)$ on $A$, linear on each open interval that makes up $A^c$, and continuous. $\varphi_2$ is convex on $(0, \infty)$ and by Polya's criterion must be a ch.f. Since $e^{-|t|}$ is strictly convex we have $\{t : \varphi_2(t) = \varphi_1(t)\} = A$.
3.25. Let $\varphi_0(t) = (1 - |t|)^+$ and $\varphi_1(t)$ be periodic with period 2 and $= \varphi_0(t)$ on $[-1, 1]$. If $X, Y, Z$ are independent with $X$ and $Y$ having ch.f. $\varphi_0$ and $Z$ having ch.f. $\varphi_1$ then $X + Y$ and $X + Z$ both have ch.f. $\varphi_0^2$.
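3.26. Let $\varphi_X$ and $\varphi_Y$ be the ch.f. of $X$ and $Y$. Let $\delta > 0$ be such that $\varphi_X(t) \ne 0$ for $t \in [-\delta, \delta]$. If $X + Y$ and $X$ have the same distribution then $\varphi_X(t)\varphi_Y(t) = \varphi_X(t)$ so $\varphi_Y(t) = 1$ for $t \in [-\delta, \delta]$ and hence must be $\equiv 1$ by Exercise 3.19.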
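3.27. $\nu_k = E|X|^k \le \lambda^k$. Conversely if $\epsilon > 0$ then $P(|X| > \lambda - \epsilon) > 0$ so
\[ \nu_k = E|X|^k \ge (\lambda - \epsilon)^k P(|X| \ge \lambda - \epsilon) \]
and $\liminf_{k\to\infty} \nu_k^{1/k} \ge \lambda - \epsilon$.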
3.28. Since (x) is bounded for x [1, 2] the identity quoted implies that
(x) [x]! where f(x) g(x) means 0 < c f(x)/g(x) C < for all
$x \ge 1$. Stirling's formula implies
\[ n! \sim (n/e)^n \sqrt{2\pi n} \]
where as usual $a_n \sim b_n$ means $a_n/b_n \to 1$ as $n \to \infty$. Combining this with the
previous result and recalling (n
p
)
1/n
1 shows
((n + + 1)/)
1/n
_
n + + 1
e
_
(n++1)/n
Cn
1
n K i.o.) limsup
n
P(S
n
/
n K) > 0
so Kolmogorov's 0-1 law implies $P(S_n/\sqrt{n} \ge K \text{ i.o.}) = 1$.
(b) If S
n
/
n Z in probability then
S
m!
/
m! S
(m+1)!
/
_
(m+ 1)! 0 in probability
On the other hand, the independence of S
m!
and S
(m+1)!
S
m!
imply
P
_
1 <
S
m!
m!
< 2,
S
(m+1)!
S
m!
_
(m + 1)!
< 3
_
P(1 < < 2)P( < 3) > 0
so liminf
m
P(S
m!
/
m! > 1, S
(m+1)!
/
_
(m + 1)! < 1) > 0 a contradiction.
4.3. Since $Y_m = U_m + V_m$ the first inequality is obvious. The second follows from symmetry. To prove the third we note that
\[ P\left( \sum_{m=1}^n U_m \ge K\sqrt{n} \right) \to P\left( \chi \ge K/\sqrt{\mathrm{var}(U_i)} \right) \]
If the truncation level is chosen large then $\mathrm{var}(U_i)$ is large and the right hand side is $> 2/5$, so the third inequality holds for large $n$.
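4.4. Intuitively, since $(2x^{1/2})' = x^{-1/2}$ and $S_n/n \to 1$ in probability
\[ 2(\sqrt{S_n} - \sqrt{n}) = \int_n^{S_n} \frac{dx}{x^{1/2}} \approx \frac{S_n - n}{\sqrt{n}} \]
To make the last calculation rigorous note that when $|S_n - n| \le n^{2/3}$ (an event with probability $\to 1$)
\[ \left| \int_n^{S_n} \frac{dx}{x^{1/2}} - \frac{S_n - n}{\sqrt{n}} \right| = \left| \int_n^{S_n} \left( \frac{1}{x^{1/2}} - \frac{1}{\sqrt{n}} \right) dx \right| \le n^{2/3} \left( \frac{1}{(n - n^{2/3})^{1/2}} - \frac{1}{n^{1/2}} \right) \]
\[ = n^{2/3} \int_{n - n^{2/3}}^{n} \frac{dx}{2x^{3/2}} \le \frac{n^{4/3}}{2(n - n^{2/3})^{3/2}} \to 0 \]
as $n \to \infty$.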
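4.5. The weak law of large numbers implies $\sum_{m=1}^n X_m^2/n\sigma^2 \to 1$. $y^{-1/2}$ is continuous at 1, so (2.3) implies
\[ \left( \sigma^2 n \Big/ \sum_{m=1}^n X_m^2 \right)^{1/2} \to 1 \quad \text{in probability} \]
and Exercise 2.11 implies
\[ \frac{\sum_{m=1}^n X_m}{\sigma\sqrt{n}} \cdot \left( \sigma^2 n \Big/ \sum_{m=1}^n X_m^2 \right)^{1/2} \Rightarrow \chi \cdot 1 \]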
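4.6. Kolmogorov's inequality ((7.2) in Chapter 1) implies
\[ P\left( \sup_{(1-\epsilon)a_n \le m \le (1+\epsilon)a_n} |S_m - S_{[(1-\epsilon)a_n]}| > \delta\sigma\sqrt{a_n} \right) \le 2\epsilon/\delta^2 \]
If $X_n = S_{N_n}/\sigma\sqrt{a_n}$ and $Y_n = S_{a_n}/\sigma\sqrt{a_n}$ then it follows that
\[ \limsup_{n\to\infty} P(|X_n - Y_n| > \delta) \le 2\epsilon/\delta^2 \]
Since this holds for all $\epsilon$ we have $P(|X_n - Y_n| > \delta) \to 0$ for each $\delta > 0$, i.e., $X_n - Y_n \to 0$ in probability. The desired conclusion follows from the converging together lemma, Exercise 2.10.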
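4.7. $N_t/(t/\mu) \to 1$ by (7.3) in Chapter 1, so by the last exercise
\[ (S_{N_t} - \mu N_t)/(\sigma^2 t/\mu)^{1/2} \Rightarrow \chi \]
In view of Exercise 2.10 we can complete the proof now by showing
\[ (S_{N_t} - t)/\sqrt{t} \to 0 \]
To do this, we observe that $EY_i^2 < \infty$ implies
\[ P\left( \max_{1 \le m \le 2t/\mu} Y_m > \epsilon\sqrt{t} \right) \le (2t/\mu)P(Y_1 > \epsilon\sqrt{t}) \le \frac{2}{\mu\epsilon^2} E(Y_1^2; Y_1 > \epsilon\sqrt{t}) \to 0 \]
by the dominated convergence theorem. Since $P(N_t + 1 \le 2t/\mu) \to 1$ and $0 \le t - S_{N_t} \le Y_{N_t+1}$, the desired result follows from Exercise 2.10.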
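4.8. Recall $u = [t/\mu]$. Kolmogorov's inequality implies
\[ P(|S_{u+m} - (S_u + \mu m)| > t^{2/5} \text{ for some } m \in [-t^{3/5}, t^{3/5}]) \le \frac{2\sigma^2 t^{3/5}}{t^{4/5}} \to 0 \]
as $t \to \infty$. When the event estimated in the last equation does not occur we have
\[ D_t + \mu m - t^{2/5} \le S_{u+m} - t \le D_t + \mu m + t^{2/5} \]
when $m \in [-t^{3/5}, t^{3/5}]$. When
\[ m = (-D_t + 2t^{2/5})/\mu, \quad S_{u+m} > t \quad \text{so} \quad N_t - u \le -D_t/\mu + 2t^{2/5}/\mu \]
\[ m = (-D_t - 2t^{2/5})/\mu, \quad S_{u+m} < t \quad \text{so} \quad N_t - u \ge -D_t/\mu - 2t^{2/5}/\mu \]
The last two inequalities imply (recall $u = [t/\mu]$)
\[ \frac{N_t - (t - D_t)/\mu}{t^{1/2}} \to 0 \quad \text{in probability.} \]
The central limit theorem implies $D_t/\sigma\sqrt{t/\mu} \Rightarrow \chi$ and the desired result follows from Exercise 2.10.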
4.9. Let $Y_m = 1$ if $X_m > 0$ and $Y_m = -1$ if $X_m < 0$. $P(X_m \ne Y_m) = m^{-2}$ so the Borel-Cantelli lemma implies $P(X_m \ne Y_m \text{ i.o.}) = 0$. The ordinary central limit theorem implies $T_n = Y_1 + \cdots + Y_n$ has $T_n/\sqrt{n} \Rightarrow \chi$, so the converging together lemma, Exercise 2.10, implies $S_n/\sqrt{n} \Rightarrow \chi$.
4.10. Let $X_{n,m} = (X_m - EX_m)/\sqrt{\mathrm{var}(S_n)}$. By definition, (i) in (4.5) holds with $\sigma^2 = 1$. Since $|X_m - EX_m| \le 2M$, the sum in (ii) is 0 for large $n$. The desired result follows from (4.5).
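4.11. Let $X_{n,m} = X_m/\sqrt{n}$. Then
\[ \sum_{m=1}^n E(X_{n,m}^2; |X_{n,m}| > \epsilon) = n^{-1} \sum_{m=1}^n E(X_m^2; |X_m| > \epsilon\sqrt{n}) \le n^{-1}(\epsilon\sqrt{n})^{-\delta} \sum_{m=1}^n E(|X_m|^{2+\delta}) \le C(\epsilon\sqrt{n})^{-\delta} \to 0 \]
The desired result now follows from (4.5).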
4.12. Let $X_{n,m} = (X_m - EX_m)/\alpha_n$. By definition (i) in (4.5) holds with $\sigma^2 = 1$. To check (ii) we note that
\[ \sum_{m=1}^n E(X_{n,m}^2; |X_{n,m}| > \epsilon) = \alpha_n^{-2} \sum_{m=1}^n E((X_m - EX_m)^2; |X_m - EX_m| > \epsilon\alpha_n) \le \epsilon^{-\delta}\alpha_n^{-(2+\delta)} \sum_{m=1}^n E(|X_m - EX_m|^{2+\delta}) \to 0 \]
The desired result now follows from (4.5).
4.13. (i) If $\beta > 1$ then $\sum_j P(X_j \ne 0) < \infty$ so the Borel-Cantelli lemma implies $P(X_j \ne 0 \text{ i.o.}) = 0$ and $\sum_j X_j$ exists.
(ii) $EX_j^2 = j^{2-\beta}$ so $\mathrm{var}(S_n) \sim n^{3-\beta}/(3 - \beta)$. Let $X_{n,m} = X_m/n^{(3-\beta)/2}$. By definition (i) in (4.5) holds with $\sigma^2 = 1/(3 - \beta)$. To check (ii) we note that when $\beta < 1$, $(3 - \beta)/2 > 1$ so eventually the sum in (ii) is 0. The desired result now follows from (4.5).
(iii) When $\beta = 1$, $E\exp(itX_j) = 1 - j^{-1}(1 - \cos(jt))$. So
\[ E\exp(itS_n/n) = \prod_{j=1}^n \left( 1 + \frac{1}{n}(j/n)^{-1}\{\cos(jt/n) - 1\} \right) \]
$(1 - \cos(tx))/x$ is bounded for $x \le 1$ and the Riemann sums
\[ \sum_{j=1}^n \frac{1}{n}(j/n)^{-1}\{\cos(jt/n) - 1\} \to \int_0^1 x^{-1}\{\cos(xt) - 1\}\,dx \]
so the desired result follows from Exercise 1.1.
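To see the convergence in 4.13 (iii) numerically, the sketch below compares the finite product for $E\exp(itS_n/n)$ with $\exp\left(\int_0^1 x^{-1}\{\cos(xt) - 1\}\,dx\right)$, where the integral is approximated by a midpoint rule. The function names and the step counts are arbitrary choices for illustration.

```python
import math

def chf_product(t, n):
    # prod_{j=1}^n (1 + (1/n)(j/n)^{-1} {cos(jt/n) - 1}); note (1/n)(j/n)^{-1} = 1/j
    prod = 1.0
    for j in range(1, n + 1):
        prod *= 1.0 + (math.cos(j * t / n) - 1.0) / j
    return prod

def chf_limit(t, steps=200000):
    # exp of the midpoint-rule approximation to int_0^1 (cos(x t) - 1)/x dx
    h = 1.0 / steps
    integral = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        integral += (math.cos(x * t) - 1.0) / x
    return math.exp(integral * h)

print(chf_product(2.0, 20000), chf_limit(2.0))
```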
2.6. Poisson Convergence
6.1. (i) Clearly $d(\mu, \nu) = d(\nu, \mu)$ and $d(\mu, \nu) = 0$ if and only if $\mu = \nu$. To check the triangle inequality we note that the triangle inequality for real numbers implies
\[ |\mu(x) - \nu(x)| + |\nu(x) - \pi(x)| \ge |\mu(x) - \pi(x)| \]
then sum over $x$.
(ii) One direction of the second result is trivial. We cannot have $\|\mu_n - \mu\| \to 0$ unless $\mu_n(x) \to \mu(x)$ for each $x$. To prove the converse note that if $\mu_n(x) \to \mu(x)$
\[ \sum_x |\mu_n(x) - \mu(x)| = 2\sum_x (\mu(x) - \mu_n(x))^+ \to 0 \]
by the dominated convergence theorem.
6.2. $(\mu(x) - \nu(x))^+ \le P(X = x, X \ne Y)$ so summing over $x$ and noting that the events on the right hand side are disjoint shows $\|\mu - \nu\|/2 \le P(X \ne Y)$. To prove the other direction note that
\[ \sum_x \mu(x) \wedge \nu(x) = \sum_x \mu(x) - (\mu(x) - \nu(x))^+ = 1 - \|\mu - \nu\|/2 \]
Let $I_x$, $x \in Z$, be disjoint subintervals of $(0, 1 - \|\mu - \nu\|/2)$ with length $\mu(x) \wedge \nu(x)$. Set $X = Y = x$ on $I_x$. Let $J_x$, $x \in Z$, be disjoint subintervals of $(1 - \|\mu - \nu\|/2, 1)$ with length $(\mu(x) - \nu(x))^+$ and set $X = x$ on $J_x$. Since
\[ \{\mu(x) \wedge \nu(x)\} + (\mu(x) - \nu(x))^+ = \mu(x) \]
$X$ has distribution $\mu$. For $Y$, we similarly let $K_x$, $x \in Z$, be disjoint subintervals of $(1 - \|\mu - \nu\|/2, 1)$ with length $(\nu(x) - \mu(x))^+$ and set $Y = x$ on $K_x$.
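The interval construction in 6.2 can be carried out mechanically for finitely supported distributions. The sketch below lays the intervals $I_x$, $J_x$, $K_x$ end to end in sorted order (one possible arrangement; the solution does not fix an order), reads off the joint law of $(X, Y)$, and uses exact rational arithmetic. All names are illustrative.

```python
from fractions import Fraction

def pieces(weights, start):
    # lay intervals of the given lengths end to end, beginning at `start`
    out, pos = [], start
    for x, w in weights:
        if w > 0:
            out.append((pos, pos + w, x))
            pos += w
    return out

def interval_coupling(mu, nu):
    """Joint pmf {(x, y): prob} of the coupling built from I_x, J_x, K_x."""
    support = sorted(set(mu) | set(nu))
    common = [(x, min(mu.get(x, 0), nu.get(x, 0))) for x in support]
    s = sum(w for _, w in common)  # total length of the I_x, where X = Y
    joint = {(x, x): w for x, w in common if w > 0}
    J = pieces([(x, max(mu.get(x, 0) - nu.get(x, 0), 0)) for x in support], s)
    K = pieces([(x, max(nu.get(x, 0) - mu.get(x, 0), 0)) for x in support], s)
    for a, b, x in J:
        for c, d, y in K:
            overlap = min(b, d) - max(a, c)
            if overlap > 0:
                joint[(x, y)] = joint.get((x, y), 0) + overlap
    return joint

mu = {0: Fraction(1, 2), 1: Fraction(1, 2)}
nu = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 2)}
joint = interval_coupling(mu, nu)
mismatch = sum(p for (x, y), p in joint.items() if x != y)
half_norm = sum(abs(mu.get(x, 0) - nu.get(x, 0)) for x in set(mu) | set(nu)) / 2
print(joint, mismatch, half_norm)
```

In this example the probability of a mismatch equals half the sum of the absolute differences, as the argument predicts.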
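6.3. Let $X_{n,m} = (\tau^n_m - \tau^n_{m-1}) - 1$. The hypotheses of (6.7) hold with
\[ p_{n,m} = \frac{m-1}{n}\left( 1 - \frac{m-1}{n} \right) \qquad \epsilon_{n,m} = \left( \frac{m-1}{n} \right)^2 \]
for $1 \le m \le k_n$. The desired result follows from (6.7) since
\[ \max_{1 \le m \le k_n} p_{n,m} \le k_n/n \to 0 \]
\[ \sum_{m=1}^{k_n} p_{n,m} \sim \frac{1}{n}\sum_{m=1}^{k_n} (m-1) \sim \frac{k_n^2}{2n} \to \frac{\lambda^2}{2} \]
\[ \sum_{m=1}^{k_n} \epsilon_{n,m} = \frac{1}{n^2}\sum_{m=1}^{k_n} (m-1)^2 \le \frac{k_n^3}{3n^2} \to 0 \]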
6.4. For $m \ge 1$, $\tau^n_m - \tau^n_{m-1}$ has a geometric distribution with $p = 1 - (m-1)/n$ and hence by Example 3.5 in Chapter 1 has mean $1/p = n/(n - m + 1)$ and variance $(1 - p)/p^2 = n(m-1)/(n - m + 1)^2$.
\[ \mu_{n,k} = \sum_{m=1}^k \frac{n}{n - m + 1} = \sum_{j=n-k+1}^n \frac{n}{j} \sim n\int_{n-k}^n \frac{dx}{x} \sim -n\ln(1 - a) \]
\[ \sigma^2_{n,k} = \sum_{m=1}^k \frac{n(m-1)}{(n - m + 1)^2} = n\sum_{j=n-k+1}^n \frac{n - j}{j^2} = \sum_{j=n-k+1}^n \frac{1 - j/n}{(j/n)^2} \sim n\int_{1-a}^1 \frac{1 - x}{x^2}\,dx \]
Let $t_{n,m} = \tau^n_m - \tau^n_{m-1}$ and $X_{n,m} = (t_{n,m} - Et_{n,m})/\sqrt{n}$. By design $EX_{n,m} = 0$ and (i) in (4.5) holds. To check (ii) we note that if $k/n \le b < 1$ and $Y$ is geometric with parameter $p = 1 - b$
\[ \sum_{m=1}^k E(X_{n,m}^2; |X_{n,m}| > \epsilon) \le bnE((Y/\sqrt{n})^2; Y > \epsilon\sqrt{n}) \to 0 \]
by the dominated convergence theorem.
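The asymptotic for the mean in 6.4 is easy to check directly: with $k = an$, the exact sum $\sum_{m=1}^k n/(n - m + 1)$ should be close to $-n\ln(1 - a)$. The function name below is an illustrative choice.

```python
import math

def expected_draws(n, k):
    # exact mean of tau^n_k: sum of the geometric means n/(n-m+1), m = 1..k
    return sum(n / (n - m + 1) for m in range(1, k + 1))

n, a = 100000, 0.5
exact = expected_draws(n, int(a * n))
approx = -n * math.log(1.0 - a)
print(exact, approx, exact / approx)
```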
6.5. Iterating $P(T > t + s) = P(T > t)P(T > s)$ shows
\[ P(T > ks) = P(T > s)^k \]
Letting $s \to 0$ and using $P(T > 0) = 1$ it follows that $P(T > t) > 0$ for all $t$.
Let e
m=1
e
vn+1(vmvm1)
_
e
vn+1(1vn)
v
n
n+1
To find the joint density of $V = (V_1, \ldots, V_n)$ we simplify the preceding formula and integrate out the last coordinate to get
\[ f_V(v_1, \ldots, v_n) = \int_0^\infty \lambda^{n+1} v_{n+1}^n e^{-\lambda v_{n+1}}\,dv_{n+1} = n! \]
for $0 < v_1 < v_2 < \ldots < v_n < 1$, which is the desired joint density.
6.8. As $n \to \infty$, $T_{n+1}/n \to 1$ almost surely, so Exercise 2.11 implies
\[ nV^n_k \overset{d}{=} nT_k/T_{n+1} \Rightarrow T_k \]
6.9. As $n \to \infty$, $T_{n+1}/n \to 1$ almost surely, so if $\epsilon > 0$ and $n$ is large
\[ n^{-1}\sum_{m=1}^n 1_{\{n(V^n_m - V^n_{m-1}) > x\}} \ge n^{-1}\sum_{m=1}^n 1_{\{T_m - T_{m-1} > x(1+\epsilon)\}} \to e^{-x(1+\epsilon)} \]
almost surely by the strong law of large numbers. A similar argument gives an upper bound of $\exp(-x(1 - \epsilon))$ and the desired result follows.
6.10. Exercise 6.18 in Chapter 1 implies
\[ (\log n)^{-1} \max_{1 \le m \le n+1} (T_m - T_{m-1}) \to 1 \]
As $n \to \infty$, $T_{n+1}/n \to 1$ almost surely, so the desired result follows from Exercise 2.11.
6.11. Properties of the exponential distribution imply
\[ P\left( (n + 1) \min_{1 \le m \le n+1} (T_m - T_{m-1}) > x \right) = e^{-x} \]
As $n \to \infty$, $(n + 1)T_{n+1}/n^2 \to 1$ almost surely, so the desired result follows from Exercise 2.11.
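6.12. Conditioning on $N = m$, we see that if $m_0, \ldots, m_k$ add up to $m$ then
\[ P(N_0 = m_0, \ldots, N_k = m_k) = \frac{m!}{m_0! \cdots m_k!}\, p_0^{m_0} \cdots p_k^{m_k} \cdot e^{-\lambda}\frac{\lambda^m}{m!} = \prod_{j=0}^k e^{-\lambda p_j} \frac{(\lambda p_j)^{m_j}}{m_j!} \]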
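The identity in 6.12 (a multinomial split of a Poisson count gives independent Poisson counts) can be verified numerically for particular values. In the sketch below the Poisson mean is written `lam`, and the function names are hypothetical.

```python
import math

def poisson_pmf(lam, m):
    return math.exp(-lam) * lam ** m / math.factorial(m)

def joint_via_conditioning(lam, probs, counts):
    # multinomial probability given N = m, times P(N = m)
    m = sum(counts)
    multinomial = math.factorial(m)
    for c in counts:
        multinomial //= math.factorial(c)
    weight = 1.0
    for p, c in zip(probs, counts):
        weight *= p ** c
    return multinomial * weight * poisson_pmf(lam, m)

def joint_via_product(lam, probs, counts):
    # product of independent Poisson(lam * p_j) probabilities
    out = 1.0
    for p, c in zip(probs, counts):
        out *= poisson_pmf(lam * p, c)
    return out

args = (3.0, (0.2, 0.3, 0.5), (1, 2, 0))
print(joint_via_conditioning(*args), joint_via_product(*args))
```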
6.13. If the number of balls has a Poisson distribution with mean $s = n\log n - n(\log\lambda)$ then the number of balls in box $i$, $N_i$, are independent with mean $s/n = \log(n/\lambda)$ and hence they are vacant with probability $\exp(-s/n) = \lambda/n$. Letting $X_{n,i} = 1$ if the $i$th box is vacant, 0 otherwise and using (6.1) it follows that the number of vacant sites converges to a Poisson with mean $\lambda$.
To prove the result for a fixed number of balls, we note that the central limit theorem implies
\[ P(\mathrm{Poisson}(s_1) < r < \mathrm{Poisson}(s_2)) \to 1 \]
Since the number of vacant boxes is decreased when the number of balls increases the desired result follows.
2.7. Stable Laws
7.1. $\log(tx)/\log t = (\log t + \log x)/\log t \to 1$ as $t \to \infty$. However, $(tx)^\epsilon/t^\epsilon = x^\epsilon$.
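7.2. In the proof we showed
\[ E\exp(it\hat S_n(\epsilon)/a_n) \to \exp\left( \int_\epsilon^\infty (e^{itx} - 1)\theta\alpha x^{-(\alpha+1)}\,dx + \int_{-\infty}^{-\epsilon} (e^{itx} - 1)(1 - \theta)\alpha|x|^{-(\alpha+1)}\,dx \right) \]
Since $e^{itx} - 1 \sim itx$ as $x \to 0$, if we assume $\alpha < 1$ the right-hand side has a limit when $\epsilon \to 0$. Using (7.10) and (7.6) the desired result follows.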
7.3. If we let $Z_m = \mathrm{sgn}(Y_m)/|Y_m|^p$, which are i.i.d., then for $x \ge 1$
\[ P(|Z_m| > x) = P(|Y_m| \le x^{-1/p}) = x^{-1/p} \]
(i) When $p < 1/2$, $EZ_m^2 < \infty$ and the central limit theorem (4.1) implies
\[ n^{-1/2}\sum_{m=1}^n Z_m \Rightarrow c\chi \]
(ii) When $p = 1/2$ the $Z_m$ have the distribution considered in Example 4.8 so
\[ (n\log n)^{-1/2}\sum_{m=1}^n Z_m \Rightarrow \chi \]
7.4. Let X
1
, X
2
, . . . be i.i.d. with P(X
i
> x) = x
u
x
_
1
0
px
p1
dx +pC
_
1
x
p1
dx <
(ii) Let X
1
, X
2
, . . . be i.i.d. with P(X
i
> x) = P(X
i
< x) = x
/2 for x 1,
and let S
n
= X
1
+ +X
n
. From the convergence of X
n
to a Poisson process
we have
{m n : X
m
> xn
1/
} Poisson(x
/2)
{m n : X
m
< n
1/
} Poisson(1/2)
Now S
n
xn
1/
if (i) there is at least one X
m
> xn
1/
with m n, (ii) there
is no X
m
< n
1/
with m n, and (iii)
S
n
(1) 0 so we have
liminf
n
P(S
n
xn
1/
)
x
2
e
x
/2
e
1/2
1
2
To see the inequality note that P(iii) P(i) and even if we condition on the
number of X
m
 > n
1/
with m n the distribution of
S
n
(1) is symmetric.
7.6. (i) Let X
1
, X
2
, . . . be i.i.d. with P(X
i
> x) = x
/2, P(X
i
< x) =
(1 )x
Y ) = exp(c
t
)
(ii) $|W_2|$ has density $2(2\pi)^{-1/2}e^{-x^2/2}$ and $f(x) = 1/x^2$ is decreasing on $(0, \infty)$ so using Exercise 1.10 from Chapter 1, and noting $g(y) = 1/\sqrt{y}$, $g'(y) = -(1/2)y^{-3/2}$ we see that $Y = 1/|W_2|^2$ has density function
\[ \frac{2}{\sqrt{2\pi}}\, e^{-1/2y} \cdot \frac{1}{2}\, y^{-3/2} \]
as claimed. Taking $X = W_1$ and $Y = 1/W_2^2$ and using (i) we see that $W_1/W_2 = XY^{1/2}$ has a symmetric stable distribution with index $2 \cdot (1/2)$.
2.8. Infinitely Divisible Distributions
8.1. Suppose $Z = \mathrm{gamma}(\alpha, \lambda)$. If $X_{n,1}, \ldots, X_{n,n}$ are $\mathrm{gamma}(\alpha/n, \lambda)$ and independent then Example 4.3 in Chapter 1 implies $X_{n,1} + \cdots + X_{n,n} \overset{d}{=} Z$.
8.2. Suppose $Z$ has support in $[-M, M]$. If $X_{n,1}, \ldots, X_{n,n}$ are i.i.d. and $Z = X_{n,1} + \cdots + X_{n,n}$ then $X_{n,1}, \ldots, X_{n,n}$ must have support in $[-M/n, M/n]$. So $\mathrm{var}(X_{n,i}) \le EX_{n,i}^2 \le M^2/n^2$ and $\mathrm{var}(Z) \le M^2/n$. Letting $n \to \infty$ we have $\mathrm{var}(Z) = 0$.
8.3. Suppose $Z = X_{n,1} + \cdots + X_{n,n}$ where the $X_{n,i}$ are i.i.d. If $\varphi$ is the ch.f. of $Z$ and $\varphi_n$ is the ch.f. of $X_{n,i}$ then $\varphi_n^n(t) = \varphi(t)$. Since $\varphi(t)$ is continuous at 0 we can pick a $\delta > 0$ so that $\varphi(t) \ne 0$ for $t \in [-\delta, \delta]$. We have supposed $\varphi$ is real so taking $n$th roots it follows that $\varphi_n(t) \to 1$ for $t \in [-\delta, \delta]$. Using Exercise 3.20 now we conclude that $X_{n,1} \Rightarrow 0$, and (i) of (3.4) implies $\varphi_n(t) \to 1$ for all $t$. If $\varphi(t_0) = 0$ for some $t_0$ this is inconsistent with $\varphi_n^n(t_0) = \varphi(t_0)$ so $\varphi$ cannot vanish.
8.4. Comparing the proof of (7.7) with the verbal description above the problem statement we see that the Levy measure has density $1/2|x|$ for $x \in [-1, 1]$, 0 otherwise.
2.9. Limit Theorems in $R^d$
9.1.
\[ F_i(x) = P(X_i \le x) = \lim_{n\to\infty} P(X_1 \le n, \ldots, X_{i-1} \le n, X_i \le x, X_{i+1} \le n, \ldots, X_d \le n) = \lim_{n\to\infty} F(n, \ldots, n, x, n, \ldots, n) \]
where the $x$ is in the $i$th place and $n$'s in the others.
9.2. It is clear that $F$ has properties (ii) and (iii). To check (iv) let $G(x) = \prod_{i=1}^d F_i(x_i)$ and $H(x) = \prod_{i=1}^d F_i(x_i)(1 - F_i(x_i))$. Using the notation introduced just before (iv)
\[ \sum_v \mathrm{sgn}(v)G(v) = \prod_{i=1}^d (F_i(b_i) - F_i(a_i)) \]
\[ \sum_v \mathrm{sgn}(v)H(v) = \prod_{i=1}^d \{F_i(b_i)(1 - F_i(b_i)) - F_i(a_i)(1 - F_i(a_i))\} \]
To show $\sum_v \mathrm{sgn}(v)(G(v) + \alpha H(v)) \ge 0$ we note
\[ F_i(b_i)(1 - F_i(b_i)) - F_i(a_i)(1 - F_i(a_i)) \]
\[ = \{F_i(b_i) - F_i(a_i)\}(1 - F_i(b_i)) + F_i(a_i)\{(1 - F_i(b_i)) - (1 - F_i(a_i))\} \]
\[ = \{1 - F_i(b_i) - F_i(a_i)\}(F_i(b_i) - F_i(a_i)) \]
and $|1 - F_i(b_i) - F_i(a_i)| \le 1$.
9.3. Each partial derivative kills one integral.
9.4. If $K$ is closed, $H = \{x : x_i \in K\}$ is closed. So
\[ \limsup_{n\to\infty} P(X_{n,i} \in K) = \limsup_{n\to\infty} P(X_n \in H) \le P(X \in H) = P(X_i \in K) \]
9.5. If $X$ has ch.f. $\varphi$ then the vector $Y = (X, \ldots, X)$ has ch.f.
\[ \psi(t) = E\exp\left( i\sum_j t_j X \right) = \varphi\left( \sum_j t_j \right) \]
9.6. If the random variables are independent this follows from (3.1f). For the
converse we note that the inversion formula implies that the joint distribution
of the X
i
is that of independent random variables.
9.7. Clearly, independence implies $\Gamma_{ij} = 0$ for $i \ne j$. To prove the converse note that $\Gamma_{ij} = 0$ for $i \ne j$ implies
\[ \varphi_{X_1, \ldots, X_d}(t) = \prod_{j=1}^d \varphi_{X_j}(t_j) \]
and then use Exercise 9.6.
9.8. If $(X_1, \ldots, X_d)$ has a multivariate normal distribution then
\[ \varphi_{c_1X_1 + \cdots + c_dX_d}(t) = E\exp\left( i\sum_j tc_jX_j \right) = \exp\left( it\sum_i c_i\theta_i - \sum_i\sum_j t^2 c_i\Gamma_{ij}c_j/2 \right) \]
This is the ch.f. of a normal distribution with mean $c\theta^t$ and variance $c\Gamma c^t$. To prove the converse note that the assumption about the distribution of linear combinations implies
\[ E\exp\left( i\sum_j c_jX_j \right) = \exp\left( i\sum_i c_i\theta_i - \sum_i\sum_j c_i\Gamma_{ij}c_j/2 \right) \]
so the vector has the right ch.f.
3 Random Walks
3.1. Stopping Times
1.1. $P(X_i = 0) < 1$ rules out (i). By symmetry if (ii) or (iii) holds then the other one does as well, so (iv) is the only possibility.
1.2. The central limit theorem implies S
n
/
n 1) > 0
So Kolmogorov's 0-1 law implies this probability is 1. This shows $\limsup S_n = \infty$ with probability 1. A similar argument shows $\liminf S_n = -\infty$ with probability one.
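1.3. $\{S \wedge T = n\} = \{S = n, T \ge n\} \cup \{S \ge n, T = n\}$. The right-hand side is in $\mathcal{F}_n$ since $\{S = n\} \in \mathcal{F}_n$ and $\{T \ge n\} = \{T \le n - 1\}^c \in \mathcal{F}_{n-1}$, etc. For the other result note that $\{S \vee T = n\} = \{S = n, T \le n\} \cup \{S \le n, T = n\}$. The right-hand side is in $\mathcal{F}_n$ since $\{S = n\} \in \mathcal{F}_n$ and $\{T \le n\} = \cup_{m=1}^n \{T = m\} \in \mathcal{F}_n$, etc.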
1.4. $\{S + T = n\} = \cup_{m=1}^{n-1} \{S = m, T = n - m\} \in \mathcal{F}_n$ so the result is true.
1.5. $\{Y_N \in B\} \cap \{N = n\} = \{Y_n \in B\} \cap \{N = n\} \in \mathcal{F}_n$, so $Y_N \in \mathcal{F}_N$.
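1.6. If $A \in \mathcal{F}_M$ then
\[ A \cap \{N = n\} = \cup_{m=1}^n A \cap \{M = m\} \cap \{N = n\} \]
Since $A \in \mathcal{F}_M$, $A \cap \{M = m\} \in \mathcal{F}_m \subset \mathcal{F}_n$. Thus $A \cap \{N = n\} \in \mathcal{F}_n$ and $A \in \mathcal{F}_N$.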
1.7. Dividing the space into $A$ and $A^c$ then breaking things down according to the value of $L$
\[ \{N = n\} = (\{L = n\} \cap A) \cup \cup_{m=1}^n (\{L = m\} \cap \{M = n\} \cap A^c) \]
$\{L = m\} \cap A^c$ is in $\mathcal{F}_m$ whenever $A \in \mathcal{F}_L$ by the definition of $\mathcal{F}_L$. Combining this with $\{M = n\} \in \mathcal{F}_n$ proves the desired result.
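1.8. (i) (1.4) implies that $P(\alpha_k < \infty) = P(\alpha < \infty)^k$ so $P(\alpha_k < \infty) \to 0$ if $P(\alpha < \infty) < 1$. (ii) (1.5) implies that $\xi_k = S_{\alpha_k} - S_{\alpha_{k-1}}$ are i.i.d., with $E\xi_k \in (0, \infty]$ so (7.2) in Chapter 1 implies $S_{\alpha_k}/k \to E\xi_1 > 0$ and $\sup_n S_n = \infty$.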
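1.9. By the previous exercise we get the following correspondence
\[ \begin{array}{llll}
P(\alpha < \infty) < 1 & P(\beta < \infty) < 1 & \sup S_n < \infty & \inf S_n > -\infty \\
P(\alpha < \infty) = 1 & P(\beta < \infty) < 1 & \sup S_n = \infty & \inf S_n > -\infty \\
P(\alpha < \infty) < 1 & P(\beta < \infty) = 1 & \sup S_n < \infty & \inf S_n = -\infty \\
P(\alpha < \infty) = 1 & P(\beta < \infty) = 1 & \sup S_n = \infty & \inf S_n = -\infty
\end{array} \]
Using (1.2) now we see that the four lines correspond to (i)-(iv).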
1.10. (i) $A^n_m$ corresponds to breaking things down according to the location of the last time the minimum is attained so the $A^n_m$ are a partition of $\Omega$. To get the second equality we note that
\[ A^n_m = \{X_m \le 0,\ X_m + X_{m-1} \le 0,\ \ldots,\ X_m + \cdots + X_1 \le 0, \]
\[ \qquad X_{m+1} > 0,\ X_{m+1} + X_{m+2} > 0,\ \ldots,\ X_{m+1} + \cdots + X_n > 0\} \]
(ii) Fatous lemma implies
1 P(
= )
k=0
P( > k) = P(
= )E
When P(
= ) > 0 the last inequality implies that E < and the desired
result follows from the dominated convergence theorem. It remains to prove
that if P(
k=0
P( > k)P(
> n k)
P(
> N) liminf
n
nN
k=0
P( > k)
so E 1/P(
= ) = 0, EX
i
= 0 and P(X
i
= 0) < 1 then P( = ) = 0. Similarly,
P( = ) = 0. Now use Exercise 1.9.
1.12. Changing variables $y_k = x_1 + \cdots + x_k$ we have
\[ P(T > n) = \int_0^1 \int_0^{1-x_1} \cdots \int_0^{1-x_1-\cdots-x_{n-1}} dx_n \cdots dx_2\,dx_1 = \int\!\cdots\!\int_{0<y_1<\ldots<y_n<1} dy_n \cdots dy_1 = 1/n! \]
since the region is $1/n!$ of the volume of $[0, 1]^n$. From this it follows that $ET = \sum_{n=0}^\infty P(T > n) = e$ and Wald's equation implies $ES_T = ET\,EX_i = e/2$.
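The value $ET = e$ in 1.12 lends itself to a quick simulation. The sketch below assumes, as the computation $P(T > n) = 1/n!$ suggests, that $T$ is the first $n$ for which a sum of independent uniform $(0,1)$ variables exceeds 1. The seed and the number of trials are arbitrary.

```python
import math
import random

def sample_T(rng):
    # T = first n with U_1 + ... + U_n > 1
    total, n = 0.0, 0
    while total <= 1.0:
        total += rng.random()
        n += 1
    return n

def estimate_ET(trials=200000, seed=0):
    rng = random.Random(seed)
    return sum(sample_T(rng) for _ in range(trials)) / trials

# ET = sum_{n >= 0} P(T > n) = sum_{n >= 0} 1/n!
series = sum(1.0 / math.factorial(n) for n in range(20))
print(estimate_ET(), series, math.e)
```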
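1.13. (i) The strong law implies $S_n \to \infty$ so by Exercise 1.9 we must have $P(\alpha < \infty) = 1$ and $P(\beta < \infty) < 1$. (ii) This follows from (1.4). (iii) Wald's equation implies $ES_{\alpha \wedge n} = E(\alpha \wedge n)EX_1$. The monotone convergence theorem implies that $E(\alpha \wedge n) \uparrow E\alpha$. $P(\alpha < \infty) = 1$ implies $S_{\alpha \wedge n} \to S_\alpha = 1$. (ii) and the dominated convergence theorem imply $ES_{\alpha \wedge n} \to 1$.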
1.14. (i) $T$ has a geometric distribution with success probability $p$ so $ET = 1/p$. The first $X_n$ that is larger than $a$ has the distribution of $X_1$ conditioned on $X_1 > a$ so
\[ EY_T = a + E(X - a)^+/p - c/p \]
(ii) If $a = \alpha$ the last expression reduces to $\alpha$. Clearly
\[ \max_{m \le n} X_m \le \alpha + \sum_{m=1}^n (X_m - \alpha)^+ \]
for $n \ge 1$; subtracting $cn$ gives the inequality in the exercise. Wald's equation implies that if $E\tau < \infty$ then
\[ E\sum_{m=1}^\tau (X_m - \alpha)^+ = E\tau\,E(X_1 - \alpha)^+ \]
Using the definition of $c$ now we have $EY_\tau \le \alpha$.
1.15. Using the definitions and then taking expected value
\[ S^2_{T \wedge n} = S^2_{T \wedge (n-1)} + (2X_nS_{n-1} + X_n^2)1_{(T \ge n)} \]
\[ ES^2_{T \wedge n} = ES^2_{T \wedge (n-1)} + \sigma^2 P(T \ge n) \]
since $EX_n = 0$ and $X_n$ is independent of $S_{n-1}$ and $1_{(T \ge n)} \in \mathcal{F}_{n-1}$. [The expectation of $S_{n-1}X_n$ exists since both random variables are in $L^2$.] From the last equality and induction we get
\[ ES^2_{T \wedge n} = \sigma^2 \sum_{m=1}^n P(T \ge m) \]
\[ E(S_{T \wedge n} - S_{T \wedge m})^2 = \sigma^2 \sum_{k=m+1}^n P(T \ge k) \]
The second equality follows from the first applied to $X_{m+1}, X_{m+2}, \ldots$. The second equality implies that $S_{T \wedge n}$ is a Cauchy sequence in $L^2$, so letting $n \to \infty$ in the first it follows that $ES_T^2 = \sigma^2 ET$.
3.4. Renewal Theory
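4.1. Let $\bar X_i = X_i \wedge t$, $\bar T_k = \bar X_1 + \cdots + \bar X_k$, $\bar N_t = \inf\{k : \bar T_k > t\}$. Now $\bar X_i = X_i$ unless $X_i > t$ and $X_i > t$ implies $N_t \le i$. Now
\[ t \le \bar T_{N_t} \le 2t \]
the optional stopping theorem implies
\[ E\bar T_{N_t} = E(X_i \wedge t)\,E\bar N_t \]
and the desired result follows.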
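4.2. Pick $\delta > 0$ so that $P(\xi_i > \delta) = \epsilon > 0$. Let $\xi'_k = 0$ if $\xi_k \le \delta$ and $= \delta$ if $\xi_k > \delta$. Let $T'_n = \xi'_1 + \cdots + \xi'_n$ and $M_t = \inf\{n : T'_n > t\}$. Clearly $T'_n \le T_n$ and so $N_t \le M_t$. $M_t$ is the sum of $k_t = [t/\delta] + 1$ geometrics with success probability $\epsilon$ so by Example 3.5 in Chapter 1
\[ EM_t = k_t/\epsilon \]
\[ \mathrm{var}(M_t) = k_t(1 - \epsilon)/\epsilon^2 \]
\[ E(M_t)^2 = \mathrm{var}(M_t) + (EM_t)^2 \le C(1 + t^2) \]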
4.3. The lack of memory property of the exponential implies that the times between customers who are served is a sum of a service time with mean $\mu$ and a waiting time that is exponential with mean 1. (4.1) implies that the number of customers served up to time $t$, $M_t$, satisfies $M_t/t \to 1/(1 + \mu)$. (4.1) applied to the Poisson process implies $N_t/t \to 1$ a.s. so $M_t/N_t \to 1/(1 + \mu)$ a.s.
4.4. Clearly if I
(x) = sup
y(x,x+)
h(y)
a
k
= sup
y[k,(k+1))
h(y)
If x [m, (m+ 1)) then
h
(x) a
m1
+a
m
+a
m+1
so integrating over [m, (m+ 1)) and summing over m gives
0
h
(x) dx 3I
<
Now I
0
a
[x/]
dx, a
[x/]
h(x) as 0, and if < then a
[x/]
h
(x)
so the dominated convergence theorem implies
I
0
h(x) dx
A similar argument shows I
0
h(x) dx and the proof is complete.
4.5. The equation comes from considering the time of the first renewal. It is easy to see using (4.10) that $h(t) = (1 - F(t))1_{(x,\infty)}(t)$ is directly Riemann integrable whenever $\mu < \infty$ so (4.9) implies
\[ H(t) \to \frac{1}{\mu}\int_0^\infty (1 - F(s))1_{(x,\infty)}(s)\,ds \]
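4.6. In this case the equation is
\[ H(t) = e^{-\lambda t}1_{(x,\infty)}(t) + \int_0^t H(t - s)\,\lambda e^{-\lambda s}\,ds \]
and one can check by integrating that the solution is
\[ H(t) = \begin{cases} 0 & \text{if } t < x \\ e^{-\lambda x} & \text{if } t \ge x \end{cases} \]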
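The claim in 4.6 that the stated function solves the renewal equation can be checked numerically. The sketch below assumes exponential interarrival times with rate `lam` (so that the equation reads $H(t) = e^{-\lambda t}1_{(x,\infty)}(t) + \int_0^t H(t-s)\lambda e^{-\lambda s}\,ds$), plugs in the candidate solution, and evaluates the right-hand side with a midpoint rule. The names and parameter values are illustrative.

```python
import math

def H(t, x, lam):
    # candidate solution: 0 for t < x, e^{-lam x} for t >= x
    return math.exp(-lam * x) if t >= x else 0.0

def rhs(t, x, lam, steps=20000):
    # e^{-lam t} 1_{(x,inf)}(t) + int_0^t H(t - s) lam e^{-lam s} ds
    first = math.exp(-lam * t) if t > x else 0.0
    h = t / steps
    integral = 0.0
    for i in range(steps):
        s = (i + 0.5) * h
        integral += H(t - s, x, lam) * lam * math.exp(-lam * s)
    return first + integral * h

for t in (0.5, 1.5, 3.0):
    print(t, H(t, 1.0, 2.0), rhs(t, 1.0, 2.0))
```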
4.7. By considering the time of the first renewal
\[ H(t) = (1 - F(t + y))1_{(x,\infty)}(t) + \int_0^t H(t - s)\,dF(s) \]
It is easy to see using (4.10) that $h(t) = (1 - F(t + y))1_{(x,\infty)}(t)$ is directly Riemann integrable whenever $\mu < \infty$ so (4.9) implies
\[ H(t) \to \frac{1}{\mu}\int_0^\infty (1 - F(y + s))1_{(x,\infty)}(s)\,ds \]
4.8. By considering $\xi_1$ and $\eta_1$
\[ H(t) = 1 - F_1(t) + \int_0^t H(t - s)\,dF(s) \]
It follows from (4.10) that $h(t) = 1 - F_1(t)$ is directly Riemann integrable whenever $\mu_1 < \infty$ so (4.9) implies
\[ H(t) \to \frac{1}{\mu}\int_0^\infty 1 - F_1(s)\,ds = \frac{\mu_1}{\mu_1 + \mu_2} \]
4.9. By considering the times of the first two renewals we see
\[ H(t) = 1 - F(t) + \int_0^t H(t - s)\,dF^{2*}(s) \]
Taking $\mu_1 = \mu_2 = \mu$ in the previous exercise gives the desired result.
4.10. $V = F + V * F$ so differentiating gives the desired equality. Using (4.9)
now gives
v(t)
1
0
f(t) dt =
1
n
m=1
U
m
/n 2
k
. Since EN
n
/n 1/Et
2
, it follows that Et
2
= 2
k
.
(ii) For HH we get $Et_1 = 5$ since
\[ Et_1 = 1/4 + 1/4(Et_1 + 2) + 1/2(Et_1 + 1) \quad \Rightarrow \quad (1/4)Et_1 = 5/4 \]
For HT we note that if we get heads the first time then we have what we want the first time T appears so
\[ Et_1 = P(H) \cdot 2 + P(T) \cdot (Et_1 + 1) \quad \Rightarrow \quad (1/2)Et_1 = 3/2 \]
and $Et_1 = 3$.
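The pattern computations can be sanity-checked by simulation. The sketch below assumes fair coin flips and reads $t_1$ as the 1-based index of the flip at which the first occurrence of the pattern begins, which is the reading under which the HT recursion above gives 3. With the same reading, the HH recursion $Et_1 = 1/4 + (1/4)(Et_1 + 2) + (1/2)(Et_1 + 1)$ solves to 5. The function names, seed, and number of trials are arbitrary.

```python
import random

def first_start_index(pattern, rng):
    # 1-based index of the flip at which the first occurrence of `pattern` begins
    k = len(pattern)
    window = ""
    n = 0
    while window != pattern:
        window = (window + rng.choice("HT"))[-k:]
        n += 1
    return n - k + 1

def estimate_start(pattern, trials=100000, seed=1):
    rng = random.Random(seed)
    return sum(first_start_index(pattern, rng) for _ in range(trials)) / trials

est_ht = estimate_start("HT")
est_hh = estimate_start("HH")
print(est_ht, est_hh)
```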
4 Martingales
4.1. Conditional Expectation
1.1. Let $Y_i = E(X_i|\mathcal{F})$. If $A \subset B$ and $A \in \mathcal{F}$ then
\[ \int_A Y_1\,dP = \int_A X_1\,dP = \int_A X_2\,dP = \int_A Y_2\,dP \]
If $A = \{Y_1 - Y_2 > 0\} \cap B$ then repeating the proof of uniqueness shows $P(A) = 0$ and $Y_1 = Y_2$ a.s. on $B$.
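1.2. The definition of conditional expectation implies
\[ \int_B P(A|\mathcal{G})\,dP = \int_B 1_A\,dP = P(A \cap B) \]
Taking $B = G$ and $B = \Omega$ it follows that
\[ \frac{\int_G P(A|\mathcal{G})\,dP}{\int_\Omega P(A|\mathcal{G})\,dP} = \frac{P(G \cap A)}{P(A)} = P(G|A) \]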
1.3. $a^2 1_{(|X| \ge a)} \le X^2$ so using (1.1b) and (1.1a) gives the desired result.
1.4. (1.1b) implies $Y_M \equiv E(X \wedge M|\mathcal{F}) \uparrow$ a limit $Y$. If $A \in \mathcal{F}$ then the definition of conditional expectation implies
\[ \int_A X \wedge M\,dP = \int_A Y_M\,dP \]
Using the monotone convergence theorem now gives
\[ \int_A X\,dP = \int_A Y\,dP \]
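1.5. (1.1b) and (1.1a) imply
\[ 0 \le E((\theta X + Y)^2|\mathcal{G}) = E(X^2|\mathcal{G})\theta^2 + 2\theta E(XY|\mathcal{G}) + E(Y^2|\mathcal{G}) \]
Now a quadratic $a\theta^2 + b\theta + c$ which is nonnegative at all rational $\theta$ must have $b^2 - 4ac \le 0$ and the desired result follows.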
1.6. Let $\mathcal{F}_1 = \sigma(\{a\})$ and $\mathcal{F}_2 = \sigma(\{c\})$. Take $X(b) = 1$, $X(a) = X(c) = 0$. In this case

                          a      b      c
E(X|F_1)                  0     1/2    1/2
E(E(X|F_1)|F_2)          1/4    1/4    1/2

To see this is $\ne E(E(X|\mathcal{F}_2)|\mathcal{F}_1)$, we can note it is not $\in \mathcal{F}_1$.
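The table in 1.6 can be reproduced by direct computation on the three-point space. The sketch below assumes the uniform probability on $\{a, b, c\}$ (which is consistent with the values 1/2 and 1/4 in the table) and represents each $\sigma$-field by the partition that generates it. All names are illustrative.

```python
from fractions import Fraction

OMEGA = ["a", "b", "c"]
P = {w: Fraction(1, 3) for w in OMEGA}  # assumed: uniform measure

def cond_exp(X, partition):
    # conditional expectation given the sigma-field generated by `partition`
    out = {}
    for block in partition:
        mass = sum(P[w] for w in block)
        avg = sum(X[w] * P[w] for w in block) / mass
        for w in block:
            out[w] = avg
    return out

F1 = [["a"], ["b", "c"]]  # sigma({a})
F2 = [["c"], ["a", "b"]]  # sigma({c})
X = {"a": 0, "b": 1, "c": 0}

first = cond_exp(cond_exp(X, F1), F2)   # E(E(X|F_1)|F_2)
second = cond_exp(cond_exp(X, F2), F1)  # E(E(X|F_2)|F_1)
print(cond_exp(X, F1), first, second)
```

The two iterated conditional expectations differ, which is the point of the example.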
1.7. (i) implies (ii) follows from Example 1.2. The failure of the converse follows from Example 4.2 in Chapter 1.
To prove (ii) implies (iii) we note that (1.1f), (1.3), and the assumption imply
\[ E(XY) = EE(XY|X) = E(XE(Y|X)) = E(X\,EY) = EX\,EY \]
To see that the converse fails consider

X \ Y      1      -1
  1       1/4      0
  0        0      1/2
 -1       1/4      0

where $EX = EY = EXY = 0$ but $E(Y|X) = -1 + 2X^2$.
1.8. Let $Z = E(X|\mathcal{F}) - E(X|\mathcal{G}) \in \mathcal{F}$ and steal an equation from the proof of (1.4)
\[ E\{X - E(X|\mathcal{F}) + Z\}^2 = E\{X - E(X|\mathcal{F})\}^2 + EZ^2 \]
Inserting the definition of $Z$ now gives the desired result.
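1.9. Since $\mathrm{var}(X|\mathcal{F}) = E(X^2|\mathcal{F}) - E(X|\mathcal{F})^2$ and $E(E(X^2|\mathcal{F})) = EX^2$ we have
\[ E(\mathrm{var}(X|\mathcal{F})) = EX^2 - E(E(X|\mathcal{F})^2) \]
Since $E(E(X|\mathcal{F})) = EX$ we have
\[ \mathrm{var}(E(X|\mathcal{F})) = E(E(X|\mathcal{F})^2) - (EX)^2 \]
Adding the two equations gives the desired result.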
1.10. Let $\mathcal{F} = \sigma(N)$. Our first step is to prove $E(X|N) = \mu N$. Clearly (i) in the definition holds. To check (ii), it suffices to consider $A = \{N = n\}$ but in this case
$$\int_{\{N=n\}} X\,dP = E\{(Y_1 + \cdots + Y_n)1_{(N=n)}\} = n\mu P(N = n) = \int_{\{N=n\}} \mu N\,dP$$
A similar computation shows that $E(X^2|N) = \sigma^2 N + (\mu N)^2$ so
$$\mathrm{var}(X|N) = E(X^2|N) - E(X|N)^2 = \sigma^2 N$$
and using the previous exercise we have
$$\mathrm{var}(X) = E(\mathrm{var}(X|N)) + \mathrm{var}(E(X|N)) = \sigma^2 EN + \mu^2\,\mathrm{var}(N)$$
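The variance formula in 1.10 can be sanity-checked exactly on a small example. This is a sketch under assumptions: the distributions `y_pmf` and `n_pmf` below are arbitrary illustrative choices (not from the text), with $N$ independent of the i.i.d. $Y_i$, and the pmf of the random sum is built by convolution.

```python
from fractions import Fraction as Fr

def mean_var(pmf):
    """Mean and variance of a finite pmf given as {value: probability}."""
    m = sum(p * v for v, p in pmf.items())
    return m, sum(p * v * v for v, p in pmf.items()) - m * m

def convolve(a, b):
    """pmf of the sum of two independent variables with pmfs a and b."""
    out = {}
    for u, p in a.items():
        for v, q in b.items():
            out[u + v] = out.get(u + v, 0) + p * q
    return out

def random_sum_pmf(y_pmf, n_pmf):
    """pmf of X = Y_1 + ... + Y_N with N independent of the i.i.d. Y_i."""
    out = {}
    n_fold = {0: Fr(1)}  # pmf of the empty sum
    for n in range(max(n_pmf) + 1):
        if n > 0:
            n_fold = convolve(n_fold, y_pmf)
        w = n_pmf.get(n, 0)
        for v, p in n_fold.items():
            out[v] = out.get(v, 0) + w * p
    return out

# Hypothetical example distributions.
y_pmf = {-1: Fr(1, 3), 2: Fr(2, 3)}
n_pmf = {1: Fr(1, 2), 2: Fr(1, 4), 4: Fr(1, 4)}

mu, sigma2 = mean_var(y_pmf)
EN, varN = mean_var(n_pmf)
EX, varX = mean_var(random_sum_pmf(y_pmf, n_pmf))
```

Because rational arithmetic is used, `varX` can be compared with `sigma2 * EN + mu**2 * varN` for exact equality rather than up to rounding.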
1.11. Exercise 1.8 with $\mathcal{G} = \{\emptyset, \Omega\}$ implies
$$E(Y - X)^2 + E(X - EY)^2 = E(Y - EY)^2$$
Since $EY = EX$ and $EX^2 = EY^2$, $E(X - EX)^2 = E(Y - EY)^2$ and subtracting we conclude $E(Y - X)^2 = 0$.
1.12. Jensen's inequality implies
$$|E(X|\mathcal{F})| \le E(|X|\,|\mathcal{F})$$
If the two expected values are equal then the two random variables must be equal almost surely, so $E(X|\mathcal{F}) = E(|X|\,|\mathcal{F})$ a.s. on $\{E(X|\mathcal{F}) > 0\}$. Taking expected value and using the definition of conditional expectation
$$E(|X| - X; E(X|\mathcal{F}) > 0) = 0$$
This and a similar argument on $\{E(X|\mathcal{F}) < 0\}$ imply
$$\mathrm{sgn}(X) = \mathrm{sgn}(E(X|\mathcal{F})) \text{ a.s.}$$
Taking $X = Y - c$ it follows that $\mathrm{sgn}(Y - c) = \mathrm{sgn}(E(Y|\mathcal{G}) - c)$ a.s. for all rational $c$ from which the desired result follows.
1.13. (i) in the denition follows by taking h = 1
A
in Example 1.4. To check
(ii) note that the dominated convergence theorem implies that A (y, A) is
a probability measure.
1.14. If $f = 1_A$ this follows from the definition. Linearity extends the result to simple $f$ and monotone convergence to nonnegative $f$. Finally we get the result in general by writing $f = f^+ - f^-$.
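1.15. If we fix $\omega$ and apply the ordinary Hölder inequality we get
$$\int \mu(\omega, d\omega')\,|X(\omega')Y(\omega')| \le \left(\int \mu(\omega, d\omega')\,|X(\omega')|^p\right)^{1/p}\left(\int \mu(\omega, d\omega')\,|Y(\omega')|^q\right)^{1/q}$$
The desired result now follows from Exercise 1.14.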
1.16. Proof As in the proof of (1.6), we nd there is a set
o
with P(
o
) =
1 and a family of random variables G(q, ), q Q so that q G(q, ) is
nondecreasing and G(q, ) is a version of P((X) qG). Since G(q, )
(Y ) we can write G(q, ) = H(q, Y ()). Let F(x, y) = inf{G(q, y) : q > x}.
The argument given in the proof of (1.6) shows that there is a set A
0
with
P(Y A
0
) = 1 so that when y A
0
, F is a distribution function and that
F(x, Y ()) is a version of P((X) xY ).
Now for each y A
o
, there is a unique measure (y, ) on (R, R) so that
(y, (, x]) = F(x, y)). To check that for each B R, (Y (), B) is a version
of P((X) BY ), we observe that the class of B for which this statement is
true (this includes the measurability of (Y (), B)) is a system that
contains all sets of the form (a
1
, b
1
] (a
k
, b
k
] where a
i
< b
i
,
so the desired result follows from the theorem. To extract the desired
r.c.d. notice that if A S, and B = (A) then B = (
1
)
1
(A) R, and set
(y, A) = (y, B).
4.2. Martingales, Almost Sure Convergence
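2.1. Since $X_n \in \mathcal{G}_n$ and $n \to \mathcal{G}_n$ is increasing, $\mathcal{F}_n = \sigma(X_1, \ldots, X_n) \subset \mathcal{G}_n$. To check that $X_n$ is a martingale note that $X_n \in \mathcal{F}_n$, while (1.2) implies
$$E(X_{n+1}|\mathcal{F}_n) = E(E(X_{n+1}|\mathcal{G}_n)|\mathcal{F}_n) = E(X_n|\mathcal{F}_n) = X_n$$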
2.2. The fact that $f$ is continuous implies it is bounded on bounded sets and hence $E|f(S_n)| < \infty$. Using various definitions now, we have
$$E(f(S_{n+1})|\mathcal{F}_n) = E(f(S_n + \xi_{n+1})|\mathcal{F}_n) = \frac{1}{|B(0,1)|}\int_{B(S_n,1)} f(y)\,dy \le f(S_n)$$
2.3. Let $a_n \ge 0$ be decreasing. Then $X_n = -a_n$ is a submartingale but $X_n^2 = a_n^2$ is a supermartingale.
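2.4. Suppose $P(\xi_i = -1) = 1 - \epsilon_i$, $P(\xi_i = (1 - \epsilon_i)/\epsilon_i) = \epsilon_i$. Pick $\epsilon_i > 0$ so that $\sum_i \epsilon_i < \infty$, e.g. $\epsilon_i = i^{-2}$. Then $P(\xi_i \ne -1 \text{ i.o.}) = 0$ so $X_n/n \to -1$ and $X_n \to -\infty$.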
2.5. $A_n = \sum_{m=1}^n P(B_m|\mathcal{F}_{m-1})$.
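2.6. Since $(S_n + \xi_{n+1})^2 = S_n^2 + 2S_n\xi_{n+1} + \xi_{n+1}^2$ and $\xi_{n+1}$ is independent of $\mathcal{F}_n$, we have
$$\begin{aligned}
E(S_{n+1}^2 - s_{n+1}^2|\mathcal{F}_n) &= S_n^2 + 2S_nE(\xi_{n+1}|\mathcal{F}_n) + E(\xi_{n+1}^2|\mathcal{F}_n) - s_{n+1}^2 \\
&= S_n^2 + 0 + \sigma_{n+1}^2 - s_{n+1}^2 = S_n^2 - s_n^2
\end{aligned}$$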
2.7. Clearly, X
(k)
n
F
n
. The independence of the
i
, (4.8) in Chapter 1 and
the triangle inequality imply EX
(k)
n
 < . Since X
(k)
n+1
= X
(k)
n
+X
(k1)
n
n+1
taking conditional expectation and using (1.3) gives
E(X
(k)
n+1
F
n
) = X
(k)
n
+X
(k1)
n
E(
n+1
F
n
) = X
(k)
n
2.8. Clearly, X
n
Y
n
F
n
. Since X
n
Y
n
 X
n
 + Y
n
, EX
n
Y
n
 < .
Use monotonicity (1.1b) and the dention of supermartingale
E(X
n+1
Y
n+1
F
n
) E(X
n+1
F
n
) X
n
E(X
n+1
Y
n+1
F
n
) E(Y
n+1
F
n
) Y
n
From this it follows that E(X
n+1
Y
n+1
F
n
) X
n
Y
n
.
2.9. (i) Clearly $X_n \in \mathcal{F}_n$, and $E|X_n| < \infty$. Using (1.3) now we have
$$E(X_{n+1}|\mathcal{F}_n) = X_nE(Y_{n+1}|\mathcal{F}_n) = X_n$$
since $Y_{n+1}$ is independent of $\mathcal{F}_n$ and has $EY_{n+1} = 1$.
(ii) (2.11) implies X
n
X
= 0 a.s.
(iii) Applying Jensens inequality with (x) = log x to Y
i
and then letting
0 we have E log Y
i
[, 0]. Applying the strong law of large numbers,
(7.3) in Chapter 1, to log Y
i
we have
1
n
log X
n
=
1
n
n
m=1
log Y
m
E log Y
1
[, 0]
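2.10. Our first step is to prove
Lemma. When $|y| \le 1/2$, $y - y^2 \le \log(1+y) \le y$.
Proof. $1 + y \le e^y$ implies $\log(1+y) \le y$ for all $y$. Expanding $\log(1+y)$ in power series gives
$$\log(1+y) = y - \frac{y^2}{2} + \frac{y^3}{3} - \frac{y^4}{4} + \cdots$$
When $|y| \le 1/2$
$$\left|-\frac{y^2}{2} + \frac{y^3}{3} - \frac{y^4}{4} + \cdots\right| \le \frac{y^2}{2}\left(1 + \frac{1}{2} + \frac{1}{2^2} + \cdots\right) = y^2$$
which completes the proof.
Now if $\sum_{m=1}^\infty |y_m| < \infty$ we have $|y_m| \le 1/2$ for $m \ge M$ so $\sum_{m=1}^\infty y_m^2 < \infty$ and if $N \ge M$ the lemma implies
$$\sum_{m=N}^\infty (y_m - y_m^2) \le \sum_{m=N}^\infty \log(1+y_m) \le \sum_{m=N}^\infty y_m$$
The last inequality shows $\sum_{m=N}^\infty \log(1+y_m) \to 0$ as $N \to \infty$, so $\prod_{m=1}^\infty (1+y_m)$ exists.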
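2.11. Let $W_n = X_n/\prod_{m=1}^{n-1}(1+Y_m)$. Clearly $W_n \in \mathcal{F}_n$, $E|W_n| \le E|X_n| < \infty$. Using (1.3) now and the definition gives
$$E(W_{n+1}|\mathcal{F}_n) = \frac{1}{\prod_{m=1}^{n}(1+Y_m)}E(X_{n+1}|\mathcal{F}_n) \le \frac{1}{\prod_{m=1}^{n-1}(1+Y_m)}X_n = W_n$$
Thus $W_n$ is a nonnegative supermartingale and (2.11) implies that $W_n \to W_\infty$. $\sum_m Y_m < \infty$ implies that $\prod_{m=1}^{n-1}(1+Y_m) \to \prod_{m=1}^\infty(1+Y_m)$, so $X_n \to W_\infty\prod_{m=1}^\infty(1+Y_m)$ a.s.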
2.12. Let $S_n$ be the random walk from Exercise 2.2. That exercise implies $f(S_n) \ge 0$ is a supermartingale, so (2.11) implies $f(S_n)$ converges to a limit almost surely. If $f$ is continuous and nonconstant then there are constants $\alpha < \beta$ so that $G = \{f < \alpha\}$ and $H = \{f > \beta\}$ are nonempty open sets. Since the random walk $S_n$ has mean 0 and finite variance, (2.7) and (2.8) in Chapter 3 imply that $S_n$ visits $G$ and $H$ infinitely often. This implies
$$\liminf f(S_n) \le \alpha < \beta \le \limsup f(S_n)$$
a contradiction which implies $f$ must be constant.
2.13. Using the denition of Y
n+1
, the inequality X
1
N
X
2
N
, the fact that
{N n} F
n
(and hence {N > n} F
n
), and nally the supermartingale
property we have
E(Y
n+1
F
n
) = E(X
1
n+1
1
(N>n+1)
+X
2
n+1
1
(Nn+1)
F
n
)
E(X
1
n+1
1
(N>n)
+X
2
n+1
1
(Nn)
F
n
)
= E(X
1
n+1
F
n
)1
(N>n)
+E(X
2
n+1
F
n
)1
(Nn)
X
1
n
1
(N>n)
+X
2
n
1
(Nn)
= Y
n
2.14. (i) To start we note that Z
1
n
1 is clearly a supermartingale. For the
induction step we have to consider two cases k = 2j and k = 2j +1. In the case
k = 2j we use the previous exercise with X
1
= Z
2j1
, X
2
= (b/a)
j1
(X
n
/a),
and N = N
2j1
. Clearly these are supermartingales. To check the other con
dition we note that since X
N
a we have X
1
N
= (b/a)
j1
X
2
N
.
In the case k = 2j + 1 we use the previous exercise with X
1
= Z
2j
and X
2
=
(b/a)
j
, and N = N
2j
. Clearly these are supermartingales. To check the other
condition we note that since X
N
b we have X
1
N
(b/a)
j
= X
2
N
.
(ii) Since Z
2k
n is a supermartingale, EY
0
EY
nN
2k
. Letting n and
using Fatous lemma we have
E(min(X
0
/a, 1) = EY
0
E(Y
N
2k
; N
2k
< ) = (b/a)
k
P(U k)
4.3. Examples
3.1. Let N = inf{n : X
n
> M}. X
Nn
is a submartingale with
X
+
Nn
M + sup
n
+
n
so sup
n
EX
+
Nn
< . (2.10) implies X
Nn
a limit so X
n
converges on
{N = }. Letting M and recalling we have assumed sup
n
+
n
< gives
the desired conclusion.
3.2. Let U
1
, U
2
, . . . be i.i.d. uniform on (0,1). If X
n
= 0 then X
n+1
= 1 if
U
n+1
1/2, X
n+1
= 1 if U
n+1
< 1/2. If X
n
= 0 then X
n+1
= 0 if U
n+1
>
n
2
, while X
n+1
= n
2
X
n
if U
n+1
< n
2
. [We use the sequence of uniforms
because it makes it clear that the decisions at time n + 1 are independent of
the past.]
n
1/n
2
< so the Borel Cantelli lemma implies that eventually
we just go from 0 to 1 and then back to 0 again, so sup X
n
 < .
3.3. Modify the previous example so that if $X_n = 0$ then $X_{n+1} = 1$ on $U_{n+1} > 3/4$, $X_{n+1} = -1$ if $U_{n+1} < 1/4$, $X_{n+1} = 0$ otherwise. The previous argument shows that eventually $X_n$ is indistinguishable from the Markov chain on $\{-1, 0, 1\}$ with transition matrix
$$\begin{pmatrix} 0 & 1 & 0 \\ 1/4 & 1/2 & 1/4 \\ 0 & 1 & 0 \end{pmatrix}$$
This chain converges to its stationary distribution which assigns mass 2/3 to 0 and 1/6 each to $-1$ and 1.
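The stationary distribution quoted in 3.3 can be verified directly. A minimal sketch, assuming the states are ordered $-1, 0, 1$ as rows and columns of the matrix; `step` and the iteration count are illustrative choices.

```python
from fractions import Fraction as Fr

# Transition matrix from 3.3, states ordered -1, 0, 1 (assumed ordering).
P = [[Fr(0), Fr(1), Fr(0)],
     [Fr(1, 4), Fr(1, 2), Fr(1, 4)],
     [Fr(0), Fr(1), Fr(0)]]

def step(dist, P):
    """One step of the chain: the row vector dist times P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

pi = [Fr(1, 6), Fr(2, 3), Fr(1, 6)]
is_stationary = step(pi, P) == pi

# Start at state -1 and iterate; the law of X_n approaches pi.
dist = [Fr(1), Fr(0), Fr(0)]
for _ in range(30):
    dist = step(dist, P)
gap = max(abs(d - p) for d, p in zip(dist, pi))
```

The first check confirms $\pi P = \pi$ exactly; the second illustrates the convergence claimed in the text, with `gap` shrinking geometrically.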
3.4. Let W
n
= X
n
n1
m=1
Y
m
. Clearly W
n
F
n
and EW
n
 < . Using the
linearity of conditional expectation,
n
m=1
Y
m
F
n
, and the dention we have
E(W
n+1
F
n
) E(X
n+1
F
n
)
n
m=1
Y
m
X
n
n1
m=1
Y
m
= W
n
Let M be a large number and N = inf{k :
k
m=1
Y
m
> M}. Now W
Nn
is a
supermartingale by (2.8) and
W
Nn
= X
Nn
(Nn)1
m=1
Y
m
so applying (2.11) to M +W
Nn
we see that lim
n
W
Nn
exists and hence
lim
n
W
n
exists on {N = } {
m
Y
m
M}. As M the right hand
side , so the proof is complete.
3.5. Let X
m
{0, 1} be independent with P(X
m
= 1) = p
m
. Then
m=1
(1 p
m
) = P(X
m
= 0 for all m 1)
(i) If
m=1
p
m
= then P(X
m
= 1 i.o.) = 1 so the product is 0. (ii) If
m=1
p
m
< then
m=M
p
m
< 1 for large M, so P(X
m
= 0 for all m
M) > 0 and since p
m
< 1 for all m, P(X
m
= 0 for all m 1) > 0.
3.6. Let p
1
= P(A
1
) and p
n
= P(A
n

n1
m=1
A
c
m
).
n
m=1
(1 p
m
) = P(
n
m=1
A
c
m
)
so letting n and using (i) of Exercise 3.5 gives the desired result.
3.7. Suppose I
k,n
= I
1,n+1
I
m,n+1
. If (I
j,n+1
) > 0 for all j we have by
using the various denitions that
_
I
k,n
X
n+1
dP =
m
j=1
(I
j,n+1
)
(I
j,n+1
)
(I
j,n+1
)
= (I
k,n
) =
(I
k,n
)
(I
k,n
)
(I
k,n
) =
_
I
k,n
X
n
dP
If (I
j,n+1
) = 0 for some j then the rst sum should be restricted to the j with
(I
j,n+1
) > 0. If << the second = holds but in general we have only .
3.8. If and are nite we can nd a sequence of sets
k
so that (
k
)
and (
k
) are < and (
1
) > 0. By restricting our attention to
k
we can
assume that and are nite measures and by normalizing that that is a
probability measure. Let F
n
= ({B
m
: 1 m n}) where B
m
= A
m
k
.
Let
n
and
n
be the restrictions of and to F
n
, and let X
n
= d
n
/d
n
.
(3.3) implies that X
n
X a.s. where
(A) =
_
A
X d +(A {X = })
Since X < a.s. and << , the second term is 0, and we have the desired
Radon Nikodym derivative.
3.9. (i)
_
q
m
dG
m
=
_
(1
m
)(1
m
) +
m
so the necessary and
sucient condition is
m=1
_
(1
m
)(1
m
) +
_
m
> 0
(ii) Let f
p
(x) =
_
(1 p)(1 x)+
m=1
f
m
(
m
) > 0 if and only if
m=1
1f
m
(
m
) < .
Our task is then to show that the last condition is equivalent to
m=1
(
m
m
)
2
< . Dierentiating gives
f
p
(x) =
1
2
_
p
x
1
2
_
1 p
1 x
f
p
(x) =
1
4
p
x
3/2
1
4
1 p
(1 x)
3/2
< 0
If x, p 1 then
A
2(1 )
3/2
f
p
(x)
1
2
3/2
B
We have f
p
(p) = 0 so integrating gives
0 f
p
(x) f
p
(p) =
_
x
p
f
p
(y) dy
=
_
x
p
_
y
p
f
p
(z) dz dy B(x p)
2
/2
A similar argument establishes an upper bound of A(xp)
2
/2 so using f
p
(p) =
1 we have
A(x p)
2
/2 1 f
p
(x) B(x p)
2
/2
3.10. The Borel Cantelli lemmas imply that when
n
< concentrates
on points in {0, 1}
N
with nitely many ones while
n
= implies con
centrates on points in {0, 1}
N
with innitely many ones.
3.11. Let U
1
, U
2
, . . . be i.i.d. uniform on (0, 1). Let X
n
= 1 if U
n
<
n
and
0 otherwise. Let Y
n
= 1 if U
n
<
n
and 0 otherwise. Then X
1
, X
2
, . . . are
independent with distribution F
n
and Y
1
, Y
2
, . . . are independent with distri
bution G
n
. If

n
n
 < then for large N
nN

n
n
 < 1 which
implies P(X
n
= Y
n
for n N) > 0. Since 0 <
n
<
n
< 1 it follows that
P(X
n
= Y
n
for n 1) > 0. This shows that the measures and induced by
the sequences (X
1
, X
2
, . . .) and (Y
1
, Y
2
, . . .) are not mutually singular so by the
Kakutani dichotomy they must be absolutely continuous.
3.12. Let = P(limZ
n
/
n
= 0). By considering what happens at the rst step
we see
=
k=0
p
k
k
= ()
Since we assumed < 1, it follows from (b) in the proof of (3.10)) that = .
It is clear that
{Z
n
> 0 for all n} {limZ
n
/
n
> 0}
Since each set has probability 1 they must be equal a.s.
3.13. $\rho$ is a root of
$$x = \frac{1}{8} + \frac{3}{8}x + \frac{3}{8}x^2 + \frac{1}{8}x^3$$
Subtracting $x$ from each side and then multiplying by 8 this becomes
$$0 = x^3 + 3x^2 - 5x + 1 = (x - 1)(x^2 + 4x - 1)$$
The quadratic has roots $-2 \pm \sqrt{5}$ so $\rho = \sqrt{5} - 2$.
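As a numerical cross-check of 3.13, the smallest nonnegative fixed point of the generating function can be found by iterating from 0. This is a sketch, assuming the generating function is $\varphi(x) = (1 + 3x + 3x^2 + x^3)/8$ as read off from the equation in the solution; the iteration count is an arbitrary choice.

```python
import math

def phi(x):
    """Offspring generating function read off from 3.13 (assumed form)."""
    return (1 + 3 * x + 3 * x ** 2 + x ** 3) / 8

# Iterating phi from 0 increases to the smallest fixed point in [0, 1].
rho = 0.0
for _ in range(200):
    rho = phi(rho)

closed_form = math.sqrt(5) - 2
```

Starting the iteration at 0 matters: it selects the root below 1 rather than the trivial fixed point $x = 1$.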
4.4. Doobs inequality, L
p
convergence when p >
1
4.1. Since {N = k}, using X
j
E(X
k
F
j
) and the denition of conditional
expectation gives that
E(X
N
; N = j) = E(X
j
; N = j) E(X
k
; N = j)
Summing over j now we have EX
N
EX
k
.
4.2. Let K
n
= 1
M<nN
. {M < n N} = {M n 1} {N < n}
c
so K
n
is
predictable. Y
n
= (K X)
n
= X
Nn
X
Mn
is a submartingale. Taking n = k
and n = 0 we have EX
N
EX
M
0.
4.3. Exercise 1.7 in Chapter 3 implies that for A F
M
L =
_
M on A
N on A
c
is a stopping time. Using Exercise 4.2 now gives EX
L
EX
N
. Since L = M
on A and L = N on A
c
, subtracting E(X
N
; A
c
) from each side and using the
denition of conditional expectation gives
E(X
M
; A) E(X
M
; A) = E(E(X
N
F
M
); A)
Since this holds for all A F
M
it follows that X
M
E(X
N
F
M
).
4.4. Let A = {max
1mn
S
m
 > x} and N = inf{m : S
m
 > x or m = n}.
Since N is a stopping time with P(N n) = 1, (4.1) implies
0 = E(S
2
N
s
2
N
) (x +K)
2
P(A) + (x
2
var(S
n
))P(A
c
)
since on A, S
N
 x + K and and on A
c
, S
2
N
= S
2
n
x
2
. Letting P(A) =
1 P(A
c
) and rearranging we have
(x +K)
2
(var(S
n
) x
2
+ (x +K)
2
)P(A
c
) var(S
n
)P(A
c
)
4.5. If c < then using an obvious inequality, then (4.1) and the fact EX
n
= 0
P
_
max
1mn
X
m
_
P
_
max
1mn
(X
n
+c)
2
(c +)
2
_
E(X
n
+c)
2
(c +)
2
=
EX
2
n
+c
2
(c +)
2
To optimize the bound we dierentiate with respect to c and set the result to
0 to get
2
EX
2
n
+c
2
(c +)
3
+
2c
(c +)
2
= 0
2c(c +) 2(EX
2
n
+c
2
) = 0
so c = EX
2
n
/. Plugging this into the upper bound and then multiplying top
and bottom by
2
EX
2
n
+ (EX
2
n
/)
2
_
EX
2
n
+
_
2
=
(
2
+EX
2
n
)(EX
2
n
)
(EX
2
n
+
2
)
2
4.6. Since X
+
n
is a submartingale and x
p
is increasing and convex it follows that
(X
+
m
)
p
{E(X
+
n
F
m
)}
p
E((X
+
n
)
p
F
m
)
Taking expected value now we have E(X
+
m
)
p
< and it follows that
E
X
p
n
n
m=1
E(X
+
m
)
p
<
4.7. Arguing as in the proof of (4.3)
E(
X
n
M) 1 +
_
1
P(
X
n
M ) d
1 +
_
1
1
_
X
+
n
1
(
XnM)
dP
1 +
_
X
+
n
_
XnM
1
1
ddP
= 1 +
_
X
+
n
log(
X
n
M) dP
(ii) a log b a log a +b/e a log
+
a +b/e
Proof The second inequality is trivial. To prove the rst we note that it is
trivial if b < a. Now for xed a the maximum value of (a log b a log a)/b for
b a occurs when
0 =
_
a log b a log a
b
_
=
a
b
2
a log b a log a
b
2
i.e., when b = ae. In this case the ratio = 1/e.
(iii) To complete the proof of (4.4) now we use the Lemma to get
E(
X
n
M) 1 +E(X
+
n
log
+
X
+
n
) +E(
X
n
M)/e
Since E(
X
n
M)/e < we can subtract this from both sides and then divide
by (1 e
1
) to get
E(
X
n
M) (1 +e
1
)
1
(1 +EX
+
n
log
+
X
+
n
)
Letting M and using the dominated convergence theorem gives the desired
result.
4.8. (4.6) implies that E(X
m
Y
m1
) = E(X
m1
Y
m1
). Interchanging X and Y ,
we have E(X
m1
Y
m
) = E(X
m1
Y
m1
), so
E(X
m
X
m1
)(Y
m
Y
m1
)
= EX
m
Y
m
EX
m
Y
m1
EX
m1
Y
m
EX
m1
Y
m1
= EX
m
Y
m
+ (2 + 1)EX
m1
Y
m1
Summing over m1 to n now gives the desired result.
4.9. Taking X = Y in the previous exercise
EX
2
n
= EX
2
0
+
n
m=1
E
2
m
So our assumptions imply sup
n
EX
2
n
and (4.5) implies X
n
X
in L
2
.
4.10. Applying the previous exercise to the martingale Y
n
=
n
m=1
m
/b
m
we
have Y
m
Y
a.s and in L
2
, so Kroneckers lemma ((8.5) in Chapter 1) implies
(X
n
X
0
)/b
n
0 a.s.
4.11. S
Nn
is a martingale with increasing process
2
(N n). If EN
1/2
<
then E sup
n
S
Nn
 < . (4.1) implies that ES
Nn
= 0. Letting n and
using the dominated convergence theorem, ES
N
= 0.
4.5. Uniform Integrability, Convergence in L
1
5.1. Let
M
= sup{x/(x) : x M}. For i I
E(X
i
; X
i
 > M)
M
E((X
i
); X
i
 > M) C
M
and
M
0 as M .
5.2. Let F
n
= (Y
1
, . . . , Y
n
). (5.5) implies that
E(F
n
) E(F
)
To complete the proof it suces to show that F
. To do this we observe
that the strong law implies (Y
1
+ +Y
n
)/n a.s.
5.3. Let a
n,k
= {f((k+1)2
n
)f(k2
n
)}/2
n
. Since I
k,n
= I
2k,n+1
I
2k+1,n+1
,
it follows from Example 1.3 that on I
k,n
E(X
n+1
F
n
) =
a
2k,n+1
+a
2k+1,n
2
= a
k,n
= X
n
Since 0 X
n
K it is uniformly integrable, so (5.5) implies X
n
X
a.s. and
in L
1
, and (5.5) implies X
n
= E(X
F
n
). This implies that
() f(b) f(a) =
_
b
a
X
() d
holds when a = k2
n
and b = (k + 1)2
n
. Adding a nite number of these
equations we see () holds when a = k2
n
and b = m2
n
where m > k. Taking
limits and using the fact that f is continuous and X() K we have () for
all a and b.
5.4. E(fF
n
) is uniformly integrable so it converges a.s. and in L
1
to E(fF
),
which is = f since f F
.
5.5. On {liminf
n
X
n
M}, X
n
M + 1 i.o. so
P(DX
1
, . . . , X
n
) (M + 1) > 0 i.o.
Since the right hand side 1
D
, we must have
D {liminf
n
X
n
M}
Letting M , we have D {liminf
n
X
n
< } a.s.
5.6. If p
0
> 0 then P(Z
n+1
= 0Z
1
, . . . , Z
n
) p
k
0
on {Z
n
k} so Exercise 5.5
gives the desired result.
5.7.
E(X
n+1
F
n
) = X
n
( +X
n
) + (1 X
n
)X
n
= X
n
+X
n
= X
n
so X
n
is a martingale. 0 X
n
1 so (5.5) implies X
n
X
a.s. and
in L
1
. When X
n
= x, X
n+1
is either + x or x so convergence to x
(0, 1) is impossible. The constancy of martingale expectation and the bounded
convergence theorem imply
= EX
0
= EX
n
EX
Since X
= 1) = and P(X
= 0) = 1 .
5.8. The triangle inequality implies
EE(Y
n
F
n
) E(Y F) EE(Y
n
F
n
) E(Y F
n
) +EE(Y F
n
) E(Y F)
Jensens inequality and (1.1f) imply
EE(Y
n
F
n
) E(Y F
n
) EE(Y
n
Y  F
n
) = EY
n
Y  0
since Y
n
Y in L
1
. For the other term we note (5.6) implies
EE(Y F
n
) E(Y F) 0
4.6. Backwards Martingales
6.1. The L
p
maximal inequality (4.3) implies
E
_
sup
nm0
X
m

p
_
_
p
p 1
_
p
EX
0

p
Letting n it follows that sup
m
X
m
 L
p
. Since X
n
X

p
2 supX
n

p
it follows from the dominated convergence theorem that X
n
X
in L
p
.
6.2. Let W
N
= sup{Y
n
Y
m
 : n, m N}. W
N
2Z so EW
N
< . Using
monotonicity (1.1b) and applying (6.3) to W
N
gives
limsup
n
E(Y
n
Y
F
n
) lim
n
E(W
N
F
n
) = E(W
N
F
)
The last result is true for all N and W
N
0 as N , so (1.1c) implies
E(W
N
F
F
n
) E(Y
n
Y
F
n
) 0 a.s. as n
(6.2) implies E(Y
F
n
) E(Y
F
2
(T n)
Now 1 S
Tn
min
m
S
m
and Example 7.1 implies E(min
m
S
m
)
2
< , so
using the Cauchy Schwarz inequality for the second term we see that each of
the four terms is dominated by an integrable random variable so letting n
and using dominated convergence
0 = 1 2(p q)ET + (p q)
2
ET
2
2
ET
Recalling ET = 1/(p q) and solving gives
ET
2
=
1
(p q)
2
+
2
(p q)
3
so var(T) = ET
2
(ET)
2
=
2
/(p q)
3
.
7.3. (i) Using (4.1) we have
$$0 = E\left(S_{T \wedge n}^2 - (T \wedge n)\right)$$
As $n \to \infty$, $ES_{T \wedge n}^2 \to a^2$ by bounded convergence, and $E(T \wedge n) \uparrow ET$ by monotone convergence so $ET = a^2$.
(ii) Since $\xi_n = \pm 1$ with equal probability, $\xi_n^2 = \xi_n^4 = 1$, and
$$E(S_n^3\xi_{n+1}|\mathcal{F}_n) = S_n^3E(\xi_{n+1}|\mathcal{F}_n) = 0$$
$$E(S_n\xi_{n+1}^3|\mathcal{F}_n) = S_nE(\xi_{n+1}^3|\mathcal{F}_n) = 0$$
$$E(S_n\xi_{n+1}|\mathcal{F}_n) = S_nE(\xi_{n+1}|\mathcal{F}_n) = 0$$
Substituting $S_{n+1} = S_n + \xi_{n+1}$, expanding out the powers and using the last three identities
$$\begin{aligned}
&E\left((S_n + \xi_{n+1})^4 - 6(n+1)(S_n + \xi_{n+1})^2 + b(n+1)^2 + c(n+1)\,\middle|\,\mathcal{F}_n\right) \\
&\quad = S_n^4 + 6S_n^2 + 1 - 6(n+1)S_n^2 - 6(n+1) + bn^2 + b(2n+1) + cn + c \\
&\quad = S_n^4 - 6nS_n^2 + bn^2 + cn + (2b - 6)n + (b + c - 5) = Y_n
\end{aligned}$$
if $b = 3$ and $c = 2$. Using (4.1) now
$$3E(T \wedge n)^2 = E\{6(T \wedge n)S_{T \wedge n}^2 - S_{T \wedge n}^4 - 2(T \wedge n)\}$$
Letting $n \to \infty$, using the monotone convergence theorem on the left and the dominated convergence theorem on the right,
$$3ET^2 = 6a^2ET - a^4 - 2ET$$
Recalling $ET = a^2$ gives $ET^2 = (5a^4 - 2a^2)/3$.
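The two moment formulas in 7.3 can be checked exactly for small $a$ by first-step analysis. The sketch below is not the martingale argument of the solution; it assumes $T$ is the exit time of simple symmetric random walk from $(-a, a)$ started at 0, and solves the linear systems $g = 1 + Pg$ and $h = 1 + 2Pg + Ph$ for $g(x) = E_xT$ and $h(x) = E_xT^2$ over the rationals. The solver and function names are illustrative.

```python
from fractions import Fraction as Fr

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination over the rationals."""
    n = len(A)
    M = [[Fr(v) for v in row] + [Fr(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [v - f * w for v, w in zip(M[r], M[col])]
    return [row[n] for row in M]

def exit_moments(a):
    """(E_0 T, E_0 T^2) for T = first time |S_n| = a, simple symmetric walk."""
    states = list(range(-a + 1, a))           # interior points
    idx = {x: i for i, x in enumerate(states)}
    n = len(states)
    # I - P, where P is the transition matrix killed at the boundary.
    A = [[Fr(0)] * n for _ in range(n)]
    for x in states:
        A[idx[x]][idx[x]] = Fr(1)
        for y in (x - 1, x + 1):
            if y in idx:
                A[idx[x]][idx[y]] -= Fr(1, 2)
    g = solve(A, [1] * n)                      # g(x) = E_x T
    # T = 1 + T' after one step, so T^2 = 1 + 2T' + T'^2 and h = 1 + 2Pg + Ph.
    Pg = [sum(Fr(1, 2) * g[idx[y]] for y in (x - 1, x + 1) if y in idx)
          for x in states]
    h = solve(A, [1 + 2 * v for v in Pg])      # h(x) = E_x T^2
    return g[idx[0]], h[idx[0]]
```

For example, `exit_moments(2)` returns `(4, 24)`, matching $a^2 = 4$ and $(5 \cdot 16 - 2 \cdot 4)/3 = 24$.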
7.4. (i) Using (1.3) and the fact that
n+1
is independent of F
n
E(X
n+1
F
n
) = exp(S
n
(n + 1)())E(exp(
n+1
)F
n
)
= exp(S
n
n())
(ii) As shown in Section 1.9,
() =
= (e
x
/()) dF
then we have
d
d
()
()
=
()
()
_
()
()
_
2
=
_
x
2
dF
(x)
__
xdF
(x)
_
2
> 0
since the last expression is the variance of F
n
= exp((/2)S
n
(n/2)()) (iii)
= X
/2
n
exp(n{(/2) ()/2})
Strict convexity and (0) = 0 imply (/2) ()/2 < 0. X
/2
n
is martingale
with X
/2
0
= 1 so
E
_
X
n
= exp(n{(/2) ()/2}) 0
as n and it follows that X
n
0 in probability.
7.5. If 0 then () (e
+e
)/2 1 so () = ln () 0 and
X
nT
= exp(S
Tn
(T n)()) e
E()
T
.
(ii) Setting (s) = pe
+qe
= 1/s and x = e
we have qsx
2
x +ps = 0.
Solving gives
Es
T
= x =
1
_
1 4pqs
2
2qs
s Es
T
is continuous on [0, 1] and 0 as s 0 so the root is always the
right choice.
7.6. X
Tn
is bounded so the optional stopping theorem and Chebyshevs in
equality imply 1 = EX
T
e
oa
P(S
T
a).
7.7. Let be Normal(0,
2
).
Ee
1
= Ee
(c)
= e
(c)
_
e
x
1
2
2
e
x
2
/2
2
dx
= exp((c ) +
2
2
/2)
_
1
2
2
e
(x+
2
)
2
/2
2
dx
= exp((c ) +
2
2
/2)
since the integral is the total mass of a normal density with mean
2
and
variance
2
. Taking
o
= 2( c)/
2
we have (
o
) = 1. Applying the result
in Exercise 7.6 to S
n
S
0
with a = S
0
, we have the desired result.
7.8. Using Exercise 1.1 in Chapter 4, the fact that the
n+1
j
are independent of
F
n
, the denition of , and the denition of , we see that on {Z
n
= k}
E(
Zn+1
F
n
) = E(
n+1
1
++
n+1
k
F
n
) = ()
k
=
k
=
Zn
so
Zn
is a martingale. Let N = inf{n : Z
n
= 0}. (4.1) implies
x
= E
x
(
ZNn
).
Exercise 5.6 implies that Z
n
on N = so letting n and using the
bounded convergence theorem gives the desired result.
5 Markov Chains
5.1. Denitions and Examples
1.1. Exercise 1.1 of Chapter 4 implies that on Z
n
= i > 0
P(Z
n+1
= jF
n
) = P
_
i
m=1
n+1
m
F
n
_
= p(i, j)
since the
n+1
m
are independent of F
n
.
1.2. p
2
(1, 2) = p(1, 3)p(3, 2) = (0.9)(0.4) = 0.36. To get from 2 to 3 in three
steps there are three ways 2213, 2113, 2133, so
p
3
(2, 3) = (.7)(.9)(.1 +.3 +.6) = .63
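1.3. This is correct for $n = 0$. For the inductive step note
$$\begin{aligned}
P_\mu(X_{n+1} = 0) &= P_\mu(X_n = 0)(1 - \alpha) + P_\mu(X_n = 1)\beta \\
&= (1 - \alpha)\left\{\frac{\beta}{\alpha+\beta} + (1 - \alpha - \beta)^n\left(\mu(0) - \frac{\beta}{\alpha+\beta}\right)\right\} \\
&\quad + \beta\left\{\frac{\alpha}{\alpha+\beta} - (1 - \alpha - \beta)^n\left(\mu(0) - \frac{\beta}{\alpha+\beta}\right)\right\} \\
&= \frac{\beta}{\alpha+\beta} + (1 - \alpha - \beta)^{n+1}\left(\mu(0) - \frac{\beta}{\alpha+\beta}\right)
\end{aligned}$$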
1.4. The transition matrix is

        HH    HT    TH    TT
HH     1/2   1/2    0     0
HT      0     0    1/2   1/2
TH     1/2   1/2    0     0
TT      0     0    1/2   1/2

Since $X_n$ and $X_{n+2}$ are independent, $p^2(i, j) = 1/4$ for all $i$ and $j$.
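The claim $p^2(i,j) = 1/4$ in 1.4 can be confirmed by squaring the matrix. A minimal sketch, assuming a state records (previous flip, current flip) of a fair coin, so that a state `s` can move to `t` only when the second letter of `s` equals the first letter of `t`.

```python
from fractions import Fraction as Fr

states = ["HH", "HT", "TH", "TT"]
half = Fr(1, 2)

# State = (previous flip, current flip); next state = (current flip, new flip).
P = [[half if s[1] == t[0] else Fr(0) for t in states] for s in states]

def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P2 = matmul(P, P)
```

Every entry of `P2` equals 1/4, which reflects that after two new flips the state no longer depends on where the chain started.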
1.5.
AA,AA AA,Aa AA,aa Aa,Aa Aa,aa aa,aa
AA,AA 1 0 0 0 0 0
AA,Aa 1/4 1/2 0 1/4 0 0
AA,aa 0 0 0 1 0 0
Aa,Aa 1/16 1/4 1/8 1/4 1/4 1/16
Aa,aa 0 0 0 1/4 1/2 1/4
aa,aa 0 0 0 0 0 1
1.6. This is a Markov chain since the probability of adding a new value at time
n+1 depends on the number of values we have seen up to time n. p(k, k +1) =
1 k/N, p(k, k) = k/N, p(i, j) = 0 otherwise.
1.7. X
n
is not a Markov chain since X
n+1
= X
n
+1 with probability 1/2 when
X
n
= S
n
and with probability 0 when X
n
> S
n
.
1.8. Let $i_1, \ldots, i_n \in \{-1, 1\}$ and $N = |\{m \le n : i_m = 1\}|$.
$$P(X_1 = i_1, \ldots, X_n = i_n) = \int \theta^N(1 - \theta)^{n-N}\,d\theta$$
$$P(X_1 = i_1, \ldots, X_n = i_n, X_{n+1} = 1) = \int \theta^{N+1}(1 - \theta)^{n-N}\,d\theta$$
Now $\int_0^1 x^m(1 - x)^k\,dx = m!\,k!/(m + k + 1)!$ so, writing $S_n$ for the count $N$ of ones among $i_1, \ldots, i_n$,
$$P(X_{n+1} = 1|X_1 = i_1, \ldots, X_n = i_n) = \frac{(S_n + 1)!/(n + 2)!}{S_n!/(n + 1)!} = \frac{S_n + 1}{n + 2}$$
(ii) Since the conditional expectation is only a function of $S_n$, (1.1) implies that $S_n$ is a Markov chain.
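The beta integral and the resulting conditional probability in 1.8 can be verified with exact arithmetic. This is a sketch under the assumption that the mixing variable is uniform on $(0,1)$, as the integrals in the solution suggest; the integral is evaluated by expanding $(1-x)^k$ with the binomial theorem, and the function names are illustrative.

```python
from fractions import Fraction as Fr
from math import comb, factorial

def beta_integral(m, k):
    """Exact integral of x^m (1-x)^k over [0, 1], by expanding (1-x)^k."""
    return sum(Fr((-1) ** j * comb(k, j), m + j + 1) for j in range(k + 1))

def prob_next_is_one(ones, n):
    """P(X_{n+1} = 1 | the first n outcomes contain `ones` ones)."""
    return beta_integral(ones + 1, n - ones) / beta_integral(ones, n - ones)
```

For instance, `prob_next_is_one(3, 4)` gives 4/6 = 2/3, in line with the ratio (number of ones + 1)/(n + 2).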
5.2. Extensions of the Markov Property
2.1. Using the hint, 1
A
F
n
, the Markov property (2.1), then E
(1
B
X
n
)
(X
n
)
P
(A BX
n
) = E
(E
(1
A
1
B
F
n
)X
n
)
= E
(1
A
E
(1
B
F
n
)X
n
)
= E
(1
A
E
(1
B
X
n
)X
n
)
= E
(1
A
X
n
)E
(1
B
X
n
)
2.2. Let A = {x : P
x
(D) }. The Markov property and the denition of A
imply P(DX
n
) on {X
n
A}. so (2.3) implies
P({X
n
A i.o.} {X
n
= a i.o.}) = 0
Since > 0 is arbitrary the desired result follows.
(ii) Under the assumptions of Exercise 5.5 in Chapter 4, h(X
n
) 0 implies
X
n
.
2.3. Clearly, P
x
(X
n
= y) =
n
m=1
P
x
(T
y
= m, X
n
= y). When m = n,
P
x
(T
y
= n, X
n
= y) = P
x
(T
y
= n) = P
x
(T
y
= n)p
0
(y, y). To handle m < n
note that the Markov property implies
P
x
(X
n
= yF
n
) = P
x
(1
(Xnm=y)
m
F
m
) = P
Xm
(X
nm
= y)
Integrating over {T
y
= m} F
m
where X
m
= y and using the denition of
conditional expectation we have
P
x
(T
y
= m, X
n
= y) = E
x
(1
{Xn=y}
; T
y
= m)
= P
x
(T
y
= m)P
y
(X
nm
= y) = P
x
(T
y
= m)p
nm
(y, y)
2.4. Let T = inf{m k : X
m
= x}. Imitating the proof in Exercise 2.3 it is
easy to show that
P
x
(X
m
= x) =
m
=k
P
x
(T = )p
m
(x, x)
Summing from m = k to n+k, using Fubinis theorem to interchange the sum,
then using the trivial inequalities p
j
(x, x) 0 and P
x
(T n + k) 1 we have
n+k
m=k
P
x
(X
m
= x) =
n+k
m=k
m
=k
P
x
(T = )p
m
(x, x)
=
n+k
=k
n+k
m=
P
x
(T = )p
m
(x, x)
n+k
=k
P
x
(T = )
n
j=0
p
j
(x, x)
j=0
p
j
(x, x) =
n
m=0
P
x
(X
m
= x)
2.5. Since P
x
(
C
< ) > 0 there is an n(x) so that P
x
(
C
n(x)) > 0. Let
N = max
xSC
n(x) < = min
xSC
P
x
(
C
N) > 0
The Markov property implies
P
x
(
C
(k1)N
> NF
(k1)N
) = P
X
(k1)N
(
C
> N)
Integrating over {
C
> (k 1)N} using the denition of conditional probability
and the bound above we have
P
x
(
C
> kN) = E
x
_
1
(C
(k1)N
>N)
;
C
> (k 1)N
_
= E
x
_
P
x
(
C
(k1)N
> NF
(k1)N
);
C
> (k 1)N
_
= E
x
_
P
X
(k1)N
(
C
> N);
C
> (k 1)N
_
(1 )P
x
(
C
> (k 1)N)
from which the result follows by induction.
2.6. (i) If x A B then 1
(A<B)
1
= 1
(A<B)
. Taking expected value we
have
P
x
(
A
<
B
) = E
x
(1
(A<B)
1
)
= E
x
(E
x
(1
(A<B)
1
F
1
)) = E
x
h(X
1
)
(ii) To simplify typing we will write T for
AB
. On {T > n} F
n
we have
X
(n+1)T
= X
n+1
so using Exercise 1.1 in Chapter 4, the Markov property and
(i) we have
E(h(X
nT
)F
n
) = E(h(X
n+1
)F
n
) = E(h(X
1
)
n
F
n
)
= E
Xn
h(X
1
) = h(X
n
) = h(X
nT
)
On {T n} F
n
we have X
(n+1)T
= X
nT
F
n
so using Exercise 1.1 in
Chapter 4 we have
E(h(X
(n+1)T
)F
n
) = E(h(X
nT
)F
n
) = h(X
nT
)
(iii) Exercise 2.5 implies that T =
AB
< a.s. Since S(AB) is nite, any
solution h is bounded, so the martingale property and the bounded convergence
theorem imply
h(x) = E
x
h(X
nT
) E
x
h(X
T
) = P
x
(
A
<
B
)
2.7. (i) 0 = E
0
X
n
and X
n
0 imply P
0
(X
n
= 0) = 1. Similarly. N = E
N
X
n
and X
n
N imply P
N
(X
n
= N) = 1.
(ii) Exercise 2.5 implies P
x
(
0
N
< ) = 1. The martingale property and
the bounded convergence theorem imply
x = E
x
(X(
0
N
n))
E
x
X(
0
N
) = NP
x
(
N
<
0
)
2.8. (i) corresponds to sampling with replacement from a population with i 1s
and (N i) 0s so the expected number of 1s in the sample is i.
(ii) corresponds to sampling without replacement from a population with 2i 1s
and 2(N i) 0s so the expected number of 1s in the sample is i.
2.9. The expected number of As in each ospring = 2 times the fraction of As
in its parents, so the number of As is a martingale. Using Exercise 2.7 we see
that the absorption probabilities from a starting point with k As is k/4.
2.10. (i) If x A,
A
1
=
A
1. Taking expected value gives
g(x) 1 = E
x
(
A
1) = E
x
(
A
1
)
= E
x
E
x
(
A
1
F
1
) = E
x
g(X
1
)
(ii) On {
A
> n} F
n
, g(X
(n+1)A
) + (n + 1)
A
= g(X
n+1
) + (n + 1), so
using Exercise 1.1 in Chapter 4, and (i) we have
E
x
(g(X
(n+1)A
) + (n + 1)
A
F
n
) = E
x
(g(X
n+1
) + (n + 1)F
n
)
= E(g(X
n+1
)F
n
) + (n + 1) = g(X
n
) 1 + (n + 1) = g(X
n
) +n
On {
A
n} F
n
, g(X
(n+1)A
) +(n+1)
A
= g(X
nA
) +(n
A
), so using
Exercise 1.1 in Chapter 4, we have
E
x
(g(X
(n+1)A
) + ((n + 1)
A
)F
n
) = E
x
(g(X
nA
) + (n
A
)F
n
)
= g(X
nA
) + (n
A
)
(iii) Exercise 2.5 implies P
y
(
A
> kN) (1 )
k
for all y A so E
y
A
< .
Since S A is nite any solution is bounded. Using the martingale property,
the bounded and monotone convergence theorems
g(x) = E
x
(g(X
nA
) + (n
A
)) E
x
A
2.11. In this case the equation ($*$) becomes
$$\begin{aligned}
g(H, H) &= 0 \\
g(H, T) &= 1 + \tfrac{1}{2}g(T, H) + \tfrac{1}{2}g(T, T) \\
g(T, H) &= 1 + \tfrac{1}{2}g(H, T) \\
g(T, T) &= 1 + \tfrac{1}{2}g(T, H) + \tfrac{1}{2}g(T, T)
\end{aligned}$$
Comparing the second and fourth equations we see $g(H, T) = g(T, T)$. Using this in the second equation and rearranging the third gives
$$g(T, T) = 2 + g(T, H)$$
$$g(H, T) = 2g(T, H) - 2$$
Noticing that the left-hand sides are equal and solving gives $g(T, H) = 4$, $g(H, T) = g(T, T) = 6$, and
$$EN_1 = \frac{1}{4}(4 + 6 + 6) = 4$$
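The linear system in 2.11 can also be solved numerically by iterating $g \leftarrow 1 + Pg$ off the target state. A minimal sketch, assuming the chain on pairs of fair coin flips with HH as the target; the dictionary `nxt` and the iteration count are illustrative choices, and the final average simply mirrors the last display of the solution.

```python
# States are (previous flip, current flip); HH is the target, so g(HH) = 0.
states = ["HT", "TH", "TT"]
nxt = {"HT": ["TH", "TT"], "TH": ["HH", "HT"], "TT": ["TH", "TT"]}

g = {"HH": 0.0, "HT": 0.0, "TH": 0.0, "TT": 0.0}
for _ in range(500):
    new = {"HH": 0.0}
    for s in states:
        a, b = nxt[s]
        # One step is taken, then the chain is at a or b with probability 1/2 each.
        new[s] = 1 + 0.5 * g[a] + 0.5 * g[b]
    g = new

# Average over the four equally likely starting pairs, as in the text.
EN1 = sum(g[s] for s in ["HH", "HT", "TH", "TT"]) / 4
```

The iteration converges geometrically because the chain restricted to the non-target states is strictly substochastic.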
2.12. (ii) We claim that
P(I
j
= 1I
j+1
= i
j+1
, . . . , I
k
= i
k
) = 1/j
To prove this note that if n = inf{m > j : i
m
= 1} then the conditioning
event tells us that when the chain left n it jumped to at least as far as j. Since
the original jump distribution was uniform on {1, . . . , n 1} the conditional
distribution is uniform on {1, . . . , j}.
5.3. Recurrence and Transience
3.1. v
k
= v
1
R
k1
. Let v be one of the countably many possible values for the
v
i
. Since X(R
k1
) = y a.s., the strong Markov property implies
P
y
(v
1
R
k1
= vF
R
k1
) = P
y
(v
1
= v)
This implies v
k
is independent of F
R
k1
and hence of v
1
, . . . , v
k1
3.2. (i) follows from Exercise 2.3. To prove (ii) note that using (i), Fubinis
theorem, and then changing variables in the inner sum gives
u(s) 1 =
n=1
u
n
s
n
=
n=1
n
m=1
f
m
u
nm
s
n
=
m=1
n=m
f
m
s
m
u
nm
s
nm
=
m=1
f
m
s
m
k=0
u
k
s
k
= f(s)u(s)
3.3. Let $h(x) = (1 - x)^{-1/2}$. Differentiating we have
$$h'(x) = \frac{1}{2}(1 - x)^{-3/2}$$
$$h''(x) = \frac{1}{2}\cdot\frac{3}{2}(1 - x)^{-5/2}$$
$$h^{(m)}(x) = \frac{(2m)!}{m!\,4^m}(1 - x)^{-(2m+1)/2}$$
Recalling $h(x) = \sum_{m=0}^\infty h^{(m)}(0)x^m/m!$ we have
$$u(s) = \sum_{m=0}^\infty \binom{2m}{m}p^mq^ms^{2m} = (1 - 4pqs^2)^{-1/2}$$
so using Exercise 3.2, $f(s) = 1 - 1/u(s) = 1 - (1 - 4pqs^2)^{1/2}$.
(iii) Setting $s = 1$, we have $P_0(T_0 < \infty) = 1 - (1 - 4pq)^{1/2}$.
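The series identity used in 3.3 can be checked numerically. The sketch below compares a partial sum of $\sum_m \binom{2m}{m}(pqs^2)^m$ with the closed form $(1 - 4pqs^2)^{-1/2}$; the number of terms and the sample values of $p$ and $s$ are arbitrary, and terms are built from the ratio $\binom{2m+2}{m+1}/\binom{2m}{m} = (2m+1)(2m+2)/(m+1)^2$ to avoid large integers.

```python
from math import sqrt

def u_series(p, s, terms=200):
    """Partial sum of sum_m C(2m, m) (p q s^2)^m with q = 1 - p."""
    x = p * (1 - p) * s * s
    total, term = 0.0, 1.0
    for m in range(terms):
        total += term
        # Ratio of consecutive terms: C(2m+2, m+1) x / C(2m, m).
        term *= x * (2 * m + 1) * (2 * m + 2) / ((m + 1) ** 2)
    return total

def u_closed(p, s):
    """Closed form (1 - 4 p q s^2)^(-1/2)."""
    return (1 - 4 * p * (1 - p) * s * s) ** -0.5

def return_prob(p):
    """P_0(T_0 < infinity) = 1 - sqrt(1 - 4 p q)."""
    return 1 - sqrt(1 - 4 * p * (1 - p))
```

Since $1 - 4pq = (p - q)^2$, `return_prob(p)` equals $1 - |p - q|$; for $p = 0.3$ this is 0.6.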
3.4. The strong Markov property implies
P
x
(T
z
Ty
< F
Ty
) = P
y
(T
z
< ) on {T
y
< }
Integrating over {T
y
< } and using the denition of conditional expectation
P
x
(T
z
< ) P
x
(T
z
Ty
< )
= E
x
(P
x
(T
z
Ty
< F
Ty
); T
y
< )
= E
x
(P
y
(T
z
< ); T
y
< )
= P
x
(T
y
< )P
y
(T
z
< )
3.5.
xy
> 0 for all x, y so the chain is irreducible. The desired result now
follows from (3.5).
3.6. (i) Using (3.7) we have
$$P_{20}(T_{40} < T_0) = \frac{\sum_{m=0}^{19}(20/18)^m}{\sum_{m=0}^{39}(20/18)^m} = \frac{(20/18)^{19} - (20/18)^{-1}}{(20/18)^{39} - (20/18)^{-1}}$$
Multiplying top and bottom by 20/18 and calculating that $(20/18)^{20} = 8.225$, $(8.225)^2 = 67.654$ we have
$$P_{20}(T_{40} < T_0) = \frac{7.225}{66.654} = 0.1084$$
(ii) Using (4.1), rearranging and then using the monotone and dominated convergence theorems we have
$$E_{20}\left(X_{T \wedge n} + \frac{2}{38}(T \wedge n)\right) = 20$$
$$E_{20}(T \wedge n) = 380 - 19E_{20}(X_{T \wedge n})$$
$$E_{20}T = 380 - 19 \cdot 40 \cdot P_{20}(T_{40} < T_0) = 297.6$$
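The arithmetic in 3.6 can be reproduced with exact rationals. A minimal sketch, assuming each bet is won with probability 18/38 and lost with probability 20/38 (the values implied by the ratio 20/18 and the drift 2/38 in the solution); `win_prob` is an illustrative name.

```python
from fractions import Fraction as Fr

p, q = Fr(18, 38), Fr(20, 38)   # assumed win / lose probabilities per bet
r = q / p                        # = 20/18

def win_prob(x, target=40):
    """P_x(T_target < T_0) via ((q/p)^x - 1) / ((q/p)^target - 1)."""
    return (r ** x - 1) / (r ** target - 1)

# The formula is harmonic in the interior: h(x) = p h(x+1) + q h(x-1).
harmonic = all(win_prob(x) == p * win_prob(x + 1) + q * win_prob(x - 1)
               for x in range(1, 40))

prob = win_prob(20)
expected_time = 380 - 19 * 40 * prob
```

Converting with `float(prob)` and `float(expected_time)` gives values that round to the 0.1084 and 297.6 quoted in the solution.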
3.7. Let = inf{n > 0 : X
n
F}, = inf{(x) : x F}, and pick y so
that (y) < . Our assumptions imply that Y
n
= (X
n
) is supermartingale.
Using (4.1) in Chapter 4 now we see that
(y) E
y
(X
n
) P
y
( < n)
Letting n we see that P
y
( < ) (y)/ < 1.
3.8. Writing p
x
= 1/2 +c
x
/x we have
E
x
X
1
x
= ((x + 1)
) p
x
+ ((x 1)
) q
x
=
1
2
((x + 1)
2x
+ (x 1)
)
+
c
x
x
({(x + 1)
} +{x
(x 1)
})
A little calculus shows
(x + 1)
=
_
1
0
(x +y)
1
dy x
1
(x + 1)
2x
+ (x 1)
=
_
1
0
{(x +y)
1
(x 1 +y)
1
} dy
( 1)x
2
This implies that when x is large
E
x
X
1
x
x
2
_
1
2
+ 2C
_
If C < 1/4 then by taking close to 0 we can make this < 0. When C > 1/4
we take < 0, so we want the quantity inside the brackets to be > 0 which
again is possible for close enough to 0.
3.9. If f 0 is superharmonic then Y
n
= f(X
n
) is a supermartingale so Y
n
converges a.s. to a limit Y
. If X
n
is recurrent then for any x, X
n
= x i.o., so
f(x) = Y
and f is constant.
Conversely, if the chain is transient then P
y
(T
y
< ) < 1 for some x and
f(x) = P
x
(T
y
< ) is a nonconstant superharmonic function.
3.10. E
x
X
1
= px + < x if < (1 p)x.
5.4. Stationary Measures
4.1. The symmetric form of the Markov property given in Exercise 2.1 implies
that for any initial distribution Y
m
is a Markov chain. To compute its transition
probability we note
P
(Y
m+1
= yY
m
= x) =
P
(Y
m
= x, Y
m+1
= y)
P
(Y
m
= x)
=
P
(X
n(m+1)
= y)P
(X
nm
= xX
n(m+1)
= y)
P
(X
nm
= x)
=
(y)p(y, x)
(x)
4.2. In order for the chain to visit j before returning to 0, it must jump to j or
beyond, which has probability
k=j
f
k+1
and in this case it will visit j exactly
once.
(ii) Plugging into the formula we have
q(i, i + 1) =
(i + 1)
(i)
= P( > i + 1 > i)
q(i, 0) =
(0)p(0, i)
(i)
= P( = i + 1 > i)
which a little thought reveals is the transition probability for the age of the
item in use at time n.
4.3. Since the stationary measure is unique up to constant multiples
\[
\frac{\mu_y(z)}{\mu_y(y)} = \frac{\mu_x(z)}{\mu_x(y)}
\]
Since $\mu_y(y) = 1$ rearranging gives the desired equality.
4.4. Since simple random walk is recurrent, (4.4) implies that the stationary measure $\mu(x) \equiv 1$ is unique up to constant multiples. If we do the cycle trick starting from 0, the resulting stationary measure has $\mu_0(0) = 1$ and $\mu_0(k)$ = the expected number of visits to $k$ before returning to 0, so $\mu_0(k) = 1$.
4.5. If we let $a = P_x(T_y < T_x)$ and $b = P_y(T_x < T_y)$ then the number of visits to $y$ before we return to $x$ has $P_x(N_y = 0) = 1 - a$ and $P_x(N_y = j) = a(1-b)^{j-1}b$ for $j \ge 1$, so $EN_y = a/b$. In the case of random walks when $x = 0$ we have $a = b = 1/(2|y|)$.
4.6. (i) Iterating shows that
\[
q^n(x,y) = \frac{\mu(y)p^n(y,x)}{\mu(x)}
\]
Given $x$ and $y$ there is an $n$ so that $p^n(y,x) > 0$ and hence $q^n(x,y) > 0$. Summing over $n$ and using (3.3) we see that all states are recurrent under $q$.
(ii) Dividing by $\mu(y)$ and using the definition of $q$ we have
\[
h(y) = \frac{\nu(y)}{\mu(y)} \ge \sum_x q(y,x)\frac{\nu(x)}{\mu(x)}
\]
so $h$ is nonnegative superharmonic, and Exercise 3.9 implies that it must be constant.
4.7. By (4.7) the renewal chain is positive recurrent if and only if $E_0T_0 < \infty$, but $X_1 = k$ implies $T_0 = k+1$ so $E_0T_0 = \sum_k kf_k$.
4.8. Let $n = \inf\{m : p^m(x,y) > 0\}$ and pick $x_1, \ldots, x_{n-1} \ne x$ so that
\[
p(x,x_1)p(x_1,x_2)\cdots p(x_{n-1},y) > 0
\]
The Markov property implies
\[
E_xT_x \ge E_x(T_x; X_1 = x_1, \ldots, X_{n-1} = x_{n-1}, X_n = y) \ge p(x,x_1)p(x_1,x_2)\cdots p(x_{n-1},y)\,E_yT_x
\]
so $E_yT_x < \infty$.
4.9. If $p$ is recurrent then any stationary distribution is a constant multiple of $\mu$ and hence has infinite total mass, so there cannot be a stationary distribution.
4.10. This is a random walk on a graph, so $\mu(i)$ = the degree of $i$ defines a stationary measure. With a little patience we can compute the degrees for the upper 4 × 4 square in the chessboard to be
2 3 4 4
3 4 6 6
4 6 8 8
4 6 8 8
Adding up these numbers we get 84 so the total mass of $\mu$ is 336. Thus if $\pi$ is the stationary distribution and $c$ is a corner then $\pi(c) = 2/336$ and (4.6) implies $E_cT_c = 168$.
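The degree table can be checked by brute force. The sketch below assumes the walk in question is a knight moving on a standard 8 × 8 board (which is what the degrees 2, 3, 4, 6, 8 suggest); the function and variable names are illustrative only.

```python
# Knight's random walk on an 8x8 board: the stationary measure is the vertex degree.
MOVES = [(1, 2), (2, 1), (-1, 2), (-2, 1), (1, -2), (2, -1), (-1, -2), (-2, -1)]

def degree(r: int, c: int, size: int = 8) -> int:
    """Number of legal knight moves from square (r, c)."""
    return sum(0 <= r + dr < size and 0 <= c + dc < size for dr, dc in MOVES)

degrees = [[degree(r, c) for c in range(8)] for r in range(8)]
upper_left = [row[:4] for row in degrees[:4]]    # the 4x4 corner block
total_mass = sum(map(sum, degrees))              # total mass of the degree measure
expected_return_corner = total_mass // degrees[0][0]  # 1 / pi(corner)

print(upper_left, total_mass, expected_return_corner)
```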
4.11. Using (4.1) from Chapter 4 it follows that
\[
x \ge E_x(X_{\tau\wedge n} + \epsilon(\tau\wedge n)) \ge \epsilon E_x(\tau\wedge n)
\]
Letting $n \to \infty$ and using the monotone convergence theorem the desired result follows.
4.12. The Markov property and the result of the previous exercise imply that
\[
E_0T_0 - 1 = \sum_x p(0,x)E_x\tau \le \sum_x p(0,x)\frac{x}{\epsilon} = \frac{1}{\epsilon}E_0X_1 < \infty
\]
5.5. Asymptotic Behavior
5.1. Making a table of the number of black and white balls in the two urns

          L         R
black     n         b − n
white     m − n     m − (b − n)

we can read off the transition probability. If $0 \le n \le b$ then
\[
p(n, n+1) = \frac{m-n}{m}\cdot\frac{b-n}{m}
\]
\[
p(n, n-1) = \frac{n}{m}\cdot\frac{m+n-b}{m}
\]
\[
p(n, n) = 1 - p(n, n-1) - p(n, n+1)
\]
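As a sanity check, the transition probabilities can be tabulated with exact arithmetic. This is a minimal sketch, assuming the usual set-up in which each urn holds m balls, b of the 2m balls are black, and one ball is drawn from each urn and the two are exchanged; `urn_transition` is a name introduced here for illustration.

```python
from fractions import Fraction

def urn_transition(n: int, m: int, b: int) -> dict:
    """Transition row out of state n = number of black balls in the left urn."""
    # n goes up when a white ball leaves L and a black ball leaves R
    up = Fraction(m - n, m) * Fraction(b - n, m)
    # n goes down when a black ball leaves L and a white ball leaves R
    down = Fraction(n, m) * Fraction(m + n - b, m)
    row = {n + 1: up, n - 1: down, n: 1 - up - down}
    return {state: prob for state, prob in row.items() if prob > 0}

for n in range(0, 4):
    print(n, urn_transition(n, m=5, b=3))
```

Each printed row sums to 1, and the boundary states n = 0 and n = b have no mass leaving the interval [0, b].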
5.2. {1, 7}, {2, 3}, {4, 5, 6}.
5.3. Let $Z$ be a bounded invariant random variable and $h(x) = E_xZ$. The invariance of $Z$ and the Markov property imply
\[
E_\mu(Z \mid \mathcal{F}_n) = E_\mu(Z\circ\theta_n \mid \mathcal{F}_n) = h(X_n)
\]
so $h(X_n)$ is a martingale and $h$ is a bounded harmonic function.
Conversely, if $h$ is bounded and harmonic then $h(X_n)$ is a bounded martingale. (2.10) in Chapter 4 implies $Z = \lim_{n\to\infty} h(X_n)$ exists. $Z$ is shift invariant since $Z\circ\theta = \lim_{n\to\infty} h(X_{n+1})$. (5.5) in Chapter 4 implies $h(X_n) = E(Z \mid \mathcal{F}_n)$.
5.4. (i)
m
corresponds to the number of customers that have arrived minus
the one that was served. It is easy to see that the M/G/1 queue satisfies
X
n+1
= (X
n
+
m+1
)
+
and the new definition does as well.
(ii) When X
m1
= 0 and
m
= 1 the random walk reaches a new negative
minimum so
{m n : X
m1
= 0,
m
= 1} =
_
min
mn
S
m
_
k=1
V
f
k
EV
f
k
a.s.
Taking m = K
n
and noting that the renewal theorem implies K
n
/n 1/E
x
T
x
a.s. the desired result follows.
(iii) From Exercise 6.14 in Chapter 1 we see that if EV
f
n
< then
1
n
max
1mn
V
f
m
0 a.s.
It is easy to see that K
n
n and
m=1
f(X
m
)
Kn
m=1
V
f
m
max
1mn
V
f
m
and the desired result follows.
5.6. (i) From (ii) in Exercise 5.5, we know that K
n
/n 1/E
x
T
x
. Since the
V
f
k
are i.i.d. with EV
f
k
= 0 and E(V
f
k
)
2
< the desired result follows from
Exercise 4.7 in Chapter 2.
(ii) E(V
f
k
)
2
< implies that for any > 0
k
P
_
(V
f
k
)
2
>
2
k
_
<
so the BorelCantelli lemma implies P(V
f
k
>
j=1
P
x
(X
m
= z, J = j)
= p
m
(x, z) +
m1
j=1
P
x
(X
j
= y, X
j+1
= y, . . . , X
m1
= y, X
m
= z)
If we let A
k
= {X
1
= y, . . . , X
k1
= y, X
k
= z} then using the denition of
A
k
, the denition of conditional expectation, the Markov property, and the
denitions of p
j
and p
mj
P
x
(X
j
= y,X
j+1
= y, . . . , X
m1
= y, X
m
= z) = E
x
(1
Amj
j
; X
j
= y)
= E
x
(E
x
(1
Amj
j
F
j
); X
j
= y)
= E
x
(P
y
(A
mj
); X
j
= y) = p
j
(x, y) p
mj
(y, z)
Combining this with the rst equality and summing over m
n
m=1
p
m
(x, z) =
n
m=1
p
m
(x, z) +
n
m=1
m1
j=1
p
j
(x, y) p
mj
(y, z)
Interchanging the order of the last two sums and changing variables k = mj
gives the desired formula.
(ii) P
y
(T
x
< T
y
)
m=1
p
m
(x, z)
y
(z) < and recurrence implies that
m=1
p
m
(x, y) = so we have
n
m=1
p
m
(x, z)
_
n
m=1
p
m
(x, y) 0
To handle the second term let a
j
= p
j
(x, y), b
m
=
m
k=1
p
k
(y, z) and note that
b
m
y
(z) and a
m
1 with
m=1
a
m
= so
n1
j=1
a
j
b
nj
_
n
m=1
a
m
y
(z)
To prove the last result let > 0, pick N so that b
m
y
(z) < for m N
and then divide the sum into 1 j n N and n N < j < n.
5.9. By aperiodicity we can pick an $N_x$ so that for all $n \ge N_x$, $p^n(x,x) > 0$. By irreducibility there is an $n(x,y)$ so that $p^{n(x,y)}(x,y) > 0$. Let
\[
N = \max\{N_x, n(x,y) : x, y \in S\} < \infty
\]
by the finiteness of $S$. Then
\[
p^{2N}(x,y) \ge p^{n(x,y)}(x,y)\,p^{2N-n(x,y)}(y,y) > 0
\]
since $2N - n(x,y) \ge N$.
5.10. If $\epsilon = \inf p(x,y) > 0$ and there are $N$ states then
\[
P(X_{n+1} = Y_{n+1} \mid X_n = x, Y_n = y) = \sum_z p(x,z)p(y,z) \ge \epsilon^2N
\]
so $P(T > n+1 \mid T > n) \le (1 - \epsilon^2N)$ and we have $P(T > n) \le (1 - \epsilon^2N)^n$.
5.11. To couple X
n+m
and Y
n+m
we rst run the two chains to time n. If
X
n
= Y
n
an event with probability 1
n
then we can certainly arrange
things so that X
n+m
= Y
n+m
. On the other hand it follows from the denition
of
m
that
P(X
n+m
= Y
n+m
X
n
= k, Y
n
= )
m
5.6. General State Space
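6.1. As in (3.2)
\[
\sum_{n=1}^\infty \bar p^n(\alpha,\alpha) = \sum_{k=1}^\infty P_\alpha(R_k < \infty) = \sum_{k=1}^\infty P_\alpha(R < \infty)^k = \frac{P_\alpha(R < \infty)}{1 - P_\alpha(R < \infty)}
\]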
6.2. By Example 6.1, without loss of generality A = {a} and B = {b}. Let
R = {x :
bx
> 0}. If is recurrent then b is recurrent, so if x R then (3.4)
implies x is recurrent. (i) implies
xb
> 0. If y is another point in R then
Exercise 3.4 implies
xy
xb
by
> 0 so R is irreducible. Let T = S R. If
z T then
bz
= 0 but
zb
> 0 so z is transient by remarks after Example 3.1.
6.3. Suppose that the chain is recurrent when (A, B) is used. Since P
x
(
A
<
) > 0 we have P
(
A
< ) > 0 and (6.4) implies P
(
X
n
A
i.o.) = 1. (ii)
of the denition for (A
, B
V
2
= ) > 0
it follows from (2.3) that V
n
is recurrent.
6.6.
P
x
(V
1
< x) P
x
(
1
< ( )x)
E
( )x
If x is large
n=1
E
()x
n1
< 1 so P
x
(V
n
n
x for all n) > 0.
6.7. Let F
n
= (Y
0
, . . . , Y
n
). In this case Y
n
 =
_
Y
n1

n
 where
n
is a
standard normal independent of F
n1
The sign of Y
n
is independent of F
n1
and Y
n
 so it is enough to look at the behavior of Y
n
. Taking logs and iterating
we have
log Y
n
 = log(
n
) + 2
1
log Y
n1

= log(
n
) + 2
1
log(
n1
) + 2
2
log Y
n2

=
n1
m=0
2
m
log(
nm
) + 2
n
log Y
0

Since E log() < it is easy to see from this representation that log Y
n

a limit independent of Y
0
. Using P(Y
n
 K i.o.) limsup P(Y
n
 K) and
(2.3) now it follows easily that Y
n
is recurrent for any .
6.8. Let T
0
= 0 and T
n
= inf{m T
n1
+ k : X
m
G
k,
}. The denition of
G
k,
implies
P(T
n
< T
T
n1
< T
) (1 )
so if we let N = sup{n : T
n
< T
_
2k/
Assumption (i) implies S
k,m
G
k,1/m
so is nite.
6.9. If (C) = 0 then P
(
X
n
C) = 0 for all n so P
(
X
n
C, R > n) = 0
for all n and (C) = 0. To prove the converse note that if (C) = 0 then
P
(
X
n
C, R > n) = 0 for all n. Now if P
(
X
m
C) > 0 and we let M be
the smallest m for which this holds we have
P
(
X
M
C) = P
(X
M
C, R > M) = 0
a contradiction so P
(
X
m
C) = 0 for all m and (C) = 0.
6.10. The almost sure convergence of the sum follows from Exercise 8.8 in
Chapter 1. The sum Z is a stationary distribution since obviously +Z =
d
Z.
6.11. To prepare for the proof we note that by considering the time of the rst
visit to
n=1
p
n
(x, ) = P
x
(T
< )
m=0
p
m
(, )
m=0
p
m
(, )
Let = p. By (6.6) this is a stationary probability measure for p. Irreducibil
ity and the fact that = p
n
imply that () > 0 so using our preliminary
=
n=1
() =
_
(dx)
n=1
p
n
(x, )
m=0
p
m
(, )
and the recurrence follows from Exercise 6.1.
6.12. Induction implies
V
n
=
n
+
n1
+ +
n1
1
+
n
V
0
Y
n
=
n
Y
0
0 in probability and
X
n
d
=
0
+
1
+ +
n1
n1
n=0
n
So the converging together lemma, 2.10 in Chapter 2 implies
V
n
n=0
n
6.13. (i) See the solution of 5.4.
(ii) S
n
m
n
= max
0kn
S
n
S
k
(iii) max(S
0
, S
1
, . . . , S
n
) =
d
max(S
0
, S
1
, . . . , S
n
) As n ,
max(S
0
, S
1
, . . . , S
n
) max(S
0
, S
1
, S
2
. . .) a.s.
6.14. Let $F$ be the distribution of $Y$.
\[
P(X - Y > x) = \int_0^\infty P(X > x+y)\,dF(y) = e^{-\lambda x}\int_0^\infty P(X > y)\,dF(y) = ae^{-\lambda x}
\]
6 Ergodic Theorems
6.1. Definitions and Examples
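1.1. If $A \in \mathcal{I}$ then $\varphi^{-1}(A^c) = (\varphi^{-1}A)^c = A^c$ so $A^c \in \mathcal{I}$. If $A_n \in \mathcal{I}$ are disjoint then
\[
\varphi^{-1}(\cup_n A_n) = \cup_n \varphi^{-1}(A_n) = \cup_n A_n
\]
so $\cup_n A_n \in \mathcal{I}$. To prove the second claim note that the set of invariant random variables contains the indicator functions $1_A$ with $A \in \mathcal{I}$ and is closed under pointwise limits, so all $X \in \mathcal{I}$ are invariant. To prove the other direction note that if $X$ is invariant and $B \in \mathcal{R}$ then
\[
\{\omega : X(\omega) \in B\} = \{\omega : X(\varphi\omega) \in B\} = \varphi^{-1}(\{\omega : X(\omega) \in B\})
\]
so $\{\omega : X(\omega) \in B\} \in \mathcal{I}$.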
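1.2. (i) $\varphi^{-1}(B) = \cup_{n=1}^\infty \varphi^{-n}(A) \subset \cup_{n=0}^\infty \varphi^{-n}(A) = B$.
(ii) $\varphi^{-1}(C) = \cap_{n=1}^\infty \varphi^{-n}(B) = C$ since $\varphi^{-1}(B) \subset B$.
(iii) We claim that if $A$ is almost invariant then $A = B = C$ a.s.
To see that $P(A \Delta B) = 0$ we begin by noting that $\varphi$ is measure preserving so
\[
P(\varphi^{-n}(A) \Delta \varphi^{-(n+1)}(A)) = P(\varphi^{-1}[\varphi^{-(n-1)}(A) \Delta \varphi^{-n}(A)]) = P(\varphi^{-(n-1)}(A) \Delta \varphi^{-n}(A))
\]
Since $P(A \Delta \varphi^{-1}(A)) = 0$ it follows by induction that
\[
P(\varphi^{-n}(A) \Delta \varphi^{-(n+1)}(A)) = 0
\]
for all $n \ge 0$. Using the triangle inequality $P(A \Delta C) \le P(A \Delta B) + P(B \Delta C)$ it follows that $P(A \Delta \varphi^{-n}(A)) = 0$. Since this holds for all $n \ge 1$ and is trivial for $n = 0$ we have
\[
P(A \Delta B) \le \sum_{n=0}^\infty P(A \Delta \varphi^{-n}(A)) = 0
\]
To see that $P(B \Delta C) = 0$ note that $B - \varphi^{-1}(B) \subset A - \varphi^{-1}(A)$ has measure 0, and $\varphi$ is measure preserving so induction implies $P(\varphi^{-n}(B) - \varphi^{-(n+1)}(B)) = 0$ and we have
\[
P\left(B - \cap_{n=1}^\infty \varphi^{-n}(B)\right) = 0
\]
This shows $P(B - C) = 0$. Since $B \supset C$ the desired conclusion follows.
Conversely, if $C$ is strictly invariant and $P(A \Delta C) = 0$ then
\[
P(\varphi^{-1}A \Delta C) = P(\varphi^{-1}(A \Delta C)) = P(A \Delta C) = 0
\]
so $P(\varphi^{-1}A \Delta A) \le P(\varphi^{-1}A \Delta C) + P(C \Delta A) = 0$.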
1.3. Let $\Omega = \{0, 1\}$, $\mathcal{F}$ = all subsets, and let $P$ assign mass 1/2 to each point. $T(\omega) = 1 - \omega$ preserves $P$ and clearly there are no invariant sets other than $\emptyset$ and $\Omega$. However $T^2$ is the identity and is not ergodic.
1.4. (i) Since all the x
m
are distinct, for some m < n N we must have
x
m
x
n
 1/N. Dene k
j
Z so that j = k
j
+ x
j
. By considering
two cases x
m
< x
n
and x
m
> x
n
we see that either x
nm
= x
n
x
m
 or
x
nm
= 1 x
n
x
m
. In these two cases we have, for k < N,
x
k(nm)
= kx
n
x
m
 and x
k(nm)
= 1 kx
n
x
m

respectively. This shows that the orbit comes within 1/N of any point. Since
N is arbitrary, the desired result follows.
(ii) Let > 0 and = P(A). Applying Exercise 3.1 to the algebra A of
nite disjoint unions of intervals [u, v), it follows that there is B A so that
P(AB) < and hence P(B) P(A)+. If B = +
m
i=1
[u
i
, v
i
) and A[u
i
, v
i
)
(1 )v
i
u
i
 for all i then
P(A) (1 )P(B) (1 )(P(A) + ) (1
2
)P(A)
a contradiction, so we must have A [u
i
, v
i
) (1 )v
i
u
i
 for some i.
(iii) Let A be invariant and > 0. It follows from (ii) that there is an interval
[a, b) so that A[a, b) (1)(ba). If 1/(n+1) < ba < 1/n then there are
y
1
, . . . , y
n
so that B
k
= ([a, b] +y
k
) mod 1 are disjoint. Since the x
n
are dense,
we can nd n
k
so that B
k
= ([a, b] + x
n
k
) mod 1 are disjoint. The invariance
of A implies that (A + x
n
) mod 1 A. Since A [a, b] > (1 )(b a), it
follows that
A n(b a)(1 )
n
n + 1
(1 )
Since n and are arbitrary the desired result follows.
1.5. If $f(x) = \sum_k c_k e^{2\pi ikx}$ then
\[
f(\varphi(x)) = \sum_k c_k e^{2\pi i2kx}
\]
The uniqueness of the Fourier coefficients implies $c_k = c_{2k}$. Iterating we see $c_k = c_{2^jk}$, so if $c_k \ne 0$ for some $k \ne 0$ then we cannot have $\sum_k c_k^2 < \infty$.
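1.6. From the definition it is clear that
\[
\mu(\varphi^{-1}[a,b]) = \sum_{n=1}^\infty \mu\left(\left[\frac{1}{n+b}, \frac{1}{n+a}\right]\right)
= \frac{1}{\ln 2}\sum_{n=1}^\infty \left\{\ln\left(\frac{n+a+1}{n+a}\right) - \ln\left(\frac{n+b+1}{n+b}\right)\right\}
\]
since $\int_u^v dx/(1+x) = \ln(1+v) - \ln(1+u)$. If we replace $\infty$ by $N$ the sum is
\[
\ln\left(\frac{N+a+1}{N+b+1}\right) + \ln(1+b) - \ln(1+a)
\]
As $N \to \infty$ the right-hand side converges to $\ln(1+b) - \ln(1+a) = (\ln 2)\,\mu([a,b])$.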
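The telescoping in 1.6 is easy to confirm numerically. The sketch below sums the logarithmic terms for one arbitrary choice of a and b (the values 0.2 and 0.7 and the function name are illustrative assumptions) and compares the partial sum with ln(1+b) − ln(1+a).

```python
import math

def gauss_preimage_sum(a: float, b: float, terms: int) -> float:
    """Partial sum over n = 1..terms of ln((n+a+1)/(n+a)) - ln((n+b+1)/(n+b))."""
    return sum(
        math.log((n + a + 1) / (n + a)) - math.log((n + b + 1) / (n + b))
        for n in range(1, terms + 1)
    )

a, b = 0.2, 0.7
partial = gauss_preimage_sum(a, b, 100_000)
limit = math.log(1 + b) - math.log(1 + a)
print(partial, limit)
```

The remaining gap is the term ln((N+a+1)/(N+b+1)), which is of order (a − b)/N.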
1.7. To check stationarity, we let j > n and note that for any i, Z
i
, Z
i+1
, . . . , Z
i+j
consists of a partial block with a length that is uniformly distributed on 1, . . . n,
then a number of full blocks of length n and then a partial block n.
To check ergodicity we note that the tail eld of the Z
m
is contained in that
of the block process, which is trivial since it is i.i.d.
6.2. Birkhoff's Ergodic Theorem
2.1. Let X
M
and X
M
be dened as in the proof of (2.1). The bounded conver
gence theorem implies
E
1
n
n1
m=0
X
M
(
m
) E(X
M
I)
p
0
Writing Z
p
= (EZ
p
)
1/p
and using the triangle inequality
_
_
_
_
1
n
n1
m=0
X
M
(
m
) E(X
M
I)
_
_
_
_
p
_
_
_
_
1
n
n1
m=0
X
M
(
m
)
_
_
_
_
p
+E(X
M
I)
p
1
n
n1
m=0
X
M
(
m
)
p
+E(X
M
I)
p
2X
p
since EX
M
(
m
)
p
= EX
M

p
and EE(X
M
I)
p
EX
M

p
by (1.1e) in
Chapter 4.
2.2. (i) Let h
M
() = sup
mM
g
m
() g().
limsup
n
1
n
n1
m=0
g
m
(
m
) lim
n
1
n
n1
m=0
(g +h
N
)(
m
)
= E(g +h
M
I)
since g
m
g + h
M
for all m M. h
M
0 as M and h
0
is integrable, so
(1.1c) in Chapter 4 implies E(g +h
M
I) E(gI).
(ii) The triangle inequality and the convergence of g
m
g in L
1
imply
E
1
n
n1
m=0
g
m
(
m
)
1
n
n1
m=0
g(
m
)
1
n
n1
m=0
Eg
m
g 0
The ergodic theorem implies
E
1
n
n1
m=0
g(
m
) E(gI)
0
Combining the last two results and using the triangle inequality gives the desired
result.
2.3. Let X
M
and X
M
be dened as in the proof of (2.1). The result for bounded
random variables implies
1
n
n1
m=0
X
M
(
m
) E(X
M
I)
Using (2.3) now on X
M
we get
P
_
sup
n
1
n
n1
m=0
X
M
(
m
)
>
_
1
EX
M

As M , EX
M
 0. A trivial special case of (5.9) in Chapter 4 implies
E(X
M
I) E(XI) so
P
_
limsup
n
1
n
n1
m=0
X(
m
) > E(XI) + 2
_
= 0
Since the last result holds for any > 0 the desired result follows.
6.3. Recurrence
3.1. Counting each point visited at the last time it is visited in $\{1, \ldots, n\}$
\[
ER_n = \sum_{m=1}^n P(S_{m+1} - S_m \ne 0, \ldots, S_n - S_m \ne 0) = \sum_{m=1}^n g_{m-1}
\]
3.2. When P(X
i
> 1) = 0
_
1, . . . , max
mn
S
m
_
R
n
_
min
mn
S
m
, . . . , max
mn
S
m
_
If EX
i
> 0 then S
n
/n EX
i
> 0 so S
n
and min
mn
S
m
> a.s. To
evaluate the limit of max
mn
S
m
/n we observe that for any K
lim
n
S
n
n
liminf
n
_
max
1kn
S
k
/n
_
limsup
n
_
max
1kn
S
k
/n
_
= limsup
n
_
max
Kkn
S
k
/n
_
_
max
kK
S
k
/k
_
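3.3. $\varphi(\theta) = E\exp(\theta X_i)$ is convex, $\varphi(\theta) \to \infty$ as $\theta \to -\infty$, and the left derivative at 0 has $\varphi'(0) = EX_i > 0$, so there is a unique $\theta < 0$ so that $\varphi(\theta) = 1$. Exercise 7.4 in Chapter 4 implies that $\exp(\theta S_n)$ is a martingale. (4.1) in Chapter 4 implies $1 = E\exp(\theta S_{N\wedge n})$. Since $\exp(\theta S_{N\wedge n}) \le e^{-\theta}$ and $S_n \to \infty$ as $n \to \infty$, the bounded convergence theorem implies $1 = e^{-\theta}P(N < \infty)$.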
3.4. It suces to show
E
_
1mT1
1
(XmB)
; X
0
A
_
= P(X
0
B)
To do this we observe that the left hand side is
m=1
P(X
0
A, X
1
A, . . . , X
m1
A, X
m
B)
=
m=1
P(X
m
A, X
m+1
A, . . . , X
1
A, X
0
B) = P(X
0
B)
3.5. First note that (3.3) implies
ET
1
= 1/P(X
0
= 1), so the right hand side is
P(X
0
= 1, T
1
n). To compute the left now we break things down according
to the position of the rst 1 to the left of 0 and use translation invariance to
conclude P(T
1
= n) is
=
m=0
P(X
m
= 1, X
j
= 0 for j (m, n), X
n
= 1)
=
m=0
P(X
0
= 1, X
j
= 0 for j (0, m+n), X
m+n
= 1)
= P(X
0
= 1, T
1
n)
6.6. A Subadditive Ergodic Theorem
6.1. (1.3) implies that the stationary sequences in (ii) are ergodic. Exercise 3.1
implies EX
0,n
=
n
m=1
P(S
1
= 0, . . . , S
n
= 0). Since P(S
1
= 0, . . . , S
n
= 0) is
decreasing it follows easily that EX
0,n
/n P( no return to 0 ).
6.2. (a) $EL_1 = P(X_1 = Y_1) = 1/2$. To compute $EL_2$ let $N_2 = |\{i \le 2 : X_i = Y_i\}|$ and note that $L_2 - N_2 = 0$ unless $(X_1, X_2, Y_1, Y_2)$ is $(1, 0, 0, 1)$ or $(0, 1, 1, 0)$. In these two cases, which have probability 1/16 each, $L_2 - N_2 = 1$ so $EL_2 = EN_2 + 1/8 = 9/8$ so $EL_2/2 = 9/16$.
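The values in (a) can be confirmed by enumeration. The sketch below assumes, as the solution suggests, that $L_n$ is the length of the longest common subsequence of two independent sequences of n fair bits; `lcs_length` and `expected_lcs` are names introduced here.

```python
from fractions import Fraction
from itertools import product

def lcs_length(x, y) -> int:
    """Length of the longest common subsequence, by the standard dynamic program."""
    table = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i, xi in enumerate(x, 1):
        for j, yj in enumerate(y, 1):
            if xi == yj:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(x)][len(y)]

def expected_lcs(n: int) -> Fraction:
    """Exact E L_n, averaging over all 4**n equally likely pairs of bit strings."""
    pairs = list(product(product((0, 1), repeat=n), repeat=2))
    return Fraction(sum(lcs_length(x, y) for x, y in pairs), len(pairs))

print(expected_lcs(1), expected_lcs(2))
```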
(b) The expected number of sequences of length $K$ is $\binom{n}{K}^2 2^{-K}$. Taking $K = an$ and using Stirling's formula $m! \sim m^me^{-m}\sqrt{2\pi m}$, this is approximately
\[
\frac{n^{2n}2^{-an}}{(an)^{2an}((1-a)n)^{2(1-a)n}} = \left(a^{-2a}(1-a)^{-2(1-a)}2^{-a}\right)^n
\]
From the last computation it follows that
\[
\frac{1}{n}\log\left\{\binom{n}{na}^2 2^{-na}\right\} \to -2a\log a - 2(1-a)\log(1-a) - a\log 2
\]
When $a = 1$ the right hand side is $-\log 2 < 0$. By continuity it is also negative for $a$ close to 1.
6.7. Applications
7.1. It is easy to see that
\[
E(X_1 + Y_1) = \int_0^\infty P(X_1 + Y_1 > t)\,dt = \int_0^\infty e^{-t^2/2}\,dt = \sqrt{\pi/2}
\]
Symmetry implies $EX_1 = EY_1 = \sqrt{\pi/8}$. The law of large numbers implies $X_n/n,\ Y_n/n \to \sqrt{\pi/8}$. Since $(X_1, Y_1), (X_2, Y_2), \ldots$ is increasing the desired result follows.
7.2. Since there are $\binom{n}{k}$ subsets and each is in the correct order with probability $1/k!$ we have
\[
EJ^n_k \le \binom{n}{k}\Big/k! \le \frac{n^k}{(k!)^2} \approx \frac{n^ke^{2k}}{k^{2k}}
\]
where in the last step we have used Stirling's formula without the $\sqrt{k}$ term. Letting $k = \alpha\sqrt{n}$ we have
\[
\frac{1}{\sqrt{n}}\log EJ^n_k \to -2\alpha\log\alpha + 2\alpha < 0
\]
when $\alpha > e$.
7.3. It is immediate from the denition that EY
1
= 1. Grouping the individuals
in generation n+1 according to their parents in generation n and using EY
1
= 1
it is easy to see that this is a martingale. Since Y
n
is a nonnegative martingale
Y
n
Y < . However, if exp(a)/() = b > 1 and X
0,n
an then
Y
n
b
n
so this cannot happen innitely often.
7.4. Let k
m
be the integer so that t(k
m
, m) = a
m
. Let X
m,n
be the amount
of time it takes water starting from (k
m
, m) to reach depth n. It is clear
that X
0,m
+ X
m,n
X
0,n
Since EX
+
0,1
< and X
m,n
0 (iv) holds. (6.1)
implies that X
0,n
/n X a.s. To see that the limit is constant, enumerate the
edges in some order (e.g., take each row in turn from left to right) e
1
, e
2
, . . . and
observe that X is measurable with respect to the tail eld of the i.i.d. sequence
(e
1
), (e
2
), . . ..
7.5. (i) a
1
is the minimum of two mean one exponentials so it is a mean 1/2
exponential. (ii) Let S
n
be the sum of n independent mean 1 exponentials.
Results in Section 1.9 imply that for a < 1
1
n
log P(S
n
na) a + 1 + log a
Since there are 2
n
paths down to level n, we see that if f(a) = log 2 a + 1 +
log a < 0 then a. Since f is continuous and f(1) = log 2 this must hold for
some a < 1.
7 Brownian Motion
7.1. Definition and Construction
1.1. Let A = {A = { : ((t
1
), (t
2
), . . .) B} : B R
{1,2,...}
}. Clearly, any
A A is in the eld generated by the nite dimensional sets. To complete
the proof, we only have to check that A is a eld. The rst and easier step is
to note if A = { : ((t
1
), (t
2
), . . .) B} then A
c
= { : ((t
1
), (t
2
), . . .)
B
c
} A. To check that A is closed under countable unions, let A
n
= { :
((t
n
1
), (t
n
2
), . . .) B
n
}, let t
1
, t
2
, . . . be an ordering of {t
n
m
: n, m 1} and
note that we can write A
n
= { : ((t
1
), (t
2
), . . .) E
n
} so
n
A
n
= { :
((t
1
), (t
2
), . . .)
n
E
n
} A.
1.2. Let A
n
= { : there is an s [0, 1] so that B
t
B
s
 Ct s
when
t s k/n}. For 1 i n k + 1 let
Y
i,n
= max
_
B
_
i +j
n
_
B
_
i +j 1
n
_
: j = 0, 1, . . . k 1
_
B
n
= { at least one Y
i,n
is (2k 1)C/n
}
Again A
n
B
n
but this time if > 1/2 + 1/k
P(B
n
) nP(B(1/n) (2k 1)C/n
)
k
nP(B(1) (2k 1)Cn
1/2
)
k
C
n
k(1/2)+1
0
1.3. The rst step is to observe that the scaling relationship (1.2) implies
()
m,n
d
= 2
n/2
1,0
while the denition of Brownian motion shows E
2
1,0
= t, and E(
2
1,0
t)
2
=
C < . Using () and the denition of Brownian motion, it follows that if
k = m then
2
k,n
t2
n
and
2
m,n
t2
n
are independent and have mean 0 so
E
_
_
1m2
n
(
2
m,n
t2
n
)
_
_
2
=
1m2
n
E
_
2
m,n
t2
n
_
2
= 2
n
C2
2n
where in the last equality we have used () again. The last result and Cheby
shevs inequality imply
P
_
_
1m2
n
2
m,n
t
1/n
_
_
Cn
2
2
n
The right hand side is summable so the Borel Cantelli lemma (see e.g. (6.1) in
Chapter 1 of Durrett (1991)) implies
P
_
_
m2
n
2
m,n
t
t
n
for some n N} and A =
N
A
N
. A trivial inequality and the scaling relation (1.2) implies
P
0
(A
N
) P
0
(B(t
N
) C
t
N
) = P
0
(B(1) C) > 0
Letting N and noting A
N
A we have P
0
(A) P
0
(B
1
C) > 0. Since
A F
+
0
it follows from (2.7) that P
0
(A) = 1, that is, limsup
t0
B(t)/
t C
with probability one. Since C is arbitrary the proof is complete.
7.3. Stopping Times, Strong Markov Property
3.1. If $m2^{-n} < t \le (m+1)2^{-n}$ then $\{S_n < t\} = \{S < m2^{-n}\} \in \mathcal{F}_{m2^{-n}} \subset \mathcal{F}_t$.
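3.2. Since constant times are stopping times the last three statements follow from the first three.
\[
\{S \wedge T \le t\} = \{S \le t\} \cup \{T \le t\} \in \mathcal{F}_t
\]
\[
\{S \vee T \le t\} = \{S \le t\} \cap \{T \le t\} \in \mathcal{F}_t
\]
\[
\{S + T < t\} = \cup_{q, r \in \mathbf{Q}:\, q+r<t} \{S < q\} \cap \{T < r\} \in \mathcal{F}_t
\]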
3.3. Define $R_n$ by $R_1 = T_1$, $R_n = R_{n-1} \vee T_n$. Repeated use of Exercise 3.2 shows that $R_n$ is a stopping time. As $n \uparrow \infty$, $R_n \uparrow \sup_n T_n$ so the desired result follows from (3.3).
Define $S_n$ by $S_1 = T_1$, $S_n = S_{n-1} \wedge T_n$. Repeated use of Exercise 3.2 shows that $S_n$ is a stopping time. As $n \uparrow \infty$, $S_n \downarrow \inf_n T_n$ so the desired result follows from (3.2).
$\limsup_n T_n = \inf_n \sup_{m \ge n} T_m$ and $\liminf_n T_n = \sup_n \inf_{m \ge n} T_m$ so the last two results follow easily from the first two.
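3.4. First if $A \in \mathcal{F}_S$ then
\[
A \cap \{S < t\} = \cup_n (A \cap \{S \le t - 1/n\}) \in \mathcal{F}_t
\]
On the other hand if $A \cap \{S < t\} \in \mathcal{F}_t$ and the filtration is right continuous then
\[
A \cap \{S \le t\} = \cap_n (A \cap \{S < t + 1/n\}) \in \cap_n \mathcal{F}_{t+1/n} = \mathcal{F}_t
\]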
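3.5. $\{R \le t\} = \{S \le t\} \cap A \in \mathcal{F}_t$ since $A \in \mathcal{F}_S$.
3.6. (i) Let $r = s \wedge t$.
\[
\{S < t\} \cap \{S < s\} = \{S < r\} \in \mathcal{F}_r \subset \mathcal{F}_s
\]
\[
\{S \le t\} \cap \{S \le s\} = \{S \le r\} \in \mathcal{F}_r \subset \mathcal{F}_s
\]
This shows $\{S < t\}$ and $\{S \le t\}$ are in $\mathcal{F}_S$. Taking complements and intersections we get $\{S \ge t\}$, $\{S > t\}$, and $\{S = t\}$ are in $\mathcal{F}_S$.
(ii) $\{S < T\} \cap \{S < t\} = \cup_{q<t} \{S < q\} \cap \{T > q\} \in \mathcal{F}_t$ by (i), so $\{S < T\} \in \mathcal{F}_S$. $\{S < T\} \cap \{T < t\} = \cup_{q<t} \{S < q\} \cap \{q < T < t\} \in \mathcal{F}_t$ by (i), so $\{S < T\} \in \mathcal{F}_T$. Here the unions were taken over rational $q$. Interchanging the roles of $S$ and $T$ we have $\{S > T\}$ in $\mathcal{F}_S \cap \mathcal{F}_T$. Taking complements and intersections we get $\{S \ge T\}$, $\{S \le T\}$, and $\{S = T\}$ are in $\mathcal{F}_S \cap \mathcal{F}_T$.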
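3.7. If $A \in \mathcal{R}$ then
\[
\{B(S_n) \in A\} \cap \{S_n \le t\} = \cup_{0 \le m \le 2^nt} \{S_n = m/2^n\} \cap \{B(m/2^n) \in A\} \in \mathcal{F}_t
\]
by (i) of Exercise 3.6. This shows $\{B(S_n) \in A\} \in \mathcal{F}_{S_n}$ so $B(S_n) \in \mathcal{F}_{S_n}$. Letting $n \to \infty$ and using (3.6) we have $B_S = \lim_n B(S_n) \in \cap_n \mathcal{F}_{S_n} = \mathcal{F}_S$.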
7.4. Maxima and Zeros
4.1. (i)Let Y
s
() = 1 if s < t and u < (t s) < v, 0 otherwise. Let
Y
s
() =
_
1 if s < t, 2a v < (t s) < 2a u
0 otherwise
Symmetry of the normal distribution implies E
a
Y
s
= E
a
Y
s
, so if we let S =
inf{s < t : B
s
= a} and apply the strong Markov property then on {S < }
E
x
(Y
S
S
F
S
) = E
a
Y
S
= E
a
Y
S
= E
x
(
Y
S
S
F
S
)
Taking expected values now gives the desired result.
(ii) Letting $M_t = \max_{0 \le s \le t} B_s$ we can rewrite (4.7) as
\[
P_0(M_t > a,\ u < B_t < v) = P_0(2a - v < B_t < 2a - u)
\]
Letting the interval $(u, v)$ shrink to $x$ we see that
\[
P_0(M_t > a,\ B_t = x) = P_0(B_t = 2a - x) = \frac{1}{\sqrt{2\pi t}}e^{-(2a-x)^2/2t}
\]
Differentiating with respect to $a$ now we get the joint density
\[
P_0(M_t = a,\ B_t = x) = \frac{2(2a-x)}{\sqrt{2\pi t^3}}e^{-(2a-x)^2/2t}
\]
4.2. We begin by noting symmetry and Exercise 2.1 imply
\[
P_0(R \le 1+t) = 2\int_0^\infty p_1(0,y)\int_0^t P_y(T_0 = s)\,ds\,dy = \int_0^t 2\int_0^\infty p_1(0,y)P_y(T_0 = s)\,dy\,ds
\]
by Fubini's theorem, so the integrand gives the density $P_0(R = 1+t)$. Since $P_y(T_0 = t) = P_0(T_y = t)$, (4.7) gives
\[
P_0(R = 1+t) = 2\int_0^\infty \frac{1}{\sqrt{2\pi}}e^{-y^2/2}\,\frac{1}{\sqrt{2\pi t^3}}\,ye^{-y^2/2t}\,dy
= \frac{1}{\pi t^{3/2}}\int_0^\infty ye^{-y^2(1+t)/2t}\,dy = \frac{1}{\pi t^{3/2}}\cdot\frac{t}{1+t}
\]
7.5. Martingales
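5.1. It follows from (5.6) that
\[
\cosh(\theta B_t)e^{-\theta^2t/2} = \frac{1}{2}\left\{\exp(\theta B_t - \theta^2t/2) + \exp(-\theta B_t - (-\theta)^2t/2)\right\}
\]
is a martingale. (5.1) and this imply
\[
1 = E_0\left\{\cosh(\theta B_{T\wedge t})e^{-\theta^2(T\wedge t)/2}\right\}
\]
Letting $t \to \infty$ and using the bounded convergence theorem we have
\[
1 = \cosh(\theta a)\,E_0\left\{e^{-\theta^2T/2}\right\}
\]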
5.2. It follows from (5.1) and (5.6) that
1 = E
0
exp(B
t
2
( t)/2)
= b +
b
2
+ 2 is the larger root of b
2
/2 = and B
Tt
a +b(T t)
so using the bounded convergence theorem we have
1 = E
0
_
exp((a +b)
2
/2); <
_
Substituting in the value of and rearranging gives the desired result.
5.3. (i) T
a
= when T
a
< T
b
and T
a
= + T
a
when T
b
< T
a
. Using the
definition of conditional expectation and (1.3) in Chapter 4 we have
E
x
_
e
Ta
; T
b
< T
a
_
= E
x
_
E
x
_
e
(+Ta
_
; T
b
< T
a
_
= E
x
_
e
E
x
_
e
Ta
_
; T
b
< T
a
_
Since B
= b on T
b
< T
a
, the strong Markov property implies
E
x
_
e
Ta
_
= E
b
_
e
Ta
_
and completes the proof of the formula.
(ii) Letting u = E
x
(e
; T
a
< T
b
) and v = E
x
(e
; T
b
< T
a
) then using (4.4)
we can write the equations as
exp((x a)
2) = u +v exp((b a)
2)
exp((b x)
2) = v +uexp((b a)
2)
Multiplying the rst equation by exp((b a)
2) = sinh((b a)
2)u
One can solve for v in a similar way.
5.4. (5.1) and (5.8) imply
\[
E(B(U\wedge t)^4 - 6(U\wedge t)B(U\wedge t)^2) = -3E(U\wedge t)^2
\]
By putting $(a, b)$ inside a larger symmetric interval and using (5.5) we get $EU < \infty$. Letting $t \to \infty$, using the dominated convergence theorem on the left hand side, and the monotone convergence theorem on the right gives $E(B_U^4 - 6UB_U^2) = -3EU^2$ so using Cauchy-Schwarz
\[
EU^2 \le 2EUB_U^2 \le 2\left(EU^2\right)^{1/2}\left(EB_U^4\right)^{1/2}
\]
and it follows that $EU^2 \le 4EB_U^4$.
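5.5. $p_t(x,y) = (2\pi t)^{-1/2}e^{-(y-x)^2/2t}$. Differentiating gives
\[
\frac{\partial p_t}{\partial t} = -\frac{1}{2}(2\pi)^{-1/2}t^{-3/2}e^{-(y-x)^2/2t} + (2\pi t)^{-1/2}e^{-(y-x)^2/2t}\,\frac{(y-x)^2}{2t^2}
\]
\[
\frac{\partial p_t}{\partial y} = -(2\pi t)^{-1/2}e^{-(y-x)^2/2t}\,\frac{(y-x)}{t}
\]
\[
\frac{\partial^2p_t}{\partial y^2} = (2\pi t)^{-1/2}e^{-(y-x)^2/2t}\,\frac{(y-x)^2}{t^2} - (2\pi t)^{-1/2}e^{-(y-x)^2/2t}\,\frac{1}{t}
\]
so $\partial p_t/\partial t = (1/2)\,\partial^2p_t/\partial y^2$.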
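A finite-difference check of the identity $\partial p_t/\partial t = (1/2)\,\partial^2p_t/\partial y^2$ is sketched below. The step size h and the test points are arbitrary choices made for illustration, and `heat_residual` is a name introduced here.

```python
import math

def p(t: float, x: float, y: float) -> float:
    """Brownian transition density p_t(x, y)."""
    return math.exp(-(y - x) ** 2 / (2 * t)) / math.sqrt(2 * math.pi * t)

def heat_residual(t: float, x: float, y: float, h: float = 1e-3) -> float:
    """Central-difference estimate of dp/dt - (1/2) d^2p/dy^2; should be near 0."""
    dp_dt = (p(t + h, x, y) - p(t - h, x, y)) / (2 * h)
    d2p_dy2 = (p(t, x, y + h) - 2 * p(t, x, y) + p(t, x, y - h)) / h ** 2
    return dp_dt - 0.5 * d2p_dy2

print(heat_residual(1.0, 0.0, 0.5), heat_residual(2.0, -1.0, 0.3))
```

Both residuals are at the level of the discretization error, which is of order h squared.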
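To check the second claim note that
\[
\frac{\partial}{\partial t}(p_t(x,y)u(t,y)) = u(t,y)\frac{\partial}{\partial t}p_t(x,y) + p_t(x,y)\frac{\partial}{\partial t}u(t,y)
= u(t,y)\,\frac{1}{2}\frac{\partial^2}{\partial y^2}p_t(x,y) + p_t(x,y)\frac{\partial}{\partial t}u(t,y)
\]
Integrating by parts twice in the first term results in
\[
\int p_t(x,y)\left\{\frac{1}{2}\frac{\partial^2}{\partial y^2}u(t,y) + \frac{\partial}{\partial t}u(t,y)\right\}dy = 0
\]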
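5.6. If we let $u(t,x) = x^6 - atx^4 + bt^2x^2 - ct^3$ then
\[
\partial u/\partial x = 6x^5 - 4atx^3 + 2bt^2x
\]
\[
\partial^2u/\partial x^2 = 30x^4 - 12atx^2 + 2bt^2
\]
\[
\partial u/\partial t = -ax^4 + 2btx^2 - 3ct^2
\]
To have $\partial u/\partial t = -\frac{1}{2}\partial^2u/\partial x^2$ we need
\[
-a + 15 = 0 \qquad 2b - 6a = 0 \qquad -3c + b = 0
\]
i.e., $a = 15$, $b = 45$, $c = 15$. Using (5.1) we have
\[
E\left\{B^6_{T\wedge t} - 15(T\wedge t)B^4_{T\wedge t} + 45(T\wedge t)^2B^2_{T\wedge t}\right\} = 15E(T\wedge t)^3
\]
From (5.5) and (5.9) we know $ET = a^2$ and $ET^2 = 5a^4/3 < \infty$ (here $a$ is the exit level, not the coefficient found above). Using the dominated convergence theorem on the left and the monotone convergence theorem on the right, we have
\[
a^6\left\{1 - 15 + 45\cdot\frac{5}{3}\right\} = 15ET^3
\]
so $ET^3 = 61a^6/15$.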
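The arithmetic in 5.6 can be replayed with exact fractions. The sketch below takes the three coefficient equations and the values $ET = a^2$, $ET^2 = 5a^4/3$ from the solution as given and reports $ET^3$ in units of $a^6$; the variable names are illustrative.

```python
from fractions import Fraction

# Coefficients of u(t, x) = x^6 - a t x^4 + b t^2 x^2 - c t^3
a_coef = Fraction(15)      # from -a + 15 = 0
b_coef = 3 * a_coef        # from 2b - 6a = 0
c_coef = b_coef / 3        # from -3c + b = 0

# Moments of the exit time in units of the exit level: ET / a^2 and ET^2 / a^4
et_over_a2 = Fraction(1)
et2_over_a4 = Fraction(5, 3)

# a^6 {1 - 15 + 45 * 5/3} = 15 ET^3, solved for ET^3 / a^6
et3_over_a6 = (1 - a_coef * et_over_a2 + b_coef * et2_over_a4) / c_coef
print(a_coef, b_coef, c_coef, et3_over_a6)
```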
5.7. u(t, x) = (1 + t)
1/2
exp(x
2
t
/(1 + t)) = (2)
1/2
p
1+t
(0, ix) where i =
1
so u/t +(1/2)
2
u/x
2
= 0 and Exercise 5.5 implies u(t, B
t
) is a martingale.
Being a nonnegative martingale it must converge to a nite limit a.s. However,
if we let x
t
= B
t
/((1 +t) log(1 + t))
1/2
then
(1 +t)
1/2
exp(B
2
t
/(1 +t)) = (1 +t)
1/2
exp(x
2
t
log(1 +t))
so we cannot have x
2
t
1/2 i.o.
7.6. Donsker's Theorem
6.1. Exercise 5.4 implies ET
2
u,v
C
_
x
4
u,v
(dx) so using a computation after
(6.2)
E
_
T
2
U,V
_
CE
_
x
4
U,V
(dx) = CEX
4
6.2. () = max
0s1
(s) min
0s1
(s) is continuous so () implies
1
n
_
max
0mn
S
m
min
0mn
S
m
_
max
0s1
B
s
min
0s1
B
s
6.3. (i) Clearly (1/n)
n
m=1
B(m/n) B((m1)/n) has a normal distribution.
The sums converge a.s. and hence in distribution to $\int_0^1 B_t\,dt$, so by Exercise 3.9 the integral has a normal distribution. To compute the variance, we write
\[
E\left(\int_0^1 B_t\,dt\right)^2 = E\left(\int_0^1\int_0^1 B_sB_t\,dt\,ds\right) = 2\int_0^1\int_s^1 E(B_sB_t)\,dt\,ds = 2\int_0^1\int_s^1 s\,dt\,ds
\]
\[
= 2\int_0^1 s(1-s)\,ds = 2\left[\frac{s^2}{2} - \frac{s^3}{3}\right]_0^1 = \frac{1}{3}
\]
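The value 1/3 can be cross-checked on the discrete approximation. The sketch below assumes the Riemann sum $(1/n)\sum_{m=1}^n B(m/n)$ and uses only $E(B_sB_t) = \min(s,t)$; `riemann_variance` is a name introduced for this check.

```python
from fractions import Fraction

def riemann_variance(n: int) -> Fraction:
    """Exact variance of (1/n) * sum_{m=1}^n B(m/n), using E[B_s B_t] = min(s, t)."""
    total = sum(
        Fraction(min(j, k), n) for j in range(1, n + 1) for k in range(1, n + 1)
    )
    return total / n ** 2

for n in (5, 20, 50):
    print(n, float(riemann_variance(n)))
```

The exact value is (n+1)(2n+1)/(6n²), which decreases to 1/3 as n grows.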
(ii) Let $X_{n,m} = (n + 1 - m)X_m/n^{3/2}$. $EX_{n,m} = 0$ and
\[
\sum_{m=1}^n EX^2_{n,m} = n^{-3}\sum_{j=1}^n j^2 \to 1/3
\]
To check (ii) in (4.5) in Chapter 2 now, we observe that if 1 m n
E
__
n
3/2
(n + 1 m)X
m
/n
3/2
_
2
;
n
3/2
(n + 1 m)X
m
/n
3/2
>
_
1
n
E
_
X
2
1
; X
1
 >
n
_
so the sum in (ii) is E(X
2
1
; X
1
 >
n) 0 by dominated convergence.
7.7. CLTs for Dependent Variables
7.1. On {
n
= i} we have
E(X
n+1
G
n
) =
_
xdH
i
(x) = 0
E(X
2
n+1
G
n
) =
_
x
2
dH
i
(x) =
2
i
The ergodic theorem for Markov chains, Example 2.2 in Chapter 6 (or Exercise
5.2 in Chapter 5) implies that
n
1
n
m=1
2
(X
m
)
2
(x)(x) a.s.
7.2. We have $P(\zeta_n = 1) = 1/4$; let $X_n = \zeta_n - 1/4$. Since $X_n$ is 1-dependent, the formula in Example 7.1 implies $\sigma^2 = EX_0^2 + 2E(X_0X_1)$. $EX_0^2 = \mathrm{var}(\zeta_0) = (1/4)(3/4)$ since $\zeta_0$ is Bernoulli(1/4). For the other term we note
\[
EX_0X_1 = E[(\zeta_0 - 1/4)(\zeta_1 - 1/4)] = -1/16
\]
since $E\zeta_0\zeta_1 = 0$ and $E\zeta_i = 1/4$. Combining things we have $\sigma^2 = 3/16 - 2/16 = 1/16$.
To identify $Y_0$ we use the formula from the proof and the fact that $X_1$ is independent of $\mathcal{F}_{-1}$, to conclude
\[
Y_0 = X_0 - E(X_0 \mid \mathcal{F}_{-1}) + E(X_1 \mid \mathcal{F}_0) - EX_1
= 1_{(\xi_0 = H,\ \xi_1 = T)} - \frac{1}{2}1_{(\xi_0 = H)} + \frac{1}{2}1_{(\xi_1 = H)} - 1/4
\]
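The variance constant in 7.2 can be checked by enumerating three fair coin flips. The sketch assumes, as the formula for $Y_0$ indicates, that the indicator in question is 1 exactly when flip n is H and flip n+1 is T; the function `x` and the other names are illustrative.

```python
from fractions import Fraction
from itertools import product

def x(flips, n: int) -> Fraction:
    """X_n = 1{flip n is H and flip n+1 is T} - 1/4."""
    return Fraction(int(flips[n] == "H" and flips[n + 1] == "T")) - Fraction(1, 4)

outcomes = list(product("HT", repeat=3))   # flips 0, 1, 2, all equally likely
weight = Fraction(1, len(outcomes))
var_x0 = sum(weight * x(w, 0) ** 2 for w in outcomes)
cov_x0_x1 = sum(weight * x(w, 0) * x(w, 1) for w in outcomes)
sigma_sq = var_x0 + 2 * cov_x0_x1          # formula for a 1-dependent sequence
print(var_x0, cov_x0_x1, sigma_sq)
```

The enumeration gives 3/16 for the variance term, −1/16 for the covariance term, and 1/16 for their combination.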
7.3. The Markov property implies
E(X
0
F
n
) =
j
p
n1
(
n
, j)
j
Since Markov chain is irreducible with a nite state space, combining Exercise
5.10 with fact that
i
(i)
i
= 0 shows there are constants 0 < , C < so
that
sup
i
j
p
n1
(i, j)
j
Ce
n
7.8. Empirical Distributions, Brownian Bridge
8.1. Exercise 4.1 implies that
P
_
max
0t1
B
t
> b, < B
1
<
_
= P(2b < B
1
< 2b +)
Since P(B
1
 < ) 2 (2)
1/2
it follows that
P
_
max
0t1
B
t
> b
< B
1
<
_
e
(2b
2
)/2
7.9. Laws of the Iterated Logarithm
9.1. Letting $f(t) = 2(1 + \epsilon)\log\log\log t$ and using a familiar formula from the
proof of (9.1)
P
0
(B
t
k
> (t
k
f(t
k
))
1/2
) f(t
k
)
1/2
exp((1 +) log k)
The righthand side is summable so
limsup
k
B
t
k
/(2t
k
log log log t
k
)
1/2
1
For a bound in the other direction take g(t) = 2(1) log log log t and note that
P
0
(B
t
k
B
t
k1
> ((t
k
t
k1
)g(t
k
))
1/2
) g(t
k
)
1/2
exp((1 ) log k)
The sum of the righthand side is and the events on the left are independent
so
P
0
_
B
t
k
B
t
k1
> ((t
k
t
k1
)g(t
k
))
1/2
i.o.
_
= 1
Combining this with the result for the limsup and noting t
k1
/t
k
0 the
desired result follows easily.
9.2. EX
i

= implies
m=1
P(X
i
 > Cn
1/
) = for any C. Using
the second BorelCantelli now we see that limsup
n
X
n
/n
1/
C, i.e., the
limsup = . Since max{S
n
, S
n1
} X
n
/2 it follows that limsup
n
S
n
/n
1/
=
.
9.3. (9.1) implies that
limsup
n
S
n
/(2nlog log n)
1/2
= 1 liminf
n
S
n
/(2nlog log n)
1/2
= 1
so the limit set is contained in [1, 1]. On the other hand
m=1
P(X
n
>
n) <
for any so X
n
/
n 0
so as S
n
/(2nlog log n)
1/2
wanders back and forth between 1 and 1 it lls up
the entire interval.
Appendix: Measure Theory
A.1. Lebesgue-Stieltjes Measures
1.1. (i) If A, B
i
F
i
then A, B F
n
for some n, so A
c
, A B F
n
.
(ii) Let = [0, 1), F
n
= ({[m/2
n
, (m + 1)/2
n
), 0 m < 2
n
}. (
i
F
i
) = the
Borel subsets of [0, 1) but [0, 1/3)
i
F
i
.
1.2. If $A$ has asymptotic density $\theta$ then $A^c$ has asymptotic density $1 - \theta$. However, $\mathcal{A}$ is not closed under unions. To prove this note that if $A$ has the property that $|\{2k-1, 2k\} \cap A| = 1$ for all integers $k$ then $A$ has asymptotic density 1/2. Let $A$ consist of the odd integers between $(2k-1)!$ and $(2k)!$ and the even integers between $(2k)!$ and $(2k+1)!$. Let $B = 2\mathbf{Z}$. Then
\[
\limsup_{n\to\infty} |(A \cup B) \cap \{1, 2, \ldots, n\}|/n = 1
\]
\[
\liminf_{n\to\infty} |(A \cup B) \cap \{1, 2, \ldots, n\}|/n = 1/2
\]
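The oscillation of the density of $A \cup B$ can be seen numerically. In the sketch below "between $j!$ and $(j+1)!$" is read as the half-open block $(j!, (j+1)!]$, which is an assumption about the boundary convention; `in_A` and `union_density` are names introduced here.

```python
from math import factorial

FACTORIALS = [factorial(j) for j in range(12)]

def in_A(n: int) -> bool:
    """Odd n in ((2k-1)!, (2k)!] or even n in ((2k)!, (2k+1)!]."""
    if n < 2:
        return False
    # locate the block j! < n <= (j+1)!; odd blocks hold odd integers, even blocks even ones
    j = next(i for i in range(1, 11) if FACTORIALS[i] < n <= FACTORIALS[i + 1])
    return n % 2 == j % 2

def union_density(n: int) -> float:
    """|(A ∪ B) ∩ {1, ..., n}| / n, with B the even integers."""
    return sum(1 for m in range(1, n + 1) if m % 2 == 0 or in_A(m)) / n

at_even_factorial = union_density(factorial(8))   # end of a block of odd integers
at_odd_factorial = union_density(factorial(9))    # end of a block of even integers
print(at_even_factorial, at_odd_factorial)
```

The first ratio is already close to 1 and the second close to 1/2; both move further toward those limits along later factorials.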
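1.3. (i) $B = A + (B - A)$ so $\mu(B) = \mu(A) + \mu(B - A) \ge \mu(A)$.
(ii) Let $A_n' = A_n \cap A$, $B_1 = A_1'$ and for $n > 1$, $B_n = A_n' \cap \bigcap_{m=1}^{n-1}(A_m')^c$. Since the $B_n$ are disjoint and have union $A$ we have using (i) and $B_m \subset A_m$
\[
\mu(A) = \sum_{m=1}^\infty \mu(B_m) \le \sum_{m=1}^\infty \mu(A_m)
\]
(iii) Let $B_n = A_n - A_{n-1}$. Then the $B_n$ are disjoint and have $\cup_{m=1}^\infty B_m = A$, $\cup_{m=1}^n B_m = A_n$ so
\[
\mu(A) = \sum_{m=1}^\infty \mu(B_m) = \lim_{n\to\infty}\sum_{m=1}^n \mu(B_m) = \lim_{n\to\infty}\mu(A_n)
\]
(iv) $A_1 - A_n \uparrow A_1 - A$ so (iii) implies $\mu(A_1 - A_n) \uparrow \mu(A_1 - A)$. Since $\mu(A_1 - B) = \mu(A_1) - \mu(B)$ it follows that $\mu(A_n) \downarrow \mu(A)$.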
1.4. (Z) = 1 but ({n}) = 0 for all n and Z =
n
{n} so is not countably
additive on (A).
1.5. By xing the sets in coordinates 2, . . . , d it is easy to see (R
d
o
) RR
o
R
o
and iterating gives the desired result.
A.2. Carathéodory's Extension Theorem
2.1. Let C = {{1, 2}, {2, 3}}. Let be counting measure. Let (A) = 2 if 2 A,
0 otherwise.
A.3. Completion, etc
3.1. By (3.1) there are A
i
A so that
i
A
i
B and
i
(A
i
) (B) +/2.
Pick I so that
i>I
(A
i
) < /2, and let A =
iI
A
i
. Since B
i
A
i
, we
have B A
i>I
A
i
and hence (B A) (
i>I
A
i
) /2. To bound
the other dierence we note that A B (
i
A
i
) B and
i
A
i
B so
(A B) (
i
A
i
) (B) /2.
3.2. (i) For each rational r, let E
r
= r +
D
q
. The E
r
are disjoint subsets of
(0, 1], so
r
(E
r
) 1 but we have (E
r
) = (D
q
), so (D
q
) = 0.
(ii) By translating A we can suppose without loss of generality that (A
(0, 1]) > 0. For each rational q let A
q
= AB
q
. If every A
q
is measurable then
(A
q
) = 0 by (i) and (A (0, 1]) =
q
(A
q
) = 0, a contradiction.
3.3. Write the rotated rectangle $B$ as $\{(x, y) : a \le x \le b, f(x) \le y \le g(x)\}$
where $f$ and $g$ are piecewise linear. Subdividing $[a, b]$ into $n$ equal pieces, using
the upper Riemann sum for $g$ and the lower Riemann sum for $f$, then letting
$n \to \infty$ we conclude that
$\lambda^*(B) = \lambda(A)$.
(ii) By covering D with the appropriate rotations and translations of sets used
to cover C, we conclude
(D)
A
0
as 0. If (A
0
) > 0 then
(A
) > 0 then (A
f d
hd = (A
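\[ g = \sum_{m=1}^\infty \frac{m}{2^n} 1_{E_{n,m}} \]
Since $g \le f$, (iv) in (4.5) implies
\[ \sum_{m=1}^\infty \frac{m}{2^n} \mu(E_{n,m}) = \int g\,d\mu \le \int f\,d\mu \]
\[ \limsup_{n\to\infty} \sum_{m=1}^\infty \frac{m}{2^n} \mu(E_{n,m}) \le \int f\,d\mu \]
For the other inequality let $h$ be in the class used to define the integral. That is,
$0 \le h \le f$, $h$ is bounded, and $H = \{x : h(x) > 0\}$ has $\mu(H) < \infty$.
\[ g + \frac{1}{2^n} 1_H \ge f 1_H \ge h \]
so using (iv) in (4.5) again we have
\[ \frac{1}{2^n} \mu(H) + \sum_{m=1}^\infty \frac{m}{2^n} \mu(E_{n,m}) \ge \int h\,d\mu \]
Letting $n \to \infty$ now gives
\[ \liminf_{n\to\infty} \sum_{m=1}^\infty \frac{m}{2^n} \mu(E_{n,m}) \ge \int h\,d\mu \]
Since $h$ is an arbitrary member of the defining class the desired result follows.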
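4.3. Since
\[ \int |g - (\varphi - \psi)|\,d\mu \le \int |g^+ - \varphi|\,d\mu + \int |g^- - \psi|\,d\mu \]
it suffices to prove the result when $g \ge 0$. Using Exercise 4.2, we can pick
$n$ large enough so that if $E_{n,m} = \{x : m/2^n \le g(x) < (m+1)/2^n\}$ and
$h(x) = \sum_{m=1}^\infty (m/2^n) 1_{E_{n,m}}$ then $\int (g - h)\,d\mu < \epsilon/2$. Since
\[ \sum_{m=1}^\infty \frac{m}{2^n} \mu(E_{n,m}) = \int h\,d\mu \le \int g\,d\mu < \infty \]
we can pick $M$ so that $\sum_{m>M} \frac{m}{2^n} \mu(E_{n,m}) < \epsilon/2$. If we let
\[ \varphi = \sum_{m=1}^M \frac{m}{2^n} 1_{E_{n,m}} \]
then
\[ \int |g - \varphi|\,d\mu = \int (g - h)\,d\mu + \int (h - \varphi)\,d\mu < \epsilon. \]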
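(ii) Pick $A_m$ that are finite unions of open intervals so that $\mu(A_m \Delta E_{n,m}) \le \epsilon M^{-2}$
and let
\[ q(x) = \sum_{m=1}^M \frac{m}{2^n} 1_{A_m} \]
Now the sum above is $= \sum_{j=1}^k c_j 1_{(a_{j-1}, a_j)}$ almost everywhere (i.e., except at
the end points of the intervals) for some $a_0 < a_1 < \cdots < a_k$ and $c_j \in \mathbf{R}$.
\[ \int |\varphi - q|\,d\mu \le \sum_{m=1}^M \frac{m}{2^n} \mu(A_m \Delta E_{n,m}) \le \frac{\epsilon}{2^n} \]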
(iii) To make the continuous function replace each $c_j 1_{(a_{j-1}, a_j)}$ by a function $r_j$
that is 0 on $(a_{j-1}, a_j)^c$, $c_j$ on $[a_{j-1} + \delta_j, a_j - \delta_j]$, and linear otherwise. If we
let $r(x) = \sum_{j=1}^k r_j(x)$ then
\[ \int |q(x) - r(x)|\,dx = \sum_{j=1}^k \delta_j |c_j| < \epsilon \]
if we take $\delta_j |c_j| < \epsilon/k$.
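4.4. Suppose $g(x) = c 1_{(a,b)}(x)$. In this case
\[ \int g(x) \cos nx\,dx = c \int_a^b \cos nx\,dx = \frac{c}{n} \sin nx \Big|_a^b \]
so the absolute value of the integral is smaller than $2|c|/n$ and hence $\to 0$.
Linearity extends the last result to step functions. Using Exercise 4.3 we can
approximate $g$ by a step function $q$ so that $\int |g - q|\,dx < \epsilon$. Since $|\cos nx| \le 1$
the triangle inequality implies
\[ \left| \int g(x) \cos nx\,dx \right| \le \left| \int q(x) \cos nx\,dx \right| + \int |g(x) - q(x)|\,dx \]
so the limsup of the left hand side is $< \epsilon$ and since $\epsilon$ is arbitrary the proof is
complete.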
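As a small numerical illustration of the first step in 4.4, the sketch below evaluates the closed form for an indicator function and compares it with the bound $2|c|/n$. The function name `cos_integral` and the particular values of `c`, `a`, `b` are arbitrary choices made for this example.

```python
import math

def cos_integral(c, a, b, n):
    """Closed form of the integral of c*cos(n x) over the interval (a, b)."""
    return (c / n) * (math.sin(n * b) - math.sin(n * a))

c, a, b = 3.0, 0.25, 2.0   # arbitrary illustrative values
for n in (1, 10, 100, 1000):
    value = cos_integral(c, a, b, n)
    bound = 2 * abs(c) / n
    print(n, value, bound)   # |value| never exceeds bound, and bound -> 0
```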
4.5. (a) does not imply (b): let $f(x) = 1_{[0,1]}$. This function is continuous at
$x \ne 0$ and 1 but if $g = f$ a.e. then $g$ will be discontinuous at 0 and 1.
(b) does not imply (a): $f = 1_{\mathbf{Q}}$ where $\mathbf{Q}$ = the rationals is equal a.e. to the
continuous function that is $\equiv 0$. However $1_{\mathbf{Q}}$ is not continuous anywhere.
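4.6. Let $E^n_m = \{\omega : x^n_{m-1} \le f(\omega) < x^n_m\}$, $\psi_n = x^n_{m-1}$ on $E^n_m$ and $\varphi_n = x^n_m$ on
$E^n_m$. $\psi_n \le f \le \varphi_n \le \psi_n + \mathrm{mesh}(\sigma_n)$ so (iv) in (4.7) implies
\[ \int \psi_n\,d\mu \le \int f\,d\mu \le \int \varphi_n\,d\mu \le \int \psi_n\,d\mu + \mathrm{mesh}(\sigma_n)\,\mu(\Omega) \]
It follows from the last inequality that if we have a sequence of partitions with
$\mathrm{mesh}(\sigma_n) \to 0$ then
\[ \bar{U}(\sigma_n) = \int \varphi_n\,d\mu, \quad L(\sigma_n) = \int \psi_n\,d\mu \to \int f\,d\mu \]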
A.5. Properties of the Integral
5.1. If $|g| \le M$ a.e. then $|fg| \le M|f|$ a.e. and (iv) in (4.7) implies
\[ \int |fg|\,d\mu \le M \int |f|\,d\mu = M \|f\|_1 \]
Taking the inf over $M$ now gives the desired result.
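5.2. If $\mu(\{x : |f(x)| > M\}) = 0$ then $\int |f|^p\,d\mu \le M^p$ so $\limsup_{p\to\infty} \|f\|_p \le M$.
On the other hand if $\mu(\{x : |f(x)| > N\}) = \delta > 0$ then $\int |f|^p\,d\mu \ge \delta N^p$ so
$\liminf_{p\to\infty} \|f\|_p \ge N$. Taking the inf over $M$ and sup over $N$ gives the desired
result.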
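The convergence of $\|f\|_p$ to $\|f\|_\infty$ in 5.2 can be seen on a toy example. The sketch below assumes a finite space with the uniform probability measure; the helper name `p_norm` and the sample values are chosen only for illustration.

```python
def p_norm(values, p):
    """L^p norm of a function on a finite set under the uniform probability measure."""
    n = len(values)
    return (sum(abs(v) ** p for v in values) / n) ** (1.0 / p)

values = [0.5, -1.0, 2.0, -3.0]          # illustrative function values
sup_norm = max(abs(v) for v in values)   # essential supremum on a finite space

for p in (1, 2, 10, 50, 200):
    print(p, p_norm(values, p))          # increases toward sup_norm
print("sup norm:", sup_norm)
```

Here the point of mass $\delta = 1/4$ where $|f| = 3$ plays the role of the set $\{|f| > N\}$: the norm is at least $3\,\delta^{1/p}$, which tends to 3.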
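5.3. Since $|f + g| \le |f| + |g|$ we have
\[ \int |f + g|^p\,dx \le \int |f|\,|f + g|^{p-1}\,dx + \int |g|\,|f + g|^{p-1}\,dx \]
\[ \le \|f\|_p \,\| |f + g|^{p-1} \|_q + \|g\|_p \,\| |f + g|^{p-1} \|_q \]
Now $q = p/(p - 1)$ so
\[ \| |f + g|^{p-1} \|_q = \left( \int |f + g|^p\,dx \right)^{1/q} = \|f + g\|_p^{p-1} \]
and dividing each side of the first display by $\| |f + g|^{p-1} \|_q$ gives the desired result.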
(ii) Since $|f + g| \le |f| + |g|$, (iv) and (iii) of (4.7) imply that
\[ \int |f + g|\,dx \le \int (|f| + |g|)\,dx \le \int |f|\,dx + \int |g|\,dx \]
It is easy to see that if $\mu\{x : |f(x)| > M\} = 0$ and $\mu\{x : |g(x)| > N\} = 0$ then
$\mu\{x : |f(x) + g(x)| > M + N\} = 0$. Taking the inf over $M$ and $N$ we have
\[ \|f + g\|_\infty \le \|f\|_\infty + \|g\|_\infty \]
5.4. If $\sigma_n$ is a sequence of partitions with $\mathrm{mesh}(\sigma_n) \to 0$ then $f^{\sigma_n}(x) \to f(x)$
at all points of continuity of $f$ so the bounded convergence theorem implies
\[ U(\sigma_n) = \int_{[a,b]} f^{\sigma_n}(x)\,dx \to \int_{[a,b]} f(x)\,dx \]
A similar argument applies to the lower Riemann sum and completes the
proof.
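5.5. If $0 \le (g_n + g_1^-) \uparrow (g + g_1^-)$ then the monotone convergence theorem implies
\[ \int (g_n + g_1^-)\,d\mu \uparrow \int (g + g_1^-)\,d\mu \]
Since $\int g_1^-\,d\mu < \infty$ we can subtract $\int g_1^-\,d\mu$ from both sides and use (ii) of (4.5) to get
the desired result.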
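5.6. $\sum_{m=0}^n g_m \uparrow \sum_{m=0}^\infty g_m$ so the monotone convergence theorem implies
\[ \int \sum_{m=0}^\infty g_m\,d\mu = \lim_{n\to\infty} \int \sum_{m=0}^n g_m\,d\mu = \lim_{n\to\infty} \sum_{m=0}^n \int g_m\,d\mu = \sum_{m=0}^\infty \int g_m\,d\mu \]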
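5.7. (i) follows from the monotone convergence theorem.
(ii) Let $f = |g|$ and pick $n$ so that
\[ \int |g|\,d\mu - \int |g| \wedge n\,d\mu < \frac{\epsilon}{2} \]
Then let $\delta < \epsilon/(2n)$. Now if $\mu(A) < \delta$
\[ \int_A |g|\,d\mu \le \int |g| - (|g| \wedge n)\,d\mu + \int_A |g| \wedge n\,d\mu < \frac{\epsilon}{2} + \mu(A) n < \epsilon \]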
5.8. $\sum_{m=0}^n f 1_{E_m} \to f 1_E$ and is dominated by the integrable function $|f|$, so the
dominated convergence theorem implies
\[ \int_E f\,d\mu = \lim_{n\to\infty} \sum_{m=0}^n \int_{E_m} f\,d\mu \]
5.9. If $x_n \to c \in (a, b)$ then $f 1_{[a,x_n]} \to f 1_{[a,c]}$ a.e. and is dominated by $|f|$ so
the dominated convergence theorem implies $g(x_n) \to g(c)$.
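5.10. First suppose $f \ge 0$. Let $\varphi_n(x) = m/2^n$ on $\{x : m/2^n \le f(x) < (m+1)/2^n\}$
for $1 \le m < n2^n$ and 0 otherwise. As $n \uparrow \infty$, $\varphi_n(x) \uparrow f(x)$ so the
dominated convergence theorem implies $\int |f - \varphi_n|^p\,d\mu \to 0$. To extend to the
general case now, let $\varphi_n^+$ approximate $f^+$, let $\varphi_n^-$ approximate $f^-$, and let
$\varphi_n = \varphi_n^+ - \varphi_n^-$.
\[ \int |f - \varphi_n|^p\,d\mu = \int |f^+ - \varphi_n^+|^p\,d\mu + \int |f^- - \varphi_n^-|^p\,d\mu \]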
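5.11. Exercise 5.6 implies $\int \sum |f_n|\,d\mu = \sum \int |f_n|\,d\mu < \infty$ so $\sum |f_n| < \infty$ a.e.,
\[ g_n = \sum_{m=1}^n f_m \to g = \sum_{m=1}^\infty f_m \quad \text{a.e.} \]
and the dominated convergence theorem implies $\int g_n\,d\mu \to \int g\,d\mu$. To finish
the proof now we notice that (iv) of (4.7) implies
\[ \int g_n\,d\mu = \sum_{m=1}^n \int f_m\,d\mu \]
and we have $\sum_{m=1}^\infty \left| \int f_m\,d\mu \right| \le \sum_{m=1}^\infty \int |f_m|\,d\mu < \infty$ so
\[ \sum_{m=1}^n \int f_m\,d\mu \to \sum_{m=1}^\infty \int f_m\,d\mu \]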
A.6. Product Measure, Fubini's Theorem
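6.1. The first step is to observe $\mathcal{A} \times \mathcal{B}_o \subset \sigma(\mathcal{A}_o \times \mathcal{B}_o)$ so $\sigma(\mathcal{A}_o \times \mathcal{B}_o) = \mathcal{A} \times \mathcal{B}$.
Since $\mathcal{A}_o \times \mathcal{B}_o$ is closed under intersection, uniqueness follows from (2.2).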
6.2. $|f| \ge 0$ so
\[ \int |f|\,d(\mu_1 \times \mu_2) = \int_X \int_Y |f(x, y)|\,\mu_2(dy)\,\mu_1(dx) < \infty \]
This shows $f$ is integrable and the result follows from (6.2).
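6.3. Let $Y = [0, \infty)$, $\mathcal{B}$ = the Borel subsets, and $\lambda$ = Lebesgue measure. Let
$f(x, y) = 1_{\{(x,y) : 0 < y < g(x)\}}$, and observe
\[ \int_X \int_Y f(x, y)\,dy\,\mu(dx) = \int_X g(x)\,\mu(dx) \]
\[ \int_Y \int_X f(x, y)\,\mu(dx)\,dy = \int_0^\infty \mu(g(x) > y)\,dy \]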
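The identity in 6.3 can be verified exactly when $\mu$ is a measure on finitely many points. The following sketch is only an illustration under that assumption; the names `integral` and `layer_cake`, the three-point space, and the chosen masses and values are all made up for the example.

```python
from fractions import Fraction

def integral(g, mu):
    """Integral of g with respect to a finite measure mu given as {point: mass}."""
    return sum(g[x] * mass for x, mass in mu.items())

def layer_cake(g, mu):
    """Integral over y in [0, inf) of mu({x : g(x) > y}), computed exactly for g >= 0."""
    levels = sorted(set(g.values()) | {Fraction(0)})
    total = Fraction(0)
    for lo, hi in zip(levels, levels[1:]):
        # y -> mu(g > y) is constant on [lo, hi)
        tail = sum(mass for x, mass in mu.items() if g[x] > lo)
        total += (hi - lo) * tail
    return total

mu = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(2)}
g = {"a": Fraction(3), "b": Fraction(0), "c": Fraction(5, 4)}
print(integral(g, mu), layer_cake(g, mu))   # both sides agree
```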
6.4. (i) Let $f(x, y) = 1_{(a < x \le y \le b)}$ and observe
\[ (\mu \times \nu)(\{(x, y) : a < x \le y \le b\}) = \int f\,d(\mu \times \nu) = \int\!\!\int f\,d\mu\,d\nu = \int_{(a,b]} \{F(y) - F(a)\}\,dG(y) \]
(ii) Using (i) twice we have
\[ \int_{(a,b]} F(y)\,dG(y) + \int_{(a,b]} G(y)\,dF(y) = F(a)\{G(b) - G(a)\} + G(a)\{F(b) - F(a)\} \]
\[ \qquad + (\mu \times \nu)((a, b] \times (a, b]) + (\mu \times \nu)(\{(x, x) : a < x \le b\}) \]
The third term is $(F(b) - F(a))(G(b) - G(a))$ so the sum of the first three is
$F(b)G(b) - F(a)G(a)$.
(iii) If $F = G$ is continuous then the last term vanishes.
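For purely atomic measures the integration by parts formula in (ii) reduces to finite sums, so it can be checked exactly. The sketch below assumes $\mu$ and $\nu$ are given as dictionaries of atoms and masses; the helper names `cdf` and `stieltjes` and the specific atoms are hypothetical choices for this example.

```python
from fractions import Fraction

def cdf(measure, x):
    """Distribution function x -> measure((-inf, x]) of a discrete measure {atom: mass}."""
    return sum(m for atom, m in measure.items() if atom <= x)

def stieltjes(integrand_measure, integrator, a, b):
    """Integral over (a, b] of F dG, with F the cdf of integrand_measure and G of integrator."""
    return sum(cdf(integrand_measure, y) * m
               for y, m in integrator.items() if a < y <= b)

mu = {1: Fraction(1, 2), 2: Fraction(1, 2)}   # atoms of mu (illustrative)
nu = {1: Fraction(1, 3), 3: Fraction(2, 3)}   # atoms of nu (illustrative)
a, b = 0, 3

lhs = stieltjes(mu, nu, a, b) + stieltjes(nu, mu, a, b)
# Mass that mu x nu puts on the diagonal {(x, x) : a < x <= b}
diagonal = sum(m * nu[x] for x, m in mu.items() if x in nu and a < x <= b)
rhs = cdf(mu, b) * cdf(nu, b) - cdf(mu, a) * cdf(nu, a) + diagonal
print(lhs, rhs)   # both sides agree
```

The only common atom is at 1, so the diagonal term is $\mu(\{1\})\nu(\{1\}) = 1/6$; removing that shared atom makes the last term vanish, as in (iii).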
6.5. Let $f(x, y) = 1_{\{(x,y) : x < y \le x + c\}}$.
\[ \int\!\!\int f(x, y)\,\mu(dy)\,dx = \int c\,\mu(dy) = c\,\mu(\mathbf{R}) \]
so the desired result follows from (6.2).
6.6. We begin by observing that
\[ \int_0^a \int_0^\infty |e^{-xy} \sin x|\,dy\,dx = \int_0^a \frac{|\sin x|}{x}\,dx < \infty \]
since $\sin x / x$ is bounded on $[0, a]$. So Exercise 6.2 implies $e^{-xy} \sin x$ is integrable
in the strip. Removing the absolute values from the last computation gives the
left hand side of the desired formula. To get the right hand side integrate by
parts twice:
\[ f(x) = e^{-xy}, \quad f'(x) = -y e^{-xy}, \quad g'(x) = \sin x, \quad g(x) = -\cos x \]
\[ \int_0^a e^{-xy} \sin x\,dx = -e^{-ay} \cos a + 1 - \int_0^a y e^{-xy} \cos x\,dx \]
\[ f(x) = y e^{-xy}, \quad f'(x) = -y^2 e^{-xy}, \quad g'(x) = \cos x, \quad g(x) = \sin x \]
\[ \int_0^a y e^{-xy} \cos x\,dx = y e^{-ay} \sin a + \int_0^a y^2 e^{-xy} \sin x\,dx \]
Rearranging gives
\[ \int_0^a e^{-xy} \sin x\,dx = \frac{1}{1 + y^2} \left( 1 - e^{-ay} \cos a - y e^{-ay} \sin a \right) \]
Integrating and recalling $d \tan^{-1}(y)/dy = 1/(1 + y^2)$ gives the displayed equation. To get the bound note
\[ \int_0^\infty e^{-ay}\,dy = 1/a \quad \text{and} \quad \int_0^\infty y e^{-ay}\,dy = 1/a^2. \]
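The closed form obtained from the two integrations by parts can be sanity-checked against a numerical quadrature. The sketch below uses a hand-written composite Simpson rule; the names `closed_form` and `simpson` and the test values of `a` and `y` are assumptions made only for this illustration.

```python
import math

def closed_form(a, y):
    """Value of the integral of exp(-x*y)*sin(x) over [0, a] from integrating by parts twice."""
    return (1 - math.exp(-a * y) * math.cos(a)
            - y * math.exp(-a * y) * math.sin(a)) / (1 + y * y)

def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule with n subintervals (n must be even)."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3

a, y = 2.0, 0.7   # arbitrary illustrative values
numeric = simpson(lambda x: math.exp(-x * y) * math.sin(x), 0.0, a)
print(numeric, closed_form(a, y))   # the two values agree to many digits
```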
A.8. Radon-Nikodym Theorem
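8.1. If $\mu(A \cap \{x : f(x) < 0\}) = 0$ then for $B \subset A$
\[ \int_B f\,d\mu = \int_{B \cap \{x : f(x) > 0\}} f\,d\mu \ge 0 \]
If $E = A \cap \{x : f(x) < -\epsilon\}$ has positive measure for some $\epsilon > 0$ then
\[ \int_E f\,d\mu \le \int_E -\epsilon\,d\mu < 0 \]
so $A$ is not positive.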
8.2. Let $\mu$ be the uniform distribution on the Cantor set, $C$, defined in Example
1.7 of Chapter 1. $\mu(C^c) = 0$ and $\lambda(C) = 0$ so the two measures are mutually
singular.
8.3. If $F \subset E$ then since $(A \cup B)^c$ is a null set,
\[ \alpha(F) = \alpha(F \cap A) + \alpha(F \cap B) \le \alpha(E \cap A) = \alpha_+(E) \]
8.4. Suppose $\nu^1_r + \nu^1_s$ and $\nu^2_r + \nu^2_s$ are two decompositions. Let $A_i$ be so that
$\nu^i_s(A_i) = 0$ and $\mu(A_i^c) = 0$. Clearly $\mu(A_1^c \cup A_2^c) = 0$. The fact that $\nu^i_r \ll \mu$
implies $\nu^i_r(A_1^c \cup A_2^c) = 0$. Combining this with $\nu^1_s(A_1) = 0 = \nu^2_s(A_2)$ it follows
that
\[ \nu^1_r(E) = \nu^1_r(E \cap A_1 \cap A_2) = \nu(E \cap A_1 \cap A_2) = \nu^2_r(E \cap A_1 \cap A_2) = \nu^2_r(E) \]
This shows $\nu^1_r = \nu^2_r$ and it follows that $\nu^1_s = \nu^2_s$.
8.5. Since $\mu_2 \perp \nu$, there is an $A$ with $\mu_2(A) = 0$ and $\nu(A^c) = 0$. $\mu_1 \ll \mu_2$
implies $\mu_1(A) = 0$ so $\mu_1 \perp \nu$.
8.6. Let $g_i = d\nu_i/d\mu$. The definition implies $\nu_i(B) = \int_B g_i\,d\mu$ so
\[ (\nu_1 + \nu_2)(B) = \int_B (g_1 + g_2)\,d\mu \]
and the desired result follows from uniqueness.
8.7. If $f = 1_A$ this follows from the definition. Linearity gives the result for
simple functions; monotone convergence the result for nonnegative functions.
8.8. Let $f = (d\pi/d\nu) 1_B$ in Exercise 8.7 to get
\[ \int_B \frac{d\pi}{d\nu} \frac{d\nu}{d\mu}\,d\mu = \int_B \frac{d\pi}{d\nu}\,d\nu = \pi(B) \]
where the second equality follows from a second application of Exercise 8.7.
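On a finite set where every point has positive mass, Radon-Nikodym derivatives are just ratios of point masses, so the chain rule in 8.8 can be confirmed directly. The sketch below works under that assumption; the helper name `density` and the three measures are invented for the example.

```python
from fractions import Fraction

def density(numer, denom):
    """Radon-Nikodym derivative d(numer)/d(denom) on a finite set.

    Assumes denom gives positive mass to every point, so the ratio is defined everywhere.
    """
    return {x: numer[x] / denom[x] for x in denom}

mu = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
nu = {"a": Fraction(1, 3), "b": Fraction(2, 3), "c": Fraction(1)}
pi = {"a": Fraction(2), "b": Fraction(1, 5), "c": Fraction(3, 7)}

dpi_dnu = density(pi, nu)
dnu_dmu = density(nu, mu)
dpi_dmu = density(pi, mu)

# Chain rule: (dpi/dnu) * (dnu/dmu) should equal dpi/dmu pointwise
product = {x: dpi_dnu[x] * dnu_dmu[x] for x in mu}
print(product == dpi_dmu)
```

Taking `pi = mu` in the same computation gives a product that is identically 1, which is the special case treated in 8.9.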
8.9. Letting $\pi = \mu$ in Exercise 8.8 we have
\[ 1 = \frac{d\mu}{d\nu} \frac{d\nu}{d\mu} \]