Вы находитесь на странице: 1из 76

Probability Theory

Prof. Dr. Nina Gantert


Lecture at TUM in SS 2011
August 18, 2011
Produced by Stefan Seichter
Contents
1 Measure spaces 3
2 Measurable functions, random variables 13
3 Integration of functions 19
4 Markov inequality, Fatous lemma, dominated convergence 22
5 L
p
-spaces, inequalities 24
6 Dierent types of convergence, uniform integrability 28
7 Independence 32
8 Kernels and Fubinis Theorem 36
9 Absolute continuity, Radon-Nikodym derivatives 41
10 Construction of stochastic processes 47
11 The law of large numbers 52
12 Weak convergence, characteristic functions and the central limit theorem 54
13 Conditional expectation 63
14 Martingales 70
2
1 Measure spaces
Let be a non-empty set.
Denition 1.1 / T(), i.e. a collection / of subsets of , is a -eld on , if
(i) /
(ii) A / A
c
/
(iii) A
n
/, n N

nN
A
n
/
If A
n
/, n N, then also

nN
A
n
/.
Denition 1.2 If / is a -eld (-Algebra) on , (, /) is a measurable space and
each A / is measurable.
Denition 1.3 A probability space is a triple (, /, P) where is a non-empty set, / a
-eld on and P : / 1 a function with
(i) P(A) 0 A /
(ii) P() = 1
(iii) P
_

n=1
A
n
_
=

n=1
P(A
n
) for each sequence of pairwise disjoint sets in /.
Property (iii) is the -additivity of P.
Example 1.4 (Discrete probability space)
countable. Say =
1
,
2
, . . . , / = T(). p
1
, p
2
, . . . are weights i.e. p
j
0, j and

j=1
p
j
= 1. P is dened by P(A) =

j :
j
A
p
j
, A . Then (, /, P) is a probability
space. Note that p
j
= P(
j
) > 0 for at least one j. (1.1)
Example 1.5 = 0, 1
N
= = (x
1
, x
2
, . . . ) [ x
i
0, 1, i
For a , let A
n
= [ x
j
= a
j
, 1 j n be the set of all sequences with rst
entries (a
1
, . . . , a
n
).
If P describes fair coin tosses, one should have A
n
/ and P(A
n
) =
1
2
n
. (*)
Since a =

n
A
n
also a / and since P has the property
P
_

n
A
n
_
= lim
n
P(A
n
) if A
n
(i.e. A
n
A
n+1
, n) (1.2)
(Proof of (1.2) see exercises)
one has
P(a) = lim
n
P(A
n
) = 0 (1.3)
In contrast to the discrete case, P is continuous, i.e. there is no with P() > 0.
3
Remark A theorem by Ulam says that (assuming the continuum hypothesis) there is no
continuous probability measure on (, T(). (*) is only possible if / _ T().
In general cannot take / = T(), see example 1.5. On the other hand, / should contain
the interesting sets (e.g. A
n
in example 1.5).
Often, one knows P(A) for certain sets A and one wants to choose / so that one can
extend P to a probability measure on (, /).
is always a non-empty set.
A collection of sets / T() is -stable if it contains nite unions of sets in /, -stable
if it contains intersections of nitely many sets in /.
Denition 1.6 A collection /
0
T() is an algebra on if
(i) /
0
(ii) A /
0
A
c
/
0
(iii) /
0
is -stable
An algebra contains the empty set and is -stable (since A B is (A
c
B
c
)
c
).
Each -eld is an algebra, but there are algebras which are not -elds:
Example = N. Then, /
0
= A [ A nite or A
c
is nite is the algebra of nite
or conite sets. /
0
is not a -eld.
Denition 1.7 A collection T T() is a Dynkin system or -system, if
(i) T
(ii) A, B T, A B BA T
(iii) A
n
T, n N, A
i
A
j
= for i ,= j

n=1
A
n
T
A -system contains and the complement A
c
of a set A T.
Each -eld is a -system. On the other hand, there are -systems which are not -elds.
Example = 1, 2, 3, 4, T = , , 1, 2, 1, 4, 3, 4, 2, 3 T is a -system, but
not a -eld.
Note that a -system which is -stable is a -eld:
for A
1
, A
2
, . . . T,

n=1
A
n
= A
1

n=2
(A
n
A
c
1
. . . A
c
n1
)

n=1
A
n
T.
Lemma 1.8 Let I ,= be an index set. If /
i
is a -eld on , i I, then

iI
/
i
= A [ A /
i
, i I is a -eld on .
The same statement holds for algebras and -systems.
4
Denition 1.9 Let / T(), / , = . Then
(/) =

/ [ / / T() [ / -eld is the -eld generated by /.


T(/) =

/ [ / / T() [ / -system is the -system generated by /.


(/) =

/ [ / / T() [ / algebra is the algebra generated by /.


/ is the generating system of (/), T(/) or (/).
Theorem 1.10 (/) is the smallest -eld T with / T.
If / is a -eld with / / T(), then (/) /.
The same statement holds for (/) and T(/).
Proof: Use Lemma 1.8, where I is an index set for / [ / / T() [ / -eld.
Have I ,= , since / T() and T() is a -eld. Hence (/) is a -eld.
Example 1.11 Let / = [ . Then
(/) = A [ A countable or A
c
countable,
(/) = A [ A nite or A
c
nite,
T(/) = (/).
Some properties: For all collections /, /
1
, /
2
T(), we have
(a) / (/)
(b) (/) = ((/))
(c) /
1
/
2
(/
1
) (/
2
)
(d) (/) (/)
(e) T(/) (/)
(f) /
1
(/
2
) and /
2
(/
1
) (/
1
) = (/
2
)
Proof: (a) and (c) follow from the denition of (/).
(b) is true since (/) is a -eld.
(d) and (e) are true since (/) is an algebra and a -system.
(f) is implied by (b) and (c).
Lemma 1.12 If / T() is -stable, then T(/) = (/).
Proof:
1) We show A T(/), B / A B T(/).
Proof: For B /, T
B
= A [ A B T(/) is a -system,
/ -stable / T
B
T
B
-system T(/) T
B
2) Let A T(/). We show A B T(/).
Proof: T
A
= B [ AB T(/) is -system, / T
A
due to 1) T
A
= T(/).
5
Example 1.13 An important -eld is the Borel--eld on 1
k
.
Let O
k
= A 1
k
[ A open be the collection of open subsets of 1
k
.
The -eld B
k
of Borel-sets of 1
k
(Borel--eld) is B
k
= (O
k
).
Remark Each of the following collections generates B
k
:
c
k
= A 1
k
[ A closed,
/
k
= A 1
k
[ A compact,
J
k
= (x, y] [ x, y 1
k
, x y, where (x, y] = (x
1
, y
1
] . . . (x
k
, y
k
] and x y, if
x
i
y
i
, 1 i k,

k
= (, x] [ x 1
k
.
Remark Using the axiom of choice, B
k
,= T(1
k
).
Goal Assume that for A / T(), P(A) is known (see for instance the sets A
n
in
example 1.5), / is not a -eld.
Can we extend P to a probability measure on a -eld / with / /?
Is this extension unique?
In the following denitions, think of = 1
k
, / = J
k
, = k-dim. content.
Denition 1.14 Assume / ,= , / T(). A function : / [0, ] is a non-
negative function (on sets). is
a) nite, if (A) < , A /.
b) -nite, if there is an increasing sequence A
1
A
2
. . . in / such that

n=1
A
n
=
and (A
n
) < , n.
c) nitely additive, if for nitely many pairwise disjoint sets A
1
, . . . , A
n
/ with
n

j=1
A
j
/, then we have
_
n

j=1
A
j
_
=
n

j=1
(A
j
).
d) -additive, if for pairwise disjoint sets A
1
, A
2
, . . . / with

j=1
A
j
/, we have

j=1
A
j
_
=

j=1
(A
j
).
Denition 1.15 (, /, ) is a measure space if / is a -eld on and : / [0, ] a
-additive non-negative function on / with () = 0.
Hence, a probability space is a measure space with () = 1.
The measure space is nite, if is nite and -nite, if is -nite.
Example 1.16 ,= , / -eld on .
a) Let and dene a measure

by

(A) =
_
_
_
1 A
0 / A
A /

is the Dirac measure in .


6
b) Let T and dene a measure
T
by

T
(A) =
_
_
_
[A T[ A T nite
else

T
is the counting measure in T.
c) Assume
n
, n 1 are measures on (, /) and b
n
0, n 1. Dene
(A) =

n=1
b
n

n
(A) (A /) Then, is a measure on (, /).
In particular, take in c) / = T(),
n
=

n
,
n
, n, b
n
> 0, n and

n=1
b
n
= 1.
In this case, the measure is a discrete measure which is concentrated on the set
1
,
2
, . . .
Theorem 1.17 (Properties of measures)
Let (, /, ) be a measure space and A, B, A
1
, A
2
, . . . /. Then we have
a) If A
1
, A
2
, . . . are pairwise disjoint sets,
_
n

j=1
A
j
_
=
n

j=1
(A
j
)
b) A B (A) (B)
c) A B, (A) < (BA) = (B) (A)
d)
_

j=1
A
j
_

j=1
(A
j
) ( -subadditivity)
e) A
n
A (i.e. A
1
A
2
. . . , A =

i=1
A
i
) (A) = lim
n
(A
n
)
f ) A
n
A (i.e. A
1
A
2
. . . , A =

i=1
A
i
) and (A
1
) < (A) = lim
n
(A
n
)
Remark
1. Assume that is a non-negative function on / and is nitely additive. Then:
(e) is -additive.
Proof: Let (A
j
) be a sequence of pairwise disjoint sets in /. Then

n=1
A
n
_
= lim
n

_
n

j=1
A
j
_
= lim
n
n

j=1
(A
j
) =

j=1
(A
j
).
2. The assumption (A
1
) < in (f) is needed:
Example = N, / = T(), (A) =
_
_
_
[A[ A nite
else
A
n
= n, n + 1, . . . . Then A
n
but (A
n
) = , n.
7
Theorem 1.18 (Uniqueness for measures agreeing on a -stable collection of
sets)
Let / T() be -stable and / = (/). Assume that
1
,
2
are measures on (, /)
with
1
(A) =
2
(A), A / and assume that there is a sequence (A
n
)
n1
, A
n
/, n
with A
n
and
1
(A
n
) =
2
(A
n
) < , n. Then
1
(A) =
2
(A), A /.
Proof: Let B / with
1
(B) =
2
(B). Then T
B
:= A / [
1
(B A) =
2
(B A)
is a -system. From Lemma 1.12, since / T
B
, / = (/) = T(/) T
B
. In
particular, / T
A
n
, n. Since A A
n
A /, property e) implies that
1
(A) =
lim
n

1
(A A
n
) = lim
n

2
(A A
n
) =
2
(A), A /.
Corollary 1.19
1. P
1
, P
2
probability measures on (1
k
, B
k
) with P
1
((, x]) = P
2
((, x]), x 1
k
P
1
= P
2
.
2. There is (at most) one measure
k
on (1
k
, B
k
) with
k
((x, y]) =
k

j=1
(y
j
x
j
),
((x, y]
k
I
k
)
3. Let /
0
T() be an algebra and / = (/
0
), P
1
, P
2
probability measures on (, /)
with P
1
(A) = P
2
(A), A /
0
. Then we have P
1
= P
2
.
Proof:
1. follows from Theorem 1.18 with = 1
k
, / =
k
, A
n
= (, x
n
] x
n
= (n, . . . , n).
2. follows from Theorem 1.18 with = 1
k
, / = J
k
, A
n
= (x
n
, y
n
],
x
n
= (n, . . . , n), y
n
= (n, . . . , n)
3. follows from Theorem 1.18 with A
n
= .
Denition 1.20 A collection o T() is a semiring (Semiring) on if
i) o.
ii) o is -stable.
iii) If A, B o with A B there are nitely many pairwise disjoint sets C
1
, . . . , C
n
o
such that BA =
n

j=1
C
j
.
Examples
a) = 1
k
, o = J
k
.
b) (
1
, /
1
), (
2
, /
2
) measurable spaces, : =
1

2
. Then, the collection
o = A
1
A
2
[ A
1
/
1
, A
2
/
2
is a semiring on .
Proof: exercise.
8
Denition 1.21 A non-negative function

: T() [0, ] is an outer measure


( aueres Ma) if:
i)

() = 0
ii) A B

(A)

(B)
iii)

j=1
A
j
_

j=1

(A
j
) (-subadditivity)
Theorem 1.22 Let / T() be a collection of sets with / and : / [0, ]
a non-negative function on / with () = 0. For A let

(A) = inf
_

n=1
(A
n
) [ (A
n
) sequence in / with A

_
j=1
A
n
_
if there is such a covering sequence for A, + otherwise. Then

is an outer measure
and we say

is the outer measure induced by .


Proof: See Measure Theory.
We will restrict

to a measure on a -eld.
Lemma 1.23 (Caratheodory)
Let

: T() [0, ] be an outer measure, /(

) := A [

(AE) +

(A
c
E) =

(E), E is the -eld of

-measurable sets. Then:


a) /(

) is a -eld on .
b) The restriction of

on /(

) is a measure on (, /(

)).
Proof: See Measure Theory.
A set A is

-measurable if it divides each subset E of in two parts, on which

behaves
additively.
Theorem 1.24 (Extension Theorem)
Let o T() be a semiring and : o [0, ] a non-negative function with the following
properties:
i) () = 0
ii) is nitely additive
iii) is -subadditive
Then there is a measure on (, (o)) with (A) = (A), A o. If is -nite,
is unique.
Proof: See Measure Theory.
9
Remark iii) in theorem 1.24 can be replaced with is -additive.
Lemma 1.25 o T(), o semiring, : o [0, ] non-negative function with () =
0, nitely additive. Then:
is -additive is -subadditive
Proof: See exercises
Corollary 1.26 Let

/ T() be an algebra and :

/ [0, 1] be a non-negative function
with () = 0, () = 1. Assume is -additive. Then there is a unique extension of
to a probability measure on (, /) where / = (

/).
Proof: Follows from Theorem 1.24 and Lemma 1.25 since each algebra is a semiring.
Continuation of example 1.5 (coin tossing)
= 0, 1
N
= = (x
1
, x
2
, . . . ) [ x
i
0, 1, i
For a , let B
n,a
= [ x
i
= a
i
, 1 i n and /
n
= (B
n,a
[ a 0, 1
N

Then: /
1
/
2
. . . /
n
is an increasing sequence of -elds on ,

/ :=

n=1
/
n
is
algebra (see exercises).
If P corresponds to tosses of a fair coin, we have P(B
n,a
) =
1
2
n
, a 0, 1
N
, n.
This denes P on (,

/). To show that P satises the hypothesis of Corollary 1.26, we
have to show that P is -additive on

/.
We then get an extension of P to a probability measure on (, (

/)).
We postpone the proof of the -additivity and will do it later in a more general frame.
Interpretation of /
n
:
/
n
contains the information about until time n if = (x
1
, x
2
, . . . ) is seen as a process
in time.
As an application of Theorem 1.24 we consider the construction of probability measures
on (1, B).
Denition 1.27 A function G: 1 1 is a cdf (cumulative distribution function)
(madenierende Funktion) if
(i) G is increasing (x y G(x) G(y)).
(ii) G is continuous from the right (rechtsstetig).
If G satises in addition lim
x
G(x) = 1, lim
x
G(x) = 0 then G is a cdf of a probability
measure (Verteilungsfunktion).
Theorem 1.28 If G is a cdf then there exists a unique measure
G
on (1, B) such that

G
((a, b]) = G(b) G(a), a, b 1 (1.5)
10
Further,
G
is -nite.
If G is the cdf of a probability measure, then
G
is a probability measure.

G
is the Lebesgue-Stieltjes measure associated to G.
Proof:
G
is a non-negative function on the semiring J
1
with
G
() = 0 and
G
is nitely
additive. Due to Theorem 1.24 and Lemma 1.25, it remains to show that
G
is -additive,
i.e. that for each sequence of pairwise disjoint sets (A
n
)
n1
, A
n
J, n with

n=1
A
n
J,
we have
G
_

n=1
A
n
_
=

n=1

G
(A
n
).
Let

n=1
A
n
= (x, y] J. Then
G
_

n=1
A
n
_
= G(y) G(x).
Assume A
n
= (x
n
, y
n
], n. Then, inf
n
x
n
= x, max
n
y
n
= y and

n=1

G
(A
n
)
n
G(y)
G(x)
Example 1.29
1. G(x) = x (x 1). The associated measure
1
or is the Borel-Lebesgue measure
on (1, B).
2. If G is a cdf,
G
(x) = G(x) G(x), where G(x) = lim
y,x
G(y). Hence
G
is
continuous (i.e.
G
(x) = 0, x) if and only if G is continuous.
3. For a probability measure on (1, B), F(x) := ((, x]) is the cdf of .
F is a cdf and we have
F
= .
Example exponential distribution with parameter .
Then F(x) =
_
_
_
1 e
x
x > 0
0 x 0
4. A 1, A countable (A) = 0
is not true.
Example (Cantor set)
A
1
= (
1
3
,
2
3
), A
2
= (
2
9
,
3
9
) (
7
9
,
8
9
)
A
k
=

(z
1
,...,z
k1
)0,1
k1
_
k1

j=1
z
j
2
3
j
+
1
3
k
,
k1

j=1
z
j
2
3
j
+
2
3
k
_
(k 2)
A :=

k=1
A
k
The Cantor set C is dened by C := [0, 1]A
Claim: C is not countable.
Proof: Consider the function 0, 1, 2
N
[0, 1], a

k=1
a
k
1
3
k
The set a = (a
k
)
k1
[ a
k
0, 2, k is mapped one to one to C.
We have C B and (A) =

k=1
(A
k
) =

n=1
2
k1 1
3
k
=
1
3

k=1
_
2
3
_
k1
= 1
(C) = 0.
11
Further, Theorem 1.24 implies the existence of the Borel-Lebesgue measure
k
on (1
k
, B
k
).

k
is dened on the semiring by
k
((x, y]) =
k

i=1
(y
i
x
i
).
Proof: Check that the hypotheses of Theorem 1.24 are satised.
Denition 1.31 Let (, /, ) be a measure space.
A set A / with (A) = 0 is a -nullset.
(, /, ) is complete if A, B , A B, B -nullset A / ( (A) = 0).
Theorem 1.32 (, /, ) measure space. The collection
/

:= A [ E, F / s.t. E A F, (FE) = 0
is a -eld on , / _ /

and the non-negative function


: /

[0, ] A (A) = sup(B) [ B /, B A


is a measure on (, /

) which extends . The measure space (, /

, ) is complete.
We say that (, /

, ) is the completion of (, /, ).
Proof: see exercises
Remark The measure space (1
k
, B
k
,
k
) is not complete. Its completion is denoted
(1
k
, L
k
,
k
). L
k
is the eld of Lebesgue-measurable sets and
k
is the Lebesgue measure
on (1
k
, L
k
).
12
2 Measurable functions, random variables
,= ,

,= , f :

.
The preimage (Urbild) of a set

A

is f
1
(

A) = [ f()

A.
The corresponding function f
1
is dened by f
1
: T(

) T()

A f
1
(

A).
f
1
commutes with and , more precisely: let I be an index set and

A
j


(j I),
then f
1
_

jI

A
j
_
=

jI
f
1
(

A
j
), f
1
_

jI

A
j
_
=

jI
f
1
(

A
j
), f
1
(

A
c
j
) = (f
1
(

A
j
))
c
,
f
1
(

) = .
Lemma 2.1 ,= ,

,= , f :

. Then:
a) If

/ is a -eld on

, then f
1
(

/) = f
1
(

A) [

A

/ is a -eld on .
b) If / is a -eld on , /
f
:=

A

[ f
1
(

A) / is a -eld on

.
Proof: Check the properties of a -eld in Denition 1.1.
For

/ T(

) we dene f
1
(

/) = f
1
(

A) [

A

/.
Denition 2.2 Let (, /) and (

,

/) be measurable spaces. The function f :

is
(/,

/)-measurable if
f
1
(

/) / (i.e. f
1
(

A) /,

A

/) (2.1)
Notation: f : (, /) (

,

A).
Remark If

/ is large and / is small, there are less measurable functions
f : (, /) (

,

/)
Examples
1. If / = , and

/ = T(), only the constant functions f with f() = ,
are (/,

/)-measurable.
2. If / = T(), each function f :

is (/,

/)-measurable (no matter what

/ is).
3. Assume / is generated by a countable partition of into atoms A
1
, A
2
, . . .
(i.e. A
i
A
j
= , i ,= j and =

j=1
A
j
).
Then f : 1 is (/, B)-measurable, if f is constant on each atom A
i
.
We show that it suces to check (2.1) for a generating system of

/.
Lemma 2.3 Let (, /) and (

,

/) be measurable spaces, f :

and assume

/ T(

) satises (

/) =

/. Then
f is (/,

/)-measurable f
1
(

/) /.
13
Proof: is clear.
holds since

/ /
f
, A
f
is a -eld due to Lemma 2.1 b)

/ = (

/) /
f
.
Consequences 2.4 Let (, /) be a measurable space. Then
a) f : 1
k
is (/, B
k
)-measurable f
1
(J
k
) / f
1
(O
k
) / etc.
b) f
j
: 1 (j = 1, . . . , k), f = (f
1
, . . . , f
k
). Then
f is (/, B
k
)-measurable f
j
is (/, B
1
)-measurable j 1, . . . , k
c) f : 1
k
1
m
continuous f is (B
k
, B
m
)-measurable (we say: f is Borel-measurable)
d) Let A . Then, I
A
is (/, B
1
)-measurable A /.
I
A
is the indicator function dened by I
A
() =
_
_
_
1 A
0 else
e) Let (
j
, /
j
) (j = 1, 2, 3) be measurable spaces and f
1
: (
1
, /
1
) (
2
, /
2
),
f
2
: (
2
, /
2
) (
3
, /
3
) be measurable functions.
Then f
2
f
1
:
1

3
is (/
1
, /
3
)-measurable.
Proof:
a) Follows from Lemma 2.3 since B
k
= (J
k
) = (O
k
) etc.
b) : Fix j and let A
j
1 be an open set. Then, the set A :=
j1

i=1
1A
j

i=j+1
1 is
an open subset of 1
k
, and we have f
1
j
(A
j
) = f
1
(A) /.
f
j
is (/, B
1
)-measurable.
: For (a, b] =
k

j=1
(a
j
, b
j
] J
k
, f
1
((a, b]) =
k

j=1
f
1
j
((a, b]) /.
f
1
(A) /, A J
k
.
c) f continuous f
1
(O
m
) O
k
but O
k
B
k
, (O
m
) = B
m
hence f
1
(A) B
k
,
A B
m
.
d) For B 1, I
1
A
(B) , A, A
c
, .
e) (f
2
f
1
)
1
(A
3
) = f
1
1
(f
1
2
(A
3
)) /
1
, A /
3
.
We can dene -elds through functions:
Denition 2.5 ,= , (
j
, /
j
) (j J) collection of measurable spaces and (f
j
)
jJ
collection of functions f
j
:
j
.
Then (f
j
, j J) :=
_

jJ
f
1
j
(/
j
)
_
is the -eld generated by the functions f
j
(j J).
If J = 1, . . . , n we write (f
1
, . . . , f
n
). Have f
1
n
(/
k
)

jJ
f
1
j
(/
j
) (f
j
, j J)
for each k J, the function f
k
is ((f
j
, j J), /
k
)-measurable.
14
Let / be a -eld on such that f
k
is (/, /
k
)-measurable k J.


jJ
f
1
j
(/
j
) /
_

jJ
f
1
j
(/
j
)
_
/.
Hence, (f
j
, j J) is the smallest -eld /, such that all functions f
k
are (/, /
k
)-
measurable.
Example 2.6 Let (
1
, T
1
), (
2
, T
2
), . . . be measurable spaces and
=

j=1

j
= = (x
1
, x
2
, . . . ) [ x
i

i

Let
j
:
j
, x
j
=
j
() be the projection. Then J = N.
Denition 2.7 In the above setup, the -eld (
j
, j J) is the product--eld
(Produkt--Algebra) of T
1
, T
2
, . . . and we write

j=1
F
j
.
Remark Let /
n
= (F
1
. . . F
n

n+1
. . . [ F
i
T
i
, 1 i n).
We have
1
j
(F
j
) =
1
. . .
j1
F
j

j
. . . and

j=1

1
j
(F
j
) = F
1
. . . F
m

m+1
. . .

n=1
/
n

j=1
F
j

j=1
F
j
=
_

n=1
/
n
_
since

j=1
F
j
is the smallest -eld, which makes all the projections measurable.
Lemma 2.8 Let (, /, ) be a measure space and (

,

/) be a measurable space and
f : (, /) (

,

/).
Then,
f
:

/ [0, ],

A
f
(

A) = (f
1
(

A)) is a measure on (

,

/).

f
is the image of under f or the law of f under (Bildma).
Proof: Check the properties in Denition 1.15.
Denition 2.9 Let P be a probability measure on (, /), let (

,

/) be a measurable
space. Then, a function X: (, /) (

,

/) is a random variable with values in

.
The law P
X
of X under P (P
X
is a probability measure) is the law of X.
If (

,

/) = (1
k
, B
k
) the random variable X = (X
1
, . . . , X
k
) is
_
_
_
a random vector k 2
a real-valued random variable k = 1
If k 2, P
X
= P
(X
1
,...,X
k
)
is the joint law of X
1
, . . . , X
k
under P.
Denition 2.10 Let (, /) be a measurable space and
0
,
0
,= .
Let id:
0
, be the injection map.
Then we are in the setup of Denition 2.5 with J = 1, f
1
= id.
The -eld (id) = id
1
(A) [ A / = A
0
[ A / is the trace of / in
0
.
Example 2.11 Lebesgue measure on (1, B),
0
= [0, 1].
Then let B
[0,1]
denote the trace of B in [0, 1], i.e. B
[0,1]
= B [0, 1] [ B B.
[
[0,1]
denotes the restriction of on ([0, 1], B
[0,1]
).
We say [
[0,1]
is the equidistribution or uniform distribution on [0, 1] (Gleichverteilung).
15
Dene a function f : [0, 1] 0, 1
N
in the following way:
choose for a [0, 1] a sequence (a
k
)
k1
such that a =

k=1
a
k
1
2
k
(if there are two sequences,
take the sequence with innitely many 1s).
Equip 0, 1
N
with the product--eld

iN
T
i
, where T
i
= , 0, 1, 0, 1.
Then, f : ([0, 1], B
[0,1]
) (, /), = 0, 1
N
, / =

iN
T
i
i.e. f is (B
[0,1]
, /)-measurable.
Proof: exercise
Remark The image of under f is the probability measure on 0, 1
N
which describes
innitely many tosses of a fair coin.
Example 2.12 = 1
2
,
0
= E = (x, y) 1
2
[ x
2
+ y
2
1
Let B
E
be the trace of B
2
in E. The probability measure
1

[
E
on (E, B
E
) is the uniform
distribution on E.
We want now to consider functions (or random variables) which can take values in

1 = 1 , .
Let

B = B

1 [ B 1 B.

B is a -eld on

1.
We have

B = B B [ B B B [ B B B , [ B B.
We will consider functions f : (, /) (

1,

B). We write
f > a = [ f() > a = f
1
((a, ])
f = g = [ f() = g()
f = = [ f() =
Lemma 2.13 (, /) measurable space and f : (, /) (

1,

B).
The following statements are equivalent:
a) f is (/,

B)-measurable
b) f > a /, a 1
c) f a /, a 1
d) f < a /, a 1
e) f a /, a 1
Proof: a) b) since (a, ]

B
b) c) since f a =

n=1
f > a
1
n

c) d) since f < a = f a
c
d) a) since

B is generated by

J = [, a] [ a 1 Proof: exercise
Lemma 2.14 Assume f, g : (, /) (

1,

B).
Then: f < g /, f g /, f ,= g /.
16
Proof: f < g =

a
(f < a g > a)
f g = f > g
c
f = g = f g g f
f ,= g = f = g
c

Theorem 2.15 Assume f, g : (, /) (

1,

B)
Then:
a) f + g and f g are (/,

B)-measurable (if dened everywhere!).
b) f g : (, /) (

1,

B)
c)
1
g
: (, /) (

1,

B)
Here:
1

=
1

= 0,
1
0
=
0 = 0 = 0
0 () = () 0 = 0
= () () =
() = () =
+ () and () + are not dened.
Proof:
a) Due to Lemma 2.14, a + cf and f are (/,

B)-measurable, too (a, c 1).
Hence f + g a = f a g / due to Lemma 2.14.
b) Assume rst that f, g take values in 1.
Have f g =
1
4
((f + g)
2
(f g)
2
).
Hence, it suces to take f = g i.e. to show that f
2
: (, /) (

1,

B).
But f
2
> a =
_
_
_
a 0
f

a f

a a > 0
f
2
a / f
2
is (/,

B)-measurable.
General case:

1
= f g =

2
= f g =

3
= f g = 0

4
= (
1

2

3
)
Then,
j
/ (j = 1, 2, 3, 4). The restriction f[

4
of f to
4
is (/, B)-measurable,
same for g[

4
. Due to rst case, f[

4
g[

4
is (/, B)-measurable. But f g a =
4

j=1
(f g a
j
) = f g = (f g a
3
) (f[

4
g[

4
)
1
([a, )) and
f g a
3
=
_
_
_
f g = 0 a 0
else
f g : (, /) (

1,

B).
17
c) Assume a > 0. Then
_
1
g
a
_
=
__
1
g
a
_
g > 0
_

__
1
g
a
_
g = 0
_

__
1
g
a
_
g < 0
_
=
_
0 < g
1
a
_
. .
,
g = 0
. .
,

..
,

_
1
g
a
_
/.
a < 0 analogous.
Theorem 2.16 Assume f, f
1
, f
2
, . . . are (/,

B)-measurable.
Then, the following functions are measurable, too:
a) sup
n
f
n
, inf
n
f
n
b) limsup
n
, liminf
n
f
n
c) f
+
= max(f, 0) = f 0 (f
+
is the positive part of f)
f

= min(f, 0) = f 0 (f

is the negative part of f)


[f[ = f
+
+ f

Proof:
a) sup
n1
f
n
a =

n=1
f
n
a
. .
,
inf
n
f
n
= sup
n
(f
n
)
b) follows from a) since limsup
n
f
n
= inf
n1
sup
kn
f
n
, liminf
n
f
n
= sup
n1
inf
kn
f
n
.
c) f
+
, f

are (/,

B)-measurable due to a), [f[ due to Theorem 2.15 a).
Note that
f
+
0, f

0, f = f
+
f

(2.2)
Corollary 2.17 If (f
n
)
n1
is a sequence of functions which are (/,

B)-measurable and
lim
n
f
n
() exists, (we say that the sequence (f
n
) converges pointwise) then lim
n
f
n
is
(/,

B)-measurable.
18
3 Integration of functions
Assume (, /, ) is a measure space.
Goal: For as many functions f : (, /) (

1,

B) as possible, dene
_
f d.
Proceeds in 4 steps:
1. f = I
A
.
2. f 0, values in 1, f takes only nitely many values.
3. f 0 (take limits from 2.).
4. f arbitrary (decompose f = f
+
f

and apply 3.)


1.
_
I
A
d =
_
A
1 d = (A) (A /)
2. c = f : (, /) (1, B [ f 0, f takes nitely many values
If f c, f() =
1
, . . . ,
n
,
j
1
+

f =
n

j=1

j
I
A
(3.1)
with A
j
= f
1
j
(
j
) / and =
n

j=1
A
j
.
Dene
_
f d =
n

j=1

j
(A
j
)
- have to show that for two dierent representations of the form (3.1), the two
integrals coincide.
3. c

= f : (, /) (

1,

B) [ f 0
Theorem 3.1 Let f c

. Then, there is an increasing sequence of functions


(u
n
)
n1
c with u
n
f (i.e. u
n
u
n+1
, n and lim
n
u
n
= f)
Proof: see Measure Theory.
Dene:
_
f d = lim
n
_
u
n
d (3.2)
- have to show that the limit in (3.2) exists.
- have to show that for two dierent sequences (u
n
)
n1
and (v
n
)
n1
with
u
n
f, v
n
f, lim
n
_
u
n
d = lim
n
_
v
n
d
19
4. Denition 3.2 The function f : f(, /) (

1,

B) is -integrable or integrable if
_
f
+
d < and
_
f

d < .
In this case,
_
f d :=
_
f
+
d
_
f

d (3.3)
_
f d is the integral of f with respect to .
Notations:
_
f d = (f) =
_
f() (d) =
_

f d
Remark
1. If
_
f
+
d < or
_
f

d < then (3.3) still denes


_
f d - with possible values
+ or . In this case, we say that f is quasi-integrable.
2. Assume (, /, P) is a probability space. Then
X is integrable with respect to P E([X[) =
_
[X[ dP < .
We write E(X) =
_
X dP (X: (, /) (1,

B) random variable)
Theorem 3.3 (Properties of the integral)
f, g : (, /) (

1,

B) and f, g are integrable with respect to . Then:
a) f is integrable ( 1) with
_
f d =
_
f d.
b) If f + g is dened on , f + g is integrable and
_
(f + g) d =
_
f d +
_
g d.
c) f g = max(f, g) and f g = min(f, g) are integrable as well.
d) f g
_
f d
_
g d
e) [
_
f d[
_
[f[ d
Proof: see Measure Theory
Examples
1. (, /) measurable space, ,

Dirac measure in .
A function f : (, /) (

1,

B) is integrable withrespect to

if and only if [f()[ <


.
In this case,
_
f d

= f().
2. (, /) = (N, T(N)) and is the counting measure (A) =
_
_
_
[A[ A nite
+ else
f : N

1 is given by the sequence b
n
= f(n). Then
f integrable with respect to

n
[b
n
[ <
If f is integrable w.r.t ,
_
f d =
n

n=1
b
n
.
20
Theorem 3.4 (Monotone Convergence)
If 0 f
1
f
2
. . . then
_
lim
n
f
n
d = lim
n
_
f
n
d
Proof: see Measure Theory
Remark f 0 is needed.
Example 3.5 1, / = B, = , f
n
(x) =
1
n
, x, f
n
is not -integrable
(f
n
is quasi-integrable with
_
f
n
d = .)
But: f
n
f, f(x) = 0, x, f is integrable
_
f d = 0.
Theorem 3.6 (Transformation of measure)
(, /, ) measure space, (

,

/) measurable space, f : (, /) (

,

/).

f
is the law of f under . Then:
(i) h: (

,

/) (

1,

B), h 0
_
hd
f
=
_
(h f) d (3.4)
(ii) h: (

,

/) (

1,

B). Then:
h is integrable with respect to
f
h f is integrable with respect to
In this case, (3.4) holds.
Proof: In 4 steps, see the denition of the integral at the beginning of the chapter.
21
4 Markov inequality, Fatous lemma, dominated
convergence
Denition 4.1 (, /, ) measure space. We say, that a property E holds -almost
everywhere (-a.e.) (-fast uberall) if [ E does not hold for is a -nullset.
If P is a probability measure, we say that E holds P-almost surely (P-a.s.)
(P-fast sicher). In this case P( [ E holds for ) = 1.
Example f, g : (, /) (1,

B)
f = g, -a.e. means that (f ,= g) = 0.
Theorem 4.2 Let f, g : (, /) (

1,

B) and assume f = g, -a.e. Then
f is -integrable g is -integrable
and in this case:
_
f d =
_
g d.
Proof: f
+
,= g
+
f

,= g

f ,= g. Hence w.l.o.g. f 0, g 0.
N := f ,= g / since f, g are measurable
h := I
N
and h
n
:= nI
N
h
_
h
n
d = n(N) = 0
mon.

conv.
_
hd = 0
Since f g + h and g f + h we have
f -integrable g is -integrable
and if f is -integrable,
_
f d =
_
g d.
Theorem 4.3 (Markov inequality)
f : (, /) (

1,

B), f 0, B
t
= f t / (t > 0). Then
(f t)
1
t
_
f d. (4.1)
Proof: 0 t I
B
t
f I
B
t
f and
_
t I
B
t
d = t(B
t
).
Hence, (f t)
1
t
_
f I
B
t
d
1
t
_
f d.
Consequences 4.4
1. f : (, /) (

1,

B), f 0. Then,
_
f d = 0 f = 0 -a.e.
Proof: (f t)
1
t
_
f d = 0 (t > 0)
hence (f > 0) =
_

n
f
1
n

_
0
(f > 0) = 0
2. f : (, /) (

1,

B)
Assume f is -integrable. Then:
([f[ = ) = 0 i.e. [f[ < -a.e.
22
Proof: ([f[ = ) ([f[ n)
(4.1)

1
n
_
[f[ d
n
0.
Lemma 4.5 (Fatous lemma)
(f
n
) sequence of measurable functions, f
n
0, n. Then
_
liminf
n
d liminf
n
_
f
n
d (4.2)
Proof: g
n
:= inf
kn
f
k
, n = 1, 2, . . .
Then 0 g
1
g
2
. . . and liminf
n
f
n
= lim
n
g
n
.
Hence
_
liminf
n
f
n
d =
_
lim
n
g
n
d
mon.
=
conv.
lim
n
_
g
n
d
g
n
f
n
liminf
n
_
f
n
d.
Remark
1. Suces to assume that f
n
g, n for some function g which is -integrable.
Proof: Apply the statement to f
n
g.
2. (4.2) does not always hold.
Example = [0, 1], / =

B
[0,1]
, = [
[0,1]
f
n
() = n
2

n
, 0 1
f
n
(x) 0 -a.s. hence 0 =
_
liminf
n
f
n
d but
liminf
n
_
f
n
d = liminf
n
1
_
0
n
2
x
n
dx = liminf
n
_

n
2
n+1
_
=
Theorem 4.6 (Dominated convergence, Lebesgues Theorem)
f, f
1
, f
2
, . . . (, /) (

1,

B).
Assume that f = lim
n
f
n
-a.e. If there is a function g 0 which is integrable and
[f
n
[ g -a.e., n 1 (4.3)
then f is -integrable and we have
_
f d = lim
n
_
f
n
d
Proof: see Measure Theory.
Remark If is a probability measure, (4.3) is satised if [f
n
[ K -a.s. for some
constant K.
23
5 L
p
-spaces, inequalities
(, /, ) measure space, p 1.
Denition 5.1 f : (, /) (

1,

B) has a nite p-th absolute moment if
_
[f[
p
d < .
L
p
= L
p
(, /, ) := f : (, /) (

1,

B) [
_
[f[
p
d < is the collection of measurable
functions on (, /) with a nite p-th absolute moment.
Remark L
p
(, /, ) is a vector space on 1.
f, g L
p
(, /, ) f L
p
( 1) and
[f + g[
p
([f[ +[g[)
p
(2([f[ [g[))
p
2
p
[f[
p
+ 2
p
[g[
p
f + g L
p
if f, g L
p
Denition 5.2 Let X be a random variable on (, /, P) (X: (, /) (

1,

B))
E([X[
p
) =
_
[X[
p
dP is the p-th absolute moment of X.
X L
p
(, /, ) E([X[
p
) <
In this case: E(X
p
) exists and E(X
p
) < . E(X
p
) is the p-th moment of X.
Denition 5.3 f : (, /) (

1,

B) is -almost everywhere bounded if there is a constant
K (0 K < ) such that ([f[ > K) = 0
L

= L

(, /, ) = f : (, /) (

1,

B) [ f -a.e. bounded
Remark L

(, /, P) is a vector space on 1.
Denition 5.4 (p-Norm)
For f : (, /) (

1,

B) we dene
|f|
p
=
__
[f[
p
d
_1
p
and |f|

= infs > 0: ([f[ > s) = 0.


|f|

is called the essential supremum of f with respect to and we write


|f|

= esssup f.
Example 5.5 (, /, ) = (1, B, ), f = I

|f|

= 0
Next we proof that | |
p
is a seminorm on L
p
for p [1, ].
In order to achieve this, we need two inequalities:
Theorem 5.6 (Holders inequality)
For p [1, ] let q [1, ] such that
1
p
+
1
q
= 1. Then for f, g : (, /) (

1,

B) we have
_
[fg[ d
__
[f[
p
d
_1
p
__
[g[
q
d
_1
q
or short: |fg|
1
|f|
p
|g|
q
The proof of Holders inequality makes use of the following
Lemma 5.7 1 < p, q < ,
1
p
+
1
q
= 1. For all x, y [0, ] we have xy
x
p
p
+
y
q
q
Proof: If 0 < x, y < , we consider the following picture:
(picture will be here soon)
24
Proof of H olders inequality: W.l.o.g., let 0 < |f|
p
|g|
q
<
Due to Lemma 5.7 we have
[f[
|f|
p
[g[
|g|
q

1
p
[f[
p
|f|
p
p
+
1
q
[g[
q
|g|
q
q
Integrating with respect to yields

[fg[ d
|f|
p
|g|
q

1
p
1 +
1
q
1 = 1
Remark H olders inequality still holds for p = 1, q = , i.e. |fg|
1
|f|
1
|g|

.
Proof: see exercises
If p = q = 2, Holders inequality is also called Cauchy-Schwarz inequality.
Theorem 5.8 (Minkowski inequality)
Let p [1, ] and f, g : (, /) (

1,

B) such that f +g is well-dened. Then the triangle
inequality in L
p
holds, i.e. we have |f + g|
p
|f|
p
+|g|
p
.
Proof:
Step 1 p <
We can restrict ourselves to the non-trivial case in which we have |f|
p
, |g|
p
<
and thus |f + g|
p
< . Moreover, w.l.o.g. we can assume that f, g are non-
negative since we have |f + g|
p
|[f[ +[g[|
p
.
Further, for p = 1 we have |[f[ + [g[|
p
= |f|
p
+ |g|
p
and thus we can assume
p > 1.
We dene q :=
1
1
1
p

1
p
+
1
q
= 1 and we have
_
(f + g)
p
d =
_
f(f + g)
p1
d +
_
g(f + g)
p1
d
Holder
|f|
p
|(f +g)
p1
|
q
+|g|
p
|(f +g)
p1
|
q
= (|f|
p
+|g|
p
)
__
(f + g)
(p1)q
d
_1
q
(p1)q=p
= (|f|
p
+|g|
p
)
__
(f + g)
p
d
_
1
1
p
.
Step 2 p =
Again we assume that we have |f|
p
, |g|
p
< .
For all > 0 we have
([f +g[ > |f|

+|g|

+) ([f[ > |f|

+

2
) +([g[ > |g|

+

2
) = 0
([f + g[ > s) = 0, s > |f|

+|g|

.
Remark | |
p
is a seminorm on L
p
(, /, ):
|f|
p
0
f 0 |f|
p
= 0
|f|
p
= [[|f|
p
|f + g|
p
|f|
p
+|g|
p
But |f|
p
= 0 does not imply that we have f 0.
We have |f|
p
= 0 f = 0 -a.e.
25
We dene A
0
:= f L
p
: f = 0 -a.e. and observe that A
0
is a linear subset of L
p
.
Let L
p
:= L
p
(, /, ) = L
p
(, /, )/A
0
denote the quotient space of L
p
with respect to
A
0
.
L
p
is a normed vector space over 1 with norm |[f]|
p
:= |f|
p
for some f [f] where [f]
denotes the equivalence class of f.
Denition 5.9 (L
p
-convergence)
For p [1, ] let (f
n
)
nN
be a sequence of functions in L
p
. We say (f
n
)
nN
converges in
the p-th norm or in L
p
to a limit f L
p
if we have lim
n
|f
n
f|
p
= 0.
Theorem 5.10 (Completeness of L
p
)
Every Cauchy sequence in L
p
converges in L
p
to a limit f L
p
, i.e. L
p
equipped with the
metric d(, ) dened by d(f, g) = |f g|
p
is a complete metric space.
In particular L
p
(, /, ) is a Banach space.
Furthermore for every Cauchy sequence (f
n
)
nN
in L
p
there is f L
p
and a subsequence
(f
n
k
)
kN
such that f
n
k
k
f -a.e.
Proof:
Step 1 We construct a subsequence (f
n
k
)
kN
which converges -a.e.
For k N there is n
k
N such that |f
n
f
m
|
p
2
k
, m, n n
k
N.
We dene g
k
:= f
n
k+1
f
n
k
and g :=

k=1
[g
k
[.
|g|
p
Thm. 5.8

k=1
|g
k
|
p
<
[g[ < -a.e.

k=1
g
k
converges absolutely -a.e.
the sequence (f
n
k
)
kN
converges -a.e. because
m

k=1
g
k
= f
n
m+1
f
n
1
.
there is a set N / with (N) = 0 such that lim
k
f
n
k
exists for all N.
We dene f = lim
k
f
n
k
I
\N
and this completes step 1.
Step 2 We show that (f
n
)
nN
converges in L
p
to f.
In order to do so, we need the following lemma.
Lemma 5.11 Let (, /, ) be a measure space, p [1, ] and (f
n
) a sequence of

1-
valued, (/,

B)-measurable functions such that:
g 0, g L
p
, [f
n
[ g -a.e., n N
f
n
f -a.e. with f : (, /) (

1,

B)
Then we have f L
p
, and
_
[f
n
f[
p

n
0 ( f
n
/
p
f)
Proof: see exercises
26
With the help of Lemma 5.11, we can conclude that (f
n
k
) converges in L
p
to f since we
have
[f
n
k
[ [f
n
k
f
n
k1
+f
n
k1
f
n
k2
+. . . +f
n
2
f
n
1
+f
n
1
[

k
[f
n
k
f
n
k1
[ +[f
n
1
[ L
p
In other words |f
n
k
f|
p

k
0, and since (f
n
) is a Cauchy sequence,
|f
n
f|
p

n
0.
Lemma 5.12 Let (g
n
) be a sequence of non-negative measurable functions. Then
|

n
g
n
|
p

n
|g
n
|
p
.
Proof: h
N
=
N

n=1
g
n
h =

n=1
g
n
.
Hence (
_
h
p
N
d)
1
p
(
_
h
p
d)
1
p
(monotone convergence).
But (
_
h
p
N
d)
1
p

Minkovski
N

n=1
|g
n
|
p

n=1
|g
n
|
p
.
Hence sup
N
_
(
_
h
p
N
d)
1
p
_

n=1
|g
n
|
p
.
Example 5.13 (f
n
/
p
f =f
n
a.e.
f)
= [0, 1], / = B and =
Dene a sequence (A
n
) as follows:
m = 1: A
1
= [0,
1
1
)
m = 2: A
2
= [0,
1
2
), A
3
= [
1
2
, 1)
m = 2: A
4
= [0,
1
3
), A
5
= [
1
3
,
2
3
), A
6
= [
2
3
, 1)

Dene (f
n
) as f
n
:= I
A
n
, n 1 and f = 0.
Then (f
n
) converges to f in L
p
, for 1 p < .
Indeed, as soon as n
m1

j=1
j, we have
_
[f
n
f[
p
= (A
n
)
1
m
.
But f
n
f -a.e.
Indeed, (0, 1): limsup
n
f
n
() = 1, liminf
n
f
n
() = 0.
Remark f
n
f in L

.
Example 5.14 (l
p
-space)
(, /) = (N, T(N)), counting measure.
l
p
:= L
p
(, /, ) = x = (x
k
) 1
N
, |x|
p
< where
_

_
|x|
p
=
_

k=1
[x
k
[
p
_1
p
for 1 p < ,
|x|

= sup
n
[x
k
[ for p = .
From Theorem 5.9, (l
p
, | |
p
) is a Banach space for any 1 p .
27
6 Dierent types of convergence, uniform integrability
(, /, P) a probability space.
X, X
1
, X
2
, . . . random variables.
Denition 6.1
We say that (X
n
) converges in probability to X (Notation: X
n
P
X), if
> 0: P([X
n
X[ )
n
0.
We say that (X
n
) converges almost surely (a.s.) to X if
P(: X
n
()
n
X()) = 1.
We say that (X
n
) converges in L
p
to X if
E([X
n
X[
p
)
n
0 (1 p < ).
Theorem 6.2
i) If X
n
/
p

n
X, then X
n
P

n
X
ii) If X
n
a.s.

n
X, then X
n
P

n
X
Proof: For i), note that P([X
n
X[ )
E([X
n
X[
p
)

p
n
0.
For ii), we need
Lemma 6.3 X
n
a.s.

n
X > 0: P(sup
kn
[X
n
X[ )
n
0.
Thanks to Lemma 6.3 and the fact that
P([X
n
X[ ) P(sup
kn
[X
k
X[ ),
we have, > 0: P([X
n
X[ )
n
0 if X
n
a.s.

n
X.
Proof of Lemma 6.3:
: Let > 0 and dene
A
n
= sup
kn
[X
k
X[
C = lim
n
X
n
= X
B
n
= C A
n
We have B
n
B
n1
. . . and

n=1
B
n
= .
The continuity from above of P implies that lim
n
P(B
n
) = 0 (see (1.2)).
On the other hand, P(C) = 1 P(B
n
) = P(A
n
), n 1.
Hence, P(A
n
)
n
0.
: Dene A
n
and C as above and D

= limsup
n
[X
n
X[ .
We have D

A
n
, n 1.
By hypothesis, P(A
n
)
n
0, hence P(D

) = 0, > 0.
28
One can write C
c
=

k=1
limsup
n
[X
n
X[
1
k
=

k=1
D1
k
0 P(C
c
)

k=1
P(D1
k
) = 0 P(C) = 1.
Remark The converse implications of i) and ii) in Theorem 6.2 do not hold.
Example 5.13 = [0, 1), / = B, P = , X
n
= I
A
n
, X 0.
We have X
n
/
p
X but X
n
X a.s.
In this case, X
n
P
X.
Example 6.4 = [0, 1], / = B, P = , X
n
() = (n + 1)
n
(n = 1, 2, . . . ), X() = 0
Then X
n
a.s.

n
X X
n
P

n
X
E([X
n
X[
p
) = (n + 1)
p
1
_
0
X
np
dx =
(n+1)
p
np+1

_
_
_
1 if p = 1
if 1 < p <
In both cases, the limit is dierent from 0.
X
n
X in L
p
.
Theorem 6.5 The following assertions are equivalent:
a) X
n
P

n
X
b) Every subsequence (X
n
k
) of (X
n
) has another subsequence (X
n
k
) such that
X
n
k
a.s.

n
X.
Proof: a) b):
Let (X
n
k
) be a subsequence of (X
n
).
Then, there exists (X
n
k
) subsequence of (X
n
k
) such that P([X
n
k
X[
1
k
)
1
k
2
, k 1.
We now prove that > 0, P(sup
kn
[X
n
k
X[ )
n
0.
Let > 0 and n large enough such that
1
n
< .
P(sup
kn
[X
n
k
X[ )

k=n
P([X
n
k
X[ )

k=n
P([X
n
k
X[
1
k
)

k=n
1
k
2
n
0.
b) a):
Let > 0 and dene b
n
= P([X
n
X[ ). b
n
is bounded.
Let (b
n
k
) be a convergent subsequence of (b
n
).
Then there exists (b
n
k
) such that X
n
k
a.s.

k
X, which implies that b
n
k
k
0
by Theorem 6.2 ii).
b
n
k
k
0.
Hence, every subsequence (b
n
k
) of (b
n
) which converges, converges to 0.
b
n
n
0.
Remark We already know from Theorem 5.10 that for a sequence (X
n
) such that
X
n
/
p

n
X (1 p ), there is a subsequence (X
n
k
) such that X
n
k
a.s.

n
X.
29
Denition 6.6 We say that the sequence (X
n
) is uniformly integrable if
lim
c
_
sup
n
_
[X
n
[c
[X
n
[ dP
_
= 0 (6.1)
_
_
[X
n
[c
[X
n
[ dP = E([X
n
[I
[X
n
[c
)
_
.
Remarks
1. Every uniformly integrable sequence (X
n
) is bounded in L
1
, that is
sup
n
_
[X
n
[ dP < .
Indeed
_
[X
n
[ dP = c +
_
[X
n
[c
[X
n
[ dP.
2. [X
n
[ K a.s. n (X
n
) is uniformly integrable.
Theorem 6.7 (Extension of Lebesgues Theorem / Dominated convergence)
For a sequence (X
n
) of random variables which is uniformly integrable and converges to
a random variable X in probability, we have X
n
X in L
1
.
In particular, if X
n
X P-a.s. and (X
n
) uniformly integrable then
lim
n
E(X
n
) = E(lim
n
X
n
) = E(X).
For the proof, we need the following lemma:
Lemma 6.8 If (X
n
) is uniformly integrable then there is, for each > 0, some =
() > 0 so that P(A) sup
n
_
A
[X
n
[ dP .
Proof of Lemma 6.8: We have
_
A
[X
n
[ dP =
_
A[X
n
[<c
[X
n
[ dP +
_
A[X
n
[c
[X
n
[ dP cP(A) +
_
A[X
n
[c
[X
n
[ dP (6.3)
and for c = c() large enough and =

2c
, both terms on the right hand side of (6.3) are


2
.
Proof of Theorem 6.7: X
n
X in probability. W.l.o.g. X 0.
Then E([X
n
[) +
_
[X
n
[
[X
n
[ dP. (Take c = , A = in (6.3).)
But P([X
n
[ ) for n N
0
() since X
n
0 in probability, hence
_
[X
n
[
[X
n
[ dP
for n N
0
() because of Lemma 6.8. Hence E([X
n
[)
n
0.
Remark 6.9 X, X
1
, X
2
, . . . L
p
(, /, P).
Then the following statements are equivalent:
a) X
n
X in L
p
.
b) X
n
X in probability and the sequence ([X
n
X[
p
) is uniformly integrable.
30
Proof: see exercises
An example of a sequence which is not uniformly integrable is Example 6.4 or
Example 6.10 Y
1
, Y
2
, . . . fair coin tosses, i.e.
P(Y
i
= 0) =
1
2
= P(Y
i
= 1), P(Y
1
= 0, . . . , Y
n
= 0) =
1
2
n
.
T() = minn 1 [ Y
n
() = 1 is the waiting time for the rst 1.
Let c > 2 be constant, X
n
:= c
n
I
T>n
.
Then X
n
n
0 P-a.s. (since T < P-a.s.).
But E([X
n
[) = E(X
n
) = c
n
2
n
n
since c > 2.
A sequence (X
n
) is uniformly integrable if it is bounded in L
p
for some p > 1 i.e.
sup
n
E([X
n
[
p
) < or if sup
n
E([X
n
[ log
+
[X
n
[) < .
More generally, we have the following theorem:
Theorem 6.11 Assume that g : 1
+
1
+
is increasing with lim
x
g(x)
x
= and that
sup
n
E(g([X
n
[)) < .
Then (X
n
) is uniformly integrable.
Proof: For each > 0, there is c() so that
[x[ c()
g([x[)
[x[

1

(6.4)
Hence, sup
n
_
[X
n
[c()
[X
n
[ dP
(6.4)
sup
n
_
g([X
n
[) dP
. .
<
for = ().
Then, take g(x) = x
p
(p > 0) or g(x) = x log
+
x to get the statements above.
31
7 Independence
(, /, P) probability space.
Denition 7.1 A collection A
i
(i I) of sets in / is (stochastically) independent if
J I, J nite (A
i
,= A
j
for i, j J) P
_

iJ
A
i
_
=

iJ
P(A
i
) (7.1)
Sets in / are also denoted events.
A collection (
i
(i I) with (
i
/(i I) is independent, if for each choice A
i
(
i
(i I),
the events A
i
(i I) are independent.
Remark 7.2 In particular, two events A and B are independent if P(AB) = P(A)P(B).
Note that there may be probability measures Q on (, /) with Q(A B) ,= Q(A)Q(B).
Hence, independence is not a property of A and B alone but involves P.
Theorem 7.3 Assume (
i
(i I) is an independent collection of systems of sets,
(
i
/ (i I) and (
i
is -stable (i I). Then:
(i) The -elds ((
i
) (i I) are independent, too.
(ii) For a partition J
k
(k K) of I (partitioning I in pairwise disjoint subsets) the
-elds
_

iJ
k
(
i
_
(k K) are independent, too.
Example = 0, 1
N
, / = product -eld, X
i
() = x
i
, where = (x
1
, x
2
, . . . ).
P probability measure on (, /) such that the events X
i
= 1 (i = 1, 2, . . . ) are inde-
pendent.
Due to Theorem 7.3 (i) the events X
i
= 0 (i = 1, 2, . . . ) are independent, too.
Due to Theorem 7.3 (ii) the events A
k
= X
kN+1
= x
1
, . . . , X
(k+1)N
= x
N

(in the period [kN + 1, (k + 1)N] the binary text (x


1
, . . . , x
n
) appears)
(k = 0, 1, 2, . . . ) are independent, too.
Proof of Theorem 7.3:
(i) Take J = i
1
, . . . , i
k
and A
i
j
((
i
j
).
Have to show: P(A
i
1
. . . A
i
k
) = P(A
i
1
) . . . P(A
i
k
). ()
Fix A
i
2
(
i
2
, . . . , A
i
k
(
i
k
and let T
i
1
= A ((
i
1
) [ () holds with A
i
1
= A.
Due to the hypothesis of the theorem, (
i
1
T
i
1
.
Further, T
i
1
is a -system: for instance, if A T
i
1
then
P(A
c
A
i
2
. . . A
i
k
) = P(A
i
2
. . . A
i
k
) P(A A
i
2
. . . A
i
k
) =
(1 P(A))
. .
=P(A
c
)
P(A
i
2
) . . . P(A
i
k
) A
c
T
i
1
Due to Lemma 1.12, T
i
1
= ((
i
1
).
Now, let T
i
2
= A ((
i
2
) [ () holds with A
i
1
((
i
1
), A
i
2
= A, A
i
3
, . . . , A
i
k

and iterate the argument.


32
(ii) The collections c
k
:=
_

iJ
A
i
[ J J
k
, J nite, A
i
(
i
_
(k K) are -stable and
independent.
For C
k
j
=

iJ
k
j
A
i
c
k
j
, we have
P(C
k
1
. . . C
k
n
) = P
_

iJ
k
1
A
i
. . .

iJ
k
n
A
i
_
=

iJ
k
1
P(A
i
) . . .

iJ
k
n
P(A
i
) =
P(C
k
1
) . . . P(C
k
n
)
Due to (i), the generated -elds (c
k
) =
_

iJ
k
(
i
_
(k K) are independent,
too.
Lemma 7.4 (Borel-Cantelli-Lemma)
(i) (A
k
)
k1
sequence of events. Then

k=1
P(A
k
) < P
_

n
_
kn
A
k
_
= 0.
(ii) (A
k
)
k1
sequence of independent events. Then

k=1
P(A
k
) = P
_

n
_
kn
A
k
_
= 1.
Note that

kn
A
k
is the event that for innitely many k, A
k
happens.
Remark 7.5 For an arbitrary sequence (A
n
)
nN
of measurable sets in some measure space
we dene
limsup
n
A
n
:=

nN

kn
A
k
and
liminf
n
A
n
:=

nN

kn
A
k
.
It is an easy exercise to show that we have liminf
n
A
n
limsup
n
A
n
and that
(limsup
n
A
n
)
c
= liminf
n
A
c
n
.
limsup
n
A
n
= : A
n
for innitely many n N.
liminf
n
A
n
= : A
n
for all but nitely many n N.
Proof of Lemma 7.4:
(i) Let B
n
=

kn
A
k
.
Then we have limsup
n
A
n
=

nN
B
n
and thus limsup
n
A
n
B
n
, n N.
P(limsup
n
A
n
) P(B
n
)

k=n
P(A
k
)
n
0, since

k=1
P(A
k
) <
P(limsup
n
A
n
) = 0.
(ii) It suces to show that we have P
_

kn
A
k
_
= 1, n N or equivalenty
P
_

kn
A
c
k
_
= 0, n N.
33
Due to Theorem 7.3 the events (A
c
n
)
nN
are also independent and hence we have
P
_

kn
A
c
k
_
Thm. 1.17 (f)
= lim
m
P
_
m

k=n
A
c
k
_
= lim
m
m

k=n
P(A
c
k
) =
lim
m
m

k=n
(1 P(A
k
))
1xe
x
liminf
m
e

m
k=1
P(A
k
)
= 0.
A very remarkable consequence of Theorem 7.3 is the following.
Theorem 7.6 (Kolomogorovs 0-1-law)
Let (/
i
)
iI
be a countable collection of independent -algebras. Further let
/

:=

JI
[J[<

_

i / J
/
i
_
denote the corresponding tail -algebra (terminale -Algebra). Then we have
A /

P(A) 0, 1.
Interpretation of /

1) Dynamical:
If we interpret I as a sequence 1, 2, 3, . . . of points in time and /
n
as the -algebra
of all events observable at time n N, then we have
/

=: /

n=1

_

kn
/
k
_
=

n=1
(/
n
, /
n+1
, . . . ).
/

can be interpreted as the -algebra of all events observable in the innitely distant
future.
Example Let (X
n
)
nN
be a sequence of random variables on (, /).
We dene /
n
:= (X
1
, X
2
, . . . , X
n
) then we have /

n=1
(X
n
, X
n+1
, . . . ).
The events
_
lim
n

n
k=1
X
k
c
n
t
_
,
_
limsup
n

n
k=1
X
k
c
n
t
_
for c
n
, t 1 with c
n
are
elements of /

.
Due to Kolmogorovs 0-1-law we have P(A) 0, 1, A /

if the random variables


X
1
, X
2
, . . . are independent.
2) Static:
We interpret I as the set of subsystems which act independently of each other and
/
i
as the -algbra of events which only depend on the ith subsystem.
Then /

is the collection of all macroscopic events which do not depend on nitely


many subsystems. Thus, if the subsystems are independent, we know that on this
macroscopic scale the whole system is deterministic.
Proof of Theorem 7.6:
Step 1 The collection of sets /
j
(j J),

i / J
/
i
are independent for every nite set J I.
Due to Theorem 7.3 we have that /
j
(j J),
_

iJ
/
i
_
are also independent
(note that /
i
is -stable). Since /


_

i / J
/
i
_
for all nite sets J I we have
that /
i
(i I), /

are independent.
34
Step 2 Again with the help of Theorem 7.3 we can conclude that
_

iI
/
i
_
and /

are
independent.
Step 3 Let A /

. Then A is also an element of


_

iI
/
i
_
, since

JI
[J[<

_

iJ
/
i
_
_
_

iI
/
i
_
.
Therefore, step 2 of this proof implies P(A) = P(A A) = P(A)P(A)
(A is independent of itself)
P(A) 0, 1.
Remark 7.7 Let (/
i
)
iI
be a countable collection of independent -algebras and let
A
i
/
i
(i I).
Then we have that

A :=

JI
[J[<

i / J
A
i
is an element of /

and thus P(

A) 0, 1,
and the Borel-Cantelli-Lemma can be used to decide whether P(

A) = 1 or P(

A) = 0.
Denition 7.8 A collection of random variables (X
i
)
iI
is independent if the -algebra
generated by the random variables X
i
are independent, i.e. if (X
i
) (i I) are indepen-
dent.
35
8 Kernels and Fubinis Theorem
Let (
1
, T
1
) and (
2
, T
2
) be measure spaces.
Denition 8.1 A transition kernel or stochastic kernel K(x
1
, dx
2
) from (
1
, T
1
) to
(
2
, T
2
) is a function K:
1
T
2
[0, 1], (x
1
, A
2
) K(x
1
, A
2
) such that
i) K(x
1
, ) is a probability measure on (
2
, T
2
), x
1

1
.
ii) K(, A
2
) is (T
1
, B
[0,1]
)-measurable A
2
T
2
.
Interpretations of kernels
1. Dynamic:
(
i
, T
i
) = state space of period/time i.
K(x
1
, ) = law of the state at time 2 if state at time 1 was x
1
.
2. Static:
(
i
, T
i
) = state spaces of system number i.
K(x
1
, ) = law of the state of system number 2 given that system number 1 is in
state x
1
.
Examples
1. K(x
1
, ) P
2
, P
2
probability measure on (
2
, T
2
) (no information)
2. K(x
1
, ) =
T(x
1
)
where T : (
1
, T
1
) (
2
, T
2
) (full information)
3. Markov chain

1
=
2
= S, S countable.
T
1
= T
2
= T(S).
K is given by weights k(x, y) (x, y S) and k(x, y)
x,yS
is a stochastic matrix.
k(x, y) 0, x, y S,

yS
k(x, y) = 1, x S
Example S = 0, 1,
_
1
1
_
(0 < < 1)
K(x, y) = I
x=y
+ (1 )I
x,=y
.
Let P
1
be a probability measure on (
1
, T
1
) and K a stochastic kernel from (
1
, T
1
) to
(
2
, T
2
).
Goal: construct a probability measure P on (, T) where =
1

2
, T = T
1
T
2
such
that K(x
1
, ) can be interpreted as the conditional distribution of the second component,
given the rst one.
36
A Discrete case

i
countable, T
i
= T(
i
) (i = 1, 2).
=
1

2
countable and the probability measure P on (, T) (T = T()) is given
by the weights p(x
1
, x
2
) = p
1
(x
1
)k(x
1
, x
2
), where p
1
(x
1
) = P
1
(x
1
).
For each function f : [0, ], we then have
_
f dP =

x
1
,x
2
f(x
1
, x
2
)p(x
1
, x
2
) =

x
1
_

x
2
f(x
1
, x
2
)k(x
1
, x
2
)
_
p
1
(x
1
) =
_ __
f(x
1
, x
2
)K(x
1
, dx
2
)
_
P
1
(dx
1
).
B General case
Let =
1

2
, T = T
1
T
2
.
Theorem 8.2 Assume P
1
is a probability measure on (
1
, T
1
) and K a stochastic kernel
from (
1
, T
1
) to (
2
, T
2
). Then there is a probability measure P on (, T) such that
_
f dP =
_ __
f(x
1
, x
2
)K(x
1
, dx
2
)
_
P
1
(dx
1
) (8.1)
f : (, T) (

1,

B), f 0.
In particular,
P(A) =
_
K(x
1
, A
x
1
)P
1
(dx
1
) (A T), (8.2)
where A
x
1
= x
2

2
[ (x
1
, x
2
) A and
P(A
1
A
2
) =
_
A
1
K(x
1
, A
2
)P
1
(dx
1
) (A
1
T
1
, A
2
T
2
). (8.3)
P is uniquely determined by (8.3).
Proof: The uniqueness of P follows from Theorem 1.18 since the sets of the form A
1
A
2
generate T and they are a -stable collection of sets.
We show that the right hand side of (8.1) or (8.2), respectively, are well-dened.
Step 1 For x
1

1
, take
x
1
(x
2
) := (x
1
, x
2
).
Since
1
x
1
(A
1
A
2
) =
_
_
_
x
1
/ A
1
A
2
x
1
A
1
,

x
1
: (
2
, T
2
) (, T).
Hence, for any function f : (, T) (

1,

B), the function f
x
1
= f
x
1
or
f
x
1
(x
2
) = f(x
1
, x
2
) is measurable: f
x
1
: (
2
, T
2
) (

1,

B).
In particular, A T A
x
1
T
2
, x
1

1
.
Step 2 For f : (, T) (

1,

B), f 0, the function x
1

_
f(x
1
, x
2
)K(x
1
, dx
2
) is well-
dened due to Step 1.
We show that this function is (T
1
,

B)-measurable. Follow the denition of the
integral:
Take rst f = I
A
, then f 0 with nitely many values, then f 0.
Let T be the collection of all A T, for which x
1
K(x
1
, A
x
1
) is (T
1
,

B)-
measurable.
37
T is a -system
T contains all sets of the form A = A
1
A
2
, since
K(x
1
, A
x
1
) = I
A
1
(x
1
)K(x
1
, A
2
), hence x
1
K(x
1
, A
x
1
) is (T
1
,

B)-measurable
as a product of two measurable functions.
T = T.
Step 3 Due to Step 2, the right hand side of (8.1) or (8.2), respectively, are well-dened.
(8.2) denes a probability measure on (, T):
(i) P() =
_
K(x
1
, )
. .
=1
P
1
(dx
1
) = 1
(ii) A
1
, A
2
, . . . T, A
i
A
j
= for i ,= j
_

i=1
A
i
_
x
1
=

i=1
(A
i
)
x
1
P
_

i=1
A
i
_
=
_
K
_
x
1
,

i=1
(A
i
)
x
1
_
. .

i=1
K(x
1
,(A
i
)
x
1
)
P(dx
1
)
mon.
=
conv.

i=1
_
K(x
1
, (A
i
)
x
1
)P(dx
1
)
=

i=1
P(A
i
).
P is a probability measure.
P satises (8.1) since for f = I
A
, (8.1) is (8.2) and for general f, proceed as in
the denition of the integral.
The probability measure P in (8.2) is noted P
1
K.
If X
i
() = x
i
( = (x
1
, x
2
)) is the projection of to
i
(i = 1, 2), then X
1
has the law
P
1
: P(X
1
A
1
) = P(A
1

2
) = P
1
(A
1
), A
1
T
1
.
The law of X
2
is given by
P(X
2
A
2
) = P(
1
A
2
) =
_
K(x
1
, A
2
)P
1
(dx
1
)
. .
=:P
2
(A
2
)
, A A
2
.
Denition 8.3 We say that P
1
and P
2
are the marginals of P (Randverteilungen).
The stochastic kernel K(, ) is also denoted conditional distribution of X
2
, given X
1
and
we write
P(X
2
A
2
[ X
1
= x
1
) = K(x
1
, A
2
) (x
1

1
, A
2
T
2
) (8.4)
or P(X
2
[ X
1
= x
1
) = K(x
1
, ) = conditional distribution of X
2
, given X
1
() = x
1
.
If
1
is countable, the left hand side of (8.4) can be dened in an elementary way, namely
P(X
2
A
2
[ X
1
= x
1
) =
P(X
2
A
2
, X
1
=x
1
)
P(X
1
=x
1
)
=
P(x
1
A
2
)
P
1
(x
1
)
(provided that P
1
(x
1
) > 0) and (8.4) can be proved:
P(x
1
A
2
)
(8.3)
= P
1
(x
1
)K(x
1
, A
2
) (8.4).
In general, this is not possible (for instance, P
1
(x
1
) can be = 0, x
1

1
!) and then
we can take (8.4) as the denition of P(X
2
A
2
[ X
1
= x
1
).
38
Example 8.4 (Classical case)
Assume K(x
1
, ) P
2
(no information).
Then we write P = P
1
P
2
and P is the product (Produktma) of P
1
and P
2
.
We then have
P
1
P
2
(A
1
A
2
) = P
1
(A
1
)P
2
(A
2
) = P
2
P
1
(A
2
A
1
) (8.5)
and (8.5) and (8.1) imply the classical
Fubini Theorem For f : (, T) (

1,

B), f 0,
_
f d(P
1
P
2
) =
_ __
f(x
1
, x
2
)P
2
(dx
2
)
_
P
1
(dx
1
) =
_ __
f(x
1
, x
2
)P
1
(dx
1
)
_
P
2
(dx
2
)
(8.6)
Remark In fact, (8.6) remains true for -nite measures P
1
and P
2
.
Example 8.5 (Uniform distribution on [0, 1]
2
)
Take (
1
, T
1
) = (
2
, T
2
) = ([0, 1], B
[0,1]
),
P
1
= [
[0,1]
= U[0, 1] = uniform distribution on [0, 1], K(x
1
, dx
2
) [
[0,1]
= U[0, 1].
Then, X
2
has the law P
2
= P
1
and X
1
, X
2
are independent.
Example 8.6 (
1
, T
1
), (
2
, T
2
), P
1
as in Example 8.5., K(x
1
, ) = x
1
, x
1
.
Then, X
2
= X
1
P-a.s. and P
2
= P
1
= U[0, 1].
Example 8.7 (
1
, T
1
), (
2
, T
2
), P
1
as in Example 8.5, K(x
1
, dx
2
) =
1
x
1
[
[0,x
1
]
.
K(x
1
, ) =
1
x
1
[
[0,x
1
]
= U[0, x
1
] (or K(x
1
, A) =
1
x
1
x
1
_
0
I
A
(u) du)
We compute P
2
:
P
2
(A
2
) = P( A
2
) =
1
_
0
K(x
1
, A
2
) dx
1
Take A
2
= [0, t]. Then
1
_
0
K(x
1
, A
2
) dx
1
=
t
_
0
1 dx
1
+
1
_
t
t
x
1
dx
1
= t log t.
X
2
has the cdf F
2
(t) = t t log t (0 t 1).
Remark X
2
has the density f
2
(x
2
) = log x
2
.
P is concentrated on the set (x
1
, x
2
) [ 0 x
2
x
1
1.
Example 8.8 (Uniform distribution on A = (x
1
, x
2
) [ 0 x
2
x
1
1)
(
1
, T
1
), (
2
, T
2
) as in Example 8.5, P
1
is the probability measure on (
1
, T
1
) with
density f
1
(x
1
) = 2x
1
and K(x
1
, ) = U[0, x
1
] as in Example 8.7.
Then compute P
2
: P
2
(A
2
) = P(
1
A
2
) =
1
_
0
K(x
1
, A
2
)P
1
(dx
1
).
Take A
2
= [0, t].
P
2
(A
2
) = 2
1
_
0
K(x
1
, [0, t])x
1
dx
1
= 2
t
_
0
x
1
dx
1
+ 2
1
_
t
t
x
1
x
1
dx
1
= . . . = 2(t
1
2
t
2
)
the random variable X
2
has the density f
2
(x
2
) = 2(1 x
2
).
39
Claim P is the uniform distribution on A, i.e. P(B) = 2
2
(B A), B [0, 1]
2
, B
B
[0,1]
2.
Proof: Suces to consider B = B
1
B
2
, B
1
= [0, b
1
], B
2
= [0, b
2
].
Assume b
1
> b
2
. Then
P(B) =
_
B
1
K(x
1
, B
2
)P
1
(dx
1
) = 2
b
2
_
0
1P
1
(dx
1
) + 2
b
1
_
b
2
b
2
x
1
P
1
(dx
1
) = 2
b
2
_
0
x
1
dx
1
+ 2
b
2
_
b
1
b
2
dx
1
=
2(
1
2
b
2
2
) + (b
1
b
2
)b
2
= 2
2
(B A).
Example 8.9 (2-dimensional Gaussian law (2-dim. Normalverteilung))
(
1
, T
1
) = (
2
, T
2
) = (1, B), P
1
= N(0, 1) i.e. P
1
(A
1
) =
_
A
1
1

2
e

x
2
1
2
dx
1
(A
1
B
1
)
Let 0 1 and K(x
1
, ) = N(x
1
, 1
2
) i.e. K(x
1
, A
2
) =
_
A
2
1

2(1
2
)
e

(x
2
x
1
)
2
2(1
2
)
Claim P
2
= N(0, 1) (no matter what is!)
Proof: P
2
(A
2
) =

K(x
1
, A
2
)P
1
(dx
1
). Take A
2
= (, t].
P
2
(A
2
) =

_
t
_

2(1
2
)
e

(x
2
x
1
)
2
2(1
2
)
dx
2
_
1

2
e

x
2
1
2
dx
1
Fubini
=
t
_

_

_

2(1
2
)
e

(x
2
x
1
)
2
+(1
2
)x
2
1
2(1
2
)
1

2
dx
1
_
dx
2
=
t
_

_
_

1
_
2(1
2
)
e

(x
1
x
2
)
2
2(1
2
)
dx
1
_
_
. .
=1
1

2
e

x
2
2
(1
2
)
2(1
2
)
dx
2
=
t
_

2
e

x
2
2
2
dx
2
X
2
has the density f
2
(x
2
) =
1

2
e

x
2
2
2
i.e. X
2
has the law N(0, 1).
Note that X
1
, X
2
are independent = 0.
is the correlation (Korrelationskoezient).
(x
1
, x
2
) =
Cov(X
1
,X
2
)

V ar(X
1
)V ar(X
2
)
, since V ar(X
1
) = V ar(X
2
) = 1 and
Cov(X
1
, X
2
)
(exercise!)
= E(X
1
X
2
)
. .
=
E(X
1
)
. .
=0
E(X
2
)
. .
=0
= .
P is the 2-dim. Gaussian law with expectation (0, 0) and covariance matrix
_
1
1
_
.
Question: Does a probability measure P on (
1

2
, T
1
T
2
) always have the form
P
1
K with a probability measure P
1
on (
1
, T
1
) and a stochastic kernel K from (
1
, T
1
)
to (
2
, T
2
)?
Answer: Under (mild) regularity assumption on (
1
, T
1
), (
2
, T
2
): Yes!
In general: No!
See literature.
40
9 Absolute continuity, Radon-Nikodym derivatives
Recall that a real-valued random variable X on (, /, P) has the density f if
P(X [a, b]) = P
X
([a, b]) =
b
_
a
f(x) dx =
_
[a,b]
f(x) dx.
We will generalize this notion of density.
Let (, /, ) be a measure space. Recall that
c

= f :

1 [ f 0, f : (, /) (

1,

B).
Theorem 9.1 Let f c

and dene
(A) :=
_
A
f d (A /). (9.1)
Then, is a measure with density f with respect to .
Proof: We have () = 0, (A) 0, A /. Remains to show that is -additive. Take
A =

n=1
A
n
, A
n
/, n. Then (A) =
_
fI

n=1
A
n
d =
_
lim
m
_
f
m

n=1
I
A
n
_
d
mon.
=
conv.
lim
m
m

n=1
fI
A
n
d =

n=1
_
fI
A
n
d =

n=1
(A
n
).
If (, /, P) = (1
k
, B
k
,
k
) and f satises (9.1) we say that f is the Lebesgue-density of .
If = P
X
for some random variable X ( is the law of X) we say that f is the Lebesgue-
density or density of X.
Theorem 9.2 Assume f c

and dene by (A) =


_
A
f d (A /). Then
a)
h c

,
_
hd =
_
hf d (9.2)
b) For h: (, /) (

1,

B), we have
h is integrable with respect to hf is integrable with respect to
and in this case (9.2) holds.
Proof: see exercise 20.
Example 9.3 = Z, / = T(Z) and =

counting measure on (, /).


Let f : [0, ] be such that

Z
f() = 1 and dene P by P(A) =
_
A
f d (A /).
Then, P is a discrete probability measure with P() = f(), Z.
Let X: (, /) (1, B) be a random variable. If E(X) is dened,
E(X) =
_
X dP
Thm. 9.2
=
_
Xf d =

Z
X()f().
41
Theorem 9.4 (Uniqueness of densities)
Assume $f,g\in\mathcal E^*$. Then:
a) $f = g$ $\mu$-a.e. $\Rightarrow \int_A f\,d\mu = \int_A g\,d\mu$, $\forall A\in\mathcal A$.
b) $f$ integrable with respect to $\mu$ and $\int_A f\,d\mu = \int_A g\,d\mu$, $\forall A\in\mathcal A$ $\Rightarrow f = g$ $\mu$-a.e.
Proof:
a) $f = g$ $\mu$-a.e. $\Rightarrow fI_A = gI_A$ $\mu$-a.e., $\forall A\in\mathcal A$, and the claim follows with Theorem 4.2.
b) $\int_\Omega f\,d\mu = \int_\Omega g\,d\mu \Rightarrow g$ is integrable as well.
Define $N := \{f > g\}$, $h := fI_N - gI_N \ge 0$. Then
$\int fI_N\,d\mu = \int gI_N\,d\mu \Rightarrow \int h\,d\mu = 0 \Rightarrow h = 0$ $\mu$-a.e. $\Rightarrow \mu(N) = 0.$
In the same way, $\mu(\{g > f\}) = 0 \Rightarrow f = g$ $\mu$-a.e. $\square$

Example 9.5 b) does not hold without the assumption that $f$ is integrable.
Take $\Omega$ not countable, $\mathcal A = \{A\mid A \text{ countable or } A^c \text{ countable}\}$,
$\mu(A) = \begin{cases} 0 & A \text{ is countable} \\ \infty & A^c \text{ is countable.}\end{cases}$
$\mu$ is a measure on $(\Omega,\mathcal A)$.
Take $f(\omega) = 1$, $g(\omega) = 2$, $\forall\omega\in\Omega$.
Then $\int_A f\,d\mu = \mu(A) = 2\mu(A) = \int_A g\,d\mu$, $\forall A\in\mathcal A$, but $\mu(\{f\ne g\}) = \mu(\Omega) = \infty$.

Question: For two measures $\mu$, $\nu$ on $(\Omega,\mathcal A)$, when is there $f\colon(\Omega,\mathcal A)\to([0,\infty],\bar{\mathcal B})$ such that $\nu(A) = \int_A f\,d\mu$, $\forall A\in\mathcal A$?

Definition 9.6 Assume $\mu$, $\nu$ are measures on $(\Omega,\mathcal A)$. We say that $\nu$ is absolutely continuous with respect to $\mu$ if $\forall A\in\mathcal A$: $\mu(A) = 0 \Rightarrow \nu(A) = 0$.
We write $\nu\ll\mu$. In words: $\nu\ll\mu$ means that each $\mu$-nullset is also a $\nu$-nullset.

Remark 9.7 If $\nu$ has a density $f$ with respect to $\mu$ and $\mu(A) = 0$, then
$\nu(A) = \int\underbrace{fI_A}_{=0\ \mu\text{-a.e.}}\,d\mu = 0 \Rightarrow \nu\ll\mu.$

Theorem 9.8 (Radon-Nikodym Theorem)
Assume $\mu$ and $\nu$ are $\sigma$-finite measures on $(\Omega,\mathcal A)$.
Then the following two statements are equivalent:
a) $\nu$ has the density $f$ with respect to $\mu$.
b) $\nu\ll\mu$.
$f$ is also denoted Radon-Nikodym derivative and we write $f = \frac{d\nu}{d\mu}$.
Proof:
a) $\Rightarrow$ b): see Remark 9.7.
b) $\Rightarrow$ a) will need the following lemma.

Lemma 9.9 Let $\mu$ and $\nu$ be finite measures on $(\Omega,\mathcal A)$ such that $\mu(\Omega) < \nu(\Omega)$. Then there is $\Omega^*\in\mathcal A$ with $\mu(\Omega^*) < \nu(\Omega^*)$ and $\mu(A)\le\nu(A)$, $\forall A\in\mathcal A$ with $A\subseteq\Omega^*$.

Proof: Define the function $\varphi = \nu - \mu$.
For $A\in\mathcal A$, we have $\varphi(A)\le\nu(A)\le\nu(\Omega)$ $\Rightarrow \varphi\colon\mathcal A\to\mathbb R$ is bounded.
Define the sets $\Omega_n$, $A_n$ ($n\in\mathbb N$) as follows:
$A_1 = \emptyset$, $\Omega_1 = \Omega\setminus A_1 = \Omega$.
If $A_1,\ldots,A_n$, $\Omega_1,\ldots,\Omega_n$ are constructed, take
$\alpha_n := \inf\{\varphi(A)\mid A\in\mathcal A,\ A\subseteq\Omega_n\} \quad (\le\varphi(\emptyset) = 0).$
Case 1: $\alpha_n = 0$.
Then $A_{n+1} := \emptyset$, $\Omega_{n+1} := \Omega_n\setminus A_{n+1} = \Omega_n$ $\Rightarrow \alpha_k = 0$, $\forall k\ge n$.
Case 2: $\alpha_n < 0$.
Take $A_{n+1}\in\mathcal A$, $A_{n+1}\subseteq\Omega_n$ with $\varphi(A_{n+1}) < \frac{\alpha_n}{2}$ and take $\Omega_{n+1} := \Omega_n\setminus A_{n+1}$.
With this construction we have $A_1, A_2,\ldots\in\mathcal A$, $A_i\cap A_j = \emptyset$ for $i\ne j$ and
$\sum_{n=1}^\infty|\varphi(A_n)| \le \mu\Bigl(\bigcup_{n=1}^\infty A_n\Bigr) + \nu\Bigl(\bigcup_{n=1}^\infty A_n\Bigr) < \infty.$
$\Rightarrow \lim_n\varphi(A_n) = 0 \Rightarrow \lim_n\alpha_n = 0.$
Define $\Omega^* := \bigcap_{n=1}^\infty\Omega_n$ and show that the statements in Lemma 9.9 are satisfied with $\Omega^*$. We have $\Omega_1\supseteq\Omega_2\supseteq\ldots$,
$\varphi(\Omega^*) = \nu(\Omega^*) - \mu(\Omega^*) = \lim_n\nu(\Omega_n) - \lim_n\mu(\Omega_n) = \lim_n\varphi(\Omega_n).$
Since $\varphi(\Omega_{n+1}) = \varphi(\Omega_n) - \underbrace{\varphi(A_{n+1})}_{\le 0} \ge \varphi(\Omega_n) \ge \varphi(\Omega_{n-1}) \ge \ldots \ge \varphi(\Omega_1) = \varphi(\Omega),$
we have $\varphi(\Omega^*)\ge\varphi(\Omega) > 0 \Rightarrow \mu(\Omega^*) < \nu(\Omega^*)$.
If $A\in\mathcal A$, $A\subseteq\Omega^*$, we have $A\in\mathcal A$, $A\subseteq\Omega_n$, $\forall n$ $\Rightarrow \varphi(A)\ge\alpha_n$, $\forall n$.
But $\lim_n\alpha_n = 0 \Rightarrow \varphi(A)\ge 0$, i.e. $\mu(A)\le\nu(A)$. $\square$

Proof of Theorem 9.8: b) $\Rightarrow$ a)
Case 1: $\mu(\Omega) < \infty$ and $\nu(\Omega) < \infty$.
$G := \{g\in\mathcal E^*\mid \int_A g\,d\mu\le\nu(A),\ \forall A\in\mathcal A\}$ ($g\equiv 0\in G \Rightarrow G\ne\emptyset$).
If $g,h\in G$ then $\max(g,h)\in G$ since
$\int_A\max(g,h)\,d\mu \le \int_{A\cap\{g\ge h\}}g\,d\mu + \int_{A\cap\{g<h\}}h\,d\mu \le \nu(A\cap\{g\ge h\}) + \nu(A\cap\{g<h\}) = \nu(A).$
Define $\gamma := \sup\{\int g\,d\mu\mid g\in G\}$ ($\le\nu(\Omega) < \infty$).
Let $(\tilde g_n)$ be a sequence in $G$ such that $\lim_n\int\tilde g_n\,d\mu = \gamma$.
Let $g_n := \max\{\tilde g_1,\ldots,\tilde g_n\}$ $\Rightarrow g_n\in G$, $\forall n$.
We have $\int\tilde g_n\,d\mu\le\int g_n\,d\mu\le\gamma \Rightarrow \lim_n\int g_n\,d\mu = \gamma.$
With monotone convergence,
$\int_A\lim_n g_n\,d\mu = \lim_n\int_A g_n\,d\mu \le \nu(A).$
Hence $f := \lim_n g_n\in G$ and $\int f\,d\mu = \gamma$.
Claim: $f$ is a density of $\nu$ with respect to $\mu$.
Proof: Define $\beta$ on $(\Omega,\mathcal A)$ by $\beta(A) := \nu(A) - \int_A f\,d\mu\ge 0$, $A\in\mathcal A$.
$\beta$ is a finite measure on $(\Omega,\mathcal A)$ with ($\mu(A) = 0 \Rightarrow \nu(A) = \beta(A) = 0$).
Assume $\beta(\Omega) > 0$.
$\beta(\Omega) > 0 \Rightarrow q := \frac{\beta(\Omega)}{2\mu(\Omega)} > 0 \Rightarrow \beta(\Omega) = 2q\mu(\Omega) > q\mu(\Omega).$
Apply Lemma 9.9 with $\nu$ replaced by $\beta$ and $\mu$ replaced by $q\mu$ and conclude that there is $\Omega^*\in\mathcal A$ with
$\beta(\Omega^*) > q\mu(\Omega^*)$ and $\beta(A)\ge q\mu(A)$, $\forall A\in\mathcal A$, $A\subseteq\Omega^*$.
Take $f^* := f + qI_{\Omega^*}$. Then $\forall A\in\mathcal A$:
$\int_A f^*\,d\mu = \int_A f\,d\mu + q\mu(A\cap\Omega^*) \le \int_A f\,d\mu + \beta(A) = \nu(A) \Rightarrow f^*\in G.$
Since $0 < q\mu(\Omega^*) < \beta(\Omega^*)$ (note $\mu(\Omega^*) > 0$: otherwise $\nu\ll\mu$ would give $\beta(\Omega^*) = 0$),
$\int f^*\,d\mu = \underbrace{\int f\,d\mu}_{=\gamma} + \underbrace{q\mu(\Omega^*)}_{>0} > \gamma$
and this is a contradiction to the definition of $\gamma$.
$\Rightarrow \beta(\Omega) = 0 \Rightarrow \nu(A) = \int_A f\,d\mu$, $\forall A\in\mathcal A$.
Case 2: $\mu$, $\nu$ $\sigma$-finite.
There are sequences $(A_n)$, $(B_n)$, $A_n\in\mathcal A$, $\forall n$, $B_n\in\mathcal A$, $\forall n$, such that
$A_1\subseteq A_2\subseteq\ldots$, $A_n\uparrow\Omega$, $B_1\subseteq B_2\subseteq\ldots$, $B_n\uparrow\Omega$ and
$\mu(A_n) < \infty$, $\forall n$, $\nu(B_n) < \infty$, $\forall n$.
Define $C_n := A_n\cap B_n$, $n\in\mathbb N$.
Then $C_1\subseteq C_2\subseteq\ldots$, $C_n\uparrow\Omega$ and $\mu(C_n) < \infty$, $\nu(C_n) < \infty$, $\forall n$.
Take $E_n := C_n\setminus C_{n-1}$ (with $C_0 := \emptyset$). Then $\biguplus_{n=1}^\infty E_n = \Omega$, $\mu(E_n) < \infty$, $\nu(E_n) < \infty$, $\forall n$.
Define for each $n$ the finite measures $\mu_n$, $\nu_n$ by
$\mu_n(A) := \mu(A\cap E_n)$ and $\nu_n(A) := \nu(A\cap E_n)$.
Then $\frac{d\mu_n}{d\mu} = I_{E_n}$ $(*)$ (Proof of $(*)$: exercise).
We have measurable functions $f_n\colon\Omega\to[0,\infty]$ such that
$\nu(A) = \sum_{n=1}^\infty\nu_n(A) \overset{\text{Case 1}}{=} \sum_{n=1}^\infty\int_A f_n\,d\mu_n \overset{(*)}{=} \lim_{m\to\infty}\int_A\Bigl(\sum_{n=1}^m f_nI_{E_n}\Bigr)d\mu = \int_A f\,d\mu$
with $f := \sum_{n=1}^\infty f_nI_{E_n}$.
$\Rightarrow \nu$ has the density $f$ with respect to $\mu$. $\square$
Example 9.10 Let $F(x) = \begin{cases} 0 & x < 0 \\ 1-\frac12e^{-x} & x\ge 0\end{cases}$
and let $\nu$ be the corresponding measure on $(\mathbb R,\mathcal B)$ with $\nu((-\infty,x]) = F(x)$, $x\in\mathbb R$.
($\nu$ could be for instance the law of the amount of rain falling at some place per month.)
Let $\lambda$ be the Lebesgue measure on $(\mathbb R,\mathcal B)$.
$\nu$ is NOT absolutely continuous with respect to $\lambda$ since $\lambda(\{0\}) = 0$, $\nu(\{0\}) = \frac12 > 0$.
Define $\mu := \lambda + \delta_0$. Then there is a Radon-Nikodym derivative $f = \frac{d\nu}{d\mu}$.
Claim: $f(x) = \begin{cases} 0 & x < 0 \\ \frac12 & x = 0 \\ \frac12e^{-x} & x > 0\end{cases}$ does the job.
Proof:
$\nu((-\infty,x]) = \begin{cases} 0 = \int_{-\infty}^x f(t)\,dt & x < 0 \\[2pt] \frac12 = \int_{-\infty}^x f(t)\,dt + \frac12 & x = 0 \\[2pt] 1-\frac12e^{-x} = \int_{-\infty}^x f(t)\,dt + \frac12 & x > 0 \end{cases}\Biggr\} = \int_{-\infty}^x f\,d\mu,\ \forall x.$
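The claim is easy to check numerically: integrate $f$ against $\mu = \lambda + \delta_0$ (a Riemann sum for the Lebesgue part plus the atom of mass 1 at 0) and compare with $F$. A small sketch in pure Python; the grid-based integration is only an illustration:

```python
import math

def f(x):
    if x < 0: return 0.0
    if x == 0: return 0.5
    return 0.5 * math.exp(-x)

def F(x):
    return 0.0 if x < 0 else 1.0 - 0.5 * math.exp(-x)

def nu_interval(x, h=1e-4):
    # integral of f over (-inf, x] against mu = lambda + delta_0:
    # Riemann sum for the Lebesgue part (f vanishes on (-inf, 0)) ...
    lebesgue = sum(f(k * h) * h for k in range(1, int(x / h) + 1)) if x > 0 else 0.0
    atom = f(0) if x >= 0 else 0.0   # ... plus the atom at 0 with mass 1
    return lebesgue + atom

for x in (-1.0, 0.0, 0.5, 2.0):
    print(x, F(x), nu_interval(x))   # the two columns agree up to the grid error
```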
Example 9.11 Assume $\nu$ is a probability measure on $(\mathbb R,\mathcal B)$ with $\nu\ll\lambda$. Then the cdf $F$ of $\nu$ is absolutely continuous.
A function $F$ is absolutely continuous if there is a measurable function $f\ge 0$ such that
$F(x) = \int_{-\infty}^x f(t)\,dt = \int_{-\infty}^x f(t)\,\lambda(dt),\quad x\in\mathbb R.$
$f$ is determined uniquely, $\lambda$-a.e.

Remark 9.12 There are probability measures $\nu$ on $(\mathbb R,\mathcal B)$ such that $\nu(\{x\}) = 0$, $\forall x\in\mathbb R$ ($\iff$ the cdf $F$ is continuous) but $\nu\not\ll\lambda$ ($\iff$ $F$ is not absolutely continuous).
Example: Uniform distribution on the Cantor set $C$, see later.

Definition 9.13 Two measures $\mu$ and $\nu$ on $(\Omega,\mathcal A)$ are singular with respect to each other if there is a set $A\in\mathcal A$ such that $\mu(A) = 0$, $\nu(A^c) = 0$. We write $\mu\perp\nu$.

Examples
1. $\mu$ uniform distribution on $[0,1]$, $\nu$ uniform distribution on $[1,2]$.
Then $\mu\perp\nu$ (take $A = [0,1]$).
2. $\mu$ a continuous measure on $(\mathbb R,\mathcal B)$ (i.e. $\mu(\{x\}) = 0$, $\forall x\in\mathbb R$), $\nu = \sum_q a_q\delta_{\alpha_q}$ for some sequence $(\alpha_q)_q$, then $\mu\perp\nu$ (take $A = \{\alpha_1,\alpha_2,\ldots\}$).

Theorem 9.14 (Lebesgue's decomposition)
$\mu$, $\nu$ measures on $(\Omega,\mathcal A)$, $\nu$ finite measure. Then there are unique measures $\nu_a$ and $\nu_s$ on $(\Omega,\mathcal A)$ with
i) $\nu_a\ll\mu$,
ii) $\nu_s\perp\mu$,
iii) $\nu = \nu_a + \nu_s$.
We say that $\nu_a$ is the absolutely continuous part and $\nu_s$ the singular part of $\nu$ with respect to $\mu$.
Proof: Let $\mathcal A_\mu = \{A\in\mathcal A\mid\mu(A) = 0\}$ and $\alpha := \sup\{\nu(A)\mid A\in\mathcal A_\mu\}$ and let $(A_n)$ be an increasing sequence in $\mathcal A_\mu$ such that $\lim_n\nu(A_n) = \alpha$.
Define $N := \bigcup_{n=1}^\infty A_n$. Then $\mu(N) = 0$, $\nu(N) = \alpha$.
Define $\nu_a$ and $\nu_s$ by $\nu_a(A) = \nu(A\cap N^c)$, $\nu_s(A) = \nu(A\cap N)$.
Then $\nu_a$ and $\nu_s$ are measures on $(\Omega,\mathcal A)$ with $\nu = \nu_a + \nu_s$.
Since $\nu_s(N^c) = 0$ and $\mu(N) = 0$, we have $\nu_s\perp\mu$.
We show that $\nu_a\ll\mu$:
$A\in\mathcal A$, $\mu(A) = 0$ $\Rightarrow N\cup(A\cap N^c)\in\mathcal A_\mu$
$\Rightarrow \nu(N\cup(A\cap N^c)) = \underbrace{\nu(N)}_{=\alpha} + \underbrace{\nu(A\cap N^c)}_{=\nu_a(A)} \le \sup\{\nu(B)\mid B\in\mathcal A_\mu\} = \alpha \Rightarrow \nu_a(A) = 0.$
Remains to show: uniqueness of the decomposition.
Assume $\nu = \nu_a + \nu_s = \tilde\nu_a + \tilde\nu_s$ with $\nu_a$, $\nu_s$ as above, $\tilde\nu_a\ll\mu$, $\tilde\nu_s\perp\mu$.
Since $\tilde\nu_s\perp\mu$, there is a $\mu$-nullset $\tilde N$ with $\tilde\nu_s(\tilde N^c) = 0$, hence
$\tilde\nu_s(A) = \tilde\nu_s(A\cap\tilde N),\quad \forall A\in\mathcal A. \qquad (9.3)$
Then take $N_0 = N\cup\tilde N$; then $N_0\in\mathcal A_\mu$ and $\nu_a\ll\mu$, $\tilde\nu_a\ll\mu$
$\Rightarrow \nu_a(A\cap N_0) = \tilde\nu_a(A\cap N_0) = 0$, $\forall A\in\mathcal A$
$\Rightarrow \tilde\nu_s(A\cap N_0) = \nu(A\cap N_0)$, $\forall A\in\mathcal A$.
Together with (9.3) this implies
$\nu(A\cap N_0) = \tilde\nu_s(A\cap N_0) \overset{(9.3)}{=} \tilde\nu_s(A\cap N_0\cap\tilde N) = \tilde\nu_s(A\cap\tilde N) \overset{(9.3)}{=} \tilde\nu_s(A),\quad \forall A\in\mathcal A.$
In the same way, $\nu(A\cap N_0) = \nu_s(A)$
$\Rightarrow \tilde\nu_s = \nu_s \Rightarrow \tilde\nu_a = \nu_a. \qquad\square$
10 Construction of stochastic processes
We saw that for a probability measure $P_1$ on $(\Omega_1,\mathcal F_1)$ and a stochastic kernel $K$ from $(\Omega_1,\mathcal F_1)$ to $(\Omega_2,\mathcal F_2)$, there is a unique probability measure $P = P_1\otimes K$ on $(\Omega,\mathcal F)$ ($\Omega = \Omega_1\times\Omega_2$, $\mathcal F = \mathcal F_1\otimes\mathcal F_2$). On the other hand, under mild regularity assumptions, each probability measure $P$ on $(\Omega,\mathcal F)$ is of the form $P_1\otimes K$ where $P_1$ is the marginal $P\circ X_1^{-1}$ (where $\omega = (x_1,x_2)$, $X_1(\omega) = x_1$, $X_2(\omega) = x_2$) and $K$ is a suitable kernel from $(\Omega_1,\mathcal F_1)$ to $(\Omega_2,\mathcal F_2)$.
We now describe two particular cases:
1. If $\Omega_1$ is countable, we take
$K(x_1,A_2) = \begin{cases} P(X_2\in A_2\mid X_1 = x_1) & P(X_1 = x_1) > 0 \\ \mu_2(A_2) & \text{otherwise}\end{cases}$
where $\mu_2$ is an arbitrary probability measure on $(\Omega_2,\mathcal F_2)$.
Then,
$P(A_1\times A_2) = \sum_{x_1\in A_1} P(X_1 = x_1,\ X_2\in A_2) = \sum_{\substack{x_1\in A_1 \\ P(X_1=x_1)>0}} P(X_1 = x_1)\underbrace{P(X_2\in A_2\mid X_1 = x_1)}_{=K(x_1,A_2)} = (P_1\otimes K)(A_1\times A_2)$
$\Rightarrow P = P_1\otimes K$, see the uniqueness statement after (8.3).
2. $P$ is given by a Radon-Nikodym derivative $f(x_1,x_2)\ge 0$ with respect to a product measure $\mu = \mu_1\otimes\mu_2$ on $\Omega_1\times\Omega_2$, i.e. $P(A) = \int_A f(x_1,x_2)\,\mu_1(dx_1)\,\mu_2(dx_2)$.
$P_1$ has the Radon-Nikodym derivative
$f_1(x_1) = \int f(x_1,x_2)\,\mu_2(dx_2)$ with respect to $\mu_1$, i.e. $f_1 = \frac{dP_1}{d\mu_1}$.
In the same way, $f_2 = \frac{dP_2}{d\mu_2}$ where $f_2(x_2) = \int f(x_1,x_2)\,\mu_1(dx_1)$.
For $x_1\in\Omega_1$ with $f_1(x_1)\ne 0$, we define the conditional density
$f(x_2\mid x_1) = \frac{f(x_1,x_2)}{f_1(x_1)}\quad (x_2\in\Omega_2).$
Let
$K(x_1,dx_2) = \begin{cases} f(x_2\mid x_1)\,\mu_2(dx_2) & f_1(x_1)\ne 0 \\ \tilde\mu_2(dx_2) & \text{otherwise,}\end{cases}$
where $\tilde\mu_2$ is an arbitrary probability measure on $(\Omega_2,\mathcal F_2)$. Then
$(P_1\otimes K)(A_1\times A_2) = \int_{A_1} f_1(x_1)K(x_1,A_2)\,\mu_1(dx_1)$
$= \int_{A_1\cap\{f_1>0\}} f_1(x_1)\Bigl(\int_{A_2}\frac{f(x_1,x_2)}{f_1(x_1)}\,\mu_2(dx_2)\Bigr)\mu_1(dx_1) = \int_{A_1}\int_{A_2} f(x_1,x_2)\,\mu_2(dx_2)\,\mu_1(dx_1) = P(A_1\times A_2)$
$\Rightarrow P = P_1\otimes K$, see the uniqueness statement after (8.3).

Example: 2-dimensional centered Gaussian law
Let $\rho\in[0,1)$, $\Omega = \mathbb R^2$, $\mu_1 = \mu_2 = \lambda$,
$f(x_1,x_2) = \frac{1}{2\pi\sqrt{1-\rho^2}}\,e^{-\frac{x_1^2-2\rho x_1x_2+x_2^2}{2(1-\rho^2)}} = \underbrace{\frac{1}{\sqrt{2\pi}}\,e^{-\frac{x_1^2}{2}}}_{=f_1(x_1)}\cdot\underbrace{\frac{1}{\sqrt{2\pi(1-\rho^2)}}\,e^{-\frac{(x_2-\rho x_1)^2}{2(1-\rho^2)}}}_{=f(x_2\mid x_1)}.$
The kernel $K(x_1,\cdot)$ is therefore given as the Gaussian law $N(\rho x_1,\,1-\rho^2)$.
If $\rho = 0$, $P = P_1\otimes P_2$.
Modelling the evolution of a stochastic system
Let $(\Omega_i,\mathcal F_i)$ ($i = 0,1,\ldots$) be a sequence of measurable spaces. For $n\ge 0$ take $(\Omega^{(n)},\mathcal F^{(n)}) = \Bigl(\prod_{i=0}^n\Omega_i,\ \bigotimes_{i=0}^n\mathcal F_i\Bigr)$. Assume $P_0$ is a probability measure on $(\Omega_0,\mathcal F_0)$ and for each $n\ge 1$, $K_n$ is a stochastic kernel from $(\Omega^{(n-1)},\mathcal F^{(n-1)})$ to $(\Omega_n,\mathcal F_n)$.
Interpretation: $(\Omega_i,\mathcal F_i)$ state space for time $i$, $P_0$ = initial law, $K_n$ ($n = 1,2,\ldots$) evolution laws, $K_n((x_0,\ldots,x_{n-1}),A_n)$ = probability that the system is at time $n$ in $A_n$, given the history $(x_0,\ldots,x_{n-1})$.
By applying Theorem 8.2 $n$ times, we get for each $n\ge 1$ a probability measure $P^{(n)}$ on $(\Omega^{(n)},\mathcal F^{(n)})$ with $P^{(0)} = P_0$, $P^{(n)} = P^{(n-1)}\otimes K_n$, and we have $\forall f\ge 0$, $\mathcal F^{(n)}$-measurable:
$\int f\,dP^{(n)} = \int\cdots\int f(x_0,\ldots,x_n)\,K_n((x_0,\ldots,x_{n-1}),dx_n)\cdots K_1(x_0,dx_1)\,P_0(dx_0). \qquad (10.1)$
$(\Omega^{(n)},\mathcal F^{(n)},P^{(n)})$ models the evolution of the system in the time $0,1,\ldots,n$.
Goal: Model for infinite time horizon.
Let $\Omega = \prod_{j=0}^\infty\Omega_j = \{\omega = (x_0,x_1,\ldots)\mid x_i\in\Omega_i\}$ be the set of all possible trajectories (Menge aller möglichen Pfade).
$X_n(\omega) = x_n$ is the state at time $n$.
$\mathcal A_n = \sigma(X_0,\ldots,X_n)$ is the $\sigma$-field containing all events observable until time $n$.
$\mathcal A = \sigma(X_0,X_1,\ldots) = \sigma\Bigl(\bigcup_{n=0}^\infty\mathcal A_n\Bigr).$
Wanted: A probability measure $P$ on $(\Omega,\mathcal A)$ such that the restriction to $(\Omega^{(n)},\mathcal F^{(n)})$ is $P^{(n)}$. $(\Omega,\mathcal A,P)$ is then a model for the evolution of the system for infinite time horizon.
More precisely, $A\in\mathcal A_n$ is of the form $A = A^{(n)}\times\Omega_{n+1}\times\Omega_{n+2}\times\ldots$ with $A^{(n)}\in\mathcal F^{(n)}$ and we want a probability measure $P$ on $(\Omega,\mathcal A)$ such that
$P(A^{(n)}\times\Omega_{n+1}\times\Omega_{n+2}\times\ldots) = P^{(n)}(A^{(n)}),\quad \forall A^{(n)}\in\mathcal F^{(n)},\ \forall n\ge 0. \qquad (10.2)$
Theorem 10.1 (Ionescu-Tulcea)
Given are $P_0$, a probability measure on $(\Omega_0,\mathcal F_0)$ and, for each $n$, a stochastic kernel $K_n$ from $(\Omega^{(n-1)},\mathcal F^{(n-1)})$ to $(\Omega_n,\mathcal F_n)$.
Then there is a unique probability measure $P$ on $(\Omega,\mathcal A)$ with (10.2) or (10.1), respectively.
Proof: (10.2) defines $P$ on $\bigcup_{n=0}^\infty\mathcal A_n$ in a consistent way:
if $A\in\mathcal A_{n-1}\subseteq\mathcal A_n$, then $A = A^{(n)}\times\Omega_{n+1}\times\Omega_{n+2}\times\ldots$ where $A^{(n)} = A^{(n-1)}\times\Omega_n$ with $A^{(n-1)}\in\mathcal F^{(n-1)}$, and
$P^{(n)}(A^{(n)}) = \int_{A^{(n-1)}}\underbrace{K_n((x_0,\ldots,x_{n-1}),\Omega_n)}_{=1}\,P^{(n-1)}(d(x_0,\ldots,x_{n-1})) = P^{(n-1)}(A^{(n-1)}).$
Question: Can we extend $P$ from the algebra $\bigcup_{n=0}^\infty\mathcal A_n$ to the $\sigma$-field $\mathcal A = \sigma\bigl(\bigcup_{n=0}^\infty\mathcal A_n\bigr)$?
Uniqueness of the extension follows with Theorem 1.18 since $\bigcup_{n=0}^\infty\mathcal A_n$ is $\cap$-stable and $\mathcal A = \sigma\bigl(\bigcup_{n=0}^\infty\mathcal A_n\bigr)$. $P$ is additive on $\bigcup_{n=0}^\infty\mathcal A_n$.
To apply Theorem 1.18 we have to show that $P$ is $\sigma$-additive on $\bigcup_{n=0}^\infty\mathcal A_n$.
Due to the Remark after Theorem 1.17, it suffices to show that $A_n\in\bigcup_{k=0}^\infty\mathcal A_k$, $A_n\downarrow\emptyset$ $\Rightarrow \lim_n P(A_n) = 0$.
Without loss of generality $A_n\in\mathcal A_n$ ($n = 1,2,\ldots$), hence $A_n = A^{(n)}\times\Omega_{n+1}\times\Omega_{n+2}\times\ldots$ with $A^{(n+1)}\subseteq A^{(n)}\times\Omega_{n+1}$ and $A^{(n)}\in\mathcal F^{(n)}$ ($n = 1,2,\ldots$).
We assume that $\inf_n P(A_n) > 0$, and this will lead to a contradiction.
Now, since
$P(A_n) = \int\underbrace{\Bigl(\int\cdots\int I_{A^{(n)}}(x_0,\ldots,x_n)\,K_n((x_0,\ldots,x_{n-1}),dx_n)\cdots K_1(x_0,dx_1)\Bigr)}_{=:f_{0,n}(x_0)}\,P_0(dx_0) = \int f_{0,n}(x_0)\,P_0(dx_0),$
there is some $\bar x_0\in\Omega_0$ with $\inf_n f_{0,n}(\bar x_0) > 0$.
In the same way, with $K_1(\bar x_0,dx_1)$ instead of $P_0$, there has to be $\bar x_1\in\Omega_1$ with $\inf_n f_{1,n}(\bar x_1) > 0$, where
$f_{1,n}(x_1) := \int\cdots\int I_{A^{(n)}}(\bar x_0,x_1,x_2,\ldots,x_n)\,K_n((\bar x_0,x_1,\ldots,x_{n-1}),dx_n)\cdots K_2((\bar x_0,x_1),dx_2)$
and $\inf_n\int f_{1,n}(x_1)\,K_1(\bar x_0,dx_1) = \inf_n f_{0,n}(\bar x_0) > 0$.
Iterating, for each $k\ge 0$ there is $\bar x_k\in\Omega_k$ such that
$\inf_n\int\cdots\int I_{A^{(n)}}(\bar x_0,\ldots,\bar x_k,x_{k+1},\ldots,x_n)\,K_n((\bar x_0,\ldots,\bar x_k,x_{k+1},\ldots,x_{n-1}),dx_n)\cdots K_{k+1}((\bar x_0,\ldots,\bar x_k),dx_{k+1}) > 0.$
In particular, for $n = k+1$: $I_{A^{(k+1)}}(\bar x_0,\ldots,\bar x_k,\cdot)\not\equiv 0$
$\overset{A^{(k+1)}\subseteq A^{(k)}\times\Omega_{k+1}}{\Longrightarrow} (\bar x_0,\ldots,\bar x_k)\in A^{(k)}.$
But now $\bar\omega = (\bar x_0,\bar x_1,\ldots)\in A_k$, $\forall k$, and this contradicts $\bigcap_n A_n = \emptyset$. $\square$
Definition 10.2 The sequence $(X_n)_{n=0,1,\ldots}$ on the probability space $(\Omega,\mathcal A,P)$ is the stochastic process with initial law $P_0$ and evolution laws $K_n$ ($n = 1,2,\ldots$). We write
$P(X_n\in A_n\mid X_0 = x_0,\ldots,X_{n-1} = x_{n-1}) = K_n((x_0,\ldots,x_{n-1}),A_n)\quad (x_i\in\Omega_i,\ A_n\in\mathcal F_n).$
If $K_n((x_0,\ldots,x_{n-1}),\cdot) = \tilde K_n(x_{n-1},\cdot)$, $\forall n$, for a kernel $\tilde K_n$ from $(\Omega_{n-1},\mathcal F_{n-1})$ to $(\Omega_n,\mathcal F_n)$, we say that $(X_n)_{n=0,1,\ldots}$ is a Markov process.
If in addition $(\Omega_n,\mathcal F_n) = (S,\mathcal S)$ and $\tilde K_n(\cdot,\cdot) = K(\cdot,\cdot)$, $\forall n$ (for some measurable space $(S,\mathcal S)$ and some kernel $K$ from $(S,\mathcal S)$ to $(S,\mathcal S)$) we say that $(X_n)_{n=0,1,\ldots}$ is a time-homogeneous Markov process with state space $(S,\mathcal S)$ and kernel $K$.

Example 10.3 $S = \mathbb R$, $\mathcal S = \mathcal B$. Take $\sigma > 0$. The kernel $K$ from $(S,\mathcal S)$ to $(S,\mathcal S)$ is given by $K(x,\cdot) = N(0,\sigma x^2)$ and we take $P_0 = \delta_{x_0}$ for some $x_0\ne 0$.
$(X_n)_{n=0,1,\ldots}$ is a system that approaches 0 with a variance which depends on the present state.
Stability question:
For which values of $\sigma$ do we have $X_n\to 0$ $P$-a.s.?
We have
$E(X_n^2) \overset{(10.2)}{=} \int x_n^2\,P^{(n)}(dx_0,\ldots,dx_n) = \int\underbrace{\int x_n^2\,K(x_{n-1},dx_n)}_{=\sigma x_{n-1}^2}\,P^{(n-1)}(dx_0,\ldots,dx_{n-1})$
$= \sigma E(X_{n-1}^2) \Rightarrow E(X_n^2) = \sigma^n x_0^2\quad (n = 0,1,\ldots).$
For $\sigma < 1$, we conclude that $E\Bigl(\sum_{n=0}^\infty X_n^2\Bigr) \overset{\text{mon. conv.}}{=} \sum_{n=0}^\infty E(X_n^2) < \infty \Rightarrow \sum_{n=0}^\infty X_n^2 < \infty$ $P$-a.s.,
and in particular,
$P(\lim_n X_n = 0) = 1. \qquad (10.3)$
For $\sigma < 1$, we therefore have stable behaviour (in the sense of (10.3)).
This continues to be true for $\sigma < \frac{\pi}{2}$: since $\int|y|\,N(0,\tilde\sigma^2)(dy) = \tilde\sigma\sqrt{\frac{2}{\pi}}$, we have
$E(|X_n|) = \int\underbrace{\int|x_n|\,K(x_{n-1},dx_n)}_{=\sqrt{\sigma}\,|x_{n-1}|\sqrt{2/\pi}}\,P^{(n-1)}(dx_0,\ldots,dx_{n-1}).$
Hence $E(|X_n|) = \sqrt{\frac{2\sigma}{\pi}}\,E(|X_{n-1}|)$ and (10.3) follows with the same argument as above (note $\sqrt{2\sigma/\pi} < 1 \iff \sigma < \frac{\pi}{2}$).
For $1 < \sigma < \frac{\pi}{2}$, we hence have $E(X_n^2)\to\infty$, but $X_n\to 0$ $P$-a.s.
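A quick simulation illustrates this dichotomy. The recursion $X_{n+1} = \sqrt{\sigma}\,|X_n|\,Y_{n+1}$ with $Y_i$ i.i.d. $N(0,1)$ realizes the kernel $K(x,\cdot) = N(0,\sigma x^2)$; the $\sigma$ values below are chosen arbitrarily for illustration:

```python
import numpy as np

def run_chain(sigma, n=2000, x0=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x = x0
    for y in rng.standard_normal(n):
        x = np.sqrt(sigma) * abs(x) * y   # one step of K(x,.) = N(0, sigma*x^2)
    return x

for sigma in (0.5, 1.3, 4.0):
    print(sigma, [f"{run_chain(sigma, seed=s):.3g}" for s in range(3)])
# sigma = 0.5 and sigma = 1.3 (both < pi/2): X_n -> 0 P-a.s., the printed values vanish;
# sigma = 4.0: |X_n| blows up (cf. the continuation of this example in Section 11)
```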
Definition 10.4 If $K_n((x_0,\ldots,x_{n-1}),\cdot) \equiv P_n$ ($n = 1,2,\ldots$) then the probability measure $P$, given by Theorem 10.1, is the (infinite) product measure (Produktmaß) with marginals $P_n$ and we write $P = \bigotimes_{n=0}^\infty P_n$.
$P_n$ is the $n$-th marginal or the law of $X_n$: $P_n = P\circ X_n^{-1}$.

Theorem 10.5 $P$ is the product of its marginals $P\circ X_n^{-1}$ if and only if $X_0,X_1,\ldots$ are independent with respect to $P$.
Proof: Let $\tilde P = \bigotimes_{n=0}^\infty P_n$. Then: $X_0,X_1,\ldots$ are independent with respect to $P$
$\iff P(X_0\in A_0,\ldots,X_k\in A_k) = \prod_{i=0}^k P(X_i\in A_i),\quad \forall k,\ \forall A_0,\ldots,A_k\in\mathcal F_0,\ldots,\mathcal F_k.$
But $\prod_{i=0}^k P(X_i\in A_i) = \tilde P(X_0\in A_0,\ldots,X_k\in A_k)$. Hence $X_0,X_1,\ldots$ independent with respect to $P \iff P = \tilde P$. $\square$

Example 10.6 (Independent coin tosses with success parameter $p$)
$S = \{0,1\}$, $\mathcal S = \mathcal P(S)$. Take $p\in[0,1]$.
$\mu_p$ probability measure on $(S,\mathcal S)$ with $\mu_p(\{1\}) = p = 1-\mu_p(\{0\})$.
$P_p := \bigotimes_{n=0}^\infty\mu_p$ is a probability measure on $(\Omega,\mathcal A)$, $\Omega = S^{\mathbb N}$, $\mathcal A = \bigotimes_{i=0}^\infty\mathcal S_i$, $\mathcal S_i = \mathcal S$, $\forall i$, $\omega = (x_0,x_1,\ldots)$, $X_i(\omega) = x_i$.
Then $X_0,X_1,\ldots$ are independent under $P_p$ with law $\mu_p$. (We say that $X_0,X_1,\ldots$ are i.i.d. (independent and identically distributed).)
$P_p(X_0 = x_0,\ldots,X_n = x_n) = p^{\sum_{i=0}^n x_i}(1-p)^{(n+1)-\sum_{i=0}^n x_i}.$
Hence, for $0 < p < 1$, $P_p(\{\omega\}) = 0$, $\forall\omega\in\Omega$.
See later: for $S_n = \sum_{i=0}^n X_i$, $\frac{S_n}{n}\to p$ $P_p$-a.s.
This implies that for $p\ne q$: $P_p\perp P_q$ (take $A = \{\omega\colon\lim_n\frac1n\sum_{i=1}^n x_i = q\}$, then $P_p(A) = 0$, $P_q(A^c) = 0$). Let $\mathcal A_n = \sigma(X_0,\ldots,X_n)$.
We write $P_p|_{\mathcal A_n}$ for the restriction of $P_p$ to $(\Omega,\mathcal A_n)$.
Then, $P_p|_{\mathcal A_n}\ll P_q|_{\mathcal A_n}$, $\forall p,q\in(0,1)$.
Example 10.7 $0\le\alpha,\beta,\gamma\le 1$. Markov chain with state space $S = \{0,1\}$, initial law $\mu = (\alpha,1-\alpha)$, i.e. $\mu(\{0\}) = \alpha$, $\mu(\{1\}) = 1-\alpha$, and the stochastic kernel
$K = \begin{pmatrix} \beta & 1-\beta \\ \gamma & 1-\gamma \end{pmatrix}.$
Then $P(X_0 = 0) = \alpha = 1-P(X_0 = 1)$,
$P(X_{n+1} = 0\mid X_n = 0) = \beta = 1-P(X_{n+1} = 1\mid X_n = 0)$,
$P(X_{n+1} = 0\mid X_n = 1) = \gamma = 1-P(X_{n+1} = 1\mid X_n = 1)$.
Due to Theorem 10.1, this determines uniquely a probability measure $P$ on $(\Omega,\mathcal A)$ where $\Omega = S^{\mathbb N}$, $\mathcal A = \bigotimes_{i=0}^\infty\mathcal S_i$, $\mathcal S_i = \mathcal P(S)$, $\forall i$, as above.
Example 10.6 is a particular case with $\alpha = \beta = \gamma = 1-p$.

We finally consider a process which is not a Markov process.

Example 10.8 (Polya's urn)
In an urn we have a white and a black ball. We draw a ball at random and replace it with two balls of the colour which was drawn.
$\Omega = \{0,1\}^{\mathbb N}$, $\omega = (x_1,x_2,\ldots)$, $X_i(\omega) = \begin{cases} 1 & i\text{-th ball drawn is white} \\ 0 & i\text{-th ball drawn is black,}\end{cases}$
$P(X_1 = 1) = \frac12 = P(X_1 = 0)$,
$P(X_2 = 1\mid X_1 = x_1) = \begin{cases} \frac23 & x_1 = 1 \\ \frac13 & x_1 = 0.\end{cases}$
Let $S_n = \sum_{i=1}^n X_i$. After $n$ balls drawn, $n+2$ balls are in the urn,
$P(X_{n+1} = 1\mid X_1 = x_1,\ldots,X_n = x_n) = \frac{\sum_{i=1}^n x_i + 1}{n+2} = K_n((x_1,\ldots,x_n),\{1\}).$
With Theorem 10.1, this uniquely determines a probability measure on $(\Omega,\mathcal A)$.
Question: Long-time behaviour of $\frac{S_n+1}{n+2}$ = proportion of white balls?
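The question invites a simulation. In the sketch below each run settles near some value, but the value itself differs from run to run (the limit is in fact random; this is not proved here):

```python
import numpy as np

def polya_proportion(n_draws=10_000, seed=0):
    rng = np.random.default_rng(seed)
    white = black = 1
    for _ in range(n_draws):
        if rng.random() < white / (white + black):  # P(next draw white) = (S_n + 1)/(n + 2)
            white += 1
        else:
            black += 1
    return white / (white + black)

print([round(polya_proportion(seed=s), 3) for s in range(6)])
# six runs, six different near-limits: the proportion converges, but to a random limit
```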
11 The law of large numbers
$(\Omega,\mathcal A,P)$ probability space.

Lemma 11.1 (Jensen's inequality)
$X\in L^1$, $g\colon\mathbb R\to\mathbb R$ convex function. Then $g(X)$ is semi-integrable and
$E(g(X)) \ge g(E(X)) \qquad (11.1)$
If $g$ is strictly convex and $X$ is not $P$-a.s. constant, we have ">" in (11.1).
Proof: $g$ is convex (strictly convex) if and only if there is for each $x_0\in\mathbb R$ a linear function $l(x) = ax+b$ such that $l(x_0) = g(x_0)$ and $l(x)\le g(x)$, $\forall x\ne x_0$ ($l(x) < g(x)$, $\forall x\ne x_0$).
Take $x_0 = E(X)$, then $E(g(X))\ge E(l(X)) \overset{l\text{ linear}}{=} \underbrace{l(E(X))}_{=g(E(X))}.$
If we have "=" in the above argument then $l(X) = g(X)$ $P$-a.s.
(since for $Y = g(X)-l(X)$, $Y\ge 0$ and $E(Y) = 0 \Rightarrow Y = 0$ $P$-a.s.).
If $g$ is strictly convex, this implies $X = x_0 = E(X)$ $P$-a.s. $\square$

Theorem 11.2 (Strong law of large numbers)
$X_1,X_2,\ldots$ i.i.d. and $X_1\in L^1$. $S_n := \sum_{i=1}^n X_i$. Then $\frac{S_n}{n}\to E(X_1)$ $P$-a.s.

Remark Due to Theorem 6.2 (ii), Theorem 11.2 implies the weak law of large numbers, which says that $\frac{S_n}{n}\to E(X_1)$ in probability.
Proof of Theorem 11.2: Under the assumption $E(X_1^4) < \infty$:
1) $E(X_1^4) \overset{\text{Jensen}}{\ge} E(|X_1|)^4 \Rightarrow X_1\in L^1$.
2) Without loss of generality $E(X_1) = 0$ (otherwise, consider $\tilde X_i = X_i - E(X_i)$).
Take $\varepsilon > 0$.
$P\bigl(\bigl|\tfrac{S_n}{n}\bigr|\ge\varepsilon\bigr) \overset{\substack{\text{Markov inequ.}\\ f(x)=x^4}}{\le} \frac{1}{\varepsilon^4}\,E\Bigl(\Bigl(\frac{S_n}{n}\Bigr)^4\Bigr)$
$= \frac{1}{\varepsilon^4n^4}\,E\bigl((X_1+\ldots+X_n)(X_1+\ldots+X_n)(X_1+\ldots+X_n)(X_1+\ldots+X_n)\bigr)$
$= \frac{1}{\varepsilon^4n^4}\bigl(nE(X_1^4) + 4n(n-1)E(X_1^3)E(X_2) + 3n(n-1)E(X_1^2)E(X_2^2)$
$\qquad + 6n(n-1)(n-2)E(X_1^2)E(X_2)E(X_3) + n(n-1)(n-2)(n-3)E(X_1)E(X_2)E(X_3)E(X_4)\bigr).$
But $E(X_i) = 0$, $\forall i$, and $E(X_i^2)^2 \overset{\text{Jensen}}{\le} E(X_i^4)$.
Hence $P\bigl(\bigl|\tfrac{S_n}{n}\bigr|\ge\varepsilon\bigr) \le \frac{1}{\varepsilon^4}\frac{1}{n^4}\bigl(nE(X_1^4) + 3n(n-1)E(X_1^4)\bigr) \le \frac{C(\varepsilon)}{n^2}.$
Hence $\sum_{n=1}^\infty P\bigl(\bigl|\tfrac{S_n}{n}\bigr|\ge\varepsilon\bigr) < \infty \overset{\substack{\text{Borel-}\\ \text{Cantelli}}}{\Longrightarrow} P\bigl(\bigl|\tfrac{S_n}{n}\bigr|\ge\varepsilon \text{ for infinitely many } n\bigr) = 0.$
Since $\varepsilon > 0$ was arbitrary, we conclude that $\bigl|\tfrac{S_n}{n}\bigr|\to 0$ $P$-a.s. $\square$
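For illustration, a minimal simulation of Theorem 11.2 (the exponential distribution is chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=100_000)   # i.i.d. X_i with E(X_1) = 2
s = np.cumsum(x)
n = np.arange(1, len(x) + 1)
print([round(v, 4) for v in (s / n)[[99, 999, 9_999, 99_999]]])
# S_n/n at n = 100, 1000, 10000, 100000: approaches E(X_1) = 2
```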
As an application, we go back to Example 10.3.

Example 10.3 (continuation)
$P_0 := \delta_{x_0}$ where $x_0\ne 0$, $K(x,\cdot) = N(0,\sigma x^2)$.
The process $(X_n)_{n=0,1,\ldots}$ can also be written as $X_0 = x_0$, $X_{n+1} = \sqrt{\sigma}\,|X_n|\,Y_{n+1}$, $n = 0,1,\ldots$, where $Y_1,Y_2,\ldots$ are i.i.d. with law $N(0,1)$.
Then $|X_n| = (\sqrt{\sigma})^n\,|Y_n|\,|Y_{n-1}|\cdots|Y_1|\,|X_0|$
$\Rightarrow \log|X_n| = \sum_{i=1}^n\bigl(\log|Y_i| + \log\sqrt{\sigma}\bigr) + \log|x_0| = \log|x_0| + \sum_{i=1}^n Z_i$
where $Z_i$, $i = 1,2,\ldots$ are i.i.d., $Z_i = \log|Y_i| + \log\sqrt{\sigma}$, $E(Z_1) = E(\log|Y_1|) + \log\sqrt{\sigma}$.
We define $\sigma_c$ by $\log\sqrt{\sigma_c} = -E(\log|Y_1|)$.
Then, if $\sigma > \sigma_c$: $\frac1n\log|X_n| \overset{\text{Thm. 11.2}}{\longrightarrow} E(Z_1) > 0$ $P$-a.s. $\Rightarrow |X_n|\to\infty$ $P$-a.s.
If $\sigma < \sigma_c$: $\frac1n\log|X_n|\to E(Z_1) < 0$ $P$-a.s. $\Rightarrow |X_n|\to 0$ $P$-a.s.

Remark $\sigma_c = \exp(-2E(\log|Y_1|)) \overset{(*)}{>} \exp\bigl(-2\log\underbrace{E(|Y_1|)}_{=\sqrt{2/\pi}}\bigr) = \frac{\pi}{2}.$
$(*)$: $x\mapsto\log x$ concave $\Rightarrow$ exercises.
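As an aside not taken from the lecture: for a standard normal $Y_1$ one has $E(\log|Y_1|) = -(\gamma+\log 2)/2$ with $\gamma$ the Euler-Mascheroni constant, hence $\sigma_c = 2e^\gamma \approx 3.56$, comfortably above $\pi/2 \approx 1.57$. A Monte Carlo sanity check of this claimed closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.standard_normal(1_000_000)
m = np.log(np.abs(y)).mean()          # Monte Carlo estimate of E(log|Y_1|), ~ -0.635
print(m, np.exp(-2 * m))              # sigma_c = exp(-2 E(log|Y_1|)) ~ 3.56
print(2 * np.exp(np.euler_gamma))     # claimed closed form 2*e^gamma = 3.5622...; pi/2 = 1.5708 < sigma_c
```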
Corollary 11.3 Assume $X_1,X_2,\ldots$ i.i.d. and $X_1$ is semi-integrable with $E(X_1) = \infty$.
Then $\frac{S_n}{n}\to+\infty$ $P$-a.s.
Proof: Take $\tilde X_i = X_i\wedge M$. Then $\tilde X_1,\tilde X_2,\ldots$ are i.i.d. and $\frac{\tilde S_n}{n}\to E(\tilde X_1)$ $P$-a.s.
But: for each $K > 0$ there is $M(K) = M$ such that $E(\tilde X_1) = E(X_1\wedge M)\ge K$.
Hence $\liminf_n\frac{S_n}{n}\ge\liminf_n\frac{\tilde S_n}{n}\ge K$ $P$-a.s.
Since $K$ was arbitrary, this implies $\liminf_n\frac{S_n}{n} = \infty$ $P$-a.s. $\Rightarrow \frac{S_n}{n}\to\infty$ $P$-a.s. $\square$
12 Weak convergence, characteristic functions and the
central limit theorem
Recall that for $X_1,X_2,\ldots$ i.i.d. with $P(X_i = 1) = p$, $P(X_i = 0) = 1-p$,
$S_n = \sum_{i=1}^n X_i$, $S_n^* = \frac{S_n - np}{\sqrt{np(1-p)}}$, we have
$P(S_n^*\le x) \overset{n\to\infty}{\longrightarrow} \Phi(x), \qquad (12.1)$
where $\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^x e^{-\frac{t^2}{2}}\,dt$ (de Moivre-Laplace Theorem).
Goal:
1.) Generalize (12.1) for random variables with different laws.
2.) Interpretation of (12.1) as a convergence statement for the law of $S_n^*$.
We often assume that $S$ is a Polish space, i.e. a complete separable metric space.
We always equip $S$ with its Borel-$\sigma$-field $\mathcal S$ ($\mathcal S$ is generated by the open subsets of $S$).

Definition 12.1 (Weak convergence)
Let $S$ be a Polish space and let $\mu,\mu_1,\mu_2,\ldots$ be probability measures on $(S,\mathcal S)$.
Then $(\mu_n)$ converges weakly to $\mu$ for $n\to\infty$ if for each bounded, continuous function $f\colon S\to\mathbb R$ (we write $f\in C_b(S)$),
$\int f\,d\mu_n \longrightarrow \int f\,d\mu \qquad (12.2)$
We write $\mu_n\overset{w}{\to}\mu$.

Theorem 12.2 (Portemanteau Theorem)
The following statements are equivalent:
i) $\mu_n\overset{w}{\to}\mu$.
ii) $\int f\,d\mu_n\to\int f\,d\mu$ for all $f\colon S\to\mathbb R$, $f$ uniformly continuous and bounded.
iii) $\limsup_n\mu_n(F)\le\mu(F)$ for all closed sets $F$.
iv) $\liminf_n\mu_n(G)\ge\mu(G)$ for all open sets $G$.
v) $\lim_n\mu_n(A) = \mu(A)$ for all sets $A$ whose boundary is not charged by $\mu$ ($\mu$-randlose Mengen), i.e. $A\in\mathcal S$ with $\mu(\bar A\setminus\mathring A) = 0$, where $\bar A$ is the closure of $A$ and $\mathring A$ its interior.
Proof:
(i) $\Rightarrow$ (ii): clear.
(ii) $\Rightarrow$ (iii): $F$ closed, $\varepsilon > 0$ $\Rightarrow$ $\exists f$ uniformly continuous, $0\le f\le 1$, $f(u) = 1$, $\forall u\in F$, $f(u) = 0$, $\forall u\in U_\varepsilon(F)^c$, where $U_\varepsilon(F) = \{s\colon d(s,F) < \varepsilon\}$
(take for instance $f(u) = (1-\frac1\varepsilon d(u,F))\vee 0$).
Then $\limsup_n\mu_n(F) \le \limsup_n\int f\,d\mu_n \overset{\text{(ii)}}{=} \int f\,d\mu \le \int I_{U_\varepsilon(F)}\,d\mu = \mu(U_\varepsilon(F)).$
Since $F$ is closed, $U_\varepsilon(F)\downarrow F$, hence $\mu(U_\varepsilon(F)) \overset{\varepsilon\downarrow 0}{\longrightarrow} \mu(F)$.
$\Rightarrow \limsup_n\mu_n(F)\le\mu(F)$.
(iii) $\Leftrightarrow$ (iv): $\liminf_n\mu_n(G) = 1-\limsup_n\mu_n(G^c) \ge 1-\mu(G^c) = \mu(G)$; "$\Leftarrow$" follows in the same way.
(iii)+(iv) $\Rightarrow$ (v): If the boundary of $A$ is not charged by $\mu$, then
$\mu(A) = \mu(\mathring A) \le \liminf_n\mu_n(\mathring A) \le \liminf_n\mu_n(A) \le \limsup_n\mu_n(A) \le \limsup_n\mu_n(\bar A) \le \mu(\bar A) = \mu(A).$
(v) $\Rightarrow$ (i): Take $f\in C_b(S)$, $\varepsilon > 0$. Choose $c_0 < c_1 < \ldots < c_m$ with $c_0 < \inf f$, $c_m\ge\sup f$, $c_k - c_{k-1} < \varepsilon$ and $\mu(\{f = c_k\}) = 0$, $\forall k$; hence $A_k = \{c_{k-1} < f\le c_k\}$ are sets whose boundary is not charged by $\mu$. Take $g := \sum_{k=1}^m c_kI_{A_k}$. Then $\|g-f\|\le\varepsilon$ (where $\|g-f\| = \sup_{s\in S}|f(s)-g(s)|$).
But $\int g\,d\mu_n\to\int g\,d\mu$ due to (v).
$\Rightarrow \limsup_n\bigl|\int f\,d\mu_n - \int f\,d\mu\bigr|\le 2\varepsilon. \qquad\square$

Definition 12.3 (Convergence in law)
A sequence of random variables $(X_n)$ with values in a Polish space $S$ converges in law (in Verteilung) to a random variable $X$ if the laws of $X_n$ converge weakly to the law of $X$. We write $X_n\overset{d}{\to}X$ ($(X_n)$ konvergiert in Verteilung gegen $X$).
Hence, $X_n\overset{d}{\to}X \iff E(f(X_n))\to E(f(X))$, $\forall f\in C_b(S)$.
Lemma 12.4 Assume $(X_n)$ is a sequence of real-valued random variables which converges to 0 in probability.
Then $X_n\overset{d}{\to}0$, i.e. the laws of $(X_n)$ converge weakly to the Dirac measure in 0.
Proof: Let $\mu_n$ be the law of $X_n$, $n = 1,2,\ldots$.
1. Take $F$ closed so that $0\notin F$. Then there is $\varepsilon > 0$ so that $(-\varepsilon,\varepsilon)\subseteq F^c$.
$\mu_n(F) = P(X_n\in F)\le P(|X_n|\ge\varepsilon) \overset{n\to\infty}{\longrightarrow} 0 = \delta_0(F).$
2. Let $F$ be a closed set so that $0\in F$. Then, obviously, $\limsup_n\mu_n(F)\le 1 = \delta_0(F)$.
3. Hence, for all closed sets $F$, $\limsup_n\mu_n(F)\le\delta_0(F) \overset{\text{Thm. 12.2 (iii)}}{\Longrightarrow} \mu_n\overset{w}{\to}\delta_0$. $\square$

Lemma 12.5 Assume $(S_1,d_1)$ and $(S_2,d_2)$ are Polish spaces with metrics $d_1$ and $d_2$ and $h\colon S_1\to S_2$ a continuous function. Then:
(a) For probability measures $\mu,\mu_1,\mu_2,\ldots$ on $(S_1,\mathcal S_1)$ with $\mu_n\overset{w}{\to}\mu$, we have
$\mu_n^h = \mu_n\circ h^{-1} \overset{w}{\to} \mu\circ h^{-1} = \mu^h.$
(b) Assume $S_1 = S_2 = \mathbb R$ and $X,X_1,X_2,\ldots$ are random variables with $X_n\overset{d}{\to}X$.
Then $h(X_n)\overset{d}{\to}h(X)$.
Proof:
(a) We have to show that for all $f\in C_b(S_2)$, $\int f\,d(\mu_n\circ h^{-1})\to\int f\,d(\mu\circ h^{-1})$.
But, due to Theorem 3.6,
$\int f\,d(\mu_n\circ h^{-1}) = \int(f\circ h)\,d\mu_n \overset{n\to\infty}{\longrightarrow} \int(f\circ h)\,d\mu = \int f\,d(\mu\circ h^{-1})$, since $f\circ h\in C_b(S_1)$.
(b) follows from (a) with $\mu$ = law of $X$, $\mu_n$ = law of $X_n$. $\square$
12.1 Characteristic functions

We can integrate complex functions by decomposing into real and imaginary part, i.e. $E(X+iY) = E(X)+iE(Y)$, where $X$, $Y$ are real-valued and in $L^1$.

Definition 12.6 (Characteristic function)
The characteristic function of a probability measure $\mu$ on $(\mathbb R^d,\mathcal B^d)$ is the function $\varphi_\mu\colon\mathbb R^d\to\mathbb C$ defined by
$\varphi_\mu(x) = \int e^{i\langle x,y\rangle}\,\mu(dy) = \int\cos(\langle x,y\rangle)\,\mu(dy) + i\int\sin(\langle x,y\rangle)\,\mu(dy)\quad (x\in\mathbb R^d), \qquad (12.3)$
where $\langle\cdot,\cdot\rangle$ is the scalar product on $\mathbb R^d$.
The characteristic function $\varphi_X$ of a random variable $X$ is the characteristic function of the law of $X$. We write $\varphi_X$ instead of $\varphi_{P\circ X^{-1}} = \varphi_{P_X}$, i.e. $\varphi_X(z) = E(e^{i\langle z,X\rangle})$.

Example 12.7 ($d = 1$)
1. For $c\in\mathbb R$ and $\mu = \delta_c$, $\varphi_\mu(x) = e^{ixc}$, $x\in\mathbb R$.
2. For a discrete law $\mu = \sum_{k=1}^\infty\alpha_k\delta_{c_k}$ with $\alpha_k > 0$ we have $\varphi_\mu(x) = \sum_{k=1}^\infty\alpha_ke^{ic_kx}$ (note that (12.3) is linear in $\mu$).
In particular, for $\mu = \mathrm{Bin}(n,p)$, $\mu = \sum_{k=0}^n\binom nk p^k(1-p)^{n-k}\delta_k$:
$\varphi_\mu(x) = \sum_{k=0}^n\binom nk p^k(1-p)^{n-k}e^{ikx} = (1-p+pe^{ix})^n.$
In the same way, if $\mu$ is Poisson with parameter $\lambda$, $\mu = \sum_{k=0}^\infty e^{-\lambda}\frac{\lambda^k}{k!}\delta_k$:
$\varphi_\mu(x) = \sum_{k=0}^\infty e^{-\lambda}\frac{\lambda^k}{k!}e^{ikx} = e^{\lambda(e^{ix}-1)}.$
3. For $\mu = N(0,1)$, $\varphi_\mu(x) = e^{-\frac12x^2}$, $x\in\mathbb R$.
Proof: $\varphi_\mu(x) = \frac{1}{\sqrt{2\pi}}\int e^{-\frac12y^2}e^{ixy}\,dy$ and one can calculate this integral.
Let $\psi(x) = \varphi_\mu(x)$, $\phi(x) = \frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}$.
$\phi$ is a solution of the differential equation
$\phi'(x) + x\phi(x) = 0. \qquad (12.4)$
(Check!) The same holds true for $\psi$. More precisely, $\psi(x) = \int e^{ixy}\phi(y)\,dy$
$\Rightarrow \psi'(x) = i\int ye^{ixy}\phi(y)\,dy \qquad (12.5)$
and, integrating by parts,
$x\psi(x) = \int\underbrace{xe^{ixy}}_{\downarrow}\underbrace{\phi(y)}_{\uparrow}\,dy = -ie^{ixy}\phi(y)\bigr|_{-\infty}^{\infty} + i\int e^{ixy}\phi'(y)\,dy = i\int e^{ixy}\phi'(y)\,dy \qquad (12.6)$
(since $\phi(y)\to 0$ for $|y|\to\infty$).
Due to (12.4), $\phi'(y)+y\phi(y) = 0$, hence (12.5) and (12.6) imply
$\psi'(x) + x\psi(x) = 0. \qquad (12.7)$
Now, $\psi$ and $\sqrt{2\pi}\,\phi$ solve the same linear differential equation (see (12.4), (12.7)) and they both take the value 1 in $x = 0$.
$\Rightarrow \psi = \sqrt{2\pi}\,\phi$, i.e. $\varphi_\mu(x) = e^{-\frac12x^2}$, $x\in\mathbb R$.
Similarly, one can show that for $\mu = N(m,\sigma^2)$: $\varphi_\mu(x) = e^{imx-\frac12\sigma^2x^2}$, $x\in\mathbb R$.
4. Let $\mu$ be the Cauchy distribution with parameter $c > 0$, i.e. $\mu$ has the density $f(x) = \frac{c}{\pi(c^2+x^2)}$. Then $\varphi_\mu(x) = e^{-c|x|}$, see literature.
5. $\mu = U[0,1]$. Then $\varphi_\mu(x) = \frac{e^{ix}-1}{ix}$.
Proof: $\varphi_\mu(x) = \int_0^1e^{ixy}\,dy = \frac{1}{ix}e^{ixy}\bigr|_0^1 = \frac{1}{ix}(e^{ix}-1)$.
6. $\mu = \exp(\alpha)$, i.e. $\mu$ has the density $f(x) = \begin{cases}\alpha e^{-\alpha x} & x\ge 0\\ 0 & x < 0.\end{cases}$
Then $\varphi_\mu(x) = \frac{\alpha}{\alpha-ix}$, $x\in\mathbb R$.
Proof: $\varphi_\mu(x) = \int_0^\infty e^{ixy}\alpha e^{-\alpha y}\,dy = \alpha\int_0^\infty e^{(ix-\alpha)y}\,dy = \frac{\alpha}{\alpha-ix}$.
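These formulas are easy to sanity-check empirically: by the law of large numbers, the sample average of $e^{itX_j}$ approximates $\varphi_X(t)$. A sketch (the distributions and the test point $t$ are chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)
t = 1.3                                   # arbitrary test point

y = rng.standard_normal(500_000)
print(np.exp(1j * t * y).mean(), np.exp(-t**2 / 2))               # N(0,1): both ~ e^{-t^2/2}

u = rng.uniform(size=500_000)
print(np.exp(1j * t * u).mean(), (np.exp(1j * t) - 1) / (1j * t)) # U[0,1]: both ~ (e^{it}-1)/(it)
```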
The characteristic function is an important tool in probability for the following reasons:
(1) The mapping $X\mapsto\varphi_X$ has nice properties under transformations and sums of i.i.d. random variables.
(2) $\varphi_X$ characterizes the law of $X$ uniquely.
(3) Pointwise convergence of the characteristic function is (under mild additional assumptions) equivalent to the weak convergence of the corresponding measures.

At (1):
Lemma 12.8 For a random variable $X$ with values in $\mathbb R^d$ and characteristic function $\varphi_X$ we have
a) For each $y\in\mathbb R^d$, $|\varphi_X(y)|\le 1$ and $\varphi_X(0) = 1$.
b) $\varphi_X$ is uniformly continuous.
c) For all $a\in\mathbb R$, $b\in\mathbb R^d$: $\varphi_{aX+b}(y) = \varphi_X(ay)\,e^{i\langle b,y\rangle}$.
d) $\varphi_X(\cdot)$ is real-valued if and only if $P_X = P_{-X}$, i.e. if the law of $X$ equals the law of $-X$.
e) If $X$ and $Z$ are independent, then $\varphi_{X+Z} = \varphi_X\cdot\varphi_Z$.
Proof:
a) Clear from the definition of $\varphi_X$.
b) $|\varphi_X(y_1)-\varphi_X(y_2)| = \Bigl|\int\bigl(e^{i\langle y_1,x\rangle}-e^{i\langle y_2,x\rangle}\bigr)\,P_X(dx)\Bigr| \le \int\underbrace{|e^{i\langle y_2,x\rangle}|}_{\le 1}\,\underbrace{|1-e^{i\langle y_1-y_2,x\rangle}|}_{\le\varepsilon\ \text{if}\ |y_1-y_2|\le\delta(\varepsilon)}\,P_X(dx).$
c) $\varphi_{aX+b}(y) = \int e^{i\langle y,z\rangle}\,P_{aX+b}(dz) = \int e^{i\langle y,az+b\rangle}\,P_X(dz) = e^{i\langle b,y\rangle}\int e^{i\langle z,ay\rangle}\,P_X(dz) = e^{i\langle b,y\rangle}\varphi_X(ay).$
d) $\varphi_X(y) = \int e^{i\langle y,z\rangle}\,P_X(dz) = \int\cos\langle y,z\rangle\,P_X(dz) + i\int\sin\langle y,z\rangle\,P_X(dz)$ is real-valued
$\iff \int\sin\langle y,z\rangle\,P_X(dz) = 0$, $\forall y$ $\iff \varphi_X(y) = \overline{\varphi_X(y)} = \varphi_{-X}(y)$, $\forall y$ $\iff P_X = P_{-X}$ (the last step by the uniqueness theorem (2) below).
e) $\varphi_{X+Z}(y) = E(e^{i\langle X+Z,y\rangle}) = E(e^{i\langle X,y\rangle}e^{i\langle Z,y\rangle}) \overset{\substack{\text{independence}\\ \text{of }X\text{ and }Z}}{=} E(e^{i\langle X,y\rangle})\,E(e^{i\langle Z,y\rangle}) = \varphi_X(y)\varphi_Z(y). \qquad\square$
The moments of $X$ can be calculated from its characteristic function.

Lemma 12.9 Let $X$ be a real-valued random variable with characteristic function $\varphi_X$. Then
(a) If $E(|X|^k) < \infty$, $\varphi_X$ is $k$-times continuously differentiable and the derivatives are given by $\varphi_X^{(j)}(t) = E((iX)^je^{itX})$, $j = 0,1,\ldots,k$.
(b) If $E(X^2) < \infty$ then $\varphi_X(t) = 1 + itE(X) - \frac12t^2E(X^2) + o(t^2)$ for $t\to 0$.
We will use (b).
Proof of (b): We use the following analytical estimate:
$\Bigl|e^{ix} - \sum_{m=0}^n\frac{(ix)^m}{m!}\Bigr| \le \min\Bigl(\frac{|x|^{n+1}}{(n+1)!},\ \frac{2|x|^n}{n!}\Bigr) \qquad (12.8)$
(see R. Durrett, Probability: Theory and Examples).
Take $x = tX$ and take expectations:
$\Bigl|E(e^{itX}) - E\Bigl(\sum_{m=0}^n\frac{(itX)^m}{m!}\Bigr)\Bigr| \le E\Bigl(\Bigl|e^{itX}-\sum_{m=0}^n\frac{(itX)^m}{m!}\Bigr|\Bigr) \le E\Bigl(\min\Bigl(\frac{|tX|^{n+1}}{(n+1)!},\frac{2|tX|^n}{n!}\Bigr)\Bigr) = \frac{|t|^n}{(n+1)!}\,E\bigl(\min(|t||X|^{n+1},\,2(n+1)|X|^n)\bigr).$
For $n = 2$, we have therefore
$\bigl|E(e^{itX}) - \bigl(1+itE(X)-\tfrac12t^2E(X^2)\bigr)\bigr| \le \tfrac16t^2\,E\bigl(\min(|t||X|^3,\,6|X|^2)\bigr).$
But $\min(|t||X|^3,6|X|^2)\le 6|X|^2$ and $\min(|t||X|^3,6|X|^2)\overset{t\to 0}{\longrightarrow}0$
$\overset{\text{Lebesgue's Theorem}}{\Longrightarrow} E\bigl(\min(|t||X|^3,\,6|X|^2)\bigr)\overset{t\to 0}{\longrightarrow}0. \qquad\square$
At (2):
Theorem 12.10 Assume $\mu_1,\mu_2$ are probability measures on $(\mathbb R^d,\mathcal B^d)$ with $\varphi_{\mu_1} = \varphi_{\mu_2}$. Then $\mu_1 = \mu_2$.
Proof: Since the compact sets generate the Borel-$\sigma$-field, it suffices to show that $\mu_1(K) = \mu_2(K)$, $\forall K\subseteq\mathbb R^d$, $K$ compact.
Assume $K$ compact and let $d(x,K) = \inf\{d(x,y)\mid y\in K\}$ be the distance from $x$ to $K$ ($x\in\mathbb R^d$).
For $m\in\mathbb N$ define $f_m\colon\mathbb R^d\to[0,1]$ by
$f_m(x) = \begin{cases} 1 & x\in K, \\ 0 & d(x,K)\ge\frac1m, \\ 1-m\,d(x,K) & \text{otherwise.}\end{cases}$
Then $f_m$ is continuous, has values in $[0,1]$ and compact support, and $f_m\downarrow I_K$ for $m\to\infty$.
With monotone convergence, we see that it suffices to show that $\int f_m\,d\mu_1 = \int f_m\,d\mu_2$, $\forall m$ (because we then conclude $\mu_1(K) = \mu_2(K)$). Fix $m$.
Take $\varepsilon > 0$ and choose $N$ large enough such that $B_N := [-N,N]^d$ contains the set $\{x\in\mathbb R^d\mid f_m(x)\ne 0\}$ and such that $\mu_1(B_N^c)\le\varepsilon$ and $\mu_2(B_N^c)\le\varepsilon$.
Using the Fourier convergence Theorem, there is a function $g\colon\mathbb R^d\to\mathbb C$ of the form
$g(x) = \sum_{j=1}^n c_je^{i\frac{2\pi}{2N}\langle t_j,x\rangle}$, where $n\in\mathbb N$, $c_1,\ldots,c_n\in\mathbb C$ and $t_1,\ldots,t_n\in\mathbb Z^d$, such that $\sup_{x\in B_N}|g(x)-f_m(x)|\le\varepsilon$.
We conclude that $\sup_{x\in\mathbb R^d}|g(x)|\le 1+\varepsilon$. Now we can estimate
$\Bigl|\int f_m\,d\mu_1 - \int f_m\,d\mu_2\Bigr| \le \Bigl|\int f_m\,d\mu_1 - \int g\,d\mu_1\Bigr| + \Bigl|\int g\,d\mu_1 - \int g\,d\mu_2\Bigr| + \Bigl|\int g\,d\mu_2 - \int f_m\,d\mu_2\Bigr|. \qquad (12.9)$
Since $\varphi_{\mu_1} = \varphi_{\mu_2}$, the second term in (12.9) vanishes (note that $\int g\,d\mu_l = \sum_{j=1}^n c_j\,\varphi_{\mu_l}\bigl(\frac{2\pi}{2N}t_j\bigr)$).
Since $|g(x)|\le 1+\varepsilon$, the first term in (12.9) can be estimated as follows:
$\Bigl|\int f_m\,d\mu_1 - \int g\,d\mu_1\Bigr| \le \int_{B_N}|f_m-g|\,d\mu_1 + \int_{B_N^c}|f_m|\,d\mu_1 + \int_{B_N^c}|g|\,d\mu_1 \le \varepsilon\,\mu_1(B_N) + (1+\varepsilon)\,\mu_1(B_N^c) \le \varepsilon + (1+\varepsilon)\varepsilon = \varepsilon(2+\varepsilon).$
The last term in (12.9) is $\le\varepsilon(2+\varepsilon)$, in the same way.
Since $\varepsilon > 0$ was arbitrary, we conclude that $\int f_m\,d\mu_1 = \int f_m\,d\mu_2$, $\forall m$. $\square$
At (3):
Theorem 12.11 (Continuity Theorem)
Assume $\mu_1,\mu_2,\ldots$ are probability measures on $(\mathbb R,\mathcal B)$ with characteristic functions $\varphi_1,\varphi_2,\ldots$. Then
(a) If $\mu_n\overset{w}{\to}\mu$ for some probability measure $\mu$ on $(\mathbb R,\mathcal B)$, then $(\varphi_n)$ converges pointwise to the characteristic function $\varphi$ of $\mu$, i.e. $\varphi_n(t)\to\varphi(t)$, $\forall t\in\mathbb R$.
(b) If $(\varphi_n)$ converges pointwise to a function $\varphi\colon\mathbb R\to\mathbb C$ and $\varphi$ is continuous at 0, then there is a probability measure $\mu$ on $(\mathbb R,\mathcal B)$ such that $\varphi$ is the characteristic function of $\mu$ and $\mu_n\overset{w}{\to}\mu$.

For the proof, we will need the important notion of tightness (Straffheit).

Definition 12.12 A sequence of probability measures $(\mu_n)$ on a Polish space $(S,\mathcal S)$ is tight (straff) if there is, $\forall\varepsilon > 0$, a compact set $K_\varepsilon\subseteq S$ such that $\sup_n\mu_n(K_\varepsilon^c)\le\varepsilon$.

Theorem 12.13 (Prohorov's Theorem)
$(S,\mathcal S)$ Polish space.
(i) $(\mu_n)$ tight $\Rightarrow$ each subsequence of $(\mu_n)$ has a weakly convergent subsequence.
(ii) "$\Leftarrow$" holds as well.
Proof: see P. Billingsley, Convergence of Probability Measures.

Proof of Theorem 12.11:
(a) $x\mapsto e^{itx}$ is continuous and bounded, hence $\varphi_n(t) = \int e^{itx}\,\mu_n(dx) \overset{n\to\infty}{\longrightarrow} \int e^{itx}\,\mu(dx) = \varphi(t)$.
(b) We show that $(\mu_n)$ is tight.
$\int_{-u}^u(1-e^{itx})\,dt = 2u - \int_{-u}^u(\cos tx + i\sin tx)\,dt = 2u - \frac{2\sin ux}{x}.$
Divide both sides by $u$ and integrate with $\mu_n$:
$\frac1u\int_{-u}^u(1-\varphi_n(t))\,dt = 2\int\Bigl(1-\frac{\sin ux}{ux}\Bigr)\mu_n(dx) \ge 2\int_{\{|x|\ge\frac2u\}}\Bigl(1-\frac{1}{|ux|}\Bigr)\mu_n(dx) \ge \mu_n\bigl(\{x\colon|x| > \tfrac2u\}\bigr).$
Since $\varphi(t)\overset{t\to 0}{\longrightarrow}1$ and $\varphi$ is continuous in 0, we have $\frac1u\int_{-u}^u(1-\varphi(t))\,dt \overset{u\to 0}{\longrightarrow} 0$.
Choose $u$ small enough so that $\frac1u\int_{-u}^u(1-\varphi(t))\,dt < \varepsilon$.
We have shown that $\mu_n\bigl(\{x\colon|x| > \frac2u\}\bigr) \le \frac1u\int_{-u}^u(1-\varphi_n(t))\,dt$.
Since $\varphi_n(t)\to\varphi(t)$, $\forall t$, we conclude with dominated convergence that for $n\ge N_0(\varepsilon)$,
$\frac1u\int_{-u}^u(1-\varphi_n(t))\,dt\le 2\varepsilon$, and this proves that $(\mu_n)$ is tight.
Now, with Theorem 12.13 (i): each subsequence of $(\mu_n)$ has a weakly convergent subsequence, and by part (a) together with Theorem 12.10 all these subsequential limits coincide (their characteristic function is $\varphi$).
Therefore $\mu_n\overset{w}{\to}\mu$, where $\mu$ is this common limit. $\square$
Example 12.14 Let $\mu_n$ be the uniform distribution on $[-n,n]$, i.e. $\mu_n = U[-n,n]$. Then
$\varphi_n(t) = \frac{1}{2n}\int_{-n}^n e^{itx}\,dx = \frac{1}{2n}\frac{1}{ti}\bigl(e^{itn}-e^{-itn}\bigr) = \frac{1}{2nt}\bigl(\sin tn - \sin(-tn)\bigr) = \frac{\sin tn}{tn},\quad t\ne 0.$
Hence $\varphi_n(t)\to\varphi(t)$, $\forall t$, where $\varphi(t) = \begin{cases} 1 & t = 0 \\ 0 & \text{else.}\end{cases}$
In particular, $\varphi$ is not continuous in 0.
In fact, $(\mu_n)$ does not converge weakly, see exercises.
Remark $(\mu_n)$ is not tight.
Corollary 12.15 Let $\lambda > 0$. For each $n\in\mathbb N$ consider i.i.d. random variables $X_1,\ldots,X_n$, Bernoulli with $p = \frac{\lambda}{n}$, i.e. $P(X_i = 1) = \frac{\lambda}{n} = 1-P(X_i = 0)$.
$S_n = \sum_{i=1}^n X_i$ has the law $\mathrm{Bin}(n,\frac{\lambda}{n})$.
$\Rightarrow$ the characteristic function $\varphi_{S_n}$ of $S_n$ is given by $\varphi_{S_n}(t) = \bigl(1-\frac{\lambda}{n}+\frac{\lambda}{n}e^{it}\bigr)^n$, see Example 12.7.2.
$\varphi_{S_n}$ converges to $e^{\lambda(e^{it}-1)}$ for $n\to\infty$, and $\varphi(t) = e^{\lambda(e^{it}-1)}$ is the characteristic function of the Poisson distribution with parameter $\lambda$, see Example 12.7.2.
According to Theorem 12.11, the laws of $S_n$ converge weakly to the Poisson distribution with parameter $\lambda$.
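Corollary 12.15 can be watched numerically; a sketch ($\lambda$ and $n$ chosen arbitrarily):

```python
import math
import numpy as np

rng = np.random.default_rng(0)
lam, n = 3.0, 1000
s = rng.binomial(n, lam / n, size=200_000)       # samples of S_n ~ Bin(n, lambda/n)
for k in range(6):
    emp = (s == k).mean()                        # empirical mass at k
    poi = math.exp(-lam) * lam**k / math.factorial(k)
    print(k, round(emp, 4), round(poi, 4))       # matches the Poisson(lambda) mass at k
```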
Theorem 12.16 (Central Limit Theorem (CLT))
Assume $X_1,X_2,\ldots$ are i.i.d. with $\sigma^2 = \operatorname{Var}(X_1) < \infty$ and $m = E(X_1)$. Assume $\sigma^2 > 0$.
Take $S_n = \sum_{i=1}^n X_i$ and $S_n^* = \frac{S_n - nm}{\sqrt{n\sigma^2}}$, $n = 1,2,\ldots$.
Then the laws of $S_n^*$ converge weakly to $N(0,1)$. In particular, for $-\infty\le a < b\le\infty$, we have
$\lim_n P(a\le S_n^*\le b) = \frac{1}{\sqrt{2\pi}}\int_a^b e^{-\frac{u^2}{2}}\,du. \qquad (12.10)$
Proof: Let $\varphi$ be the characteristic function of $X_1 - m$. Then, due to Lemma 12.9 (b),
$\varphi(t) = 1 - \frac{\sigma^2}{2}t^2 + t^2g(t)$ with $g(t)\overset{t\to 0}{\longrightarrow}0$.
Let $\varphi_n$ be the characteristic function of $S_n^*$. Due to Lemma 12.8 (c) and (e), we have
$\varphi_n(t) = \varphi\Bigl(\frac{t}{\sqrt{n\sigma^2}}\Bigr)^n\quad (t\in\mathbb R).$
Note that $\bigl(1-\frac{t^2}{2n}\bigr)^n\to e^{-\frac{t^2}{2}}$ ($t\in\mathbb R$).
Since $|u^n - v^n|\le|u-v|\,n\max(|u|,|v|)^{n-1}$ for all $u,v\in\mathbb C$, we have
$\Bigl|\Bigl(1-\frac{t^2}{2n}\Bigr)^n - \varphi_n(t)\Bigr| = \Bigl|\Bigl(1-\frac{t^2}{2n}\Bigr)^n - \varphi\Bigl(\frac{t}{\sqrt{n\sigma^2}}\Bigr)^n\Bigr| \le n\Bigl|1-\frac{t^2}{2n}-\varphi\Bigl(\frac{t}{\sqrt{n\sigma^2}}\Bigr)\Bigr| \le n\,\frac{t^2}{n\sigma^2}\Bigl|g\Bigl(\frac{t}{\sqrt{n\sigma^2}}\Bigr)\Bigr| \overset{n\to\infty}{\longrightarrow} 0.$
$\Rightarrow$ the characteristic functions $\varphi_n$ of $S_n^*$ converge to the characteristic function $t\mapsto e^{-\frac{t^2}{2}}$ of $\mu = N(0,1)$, see Example 12.7.3.
Theorem 12.11 implies that the laws of $S_n^*$ converge weakly to $N(0,1)$.
(12.10) then follows with the Portemanteau Theorem, since $\mu = N(0,1)$ has a density and $[a,b]$ is a set whose boundary is not charged by $\mu$, $\forall a,b$. $\square$
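A sketch verifying (12.10) by simulation, with exponential summands (for which $m = \sigma^2 = 1$); the parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 200, 20_000
x = rng.exponential(size=(reps, n))                # i.i.d. X_i with m = 1, sigma^2 = 1
s_star = (x.sum(axis=1) - n * 1.0) / np.sqrt(n * 1.0)
for b in (-1.0, 0.0, 1.0):
    print(b, (s_star <= b).mean())                 # ~ Phi(b) = 0.159, 0.5, 0.841
```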
Remark Let $\mathcal M = \{\mu\mid\mu$ probability measure on $(\mathbb R,\mathcal B)$ with $\int x^2\,d\mu < \infty\}$.
For $\mu\in\mathcal M$, $\int|x|\,d\mu < \infty$. Let $m_\mu = \int x\,d\mu$ and $\sigma_\mu^2 = \int x^2\,d\mu - \bigl(\int x\,d\mu\bigr)^2$.
For $\mu_1,\mu_2\in\mathcal M$, define $\mu_1*\mu_2$ as follows.
Let $X_1$, $X_2$ be independent random variables with laws $\mu_1$ and $\mu_2$.
Let $\mu_1*\mu_2$ be the law of $\frac{X_1+X_2-m_{\mu_1}-m_{\mu_2}}{\sqrt{\sigma_{\mu_1}^2+\sigma_{\mu_2}^2}}$.
Then, if $X_1,\ldots,X_n$ are i.i.d. with law $\mu$, the law of $S_n^* = \frac{S_n-nm_\mu}{\sqrt{n\sigma_\mu^2}}$ is $\underbrace{\mu*\cdots*\mu}_{n\text{-times}}$.
Note $\mu = N(0,1)$ is a fixed point: $\mu*\mu = \mu$.
In this sense, the CLT describes convergence to a fixed point.
13 Conditional expectation
13.1 Motivation

Assume $X$ is a random variable on some probability space $(\Omega,\mathcal A,P)$, $X\ge 0$.
The expectation $E(X)$ can be interpreted as a prediction for the unknown (random) value of $X$.
Assume $\mathcal A_0\subseteq\mathcal A$, $\mathcal A_0$ is a $\sigma$-field, and assume we have the information in $\mathcal A_0$, i.e. for each $A_0\in\mathcal A_0$ we know if $A_0$ will occur or not.
How does this partial information modify the prediction of $X$?

Example 13.1
(a) If $X$ is measurable with respect to $\mathcal A_0$, then $\{X\le c\}\in\mathcal A_0$, $\forall c$, and we know for each $c$ if $X(\omega)\le c$ or $X(\omega) > c$ occurs.
$\Rightarrow$ We know the value of $X(\omega)$.
(b) $X_1,X_2,\ldots$ i.i.d. with $m = E(X_1) < \infty$.
How should we modify the prediction $E(X_1) = m$ if we know the value $S_n(\omega) = X_1(\omega)+\ldots+X_n(\omega)$?

The solution of the prediction problem is to pass from the constant $E(X) = m$ to a random variable $E(X\mid\mathcal A_0)$, which is measurable with respect to $\mathcal A_0$: the conditional expectation of $X$, given $\mathcal A_0$.
13.2 Conditional expectation for a σ-field generated by atoms

Let $\mathcal C = (A_i)_{i=1,2,\ldots}$ be a countable partition of $\Omega$ into atoms $A_i$, i.e. $\Omega = \biguplus_{i=1}^\infty A_i$, and $\mathcal A_0 = \sigma(\mathcal C)$.
If $P(A_i) > 0$, consider the conditional law $P(\cdot\mid A_i)$ defined by
$P(B\mid A_i) = \frac{P(B\cap A_i)}{P(A_i)}\quad (B\in\mathcal A)$
and define $E(X\mid A_i) = \int X\,dP(\cdot\mid A_i) = \frac{1}{P(A_i)}\int_{A_i}X\,dP = \frac{1}{P(A_i)}E(XI_{A_i})$. Now define
$E(X\mid\mathcal A_0)(\omega) := \sum_{i\colon P(A_i)>0}\frac{1}{P(A_i)}E(XI_{A_i})\,I_{A_i}(\omega). \qquad (13.1)$
(13.1) gives for each $\omega$ a prediction $E(X\mid\mathcal A_0)(\omega)$ which uses only the information in which atom $\omega$ is.

Definition 13.2 If $\mathcal A_0$ is generated by a countable partition $\mathcal C$ of $\Omega$, the random variable $E(X\mid\mathcal A_0)$ defined in (13.1) is the conditional expectation of $X$ given $\mathcal A_0$.
Theorem 13.3 The random variable $E(X\mid\mathcal A_0)$ (defined in (13.1)) has the following properties:
(i) $E(X\mid\mathcal A_0)$ is measurable with respect to $\mathcal A_0$ ($E(X\mid\mathcal A_0)\colon(\Omega,\mathcal A_0)\to(\bar{\mathbb R},\bar{\mathcal B})$).
(ii) For each random variable $Y_0\ge 0$ which is measurable with respect to $\mathcal A_0$, we have
$E(XY_0) = E(E(X\mid\mathcal A_0)Y_0). \qquad (13.2)$
In particular,
$E(X) = E(E(X\mid\mathcal A_0)). \qquad (13.3)$
Proof: (i) follows from (13.1).
To show (ii), take first $Y_0 = I_{A_j}$ ($A_j\in\mathcal C$):
$E(E(X\mid\mathcal A_0)I_{A_j}) = E\Bigl(\sum_{i\colon P(A_i)>0}\frac{1}{P(A_i)}E(XI_{A_i})\underbrace{I_{A_i}I_{A_j}}_{=I_{A_j}\text{ if }i=j,\ 0\text{ else}}\Bigr) = \frac{1}{P(A_j)}E(XI_{A_j})\,P(A_j) = E(XI_{A_j})$ if $P(A_j) > 0$.
Hence (13.2) follows in this case from (13.1) (if $P(A_j) = 0$, both sides in (13.2) are $= 0$).
Next, we consider functions of the form $\sum_i c_iI_{A_i}$ ($c_i\ge 0$), then monotone limits of such functions as in the definition of the integral.
$\Rightarrow$ (13.2) holds true for all $Y_0\ge 0$, $Y_0$ measurable with respect to $\mathcal A_0$.
Taking $Y_0\equiv 1$, (13.3) follows. $\square$

Notation: If $\mathcal A_0 = \sigma(Y)$ for some random variable $Y$, we write $E(X\mid Y)$ instead of $E(X\mid\sigma(Y))$.
Example 13.4
Take $p\in[0,1]$, let $X_1,X_2,\ldots$ be i.i.d. Bernoulli random variables with parameter $p$, i.e. $P(X_i = 1) = p = 1-P(X_i = 0)$.
Question: What is $E(X_1\mid S_n)$?
Answer: $E(X_1\mid S_n) = \sum_{k=0}^n P(X_1 = 1\mid S_n = k)\,I_{\{S_n=k\}}$ and
$P(X_1 = 1\mid S_n = k) = \frac{P(X_1 = 1,\,S_n = k)}{P(S_n = k)} = \frac{p\binom{n-1}{k-1}p^{k-1}(1-p)^{n-1-(k-1)}}{\binom nk p^k(1-p)^{n-k}} = \frac kn$
$\Rightarrow E(X_1\mid S_n) = \frac{S_n}{n}. \qquad (13.4)$

Remark $E(X_1\mid S_n)$ does not depend on the success parameter $p$.
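The computation, including the independence from $p$, is easy to confirm by Monte Carlo ($n$ and $p$ chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, reps = 10, 0.3, 400_000
x = rng.random((reps, n)) < p           # rows of i.i.d. Bernoulli(p) variables
s = x.sum(axis=1)
for k in (2, 5, 8):
    sel = s == k
    print(k, x[sel, 0].mean(), k / n)   # P(X_1 = 1 | S_n = k) ~ k/n, whatever p is
```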
Example 13.5 (Randomized sums)
$X_1,X_2,\ldots$ random variables with $X_i\ge 0$, $\forall i$, and $E(X_i) = m$, $\forall i$.
$T\colon\Omega\to\{0,1,\ldots\}$ is independent of $(X_1,X_2,\ldots)$, $S_T(\omega) := \sum_{k=1}^{T(\omega)}X_k(\omega)$.
Then, according to (13.1), $E(S_T\mid T) = \sum_{k=0}^\infty\frac{1}{P(T=k)}E(S_TI_{\{T=k\}})\,I_{\{T=k\}}$.
But $E(S_TI_{\{T=k\}}) = E(S_kI_{\{T=k\}}) \overset{\substack{T\text{ indep.}\\ \text{of }S_k}}{=} E(S_k)E(I_{\{T=k\}}) = k\,m\,P(T = k).$
Hence $E(S_T\mid T) = m\underbrace{\sum_{k=0}^\infty k\,I_{\{T=k\}}}_{=T} \Rightarrow E(S_T\mid T) = m\cdot T.$
Now, with (13.3) we conclude that
$E(S_T) = m\,E(T)\quad\text{(Wald's identity)} \qquad (13.5)$
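A quick check of Wald's identity by simulation (the Poisson law for $T$ and the exponential law for the $X_i$ are arbitrary choices satisfying the assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
reps, m = 50_000, 2.0
t = rng.poisson(5.0, size=reps)          # T ~ Poisson(5), independent of the X_i
s_t = np.array([rng.exponential(m, size=k).sum() for k in t])  # S_T = X_1 + ... + X_T
print(s_t.mean(), m * t.mean())          # both ~ m * E(T) = 2 * 5 = 10
```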
13.3 Conditional expectation for general σ-fields

$X$ random variable on $(\Omega,\mathcal A,P)$, $X\ge 0$, $\mathcal A_0\subseteq\mathcal A$, $\mathcal A_0$ $\sigma$-field.

Definition 13.7 A random variable $X_0\ge 0$ is (a version of) the conditional expectation of $X$, given $\mathcal A_0$, if it satisfies
(i) $X_0$ is measurable with respect to $\mathcal A_0$,
(ii) for each random variable $Y_0\ge 0$, $Y_0$ measurable with respect to $\mathcal A_0$, we have $E(XY_0) = E(X_0Y_0)$.
We write $X_0 = E(X\mid\mathcal A_0)$.

Remark 13.8
1) For (ii) it suffices to have $E(XI_{A_0}) = E(X_0I_{A_0})$, $\forall A_0\in\mathcal A_0$.
2) (ii) implies, with $Y_0\equiv 1$,
$E(X) = E(E(X\mid\mathcal A_0)) \qquad (13.6)$

Theorem 13.9 (Existence and uniqueness of conditional expectations)
The conditional expectation $E(X\mid\mathcal A_0)$ of a random variable $X\ge 0$ given a $\sigma$-field $\mathcal A_0$ exists and is unique in the following sense:
If $X_0$ and $\tilde X_0$ are two random variables satisfying (i) and (ii) in Definition 13.7, then $X_0 = \tilde X_0$ $P$-a.s.

Remark If $X = X^+ - X^-$ is semi-integrable, we define
$E(X\mid\mathcal A_0) = E(X^+\mid\mathcal A_0) - E(X^-\mid\mathcal A_0).$
Then, $X\in L^1 \Rightarrow E(X\mid\mathcal A_0)\in L^1$.
Proof of Theorem 13.9:
1. Let $Q(A_0) := E(XI_{A_0})$ ($A_0\in\mathcal A_0$).
Then $Q$ is a measure on $(\Omega,\mathcal A_0)$:
$Q\Bigl(\biguplus_{i=1}^\infty A_i\Bigr) = E\Bigl(X\sum_{i=1}^\infty I_{A_i}\Bigr) = \sum_{i=1}^\infty E(XI_{A_i}) = \sum_{i=1}^\infty Q(A_i).$
The measure $Q$ is absolutely continuous with respect to $P$ on $(\Omega,\mathcal A_0)$, i.e. we have, $\forall A_0\in\mathcal A_0$: $P(A_0) = 0 \Rightarrow Q(A_0) = 0$.
Due to Radon-Nikodym's Theorem, there is a function $X_0\ge 0$, which is measurable with respect to $\mathcal A_0$, such that
$\underbrace{Q(A_0)}_{=E(XI_{A_0})} = \int_{A_0}X_0\,dP = E(X_0I_{A_0})\quad (A_0\in\mathcal A_0).$
Remark 13.8.1 implies that (ii) in Definition 13.7 is satisfied for $X_0$.
2. If $X_0$ and $\tilde X_0$ are random variables which satisfy (ii) in Definition 13.7, then $A_0 = \{X_0 > \tilde X_0\}\in\mathcal A_0$.
13.7 (ii) implies that $E(X_0I_{A_0}) = E(\tilde X_0I_{A_0}) \Rightarrow E((X_0-\tilde X_0)I_{A_0}) = 0 \Rightarrow P(A_0) = 0.$
In the same way, $P(X_0 < \tilde X_0) = 0 \Rightarrow X_0 = \tilde X_0$ $P$-a.s. $\square$
Example 13.10 We generalize the explicit computation in Example 13.4.

Lemma 13.10 $X_1,\ldots,X_n$ are i.i.d. and $X_1\in L^1$, $S_n = \sum_{i=1}^n X_i$. Then
$E(X_i\mid S_n) = \frac{S_n}{n},\quad i = 1,\ldots,n. \qquad (13.7)$
Proof: We will need the following lemma.

Lemma 13.11 $X$ and $Y$ are random variables on some probability space $(\Omega,\mathcal A,P)$. Then the following statements are equivalent:
(a) $Y$ is measurable with respect to $\sigma(X)$.
(b) There is a measurable function $h\colon(\mathbb R,\mathcal B)\to(\bar{\mathbb R},\bar{\mathcal B})$ such that $Y = h(X)$.
Proof of Lemma 13.11: (b) $\Rightarrow$ (a) is clear because the composition of measurable functions is measurable.
(a) $\Rightarrow$ (b): Take first $Y = I_A$, $A\in\sigma(X)$.
Then $A = \{X\in B\}$ for some $B\in\mathcal B$ and $Y = I_{\{X\in B\}} = I_B(X)$, i.e. $h = I_B$.
Then take $Y = \sum_i c_iI_{A_i}$, then monotone limits of such functions etc. $\square$

Proof of Lemma 13.10: Let $Y_0\ge 0$, $Y_0$ measurable with respect to $\sigma(S_n)$.
Hence, with Lemma 13.11, $Y_0 = h(S_n)$ for a measurable function $h$. Hence
$E(X_ih(S_n)) = \int\cdots\int x_i\,h(x_1+\ldots+x_n)\,\mu(dx_1)\ldots\mu(dx_n)$, where $\mu$ is the law of $X_1$.
$\Rightarrow E(X_ih(S_n))$ is invariant under permutations of the indices $1,\ldots,n$
$\Rightarrow E(X_ih(S_n)) = E(X_jh(S_n))$, $\forall i,j$
$\Rightarrow E(X_ih(S_n)) = \frac1n\sum_{k=1}^n E(X_kh(S_n)) = E\bigl(\frac{S_n}{n}h(S_n)\bigr)$
$\Rightarrow E(X_iY_0) = E\bigl(\frac{S_n}{n}Y_0\bigr)$, and we showed that $\frac{S_n}{n}$ satisfies property (ii) in Definition 13.7. $\square$
Remark The proof used only that the joint law of $(X_1,\ldots,X_n)$ is invariant under permutations of the indices.

13.4 Properties of conditional expectations

Conditional expectation satisfies the same rules as expectation.
In some situations this is obvious, since conditional expectation is an expectation with respect to some conditional distribution.

Theorem 13.12 $X_1$, $X_2$ random variables with $X_1\ge 0$, $X_2\ge 0$. Then
(a) $E(X_1+X_2\mid\mathcal A_0) = E(X_1\mid\mathcal A_0)+E(X_2\mid\mathcal A_0)$ $P$-a.s.,
$E(cX_1\mid\mathcal A_0) = cE(X_1\mid\mathcal A_0)$ (linearity)
(b) $X_1\le X_2$ $P$-a.s. $\Rightarrow E(X_1\mid\mathcal A_0)\le E(X_2\mid\mathcal A_0)$ $P$-a.s. (monotonicity)
(c) $0\le X_1\le X_2\le\ldots$ $P$-a.s. $\Rightarrow E(\lim_nX_n\mid\mathcal A_0) = \lim_nE(X_n\mid\mathcal A_0)$ $P$-a.s.

Remark about (c):
$A := \bigcap_n\{X_n\le X_{n+1}\}$. Due to the hypothesis, $P(A) = 1$, and (b) implies $P(A_0) = 1$ where $A_0 = \bigcap_n\{E(X_n\mid\mathcal A_0)\le E(X_{n+1}\mid\mathcal A_0)\}$ (for all versions $E(X_n\mid\mathcal A_0)$, $E(X_{n+1}\mid\mathcal A_0)$).
We now set $\lim_nX_n(\omega) := \lim_nX_n(\omega)I_A(\omega)$ and $\lim_nE(X_n\mid\mathcal A_0)(\omega) := \lim_nE(X_n\mid\mathcal A_0)(\omega)I_{A_0}(\omega)$.
(c) says that now $\lim_nE(X_n\mid\mathcal A_0)$ is (a version of) the conditional expectation of $\lim_nX_n$, given $\mathcal A_0$, i.e. a random variable with properties 13.7 (i) and 13.7 (ii) with $\mathcal A_0$ and $X = \lim_nX_n$.
Proof:
(a) For each choice of a version $E(X_i\mid\mathcal A_0)$ ($i = 1,2$) we have that $E(X_1\mid\mathcal A_0)+E(X_2\mid\mathcal A_0)$ is a random variable which is measurable with respect to $\mathcal A_0$, and for $Y_0\ge 0$, $Y_0$ measurable with respect to $\mathcal A_0$, we have
$E\bigl(Y_0(E(X_1\mid\mathcal A_0)+E(X_2\mid\mathcal A_0))\bigr) = E(Y_0E(X_1\mid\mathcal A_0)) + E(Y_0E(X_2\mid\mathcal A_0)) \overset{13.7\text{ (ii)}}{=} E(Y_0X_1)+E(Y_0X_2) = E(Y_0(X_1+X_2)).$
(b) Let $B_0 = \{E(X_1\mid\mathcal A_0) > E(X_2\mid\mathcal A_0)\}$. Then $B_0\in\mathcal A_0$ and
$\int_{B_0}\bigl(E(X_1\mid\mathcal A_0)-E(X_2\mid\mathcal A_0)\bigr)\,dP = E\bigl(I_{B_0}(E(X_1\mid\mathcal A_0)-E(X_2\mid\mathcal A_0))\bigr) \overset{13.7\text{ (ii)}}{=} E(I_{B_0}(X_1-X_2)) = \int_{B_0}(X_1-X_2)\,dP \overset{X_1\le X_2\ P\text{-a.s.}}{\le} 0 \Rightarrow P(B_0) = 0.$
(c) Let $Y_0\ge 0$, $Y_0$ measurable with respect to $\mathcal A_0$. Then
$E\bigl(Y_0\lim_nE(X_n\mid\mathcal A_0)\bigr) \overset{\text{mon. conv.}}{=} \lim_nE(Y_0E(X_n\mid\mathcal A_0)) \overset{13.7\text{ (ii)}}{=} \lim_nE(Y_0X_n) \overset{\text{mon. conv.}}{=} E\bigl(Y_0\lim_nX_n\bigr). \qquad\square$
Theorem 13.13
(a) Let $Z_0\ge 0$ be a random variable which is measurable with respect to $\mathcal A_0$. Then
$E(Z_0X\mid\mathcal A_0) = Z_0E(X\mid\mathcal A_0). \qquad (13.8)$
(b) Assume that $\sigma(X)$ and $\mathcal A_0$ are independent. Then
$E(X\mid\mathcal A_0) = E(X). \qquad (13.9)$
Proof:
(a) The right hand side of (13.8) is measurable with respect to $\mathcal A_0$ and for $Y_0\ge 0$, $Y_0$ measurable with respect to $\mathcal A_0$, we have
$E(Y_0(Z_0X)) = E((Y_0Z_0)X) \overset{13.7\text{ (ii)}}{=} E(Y_0Z_0E(X\mid\mathcal A_0)) = E\bigl(Y_0(Z_0E(X\mid\mathcal A_0))\bigr).$
(b) see exercise 45. $\square$

Theorem 13.12 implies the following.

Fatou's Lemma for conditional expectations
$X_n\ge Y$ $P$-a.s. $\forall n$ for some $Y\in L^1$ $\Rightarrow E\bigl(\liminf_nX_n\mid\mathcal A_0\bigr)\le\liminf_nE(X_n\mid\mathcal A_0)$ $P$-a.s.

Lebesgue's Theorem for conditional expectations
$|X_n|\le Y$ $P$-a.s. $\forall n$ for some $Y\in L^1$, $X_n\to X$ $P$-a.s.
$\Rightarrow E\bigl(\lim_nX_n\mid\mathcal A_0\bigr) = \lim_nE(X_n\mid\mathcal A_0)$ $P$-a.s.

Jensen's inequality for conditional expectations
$X\in L^1$, $f$ convex. Then $f(X)$ is semi-integrable and $E(f(X)\mid\mathcal A_0)\ge f(E(X\mid\mathcal A_0))$ $P$-a.s.
Proof: Each convex function $f$ is of the form $f(x) = \sup_nl_n(x)$, $\forall x$, with linear functions $l_n(x) = a_nx+b_n$. In particular, $f\ge l_n$, $l_n(X)\in L^1$.
Since $E(f(X)\mid\mathcal A_0) \overset{13.12\text{(b)}}{\ge} E(l_n(X)\mid\mathcal A_0) \overset{13.12\text{(a)}}{=} l_n(E(X\mid\mathcal A_0))$, we have
$E(f(X)\mid\mathcal A_0) \ge \sup_nl_n(E(X\mid\mathcal A_0)) = f(E(X\mid\mathcal A_0))$ $P$-a.s. $\square$

Corollary 13.14
For $p\ge 1$, conditional expectation is a contraction of $L^p$ in the following sense:
$X\in L^p \Rightarrow E(X\mid\mathcal A_0)\in L^p$ and $\|E(X\mid\mathcal A_0)\|_p\le\|X\|_p$.
Proof: With $f(x) = |x|^p$, Jensen's inequality for conditional expectations implies that
$|E(X\mid\mathcal A_0)|^p\le E(|X|^p\mid\mathcal A_0) \Rightarrow E(|E(X\mid\mathcal A_0)|^p)\le E(|X|^p)$
$\Rightarrow \|E(X\mid\mathcal A_0)\|_p\le\|X\|_p. \qquad\square$

In particular, if $X\in L^2$, then $E(X\mid\mathcal A_0)\in L^2$ and $E(X\mid\mathcal A_0)$ can be interpreted as the best prediction of $X$, given $\mathcal A_0$, in the following sense.
Theorem 13.15 Assume $X\in L^2$, $Y_0$ is measurable with respect to $\mathcal A_0$ and $Y_0\in L^2$.
Then $E\bigl((X-E(X\mid\mathcal A_0))^2\bigr)\le E((X-Y_0)^2)$, and we have "=" if and only if $Y_0 = E(X\mid\mathcal A_0)$ $P$-a.s.
Proof: Assume $X_0$ is a version of $E(X\mid\mathcal A_0)$.
Then $E((X-Y_0)^2) = E(X^2)-2E(XY_0)+E(Y_0^2)$.
For $Y_0 = X_0$, we conclude $E((X-X_0)^2) = E(X^2)-E(X_0^2)$.
Hence $E((X-Y_0)^2) = E((X-X_0)^2) + E((X_0-Y_0)^2)$
$\Rightarrow E((X-Y_0)^2)\ge E((X-X_0)^2)$, with "=" if and only if $X_0 = Y_0$ $P$-a.s. $\square$
Remark Theorem 13.15 says that the conditional expectation $E(X\mid\mathcal A_0)$ is the projection of the element $X$ in the Hilbert space $L^2(\Omega,\mathcal A,P)$ on the closed subspace $L^2(\Omega,\mathcal A_0,P)$.

Theorem 13.16 (Projection property of conditional expectation)
Let $\mathcal A_0$, $\mathcal A_1$ be $\sigma$-fields with $\mathcal A_0\subseteq\mathcal A_1\subseteq\mathcal A$ and $X$ a random variable with $X\ge 0$. Then
$E(E(X\mid\mathcal A_1)\mid\mathcal A_0) = E(X\mid\mathcal A_0)\quad P\text{-a.s.} \qquad (13.10)$
and
$E(E(X\mid\mathcal A_0)\mid\mathcal A_1) = E(X\mid\mathcal A_0)\quad P\text{-a.s.} \qquad (13.11)$
Proof: For $Y_0\ge 0$, $Y_0$ measurable with respect to $\mathcal A_0$ (hence also with respect to $\mathcal A_1$),
$E(Y_0E(X\mid\mathcal A_1)) = E(Y_0X)$, and this proves (13.10).
(13.11) is clear since $E(X\mid\mathcal A_0)$ is measurable with respect to $\mathcal A_1$: use (13.8). $\square$
14 Martingales
14.1 Definition and examples

$(\Omega,\mathcal A,P)$ probability space, $\mathcal A_0\subseteq\mathcal A_1\subseteq\mathcal A_2\subseteq\ldots$ increasing sequence of $\sigma$-fields with $\mathcal A_i\subseteq\mathcal A$, $\forall i$.
Interpretation: $\mathcal A_n$ is the collection of events observable at time $n$.

Definition 14.1 A martingale is a sequence $(M_n)_{n=0,1,\ldots}$ of random variables with
(i) $M_n$ is measurable with respect to $\mathcal A_n$, $\forall n\ge 0$, and $M_n\in L^1$, $\forall n$. $\qquad (14.1)$
(ii) $E(M_{n+1}\mid\mathcal A_n) = M_n$, $\forall n\ge 0$. $\qquad (14.2)$

Remarks 14.2
1. Under the assumption (i), (ii) is equivalent to
$E(M_{n+1}-M_n\mid\mathcal A_n) = 0,\quad \forall n\ge 0. \qquad (14.3)$
2. We are from now on omitting "$P$-a.s." in (14.2), (14.3).
3. We say that $(M_n)$ is adapted to $(\mathcal A_n)$ (meaning that for each $n$, $M_n$ is measurable with respect to $\mathcal A_n$).
4. (14.3) implies that for $n,k\ge 0$,
$E(M_{n+k}-M_n\mid\mathcal A_n) = \sum_{l=1}^kE(M_{n+l}-M_{n+l-1}\mid\mathcal A_n) \overset{\text{Thm. 13.16}}{=} \sum_{l=1}^kE\bigl(E(M_{n+l}-M_{n+l-1}\mid\mathcal A_{n+l-1})\mid\mathcal A_n\bigr) = 0.$
We consider four important (classes of) examples.
14.1.1 Sums of independent centered random variables

$Y_1,Y_2,\ldots$ independent random variables, $Y_i\in L^1$, $\forall i$.
$\mathcal A_n := \sigma(Y_1,\ldots,Y_n)$, $n\ge 1$, $\mathcal A_0 := \{\emptyset,\Omega\}$.
Let $M_n := \sum_{i=1}^n(Y_i - E(Y_i))$, $n\ge 1$, and $M_0 = 0$.
Then $(M_n)$ is a martingale with respect to $(\mathcal A_n)$:
(14.1) is satisfied and we have
$E(M_{n+1}-M_n\mid\mathcal A_n) = E(Y_{n+1}-E(Y_{n+1})\mid\mathcal A_n) = E(Y_{n+1}\mid\mathcal A_n) - E(Y_{n+1}) \overset{\text{Thm. 13.13(b)}}{=} E(Y_{n+1}) - E(Y_{n+1}) = 0.$

Example 14.3 $p\in(0,1)$, $Y_1,Y_2,\ldots$ i.i.d. with $P(Y_i = 1) = p = 1-P(Y_i = -1)$.
$S_n := \sum_{i=1}^nY_i$, $n\ge 1$ ($S_0 = 0$), is the corresponding random walk.
$\mathcal A_n = \sigma(Y_1,\ldots,Y_n)$, $\mathcal A_0 = \{\emptyset,\Omega\}$.
$M_n := S_n - n(2p-1)$ ($n = 0,1,\ldots$).
Then $(M_n)$ is a martingale with respect to $(\mathcal A_n)$.
In the same way, for $x\in\mathbb R$, $\tilde M_n = x+S_n-n(2p-1)$ ($n = 0,1,\ldots$) is a martingale with respect to $(\mathcal A_n)$.
Note that $(S_n)$ is a martingale with respect to $(\mathcal A_n)$ $\iff p = \frac12$.
14.1.2 Successive predictions

Take $X\in L^1$ and set
$M_n := E(X\mid\mathcal A_n),\quad n = 0,1,2,\ldots \qquad (14.4)$
Then $(M_n)$ is a martingale with respect to $(\mathcal A_n)$:
$M_n$ is measurable with respect to $\mathcal A_n$, $\forall n$, and
$E(M_{n+1}\mid\mathcal A_n) = E(E(X\mid\mathcal A_{n+1})\mid\mathcal A_n) \overset{\text{Thm. 13.16}}{=} E(X\mid\mathcal A_n) = M_n.$

14.1.3 Radon-Nikodym derivatives on increasing sequences of σ-fields

Example 14.4 $P$ and $Q$ are probability measures on $(\Omega,\mathcal A)$ with $Q\ll P$.
Let $X := \frac{dQ}{dP}$ be the Radon-Nikodym derivative of $Q$ with respect to $P$.
Let $M_n := \frac{dQ}{dP}\bigr|_{\mathcal A_n}$ be the Radon-Nikodym derivative of $Q|_{\mathcal A_n}$ with respect to $P|_{\mathcal A_n}$.
Claim: $M_n = E(X\mid\mathcal A_n)$, $n = 0,1,2,\ldots$
Consequence: $(M_n)$ is a martingale with respect to $(\mathcal A_n)$ (it falls into class 2).
Proof of the claim: $Z_n := E(X\mid\mathcal A_n)$.
1. $Z_n$ is measurable with respect to $\mathcal A_n$, $\forall n$.
2. We show that $Q(A) = \int_AZ_n\,dP$, $\forall A\in\mathcal A_n$ (and this implies $Z_n = \frac{dQ}{dP}|_{\mathcal A_n}$).
Take $A\in\mathcal A_n$. Then
$Q(A) = \int_AX\,dP$ since $X = \frac{dQ}{dP}$.
$\int_AX\,dP = \int XI_A\,dP \overset{\substack{A\in\mathcal A_n\\ (13.2)}}{=} \int E(X\mid\mathcal A_n)I_A\,dP = \int_AZ_n\,dP. \qquad\square$

Now, Radon-Nikodym derivatives on increasing σ-fields form a martingale even if they are not of the form $E(X\mid\mathcal A_n)$.

Theorem 14.5 $P$ and $Q$ probability measures on $(\Omega,\mathcal A)$, $\mathcal A_0\subseteq\mathcal A_1\subseteq\ldots$ increasing sequence of $\sigma$-fields with $\mathcal A_i\subseteq\mathcal A$, $\forall i$.
We assume $Q|_{\mathcal A_i}\ll P|_{\mathcal A_i}$, $\forall i$.
Let $M_n = \frac{dQ}{dP}\bigr|_{\mathcal A_n}$ ($n = 0,1,\ldots$).
Then $(M_n)$ is a martingale with respect to $(\mathcal A_n)$.
Proof:
1. $M_n$ is measurable with respect to $\mathcal A_n$, $\forall n$.
2. $E(M_{n+1}\mid\mathcal A_n) = M_n$, $\forall n$, with the same argument as in Example 14.4, i.e. we show that $Q(A) = \int_AE(M_{n+1}\mid\mathcal A_n)\,dP$, $\forall A\in\mathcal A_n$.
Take $A\in\mathcal A_n$. Then $Q(A) = \int_AM_n\,dP$ and
$Q(A) = \int I_AM_{n+1}\,dP \overset{A\in\mathcal A_n}{=} \int I_AE(M_{n+1}\mid\mathcal A_n)\,dP = \int_AE(M_{n+1}\mid\mathcal A_n)\,dP$
$\Rightarrow M_n = E(M_{n+1}\mid\mathcal A_n)$ $P$-a.s. $\square$
14.1.4 Harmonic functions of Markov chains

Consider a Markov process with state space $(S,\mathcal S)$ and transition kernel $K(x,dy)$.
A function $h\colon(S,\mathcal S)\to(\bar{\mathbb R},\bar{\mathcal B})$ is harmonic if it satisfies, $\forall x\in S$, the mean value property
$h(x) = \int h(y)\,K(x,dy). \qquad (14.5)$
Take $\Omega = \{\omega = (x_0,x_1,\ldots)\mid x_i\in S\}$, $X_i(\omega) = x_i$, and let $P_x$ be the law of the Markov process $(X_n)$ with $x_0 = x$ and transition kernel $K$. Assume $h$ is harmonic.
If $h(x) < \infty$, $M_n := h(X_n)$ ($n = 0,1,\ldots$) is a martingale with respect to $P_x$ and $\mathcal A_n = \sigma(X_0,\ldots,X_n)$:
$E(h(X_{n+1})\mid\mathcal A_n) = \int h(y)\,K(X_n,dy) \overset{(14.5)}{=} h(X_n)\quad P_x\text{-a.s.}$

Example 14.6 $p\in(0,1)$, $x\in\mathbb Z$, $Y_1,Y_2,\ldots$ i.i.d. with
$P(Y_i = 1) = p = 1-P(Y_i = -1)$, $S_n = \sum_{i=1}^nY_i$ ($S_0 = 0$).
Consider the random walk $X_n = x+S_n$ ($n = 0,1,\ldots$).
$(X_n)$ is a Markov chain with starting point $x$ and transition kernel
$K(z,\cdot) = p\,\delta_{z+1} + (1-p)\,\delta_{z-1}.$
Take $h(y) = \bigl(\frac{1-p}{p}\bigr)^y$ ($y\in\mathbb Z$).
Then $h$ is a harmonic function for $K$:
$\int h(y)\,K(z,dy) = p\,h(z+1) + (1-p)\,h(z-1) = \Bigl(\frac{1-p}{p}\Bigr)^z = h(z)$
$\Rightarrow h(X_n)$ ($n = 0,1,\ldots$) is a martingale with respect to $\mathcal A_n = \sigma(X_0,\ldots,X_n)$.
14.2 Stopping times

Definition 14.7 $(\Omega,\mathcal A)$ measurable space, $(\mathcal A_n)$ increasing sequence of $\sigma$-fields with $\mathcal A_n\subseteq\mathcal A$, $\forall n$.
A stopping time is a function $T\colon\Omega\to\{0,1,\ldots,+\infty\}$ such that
$\{T = n\}\in\mathcal A_n\quad (n = 0,1,\ldots) \qquad (14.6)$
In words: the decision whether to stop at time $n$ can be based on the events observable until time $n$.

Remark (14.6) is equivalent to
$\{T\le n\}\in\mathcal A_n\quad (n = 0,1,\ldots) \qquad (14.7)$
Proof: (14.6) implies that $\{T\le n\} = \bigcup_{k=0}^n\{T = k\}\in\mathcal A_n$.
On the other hand, (14.7) implies that $\{T = n\} = \{T\le n\}\cap\{T\le n-1\}^c\in\mathcal A_n$. $\square$

Example 14.8 If $(X_n)_{n=0,1,\ldots}$ is adapted to $(\mathcal A_n)$ and $A\in\mathcal B$, the first entry time to $A$, given by $T_A(\omega) = \min\{n\ge 0\colon X_n(\omega)\in A\}$ ($\min\emptyset = +\infty$), is a stopping time.
Proof: $\{T_A\le n\} = \bigcup_{k=0}^n\{X_k\in A\}\in\mathcal A_n$. $\square$
But the time of the last visit to $A$, given by $L_A(\omega) = \max\{n\ge 0\colon X_n(\omega)\in A\}$, is in general not a stopping time.
Theorem 14.9 (Optional stopping (Stoppsatz))
$(\Omega,\mathcal A,P)$ probability space, $(\mathcal A_n)$ increasing sequence of $\sigma$-fields.
a) If $(M_n)$ is a martingale with respect to $(\mathcal A_n)$ and $T$ a stopping time, then $(M_{T\wedge n})$ ($n = 0,1,\ldots$) is again a martingale with respect to $(\mathcal A_n)$.
b) If in addition
i) $T$ is bounded, i.e. there is a constant $K$ so that $P(T\le K) = 1$, or
ii) $P(T < \infty) = 1$ and $(M_{T\wedge n})_{n=0,1,\ldots}$ is uniformly integrable,
then
$E(M_T) = E(M_0). \qquad (14.9)$
Proof:
a) Since $\{T\le n\}\in\mathcal A_n$, $M_{T\wedge n}$ is measurable with respect to $\mathcal A_n$, $\forall n$.
$E(M_{T\wedge(n+1)} - M_{T\wedge n}\mid\mathcal A_n) = E\bigl((M_{n+1}-M_n)I_{\{T>n\}}\mid\mathcal A_n\bigr) \overset{\{T>n\}\in\mathcal A_n}{=} I_{\{T>n\}}E(M_{n+1}-M_n\mid\mathcal A_n) \overset{(M_n)\text{ mart.}}{=} 0.$
$\Rightarrow (M_{T\wedge n})$ martingale with respect to $(\mathcal A_n)$.
b) We have to show (14.9) under the above assumptions on $(M_n)$ and $T$.
If $P(T\le K) = 1$, then $(M_{T\wedge n})$ is uniformly integrable, since
$|M_{T\wedge n}|\le\max\{|M_0|,\ldots,|M_K|\}\le|M_0|+\ldots+|M_K| =: Z.$
$|M_{T\wedge n}|\le Z$, $Z\in L^1$ $\Rightarrow (M_{T\wedge n})$ uniformly integrable.
Hence, it suffices to prove (14.9) under assumption (ii).
For $T < \infty$, we have $\lim_nM_{T\wedge n}(\omega) = M_T(\omega)$.
Hence $E(M_0) \overset{\text{a)}}{=} \lim_nE(M_{T\wedge n}) \overset{\substack{(M_{T\wedge n})\\ \text{unif. integr.}}}{=} E\bigl(\lim_nM_{T\wedge n}\bigr) = E(M_T). \qquad\square$
Example 14.10 (Simple random walk)
$Y_1,Y_2,\ldots$ i.i.d. with $P(Y_i = 1) = \frac12 = P(Y_i = -1)$, $S_n = \sum_{k=1}^nY_k$ for $n\ge 1$, $S_0 = 0$,
$\mathcal A_n = \sigma(Y_1,\ldots,Y_n)$, $\mathcal A_0 = \{\emptyset,\Omega\}$.
Then $(S_n)$ is a martingale with respect to $(\mathcal A_n)$, see Example 14.3.
Let $T = \min\{n\ge 0\colon S_n = 1\}$.
Claim:
$P(T < \infty) = 1. \qquad (14.10)$
Proof: see later.
Note $E(S_0) = 0$, $E(S_T) = 1$.
We conclude that $(S_{T\wedge n})$ is not uniformly integrable.
Example 14.11 (The classical ruin problem)
$p\in(0,1)$, $x\in\mathbb Z$, $Y_1,Y_2,\ldots$ i.i.d., $P(Y_i = 1) = p = 1-P(Y_i = -1)$,
$\mathcal A_n = \sigma(Y_1,\ldots,Y_n)$, $\mathcal A_0 = \{\emptyset,\Omega\}$, $X_n = x+S_n$, $S_n = \sum_{k=1}^nY_k$, $S_0 = 0$.
$a,b\in\mathbb Z$, $a < b$, $T = \min\{n\ge 0\colon X_n\notin(a,b)\}$.
Gambling interpretation:
ruin of the gambler if $X_T = a$,
ruin of the casino if $X_T = b$.
Claim: $P(T < \infty) = 1$.
Proof: Let $c = b-a$.
The events $A_k = \{Y_{kc+1} = 1,\ldots,Y_{(k+1)c} = 1\}$ ($k = 0,1,\ldots$) are independent.
The Borel-Cantelli Lemma implies $P\bigl(\bigcup_{k\ge n}A_k\bigr) = 1$, $\forall n$ $\Rightarrow P(T < \infty) = 1$. $\square$
Goal: Calculate the ruin probability $r(x) = P(X_T = a) = P(x+S_T = a)$.
i) If $p = \frac12$, $(X_n)$ is a martingale with respect to $(\mathcal A_n)$.
Clearly, $|X_{T\wedge n}|$ is bounded ($a\le X_{T\wedge n}\le b$) and we can apply Theorem 14.9.
$\Rightarrow x = E(X_0) = E(X_T) = a\,r(x) + b(1-r(x))$
$\Rightarrow r(x) = \frac{b-x}{b-a}. \qquad (14.11)$
ii) If $p\ne\frac12$, we apply Theorem 14.9 to the martingale $h(X_n) = \bigl(\frac{1-p}{p}\bigr)^{X_n} = \bigl(\frac{1-p}{p}\bigr)^{x+S_n}$ from Example 14.6 and we get
$h(x) = E(h(X_0)) = E(h(X_T)) = h(a)r(x) + h(b)(1-r(x))$
$\Rightarrow r(x) = \frac{h(b)-h(x)}{h(b)-h(a)} = \ldots = \frac{1-\bigl(\frac{p}{1-p}\bigr)^{b-x}}{1-\bigl(\frac{p}{1-p}\bigr)^{b-a}} \qquad (14.12)$

Remark If $p < \frac12$,
$r(x)\ge 1-\Bigl(\frac{p}{1-p}\Bigr)^{b-x} \qquad (14.13)$
and this bound does not depend on $a$.
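Formula (14.12) is easy to confirm by simulation; a sketch with arbitrarily chosen parameters:

```python
import numpy as np

def ruin_prob_mc(x, a, b, p, reps=20_000, seed=0):
    rng = np.random.default_rng(seed)
    hits_a = 0
    for _ in range(reps):
        pos = x
        while a < pos < b:                        # run until X_n leaves (a, b)
            pos += 1 if rng.random() < p else -1
        hits_a += (pos == a)                      # ruin of the gambler
    return hits_a / reps

x, a, b, p = 3, 0, 10, 0.45
q = p / (1 - p)
exact = (1 - q**(b - x)) / (1 - q**(b - a))       # formula (14.12)
print(ruin_prob_mc(x, a, b, p), exact)            # Monte Carlo estimate vs. (14.12)
```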
14.3 The Martingale Convergence Theorem

$(\Omega,\mathcal A,P)$ probability space, $(\mathcal A_n)$ increasing sequence of $\sigma$-fields, $(M_n)$ martingale with respect to $(\mathcal A_n)$.
For $a < b$ and $N\ge 1$, let $U_{a,b}^N$ be the number of upcrossings of the interval $[a,b]$ during the time interval $[0,N]$.
More precisely, let $S_0 = T_0 = 0$,
$S_k = \min\{n\ge T_{k-1}\colon M_n\le a\},\qquad T_k = \min\{n\ge S_k\colon M_n\ge b\},$
$U_{a,b}^N = \max\{k\colon T_k\le N\}.$

Lemma 14.12 (Upcrossing inequality)
$E(U_{a,b}^N)\le\frac{E((M_N-a)^-)}{b-a} \qquad (14.14)$
Proof: $S_k$ ($k = 1,2,\ldots$) and $T_k$ ($k = 1,2,\ldots$) are stopping times.
Theorem 14.9 implies that for $Z = \sum_{k=1}^\infty(M_{T_k\wedge N} - M_{S_k\wedge N})$, we have $E(Z) = 0$.
If $U_{a,b}^N = m$, $Z\ge m(b-a) + M_N - M_{S_{m+1}\wedge N}$.
Further, $M_N - M_{S_{m+1}\wedge N}\ge M_N - a$ if $S_{m+1} < N$, and $M_N - M_{S_{m+1}\wedge N} = 0$ else.
$\Rightarrow M_N - M_{S_{m+1}\wedge N}\ge -(M_N-a)^-$
$\Rightarrow Z\ge(b-a)\,U_{a,b}^N - (M_N-a)^-.$
Now (14.14) follows since $E(Z) = 0$. $\square$
Remark
1. Let $U_{a,b} := \lim_{N\to\infty}U_{a,b}^N$. Then monotone convergence implies
$E(U_{a,b}) = \lim_NE(U_{a,b}^N)\le\frac{1}{b-a}\sup_NE((M_N-a)^-) \qquad (14.15)$
2. The right hand side of (14.15) is finite if
$\sup_NE(M_N^-) < \infty. \qquad (14.16)$
Further, (14.16) is equivalent to
$\sup_NE(|M_N|) < \infty. \qquad (14.17)$
Proof: $E(|M_N|) = E(M_N^+) + E(M_N^-)$.
Since $E(M_N^+) - E(M_N^-) = E(M_N) = E(M_0)$, we have
$\sup_NE(|M_N|) < \infty \iff \sup_NE(M_N^-) < \infty \iff \sup_NE(M_N^+) < \infty. \qquad (14.18)$
Theorem 14.13 ((Doob's) Martingale Convergence Theorem)
Assume $(M_n)$ is a martingale which satisfies (14.16) (or (14.17)). Then
$P(M_n$ converges to a finite limit$) = 1$, and for $M_\infty := \lim_nM_n$, we have $M_\infty\in L^1$.
Proof:
1. We have $\{\liminf_nM_n < \limsup_nM_n\}\subseteq\bigcup_{\substack{a,b\in\mathbb Q\\ a<b}}\{U_{a,b} = \infty\}.$
Due to (14.15), (14.16) implies that $P(U_{a,b} < \infty) = 1$, $\forall a,b$, and we conclude that
$\liminf_nM_n = \limsup_nM_n$ $P$-a.s.
$\Rightarrow$ for $P$-almost all $\omega$, $M_\infty(\omega) := \lim_nM_n(\omega)$ exists.
2. We have $E(|M_\infty|)\le\liminf_nE(|M_n|) < \infty$ due to (14.18)
$\Rightarrow M_\infty\in L^1$ (and in particular, $M_\infty$ is finite, $P$-a.s.). $\square$
Example 14.14 Consider simple random walk $(S_n)$, see Example 14.10.
$(S_n)$ is a martingale. Clearly, $(S_n)$ does not converge (since $|S_n-S_{n-1}| = 1$).
In fact, (14.17) does not hold, since the CLT implies that
$\frac{1}{\sqrt n}\,E(|S_n|) \overset{n\to\infty}{\longrightarrow} \frac{1}{\sqrt{2\pi}}\int|x|\,e^{-\frac{x^2}{2}}\,dx = \sqrt{\frac{2}{\pi}}.$
Nevertheless, we can benefit from Theorem 14.13.
Let $c\in\mathbb Z\setminus\{0\}$, $T_c := \min\{n\ge 1\colon S_n = c\}$.
Then $(S_{T_c\wedge n})$ is again a martingale with respect to $(\mathcal A_n)$.
If $c > 0$: $S_{T_c\wedge n}\le c$; if $c < 0$: $S_{T_c\wedge n}\ge c$ $\Bigr\}$ $(S_{T_c\wedge n})$ is a martingale which is bounded above (or below, respectively).
$\Rightarrow (S_{T_c\wedge n})$ converges $P$-a.s.
$\Rightarrow P(T_c < \infty) = 1$, $\forall c\in\mathbb Z\setminus\{0\}$ (this proves (14.10) with $c = 1$)
$\Rightarrow P\bigl(\limsup_nS_n = \infty,\ \liminf_nS_n = -\infty\bigr) = 1.$
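The conclusion $P(T_c < \infty) = 1$ can be watched in a simulation. A sketch for $c = 1$ (the step budget is only a safeguard; as an aside not proved here, $T_1$ is finite a.s. but has infinite expectation, which is why the observed values fluctuate wildly):

```python
import numpy as np

rng = np.random.default_rng(0)

def hitting_time(c=1, max_steps=10_000_000):
    s, n = 0, 0
    while n < max_steps:
        s += 1 if rng.random() < 0.5 else -1   # one step of the simple random walk
        n += 1
        if s == c:
            return n                           # T_c = first time S_n = c
    return None                                # budget exceeded (rare for this budget)

print([hitting_time() for _ in range(8)])
# every run terminates, but the hitting times vary over many orders of magnitude
```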