Luigi Ambrosio, Giuseppe Da Prato, Andrea Mennucci Introduction To Measure Theory and Integration 2011

10
APPUNTI
LECTURE NOTES
Luigi Ambrosio, Giuseppe Da Prato and Andrea Mennucci
Scuola Normale Superiore
Piazza dei Cavalieri, 7
56126 Pisa, Italy
Introduction to Measure Theory and Integration
© 2011 Scuola Normale Superiore Pisa
ISBN: 978-88-7642-385-7
e-ISBN: 978-88-7642-386-4
Contents
Preface
Introduction
1 Measure spaces
1.1 Notation and preliminaries
1.2 Rings, algebras and $\sigma$-algebras
1.3 Additive and $\sigma$-additive functions
1.4 Measurable spaces and measure spaces
1.5 The basic extension theorem
1.5.1 Dynkin systems
1.5.2 The outer measure
1.6 The Lebesgue measure in $\mathbb{R}$
1.7 Inner and outer regularity of measures on metric spaces
2 Integration
2.1 Inverse image of a function
2.2 Measurable and Borel functions
2.3 Partitions and simple functions
2.4 Integral of a nonnegative $\mathcal{E}$-measurable function
2.4.1 Integral of simple functions
2.4.2 The repartition function
2.4.3 The archimedean integral
2.4.4 Integral of a nonnegative measurable function
2.5 Integral of functions with a variable sign
2.6 Convergence of integrals
2.6.1 Uniform integrability and Vitali convergence theorem
2.7 A characterization of Riemann integrable functions
3 Spaces of integrable functions
3.1 Spaces $\mathcal{L}^p(X, \mathcal{E}, \mu)$ and $L^p(X, \mathcal{E}, \mu)$
3.2 The $L^p$ norm
3.2.1 Hölder and Minkowski inequalities
3.3 Convergence in $L^p(X, \mathcal{E}, \mu)$ and completeness
3.4 The space $L^\infty(X, \mathcal{E}, \mu)$
3.5 Dense subsets of $L^p(X, \mathcal{E}, \mu)$
4 Hilbert spaces
4.1 Scalar products, pre-Hilbert and Hilbert spaces
4.2 The projection theorem
4.3 Linear continuous functionals
4.4 Bessel inequality, Parseval identity and orthonormal systems
4.5 Hilbert spaces on $\mathbb{C}$
5 Fourier series
5.1 Pointwise convergence of the Fourier series
5.2 Completeness of the trigonometric system
5.3 Uniform convergence of the Fourier series
6 Operations on measures
6.1 The product measure and Fubini-Tonelli theorem
6.2 The Lebesgue measure on $\mathbb{R}^n$
6.3 Countable products
6.4 Comparison of measures
6.5 Signed measures
6.6 Measures in $\mathbb{R}$
6.7 Convergence of measures on $\mathbb{R}$
6.8 Fourier transform
6.8.1 Fourier transform of a measure
7 The fundamental theorem of the integral calculus
8 Measurable transformations
8.1 Image measure
8.2 Change of variables in multiple integrals
8.3 Image measure of $\mathscr{L}^n$ by a $C^1$ diffeomorphism
A
A.1 Continuity and differentiability of functions depending on a parameter
A.2 The dual space of continuous functions
References
Preface
This textbook collects the notes for an introductory course in measure
theory and integration taught by the authors to undergraduate students of
Scuola Normale Superiore in the last 10 years.
The goal of the course was to present, in a quick but rigorous way,
the modern point of view on measure theory and integration, putting
Lebesgue's theory in $\mathbb{R}^n$ into a more general context and presenting the
basic applications to Fourier series, calculus and real analysis. The text can
also pave the way to more advanced courses in probability, stochastic
processes or geometric measure theory.
Prerequisites for the book are a basic knowledge of calculus in one and
several variables, metric spaces and linear algebra.
All results presented here, as well as their proofs, are classical. We
claim some originality only in the presentation and in the choice of the
exercises. Detailed solutions to the exercises are provided in the final part
of the book.
Pisa, July 2011
Luigi Ambrosio, Giuseppe Da Prato
and Andrea Mennucci
Introduction
This course consists of an introduction to the modern theories of measure
and of integration. Historically, this has been motivated by the necessity
to go beyond the classical theory of Riemann integration, usually taught
in elementary Calculus courses on the real line. It is therefore useful to
describe the reasons that motivate this extension.
(1) It is not possible to give a simple, handy characterization of the class
of Riemann integrable functions within Riemann's theory. This is indeed
possible within the stronger theory, due essentially to Lebesgue, that we
are going to introduce.
(2) The extensions of Riemann's theory to multiple integrals are very
cumbersome. This extension, useful to compute areas, volumes, etc., is
known as Peano-Jordan theory, and it is sometimes taught in elementary
courses of integration in more than one variable. In addition to that,
important heuristic principles like Cavalieri's principle can be proved only under
technical and basically unnecessary regularity assumptions on the
domains of integration.
(3) Many constructive processes typical of Analysis (limits, series,
integrals depending on a parameter, etc.) cannot be handled well within
Riemann's theory of integration. For instance, the following statement is
true (it is a particular case of the so-called dominated convergence
theorem):
Theorem 1. Let $f_h : [-1, 1] \to \mathbb{R}$ be continuous functions pointwise
converging to a continuous function $f$. Assume the existence of a
constant $M$ satisfying $|f_h(x)| \le M$ for all $x \in [-1, 1]$ and all $h \in \mathbb{N}$. Then
\[ \lim_{h\to\infty} \int_{-1}^{1} f_h(x)\,dx = \int_{-1}^{1} f(x)\,dx. \]
Even though this statement makes perfect sense within Riemann's theory,
any attempt to prove this result within the theory (try, if you don't
believe it!) seems to fail, and leads (more or less explicitly, see [2]) to
a larger theory. In addition to that, the continuity assumption on the
limit function $f$ is not natural, because a pointwise limit of continuous
functions need not be continuous, and we would like to give a sense to
$\int_{-1}^{1} f(x)\,dx$ even without this assumption. This necessity emerges for
instance in the study of the convergence of Fourier series
\[ f(x) = \sum_{i=0}^{\infty} a_i \cos(i\pi x) + b_i \sin(i\pi x), \qquad x \in [-1, 1]. \]
In this case the uniform convergence of the series, which implies the
continuity of $f$ as well, is ensured by the condition $\sum_i |a_i| + |b_i| < \infty$.
On the other hand, we will see that the natural condition for the
convergence (in a suitable sense) of the series is much weaker:
\[ \sum_{i=0}^{\infty} a_i^2 + b_i^2 < \infty. \]
Under this condition the limit function $f$ need not be continuous: for
instance, if $f(x) = 1$ for $x \in [-1/2, 1/2]$ and $f(x) = 0$ otherwise, then
we will see that the coefficients of the Fourier series are given by $b_i = 0$
for all $i$ (because $f$ is even) and by
\[ a_i = \begin{cases} \dfrac{1}{2} & \text{if } i = 0; \\[1ex] \dfrac{\sin(i\pi/2)}{i\pi} & \text{if } i > 0. \end{cases} \]
(4) The spaces of integrable functions, as for instance
\[ H := \Bigl\{ f : [-1, 1] \to \mathbb{R} \; : \; \int_{-1}^{1} f^2(x)\,dx < \infty \Bigr\} \]
endowed with the scalar product
\[ \langle f, g \rangle := \int_{-1}^{1} f(x) g(x)\,dx \]
and with the (pseudo) induced distance $d(f, g) = \langle f - g, f - g \rangle^{1/2}$,
are not complete, if we restrict ourselves to Riemann integrable functions
only. In this sense, the path from Riemann's to Lebesgue's theory is the
same one that led from the (incomplete) set of rational numbers $\mathbb{Q}$ to the
(complete) real line $\mathbb{R}$.
Lebesgue's theory extends Riemann's theory in two independent directions.
The first one is concerned, as we already said, with more general
classes of functions, not necessarily continuous or piecewise continuous
(the so-called Borel or measurable functions). The second direction can
be better understood if we recall the very definition of Riemann's integral
\[ \int_{-1}^{1} f(x)\,dx \sim \sum_{i=1}^{n-1} (t_{i+1} - t_i) f(t_i) \]
where $t_1 = -1$, $t_n = 1$ and the approximation is better and better as the
parameter $\sup_{i<n} (t_{i+1} - t_i)$ tends to 0. More generally, instead of integrating
with respect to the length measure, we can integrate with respect to
a generic measure $\mu$ and define
\[ \int_{-1}^{1} f(x)\,d\mu(x) \sim \sum_{i=1}^{n} \mu(A_i) f(x_i) \tag{1} \]
where $A_1, \dots, A_n$ is a partition of $[-1, 1]$, $x_i \in A_i$ and the approximation
is expected to be better and better as the parameter $\sup_i \operatorname{diam}(A_i)$ tends to
0. We may think, for instance, of $[-1, 1]$ as a possibly non-homogeneous
bar, and of $\mu(A)$ as the mass of the subset $A$ of the bar: because of
non-homogeneity, $\mu(A)$ need not be proportional to the length of $A$.
Once we adopt this viewpoint, we will see that it is not hard to obtain a
theory of integration in general metric spaces, and even in more general
classes of spaces. On the other hand, the approximation (1), that in any
case clarifies the intuitive meaning of the integral, will remain valid for
continuous functions only.
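The approximation (1) can be tried numerically. The following Python sketch is only an illustration under assumptions of ours: the measure is described by a hypothetical "mass function" `mass(t)`, meant to return $\mu((-1, t])$, so that $\mu((s, t]) = $ `mass(t) - mass(s)`; the cells $A_i$ are equal half-open intervals and $x_i$ is their midpoint. None of these names or choices comes from the text.

```python
def integral_wrt_measure(f, mass, a=-1.0, b=1.0, n=1000):
    """Approximate the integral of f over (a, b] with respect to mu by sum_i mu(A_i) f(x_i).

    Assumption: mass(t) returns mu((a, t]), hence mu((s, t]) = mass(t) - mass(s).
    """
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        s, t = a + i * h, a + (i + 1) * h
        x_i = 0.5 * (s + t)           # sample point x_i chosen in A_i = (s, t]
        mu_A_i = mass(t) - mass(s)    # mass of the cell A_i
        total += mu_A_i * f(x_i)
    return total


def uniform_mass(t):
    """Homogeneous bar: mu is the length measure, mu((-1, t]) = t + 1."""
    return t + 1.0


def linear_mass(t):
    """Non-homogeneous bar with density 1 + x: mu((-1, t]) = (t + 1) + (t**2 - 1) / 2."""
    return (t + 1.0) + (t * t - 1.0) / 2.0


print(integral_wrt_measure(lambda x: x, uniform_mass))
print(integral_wrt_measure(lambda x: x, linear_mass))
```

For $f(x) = x$ the exact values are $0$ for the homogeneous bar and $\int_{-1}^{1} x(1 + x)\,dx = 2/3$ for the bar with density $1 + x$, so the same function has different integrals with respect to different measures.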
Chapter 1
Measure spaces
In this chapter we shall introduce all basic concepts of measure theory,
adopting the point of view of measures as set functions. The domains
of measures may have different stability properties, and this leads to the
concepts of ring, algebra and $\sigma$-algebra. The most basic tool developed
in the chapter is Carathéodory's theorem, which ensures in many cases
the existence and the uniqueness of a $\sigma$-additive measure having some
prescribed values on a set of generators of the $\sigma$-algebra. In the final
part of the chapter we will apply these abstract tools to the problem of
constructing a "length" measure on the real line, the so-called Lebesgue
measure, and we will study its main properties.
1.1. Notation and preliminaries
We denote by $\mathbb{N} = \{0, 1, 2, \dots\}$ the set of natural numbers, and by $\mathbb{N}^*$ the
set of positive natural numbers. Unless otherwise stated, sequences will
always be indexed by natural numbers.
We shall denote by $X$ a non-empty set, by $\mathcal{P}(X)$ the set of all subsets of
$X$ and by $\emptyset$ the empty set. For any subset $A$ of $X$ we shall denote by $A^c$
its complement $A^c := \{x \in X : x \notin A\}$. If $A, B \in \mathcal{P}(X)$ we denote
by $A \setminus B$ the relative complement $A \cap B^c$, and by $A \Delta B$ the symmetric
difference $(A \setminus B) \cup (B \setminus A)$.
Let $(A_n)$ be a sequence in $\mathcal{P}(X)$. Then the following De Morgan
identity holds,
\[ \Bigl( \bigcup_{n=0}^{\infty} A_n \Bigr)^c = \bigcap_{n=0}^{\infty} A_n^c. \]
Moreover, we define (1)
\[ \limsup_{n\to\infty} A_n := \bigcap_{n=0}^{\infty} \bigcup_{m=n}^{\infty} A_m, \qquad \liminf_{n\to\infty} A_n := \bigcup_{n=0}^{\infty} \bigcap_{m=n}^{\infty} A_m. \]
(1) Notice the analogy with liminf and limsup limits for a sequence $(a_n)$ of real numbers. We have
$\limsup_{n\to\infty} a_n = \inf_{n\in\mathbb{N}} \sup_{m\ge n} a_m$ and $\liminf_{n\to\infty} a_n = \sup_{n\in\mathbb{N}} \inf_{m\ge n} a_m$. This is something more than an analogy,
see Exercise 1.1.
As can be easily checked, $\limsup_n A_n$ (respectively $\liminf_n A_n$) consists
of those elements of $X$ that belong to infinitely many $A_n$ (respectively
that belong to all but finitely many $A_n$).
It is easy to check that if $(A_n)$ is nondecreasing (i.e. $A_n \subset A_{n+1}$, $n \in \mathbb{N}$),
we have
\[ \liminf_{n\to\infty} A_n = \limsup_{n\to\infty} A_n = \bigcup_{n=0}^{\infty} A_n, \]
whereas if $(A_n)$ is nonincreasing (i.e. $A_n \supset A_{n+1}$, $n \in \mathbb{N}$), we have
\[ \liminf_{n\to\infty} A_n = \limsup_{n\to\infty} A_n = \bigcap_{n=0}^{\infty} A_n. \]
Denoting by $L$ the common value of the two limits, in the first case we shall
write $A_n \uparrow L$, and in the second one $A_n \downarrow L$.
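The two definitions can be evaluated exactly for sequences of sets that are eventually periodic. The Python sketch below is an illustration under that assumption (a finite `prefix` followed by a non-empty `cycle` repeated forever); the function name and the representation of the sequence are ours. For such a sequence the tail unions and tail intersections stop changing once $n$ passes the prefix, so finitely many of them suffice.

```python
def limsup_liminf(prefix, cycle):
    """Return (limsup, liminf) of the sequence prefix[0], ..., prefix[-1], cycle, cycle, ...

    Assumption: the sequence is eventually periodic and `cycle` is non-empty.
    """
    cycle_union = frozenset().union(*cycle)
    cycle_inter = frozenset.intersection(*map(frozenset, cycle))
    tail_unions, tail_inters = [], []
    # For n >= len(prefix) the tails are constant, so n = 0, ..., len(prefix) is enough.
    for n in range(len(prefix) + 1):
        head = [frozenset(s) for s in prefix[n:]]
        tail_unions.append(cycle_union.union(*head))         # union of A_m over m >= n
        tail_inters.append(cycle_inter.intersection(*head))  # intersection of A_m over m >= n
    limsup = frozenset.intersection(*tail_unions)
    liminf = frozenset().union(*tail_inters)
    return limsup, liminf


# A_0 = {0,1,2,3}, then {0,1}, {1,2}, {0,1}, {1,2}, ...
limsup, liminf = limsup_liminf([{0, 1, 2, 3}], [{0, 1}, {1, 2}])
print(sorted(limsup), sorted(liminf))
```

In this example the point 3 belongs to $A_0$ only, the points 0 and 2 belong to infinitely many $A_n$ but not to all but finitely many, and 1 belongs to every $A_n$.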
1.2. Rings, algebras and $\sigma$-algebras
Definition 1.1 (Rings and Algebras). A non-empty subset $\mathcal{A}$ of $\mathcal{P}(X)$
is said to be a ring if:
(i) $\emptyset$ belongs to $\mathcal{A}$;
(ii) $A, B \in \mathcal{A} \Longrightarrow A \cup B,\ A \cap B \in \mathcal{A}$;
(iii) $A, B \in \mathcal{A} \Longrightarrow A \setminus B \in \mathcal{A}$.
We say that a ring is an algebra if $X \in \mathcal{A}$.
Notice that rings are stable only with respect to relative complement,
whereas algebras are stable under complement in $X$.
Let $\mathcal{K} \subset \mathcal{P}(X)$. As the intersection of any family of algebras is still
an algebra, the minimal algebra including $\mathcal{K}$ (that is the intersection of all
algebras including $\mathcal{K}$) is well defined, and called the algebra generated
by $\mathcal{K}$. A constructive characterization of the algebra generated by $\mathcal{K}$ can
be easily achieved as follows: set $\mathcal{F}^{(0)} = \mathcal{K} \cup \{\emptyset\}$ and
\[ \mathcal{F}^{(i+1)} := \bigl\{ A \cup B,\ A^c : A, B \in \mathcal{F}^{(i)} \bigr\} \qquad \forall i \ge 0. \]
Then, the algebra $\mathcal{A}$ generated by $\mathcal{K}$ is given by $\bigcup_i \mathcal{F}^{(i)}$. Indeed, it is
immediate to check by induction on $i$ that $\mathcal{A} \supset \mathcal{F}^{(i)}$, and therefore the
union of the $\mathcal{F}^{(i)}$'s is contained in $\mathcal{A}$. On the other hand, this union is
easily seen to be an algebra, so the minimality of $\mathcal{A}$ provides the opposite
inclusion.
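On a finite set $X$ the iteration $\mathcal{F}^{(0)} \subset \mathcal{F}^{(1)} \subset \cdots$ stabilizes after finitely many steps, so the generated algebra can be computed directly. The following Python sketch follows the construction above; the function name and the use of `frozenset` to represent subsets are our own choices.

```python
def generated_algebra(X, K):
    """Algebra of subsets of the finite set X generated by the family K.

    Iterates F^(i+1) = {A | B, X - A : A, B in F^(i)} starting from
    F^(0) = K together with the empty set, until nothing new appears.
    """
    X = frozenset(X)
    family = {frozenset(A) for A in K} | {frozenset()}
    while True:
        new_family = set(family)
        for A in family:
            new_family.add(X - A)           # complement in X
            for B in family:
                new_family.add(A | B)       # union
        if new_family == family:            # F^(i+1) = F^(i): the union of all F^(i) is reached
            return family
        family = new_family


algebra = generated_algebra({1, 2, 3, 4}, [{1}, {2}])
print(sorted(sorted(A) for A in algebra))
```

With $X = \{1, 2, 3, 4\}$ and $\mathcal{K} = \{\{1\}, \{2\}\}$ the result consists of the $2^3 = 8$ unions of the atoms $\{1\}$, $\{2\}$, $\{3, 4\}$; in particular $\{3\}$ does not belong to it.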
Definition 1.2 ($\sigma$-algebras). A non-empty subset $\mathcal{E}$ of $\mathcal{P}(X)$ is said to
be a $\sigma$-algebra if:
(i) $\mathcal{E}$ is an algebra;
(ii) if $(A_n)$ is a sequence of elements of $\mathcal{E}$ then $\bigcup_{n=0}^{\infty} A_n \in \mathcal{E}$.
If $\mathcal{E}$ is a $\sigma$-algebra and $(A_n) \subset \mathcal{E}$ we have $\bigcap_n A_n \in \mathcal{E}$ by the De
Morgan identity. Moreover, both sets
\[ \liminf_{n\to\infty} A_n, \qquad \limsup_{n\to\infty} A_n, \]
belong to $\mathcal{E}$.
Obviously, $\{\emptyset, X\}$ and $\mathcal{P}(X)$ are $\sigma$-algebras, respectively the smallest
and the largest ones. Let $\mathcal{K}$ be a subset of $\mathcal{P}(X)$. As the intersection
of any family of $\sigma$-algebras is still a $\sigma$-algebra, the minimal $\sigma$-algebra
including $\mathcal{K}$ (that is the intersection of all $\sigma$-algebras including $\mathcal{K}$) is
well defined, and called the $\sigma$-algebra generated by $\mathcal{K}$. It is denoted by
$\sigma(\mathcal{K})$.
In contrast with the case of generated algebras, it is quite hard to give
a constructive characterization of the generated $\sigma$-algebras: this requires
transfinite induction and it is illustrated in Exercise 1.18.
Definition 1.3 (Borel $\sigma$-algebra). If $(E, d)$ is a metric space, the
$\sigma$-algebra generated by all open subsets of $E$ is called the Borel $\sigma$-algebra
of $E$ and it is denoted by $\mathscr{B}(E)$.
In the case when $E = \mathbb{R}$ the Borel $\sigma$-algebra has a particularly simple
class of generators.
Example 1.4 ($\mathscr{B}(\mathbb{R})$). Let $\mathcal{I}$ be the set of all semi-closed intervals $[a, b)$
with $a \le b$. Then $\sigma(\mathcal{I})$ coincides with $\mathscr{B}(\mathbb{R})$. In fact $\sigma(\mathcal{I})$ contains all
open intervals $(a, b)$ since
\[ (a, b) = \bigcup_{n=n_0}^{\infty} \Bigl[ a + \frac{1}{n}, b \Bigr) \qquad \text{with } n_0 > \frac{1}{b - a}. \]
Moreover, any open set $A$ in $\mathbb{R}$ is a countable union of open intervals. (2)
Conversely, each interval $[a, b) = \bigcap_{n \ge 1} (a - 1/n, b)$ is a Borel set, so that
$\sigma(\mathcal{I}) \subset \mathscr{B}(\mathbb{R})$.
An analogous argument proves that $\mathscr{B}(\mathbb{R})$ is generated by semi-closed
intervals $(a, b]$, by open intervals, by closed intervals and even by open
or closed half-lines.
(2) Indeed, let $(a_k)$ be a sequence including all rational numbers of $A$ and denote by $I_k$ the largest
open interval contained in $A$ and containing $a_k$. We clearly have $A \supset \bigcup_{k=0}^{\infty} I_k$, but also the opposite
inclusion holds: it suffices to consider, for any $x \in A$, $r > 0$ such that $(x - r, x + r) \subset A$, and $k$
such that $a_k \in (x - r, x + r)$ to obtain $(x - r, x + r) \subset I_k$, by the maximality of $I_k$, and then $x \in I_k$.
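1.3. Additive and $\sigma$-additive functions
Let $\mathcal{A} \subset \mathcal{P}(X)$ be a ring and let $\mu$ be a mapping from $\mathcal{A}$ into $[0, +\infty]$
such that $\mu(\emptyset) = 0$. We say that $\mu$ is additive if
\[ A, B \in \mathcal{A},\ A \cap B = \emptyset \ \Longrightarrow\ \mu(A \cup B) = \mu(A) + \mu(B). \]
If $\mu$ is additive, $A, B \in \mathcal{A}$ and $A \supset B$, we have $\mu(A) = \mu(B) + \mu(A \setminus B)$,
so that $\mu(A) \ge \mu(B)$. Therefore any additive function is nondecreasing
with respect to set inclusion. Moreover, by applying repeatedly the
additivity property, additive measures satisfy
\[ \mu\Bigl( \bigcup_{k=1}^{n} A_k \Bigr) = \sum_{k=1}^{n} \mu(A_k) \]
for $n \in \mathbb{N}^*$ and mutually disjoint sets $A_1, \dots, A_n \in \mathcal{A}$.
A set function $\mu$ on $\mathcal{A}$ is called $\sigma$-additive if $\mu(\emptyset) = 0$ and for any
sequence $(A_n) \subset \mathcal{A}$ of mutually disjoint sets such that $\bigcup_n A_n \in \mathcal{A}$ we
have
\[ \mu\Bigl( \bigcup_{n=0}^{\infty} A_n \Bigr) = \sum_{n=0}^{\infty} \mu(A_n). \]
Obviously $\sigma$-additive functions are additive, because we can consider
countable unions in which only finitely many $A_n$ are nonempty.
Another useful concept is $\sigma$-subadditivity: we say that $\mu$ is
$\sigma$-subadditive if
\[ \mu(B) \le \sum_{n=0}^{\infty} \mu(A_n), \]
for any $B \in \mathcal{A}$ and any sequence $(A_n) \subset \mathcal{A}$ such that $B \subset \bigcup_n A_n$.
Notice that, unlike the definition of $\sigma$-additivity, the sets $A_n$ need not be
disjoint here.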
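Remark 1.5 ($\sigma$-additivity and $\sigma$-subadditivity). Let $\mu$ be additive on
a ring $\mathcal{A}$ and let $(A_n) \subset \mathcal{A}$ be mutually disjoint and such that $\bigcup_n A_n \in \mathcal{A}$.
Then by monotonicity we have
\[ \mu\Bigl( \bigcup_{n=0}^{\infty} A_n \Bigr) \ge \mu\Bigl( \bigcup_{n=0}^{k} A_n \Bigr) = \sum_{n=0}^{k} \mu(A_n), \qquad \text{for all } k \in \mathbb{N}. \]
Therefore, letting $k \to \infty$ we get
\[ \mu\Bigl( \bigcup_{n=0}^{\infty} A_n \Bigr) \ge \sum_{n=0}^{\infty} \mu(A_n). \]
Thus, to show that an additive function is $\sigma$-additive, it is enough to
prove that it is $\sigma$-subadditive.
Conversely, it is not difficult to show that $\sigma$-additive set functions are
$\sigma$-subadditive: indeed, if $B \subset \bigcup_n A_n$ we can define $A'_0 = B \cap A_0$ and
$A'_n := (B \cap A_n) \setminus \bigcup_{m<n} A_m$ for $n \in \mathbb{N}^*$, so that $B$ is the disjoint union of
the sets $A'_n$, to obtain
\[ \mu(B) = \sum_{n=0}^{\infty} \mu(A'_n) \le \sum_{n=0}^{\infty} \mu(A_n). \]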
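Let $\mu$ be additive on $\mathcal{A}$. Then $\sigma$-additivity of $\mu$ is equivalent to
continuity of $\mu$ in the sense of the following proposition.
Proposition 1.6 (Continuity on nondecreasing sequences). If $\mu$ is
additive on a ring $\mathcal{A}$, then (i) $\Longleftrightarrow$ (ii), where:
(i) $\mu$ is $\sigma$-additive;
(ii) $(A_n) \subset \mathcal{A}$ and $A \in \mathcal{A}$, $A_n \uparrow A \ \Longrightarrow\ \mu(A_n) \uparrow \mu(A)$.
Proof. (i) $\Longrightarrow$ (ii). In the proof of this implication we can assume with no
loss of generality that $\mu(A_n) < \infty$ for all $n \in \mathbb{N}$. Let $(A_n) \subset \mathcal{A}$, $A \in \mathcal{A}$,
$A_n \uparrow A$. Then
\[ A = A_0 \cup \bigcup_{n=0}^{\infty} (A_{n+1} \setminus A_n), \]
the unions being disjoint. Since $\mu$ is $\sigma$-additive, we deduce that
\[ \mu(A) = \mu(A_0) + \sum_{n=0}^{\infty} \bigl( \mu(A_{n+1}) - \mu(A_n) \bigr) = \lim_{n\to\infty} \mu(A_n), \]
and (ii) follows.
(ii) $\Longrightarrow$ (i). Let $(A_n) \subset \mathcal{A}$ be mutually disjoint and such that
$A := \bigcup_n A_n \in \mathcal{A}$. Set
\[ B_m := \bigcup_{k=0}^{m} A_k. \]
Then $B_m \uparrow A$ and $\mu(B_m) = \sum_{k=0}^{m} \mu(A_k) \uparrow \mu(A)$ by the assumption. This
implies (i).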
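Proposition 1.7 (Continuity on nonincreasing sequences). Let $\mu$ be
$\sigma$-additive on a ring $\mathcal{A}$. Then
\[ (A_n) \subset \mathcal{A} \text{ and } A \in \mathcal{A},\ A_n \downarrow A,\ \mu(A_0) < \infty \ \Longrightarrow\ \mu(A_n) \downarrow \mu(A). \tag{1.1} \]
Proof. Setting $B_n := A_0 \setminus A_n$, $B := A_0 \setminus A$, we have $B_n \uparrow B$, therefore the
previous proposition gives $\mu(B_n) \uparrow \mu(B)$. As $\mu(A_n) = \mu(A_0) - \mu(B_n)$
and $\mu(A) = \mu(A_0) - \mu(B)$ the proof is achieved.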
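Corollary 1.8 (Upper and lower semicontinuity of the measure). Let
$\mu$ be $\sigma$-additive on a $\sigma$-algebra $\mathcal{E}$ and let $(A_n) \subset \mathcal{E}$. Then we have
\[ \mu\Bigl( \liminf_{n\to\infty} A_n \Bigr) \le \liminf_{n\to\infty} \mu(A_n) \tag{1.2} \]
and, if $\mu(X) < \infty$, we have also
\[ \limsup_{n\to\infty} \mu(A_n) \le \mu\Bigl( \limsup_{n\to\infty} A_n \Bigr). \tag{1.3} \]
Proof. Set $L := \limsup_n A_n$. Then we can write
\[ L = \bigcap_{n=0}^{\infty} B_n \qquad \text{where } B_n := \bigcup_{m=n}^{\infty} A_m. \tag{1.4} \]
Now, assuming $\mu(X) < \infty$, by Proposition 1.7 it follows that
\[ \mu(L) = \lim_{n\to\infty} \mu(B_n) = \inf_{n\in\mathbb{N}} \mu(B_n) \ge \inf_{n\in\mathbb{N}} \sup_{m\ge n} \mu(A_m) = \limsup_{n\to\infty} \mu(A_n). \]
Thus, we have proved (1.3). The inequality (1.2) can be proved similarly
using Proposition 1.6, thus without using the assumption $\mu(X) < \infty$.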
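The following result is very useful to estimate the measure of a limsup
of sets.
Lemma 1.9. Let $\mu$ be $\sigma$-additive on a $\sigma$-algebra $\mathcal{E}$ and let $(A_n) \subset \mathcal{E}$.
Assume that $\sum_{n=0}^{\infty} \mu(A_n) < \infty$. Then
\[ \mu\Bigl( \limsup_{n\to\infty} A_n \Bigr) = 0. \]
Proof. Set $L = \limsup_n A_n$ and define $B_n$ as in (1.4). Then the inclusion
$L \subset B_n$ gives
\[ \mu(L) \le \mu(B_n) \le \sum_{m=n}^{\infty} \mu(A_m) \qquad \text{for all } n \in \mathbb{N}. \]
As $n \to \infty$ we find $\mu(L) = 0$.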
1.4. Measurable spaces and measure spaces
Let $\mathcal{E}$ be a $\sigma$-algebra of subsets of $X$. Then we say that the pair $(X, \mathcal{E})$
is a measurable space. Let $\mu : \mathcal{E} \to [0, +\infty]$ be a $\sigma$-additive function.
Then we call $\mu$ a measure on $(X, \mathcal{E})$, and we call the triple $(X, \mathcal{E}, \mu)$ a
measure space.
The measure $\mu$ is said to be finite if $\mu(X) < \infty$, $\sigma$-finite if there exists
a sequence $(A_n) \subset \mathcal{E}$ such that $\bigcup_n A_n = X$ and $\mu(A_n) < \infty$ for all
$n \in \mathbb{N}$. Finally, $\mu$ is called a probability measure if $\mu(X) = 1$.
The simplest (but fundamental) example of a probability measure is
the Dirac mass $\delta_x$, defined by
\[ \delta_x(B) := \begin{cases} 1 & \text{if } x \in B \\ 0 & \text{if } x \notin B. \end{cases} \]
This example can be generalized as follows, see also Exercise 1.5 and
Exercise 1.23.
Example 1.10 (Discrete measures). Assume that $Y \subset X$ is a finite or
countable set. Given $c : Y \to [0, +\infty]$ we can define a measure $\mu$ on
$(X, \mathcal{P}(X))$ as follows:
\[ \mu(B) := \sum_{x \in B \cap Y} c(x) \qquad \forall B \subset X. \]
Clearly $\mu = \sum_{x \in Y} c(x)\,\delta_x$ is a finite measure if and only if $\sum_{x \in Y} c(x) < \infty$,
and it is $\sigma$-finite if and only if $c(x) \in [0, +\infty)$ for all $x \in Y$.
More generally, the construction above works even when $Y$ is uncountable,
by replacing the sum with
\[ \sup_{Y'} \sum_{x \in B \cap Y'} c(x), \]
where the supremum is made among the finite subsets $Y'$ of $Y$. The measures
arising in the previous example are called atomic, and clearly if $X$ is
either finite or countable then any measure $\mu$ in $(X, \mathcal{P}(X))$ is atomic: it
suffices to notice that
\[ \mu = \sum_{x \in X} c(x)\,\delta_x \qquad \text{with } c(x) := \mu(\{x\}). \]
In the next section we will introduce a fundamental tool for the construction
of non-atomic measures.
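For a finite set $Y$ the Dirac mass and the discrete measure of Example 1.10 are easy to write down explicitly. The Python sketch below is only an illustration: the names `dirac` and `discrete_measure` are ours, the weight function $c$ is passed as a dictionary, and exact fractions are used so that the identities can be checked without rounding.

```python
from fractions import Fraction


def dirac(x):
    """Dirac mass at x: B -> 1 if x belongs to B, 0 otherwise."""
    return lambda B: 1 if x in B else 0


def discrete_measure(c):
    """Discrete measure mu(B) = sum of c(x) over the points x of Y lying in B.

    Assumption: c is a dict mapping the finitely many points of Y to weights >= 0.
    """
    def mu(B):
        return sum(weight for x, weight in c.items() if x in B)
    return mu


c = {1: Fraction(1, 2), 2: Fraction(1, 3), 5: Fraction(1, 6)}
mu = discrete_measure(c)

A, B = {1, 2}, {5, 7}            # disjoint subsets of X
print(mu(A), mu(B), mu(A | B))   # additivity: mu(A | B) equals mu(A) + mu(B)

# mu coincides with the combination of Dirac masses sum_x c(x) * delta_x
same = all(mu(S) == sum(w * dirac(x)(S) for x, w in c.items()) for S in (A, B, A | B, set()))
print(same)
```

Since the weights add up to 1, this particular $\mu$ is a probability measure.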
Definition 1.11 ($\mu$-negligible sets and $\mu$-almost everywhere). Given
a measure space $(X, \mathcal{E}, \mu)$, we say that $B \in \mathcal{E}$ is $\mu$-negligible if $\mu(B) = 0$,
and we say that a property $P(x)$ holds $\mu$-almost everywhere if the set
\[ \{x \in X : P(x) \text{ is false}\} \]
is contained in a $\mu$-negligible set.
Notice that the class of $\mu$-negligible sets is stable under finite or countable
unions. It is sometimes convenient to know that any subset of a
$\mu$-negligible set is still $\mu$-negligible.
Definition 1.12 ($\mu$-completion of a $\sigma$-algebra and $\mu$-measurable sets).
Let $(X, \mathcal{E}, \mu)$ be a measure space. We define
\[ \mathcal{E}_\mu := \{A \in \mathcal{P}(X) : \text{for some } B, C \in \mathcal{E} \text{ with } \mu(C) = 0,\ A \Delta B \subset C\}. \]
It is easy to check that $\mathcal{E}_\mu$ is still a $\sigma$-algebra, the so-called completion of
$\mathcal{E}$ with respect to $\mu$. The elements of $\mathcal{E}_\mu$ are called $\mu$-measurable sets.
It is also easy to check that $\mu$ can be extended to all $A \in \mathcal{E}_\mu$ simply
by setting $\mu(A) = \mu(B)$, where $B \in \mathcal{E}$ is any set such that $A \Delta B$ is
contained in a $\mu$-negligible set of $\mathcal{E}$. This extension is well defined (i.e.
independent of the choice of $B$), still $\sigma$-additive, and its $\mu$-negligible sets
coincide with those sets that are contained in some $B \in \mathcal{E}$ with $\mu(B) = 0$.
As a consequence, any subset of a $\mu$-negligible set is $\mu$-negligible as
well.
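On a finite set the completion of Definition 1.12 can be computed by brute force. The Python sketch below assumes that the $\sigma$-algebra $\mathcal{E}$ and the measure are given together as a dictionary from sets (as `frozenset`) to values; the function names are ours. For each subset $A$ of $X$ it looks for $B, C \in \mathcal{E}$ with $\mu(C) = 0$ and $A \Delta B \subset C$, and records the extended value $\mu(A) = \mu(B)$.

```python
from itertools import combinations


def powerset(X):
    """All subsets of the finite set X, as frozensets."""
    xs = list(X)
    return [frozenset(s) for r in range(len(xs) + 1) for s in combinations(xs, r)]


def completion(X, mu):
    """Return {A: extended mu(A)} for all A in the completion of the sigma-algebra.

    Assumption: mu is a dict whose keys are exactly the sets of the sigma-algebra E.
    """
    null_sets = [C for C, value in mu.items() if value == 0]
    extended = {}
    for A in powerset(X):
        for B in mu:
            # A ^ B is the symmetric difference; <= tests inclusion
            if any((A ^ B) <= C for C in null_sets):
                extended[A] = mu[B]
                break
    return extended


X = {1, 2, 3}
mu = {frozenset(): 0, frozenset({1}): 1, frozenset({2, 3}): 0, frozenset(X): 1}
ext = completion(X, mu)
print(len(ext), ext[frozenset({2})], ext[frozenset({1, 2})])
```

Here $\{2, 3\}$ is negligible, so its subsets $\{2\}$ and $\{3\}$ become measurable with measure 0 and the completion is the whole of $\mathcal{P}(X)$.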
1.5. The basic extension theorem
The following result, due to Carathéodory, allows one to extend a $\sigma$-additive
function on a ring $\mathcal{A}$ to a $\sigma$-additive function on $\sigma(\mathcal{A})$. It is one of the
basic tools in the construction of non-trivial measures in many cases of
interest, as we will see.
Theorem 1.13 (Carathéodory). Let $\mathcal{A} \subset \mathcal{P}(X)$ be a ring, and let $\mathcal{E}$
be the $\sigma$-algebra generated by $\mathcal{A}$. Let $\mu : \mathcal{A} \to [0, +\infty]$ be $\sigma$-additive.
Then $\mu$ can be extended to a measure on $\mathcal{E}$. If $\mu$ is $\sigma$-finite, i.e. there
exist $A_n \in \mathcal{A}$ with $A_n \uparrow X$ and $\mu(A_n) < \infty$ for any $n$, then the extension
is unique.
To prove this theorem we need some preliminaries: for the uniqueness
the Dynkin theorem and for the existence the concepts of outer measure
and additive set.
1.5.1. Dynkin systems
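A non-empty subset $\mathcal{K}$ of $\mathcal{P}(X)$ is called a $\pi$-system if
\[ A, B \in \mathcal{K} \ \Longrightarrow\ A \cap B \in \mathcal{K}. \]
A non-empty subset $\mathcal{D}$ of $\mathcal{P}(X)$ is called a Dynkin system if
(i) $X, \emptyset \in \mathcal{D}$;
(ii) $A \in \mathcal{D} \ \Longrightarrow\ A^c \in \mathcal{D}$;
(iii) $(A_i) \subset \mathcal{D}$ mutually disjoint $\ \Longrightarrow\ \bigcup_i A_i \in \mathcal{D}$.
Obviously any $\sigma$-algebra is a Dynkin system. Moreover, if $\mathcal{D}$ is both a
Dynkin system and a $\pi$-system then it is a $\sigma$-algebra. In fact, if $(A_i)$ is
a sequence in $\mathcal{D}$ of not necessarily disjoint sets we have
\[ \bigcup_{i=0}^{\infty} A_i = A_0 \cup (A_1 \setminus A_0) \cup ((A_2 \setminus A_1) \setminus A_0) \cup \cdots \]
and so $\bigcup_i A_i \in \mathcal{D}$ by (ii) and (iii).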
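That a Dynkin system need not be a $\pi$-system (and hence need not be a $\sigma$-algebra) can be seen on a four-point set: the subsets of even cardinality form a Dynkin system that is not stable under intersection. The Python sketch below checks this example, which is ours and not taken from the text. On a finite set, stability under countable disjoint unions reduces to stability under unions of two disjoint sets, and this is what the code tests.

```python
from itertools import combinations


def powerset(X):
    """All subsets of the finite set X, as frozensets."""
    xs = list(X)
    return [frozenset(s) for r in range(len(xs) + 1) for s in combinations(xs, r)]


def is_dynkin_system(X, D):
    """Check (i), (ii), (iii) for a family D of subsets of the finite set X."""
    X = frozenset(X)
    if X not in D or frozenset() not in D:
        return False
    if any(X - A not in D for A in D):
        return False
    # pairwise disjoint unions suffice on a finite set (induction on the number of sets)
    return all(A | B in D for A in D for B in D if not (A & B))


def is_pi_system(D):
    """Check stability under intersection."""
    return all(A & B in D for A in D for B in D)


X = {1, 2, 3, 4}
even_sets = {A for A in powerset(X) if len(A) % 2 == 0}
print(is_dynkin_system(X, even_sets), is_pi_system(even_sets))
```

The failure of the $\pi$-system property is witnessed by $\{1, 2\} \cap \{2, 3\} = \{2\}$, which has odd cardinality.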
Let us prove now the following important result.
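Theorem 1.14 (Dynkin). Let $\mathcal{K}$ be a $\pi$-system and let $\mathcal{D} \supset \mathcal{K}$ be a
Dynkin system. Then $\sigma(\mathcal{K}) \subset \mathcal{D}$.
Proof. Let $\mathcal{D}_0$ be the minimal Dynkin system including $\mathcal{K}$. We are going
to show that $\mathcal{D}_0$ is a $\sigma$-algebra, which will prove the theorem. For this
it is enough to show, as remarked before, that the following implication
holds:
\[ A, B \in \mathcal{D}_0 \ \Longrightarrow\ A \cap B \in \mathcal{D}_0. \tag{1.5} \]
For any $B \in \mathcal{D}_0$ we set
\[ \mathcal{H}(B) = \{F \in \mathcal{D}_0 : B \cap F \in \mathcal{D}_0\}. \]
We claim that $\mathcal{H}(B)$ is a Dynkin system. In fact properties (i) and (iii)
are clear. It remains to show that if $F \cap B \in \mathcal{D}_0$ then $F^c \cap B \in \mathcal{D}_0$ or,
equivalently, $F \cup B^c \in \mathcal{D}_0$. In fact, since $F \cup B^c = (F \setminus B^c) \cup B^c =
(F \cap B) \cup B^c$ and $F \cap B$ and $B^c$ are disjoint, we have that $F \cup B^c \in \mathcal{D}_0$
as required.
Notice first that if $K \in \mathcal{K}$ we have $\mathcal{K} \subset \mathcal{H}(K)$ since $\mathcal{K}$ is a
$\pi$-system. Therefore $\mathcal{H}(K) = \mathcal{D}_0$, by the minimality of $\mathcal{D}_0$.
Consequently, the following implication holds
\[ K \in \mathcal{K},\ B \in \mathcal{D}_0 \ \Longrightarrow\ K \cap B \in \mathcal{D}_0, \]
which implies $\mathcal{K} \subset \mathcal{H}(B)$ for all $B \in \mathcal{D}_0$. Again, the fact that $\mathcal{H}(B)$ is
a Dynkin system and the minimality of $\mathcal{D}_0$ give that $\mathcal{H}(B) = \mathcal{D}_0$ for all
$B \in \mathcal{D}_0$. By the definition of $\mathcal{H}(B)$, this proves (1.5).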
The uniqueness part in Carathéodory's theorem is a direct consequence
of the following coincidence criterion for measures; in turn, the proof of
the criterion relies on Theorem 1.14.
Proposition 1.15 (Coincidence criterion). Let $\mu_1$, $\mu_2$ be measures in
$(X, \mathcal{E})$ and assume that:
(i) the coincidence set
\[ \mathcal{D} := \{A \in \mathcal{E} : \mu_1(A) = \mu_2(A)\} \]
contains a $\pi$-system $\mathcal{K}$ with $\sigma(\mathcal{K}) = \mathcal{E}$;
(ii) there exists a nondecreasing sequence $(X_i) \subset \mathcal{K}$ with $\mu_1(X_i) = \mu_2(X_i) < \infty$
and $X_i \uparrow X$.
Then $\mu_1 = \mu_2$.
Proof. We first assume that $\mu_1(X) = \mu_2(X)$ is finite. Under this
assumption $\mathcal{D}$ is a Dynkin system including the $\pi$-system $\mathcal{K}$ (stability of
$\mathcal{D}$ under complement is ensured precisely by the finiteness assumption).
Thus, by the Dynkin theorem, $\mathcal{D} = \mathcal{E}$, which implies that $\mu_1 = \mu_2$.
Assume now that we are in the general case and let $X_i$ be given by
assumption (ii). Fix $i \in \mathbb{N}$ and define the $\sigma$-algebra $\mathcal{E}_i$ of subsets of $X_i$ by
\[ \mathcal{E}_i := \{A \cap X_i : A \in \mathcal{E}\}. \]
We may obviously consider $\mu_1$ and $\mu_2$ as finite measures in the measurable
space $(X_i, \mathcal{E}_i)$. Since these measures coincide on the $\pi$-system
\[ \mathcal{K}_i := \{A \cap X_i : A \in \mathcal{K}\} \]
we obtain, by the previous step, that $\mu_1$ and $\mu_2$ coincide on $\sigma(\mathcal{K}_i) \subset \mathcal{P}(X_i)$.
Now, let us prove the inclusion
\[ \{B \in \mathcal{E} : B \subset X_i\} \subset \sigma(\mathcal{K}_i). \tag{1.6} \]
Indeed
\[ \bigl\{ B \subset X : B \cap X_i \in \sigma(\mathcal{K}_i) \bigr\} \]
is a $\sigma$-algebra containing $\mathcal{K}$ (here we use the fact that $X_i \in \mathcal{K}$), and
therefore contains $\mathcal{E}$. Hence any element of $\mathcal{E}$ contained in $X_i$ belongs to
$\sigma(\mathcal{K}_i)$.
By (1.6) we obtain $\mu_1(B \cap X_i) = \mu_2(B \cap X_i)$ for all $B \in \mathcal{E}$ and all
$i \in \mathbb{N}$. Passing to the limit as $i \to \infty$, since $B$ is arbitrary we obtain that
$\mu_1 = \mu_2$.
1.5.2. The outer measure
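Let $\mu$ be a set function defined on $\mathcal{A} \subset \mathcal{P}(X)$. For any $E \in \mathcal{P}(X)$ we
define:
\[ \mu^*(E) := \inf \Bigl\{ \sum_{i=0}^{\infty} \mu(A_i) \; : \; A_i \in \mathcal{A},\ E \subset \bigcup_{i=0}^{\infty} A_i \Bigr\}. \]
$\mu^*$ is called the outer measure induced by $\mu$. We can easily show that
$\mu^*$ is a nondecreasing set function, namely $\mu^*(E) \le \mu^*(F)$ whenever
$E \subset F \subset X$.
We will obtain the proof of the existence part of Carathéodory's theorem
by showing in the proposition below that $\mu^*$ extends $\mu$ if $\mu$ is
$\sigma$-subadditive, and that (Theorem 1.17) $\mu^*$ is $\sigma$-additive on a $\sigma$-algebra
containing $\sigma(\mathcal{A})$ if $\mathcal{A}$ is a ring and $\mu$ is additive on $\mathcal{A}$. In particular
if $\mu$ is $\sigma$-additive on $\mathcal{A}$ we see that $\mu^*$ provides the desired $\sigma$-additive
extension to $\sigma(\mathcal{A})$.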
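Proposition 1.16. The set function $\mu^*$ is $\sigma$-subadditive on $\mathcal{P}(X)$ and
extends $\mu$ if $\mu$ is $\sigma$-subadditive on $\mathcal{A}$ and $\mu(\emptyset) = 0$.
Proof. Let $(E_i) \subset \mathcal{P}(X)$ and set $E := \bigcup_i E_i$. Assume that $\sum_i \mu^*(E_i)$
is finite (otherwise the assertion is trivial). Then, since $\mu^*(E_i)$ is finite
for any $i \in \mathbb{N}$, for any $\varepsilon > 0$ there exist $A_{i,j} \in \mathcal{A}$ such that
\[ \sum_{j=0}^{\infty} \mu(A_{i,j}) < \mu^*(E_i) + \frac{\varepsilon}{2^{i+1}}, \qquad E_i \subset \bigcup_{j=0}^{\infty} A_{i,j}, \qquad i \in \mathbb{N}. \]
Consequently
\[ \sum_{i,j=0}^{\infty} \mu(A_{i,j}) \le \sum_{i=0}^{\infty} \mu^*(E_i) + \varepsilon. \]
Since $E \subset \bigcup_{i,j=0}^{\infty} A_{i,j}$ we have
\[ \mu^*(E) \le \sum_{i,j=0}^{\infty} \mu(A_{i,j}) \le \sum_{i=0}^{\infty} \mu^*(E_i) + \varepsilon \]
and the first part of the statement follows from the arbitrariness of $\varepsilon$.
Now, let us assume that $\mu$ is $\sigma$-subadditive on $\mathcal{A}$ and choose $E \in \mathcal{A}$;
whenever $E \subset \bigcup_i A_i$ with $A_i \in \mathcal{A}$ we have $\mu(E) \le \sum_i \mu(A_i)$, so we deduce
$\mu(E) \le \mu^*(E)$; but, by choosing $A_0 = E$ and $A_n = \emptyset$ for $n \ge 1$, we obtain that
$\mu^*(E) = \mu(E)$. This proves that $\mu^*$ extends $\mu$.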
Let us now define the additive sets, according to Carathéodory. A set
$A \in \mathcal{P}(X)$ is called additive if
\[ \mu^*(E) = \mu^*(E \cap A) + \mu^*(E \cap A^c) \qquad \text{for all } E \in \mathcal{P}(X). \tag{1.7} \]
We denote by $\mathcal{G}$ the family of all additive sets.
Notice that, since $\mu^*$ is $\sigma$-subadditive, (1.7) is equivalent to
\[ \mu^*(E) \ge \mu^*(E \cap A) + \mu^*(E \cap A^c) \qquad \text{for all } E \in \mathcal{P}(X). \tag{1.8} \]
Obviously, the class $\mathcal{G}$ of additive sets is stable under complement;
moreover, by taking $E = A \cup B$ with $A \in \mathcal{G}$ and $A \cap B = \emptyset$, we
obtain the additivity property
\[ \mu^*(A \cup B) = \mu^*(A) + \mu^*(B). \tag{1.9} \]
Other important properties of $\mathcal{G}$ are listed in the next theorem.
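On a finite set both the outer measure and the family of additive sets can be computed by exhaustive search. The Python sketch below is an illustration on an example of ours: $X = \{0, 1, 2, 3\}$, with the ring $\{\emptyset, \{0, 1\}, \{2, 3\}, X\}$ carrying the values $0, 1, 2, 3$. The function names are hypothetical. Since the class is finite and the values are nonnegative, the infimum over countable covers is attained on a finite cover without repeated sets, which is what the code enumerates.

```python
from itertools import combinations


def powerset(X):
    """All subsets of the finite set X, as frozensets."""
    xs = list(X)
    return [frozenset(s) for r in range(len(xs) + 1) for s in combinations(xs, r)]


def outer_measure(E, mu):
    """mu*(E): least total mass of a family of sets of the class (keys of mu) covering E."""
    sets = list(mu)
    best = float("inf")          # value of the infimum when no cover exists
    for r in range(len(sets) + 1):
        for cover in combinations(sets, r):
            if E <= frozenset().union(*cover):
                best = min(best, sum(mu[A] for A in cover))
    return best


def additive_sets(X, mu):
    """All A with mu*(E) = mu*(E & A) + mu*(E - A) for every subset E of X, as in (1.7)."""
    subsets = powerset(X)
    star = {E: outer_measure(E, mu) for E in subsets}
    return {A for A in subsets
            if all(star[E] == star[E & A] + star[E - A] for E in subsets)}


X = frozenset({0, 1, 2, 3})
mu = {frozenset(): 0, frozenset({0, 1}): 1, frozenset({2, 3}): 2, X: 3}
G = additive_sets(X, mu)
print(outer_measure(frozenset({0}), mu), outer_measure(frozenset({0, 2}), mu))
print(sorted(sorted(A) for A in G))
```

In this example $\mu^*$ extends $\mu$, and the additive sets are exactly the four sets of the ring: for instance $\{0\}$ is not additive, because testing (1.7) with $E = \{0, 1\}$ gives $1$ on the left and $1 + 1$ on the right.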
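Theorem 1.17. Assume that $\mathcal{A}$ is a ring and that $\mu$ is additive. Then $\mathcal{G}$
is a $\sigma$-algebra containing $\mathcal{A}$ and $\mu^*$ is $\sigma$-additive on $\mathcal{G}$.
Proof. We proceed in three steps: we show that $\mathcal{G}$ contains $\mathcal{A}$, that $\mathcal{G}$ is a
$\sigma$-algebra and that $\mu^*$ is additive on $\mathcal{G}$. As pointed out in Remark 1.5, if $\mu^*$
is $\sigma$-subadditive and additive on the $\sigma$-algebra $\mathcal{G}$, then $\mu^*$ is $\sigma$-additive.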
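Step 1. $\mathcal{A} \subset \mathcal{G}$. Let $A \in \mathcal{A}$ and $E \in \mathcal{P}(X)$; we have to show (1.8).
Assume $\mu^*(E) < \infty$ (otherwise (1.8) trivially holds), fix $\varepsilon > 0$ and
choose $(B_i) \subset \mathcal{A}$ such that
\[ E \subset \bigcup_{i=0}^{\infty} B_i, \qquad \mu^*(E) + \varepsilon > \sum_{i=0}^{\infty} \mu(B_i). \]
Then, by the definition of $\mu^*$, it follows that
\[ \mu^*(E) + \varepsilon > \sum_{i=0}^{\infty} \mu(B_i) = \sum_{i=0}^{\infty} \bigl[ \mu(B_i \cap A) + \mu(B_i \cap A^c) \bigr] \ge \mu^*(E \cap A) + \mu^*(E \cap A^c). \]
Since $\varepsilon$ is arbitrary we have $\mu^*(E) \ge \mu^*(E \cap A) + \mu^*(E \cap A^c)$, and (1.8)
follows.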
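Step 2. $\mathcal{G}$ is an algebra and $\mu^*$ is additive on $\mathcal{G}$. We already know
that $A \in \mathcal{G}$ implies $A^c \in \mathcal{G}$. Let us prove now that if $A, B \in \mathcal{G}$ then
$A \cup B \in \mathcal{G}$. For any $E \in \mathcal{P}(X)$ we have
\[ \begin{aligned} \mu^*(E) &= \mu^*(E \cap A) + \mu^*(E \cap A^c) \\ &= \mu^*(E \cap A) + \mu^*(E \cap A^c \cap B) + \mu^*(E \cap A^c \cap B^c) \\ &= \bigl[ \mu^*(E \cap A) + \mu^*(E \cap A^c \cap B) \bigr] + \mu^*(E \cap (A \cup B)^c). \end{aligned} \tag{1.10} \]
Since
\[ (E \cap A) \cup (E \cap A^c \cap B) = E \cap (A \cup B), \]
we have by the subadditivity of $\mu^*$
\[ \mu^*(E \cap A) + \mu^*(E \cap A^c \cap B) \ge \mu^*(E \cap (A \cup B)). \]
So, by (1.10) it follows that
\[ \mu^*(E) \ge \mu^*(E \cap (A \cup B)) + \mu^*(E \cap (A \cup B)^c), \]
and $A \cup B \in \mathcal{G}$ as required. The additivity of $\mu^*$ on $\mathcal{G}$ follows directly
from (1.9).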
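Step 3. $\mathcal{G}$ is a $\sigma$-algebra. Let $(A_n) \subset \mathcal{G}$. We are going to show that
$S := \bigcup_n A_n \in \mathcal{G}$. Since we know that $\mathcal{G}$ is an algebra, it is not restrictive
to assume that all sets $A_n$ are mutually disjoint. Set $S_n := \bigcup_{i=0}^{n} A_i$, for
$n \in \mathbb{N}$.
For any $E \in \mathcal{P}(X)$, by using the $\sigma$-subadditivity of $\mu^*$ and by applying
(1.7) repeatedly, we get
\[ \begin{aligned} \mu^*(E \cap S^c) + \mu^*(E \cap S) &\le \mu^*(E \cap S^c) + \sum_{i=0}^{\infty} \mu^*(E \cap A_i) \\ &= \lim_{n\to\infty} \Bigl[ \mu^*(E \cap S^c) + \sum_{i=0}^{n} \mu^*(E \cap A_i) \Bigr] \\ &= \lim_{n\to\infty} \bigl[ \mu^*(E \cap S^c) + \mu^*(E \cap S_n) \bigr]. \end{aligned} \]
Since $S^c \subset S_n^c$ it follows that
\[ \mu^*(E \cap S^c) + \mu^*(E \cap S) \le \limsup_{n\to\infty} \bigl[ \mu^*(E \cap S_n) + \mu^*(E \cap S_n^c) \bigr] = \mu^*(E). \]
So, $S \in \mathcal{G}$ and $\mathcal{G}$ is a $\sigma$-algebra.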
Remark 1.18. We have proved that
\[ \sigma(\mathcal{A}) \subset \mathcal{G} \subset \mathcal{P}(X). \tag{1.11} \]
One can show that the inclusions above are strict in general, for instance
in the case of the Lebesgue measure we shall consider in the next section.
In fact, in the case when $X = \mathbb{R}$ and $\sigma(\mathcal{A})$ is the Borel $\sigma$-algebra,
Exercise 1.19 shows that $\sigma(\mathcal{A})$ has the cardinality of the continuum, while $\mathcal{G}$ has
the cardinality of $\mathcal{P}(\mathbb{R})$, since it contains all subsets of Cantor's middle
third set (see Exercise 1.8). An example of a non-additive set will be built
in Remark 1.23, so that also the second inclusion in (1.11) is strict.
1.6. The Lebesgue measure in R
In this section we build the Lebesgue measure on the real line $\mathbb{R}$. To this
aim, we consider first the set $\mathcal{I}$ of all bounded half-open intervals of $\mathbb{R}$
(open on the left and closed on the right)
\[ \mathcal{I} := \{(a, b] : a, b \in \mathbb{R},\ a < b\} \]
and the collection $\mathcal{A}$ containing $\emptyset$ and the finite unions of elements of
$\mathcal{I}$. Our choice of half-open intervals ensures that $\mathcal{A}$ is a ring, because
the intersection and the relative complement of two elements of $\mathcal{I}$ are either
empty or finite unions of elements of $\mathcal{I}$ (the families of
open and closed intervals, instead, do not have this property).
We define
\[ \operatorname{length}((a, b]) := b - a. \]
More generally, any non-empty $A \in \mathcal{A}$ can be written, possibly in many
ways, as a disjoint finite union of intervals $I_i \in \mathcal{I}$, $i = 1, \dots, N$; we define
\[ \lambda(A) := \sum_{i=1}^{N} \operatorname{length}(I_i). \tag{1.12} \]
Setting $\lambda(\emptyset) = 0$, it is not hard to show by elementary methods that $\lambda$
is well defined (i.e. $\lambda(A)$ does not depend on the chosen decomposition)
and additive on $\mathcal{A}$.
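The definition (1.12) is easy to experiment with. In the Python sketch below an element of $\mathcal{A}$ is represented, as an assumption of ours, by a list of pairs `(a, b)` standing for the intervals $(a, b]$, possibly overlapping; `normalize` rewrites it as a sorted disjoint union and `lam` adds up the lengths. Exact fractions avoid rounding in the comparisons.

```python
from fractions import Fraction


def normalize(intervals):
    """Rewrite a finite union of intervals (a, b] as a sorted union of disjoint ones."""
    merged = []
    for a, b in sorted((a, b) for a, b in intervals if a < b):
        if merged and a <= merged[-1][1]:
            # (a, b] overlaps or touches the last interval: (c, d] | (a, b] = (c, max(d, b)]
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            merged.append((a, b))
    return merged


def lam(intervals):
    """lambda(A) for A given as a finite union of intervals (a, b], as in (1.12)."""
    return sum(b - a for a, b in normalize(intervals))


half = Fraction(1, 2)
# Three decompositions of the same set (0, 2] give the same value.
print(lam([(0, 2)]), lam([(0, 1), (1, 2)]), lam([(0, half), (half, 2)]))

# Additivity on disjoint elements of the ring.
F, G = [(0, 1)], [(2, 3), (5, 7)]
print(lam(F + G) == lam(F) + lam(G))
```

Note that two intervals such as $(0, 1]$ and $(1, 2]$ are disjoint and their union is again an element of $\mathcal{I}$, which is the reason for merging intervals that merely touch.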
In the next definition we introduce the notion of characteristic function,
which can be used to turn set-theoretic operations into algebraic ones:
for instance the intersection corresponds to the product, when seen at the
level of characteristic functions (see also Exercise 1.1).
Definition 1.19 (Characteristic function of a set). Let $A \subset X$. The
characteristic function $1_A : X \to \{0, 1\}$ is defined by
\[ 1_A(x) := \begin{cases} 1 & \text{if } x \in A; \\ 0 & \text{if } x \in X \setminus A. \end{cases} \]
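The dictionary between set operations and algebraic operations on characteristic functions can be checked pointwise on a small example. In the Python sketch below the function name `indicator` and the sets are ours; the identities tested are $1_{A \cap B} = 1_A 1_B$, $1_{A^c} = 1 - 1_A$, $1_{A \cup B} = 1_A + 1_B - 1_A 1_B$ and $1_{A \Delta B} = |1_A - 1_B|$.

```python
def indicator(A):
    """Characteristic function of the set A."""
    return lambda x: 1 if x in A else 0


X = set(range(6))
A, B = {0, 1, 2}, {2, 3}
one_A, one_B = indicator(A), indicator(B)

checks = {
    "intersection": all(indicator(A & B)(x) == one_A(x) * one_B(x) for x in X),
    "complement": all(indicator(X - A)(x) == 1 - one_A(x) for x in X),
    "union": all(indicator(A | B)(x) == one_A(x) + one_B(x) - one_A(x) * one_B(x) for x in X),
    "symmetric difference": all(indicator(A ^ B)(x) == abs(one_A(x) - one_B(x)) for x in X),
}
print(checks)
```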
The reader already acquainted with Riemann's theory of integration can
also notice that $\lambda(A)$ is the Riemann integral of the characteristic function
$1_A$ of $A$, and deduce the additivity property of $\lambda$ directly from the additivity
properties of the Riemann integral. In the next theorem we shall rigorously
prove these facts, and more. We first state an auxiliary lemma, a
simple consequence of the Bolzano-Weierstrass compactness theorem on
the real line.
Lemma 1.20. Any bounded and closed interval $J$ contained in the union
of a sequence $\{A_n\}_{n\in\mathbb{N}}$ of open sets is contained in the union of finitely
many of them.
Proof. Replacing $A_n$ by $A_0 \cup \dots \cup A_n$, we can assume with no loss of
generality that $A_n \subset A_{n+1}$. Assume by contradiction that there exist
$x_n \in J \setminus A_n$ for all $n$; by the Bolzano-Weierstrass theorem there exists a
subsequence $(x_{n(k)})$ converging to some $x \in J$. If $n$ is such that $x \in A_n$,
for $k$ large enough $x_{n(k)}$ belongs to $A_n$, because $A_n$ is open. But this is not
possible, as soon as $n(k) \ge n$, because $x_{n(k)} \notin A_{n(k)}$ and $A_{n(k)} \supset A_n$.
Theorem 1.21. The set function $\lambda$ defined in (1.12) is $\sigma$-additive on $\mathscr{A}$.
Proof. ($\lambda$ is well defined) Given disjoint partitions $I_1, \dots, I_n$ and $J_1, \dots, J_m$ of $A \in \mathscr{A}$, we say that $J_1, \dots, J_m$ is finer than $I_1, \dots, I_n$ if any interval $I_i$ is the disjoint union of some of the intervals $J_j$. Obviously, given any two partitions, there exists a third partition finer than both: it suffices to take all intersections of elements of the first partition with elements of the second partition, neglecting the empty intersections. Given these remarks, to show that $\lambda$ is well defined, it suffices to show that $\sum_i \lambda(I_i) = \sum_j \lambda(J_j)$ if $J_1, \dots, J_m$ is finer than $I_1, \dots, I_n$. This statement reduces to the fact that $\lambda(I) = \sum_k \lambda(F_k)$ if $I \in \mathscr{I}$ is the disjoint union of some elements $F_k \in \mathscr{I}$; this last statement can be easily proved, starting from the identity $(a, b] = (a, c] \cup (c, b]$, by induction on the number of the intervals $F_k$.
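The refinement argument can be tried out on a concrete set. The sketch below is an illustration under our own assumptions: intervals $(a, b]$ are encoded as pairs `(a, b)` of exact fractions, and the helper names `total_length` and `common_refinement` are hypothetical. It builds the common refinement of two decompositions of the same set by taking all nonempty intersections, and prints the three total lengths, which should coincide.

```python
from fractions import Fraction

def total_length(intervals):
    """Sum of the lengths of disjoint half-open intervals (a, b], given as pairs."""
    return sum(b - a for a, b in intervals)

def common_refinement(P, Q):
    """All nonempty intersections of a piece of P with a piece of Q."""
    pieces = []
    for a, b in P:
        for c, d in Q:
            lo, hi = max(a, c), min(b, d)
            if lo < hi:  # the intersection (lo, hi] is nonempty
                pieces.append((lo, hi))
    return pieces

F = Fraction
# Two decompositions of the same set A = (0, 1] u (2, 3].
P = [(F(0), F(1, 2)), (F(1, 2), F(1)), (F(2), F(3))]
Q = [(F(0), F(1, 3)), (F(1, 3), F(1)), (F(2), F(5, 2)), (F(5, 2), F(3))]
R = common_refinement(P, Q)

print(total_length(P), total_length(Q), total_length(R))
```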
($\lambda$ is additive) If $F, G \in \mathscr{A}$ and $F \cap G = \emptyset$, any disjoint decomposition of $F$ in intervals $I_1, \dots, I_n \in \mathscr{I}$ and any disjoint decomposition of $G$ in intervals $J_1, \dots, J_m \in \mathscr{I}$ provide a decomposition $I_1, \dots, I_n, J_1, \dots, J_m$ of $F \cup G$ in intervals belonging to $\mathscr{I}$. Using this decomposition to compute $\lambda(F \cup G)$ the additivity easily follows.
($\lambda$ is $\sigma$-additive) Let $(F_n) \subset \mathscr{A}$ be a sequence of disjoint sets in $\mathscr{A}$ and assume that
\[
F := \bigcup_{n=0}^{\infty} F_n \tag{1.13}
\]
also belongs to $\mathscr{A}$.
We prove the $\sigma$-additivity property first in the case when $F = (x, y] \in \mathscr{I}$. It is also not restrictive to assume that the series $\sum_n \lambda(F_n)$ is convergent. As any $F_n$ is a finite union of intervals, say $N_n$, we can find, given any $\varepsilon > 0$, a finite union $F_n' \supset F_n$ of intervals in $\mathscr{I}$ such that $\lambda(F_n') \le \lambda(F_n) + \varepsilon/2^n$ and the internal part of $F_n'$ contains $F_n$ (just shift the endpoints of each interval in $F_n$ by a small amount, to obtain a larger interval in $\mathscr{I}$, increasing the length at most by $\varepsilon/(N_n 2^n)$). Let also $F_n''$ be the internal part of $F_n'$, that still includes $F_n$, and let $x' \in (x, y]$. Then, since $[x', y] \subset \bigcup_n F_n''$, Lemma 1.20 provides an integer $k$ such that $[x', y] \subset \bigcup_{n=0}^{k} F_n''$. Hence, the additivity of $\lambda$ in $\mathscr{A}$ gives
\[
y - x' \le \lambda\Bigl(\bigcup_{n=0}^{k} F_n'\Bigr) \le \sum_{n=0}^{k} \lambda(F_n') \le \sum_{n=0}^{\infty} \Bigl(\lambda(F_n) + \frac{\varepsilon}{2^n}\Bigr) = 2\varepsilon + \sum_{n=0}^{\infty} \lambda(F_n).
\]
By letting first $\varepsilon \downarrow 0$ and then letting $x' \downarrow x$ we obtain that $\lambda(F) \le \sum_{n=0}^{\infty} \lambda(F_n)$. The opposite inequality simply follows by the monotonicity and the additivity of $\lambda$, because the finite unions of the sets $F_n$ are contained in $F$.
In the general case, let
\[
F = \bigcup_{i=1}^{k} I_i,
\]
where $I_1, \dots, I_k$ are disjoint sets in $\mathscr{I}$. Then, since for any $i \in \{1, \dots, k\}$ we have that $I_i$ is the disjoint union of the sets $I_i \cap F_n$, we know by the previous step that
\[
\lambda(I_i) = \sum_{n=0}^{\infty} \lambda(I_i \cap F_n).
\]
Adding these identities for $i = 1, \dots, k$, commuting the sums on the right hand side and eventually using the additivity of $\lambda$ on $\mathscr{A}$ we obtain
\[
\lambda(F) = \sum_{i=1}^{k} \lambda(I_i \cap F) = \sum_{n=0}^{\infty} \sum_{i=1}^{k} \lambda(I_i \cap F_n) = \sum_{n=0}^{\infty} \lambda(F_n).
\]
We say that a measure $\mu$ in $(\mathbb{R}, \mathscr{B}(\mathbb{R}))$ is translation invariant if $\mu(A + h) = \mu(A)$ for all $A \in \mathscr{B}(\mathbb{R})$ and $h \in \mathbb{R}$ (notice that, by Exercise 1.2, the class of Borel sets is translation invariant as well). We say also that $\mu$ is locally finite if $\mu(I) < \infty$ for all bounded intervals $I \subset \mathbb{R}$.
Theorem 1.22 (Lebesgue measure in $\mathbb{R}$). There exists a unique, up to multiplication with constants, translation invariant and locally finite measure in $(\mathbb{R}, \mathscr{B}(\mathbb{R}))$. The unique such measure $\lambda$ satisfying $\lambda([0, 1]) = 1$ is called Lebesgue measure.
Proof. (Existence) Let $\mathscr{A}$ be the class of finite unions of intervals and let $\lambda : \mathscr{A} \to [0, +\infty)$ be the $\sigma$-additive set function defined in (1.12). According to Theorem 1.21 $\lambda$ admits a unique extension, that we still denote by $\lambda$, to $\sigma(\mathscr{A}) = \mathscr{B}(\mathbb{R})$. Clearly $\lambda$ is locally finite, and we can use the uniqueness of the extension to prove translation invariance: indeed, for any $h \in \mathbb{R}$ also the $\sigma$-additive measure $A \mapsto \lambda(A + h)$ is an extension of $\lambda|_{\mathscr{A}}$. As a consequence $\lambda(A) = \lambda(A + h)$ for all $h \in \mathbb{R}$.
(Uniqueness) Let $\nu$ be a translation invariant and locally finite measure in $(\mathbb{R}, \mathscr{B}(\mathbb{R}))$ and set $c := \nu([0, 1])$. Notice first that the set of atoms of $\nu$ is at most countable (Exercise 1.5), and since $\mathbb{R}$ is uncountable there exists at least one $x$ such that $\nu(\{x\}) = 0$. By translation invariance this holds for all $x$, i.e., $\nu$ has no atom.
Excluding the trivial case $c = 0$ (that gives $\nu \equiv 0$ by translation invariance and $\sigma$-additivity), we are going to show that $\nu = c\lambda$ on the class $\mathscr{A}$ of finite unions of intervals; by the uniqueness of the extension in Carathéodory's theorem this would imply that $\nu = c\lambda$ on $\mathscr{B}(\mathbb{R})$.
By finite additivity and translation invariance it suffices to show that $\nu([0, t)) = ct$ for any $t \ge 0$ (by the absence of atoms the same holds for the intervals $(0, t)$, $(0, t]$, $[0, t]$). Notice first that, for any integer $q \ge 1$, $[0, 1)$ is the union of $q$ disjoint intervals all congruent to $[0, 1/q)$; as a consequence, additivity and translation invariance give
\[
\nu\bigl([0, 1/q)\bigr) = \frac{\nu([0, 1))}{q} = \frac{c}{q}.
\]
Similarly, for any integer $p \ge 1$ the interval $[0, p/q)$ is the union of $p$ disjoint intervals all congruent to $[0, 1/q)$; again additivity and translation invariance give
\[
\nu\Bigl(\Bigl[0, \frac{p}{q}\Bigr)\Bigr) = p\,\nu\Bigl(\Bigl[0, \frac{1}{q}\Bigr)\Bigr) = c\,\frac{p}{q}.
\]
By approximation we eventually obtain that $\nu([0, t)) = ct$ for all $t \ge 0$.
The completion of the Borel $\sigma$-algebra with respect to $\lambda$ is the so-called $\sigma$-algebra of Lebesgue measurable sets. It coincides with the class $\mathscr{C}$ of additive sets with respect to $\lambda^*$ considered in the proof of Carathéodory's theorem (see Exercise 1.12).
Remark 1.23 (Outer Lebesgue measure and non-measurable sets). The measure $\lambda^*$ used in the proof of Carathéodory's theorem is also called outer Lebesgue measure, and it is defined on all parts of $\mathbb{R}$. The terminology is slightly misleading here, since $\lambda^*$, though $\sigma$-subadditive, fails to be $\sigma$-additive. In particular, there exist subsets of $\mathbb{R}$ that are not Lebesgue measurable. To see this, let us consider the equivalence relation in $\mathbb{R}$ defined by $x \sim y$ if $x - y \in \mathbb{Q}$ and let us pick a single element $x \in [0, 1]$ in any equivalence class induced by this relation, thus forming a set $A \subset [0, 1]$. Were this set Lebesgue measurable, all the sets $A + h$ would still be measurable, by translation invariance, and the family of sets $\{A + h\}_{h \in \mathbb{Q}}$ would be a countable and measurable partition of $\mathbb{R}$, with $\lambda^*(A + h) = c$ independent of $h \in \mathbb{Q}$. Now, if $c = 0$ we reach a contradiction with the fact that $\lambda^*(\mathbb{R}) = \infty$, while if $c > 0$ we consider all sets $A + h$ with $h \in \mathbb{Q} \cap [0, 1]$ to obtain
\[
2 = \lambda^*([0, 2]) \ge \sum_{h \in \mathbb{Q} \cap [0, 1]} \lambda^*(A + h) = \infty,
\]
reaching again a contradiction.
Notice that this example is not constructive and strongly requires the
axiom of choice (also the arguments based on cardinality, see Exercise
1.19 and Exercise 1.20, have this limitation). On the other hand, one
can give constructive examples of Lebesgue measurable sets that are not
Borel (see for instance 2.2.11 in [3]).
The construction done in the previous remark rules out the existence of locally finite and translation invariant $\sigma$-additive measures defined on all parts of $\mathbb{R}$. In $\mathbb{R}^n$, with $n \ge 3$, the famous Banach-Tarski paradox (see for instance [6]) shows that it is also impossible to have a locally finite, invariant under rigid motions and finitely additive measure defined on all parts of $\mathbb{R}^n$.
1.7. Inner and outer regularity of measures on metric spaces
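Let $(E, d)$ be a metric space and let $\mu$ be a finite measure on $(E, \mathscr{B}(E))$. We shall prove a regularity property of $\mu$.
Proposition 1.24. For any $B \in \mathscr{B}(E)$ we have
\[
\mu(B) = \sup\{\mu(C) : C \subset B,\ C \text{ closed}\} = \inf\{\mu(A) : A \supset B,\ A \text{ open}\}. \tag{1.14}
\]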
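Proof. Let us set
\[
\mathscr{K} = \{B \in \mathscr{B}(E) : \text{(1.14) holds}\}.
\]
It is enough to show that $\mathscr{K}$ is a $\sigma$-algebra of parts of $E$ including the open sets of $E$. Obviously $\mathscr{K}$ contains $E$ and $\emptyset$. Moreover, if $B \in \mathscr{K}$ then its complement $B^c$ belongs to $\mathscr{K}$. Let us prove now that $(B_n) \subset \mathscr{K}$ implies $\bigcup_n B_n \in \mathscr{K}$. Fix $\varepsilon > 0$. We are going to show that there exist a closed set $C$ and an open set $A$ such that
\[
C \subset \bigcup_{n=0}^{\infty} B_n \subset A, \qquad \mu(A \setminus C) \le 2\varepsilon. \tag{1.15}
\]
Let $n \in \mathbb{N}$. Since $B_n \in \mathscr{K}$ there exist an open set $A_n$ and a closed set $C_n$ such that $C_n \subset B_n \subset A_n$ and
\[
\mu(A_n \setminus C_n) \le \frac{\varepsilon}{2^{n+1}}.
\]
Setting $A := \bigcup_n A_n$, $S := \bigcup_n C_n$ we have $S \subset \bigcup_n B_n \subset A$ and $\mu(A \setminus S) \le \varepsilon$. However, $A$ is open but $S$ is not necessarily closed. So, we approximate $S$ by setting $S_n := \bigcup_{k=0}^{n} C_k$. The set $S_n$ is obviously closed, $S_n \uparrow S$ and consequently $\mu(S_n) \uparrow \mu(S)$. Therefore there exists $n_\varepsilon \in \mathbb{N}$ such that $\mu(S \setminus S_{n_\varepsilon}) < \varepsilon$. Now, setting $C = S_{n_\varepsilon}$ we have $C \subset \bigcup_n B_n \subset A$ and $\mu(A \setminus C) = \mu(A \setminus S) + \mu(S \setminus C) < 2\varepsilon$. Therefore $\bigcup_n B_n \in \mathscr{K}$. We have proved that $\mathscr{K}$ is a $\sigma$-algebra. It remains to show that $\mathscr{K}$ contains the open subsets of $E$. In fact, let $A$ be open and set
\[
C_n = \Bigl\{x \in E : d(x, A^c) \ge \frac{1}{n}\Bigr\},
\]
where $d(x, A^c) := \inf_{y \in A^c} d(x, y)$ is the distance function from $A^c$. Then $C_n$ are closed subsets of $A$, and moreover $C_n \uparrow A$, which implies $\mu(A \setminus C_n) \downarrow 0$. Thus the conclusion follows.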
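To make the last step concrete, consider the hypothetical case $E = \mathbb{R}$ and $A = (0, 1)$, which is our own choice of example and not taken from the text. Then $d(x, A^c) = \max\{0, \min\{x, 1 - x\}\}$, so that $C_n = [1/n, 1 - 1/n]$ for $n \ge 2$ and the Lebesgue measure of $A \setminus C_n$ is $2/n$. The sketch below checks the description of $C_n$ on a grid of rational points; the value returned by `measure_gap` is the formula $2/n$ just stated, not something the code derives.

```python
from fractions import Fraction

def dist_to_complement(x):
    """d(x, A^c) for A = (0, 1): zero outside A, distance to {0, 1} inside."""
    return max(Fraction(0), min(x, 1 - x))

def in_C(x, n):
    """Membership in C_n = {x : d(x, A^c) >= 1/n}."""
    return dist_to_complement(x) >= Fraction(1, n)

def measure_gap(n):
    """Lebesgue measure of A \\ C_n, using C_n = [1/n, 1 - 1/n] for n >= 2."""
    return Fraction(2, n)

# Check the closed-interval description of C_n on a grid of rational points.
grid = [Fraction(k, 60) for k in range(-60, 121)]
for n in (2, 3, 4, 5, 6):
    for x in grid:
        assert in_C(x, n) == (Fraction(1, n) <= x <= 1 - Fraction(1, n))

print([measure_gap(n) for n in (2, 4, 8, 16)])
```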
Notice that inner and outer approximation hold for $\mu$-measurable sets $B$ as well: one has just to notice that there exist Borel sets $B_1$, $B_2$ such that $B_1 \subset B \subset B_2$ with $\mu(B_2 \setminus B_1) = 0$, and apply inner approximation to $B_1$ and outer approximation to $B_2$.
Remark 1.25 (Inner and outer approximation for $\sigma$-finite measures). It is possible to extend the inner approximation property to $\sigma$-finite measures: it suffices to assume the existence of a sequence of closed sets $C_n$ with finite measure such that $\mu(X \setminus \bigcup_n C_n) = 0$. Indeed, assuming with no loss of generality that $C_n \subset C_{n+1}$, we know that for any Borel set $B$ and any $n \in \mathbb{N}$ it holds
\[
\mu(B \cap C_n) = \sup\{\mu(C) : C \text{ closed},\ C \subset B \cap C_n\},
\]
so that
\[
\mu(B \cap C_n) \le \sup\{\mu(C) : C \text{ closed},\ C \subset B\}.
\]
Letting $n \to \infty$ we recover the inner approximation property.
Analogously, if we assume the existence of a sequence of open sets $A_n$ with finite measure satisfying $X = \bigcup_n A_n$, we have the outer approximation property: indeed, for any $n$ and any $\varepsilon > 0$ we can find (assuming with no loss of generality $\mu(B) < +\infty$) open sets $B_n \subset A_n$ containing $B \cap A_n$ and such that $\mu\bigl(B_n \setminus (B \cap A_n)\bigr) < \varepsilon 2^{-n}$. It follows that $\bigcup_n B_n$ contains $B$ and
\[
\mu\Bigl(\bigcup_{n \in \mathbb{N}} B_n \setminus B\Bigr) < 2\varepsilon.
\]
Since $B_n$ are also open in $X$, the set $\bigcup_n B_n$ is open and since $\varepsilon$ is arbitrary we get the outer approximation property.
We conclude this chapter with the following result, whose proof is a straightforward consequence of Proposition 1.24 (alternatively, one can use Dynkin's argument, since the class of closed sets is a $\pi$-system and generates the Borel $\sigma$-algebra).
Corollary 1.26. Let $\mu$, $\nu$ be finite measures in $(E, \mathscr{B}(E))$, such that $\mu(C) = \nu(C)$ for any closed subset $C$ of $E$. Then $\mu = \nu$.
Exercises
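1.1 Given $A \subset X$, denote by $1_A : X \to \{0, 1\}$ its characteristic function, equal to 1 on $A$ and equal to 0 on $A^c$. Show that
\[
1_{A \cup B} = \max\{1_A, 1_B\}, \qquad 1_{A \cap B} = \min\{1_A, 1_B\}, \qquad 1_{A^c} = 1_X - 1_A
\]
and that
\[
\limsup_{n \to \infty} A_n = A \iff \limsup_{n \to \infty} 1_{A_n} = 1_A,
\]
\[
\liminf_{n \to \infty} A_n = A \iff \liminf_{n \to \infty} 1_{A_n} = 1_A.
\]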
1.2 Let $A \subset \mathbb{R}^n$ be a Borel set. Show that for $h \in \mathbb{R}^n$ and $t \in \mathbb{R}$ the sets
\[
A + h := \{a + h : a \in A\}, \qquad tA := \{ta : a \in A\}
\]
are Borel as well.
1.3 Find an example of a $\sigma$-additive measure $\mu$ on a $\sigma$-algebra $\mathscr{A}$ such that there exist $A_n \in \mathscr{A}$ with $A_n \downarrow A$ and $\inf_n \mu(A_n) > \mu(A)$.
1.4 Let $\mu$ be additive and finite, on an algebra $\mathscr{A}$. Show that $\mu$ is $\sigma$-additive if and only if it is continuous along nonincreasing sequences.
1.5 Let $\mu$ be a finite measure on $(X, \mathscr{E})$. Show that the set of atoms of $\mu$, defined by
\[
A_\mu := \{x \in X : \{x\} \in \mathscr{E} \text{ and } \mu(\{x\}) > 0\}
\]
is at most countable. Show that the same is true for $\sigma$-finite measures, and provide an example of a measure space for which this property fails.
1.6 Let $(X, \mathscr{E}, \mu)$ be a measure space, with $\mu$ finite. We say that $\mu$ is diffuse if for all $A \in \mathscr{E}$ with $\mu(A) > 0$ there exists $B \subset A$ with $0 < \mu(B) < \mu(A)$. Show that, if $\mu$ is diffuse, then $\mu(\mathscr{E}) = [0, \mu(X)]$.
1.7 Show that if $X$ is a separable metric space and $\mathscr{E}$ is the Borel $\sigma$-algebra, then a $\sigma$-additive measure $\mu : \mathscr{E} \to [0, +\infty)$ is diffuse if and only if $\mu$ has no atom.
1.8 Let $\lambda$ be the Lebesgue measure in $[0, 1]$. Show the existence of a $\lambda$-negligible set having the cardinality of the continuum. Hint: consider the classical Cantor's middle third set, obtained by removing the interval $(1/3, 2/3)$ from $[0, 1]$, then by removing the intervals $(1/9, 2/9)$ and $(7/9, 8/9)$, and so on.
1.9 Let $\lambda$ be the Lebesgue measure in $[0, 1]$. Show the existence, for any $\varepsilon > 0$, of a closed set $C \subset [0, 1]$ containing no interval and such that $\lambda(C) > 1 - \varepsilon$. Hint: remove from $[0, 1]$ a sequence of open intervals, centered on the rational points of $[0, 1]$.
1.10 Using the previous exercise, write $[0, 1] = A \cup B$ where $A$ is negligible in the measure-theoretic sense (i.e. $\lambda(A) = 0$) and $B$ is negligible in the Baire category sense (i.e. it is the union of countably many closed sets with empty interior). So, the two concepts of negligible should never be used at the same time.
1.11 Let $\lambda$ be the Lebesgue measure in $[0, 1]$. Construct a Borel set $E \subset (0, 1)$ such that
\[
0 < \lambda(E \cap I) < \lambda(I)
\]
for any open interval $I \subset (0, 1)$.
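1.12 Let $(X, \mathscr{E}, \mu)$ be a measure space and let $\mu^* : \mathscr{P}(X) \to [0, +\infty]$ be the outer measure induced by $\mu$. Show that the completed $\sigma$-algebra $\mathscr{E}_\mu$ is contained in the class $\mathscr{C}$ of additive sets with respect to $\mu^*$.
1.13 Let $(X, \mathscr{E}, \mu)$ be a measure space and let $\mu^* : \mathscr{P}(X) \to [0, +\infty]$ be the outer measure induced by $\mu$. Show that for all $A \subset X$ there exists $B \in \mathscr{E}$ containing $A$ with $\mu(B) = \mu^*(A)$.
1.14 Let $(X, \mathscr{E}, \mu)$ be a measure space. Check the following statements, made in Definition 1.12:
(i) $\mathscr{E}_\mu$ is a $\sigma$-algebra;
(ii) the extension $\mu(A) := \mu(B)$, where $B \in \mathscr{E}$ is any set such that $A \Delta B$ is contained in a $\mu$-negligible set of $\mathscr{E}$, is well defined and $\sigma$-additive on $\mathscr{E}_\mu$;
(iii) $\mu$-negligible sets of $\mathscr{E}_\mu$ are characterized by the property of being contained in a $\mu$-negligible set of $\mathscr{E}$.
1.15 Let $(X, \mathscr{E}, \mu)$ be a measure space and let $\mu^* : \mathscr{P}(X) \to [0, +\infty]$ be the outer measure induced by $\mu$. Show that if $\mu(X)$ is finite, the class $\mathscr{C}$ of additive sets with respect to $\mu^*$ coincides with the class of $\mathscr{E}_\mu$-measurable sets. Hint: one inclusion is provided by Exercise 1.12. For the other one, given an additive set $A$, by applying Exercise 1.13 twice, find first a set $B \in \mathscr{E}$ with $\mu^*(B \setminus A) = 0$, and then a set $C \in \mathscr{E}$ with $\mu(C) = 0$ and $B \setminus A \subset C$.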
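1.16 Find a $\sigma$-algebra $\mathscr{E} \subset \mathscr{P}(\mathbb{N})$ containing infinitely many sets and such that any $B \in \mathscr{E}$ different from $\emptyset$ has an infinite cardinality.
1.17 Find $\mu : \mathscr{P}(\mathbb{N}) \to \{0, +\infty\}$ that is additive, but not $\sigma$-additive.
1.18 Let $\omega_1$ be the first uncountable ordinal and, for $\mathscr{K} \subset \mathscr{P}(X)$, define by transfinite induction a family $\mathscr{F}^{(i)}$, $i \in \omega_1$, as follows: $\mathscr{F}^{(0)} := \mathscr{K} \cup \{\emptyset\}$,
\[
\mathscr{F}^{(i)} := \Bigl\{ \bigcup_{k=0}^{\infty} A_k,\ B^c \ :\ (A_k) \subset \mathscr{F}^{(j)},\ B \in \mathscr{F}^{(j)} \Bigr\},
\]
if $i$ is the successor of $j$, and $\mathscr{F}^{(i)} := \bigcup_{j \in i} \mathscr{F}^{(j)}$ otherwise. Show that $\bigcup_{i \in \omega_1} \mathscr{F}^{(i)} = \sigma(\mathscr{K})$.
1.19 Show that $\mathscr{B}(\mathbb{R})$ has the cardinality of the continuum. Hint: use the construction of the previous exercise, and the fact that $\omega_1$ has at most the cardinality of the continuum.
1.20 Show that the $\sigma$-algebra $\mathscr{L}$ of Lebesgue measurable sets has the same cardinality as $\mathscr{P}(\mathbb{R})$, thus strictly greater than the continuum. Hint: consider all subsets of Cantor's middle third set.
1.21 Show that the cardinality of any $\sigma$-algebra is either finite or uncountable.
1.22 Let $X$ be a set and let $\mathscr{A} \subset \mathscr{P}(X)$ be an algebra with finite cardinality. Show that its cardinality is equal to $2^n$ for some integer $n \ge 1$.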
1.23 Let $(X, \mathscr{E}, \mu)$ be a measure space and suppose that $X$ is finite or countable. Show the existence of a measure $\tilde{\mu}$ on $\mathscr{P}(X)$ that extends $\mu$, that is, $\tilde{\mu}(A) = \mu(A)$ for all $A \in \mathscr{E}$.
1.24 Find an example of an additive set function $\mu : \mathscr{P}(\mathbb{N}) \to \{0, 1\}$, with $\mu(\mathbb{N}) = 1$ and $\mu(\{n\}) = 0$ for all $n \in \mathbb{N}$ (in particular $\mu$ is not $\sigma$-additive; the construction of this example requires Zorn's lemma).
1.25 Let $C \in \mathscr{B}([0, 1])$ with $\lambda(C) > 0$. Without using the continuum hypothesis, show that $C$ has the cardinality of the continuum.
1.26 Let $(K, d)$ be a compact metric space and let $\mu$ be as in Exercise 1.24. Let us say that a sequence $(x_n) \subset K$ $\mu$-converges to $x \in K$ if
\[
\mu\bigl(\{n \in \mathbb{N} : d(x_n, x) > \varepsilon\}\bigr) = 0 \qquad \forall \varepsilon > 0.
\]
Show that any sequence $(x_n) \subset K$ is $\mu$-convergent and that the $\mu$-limit is unique.
Chapter 2
Integration
This chapter is devoted to the construction of the integral of $\mathscr{E}$-measurable functions in general measure spaces $(X, \mathscr{E}, \mu)$, and its main continuity and lower semicontinuity properties. Having built in the previous chapter the Lebesgue measure in the real line $\mathbb{R}$, we obtain as a byproduct the Lebesgue integral on $\mathbb{R}$; in the last section we compare the Lebesgue and Riemann integrals.
In the construction of the integral we prefer to emphasize two viewpoints: the first, more traditional one
\[
\int_X f \, d\mu = \sum_{z \in \operatorname{Im}(f)} z \, \mu(\{f = z\})
\]
is appropriate to deal with simple functions (i.e. functions whose range is finite) and useful to show the additivity of the integral with respect to $f$. The second one, for nonnegative functions, is summarized by the formula
\[
\int_X f \, d\mu = \int_0^{\infty} \mu(\{f > t\}) \, dt.
\]
This second viewpoint is more appropriate to show the continuity properties of the integral with respect to $f$ (the integral on the right side can be elementarily defined, since $t \mapsto \mu(\{f > t\})$ is nonincreasing, see Section 2.4.3). Of course we show that the two viewpoints are consistent if we restrict ourselves to the class of simple functions.
2.1. Inverse image of a function
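Let $X$ be a non empty set. For any function $\varphi : X \to Y$ and any $I \in \mathscr{P}(Y)$ we set
\[
\varphi^{-1}(I) := \{x \in X : \varphi(x) \in I\} = \{\varphi \in I\}.
\]
The set $\varphi^{-1}(I)$ is called the inverse image of $I$.
Let us recall some elementary properties of $\varphi^{-1}$ (the easy proofs are left to the reader as an exercise):
(i) $\varphi^{-1}(I^c) = (\varphi^{-1}(I))^c$ for all $I \in \mathscr{P}(Y)$;
(ii) if $\{J_i\}_{i \in I} \subset \mathscr{P}(Y)$ we have
\[
\bigcup_{i \in I} \varphi^{-1}(J_i) = \varphi^{-1}\Bigl(\bigcup_{i \in I} J_i\Bigr), \qquad \bigcap_{i \in I} \varphi^{-1}(J_i) = \varphi^{-1}\Bigl(\bigcap_{i \in I} J_i\Bigr).
\]
In particular, if $I \cap J = \emptyset$ we have $\varphi^{-1}(I) \cap \varphi^{-1}(J) = \emptyset$. Also, if $\mathscr{E} \subset \mathscr{P}(Y)$ and we consider the family $\varphi^{-1}(\mathscr{E})$ of subsets of $X$ defined by
\[
\varphi^{-1}(\mathscr{E}) := \bigl\{\varphi^{-1}(I) : I \in \mathscr{E}\bigr\}, \tag{2.1}
\]
we have that $\varphi^{-1}(\mathscr{E})$ is a $\sigma$-algebra whenever $\mathscr{E}$ is a $\sigma$-algebra.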
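Properties (i) and (ii) are easy to test on finite sets. In the following sketch the helper `preimage`, the map `phi` and the sets `J1`, `J2` are hypothetical choices made only for illustration.

```python
def preimage(phi, X, I):
    """Inverse image phi^{-1}(I) = {x in X : phi(x) in I}."""
    return {x for x in X if phi(x) in I}

# Hypothetical finite example: phi(x) = x mod 3 on X = {0, ..., 11}.
X = set(range(12))
Y = {0, 1, 2}
phi = lambda x: x % 3

J1, J2 = {0, 1}, {1, 2}

# (i) complements, (ii) unions and intersections commute with the preimage.
complement_ok = preimage(phi, X, Y - J1) == X - preimage(phi, X, J1)
union_ok = preimage(phi, X, J1 | J2) == preimage(phi, X, J1) | preimage(phi, X, J2)
intersection_ok = preimage(phi, X, J1 & J2) == preimage(phi, X, J1) & preimage(phi, X, J2)

print(complement_ok, union_ok, intersection_ok)
```

Note that the analogous statements for direct images fail in general (for instance for intersections), which is why measurability is formulated through inverse images.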
2.2. Measurable and Borel functions
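We are given measurable spaces $(X, \mathscr{E})$ and $(Y, \mathscr{F})$. We say that a function $\varphi : X \to Y$ is $(\mathscr{E}, \mathscr{F})$-measurable if $\varphi^{-1}(\mathscr{F}) \subset \mathscr{E}$. If $(Y, \mathscr{F}) = (\mathbb{R}, \mathscr{B}(\mathbb{R}))$, we say that $\varphi$ is a real valued $\mathscr{E}$-measurable function, and if $(X, d)$ is a metric space and $\mathscr{E}$ is the Borel $\sigma$-algebra, we say that $\varphi$ is a real valued Borel function.
The following simple but useful proposition shows that the measurability condition needs to be checked only on a class of generators.
Proposition 2.1 (Measurability criterion). Let $\mathscr{G} \subset \mathscr{F}$ be such that $\sigma(\mathscr{G}) = \mathscr{F}$. Then $\varphi : X \to Y$ is $(\mathscr{E}, \mathscr{F})$-measurable if and only if $\varphi^{-1}(I) \in \mathscr{E}$ for all $I \in \mathscr{G}$ (equivalently, iff $\varphi^{-1}(\mathscr{G}) \subset \mathscr{E}$).
Proof. Consider the family $\mathscr{D} := \{I \in \mathscr{F} : \varphi^{-1}(I) \in \mathscr{E}\}$. By the above-mentioned properties of $\varphi^{-1}$ as an operator between $\mathscr{P}(Y)$ and $\mathscr{P}(X)$, it follows that $\mathscr{D}$ is a $\sigma$-algebra including $\mathscr{G}$. So, it coincides with $\sigma(\mathscr{G}) = \mathscr{F}$.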
A simple consequence of the previous proposition is the fact that any continuous function is a Borel function: more precisely, assume that $\varphi : X \to Y$ is continuous and that $\mathscr{E} = \mathscr{B}(X)$ and $\mathscr{F} = \mathscr{B}(Y)$. Then, the $\sigma$-algebra
\[
\bigl\{A \subset Y : \varphi^{-1}(A) \in \mathscr{B}(X)\bigr\}
\]
contains the open subsets of $Y$ (as, by the continuity of $\varphi$, $\varphi^{-1}(A)$ is open in $X$, and in particular Borel, whenever $A$ is open in $Y$), and then it contains the generated $\sigma$-algebra, i.e. $\mathscr{B}(Y)$.
The following proposition, whose proof is straightforward, shows that the class of measurable functions is stable under composition.
Proposition 2.2. Let $(X, \mathscr{E})$, $(Y, \mathscr{F})$, $(Z, \mathscr{G})$ be measurable spaces and let $\varphi : X \to Y$ and $\psi : Y \to Z$ be respectively $(\mathscr{E}, \mathscr{F})$-measurable and $(\mathscr{F}, \mathscr{G})$-measurable. Then $\psi \circ \varphi$ is $(\mathscr{E}, \mathscr{G})$-measurable.
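It is often convenient to consider functions with values in the extended space $\overline{\mathbb{R}} := \mathbb{R} \cup \{+\infty, -\infty\}$, the so-called extended functions. We say that a mapping $\varphi : X \to \overline{\mathbb{R}}$ is $\mathscr{E}$-measurable if
\[
\varphi^{-1}(\{-\infty\}),\ \varphi^{-1}(\{+\infty\}) \in \mathscr{E} \quad \text{and} \quad \varphi^{-1}(I) \in \mathscr{E}, \ \ \forall I \in \mathscr{B}(\mathbb{R}). \tag{2.2}
\]
This condition can also be interpreted in terms of measurability between $\mathscr{E}$ and a suitable Borel $\sigma$-algebra in $\overline{\mathbb{R}}$, see Exercise 2.3. Analogously, when $(X, d)$ is a metric space and $\mathscr{E}$ is the Borel $\sigma$-algebra, we say that $\varphi : X \to \overline{\mathbb{R}}$ is Borel whenever the conditions above hold.
The following proposition shows that extended $\mathscr{E}$-measurable functions are stable under pointwise limits and countable supremum and infimum.
Proposition 2.3. Let $(\varphi_n)$ be a sequence of extended $\mathscr{E}$-measurable functions. Then the following functions are $\mathscr{E}$-measurable:
\[
\sup_{n \in \mathbb{N}} \varphi_n(x), \qquad \inf_{n \in \mathbb{N}} \varphi_n(x), \qquad \limsup_{n \to \infty} \varphi_n(x), \qquad \liminf_{n \to \infty} \varphi_n(x).
\]
Proof. Let us prove that $\varphi(x) := \sup_n \varphi_n(x)$ is $\mathscr{E}$-measurable (all other cases can be deduced from this one, or directly proved by similar arguments). For any $a \in \mathbb{R}$ we have
\[
\varphi^{-1}([-\infty, a]) = \bigcap_{n=0}^{\infty} \varphi_n^{-1}([-\infty, a]) \in \mathscr{E}.
\]
In particular $\{\varphi = -\infty\} \in \mathscr{E}$, so that $\varphi^{-1}((-\infty, a]) \in \mathscr{E}$ for all $a \in \mathbb{R}$; by letting $a \uparrow \infty$ we get $\varphi^{-1}(\mathbb{R}) \in \mathscr{E}$. As a consequence, the class
\[
\bigl\{I \in \mathscr{B}(\mathbb{R}) : \varphi^{-1}(I) \in \mathscr{E}\bigr\}
\]
is a $\sigma$-algebra containing the intervals of the form $(-\infty, a]$ with $a \in \mathbb{R}$, and therefore coincides with $\mathscr{B}(\mathbb{R})$. Eventually, $\{\varphi = +\infty\} = X \setminus [\varphi^{-1}(\mathbb{R}) \cup \{\varphi = -\infty\}]$ belongs to $\mathscr{E}$ as well.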
2.3. Partitions and simple functions
Let $(X, \mathscr{E})$ be a measurable space. A function $\varphi : X \to \mathbb{R}$ is said to be simple if its range $\varphi(X)$ is a finite set. The class of simple functions is obviously a real vector space, as the range of $\varphi + \psi$ is contained in
\[
\{a + b : a \in \operatorname{range}(\varphi),\ b \in \operatorname{range}(\psi)\}.
\]
If $\varphi(X) = \{a_1, \dots, a_n\}$, with $a_i \ne a_j$ if $i \ne j$, setting $A_i = \varphi^{-1}(\{a_i\})$, $i = 1, \dots, n$, we can canonically represent $\varphi$ as
\[
\varphi(x) = \sum_{k=1}^{n} a_k 1_{A_k}(x), \qquad x \in X. \tag{2.3}
\]
Moreover, $A_1, \dots, A_n$ is a finite partition of $X$ (i.e. the $A_i$ are mutually disjoint and their union is equal to $X$). However, a simple function $\varphi$ has many representations of the form
\[
\varphi(x) = \sum_{k=1}^{N} a_k 1_{A_k}(x), \qquad x \in X,
\]
where $A_1, \dots, A_N$ need not be mutually disjoint and $a_k$ need not be in the range of $\varphi$. For instance
\[
1_{[0,1)} + 3 \cdot 1_{[1,2]} = 1_{[0,2]} + 2 \cdot 1_{[1,2]}.
\]
It is easy to check that a simple function is $\mathscr{E}$-measurable if, and only if, all level sets $A_k$ in (2.3) are $\mathscr{E}$-measurable; in this case we shall also say that $\{A_k\}$ is a finite $\mathscr{E}$-measurable partition of $X$.
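Now we show that any nonnegative $\mathscr{E}$-measurable function can be approximated by simple functions; a variant of this result, with a different construction, is proposed in Exercise 2.8.
Proposition 2.4. Let $\varphi$ be a nonnegative extended $\mathscr{E}$-measurable function. For any $n \in \mathbb{N}^*$, define
\[
\varphi_n(x) =
\begin{cases}
\dfrac{i-1}{2^n} & \text{if } \dfrac{i-1}{2^n} \le \varphi(x) < \dfrac{i}{2^n},\ i = 1, 2, \dots, n2^n; \\[2ex]
n & \text{if } \varphi(x) \ge n.
\end{cases}
\tag{2.4}
\]
Then $\varphi_n$ are simple and $\mathscr{E}$-measurable, $(\varphi_n)$ is nondecreasing and convergent to $\varphi$. If in addition $\varphi$ is bounded the convergence is uniform.
Proof. It is not difficult to check that $(\varphi_n)$ is nondecreasing. Moreover, we have
\[
0 \le \varphi(x) - \varphi_n(x) \le \frac{1}{2^n} \quad \text{if } \varphi(x) < n,\ x \in X,
\]
and
\[
0 \le \varphi(x) - \varphi_n(x) = \varphi(x) - n \quad \text{if } \varphi(x) \ge n,\ x \in X.
\]
So, the conclusion easily follows.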
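The dyadic construction (2.4) can be written in a few lines. The sketch below is only an illustration: the function name `dyadic_approx`, the test function `phi` and the sample points are our own assumptions. It uses the observation that, when $\varphi(x) < n$, the value $(i-1)/2^n$ in (2.4) equals $\lfloor 2^n \varphi(x) \rfloor / 2^n$.

```python
import math

def dyadic_approx(value, n):
    """phi_n(x) from (2.4), given value = phi(x) in [0, +inf]."""
    if value >= n:
        return float(n)
    # value lies in [(i-1)/2^n, i/2^n) for a unique i; return (i-1)/2^n.
    return math.floor(value * 2**n) / 2**n

phi = lambda x: x * x          # hypothetical bounded test function on [0, 2]
points = [k / 100 for k in range(201)]

for n in (1, 2, 4, 8):
    errors = [phi(x) - dyadic_approx(phi(x), n) for x in points]
    print(n, max(errors))
```

Since this `phi` is bounded by 4, the printed maximal error stays large only while $n < 4$ and is at most $2^{-n}$ afterwards, in agreement with the uniform convergence statement for bounded functions.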
2.4. Integral of a nonnegative $\mathscr{E}$-measurable function
We are given a measure space $(X, \mathscr{E}, \mu)$. We start by defining the integral for simple nonnegative functions.
2.4.1. Integral of simple functions
Let $\varphi$ be a nonnegative simple $\mathscr{E}$-measurable function, and let us represent it as
\[
\varphi(x) = \sum_{k=1}^{N} a_k 1_{A_k}(x), \qquad x \in X,
\]
with $N \in \mathbb{N}$, $a_1, \dots, a_N \ge 0$ and $A_1, \dots, A_N$ in $\mathscr{E}$. Then we define (using the standard convention in the theory of integration that $0 \cdot \infty = 0$),
\[
\int_X \varphi \, d\mu := \sum_{k=1}^{N} a_k \mu(A_k).
\]
It is easy to see that the definition does not depend on the choice of the representation formula for $\varphi$. Indeed, let $\{b_1, \dots, b_M\}$ be the range of $\varphi$ and let $\varphi = \sum_{j=1}^{M} b_j 1_{B_j}$, with $B_j := \varphi^{-1}(b_j)$, be the canonical representation of $\varphi$. We have to prove that
\[
\sum_{k=1}^{N} a_k \mu(A_k) = \sum_{j=1}^{M} b_j \mu(B_j). \tag{2.5}
\]
As the $B_j$'s are pairwise disjoint, (2.5) follows by adding the $M$ identities
\[
\sum_{k=1}^{N} a_k \mu(A_k \cap B_j) = b_j \mu(B_j), \qquad j = 1, \dots, M. \tag{2.6}
\]
In order to show (2.6) we fix $j$ and consider, for $I \subset \{1, \dots, N\}$, the sets
\[
A_I := \bigl\{x \in B_j : x \in A_i \text{ iff } i \in I\bigr\},
\]
so that $\{A_I\}$ are an $\mathscr{E}$-measurable partition of $B_j$ and $x \in A_I$ iff the set of $i$'s for which $x \in A_i$ coincides with $I$. Then, using first the fact that $A_I \subset A_i$ if $i \in I$, and $A_i \cap A_I = \emptyset$ otherwise, and then the fact that $\sum_{k \in I} a_k = b_j$ whenever $A_I \ne \emptyset$ (because $\sum_{k=1}^{N} a_k 1_{A_k}$ coincides with $b_j$, the constant value of $\varphi$ on $B_j$), we have
\[
\sum_{k=1}^{N} a_k \mu(A_k \cap B_j) = \sum_{k=1}^{N} \sum_{I} a_k \mu(A_k \cap A_I) = \sum_{I} \sum_{k=1}^{N} a_k \mu(A_k \cap A_I) = \sum_{I} \sum_{k \in I} a_k \mu(A_I) = \sum_{I} b_j \mu(A_I) = b_j \mu(B_j).
\]
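Independence from the representation can be observed numerically on a finite measure space. The following sketch is an illustration under our own assumptions: the point masses in `weights` and the representation `rep1` are hypothetical. It integrates the same simple function using an overlapping representation and using its canonical representation by level sets.

```python
from fractions import Fraction

# Hypothetical finite measure space: X = {0,...,5}, mu given by point masses.
weights = {0: Fraction(1, 2), 1: Fraction(1), 2: Fraction(2),
           3: Fraction(1, 3), 4: Fraction(3), 5: Fraction(1, 4)}

def mu(A):
    return sum((weights[x] for x in A), Fraction(0))

def integral(representation):
    """Sum of a_k * mu(A_k) over a representation [(a_k, A_k), ...]."""
    return sum(a * mu(A) for a, A in representation)

def evaluate(representation, x):
    """Value at x of the simple function sum_k a_k 1_{A_k}."""
    return sum(a for a, A in representation if x in A)

# A representation with overlapping sets, and the canonical one built
# from the level sets of the same function.
rep1 = [(1, {0, 1, 2, 3}), (2, {2, 3, 4}), (5, {3})]
values = {x: evaluate(rep1, x) for x in weights}
canonical = [(b, {x for x in weights if values[x] == b})
             for b in sorted(set(values.values()))]

print(integral(rep1), integral(canonical))
```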
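Proposition 2.5. Let $\varphi$, $\psi$ be simple nonnegative $\mathscr{E}$-measurable functions on $X$ and let $\alpha, \beta \ge 0$. Then $\alpha\varphi + \beta\psi$ is simple, $\mathscr{E}$-measurable and we have
\[
\int_X (\alpha\varphi + \beta\psi) \, d\mu = \alpha \int_X \varphi \, d\mu + \beta \int_X \psi \, d\mu.
\]
Proof. Let
\[
\varphi = \sum_{k=1}^{n} a_k 1_{A_k}, \qquad \psi = \sum_{h=1}^{m} b_h 1_{B_h}
\]
with $\{A_k\}$, $\{B_h\}$ finite $\mathscr{E}$-measurable partitions of $X$. Then $\{A_k \cap B_h\}$ is a finite $\mathscr{E}$-measurable partition of $X$ and $\alpha\varphi + \beta\psi$ is constant (and equal to $\alpha a_k + \beta b_h$) on any element $A_k \cap B_h$ of the partition. Therefore the level sets of $\alpha\varphi + \beta\psi$ are finite unions of elements of this partition and the $\mathscr{E}$-measurability of $\alpha\varphi + \beta\psi$ follows (see also Exercise 2.2). Then, writing
\[
\varphi(x) = \sum_{k=1}^{n} \sum_{h=1}^{m} a_k 1_{A_k \cap B_h}(x), \qquad \psi(x) = \sum_{k=1}^{n} \sum_{h=1}^{m} b_h 1_{A_k \cap B_h}(x), \qquad x \in X,
\]
we arrive at the conclusion.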
2.4.2. The repartition function
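Let $\varphi : X \to \overline{\mathbb{R}}$ be $\mathscr{E}$-measurable. The repartition function $F$ of $\varphi$, relative to $\mu$, is defined by
\[
F(t) := \mu(\{\varphi > t\}), \qquad t \in \mathbb{R}.
\]
The function $F$ is nonincreasing and satisfies
\[
\lim_{t \to -\infty} F(t) = \lim_{n \to \infty} F(-n) = \lim_{n \to \infty} \mu(\{\varphi > -n\}) = \mu(\{\varphi > -\infty\}),
\]
and, if $\mu$ is finite,
\[
\lim_{t \to +\infty} F(t) = \lim_{n \to \infty} F(n) = \lim_{n \to \infty} \mu(\{\varphi > n\}) = \mu(\{\varphi = +\infty\}),
\]
since
\[
\{\varphi > -\infty\} = \bigcup_{n=1}^{\infty} \{\varphi > -n\}, \qquad \{\varphi = +\infty\} = \bigcap_{n=1}^{\infty} \{\varphi > n\}.
\]
Other important properties of $F$ are provided by the following result.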
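Proposition 2.6. Let $\varphi : X \to \overline{\mathbb{R}}$ be $\mathscr{E}$-measurable and let $F$ be its repartition function.
(i) For any $t_0 \in \mathbb{R}$ we have $\lim_{t \to t_0^+} F(t) = F(t_0)$, that is, $F$ is right continuous.
(ii) If $\mu$ is finite, for any $t_0 \in \mathbb{R}$ we have $\lim_{t \to t_0^-} F(t) = \mu(\{\varphi \ge t_0\})$, that is, $F$ has left limits (1).
Proof. Let us prove (i). We have
\[
\lim_{t \to t_0^+} F(t) = \lim_{n \to +\infty} F\Bigl(t_0 + \frac{1}{n}\Bigr) = \lim_{n \to +\infty} \mu\Bigl(\Bigl\{\varphi > t_0 + \frac{1}{n}\Bigr\}\Bigr) = \mu(\{\varphi > t_0\}) = F(t_0),
\]
since
\[
\{\varphi > t_0\} = \bigcup_{n=1}^{\infty} \Bigl\{\varphi > t_0 + \frac{1}{n}\Bigr\} = \lim_{n \to \infty} \Bigl\{\varphi > t_0 + \frac{1}{n}\Bigr\}.
\]
So, (i) follows. We prove now (ii). We have
\[
\lim_{t \to t_0^-} F(t) = \lim_{n \to +\infty} F\Bigl(t_0 - \frac{1}{n}\Bigr) = \lim_{n \to +\infty} \mu\Bigl(\Bigl\{\varphi > t_0 - \frac{1}{n}\Bigr\}\Bigr) = \mu(\{\varphi \ge t_0\}),
\]
since
\[
\{\varphi \ge t_0\} = \bigcap_{n=1}^{\infty} \Bigl\{\varphi > t_0 - \frac{1}{n}\Bigr\} = \lim_{n \to \infty} \Bigl\{\varphi > t_0 - \frac{1}{n}\Bigr\}
\]
and (ii) follows.
From Proposition 2.6 it follows that, in the case when $\mu$ is finite, $F$ is continuous at $t_0$ iff $\mu(\{\varphi = t_0\}) = 0$.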
(1) In the literature $F$ is called a càdlàg function.
Now we want to extend the integral operator to nonnegative $\mathscr{E}$-measurable functions. Let $\varphi$ be a nonnegative, simple and $\mathscr{E}$-measurable function and let
\[
\varphi(x) = \sum_{k=0}^{n} a_k 1_{A_k}(x), \qquad x \in X,
\]
with $n \in \mathbb{N}^*$, $0 = a_0 < a_1 < a_2 < \dots < a_n < \infty$. Then the repartition function $F$ of $\varphi$ is given by
\[
F(t) =
\begin{cases}
\mu(A_1) + \mu(A_2) + \dots + \mu(A_n) = F(0) & \text{if } 0 \le t < a_1, \\
\mu(A_2) + \mu(A_3) + \dots + \mu(A_n) = F(a_1) & \text{if } a_1 \le t < a_2, \\
\quad \vdots \\
\mu(A_n) = F(a_{n-1}) & \text{if } a_{n-1} \le t < a_n, \\
0 = F(a_n) & \text{if } t \ge a_n.
\end{cases}
\]
Consequently, we can write
\[
\begin{aligned}
\int_X \varphi(x) \, d\mu(x) &= \sum_{k=1}^{n} a_k \mu(A_k) = \sum_{k=1}^{n} a_k \bigl(F(a_{k-1}) - F(a_k)\bigr) \\
&= \sum_{k=1}^{n} a_k F(a_{k-1}) - \sum_{k=1}^{n} a_k F(a_k) \\
&= \sum_{k=0}^{n-1} a_{k+1} F(a_k) - \sum_{k=0}^{n-1} a_k F(a_k) \\
&= \sum_{k=0}^{n-1} (a_{k+1} - a_k) F(a_k) = \int_0^{\infty} F(t) \, dt.
\end{aligned}
\tag{2.7}
\]
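Identity (2.7) can be verified on a small example. In the sketch below the finite measure space (`weights`) and the simple function (`phi`) are hypothetical and chosen only for illustration; the code compares the sum over the values of the function with the sum of the areas of the steps of its repartition function.

```python
from fractions import Fraction

# Hypothetical finite measure space and simple function (illustration only).
weights = {"a": Fraction(1, 2), "b": Fraction(3), "c": Fraction(1), "d": Fraction(2)}
phi = {"a": 0, "b": 2, "c": 5, "d": 2}

def mu(A):
    return sum((weights[x] for x in A), Fraction(0))

def F(t):
    """Repartition function F(t) = mu({phi > t})."""
    return mu({x for x in phi if phi[x] > t})

levels = sorted(set(phi.values()) | {0})   # 0 = a_0 < a_1 < ... < a_n

# First viewpoint: sum over the range of phi of a_k * mu({phi = a_k}).
by_values = sum(a * mu({x for x in phi if phi[x] == a}) for a in levels)

# Second viewpoint: F is constant on [a_k, a_{k+1}), so its integral is
# the sum of (a_{k+1} - a_k) * F(a_k).
by_layers = sum((levels[k + 1] - levels[k]) * F(levels[k])
                for k in range(len(levels) - 1))

print(by_values, by_layers)
```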
Example 2.7. We set $X = \mathbb{R}$, $\mu = \lambda$,
\[
A_1 = [1, 2] \cup [10, 11], \quad A_2 = [2, 3], \quad A_3 = [3, 4], \quad A_4 = [4, 6], \quad A_5 = [7, 10],
\]
$a_1 = 5$, $a_2 =$, $a_3 = 10$, $a_4 = 7$, $a_5 = 2$ and $\varphi := \sum_{k=1}^{5} a_k 1_{A_k}$ to be the simple function shown in Figure 2.1. It is easy to verify that $F$ has the graph shown in the right picture in Figure 2.1.
[Figure 2.1. A simple function $\varphi$, and its repartition function $F$.]
The color scheme used for the areas below the two graphs in Figure 2.1 shows graphically that the areas are identical.
Now, we want to define the integral of any nonnegative extended $\mathscr{E}$-measurable function by generalizing formula (2.7). For this, we first need to define the integral of any nonnegative nonincreasing function in $(0, +\infty)$.
2.4.3. The archimedean integral
We generalize here the (inner) Riemann integral to any nonincreasing function $f : [0, +\infty) \to [0, +\infty]$. The strategy is to consider the supremum of the areas of piecewise constant minorants of $f$.
Let $\Sigma$ be the set of all finite decompositions $\sigma = \{t_0, t_1, \dots, t_N\}$ of $[0, +\infty]$, where $N \in \mathbb{N}^*$ and $0 = t_0 \le t_1 < \dots < t_N < +\infty$.
Let now $f : [0, +\infty) \to [0, +\infty]$ be a nonincreasing function. For any $\sigma = \{t_0, t_1, \dots, t_N\} \in \Sigma$ we consider the partial sum
\[
I_f(\sigma) := \sum_{k=0}^{N-1} f(t_{k+1})(t_{k+1} - t_k). \tag{2.8}
\]
We define
\[
\int_0^{\infty} f(t) \, dt := \sup\{I_f(\sigma) : \sigma \in \Sigma\}. \tag{2.9}
\]
The integral $\int_0^{\infty} f(t) \, dt$ is called the archimedean integral of $f$. It enjoys the usual properties of the Riemann integral (see Exercise 2.5) but, among these, we will need only the monotonicity with respect to $f$ in the sequel. For our purposes the most relevant property of the archimedean integral is instead the continuity under monotonically nondecreasing sequences.
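The partial sums (2.8) are straightforward to compute. The sketch below is an illustration with our own choices: the test function $f(t) = e^{-t}$, whose integral over $[0, +\infty)$ equals 1, and uniform decompositions of $[0, T]$, which are only one convenient family inside $\Sigma$. Each printed value is a lower bound for the archimedean integral, and the bounds improve as the decomposition gets finer and longer.

```python
import math

def partial_sum(f, ts):
    """I_f(sigma) from (2.8) for sigma = [t_0, ..., t_N] with t_0 = 0."""
    return sum(f(ts[k + 1]) * (ts[k + 1] - ts[k]) for k in range(len(ts) - 1))

f = lambda t: math.exp(-t)   # nonincreasing; its integral over [0, inf) is 1

def uniform_decomposition(T, N):
    """N equal steps on [0, T]; a hypothetical, convenient choice of sigma."""
    return [T * k / N for k in range(N + 1)]

for T, N in [(5, 10), (10, 100), (20, 2000), (30, 30000)]:
    print(T, N, partial_sum(f, uniform_decomposition(T, N)))
```

Because $f$ is nonincreasing, evaluating it at the right endpoint $t_{k+1}$ gives a piecewise constant minorant, so no partial sum can exceed the integral.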
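Proposition 2.8. Let $f_n \uparrow f$, with $f_n : [0, +\infty) \to [0, +\infty]$ nonincreasing. Then
\[
\int_0^{\infty} f_n(t) \, dt \uparrow \int_0^{\infty} f(t) \, dt.
\]
Proof. It is obvious that
\[
\int_0^{\infty} f_n(t) \, dt \le \int_0^{\infty} f(t) \, dt.
\]
To prove the converse inequality, fix $L < \int_0^{\infty} f(t) \, dt$. Then there exists $\sigma = \{t_0, t_1, \dots, t_N\} \in \Sigma$ such that
\[
\sum_{k=0}^{N-1} f(t_{k+1})(t_{k+1} - t_k) > L.
\]
Since for $n$ large enough
\[
\int_0^{\infty} f_n(t) \, dt \ge \sum_{k=0}^{N-1} f_n(t_{k+1})(t_{k+1} - t_k) > L,
\]
letting $n \to \infty$ we find that
\[
\sup_{n \in \mathbb{N}} \int_0^{\infty} f_n(t) \, dt \ge L.
\]
This implies
\[
\sup_{n \in \mathbb{N}} \int_0^{\infty} f_n(t) \, dt \ge \int_0^{\infty} f(t) \, dt
\]
and the conclusion follows.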
2.4.4. Integral of a nonnegative measurable function
We are given a measure space $(X, \mathscr{E}, \mu)$ and an extended nonnegative $\mathscr{E}$-measurable function $\varphi$. Having the identity (2.7) in mind, we define
\[
\int_X \varphi \, d\mu := \int_0^{\infty} \mu(\{\varphi > t\}) \, dt. \tag{2.10}
\]
Notice that the function $t \mapsto \mu(\{\varphi > t\}) \in [0, +\infty]$ is nonnegative and nonincreasing in $[0, +\infty)$, so that its archimedean integral is well defined and (2.10) extends, by the remarks made at the end of Section 2.4.2, the integral elementarily defined on simple functions. If the integral is finite we say that $\varphi$ is $\mu$-integrable.
It follows directly from the analogous properties of the archimedean integral that the integral so defined is monotone, i.e.
\[
\varphi \ge \psi \implies \int_X \varphi \, d\mu \ge \int_X \psi \, d\mu.
\]
Indeed, $\varphi \ge \psi$ implies $\{\varphi > t\} \supset \{\psi > t\}$ and $\mu(\{\varphi > t\}) \ge \mu(\{\psi > t\})$ for all $t > 0$. Furthermore, the integral is invariant under modifications of $\varphi$ in $\mu$-negligible sets, that is
\[
\varphi = \psi \ \ \mu\text{-a.e. in } X \implies \int_X \varphi \, d\mu = \int_X \psi \, d\mu.
\]
To show this fact it suffices to notice that $\varphi = \psi$ $\mu$-a.e. in $X$ implies that the sets $\{\varphi > t\}$ and $\{\psi > t\}$ differ in a $\mu$-negligible set for all $t > 0$, therefore $\mu(\{\varphi > t\}) = \mu(\{\psi > t\})$ for all $t > 0$.
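Let us prove the following basic Markov inequality.
Proposition 2.9. For any $a \in (0, +\infty)$ we have
\[
\mu(\{\varphi \ge a\}) \le \frac{1}{a} \int_X \varphi(x) \, d\mu(x). \tag{2.11}
\]
Proof. For any $a \in (0, +\infty)$ we have, recalling the inclusion $\{\varphi \ge a\} \subset \{\varphi > t\}$ for any $t \in (0, a)$, that $\mu(\{\varphi > t\}) \ge \mu(\{\varphi \ge a\})$ for all $t \in (0, a)$. The monotonicity of the archimedean integral gives
\[
\int_X \varphi(x) \, d\mu(x) = \int_0^{\infty} \mu(\{\varphi > t\}) \, dt \ge \int_0^{\infty} 1_{(0,a)}(t) \, \mu(\{\varphi > t\}) \, dt \ge a \, \mu(\{\varphi \ge a\}).
\]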
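A quick numerical check of (2.11) is possible on a finite measure space, where the integral reduces to a finite sum. The point masses in `weights`, the function `phi` and the thresholds `a` below are hypothetical values chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical finite measure space with point masses, for illustration.
weights = {0: Fraction(1, 2), 1: Fraction(1), 2: Fraction(2), 3: Fraction(1, 4)}
phi = {0: Fraction(0), 1: Fraction(1, 2), 2: Fraction(3), 3: Fraction(10)}

def mu(A):
    return sum((weights[x] for x in A), Fraction(0))

# On a finite space phi is simple, so its integral is a finite sum.
integral = sum(phi[x] * weights[x] for x in phi)

results = []
for a in (Fraction(1, 2), Fraction(1), Fraction(3), Fraction(10), Fraction(20)):
    lhs = mu({x for x in phi if phi[x] >= a})   # mu({phi >= a})
    rhs = integral / a                          # (1/a) * integral of phi
    results.append(lhs <= rhs)
    print(a, lhs, rhs, lhs <= rhs)
```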
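The Markov inequality has some important consequences.
Proposition 2.10. Let $\varphi : X \to [0, +\infty]$ be an extended $\mathscr{E}$-measurable function.
(i) If $\varphi$ is $\mu$-integrable then the set $\{\varphi = +\infty\}$ has $\mu$-measure 0, that is, $\varphi$ is finite $\mu$-a.e. in $X$.
(ii) The integral of $\varphi$ vanishes iff $\varphi$ is equal to 0 $\mu$-a.e. in $X$.
Proof. (i) Since $\int_X \varphi \, d\mu < \infty$ we deduce from (2.11) that
\[
\lim_{a \to +\infty} \mu(\{\varphi > a\}) = 0.
\]
Since
\[
\{\varphi = \infty\} = \bigcap_{n=1}^{\infty} \{\varphi > n\},
\]
by applying the continuity along decreasing sequences in the space $\{\varphi > 1\}$ (with finite $\mu$-measure) we obtain
\[
\mu(\{\varphi = \infty\}) = \lim_{n \to +\infty} \mu(\{\varphi > n\}) = 0.
\]
(ii) If $\int_X \varphi \, d\mu = 0$ we deduce from (2.11) that $\mu(\{\varphi > a\}) = 0$ for all $a > 0$. Since
\[
\mu(\{\varphi > 0\}) = \lim_{n \to +\infty} \mu\Bigl(\Bigl\{\varphi > \frac{1}{n}\Bigr\}\Bigr) = 0,
\]
the conclusion follows. The other implication follows by the invariance of the integral.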
Proposition 2.11 (Monotone convergence). Let $(\varphi_n)$ be a nondecreasing sequence of extended nonnegative $\mathscr{E}$-measurable functions and set $\varphi(x) := \lim_n \varphi_n(x)$ for any $x \in X$. Then
\[
\int_X \varphi_n(x) \, d\mu(x) \uparrow \int_X \varphi(x) \, d\mu(x).
\]
Proof. It suffices to notice that $\mu(\{\varphi_n > t\}) \uparrow \mu(\{\varphi > t\})$ for all $t > 0$, and then to apply Proposition 2.8.
Now, by Proposition 2.4 we obtain the following important approximation property.
Proposition 2.12. Let $\varphi : X \to [0, +\infty]$ be an extended $\mathscr{E}$-measurable function. Then there exist simple $\mathscr{E}$-measurable functions $\varphi_n : X \to [0, +\infty)$ such that $\varphi_n \uparrow \varphi$, so that
\[
\int_X \varphi_n(x) \, d\mu(x) \uparrow \int_X \varphi(x) \, d\mu(x).
\]
Remark 2.13 (Construction of Lebesgue and Riemann integrals). Proposition 2.12 could be used as an alternative, and equivalent, definition of the Lebesgue integral: we can just define it as the supremum of the integral of minorant simple functions. This alternative definition is closer to the definitions of archimedean integrals and of the inner Riemann integral: the only (fundamental) difference is due to the choice of the family of simple functions. In all cases simple functions take finitely many values, but within the Lebesgue theory their level sets belong to a $\sigma$-algebra, and so the family of simple functions is much richer, in comparison with the other theories.
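We can now prove the additivity property of the integral.
Proposition 2.14. Let $\varphi, \psi : X \to [0, \infty]$ be $\mathscr{E}$-measurable functions. Then
\[
\int_X (\varphi + \psi) \, d\mu = \int_X \varphi \, d\mu + \int_X \psi \, d\mu.
\]
Proof. Let $\varphi_n$, $\psi_n$ be simple functions with $\varphi_n \uparrow \varphi$ and $\psi_n \uparrow \psi$. Then, the additivity of the integral on simple functions gives
\[
\int_X (\varphi_n + \psi_n) \, d\mu = \int_X \varphi_n \, d\mu + \int_X \psi_n \, d\mu.
\]
We conclude passing to the limit as $n \to \infty$ and using the monotone convergence theorem.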
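The following Fatou's lemma, providing a semicontinuity property of the integral, is of basic importance.
Lemma 2.15 (Fatou). Let $\varphi_n : X \to [0, +\infty]$ be extended $\mathscr{E}$-measurable functions. Then we have
\[
\int_X \liminf_{n \to \infty} \varphi_n(x) \, d\mu(x) \le \liminf_{n \to \infty} \int_X \varphi_n(x) \, d\mu(x). \tag{2.12}
\]
Proof. Setting $\varphi(x) := \liminf_n \varphi_n(x)$, and $\psi_n(x) = \inf_{m \ge n} \varphi_m(x)$, we have that $\psi_n(x) \uparrow \varphi(x)$. Consequently, by the monotone convergence theorem,
\[
\int_X \varphi(x) \, d\mu(x) = \lim_{n \to \infty} \int_X \psi_n(x) \, d\mu(x).
\]
On the other hand
\[
\int_X \psi_n(x) \, d\mu(x) \le \int_X \varphi_n(x) \, d\mu(x),
\]
so that
\[
\int_X \varphi(x) \, d\mu(x) \le \liminf_{n \to \infty} \int_X \varphi_n(x) \, d\mu(x).
\]
In particular, if $\varphi_n$ are pointwise converging to $\varphi$, we have
\[
\int_X \varphi(x) \, d\mu(x) \le \liminf_{n \to \infty} \int_X \varphi_n(x) \, d\mu(x).
\]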
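The inequality in Fatou's lemma can be strict. A standard example, which is not taken from the text and is given here only as an illustration, is obtained on $X = (0, 1)$ with the Lebesgue measure $\lambda$ by letting the mass of $\varphi_n$ concentrate near $0$: each $\varphi_n$ has integral 1, while $\varphi_n(x) \to 0$ for every $x \in (0, 1)$, because $\varphi_n(x) = 0$ as soon as $n \ge 1/x$.

```latex
\[
\varphi_n := n \, 1_{(0, 1/n)}, \qquad
\int_X \liminf_{n \to \infty} \varphi_n \, d\lambda = 0
\;<\; 1 = \liminf_{n \to \infty} \int_X \varphi_n \, d\lambda .
\]
```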
2.5. Integral of functions with a variable sign
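Let $\varphi : X \to \overline{\mathbb{R}}$ be an extended $\mathscr{E}$-measurable function. We say that $\varphi$ is $\mu$-integrable if both the positive part $\varphi^+(x) := \max\{\varphi(x), 0\}$ and the negative part $\varphi^-(x) := \max\{-\varphi(x), 0\}$ of $\varphi$ are $\mu$-integrable in $X$. As $\varphi = \varphi^+ - \varphi^-$, in this case it is natural to define
\[
\int_X \varphi(x) \, d\mu(x) := \int_X \varphi^+(x) \, d\mu(x) - \int_X \varphi^-(x) \, d\mu(x).
\]
As $|\varphi| = \varphi^+ + \varphi^-$, the additivity properties of the integral give that $\varphi$ is $\mu$-integrable if and only if $\int_X |\varphi| \, d\mu < \infty$.
Let $\varphi : X \to \overline{\mathbb{R}}$ and let $A \in \mathscr{E}$ be such that $1_A \varphi$ is $\mu$-integrable. We define also
\[
\int_A \varphi(x) \, d\mu(x) := \int_X 1_A(x) \varphi(x) \, d\mu(x).
\]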
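In the following proposition we summarize the main properties of the integral.
Proposition 2.16. Let $\varphi, \psi : X \to \mathbb{R}$ be $\mu$-integrable functions.
(i) For any $\alpha, \beta \in \mathbb{R}$ we have that $\alpha\varphi + \beta\psi$ is $\mu$-integrable and
\[ \int_X (\alpha\varphi + \beta\psi)\,d\mu = \alpha\int_X \varphi\,d\mu + \beta\int_X \psi\,d\mu. \]
(ii) If $\varphi \le \psi$ in $X$ we have $\int_X \varphi\,d\mu \le \int_X \psi\,d\mu$.
(iii) $\left|\int_X \varphi\,d\mu\right| \le \int_X |\varphi|\,d\mu$.
Proof. (i). Since $(-\varphi)^+ = \varphi^-$ and $(-\varphi)^- = \varphi^+$, we have $\int_X -\varphi\,d\mu = -\int_X \varphi\,d\mu$. So, possibly replacing $\varphi$ by $-\varphi$ and $\psi$ by $-\psi$ we can assume that $\alpha \ge 0$ and $\beta \ge 0$. We have
\[ (\alpha\varphi + \beta\psi)^+ + \alpha\varphi^- + \beta\psi^- = (\alpha\varphi + \beta\psi)^- + \alpha\varphi^+ + \beta\psi^+, \]
so that we can integrate both sides and use the additivity on nonnegative functions to obtain
\[ \int_X (\alpha\varphi + \beta\psi)^+\,d\mu + \alpha\int_X \varphi^-\,d\mu + \beta\int_X \psi^-\,d\mu = \int_X (\alpha\varphi + \beta\psi)^-\,d\mu + \alpha\int_X \varphi^+\,d\mu + \beta\int_X \psi^+\,d\mu. \]
Rearranging terms we obtain (i).
(ii). It follows by the monotonicity of the integral on nonnegative functions and from the inequalities $\varphi^+ \le \psi^+$ and $\varphi^- \ge \psi^-$.
(iii). Since $-|\varphi| \le \varphi \le |\varphi|$ the conclusion follows from (ii).
Another consequence of the additivity property of the integral is the additivity of the real-valued map
\[ A \in \mathscr{E} \mapsto \int_A \varphi\,d\mu \]
whenever $\varphi$ is $\mu$-integrable. We will see in the next section that, as a consequence of the dominated convergence theorem, this map is even $\sigma$-additive.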
2.6. Convergence of integrals
In this section we study the problem of commuting limit and integral; we have already seen that this can be done in some particular cases, as when the functions are nonnegative and monotonically converge to their supremum, and now we investigate some more general cases, relevant for the applications.
Proposition 2.17 (Lebesgue dominated convergence theorem). Let $(\varphi_n)$ be a sequence of $\mathscr{E}$-measurable functions pointwise converging to $\varphi$. Assume that there exists a nonnegative $\mu$-integrable function $\psi$ such that
\[ |\varphi_n(x)| \le \psi(x) \qquad \forall x \in X,\ \forall n \in \mathbb{N}. \]
Then the functions $\varphi_n$ and the function $\varphi$ are $\mu$-integrable and
\[ \lim_{n\to\infty} \int_X \varphi_n\,d\mu = \int_X \varphi\,d\mu. \]
Proof. Passing to the limit as $n \to \infty$ we obtain that $\varphi$ is $\mathscr{E}$-measurable and $|\varphi| \le \psi$ in $X$. In particular $\varphi$ is $\mu$-integrable. Since $\varphi_n + \psi$ is nonnegative, by the Fatou lemma we have
\[ \int_X (\varphi + \psi)\,d\mu \le \liminf_{n\to\infty} \int_X (\varphi_n + \psi)\,d\mu. \]
Consequently,
\[ \int_X \varphi\,d\mu \le \liminf_{n\to\infty} \int_X \varphi_n\,d\mu. \tag{2.13} \]
In a similar way we have
\[ \int_X (\psi - \varphi)\,d\mu \le \liminf_{n\to\infty} \int_X (\psi - \varphi_n)\,d\mu. \]
Consequently,
\[ \int_X \varphi\,d\mu \ge \limsup_{n\to\infty} \int_X \varphi_n\,d\mu. \tag{2.14} \]
Now the conclusion follows by (2.13) and (2.14).
An important consequence of the dominated convergence theorem is the absolute continuity property of the integral of $\mu$-integrable functions $\varphi$:
\[ \text{for any } \varepsilon > 0 \text{ there exists } \delta > 0 \text{ such that } \mu(A) < \delta \ \Longrightarrow\ \int_A |\varphi|\,d\mu < \varepsilon. \tag{2.15} \]
The proof of this property is sketched in Exercise 2.9.
2.6.1. Uniform integrability and Vitali convergence theorem
In this subsection we assume for simplicity that the measure $\mu$ is finite.
A family $\{\varphi_i\}_{i\in I}$ of $\mathbb{R}$-valued $\mu$-integrable functions is said to be $\mu$-uniformly integrable if
\[ \lim_{\mu(A)\to 0} \int_A |\varphi_i(x)|\,d\mu(x) = 0, \quad \text{uniformly in } i \in I. \]
This means that for any $\varepsilon > 0$ there exists $\delta > 0$ such that
\[ \mu(A) < \delta \ \Longrightarrow\ \int_A |\varphi_i(x)|\,d\mu(x) \le \varepsilon \quad \forall i \in I. \]
This property obviously extends from single functions to families of functions the absolute continuity property of the integral.
Notice that any family $\{\varphi_i\}_{i\in I}$ dominated by a single $\mu$-integrable function $\varphi$ (i.e. such that $|\varphi_i| \le |\varphi|$ for any $i \in I$) is obviously $\mu$-uniformly integrable. Taking this remark into account, we are going to prove the following extension of the dominated convergence theorem, known as Vitali Theorem.
Theorem 2.18 (Vitali). Assume that $\mu$ is a finite measure and let $(\varphi_n)$ be a $\mu$-uniformly integrable sequence of functions pointwise converging to a real valued function $\varphi$. Then $\varphi$ is $\mu$-integrable and
\[ \lim_{n\to\infty} \int_X \varphi_n\,d\mu = \int_X \varphi\,d\mu. \]
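To prove the Vitali theorem we need the following Egorov Lemma.
Lemma 2.19 (Egorov). Assume that $\mu$ is a finite measure and let $(\varphi_n)$ be a sequence of $\mathscr{E}$-measurable functions pointwise converging to a real valued function $\varphi$. Then for any $\varepsilon > 0$ there exists a set $A_\varepsilon \in \mathscr{E}$ such that $\mu(A_\varepsilon) < \varepsilon$ and $\varphi_n \to \varphi$ uniformly in $X \setminus A_\varepsilon$.
Proof. For any integer $m \ge 1$ we write $X$ as the increasing union (with respect to $n$) of the sets $B_{n,m}$, where
\[ B_{n,m} := \left\{ x \in X : |\varphi_i(x) - \varphi(x)| < \frac{1}{m} \ \ \forall i \ge n \right\}. \]
Since $\mu$ is finite there exists $n(m)$ such that $\mu(B_{n(m),m}) > \mu(X) - 2^{-m}\varepsilon$. We denote by $A_\varepsilon$ the union of the sets $X \setminus B_{n(m),m}$, $m \ge 1$, so that
\[ \mu(A_\varepsilon) \le \sum_{m=1}^{\infty} \mu(X \setminus B_{n(m),m}) < \sum_{m=1}^{\infty} 2^{-m}\varepsilon = \varepsilon. \]
Now, given any $\delta > 0$, we can choose $m > 1/\delta$ to obtain that
\[ |\varphi_n(x) - \varphi(x)| \le \frac{1}{m} < \delta \quad \text{for all } x \in B_{n(m),m},\ n \ge n(m). \]
As $X \setminus A_\varepsilon \subset B_{n(m),m}$, this proves the uniform convergence of $\varphi_n$ to $\varphi$ on $X \setminus A_\varepsilon$.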
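Proof of the Vitali Theorem. Fix $\varepsilon > 0$ and find $\delta > 0$ such that $\int_A |\varphi_n|\,d\mu < \varepsilon$ whenever $\mu(A) < \delta$. Again, Fatou's Lemma yields that $\int_A |\varphi|\,d\mu \le \varepsilon$ whenever $\mu(A) < \delta$.
Assume now that $A$, with $\mu(A) < \delta$, is given by Egorov Lemma, so that $\varphi_n \to \varphi$ uniformly on $X \setminus A$. Then, writing
\[ \int_X (\varphi_n - \varphi)\,d\mu = \int_{X\setminus A} (\varphi_n - \varphi)\,d\mu + \int_A (\varphi_n - \varphi)\,d\mu \]
and using the fact that $\lim_{n\to\infty} \sup_{X\setminus A} |\varphi_n - \varphi| = 0$ we obtain
\[ \left| \int_X (\varphi_n - \varphi)\,d\mu \right| \le 3\varepsilon \]
for $n$ large enough. The statement follows letting $\varepsilon \to 0$.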
2.7. A characterization of Riemann integrable functions
The integrals $\int_J f\,d\mu$, with $J = [a, b]$ closed interval of the real line and $\mu$ equal to the Lebesgue measure in $\mathbb{R}$, are traditionally denoted with the classical notation $\int_a^b f\,dx$ or with $\int_J f\,dx$. This is due to the fact that Riemann's and Lebesgue's integrals coincide on the class of Riemann integrable functions.
We denote by $I_*(f)$ and $I^*(f)$ the lower and upper Riemann integral of $f$ respectively, the former defined by taking the supremum of the sums $\sum_{i=1}^{n-1} a_i(t_{i+1} - t_i)$ in correspondence of all step functions
\[ h = \sum_{i=1}^{n-1} a_i 1_{[t_i, t_{i+1})} \le f, \qquad a = t_1 < \cdots < t_n = b, \tag{2.16} \]
and the latter considering the infimum in correspondence of all step functions $h \ge f$. We denote by $I(f)$ the Riemann integral, equal to the upper and lower integral whenever the two integrals coincide.
As the Lebesgue integral of the function $h$ in (2.16) coincides with $\sum_{i=1}^{n-1} a_i(t_{i+1} - t_i)$, we have
\[ \int_J g\,d\mu = I(g) \quad \text{for any step function } g : J \to \mathbb{R}. \]
Now, if $f : J \to \mathbb{R}$ is continuous, we can choose a uniformly bounded sequence of step functions $g_h$ converging pointwise to $f$ (for instance splitting $J$ into $h$ equal intervals $[x_i, x_{i+1})$ and setting $a_i = \min_{[x_i, x_{i+1}]} f$) whose Riemann integrals converge to $I(f)$. Therefore, passing to the limit in the identity above with $g = g_h$, and using the dominated convergence theorem we get
\[ \int_J f\,d\mu = I(f) \quad \text{for any continuous function } f : J \to \mathbb{R}. \]
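The step-function approximation from below can be tried numerically. The Python sketch below is only an illustration under stated assumptions: the cell minimum is estimated by sampling (exact only when the minimum sits at a sample point), and the helper name `lower_step_integral` and the test function $\sin$ on $[0, \pi]$ are hypothetical choices.

```python
import math

def lower_step_integral(f, a, b, n, samples=50):
    """Integral of a step function approximating f from below on n equal cells.

    On each cell the minimum of f is estimated by sampling, so in general the
    result only approximates the true lower sum.
    """
    width = (b - a) / n
    total = 0.0
    for i in range(n):
        left = a + i * width
        cell_min = min(f(left + j * width / samples) for j in range(samples + 1))
        total += cell_min * width
    return total

# The integral of sin over [0, pi] equals 2; the lower sums increase towards it.
for n in (10, 100, 1000):
    print(n, lower_step_integral(math.sin, 0.0, math.pi, n))
```

Refining the partition by a factor of ten each time keeps the partitions nested, so the printed values are nondecreasing and approach $I(f) = 2$ from below.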
We are going to generalize this fact, providing a full characterization, within the Lebesgue theory, of Riemann integrable functions.
Theorem 2.20. Let $f : J = [a, b] \to \mathbb{R}$ be a bounded function. Then $f$ is Riemann integrable if and only if the set of its discontinuity points is Lebesgue negligible. If this is the case, we have that $f$ is $\mathscr{B}(J)_\mu$-measurable and
\[ \int_J f\,d\mu = I(f). \tag{2.17} \]
Proof. Let
\[ f_*(x) := \inf\Big\{ \liminf_{h\to\infty} f(x_h) : x_h \to x \Big\}, \qquad f^*(x) := \sup\Big\{ \limsup_{h\to\infty} f(x_h) : x_h \to x \Big\}. \tag{2.18} \]
It is not hard to show (see Exercise 2.6 and Exercise 2.7) that $f_*$ is lower semicontinuous and $f^*$ is upper semicontinuous, therefore both $f_*$ and $f^*$ are Borel functions.
We are going to show that $I_*(f) = \int_J f_*\,d\mu$ and $I^*(f) = \int_J f^*\,d\mu$. These two equalities yield the conclusion, as $f$ is continuous at $\mu$-a.e. point in $J$ iff $f^* - f_* = 0$ $\mu$-a.e. in $J$, and this holds iff (because $f^* - f_* \ge 0$)
\[ \int_J (f^* - f_*)\,d\mu = 0. \]
Furthermore, if the set of discontinuity points of $f$ is $\mu$-negligible, the Borel function $f^*$ differs from $f$ only in a $\mu$-negligible set, thus $f$ is $\mathscr{B}(J)_\mu$-measurable (because $\{f > t\}$ differs from the Borel set $\{f^* > t\}$ only in a $\mu$-negligible set, see also Exercise 2.4) and its integral coincides with $\int_J f^*\,d\mu = \int_J f_*\,d\mu$; this leads to (2.17).
Since $I^*(f) = -I_*(-f)$ and $f^* = -(-f)_*$, we need only to prove the first of the two equalities, i.e.
\[ \int_J f_*\,d\mu = I_*(f). \tag{2.19} \]
In order to check the inequality $\le$ in (2.19) we apply Exercise 2.11, finding a sequence of continuous functions $g_h \uparrow f_* \le f$ and obtaining, thanks to the monotone convergence theorem,
\[ \int_J f_*\,d\mu = \sup_{h\in\mathbb{N}} \int_J g_h\,d\mu = \sup_{h\in\mathbb{N}} I(g_h) = \sup_{h\in\mathbb{N}} I_*(g_h) \le I_*(f). \]
In order to prove $\ge$ in (2.19) we fix a step function $h \le f$ in $[a, b)$ as in (2.16) and we notice that $f \ge a_i = h$ in $(t_i, t_{i+1})$ implies $f_* \ge a_i$ in the same interval. Hence $f_* \ge h$ in $J \setminus \{t_1, \ldots, t_n\}$ and, being the set of the $t_i$'s Lebesgue negligible, we have
\[ \int_J f_*\,d\mu \ge \int_J h\,d\mu = I(h). \]
Since $h$ is arbitrary the inequality is achieved.
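Exercises
2.1 Show that any of the conditions listed below is equivalent to the $\mathscr{E}$-measurability of $\varphi : X \to \mathbb{R}$.
(i) $\varphi^{-1}((-\infty, t]) \in \mathscr{E}$ for all $t \in \mathbb{R}$;
(ii) $\varphi^{-1}((-\infty, t)) \in \mathscr{E}$ for all $t \in \mathbb{R}$;
(iii) $\varphi^{-1}([a, b]) \in \mathscr{E}$ for all $a, b \in \mathbb{R}$;
(iv) $\varphi^{-1}([a, b)) \in \mathscr{E}$ for all $a, b \in \mathbb{R}$;
(v) $\varphi^{-1}((a, b)) \in \mathscr{E}$ for all $a, b \in \mathbb{R}$.
2.2 Let $\varphi, \psi : X \to \mathbb{R}$ be $\mathscr{E}$-measurable. Show that $\varphi + \psi$ and $\varphi\psi$ are $\mathscr{E}$-measurable. Hint: prove that
\[ \{\varphi + \psi < t\} = \bigcup_{r\in\mathbb{Q}} \big[ \{\varphi < r\} \cap \{\psi < t - r\} \big] \]
and
\[ \{\varphi^2 > a\} = \{\varphi > \sqrt{a}\} \cup \{\varphi < -\sqrt{a}\}, \qquad a \ge 0. \]
2.3 Let us define a distance $d$ in $\overline{\mathbb{R}}$ by
\[ d(x, y) := |\arctan x - \arctan y| \]
where, by convention, $\arctan(\pm\infty) = \pm\pi/2$.
(i) Show that $(\overline{\mathbb{R}}, d)$ is a compact metric space (the so-called compactification of $\mathbb{R}$) and that $A \subset \mathbb{R}$ is open relative to the Euclidean distance if, and only if, it is open relative to $d$;
(ii) use (i) to show that, given a measurable space $(X, \mathscr{E})$, $f : X \to \overline{\mathbb{R}}$ is $\mathscr{E}$-measurable according to (2.2) if and only if it is measurable between $\mathscr{E}$ and the Borel $\sigma$-algebra of $(\overline{\mathbb{R}}, d)$.
2.4 Let $(X, \mathscr{E}, \mu)$ be a measure space and let $\mathscr{E}_\mu$ be the completion of $\mathscr{E}$ induced by $\mu$. Show that $f : X \to \mathbb{R}$ is $\mathscr{E}_\mu$-measurable iff there exists an $\mathscr{E}$-measurable function $g$ such that $\{f \ne g\}$ is contained in a $\mu$-negligible set of $\mathscr{E}$.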
2.5 Let us define $I_f$ as in (2.8) and let us endow $\Sigma$ with the usual partial ordering $\sigma = \{t_1, \ldots, t_N\} \le \zeta = \{s_1, \ldots, s_M\}$ if and only if $\sigma \subset \zeta$. Show that $\sigma \mapsto I_f(\sigma)$ is nondecreasing. Use this fact to show that $f \mapsto \int_0^\infty f(t)\,dt$ is additive.
2.6 Let $f : \mathbb{R} \to \mathbb{R}$ be a function. Show that the functions $f_*$, $f^*$ defined in (2.18) are respectively lower semicontinuous and upper semicontinuous.
2.7 Let $f : \mathbb{R} \to \mathbb{R}$ be a bounded function. Using Exercise 2.6 show that $\{f_* \le t\}$ and $\{f^* \ge t\}$ are closed for all $t \in \mathbb{R}$. In particular deduce that
\[ \Sigma = \{x \in \mathbb{R} : f \text{ is continuous at } x\} \]
belongs to $\mathscr{B}(\mathbb{R})$.
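2.8 Let $(a_n) \subset (0, \infty)$ with
\[ \sum_{i=0}^{\infty} a_i = \infty, \qquad \lim_{i\to\infty} a_i = 0. \]
Show that for any $\varphi : X \to [0, +\infty]$ $\mathscr{E}$-measurable there exist $A_i \in \mathscr{E}$ such that $\varphi = \sum_i a_i 1_{A_i}$. Hint: set $\varphi_0 := \varphi$, $A_0 := \{\varphi_0 \ge a_0\}$ and $\varphi_1 := \varphi_0 - a_0 1_{A_0} \ge 0$. Then, set $A_1 := \{\varphi_1 \ge a_1\}$ and $\varphi_2 := \varphi_1 - a_1 1_{A_1}$ and so on.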
2.9 Let $\varphi : X \to \mathbb{R}$ be $\mu$-integrable. Show that the property (2.15) holds. Hint: assume by contradiction its failure for some $\varepsilon > 0$ and find $A_i$ with $\mu(A_i) < 2^{-i}$ and $\int_{A_i} |\varphi|\,d\mu \ge \varepsilon$. Then, notice that $B := \limsup_i A_i$ is $\mu$-negligible, consider
\[ B_n := \bigcup_{i\ge n} A_i \setminus B \]
and apply the dominated convergence theorem.
2.10 Prove that if $\varphi_n \to \varphi$ in $L^1(X, \mathscr{E}, \mu)$, then $(\varphi_n)$ is $\mu$-uniformly integrable. In addition, find a space $(X, \mathscr{E}, \mu)$ and a sequence $(\varphi_n)$ that is $\mu$-uniformly integrable, for which there is no $g \in L^1(X, \mathscr{E}, \mu)$ satisfying $|\varphi_n| \le g$ for all $n \in \mathbb{N}$.
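2.11 Let $(X, d)$ be a metric space and let $g : X \to [0, \infty]$ be lower semicontinuous and not identically equal to $\infty$. For any $\lambda > 0$ define
\[ g_\lambda(x) := \inf_{y\in X} \{ g(y) + \lambda d(x, y) \}. \]
Check that:
(a) $|g_\lambda(x) - g_\lambda(x')| \le \lambda d(x, x')$ for all $x, x' \in X$;
(b) $g_\lambda \uparrow g$ as $\lambda \uparrow \infty$.
2.12 Let $f : \mathbb{R}^2 \to \mathbb{R}$ be satisfying the following two properties:
(i) $x \mapsto f(x, y)$ is continuous in $\mathbb{R}$ for all $y \in \mathbb{R}$;
(ii) $y \mapsto f(x, y)$ is continuous in $\mathbb{R}$ for all $x \in \mathbb{R}$.
Show that $f$ is a Borel function. Hint: first reduce to the case when $f$ is bounded. Then, for $\varepsilon > 0$ consider the functions
\[ f_\varepsilon(x, y) := \frac{1}{2\varepsilon} \int_{x-\varepsilon}^{x+\varepsilon} f(x', y)\,dx', \]
proving that $f_\varepsilon$ are continuous and $f_\varepsilon \to f$ as $\varepsilon \downarrow 0$.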
Chapter 3
Spaces of integrable functions
This chapter is devoted to the properties of the so-called $L^p$ spaces, the spaces of measurable functions whose $p$-th power is integrable. Throughout the chapter a measure space $(X, \mathscr{E}, \mu)$ will be fixed.
3.1. Spaces $\mathscr{L}^p(X, \mathscr{E}, \mu)$ and $L^p(X, \mathscr{E}, \mu)$
Let $Y$ be a real vector space. We recall that a norm $\|\cdot\|$ on $Y$ is a nonnegative map defined on $Y$ satisfying:
(i) $\|y\| = 0$ if and only if $y = 0$;
(ii) $\|\alpha y\| = |\alpha|\,\|y\|$ for all $\alpha \in \mathbb{R}$ and $y \in Y$;
(iii) $\|y_1 + y_2\| \le \|y_1\| + \|y_2\|$ for all $y_1, y_2 \in Y$.
The space $Y$, endowed with the norm $\|\cdot\|$, is called a normed space. $Y$ is also a metric space when endowed with the distance $d(y_1, y_2) = \|y_1 - y_2\|$ (the triangle inequality is a direct consequence of (iii)). If $(Y, d)$ is a complete metric space, we say that $(Y, \|\cdot\|)$ is a Banach space.
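We denote by $\mathscr{L}^1(X, \mathscr{E}, \mu)$ the real vector space of all $\mu$-integrable functions on $(X, \mathscr{E})$. We define
\[ \|\varphi\|_1 := \int_X |\varphi(x)|\,d\mu(x), \qquad \varphi \in \mathscr{L}^1(X, \mathscr{E}, \mu). \]
We have clearly
\[ \|\alpha\varphi\|_1 = |\alpha|\,\|\varphi\|_1 \qquad \forall \alpha \in \mathbb{R},\ \varphi \in \mathscr{L}^1(X, \mathscr{E}, \mu), \]
and
\[ \|\varphi + \psi\|_1 \le \|\varphi\|_1 + \|\psi\|_1 \qquad \forall \varphi, \psi \in \mathscr{L}^1(X, \mathscr{E}, \mu), \]
so that conditions (ii) and (iii) in the definition of the norm are fulfilled. However, $\|\cdot\|_1$ is not a norm in general, since $\|\varphi\|_1 = 0$ if and only if $\varphi = 0$ a.e. in $X$, so (i) fails.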
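Then, we can consider the following equivalence relation $\mathscr{R}$ on $\mathscr{L}^1(X, \mathscr{E}, \mu)$,
\[ \varphi \sim \psi \iff \varphi = \psi \ \text{a.e. in } X \tag{3.1} \]
and denote by $L^1(X, \mathscr{E}, \mu)$ the quotient space of $\mathscr{L}^1(X, \mathscr{E}, \mu)$ with respect to $\mathscr{R}$. In other words, $L^1(X, \mathscr{E}, \mu)$ is the quotient vector space of $\mathscr{L}^1(X, \mathscr{E}, \mu)$ with respect to the vector subspace made by functions vanishing a.e. in $X$.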
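For any $\varphi \in \mathscr{L}^1(X, \mathscr{E}, \mu)$ we denote by $\tilde\varphi$ the equivalence class determined by $\varphi$ and we set
\[ \tilde\varphi + \tilde\psi := \widetilde{\varphi + \psi}, \qquad \alpha\tilde\varphi := \widetilde{\alpha\varphi}. \tag{3.2} \]
It is easily seen that these definitions do not depend on the choice of representatives in the equivalence class, and endow $L^1(X, \mathscr{E}, \mu)$ with the structure of a real vector space, whose origin is the equivalence class of functions vanishing a.e. in $X$. Furthermore, setting
\[ \|\tilde\varphi\|_1 = \|\varphi\|_1, \qquad \tilde\varphi \in L^1(X, \mathscr{E}, \mu), \]
it is also easy to see that this definition does not depend on the particular element $\varphi$ chosen in $\tilde\varphi$, and that (ii), (iii) still hold. Now, if $\|\tilde\varphi\|_1 = 0$ we have that the integral of $|\varphi|$ is zero, and therefore $\tilde\varphi = 0$. Therefore $L^1(X, \mathscr{E}, \mu)$, endowed with the norm $\|\cdot\|_1$, is a normed space.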
To simplify the notation typically $\tilde\varphi$ is identified with $\varphi$ whenever the formula does not depend on the choice of the function in the equivalence class: for instance, quantities as $\mu(\{\varphi > t\})$ or $\int_X \varphi\,d\mu$ have this independence, as well as most statements and results in Measure Theory and Probability, so this slight abuse of notation is justified. It should be noted, however, that formulas like $\varphi(\bar x) = 0$, for some fixed $\bar x \in X$, do not make sense in $L^1(X, \mathscr{E}, \mu)$, since they depend on the representative chosen (unless $\mu(\{\bar x\}) > 0$).
More generally, if an exponent $p \in (0, \infty)$ is given, we can apply a similar construction to the space
\[ \mathscr{L}^p(X, \mathscr{E}, \mu) := \left\{ \varphi : \varphi \text{ is } \mathscr{E}\text{-measurable and } \int_X |\varphi|^p\,d\mu < \infty \right\}. \]
Since $|x + y|^p \le |x|^p + |y|^p$ if $p \le 1$, and $|x + y|^p \le 2^{p-1}(|x|^p + |y|^p)$ if $p \ge 1$, it turns out that $\mathscr{L}^p(X, \mathscr{E}, \mu)$ is a vector space, and we shall denote by $L^p(X, \mathscr{E}, \mu)$ the quotient vector space, with respect to the equivalence relation (3.1). Still we can define the sum and product by a real number as in (3.2), to obtain that $L^p(X, \mathscr{E}, \mu)$ has the structure of a real vector space. The case $p = 2$ is particularly relevant for the theory, as we will see.
Sometimes we will omit either $\mathscr{E}$ or $\mu$, writing $L^p(X, \mu)$ or even $L^p(X)$. This typically happens when $(X, d)$ is a metric space, and $\mathscr{E}$ is the Borel $\sigma$-algebra, or when $X \subset \mathbb{R}$ and $\mu$ is the Lebesgue measure.
3.2. The $L^p$ norm
For any $\varphi \in L^p(X, \mathscr{E}, \mu)$ we define
\[ \|\varphi\|_p := \left( \int_X |\varphi|^p\,d\mu \right)^{1/p}. \]
We are going to show that $\|\cdot\|_p$ is a norm for any $p \in [1, +\infty)$. Notice that we already checked this fact when $p = 1$, and that the homogeneity condition (ii) trivially holds, whatever the value of $p$ is. Furthermore, condition (i) holds precisely because $L^p(X, \mathscr{E}, \mu)$ consists, strictly speaking, of equivalence classes induced by (3.1). So, the only condition that needs to be checked is the subadditivity condition (iii), and in the sequel we can assume $p > 1$.
The concept of Legendre transform will be useful. Let $f : \mathbb{R} \to \mathbb{R}$ be a function; we define its Legendre transform $f^* : \mathbb{R} \to \mathbb{R} \cup \{+\infty\}$ by
\[ f^*(y) = \sup_{x\in\mathbb{R}} \{xy - f(x)\}, \qquad y \in \mathbb{R}. \]
Then the following inequality clearly holds:
\[ xy \le f(x) + f^*(y) \qquad \forall x, y \in \mathbb{R}, \tag{3.3} \]
and actually $f^*$ could be equivalently defined as the smallest function with this property.
Example 3.1. Let $p > 1$ and let
\[ f(x) = \begin{cases} \dfrac{x^p}{p} & \text{if } x \ge 0, \\ 0 & \text{if } x < 0. \end{cases} \]
Then, by an elementary computation, we find that
\[ f^*(y) = \begin{cases} \dfrac{y^q}{q} & \text{if } y \ge 0, \\ +\infty & \text{if } y < 0, \end{cases} \]
where $q = p/(p-1)$ (equivalently, $\frac{1}{p} + \frac{1}{q} = 1$). Consequently, the following inequality, known as Young inequality, holds:
\[ xy \le \frac{x^p}{p} + \frac{y^q}{q}, \qquad x, y \ge 0. \tag{3.4} \]
Motivated by the previous example, we say that $p$ and $q$ are dual (or conjugate) exponents if $\frac{1}{p} + \frac{1}{q} = 1$, i.e. $q = p/(p-1)$. The duality relation is symmetric in $(1, +\infty)$, and obviously 2 is self-dual.
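A quick numerical sanity check of the Young inequality (3.4) may help fix ideas. The Python sketch below samples random points and exponents (the ranges and the helper name `young_gap` are arbitrary, illustrative choices) and also evaluates the equality case $y = x^{p-1}$.

```python
import random

def young_gap(x, y, p):
    """Return x**p/p + y**q/q - x*y for the conjugate exponent q = p/(p-1)."""
    q = p / (p - 1.0)
    return x**p / p + y**q / q - x * y

random.seed(0)
gaps = [young_gap(random.uniform(0, 5), random.uniform(0, 5), random.uniform(1.1, 6))
        for _ in range(10_000)]
# Up to rounding errors the gap is never negative.
print("smallest gap over the sample:", min(gaps))

# Equality case y = x**(p-1): here p = 3, x = 2, y = 4.
print("gap in the equality case:", young_gap(2.0, 4.0, 3.0))
```

The gap vanishes exactly when $y = x^{p-1}$, which is the point where the supremum defining the Legendre transform is attained.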
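Example 3.2. Let $f(x) = e^x$, $x \in \mathbb{R}$. Then
\[ f^*(y) := \sup_{x\in\mathbb{R}} \{xy - e^x\} = \begin{cases} +\infty & \text{if } y < 0, \\ 0 & \text{if } y = 0, \\ y\log y - y & \text{if } y > 0. \end{cases} \]
Consequently, the following inequality holds:
\[ xy \le e^x + y\log y - y, \qquad x, y \ge 0. \tag{3.5} \]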
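The closed formula for the Legendre transform of the exponential can be compared with a brute-force evaluation of the supremum. In the Python sketch below the supremum is replaced by a maximum over a finite grid, which is only an approximation from below; the grid $[-10, 10]$ with step $10^{-3}$ and the helper name `legendre_on_grid` are assumptions made for illustration.

```python
import math

def legendre_on_grid(f, y, xs):
    """Approximate f*(y) = sup_x {x*y - f(x)} by a maximum over the grid xs."""
    return max(x * y - f(x) for x in xs)

xs = [i / 1000.0 for i in range(-10_000, 10_001)]   # grid on [-10, 10]
for y in (0.5, 1.0, 2.0, 5.0):
    approx = legendre_on_grid(math.exp, y, xs)
    exact = y * math.log(y) - y
    print(y, approx, exact)
```

For these values of $y$ the maximizer $x = \log y$ lies well inside the grid, so the two columns agree up to the discretization error.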
3.2.1. Hölder and Minkowski inequalities
Proposition 3.3 (Hölder inequality). Assume that $\varphi \in L^p(X, \mathscr{E}, \mu)$ and $\psi \in L^q(X, \mathscr{E}, \mu)$, with $p$ and $q$ dual exponents in $(1, +\infty)$. Then $\varphi\psi \in L^1(X, \mathscr{E}, \mu)$ and
\[ \|\varphi\psi\|_1 \le \|\varphi\|_p \|\psi\|_q. \tag{3.6} \]
Proof. If either $\|\varphi\|_p = 0$ or $\|\psi\|_q = 0$ then one of the two functions vanishes a.e. in $X$, hence $\varphi\psi$ vanishes a.e. and the inequality is trivial. If both $\|\varphi\|_p$ and $\|\psi\|_q$ are strictly positive, by the 1-homogeneity of both sides in (3.6) with respect to $\varphi$ and $\psi$, we can assume with no loss of generality that the two norms are equal to 1.
Now we apply (3.4) to $|\varphi(x)|$ and $|\psi(x)|$ to obtain
\[ |\varphi(x)\psi(x)| \le \frac{|\varphi(x)|^p}{p} + \frac{|\psi(x)|^q}{q}. \]
Integrating over $X$ with respect to $\mu$ yields
\[ \int_X |\varphi(x)\psi(x)|\,d\mu(x) \le \frac{1}{p} + \frac{1}{q} = 1. \]
A particular case of the Hölder inequality is
\[ \int_X \varphi(x)\psi(x)\,d\mu(x) \le \left( \int_X \varphi^2(x)\,d\mu(x) \right)^{1/2} \left( \int_X \psi^2(x)\,d\mu(x) \right)^{1/2}. \]
It also follows, as we shall see, from the Cauchy-Schwarz inequality of scalar products.
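On a finite set, where a measure is just a list of nonnegative weights, both sides of (3.6) are finite sums and can be compared directly. The following Python sketch does this; the data, the weights and the helper names are hypothetical and serve only as an illustration.

```python
def lp_norm(values, weights, p):
    """L^p norm of a function on a finite set, for the measure given by weights."""
    return sum(w * abs(v)**p for v, w in zip(values, weights)) ** (1.0 / p)

def holder_sides(phi, psi, weights, p):
    """Return (||phi*psi||_1, ||phi||_p * ||psi||_q) with q = p/(p-1)."""
    q = p / (p - 1.0)
    lhs = sum(w * abs(a * b) for a, b, w in zip(phi, psi, weights))
    rhs = lp_norm(phi, weights, p) * lp_norm(psi, weights, q)
    return lhs, rhs

phi = [1.0, -2.0, 0.5, 3.0]       # hypothetical functions on a 4-point space
psi = [0.3, 1.5, -4.0, 2.0]
weights = [0.5, 1.0, 2.0, 0.25]   # hypothetical measure of each point
for p in (1.5, 2.0, 3.0, 7.0):
    print(p, holder_sides(phi, psi, weights, p))
```

With $p = 2$ and $\psi = \varphi$ the two sides coincide, in agreement with the equality case of the Cauchy-Schwarz inequality.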
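Proposition 3.4 (Minkowski inequality). Assume that $p \in [1, +\infty)$ and $\varphi, \psi \in L^p(X, \mathscr{E}, \mu)$. Then $\varphi + \psi \in L^p(X, \mathscr{E}, \mu)$ and
\[ \|\varphi + \psi\|_p \le \|\varphi\|_p + \|\psi\|_p. \tag{3.7} \]
Proof. The case $p = 1$ is obvious. Assume that $p \in (1, +\infty)$. Then we have
\[ \int_X |\varphi + \psi|^p\,d\mu \le \int_X |\varphi + \psi|^{p-1}|\varphi|\,d\mu + \int_X |\varphi + \psi|^{p-1}|\psi|\,d\mu. \]
Since $|\varphi + \psi|^{p-1} \in L^q(X, \mathscr{E}, \mu)$ where $q = p/(p-1)$, using the Hölder inequality we find that
\[ \int_X |\varphi + \psi|^p\,d\mu \le \left( \int_X |\varphi + \psi|^p\,d\mu \right)^{1/q} \big( \|\varphi\|_p + \|\psi\|_p \big), \]
and the conclusion follows dividing both sides by $\left( \int_X |\varphi + \psi|^p\,d\mu \right)^{1/q}$, since $1 - 1/q = 1/p$.
By the previous proposition it follows that $\|\cdot\|_p$ is a norm on $L^p(X, \mathscr{E}, \mu)$.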
3.3. Convergence in $L^p(X, \mathscr{E}, \mu)$ and completeness
We have seen in the previous section that $L^p(X, \mathscr{E}, \mu)$ is a normed space for all $p \in [1, +\infty)$. In this section we prove some properties of the convergence in these spaces, obtaining as a byproduct the following result.
Theorem 3.5. $L^p(X, \mathscr{E}, \mu)$ is a Banach space for any $p \in [1, +\infty)$.
This theorem will be a direct consequence of the following proposition, that provides also a relation between convergence in $L^p$ and convergence a.e. in $X$.
Proposition 3.6. Let $p \in [1, +\infty)$ and let $(\varphi_n)$ be a Cauchy sequence in $L^p(X, \mathscr{E}, \mu)$. Then:
(i) there exists a subsequence $(\varphi_{n(k)})$ converging a.e. to a function $\varphi$ in $L^p(X, \mathscr{E}, \mu)$;
(ii) $(\varphi_n)$ is converging to $\varphi$ in $L^p(X, \mathscr{E}, \mu)$, so that $L^p(X, \mathscr{E}, \mu)$ is a Banach space.
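Proof. Let $(\varphi_n)$ be a Cauchy sequence in $L^p(X, \mathscr{E}, \mu)$. Choose a subsequence $(\varphi_{n(k)})$ such that
\[ \|\varphi_{n(k+1)} - \varphi_{n(k)}\|_p < 2^{-k} \qquad \forall k \in \mathbb{N}. \]
Next, set
\[ g(x) := \sum_{k=0}^{\infty} |\varphi_{n(k+1)}(x) - \varphi_{n(k)}(x)|, \qquad x \in X. \]
By the monotone convergence theorem and the subadditivity of the $L^p$ norm it follows that
\[ \left( \int_X g^p(x)\,d\mu(x) \right)^{1/p} = \lim_{N\to\infty} \left( \int_X \Big( \sum_{k=0}^{N-1} |\varphi_{n(k+1)} - \varphi_{n(k)}| \Big)^p d\mu \right)^{1/p} \le \lim_{N\to\infty} \sum_{k=0}^{N-1} 2^{-k} = 2 < \infty. \]
Therefore, $g$ is finite a.e., that is, there exists $B \in \mathscr{E}$ such that $\mu(B) = 0$ and $g(x) < \infty$ for all $x \in B^c$. Set now
\[ \varphi(x) := \varphi_{n(0)}(x) + \sum_{k=0}^{\infty} \big( \varphi_{n(k+1)}(x) - \varphi_{n(k)}(x) \big), \qquad x \in B^c. \]
The series above is absolutely convergent for any $x \in B^c$; moreover, replacing the series in the definition of $\varphi$ by the finite sum $\sum_{k=0}^{N-1} (\varphi_{n(k+1)}(x) - \varphi_{n(k)}(x))$ we obtain $\varphi_{n(N)}(x)$, so that $\varphi(x) = \lim_{k\to\infty} \varphi_{n(k)}(x)$. Therefore, if we define (for instance) $\varphi = 0$ on the $\mu$-negligible set $B$, we obtain that $\varphi_{n(k)} \to \varphi$ a.e. on $X$.
The inequality $|\varphi| \le |\varphi_{n(0)}| + g$ gives that $|\varphi|^p$ is $\mu$-integrable, so that $\varphi \in L^p(X, \mathscr{E}, \mu)$. So, (i) is proved.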
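In order to prove (ii), we first claim that $\varphi_{n(k)} \to \varphi$ in $L^p(X, \mathscr{E}, \mu)$ as $k \to \infty$. In fact, since
\[ |\varphi(x) - \varphi_{n(h)}(x)| \le \sum_{k=h}^{\infty} |\varphi_{n(k+1)}(x) - \varphi_{n(k)}(x)|, \qquad x \in B^c, \]
we have, again by monotone convergence and subadditivity of the norm,
\[ \left( \int_X |\varphi(x) - \varphi_{n(h)}(x)|^p\,d\mu(x) \right)^{1/p} \le \sum_{k=h}^{\infty} \left( \int_X |\varphi_{n(k+1)}(x) - \varphi_{n(k)}(x)|^p\,d\mu(x) \right)^{1/p} \le \sum_{k=h}^{\infty} 2^{-k}, \]
and the claim follows.
Since $(\varphi_n)$ is Cauchy, for any $\varepsilon > 0$ there exists $n_\varepsilon \in \mathbb{N}$ such that
\[ n, m > n_\varepsilon \ \Longrightarrow\ \|\varphi_n - \varphi_m\|_p < \varepsilon. \]
Now choose $k \in \mathbb{N}$ such that $n(k) > n_\varepsilon$ and $\|\varphi - \varphi_{n(k)}\|_p < \varepsilon$. For any $n > n_\varepsilon$ we have
\[ \|\varphi - \varphi_n\|_p \le \|\varphi - \varphi_{n(k)}\|_p + \|\varphi_{n(k)} - \varphi_n\|_p \le 2\varepsilon. \]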
Remark 3.7 ($L^p$ convergence versus a.e. convergence). The argument used in the previous proof applies also to converging sequences (as these sequences are obviously Cauchy), and proves that any sequence $(\varphi_n)$ strongly converging to $\varphi$ in $L^p(X, \mathscr{E}, \mu)$ admits a subsequence $(\varphi_{n(k)})$ converging a.e. to $\varphi$: precisely, this happens whenever $\sum_k \|\varphi_{n(k+1)} - \varphi_{n(k)}\|_p < \infty$.
In general, however, convergence in $L^p$ does not imply convergence a.e.: the functions
\[ \varphi_0 = 1_{[0,1]},\quad \varphi_1 = 1_{[0,1/2]},\ \varphi_2 = 1_{[1/2,1]},\quad \varphi_3 = 1_{[0,1/3]},\ \varphi_4 = 1_{[1/3,2/3]},\ \varphi_5 = 1_{[2/3,1]},\ \ldots \]
converge to 0 in $L^p(0, 1)$, but are nowhere pointwise converging.
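The behaviour of this sequence of indicator functions can be inspected with exact rational arithmetic. In the Python sketch below the indexing of the intervals is reconstructed from the six functions listed above (blocks of $1, 2, 3, \ldots$ intervals of equal length); the helper names are illustrative assumptions.

```python
from fractions import Fraction

def interval(n):
    """Return (left, right) of the interval whose indicator is the n-th function."""
    m = 1
    while n >= m:          # skip the blocks made of 1, 2, 3, ... intervals
        n -= m
        m += 1
    return Fraction(n, m), Fraction(n + 1, m)

def phi(n, x):
    left, right = interval(n)
    return 1 if left <= x <= right else 0

def lp_norm(n, p):
    # The L^p norm of an indicator function is (length of the interval)**(1/p).
    left, right = interval(n)
    return float(right - left) ** (1.0 / p)

x = Fraction(1, 3)
values = [phi(n, x) for n in range(60)]
print("L^2 norms:", [round(lp_norm(n, 2), 3) for n in (0, 5, 20, 59)])
print("values at x = 1/3:", values)
```

The norms decrease to 0, while the values at the fixed point keep returning to 1 in every block, so the sequence of values has no limit.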
The previous remark shows that we can expect to infer pointwise convergence from convergence in $L^p$ only modulo the extraction of a subsequence. Now, we ask ourselves about the converse implication: given a sequence $(\varphi_n)$ in $L^p(X, \mathscr{E}, \mu)$ pointwise converging to a function $\varphi \in L^p(X, \mathscr{E}, \mu)$, we want to find conditions ensuring the convergence of $(\varphi_n)$ to $\varphi$ in $L^p(X, \mathscr{E}, \mu)$. This is not true in general, as the following example shows.
Example 3.8. Let $X = [0, 1]$, $\mathscr{E} = \mathscr{B}([0, 1])$ and let $\mu = \lambda$ be the Lebesgue measure. Set
\[ \varphi_n(x) = \begin{cases} n & \text{if } x \in [0, 1/n], \\ 0 & \text{if } x \in (1/n, 1]. \end{cases} \]
Then $\varphi_n(x) \to 0$ for all $x \in (0, 1]$ but $\|\varphi_n\|_1 = 1$.
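The two claims of the example can be checked exactly. The short Python sketch below (helper names are illustrative) uses the fact that the integral of the step function $n 1_{[0,1/n]}$ is its value times the length of the interval.

```python
from fractions import Fraction

def phi(n, x):
    """Value at x of the step function n * 1_[0, 1/n]."""
    return n if 0 <= x <= Fraction(1, n) else 0

def l1_norm(n):
    # value of the step function times the length of the interval [0, 1/n]
    return n * Fraction(1, n)

x = Fraction(1, 10)
print([phi(n, x) for n in (5, 10, 11, 100)])   # eventually 0 at the fixed point x
print([l1_norm(n) for n in (1, 10, 100)])      # the L^1 norm does not decay
```

For a fixed $x > 0$ the values vanish as soon as $n > 1/x$, whereas the norms stay equal to 1, so no convergence to 0 in $L^1$ can take place.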
In the next proposition we assume that $\mu$ is a finite measure, since we defined uniform integrability only for finite measures $\mu$.
Proposition 3.9. Let $(\varphi_n)$ be a sequence in $L^p(X, \mathscr{E}, \mu)$ pointwise convergent to a function $\varphi \in L^p(X, \mathscr{E}, \mu)$, with $(|\varphi_n|^p)$ $\mu$-uniformly integrable. Then $\varphi_n \to \varphi$ in $L^p(X, \mathscr{E}, \mu)$.
Proof. The functions $h_n := |\varphi_n - \varphi|^p$ are pointwise converging to 0 and, because of the inequality
\[ h_n \le 2^{p-1} \big( |\varphi_n|^p + |\varphi|^p \big), \]
they are also easily seen to be $\mu$-uniformly integrable. Therefore, by applying Vitali Theorem 2.18 to $h_n$ we obtain the conclusion.
3.4. The space $L^\infty(X, \mathscr{E}, \mu)$
Let $\varphi : X \to \mathbb{R}$ be an $\mathscr{E}$-measurable function. We say that $\varphi$ is $\mu$-essentially bounded if there exists a real number $M > 0$ such that $\mu(\{|\varphi| > M\}) = 0$.
If $\varphi$ is $\mu$-essentially bounded there exists a nonnegative number, denoted by $\|\varphi\|_\infty$, such that
\[ \|\varphi\|_\infty = \min \{ t \ge 0 : \mu(\{|\varphi| > t\}) = 0 \}. \tag{3.8} \]
This easily follows from the fact that the function $t \mapsto \mu(\{|\varphi| > t\})$ is right continuous (Proposition 2.6), so the infimum is attained.
Notice also that $\|\varphi\|_\infty$ is characterized by the property
\[ \|\varphi\|_\infty \le M \iff |\varphi| \le M \ \text{a.e. in } X. \tag{3.9} \]
We shall denote by $L^\infty(X, \mathscr{E}, \mu)$ the space of all equivalence classes of $\mu$-essentially bounded functions with respect to the equivalence relation $\sim$ in (3.1), thus identifying functions that coincide a.e. in $X$.
Several properties of the $L^p$ spaces extend up to the case $p = \infty$: first of all $L^\infty(X, \mathscr{E}, \mu)$ is a real vector space and we have the Minkowski inequality
\[ \|\varphi + \psi\|_\infty \le \|\varphi\|_\infty + \|\psi\|_\infty. \tag{3.10} \]
Indeed, by (3.9) and the triangle inequality, $|\varphi(x) + \psi(x)| \le \|\varphi\|_\infty + \|\psi\|_\infty$ a.e. in $X$, therefore (3.9) provides (3.10). As a consequence, $L^\infty(X, \mathscr{E}, \mu)$ endowed with the norm $\|\cdot\|_\infty$, is a normed space.
The Hölder inequality takes the form
\[ \int_X |\varphi\psi|\,d\mu \le \|\varphi\|_\infty \int_X |\psi|\,d\mu. \tag{3.11} \]
Indeed, we have just to notice that $|\varphi(x)\psi(x)| \le \|\varphi\|_\infty |\psi(x)|$ for a.e. $x \in X$, and then integrate with respect to $\mu$. This inequality can be still written as (3.6), provided we agree that $q = 1$ is the dual exponent of $p = \infty$ (and conversely).
For finite measures we can apply Hölder's inequality to obtain that the $L^p$ spaces are nested; in particular $L^\infty$ is the smallest one and $L^1$ is the largest one.
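Remark 3.10 (Inclusions between $L^p$ spaces). Assume that $\mu$ is finite. Then, if $1 \le r \le s \le \infty$ we have
\[ L^r(X, \mathscr{E}, \mu) \supset L^s(X, \mathscr{E}, \mu). \]
In fact, if $r < s$ and $\varphi \in L^s(X, \mathscr{E}, \mu)$ we have, in view of the Hölder inequality (with $p = s/r$ and $q = s/(s-r)$),
\[ \int_X |\varphi(x)|^r\,d\mu(x) \le \left( \int_X |\varphi(x)|^s\,d\mu(x) \right)^{r/s} \left( \int_X 1_X\,d\mu(x) \right)^{1 - r/s}, \]
and so
\[ \|\varphi\|_r \le (\mu(X))^{(s-r)/rs}\, \|\varphi\|_s. \tag{3.12} \]
By (3.12) we obtain that $p \mapsto \mu(X)^{-1/p}\|\varphi\|_p$ is nondecreasing for $\varphi$ in the intersection of the spaces $L^p(X, \mathscr{E}, \mu)$, so that it has a limit as $p \to \infty$. Since $\mu(X)^{1/p} \to 1$ as $p \to \infty$ we obtain that $\lim_{p\to\infty} \|\varphi\|_p$ exists, finite or infinite. The following proposition characterizes $L^\infty(X, \mathscr{E}, \mu)$ and the $L^\infty$ norm in terms of this limit.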
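Proposition 3.11. Assume that $\mu$ is finite and let $\varphi$ be in the intersection
\[ \bigcap_{p<\infty} L^p(X, \mathscr{E}, \mu). \]
Then $\varphi \in L^\infty(X, \mathscr{E}, \mu)$ if and only if the limit $\lim_{p\to\infty} \|\varphi\|_p$ is finite. If this is the case, we have that $\|\varphi\|_\infty$ coincides with the value of the limit.
Proof. If $p \ge 1$ we have by the Markov inequality
\[ \mu(\{|\varphi| \ge a\}) = \mu(\{|\varphi|^p \ge a^p\}) \le a^{-p} \|\varphi\|_p^p. \]
Consequently, $\|\varphi\|_p \ge a\,\mu(\{|\varphi| \ge a\})^{1/p}$, which yields $\lim_{p\to\infty} \|\varphi\|_p \ge a$ whenever $\mu(\{|\varphi| \ge a\}) > 0$. So, if the limit is finite, we have $\varphi \in L^\infty(X, \mathscr{E}, \mu)$ and $\|\varphi\|_\infty \le \lim_{p\to\infty} \|\varphi\|_p$. The converse inequality follows directly from (3.11); the same inequality also proves that if the limit is not finite, then $\varphi \notin L^\infty(X, \mathscr{E}, \mu)$.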
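On a finite probability space the convergence of the $L^p$ norms to the $L^\infty$ norm can be observed directly. The Python sketch below uses hypothetical values and weights; the normalization by $\mu(X)^{-1/p}$ is kept to match (3.12), even though it equals 1 for a probability measure.

```python
def lp_norm(values, weights, p):
    """L^p norm on a finite set, for the measure given by weights."""
    return sum(w * abs(v)**p for v, w in zip(values, weights)) ** (1.0 / p)

values = [0.5, -3.0, 2.0, 1.0]     # hypothetical function on a 4-point space
weights = [0.4, 0.1, 0.3, 0.2]     # hypothetical probability weights

def normalized_norm(p):
    """mu(X)**(-1/p) * ||phi||_p, the nondecreasing quantity obtained from (3.12)."""
    total = sum(weights)
    return total ** (-1.0 / p) * lp_norm(values, weights, p)

sup_norm = max(abs(v) for v, w in zip(values, weights) if w > 0)
for p in (1, 2, 4, 8, 16, 32, 64, 128, 256):
    print(p, normalized_norm(p))
print("essential supremum:", sup_norm)
```

The printed norms increase with $p$ and approach the essential supremum, which here is simply the largest absolute value attained on a point of positive measure.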
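In the next remark we characterize the convergence in $L^\infty$, proving also that $L^\infty(X, \mathscr{E}, \mu)$ is a Banach space: as a matter of fact, convergence in $L^\infty(X, \mathscr{E}, \mu)$ differs from the convergence in supremum norm only because a $\mu$-negligible set is neglected.
Remark 3.12 ($L^\infty(X, \mathscr{E}, \mu)$ is a Banach space). Assume that $(\varphi_n) \subset L^\infty(X, \mathscr{E}, \mu)$ is a Cauchy sequence, and let us consider the $\mu$-negligible set
\[ B := \bigcup_{n, m=0}^{\infty} \{ x \in X : |\varphi_n(x) - \varphi_m(x)| > \|\varphi_n - \varphi_m\|_\infty \}. \]
Then $\sup_{B^c} |\varphi_n - \varphi_m| \le \|\varphi_n - \varphi_m\|_\infty$; as a consequence, the completeness of the space of bounded functions defined in $B^c$ provides a bounded function $\varphi : B^c \to \mathbb{R}$ such that $\varphi_n \to \varphi$ uniformly in $B^c$. Extending $\varphi$ in an arbitrary $\mathscr{E}$-measurable way (for instance with the 0 value) to the whole of $X$, we get $\varphi_n \to \varphi$ in $L^\infty(X, \mathscr{E}, \mu)$.
A similar argument proves that $\varphi_n \to \varphi$ in $L^\infty(X, \mathscr{E}, \mu)$ if and only if there exists a $\mu$-negligible set $B \in \mathscr{E}$ satisfying $\varphi_n \to \varphi$ uniformly in $B^c$.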
We know that $\left| \int_X \varphi\,d\mu \right|$ does not exceed $\int_X |\varphi|\,d\mu$. A nice and useful generalization of this fact is the so-called Jensen inequality.
Recall that, if $J \subset \mathbb{R}$ is an interval, a continuous function $g : J \to \mathbb{R}$ is said to be convex if
\[ g\left( \frac{x + y}{2} \right) \le \frac{g(x) + g(y)}{2} \qquad \forall x, y \in J. \tag{3.13} \]
By several approximations (see Exercise 3.7) one can prove that a convex function $g$ satisfies $g(tx + (1-t)y) \le tg(x) + (1-t)g(y)$ for all $x, y \in J$ and $t \in [0, 1]$, and even that
\[ g\left( \sum_{i=1}^{n} t_i x_i \right) \le \sum_{i=1}^{n} t_i g(x_i) \quad \text{whenever } t_i \ge 0,\ x_i \in J \text{ and } \sum_{i=1}^{n} t_i = 1. \tag{3.14} \]
In the proof we use an elementary property of convex functions $g : \mathbb{R} \to \mathbb{R}$ satisfying $g(t) \to +\infty$ as $|t| \to +\infty$, namely the existence of a minimum point $t_0$; moreover, the function $g$ is nondecreasing in $[t_0, +\infty)$ and nonincreasing in $(-\infty, t_0]$ (see Exercise 3.8).
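Proposition 3.13 (Jensen). Assume that $\mu$ is a probability measure. Let $g : \mathbb{R} \to \mathbb{R}$ be convex and bounded from below and let $\varphi \in L^1(X, \mathscr{E}, \mu)$. Then we have
\[ g\left( \int_X \varphi\,d\mu \right) \le \int_X g(\varphi)\,d\mu. \tag{3.15} \]
Proof. Let us first show (3.15) when $\varphi$ is simple. Let
\[ \varphi = \sum_{i=1}^{n} \alpha_i 1_{A_i}, \]
where $n \ge 1$ is an integer, $\alpha_1, \ldots, \alpha_n \in \mathbb{R}$ and $A_1, \ldots, A_n$ are mutually disjoint sets in $\mathscr{E}$ whose union is $X$, so that
\[ \sum_{i=1}^{n} \mu(A_i) = 1. \]
Then, from (3.14) we infer
\[ g\left( \int_X \varphi\,d\mu \right) = g\left( \sum_{i=1}^{n} \alpha_i \mu(A_i) \right) \le \sum_{i=1}^{n} g(\alpha_i)\mu(A_i) = \int_X g(\varphi)\,d\mu. \]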
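In the general case, let us first assume that $g(t) \to +\infty$ as $|t| \to +\infty$. Then, by Exercise 3.8 we know that $g$ has a minimum point $t_0$, and that $g$ is nondecreasing in $[t_0, +\infty)$, and nonincreasing in $(-\infty, t_0]$. We can assume with no loss of generality (possibly replacing $g(t)$ by $g(t + t_0)$ and $\varphi$ by $\varphi - t_0$) that $g$ attains its minimum value at $t_0 = 0$, and that $\int_X g(\varphi)\,d\mu$ is finite. Furthermore, replacing $g$ by $g - g(0)$, we can assume that the minimum value of $g$ is 0.
Let $\varphi_n^\pm$ be nonnegative simple functions satisfying $\varphi_n^\pm \uparrow \varphi^\pm$; the simple functions $\varphi_n^+ - \varphi_n^-$ converge to $\varphi^+ - \varphi^- = \varphi$ in $L^1(X, \mathscr{E}, \mu)$. In addition, since $g$ is monotone in $(-\infty, 0]$ and $[0, +\infty)$, the monotone convergence theorem gives
\[ \int_X g(\varphi_n^+)\,d\mu \uparrow \int_X g(\varphi^+)\,d\mu, \qquad \int_X g(-\varphi_n^-)\,d\mu \uparrow \int_X g(-\varphi^-)\,d\mu, \]
so that (since $g(0) = 0$, $\varphi_n^+\varphi_n^- = 0$ and $\varphi^+\varphi^- = 0$)
\[ \int_X g(\varphi_n^+ - \varphi_n^-)\,d\mu = \int_X g(\varphi_n^+)\,d\mu + \int_X g(-\varphi_n^-)\,d\mu \]
converges to $\int_X g(\varphi^+)\,d\mu + \int_X g(-\varphi^-)\,d\mu = \int_X g(\varphi)\,d\mu$. Passing to the limit as $n \to \infty$ in Jensen's inequality for the simple functions $\varphi_n^+ - \varphi_n^-$,
\[ g\left( \int_X (\varphi_n^+ - \varphi_n^-)\,d\mu \right) \le \int_X g(\varphi_n^+ - \varphi_n^-)\,d\mu, \]
we get (3.15).
Finally, the assumption that $g(t) \to +\infty$ as $|t| \to +\infty$ can be removed by considering the functions $g_\varepsilon(t) := g(t) + \varepsilon|t|$, which converge to $+\infty$ as $|t| \to \infty$, thanks to the fact that $g$ is bounded from below: we obtain
\[ g\left( \int_X \varphi\,d\mu \right) + \varepsilon \left| \int_X \varphi\,d\mu \right| \le \int_X g(\varphi)\,d\mu + \varepsilon \int_X |\varphi|\,d\mu, \]
and Jensen's inequality follows by letting $\varepsilon \downarrow 0$.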
An alternative proof of Jensen's inequality is based on another viewpoint, namely the representation of $g$ as the supremum of a family $\{L_i\}_{i\in I}$ of affine functions. Since $\mu$ is a probability measure, for all $i \in I$ it is easy to check that $L_i\left( \int_X \varphi\,d\mu \right) = \int_X L_i(\varphi)\,d\mu$, so that
\[ L_i\left( \int_X \varphi\,d\mu \right) \le \int_X g(\varphi)\,d\mu \qquad \forall i \in I. \]
Taking the supremum in the left hand side we obtain Jensen's inequality. Both viewpoints are important in the theory of convex functions.
To be more precise, Jensen's inequality holds provided $g$ is convex on an interval containing the image of $\varphi$. The next example is very important in Probability and Information theory.
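Example 3.14 (Entropy functional). By applying Jensen's inequality with the convex function $g(z) = z \ln z$ in $[0, +\infty)$ we obtain
\[ \int_X \varphi \ln \varphi\,d\mu \ge \int_X \varphi\,d\mu\, \ln\left( \int_X \varphi\,d\mu \right) \tag{3.16} \]
for all $\varphi \in L^1(X, \mathscr{E}, \mu)$ nonnegative. If $\int_X \varphi\,d\mu = 1$ we obtain that $\int_X \varphi \ln \varphi\,d\mu \ge 0$ even though the function $g$ has a variable sign (it attains the minimum value $-1/e$ at $z = 1/e$).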
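Inequality (3.16) can be tested on a finite probability space, where both sides are finite sums. In the Python sketch below the weights and the density are hypothetical, the convention $0 \ln 0 = 0$ is assumed, and the helper names are illustrative.

```python
import math

def xlogx(z):
    """z * ln(z), with the convention 0 * ln(0) = 0."""
    return 0.0 if z == 0 else z * math.log(z)

def entropy_sides(phi, weights):
    """Return (integral of phi*ln(phi), g(integral of phi)) with g(z) = z*ln(z)."""
    lhs = sum(w * xlogx(v) for v, w in zip(phi, weights))
    mean = sum(w * v for v, w in zip(phi, weights))
    return lhs, xlogx(mean)

weights = [0.25, 0.25, 0.25, 0.25]   # hypothetical probability weights
phi = [0.2, 0.8, 1.0, 2.0]           # hypothetical nonnegative function, integral 1
print(entropy_sides(phi, weights))
```

Since the chosen function has integral 1, the right hand side vanishes (up to rounding) and the left hand side is positive, although two of the four terms $\varphi \ln \varphi$ are negative.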
3.5. Dense subsets of $L^p(X, \mathscr{E}, \mu)$
Proposition 3.15. For any $p \in [1, +\infty]$, the space of all simple $\mu$-integrable functions is dense in $L^p(X, \mathscr{E}, \mu)$.
Proof. Let $f \in L^p(X, \mathscr{E}, \mu)$ with $f \ge 0$. Then the conclusion follows from Proposition 2.12 (by Proposition 2.4 in the case $p = \infty$) and the dominated convergence theorem. In the general case we write $f$ as $f^+ - f^-$ and approximate in $L^p$ both parts by simple functions.
We consider now the special situation when $X$ is a metric space, $\mathscr{E}$ is the $\sigma$-algebra of all Borel subsets of $X$ and $\mu$ is any finite measure on $(X, \mathscr{E})$.
We denote by $C_b(X)$ the space of all continuous bounded functions on $X$. Clearly, $C_b(X) \subset L^p(X, \mathscr{E}, \mu)$ for all $p \in [1, +\infty]$.
Proposition 3.16. For any $p \in [1, +\infty)$ and any finite measure $\mu$, $C_b(X)$ is dense in $L^p(X, \mathscr{E}, \mu)$.
Proof. Let $C$ be the closure of $C_b(X)$ in $L^p(X, \mathscr{E}, \mu)$; obviously $C$ is a vector space, as $C_b(X)$ is a vector space. In view of Proposition 3.15 it is enough to show that for any Borel set $I \in \mathscr{B}(X)$ there exists a sequence $(\varphi_n) \subset C_b(X)$ such that $\varphi_n \to 1_I$ in $L^p(X, \mathscr{E}, \mu)$.
Assume first that $I$ is closed. Set
\[ \varphi_n(x) = \begin{cases} 1 - n\,d(x, I) & \text{if } d(x, I) \le \frac{1}{n}, \\ 0 & \text{if } d(x, I) \ge \frac{1}{n}, \end{cases} \]
where
\[ d(x, I) := \inf\{ d(x, y) : y \in I \}. \]
It is easy to see that $\varphi_n$ are continuous, that $0 \le \varphi_n \le 1$ and that $\varphi_n(x) \to 1_I(x)$, hence the dominated convergence theorem implies that $\varphi_n \to 1_I$ in $L^p(X, \mathscr{E}, \mu)$.
Now, let
\[ \mathscr{G} := \{ I \in \mathscr{B}(X) : 1_I \in C \}. \]
It is easy to see that $\mathscr{G}$ is a Dynkin system (which includes the $\pi$-system of closed sets), so that by the Dynkin theorem we have $\mathscr{G} = \mathscr{B}(X)$.
Remark 3.17. $C_b(X)$ (or more precisely, the equivalence classes of continuous bounded functions) is a closed subspace of $L^\infty(X, \mathscr{E}, \mu)$, and therefore it is not dense in general. Indeed, if $(\varphi_n) \subset C_b(X)$ is Cauchy in $L^\infty(X, \mathscr{E}, \mu)$, then it uniformly converges, up to a $\mu$-negligible set $B$ (just take as $B$, as in Remark 3.12, the union of the $\mu$-negligible sets $\{|\varphi_n - \varphi_m| > \|\varphi_n - \varphi_m\|_\infty\}$). Therefore $(\varphi_n)$ uniformly converges on $B^c$ and on its closure $K$. Denoting by $\varphi \in C_b(K)$ its uniform limit, by Tietze's extension theorem we may extend $\varphi$ to a function, that we still denote by $\varphi$, in $C_b(X)$. As $X \setminus K \subset B$ is $\mu$-negligible, it follows that $\varphi_n \to \varphi$ in $L^\infty(X, \mathscr{E}, \mu)$.
Exercises
3.1 Assume that $\mu$ is $\sigma$-finite, but not finite. Provide examples showing that no inclusion holds between the spaces $L^p(X, \mathscr{E}, \mu)$ in general. Nevertheless, show that for any $\mathscr{E}$-measurable function $\varphi : X \to \mathbb{R}$ the set
\[ \left\{ p \in [1, \infty] : \varphi \in L^p(X, \mathscr{E}, \mu) \right\} \]
is an interval. Hint: consider for instance the Lebesgue measure on $\mathbb{R}$.
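3.2 Let $1 \le p \le q < \infty$ and $f \in L^q(X, \mathscr{E}, \mu)$. Show that for any $\delta \in (0, 1)$ we can write $f = g + \tilde f$, with $g \in L^q(X, \mathscr{E}, \mu)$, $\tilde f \in L^p(X, \mathscr{E}, \mu)$ and $\|g\|_q \le \delta \|f\|_q$ (notice that if $\mu$ is finite we can take $g = 0$).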
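3.3 Let $p \in (1, \infty)$, $\varphi \in L^p$ and $\psi \in L^q$, with $q = p'$, be such that $\|\varphi\psi\|_1 = \|\varphi\|_p \|\psi\|_q$. Show that either $\psi = 0$ or there exists a constant $\lambda \in [0, +\infty)$ such that $|\varphi| = \lambda|\psi|^{q-1}$ a.e. in $X$. Hint: first investigate the case of equality in Young's inequality.
3.4 Prove the following variant of Hölder's inequality, known as Young's inequality: if $\varphi \in L^p$, $\psi \in L^q$ and $\frac{1}{p} + \frac{1}{q} = \frac{1}{r}$, with $r \ge 1$, we have that $\varphi\psi \in L^r$ and $\|\varphi\psi\|_r \le \|\varphi\|_p \|\psi\|_q$.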
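3.5 Let $\varphi,(\varphi_n)\subset L^1(X,\mathscr{E},\mu)$ be nonnegative and satisfying $\liminf_n\varphi_n\ge\varphi$ $\mu$-a.e. in $X$. Show that
$$\int_X\varphi_n\,d\mu=\int_X\varphi\,d\mu=1\quad\Longrightarrow\quad\int_X|\varphi-\varphi_n|\,d\mu\to0.$$
Hint: notice that the positive part and the negative part of $\varphi-\varphi_n$ have the same integral to obtain
$$\int_X|\varphi-\varphi_n|\,d\mu=2\int_X(\varphi-\varphi_n)^+\,d\mu.$$
Then, apply the dominated convergence theorem.
3.6 Show the following extension of Fatou's lemma: if $\varphi_n\ge-\psi_n$, with $\psi_n\in L^1(X)$ nonnegative and $\psi_n\to\psi$ in $L^1(X)$, then
$$\liminf_{n\to\infty}\int_X\varphi_n\,d\mu\ge\int_X\liminf_{n\to\infty}\varphi_n\,d\mu.$$
Hint: prove first the statement under the additional assumption that $\psi_n\to\psi$ $\mu$-a.e. in $X$.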
3.7 Show that (3.13) implies $g(tx+(1-t)y)\le tg(x)+(1-t)g(y)$ for all $x,y\in J$ and $t\in[0,1]$. Then, deduce (3.14) from this property. Hint: it is useful to consider dyadic numbers $t=k/2^m$, with $k\le2^m$ integer.
3.8 Let $g:\mathbb{R}\to\mathbb{R}$ be a convex function such that $g(z)\to+\infty$ as $|z|\to+\infty$. Show the existence of $z_0\in\mathbb{R}$ where $g$ attains its minimum value. Then, show that $g$ is nondecreasing in $[z_0,+\infty)$ and nonincreasing in $(-\infty,z_0]$.
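3.9 Let $\varphi,(\varphi_n)\subset L^1(X,\mathscr{E},\mu)$ be nonnegative functions. Show that the conditions
$$\liminf_{n\to\infty}\varphi_n\ge\varphi\ \ \mu\text{-a.e. in }X,\qquad\limsup_{n\to\infty}\int_X\varphi_n\,d\mu\le\int_X\varphi\,d\mu<\infty$$
imply the convergence of $\varphi_n$ to $\varphi$ in $L^1(X,\mathscr{E},\mu)$. Hint: use Exercise 3.5.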
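3.10 Let $\{\varphi_i\}_{i\in I}$ be a family of functions satisfying
$$\sup_{i\in I}\int_X\Phi(|\varphi_i|)\,d\mu=M<+\infty$$
and assume that $\Phi(c)/c$ is nondecreasing and tends to $+\infty$ as $c\to+\infty$. Show that $\{\varphi_i\}_{i\in I}$ is uniformly integrable. Hint: use the inequalities
$$\int_A|\varphi_i|\,d\mu\le\int_{A\cap\{|\varphi_i|\ge c\}}\frac{\Phi(|\varphi_i|)}{\Psi(c)}\,d\mu+\int_{A\cap\{|\varphi_i|<c\}}|\varphi_i|\,d\mu\le\frac{M}{\Psi(c)}+c\,\mu(A),$$
with $\Psi(c):=\Phi(c)/c$, and then choose $c$ sufficiently large, such that $M/\Psi(c)<\varepsilon/2$.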
3.11 Assuming that $(X,d)$ is a metric space, $\mathscr{E}=\mathscr{B}(X)$ and $\mu$ is finite, prove Lusin's theorem: for any $\varepsilon>0$ and any $f\in L^1(X,\mathscr{E},\mu)$, there exists a closed set $C\subset X$ such that $\mu(X\setminus C)<\varepsilon$ and $f|_C$ is continuous and bounded. Hint: use the density of $C_b(X)$ in $L^1$ and Egorov's theorem.
Chapter 4
Hilbert spaces
In this chapter we recall the basic facts regarding real vector spaces endowed with a scalar product. We introduce the concept of Hilbert space and show that, even for the infinite-dimensional ones, continuous linear functionals are induced by the scalar product. Moreover, we see that even in some classes of infinite-dimensional spaces (the so-called separable ones) there exists a well-defined notion of basis (the so-called complete orthonormal systems), obtained by replacing finite sums with converging series. Even though the presentation will be self-contained, we assume that the reader already has some familiarity with these concepts (basis, scalar product, representation of linear functionals) in finite-dimensional spaces.
4.1. Scalar products, pre-Hilbert and Hilbert spaces
A real pre-Hilbert space is a real vector space $H$ endowed with a mapping
$$H\times H\to\mathbb{R},\qquad(x,y)\mapsto\langle x,y\rangle,$$
called scalar product, such that:
(i) $\langle x,x\rangle\ge0$ for all $x\in H$ and $\langle x,x\rangle=0$ if and only if $x=0$;
(ii) $\langle x,y\rangle=\langle y,x\rangle$ for all $x,y\in H$;
(iii) $\langle\alpha x+\beta y,z\rangle=\alpha\langle x,z\rangle+\beta\langle y,z\rangle$ for all $x,y,z\in H$ and $\alpha,\beta\in\mathbb{R}$.
In the following $H$ represents a real pre-Hilbert space.
The scalar product allows us to introduce the concept of orthogonality. We say that two elements $x$ and $y$ of $H$ are orthogonal if $\langle x,y\rangle=0$.
We are going to prove that the function
$$\|x\|:=\sqrt{\langle x,x\rangle},\qquad x\in H$$
is a norm in $H$. For this we need the following Cauchy-Schwarz inequality.
Proposition 4.1. For any $x,y\in H$ we have
$$|\langle x,y\rangle|\le\|x\|\,\|y\|.\tag{4.1}$$
In (4.1) equality holds if and only if $x$ and $y$ are linearly dependent.
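Proof. Set
$$F(\lambda)=\|x+\lambda y\|^2=\lambda^2\|y\|^2+2\lambda\langle x,y\rangle+\|x\|^2,\qquad\lambda\in\mathbb{R}.$$
Since $F(\lambda)\ge0$ for all $\lambda\in\mathbb{R}$, the discriminant of this quadratic polynomial is nonpositive, that is
$$|\langle x,y\rangle|^2-\|x\|^2\|y\|^2\le0,$$
which yields (4.1).
If $x$ and $y$ are linearly dependent, it is clear that $|\langle x,y\rangle|=\|x\|\,\|y\|$. Assume conversely that $\langle x,y\rangle=\pm\|x\|\,\|y\|$ and that $y\ne0$. Then we have $F(\lambda)=(\|x\|\pm\lambda\|y\|)^2$ so that, choosing $\lambda=\mp\|x\|/\|y\|$, we find $F(\lambda)=0$. This implies $x+\lambda y=0$, so that $x$ and $y$ are linearly dependent.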
Now we can prove easily that $\|\cdot\|$ is a norm in $H$. In fact, it is clear that $\|\alpha x\|=|\alpha|\,\|x\|$ for all $\alpha\in\mathbb{R}$ and all $x\in H$. Moreover, taking into account (4.1), we have for all $x,y\in H$,
$$\|x+y\|^2=\langle x+y,x+y\rangle=\|x\|^2+\|y\|^2+2\langle x,y\rangle\le\|x\|^2+\|y\|^2+2\|x\|\,\|y\|=(\|x\|+\|y\|)^2,$$
so that $\|x+y\|\le\|x\|+\|y\|$.
Therefore a pre-Hilbert space $H$ is a normed space and, in particular, a metric space. If $H$, endowed with the distance induced by the norm, is complete we say that $H$ is a Hilbert space.
Example 4.2. (i). $\mathbb{R}^n$ is a Hilbert space with the canonical scalar product
$$\langle x,y\rangle:=\sum_{k=1}^nx_ky_k,$$
inducing the Euclidean distance, where $x=(x_1,\dots,x_n)$, $y=(y_1,\dots,y_n)\in\mathbb{R}^n$.
(ii). Let $(X,\mathscr{E},\mu)$ be a measure space. Then $L^2(X,\mathscr{E},\mu)$, endowed with the scalar product
$$\langle\varphi,\psi\rangle:=\int_X\varphi(x)\psi(x)\,d\mu(x),\qquad\varphi,\psi\in L^2(X,\mathscr{E},\mu),$$
is a Hilbert space (completeness follows from Proposition 3.5).
(iii). Let $\ell^2$ be the space of all sequences of real numbers $x=(x_k)$ such that
$$\sum_{k=0}^\infty x_k^2<\infty.$$
$\ell^2$ is a vector space with the usual operations,
$$a(x_k)=(ax_k),\ a\in\mathbb{R},\qquad(x_k)+(y_k)=(x_k+y_k),\ (x_k),(y_k)\in\ell^2.$$
The space $\ell^2$, endowed with the scalar product
$$\langle x,y\rangle:=\sum_{k=0}^\infty x_ky_k,\qquad x=(x_k),\ y=(y_k)\in\ell^2,$$
is a Hilbert space. This follows from (ii) taking $X=\mathbb{N}$, $\mathscr{E}=\mathscr{P}(X)$ and $\mu(\{x\})=1$ for all $x\in X$.
(iv). Let $X=C([0,1])$ be the linear space of all real continuous functions on $[0,1]$. $X$ is a pre-Hilbert space with the scalar product
$$\langle f,g\rangle:=\int_0^1f(t)g(t)\,dt.$$
However, $X$ is not a Hilbert space: indeed, $X$ is dense, but strictly contained, in $L^2(0,1)$.
Finite-dimensional pre-Hilbert spaces $H$ are always Hilbert spaces: indeed, if $\{v_1,\dots,v_n\}$, with $n=\dim H$, is a basis of $H$, the Gram-Schmidt orthonormalization process (recalled in Exercise 4.3) provides an orthonormal basis $\{e_1,\dots,e_n\}$ of $H$ (i.e. $\|e_i\|=1$ and $e_i$ is orthogonal to $e_j$ for $i\ne j$), and the map
$$x=\sum_{i=1}^n\langle x,e_i\rangle e_i\ \mapsto\ (\langle x,e_1\rangle,\langle x,e_2\rangle,\dots,\langle x,e_n\rangle)$$
(mapping $x$ to the Euclidean vector of its coordinates with respect to this basis) is easily seen to provide an isometry with $\mathbb{R}^n$: indeed,
$$\Big\|\sum_{i=1}^n\langle x,e_i\rangle e_i\Big\|^2=\sum_{i,j=1}^n\langle x,e_i\rangle\langle x,e_j\rangle\langle e_i,e_j\rangle=\sum_{i=1}^n(\langle x,e_i\rangle)^2.$$
Thus, since $\mathbb{R}^n$ is complete, $H$ is complete.
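The orthonormalization step used above can be tried numerically. The following is only a minimal sketch in $\mathbb{R}^n$ with the canonical scalar product; the function names `dot` and `gram_schmidt` and the sample vectors are illustrative assumptions, not part of the text.

```python
import math

def dot(x, y):
    """Canonical scalar product on R^n (vectors are lists of floats)."""
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(vectors, tol=1e-12):
    """Orthonormalize linearly independent vectors.

    Each new vector has its components along the previously built
    e_1, ..., e_{k-1} removed and is then normalized. In exact arithmetic
    this agrees with f_k = v_k - sum_j <v_k, e_j> e_j, e_k = f_k / ||f_k||.
    """
    basis = []
    for v in vectors:
        f = list(v)
        for e in basis:
            c = dot(f, e)  # coefficient of the running remainder along e
            f = [fi - c * ei for fi, ei in zip(f, e)]
        norm = math.sqrt(dot(f, f))
        if norm < tol:
            raise ValueError("vectors are (numerically) linearly dependent")
        basis.append([fi / norm for fi in f])
    return basis

# Hypothetical sample basis of R^3.
e = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
for i in range(3):
    print([round(dot(e[i], e[j]), 12) for j in range(3)])
```

The printed matrix of scalar products should be, up to rounding, the identity, which is exactly the orthonormality condition $\langle e_i,e_j\rangle=\delta_{ij}$.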
4.2. The projection theorem
It is useful to notice that for any $x,y\in H$ the following parallelogram identity holds:
$$\|x+y\|^2+\|x-y\|^2=2\|x\|^2+2\|y\|^2,\qquad x,y\in H.\tag{4.2}$$
One can show that identity (4.2) characterizes pre-Hilbert spaces among normed spaces, and Hilbert spaces among Banach spaces, see Exercise 4.1.
Theorem 4.3 (Projection on closed subspaces). Let $H$ be a Hilbert space and let $Y$ be a closed subspace of $H$. Then for any $x\in H$ there exists a unique $y\in Y$, called projection of $x$ on $Y$ and denoted by $\pi_Y(x)$, such that
$$\|x-y\|=\min_{z\in Y}\|x-z\|.$$
Moreover, $y$ is characterized by the property
$$\langle x-y,z\rangle=0\quad\text{for all }z\in Y.\tag{4.3}$$
Proof. Set $d:=\inf_{z\in Y}\|x-z\|$ and choose $(y_n)\subset Y$ such that $\|x-y_n\|\to d$. We are going to show that $(y_n)$ is a Cauchy sequence.
For any $m,n\in\mathbb{N}$ we have, by the parallelogram identity (4.2),
$$\|(x-y_n)+(x-y_m)\|^2+\|(x-y_n)-(x-y_m)\|^2=2\|x-y_n\|^2+2\|x-y_m\|^2.$$
Consequently
$$\|y_n-y_m\|^2=2\|x-y_n\|^2+2\|x-y_m\|^2-4\Big\|x-\frac{y_n+y_m}{2}\Big\|^2.$$
Taking into account that $(y_n+y_m)/2\in Y$ we find
$$\|y_n-y_m\|^2\le2\|x-y_n\|^2+2\|x-y_m\|^2-4d^2,$$
so that $\|y_n-y_m\|\to0$ as $n,m\to\infty$. Thus, $(y_n)$ is a Cauchy sequence and, since the space is complete and $Y$ is closed, it is convergent to an element $y\in Y$. Since $\|x-y_n\|\to\|x-y\|$ we find that $\|x-y\|=d$. Existence is thus proved. Uniqueness follows again by the parallelogram identity, that gives
$$\|y-y'\|^2\le2\|x-y\|^2+2\|x-y'\|^2-4\Big\|x-\frac{y+y'}{2}\Big\|^2\le2d^2+2d^2-4d^2=0$$
whenever $y$ and $y'$ are minimizers.
Let us prove (4.3). Fix $z\in Y$ and define
$$F(\lambda)=\|x-y-\lambda z\|^2=\lambda^2\|z\|^2-2\lambda\langle x-y,z\rangle+\|x-y\|^2,\qquad\lambda\in\mathbb{R}.$$
Since $y+\lambda z\in Y$ for all $\lambda$, $F$ attains a minimum at $\lambda=0$, so that $F'(0)=-2\langle x-y,z\rangle=0$, as claimed.
Conversely, if (4.3) holds for all $z\in Y$, we have
$$\|x-y-z\|^2=\|z\|^2+\|x-y\|^2\ge\|x-y\|^2\quad\text{for all }z\in Y,$$
so that $y$ is the minimizer.
Remark 4.4 (Projection on convex closed sets). The previous proof works, with absolutely no modification, to show that for any convex closed set $K\subset H$ and any $x\in H$ there exists a unique solution $y=\pi_K(x)$ to the problem
$$\min_{z\in K}\|x-z\|.$$
In this case, however, $\pi_K(x)$ is not characterized by (4.3), but by a one-sided condition, namely $\langle x-\pi_K(x),z-\pi_K(x)\rangle\le0$ for all $z\in K$, see Exercise 4.2.
Corollary 4.5. Let $Y$ be a closed proper subspace of $H$. Then there exists $x_0\in H\setminus\{0\}$ such that $\langle x_0,y\rangle=0$ for all $y\in Y$.
Proof. It is enough to choose an element $z_0$ in $H$ which does not belong to $Y$ and set $x_0=z_0-\pi_Y(z_0)$.
Fix an integer $n\ge1$, an $n$-dimensional subspace $H_n\subset H$ and an orthonormal basis $\{e_1,\dots,e_n\}$ of it. The following result characterizes the projection on $H_n$, giving the best approximation of an element $x$ by a linear combination of $\{e_1,\dots,e_n\}$.
Proposition 4.6. The projection of an element $x\in H$ on $H_n$ is given by
$$\pi_{H_n}(x)=\sum_{k=1}^n\langle x,e_k\rangle e_k.$$
Proof. We have to show that for any $y_1,\dots,y_n\in\mathbb{R}$ we have
$$\Big\|x-\sum_{k=1}^nx_ke_k\Big\|^2\le\Big\|x-\sum_{k=1}^ny_ke_k\Big\|^2,\tag{4.4}$$
where $x_k=\langle x,e_k\rangle$. We have in fact
$$\Big\|x-\sum_{k=1}^ny_ke_k\Big\|^2=\|x\|^2+\sum_{k=1}^ny_k^2-2\sum_{k=1}^nx_ky_k=\|x\|^2-\sum_{k=1}^nx_k^2+\sum_{k=1}^n(x_k-y_k)^2.$$
This quantity is clearly minimal when $x_k=y_k$, and
$$\Big\|x-\sum_{k=1}^nx_ke_k\Big\|^2=\|x\|^2-\sum_{k=1}^nx_k^2.\tag{4.5}$$
An alternative proof of the Proposition, based on the characterization (4.3) of $\pi_{H_n}(x)$, is proposed in Exercise 4.4.
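As a sanity check of the projection formula and of the identity (4.5), here is a small numerical sketch in $\mathbb{R}^3$. The helper names `dot` and `project`, the plane spanned by `onb`, and the vector `x` are hypothetical choices made only for illustration.

```python
import math

def dot(x, y):
    """Canonical scalar product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def project(x, onb):
    """Return (sum_k <x, e_k> e_k, coefficients) for an orthonormal family onb."""
    coeffs = [dot(x, e) for e in onb]
    p = [0.0] * len(x)
    for c, e in zip(coeffs, onb):
        p = [pi + c * ei for pi, ei in zip(p, e)]
    return p, coeffs

s = 1 / math.sqrt(2)
onb = [[s, s, 0.0], [0.0, 0.0, 1.0]]  # orthonormal basis of a plane in R^3
x = [3.0, 1.0, 2.0]

p, coeffs = project(x, onb)
residual = [xi - pi for xi, pi in zip(x, p)]

# Characterization (4.3): the residual is orthogonal to the subspace.
print([dot(residual, e) for e in onb])
# Identity (4.5): ||x - p||^2 = ||x||^2 - sum_k <x, e_k>^2.
print(dot(residual, residual), dot(x, x) - sum(c * c for c in coeffs))
```

Both printed quantities in the last line should agree up to rounding, and the sum of the squared coefficients never exceeds $\|x\|^2$, which is the finite-dimensional form of the Bessel inequality discussed later in the chapter.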
4.3. Linear continuous functionals
A linear functional $F$ on $H$ is a mapping $F:H\to\mathbb{R}$ such that
$$F(\alpha x+\beta y)=\alpha F(x)+\beta F(y)\qquad\forall x,y\in H,\ \forall\alpha,\beta\in\mathbb{R}.$$
$F$ is said to be bounded if there exists $K\ge0$ such that
$$|F(x)|\le K\|x\|\quad\text{for all }x\in H.$$
Proposition 4.7. A linear functional $F$ is continuous if, and only if, it is bounded.
Proof. It is obvious that if $F$ is bounded then it is continuous (even Lipschitz continuous). Assume conversely that $F$ is continuous and, by contradiction, that it is not bounded. Then for any $n\in\mathbb{N}$ there exists $x_n\in H$ such that $|F(x_n)|\ge n^2\|x_n\|$. Setting $y_n=\frac1n\,x_n/\|x_n\|$ we have $\|y_n\|=\frac1n\to0$, whereas $|F(y_n)|\ge n$, which is a contradiction.
The following basic Riesz theorem gives an intrinsic representation formula of all linear continuous functionals.
Proposition 4.8. Let $F$ be a linear continuous functional on $H$. Then there exists a unique $x_0\in H$ such that
$$F(x)=\langle x,x_0\rangle\qquad\forall x\in H.\tag{4.6}$$
Proof. If $F=0$ we take $x_0=0$. Assume that $F\ne0$ and let $Y=F^{-1}(0)=\operatorname{Ker}F$. Then $Y\ne H$ is closed (because $F$ is continuous) and a vector space (because $F$ is linear), so that by Corollary 4.5 (after normalizing the vector it provides, which does not belong to $\operatorname{Ker}F$) there exists $z_0\in H$ such that $F(z_0)=1$ and
$$\langle z_0,z\rangle=0\quad\text{for all }z\in\operatorname{Ker}F.$$
On the other hand, for any $x\in H$ the element $z=x-F(x)z_0$ belongs to $\operatorname{Ker}F$ since $F(z)=F(x)-F(x)F(z_0)=0$. Therefore
$$\langle z_0,x-F(x)z_0\rangle=0\quad\text{for all }x\in H,$$
so that
$$\langle x,z_0\rangle-F(x)\|z_0\|^2=0$$
and (4.6) follows setting $x_0=z_0/\|z_0\|^2$.
It remains to prove the uniqueness. Let $y_0\in H$ be such that
$$F(x)=\langle x,x_0\rangle=\langle x,y_0\rangle,\qquad\forall x\in H.$$
Then, choosing $x=x_0-y_0$ we find that $\|x_0-y_0\|^2=0$, so that $x_0=y_0$.
4.4. Bessel inequality, Parseval identity and orthonormal systems
Let us discuss the concept of basis in a Hilbert space $H$, assuming with no loss of generality that the dimension of $H$ is not finite. We use Kronecker's notation $\delta_{hk}$, equal to 1 for $h=k$ and equal to 0 if $h\ne k$.
Definition 4.9 (Orthonormal system). A sequence $(e_k)_{k\in\mathbb{N}}\subset H$ is called an orthonormal system if
$$\langle e_h,e_k\rangle=\delta_{hk},\qquad h,k\in\mathbb{N}.$$
Proposition 4.10. Let $(e_k)_{k\in\mathbb{N}}$ be an orthonormal system in $H$.
(i) For any $x\in H$ we have
$$\sum_{k=0}^\infty|\langle x,e_k\rangle|^2\le\|x\|^2.\tag{4.7}$$
(ii) For any $x\in H$ the series $\sum_{k=0}^\infty\langle x,e_k\rangle e_k$ is convergent in $H$ (1).
(iii) Equality holds in (4.7) if and only if
$$x=\sum_{k=0}^\infty\langle x,e_k\rangle e_k.\tag{4.8}$$
Inequality (4.7) is called Bessel inequality and, when equality holds, Parseval identity.
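Proof. (i) Let $n\in\mathbb{N}$. Then by (4.5) we have
$$\Big\|x-\sum_{k=0}^n\langle x,e_k\rangle e_k\Big\|^2=\|x\|^2-\sum_{k=0}^n|\langle x,e_k\rangle|^2,\tag{4.9}$$
so that (4.7) follows by the arbitrariness of $n$.
(ii) Let $n,p\in\mathbb{N}$ and set
$$s_n=\sum_{k=0}^n\langle x,e_k\rangle e_k.$$
Then
$$\|s_{n+p}-s_n\|^2=\Big\|\sum_{k=n+1}^{n+p}\langle x,e_k\rangle e_k\Big\|^2=\sum_{k=n+1}^{n+p}|\langle x,e_k\rangle|^2.$$
Since the series $\sum_{k=0}^\infty|\langle x,e_k\rangle|^2$ is convergent by (i), the sequence $(s_n)$ is Cauchy and the conclusion follows from the completeness of $H$.
(iii) Passing to the limit as $n\to\infty$ in (4.9) we find
$$\Big\|x-\sum_{k=0}^\infty\langle x,e_k\rangle e_k\Big\|^2=\|x\|^2-\sum_{k=0}^\infty|\langle x,e_k\rangle|^2.$$
This proves statement (iii).
(1) A series $\sum_{k=0}^\infty x_k$ of vectors in a Banach space $E$ is said to be convergent if the sequence of the finite sums $\sum_{k=0}^nx_k$ is convergent in $E$.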
Definition 4.11 (Complete orthonormal system). An orthonormal system $(e_k)_{k\in\mathbb{N}}$ is called complete if
$$x=\sum_{k=0}^\infty\langle x,e_k\rangle e_k\qquad\forall x\in H.$$
Example 4.12. Let $H=\ell^2$ as in Example 4.2(iii). Then, it is easy to see that the system $(e_k)$, where
$$e_k:=(0,0,\dots,0,1,0,0,\dots)\quad\text{(with the digit 1 in the $k$-th position)}$$
is complete. Indeed, if $x=(x_k)\in\ell^2$ we have that $\langle x,e_i\rangle=x_i$ (the $i$-th component of the sequence $x$), so that
$$\Big\|x-\sum_{i=0}^n\langle x,e_i\rangle e_i\Big\|^2=\sum_{k=n+1}^\infty x_k^2\to0.$$
We already noticed that $\mathbb{R}^n$ is the canonical model of $n$-dimensional Hilbert spaces $H$, because any choice of an orthonormal basis $\{e_1,\dots,e_n\}$ of $H$ induces the linear isometry
$$a\mapsto\sum_{i=1}^na_ie_i$$
from $\mathbb{R}^n$ to $H$ (which, as a consequence, preserves also the scalar product, by the parallelogram identity). For similar reasons, $\ell^2$ is the canonical model of all spaces $H$ having a complete orthonormal system $(e_k)_{k\in\mathbb{N}}$: in this case, the linear map from $\ell^2$ to $H$ given by
$$a\mapsto\sum_{i=0}^\infty a_ie_i$$
is an isometry, thanks to Parseval's identity.
Proposition 4.13 (Completeness criterion). Let $(e_n)$ be an orthonormal system. Then $(e_n)$ is complete if and only if the vector space $E$ spanned by $(e_n)$ is dense in $H$.
Proof. If $(e_n)$ is complete we have that any $x\in H$ is the limit of the finite sums $\sum_{i=0}^N\langle x,e_i\rangle e_i$, which all belong to $E$, therefore $E$ is dense. Conversely, if $E$ is dense, for any $x\in H$ and any $\varepsilon>0$ we can find a vector $z=\sum_{i=0}^na_ie_i$ with $\|z-x\|<\varepsilon$. By applying Proposition 4.6 twice (first to the vector space spanned by $\{e_0,\dots,e_m\}$, and then to the vector space spanned by $\{e_0,\dots,e_n\}$) we get
$$\Big\|x-\sum_{i=0}^m\langle x,e_i\rangle e_i\Big\|\le\Big\|x-\sum_{i=0}^n\langle x,e_i\rangle e_i\Big\|\le\Big\|x-\sum_{i=0}^na_ie_i\Big\|<\varepsilon$$
for $m\ge n$. Since $\varepsilon$ is arbitrary this proves that the sum of the series is equal to $x$.
The following theorem provides a necessary and sufficient condition for the existence of a complete orthonormal system. We recall that a metric space $(X,d)$ is said to be separable if there exists a countable dense subset $D\subset X$.
Theorem 4.14. A Hilbert space $H$ admits a complete orthonormal system $(e_k)_{k\in\mathbb{N}}$ if and only if $H$, as a metric space, is separable.
Proof. If $H$ admits a complete orthonormal system $(e_k)_{k\in\mathbb{N}}$ then $H$ is separable, because the collection $D$ of finite sums with rational coefficients of the vectors $e_k$ provides a countable dense subset (indeed, the closure of $D$ contains the finite linear combinations of the vectors $e_k$ and then the whole space).
Conversely, assume that $H$ is separable and let $(v_n)$ be a dense sequence (we may assume $v_0\ne0$). We define $e_0=v_0$, $e_1=v_{k_1}$ where $k_1$ is the first $k>k_0=0$ such that $v_k$ is linearly independent from $v_0$, $e_2=v_{k_2}$ where $k_2$ is the first $k>k_1$ such that $v_k$ is linearly independent from $\{e_0,e_1\}$, and so on. In this way we have built a sequence $(e_i)$ of linearly independent vectors generating the same vector space generated by $(v_n)$. Let $S$ be this vector space, and let us represent it as $\bigcup_nS_n$, where $S_n$ is the vector space generated by $\{e_0,\dots,e_n\}$. Notice that $S$ is dense, as all $v_n$ belong to $S$. By applying the Gram-Schmidt process to $(e_i)$, an operation that does not change the vector spaces $S_n$ generated by the vectors $e_0,\dots,e_n$, we can also assume that $(e_i)$ is an orthonormal system. Then, Proposition 4.13 gives that $(e_i)$ is complete.
4.5. Hilbert spaces on C
In this section we illustrate briefly how the concepts introduced so far extend to complex vector spaces $H$. A pre-Hilbert space is a complex vector space $H$ endowed with a mapping
$$H\times H\to\mathbb{C},\qquad(x,y)\mapsto\langle x,y\rangle,$$
called scalar product, such that:
(i) $\langle x,x\rangle\ge0$ for all $x\in H$ and $\langle x,x\rangle=0$ if and only if $x=0$;
(ii) $\langle x,y\rangle=\overline{\langle y,x\rangle}$ for all $x,y\in H$;
(iii) $\langle\alpha x+\beta y,z\rangle=\alpha\langle x,z\rangle+\beta\langle y,z\rangle$ for all $x,y,z\in H$ and $\alpha,\beta\in\mathbb{C}$.
It turns out that $\|x\|:=\sqrt{\langle x,x\rangle}$ is still a norm, because the Cauchy-Schwarz inequality still holds. Hence, we can define Hilbert spaces as those spaces for which the norm induces a complete distance.
The canonical model of $n$-dimensional Hilbert space is $\mathbb{C}^n$. Given a measure space $(X,\mathscr{F},\mu)$, a basic example of Hilbert space is the space of $\mathscr{F}$-measurable and square integrable functions $f:X\to\mathbb{C}$. In this context $\mathscr{F}$-measurable means that both the real and the imaginary part of $f$ are $\mathscr{F}$-measurable. In this space one can define the scalar product
$$\langle f,g\rangle:=\int_Xf(x)\overline{g(x)}\,d\mu(x)$$
and prove that it induces a Hilbert space structure. The space $\ell^2(\mathbb{C})$ of complex-valued sequences $(z_n)$ with $(|z_n|)\in\ell^2(\mathbb{R})$ is a particular case.
The norm still satisfies the parallelogram identity, so that we can still prove the existence of orthogonal projections on closed subspaces and their characterization in terms of
$$\operatorname{Re}\langle x-\pi_Y(x),z\rangle=0\qquad\forall z\in Y.$$
Analogously, in Remark 4.4, one has to replace the scalar product by its real part.
The Riesz representation theorem still holds (now for continuous and $\mathbb{C}$-linear functionals) and the concepts of orthonormal system and complete orthonormal system make sense. We have Bessel's inequality for orthonormal systems and Parseval's identity for complete orthonormal systems. Finally, $\ell^2(\mathbb{C})$ is the canonical model of all separable Hilbert spaces; as in the real case the correspondence is induced by the choice of a complete orthonormal system, which provides coordinates of a vector.
We conclude this chapter by providing a natural example, considered in the literature, of a non-separable Hilbert space.
Example 4.15 (Quasi-periodic functions). We define the space $AP(\mathbb{R})$ of almost periodic functions as the closure, with respect to uniform convergence in $\mathbb{R}$, of the vector space generated by complex-valued periodic functions (of arbitrary period). This space has been extensively studied by Bochner and Bohr. It is easy to show that the space of almost periodic functions is not only a vector space (it is a subspace of $C(\mathbb{R},\mathbb{C})$), but also an algebra, i.e. $fg\in AP(\mathbb{R})$ whenever $f,g\in AP(\mathbb{R})$.
If $f$ is almost periodic one can also show (by approximation, taking into account that this property is linear with respect to $f$ and holds for periodic functions) that there exists the limit
$$M(f):=\lim_{T\to+\infty}\frac{1}{2T}\int_{-T}^{T}f(x+t)\,dt.$$
In addition, it is easily seen that the limit is independent of $x$.
The space $AP(\mathbb{R})$ of all almost periodic functions is a pre-Hilbert space when endowed with the following inner product
$$\langle f,g\rangle_{AP}:=M(f\bar g),\qquad f,g\in AP(\mathbb{R}).$$
For any $\lambda\in\mathbb{R}$ define
$$e_\lambda(t)=e^{i\lambda t},\qquad t\in\mathbb{R}.$$
Then $e_\lambda\in AP(\mathbb{R})$, $\langle e_\lambda,e_\lambda\rangle_{AP}=1$ and
$$\langle e_\lambda,e_\mu\rangle_{AP}=\lim_{T\to+\infty}\frac{e^{iT(\lambda-\mu)}-e^{-iT(\lambda-\mu)}}{2Ti(\lambda-\mu)}=0\quad\text{whenever }\lambda\ne\mu,$$
so that $(e_\lambda)_{\lambda\in\mathbb{R}}$ is an orthonormal system in $AP(\mathbb{R})$ having the cardinality of continuum. One can also characterize the (abstract) Hilbert completion of $AP(\mathbb{R})$ (the so-called Bohr almost periodic functions) and prove that the system $\{e_\lambda\}_{\lambda\in\mathbb{R}}$ is complete. For more details see e.g. [4].
Exercises
4.1 Let $(X,\|\cdot\|)$ be a normed space, and assume that the norm satisfies the parallelogram identity (4.2). Set
$$\langle x,y\rangle:=\frac14\|x+y\|^2-\frac14\|x-y\|^2,\qquad x,y\in X.$$
Show that $\langle\cdot,\cdot\rangle$ is a scalar product whose induced norm is $\|\cdot\|$. Use this identity to show that any linear isometry between pre-Hilbert spaces preserves also the scalar product.
4.2 Show that, in the situation considered in Remark 4.4, $\pi_K(x)$ is characterized by the property
$$\langle x-\pi_K(x),z-\pi_K(x)\rangle\le0\qquad\forall z\in K.$$
4.3 Let $H$ be a finite-dimensional pre-Hilbert space and let $\{v_1,\dots,v_n\}$, with $n=\dim H$, be a basis of it. Define
$$f_1=v_1,\quad f_2=v_2-\frac{\langle v_2,f_1\rangle}{\langle f_1,f_1\rangle}f_1,\quad f_3=v_3-\frac{\langle v_3,f_1\rangle}{\langle f_1,f_1\rangle}f_1-\frac{\langle v_3,f_2\rangle}{\langle f_2,f_2\rangle}f_2,\quad\dots$$
Show that $e_i=f_i/\|f_i\|$ is an orthonormal system in $H$ (notice that $v_k-f_k$ is the projection of $v_k$ on the vector space generated by $\{v_1,\dots,v_{k-1}\}$).
4.4 Let $H$ be a Hilbert space, and let $X$ be a closed infinite-dimensional separable subspace. Show that
$$\pi_X(x)=\sum_{k=0}^\infty\langle x,e_k\rangle e_k\qquad\forall x\in H,$$
where $(e_k)$ is any complete orthonormal system of $X$. Hint: show that the vector $x-\sum_k\langle x,e_k\rangle e_k$ is orthogonal to all vectors of $X$.
4.5 Let $X$ be the space of functions $f:[0,1]\to\mathbb{R}$ such that $f(x)\ne0$ for at most countably many $x$, and $\sum_xf^2(x)<+\infty$. Show that $X$, endowed with the scalar product
$$\langle f,g\rangle:=\sum_{x\in[0,1]}f(x)g(x),$$
is a non-separable Hilbert space.
4.6 Let $(e_k)_{k\in\mathbb{N}}$ be a complete orthonormal system of $H$. Show that, for any $x,y\in H$ we have
$$\sum_{k=0}^\infty\langle x,e_k\rangle\langle y,e_k\rangle=\langle x,y\rangle.\tag{4.10}$$
4.7 Show that for any Hilbert space $H$ there exists a family (not necessarily finite or countable) of vectors $\{e_i\}_{i\in I}$ such that:
(i) $\langle e_i,e_j\rangle$ is equal to 1 if $i=j$, and to 0 otherwise;
(ii) for any vector $x\in H$ there exists a countable set $J\subset I$ with
$$x=\sum_{i\in J}\langle x,e_i\rangle e_i.$$
Hint: use Zorn's lemma.
Chapter 5
Fourier series
In this chapter we study the problem of representing a given $T$-periodic function as a superposition, for a suitable choice of the coefficients, of more elementary ones. This problem was first studied by J. Fourier in the case when the elementary functions are the trigonometric ones (nowadays we know that many different choices are indeed possible). Thanks to the theory of $L^2$ spaces and of Hilbert spaces developed in the previous chapters, the problem can be formalized by looking for complete orthonormal systems in $L^2$ made by trigonometric functions.
We shall mostly be concerned with the case of $2\pi$-periodic functions, but a simple change of scale (see Remark 5.1) easily provides the translation of the results to arbitrary periods.
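We are concerned with the measure space
$$\big((-\pi,\pi),\ \mathscr{B}((-\pi,\pi)),\ \lambda\big),$$
where $\lambda$ is the Lebesgue measure. As usual, we shall write for brevity $L^2(-\pi,\pi)$. We shall denote by $\langle\cdot,\cdot\rangle$ the canonical scalar product given by
$$\langle f,g\rangle:=\int_{(-\pi,\pi)}f(x)g(x)\,d\lambda=\int_{-\pi}^{\pi}f(x)g(x)\,dx,\qquad f,g\in L^2(-\pi,\pi).$$
Let us consider, as a family of elementary functions, the trigonometric system, given by:
$$\frac{1}{\sqrt{2\pi}};\qquad\frac{1}{\sqrt\pi}\cos kx,\ k\in\mathbb{N},\ k\ge1;\qquad\frac{1}{\sqrt\pi}\sin kx,\ k\in\mathbb{N},\ k\ge1.\tag{5.1}$$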
It is easy to check with integration by parts that this is an orthonormal system in $L^2(-\pi,\pi)$, see Exercise 5.1. Thus, in view of Proposition 4.10, the series of functions
$$S(x)=\frac12a_0+\sum_{k=1}^\infty(a_k\cos kx+b_k\sin kx),\tag{5.2}$$
is convergent in $L^2(-\pi,\pi)$ for any $f\in L^2(-\pi,\pi)$, where
$$a_k:=\frac1\pi\int_{-\pi}^{\pi}f(y)\cos ky\,dy,\qquad k\in\mathbb{N},$$
and
$$b_k:=\frac1\pi\int_{-\pi}^{\pi}f(y)\sin ky\,dy,\qquad k\in\mathbb{N},\ k\ge1.$$
Notice that $a_0/2$ is the mean value of $f$ on $(-\pi,\pi)$, in agreement with the fact that all the other terms in the series (5.2) have mean value 0 on $(-\pi,\pi)$. To recognize (5.2) in terms of scalar products, we see that the term $a_0/2$ corresponds to
$$\Big\langle f,\frac{1}{\sqrt{2\pi}}\Big\rangle\frac{1}{\sqrt{2\pi}}$$
and the terms $a_k\cos kx$, $b_k\sin kx$ for $k\ge1$ correspond respectively to
$$\Big\langle f,\frac{1}{\sqrt\pi}\cos kx\Big\rangle\frac{1}{\sqrt\pi}\cos kx,\qquad\Big\langle f,\frac{1}{\sqrt\pi}\sin kx\Big\rangle\frac{1}{\sqrt\pi}\sin kx.$$
Formula (5.2) is called the trigonometric Fourier series of $f$.
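To make the definition of the coefficients $a_k$, $b_k$ concrete, the sketch below approximates them with a midpoint quadrature rule for the sample function $f(x)=x$, whose exact coefficients are $a_k=0$ and $b_k=2(-1)^{k+1}/k$ (by an integration by parts). The function name `fourier_coefficients`, the choice of quadrature and the number of samples are assumptions made for illustration only.

```python
import math

def fourier_coefficients(f, n_max, samples=20000):
    """Approximate a_0..a_n_max and b_1..b_n_max of f on (-pi, pi).

    Uses the midpoint rule for (1/pi) * integral of f(y) cos(ky) dy and
    (1/pi) * integral of f(y) sin(ky) dy; b[0] is a placeholder set to 0.
    """
    h = 2 * math.pi / samples
    xs = [-math.pi + (j + 0.5) * h for j in range(samples)]
    fx = [f(x) for x in xs]
    a = [sum(v * math.cos(k * x) for v, x in zip(fx, xs)) * h / math.pi
         for k in range(n_max + 1)]
    b = [0.0] + [sum(v * math.sin(k * x) for v, x in zip(fx, xs)) * h / math.pi
                 for k in range(1, n_max + 1)]
    return a, b

a, b = fourier_coefficients(lambda x: x, 5)
for k in range(1, 6):
    # Compare the numerical b_k with the exact value 2(-1)^(k+1)/k.
    print(k, b[k], 2 * (-1) ** (k + 1) / k)
```

Since $f(x)=x$ is odd, all the $a_k$ vanish up to rounding, and only the sine terms survive in (5.2).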
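The Bessel inequality (4.7) reads, in this context, as follows:
$$\frac1\pi\int_{-\pi}^{\pi}|f(x)|^2\,dx\ge\frac12a_0^2+\sum_{k=1}^\infty(a_k^2+b_k^2).\tag{5.3}$$
Indeed, it is easily seen that $a_0^2/2=\big(\langle f,\tfrac{1}{\sqrt2\,\pi}\rangle\big)^2$ and, for $k\ge1$,
$$a_k^2=\Big(\Big\langle f,\frac1\pi\cos kx\Big\rangle\Big)^2,\qquad b_k^2=\Big(\Big\langle f,\frac1\pi\sin kx\Big\rangle\Big)^2,$$
that is, each of these quantities is $1/\pi$ times the square of the corresponding coefficient of $f$ with respect to the orthonormal system (5.1).
First, we shall find sufficient conditions on $f$ ensuring the pointwise convergence of the series $S(x)$ to $f(x)$ in $(-\pi,\pi)$. Then, we shall show that the trigonometric system is complete, so that the inequality above is actually an equality. As shown in Exercise 5.4 and Exercise 5.5, the trigonometric system, the trigonometric series and the form of the coefficients become much nicer and more symmetric in the complex-valued Hilbert space $L^2\big((-\pi,\pi);\mathbb{C}\big)$:
$$f(x)=\sum_{n\in\mathbb{Z}}a_ne^{inx}\quad\text{where}\quad a_n:=\frac{1}{2\pi}\int_{-\pi}^{\pi}f(x)e^{-inx}\,dx.$$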
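Remark 5.1 ($2T$-periodic functions). If $f\in L^2(-T,T)$ we can write instead
$$f(x)=\frac{a_0}{2}+\sum_{k=1}^\infty\Big(a_k\cos\frac{\pi}{T}kx+b_k\sin\frac{\pi}{T}kx\Big)$$
with
$$a_k:=\begin{cases}\dfrac1T\displaystyle\int_{-T}^{T}f(x)\,dx&\text{if }k=0;\\[2mm]\dfrac1T\displaystyle\int_{-T}^{T}f(x)\cos\frac{\pi}{T}kx\,dx&\text{if }k>0,\end{cases}\qquad b_k:=\frac1T\int_{-T}^{T}f(x)\sin\frac{\pi}{T}kx\,dx.$$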
5.1. Pointwise convergence of the Fourier series
For any integer $N\ge1$ we consider the partial sum
$$S_N(x):=\frac12a_0+\sum_{k=1}^N(a_k\cos kx+b_k\sin kx),\qquad x\in[-\pi,\pi).$$
Since the functions $\cos kx$ and $\sin kx$ are $2\pi$-periodic, it is natural to extend $f$ to the whole of $\mathbb{R}$ as a $2\pi$-periodic function $\tilde f$, setting
$$\tilde f(x+2\pi n)=f(x),\qquad x\in[-\pi,\pi),\ n=\pm1,\pm2,\dots.\tag{5.4}$$
We shall denote in the sequel by $H_{l,r}(z)$ the Heaviside function
$$H_{l,r}(z):=\begin{cases}l&\text{if }z\le0;\\r&\text{if }z>0.\end{cases}$$
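Lemma 5.2. For any integer $N\ge1$ and $x,l,r\in\mathbb{R}$ we have
$$S_N(x)-\frac{l+r}{2}=\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{\tilde f(x+\tau)-H_{l,r}(\tau)}{\sin(\tau/2)}\,\sin\Big(\Big(N+\frac12\Big)\tau\Big)\,d\tau.\tag{5.5}$$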
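Proof. Write
$$S_N(x)=\frac12a_0+\sum_{k=1}^N(a_k\cos kx+b_k\sin kx)=\frac1\pi\int_{-\pi}^{\pi}f(y)\Big[\frac12+\sum_{k=1}^N(\cos kx\cos ky+\sin kx\sin ky)\Big]dy=\frac1\pi\int_{-\pi}^{\pi}f(y)\Big[\frac12+\sum_{k=1}^N\cos k(x-y)\Big]dy.$$
To evaluate the sum, we notice that for any $z\in\mathbb{R}$
$$\Big[\frac12+\sum_{k=1}^N\cos kz\Big]\sin\Big(\frac12z\Big)=\frac12\Big[\sin\Big(\frac12z\Big)+\sum_{k=1}^N\Big(\sin\Big(\Big(k+\frac12\Big)z\Big)-\sin\Big(\Big(k-\frac12\Big)z\Big)\Big)\Big]=\frac12\sin\Big(\Big(N+\frac12\Big)z\Big).$$
Therefore
$$\frac12+\sum_{k=1}^N\cos kz=\frac12\,\frac{\sin\big((N+\frac12)z\big)}{\sin\big(\frac12z\big)}\tag{5.6}$$
and so,
$$S_N(x)=\frac{1}{2\pi}\int_{-\pi}^{\pi}f(y)\,\frac{\sin\big((N+\frac12)(x-y)\big)}{\sin\big(\frac12(x-y)\big)}\,dy.\tag{5.7}$$
Now, setting $\tau=y-x$ we get
$$S_N(x)=\frac{1}{2\pi}\int_{-\pi-x}^{\pi-x}\tilde f(x+\tau)\,\frac{\sin\big((N+\frac12)\tau\big)}{\sin\big(\frac12\tau\big)}\,d\tau=\frac{1}{2\pi}\int_{-\pi}^{\pi}\tilde f(x+\tau)\,\frac{\sin\big((N+\frac12)\tau\big)}{\sin\big(\frac12\tau\big)}\,d\tau$$
since the function under the integral is $2\pi$-periodic. Now, integrating (5.6) over $[-\pi,\pi]$ yields
$$1=\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{\sin\big((N+\frac12)\tau\big)}{\sin\big(\frac12\tau\big)}\,d\tau,$$
so that, the integrand being even,
$$\frac1\pi\int_0^{\pi}\frac{\sin\big((N+\frac12)\tau\big)}{\sin\big(\frac12\tau\big)}\,d\tau=1=\frac1\pi\int_{-\pi}^{0}\frac{\sin\big((N+\frac12)\tau\big)}{\sin\big(\frac12\tau\big)}\,d\tau.$$
If we multiply the first identity by $r/2$ and the second one by $l/2$, and subtract the resulting identities from the last expression of $S_N(x)$, (5.5) follows.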
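The closed form (5.6) for the cosine sum is the key computation of the proof, and it is easy to check numerically. The following sketch compares both sides at a few points where $\sin(z/2)\ne0$; the function names `cosine_sum` and `closed_form` and the sample points are illustrative assumptions.

```python
import math

def cosine_sum(N, z):
    """Left-hand side of (5.6): 1/2 + sum_{k=1}^N cos(kz)."""
    return 0.5 + sum(math.cos(k * z) for k in range(1, N + 1))

def closed_form(N, z):
    """Right-hand side of (5.6); requires sin(z/2) != 0."""
    return 0.5 * math.sin((N + 0.5) * z) / math.sin(0.5 * z)

for N in (1, 5, 20):
    for z in (0.3, 1.0, -2.5, 3.0):
        print(N, z, cosine_sum(N, z), closed_form(N, z))
```

At $z=0$ the right-hand side has to be understood as a limit, equal to $N+\frac12$, which matches the value of the sum.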
Proposition 5.3 (Dini's test). Let $x,l,r\in\mathbb{R}$ be such that
$$\int_{-\pi}^{\pi}\frac{|\tilde f(x+\tau)-H_{l,r}(\tau)|}{|\sin(\tau/2)|}\,d\tau<\infty.\tag{5.8}$$
Then the Fourier series of $f$ converges to $(l+r)/2$ at $x$.
Dini's test shows a remarkable property of the Fourier series: while the specific value of the coefficients $a_k$ and $b_k$ depends on the behaviour of $f$ on the whole interval $(-\pi,\pi)$, and the same holds for the Fourier series, the character of the series (convergent or not) at a given point $x$ depends only on the behaviour of $f$ in the neighbourhood of $x$: indeed, it is this behaviour that influences the integrability of $(\tilde f(x+\tau)-H_{l,r}(\tau))/\sin(\tau/2)$ (the only singularity being at $\tau=0$).
In the next example we provide sufficient conditions for the convergence of the Fourier series.
Example 5.4. Assume that $f:[-\pi,\pi]\to\mathbb{R}$ is $L$-Lipschitz continuous, i.e.
$$|f(x)-f(y)|\le L|x-y|\qquad\forall x,y\in[-\pi,\pi]$$
for some $L\ge0$. Then Dini's test is fulfilled at any $x\in\mathbb{R}\setminus\pi\mathbb{Z}$ choosing $l=r=\tilde f(x)$, and at any $x\in\pi\mathbb{Z}$ choosing $l=\tilde f(x^-)$ and $r=\tilde f(x^+)$ (1). Indeed, with these choices of $l$ and $r$, the quotient
$$\frac{\tilde f(x+\tau)-H_{l,r}(\tau)}{\sin(\tau/2)}$$
is bounded in a neighbourhood of 0.
The same conclusions hold when $f$ is $\alpha$-Hölder continuous for some $\alpha\in(0,1]$, i.e.
$$|f(x)-f(y)|\le L|x-y|^\alpha,\qquad\forall x,y\in[-\pi,\pi]$$
for some $L\ge0$: in this case the quotient is bounded from above, near 0, by the function $L|\tau|^\alpha/|\sin(\tau/2)|\sim2L|\tau|^{\alpha-1}$ which is integrable.
(1) Here we denote by $g(x^-)$, $g(x^+)$ the left and right limits of $g$ at $x$.
More generally, the argument of the previous example can be used to show that the Fourier series is pointwise convergent for piecewise $C^1$ functions $f$: at continuity points $x$ the series converges to $f(x)$, and at (jump) discontinuity points $x$ it converges to $(f(x^-)+f(x^+))/2$. However, the mere continuity of $f$ is not sufficient to ensure pointwise convergence of the Fourier series.
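In order to prove Proposition 5.3, we need the following Riemann-Lebesgue lemma, a tool interesting in itself.
Lemma 5.5. Let $(e_k)$ be an orthonormal system in $L^2(-\pi,\pi)$. Assume that there exists $M>0$ such that $\|e_k\|_\infty\le M$ for all $k\in\mathbb{N}$. Then for any $f\in L^1(-\pi,\pi)$ we have
$$\lim_{k\to\infty}\int_{-\pi}^{\pi}f(x)e_k(x)\,dx=0.\tag{5.9}$$
Proof. Notice first that if $f\in L^2(-\pi,\pi)$ the conclusion of the lemma is trivial. We have in fact in this case
$$\int_{-\pi}^{\pi}f(x)e_k(x)\,dx=\langle f,e_k\rangle$$
and, since by Bessel's inequality the series $\sum_{k}|\langle f,e_k\rangle|^2$ is convergent, we have $\lim_{k\to\infty}\langle f,e_k\rangle=0$.
Let us now consider the general case, writing $\langle f,e_k\rangle$ for the integral in (5.9) also when $f\in L^1(-\pi,\pi)$. We know that bounded continuous functions are dense in $L^1(-\pi,\pi)$, hence for any $\varepsilon>0$ we can find $g\in C_b(-\pi,\pi)$ such that $\|f-g\|_1<\varepsilon$. As a consequence
$$|\langle f,e_k\rangle|\le|\langle f-g,e_k\rangle|+|\langle g,e_k\rangle|\le M\varepsilon+|\langle g,e_k\rangle|$$
and, since $g\in L^2(-\pi,\pi)$, letting $k\to\infty$ we obtain $\limsup_{k}|\langle f,e_k\rangle|\le M\varepsilon$. Since $\varepsilon$ is arbitrary the proof is achieved.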
Proof of Proposition 5.3. Set
$$g(\tau):=\frac{\tilde f(x+\tau)-H_{l,r}(\tau)}{\sin(\tau/2)}\in L^1(-\pi,\pi).\tag{5.10}$$
Then, writing
$$\sin\Big[\Big(N+\frac12\Big)t\Big]=\sin Nt\cos\frac12t+\cos Nt\sin\frac12t$$
and applying the Riemann-Lebesgue lemma to $g\cos(t/2)$ (with $e_N=\sin Nt$, up to the normalizing factor $1/\sqrt\pi$) and to $g\sin(t/2)$ (with $e_N=\cos Nt$, up to the same factor) we obtain from (5.5) that $S_N(x)$ converges to $(l+r)/2$.
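As an illustration of Dini's test at a jump, consider the hypothetical example of the square wave $f=-1$ on $(-\pi,0)$ and $f=1$ on $(0,\pi)$, which is not taken from the text. A direct computation gives $a_k=0$ and $b_k=4/(k\pi)$ for odd $k$, $b_k=0$ for even $k$. At $x=0$ the test applies with $l=-1$, $r=1$, so the series should converge to $(l+r)/2=0$, while at the continuity point $x=\pi/2$ it should converge to $f(\pi/2)=1$. The sketch below evaluates the partial sums; the function name is an assumption.

```python
import math

def square_wave_partial_sum(N, x):
    """S_N(x) for the square wave: only odd k contribute, with b_k = 4/(k*pi)."""
    return sum(4 / (k * math.pi) * math.sin(k * x) for k in range(1, N + 1, 2))

at_jump = square_wave_partial_sum(2001, 0.0)            # expected limit (l + r)/2 = 0
at_smooth = square_wave_partial_sum(2001, math.pi / 2)  # expected limit f(pi/2) = 1
print(at_jump, at_smooth)
```

At $x=\pi/2$ the partial sums form an alternating series, so the error after the term $k=2001$ is at most the first omitted term $4/(2003\pi)$, which is below $10^{-3}$.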
5.2. Completeness of the trigonometric system
Proposition 5.6. The trigonometric system (5.1) is complete. In particular equality holds in (5.3) and
$$\lim_{N\to\infty}\int_{-\pi}^{\pi}|f(x)-S_Nf(x)|^2\,dx=0\qquad\forall f\in L^2(-\pi,\pi).\tag{5.11}$$
Proof. We show that the vector space $E$ generated by the trigonometric system is dense in $L^2(-\pi,\pi)$. Let $H'$ be the closure, in the $L^2(-\pi,\pi)$ norm, of $E$, that is easily seen to be still a vector space. We will prove in a series of steps that $H'$ contains larger and larger classes of functions.
Let $f:[-\pi,\pi]\to[0,+\infty)$ be a Lipschitz function, and let us prove that it belongs to $H'$. Indeed, we know from Example 5.4 that $S_Nf\to f$ pointwise in $(-\pi,\pi)$. On the other hand, we already know from Proposition 4.10(ii) that the Fourier series is convergent in $L^2(-\pi,\pi)$ to some function $g$ (which is indeed, by Exercise 4.4, the orthogonal projection of $f$ on $H'$), therefore a subsequence $(S_{N(k)}f)$ is converging almost everywhere to $g$. It follows that $g=f$ and $S_Nf\to f$ in $L^2(-\pi,\pi)$.
If now $g:[-\pi,\pi]\to[0,+\infty)$ is continuous, we know that $g$ can be monotonically approximated by the Lipschitz functions
$$g_n(x):=\min_{y\in[-\pi,\pi]}\big(g(y)+n|x-y|\big),\qquad x\in[-\pi,\pi]$$
(see Exercise 2.11), that converge to $g$ also in $L^2(-\pi,\pi)$ by the dominated convergence theorem. As a consequence also $g$ belongs to $H'$. Since $H'$ is invariant by addition of constants, we proved that all continuous functions in $[-\pi,\pi]$ belong to $H'$. We conclude using the density of this class of functions in $L^2(-\pi,\pi)$.
Remark 5.7. Let $f\in L^2(-\pi,\pi)$. Then, the Parseval identity reads as follows
$$\frac1\pi\int_{-\pi}^{\pi}|f(x)|^2\,dx=\frac12a_0^2+\sum_{k=1}^\infty(a_k^2+b_k^2).\tag{5.12}$$
For instance, taking $f(x)=x$ one finds the following nice relation between $\pi$ and the harmonic series with exponent 2:
$$\sum_{k=1}^\infty\frac{1}{k^2}=\frac{\pi^2}{6}.$$
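The computation behind this relation is short enough to be spelled out. The following worked derivation uses only (5.12) and the definitions of $a_k$, $b_k$; the intermediate integrals are elementary and are stated here as a sketch of the intended calculation.

```latex
% Fourier coefficients of f(x) = x on (-\pi,\pi): f is odd, so a_k = 0, and
\[
  b_k=\frac1\pi\int_{-\pi}^{\pi}x\sin kx\,dx
     =\frac1\pi\Big[-\frac{x\cos kx}{k}\Big]_{-\pi}^{\pi}
      +\frac{1}{k\pi}\int_{-\pi}^{\pi}\cos kx\,dx
     =\frac{2(-1)^{k+1}}{k}.
\]
% Left-hand side of (5.12):
\[
  \frac1\pi\int_{-\pi}^{\pi}x^2\,dx=\frac{2\pi^2}{3}.
\]
% Parseval identity (5.12) with a_k = 0 and b_k^2 = 4/k^2:
\[
  \frac{2\pi^2}{3}=\sum_{k=1}^\infty\frac{4}{k^2}
  \quad\Longrightarrow\quad
  \sum_{k=1}^\infty\frac{1}{k^2}=\frac{\pi^2}{6}.
\]
```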
Notice that (5.11) provides, for any $f\in L^2(-\pi,\pi)$, the existence of a subsequence $N(k)$ such that $S_{N(k)}f(x)\to f(x)$ for $\mathscr{L}^1$-a.e. $x\in(-\pi,\pi)$. Is it true that the whole sequence $S_Nf$ converges a.e. to $f$? This problem, surprisingly difficult, was solved by L. Carleson only in 1966, see [1].
Finally, we notice that there exist other important examples of complete orthonormal systems, besides the trigonometric one. Some of them are illustrated in the exercises.
5.3. Uniform convergence of the Fourier series
We conclude by studying the uniform convergence of the Fourier series. We recall that a series $\sum_{n=0}^\infty x_n$ in a Banach space $E$ is said to be totally convergent if the numerical series $\sum_{n=0}^\infty\|x_n\|$ is convergent. Using the completeness of $E$ it is not difficult to check (see Exercise 5.2) that any totally convergent series is convergent (as we have seen in the previous chapter, this means that the finite sums $\sum_{n=0}^Nx_n$ converge in $E$ to a vector, denoted by $\sum_{n=0}^\infty x_n$).
Now we show that the Fourier series of $C^1$ functions $f$ with $f(-\pi)=f(\pi)$ are uniformly convergent: the proof highlights two important principles, whose validity extends to higher order derivatives (see Exercise 5.9) and to Fourier transforms: first, the Fourier coefficients of the derivative of a function are linked to the Fourier coefficients of the function; second, higher regularity of $f$ implies a faster decay of the Fourier coefficients, and therefore a convergence in stronger norms of the Fourier series.
Proposition 5.8. Assume that $f\in C^1([-\pi,\pi])$ and that $f(-\pi)=f(\pi)$. Then the Fourier series of $f$ converges uniformly to $f$ in $[-\pi,\pi]$.
Proof. We first notice that $\tilde f$ in (5.4) is Lipschitz continuous, so that by Proposition 5.3 we have
$$f(x)=\frac12a_0+\sum_{k=1}^\infty(a_k\cos kx+b_k\sin kx)\qquad\forall x\in[-\pi,\pi].$$
Let us consider the Fourier series of the derivative $f'$ of $f$,
$$\sum_{k=1}^\infty(a'_k\cos kx+b'_k\sin kx),\qquad x\in[-\pi,\pi],$$
where, for $k\ge1$ integer,
$$a'_k=\frac1\pi\int_{-\pi}^{\pi}f'(y)\cos ky\,dy,\qquad b'_k=\frac1\pi\int_{-\pi}^{\pi}f'(y)\sin ky\,dy.\tag{5.13}$$
Notice that $a'_0=0$ because $f(-\pi)=f(\pi)$ implies that the mean value of $f'$ on $(-\pi,\pi)$ is 0. As easily checked through an integration by parts (using again the fact that $f(-\pi)=f(\pi)$), we have $a'_k=kb_k$ and $b'_k=-ka_k$. Then, by the Bessel inequality it follows that
$$\sum_{k=1}^\infty k^2(a_k^2+b_k^2)=\sum_{k=1}^\infty\big((a'_k)^2+(b'_k)^2\big)\le\frac1\pi\int_{-\pi}^{\pi}|f'(x)|^2\,dx<\infty.\tag{5.14}$$
Therefore the Fourier series of $f$ is totally convergent in $C([-\pi,\pi])$ and therefore uniformly convergent. We have indeed, by the Cauchy-Schwarz inequality in $\ell^2$,
$$\sum_{k=1}^\infty\max_{x\in[-\pi,\pi]}|a_k\cos kx+b_k\sin kx|\le\sum_{k=1}^\infty(|a_k|+|b_k|)\le\Big(\sum_{k=1}^\infty k^2(|a_k|+|b_k|)^2\Big)^{1/2}\Big(\sum_{k=1}^\infty k^{-2}\Big)^{1/2}<\infty,$$
where the first factor is finite because $(|a_k|+|b_k|)^2\le2(a_k^2+b_k^2)$ and (5.14) holds.
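The uniform convergence can be observed on a concrete example. The sketch below uses the sample function $f(x)=x^2$, which is $C^1$ on $[-\pi,\pi]$ with $f(-\pi)=f(\pi)$ and has the classical expansion $x^2=\pi^2/3+\sum_{k\ge1}4(-1)^kk^{-2}\cos kx$; this example, the function names and the grid size are assumptions chosen for illustration. Since $|a_k|=4/k^2$, the sup-norm error of $S_N$ is bounded by $\sum_{k>N}4/k^2<4/N$.

```python
import math

def partial_sum(N, x):
    """S_N(x) for f(x) = x^2: a_0/2 = pi^2/3, a_k = 4(-1)^k / k^2, b_k = 0."""
    return math.pi ** 2 / 3 + sum(
        4 * (-1) ** k / k ** 2 * math.cos(k * x) for k in range(1, N + 1)
    )

def sup_error(N, grid=401):
    """Maximum of |f - S_N| over an equispaced grid of [-pi, pi] (endpoints included)."""
    xs = [-math.pi + 2 * math.pi * j / (grid - 1) for j in range(grid)]
    return max(abs(x * x - partial_sum(N, x)) for x in xs)

errors = {N: sup_error(N) for N in (5, 10, 20, 40)}
for N, err in errors.items():
    print(N, err, 4 / N)  # observed sup error versus the bound 4/N
```

The decay of the error like $1/N$ reflects the decay $k^{-2}$ of the coefficients, in line with the principle that more regularity of $f$ gives faster decay and convergence in stronger norms.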
Exercises
5.1 Check that the trigonometric system (5.1) is orthonormal.
5.2 Let $E$ be a Banach space. Show that any totally convergent series $\sum_nx_n$, with $(x_n)\subset E$, is convergent. Moreover,
$$\Big\|\sum_{n=0}^\infty x_n\Big\|\le\sum_{n=0}^\infty\|x_n\|.\tag{5.15}$$
Hint: estimate $\big\|\sum_{n=0}^Nx_n-\sum_{n=0}^Mx_n\big\|$ with the triangle inequality.
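5.3 Prove that the following systems on $L^2(0,\pi)$ are orthonormal and complete
$$\sqrt{\frac2\pi}\,\sin kx,\quad k\ge1,$$
and
$$\frac{1}{\sqrt\pi};\qquad\sqrt{\frac2\pi}\,\cos kx,\quad k\ge1.$$
5.4 Show that
$$e_k(x):=\frac{1}{\sqrt{2\pi}}\,e^{ikx},\qquad k\in\mathbb{Z}$$
is a complete orthonormal system in $L^2((-\pi,\pi);\mathbb{C})$. Hint: in order to show completeness, consider first the cases where $f$ is real-valued or $if$ is real-valued.
5.5 Let $(e_k)$ be as in Exercise 5.4. Using the Parseval identity show that
$$\int_{-\pi}^{\pi}|f(x)|^2\,dx=\frac{1}{2\pi}\sum_{k\in\mathbb{Z}}\Big|\int_{-\pi}^{\pi}f(x)e^{-ikx}\,dx\Big|^2\qquad\forall f\in L^2((-\pi,\pi);\mathbb{C}).$$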
5.6 Let $f\in L^2((-\pi,\pi);\mathbb C)$ and let $S_Nf=\sum_{k=-N}^{N}\langle f,e_k\rangle e_k$, with $N\ge 1$, be the Fourier sums corresponding to the complete orthonormal system in Exercise 5.4. Show that
$$f(x)-S_Nf(x)=\int_{-\pi}^{\pi}G_N(x-y)\bigl(f(x)-f(y)\bigr)\,dy$$
with
$$G_N(z):=\frac{1}{2\pi}\,\frac{\sin((N+1/2)z)}{\sin(z/2)}.$$
Hint: use the identities $\sum_{k=0}^{N}e^{iky}=\sum_{k=0}^{N}(e^{iy})^k=(e^{i(N+1)y}-1)/(e^{iy}-1)$.
5.7 Arguing as in Remark 5.7, show that $\sum_{k=1}^\infty\frac{1}{k^4}=\pi^4/90$. Hint: consider the function $f(x)=x^2$.
5.8 Chebyshev polynomials $C_n$ in $L^2(a,b)$, with $(a,b)$ bounded interval, are the ones obtained by applying the Gram–Schmidt procedure to the vectors $1, x, x^2, x^3,\dots$. They are also called Legendre polynomials when $(a,b)=(-1,1)$.
(a) Compute explicitly the first three Legendre polynomials.
(b) Show that $\{C_n\}_{n\in\mathbb N}$ is a complete orthonormal system. Hint: use the density of polynomials in $C([a,b])$.
(c) Show that the $n$-th Legendre polynomial $P_n$ is given by
$$P_n(x)=\sqrt{\frac{2n+1}{2}}\,\frac{1}{2^n n!}\,\frac{d^n}{dx^n}(x^2-1)^n.$$
5.9 Let $f\in C^m\bigl([-\pi,\pi];\mathbb C\bigr)$ with $f^{(j)}(-\pi)=f^{(j)}(\pi)$ for all $j=0,\dots,m-1$. Show that $c^{(m)}_k$, the $k$-th Fourier coefficient of $f^{(m)}$, is linked to $c_k$, the $k$-th Fourier coefficient of $f$, by $c^{(m)}_k=(ik)^m c_k$.
Chapter 6
Operations on measures
In this chapter we collect many useful tools in Analysis and Probability that will be widely used in the following chapters. We will study the product of measures (both finite and countable), the product of measures by $L^1$ functions, the Radon–Nikodým theorem, the convergence of measures on the real line $\mathbb R$ and the Fourier transform.
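6.1. The product measure and Fubini–Tonelli theorem
Let $(X,\mathscr F)$ and $(Y,\mathscr G)$ be measurable spaces. Let us consider the product space $X\times Y$. A set of the form $A\times B$, where $A\in\mathscr F$ and $B\in\mathscr G$, is called a measurable rectangle. We denote by $\mathscr R$ the family of all measurable rectangles. $\mathscr R$ is obviously a $\pi$–system. The $\sigma$–algebra generated by $\mathscr R$ is called the product $\sigma$–algebra of $\mathscr F$ and $\mathscr G$. It is denoted by $\mathscr F\times\mathscr G$.
Given $\sigma$–finite measures $\mu$ in $(X,\mathscr F)$ and $\nu$ in $(Y,\mathscr G)$, we are going to define the product measure $\mu\times\nu$ in $(X\times Y,\mathscr F\times\mathscr G)$.
First, for any $E\in\mathscr F\times\mathscr G$ we define the sections of $E$, setting for $x\in X$ and $y\in Y$,
$$E_x:=\{y\in Y:\ (x,y)\in E\},\qquad E^y:=\{x\in X:\ (x,y)\in E\}.$$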
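Proposition 6.1. Assume that $\mu$ and $\nu$ are $\sigma$–finite and let $E\in\mathscr F\times\mathscr G$. Then the following statements hold.
(i) $E_x\in\mathscr G$ for all $x\in X$ and $E^y\in\mathscr F$ for all $y\in Y$.
(ii) The functions
$$x\mapsto\nu(E_x),\qquad y\mapsto\mu(E^y),$$
are $\mathscr F$–measurable and $\mathscr G$–measurable respectively. Moreover,
$$\int_X\nu(E_x)\,d\mu(x)=\int_Y\mu(E^y)\,d\nu(y). \tag{6.1}$$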
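Proof. Assume first that $E=A\times B$ is a measurable rectangle. Then, if $(x,y)\in X\times Y$ we have
$$E_x=\begin{cases}B&\text{if }x\in A\\ \emptyset&\text{if }x\notin A,\end{cases}\qquad E^y=\begin{cases}A&\text{if }y\in B\\ \emptyset&\text{if }y\notin B.\end{cases}$$
Consequently,
$$\nu(E_x)=1_A(x)\,\nu(B),\qquad\mu(E^y)=1_B(y)\,\mu(A),$$
so that (6.1) clearly holds.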
Now, let $\mathscr D$ be the family of all $E\in\mathscr F\times\mathscr G$ such that (i) is fulfilled. Clearly, $\mathscr D$ is a Dynkin system including the $\pi$–system $\mathscr R$. Therefore, (i) follows from the Dynkin theorem.
Now, if both $\mu$ and $\nu$ are finite, let $\mathscr D$ be the family of all $E\in\mathscr F\times\mathscr G$ such that (ii) is fulfilled. Clearly, $\mathscr D$ is a Dynkin system including the $\pi$–system $\mathscr R$ (stability under complement follows by the identities $\nu((E^c)_x)=\nu(Y)-\nu(E_x)$ and $\mu((E^c)^y)=\mu(X)-\mu(E^y)$). Therefore, (ii) follows from the Dynkin theorem as well.
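In the general $\sigma$–finite case we argue by approximation: if $E\in\mathscr F\times\mathscr G$, and $\mathscr F\ni X_h\uparrow X$ and $\mathscr G\ni Y_h\uparrow Y$ satisfy $\mu(X_h)<\infty$ and $\nu(Y_h)<\infty$, we define the finite measures
$$\mu_h(A)=\mu(A\cap X_h),\qquad\nu_h(B)=\nu(B\cap Y_h)$$
to obtain that $x\mapsto\nu_h(E_x)$ is $\mathscr F$–measurable and $y\mapsto\mu_h(E^y)$ is $\mathscr G$–measurable for all $E\in\mathscr F\times\mathscr G$. Passing to the limit as $h\to\infty$ in the identity
$$\int_{X_h}\nu_h(E_x)\,d\mu(x)=\int_X\nu_h(E_x)\,d\mu_h(x)=\int_Y\mu_h(E^y)\,d\nu_h(y)=\int_{Y_h}\mu_h(E^y)\,d\nu(y)$$
the continuity properties of measures and integrals give (6.1) as well.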
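Theorem 6.2 (Product measure). If $\mu$ and $\nu$ are $\sigma$–finite, there exists a unique measure $\lambda$ in $(X\times Y,\mathscr F\times\mathscr G)$ satisfying
$$\lambda(A\times B)=\mu(A)\,\nu(B)\qquad\text{for all }A\in\mathscr F,\ B\in\mathscr G.$$
The measure $\lambda$ is $\sigma$–finite and denoted by $\mu\times\nu$. Furthermore $\lambda$ is finite (resp. a probability measure) if both $\mu$ and $\nu$ are finite (resp. probability measures).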
Proof. Existence is easy: we set
$$\lambda(E)=\int_X\nu(E_x)\,d\mu(x)=\int_Y\mu(E^y)\,d\nu(y),\qquad E\in\mathscr F\times\mathscr G. \tag{6.2}$$
Using the continuity and additivity properties of the integral, it is immediate to check that $\lambda$ is a measure on $(X\times Y,\mathscr F\times\mathscr G)$. In the case of $\sigma$–finite measures, uniqueness follows by the coincidence criterion for positive measures stated in Proposition 1.15: indeed, the value of the product measure is uniquely determined on the $\pi$–system $\mathscr K$ made by rectangles $A\times B$ with $\mu(A)$ and $\nu(B)$ finite, and thanks to the $\sigma$–finiteness assumption there exist $E_n=A_n\times B_n\in\mathscr K$ with $E_n\uparrow X\times Y$.
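Corollary 6.3. Let $E\in\mathscr F\times\mathscr G$ be such that $\mu\times\nu(E)=0$. Then $\mu(E^y)=0$ for $\nu$–almost all $y\in Y$ and $\nu(E_x)=0$ for $\mu$–almost all $x\in X$.
Proof. It follows directly from (6.2).
We consider here the measure space $(X\times Y,\mathscr F\times\mathscr G,\lambda)$, where $\lambda=\mu\times\nu$ and $\mu$ and $\nu$ are $\sigma$–finite.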
Theorem 6.4 (Fubini–Tonelli). Let $F:X\times Y\to[0,+\infty]$ be a $\mathscr F\times\mathscr G$–measurable map. Then the following statements hold.
(i) For any $x\in X$ (respectively $y\in Y$), the function $y\mapsto F(x,y)$ (respectively $x\mapsto F(x,y)$) is $\mathscr G$–measurable (resp. $\mathscr F$–measurable).
(ii) The functions
$$x\mapsto\int_Y F(x,y)\,d\nu(y),\qquad y\mapsto\int_X F(x,y)\,d\mu(x)$$
are respectively $\mathscr F$–measurable and $\mathscr G$–measurable.
(iii) We have
$$\int_{X\times Y}F(x,y)\,d\lambda(x,y)=\int_X\Bigl[\int_Y F(x,y)\,d\nu(y)\Bigr]d\mu(x)=\int_Y\Bigl[\int_X F(x,y)\,d\mu(x)\Bigr]d\nu(y). \tag{6.3}$$
Proof. Assume first that $F=1_E$, with $E\in\mathscr F\times\mathscr G$. Then we have
$$F(x,\cdot)=1_{E_x},\quad x\in X,\qquad F(\cdot,y)=1_{E^y},\quad y\in Y,$$
so (i), (ii) and (iii) follow from Proposition 6.1. Consequently, by linearity, (i)–(iii) hold when $F$ is a simple function. If $F$ is general, it is enough to approximate it by a monotonically increasing sequence of simple functions and then pass to the limit using the monotone convergence theorem.
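On finite sets every function is simple, so identity (6.3) reduces to exchanging two finite sums. The sketch below checks this with exact rational arithmetic; the two weighted sets and the function `F` are hypothetical choices made only for the illustration.

```python
from fractions import Fraction as Fr

# Hypothetical finite measure spaces: weights of the atoms of X and Y.
mu = {"a": Fr(1, 2), "b": Fr(3, 2)}
nu = {0: Fr(1, 3), 1: Fr(2, 3), 2: Fr(1)}

def F(x, y):
    # An arbitrary nonnegative function on X x Y.
    return Fr(y * y + (1 if x == "a" else 2))

# Integral with respect to the product measure lambda({(x, y)}) = mu(x) nu(y).
double = sum(F(x, y) * mu[x] * nu[y] for x in mu for y in nu)
# Iterated integrals, in both orders.
x_outer = sum(mu[x] * sum(F(x, y) * nu[y] for y in nu) for x in mu)
y_outer = sum(nu[y] * sum(F(x, y) * mu[x] for x in mu) for y in nu)

print(double, x_outer, y_outer)
```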
Remark 6.5 (The definition of integral revisited). We noticed in Remark 2.13 that the integral of nonnegative functions can also be defined without using the archimedean integral, by considering minorant simple functions. If we follow this approach, the identity that we used to define the integral can be derived by applying the Fubini–Tonelli theorem to the subgraph
$$E:=\{(x,t)\in X\times\mathbb R:\ 0<t<f(x)\},$$
with the product measure $\mu\times\lambda$, $\lambda$ being the Lebesgue measure. Indeed, it is not difficult to show that $E$ is $\mathscr F\times\mathscr B(\mathbb R)$–measurable whenever $f$ is $\mathscr F$–measurable, so that
$$\int_0^\infty\mu(\{f>t\})\,dt=\int_0^\infty\mu(E^t)\,dt=(\mu\times\lambda)(E)=\int_X\lambda(E_x)\,d\mu(x)=\int_X f(x)\,d\mu(x).$$
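For a function with finitely many values the repartition function $t\mapsto\mu(\{f>t\})$ is a step function, so both sides of the identity above can be computed exactly. A minimal sketch, with a hypothetical three-point measure space:

```python
from fractions import Fraction as Fr

mu = {"p": Fr(1, 4), "q": Fr(1, 2), "r": Fr(3)}   # hypothetical atoms and weights
f = {"p": Fr(2), "q": Fr(5), "r": Fr(1, 2)}       # a nonnegative function

def repartition(t):
    # t -> mu({f > t})
    return sum(w for x, w in mu.items() if f[x] > t)

def layer_cake():
    # The repartition function is constant on [a, b) when a < b are
    # consecutive points of {0} together with the values of f.
    cuts = [Fr(0)] + sorted(set(f.values()))
    return sum((b - a) * repartition(a) for a, b in zip(cuts, cuts[1:]))

direct = sum(f[x] * mu[x] for x in mu)
print(direct, layer_cake())
```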
Of course, splitting $F$ in positive and negative parts, also the case of extended real valued maps can be considered:
Corollary 6.6. Let $F:X\times Y\to[-\infty,+\infty]$ be a $\mathscr F\times\mathscr G$–measurable map. Then $F$ is $\mu\times\nu$–integrable if and only if:
(i) for $\mu$–a.e. $x\in X$ the function $y\mapsto F(x,y)$ is $\nu$–integrable;
(ii) the function $x\mapsto\int_Y|F(x,y)|\,d\nu(y)$ is $\mu$–integrable.
If these conditions hold, we have
$$\int_{X\times Y}F(x,y)\,d(\mu\times\nu)(x,y)=\int_X\Bigl[\int_Y F(x,y)\,d\nu(y)\Bigr]d\mu(x). \tag{6.4}$$
Notice that, strictly speaking, the function in (ii) is defined only out of a $\mu$–negligible set; by $\mu$–integrability of it we mean $\mu$–integrability of any $\mathscr F$–measurable extension of it (for instance we may set it equal to 0 wherever $\int_Y|F(x,y)|\,d\nu(y)$ is not finite).
Remark 6.7 (Finite products). The previous constructions extend without any difficulty to finite products of measurable spaces $(X_i,\mathscr F_i)$. Namely, the product $\sigma$–algebra $\mathscr F:=\mathop{\times}_{i=1}^n\mathscr F_i$ in the cartesian product $X:=\prod_{i=1}^n X_i$ is generated by the rectangles
$$\{A_1\times\cdots\times A_n:\ A_i\in\mathscr F_i,\ 1\le i\le n\}.$$
Furthermore, if $\mu_i$ are $\sigma$–finite measures in $(X_i,\mathscr F_i)$, integrals with respect to the product measure $\mu=\mathop{\times}_{i=1}^n\mu_i$ are defined by
$$\int_X F(x)\,d\mu(x)=\int_{X_1}\int_{X_2}\cdots\int_{X_n}F(x_1,\dots,x_n)\,d\mu_n(x_n)\cdots d\mu_2(x_2)\,d\mu_1(x_1),$$
and any permutation in the order of the integrals would produce the same result. Finally, the product measure $\mu$ is uniquely determined, in the $\sigma$–finite case, by the product rule
$$\mu(A_1\times\cdots\times A_n)=\prod_{i=1}^n\mu_i(A_i)\qquad A_i\in\mathscr F_i,\ 1\le i\le n.$$
It is also not hard to show that the product is associative, both at the level of $\sigma$–algebras and measures, see Exercise 6.1.
6.2. The Lebesgue measure on $\mathbb R^n$
This section is devoted to the construction, the characterization and the main properties of the Lebesgue measure in $\mathbb R^n$, i.e. the length measure in $\mathbb R^1$, the area measure in $\mathbb R^2$, the volume measure in $\mathbb R^3$ and so on.
Definition 6.8 (Lebesgue measure in $\mathbb R^n$). Let us consider the measure space $(\mathbb R,\mathscr B(\mathbb R),\mathscr L^1)$, where $\mathscr L^1$ is the Lebesgue measure on $(\mathbb R,\mathscr B(\mathbb R))$. Then, we can define the measure space $\bigl(\mathbb R^n,\mathop{\times}_{i=1}^n\mathscr B(\mathbb R),\mathscr L^n\bigr)$ with $\mathscr L^n:=\mathop{\times}_{i=1}^n\mathscr L^1$. We say that $\mathscr L^n$ is the Lebesgue measure on $\mathbb R^n$.
Since (see Exercise 6.2)
$$\mathscr B(\mathbb R^n)=\mathop{\times}_{i=1}^n\mathscr B(\mathbb R),$$
we can equivalently consider $\mathscr L^n$ as a measure in $(\mathbb R^n,\mathscr B(\mathbb R^n))$, forgetting its construction as a product measure (indeed, there exist alternative and direct constructions of $\mathscr L^n$ independent of the concept of product measure).
As in the one-dimensional case, we will keep using the classical notation
$$\int_E f(x)\,dx:=\int_{\mathbb R^n}f\,1_E\,d\mathscr L^n\qquad E\in\mathscr B(\mathbb R^n),\ f:\mathbb R^n\to\mathbb R\ \text{Borel}$$
for integrals with respect to Lebesgue measure $\mathscr L^n$ (or Riemann integrals in more than one independent variable).
In the computation of Lebesgue integrals, a particular role is sometimes played by the dimensional constant $\omega_n=\mathscr L^n(B(0,1))$ (so that $\omega_1=2$, $\omega_2=\pi$, $\omega_3=4\pi/3$, ...). A general formula for the computation of $\omega_n$ can be given using Euler's $\Gamma$ function:
$$\Gamma(z):=\int_0^\infty t^{z-1}e^{-t}\,dt\qquad z>0.$$
Indeed, we have
$$\omega_n=\frac{\pi^{n/2}}{\Gamma\bigl(\frac n2+1\bigr)}. \tag{6.5}$$
A proof of this formula, based on the identity $\Gamma(z+1)=z\Gamma(z)$ (which gives also $\Gamma(n)=(n-1)!$ for $n\ge 1$ integer) is proposed in Exercise 6.7.
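Formula (6.5) is easy to evaluate numerically. The sketch below uses Python's standard `math.gamma`; the function name `omega` is an assumption of the illustration. It recovers the three values quoted above and prints the first few constants.

```python
import math

def omega(n):
    # Formula (6.5): Lebesgue measure of the unit ball of R^n.
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)

values = [omega(n) for n in range(1, 6)]
print(values)
```

Combining (6.5) with $\Gamma(z+1)=z\Gamma(z)$ gives $\omega_n=\frac{2\pi}{n}\,\omega_{n-2}$, which is one possible form of the recursion asked for in Exercise 6.6.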
We are going to show that $\mathscr L^n$ is invariant under translations and rotations. For this we need some notation. For any $a\in\mathbb R^n$ and any $\delta>0$ we set
$$Q(a,\delta):=\bigl\{x\in\mathbb R^n:\ a_i\le x_i<a_i+\delta,\ i=1,\dots,n\bigr\}=\prod_{i=1}^n[a_i,a_i+\delta).$$
$Q(a,\delta)$ is called the $\delta$–box with corner at $a$. For all $N\in\mathbb N$ we consider the family
$$\mathscr Q_N=\{Q(2^{-N}k,2^{-N}):\ k=(k_1,\dots,k_n)\in\mathbb Z^n\}.$$
It is clear that each box in $\mathscr Q_N$ is Borel and that its Lebesgue measure is $2^{-nN}$. Now we set
$$\mathscr Q=\bigcup_{N=0}^\infty\mathscr Q_N.$$
It is clear that all boxes in $\mathscr Q_N$ are mutually disjoint and that their union is $\mathbb R^n$. Furthermore, if $N<M$, $Q\in\mathscr Q_N$ and $Q'\in\mathscr Q_M$, then either $Q'\subset Q$ or $Q\cap Q'=\emptyset$. It follows that if $Q,Q'\in\mathscr Q$ intersect, then one of the two sets is contained in the other one.
Lemma 6.9. Let $U$ be a non empty open set in $\mathbb R^n$. Then $U$ is the disjoint union of boxes in $\mathscr Q$.
Proof. For any $x\in U$, let $Q_x\in\mathscr Q$ be the biggest box such that $x\in Q_x\subset U$. This box is uniquely defined: indeed, fix an $x$; for any $m$ there is only one box $Q_{x,m}\in\mathscr Q_m$ such that $x\in Q_{x,m}$; moreover, since $U$ is open, for $m$ large enough $Q_{x,m}\subset U$; we can then define $Q_x=Q_{x,\bar m}$ where $\bar m$ is the smallest integer $m$ such that $Q_{x,m}\subset U$.
This family $\{Q_x\}_{x\in U}$ is a partition of $U$, that is, for any $x,y\in U$, either $Q_x=Q_y$ or $Q_x\cap Q_y=\emptyset$; indeed, if we suppose that $Q_x\cap Q_y\ne\emptyset$, then one of the two boxes is contained in the other, say $Q_x\subset Q_y$. This leads to $x\in Q_x\subset Q_y\subset U$, contradicting the definition of $Q_x$ unless $Q_x=Q_y$.
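The construction in the proof can be imitated in dimension one. The sketch below lists the maximal dyadic boxes contained in an open interval $U=(a,b)$, truncated at a maximal level since the full decomposition is in general infinite; the function name and the truncation parameter are assumptions of the illustration.

```python
import math
from fractions import Fraction as Fr

def dyadic_boxes(a, b, max_level):
    """Maximal boxes [k 2^-N, (k+1) 2^-N), N <= max_level, inside U = (a, b)."""
    found = []

    def visit(lo, size, level):
        hi = lo + size
        if hi <= a or lo >= b:          # the box does not meet U
            return
        if a < lo and hi <= b:          # the box is contained in the open interval
            found.append((lo, hi))
        elif level < max_level:         # otherwise split it into two half-size boxes
            visit(lo, size / 2, level + 1)
            visit(lo + size / 2, size / 2, level + 1)

    k = math.floor(a)
    while k < b:                        # boxes of level 0 that may meet U
        visit(Fr(k), Fr(1), 0)
        k += 1
    return sorted(found)

boxes = dyadic_boxes(Fr(0), Fr(1), 6)
total = sum(hi - lo for lo, hi in boxes)
print(boxes, total)
```

For $U=(0,1)$ the boxes found are $[2^{-N},2^{-N+1})$, $1\le N\le 6$: the box $[0,1)$ itself is not contained in $U$, and the total length tends to $\mathscr L^1(U)=1$ as the maximal level grows.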
From Lemma 6.9 it follows easily that the $\sigma$–algebra generated by $\mathscr Q$ coincides with $\mathscr B(\mathbb R^n)$.
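Proposition 6.10 (Properties of the Lebesgue measure). The following statements hold.
(i) (translation invariance) For any $E\in\mathscr B(\mathbb R^n)$, $x\in\mathbb R^n$ we have $\mathscr L^n(E+x)=\mathscr L^n(E)$, where
$$E+x=\{y+x:\ y\in E\}.$$
(ii) If $\lambda$ is a translation invariant measure on $(\mathbb R^n,\mathscr B(\mathbb R^n))$ such that $\lambda(K)<\infty$ for any compact set $K$, there exists a number $C_\lambda\ge 0$ such that
$$\lambda(E)=C_\lambda\,\mathscr L^n(E)\qquad\forall E\in\mathscr B(\mathbb R^n).$$
(iii) (rotation invariance) For any orthogonal matrix $R\in L(\mathbb R^n;\mathbb R^n)$ we have
$$\mathscr L^n(R(E))=\mathscr L^n(E)\qquad\forall E\in\mathscr B(\mathbb R^n).$$
(iv) For any $T\in L(\mathbb R^n;\mathbb R^n)$ we have
$$\mathscr L^n(T(E))=|\det T|\,\mathscr L^n(E)\qquad\forall E\in\mathscr B(\mathbb R^n).$$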
Proof. Fix $x\in\mathbb R^n$. The measures $E\mapsto\mathscr L^n(E)$ and $E\mapsto\mathscr L^n(E+x)$ coincide on the $\pi$–system of boxes; thanks to Lemma 6.9, this $\pi$–system generates the Borel $\sigma$–algebra, so that the coincidence criterion for measures stated in Proposition 1.15 gives that $\mathscr L^n(E)=\mathscr L^n(E+x)$ for all Borel sets $E$.
Let us prove (ii). Let $Q_0\in\mathscr Q_0$ and set $C_\lambda=\lambda(Q_0)$. Since $Q_0$ is included in a compact set, we have $C_\lambda<\infty$. Since $\lambda$ is translation invariant, all boxes in $\mathscr Q_0$ have the same $\lambda$ measure. Now, let $Q_N\in\mathscr Q_N$. Since $Q_0$ is the disjoint union of $2^{nN}$ boxes in $\mathscr Q_N$ which have all the same $\lambda$ measure (again by the translation invariance) we have that
$$\lambda(Q_N)=2^{-nN}C_\lambda=C_\lambda\,\mathscr L^n(Q_N).$$
So, Lemma 6.9 gives that $\lambda(A)=C_\lambda\,\mathscr L^n(A)$ for any open set $A$, and therefore for any Borel set.
Let us now prove (iii). By the translation invariance of $\mathscr L^n$, the measure $\lambda(E)=\mathscr L^n(R(E))$ is easily seen to be translation invariant (because $R(E+z)=R(E)+R(z)$), hence $\mathscr L^n(R(E))=C\,\mathscr L^n(E)$ for some constant $C$. We can identify the constant $C$ choosing $E$ equal to the unit ball, finding $C=1$.
Finally, let us prove (iv). By polar decomposition we can write $T=R\circ S$ with $S=\sqrt{T^*T}$ symmetric and nonnegative definite, and $R$ orthogonal. Notice that on one hand $|\det T|=\det S$ (because $\det R\in\{-1,1\}$) and on the other hand, by (iii) we have
$$\mathscr L^n(T(E))=\mathscr L^n(R(S(E)))=\mathscr L^n(S(E)).$$
Hence, it suffices to show that $\mathscr L^n(S(E))=\det S\,\mathscr L^n(E)$ for any symmetric and nonnegative definite matrix $S$. By the translation invariance of $E\mapsto\mathscr L^n(S(E))$ there exists a constant $C$ such that $\mathscr L^n(S(E))=C\,\mathscr L^n(E)$ for any Borel set $E$. In this case we can identify the constant $C$ choosing as $E$ a suitable $n$-dimensional cube: denoting by $(e_i)$ an orthonormal basis of eigenvectors of $S$, with eigenvalues $\alpha_i\ge 0$ (whose product is $\det S$), choosing
$$E=\Bigl\{\sum_{i=1}^n c_ie_i:\ |c_i|\le\tfrac12\Bigr\},\qquad\text{so that}\qquad S(E)=\Bigl\{\sum_{i=1}^n\alpha_ic_ie_i:\ |c_i|\le\tfrac12\Bigr\},$$
the rotation invariance of $\mathscr L^n$ gives $\mathscr L^n(E)=1$ and $\mathscr L^n(S(E))=\alpha_1\cdots\alpha_n$. Therefore $C=\det S$ and the proof is complete.
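Statement (iv) can be checked by hand in the plane, where the image of the unit square under a linear map is a parallelogram whose area is given by the shoelace formula. The matrix `T` and the helper names below are hypothetical choices for the illustration.

```python
def apply(T, p):
    # Image of the point p under the 2x2 matrix T (given by rows).
    (a, b), (c, d) = T
    return (a * p[0] + b * p[1], c * p[0] + d * p[1])

def shoelace_area(poly):
    # Area of a simple polygon given by its vertices in order.
    n = len(poly)
    s = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
            for i in range(n))
    return abs(s) / 2

square = [(0, 0), (1, 0), (1, 1), (0, 1)]      # unit square, area 1
T = ((2, 1), (-1, 3))                           # hypothetical matrix
image = [apply(T, p) for p in square]
det_T = T[0][0] * T[1][1] - T[0][1] * T[1][0]

print(shoelace_area(image), abs(det_T))
```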
6.3. Countable products
We are here concerned with a sequence $(X_i,\mathscr F_i,\mu_i)$, $i=1,2,\dots$, of probability spaces. We denote by $X$ the product space
$$X:=\prod_{k=1}^\infty X_k$$
and by $x=(x_k)$ the generic element of $X$.
We are going to define a $\sigma$–algebra of subsets of $X$. Let us first introduce the cylindrical sets in $X$. A cylindrical set $I_{n,A}$ is a set of the following form
$$I_{n,A}=\{x:\ (x_1,\dots,x_n)\in A\},$$
where $n\ge 1$ is an integer and $A\in\mathop{\times}_{k=1}^n\mathscr F_k$. This representation is not unique; however, since
$$I_{n,A}=A\times\prod_{k=n+1}^\infty X_k$$
we have that $I_{n,A}=I_{m,B}$ with $n<m$ implies $B=A\times X_{n+1}\times\cdots\times X_m$.
We denote by $\mathscr C$ the family of all cylindrical sets of $X$. Notice also that
$$I^c_{n,A}=I_{n,A^c},$$
so that $\mathscr C$ is stable under complement. If $I_{n,A}$ and $I_{m,B}$ belong to $\mathscr C$ we can assume by the previous remarks that $m=n$, so that $I_{n,A}\cap I_{n,B}=I_{n,A\cap B}$ belongs to $\mathscr C$. Therefore $\mathscr C$ is an algebra.
The $\sigma$–algebra generated by $\mathscr C$ is called the product $\sigma$–algebra of the $\sigma$–algebras $\mathscr F_i$. It is denoted by $\mathop{\times}_{k=1}^\infty\mathscr F_k$.
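Now we define a function $\mu$ on $\mathscr C$, setting
$$\mu(I_{n,A})=\Bigl(\mathop{\times}_{k=1}^n\mu_k\Bigr)(A),\qquad I_{n,A}\in\mathscr C. \tag{6.6}$$
This definition is well posed, again thanks to the fact that $I_{n,A}=I_{m,B}$ with $n<m$ only when $B=A\times X_{n+1}\times\cdots\times X_m$, and the $\mu_k$ are probability measures. It is easy to check that $\mu$ is additive: indeed, if $I_{n,A}$ and $I_{m,B}$ are disjoint, using the previous remark we can assume with no loss of generality that $n=m$, and therefore the equality $\mu(I_{n,A}\cup I_{n,B})=\mu(I_{n,A})+\mu(I_{n,B})$ follows by
$$\Bigl(\mathop{\times}_{k=1}^n\mu_k\Bigr)(A\cup B)=\Bigl(\mathop{\times}_{k=1}^n\mu_k\Bigr)(A)+\Bigl(\mathop{\times}_{k=1}^n\mu_k\Bigr)(B).$$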
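The well-posedness of (6.6) can be seen concretely when all factors are finite. In the sketch below every $X_k$ is taken to be $\{0,1\}$ with the same weights (an assumption made only for the illustration), and the same cylindrical set is represented once with $n=2$ and once with $m=4$.

```python
from fractions import Fraction as Fr
from itertools import product

# Hypothetical factors: every X_k = {0, 1} with mu_k({1}) = 1/3.
p = {0: Fr(2, 3), 1: Fr(1, 3)}

def cylinder_measure(n, A):
    # mu(I_{n,A}) = (mu_1 x ... x mu_n)(A) for a set A of n-tuples, as in (6.6).
    total = Fr(0)
    for point in A:
        assert len(point) == n
        w = Fr(1)
        for coord in point:
            w *= p[coord]
        total += w
    return total

A = {(0, 1), (1, 1)}                                          # base in X_1 x X_2
B = {a + tail for a in A for tail in product(p, repeat=2)}    # A x X_3 x X_4

print(cylinder_measure(2, A), cylinder_measure(4, B))
```

Both representations give the same value because the extra factors $X_3$ and $X_4$ have total mass 1.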
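Theorem 6.11. The set function $\mu$ defined in (6.6) is $\sigma$–additive on $\mathscr C$ and therefore, by the Carathéodory theorem, it has a unique extension to a probability measure on $\bigl(X,\mathop{\times}_{k=1}^\infty\mathscr F_k\bigr)$ that is denoted by $\mathop{\times}_{k=1}^\infty\mu_k$.
Proof. To prove the $\sigma$–additivity of $\mu$ it is enough to show the continuity of $\mu$ at $\emptyset$, or equivalently the implication
$$(E_j)\subset\mathscr C,\quad(E_j)\text{ nonincreasing},\quad\mu(E_j)\ge\varepsilon_0>0\quad\Longrightarrow\quad\bigcap_{j=1}^\infty E_j\ne\emptyset. \tag{6.7}$$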
In the following we are given a nonincreasing sequence $(E_j)$ in $\mathscr C$ such that $\mu(E_j)\ge\varepsilon_0>0$. To prove (6.7), we need some more notation. We set
$$X^{(n)}=\prod_{k=n+1}^\infty X_k$$
and we define $\mu^{(n)}$ on cylindrical sets of $X^{(n)}$ as in (6.6). Then, we consider the sections of $E_j$ defined as
$$E_j(x_1)=\bigl\{x^{(1)}\in X^{(1)}:\ (x_1,x^{(1)})\in E_j\bigr\},\qquad x_1\in X_1.$$
$E_j(x_1)$ is a cylindrical subset of $X^{(1)}$ and by the Fubini theorem we have
$$\mu(E_j)=\int_{X_1}\mu^{(1)}(E_j(x_1))\,d\mu_1(x_1)\ge\varepsilon_0>0,\qquad j\ge 1. \tag{6.8}$$
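Set now
$$F_{j,1}=\Bigl\{x_1\in X_1:\ \mu^{(1)}(E_j(x_1))\ge\frac{\varepsilon_0}{2}\Bigr\},\qquad j\ge 1.$$
Then $F_{j,1}$ is not empty and by (6.8) we have
$$\varepsilon_0\le\mu(E_j)=\int_{F_{j,1}}\mu^{(1)}(E_j(x_1))\,d\mu_1(x_1)+\int_{F^c_{j,1}}\mu^{(1)}(E_j(x_1))\,d\mu_1(x_1)\le\mu_1(F_{j,1})+\frac{\varepsilon_0}{2}.$$
Therefore $\mu_1(F_{j,1})\ge\varepsilon_0/2$ for all $j\ge 1$.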
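Obviously $(F_{j,1})$ is a nonincreasing sequence of subsets of $X_1$. Since $\mu_1$ is $\sigma$–additive, it is continuous along nonincreasing sequences, so that $\mu_1\bigl(\bigcap_{j=1}^\infty F_{j,1}\bigr)\ge\varepsilon_0/2>0$. Therefore, there exists $\alpha_1\in\bigcap_{j=1}^\infty F_{j,1}$ and so
$$\mu^{(1)}(E_j(\alpha_1))\ge\frac{\varepsilon_0}{2},\qquad j\ge 1. \tag{6.9}$$
Consequently we have
$$E_j(\alpha_1)\ne\emptyset,\qquad j\ge 1. \tag{6.10}$$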
Now we iterate the procedure: for any $x_2\in X_2$ we consider the section
$$E_j(\alpha_1,x_2)=\bigl\{x^{(2)}\in X^{(2)}:\ (\alpha_1,x_2,x^{(2)})\in E_j\bigr\},\qquad j\ge 1.$$
By the Fubini theorem we have
$$\mu^{(1)}(E_j(\alpha_1))=\int_{X_2}\mu^{(2)}(E_j(\alpha_1,x_2))\,d\mu_2(x_2). \tag{6.11}$$
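We set
$$F_{j,2}=\Bigl\{x_2\in X_2:\ \mu^{(2)}(E_j(\alpha_1,x_2))\ge\frac{\varepsilon_0}{4}\Bigr\},\qquad j\ge 1.$$
Then by (6.9) and (6.11) we have
$$\frac{\varepsilon_0}{2}\le\mu^{(1)}(E_j(\alpha_1))=\int_{X_2}\mu^{(2)}(E_j(\alpha_1,x_2))\,d\mu_2(x_2)$$
$$=\int_{F_{j,2}}\mu^{(2)}(E_j(\alpha_1,x_2))\,d\mu_2(x_2)+\int_{[F_{j,2}]^c}\mu^{(2)}(E_j(\alpha_1,x_2))\,d\mu_2(x_2)\le\mu_2(F_{j,2})+\frac{\varepsilon_0}{4}.$$
Therefore $\mu_2(F_{j,2})\ge\varepsilon_0/4$. Since $(F_{j,2})$ is nonincreasing and $\mu_2$ is $\sigma$–additive, there exists $\alpha_2\in X_2$ such that
$$\mu^{(2)}(E_j(\alpha_1,\alpha_2))\ge\frac{\varepsilon_0}{4},\qquad j\ge 1,$$
and consequently we have
$$E_j(\alpha_1,\alpha_2)\ne\emptyset. \tag{6.12}$$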
Arguing in a similar way we see that there exists a sequence $(\alpha_k)\in X$ such that
$$E_j(\alpha_1,\dots,\alpha_n)\ne\emptyset\qquad\text{for all }j,n\ge 1, \tag{6.13}$$
where
$$E_j(\alpha_1,\dots,\alpha_n)=\bigl\{x^{(n)}\in X^{(n)}:\ (\alpha_1,\dots,\alpha_n,x^{(n)})\in E_j\bigr\},\qquad j,n\ge 1.$$
Since the $E_j$ are cylindrical, this easily implies that $(\alpha_n)\in\bigcap_{j=1}^\infty E_j$. Therefore $\bigcap_{j=1}^\infty E_j$ is not empty, as required.
Exercises
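6.1 Let $(X_1,\mathscr F_1)$, $(X_2,\mathscr F_2)$, $(X_3,\mathscr F_3)$ be measurable spaces. Show that
$$(\mathscr F_1\times\mathscr F_2)\times\mathscr F_3=\mathscr F_1\times(\mathscr F_2\times\mathscr F_3).$$
If we are given $\sigma$–finite measures $\mu_i$ in $\mathscr F_i$, $i=1,2,3$, show also that $(\mu_1\times\mu_2)\times\mu_3=\mu_1\times(\mu_2\times\mu_3)$.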
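6.2 Let us consider the measurable spaces $(\mathbb R,\mathscr B(\mathbb R))$, $(\mathbb R^n,\mathscr B(\mathbb R^n))$. Show that
$$\mathscr B(\mathbb R^n)=\mathop{\times}_{i=1}^n\mathscr B(\mathbb R).$$
Hint: to show the inclusion $\subset$, use Lemma 6.9.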
6.3 Let $\mathscr L_n$ be the $\sigma$–algebra of Lebesgue measurable sets in $\mathbb R^n$. Show that
$$\mathscr L_1\times\mathscr L_1\subsetneq\mathscr L_2.$$
Hint: to show the strict inclusion, consider the set $E=F\times\{0\}$, where $F\subset\mathbb R$ is not Lebesgue measurable.
6.4 Show that the product $\sigma$–algebra is also generated by the family of products $\prod_{i=1}^\infty A_i$ where $A_i\in\mathscr F_i$ and $A_i\ne X_i$ only for finitely many $i$.
6.5 Writing properly $\mathscr L^3$ as a product measure, compute $\mathscr L^3(T)$, where
$$T=\bigl\{(x,y,z):\ x^2+y^2<r^2\text{ and }y^2+z^2<r^2\bigr\}.$$
6.6 [Computation of $\omega_n$] Find a recursive formula linking $\omega_n$ to $\omega_{n-2}$, and use it to show that $\omega_{2k}=\pi^k/k!$ and $\omega_{2k+1}=2^{k+1}\pi^k/(2k+1)!!$, where $(2k+1)!!$ is the product of all odd integers between 1 and $2k+1$. Hint: use the Fubini–Tonelli theorem.
6.7 Use Exercise 6.6 and the identities $\Gamma(1)=1$, $\Gamma(1/2)=\sqrt\pi$ and $\Gamma(z+1)=z\Gamma(z)$ to show (6.5).
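6.8 Let $\mu$ and $\nu$ be $\sigma$–finite measures on $(X,\mathscr F)$ and $(Y,\mathscr G)$ respectively and let $\lambda=\mu\times\nu$. Let $\mathscr E=(\mathscr F\times\mathscr G)_\lambda$, as defined in Definition 1.12, and let $\bar\lambda$ be the extension of $\lambda$ to $\mathscr E$. Show this version of the Fubini–Tonelli Theorem 6.4: for any $\mathscr E$–measurable function $F:X\times Y\to[0,+\infty]$ the following statements hold:
(i) for $\mu$–a.e. $x\in X$ the function $y\mapsto F(x,y)$ is $\nu$–measurable;
(ii) the function $x\mapsto\int_Y F(x,y)\,d\nu(y)$, set to zero at all points $x$ such that $y\mapsto F(x,y)$ is not $\nu$–integrable, is $\mu$–measurable;
(iii) $\displaystyle\int_X\int_Y F(x,y)\,d\nu(y)\,d\mu(x)=\int_{X\times Y}F(x,y)\,d\bar\lambda(x,y)$.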
6.9 Using the notation of the Fubini–Tonelli theorem, let $X=Y=[0,1]$, $\mathscr F=\mathscr G=\mathscr B([0,1])$, let $\mu$ be the Lebesgue measure and let $\nu$ be the counting measure. Let $D=\{(x,x):\ x\in[0,1]\}$ be the diagonal in $X\times Y$; check that
$$\int_X\nu(D_x)\,d\mu(x)\ne\int_Y\mu(D^y)\,d\nu(y).$$
6.10 Let $(f_h)$ be converging to $f$ in $L^1(X\times Y,\mu\times\nu)$. Show the existence of a subsequence $h(k)$ such that $f_{h(k)}(x,\cdot)$ converge to $f(x,\cdot)$ in $L^1(Y,\nu)$ for $\mu$–a.e. $x\in X$. Show by an example that, in general, this property is not true for the whole sequence.
6.4. Comparison of measures
In this section we study some relations between measures in a measurable space $(X,\mathscr F)$.
The first (immediate) one is the order relation: viewing measures as set functions, we say that $\mu\le\nu$ if $\mu(B)\le\nu(B)$ for all $B\in\mathscr F$. It is not hard to see that the space of measures endowed with this order relation is a complete lattice (see Exercise 6.13): in particular
$$\mu\vee\nu(B)=\sup\{\mu(A_1)+\nu(A_2):\ A_1,A_2\in\mathscr F,\ (A_1,A_2)\text{ partition of }B\}$$
and
$$\mu\wedge\nu(B)=\inf\{\mu(A_1)+\nu(A_2):\ A_1,A_2\in\mathscr F,\ (A_1,A_2)\text{ partition of }B\}.$$
Another relation between measures is linked to the concept of product of
a function by a measure.
Definition 6.12. Let $\mu$ be a measure in $(X,\mathscr F)$ and let $f\in L^1(X,\mathscr F,\mu)$ be nonnegative. We set
$$f\mu(B):=\int_B f\,d\mu\qquad\forall B\in\mathscr F. \tag{6.14}$$
It is immediate to check, using the additivity and the continuity properties of the integral, that $f\mu$ is a finite measure. Furthermore, the following simple rule provides a way for the computation of integrals with respect to $f\mu$:
$$\int_X h\,d(f\mu)=\int_X hf\,d\mu, \tag{6.15}$$
whenever $h$ is $\mathscr F$–measurable and nonnegative (or $hf$ is $\mu$–integrable, see Exercise 6.11). It suffices to check the identity (6.15) on characteristic functions $h=1_B$ (and in this case it reduces to (6.14)), and then for simple functions. The monotone convergence theorem then gives the general result.
Notice also that, by definition, $f\mu(B)=0$ whenever $\mu(B)=0$. We formalize this relation between measures in the next definition.
Definition 6.13 (Absolute continuity). Let $\mu$, $\nu$ be measures in $\mathscr F$. We say that $\nu$ is absolutely continuous with respect to $\mu$, and write $\nu\ll\mu$, if all $\mu$–negligible sets are $\nu$–negligible, i.e.
$$\mu(A)=0\ \Longrightarrow\ \nu(A)=0.$$
For finite measures, the absolute continuity property can also be given in a (seemingly) stronger way, see Exercise 6.14.
The following theorem shows that absolute continuity of $\nu$ with respect to $\mu$ is not only necessary, but also sufficient to ensure the representation $\nu=f\mu$.
Theorem 6.14 (Radon–Nikodým). Let $\mu$ and $\nu$ be finite measures on $(X,\mathscr F)$ such that $\nu\ll\mu$. Then there exists a unique nonnegative $\rho\in L^1(X,\mathscr F,\mu)$ such that
$$\nu(E)=\int_E\rho(x)\,d\mu(x)\qquad\forall E\in\mathscr F. \tag{6.16}$$
We are going to show a more general result, whose statement needs two more definitions. We say that a measure $\mu$ is concentrated on a $\mathscr F$–measurable set $A$ if $\mu(X\setminus A)=0$. For instance, the Dirac measure $\delta_a$ is concentrated on $\{a\}$, and the Lebesgue measure in $\mathbb R$ is concentrated on the irrational numbers, and $f\mu$ is concentrated (whatever $\mu$ is) on $\{f\ne 0\}$.
Definition 6.15 (Singular measures). Let $\mu$, $\nu$ be measures in $(X,\mathscr F)$. We say that $\nu$ is singular with respect to $\mu$, and write $\nu\perp\mu$, if there exist disjoint $\mathscr F$–measurable sets $A$, $B$ such that $\mu$ is concentrated on $A$ and $\nu$ is concentrated on $B$.
The relation of singularity, as stated, is clearly symmetric. However, it can also be stated in a (seemingly) asymmetric way, by saying that $\nu\perp\mu$ if $\nu$ is concentrated on a $\mu$–negligible set $A$ (just take $B=A^c$ to see the equivalence with the previous definition).
Example 6.16. Let $X=\mathbb R$, $\mathscr F=\mathscr B(\mathbb R)$, $\mu$ the Lebesgue measure on $(X,\mathscr F)$ and $\nu=\delta_{x_0}$ the Dirac measure at $x_0\in\mathbb R$. Then $\mu$ is concentrated on $A:=\mathbb R\setminus\{x_0\}$, whereas $\nu$ is concentrated on $B:=\{x_0\}$. So, $\mu$ and $\nu$ are singular.
Theorem 6.17 (Lebesgue). Let $\mu$ and $\nu$ be measures on $(X,\mathscr F)$, with $\mu$ $\sigma$–finite and $\nu$ finite. Then the following assertions hold.
(i) There exist two unique finite measures $\nu_a$ and $\nu_s$ on $(X,\mathscr F)$ such that
$$\nu=\nu_a+\nu_s,\qquad\nu_a\ll\mu,\qquad\nu_s\perp\mu. \tag{6.17}$$
(ii) There exists a unique $\rho\in L^1(X,\mathscr F,\mu)$ such that $\nu_a=\rho\mu$.
(6.17) is called the Lebesgue decomposition of $\nu$ with respect to $\mu$. The function $\rho$ in (ii) is called the density of $\nu$ with respect to $\mu$ and it is sometimes denoted by
$$\rho:=\frac{d\nu}{d\mu}.$$
The Radon–Nikodým theorem simply follows by the Lebesgue theorem noticing that, in the case when $\nu\ll\mu$, the uniqueness of the decomposition gives $\nu_a=\nu$ and $\nu_s=0$, so that $\nu=\nu_a=\rho\mu$.
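Proof of Theorem 6.17. We assume first that also $\mu$ is finite. Set $\lambda=\mu+\nu$ and notice that, obviously, $\mu\le\lambda$ and $\nu\le\lambda$. Define a linear functional $F$ on $L^2(X,\mathscr F,\lambda)$ setting
$$F(\varphi):=\int_X\varphi(x)\,d\nu(x),\qquad\varphi\in L^2(X,\mathscr F,\lambda).$$
The functional $F$ is well defined and bounded (and consequently continuous) since, in view of the Hölder inequality, we have
$$|F(\varphi)|\le\int_X|\varphi(x)|\,d\nu(x)\le\int_X|\varphi(x)|\,d\lambda(x)\le[\lambda(X)]^{1/2}\,\|\varphi\|_{L^2(X,\mathscr F,\lambda)}.$$
Now, thanks to the Riesz theorem, there exists a unique function $f\in L^2(X,\mathscr F,\lambda)$ such that
$$\int_X\varphi(x)\,d\nu(x)=\int_X f(x)\varphi(x)\,d\lambda(x)\qquad\forall\varphi\in L^2(X,\mathscr F,\lambda). \tag{6.18}$$
Setting $\varphi=1_E$, with $E\in\mathscr F$, yields
$$\nu(E)=\int_E f(x)\,d\lambda(x)\ge 0,$$
which implies, by the arbitrariness of $E$, $f(x)\ge 0$ $\lambda$–a.e. and, in particular, both $\mu$–a.e. and $\nu$–a.e. In the sequel we shall assume, possibly modifying $f$ in a $\lambda$–negligible set, and preserving the validity of (6.18), that $f\ge 0$ everywhere. Since $\lambda=\mu+\nu$, by (6.18) it follows
$$\int_X\varphi(x)(1-f(x))\,d\nu(x)=\int_X f(x)\varphi(x)\,d\mu(x)\qquad\forall\varphi\in L^2(X,\mathscr F,\lambda). \tag{6.19}$$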
Setting $\varphi=1_E$, with $E\in\mathscr F$, yields
$$\int_E(1-f(x))\,d\nu(x)=\int_E f(x)\,d\mu(x)\ge 0$$
because $f\ge 0$. Thus, being $E$ arbitrary, we obtain that $f(x)\le 1$ for $\nu$–a.e. $x\in X$. Set now
$$A:=\{x\in X:\ 0\le f(x)<1\},\qquad B:=\{x\in X:\ f(x)\ge 1\},$$
so that $(A,B)$ is a $\mathscr F$–measurable partition of $X$, and
$$\nu_a(E):=\nu(E\cap A),\qquad\nu_s(E):=\nu(E\cap B)\qquad\forall E\in\mathscr F,$$
so that $\nu_a=1_A\nu$ is concentrated on $A$, $\nu_s=1_B\nu$ is concentrated on $B$ and $\nu=\nu_a+\nu_s$.
Then, setting in (6.19) $\varphi=1_B$, we see that
$$\mu(B)\le\int_B f\,d\mu=\int_B(1-f)\,d\nu=0$$
because $f=1$ $\nu$–a.e. on $B$. It follows that $\nu_s$ is singular with respect to $\mu$.
We show now the existence of $\rho$ such that $\nu_a=\rho\mu$. Heuristically, this can be obtained choosing in (6.19) the function $\varphi=(1-f)^{-1}1_{E\cap A}$, but since this function need not be in $L^2(X,\mathscr F,\lambda)$ we argue by approximation: set in (6.19)
$$\varphi(x)=(1+f(x)+\cdots+f^n(x))\,1_{E\cap A}(x)$$
where $n\ge 1$ and $E\in\mathscr F$. Then we obtain
$$\int_{E\cap A}(1-f^{n+1}(x))\,d\nu(x)=\int_{E\cap A}[f(x)+f^2(x)+\cdots+f^{n+1}(x)]\,d\mu(x).$$
Set $\rho(x)=0$ for $x\in B$ and
$$\rho(x):=\lim_{n\to\infty}[f(x)+f^2(x)+\cdots+f^{n+1}(x)]=\frac{f(x)}{1-f(x)},\qquad x\in A.$$
Then, by the monotone convergence theorem it follows that
$$\nu_a(E)=\nu(E\cap A)=\int_{E\cap A}\rho(x)\,d\mu(x)=\int_E\rho(x)\,d\mu(x).$$
Setting $E=X$ we see that $\rho\in L^1(X,\mathscr F,\mu)$, and the arbitrariness of $E$ gives that $\nu_a=\rho\mu$.
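On a finite set the whole construction is explicit: the Riesz representative $f$ is the ratio of the weights of $\nu$ and $\lambda=\mu+\nu$, and $\rho=f/(1-f)$ on $A$. The sketch below follows the steps of the proof for two hypothetical weighted measures; it is an illustration of the argument, not a general algorithm.

```python
from fractions import Fraction as Fr

X = ["a", "b", "c", "d"]                               # hypothetical finite space
mu = {"a": Fr(1), "b": Fr(2), "c": Fr(0), "d": Fr(3)}
nu = {"a": Fr(1, 2), "b": Fr(0), "c": Fr(5), "d": Fr(6)}

lam = {x: mu[x] + nu[x] for x in X}
# Representative of phi -> integral of phi d(nu) in L^2(lambda), as in (6.18).
f = {x: nu[x] / lam[x] if lam[x] > 0 else Fr(0) for x in X}

A = [x for x in X if f[x] < 1]
B = [x for x in X if f[x] >= 1]
# Density of the absolutely continuous part: rho = f / (1 - f) on A, 0 on B.
rho = {x: f[x] / (1 - f[x]) if x in A else Fr(0) for x in X}

nu_a = {x: nu[x] if x in A else Fr(0) for x in X}
nu_s = {x: nu[x] if x in B else Fr(0) for x in X}

print(A, B, rho)
```

Here $B=\{c\}$ is the only point charged by $\nu$ but not by $\mu$, so $\nu_s$ is the restriction of $\nu$ to $\{c\}$ and $\rho$ is the ratio of the weights elsewhere.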
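Now we consider the case when $\mu$ is $\sigma$–finite. In this case there exists a sequence of pairwise disjoint sets $(X_n)\subset\mathscr F$ such that
$$X=\bigcup_{n=0}^\infty X_n\qquad\text{with }\mu(X_n)<\infty.$$
Let us apply the first part of the proof to the finite measures $\mu_n=1_{X_n}\mu$, $\nu_n=1_{X_n}\nu$. For any $n\in\mathbb N$ let $\nu_n=(\nu_n)_a+(\nu_n)_s=\rho_n\mu_n+(\nu_n)_s$ be the Lebesgue decomposition of $\nu_n$ with respect to $\mu_n$. Now, set
$$\nu_a:=\sum_{n=0}^\infty(\nu_n)_a,\qquad\nu_s:=\sum_{n=0}^\infty(\nu_n)_s,\qquad\rho:=\sum_{n=0}^\infty\rho_n1_{X_n}.$$
Since
$$\sum_{n=0}^k(\nu_n)_a+(\nu_n)_s=\sum_{n=0}^k\nu_n=1_{\cup_0^kX_n}\,\nu,$$
we can pass to the limit as $k\to\infty$ to obtain that $\nu_a$ and $\nu_s$ are finite measures, and $\nu=\nu_a+\nu_s$. Moreover, for any $E\in\mathscr F$ we have, using the monotone convergence theorem,
$$\nu_a(E)=\sum_{n=0}^\infty(\nu_n)_a(E)=\sum_{n=0}^\infty\int_E\rho_n(x)\,d\mu_n(x)=\int_E\sum_{n=0}^\infty\rho_n(x)1_{X_n}(x)\,d\mu(x)=\int_E\rho(x)\,d\mu(x).$$
So, $\nu_a\ll\mu$, and setting $E=X$ we see that $\rho$ is integrable with respect to $\mu$. Finally, it is easy to see that $\nu_s\perp\mu$, because if we denote by $B_n\in\mathscr F$ $\mu_n$–negligible sets where $(\nu_n)_s$ are concentrated (it is not restrictive to assume $B_n\subset X_n$), we have that $\nu_s$ is concentrated on the $\mu$–negligible set $\cup_nB_n$.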
Finally, let us prove the uniqueness of $\nu_a$ and $\nu_s$: assume that
$$\nu=\nu_a+\nu_s=\nu'_a+\nu'_s$$
and let $B$, $B'$ be $\mu$–negligible sets where $\nu_s$ and $\nu'_s$ are respectively concentrated. Then, as $B\cup B'$ is $\mu$–negligible and both $\nu_s$ and $\nu'_s$ are concentrated on $B\cup B'$, for any set $E\in\mathscr F$ we have
$$\nu_s(E)=\nu_s(E\cap(B\cup B'))=\nu(E\cap(B\cup B'))=\nu'_s(E\cap(B\cup B'))=\nu'_s(E).$$
It follows that $\nu_s=\nu'_s$ and therefore $\nu_a=\nu'_a$.
The interested reader can have a look at a different proof of Theorem 6.17 independent of Hilbert space theory, and based on three auxiliary variational principles; it turns out that the density $f$ of $\nu_a$ is the maximizer in the problem
$$\sup\Bigl\{\int_X f\,d\mu:\ f\mu\le\nu\Bigr\}. \tag{6.20}$$
See Exercise 6.17 and Exercise 6.18 for more details.
Remark 6.18. If $\mu$ is not $\sigma$–finite then the Lebesgue decomposition does not hold in general. Consider for instance the case when $X=[0,1]$, $\mathscr F=\mathscr B([0,1])$, $\mu$ is the counting measure and $\nu=\mathscr L^1$. Then $\nu\ll\mu$ (as the only $\mu$–negligible set is the empty set) but there is no $\rho:[0,1]\to[0,\infty]$ satisfying
$$\nu(E)=\int_E\rho\,d\mu.$$
Indeed, this function should be $\mu$–integrable and therefore it can be nonzero only in a set at most countable.
Exercises
6.11 Show that a $\mathscr F$–measurable function $h$ is $f\mu$–integrable if and only if $fh$ is $\mu$–integrable.
6.12 Show that $(f\mu)\vee(g\mu)=(f\vee g)\mu$ and $(f\mu)\wedge(g\mu)=(f\wedge g)\mu$ whenever $f,g\in L^1(X,\mathscr F,\mu)$ are nonnegative.
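6.13 Let $\{\mu_i\}_{i\in I}$ be a family of measures in $(X,\mathscr F)$. Show that
$$\bigwedge_{i\in I}\mu_i(B):=\inf\Bigl\{\sum_{k=0}^\infty\mu_{i(k)}(B_k):\ i:\mathbb N\to I,\ (B_k)\text{ countable }\mathscr F\text{–measurable partition of }B\Bigr\}$$
is the greatest lower bound of the family $\{\mu_i\}_{i\in I}$, i.e. it is a measure not greater than $\mu_i$ for all $i\in I$ and it is the largest measure with this property. Show also that
$$\bigvee_{i\in I}\mu_i(B):=\sup\Bigl\{\sum_{k=0}^\infty\mu_{i(k)}(B_k):\ i:\mathbb N\to I,\ (B_k)\text{ countable }\mathscr F\text{–measurable partition of }B\Bigr\}$$
is the smallest upper bound of the family $\{\mu_i\}_{i\in I}$, i.e. it is a measure not smaller than $\mu_i$ for all $i\in I$ and it is the smallest measure with this property.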
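6.14 Let $\mu$, $\nu$ be measures in $(X,\mathscr F)$ with $\nu$ finite. Then $\nu\ll\mu$ if and only if for all $\varepsilon>0$ there exists $\delta>0$ such that
$$A\in\mathscr F,\ \mu(A)<\delta\ \Longrightarrow\ \nu(A)<\varepsilon.$$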
6.15 Assume that $\nu\ll\mu$ and that $\nu\perp\mu$. Show that $\nu=0$.
6.16 Assume that $\sigma\ll\mu+\nu$ and that $\sigma\perp\nu$. Show that $\sigma\ll\mu$.
6.17 Prove Theorem 6.14 in the following two steps:
(1) Show that a maximizer $f$ in (6.20) exists.
(2) Setting $\sigma=\nu-f\mu\ge 0$, $\sigma$ satisfies
$$t>0,\ B\in\mathscr F,\ t1_B\mu\le\sigma\ \Longrightarrow\ \mu(B)=0. \tag{6.21}$$
Then, apply Exercise 6.18 to conclude that $\sigma\perp\mu$.
6.18 Let $\mu$, $\sigma$ be nonnegative finite measures satisfying (6.21). Show that $\sigma\perp\mu$. Hint: first show that
$$\inf\{\mu(A):\ A\in\mathscr F,\ \sigma\text{ is concentrated on }A\}$$
has a solution $A$. Assuming by contradiction that $\mu(A)>0$ (otherwise we are done), show that
$$\mathscr F\ni B\subset A,\ \mu(B)>0\ \Longrightarrow\ \sigma(B)>0. \tag{6.22}$$
Then, show that the numbers
$$\varepsilon_h:=\sup\bigl\{\mu(B):\ \mathscr F\ni B\subset A,\ 1_B\sigma\le 2^{-h}1_B\mu\bigr\}$$
are infinitesimal as $h\to\infty$, that the supremum is attained at $B_h$, and that
$$\sigma(C)\ge 2^{-h}\mu(C)\qquad\text{for all sets }C\subset A\setminus B_h. \tag{6.23}$$
Finally choose $t=2^{-h}$, with $h$ sufficiently large so that $\varepsilon_h<\mu(A)$, and $B=A\setminus B_h$, to get a contradiction with (6.21).
6.5. Signed measures
Let $(X,\mathscr F)$ be a measurable space. In this section we see how the concept of measure, still viewed as a set function, can be extended dropping the nonnegativity assumption on $A\mapsto\mu(A)$.
We recall that a sequence $(E_i)\subset\mathscr F$ of pairwise disjoint sets such that $\bigcup_{i=0}^\infty E_i=E$ is called a countable $\mathscr F$–measurable partition of $E$.
Definition 6.19 (Signed measures and total variation). A signed measure $\mu$ in $(X,\mathscr F)$ is a map $\mu:\mathscr F\to\mathbb R$ such that
$$\mu(E)=\sum_{i=0}^\infty\mu(E_i)$$
for all countable $\mathscr F$–measurable partitions $(E_i)$ of $E$.
Notice that the series above is absolutely convergent by the arbitrariness of $(E_i)$: indeed, if $\sigma:\mathbb N\to\mathbb N$ is a permutation, then $(E_{\sigma(i)})$ is still a partition of $E$, hence
$$\sum_{i=0}^\infty\mu(E_i)=\sum_{i=0}^\infty\mu(E_{\sigma(i)}).$$
Since the sum does not depend on the order of the terms, the series is absolutely convergent.
Let $\mu$ be a signed measure. Then we define the total variation $|\mu|$ of $\mu$ as follows:
$$|\mu|(E)=\sup\Bigl\{\sum_{i=0}^\infty|\mu(E_i)|:\ (E_i)\ \mathscr F\text{–measurable partition of }E\Bigr\},\qquad E\in\mathscr F.$$
Proposition 6.20. Let $\mu$ be a signed measure and let $|\mu|$ be its total variation. Then $|\mu|$ is a finite measure on $(X,\mathscr F)$.
Proof. It is immediate to check that $|\mu|$ is a nondecreasing set function.
Step 1. If $A,B\in\mathscr F$ are disjoint, we have
$$|\mu|(A\cup B)=|\mu|(A)+|\mu|(B).$$
Indeed, let $E=A\cup B$ and let $(E_j)$ be a countable $\mathscr F$–measurable partition of $E$. Set
$$A_j=A\cap E_j,\qquad B_j=B\cap E_j,\qquad j\in\mathbb N.$$
Then $(A_j)$ is a countable $\mathscr F$–measurable partition of $A$ and $(B_j)$ a countable $\mathscr F$–measurable partition of $B$ and we have $E_j=A_j\cup B_j$. Moreover,
$$\sum_{j=0}^\infty|\mu(E_j)|\le\sum_{j=0}^\infty|\mu(A_j)|+\sum_{j=0}^\infty|\mu(B_j)|\le|\mu|(A)+|\mu|(B),$$
which yields $|\mu|(A\cup B)\le|\mu|(A)+|\mu|(B)$.
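Let us prove the converse inequality, assuming with no loss of generality that $|\mu|(A\cup B)<\infty$. Since both $|\mu|(A)$ and $|\mu|(B)$ are finite, for any $\varepsilon>0$ there exist countable $\mathscr F$–measurable partitions $(A^\varepsilon_k)$ of $A$ and $(B^\varepsilon_k)$ of $B$ such that
$$\sum_{k=0}^\infty|\mu(A^\varepsilon_k)|\ge|\mu|(A)-\frac\varepsilon2,\qquad\sum_{k=0}^\infty|\mu(B^\varepsilon_k)|\ge|\mu|(B)-\frac\varepsilon2.$$
Since $(A^\varepsilon_k,B^\varepsilon_k)$ is a countable $\mathscr F$–measurable partition of $A\cup B$, we have that
$$|\mu|(A\cup B)\ge\sum_{k=0}^\infty\bigl(|\mu(A^\varepsilon_k)|+|\mu(B^\varepsilon_k)|\bigr)\ge|\mu|(A)+|\mu|(B)-\varepsilon.$$
By the arbitrariness of $\varepsilon$ we have $|\mu|(A\cup B)\ge|\mu|(A)+|\mu|(B)$.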
Step 2. $|\mu|$ is $\sigma$–additive. Since $|\mu|$ is additive by Step 1, it is enough to show that $|\mu|$ is $\sigma$–subadditive, i.e. $|\mu|(A)\le\sum_{i=0}^\infty|\mu|(A_i)$ whenever $(A_i)\subset\mathscr F$ is a partition of $A$. This can be proved arguing as in the first part of Step 1, i.e. building from a partition $(E_j)$ of $A$ partitions $(E_j\cap A_i)$ of all sets $A_i$.
Step 3. $|\mu|(X)<\infty$. Assume by contradiction that $|\mu|(X)=\infty$. Then we claim that
$$\text{there exists a partition }X=A\cup B\text{ such that }|\mu(A)|\ge 1\text{ and }|\mu|(B)=\infty. \tag{6.24}$$
By the claim the conclusion follows since we can use it to construct by recurrence (replacing $X$ with $B$ and so on), a disjoint sequence $(A_n)\subset\mathscr F$ such that $|\mu(A_n)|\ge 1$. Assume, to fix the ideas, that $\mu(A_n)\ge 1$ for infinitely many $n$, and denote by $E$ the union of these sets: then, the $\sigma$–additivity of $\mu$ forces $\mu(E)=+\infty$, a contradiction. Analogously, if $\mu(A_n)\le-1$ for infinitely many $n$, we find a set $E$ such that $\mu(E)=-\infty$.
Let us prove (6.24). By the assumption $|\mu|(X)=\infty$ it follows the existence of a partition $(X_n)$ of $X$ such that
$$\sum_{n=0}^\infty|\mu(X_n)|>2(1+|\mu(X)|).$$
Then either the sum of those $\mu(X_n)$ which are nonnegative or the absolute value of the sum of those $\mu(X_n)$ which are nonpositive is greater than $1+|\mu(X)|$. To fix the ideas, assume that for a subsequence $(X_{n(k)})$ we have $\mu(X_{n(k)})\ge 0$ and
$$\sum_{k=0}^\infty\mu(X_{n(k)})>1+|\mu(X)|.$$
Set $A=\bigcup_{k=0}^\infty X_{n(k)}$ and $B=A^c$. Then we have $|\mu(A)|>1+|\mu(X)|$ and
$$|\mu(B)|=|\mu(X)-\mu(A)|\ge|\mu(A)|-|\mu(X)|>1.$$
Since
$$|\mu|(X)=|\mu|(A)+|\mu|(B)=\infty,$$
either $|\mu|(B)=+\infty$ or $|\mu|(A)=+\infty$. In the first case we are done, in the second one we exchange $A$ and $B$. So, the claim is proved and the proof is complete.
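Let $\mu$ be a signed measure on $(X,\mathscr{F})$. We define
\[
\mu^+ := \frac12\bigl(|\mu|+\mu\bigr), \qquad \mu^- := \frac12\bigl(|\mu|-\mu\bigr),
\]
so that
\[
\mu=\mu^+-\mu^- \quad\text{and}\quad |\mu|=\mu^++\mu^-. \tag{6.25}
\]
The measure $\mu^+$ (respectively $\mu^-$) is called the positive part (respectively negative part) of $\mu$ and the first equation in (6.25) is called the Jordan representation of $\mu$.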
Remark 6.21. It is easy to check that Theorems 6.17 and 6.14 hold when $\nu$ is a signed measure: it suffices to split it into its positive and negative part, see also Exercise 6.19.
The following theorem proves also that $\mu^+$ and $\mu^-$ are singular, and provides a canonical representation of $\mu^\pm$ as suitable restrictions of $\pm\mu$.
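Theorem 6.22 (Hahn decomposition). Let $\mu$ be a signed measure on $(X,\mathscr{F})$ and let $\mu^+$ and $\mu^-$ be its positive and negative parts. Then there exists an $\mathscr{F}$-measurable partition $(A,B)$ of $X$ such that
\[
\mu^+(E)=\mu(A\cap E) \quad\text{and}\quad \mu^-(E)=-\mu(B\cap E) \qquad \forall E\in\mathscr{F}. \tag{6.26}
\]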
Proof. Let us first notice that $\mu\ll|\mu|$. Thus, by the Radon–Nikodým theorem, there exists $h\in L^1(X,\mathscr{F},|\mu|)$ such that
\[
\mu(E)=\int_E h\,d|\mu| \qquad \forall E\in\mathscr{F}. \tag{6.27}
\]
Let us prove that $|h(x)|=1$ for $|\mu|$-a.e. $x\in X$. Indeed, set
\[
E_1:=\{x\in X:\ h(x)>1\}, \qquad F_1:=\{x\in X:\ h(x)<-1\}.
\]
We first show that $|\mu|(E_1)=|\mu|(F_1)=0$. Since we have
\[
|\mu|(E_1)\ge\mu(E_1)=\int_{E_1}h\,d|\mu|\ge|\mu|(E_1),
\]
and the second inequality is strict if $|\mu|(E_1)>0$, we have that $|\mu|(E_1)=0$. In a similar way one can prove that $|\mu|(F_1)=0$, so that $|h|\le 1$ $|\mu|$-a.e. in $X$. Now, let $r\in(0,1)$ and set
\[
G_r:=\{x\in X:\ |h(x)|<r\}.
\]
Let $(G_{r,k})$ be a countable $\mathscr{F}$-measurable partition of $G_r$. Then we have
\[
|\mu(G_{r,k})|=\left|\int_{G_{r,k}}h\,d|\mu|\right|\le\int_{G_{r,k}}|h|\,d|\mu|\le r\,|\mu|(G_{r,k}).
\]
Therefore
\[
\sum_{k=0}^\infty|\mu(G_{r,k})|\le r\,|\mu|(G_r),
\]
which yields, by the arbitrariness of the partition of $G_r$, $|\mu|(G_r)\le r\,|\mu|(G_r)$. Thus $|\mu|(G_r)=0$ and letting $r\uparrow 1$ we obtain that $|\mu|(\{|h|<1\})=0$. Hence, possibly modifying $h$ in a $|\mu|$-negligible set, we can assume with no loss of generality that $h$ takes its values in $\{-1,1\}$.
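Now, to conclude the proof, we set
\[
A:=\{x\in X:\ h(x)=1\}, \qquad B:=\{x\in X:\ h(x)=-1\}.
\]
Then for any $E\in\mathscr{F}$ we have
\[
\mu^+(E)=\frac12\bigl(|\mu|(E)+\mu(E)\bigr)=\frac12\int_E(1+h)\,d|\mu|=\int_{E\cap A}h\,d|\mu|=\mu(E\cap A),
\]
and
\[
\mu^-(E)=\frac12\bigl(|\mu|(E)-\mu(E)\bigr)=\frac12\int_E(1-h)\,d|\mu|=-\int_{E\cap B}h\,d|\mu|=-\mu(E\cap B).
\]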
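As an illustration only, the Jordan and Hahn decompositions can be checked by hand on a finite set, where a signed measure is just a list of signed point masses and the total variation of a set is the sum of the absolute values of its masses. The following Python sketch works under that assumption; the helper name `hahn_jordan` and the sample weights are hypothetical and not part of the text.

```python
from fractions import Fraction

def hahn_jordan(weights):
    """weights: dict mapping each point x of a finite set X to the signed mass mu({x}).

    Returns a Hahn partition (A, B) together with mu, |mu|, mu^+ and mu^-
    as set functions on subsets of X.
    """
    # h = +1 on A and h = -1 on B; points of zero mass are put in A by convention.
    A = {x for x, w in weights.items() if w >= 0}
    B = set(weights) - A

    def mu(E):
        return sum(weights[x] for x in E)

    def tv(E):
        # On a finite set the finest partition attains the supremum defining |mu|.
        return sum(abs(weights[x]) for x in E)

    def mu_plus(E):
        return (tv(E) + mu(E)) / 2

    def mu_minus(E):
        return (tv(E) - mu(E)) / 2

    return A, B, mu, tv, mu_plus, mu_minus

# Hypothetical example data (exact rational arithmetic).
weights = {"a": Fraction(3), "b": Fraction(-2), "c": Fraction(1, 2), "d": Fraction(-5)}
A, B, mu, tv, mu_plus, mu_minus = hahn_jordan(weights)
E = {"a", "b", "d"}
print(mu_plus(E), mu(E & A))     # both sides of the first identity in (6.26)
print(mu_minus(E), -mu(E & B))   # both sides of the second identity in (6.26)
```

In this finite setting the density $h$ of (6.27) is simply the sign of each point mass, which is why the sets $A$ and $B$ can be read off directly.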
Exercises
6.19 Using the decomposition of $\nu$ in positive and negative part, show that Lebesgue decomposition is still possible when $\mu$ is $\sigma$-finite and $\nu$ is a signed measure. Using the Hahn decomposition extend this result to the case when even $\mu$ is a signed measure. Are these decompositions unique?
6.20 Show that $|f\mu|=|f|\mu$ for any $f\in L^1(X,\mathscr{E},\mu)$.
6.6. Measures in R
In this section we establish a 1-1 correspondence between finite Borel measures in $\mathbb{R}$ and a suitable class of nondecreasing functions. In one direction this correspondence is elementary, and based on the concept of repartition function.
Given a finite measure $\mu$ in $(\mathbb{R},\mathscr{B}(\mathbb{R}))$, we call repartition function of $\mu$ the function $F:\mathbb{R}\to[0,+\infty)$ defined by
\[
F(x):=\mu((-\infty,x]) \qquad \forall x\in\mathbb{R}.
\]
Notice that obviously (1) $F$ is nondecreasing, right continuous, and satisfies
\[
\lim_{x\to-\infty}F(x)=0, \qquad \lim_{x\to+\infty}F(x)\in[0,+\infty). \tag{6.28}
\]
Moreover, $F$ is continuous at $x$ if and only if $x$ is not an atom of $\mu$.
(1) The arguments are similar to those used in Section 2.4.2, in connection with the properties of the function $t\mapsto\mu(\{\varphi>t\})$.
The following result shows that this list of properties characterizes the functions that are repartition functions of some finite measure $\mu$; in addition the measure is uniquely determined by its repartition function.
Theorem 6.23. Let $F:\mathbb{R}\to[0,+\infty)$ be a nondecreasing and right continuous function satisfying (6.28). Then there exists a unique finite measure $\mu$ in $(\mathbb{R},\mathscr{B}(\mathbb{R}))$ such that $F$ is the repartition function of $\mu$.
Proof. The proof follows the same lines of the construction of the Lebesgue measure in Section 1.6, with a simplification due to the fact that we can also consider unbounded intervals (because we are dealing with finite measures). We set
\[
\mathscr{I}:=\{(a,b]:\ a\in[-\infty,+\infty),\ b\in\mathbb{R},\ a<b\}
\]
and denote by $\mathscr{A}$ the ring generated by $\mathscr{I}$: it consists, as it can be easily checked, of all finite disjoint unions of intervals in $\mathscr{I}$. We define, with the convention $F(-\infty)=0$,
\[
\mu((a,b]):=F(b)-F(a) \qquad \forall (a,b]\in\mathscr{I}. \tag{6.29}
\]
This definition is justified by the fact that, if $\mu$ were a measure and $F$ were its repartition function, (6.29) would be valid, because $(a,b]=(-\infty,b]\setminus(-\infty,a]$. Then we extend $\mu$ to $\mathscr{A}$ with the same mechanism used in the proof of Theorem 1.21, and check that $\mu$ is additive on $\mathscr{A}$. Also, the same argument used in that proof shows that $\mu$ is even $\sigma$-additive: in order to prove that $\mu(F)=\sum_i\mu(F_i)$ whenever $F$ and all $F_i$ belong to $\mathscr{A}$ one first reduces to the case when $F=(a,b]$ belongs to $\mathscr{I}$; then, one enlarges $F_i$ to $F_i'\in\mathscr{A}$ with $\mu(F_i')<\mu(F_i)+\varepsilon 2^{-i}$ and, using the fact that all intervals $[a',b]$ with $a'>a$ are contained in a finite union of the sets $F_i'$, obtains
\[
\mu((a',b])\le\sum_{i=0}^\infty\mu(F_i')\le 2\varepsilon+\sum_{i=0}^\infty\mu(F_i).
\]
Letting first $\varepsilon\downarrow 0$ and then $a'\downarrow a$ we obtain the $\sigma$-subadditivity property $\mu(F)\le\sum_i\mu(F_i)$, and the opposite inequality follows by monotonicity.
By the Carathéodory theorem $\mu$ has a unique extension, that we still denote by $\mu$, to $\mathscr{B}(\mathbb{R})=\sigma(\mathscr{A})$. Setting $a=-\infty$ and letting $b$ tend to $+\infty$ in the identity (6.29) we obtain that $\mu(\mathbb{R})=F(+\infty)\in\mathbb{R}$. From (6.29) with $a=-\infty$ we obtain that the repartition function of $\mu$ is $F$.
Given a nondecreasing and right continuous function $F$ satisfying (6.28), the Stieltjes integral
\[
\int_{\mathbb{R}}f\,dF
\]
is defined as $\int f\,d\mu_F$, where $\mu_F$ is the finite measure built in the previous theorem. The notation $dF$ is justified by the fact that, when $f=\sum_i z_i 1_{(a_i,b_i]}$, we have (by the very definition of $\mu_F$)
\[
\int_{\mathbb{R}}f\,dF=\int_{\mathbb{R}}f\,d\mu_F=\sum_i z_i\bigl(F(b_i)-F(a_i)\bigr).
\]
This approximation of the Stieltjes integral will play a role in the proof of Theorem 6.28.
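The formula for step functions can be evaluated directly. The following Python sketch is only an illustration: the function name `stieltjes_step_integral` and the two sample repartition functions (a logistic one and that of the Dirac mass at 0) are assumptions made for the example, not objects defined in the text.

```python
import math

def stieltjes_step_integral(F, pieces):
    """Stieltjes integral of f = sum_i z_i * 1_{(a_i, b_i]} with respect to dF.

    pieces: list of triples (z_i, a_i, b_i); F: nondecreasing, right continuous.
    Implements sum_i z_i * (F(b_i) - F(a_i)).
    """
    return sum(z * (F(b) - F(a)) for z, a, b in pieces)

def logistic_cdf(x):
    # Hypothetical smooth repartition function of a probability measure.
    return 1.0 / (1.0 + math.exp(-x))

def dirac_cdf(x):
    # Repartition function of the Dirac mass at 0 (right continuous jump at 0).
    return 1.0 if x >= 0 else 0.0

# f = 2 on (-1, 0] and f = -1 on (0, 3].
pieces = [(2.0, -1.0, 0.0), (-1.0, 0.0, 3.0)]
smooth_value = stieltjes_step_integral(logistic_cdf, pieces)
dirac_value = stieltjes_step_integral(dirac_cdf, pieces)
print(smooth_value, dirac_value)
```

With the Dirac repartition function the sum picks out the value of $f$ at the atom, here $f(0)=2$, because $0$ belongs to the half-open interval $(-1,0]$.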
6.7. Convergence of measures on R
In this section we study a notion of convergence for measures on the real line that is quite useful, both from the analytic and the probabilistic viewpoints.
Definition 6.24 (Weak convergence). Let $(\mu_h)$ be a sequence of finite measures on $\mathbb{R}$. We say that $(\mu_h)$ weakly converges to a finite measure $\mu$ on $\mathbb{R}$ if the repartition functions $F_h$ of $\mu_h$ are pointwise converging to the repartition function $F$ of $\mu$ on a co-countable set, i.e. if
\[
\lim_{h\to\infty}\mu_h((-\infty,x])=\mu((-\infty,x]) \quad\text{with at most countably many exceptions.} \tag{6.30}
\]
Since the repartition function is right continuous, it is uniquely determined by (6.30). Then, since the measure is uniquely determined by its repartition function, we obtain that the weak limit, if it exists, is unique.
The following fundamental example shows why we admit at most countably many exceptions in the convergence of the repartition functions.
Example 6.25. [Convergence to the Dirac mass] Let $\rho\in C^\infty(\mathbb{R})$ be a nonnegative function such that $\int_{\mathbb{R}}\rho\,dx=1$ (an important example is the Gauss function $(2\pi)^{-1/2}e^{-x^2/2}$). We consider the rescaled functions $\rho_h(x)=h\rho(hx)$ and the induced measures $\mu_h=\rho_h\mathscr{L}^1$, all probability measures. Then, it is immediate to check that $\mu_h$ weakly converge to $\delta_0$: for $x>0$ we have indeed
\[
\mu_h((-\infty,x])=\int_{-\infty}^{x}\rho_h(y)\,dy=\int_{-\infty}^{hx}\rho(y)\,dy\to 1
\]
because $hx\to+\infty$ as $h\to+\infty$. An analogous argument shows that $\mu_h((-\infty,x])\to 0$ for any $x<0$. If $\rho$ is even, at $x=0$ we don't have pointwise convergence of the repartition functions: all the repartition functions $F_h$ satisfy $F_h(0)=1/2$, while $F(0)=1$.
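The behaviour described in the example can be observed numerically. The sketch below assumes that $\rho$ is the Gauss function, so that the repartition function of $\mu_h$ has a closed form in terms of the error function; the name `F_h` mirrors the notation of the example, while the sample points are arbitrary choices.

```python
import math

def F_h(x, h):
    """Repartition function of mu_h = rho_h L^1, with rho the standard Gauss function.

    Since rho_h(x) = h * rho(h x), we have F_h(x) = Phi(h x), where Phi is the
    standard normal distribution function, written here through math.erf.
    """
    return 0.5 * (1.0 + math.erf(h * x / math.sqrt(2.0)))

# Columns: h, F_h(-0.1), F_h(0), F_h(0.1).
for h in (1, 10, 100):
    print(h, F_h(-0.1, h), F_h(0.0, h), F_h(0.1, h))
```

The middle column stays at $1/2$ for every $h$, which is the single exceptional point $x=0$, while the outer columns approach the values $0$ and $1$ of the repartition function of $\delta_0$.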
Weak convergence is a quite flexible tool, because it allows also an opposite behaviour, the approximation of continuous measures (i.e. with no atom) by purely atomic ones, see for instance Exercise 6.21.
From now on we will consider only, for the sake of simplicity, the case of weak convergence of probability measures. Before stating a compactness theorem for the weak convergence of probability measures, we introduce the following terminology.
Definition 6.26 (Tightness). We say that a family of probability measures $\{\mu_i\}_{i\in I}$ in $\mathbb{R}$ is tight if for any $\varepsilon>0$ there exists a bounded closed interval $J\subset\mathbb{R}$ such that
\[
\mu_i(\mathbb{R}\setminus J)\le\varepsilon \qquad \forall i\in I.
\]
Clearly any finite family of probability measures is tight. One can also check (see Exercise 6.24) that $\{\mu_i\}_{i\in I}$ is tight if and only if
\[
\lim_{x\to-\infty}F_i(x)=0, \qquad \lim_{x\to+\infty}F_i(x)=1 \qquad\text{uniformly with respect to } i\in I, \tag{6.31}
\]
where $F_i$ are the repartition functions of $\mu_i$. Furthermore (see Exercise 6.25), any weakly converging sequence is tight. Conversely, we have the following compactness result for tight sequences:
Theorem 6.27 (Compactness). Let $(\mu_h)$ be a tight sequence of probability measures on $\mathbb{R}$. Then there exists a subsequence $(\mu_{h(k)})$ weakly converging to a probability measure $\mu$.
Proof. We denote by $F_h$ the repartition functions of $\mu_h$. By a diagonal argument we can find a subsequence $(F_{h(k)})$ pointwise converging on $\mathbb{Q}$. We denote by $G$ the pointwise limit, obviously a nondecreasing function. We extend $G$ by monotonicity setting
\[
G(x):=\sup\{G(q):\ q\in\mathbb{Q},\ q\le x\} \qquad \forall x\in\mathbb{R}
\]
and let $E$ be the (at most) countable set of the discontinuity points of $G$.
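Let us check that $F_{h(k)}$ is pointwise converging to $G$ on $\mathbb{R}\setminus E$: for $x\notin E$ we have indeed
\[
\limsup_{k\to\infty}F_{h(k)}(x)\le\inf_{q\in\mathbb{Q},\,q>x}\limsup_{k\to\infty}F_{h(k)}(q)=\inf_{q\in\mathbb{Q},\,q>x}G(q)=G(x),
\]
and analogously
\[
\liminf_{k\to\infty}F_{h(k)}(x)\ge\sup_{q\in\mathbb{Q},\,q<x}\liminf_{k\to\infty}F_{h(k)}(q)=\sup_{q\in\mathbb{Q},\,q<x}G(q)=G(x).
\]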
Since $(\mu_h)$ is tight, we have also
\[
\lim_{x\to-\infty}F_h(x)=0, \qquad \lim_{x\to+\infty}F_h(x)=1
\]
uniformly with respect to $h$, hence $G(-\infty)=0$ and $G(+\infty)=1$. Notice now that the nondecreasing function
\[
F(x):=\lim_{y\downarrow x}G(y)
\]
is right continuous, and still satisfies $F(-\infty)=0$ and $F(+\infty)=1$, therefore (according to Theorem 6.23) $F$ is the repartition function of a probability measure $\mu$. Since $F=G$ on $\mathbb{R}\setminus E$, we have $F_{h(k)}\to F$ pointwise on $\mathbb{R}\setminus E$, and this proves the weak convergence of $\mu_{h(k)}$ to $\mu$.
The following theorem provides a characterization of the weak convergence in terms of convergence of the integrals of continuous and bounded functions.
Theorem 6.28. Let $\mu_h$, $\mu$ be probability measures in $\mathbb{R}$. Then $\mu_h$ weakly converge to $\mu$ if and only if
\[
\lim_{h\to\infty}\int_{\mathbb{R}}g\,d\mu_h=\int_{\mathbb{R}}g\,d\mu \qquad \forall g\in C_b(\mathbb{R}). \tag{6.32}
\]
Proof. Assuming that $\mu_h\to\mu$ weakly, we denote by $F_h$ and $F$ the corresponding repartition functions and fix $g\in C_b(\mathbb{R})$. Let $M=\sup|g|$ and $\varepsilon>0$. By Exercise 6.25 the sequence $(\mu_h)$ is tight, so that we can find $t>0$ satisfying $\mu_h(\mathbb{R}\setminus(-t,t])<\varepsilon$ for any $h\in\mathbb{N}$; we may assume (possibly choosing a larger $t$) that also $\mu(\mathbb{R}\setminus(-t,t])<\varepsilon$ and that both $-t$ and $t$ are points where the repartition functions are converging. Thanks to the uniform continuity of $g$ in $[-t,t]$ we can find $\delta>0$ such that
\[
x,\,y\in[-t,t],\ |x-y|<\delta \implies |g(x)-g(y)|<\varepsilon. \tag{6.33}
\]
Hence, we can find points $t_1,\dots,t_n$ in $[-t,t]$ such that $t_1=-t$, $t_n=t$, there is convergence of the repartition functions in all points $t_i$, and $t_{i+1}-t_i<\delta$ for $i=1,\dots,n-1$. By (6.33) it follows that $\sup_{(-t,t]}|g-f|<\varepsilon$, where
\[
f:=\sum_{i=1}^{n-1}g(t_i)\,1_{(t_i,t_{i+1}]}.
\]
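Splitting the integrals on $\mathbb{R}$ as the sum of an integral on $(-t,t]$ and an integral on $(-t,t]^c$ we have
\[
\left|\int_{\mathbb{R}}g\,d\mu_h-\int_{(-t,t]}f\,d\mu_h\right|\le\varepsilon M+\varepsilon=(M+1)\varepsilon \qquad \forall h\in\mathbb{N}, \tag{6.34}
\]
and analogously
\[
\left|\int_{\mathbb{R}}g\,d\mu-\int_{(-t,t]}f\,d\mu\right|\le\varepsilon M+\varepsilon=(M+1)\varepsilon. \tag{6.35}
\]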
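Since
\[
\int_{(-t,t]}f\,d\mu_h=\sum_{i=1}^{n-1}g(t_i)\bigl[F_h(t_{i+1})-F_h(t_i)\bigr]\to\sum_{i=1}^{n-1}g(t_i)\bigl[F(t_{i+1})-F(t_i)\bigr]=\int_{(-t,t]}f\,d\mu,
\]
adding and subtracting $\int_{(-t,t]}f\,d\mu_h$ and $\int_{(-t,t]}f\,d\mu$, and using (6.34) and (6.35), we conclude that
\[
\limsup_{h\to\infty}\left|\int_{\mathbb{R}}g\,d\mu_h-\int_{\mathbb{R}}g\,d\mu\right|\le 2(M+1)\varepsilon.
\]
Since $\varepsilon$ is arbitrary, (6.32) is proved.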
Conversely, assume that (6.32) holds. Given $x\in\mathbb{R}$, define the open set $A=(-\infty,x)$; we can easily find $(g_k)\subset C_b(\mathbb{R})$ monotonically converging from below to $1_A$ and deduce from (6.32) the inequality
\[
\liminf_{h\to\infty}\mu_h(A)\ge\sup_{k\in\mathbb{N}}\liminf_{h\to\infty}\int_{\mathbb{R}}g_k\,d\mu_h=\sup_{k\in\mathbb{N}}\int_{\mathbb{R}}g_k\,d\mu=\mu(A).
\]
Analogously, using a sequence $(g_k)\subset C_b(\mathbb{R})$ such that $g_k\downarrow 1_C$, with $C=(-\infty,x]$, we deduce from (6.32) the inequality
\[
\limsup_{h\to\infty}\mu_h(C)\le\inf_{k\in\mathbb{N}}\limsup_{h\to\infty}\int_{\mathbb{R}}g_k\,d\mu_h=\inf_{k\in\mathbb{N}}\int_{\mathbb{R}}g_k\,d\mu=\mu(C).
\]
Therefore we have convergence of the repartition functions for any $x\in\mathbb{R}$ such that $\mu(A)=\mu(C)$, i.e. for any $x$ that is not an atom of $\mu$. We conclude thanks to Exercise 1.5.
Notice that in (6.32) there is no mention of the order structure of $\mathbb{R}$, and only the metric structure (i.e. the space $C_b(\mathbb{R})$) comes into play. In a general context, of probability measures on a metric space $(X,d)$ endowed with the Borel $\sigma$-algebra $\mathscr{B}(X)$, we say that $\mu_h$ weakly converge to $\mu$ if
\[
\lim_{h\to\infty}\int_X g\,d\mu_h=\int_X g\,d\mu \qquad\text{for any function } g\in C_b(X).
\]
Exercises
6.21 Show that the probability measures
\[
\mu_h:=\frac1h\sum_{i=1}^h\delta_{i/h}
\]
weakly converge to the probability measure $1_{[0,1]}\mathscr{L}^1$.
6.22 Let $F_h:\mathbb{R}\to\mathbb{R}$ be nondecreasing functions pointwise converging to a nondecreasing function $F:\mathbb{R}\to\mathbb{R}$ on a dense set $D\subset\mathbb{R}$. Show that $F_h(x)\to F(x)$ at all points $x$ where $F$ is continuous.
6.23 Consider all atomic measures of the form
\[
\sum_{i=-h^2}^{h^2}a_i\,\delta_{i/h},
\]
where $h\in\mathbb{N}$ and $a_{-h^2},\dots,a_{h^2}\ge 0$. Show that for any finite Borel measure $\mu$ in $\mathbb{R}$ there exists a sequence of measures $(\mu_h)$ of the previous form that weakly converges to $\mu$.
6.24 Show that a family $\{\mu_i\}_{i\in I}$ of probability measures in $\mathbb{R}$ is tight if and only if (6.31) holds.
6.25 Show that any sequence $(\mu_h)$ of probability measures weakly convergent to a probability measure is tight. Hint: if $\mu$ is the weak limit and $\varepsilon>0$ is given, choose an integer $n\ge 1$ such that $\mu([1-n,n-1])>1-\varepsilon$ and points $x\in(-n,1-n)$ and $y\in(n-1,n)$ where the repartition functions of $\mu_h$ are converging to the repartition function of $\mu$.
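6.26 We want to extend what was shown in this section from the realm of probability measures to that of finite measures. Let $(\mu_h)$, $\mu$ be finite positive Borel measures on $\mathbb{R}$, and let $F_h$, $F$ be their repartition functions. Consider the following statements:
(a) $\lim_{h\to\infty}\int_{\mathbb{R}}g\,d\mu_h=\int_{\mathbb{R}}g\,d\mu$ for all $g\in C_b(\mathbb{R})$ (that is, (6.32));
(b) $\lim_{h\to\infty}\int_{\mathbb{R}}g\,d\mu_h=\int_{\mathbb{R}}g\,d\mu$ for all $g\in C_c(\mathbb{R})$;
(c) $F_h$ converge to $F$ at all points where $F$ is continuous;
(d) $F_h$ converge to $F$ on a dense subset of $\mathbb{R}$;
(e) $\lim_{h\to\infty}\mu_h(\mathbb{R})=\mu(\mathbb{R})$;
(f) $(\mu_h)$ is tight.
Find an example where (b) holds but (a), (c), (e) do not hold and prove the following implications: $a\Rightarrow b,\,e$; $a\Rightarrow c$; $d\Rightarrow c$; $b\wedge e\Rightarrow c$; $d\wedge e\Rightarrow f$; $d\wedge f\Rightarrow e$; $d\wedge f\Rightarrow a$. As a corollary, if (e) holds (as it happens in the case when all $\mu_h$ and $\mu$ are probability measures) we obtain that $a\Leftrightarrow b\Leftrightarrow c\Leftrightarrow d\Rightarrow f$.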
6.8. Fourier transform
The Fourier transform is a basic tool in Pure and Applied Mathematics, Physics and Engineering. Here we just mention a few basic facts, focussing on the use of this transform in Measure Theory and Probability.
Definition 6.29 (Fourier transform of a function). Let $f\in L^1(\mathbb{R},\mathbb{C})$. We set
\[
\hat f(\xi):=\int_{\mathbb{R}}f(x)e^{-ix\xi}\,dx \qquad \forall\xi\in\mathbb{R}.
\]
The function $\hat f$ is called Fourier transform of $f$.
Since the map $\xi\mapsto f(x)e^{-ix\xi}$ is continuous, and bounded by $|f(x)|$, the dominated convergence theorem gives that $\hat f(\xi)$ is continuous. The same upper bound also shows that $\hat f$ is bounded, and $\sup|\hat f|\le\|f\|_1$.
More generally, the following result holds:
Theorem 6.30. Let $k\in\mathbb{N}$ be such that $\int_{\mathbb{R}}|x|^k|f|(x)\,dx<\infty$. Then $\hat f\in C^k(\mathbb{R},\mathbb{C})$ and
\[
D^p\hat f(\xi)=(-i)^p\,\widehat{x^pf}(\xi) \qquad \forall p=0,\dots,k.
\]
The proof of Theorem 6.30 is a straightforward consequence of the differentiation theorem for integrals depending on a parameter (in this case, the $\xi$ variable, see the Appendix):
\[
D^p_\xi\int_{\mathbb{R}}f(x)e^{-ix\xi}\,dx=\int_{\mathbb{R}}D^p_\xi\bigl(f(x)e^{-ix\xi}\bigr)\,dx=(-i)^p\int_{\mathbb{R}}x^pf(x)e^{-ix\xi}\,dx.
\]
According to the previous result, the Fourier transform allows one to transform differentiations (in the $\xi$ variable) into multiplications (in the $x$ variable), thus allowing an algebraic solution of many linear differential equations.
In the sequel we need an explicit expression of the Fourier transform of a Gaussian function. For $\sigma>0$, let
\[
\rho_\sigma(x):=\frac{e^{-|x|^2/(2\sigma^2)}}{(2\pi\sigma^2)^{1/2}} \tag{6.36}
\]
be the rescaled Gaussian functions, already considered in Example 6.25. Then
\[
\int_{\mathbb{R}}\rho_\sigma(x)e^{-ix\xi}\,dx=e^{-\xi^2\sigma^2/2} \qquad \forall\xi\in\mathbb{R}. \tag{6.37}
\]
The proof of this identity is sketched in Exercise 6.27.
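Identity (6.37) can be sanity-checked numerically before proving it. The sketch below is only an illustration: it approximates the integral by a midpoint rule on a truncated interval, and the function names, the truncation length `L` and the number of nodes `n` are arbitrary assumptions.

```python
import cmath
import math

def rho(x, sigma):
    """Rescaled Gaussian function of (6.36)."""
    return math.exp(-x * x / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)

def fourier_gauss(xi, sigma, L=12.0, n=20000):
    """Midpoint-rule approximation of the integral of rho_sigma(x) e^{-i x xi}
    over the truncated interval [-L*sigma, L*sigma]."""
    a = -L * sigma
    dx = 2 * L * sigma / n
    total = 0j
    for k in range(n):
        x = a + (k + 0.5) * dx
        total += rho(x, sigma) * cmath.exp(-1j * x * xi)
    return total * dx

sigma, xi = 0.7, 1.3
print(fourier_gauss(xi, sigma), math.exp(-xi**2 * sigma**2 / 2))
```

The imaginary part of the numerical value is negligible, as expected from the evenness of $\rho_\sigma$.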
Remark 6.31. (Discrete Fourier transform) If $f:\mathbb{R}\to\mathbb{R}$ is a $2T$-periodic function, then we can write the Fourier series (corresponding, up to a linear change of variables, to those considered in Chapter 5 for $2\pi$-periodic functions)
\[
f=\sum_{n\in\mathbb{Z}}a_n e^{in\frac{\pi}{T}x}, \qquad\text{in } L^2((-T,T);\mathbb{C}), \tag{6.38}
\]
with
\[
a_n=\frac{1}{2T}\int_{-T}^{T}f(x)e^{-in\frac{\pi}{T}x}\,dx, \qquad e^{in\frac{\pi}{T}x}=\cos\Bigl(n\frac{\pi}{T}x\Bigr)+i\sin\Bigl(n\frac{\pi}{T}x\Bigr). \tag{6.39}
\]
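As a small illustration of (6.39), the coefficients $a_n$ of a trigonometric polynomial can be recovered numerically. In the sketch below the helper `fourier_coefficient`, the half-period `T = 2` and the test function are assumptions made for the example; the integral is approximated by a midpoint rule over one period.

```python
import cmath
import math

def fourier_coefficient(f, n, T, m=4096):
    """Midpoint-rule approximation of a_n = (1/2T) * int_{-T}^{T} f(x) e^{-i n pi x / T} dx."""
    dx = 2 * T / m
    total = 0j
    for k in range(m):
        x = -T + (k + 0.5) * dx
        total += f(x) * cmath.exp(-1j * n * math.pi * x / T)
    return total * dx / (2 * T)

T = 2.0

def f(x):
    # Hypothetical 2T-periodic function: cos(pi x / T) + 3 sin(2 pi x / T).
    return math.cos(math.pi * x / T) + 3 * math.sin(2 * math.pi * x / T)

for n in (-2, -1, 0, 1, 2):
    print(n, fourier_coefficient(f, n, T))
```

Writing the cosine and the sine through complex exponentials gives $a_{\pm1}=1/2$, $a_2=-\tfrac32 i$, $a_{-2}=\tfrac32 i$ and $a_0=0$, which is what the printed values should approximate.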
Remark 6.32. (Inverse Fourier transform) For $g\in L^1(\mathbb{R},\mathbb{C})$ we define inverse Fourier transform of $g$ the function
\[
\check g(x):=\frac{1}{2\pi}\int_{\mathbb{R}}g(\xi)e^{ix\xi}\,d\xi \qquad \forall x\in\mathbb{R}.
\]
It can be shown (see for instance Chapter VI.1 in [7]) that the maps $f\mapsto\hat f$ and $g\mapsto\check g$ are each the inverse of the other in the so-called Schwartz space $\mathscr{S}(\mathbb{R},\mathbb{C})$ of smooth and rapidly decreasing functions at infinity:
\[
\mathscr{S}(\mathbb{R},\mathbb{C}):=\Bigl\{f\in C^\infty(\mathbb{R},\mathbb{C}):\ \lim_{|x|\to\infty}|x|^k|D^if|(x)=0\quad\forall k,\,i\in\mathbb{N}\Bigr\}.
\]
In particular we have
\[
f(x)=2\pi\Bigl(\frac{\hat f}{2\pi}\Bigr)^{\vee}(x)=\int_{\mathbb{R}}a_\xi e^{ix\xi}\,d\xi \qquad\text{with}\qquad a_\xi:=\frac{1}{2\pi}\int_{\mathbb{R}}f(x)e^{-ix\xi}\,dx.
\]
These formulas can be viewed as the continuous counterpart of the discrete Fourier transform (6.38), (6.39). In this sense, $a_\xi$ are generalized Fourier coefficients, corresponding to the frequency $\xi$. The difference with Fourier series is that any frequency is allowed, not only the integer multiples $n\pi/T$ of a given one.
6.8.1. Fourier transform of a measure
In this section we are concerned in particular with the concept of Fourier transform of a measure.
Definition 6.33 (Fourier transform of a measure). Let $\mu$ be a finite measure on $\mathbb{R}$. We set
\[
\hat\mu(\xi):=\int_{\mathbb{R}}e^{-ix\xi}\,d\mu(x) \qquad \forall\xi\in\mathbb{R}.
\]
The function $\hat\mu:\mathbb{R}\to\mathbb{C}$ is called Fourier transform of $\mu$.
Notice that Definition 6.29 is consistent with Definition 6.33, because $\hat\mu=\hat f$ whenever $\mu=f\mathscr{L}^1$. Notice also that, by the dominated convergence theorem, the function $\hat\mu$ is continuous. Moreover $\hat\mu(0)=\mu(\mathbb{R})$ and, by estimating from above the modulus of the integral with the integral of the modulus (see also Exercise 6.29), we obtain that $|\hat\mu(\xi)|\le\mu(\mathbb{R})$ for all $\xi\in\mathbb{R}$. Still using the differentiation theorems under the integral sign, one can check that for $k\in\mathbb{N}$ the following implications hold:
\[
\int_{\mathbb{R}}|x|^k\,d\mu(x)<\infty \implies \hat\mu\in C^k(\mathbb{R},\mathbb{C}) \ \text{ and }\ D^p\hat\mu(\xi)=(-i)^p\,\widehat{x^p\mu}(\xi) \quad \forall p=0,\dots,k. \tag{6.40}
\]
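Let us see other basic examples of Fourier transforms of probability measures:
Example 6.34. (1) If $\mu=\delta_{x_0}$ then $\hat\mu(\xi)=e^{-ix_0\xi}$.
(2) If $\mu=p\delta_1+q\delta_0$ (with $p+q=1$) is the Bernoulli measure with parameter $p$, then $\hat\mu(\xi)=q+pe^{-i\xi}$.
(3) If
\[
\mu=\sum_{i=0}^n\binom{n}{i}p^iq^{n-i}\delta_i
\]
is the binomial measure with parameters $n$, $p$ then
\[
\hat\mu(\xi)=(q+pe^{-i\xi})^n \qquad \forall\xi\in\mathbb{R}.
\]
(4) If $\mu=e^{-x}1_{(0,\infty)}(x)\mathscr{L}^1$ is the exponential measure, then
\[
\hat\mu(\xi)=\frac{1}{1+i\xi} \qquad \forall\xi\in\mathbb{R}.
\]
(5) If $\mu=(2a)^{-1}1_{(-a,a)}\mathscr{L}^1$ is the uniform measure in $[-a,a]$, then
\[
\hat\mu(\xi)=\frac{\sin(a\xi)}{a\xi} \qquad \forall\xi\in\mathbb{R}\setminus\{0\}.
\]
(6) If $\mu=[\pi(1+x^2)]^{-1}\mathscr{L}^1$ is the Cauchy measure, then (2)
\[
\hat\mu(\xi)=e^{-|\xi|} \qquad \forall\xi\in\mathbb{R}.
\]
(2) This computation can be done using the residue theorem in complex analysis.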
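For purely atomic measures the Fourier transform is a finite sum, so items (2) and (3) of the example can be checked directly. The following Python sketch is an illustration under that assumption; the helper names `fourier_discrete` and `binomial_atoms` and the sample parameters are hypothetical.

```python
import cmath
from math import comb

def fourier_discrete(atoms, xi):
    """Fourier transform of the atomic measure sum_i c_i * delta_{x_i}.

    atoms: list of pairs (x_i, c_i). Uses the convention e^{-i x xi} of Definition 6.33.
    """
    return sum(c * cmath.exp(-1j * x * xi) for x, c in atoms)

def binomial_atoms(n, p):
    """Atoms (i, C(n, i) p^i q^(n-i)) of the binomial measure with parameters n, p."""
    q = 1 - p
    return [(i, comb(n, i) * p**i * q**(n - i)) for i in range(n + 1)]

n, p, xi = 7, 0.3, 0.9
lhs = fourier_discrete(binomial_atoms(n, p), xi)
rhs = ((1 - p) + p * cmath.exp(-1j * xi)) ** n
print(lhs, rhs)
```

The agreement of the two printed values is just the binomial theorem applied to $q+pe^{-i\xi}$, which is the computation behind item (3).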
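Theorem 6.35. Any finite measure $\mu$ in $\mathbb{R}$ is uniquely determined by its Fourier transform $\hat\mu$.
Proof. For $\sigma>0$ we denote by $\rho_\sigma$ the rescaled Gaussian functions in (6.36). According to Exercise 6.27 we have
\[
e^{-z^2\sigma^2/2}=\int_{\mathbb{R}}\rho_\sigma(w)e^{-izw}\,dw.
\]
Setting $z=(x-y)/\sigma^2$ and dividing both sides by $(2\pi\sigma^2)^{1/2}$ we deduce that
\[
\rho_\sigma(x-y)=\frac{1}{(2\pi\sigma^2)^{1/2}}\int_{\mathbb{R}}\rho_\sigma(w)e^{-iw(x-y)/\sigma^2}\,dw.
\]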
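Using the Fubini-Tonelli theorem we obtain
\[
\int_{\mathbb{R}}\rho_\sigma(x-y)\,d\mu(x)=\int_{\mathbb{R}}\frac{1}{(2\pi\sigma^2)^{1/2}}\Bigl(\int_{\mathbb{R}}\rho_\sigma(w)e^{-iw(x-y)/\sigma^2}\,dw\Bigr)d\mu(x)
=\int_{\mathbb{R}}\frac{\rho_\sigma(w)}{(2\pi\sigma^2)^{1/2}}\,\hat\mu\Bigl(\frac{w}{\sigma^2}\Bigr)e^{iyw/\sigma^2}\,dw. \tag{6.41}
\]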
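As a consequence, the integrals $h_\sigma(y)=\int_{\mathbb{R}}\rho_\sigma(y-x)\,d\mu(x)$ are uniquely determined by $\hat\mu$. But, still using the Fubini-Tonelli theorem, one can check the identity
\[
\int_{\mathbb{R}}\Bigl(\int_{\mathbb{R}}g(y)\rho_\sigma(x-y)\,dy\Bigr)d\mu(x)=\int_{\mathbb{R}}h_\sigma(y)g(y)\,dy \qquad \forall g\in C_b(\mathbb{R}). \tag{6.42}
\]
Passing to the limit as $\sigma\downarrow 0$ and noticing that (by Example 6.25, that provides the weak convergence of $\rho_\sigma\mathscr{L}^1$ to $\delta_0$ as $\sigma\downarrow 0$, or a direct verification)
\[
\int_{\mathbb{R}}g(y)\rho_\sigma(x-y)\,dy=\int_{\mathbb{R}}g(x-z)\rho_\sigma(z)\,dz\to g(x) \qquad \forall x\in\mathbb{R}
\]
from the dominated convergence theorem we obtain that all integrals $\int_{\mathbb{R}}g\,d\mu$, for $g\in C_b(\mathbb{R})$, are uniquely determined. Hence $\mu$ is uniquely determined by its Fourier transform.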
Remark 6.36. It is also possible to show an explicit inversion formula for the Fourier transform. Indeed, (6.42) holds not only for continuous functions, but also for bounded Borel functions; choosing $a<b$ that are not atoms of $\mu$ and $g=1_{(a,b)}$, we have that $\int_{\mathbb{R}}g(y)\rho_\sigma(x-y)\,dy\to g(x)$ for $\mu$-a.e. $x$ (precisely for $x\notin\{a,b\}$), so that (6.42) and (6.41) give
\[
\mu((a,b))=\lim_{\sigma\downarrow 0}\int_a^b h_\sigma(y)\,dy=\lim_{\sigma\downarrow 0}\int_a^b\int_{\mathbb{R}}\frac{e^{-w^2/(2\sigma^2)}}{2\pi\sigma^2}\,\hat\mu\Bigl(\frac{w}{\sigma^2}\Bigr)e^{iyw/\sigma^2}\,dw\,dy.
\]
The change of variables $w=t\sigma^2$ and the Fubini theorem give
\[
\mu((a,b))=\lim_{\sigma\downarrow 0}\frac{1}{2\pi}\int_{\mathbb{R}}e^{-t^2\sigma^2/2}\,\hat\mu(t)\,\frac{e^{itb}-e^{ita}}{it}\,dt, \tag{6.43}
\]
for all points $a<b$ that are not atoms of $\mu$.
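Formula (6.43) can be tested numerically on a measure whose Fourier transform is known. The sketch below assumes that $\mu$ is the standard Gaussian measure, so that $\hat\mu(t)=e^{-t^2/2}$ by (6.37) with $\sigma=1$; the function name `inversion`, the fixed small value of $\sigma$, the truncation `T` and the number of nodes `n` are arbitrary choices, and the limit in (6.43) is replaced by a single small $\sigma$.

```python
import cmath
import math

def inversion(mu_hat, a, b, sigma=1e-2, T=12.0, n=40000):
    """Midpoint-rule approximation of the right-hand side of (6.43) for one
    fixed small sigma, with the t-integral truncated to [-T, T]."""
    dt = 2 * T / n
    total = 0j
    for k in range(n):
        t = -T + (k + 0.5) * dt  # midpoints never coincide with t = 0
        kernel = (cmath.exp(1j * t * b) - cmath.exp(1j * t * a)) / (1j * t)
        total += math.exp(-t * t * sigma**2 / 2) * mu_hat(t) * kernel
    return (total * dt / (2 * math.pi)).real

def gauss_hat(t):
    # Fourier transform of the standard Gaussian measure, from (6.37) with sigma = 1.
    return math.exp(-t * t / 2)

a, b = -1.0, 0.5
approx = inversion(gauss_hat, a, b)
exact = 0.5 * (math.erf(b / math.sqrt(2)) - math.erf(a / math.sqrt(2)))
print(approx, exact)
```

The Gaussian damping factor matters only when $\hat\mu$ is not integrable (for instance for measures with atoms); here it changes the result by an amount of the order of $\sigma^2$.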
According to Theorem 6.28 we have the implication:
\[
\mu_h\to\mu \ \text{weakly} \implies \hat\mu_h\to\hat\mu \ \text{pointwise in } \mathbb{R}. \tag{6.44}
\]
The following theorem, due to Lévy, gives essentially the converse implication, allowing one to deduce the weak convergence from the convergence of the Fourier transforms.
Theorem 6.37 (Lévy). Let $(\mu_h)$ be probability measures in $\mathbb{R}$. If $f_h=\hat\mu_h$ pointwise converge in $\mathbb{R}$ to some function $f$, and if $f$ is continuous at $0$, then $f=\hat\mu$ for some probability measure $\mu$ in $\mathbb{R}$ and $\mu_h\to\mu$ weakly.
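Proof. Let us show first that $(\mu_h)$ is tight. Fixed $a>0$, taking into account that $\sin$ is an odd function and using the Fubini theorem we get
\[
\int_{-a}^{a}\hat\mu(\xi)\,d\xi=\int_{-a}^{a}\int_{\mathbb{R}}e^{-ix\xi}\,d\mu(x)\,d\xi=\int_{\mathbb{R}}\int_{-a}^{a}\cos(x\xi)\,d\xi\,d\mu(x)=\int_{\mathbb{R}}\frac{2}{x}\sin(ax)\,d\mu(x)
\]
for any probability measure $\mu$. Hence, using the inequalities $|\sin t|\le|t|$ for all $t$ and $|\sin t|\le|t|/2$ for $|t|\ge 2$, we get
\[
\frac1a\int_{-a}^{a}\bigl(1-\hat\mu(\xi)\bigr)\,d\xi=2-2\int_{\mathbb{R}}\frac{\sin(ax)}{ax}\,d\mu(x)=2\int_{\mathbb{R}}\Bigl(1-\frac{\sin(ax)}{ax}\Bigr)d\mu(x)\ge\mu\Bigl(\mathbb{R}\setminus\Bigl[-\frac2a,\frac2a\Bigr]\Bigr). \tag{6.45}
\]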
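For $\varepsilon>0$ we can find, by the continuity of $f$ at $0$, $a>0$ such that
\[
\int_{-a}^{a}\bigl(1-f(\xi)\bigr)\,d\xi<\varepsilon a.
\]
By the dominated convergence theorem we get $h_0\in\mathbb{N}$ such that
\[
\int_{-a}^{a}\bigl(1-\hat\mu_h(\xi)\bigr)\,d\xi<\varepsilon a \qquad \forall h\ge h_0. \tag{6.46}
\]
As $a^{-1}\int_{-a}^{a}(1-\hat\mu_h(\xi))\,d\xi\to 0$ as $a\downarrow 0$ for any fixed $h$, we infer that we can find $b\in(0,a]$ such that (6.46) holds with $b$ replacing $a$ for all $h\in\mathbb{N}$. From (6.45) we get $\mu_h(\mathbb{R}\setminus[-n,n])<\varepsilon$ for all $h\in\mathbb{N}$, as soon as $n>2/b$.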
Since the sequence is tight, we can extract a subsequence $(\mu_{h(k)})$ weakly converging to a probability measure $\mu$ and deduce from (6.44) that $f=\hat\mu$. It remains to show that the whole sequence $(\mu_h)$ weakly converges to $\mu$: if this is not the case there exist $\varepsilon>0$, $g\in C_b(\mathbb{R})$ and a subsequence $h'(k)$ such that
\[
\left|\int_{\mathbb{R}}g\,d\mu_{h'(k)}-\int_{\mathbb{R}}g\,d\mu\right|\ge\varepsilon \qquad \forall k\in\mathbb{N}.
\]
But, possibly extracting one more subsequence, we can assume that $\mu_{h'(k)}$ weakly converge to a probability measure $\nu$; in particular
\[
\left|\int_{\mathbb{R}}g\,d\nu-\int_{\mathbb{R}}g\,d\mu\right|\ge\varepsilon>0. \tag{6.47}
\]
As we are assuming that $f_h=\hat\mu_h$ converge pointwise to $f=\hat\mu$ we obtain that $\hat\nu=\lim_k\hat\mu_{h'(k)}=\hat\mu$, hence $\hat\nu=\hat\mu$. From Theorem 6.35 we obtain that $\nu=\mu$, contradicting (6.47).
Notice that just pointwise convergence of the Fourier transforms is not enough to conclude the weak convergence, unless we know that the limit function is continuous: let us consider, for instance, the rescaled Gaussian kernels $\rho_\sigma$ used in the proof of Theorem 6.35 and the behaviour of the Gaussian measures $\mu_\sigma=\rho_\sigma\mathscr{L}^1$ as $\sigma\to\infty$; in this case, from Exercise 6.27 we infer that the Fourier transforms are pointwise converging in $\mathbb{R}$ to the discontinuous function equal to $1$ at $\xi=0$ and equal to $0$ elsewhere. In this case we don't have weak convergence of the measures: we have, instead, the so-called phenomenon of dispersion of the whole mass at infinity
\[
\lim_{\sigma\to\infty}\mu_\sigma(\mathbb{R}\setminus[-n,n])=\lim_{\sigma\to\infty}\mu_1\Bigl(\mathbb{R}\setminus\Bigl[-\frac{n}{\sigma},\frac{n}{\sigma}\Bigr]\Bigr)=\mu_1(\mathbb{R}\setminus\{0\})=1 \qquad \forall n\in\mathbb{N}
\]
and the family of measures $\mu_\sigma$ is far from being tight as $\sigma\to\infty$.
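Exercises
6.27 Check the identity (6.37).
6.28 Show that $\hat\mu$ is uniformly continuous in $\mathbb{R}$ for any finite measure $\mu$.
6.29 Let $\mu$ be a probability measure in $\mathbb{R}$. Show that if $|\hat\mu|$ attains its maximum at $\xi_0\ne 0$, then there exist $x_0\in\mathbb{R}$ and $c_n\in[0,\infty)$ such that
\[
\mu=\sum_{n\in\mathbb{Z}}c_n\delta_{x_n} \qquad\text{with}\qquad x_n=x_0+\frac{2n\pi}{\xi_0}.
\]
Use this fact to show that $|\hat\mu|\equiv 1$ in $\mathbb{R}$ if and only if $\mu$ is a Dirac mass.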
Chapter 7
The fundamental theorem of the integral calculus
In this chapter we give a closer look at a classical theme, namely the fundamental theorem of the integral calculus, looking for optimal conditions on $f$ ensuring the validity of the formula
\[
f(x)-f(y)=\int_y^x f'(s)\,ds.
\]
Notice indeed that in the classical theory of the Riemann integration there is a gap between the conditions imposed to give a meaning to the integral $\int_a^x g(s)\,ds$ (i.e. Riemann integrability of $g$) and those that ensure its differentiability as a function of $x$ (for instance, typically one requires the continuity of $g$). We will see that this gap basically disappears in Lebesgue's theory, and that there is a precise characterization of the class of functions representable as $c+\int_a^x g(s)\,ds$ for a suitable (Lebesgue) integrable function $g$ and for some constant $c$.
The following definition is due to Vitali.
Definition 7.1 (Absolutely continuous functions). Let $I\subset\mathbb{R}$ be an interval. We say that $f:I\to\mathbb{R}$ is absolutely continuous if for any $\varepsilon>0$ there exists $\delta>0$ for which the implication
\[
\sum_{i=1}^n(b_i-a_i)<\delta \implies \sum_{i=1}^n|f(b_i)-f(a_i)|<\varepsilon \tag{7.1}
\]
holds for any finite family $\{(a_i,b_i)\}_{1\le i\le n}$ of pairwise disjoint intervals contained in $I$.
An absolutely continuous function is obviously uniformly continuous, but the converse is not true, see Example 7.7.
Let $f:[a,b]\to\mathbb{R}$ be absolutely continuous. For any $x\in[a,b]$ define
\[
F(x)=\sup_{\sigma\in\Sigma_{a,x}}\sum_{i=1}^n|f(x_i)-f(x_{i-1})|,
\]
where $\Sigma_{a,x}$ is the set of all decompositions $\sigma=\{a=x_0<x_1<\cdots<x_n=x\}$ of $[a,x]$. $F$ is called the total variation of $f$. Let us check that $F$ is finite: let $\delta>0$ be satisfying the implication (7.1) with $\varepsilon=1$ and let us estimate from above a sum in the definition of $F$. Without loss of generality we can assume that $|x_i-x_{i-1}|<\delta/2$ for all $i=1,\dots,n-1$, possibly adding more points (which increases the sum). Then, we can split the sum in families of intervals with total length larger than $\delta/2$ and less than $\delta$ (just keep adding a new interval to a family if the total length does not exceed $\delta$ and notice that if it exceeds $\delta$, the total length is at least $\delta/2$); the number of these families is less than $\frac{2}{\delta}(x-a)$ and, as a consequence, (7.1) gives
\[
F(x)\le\frac{2}{\delta}(x-a)+1.
\]
We set
\[
f^+(x)=\frac12\bigl(F(x)+f(x)\bigr), \qquad f^-(x)=\frac12\bigl(F(x)-f(x)\bigr),
\]
so that
\[
f(x)=f^+(x)-f^-(x), \qquad F(x)=f^+(x)+f^-(x), \qquad x\in[a,b].
\]
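To make the decomposition concrete, the variation can be computed along a fixed grid. This is only a sketch: the variation along one decomposition is a lower bound for the supremum defining $F$, and it coincides with $F$ at the grid points only under the assumption that $f$ is monotone between consecutive grid points. The helper name `total_variation_split` and the sample values are hypothetical.

```python
def total_variation_split(values):
    """values: samples f(x_0), ..., f(x_n) of f on an increasing grid x_0 < ... < x_n.

    Returns three lists on the same grid: the variation F accumulated along the
    grid, f_plus = (F + f)/2 and f_minus = (F - f)/2.
    """
    F = [0.0]
    for prev, cur in zip(values, values[1:]):
        F.append(F[-1] + abs(cur - prev))
    f_plus = [(v + f) / 2 for v, f in zip(F, values)]
    f_minus = [(v - f) / 2 for v, f in zip(F, values)]
    return F, f_plus, f_minus

# Hypothetical samples of a function that goes up, down, then up again.
values = [0.0, 1.0, 0.5, 2.0]
F, f_plus, f_minus = total_variation_split(values)
print(F, f_plus, f_minus)
```

In the output $f^+$ increases exactly where $f$ increases and $f^-$ increases exactly where $f$ decreases, which is the mechanism behind the monotonicity statement of the next lemma.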
Lemma 7.2. Let $f:[a,b]\to\mathbb{R}$ be absolutely continuous and let $F$ be its total variation. Then $F$, $f^+$, $f^-$ are nondecreasing and absolutely continuous.
Proof. Let $x\in[a,b)$, $y\in(x,b]$ and $\sigma=\{a=x_0<x_1<\cdots<x_n=x\}$. Then we have
\[
F(y)\ge|f(y)-f(x)|+\sum_{i=1}^n|f(x_i)-f(x_{i-1})|.
\]
Taking the supremum over all $\sigma\in\Sigma_{a,x}$ yields
\[
F(y)\ge|f(y)-f(x)|+F(x),
\]
which implies that $F$, $f^+$, $f^-$ are nondecreasing. It remains to show that $F$ is absolutely continuous. Let $\varepsilon>0$ and let $\delta=\delta(\varepsilon)>0$ be such that the implication (7.1) holds for all finite families $(a_i,b_i)$ of pairwise disjoint intervals with $\sum_i(b_i-a_i)<\delta$. Let now $(a_i,b_i)$, $1\le i\le n$, be a family of disjoint intervals with $\sum_i(b_i-a_i)<\delta$ and let us prove that $\sum_i|F(b_i)-F(a_i)|<2\varepsilon$. For any $i=1,\dots,n$ we can find $\sigma_i=\{a_i=x_{0,i}<x_{1,i}<\cdots<x_{n_i,i}=b_i\}$ such that
\[
F(b_i)-F(a_i)<\frac{\varepsilon}{n}+\sum_{k=1}^{n_i}|f(x_{k,i})-f(x_{k-1,i})|, \qquad 1\le i\le n. \tag{7.2}
\]
Indeed, if $a=y_0<y_1<\cdots<y_{m_i}=b_i$ is a partition such that
\[
F(b_i)<\frac{\varepsilon}{n}+\sum_{k=1}^{m_i}|f(y_k)-f(y_{k-1})|
\]
we can assume with no loss of generality (adding one more element to the partition if necessary) that $y_k=a_i$ for some $k$; then, it suffices to estimate the first $k$ terms of the above sum with $F(a_i)$, and to call $x_{0,i}=y_k,\dots,x_{m_i-k,i}=y_{m_i}$ to obtain (7.2) with $n_i=m_i-k$. Adding the inequalities (7.2) and taking into account that the union of the disjoint intervals $(x_{k-1,i},x_{k,i})$ (for $1\le i\le n$, $1\le k\le n_i$) has length less than $\delta$, from the absolute continuity property of $f$ we get
\[
\sum_{i=1}^n\bigl(F(b_i)-F(a_i)\bigr)<\varepsilon+\varepsilon=2\varepsilon.
\]
This proves that $F$ is absolutely continuous.
The absolute continuity property characterizes integral functions, as the following theorem shows.
Theorem 7.3. Let $I=[a,b]\subset\mathbb{R}$. A function $f:I\to\mathbb{R}$ is representable as
\[
f(x)=f(a)+\int_a^x g(t)\,dt \qquad \forall x\in I \tag{7.3}
\]
for some $g\in L^1(I)$ if and only if $f$ is absolutely continuous.
Proof. (Sufficiency) If $f$ is representable as in (7.3), we have
\[
|f(x)-f(y)|\le\int_x^y|g(s)|\,ds \qquad \forall x,\,y\in I,\ x\le y.
\]
Hence, setting $A=\bigcup_i(a_i,b_i)$, the absolute continuity property follows by the implication
\[
\mathscr{L}^1(A)<\delta \implies \int_A|g|\,ds<\varepsilon.
\]
The existence, given $\varepsilon>0$, of $\delta>0$ with this property is ensured by Exercise 6.14 (with $\mu=\mathscr{L}^1$ and $\nu=|g|\mathscr{L}^1$).
(Necessity) According to Lemma 7.2, we can write $f$ as the difference of two nondecreasing absolutely continuous functions. Hence, we can assume with no loss of generality that $f$ is nondecreasing, and possibly adding to $f$ a constant we shall assume that $f(a)=0$. We extend $f$ to the whole of $\mathbb{R}$ setting $f\equiv 0$ in $(-\infty,a)$ and $f\equiv f(b)$ in $(b,\infty)$. It is clear that this extension, that we still denote by $f$, retains the monotonicity and absolute continuity properties.
By Theorem 6.23 we obtain a unique finite measure $\mu$ on $(\mathbb{R},\mathscr{B}(\mathbb{R}))$ without atoms (because $f$ is continuous) such that $f$ is the repartition function of $\mu$. Since $f$ is constant on $(-\infty,a)$ and on $(b,+\infty)$, we obtain that $\mu$ is concentrated on $I$, so that
\[
f(x)=\mu((-\infty,x])=\mu((a,x]) \qquad \forall x\in\mathbb{R}. \tag{7.4}
\]
Now, if we were able to show that $\mu\ll 1_I\mathscr{L}^1$, by the Radon–Nikodým theorem we would find $g\in L^1(I)$ such that $\mu=g\mathscr{L}^1$, so that (7.4) would give
\[
f(x)=\int_a^x g(s)\,ds \qquad \forall x\in I.
\]
Hence, it remains to show that $\mu\ll 1_I\mathscr{L}^1$. Taking into account the identity $\mu((a,b))=f(b)-f(a)$, the absolute continuity property can be rewritten as follows: for any $\varepsilon>0$ there exists $\delta>0$ such that
\[
\mathscr{L}^1(A)<\delta \implies \mu(A)<\varepsilon
\]
for any finite union of open intervals $A\subset I$. But, by approximation, the same implication (with $\mu(A)\le\varepsilon$ in the conclusion) holds for all open sets, because any such set is the countable union of open intervals. By Proposition 1.24, ensuring an approximation from above with open sets, the same implication holds for Borel sets $B\subset I$ as well. This proves that $\mu\ll 1_I\mathscr{L}^1$ and concludes the proof.
We will need the following nice and elementary covering theorem.
Theorem 7.4 (Vitali covering theorem). Let $\{B_{r_i}(x_i)\}_{i\in I}$ be a finite family of balls in a metric space $(X,d)$. Then there exists $J\subset I$ such that the balls $\{B_{r_i}(x_i)\}_{i\in J}$ are pairwise disjoint, and
\[
\bigcup_{i\in I}B_{r_i}(x_i)\subset\bigcup_{i\in J}B_{3r_i}(x_i). \tag{7.5}
\]
Proof. We proceed as follows: first we pick a ball with largest radius, then we remove all balls that intersect the first chosen ball and choose a second ball of largest radius among the remaining ones. We continue removing all balls that intersect the second chosen ball and picking a third ball of largest radius among the remaining ones, and so on. The process stops when there is no ball left, i.e. when every ball that has not been chosen intersects at least one of the already chosen balls. The family of chosen balls is disjoint by construction. If $x\in B_{r_i}(x_i)$ and the ball $B_{r_i}(x_i)$ has not been chosen, then there is a chosen ball $B_{r_j}(x_j)$ intersecting it, so that $d(x_i,x_j)<r_i+r_j$. Moreover, if $B_{r_j}(x_j)$ is the first chosen ball with this property, then $r_j\ge r_i$ (otherwise, if $r_i>r_j$, either the ball $B_{r_i}(x_i)$ or a ball with larger radius would have been chosen, instead of $B_{r_j}(x_j)$), so that $d(x_i,x_j)<2r_j$. It follows that
\[
d(x,x_j)\le d(x,x_i)+d(x_i,x_j)<r_i+2r_j\le 3r_j.
\]
As $x$ is arbitrary, this proves (7.5).
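The selection procedure in the proof is a greedy algorithm and can be sketched in a few lines. The code below assumes open balls in Euclidean space, where two balls intersect exactly when the distance between the centres is smaller than the sum of the radii (in a general metric space this test is only a sufficient condition for disjointness). The function name `vitali_select` and the sample balls are hypothetical.

```python
import math

def vitali_select(balls):
    """Greedy selection from the proof of the Vitali covering theorem.

    balls: list of (center, radius), with center a tuple of floats (Euclidean space).
    Returns the indices of the chosen, pairwise disjoint balls.
    """
    # Scanning by decreasing radius and keeping a ball only if it is disjoint from
    # all balls kept so far reproduces "pick the largest remaining ball, then
    # remove every ball that intersects it".
    order = sorted(range(len(balls)), key=lambda i: -balls[i][1])
    chosen = []
    for i in order:
        c, r = balls[i]
        if all(math.dist(c, balls[j][0]) >= r + balls[j][1] for j in chosen):
            chosen.append(i)
    return chosen

balls = [((0.0, 0.0), 1.0), ((1.5, 0.0), 0.8), ((5.0, 0.0), 0.5),
         ((5.4, 0.0), 0.6), ((0.2, 0.1), 0.3)]
chosen = vitali_select(balls)
print(chosen)
```

Each discarded ball $B_{r_i}(x_i)$ meets a chosen ball $B_{r_j}(x_j)$ with $r_j\ge r_i$, so $d(x_i,x_j)+r_i<3r_j$ and the discarded ball lies inside the tripled ball, as in (7.5).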
It is natural to think that the function $g$ in (7.3) is, as in the classical fundamental theorem of integral calculus, the derivative of $f$. This is true, but far from being trivial, and it follows by the following weak continuity result (due to Lebesgue) of integrable functions. We state the result even in more than one variable, as the proof in this case does not require any extra difficulty. Here $\omega_n$ denotes the Lebesgue measure of the unit ball of $\mathbb{R}^n$, so that $\omega_n r^n=\mathscr{L}^n(B_r(x))$.
Theorem 7.5 (Continuity in mean). Let $f\in L^1(\mathbb{R}^n)$. Then, for $\mathscr{L}^n$-a.e. $x\in\mathbb{R}^n$ we have
\[
\lim_{r\downarrow 0}\frac{1}{\omega_n r^n}\int_{B_r(x)}|f(y)-f(x)|\,dy=0.
\]
The terminology "continuity in mean" can be explained as follows: it is easy to show that the integral means
\[
\frac{1}{\omega_n r^n}\int_{B_r(x)}f(y)\,dy
\]
of a continuous function $f$ converge to $f(x)$ as $r\downarrow 0$ for any $x\in\mathbb{R}^n$, because they belong to the interval
\[
\Bigl[\min_{\overline{B}_r(x)}f,\ \max_{\overline{B}_r(x)}f\Bigr].
\]
The previous theorem tells us that the same convergence occurs, for $\mathscr{L}^n$-a.e. $x\in\mathbb{R}^n$, for any integrable function $f$. This simply follows by the inequality
\[
\left|\frac{1}{\omega_n r^n}\int_{B_r(x)}f(y)\,dy-f(x)\right|=\left|\frac{1}{\omega_n r^n}\int_{B_r(x)}\bigl(f(y)-f(x)\bigr)\,dy\right|\le\frac{1}{\omega_n r^n}\int_{B_r(x)}|f(y)-f(x)|\,dy.
\]
By the local nature of this statement, the same property holds for locally integrable functions.
Proof of Theorem 7.5. Given $\varepsilon,\,\delta>0$ and an open ball $B=B_R(0)$, it suffices to check that the set
\[
A:=\Bigl\{x\in B:\ \limsup_{r\downarrow 0}\frac{1}{\omega_n r^n}\int_{B_r(x)}|f(y)-f(x)|\,dy>2\varepsilon\Bigr\}
\]
has Lebesgue measure less than $(3^n+1)\delta$. To this aim, we write $f$ as the sum of a "good" part $g$ and a "bad", but small, part $h$, i.e. $f=g+h$ with $g:B'\to\mathbb{R}$ bounded and continuous, and $\|h\|_{L^1(B')}<\varepsilon\delta$, with $B'=B_{R+1}(0)$; this decomposition is possible, because Proposition 3.16 ensures the density of bounded continuous functions in $L^1(B')$.
The continuity of $g$ gives
\[
\lim_{r\downarrow 0}\frac{1}{\omega_n r^n}\int_{B_r(x)}|g(y)-g(x)|\,dy=0 \qquad \forall x\in B.
\]
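Hence, as $f=g+h$, we have $A\subset A_1$, where
\[
A_1:=\Bigl\{x\in B:\ \limsup_{r\downarrow 0}\frac{1}{\omega_n r^n}\int_{B_r(x)}|h(y)-h(x)|\,dy>2\varepsilon\Bigr\}.
\]
Then, it suffices to show that $\mathscr{L}^n(A_1)\le(3^n+1)\delta$. By the triangle inequality, we have also $A_1\subset A_2\cup A_3$ with
\[
A_2:=\{x\in B:\ |h(x)|>\varepsilon\}
\]
and
\[
A_3:=\Bigl\{x\in B:\ \sup_{r\in(0,1)}\frac{1}{\omega_n r^n}\int_{B_r(x)}|h(y)|\,dy>\varepsilon\Bigr\}.
\]
Markov inequality ensures that $\mathscr{L}^n(A_2)\le\|h\|_{L^1(B)}/\varepsilon<\delta$, so that we need only to show that $\mathscr{L}^n(A_3)\le 3^n\delta$.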
Since $x \mapsto \int_{B_r(x)} |h(y)|\,dy$ is continuous, we have that
\[ x \mapsto \sup_{r \in (0,1)} \frac{1}{\omega_n r^n} \int_{B_r(x)} |h(y)|\,dy \]
is lower semicontinuous, hence $A_3$ is open. Notice also that for any $x \in A_3$ there exists $r \in (0,1)$, depending on $x$, such that
\[ \int_{B_r(x)} |h(y)|\,dy > \varepsilon\,\omega_n r^n. \]
Let $K \subset A_3$ be a compact set and let $\{B(x_i, r_i)\}_{i \in I}$ be a finite family of these balls whose union covers $K$. By applying Vitali's covering theorem to this family of balls, we can find a disjoint subfamily $\{B_{r_i}(x_i)\}_{i \in J}$ such that the union of the enlarged balls $B_{3r_i}(x_i)$ still covers $K$. Using the previous inequalities with $x = x_i$ and $r = r_i$ and summing over $i \in J$, since all balls $B_{r_i}(x_i)$ are contained in $B'$ we get
\[ \mathscr{L}^n(K) \le \sum_{i \in J} \omega_n (3r_i)^n \le \frac{3^n}{\varepsilon} \sum_{i \in J} \int_{B_{r_i}(x_i)} |h(y)|\,dy \le \frac{3^n}{\varepsilon} \int_{B'} |h(y)|\,dy \le 3^n \delta. \]
As $K$ is arbitrary we obtain that $\mathscr{L}^n(A_3) \le 3^n \delta$.
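The covering step used in the proof can be sketched in code for a finite family of one-dimensional balls. This is only an illustration of the standard greedy argument behind the finite Vitali covering theorem, under the assumption that balls are stored as (centre, radius) pairs; the function names are hypothetical.

```python
def vitali_select(balls):
    """Greedy selection from a finite family of open 1-D balls given as (centre, radius).

    Returns a pairwise disjoint subfamily such that every ball of the original
    family is contained in the 3-fold enlargement of some selected ball.
    """
    selected = []
    # Process balls by decreasing radius; keep a ball if it misses all kept ones.
    for c, r in sorted(balls, key=lambda ball: -ball[1]):
        if all(abs(c - c2) >= r + r2 for c2, r2 in selected):
            selected.append((c, r))
    return selected


def covered_by_tripled(ball, selected):
    """True if `ball` lies inside B(c2, 3*r2) for some selected ball (c2, r2)."""
    c, r = ball
    return any(abs(c - c2) + r <= 3 * r2 for c2, r2 in selected)
```

A discarded ball meets a selected ball of at least the same radius, which is why tripling the selected radii is enough to recover the whole family.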
Since the continuity in mean is a local property, it is not difficult to extend the previous result to locally integrable functions. By applying this extended theorem to a characteristic function $f = 1_E$ we get
\[ \lim_{r \downarrow 0} \frac{\mathscr{L}^n(E \cap B_r(x))}{\omega_n r^n} = 1 \quad \text{for } \mathscr{L}^n\text{-a.e. } x \in E, \]
\[ \lim_{r \downarrow 0} \frac{\mathscr{L}^n(E \cap B_r(x))}{\omega_n r^n} = 0 \quad \text{for } \mathscr{L}^n\text{-a.e. } x \in \mathbb{R}^n \setminus E \]
for any $E \in \mathscr{B}(\mathbb{R}^n)$; points of the first type are called density points, whereas points of the second type are called rarefaction points.
Using the continuity in mean of integrable functions we obtain the fundamental theorem of calculus within the (natural) class of absolutely continuous functions.
Theorem 7.6. Let $I \subset \mathbb{R}$ be an interval and let $f : I \to \mathbb{R}$ be absolutely continuous. Then $f$ is differentiable at $\mathscr{L}^1$-a.e. point of $I$. In addition $f'$ is Lebesgue integrable in $I$ and
\[ f(x) = f(a) + \int_a^x f'(s)\,ds \qquad \forall x \in I. \tag{7.6} \]
Proof. Let $g$ be as in (7.3), let $x_0 \in I$ be a point where
\[ \lim_{r \downarrow 0} \frac{1}{r} \int_{x_0 - r}^{x_0 + r} |g(s) - g(x_0)|\,ds = 0 \tag{7.7} \]
and notice that
\[ \frac{f(x_0 + r) - f(x_0)}{r} = \frac{1}{r} \int_{x_0}^{x_0 + r} g(s)\,ds = g(x_0) + \frac{1}{r} \int_{x_0}^{x_0 + r} \bigl(g(s) - g(x_0)\bigr)\,ds \]
for $r > 0$. Hence, passing to the limit as $r \downarrow 0$, from (7.7) we get $f'_+(x_0) = g(x_0)$; a similar argument shows that $f'_-(x_0) = g(x_0)$. As, according to the previous theorem, $\mathscr{L}^1$-a.e. point $x_0$ satisfies (7.7), we obtain that $f$ is differentiable, with derivative equal to $g$, $\mathscr{L}^1$-a.e. in $I$. It suffices to replace $g$ with $f'$ in (7.3) to obtain (7.6).
One might think that differentiability $\mathscr{L}^1$-a.e. and integrability of the derivative are sufficient for the validity of (7.6) (these are the minimal requirements to give a meaning to the formula). However, this is not true, as the Heaviside function $1_{(0,\infty)}$ fulfils these conditions but fails to be (absolutely) continuous. Then, one might think that one should require also the continuity of $f$ to have (7.6). It turns out that not even this is enough: we build in the next example the Cantor-Vitali function, also called devil's staircase: a continuous function having derivative equal to $0$ $\mathscr{L}^1$-a.e., but not constant. This example shows why a stronger condition, namely the absolute continuity, is needed.
Example 7.7 (Cantor-Vitali function). Let
\[ X := \{ f \in C([0,1]) : f(0) = 0,\ f(1) = 1 \}. \]
This is a closed subspace of the complete metric space $C([0,1])$, hence $X$ is complete as well. For any $f : [0,1] \to \mathbb{R}$ we set
\[ Tf(x) := \begin{cases} f(3x)/2 & \text{if } 0 \le 3x \le 1, \\ 1/2 & \text{if } 1 < 3x < 2, \\ 1/2 + f(3x - 2)/2 & \text{if } 2 \le 3x \le 3. \end{cases} \tag{7.8} \]
It is easy to see that $T$ maps $X$ into $X$, and that $T$ is a contraction (with Lipschitz constant equal to $1/2$). Hence, by the contraction principle, there is a unique $f \in X$ such that $Tf = f$.
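The fixed point can be approximated by iterating the contraction, as in the proof of the contraction principle. The sketch below applies $T$ from (7.8) a fixed number of times to the starting function $f_0(x) = x$ (an arbitrary element of $X$ chosen for the example); the function name and the default depth are assumptions.

```python
def cantor_vitali(x, depth=40):
    """Evaluate T^depth f0 at x in [0, 1], where f0(x) = x and T is the map in (7.8).

    Since T is a contraction with constant 1/2, in exact arithmetic the result
    differs from the fixed point by at most 2**(-depth).
    """
    if depth == 0:
        return x
    if 3 * x <= 1:
        return cantor_vitali(3 * x, depth - 1) / 2
    if 3 * x < 2:
        return 0.5
    return 0.5 + cantor_vitali(3 * x - 2, depth - 1) / 2
```

For instance, points of $(1/9, 2/9)$ and $(7/9, 8/9)$ are mapped to $1/4$ and $3/4$ respectively, in agreement with the discussion of the plateaus of $f$.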
Let us check that $f$ has zero derivative $\mathscr{L}^1$-a.e. in $[0,1]$. As $f = Tf$, $f$ is constant, and equal to $1/2$, in $(1/3, 2/3)$. Inserting this information again in the identity $f = Tf$ we obtain that $f$ is locally constant (equal to $1/4$ and to $3/4$) on $(1/9, 2/9) \cup (7/9, 8/9)$. Continuing in this way, one finds that $f$ is locally constant on the union of $2^{n-1}$ intervals, each of length $3^{-n}$, $n \ge 1$. The complement $C = [0,1] \setminus A$ of the union $A$ of these intervals is Cantor's middle third set (see also Exercise 1.8), and since
\[ \mathscr{L}^1(A) = \sum_{n=1}^{\infty} 2^{n-1} 3^{-n} = \frac{1}{2} \sum_{n=1}^{\infty} \Bigl(\frac{2}{3}\Bigr)^n = 1 \]
we know that $\mathscr{L}^1(C) = 0$. At any point of $A$ the derivative of $f$ is obviously $0$.
In connection with the previous example, notice also that $f$ maps $A$, a set of full Lebesgue measure in $[0,1]$, into the countable set of dyadic rationals $\{k 2^{-n} : n \ge 1,\ 0 < k < 2^n\}$. On the other hand, it maps $C$, a Lebesgue negligible set, onto $[0,1]$, a set with strictly positive Lebesgue measure.
Exercises
7.1 Let $H : \mathbb{R} \to \mathbb{R}$ satisfy the Lipschitz condition
\[ |H(x) - H(y)| \le C|x - y| \qquad \forall x, y \in \mathbb{R} \]
and let $f : [a,b] \to \mathbb{R}$ be an absolutely continuous function. Show that $H \circ f$ is absolutely continuous in $[a,b]$.
7.2 Let $E \subset \mathbb{R}$ be a Borel set and assume that any $t \in \mathbb{R}$ is either a point of density or a point of rarefaction of $E$. Show that either $\mathscr{L}^1(E) = 0$ or $\mathscr{L}^1(\mathbb{R} \setminus E) = 0$. (Remark: the same result is true in $\mathbb{R}^n$, but with a much harder proof, see [3], 4.5.11).
7.3 [Lipschitz change of variables] Let $f : I = [a,b] \to \mathbb{R}$ be absolutely continuous (resp. Lipschitz). Show that
\[ \int_{f(a)}^{f(b)} \varphi(y)\,dy = \int_a^b \varphi(f(x)) f'(x)\,dx \]
for any bounded (resp. integrable) Borel function $\varphi : f(I) \to \mathbb{R}$.
7.4 Use the previous exercise to show that, for any Lipschitz function $f : \mathbb{R} \to \mathbb{R}$ and any $\mathscr{L}^1$-negligible set $N \in \mathscr{B}(\mathbb{R})$, the derivative $f'$ vanishes $\mathscr{L}^1$-a.e. on $f^{-1}(N)$.
Chapter 8
Measurable transformations
In this chapter we study the classical problem of the change of variables in the integral from a new viewpoint. We will compute how the Lebesgue measure in $\mathbb{R}^n$ changes under a sufficiently regular transformation, generalizing what we have already seen for linear, or affine, maps. As a byproduct we obtain a quite general change of variables formula for integrals with respect to the Lebesgue measure.
8.1. Image measure
We are given two measurable spaces $(X, \mathscr{E})$ and $(Y, \mathscr{F})$, a measure $\mu$ on $(X, \mathscr{E})$ and a $(\mathscr{E}, \mathscr{F})$-measurable mapping $F : X \to Y$. We define a measure $F_\#\mu$ in $(Y, \mathscr{F})$ by setting
\[ F_\#\mu(I) := \mu(F^{-1}(I)), \qquad I \in \mathscr{F}. \tag{8.1} \]
It is easy to see that $F_\#\mu$ is well defined, by the measurability assumption on $F$, and $\sigma$-additive on $\mathscr{F}$. $F_\#\mu$ is called the image measure of $\mu$ by $F$.
The following change of variable formula is simple, but of basic importance.
Proposition 8.1. Let $\varphi : Y \to [0, \infty]$ be a $\mathscr{F}$-measurable function. Then we have
\[ \int_X \varphi(F(x))\,d\mu(x) = \int_Y \varphi(y)\,dF_\#\mu(y). \tag{8.2} \]
Proof. By monotone approximation it is enough to prove (8.2) when $\varphi$ is a simple function. By linearity of both sides we need only to consider functions of the form $\varphi = 1_I$, where $I \in \mathscr{F}$. In this case we have $\varphi \circ F = 1_{F^{-1}(I)}$, hence (8.2) reduces to (8.1).
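For a finitely supported measure both sides of (8.2) are finite sums, so the formula can be checked directly. The following is a minimal sketch, assuming measures are stored as dictionaries mapping points to masses; the helper names `pushforward` and `integrate`, and the specific data, are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def pushforward(mu, F):
    """Image measure F_# mu of a finitely supported measure mu = {point: mass}."""
    nu = defaultdict(Fraction)
    for x, mass in mu.items():
        nu[F(x)] += mass  # F_# mu({y}) = mu(F^{-1}({y}))
    return dict(nu)


def integrate(phi, mu):
    """Integral of phi with respect to a finitely supported measure."""
    return sum(phi(x) * mass for x, mass in mu.items())


mu = {x: Fraction(1, 4) for x in (-2, -1, 1, 2)}
F = lambda x: x * x
phi = lambda y: y + 1

lhs = integrate(lambda x: phi(F(x)), mu)   # integral over X of phi(F(x)) d mu
rhs = integrate(phi, pushforward(mu, F))   # integral over Y of phi d F_# mu
```

Here $F$ is not injective, and the image measure simply accumulates the mass of all preimages of each point.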
In the following example we discuss the relation between the change of variables formula (8.2), which even on the real line involves no derivative, and the classical one. The difference is due to the fact that in (8.2) we are not using the density of $F_\#\mathscr{L}^1$ with respect to $\mathscr{L}^1$. It is precisely in this density that the derivative of $F$ shows up.
Example 8.2. Let $F : \mathbb{R} \to \mathbb{R}$ be of class $C^1$ and such that $F'(t) > 0$ for all $t \in \mathbb{R}$. Let $A$ be the image of $F$ (an open interval, by the assumptions made on $F$) and let $\varphi : A \to \mathbb{R}$ be continuous. Then for any interval $[a,b] \subset A$ the following elementary formula of change of variables holds (just put $y = F(x)$ in the right integral):
\[ \int_{F^{-1}(a)}^{F^{-1}(b)} \varphi(F(x))\,dx = \int_a^b \varphi(y) \frac{1}{F'(F^{-1}(y))}\,dy. \]
On the other hand, applying (8.2) to the function $\varphi 1_I$ with $I = [a,b]$, we have
\[ \int_{F^{-1}(a)}^{F^{-1}(b)} \varphi(F(x))\,dx = \int_a^b \varphi(y)\,dF_\#\mathscr{L}^1(y). \]
Hence, comparing the two expressions, we find
\[ \int_a^b \varphi(y) \frac{1}{F'(F^{-1}(y))}\,dy = \int_a^b \varphi(y)\,dF_\#\mathscr{L}^1(y). \tag{8.3} \]
Since $a$, $b$ and $\varphi$ are arbitrary, (8.3) can be interpreted by saying that $F_\#\mathscr{L}^1 \ll \mathscr{L}^1$ and
\[ F_\#\mathscr{L}^1 = \frac{1}{F' \circ F^{-1}}\,\mathscr{L}^1. \]
In the next section, we shall generalize this formula to $\mathbb{R}^n$, and even in one space dimension we will see that the assumption that $F' > 0$ everywhere can be weakened (see also Exercise 8.3).
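The density formula of Example 8.2 can be tested numerically on a concrete map. The sketch below assumes $F(x) = x^3 + x$ (so $F' > 0$ everywhere), inverts $F$ by bisection, and compares $F_\#\mathscr{L}^1([a,b]) = F^{-1}(b) - F^{-1}(a)$ with a midpoint approximation of $\int_a^b 1/F'(F^{-1}(y))\,dy$. The choice of $F$, the bracket used for bisection and all function names are assumptions of this example.

```python
def F(x):
    return x ** 3 + x          # C^1 with F'(x) = 3x^2 + 1 > 0


def dF(x):
    return 3 * x ** 2 + 1


def F_inverse(y, lo=-10.0, hi=10.0, iterations=80):
    """Invert the increasing map F by bisection (valid while F(lo) <= y <= F(hi))."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if F(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def density_integral(a, b, n=2000):
    """Midpoint approximation of the integral over [a, b] of 1 / F'(F^{-1}(y)) dy."""
    h = (b - a) / n
    return h * sum(1.0 / dF(F_inverse(a + (k + 0.5) * h)) for k in range(n))


a, b = 0.0, 2.0
mass = F_inverse(b) - F_inverse(a)    # F_# L^1([a, b]) = L^1(F^{-1}([a, b]))
approx = density_integral(a, b)
```

Since $F(0) = 0$ and $F(1) = 2$, the exact mass of $[0, 2]$ under $F_\#\mathscr{L}^1$ is $1$, and the two computed quantities should agree up to discretization error.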
8.2. Change of variables in multiple integrals
We consider here the measure space $(\mathbb{R}^n, \mathscr{B}(\mathbb{R}^n), \mathscr{L}^n)$, where $\mathscr{L}^n$ is the Lebesgue measure.
We recall a few basic facts from calculus with several variables: given an open set $U \subset \mathbb{R}^n$ and a mapping $F : U \to \mathbb{R}^n$, $F$ is said to be differentiable at $x \in U$ if there exists a linear operator $DF(x) \in L(\mathbb{R}^n; \mathbb{R}^n)$ (1) such that
\[ \lim_{|h| \to 0} \frac{|F(x + h) - F(x) - DF(x)h|}{|h|} = 0. \]
The operator $DF(x)$, if it exists, is unique, and is called the differential of $F$ at $x$. If $F$ is affine, i.e. $F(x) = Tx + a$ for some $T \in L(\mathbb{R}^n; \mathbb{R}^n)$ and $a \in \mathbb{R}^n$, we have $DF(x) = T$ for all $x \in U$.
(1) $L(\mathbb{R}^n; \mathbb{R}^m)$ is the Banach space of all linear mappings $T : \mathbb{R}^n \to \mathbb{R}^m$ endowed with the sup norm $\|T\| = \sup\{ |Tx| : x \in \mathbb{R}^n,\ |x| = 1 \}$.
If $F$ is differentiable at $x \in U$ we define the Jacobian determinant $J_F(x)$ of $F$ at $x$ by setting
\[ J_F(x) = \det DF(x). \]
If $F$ is differentiable at any $x \in U$ and if the mapping $DF : U \to L(\mathbb{R}^n; \mathbb{R}^n)$ is continuous, we say that $F$ is of class $C^1$. If, in addition, $F$ is bijective between $U$ and an open domain $A$ and $F^{-1}$ is of class $C^1$ in $A$, we say that $F$ is a $C^1$ diffeomorphism of $U$ onto $A$. In this case we have that $DF(x)$ is invertible and
\[ D(F^{-1})(F(x)) = (DF(x))^{-1} \qquad \forall x \in U. \]
Finally, by Proposition 6.10 we know that if $T \in L(\mathbb{R}^n; \mathbb{R}^n)$ we have
\[ \mathscr{L}^n(T(E)) = |\det T|\,\mathscr{L}^n(E) \qquad \forall E \in \mathscr{B}(\mathbb{R}^n). \tag{8.4} \]
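Formula (8.4) can be checked by hand in the plane when $E$ is the unit square: its image under $T$ is a parallelogram whose area can be computed with the shoelace formula. The snippet below is a small sketch under these assumptions; the matrix and the helper names are chosen only for illustration.

```python
def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    total = 0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


def image_of_unit_square(T):
    """Vertices of T([0,1]^2) for a 2x2 matrix T = [[a, b], [c, d]]."""
    (a, b), (c, d) = T
    corners = [(0, 0), (1, 0), (1, 1), (0, 1)]
    return [(a * x + b * y, c * x + d * y) for x, y in corners]


T = [[2, 1], [1, 3]]
area = polygon_area(image_of_unit_square(T))
det_T = T[0][0] * T[1][1] - T[0][1] * T[1][0]
```

For a singular matrix the image degenerates to a segment and the computed area is $0$, in agreement with $|\det T| = 0$.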
8.3. Image measure of $\mathscr{L}^n$ by a $C^1$ diffeomorphism
In this section we study how the Lebesgue measure changes under the action of a $C^1$ map $F$. The relevant quantity will be the function $|J_F|$, which really corresponds to the distortion factor of the measure.
Let $U \subset \mathbb{R}^n$ be open. The critical set $C_F$ of $F \in C^1(U; \mathbb{R}^n)$ is defined by
\[ C_F := \{ x \in U : J_F(x) = 0 \}. \]
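Lemma 8.3. The image $F(C_F)$ of the critical set is Lebesgue negligible.
Proof. Let $K \subset C_F$ be a compact set and $\varepsilon > 0$; for any $x \in K$ the set $DF(x)(B_1(0))$ is Lebesgue negligible (because $DF$ is singular at $x$, so that $DF(x)(\mathbb{R}^n)$ is contained in a $(n-1)$-dimensional subspace of $\mathbb{R}^n$), hence we can find $\delta = \delta(\varepsilon, x) > 0$ such that
\[ \mathscr{L}^n\bigl( \{ z \in \mathbb{R}^n : \operatorname{dist}(z - F(x), DF(x)(B_1(0))) < \delta \} \bigr) < \varepsilon. \]
By a scaling argument we get
\[ \mathscr{L}^n\bigl( \{ z \in \mathbb{R}^n : \operatorname{dist}(z - F(x), DF(x)(B_r(0))) < \delta r \} \bigr) < \varepsilon r^n \qquad \forall r > 0. \]
On the other hand, since $|F(y) - F(x) - DF(x)(y - x)| < \delta r$ in $B_r(x)$, provided $r$ is small enough, we get
\[ F(B_r(x)) \subset \{ z \in \mathbb{R}^n : \operatorname{dist}(z - F(x), DF(x)(B_r(0))) < \delta r \}. \]
It follows that $B_r(x) \subset U$ and $\mathscr{L}^n(F(B_r(x))) < \varepsilon r^n$ for $r > 0$ small enough, depending on $x$.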
Since the family of balls $\{B_{r/3}(x)\}_{x \in K}$ covers the compact set $K$, we can find a finite family $\{B_{r_i/3}(x_i)\}_{i \in I}$ whose union still covers $K$ and extract from it, thanks to Vitali's covering theorem, a subfamily $\{B_{r_i/3}(x_i)\}_{i \in J}$ made of pairwise disjoint balls such that the union of the enlarged balls $\{B_{r_i}(x_i)\}_{i \in J}$ covers $K$. In particular, covering $F(K)$ by the union of $F(B_{r_i}(x_i))$ for $i \in J$, we get
\[ \mathscr{L}^n(F(K)) \le \sum_{i \in J} \varepsilon r_i^n = \frac{3^n \varepsilon}{\omega_n} \sum_{i \in J} \omega_n \Bigl(\frac{r_i}{3}\Bigr)^n \le \frac{3^n \varepsilon}{\omega_n}\,\mathscr{L}^n(U). \]
Letting $\varepsilon \downarrow 0$ we obtain that $\mathscr{L}^n(F(K)) = 0$. Since $K$ is arbitrary, by approximation (recall that $C_F$, being a closed subset of $U$, can be written as the countable union of compact subsets of $U$) we obtain that $\mathscr{L}^n(F(C_F)) = 0$.
The following theorem provides a necessary and sufficient condition for the absolute continuity of $F_\#\mathscr{L}^n$ with respect to $\mathscr{L}^n$, assuming $C^1$ regularity of $F$.
Theorem 8.4. Let $U \subset \mathbb{R}^n$ be an open set and let $F : U \to \mathbb{R}^n$ be of class $C^1$, whose restriction to $U \setminus C_F$ is injective. Then:
(i) $F_\#(1_U \mathscr{L}^n)$ is absolutely continuous with respect to $\mathscr{L}^n$ if and only if $C_F$ is Lebesgue negligible.
(ii) If $F_\#(1_U \mathscr{L}^n) \ll \mathscr{L}^n$ we have
\[ F_\#(1_U \mathscr{L}^n) = \frac{1}{|J_F|(F^{-1})}\,1_{F(U \setminus C_F)}\,\mathscr{L}^n. \tag{8.5} \]
Proof. (i) If $\mathscr{L}^n(C_F) > 0$, we have $F_\#(1_U \mathscr{L}^n)(F(C_F)) \ge \mathscr{L}^n(C_F) > 0$ and $F_\#(1_U \mathscr{L}^n)$ fails to be absolutely continuous with respect to $\mathscr{L}^n$, because we proved in Lemma 8.3 that $F(C_F)$ is Lebesgue negligible.
Let $G$ be the inverse of the restriction of $F$ to the open set $U \setminus C_F$. The local invertibility theorem ensures that the domain $A = F(U \setminus C_F)$ of $G$ is an open set, that $G$ is of class $C^1$ in $A$ and that $DG(y) = (DF)^{-1}(G(y))$ for all $y \in A$. Let us assume now that $C_F$ is Lebesgue negligible and let us show that $F^{-1}(E)$ is Lebesgue negligible whenever $E \subset F(U)$ is Lebesgue negligible. Since we already know that $C_F$ is an $\mathscr{L}^n$-negligible set, we can assume with no loss of generality that $E \subset A$ and show that $G(E)$ is Lebesgue negligible. Let $A_M$ be the open sets
\[ A_M := \{ y \in A : \|DG(y)\| < M \}. \]
We will prove that
\[ \mathscr{L}^n(G(K)) \le (3M)^n \mathscr{L}^n(K) \tag{8.6} \]
for any compact set $K \subset A_M$. So, $F_\#\mathscr{L}^n \le (3M)^n \mathscr{L}^n$ on the compact sets of $A_M$ and therefore on the Borel sets; in particular
\[ \mathscr{L}^n(G(E \cap A_M)) \le (3M)^n \mathscr{L}^n(E \cap A_M) = 0, \]
and letting $M \uparrow \infty$ we obtain that $\mathscr{L}^n(G(E)) = 0$, because $E \subset A$.
In order to show (8.6) we consider a bounded open set $B$ contained in $A_M$ and containing $K$, and the family of balls $B_r(y) \subset B$ with $y \in K$ and $r > 0$. For any of these balls the mean value inequality, applied along the segment joining $y$ to $z$, gives
\[ |G(z) - G(y)| \le \sup_{t \in [0,1]} \|DG((1-t)y + tz)\|\,|z - y| \le M|z - y| \qquad \forall z \in B_r(y), \]
therefore $G(B_r(y)) \subset B_{Mr}(G(y))$ for any of these balls. Since the family of balls $\{B_{r/3}(y)\}_{y \in K}$ covers $K$, we can find a finite family $\{B_{r_i/3}(y_i)\}_{i \in I}$ whose union still covers $K$ and extract from it, thanks to Vitali's covering theorem, a subfamily $\{B_{r_i/3}(y_i)\}_{i \in J}$ made of pairwise disjoint balls such that the union of the enlarged balls $\{B_{r_i}(y_i)\}_{i \in J}$ covers $K$. In particular, by our choice of the radii of the balls, the family $\{B_{Mr_i}(G(y_i))\}_{i \in J}$ covers $G(K)$. We have then
\[ \mathscr{L}^n(G(K)) \le \sum_{i \in J} \omega_n (Mr_i)^n = (3M)^n \sum_{i \in J} \omega_n \Bigl(\frac{r_i}{3}\Bigr)^n \le (3M)^n \mathscr{L}^n(B). \]
Letting $B \downarrow K$ we obtain (8.6).
Let us prove (ii). We denote by $h$ the Radon-Nikodým derivative of $F_\#(1_U \mathscr{L}^n)$ with respect to $\mathscr{L}^n$; by Theorem 7.5 we have that
\[ h(y) = \lim_{r \downarrow 0} \frac{1}{\omega_n r^n} \int_{B_r(y)} h(z)\,dz = \lim_{r \downarrow 0} \frac{\mathscr{L}^n(G(B_r(y)))}{\omega_n r^n} \]
for $\mathscr{L}^n$-a.e. $y \in A$. Taking into account that $F_\#(1_U \mathscr{L}^n)$ is concentrated on $A$, and that $1/(|J_F| \circ F^{-1}) = |J_G|$, it remains to prove that for all $y_0 \in A$ we have
\[ \lim_{r \downarrow 0} \frac{\mathscr{L}^n(G(B_r(y_0)))}{\omega_n r^n} = |J_G|(y_0). \tag{8.7} \]
For the sake of simplicity we only consider the case when $y_0 = 0$ and $G(0) = 0$ (this is not restrictive, up to a translation in the domain and in the codomain). We divide the rest of the proof in two steps.
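Step 1. We assume in addition that $DG(0) = I$ and show that
\[ \lim_{r \downarrow 0} \frac{\mathscr{L}^n(G(B_r(0)))}{\omega_n r^n} = 1, \tag{8.8} \]
which is equivalent to (8.7) in this case.
Since $DF(0) = DG(0) = I$ we have, by the definition of derivative,
\[ \lim_{|x| \to 0} \frac{|F(x) - x|}{|x|} = 0, \qquad \lim_{|y| \to 0} \frac{|G(y) - y|}{|y|} = 0. \]
So, for any $\varepsilon \in (0,1)$ there exists $\delta_\varepsilon > 0$ such that if $|x| < \delta_\varepsilon$ we have $x \in U \setminus C_F$ and $|F(x) - x| < \varepsilon|x|$, and if $|y| < \delta_\varepsilon$ we have $y \in F(U \setminus C_F)$ and $|G(y) - y| < \varepsilon|y|$. It follows that $r < \delta_\varepsilon$ implies
\[ |F(x)| < r \quad \forall x \in B_{(1-\varepsilon)r}(0), \qquad |G(y)| < (1 + \varepsilon)r \quad \forall y \in B_r(0). \tag{8.9} \]
In particular
\[ B_{(1-\varepsilon)r}(0) \subset G(B_r(0)) \subset B_{(1+\varepsilon)r}(0) \qquad \forall r < \delta_\varepsilon. \tag{8.10} \]
Now, by (8.10) it follows that
\[ (1 - \varepsilon)^n \le \frac{\mathscr{L}^n(G(B_r(0)))}{\omega_n r^n} \le (1 + \varepsilon)^n, \]
provided $r < \delta_\varepsilon$, and this proves that (8.8) holds.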
Step 2. Set $T = DG(0)$ and $H(x) = T^{-1}G(x)$, so that $DH(0) = I$. Then we have $G(B_r(0)) = T(H(B_r(0)))$ and so, thanks to (8.4),
\[ \mathscr{L}^n(G(B_r(0))) = \mathscr{L}^n(T(H(B_r(0)))) = |\det T|\,\mathscr{L}^n(H(B_r(0))), \]
which implies
\[ \lim_{r \downarrow 0} \frac{\mathscr{L}^n(G(B_r(0)))}{\omega_n r^n} = |\det T| \lim_{r \downarrow 0} \frac{\mathscr{L}^n(H(B_r(0)))}{\omega_n r^n} = |\det T|. \]
The proof is complete.
Example 8.5 (Polar and spherical coordinates). Let us consider the polar coordinates
\[ (\rho, \theta) \mapsto (\rho\cos\theta, \rho\sin\theta). \]
Here $U = (0, \infty) \times (0, 2\pi)$ and the critical set is empty, because the modulus of the Jacobian determinant is $\rho$.
In the case of the spherical coordinates
\[ (\rho, \theta, \varphi) \mapsto (\rho\cos\theta\sin\varphi, \rho\sin\theta\sin\varphi, \rho\cos\varphi) \]
we have $U = (0, \infty) \times (0, 2\pi) \times (0, \pi)$ and the critical set is empty, because the modulus of the Jacobian determinant is $\rho^2\sin\varphi$.
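The two Jacobian determinants of Example 8.5 can be double-checked numerically. The sketch below approximates $DF$ by central finite differences and evaluates the determinant by cofactor expansion; the step size, the test points and all function names are assumptions made for this illustration.

```python
import math


def jacobian(F, x, h=1e-6):
    """Central finite-difference approximation of the matrix DF(x)."""
    n = len(x)
    cols = []
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        cols.append([(a - b) / (2 * h) for a, b in zip(F(xp), F(xm))])
    # cols[j][i] approximates dF_i/dx_j; transpose to get rows indexed by i.
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def det(m):
    """Determinant by cofactor expansion along the first row (fine for small n)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def polar(p):
    rho, theta = p
    return [rho * math.cos(theta), rho * math.sin(theta)]


def spherical(p):
    rho, theta, phi = p
    return [rho * math.cos(theta) * math.sin(phi),
            rho * math.sin(theta) * math.sin(phi),
            rho * math.cos(phi)]
```

Comparing `abs(det(jacobian(polar, [rho, theta])))` with `rho`, and the analogous quantity for `spherical` with `rho**2 * sin(phi)`, gives agreement up to the finite-difference error.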
Theorem 8.6 (Change of variables formula). Let $U \subset \mathbb{R}^n$ be an open set and let $F : U \to \mathbb{R}^n$ be of class $C^1$, injective on $U \setminus C_F$. Then
\[ \int_{F(U)} \varphi(y)\,dy = \int_U \varphi(F(x))\,|J_F|(x)\,dx \tag{8.11} \]
for any Borel function $\varphi : F(U) \to [0, +\infty]$.
Proof. We first see that it is not restrictive to assume that $C_F = \emptyset$; indeed, $F(C_F)$ is Lebesgue negligible and so images of points in $C_F$ do not affect the left hand side, while obviously points in $C_F$ do not affect the right hand side. So, possibly replacing $U$ with $U \setminus C_F$, we can assume that $C_F = \emptyset$.
By (8.2) and (8.5) we have
\[ \int_{F(U)} \frac{\psi(y)}{|J_F|(F^{-1}(y))}\,dy = \int_U \psi(F(x))\,dx \]
for any nonnegative Borel function $\psi$. We conclude choosing $\psi(y) = \varphi(y)\,|J_F|(F^{-1}(y))$.
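As a sanity check of (8.11), one can compare both sides numerically for the polar map on $U = (0,1) \times (0, 2\pi)$, whose image is the unit disk up to a negligible slit. The sketch below uses plain midpoint sums on uniform grids; the integrand $\varphi(y) = e^{-|y|^2}$, the grid sizes and the function names are assumptions made for the example.

```python
import math


def lhs_cartesian(phi, n=400):
    """Midpoint sum of phi over the unit disk, using a Cartesian grid on [-1, 1]^2."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        y1 = -1.0 + (i + 0.5) * h
        for j in range(n):
            y2 = -1.0 + (j + 0.5) * h
            if y1 * y1 + y2 * y2 < 1.0:
                total += phi(y1, y2)
    return total * h * h


def rhs_polar(phi, n=400):
    """Midpoint sum of phi(F(rho, theta)) * |J_F| over (0, 1) x (0, 2*pi), with |J_F| = rho."""
    hr, ht = 1.0 / n, 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        rho = (i + 0.5) * hr
        for j in range(n):
            theta = (j + 0.5) * ht
            total += phi(rho * math.cos(theta), rho * math.sin(theta)) * rho
    return total * hr * ht


def phi(y1, y2):
    return math.exp(-(y1 * y1 + y2 * y2))
```

For this integrand the common value of the two sides is $\pi(1 - e^{-1})$; the Cartesian sum converges more slowly because of the cells cut by the boundary of the disk.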
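Exercises
8.1 Let $(X, \mathscr{F})$, $(Y, \mathscr{G})$ and $(Z, \mathscr{H})$ be measurable spaces and let $f : X \to Y$, $g : Y \to Z$ be measurable maps. Show that
\[ g_\#(f_\#\mu) = (g \circ f)_\#\mu \]
for any measure $\mu$ in $(X, \mathscr{F})$.
8.2 Let $f : \{0,1\}^{\mathbb{N}} \to [0,1]$ be the map associating to a sequence $(a_i) \subset \{0,1\}$ the real number $\sum_i a_i 2^{-i-1} \in [0,1]$. Show that
\[ f_\#\Bigl( \mathop{\times}_{i=0}^{\infty} \bigl( \tfrac{1}{2}\delta_0 + \tfrac{1}{2}\delta_1 \bigr) \Bigr) = 1_{[0,1]}\,\mathscr{L}^1. \]
8.3 Show the existence of a strictly increasing and $C^1$ function $F : \mathbb{R} \to \mathbb{R}$ such that $F_\#\mathscr{L}^1$ is not absolutely continuous with respect to $\mathscr{L}^1$.
8.4 Remove the injectivity assumption in Theorem 8.4, showing that
\[ F_\#(1_U \mathscr{L}^n) = \Bigl( \sum_{x \in F^{-1}(y) \setminus C_F} \frac{1}{|J_F|(x)} \Bigr) 1_{F(U \setminus C_F)}\,\mathscr{L}^n \]
for any $C^1$ function $F : U \to \mathbb{R}^n$ with Lebesgue negligible critical set.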
Appendix A
A.1. Continuity and differentiability of functions depending on a parameter
In this section we consider the following problem: we are given a metric space $(X, d)$ and a measure space $(Y, \mathscr{F}, \mu)$. Given $f : X \times Y \to \mathbb{R}$, we assume that for all $x \in X$ the function $f(x, \cdot)$ is integrable, so that the function $F : X \to \mathbb{R}$ given by
\[ F(x) := \int_Y f(x, y)\,d\mu(y), \qquad x \in X \]
is well defined. We would like to understand under which conditions $F$, an integral depending on the parameter $x$, is continuous. When $X$ is an open subset of $\mathbb{R}^n$ endowed with the Euclidean distance, it is also natural to investigate the differentiability properties of $F$.
Theorem A.1 (Continuity of F). Assume that $f(\cdot, y)$ is continuous in $X$ for $\mu$-almost all $y \in Y$ and that there exists $m \in L^1(Y, \mu)$ satisfying
\[ \sup_{x \in X} |f(x, y)| \le m(y) \qquad \text{for } \mu\text{-a.e. } y \in Y. \tag{A.1} \]
Then $F$ is bounded and continuous in $X$.
Proof. It is clear that $|F(x)| \le \|m\|_1$ for all $x \in X$. Continuity is a simple consequence of the dominated convergence theorem: indeed, if $x_n \in X$ converge to $x$, then $f(x_n, y)$ converge to $f(x, y)$ for $\mu$-almost every $y$ and the convergence is dominated because of (A.1). It follows that $F(x_n) \to F(x)$.
A more expressive way to state the continuity of $F$ is to say that limit and integral commute, namely
\[ \lim_{h \to \infty} \int_Y f(x_h, y)\,d\mu(y) = \int_Y \lim_{h \to \infty} f(x_h, y)\,d\mu(y). \]
The following example shows that if no uniform upper bound is imposed on $f$, then continuity might fail:
Example A.2. Let $X = Y = \mathbb{R}$, $\mu = \mathscr{L}^1$ and
\[ f(x, y) := \begin{cases} |x|(1 - |y||x|) & \text{if } |y||x| < 1; \\ 0 & \text{if } |y||x| \ge 1. \end{cases} \]
Then $F(x) = 1$ for all $x \ne 0$, while $F(0) = 0$. In this case the smallest possible function satisfying (A.1) is $\sup_x f(x, y) = (4|y|)^{-1}$ (the supremum is attained at $|x| = 1/(2|y|)$), which is not integrable.
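The discontinuity of $F$ at the origin in Example A.2 is easy to observe numerically. In the sketch below the integral over $\mathbb{R}$ is reduced to the interval $[-1/|x|, 1/|x|]$ outside which the integrand vanishes, and is evaluated with a midpoint sum; the function names and the number of nodes are assumptions.

```python
def f(x, y):
    """The integrand of Example A.2."""
    s = abs(y) * abs(x)
    return abs(x) * (1.0 - s) if s < 1.0 else 0.0


def F(x, n=1000):
    """Midpoint approximation of the integral of f(x, .) over R (n even)."""
    if x == 0:
        return 0.0                      # f(0, y) = 0 for every y
    half_width = 1.0 / abs(x)           # f(x, .) vanishes outside [-1/|x|, 1/|x|]
    h = 2.0 * half_width / n
    return h * sum(f(x, -half_width + (k + 0.5) * h) for k in range(n))
```

Because $f(x, \cdot)$ is piecewise linear with its only kink at a grid node, the midpoint sum reproduces the value $1$ up to rounding for every $x \ne 0$, however small, while $F(0) = 0$.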
Next, we assume that $X$ is an open set of $\mathbb{R}^n$ endowed with the Euclidean distance and we investigate the differentiability of $F$. Under suitable assumptions, we can commute derivative and integral, namely
\[ \frac{\partial}{\partial x_i} \int_Y f(x, y)\,d\mu(y) = \int_Y \frac{\partial f}{\partial x_i}(x, y)\,d\mu(y) \qquad \forall x \in X,\ i = 1, \ldots, n. \tag{A.2} \]
Theorem A.3 (Differentiability of F). Assume that for $\mu$-almost all $y \in Y$ the function $f(\cdot, y)$ is differentiable in $X$ with a continuous gradient $\nabla_x f(x, y)$ and that, for any ball $B_r(x_0) \subset X$, there exists $m \in L^1(Y, \mu)$ satisfying
\[ |f(x_0, y)| + \sup_{x \in B_r(x_0)} |\nabla_x f|(x, y) \le m(y) \qquad \text{for } \mu\text{-a.e. } y \in Y. \tag{A.3} \]
Then $F \in C^1(X)$ and (A.2) holds.
Proof. We fix $x_0 \in X$, $i \in \{1, \ldots, n\}$ and a sequence $(t_h)$ with $t_h \ne 0$ and $t_h \to 0$. The mean value theorem, applied for any $y$ such that $f(\cdot, y) \in C^1(X)$, gives $\xi_h(y) \in (0, 1)$ satisfying
\[ \frac{F(x_0 + t_h e_i) - F(x_0)}{t_h} = \int_Y \frac{\partial f}{\partial x_i}(x_0 + \xi_h(y) t_h e_i, y)\,d\mu(y). \]
For $h$ large enough (as soon as $|t_h| < r$) the functions of $y$ inside the integral are dominated by the function $m$ in (A.3), hence we can pass to the limit with the dominated convergence theorem to get (notice that the measurability of $\partial f/\partial x_i(x_0, \cdot)$ follows by the same limiting process)
\[ \frac{\partial F}{\partial x_i}(x_0) = \int_Y \frac{\partial f}{\partial x_i}(x_0, y)\,d\mu(y). \]
Finally, continuity of partial derivatives of $F$ is a consequence of the previous theorem.
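Formula (A.2) can be illustrated on a concrete integrand. The sketch below assumes $Y = [0,1]$ with the Lebesgue measure and $f(x, y) = \sin(xy)$, for which $|\partial f/\partial x| \le y \le 1$ provides the dominating function; the integrals are approximated by the midpoint rule, and all names are chosen for this example only.

```python
import math


def integrate(g, a=0.0, b=1.0, n=2000):
    """Midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))


def F(x):
    return integrate(lambda y: math.sin(x * y))          # F(x) = int_0^1 sin(xy) dy


def dF_under_integral(x):
    return integrate(lambda y: y * math.cos(x * y))      # int_0^1 d/dx sin(xy) dy


def dF_difference_quotient(x, t=1e-5):
    return (F(x + t) - F(x - t)) / (2 * t)
```

Since $F(x) = (1 - \cos x)/x$ for $x \ne 0$, the exact derivative at $x = 1$ is $\sin 1 + \cos 1 - 1$, and both `dF_under_integral(1.0)` and `dF_difference_quotient(1.0)` should be close to it.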
Of course similar statements can be given for $k$-th order derivatives of $F$, provided $f(\cdot, y)$ is $k$ times differentiable and, for any ball $B_r(x_0) \subset X$, there exists $m \in L^1(Y, \mu)$ satisfying
\[ |f(x_0, y)| + \sup_{x \in B_r(x_0)} \sup_{|p| \le k} |D^p f|(x, y) \le m(y) \qquad \text{for } \mu\text{-a.e. } y \in Y \]
(here $p = (p_1, \ldots, p_n)$ and $|p| = p_1 + \cdots + p_n$). Under this assumption one obtains that
\[ D^p \int_Y f(x, y)\,d\mu(y) = \int_Y D^p_x f(x, y)\,d\mu(y) \qquad \text{whenever } |p| \le k. \]
A.2. The dual space of continuous functions
In this section we want to characterize the space $(C(X))^*$, dual space of $C(X)$, with $(X, d)$ compact metric space. Recall that $C(X)$ is a Banach space, when endowed with the sup norm, regardless of any assumption on $(X, d)$. Some knowledge of the basic terminology of Banach spaces (dual space, dual norm) is needed for this section.
We start with some notation: we shall denote by $\mathscr{M}(X)$ the space of signed measures $\mu$, i.e. the real-valued and $\sigma$-additive set functions $\mu$, defined on $\mathscr{B}(X)$, of the form $\mu = \mu^+ - \mu^-$ with $\mu^\pm$ positive and finite Borel measures satisfying $\mu^+ \perp \mu^-$. This orthogonality condition ensures uniqueness of the decomposition of $\mu$, as we will see in a moment; existence, instead, is just a consequence of the $\sigma$-additivity (see Section 6.5), but we shall not use this fact in the sequel.
For $\mu \in \mathscr{M}(X)$ we denote by $|\mu| = \mu^+ + \mu^-$ its total variation measure, as in Section 6.5, and set
\[ \|\mu\| := |\mu|(X) = \mu^+(X) + \mu^-(X). \tag{A.4} \]
In the next proposition we show that the decomposition $\mu = \mu^+ - \mu^-$ is unique, so that (A.4) is well posed, and that $\mathscr{M}(X)$ is a normed space. The completeness of $\mathscr{M}(X)$ will be a consequence of Theorem A.6, since any dual space is complete.
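Proposition A.4. For any $\mu \in \mathscr{M}(X)$ the decomposition $\mu = \mu^+ - \mu^-$ is unique. In addition $\mathscr{M}(X)$, endowed with the norm (A.4), is a normed space.
Proof. Assume that $\mu = \mu^+ - \mu^- = \tilde\mu^+ - \tilde\mu^-$, with orthogonal decompositions. Let $A$ be a Borel set where $\mu^+$ is concentrated, so that $\mu^-$ is concentrated on $X \setminus A$, and let $\tilde A$ be an analogous Borel set for $\tilde\mu^\pm$. Since $\mu \ge 0$ (respectively $\mu \le 0$) on the subsets of $A$ (respectively of $X \setminus A$) and the same property holds for $\tilde A$, we obtain that $\mu$ (and therefore $\mu^\pm$ and $\tilde\mu^\pm$) vanishes on subsets of $A \setminus \tilde A$ and of $\tilde A \setminus A$. On the other hand, if $B \subset A \cap \tilde A$ we have $\mu^-(B) = \tilde\mu^-(B) = 0$ and $\mu^+(B) = \mu(B) = \tilde\mu^+(B)$. Analogously, if $B \subset (X \setminus A) \cap (X \setminus \tilde A)$ we have $\mu^+(B) = \tilde\mu^+(B) = 0$ and $\mu^-(B) = \tilde\mu^-(B)$. This proves that $\mu^\pm = \tilde\mu^\pm$.
Now, stability of $\mathscr{M}(X)$ under multiplication with real constants and 1-homogeneity of the norm are obvious. Let us prove stability under addition and subadditivity of the norm: if $\mu = \mu^+ - \mu^-$ and $\nu = \nu^+ - \nu^-$ we can write as before $\mu = f|\mu|$ and $\nu = g|\nu|$ with $f, g : X \to [-1, 1]$. Then, setting $\sigma = |\mu| + |\nu|$, the Radon-Nikodým theorem gives $|\mu| = a\sigma$ and $|\nu| = b\sigma$ for suitable $a, b : X \to [0, 1]$, so that
\[ \mu + \nu = f|\mu| + g|\nu| = (fa + gb)\sigma \]
and we may take $(fa + gb)^\pm \sigma$ as positive and negative parts of $\mu + \nu$. We obtain also
\[ \|\mu + \nu\| = \int_X |fa + gb|\,d\sigma \le \int_X \bigl(|a| + |b|\bigr)\,d\sigma = \|\mu\| + \|\nu\|. \]
This completes the proof of the proposition.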
We shall also denote by $\mathscr{A}(X)$ the collection of open subsets of $X$ and use the following characterization of set functions defined on $\mathscr{A}(X)$ which are restrictions of $\sigma$-additive measures defined on the Borel $\sigma$-algebra.
Proposition A.5. Let $(X, d)$ be a compact metric space and let $\mu : \mathscr{A}(X) \to [0, +\infty]$ be a nondecreasing set function satisfying $\mu(\emptyset) = 0$ and:
(i) (continuity) if $A_n \in \mathscr{A}(X)$, $n \in \mathbb{N}$, monotonically converge from below to $A$, then $\mu(A_n) \uparrow \mu(A)$;
(ii) (subadditivity) $\mu(A_1 \cup A_2) \le \mu(A_1) + \mu(A_2)$ for all $A_1, A_2 \in \mathscr{A}(X)$;
(iii) (additivity on disjoint sets) $\mu(A_1 \cup A_2) = \mu(A_1) + \mu(A_2)$ whenever $A_1 \in \mathscr{A}(X)$ and $A_2 \in \mathscr{A}(X)$ are disjoint.
Then
\[ \mu^*(B) := \inf\{ \mu(A) : A \in \mathscr{A}(X),\ A \supset B \} \tag{A.5} \]
is a $\sigma$-additive extension of $\mu$ to $\mathscr{B}(X)$.
Proof. Notice first that $\mu$ is $\sigma$-subadditive on $\mathscr{A}(X)$: indeed, if $A \subset \bigcup_i A_i$ and $B$ is an open set with compact closure in $A$, then $B$ is contained in the union of finitely many $A_i$'s, so that (ii) gives
\[ \mu(B) \le \sum_{i=1}^{\infty} \mu(A_i). \]
Since $B$ is arbitrary, (i) gives $\mu(A) \le \sum_i \mu(A_i)$.
Now, if we take (A.5) as the definition of $\mu^*$ for all subsets of $X$, Proposition 1.16 gives that $\mu^*$ extends $\mu$ and is $\sigma$-subadditive. Then, Theorem 1.17 gives that $\mu^*$ is $\sigma$-additive on the Borel $\sigma$-algebra, provided we are able to show that any Borel set is additive. Since the class of additive sets is a $\sigma$-algebra, it suffices to show that any closed set is additive.
To this aim, we first show that $\mu^*$ is additive on distant sets, namely (recall that $\operatorname{dist}(U, V)$ is the infimum of the distances $d(x, y)$ for $x \in U$ and $y \in V$)
\[ \mu^*(B_1 \cup B_2) = \mu^*(B_1) + \mu^*(B_2) \qquad \text{whenever } \operatorname{dist}(B_1, B_2) > 0. \tag{A.6} \]
Indeed, if $A \supset B_1 \cup B_2$ is open we can consider the disjoint open sets
\[ A_1 := \{ x \in A : \operatorname{dist}(x, B_1) < \operatorname{dist}(x, B_2) \}, \qquad A_2 := \{ x \in A : \operatorname{dist}(x, B_2) < \operatorname{dist}(x, B_1) \} \]
containing $B_1$ and $B_2$ respectively to get
\[ \mu(A) \ge \mu(A_1 \cup A_2) = \mu(A_1) + \mu(A_2) \ge \mu^*(B_1) + \mu^*(B_2). \]
Since $A$ is arbitrary the inequality $\ge$ in (A.6) follows, while the converse one is a consequence of subadditivity.
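Let $F \subset X$ be closed, $B \subset X$ and let us prove that $\mu^*(B \cap F) + \mu^*(B \setminus F) \le \mu^*(B)$ (the opposite inequality follows by subadditivity). Assuming with no loss of generality $\mu^*(B) < \infty$ and setting
\[ B_h := \bigl\{ x \in B : 2^{-h} > \operatorname{dist}(x, F) \ge 2^{-h-1} \bigr\}, \qquad h \in \mathbb{Z}, \]
the additivity on distant sets gives
\[ \sum_{h \in \mathbb{Z}} \mu^*(B_{2h}) \le \mu^*(B) < \infty, \qquad \sum_{h \in \mathbb{Z}} \mu^*(B_{2h+1}) \le \mu^*(B) < \infty \]
because all finite sums are made on distant sets, all contained in $B$. We have then that $\sum_{h \in \mathbb{Z}} \mu^*(B_h)$ is convergent and, since the sets $B_h$ are a partition of $B \setminus F$, using once more the additivity on distant sets we get
\[ \mu^*(B \cap F) + \mu^*(B \setminus F) \le \mu^*(B \cap F) + \mu^*\Bigl( \bigcup_{h=-\infty}^{N} B_h \Bigr) + \sum_{h=N+1}^{\infty} \mu^*(B_h) = \mu^*\Bigl( (B \cap F) \cup \bigcup_{h=-\infty}^{N} B_h \Bigr) + \sum_{h=N+1}^{\infty} \mu^*(B_h) \le \mu^*(B) + \sum_{h=N+1}^{\infty} \mu^*(B_h) \]
for any $N \ge 1$. Letting $N \to \infty$ the inequality follows.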
For $g \in C(X)$ we can define
\[ \int_X g\,d\mu := \int_X g\,d\mu^+ - \int_X g\,d\mu^-. \]
In this way $\int g\,d\mu$ is linear w.r.t. $g$; in addition, since
\[ \int_X h\,d\mu = \int_0^\infty \mu^+(\{h > t\})\,dt - \int_0^\infty \mu^-(\{h > t\})\,dt = \int_0^\infty \mu(\{h > t\})\,dt \]
whenever $h$ is nonnegative, splitting $g$ in positive and negative part we obtain that $\int_X g\,d\mu$ is also linear w.r.t. $\mu$. Since
\[ \Bigl| \int_X g\,d\mu \Bigr| \le \int_X |g|\,d\mu^+ + \int_X |g|\,d\mu^- \le \max|g|\,\|\mu\| = \|g\|\,\|\mu\| \qquad \forall g \in C(X) \]
the functional
\[ L_\mu(g) := \int_X g\,d\mu, \qquad g \in C(X) \tag{A.7} \]
belongs to $(C(X))^*$ and satisfies $\|L_\mu\| \le \|\mu\|$. The remarkable fact is that any element in the dual is representable in this form, and that equality holds. This will also prove that $\mathscr{M}(X)$ is a Banach space (with the definition of $\mathscr{M}(X)$ given above, independent of Section 6.5, it is not even totally obvious that it is a linear space!).
Theorem A.6 (Riesz). Let $(X, d)$ be a compact metric space. The space $(C(X))^*$ is, via (A.7), isomorphic and isometric to $\mathscr{M}(X)$. That is: all functionals $L_\mu$ belong to $(C(X))^*$ and, for any $L \in (C(X))^*$, there exists a unique $\mu \in \mathscr{M}(X)$ satisfying $L = L_\mu$. Finally, $\|L_\mu\| = \|\mu\|$.
Proof. The proof will be achieved in three steps. In the first one we build an auxiliary positive finite measure $\mu^*$ and prove in the second one that $\mu^*$ provides the desired representation of $L$ when $L$ is nondecreasing. In the last one we achieve the general case and prove equality of the norms.
Step 1. Let
: A(X) [0. +) be dened by
(A) := sup {|L(g)| : |g| 1. supp g A} .

Notice that
(X) L and that
() = 0. Notice also that we can

equivalently replace |L(g)| with L(g) inside the supremum and that a
simple approximation argument gives
(A) |L(g)| whenever |g| 1

A
. (A.8)
Indeed, if |g| 1
A
we can nd continuous functions g
n
: X [1. 1]
convergent to g and with support contained in A. In addition, if L is
monotone we have also
(A) L() whenever 1

A
. (A.9)
We claim that
satises all the assumption of Proposition A.5. Indeed,

if g C(X) has support contained in A, since the support is compact
we have K A
i
for i large enough; it follows that L(g)
(A
i
)
sup
j

(A
j
) and since g is arbitrary the continuity follows. In order to
prove the subadditivity, given a continuous g : X [1. 1] with support
K contained in A
1
A
2
, we can consider the disjoint compact sets K \
A
1
and K \ A
2
and a continuous function : X [0. 1] identically
equal to 1 in a neighbourhood of K \ A
1
and identically equal to 0 in a
neighbourhood of K \ A
2
. It follows that (1 )g has support contained
in A
1
and g has support contained in A
2
, hence
L(g) = L((1 )g) + L(g)
(A
1
) +
(A
2
).
Since g is arbitrary, the subadditivity of
follows. Finally, to prove the

additivity on disjoint sets it sufces to notice that, given g
i
with support
in A
i
and |g
i
| 1, the function g = g
1
+ g
2
has support in A
1
A
2
and
satises L(g) = L(g
1
) + L(g
2
) and |g| 1.
By Proposition A.5 we obtain that
is the restriction to A(X) of a

positive measure
. Notice also that
is nite, since
(X) =
(X) = L. (A.10)
Step 2. Now we claim that L
|L|, namely L
(g) |L(g)| for any

nonnegative g C(X). Also, we shall prove that if L is nondecreasing,
namely L(g) 0 whenever g C(X) is nonnegative, then L
coincides
with L. This proves already Riesz theorem for positive functionals.
By homogeneity, in the proof of the inequality L
(g) |L(g)|, it is
not restrictive to assume 0 g 1. Given an integer N 1, let us
consider the open sets A
i
:= {g > i ,N}, i = 0. . . . . N 1, and notice
that
1
N
+
N1
i =1
1
N
1
A
i
g
N1
i =1
1
A
i
. (A.11)
Now, given continuous functions
i
: X [0. 1] satisfying 1
A
i

i

1
A
i 1
, i = 1. . . . . N, we can use (A.8) to estimate
L
(g)
N1
i =1
1
N

(A
i
)
N1
i =1
1
N
|L(
i +1
)|
L
_
1
N
N
i =2
i
_
.
But, since
1
N
+
N1
i =1
1
N

i
g
N1
i =1
1
N

i +1
(A.12)
we can let N and use the continuity of L to get L
(g) |L(g)|.
If L is also monotone we can use the inequality (A.9) to get
L
(g)
1
N

N1
i =1
1
N

(A
i
)
N1
i =1
1
N
L(
i
) = L
_
1
N
N1
i =1
i
_
.
Again we can let N and use (A.12) to get L
(g) = L(g).
Step 3. Now we dene linear continuous functionals L
: C(X) R
by
L
+
(g) :=
L
(g) + L(g)
2
. L
(g) :=
L
(g) L(g)
2
.
We have L
+
+ L
= L
and L
+
L
= L. In addition, by Step 2, L
are monotone.
Now we can apply the construction of Step 1 and use monotonicity in
Step 2 to nd positive nite measures
such that L
= L
. It follows
that
L = L
+
L
= L
+ L
= L

and the representation of L follows. Analogously, we obtain that
L
= L
+
+ L
= L
+ + L
= L
+
+
so that
=
+
+
. To conclude, we identify with L and show

that
+
and
are orthogonal. The bound on follows by (A.10):

=
+
(X) +
(X) = L
+(1) + L
(1) = L
(1) = L.
In order to show that
+

, write
= a
and use the identity
=
+
+
to get a
+
+ a
= 1
a.e. in X. On the other hand

the density of C(X) in L
2
(X.
) and a truncation argument provide

a sequence of continuous functions g
n
: X [1. 1] convergent in
L
2
(X.
) to the sign of a
+
a
, so that
L = sup
|g|1
|L
(g)| = sup
|g|1
_
(a
+
a
)g d
=
_
X
|a
+
a
| d
.
Hence
_
X
(1 |a
+
a
|) d
(X) L 0.
Since |a
+
a
| 1 it must be |a
+
a
| = 1
a.e. in X. Since
a
[0. 1] a.e., this can only happen if a

+
a
= 0
a.e. in X,
which means that
+
is orthogonal to
.
Remark A.7. A similar result holds, with minor changes in the proof, if $(X, d)$ is locally compact and separable, namely there exists a nondecreasing sequence of open sets with compact closure whose union is the whole of $X$. In this case $C(X)$ has to be replaced by $C_0(X)$, namely the closure in $C(X)$ of the space $C_c(X)$ of compactly supported functions, while $\mathscr{M}(X)$ remains unchanged.
Solutions of some exercises
In this chapter we provide solutions to the main exercises proposed in the
text, and in particular of those marked with one or two -.
Chapter 1
Exercise 1.1. All verifications are very simple and we omit them.
Exercise 1.2. We prove the statement for the translations, the proof for the dilations being similar. Fix $h \in \mathbb{R}$ and consider the class
\[ \mathscr{F} := \{ A \in \mathscr{B}(\mathbb{R}) : A + h \in \mathscr{B}(\mathbb{R}) \}. \]
Then $\mathscr{F}$ is a $\sigma$-algebra containing the intervals, because the class $\mathscr{I}$ of intervals is invariant under translations. Therefore $\mathscr{F} \supset \sigma(\mathscr{I}) = \mathscr{B}(\mathbb{R})$. This proves that $A + h$ is Borel whenever $A$ is Borel.
Exercise 1.3. Set $X = \mathbb{N}$ and $\mu := \sum_n \delta_n$. Then the sets $A_n := \{n, n+1, \ldots\}$ satisfy $\mu(A_n) = +\infty$, but their intersection is empty.
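Exercise 1.4. Let $(A_n) \subset \mathscr{A}$ with $A_n \uparrow A$, $A \in \mathscr{A}$. Then the sets $B_n := A \setminus A_n$ satisfy $B_n \downarrow \emptyset$, so that by assumption $\mu(B_n) \downarrow \mu(\emptyset) = 0$. Since $\mu$ is finite, $\mu(B_n) = \mu(A) - \mu(A_n)$, so that $\mu(A_n) \uparrow \mu(A)$.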
Exercise 1.5. For any integer $n \ge 1$ the set $A_n$ of all atoms $x$ such that $\mu(\{x\}) \ge 1/n$ has at most cardinality $n\mu(X)$: indeed, if we choose $k$ elements $x_1, \ldots, x_k$ in this set, adding the inequalities $\mu(\{x_i\}) \ge 1/n$ we find $k/n \le \mu(X)$, whence the upper bound on the cardinality of $A_n$ follows. If $\mu$ is $\sigma$-finite, we choose $X_i \uparrow X$ with $X_i \in \mathscr{E}$ and $\mu(X_i) < \infty$ and repeat the previous argument with the sets $A_{i,n} := \{ x \in X_i : \mu(\{x\}) \ge 1/n \}$, whose union gives the set of all atoms. If no finiteness assumption is made, the statement fails: take $X = \mathbb{R}$, $\mathscr{E} = \mathscr{P}(\mathbb{R})$ and $\mu(A) = 0$ if $A = \emptyset$ and $\mu(A) = +\infty$ otherwise.
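Exercise 1.6. Let $\mu$ be diffuse. First we prove that for all $\delta \in (0,1)$ and all $A \in \mathcal{E}$ with $\mu(A) > 0$ there exists a subset $B \in \mathcal{E}$ of $A$ with $0 < \mu(B) < \delta\mu(A)$. Indeed, if this property fails for some $\delta$ and $A$, for all subsets $B$ of $A$ either $\mu(B) = 0$ or $\mu(B) \geq \delta\mu(A)$. Now, choose $B_1 \subset A$ with $\mu(B_1) \in (0, \mu(A))$ (this is possible by assumption), then $B_2 \subset A \setminus B_1$ with $\mu(B_2) \in (0, \mu(B_1))$ and so on. Since all these sets are contained in $A$ and have positive measure, we have $\mu(B_i) \geq \delta\mu(A)$, and this contradicts the fact that they are disjoint (their measures would add up to more than $\mu(A)$).

Now, given $t \in (0, \mu(X))$ we define a sequence of pairwise disjoint sets $B_i$ and numbers $s_i$ as follows: first set
\[ s_1 := \sup\{\mu(B) : \mu(B) \leq t\} \]
and then choose $B_1$ with $t \geq \mu(B_1) > s_1/2$; then recursively set
\[ s_{n+1} := \sup\Bigl\{\mu(B) : B \subset X \setminus \bigcup_{i \leq n} B_i,\ \mu(B) \leq t - \sum_{i \leq n} \mu(B_i)\Bigr\} \]
and choose $B_{n+1} \subset X \setminus \bigcup_{i \leq n} B_i$ with $t - \sum_{i \leq n}\mu(B_i) \geq \mu(B_{n+1}) > s_{n+1}/2$. We now claim that $\mu(\bigcup_i B_i) = t$. If this property fails, then $\sum_i \mu(B_i) < t$ and the convergence of the series implies that $s_i \to 0$. On the other hand
\[ s_i \geq \sup\Bigl\{\mu(B) : B \subset X \setminus \bigcup_i B_i,\ \mu(B) \leq t - \sum_i \mu(B_i)\Bigr\}. \]
The previous property with $A = X \setminus \bigcup_i B_i$ and $\delta = (t - \sum_i \mu(B_i))/\mu(A)$ shows that the supremum in the right hand side (independent of $i$) is positive, contradicting the fact that $s_i \to 0$.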
Exercise 1.7. Let $X$ be a separable metric space and let $\mathcal{E} = \mathcal{B}(X)$. If $\mu(\{x\}) > 0$ for some $x \in X$, obviously $\mu$ is not diffuse. Conversely, if $A \in \mathcal{B}(X)$ is given, with $\mu(A) > 0$ and $\mu(B) \in \{0, \mu(A)\}$ for all $B \subset A$, we can fix a countable dense set $(x_i) \subset X$ and define
\[ r_0 := \sup\{r \geq 0 : \mu(A \cap B_r(x_0)) = 0\}. \]
Since $r \mapsto \mu(A \cap \overline{B}_r(x_0))$ is right continuous, the maximality of $r_0$ easily implies that $\mu(A \cap \overline{B}_{r_0}(x_0)) > 0$, and therefore $\mu(A \cap \overline{B}_{r_0}(x_0)) = \mu(A)$. Now we iterate this construction, setting $A_1 := A \cap \overline{B}_{r_0}(x_0)$, defining
\[ r_1 := \sup\{r \geq 0 : \mu(A_1 \cap B_r(x_1)) = 0\}, \]
so that $\mu(A_1 \cap \overline{B}_{r_1}(x_1)) = \mu(A_1) = \mu(A)$. Continuing in this way, we have a nonincreasing family of sets $(A_i)$ with $\mu(A_i) = \mu(A)$; it follows that $\mu(\bigcap_i A_i) = \mu(A) > 0$. On the other hand, any point $x \in \bigcap_i A_i$ satisfies
\[ d(x, x_i) = r_i \qquad \forall i \in \mathbb{N}. \]
By the density of the family $(x_i)$, this intersection contains at most one point (and at least one, because the measure is positive). It follows that this point is an atom of $\mu$.
Exercise 1.8. Cantor's middle third set can be obtained as follows: let $C_0 = [0,1]$, let $C_1$ be the set obtained from $C_0$ by removing the interval $(1/3, 2/3)$, let $C_2$ be the set obtained from $C_1$ by removing the intervals $(1/9, 2/9)$ and $(7/9, 8/9)$, and so on. Each set $C_n$ consists of $2^n$ disjoint closed intervals with length $3^{-n}$, so that $\lambda(C_n) = (2/3)^n \to 0$. It follows that the intersection $C$ of all sets $C_n$ is a closed and negligible set.

In order to show that $C$ has the cardinality of continuum (at this stage it is not even obvious that $C \neq \emptyset$!) we recall that numbers $x \in [0,1]$ can be represented with a ternary, instead of a decimal, expansion: this means that we can write
\[ x = \sum_{i \geq 1} a_i 3^{-i} = 0.a_1 a_2 a_3 \ldots \]
with the ternary digits $a_i \in \{0, 1, 2\}$. As for decimal expansions, this representation is not unique; for instance $1/3$ can be written either as $0.1$ or as $0.0222\ldots$, and $2/3$ can be written either as $0.2$ or as $0.1222\ldots$. It is easy to check that $C_1$ corresponds to the set of numbers that can be expressed by a ternary representation not having 1 as first digit, $C_2$ corresponds to the set of numbers that admit a representation not having 1 as a first or second digit, and so on. It follows that $C$ is the set of numbers that admit a ternary representation not using the digit 1: since the map
\[ (a_1, a_2, \ldots) \in \{0,2\}^{\mathbb{N}} \mapsto x = \sum_{i=1}^{\infty} a_i 3^{-i} \]
provides a bijection of $\{0,2\}^{\mathbb{N}}$ with $C$, and the cardinality of $\{0,2\}^{\mathbb{N}}$ is the continuum, this proves that $C$ has the cardinality of continuum.
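The two facts used above (the length $(2/3)^n$ of $C_n$ and the description of points of $C$ through ternary digits in $\{0,2\}$) can be checked on finite levels with exact rational arithmetic. The following is only an illustrative sketch; the function names are hypothetical and not part of the text.

```python
from fractions import Fraction

def cantor_intervals(n):
    """Return the 2**n closed intervals whose union is C_n."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        next_level = []
        for a, b in intervals:
            third = (b - a) / 3
            # keep the left and right thirds, drop the open middle third
            next_level.append((a, a + third))
            next_level.append((b - third, b))
        intervals = next_level
    return intervals

def total_length(intervals):
    """Lebesgue measure of a finite disjoint union of intervals."""
    return sum(b - a for a, b in intervals)

def point_from_digits(digits):
    """Map finitely many ternary digits a_1, a_2, ... to sum a_i 3^{-i}."""
    return sum(Fraction(d, 3 ** (i + 1)) for i, d in enumerate(digits))

def in_level(x, n):
    """Check whether x belongs to C_n."""
    return any(a <= x <= b for a, b in cantor_intervals(n))
```

For instance, `point_from_digits([2, 0, 2, 2])` uses only the digits 0 and 2 and therefore lies in every level $C_n$, while $1/2 = 0.111\ldots$ is already excluded from $C_1$.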
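Exercise 1.9. Let $\{q_n\}_{n \in \mathbb{N}}$ be an enumeration of the rational numbers in $[0,1]$, and set
\[ A := \bigcup_{n=0}^{\infty} \Bigl(q_n - \frac{\varepsilon}{4}2^{-n},\ q_n + \frac{\varepsilon}{4}2^{-n}\Bigr). \]
Then $A \subset \mathbb{R}$ is open and $\lambda(A) < \sum_n \varepsilon 2^{-n-1} = \varepsilon$ (why is the inequality strict?). Therefore $[0,1] \setminus A$ has Lebesgue measure strictly greater than $1 - \varepsilon$ and an empty interior, because $[0,1] \setminus A$ does not intersect $\mathbb{Q}$.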
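Exercise 1.11. Let $\{I_n\}_{n \in \mathbb{N}}$ be an enumeration of the open intervals with rational endpoints of $(0,1)$. By the construction in Exercise 1.9, for any interval $I$ and any $\delta \in (0, \lambda(I))$ we can find a compact set $C \subset I$ with an empty interior such that $0 < \lambda(C) < \delta$. We will define
\[ E := \bigcup_{i=0}^{\infty} C_i \]
where $C_n \subset I_n$ are compact sets with an empty interior, $\lambda(C_n) > 0$ and $\lambda(C_n) < \delta_n$. The choice of $C_n$ and $\delta_n$ will be done recursively. Notice first that
\[ \lambda(E \cap I_n) \geq \lambda(C_n) > 0 \qquad \forall n \in \mathbb{N}, \]
so we have only to take care of the condition $\lambda(E \cap I_n) < \lambda(I_n)$. Set $\varepsilon_n = \lambda(I_n \setminus \bigcup_0^n C_i)$ and notice that $\varepsilon_n > 0$ because all $C_i$ have an empty interior. Since
\[ \lambda(I_n \cap E) \leq \lambda\Bigl(I_n \cap \bigcup_0^n C_i\Bigr) + \sum_{i=n+1}^{\infty} \delta_i = \lambda(I_n) - \varepsilon_n + \sum_{i=n+1}^{\infty} \delta_i \]
it suffices to choose $\delta_n$ (and $C_n$) in such a way that $\sum_{n+1}^{\infty} \delta_i < \varepsilon_n$. This is possible, choosing for instance $\delta_{n+1} > 0$ satisfying
\[ \delta_{n+1} < \min\Bigl\{\frac{1}{2}\varepsilon_n,\ \frac{1}{4}\varepsilon_{n-1},\ \ldots,\ \frac{1}{2^{n+1}}\varepsilon_0\Bigr\}, \]
to get $\delta_i < 2^{n-i}\varepsilon_n$ for $i > n$.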
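Exercise 1.12. Let $A$ be $\mu$-measurable and let $B, C \in \mathcal{E}$ be satisfying $A \Delta B \subset C$ and $\mu(C) = 0$. For any set $D \subset X$ we have, by monotonicity of $\mu^*$,
\[ \mu^*(D \cap A) + \mu^*(D \setminus A) \leq \mu^*(D \cap (B \cup C)) + \mu^*((D \setminus B) \cup C). \]
Since $\mu^*(D \cap C) \leq \mu^*(C) = \mu(C) = 0$, by using twice the subadditivity of $\mu^*$ and then the additivity of $B$ we get
\[ \mu^*(D \cap A) + \mu^*(D \setminus A) \leq \mu^*(D \cap B) + \mu^*(D \setminus B) = \mu^*(D). \]
Since $D$ is arbitrary, this proves that $A$ is additive.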
Exercise 1.13. The statement is trivial if $\mu^*(A) = \infty$. If not, for any integer $n \geq 1$ we can find, by the definition of $\mu^*$, a countable union $A_n$ of sets of $\mathcal{N}$ such that $A_n \supset A$ and $\mu(A_n) \leq \mu^*(A) + 1/n$. Then, setting $B := \bigcap_n A_n$ we have $B \supset A$ and $\mu(B) \leq \inf_n \mu^*(A) + 1/n = \mu^*(A)$. The inequality $\mu(B) \geq \mu^*(A)$ follows by the monotonicity of $\mu^*$, taking into account that $\mu^*(B) = \mu(B)$.
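Exercise 1.14. $\mathcal{E}_\mu$ is a $\sigma$-algebra: stability under complement is immediate, because $A^c \Delta B^c = A \Delta B$; if $A_i \Delta B_i \subset C_i$, then $(\bigcup_i A_i) \Delta (\bigcup_i B_i) \subset \bigcup_i C_i$, and since $\mu$-negligible sets are stable under countable unions, this proves that $\mathcal{E}_\mu$ is stable under countable unions.

The extension $\mu(A) := \mu(B)$, where $B \in \mathcal{E}$ is any set such that $A \Delta B$ is contained in a $\mu$-negligible set of $\mathcal{E}$, is well defined and $\sigma$-additive on $\mathcal{E}_\mu$: if $A \Delta B \subset C$ and $A \Delta B' \subset C'$, then $B \Delta B' \subset C \cup C'$; consequently, if $\mu(C) = \mu(C') = 0$ it must be $\mu(B) = \mu(B')$. The $\sigma$-additivity can be proven with an argument analogous to the one used to show that $\mathcal{E}_\mu$ is a $\sigma$-algebra.

The $\mu$-negligible sets of $\mathcal{E}_\mu$ are characterized by the property of being contained in a $\mu$-negligible set of $\mathcal{E}$: if $A \in \mathcal{E}_\mu$ is $\mu$-negligible, there exist $\mu$-negligible sets $B, C \in \mathcal{E}$ with $A \Delta B \subset C$; as a consequence $A$ is contained in the $\mu$-negligible set $B \cup C \in \mathcal{E}$. Conversely, if $A \subset X$ is contained in a $\mu$-negligible set $C \in \mathcal{E}$ we may take $B = \emptyset$ to conclude that $A \in \mathcal{E}_\mu$ and $\mu(A) = 0$.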
Exercise 1.15. Let $A$ be additive; by Exercise 1.13 we can find a set $B \in \mathcal{E}$ containing $A$ with $\mu(B) = \mu^*(A)$. The additivity of $A$ and the equality $\mu^*(B) = \mu(B)$ give
\[ \mu(B) = \mu^*(A) + \mu^*(B \setminus A). \]
As a consequence $\mu^*(B \setminus A) = 0$. Now we apply Exercise 1.13 again, to find a $\mu$-negligible set $C \in \mathcal{E}$ containing $B \setminus A$. It follows that $A \Delta B$ is contained in $C$, and therefore $A$ is $\mu$-measurable.
Exercise 1.16. Let us first build a family of pairwise disjoint sets $\{A_i\}_{i \in I} \subset \mathcal{P}(\mathbb{N})$, with $I = \mathbb{N}$ and all sets $A_i$ having an infinite cardinality and $\bigcup_i A_i = \mathbb{N}$ (the construction of the $\sigma$-algebra will be more clear if we keep $I$ and $\mathbb{N}$ distinct). The family $\{A_i\}$ can be obtained, for instance, through a bijective correspondence $S$ between $\mathbb{N} \times \mathbb{N}$ and $\mathbb{N}$, setting $A_i := S(\{i\} \times \mathbb{N})$. Then, we define $\varphi : \mathbb{N} \to I$ by $\varphi(n) = i$, where $i \in I$ is the unique index such that $n \in A_i$, and (with the convention $\varphi^{-1}(\emptyset) = \emptyset$)
\[ \mathcal{F} := \{\varphi^{-1}(J) : J \subset I\}. \]
It is immediate to check that $\mathcal{F}$ is a $\sigma$-algebra, that $A_i = \varphi^{-1}(\{i\}) \in \mathcal{F}$ and that any nonempty set in $\mathcal{F}$ contains one of the sets $A_i$. Therefore $\mathcal{F}$ contains infinitely many sets, and all of them except $\emptyset$ have an infinite cardinality.
Exercise 1.17. It suffices to define $\mu(A) = 0$ if $A$ has a finite cardinality, and $+\infty$ otherwise. A finite union of sets has an infinite cardinality if and only if at least one of the sets has an infinite cardinality, and this shows that $\mu$ is additive.

The solutions of the next exercises require a more advanced knowledge of set theory, and in particular the theory of ordinals, transfinite induction, the behavior of cardinality under unions and products, and Zorn's lemma. We shall denote by $\omega$ the smallest uncountable ordinal and by $\mathfrak{c}$ the cardinality of continuum.
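Exercise 1.18. Notice that $\mathcal{F}^{(j)} \subset \sigma(\mathcal{K})$ implies
\[ \Bigl\{\bigcup_{k=0}^{\infty} A_k,\ B^c : (A_k) \subset \mathcal{F}^{(j)},\ B \in \mathcal{F}^{(j)}\Bigr\} \subset \sigma(\mathcal{K}). \]
Therefore, if $i$ is the successor of $j$, we obtain $\mathcal{F}^{(i)} \subset \sigma(\mathcal{K})$; analogously, if $i$ has no predecessor, and $\mathcal{F}^{(j)} \subset \sigma(\mathcal{K})$ for all $j \in i$, then $\bigcup_{j \in i} \mathcal{F}^{(j)}$, namely $\mathcal{F}^{(i)}$, is contained in $\sigma(\mathcal{K})$. Using these two facts, one obtains by transfinite induction that $\mathcal{F}^{(i)} \subset \sigma(\mathcal{K})$ for all $i \in \omega$. An analogous induction argument shows that $\mathcal{F}^{(i)} \subset \mathcal{F}^{(j)}$ whenever $i \leq j$.

So, the union $\mathcal{U} := \bigcup_{i \in \omega} \mathcal{F}^{(i)}$ is contained in $\sigma(\mathcal{K})$ and, to prove that equality holds, it suffices to show that this union is a $\sigma$-algebra. Let $(B_k) \subset \mathcal{U}$ and let $i_k \in \omega$ be such that $B_k \in \mathcal{F}^{(i_k)}$. Since the $i_k$ are countable and $\omega$ is uncountable we have $i := \bigcup_k i_k \in \omega$ and all sets $B_k$ belong to $\mathcal{F}^{(i)}$. It follows that their union belongs to $\mathcal{F}^{(j)}$, where $j$ is the successor of $i$, and therefore to $\mathcal{U}$. An analogous (and simpler) argument proves that $\mathcal{U}$ is stable under complement.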
Exercise 1.19. Obviously $\mathcal{B}(\mathbb{R})$ has at least the cardinality of continuum, so we need only to show an upper bound on the cardinality of $\mathcal{B}(\mathbb{R})$. The proof is based on two facts: a union $\bigcup_{i \in J} X_i$ has cardinality not greater than $\mathfrak{c}$ if the index set $J$ and all sets $X_i$ have cardinality not greater than $\mathfrak{c}$, and the same holds for a product $\prod_{i \in J} X_i$ provided $J$ is countable. Let $\mathcal{F}^{(i)}$ be defined as in Exercise 1.18, with $\mathcal{K}$ having at most the cardinality of continuum. Using the previous property of products, with $J$ countable, one can prove by transfinite induction that, for all $i \in \omega$, $\mathcal{F}^{(i)}$ has at most cardinality $\mathfrak{c}$. If we choose as $\mathcal{K}$ the class of intervals, whose cardinality is (at most) $\mathfrak{c}$, we find
\[ \mathcal{B}(\mathbb{R}) = \sigma(\mathcal{K}) = \bigcup_{i \in \omega} \mathcal{F}^{(i)}. \]
Now we use the above mentioned property of unions, with $J = \omega$ and $X_i = \mathcal{F}^{(i)}$, to conclude that $\mathcal{B}(\mathbb{R})$ has at most the cardinality of continuum.
Exercise 1.20. Obviously $\mathcal{L}$ has a cardinality not greater than the cardinality of $\mathcal{P}(\mathbb{R})$; by Bernstein's theorem (1) it suffices to show that the cardinality of $\mathcal{P}(\mathbb{R})$ is not greater than the cardinality of $\mathcal{L}$: if $C$ is the Cantor set of Exercise 1.8, we know that $\mathcal{P}(\mathbb{R})$ is in one-to-one correspondence with $\mathcal{P}(C)$, because $C$ has the cardinality of continuum; on the other hand, any subset of $C$ obviously belongs to $\mathcal{L}$, because $C$ has null Lebesgue measure.
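Exercise 1.21. Let $\mathcal{E} \subset \mathcal{P}(X)$ be a $\sigma$-algebra. Assume by contradiction that $\mathcal{E}$ is infinite and countable. We define the equivalence relation
\[ y \sim y' \quad \text{if and only if} \quad (y \in B \Leftrightarrow y' \in B) \ \ \forall B \in \mathcal{E} \]
and let $\mathcal{F}$ be the partition of $X$ in equivalence classes. We now prove that $\mathcal{F} \subset \mathcal{E}$. Indeed, let $F \in \mathcal{F}$ and fix $f \in F$; for any $x \notin F$ we have $f \not\sim x$, so there must be $B \in \mathcal{E}$ such that $f \in B$ and $x \notin B$ (or the opposite, but then we may consider $B^c$); given this set $B$, for any $g \in F$ we have that $g \sim f$ implies $g \in B$, so that $F \subset B$. Since $x$ is arbitrary we conclude that
\[ F = \bigcap_{B \in \mathcal{E},\ F \subset B} B. \]
Now, since $\mathcal{E}$ is countable, it follows that $F \in \mathcal{E}$. We eventually note that any set in $\mathcal{E}$ is a union of sets in $\mathcal{F}$: but then, if $\mathcal{F}$ were finite then $\mathcal{E}$ would be finite, whereas if $\mathcal{F}$ were infinite then $\mathcal{E}$ would be uncountable.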
Exercise 1.22. We define $\mathcal{F}$ as in the solution of the previous exercise; in this case it has finite cardinality, say $n$; consequently, there are $2^n$ sets in $\mathcal{E}$.
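On a finite set the counting argument can be carried out explicitly. The sketch below is an illustration only, with hypothetical names: it assumes the algebra is given through finitely many generating sets, computes the equivalence classes (atoms) of the relation from Exercise 1.21 restricted to the generators, and lists all unions of atoms.

```python
from itertools import combinations

def atoms(X, generators):
    """Classes of the relation: x ~ y iff x and y lie in the same generators."""
    classes = {}
    for x in X:
        signature = tuple(x in G for G in generators)
        classes.setdefault(signature, set()).add(x)
    return [frozenset(c) for c in classes.values()]

def generated_algebra(X, generators):
    """Return the atoms and all unions of atoms (the generated algebra)."""
    parts = atoms(X, generators)
    algebra = set()
    for r in range(len(parts) + 1):
        for chosen in combinations(parts, r):
            # the empty choice yields the empty set
            algebra.add(frozenset().union(*chosen))
    return parts, algebra
```

With $X = \{0, \ldots, 5\}$ and generators $\{0,1,2\}$, $\{2,3\}$ there are $n = 4$ atoms, hence $2^4 = 16$ sets in the generated algebra.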
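Exercise 1.23. We define $\mathcal{F}$ as in the solution of Exercise 1.21; we also adapt the above argument to show again that $\mathcal{F} \subset \mathcal{E}$. Indeed, let $F \in \mathcal{F}$ and fix $f \in F$; for any $x \notin F$ we have $f \not\sim x$, so there must be $B = B_{F,x} \in \mathcal{E}$ such that $f \in B$ and $x \notin B$; and again $F \subset B_{F,x}$. Hence
\[ F = \bigcap_{x \in X,\ x \notin F} B_{F,x} \]
and this proves that $F \in \mathcal{E}$, since $X$ is countable. We then use the Axiom of Choice to define a function $\varphi : \mathcal{F} \to X$ such that $\varphi(F) \in F$,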
and eventually dene =

FF
(F)
(x)
.
(1) If $A$ has cardinality not greater than $B$, and $B$ has cardinality not greater than $A$, then there exists a bijection between $A$ and $B$.
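Exercise 1.24. We begin our construction with an algebra $\mathcal{A}_0$ in $\mathcal{P}(\mathbb{N})$ and $\mu_0 : \mathcal{A}_0 \to \{0,1\}$ which is additive but not $\sigma$-additive. For instance we may take as $\mathcal{A}_0$ the algebra generated by singletons $\{x\}$ with $x \in \mathbb{N}$ (i.e. the sets $A \subset \mathbb{N}$ such that either $A$ or $A^c$ is finite) and set
\[ \mu_0(A) := \begin{cases} 0 & \text{if $A$ is finite;} \\ 1 & \text{if $A^c$ is finite.} \end{cases} \]
We will extend $\mu_0$ to an additive function, that we still denote by $\mu_0$, defined on the whole of $\mathcal{P}(\mathbb{N})$. If such an extension exists, it cannot be $\sigma$-additive, because $\mu_0(\{n\}) = 0$ for all $n \in \mathbb{N}$, while $\mu_0(\mathbb{N}) = 1$.

In the class $\mathcal{C}$ of pairs $(\mathcal{A}, \mu)$ with $\mathcal{A}$ algebra and $\mu : \mathcal{A} \to \{0,1\}$ additive, we define the partial order relation $(\mathcal{A}, \mu) \leq (\mathcal{A}', \mu')$ by $\mathcal{A} \subset \mathcal{A}'$ and $\mu'|_{\mathcal{A}} = \mu$; then we consider the class $\mathcal{C}_0$ of all $(\mathcal{A}, \mu)$ satisfying $(\mathcal{A}, \mu) \geq (\mathcal{A}_0, \mu_0)$. By Zorn's lemma, we can find a maximal $(\bar{\mathcal{A}}, \bar\mu)$ in this class: indeed, it is easy to check that any totally ordered chain $I \subset \mathcal{C}_0$ has an upper bound $(\mathcal{A}', \mu')$, defined by
\[ \mathcal{A}' := \bigcup_{(\mathcal{A},\mu) \in I} \mathcal{A} \quad \text{and} \quad \mu'(A) := \mu(A) \ \text{ where } A \in \mathcal{A},\ (\mathcal{A}, \mu) \in I. \]
We will show that the maximality of $(\bar{\mathcal{A}}, \bar\mu)$ forces $\bar{\mathcal{A}}$ to coincide with $\mathcal{P}(\mathbb{N})$, so that $\bar\mu$ will be the desired extension of $\mu_0$.

Let us assume by contradiction that $\bar{\mathcal{A}} \neq \mathcal{P}(\mathbb{N})$ and choose $Z \subset \mathbb{N}$ with $Z \notin \bar{\mathcal{A}}$. We notice that
\[ \{(A_1 \cap Z) \cup (A_2 \cap Z^c) : A_1, A_2 \in \bar{\mathcal{A}}\} \]
is the algebra generated by $\bar{\mathcal{A}} \cup \{Z\}$. Moreover, either $Z$ or $Z^c$ satisfies the following property:
\[ \text{for all } A \in \bar{\mathcal{A}} \text{ with } \bar\mu(A) = 1, \quad Z \cap A \neq \emptyset. \tag{A.13} \]
If not, we would be able to find $A_1, A_2 \in \bar{\mathcal{A}}$ with $A_1 \cap Z = A_2 \cap Z^c = \emptyset$ and $\bar\mu(A_1) = \bar\mu(A_2) = 1$, so that $A_1$ and $A_2$ would be disjoint and $\bar\mu(A_1 \cup A_2) = 2$, contradicting the fact that $\bar\mu$ maps into $\{0,1\}$. Possibly replacing $Z$ by its complement we shall assume that $Z$ fulfills (A.13).

Now we extend $\bar\mu$ to the algebra generated by $\bar{\mathcal{A}} \cup \{Z\}$, as follows:
\[ \tilde\mu(B) := \bar\mu(A_1) \quad \text{whenever } A_1, A_2 \in \bar{\mathcal{A}} \text{ and } B = (A_1 \cap Z) \cup (A_2 \cap Z^c). \tag{A.14} \]
Let us check that $\tilde\mu$ is well defined and additive.

1. $\tilde\mu$ is well defined: if
\[ B = (A_1 \cap Z) \cup (A_2 \cap Z^c) = (A_3 \cap Z) \cup (A_4 \cap Z^c) \]
then $A_1 \cap Z = A_3 \cap Z$, and if $\bar\mu(A_1) \neq \bar\mu(A_3)$ then one of the two numbers, say $\bar\mu(A_1)$, equals 1, while $\bar\mu(A_3) = 0$. Defining $A := A_1 \setminus A_3$ we have $\bar\mu(A) = 1$ and $A \cap Z = \emptyset$, contradicting (A.13).

2. Suppose $B, B'$ are disjoint. Let $B = (A_1 \cap Z) \cup (A_2 \cap Z^c)$ and $B' = (A_1' \cap Z) \cup (A_2' \cap Z^c)$. Then $A_1 \cap A_1' \cap Z = \emptyset$. Setting $A_1'' := A_1' \setminus A_1$ we still have $B' = (A_1'' \cap Z) \cup (A_2' \cap Z^c)$, and then we can use the additivity of $\bar\mu$ to conclude that
\[ \tilde\mu(B \cup B') = \bar\mu(A_1 \cup A_1'') = \bar\mu(A_1) + \bar\mu(A_1'') = \tilde\mu(B) + \tilde\mu(B'). \]

If $B \in \bar{\mathcal{A}}$ we can choose $A_1 = A_2 = B$ in (A.14) to obtain that $\tilde\mu(B) = \bar\mu(B)$, so that $\tilde\mu$ extends $\bar\mu$ to the algebra generated by $\bar{\mathcal{A}} \cup \{Z\}$. This violates the maximality of $(\bar{\mathcal{A}}, \bar\mu)$.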
Exercise 1.25. We obviously need only to show that the cardinality of $C$ is at least equal to the continuum. By the inner regularity of $\lambda$ we can assume with no loss of generality that $C$ is closed. Now, we define $A = (0,1) \setminus C$ and
\[ g(t) := \lambda([0,t] \cap C), \qquad t \in [0,1]. \]
This function maps continuously $[0,1]$ onto $[0, \lambda(C)]$, and it is constant in any connected component of $A$, so that $g(A)$ is at most countable. Since $g(C)$ contains $[0, \lambda(C)] \setminus g(A)$ we obtain that $C$ has cardinality at least equal to the continuum (one can actually see that $g(C) = g([0,1])$).
Exercise 1.26. Since $K$ is totally bounded, for all $c > 0$ there exist finitely many balls $B_1, \ldots, B_N$ with radius $c$ whose union covers $K$. The properties of $\mu$ imply the existence of an index $i$ such that $\mu(\{n : x_n \in B_i\}) = 1$. Now we start with $c = 1$ and find a closed ball $B^{(1)}$ with radius 1 such that $\mu(\{n : x_n \in B^{(1)}\}) = 1$. Repeating this construction in $B^{(1)}$ we find a closed ball $B^{(2)}$ with radius $1/2$ contained in $B^{(1)}$ with $\mu(\{n : x_n \in B^{(2)}\}) = 1$. Continuing in this way, if $z$ is the common point of the balls $B^{(i)}$, we find that $x_n$ $\mu$-converges to $z$.
Chapter 2
Exercise 2.1. The verification is straightforward and is omitted.
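Exercise 2.2. Let $\varphi, \psi : X \to \mathbb{R}$ be $\mathcal{E}$-measurable. If $\varphi(x) + \psi(x) < t$ we can find a rational number $r$ such that $\varphi(x) < r$ and $\psi(x) < t - r$, hence
\[ \{\varphi + \psi < t\} = \bigcup_{r \in \mathbb{Q}} \bigl[\{\varphi < r\} \cap \{\psi < t - r\}\bigr]. \]
This proves that $\varphi + \psi$ is $\mathcal{E}$-measurable. Analogously, since
\[ \{\varphi^2 > a\} = \{\varphi > \sqrt{a}\} \cup \{\varphi < -\sqrt{a}\}, \qquad a \geq 0, \]
we obtain that $\varphi^2$ is measurable. Considering the difference $(\varphi + \psi)^2 - (\varphi - \psi)^2 = 4\varphi\psi$ we obtain that $\varphi\psi$ is $\mathcal{E}$-measurable.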
Exercise 2.3. (i) The verification of the axioms of distance is immediate. In order to prove the compactness of $\overline{\mathbb{R}}$, let us consider a sequence $(x_n) \subset \overline{\mathbb{R}}$. If $\sup_n x_n = +\infty$ we can find for any $k$ an index $n(k)$ such that $x_{n(k)} \geq k$; it follows that $d(x_{n(k)}, +\infty) = |\arctan x_{n(k)} - \pi/2|$ tends to 0, so that $x_{n(k)} \to +\infty$ in the metric space. Analogously, if $\inf_n x_n = -\infty$ we can find a subsequence converging to $-\infty$ in $(\overline{\mathbb{R}}, d)$. Finally, if both $\sup_n x_n$ and $\inf_n x_n$ are finite, the sequence $(x_n)$ is bounded and we can extract, thanks to the Bolzano-Weierstrass theorem, a subsequence $x_{n(k)}$ converging to $x \in \mathbb{R}$. The continuity of $z \mapsto \arctan z$ implies that $x_{n(k)} \to x$ in $(\overline{\mathbb{R}}, d)$.

To prove the equivalence of the two topologies, let us work with closed sets: if $C \subset \mathbb{R}$ is closed with respect to the $(\mathbb{R}, d)$ topology, then it is closed with respect to the Euclidean topology, because $|x_n - x| \to 0$ implies $|\arctan x_n - \arctan x| \to 0$. On the other hand, if $|\arctan x_n - \arctan x| \to 0$ then for $n$ large enough $\arctan x_n$ belongs to an interval $I := (\arctan x - \delta, \arctan x + \delta) \subset (-\pi/2, \pi/2)$; the continuity of $y \mapsto \tan y$ in $I$ implies that $x_n \to x$. This proves the converse implication, and the equivalence of the two topologies.

(ii) We notice first that, according to (i), $\mathcal{B}(\mathbb{R}) \subset \mathcal{B}(\overline{\mathbb{R}})$ and $\{-\infty\}$, $\{+\infty\}$ belong to $\mathcal{B}(\overline{\mathbb{R}})$. Therefore, if $f$ is measurable between $\mathcal{E}$ and the Borel $\sigma$-algebra of $(\overline{\mathbb{R}}, d)$, then it is $\mathcal{E}$-measurable according to (2.2). According to the measurability criterion, in order to prove the converse implication it suffices to show that $\mathcal{B}(\overline{\mathbb{R}})$ is generated by $\mathcal{B}(\mathbb{R}) \cup \{\{-\infty\}\} \cup \{\{+\infty\}\}$: this follows by the fact that if $C \subset \overline{\mathbb{R}}$ is closed, then
\[ C = (C \cap \mathbb{R}) \cup (C \cap \{-\infty\}) \cup (C \cap \{+\infty\}) \]
(again by (i)) belongs to the $\sigma$-algebra generated by $\mathcal{B}(\mathbb{R}) \cup \{\{-\infty\}\} \cup \{\{+\infty\}\}$, therefore the $\sigma$-algebra generated by this family of sets contains $\mathcal{B}(\overline{\mathbb{R}})$.
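Exercise 2.4. If $\{f \neq g\}$ is contained in a $\mu$-negligible set $C$ of $\mathcal{E}$, for some $\mathcal{E}$-measurable function $g$, then $\{f > t\} \Delta \{g > t\} \subset C$ for all $t \in \mathbb{R}$, and since $\{g > t\} \in \mathcal{E}$ it follows that $\{f > t\} \in \mathcal{E}_\mu$; this means that $f$ is $\mathcal{E}_\mu$-measurable. Conversely, assume that $f$ is $\mathcal{E}_\mu$-measurable and find for all $q \in \mathbb{Q}$ a set $B_q \in \mathcal{E}$ and a $\mu$-negligible set $C_q \in \mathcal{E}$ with $\{f > q\} \Delta B_q \subset C_q$. We define
\[ g(x) := \sup\{q \in \mathbb{Q} : x \in B_q\}, \qquad C := \bigcup_{q \in \mathbb{Q}} C_q. \]
Since $\{g > t\} = \bigcup_{q > t} B_q$ we have that $g$ is $\mathcal{E}$-measurable. Let us prove that $f(x) = g(x)$ for all $x \notin C$: for any such $x$ we have $x \in B_q$ for all $q < f(x)$, therefore $g(x) \geq f(x)$; if the inequality were strict, there would exist $q \in \mathbb{Q}$ with $x \in B_q$ and $q > f(x)$, therefore $x$ would be in $B_q \setminus \{f > q\} \subset C_q \subset C$.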
Exercise 2.5. If $\sigma \subset \zeta$ we can find a nondecreasing family of partitions $\sigma_1, \ldots, \sigma_n$ with $\sigma_1 = \sigma$, $\sigma_n = \zeta$ and $\sigma_{i+1} \setminus \sigma_i$ containing just one point. Therefore, in the proof of the monotonicity of $\sigma \mapsto I_\sigma(f)$ we need only to show that $I_\sigma(f) \leq I_{\sigma \cup \{t\}}(f)$ whenever $t \in (0, \infty) \setminus \sigma$. Let $\sigma = \{t_0, \ldots, t_N\}$ and let $i$ be the last index such that $t_i < t$. If $i < N$ we use the inequality
\[ (t_{i+1} - t_i) f(t_{i+1}) = (t_{i+1} - t) f(t_{i+1}) + (t - t_i) f(t_{i+1}) \leq (t_{i+1} - t) f(t_{i+1}) + (t - t_i) f(t); \]
adding to both sides $\sum_{j \neq i} (t_{j+1} - t_j) f(t_{j+1})$ we obtain $I_\sigma(f) \leq I_{\sigma \cup \{t\}}(f)$. If $i = N$ the argument is even easier, because the difference $I_{\sigma \cup \{t\}}(f) - I_\sigma(f)$ is given by $(t - t_N) f(t)$.

Now, let $f, g : (0, +\infty) \to [0, +\infty)$ be given; since $I_\sigma(f + g) = I_\sigma(f) + I_\sigma(g)$ we get $I_\sigma(f + g) \leq \int_0^\infty f(t)\,dt + \int_0^\infty g(t)\,dt$. Since $\sigma \in \Sigma$ is arbitrary, this proves that
\[ \int_0^\infty f(t) + g(t)\,dt \leq \int_0^\infty f(t)\,dt + \int_0^\infty g(t)\,dt. \]
In order to prove the converse inequality, fix $L < \int_0^\infty f(t)\,dt$, $M < \int_0^\infty g(t)\,dt$ and find $\sigma, \zeta \in \Sigma$ with $I_\sigma(f) > L$ and $I_\zeta(g) > M$; then
\[ \int_0^\infty f(t) + g(t)\,dt \geq I_{\sigma \cup \zeta}(f + g) = I_{\sigma \cup \zeta}(f) + I_{\sigma \cup \zeta}(g) \geq I_\sigma(f) + I_\zeta(g) > L + M. \]
Letting $L \uparrow \int_0^\infty f(t)\,dt$ and $M \uparrow \int_0^\infty g(t)\,dt$ the inequality is proved.
Exercise 2.6. We will prove that $f_*$ is lower semicontinuous, the proof of the upper semicontinuity of $f^*$ being analogous. Let $(x_n) \subset \mathbb{R}$ be converging to $x$ and use the definition of $f_*(x_n)$ to find $y_n \in \mathbb{R}$ such that
\[ |x_n - y_n| < \frac{1}{n} \quad \text{and} \quad f(y_n) \leq f_*(x_n) + \frac{1}{n}. \]
Then $(y_n)$ still converges to $x$, so that
\[ f_*(x) \leq \liminf_{n \to \infty} f(y_n) \leq \liminf_{n \to \infty} f_*(x_n) + \frac{1}{n} = \liminf_{n \to \infty} f_*(x_n). \]
Exercise 2.7. Let $t \in \mathbb{R}$ and let $(x_n) \subset \{f_* \leq t\}$ be convergent to $x$. Then, the lower semicontinuity of $f_*$ gives
\[ f_*(x) \leq \liminf_{n \to \infty} f_*(x_n) \leq t. \]
This proves that $x \in \{f_* \leq t\}$, so that $\{f_* \leq t\}$ is closed. The proof for $f^*$ is similar. Since the Borel $\sigma$-algebra is generated by halflines, it follows that $f_*$ and $f^*$ are Borel, and the same is true for the set $\{f_* = f^*\}$, that coincides with $\Sigma$.
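Exercise 2.8. Set $\varphi_0 := \varphi$, $A_0 := \{\varphi_0 \geq a_0\}$ and $\varphi_1 := \varphi_0 - a_0 1_{A_0} \geq 0$. Then, set $A_1 := \{\varphi_1 \geq a_1\}$ and $\varphi_2 := \varphi_1 - a_1 1_{A_1}$ and so on. If $\varphi(x) = +\infty$ then $\varphi_n(x) = +\infty$ for all $n$, so that $x$ belongs to all sets $A_i$ and $\sum_{i=0}^{\infty} a_i 1_{A_i}(x) = +\infty$. We then assume that $\varphi(x) < +\infty$ in the following. By construction we have that $0 \leq \varphi_{i+1} \leq \varphi_i \leq \varphi_0 = \varphi$, hence
\[ \varphi = \varphi_{n+1} + \sum_{i=0}^{n} (\varphi_i - \varphi_{i+1}) = \varphi_{n+1} + \sum_{i=0}^{n} a_i 1_{A_i}. \]
This proves that $\varphi \geq \sum_i a_i 1_{A_i}$. If the inequality were strict for some $x \in X$ with $\varphi(x) < +\infty$, we could find $\delta > 0$ such that $\varphi_i(x) \geq \delta$ for all $i \in \mathbb{N}$, and since $a_i < \delta$ for $i$ large enough, we would get $x \in A_i$ for $i$ large enough. But since the series $\sum_i a_i$ is not convergent, we would get $\sum_i a_i 1_{A_i}(x) = \infty$, a contradiction.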
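The recursion above is a greedy algorithm acting pointwise on the value $\varphi(x)$. The following sketch runs it for a single finite value, with the assumed choice $a_i = 1/(i+1)$ (a sequence tending to 0 with divergent sum; this specific choice and the function name are illustrative, not taken from the text).

```python
from fractions import Fraction

def greedy_decomposition(value, steps):
    """Run phi_{i+1} = phi_i - a_i 1_{A_i} at one point, with a_i = 1/(i+1)."""
    a = [Fraction(1, i + 1) for i in range(steps)]
    residual = Fraction(value)          # phi_0(x)
    indicators = []
    for a_i in a:
        hit = residual >= a_i           # x in A_i  iff  phi_i(x) >= a_i
        indicators.append(hit)
        if hit:
            residual -= a_i             # phi_{i+1}(x) = phi_i(x) - a_i
    partial_sum = sum(a_i for a_i, hit in zip(a, indicators) if hit)
    return partial_sum, residual
```

The identity $\varphi = \varphi_{n+1} + \sum_{i \le n} a_i 1_{A_i}$ holds exactly at every step, and the residual $\varphi_{n+1}(x)$ becomes small as $n$ grows, in agreement with the argument by contradiction.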
Exercise 2.9. Assume by contradiction that the absolute continuity property fails. Then, for some $\varepsilon > 0$ we can find $A_i$ with $\mu(A_i) < 2^{-i}$ and $\int_{A_i} |\varphi|\,d\mu \geq \varepsilon$. It follows that the set $B := \limsup_i A_i$ is $\mu$-negligible, and
\[ B_n := \bigcup_{i \geq n} A_i \setminus B \downarrow \emptyset. \]
Since $\int_{B_n} |\varphi|\,d\mu \geq \int_{A_n} |\varphi|\,d\mu \geq \varepsilon$ we find a contradiction with the dominated convergence theorem applied to the functions $1_{B_n}|\varphi|$, pointwise converging to 0.
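Exercise 2.10. Let $\varepsilon > 0$ be given and let $\delta > 0$ be such that $\int_A |\varphi|\,d\mu < \varepsilon/2$ whenever $A \in \mathcal{E}$ and $\mu(A) < \delta$. The triangle inequality gives, with the same choice of $A$, $\int_A |\varphi_n|\,d\mu < \varepsilon$ for $n > n_0$, provided $\|\varphi_n - \varphi\|_1 < \varepsilon/2$ for $n > n_0$. Since $\varphi_1, \ldots, \varphi_{n_0}$ are integrable, we can find $\delta_i > 0$ such that $\int_A |\varphi_i|\,d\mu < \varepsilon$ whenever $A \in \mathcal{E}$ and $\mu(A) < \delta_i$. If $\delta_0 = \min\{\delta, \min_i \delta_i\}$, we have $\int_A |\varphi_n|\,d\mu < \varepsilon$ whenever $n \in \mathbb{N}$, $A \in \mathcal{E}$ and $\mu(A) < \delta_0$.

A possible example for the second question is $\Omega = [0,1]$, $\mu$ = the Lebesgue measure, and $\varphi_n = \frac{2^n}{n} 1_{[2^{-n}, 2^{1-n})}$. The uniform integrability is a direct consequence of the convergence of $\varphi_n$ to 0 in $L^1$. If $\varphi_n \leq g$ for all $n$, then, the supports being pairwise disjoint,
\[ \sum_{n=1}^{\infty} \varphi_n = \sup_{n \geq 1} \varphi_n \leq g \]
but
\[ \int \sum_{n=1}^{\infty} \varphi_n = \sum_{n=1}^{\infty} \frac{1}{n} = +\infty. \]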
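The computation behind the example is elementary and can be verified with exact arithmetic: each $\varphi_n$ is a constant times the indicator of an interval of length $2^{-n}$. The helper names below are hypothetical, and the code is only a sketch of that arithmetic.

```python
from fractions import Fraction

def integral_phi(n):
    """Integral over [0,1] of phi_n = (2**n / n) * 1_[2**-n, 2**(1-n))."""
    height = Fraction(2 ** n, n)
    width = Fraction(1, 2 ** (n - 1)) - Fraction(1, 2 ** n)
    return height * width

def integral_of_sup(N):
    """Supports are disjoint, so sup(phi_1..phi_N) integrates to the sum."""
    return sum(integral_phi(n) for n in range(1, N + 1))
```

So $\|\varphi_n\|_1 = 1/n \to 0$, while the integral of $\sup_{n \le N} \varphi_n$ is the $N$-th harmonic number, which is unbounded in $N$; no integrable $g$ can dominate all the $\varphi_n$.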
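Exercise 2.11. (a) For any $y \in X$ we have
\[ g_\lambda(x) \leq g(y) + \lambda d(x, y) \leq g(y) + \lambda d(x', y) + \lambda d(x, x'). \]
Since $y$ is arbitrary we get $g_\lambda(x) \leq g_\lambda(x') + \lambda d(x, x')$. Reversing the roles of $x$ and $x'$ the inequality is achieved.

(b) Clearly the family $(g_\lambda)$ is monotone with respect to $\lambda$, and since we can always choose $y = x$ in the minimization problem we have $g_\lambda(x) \leq g(x)$. Assume that $\sup_\lambda g_\lambda(x)$ is finite (otherwise the statement is trivial) and let $x_\lambda$ be such that $g_\lambda(x) + \frac{1}{\lambda} \geq g(x_\lambda) + \lambda d(x, x_\lambda)$. This inequality implies that $x_\lambda \to x$ as $\lambda \to \infty$ and, now neglecting the term $\lambda d(x, x_\lambda)$, that
\[ g_\lambda(x) + \frac{1}{\lambda} \geq g(x_\lambda). \]
Passing to the limit in this inequality as $\lambda \to \infty$ and using the lower semicontinuity of $g$ we get $\sup_\lambda g_\lambda(x) \geq g(x)$.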
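A discrete version of the regularization $g_\lambda(x) = \inf_y \{g(y) + \lambda d(x,y)\}$ illustrates both (a) and (b). The sketch below assumes $X$ is replaced by a finite grid of real numbers with $d(x,y) = |x - y|$, and takes for $g$ a lower semicontinuous step function; these choices and the function names are assumptions made only for illustration.

```python
def inf_convolution(g, grid, lam):
    """Values of g_lam(x) = min over grid points y of g(y) + lam*|x - y|."""
    return [min(g(y) + lam * abs(x - y) for y in grid) for x in grid]

def step(y):
    """Lower semicontinuous step: 0 on (-inf, 0], 1 on (0, +inf)."""
    return 0.0 if y <= 0 else 1.0

def lipschitz_ok(values, grid, lam, tol=1e-9):
    """Check |g_lam(x) - g_lam(x')| <= lam*|x - x'| on neighbouring points."""
    return all(
        abs(values[i + 1] - values[i]) <= lam * (grid[i + 1] - grid[i]) + tol
        for i in range(len(grid) - 1)
    )
```

For this $g$ one finds $g_\lambda(x) = \min\{1, \lambda x\}$ for $x > 0$ and $g_\lambda(x) = 0$ for $x \le 0$, which increases to $g$ as $\lambda \to \infty$.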
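Exercise 2.12. Let us first assume that $f$ is bounded. For $\varepsilon > 0$ we consider the functions
\[ f_\varepsilon(x, y) := \frac{1}{2\varepsilon} \int_{x - \varepsilon}^{x + \varepsilon} f(x', y)\,dx'. \]
Since $x \mapsto f(x, y)$ is continuous, we can apply the mean value theorem to obtain that $f_\varepsilon(x, y) \to f(x, y)$ as $\varepsilon \downarrow 0$. So, in order to show that $f$ is a Borel function, we need only to show that the $f_\varepsilon$ are Borel.

We will prove indeed that the $f_\varepsilon$ are continuous: let $x_n \to x$ and $y_n \to y$; since $f(x', y_n) \to f(x', y)$ for all $x' \in \mathbb{R}$, we have
\[ 1_{[x_n - \varepsilon, x_n + \varepsilon]}(x') f(x', y_n) \to 1_{[x - \varepsilon, x + \varepsilon]}(x') f(x', y) \]
for all $x' \in \mathbb{R} \setminus \{x - \varepsilon, x + \varepsilon\}$. Therefore, since $f$ is bounded, the dominated convergence theorem yields
\[ f_\varepsilon(x, y) = \frac{1}{2\varepsilon} \int_{\mathbb{R}} 1_{[x - \varepsilon, x + \varepsilon]}(x') f(x', y)\,dx' = \frac{1}{2\varepsilon} \lim_{n \to \infty} \int_{\mathbb{R}} 1_{[x_n - \varepsilon, x_n + \varepsilon]}(x') f(x', y_n)\,dx' = \lim_{n \to \infty} f_\varepsilon(x_n, y_n). \]
In the general case when $f$ is not bounded we approximate it by the bounded functions $f_h(x) := \max\{-h, \min\{f(x), h\}\}$, with $h \in \mathbb{N}$, that are still separately continuous, and therefore Borel.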
Chapter 3
Exercise 3.1. On the real line, endowed with the Lebesgue measure, the function $(1 + |x|)^{-1}$ belongs to $L^2$, but not to $L^1$, and the function $|x|^{-1/2} 1_{(0,1)}(x)$ belongs to $L^1$, but not to $L^2$. Turning back to the general case, if $\varphi \in L^{p_1} \cap L^{p_2}$ with $p_1 \leq p_2$, from the inequality
\[ |\varphi|^p \leq \max\{|\varphi|^{p_1}, |\varphi|^{p_2}\} \leq |\varphi|^{p_1} + |\varphi|^{p_2} \qquad \forall p \in [p_1, p_2] \]
(that can be verified considering separately the cases $|\varphi| \leq 1$ and $|\varphi| > 1$) we get that $\varphi \in L^p$ for all $p \in [p_1, p_2]$.
Exercise 3.2. The statement is trivial if $\|f\|_q = 0$, so we assume that $\|f\|_q > 0$. For $c > 0$ the set $X_c := \{|f| > c\}$ has finite measure, by the Markov inequality, hence the inclusion between $L^r$ spaces for finite measures gives that $|f| 1_{X_c} \in L^p(X, \mathcal{E}, \mu)$. Since the dominated convergence theorem gives
\[ \lim_{c \downarrow 0} \int_X |f - f 1_{X_c}|^q\,d\mu = \lim_{c \downarrow 0} \int_{X \setminus X_c} |f|^q\,d\mu = 0 \]
we can choose $\tilde f = f 1_{X_c}$ for $c > 0$ small enough.
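Exercise 3.3. By homogeneity we can assume that $\|\varphi\|_p = 1$ and $\|\psi\|_q = 1$. Since
\[ \int_X \Bigl(\frac{|\varphi|^p}{p} + \frac{|\psi|^q}{q} - |\varphi||\psi|\Bigr)\,d\mu = \frac{\|\varphi\|_p^p}{p} + \frac{\|\psi\|_q^q}{q} - 1 = 0 \]
and the function among parentheses is nonnegative, it follows that it vanishes $\mu$-a.e. In particular, for $\mu$-a.e. $x$, $|\psi(x)|$ is a minimizer of
\[ y \mapsto \frac{y^q}{q} - |\varphi(x)| y \]
in $[0, +\infty)$. But this problem has a unique minimizer, given by $|\varphi(x)|^{1/(q-1)} = |\varphi(x)|^{p-1}$, and we conclude.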
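Exercise 3.4. It suffices to apply Hölder's inequality to the functions $|\varphi|^r$ and $|\psi|^r$, with the dual exponents $p/r$ and $q/r$, to obtain
\[ \|\varphi\psi\|_r^r \leq \bigl\||\varphi|^r\bigr\|_{p/r} \bigl\||\psi|^r\bigr\|_{q/r} = \|\varphi\|_p^r \|\psi\|_q^r. \]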
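The inequality $\|\varphi\psi\|_r \leq \|\varphi\|_p \|\psi\|_q$, where $p/r$ and $q/r$ dual means $1/r = 1/p + 1/q$, can be tested numerically on a finite set with the counting measure. This is only a sanity check under that assumption, with hypothetical helper names, and of course not a proof.

```python
import random

def lp_norm(values, p):
    """The l^p norm of a finite sequence (counting measure)."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)

def holder_gap(phi, psi, p, q):
    """Return ||phi||_p ||psi||_q - ||phi psi||_r with 1/r = 1/p + 1/q."""
    r = 1.0 / (1.0 / p + 1.0 / q)
    product = [a * b for a, b in zip(phi, psi)]
    return lp_norm(phi, p) * lp_norm(psi, q) - lp_norm(product, r)
```

The gap is nonnegative for every choice of the two sequences, up to floating point rounding.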
Exercise 3.5. The positive part and the negative part of $\varphi - \varphi_n$ have the same integral, hence
\[ \int_X |\varphi - \varphi_n|\,d\mu = 2 \int_X (\varphi - \varphi_n)^+\,d\mu. \]
The condition $\liminf_n \varphi_n \geq \varphi$ ensures that $(\varphi - \varphi_n)^+$ is pointwise convergent to 0; in addition, since the $\varphi_n$ are nonnegative, the functions are dominated by $\varphi^+$. Therefore the dominated convergence theorem gives the result.
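Exercise 3.6. If $\psi_n \to \psi$ $\mu$-a.e. we apply Fatou's lemma to the functions $\varphi_n + \psi_n$ to obtain
\[ \liminf_{n \to \infty} \int_X \varphi_n + \psi_n\,d\mu \geq \int_X \varphi + \psi\,d\mu. \]
Therefore
\[ \limsup_{n \to \infty} \int_X \psi_n\,d\mu + \liminf_{n \to \infty} \int_X \varphi_n\,d\mu \geq \int_X \varphi\,d\mu + \int_X \psi\,d\mu. \]
Subtracting $\int \psi\,d\mu$ from both sides the statement is achieved. In the general case, let $n(k)$ be a subsequence such that $\lim_k \int \varphi_{n(k)}\,d\mu = \liminf_n \int_X \varphi_n\,d\mu$, and let $n(k(s))$ be a further subsequence along which $\psi_{n(k(s))}$ converges to $\psi$ $\mu$-a.e. Then
\[ \liminf_{n \to \infty} \int_X \varphi_n\,d\mu = \lim_{s \to \infty} \int_X \varphi_{n(k(s))}\,d\mu \geq \int_X \liminf_{s \to \infty} \varphi_{n(k(s))}\,d\mu \geq \int_X \liminf_{n \to \infty} \varphi_n\,d\mu. \]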
Exercise 3.7. We show only how (3.13) implies $g(tx + (1-t)y) \leq t g(x) + (1-t) g(y)$ for all $x, y \in J$ and $t \in [0,1]$. We prove first, by induction on $m$, that
\[ g\Bigl(\sum_{i=1}^{2^m} \frac{1}{2^m} x_i\Bigr) \leq \sum_{i=1}^{2^m} \frac{1}{2^m} g(x_i) \]
for all $x_1, \ldots, x_{2^m} \in J$. The case $m = 1$ is (3.13) and the induction step can be achieved grouping the terms as follows:
\[ \sum_{i=1}^{2^m} \frac{1}{2^m} x_i = \frac{1}{2} \Bigl(\sum_{i=1}^{2^{m-1}} \frac{1}{2^{m-1}} x_i + \sum_{i=1}^{2^{m-1}} \frac{1}{2^{m-1}} x_{2^{m-1}+i}\Bigr). \]
Now, considering the case when $x_i = x$ for $1 \leq i \leq k$ and $x_i = y$ otherwise, we get
\[ g(tx + (1-t)y) \leq t g(x) + (1-t) g(y) \quad \text{with } t = \frac{k}{2^m}. \]
Since $g$ is continuous, by approximation we get $g(tx + (1-t)y) \leq t g(x) + (1-t) g(y)$ for all $x, y \in J$ and $t \in [0,1]$.
Exercise 3.8. Let us first show the existence of $z_0$. Let $A = g(\mathbb{R})$ and let $u_n = g(z_n)$ with $u_n \to \inf A$. Since $u_n$ is uniformly bounded from above, our assumption on $g$ ensures that $(z_n)$ is bounded. By the Bolzano-Weierstrass theorem we can find a subsequence $z_{n(k)}$ convergent to $z \in \mathbb{R}$. The continuity of $g$ gives that $u_{n(k)} = g(z_{n(k)})$ converge to $g(z)$. It follows that $\inf A$ is finite and coincides with $g(z)$, so that we can take $z_0 = z$. Now, by applying the convexity inequality of the previous exercise with $x = z_2$, $y = z_0$ and $t = (z_1 - z_0)/(z_2 - z_0)$, we get
\[ \frac{g(z_2) - g(z_1)}{z_2 - z_1} \geq \frac{g(z_1) - g(z_0)}{z_1 - z_0} \geq 0 \]
for $z_0 < z_1 < z_2$, proving the monotonicity of $g$ in $[z_0, +\infty)$. The argument in $(-\infty, z_0]$ is analogous.
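Exercise 3.9. Fatou's lemma gives $\liminf_n \int \varphi_n\,d\mu \geq \int \liminf_n \varphi_n\,d\mu \geq \int \varphi\,d\mu$. Therefore $t_n := \int \varphi_n\,d\mu \to t := \int \varphi\,d\mu$; we can apply Exercise 3.5 to the functions $\varphi_n/t_n$ to obtain that $\varphi_n/t_n \to \varphi/t$ in $L^1$. From this, taking into account that $t_n \to t$, the convergence of $\varphi_n$ to $\varphi$ in $L^1$ follows.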
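Exercise 3.10. Let $\tilde\Phi(c) := \Phi(c)/c$ and notice that $|\varphi_i| \leq c\,\Phi(|\varphi_i|)/\Phi(c) = \Phi(|\varphi_i|)/\tilde\Phi(c)$ on $\{|\varphi_i| \geq c\}$. Therefore
\[ \int_A |\varphi_i|\,d\mu \leq \int_{A \cap \{|\varphi_i| \geq c\}} \frac{\Phi(|\varphi_i|)}{\tilde\Phi(c)}\,d\mu + \int_{A \cap \{|\varphi_i| < c\}} |\varphi_i|\,d\mu \leq \frac{M}{\tilde\Phi(c)} + c\,\mu(A). \]
Let us choose $c$ sufficiently large, such that $M/\tilde\Phi(c) < \varepsilon/2$, and then $\delta > 0$ such that $c\delta < \varepsilon/2$. The inequality above yields $\int_A |\varphi_i|\,d\mu < \varepsilon$ whenever $\mu(A) < \delta$.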
Exercise 3.11. Let $(f_n) \subset C_b(X)$ be converging in $L^1$ to $f$, and let $f_{n(k)}$ be a subsequence pointwise convergent $\mu$-a.e. to $f$. Then, given any $\varepsilon > 0$, by Egorov's theorem we can find a Borel set $B \subset X$ with $\mu(B) < \varepsilon$ and $f_{n(k)} \to f$ uniformly on $B^c$. By the inner regularity of the measure we can find a closed set $C \subset B^c$ such that $\mu(X \setminus C) < \varepsilon$. The function $f$ restricted to $C$, being the uniform limit of bounded continuous functions, is bounded and continuous.
Chapter 4
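Exercise 4.1. Notice that $\langle \cdot, \cdot \rangle$ is obviously symmetric, that $\langle -x, y \rangle = -\langle x, y \rangle = \langle x, -y \rangle$ and that $\langle x, x \rangle = \|x\|^2 \geq 0$, with equality only if $x = 0$. Notice that the parallelogram identity gives
\[ \|x + x' + 2y\|^2 + \|x - x'\|^2 = 2\|x + y\|^2 + 2\|x' + y\|^2 = 8\langle x, y \rangle + 8\langle x', y \rangle + 2\|x - y\|^2 + 2\|x' - y\|^2 \]
and
\[ \|x + x' - 2y\|^2 + \|x - x'\|^2 = 2\|x - y\|^2 + 2\|x' - y\|^2 = -8\langle x, y \rangle - 8\langle x', y \rangle + 2\|x + y\|^2 + 2\|x' + y\|^2. \]
Subtracting and dividing by 4 we get
\[ \langle x + x', 2y \rangle = 4\langle x, y \rangle + 4\langle x', y \rangle - 2\langle x, y \rangle - 2\langle x', y \rangle. \]
So, we proved that $\langle x + x', 2y \rangle = 2\langle x, y \rangle + 2\langle x', y \rangle$. Using the relation $\langle u, 2v \rangle = 4\langle u/2, v \rangle$ (due to the definition of $\langle \cdot, \cdot \rangle$ and the homogeneity of $\|\cdot\|$), we get
\[ \Bigl\langle \frac{x + x'}{2}, y \Bigr\rangle = \frac{1}{2}\langle x, y \rangle + \frac{1}{2}\langle x', y \rangle. \]
Setting $x = t_1 v$, $x' = t_2 v$, and defining the continuous function $\varphi(t) = \langle tv, y \rangle$, we get
\[ \varphi\Bigl(\frac{t_1 + t_2}{2}\Bigr) = \frac{1}{2}\varphi(t_1) + \frac{1}{2}\varphi(t_2). \]
This means that $\varphi$ and $-\varphi$ are convex in $\mathbb{R}$, so that $\varphi$ is an affine function, and since $\varphi(0) = 0$ we get $\varphi(t) = t\varphi'(0) = t\varphi(1)$, i.e. $\langle tv, y \rangle = t\langle v, y \rangle$. Coming back to the identity above, we get $\langle x + x', y \rangle = \langle x, y \rangle + \langle x', y \rangle$.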
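The role of the parallelogram identity can be seen numerically. The sketch below assumes the usual polarization formula $\langle x, y \rangle := \frac{1}{4}(\|x + y\|^2 - \|x - y\|^2)$ (the definition is not restated in this solution, so this is an assumption) and compares the Euclidean norm on $\mathbb{R}^2$, which satisfies the identity, with the $\ell^1$ norm, which does not. Function names are illustrative.

```python
import math

def euclid(v):
    """Euclidean norm: satisfies the parallelogram identity."""
    return math.sqrt(sum(c * c for c in v))

def ell1(v):
    """l^1 norm: does not satisfy the parallelogram identity."""
    return sum(abs(c) for c in v)

def polar(norm, x, y):
    """Candidate scalar product (||x+y||^2 - ||x-y||^2) / 4."""
    plus = [a + b for a, b in zip(x, y)]
    minus = [a - b for a, b in zip(x, y)]
    return (norm(plus) ** 2 - norm(minus) ** 2) / 4

def add(x, y):
    return [a + b for a, b in zip(x, y)]
```

With the Euclidean norm, `polar` returns the dot product and is additive in the first argument; with the $\ell^1$ norm, the choice $x = (1,0)$, $x' = (0,1)$, $y = (1,0)$ gives $\langle x + x', y \rangle = 2$ but $\langle x, y \rangle + \langle x', y \rangle = 1$.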
Exercise 4.2. Assume that $y = \pi_K(x)$. For all $z \in K$ and $t \in [0,1]$ we have that $y + t(z - y)$ belongs to $K$, so that
\[ \|y + t(z - y) - x\|^2 \geq \|y - x\|^2. \]
Expanding the squares we get
\[ t^2 \|z - y\|^2 + 2t\langle z - y, y - x \rangle \geq 0 \qquad \forall t \in [0,1]. \]
This implies (either dividing by $t > 0$ and passing to the limit as $t \downarrow 0$, or computing the right derivative at $t = 0$) that $\langle z - y, x - y \rangle \leq 0$. Conversely, if for some $y \in K$ this condition holds for all $z \in K$, the argument can be reversed to get $\|y + t(z - y) - x\| \geq \|y - x\|$ for all $t \geq 0$. Choosing $t = 1$ we get $\|z - x\| \geq \|y - x\|$, proving that $y = \pi_K(x)$.
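The variational characterization $\langle z - y, x - y \rangle \leq 0$ for all $z \in K$ is easy to test in a concrete case. The sketch assumes $H = \mathbb{R}^d$ and takes for $K$ the cube $[0,1]^d$, whose projection is the coordinatewise clamp; both the choice of $K$ and the function names are illustrative assumptions.

```python
import random

def project_box(x, lo=0.0, hi=1.0):
    """Projection onto the cube [lo, hi]^d: clamp each coordinate."""
    return [min(max(c, lo), hi) for c in x]

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def variational_gap(x, z):
    """Return <z - y, x - y> with y the projection of x onto the cube."""
    y = project_box(x)
    return inner([zi - yi for zi, yi in zip(z, y)],
                 [xi - yi for xi, yi in zip(x, y)])
```

For every $x$ and every $z$ in the cube the returned value is nonpositive, and the projected point is at least as close to $x$ as any other $z \in K$.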
Exercise 4.3. Let $Y_k$ be the vector space spanned by $\{f_1, \ldots, f_k\}$ and let us prove by induction on $k \geq 1$ that $f_i$ is orthogonal to $f_j$ whenever $1 \leq i < j \leq k$. First we observe that if this property holds for some $k$, then $Y_k$ is $k$-dimensional and coincides with the vector space spanned by $\{v_1, \ldots, v_k\}$ (being contained in it, and with the same dimension). The orthogonality of the vectors $f_i$ can be obtained just noticing that
\[ f_k = v_k - \sum_{i=1}^{k-1} \langle v_k, e_i \rangle e_i. \]
So, $f_k = v_k - \pi_{Y_{k-1}}(v_k)$ is orthogonal to all vectors in $Y_{k-1}$. It follows that $\langle e_k, e_i \rangle = 0$ for all $i < k$.
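The formula $f_k = v_k - \sum_{i<k} \langle v_k, e_i \rangle e_i$, followed by the normalization $e_k = f_k / \|f_k\|$, is the Gram-Schmidt procedure. A minimal sketch in $\mathbb{R}^d$ follows; it assumes the input vectors are linearly independent (so that no $f_k$ vanishes), and the function names are illustrative.

```python
import math

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Orthonormalize linearly independent vectors v_1, ..., v_n."""
    basis = []
    for v in vectors:
        # f_k = v_k - sum_{i<k} <v_k, e_i> e_i
        f = list(v)
        for e in basis:
            coeff = inner(v, e)
            f = [fc - coeff * ec for fc, ec in zip(f, e)]
        norm = math.sqrt(inner(f, f))   # nonzero by linear independence
        basis.append([fc / norm for fc in f])
    return basis
```

Applied to $(1,1,0)$, $(1,0,1)$, $(0,1,1)$ it returns three vectors $e_1, e_2, e_3$ with $\langle e_i, e_j \rangle = \delta_{ij}$ up to rounding.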
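Exercise 4.4. Let $y = x - \sum_k \langle x, e_k \rangle e_k$; we know that the series converges in $H$ by Bessel's inequality. In order to show that $\sum_k \langle x, e_k \rangle e_k = \pi_X(x)$ it suffices to prove that $y$ is orthogonal to all vectors in $X$. But since any vector $v \in X$ can be represented as a series, it suffices to show that $\langle y, e_i \rangle = 0$ for all $i$. The continuity and linearity of the scalar product give
\[ \langle y, e_i \rangle = \langle x, e_i \rangle - \sum_{k=0}^{\infty} \langle x, e_k \rangle \langle e_k, e_i \rangle = \langle x, e_i \rangle - \langle x, e_i \rangle = 0. \]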
Exercise 4.5. Since $X$ and its scalar product coincide with $L^2([0,1], \mathcal{P}([0,1]), \mu)$, where $\mu$ is the counting measure in $[0,1]$, we obtain that $X$ is a Hilbert space. Let us prove by contradiction that $X$ is not separable. If $S = \{f_n\}_{n \geq 1}$ were a dense subset, it would be possible to find a countable set $D \subset [0,1]$ such that $f_n(x) = 0$ for all $n$ and all $x \in [0,1] \setminus D$. Since $[0,1]$ is not countable we can find $x_0 \in [0,1] \setminus D$ and define $g_0(x)$ equal to 1 if $x = x_0$ and equal to 0 if $x \neq x_0$. We claim that $g_0$ does not belong to the closure of $S$. If this property fails, we can find a sequence $(f_{n(k)}) \subset S$ convergent to $g_0$ $\mu$-a.e. in $[0,1]$; but convergence $\mu$-a.e. corresponds to pointwise convergence, and since $g_0(x_0) \neq 0$, while $f_{n(k)}(x_0) = 0$ for all $k$, we obtain a contradiction.
Exercise 4.6. By Parseval's identity we know that $x \mapsto (\langle x, e_i \rangle)_i$ is a linear isometry from $H$ to $\ell^2$. As a consequence, taking the parallelogram identity into account, the scalar product is preserved.
Exercise 4.7. We consider the class of orthonormal systems $\{e_i\}_{i \in I}$ of $H$, ordered by inclusion. Zorn's lemma ensures the existence of a maximal system $\{e_i\}_{i \in I}$. Let $V$ be the subspace spanned by the $e_i$, let $Y$ be its closure (still a subspace) and let us prove that $Y = H$. Indeed, if $Y$ were a proper subspace of $H$, we would be able to find, thanks to Corollary 4.5, a unit vector $e$ orthogonal to all vectors in $Y$, and in particular to all vectors $e_i$. Adding $e$ to the family $\{e_i\}_{i \in I}$ the maximality of the family would be violated. Now, by the just proved density of $V$ in $H$, given any $x \in H$ we can find a sequence of vectors $(v_n)$, finite combinations of vectors $e_i$, such that $\|x - v_n\| \to 0$. If we denote by $J_n \subset I$ the set of indexes used to build the vectors $\{v_1, \ldots, v_n\}$, and by $H_n$ the vector space spanned by $\{e_i\}_{i \in J_n}$, we know by Proposition 4.6 that
\[ \Bigl\|x - \sum_{i \in J_n} \langle x, e_i \rangle e_i\Bigr\| \leq \|x - v_n\| \to 0. \]
As a consequence, setting $J = \bigcup_n J_n$, we have $x = \sum_{i \in J} \langle x, e_i \rangle e_i$.
Chapter 5
Exercise 5.1. The functions $\sin mx \cos lx$ are odd, therefore their integral on $(-\pi, \pi)$ vanishes. To show that $\sin mx$ is orthogonal to $\sin lx$ when $l \neq m$, we integrate twice by parts to get
\[ \int_{-\pi}^{\pi} \sin mx \sin lx\,dx = \frac{m}{l} \int_{-\pi}^{\pi} \cos mx \cos lx\,dx = \frac{m^2}{l^2} \int_{-\pi}^{\pi} \sin mx \sin lx\,dx. \]
Since $m^2/l^2 \neq 1$, the integral vanishes. The integrals of products $\cos mx \cos lx$ can be handled analogously.
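These orthogonality relations can be confirmed numerically with a simple quadrature. The sketch uses the midpoint rule on $(-\pi, \pi)$, which is accurate up to rounding for trigonometric polynomials of low degree over a full period; the helper name is hypothetical.

```python
import math

def integral(f, a=-math.pi, b=math.pi, n=20000):
    """Midpoint-rule approximation of the integral of f over (a, b)."""
    h = (b - a) / n
    return h * sum(f(a + (j + 0.5) * h) for j in range(n))

sin_sin = integral(lambda x: math.sin(2 * x) * math.sin(3 * x))
sin_cos = integral(lambda x: math.sin(2 * x) * math.cos(3 * x))
cos_cos = integral(lambda x: math.cos(2 * x) * math.cos(3 * x))
sin_sq = integral(lambda x: math.sin(2 * x) ** 2)
```

The first three values vanish up to rounding, while the last one equals $\pi$, which is the reason for the normalizing constants appearing in the orthonormal systems below.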
Exercise 5.2. Since for $N < M$ we have
$$\Big\|\sum_{n=0}^{N} x_n - \sum_{n=0}^{M} x_n\Big\| = \Big\|\sum_{i=N+1}^{M} x_i\Big\| \le \sum_{i=N+1}^{\infty} \|x_i\|$$
we obtain that $\big(\sum_0^N x_i\big)$ is a Cauchy sequence in $E$. Therefore the completeness of $E$ provides the convergence of the series. Passing to the limit as $N \to \infty$ in the inequality $\big\|\sum_0^N x_i\big\| \le \sum_0^N \|x_i\|$ and using the continuity of the norm we obtain (5.15).
Exercise 5.3. We consider only the first system $g_k = \sqrt{2/\pi}\,\sin kx$, the proof for the second one being analogous. The fact that $(g_k)$ is orthonormal can be easily checked noticing that the $g_k$ are restrictions to $(0, \pi)$ of odd functions, and using the orthogonality of $\sin kx$ in $L^2(-\pi, \pi)$. Analogously, if $f \in L^2(0, \pi)$ let us consider its extension $\tilde f$ to $(-\pi, \pi)$ as an odd function and its Fourier series, which obviously contains no cosines. In $(0, \pi)$ we have
$$\sum_{k=1}^{N} b_k \sin kx = \sum_{k=1}^{N} \langle f, g_k\rangle g_k,$$
where the scalar products are understood in $L^2(0, \pi)$. Therefore, from the convergence of the Fourier series in $L^2(-\pi, \pi)$ to $\tilde f$, which implies convergence in $L^2(0, \pi)$ to $f$, the completeness follows.
Exercise 5.4. Clearly $\langle e_k, e_k\rangle = 1$, while
$$\int_{-\pi}^{\pi} e^{ikx} e^{-ilx}\,dx = \frac{1}{i(k-l)}\Big[e^{i(k-l)x}\Big]_{-\pi}^{\pi} = 0 \qquad \text{whenever } k \neq l.$$
As a consequence $(e_k)$ is an orthonormal system.
Since the Fourier series $S_N f = \sum_{k=-N}^{N} \langle f, e_k\rangle e_k$ of $f$ depends linearly on $f$, in order to show completeness we need only to show $S_N f \to f$ when $f$ is real-valued and when $f$ is imaginary-valued (i.e. $if$ is real-valued). We consider only the first case, the second one being analogous. Setting $c_k = \langle f, e_k\rangle$, we have
$$c_k = \frac{1}{\sqrt{2\pi}} \int_{-\pi}^{\pi} f(x)\cos kx - i f(x)\sin kx \,dx.$$
As a consequence, for $k \ge 1$ we have $\sqrt{2/\pi}\,c_k = a_k - i b_k$, where $a_k$ and $b_k$ are the coefficients of the real Fourier series of $f$, and for $k \le -1$ we have $\sqrt{2/\pi}\,c_k = a_{-k} + i b_{-k}$. For $k = 0$, instead, we have $\sqrt{2/\pi}\,c_0 = a_0$. Taking into account these relations and setting $b_0 = 0$, we have
$$\sum_{k=-N}^{N} c_k \frac{e^{ikx}}{\sqrt{2\pi}} = \frac{1}{2}\Big[ a_0 + \sum_{k=1}^{N} (\cos kx + i\sin kx)(a_k - i b_k) + \sum_{k=-N}^{-1} (\cos kx + i\sin kx)(a_{-k} + i b_{-k}) \Big]$$
$$= \frac{a_0}{2} + \mathrm{Re}\Big[ \sum_{k=1}^{N} (\cos kx + i\sin kx)(a_k - i b_k) \Big] = \frac{a_0}{2} + \sum_{k=1}^{N} a_k \cos kx + b_k \sin kx,$$
and the convergence of $S_N f$ to $f$ follows by the convergence in the real-valued case.
Exercise 5.5. It suffices to note that
$$\frac{1}{2\pi}\Big| \int_{-\pi}^{\pi} f(x) e^{-ikx}\,dx \Big|^2 = |\langle f, e_k\rangle|^2,$$
where $(e_k)$ is the orthonormal system of Exercise 5.4, and to use its completeness.
Exercise 5.6. From the identity $\sum_{k=0}^{2N} e^{ikz} = (e^{i(2N+1)z} - 1)/(e^{iz} - 1)$, we get
$$\sum_{k=-N}^{N} e^{ikz} = e^{-iNz}\sum_{k=0}^{2N} e^{ikz} = e^{-iNz}\,\frac{e^{i(2N+1)z} - 1}{e^{iz} - 1} = \frac{e^{i(N+1/2)z} - e^{-i(N+1/2)z}}{e^{iz/2} - e^{-iz/2}} = \frac{\sin((N+1/2)z)}{\sin(z/2)} \qquad \text{(A.15)}$$
and we call this term $G_N(z)$. Hence
$$S_N f(x) = \sum_{k=-N}^{N} \frac{1}{2\pi}\Big( \int_{-\pi}^{\pi} f(y) e^{-iky}\,dy \Big) e^{ikx} = \sum_{k=-N}^{N} \frac{1}{2\pi}\int_{-\pi}^{\pi} f(y) e^{ik(x-y)}\,dy = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(y) G_N(x-y)\,dy.$$
Using the fact that $\sin((N+1/2)z)/\sin(z/2)$ has, still because of (A.15), mean value 1 on $(-\pi, \pi)$, we get
$$f(x) - S_N f(x) = \frac{1}{2\pi}\int_{-\pi}^{\pi} (f(x) - f(y)) G_N(x-y)\,dy.$$
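The closed form of the Dirichlet kernel and its mean value can be verified numerically. This is a minimal sketch under stated assumptions: the function names are hypothetical, the sample points are arbitrary, and the mean value is approximated by a midpoint rule applied to the exponential sum (which avoids the removable singularity of the closed form at $z = 0$).

```python
import cmath
import math

def dirichlet_sum(N, z):
    """G_N(z) computed as the sum of e^{ikz} for k = -N, ..., N."""
    return sum(cmath.exp(1j * k * z) for k in range(-N, N + 1)).real

def dirichlet_closed(N, z):
    """G_N(z) computed as sin((N + 1/2) z) / sin(z / 2), for z not in 2*pi*Z."""
    return math.sin((N + 0.5) * z) / math.sin(z / 2)

def mean_value(N, n=4000):
    """Approximate (1 / 2pi) times the integral of G_N over (-pi, pi)."""
    h = 2 * math.pi / n
    total = sum(dirichlet_sum(N, -math.pi + (j + 0.5) * h) for j in range(n))
    return total * h / (2 * math.pi)

# Largest discrepancy between the two expressions on a few sample points.
gap = max(abs(dirichlet_sum(N, z) - dirichlet_closed(N, z))
          for N in (1, 4, 9) for z in (0.3, 1.1, -2.5))
mean = mean_value(6)
```

Both `gap` (close to 0) and `mean` (close to 1) agree with the two facts used in the solution.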
Exercise 5.7. We apply the Parseval identity to the function $f(x) = x^2$, whose Fourier series contains no sines. It is simple to check, by integration by parts, that $a_0 = 2\pi^2/3$ and that $a_k = 4k^{-2}\cos k\pi = 4(-1)^k/k^2$ for $k \ge 1$. We have then
$$\frac{1}{\pi}\int_{-\pi}^{\pi} x^4\,dx = \frac{2}{5}\pi^4 = \frac{a_0^2}{2} + \sum_{k=1}^{\infty} a_k^2 = \frac{4}{18}\pi^4 + \sum_{k=1}^{\infty} \frac{16}{k^4}.$$
Rearranging terms, we get $\sum_{k=1}^{\infty} 1/k^4 = \pi^4/90$.
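The arithmetic of Exercise 5.7 can be double-checked with a few lines of Python. This is only an illustrative sketch; the variable names and the truncation at 10000 terms are arbitrary choices.

```python
import math

# Partial sum of 1/k^4, added from the smallest terms up to limit rounding.
partial = sum(1.0 / k**4 for k in range(10000, 0, -1))
target = math.pi**4 / 90

# Parseval identity for f(x) = x^2 on (-pi, pi):
# (1/pi) * integral of x^4  =  a_0^2 / 2 + sum of a_k^2, with a_k^2 = 16 / k^4.
lhs = 2 * math.pi**4 / 5
rhs = (2 * math.pi**2 / 3) ** 2 / 2 + 16 * partial
```

Here `partial` agrees with `target` and `lhs` with `rhs` up to the (tiny) tail of the series.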
Exercise 5.8. The polynomials $P_n$ are given by $Q_n/\|Q_n\|_2$, where the $Q_n$ are recursively defined by $Q_0 = 1$ and
$$Q_n(x) := x^n - \sum_{k=0}^{n-1} \frac{\langle x^n, Q_k\rangle}{\langle Q_k, Q_k\rangle} Q_k(x) = x^n - \sum_{k=0}^{n-1} \langle x^n, P_k\rangle P_k(x), \qquad n \ge 1.$$
(a) Since $Q_0 = 1$, we have $P_0 = 1/\sqrt{2}$ and $Q_1 = x - \langle x, P_0\rangle P_0 = x$, because $\langle x, P_0\rangle = 0$. As a consequence $P_1(x) = \sqrt{3/2}\,x$. Since $\langle x^2, P_1\rangle = 0$, we have also
$$Q_2(x) = x^2 - \langle x^2, P_0\rangle P_0 - \langle x^2, P_1\rangle P_1 = x^2 - \frac{1}{3}$$
and this leads, with simple calculations, to $P_2(x) = \sqrt{45/8}\,(x^2 - 1/3)$.
(b) Let $H$ be the closure of the vector space spanned by $C_n$. This space contains all monomials $x^n$, and therefore all polynomials. Since the polynomials are dense in $C([a, b])$, for the sup norm, they are also dense in $L^2(a, b)$. It follows that $H = L^2(a, b)$. By Proposition 4.13 we conclude that $(C_n)$ is complete.
(c) Set
$$z_n := \sqrt{\frac{2n+1}{2}}\,\frac{1}{2^n n!}, \qquad \tilde P_n(x) := z_n \frac{d^n}{dx^n}(x^2 - 1)^n.$$
Clearly the polynomial $\tilde P_n$ has degree $n$. So, in order to show that $\tilde P_n = P_n$, we have to show that $\tilde P_n$ is orthogonal to all monomials $x^k$, $k = 0, \dots, n-1$, and that $\|\tilde P_n\|_2 = 1$. Since $(x^2 - 1)^n$ has zeros at $\pm 1$ with multiplicity $n$, all its derivatives at $\pm 1$ with order less than $n$ are zero. Therefore, for $k < n$, integrating by parts $k$ times we have
$$\langle \tilde P_n, x^k\rangle = z_n\Big( \Big[ x^k \frac{d^{n-1}}{dx^{n-1}}(x^2-1)^n \Big]_{-1}^{1} - k\int_{-1}^{1} x^{k-1}\frac{d^{n-1}}{dx^{n-1}}(x^2-1)^n\,dx \Big) = \dots = (-1)^k k!\, z_n \Big[ \frac{d^{n-k-1}}{dx^{n-k-1}}(x^2-1)^n \Big]_{-1}^{1} = 0.$$
In order to prove that $\|\tilde P_n\|_2 = 1$, still integrating by parts we have
$$\langle \tilde P_n, \tilde P_n\rangle = -z_n^2\int_{-1}^{1} \frac{d^{n-1}}{dx^{n-1}}(x^2-1)^n\,\frac{d^{n+1}}{dx^{n+1}}(x^2-1)^n\,dx = \dots = z_n^2\int_{-1}^{1} (1 - x^2)^n \frac{d^{2n}}{dx^{2n}}(x^2-1)^n\,dx. \qquad \text{(A.16)}$$
On the other hand
$$\int_{-1}^{1} (1-x^2)^n\,dx = 2n\int_{-1}^{1} (1-x^2)^{n-1}x^2\,dx = -2n\int_{-1}^{1} (1-x^2)^n\,dx + 2n\int_{-1}^{1} (1-x^2)^{n-1}\,dx,$$
so that
$$\int_{-1}^{1} (1-x^2)^n\,dx = \frac{2n}{2n+1}\int_{-1}^{1} (1-x^2)^{n-1}\,dx = \dots = \frac{(2n)!!}{(2n+1)!!}\int_{-1}^{1} (1-x^2)^0\,dx = \frac{2\,(2n)!!}{(2n+1)!!}.$$
Taking into account that
$$\frac{d^{2n}}{dx^{2n}}(x^2-1)^n = (2n)! = (2n)!!\,(2n-1)!! = 2^n n!\,(2n-1)!!$$
from (A.16) we get
$$\langle \tilde P_n, \tilde P_n\rangle = \frac{2n+1}{2}\,\frac{1}{2^{2n}(n!)^2}\,\frac{2\,(2n)!!}{(2n+1)!!}\,2^n n!\,(2n-1)!! = 1.$$
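The Rodrigues-type formula of part (c) lends itself to an exact check with rational arithmetic. The sketch below is illustrative only: polynomials are represented as coefficient lists (lowest degree first), all helper names are assumptions, and the normalization used is the classical one, $\frac{1}{2^n n!}\frac{d^n}{dx^n}(x^2-1)^n$, whose squared $L^2(-1,1)$ norm should be $2/(2n+1)$, so that multiplying by $\sqrt{(2n+1)/2}$ gives a unit vector.

```python
from fractions import Fraction
from math import factorial

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists."""
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def poly_diff(p):
    """Derivative of a polynomial given as a coefficient list."""
    return [k * p[k] for k in range(1, len(p))] or [Fraction(0)]

def integrate_sym(p):
    """Exact integral of p over (-1, 1); odd powers do not contribute."""
    return sum(2 * c / (k + 1) for k, c in enumerate(p) if k % 2 == 0)

def rodrigues(n):
    """Coefficients of (1 / (2^n n!)) d^n/dx^n (x^2 - 1)^n."""
    p = [Fraction(1)]
    for _ in range(n):
        p = poly_mul(p, [Fraction(-1), Fraction(0), Fraction(1)])
    for _ in range(n):
        p = poly_diff(p)
    return [c / (2**n * factorial(n)) for c in p]

def inner(p, q):
    """Scalar product in L^2(-1, 1)."""
    return integrate_sym(poly_mul(p, q))

# Gram matrix of the first six polynomials: diagonal with entries 2 / (2n + 1).
gram = [[inner(rodrigues(n), rodrigues(m)) for m in range(6)] for n in range(6)]
```

For instance `rodrigues(2)` is $\frac{3}{2}x^2 - \frac{1}{2}$, and $\sqrt{5/2}$ times it equals $\sqrt{45/8}\,(x^2 - 1/3)$, in agreement with part (a).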
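Exercise 5.9. Recall that
$$c_k = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x) e^{-ikx}\,dx.$$
Integrating by parts once and using that $f(-\pi) = f(\pi)$ we get
$$c_k = \frac{1}{ik}\,\frac{1}{2\pi}\int_{-\pi}^{\pi} f'(x) e^{-ikx}\,dx.$$
Continuing in this way, in $m$ steps we get
$$c_k = \frac{1}{(ik)^m}\,\frac{1}{2\pi}\int_{-\pi}^{\pi} f^{(m)}(x) e^{-ikx}\,dx.$$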
Chapter 6
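Exercise 6.1. Let us prove the inclusion
$$(\mathcal F_1 \times \mathcal F_2) \times \mathcal F_3 \subset \mathcal F_1 \times (\mathcal F_2 \times \mathcal F_3),$$
the proof of the converse one being analogous. We have to show that all products $A \times B$, with $A \in \mathcal F_1 \times \mathcal F_2$ and $B \in \mathcal F_3$, belong to $\mathcal F_1 \times (\mathcal F_2 \times \mathcal F_3)$. Keeping $B$ fixed, the class of sets $A$ for which this property holds is a $\sigma$-algebra that contains the system of measurable rectangles $A = A_1 \times A_2$ (because $A \times B = A_1 \times (A_2 \times B)$ and $A_2 \times B \in \mathcal F_2 \times \mathcal F_3$), and therefore the whole product $\sigma$-algebra $\mathcal F_1 \times \mathcal F_2$.
For all $A$ in the product $\sigma$-algebra we have
$$(\mu_1 \times \mu_2) \times \mu_3\,(A) = \int_{X_1 \times X_2} \mu_3(A_{x_1 x_2})\,d\mu_1 \times \mu_2(x_1, x_2) = \int_{X_1}\int_{X_2} \mu_3(A_{x_1 x_2})\,d\mu_2(x_2)\,d\mu_1(x_1)$$
$$= \int_{X_1} \mu_2 \times \mu_3(A_{x_1})\,d\mu_1(x_1) = \mu_1 \times (\mu_2 \times \mu_3)(A).$$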
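Exercise 6.2. Obviously the cubes belong to $\times_1^n \mathcal B(\mathbb R)$, and thanks to Lemma 6.9 the same is true for the open sets. It follows that $\mathcal B(\mathbb R^n)$ is contained in $\times_1^n \mathcal B(\mathbb R)$. Let us consider the class
$$\mathcal M := \{ B \subset \mathbb R : B \times \mathbb R \times \cdots \times \mathbb R \in \mathcal B(\mathbb R^n) \}.$$
This class contains the open sets (because the product of open sets is open) and it is a $\sigma$-algebra, so it contains $\mathcal B(\mathbb R)$. We have thus proved that all rectangles $B_1 \times \mathbb R \times \cdots \times \mathbb R$, with $B_1$ Borel, belong to $\mathcal B(\mathbb R^n)$. By a similar argument we can show that all rectangles
$$\mathbb R \times \cdots \times \mathbb R \times B_i \times \mathbb R \times \cdots \times \mathbb R$$
are Borel. Intersecting rectangles in these families we obtain that all rectangles with Borel sides belong to $\mathcal B(\mathbb R^n)$ and we conclude.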
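Exercise 6.3. Assume that $A, B \in \mathcal L_1$; then there exist Borel sets $A', B'$ and Borel Lebesgue negligible sets $N_A, N_B$ with $A \Delta A' \subset N_A$ and $B \Delta B' \subset N_B$. Since $A' \times B' \in \mathcal B(\mathbb R^2)$ by the previous exercise,
$$(A \times B) \Delta (A' \times B') \subset (N_A \times \mathbb R) \cup (\mathbb R \times N_B)$$
and $N_A \times \mathbb R$ and $\mathbb R \times N_B$ are $\mathscr L^2$-negligible, we obtain that $A \times B \in \mathcal L_2$. This proves that $\mathcal L_2$ contains the generators of $\mathcal L_1 \times \mathcal L_1$, and therefore the whole $\sigma$-algebra. In order to show the strict inclusion, we consider the set $E = F \times \{0\}$, where $F \subset \mathbb R$ is not Lebesgue measurable. Since $E$ is $\mathscr L^2$-negligible we have $E \in \mathcal L_2$. On the other hand, since the 0-section $E_0$ coincides with $F$, and therefore it does not belong to $\mathcal L_1$, the set $E$ cannot belong to the product of the two $\sigma$-algebras.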
Exercise 6.4. Let $\mathcal A$ be the $\sigma$-algebra generated by these sets; since these sets are obviously cylindrical, $\mathcal A$ is contained in the product $\sigma$-algebra. The class of sets $B \subset \times_1^n X_i$ such that $B \times X_{n+1} \times X_{n+2} \times \cdots \in \mathcal A$ is a $\sigma$-algebra containing the measurable rectangles $A_1 \times \cdots \times A_n$, and therefore contains the product $\sigma$-algebra $\times_1^n \mathcal F_i$. Therefore $\mathcal A$ contains the cylindrical sets and, by definition, the whole product $\sigma$-algebra.
Exercise 6.5. The sections $T_y := \{(x, z) : (x, y, z) \in T\}$ are squares with side length $2\sqrt{r^2 - |y|^2}$ for $0 \le |y| \le r$, hence
$$\mathscr L^3(T) = \int_{-r}^{r} \mathscr L^2(T_y)\,dy = 8\int_0^r (r^2 - y^2)\,dy = 8\Big(r^3 - \frac{1}{3}r^3\Big) = \frac{16}{3}r^3.$$
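Exercise 6.6. For $x \in \mathbb R^n$ (with $n \ge 3$) let
$$r := (x_1^2 + x_2^2)^{1/2}, \qquad A_r := \{ (x_3, \dots, x_n) : (x_3^2 + \cdots + x_n^2) < 1 - r^2 \}.$$
Then, using polar coordinates we get
$$\omega_n = \int_{\{r<1\}} \mathscr L^{n-2}(A_r)\,dx_1\,dx_2 = 2\pi\,\omega_{n-2}\int_0^1 r(1 - r^2)^{(n-2)/2}\,dr = \frac{2\pi}{n}\,\omega_{n-2}.$$
Therefore
$$\omega_{2k} = \frac{2^{k-1}\pi^{k-1}}{2k(2k-2)\cdots 4}\,\omega_2 = \frac{\pi^k}{k!}$$
and an analogous argument gives $\omega_{2k+1} = 2^{k+1}\pi^k/(2k+1)!!$.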
Exercise 6.7. In order to show that $\omega_n = \dfrac{\pi^{n/2}}{\Gamma(\frac{n}{2} + 1)}$ we show that the right hand side satisfies the same recursion formula of the previous exercise. Since (thanks to the identities $\Gamma(1) = 1$, $\Gamma(1/2) = \sqrt{\pi}$) the formula holds when $n = 1, 2$, this will prove that the identity holds for all $n$. For $n \ge 2$ we have
$$\frac{\pi^{n/2}}{\Gamma(\frac{n}{2} + 1)} = \frac{\pi\,\pi^{(n-2)/2}}{\frac{n}{2}\,\Gamma(\frac{n}{2})} = \frac{2\pi}{n}\,\frac{\pi^{(n-2)/2}}{\Gamma(\frac{n-2}{2} + 1)}.$$
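The agreement between the recursion $\omega_n = \frac{2\pi}{n}\omega_{n-2}$ and the Gamma-function formula can be checked numerically. The following sketch assumes the base values $\omega_1 = 2$ and $\omega_2 = \pi$ (length of $(-1,1)$ and area of the unit disk); the function names are hypothetical.

```python
import math

def omega_recursive(n):
    """Volume of the unit ball of R^n via omega_n = (2 pi / n) omega_{n-2}."""
    if n == 1:
        return 2.0
    if n == 2:
        return math.pi
    return 2 * math.pi / n * omega_recursive(n - 2)

def omega_gamma(n):
    """Volume of the unit ball of R^n via pi^{n/2} / Gamma(n/2 + 1)."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)

def double_factorial(m):
    """m!! for a positive odd integer m."""
    return math.prod(range(m, 0, -2))

# Compare the two expressions in dimensions 1, ..., 20.
pairs = [(omega_recursive(n), omega_gamma(n)) for n in range(1, 21)]
```

The closed forms $\omega_{2k} = \pi^k/k!$ and $\omega_{2k+1} = 2^{k+1}\pi^k/(2k+1)!!$ can be compared in the same way, for instance $\omega_3 = 4\pi/3$.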
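Exercise 6.8. We know, by Exercise 2.4, that there exist a $\mu \times \nu$-negligible set $N \in \mathcal F \times \mathcal G$ and a $\mathcal F \times \mathcal G$-measurable function $\tilde F : X \times Y \to [0, +\infty]$ such that $\{F \neq \tilde F\}$ is contained in $N$. By applying the Fubini-Tonelli theorem to $1_N$ we obtain that $N_x$ is $\nu$-negligible in $Y$ for $\mu$-a.e. $x \in X$. Since $\{F(x, \cdot) \neq \tilde F(x, \cdot)\} \subset N_x$, still Exercise 2.4 gives that $F(x, \cdot)$ is measurable with respect to the $\nu$-completion of $\mathcal G$ for $\mu$-a.e. $x \in X$. This proves statement (i). Since, still for $\mu$-a.e. $x \in X$, the integral on $Y$ (with respect to $\nu$) of $F(x, \cdot)$ coincides with the integral of $\tilde F(x, \cdot)$, statements (ii) and (iii) follow by applying the Fubini-Tonelli theorem to $\tilde F$.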
Exercise 6.9. Indeed, $\mu(D_y) = \mu(\{y\}) = 0$ for all $y \in Y$, so that $\int_Y \mu(D_y)\,d\nu(y) = 0$. On the other hand, $\nu(D_x) = \nu(\{x\}) = 1$ for all $x \in X$, so that $\int_X \nu(D_x)\,d\mu(x) = 1$.
Exercise 6.10. Let $(h(k))$ be a subsequence such that $\sum_k \|f_{h(k)} - f\|_1$ is convergent. Then the Fubini-Tonelli theorem gives
$$\int_X \Big( \sum_{k=0}^{\infty} \int_Y |f_{h(k)}(x, y) - f(x, y)|\,d\nu(y) \Big)\,d\mu(x) = \sum_{k=0}^{\infty} \int_{X \times Y} |f_{h(k)}(x, y) - f(x, y)|\,d\mu \times \nu < \infty.$$
It follows that $\sum_k \|f_{h(k)}(x, \cdot) - f(x, \cdot)\|_{L^1(\nu)}$ is finite for $\mu$-a.e. $x \in X$, and for any such $x$ the functions $f_{h(k)}(x, \cdot)$ converge to $f(x, \cdot)$ in $L^1(\nu)$.
Choosing $Y = \{\bar y\}$ and $\nu = \delta_{\bar y}$, to provide a counterexample it is sufficient to consider any example (see Remark 3.7) of a sequence converging in $L^1$ but not almost everywhere.
Exercise 6.11. It suffices to apply (6.15) to $|h|$ to show that $\int |h|\,d(f\mu)$ is finite if and only if $\int |h| f\,d\mu$ is finite.
Exercise 6.12. We prove the property for the sup, the property for the inf being analogous. If $A = B_1 \cup B_2$ with $B_1 \in \mathcal F$ and $B_2 \in \mathcal F$ disjoint, we have
$$f\mu(B_1) + g\mu(B_2) = \int_{B_1} f\,d\mu + \int_{B_2} g\,d\mu \le \int_{B_1} f \vee g\,d\mu + \int_{B_2} f \vee g\,d\mu = \int_A f \vee g\,d\mu.$$
The arbitrariness of this decomposition proves that $[(f\mu) \vee (g\mu)](A) \le (f \vee g)\mu(A)$. The converse inequality can be obtained noticing that, in the chain of equalities and inequalities above, the inequality becomes an equality if we choose $B_1 = A \cap \{f \ge g\}$ and $B_2 = A \cap \{f < g\}$.
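Exercise 6.13. Denote by $\underline\mu$ and $\overline\mu$ the infimum and the supremum of the family $(\mu_i)_{i \in I}$. It is easy to check that $\underline\mu \le \mu_i$ (respectively, $\overline\mu \ge \mu_i$) for all $i \in I$, and that any measure $\sigma$ with this property is less than $\underline\mu$ (resp. greater than $\overline\mu$): just write $\sigma(B) = \sum_k \sigma(B_k) \le \sum_k \mu_{i(k)}(B_k)$ (resp. $\ge \sum_k \mu_{i(k)}(B_k)$). So, it remains to show that $\underline\mu$ and $\overline\mu$ are $\sigma$-additive.
For any map $i : \mathbb N \to I$, $A_1, A_2 \in \mathcal F$ disjoint and any countable $\mathcal F$-measurable partition $(B_k)$ of $A_1 \cup A_2$ we have
$$\sum_{k=0}^{\infty} \mu_{i(k)}(B_k) = \sum_{k=0}^{\infty} \mu_{i(k)}(B_k \cap A_1) + \sum_{k=0}^{\infty} \mu_{i(k)}(B_k \cap A_2).$$
Estimating the right hand side from below with $\underline\mu(A_1) + \underline\mu(A_2)$ we get (because $(B_k)$ is arbitrary) that $\underline\mu$ is superadditive, i.e. $\underline\mu(A_1 \cup A_2) \ge \underline\mu(A_1) + \underline\mu(A_2)$. With a similar argument one can prove not only that $\overline\mu$ is subadditive, but also that $\overline\mu$ is $\sigma$-subadditive (it suffices to consider a countable $\mathcal F$-measurable family, instead of 2 sets).
Now, let us prove that $\underline\mu$ is subadditive and $\overline\mu$ is superadditive. Let $A_1, A_2 \in \mathcal F$ be disjoint and let $(B_k^1)$, $(B_k^2)$ be countable $\mathcal F$-measurable partitions of $A_1$ and $A_2$ respectively. If $i_1, i_2 : \mathbb N \to I$ we define $i(2k) = i_1(k)$, $B_{2k} = B_k^1$ and $i(2k+1) = i_2(k)$, $B_{2k+1} = B_k^2$, so that
$$\underline\mu(A_1 \cup A_2) \le \sum_{k=0}^{\infty} \mu_{i(k)}(B_k) = \sum_{k=0}^{\infty} \mu_{i_1(k)}(B_k^1) + \sum_{k=0}^{\infty} \mu_{i_2(k)}(B_k^2).$$
By the arbitrariness of $B_k^1$, $B_k^2$, $i_1$ and $i_2$ we conclude that $\underline\mu(A_1 \cup A_2) \le \underline\mu(A_1) + \underline\mu(A_2)$. With a similar argument one can prove that $\underline\mu$ is even $\sigma$-subadditive (one has to use a bijection between $\mathbb N \times \mathbb N$ and $\mathbb N$) and that $\overline\mu$ is superadditive.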
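Exercise 6.14. If for all $\varepsilon > 0$ there exists $\delta > 0$ satisfying
$$A \in \mathcal F, \quad \mu(A) < \delta \quad \Longrightarrow \quad \nu(A) < \varepsilon$$
then $\nu \ll \mu$: indeed, if $\mu(A) = 0$ the implication above holds for all $\varepsilon > 0$, hence $\nu(A) = 0$. If $\nu$ is finite, to prove the converse we argue by contradiction. Assume that, for some $\varepsilon_0 > 0$, we can find sets $A_n \in \mathcal F$ with $\mu(A_n) < 2^{-n}$ and $\nu(A_n) \ge \varepsilon_0$. Then, by the Borel-Cantelli lemma the set $A := \limsup_n A_n$ is $\mu$-negligible. On the other hand, we have
$$\nu\Big( \bigcup_{m=n}^{\infty} A_m \Big) \ge \nu(A_n) \ge \varepsilon_0$$
and therefore (here we use the assumption that $\nu$ is finite) $\nu(A) \ge \varepsilon_0$, contradicting the absolute continuity of $\nu$ with respect to $\mu$.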
Exercise 6.15. Let $B \in \mathcal F$ be a $\mu$-negligible set where $\nu$ is concentrated. Then $\nu(E) = \nu(E \cap B)$ for all $E \in \mathcal F$. But, by the absolute continuity of $\nu$ with respect to $\mu$, we have $\nu(E \cap B) = 0$ because $E \cap B \subset B$ is $\mu$-negligible.
Exercise 6.16. Let B F be a negligible set where is concentrated.
Then
(E) = (E B) (E B) +(E B) (E) E F.
where we used the fact that (E B) = 0 because E B B is
negligible.
Exercise 6.17. It is easy to check that the class of functions $f$ satisfying $f\mu \le \nu$ is a lattice. Hence, given a maximizing sequence $(f_h)$ in (6.20), possibly replacing $f_h$ by $\max_{i \le h} f_i$, we can assume that $f_h \uparrow f$. The monotone convergence theorem gives that $f$ is a maximizer.
In order to show that $\nu = f\mu$ we set $\sigma = \nu - f\mu \ge 0$ and notice that $\sigma$ satisfies the following property:
$$t > 0, \quad B \in \mathcal F, \quad t 1_B \mu \le \sigma \quad \Longrightarrow \quad \mu(B) = 0. \qquad \text{(A.17)}$$
Indeed, the integrals $\int_X (f + t 1_B)\,d\mu$ and $\int_X f\,d\mu$ have to coincide, because $(f + t 1_B)\mu \le \nu$.
Exercise 6.18. We have to prove that any measure satisfying (A.17)
is concentrated on a -negligible set. To this aim, let us consider the
problem
inf {(A) : A F. is concentrated on A} .
By taking the intersection of a minimizing sequence it is easy to check
that also this problem has a solution A; we have to show that (A) = 0.
By the minimality of A, the implication
F B A. (B) > 0 (B) > 0 (A.18)
holds. Let us consider the numbers
h
:= sup
_
(B) : F B A.
B
2
h
1
B
_
and let us prove that
h
0 as h . Given maximizers B
h
A,
whose existence is easy to check, we have (B
h
) 2
h
(B
h
) and in
particular
h
(B
h
) - . Hence
_
limsup
h
B
h
_
= 0
and (A.18) tells us that necessarily
0 =
_
limsup
h
B
h
_
limsup
h
(B
h
).
Let us show now that the maximality of B
h
implies that (C) 2
h
(C)
for any set C A\ B
h
, i.e. t 1
A\B
h
. Indeed, if there is C
0
A\ B
h
with (C
0
) > 2
h
(C
0
), the maximality of B
h
provides a minimal integer
h
1
1 and C
1
C
0
satisfying (C
1
) 2
h
(C
1
) 1,h
1
. Let us
consider C
0
\ C
1
; we still have (C
0
\ C
1
) > 2
h
(C
0
\ C
1
) and the
maximality of B
h
provide a minimal integer h
2
h
1
and C
2
C
0
\
C
1
satisfying (C
2
) 2
h
(C
2
) 1,h
2
. Continuing in this way we
have a nondecreasing sequence (h
i
) of integers and (C
i
) Fsuch that
(C
i
) 2
h
(C
i
) 1,h
i
and C
i
C
0
\
i 1
j =1
C
j
for all i 2; moreover
h
i
is the least integer for which there is such C
i
. Now lim
i
h
i
= , since
the C
i
are pairwise disjoint. Setting C = C
0
\
1
C
i
, for all F F
contained in C, since F C
0
\
i 1
1
C
j
for all i 2, we have (F)
2
h
(F)1,(h
i
1) (if h
i
2) and then (F) 2
h
(F). Hence B
h
C
is an admissible set for the maximum problem dening
h
, against the
maximality of B
h
.
We choose h in such a way that
h
- (A) and set t = 2
h
, B = A\B
h
in (A.17). From(A.17) we conclude that (B) = 0, contradicting the fact
that (B) = (A)
h
> 0.
Exercise 6.19. Let $\nu = \nu^+ - \nu^-$ and let $\nu^+ = \nu^+_a + \nu^+_s$, $\nu^- = \nu^-_a + \nu^-_s$ be the Lebesgue decompositions with respect to $\mu$ of $\nu^+$ and $\nu^-$ respectively. Then, $\nu_a := \nu^+_a - \nu^-_a$ and $\nu_s := \nu^+_s - \nu^-_s$ provide a decomposition $\nu = \nu_a + \nu_s$ with $\nu_a$, $\nu_s$ signed, $|\nu_a| \ll \mu$ and $|\nu_s| \perp \mu$.
If $\mu$ is signed and $A$ provides a Hahn decomposition of $\mu$ (i.e. $\mu^+(E) = \mu(E \cap A)$ and $\mu^-(E) = -\mu(E \cap A^c)$), we repeat the decomposition above in $A$, relative to $\nu$ and $\mu^+$, and in $B = A^c$, relative to $\nu$ and $\mu^-$. Denoting by $\nu^A_a + \nu^A_s$ and $\nu^B_a + \nu^B_s$ the two decompositions obtained,
$$\nu_a(E) := \nu^A_a(E \cap A) + \nu^B_a(E \cap B), \qquad \nu_s(E) := \nu^A_s(E \cap A) + \nu^B_s(E \cap B)$$
provides the desired decomposition $\nu = \nu_a + \nu_s$ with $|\nu_a| \ll |\mu|$ and $|\nu_s| \perp |\mu|$.
The uniqueness of these decompositions can be proved with the same argument used in the case of nonnegative measures.
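Exercise 6.20. Let $B \in \mathcal F$ and let $(B_i)$ be a $\mathcal F$-measurable partition of $B$; since
$$\sum_{i=0}^{\infty} |f\mu(B_i)| = \sum_{i=0}^{\infty} \Big| \int_{B_i} f\,d\mu \Big| \le \sum_{i=0}^{\infty} \int_{B_i} |f|\,d\mu = \int_B |f|\,d\mu,$$
we obtain that $|f\mu|(B) \le |f|\mu(B)$. To prove the converse inequality fix $\varepsilon > 0$ and define $B_i = B \cap f^{-1}(I_i)$, where $I_i = [\varepsilon i, \varepsilon(i+1))$, $i \in \mathbb Z$. Since $|f - \varepsilon i|$ and $\big| |f| - \varepsilon|i| \big|$ are less than $\varepsilon$ in $f^{-1}(I_i)$, we get
$$\Big| \int_{B_i} f\,d\mu - \varepsilon i\,\mu(B_i) \Big| \le \varepsilon\mu(B_i), \qquad \Big| \int_{B_i} |f|\,d\mu - \varepsilon|i|\,\mu(B_i) \Big| \le \varepsilon\mu(B_i),$$
hence
$$\int_{B_i} |f|\,d\mu - \Big| \int_{B_i} f\,d\mu \Big| \le 2\varepsilon\mu(B_i).$$
It follows that
$$\sum_{i \in \mathbb Z} |f\mu(B_i)| = \sum_{i \in \mathbb Z} \Big| \int_{B_i} f\,d\mu \Big| \ge \sum_{i \in \mathbb Z} \Big( \int_{B_i} |f|\,d\mu - 2\varepsilon\mu(B_i) \Big) = \int_B |f|\,d\mu - 2\varepsilon\mu(B).$$
Since $\varepsilon$ is arbitrary the converse inequality follows.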
Exercise 6.21. If $x < 0$ or $x \ge 1$ all repartition functions are respectively equal to 0 or 1, so we need to consider only the case $x \in [0, 1)$. The repartition function of $1_{[0,1]}\mathscr L^1$ obviously is equal to $x$, while
$$\mu_h((-\infty, x]) = \frac{\#\{ i \in [1, h] : i \le hx \}}{h} = \frac{[hx]}{h},$$
where $[s]$ denotes the integer part of $s$. Using the inequalities $s - 1 < [s] \le s$ with $s = hx$ we obtain that $\mu_h((-\infty, x]) \to x$.
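The counting argument of Exercise 6.21 can be reproduced exactly with rational arithmetic. The sketch below assumes that $\mu_h$ is the uniform measure on the points $i/h$, $i = 1, \dots, h$ (mass $1/h$ on each), which is what the counting formula in the solution suggests; the function name `F_h` is hypothetical.

```python
from fractions import Fraction
from math import floor

def F_h(h, x):
    """Repartition function at x of the measure with mass 1/h on each i/h."""
    count = sum(1 for i in range(1, h + 1) if Fraction(i, h) <= x)
    return Fraction(count, h)

x = Fraction(1, 3)
# Values of the repartition function for increasingly fine grids.
values = {h: F_h(h, x) for h in (1, 10, 100, 1000)}
```

For every `h` the value coincides with $[hx]/h$ and differs from $x$ by less than $1/h$, which gives the convergence to $x$.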
Exercise 6.22. The argument is similar to the one used in the proof of Theorem 6.27: if $y < x < y'$ and $y, y' \in D$ we have
$$F(y) = \lim_{h \to \infty} F_h(y) \le \liminf_{h \to \infty} F_h(x) \le \limsup_{h \to \infty} F_h(x) \le \lim_{h \to \infty} F_h(y') = F(y').$$
Letting $y \uparrow x$ and $y' \downarrow x$, we conclude.
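Exercise 6.23. We define $a_{-h^2} = \mu((-\infty, -h])$ and, for $-h^2 < i \le h^2$, $a_i = \mu(((i-1)/h, i/h])$. Let us denote by $\mu_h$ the measure obtained in this way. If $x \in (-h, h]$ and $i$ is the smallest integer in $(-h^2, h^2]$ such that $x \le i/h$, we have
$$\mu\Big( \Big( -\infty, x - \frac{1}{h} \Big] \Big) \le \mu\Big( \Big( -\infty, \frac{i-1}{h} \Big] \Big) = \sum_{j=-h^2}^{i-1} a_j \le \mu_h((-\infty, x]).$$
If $x$ is not an atom of $\mu$, this proves that
$$\liminf_{h \to \infty} \mu_h((-\infty, x]) \ge \mu((-\infty, x]).$$
Analogously
$$\mu\Big( \Big( -\infty, x + \frac{1}{h} \Big] \Big) \ge \mu\Big( \Big( -\infty, \frac{i}{h} \Big] \Big) = \sum_{j=-h^2}^{i} a_j \ge \mu_h((-\infty, x]).$$
If $x$ is not an atom of $\mu$, this proves that
$$\limsup_{h \to \infty} \mu_h((-\infty, x]) \le \mu((-\infty, x]).$$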
Exercise 6.24. Let us assume that (6.31) holds. If $F_i(x) \to 1$ as $x \to +\infty$ uniformly in $i \in I$, for any $\varepsilon > 0$ we can find $x$ such that $1 - F_i(x) < \varepsilon/2$ for all $i \in I$. Analogously, we can find $y < x$ such that $F_i(y) < \varepsilon/2$ for all $i \in I$. Then, the interval $J = (y, x]$ satisfies $\mu_i(J) > 1 - \varepsilon$ for all $i \in I$, because $J^c = (-\infty, y] \cup (x, +\infty)$.
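Exercise 6.25. If $\mu$ is the weak limit and $\varepsilon > 0$ is given, let us choose an integer $m \ge 1$ such that $\mu([1-m, m-1]) > 1 - \varepsilon$ and points $x \in (-m, 1-m)$ and $y \in (m-1, m)$ where the repartition functions of $\mu_n$ are converging to the repartition function of $\mu$. Then, since $\mu((-\infty, x]) + 1 - \mu((-\infty, y]) = \mu(\mathbb R \setminus (x, y]) < \varepsilon$, there exists $n' \in \mathbb N$ such that $\sup_{n \ge n'} \mu_n((-\infty, x]) + 1 - \mu_n((-\infty, y]) < \varepsilon$. Let now $x'$ and $y'$ be satisfying
$$\mu_n((-\infty, x']) + 1 - \mu_n((-\infty, y']) < \varepsilon \qquad n = 0, \dots, n' - 1.$$
Then, the interval $I = [\min\{x, x'\}, \max\{y, y'\}]$ satisfies $\inf_n \mu_n(I) > 1 - \varepsilon$.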
Exercise 6.26.
(a) $\lim_{h \to \infty} \int_{\mathbb R} g\,d\mu_h = \int_{\mathbb R} g\,d\mu$ for all $g \in C_b(\mathbb R)$ (that is, (6.32));
(b) $\lim_{h \to \infty} \int_{\mathbb R} g\,d\mu_h = \int_{\mathbb R} g\,d\mu$ for all $g \in C_c(\mathbb R)$;
(c) $F_h$ converge to $F$ on all points where $F$ is continuous;
(d) $F_h$ converge to $F$ on a dense subset of $\mathbb R$;
(e) $\lim_{h \to \infty} \mu_h(\mathbb R) = \mu(\mathbb R)$;
(f) $(\mu_h)$ is tight.
We consider the functions $\rho_h(x) := \rho(x + h)$, where $\rho(x) = (2\pi)^{-1/2} e^{-x^2/2}$ is the Gaussian, and $\mu_h = \rho_h \lambda$ ($\lambda$ being the Lebesgue measure), $\mu = 0$. In this case (c), (d) do not hold, because $F_h(x) \to 1 \neq 0 = F(x)$ for all $x \in \mathbb R$, (e) does not hold and (b) holds.
a $\Rightarrow$ b, e. This is easy, because $C_c(\mathbb R) \subset C_b(\mathbb R)$ and $1_{\mathbb R} \in C_b(\mathbb R)$.
a $\Rightarrow$ c. This follows by the second part of the proof of Theorem 6.28.
d $\Rightarrow$ c. This is Exercise 6.22.
b, e $\Rightarrow$ c. This follows by the same argument used in the second part of the proof of Theorem 6.28: the sequence $(g_k)$ monotonically convergent to $1_A$ can be chosen in $C_c(\mathbb R)$, and this shows that $\liminf_h \mu_h(A) \ge \mu(A)$ for all $A \subset \mathbb R$ open. Using (e) and passing to the complementary sets, we obtain $\limsup_h \mu_h(C) \le \mu(C)$ for all $C \subset \mathbb R$ closed.
d f . This follows by the same argument used in the solution of
Exercise 6.25.
d, f $\Rightarrow$ e. For all $x \in D$, with $D$ dense, we have $\lim_h \mu_h((-\infty, x]) = \mu((-\infty, x])$. Since $\mu_h((-\infty, x]) \to \mu_h(\mathbb R)$ as $x \to +\infty$ uniformly in $h$, we can pass to the limit as $x \in D \to +\infty$ to obtain $\lim_h \mu_h(\mathbb R) = \lim_{x \to +\infty} \mu((-\infty, x]) = \mu(\mathbb R)$.
d, f $\Rightarrow$ a. This follows by the same argument used in the first part of the proof of Theorem 6.28, choosing the points $t_i$ in the partitions to be in the dense set where convergence occurs.
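Exercise 6.27. Set
$$g(\xi) := \frac{1}{\sqrt{2\pi\sigma^2}} \int_{\mathbb R} e^{-ix\xi} e^{-x^2/(2\sigma^2)}\,dx.$$
Notice that $g(0) = 1$, and that differentiation theorems under the integral sign $^{(2)}$ and an integration by parts give
$$g'(\xi) = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{\mathbb R} i e^{-ix\xi} \big( -x e^{-x^2/(2\sigma^2)} \big)\,dx = -\frac{\sigma^2}{\sqrt{2\pi\sigma^2}} \int_{\mathbb R} i\,\frac{d}{dx}\big( e^{-ix\xi} \big) e^{-x^2/(2\sigma^2)}\,dx = -\frac{\xi\sigma^2}{\sqrt{2\pi\sigma^2}} \int_{\mathbb R} e^{-ix\xi} e^{-x^2/(2\sigma^2)}\,dx.$$
Therefore $g$ satisfies the linear differential equation $g'(\xi) = -\sigma^2 \xi g(\xi)$, whose general solution is $g(\xi) = c e^{-\sigma^2\xi^2/2}$. Taking into account that $g(0) = 1$, we get $c = 1$.
(2) In this case, the application of the theorem is justified by the fact that $\sup_{\xi \in I} \big| \frac{d}{d\xi} e^{-ix\xi} e^{-x^2/(2\sigma^2)} \big|$ is Lebesgue integrable for all bounded intervals $I$.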
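The formula $g(\xi) = e^{-\sigma^2\xi^2/2}$ for the Fourier transform of the centered Gaussian can be checked by direct numerical integration. This is only a sketch: the sign convention $e^{-ix\xi}$, the truncation of the integral to $[-12\sigma, 12\sigma]$ and the midpoint rule are assumptions made for the example, and the result does not depend on the sign convention because the Gaussian is even.

```python
import cmath
import math

def gaussian_ft(xi, sigma, n=20000):
    """Midpoint-rule approximation of the Fourier transform of N(0, sigma^2)."""
    L = 12.0 * sigma              # the Gaussian tail beyond 12 sigma is negligible
    h = 2 * L / n
    total = 0j
    for j in range(n):
        x = -L + (j + 0.5) * h
        total += cmath.exp(-1j * x * xi) * math.exp(-x * x / (2 * sigma**2))
    return total * h / math.sqrt(2 * math.pi * sigma**2)

sigma, xi = 1.5, 0.8
value = gaussian_ft(xi, sigma)
expected = math.exp(-sigma**2 * xi**2 / 2)
```

The computed `value` is real up to rounding and matches `expected`, consistently with the solution of the differential equation $g' = -\sigma^2\xi g$, $g(0) = 1$.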
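Exercise 6.28. Let us approximate $\mu$ by $\mu_n = 1_{(-n, n)}\mu$; using the inequality
$$|e^{-ix\xi} - e^{-ix\eta}| \le |x||\xi - \eta| \qquad x, \xi, \eta \in \mathbb R$$
we obtain that
$$|\hat\mu_n(\xi) - \hat\mu_n(\eta)| \le |\xi - \eta| \int_{\mathbb R} |x|\,d\mu_n(x) \le n|\xi - \eta|,$$
therefore $\hat\mu_n$ is uniformly continuous. Since $|\hat\mu_n(\xi) - \hat\mu(\xi)| \le \mu(\mathbb R \setminus (-n, n))$, we have that $\hat\mu_n \to \hat\mu$ uniformly as $n \to \infty$, therefore $\hat\mu$ is uniformly continuous (indeed, given $\varepsilon > 0$, find $n$ such that $\sup |\hat\mu_n - \hat\mu| < \varepsilon/4$ and $\delta = \varepsilon/(2n)$ to obtain $|\hat\mu_n(\xi) - \hat\mu_n(\eta)| \le \varepsilon/2$ whenever $|\xi - \eta| < \delta$, and then $|\hat\mu(\xi) - \hat\mu(\eta)| < \varepsilon$).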
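Exercise 6.29. We use the convention $\hat\mu(\xi) = \int_{\mathbb R} e^{-ix\xi}\,d\mu(x)$. Obviously $|\hat\mu(\xi_0)| = 1$, and we set $c = \hat\mu(\xi_0) = e^{i\theta}$ for some $\theta \in \mathbb R$. Since
$$\int_{\mathbb R} |1 - \bar c\,e^{-ix\xi_0}|^2\,d\mu(x) = 2 - \bar c c - c \bar c = 0,$$
we obtain that $e^{-ix\xi_0} = c$ for $\mu$-a.e. $x \in \mathbb R$. This implies that $x\xi_0 + \theta \in 2\pi\mathbb Z$ for $\mu$-a.e. $x \in \mathbb R$, so that $\mu$ is concentrated on the set of points $\{(2\pi n - \theta)/\xi_0\}_{n \in \mathbb Z}$, and it suffices to set $x_0 = -\theta/\xi_0$ to obtain the stated representation of $\mu$ as a sum of Dirac masses.
Obviously $|\hat\mu| \equiv 1$ if $\mu$ is a Dirac mass. Conversely, if $|\hat\mu| \equiv 1$, we find $x_0$ with $\mu(\{x_0\}) > 0$ and $\xi_0, \eta_0 \in \mathbb R \setminus \{0\}$ with $\xi_0/\eta_0 \notin \mathbb Q$ to obtain that $\mu$ is concentrated on the set $\{2\pi n/\xi_0 + x_0\}_{n \in \mathbb Z}$ and on the set $\{2\pi n/\eta_0 + x_0\}_{n \in \mathbb Z}$. By our choice of $\xi_0$ and $\eta_0$, the intersection of the two sets is the singleton $\{x_0\}$, and this proves that $\mu = \delta_{x_0}$.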
Chapter 7
Exercise 7.1. Let $C > 0$ be such that $|H(x) - H(y)| \le C|x - y|$ for all $x, y \in \mathbb R$. Let $\varepsilon > 0$ and let $\delta > 0$ be such that $\sum_i |f(b_i) - f(a_i)| < \varepsilon/C$ whenever $\sum_i (b_i - a_i) < \delta$. We have
$$\sum_i |H(f(b_i)) - H(f(a_i))| \le C \sum_i |f(b_i) - f(a_i)| < \varepsilon$$
whenever $\sum_i (b_i - a_i) < \delta$. In particular, choosing $f(t) = t$, we see that Lipschitz functions are absolutely continuous.
Exercise 7.2. We assume that both $\mathscr L^1(E) > 0$ and $\mathscr L^1(\mathbb R \setminus E) > 0$. Let $a \in \mathbb R$ be such that $\mathscr L^1((a, \infty) \cap E) > 0$ and $\mathscr L^1((a, \infty) \setminus E) > 0$, and define $F(t) = \mathscr L^1(E \cap (a, t))$. By our choice of $a$, $F(t)$ and $(t - a) - F(t)$ are not identically 0 in $(a, +\infty)$.
If $t > a$ is a rarefaction point of $E$, we have
$$F'_+(t) = \lim_{h \downarrow 0} \frac{F(t + h) - F(t)}{h} = \lim_{h \downarrow 0} \frac{\mathscr L^1((t, t + h) \cap E)}{h} = 0.$$
Analogously, $F'_-(t) = 0$ and we find that $F'$ is equal to 0 at all rarefaction points. A similar argument proves that $F' = 1$ at all density points. Let now $t_0 \in (a, \infty)$ be a point where $0 < F(t_0) < (t_0 - a)$ and apply the mean value theorem to obtain $t'_0 \in (a, t_0)$ such that
$$F(t_0) = (t_0 - a) F'(t'_0).$$
By our choice of $t_0$ it follows that $F'(t'_0) \in (0, 1)$, a contradiction (because $t'_0$ is either a density point or a rarefaction point).
Exercise 7.3. Assume first that $\varphi$ is continuous and bounded. Let $H(z) := \int_{f(a)}^{z} \varphi(y)\,dy$. By the (classical) fundamental theorem of the integral calculus, $H$ is differentiable and $H'(z) = \varphi(z)$ for all $z \in f(I)$. By the chain rule and Exercise 7.1, the function
$$F(t) := \int_{f(a)}^{f(t)} \varphi(y)\,dy = H(f(t))$$
is absolutely continuous and it has derivative equal to $H'(f(t)) f'(t) = \varphi(f(t)) f'(t)$ at all points $t$ where $f$ is differentiable. On the other hand, still by the fundamental theorem of the integral calculus, the function
$$G(t) := \int_a^t \varphi(f(x)) f'(x)\,dx$$
has derivative equal to $\varphi(f) f'$ $\mathscr L^1$-a.e. in $[a, b]$. Since both $F$ and $G$ vanish at $t = a$, they coincide.
By the dominated convergence theorem, the identity of the two functions persists if $\varphi = 1_A$, with $A$ open (because $1_A$ is the pointwise limit of continuous functions). By applying Dynkin's theorem to the class $\mathcal M$ of the sets $E \in \mathcal B(f(I))$ such that
$$\int_{f(a)}^{f(t)} 1_E(y)\,dy = \int_a^t 1_E(f(x)) f'(x)\,dx$$
we obtain that the formula holds for all $\varphi = 1_E$ with $E$ Borel. Eventually we obtain it for simple functions and, by uniform approximation, for bounded Borel functions.
Exercise 7.4. Choosing $g = 1_N$, by Exercise 7.3 we get $\int_a^b 1_{f^{-1}(N)} f'\,dx = 0$, because $1_N \circ f = 1_{f^{-1}(N)}$. Let $h^+$ and $h^-$ be respectively the positive and negative part of $f' 1_{f^{-1}(N)}$. Since
$$\int_a^b h^+\,dx - \int_a^b h^-\,dx = \int_a^b f' 1_{f^{-1}(N)}\,dx = 0$$
for all intervals $(a, b)$, it follows that $h^+ = h^-$ $\mathscr L^1$-a.e. in $\mathbb R$. As a consequence, $f' = 0$ $\mathscr L^1$-a.e. in $f^{-1}(N)$.
Chapter 8
Exercise 8.1. Both are measures in $(Z, \mathcal H)$. If $B \in \mathcal H$ then $(g \circ f)_\# \mu(B) = \mu(f^{-1}(g^{-1}(B)))$, because $(g \circ f)^{-1} = f^{-1} \circ g^{-1}$. On the other hand,
$$g_\#(f_\# \mu)(B) = f_\# \mu(g^{-1}(B)) = \mu(f^{-1}(g^{-1}(B))).$$
Exercise 8.2. Let $n \ge 1$ be an integer, $0 \le k < 2^n$, and let us consider the interval $I = [k/2^n, (k+1)/2^n)$. Then, up to at most two sequences (which form a negligible set), $f^{-1}(I)$ is the cylindrical set of all binary sequences $a_0 a_1 \cdots$ such that $a_0 \cdots a_{n-1}$ is the binary expression of $k$. It follows that
$$\times_{i=0}^{\infty} \Big( \frac{1}{2}\delta_0 + \frac{1}{2}\delta_1 \Big) \big( f^{-1}(I) \big) = \mathscr L^1(I),$$
because their common value is $2^{-n}$. On the other hand, $f^{-1}(\{1\})$ consists of a single point and therefore the identity above holds for $I = \{1\}$, the common value being 0. By additivity the identity holds for finite unions of sets of this type, a family stable under finite intersections. By the coincidence criterion the two measures coincide.
Exercise 8.3. Let $A \subset \mathbb R$ be a dense open set whose complement $C$ has strictly positive Lebesgue measure (Exercise 1.9), and let
$$\varphi(t) := \min\{1, \mathrm{dist}(t, C)\} \qquad t \in \mathbb R.$$
By construction the function $\varphi$ is continuous, nonnegative, bounded by 1, and vanishes precisely on $C$. Then, set
$$F(t) := \begin{cases} \int_0^t \varphi(s)\,ds & \text{if } t \ge 0; \\ -\int_t^0 \varphi(s)\,ds & \text{if } t < 0. \end{cases}$$
We have $F' = \varphi$, so that $F \in C^1$ and its critical set $C_F = C$ has positive Lebesgue measure. It follows that $F_\# \mathscr L^1$ is not absolutely continuous with respect to $\mathscr L^1$. Finally, since $\int_a^b \varphi\,dt > 0$ whenever $a < b$ (because $A \cap (a, b) \neq \emptyset$) we obtain that $F$ is strictly increasing.
Exercise 8.4. Recall that $F(C_F)$ is always Lebesgue negligible, regardless of any injectivity assumption on $F$. Hence, possibly replacing $U$ by $U \setminus C_F$ we can assume with no loss of generality that $C_F = \emptyset$, i.e. $DF(x)$ is nonsingular at any $x \in U$. Recall that, according to the local invertibility theorem, for any $x \in U$ there exists a ball $B_r(x)$ contained in $U$ such that the restriction of $F$ to $B_r(x)$ is injective. Now, following the strategy of Lemma 6.9 we can cover $U$ by a sequence of right open cubes $\{Q_i\}_{i \in I}$, pairwise disjoint, such that the restriction of $F$ to a neighbourhood of $Q_i$ is injective (we keep dividing a cube until this property is achieved). Let $Q_i = \prod_{j=1}^{n} [a_j, a_j + \delta)$; for $b_j < a_j$ sufficiently close to $a_j$ and $\tilde Q_i = \prod_{j=1}^{n} (b_j, b_j + \delta)$ we have (by injectivity of $F$ on $\tilde Q_i$)
$$F_\#(1_{\tilde Q_i} \mathscr L^n) = \frac{1}{|JF| \circ F^{-1}}\,1_{F(\tilde Q_i)}\,\mathscr L^n$$
and therefore we can pass to the limit as $b_j \uparrow a_j$ to get
$$F_\#(1_{Q_i} \mathscr L^n) = \frac{1}{|JF| \circ F^{-1}(y)}\,1_{F(Q_i)}\,\mathscr L^n,$$
where $F^{-1}$ denotes the inverse of the restriction of $F$ to the cube under consideration. If we add both sides with respect to $i \in I$ we get
$$F_\#(1_U \mathscr L^n) = \sum_{i \in I} \frac{1}{|JF| \circ F^{-1}(y)}\,1_{F(Q_i)}\,\mathscr L^n = \sum_{x \in F^{-1}(y)} \frac{1}{|JF|(x)}\,1_{F(U)}\,\mathscr L^n.$$
References
[1] L. CARLESON, On the convergence and growth of Fourier series, Acta Math. 116 (1966), 135-157.
[2] W. F. EBERLEIN, Notes on Integration I: The Underlying Convergence Theorem, Comm. Pure Appl. Math. X (1957), 357-360.
[3] H. FEDERER, Geometric Measure Theory, Springer, 1969.
[4] F. RIESZ and B. NAGY, Functional Analysis, Dover, 1990.
[5] W. RUDIN, Real and Complex Analysis, McGraw-Hill, 1987.
[6] S. WAGON, The Banach-Tarski Paradox, Cambridge University
Press, 1985.
[7] K. YOSIDA, Functional Analysis, Springer, 1980.


f (y) cos kydy, k N,

f (y) sin kydy, k N, k 1.

f (x + 2n) = f (x), x [, ), n = 1, 2, . . . . (5.4)

M for all k N. Then for