MA20218 Analysis 2A: Lecture Notes

Roger Moser
Department of Mathematical Sciences
University of Bath
Semester 1, 2014/5
Contents
1 Riemann Integration
1.1 Lower and Upper Riemann Sums
1.2 Criteria for Integrability
1.3 Riemann Sums
1.4 Properties of the Integral
1.5 The Fundamental Theorem of Calculus
1.6 Integration Techniques
1.7 Exchanging Integrals with Limits
1.8 Improper Integrals
2 Analysis in Several Variables
2.1 The Euclidean Space $\mathbb{R}^N$
2.2 Convergence
2.3 Open and Closed Sets
2.4 Continuity
2.5 Norms
2.6 Derivatives
2.7 Higher Order Derivatives
2.8 The Implicit Function Theorem
2.9 The Lagrange Multiplier Rule
Index
Chapter 1
Riemann Integration
Integration deals with two different questions.
Area under a curve. Given an interval $[a, b] \subseteq \mathbb{R}$ and a function
$f : [a, b] \to \mathbb{R}$ (say, continuous), we obtain a curve in the plane described
by its graph $\{(x, y) \in \mathbb{R}^2 : a \le x \le b,\ y = f(x)\}$. This curve may be
interpreted as a piece of the boundary of a region in the plane. (Typically, the rest
of the boundary is taken to be a piece of the x-axis and two vertical line segments;
see Fig. 1.0.1.) What is the area of such a region?
Antiderivative. Given an interval $[a, b] \subseteq \mathbb{R}$ and a function
$f : [a, b] \to \mathbb{R}$, is it possible to find another function
$F : [a, b] \to \mathbb{R}$ such that $F' = f$ in $(a, b)$? If so, how?
At first, these questions may seem unrelated. But it turns out that there is
a deep connection; in fact, they are two sides of the same coin.
1.1 Lower and Upper Riemann Sums
Throughout this chapter, let $a, b \in \mathbb{R}$ with $a < b$. We first want to find the
area under a curve given by a function $f : [a, b] \to \mathbb{R}$. The main idea is to
divide the interval $[a, b]$ into many small intervals and approximate the region
under the curve by rectangles. We can both overestimate and underestimate
the area this way, and when we choose increasingly fine subdivisions, then
we hope that the difference decreases (cf. Fig. 1.1.1).
Figure 1.0.1: Area under a curve
Figure 1.1.1: Lower and upper Riemann sum represented in terms of rectangles
Definition 1.1.1. A subdivision or partition of $[a, b]$ is a finite sequence
$(x_0, x_1, \dots, x_N)$ such that
$$a = x_0 < x_1 < \dots < x_N = b.$$
If $\Delta = (x_0, x_1, \dots, x_N)$ is a subdivision of $[a, b]$, then $I_n = [x_{n-1}, x_n]$ is called
the $n$-th interval of $\Delta$ for $n = 1, \dots, N$. The number $|I_n| = x_n - x_{n-1}$ is
called the length of $I_n$ and
$$\|\Delta\| = \max\{|I_1|, \dots, |I_N|\}$$
is called the mesh of $\Delta$.
Now that we have subdivided the interval, we can calculate the areas of
the rectangles for a given function.
Definition 1.1.2. Let $f : [a, b] \to \mathbb{R}$ be a bounded function and
$\Delta = (x_0, \dots, x_N)$ a subdivision of $[a, b]$. Let $I_n = [x_{n-1}, x_n]$ be the $n$-th interval
of $\Delta$. Then
$$L(f, \Delta) = \sum_{n=1}^{N} \inf f(I_n)\,|I_n|$$
is called the lower Riemann sum of $f$ on $\Delta$ and
$$U(f, \Delta) = \sum_{n=1}^{N} \sup f(I_n)\,|I_n|$$
is called the upper Riemann sum of $f$ on $\Delta$.

As mentioned before, the lower Riemann sum will underestimate the
area and the upper Riemann sum will overestimate it in general (see Fig.
1.1.1). This is the motivation for the next definition.
Definition 1.1.3. Let $f : [a, b] \to \mathbb{R}$ be bounded. Then
$$\underline{\int_a^b} f(x)\,dx = \sup\{L(f, \Delta) : \Delta \text{ is a subdivision of } [a, b]\}$$
is the lower Riemann integral of $f$ on $[a, b]$ and
$$\overline{\int_a^b} f(x)\,dx = \inf\{U(f, \Delta) : \Delta \text{ is a subdivision of } [a, b]\}$$
is the upper Riemann integral of $f$ on $[a, b]$. If
$$\underline{\int_a^b} f(x)\,dx = \overline{\int_a^b} f(x)\,dx,$$
then we say that $f$ is Riemann integrable (or integrable for short). The
common value is then called the Riemann integral (or integral for short)
and denoted by
$$\int_a^b f(x)\,dx.$$
Figure 1.1.2: The integral corresponds to a signed area
Remark 1.1.1. These definitions are for bounded functions on bounded
intervals only. Extensions to unbounded functions and unbounded intervals
will be discussed later.
Remark 1.1.2. If $f : [a, b] \to \mathbb{R}$ has negative values, it can happen that the
upper or lower Riemann integral is negative (or at least some of the terms
in the upper and lower Riemann sums are negative). So strictly speaking,
what we consider here are signed areas, where the area of a region below
the x-axis is considered to be negative (see Fig. 1.1.2).
Before we study integrals, we need to know a few facts about lower and
upper Riemann sums.
Lemma 1.1.1. Let $f : [a, b] \to \mathbb{R}$ be a bounded function and let
$$m = \inf f([a, b]) \quad\text{and}\quad M = \sup f([a, b]).$$
Then for any subdivision $\Delta$ of $[a, b]$,
$$m(b - a) \le L(f, \Delta) \le U(f, \Delta) \le M(b - a).$$
Proof. Suppose that $\Delta = (x_0, \dots, x_N)$. Let $I_n = [x_{n-1}, x_n]$ be the $n$-th
interval of $\Delta$ and let $m_n = \inf f(I_n)$ and $M_n = \sup f(I_n)$ for $n = 1, \dots, N$.
Then $f(I_n) \subseteq f([a, b])$ and hence
$$m \le m_n \le M_n \le M$$
for every $n$. Therefore,
$$\sum_{n=1}^{N} m(x_n - x_{n-1}) \le \sum_{n=1}^{N} m_n(x_n - x_{n-1}) \le \sum_{n=1}^{N} M_n(x_n - x_{n-1}) \le \sum_{n=1}^{N} M(x_n - x_{n-1}).$$
But
$$\sum_{n=1}^{N} (x_n - x_{n-1}) = x_N - x_0 = b - a,$$
so we can rewrite these inequalities as
$$m(b - a) \le L(f, \Delta) \le U(f, \Delta) \le M(b - a),$$
as required.
Definition 1.1.4. Let $\Delta_1 = (x_0, \dots, x_M)$ and $\Delta_2 = (y_0, \dots, y_N)$ be subdivisions
of $[a, b]$. If for every $m \in \{0, \dots, M\}$ there exists an $n \in \{0, \dots, N\}$
such that $x_m = y_n$, then $\Delta_2$ is called a refinement of $\Delta_1$.
Definition 1.1.5. Let $f : [a, b] \to \mathbb{R}$ be a bounded function. If $S \subseteq [a, b]$ is
any set, then $\omega(f, S) = \sup f(S) - \inf f(S)$ is called the oscillation of $f$ on
$S$. If $\Delta$ is a subdivision of $[a, b]$ with intervals $I_1, \dots, I_N$, then we define
$$\omega(f, \Delta) = \sum_{n=1}^{N} \omega(f, I_n)\,|I_n|.$$
Remark 1.1.3. From the formulas for $L(f, \Delta)$, $U(f, \Delta)$, and $\omega(f, \Delta)$, we
immediately obtain
$$\omega(f, \Delta) = U(f, \Delta) - L(f, \Delta).$$
Lemma 1.1.2. Let $f$ be a bounded real function on $[a, b]$.
(i) If $\Delta$ and $\Delta'$ are subdivisions of $[a, b]$ and $\Delta'$ refines $\Delta$, then
$L(f, \Delta') \ge L(f, \Delta)$, $U(f, \Delta') \le U(f, \Delta)$, and $\omega(f, \Delta') \le \omega(f, \Delta)$.
(ii) For any subdivisions $\Delta_1, \Delta_2$ of $[a, b]$,
$$L(f, \Delta_1) \le U(f, \Delta_2).$$
(iii) Furthermore,
$$\underline{\int_a^b} f(x)\,dx \le \overline{\int_a^b} f(x)\,dx.$$
The last statement is consistent with the expectation that the lower
Riemann integral underestimates the area, while the upper Riemann integral
overestimates it in general.
Proof. (i) If $\Delta'$ refines $\Delta$, then there exists a number $\ell \in \mathbb{N} \cup \{0\}$ such that
$\Delta'$ has $\ell$ points more than $\Delta$. We use induction over $\ell$.
If $\ell = 0$, then $\Delta' = \Delta$ and there is nothing to prove. If $\ell = 1$, then we
have numbers $x_0, \dots, x_N$ with
$$a = x_0 < x_1 < \dots < x_N = b$$
such that $\Delta = (x_0, \dots, x_N)$, and we have another number $x^* \in (x_{K-1}, x_K)$
for some $K \in \{1, \dots, N\}$ such that
$$\Delta' = (x_0, \dots, x_{K-1}, x^*, x_K, \dots, x_N).$$
Write $I_n = [x_{n-1}, x_n]$ for $n = 1, \dots, N$ and
$$I^- = [x_{K-1}, x^*] \quad\text{and}\quad I^+ = [x^*, x_K].$$
Define $m_n = \inf f(I_n)$ for $n = 1, \dots, N$ and define
$$m^- = \inf f(I^-) \quad\text{and}\quad m^+ = \inf f(I^+).$$
Then $m^- \ge m_K$ (as $I^- \subseteq I_K$) and $m^+ \ge m_K$ (as $I^+ \subseteq I_K$). Hence
$$\begin{aligned}
L(f, \Delta) &= \sum_{n=1}^{K-1} m_n |I_n| + m_K |I_K| + \sum_{n=K+1}^{N} m_n |I_n| \\
&= \sum_{n=1}^{K-1} m_n |I_n| + m_K (|I^-| + |I^+|) + \sum_{n=K+1}^{N} m_n |I_n| \\
&\le \sum_{n=1}^{K-1} m_n |I_n| + m^- |I^-| + m^+ |I^+| + \sum_{n=K+1}^{N} m_n |I_n| = L(f, \Delta').
\end{aligned}$$
The inequality $U(f, \Delta) \ge U(f, \Delta')$ is proved similarly. The inequality
$\omega(f, \Delta) \ge \omega(f, \Delta')$ follows from the other two.
Finally, for the induction step, consider the case $\ell \ge 2$ and assume that
for any subdivision $E$ of $[a, b]$ and any refinement $E'$ of $E$ with fewer than $\ell$
additional points, we have
$$L(f, E') \ge L(f, E), \quad U(f, E') \le U(f, E), \quad\text{and}\quad \omega(f, E') \le \omega(f, E).$$
Choose a point $x^*$ from $\Delta'$ that does not belong to $\Delta$. Define $\Delta''$ to be the
subdivision obtained from $\Delta'$ by removing $x^*$. Then $\Delta''$ is a refinement of
$\Delta$ with $\ell - 1$ additional points and $\Delta'$ is a refinement of $\Delta''$ with 1 additional
point. Hence by the induction hypothesis,
$$L(f, \Delta') \ge L(f, \Delta'') \ge L(f, \Delta)$$
and
$$U(f, \Delta') \le U(f, \Delta'') \le U(f, \Delta).$$
It then also follows that $\omega(f, \Delta') \le \omega(f, \Delta)$.
(ii) Let $\Delta$ be a common refinement of $\Delta_1$ and $\Delta_2$. Then from (i) and
Lemma 1.1.1,
$$L(f, \Delta_1) \le L(f, \Delta) \le U(f, \Delta) \le U(f, \Delta_2).$$
(iii) For any subdivisions $\Delta_1$ and $\Delta_2$ of $[a, b]$, part (ii) yields
$$L(f, \Delta_1) \le U(f, \Delta_2).$$
Taking the supremum over all subdivisions $\Delta_1$ while fixing $\Delta_2$, we obtain
$$\underline{\int_a^b} f(x)\,dx \le U(f, \Delta_2).$$
Now taking the infimum over $\Delta_2$ yields the desired inequality.
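The monotonicity in part (i) can be observed in a simple computation. The following Python sketch is an illustration only and not part of the argument above. It assumes an increasing function, so that the infimum and supremum on each interval are attained at the left and right end points; the helper names `lower_sum`, `upper_sum` and `cube` are hypothetical.

```python
from fractions import Fraction


def lower_sum(f, points):
    """Lower Riemann sum of an increasing f: inf f(I_n) = f(x_{n-1})."""
    return sum(f(points[n - 1]) * (points[n] - points[n - 1])
               for n in range(1, len(points)))


def upper_sum(f, points):
    """Upper Riemann sum of an increasing f: sup f(I_n) = f(x_n)."""
    return sum(f(points[n]) * (points[n] - points[n - 1])
               for n in range(1, len(points)))


def cube(x):
    """An increasing test function on [0, 1]."""
    return x ** 3


# A subdivision of [0, 1] and a refinement with two additional points.
coarse = [Fraction(0), Fraction(1, 2), Fraction(1)]
fine = [Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(3, 4), Fraction(1)]

print(lower_sum(cube, coarse), lower_sum(cube, fine))
print(upper_sum(cube, fine), upper_sum(cube, coarse))
```

In this example the lower sum increases and the upper sum decreases when passing to the refinement, as the lemma predicts.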
1.2 Criteria for Integrability
Not every bounded function is integrable, and so we need tools to help us
decide whether we can integrate a given function.
Example 1.2.1. Let $f : [0, 1] \to \mathbb{R}$ be the function with
$$f(x) = \begin{cases} 1 & \text{if } x \in [0, 1] \cap \mathbb{Q}, \\ 0 & \text{if } x \in [0, 1] \setminus \mathbb{Q}. \end{cases}$$
Consider any subdivision $\Delta$ of $[0, 1]$, say with intervals $I_1, \dots, I_N$. Then
each $I_n$, being an interval of positive length, contains both rational and
irrational numbers. Therefore, we have $\inf f(I_n) = 0$ and $\sup f(I_n) = 1$ for
$n = 1, \dots, N$. It follows that
$$L(f, \Delta) = \sum_{n=1}^{N} 0 \cdot |I_n| = 0 \quad\text{and}\quad U(f, \Delta) = \sum_{n=1}^{N} 1 \cdot |I_n| = 1,$$
regardless of the subdivision. Hence
$$\underline{\int_0^1} f(x)\,dx = 0 \quad\text{and}\quad \overline{\int_0^1} f(x)\,dx = 1.$$
In particular, this function is not Riemann integrable.

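Theorem 1.2.1. A bounded function $f : [a, b] \to \mathbb{R}$ is Riemann integrable
if, and only if, for every $\varepsilon > 0$ there exists a subdivision $\Delta$ of $[a, b]$ with
$\omega(f, \Delta) < \varepsilon$.
Proof. Suppose that $f$ is Riemann integrable. Then
$$\int_a^b f(x)\,dx = \sup\{L(f, \Delta) : \Delta \text{ is a subdivision of } [a, b]\}$$
and simultaneously
$$\int_a^b f(x)\,dx = \inf\{U(f, \Delta) : \Delta \text{ is a subdivision of } [a, b]\}.$$
Thus for any $\varepsilon > 0$, there exist subdivisions $\Delta_1, \Delta_2$ of $[a, b]$ such that
$$L(f, \Delta_1) > \int_a^b f(x)\,dx - \frac{\varepsilon}{2}$$
and
$$U(f, \Delta_2) < \int_a^b f(x)\,dx + \frac{\varepsilon}{2}.$$
So $U(f, \Delta_2) - L(f, \Delta_1) < \varepsilon$. Now let $\Delta$ be a common refinement of $\Delta_1$ and
$\Delta_2$. Then
$$\omega(f, \Delta) = U(f, \Delta) - L(f, \Delta) \le U(f, \Delta_2) - L(f, \Delta_1) < \varepsilon$$
according to Lemma 1.1.2.(i).
Conversely, suppose that for every $\varepsilon > 0$ there exists a subdivision $\Delta$ of
$[a, b]$ with $\omega(f, \Delta) < \varepsilon$. Then, since
$$\overline{\int_a^b} f(x)\,dx \le U(f, \Delta) \quad\text{and}\quad \underline{\int_a^b} f(x)\,dx \ge L(f, \Delta),$$
it follows that
$$\overline{\int_a^b} f(x)\,dx - \underline{\int_a^b} f(x)\,dx \le U(f, \Delta) - L(f, \Delta) < \varepsilon.$$
But since $\varepsilon$ is an arbitrary positive number, and the left-hand side is
non-negative by Lemma 1.1.2.(iii), we must have
$$\underline{\int_a^b} f(x)\,dx = \overline{\int_a^b} f(x)\,dx.$$
That is, the function $f$ is Riemann integrable.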

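Example 1.2.2. We claim that the function $f : [0, 1] \to \mathbb{R}$, $x \mapsto x$, is
Riemann integrable with
$$\int_0^1 x\,dx = \frac{1}{2}.$$
In order to prove this, let $N \in \mathbb{N}$ and consider the subdivision
$\Delta_N = (0, \frac{1}{N}, \frac{2}{N}, \dots, \frac{N-1}{N}, 1)$ of $[0, 1]$. For $n = 1, \dots, N$, let
$I_n = [\frac{n-1}{N}, \frac{n}{N}]$ and
$$m_n = \inf f(I_n) = \frac{n-1}{N}, \qquad M_n = \sup f(I_n) = \frac{n}{N}.$$
Then
$$L(f, \Delta_N) = \sum_{n=1}^{N} m_n |I_n| = \sum_{n=1}^{N} \frac{n-1}{N^2} = \frac{(N-1)N}{2N^2} = \frac{1}{2} - \frac{1}{2N}$$
and
$$U(f, \Delta_N) = \sum_{n=1}^{N} M_n |I_n| = \sum_{n=1}^{N} \frac{n}{N^2} = \frac{N(N+1)}{2N^2} = \frac{1}{2} + \frac{1}{2N}.$$
Hence
$$\frac{1}{2} - \frac{1}{2N} \le \underline{\int_0^1} x\,dx \le \overline{\int_0^1} x\,dx \le \frac{1}{2} + \frac{1}{2N}$$
for any $N \in \mathbb{N}$. When we let $N \to \infty$, we obtain
$$\frac{1}{2} \le \underline{\int_0^1} x\,dx \le \overline{\int_0^1} x\,dx \le \frac{1}{2},$$
which implies the claim.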
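The two sums in this example can be reproduced with exact rational arithmetic. The following Python sketch is only a check of the formulas above, under the assumption that the function is increasing (so the infimum and supremum on each interval sit at the end points); the function names are hypothetical.

```python
from fractions import Fraction


def uniform_subdivision(N):
    """The subdivision (0, 1/N, 2/N, ..., 1) of [0, 1]."""
    return [Fraction(n, N) for n in range(N + 1)]


def lower_upper_sums_increasing(f, points):
    """Lower and upper Riemann sums of an increasing function f.

    For increasing f, inf f(I_n) = f(x_{n-1}) and sup f(I_n) = f(x_n).
    """
    lower = sum(f(points[n - 1]) * (points[n] - points[n - 1])
                for n in range(1, len(points)))
    upper = sum(f(points[n]) * (points[n] - points[n - 1])
                for n in range(1, len(points)))
    return lower, upper


def identity(x):
    return x


for N in (1, 10, 100):
    lower, upper = lower_upper_sums_increasing(identity, uniform_subdivision(N))
    # Compare with 1/2 - 1/(2N) and 1/2 + 1/(2N) from the example.
    print(N, lower, upper)
```

Both sums differ from 1/2 by exactly 1/(2N), so the gap between them shrinks as N grows.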
The monotonicity is the reason why we can determine the lower and
upper Riemann sums quite easily in this example. We can draw similar
conclusions for other monotonic functions.
Corollary 1.2.1. Let $f : [a, b] \to \mathbb{R}$ be monotonic (and therefore bounded).
Then $f$ is Riemann integrable.
Proof. We only consider the case where $f$ is increasing, as the case of a
decreasing function is similar.
Consider a subdivision $\Delta = (x_0, \dots, x_N)$ of $[a, b]$. As usual, let
$I_n = [x_{n-1}, x_n]$ for $n = 1, \dots, N$. Because $f$ is increasing, we have
$\inf f(I_n) = f(x_{n-1})$ and $\sup f(I_n) = f(x_n)$. Thus
$$\omega(f, \Delta) = \sum_{n=1}^{N} (f(x_n) - f(x_{n-1}))(x_n - x_{n-1}) \le \|\Delta\| \sum_{n=1}^{N} (f(x_n) - f(x_{n-1})) = \|\Delta\| (f(b) - f(a)).$$
If $f(b) = f(a)$, then $f$ is constant and $\omega(f, \Delta) = 0$ for every subdivision.
Otherwise, for any $\varepsilon > 0$, we can achieve $\omega(f, \Delta) < \varepsilon$ by choosing the mesh small
enough; more precisely, by choosing $\|\Delta\| < \varepsilon/(f(b) - f(a))$. Now Theorem
1.2.1 implies the claim.
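The estimate in this proof does not require a uniform subdivision. As a hedged illustration, the following Python sketch evaluates both sides of the inequality for the increasing function exp on a non-uniform subdivision of [0, 1]; the subdivision points and the helper names are arbitrary choices made for this example.

```python
import math


def mesh(points):
    """Length of the longest interval of the subdivision."""
    return max(points[n] - points[n - 1] for n in range(1, len(points)))


def oscillation_sum_increasing(f, points):
    """Sum of (f(x_n) - f(x_{n-1})) * (x_n - x_{n-1}) for an increasing f."""
    return sum((f(points[n]) - f(points[n - 1])) * (points[n] - points[n - 1])
               for n in range(1, len(points)))


points = [0.0, 0.1, 0.15, 0.4, 0.7, 1.0]  # arbitrary non-uniform subdivision
omega = oscillation_sum_increasing(math.exp, points)
bound = mesh(points) * (math.exp(points[-1]) - math.exp(points[0]))
print(omega, bound)
```

The first printed number is the oscillation sum and the second is the bound given by the mesh times f(b) - f(a).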
Corollary 1.2.2. Let $f : [a, b] \to \mathbb{R}$ be continuous. Then $f$ is Riemann
integrable.
Proof. First we note that $f$ is bounded by the Weierstrass extreme value
theorem. Furthermore, by the theorem of uniform continuity, it is uniformly
continuous.
We want to use Theorem 1.2.1 again, so fix $\varepsilon > 0$. By the uniform
continuity, we can choose $\delta > 0$ such that for all $x, y \in [a, b]$, if $|x - y| < \delta$,
then $|f(x) - f(y)| < \varepsilon/(b - a)$.
Let $\Delta$ be a subdivision of $[a, b]$ with mesh $\|\Delta\| < \delta$. Let $I_1, \dots, I_N$ be
the intervals of $\Delta$. Then $|I_n| < \delta$ for every $n$, and hence $\omega(f, I_n) < \varepsilon/(b - a)$
(recall that the continuous function $f$ attains its supremum and infimum on $I_n$).
Therefore,
$$\omega(f, \Delta) = \sum_{n=1}^{N} \omega(f, I_n)\,|I_n| < \frac{\varepsilon}{b - a} \sum_{n=1}^{N} |I_n| = \varepsilon.$$
Now Theorem 1.2.1 implies that $f$ is Riemann integrable.

Lemma 1.2.1. Let $f : [a, b] \to \mathbb{R}$ be a bounded function and write
$m = \inf f([a, b])$ and $M = \sup f([a, b])$. Let $\Delta$ be a subdivision of $[a, b]$ with
intervals $I_1, \dots, I_N$, and let $\Delta'$ be a subdivision formed by adding one extra
point $x$ to $\Delta$, say $x \in I_n$. Then
$$L(f, \Delta') \le L(f, \Delta) + (M - m)|I_n|$$
and
$$U(f, \Delta') \ge U(f, \Delta) - (M - m)|I_n|.$$
Proof. Write $I_n^1$ and $I_n^2$ for the two intervals into which $x$ divides $I_n$, so
that $|I_n^1| + |I_n^2| = |I_n|$. Then
$$L(f, \Delta) = \sum_{k=1}^{N} \inf f(I_k)\,|I_k|,$$
whereas
$$L(f, \Delta') = \inf f(I_n^1)\,|I_n^1| + \inf f(I_n^2)\,|I_n^2| + \sum_{k \ne n} \inf f(I_k)\,|I_k|.$$
When we take the difference, most of the terms cancel. More precisely,
$$\begin{aligned}
L(f, \Delta') - L(f, \Delta) &= \inf f(I_n^1)\,|I_n^1| + \inf f(I_n^2)\,|I_n^2| - \inf f(I_n)\,|I_n| \\
&\le M|I_n^1| + M|I_n^2| - m|I_n| = (M - m)|I_n|.
\end{aligned}$$
The first inequality follows. The second inequality is proved similarly.
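Theorem 1.2.2. A bounded function $f : [a, b] \to \mathbb{R}$ is Riemann integrable
if, and only if, for every $\varepsilon > 0$ there exists a number $\delta > 0$ such that any
subdivision $\Delta$ of $[a, b]$ of mesh $\|\Delta\| < \delta$ will satisfy $\omega(f, \Delta) < \varepsilon$.
Proof. Let $f : [a, b] \to \mathbb{R}$ be bounded. First suppose that for every $\varepsilon > 0$
there exists a number $\delta > 0$ such that for any subdivision $\Delta$ of $[a, b]$ with
$\|\Delta\| < \delta$, we have $\omega(f, \Delta) < \varepsilon$. There clearly exists a subdivision $\Delta$ with
$\|\Delta\| < \delta$, so Theorem 1.2.1 immediately shows that $f$ is Riemann integrable.
Conversely, suppose that $f$ is Riemann integrable. Fix $\varepsilon > 0$. Then by
Theorem 1.2.1, there exists a subdivision $E$ of $[a, b]$ with $\omega(f, E) < \varepsilon/2$.
Let $N$ be the number of points of $E$ and let $m = \inf f([a, b])$ and
$M = \sup f([a, b])$. (If $M = m$, then $f$ is constant and there is nothing to prove,
so we may assume $M > m$.)
Now consider an arbitrary subdivision $\Delta$ of $[a, b]$ and consider the common
refinement $\Delta'$, obtained by adding all points of $E$ to $\Delta$ (unless they
already belong to $\Delta$). Then by Lemma 1.1.2,
$$\omega(f, \Delta') \le \omega(f, E) < \frac{\varepsilon}{2}.$$
On the other hand, the subdivision $\Delta'$ is formed by adding at most $N$ points
to $\Delta$. Applying Lemma 1.2.1 each time, we obtain
$$L(f, \Delta') \le L(f, \Delta) + N(M - m)\|\Delta\|$$
and
$$U(f, \Delta') \ge U(f, \Delta) - N(M - m)\|\Delta\|.$$
Hence
$$\omega(f, \Delta) \le \omega(f, \Delta') + 2N(M - m)\|\Delta\| < \frac{\varepsilon}{2} + 2N(M - m)\|\Delta\|.$$
Choose
$$\delta = \frac{\varepsilon}{4N(M - m)}.$$
If $\|\Delta\| < \delta$, it follows that $\omega(f, \Delta) < \varepsilon$.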
1.3 Riemann Sums
Definition 1.3.1. Suppose that $\Delta = (x_0, \dots, x_N)$ is a subdivision of $[a, b]$
and $\xi = (\xi_1, \dots, \xi_N)$ is a finite sequence of numbers $\xi_1, \dots, \xi_N$ with
$$x_{n-1} \le \xi_n \le x_n, \qquad n = 1, \dots, N.$$
Then the pair $(\Delta, \xi)$ is called a tagged subdivision of $[a, b]$. For a bounded
function $f : [a, b] \to \mathbb{R}$, the expression
$$S(f, \Delta, \xi) = \sum_{n=1}^{N} f(\xi_n)(x_n - x_{n-1})$$
is called a Riemann sum of $f$.
It is clear that for a Riemann sum with tagged subdivision $(\Delta, \xi)$ as in
this definition, we have
$$L(f, \Delta) \le S(f, \Delta, \xi) \le U(f, \Delta).$$
For a Riemann integrable function, the lower and the upper Riemann sums
will both be good approximations for the Riemann integral if $\|\Delta\|$ is sufficiently
small by Theorem 1.2.2. It follows that any Riemann sum with
subdivision $\Delta$ will also approximate the integral.
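Corollary 1.3.1. Let $f : [a, b] \to \mathbb{R}$ be a Riemann integrable function. Then
for every $\varepsilon > 0$ there exists a number $\delta > 0$ such that every tagged subdivision
$(\Delta, \xi)$ of $[a, b]$ with $\|\Delta\| < \delta$ will satisfy
$$\left| \int_a^b f(x)\,dx - S(f, \Delta, \xi) \right| < \varepsilon.$$
Proof. Fix $\varepsilon > 0$ and invoke Theorem 1.2.2 to find a number $\delta > 0$ such
that $\omega(f, \Delta) < \varepsilon$ for all subdivisions $\Delta$ of $[a, b]$ with $\|\Delta\| < \delta$. We have
$$L(f, \Delta) \le S(f, \Delta, \xi) \le U(f, \Delta)$$
as well as
$$L(f, \Delta) \le \int_a^b f(x)\,dx \le U(f, \Delta).$$
So the numbers
$$\int_a^b f(x)\,dx \quad\text{and}\quad S(f, \Delta, \xi)$$
both belong to the interval $[L(f, \Delta), U(f, \Delta)]$ of length less than $\varepsilon$. This
implies the desired inequality.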
Example 1.3.1. Consider the function $f : [0, 1] \to \mathbb{R}$ given by $f(x) = x^2$.
Being continuous, this function is Riemann integrable by Corollary 1.2.2.
Now we want to calculate the integral.
Let $N \in \mathbb{N}$ and consider the tagged subdivision $(\Delta_N, \xi_N)$ with
$\Delta_N = (0, \frac{1}{N}, \frac{2}{N}, \dots, 1)$ and $\xi_N = (\frac{1}{N}, \frac{2}{N}, \dots, 1)$. Then we compute
$$S(f, \Delta_N, \xi_N) = \sum_{n=1}^{N} \left(\frac{n}{N}\right)^2 \frac{1}{N} = \frac{1}{N^3} \sum_{n=1}^{N} n^2 = \frac{N(N+1)(2N+1)}{6N^3},$$
using the well-known formula for the sum of the first $N$ squares. Since
$\|\Delta_N\| \to 0$ as $N \to \infty$, we have
$$\int_0^1 x^2\,dx = \lim_{N \to \infty} \frac{N(N+1)(2N+1)}{6N^3} = \frac{1}{3}.$$
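The Riemann sum in this example can also be evaluated directly from Definition 1.3.1. The following Python sketch, with hypothetical function names, uses exact rational arithmetic and right end points as tags; it is meant only as a check of the closed formula above.

```python
from fractions import Fraction


def riemann_sum(f, points, tags):
    """S(f, Delta, xi) = sum of f(xi_n) * (x_n - x_{n-1})."""
    return sum(f(tag) * (points[n + 1] - points[n])
               for n, tag in enumerate(tags))


def right_endpoint_sum_of_squares(N):
    """Riemann sum of x**2 on (0, 1/N, ..., 1) with tags (1/N, ..., 1)."""
    points = [Fraction(n, N) for n in range(N + 1)]
    tags = points[1:]  # right end point of each interval
    return riemann_sum(lambda x: x * x, points, tags)


for N in (1, 10, 100, 1000):
    value = right_endpoint_sum_of_squares(N)
    print(N, value, float(value))
```

The printed values agree with N(N+1)(2N+1)/(6N^3) and approach 1/3 as N grows.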
1.4 Properties of the Integral
Theorem 1.4.1. (i) Let $f, g : [a, b] \to \mathbb{R}$ be Riemann integrable. Then
$f + g$ is Riemann integrable and
$$\int_a^b (f(x) + g(x))\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx.$$
(ii) Let $f : [a, b] \to \mathbb{R}$ be Riemann integrable and $\lambda \in \mathbb{R}$. Then $\lambda f$ is
Riemann integrable and
$$\int_a^b \lambda f(x)\,dx = \lambda \int_a^b f(x)\,dx.$$
Proof. (i) Consider any subdivision $\Delta$ of $[a, b]$ with intervals $I_1, \dots, I_N$.
Then for $n = 1, \dots, N$ and for $x \in I_n$, we have
$$\inf f(I_n) + \inf g(I_n) \le f(x) + g(x) \le \sup f(I_n) + \sup g(I_n).$$
Hence
$$\inf f(I_n) + \inf g(I_n) \le \inf (f + g)(I_n) \le \sup (f + g)(I_n) \le \sup f(I_n) + \sup g(I_n).$$
As a consequence, we find that
$$L(f, \Delta) + L(g, \Delta) \le L(f + g, \Delta) \le U(f + g, \Delta) \le U(f, \Delta) + U(g, \Delta).$$
Let $\varepsilon > 0$. By Theorem 1.2.2, we can choose a number $\delta > 0$ such that
whenever $\|\Delta\| < \delta$, it follows that $\omega(f, \Delta) < \varepsilon/2$ and $\omega(g, \Delta) < \varepsilon/2$. Hence
$$U(f + g, \Delta) - L(f + g, \Delta) \le U(f, \Delta) + U(g, \Delta) - (L(f, \Delta) + L(g, \Delta)) < \varepsilon.$$
Using Theorem 1.2.2 again, we conclude that $f + g$ is integrable.
In order to compute its integral, we first consider an arbitrary tagged
subdivision $(\Delta, \xi)$ of $[a, b]$ with $\Delta = (x_0, \dots, x_N)$ and $\xi = (\xi_1, \dots, \xi_N)$. We
observe that
$$\begin{aligned}
S(f + g, \Delta, \xi) &= \sum_{n=1}^{N} (f(\xi_n) + g(\xi_n))(x_n - x_{n-1}) \\
&= \sum_{n=1}^{N} f(\xi_n)(x_n - x_{n-1}) + \sum_{n=1}^{N} g(\xi_n)(x_n - x_{n-1}) \\
&= S(f, \Delta, \xi) + S(g, \Delta, \xi).
\end{aligned}$$
Now consider a sequence of tagged subdivisions $(\Delta_k, \xi_k)$ of $[a, b]$ such
that $\|\Delta_k\| \to 0$ as $k \to \infty$. Then by Corollary 1.3.1 and the algebra of limits
theorem, we have
$$\begin{aligned}
\int_a^b (f(x) + g(x))\,dx &= \lim_{k \to \infty} S(f + g, \Delta_k, \xi_k) \\
&= \lim_{k \to \infty} (S(f, \Delta_k, \xi_k) + S(g, \Delta_k, \xi_k)) \\
&= \lim_{k \to \infty} S(f, \Delta_k, \xi_k) + \lim_{k \to \infty} S(g, \Delta_k, \xi_k) \\
&= \int_a^b f(x)\,dx + \int_a^b g(x)\,dx.
\end{aligned}$$
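(ii) Suppose first that $\lambda > 0$. Then for any subdivision $\Delta$ of $[a, b]$ with
intervals $I_1, \dots, I_N$, we have
$$\inf (\lambda f)(I_n) = \lambda \inf f(I_n) \quad\text{and}\quad \sup (\lambda f)(I_n) = \lambda \sup f(I_n).$$
Hence $L(\lambda f, \Delta) = \lambda L(f, \Delta)$ and $U(\lambda f, \Delta) = \lambda U(f, \Delta)$. Taking the
supremum and the infimum, respectively, we obtain
$$\underline{\int_a^b} \lambda f(x)\,dx = \lambda \underline{\int_a^b} f(x)\,dx \quad\text{and}\quad \overline{\int_a^b} \lambda f(x)\,dx = \lambda \overline{\int_a^b} f(x)\,dx.$$
If $f$ is integrable, then it follows that
$$\underline{\int_a^b} \lambda f(x)\,dx = \overline{\int_a^b} \lambda f(x)\,dx = \lambda \int_a^b f(x)\,dx.$$
In the case $\lambda = -1$, we have, using the same notation,
$$\inf (-f)(I_n) = -\sup f(I_n) \quad\text{and}\quad \sup (-f)(I_n) = -\inf f(I_n).$$
Hence $L(-f, \Delta) = -U(f, \Delta)$ and $U(-f, \Delta) = -L(f, \Delta)$, leading to
$$\underline{\int_a^b} (-f(x))\,dx = -\overline{\int_a^b} f(x)\,dx \quad\text{and}\quad \overline{\int_a^b} (-f(x))\,dx = -\underline{\int_a^b} f(x)\,dx.$$
If $f$ is integrable, then
$$\underline{\int_a^b} (-f(x))\,dx = \overline{\int_a^b} (-f(x))\,dx = -\int_a^b f(x)\,dx.$$
Finally, in the case $\lambda < 0$, we can write $\lambda = (-1)|\lambda|$, and the claim
follows from the two cases already considered. The case $\lambda = 0$ is trivial.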
Notation. When we have a function $f : [a, b] \to \mathbb{R}$, we are sometimes only
interested in its behaviour on a subinterval $[c, d] \subseteq [a, b]$, in which case we
consider the restriction of $f$ to $[c, d]$. We say that $f$ is Riemann integrable
on $[c, d]$ if the restriction to $[c, d]$ is Riemann integrable and we write
$$\int_c^d f(x)\,dx$$
for the corresponding integral.
Theorem 1.4.2. Let $f : [a, b] \to \mathbb{R}$ be a bounded function.
(i) If $a \le c < d \le b$ and $f$ is Riemann integrable, then it is Riemann
integrable on $[c, d]$ as well.
(ii) Suppose that $a < c < b$ and $f$ is Riemann integrable on both $[a, c]$ and
$[c, b]$. Then it is Riemann integrable on $[a, b]$ and
$$\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx. \tag{1.1}$$
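Proof. (i) Let $\varepsilon > 0$. According to Theorem 1.2.1, there exists a subdivision
$\Delta$ of $[a, b]$ such that $\omega(f, \Delta) < \varepsilon$. Let $\Delta'$ be the subdivision of $[a, b]$ obtained
by adding the points $c$ and $d$ to $\Delta$ (unless they already belong to $\Delta$). Then
by Lemma 1.1.2,
$$\omega(f, \Delta') \le \omega(f, \Delta) < \varepsilon.$$
Say that $\Delta' = (x_0, \dots, x_N)$. Write $I_n = [x_{n-1}, x_n]$ for $n = 1, \dots, N$. There
exist some numbers $K, L$ with $1 \le K \le L \le N$, such that $c = x_{K-1}$ and
$d = x_L$. Then $E = (x_{K-1}, \dots, x_L)$ is a subdivision of $[c, d]$. We have
$$\omega(f, E) = \sum_{n=K}^{L} \omega(f, I_n)\,|I_n| \le \sum_{n=1}^{N} \omega(f, I_n)\,|I_n| = \omega(f, \Delta') < \varepsilon.$$
Hence $f$ is Riemann integrable on $[c, d]$ by Theorem 1.2.1.
(ii) Let $\varepsilon > 0$. Choose subdivisions $\Delta_1 = (x_0, \dots, x_M)$ of $[a, c]$ and
$\Delta_2 = (y_0, \dots, y_N)$ of $[c, b]$ such that
$$\omega(f, \Delta_1) < \frac{\varepsilon}{2} \quad\text{and}\quad \omega(f, \Delta_2) < \frac{\varepsilon}{2}.$$
Let $\Delta = (x_0, \dots, x_M, y_1, \dots, y_N)$. Then $\Delta$ is a subdivision of $[a, b]$ with
$$L(f, \Delta) = L(f, \Delta_1) + L(f, \Delta_2), \qquad U(f, \Delta) = U(f, \Delta_1) + U(f, \Delta_2),$$
and
$$\omega(f, \Delta) = \omega(f, \Delta_1) + \omega(f, \Delta_2) < \varepsilon. \tag{1.2}$$
Hence $f$ is Riemann integrable on $[a, b]$.
Since
$$L(f, \Delta_1) \le \int_a^c f(x)\,dx \le U(f, \Delta_1)$$
and
$$L(f, \Delta_2) \le \int_c^b f(x)\,dx \le U(f, \Delta_2),$$
it also follows that
$$L(f, \Delta) \le \int_a^c f(x)\,dx + \int_c^b f(x)\,dx \le U(f, \Delta).$$
Because of (1.2), this means that the numbers
$$\int_a^c f(x)\,dx + \int_c^b f(x)\,dx \quad\text{and}\quad \int_a^b f(x)\,dx$$
both belong to the interval $[L(f, \Delta), U(f, \Delta)]$ of length less than $\varepsilon$. Therefore, we have
$$\left| \int_a^c f(x)\,dx + \int_c^b f(x)\,dx - \int_a^b f(x)\,dx \right| < \varepsilon.$$
As $\varepsilon$ can be chosen arbitrarily small, we have in fact
$$\int_a^c f(x)\,dx + \int_c^b f(x)\,dx = \int_a^b f(x)\,dx,$$
as required.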
Notation. If $d < c$, define
$$\int_c^d f(x)\,dx = -\int_d^c f(x)\,dx.$$
Furthermore, define
$$\int_c^c f(x)\,dx = 0$$
for any $c \in [a, b]$. Then (1.1) holds for any three numbers $a, b, c$ in the
domain of an integrable function.
Theorem 1.4.3. Suppose that $f : [a, b] \to \mathbb{R}$ is a Riemann integrable function
and $g : \mathbb{R} \to \mathbb{R}$ is continuous. Then $g \circ f$ is Riemann integrable.
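Proof. Let $m = \inf f([a, b])$ and $M = \sup f([a, b])$. Then $g$ is uniformly
continuous on $[m, M]$ by the theorem of uniform continuity. Define
$\ell = \min g([m, M])$ and $L = \max g([m, M])$, both of which exist by the
Weierstrass extreme value theorem.
Let $\varepsilon > 0$. Then by the uniform continuity, there exists a number $\delta > 0$
such that whenever $s, t \in [m, M]$ with $|s - t| < \delta$, we have $|g(s) - g(t)| < \varepsilon$.
Therefore, if $I \subseteq [a, b]$ is an interval with $\omega(f, I) < \delta$, it follows that
$\omega(g \circ f, I) \le \varepsilon$.
By Theorem 1.2.1, there exists a subdivision $\Delta$ of $[a, b]$ with
$\omega(f, \Delta) < \varepsilon\delta$. Let $I_1, \dots, I_N$ be the intervals of $\Delta$, which we now divide into two
categories. Let $A$ be the set comprising all indices $n \in \{1, \dots, N\}$ such that
$\omega(f, I_n) < \delta$, and let $B$ comprise all $n \in \{1, \dots, N\}$ such that $\omega(f, I_n) \ge \delta$.
Then
$$\varepsilon\delta > \omega(f, \Delta) = \sum_{n=1}^{N} \omega(f, I_n)\,|I_n| \ge \sum_{n \in B} \omega(f, I_n)\,|I_n| \ge \delta \sum_{n \in B} |I_n|.$$
Therefore, we have
$$\varepsilon > \sum_{n \in B} |I_n|,$$
which implies
$$\sum_{n \in B} \omega(g \circ f, I_n)\,|I_n| \le (L - \ell) \sum_{n \in B} |I_n| \le \varepsilon(L - \ell).$$
On the other hand,
$$\sum_{n \in A} \omega(g \circ f, I_n)\,|I_n| \le \varepsilon \sum_{n \in A} |I_n| \le \varepsilon(b - a).$$
Therefore,
$$\omega(g \circ f, \Delta) = \sum_{n \in A} \omega(g \circ f, I_n)\,|I_n| + \sum_{n \in B} \omega(g \circ f, I_n)\,|I_n| \le \varepsilon(L - \ell + b - a).$$
The right-hand side can be made arbitrarily small, and thus $g \circ f$ is Riemann
integrable by Theorem 1.2.1.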
Corollary 1.4.1. Let $f : [a, b] \to \mathbb{R}$ be Riemann integrable. Then $|f|$ is
Riemann integrable.
Proof. Apply Theorem 1.4.3 with $g(y) = |y|$.
Corollary 1.4.2. Let $f, g : [a, b] \to \mathbb{R}$ be Riemann integrable functions.
Then $fg$ is Riemann integrable.
Proof. If $\varphi : [a, b] \to \mathbb{R}$ is Riemann integrable, then so is $\varphi^2$ by Theorem
1.4.3. Furthermore, the functions $f + g$ and $f - g$ are Riemann integrable
by Theorem 1.4.1. Now use the formula
$$fg = \frac{1}{4}\left((f + g)^2 - (f - g)^2\right)$$
and use Theorem 1.4.1 again.
Theorem 1.4.4. (i) Let $f, g : [a, b] \to \mathbb{R}$ be Riemann integrable functions.
Suppose that $f \le g$ on $[a, b]$. Then
$$\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx.$$
(ii) Let $f : [a, b] \to \mathbb{R}$ be Riemann integrable. Then
$$\left| \int_a^b f(x)\,dx \right| \le \int_a^b |f(x)|\,dx.$$
Proof. (i) If $f \le g$ on $[a, b]$, then for any interval $I \subseteq [a, b]$, we have
$\inf f(I) \le \inf g(I)$. Hence for any subdivision $\Delta$ of $[a, b]$,
$$L(f, \Delta) \le L(g, \Delta).$$
Taking the supremum, we obtain
$$\underline{\int_a^b} f(x)\,dx \le \underline{\int_a^b} g(x)\,dx.$$
Since we have Riemann integrable functions, this implies the desired inequality.
(ii) If
$$\int_a^b f(x)\,dx \ge 0,$$
then we use the fact that $f \le |f|$ on $[a, b]$, together with part (i). The
conclusion is then that
$$\left| \int_a^b f(x)\,dx \right| = \int_a^b f(x)\,dx \le \int_a^b |f(x)|\,dx.$$
If
$$\int_a^b f(x)\,dx < 0,$$
then we use the same argument for $-f$, obtaining
$$\left| \int_a^b f(x)\,dx \right| = -\int_a^b f(x)\,dx = \int_a^b (-f(x))\,dx \le \int_a^b |f(x)|\,dx$$
in this case.
1.5 The Fundamental Theorem of Calculus
This is the section where we draw the link between the two problems at the
beginning of the chapter. So far we have calculated areas under a curve.
Now we find a connection with differentiation.
Theorem 1.5.1 (First Fundamental Theorem of Calculus). Suppose that
$f : [a, b] \to \mathbb{R}$ is Riemann integrable and $F : [a, b] \to \mathbb{R}$ is continuous on
$[a, b]$ and differentiable on $(a, b)$ with $F' = f$. Then
$$\int_a^b f(x)\,dx = F(b) - F(a).$$
That is, given a function $f$, if we can find a continuous function whose
derivative is $f$, then we can easily compute the integral of $f$.
Theorem 1.5.2 (Second Fundamental Theorem of Calculus). Suppose that
$f : [a, b] \to \mathbb{R}$ is Riemann integrable and $F : [a, b] \to \mathbb{R}$ is defined by
$$F(x) = \int_a^x f(t)\,dt$$
for $x \in [a, b]$. If $c \in (a, b)$ and $f$ is continuous at $c$, then $F$ is differentiable
at $c$ with $F'(c) = f(c)$.
In other words, given a continuous function $f$, we can use the integral
to construct a function whose derivative is $f$. So we may regard integration
as the reverse of differentiation.
We have one more piece of information about the function $F$ defined in
this theorem.
Theorem 1.5.3 (Continuity Theorem). Let $f : [a, b] \to \mathbb{R}$ be Riemann
integrable and let $F : [a, b] \to \mathbb{R}$ be the function such that
$$F(x) = \int_a^x f(t)\,dt$$
for $x \in [a, b]$. Then $F$ is Lipschitz continuous.

We now need to prove all three theorems. We begin with the easiest,
which is Theorem 1.5.3.
Proof of Theorem 1.5.3. Let $K = \sup_{x \in [a, b]} |f(x)|$. Then we have $|f| \le K$
in $[a, b]$. Suppose that $x, y \in [a, b]$. If $x \le y$, then by Theorem 1.4.2 and
Theorem 1.4.4, we have
$$|F(y) - F(x)| = \left| \int_a^y f(t)\,dt - \int_a^x f(t)\,dt \right| = \left| \int_x^y f(t)\,dt \right| \le \int_x^y |f(t)|\,dt \le K(y - x).$$
If $y < x$, we exchange the roles of $x$ and $y$ and obtain a similar inequality.
Proof of Theorem 1.5.1. Consider a subdivision $\Delta = (x_0, \dots, x_N)$ of $[a, b]$.
Then by the mean value theorem, for any $n = 1, \dots, N$, there exists a point
$\xi_n \in (x_{n-1}, x_n)$ such that
$$F(x_n) - F(x_{n-1}) = f(\xi_n)(x_n - x_{n-1}).$$
Let $\xi = (\xi_1, \dots, \xi_N)$. Then $(\Delta, \xi)$ is a tagged subdivision of $[a, b]$. Furthermore,
$$S(f, \Delta, \xi) = \sum_{n=1}^{N} f(\xi_n)(x_n - x_{n-1}) = \sum_{n=1}^{N} (F(x_n) - F(x_{n-1})) = F(b) - F(a).$$
Now choose a sequence of subdivisions $\Delta_k$ with $\|\Delta_k\| \to 0$ as $k \to \infty$.
With the above observation, we find corresponding sequences of tags $\xi_k$
such that
$$S(f, \Delta_k, \xi_k) = F(b) - F(a)$$
for every $k \in \mathbb{N}$. But the left-hand side converges to
$$\int_a^b f(x)\,dx$$
by Corollary 1.3.1, which proves the desired formula.

Proof of Theorem 1.5.2. Suppose that $f$ is continuous at $c$. Fix $\varepsilon > 0$ and
choose $\delta > 0$ such that for any $x \in [a, b]$ with $|x - c| < \delta$, we have
$|f(x) - f(c)| < \varepsilon$.
Now for $x \in [a, b]$, we have
$$\begin{aligned}
F(x) - F(c) &= \int_a^x f(t)\,dt - \int_a^c f(t)\,dt \\
&= \int_c^x f(t)\,dt \\
&= \int_c^x (f(c) + f(t) - f(c))\,dt \\
&= f(c)(x - c) + \int_c^x (f(t) - f(c))\,dt.
\end{aligned}$$
If $x \ne c$ with $|x - c| < \delta$, then
$$\left| \frac{F(x) - F(c)}{x - c} - f(c) \right| = \left| \int_c^x \frac{f(t) - f(c)}{x - c}\,dt \right| \le \left| \int_c^x \frac{|f(t) - f(c)|}{|x - c|}\,dt \right| \le \varepsilon.$$
That is, we have
$$\lim_{x \to c} \frac{F(x) - F(c)}{x - c} = f(c),$$
which is exactly what we have to prove.
Definition 1.5.1. Let $I \subseteq \mathbb{R}$ be an open interval and let $f, F : I \to \mathbb{R}$ be
two functions. If $F$ is differentiable in $I$ and $F'(x) = f(x)$ for all $x \in I$, then
$F$ is called a primitive for $f$ in $I$.
Remark 1.5.1. The expression antiderivative is also common.
Corollary 1.5.1. Let $I \subseteq \mathbb{R}$ be an open interval and $f : I \to \mathbb{R}$ a continuous
function.
(i) Then $f$ has a primitive in $I$.
(ii) If $F$ is a primitive for $f$ in $I$ and $x_0 \in I$ is any point, then there exists
a constant $c \in \mathbb{R}$ such that
$$F(x) = \int_{x_0}^x f(t)\,dt + c$$
for all $x \in I$.
Remark 1.5.2. The second statement means that primitives are unique up
to a constant.
Proof. (i) Fix $x_0 \in I$ and define
$$G(x) = \int_{x_0}^x f(t)\,dt.$$
Then for any $x > x_0$, Theorem 1.5.2 implies that $G'(x) = f(x)$, as $f$ is
continuous.
Now consider $x \le x_0$. Choose a point $x_1 \in I$ with $x_1 < x$ (which exists
as $I$ is open). Then
$$G(x) = \int_{x_1}^x f(t)\,dt - \int_{x_1}^{x_0} f(t)\,dt$$
by Theorem 1.4.2. Hence again we have $G'(x) = f(x)$, so $G$ is a primitive
for $f$ in $I$.
(ii) Define $G$ as before. If $F$ is another primitive for $f$ in $I$, then consider
the function $H = F - G$. Then for any $x \in I$, we have
$$H'(x) = F'(x) - G'(x) = f(x) - f(x) = 0.$$
A result from MA10207 implies that $H$ is constant, i.e., there exists a constant
$c \in \mathbb{R}$ such that $H(x) = c$ for all $x \in I$. That is, we have
$$F(x) = G(x) + c = \int_{x_0}^x f(t)\,dt + c$$
for all $x \in I$.
1.6 Integration Techniques
Most methods to calculate integrals rely on the first fundamental theorem
of calculus: in order to integrate a function $f$ over $[a, b]$, we first find a
primitive, which we then evaluate at the end points $a$ and $b$.
Example 1.6.1. Let $n \in \mathbb{N}$. What is
$$\int_a^b x^n\,dx?$$
Define $F(x) = \frac{x^{n+1}}{n+1}$ and check that $F'(x) = x^n$ for all $x \in (a, b)$. Furthermore,
this function is continuous on $[a, b]$. So by Theorem 1.5.1,
$$\int_a^b x^n\,dx = F(b) - F(a) = \frac{b^{n+1}}{n+1} - \frac{a^{n+1}}{n+1}.$$
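The formula in this example can be compared with a Riemann sum, as Corollary 1.3.1 suggests. The following Python sketch is a numerical illustration only: it uses midpoints as tags (one admissible choice among many), floating-point arithmetic, and hypothetical function names.

```python
def midpoint_riemann_sum(f, a, b, N):
    """Riemann sum of f on the uniform subdivision of [a, b] into N
    intervals, tagged by the midpoint of each interval."""
    h = (b - a) / N
    return sum(f(a + (k + 0.5) * h) for k in range(N)) * h


def power_integral(n, a, b):
    """F(b) - F(a) for the primitive F(x) = x**(n+1) / (n+1) of x**n."""
    return (b ** (n + 1) - a ** (n + 1)) / (n + 1)


exact = power_integral(3, 1.0, 2.0)
for N in (10, 100, 1000):
    approx = midpoint_riemann_sum(lambda x: x ** 3, 1.0, 2.0, N)
    print(N, approx, abs(approx - exact))
```

The differences printed in the last column shrink as the mesh (b - a)/N tends to zero.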
We have differentiation rules for products and compositions, and these
give rise to similar rules for integrals.
Theorem 1.6.1 (Integration by Parts). Let $f, g : [a, b] \to \mathbb{R}$ be Riemann
integrable functions. Suppose that $F, G : [a, b] \to \mathbb{R}$ are continuous functions
that are primitives of $f$ and $g$, respectively, in $(a, b)$. Then
$$\int_a^b f(x)G(x)\,dx + \int_a^b F(x)g(x)\,dx = F(b)G(b) - F(a)G(a).$$
Proof. Write $H = FG$ and note that this function is continuous on $[a, b]$
and differentiable in $(a, b)$ with $H' = fG + Fg$ by the product rule. Hence
by Theorem 1.5.1,
$$\int_a^b (f(x)G(x) + F(x)g(x))\,dx = H(b) - H(a) = F(b)G(b) - F(a)G(a).$$
An application of Theorem 1.4.1 now completes the proof.
In practice, this formula is typically used in order to reduce the integral
$\int_a^b f(x)G(x)\,dx$ into the hopefully easier expression
$$F(b)G(b) - F(a)G(a) - \int_a^b F(x)g(x)\,dx.$$
Examples can be found in Exercise 5.1.

Theorem 1.6.2 (Integration by Substitution). Let $I \subseteq \mathbb{R}$ be an open interval
and $f : I \to \mathbb{R}$ a continuous function. Suppose that $u : [a, b] \to I$
is continuous on $[a, b]$ and differentiable in $(a, b)$ with $u'$ continuous and
bounded. Extend $u'$ to $[a, b]$ by assigning $u'(a)$ and $u'(b)$ arbitrarily. Then
$$\int_{u(a)}^{u(b)} f(y)\,dy = \int_a^b f(u(x))u'(x)\,dx.$$
Proof. Choose a primitive $F$ for $f$ in $I$ (which is possible by Corollary 1.5.1).
Then $F \circ u$ is continuous on $[a, b]$ and differentiable in $(a, b)$ with
$(F \circ u)' = (f \circ u)u'$ by the chain rule. Now Theorem 1.5.1 implies that
$$\int_{u(a)}^{u(b)} f(y)\,dy = F(u(b)) - F(u(a))$$
and
$$\int_a^b f(u(x))u'(x)\,dx = F(u(b)) - F(u(a)).$$
Hence the two integrals are equal.
1.7 Exchanging Integrals with Limits
Consider a sequence $(f_k)_{k \in \mathbb{N}}$ of functions $f_k : [a, b] \to \mathbb{R}$. Recall that $f_k$
converges uniformly to a function $f$ on $[a, b]$ if
$$\forall \varepsilon > 0\ \exists K \in \mathbb{N}\ \forall k \ge K\ \forall x \in [a, b] : |f_k(x) - f(x)| < \varepsilon.$$
By a result from MA10207, the uniform limit of continuous functions is
continuous.
Theorem 1.7.1. Let $(f_k)_{k \in \mathbb{N}}$ be a sequence of Riemann integrable functions
on $[a, b]$ converging uniformly to a function $f : [a, b] \to \mathbb{R}$. Then $f$ is
Riemann integrable and
$$\int_a^b f_k(x)\,dx \to \int_a^b f(x)\,dx$$
as $k \to \infty$.
Remark 1.7.1. The conclusion of the theorem can be written in the form
$$\lim_{k \to \infty} \int_a^b f_k(x)\,dx = \int_a^b \lim_{k \to \infty} f_k(x)\,dx.$$
So we can summarise the theorem as follows: if we have uniform convergence,
then we can exchange the integral with the limit.
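Proof. Let $\varepsilon > 0$ and fix a number $K \in \mathbb{N}$ such that for all $k \ge K$ and
all $x \in [a, b]$, we have $|f_k(x) - f(x)| < \varepsilon$. Using Theorem 1.2.1, we find
a subdivision $\Delta$ of $[a, b]$ such that $\omega(f_K, \Delta) < \varepsilon$. Let $I_1, \dots, I_N$ be the
intervals of $\Delta$. Then for every $n = 1, \dots, N$, we have
$$\begin{aligned}
\omega(f, I_n) &= \sup f(I_n) - \inf f(I_n) \\
&\le \sup f_K(I_n) + \varepsilon - \inf f_K(I_n) + \varepsilon \\
&= \omega(f_K, I_n) + 2\varepsilon.
\end{aligned}$$
It follows that
$$\begin{aligned}
\omega(f, \Delta) &= \sum_{n=1}^{N} \omega(f, I_n)\,|I_n| \\
&\le \sum_{n=1}^{N} \omega(f_K, I_n)\,|I_n| + 2\varepsilon \sum_{n=1}^{N} |I_n| \\
&= \omega(f_K, \Delta) + 2\varepsilon(b - a) < \varepsilon(1 + 2b - 2a).
\end{aligned}$$
The right-hand side can be made arbitrarily small. Thus Theorem 1.2.1
implies that $f$ is Riemann integrable.
Moreover, we have
$$\begin{aligned}
\left| \int_a^b f_k(x)\,dx - \int_a^b f(x)\,dx \right| &= \left| \int_a^b (f_k(x) - f(x))\,dx \right| \\
&\le \int_a^b |f_k(x) - f(x)|\,dx \\
&\le (b - a) \sup_{x \in [a, b]} |f_k(x) - f(x)| \to 0
\end{aligned}$$
as $k \to \infty$ by Theorem 1.4.1 and Theorem 1.4.4. Therefore, we have the
desired convergence.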
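The final estimate of this proof can be seen in a concrete case. The following Python sketch is an illustration under assumptions chosen for this example: the sequence $f_k(x) = x^k/k$ on $[0, 1]$, which converges uniformly to the zero function, and hypothetical helper names. Both quantities are computed in closed form rather than numerically.

```python
from fractions import Fraction


def sup_distance(k):
    """sup over [0, 1] of |x**k / k - 0|; the supremum is attained at x = 1."""
    return Fraction(1, k)


def integral(k):
    """Integral of x**k / k over [0, 1], using the primitive
    x**(k+1) / (k * (k+1)) and Theorem 1.5.1."""
    return Fraction(1, k * (k + 1))


for k in (1, 2, 5, 10, 100):
    # The integral is bounded by (b - a) * sup|f_k - f| with b - a = 1.
    print(k, integral(k), sup_distance(k))
```

Here the integrals tend to 0, the integral of the limit function, and each of them is bounded by the corresponding supremum.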
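Corollary 1.7.1. If $\sum_{k=1}^{\infty} f_k$ is a uniformly convergent series of Riemann
integrable functions $f_k : [a, b] \to \mathbb{R}$, then the sum of the series is Riemann
integrable and
$$\int_a^b \sum_{k=1}^{\infty} f_k(x)\,dx = \sum_{k=1}^{\infty} \int_a^b f_k(x)\,dx.$$
Corollary 1.7.2. Suppose that $\sum_{k=0}^{\infty} \alpha_k x^k$ is a power series with radius of
convergence $R \in (0, \infty]$ and $-R < a < b < R$. Then
$$\int_a^b \sum_{k=0}^{\infty} \alpha_k x^k\,dx = \sum_{k=0}^{\infty} \frac{\alpha_k}{k+1} \left( b^{k+1} - a^{k+1} \right).$$
Proof. See exercise sheets.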
Theorem 1.7.2. Let $(f_k)_{k \in \mathbb{N}}$ be a sequence of continuously differentiable functions on $(a, b)$. Suppose that
(i) there exists a number $x_0 \in (a, b)$ such that the sequence $(f_k(x_0))_{k \in \mathbb{N}}$ is convergent, and
(ii) $(f_k')_{k \in \mathbb{N}}$ converges uniformly.
Then there exists a continuously differentiable function $f : (a, b) \to \mathbb{R}$ such that $f_k \to f$ uniformly and $f_k' \to f'$ uniformly as $k \to \infty$.
Proof. Let $y_0 = \lim_{k \to \infty} f_k(x_0)$ and let $g : (a, b) \to \mathbb{R}$ be the uniform limit of $f_k'$. Then $g$ is continuous, being the uniform limit of continuous functions. Define
\[ f(x) = y_0 + \int_{x_0}^x g(t)\,dt, \quad x \in (a, b). \]
Then $f' = g$ by Theorem 1.5.2. Moreover, for any $x \in (a, b)$,
\begin{align*}
|f_k(x) - f(x)| &= \left| f_k(x_0) + \int_{x_0}^x f_k'(t)\,dt - y_0 - \int_{x_0}^x g(t)\,dt \right| \\
&\le |f_k(x_0) - y_0| + \left| \int_{x_0}^x (f_k'(t) - g(t))\,dt \right| \\
&\le |f_k(x_0) - y_0| + \left| \int_{x_0}^x |f_k'(t) - g(t)|\,dt \right| \\
&\le |f_k(x_0) - y_0| + (b - a) \sup_{t \in (a,b)} |f_k'(t) - g(t)| \to 0.
\end{align*}
Hence $f_k \to f$ uniformly. We already know that $f_k' \to g = f'$ uniformly.
1.8 Improper Integrals
If we have an unbounded interval or an unbounded function, then the previous theory does not apply. When we think in terms of area under a curve,
this means that we have only discussed bounded regions in R2 so far. But
sometimes it is appropriate to assign an area to an unbounded region. This
can often be done by taking a limit.
There are two distinct situations that we consider.
(i) Suppose that $f : [a, b] \to \mathbb{R}$ is unbounded, but is Riemann integrable (and in particular bounded) on $[c, b]$ for any $c \in (a, b)$ and the limit
\[ \lim_{c \to a^+} \int_c^b f(x)\,dx \]
exists. Then we define
\[ \int_a^b f(x)\,dx = \lim_{c \to a^+} \int_c^b f(x)\,dx. \]
The value of $f$ at $a$ is irrelevant here, so we can use the same definition for a function $f : (a, b] \to \mathbb{R}$ (provided that it satisfies the appropriate conditions).
Similarly, if $f : [a, b] \to \mathbb{R}$ or $f : [a, b) \to \mathbb{R}$ is Riemann integrable on $[a, c]$ for any $c \in (a, b)$, then
\[ \int_a^b f(x)\,dx = \lim_{c \to b^-} \int_a^c f(x)\,dx, \]
provided that the limit exists.
(ii) Suppose that $f : [a, \infty) \to \mathbb{R}$ is a function that is Riemann integrable on $[a, c]$ for any $c > a$ and the limit
\[ \lim_{c \to \infty} \int_a^c f(x)\,dx \]
exists. Then we define
\[ \int_a^\infty f(x)\,dx = \lim_{c \to \infty} \int_a^c f(x)\,dx. \]
Similarly, if we have a function $f : (-\infty, b] \to \mathbb{R}$ that is Riemann integrable on $[c, b]$ for any $c < b$, then
\[ \int_{-\infty}^b f(x)\,dx = \lim_{c \to -\infty} \int_c^b f(x)\,dx, \]
provided that the limit exists.
In either case, these are called improper integrals.
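Example 1.8.1. Consider
\[ \int_0^1 \frac{dx}{\sqrt{x}}. \]
We compute
\[ \int_c^1 \frac{dx}{\sqrt{x}} = 2 - 2\sqrt{c} \to 2 \]
as $c \to 0^+$. Hence
\[ \int_0^1 \frac{dx}{\sqrt{x}} = 2. \]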
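Example 1.8.2. Consider
\[ \int_1^\infty \frac{dx}{x}. \]
We have
\[ \int_1^c \frac{dx}{x} = \log c - \log 1 = \log c \to \infty \]
as $c \to \infty$. Hence the improper integral does not exist. Even so, it is customary to write
\[ \int_1^\infty \frac{dx}{x} = \infty. \]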
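The limits in Examples 1.8.1 and 1.8.2 can be tabulated directly from the primitives $2\sqrt{x}$ and $\log x$. The following minimal sketch does this; the function names `sqrt_integral` and `reciprocal_integral` and the sample values of $c$ are assumptions chosen for illustration.

```python
import math

def sqrt_integral(c):
    """Integral of x^(-1/2) over [c, 1], using the primitive 2*sqrt(x)."""
    return 2.0 - 2.0 * math.sqrt(c)

def reciprocal_integral(c):
    """Integral of 1/x over [1, c], using the primitive log(x)."""
    return math.log(c)

for exponent in (2, 4, 8):
    c_small, c_large = 10.0 ** -exponent, 10.0 ** exponent
    # The first value approaches 2, the second grows without bound.
    print(c_small, sqrt_integral(c_small), c_large, reciprocal_integral(c_large))
```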
Suppose that $f : [a, \infty) \to \mathbb{R}$ is a non-negative function that is integrable on $[a, c]$ for any $c > a$. Then the function $F$, given by
\[ F(x) = \int_a^x f(t)\,dt, \]
is increasing. Hence $F(x)$ either converges or diverges to $\infty$ as $x \to \infty$. In either case, it makes sense to write
\[ \int_a^\infty f(x)\,dx = \lim_{x \to \infty} F(x). \]
Theorem 1.8.1 (Integral Test for Convergence of Series). Let $K \in \mathbb{Z}$ and let $f : [K, \infty) \to \mathbb{R}$ be a decreasing, non-negative function. Then the series
\[ \sum_{k=K}^\infty f(k) \]
is convergent if, and only if,
\[ \int_K^\infty f(x)\,dx < \infty. \]
Proof. See Exercise 5.3.
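Example 1.8.3. Let $s \in \mathbb{R}$ and consider the series
\[ \sum_{k=1}^\infty k^s. \]
In order to test convergence, also consider the integral
\[ \int_1^\infty x^s\,dx. \]
We distinguish two cases. If $s = -1$, then the function $x \mapsto x^s$ has the primitive $x \mapsto \log x$ in $(0, \infty)$. Hence
\[ \int_1^\infty x^s\,dx = \lim_{c \to \infty} \log c = \infty. \]
We conclude that
\[ \sum_{k=1}^\infty \frac{1}{k} = \infty. \]
If $s \ne -1$, then the function $x \mapsto x^s$ has the primitive $x \mapsto \frac{x^{s+1}}{s+1}$ in $(0, \infty)$. We have
\[ \int_1^\infty x^s\,dx = \lim_{c \to \infty} \frac{c^{s+1} - 1}{s+1} = \begin{cases} \infty & \text{if } s > -1, \\ -\frac{1}{s+1} & \text{if } s < -1. \end{cases} \]
Hence we have
\[ \sum_{k=1}^\infty k^s = \infty \]
if $s \ge -1$ and
\[ \sum_{k=1}^\infty k^s < \infty \]
if $s < -1$. (For $s > 0$ the function $x \mapsto x^s$ is not decreasing, so Theorem 1.8.1 does not apply directly, but then the terms $k^s$ do not tend to $0$ and the series diverges as well.)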
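The two regimes of Example 1.8.3 can be observed on partial sums. The sketch below compares them with the corresponding integrals; the function name `partial_sum` and the cut-off $n = 10000$ are assumptions made for illustration, and a finite computation can of course only suggest, not prove, convergence or divergence.

```python
import math

def partial_sum(s, n):
    """Partial sum of k^s for k = 1, ..., n."""
    return sum(k ** s for k in range(1, n + 1))

n = 10_000
# s = -2: the partial sums stay below f(1) + integral of x^(-2) over [1, oo) = 2.
print(partial_sum(-2, n))
# s = -1: the partial sums exceed the integral of 1/x over [1, n + 1].
print(partial_sum(-1, n), math.log(n + 1))
```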
Chapter 2
Analysis in Several Variables

So far we have studied functions in one variable, i.e., defined on a domain $S \subseteq \mathbb{R}$ (typically an interval). Most of the concepts seen in MA10207 and in this course, however, have generalisations for functions in several variables. So from now on, we consider domains $S \subseteq \mathbb{R}^N$ and functions $f : S \to \mathbb{R}$ or $f : S \to \mathbb{R}^M$, where $M, N \in \mathbb{N}$.
First we need to make a few observations about the space $\mathbb{R}^N$ itself.
2.1 The Euclidean Space $\mathbb{R}^N$
Notation. We identify the elements of $\mathbb{R}^N$ with column vectors (real $N \times 1$ matrices). In order to save space, we often make use of the matrix transpose, writing
\[ (x_1, \dots, x_N)^T = \begin{pmatrix} x_1 \\ \vdots \\ x_N \end{pmatrix}. \]
Recall that $\mathbb{R}^N$ has the familiar vector addition and multiplication with scalars, making it a vector space. The Euclidean inner product (or Euclidean scalar product) of $x, y \in \mathbb{R}^N$ is
\[ \langle x, y \rangle = \sum_{n=1}^N x_n y_n = y^T x \]
when $x = (x_1, \dots, x_N)^T$ and $y = (y_1, \dots, y_N)^T$. The Euclidean norm of a vector $x = (x_1, \dots, x_N)^T \in \mathbb{R}^N$ is given by
\[ \|x\| = \sqrt{\langle x, x \rangle} = \sqrt{x_1^2 + \dots + x_N^2}. \]
The following properties are easily proved:
• $\langle \cdot\,, \cdot \rangle$ is linear in each variable,
• $\|x\| \ge 0$ for all $x \in \mathbb{R}^N$, with equality if, and only if, $x = 0$,
• $\|\lambda x\| = |\lambda| \|x\|$ for all $x \in \mathbb{R}^N$ and $\lambda \in \mathbb{R}$, and
• $\langle x, y \rangle = \langle y, x \rangle$ for all $x, y \in \mathbb{R}^N$.
Figure 2.1.1: Geometric interpretation of the triangle inequality in $\mathbb{R}^2$: the length of the vector $x + y$ is at most the sum of the lengths of $x$ and $y$.
The Cauchy-Schwarz inequality,
\[ |\langle x, y \rangle| \le \|x\| \|y\| \]
for all $x, y \in \mathbb{R}^N$, is proved in MA20216.
Lemma 2.1.1 (Triangle Inequality). For all $x, y \in \mathbb{R}^N$,
\[ \|x + y\| \le \|x\| + \|y\|. \]
The triangle inequality is illustrated in Fig. 2.1.1.
Proof. We have
\begin{align*}
\|x + y\|^2 = \langle x + y, x + y \rangle &= \|x\|^2 + 2 \langle x, y \rangle + \|y\|^2 \\
&\le \|x\|^2 + 2\|x\|\|y\| + \|y\|^2 = (\|x\| + \|y\|)^2,
\end{align*}
using the Cauchy-Schwarz inequality in the third step. Taking square roots yields the result.
2.2 Convergence
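Definition 2.2.1. A sequence $(x_k)_{k \in \mathbb{N}}$ in $\mathbb{R}^N$ is said to converge to the limit $x_0 \in \mathbb{R}^N$ if
\[ \forall \epsilon > 0 \ \exists K \in \mathbb{N} \ \forall k \ge K : \|x_k - x_0\| < \epsilon. \]
If so, we write $x_0 = \lim_{k \to \infty} x_k$ or $x_k \to x_0$ as $k \to \infty$.
Remark 2.2.1. This condition is equivalent to $\lim_{k \to \infty} \|x_k - x_0\| = 0$.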
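Lemma 2.2.1. Let $(x_k)_{k \in \mathbb{N}}$ be a sequence in $\mathbb{R}^N$, where
\[ x_k = \left( x_k^{(1)}, \dots, x_k^{(N)} \right)^T \]
for every $k \in \mathbb{N}$. Furthermore, let
\[ x_0 = \left( x_0^{(1)}, \dots, x_0^{(N)} \right)^T \in \mathbb{R}^N. \]
Then $x_0 = \lim_{k \to \infty} x_k$ if, and only if, $x_0^{(n)} = \lim_{k \to \infty} x_k^{(n)}$ for every $n = 1, \dots, N$.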
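Proof. Suppose that $x_0 = \lim_{k \to \infty} x_k$. Then for $n = 1, \dots, N$, we have
\[ \left| x_k^{(n)} - x_0^{(n)} \right| \le \left( \sum_{m=1}^N \left( x_k^{(m)} - x_0^{(m)} \right)^2 \right)^{1/2} = \|x_k - x_0\| \to 0. \]
So $x_k^{(n)} \to x_0^{(n)}$ as $k \to \infty$.
Conversely, suppose that $x_0^{(n)} = \lim_{k \to \infty} x_k^{(n)}$ for $n = 1, \dots, N$. Then
\[ \|x_k - x_0\| = \left( \sum_{n=1}^N \left( x_k^{(n)} - x_0^{(n)} \right)^2 \right)^{1/2} \to 0 \]
as $k \to \infty$ by the algebra of limits in $\mathbb{R}$.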
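Lemma 2.2.2. Let $(x_k)_{k \in \mathbb{N}}$ and $(y_k)_{k \in \mathbb{N}}$ be convergent sequences in $\mathbb{R}^N$ and $(\lambda_k)_{k \in \mathbb{N}}$ a convergent sequence in $\mathbb{R}$. Furthermore, let $x_0 = \lim_{k \to \infty} x_k$, $y_0 = \lim_{k \to \infty} y_k$, and $\lambda_0 = \lim_{k \to \infty} \lambda_k$. Then
(i) $x_0 + y_0 = \lim_{k \to \infty} (x_k + y_k)$,
(ii) $\lambda_0 x_0 = \lim_{k \to \infty} (\lambda_k x_k)$,
(iii) $\langle x_0, y_0 \rangle = \lim_{k \to \infty} \langle x_k, y_k \rangle$, and
(iv) $\|x_0\| = \lim_{k \to \infty} \|x_k\|$.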
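Proof. (i) We have
\[ \|(x_k + y_k) - (x_0 + y_0)\| \le \|x_k - x_0\| + \|y_k - y_0\| \to 0 \]
by the triangle inequality.
(ii) Here we observe that
\begin{align*}
\|\lambda_k x_k - \lambda_0 x_0\| &= \|\lambda_k (x_k - x_0) + (\lambda_k - \lambda_0) x_0\| \\
&\le \|\lambda_k (x_k - x_0)\| + \|(\lambda_k - \lambda_0) x_0\| \\
&= |\lambda_k| \|x_k - x_0\| + |\lambda_k - \lambda_0| \|x_0\|.
\end{align*}
We know that $\|x_k - x_0\| \to 0$ and $|\lambda_k - \lambda_0| \to 0$ as $k \to \infty$. Moreover, we have $|\lambda_k| \to |\lambda_0|$. It follows that $\|\lambda_k x_k - \lambda_0 x_0\| \to 0$ as $k \to \infty$ as well.
(iii) We compute
\begin{align*}
|\langle x_k, y_k \rangle - \langle x_0, y_0 \rangle| &= |\langle x_k - x_0, y_k - y_0 \rangle + \langle x_0, y_k - y_0 \rangle + \langle x_k - x_0, y_0 \rangle| \\
&\le |\langle x_k - x_0, y_k - y_0 \rangle| + |\langle x_0, y_k - y_0 \rangle| + |\langle x_k - x_0, y_0 \rangle| \\
&\le \|x_k - x_0\| \|y_k - y_0\| + \|x_0\| \|y_k - y_0\| + \|x_k - x_0\| \|y_0\|.
\end{align*}
Now we observe that all the terms on the last line tend to 0.
(iv) This is proved in Exercise 6.4.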
Definition 2.2.2. A set $S \subseteq \mathbb{R}^N$ is bounded if there exists a number $R \ge 0$ such that $\|x\| \le R$ for all $x \in S$. A sequence $(x_k)_{k \in \mathbb{N}}$ is bounded if the set $\{x_k : k \in \mathbb{N}\}$ is bounded.
Theorem 2.2.1 (Bolzano-Weierstrass). Every bounded sequence in $\mathbb{R}^N$ has a convergent subsequence.
Proof. Let $(x_k)_{k \in \mathbb{N}}$ be a bounded sequence in $\mathbb{R}^N$ with
\[ x_k = \left( x_k^{(1)}, \dots, x_k^{(N)} \right)^T. \]
Note that for any $n = 1, \dots, N$, we have $|x_k^{(n)}| \le \|x_k\|$, so the sequence $(x_k^{(n)})_{k \in \mathbb{N}}$ is bounded in $\mathbb{R}$.
By the Bolzano-Weierstrass theorem in $\mathbb{R}$, there exists a convergent subsequence of $(x_k^{(1)})_{k \in \mathbb{N}}$. That is, there exists an infinite subset $\Lambda_1 \subseteq \mathbb{N}$ such that $(x_k^{(1)})_{k \in \Lambda_1}$ is convergent. Let $x_0^{(1)}$ denote its limit.
Now $(x_k^{(2)})_{k \in \Lambda_1}$ is a bounded sequence in $\mathbb{R}$ as well. Apply the same arguments to obtain an infinite set $\Lambda_2 \subseteq \Lambda_1$ such that $(x_k^{(2)})_{k \in \Lambda_2}$ is convergent with limit $x_0^{(2)}$. Note that $(x_k^{(1)})_{k \in \Lambda_2}$, being a subsequence of a convergent sequence, still converges to $x_0^{(1)}$.
Apply the same arguments to the remaining coordinates in turn, constructing infinite subsets $\mathbb{N} \supseteq \Lambda_1 \supseteq \dots \supseteq \Lambda_N$ chosen such that $x_k^{(n)} \to x_0^{(n)}$ as $k \to \infty$ while $k \in \Lambda_n$ for $n = 1, \dots, N$.
Then for $(x_k)_{k \in \Lambda_N}$, all the coordinates converge. Using Lemma 2.2.1, we see that the subsequence is convergent in $\mathbb{R}^N$.
2.3 Open and Closed Sets
Definition 2.3.1. If $x \in \mathbb{R}^N$ and $r > 0$, then
\[ B_r(x) = \left\{ y \in \mathbb{R}^N : \|x - y\| < r \right\} \]
is called the open ball with centre $x$ and radius $r$.
Definition 2.3.2. A set $G \subseteq \mathbb{R}^N$ is called open if for every $x \in G$ there exists an $r > 0$ such that $B_r(x) \subseteq G$. A set $F \subseteq \mathbb{R}^N$ is called closed if the complement $\mathbb{R}^N \setminus F$ is open.
Remark 2.3.1. In this context, "closed" is not the same as "not open". There are sets that are neither open nor closed, and there are even some sets that are both open and closed.
Example 2.3.1. For $N = 1$ (i.e., in $\mathbb{R}$), open intervals $(a, b)$ are open and closed intervals $[a, b]$ are closed. A half-open interval $[a, b)$ or $(a, b]$ is neither open nor closed.
Theorem 2.3.1. Let $S \subseteq \mathbb{R}^N$. Then $S$ is closed if, and only if, it contains the limit of every sequence in $S$ that converges in $\mathbb{R}^N$.
Proof. Suppose that $S$ is closed and consider a sequence $(x_k)_{k \in \mathbb{N}}$ in $S$. For any point $y_0 \in \mathbb{R}^N \setminus S$, there exists a number $r > 0$ with $B_r(y_0) \subseteq \mathbb{R}^N \setminus S$, since $\mathbb{R}^N \setminus S$ is open. Therefore, we have
\[ \|x_k - y_0\| \ge r \]
for all $k \in \mathbb{N}$ and $y_0$ is certainly not a limit of the sequence. So if a limit exists at all, it must belong to $S$.
Now suppose that $S$ is not closed. Then $\mathbb{R}^N \setminus S$ is not open. Hence there exists a point $x_0 \in \mathbb{R}^N \setminus S$ such that for any $r > 0$, we have $B_r(x_0) \not\subseteq \mathbb{R}^N \setminus S$. In particular, for any $k \in \mathbb{N}$, we have $B_{1/k}(x_0) \cap S \ne \emptyset$; so we can choose a point $x_k \in B_{1/k}(x_0) \cap S$. Thus we construct a sequence $(x_k)_{k \in \mathbb{N}}$ in $S$ with the property that
\[ \|x_k - x_0\| < \frac{1}{k} \to 0. \]
Hence we have convergence to $x_0$, which is not in $S$.
Definition 2.3.3. A subset of $\mathbb{R}^N$ is called compact if it is closed and bounded.
Compact sets have particularly nice properties. The following is an example.
Corollary 2.3.1. Let $C \subseteq \mathbb{R}^N$ be a compact set. Then every sequence in $C$ has a subsequence converging to a point of $C$.
Proof. Given a sequence in $C$, Theorem 2.2.1 implies that there exists a convergent subsequence. By Theorem 2.3.1, the limit must belong to $C$.
2.4 Continuity
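Definition 2.4.1. Let $S \subseteq \mathbb{R}^N$ and $f : S \to \mathbb{R}^M$. For $x_0 \in S$, we say that $f$ is continuous at $x_0$ if
\[ \forall \epsilon > 0 \ \exists \delta > 0 \ \forall x \in S : \|x - x_0\| < \delta \Rightarrow \|f(x) - f(x_0)\| < \epsilon. \]
We say that $f$ is continuous if it is continuous at every point of $S$. Finally, we say that $f$ is uniformly continuous if
\[ \forall \epsilon > 0 \ \exists \delta > 0 \ \forall x, y \in S : \|x - y\| < \delta \Rightarrow \|f(x) - f(y)\| < \epsilon. \]
Definition 2.4.2. Let $S \subseteq \mathbb{R}^N$ and $f : S \to \mathbb{R}^M$. For $x_0 \in S$ and $\ell \in \mathbb{R}^M$, we say that $f(x)$ converges to $\ell$ as $x \to x_0$ if
\[ \forall \epsilon > 0 \ \exists \delta > 0 \ \forall x \in S : 0 < \|x - x_0\| < \delta \Rightarrow \|f(x) - \ell\| < \epsilon. \]
In this case, we write $f(x) \to \ell$ as $x \to x_0$ or $\ell = \lim_{x \to x_0} f(x)$.
Remark 2.4.1. It follows that $f$ is continuous at $x_0$ if, and only if, $f(x) \to f(x_0)$ as $x \to x_0$.
Theorem 2.4.1. Let $S \subseteq \mathbb{R}^N$ and $f : S \to \mathbb{R}^M$. Furthermore, let $x_0 \in S$. Then $f$ is continuous at $x_0$ if, and only if, for any sequence $(x_k)_{k \in \mathbb{N}}$ in $S$ converging to $x_0$, the sequence $(f(x_k))_{k \in \mathbb{N}}$ converges to $f(x_0)$ as $k \to \infty$.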
Proof. Suppose that $f$ is continuous at $x_0$. Consider a sequence $(x_k)_{k \in \mathbb{N}}$ converging to $x_0$ as $k \to \infty$. Let $\epsilon > 0$. Then by the continuity, there exists a number $\delta > 0$ such that for all $x \in S$,
\[ \|x - x_0\| < \delta \Rightarrow \|f(x) - f(x_0)\| < \epsilon. \]
By the convergence, there exists a number $K \in \mathbb{N}$ such that $\|x_k - x_0\| < \delta$ for all $k \ge K$. Then $\|f(x_k) - f(x_0)\| < \epsilon$ for all $k \ge K$. Hence $f(x_k) \to f(x_0)$ as $k \to \infty$.
Conversely, suppose that $f$ is not continuous at $x_0$. Then there exists a number $\epsilon > 0$ such that for all $\delta > 0$ there exists a point $x \in S$ with $\|x - x_0\| < \delta$, but $\|f(x) - f(x_0)\| \ge \epsilon$. In particular, for any $k \in \mathbb{N}$, we can choose a point $x_k \in S$ with
\[ \|x_k - x_0\| < \frac{1}{k}, \quad \text{but } \|f(x_k) - f(x_0)\| \ge \epsilon. \]
Then the sequence $(x_k)_{k \in \mathbb{N}}$ evidently converges to $x_0$, but the sequence $(f(x_k))_{k \in \mathbb{N}}$ does not converge to $f(x_0)$.
Corollary 2.4.1. Let $S \subseteq \mathbb{R}^N$ and $x_0 \in S$.
(i) If $f, g : S \to \mathbb{R}^M$ are both continuous at $x_0$, then $f + g$ is continuous at $x_0$.
(ii) If $f : S \to \mathbb{R}^M$ is continuous at $x_0$ and $\lambda : S \to \mathbb{R}$ is continuous at $x_0$, then $\lambda f$ is continuous at $x_0$.
Proof. Combine Theorem 2.4.1 with Lemma 2.2.2.
Theorem 2.4.2. Let $A \subseteq \mathbb{R}^N$ and $B \subseteq \mathbb{R}^M$ and let $x_0 \in A$. Suppose that $f : A \to \mathbb{R}^M$ and $g : B \to \mathbb{R}^K$ are functions with $f(A) \subseteq B$, such that $f$ is continuous at $x_0$ and $g$ is continuous at $f(x_0)$. Then $g \circ f$ is continuous at $x_0$.
Proof. First note that $g \circ f$ is well-defined by the assumption $f(A) \subseteq B$.
Let $\epsilon > 0$. By the continuity of $g$ at $f(x_0)$, there exists a number $\delta > 0$ such that $\|g(y) - g(f(x_0))\| < \epsilon$ for all $y \in B$ with $\|y - f(x_0)\| < \delta$. By the continuity of $f$ at $x_0$, there exists a number $\eta > 0$ such that $\|f(x) - f(x_0)\| < \delta$ for all $x \in A$ with $\|x - x_0\| < \eta$.
Now if $x \in A$ and $\|x - x_0\| < \eta$, it follows that $\|g(f(x)) - g(f(x_0))\| < \epsilon$. Hence $g \circ f$ is continuous at $x_0$.
Theorem 2.4.3 (Weierstrass Extreme Value Theorem). Let $C \subseteq \mathbb{R}^N$ be non-empty and compact. Then any continuous function $f : C \to \mathbb{R}$ is bounded and attains its infimum and supremum.
Proof. Let $\sigma = \sup f(C) \in (-\infty, \infty]$. Then we can choose a sequence $(\sigma_k)_{k \in \mathbb{N}}$ in $f(C)$ such that $\sigma = \lim_{k \to \infty} \sigma_k$. For each $k \in \mathbb{N}$, choose a point $x_k \in C$ with $f(x_k) = \sigma_k$.
By Corollary 2.3.1, there exists a convergent subsequence $(x_{k_j})_{j \in \mathbb{N}}$ converging to a point of $C$, say $x_{k_j} \to x_0 \in C$ as $j \to \infty$. By the continuity of $f$, we now have
\[ f(x_0) = \lim_{j \to \infty} f(x_{k_j}) = \lim_{j \to \infty} \sigma_{k_j} = \lim_{k \to \infty} \sigma_k = \sup f(C). \]
Hence the supremum is attained. It also follows that
\[ \sup f(C) = f(x_0) < \infty. \]
For the infimum, we can use the same arguments. We conclude that the infimum is attained and
\[ \inf f(C) > -\infty. \]
Therefore, we also conclude that $f$ is bounded.
Theorem 2.4.4. If $C \subseteq \mathbb{R}^N$ is compact, then every continuous function $f : C \to \mathbb{R}^N$ is uniformly continuous.
Proof. Assume, by way of contradiction, that $f$ is continuous but not uniformly so. Then there exists an $\epsilon > 0$ such that for every $\delta > 0$ there exist two points $x, y \in C$ with $\|x - y\| < \delta$, but $\|f(x) - f(y)\| \ge \epsilon$. If we fix $\epsilon$ with this property, then that means in particular that for any $k \in \mathbb{N}$, there exist $x_k, y_k \in C$ such that $\|x_k - y_k\| < \frac{1}{k}$, but $\|f(x_k) - f(y_k)\| \ge \epsilon$.
Corollary 2.3.1 implies that $(x_k)_{k \in \mathbb{N}}$ has a convergent subsequence with limit in $C$, say $x_{k_j} \to x_0 \in C$. Then
\[ \|y_{k_j} - x_0\| \le \|y_{k_j} - x_{k_j}\| + \|x_{k_j} - x_0\| \to 0, \]
hence we have $y_{k_j} \to x_0$ as well. Thus by the continuity of $f$ and Theorem 2.4.1, we have
\[ f(x_{k_j}) \to f(x_0) \quad \text{and} \quad f(y_{k_j}) \to f(x_0). \]
On the other hand, we have the inequality
\[ \|f(x_{k_j}) - f(y_{k_j})\| \ge \epsilon, \]
and the two statements contradict each other.
2.5 Norms
Everything that we have done so far in this chapter is based on the idea that we can measure distances in $\mathbb{R}^N$ in terms of the Euclidean norm $\|\cdot\|$. This concept can be generalised.
Definition 2.5.1. A norm on a real vector space $V$ is a map $\|\cdot\|_V : V \to \mathbb{R}$ such that
(i) $\|x\|_V \ge 0$ for all $x \in V$ with equality if, and only if, $x = 0$,
(ii) $\|\lambda x\|_V = |\lambda| \|x\|_V$ for all $x \in V$ and all $\lambda \in \mathbb{R}$, and
(iii) $\|x + y\|_V \le \|x\|_V + \|y\|_V$ for all $x, y \in V$.
Example 2.5.1. We have already seen that the Euclidean norm $\|\cdot\|$ on $\mathbb{R}^N$ satisfies these conditions. Other examples of norms on $\mathbb{R}^N$ include $\|\cdot\|_1$ and $\|\cdot\|_\infty$ with
\[ \|x\|_1 = \sum_{n=1}^N |x_n| \quad \text{and} \quad \|x\|_\infty = \max\{|x_1|, \dots, |x_N|\} \]
for $x = (x_1, \dots, x_N)^T$.
Given a real vector space $V$ with a norm $\|\cdot\|_V$, we can define convergence, balls, and open sets in $V$ analogously to the corresponding concepts in $\mathbb{R}^N$, simply replacing the Euclidean norm by $\|\cdot\|_V$ everywhere. If we have two real vector spaces $V$ and $W$ with norms $\|\cdot\|_V$ and $\|\cdot\|_W$, respectively, then we can also define continuity of a map $f : V \to W$.
Definition 2.5.2. Two norms $\|\cdot\|_1$ and $\|\cdot\|_2$ on a real vector space $V$ are equivalent if there exists a number $C > 0$ such that $\|x\|_1 \le C\|x\|_2$ and $\|x\|_2 \le C\|x\|_1$ for all $x \in V$.
Proposition 2.5.1. Let $\|\cdot\|_1$ and $\|\cdot\|_2$ be two equivalent norms on the real vector space $V$. Then a sequence $(x_k)_{k \in \mathbb{N}}$ in $V$ converges to a limit $x_0 \in V$ with respect to the norm $\|\cdot\|_1$ if, and only if, it converges to $x_0$ with respect to $\|\cdot\|_2$.
Proof. Convergence with respect to $\|\cdot\|_1$ means that $\|x_k - x_0\|_1 \to 0$ as $k \to \infty$. But then $\|x_k - x_0\|_2 \le C\|x_k - x_0\|_1 \to 0$ as well, so we have convergence with respect to $\|\cdot\|_2$. The arguments for the converse are the same.
Remark 2.5.1. So equivalent norms give rise to the same notion of convergence. It follows that they also give rise to the same continuous functions.
Theorem 2.5.1. Any two norms on $\mathbb{R}^N$ are equivalent.
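Proof. It suffices to show that any norm $\|\cdot\|_*$ on $\mathbb{R}^N$ is equivalent to the Euclidean norm $\|\cdot\|$. Let $(e_1, \dots, e_N)$ be the standard basis in $\mathbb{R}^N$. Then for $x = (x_1, \dots, x_N)^T \in \mathbb{R}^N$, we have
\begin{align*}
\|x\|_* = \left\| \sum_{n=1}^N x_n e_n \right\|_* &\le \sum_{n=1}^N \|x_n e_n\|_* \\
&= \sum_{n=1}^N |x_n| \|e_n\|_* \le \left( \sum_{n=1}^N x_n^2 \right)^{1/2} \left( \sum_{n=1}^N \|e_n\|_*^2 \right)^{1/2}
\end{align*}
by the triangle inequality, property (ii) in Definition 2.5.1, and the Cauchy-Schwarz inequality. Setting $C_1 = \left( \sum_{n=1}^N \|e_n\|_*^2 \right)^{1/2}$, we obtain
\[ \|x\|_* \le C_1 \|x\|. \tag{2.1} \]
Now note that for $x, y \in \mathbb{R}^N$, we have
\[ \left| \|x\|_* - \|y\|_* \right| \le \|x - y\|_* \le C_1 \|x - y\| \]
by the triangle inequality and the first part of this proof (cf. Exercise 6.3). Hence $\|\cdot\|_*$ is a continuous function with respect to the Euclidean norm.
Let
\[ S = \left\{ x \in \mathbb{R}^N : \|x\| = 1 \right\}, \]
which is a closed and bounded set with respect to $\|\cdot\|$. It follows from Weierstrass's extreme value theorem (Theorem 2.4.3) that there exists a point $x_0 \in S$ such that $\|x\|_* \ge \|x_0\|_*$ for all $x \in S$. Note that $\|x_0\|_* > 0$, because $x_0 \ne 0$. Let $C_2 = \frac{1}{\|x_0\|_*}$. Then for any $x \ne 0$, we have
\[ \frac{x}{\|x\|} \in S. \]
Hence
\[ \frac{1}{C_2} \le \left\| \frac{x}{\|x\|} \right\|_* = \frac{\|x\|_*}{\|x\|}. \]
That is,
\[ \|x\| \le C_2 \|x\|_*. \tag{2.2} \]
This inequality is trivial for $x = 0$.
The equivalence of the norms now follows from the two inequalities (2.1) and (2.2).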
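For the three norms from Example 2.5.1, explicit constants are available: $\|x\|_\infty \le \|x\| \le \|x\|_1 \le N\|x\|_\infty$ for all $x \in \mathbb{R}^N$. The following sketch tests this chain on random vectors. The function names, the dimension $N = 5$, and the random sampling are assumptions made for illustration; a finite sample does not replace the proof.

```python
import math
import random

def norm_1(x):
    return sum(abs(t) for t in x)

def norm_2(x):
    return math.sqrt(sum(t * t for t in x))

def norm_inf(x):
    return max(abs(t) for t in x)

random.seed(0)
N = 5
samples = [[random.uniform(-1, 1) for _ in range(N)] for _ in range(1000)]
# Comparison chain: |x|_inf <= |x|_2 <= |x|_1 <= N * |x|_inf (with rounding slack).
chain_holds = all(
    norm_inf(x) <= norm_2(x) + 1e-12
    and norm_2(x) <= norm_1(x) + 1e-12
    and norm_1(x) <= N * norm_inf(x) + 1e-12
    for x in samples
)
print(chain_holds)
```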
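The following is another example of a norm. We will use this specific norm later.
Definition 2.5.3. Let $\mathrm{Hom}(\mathbb{R}^N, \mathbb{R}^M)$ denote the space of all linear maps $A : \mathbb{R}^N \to \mathbb{R}^M$. The operator norm is the norm $\|\cdot\|$ on $\mathrm{Hom}(\mathbb{R}^N, \mathbb{R}^M)$ defined by
\[ \|A\| = \sup \left\{ \|Ax\| : x \in \mathbb{R}^N \text{ with } \|x\| \le 1 \right\}. \]
Remark 2.5.2. For any $A \in \mathrm{Hom}(\mathbb{R}^N, \mathbb{R}^M)$, the operator norm $\|A\|$ is finite. Indeed, if $(a_{mn})_{m,n}$ is the transformation matrix of $A$ with respect to the standard basis, then for all $x \in \mathbb{R}^N$, we have
\begin{align*}
\|Ax\| &= \left( \sum_{m=1}^M \left( \sum_{n=1}^N a_{mn} x_n \right)^2 \right)^{1/2} \\
&\le \left( \sum_{m=1}^M \left( \sum_{n=1}^N a_{mn}^2 \right) \left( \sum_{n=1}^N x_n^2 \right) \right)^{1/2} \\
&= \left( \sum_{m=1}^M \sum_{n=1}^N a_{mn}^2 \right)^{1/2} \|x\|
\end{align*}
by the Cauchy-Schwarz inequality; so
\[ \|A\| \le \left( \sum_{m=1}^M \sum_{n=1}^N a_{mn}^2 \right)^{1/2}. \]
It is checked in Exercise 7.4 that the operator norm is a norm.
Proposition 2.5.2.
(i) If $A \in \mathrm{Hom}(\mathbb{R}^N, \mathbb{R}^M)$ and $x \in \mathbb{R}^N$, then
\[ \|Ax\| \le \|A\| \|x\|. \]
(ii) If $A \in \mathrm{Hom}(\mathbb{R}^M, \mathbb{R}^K)$ and $B \in \mathrm{Hom}(\mathbb{R}^N, \mathbb{R}^M)$, then
\[ \|AB\| \le \|A\| \|B\|. \]
Proof. (i) If $x = 0$, then this is trivial. Otherwise, we have
\[ \|Ax\| = \|x\| \left\| A \frac{x}{\|x\|} \right\| \le \|A\| \|x\|. \]
(ii) For any $x \in \mathbb{R}^N$ with $\|x\| \le 1$, we have
\[ \|ABx\| \le \|A\| \|Bx\| \le \|A\| \|B\| \|x\| \le \|A\| \|B\|. \]
The desired inequality then follows.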
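The operator norm and the bound from Remark 2.5.2 can be compared numerically for a $2 \times 2$ matrix. The sketch below samples unit vectors on the circle, which only gives a lower estimate of the supremum. The function names and the diagonal example matrix (operator norm 4, bound 5) are assumptions chosen for illustration.

```python
import math

def apply(A, x):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]

def euclidean_norm(x):
    return math.sqrt(sum(t * t for t in x))

def operator_norm_estimate(A, samples=3600):
    """Maximum of |Ax| over sampled unit vectors in R^2 (a lower estimate)."""
    best = 0.0
    for i in range(samples):
        theta = 2 * math.pi * i / samples
        unit = [math.cos(theta), math.sin(theta)]
        best = max(best, euclidean_norm(apply(A, unit)))
    return best

def frobenius_bound(A):
    """The bound (sum of a_mn^2)^(1/2) from Remark 2.5.2."""
    return math.sqrt(sum(a * a for row in A for a in row))

A = [[3.0, 0.0], [0.0, 4.0]]
print(operator_norm_estimate(A), frobenius_bound(A))
```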
Remark 2.5.3. The space $\mathrm{Hom}(\mathbb{R}^N, \mathbb{R}^M)$ can be identified with $\mathbb{R}^{M \times N}$ by identifying a linear map with the corresponding matrix. Theorem 2.5.1 says that any two norms are equivalent on $\mathbb{R}^{M \times N}$, and the same statement can then be made for $\mathrm{Hom}(\mathbb{R}^N, \mathbb{R}^M)$. However, other norms will not satisfy the inequalities from Proposition 2.5.2 in general.
2.6 Derivatives
Recall that for a function $f : (a, b) \to \mathbb{R}$, the derivative of $f$ at a point $x_0 \in (a, b)$, if it exists, is defined by
\[ f'(x_0) = \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0}. \]
In this form, the definition has no obvious generalisation for functions $S \to \mathbb{R}^M$ with $S \subseteq \mathbb{R}^N$, because we cannot divide by a vector $x - x_0$ with $x, x_0 \in \mathbb{R}^N$. However, there is another characterisation of the derivative: it determines the best linear approximation of a given function near a given point (cf. Fig. 2.6.1). This concept does have a generalisation to higher dimensions.
For differentiation, we need to work with maps defined on open sets. From now on, $\Omega$ denotes an open subset of $\mathbb{R}^N$.
Definition 2.6.1. Let $x_0 \in \Omega$. A map $f : \Omega \to \mathbb{R}^M$ is Fréchet differentiable (or differentiable for short) at $x_0$ if there exists a linear map $A : \mathbb{R}^N \to \mathbb{R}^M$ such that
\[ \frac{f(x) - f(x_0) - A(x - x_0)}{\|x - x_0\|} \to 0 \quad \text{as } x \to x_0. \]
In this case, the linear map $A$ is called the Fréchet derivative (or derivative for short) of $f$ at $x_0$ and denoted $Df(x_0)$.
Figure 2.6.1: Linear approximation of a differentiable function $f : (a, b) \to \mathbb{R}$ (the graph $y = f(x)$ and the line $y = f(x_0) + (x - x_0) f'(x_0)$ through the point at $x_0$).
In the special case $N = M = 1$, we have
\[ \lim_{x \to x_0} \frac{f(x) - f(x_0) - (x - x_0) f'(x_0)}{x - x_0} = \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} - f'(x_0) = 0, \]
so
\[ \lim_{x \to x_0} \frac{f(x) - f(x_0) - (x - x_0) f'(x_0)}{|x - x_0|} = 0 \]
as well. Hence the Fréchet derivative at $x_0$ is the linear map $Df(x_0) : \mathbb{R} \to \mathbb{R}$ with $Df(x_0)h = f'(x_0)h$.
Remark 2.6.1. Recall the "little o" notation: when we write $g(x) = o(h(x))$ as $x \to x_0$, say, this means that
\[ \lim_{x \to x_0} \frac{g(x)}{h(x)} = 0. \]
For example, we have $\|x - x_0\|^2 = o(\|x - x_0\|)$ as $x \to x_0$. With this notation, the condition for the Fréchet derivative can be written in the form
\[ f(x) = f(x_0) + Df(x_0)(x - x_0) + o(\|x - x_0\|) \]
as $x \to x_0$.
Since the definition of the derivative involves a linear map, the following information is useful.
Lemma 2.6.1. Any linear map $A : \mathbb{R}^N \to \mathbb{R}^M$ is continuous.
Proof. By results from linear algebra, there is an $(M \times N)$-matrix $(a_{mn})_{m,n}$ such that the components of $Ax$ are
\[ \sum_{n=1}^N a_{mn} x_n, \quad m = 1, \dots, M. \]
Thus the claim follows from Lemma 2.2.1 and Theorem 2.4.1.
Proposition 2.6.1. Let $f : \Omega \to \mathbb{R}^M$ be a map that is differentiable at $x_0 \in \Omega$. Then $f$ is continuous at $x_0$.
Proof. We have
\[ f(x) - f(x_0) = Df(x_0)(x - x_0) + o(\|x - x_0\|) \to 0 \quad \text{as } x \to x_0 \]
by Lemma 2.6.1, as required.
We also have a different notion of derivative.
Definition 2.6.2. Let $f : \Omega \to \mathbb{R}^M$ be a map. Let $x_0 \in \Omega$. For $n = 1, \dots, N$, let $e_n = (0, \dots, 0, 1, 0, \dots, 0)^T$ denote the $n$-th standard unit vector in $\mathbb{R}^N$. Then
\[ \frac{\partial f}{\partial x_n}(x_0) = \lim_{h \to 0} \frac{f(x_0 + h e_n) - f(x_0)}{h}, \]
if the limit exists, is called the partial derivative of $f$ at $x_0$ with respect to $x_n$. If we write $f(x) = (f_1(x), \dots, f_M(x))^T$ for $x \in \Omega$, then the matrix
\[ Jf(x_0) = \left( \frac{\partial f_m}{\partial x_n}(x_0) \right)_{m,n} \]
is called the Jacobi matrix of $f$ at $x_0$, provided that these partial derivatives exist.
Definition 2.6.3. Suppose that $x_0 \in \Omega$, let $f : \Omega \to \mathbb{R}^M$ be a map, and let $v \in \mathbb{R}^N$. Then
\[ D_v f(x_0) = \lim_{h \to 0} \frac{f(x_0 + hv) - f(x_0)}{h}, \]
if it exists, is called the directional derivative of $f$ at $x_0$ in direction $v$.
Remark 2.6.2. We have $\frac{\partial f}{\partial x_n}(x_0) = D_{e_n} f(x_0)$ if it exists.
Definition 2.6.4. Let $x_0 \in \Omega$. If $f : \Omega \to \mathbb{R}$ is a function such that $Jf(x_0)$ exists, then
\[ \nabla f(x_0) = (Jf(x_0))^T \]
is called the gradient of $f$ at $x_0$.
Because we assume that $f$ is a function with values in $\mathbb{R}$ here, the gradient at a point $x_0$ is a column vector. It has a geometric interpretation: for $x_0 \in \Omega$, the direction of the vector $\nabla f(x_0)$ is the direction of steepest ascent of $f$ at $x_0$, while its length is the rate of change in that direction.
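Lemma 2.6.2. Suppose that $f : \Omega \to \mathbb{R}^M$ is differentiable at $x_0 \in \Omega$. Then all the partial derivatives $\frac{\partial f}{\partial x_n}(x_0)$ exist for $n = 1, \dots, N$, and the Jacobi matrix $Jf(x_0)$ is the transformation matrix of the linear map $Df(x_0)$ with respect to the standard bases in $\mathbb{R}^M$ and $\mathbb{R}^N$.
Proof. Let $(e_1, \dots, e_N)$ be the standard basis of $\mathbb{R}^N$ and $(\epsilon_1, \dots, \epsilon_M)$ the standard basis of $\mathbb{R}^M$. Let $A = (a_{mn})_{m,n}$ be the transformation matrix of $Df(x_0)$. Fix $n \in \{1, \dots, N\}$ and $m \in \{1, \dots, M\}$. Then for $h \ne 0$ with $|h|$ small enough, we have $x_0 + h e_n \in \Omega$, as $\Omega$ is open. Moreover,
\begin{align*}
\left| \frac{f_m(x_0 + h e_n) - f_m(x_0)}{h} - a_{mn} \right| &= \left| \left\langle \epsilon_m, \frac{f(x_0 + h e_n) - f(x_0)}{h} - Df(x_0) e_n \right\rangle \right| \\
&\le \left\| \frac{f(x_0 + h e_n) - f(x_0) - Df(x_0) h e_n}{h} \right\| \to 0
\end{align*}
as $h \to 0$. Hence we obtain
\[ \frac{\partial f_m}{\partial x_n}(x_0) = a_{mn} \]
when taking the limit.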
Remark 2.6.3. This lemma implies that the derivative $Df(x_0)$, if it exists, is unique.
Remark 2.6.4. If $f$ is differentiable at $x_0$, then for any $v \in \mathbb{R}^N$, we can show with the arguments from Lemma 2.6.2 that $D_v f(x_0) = Df(x_0)v$.
Theorem 2.6.1. Let $x_0 \in \Omega$. If $f : \Omega \to \mathbb{R}^M$ is a map such that all the partial derivatives $\frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_N}$ exist throughout $\Omega$ and are continuous at $x_0$, then $f$ is differentiable at $x_0$.
Proof. We first consider the case $M = 1$.
Let $r > 0$ such that $B_r(x_0) \subseteq \Omega$. Let $h = (h_1, \dots, h_N)^T \in B_r(0)$. Consider the function $t \mapsto f(x_0 + t e_1)$, which is differentiable in $(-r, r)$ with derivative
\[ \frac{\partial f}{\partial x_1}(x_0 + t e_1). \]
By the mean value theorem, there exists a number $\theta_1 \in \mathbb{R}$ with $|\theta_1| \le |h_1|$ such that
\[ f(x_0 + h_1 e_1) = f(x_0) + h_1 \frac{\partial f}{\partial x_1}(x_0 + \theta_1 e_1). \]
Similarly, there exists a number $\theta_2 \in \mathbb{R}$ with $|\theta_2| \le |h_2|$ such that
\begin{align*}
f(x_0 + h_1 e_1 + h_2 e_2) &= f(x_0 + h_1 e_1) + h_2 \frac{\partial f}{\partial x_2}(x_0 + h_1 e_1 + \theta_2 e_2) \\
&= f(x_0) + h_1 \frac{\partial f}{\partial x_1}(x_0 + \theta_1 e_1) + h_2 \frac{\partial f}{\partial x_2}(x_0 + h_1 e_1 + \theta_2 e_2).
\end{align*}
Continuing with the coordinates $x_3, x_4, \dots, x_N$, we obtain $\theta_3, \dots, \theta_N$ with $|\theta_n| \le |h_n|$ for $n = 1, \dots, N$ such that
\[ f(x_0 + h) = f(x_0) + h_1 \frac{\partial f}{\partial x_1}(x_0 + \theta_1 e_1) + \dots + h_N \frac{\partial f}{\partial x_N}(x_0 + h_1 e_1 + \dots + h_{N-1} e_{N-1} + \theta_N e_N). \]
Setting
\[ b_n = \frac{\partial f}{\partial x_n}(x_0 + h_1 e_1 + \dots + h_{n-1} e_{n-1} + \theta_n e_n) - \frac{\partial f}{\partial x_n}(x_0), \]
we obtain
\[ f(x_0 + h) = f(x_0) + Jf(x_0)h + \sum_{n=1}^N b_n h_n. \tag{2.3} \]
By the continuity of the partial derivatives, we have $b_n \to 0$ as $h \to 0$. Hence
\[ \sum_{n=1}^N b_n h_n = o(\|h\|). \]
So (2.3) implies that $Df(x_0)$ exists and is represented by the matrix $Jf(x_0)$.
If $M \ge 2$, then we apply these arguments to every component of $f$. The claim then follows in this case as well.
Definition 2.6.5. A map $f : \Omega \to \mathbb{R}^M$ is called continuously differentiable if it is differentiable throughout $\Omega$ and the map $Df : \Omega \to \mathrm{Hom}(\mathbb{R}^N, \mathbb{R}^M)$ is continuous.
Here continuity is meant with respect to the operator norm on the space $\mathrm{Hom}(\mathbb{R}^N, \mathbb{R}^M)$. However, it follows from Lemma 2.2.1, Theorem 2.4.1, Theorem 2.6.1, and the equivalence of all norms on $\mathrm{Hom}(\mathbb{R}^N, \mathbb{R}^M)$ that $f$ is continuously differentiable if, and only if, all partial derivatives exist and are continuous in $\Omega$.
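Example 2.6.1. Let $f : \mathbb{R}^3 \to \mathbb{R}^2$ with
\[ f(x) = \begin{pmatrix} x_1^2 + x_2^2 x_3 \\ x_1 x_2 x_3 \end{pmatrix} \]
for $x \in \mathbb{R}^3$. We compute
\[ Jf(x) = \begin{pmatrix} 2x_1 & 2x_2 x_3 & x_2^2 \\ x_2 x_3 & x_1 x_3 & x_1 x_2 \end{pmatrix}. \]
All of these expressions give rise to continuous functions, hence $f$ is continuously differentiable.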
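The Jacobi matrix of Example 2.6.1 can be compared with difference quotients. The sketch below uses central differences with step $h = 10^{-6}$, which is an assumption made for numerical convenience (Definition 2.6.2 uses a one-sided quotient and a limit); the function names and the test point $(1, 2, 3)^T$ are likewise chosen for illustration.

```python
def f(x):
    """The map from Example 2.6.1."""
    x1, x2, x3 = x
    return [x1 ** 2 + x2 ** 2 * x3, x1 * x2 * x3]

def jacobian(x):
    """The Jacobi matrix computed in Example 2.6.1."""
    x1, x2, x3 = x
    return [[2 * x1, 2 * x2 * x3, x2 ** 2],
            [x2 * x3, x1 * x3, x1 * x2]]

def finite_difference_jacobian(func, x, h=1e-6):
    """Central difference quotients for all partial derivatives."""
    columns = []
    for n in range(len(x)):
        plus = [t + (h if i == n else 0.0) for i, t in enumerate(x)]
        minus = [t - (h if i == n else 0.0) for i, t in enumerate(x)]
        columns.append([(p - m) / (2 * h) for p, m in zip(func(plus), func(minus))])
    # columns[n][m] approximates the partial derivative of f_m with respect to x_n.
    return [[columns[n][m] for n in range(len(x))] for m in range(len(columns[0]))]

x0 = [1.0, 2.0, 3.0]
max_error = max(abs(a - b)
                for row_a, row_b in zip(jacobian(x0), finite_difference_jacobian(f, x0))
                for a, b in zip(row_a, row_b))
print(max_error)
```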
Theorem 2.6.2 (Chain Rule). Let $U \subseteq \mathbb{R}^N$ and $V \subseteq \mathbb{R}^M$ be open sets. Let $f : U \to \mathbb{R}^M$ and $g : V \to \mathbb{R}^K$ be maps and suppose that $f(U) \subseteq V$. Let $x_0 \in U$. If $f$ is differentiable at $x_0$ and $g$ is differentiable at $f(x_0)$, then $g \circ f$ is differentiable at $x_0$ with
\[ D(g \circ f)(x_0) = Dg(f(x_0)) Df(x_0). \]
Proof. Define
\[ \phi(x) = f(x) - f(x_0) - Df(x_0)(x - x_0) \]
for $x \in U$ and
\[ \psi(y) = g(y) - g(f(x_0)) - Dg(f(x_0))(y - f(x_0)) \]
for $y \in V$. Then
\[ \lim_{x \to x_0} \frac{\phi(x)}{\|x - x_0\|} = 0 \tag{2.4} \]
and
\[ \lim_{y \to f(x_0)} \frac{\psi(y)}{\|y - f(x_0)\|} = 0 \tag{2.5} \]
by the definition of the Fréchet derivative.

Let $x \in U \setminus \{x_0\}$. If $f(x) = f(x_0)$, then obviously $g(f(x)) - g(f(x_0)) = 0$. Otherwise,
\begin{align*}
g(f(x)) - g(f(x_0)) &= Dg(f(x_0))(f(x) - f(x_0)) + \psi(f(x)) \\
&= Dg(f(x_0)) \left( Df(x_0)(x - x_0) + \phi(x) \right) + \frac{\psi(f(x))}{\|f(x) - f(x_0)\|} \|f(x) - f(x_0)\| \\
&= Dg(f(x_0)) Df(x_0)(x - x_0) + Dg(f(x_0)) \phi(x) + \frac{\psi(f(x))}{\|f(x) - f(x_0)\|} \|Df(x_0)(x - x_0) + \phi(x)\|.
\end{align*}
Hence
\[ \frac{\|g(f(x)) - g(f(x_0)) - Dg(f(x_0)) Df(x_0)(x - x_0)\|}{\|x - x_0\|} \le \|Dg(f(x_0))\| \frac{\|\phi(x)\|}{\|x - x_0\|} + \frac{\|\psi(f(x))\|}{\|f(x) - f(x_0)\|} \left( \|Df(x_0)\| + \frac{\|\phi(x)\|}{\|x - x_0\|} \right). \]
Figure 2.6.2: A level set of $f$ and the gradient $\nabla f(x_0)$ perpendicular to it.
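Now (2.4) and (2.5), together with the continuity of $f$ at $x_0$ (Proposition 2.6.1), imply that
\[ \lim_{x \to x_0} \frac{\|g(f(x)) - g(f(x_0)) - Dg(f(x_0)) Df(x_0)(x - x_0)\|}{\|x - x_0\|} = 0. \]
Thus $Dg(f(x_0)) Df(x_0)$ is the Fréchet derivative of $g \circ f$ at $x_0$.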
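The chain rule of Theorem 2.6.2 can be checked on a small example by comparing the matrix product $Dg(f(x_0)) Df(x_0)$ with difference quotients of $g \circ f$. The maps $f(x) = (x_1 x_2, x_1 + x_2)^T$ and $g(y) = y_1^2 + y_2$, their hand-computed Jacobi matrices, and the central difference step are all assumptions chosen for illustration.

```python
def f(x):
    """Illustrative inner map f : R^2 -> R^2."""
    return [x[0] * x[1], x[0] + x[1]]

def Df(x):
    """Jacobi matrix of f, computed by hand."""
    return [[x[1], x[0]], [1.0, 1.0]]

def g(y):
    """Illustrative outer function g : R^2 -> R."""
    return y[0] ** 2 + y[1]

def Dg(y):
    """Jacobi matrix (a single row) of g, computed by hand."""
    return [2 * y[0], 1.0]

def chain_rule_derivative(x):
    """The row vector Dg(f(x)) Df(x) from Theorem 2.6.2."""
    row, matrix = Dg(f(x)), Df(x)
    return [sum(row[m] * matrix[m][n] for m in range(2)) for n in range(2)]

def difference_quotients(x, h=1e-6):
    """Central difference quotients of the composition g(f(x))."""
    result = []
    for n in range(2):
        plus = [t + (h if i == n else 0.0) for i, t in enumerate(x)]
        minus = [t - (h if i == n else 0.0) for i, t in enumerate(x)]
        result.append((g(f(plus)) - g(f(minus))) / (2 * h))
    return result

x0 = [1.5, -0.5]
print(chain_rule_derivative(x0), difference_quotients(x0))
```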

If we have a function $f : \Omega \to \mathbb{R}$, then the chain rule can be used to find another geometric interpretation of the gradient $\nabla f$. Suppose that $x_0 \in \Omega$ and $f$ is differentiable at $x_0$. Let $\alpha = f(x_0)$ and consider the level set $S_\alpha = \{x \in \Omega : f(x) = \alpha\}$. Then $\nabla f(x_0)$ is perpendicular to $S_\alpha$ in the following sense (cf. Fig. 2.6.2). Suppose that we have a curve $\gamma : (-r, r) \to S_\alpha$ with $\gamma(0) = x_0$. If the derivative $\gamma'(0)$ exists, then we can interpret it as a tangent vector to $S_\alpha$ at $x_0$. Consider the function $h = f \circ \gamma$. We have $h(t) = \alpha$ for every $t \in (-r, r)$, because $\gamma$ takes values in $S_\alpha$. Using the chain rule, we now compute
\[ 0 = h'(0) = Df(x_0) \gamma'(0) = \langle \nabla f(x_0), \gamma'(0) \rangle. \]
(It is possible, however, that $S_\alpha$ does not have any tangent vectors at $x_0$ except 0.)
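The orthogonality of the gradient and tangent vectors to a level set can be seen on the unit circle. In the sketch below, the function $f(x) = x_1^2 + x_2^2$, the level $\alpha = 1$, and the curve $\gamma(t) = (\cos t, \sin t)^T$ are assumptions chosen for illustration; the inner product $\langle \nabla f(\gamma(t)), \gamma'(t) \rangle$ should vanish up to rounding.

```python
import math

def grad_f(x):
    """Gradient of the illustrative function f(x) = x_1^2 + x_2^2."""
    return [2 * x[0], 2 * x[1]]

def gamma(t):
    """A curve in the level set {f = 1}: the unit circle."""
    return [math.cos(t), math.sin(t)]

def gamma_prime(t):
    """Derivative of gamma, i.e. a tangent vector to the level set."""
    return [-math.sin(t), math.cos(t)]

def inner(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

for t in (0.0, 0.7, 2.0):
    print(t, inner(grad_f(gamma(t)), gamma_prime(t)))
```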
Notation. If $x, y \in \mathbb{R}^N$, then we write $[x, y] = \{(1 - t)x + ty : 0 \le t \le 1\}$ for the line segment connecting $x$ and $y$.
Proposition 2.6.2 (Mean Value Inequality). Let $x, y \in \Omega$ with $[x, y] \subseteq \Omega$. If $f : \Omega \to \mathbb{R}^M$ is continuous on $[x, y]$ and differentiable at every point $z \in [x, y] \setminus \{x, y\}$ with $\|Df(z)\| \le K$, then
\[ \|f(x) - f(y)\| \le K \|x - y\|. \]
Proof. Fix $v \in \mathbb{R}^M$ and define
\[ g(t) = \langle v, f((1 - t)x + ty) \rangle, \quad 0 \le t \le 1. \]
Then by Theorem 2.4.2, the function $g$ is continuous on $[0, 1]$ and by Theorem 2.6.2, it is differentiable in $(0, 1)$ with
\[ g'(t) = \langle v, Df((1 - t)x + ty)(y - x) \rangle. \]
By the mean value theorem, there exists a number $\tau \in (0, 1)$ such that
\[ g(1) - g(0) = g'(\tau), \]
i.e.,
\[ \langle v, f(y) - f(x) \rangle = \langle v, Df((1 - \tau)x + \tau y)(y - x) \rangle \le K \|v\| \|x - y\| \]
by the Cauchy-Schwarz inequality and Proposition 2.5.2. Choose $v = f(y) - f(x)$, then we obtain
\[ \|f(y) - f(x)\|^2 \le K \|f(y) - f(x)\| \|x - y\|. \]
If $f(x) = f(y)$, then there is nothing to prove. Otherwise, we divide by $\|f(y) - f(x)\|$ on both sides to obtain the desired inequality.
Definition 2.6.6. Suppose that $x_0 \in \Omega$. Let $f : \Omega \to \mathbb{R}$ be a function. If $f$ is differentiable at $x_0$ and $Df(x_0) = 0$, then $x_0$ is called a critical point (or stationary point) of $f$.
If there exists a number $r > 0$ such that $f(x_0) \le f(x)$ for all $x \in B_r(x_0)$, then $x_0$ is called a local minimum point of $f$. If there exists a number $r > 0$ such that $f(x_0) \ge f(x)$ for all $x \in B_r(x_0)$, then $x_0$ is called a local maximum point of $f$. If $x_0$ is a critical point of $f$ and neither a local minimum point nor a local maximum point, then it is called a saddle point of $f$.
Proposition 2.6.3. Suppose that $f : \Omega \to \mathbb{R}$ is a function. If $x_0 \in \Omega$ is a local minimum point or local maximum point of $f$ and $f$ is differentiable at $x_0$, then $x_0$ is a critical point of $f$.
Proof. Choose $r > 0$ such that $B_r(x_0) \subseteq \Omega$. Fix $n \in \{1, \dots, N\}$ and let $e_n$ be the $n$-th standard unit vector in $\mathbb{R}^N$. Consider the function
\[ g(t) = f(x_0 + t e_n), \quad t \in (-r, r). \]
This function is differentiable at 0 by the chain rule with
\[ g'(0) = \frac{\partial f}{\partial x_n}(x_0). \]
Moreover, the function $g$ has a local minimum or maximum point at 0. Hence $g'(0) = 0$. It follows that $\frac{\partial f}{\partial x_n}(x_0) = 0$ for $n = 1, \dots, N$, and therefore $Df(x_0) = 0$.
2.7 Higher Order Derivatives
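If we have a map $f : \Omega \to \mathbb{R}^M$ such that the partial derivative with respect to $x_j$ (for some $j = 1, \dots, N$) exists throughout $\Omega$, then it may happen that $\frac{\partial f}{\partial x_j}$ itself has a partial derivative, say with respect to $x_i$. Then we write
\[ \frac{\partial}{\partial x_i} \frac{\partial f}{\partial x_j} = \frac{\partial^2 f}{\partial x_i \partial x_j}, \]
or possibly
\[ \frac{\partial}{\partial x_i} \frac{\partial f}{\partial x_i} = \frac{\partial^2 f}{\partial x_i^2} \]
if $i = j$. Even higher order partial derivatives are defined similarly.
If $f$ has a Fréchet derivative throughout $\Omega$, then we have a map $Df : \Omega \to \mathrm{Hom}(\mathbb{R}^N, \mathbb{R}^M)$. It may happen that $Df$ has a Fréchet derivative at a point $x_0 \in \Omega$. This is then denoted by $D^2 f(x_0)$ and is a linear map $\mathbb{R}^N \to \mathrm{Hom}(\mathbb{R}^N, \mathbb{R}^M)$ (i.e., an element of $\mathrm{Hom}(\mathbb{R}^N, \mathrm{Hom}(\mathbb{R}^N, \mathbb{R}^M))$). Again, we can define even higher derivatives similarly.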
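Definition 2.7.1. Suppose that $f : \Omega \to \mathbb{R}$ is a function with second partial derivatives at a point $x_0 \in \Omega$. Then the matrix
\[ Hf(x_0) = \left( \frac{\partial^2 f}{\partial x_i \partial x_j}(x_0) \right)_{i,j} \]
is called the Hessian (or Hessian matrix) of $f$ at $x_0$.
Theorem 2.7.1 (Symmetry of the Hessian). Let $f : \Omega \to \mathbb{R}$ be a function that has continuous second order partial derivatives in $\Omega$. Then for $i, j = 1, \dots, N$,
\[ \frac{\partial^2 f}{\partial x_i \partial x_j} = \frac{\partial^2 f}{\partial x_j \partial x_i}. \]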
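Proof. Let $x_0 \in \Omega$ and choose $r > 0$ such that $B_{2r}(x_0) \subseteq \Omega$. Let $e_n$ be the $n$-th standard unit vector in $\mathbb{R}^N$. Let $h \in (0, r)$ and consider the two functions
\begin{align*}
g_1(s) &= f(x_0 + s e_i + h e_j) - f(x_0 + s e_i), \quad 0 \le s \le h, \\
g_2(t) &= f(x_0 + h e_i + t e_j) - f(x_0 + t e_j), \quad 0 \le t \le h.
\end{align*}
Note that
\[ g_1'(s) = \frac{\partial f}{\partial x_i}(x_0 + s e_i + h e_j) - \frac{\partial f}{\partial x_i}(x_0 + s e_i). \]
By the mean value theorem, there exists a number $\sigma_1 \in (0, h)$ such that $g_1(h) - g_1(0) = h g_1'(\sigma_1)$. That is,
\[ f(x_0 + h e_i + h e_j) - f(x_0 + h e_i) - f(x_0 + h e_j) + f(x_0) = h \left( \frac{\partial f}{\partial x_i}(x_0 + \sigma_1 e_i + h e_j) - \frac{\partial f}{\partial x_i}(x_0 + \sigma_1 e_i) \right). \]
Applying the mean value theorem to the function $t \mapsto \frac{\partial f}{\partial x_i}(x_0 + \sigma_1 e_i + t e_j)$, we find a number $\tau_1 \in (0, h)$ such that
\[ \frac{\partial f}{\partial x_i}(x_0 + \sigma_1 e_i + h e_j) - \frac{\partial f}{\partial x_i}(x_0 + \sigma_1 e_i) = h \frac{\partial^2 f}{\partial x_j \partial x_i}(x_0 + \sigma_1 e_i + \tau_1 e_j). \]
That is,
\[ \frac{1}{h^2} \left( f(x_0 + h e_i + h e_j) - f(x_0 + h e_i) - f(x_0 + h e_j) + f(x_0) \right) = \frac{\partial^2 f}{\partial x_j \partial x_i}(x_0 + \sigma_1 e_i + \tau_1 e_j). \]
Replacing $g_1$ with $g_2$ and applying the same arguments, we find $\sigma_2, \tau_2 \in (0, h)$ such that
\[ \frac{1}{h^2} \left( f(x_0 + h e_i + h e_j) - f(x_0 + h e_i) - f(x_0 + h e_j) + f(x_0) \right) = \frac{\partial^2 f}{\partial x_i \partial x_j}(x_0 + \sigma_2 e_i + \tau_2 e_j). \]
Hence
\[ \frac{\partial^2 f}{\partial x_j \partial x_i}(x_0 + \sigma_1 e_i + \tau_1 e_j) = \frac{\partial^2 f}{\partial x_i \partial x_j}(x_0 + \sigma_2 e_i + \tau_2 e_j). \tag{2.6} \]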
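Fix $\epsilon > 0$. By the continuity of the second partial derivatives, we can choose $h$ so small that we have
\[ \left| \frac{\partial^2 f}{\partial x_j \partial x_i}(x_0 + \sigma_1 e_i + \tau_1 e_j) - \frac{\partial^2 f}{\partial x_j \partial x_i}(x_0) \right| < \epsilon \]
and
\[ \left| \frac{\partial^2 f}{\partial x_i \partial x_j}(x_0 + \sigma_2 e_i + \tau_2 e_j) - \frac{\partial^2 f}{\partial x_i \partial x_j}(x_0) \right| < \epsilon. \]
Then (2.6) implies
\[ \left| \frac{\partial^2 f}{\partial x_j \partial x_i}(x_0) - \frac{\partial^2 f}{\partial x_i \partial x_j}(x_0) \right| < 2\epsilon. \]
Since $\epsilon$ was chosen arbitrarily, this concludes the proof.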
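The second difference quotient used in the proof of Theorem 2.7.1 can be evaluated numerically. The sketch below does this for $f(x, y) = x^2 y^3 + \sin(xy)$ and compares the quotient with the mixed second partial derivative computed by hand. The function, the point $(0.5, 1)$, and the step sizes are assumptions chosen for illustration.

```python
import math

def f(x, y):
    """Illustrative smooth function."""
    return x ** 2 * y ** 3 + math.sin(x * y)

def mixed_partial(x, y):
    """Both mixed second partial derivatives of f, computed by hand."""
    return 6 * x * y ** 2 + math.cos(x * y) - x * y * math.sin(x * y)

def second_difference(x, y, h):
    """(f(x0 + h e_i + h e_j) - f(x0 + h e_i) - f(x0 + h e_j) + f(x0)) / h^2."""
    return (f(x + h, y + h) - f(x + h, y) - f(x, y + h) + f(x, y)) / h ** 2

x0, y0 = 0.5, 1.0
for h in (1e-1, 1e-2, 1e-3):
    # The quotient approaches the mixed partial derivative as h decreases.
    print(h, second_difference(x0, y0, h), mixed_partial(x0, y0))
```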
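Notation. When working with higher order derivatives, we may have to keep track of many indices, and then it is convenient to use a multi-index notation.
Let $\alpha = (\alpha_1, \dots, \alpha_N) \in \mathbb{N}_0^N$, where $\mathbb{N}_0 = \mathbb{N} \cup \{0\}$. Then for $x \in \mathbb{R}^N$ we define
\[ x^\alpha = x_1^{\alpha_1} \cdots x_N^{\alpha_N}. \]
Moreover, we set
\[ |\alpha| = \alpha_1 + \dots + \alpha_N \quad \text{and} \quad \alpha! = \alpha_1! \cdots \alpha_N!. \]
If $f : \Omega \to \mathbb{R}$ is a function, then we define
\[ \frac{\partial^{|\alpha|} f}{\partial x^\alpha} = \frac{\partial^{|\alpha|} f}{\partial x_1^{\alpha_1} \dots \partial x_N^{\alpha_N}}, \]
provided that this partial derivative exists.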
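The multi-index conventions translate directly into code. The following minimal sketch computes $x^\alpha$, $|\alpha|$, and $\alpha!$ for tuples; the function names are assumptions made for illustration.

```python
import math

def multi_index_power(x, alpha):
    """x^alpha = x_1^alpha_1 * ... * x_N^alpha_N."""
    return math.prod(t ** a for t, a in zip(x, alpha))

def multi_index_order(alpha):
    """|alpha| = alpha_1 + ... + alpha_N."""
    return sum(alpha)

def multi_index_factorial(alpha):
    """alpha! = alpha_1! * ... * alpha_N!."""
    return math.prod(math.factorial(a) for a in alpha)

alpha = (2, 0, 1)
x = (3, 5, 2)
# x^alpha = 3^2 * 5^0 * 2^1, |alpha| = 2 + 0 + 1, alpha! = 2! * 0! * 1!.
print(multi_index_power(x, alpha), multi_index_order(alpha), multi_index_factorial(alpha))
```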
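Theorem 2.7.2 (Taylor's Theorem). Suppose that $f : \Omega \to \mathbb{R}$ has continuous partial derivatives up to order $m$ throughout $\Omega$. Let $x, y \in \Omega$ such that $[x, y] \subseteq \Omega$. Then there exists a number $\theta \in (0, 1)$ such that
\[
f(y) = \sum_{|\alpha| \le m-1} \frac{1}{\alpha!} \frac{\partial^{|\alpha|} f}{\partial x^\alpha}(x) (y - x)^\alpha + \sum_{|\alpha| = m} \frac{1}{\alpha!} \frac{\partial^m f}{\partial x^\alpha}((1 - \theta)x + \theta y) (y - x)^\alpha.
\]
Moreover,
\[
f(y) = \sum_{|\alpha| \le m} \frac{1}{\alpha!} \frac{\partial^{|\alpha|} f}{\partial x^\alpha}(x) (y - x)^\alpha + o(\|y - x\|^m)
\]
as $y \to x$.
Proof. This follows from Taylor's theorem in one variable, applied to the function $t \mapsto f((1 - t)x + ty)$. See Exercise 10.1 for the details.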
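To see the remainder estimate $o(\|y - x\|^m)$ at work, one can evaluate the Taylor polynomial for a function whose partial derivatives are known in closed form. The following Python sketch assumes the sample function $f(x_1, x_2) = e^{x_1} \sin(x_2)$, for which the derivative of order $\alpha = (\alpha_1, \alpha_2)$ is $e^{x_1} \sin(x_2 + \alpha_2 \pi / 2)$; the function, the base point and the direction are illustrative choices, not taken from the notes.

```python
import math
from itertools import product


def partial_derivative(alpha, x):
    # For the sample function f(x1, x2) = exp(x1) * sin(x2), the derivative
    # of order alpha = (a1, a2) is exp(x1) * sin(x2 + a2 * pi / 2).
    return math.exp(x[0]) * math.sin(x[1] + alpha[1] * math.pi / 2)


def f(x):
    return partial_derivative((0, 0), x)


def taylor_polynomial(x, y, m):
    """Sum over |alpha| <= m of (1/alpha!) * d^alpha f(x) * (y - x)^alpha."""
    d = (y[0] - x[0], y[1] - x[1])
    total = 0.0
    for alpha in product(range(m + 1), repeat=2):
        if sum(alpha) > m:
            continue
        alpha_factorial = math.factorial(alpha[0]) * math.factorial(alpha[1])
        power = d[0] ** alpha[0] * d[1] ** alpha[1]
        total += partial_derivative(alpha, x) * power / alpha_factorial
    return total


x = (0.5, 1.0)
m = 2
for t in (1e-1, 1e-2, 1e-3):
    y = (x[0] + t, x[1] - 2 * t)
    dist = math.hypot(y[0] - x[0], y[1] - x[1])
    remainder = f(y) - taylor_polynomial(x, y, m)
    # The ratio remainder / dist**m tends to 0 as y approaches x.
    print(t, remainder / dist ** m)
```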
Recall that given a real $(N \times N)$-matrix $A$, we have a quadratic form $\mathbb{R}^N \to \mathbb{R}$, $x \mapsto \langle x, Ax \rangle$. We say that $A$ is
- positive definite if $\langle x, Ax \rangle > 0$ for all $x \in \mathbb{R}^N \setminus \{0\}$,
- negative definite if $\langle x, Ax \rangle < 0$ for all $x \in \mathbb{R}^N \setminus \{0\}$, and
- indefinite if there exist two points $x_-, x_+ \in \mathbb{R}^N$ with $\langle x_-, Ax_- \rangle < 0$ and $\langle x_+, Ax_+ \rangle > 0$.
For a symmetric matrix (i.e., a matrix with $A^T = A$), positive definite means that all eigenvalues are positive, negative definite means that all eigenvalues are negative, and indefinite means that there are positive and negative eigenvalues.
Corollary 2.7.1 (Second Derivative Test). Let $f : \Omega \to \mathbb{R}$ be a function with continuous partial derivatives up to second order. Let $x_0 \in \Omega$ be a critical point of $f$.
(i) If $Hf(x_0)$ is positive definite, then $x_0$ is a local minimum point of $f$.
(ii) If $Hf(x_0)$ is negative definite, then $x_0$ is a local maximum point of $f$.
(iii) If $Hf(x_0)$ is indefinite, then $x_0$ is a saddle point of $f$.
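Proof. By Taylor's theorem, and because the first order partial derivatives of $f$ vanish at the critical point $x_0$, we can write
\[
f(x) = f(x_0) + \frac{1}{2} \langle x - x_0, Hf(x_0)(x - x_0) \rangle + R(x)
\]
for a function $R : \Omega \to \mathbb{R}$ with
\[
\lim_{x \to x_0} \frac{R(x)}{\|x - x_0\|^2} = 0.
\]
(i) If $Hf(x_0)$ is positive definite, then all of its eigenvalues are positive. Let $\lambda_0$ be the smallest eigenvalue. Then we have
\[
\langle x - x_0, Hf(x_0)(x - x_0) \rangle \ge \lambda_0 \|x - x_0\|^2
\]
for all $x \in \mathbb{R}^N$. So
\[
f(x) \ge f(x_0) + \|x - x_0\|^2 \left( \frac{\lambda_0}{2} + \frac{R(x)}{\|x - x_0\|^2} \right)
\]
for all $x \in \Omega \setminus \{x_0\}$. If $\|x - x_0\|$ is sufficiently small, then the expression in brackets is positive and it follows that $f(x) \ge f(x_0)$.
(ii) This is proved analogously.
(iii) If $Hf(x_0)$ is indefinite, then it has a positive eigenvalue $\lambda_+$ and a negative eigenvalue $\lambda_-$. Let $u_+$ and $u_-$, respectively, be corresponding eigenvectors of unit length. Then we have
\[
f(x_0 + t u_+) = f(x_0) + t^2 \left( \frac{\lambda_+}{2} + \frac{R(x_0 + t u_+)}{t^2} \right)
\]
for any $t \in \mathbb{R} \setminus \{0\}$ such that $x_0 + t u_+ \in \Omega$. It follows that $f(x_0 + t u_+) \ge f(x_0)$ whenever $|t|$ is sufficiently small. Similarly, using $u_-$ and $\lambda_-$, we see that $f(x_0 + t u_-) \le f(x_0)$ whenever $|t|$ is sufficiently small.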
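In two variables the test above reduces to inspecting the two eigenvalues of a symmetric $2 \times 2$ matrix, which have a closed form. The following Python sketch illustrates this; the function name `classify_critical_point` and the sample Hessians are hypothetical, and the case of a zero eigenvalue is reported as inconclusive because the corollary makes no statement about it.

```python
import math


def classify_critical_point(hxx, hxy, hyy):
    """Classify a critical point from the symmetric Hessian [[hxx, hxy], [hxy, hyy]]."""
    # Eigenvalues of a symmetric 2x2 matrix: mean of the diagonal plus/minus a radius.
    mean = (hxx + hyy) / 2.0
    radius = math.hypot((hxx - hyy) / 2.0, hxy)
    smallest, largest = mean - radius, mean + radius
    if smallest > 0:
        return "local minimum"      # positive definite
    if largest < 0:
        return "local maximum"      # negative definite
    if smallest < 0 < largest:
        return "saddle point"       # indefinite
    return "inconclusive"           # a zero eigenvalue: the test does not apply


# Hessians at the origin of some sample functions (chosen for illustration).
print(classify_critical_point(2.0, 0.0, 2.0))    # x^2 + y^2: local minimum
print(classify_critical_point(2.0, 0.0, -2.0))   # x^2 - y^2: saddle point
print(classify_critical_point(-2.0, 1.0, -6.0))  # -x^2 + xy - 3y^2: local maximum
print(classify_critical_point(0.0, 0.0, 0.0))    # x^4 + y^4: inconclusive
```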
2.8 The Implicit Function Theorem
We now consider functions in two variables. That is, from now on, we have $N = 2$ and we consider an open set $\Omega \subseteq \mathbb{R}^2$. This is merely to avoid technical complications, and the results do have counterparts in higher dimensions. In two dimensions, it is convenient to denote the coordinates by $(x, y)^T$ rather than $(x_1, x_2)^T$.
This section is about the observation that for a reasonably regular function $f : \Omega \to \mathbb{R}$, the solutions of the equation
\[
f(x) = 0
\]
typically form a curve in $\Omega$. For example, if $f(x) = x_1^2 + x_2^2 - r^2$ with $r \in \mathbb{R}$, then we have a circle of radius $|r|$, except for $r = 0$, where we have a single point that solves the equation.
We may want to use an equation like this in order to define a specific
curve in R2 . But if we want to be certain that we actually obtain a curve, we
have to worry about degenerate cases like the case r = 0 above. Fortunately,
we can give conditions that guarantee not only that we have a curve, but
even that we locally have the graph of a function. This function is implicitly
defined through the equation.
Theorem 2.8.1 (Implicit Function Theorem). Suppose that the function $f : \Omega \to \mathbb{R}$ is continuously differentiable in $\Omega$. Let $(x_0, y_0)^T \in \Omega$ be a point with $f(x_0, y_0) = 0$ and $\frac{\partial f}{\partial y}(x_0, y_0) \ne 0$. Then there exist two numbers $r, s > 0$ such that $(x_0 - r, x_0 + r) \times (y_0 - s, y_0 + s) \subseteq \Omega$ and there exists a unique function $g : (x_0 - r, x_0 + r) \to (y_0 - s, y_0 + s)$ with $f(x, g(x)) = 0$ for all $x \in (x_0 - r, x_0 + r)$. Furthermore, this function $g$ is differentiable at $x_0$ with
\[
g'(x_0) = -\frac{\frac{\partial f}{\partial x}(x_0, y_0)}{\frac{\partial f}{\partial y}(x_0, y_0)}.
\]
Remark 2.8.1. The uniqueness of $g$ implies that $g(x_0) = y_0$. Once we have determined the function $g$, we can apply the theorem for any point $(x, g(x))$ instead of $(x_0, y_0)$ and it follows that $g$ is differentiable with
\[
g'(x) = -\frac{\frac{\partial f}{\partial x}(x, g(x))}{\frac{\partial f}{\partial y}(x, g(x))}
\]
for every $x \in (x_0 - r, x_0 + r)$ such that the denominator does not vanish (which is the case at least sufficiently close to $x_0$ by the continuity). The right-hand side is continuous, so $g$ is continuously differentiable.
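Proof. Without loss of generality we may assume that $x_0 = 0$ and $y_0 = 0$, because otherwise we may apply a translation in $\mathbb{R}^2$. Furthermore, we may assume without loss of generality that $\frac{\partial f}{\partial y}(0, 0) > 0$; otherwise we may replace $f$ by $-f$.
Step 1: construct $g$. Define $\delta = \frac{1}{2} \frac{\partial f}{\partial y}(0, 0)$. By the continuity of the partial derivatives, there exist $r, s > 0$ such that $[-r, r] \times [-s, s] \subseteq \Omega$ and $\frac{\partial f}{\partial y} > \delta$ on $(-r, r) \times (-s, s)$. We may further suppose that $r$ is chosen so small that $|f(x, 0)| < \delta s$ for all $x \in (-r, r)$ because of the continuity of $f(\cdot, 0)$ and the fact that $f(0, 0) = 0$.
Fix $x \in (-r, r)$ and consider the function $f(x, \cdot)$, which is continuous on $[-s, s]$ and continuously differentiable in $(-s, s)$ with derivative $\frac{\partial f}{\partial y}(x, \cdot) > \delta$. Hence the function is strictly increasing in $[-s, s]$. Moreover, we have
\[
f(x, -s) = f(x, 0) - \int_{-s}^{0} \frac{\partial f}{\partial y}(x, \eta) \, d\eta < 0
\]
and
\[
f(x, s) = f(x, 0) + \int_{0}^{s} \frac{\partial f}{\partial y}(x, \eta) \, d\eta > 0.
\]
By the intermediate value theorem and the strict monotonicity, we conclude that there exists a unique number $y \in (-s, s)$ such that $f(x, y) = 0$. We define $g(x) = y$.
Step 2: $g$ is continuous at 0. We claim that there exists a constant $K$ such that
\[
|g(x)| \le K|x| \tag{2.7}
\]
for all $x \in (-r, r)$. In order to prove this, fix $x$ and note that
\[
0 = f(x, g(x)) = f(x, g(x)) - f(0, g(x)) + f(0, g(x)) - f(0, 0).
\]
By the mean value theorem, there exist $\sigma, \tau \in (0, 1)$ with
\[
f(x, g(x)) - f(0, g(x)) = x \frac{\partial f}{\partial x}(\sigma x, g(x))
\]
and
\[
f(0, g(x)) - f(0, 0) = g(x) \frac{\partial f}{\partial y}(0, \tau g(x)).
\]
So
\[
0 = x \frac{\partial f}{\partial x}(\sigma x, g(x)) + g(x) \frac{\partial f}{\partial y}(0, \tau g(x)).
\]
Recall that $\frac{\partial f}{\partial y} > \delta$ in $(-r, r) \times (-s, s)$. Furthermore, since $\frac{\partial f}{\partial x}$ is continuous on $[-r, r] \times [-s, s]$, the Weierstrass extreme value theorem implies that there exists a number $C \ge 0$ such that
\[
\left| \frac{\partial f}{\partial x}(\tilde{x}, \tilde{y}) \right| \le C
\]
for all $\tilde{x} \in [-r, r]$ and all $\tilde{y} \in [-s, s]$. Hence
\[
|g(x)| = \frac{\left| \frac{\partial f}{\partial x}(\sigma x, g(x)) \right|}{\frac{\partial f}{\partial y}(0, \tau g(x))} |x| \le \frac{C}{\delta} |x|.
\]
Choosing $K = C/\delta$, we obtain (2.7).
Step 3: $g$ is differentiable at 0. Next we want to show that $g$ is differentiable at 0 with the derivative given in the statement of the theorem. To this end, note that by the differentiability of $f$, we have
\[
f(x, g(x)) = f(0, 0) + Df(0, 0)(x, g(x))^T + R(x)
\]
for a function $R : (-r, r) \to \mathbb{R}$ with
\[
\lim_{x \to 0} \frac{R(x)}{\sqrt{x^2 + (g(x))^2}} = 0.
\]
But $f(0, 0) = f(x, g(x)) = 0$. So we have
\[
0 = x \frac{\partial f}{\partial x}(0, 0) + g(x) \frac{\partial f}{\partial y}(0, 0) + R(x).
\]
Thus we obtain, for $x \ne 0$,
\[
\frac{g(x)}{x} = -\frac{\frac{\partial f}{\partial x}(0, 0)}{\frac{\partial f}{\partial y}(0, 0)} - \frac{R(x)}{x \frac{\partial f}{\partial y}(0, 0)}.
\]
Moreover, we have
\[
\left| \frac{R(x)}{x \frac{\partial f}{\partial y}(0, 0)} \right|
= \frac{|R(x)| \sqrt{1 + \frac{(g(x))^2}{x^2}}}{|x| \sqrt{1 + \frac{(g(x))^2}{x^2}} \, \frac{\partial f}{\partial y}(0, 0)}
\le \frac{|R(x)|}{\sqrt{x^2 + (g(x))^2}} \cdot \frac{\sqrt{1 + K^2}}{\frac{\partial f}{\partial y}(0, 0)} \to 0
\]
as $x \to 0$ by (2.7). Hence, since $g(0) = 0$,
\[
g'(0) = \lim_{x \to 0} \frac{g(x)}{x} = -\frac{\frac{\partial f}{\partial x}(0, 0)}{\frac{\partial f}{\partial y}(0, 0)}.
\]
This completes the proof.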
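Step 1 of the proof is constructive: for fixed $x$, the value $g(x)$ is the unique zero of a strictly increasing function and can be located by bisection. The following Python sketch does this for the circle example $f(x, y) = x^2 + y^2 - 1$ near the point $(0, 1)^T$ and compares the slope of $g$ with the formula from the theorem. The choice of function, the interval $[0.5, 1.5]$ for $y$, and the tolerance are assumptions made for this illustration.

```python
import math


def f(x, y):
    return x * x + y * y - 1.0


def df_dx(x, y):
    return 2.0 * x


def df_dy(x, y):
    return 2.0 * y


def implicit_g(x, y_low=0.5, y_high=1.5, tol=1e-13):
    """Find the unique y in (y_low, y_high) with f(x, y) = 0 by bisection.

    This relies on f(x, .) being strictly increasing on [y_low, y_high]
    with f(x, y_low) < 0 < f(x, y_high), as in Step 1 of the proof.
    """
    if not (f(x, y_low) < 0.0 < f(x, y_high)):
        raise ValueError("sign condition of Step 1 is violated for this x")
    while y_high - y_low > tol:
        mid = 0.5 * (y_low + y_high)
        if f(x, mid) < 0.0:
            y_low = mid
        else:
            y_high = mid
    return 0.5 * (y_low + y_high)


x = 0.3
y = implicit_g(x)
# Derivative from the theorem: g'(x) = -(df/dx) / (df/dy) at (x, g(x)).
slope_formula = -df_dx(x, y) / df_dy(x, y)
# Central difference quotient of the numerically constructed g.
h = 1e-5
slope_numeric = (implicit_g(x + h) - implicit_g(x - h)) / (2.0 * h)
print(y, math.sqrt(1.0 - x * x))
print(slope_formula, slope_numeric)
```

For this particular $f$ the implicit function is known explicitly, $g(x) = \sqrt{1 - x^2}$, which is why the first printed pair can be compared directly.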
2.9 The Lagrange Multiplier Rule
This is another statement about maxima or minima of a given function, but now we look at the problem of finding extrema relative to a side condition.
Theorem 2.9.1 (Lagrange Multiplier Rule). Let $f : \Omega \to \mathbb{R}$ and $g : \Omega \to \mathbb{R}$ be continuously differentiable functions and let
\[
S = \{ (x, y)^T \in \Omega : g(x, y) = 0 \}.
\]
Suppose that $(x_0, y_0)^T \in S$ is a point such that
\[
f(x_0, y_0) \le f(x, y)
\]
for all $(x, y)^T \in S$. If $\nabla g(x_0, y_0) \ne 0$, then there exists a number $\lambda \in \mathbb{R}$ such that
\[
\nabla f(x_0, y_0) = \lambda \nabla g(x_0, y_0).
\]
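Proof. Since $\nabla g(x_0, y_0) \ne 0$, we have either
\[
\frac{\partial g}{\partial x}(x_0, y_0) \ne 0 \quad \text{or} \quad \frac{\partial g}{\partial y}(x_0, y_0) \ne 0.
\]
If $\frac{\partial g}{\partial y}(x_0, y_0) \ne 0$, then we apply the implicit function theorem to find $r, s > 0$ with $[x_0 - r, x_0 + r] \times [y_0 - s, y_0 + s] \subseteq \Omega$ and a continuously differentiable function $\varphi : (x_0 - r, x_0 + r) \to (y_0 - s, y_0 + s)$ such that $g(x, \varphi(x)) = 0$ for all $x \in (x_0 - r, x_0 + r)$. Moreover, we have
\[
\varphi'(x_0) = -\frac{\frac{\partial g}{\partial x}(x_0, y_0)}{\frac{\partial g}{\partial y}(x_0, y_0)}.
\]
Now consider the function $h : (x_0 - r, x_0 + r) \to \mathbb{R}$ with $h(x) = f(x, \varphi(x))$. Since $\varphi(x_0) = y_0$ and $(x, \varphi(x))^T \in S$ for all $x$, it has a minimum at $x_0$, and so by the chain rule,
\[
0 = h'(x_0) = \frac{\partial f}{\partial x}(x_0, y_0) + \frac{\partial f}{\partial y}(x_0, y_0) \varphi'(x_0)
= \frac{\partial f}{\partial x}(x_0, y_0) - \frac{\partial f}{\partial y}(x_0, y_0) \frac{\frac{\partial g}{\partial x}(x_0, y_0)}{\frac{\partial g}{\partial y}(x_0, y_0)}.
\]
Hence, setting
\[
\lambda := \frac{\frac{\partial f}{\partial y}(x_0, y_0)}{\frac{\partial g}{\partial y}(x_0, y_0)},
\]
we have
\[
\frac{\partial f}{\partial x}(x_0, y_0) = \lambda \frac{\partial g}{\partial x}(x_0, y_0)
\]
and
\[
\frac{\partial f}{\partial y}(x_0, y_0) = \lambda \frac{\partial g}{\partial y}(x_0, y_0),
\]
which is another way of writing $\nabla f(x_0, y_0) = \lambda \nabla g(x_0, y_0)$.
If $\frac{\partial g}{\partial x}(x_0, y_0) \ne 0$, then we use the same arguments with the roles of the coordinates $x$ and $y$ exchanged.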
Remark 2.9.1. Theorem 2.9.1 is about a minimum point of $f$ relative to $S$. If we apply the result to $-f$ instead, then we obtain the corresponding statement for maximum points relative to $S$ as well.
Recall that we have a geometric interpretation of $\nabla f$ as a vector in the direction of steepest ascent. Furthermore, we know that $\nabla g(x_0, y_0)$ is perpendicular to the level set $S$ at the point $(x_0, y_0)^T$. So the theorem says that at a minimum point relative to $S$, the direction of steepest ascent of $f$ will be perpendicular to $S$ (cf. Fig. 2.9.1).
Example 2.9.1. Consider the function $f : \mathbb{R}^2 \to \mathbb{R}$ with $f(x, y) = x^2 + y^2$ for $(x, y)^T \in \mathbb{R}^2$. Find a minimum of $f$ relative to the line
\[
\{ (x, y)^T \in \mathbb{R}^2 : 2x + y = 5 \}.
\]
Figure 2.9.1: The gradient $\nabla f(x_0)$ is perpendicular to the level set of $g$.
We use the function $g : \mathbb{R}^2 \to \mathbb{R}$ with $g(x, y) = 2x + y - 5$. We compute
\[
\nabla f(x, y) = \begin{pmatrix} 2x \\ 2y \end{pmatrix} \quad \text{and} \quad \nabla g(x, y) = \begin{pmatrix} 2 \\ 1 \end{pmatrix}.
\]
Hence we look for three numbers $x_0, y_0, \lambda \in \mathbb{R}$ with
\[
2x_0 = 2\lambda, \qquad 2y_0 = \lambda, \qquad 2x_0 + y_0 - 5 = 0.
\]
This system is easy to solve: we have $\lambda = 2$, $x_0 = 2$, and $y_0 = 1$. So we have only one candidate for a minimum point: $(2, 1)^T$.
The question is now whether there exists a minimum relative to the line at all. In order to give an answer, we observe that
\[
f(2, 1) = 5 < 9 = \inf \left\{ f(x, y) : (x, y)^T \in \mathbb{R}^2, \ \sqrt{x^2 + y^2} > 3 \right\}.
\]
Hence everything outside of the set $\left\{ (x, y)^T \in \mathbb{R}^2 : \sqrt{x^2 + y^2} \le 3 \right\}$ is irrelevant for the minimisation problem. So we can define
\[
A = \left\{ (x, y)^T \in \mathbb{R}^2 : g(x, y) = 0 \text{ and } \sqrt{x^2 + y^2} \le 3 \right\}
\]
and minimise in $A$ instead. This is a closed and bounded set, and a minimiser exists by the Weierstrass extreme value theorem (Theorem 2.4.3). This point must be $(2, 1)^T$ by the previous observations.
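The computation in this example can be double-checked numerically. The Python sketch below verifies that the candidate $(2, 1)^T$ with $\lambda = 2$ satisfies the three equations, and then samples the line through the parametrisation $x \mapsto (x, 5 - 2x)$ to confirm that no sampled point has a smaller value of $f$. The sampling range and step size are arbitrary choices made for illustration, and sampling is of course no substitute for the existence argument given above.

```python
def f(x, y):
    return x * x + y * y


def g(x, y):
    return 2.0 * x + y - 5.0


def grad_f(x, y):
    return (2.0 * x, 2.0 * y)


def grad_g(x, y):
    return (2.0, 1.0)


# Candidate from the example: (x0, y0) = (2, 1) with multiplier lambda = 2.
x0, y0, lam = 2.0, 1.0, 2.0

# Residuals of the three equations; all should vanish.
residuals = (
    grad_f(x0, y0)[0] - lam * grad_g(x0, y0)[0],
    grad_f(x0, y0)[1] - lam * grad_g(x0, y0)[1],
    g(x0, y0),
)
print(residuals)

# Sample the line via the parametrisation x -> (x, 5 - 2x) for x in [-3, 5].
xs = [-3.0 + 0.01 * k for k in range(801)]
samples = [(x, 5.0 - 2.0 * x) for x in xs]
best = min(samples, key=lambda p: f(*p))
print(best, f(*best))
```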
Index
n-th interval, 5
Bolzano-Weierstrass theorem, 36
bounded, 36
Cauchy-Schwarz inequality, 34
chain rule, 48
closed, 37
compact, 38
continuity theorem, 23
continuous, 38
continuous at a point, 38
continuously differentiable, 47
convergence, 35, 38
critical point, 50
directional derivative, 45
equivalent, 41
Euclidean inner product, 33
Euclidean norm, 33
first fundamental theorem of calculus, 23
Fréchet derivative, 43
Fréchet differentiable, 43
fundamental theorem of calculus, 23
gradient, 45
Hessian, 51
implicit function theorem, 55
improper integral, 30
integral test, 31
integration by parts, 26
integration by substitution, 26
Jacobi matrix, 45
Lagrange multiplier rule, 57
length, 7
local maximum point, 50
local minimum point, 50
lower Riemann integral, 7
lower Riemann sum, 7
mean value inequality, 49
mesh, 7
multi-index notation, 53
norm, 40
open, 37
open ball, 37
operator norm, 42
oscillation, 9
partial derivative, 45
partition, 5
primitive, 25
refinement, 9
Riemann integrable, 7
Riemann integral, 7
Riemann sum, 15
saddle point, 50
second derivative test, 54
second fundamental theorem of calculus, 23
stationary point, 50
subdivision, 5
symmetry of the Hessian, 51
tagged subdivision, 15
Taylor's theorem, 53
triangle inequality, 34
uniformly continuous, 38
upper Riemann integral, 7
upper Riemann sum, 7
Weierstrass extreme value theorem, 39
