
Undergraduate Texts in Mathematics

Editors
F. W. Gehring
P. R. Halmos
Advisory Board
C. DePrima
I. Herstein
J. Kiefer
Wendell Fleming

Functions of
Several Variables
2nd Edition

Springer-Verlag
New York Heidelberg Berlin
Wendell Fleming
Brown University
Department of Mathematics
Providence, Rhode Island 02912

Editorial Board

F. W. Gehring
University of Michigan
Department of Mathematics
Ann Arbor, Michigan 48104

P. R. Halmos
University of California
Department of Mathematics
Santa Barbara, California 93106

AMS Subject Classifications: 26-01, 28-01, 58-01

Library of Congress Cataloging in Publication Data

Fleming, Wendell Helms, 1928-
Functions of several variables.
(Undergraduate texts in mathematics)
Bibliography: p.
Includes index.
1. Functions of several real variables.
I. Title
QA331.F63 1977 515'.84 76-40029

All rights reserved.

No part of this book may be translated or reproduced in
any form without written permission from Springer-Verlag.

©1965 by Wendell Fleming
©1977 by Springer-Verlag, New York Inc.
Softcover reprint of the hardcover 2nd edition 1977

ISBN 978-1-4684-9463-1 ISBN 978-1-4684-9461-7 (eBook)


DOI 10.1007/978-1-4684-9461-7
To Flo
Preface

The purpose of this book is to give a systematic development of differential
and integral calculus for functions of several variables. The traditional topics
from advanced calculus are included: maxima and minima, chain rule,
implicit function theorem, multiple integrals, divergence and Stokes's
theorems, and so on. However, the treatment differs in several important
respects from the traditional one. Vector notation is used throughout, and
the distinction is maintained between n-dimensional euclidean space E^n and
its dual. The elements of the Lebesgue theory of integrals are given. In
place of the traditional vector analysis in E^3, we introduce exterior algebra
and the calculus of exterior differential forms. The formulas of vector
analysis then become special cases of formulas about differential forms and
integrals over manifolds lying in E^n.
The book is suitable for a one-year course at the advanced undergraduate
level. By omitting certain chapters, a one-semester course can be based on it.
For instance, if the students already have a good knowledge of partial
differentiation and the elementary topology of E^n, then substantial parts of
Chapters 4, 5, 7, and 8 can be covered in a semester. Some knowledge of
linear algebra is presumed. However, results from linear algebra are reviewed
as needed (in some cases without proof).
A number of changes have been made in the first edition. Many of these
were suggested by classroom experience. A new Chapter 2 on elementary
topology has been added. Additional physical applications (to thermodynamics
and classical mechanics) have been added in Chapters 6 and 8.
Different proofs, perhaps easier for the beginner, have been given for two
main theorems (the Inverse Function Theorem and the Divergence Theorem).


The author is indebted to many colleagues and students at Brown University
for their valuable suggestions. Particular thanks are due Hildegarde
Kneisel, Scott Shenker, and Joseph Silverman for their excellent help in
preparing this edition.

Wendell H. Fleming

Providence, Rhode Island


June, 1976

Contents

Chapter 1
Euclidean spaces
1.1 The real number system
1.2 Euclidean E^n
1.3 Elementary geometry of E^n
1.4 Basic topological notions in E^n
*1.5 Convex sets

Chapter 2
Elementary topology of E^n
2.1 Functions
2.2 Limits and continuity of transformations
2.3 Sequences in E^n
2.4 Bolzano-Weierstrass theorem
2.5 Relative neighborhoods, continuous transformations
2.6 Topological spaces
2.7 Connectedness
2.8 Compactness
2.9 Metric spaces
2.10 Spaces of continuous functions
*2.11 Noneuclidean norms on E^n

Chapter 3
Differentiation of real-valued functions
3.1 Directional and partial derivatives
3.2 Linear functions
3.3 Differentiable functions
3.4 Functions of class C^(q)
3.5 Relative extrema
*3.6 Convex and concave functions

Chapter 4
Vector-valued functions of several variables
4.1 Linear transformations
4.2 Affine transformations
4.3 Differentiable transformations
4.4 Composition
4.5 The inverse function theorem
4.6 The implicit function theorem
4.7 Manifolds
4.8 The multiplier rule

Chapter 5
Integration
5.1 Intervals
5.2 Measure
5.3 Integrals over E^n
5.4 Integrals over bounded sets
5.5 Iterated integrals
5.6 Integrals of continuous functions
5.7 Change of measure under affine transformations
5.8 Transformation of integrals
5.9 Coordinate systems in E^n
5.10 Measurable sets and functions; further properties
5.11 Integrals: general definition, convergence theorems
5.12 Differentiation under the integral sign
5.13 L^p-spaces

Chapter 6
Curves and line integrals
6.1 Derivatives
6.2 Curves in E^n
6.3 Differential 1-forms
6.4 Line integrals
*6.5 Gradient method
*6.6 Integrating factors; thermal systems

Chapter 7
Exterior algebra and differential calculus
7.1 Covectors and differential forms of degree 2
7.2 Alternating multilinear functions
7.3 Multicovectors
7.4 Differential forms
7.5 Multivectors
7.6 Induced linear transformations
7.7 Transformation law for differential forms
7.8 The adjoint and codifferential
*7.9 Special results for n = 3
*7.10 Integrating factors (continued)

Chapter 8
Integration on manifolds
8.1 Regular transformations
8.2 Coordinate systems on manifolds
8.3 Measure and integration on manifolds
8.4 The divergence theorem
*8.5 Fluid flow
8.6 Orientations
8.7 Integrals of r-forms
8.8 Stokes's formula
8.9 Regular transformations on submanifolds
8.10 Closed and exact differential forms
8.11 Motion of a particle
8.12 Motion of several particles

Appendix 1
Axioms for a vector space
Appendix 2
Mean value theorem; Taylor's theorem
Appendix 3
Review of Riemann integration
Appendix 4
Monotone functions

References
Answers to problems
Index
1
Euclidean spaces

This book is concerned with the differential and integral calculus of functions
of several variables. For this purpose one needs first to know some basic
properties of euclidean space of arbitrary finite dimension n. We begin
this chapter with a brief review of the real numbers and the elements of
vector algebra and geometry of such spaces. Later in the chapter the concepts
of neighborhood, open set, and closed set are introduced. These constitute
the basis for studying what are called topological properties of n-dimensional
space.
Format
The word "Theorem" has been reserved for what the author considers
the most important results. Results of lesser depth or interest are labeled
"Proposition." The symbol □ indicates the end of the proof of a theorem or
proposition. Occasionally part of a proof is left to the reader as a homework
exercise. The sections marked with an asterisk (*) may be omitted without
disrupting the organization. References are given at the end of the book.
We presume that the reader is acquainted with the most elementary aspects
of set theory. The symbols
∈, ∉, ∪, ∩, -, ⊂

stand, respectively, for is an element of, is not an element of, union, intersection,
difference, and inclusion. Sets ordinarily are denoted by capital italicized
letters. A set is described either by listing its elements or by some property
characterizing them. Thus {2, 5, 7} is the set whose elements are the three
numbers 2, 5, and 7. If S is a set and π a property pertaining to elements of S,
then {p ∈ S: π} denotes the set of all p ∈ S with property π. For example,
if Z = {1, 2, ...} is the set of natural numbers, then S = {x ∈ Z: x = 2y - 1
for some y ∈ Z} is the set of odd natural numbers. The set {x ∈ Z: x^2 = 3}
is the empty set. The set {x ∈ Z: x(x - 1) = x^2 - x} is all of Z.
When the set S in question is clear from the context, we abbreviate by
writing simply {p: π}.

1.1 The real number system


While calculus has been motivated in large part by problems from geometry
and physics, its foundations rest upon the idea of number. Therefore a
thorough treatment of calculus should begin with a study of the real numbers.
The real number system satisfies axioms about arithmetic and order, which
express properties of numbers with which everyone is familiar from elementary
mathematics.
We list these properties as Axioms I and II.

Axiom I
(a) Any two real numbers have a sum x + y and a product xy, which are
also real numbers. Moreover,
Commutative law x + y = y + x, xy = yx,
Associative law x + (y + z) = (x + y) + z, x(yz) = (xy)z,
Distributive law x(y + z) = xy + xz
for every x, y, and z.
(b) There are two (distinct) real numbers 0 and 1, which are identity
elements under addition and multiplication, respectively:

x + 0 = x, x1 = x
for every x.
(c) Every real number x has an inverse -x with respect to addition, and
if x ≠ 0, an inverse x^{-1} with respect to multiplication:
x + (-x) = 0, x x^{-1} = 1.

Axiom II. There is a relation < between real numbers such that:
(a) For every pair of numbers x and y, exactly one of the following
alternatives holds: x < y, x = y, y < x.
(b) w < x and x < y imply w < y (transitive law).
(c) x < y implies x + z < y + z for every z.
(d) x < y implies xz < yz whenever 0 < z.
From Axioms I and II follow all of the ordinary laws of arithmetic. In
algebra any set with two operations (usually called addition and multiplication)
having the properties listed in Axiom I is called a field. A field is called
ordered if there is in it a relation < satisfying Axiom II.


The real numbers form an ordered field. However, this is by no means the
only ordered field. For example, the rational numbers also form an ordered
field. We recall that x rational means that x = p/q, where p and q are integers
and q ≠ 0. Yet another axiom is needed to characterize the real number
system. This axiom can be introduced in several ways. Perhaps the simplest
of these is Axiom III, stated below in terms of least upper bounds.
In Section 2.3 we state other axioms that turn out to be equivalent to
Axiom III. We should warn the reader that Axiom III is more subtle than
Axioms I and II, and that one becomes aware only gradually of its implications.
However, this axiom is the foundation stone for some of the most
important theorems in calculus.
Let S be a nonempty set of real numbers. If there is a number c such that
x ≤ c for every x ∈ S, then c is called an upper bound for S. If c is an upper
bound for S and b ≥ c, then b is also an upper bound for S.

Axiom III. Any set S of real numbers that has an upper bound has a least
upper bound.

The least upper bound for S is denoted by sup S. If S has no upper bound,
then we set sup S = +∞.
A number d is a lower bound for S if d ≤ x for every x ∈ S. If S has a
lower bound, then (Problem 2) S has a greatest lower bound. It is denoted
by inf S. If S has no lower bound, then we set inf S = -∞.

EXAMPLE 1. Let S = {1, 2, 3, ...}, the set of positive integers. Then sup S =
+∞ and inf S = 1.

EXAMPLE 2. Let a and b be real numbers with a < b. The sets


[a, b] = {x: a ≤ x ≤ b}, (a, b) = {x: a < x < b},
[a, b) = {x: a ≤ x < b}, (a, b] = {x: a < x ≤ b}
are called finite intervals with endpoints a and b. The first of these intervals
is called closed, the second open, and the last two, half-open. In each instance
b is the least upper bound and a is the greatest lower bound.
In the same way, the semiinfinite intervals
[a, ∞) = {x: x ≥ a}, (a, ∞) = {x: x > a}

are called closed and open, respectively, and have a as greatest lower bound.
The corresponding intervals (-∞, b], (-∞, b) have b as least upper bound.

Let S be a set that has an upper bound. Example 2 shows that the number
sup S need not belong to S. If sup S does happen to be an element of S, then
it is the largest element of S and we write "max S" instead of "sup S."
Similarly, if S is bounded below and inf S is an element of S, then we write for
it "min S."


EXAMPLE 3. Let S = {x: x^2 < 2 and x is a rational number}. Then √2 =
sup S and -√2 = inf S. Since √2 is not a rational number, this example
shows that the least upper bound axiom would no longer hold if we replaced
the real number system by the rational number system.
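
The situation in Example 3 can be explored with exact rational arithmetic. The following is a minimal Python sketch, not a construction from the text: the function name `bracket_sup` and the starting bracket [1, 2] are illustrative assumptions. It bisects to produce rationals lo in S and hi an upper bound for S that squeeze sup S = √2, while no rational ever lands on it.

```python
from fractions import Fraction

def bracket_sup(steps):
    """Bracket sup S for S = {x rational : x^2 < 2} by bisection.

    Returns rationals (lo, hi) with lo in S and hi an upper bound for S.
    """
    lo, hi = Fraction(1), Fraction(2)   # 1 is in S; 2 is an upper bound for S
    for _ in range(steps):
        mid = (lo + hi) / 2
        if mid * mid < 2:
            lo = mid                    # mid belongs to S
        else:
            hi = mid                    # mid is a positive number with mid^2 > 2
    return lo, hi

lo, hi = bracket_sup(20)
print(float(lo), float(hi))             # both are close to the square root of 2
```

Each step halves the gap hi - lo, so the bracket narrows to any desired width, but its endpoints stay rational and therefore never equal sup S.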

EXAMPLE 4. Let S = {sin x: x ∈ [-π, π]}. Then -1 = min S, 1 = max S.

The real number system also satisfies the archimedean property. This means
that for every ε > 0, x > 0 there exists a positive integer m such that x < mε.
To prove it, suppose to the contrary that for some pair ε, x of positive
numbers, mε ≤ x for every m = 1, 2, .... Then x is an upper bound for the
set S = {ε, 2ε, 3ε, ...}. Let c = sup S. Then (m + 1)ε ≤ c and therefore
mε ≤ c - ε, for each m = 1, 2, .... Hence c - ε is an upper bound for S
smaller than sup S, a contradiction. This proves the archimedean property.
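
The proof above is by contradiction, but for concrete numbers a suitable integer m is easy to exhibit. Here is a small Python sketch under the assumption that both numbers are given as exact rationals; the function name `archimedean_witness` and the parameter name `eps` are illustrative, not from the text.

```python
from fractions import Fraction
import math

def archimedean_witness(x, eps):
    """Return a positive integer m with x < m * eps, for x > 0 and eps > 0."""
    if x <= 0 or eps <= 0:
        raise ValueError("x and eps must be positive")
    # floor(x / eps) <= x / eps < floor(x / eps) + 1, so the next integer works.
    return math.floor(x / eps) + 1

m = archimedean_witness(Fraction(7), Fraction(1, 1000))
print(m)   # 7001
```

The value returned is in fact the smallest integer that works, since (m - 1) * eps does not exceed x.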
We shall not prove that there actually is a system satisfying Axioms I,
II, and III. There are two well-known methods of constructing the real
number system, starting from the rational numbers. One of them is the
method of Dedekind cuts and the other is Cantor's method of Cauchy
sequences.
Axioms I, II, and III characterize the real numbers; in other words,
any two systems satisfying these three axioms are essentially the same. To
put this more precisely in algebraic language, any two ordered fields satisfying
Axiom III are isomorphic.
For proofs of these facts, refer to the book by Birkhoff and McLane
[2, Chapter III].

PROBLEMS
1. Find the least upper bound sup S and greatest lower bound inf S of each of the
following sets:
(a) {x: x^2 - 3x + 2 < 0}.
(b) {x: x^3 + x^2 - 2x ≤ 2}.
(c) {sin x + cos x: x ∈ [0, π]}.
(d) {x exp x: x < 0}. [Note: exp denotes the exponential function, exp x = e^x,
where e is the base for natural logarithms.]
State whether sup S and inf S are elements of S.
2. Let T = {x: -x ∈ S}. Show that -sup T = inf S.
3. Let x and y be real numbers with x < y. Show that there is a rational number z such
that x < z < y. [Hint: By the archimedean property there is a positive integer q
such that q^{-1} < y - x. Let z = p/q, where p is the smallest positive integer such
that qx < p.]

[Note: In this book, "Show that ..." and "Prove that ..." both mean "give
a valid mathematical proof."]


1.2 Euclidean E^n
In this book we denote the real number system by E^1. Let us now define the
space E^n, whose elements are n-tuples of real numbers. The elements of E^n
will be called vectors.

Scalars and vectors


By scalar we mean a real number. In elementary mathematics a vector is
described as a quantity that has both direction and length. Vectors are
illustrated by drawing arrows issuing from a given point O. The point at the
head of the arrow specifies the vector. Therefore we may (and shall) say
that this point is the vector. Thus in two dimensions a vector is just a point
(x, y) of the plane E^2. Vectors in E^2 are added by the parallelogram law,
which amounts to adding corresponding components. Thus
(x, y) + (u, v) = (x + u, y + v).
The product of (x, y) by a scalar c is the vector (cx, cy). The zero vector is
(0, 0).
With this in mind, let us define the space E^n for any positive integer n.
The elements of E^n are n-tuples (x^1, ..., x^n) of real numbers. For short, we
write x for the n-tuple (x^1, ..., x^n). The notation x ∈ E^n means "x is an
element of E^n." The elements of E^n are called vectors, and also points,
depending on which term seems more suggestive in the context. Addition and
scalar multiplication are defined in E^n as follows. If
x = (x^1, ..., x^n), y = (y^1, ..., y^n)

are any two elements of E^n, then
x + y = (x^1 + y^1, ..., x^n + y^n).
If x ∈ E^n and c is a scalar, then
cx = (cx^1, ..., cx^n).

The zero element of E^n is
0 = (0, ..., 0).
With these definitions E^n satisfies the axioms for a vector space (Appendix
A.1). The term "vector" is reserved for elements of E^n rather than those of any
space satisfying these axioms.
The superscripts should not be confused with powers of x. For instance,
(x^i)^2 means the square of the ith entry x^i of the n-tuple (x^1, ..., x^n).
If n = 1 we identify the 1-tuple x = (x) with the scalar x. In this case
addition and scalar multiplication reduce to ordinary addition and multiplication
of real numbers. If n = 2 or 3 we usually write (x, y) or (x, y, z), as is
commonly done in elementary analytic geometry, rather than (x^1, x^2) or
(x^1, x^2, x^3). Practically all of the theorems are stated and proved for arbitrary
dimension n. However, the special cases n = 2, 3 will frequently appear in
the examples and homework problems.
The notions of vector sum and multiplication by scalars determine the
vector-space structure of E^n, but are not enough to define the concepts of
distance and angle. These arise by introducing an inner product in E^n. An
inner product assigns to each pair x, y of vectors a scalar, and must have the
four properties listed in Problem 2 at the end of the section. The one which we
shall use is the euclidean inner product, denoted by ·,
x · y = Σ_{i=1}^n x^i y^i.

The vector space E^n with this inner product is called euclidean n-space. Other
inner products in E^n are considered in Section 2.11.
The euclidean norm (or length) of a vector x is
|x| = (x · x)^{1/2}.
It is positive, except when x = 0, and satisfies the following two important
inequalities. For every x, y ∈ E^n,
(1.1) |x · y| ≤ |x||y| (Cauchy's inequality),
(1.2) |x + y| ≤ |x| + |y| (triangle inequality).

PROOF OF (1.1). If y = 0, then both sides of (1.1) are 0. Therefore let us
suppose that y ≠ 0. For every scalar t,
(x + ty) · (x + ty) = x · x + 2t x · y + t^2 y · y,
since the inner product is commutative and distributive [Problem 2, parts
(a), (b), and (c)]. The left-hand side is |x + ty|^2, and x · x = |x|^2, y · y = |y|^2.
The right-hand side is quadratic in t and has a minimum when
t = t_0 = -(x · y)/(y · y).
Substituting this expression for t, we find that
0 ≤ |x + t_0 y|^2 = |x|^2 - |x · y|^2/|y|^2,
or
|x · y|^2 ≤ |x|^2 |y|^2.
The last inequality is equivalent to Cauchy's inequality. □
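
The key step of this proof, minimizing the quadratic in t, can be checked on a concrete pair of vectors. The sketch below uses exact rationals; the vectors and the helper names `dot` and `q` are illustrative assumptions, not part of the text.

```python
from fractions import Fraction

def dot(x, y):
    """Euclidean inner product: the sum of x^i * y^i."""
    return sum(a * b for a, b in zip(x, y))

def q(x, y, t):
    """|x + t y|^2 written out as x.x + 2t x.y + t^2 y.y."""
    return dot(x, x) + 2 * t * dot(x, y) + t * t * dot(y, y)

x = (Fraction(1), Fraction(2), Fraction(-2))
y = (Fraction(3), Fraction(0), Fraction(4))

t0 = -dot(x, y) / dot(y, y)           # minimizer of the quadratic (y is nonzero)
minimum = q(x, y, t0)

# The minimum equals |x|^2 - (x.y)^2 / |y|^2 and is nonnegative,
# which rearranges to (x.y)^2 <= |x|^2 |y|^2.
assert minimum == dot(x, x) - dot(x, y) ** 2 / dot(y, y)
assert minimum >= 0
```

For these vectors t0 = 1/5 and the minimum value is 8, so Cauchy's inequality holds here with room to spare.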

From the proof we see that equality in Cauchy's inequality is equivalent
to the fact that |x + t_0 y| = 0, that is, that x + t_0 y = 0. Thus, if y ≠ 0,
|x · y| = |x||y| if and only if x is a scalar multiple of y. If x · y = |x||y|, then
x is a nonnegative scalar multiple of y (and conversely).


PROOF OF (1.2). We write, as before,
(x + y) · (x + y) = x · x + 2x · y + y · y.
From Cauchy's inequality,
|x + y|^2 ≤ |x|^2 + 2|x||y| + |y|^2,
or
|x + y|^2 ≤ (|x| + |y|)^2.
This is equivalent to the triangle inequality. □

If y ≠ 0, equality holds in (1.2) if and only if x is a nonnegative scalar
multiple of y.
Using the fact that |cx| = |c||x|, one can easily prove by induction
on m the following extension of the triangle inequality:

(1.3) |Σ_{j=1}^m c^j x_j| ≤ Σ_{j=1}^m |c^j||x_j|

for every choice of scalars c^1, ..., c^m and of vectors x_1, ..., x_m. We recall
(Appendix A.1) that Σ_j c^j x_j is called a linear combination of x_1, ..., x_m.
The euclidean distance between x and y is |x - y|. If x, y, and z are vectors,
then
x - z = (x - y) + (y - z).

Applying (1.2) to x - y and y - z, we have

(1.4) |x - z| ≤ |x - y| + |y - z|,
which justifies the name "triangle inequality" (see Figure 1.1).

Figure 1.1
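
The inner product, the norm, and the inequalities (1.1), (1.2), and (1.4) are easy to spot-check numerically. The following Python sketch represents vectors as tuples of floats; the sample vectors and the helper names `dot`, `norm`, `add`, and `sub` are assumptions made for illustration.

```python
import math

def dot(x, y):
    """Euclidean inner product: the sum of x^i * y^i."""
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Euclidean norm |x| = (x . x)^(1/2)."""
    return math.sqrt(dot(x, x))

def add(x, y):
    """Componentwise sum x + y."""
    return tuple(a + b for a, b in zip(x, y))

def sub(x, y):
    """Componentwise difference x - y."""
    return tuple(a - b for a, b in zip(x, y))

x, y, z = (1.0, 2.0, -2.0), (3.0, 0.0, 4.0), (0.0, 1.0, 1.0)

assert abs(dot(x, y)) <= norm(x) * norm(y)                     # (1.1)
assert norm(add(x, y)) <= norm(x) + norm(y)                    # (1.2)
assert norm(sub(x, z)) <= norm(sub(x, y)) + norm(sub(y, z))    # (1.4)
```

A check of this kind illustrates the inequalities on one example only; the proofs above are what establish them for all vectors.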

If x and y are nonzero vectors, the angle θ between x and y is defined by
the formula
(1.5) cos θ = (x · y)/(|x||y|), 0 ≤ θ ≤ π.

Figure 1.2

This formula agrees in dimensions n = 2, 3 with the one in elementary
analytic geometry. The vectors x and y are orthogonal if x · y = 0, in other
words, if θ is a right angle. We have
|x - y|^2 = |x|^2 + |y|^2 - 2x · y,
or
|x - y|^2 = |x|^2 + |y|^2 - 2|x||y| cos θ,
which is the law of cosines from trigonometry (Figure 1.2).
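
Formula (1.5) and the law of cosines can be illustrated with a short computation. This is a sketch in floating-point arithmetic; the function name `angle` and the sample vectors are assumptions, and the clamp before `math.acos` is only a guard against rounding error.

```python
import math

def dot(x, y):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Euclidean norm."""
    return math.sqrt(dot(x, x))

def angle(x, y):
    """Angle theta in [0, pi] defined by (1.5); x and y must be nonzero."""
    c = dot(x, y) / (norm(x) * norm(y))
    return math.acos(max(-1.0, min(1.0, c)))   # clamp guards against rounding

x, y = (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)
theta = angle(x, y)

# Law of cosines: |x - y|^2 = |x|^2 + |y|^2 - 2|x||y| cos(theta)
diff = tuple(a - b for a, b in zip(x, y))
lhs = norm(diff) ** 2
rhs = norm(x) ** 2 + norm(y) ** 2 - 2 * norm(x) * norm(y) * math.cos(theta)
assert math.isclose(lhs, rhs)
```

Here the angle comes out as a quarter of pi, as expected for the diagonal of a unit square against one of its sides.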

Orthonormal bases
E^n is an n-dimensional vector space, and any linearly independent set of
vectors {v_1, ..., v_n} with n elements is a basis for it.
A basis {v_1, ..., v_n} for E^n is called orthonormal if v_i · v_j = δ_ij, where

δ_ij = 0 if i ≠ j, δ_ij = 1 if i = j, i, j = 1, ..., n.

The symbol δ_ij was first introduced by the mathematician Kronecker, and
consequently is called Kronecker's delta. The unit coordinate vectors
e_1 = (1, 0, ..., 0),
e_2 = (0, 1, 0, ..., 0),
...
e_n = (0, 0, ..., 0, 1)

form the standard orthonormal basis for E^n. We have, for each x ∈ E^n,
x = (x^1, ..., x^n) = Σ_{i=1}^n x^i e_i.

For instance, (2, -1, 3) = 2e_1 - e_2 + 3e_3.


If v is any unit vector (|v| = 1), then x · v is the component of x with
respect to v. Since x · e_i = x^i, the components of x with respect to the
standard orthonormal basis vectors e_1, ..., e_n are x^1, ..., x^n. If {v_1, ..., v_n}
is any orthonormal basis for E^n, then v_1, ..., v_n are mutually orthogonal
unit vectors. Each x ∈ E^n can be uniquely represented as a linear combination
(1.6) x = Σ_{i=1}^n c^i v_i.

Taking the inner product of each side with v_j and using the formula v_i · v_j =
δ_ij, we obtain
c^j = x · v_j.
The coefficients c^i in (1.6) are just the components of x with respect to the
orthonormal basis vectors.
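
To see the components with respect to an orthonormal basis at work, one can take a small basis with rational entries and recover a vector from its components. The basis and the vector below are illustrative choices, not taken from the text.

```python
from fractions import Fraction

def dot(x, y):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(x, y))

# An orthonormal basis for E^2 with rational entries (chosen for illustration).
v1 = (Fraction(3, 5), Fraction(4, 5))
v2 = (Fraction(-4, 5), Fraction(3, 5))
basis = [v1, v2]

x = (Fraction(2), Fraction(-1))

# Components of x with respect to the basis vectors: x . v_i
coeffs = [dot(x, v) for v in basis]

# Rebuild x as the linear combination of the basis vectors with these coefficients.
rebuilt = tuple(sum(c * v[k] for c, v in zip(coeffs, basis)) for k in range(len(x)))
assert rebuilt == x
```

Exact fractions are used so that the reconstruction can be compared with equality rather than up to rounding.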

PROBLEMS

1. Let n = 4, x = e_1 - e_2 + 2e_4 = (1, -1, 0, 2), y = 3e_1 - e_2 + e_3 + e_4 = (3, -1, 1, 1).
Find x + y, x - y, |x + y|, |x - y|, |x|, |y|, x · y. Verify (1.1) and (1.2) in this
example.
2. Prove that the standard euclidean inner product in E^n has the following four
properties:
(a) x · y = y · x. (b) (x + y) · z = x · z + y · z.
(c) (cx) · y = c(x · y). (d) x · x > 0 if x ≠ 0.
3. Using Problem 2, show that
(w + cx) · (y + dz) = w · y + cx · y + dw · z + cdx · z.

4. Show that Σ_{i=1}^n |x^i| ≤ √n |x|, for any x = (x^1, ..., x^n). [Hint: First suppose that
x^i ≥ 0. Use Equation (1.1) with y^i = 1.]

5. Show that 2|x|^2 + 2|y|^2 = |x + y|^2 + |x - y|^2. What does this say about
parallelograms (see Figure 1.3)?

Figure 1.3

6. Show that |x + y||x - y| ≤ |x|^2 + |y|^2 with equality if and only if x · y = 0.
What does this say about parallelograms?
7. Prove (1.3), using (1.2) and induction on m.
8. Let n = 4, and
v_1 = (1/5)(3e_1 + 4e_3),
v_2 = (1/5)(4e_2 - 3e_4),
v_3 = (√2/10)(-4e_1 + 3e_2 + 3e_3 + 4e_4).
Show that v_1, v_2, v_3 are mutually orthogonal unit vectors. Find a unit vector v_4
such that v_1, v_2, v_3, v_4 form an orthonormal basis for E^4.


9. Let {v_1, ..., v_n} be an orthonormal basis for E^n, and let
C = {x: x = Σ_{i=1}^n t^i v_i, 0 ≤ t^i ≤ 1 for i = 1, ..., n}.
The set C is an n-cube. If each t^i = 0 or 1, x is called a vertex of C. What are the
possible distances between vertices of C?
10. (Gram-Schmidt process.) Let {x_1, ..., x_n} be a basis for E^n. Let v_1 = |x_1|^{-1} x_1,
y_2 = x_2 - (x_2 · v_1)v_1, v_2 = |y_2|^{-1} y_2, y_3 = x_3 - (x_3 · v_1)v_1 - (x_3 · v_2)v_2, v_3 =
|y_3|^{-1} y_3, ..., v_n = |y_n|^{-1} y_n. Show that {v_1, ..., v_n} is an orthonormal basis for E^n.

11. Let V be a vector subspace of E^n, of dimension k; and consider its orthogonal
complement
V^⊥ = {y: y · x = 0 for all x ∈ V}.
(a) Find an orthonormal basis {v_1, ..., v_n} for E^n, such that {v_1, ..., v_k} is a basis
for V and {v_{k+1}, ..., v_n} is a basis for V^⊥. [Hint: Apply Problem 10 to a basis
{x_1, ..., x_n} such that {x_1, ..., x_k} is a basis for V.]
(b) Show that each x ∈ E^n can be written in one and only one way as x = y + z
with y ∈ V and z ∈ V^⊥.
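
The process described in Problem 10 translates directly into a short routine. The following Python sketch works in floating-point arithmetic; the function name `gram_schmidt`, the tolerance 1e-12 used to detect linear dependence, and the sample basis are assumptions for illustration. Running it does not replace the proof the problem asks for.

```python
import math

def dot(x, y):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(xs):
    """Orthonormalize a linearly independent list of vectors as in Problem 10."""
    vs = []
    for x in xs:
        # y_k = x_k minus its components along the vectors already constructed
        y = list(x)
        for v in vs:
            c = dot(x, v)
            y = [yi - c * vi for yi, vi in zip(y, v)]
        length = math.sqrt(dot(y, y))
        if length < 1e-12:              # assumed tolerance for dependence
            raise ValueError("vectors are not linearly independent")
        vs.append(tuple(yi / length for yi in y))   # v_k = y_k divided by |y_k|
    return vs

basis = gram_schmidt([(1.0, 1.0, 0.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)])
for v in basis:
    print(v)
```

Up to rounding error, the inner products of the returned vectors reproduce Kronecker's delta.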

1.3 Elementary geometry of E^n

Such concepts as lines, planes, circles, and spheres in E^2 or E^3 have analogs
in E^n for any dimension n. Let us begin with the concept of line in E^n.

Definition. Let x_1, x_2 ∈ E^n with x_1 ≠ x_2. The line through x_1 and x_2 is
{x: x = tx_1 + (1 - t)x_2, t any scalar}.
If we set z = x_1 - x_2, then this can be rewritten as
{x: x = x_2 + tz, t any scalar}.

In the plane E^2 the vector equation x = x_2 + tz becomes
x = x_2 + t(x_1 - x_2), y = y_2 + t(y_1 - y_2),
which, in elementary analytic geometry, are called "parametric equations"
of the line through (x_1, y_1) and (x_2, y_2).
The line segment joining x_1 and x_2 is
{x: x = tx_1 + (1 - t)x_2, t ∈ [0, 1]},
where [a, b] denotes the set of real numbers t such that a ≤ t ≤ b (Section
1.1).

For example, if t = 1/2, then x is the midpoint of the line segment joining
x_1 and x_2 (Figure 1.4). The points corresponding to t = 1/3, 2/3 trisect the line
segment.
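
The parametrization of a line and of a line segment can be tried out directly. In the sketch below the function name `point_on_line` and the two endpoints are illustrative assumptions; exact fractions keep the midpoint and the trisection points free of rounding.

```python
from fractions import Fraction

def point_on_line(x1, x2, t):
    """Return t*x1 + (1 - t)*x2; t in [0, 1] gives the segment joining x1 and x2."""
    return tuple(t * a + (1 - t) * b for a, b in zip(x1, x2))

x1 = (Fraction(0), Fraction(0), Fraction(3))
x2 = (Fraction(6), Fraction(3), Fraction(0))

midpoint = point_on_line(x1, x2, Fraction(1, 2))
trisectors = [point_on_line(x1, x2, Fraction(k, 3)) for k in (1, 2)]

print(midpoint)      # the point with coordinates 3, 3/2, 3/2
print(trisectors)    # the points (4, 2, 1) and (2, 1, 2)
```

Note that t = 1 returns x1 and t = 0 returns x2, matching the order of the terms in the definition.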


Figure 1.4

Hyperplanes, half-spaces
Given two points x_1, x_2 ∈ E^n, consider the set
P = {x: (x_1 - x_2) · (x - x_2) = 0}.

Such a set P is called a hyperplane through x_2. Geometrically, one can think
of P as follows. Let l denote the line through x_1 and x_2, and l' the line through
x_2 and x. We call l and l' perpendicular if (x_1 - x_2) · (x - x_2) = 0. Thus
P is made up of all lines through x_2 perpendicular to l. If we set z = x_1 - x_2
and c = z · x_2, then
(x_1 - x_2) · (x - x_2) = z · x - c.

Thus the definition of hyperplane can be conveniently formulated as follows:

Definition. A hyperplane in E^n is a set of the form {x: z · x = c}, where
z ≠ 0 and c are given.

A hyperplane is a point for n = 1, a line for n = 2, a plane for n = 3. For
any nonzero scalar b, {x: z · x = c} = {x: (bz) · x = bc}. Thus the vector
z and scalar c defining a hyperplane are determined only up to a scalar
multiple. If c ≠ 0, we may, for instance, always take c = 1.
A hyperplane P = {x: z · x = c} is parallel to P_1 = {x: z · x = c_1} for
any c_1 ≠ c. If c_1 = 0, then P_1 contains 0, and P_1 is a vector subspace of E^n
of dimension n - 1. This last statement follows from Problem 5.

EXAMPLE 1. Find the hyperplane P in E^4 which contains the four points
e_1, e_1 + 2e_2, e_2 + 3e_3, e_3 + 4e_4. Every x ∈ P must satisfy the equation
z · x = c, where z and c must be found. Taking in turn x = e_1, x =
e_1 + 2e_2, ..., we obtain

c = z · e_1 = z^1, c = z · (e_1 + 2e_2) = z^1 + 2z^2,
c = z · (e_2 + 3e_3) = z^2 + 3z^3, c = z · (e_3 + 4e_4) = z^3 + 4z^4.

From these equations, the components z^1, ..., z^4 of the vector z satisfy
z^1 = c, z^2 = 0, z^3 = c/3, z^4 = c/6.
Taking for convenience c = 6, we have
P = {x: 6x^1 + 2x^3 + x^4 = 6}.
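
The computation in Example 1 can be replayed step by step. The sketch below solves the four equations in order for the components of z, with c = 6 as in the example, and then confirms that each of the four given points satisfies z · x = c. Variable names such as `z1` and `points` are illustrative.

```python
from fractions import Fraction

def dot(x, y):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(x, y))

c = Fraction(6)

# Solve the four equations of Example 1 one at a time.
z1 = c                    # c = z^1
z2 = (c - z1) / 2         # c = z^1 + 2 z^2
z3 = (c - z2) / 3         # c = z^2 + 3 z^3
z4 = (c - z3) / 4         # c = z^3 + 4 z^4
z = (z1, z2, z3, z4)

points = [
    (1, 0, 0, 0),   # e_1
    (1, 2, 0, 0),   # e_1 + 2 e_2
    (0, 1, 3, 0),   # e_2 + 3 e_3
    (0, 0, 1, 4),   # e_3 + 4 e_4
]

assert all(dot(z, p) == c for p in points)
print(z)   # components 6, 0, 2, 1
```

Because each equation introduces exactly one new unknown, the system can be solved by substitution without any general elimination routine.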

Definition. Let z ≠ 0. A closed half-space is a set of the form {x: z · x ≥ c},
and an open half-space is a set of the form {x: z · x > c} (see Figure 1.5).

Figure 1.5

A set H of the form {x: z · x ≤ c}, z ≠ 0, is a closed half-space, since H is
also {x: (-z) · x ≥ -c}. The same remark applies to open half-spaces. A
hyperplane P = {x: z · x = c} divides E^n into two half-spaces. More precisely,
E^n - P is the union of the open half-spaces
{x: z · x > c} and {x: z · x < c}.

Spheres and spherical balls


Given x and δ > 0, the set {y: |y - x| = δ} is called the (n - 1)-sphere with
center x and radius δ. For n = 1, it consists of two points x ± δ; for n = 2,
it is a circle, and for n = 3 a sphere. The set {y: |y - x| < δ} is the open
spherical n-ball with center x and radius δ. The corresponding set
{y: |y - x| ≤ δ} is called a closed spherical n-ball.
The terms "open" and "closed" are justified in Section 1.4. It turns out
that an open half-space or spherical n-ball is an open set, and a closed
half-space or spherical n-ball is a closed set.

Convex sets
This concept is defined as follows.

Definition. Let K ⊂ E^n. Then K is a convex set if the line segment joining
any two points of K is contained in K (Figure 1.6).

Figure 1.6 (panels: Convex; Not convex)

E^n itself is a convex set. The empty set and sets with just one point trivially
satisfy the definition; hence they are convex. The reader should be able to
think of several kinds of geometric objects such as lines, planes, spherical
balls, regular solids, and so on, which appear to be convex sets. However,
geometric intuition is not always a reliable guide, especially in four or more
dimensions. In any case, intuition is no substitute for a proof that the set in
question is actually convex. To show that a set K is convex directly from the
definition, we must verify that for every x_1, x_2 ∈ K and t ∈ [0, 1], the point
x = tx_1 + (1 - t)x_2 also belongs to K. In the definition, we assumed that
x_1 ≠ x_2. But if x_1 = x_2, it is trivial that x ∈ K, since x = x_1 = x_2.

EXAMPLE 2. Any closed half-space is a convex set. Let H = {x: z · x ≥ c},
z ≠ 0. Let x_1, x_2 ∈ H and x = tx_1 + (1 - t)x_2, where t ∈ [0, 1]. Then
z · x_1 ≥ c and z · x_2 ≥ c. Since t ≥ 0, tz · x_1 ≥ tc; and since 1 - t ≥ 0,
(1 - t)z · x_2 ≥ (1 - t)c. Consequently,

z · x = tz · x_1 + (1 - t)z · x_2 ≥ tc + (1 - t)c = c.

This shows that x ∈ H. Therefore H is a convex set. Similarly, any hyperplane
is a convex set (Problem 12) and any open half-space is a convex set.

EXAMPLE 3. Let U be an open spherical n-ball, namely, U = {x: |x - x_0| < δ},
for some x_0 and δ > 0. To show that U is a convex set, we proceed as in
Example 2. Let x_1, x_2 ∈ U and x = tx_1 + (1 - t)x_2, where t ∈ [0, 1]. Then

|x_1 - x_0| < δ, |x_2 - x_0| < δ,
x - x_0 = t(x_1 - x_0) + (1 - t)(x_2 - x_0),
|x - x_0| ≤ t|x_1 - x_0| + (1 - t)|x_2 - x_0| < δ.

Hence x ∈ U.
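
As the text stresses, intuition and experiments are no substitute for a proof, but a numerical spot check can still make the statement of Example 3 concrete. The Python sketch below samples pairs of points in an open ball in E^3 and tests random points of the segment joining them. The center, radius, sample size, random seed, and helper names are all illustrative assumptions.

```python
import math
import random

def norm(x):
    """Euclidean norm."""
    return math.sqrt(sum(a * a for a in x))

def in_open_ball(x, center, delta):
    """True if |x - center| < delta."""
    return norm(tuple(a - b for a, b in zip(x, center))) < delta

def random_point_in_ball(rng, center, delta):
    """Rejection-sample a point of the open ball from its bounding cube."""
    while True:
        p = tuple(c + rng.uniform(-delta, delta) for c in center)
        if in_open_ball(p, center, delta):
            return p

rng = random.Random(0)
center, delta = (1.0, -2.0, 0.5), 2.0

violations = 0
for _ in range(1000):
    x1 = random_point_in_ball(rng, center, delta)
    x2 = random_point_in_ball(rng, center, delta)
    t = rng.random()
    x = tuple(t * a + (1 - t) * b for a, b in zip(x1, x2))
    if not in_open_ball(x, center, delta):
        violations += 1

print(violations)   # no violations are expected, in line with Example 3
```

The same harness could be pointed at a nonconvex set, such as a spherical shell, where violations do turn up quickly.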

The convex subsets of E^n have many remarkable properties. There is an
extensive mathematical literature devoted to them [6] [9] [13]. In Section
1.5 we give a brief introduction to the theory. The main result (Theorem 1.1)
is the characterization of closed convex sets as intersections of closed
half-spaces.


PROBLEMS

1. Let n = 3. Find the plane that contains the three points e_1, e_2, and e_3 - 3e_1.
Sketch its intersection with the first octant in E^3.
2. (a) Find the hyperplane in E^4 containing the four points 0, e_1 + e_2, e_1 - e_2 + 2e_3,
3e_4 - e_2.
(b) Find the value of t for which t(e_1 - e_2) + (1 - t)e_4 is in this hyperplane.
3. Let l denote the line in E^4 through e_1 - e_3 and -e_1 + e_2 + 2e_4. Find the
hyperplane P through e_1 - e_3 to which l is perpendicular.
4. Let V = {(x, y, z): 2x + 3y - z = 0}. Show that V is a 2-dimensional vector
subspace of E^3, and find a basis for V. (V is a vector subspace of E^n if x, y ∈ V imply
x + y ∈ V and cx ∈ V for any scalar c.)
5. Let V = {x: z · x = 0}, where z ≠ 0 is given. Show that V is an (n - 1)-dimensional
vector subspace of E^n, and find a basis for V.
6. Show that {x: |x - x_1| = |x - x_2|}, where x_1 and x_2 are given points in E^n, is a
hyperplane.
7. Show that {x: |x - x_1| = c|x - x_2|}, where x_1 and x_2 are given points in E^n and
0 < c < 1, is an (n - 1)-sphere.
8. Let x_0, x_1, ..., x_{n-1} be such that x_1 - x_0, ..., x_{n-1} - x_0 are linearly independent.
Prove that there is exactly one hyperplane containing x_0, x_1, ..., x_{n-1}.
9. A set P = {x: z_i · x = c^i for i = 1, ..., n - k}, where z_1, ..., z_{n-k} are linearly
independent vectors, is called a k-plane in E^n. Let x_0, x_1, ..., x_k be such that
x_1 - x_0, ..., x_k - x_0 are linearly independent. Prove that there is exactly one
k-plane containing x_0, x_1, ..., x_k.
10. Prove that any line in E^n is a convex set.
11. Show that K is a convex set by directly applying the definition. Sketch K in the
cases n = 1, 2, 3.
(a) K = {x: |x^1| + ... + |x^n| ≤ 1}.
(b) K = {x = c^1 v_1 + ... + c^n v_n, 0 ≤ c^i ≤ 1 for i = 1, ..., n}, where {v_1, ..., v_n}
is a basis for E^n. This is the n-parallelepiped spanned by v_1, ..., v_n with 0 as a
vertex.
12. Let P be a hyperplane. Prove that the line through any two points of P is contained
in P. Why does this imply that P is a convex set?

1.4 Basic topological notions in En


We now introduce some basic concepts that are essential to a careful treat-
ment of several-variable calculus. These concepts are developed further in
Chapter 2.
Let us begin by making precise the idea of being" strictly inside" a set A,
"strictly outside" A, or neither. Points with these properties will be called,
respectively, interior, exterior, or frontier points. We first define the concept
of neighborhood.

14
1.4 Basic topological notions in En

Definition. A neighborhood of a point $\mathbf{x}_0 \in E^n$ is a spherical n-ball $U = \{\mathbf{x} : |\mathbf{x} - \mathbf{x}_0| < \delta\}$, where $\delta > 0$ is called the radius of U.

We also call U the $\delta$-neighborhood of $\mathbf{x}_0$. Let A be any subset of $E^n$.

Definition. A point $\mathbf{x}$ is called interior to A if there is some neighborhood U of $\mathbf{x}$ such that $U \subset A$. If some neighborhood of $\mathbf{x}$ is contained in the complement $A^c = E^n - A$, then $\mathbf{x}$ is exterior to A. If every neighborhood of $\mathbf{x}$ contains at least one point of A and at least one point of $A^c$, then $\mathbf{x}$ is a frontier point of A.

An interior point of A necessarily is a point of A, and an exterior point must be a point of $A^c$. However, a frontier point may belong either to A or to $A^c$.

Figure 1.7

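EXAMPLE 1. Let U be the $\delta$-neighborhood of $\mathbf{x}_0$ (Figure 1.7). Let us show that every point of U is interior to U. Given $\mathbf{x} \in U$, let $r = \delta - |\mathbf{x} - \mathbf{x}_0|$ and let V be the r-neighborhood of $\mathbf{x}$. If $\mathbf{y} \in V$, then $\mathbf{y} - \mathbf{x}_0 = (\mathbf{y} - \mathbf{x}) + (\mathbf{x} - \mathbf{x}_0)$ and by the triangle inequality
$$|\mathbf{y} - \mathbf{x}_0| \le |\mathbf{y} - \mathbf{x}| + |\mathbf{x} - \mathbf{x}_0|,$$
$$|\mathbf{y} - \mathbf{x}_0| < r + |\mathbf{x} - \mathbf{x}_0| = \delta.$$
Hence $\mathbf{y} \in U$. This shows that $V \subset U$. Similarly, every point $\mathbf{x}$ such that
$|\mathbf{x} - \mathbf{x}_0| > \delta$ is exterior to U. If $|\mathbf{x} - \mathbf{x}_0| = \delta$, $\mathbf{x}$ is a frontier point.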
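
The inclusion $V \subset U$ in Example 1 can also be checked numerically. The following Python sketch is only an illustration under stated assumptions: the helper names (`norm`, `in_ball`, `sample_in_ball`) and the particular point and radius are hypothetical choices, not part of the text. It forms $r = \delta - |\mathbf{x} - \mathbf{x}_0|$ for one point of U and tests sampled points of the r-neighborhood.

```python
import math
import random

def norm(v):
    """Euclidean norm |v| of a vector given as a sequence of floats."""
    return math.sqrt(sum(c * c for c in v))

def in_ball(y, center, radius):
    """True if y lies in the open ball {y : |y - center| < radius}."""
    return norm([a - b for a, b in zip(y, center)]) < radius

def sample_in_ball(center, radius, rng):
    """Random point of the open ball, by rejection sampling in the bounding cube."""
    while True:
        y = [c + rng.uniform(-radius, radius) for c in center]
        if in_ball(y, center, radius):
            return y

rng = random.Random(0)                    # fixed seed, illustrative only
x0, delta = [0.0, 0.0, 0.0], 1.0          # U is the delta-neighborhood of x0
x = [0.5, 0.3, -0.2]                      # a point of U
r = delta - norm([a - b for a, b in zip(x, x0)])
samples = [sample_in_ball(x, r, rng) for _ in range(1000)]
all_inside = all(in_ball(y, x0, delta) for y in samples)
```

A sampling check of this kind proves nothing; the triangle inequality is what guarantees that every point of V lies in U.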

EXAMPLE 2. Let n = 1. Then neighborhoods are open intervals $(x_0 - \delta, x_0 + \delta)$. Let A be the set of all rational numbers. Any open interval contains
both rational and irrational numbers. Hence every point of $E^1$ is a frontier
point of A.

Definition. The interior of a set A is the set of all points interior to A. It is denoted by int A. The set of all frontier points of A is the boundary (or
frontier) of A, and is denoted by fr A. The set $A \cup \operatorname{fr} A$ is the closure of A
and is denoted by cl A.


In Example 1, int U = U and fr U is the $(n-1)$-dimensional sphere of radius $\delta$. In Example 2, int A is the empty set, and $\operatorname{fr} A = \operatorname{cl} A = E^1$.

EXAMPLE 3. Let $A = E^n$. Then $\operatorname{int} E^n = \operatorname{cl} E^n = E^n$ and $\operatorname{fr} E^n$ is the empty set.

EXAMPLE 4. Let A be a subset of $E^1$ which has an upper bound. Then $\sup A \in \operatorname{fr} A$, where sup A denotes the least upper bound (Section 1.1).
Similarly, $\inf A \in \operatorname{fr} A$ if A has a lower bound.
The closure of a set A consists of those points not exterior to A. Thus
$$(\operatorname{cl} A)^c = \operatorname{int}(A^c).$$

It is always true that int A c A. If these two sets are the same, then A is
called an open set.

Definition. A set A is open if every point of A is interior to A.

From the above examples any neighborhood is an open set and $E^n$ is an open set. The empty set furnishes another example of an open set. The
interior of any set is an open set (Problem 4). We encounter many more open
sets, starting with the homework problems at the end of the section. The
collection of all open sets defines what is called the topology of $E^n$. In Chapter 2
we see that such properties as connectedness of a set and continuity of a
function can be expressed in terms of open sets. These are examples of
topological properties.
Let us next show that the union of open sets is open, and that the inter-
section of finitely many open sets is open. We begin by considering two sets
A and B.

Proposition 1.1. If A and B are open sets, then $A \cup B$ and $A \cap B$ are open.
PROOF. To show that $A \cup B$ is open, we show that every point of $A \cup B$ is
interior to $A \cup B$. Let $\mathbf{x} \in A \cup B$. We must find a neighborhood U of $\mathbf{x}$
such that $U \subset A \cup B$. By the definition of the union of two sets, either
$\mathbf{x} \in A$ or $\mathbf{x} \in B$. If $\mathbf{x} \in A$, then there is a neighborhood U of $\mathbf{x}$ such that $U \subset A$.
Since $A \subset A \cup B$, $U \subset A \cup B$. Similarly, if $\mathbf{x} \in B$ there is a neighborhood of
$\mathbf{x}$ contained in $A \cup B$. This proves that $A \cup B$ is open.
If $\mathbf{x} \in A \cap B$, then $\mathbf{x}$ has neighborhoods $U_1$, $U_2$ such that $U_1 \subset A$ and
$U_2 \subset B$. Let $U_3 = U_1 \cap U_2$. Then $U_3$ is a neighborhood of $\mathbf{x}$ (the one of smaller radius among $U_1$, $U_2$) and $U_3 \subset A \cap B$. Therefore $A \cap B$ is open. □

In order to discuss unions and intersections of more than two sets, let us introduce some set-theoretic notation. By an indexed collection of sets
let us mean a function with domain some nonempty set J (called an index
set) whose values are subsets of some set S. In the present instance $S = E^n$.
Let $A_\mu$ denote the value of the function at $\mu \in J$. Moreover, let
$$\bigcup_{\mu \in J} A_\mu = \{p \in S : p \in A_\mu \text{ for some } \mu \in J\},$$
$$\bigcap_{\mu \in J} A_\mu = \{p \in S : p \in A_\mu \text{ for every } \mu \in J\}.$$
These sets are, respectively, the union and intersection of the indexed collection. If J is a finite set, then the indexed collection is called finite. For instance,
if $J = \{1, 2, \ldots, m\}$, we write the collection as $A_1, A_2, \ldots, A_m$. If $J = \{1, 2, \ldots\}$, then the indexed collection is an infinite sequence of sets and is
written $A_1, A_2, \ldots$, or $[A_m]$, $m = 1, 2, \ldots$. In that case the union is written
$A_1 \cup A_2 \cup \cdots$ or $\bigcup_{m=1}^{\infty} A_m$, with similar notations for the intersection.

Proposition 1.2. The union of any indexed collection of open sets is open.
The intersection of any finite indexed collection of open sets is open.

The proof is almost the same as for Proposition 1.1.

EXAMPLE 5. Let n = 1 and $A = \{x : 0 < x < 1,\ x \neq m^{-1},\ m = 2, 3, \ldots\}$. We can write $A = A_1 \cup A_2 \cup \cdots$, where $A_m = \{x : (m+1)^{-1} < x < m^{-1}\}$.
Each set $A_m$ is open, and hence by Proposition 1.2, A is open.

EXAMPLE 6. Let $A_m$ be the $m^{-1}$-neighborhood of a point $\mathbf{x}_0$, $m = 1, 2, \ldots$. Then
$$\bigcap_{m=1}^{\infty} A_m = \{\mathbf{x}_0\},$$
which is not an open set.
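
Example 6 says that any point other than $\mathbf{x}_0$ is eventually excluded from the shrinking neighborhoods. A small Python sketch for the case n = 1 makes this concrete; the function name `smallest_excluding_m` is a hypothetical helper introduced here, and floating-point numbers stand in for real numbers.

```python
import math

def smallest_excluding_m(x, x0):
    """Smallest positive integer m with |x - x0| >= 1/m, i.e. x not in A_m.
    Returns None when x == x0, which lies in every A_m."""
    d = abs(x - x0)
    if d == 0:
        return None
    m = max(1, math.ceil(1 / d))
    # Adjust for possible rounding in 1/d so that m is exactly the first index.
    while d < 1 / m:
        m += 1
    while m > 1 and d >= 1 / (m - 1):
        m -= 1
    return m
```

For instance, with $x_0 = 0$ the point 0.3 lies in $A_1, A_2, A_3$ but not in $A_4$, so it is not in the intersection.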

The concept of closed set is defined by considering set complements.

Definition. A set A is closed if its complement $A^c$ is open.

In other words, A is a closed set if A contains all of its frontier points, which is to say A = cl A.
Every door is either open or closed. However, many sets are neither open
nor closed.

EXAMPLE 7. Let $A = \{x : a \le x < b\}$. The points a and b are frontier points
of A, with $a \in A$, $b \notin A$. The set A is neither open nor closed. In the notation
of Section 1.1, A = [a, b). Its interior and closure are respectively the open
interval (a, b) and the closed interval [a, b].
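
The classification in Example 7 can be written out as a short Python sketch. The function name `classify` is an illustrative choice, and the sketch assumes a < b. Note that the answer depends only on the position of x relative to a and b, not on whether the endpoint belongs to A.

```python
def classify(x, a, b):
    """Classify x relative to A = [a, b) in E^1, assuming a < b."""
    if a < x < b:
        return "interior"      # some open interval around x lies inside A
    if x == a or x == b:
        return "frontier"      # every open interval around x meets A and its complement
    return "exterior"          # some open interval around x misses A

def in_A(x, a, b):
    """Membership in the half-open interval [a, b)."""
    return a <= x < b
```

Here `classify(a, a, b)` and `classify(b, a, b)` both return "frontier", although only a belongs to A.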

We recall that
$$(A \cap B)^c = A^c \cup B^c, \qquad (A \cup B)^c = A^c \cap B^c.$$


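More generally, for any indexed collection of sets
$$\Bigl(\bigcap_{\mu \in J} A_\mu\Bigr)^c = \bigcup_{\mu \in J} A_\mu^c, \qquad \Bigl(\bigcup_{\mu \in J} A_\mu\Bigr)^c = \bigcap_{\mu \in J} A_\mu^c.$$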

We have the following statement from Propositions 1.1 and 1.2.

Proposition 1.3. The intersection of any indexed collection of closed sets is closed. The union of any finite indexed collection of closed sets is closed.

Besides indexed collections, we sometimes consider unindexed collections of sets. (We use the term "collection of sets," rather than "set of sets," for a
set whose elements are subsets of some given set S.) Let us use German
script letters to denote collections of sets. For instance, the elements of a
finite collection $\mathfrak{A} = \{A_1, \ldots, A_m\}$ of subsets of $E^n$ are the sets $A_i \subset E^n$,
$i = 1, \ldots, m$.
The union and intersection of a collection $\mathfrak{A}$ of sets are, respectively, the sets
$$\bigcup_{A \in \mathfrak{A}} A = \{p \in S : p \in A \text{ for some } A \in \mathfrak{A}\},$$
$$\bigcap_{A \in \mathfrak{A}} A = \{p \in S : p \in A \text{ for every } A \in \mathfrak{A}\}.$$
If each set of the collection is indexed by itself (taking $J = \mathfrak{A}$, $A_A = A$), then
this definition of union and intersection agrees with the one for indexed collections. Propositions 1.2 and 1.3 remain valid for unindexed collections.

PROBLEMS

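1. Let $U_1$ be the $\delta$-neighborhood of $\mathbf{x}_1$ and $U_2$ the $\delta$-neighborhood of $\mathbf{x}_2$. Show that $U_1 \cap U_2$ is empty if and only if $\delta \le \tfrac{1}{2}|\mathbf{x}_1 - \mathbf{x}_2|$.
2. Find int A, fr A, cl A if A is:
(a) $\{\mathbf{x} : 0 < |\mathbf{x} - \mathbf{x}_0| \le \delta\}$, $\delta > 0$.
(b) $\{\mathbf{x} : |\mathbf{x} - \mathbf{x}_0| = \delta\}$, $\delta > 0$.
(c) $\{(x, y) : 0 < y < x + 1,\ x > -1\}$.
(d) $\{(r\cos\theta, r\sin\theta) : 0 < r < 1,\ 0 < \theta < 2\pi\}$.
(e) $\{(x, y) : x \text{ or } y \text{ is irrational}\}$.
(f) Any finite set.
(g) $\{1, \tfrac{1}{2}, \tfrac{1}{3}, \ldots\}$, n = 1.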

3. In Problem 2 which sets are open? Which are closed?


4. Let A be any set. Show that int A is open, and that both fr A and cl A are closed.
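5. (a) Show that the half space $\{\mathbf{x} : \mathbf{z} \cdot \mathbf{x} < c\}$ is an open set.
[Hint: $|\mathbf{z} \cdot \mathbf{y} - \mathbf{z} \cdot \mathbf{x}| \le |\mathbf{z}|\,|\mathbf{y} - \mathbf{x}|$.]
(b) Show that $\{\mathbf{x} : \mathbf{z} \cdot \mathbf{x} \ge c\}$ is a closed set, using (a), and that the hyperplane
$\{\mathbf{x} : \mathbf{z} \cdot \mathbf{x} = c\}$ is a closed set.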


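6. Show that:
(a) $\operatorname{fr} A = \operatorname{fr}(A^c)$. (b) $\operatorname{cl} A = \operatorname{cl}(\operatorname{cl} A)$.
(c) $\operatorname{fr} A = \operatorname{cl} A \cap \operatorname{cl}(A^c)$. (d) $\operatorname{int} A = (\operatorname{cl}(A^c))^c$.
7. Show by giving examples that the following are in general false:
(a) $\operatorname{int}(\operatorname{cl} A) = \operatorname{int} A$. (b) $\operatorname{fr}(\operatorname{fr} A) = \operatorname{fr} A$.
8. Let A be open and B closed. Show that A - B is open, and that B - A is closed.

9. Show that:
(a) $\operatorname{int}(A \cap B) = (\operatorname{int} A) \cap (\operatorname{int} B)$.
(b) $\operatorname{cl}(A \cup B) = (\operatorname{cl} A) \cup (\operatorname{cl} B)$. [Hint: Part (a).]
10. Show that:
(a) $\operatorname{int}(A \cup B) \supset (\operatorname{int} A) \cup (\operatorname{int} B)$.
(b) $\operatorname{cl}(A \cap B) \subset (\operatorname{cl} A) \cap (\operatorname{cl} B)$.
Give examples in which = does not hold.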

*1.5 Convex sets


In this section we give a brief introduction to the theory of convex sets. Our
treatment of convexity continues in Section 3.6 with a discussion of convex
and concave functions.

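Proposition 1.4. If $K_1, \ldots, K_m$ are convex sets, then their intersection
$K_1 \cap \cdots \cap K_m$ is convex.
PROOF. Let $\mathbf{x}_1, \mathbf{x}_2$ be any two points of $K_1 \cap \cdots \cap K_m$, $\mathbf{x}_1 \neq \mathbf{x}_2$. Let $l$
denote the line segment joining $\mathbf{x}_1$ and $\mathbf{x}_2$. For each $j = 1, \ldots, m$, $\mathbf{x}_1, \mathbf{x}_2 \in K_j$.
Since $K_j$ is convex, $l \subset K_j$ for each $j = 1, \ldots, m$. Thus $l \subset K_1 \cap \cdots \cap K_m$. □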

In particular, Proposition 1.4 applies if each $K_j$ is a half-space. A set that is the intersection of a finite number of closed half-spaces is called a convex
polytope. Since a half-space is a convex set, any convex polytope is a convex
set.

EXAMPLE 1. Let T be a triangle in the plane $E^2$. Then T is the intersection of three half-planes, bounded by the lines through the sides of T.

A convex polytope is the set of all points $\mathbf{x}$ that satisfy a given finite system
of linear inequalities of the form $\mathbf{z}_j \cdot \mathbf{x} \ge c_j$, $j = 1, \ldots, m$. The theory of
linear programming is concerned with the problem of maximizing or
minimizing a linear function subject to such a system of linear inequalities.
It has various interesting economic and engineering applications [10, 13].
In Section 3.6 it is shown that the maximum and minimum values of a linear
function must occur at "extreme points" of K, at least if K is compact.
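
As an illustration of the description of a convex polytope by linear inequalities, the following Python sketch tests membership in a triangle written as three closed half-planes. The names `dot`, `in_polytope`, and `triangle`, and the particular triangle, are assumptions made for this example only.

```python
def dot(z, x):
    """Inner product z . x of two vectors of equal length."""
    return sum(a * b for a, b in zip(z, x))

def in_polytope(x, constraints):
    """True if x satisfies z_j . x >= c_j for every pair (z_j, c_j)."""
    return all(dot(z, x) >= c for z, c in constraints)

# Triangle with vertices (0,0), (1,0), (0,1):
#   x >= 0,  y >= 0,  -x - y >= -1.
triangle = [((1, 0), 0), ((0, 1), 0), ((-1, -1), -1)]
```

With this representation, `in_polytope((0.25, 0.25), triangle)` holds, while the point (1, 1) violates the third inequality.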


In the proof of Proposition 1.4 we did not really use the fact that the number of sets $K_j$ is finite. Therefore we have:

Proposition 1.5. The intersection of any collection of convex sets is a convex set. □

In particular, the intersection of any collection of half-spaces is convex. The intersection of any collection of closed sets is a closed set. Hence, if each
of the half-spaces is closed, the intersection is a closed, convex set. An important fact about closed convex sets is that, excluding trivial cases, the
converse holds. The converse can be stated in a slightly sharper form, in that
only half-spaces bounded by supporting hyperplanes need be used (Theorem
1.1). In order to do this we first state the following.

Definition. Let K be a closed convex set. Assume that K is neither the empty set nor $E^n$. A hyperplane P is called supporting for K if $P \cap K$
is not empty and K is contained in one of the two closed half-spaces
bounded by P.

If P is supporting for K, the set $P \cap K$ is convex by Proposition 1.4, and contains only boundary points of K (Problem 5). Given any boundary point
of K, there is at least one supporting hyperplane containing it. This can be
deduced from Theorem 1.1 and results to be proved in Section 2.4 (Problem 7).
If K has interior points and the boundary fr K is "sufficiently smooth,"
then given $\mathbf{y} \in \operatorname{fr} K$, there is just one supporting hyperplane containing $\mathbf{y}$.
It is the tangent hyperplane to fr K at $\mathbf{y}$, and can be found by the methods of
calculus. This will be explained in Section 4.7.
If, for example, T is a triangle in $E^2$, then each vertex is contained in an
infinite number of supporting lines to T. Each other boundary point is contained in a single supporting line, the line through the edge of T containing
it (Figure 1.8).

Figure 1.8 (supporting line)

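EXAMPLE 2. Let $B = \{\mathbf{x} : |\mathbf{x}| \le 1\}$, the closed unit n-ball. Then
$$\operatorname{fr} B = \{\mathbf{x} : |\mathbf{x}| = 1\}.$$
Given $\mathbf{y} \in \operatorname{fr} B$, let
$$H_{\mathbf{y}} = \{\mathbf{x} : \mathbf{y} \cdot \mathbf{x} \le 1\},$$
$$P_{\mathbf{y}} = \{\mathbf{x} : \mathbf{y} \cdot \mathbf{x} = 1\},$$
so that $P_{\mathbf{y}}$ is the hyperplane bounding $H_{\mathbf{y}}$. By Cauchy's inequality and the
fact that $|\mathbf{y}| = 1$,
$$\mathbf{y} \cdot \mathbf{x} \le |\mathbf{y}|\,|\mathbf{x}| = |\mathbf{x}|.$$
Equality holds if and only if $\mathbf{x}$ is a positive scalar multiple of $\mathbf{y}$. Hence $B \subset H_{\mathbf{y}}$
and $B \cap P_{\mathbf{y}}$ consists only of $\mathbf{y}$. The supporting hyperplane to B at $\mathbf{y}$ is $P_{\mathbf{y}}$
(Figure 1.9).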

Figure 1.9

Again let K be any nonempty, closed convex set that is a proper subset
of $E^n$ ($K \neq E^n$). Let $\mathcal{H}_K$ denote the collection of all closed half-spaces H such
that $K \subset H$ and the hyperplane P bounding H is supporting for K. For
instance, the collection $\mathcal{H}_B$ in the above example consists of the various half-spaces $H_{\mathbf{y}}$ for all possible choices of $\mathbf{y} \in \operatorname{fr} B$.
The notation
$$\bigcap_{H \in \mathcal{H}_K} H$$
stands for the intersection of all half-spaces $H \in \mathcal{H}_K$.

Theorem 1.1. $K = \bigcap_{H \in \mathcal{H}_K} H$.

PROOF. For convenience let us set
$$K_1 = \bigcap_{H \in \mathcal{H}_K} H.$$
Since $K \subset H$ for each $H \in \mathcal{H}_K$, $K \subset K_1$. Let us show that $K = K_1$. Suppose it is not. Then there exists some $\mathbf{x}_1 \in K_1 - K$. Since K is a closed
set, there is a point $\mathbf{x}_0 \in K$ nearest $\mathbf{x}_1$, that is, $|\mathbf{x} - \mathbf{x}_1| \ge |\mathbf{x}_0 - \mathbf{x}_1|$ for every
$\mathbf{x} \in K$ (Figure 1.10). This fact about closed sets is proved in Section 2.4.


Figure 1.10

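Consider the closed half-space
$$H_0 = \{\mathbf{x} : (\mathbf{x}_0 - \mathbf{x}_1) \cdot (\mathbf{x} - \mathbf{x}_0) \ge 0\}.$$
Then $\mathbf{x}_1 \notin H_0$ since
$$(\mathbf{x}_0 - \mathbf{x}_1) \cdot (\mathbf{x}_1 - \mathbf{x}_0) = -|\mathbf{x}_0 - \mathbf{x}_1|^2 < 0.$$
To show that $K \subset H_0$, let $\mathbf{x}$ be any point of K. Since K is convex, $t\mathbf{x} + (1 - t)\mathbf{x}_0 \in K$ for every $t \in [0, 1]$. Then
$$|t\mathbf{x} + (1 - t)\mathbf{x}_0 - \mathbf{x}_1|^2 \ge |\mathbf{x}_0 - \mathbf{x}_1|^2,$$
or
$$|(\mathbf{x}_0 - \mathbf{x}_1) + t(\mathbf{x} - \mathbf{x}_0)|^2 \ge |\mathbf{x}_0 - \mathbf{x}_1|^2,$$
$$|\mathbf{x}_0 - \mathbf{x}_1|^2 + 2t(\mathbf{x}_0 - \mathbf{x}_1) \cdot (\mathbf{x} - \mathbf{x}_0) + t^2|\mathbf{x} - \mathbf{x}_0|^2 \ge |\mathbf{x}_0 - \mathbf{x}_1|^2.$$
Subtracting $|\mathbf{x}_0 - \mathbf{x}_1|^2$ from both sides and dividing by t, we get for $0 < t \le 1$
$$2(\mathbf{x}_0 - \mathbf{x}_1) \cdot (\mathbf{x} - \mathbf{x}_0) + t|\mathbf{x} - \mathbf{x}_0|^2 \ge 0.$$
Letting $t \to 0^+$ we find that $2(\mathbf{x}_0 - \mathbf{x}_1) \cdot (\mathbf{x} - \mathbf{x}_0) \ge 0$, which shows that
$\mathbf{x} \in H_0$. Thus $K \subset H_0$. The boundary of $H_0$ is
$$P_0 = \{\mathbf{x} : (\mathbf{x}_0 - \mathbf{x}_1) \cdot (\mathbf{x} - \mathbf{x}_0) = 0\},$$
and $\mathbf{x}_0 \in P_0$. Hence $P_0 \cap K$ is not empty, and therefore $P_0$ is supporting for K.
This shows that $H_0 \in \mathcal{H}_K$. Consequently, $K_1 \subset H_0$. But $\mathbf{x}_1 \in K_1$, $\mathbf{x}_1 \notin H_0$,
a contradiction. Therefore $K = K_1$. □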
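
The construction in the proof can be tried on a concrete set. The Python sketch below takes K to be the closed unit disk in $E^2$, for which the nearest point to an outside point $\mathbf{x}_1$ is $\mathbf{x}_1/|\mathbf{x}_1|$, and tests membership in the half-space $H_0$. The helper names and the tolerance `tol` (which absorbs floating-point rounding) are assumptions of this illustration, not part of the proof.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def nearest_in_unit_ball(x1):
    """Point of K = {x : |x| <= 1} nearest to x1 (x1 itself if already in K)."""
    length = math.sqrt(dot(x1, x1))
    return list(x1) if length <= 1 else [c / length for c in x1]

def in_H0(x, x0, x1, tol=1e-12):
    """Membership in H0 = {x : (x0 - x1) . (x - x0) >= 0}, up to tolerance tol."""
    return dot(sub(x0, x1), sub(x, x0)) >= -tol

x1 = [3.0, 4.0]                       # a point outside K
x0 = nearest_in_unit_ball(x1)         # nearest point of K, here (0.6, 0.8)
ball_points = [[math.cos(k * math.pi / 18), math.sin(k * math.pi / 18)]
               for k in range(36)]    # points on fr K
ball_points += [[0.0, 0.0], [0.5, -0.5]]
```

In this example $H_0$ contains every sampled point of K but not $\mathbf{x}_1$, which is the separation used to reach the contradiction.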
Convex combinations
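The definition of convex set is expressed in terms of pairs of points. It can also
be given in terms of convex combinations of any finite number m of points.
Let $\mathbf{x}_1, \ldots, \mathbf{x}_m$ be distinct points ($\mathbf{x}_j \neq \mathbf{x}_k$ if $j \neq k$).

Definition. A point $\mathbf{x}$ is a convex combination of $\mathbf{x}_1, \ldots, \mathbf{x}_m$ if there exist scalars $t^1, \ldots, t^m$ such that
$$\mathbf{x} = \sum_{j=1}^{m} t^j \mathbf{x}_j, \qquad \sum_{j=1}^{m} t^j = 1, \qquad t^j \ge 0 \text{ for } j = 1, \ldots, m.$$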


Figure 1.11

To say that $\mathbf{x}$ is a convex combination of two points of S is merely to say that $\mathbf{x}$ lies on some line segment with endpoints in S. For instance, if S is the
circle with equation $x^2 + y^2 = a^2$, then every point in the circular disk
$\{(x, y) : x^2 + y^2 \le a^2\}$ bounded by S is a convex combination of two points
of S.
On the other hand, if S consists of three noncollinear points $(x_0, y_0)$,
$(x_1, y_1)$, $(x_2, y_2)$, then each boundary point of the triangle with these points
as vertices is a convex combination of two points of S, but the interior points
are not. However, each interior point (x, y) is a convex combination of $(x_2, y_2)$
and some point (u, v) on the edge opposite $(x_2, y_2)$ (Figure 1.11). Since (u, v) is
a convex combination of $(x_0, y_0)$ and $(x_1, y_1)$, we can write (x, y) as a convex
combination of the three points $(x_0, y_0)$, $(x_1, y_1)$, $(x_2, y_2)$ as follows. Writing
$\mathbf{x} = (x, y)$, $\mathbf{x}_j = (x_j, y_j)$, there exist $s, t \in [0, 1]$ such that
$$\mathbf{x} = t[s\mathbf{x}_0 + (1 - s)\mathbf{x}_1] + (1 - t)\mathbf{x}_2 = t^0\mathbf{x}_0 + t^1\mathbf{x}_1 + t^2\mathbf{x}_2,$$
where $t^0 = ts$, $t^1 = t(1 - s)$, $t^2 = 1 - t$ are nonnegative and $t^0 + t^1 + t^2 = 1$.

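Proposition 1.6. A set K is convex if and only if every convex combination of points of K is a point of K.
PROOF. Let K be convex. Let us prove by induction on m that if $\mathbf{x}$ is any
convex combination of $\mathbf{x}_1, \ldots, \mathbf{x}_m \in K$, then $\mathbf{x} \in K$. The case m = 2 is the
definition of convexity. Assuming the result true for the integer $m \ge 2$, let
$\mathbf{x}$ be a convex combination of points $\mathbf{x}_1, \ldots, \mathbf{x}_{m+1}$ of K,
$$\mathbf{x} = \sum_{j=1}^{m+1} t^j \mathbf{x}_j, \qquad 1 = \sum_{j=1}^{m+1} t^j, \qquad t^j \ge 0 \text{ for } j = 1, \ldots, m + 1.$$
If $t^{m+1} = 1$, then $t^j = 0$ for $j \le m$ and $\mathbf{x} = \mathbf{x}_{m+1}$ is in K. If $t^{m+1} < 1$, let
$$t = \sum_{j=1}^{m} t^j = 1 - t^{m+1}, \qquad s^j = t^j / t \text{ for } j = 1, \ldots, m, \qquad \mathbf{y} = \sum_{j=1}^{m} s^j \mathbf{x}_j.$$
Then $\mathbf{y}$ is a convex combination of $\mathbf{x}_1, \ldots, \mathbf{x}_m$. By the induction hypothesis, $\mathbf{y} \in K$. But
$$\mathbf{x} = t\mathbf{y} + (1 - t)\mathbf{x}_{m+1},$$
and $t \in [0, 1]$. Hence $\mathbf{x} \in K$.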


Conversely, assume that every convex combination of points of K is a point of K. In particular, this is true for convex combinations of any pair
$\mathbf{x}_1, \mathbf{x}_2$ of points of K. Hence K is convex. □

Let $\mathbf{x}_0, \mathbf{x}_1, \ldots, \mathbf{x}_r$, $r \le n$, be distinct points of $E^n$ such that the differences $\mathbf{x}_1 - \mathbf{x}_0, \ldots, \mathbf{x}_r - \mathbf{x}_0$ form a linearly independent set. The set of all convex
combinations of $\mathbf{x}_0, \mathbf{x}_1, \ldots, \mathbf{x}_r$ is called the r-simplex with vertices $\mathbf{x}_0$,
$\mathbf{x}_1, \ldots, \mathbf{x}_r$. A 1-simplex is a line segment, a 2-simplex a triangle, and a 3-simplex a tetrahedron. According to Proposition 1.6, any simplex whose
vertices lie in a convex set K is contained in K (Figure 1.12).

Figure 1.12

A point $\mathbf{x}$ of an r-simplex can be written in a unique way as a convex combination
$$\mathbf{x} = \sum_{j=0}^{r} t^j \mathbf{x}_j$$
of the vertices $\mathbf{x}_0, \mathbf{x}_1, \ldots, \mathbf{x}_r$ (Problem 4). The numbers $t^0, t^1, \ldots, t^r$ are called
the barycentric coordinates of $\mathbf{x}$. The $(r - 1)$-dimensional face opposite the
vertex $\mathbf{x}_i$ is the set of points of the r-simplex with $t^i = 0$.
For example, the vertices $\mathbf{x}_0, \mathbf{x}_1, \mathbf{x}_2$ of a triangle have barycentric coordinates (1, 0, 0), (0, 1, 0), (0, 0, 1), respectively. The midpoint of the face opposite
$\mathbf{x}_0$ has barycentric coordinates $(0, \tfrac{1}{2}, \tfrac{1}{2})$. The interior points of the triangle
have barycentric coordinates $(t^0, t^1, t^2)$, all of which are strictly positive. In
each case $t^0 + t^1 + t^2 = 1$.
The simplex with vertices $\mathbf{0}, \mathbf{e}_1, \ldots, \mathbf{e}_n$ is called the standard n-simplex.
It will be denoted by $\Sigma$, and is of use in Section 5.7 in connection with integration. The barycentric coordinates $(t^0, t^1, \ldots, t^n)$ of a point $\mathbf{x} \in \Sigma$ are given
by $t^i = x^i$ for $i = 1, \ldots, n$, $t^0 = 1 - (x^1 + \cdots + x^n)$.
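
For a triangle in $E^2$ the barycentric coordinates can be computed by solving $\mathbf{x} - \mathbf{x}_0 = t^1(\mathbf{x}_1 - \mathbf{x}_0) + t^2(\mathbf{x}_2 - \mathbf{x}_0)$, as in the hint to Problem 4. The following Python sketch does this with Cramer's rule and exact rational arithmetic; the function name `barycentric` and the sample triangle are illustrative assumptions.

```python
from fractions import Fraction

def barycentric(x, x0, x1, x2):
    """Barycentric coordinates (t0, t1, t2) of x with respect to the
    triangle with vertices x0, x1, x2 in E^2 (vertices must not be collinear)."""
    a, b = x1[0] - x0[0], x2[0] - x0[0]
    c, d = x1[1] - x0[1], x2[1] - x0[1]
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("x1 - x0 and x2 - x0 are linearly dependent")
    px, py = x[0] - x0[0], x[1] - x0[1]
    # Cramer's rule for  a*t1 + b*t2 = px,  c*t1 + d*t2 = py.
    t1 = (px * d - b * py) / det
    t2 = (a * py - px * c) / det
    return (1 - t1 - t2, t1, t2)
```

With vertices (0, 0), (4, 0), (0, 4), the midpoint (2, 2) of the face opposite the first vertex has coordinates $(0, \tfrac{1}{2}, \tfrac{1}{2})$. For the standard 2-simplex, with vertices (0, 0), (1, 0), (0, 1), the result agrees with $t^i = x^i$, $t^0 = 1 - (x^1 + x^2)$.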

Further results about convex combinations


In the definition of convex combination, no upper bound was put on the
positive integer m. However, for most purposes one need consider only
$m \le n + 1$. More precisely:

Proposition 1.7. If $S \subset E^n$ and $\mathbf{x}$ is a convex combination of points of S, then $\mathbf{x}$ is a convex combination of n + 1 or fewer points of S.


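PROOF. Let $\mathbf{x}$ be a convex combination with $m > n + 1$, $t^j > 0$ for $j = 1, \ldots, m$, and $\mathbf{x}_1, \ldots, \mathbf{x}_m \in S$. Let us show that $\mathbf{x}$ is a convex combination
of m - 1 points of S. Since m - 1 > n, there exist $c^1, \ldots, c^{m-1}$ not all 0 such
that
$$c^1(\mathbf{x}_1 - \mathbf{x}_m) + \cdots + c^{m-1}(\mathbf{x}_{m-1} - \mathbf{x}_m) = \mathbf{0}.$$
Let $c^m = -(c^1 + \cdots + c^{m-1})$. Then
$$\sum_{j=1}^{m} c^j \mathbf{x}_j = \mathbf{0}, \qquad \sum_{j=1}^{m} c^j = 0.$$
Let
$$s^j = t^j - \alpha c^j \quad \text{for } j = 1, \ldots, m,$$
where $\alpha$ is a positive number chosen so that $s^j \ge 0$ for each $j = 1, \ldots, m$
and $s^k = 0$ for some k. Explicitly,
$$\frac{1}{\alpha} = \max\left\{\frac{c^1}{t^1}, \ldots, \frac{c^m}{t^m}\right\}.$$
Then
$$\mathbf{x} = \sum_{j=1}^{m} s^j \mathbf{x}_j, \qquad \sum_{j=1}^{m} s^j = 1,$$
and consequently $\mathbf{x}$ is a convex combination of the m - 1 points $\mathbf{x}_1, \ldots, \mathbf{x}_{k-1}$, $\mathbf{x}_{k+1}, \ldots, \mathbf{x}_m$.
Either m - 1 = n + 1, or else the same argument shows that $\mathbf{x}$ is a
convex combination of m - 2 points of S. Continuing, we find that $\mathbf{x}$ must
be a convex combination of n + 1 or fewer points of S. □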
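
The proof of Proposition 1.7 is constructive, and its reduction step can be sketched in Python. This is a minimal illustration under stated assumptions: the function names (`null_vector`, `reduce_once`, `reduce_combination`) are hypothetical, exact rational arithmetic from the standard `fractions` module is used, and the dependence coefficients $c^j$ are found by Gaussian elimination, a detail the proof leaves open.

```python
from fractions import Fraction

def null_vector(cols):
    """Nonzero c with sum_j c[j] * cols[j] = 0, for k vectors in E^n, k > n."""
    k, n = len(cols), len(cols[0])
    rows = [[Fraction(cols[j][i]) for j in range(k)] for i in range(n)]
    pivots, r = [], 0
    for col in range(k):                       # reduce to row echelon form
        p = next((i for i in range(r, n) if rows[i][col] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        piv = rows[r][col]
        rows[r] = [v / piv for v in rows[r]]
        for i in range(n):
            if i != r and rows[i][col] != 0:
                f = rows[i][col]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append(col)
        r += 1
        if r == n:
            break
    free = next(j for j in range(k) if j not in pivots)
    c = [Fraction(0)] * k
    c[free] = Fraction(1)                      # set one free variable to 1
    for i, col in enumerate(pivots):
        c[col] = -rows[i][free]
    return c

def reduce_once(points, t):
    """One step of the proof: rewrite x = sum t[j] * points[j] (all t[j] > 0,
    more than n + 1 points) using at least one point fewer."""
    last = points[-1]
    diffs = [[a - b for a, b in zip(p, last)] for p in points[:-1]]
    c = null_vector(diffs)
    c.append(-sum(c))                          # c^m = -(c^1 + ... + c^{m-1})
    inv_alpha = max(cj / tj for cj, tj in zip(c, t))   # 1/alpha > 0
    s = [tj - cj / inv_alpha for cj, tj in zip(c, t)]
    keep = [j for j in range(len(points)) if s[j] != 0]
    return [points[j] for j in keep], [s[j] for j in keep]

def reduce_combination(points, t):
    """Repeat the step until at most n + 1 points remain."""
    n = len(points[0])
    pairs = [(p, Fraction(v)) for p, v in zip(points, t) if v != 0]
    points, t = [p for p, _ in pairs], [v for _, v in pairs]
    while len(points) > n + 1:
        points, t = reduce_once(points, t)
    return points, t
```

For example, the center (1, 1) of the square with vertices (0, 0), (2, 0), (2, 2), (0, 2), written with weight 1/4 on each vertex, reduces to a convex combination of at most three of the vertices.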

If S is the set of vertices of an n-simplex T and each of the barycentric coordinates of $\mathbf{x}$ is positive, then $\mathbf{x}$ is not a convex combination of fewer than
n + 1 points of S. Hence the number n + 1 is the best possible in Proposition
1.7.
A slightly better result is possible if S is a connected set. A set S is called
disconnected if there exist open sets D and E such that $S \subset D \cup E$, $D \cap S$
and $E \cap S$ are both nonempty, but $D \cap E \cap S$ is empty. Connected means
not disconnected. For further discussion of connectedness, see Section 2.7.

Proposition 1.8. If S is a connected subset of $E^n$ and $\mathbf{x}$ is a convex combination of points of S, then $\mathbf{x}$ is a convex combination of n or fewer points of S.
PROOF. Suppose that $\mathbf{x}^*$ is a point which is a convex combination of n + 1
points $\mathbf{x}_0, \mathbf{x}_1, \ldots, \mathbf{x}_n$ of a connected set S, but not of fewer than n + 1 points
of S. The differences $\mathbf{x}_1 - \mathbf{x}_0, \ldots, \mathbf{x}_n - \mathbf{x}_0$ form a linearly independent set;
if not, the reasoning used to prove the proposition above shows that $\mathbf{x}^*$
is a convex combination of n of the points $\mathbf{x}_0, \mathbf{x}_1, \ldots, \mathbf{x}_n$. Therefore $\mathbf{x}_0$,
$\mathbf{x}_1, \ldots, \mathbf{x}_n$ are the vertices of an n-simplex T, and all the barycentric coordinates
of $\mathbf{x}^*$ are positive. Let $T_0$ be the face of T opposite $\mathbf{x}_0$, and
$$K_0 = \{\mathbf{x} : \mathbf{x}^* = t\mathbf{x} + (1 - t)\mathbf{y}, \text{ where } \mathbf{y} \in T_0 \text{ and } t \in [0, 1]\}.$$
$K_0$ is a convex polytope, and its boundary $\operatorname{fr} K_0$ consists of portions of the
hyperplanes which contain $\mathbf{x}^*$ and the $(n - 2)$-dimensional faces of $T_0$ (we
leave the verification of this to the reader). If $\operatorname{fr} K_0$ intersects S, then $\mathbf{x}^*$ is
a convex combination of fewer than n + 1 points of S, contrary to hypothesis
(Figure 1.13). Hence $S \cap \operatorname{fr} K_0$ is empty. The interior $\operatorname{int} K_0$ and the complement $K_0^c = E^n - K_0$ are open sets, their union contains S, and their intersection is empty. But $\mathbf{x}_0 \in \operatorname{int} K_0$ and $\mathbf{x}_i \in K_0^c$ for $i = 1, \ldots, n$. This implies
that S is disconnected, which is a contradiction. □

Figure 1.13

By slightly refining the proof, an even stronger result is obtained. Suppose that $S = S_1 \cup \cdots \cup S_k$, where $k \le n$ and $S_1, \ldots, S_k$ are connected sets.
For each $i = 0, 1, \ldots, n$ consider the corresponding convex polytope $K_i$. Then
$\operatorname{int} K_i \cap \operatorname{int} K_j$ is empty whenever $i \neq j$ and $S \cap \operatorname{fr} K_i$ is empty for every i.
Moreover, $\mathbf{x}_i \in \operatorname{int} K_i$. Since $k \le n$, some pair of the points $\mathbf{x}_i, \mathbf{x}_j$ must belong
to the same set $S_p$. Then $S_p$ is not connected, a contradiction. Hence, if S is
the union of n or fewer connected sets, every $\mathbf{x}$ which is a convex combination
of points of S is a convex combination of n or fewer points of S.

PROBLEMS

1. Show that each of the following subsets of $E^2$ is closed and convex by writing it as
the intersection of closed half-planes:
(a) The regular hexagon with center (0, 0) and $\mathbf{e}_1$ as one vertex.
(b) $\{(x, y) : y \ge |x|,\ -1 \le x \le 1\}$.
(c) $\{(x, y) : y \le \log x,\ x > 0\}$.
(d) $\{(x, y) : 0 \le y \le \sin x,\ 0 \le x \le \pi\}$.
2. Write the standard n-simplex as the intersection of n + 1 closed half-spaces.
Illustrate for n = 2 and n = 3.


3. Write tel + !e z as a convex combination of e l , 1ez - el' Also write it as a convex


combination of 0, e z , e l + e z . Illustrate.
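4. Show that if $\mathbf{x}$ can be represented in two ways as a convex combination of
$\mathbf{x}_0, \mathbf{x}_1, \ldots, \mathbf{x}_r$, then $\mathbf{x}_1 - \mathbf{x}_0, \ldots, \mathbf{x}_r - \mathbf{x}_0$ form a linearly dependent set. [Hint:
If $\mathbf{x} = t^0\mathbf{x}_0 + \cdots + t^r\mathbf{x}_r$ and $t^0 + \cdots + t^r = 1$, then $\mathbf{x} - \mathbf{x}_0 = t^1(\mathbf{x}_1 - \mathbf{x}_0) + \cdots + t^r(\mathbf{x}_r - \mathbf{x}_0)$.]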

5. Prove that a supporting hyperplane for a closed convex set K can contain no
interior point of K.
6. Let K be any convex set. Prove that its interior and its closure are also convex sets.
7. The barycenter of an r-simplex is the point at which the barycentric coordinates
are equal, $t^0 = t^1 = \cdots = t^r$.
(a) Show that the barycenter of a triangle is at the intersection of the medians.
(b) State and prove a corresponding result for $r \ge 3$.
8. Let $\mathbf{x}$ be a convex combination of $\mathbf{x}_1, \ldots, \mathbf{x}_m$ and let $\mathbf{x}_j$ be a convex combination
of $\mathbf{y}_{j1}, \ldots, \mathbf{y}_{jm_j}$, $j = 1, \ldots, m$. Show that $\mathbf{x}$ is a convex combination of $\mathbf{z}_1, \ldots, \mathbf{z}_p$,
which are the distinct elements of the set $\{\mathbf{y}_{jk} : k = 1, \ldots, m_j,\ j = 1, \ldots, m\}$.
9. Let S be any subset of $E^n$. The set $\hat{S}$ of all convex combinations of points of S is the
convex hull of S.
(a) Using Problem 8, show that $\hat{S}$ is convex.
(b) Using Proposition 1.6, show that if K is convex and $S \subset K$, then $\hat{S} \subset K$. Thus
the convex hull is the smallest convex set containing S.
10. Given $\mathbf{x}_0$ and $\delta > 0$, let $C = \{\mathbf{x} : |x^i - x_0^i| \le \delta,\ i = 1, \ldots, n\}$, an n-cube with
center $\mathbf{x}_0$ and side length $2\delta$. The vertices of C are those $\mathbf{x}$ with $|x^i - x_0^i| = \delta$ for
$i = 1, \ldots, n$. Show that C is the convex hull of its set of vertices. [Hint: Use induction
on n.]
11. Let K be a closed subset of En such that both K and its complement En - K are
nonempty convex sets. Prove that K is a half-space.
12. Let A and B be convex subsets of En. The join of A and B is the set of all X such that
X lies on a line segment with one endpoint in A and the other in B. Show that the
join of A and B is a convex set.

2
Elementary topology of En

In this chapter we explore various ramifications of the topological notions introduced in Section 1.4. We begin with some basic properties of functions,
limits, and continuity. Then we turn in Sections 2.3 and 2.4 to some properties
of $E^n$ that depend essentially on the least upper bound axiom for real numbers.
Among these properties are the convergence of Cauchy sequences (Theorem
2.2) and the Bolzano-Weierstrass theorem. In Section 2.6 a very general
concept is introduced, that of topological space. In particular, any set
$S \subset E^n$ becomes a topological space with the relative topology which S
inherits from $E^n$. The concepts of connected set and compact set are introduced in Sections 2.7 and 2.8. It turns out that these properties are preserved
by continuous functions (Theorems 2.8 and 2.10). At the end of the chapter,
the concept of metric space is introduced, and some interesting special cases
are considered. Uniform convergence of a sequence of functions is studied as
convergence in a certain metric on a space of bounded functions.

2.1 Functions
One should think of a function f as assigning to each element p of some set
S an element f(p) of another set T. The element f(p) is called the value of f
at p. However, this is not a satisfactory definition of "function" because of the
ambiguity of the word "assigns." We give a more careful definition, in terms of
cartesian product sets.

Cartesian product sets


If S and T are sets, then the cartesian product set $S \times T$ is formed by taking
all ordered pairs (p, q) where $p \in S$ and $q \in T$. For example, if $S = \{1, 2, \ldots, n\}$
and $T = \{1, 2, \ldots, m\}$, then the elements of $S \times T$ are pairs (i, j) of positive
integers with $1 \le i \le n$, $1 \le j \le m$. In the same way, the plane $E^2$ is the cartesian product $E^1 \times E^1$.
If $S_1, \ldots, S_n$ are sets, then the n-fold cartesian product $S_1 \times \cdots \times S_n$ is
formed by taking all (ordered) n-tuples $(p_1, \ldots, p_n)$, where $p_i \in S_i$ for each
$i = 1, \ldots, n$. In particular, $E^n = E^1 \times \cdots \times E^1$. In $E^2$, the cartesian product
$[a, b] \times [c, d]$ is the rectangle $\{(x, y) : a \le x \le b,\ c \le y \le d\}$.
Relations, functions
Any subset f of the cartesian product $S \times T$ is called a relation between S and
T. A relation f is called a function if for every $p \in S$ there is exactly one $q \in T$
such that $(p, q) \in f$. This element q is denoted by f(p).
The set S is the domain of f. We shall sometimes say that f is a function
from S into T. If for every $q \in T$ there is some $p \in S$ such that q = f(p), then
we say that f is onto T. A function f is univalent (or one-one) if $p_1 \neq p_2$ implies
that $f(p_1) \neq f(p_2)$.
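
For finite sets, the definitions above translate directly into code. The following Python sketch represents a relation as a set of ordered pairs; the function names `is_function`, `is_onto`, and `is_univalent` are illustrative, and the last two assume that the relation has already been checked to be a function from S into T.

```python
def is_function(rel, S):
    """True if for every p in S there is exactly one q with (p, q) in rel,
    and every first component of rel belongs to S."""
    if any(p not in S for p, _ in rel):
        return False
    return all(sum(1 for a, _ in rel if a == p) == 1 for p in S)

def is_onto(rel, T):
    """True if every q in T occurs as a value (rel assumed to be a function into T)."""
    return {q for _, q in rel} == set(T)

def is_univalent(rel):
    """True if distinct elements have distinct values (rel assumed to be a function)."""
    values = [q for _, q in rel]
    return len(values) == len(set(values))

S = {-1, 0, 1}
T = {0, 1}
f = {(p, p * p) for p in S}          # the squaring function as a set of pairs
```

Here f is a function from S onto T, but it is not univalent, since -1 and 1 have the same value.
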
This book is concerned with the calculus of functions whose domains are
subsets of En. Such functions are frequently called by the suggestive but
imprecise name, "functions of n real variables." We may occasionally use this
name in passages intended to motivate a more careful discussion to follow.
However, we never try to make precise the phrase" n real variables." It was
only after such vague terms as "variable" and" quantity" were abandoned
that calculus was put on a foundation acceptable by present-day standards.
A function f from a set S into $E^1$ is a real valued function. When $S \subset E^1$,
f is a real valued "function of one real variable." Among such functions are
the algebraic functions and elementary transcendental functions (sin, cos,
tan, log, etc.), which should be familiar from elementary calculus. The exponential function is denoted in this book by "exp." Thus $\exp x = e^x$, where
e is the base for natural logarithms.
Functions with values in some euclidean Em, m > 1, are called vector
valued and are indicated by boldface letters (say, f or g). A vector valued
function f from a subset D of some euclidean En into Em is also called a
transformation from D into Em. By merely writing "transformation" in place
of "vector-valued function," we have of course introduced no new mathe-
matical idea. However, the word "transformation" is supposed to have a
geometric flavor which aids intuition. Some authors say "mapping" instead
of" transformation." The differential calculus of transformations is developed
in Chapter 4.
If f and g are functions with the same domain S and values in a vector
space $\mathcal{V}$, then the sum f + g is defined by
$$(f + g)(p) = f(p) + g(p)$$
for every $p \in S$. In particular, it makes sense to speak of sums of real valued
functions or of transformations. If f has values in $\mathcal{V}$ and $\phi$ is real valued, then
$\phi f$ is the $\mathcal{V}$-valued function given by
$$(\phi f)(p) = \phi(p) f(p)$$
for every $p \in S$. If $\phi$ is a constant function, $\phi(p) = c$ for every $p \in S$, then we
write cf instead of $\phi f$.

Restriction of a function
Often one is interested only in the values of a function f for elements of
some subset A of its domain. The restriction of f to A is the function with
domain A and the same values as f there. It is denoted by $f|A$. Thus
$$f|A = \{(p, f(p)) : p \in A\}.$$
For instance, if a real-valued function f is integrated over an interval $I \subset E^1$,
then it is only $f|I$ which is important. The values of f outside I do not affect
the integral.
Images, inverse images
Let f be a function from a set S into a set T. The image under f of a set $A \subset S$
is the set $f(A) = \{f(p) : p \in A\}$. It is a subset of T, and in fact the restriction
$f|A$ is a function from A onto f(A). The inverse image of a set $B \subset T$ is the
set $f^{-1}(B) = \{p : f(p) \in B\}$. It is a subset of S.

EXAMPLE 1. Let $f(x) = x^2$. Then $f([-2, 2]) = [0, 4]$, $f(E^1) = [0, \infty)$,
$f^{-1}([1, 3]) = [-\sqrt{3}, -1] \cup [1, \sqrt{3}]$. The function f is not univalent since
f(-x) = f(x).

EXAMPLE 2. Let f be a function from a set S into a set T. Show that
$$(*) \qquad A \subset f^{-1}(f(A))$$
for any $A \subset S$. Consider any $p \in A$. Then $f(p) \in f(A)$ by definition of f(A).
Take B = f(A) in the definition of inverse image set above. Then
$p \in f^{-1}(f(A))$. Since this is true for each $p \in A$, we get (*).

EXAMPLE 3. Show that if f is univalent in Example 2, then
$$(**) \qquad A = f^{-1}(f(A)).$$
It suffices to show that $A \supset f^{-1}(f(A))$, since the opposite inclusion is (*).
Consider any $p \in f^{-1}(f(A))$. Then $f(p) \in f(A)$. Therefore $f(p) = f(p')$ for
some $p' \in A$. Since f is univalent, $p = p'$. Thus $p \in A$ as required.
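
Examples 2 and 3 can be illustrated on a finite domain. In the Python sketch below, the helper names `image` and `inverse_image`, the domain S, and the two sample functions are assumptions chosen for illustration; the squaring function shows that the inclusion (*) can be strict, while the univalent cubing function gives equality as in (**).

```python
def image(f, A):
    """f(A) = {f(p) : p in A}."""
    return {f(p) for p in A}

def inverse_image(f, S, B):
    """f^{-1}(B) = {p in S : f(p) in B}, where S is the (finite) domain of f."""
    return {p for p in S if f(p) in B}

def square(p):
    return p * p          # not univalent on S

def cube(p):
    return p ** 3         # univalent on S

S = set(range(-3, 4))     # domain {-3, ..., 3}
A = {1, 2}
back_square = inverse_image(square, S, image(square, A))
back_cube = inverse_image(cube, S, image(cube, A))
```

Here `back_square` is {-2, -1, 1, 2}, which properly contains A, while `back_cube` equals A.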

PROBLEMS
1. (a) Let f(x) = cos x. Find $f(E^1)$, $f([-\pi/4, \pi/2])$, $f^{-1}([0, 1])$.
(b) Let $g = f|[0, \pi]$. Find $g([0, \pi])$, $g^{-1}([0, 1])$. Is g univalent?
2. The equations $s = (x^2 + y^2)^{1/2}$, $t = x - y$ define a transformation $\mathbf{f}$ from $E^2$ into $E^2$,
such that $\mathbf{f}(x, y) = (s, t)$. Let $A = \{(x, y) : x^2 + y^2 \le a^2\}$, where a > 0 is given.
(a) Find $\mathbf{f}(A)$.
(b) Find $\mathbf{f}^{-1}(A)$.


3. Let f be a function from S into T. Show that, for any $B \subset T$:
(a) $B \supset f(f^{-1}(B))$.
(b) $B = f(f^{-1}(B))$, if f is onto B.

4. Let f be a function from S into T. Show that, for any subsets A and B of S:
(a) $f(A \cup B) = f(A) \cup f(B)$.
(b) $f(A \cap B) \subset f(A) \cap f(B)$.
(c) If f is univalent, then $f(A \cap B) = f(A) \cap f(B)$.

5. Let f be a function from S into T. Show that, for any subsets C and D of T:
(a) $f^{-1}(C \cup D) = f^{-1}(C) \cup f^{-1}(D)$.
(b) $f^{-1}(C \cap D) = f^{-1}(C) \cap f^{-1}(D)$.
(c) $f^{-1}(D^c) = [f^{-1}(D)]^c$.

6. Let S and T be sets, and let $\pi(p, q) = p$ for all $p \in S$, $q \in T$. The function $\pi$ projects
$S \times T$ onto S. Let $R \subset S \times T$ be a relation. Show that R is a function if and only if
$\pi|R$ is univalent and onto S.

7. Let $1 \le s \le n - 1$. Let us regard $E^n$ as the cartesian product $E^s \times E^{n-s}$, and write
$\mathbf{x} = (\mathbf{x}', \mathbf{x}'')$, where $\mathbf{x}' = (x^1, \ldots, x^s)$, $\mathbf{x}'' = (x^{s+1}, \ldots, x^n)$. Let $\pi(\mathbf{x}) = \mathbf{x}'$ be the
projection of $E^n$ onto $E^s$. Show that $\pi(A)$ is an open subset of $E^s$ if A is an open subset
of $E^n$.

8. Let $A \subset E^s$, $B \subset E^{n-s}$, and regard the cartesian product $A \times B$ as a subset of $E^n$,
as in Problem 7.
(a) Show that $A \times B$ is open if both A and B are open.
(b) Show that $A \times B$ is closed if both A and B are closed.

2.2 Limits and continuity of transformations


Let us now suppose that $f$ is a function from a set $D \subset E^n$ into $E^m$, where $n$
and $m$ are positive integers. As already mentioned, such functions are called
transformations in this book.
The definition of "limit" for transformations is patterned after the one
encountered in elementary calculus for real-valued functions of one variable.
A punctured neighborhood of $x_0$ is a neighborhood with the center $x_0$ removed.
Let us assume that $D$ contains some punctured neighborhood of
$x_0$. For the definition of "limit," $x_0$ itself need not be in $D$. If $x_0 \in D$, the
value of $f$ at $x_0$ is irrelevant.

Definition. If for every neighborhood $V$ of $y_0$ there is a punctured neighborhood
$U$ of $x_0$ such that $f(U) \subset V$, then $y_0$ is the limit of the transformation
$f$ at $x_0$ (Figure 2.1).

In the definition it is understood that the radius of $U$ is small enough so
that $U \subset D$. The notations $y_0 = \lim_{x \to x_0} f(x)$ and $f(x) \to y_0$ as $x \to x_0$ are used
to mean that $y_0$ is the limit of $f$ at $x_0$.

[Figure 2.1. Labels in the figure: $D$, $f$, $V$, $f(U)$.]

If we let $\epsilon$ and $\delta$ denote the radii of $V$ and $U$ respectively, then the definition
may be rephrased as follows: $f(x) \to y_0$ as $x \to x_0$ if for every $\epsilon > 0$
there exists $\delta > 0$ such that $|f(x) - y_0| < \epsilon$ whenever $0 < |x - x_0| < \delta$.
The number $\delta$ depends, of course, on $\epsilon$ and may also depend on $x_0$. Given $\epsilon$
and $x_0$, there is a largest possible $\delta$. However, there is ordinarily no reason
to try to calculate it.
Let us first show that limits behave properly with respect to sums and
products. Let $f$ and $g$ have the same domain and values in the same euclidean
$E^m$.

Proposition 2.1. If $y_0 = \lim_{x \to x_0} f(x)$ and $z_0 = \lim_{x \to x_0} g(x)$, then

(1) $y_0 + z_0 = \lim_{x \to x_0} [f(x) + g(x)]$.
(2) $c y_0 = \lim_{x \to x_0} c f(x)$, for any scalar $c$.
(3) $y_0 \cdot z_0 = \lim_{x \to x_0} f(x) \cdot g(x)$.

PROOF. Let $V$ be any neighborhood of $y_0 + z_0$ and let $\epsilon$ be its radius. Let
$V_1$, $V_2$ be the neighborhoods of radius $\epsilon/2$ of $y_0$, $z_0$, respectively. If $y \in V_1$
and $z \in V_2$, then
$$(y + z) - (y_0 + z_0) = (y - y_0) + (z - z_0).$$
By the triangle inequality
$$|(y + z) - (y_0 + z_0)| \le |y - y_0| + |z - z_0| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.$$

Hence $y + z \in V$. By hypothesis there exist punctured neighborhoods $U_1$, $U_2$
of $x_0$ such that $f(U_1) \subset V_1$ and $g(U_2) \subset V_2$. Let $U = U_1 \cap U_2$, which is
also a punctured neighborhood of $x_0$. If $x \in U$, then $f(x) \in V_1$, $g(x) \in V_2$.
Consequently, $f(x) + g(x) \in V$, which shows that $(f + g)(U) \subset V$. This
proves (1). The proof of (2) is left to the reader (Problem 2).
To prove (3) let $V_0$ be the neighborhood of $y_0$ of radius 1, and $U_0$ be a
punctured neighborhood of $x_0$ such that $f(U_0) \subset V_0$. Let

$$C = \max\{|y_0| + 1, |z_0|\}.$$


If $y \in V_0$, then $y = y_0 + (y - y_0)$. By the triangle inequality

$$|y| \le |y_0| + |y - y_0| < |y_0| + 1,$$

and hence $|y| < C$. Now
$$f(x) \cdot g(x) - y_0 \cdot z_0 = f(x) \cdot [g(x) - z_0] + z_0 \cdot [f(x) - y_0].$$
From the triangle inequality and Cauchy's inequality,
$$|f(x) \cdot g(x) - y_0 \cdot z_0| \le |f(x)||g(x) - z_0| + |z_0||f(x) - y_0|. \tag{*}$$
Given $\epsilon > 0$, let $V_1$, $V_2$ be the neighborhoods of radius $\epsilon/2C$ of $y_0$, $z_0$, respectively,
and let $V_1' = V_0 \cap V_1$. If $\epsilon \le 2C$, then $V_1' = V_1$. By hypothesis there
are punctured neighborhoods $U_1$, $U_2$ of $x_0$ such that $f(U_1) \subset V_1'$, $g(U_2) \subset V_2$.
Let $U = U_1 \cap U_2$. For every $x \in U$, $f(x) \in V_0$ and hence $|f(x)| < C$. From (*),
$$|f(x) \cdot g(x) - y_0 \cdot z_0| < C \frac{\epsilon}{2C} + C \frac{\epsilon}{2C} = \epsilon$$
for every $x \in U$. This proves (3). □


Definition. A transformation $f$ is called bounded on a set $A$ if there exists $C$
such that $|f(x)| \le C$ for every $x \in A$.

In the course of the proof we showed that if $f$ has a limit at $x_0$, then $f$ is
bounded on some punctured neighborhood $U_0$ of $x_0$.

Limits of components
Since $f$ has values in $E^m$,
$$f(x) = (f^1(x), \ldots, f^m(x)),$$
where $f^i$ is a real valued function called the $i$th component of $f$ (with respect
to the standard basis of $E^m$). For instance, in Problem 2, Section 2.1, $f^1(x, y) = (x^2 + y^2)^{1/2}$
and $f^2(x, y) = x - y$. The next proposition shows that convergence
of $f(x)$ to $y_0$ is equivalent to convergence of each component
$f^i(x)$ to the corresponding component of $y_0$.

Proposition 2.2. $y_0 = \lim_{x \to x_0} f(x)$ if and only if $y_0^i = \lim_{x \to x_0} f^i(x)$ for each
$i = 1, \ldots, m$.

The proof is left to the reader (Problem 10).

Limits along lines

Proposition 2.3. If $y_0 = \lim_{x \to x_0} f(x)$, then for any $v \ne 0$,

$$y_0 = \lim_{t \to 0} f(x_0 + tv).$$

PROOF. Let $V$ be any neighborhood of $y_0$. There exists $\delta > 0$ such that
$f(x) \in V$ whenever $0 < |x - x_0| < \delta$. If $0 < |t| < \delta/|v|$, then

$$|(x_0 + tv) - x_0| = |t||v| < \delta.$$

Hence $f(x_0 + tv) \in V$ for every $t$ in the punctured $\delta/|v|$-neighborhood of 0. □
The points $x_0 + tv$ lie on the line through $x_0$ and $x_0 + v$. Roughly
speaking, Proposition 2.3 states that if $f$ has a limit $y_0$ at $x_0$, then $y_0$ is also
the limit as $x_0$ is approached along any line containing $x_0$. When $f$ fails to
have a limit at $x_0$ this fact can often be discovered by testing $f$ along various
lines.

EXAMPLE 1. Let $f(x, y) = x^2/(x^2 + y^2)$, $(x, y) \ne (0, 0)$, and let $\mathbf{x}_0 = (0, 0)$.
Taking $v = e_1 = (1, 0)$, $f(t, 0) = 1$ for every $t \ne 0$. Hence $f(t, 0) \to 1$ as $t \to 0$.
Similarly, taking $v = e_2 = (0, 1)$, $f(0, t) = 0$ for every $t \ne 0$ and $f(0, t) \to 0$ as
$t \to 0$. Since these limits are different, $f$ has no limit at $(0, 0)$.
As we show below, if $\mathbf{x}_0 = (x_0, y_0) \ne (0, 0)$, then $f(x, y) \to f(x_0, y_0)$ as
$(x, y) \to (x_0, y_0)$. Thus $(0, 0)$ is the only point where there is no limit.
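
The two line limits in Example 1 can be observed numerically. The sketch below is only an illustration, with the sample values of $t$ chosen arbitrarily; it evaluates $f$ along the two coordinate axes near the origin.

```python
def f(x, y):
    """The function of Example 1, defined for (x, y) != (0, 0)."""
    return x * x / (x * x + y * y)


# Arbitrary sample parameters t tending to 0.
ts = [10.0 ** -k for k in range(1, 8)]

along_e1 = [f(t, 0.0) for t in ts]  # approach along v = e_1
along_e2 = [f(0.0, t) for t in ts]  # approach along v = e_2
```

The values along $e_1$ stay at 1 and those along $e_2$ stay at 0, in agreement with the conclusion that $f$ has no limit at $(0, 0)$.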

The converse to Proposition 2.3 is false, as shown in Example 2.

EXAMPLE 2. Let $f(x, y) = (y^2 - x)^2/(y^4 + x^2)$, $(x, y) \ne (0, 0)$, and again let
$\mathbf{x}_0 = (0, 0)$. Consider any $v = (h, k) \ne (0, 0)$. Then
$$f(th, tk) = \frac{(tk^2 - h)^2}{t^2 k^4 + h^2},$$
which tends to 1 as $t \to 0$. However, every punctured neighborhood of $(0, 0)$
contains part of the parabola $y^2 = x$, and $f(y^2, y) = 0$. Hence $f$ does not have a
limit at $(0, 0)$.
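
A numerical experiment makes the failure of the converse in Example 2 concrete. The following sketch is illustrative only; the helper names `along_line` and `along_parabola` and the sample directions are assumptions for the example.

```python
def f(x, y):
    """The function of Example 2, defined for (x, y) != (0, 0)."""
    return (y * y - x) ** 2 / (y ** 4 + x * x)


def along_line(h, k, t):
    """Value of f at the point t*(h, k) on the line through the origin with direction (h, k)."""
    return f(t * h, t * k)


def along_parabola(y):
    """Value of f at the point (y^2, y) on the parabola y^2 = x."""
    return f(y * y, y)


line_values = [along_line(1.0, 2.0, 10.0 ** -k) for k in range(3, 8)]
parabola_values = [along_parabola(10.0 ** -k) for k in range(1, 6)]
```

Along each line the values approach 1, while along the parabola they are 0 at every sample point.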

Continuity
Let us now suppose that $x_0$ is an interior point of the domain $D$ of the transformation
$f$.

Definition. A transformation $f$ is continuous at $x_0$ if $f(x_0) = \lim_{x \to x_0} f(x)$.

When $f$ is continuous at $x_0$, punctured neighborhoods may be replaced
by neighborhoods. The definition may be restated: $f$ is continuous at $x_0$
if for every neighborhood $V$ of $f(x_0)$ there is a neighborhood $U$ of $x_0$ such
that $f(U) \subset V$.

EXAMPLE 3. Let $f(x, y) = (xy)^{1/3}$. Show directly from the definition that $f$ is
continuous at $(0, 0)$. Denote the radii of the neighborhoods $U$ and $V$ by $\delta$ and
$\epsilon$, respectively. We must show that given $\epsilon > 0$ there exists $\delta > 0$ such that
$|f(x, y) - f(0, 0)| < \epsilon$ whenever $x^2 + y^2 < \delta^2$. Note that $f(0, 0) = 0$.
From the inequality $x^2 \pm 2xy + y^2 \ge 0$, we get

$$|xy| \le \tfrac{1}{2}(x^2 + y^2),$$
$$|f(x, y)| = |xy|^{1/3} \le 2^{-1/3}(x^2 + y^2)^{1/3}.$$
Given $\epsilon > 0$, we choose $\delta = \sqrt{2}\,\epsilon^{3/2}$. Then $x^2 + y^2 < \delta^2$ implies $|f(x, y)| < \epsilon$
as required.
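
The choice $\delta = \sqrt{2}\,\epsilon^{3/2}$ in Example 3 can be spot-checked by sampling points of the $\delta$-neighborhood of the origin. This is a sketch under stated assumptions: the function names, the random sampling scheme and the small safety factor on the radius are choices made for the illustration, and sampling does not replace the proof.

```python
import math
import random


def f(x, y):
    """Real cube root of x*y, as in Example 3."""
    p = x * y
    return math.copysign(abs(p) ** (1.0 / 3.0), p)


def delta_for(eps):
    """The radius chosen in Example 3: delta = sqrt(2) * eps^(3/2)."""
    return math.sqrt(2.0) * eps ** 1.5


def check(eps, trials=10000, seed=0):
    """Sample points with x^2 + y^2 < delta^2 and test |f(x, y)| < eps."""
    rng = random.Random(seed)
    delta = delta_for(eps)
    for _ in range(trials):
        # Stay slightly inside the open disk to avoid rounding at the boundary.
        r = 0.999999 * delta * math.sqrt(rng.random())
        theta = rng.uniform(0.0, 2.0 * math.pi)
        if abs(f(r * math.cos(theta), r * math.sin(theta))) >= eps:
            return False
    return True
```

For instance, `check(0.1)` and `check(1e-3)` both report that every sampled point satisfies the required inequality.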

One does not ordinarily have to verify continuity of a transformation


directly from the definition (as in Example 3). Instead, one builds up a stock
of transformations known to be continuous and shows that other trans-
formations can be expressed in terms of these by such operations as addition,
multiplication and composition of functions.
Let us consider for the moment real valued functions f, g.

Proposition 2.4. Let $f$ and $g$ be continuous at $x_0$. Then $f + g$ and $fg$ are continuous
at $x_0$. If $g(x_0) \ne 0$, then $fg^{-1}$ is continuous at $x_0$.

This is a consequence of Proposition 2.1 and Problem 8.


The continuity of a transformation $f = (f^1, \ldots, f^m)$ can be verified from
continuity of its components $f^i$:

Proposition 2.5. $f$ is continuous at $x_0$ if and only if $f^i$ is continuous at $x_0$ for
each $i = 1, \ldots, m$.

This is a consequence of Proposition 2.2.

EXAMPLE 4. A constant function, f(x) = c for all x, is continuous. In the


definition, the neighborhood V can be chosen arbitrarily.

EXAMPLE 5. Let $I(x) = x$, the identity transformation. Then $I$ is everywhere
continuous (take $U = V$ above). Therefore the components of $I$ are everywhere
continuous. In this book these components are called the standard
cartesian coordinate functions, and are denoted by $X^1, \ldots, X^n$. For each
$x = (x^1, \ldots, x^n)$, $X^i(x) = x^i$.

EXAMPLE 6. Any polynomial in $n$ variables is everywhere continuous. This is
proved by induction on the degree of the polynomial using the continuity of
the coordinate functions $X^i$ and of constant functions.

EXAMPLE 7. A rational function $f(x) = P(x)/Q(x)$, where $P$ and $Q$ are polynomials,
is continuous at each point where $Q(x) \ne 0$ by Proposition 2.4. For
instance, in Examples 1 and 2, $f$ is continuous at each $(x, y) \ne (0, 0)$.

It is shown in Theorem 2.7 that the composite of two continuous transformations
is continuous. In Section 4-4 it is shown that any differentiable
transformation is continuous. For a real valued function $f$ of one variable,
differentiability of $f$ at $x_0$ is equivalent to the existence of the derivative
$f'(x_0)$.

Limits at $\infty$

Let us call a set of the form $\{x : |x| > b\}$ a punctured neighborhood of $\infty$. The
definition of "limit at $\infty$" then reads: $y_0 = \lim_{|x| \to \infty} f(x)$ if for every neighborhood
$V$ of $y_0$ there exists a punctured neighborhood $U$ of $\infty$ such that
$f(U) \subset V$.
When $f$ is real valued we say that $\lim_{x \to x_0} f(x) = +\infty$ if for every $C > 0$
there is a punctured neighborhood $U$ of $x_0$ such that $f(x) > C$ whenever
$x \in U$. The definition of "$\lim_{x \to x_0} f(x) = -\infty$" is similar.

PROBLEMS

1. Find the limit at $x_0$ if it exists.
(a) $f(x, y) = xy/(x^2 + y^2)$, $\mathbf{x}_0 = e_1 + e_2$.
(b) $f(x, y) = xy/(x^2 + y^2)$, $\mathbf{x}_0 = (0, 0)$.
(c) $f(x) = (1 - \cos x)/x^2$, $x_0 = 0$. [Hint: $\lim_{x \to 0} (\sin x)/x = 1$.]
(d) $f(x) = |x - 2| e_1 + |x + 2| e_2$, $x_0 = 3$.
(e) $f(x, y) = y e_1 + (xy)^2/[(xy)^2 + (x - y)^2] e_2$, $\mathbf{x}_0 = (0, 0)$.
At which points is each of these functions continuous?
2. Prove (2) of Proposition 2.1.
3. Show that if $y_0 = \lim_{x \to x_0} f(x)$, then $|y_0| = \lim_{x \to x_0} |f(x)|$. Prove that the converse
holds if $y_0 = 0$.
4. Let $f(x) = |x|^a$, where $a > 0$. Show that $f$ is continuous at $x_0 = 0$ directly from the
definition of continuous function.
5. Let $f(x, y) = x \cos(y^{-1})$ if $y \ne 0$, and $f(x, 0) = 0$. At which points is $f$ continuous?
6. Find the limit if it exists.

(a) $\lim_{(x,y) \to (0,0)} \dfrac{x^4 + y^4}{x^2 + y^2}$. (b) $\lim_{(x,y) \to (0,0)} \dfrac{x y^2}{x^2 + y^4}$.
(c) $\lim \dfrac{(x \cdot x_1)(x \cdot x_2)}{x \cdot x}$, where $x_1$ and $x_2$ are given vectors not 0.
(d) $\lim_{|x| \to \infty} \dfrac{|x - x_1|}{|x - x_2|}$.

7. (a) Let $g(x) = |f(x)|^a$ where $a > 0$. Suppose that $f$ is continuous at $x_0$ and $f(x_0) = 0$.
Show that $g$ is continuous at $x_0$.
(b) Use (a) and Problem 3 to give another proof that the function in Example 3 is
continuous at (0, 0).


8. Let $y_0 = \lim_{x \to x_0} f(x)$, $z_0 = \lim_{x \to x_0} g(x)$. Show that if $z_0 \ne 0$ then

$$\lim_{x \to x_0} \frac{f(x)}{g(x)} = \frac{y_0}{z_0}.$$

9. Show that $\lim_{x \to x_0} f(x) = +\infty$ if and only if $\lim_{x \to x_0} [f(x)]^{-1} = 0$ and $f(x) > 0$
for every $x$ in some punctured neighborhood of $x_0$.
10. Prove Proposition 2.2 in two different ways. [Hints: For one proof use the definition
of limit directly and the inequality, for any vector $h = (h^1, \ldots, h^n)$,

$$|h^i| \le |h| \le \sqrt{n}\,(|h^1| + \cdots + |h^n|).$$

Take $h = f(x) - y_0$. For the other proof, write $f = f^1 e_1 + \cdots + f^n e_n$ and note
that $f(x) \cdot e_i = f^i(x)$.]

2.3 Sequences in $E^n$
An infinite sequence is a function whose domain is the set of positive integers.
For brevity, we use the term "sequence" to mean infinite sequence. In this
section let us consider sequences with values in $E^n$. It is customary to denote
by $x_m$ the value of the function at the integer $m = 1, 2, \ldots$, and to call $x_m$ the
$m$th term of the sequence. The sequence itself is denoted by $x_1, x_2, \ldots$, or for
brevity by $[x_m]$. It must not be confused with the set $\{x_1, x_2, \ldots\}$ whose elements
are the terms of the sequence. This set may be finite or infinite. For
instance if $x_m = (-1)^m$ then the sequence is $-1, 1, -1, \ldots$, and the set
$\{x_1, x_2, \ldots\}$ has only two elements $-1$ and $1$.

Definition. Suppose that for every $\epsilon > 0$ there exists a positive integer $N$
such that $|x_m - x_0| < \epsilon$ for every $m \ge N$. Then $x_0$ is the limit of the
sequence $[x_m]$.

The notations "$x_0 = \lim_{m \to \infty} x_m$" and "$x_m \to x_0$ as $m \to \infty$" are used to
mean that $x_0$ is the limit of the sequence $[x_m]$. A sequence is called convergent
if it has a limit, otherwise divergent. The integer $N$ in the definition depends of
course on $\epsilon$. Given $\epsilon$ there is a smallest possible choice for $N$. However, for
purposes of the theory of limits it is of no interest to calculate it. What matters
is the fact that some $N$ exists.

Proposition 2.6. Let $x_0 = \lim_{m \to \infty} x_m$, $y_0 = \lim_{m \to \infty} y_m$. Then:

(a) $x_0 + y_0 = \lim_{m \to \infty} (x_m + y_m)$.
(b) $c x_0 = \lim_{m \to \infty} c x_m$ for any scalar $c$.
(c) $x_0 \cdot y_0 = \lim_{m \to \infty} x_m \cdot y_m$.

Let $x_m^i$ denote the $i$th component of the vector $x_m$.

Proposition 2.7. $x_0 = \lim_{m \to \infty} x_m$ if and only if $x_0^i = \lim_{m \to \infty} x_m^i$ for each
$i = 1, \ldots, n$.

We leave the proof of Propositions 2.6 and 2.7 to the reader (Problem 6).
The proofs of the corresponding Propositions 2.1 and 2.2 in Section 2.2 can be
adapted for this purpose.
In this section we prove three important results (Theorems 2.1-2.3)
that depend on the least upper bound property of the real numbers
(Axiom III, Section 1.1).
The first of these results is about convergence of monotone sequences of
real numbers.
A sequence $[x_m]$ of real numbers is called monotone if either $x_1 \le x_2 \le x_3 \le \cdots$
or $x_1 \ge x_2 \ge x_3 \ge \cdots$. In the first instance the sequence is nondecreasing,
in the second nonincreasing. A sequence $[x_m]$ is bounded if there is
a number $C$ such that $|x_m| \le C$ for every $m = 1, 2, \ldots$.

Theorem 2.1. Every bounded monotone sequence of real numbers has a limit.

PROOF. Let $[x_m]$ be nondecreasing and bounded. Let $x_0 = \sup\{x_1, x_2, \ldots\}$.
Given $\epsilon > 0$ there exists an $N$ such that $x_0 - x_N < \epsilon$. Otherwise $x_m \le x_0 - \epsilon$
for every $m = 1, 2, \ldots$, and $x_0 - \epsilon$ would be a smaller upper bound than the
least upper bound $x_0$. Since the sequence is nondecreasing, $x_N \le x_m \le x_0$
for every $m \ge N$. Hence $|x_m - x_0| = x_0 - x_m < \epsilon$ for every $m \ge N$. This
shows that $x_m \to x_0$ as $m \to \infty$.
If the sequence is nonincreasing and bounded, let $x_0 = \inf\{x_1, x_2, \ldots\}$.
In the same way $x_m \to x_0$ as $m \to \infty$. □

EXAMPLE 1. Let $0 < a < 1$. The sequence $a, a^2, a^3, \ldots$ of its powers is decreasing,
and is bounded below by 0. The limit of the sequence is $b = \inf\{a, a^2, a^3, \ldots\}$.
Since $b \le a^{m+1}$, $a^{-1}b \le a^m$ for $m = 1, 2, \ldots$. Therefore
$a^{-1}b$ is a lower bound for $\{a, a^2, a^3, \ldots\}$. Since $b$ is the greatest lower bound,
$a^{-1}b \le b$. However, $a^{-1}b \ge b$ since $0 < a < 1$; and hence $a^{-1}b = b$. This
implies that $b = 0$.

EXAMPLE 2. Let $y_m = \sum_{k=0}^{m} (1/k!)$. The sequence $[y_m]$ is nondecreasing. Since
$(k!)^{-1} \le 2^{-k+1}$ for $k \ge 1$,

$$y_m \le 1 + 1 + \frac{1}{2} + \frac{1}{4} + \cdots + \frac{1}{2^{m-1}} < 3.$$

Thus $[y_m]$ is bounded. This sequence has a limit $y_0$.

EXAMPLE 3. Let $x_m = (1 + 1/m)^m$. From the binomial formula

$$x_m = 1 + m \frac{1}{m} + \frac{m(m-1)}{2!} \frac{1}{m^2} + \cdots + \frac{m!}{m!} \frac{1}{m^m},$$

which can be rewritten as

$$x_m = 1 + 1 + \frac{1}{2!}\left(1 - \frac{1}{m}\right) + \frac{1}{3!}\left(1 - \frac{1}{m}\right)\left(1 - \frac{2}{m}\right) + \cdots + \frac{1}{m!}\left(1 - \frac{1}{m}\right) \cdots \left(1 - \frac{m-1}{m}\right).$$

When $x_m$ is replaced by $x_{m+1}$, the product multiplying $(k!)^{-1}$ becomes
larger for $k = 2, 3, \ldots, m$, and an additional term appears. Thus $x_m \le x_{m+1}$.
Moreover, $x_m \le y_m \le y_0 \le 3$, with $y_m$ as in Example 2. Thus the sequence
$[x_m]$ has a limit $x_0$, and $x_0 \le y_0$. In fact, $x_0 = y_0$ (Problem 11).
This number is the base for natural logarithms, $x_0 = e = 2.718\cdots$.
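
The inequalities of Examples 2 and 3 can be observed on the first few terms. The sketch below is only an illustration in floating-point arithmetic; the function names `x_term` and `y_term` are assumptions for the example.

```python
import math


def x_term(m):
    """x_m = (1 + 1/m)^m, as in Example 3."""
    return (1.0 + 1.0 / m) ** m


def y_term(m):
    """y_m = sum of 1/k! for k = 0, ..., m, as in Example 2."""
    return sum(1.0 / math.factorial(k) for k in range(m + 1))


rows = [(m, x_term(m), y_term(m)) for m in (1, 2, 5, 10, 50)]
```

In each row the second entry does not exceed the third, and the third stays below 3. The sums $y_m$ approach $e$ much faster than the powers $x_m$ do.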
Let us turn to the second main result of the present section. The definition
of limit has the sometimes inconvenient feature that it refers not only to the
sequence $[x_m]$, but to the limit $x_0$. This limit $x_0$ is not known in advance,
although one can sometimes guess what $x_0$ must be. The following is a related
concept due to Cauchy. It is stated purely in terms of the sequence,
without reference to a limit.

Definition. If for every $\epsilon > 0$ there exists a positive integer $N$ such that
$|x_l - x_m| < \epsilon$ for every $l, m \ge N$, then $[x_m]$ is a Cauchy sequence.

We may agree that $l \ge m$ in this definition without loss of generality.

EXAMPLE 4. Let $x_m = \sum_{k=1}^{m} (\cos k)/k^2$. Let us show that $[x_m]$ is a Cauchy
sequence. If $l \ge m$, then
$$x_l - x_m = \sum_{k=m+1}^{l} \frac{\cos k}{k^2}.$$

Since $k^{-2} \le t^{-2}$ for $k - 1 \le t \le k$, we get (as in the integral test for convergence
of an infinite series)

$$|x_l - x_m| \le \int_m^l \frac{dt}{t^2} = m^{-1} - l^{-1} < m^{-1}.$$

Given $\epsilon > 0$, choose $N > \epsilon^{-1}$. Then
$$|x_l - x_m| < \epsilon \quad \text{for } N \le m \le l.$$
Thus $[x_m]$ is a Cauchy sequence.
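
The estimate $|x_l - x_m| < m^{-1}$ of Example 4 can be checked on a finite range of indices. The following sketch assumes an arbitrary tolerance `eps = 0.01` and an arbitrary cutoff of 2000 terms; it illustrates the bound and is not a substitute for the argument above.

```python
import math


def partial_sums(n):
    """Return [x_1, ..., x_n] with x_m = sum of cos(k)/k^2 for k = 1, ..., m."""
    sums, total = [], 0.0
    for k in range(1, n + 1):
        total += math.cos(k) / k ** 2
        sums.append(total)
    return sums


def gap(sums, m, l):
    """|x_l - x_m| for 1-based indices m <= l."""
    return abs(sums[l - 1] - sums[m - 1])


eps = 0.01
N = math.floor(1.0 / eps) + 1  # an integer N with N > 1/eps
xs = partial_sums(2000)
worst = max(gap(xs, m, l) for m in range(N, 200) for l in range(m, 2001, 25))
```

The largest sampled gap `worst` stays below `eps`, as the choice $N > \epsilon^{-1}$ predicts.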

Fortunately, it turns out that the concepts of Cauchy sequence and convergent
sequence are equivalent, for sequences in $E^n$.

Theorem 2.2 (Cauchy convergence criterion). A sequence $[x_m]$ is convergent
if and only if it is a Cauchy sequence.
PROOF. Let $[x_m]$ be convergent, and $x_0$ be its limit. Then given $\epsilon > 0$ there
exists $N$ such that $|x_m - x_0| < \epsilon/2$ for every $m \ge N$. Now
$$x_l - x_m = (x_l - x_0) + (x_0 - x_m).$$
If $l, m \ge N$, then by the triangle inequality
$$|x_l - x_m| \le |x_l - x_0| + |x_0 - x_m| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.$$

Therefore $[x_m]$ is a Cauchy sequence.

The proof of the converse is more difficult. Let us first show that every
Cauchy sequence is bounded. If $[x_m]$ is Cauchy, then taking $\epsilon = 1$ in the
definition there is an $N$ such that $|x_l - x_m| < 1$ for every $l, m \ge N$. In particular,
let $l = N$ and let $C = \max\{|x_1|, \ldots, |x_{N-1}|, |x_N| + 1\}$. Then by the
triangle inequality and the fact that $x_m = x_N + (x_m - x_N)$,
$$|x_m| \le |x_N| + |x_m - x_N| < |x_N| + 1$$
if $m \ge N$. Therefore $|x_m| \le C$ for every $m = 1, 2, \ldots$.
Next, let $[x_m]$ be a Cauchy sequence of real numbers, and $C$ be a number
such that $|x_m| \le C$ for every $m$. For each $m = 1, 2, \ldots$ let
$$y_m = \inf\{x_m, x_{m+1}, \ldots\}.$$

Since $\{x_{m+1}, x_{m+2}, \ldots\} \subset \{x_m, x_{m+1}, \ldots\}$, we must have $y_m \le y_{m+1}$. Since
$x_m \le C$ for every $m$, $y_m \le C$ for every $m$. The sequence $[y_m]$ is nondecreasing
and bounded. By Theorem 2.1 the sequence $[y_m]$ has a limit $y_0$. Let us show
that $x_m \to y_0$ as $m \to \infty$. Given $\epsilon > 0$ there exists $N$ such that for every
$m \ge N$, $|x_N - x_m| < \epsilon/2$, and hence
$$x_N - \frac{\epsilon}{2} < x_m < x_N + \frac{\epsilon}{2}.$$
If $m \ge N$, then $x_N - \epsilon/2$ is a lower bound and $x_N + \epsilon/2$ an upper bound for
$\{x_m, x_{m+1}, \ldots\}$. Therefore when $m \ge N$, $x_N - \epsilon/2 \le y_m \le x_N + \epsilon/2$.
Letting $m \to \infty$ gives $|x_N - y_0| \le \epsilon/2$, and hence

$$|x_m - y_0| \le |x_m - x_N| + |x_N - y_0| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.$$

This proves that $x_m \to y_0$ as $m \to \infty$.

Finally, if $[x_m]$ is a Cauchy sequence in $E^n$, then for each $i = 1, \ldots, n$
the components form a Cauchy sequence $[x_m^i]$ of real numbers. This follows
from the definition of Cauchy sequence and the inequality $|x_l^i - x_m^i| \le |x_l - x_m|$.
Each of the sequences $[x_m^i]$ has a limit $y_0^i$, $i = 1, \ldots, n$. By Proposition 2.7,
$x_m \to y_0$ as $m \to \infty$. □


For the next theorem we need the concept of diameter of a set. A nonempty
set $A$ is called bounded if there exists $C > 0$ such that $|x| \le C$ for all $x \in A$.
The diameter of a bounded set $A$ is
$$\operatorname{diam} A = \sup\{|x - y| : x, y \in A\}.$$

Theorem 2.3 (Cantor). Let $[A_m]$ be a sequence of closed sets such that
$A_1 \supset A_2 \supset \cdots$ and $0 = \lim_{m \to \infty} \operatorname{diam} A_m$. Then $A_1 \cap A_2 \cap \cdots$ contains
a single point.
PROOF. For each $m = 1, 2, \ldots$ let $x_m$ be a point of $A_m$. Let us show that
$[x_m]$ is a Cauchy sequence. Given $\epsilon > 0$, there exists $N$ such that $\operatorname{diam} A_N < \epsilon$.
If $l, m \ge N$, then $x_l, x_m \in A_N$ since $A_l \subset A_N$, $A_m \subset A_N$. Therefore
$$|x_l - x_m| \le \operatorname{diam} A_N < \epsilon.$$
By Theorem 2.2 the sequence $[x_m]$ has a limit $x_0$. For each $l = 1, 2, \ldots$,
$x_m \in A_l$ for every $m \ge l$ since $A_m \subset A_l$. Since $A_l$ is closed, $x_0 \in A_l$ (Problem 5).
Since this is true for each $l$, $x_0 \in A_1 \cap A_2 \cap \cdots$.
If $x \in A_1 \cap A_2 \cap \cdots$, then $x \in A_m$ and
$$0 \le |x - x_0| \le \operatorname{diam} A_m, \quad m = 1, 2, \ldots$$
Since $\operatorname{diam} A_m \to 0$ as $m \to \infty$, $|x - x_0| = 0$ and $x = x_0$. □

EXAMPLE 5. Let $n = 1$, $A_m = \{m, m + 1, m + 2, \ldots\}$. Then each $A_m$ is closed,
unbounded, and $A_1 \supset A_2 \supset \cdots$. The intersection $A_1 \cap A_2 \cap \cdots$ is empty.

If in Theorem 2.3 it is not assumed that the diameter of $A_m$ tends to 0,
then it is still true that $A_1 \cap A_2 \cap \cdots$ is not empty, provided some set $A_m$
is bounded. This is proved in Section 2.4.
Note. Theorems 2.1-2.3 depend in an essential way on the least upper
bound Axiom III. Let us suppose, on the other hand, that instead of Axiom
III we took the following.

Axiom III'
(a) $E^1$ has the archimedean property.
(b) Every Cauchy sequence in $E^1$ has a limit.

Then Axioms III and III' are equivalent in the presence of Axioms I and II,
Section 1.1. This means that if either III or III' is taken as an axiom, then the
other can be proved as a theorem. We have shown that III implies III'. To
show that III' implies III, we first note that if III' is taken as an axiom, then
Theorem 2.3 remains valid (with the same proof). It then suffices to deduce
the least upper bound property from the archimedean property and Theorem
2.3. To prove this, let $S$ be any subset of $E^1$ that is bounded above, and
let $c$ be some upper bound for $S$. Let us define a sequence of closed intervals

$I_1, I_2, \ldots$ as follows: Let $a$ be some point of $S$, and $I_1 = [a, c]$. Divide
$I_1$ at the midpoint $(a + c)/2$ into two congruent closed intervals. If $(a + c)/2$
is an upper bound for $S$, let $I_2$ be the left-hand interval, otherwise let $I_2$
be the right-hand interval. In general, suppose $m \ge 1$ and $I_m$ has been
defined. If the midpoint of $I_m$ is an upper bound for $S$, let $I_{m+1}$ be the left half
of $I_m$, otherwise the right half. The archimedean property implies that for
any $x \ge 0$, the sequence $[m^{-1}x]$ tends to 0 as $m \to \infty$. Since $0 \le 2^{-m} \le m^{-1}$,
the sequence $[2^{-m}x]$ also tends to 0. Let $x = 2(c - a)$. Now $I_1 \supset I_2 \supset \cdots$
and the length of $I_m$ is $2^{-m}x$. By Theorem 2.3, $I_1 \cap I_2 \cap \cdots$ contains a single
point $x_0$. By the construction, $x_0 = \sup S$ (see Figure 2.2).

[Figure 2.2]
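
The interval-halving construction in this note translates directly into a short procedure. The sketch below is only an illustration: the function name `sup_by_bisection`, the number of steps, and the sample set $S = \{u : u^2 < 2\}$ (described through a hypothetical upper-bound test) are assumptions made for the example. The set itself enters only through the question "is this midpoint an upper bound for $S$?".

```python
import math


def sup_by_bisection(is_upper_bound, a, c, steps=50):
    """Halve I_1 = [a, c] repeatedly, where a is a point of S and c an upper bound for S.

    At each step the left half is kept if the midpoint is an upper bound
    for S, and the right half otherwise. Returns the last interval.
    """
    lo, hi = a, c
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if is_upper_bound(mid):
            hi = mid  # keep the left-hand interval
        else:
            lo = mid  # keep the right-hand interval
    return lo, hi


# Hypothetical example: S = {u : u*u < 2}, with a = 1 in S and upper bound c = 2.
lo, hi = sup_by_bisection(lambda u: u >= 0.0 and u * u >= 2.0, 1.0, 2.0)
```

After 50 steps the interval has length $2^{-50}$ and encloses $\sup S = \sqrt{2}$ up to rounding.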

Infinite series
Formally, an infinite series is an expression written $\sum_{k=1}^{\infty} x_k$ or $x_1 + x_2 + \cdots$.
To be more precise, with any sequence $[x_k]$ is associated another sequence
$[s_m]$, where $s_m = x_1 + \cdots + x_m$ is called the $m$th partial sum. This pair of
sequences defines an infinite series. If the sequence of partial sums has a
limit $s$, then the series is convergent and $s$ is its sum. This is denoted by
$s = x_1 + x_2 + \cdots$. If the sequence of partial sums has no limit, then the series
is divergent.
If $s = x_1 + x_2 + \cdots$, $t = y_1 + y_2 + \cdots$, then $s + t = (x_1 + y_1) + (x_2 + y_2) + \cdots$
and $cs = (cx_1) + (cx_2) + \cdots$ for any scalar $c$. This follows
from the definition and Proposition 2.6. Some further elementary properties
are given in Problems 7(c) and 8.
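
The passage from a sequence of terms to its sequence of partial sums, and the termwise rules for sums and scalar multiples, can be seen on a small example. The sketch below uses exact rational arithmetic; the two geometric series and the name `partial_sums` are assumptions chosen for the illustration.

```python
from fractions import Fraction
from itertools import accumulate


def partial_sums(terms):
    """Return [s_1, ..., s_m] where s_m = x_1 + ... + x_m."""
    return list(accumulate(terms))


# Hypothetical terms: x_k = 2^(-k) and y_k = 3^(-k), k = 1, ..., 20.
x = [Fraction(1, 2 ** k) for k in range(1, 21)]
y = [Fraction(1, 3 ** k) for k in range(1, 21)]

s = partial_sums(x)
t = partial_sums(y)
s_plus_t = partial_sums([a + b for a, b in zip(x, y)])
three_s = partial_sums([3 * a for a in x])
```

The partial sums of the termwise sum agree with the sums of the partial sums, which is the finite statement behind $s + t = (x_1 + y_1) + (x_2 + y_2) + \cdots$.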

PROBLEMS

In Problems 1 and 2 you may use the results of Problems 9 and 10.
1. Find the limit if it exists.
(a) $x_m = (2^m - 2^{-m})/(3^m + 3^{-m})$.
(b) $x_m = \sin(m\pi/2)$.
(c) $x_m = \sin m\pi$.
(d) $x_m = ((m + 1)/(m - 1))^m$. [Hint: $(1 + 1/m)^m \to e$ as $m \to \infty$.]
(e) $x_m = ((m^2 + 1)/(m^2 - 1))^m$.

2. Find the limit if it exists, using Proposition 2.7.
(a) $(x_m, y_m) = ((1 + m)/(1 - 2m), 1/(1 + m))$.
(b) $(x_m, y_m) = (2^{-m}, 1 + m)$.
(c) $(x_m, y_m) = (1 - 2^{-m}, (m^2 + 3m)/m!)$.


3. Show that a sequence $[x_m]$ has at most one limit $x_0$. [Hint: If $y_0$ were another
limit, let $\epsilon = |x_0 - y_0|/2$.]
4. (a) Let $A$ be a closed set. Show that if $x_m \in A$ for $m = 1, 2, \ldots$ and $x_0 = \lim_{m \to \infty} x_m$,
then $x_0 \in A$.
(b) If $A$ is not closed, show that there exists a sequence $[x_m]$, with $x_m \in A$ for
$m = 1, 2, \ldots$, converging to a limit $x_0 \notin A$.
5. Show that if $x_m \in A$ for every $m \ge 1$ and $x_0 = \lim_{m \to \infty} x_m$, then $x_0 \in \operatorname{cl} A$.
6. (a) Prove Proposition 2.6.
(b) Prove Proposition 2.7.
7. (Comparison tests.) Show that:
(a) If $0 \le x_m \le y_m$ for every $m \ge 1$ and $y_m \to 0$ as $m \to \infty$, then $x_m \to 0$ as $m \to \infty$.
(b) If $[x_m]$, $[y_m]$ are nondecreasing sequences such that $x_m \le y_m$ for each $m = 1, 2, \ldots$
and $y_m \to y$ as $m \to \infty$, then $[x_m]$ has a limit $x \le y$.
(c) If $0 \le x_m \le y_m$ for every $m = 1, 2, \ldots$ and $t = y_1 + y_2 + \cdots$, then the series
$x_1 + x_2 + \cdots$ converges with sum $s \le t$.
8. An infinite series $x_1 + x_2 + \cdots$ converges absolutely if the series of nonnegative
numbers $|x_1| + |x_2| + \cdots$ converges. Prove that any absolutely convergent infinite
series is convergent. [Hint: Show that the sequence $[s_m]$ of partial sums is Cauchy.]
9. Show that if $a > 0$, then
(a) $\lim_{m \to \infty} a^{1/m} = 1$. (b) $\lim_{m \to \infty} a^m/m! = 0$.
(c) $\lim_{m \to \infty} (x_m)^{1/m} = 1$ provided $\lim_{m \to \infty} x_m = a$.
[Hints: For part (a) reduce to the case $0 < a < 1$. By Example 1, if $b < 1$ then
$a \le b^m$ for only finitely many $m$. For part (b), compare with the sequence $[c\,l^m]$ for
suitable $c$ and suitable $l$ in Problem 7(a).]
10. Let $x_0 = \lim_{m \to \infty} x_m$, $y_0 = \lim_{m \to \infty} y_m$, and assume that $y_m \ne 0$ for $m = 0, 1, 2, \ldots$.
Show that $x_0/y_0 = \lim_{m \to \infty} x_m/y_m$. [Hint: By (c) of Proposition 2.6 it suffices to
show that $y_0^{-1} = \lim_{m \to \infty} y_m^{-1}$.]
11. Show that $x_0 = y_0$ in Example 3.

2.4 Bolzano-Weierstrass theorem


Suppose that an infinite number of points lie in a box. It is intuitively reason-
able that they cannot remain scattered but must accumulate at some points
of the box. The purpose of the present section is to put this idea on a precise
basis. We begin with definitions of the concepts of isolated point and accumulation
point of a set $A \subset E^n$.

Definition. A point $x_0$ is an isolated point of $A$ if there exists a neighborhood
$V$ of $x_0$ such that $A \cap V = \{x_0\}$.

Definition. A point $x_0$ is an accumulation point of $A$ if every neighborhood of
$x_0$ contains an infinite number of points of $A$.


In the definition of accumulation point we do not require that $x_0 \in A$.
However, the next proposition shows that accumulation points are in the
closure of $A$.

Proposition 2.8. A point $x_0$ is an accumulation point of $A$ if and only if $x_0 \in \operatorname{cl} A$
and $x_0$ is not an isolated point of $A$.

PROOF. Suppose that $x_0 \in \operatorname{cl} A$ and $x_0$ is not an isolated point of $A$. Let $V_1$
be any neighborhood of $x_0$. From the definitions of closure and isolated
point there exists $x_1 \in A \cap V_1$, $x_1 \ne x_0$. Let us show that $A \cap V_1$ contains
infinitely many points. Suppose not. Then $A \cap V_1$ contains finitely many
points $x_1, \ldots, x_l$ different from $x_0$. Choose a smaller neighborhood $V$ of
radius $\delta < \min\{|x_m - x_0|, m = 1, \ldots, l\}$. Either $A \cap V$ is empty or $A \cap V = \{x_0\}$.
The first possibility contradicts the assumption that $x_0 \in \operatorname{cl} A$, and the
second the assumption that $x_0$ is not isolated. Thus $x_0$ is an accumulation
point of $A$. The converse follows immediately from the definitions. □

EXAMPLE 1. Every point of the closed interval $[a, b]$ is an accumulation point
of the open interval $(a, b)$.

EXAMPLE 2. Let $A = \{1, \frac{1}{2}, \frac{1}{3}, \ldots\}$. Every point of $A$ is an isolated point of $A$.
The set $A$ has the single accumulation point 0.

EXAMPLE 3. Let $A = \{\text{all rational numbers}\}$. Then every point of $E^1$ is an
accumulation point of $A$.

The next proposition characterizes accumulation points in terms of convergent
sequences.

Proposition 2.9. A point $x_0$ is an accumulation point of $A$ if and only if there
exists a sequence $[x_m]$ with limit $x_0$, such that $x_m \in A$ and $x_m \ne x_0$ for
$m = 1, 2, \ldots$.

The proof is left to the reader (Problem 2).


We come now to the main results of the present section. A set $A$ is called
infinite if the number of its elements is not finite. A set $I \subset E^n$ of the form

$$I = \left\{ x : |x^i - x_0^i| \le \frac{a}{2}, \ i = 1, \ldots, n \right\},$$

where $x = (x^1, \ldots, x^n)$, is called an $n$-cube with center $x_0$ and side length $a$.
If $A$ is any bounded set, then $A \subset I$ for some $n$-cube $I$.

[Figure 2.3. A cube $I_1$ divided into congruent cubes $I_{11}$, $I_{12}$, $I_{13}$, $I_{14}$.]

Theorem (Bolzano-Weierstrass). Every bounded infinite set $A \subset E^n$ has at
least one accumulation point.
PROOF. Let $A$ be a bounded infinite set and $I_1$ be some closed $n$-cube containing
$A$. Divide $I_1$ into $m = 2^n$ closed congruent $n$-cubes $I_{11}, \ldots, I_{1m}$ as
indicated in Figure 2.3. Since $A$ is an infinite set, $A \cap I_{1k}$ must be infinite for
at least one $k = 1, \ldots, m$. Choose some such $k$ and let $I_{1k} = I_2$. In the same
way divide $I_2$ into $2^n$ closed congruent $n$-cubes $I_{21}, \ldots, I_{2m}$. As before,
$A \cap I_{2k}$ is infinite for at least one $k$. Choose such a $k$ and let $I_{2k} = I_3$. Continuing,
we obtain closed $n$-cubes $I_1 \supset I_2 \supset I_3 \supset \cdots$ such that $A \cap I_l$ is
infinite for each $l = 1, 2, \ldots$ and $\operatorname{diam} I_l \to 0$ as $l \to \infty$. By Theorem 2.3,
$I_1 \cap I_2 \cap \cdots$ has a single point $x_0$. If $U$ is any neighborhood of $x_0$, then
$I_l \subset U$ for large enough $l$. Since $A \cap I_l \subset A \cap U$, $A \cap U$ is an infinite set.
Therefore $x_0$ is an accumulation point of $A$. □
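
The subdivision argument in the proof can be imitated on a computer in the case $n = 1$, with one important caveat: a program can only hold a finite sample of the set, so "the half containing infinitely many points" has to be replaced by "a half containing at least as many sample points as the other". The sketch below makes that substitution. The sample $\{\cos m : m = 1, \ldots, 10000\}$, the function name and the number of steps are all assumptions for the illustration, and the result only suggests where an accumulation point may lie.

```python
import math


def shrink_interval(points, lo, hi, steps):
    """Halve [lo, hi] repeatedly, keeping a closed half that holds at least as many sample points."""
    pts = [p for p in points if lo <= p <= hi]
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        left = [p for p in pts if p <= mid]
        right = [p for p in pts if p >= mid]
        if len(left) >= len(right):
            hi, pts = mid, left
        else:
            lo, pts = mid, right
    return lo, hi, len(pts)


# Finite stand-in for the bounded infinite set {cos m : m = 1, 2, ...}.
sample = [math.cos(m) for m in range(1, 10001)]
lo, hi, count = shrink_interval(sample, -1.0, 1.0, 10)
```

After 10 halvings the interval has length $2^{-9}$ and still contains a share of the sample, mirroring the nested cubes $I_1 \supset I_2 \supset \cdots$ of the proof.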

The Bolzano-Weierstrass theorem has the following useful consequences.

Corollary 1. Let $S \subset E^n$ be closed and bounded. Then every infinite set $A \subset S$
has at least one accumulation point $x_0 \in S$.
PROOF. Since $S$ is a bounded set and $A \subset S$, $A$ is a bounded set. By the
Bolzano-Weierstrass theorem, $A$ has an accumulation point $x_0$. By Proposition 2.8,
$x_0 \in \operatorname{cl} A$. Since $S$ is closed, $\operatorname{cl} A \subset S$. Thus $x_0 \in S$. □

Note. The converse to Corollary 1 is true. To prove it consider first an
unbounded set $S$. For each $m = 1, 2, \ldots$ there exists $x_m \in S$ such that $|x_m| \ge m$.
The set $A = \{x_1, x_2, \ldots\}$ is infinite and has no accumulation point. If $S$ is
not closed, then there exists a point $x_0 \in \operatorname{fr} S - S$. For $m = 1, 2, \ldots$ there exists
$x_m \in S$ such that $|x_m - x_0| < 1/m$. The set $A = \{x_1, x_2, \ldots\}$ is infinite and
has the single accumulation point $x_0$. But $x_0 \notin S$.

Corollary 2. Let $A_1, A_2, \ldots$ be nonempty bounded, closed subsets of $E^n$ such
that $A_1 \supset A_2 \supset \cdots$. Then $\bigcap_{m=1}^{\infty} A_m$ is not empty.
PROOF. For $m = 1, 2, \ldots$ choose some point $x_m \in A_m$. Let $A = \{x_1, x_2, \ldots\}$.
If $A$ is a finite set, then some $x = x_m$ is repeated infinitely often in the sequence
$[x_m]$. Since $A_1 \supset A_2 \supset \cdots$, $x \in A_m$ for every $m = 1, 2, \ldots$. Thus $x \in \bigcap_{m=1}^{\infty} A_m$.
Let us consider the case when $A$ is an infinite set. Then $A \subset A_1$ since each
$A_m \subset A_1$. Since $A_1$ is bounded, $A$ is bounded. Let $x_0$ be an accumulation
point of $A$. As in the proof of Corollary 1, we have $x_0 \in A_1$ since $A_1$ is a closed
set and $A \subset A_1$. For each $m = 1, 2, \ldots$, $x_0$ is also an accumulation point of
the set $\{x_m, x_{m+1}, \ldots\}$. Since this set is contained in $A_m$ and $A_m$ is closed, we
get in the same way $x_0 \in A_m$. Since this is true for each $m$, $x_0 \in \bigcap_{m=1}^{\infty} A_m$. □

Corollary 3 shows the existence of a point in any closed set $A$ nearest a
given point $x_0 \notin A$.

Corollary 3. Let $A$ be a closed, nonempty subset of $E^n$ and $x_0 \notin A$. Then there
exists $x_1 \in A$ such that $|x - x_0| \ge |x_1 - x_0|$ for all $x \in A$.
PROOF. Let $S_r = \{x : |x - x_0| \le r\}$ denote the closed spherical $n$-ball with
center $x_0$ and radius $r$. Let
$$d = \inf\{|x - x_0| : x \in A\},$$
$$d_m = d + \frac{1}{m}, \quad m = 1, 2, \ldots,$$
$$A_m = A \cap S_{d_m}.$$
The sets $A_1, A_2, \ldots$ satisfy the hypotheses of Corollary 2. Let $x_1 \in \bigcap_{m=1}^{\infty} A_m$.
Then $x_1 \in A$ since each $A_m \subset A$. By definition of $d$, $|x_1 - x_0| \ge d$. Since
$x_1 \in A_m$, and $A_m \subset S_{d_m}$, $|x_1 - x_0| \le d + 1/m$ for each $m = 1, 2, \ldots$. Thus
$|x_1 - x_0| = d$. □

Note. The point $x_1$ in Corollary 3 need not be unique. However, $x_1$ is
unique if $A$ is also convex (Problem 3).

PROBLEMS

1. Find all accumulation points of $A$:
(a) $A = \{(-1)^m m (1 + m)^{-1} : m = 1, 2, \ldots\}$.

(b) A = {(cos 2~n, sin 2~n): m = 1,2, .. .}-

(c) A = {((1 - ~)cos 2~n , (1 - ~)sin 2~n): m = 1, 2, .. .}-


(d) $A = \{(x, y) : (x^2 + y^2)(y^2 - x^2 + 1) \le 0\}$.
(e) $A = \{\cos m : m = 1, 2, \ldots\}$.
2. Prove Proposition 2.9.
3. Let $A$ be a closed, convex, nonempty set, and $x_0 \notin A$. Show that there is exactly one
point $x_1 \in A$ nearest $x_0$.


4. A set $A$ is called dense in $B$ if every point of $B$ is an accumulation point of $A$.
(a) Suppose that $A$ has no isolated points. Show that $A$ is dense in $B$ if and only if
$B \subset \operatorname{cl} A$.
(b) Suppose that $A$ is dense in $B$ and $B$ is dense in $C$. Show that $A$ is dense in $C$.
5. Let $C = [0, 1] - (A_1 \cup A_2 \cup \cdots)$, where $A_1 = (\frac{1}{3}, \frac{2}{3})$, $A_2 = (\frac{1}{9}, \frac{2}{9}) \cup (\frac{7}{9}, \frac{8}{9})$,
$A_3 = (\frac{1}{27}, \frac{2}{27}) \cup \cdots \cup (\frac{25}{27}, \frac{26}{27})$, and $A_j$ is the union of $2^{j-1}$ open intervals of length $3^{-j}$
chosen similarly (see Figure 2.4). [Note: $C$ is called the Cantor set.]
(a) Show that $C$ is a closed set.
(b) Show that $C$ is dense in no open set.
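
The sets $A_j$ of Problem 5 can be generated mechanically, which may help in seeing the pattern behind "chosen similarly". The sketch below assumes the usual middle-thirds construction: $A_j$ consists of the open middle thirds of the closed intervals that remain after $A_1, \ldots, A_{j-1}$ have been removed. The function name `removed_intervals` is hypothetical, and exact rational arithmetic is used so that endpoints can be compared without rounding.

```python
from fractions import Fraction


def removed_intervals(j):
    """Return the open intervals making up A_j as (left, right) pairs of Fractions."""
    kept = [(Fraction(0), Fraction(1))]  # closed intervals remaining so far
    for _ in range(j - 1):
        next_kept = []
        for a, b in kept:
            third = (b - a) / 3
            next_kept.append((a, a + third))  # left closed third
            next_kept.append((b - third, b))  # right closed third
        kept = next_kept
    # A_j is the union of the open middle thirds of the remaining intervals.
    return [(a + (b - a) / 3, b - (b - a) / 3) for a, b in kept]


A1, A2, A3 = removed_intervals(1), removed_intervals(2), removed_intervals(3)
```

For example, `A3` consists of four intervals, the first being $(\frac{1}{27}, \frac{2}{27})$ and the last $(\frac{25}{27}, \frac{26}{27})$.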

[Figure 2.4. The intervals $A_1$, $A_2$, $A_3$ removed from $[0, 1]$.]

6. (Subsequences.) Let $[x_m]$ be a sequence, and $y_l = x_{m_l}$ for $l = 1, 2, \ldots$, where
$m_1 < m_2 < \cdots$. Then $[y_l]$ is called a subsequence of $[x_m]$.
(a) Show that any bounded sequence in $E^n$ has a convergent subsequence.
(b) A set $S$ is called sequentially compact if: any bounded sequence $[x_m]$, with $x_m \in S$
for $m = 1, 2, \ldots$, has a subsequence $[y_l]$ such that $y_l \to y_0$ as $l \to \infty$, $y_0 \in S$. Show
that a nonempty set $S \subset E^n$ is sequentially compact if and only if $S$ is closed and
bounded.
7. Let $y$ be any frontier point of a closed convex set $K$. Show that $K$ has a supporting
hyperplane $P$ that contains $y$. [Hint: Let $\{y_m\}$ be a sequence of points exterior to $K$
such that $y_m$ tends to $y$ as $m \to \infty$. Let $x_m$ be a point of $K$ nearest to $y_m$ and
$$u_m = \frac{y_m - x_m}{|y_m - x_m|}.$$
Then $|u_m| = 1$ and $x_m$ tends to $y$ as $m \to \infty$. By the proof of Theorem 1.1 there is a
supporting hyperplane of the form $\{x : u_m \cdot (x - x_m) = 0\}$. Let $u$ be an accumulation
point of the bounded set $\{u_1, u_2, \ldots\}$ and $P = \{x : u \cdot (x - y) = 0\}$.]

2.5 Relative neighborhoods, continuous transformations
For the definition in Section 2.2 of a transformation continuous at a point
$x_0$, it is assumed that $x_0$ is interior to the domain $D$. However, we often wish
to discuss continuity at points which are not interior to the domain. Moreover,
even if the domain $D$ is an open subset of $E^n$, we may be interested only in the
restriction of the transformation to some set $S \subset D$.
We easily circumvent this apparent difficulty by introducing the idea of
relative neighborhood.

Definition. Let $S$ be a nonempty subset of $E^n$. A relative neighborhood of a
point $x \in S$ is any set $U$ such that $U = S \cap W$, where $W$ is a neighborhood
of $x$ in $E^n$ (Figure 2.5).

Figure 2.5. A relative neighborhood $U = S \cap W$.

In considering the relative neighborhood $U = S \cap W$ one simply ignores
all points of $W$ not in $S$.
Let us now consider a transformation $f$, whose domain is a set $S \subset E^n$.

Definition. The transformation $f$ is continuous at $x_0$ if, for every neighborhood
$V$ of $f(x_0)$, there exists a relative neighborhood $U$ of $x_0$ such that $f(U) \subset V$.
If $f$ is continuous at every point $x_0 \in S$, then we call $f$ continuous on $S$.

An equivalent form of the definition of continuity at $x_0$ is: for every
$\epsilon > 0$ there exists $\delta > 0$ such that $|f(x) - f(x_0)| < \epsilon$ whenever $x \in S$ and
$|x - x_0| < \delta$ (cf. Section 2.2).

EXAMPLE 1. Let $S = [a, b]$. The relative neighborhoods are described in
Problem 3. Let $f$ be real valued. For $a < x_0 < b$, continuity at $x_0$ means the
same as that in Section 2.2. Continuity at $a$ means the following: given
$\epsilon > 0$ there exists $\delta > 0$ such that $|f(x) - f(a)| < \epsilon$ whenever $a \le x < a + \delta$.
This is expressed by writing $f(a) = \lim_{x \to a^+} f(x)$, where the superscript $+$
denotes right-hand limit. Similarly, continuity at $b$ is written $f(b) = \lim_{x \to b^-} f(x)$,
where the superscript $-$ denotes left-hand limit.

Theorem 2.4. Let f be continuous on a closed, bounded set S. Then f(S) is also a
closed, bounded set.

In this section we give a proof of Theorem 2.4 using Corollary 1 to the
Bolzano-Weierstrass theorem. A different proof is given in Section 2.8, using
the concept of compactness.
PROOF. The proof is by contradiction. Suppose that $f(S)$ is unbounded. Then
for each $m = 1, 2, \ldots$ there exists $x_m \in S$ such that $|f(x_m)| \ge m$. The set $A =
\{x_1, x_2, \ldots\}$ is infinite, and $A \subset S$. By Corollary 1, Section 2.4, $A$ has an
accumulation point $x_0 \in S$. Let $V$ be the neighborhood of $f(x_0)$ of radius 1.
Since $f$ is continuous at $x_0$, there exists a relative neighborhood $U$ of $x_0$ such
that $f(U) \subset V$. In particular, $f(x_m) \in V$ for the infinitely many $m$ such that $x_m \in U$.
Since $|f(x_m)| \ge m$, this is a contradiction. Thus $f(S)$ is bounded.


Next suppose that $f(S)$ is not closed. Then there exists $y_0 \in \operatorname{cl}[f(S)] - f(S)$.
For $m = 1, 2, \ldots$ there exists $y_m \in f(S)$ such that $y_m \to y_0$ as $m \to \infty$. Choose
$z_m \in S$ with $y_m = f(z_m)$. As before, the set $B = \{z_1, z_2, \ldots\}$ has an accumulation
point $z_0 \in S$. Since $y_0 \notin f(S)$, $y_0 \ne f(z_0)$. Let $V$ be the neighborhood of
$f(z_0)$ of radius $\tfrac12 |f(z_0) - y_0|$. Since $f$ is continuous at $z_0$, there exists a relative
neighborhood $U$ of $z_0$ with $f(U) \subset V$. In particular, $y_m = f(z_m)$ is in $V$ for the
infinitely many $m$ for which $z_m \in U$. This contradicts the fact that $y_m \to y_0$ as
$m \to \infty$. Thus $f(S)$ is closed. □

If we specialize Theorem 2.4 to real valued functions, we get a theorem
about the existence of maxima and minima.

Theorem 2.5. Let $f$ be a real valued function continuous on a closed, bounded
set $S$. Then there exist $x_1, x_2 \in S$ such that

$$f(x_1) \le f(x) \le f(x_2)$$

for all $x \in S$.

PROOF. We recall that if $T \subset E^1$ is bounded and closed, then $y_1 = \inf T$ and
$y_2 = \sup T$ are points of $T$ (Example 4, Section 1.4). Let $T = f(S)$. By
Theorem 2.4, $T$ is closed and bounded. Take $x_i$ such that $y_i = f(x_i)$, $i = 1, 2$. □

The function $f$ is said to have a minimum on $S$ at $x_1$ and a maximum on $S$
at $x_2$.
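
As a rough numerical illustration of Theorem 2.5 (not a proof), one can sample a continuous function on a grid over a closed, bounded interval and pick the grid points where it is smallest and largest. The helper name `sample_extrema`, the grid size, and the test function are illustrative assumptions; a finite grid only approximates the points $x_1$, $x_2$ whose existence the theorem guarantees.

```python
def sample_extrema(f, a, b, n=1000):
    """Return grid points of [a, b] where f is smallest and largest.

    Hypothetical helper: it inspects only n + 1 equally spaced points,
    so it approximates the x1, x2 of Theorem 2.5 rather than computing them.
    """
    xs = [a + (b - a) * k / n for k in range(n + 1)]
    x_min = min(xs, key=f)  # approximate minimizer x1
    x_max = max(xs, key=f)  # approximate maximizer x2
    return x_min, x_max


# f(x) = (x - 0.3)^2 on the closed, bounded interval [0, 1].
x1, x2 = sample_extrema(lambda x: (x - 0.3) ** 2, 0.0, 1.0)
print(x1, x2)
```

If the same sampling is tried for $f(x) = x^{-1}$ on grids in $(0, 1]$ that get finer near 0, the sampled maximum keeps growing, which is consistent with the failure of the hypothesis that $S$ be closed.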

PROBLEMS

1. In each case show that $f$ has a minimum on $S$, but no maximum on $S$. Which
assumption in Theorem 2.5 is violated?
(a) $S = (0, 1]$, $f(x) = x^{-1}$.
(b) $S = E^n$, $f(x) = \dfrac{|x|}{1 + |x|}$.

2. Given $x_0$, let $f(x) = |x - x_0|$. Show that $f$ has a minimum on any closed, nonempty
set $A \subset E^n$. (This gives another proof of Corollary 3, Section 2.4.)

3. Let $S = [a, b]$.
(a) Show that the relative neighborhoods of $a$ are the half open intervals $[a, c)$ with
$a < c < b$, and $S$.
(b) Let $a < x_0 < \tfrac12(a + b)$. Show that the relative neighborhoods of $x_0$ are as follows:
$(x_0 - \delta, x_0 + \delta)$ if $0 < \delta < x_0 - a$; $[a, x_0 + \delta)$ if $x_0 - a \le \delta < b - x_0$; $S$.
(c) Describe the relative neighborhoods in the remaining cases $\tfrac12(a + b) \le x_0 \le b$.
4. Define the projection $\pi$ from $E^n$ onto $P$ as in Problem 7, Section 2.1.
(a) Show that $\pi(A)$ is closed and bounded if $A$ is closed and bounded.
(b) Give an example of a closed set $A$ such that $\pi(A)$ is not closed.


5. (a) Let $f$ be continuous on $S$, and let $S_1 \subset S$. Show that the restriction $f|S_1$ is
continuous on $S_1$.
(b) Let $f(x) = 1 - x$ if $x \ge 0$ and $f(x) = 0$ if $x < 0$. Let $S_1 = [0, \infty)$, $S = E^1$.
Show that $f|S_1$ is continuous, but $f$ is not continuous at every point of $S_1$.
6. Let $f$ be a transformation with domain $S$. Show that $f$ is continuous at $x_0$ if and only
if $f(x_0) = \lim_{m \to \infty} f(x_m)$ for every sequence $[x_m]$ such that $x_m \in S$ for $m = 1, 2, \ldots$
and $x_m \to x_0$ as $m \to \infty$.

7. Let $f$ be continuous on $E^n$. Suppose, moreover, that $f(x) > 0$ for all $x \ne 0$, and that
$f(cx) = c f(x)$ for any $x$ and $c > 0$. Show that there exist $a > 0$ and $b > 0$ such that
$a|x| \le f(x) \le b|x|$. [Hint: First consider $\{x : |x| = 1\}$.]
8. (Uniform continuity.) A transformation $f$ is uniformly continuous on $S \subset E^n$ if given
$\epsilon > 0$ there exists $\delta > 0$ (depending only on $\epsilon$) such that $|f(x) - f(y)| < \epsilon$ for every
$x, y \in S$ with $|x - y| < \delta$. Show that if $S$ is closed and bounded then every $f$
continuous on $S$ is uniformly continuous on $S$. [Hint: If not, then there exist $\epsilon > 0$
and, for $m = 1, 2, \ldots$, $x_m, y_m \in S$ such that $|f(x_m) - f(y_m)| \ge \epsilon$ and $|x_m - y_m| \le 1/m$.
Let $x_0$ be an accumulation point of $\{x_1, x_2, \ldots\}$. Show that the continuity of $f$ at
$x_0$ is contradicted.]

2.6 Topological spaces


In order to proceed further with the study of subsets of En and continuous
transformations, it is convenient to introduce a very general concept-that
of topological space. In this section S denotes a set, not necessarily a subset of
En, and p denotes a point of S.
The notion of topological space occurs in practically all branches of
mathematics. There are several equivalent definitions; of these, we give the
one in terms of neighborhoods.

Definition. Let $S$ be a nonempty set. For every $p \in S$ let $\mathcal{U}_p$ be a collection of
subsets of $S$ called neighborhoods of $p$ such that:
(1) Every point $p$ has at least one neighborhood.
(2) Every neighborhood of $p$ contains $p$.
(3) If $U_1$ and $U_2$ are neighborhoods of $p$, then there is a neighborhood
$U_3$ of $p$ such that $U_3 \subset U_1 \cap U_2$.
(4) If $U$ is a neighborhood of $p$ and $q \in U$, then there is a neighborhood
$V$ of $q$ such that $V \subset U$.

Then S is a topological space.

More precisely, the topological space is $S$ together with the collections
$\mathcal{U}_p$ of neighborhoods. However, it is common practice to omit explicit
reference to the collections of neighborhoods when no ambiguity can arise.
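
For a finite set, Axioms (1) through (4) can be checked mechanically. The following is a minimal sketch under the assumption that the neighborhoods of each point are supplied as a Python dictionary of frozensets; the function name `satisfies_axioms` and the three sample neighborhood systems are hypothetical and serve only as illustrations.

```python
from itertools import product


def satisfies_axioms(S, nbhds):
    """Check Axioms (1)-(4) for a finite set S.

    S:     a frozenset of points.
    nbhds: dict mapping each point p to a list of frozensets,
           playing the role of the collection of neighborhoods of p.
    """
    for p in S:
        family = nbhds.get(p, [])
        # Axiom (1): p has at least one neighborhood.
        if not family:
            return False
        # Axiom (2): every neighborhood of p is a subset of S containing p.
        if any(p not in U or not U <= S for U in family):
            return False
        # Axiom (3): U1 ∩ U2 contains some neighborhood U3 of p.
        for U1, U2 in product(family, repeat=2):
            if not any(U3 <= (U1 & U2) for U3 in family):
                return False
        # Axiom (4): each q in U has a neighborhood V contained in U.
        for U in family:
            for q in U:
                if not any(V <= U for V in nbhds.get(q, [])):
                    return False
    return True


S = frozenset({1, 2, 3})
indiscrete = {p: [S] for p in S}             # the only neighborhood is S itself
discrete = {p: [frozenset({p})] for p in S}  # {p} is a neighborhood of p
broken = {1: [frozenset({1, 2})], 2: [frozenset({2, 3})], 3: [frozenset({3})]}

print(satisfies_axioms(S, indiscrete), satisfies_axioms(S, discrete),
      satisfies_axioms(S, broken))
```

The system `broken` fails Axiom (4): the point 2 lies in the neighborhood $\{1, 2\}$ of 1, but its only neighborhood $\{2, 3\}$ is not contained in $\{1, 2\}$.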
For our purposes, the following two examples of topological spaces are
of primary importance.


EXAMPLE 1. Let $S = E^n$, and as in Section 1.4, let $\mathcal{U}_x$ be the collection of all
open spherical $n$-balls with center $x$. Clearly, Axioms (1) and (2) of the
definition above are satisfied, and in (3) we may take $U_3 = U_1 \cap U_2$.
Axiom (4) is verified in Example 1, Section 1.4. Thus $E^n$ is a topological space.

EXAMPLE 2. Let $S \subset E^n$. Let neighborhoods of $x \in S$ be all relative neighborhoods
$U = S \cap W$, where $W$ is a neighborhood of $x$ in $E^n$, as in Section 2.5.
The topology on $S$ defined by the collections of relative neighborhoods is the
relative topology. It is discussed further later in the section. Roughly speaking,
the relative topology is the one obtained by simply ignoring the complementary
set $S^c = E^n - S$.

Further examples of topological spaces appear in the homework problems.
The metric spaces (Section 2.9) furnish an important class of topological
spaces.
In any topological space $S$, the basic notions of interior, frontier, and
closure are defined just as in Section 1.4 for the topological space $E^n$. For
instance, $p$ is interior to a set $A \subset S$ if some neighborhood of $p$ is contained
in $A$.

Definition. Let $S$ be a topological space. A set $A \subset S$ is open if every point
$p \in A$ is interior to $A$. A set $A \subset S$ is closed if $S - A$ is open.

Axiom (4) (in the preceding definition) guarantees that any neighborhood
is an open set. Propositions 1.1-1.3 in Section 1.4 about unions and intersections
of open sets (or closed sets) remain true in any topological space.
The proofs are almost the same as before.
It often happens that two different collections of neighborhoods $\mathcal{U}_p$, $\mathcal{U}'_p$
lead to the same collection of open subsets of $S$. In $E^n$ we need not have started
with spherical neighborhoods. For instance, the neighborhoods obtained
from any noneuclidean norm on $E^n$ lead to the same open sets as in Section
1.4 (cf. Section 2.11).
The open sets, and not the particular kinds of neighborhoods from which
they were obtained, determine all of the topological properties of $S$. Thus we
say that the collections $\mathcal{U}_p$, $\mathcal{U}'_p$ define the same topology on $S$ if they lead to the
same collection of open sets.

EXAMPLE 3. Let $S = E^n$. Let $\mathcal{U}'_x$ consist of all $n$-cubes

$$U' = \{y : |y^i - x^i| < \delta,\ i = 1, \ldots, n\}$$

for any $\delta > 0$. The reader should verify that the collections $\mathcal{U}'_x$ of "neighborhoods"
in this sense define the same topology on $E^n$ as the usual topology of
$E^n$ in Example 1 (Problem 1). The key fact is that each $U' \in \mathcal{U}'_x$ is contained
in some $U \in \mathcal{U}_x$, and vice versa.
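
The nesting behind this key fact can be spot-checked numerically. The sketch below assumes the standard estimates $\max_i |y^i - x^i| \le |y - x| \le \sqrt{n}\, \max_i |y^i - x^i|$, so that the cube of half-side $\delta/\sqrt{n}$ lies in the ball of radius $\delta$, which in turn lies in the cube of half-side $\delta$. The function names and the random sampling scheme are illustrative assumptions, and a random test is evidence, not a proof.

```python
import math
import random


def in_ball(y, x, r):
    """True if y lies in the open ball of radius r about x."""
    return math.dist(x, y) < r


def in_cube(y, x, delta):
    """True if |y^i - x^i| < delta for every coordinate i."""
    return all(abs(yi - xi) < delta for yi, xi in zip(y, x))


def nesting_holds(n, delta, trials=2000, seed=0):
    """Randomly test cube(delta / sqrt(n)) ⊂ ball(delta) ⊂ cube(delta)."""
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(trials):
        # Sample y from a box around x that is larger than every set tested.
        y = [xi + rng.uniform(-2 * delta, 2 * delta) for xi in x]
        if in_cube(y, x, delta / math.sqrt(n)) and not in_ball(y, x, delta):
            return False
        if in_ball(y, x, delta) and not in_cube(y, x, delta):
            return False
    return True


print(nesting_holds(n=3, delta=0.5))
```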


Continuous functions
The concept of continuous function from one topological space into another
is a natural extension of that already considered for transformations between
euclidean spaces.

Definition. Let $f$ denote a function from a topological space $S$ into a topological
space $T$. The function $f$ is continuous at $p_0$ if for every neighborhood $V$
of $f(p_0)$ there exists a neighborhood $U$ of $p_0$ such that $f(U) \subset V$.

If $S \subset E^n$, $T = E^m$, and $S$ is given the relative topology, then this definition
of continuity agrees with that in Section 2.5.

Proposition 2.4'. Let $f$ and $g$ be real valued functions, continuous at $p_0$. Then
$f + g$ and $fg$ are continuous at $p_0$. If $g(p_0) \ne 0$, then $f g^{-1}$ is continuous at $p_0$.

Proposition 2.5'. A function $f$, from a topological space $S$ into $E^m$, is continuous
at $p_0$ if and only if each component $f^i$ is continuous at $p_0$, $i = 1, \ldots, m$.

We leave the proof of Propositions 2.4' and 2.5' to the reader (Problems 11
and 12).

Definition. If $f$ is continuous at every $p \in S$, we call $f$ continuous on $S$.

The following is a convenient characterization of continuous functions,
in terms of open subsets of $S$ and $T$.

Theorem 2.6. $f$ is continuous on $S$ if and only if the inverse image $f^{-1}(B)$ of
any open set $B$ is open.
PROOF. Let $f$ be continuous on $S$ and $B \subset T$ be open. Let $p$ be any point of
$f^{-1}(B)$ and $V$ be a neighborhood of $f(p)$ such that $V \subset B$. Since $f$ is continuous,
there is a neighborhood $U$ of $p$ such that $f(U) \subset V$. Then $U \subset f^{-1}(B)$,
which shows that $f^{-1}(B)$ is open.
Conversely, let $f^{-1}(B)$ be open for each open set $B$. Let $p$ be any point of $S$,
and $V$ be any neighborhood of $f(p)$. Since $V$ is open, $f^{-1}(V)$ is open and contains
$p$. Let $U$ be a neighborhood of $p$ such that $U \subset f^{-1}(V)$. Then $f(U) \subset V$,
which shows that $f$ is continuous at $p$. Since this is true for every $p \in S$, $f$ is
continuous on $S$. □
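
On finite spaces the two descriptions of continuity in Theorem 2.6 can be compared directly. The sketch below is illustrative only: it assumes each topology is given as a Python set of frozensets (its open sets) and takes the open sets containing a point as the neighborhoods of that point. The names `preimage`, `continuous_by_open_sets`, `continuous_at`, and the sample spaces are hypothetical.

```python
def preimage(f, B, S):
    """The inverse image of B under f, for f given as a dict on the finite set S."""
    return frozenset(p for p in S if f[p] in B)


def continuous_by_open_sets(f, S, open_S, open_T):
    """Criterion of Theorem 2.6: every open B has an open inverse image."""
    return all(preimage(f, B, S) in open_S for B in open_T)


def continuous_at(f, p, open_S, open_T):
    """Pointwise definition, with open sets containing a point as its neighborhoods."""
    for V in (B for B in open_T if f[p] in B):
        # Look for a neighborhood U of p with f(U) contained in V.
        if not any(p in U and all(f[q] in V for q in U) for U in open_S):
            return False
    return True


S = frozenset("abc")
open_S = {frozenset(), frozenset("a"), frozenset("ab"), S}
T = frozenset({0, 1})
open_T = {frozenset(), frozenset({1}), T}

f = {"a": 1, "b": 0, "c": 0}  # inverse image of {1} is {a}, which is open
g = {"a": 0, "b": 1, "c": 1}  # inverse image of {1} is {b, c}, which is not open

for h in (f, g):
    print(continuous_by_open_sets(h, S, open_S, open_T),
          all(continuous_at(h, p, open_S, open_T) for p in S))
```

In both cases the two tests agree, as the theorem predicts.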

Since the complement of an open set is closed and $f^{-1}(B^c) = [f^{-1}(B)]^c$,
we have the following.

Corollary 1. $f$ is continuous on $S$ if and only if the inverse image $f^{-1}(B)$ of any
closed set $B$ is closed.

Let us next specialize to real valued functions f and special choices for B.


Corollary 2. If $f$ is real valued and continuous on $S$, then $\{p : f(p) > c\}$ is an
open set, and $\{p : f(p) \ge c\}$, $\{p : f(p) = c\}$ are closed sets.
PROOF. In this case $T = E^1$. The semi-infinite interval $B = (c, \infty)$ is open, and
$\{p : f(p) > c\} = f^{-1}(B)$. The other statements are obtained by taking
$B = [c, \infty)$, $B = \{c\}$, which are closed sets. □

By applying Corollary 2 to $-f$ and $-c$, the set $\{p : f(p) < c\}$ is open and
$\{p : f(p) \le c\}$ is closed. Corollary 2 often provides a convenient way to show
that a given set is open or closed.

EXAMPLE 4. Let $A = \{(x, y) : y^2 < x^2 + 1\}$. Take $S = E^2$ and $f(x, y) =
x^2 - y^2 + 1$. Then $A = \{(x, y) : f(x, y) > 0\}$; hence $A$ is open. Similarly, the
set $\{(x, y) : y^2 \le x^2 + 1\}$ and the hyperbola $\{(x, y) : y^2 = x^2 + 1\}$ are closed
sets.

Composites
Let $f$ be a function from $S$ into $T$, and $g$ from $R$ into $S$. The composite $f \circ g$
is defined by

$$(f \circ g)(r) = f[g(r)] \quad \text{for every } r \in R.$$

Theorem 2.7. If $g$ is continuous at $r_0$ and $f$ is continuous at $p_0 = g(r_0)$, then
$f \circ g$ is continuous at $r_0$.
PROOF. Let $V$ be any neighborhood of $f(p_0)$ (see Figure 2.6). There is a neighborhood
$U$ of $p_0$ such that $f(U) \subset V$. Moreover, there is a neighborhood
$W$ of $r_0$ such that $g(W) \subset U$. Then $(f \circ g)(W) = f[g(W)] \subset f(U) \subset V$. This
shows that $f \circ g$ is continuous at $r_0$. □

EXAMPLE 5. Let $D \subset E^n$ be open. Given $x_0$ and $v$, let $g(t) = x_0 + tv$ for every
scalar $t$. Then $g$ is continuous from $E^1$ into $E^n$. By Proposition 2.6, $\Delta =
\{t : x_0 + tv \in D\}$ is open. Let $f$ be continuous on $D$ and let $\phi(t) = f(x_0 + tv)$
for $t \in \Delta$. Then $\phi = f \circ (g|\Delta)$, and by Theorem 2.7, $\phi$ is continuous on $\Delta$. This
result is similar to Proposition 2.3.

Figure 2.6


The following consequence of Theorem 2.7 is used in Chapter 5.

EXAMPLE 6. Let $1 \le s \le n - 1$. Let us regard $E^n$ as the cartesian product
$E^s \times E^{n-s}$ and write $x = (x', x'')$, where $x' = (x^1, \ldots, x^s)$, $x'' = (x^{s+1}, \ldots, x^n)$.
Given $x'_0 \in E^s$, let $g$ be the function from $E^{n-s}$ into $E^n$ such that $g(x'') =
(x'_0, x'')$ for every $x'' \in E^{n-s}$. Such a function $g$ is called an injection. Since
$|g(x'') - g(y'')| = |x'' - y''|$, $g$ is continuous. Let $D \subset E^n$ be open, and let
$D(x'_0) = \{x'' : (x'_0, x'') \in D\}$. Since $D(x'_0) = g^{-1}(D)$, by Theorem 2.6 $D(x'_0)$
is an open subset of $E^{n-s}$. Let $f$ be continuous on $D$. The function $f(x'_0, \cdot)$
whose value at each $x'' \in D(x'_0)$ is $f(x'_0, x'')$ is the composite of $f$ and $g|D(x'_0)$.
By Theorem 2.7, $f(x'_0, \cdot)$ is continuous. Similarly, given $x''_0$, the set
$\{x' : (x', x''_0) \in D\}$ is open and the function $f(\cdot, x''_0)$ is continuous.

Subspaces
Let $S_0$ be a topological space and $S$ a nonempty subset of $S_0$. By disregarding
the complement $S_0 - S$, $S$ becomes a topological space in the following
way. If $p \in S$, then a neighborhood of $p$ relative to $S$ is a set $U = S \cap W$, where
$W$ is a neighborhood in $S_0$ of $p$. Axioms (1) through (4) (in the first definition
in this section) are satisfied. For instance, to prove (3) let $U_1$, $U_2$ be relative
neighborhoods of $p$. Then $U_1 = S \cap W_1$, $U_2 = S \cap W_2$, where $W_1$ and $W_2$
are neighborhoods in $S_0$ of $p$. Since $S_0$ is a topological space, there is a neighborhood
$W_3$ in $S_0$ of $p$ with $W_3 \subset W_1 \cap W_2$. Then $U_3 = S \cap W_3$ is a relative
neighborhood of $p$ and $U_3 \subset U_1 \cap U_2$. This topology is called the relative
topology induced on $S$ by the topology of $S_0$, and $S$ is a topological subspace
of $S_0$.
In particular, if So = En, this is the relative topology on S mentioned in
Section 2.5 and Example 2 (above).
The following proposition gives a convenient way to describe the sets that
are open relative to S, or closed relative to S.

Proposition 2.10. Let $S$ be a topological subspace of $S_0$. A set $A \subset S$ is open in
the relative topology if and only if $A = S \cap D$, where $D$ is an open subset of
$S_0$. Similarly, $A$ is closed in the relative topology if and only if $A = S \cap D$
where $D$ is a closed subset of $S_0$.

PROOF. Suppose that $A = S \cap D$, where $D$ is an open subset of $S_0$. Any
$p \in A$ has a neighborhood $W$ in $S_0$ such that $W \subset D$. Then $U = S \cap W$ is a
relative neighborhood of $p$ such that $U \subset A$. Hence $A$ is relatively open.
Conversely, let $A$ be relatively open. For each $p \in A$, let $U_p$ be some relative
neighborhood of $p$ with $U_p \subset A$. Then $U_p = S \cap W_p$, where $W_p$ is a neighborhood
of $p$ in $S_0$. Let $D = \bigcup_{p \in A} W_p$. Then $D$ is an open subset of $S_0$, and
$A = S \cap D$.
The statements about closed sets are then obtained by considering the
sets $S - A$ and $S_0 - D$. □
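
For a finite space, the description of relatively open sets in Proposition 2.10 can be computed directly. The sketch below assumes the topology of $S_0$ is given as a Python set of frozensets; the function name `relative_open_sets` and the three-point space are hypothetical illustrations.

```python
def relative_open_sets(S, open_S0):
    """Sets open relative to S: all intersections S ∩ D with D open in S0."""
    return {frozenset(S & D) for D in open_S0}


S0 = frozenset({1, 2, 3})
open_S0 = {frozenset(), frozenset({1}), frozenset({1, 2}), S0}

S = frozenset({2, 3})
rel_open = relative_open_sets(S, open_S0)
# Relatively closed sets are the complements in S of relatively open sets.
rel_closed = {S - A for A in rel_open}

print(sorted(sorted(A) for A in rel_open))
```

Here $\{2\} = S \cap \{1, 2\}$ is open relative to $S$ although it is not an open subset of $S_0$.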


EXAMPLE 7. Let $S = [a, b]$, $S_0 = E^1$, as in Example 1, Section 2.5. For
$a < c < b$, the interval $A = [a, c)$ is open relative to $S$, since $A = S \cap D$
where $D = (-\infty, c)$ is an open subset of $E^1$.

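EXAMPLE 8. Let $S = \{\text{all rational numbers}\}$, $S_0 = E^1$, $A = \{\text{rational } x : x^2 < 2\}$. Then

$$A = S \cap (-\sqrt{2}, \sqrt{2}) = S \cap [-\sqrt{2}, \sqrt{2}],$$

since $\pm\sqrt{2}$ are irrational. Since $(-\sqrt{2}, \sqrt{2})$ is an open subset of $E^1$ and $[-\sqrt{2}, \sqrt{2}]$ is a closed subset of
$E^1$, $A$ is both open and closed relative to $S$.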

Homeomorphisms
Let $f$ be a univalent function from a topological space $S$ onto a topological
space $T$. Then $f$ has an inverse $f^{-1}$, whose domain is $T$. For each $q \in T$ its
value $f^{-1}(q)$ is the unique $p \in S$ such that $f(p) = q$. If both $f$ and its inverse
$f^{-1}$ are continuous functions, then $f$ is a homeomorphism. If there is a homeomorphism
from $S$ onto $T$, then $S$ and $T$ are homeomorphic topological spaces.
In topology, homeomorphic spaces $S$ and $T$ may be regarded as indistinguishable.
Every topological property enjoyed by $S$ is also enjoyed by $T$.
For a deeper introduction to the literature on the general theory of
topological spaces, we refer to the work by Kelley [14].

PROBLEMS
1. Consider $E^n$ with $\mathcal{U}_x$ the collection of neighborhoods of $x$ (open spherical $n$-balls
with center $x$), and $\mathcal{U}'_x$ the collection of "neighborhoods" in Example 3.
(a) Verify that $E^n$ with the collections $\mathcal{U}'_x$ of "neighborhoods" satisfies Axioms (1)
through (4) for a topological space.
(b) Show that each $U' \in \mathcal{U}'_x$ contains some $U \in \mathcal{U}_x$; and each $U \in \mathcal{U}_x$ contains
some $U' \in \mathcal{U}'_x$.
(c) Using (b), show that the collections $\mathcal{U}_x$, $\mathcal{U}'_x$ lead to the same collection of open
sets, and hence define the same topology on $E^n$.
2. Use Corollary 2 to show that each of the following sets is closed:
(a) $\{x : -2 \le x \le 2,\ x^3 - x \ge 0\}$.
(b) $\{(x, y) : x^4 + y^4 = 1\}$.
(c) $\{x : y_0 \cdot x \le |x|\}$, $y_0$ a given vector.

3. Let $S_0 = E^1$. Find whether $A$ is open relative to $S$, closed relative to $S$, or neither.
(a) $S = \{x : a \le x \le b,\ x \ne c\}$, $A = [a, c)$ where $a < c < b$.
(b) $S = (0, 1]$, $A = \{1, \tfrac12, \tfrac13, \ldots\}$.
(c) $S = [0, 1]$, $A = \{1, \tfrac12, \tfrac13, \ldots\}$.

4. Let $S_0 = E^2$. Find whether $A$ is open relative to $S$, closed relative to $S$, or neither.
(a) $S = \{(x, y) : 0 < x^2 + y^2 < 1\}$, $A = \{(x, y) \in S : x^2 \le y^2\}$.
(b) $S = \{(x, y) : x^2 + y^2 = 4\}$, $A = \{(x, y) \in S : x^2 < y^2\}$.
5. Let $S$ be an open subset of $S_0$. Show that the relatively open sets are just those open
subsets of $S_0$ contained in $S$.

Problems 6 through 9 give some examples of topological spaces.

6. (Indiscrete spaces.) Let $S$ be any set, and let every $p \in S$ have exactly one "neighborhood,"
namely, $S$ itself; that is, each $\mathcal{U}_p$ consists of the set $S$ only.
(a) Verify Axioms (1) through (4).
(b) Show that the only open sets are $S$ and the empty set.
(c) Show that any real valued function continuous on $S$ is constant.
7. (Discrete spaces.) Let $S$ be a topological space such that the set $\{p\}$ with the one
element $p$ is a "neighborhood" of $p$; that is, $\{p\} \in \mathcal{U}_p$ for each $p \in S$.
(a) Show that every subset of $S$ is open.
(b) Show that every function with domain $S$ is continuous.
(c) Suppose that $S \subset E^n$ with the relative topology. Show that $S$ is a discrete space
if and only if every $x \in S$ is isolated.
8. Let $S = E^1$ with the following nonstandard topology. In this topology the
"$\delta$-neighborhood" of $x_0$ is the interval $[x_0, x_0 + \delta)$.
(a) Verify Axioms (1) through (4).
(b) Show that $[0, \infty)$ is both open and closed in this topology.
(c) Let $f$ be a real valued function from $E^1$ with this topology into $E^1$ with the
standard topology. Show that continuity of $f$ at $x_0$ is equivalent to $f(x_0) =
\lim_{x \to x_0^+} f(x)$, where the superscript $+$ denotes right-hand limit.

9. Consider the following nonstandard topology on the plane $E^2$: In this topology
the "$\delta$-neighborhood" of $(x_0, y_0)$ is the set $\{(x, y) : x_0 - \delta < x < x_0 + \delta,\ y = y_0\}$.
(a) Verify Axioms (1) through (4).
(b) Show that if $D \subset E^2$ is open in the usual sense, then $D$ is open in this topology,
but not conversely.
(c) Let $f(x, y) = g(x)h(y)$, where $g$ and $h$ have domain $E^1$ and $g$ is continuous in
the usual topology of $E^1$. Show that $f$ is continuous in this topology.
10. Let $S = \{(x, y) : x^2 + y^2 = 1\}$, $T = \{(x, y) : |x| + |y| = 1\}$, both with the relative
topology. Show that $S$ and $T$ are homeomorphic topological spaces, by finding a
homeomorphism $f$ from $S$ onto $T$.
11. Prove Proposition 2.4'.
12. Prove Proposition 2.5'.

2.7 Connectedness
From the intuitive point of view, a set should be regarded as connected if it
consists of one piece. Thus an interval on the real line $E^1$ is connected, while
the set $[0, 1] \cup [2, 3]$ is disconnected. For more complicated sets, intuition
is not a reliable guide.

Definition. Let $S_0$ be a topological space. A set $S \subset S_0$ is disconnected if there
exist nonempty sets $A$ and $B$ such that $S = A \cup B$, $A \cap B$ is empty, and
$A$, $B$ are both open relative to $S$. If $S$ is not disconnected, then $S$ is a
connected set.
If $S_0$, considered as a subset of itself, is connected, then $S_0$ is a connected
topological space.
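
In a finite topological space the definition can be tested by brute force: look for a relatively open set $A$ whose complement $B = S - A$ is also nonempty and relatively open. The sketch below assumes the topology of $S_0$ is given as a Python set of frozensets; the name `separation` and the three-point space are hypothetical.

```python
def separation(S, open_S0):
    """Return a pair (A, B) disconnecting S, or None if S is connected."""
    S = frozenset(S)
    # Sets open relative to S are the intersections S ∩ D, D open in S0.
    rel_open = {S & D for D in open_S0}
    for A in rel_open:
        B = S - A
        if A and B and B in rel_open:
            return A, B
    return None


S0 = frozenset({1, 2, 3})
open_S0 = {frozenset(), frozenset({1}), frozenset({3}), frozenset({1, 3}), S0}

print(separation(S0, open_S0) is None)        # S0 itself admits no separation
print(separation({1, 3}, open_S0) is None)    # the subset {1, 3} does
```

In this example the whole space is connected, because the only open set containing the point 2 is $S_0$, while the subset $\{1, 3\}$ is disconnected by $A = \{1\}$, $B = \{3\}$.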


EXAMPLE 1. Let $S = [0, 1] \cup [2, 3]$, $S_0 = E^1$. To show that $S$ is disconnected,
let $A = [0, 1]$, $B = [2, 3]$. Then $A = S \cap (-1, \tfrac32)$, which shows that $A$ is
open relative to $S$ by Proposition 2.10. Similarly, $B$ is open relative to $S$.

EXAMPLE 2. Let $S = \{(x, y) : y^2 \le x^2,\ x \ne 0\}$, $S_0 = E^2$. To show that $S$ is
disconnected, let $A = S \cap H_+$, $B = S \cap H_-$, where $H_+$, $H_-$ are the open
half-planes $H_+ = \{(x, y) : x > 0\}$, $H_- = \{(x, y) : x < 0\}$.
Let us next characterize completely the connected subsets of $E^1$.

Definition. A nonempty set $J \subset E^1$ is an interval if for every $x, y \in J$, $x < y$,
the set $[x, y]$ is contained in $J$ (Figure 2.7).

Figure 2.7. An interval $J$ containing $[x, y]$.

In Example 2, Section 1.1, four types of finite interval and four types of
semi-infinite interval were considered. Each of these is an interval according
to the definition just given. Moreover, $E^1$ is an interval, and any set $\{x\}$ with
a single point is an interval. These 10 types of intervals are the only possibilities
(Problem 5).

Proposition 2.11. A set $S \subset E^1$ is connected if and only if $S$ is an interval.
PROOF. If $S$ is not an interval, then there exist $x, y \in S$, $x < y$, and $z \notin S$ such
that $x < z < y$. Let $A = S \cap (-\infty, z)$, $B = S \cap (z, \infty)$. Then $A$ and $B$
are nonempty and relatively open, $A \cup B = S$, and $A \cap B$ is empty. Therefore
$S$ is disconnected.
Conversely, suppose that some interval $J$ is disconnected. Then $J =
A \cup B$ where $A$ and $B$ are not empty and open relative to $J$, and $A \cap B$ is
empty. Let $x_1 \in A$ and $x_2 \in B$. The notation $A$, $B$ may be chosen so that
$x_1 < x_2$. Since $J$ is an interval, $[x_1, x_2] \subset J$; hence the fact that $A$ is relatively
open implies that there exists $\delta_1 > 0$ such that $[x_1, x_1 + \delta_1) \subset A$. Similarly,
there exists $\delta_2 > 0$ such that $(x_2 - \delta_2, x_2] \subset B$. Letting $B_1 = \{x \in B : x > x_1\}$
and $y = \inf B_1$, we have $x_1 < y < x_2$. Since $J$ is an interval, $y \in J$. If $y \in A$,
then some interval $(y - \delta, y + \delta)$ is contained in $A$ and $y + \delta$ is a lower
bound for $B_1$, contrary to the fact that $y$ is the greatest lower bound. Similarly,
if $y \in B$, then some interval $(y - \delta, y + \delta)$ is contained in $B_1$, and $y$ is not a
lower bound. This is a contradiction. □

Let $f$ be a function from $S_0$ into a topological space $T$. For $S \subset S_0$, we call $f$
continuous on $S$ if the restriction $f|S$ is continuous at each $p \in S$ when $S$ is
given the relative topology. This agrees with the terminology "continuous
on $S$" in Section 2.5 when $S_0 = E^n$ and in Section 2.6.


Theorem 2.8. If $S$ is a connected set and $f$ is continuous on $S$, then $f(S)$ is a
connected set.
PROOF. Consider first the case $S = S_0$ and $f(S) = T$. We argue by contradiction.
Suppose that $T$ is disconnected. Then $T = P \cup Q$ where $P$, $Q$ are open,
nonempty, and $P \cap Q$ is empty. Let $A = f^{-1}(P)$, $B = f^{-1}(Q)$. By Theorem
2.6, $A$ and $B$ are open. Moreover, $S = A \cup B$, $A \cap B$ is empty, and neither
$A$ nor $B$ is empty. Thus $S$ is disconnected, contrary to hypothesis.
To reduce the general case to this one, consider $S$ as a topological space
with the relative topology. The space $S$ is connected. Replace $f$ by $f|S$, and $T$
by $f(S)$, regarded as a topological subspace of $T$ with the relative topology.
Then $f|S$ is continuous, regarded as a function from $S$ onto $f(S)$. By what has
already been proved, $f(S)$ is connected. □

Corollary (intermediate value theorem). If $S$ is a connected set and $f$ is real
valued and continuous on $S$, then $f(S)$ is an interval.
PROOF. By Proposition 2.11 every connected subset of $E^1$ is an interval. □

As a particular instance of the intermediate value theorem, suppose that
$S = [a, b]$, a closed, bounded interval. If $f$ is real valued and continuous on $S$,
then Theorem 2.5 implies that $f$ has a maximum and minimum on $S$. Since $S$
is connected, the intermediate value theorem implies that $f(x)$ assumes on $S$
every value between its maximum and minimum.
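
The intermediate value theorem only asserts that a point with a prescribed intermediate value exists. One standard way to approximate such a point numerically is bisection, sketched below; this is a swapped-in numerical technique, not part of the text's argument, and the name `bisect_value` and the tolerance are illustrative assumptions. It assumes $f$ is continuous on $[a, b]$ and that $c$ lies between $f(a)$ and $f(b)$.

```python
def bisect_value(f, a, b, c, tol=1e-12):
    """Approximate x in [a, b] with f(x) = c by repeated halving.

    Assumes f is continuous on [a, b] and c lies between f(a) and f(b).
    """
    fa, fb = f(a) - c, f(b) - c
    if fa == 0:
        return a
    if fb == 0:
        return b
    if fa * fb > 0:
        raise ValueError("c must lie between f(a) and f(b)")
    while b - a > tol:
        mid = (a + b) / 2
        fm = f(mid) - c
        if fm == 0:
            return mid
        if fa * fm < 0:
            b = mid          # f - c changes sign on [a, mid]
        else:
            a, fa = mid, fm  # f - c changes sign on [mid, b]
    return (a + b) / 2


# f(x) = x^3 takes the value 2 somewhere on [0, 2].
x = bisect_value(lambda t: t ** 3, 0.0, 2.0, 2.0)
print(x)
```

At every step the value $c$ still lies between the values of $f$ at the endpoints of the current subinterval, so the theorem guarantees that the subinterval contains a solution.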
*Pathwise connectedness
Let $p$ and $q$ be points of a topological space $S$. A path in $S$ from $p$ to $q$ is a
continuous function $g$ from $[0, 1]$ into $S$ with $g(0) = p$, $g(1) = q$. If every
such pair of points can be joined by a path in $S$, then $S$ is called pathwise
connected.
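
As a small sketch of what paths look like in $E^n$, the code below builds the straight-line path between two points and joins two paths end to end by traversing each at double speed. The names `segment` and `join` are hypothetical, points are represented as tuples of floats, and whether such a path stays inside a given set $S$ depends on $S$ (a segment does whenever $S$ is convex).

```python
def segment(p, q):
    """Straight-line path g(t) = p + t (q - p), 0 <= t <= 1, from p to q."""
    def g(t):
        return tuple(pi + t * (qi - pi) for pi, qi in zip(p, q))
    return g


def join(g1, g2):
    """Traverse g1 on [0, 1/2] and then g2 on [1/2, 1]; assumes g1(1) == g2(0)."""
    def h(t):
        return g1(2 * t) if t <= 0.5 else g2(2 * t - 1)
    return h


# A two-segment path in the plane from (0, 0) through (1, 0) to (1, 1).
h = join(segment((0.0, 0.0), (1.0, 0.0)), segment((1.0, 0.0), (1.0, 1.0)))
print(h(0.0), h(0.5), h(1.0))  # (0.0, 0.0) (1.0, 0.0) (1.0, 1.0)
```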

Proposition 2.12. If $S$ is pathwise connected, then $S$ is connected.
PROOF. If $S$ is disconnected, then $S = A \cup B$ as in the definition of disconnected
space. Let $p \in A$, $q \in B$, and $g$ be a path in $S$ joining $p$ and $q$.
Since $g$ is continuous, $g^{-1}(A)$ and $g^{-1}(B)$ are open relative to $[0, 1]$, their
union is $[0, 1]$, and their intersection is empty. This contradicts the fact that
$[0, 1]$ is connected. □

EXAMPLE 3. Let $S = S_1 \cup S_2$, where

$$S_1 = \{(0, y) : -1 \le y \le 1\}, \qquad S_2 = \{(x, \sin 1/x) : x > 0\}.$$

Then it can be shown that $S$ is connected but not pathwise connected
(Problem 12).

On the other hand, any open connected subset $D$ of $E^n$ is pathwise connected.
In fact, any two points of $D$ can be connected by a polygonal path (Problem 10).


PROBLEMS

1. Show from the definition that the following are disconnected subsets of the plane $E^2$:
(a) The hyperbola $x^2 - y^2 = 1$.
(b) Any finite subset of $E^2$ with at least two elements.
(c) $\{(x, y) : x^2 + y^2 \le 4,\ y \ne x + 1\}$.

2. Show that a topological space $S$ is disconnected if and only if there exists a continuous
function $f$ on $S$, with $f(p) = 0$ or $f(p) = 1$ for all $p \in S$, but $f(p)$ not constant
on $S$.

3. Let $S_1$ and $S_2$ be connected subsets of $E^n$, such that $S_1 \cap S_2$ is not empty. Show that
$S_1 \cup S_2$ is connected.

4. Let $S$ be an open subset of $E^n$. Show that $S$ is disconnected if and only if $S = A \cup B$
where $A$ and $B$ are nonempty open subsets of $E^n$, such that $A \cap B$ is empty. Prove the
corresponding statement in which "open" is replaced everywhere by "closed."

5. Let $J$ be an interval. Show that either $J = E^1$, $J$ has a single point, or $J$ is of one of the
eight types listed in Example 2, Section 1.1.

6. Instead of Axiom III (least upper bound property) about the real numbers, take
as an axiom the property that $E^1$ is connected. Prove Axiom III as a theorem.
[Hint: Let $A = \{\text{all upper bounds of } S\}$ and $B = A^c$. Show that $B$ is open; and if $S$
has no least upper bound, then $A$ is open.]

7. Show that each of the following sets is pathwise connected.
(a) Any convex set.
(b) The unit circle $x^2 + y^2 = 1$ in $E^2$.
(c) The unit sphere $x^2 + y^2 + z^2 = 1$ in $E^3$.

8. Let $g_1$ be a path from $p_1$ to $p_2$ and $g_2$ a path from $p_2$ to $p_3$. Let $h(t) = g_1(2t)$ if
$0 \le t \le \tfrac12$, and $h(t) = g_2(2t - 1)$ if $\tfrac12 \le t \le 1$. Show that $h$ is a path from $p_1$ to $p_3$.
9. Let $D \subset E^n$ be open. By polygonal path in $D$ from $x$ to $y$ let us mean a path $g$ in $D$
from $x$ to $y$ with the following property: There exist $t_0, t_1, \ldots, t_m$ such that $0 =
t_0 < t_1 < \cdots < t_{m-1} < t_m = 1$ and $g(t) = g(t_k) + (t - t_k) v_k$ if $t_k \le t \le t_{k+1}$, where
$v_k = (t_{k+1} - t_k)^{-1}[g(t_{k+1}) - g(t_k)]$, $k = 0, 1, \ldots, m - 1$.
Let $g$ be a polygonal path in $D$ from $x$ to $y$, and $U$ be any convex set such that
$y \in U$ and $U \subset D$. Using Problem 8, find a polygonal path in $D$ from $x$ to any point
$z \in U$.

10. Let $D \subset E^n$ be open. Given $x \in D$, let $A = \{y : \text{there is a polygonal path in } D \text{ from } x \text{ to } y\}$
and let $B = D - A$. Using Problem 9, show that $A$ and $B$ are open sets and
$A$ is not empty. [Hint: Any neighborhood is convex.] If $B$ is not empty, then $D$ is
disconnected.
11. A set $S$ is totally disconnected if no connected subset of $S$ contains more than one
point. Show that the following are totally disconnected subsets of $E^1$:
(a) $S = \{\text{rational numbers}\}$.
(b) $S = C$, the Cantor set (Section 2.4, Problem 5).
(c) Show that $E^1$ would be totally disconnected if we gave $E^1$ the nonstandard
topology in Problem 8, Section 2.6.


12. Let $S$ be as in Example 3. Show that:
(a) $S$ is a closed set.
(b) There is no path in $S$ joining $(0, 0)$ and any point of $S_2$.
(c) $S$ is a connected set.

2.8 Compactness
Let us next introduce another property, called compactness, which a subset
of a topological space may possess. For a set $S \subset E^n$, compactness of $S$
turns out to be equivalent to $S$ being closed and bounded. Therefore, Theorem
2.4 can be rederived in a more elegant way.
Compactness is defined in terms of open coverings. Let $S$ be a subset of a
topological space $S_0$. A collection $\mathfrak{A}$ of subsets of $S_0$ is a covering of $S$ if
every point of $S$ belongs to some set $A \in \mathfrak{A}$, that is, if $S \subset \bigcup_{A \in \mathfrak{A}} A$. If $\mathfrak{A}' \subset \mathfrak{A}$
and $\mathfrak{A}'$ is also a covering of $S$, then $\mathfrak{A}'$ is a subcovering. If every $A \in \mathfrak{A}$ is an
open subset of $S_0$, then $\mathfrak{A}$ is an open covering of $S$.
We are interested in whether every open covering $\mathfrak{A}$ of $S$ has a finite
subcovering; i.e., whether finitely many sets $A_1, \ldots, A_m \in \mathfrak{A}$ exist such that
$S \subset A_1 \cup \cdots \cup A_m$.

EXAMPLE 1. Let $S \subset E^n$, and fix $a > 0$. Let $U_x$ be the neighborhood of $x$ of
radius $a$, and $\mathfrak{A}$ the collection of such $U_x$ for all $x \in S$. Then $\mathfrak{A}$ is an open
covering of $S$. Generally, one will still have a covering if some of the sets $U_x$
are discarded from the collection. In fact, if $S$ is a bounded set, there will be a
subcovering $\mathfrak{A}'$ consisting of finitely many $U_x$.

EXAMPLE 2. In Example 1, let $U_x$ be some neighborhood of $x$ whose radius
is no longer fixed, and $\mathfrak{A}$ the collection of such $U_x$ for all $x \in S$. It may turn
out that there is no finite subcovering, even though $S$ is bounded. For instance,
let $S = (0, 1]$, $U_x = (\tfrac12 x, \tfrac32 x)$, and $\mathfrak{A}$ the collection of these open intervals $U_x$
for $0 < x \le 1$. Then $\mathfrak{A}$ is an open covering of $S$. However, there do not exist
finitely many points $x_1, \ldots, x_m$ in $S$ such that $S \subset U_{x_1} \cup \cdots \cup U_{x_m}$.

Definition. A subset S of a topological space So is compact if every open


covering of S contains a finite subcovering.
If So, considered as a subset of itself, is compact, then So is a compact
topological space.

Let us consider the particular case So = En. The compact subsets of En


are characterized by the following theorem, and its converse.

Theorem (Heine-Borel). If S is a closed, bounded subset of En, then every open


covering of S contains a finite subcovering.


PROOF. Let $\mathfrak{A}$ be an open covering of $S$. Suppose that no finite collection
$\mathfrak{A}' \subset \mathfrak{A}$ covers $S$. Let us define a sequence of closed, bounded sets $S_1 \supset
S_2 \supset \cdots$ such that $\operatorname{diam} S_k \to 0$ as $k \to \infty$ and no finite subcollection of $\mathfrak{A}$
covers any $S_k$. Let $S_1 = S$. Since $S_1$ is bounded, some closed $n$-cube $I_1$
contains $S_1$. Divide $I_1$ into $n$-cubes $I_{11}, \ldots, I_{1m}$ as in the proof of the
Bolzano-Weierstrass theorem, and let $S_{1k} = S_1 \cap I_{1k}$. Since $S_1$ and $I_{1k}$ are closed, so is
$S_{1k}$. If for every $k = 1, \ldots, m$ some finite collection $\mathfrak{A}_k \subset \mathfrak{A}$ covered $S_{1k}$, then
$\mathfrak{A}_1 \cup \cdots \cup \mathfrak{A}_m$ would be a finite subcovering of $S$, contrary to assumption.
Choose some $k$ for which no finite subcollection of $\mathfrak{A}$ covers $S_{1k}$, and let
$S_2 = S_{1k}$. Repeating this process, we obtain the desired sequence of closed,
bounded sets.
By Theorem 2.3, $S_1 \cap S_2 \cap \cdots$ contains a single point $x_0$. Since $\mathfrak{A}$
covers $S$, $x_0$ belongs to some set $A \in \mathfrak{A}$; and since $\mathfrak{A}$ is an open covering, $A$ is
an open set. Therefore, there is a neighborhood $U$ of $x_0$ such that $U \subset A$.
Since $\operatorname{diam} S_k \to 0$ as $k \to \infty$, $S_k \subset U$ for large enough $k$. For such $k$, $S_k$ is
covered by the subcollection of $\mathfrak{A}$ consisting of the single set $A$. Since by construction
no finite subcollection of $\mathfrak{A}$ covers any $S_k$, this is a contradiction. □
EXAMPLE 3. Let $S = [0, 1]$. For $0 < x \le 1$, define $U_x$ as in Example 2; and
let $U_0 = (-1, 1)$. The collection $\mathfrak{A}$ of open intervals $U_x$ for $0 \le x \le 1$ is an
open covering of $[0, 1]$. By the Heine-Borel theorem, there is a finite subcovering.
In fact, the three intervals $U_0$, $U_{1/2}$, $U_1$ give a covering of $[0, 1]$.

The converse to the Heine-Borel theorem is true. To prove it, suppose first
that $S$ is unbounded. Let $A_m$ be the neighborhood of $0$ of radius $m = 1, 2, \ldots$.
Then $\{A_1, A_2, \ldots\}$ is an open covering of $S$ with no finite subcovering. If $S$
is not closed, let $x_0 \in \operatorname{fr} S - S$ and $A_m = \{x : |x - x_0| > 1/m\}$. Then
$\{A_1, A_2, \ldots\}$ is an open covering of $S$ with no finite subcovering. This proves
the converse.
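
The non-closed case can be checked on a concrete instance. The sketch below is an illustration only: it assumes $S = (0, 1]$ and $x_0 = 0$ (a choice not made in the text), and the helper names `in_A` and `uncovered_point` are hypothetical. Since $A_1 \subset A_2 \subset \cdots$, any finite subcollection is contained in its largest member $A_M$, and the point $1/(M+1)$ of $S$ is missed.

```python
from fractions import Fraction

def in_A(m, x, x0=Fraction(0)):
    """True if x lies in A_m = {x : |x - x0| > 1/m}."""
    return abs(x - x0) > Fraction(1, m)

def uncovered_point(indices):
    """Return a point of S = (0, 1] missed by every A_m with m in `indices`."""
    largest = max(indices)
    return Fraction(1, largest + 1)

# A hypothetical finite subcollection {A_1, A_5, A_12, A_40}.
indices = [1, 5, 12, 40]
p = uncovered_point(indices)
print(p, any(in_A(m, p) for m in indices))
```

Exact rational arithmetic is used so that the strict inequality is tested without rounding.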
We summarize these results as follows.

Theorem 2.9. A set $S \subset E^n$ is compact if and only if $S$ is closed and bounded.

The reader is cautioned that Theorem 2.9 depends heavily on the finite
dimensionality of En. The concepts in this theorem all make sense in any
normed vector space (Section 2.9). However, closed, bounded subsets of an
infinite dimensional normed vector space need not be compact. For an
example, see Problem 4, Section 2.9.
Let us next show that the continuous image of a compact set $S$ is compact.
Consider a function $f$ from $S_0$ into some topological space $T$, and $S \subset S_0$.

Theorem 2.10. If $S$ is a compact set and $f$ is continuous on $S$, then $f(S)$ is a
compact set.


PROOF. Consider first the case $S = S_0$. Let $\mathfrak{B}$ be any open covering of $f(S)$.
Since $f$ is continuous, $f^{-1}(B)$ is open for every $B \in \mathfrak{B}$. The collection of sets
$f^{-1}(B)$ is an open covering of $S$. Since $S$ is compact, a finite subcollection
$\{f^{-1}(B_1), \ldots, f^{-1}(B_m)\}$ covers $S$. Then $\{B_1, \ldots, B_m\}$ is a finite subcollection
of $\mathfrak{B}$ which covers $f(S)$. Hence $f(S)$ is compact. The case $S \subset S_0$ reduces to
this one by considering $S$ as a topological space with the relative topology
and by replacing $f$ with the restriction $f|S$. □

Note that Theorem 2.4 is an immediate consequence of Theorems 2.9 and
2.10. This gives another proof of Theorem 2.4, based on the Heine-Borel
theorem rather than the Bolzano-Weierstrass theorem.

PROBLEMS

1. Show that any closed subset of a compact topological space is compact.
2. Let $A$ and $B$ be compact subsets of a topological space $S_0$. Show that $A \cup B$ is
compact.
3. Let $A \subset E^r$ and $B \subset E^{n-r}$ be compact. Show that $A \times B$ is a compact subset of $E^n$.
[Hint: Problem 8(b), Section 2.1.]
4. Let $A$ be a nonempty subset of $E^n$, and let $f(x) = \inf\{|x - y| : y \in A\}$. This is the
distance from $x$ to $A$. Show that:
(a) $f(x) = 0$ if and only if $x \in \operatorname{cl} A$.
(b) $|f(x_1) - f(x_2)| \le |x_1 - x_2|$ for every $x_1, x_2 \in E^n$; consequently $f$ is continuous
on $E^n$.
(c) $\{x : f(x) \le c\}$ is compact if $A$ is compact, for any $c \ge 0$.
5. Let $A$, $B$ be nonempty subsets of $E^n$, and let $d = \inf\{|x - y| : x \in B, y \in A\}$.
(a) Show that $d > 0$ if $A$ is closed, $B$ is compact, and $A \cap B$ is empty. [Hint:
Problem 4.]
(b) Give an example of closed sets $A$, $B$ such that $A \cap B$ is empty but $d = 0$.
6. A dyadic rational number is a real number of the form $x = j2^{-k}$, where $j$, $k$ are integers
and $k \ge 0$. Let $S = \{\text{all dyadic rational } x : 0 < x < 1\}$. If $x = j2^{-k}$ with $j$ odd,
$1 \le j \le 2^k - 1$, and $k > 0$, let $U_x = (x - 4^{-k}, x + 4^{-k})$. Show that there do not
exist finitely many points $x_1, \ldots, x_m \in S$ such that $S \subset U_{x_1} \cup \cdots \cup U_{x_m}$.
7. Let $S$ be a compact topological space.
(a) Let $A_m$ be a closed, nonempty subset of $S$ for $m = 1, 2, \ldots$ such that $A_1 \supset A_2 \supset \cdots$.
Show that $\bigcap_{m=1}^{\infty} A_m$ is not empty. [Hint: Consider the collection $\mathfrak{A}$ of open sets
$S - A_m$, $m = 1, 2, \ldots$.]
(b) Let $A_m$ be a closed, nonempty subset of $S$ for $m = 1, 2, \ldots$, such that $A_1 \cap \cdots \cap A_l$
is not empty for each $l = 1, 2, \ldots$. Show that $\bigcap_{m=1}^{\infty} A_m$ is not empty. [Hint:
Part (a).]

2.9 Metric spaces


In Section 2.6 a very general notion was introduced, that of topological space,
by axiomatizing the idea of neighborhood. In the present section we proceed
in a different way, by axiomatizing the idea of distance between any two points
of a set $S$. We denote the "distance" between $p, q \in S$ by $d(p, q)$, and require $d$
to satisfy Properties (i) through (iii) below. A set provided with such a distance
is called a metric space. We formalize this as follows.

Definition. A metric space is a nonempty set $S$ together with a real valued
function $d$ with domain the cartesian product $S \times S$, such that:
(i) $d(p, q) \ge 0$ for every $p, q \in S$; $d(p, q) = 0$ if and only if $p = q$.
(ii) $d(p, q) = d(q, p)$ for every $p, q \in S$.
(iii) $d(p, r) \le d(p, q) + d(q, r)$ for every $p, q, r \in S$.
The function $d$ is called a metric on $S$.
Property (iii) is called the triangle inequality in the metric space $S$.
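
Properties (i) through (iii) can be tested mechanically on a finite sample of points. The following is a minimal sketch under that assumption; the function name `is_metric_on` and the sample points are hypothetical. A passing check on a sample does not prove that $d$ is a metric, but a failing one does show that it is not. The squared euclidean distance is included as a function that violates (iii).

```python
import itertools
import math

def is_metric_on(points, d, tol=1e-12):
    """Check Properties (i)-(iii) of a metric on a finite list of points."""
    for p, q in itertools.product(points, repeat=2):
        if d(p, q) < 0:                      # (i) nonnegativity
            return False
        if (d(p, q) == 0) != (p == q):       # (i) d(p, q) = 0 iff p = q
            return False
        if abs(d(p, q) - d(q, p)) > tol:     # (ii) symmetry
            return False
    for p, q, r in itertools.product(points, repeat=3):
        if d(p, r) > d(p, q) + d(q, r) + tol:  # (iii) triangle inequality
            return False
    return True

def euclid(p, q):
    return math.dist(p, q)

def discrete(p, q):
    return 0.0 if p == q else 1.0

def squared(p, q):
    # Not a metric: fails the triangle inequality on collinear points.
    return math.dist(p, q) ** 2

pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (0.5, 1.5)]
print(is_metric_on(pts, euclid), is_metric_on(pts, discrete), is_metric_on(pts, squared))
```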

EXAMPLE 1. Let $S = E^n$, and $d(x, y) = |x - y|$ be the usual euclidean distance.
Properties (i) and (ii) clearly hold, and (iii) is just Formula (1.4). If $S \subset E^n$,
the euclidean distance also defines a metric on $S$. Thus any nonempty subset
of $E^n$, with the euclidean distance, is a metric space.

EXAMPLE 2. Let $S$ be a sphere in $E^3$. As distance $d(x, y)$ between $x, y \in S$ take
the length of the shorter great circle arc on $S$ joining $x$ and $y$. This gives a
different metric on $S$ than the euclidean distance $\bar{d}(x, y)$. They satisfy the
inequalities

$\bar{d}(x, y) \le d(x, y) \le \frac{\pi}{2}\,\bar{d}(x, y).$

Although the metrics $d$ and $\bar{d}$ are different, they will give rise to the same
topology on the sphere (the relative topology).
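
The two inequalities can be spot-checked numerically. The sketch below assumes the unit sphere centered at the origin (the text allows any sphere) and uses hypothetical helper names; the great circle distance is then the angle between the two unit vectors. A small tolerance absorbs floating point error in `acos`.

```python
import math
import random

def chord(x, y):
    """Euclidean distance between two points of E^3."""
    return math.dist(x, y)

def arc(x, y):
    """Great circle distance on the unit sphere: the angle between x and y."""
    dot = sum(a * b for a, b in zip(x, y))
    return math.acos(max(-1.0, min(1.0, dot)))

def random_unit_vector(rng):
    """A random point on the unit sphere (normalized Gaussian sample)."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        n = math.sqrt(sum(c * c for c in v))
        if n > 1e-9:
            return tuple(c / n for c in v)

def check_sphere_inequalities(trials=2000, seed=0, tol=1e-6):
    """Test chord <= arc <= (pi/2) * chord on random pairs of points."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = random_unit_vector(rng), random_unit_vector(rng)
        if not (chord(x, y) <= arc(x, y) + tol):
            return False
        if not (arc(x, y) <= (math.pi / 2) * chord(x, y) + tol):
            return False
    return True

print(check_sphere_inequalities())
```

The constant $\pi/2$ is attained for antipodal points, where the chord is the diameter and the arc is half a great circle.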

EXAMPLE 3. Let $S$ be any nonempty set. Let $d(p, q) = 1$ if $p \ne q$; and
$d(p, p) = 0$. This may seem a rather artificial sort of "distance," but it does
have the properties (i), (ii), and (iii) required for a metric space.

Any metric defines a topology of $S$, in the same way that the euclidean
distance defines the usual topology of $E^n$. By $\delta$-neighborhood of a point $p$
in a metric space let us mean the set
$U = \{q \in S : d(p, q) < \delta\}$, $\delta > 0$.
Axioms (1) and (2) for a topological space (in the first definition in Section 2.6)
are satisfied. In Axiom (3) we can take $U_3 = U_1 \cap U_2$. To verify (4), let $U$ be
the $\delta$-neighborhood of $p$, and $q \in U$. Let $V$ be the $(\delta - d(p, q))$-neighborhood of
$q$. If $r \in V$, then by (iii)
$d(p, r) \le d(p, q) + d(q, r) < d(p, q) + \delta - d(p, q).$
Thus, $d(p, r) < \delta$ for all $r \in V$; that is, $V \subset U$.


In Example 1, neighborhoods of $x$ are just the usual ones when $S = E^n$,
and are the relative neighborhoods when $S \subset E^n$. In Example 3, the
$\delta$-neighborhood of $p$ is the one point set $\{p\}$ if $0 < \delta < 1$. With this metric, $S$
is a discrete topological space (Problem 7, Section 2.6).

Complete metric spaces


The concepts of convergent sequence and of Cauchy sequence in a metric
space S are defined almost exactly as in Section 2.3. One needs merely to
replace euclidean distance by d-distance in S.

Definition. Let $[p_m]$, $m = 1, 2, \ldots$, be a sequence in a metric space $S$. If for
every $\varepsilon > 0$ there exists a positive integer $N$ such that $d(p_l, p_m) < \varepsilon$ for
every $l, m \ge N$, then $[p_m]$ is a Cauchy sequence.
If for every $\varepsilon > 0$ there exists a positive integer $N$ such that $d(p_m, p_0) < \varepsilon$
for every $m \ge N$, then $[p_m]$ is a convergent sequence and $p_0$ is its limit.

It can be easily shown that every convergent sequence in a metric space is
Cauchy, just as for sequences in $E^n$ (Section 2.3). Those metric spaces for
which the converse is valid are called complete.

Definition. Let $S$ be a metric space, with metric $d$. Then $S$ is complete if
every Cauchy sequence $[p_m]$ in $S$ converges to a limit $p_0 \in S$.

EXAMPLE 1 (continued). Theorem 2.2 states that $E^n$, with the euclidean
distance, is a complete metric space. If $S \subset E^n$ with the euclidean distance, then
$S$ is a complete metric space if and only if $S$ is a closed subset of $E^n$. To see this,
suppose that $S$ is closed. Let $[x_m]$ be any Cauchy sequence, with $x_m \in S$ for
$m = 1, 2, \ldots$. According to Theorem 2.2, the sequence $[x_m]$ has a limit $x_0$.
Since $S$ is closed, $x_0 \in S$ (Problem 4a, Section 2.3). Thus $S$ is a complete
metric space. On the other hand, if $S$ is not closed, then there exists a sequence
$[x_m]$, with $x_m \in S$ for $m = 1, 2, \ldots$, converging to a limit $x_0 \notin S$ (Problem 4b,
Section 2.3). The sequence $[x_m]$ is Cauchy, but has no limit in $S$. Thus $S$ is
not a complete metric space.

In Sections 2.10 and 5.13 we see that certain metric spaces whose elements
are functions are complete.

Normed vector spaces


We recall from Appendix A.1 the definition of vector space. Let us axiomatize
the idea of norm on a vector space $\mathcal{V}$ over the real numbers. A vector space
with a norm then becomes a metric space, by the same procedure used to
define the euclidean metric from the euclidean norm in Section 1.2.


Definition. A normed vector space is a vector space $\mathcal{V}$ together with a real
valued function $\|\ \|$ with domain $\mathcal{V}$, such that:
(1) $\|u\| > 0$ for all $u \in \mathcal{V}$, $u \ne 0$.
(2) $\|cu\| = |c|\,\|u\|$ for every real $c$ and $u \in \mathcal{V}$.
(3) $\|u + v\| \le \|u\| + \|v\|$ for every $u, v \in \mathcal{V}$.
From Property (2) with $c = 0$, we have $\|0\| = 0$, where $0$ is the zero element
of $\mathcal{V}$. From (2), (3), and induction on $m$, one can show that

$\Bigl\| \sum_{j=1}^{m} c^j u_j \Bigr\| \le \sum_{j=1}^{m} |c^j|\,\|u_j\|$

for every choice of real numbers $c^1, \ldots, c^m$ and $u_1, \ldots, u_m \in \mathcal{V}$.
We define the distance between $u, v \in \mathcal{V}$ by
$d(u, v) = \|u - v\|.$
Then $\mathcal{V}$ is a metric space. Property (3) implies the triangle inequality (iii).

EXAMPLE 4. Let $\mathcal{V} = E^n$, with the norm of a vector $x = (x^1, \ldots, x^n)$ equal to
$\|x\| = \max\{|x^1|, \ldots, |x^n|\}.$
Neighborhoods of $x$ in the metric $d(x, y) = \|x - y\|$ are $n$-cubes (see Example
3, Section 2.6). This is just one of many nonstandard norms which can be
placed on $E^n$. It is seen in Section 2.11 that all norms on $E^n$ lead to the usual
topology of $E^n$.

EXAMPLE 5. The following space $\mathcal{V}$ is an infinite dimensional version of $E^n$.
Consider infinite sequences of real numbers, which will be written as
$x = (x^1, \ldots, x^m, \ldots).$
If $y = (y^1, \ldots, y^m, \ldots)$ is another such sequence, then the sum is defined by
$x + y = (x^1 + y^1, \ldots, x^m + y^m, \ldots).$
Similarly, $cx = (cx^1, \ldots, cx^m, \ldots)$. Let $\mathcal{V}$ be the space of such sequences
for which the sum of squares of the components $x^m$ is finite. As norm we take

$\|x\| = \Bigl[ \sum_{m=1}^{\infty} (x^m)^2 \Bigr]^{1/2}.$

The reader should check that $\mathcal{V}$ is a normed vector space. Just as for $E^n$, a
set $A \subset \mathcal{V}$ is called bounded if there is a positive number $C$ such that
$\|x\| \le C$ for all $x \in A$. Unlike $E^n$, there exist closed bounded subsets of $\mathcal{V}$
which are not compact (Problem 4).
A normed vector space $\mathcal{V}$ which is complete (in the metric $d(u, v) =
\|u - v\|$) is called a Banach space. The space $E^n$, with any norm, is a finite
dimensional Banach space. Examples of infinite dimensional Banach spaces
are the sequence space in Example 5, the space $\mathcal{C}(S)$ in Section 2.10, and the
$L^p$ spaces in Section 5.13. We refer to the work by Taylor [22] for an
introduction to the theory of Banach spaces and their key role in current
mathematical analysis.

PROBLEMS

1. Let $S$ be a sphere in $E^3$, with $d(x, y)$ as in Example 2. Show that $d$ satisfies properties
(i), (ii), and (iii) in the definition of metric space.
2. Let $S$ be a metric space, with metric $d$.
(a) Show that, for fixed $p_0$, the function $d(p_0, \cdot)$ is continuous on $S$. (By definition,
the value of $d(p_0, \cdot)$ at $q$ is $d(p_0, q)$.)
(b) Let $\delta > 0$. Show that $\{q : d(p_0, q) \le \delta\}$ is closed and the $\delta$-neighborhood
$\{q : d(p_0, q) < \delta\}$ is open. Give an example in which the first of these two sets
is not the closure of the second.
3. Let $S$ be a metric space with metric $d$. Let

$\bar{d}(p, q) = \dfrac{d(p, q)}{1 + d(p, q)}.$

Show that $\bar{d}$ satisfies properties (i), (ii), and (iii) for a metric space.
4. Let $\mathcal{V}$ be as in Example 5.
(a) Verify Properties (1) through (3) for the norm $\|\ \|$ in this example.
(b) Show that $\mathcal{V}$ is a complete metric space.
(c) For $l = 1, 2, \ldots$ let $e_l = (0, \ldots, 0, 1, 0, \ldots)$ where $l - 1$ zeros precede the 1.
Let $A = \{e_1, e_2, \ldots\}$. Show that $A$ is bounded and closed.
(d) Show that the set $A$ in part (c) is not compact. [Hint: Let $U_l = \{x : \|x - e_l\| < 1\}$.
The collection $\{U_1, U_2, \ldots\}$ covers $A$.]
5. For any compact subsets $A$, $B$ of $E^n$, let $d(A, B)$ be the smallest number $a$ with the
following property: for every $x \in B$ there exists $y \in A$ such that $|x - y| \le a$, and for
every $y \in A$ there exists $x \in B$ such that $|x - y| \le a$. Show $d$ is a metric on the space
whose elements are all compact subsets of $E^n$.
6. Let $S$ be a compact metric space, with metric $d$.
(a) Show that for any $\delta > 0$ there is a finite set $S_0 \subset S$ such that any $p \in S$ is distant
less than $\delta$ from some $q \in S_0$ [i.e., $d(p, q) < \delta$].
(b) A set $A \subset S$ is countable if either $A$ is a finite set or $A = \{p_1, p_2, \ldots\}$ for some
infinite sequence $[p_m]$ in $S$, $p_m \ne p_l$ for $l \ne m$. Show that there is a countable
set $A$ with $\operatorname{cl} A = S$.
7. A topological space $S_0$ is called a Hausdorff space if $S_0$ has the property that for every
$p, q \in S_0$ ($p \ne q$) there exist a neighborhood $U$ of $p$ and a neighborhood $V$ of $q$ such
that $U \cap V$ is empty.
(a) Show that any metric space is a Hausdorff space.
(b) Show that any compact set $S \subset S_0$ is closed, if $S_0$ is a Hausdorff space.
(c) Let $f$ be continuous and univalent from a compact space $S$ onto a Hausdorff
space $T$. Show that $f^{-1}$ is continuous from $T$ onto $S$. [Hint: Show that $(f^{-1})^{-1}(B)$
is closed if $B$ is closed.]


2.10 Spaces of continuous functions


To motivate the discussion that follows, let us for the moment consider
real valued, continuous functions on a closed, bounded interval $[0, 1]$.
There are many possible ways to define a concept of distance between two
functions $f$ and $g$. One such is the integral $\int_0^1 |f(x) - g(x)|\,dx$. Another is
$\bigl[\int_0^1 |f(x) - g(x)|^2\,dx\bigr]^{1/2}$. Such distances are considered in Section 5.13. In
the present section we consider another distance:
$d(f, g) = \sup\{|f(x) - g(x)| : x \in [0, 1]\}.$
In Figure 2.8 we picture $\{g : d(f, g) \le c\}$, where $f$ is given. For $g$ to be in
this set, we must have $|f(x) - g(x)| \le c$ for all $x \in [0, 1]$. This imposes a
uniform bound on $|f(x) - g(x)|$; for that reason $d(f, g)$ is called the uniform
distance.
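
The three distances can be compared on a concrete pair of functions. The sketch below is illustrative only: the choice $f(x) = x$, $g(x) = x^2$ and the helper name `distances` are assumptions, and the integrals and the supremum are approximated on a uniform grid (trapezoid rule and grid maximum), so the values are numerical estimates rather than exact.

```python
import math

def distances(f, g, n=10000):
    """Approximate the integral, root-mean-square, and uniform distances on [0, 1]."""
    xs = [k / n for k in range(n + 1)]
    diffs = [abs(f(x) - g(x)) for x in xs]
    h = 1.0 / n
    # Trapezoid rule for the integral of |f - g|.
    l1 = h * (sum(diffs) - 0.5 * (diffs[0] + diffs[-1]))
    # Trapezoid rule for the integral of |f - g|^2, then the square root.
    squares = [v * v for v in diffs]
    l2 = math.sqrt(h * (sum(squares) - 0.5 * (squares[0] + squares[-1])))
    # The maximum over the grid approximates the supremum from below.
    sup = max(diffs)
    return l1, l2, sup

l1, l2, sup = distances(lambda x: x, lambda x: x * x)
print(l1, l2, sup)
```

For this pair the exact values are $1/6$, $\sqrt{1/30}$, and $1/4$, so the uniform distance is the largest of the three.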

Figure 2.8

Let us now formulate the idea of uniform distance in a more general
setting. Let $S$ be a set. A real valued function $f$, with domain $S$, is called
bounded if $f(S)$ is a bounded set. For each such $f$, let

(2.1)   $\|f\| = \sup\{|f(p)| : p \in S\}.$

Let us denote by $\mathcal{B}(S)$ the set of all bounded, real valued functions with
domain $S$. If $f$ and $g$ are bounded functions, then the sum $f + g$ is a bounded
function. Moreover, any scalar multiple $cf$ is a bounded function. Thus,
$\mathcal{B}(S)$ is a vector space, under the usual notions of addition of functions and
multiplication by scalars. Moreover, $\|\ \|$ is a norm on $\mathcal{B}(S)$. It is called the
uniform norm. Properties (1) and (2) for a norm (Section 2.9) are immediate
from (2.1). To prove (3), we have for all $p \in S$

$|(f + g)(p)| = |f(p) + g(p)| \le |f(p)| + |g(p)|.$

Since $|f(p)| \le \|f\|$, $|g(p)| \le \|g\|$, we have

$|(f + g)(p)| \le \|f\| + \|g\|.$

Since this is true for each $p \in S$,

$\|f + g\| \le \|f\| + \|g\|,$

which is Property (3).

Uniform convergence of sequences


Let $f_1, f_2, \ldots$ be a sequence of real valued functions. One concept of
convergence of the sequence to a limiting function $f$ is that $f(p) = \lim_{m \to \infty} f_m(p)$
for each $p \in S$. This concept is called pointwise convergence of the sequence
$[f_m]$ to $f$. It is not the concept of convergence with which we are principally
concerned here.

Definition. Let $f_1, f_2, \ldots$ be a sequence of functions, such that $f_m \in \mathcal{B}(S)$
for $m = 1, 2, \ldots$. Then $f_m$ tends to $f$ uniformly on $S$ as $m \to \infty$ if
$\|f_m - f\| \to 0$ as $m \to \infty$.

Thus, uniform convergence of a sequence $[f_m]$ is the same as convergence
in the metric $d$ such that $d(f, g) = \|f - g\|$.

EXAMPLE 1. Let $S = [0, 1]$, and let $f_m(x) = x^m$ ($m$th power of $x$). Then
$f_m(x) \to 0$ as $m \to \infty$ for $0 \le x < 1$. Since $f_m(1) = 1$, $f_m(1) \to 1$ as $m \to \infty$.
The sequence $[f_m]$ converges pointwise to $f$, where $f(x) = 0$ for $0 \le x < 1$,
$f(1) = 1$. However, the convergence is not uniform. This follows from
Theorem 2.11, since $f$ is not a continuous function. A direct proof can also
be given (Problem 1).
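
A short numerical experiment makes the failure of uniform convergence visible. This is a sketch and not a proof: the witness points $x_m = 2^{-1/m}$ and the function names are choices made here for illustration. At each such point $f_m(x_m) = 1/2$ while $f(x_m) = 0$, so the uniform distance between $f_m$ and $f$ cannot fall below $1/2$.

```python
def f_m(m, x):
    """The m-th function of the sequence, f_m(x) = x**m."""
    return x ** m

def f_limit(x):
    """The pointwise limit: 0 on [0, 1) and 1 at x = 1."""
    return 1.0 if x == 1.0 else 0.0

def witness(m):
    """A point of [0, 1) at which f_m takes the value 1/2 (up to rounding)."""
    return 0.5 ** (1.0 / m)

for m in (1, 10, 100, 1000):
    x = witness(m)
    gap = abs(f_m(m, x) - f_limit(x))
    print(m, x, gap)
```

The witness points move toward 1 as $m$ grows, which is exactly why no single $N$ works for all $x$ at once.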

Proposition 2.13. The space $\mathcal{B}(S)$, with the metric $d(f, g) = \|f - g\|$, is a
complete metric space.
PROOF. Let $f_1, f_2, \ldots$ be any Cauchy sequence in $\mathcal{B}(S)$. Given $\varepsilon > 0$ there
exists $N$ such that $\|f_l - f_m\| < \varepsilon$ for every $l, m \ge N$. Since $|f_l(p) - f_m(p)| \le
\|f_l - f_m\|$ for each $p \in S$,
$|f_l(p) - f_m(p)| < \varepsilon$ for every $l, m \ge N$.
This shows that, for fixed $p$, the sequence of real numbers $f_1(p), f_2(p), \ldots$
is Cauchy. By Theorem 2.2, this sequence has a limit, which we denote by
$f(p)$. This defines the function $f$. Let us show that $\|f_m - f\| \to 0$ as $m \to \infty$.
Given $\varepsilon > 0$ there exists $N_1$ such that $\|f_m - f_l\| < \varepsilon/2$ for every $l, m \ge N_1$.
Since $f_m(p) \to f(p)$ as $m \to \infty$, there exists $N_2 > N_1$ such that $|f_l(p) - f(p)|
< \varepsilon/2$ for every $l \ge N_2$. Note that $N_2$ may depend on both $p$ and $\varepsilon$, but $N_1$
depends only on $\varepsilon$. Then
$f_m(p) - f(p) = [f_m(p) - f_l(p)] + [f_l(p) - f(p)],$
$|f_m(p) - f(p)| \le |f_m(p) - f_l(p)| + |f_l(p) - f(p)|.$
Therefore, if $m \ge N_1$ and $l \ge N_2$,

$|f_m(p) - f(p)| \le \|f_m - f_l\| + |f_l(p) - f(p)| < \varepsilon.$

Since this is true for all $p \in S$,

$\|f_m - f\| = \sup\{|f_m(p) - f(p)| : p \in S\} \le \varepsilon$

for every $m \ge N_1$. This proves that $\|f_m - f\| \to 0$ as $m \to \infty$.
It remains only to verify that $f$ is a bounded function. Take $\varepsilon = 1$ and $N_1$
as above. Fix $m \ge N_1$. Then $f = (f - f_m) + f_m$, and both $f - f_m$, $f_m$ are
bounded functions. Hence $f$ is bounded. □

Let us now suppose that $S$ is a topological space, and consider sequences
of continuous functions on $S$.

Theorem 2.11. Let $f_1, f_2, \ldots$ be a sequence of functions, such that $f_m$ is
continuous on $S$ for each $m = 1, 2, \ldots$ and $f_m$ tends to $f$ uniformly on $S$ as
$m \to \infty$. Then $f$ is continuous on $S$.
PROOF. Consider any $p_0 \in S$. Given $\varepsilon > 0$, there exists $N$ such that $\|f_m - f\|
< \varepsilon/3$ for every $m \ge N$, since $f_m$ tends to $f$ uniformly. Since $f_N$ is continuous,
there exists a neighborhood $U$ of $p_0$ such that
$|f_N(p) - f_N(p_0)| < \varepsilon/3$ for all $p \in U$.
Recall that $|f_N(p) - f(p)| \le \|f_N - f\|$ by definition of the norm $\|\ \|$. Then
$f(p) - f(p_0) = [f(p) - f_N(p)] + [f_N(p) - f_N(p_0)] + [f_N(p_0) - f(p_0)],$
$|f(p) - f(p_0)| \le |f(p) - f_N(p)| + |f_N(p) - f_N(p_0)| + |f_N(p_0) - f(p_0)|,$
$|f(p) - f(p_0)| < \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon$
for all $p \in U$. Since such a neighborhood $U$ exists corresponding to every
$\varepsilon > 0$, $f$ is continuous at $p_0$. Since $p_0$ is any point of $S$, $f$ is continuous on $S$. □

The space $\mathcal{C}(S)$

Let $\mathcal{C}(S)$ denote the set of all real valued functions $f$, such that $f$ is bounded
and continuous on $S$. The sum $f + g$ is continuous if $f$ and $g$ are continuous.
Moreover, any scalar multiple $cf$ of a continuous $f$ is continuous. Thus
$\mathcal{C}(S)$ is a vector subspace of $\mathcal{B}(S)$.

Theorem 2.12. The space $\mathcal{C}(S)$, with the metric $d(f, g) = \|f - g\|$, is a
complete metric space.
PROOF. Let $[f_m]$, $m = 1, 2, \ldots$, be any Cauchy sequence in $\mathcal{C}(S)$. Since
$\mathcal{C}(S) \subset \mathcal{B}(S)$, the sequence is Cauchy in $\mathcal{B}(S)$. By Proposition 2.13, $\mathcal{B}(S)$ is
complete. Thus there exists a bounded function $f$ such that $\|f_m - f\| \to 0$
as $m \to \infty$. By Theorem 2.11, $f$ is continuous. Hence $f \in \mathcal{C}(S)$. Thus, any
Cauchy sequence $[f_m]$ in $\mathcal{C}(S)$ has a limit $f$ in $\mathcal{C}(S)$, which proves that $\mathcal{C}(S)$ is
complete. □

PROBLEMS
1. Show directly from the definition of uniform convergence that the sequence $[f_m]$ in
Example 1 does not converge uniformly.
2. Let $f_m(x) = m^2 x$ if $0 \le x \le m^{-1}$, $f_m(x) = m(2 - mx)$ if $m^{-1} \le x \le 2m^{-1}$, and
$f_m(x) = 0$ for all other $x$.
(a) Find $\|f_m\|$.
(b) Show that $f_m$ tends to 0 pointwise for each $x$, but that the convergence is not
uniform.
3. Let $S$ be a finite set with $n$ elements. Explain how $\mathcal{C}(S)$ can be identified with the space
in Example 4, Section 2.9.
4. Let $f(x) = \sum_{k=1}^{\infty} (\sin kx)/k^2$. Use Theorem 2.11 to show that $f$ is continuous on $E^1$.
5. Let $S \subset E^n$ be compact, and let $\mathcal{F}$ be a compact subset of $\mathcal{C}(S)$. Show that:
(a) There exists a number $C$ such that $\|f\| \le C$ for all $f \in \mathcal{F}$.
(b) Given $\varepsilon > 0$ there exists $\delta > 0$ depending on $\varepsilon$ (but not on $f$), such that
$|f(x) - f(y)| < \varepsilon$ for all $f \in \mathcal{F}$ and all $x, y \in S$ satisfying $|x - y| < \delta$. [Hint:
Problem 8, Section 2.5 and Problem 6a, Section 2.9.]

* 2.11 Noneuclidean norms on En


It is sometimes advantageous to consider norms on $E^n$ other than the
standard euclidean norm. The distance between two points $x$ and $y$ defined
by such a norm need not agree with the euclidean distance. As a result, such
geometric notions as length, area, and spherical ball are changed when
considered with respect to a noneuclidean norm. However, we see that any
noneuclidean norm leads to the same collection of open sets as the euclidean
norm. Since the collection of open sets determines all of the topological
properties of $E^n$, these properties are therefore independent of the particular
norm chosen.
We recall from Section 2.9 that a norm is a real valued function $\|\ \|$ with
domain $E^n$ such that:
(1) $\|x\| > 0$ for every $x \ne 0$,
(2) $\|cx\| = |c|\,\|x\|$ for every $c$ and $x$, and
(3) $\|x + y\| \le \|x\| + \|y\|$ for every $x$ and $y$.
The norm defines a metric $d(x, y) = \|x - y\|$. The $\delta$-neighborhood of $x_0$
in this metric is $\{x : \|x - x_0\| < \delta\}$. The closed $n$-ball with center $x_0$ and radius
$\delta$, with respect to the norm $\|\ \|$, is $\{x : \|x - x_0\| \le \delta\}$. The main result of
this section is a characterization of the closed $n$-ball with center $0$ and radius
1. By translations and scalar multiplications, this then characterizes all
closed $n$-balls with respect to the norm $\|\ \|$.


We have already given two examples, namely, the standard euclidean
norm ($\|x\| = |x|$) and Example 4, Section 2.9. Two further examples are:

EXAMPLE 1. Let

$\|x\| = \sum_{i=1}^{n} |x^i|.$

The $n$-balls with respect to this norm are convex polytopes. For example, if
$n = 2$, the closed unit 2-ball $\{x : \|x\| \le 1\}$ is the square with vertices $e_1$, $e_2$,
$-e_1$, $-e_2$. Compare with Problem 11a, Section 1.3.

EXAMPLE 2. Let $(c_{ij})$ be an $n \times n$ matrix that is symmetric, $c_{ij} = c_{ji}$ for
$i, j = 1, \ldots, n$. Define the function $B(x, y)$, for each $x = (x^1, \ldots, x^n)$, $y =
(y^1, \ldots, y^n)$ by

(2.2)   $B(x, y) = \sum_{i,j=1}^{n} c_{ij} x^i y^j.$

The function $B$ is called bilinear. Let us assume that $B(x, x) > 0$ for every
$x \ne 0$. This means that the matrix $(c_{ij})$ is positive definite. With $B$ we associate
a quadratic norm, given by

$\|x\| = \sqrt{B(x, x)}.$

Property (1) holds by assumption, while (2) follows from $B(cx, cx) = c^2 B(x, x)$.
To establish (3), let us think of $B(x, y)$ as an inner product on $E^n$. This is the
standard euclidean inner product if $(c_{ij})$ is the identity matrix, in which case
$B(x, y) = x \cdot y$. Any such inner product satisfies
$B(x, y) = B(y, x)$, $B(x + y, z) = B(x, z) + B(y, z)$,
$B(cx, y) = cB(x, y)$, $B(x, x) > 0$ if $x \ne 0$.
The proof of Cauchy's inequality (1.1) was based on the corresponding
properties of the euclidean inner product. The same proof, with each inner
product $u \cdot v$ in the proof of (1.1) replaced by the corresponding $B(u, v)$,
shows that

$|B(x, y)| \le \sqrt{B(x, x)}\,\sqrt{B(y, y)}.$

This inequality is then used to show that

$\|x + y\| \le \|x\| + \|y\|,$

exactly in the same way (1.2) was derived from (1.1). This verifies property (3)
of the quadratic norm $\|\ \|$.
The $n$-balls with respect to any quadratic norm are $n$-dimensional
ellipsoids. This is proved in Section 4.8 by finding a new orthonormal basis for $E^n$
for which the matrix associated with the bilinear function $B$ is diagonal.
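
As a numerical companion to Example 2, the following sketch evaluates $B$ and the quadratic norm for one particular symmetric, positive definite matrix and spot-checks Cauchy's inequality and Property (3) on random vectors. The matrix `C`, the names `B`, `qnorm`, and `check_quadratic_norm`, and the sampling range are all assumptions made for illustration; random testing supports the inequalities but does not prove them.

```python
import math
import random

# An assumed example matrix: symmetric with positive leading minors (2 and 5),
# hence positive definite.
C = [[2.0, 1.0],
     [1.0, 3.0]]

def B(x, y, c=C):
    """The bilinear function B(x, y) = sum over i, j of c[i][j] * x[i] * y[j]."""
    n = len(x)
    return sum(c[i][j] * x[i] * y[j] for i in range(n) for j in range(n))

def qnorm(x):
    """The quadratic norm sqrt(B(x, x))."""
    return math.sqrt(B(x, x))

def check_quadratic_norm(trials=1000, seed=0, tol=1e-9):
    """Spot-check |B(x, y)| <= ||x|| ||y|| and ||x + y|| <= ||x|| + ||y||."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-5.0, 5.0) for _ in range(2)]
        y = [rng.uniform(-5.0, 5.0) for _ in range(2)]
        if abs(B(x, y)) > qnorm(x) * qnorm(y) + tol:
            return False
        s = [a + b for a, b in zip(x, y)]
        if qnorm(s) > qnorm(x) + qnorm(y) + tol:
            return False
    return True

print(qnorm([1.0, 0.0]), qnorm([0.0, 1.0]), check_quadratic_norm())
```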


Proposition 2.14. Corresponding to any norm $\|\ \|$ on $E^n$ there exist positive
numbers $m$ and $M$ such that for every $x \in E^n$,
(2.3)   $m|x| \le \|x\| \le M|x|.$
PROOF. Let

$M = \sum_{i=1}^{n} \|e_i\|,$

where $\{e_1, \ldots, e_n\}$ is the standard basis for $E^n$. Then

$x = \sum_{i=1}^{n} x^i e_i,$

and therefore

$\|x\| \le \sum_{i=1}^{n} |x^i|\,\|e_i\|.$

But $|x^i| \le |x|$ for each $i = 1, \ldots, n$. Hence

$\|x\| \le M|x|.$

From this $\|x - y\| \le M|x - y|$ for every $x$ and $y$, which implies that
$\|\ \|$ is a continuous function. Therefore it has a minimum value $m$ on the
compact set $\{x : |x| = 1\}$,
$m = \min\{\|x\| : |x| = 1\}.$
By Axiom (1) for a norm, $m > 0$. If $x = 0$, all terms in (2.3) are 0. Given
any $x \ne 0$, let $c = |x|^{-1}$. Then $|cx| = c|x| = 1$, and hence $\|cx\| \ge m$. By
Axiom (2), $\|cx\| = |c|\,\|x\|$, from which
$\|x\| \ge m|x|.$ □
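
The constants in (2.3) can be estimated for a specific norm. The sketch below assumes the norm of Example 1 on $E^2$ (sum of absolute values); the names `norm1`, `euclid`, and `check_bounds` are hypothetical. It takes $M = \|e_1\| + \|e_2\|$ and estimates $m$ by sampling the unit circle, so `m_est` is only an approximation of the true minimum.

```python
import math
import random

def norm1(x):
    """The norm of Example 1: the sum of the absolute values of the components."""
    return sum(abs(c) for c in x)

def euclid(x):
    """The standard euclidean norm |x|."""
    return math.sqrt(sum(c * c for c in x))

# M as the sum of the norms of the standard basis vectors.
M = norm1((1.0, 0.0)) + norm1((0.0, 1.0))

# Estimate m = min{||x|| : |x| = 1} by sampling the unit circle.
m_est = min(
    norm1((math.cos(t), math.sin(t)))
    for t in (2.0 * math.pi * k / 3600 for k in range(3600))
)

def check_bounds(trials=1000, seed=0, tol=1e-9):
    """Spot-check m|x| <= ||x|| <= M|x| on random vectors."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = (rng.uniform(-10.0, 10.0), rng.uniform(-10.0, 10.0))
        if not (m_est * euclid(x) <= norm1(x) + tol):
            return False
        if not (norm1(x) <= M * euclid(x) + tol):
            return False
    return True

print(M, m_est, check_bounds())
```

For this norm the $M$ chosen this way is 2, which is not the smallest admissible constant (the maximum of $|\cos t| + |\sin t|$ is $\sqrt{2}$); the proposition only asserts that some such $M$ exists.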
From Proposition 2.14,
$m|x - y| \le \|x - y\| \le M|x - y|$
for every $x$ and $y$. This says that the ratio of the $\|\ \|$-distance to the euclidean
distance is bounded between $m$ and $M$. Let us call a set $D$ $\|\ \|$-open if every
$x_0 \in D$ has some neighborhood with respect to this norm which is contained
in $D$. If $|x - x_0| < \delta/M$, then $\|x - x_0\| < \delta$. Therefore the $\delta$-neighborhood
of $x_0$ with respect to the norm $\|\ \|$ contains the ordinary euclidean
$(\delta/M)$-neighborhood of $x_0$. If $D$ is $\|\ \|$-open, then every $x_0 \in D$ has a euclidean
neighborhood contained in $D$. Hence $D$ is open in the ordinary sense.
Similarly, the euclidean $\delta$-neighborhood of $x_0$ contains the $(m\delta)$-neighborhood
of $x_0$ with respect to the norm $\|\ \|$. It follows that every set which is
open in the ordinary sense is also $\|\ \|$-open. Thus a set is $\|\ \|$-open if and only
if it is open in the usual sense. Since the open subsets of a topological space
determine the topology (Section 2.6), all norms on $E^n$ lead to the same
topology of $E^n$.


The closed unit $n$-ball
(2.4)   $K = \{x : \|x\| \le 1\}$
with respect to any norm has the following four properties:
(i) $K$ is compact;
(ii) $K$ is convex;
(iii) $K$ is symmetric about $0$; and
(iv) $K$ contains a euclidean neighborhood of $0$.
Symmetry about $0$ means that $-x \in K$ for every $x \in K$. By Axiom (2)
with $c = -1$, $\|-x\| = \|x\|$, and hence $K$ has Property (iii). From
Proposition 2.14, $K$ has Property (iv) and is bounded. For any continuous function $f$,
$\{x : f(x) \le 1\}$ is a closed set. Since $\|\ \|$ is continuous, $K$ is closed. Since $K$ is
also bounded, property (i) follows from Theorem 2.9. To prove (ii), let
$x_1, x_2 \in K$ and $t \in [0, 1]$. Then
$\|t x_1 + (1 - t) x_2\| \le t\|x_1\| + (1 - t)\|x_2\| \le t + 1 - t.$
Thus
$\|t x_1 + (1 - t) x_2\| \le 1$; i.e., $t x_1 + (1 - t) x_2 \in K$.
This proves (ii).
Let us show that, conversely, any set $K$ with these four properties gives
rise to a norm with respect to which $K$ is the closed unit $n$-ball.

Theorem 2.13. Let $K$ be any set with Properties (i) through (iv). Let $\|0\| = 0$,
and for every $x \ne 0$, let

(2.5)   $\|x\| = \dfrac{1}{\max\{t : tx \in K\}}.$

Then $\|\ \|$ is a norm, and (2.4) holds.
PROOF. By (i) and (iv) there exist $r_1 > 0$ and $r_2 > 0$ such that $y \in K$ if
$|y| \le r_1$ and $y \notin K$ if $|y| > r_2$. Hence given $x \ne 0$, $tx \in K$ if $|t| \le r_1/|x|$,
and $tx \notin K$ if $|t| > r_2/|x|$ (see Figure 2.9). Let
$S_x = \{t : tx \in K\}.$
Then $S_x$ contains the $(r_1/|x|)$-neighborhood of $0$ and is bounded above by
$r_2/|x|$. Since $K$ is a closed set, $S_x$ is also closed. Hence $S_x$ has a largest element
$\max S_x$, which is positive. This shows that $\|x\| = 1/\max S_x$ is well-defined
and is positive. Moreover, by (ii) and the fact that $0 \in K$, the line segment
between $0$ and any point of $K$ is contained in $K$. Therefore $\max S_x \ge 1$ if and
only if $x \in K$, which says that (2.4) holds.

Figure 2.9

It remains to verify Axioms (2) and (3) for a norm. By Property (iii),
$\|-x\| = \|x\|$. It is left to the reader to check that if $c > 0$, then

$\max S_{cx} = \dfrac{1}{c} \max S_x,$

and consequently $\|cx\| = c\|x\|$. Then Axiom (2) holds. For (3) we may
assume that $x \ne 0$ and $y \ne 0$. Let

$s = \max S_x, \quad t = \max S_y, \quad u = \dfrac{t}{s + t}.$

Observe that $0 < u < 1$ and

$\dfrac{1}{su} = \dfrac{1}{s} + \dfrac{1}{t} = \|x\| + \|y\|.$

A little manipulation shows that $su = (1 - u)t$. Consequently,
$su(x + y) = u(sx) + (1 - u)ty.$
Since $sx, ty \in K$ and $K$ is a convex set by Property (ii), $su(x + y) \in K$.
Therefore $su \le \max S_{x+y}$, and

$\|x\| + \|y\| = \dfrac{1}{su} \ge \|x + y\|.$

This verifies Axiom (3). □
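
Formula (2.5) can be evaluated numerically when only a membership test for $K$ is available. The sketch below is a hedged illustration: it assumes $K$ is the square $\{(x, y) : |x| + |y| \le 1\}$ from Example 1 with $n = 2$, and the names `in_K`, `max_t`, and `gauge_norm` are hypothetical. It finds $\max\{t : tx \in K\}$ by bisection, which relies on the fact from the proof that the segment from $0$ to any point of $K$ stays in $K$.

```python
def in_K(p):
    """Membership test for the assumed body K = {(x, y) : |x| + |y| <= 1}."""
    return abs(p[0]) + abs(p[1]) <= 1.0

def max_t(x, member=in_K, iters=60):
    """Approximate max{t : t*x in K} by bisection, for x != 0."""
    hi = 1.0
    # K is bounded, so doubling eventually leaves K.
    while member((hi * x[0], hi * x[1])):
        hi *= 2.0
    lo = 0.0  # 0 belongs to K by Property (iv)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if member((mid * x[0], mid * x[1])):
            lo = mid
        else:
            hi = mid
    return lo

def gauge_norm(x):
    """The norm of formula (2.5): 1 / max{t : t*x in K}, with ||0|| = 0."""
    if all(c == 0 for c in x):
        return 0.0
    return 1.0 / max_t(x)

print(gauge_norm((3.0, -4.0)), gauge_norm((0.25, 0.25)), gauge_norm((0.0, 0.0)))
```

For this choice of $K$ the result should agree, up to the bisection tolerance, with the norm $|x| + |y|$ of Example 1.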

PROBLEMS

1. In $E^2$ consider the norm $\|(x, y)\| = \max\{|x|, |y|\}$.
(a) What is $K = \{x : \|x\| < 1\}$?
(b) Show that the triangle with vertices $0$, $e_1$, $e_2$ is equilateral with respect to the
distance which this norm defines.
2. The ellipse $K = \{(x, y) : x^2 + xy + 4y^2 \le 1\}$ has Properties (i) through (iv) in
Theorem 2.13.
(a) For what (quadratic) norm is it the closed unit 2-ball?
(b) Find $\|e_1 - e_2\|$.
3. Let $a$, $b$, $p$ be real numbers, with $p \ge 1$. Show that $|ta + (1 - t)b|^p \le t|a|^p +
(1 - t)|b|^p$ for $0 \le t \le 1$. In particular, $|a + b|^p \le 2^{p-1}(|a|^p + |b|^p)$.


4. Let $p \ge 1$, and $K = \{x : \sum_{i=1}^{n} |x^i|^p \le 1\}$.
(a) Show that $K$ is convex. [Hint: Problem 3.]
(b) Show that $K$ satisfies Properties (i) through (iv) in Theorem 2.13.
(c) Show that the norm associated with $K$ in Theorem 2.13 is $\|x\| = \bigl[\sum_{i=1}^{n} |x^i|^p\bigr]^{1/p}$.

Note. For this norm the inequality $\|x + y\| \le \|x\| + \|y\|$ is called
Minkowski's inequality. There is a related inequality for integrals, which we
prove in Section 5.13.
5. A seminorm on $E^n$ is a real valued function $f$ satisfying: $f(x) \ge 0$ for every $x$;
$f(cx) = |c| f(x)$ for every $c$ and $x$; and $f(x + y) \le f(x) + f(y)$ for every $x$ and $y$.
(a) Let $f$ be a seminorm and $K = \{x : f(x) \le 1\}$. Show that $K$ is closed and satisfies
Properties (ii) through (iv). Show that $K$ is compact if and only if $f$ is a norm.
[Hint: First prove that $f$ is continuous.]
(b) Conversely, let $K$ be any closed set satisfying Properties (ii) through (iv). Let
$f(x) = 0$ if $x = 0$ or if the line through $0$ and $x$ is contained in $K$. Otherwise, let
$f(x) = \dfrac{1}{\max\{t : tx \in K\}}$
as in (2.5). Show that $f$ is a seminorm.
(c) Let $n = 3$ and $f(x, y, z) = |x| + 2|y|$. Sketch $K$ and show that $f$ is a seminorm.

3 Differentiation of real valued functions

We now begin the differential calculus for real valued functions of several
variables. The first step is to define the notions of directional derivative
and partial derivative. Then the concept of differentiable function is intro-
duced, by linear approximation to the increments of a function. Taylor's
formula with remainder is obtained for functions of class $C^{(q)}$; such functions
have continuous partial derivatives of orders $1, 2, \ldots, q$. It is then applied
to problems of relative extrema and to the characterization of convex
functions of class $C^{(2)}$.
The chain rule for partial derivatives is postponed to Chapter 4, since it
is a natural corollary of the composite function theorem for vector-valued
functions to be proved there.

3.1 Directional and partial derivatives


If $f$ is a function of one variable, then its derivative at a point $x_0$ is defined by

$f'(x_0) = \lim_{h \to 0} \dfrac{f(x_0 + h) - f(x_0)}{h},$

provided the limit exists. The corresponding expression for functions of
several variables does not make sense, since $h$ is then a vector and division by
$h$ is undefined. Therefore we must find an acceptable substitute for it. Let
us first consider the derivative of $f$ in various directions.
Let us call any unit vector $v$ (that is, vector with $|v| = 1$) a direction
in $E^n$. The directions are just the points of the $(n - 1)$-dimensional sphere
which bounds the unit $n$-ball. If $n = 1$, the only directions are $e_1$ and $-e_1$,
which we have identified with the scalars $1$ and $-1$. If $n = 2$, every direction
can be written $(\cos\theta, \sin\theta)$ where $0 \le \theta < 2\pi$. The angle $\theta$ determines the
direction. For any $n \ge 2$ the components of a direction $v$ satisfy $v^i = \cos\theta_i$,
$i = 1, \ldots, n$, where $\theta_i$ is the angle between $v$ and $e_i$.
Given $x_0$ and a direction $v$, the line through $x_0 + v$ and $x_0$ is called the
line through $x_0$ with direction $v$. According to the definition in Section 1.3,
this line is
(3.1)   $\{x : x = x_0 + tv,\ t \text{ any scalar}\}.$

Let $f$ be a function with domain $D \subset E^n$, and let $x_0$ be an interior point
of $D$.

Definition. The derivative of $f$ at $x_0$ in the direction $v$ is

(3.2)   $\lim_{t \to 0} \dfrac{f(x_0 + tv) - f(x_0)}{t},$

if the limit exists.


Since $x_0$ is an interior point, the $\delta$-neighborhood of $x_0$ is contained in $D$
for some $\delta > 0$ (Figure 3.1). Since

$|(x_0 + tv) - x_0| = |tv| = |t|,$

$x_0 + tv \in D$ provided $|t| < \delta$. The domain of the function $\varphi$ defined by

$\varphi(t) = f(x_0 + tv)$

contains the $\delta$-neighborhood of $0$. The derivative of $f$ in the direction $v$ is
$\varphi'(0)$, if $\varphi$ has a derivative at $0$.
The line through Xo with direction - v is the same line as the one through
Xo with direction v. However, the derivative in the direction - v is the
negative of the derivative in direction v (Problem 6). The direction v defines
an orientation of this line, and - v the opposite orientation. When the orienta-
tion changes, the directional derivative changes sign. In effect, by assigning
the orientation v we agree that the point Xo + sv precedes Xo + tv on the
line if s < t.
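
The reduction to a function of one variable suggests a simple numerical approximation. The sketch below is an assumption-laden illustration: the function $f(x, y) = x^2 + 3xy$, the point, the direction, and the helper name `directional_derivative` are all chosen here, and the symmetric difference quotient is only an estimate of $\varphi'(0)$. It can return a value even where the derivative does not exist (for example for $|x|$ at 0), so it is meaningful only when $\varphi'(0)$ is known to exist.

```python
import math

def directional_derivative(f, x0, v, h=1e-6):
    """Estimate phi'(0) for phi(t) = f(x0 + t*v) by a symmetric difference quotient."""
    def phi(t):
        return f([a + t * b for a, b in zip(x0, v)])
    return (phi(h) - phi(-h)) / (2.0 * h)

def f(x):
    """An assumed sample function on E^2."""
    return x[0] ** 2 + 3.0 * x[0] * x[1]

x0 = [1.0, 2.0]
v = [math.cos(math.pi / 3), math.sin(math.pi / 3)]   # a unit vector
minus_v = [-c for c in v]

d_plus = directional_derivative(f, x0, v)
d_minus = directional_derivative(f, x0, minus_v)
print(d_plus, d_minus)
```

Expanding $\varphi(t) = (1 + tv^1)^2 + 3(1 + tv^1)(2 + tv^2)$ by hand gives $\varphi'(0) = 8v^1 + 3v^2$, and the two estimates should be negatives of each other, in line with the remark on orientation.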

Figure 3.1


EXAMPLE 1. Let $f(x) = |x|$, $n = 1$. The derivative of $f$ at $x_0$ in the direction
$e_1$ is just $f'(x_0)$. For $x_0 \ne 0$, $f'(x_0) = \operatorname{sgn} x_0$, where $\operatorname{sgn} x_0 = 1$ if $x_0 > 0$
and $\operatorname{sgn} x_0 = -1$ if $x_0 < 0$. The derivative does not exist at $x_0 = 0$.

EXAMPLE 2. Let $f(x, y) = |x^2 - y^2|^{1/2}$, $(x_0, y_0) = (0, 0)$. Consider any
direction $v = (\cos\theta, \sin\theta)$, and let $\phi(t) = f(t\cos\theta, t\sin\theta)$. The directional
derivative is $\phi'(0)$, if the derivative $\phi'(0)$ exists. Now

$\phi(t) = |t^2\cos^2\theta - t^2\sin^2\theta|^{1/2} = |t|\,|\cos^2\theta - \sin^2\theta|^{1/2}$.

If $\cos^2\theta = \sin^2\theta$, then $\phi(t) = 0$ for all $t$, and $\phi'(0) = 0$. If $\cos^2\theta \ne \sin^2\theta$,
then $\phi$ has no derivative at $t = 0$ (see Example 1). Thus, the derivative of $f$ at
$(0, 0)$ is $0$ in the four directions $(\pm\sqrt{2}/2, \pm\sqrt{2}/2)$. The derivative in any
other direction $v$ does not exist.
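The behavior in Example 2 can be seen by comparing the difference quotients of $\phi$ from the right and from the left of $t = 0$. This is only an illustrative sketch: the helper name `one_sided_quotients` and the step `t = 1e-3` are assumptions, and a finite step suggests but does not prove the existence or nonexistence of a limit.

```python
import math


def f(x, y):
    # Function of Example 2: f(x, y) = |x^2 - y^2|^(1/2).
    return math.sqrt(abs(x * x - y * y))


def one_sided_quotients(theta, t=1e-3):
    """Return phi(t)/t and phi(-t)/(-t), where phi(s) = f(s cos(theta), s sin(theta))."""
    c, s = math.cos(theta), math.sin(theta)
    right = f(t * c, t * s) / t
    left = f(-t * c, -t * s) / (-t)
    return right, left


print(one_sided_quotients(0.0))          # quotients of opposite sign: no derivative
print(one_sided_quotients(math.pi / 4))  # both quotients essentially zero
```

In the direction $\theta = 0$ the two quotients stay near $1$ and $-1$, so they cannot have a common limit; in the direction $\theta = \pi/4$ both are zero up to rounding.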

The partial derivatives of $f$ are defined as the derivatives in the directions
$e_1, \dots, e_n$, if these directional derivatives exist. There are several equivalent
notations in use for partial derivatives. Of these we shall adopt just two. The
$i$th partial derivative of $f$ at $x$, $i = 1, \dots, n$, is denoted by

$f_i(x)$ or $\dfrac{\partial f}{\partial x^i}(x)$.

Thus

(3.3) $f_i(x) = \lim_{t \to 0} \dfrac{f(x + te_i) - f(x)}{t}$,

provided the limit exists. Stated in less precise terms, $f_i(x)$ is the derivative
taken with respect to the $i$th variable while holding all other variables fixed.

EXAMPLE 3. Let $f(x, y, z) = x^2 + y + \cos(y^2 z)$. Then

$f_1(x, y, z) = 2x$,
$f_2(x, y, z) = 1 - 2yz\sin(y^2 z)$,
$f_3(x, y, z) = -y^2\sin(y^2 z)$.
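The partial derivatives in Example 3 can be checked against central difference quotients. The sketch below is an illustration only: the test point `p`, the step `h` and the helper `partial` are assumptions, and central differences approximate, rather than compute, the limit in (3.3).

```python
import math


def f(x, y, z):
    # Function of Example 3: f(x, y, z) = x^2 + y + cos(y^2 z).
    return x * x + y + math.cos(y * y * z)


def partial(func, point, i, h=1e-5):
    """Central-difference estimate of the i-th partial derivative (i = 0, 1, 2)."""
    up = list(point)
    down = list(point)
    up[i] += h
    down[i] -= h
    return (func(*up) - func(*down)) / (2.0 * h)


p = (1.0, 0.5, 2.0)                      # arbitrary test point
x, y, z = p
exact = (2.0 * x,
         1.0 - 2.0 * y * z * math.sin(y * y * z),
         -y * y * math.sin(y * y * z))
numeric = tuple(partial(f, p, i) for i in range(3))
print(exact)
print(numeric)
```

The two printed triples should agree to several decimal places.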

The symbol $f_i$ denotes the real valued function whose value at $x$ is $f_i(x)$.
Its domain is the set of points where $f$ has an $i$th partial derivative.
For purposes of brevity, we occasionally abuse the notation by writing
$f_i$ for the value $f_i(x)$ at some particular $x$. In each such instance this abuse is
indicated either explicitly or by the context.

EXAMPLE 4. Let $f(x) = \psi[g(x)]$ for every $x \in D$. Suppose that the $i$th partial
derivative of $g$ at $x_0$ and the derivative of $\psi$ at $g(x_0)$ exist. By the composite
function theorem for functions of one variable,
(3.4) $f_i(x_0) = \psi'[g(x_0)]\,g_i(x_0)$.


This theorem is proved in Section 4.4 as a special case of the composite
function theorem for transformations.

In Section 3.3 we define the concept of differentiable function. Since this
idea involves linear approximations, we begin in Section 3.2 with linear
functions. We see in Section 3.3 that the directional derivatives of a
differentiable function are easily computed from the partial derivatives.
Disagreeable phenomena of the sort illustrated in Example 2 cannot occur if
$f$ is differentiable at $x_0$.

PROBLEMS

Unless otherwise stated, the domain $D$ of $f$ is $E^n$ for the particular $n$ indicated
in the problem.
1. In each case find the partial derivatives of $f$.
(a) $f(x, y) = x\log(xy)$, $D = \{(x, y) : xy > 0\}$.
(b) $f(x, y, z) = (x^2 + 2y^2 + z)^3$.
(c) $f(x) = x \cdot x$.

2. Let $f(x, y) = (x - 1)^2 - y^2$. Find the derivative of $f$ at $e_2$ in any direction $v$, using
the definition of directional derivative.
3. Let $f(x, y) = 2xy(x^2 + y^2)^{-1/2}$, if $(x, y) \ne (0, 0)$, and $f(0, 0) = 0$. Find the derivative
of $f$ at $(0, 0)$ in any direction $v$.
4. Let $f(x, y) = (xy)^{1/3}$. (a) Using the definition of directional derivative, show that
$f_1(0, 0) = f_2(0, 0) = 0$, and that $\pm e_1$, $\pm e_2$ are the only directions in which the
derivative at $(0, 0)$ exists. (b) Show that $f$ is continuous at $(0, 0)$.
5. Let $f(x, y, z) = |x + y + z|$. If $x_0 = (x_0, y_0, z_0)$ is such that $x_0 + y_0 + z_0 = 0$, find
those directions $v$ in which the derivative at $x_0$ exists.
6. Show that the derivative of $f$ at $x_0$ in the direction $-v$ is the negative of the derivative
at $x_0$ in the direction $v$.

3.2 Linear functions


Let $L$ be a real-valued function whose domain is $E^n$.
Definition. The function $L$ is linear if:
(a) $L(x + y) = L(x) + L(y)$ for every $x, y \in E^n$.
(b) $L(cx) = cL(x)$ for every $x \in E^n$ and scalar $c$.

These two conditions are equivalent to the single condition $L(cx + dy) =
cL(x) + dL(y)$ for every $x, y \in E^n$ and scalars $c, d$. By induction, if $L$ is linear,

(3.5) $L\Bigl(\sum_{j=1}^m c^j x_j\Bigr) = \sum_{j=1}^m c^j L(x_j)$



for every $m$, $x_1, \dots, x_m \in E^n$, and scalars $c^1, \dots, c^m$. In words, this states
that "$L$ of a linear combination of $x_1, \dots, x_m$ is the corresponding linear
combination of $L(x_1), \dots, L(x_m)$."
If $a_1, \dots, a_n$ are real numbers, then the function $L$ defined by
(3.6) $L(x) = a_1 x^1 + \cdots + a_n x^n$
for every $x \in E^n$, is linear. Conversely, if $L$ is a linear function, let
$a_i = L(e_i)$, $i = 1, \dots, n$.
For each $x$ we have
$x = x^1 e_1 + \cdots + x^n e_n$.
Applying (3.5) with $c^j = x^j$, $x_j = e_j$, and $m = n$, we get
$L(x) = x^1 L(e_1) + \cdots + x^n L(e_n)$.
This has the form (3.6).
Let us denote the right-hand side of (3.6) by $a \cdot x$. We have proved the
following.

Proposition 3.1. A real valued function $L$ is linear if and only if there exist
real numbers $a_1, \dots, a_n$ such that $L(x) = a \cdot x$ for every $x \in E^n$. □
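The correspondence in Proposition 3.1 can be made concrete: the components of the covector are recovered by evaluating $L$ on the standard basis vectors. The following is a small sketch under stated assumptions; the particular linear function `L` and the helper names `covector_of` and `scalar_product` are hypothetical and serve only as an illustration.

```python
def covector_of(L, n):
    """Components a_i = L(e_i) of the covector representing a linear function L on E^n."""
    components = []
    for i in range(n):
        e_i = [1.0 if j == i else 0.0 for j in range(n)]
        components.append(L(e_i))
    return components


def scalar_product(a, x):
    """Scalar product a . x = a_1 x^1 + ... + a_n x^n."""
    return sum(ai * xi for ai, xi in zip(a, x))


def L(x):
    # Hypothetical linear function on E^3, chosen only for illustration.
    return 3.0 * x[0] - x[1] + 0.5 * x[2]


a = covector_of(L, 3)
x = [2.0, -1.0, 4.0]
print(a)                            # components of the covector
print(L(x), scalar_product(a, x))   # the two values agree
```

Evaluating `L` directly and forming the scalar product with the recovered covector give the same number, as the proposition asserts.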

The object $a$ is called a covector and $a_1, \dots, a_n$ are its components.
Covectors are not elements of $E^n$, but belong to the $n$-dimensional vector
space $(E^n)^*$ dual to $E^n$. This is explained in more detail later in the present
section.
Note that we have written $a_i$ with subscripts, while the components $x^i$ of a
vector $x$ are written with superscripts. We call the $n$-tuple $a = (a_1, \dots, a_n)$
a covector, and $a_1, \dots, a_n$ are its components. The number $a \cdot x$ is called
the scalar product of the covector $a$ and vector $x$. If $a$ and $b$ are covectors,
then the sum $a + b$ and product of $a$ by a scalar $c$ are defined by

$a + b = (a_1 + b_1, \dots, a_n + b_n)$, $ca = (ca_1, \dots, ca_n)$.

The covectors form a vector space of dimension $n$, denoted by $(E^n)^*$.


With any vector space $\mathcal{V}$ is associated a dual vector space $\mathcal{V}^*$ (see
Appendix A.1). The elements of $\mathcal{V}^*$ are the real-valued linear functions on $\mathcal{V}$.
In particular, let $\mathcal{V} = E^n$. Proposition 3.1 enables us to identify the space of
covectors with the dual space $(E^n)^*$.
The usefulness of the distinction between $E^n$ and its dual $(E^n)^*$ should
gradually become apparent to the reader, especially in Chapters 4, 7, and 8.
One important fact about vectors and covectors is that their components $x^i$
and $a_i$ change oppositely with respect to linear transformations (Section 4.1).
The reader who is accustomed to the distinction between column and row
vectors in matrix algebra may think of a vector $x$ as a column vector, and a
covector $a$ as a row vector.


The standard basis for $(E^n)^*$

For $i = 1, \dots, n$ consider the linear function $X^i$ such that $X^i(x) = x^i$ for
each $x \in E^n$. In particular, we can take for $x$ any standard basis vector
$e_1, \dots, e_n$ for $E^n$, obtaining

$X^i(e_j) = \delta^i_j$,
where $\delta^i_j = \delta_{ij}$ is Kronecker's delta. Let us denote by $e^i$ the covector
corresponding to the linear function $X^i$. The $i$th component of $e^i$ is 1, and the
other components are 0. We call $\{e^1, \dots, e^n\}$ the standard basis for $(E^n)^*$.
It is dual to the standard basis $\{e_1, \dots, e_n\}$ for $E^n$, in the sense explained in
Appendix A.1.
The notation is chosen so that for every formula about vectors there is
a corresponding formula about covectors obtained by interchanging
subscripts and superscripts. For instance, the components of a vector $x$ satisfy
$x^i = X^i(x) = e^i \cdot x$. The corresponding formula for the components of a
covector $a$ is $a_i = a \cdot e_i$. In $(E^n)^*$ a euclidean inner product and norm are
defined in the same way as in $E^n$. These facts are summarized in the table
that follows.

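                          Vectors                               Covectors

Standard bases            $e_1, \dots, e_n$                     $e^1, \dots, e^n$
                          $x = \sum_{i=1}^n x^i e_i$            $a = \sum_{i=1}^n a_i e^i$
Euclidean inner product   $x \cdot y = \sum_{i=1}^n x^i y^i$    $a \cdot b = \sum_{i=1}^n a_i b_i$
Euclidean norm            $|x|^2 = x \cdot x$                   $|a|^2 = a \cdot a$
Scalar product            $a \cdot x = \sum_{i=1}^n a_i x^i$
                          $e^i \cdot x = x^i$                   $a \cdot e_i = a_i$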

For some purposes, one can avoid the use of covectors by the simple
device of raising indices. If $a = (a_1, \dots, a_n)$ is a covector, consider the vector
$y = (y^1, \dots, y^n)$ such that $y^i = a_i$ for $i = 1, \dots, n$. Then $a \cdot x = y \cdot x$ for any
vector $x$, where $\cdot$ on the left side means the scalar product and on the right
side, the euclidean inner product. However, this device is often unsatisfactory.
The difference in behavior of vectors and covectors under linear
transformations was already mentioned. Moreover, it is sometimes convenient to


give $E^n$ a noneuclidean inner product. This affects the formula for changing
covectors into vectors. (See the distinction between the differential and
gradient of a function, Section 3.3.)

PROBLEMS

1. Let $n = 3$ and $L(x, y, z) = x + y + 2z$.
(a) What is the covector $a = (a_1, a_2, a_3)$ corresponding to $L$?
(b) Describe the set $\{(x, y, z) : L(x, y, z) = c\}$ and the intersection of this set with the
plane $\{(x, y, z) : y = x\}$.
2. Prove that any linear function $L$ is continuous using Proposition 3.1 and the
inequality $|a \cdot x| \le |a||x|$.
3. Let $L$ be linear on $E^n$.
(a) Show that $\{x : L(x) = c\}$ is a hyperplane, unless $L(x) = 0$ for all $x \in E^n$.
(b) Give another proof (not the one suggested in Problem 5, Section 1.4) that
hyperplanes and closed half-spaces are closed sets, using Problem 2.

4. Let $\{x_1, \dots, x_n\}$ be a basis for $E^n$. Define $L$ by the formula
$L(c^1 x_1 + \cdots + c^n x_n) = c^n$
for every $c^1, \dots, c^n$. Show that:
(a) $L$ is a linear function.
(b) The set $P = \{x : L(x) = 0\}$ is the $(n-1)$-dimensional vector subspace spanned
by $\{x_1, \dots, x_{n-1}\}$.
5. Let $\|\ \|$ be any norm on $E^n$ (Section 2.11). Define on $(E^n)^*$ the dual norm as follows.
For every covector $a$,
$\|a\| = \max\{a \cdot x : \|x\| = 1\}$.
(a) Verify that the dual norm satisfies Properties (1), (2), and (3) (Section 2.9).
(b) Show that $\|x\| = \max\{a \cdot x : \|a\| = 1\}$. [Hint: Problem 7, Section 2.4.]

3.3 Differentiable functions


The existence of a derivative for a function of one variable is a fact of
considerable interest. Geometrically, it says that a tangent line exists. However,
the fact that a function of several variables has partial derivatives is not
in itself of much interest. For one thing, the existence of derivatives in the
directions of the standard basis vectors $e_1, \dots, e_n$ does not imply that
derivatives exist in other directions. Moreover, the function need not have a tangent
hyperplane even if there is a derivative in every direction (see Example 2
below).
We shall now define a more natural notion, that of differentiability.
Geometrically, differentiability means the existence of a tangent hyperplane.
It will be shown that most of the basic properties of differentiable functions of
one variable remain true for differentiable functions of several variables.
Let us again consider an interior point $x_0$ of the domain $D$ of a real valued
function $f$.


Definition. The function $f$ is differentiable at $x_0$ if there is a linear function
$L$ (depending on $x_0$) such that

(3.7) $\lim_{h \to 0} \dfrac{f(x_0 + h) - f(x_0) - L(h)}{|h|} = 0$.

Let us show that if $f$ is differentiable at $x_0$, then $f$ has a derivative at $x_0$
in every direction $v$. Taking $h = tv$, so that $|h| = |t|$, (3.7) implies that

$\lim_{t \to 0} \dfrac{f(x_0 + tv) - f(x_0) - L(tv)}{t} = 0$,

by Proposition 2.3, and therefore, since $L(tv) = tL(v)$,

$\lim_{t \to 0} \left[\dfrac{f(x_0 + tv) - f(x_0)}{t} - L(v)\right] = 0$,

$\lim_{t \to 0} \dfrac{f(x_0 + tv) - f(x_0)}{t} = L(v)$.

This shows that $L(v)$ is the derivative at $x_0$ in the direction $v$.
In Section 3.2 we identified each linear function with a covector. Let
$df(x_0)$ denote the covector corresponding to the linear function $L$ in (3.7).

Definition. $df(x_0)$ is the differential of $f$ at $x_0$.

By Formula (3.6), $L(h) = df(x_0) \cdot h$ for any vector $h$. In particular, if $f$ is
differentiable at $x_0$, then the derivative of $f$ in any direction $v$ equals $df(x_0) \cdot v$.
In particular, let $v = e_i$. The number $a_i = L(e_i)$ in (3.6) is the $i$th partial
derivative $f_i(x_0)$. Hence the components of the covector $df(x_0)$ are the partial
derivatives:

(3.8) $df(x_0) = \sum_{i=1}^n f_i(x_0)\,e^i$,

(3.9) $df(x_0) \cdot h = \sum_{i=1}^n f_i(x_0)\,h^i$.

If $x \in D$, let us set $x = x_0 + h$. The vector $h = x - x_0$ is often called
the increment between $x$ and $x_0$, and in the time-honored notation of calculus
one would write $\Delta x$ for $h$. The number $f(x_0 + h) - f(x_0)$ is the corresponding
increment in $f$. However, we use neither the word increment nor the notation
$\Delta x$.
If $f$ is differentiable at $x_0$, then the differential at $x_0$ furnishes a linear
approximation $df(x_0) \cdot (x - x_0)$ to $f(x) - f(x_0)$ when $x$ is near $x_0$. The error
in this approximation is $f(x) - f(x_0) - df(x_0) \cdot (x - x_0)$, which is the
numerator in (3.7). It is small compared to the distance $|x - x_0|$ when $|x - x_0|$ is
small.
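The statement that the error is small compared to $|x - x_0|$ can be observed numerically by letting $h$ shrink along a fixed direction. This is a sketch only: the function `f`, the base point $(1, 0.5)$ and the direction $(0.6, 0.8)$ are assumptions chosen for illustration.

```python
import math


def f(x, y):
    # Illustrative function, not from the text: f(x, y) = x^2 y + sin y.
    return x * x * y + math.sin(y)


def relative_error(scale):
    """|f(x) - f(x0) - df(x0).(x - x0)| / |x - x0| for x = x0 + scale * (0.6, 0.8)."""
    x0, y0 = 1.0, 0.5
    h1, h2 = 0.6 * scale, 0.8 * scale          # |h| = scale
    df = (2.0 * x0 * y0, x0 * x0 + math.cos(y0))
    error = f(x0 + h1, y0 + h2) - f(x0, y0) - (df[0] * h1 + df[1] * h2)
    return abs(error) / scale


ratios = [relative_error(s) for s in (1e-1, 1e-2, 1e-3)]
print(ratios)                                   # shrinks roughly in proportion to |h|
```

Each tenfold reduction of $|h|$ reduces the quotient in (3.7) by roughly a factor of ten, consistent with a limit of zero.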


Let us interpret this statement geometrically. Think of $f$ as defining an
$n$-dimensional surface in $E^{n+1}$, namely $M = \{(x, z) : z = f(x),\ x \in D\}$. A
hyperplane $P \subset E^{n+1}$ is called tangent to $M$ at $(x_0, z_0) \in M$ if: for each
$(x, z) \in M$ there exists $(x, z') \in P$ such that $\lim_{x \to x_0} |x - x_0|^{-1}|z' - z| = 0$.
Let us take
$P = \{(x, z') : z' = z_0 + df(x_0) \cdot (x - x_0)\}$, $z_0 = f(x_0)$.
Then $P$ is the tangent hyperplane to $M$ at $(x_0, z_0)$ (see Figure 3.2). In Section
4.7 we define the concept of manifold. It turns out that $M$ is a manifold of
dimension $n$; the definition just given of tangent hyperplane is equivalent
to the one given there.

Figure 3.2 (labels: $z_0 = f(x_0)$, $z = f(x)$, $z' = z_0 + df(x_0) \cdot (x - x_0)$)

EXAMPLE 1. Let $f(x, y) = (xy)^{1/3}$. Find the tangent plane at $(1, 1, 1)$. By
elementary calculus

$f_1(x, y) = \tfrac{1}{3}x^{-2/3}y^{1/3}$, $f_2(x, y) = \tfrac{1}{3}x^{1/3}y^{-2/3}$,

wherever $xy \ne 0$. Moreover, $f_1$ and $f_2$ are continuous functions on the open set where $xy \ne 0$.
By Theorem 3.2 in Section 3.4, $f$ is differentiable at any $(x_0, y_0)$ with $x_0 y_0 \ne 0$. The
components of $df(x_0, y_0)$ are $f_1(x_0, y_0)$ and $f_2(x_0, y_0)$. The equation for the
tangent plane at $(x_0, y_0, f(x_0, y_0))$ is

$z = f(x_0, y_0) + f_1(x_0, y_0)(x - x_0) + f_2(x_0, y_0)(y - y_0)$.

Taking $x_0 = y_0 = 1$, the equation of the tangent plane at $(1, 1, 1)$ is

$z = 1 + \tfrac{1}{3}(x - 1) + \tfrac{1}{3}(y - 1)$.


The partial derivatives $f_1(0, 0)$ and $f_2(0, 0)$ are both $0$, according to Problem
4, Section 3.1. However, there is no tangent plane at $(0, 0, 0)$. If there were a
tangent plane at $(0, 0, 0)$, then $f$ would have to be differentiable at $(0, 0)$. Since
there is not a derivative in every direction at $(0, 0)$, $f$ is not differentiable there.
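The tangent plane at $(1, 1, 1)$ found in Example 1 can be compared numerically with the surface $z = (xy)^{1/3}$ near $(1, 1)$. The sketch below is illustrative only; the sample points are assumptions, and the code evaluates $f$ only where $xy > 0$.

```python
def f(x, y):
    # Function of Example 1, evaluated here only where xy > 0.
    return (x * y) ** (1.0 / 3.0)


def tangent_plane(x, y):
    """Tangent plane at (1, 1, 1): z = 1 + (x - 1)/3 + (y - 1)/3."""
    return 1.0 + (x - 1.0) / 3.0 + (y - 1.0) / 3.0


for step in (1e-1, 1e-2, 1e-3):
    x, y = 1.0 + step, 1.0 - 0.5 * step
    gap = abs(f(x, y) - tangent_plane(x, y))
    print(step, gap)        # the gap shrinks faster than the step
```

The vertical gap between surface and plane decreases roughly like the square of the step, which is the behavior required of a tangent plane.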

Theorem 3.1. If $f$ is differentiable at $x_0$ then $f$ is continuous at $x_0$.

PROOF. For any $h$,
(*) $f(x_0 + h) - f(x_0) = [f(x_0 + h) - f(x_0) - df(x_0) \cdot h] + df(x_0) \cdot h$.
From the definition of limit (with $\epsilon = 1$) there is a positive number $\delta_0$ such
that if $0 < |h| < \delta_0$ the quotient in (3.7) has absolute value less than 1:
$|f(x_0 + h) - f(x_0) - df(x_0) \cdot h| < |h|$.
By Cauchy's inequality,
$|df(x_0) \cdot h| \le |df(x_0)||h|$.
Applying the triangle inequality to the right side of (*),
$|f(x_0 + h) - f(x_0)| \le |f(x_0 + h) - f(x_0) - df(x_0) \cdot h| + |df(x_0) \cdot h|$.
Consequently, if $0 < |h| < \delta_0$,
(3.10) $|f(x_0 + h) - f(x_0)| < C|h|$,
where $C = 1 + |df(x_0)|$. Given $\epsilon > 0$, let $\delta = \min\{\delta_0, \epsilon/C\}$. Then
$|f(x_0 + h) - f(x_0)| < \epsilon$

for every $h$ such that $0 < |h| < \delta$. This shows that
$\lim_{h \to 0} f(x_0 + h) = f(x_0)$,

in other words, that $f$ is continuous at $x_0$. □


For $n = 1$, $f$ is differentiable at $x_0$ if and only if the derivative $f'(x_0)$
exists, since both statements are equivalent to the existence of a tangent line
at $(x_0, f(x_0))$. However, for $n \ge 2$ a function $f$ may have a derivative at $x_0$
in every direction yet not be differentiable or even continuous at $x_0$. This is
shown by the following example.

EXAMPLE 2. Let
$f(x, y) = \dfrac{2xy^2}{x^2 + y^4}$, if $(x, y) \ne (0, 0)$, and $f(0, 0) = 0$.
There are two cases to consider. If $\cos\theta \ne 0$, then the derivative at $(0, 0)$ in
the direction $(\cos\theta, \sin\theta)$ is
$\lim_{t \to 0} \dfrac{f(t\cos\theta, t\sin\theta) - f(0, 0)}{t} = \lim_{t \to 0} \dfrac{2\cos\theta\sin^2\theta}{\cos^2\theta + t^2\sin^4\theta} = \dfrac{2\sin^2\theta}{\cos\theta}$.


If $\cos\theta = 0$, then $f(t\cos\theta, t\sin\theta) = 0$ for every $t$ and the directional
derivative at $(0, 0)$ is $0$. However, $f(y^2, y) = 1$ for every $y \ne 0$. Since $f(0, 0) = 0$,
$f$ is not continuous at $(0, 0)$. By Theorem 3.1, $f$ is not differentiable at $(0, 0)$.
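Both features of Example 2 can be observed numerically: the difference quotient along a ray approaches $2\sin^2\theta/\cos\theta$, while the values along the parabola $x = y^2$ remain equal to $1$. This is a sketch for illustration; the angle $\theta = \pi/3$ and the step `t` are assumptions.

```python
import math


def f(x, y):
    # Function of Example 2, with f(0, 0) = 0.
    if x == 0.0 and y == 0.0:
        return 0.0
    return 2.0 * x * y * y / (x * x + y ** 4)


def directional_quotient(theta, t=1e-6):
    """(f(t cos(theta), t sin(theta)) - f(0, 0)) / t for small t."""
    return (f(t * math.cos(theta), t * math.sin(theta)) - f(0.0, 0.0)) / t


theta = math.pi / 3
print(directional_quotient(theta), 2.0 * math.sin(theta) ** 2 / math.cos(theta))

# Along the parabola x = y^2 the values stay at 1, however close to the origin.
print([f(y * y, y) for y in (1e-1, 1e-2, 1e-3)])
```

For $\theta = \pi/3$ the limiting value is $2(3/4)/(1/2) = 3$, which the first printed pair reflects.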
Let us next state a proposition which, although of no interest in itself, is
useful later.

Proposition 3.2. Let $\phi(t) = f(x_0 + th)$. Then for every $t$ such that $f$ is
differentiable at $x_0 + th$,
$\phi'(t) = df(x_0 + th) \cdot h$.
PROOF. If $h = 0$ the result is trivial. If $h \ne 0$, then
$0 = \lim_{k \to 0} \dfrac{f(x_0 + th + k) - f(x_0 + th) - df(x_0 + th) \cdot k}{|k|}$.
In particular, let $k = rh$. Then
$0 = \lim_{r \to 0} \dfrac{\phi(t + r) - \phi(t) - r\,df(x_0 + th) \cdot h}{r}$,

$0 = \lim_{r \to 0} \dfrac{\phi(t + r) - \phi(t)}{r} - df(x_0 + th) \cdot h$. □

Note that if $h$ is a direction ($|h| = 1$) and $t = 0$, we again obtain the
formula $df(x_0) \cdot h$ for the directional derivative.
As a first application of Proposition 3.2 let us extend the mean value
theorem to functions of several variables. Consider two points $x_0$, $x_0 + h$.
We recall (Section 1.3) that points of the line segment $l$ joining $x_0$ and $x_0 + h$
have the form $x = x_0 + th$, $t \in [0, 1]$.

Mean value theorem. Let $f$ be continuous at every point $x$ of the line segment $l$
joining $x_0$ and $x_0 + h$, with $f$ differentiable at each point of $l$ except perhaps
the endpoints. Then there exists a number $s \in (0, 1)$ such that
$f(x_0 + h) - f(x_0) = df(x_0 + sh) \cdot h$.
PROOF. Let $\phi(t) = f(x_0 + th)$, as in Proposition 3.2. Then $\phi$ is continuous
on $[0, 1]$, and
$\phi(0) = f(x_0)$, $\phi(1) = f(x_0 + h)$,
$\phi'(t) = df(x_0 + th) \cdot h$, if $0 < t < 1$.
By the mean value theorem for functions of one variable (Section A.2) there
exists $s \in (0, 1)$ such that $\phi(1) - \phi(0) = \phi'(s)$. □
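The mean value theorem asserts only that some $s \in (0, 1)$ exists. For a concrete function one can locate such an $s$ numerically. The sketch below uses bisection, which is an assumption of this illustration rather than part of the proof: it requires the residual to change sign on $[0, 1]$, which happens to hold for the chosen `f`, `x0` and `h`.

```python
def f(x, y):
    # Illustrative function, not from the text: f(x, y) = x^2 + y^2.
    return x * x + y * y


def df_dot(x, y, h):
    """df(x, y) . h for the function above, using f_1 = 2x and f_2 = 2y."""
    return 2.0 * x * h[0] + 2.0 * y * h[1]


def mean_value_parameter(x0, h, tol=1e-12):
    """Bisection for s in (0, 1) with f(x0 + h) - f(x0) = df(x0 + s h) . h.

    Assumes the residual below has opposite signs at s = 0 and s = 1,
    which holds for this example but is not guaranteed in general.
    """
    increment = f(x0[0] + h[0], x0[1] + h[1]) - f(*x0)

    def residual(s):
        return df_dot(x0[0] + s * h[0], x0[1] + s * h[1], h) - increment

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(lo) * residual(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


s = mean_value_parameter((0.0, 0.0), (1.0, 2.0))
print(s)    # the exact value for this example is 1/2
```

Here $\phi(t) = 5t^2$, so $\phi(1) - \phi(0) = 5$ and $\phi'(s) = 10s$, which gives $s = 1/2$.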

Definition. If $f$ is differentiable at every point of a subset $A$ of its domain
$D$, then we say that $f$ is differentiable on $A$. If $D$ is an open set and $f$ is
differentiable at every point of $D$, then $f$ is called a differentiable function.


The mean value theorem has the following corollaries.

Corollary 1. Let $f$ be differentiable on a convex set $K$ and $C \ge 0$ a number
such that $|df(x)| \le C$ for every $x \in K$. Then for every $x, y \in K$,
$|f(x) - f(y)| \le C|x - y|$.
PROOF. By the mean value theorem, with $x_0 = y$, $x_0 + h = x$,
$f(x) - f(y) = df(y + s(x - y)) \cdot (x - y)$,
where $s \in (0, 1)$. By Cauchy's inequality,
$|f(x) - f(y)| \le |df(y + s(x - y))||x - y| \le C|x - y|$. □

Corollary 2. Let $f$ be a differentiable function whose domain $D$ is an open,
connected set, such that $df(x) = 0$ for every $x \in D$. Then $f$ is a constant
function.
PROOF. Let $x_0$ be some point of $D$, and let $D_1 = \{x : f(x) = f(x_0)\}$. If
$x \in D_1$ then some neighborhood $U$ of $x$ is contained in $D$. Every neighborhood
is a convex set. By Corollary 1, with $C = 0$ and $K = U$, $f(y) = f(x) = f(x_0)$
for every $y \in U$. Hence $U \subset D_1$. This shows that $D_1$ is an open set.
Since $f$ is differentiable, $f$ is continuous by Theorem 3.1. Therefore $D - D_1 =
\{x : f(x) \ne f(x_0)\}$ is also open by Corollary 2 to Theorem 2.6. If
$D - D_1$ is not empty, then $D$ is the union of two disjoint, nonempty open
sets $D_1$ and $D - D_1$. Since $D$ is connected, this is impossible. Hence $D - D_1$
is empty, and $D = D_1$. □

Corollary 2 generalizes the result that if $f'(x) = 0$ for every $x$ in an open
interval, then $f$ is constant there.
Note. In 1935 H. Whitney (Duke Math. J. 1, 514-517) gave an example
of a connected set $A \subset E^2$ and a differentiable function $f$, such that $df(x, y) = 0$
for every $(x, y) \in A$, but $f(x, y)$ is not constant on $A$. The set $A$ in Whitney's
example has no interior point.

The gradient vector

If $x$ is any point where $f$ is differentiable, then besides the covector

$df(x) = \sum_{i=1}^n f_i(x)\,e^i$

whose components are the partial derivatives $f_i(x)$, it is sometimes more
suitable to think instead of the vector with these same components. This is
called the gradient vector at $x$ and is denoted by grad $f(x)$. Thus

(3.11) $\operatorname{grad} f(x) = \sum_{i=1}^n f_i(x)\,e_i$.

Another common notation for the gradient vector is $\nabla f(x)$.


Note. This definition of the gradient vector is correct only if we use the
euclidean inner product in $E^n$. Suppose that $E^n$ is given some other inner
product $B(x, y) = \sum_{i,j=1}^n c_{ij} x^i y^j$, with $(c_{ij})$ a symmetric, positive definite
matrix (see Section 2.11). Let us denote the gradient vector with respect to
this inner product by $\operatorname{grad}_B f(x)$. We require that $\operatorname{grad}_B f(x)$ satisfy
$B(\operatorname{grad}_B f(x), h) = df(x) \cdot h$, for all $h \in E^n$.
To find the components of the vector $z = \operatorname{grad}_B f(x)$, let $h = e_i$ in (3.9). Then

$\sum_{j=1}^n c_{ij} z^j = f_i(x)$, $i = 1, \dots, n$.

This is a system of linear equations for the components $z^1, \dots, z^n$ of
$\operatorname{grad}_B f(x)$. Let $(c^{ij})$ denote the inverse of the matrix $(c_{ij})$. Then

$z^i = \sum_{j=1}^n c^{ij} f_j(x)$, $i = 1, \dots, n$,

(3.12) $\operatorname{grad}_B f(x) = \sum_{i,j=1}^n c^{ij} f_j(x)\,e_i$.

For a noneuclidean inner product, (3.12) replaces (3.11).
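Formula (3.12) can be tried out in dimension $n = 2$, where the inverse matrix is available in closed form. The following is a minimal sketch under stated assumptions: the matrix `c` and the values `df` of the partial derivatives are made up for illustration, and the helper names `grad_B` and `B` are hypothetical.

```python
def grad_B(c, df):
    """Solve sum_j c[i][j] z^j = f_i for z (n = 2), i.e. formula (3.12)."""
    det = c[0][0] * c[1][1] - c[0][1] * c[1][0]
    # Inverse of the 2 x 2 matrix (c_ij); its entries play the role of c^{ij}.
    inv = [[c[1][1] / det, -c[0][1] / det],
           [-c[1][0] / det, c[0][0] / det]]
    return [inv[i][0] * df[0] + inv[i][1] * df[1] for i in range(2)]


def B(c, x, y):
    """Inner product B(x, y) = sum_{i,j} c_ij x^i y^j."""
    return sum(c[i][j] * x[i] * y[j] for i in range(2) for j in range(2))


c = [[2.0, 1.0], [1.0, 3.0]]      # symmetric, positive definite (assumed example)
df = [1.0, 2.0]                   # assumed partial derivatives f_1, f_2 at some x
z = grad_B(c, df)
h = [0.3, -0.7]
print(z)
print(B(c, z, h), df[0] * h[0] + df[1] * h[1])   # the two numbers agree
```

The last line checks the defining requirement $B(\operatorname{grad}_B f(x), h) = df(x) \cdot h$ for one sample vector `h`.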

PROBLEMS

In Problems 1, 2, 3, and 9, assume that $f$ is differentiable. In each case this
follows from Theorem 3.2 in Section 3.4.
1. Let $f(x, y) = 3x^2 y + 2xy^2$. Find the tangent plane at $(1, -2, 2)$.

2. Using the formula $df(x_0) \cdot v$ for directional derivative, find the derivative of $f$ at
$x_0$ in the direction $v$.
(a) $f(x, y) = xy$, $x_0 = (1, 3)$, $v = (2/\sqrt{5}, -1/\sqrt{5})$.
(b) $f(x, y) = x\exp(xy)$, $x_0 = e_1 - e_2$, $v = (1/\sqrt{2})(e_1 + e_2)$.
(c) $f(x, y, z) = ax^2 + by^2 + cz^2$, $x_0 = e_1$, $v = e_3$.
3. Let $f(x, y) = \log(x^2 + 2y + 1) + \int_0^x \cos(t^2)\,dt$, $y > -\tfrac{1}{2}$.
(a) Find $df(x, y)$.
(b) Find approximately $f(0.03, 0.03)$.

4. Find grad $f(x)$ for each of the following functions:
(a) $f(x) = x_0 \cdot x$. (b) $f(x) = |x|$, $x \ne 0$. (c) $f(x) = (x_0 \cdot x)^2$.

5. In Problem 3, Section 3.1, show that $f$ is continuous at $(0, 0)$, but not differentiable
at $(0, 0)$.
6. Let $f(x, y) = 2xy^2/(x^2 + y^4)$, if $(x, y) \ne (0, 0)$, and $f(0, 0) = 0$, as in Example 2.
(a) Show that $-1 \le f(x, y) \le 1$ for every $(x, y)$.
(b) Find $\{(x, y) : f(x, y) = 1\}$ and $\{(x, y) : f(x, y) = -1\}$.
(c) Find $\{(x, y) : \operatorname{grad} f(x, y) = (0, 0)\}$.
(d) Find $\{(x, y) : f(x, y) = c\}$ for any $c$, and illustrate with a sketch.


7. Let $f$ and $g$ be differentiable at $x_0$. (a) Prove that the sum $f + g$ is differentiable
at $x_0$, and $d(f + g)(x_0) = df(x_0) + dg(x_0)$. (b) Prove that the product $fg$ is
differentiable at $x_0$, and $d(fg)(x_0) = f(x_0)\,dg(x_0) + g(x_0)\,df(x_0)$. [Hint: Recall the
proof for $n = 1$.]
8. (Euler's formula.) Let $p$ be a real number. A function $f$ is called homogeneous of degree
$p$ if $f(tx) = t^p f(x)$ for every $x \ne 0$ and $t > 0$. Let $f$ be differentiable for all $x \ne 0$.
Show that if $f$ is homogeneous of degree $p$, then

$df(x) \cdot x = pf(x)$

for every $x \ne 0$, and conversely. [Hint: Let $\phi(t) = f(tx)$ and use Proposition 3.2
with $x_0 = 0$. For the converse, show that for fixed $x$, $\phi(t)t^{-p}$ is a constant.]
9. Let $Q(x) = \sum_{i,j=1}^n c_{ij} x^i x^j$, where $c_{ij} = c_{ji}$ and $Q(x) > 0$ for every $x \ne 0$. Let
$f(x) = [Q(x)]^{p/2}$. Calculate $df(x)$ and verify Euler's formula for this function.
10. Let $f$ be continuous at $x_0$ and $g$ differentiable at $x_0$ with $g(x_0) = 0$. Show that the
product $fg$ is differentiable at $x_0$.

3.4 Functions of class C(q)

Let $f$ be a function whose domain is an open set $D \subset E^n$.

Definition. If $f$ is continuous on $D$, then $f$ is said to be a function of class $C^{(0)}$.
If the partial derivatives $f_1(x), \dots, f_n(x)$ exist for every $x \in D$ and $f_1, \dots, f_n$
are continuous functions on $D$, then $f$ is a function of class $C^{(1)}$.

The classes $C^{(q)}$ of functions, where $q = 2, 3, \dots$, are defined below.
We first prove the following sufficient condition for differentiability, which
is adequate for most purposes.

Theorem 3.2. If $f$ is a function of class $C^{(1)}$, then $f$ is a differentiable function.

PROOF. Let us proceed by induction on the dimension $n$. If $n = 1$,
differentiability means simply that $f'(x)$ exists for every $x \in D$, while if $f$ is of class
$C^{(1)}$, then $f'$ is a continuous function. Let us assume that the theorem is true in
dimension $n - 1$.
Let $x_0$ be any point of $D$ and $\delta_0 > 0$ such that the $\delta_0$-neighborhood of $x_0$
is contained in $D$. Let us write (Figure 3.3):
$\hat{x} = (x^1, \dots, x^{n-1})$, $\hat{x}_0 = (x_0^1, \dots, x_0^{n-1})$,

$\phi(\hat{x}) = f(x^1, \dots, x^{n-1}, x_0^n) = f(\hat{x}, x_0^n)$,

provided the point $(\hat{x}, x_0^n)$ is in $D$. The partial derivatives of $\phi$ are

$\phi_i(\hat{x}) = f_i(\hat{x}, x_0^n)$, $i = 1, \dots, n - 1$.


Figure 3.3

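Since $f$ is of class $C^{(1)}$, each $f_i$ is continuous. Hence each $\phi_i$ is continuous and
$\phi$ is of class $C^{(1)}$. By the induction hypothesis $\phi$ is differentiable at $\hat{x}_0$.
Therefore, given $\epsilon > 0$ there exists $\delta_1$, $0 < \delta_1 < \delta_0$, such that

$\Bigl|\phi(\hat{x}_0 + \hat{h}) - \phi(\hat{x}_0) - \sum_{i=1}^{n-1}\phi_i(\hat{x}_0)h^i\Bigr| \le \dfrac{\epsilon}{2}|\hat{h}|$

whenever $|\hat{h}| < \delta_1$. Since $f_n$ is continuous, there exists $\delta_2$, $0 < \delta_2 < \delta_0$, such
that $|f_n(y) - f_n(x_0)| < \epsilon/2$ whenever $|y - x_0| < \delta_2$. Let $\delta = \min\{\delta_1, \delta_2\}$
and let $|h| < \delta$. By the mean value theorem,
$f(x_0 + h) - \phi(\hat{x}_0 + \hat{h}) = f(\hat{x}_0 + \hat{h}, x_0^n + h^n) - f(\hat{x}_0 + \hat{h}, x_0^n)$
$= f_n(\hat{x}_0 + \hat{h}, x_0^n + sh^n)h^n$
for some $s \in (0, 1)$. Setting $y = (\hat{x}_0 + \hat{h}, x_0^n + sh^n)$,
$|y - x_0| = |(\hat{h}, sh^n)| \le |h| < \delta$.
Since $f(x_0) = \phi(\hat{x}_0)$,
$f(x_0 + h) - f(x_0) = [f(x_0 + h) - \phi(\hat{x}_0 + \hat{h})] + [\phi(\hat{x}_0 + \hat{h}) - \phi(\hat{x}_0)]$,
$f(x_0 + h) - f(x_0) - df(x_0) \cdot h = [f_n(y)h^n - f_n(x_0)h^n] + \Bigl[\phi(\hat{x}_0 + \hat{h}) - \phi(\hat{x}_0) - \sum_{i=1}^{n-1}\phi_i(\hat{x}_0)h^i\Bigr]$.

Using the above inequalities, the triangle inequality, and the fact that
$|h^n| \le |h|$, $|\hat{h}| \le |h|$, we get

$|f(x_0 + h) - f(x_0) - df(x_0) \cdot h| \le \dfrac{\epsilon}{2}|h^n| + \dfrac{\epsilon}{2}|\hat{h}| \le \epsilon|h|$

whenever $|h| < \delta$. This proves that $f$ is differentiable at $x_0$. □

Corollary. Every function of class $C^{(1)}$ is of class $C^{(0)}$.

PROOF. Apply Theorems 3.1 and 3.2. □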



EXAMPLE 1. If $f$ and $g$ are functions of class $C^{(1)}$ with the same domain $D$, then
$f + g$ is of class $C^{(1)}$. Using the product rule from elementary calculus, the
partial derivatives of the product are

$(fg)_i = f g_i + f_i g$, $i = 1, \dots, n$.

Since sums and products of continuous functions are again continuous,
$(fg)_i$ is continuous for each $i = 1, \dots, n$. Hence $fg$ is of class $C^{(1)}$.

EXAMPLE 2. The composite of two functions of class $C^{(1)}$ is also of class $C^{(1)}$.
Suppose that $f = \psi \circ g$, where $\psi$ and $g$ are of class $C^{(1)}$. By Formula (3.4),
$f_i = (\psi' \circ g)g_i$. Since $\psi'$ and $g$ are continuous, their composite $\psi' \circ g$ is
continuous (Theorem 2.7). Since $g_i$ is continuous, the product $(\psi' \circ g)g_i$ is
continuous. Thus $f$ is of class $C^{(1)}$.


Higher-order partial derivatives
The partial derivatives $f_1(x), \dots, f_n(x)$ are often called the first-order partial
derivatives of $f$ at $x$. The functions $f_1, \dots, f_n$ may themselves possess partial
derivatives. If $f_i$ has a $j$th partial derivative at $x$, then this partial derivative
is called a partial derivative of order 2 and is denoted by

$f_{ij}(x)$ or $\dfrac{\partial^2 f}{\partial x^j\,\partial x^i}(x)$.

For example, if $f(x, y) = x^2 y^3$ then $f_1(x, y) = 2xy^3$, $f_{11}(x, y) = 2y^3$,
$f_{12}(x, y) = 6xy^2$.
If all of the partial derivatives $f_{ij}(x)$, $i, j = 1, \dots, n$, exist at every $x \in D$
and each $f_{ij}$ is a continuous function, then $f$ is called a function of class $C^{(2)}$.
By the corollary to Theorem 3.2, if $f$ is of class $C^{(2)}$ then $f_1, \dots, f_n$ are continuous.
Hence any function of class $C^{(2)}$ is also of class $C^{(1)}$.
The partial derivatives of $f$ of order $q = 3, 4, \dots$ are defined similarly,
wherever they exist. The notation for partial derivative at $x$, first in the
direction $e_{i_1}$, second in the direction $e_{i_2}$, and so on, is

$f_{i_1 i_2 \cdots i_q}(x)$, $1 \le i_l \le n$, $l = 1, \dots, q$.

Definition. If all of the $q$th order derivatives of $f$ exist at every $x \in D$ and
each $f_{i_1 \cdots i_q}$ is a continuous function on $D$, then $f$ is a function of class $C^{(q)}$.

To emphasize that $D$ is the domain of $f$, we sometimes say function of class
$C^{(q)}$ on $D$.
By the corollary to Theorem 3.2, any function of class $C^{(q)}$ is also of class
$C^{(q-1)}$. As $q$ increases, more and more restrictive conditions are placed on the
smoothness of $f$. In many parts of differential calculus it is sufficient to assume
that $f$ is of class $C^{(1)}$ or class $C^{(2)}$. However, for some purposes one needs $C^{(q)}$
for $q > 2$. For instance, in Taylor's formula it is assumed that $f$ is of class $C^{(q)}$.
The sum and product of two functions of class $C^{(q)}$ are of class $C^{(q)}$. If,
in Example 2, $\psi$ and $g$ are of class $C^{(q)}$, their composite $f$ is also of class $C^{(q)}$.


EXAMPLE 3. Let $f(x) = x^p$ if $x \ge 0$, and $f(x) = 0$ if $x \le 0$, where $p > 0$ is
given. Then $f$ is of class $C^{(q)}$ if $q < p$ but not if $q > p$. The proof of this is left
to the reader (Problem 5). Thus for every $q$ there exist functions of class $C^{(q)}$
that are not of class $C^{(q+1)}$.

EXAMPLE 4. Any polynomial in $n$ variables is a function of class $C^{(q)}$ for every
$q$. If $f$ is a rational function, $f(x) = P(x)/Q(x)$ where $P$ and $Q$ are polynomials,
then $f$ is of class $C^{(q)}$ on any open set where $Q(x) \ne 0$.

It can happen that a function $f$ has the second partial derivatives $f_{ij}$ and
$f_{ji}$, $i \ne j$, but that $f_{ij} \ne f_{ji}$ (see Problem 7). However, this undesirable
phenomenon cannot occur if $f$ is of class $C^{(2)}$. This is even true under the slightly
weaker hypotheses of the following theorem in which no assumption is made
about the other second-order partial derivatives of $f$.

Theorem 3.3. If $f$ is of class $C^{(1)}$ and both $f_{ij}$ and $f_{ji}$ are continuous, $i \ne j$, then
$f_{ij} = f_{ji}$.
PROOF. Suppose first that $n = 2$. We need to show that $f_{12} = f_{21}$. Let $(x_0, y_0)$
be any point of $D$, and $\delta_0 > 0$ such that the $\delta_0$-neighborhood of $(x_0, y_0)$
is contained in $D$. For $0 < u < \delta_0/\sqrt{2}$ let
$A(u) = \dfrac{1}{u^2}[f(x_0 + u, y_0 + u) - f(x_0, y_0 + u) - f(x_0 + u, y_0) + f(x_0, y_0)]$.
$A(u)$ is sometimes called the second difference quotient. Let
$g(x) = f(x, y_0 + u) - f(x, y_0)$
for every $x$ such that $(x, y_0 + u), (x, y_0) \in D$. The domain of $g$ is an open
subset of $E^1$ which contains the closed interval $[x_0, x_0 + u]$. Moreover,
$g'(x) = f_1(x, y_0 + u) - f_1(x, y_0)$. Thus $g$ is of class $C^{(1)}$ since $f_1$ is continuous,
and
$A(u) = \dfrac{1}{u^2}[g(x_0 + u) - g(x_0)]$.
Applying the mean value theorem to $g$, there exists $\xi \in (x_0, x_0 + u)$ such that
$A(u) = \dfrac{1}{u}g'(\xi) = \dfrac{1}{u}[f_1(\xi, y_0 + u) - f_1(\xi, y_0)]$
(see Figure 3.4). Of course the number $\xi$ depends on $u$. Let
$h(y) = f_1(\xi, y)$
for every $y$ such that $(\xi, y) \in D$. The domain of $h$ is open and contains
$[y_0, y_0 + u]$. Moreover, $h'(y) = f_{12}(\xi, y)$, $h$ is of class $C^{(1)}$ since $f_{12}$ is
continuous, and
$A(u) = \dfrac{1}{u}[h(y_0 + u) - h(y_0)]$.


Figure 3.4 (labels: the square with corners $(x_0, y_0)$, $(x_0 + u, y_0)$, $(x_0, y_0 + u)$, $(x_0 + u, y_0 + u)$ and an interior point $(\xi, \eta)$)

Another application of the mean value theorem gives

$A(u) = f_{12}(\xi, \eta)$,
for some $\eta \in (y_0, y_0 + u)$, depending on $u$.
By reversing the roles of the first and second variables and repeating the
proof, we find that
$A(u) = f_{21}(\xi^*, \eta^*)$
for some
$\xi^* \in (x_0, x_0 + u)$ and $\eta^* \in (y_0, y_0 + u)$.
Since $f_{12}$ and $f_{21}$ are continuous, given $\epsilon > 0$ there exists $\delta \in (0, \delta_0)$ such
that if $0 < u < \delta/\sqrt{2}$,
$|f_{12}(\xi, \eta) - f_{12}(x_0, y_0)| < \epsilon$, $|f_{21}(\xi^*, \eta^*) - f_{21}(x_0, y_0)| < \epsilon$.

This shows that

$\lim_{u \to 0^+} A(u) = f_{12}(x_0, y_0) = f_{21}(x_0, y_0)$,
and proves the theorem if $n = 2$.
If $n > 2$ we need consider only the case $i < j$. Given $x_0 = (x_0^1, \dots, x_0^n) \in D$,
let
$\phi(x, y) = f(x_0^1, \dots, x_0^{i-1}, x, x_0^{i+1}, \dots, x_0^{j-1}, y, x_0^{j+1}, \dots, x_0^n)$
for every $(x, y)$ in some open set containing $(x_0^i, x_0^j)$. Applying the theorem to
$\phi$, we find that
$f_{ij}(x_0) = \phi_{12}(x_0^i, x_0^j) = \phi_{21}(x_0^i, x_0^j) = f_{ji}(x_0)$. □
From Theorem 3.3 it follows that in calculating any $q$th-order partial
derivative of a function of class $C^{(q)}$ it is only the number of partial differentiations
with respect to each of the variables which matters, and not the order
in which they are taken. Thus for functions of class $C^{(3)}$, $f_{123} = f_{132}$, $f_{112} =
f_{121}$, and so on.
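The second difference quotient $A(u)$ used in the proof of Theorem 3.3 is easy to evaluate for a concrete function. The sketch below applies it to $f(x, y) = x^2 y^3$, the example used above for second-order partial derivatives; the base point $(1, 1)$ and the values of `u` are assumptions for illustration.

```python
def f(x, y):
    # f(x, y) = x^2 y^3, for which f_12 = f_21 = 6 x y^2.
    return x ** 2 * y ** 3


def second_difference_quotient(x0, y0, u):
    """A(u) from the proof of Theorem 3.3."""
    return (f(x0 + u, y0 + u) - f(x0, y0 + u)
            - f(x0 + u, y0) + f(x0, y0)) / (u * u)


x0, y0 = 1.0, 1.0
mixed = 6.0 * x0 * y0 ** 2        # f_12 = f_21 = 6 x y^2
for u in (1e-1, 1e-2, 1e-3):
    print(u, second_difference_quotient(x0, y0, u), mixed)
```

As `u` decreases, the quotient approaches the common value $f_{12}(1, 1) = f_{21}(1, 1) = 6$.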


We are now going to prove a stronger version of the mean-value theorem,
which is valid for functions of class $C^{(q)}$. Let $f$ be of class $C^{(q)}$ and $x, x_0 \in D$
such that the line segment joining $x$ and $x_0$ is contained in $D$. In particular, if
$D$ is convex then $x$ and $x_0$ can be any pair of points of $D$. Let $h = x - x_0$,
and define $\phi$ as in Proposition 3.2 by $\phi(t) = f(x_0 + th)$. The domain of $\phi$ is
$\{t : x_0 + th \in D\}$, which is an open subset of $E^1$ containing the closed interval
$[0, 1]$. By repeated application of Proposition 3.2 to $f, f_i, f_{ij}, \dots$, we find that:

$\phi'(t) = \sum_{i=1}^n f_i(x_0 + th)\,h^i$,

$\phi^{(q)}(t) = \sum_{i_1, \dots, i_q = 1}^n f_{i_1 \cdots i_q}(x_0 + th)\,h^{i_1} \cdots h^{i_q}$.

The sum in the last formula has $n^q$ terms. It is taken over all integers $i_1$,
$i_2, \dots, i_q$, such that $1 \le i_1, \dots, i_q \le n$.
By Taylor's formula for functions of one variable (Section A.2) there
exists $s \in (0, 1)$ such that

$\phi(1) = \phi(0) + \phi'(0) + \dfrac{1}{2!}\phi''(0) + \cdots + \dfrac{1}{(q-1)!}\phi^{(q-1)}(0) + \dfrac{1}{q!}\phi^{(q)}(s)$.

But $\phi(1) = f(x)$, $\phi(0) = f(x_0)$, and we have, by substitution, Taylor's
formula with remainder:

(3.13) $f(x) = f(x_0) + \sum_{i=1}^n f_i(x_0)(x^i - x_0^i) + \dfrac{1}{2!}\sum_{i,j=1}^n f_{ij}(x_0)(x^i - x_0^i)(x^j - x_0^j)$
$+ \cdots + \dfrac{1}{(q-1)!}\sum_{i_1, \dots, i_{q-1} = 1}^n f_{i_1 \cdots i_{q-1}}(x_0)\,h^{i_1} \cdots h^{i_{q-1}} + R_q(x)$,

where $h^i = x^i - x_0^i$, $s \in (0, 1)$, and

(3.14) $R_q(x) = \dfrac{1}{q!}\sum_{i_1, \dots, i_q = 1}^n f_{i_1 \cdots i_q}(x_0 + sh)\,h^{i_1} \cdots h^{i_q}$.

Notice that the first terms on the right-hand side are just the first degree
approximation $f(x_0) + df(x_0) \cdot h$ to $f(x)$ considered in Section 3.3.
If we ignore the remainder $R_q(x)$, the right-hand side is a polynomial in
$h^1, \dots, h^n$ of degree $q - 1$. If the remainder is small this polynomial furnishes
an approximation to $f(x)$. One can give an explicit estimate for the error, in
terms of bounds for the $q$th-order partial derivatives of $f$. Suppose that $K$ is
a convex subset of $D$, such that $x_0 \in K$ and $x = x_0 + h \in K$. Then $x_0 +
sh \in K$ for $0 \le s \le 1$. Moreover, suppose that all $q$th-order partial derivatives
satisfy

$|f_{i_1 \cdots i_q}(x)| \le C$ for all $x \in K$.


For any real numbers $y^1, \ldots, y^n$,

$$(y^1 + \cdots + y^n)^q = \sum_{i_1, \ldots, i_q = 1}^{n} y^{i_1} \cdots y^{i_q}.$$

By Problem 4, Section 1.2, $\sum_{i=1}^{n} |h^i| \le n^{1/2}|h|$. Therefore, the remainder in Taylor's formula satisfies the estimate

$$|R_q(x)| \le \frac{C n^{q/2}}{q!}\,|h|^q, \qquad h = x - x_0, \tag{3.15}$$

whenever $x_0, x \in K$.

EXAMPLE 5. Let $f(x, y) = (1 + y^2)^{1/2} \cos x$, $x_0 = (0, 1)$, and $q = 2$. Then

$$f_1 = -(1 + y^2)^{1/2} \sin x, \qquad f_2 = y(1 + y^2)^{-1/2} \cos x,$$
$$f_{11} = -(1 + y^2)^{1/2} \cos x, \qquad f_{12} = -y(1 + y^2)^{-1/2} \sin x,$$
$$f_{22} = (1 + y^2)^{-3/2} \cos x.$$

Here we have written $f_1$ for short in place of $f_1(x, y)$, and so on. Taylor's formula (3.13) becomes, since $f(0, 1) = 2^{1/2}$, $f_1(0, 1) = 0$, and $f_2(0, 1) = 2^{-1/2}$,

$$f(x, y) = 2^{1/2} + 2^{-1/2}(y - 1) + R_2(x, y).$$

If the remainder $R_2(x, y)$ is ignored, one gets a first-degree polynomial, which approximates $f(x, y)$ for $(x, y)$ near $(0, 1)$. The second-order partial derivatives obey the estimates

$$|f_{11}| \le (1 + y^2)^{1/2}, \qquad |f_{12}| \le |y|, \qquad |f_{22}| \le 1.$$

To apply (3.15), let us choose any $a > 1$ and

$$K = \{(x, y) : |y| \le a\}, \qquad C = (1 + a^2)^{1/2}.$$

Whenever $|y| \le a$, we have

$$|R_2(x, y)| \le (1 + a^2)^{1/2}\,(x^2 + (y - 1)^2).$$
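
The estimate in Example 5 can be spot-checked numerically. The following Python sketch is only an illustration and not part of the text: the helper names `f`, `first_degree` and `bound`, the choice a = 2, and the sample points are assumptions made for this check. It compares the actual error of the first-degree polynomial with the bound $(1 + a^2)^{1/2}(x^2 + (y - 1)^2)$ at a few points of $K$.

```python
import math

def f(x, y):
    # The function of Example 5.
    return math.sqrt(1.0 + y * y) * math.cos(x)

def first_degree(x, y):
    # First-degree Taylor polynomial about (0, 1).
    return math.sqrt(2.0) + (y - 1.0) / math.sqrt(2.0)

def bound(x, y, a):
    # Right-hand side of the remainder estimate, valid when |y| <= a.
    return math.sqrt(1.0 + a * a) * (x * x + (y - 1.0) ** 2)

A = 2.0  # assumed value of a; any a > 1 would do
SAMPLES = [(0.1, 1.1), (-0.3, 0.8), (0.5, 1.5), (1.0, -2.0), (2.0, 0.0)]

def check_samples():
    # Pairs (actual error, bound) for each sample point with |y| <= A.
    return [(abs(f(x, y) - first_degree(x, y)), bound(x, y, A)) for x, y in SAMPLES]

for error, limit in check_samples():
    print(f"error = {error:.6f}   bound = {limit:.6f}")
```

The bound is far from sharp away from $(0, 1)$, which is expected: the constant $C$ is a worst case over all of $K$.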
If $f$ is a polynomial of degree $q - 1$, then the Taylor approximation is exact; in other words, $R_q(x) = 0$ for every $x$.
EXAMPLE 6. Let $f(x, y) = x^2 y$ and $(x_0, y_0) = (1, -1)$. Then

$$f_1 = 2xy, \quad f_2 = x^2, \quad f_{11} = 2y, \quad f_{12} = f_{21} = 2x, \quad f_{112} = f_{121} = f_{211} = 2,$$

and all other partial derivatives are 0. $R_4(x, y) = 0$, and Taylor's formula becomes

$$f(x, y) = f(1, -1) + f_1 \cdot (x - 1) + f_2 \cdot (y + 1) + \frac{1}{2!}\left[f_{11}(x - 1)^2 + 2 f_{12}(x - 1)(y + 1)\right] + \frac{1}{3!}\left[3 f_{112}(x - 1)^2 (y + 1)\right],$$

where the partial derivatives on the right-hand side are evaluated at $(1, -1)$. Thus

$$x^2 y = -1 - 2(x - 1) + (y + 1) - (x - 1)^2 + 2(x - 1)(y + 1) + (x - 1)^2 (y + 1).$$
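
Since the expansion in Example 6 is claimed to be exact, it can be verified with exact rational arithmetic. The sketch below is an illustrative check, not part of the text; the function names and the grid of test points are assumptions.

```python
from fractions import Fraction

def taylor_expansion(x, y):
    # Right-hand side of the expansion of x^2 y about (1, -1).
    u, v = x - 1, y + 1
    return -1 - 2 * u + v - u * u + 2 * u * v + u * u * v

def expansion_is_exact(points):
    # Compare with x^2 y at every point, using exact arithmetic.
    return all(x * x * y == taylor_expansion(x, y) for x, y in points)

# Assumed test grid of rational points around (1, -1).
GRID = [(Fraction(i, 3), Fraction(j, 2)) for i in range(-6, 7) for j in range(-6, 7)]

print(expansion_is_exact(GRID))
```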

Functions of class $C^{(q)}$ on a set

In many instances either $f$ is not of class $C^{(q)}$ on its entire domain $D$, or else one is interested only in its values on some subset of $D$.

Definition. Let $A$ be a nonempty subset of the domain of $f$. Then $f$ is of class $C^{(q)}$ on $A$ if there exists an open set $D_1$ containing $A$ and a function $F$ of class $C^{(q)}$ with domain $D_1$ such that $F(x) = f(x)$ for every $x \in A$.

If $A$ is open, then we may take $D_1 = A$. In that case $F = f|A$, where $f|A$ is the restriction of $f$ to $A$. When $A$ is an open subset of $D$, $f$ is of class $C^{(q)}$ on $A$ if and only if $f|A$ is a function of class $C^{(q)}$.

The function $F$ in the definition is called an extension of class $C^{(q)}$ of $f|A$. It is generally not easy to determine whether there is such an extension $F$. However, if $A$ has some simple geometrical shape, there is sometimes a method for explicitly constructing extensions.

EXAMPLE 7. In Example 3, let $p > 0$ be an integer. Let $A = [0, \infty)$, $D_1 = E^1$, $F(x) = x^p$ for all $x$. Then $f$ is of class $C^{(q)}$ on $A$ for any $q$, and $F$ is an extension of class $C^{(q)}$ of $f|A$.

*Whitney's extension theorems


Some general theorems about extensions of class $C^{(q)}$ were proved by Whitney. Let us cite without proof a result that is a special case of a theorem of Whitney in 1934 (Ann. Math. 35: 485). Let $A$ be the closure of an open set $B$, and assume either that $A$ is convex or that its boundary fr $A$ is an $(n - 1)$-manifold of class $C^{(1)}$ (see Section 4.7). Let $f$ be of class $C^{(q)}$ on $B$, and continuous on $A$. Moreover, assume that for each $i_1, \ldots, i_q$ there is a function $F_{i_1 \cdots i_q}$ continuous on $A$ such that $F_{i_1 \cdots i_q}(x)$ equals the qth-order partial derivative $f_{i_1 \cdots i_q}(x)$ for every $x \in B$. Then there exists a function $F$ of class $C^{(q)}$ on $E^n$ such that $F(x) = f(x)$ for every $x \in A$. Hence $f$ is of class $C^{(q)}$ on $A$.
Actually, what one needs to assume about $A$ to apply Whitney's theorem is the following: Every $x_0 \in A$ has a neighborhood $U$ such that any pair of points $x, y \in U \cap B$ can be joined in $B$ by a polygon of length no more than $c\,|x - y|$, where $c \ge 1$ depends only on $U$. If $A$ is convex, then the line segment joining $x$ and $y$ lies in $B$, and one may take $c = 1$.
For other extension theorems of Whitney, see Trans. Amer. Math. Soc. 36 (1934) and Bull. Amer. Math. Soc. 50 (1944).

*Functions of class $C^{(\infty)}$; real analytic functions

Let us say that $f$ is of class $C^{(\infty)}$ if $f$ is of class $C^{(q)}$ for every $q$. If $f$ is of class $C^{(\infty)}$ and $\lim_{q \to \infty} R_q(x) = 0$, then in place of Taylor's formula with remainder we may put the corresponding infinite series. This infinite series is called the Taylor series for $f(x)$ at $x_0$.
If $K$ is a convex subset of $D$ and $x_0 \in K$, then the following is a sufficient condition that $f(x)$ be the sum of its Taylor series for every $x \in K$. Suppose that there is a positive number $M$ whose qth power bounds every qth-order partial derivative of $f$, namely,

$$|f_{i_1 \cdots i_q}(x)| \le M^q \tag{3.16}$$

for every $x \in K$, $q = 1, 2, \ldots$, and $1 \le i_1, \ldots, i_q \le n$. Then (3.15) holds, with $C = M^q$:

$$|R_q(x)| \le \frac{M^q n^{q/2}}{q!}\,|h|^q = \frac{B^q}{q!},$$

where $B = M n^{1/2} |h|$. Since $B^q / q! \to 0$ as $q \to \infty$,

$$\lim_{q \to \infty} R_q(x) = 0$$

for every $x \in K$, provided inequalities (3.16) hold.
A function $f$ is called analytic if every $x_0 \in D$ has a neighborhood $U_{x_0}$ such that the Taylor series at $x_0$ converges to $f(x)$ for every $x \in U_{x_0}$. We have just proved the following:

Let $f$ be of class $C^{(\infty)}$, and suppose that every $x_0 \in D$ has a neighborhood $U_{x_0}$ in which an estimate (3.16) holds. Then $f$ is analytic.

The positive number $M$ in (3.16) may depend on $x_0$ and on the radius of $U_{x_0}$. It would lead us away from our main objectives to discuss analytic functions in any detail. Therefore, let us issue just one word of caution, namely, not every function of class $C^{(\infty)}$ is analytic. As an example, let $D = E^1$ and let

$$f(x) = \begin{cases} \exp\left(-\dfrac{1}{x^2}\right) & \text{if } x > 0, \\ 0 & \text{if } x \le 0. \end{cases}$$


Let us show that this function is of class $C^{(\infty)}$ and that $f^{(q)}(0) = 0$ for every $q = 1, 2, \ldots$. For $x \ne 0$ the derivatives $f^{(q)}(x)$ can be computed by elementary calculus, and each $f^{(q)}$ is continuous on $E^1 - \{0\}$. It is at the point 0 where $f$ must be examined. Now

$$\lim_{u \to +\infty} u^k \exp(-u) = 0 \quad \text{for each } k = 0, 1, 2, \ldots, \tag{3.17}$$

a fact that we prove immediately below. If $x < 0$, then $f(x) = f'(x) = f''(x) = \cdots = 0$. Using (3.17) with $k = 0$, $\exp(-1/x^2) \to 0$ as $x \to 0^+$. Since $f(0) = 0$, $f$ is continuous. If $x > 0$,

$$f'(x) = \frac{2}{x^3} \exp\left(-\frac{1}{x^2}\right) = 2x \cdot \frac{1}{x^4} \exp\left(-\frac{1}{x^2}\right).$$

Using (3.17) with $k = 2$, $f'(x) \to 0$ as $x \to 0^+$. Therefore $\lim_{x \to 0} f'(x) = 0$. By Problem 4, $f'(0) = 0$ and $f$ is of class $C^{(1)}$. For each $q = 2, 3, \ldots$, $f^{(q)}(x)$ is a polynomial in $1/x$ times $\exp(-1/x^2)$ for $x > 0$. Hence $\lim_{x \to 0} f^{(q)}(x) = 0$. By Problem 4 and induction on $q$, $f^{(q)}(0) = 0$ and $f \in C^{(q)}$ for every $q$. Thus $f \in C^{(\infty)}$. If we expand $f$ by Taylor's formula about 0, then $f(x) = R_q(x)$ for every $x$. If $x > 0$ the remainder $R_q(x)$ does not tend to 0 as $q \to \infty$. Hence $f$ is not an analytic function.
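
A short numerical experiment makes the flatness of this function at 0 tangible. The Python sketch below is an illustration only; the helper names `f` and `ratio` and the sample values are assumptions. It shows that $f(x)/x^k$ is already extremely small for modest $x$, even for large $k$, while $f(x)$ itself is strictly positive for $x > 0$, so that the Taylor polynomials about 0 (all identically zero) never approximate it.

```python
import math

def f(x):
    # exp(-1/x^2) for x > 0, and 0 for x <= 0.
    return math.exp(-1.0 / (x * x)) if x > 0 else 0.0

def ratio(x, k):
    # f(x) / x^k for x > 0; by (3.17) with u = 1/x^2 this tends to 0 as x -> 0+.
    return f(x) / x ** k

for k in (0, 2, 10):
    print(k, ratio(0.1, k))
print("f(0.5) =", f(0.5))
```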
PROOF OF (3.17). For each $u > 0$ let $\psi(u) = u^{-k} \exp u$. Then

$$\psi'(u) = (u - k)\,u^{-k-1} \exp u,$$
$$\psi''(u) = [u^2 - 2ku + k(k + 1)]\,u^{-k-2} \exp u.$$

The expression in brackets has a minimum when $u = k$ and is positive there. Hence $\psi''(u) > 0$ for all $u > 0$. Let us apply Taylor's formula to $\psi$, with $q = 2$:

$$\psi(u) = \psi(u_0) + \psi'(u_0)(u - u_0) + \tfrac{1}{2}\psi''(v)(u - u_0)^2,$$

with $v$ between $u$ and $u_0$. Since $\psi''(v) > 0$,

$$\psi(u) \ge \psi(u_0) + \psi'(u_0)(u - u_0).$$

If $u_0 > k$, then $\psi'(u_0) > 0$ and the right-hand side tends to $+\infty$ as $u \to +\infty$. Hence $\psi(u) \to +\infty$ and $1/\psi(u) \to 0$ as $u \to +\infty$. □

PROBLEMS
1. Expand $f(x, y, z) = xyz$ by Taylor's formula about $x_0 = (1, -1, 0)$, with $q = 4$.
2. Let $f(x, y) = x^{-1} \cos y$, $x > 0$, and $x_0 = (1, 0)$.
(a) Expand $f(x, y)$ by Taylor's formula about $x_0$, with $q = 2$, and find an estimate for $|R_2(x, y)|$.
(b) Show that $R_q(x, y) \to 0$ as $q \to \infty$ for $(x, y)$ in some open set containing $x_0$.

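3. Let $f(x, y) = \psi(ax + by)$, where $a$ and $b$ are scalars and $\psi$ is of class $C^{(q)}$ in some open set containing 0. Show that Taylor's formula about $(0, 0)$ becomes

$$f(x, y) = \sum_{m=0}^{q-1} \frac{\psi^{(m)}(0)}{m!} \sum_{j=0}^{m} \binom{m}{j} a^j b^{m-j} x^j y^{m-j} + R_q(x, y),$$

where $\binom{m}{j}$ is the binomial coefficient (which equals the number of j-element subsets of a set with m elements).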
4. Let $f$ be continuous on an open set $D$ and of class $C^{(1)}$ on $D - \{x_0\}$. Suppose, moreover, that $l_i = \lim_{x \to x_0} f_i(x)$ exists for each $i = 1, \ldots, n$. Prove that $l_i = f_i(x_0)$, and consequently that $f$ is of class $C^{(1)}$ on $D$. State and prove a corresponding result in case $q > 1$. [Hint: Apply the mean value theorem to $f(x_0 + te_i) - f(x_0)$.]
5. Prove the statement made in Example 3. [Hint: Problem 4 with $x_0 = 0$.]
6. Let $f(x) = x^k \sin(1/x)$ if $x \ne 0$, and $f(0) = 0$. Show that:
(a) If $k = 0$, then $f$ is discontinuous at 0.
(b) If $k = 1$, then $f$ is of class $C^{(0)}$ but not differentiable at 0.
(c) If $k = 2$, then $f$ is differentiable but not of class $C^{(1)}$.
(d) What can you say for $k \ge 3$?
7. Let $f(x, y) = xy(x^2 - y^2)/(x^2 + y^2)$, if $(x, y) \ne (0, 0)$, and $f(0, 0) = 0$.
(a) If $(x, y) \ne (0, 0)$, find $f_{12}(x, y)$ and $f_{21}(x, y)$ by elementary calculus, and verify that they are equal.
(b) Using Problem 4 show that $f_1(0, 0) = f_2(0, 0) = 0$ and $f$ is of class $C^{(1)}$.
(c) Using the definition of partial derivative, show that $f_{12}(0, 0)$ and $f_{21}(0, 0)$ exist but are not equal. Why does this not contradict Theorem 3.3?
8. Given $n$ and $q$, how many solutions of the equation $i_1 + \cdots + i_n = q$ are there with $i_1, \ldots, i_n$ nonnegative integers? With $i_1, \ldots, i_n$ positive integers? What does this say about the number of different qth-order partial derivatives of a function of class $C^{(q)}$?

3.5 Relative extrema


Let $A$ be some subset of $E^n$ and $f$ a function whose domain contains $A$. Let us consider the problem of minimizing or maximizing $f(x)$ on $A$.

Definitions. If $x_0$ is a point of $A$ such that $f(x_0) \le f(x)$ for every $x \in A$, then $f$ has an absolute minimum at $x_0$. The number

$$f(x_0) = \min\{f(x) : x \in A\}$$

is the minimum value of $f$ on $A$. (Of course, there need not be any such point $x_0$. However, if $A$ is a compact set, then by Theorem 2.5, any continuous function has an absolute minimum at some point of $A$.)
If $f(x_0) < f(x)$ for every $x \in A$ except $x_0$, then $f$ has a strict absolute minimum at $x_0$.

We say that $f$ has a relative minimum at $x_0$ if there is a neighborhood $U$ of $x_0$ such that $f(x_0) \le f(x)$ for every $x \in A \cap U$. If $U$ can be so chosen that $f(x_0) < f(x)$ for every $x \in A \cap U$ except $x_0$, then $f$ has a strict relative minimum at $x_0$.

The notions of absolute maximum and relative maximum are defined similarly by reversing the inequality signs. We say extremum for either maximum or minimum.
In some cases the extrema can be found by inspection. For example, if $A = E^n$ and $f(x) = |x|$, then $f(0) = 0$ and $f(x) > 0$ for every $x \ne 0$. Hence $f$ has a strict absolute minimum at 0. Since this function is not differentiable at 0, the minimum could not have been found through the use of calculus.
If $f$ and $A$ are smooth enough, the relative extrema can be found by using calculus. In the present section we assume that $A$ is an open set. In Section 4.8 we learn a technique for finding the extrema when $A$ is a smooth submanifold of $E^n$.

Definition. A point $x_0$ is a critical point of $f$ if $df(x_0) = 0$.

If $f$ is a differentiable function, one need look only among the critical points for relative extrema.

Proposition 3.3. If $f$ has a relative extremum at $x_0$ and $f$ is differentiable at $x_0$, then $x_0$ is a critical point of $f$.
PROOF. Given a direction $v$, let $\phi(t) = f(x_0 + tv)$ for every $t$ in some open subset of $E^1$ containing 0. Then $\phi$ has a relative extremum at 0, and consequently by elementary calculus $\phi'(0) = 0$. But $\phi'(0) = df(x_0) \cdot v$ is the derivative at $x_0$ in the direction $v$. Hence $df(x_0) \cdot v = 0$ for every $v$, which implies that $df(x_0) = 0$. □

It is illuminating to look at this result in a slightly different way. In place of the covector $df(x)$ let us consider the vector grad $f(x)$ with the same components (Section 3.3). Suppose that $x$ is not a critical point. Then grad $f(x) \ne 0$. Let us find the direction $v$ for which the directional derivative at $x$ is maximum.
By Cauchy's inequality,

$$\operatorname{grad} f(x) \cdot v \le |\operatorname{grad} f(x)|\,|v|;$$

and equality holds if and only if $v = v(x)$, where

$$v(x) = \frac{1}{|\operatorname{grad} f(x)|} \operatorname{grad} f(x).$$

This direction is called the direction of the gradient at $x$, and is the one which maximizes the directional derivative. The maximum value of the directional derivative is

$$\operatorname{grad} f(x) \cdot v(x) = \frac{1}{|\operatorname{grad} f(x)|} \operatorname{grad} f(x) \cdot \operatorname{grad} f(x) = |\operatorname{grad} f(x)|.$$

By going a short distance from $x$ in the direction $v(x)$, $f(x)$ is increased. Hence $f$ cannot have a relative maximum at $x$. The direction $-v(x)$ minimizes the directional derivative at $x$. In the same way, $f$ cannot have a relative minimum at $x$. This confirms the conclusion of Proposition 3.3.
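
The maximizing property of the gradient direction can be checked by brute force in the plane. The following Python sketch is an illustration under stated assumptions: the gradient vector $(3, 4)$, the function names, and the angular resolution are all chosen for the example. It samples unit vectors $v$ on the circle and records the largest value of grad $f \cdot v$.

```python
import math

def directional_derivative(grad, v):
    # grad f(x) . v for vectors given as tuples.
    return sum(g * c for g, c in zip(grad, v))

def best_direction_on_circle(grad, steps=3600):
    # Search unit vectors (cos a, sin a) for the largest directional derivative.
    best = None
    for k in range(steps):
        angle = 2.0 * math.pi * k / steps
        v = (math.cos(angle), math.sin(angle))
        value = directional_derivative(grad, v)
        if best is None or value > best[0]:
            best = (value, v)
    return best

value, v = best_direction_on_circle((3.0, 4.0))
print(value, v)  # compare with |grad| = 5 and grad/|grad| = (0.6, 0.8)
```

No sampled direction exceeds $|\operatorname{grad} f| = 5$, and the best one found lies close to $(0.6, 0.8)$, as Cauchy's inequality predicts.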
This discussion is the basis for the gradient method (or method of steepest ascent) for finding maxima. A good intuitive picture of the gradient method may be obtained by thinking of an ambitious mountain climber who always takes the steepest direction. Let us suppose that the surface of the mountain can be represented in the form $\{(x, y, f(x, y)) : (x, y) \in A\}$, where $f$ is a smooth function. In particular, no vertical cliffs, overhangs, or sharp ridges are allowed. If the mountain has the shape indicated in Figure 3.5(a), then it appears that the summit will be reached by this technique. However, if the mountain has a more complicated shape, the climber may reach a false summit or a saddle as in Figure 3.5(b). Once he reaches any critical point, the gradient method tells him to stay there. The gradient method will be defined more precisely later (Section 6.5).
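
As a rough illustration of the climber's rule, here is a minimal Python sketch of a discrete steepest-ascent iteration. It is not the precise method of Section 6.5: the fixed step size, the stopping tolerance, the function names, and the test function $f(x, y) = x - x^2 - y^2$ (taken from Problem 1(a) of this section) are all assumptions made for the example.

```python
def grad_f(x, y):
    # Gradient of the assumed test function f(x, y) = x - x**2 - y**2.
    return (1.0 - 2.0 * x, -2.0 * y)

def steepest_ascent(start, step=0.1, tol=1e-10, max_iter=10_000):
    # Repeatedly move a short distance along the gradient; stop near a critical point.
    x, y = start
    for _ in range(max_iter):
        gx, gy = grad_f(x, y)
        if gx * gx + gy * gy < tol * tol:
            break
        x, y = x + step * gx, y + step * gy
    return x, y

print(steepest_ascent((2.0, -3.0)))  # approaches the critical point (1/2, 0)
```

For this particular function the iteration contracts toward the single critical point. On a surface like the one in Figure 3.5(b), the same loop would stop at whichever critical point it happens to reach, which is the behavior the text warns about.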

Figure 3.5. Panels (a) and (b).

There is a second derivative test for relative extrema of functions of one variable. If $f'(x_0) = 0$ and $f''(x_0) \ne 0$, then the sign of $f''(x_0)$ determines whether the critical point $x_0$ gives a relative minimum or relative maximum. If $f''(x_0) = 0$, then $x_0$ may give a relative maximum, a relative minimum, or neither.
Let us now state and prove corresponding results for functions of several variables. Let $f$ be a function of class $C^{(2)}$ on $A$, and let $Q$ be the function with domain $A \times E^n$ defined by the formula

$$Q(x, h) = \sum_{i,j=1}^{n} f_{ij}(x)\, h^i h^j. \tag{3.18}$$

For fixed $x$, (3.18) defines a function on $E^n$, which we denote by $Q(x,\ )$. Similarly, for fixed $h$, $Q(\ , h)$ denotes the function on $A$ whose value at $x$ is $Q(x, h)$. The function $Q(x,\ )$ is a quadratic polynomial which in linear algebra is called the quadratic form corresponding to the $n \times n$ symmetric matrix $(f_{ij}(x))$ of second partial derivatives. Theorem 3.3 guarantees that this matrix is symmetric.
Let us write $Q(x,\ ) \ge 0$ if $Q(x, h) \ge 0$ for every $h$, and $Q(x,\ ) > 0$ if $Q(x, h) > 0$ for every $h \ne 0$. Note that $Q(x, 0) = 0$. In the theory of quadratic forms $Q(x,\ )$ is called positive semidefinite if $Q(x,\ ) \ge 0$, and positive definite if $Q(x,\ ) > 0$.
Similarly, we write $Q(x,\ ) \le 0$ if $Q(x, h) \le 0$ for every $h$, and $Q(x,\ ) < 0$ if $Q(x, h) < 0$ for every $h \ne 0$. The corresponding terms are negative semidefinite and negative definite.
If $Q(x,\ )$ has values of both signs, then it is indefinite.
In the discussion to follow and in the proof of Theorem 3.4 below we use the following notation. $U$ denotes a neighborhood of a critical point $x_0$, of radius sufficiently small that $U \subset A$. Points of $U$ are denoted by $x$. Let $h = x - x_0$. Points on the line segment joining $x_0$ and $x$ have the form $x_0 + sh$, $0 \le s \le 1$, with $s = 0$ corresponding to $x_0$ and $s = 1$ corresponding to $x$. Since $U$ is a convex set, we can apply Taylor's formula (3.13) with $q = 2$. By (3.14), the remainder in Taylor's formula when $q = 2$ is $R_2(x) = \tfrac{1}{2} Q(x_0 + sh, h)$. By (3.13),

$$f(x) = f(x_0) + df(x_0) \cdot h + \tfrac{1}{2} Q(x_0 + sh, h).$$

Since $x_0$ is a critical point, $df(x_0) = 0$. Taylor's formula becomes

$$f(x) = f(x_0) + \tfrac{1}{2} Q(x_0 + sh, h). \tag{3.19}$$

To say that $f$ has a relative minimum at $x_0$ is equivalent to the statement $Q(x_0 + sh, h) \ge 0$ for all $x \in U$, where $U$ is some neighborhood of $x_0$ contained in $A$. This condition on $Q$ is difficult to apply directly. It is more convenient to use the following theorem, which depends on the sign of $Q$ at the critical point $x_0$.

Theorem 3.4. Let $f$ be of class $C^{(2)}$ on an open set $A$, and $x_0 \in A$ a critical point. Then:
(a) $Q(x_0,\ ) \ge 0$ is necessary for a relative minimum at $x_0$.
(a') $Q(x_0,\ ) > 0$ is sufficient for a strict relative minimum at $x_0$.
(b) $Q(x_0,\ ) \le 0$ is necessary for a relative maximum at $x_0$.
(b') $Q(x_0,\ ) < 0$ is sufficient for a strict relative maximum at $x_0$.

PROOF. Let $f$ have a relative minimum at $x_0$. Then there exists a neighborhood $U$ of $x_0$ such that

$$f(x) \ge f(x_0) \quad \text{for every } x \in U \cap A.$$

Since $A$ is open, we may assume that $U \subset A$. Since $df(x_0) = 0$, Formula (3.19) implies that $Q(x_0 + sh, h) \ge 0$ for all $x \in U$, where $h = x - x_0$. Since $f$ is of class $C^{(2)}$, the second-order partial derivatives $f_{ij}$ are continuous. Hence $Q(\ , h_0)$ is continuous on $A$, for fixed $h_0$. Suppose that $Q(x_0, h_0) < 0$ for some $h_0 \in E^n$. Then there exists a neighborhood $U_1$ of $x_0$, $U_1 \subset U$, such that $Q(y, h_0) < 0$ for all $y \in U_1$. Take $x = x_0 + ch_0$ where $c \ne 0$ and $|c|$ is small enough that $x \in U_1$, and $y = x_0 + sh$. Then $h = ch_0$; and since $Q(x_0 + sh,\ )$ is quadratic,

$$Q(x_0 + sh, h) = c^2 Q(x_0 + sh, h_0) < 0.$$

But $Q(x_0 + sh, h) \ge 0$, a contradiction. This proves (a).
To prove (a'), suppose that $Q(x_0,\ ) > 0$. Using Problem 8 and the fact that the functions $f_{ij}$ are continuous at $x_0$, there exists a neighborhood $U$ of $x_0$ such that $U \subset A$ and $Q(y,\ ) > 0$ for every $y \in U$. Taking $y = x_0 + sh$, we find from (3.19) that $f(x) > f(x_0)$ for every $x \in U$, $x \ne x_0$. This proves that $f$ has a strict relative minimum at $x_0$. Statements (b), (b') follow respectively from (a), (a') by considering $-f$. □

Let $d_n(x) = \det(f_{ij}(x))$ denote the determinant of the matrix of second-order partial derivatives. This determinant is called the Hessian.

Definition. A critical point $x$ is nondegenerate if the Hessian determinant $d_n(x)$ is not 0.

The behavior of $f$ near a nondegenerate critical point $x$ is determined by that of the quadratic function $Q(x,\ )$. Let us state three criteria according to which one can test whether a nondegenerate critical point gives a relative extremum. We begin with a criterion that is special to two dimensions.

Criterion I
In this case, with $n = 2$,

$$Q(x, y, h, k) = f_{11} h^2 + 2 f_{12} hk + f_{22} k^2,$$

where we have written $(h, k)$ for $(h^1, h^2)$ and $f_{ij}$ for $f_{ij}(x, y)$.
Since $(x, y)$ is a nondegenerate critical point, $d_2(x, y) = f_{11} f_{22} - f_{12}^2 \ne 0$. Consider first the case when $f_{11} f_{22} - f_{12}^2 > 0$. Then the quadratic equation $Q(x, y, h, k) = 0$ has no roots $(h, k)$ except $(0, 0)$. Therefore, $Q(x, y, h, k)$ has the same sign for all $(h, k) \ne (0, 0)$. We have $Q(x, y, h, 0) = f_{11} h^2$, $Q(x, y, 0, k) = f_{22} k^2$. The sign of $f_{11}$ and $f_{22}$ determines whether $Q(x, y,\ ,\ ) > 0$ or $Q(x, y,\ ,\ ) < 0$.
By Theorem 3.4 (a'), (b') we find that a critical point $(x, y)$ is a point of
relative minimum if $f_{11} > 0$, $f_{22} > 0$, $f_{11} f_{22} - f_{12}^2 > 0$,
relative maximum if $f_{11} < 0$, $f_{22} < 0$, $f_{11} f_{22} - f_{12}^2 > 0$.
If $f_{11} f_{22} - f_{12}^2 < 0$, then $\{(h, k) : Q(x, y, h, k) = 0\}$ consists of two lines intersecting at $(0, 0)$. They divide the $(h, k)$-plane into four parts, on two of which $Q(x, y, h, k) > 0$ and on the other two of which $Q(x, y, h, k) < 0$. In this case, $Q(x, y,\ ,\ )$ is indefinite. A critical point where $f_{11} f_{22} - f_{12}^2 < 0$ is called a saddle point. The function illustrated by Figure 3.5(b) has one point of absolute maximum, one of relative maximum, and one saddle point.

EXAMPLE. Let $f(x, y) = 2y^2 - x(x - 1)^2$ for every $(x, y) \in E^2$ and $A = E^2$. This function has two critical points, $(\tfrac{1}{3}, 0)$ and $(1, 0)$. We find that

$$f_{11} = 4 - 6x, \qquad f_{22} = 4, \qquad f_{11} f_{22} - f_{12}^2 = 16 - 24x.$$

The point $(\tfrac{1}{3}, 0)$ gives a relative minimum and $(1, 0)$ is a saddle point. In this example it is instructive to find the level sets $\{(x, y) : f(x, y) = c\}$. They are indicated in Figure 3.6 for the critical values $-\tfrac{4}{27} = f(\tfrac{1}{3}, 0)$, $0 = f(1, 0)$ and for nearby values of $c$.

Figure 3.6

The point $(\tfrac{1}{3}, 0)$ of relative minimum is an isolated point of the level set containing it. For $-\tfrac{4}{27} < c < 0$ the level set has two parts. The one that encloses $(\tfrac{1}{3}, 0)$ resembles a small ellipse if $c$ is near $-\tfrac{4}{27}$. This can be attributed to the fact that near $(\tfrac{1}{3}, 0)$, $f(x, y)$ is approximated by the first two nonzero terms in its Taylor expansion about $(\tfrac{1}{3}, 0)$, namely,

$$f(\tfrac{1}{3}, 0) + \tfrac{1}{2} Q(\tfrac{1}{3}, 0, x - \tfrac{1}{3}, y) = -\tfrac{4}{27} + (x - \tfrac{1}{3})^2 + 2y^2.$$

The level sets of this quadratic function are ellipses with center $(\tfrac{1}{3}, 0)$ if $c > -\tfrac{4}{27}$. Similarly, $f(x, y)$ is approximated by $-(x - 1)^2 + 2y^2$ near the saddle point $(1, 0)$. The level sets $-(x - 1)^2 + 2y^2 = c$ are hyperbolas if $c \ne 0$. Near $(1, 0)$ the level sets of $f$ resemble these hyperbolas. For $c = 0$ we get the lines $\sqrt{2}\, y = \pm(x - 1)$ tangent to the level set of $f$ at $(1, 0)$.
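
Criterion I is easy to mechanize. The Python sketch below is an illustration, not part of the text: the function names and the returned labels are assumptions. It classifies a critical point in two dimensions from the three second-order partial derivatives and applies the test to the example $f(x, y) = 2y^2 - x(x - 1)^2$.

```python
from fractions import Fraction

def classify_2d(f11, f12, f22):
    # Criterion I: decide from the sign of f11*f22 - f12^2 and of f11.
    d2 = f11 * f22 - f12 * f12
    if d2 == 0:
        return "degenerate"  # the criterion gives no information here
    if d2 < 0:
        return "saddle point"
    return "relative minimum" if f11 > 0 else "relative maximum"

def second_partials(x, y):
    # (f11, f12, f22) for f(x, y) = 2*y**2 - x*(x - 1)**2.
    return 4 - 6 * x, 0, 4

for point in [(Fraction(1, 3), 0), (1, 0)]:
    print(point, classify_2d(*second_partials(*point)))
```

When $f_{11} f_{22} - f_{12}^2 > 0$ the product $f_{11} f_{22}$ is positive, so testing the sign of $f_{11}$ alone suffices.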
Let us now state two criteria for $Q(x,\ ) > 0$ (or $Q(x,\ ) < 0$) that hold for any dimension $n$. By Theorem 3.4 these imply that a critical point $x$ gives a relative minimum (or relative maximum).

Criterion II
For any $n$ let

$$d_m(x) = \det\big(f_{ij}(x)\big)_{1 \le i, j \le m}, \qquad m = 1, \ldots, n.$$

These are called the principal minor determinants of the matrix $(f_{ij}(x))$. The mth principal minor $d_m(x)$ is the determinant of the matrix obtained by deleting the last $n - m$ rows and columns. The determinant $d_n(x)$ is the Hessian of $f$ at $x$.
Let us state without proof the following criterion:

$Q(x,\ ) > 0$ iff $d_m(x) > 0$ for $m = 1, \ldots, n$.

$Q(x,\ ) < 0$ iff $(-1)^m d_m(x) > 0$ for $m = 1, \ldots, n$.

For a proof of the first of these two statements, see the work by Bôcher [3], especially pp. 140 and 147. The second follows from the first by considering $-Q$. Here iff is an abbreviation for "if and only if."
Criterion II is fairly convenient for small values of $n$, but becomes unwieldy for larger ones. This is because of the very large number of operations required to calculate the determinant of an $m \times m$ matrix even for moderately small $m$.
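
For small matrices Criterion II can be carried out exactly. The following Python sketch is an illustration under stated assumptions: the function names are hypothetical, determinants are computed by Gaussian elimination over the rationals (one possible choice, not prescribed by the text), and the sample matrix is the matrix of second partial derivatives of $f(x, y, z) = x^2 + 3y^2 + 2z^2 - 2xy + 2xz$ from Problem 4 below.

```python
from fractions import Fraction

def determinant(matrix):
    # Determinant by Gaussian elimination with exact rational arithmetic.
    a = [[Fraction(v) for v in row] for row in matrix]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap changes the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def principal_minors(matrix):
    # d_1, ..., d_n: determinants of the upper-left m x m blocks.
    n = len(matrix)
    return [determinant([row[:m] for row in matrix[:m]]) for m in range(1, n + 1)]

def is_positive_definite(matrix):
    return all(d > 0 for d in principal_minors(matrix))

def is_negative_definite(matrix):
    return all((-1) ** m * d > 0 for m, d in enumerate(principal_minors(matrix), start=1))

H = [[2, -2, 2], [-2, 6, 0], [2, 0, 4]]  # second partials of the Problem 4 function
print(principal_minors(H), is_positive_definite(H))
```

All three minors are positive here, so $Q > 0$ and the critical point 0 gives a strict relative minimum.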

Criterion III
In linear algebra it is shown that any quadratic form can be written as a linear combination of squares by suitably choosing a new orthonormal basis for $E^n$. This fact is also proved in Section 4.8 below. Therefore

$$Q(x, h) = \sum_{i=1}^{n} \lambda_i(x)\,[\eta^i(x)]^2, \tag{3.20}$$

where for each $h \in E^n$, $\eta^1(x), \ldots, \eta^n(x)$ are the components of $h$ with respect to some orthonormal basis $\{v_1(x), \ldots, v_n(x)\}$ for $E^n$,

$$h = \sum_{i=1}^{n} \eta^i(x)\, v_i(x).$$

The numbers $\lambda_1(x), \ldots, \lambda_n(x)$ are just the characteristic values of the matrix $(f_{ij}(x))$.
If $\lambda_i(x) > 0$ for each $i = 1, \ldots, n$, then from (3.20) $Q(x, h) > 0$ unless $\eta^i(x) = 0$ for each $i$ (that is, unless $h = 0$). In this case $Q(x,\ )$ is positive definite. Conversely, if $h = v_i(x)$, then $Q(x, h) = \lambda_i(x)$. Therefore, if $Q(x,\ ) > 0$, then in particular $Q(x, v_i(x)) > 0$, and $\lambda_i(x) > 0$. This proves the first of the following statements:

$Q(x,\ ) > 0$ iff $\lambda_i(x) > 0$ for $i = 1, \ldots, n$.

$Q(x,\ ) < 0$ iff $\lambda_i(x) < 0$ for $i = 1, \ldots, n$.

The second is proved in the same way. Replacing on both sides "$> 0$" by "$\ge 0$" we get a criterion for positive semidefiniteness, and replacing "$< 0$" by "$\le 0$," one for negative semidefiniteness.
If $n$ is fairly large it is better, instead of Criterion II, to try some numerical method for putting $Q(x,\ )$ in the form (3.20).
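
One classical numerical method of this kind is the cyclic Jacobi rotation method, which repeatedly applies plane rotations (orthonormal changes of basis) until the matrix is nearly diagonal; the diagonal entries then approximate the characteristic values. The text does not single out any particular method, so the Python sketch below is an assumption chosen for illustration, as are the function name, the tolerance, and the sweep limit.

```python
import math

def jacobi_eigenvalues(matrix, max_sweeps=50, tol=1e-24):
    # Approximate characteristic values of a real symmetric matrix.
    a = [[float(v) for v in row] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break  # off-diagonal part is negligible
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the new entry a[p][q] is 0.
                if a[q][q] == a[p][p]:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2.0 * a[p][q] / (a[q][q] - a[p][p]))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # columns p and q of A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rows p and q of J^T (A J)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))

H = [[2, -2, 2], [-2, 6, 0], [2, 0, 4]]  # assumed sample symmetric matrix
eigs = jacobi_eigenvalues(H)
print(eigs, all(lam > 0 for lam in eigs))
```

Since every rotation is an orthonormal change of basis, the sum of the diagonal entries stays equal to the trace of the original matrix, which gives a cheap sanity check on the output.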


Boundary extrema
If $f$ is continuous on a compact set $A$, then $f$ has absolute extrema on $A$. They may occur either at interior or at boundary points of $A$. If an absolute maximum occurs at an interior point $x_0$ of $A$, then $x_0$ is among the relative maxima in int $A$. We can try to find it by Theorem 3.4. However, Theorem 3.4 does not apply at boundary points of $A$.
If $x_0 \in$ fr $A$ and $x_0$ gives an absolute maximum, then $f(x) \le f(x_0)$ for every $x \in A$, and in particular for every $x \in$ fr $A$. Therefore $x_0$ also gives an absolute maximum among points of fr $A$. If fr $A$ is sufficiently smooth, the Lagrange multiplier rule (Section 4.8) can be applied.
In Section 3.6 we discuss extrema of linear functions, for which calculus is of no use.

PROBLEMS

In Problems 1 through 6 let $A = E^n$ for the indicated $n$.

1. Find the critical points, relative extrema, and saddle points. Make a sketch indicating the level sets.
(a) $f(x, y) = x - x^2 - y^2$. (b) $f(x, y) = (x + 1)(y - 2)$.
(c) $f(x, y) = \sin(xy)$. (d) $f(x, y) = xy(x - 1)$.

2. Find the critical points, relative extrema, and saddle points.
(a) $f(x, y) = x^3 + x - 4xy - 2y^2$.
(b) $f(x, y) = x(y + 1) - x^2 y$.
(c) $f(x, y) = \cos x \cosh y$.
[Note: The hyperbolic functions sinh and cosh are defined by
$$\sinh x = \tfrac{1}{2}[\exp x - \exp(-x)],$$
$$\cosh x = \tfrac{1}{2}[\exp x + \exp(-x)].$$
Their derivatives are given by the formulas sinh' = cosh, cosh' = sinh.]

3. Let $f(x, y, z) = x^2 + y^2 - z^2$. Show that $f$ has one critical point, which does not give a relative extremum. Describe the level sets.

4. Let $f(x, y, z) = x^2 + 3y^2 + 2z^2 - 2xy + 2xz$. Show that 0 is the minimum value of $f$.

5. Given $x_1, \ldots, x_m$, find the point $x$ where $\sum_{j=1}^{m} |x - x_j|^2$ has an absolute minimum, and find the minimum value.

6. (a) In Problem 1(a) find the (absolute) maximum and minimum values of $f$ on the circular disk $x^2 + y^2 \le 1$.
(b) Do the same for 1(c).

7. (a) Let $f(x) = \psi(a \cdot x)$, where $\psi$ is of class $C^{(2)}$ and $a \ne 0$. Find all critical points, and show that every critical point is degenerate.
(b) Illustrate this result in case $f(x, y) = (x - y)^2$.

8. Let $g(h) = \sum_{i,j=1}^{n} c_{ij} h^i h^j$. Assume that $g > 0$, that is, that $g(h) > 0$ for every $h \ne 0$.
(a) Show that there exists a number $m > 0$ such that $g(h) \ge m|h|^2$ for every $h$. [Hint: The polynomial $g$ is continuous and has a positive minimum value $m$ on the unit $(n - 1)$-sphere $\{h : |h| = 1\}$.]
(b) Suppose that $|C_{ij} - c_{ij}| < \varepsilon n^{-2}$ for each $i, j = 1, \ldots, n$. Let $G(h) = \sum_{i,j=1}^{n} C_{ij} h^i h^j$. Show that $G(h) \ge (m - \varepsilon)|h|^2$ for every $h$. Hence $G > 0$ if $\varepsilon < m$.

9. Let $x_0$ be a nondegenerate critical point of a function $f$ of class $C^{(2)}$. Show that $x_0$ is isolated, that is, that $x_0$ has a neighborhood $U$ containing no other critical points of $f$. [Hint: Let $x$ be another critical point in $U$. Apply the mean value theorem to each of the functions $f_1, \ldots, f_n$ to find that

$$0 = \sum_{j=1}^{n} f_{ij}(y_i)(x^j - x_0^j), \qquad i = 1, \ldots, n, \tag{$*$}$$

where each $y_i \in U$. Show that if $U$ is small enough, $\det(f_{ij}(y_i)) \ne 0$ and consequently the system of equations ($*$) has only the solution $x - x_0 = 0$, a contradiction.]

*3.6 Convex and concave functions


Functions that are either concave or convex arise naturally in connection with the study of convex sets. They also occur in a wide variety of applications of calculus. We will see that the theory of maxima and minima is much simpler for them than for functions that are neither concave nor convex.
Let $f$ be a real valued function and $K$ a convex subset of the domain of $f$.

Definition. The function $f$ is convex on $K$ if, for every $x_1, x_2 \in K$ and $t \in [0, 1]$,

$$f(t x_1 + (1 - t) x_2) \le t f(x_1) + (1 - t) f(x_2). \tag{3.21a}$$

If strict inequality holds in (3.21a) whenever $x_1 \ne x_2$ and $0 < t < 1$, then $f$ is strictly convex on $K$.

The assumption that $K$ is a convex set is needed to ensure that the point $t x_1 + (1 - t) x_2$ belongs to the domain of $f$. In order to see the geometric meaning of convexity, let us denote points of $E^{n+1}$ by $(x^1, \ldots, x^n, z)$ or, for short, by $(x, z)$. Let

$$K^+ = \{(x, z) : x \in K,\ z \ge f(x)\}.$$

If $x_1 = x_2$, then (3.21a) holds trivially. Therefore, suppose that $x_1 \ne x_2$. Let $l$ denote the line segment in $E^{n+1}$ joining $(x_1, f(x_1))$ and $(x_2, f(x_2))$. Points of $l$ are of the form

$$(t x_1 + (1 - t) x_2,\ t f(x_1) + (1 - t) f(x_2)),$$

where $t \in [0, 1]$. Inequality (3.21a) says that such points belong to $K^+$. Therefore, the definition says geometrically that the line segment $l$ is contained in $K^+$ for every pair of points $x_1, x_2 \in K$.

Proposition 3.4. The function $f$ is convex on $K$ if and only if $K^+$ is a convex subset of $E^{n+1}$.

PROOF. Let $f$ be convex on $K$. Let $(x_1, z_1), (x_2, z_2) \in K^+$, $(x_1, z_1) \ne (x_2, z_2)$, and $l'$ be the line segment joining them. Let $(x, z)$ be any point of $l'$. Then

$$x = t x_1 + (1 - t) x_2,$$
$$z = t z_1 + (1 - t) z_2,$$

where $0 \le t \le 1$. Since $z_1 \ge f(x_1)$ and $z_2 \ge f(x_2)$, we have (Figure 3.7)

$$z = t z_1 + (1 - t) z_2 \ge t f(x_1) + (1 - t) f(x_2) \ge f(x).$$

Hence $(x, z) \in K^+$. This proves that $K^+$ is a convex set.

Figure 3.7

Suppose, conversely, that $f$ is not convex on $K$. Then there exist $x_1, x_2 \in K$ and $t \in [0, 1]$ such that (3.21a) does not hold. The point $(t x_1 + (1 - t) x_2,\ t f(x_1) + (1 - t) f(x_2))$ belongs to the line segment $l$ joining the points $(x_1, f(x_1))$ and $(x_2, f(x_2))$, but not to $K^+$. Since $(x_1, f(x_1)), (x_2, f(x_2)) \in K^+$, the set $K^+$ is not convex. □

For any real number $c$, let

$$K_c = \{x \in K : f(x) \le c\}.$$

Proposition 3.5. If $f$ is convex on $K$, then $K_c$ is a convex set for every $c$.

PROOF. For every $x_1, x_2 \in K_c$ and $t \in [0, 1]$,

$$f(t x_1 + (1 - t) x_2) \le t f(x_1) + (1 - t) f(x_2) \le tc + (1 - t)c = c.$$

Hence $t x_1 + (1 - t) x_2 \in K_c$. □

The same proof shows that $\{x : f(x) < c\}$ is also convex. The converse to Proposition 3.5 is false; for example, let $f$ be any increasing function with domain $E^1$. Then $K_c$ is either all of $E^1$, a semiinfinite interval, or the empty set. In each case $K_c$ is convex. However, $f$ need not be a convex function; for instance, if $f(x) = x^3$, then $f$ is not convex on $E^1$.
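
The failure of convexity for $f(x) = x^3$ can be exhibited with one explicit triple $(x_1, x_2, t)$. The Python sketch below is an illustration only; the function names, the particular counterexample, and the comparison function $|x|$ (a convex function, used here as a contrast) are assumptions chosen for the example. Exact rational arithmetic avoids any rounding doubts.

```python
from fractions import Fraction

def violates_convexity(f, x1, x2, t):
    # True if inequality (3.21a) fails for this particular x1, x2, t.
    return f(t * x1 + (1 - t) * x2) > t * f(x1) + (1 - t) * f(x2)

def cube(x):
    return x ** 3

def absolute_value(x):
    return abs(x)

# x1 = -2, x2 = 0, t = 1/2: f(-1) = -1 exceeds (f(-2) + f(0)) / 2 = -4.
print(violates_convexity(cube, Fraction(-2), Fraction(0), Fraction(1, 2)))
```

A single violating triple disproves convexity, whereas finding no violation on a finite grid is only evidence, not a proof.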

EXAMPLE 1. Let $A$ be any nonempty closed subset of $E^n$. For every $x$, let $f(x)$ be the distance from $x$ to $A$, namely,

$$f(x) = \min\{|x - y| : y \in A\}.$$

Let us show that the function $f$ so defined is convex on $E^n$ if and only if $A$ is a convex set. If $f$ is convex on $E^n$, then $K_0 = A$ and $A$ is convex by Proposition 3.5 with $c = 0$. Conversely, assume that $A$ is convex. Given $x_1, x_2 \in E^n$ and $t \in [0, 1]$, let $y_1$ be a point of $A$ nearest $x_1$, $y_2$ a point of $A$ nearest $x_2$, and

$$x = t x_1 + (1 - t) x_2, \qquad y = t y_1 + (1 - t) y_2.$$

(Actually, one can say "the nearest point" rather than "a nearest point" since the set $A$ is convex. This fact is not needed here.) Since $A$ is convex, $y \in A$. By definition of $f$, $f(x) \le |x - y|$. Then

$$f(x) \le |t(x_1 - y_1) + (1 - t)(x_2 - y_2)|,$$
$$f(x) \le t|x_1 - y_1| + (1 - t)|x_2 - y_2| = t f(x_1) + (1 - t) f(x_2).$$

Hence $f$ is a convex function on $E^n$.
In particular, let $A$ consist of a single point $x_0$. Then $f(x) = |x - x_0|$, and this function is convex on $E^n$.
Concave functions
The definition of concave function is obtained by reversing the inequality sign in (3.21a): $f$ is concave on $K$ if, for every $x_1, x_2 \in K$ and $t \in [0, 1]$,

$$f(t x_1 + (1 - t) x_2) \ge t f(x_1) + (1 - t) f(x_2). \tag{3.21b}$$

If strict inequality holds whenever $x_1 \ne x_2$ and $0 < t < 1$, then $f$ is strictly concave on $K$.
There are propositions about concave functions corresponding to Propositions 3.4 and 3.5 for convex functions. In them $K^+$ must be replaced by

$$K^- = \{(x, z) : x \in K,\ z \le f(x)\},$$

and $K_c$ by

$$K^c = \{x \in K : f(x) \ge c\}.$$

A function $f$ is concave on $K$ if and only if $-f$ is convex on $K$. By using this fact, or by repeating the proofs, it is easy to prove these propositions about concave functions.
Many useful inequalities can be obtained from (3.21a) or (3.21b) by judiciously choosing the function $f$ and the number $t$.

Continuity of convex functions

Theorem 3.5. Let $K$ be an open convex set and $f$ convex on $K$. Then $f$ is continuous on $K$.

PROOF. Let $x_0$ be any point of $K$, and $d$ the distance from $x_0$ to the boundary of $K$ ($d = +\infty$ if $K = E^n$). Let $C$ be an n-cube with center $x_0$ and side length $2\delta$, where $n^{1/2}\delta < d$. Let $V$ denote the set of vertices of $C$ (see Problem 10, Section 1.5). $V$ is a finite set. Let

$$M = \max\{f(x) : x \in V\}. \tag{3.22}$$

By Proposition 3.5, $K_M$ is a convex set. Since $C$ is the convex hull of $V$ and $V \subset K_M$, $C \subset K_M$.

Figure 3.8 (labels: $x_0 + u$, $x_0 - u$, side length $2\delta$)

Let $x$ be any point such that $0 < |x - x_0| < \delta$, and define $x_0 + u$, $x_0 - u$ on the line through $x_0$ and $x$ as in Figure 3.8. Let us write $x$ as a convex combination of $x_0 + u$ and $x_0$, and $x_0$ as a convex combination of $x$ and $x_0 - u$. If $t = \delta^{-1}|x - x_0|$, then

$$x = t(x_0 + u) + (1 - t)x_0,$$
$$x_0 = \frac{1}{1 + t}\,x + \frac{t}{1 + t}\,(x_0 - u).$$

Since $f$ is convex,

$$f(x) \le t f(x_0 + u) + (1 - t) f(x_0) \le tM + (1 - t) f(x_0),$$
$$f(x_0) \le \frac{1}{1 + t} f(x) + \frac{t}{1 + t} f(x_0 - u) \le \frac{f(x) + tM}{1 + t}.$$

The inequalities give

$$-t[M - f(x_0)] \le f(x) - f(x_0) \le t[M - f(x_0)],$$

or

$$|f(x) - f(x_0)| \le \frac{M - f(x_0)}{\delta}\,|x - x_0|. \tag{3.23}$$

The estimate (3.23) shows that $f$ is continuous at $x_0$. [Note: This proof was suggested by F. J. Almgren.] □

If K is not open, then a convex function f may be discontinuous at boundary points of K. See the example below. The interior of K is an open convex set, and by the theorem f is continuous at every interior point.

EXAMPLE 2. Let K = [0, 1] and f(x) = x if $0 < x \le 1$, f(0) = 1. Then f is convex on K but is discontinuous at the left endpoint 0.

Calculus tests for convexity or concavity of a function


If a function f is sufficiently smooth, then f can be tested for convexity or concavity by using calculus. First let us assume that f is differentiable. Later in the present section we make the stronger assumption that f is of class $C^{(2)}$ and obtain a test in terms of the second-order partial derivatives (Theorem 3.6).
Figure 3.7 suggests that convexity of a differentiable function f is equivalent to the fact that f lies above its tangent hyperplane at each point $(x_0, f(x_0))$. The following proposition shows that this is indeed so.

Proposition 3.6a. Let f be differentiable on a convex set K. Then f is convex on K if and only if
(3.24a)  $f(x) \ge f(x_0) + df(x_0) \cdot (x - x_0)$
for every $x_0, x \in K$.

PROOF. Let f be convex on K, and let $x_0$, x be any two points of K. Let $h = x - x_0$ and $t \in (0, 1)$. By definition of convex function,

$$f(x_0 + th) \le t f(x_0 + h) + (1 - t) f(x_0).$$

This inequality may be rewritten as
(3.25)  $f(x_0 + th) - f(x_0) \le t[f(x_0 + h) - f(x_0)].$
Subtracting $t\,df(x_0) \cdot h$ from both sides and dividing by t,

$$\frac{f(x_0 + th) - f(x_0) - t\,df(x_0) \cdot h}{t} \le f(x_0 + h) - f(x_0) - df(x_0) \cdot h.$$

The left-hand side tends to 0 as $t \to 0^+$. Hence the right-hand side is nonnegative, which says that (3.24a) holds.
Conversely, assume that (3.24a) holds for every $x_0, x \in K$. Let $x_1, x_2 \in K$, $x_1 \ne x_2$, and let $t \in (0, 1)$. Let

$$x_0 = t x_1 + (1 - t) x_2, \qquad h = x_1 - x_0.$$

A little manipulation shows that

$$x_2 = x_0 - \frac{t}{1 - t}\,h.$$

By (3.24a) we have
$$f(x_1) \ge f(x_0) + df(x_0) \cdot h,$$
$$f(x_2) \ge f(x_0) + df(x_0) \cdot \left(-\frac{t}{1 - t}\,h\right).$$

Multiplying by $t/(1 - t)$ in the first inequality and adding, we get

$$\frac{t}{1 - t} f(x_1) + f(x_2) \ge \left(\frac{t}{1 - t} + 1\right) f(x_0),$$

or
$$t f(x_1) + (1 - t) f(x_2) \ge f(x_0).$$

But this is just the inequality (3.21a) in the definition of convex function. We assumed that $t \in (0, 1)$, but if t = 0 or 1, (3.21a) trivially holds. Therefore f is convex on K. □

By sharpening the inequality in (3.24a) we get a necessary and sufficient condition for strict convexity.

Proposition 3.6b. Let f be differentiable on a convex set K. Then f is strictly convex on K if and only if
(3.24b)  $f(x) > f(x_0) + df(x_0) \cdot (x - x_0)$
for every $x, x_0 \in K$ with $x \ne x_0$.
PROOF. Let f be strictly convex on K. In particular, f is convex on K and (3.24a) holds for every $x, x_0 \in K$. Suppose that $x \ne x_0$, and let $h = x - x_0$. For every $t \in (0, 1)$
$$df(x_0) \cdot (th) \le f(x_0 + th) - f(x_0),$$
by (3.24a) applied with x replaced by $x_0 + th$. But according to (3.25), which holds strictly since f is strictly convex,
$$f(x_0 + th) - f(x_0) < t[f(x_0 + h) - f(x_0)].$$
Therefore
$$t\,df(x_0) \cdot h < t[f(x_0 + h) - f(x_0)].$$
Upon dividing both sides by t we get (3.24b).
The proof of the converse is the same as for Proposition 3.6a, all inequalities now being strict. □


For concave functions the inequality signs must be reversed in (3.24a) and (3.24b). The first of these inequalities then says geometrically that f lies below its tangent hyperplane at $(x_0, f(x_0))$, and the second says that this is strictly true except at the point $(x_0, f(x_0))$ itself.
In the one-dimensional case there is the following simpler test for con-
vexity, or strict convexity, of f. See Appendix AA for a definition of non-
decreasing and increasing functions.

Proposition 3.7. (n = 1) Let $K \subset E^1$ be an interval, and f a function that has a derivative $f'(x)$ for every $x \in K$. Then:
(a) f is convex on K if and only if $f'$ is nondecreasing on K.
(b) f is strictly convex on K if and only if $f'$ is increasing on K.
PROOF. Let f be convex on K, where $K \subset E^1$ is an interval. Let $x, y \in K$, $y < x$. By (3.24a) applied with $x_0 = y$,
$$f(x) - f(y) \ge f'(y)(x - y).$$
By (3.24a) applied with $x_0 = x$,
$$f(y) - f(x) \ge f'(x)(y - x), \qquad f(x) - f(y) \le f'(x)(x - y).$$
Therefore $f'(y)(x - y) \le f'(x)(x - y)$, from which $f'(y) \le f'(x)$. This proves that $f'$ is nondecreasing on K. If f is strictly convex, then each of these inequalities is strict. In particular, $f'(y) < f'(x)$, which shows that $f'$ is increasing on K.
Conversely, assume that $f'$ is nondecreasing on K. Let $x_0, x \in K$, and suppose first that $x_0 < x$. By the mean value theorem there exists $y \in (x_0, x)$ such that
$$f(x) - f(x_0) = f'(y)(x - x_0).$$
Since $f'$ is nondecreasing, $f'(x_0) \le f'(y)$. Therefore
$$f(x) - f(x_0) \ge f'(x_0)(x - x_0),$$
which is equivalent to inequality (3.24a). Similarly, (3.24a) holds if $x < x_0$. By Proposition 3.6a, f is convex on K. If $f'$ is increasing, then $f'(x_0) < f'(y)$, and the proof shows that (3.24b) holds. By Proposition 3.6b, f is strictly convex on K. □

EXAMPLE 3. Let p > 1, p not necessarily an integer. Let $f(x) = |x|^p$ for every $x \in E^1$. Then
$$f'(x) = \begin{cases} p|x|^{p-1} & \text{if } x > 0, \\ -p|x|^{p-1} & \text{if } x < 0, \end{cases}$$
and $f'(0) = 0$. The function $f'$ is increasing. Hence f is strictly convex on $E^1$. Taking $t = \tfrac12$, we have
$$f[\tfrac12(x_1 + x_2)] \le \tfrac12 f(x_1) + \tfrac12 f(x_2).$$
Multiplying both sides of the inequality by $2^p$, we get
$$|x_1 + x_2|^p \le 2^{p-1}(|x_1|^p + |x_2|^p).$$
The inequality is strict unless $x_1 = x_2$.
Let us next prove a theorem that provides a convenient test for concavity or convexity of a function of class $C^{(2)}$. We recall the function Q defined by (3.18).

Theorem 3.6. Let f be of class $C^{(2)}$ on an open, convex set K. Then:
(a) f is convex on K if and only if $Q(x, \cdot) \ge 0$ for every $x \in K$.
(a') If $Q(x, \cdot) > 0$ for every $x \in K$, then f is strictly convex on K.
(b) f is concave on K if and only if $Q(x, \cdot) \le 0$ for every $x \in K$.
(b') If $Q(x, \cdot) < 0$ for every $x \in K$, then f is strictly concave on K.

PROOF. Since K is convex we may use Taylor's formula with q = 2 and any pair of points $x_0, x \in K$:

(3.26)  $f(x) = f(x_0) + df(x_0) \cdot h + \tfrac12 Q(x_0 + sh, h),$

where $s \in (0, 1)$ and $h = x - x_0$. Let us first prove (a'). By hypothesis, $Q(y, \cdot) > 0$ for every $y \in K$, and in particular for $y = x_0 + sh$. Therefore $Q(x_0 + sh, h) > 0$ if $h \ne 0$, from which
$$f(x) > f(x_0) + df(x_0) \cdot h.$$
By Proposition 3.6b, f is strictly convex on K.
Let us next prove (a). If $Q(x, \cdot) \ge 0$ for every $x \in K$, then the same reasoning shows that
$$f(x) \ge f(x_0) + df(x_0) \cdot h.$$
By Proposition 3.6a, f is convex on K. On the other hand, if it is not true that $Q(x, \cdot) \ge 0$ for every $x \in K$, then $Q(x_0, h_0) < 0$ for some $x_0 \in K$ and $h_0 \ne 0$. Since f is of class $C^{(2)}$, $Q(\cdot, h_0)$ is continuous on K. Hence there exists $\delta > 0$ such that $Q(y, h_0) < 0$ for every y in the $\delta$-neighborhood of $x_0$. Let $h = c h_0$, where c > 0 is small enough that $|h| < \delta$, and let $x = x_0 + h$. Since $Q(x_0 + sh, \cdot)$ is quadratic,

$$Q(x_0 + sh, h) = c^2 Q(x_0 + sh, h_0) < 0.$$

From (3.26)
$$f(x) < f(x_0) + df(x_0) \cdot h.$$
By Proposition 3.6a, f is not convex on K.
This proves (a) and (a'). Parts (b) and (b') follow respectively from (a) and (a') by considering $-f$. □


If n = 1, then $Q(x, h) = f''(x)h^2$. The sign of Q(x, h) is determined by that of $f''(x)$. In that case condition (a) becomes $f''(x) \ge 0$, and (a') becomes $f''(x) > 0$. The signs are reversed for (b) and (b').

EXAMPLE 4. Let f be a homogeneous quadratic polynomial,
$$f(x) = \sum_{i,j=1}^{n} c_{ij} x^i x^j,$$
for each $x \in E^n$, where the n × n matrix $(c_{ij})$ is symmetric. Then $f_{ij}(x) = 2c_{ij}$ and $Q(x, h) = 2f(h)$. Hence f is convex on $E^n$ in case $f(x) \ge 0$ for every x, and concave in case $f(x) \le 0$ for every x. If f has values of both signs, then f is neither convex nor concave.

EXAMPLE 5. Let $f(x) = \phi[g(x)]$, where: (a) g is convex and of class $C^{(2)}$ on K; (b) $\phi$ is of class $C^{(2)}$ on an interval I such that $g(K) \subset I$, with $\phi'(u) \ge 0$, $\phi''(u) \ge 0$ for all $u \in I$. By the formula in one-variable calculus for derivatives of composites,
$$f_i(x) = \phi'[g(x)]\,g_i(x).$$
By using this formula again and the product rule,
$$f_{ij}(x) = \phi'[g(x)]\,g_{ij}(x) + \phi''[g(x)]\,g_i(x)\,g_j(x).$$
Writing for short $g_i$ for $g_i(x)$, and so on,
$$Q(x, h) = \phi'[g(x)] \sum_{i,j=1}^{n} g_{ij} h^i h^j + \phi''[g(x)] \sum_{i,j=1}^{n} g_i g_j h^i h^j.$$
The first term on the right-hand side is nonnegative since $\phi'[g(x)] \ge 0$ and g is convex on K. The second term is also nonnegative since $\phi''[g(x)] \ge 0$ and
$$\sum_{i,j=1}^{n} g_i g_j h^i h^j = \left(\sum_{i=1}^{n} g_i h^i\right)^2 \ge 0.$$
Thus $Q(x, h) \ge 0$ for all $x \in K$, $h \in E^n$. By Theorem 3.6(a), f is convex on K.


In Examples 4 and 5, the sign of $Q(x, \cdot)$ could be determined by direct calculations. When this is not feasible, one of the tests I, II, or III in Section 3.5 may be applied.

EXAMPLE 6. Let $f(x, y) = \tfrac16(x^3 + y^3) + xy$. Then $f_{11} = x$, $f_{22} = y$, $f_{11} f_{22} - f_{12}^2 = xy - 1$. Hence f is strictly convex on the part of the first quadrant above the hyperbola xy = 1, and strictly concave on the part of the third quadrant below this hyperbola.
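
The sign conditions in Example 6 can be explored pointwise with a short Python sketch. It assumes the standard criterion for a symmetric 2 × 2 matrix (positive definite when $f_{11} > 0$ and $f_{11}f_{22} - f_{12}^2 > 0$, negative definite when $f_{11} < 0$ and the same determinant is positive); the function name `classify` and the sample points are illustrative choices, not taken from the text.

```python
def classify(x: float, y: float) -> str:
    """Classify Q((x, y), .) for f(x, y) = (x^3 + y^3)/6 + x*y.

    The second-order partial derivatives are f11 = x, f12 = 1, f22 = y.
    """
    f11, f12, f22 = x, 1.0, y
    det = f11 * f22 - f12 ** 2          # equals x*y - 1
    if det > 0 and f11 > 0:
        return "positive definite"       # region where f is strictly convex
    if det > 0 and f11 < 0:
        return "negative definite"       # region where f is strictly concave
    return "not definite"                # semidefinite or indefinite

for point in [(2.0, 1.0), (-2.0, -1.0), (0.5, 0.5), (2.0, -1.0)]:
    print(point, classify(*point))
```

Points with xy > 1 in the first quadrant come out positive definite and those in the third quadrant negative definite, while points between the branches of the hyperbola are not definite.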

For convex functions the necessary condition $df(x_0) = 0$ for a minimum is also sufficient.

Theorem 3.7. Let f be differentiable and convex on an open convex set K and $x_0 \in K$ a critical point. Then f has an absolute minimum at $x_0$.
PROOF. Since $df(x_0) = 0$, $f(x) \ge f(x_0)$ for every $x \in K$ by Proposition 3.6a. □
Similarly, any differentiable concave function has an absolute maximum at any critical point.

Corollary 1. A differentiable function which is strictly convex (or strictly concave) has at most one critical point.
PROOF. Let f be strictly convex on K, and suppose that f has critical points at distinct points $x_0, x_1 \in K$. By Theorem 3.7, f has an absolute minimum at each of them, so that
$$f(x_0) = f(x_1) \le f(x)$$
for every $x \in K$. Since $df(x_0) = 0$, by Proposition 3.6b (with $x = x_1$), $f(x_1) > f(x_0)$. This is a contradiction. □

Extrema of linear functions


Let f be a linear function. Then calculus is of no help in finding the extrema of f. Since $f(x) = a \cdot x = a_1 x^1 + \cdots + a_n x^n$, the partial derivatives are $f_i(x) = a_i$. If f has a critical point, then a = 0 and f(x) = 0 for every x.
Let us assume that $a \ne 0$ and consider the problem of extremum on a convex polytope K (Section 1.5). If K is contained in $\{x : x^i \ge 0,\ i = 1, \dots, n\}$, this is a problem in linear programming and has various interesting applications [9, 13].
For simplicity let us assume that K is compact. The extrema of f must occur on the boundary fr K. Let us show that they can be found by considering only certain points of fr K, called extreme points.

Definition. Let K be a convex set. A point $x \in K$ is an extreme point of K if there do not exist distinct points $x_1, x_2 \in K$ and $t \in (0, 1)$ such that $x = t x_1 + (1 - t) x_2$.

Stated geometrically, x is extreme if it is interior to no line segment in K. For example, the extreme points of a simplex are the vertices. If K is a closed n-ball, then every point of fr K is extreme. A half-space has no extreme points.

Proposition 3.8. Let K be compact and convex. Then every point of K is a convex combination of extreme points of K.
PROOF. Let us proceed by induction on the dimension n. If n = 1, then K is an interval or a single point. Suppose that the proposition is true in dimension n - 1. Let $x_0 \in K$. If $x_0$ is a boundary point, then by Problem 7, Section 2.4, K has a supporting hyperplane P containing $x_0$. By an isometry of $E^n$ (see Section 4.2), we may arrange that $x_0 = 0$ and the equation of P is $x^n = 0$. The set $K \cap P$ is compact and convex. By the induction hypothesis $x_0$ is a convex combination of extreme points of $K \cap P$ and hence (Problem 14) of extreme points of K.
If $x_0 \in \operatorname{int} K$, then any line through $x_0$ intersects K in a segment with endpoints $x_1, x_2 \in \operatorname{fr} K$. Since $x_1$ and $x_2$ are convex combinations of the set of extreme points, so is $x_0$ (Problem 8, Section 1.5). □
By Proposition 1.7, taking as S the set of extreme points of K, each point of K is a convex combination of n + 1 or fewer extreme points. If S is connected, then n + 1 may be replaced by n.
Let C be the maximum value on K of the linear function f and $K_1 = \{x \in K : f(x) = C\}$. If $K_1$ is found, the problem of maximum is solved.

Corollary 2. $K_1$ is the convex set spanned by those extreme points of K at which f has an absolute maximum.
PROOF. Let $x \in K_1$. By the proposition $x = \sum t_i x_i$, where $x_1, \dots, x_m$ are extreme points, each $t_i > 0$, and $\sum t_i = 1$. All sums are from 1 to m. Since C is the maximum value, $f(x_i) \le C$ for each i. Since f is linear,
$$f(x) = \sum t_i f(x_i) \le \sum t_i C = C.$$
But f(x) = C, and since each $t_i > 0$ we must have $f(x_j) = C$ for j = 1, ..., m. Thus $x_1, \dots, x_m \in K_1$. Conversely, if $f(x_j) = C$ for each j and x is a convex combination of $x_1, \dots, x_m$, then f(x) = C. □

If K is a convex polytope, then by induction on n the set of extreme points is finite. The problem is no longer one of calculus, but that of maximizing f on this finite set. Except in the simplest situations, the method of unsystematic search among the extreme points is of little value. The best-known systematic method is called the simplex method of linear programming. In a sense it is an adaptation of the gradient method.
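
For a very small polytope the unsystematic search over extreme points is still easy to write down. The sketch below does exactly that for the unit square in $E^2$; it is not the simplex method, and the function name `maximize_over_vertices`, the choice of square, and the coefficient vectors are assumptions made for illustration.

```python
def maximize_over_vertices(a, vertices):
    """Evaluate f(x) = a . x at each listed extreme point.

    Returns the maximum value C and the extreme points at which it is attained;
    by Corollary 2 the set K_1 is the convex set spanned by those points.
    """
    values = [sum(ai * xi for ai, xi in zip(a, v)) for v in vertices]
    c = max(values)
    maximizers = [v for v, val in zip(vertices, values) if val == c]
    return c, maximizers

# Extreme points of the unit square [0, 1] x [0, 1].
square = [(0, 0), (1, 0), (1, 1), (0, 1)]

print(maximize_over_vertices((1, 2), square))  # a single maximizing vertex
print(maximize_over_vertices((1, 0), square))  # two vertices: a whole edge maximizes f
```

With integer data the comparison `val == c` is exact; with floating-point coefficients a tolerance would be the safer choice.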

PROBLEMS

1. Which fourth-degree polynomials are convex functions on $E^1$?

2. Use Theorem 3.6 to determine whether f is convex on K, concave on K, or neither. Unless otherwise indicated, $K = E^2$ or $E^3$.
(a) $f(x, y, z) = x^2 + y^2 - 4z^2$.
(b) $f(x, y, z) = x - y^2 - z^2$.
(c) $f(x, y) = (x + y + 1)^p$, $K = \{(x, y) : x + y + 1 > 0\}$.
(d) $f(x, y, z) = \exp(x^2 + xy + y^2 + z^2)$.
(e) $f(x, y) = \exp(xy)$.
In which cases is the convexity or concavity strict?
3. Let $f(x, y) = \phi(x^2 + y^2)$, where $\phi$ is of class $C^{(2)}$, increasing and concave. Show that f is convex on the circular disk $x^2 + y^2 \le a^2$ if and only if $\phi'(u) + 2u\phi''(u) \ge 0$ whenever $0 \le u \le a^2$.
4. Using Problem 3, find the largest a such that f is convex on $x^2 + y^2 \le a^2$.
(a) $f(x, y) = \log(1 + x^2 + y^2)$. (b) $f(x, y) = \sin(x^2 + y^2)$.
5. Using Example 5, show that each of the following functions is convex on $E^n$:
(a) $f(x) = |x|^p$, $p \ge 1$. (b) $f(x) = (1 + x \cdot x)^{x \cdot x}$.
(c) $f(x) = (1 + |x|^2)^{p/2}$, $p \ge 1$. [Hint: First consider p = 1.]
6. Let $f(x) = x\psi(x)$ and $g(x) = \psi(1/x)$, where $\psi$ has a second derivative $\psi''(x)$ for every x > 0. Show that f is convex on $(0, \infty)$ if and only if g is convex on $(0, \infty)$.
7. Let x > 0, y > 0, $0 \le t \le 1$. Show that $tx + (1 - t)y \ge x^t y^{1-t}$. [Hint: Log is an increasing, concave function.]

8. Prove by induction on m that if f is concave on K, then
$$f\left(\sum_{j=1}^{m} t_j x_j\right) \ge \sum_{j=1}^{m} t_j f(x_j)$$
for every $x_1, \dots, x_m \in K$ and scalars $t_1, \dots, t_m$ such that each $t_j \ge 0$ and $t_1 + \cdots + t_m = 1$. [Note: For convex functions the sense of the inequality is reversed.]
9. (a) Generalizing Problem 7, show that if $x_1, \dots, x_m$ are positive numbers, $0 \le t_j$ for j = 1, ..., m, and $t_1 + \cdots + t_m = 1$, then
$$t_1 x_1 + \cdots + t_m x_m \ge x_1^{t_1} \cdots x_m^{t_m}.$$
(b) Prove that the geometric mean is no more than the arithmetic mean, namely,
$$\frac{x_1 + \cdots + x_m}{m} \ge (x_1 \cdots x_m)^{1/m}.$$
10. Show that if f and g are convex on K, then f + g is convex on K.
11. (a) Let f and g be convex on K, and let h(x) = max[f(x), g(x)] for every $x \in K$. Show that h is convex on K. [Hint: Use Proposition 3.4.]
(b) Illustrate for the case $f(x) = |x - 1|$, $g(x) = x/2$.
12. Let f be both convex and concave on $E^n$. Show that there exist a and b such that $f(x) = a \cdot x + b$ for every $x \in E^n$.
13. Let f be continuous on K, and assume that $f(\tfrac12(x_1 + x_2)) \le \tfrac12 f(x_1) + \tfrac12 f(x_2)$ for every $x_1, x_2 \in K$. Show that f is convex on K. [Hint: First show (3.21a) when $t = j/2^k$ where $j = 0, 1, \dots, 2^k$ and k is a positive integer.]

14. Let K be closed and convex, and P a supporting hyperplane for K. Show that any extreme point of $K \cap P$ is an extreme point of K.
15. Let K be a closed convex polytope (not necessarily compact) and f be a linear function such that f(x) is bounded above on K. Show that f has an absolute maximum on K.
4
Vector-valued functions
of several variables

In this chapter we study the differential calculus of functions of several variables with values in $E^n$. Among the main results are the theorems about composition and inverses and the implicit function theorem. Later in the chapter, subsets of $E^n$ that are smooth manifolds are considered, and the spaces of tangent and normal vectors at a point of a smooth manifold are found. These ideas are then applied to obtain the Lagrange multiplier rule for constrained extremum problems.
Functions with values in $E^n$ are referred to as transformations rather than vector-valued functions. This term has a useful geometric connotation, and it also agrees with rather common usage. Some authors use instead the term "mapping." The differential calculus of transformations is based on local linear approximations, just as for the special case (n = 1) of real valued functions considered in Chapter 3. Consequently, it is first necessary to review some results about linear transformations. This is done in Section 4.1.

4.1 Linear transformations


In this section we collect some facts about linear transformations from one euclidean vector space into another. For results stated without proof, references are given to the work by Hoffman and Kunze [12]. However, the results in question are standard in linear algebra and may be found in practically any good book on the subject.
We use the following notation. A linear transformation is generally denoted by L. Its domain is a euclidean space $E^r$, of dimension r, whose points are denoted by s, t, .... The transformation L has values in another linear space $E^n$, of dimension n, whose points are denoted by x, y, ....


Definition. A transformation L from $E^r$ into $E^n$ is linear if:
(1) L(s + t) = L(s) + L(t) for every $s, t \in E^r$; and
(2) L(cs) = cL(s) for every $s \in E^r$ and real c.
This is a special case of the definition in Section A.1 of the Appendix.

We also recall from Section A.1 the concept of vector subspace. To verify that a set $P \subset E^n$ is a vector subspace of $E^n$, one must show that $x, y \in P$ implies $x + y \in P$ and that $x \in P$ implies $cx \in P$ for any real c. A vector subspace P has a dimension p, and $0 \le p \le n$. If p = 0 then P = {0}. If p = 1, then P is a line containing 0; if p = 2, P is a plane containing 0, and so on.
If L is a linear transformation, then the set $L(E^r)$ is a vector subspace of $E^n$. To prove this, let $x, y \in L(E^r)$. Then x = L(s), y = L(t) for some $s, t \in E^r$. But x + y = L(s) + L(t) = L(s + t). Hence $x + y \in L(E^r)$. Similarly, if $x \in L(E^r)$ then $cx \in L(E^r)$ for every scalar c. The dimension $\rho$ of the vector space $L(E^r)$ is called the rank of L.
The kernel of L is {t : L(t) = 0}. It is a vector subspace of $E^r$ (Problem 2). The dimension $\nu$ of the kernel is called the nullity of L. The rank and nullity are related by [12, p. 66]
(4.1)  $\rho + \nu = r.$

The matrix of L
Let us denote the standard basis vectors for $E^r$ by $\epsilon_1, \dots, \epsilon_r$ and those for $E^n$ by $e_1, \dots, e_n$. Thus vectors $t = (t^1, \dots, t^r)$ and $x = (x^1, \dots, x^n)$ can be written, respectively, as
$$t = \sum_{j=1}^{r} t^j \epsilon_j, \qquad x = \sum_{i=1}^{n} x^i e_i.$$

With these bases is associated a matrix $(c^i_j)$ for L with n rows and r columns, in the following way. Let $v_j = L(\epsilon_j)$, j = 1, ..., r. The vectors $v_1, \dots, v_r$ are the columns of the matrix. Let $c^i_j$ be the element in the ith row of column j. Let us show that, if x = L(t), then the components satisfy
(4.2)  $x^i = \sum_{j=1}^{r} c^i_j t^j, \quad i = 1, \dots, n.$

To see this, by definition of $c^i_j$,
(4.3)  $v_j = \sum_{i=1}^{n} c^i_j e_i, \quad j = 1, \dots, r.$
Since L is linear,
(4.4)  $L(t) = \sum_{j=1}^{r} t^j v_j.$

By (4.3) and (4.4), the components $x^i$ satisfy (4.2).
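
The relation between the columns $v_j$ and formula (4.2) can be sketched in a few lines of Python. The helper names `matrix_from_columns` and `apply`, and the particular columns and vector used, are illustrative assumptions; matrices are stored as plain lists of rows.

```python
def matrix_from_columns(columns):
    """Build the n x r matrix (list of rows) whose j-th column is v_j = L(eps_j)."""
    n = len(columns[0])
    return [[col[i] for col in columns] for i in range(n)]

def apply(matrix, t):
    """Formula (4.2): x^i = sum over j of c^i_j t^j."""
    return [sum(c * tj for c, tj in zip(row, t)) for row in matrix]

# Illustrative data (an assumption): L(eps_1) = (2, 3), L(eps_2) = (1, -1).
v1, v2 = (2, 3), (1, -1)
C = matrix_from_columns([v1, v2])

t = (4, 5)
x = apply(C, t)
# Formula (4.4): L(t) = t^1 v_1 + t^2 v_2 should give the same point.
x_from_columns = [t[0] * a + t[1] * b for a, b in zip(v1, v2)]

print(C)
print(x, x_from_columns)
```

Both computations give the same components, which is the content of the statement that (4.3) and (4.4) imply (4.2).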


Equation (4.2) is just a formula for matrix-vector multiplication. If one regards t, x as column vectors, then (4.2) states that x is obtained by multiplying t on the left by the matrix $(c^i_j)$.
Actually, for any pair of bases for $E^r$ and $E^n$ there is a matrix associated with L. It is shown in linear algebra that by suitable choice of bases the associated matrix can be made to have some special form, for instance, the Jordan canonical form if r = n [12, p. 207]. What we have called "the matrix of L" is the matrix corresponding to the standard bases for $E^r$ and $E^n$.
The rows of the matrix $(c^i_j)$ also have an interesting interpretation. The components $L^1, \dots, L^n$ of the linear transformation L are real valued linear functions. The statement x = L(t) is equivalent to $x^i = L^i(t)$, i = 1, ..., n. In Section 3.2 we identified each real valued linear function with a covector. Let $w^i$ be the covector with which the linear function $L^i$ is identified. Then

$$L^i(t) = w^i \cdot t, \quad i = 1, \dots, n.$$

By (4.2), the components of $w^i$ are the entries $c^i_1, \dots, c^i_r$ of the ith row of the matrix $(c^i_j)$. For that reason $w^1, \dots, w^n$ are called the row covectors of the matrix $(c^i_j)$.

$$
\begin{array}{c|ccccc}
 & v_1 & v_2 & v_3 & \cdots & v_r \\ \hline
w^1 & c^1_1 & c^1_2 & c^1_3 & \cdots & c^1_r \\
w^2 & c^2_1 & c^2_2 & c^2_3 & \cdots & c^2_r \\
w^3 & c^3_1 & c^3_2 & c^3_3 & \cdots & c^3_r \\
\vdots & \vdots & & & & \vdots \\
w^n & c^n_1 & c^n_2 & c^n_3 & \cdots & c^n_r
\end{array}
$$

By (4.4) the column vectors $v_1, \dots, v_r$ span $L(E^r)$. The rank $\rho$ equals the largest number of linearly independent column vectors of the matrix. Since row rank equals column rank [12, p. 105], $\rho$ is also the largest number of linearly independent row covectors of the matrix.

Composition
Let L be linear from $E^r$ into $E^n$, and M linear from $E^n$ into $E^p$. The composite $M \circ L$ is linear. Its matrix is the product of the matrices of M and L (Problem 4).

The case r = n
Let I denote the identity linear transformation, I(t) = t for every $t \in E^n$. Its matrix is $(\delta^i_j)$, which has 1 for each element of the principal diagonal and 0 elsewhere. L is nonsingular if it has rank $\rho = n$, and singular if $\rho < n$. A nonsingular linear transformation L has an inverse $L^{-1}$, which is also a linear transformation with
$$L^{-1} \circ L = L \circ L^{-1} = I.$$


If r = n, then the n × n matrix $(c^i_j)$ has a determinant, denoted by $\det(c^i_j)$. This number is also called the determinant of L. Thus by definition

$$\det L = \det(c^i_j).$$
Among the properties of determinants we recall:

$\det(M \circ L) = \det M \det L$ [12, p. 143],

$\det L = 0$ if and only if L is singular [12, p. 150].

By (4.1), L is singular if and only if the nullity $\nu > 0$. But $\nu > 0$ means that L(t) = 0 for some $t \ne 0$. Therefore, from (4.2) the system of homogeneous linear equations
(4.5)  $0 = \sum_{j=1}^{n} c^i_j t^j, \quad i = 1, \dots, n$

has a nontrivial solution if and only if det L = 0.

In later chapters we see that the absolute value of the determinant is the ratio of n-dimensional volumes, and the sign of the determinant determines an orientation.

EXAMPLE 1. Let n = r = p = 2. Let $L(s, t) = (2s + t)e_1 + (3s - t)e_2$. The matrix of L is

$$\begin{pmatrix} 2 & 1 \\ 3 & -1 \end{pmatrix}.$$

The linear transformation L takes the standard basis vectors $\epsilon_1, \epsilon_2$ into $v_1 = 2e_1 + 3e_2$, $v_2 = e_1 - e_2$. Since $\det L = -5 \ne 0$, L is nonsingular (see Figure 4.1).

Figure 4.1 (the transformation (x, y) = L(s, t), with x = 2s + t, y = 3s - t)


Let $M(x, y) = (2x - 5y)\mathbf{E}_1 - x\mathbf{E}_2$, where $\mathbf{E}_1, \mathbf{E}_2$ denote the standard basis vectors for the plane $E^2$ in which M has its values. The matrix of M is

$$\begin{pmatrix} 2 & -5 \\ -1 & 0 \end{pmatrix}.$$
Since $\det M = -5 \ne 0$, M is also nonsingular. The composite is found by
$$(M \circ L)(s, t) = M(2s + t, 3s - t) = [2(2s + t) - 5(3s - t)]\mathbf{E}_1 - (2s + t)\mathbf{E}_2,$$
$$(M \circ L)(s, t) = (-11s + 7t)\mathbf{E}_1 - (2s + t)\mathbf{E}_2.$$
The matrix of $M \circ L$ is

$$\begin{pmatrix} -11 & 7 \\ -2 & -1 \end{pmatrix}.$$
As expected, $\det M \circ L = 25 = \det M \det L$.
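
The arithmetic of Example 1 can be rechecked with a short Python sketch. The helpers `matmul` and `det2` are illustrative names written here in plain Python (lists of rows), not part of the text.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def det2(A):
    """Determinant of a 2 x 2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

L = [[2, 1], [3, -1]]    # matrix of L: columns are v_1 = (2, 3), v_2 = (1, -1)
M = [[2, -5], [-1, 0]]   # matrix of M
ML = matmul(M, L)        # matrix of the composite M o L

print(ML)
print(det2(M), det2(L), det2(ML))
```

The product reproduces the matrix of the composite found above, and the determinant of the product equals the product of the determinants.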
EXAMPLE 2. Let n = r = 3, and let L be the linear transformation that takes the standard basis vectors $\epsilon_1, \epsilon_2, \epsilon_3$ respectively into
$$v_1 = e_1 + 2e_2 - e_3, \qquad v_2 = -e_1 + e_2, \qquad v_3 = -e_1 + 4e_2 - e_3.$$
The matrix is
$$\begin{pmatrix} 1 & -1 & -1 \\ 2 & 1 & 4 \\ -1 & 0 & -1 \end{pmatrix},$$
which has $v_1, v_2, v_3$ as column vectors. The determinant is 0, and therefore L is singular. In fact, $v_3$ is a linear combination of $v_1$ and $v_2$, namely, $v_3 = v_1 + 2v_2$. Since $v_1$ and $v_2$ are linearly independent, the rank of L is 2. $L(E^3)$ is the plane containing 0, $v_1$, $v_2$. By (4.1) the kernel has dimension 1. It is found by solving the system (4.5) of homogeneous linear equations. One solution is $t_1 = \epsilon_1 + 2\epsilon_2 - \epsilon_3$. The kernel consists of all scalar multiples of $t_1$.
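
The claims of Example 2 can be rechecked numerically. The sketch below assumes that the matrix has rows (1, -1, -1), (2, 1, 4), (-1, 0, -1), which is what the row covectors listed in the continuation of this example give; the helper names `det3` and `apply` are illustrative.

```python
# Matrix of L, stored as a list of rows (assumed from the row covectors of Example 2).
C = [[1, -1, -1],
     [2,  1,  4],
     [-1, 0, -1]]

def det3(A):
    """Determinant of a 3 x 3 matrix by expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = A
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def apply(A, t):
    """Formula (4.2): multiply the column vector t on the left by A."""
    return [sum(c * tj for c, tj in zip(row, t)) for row in A]

# Column vectors v_1, v_2, v_3 of the matrix.
v1, v2, v3 = ([row[j] for row in C] for j in range(3))

print(det3(C))                                       # singular matrix
print(v3 == [a + 2 * b for a, b in zip(v1, v2)])     # v_3 = v_1 + 2 v_2
print(apply(C, [1, 2, -1]))                          # t_1 lies in the kernel
```

The vanishing determinant, the linear dependence of the columns, and the kernel vector are three views of the same fact that L has rank 2.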

The dual L* of a linear transformation

This is the linear transformation from the dual space $(E^n)^*$ into $(E^r)^*$, defined as follows. Let $(c^i_j)$ be the matrix of the linear transformation L, and $w^1, \dots, w^n$ the row covectors of the matrix $(c^i_j)$, as above. Note that $w^i$ corresponds to the real valued linear function $L^i$, which has domain $E^r$. Thus $w^i \in (E^r)^*$ for each i = 1, ..., n. The linear transformation L* is defined as follows.

Definition. For any covector $a = (a_1, \dots, a_n)$ in $(E^n)^*$, let
(4.6)  $L^*(a) = \sum_{i=1}^{n} a_i w^i.$

The covector L*(a) is in $(E^r)^*$, since it is a linear combination of the covectors $w^1, \dots, w^n$ that are in $(E^r)^*$. The components of L*(a) can be found


as follows. We have
$$w^i = \sum_{j=1}^{r} c^i_j \epsilon^j, \quad i = 1, \dots, n,$$

where $\{\epsilon^1, \dots, \epsilon^r\}$ is the standard basis for $(E^r)^*$. If we write b = L*(a), then the components of b satisfy
(4.7)  $b_j = \sum_{i=1}^{n} a_i c^i_j, \quad j = 1, \dots, r.$

If one regards the covector a as a row vector, then (4.7) states that b is obtained by multiplying a on the right by the matrix $(c^i_j)$.

EXAMPLE 2 (continued). In this example the row covectors are $w^1 = \epsilon^1 - \epsilon^2 - \epsilon^3$, $w^2 = 2\epsilon^1 + \epsilon^2 + 4\epsilon^3$, $w^3 = -\epsilon^1 - \epsilon^3$, and $L^*(a) = a_1 w^1 + a_2 w^2 + a_3 w^3$.
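
Formulas (4.2) and (4.7) differ only in the side on which the matrix multiplies. The following Python sketch applies both to the matrix of Example 2 and compares $a \cdot L(t)$ with $L^*(a) \cdot t$ for one pair (a, t); the helper names and the sample covector and vector are assumptions chosen for illustration, and one sample is of course not a proof.

```python
# Matrix (c^i_j) of Example 2; row i holds the components of the covector w^i.
C = [[1, -1, -1],
     [2,  1,  4],
     [-1, 0, -1]]

def apply(A, t):
    """Formula (4.2): x^i = sum over j of c^i_j t^j (matrix times column vector)."""
    return [sum(c * tj for c, tj in zip(row, t)) for row in A]

def apply_dual(A, a):
    """Formula (4.7): b_j = sum over i of a_i c^i_j (row vector times matrix)."""
    return [sum(a[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

a = [1, 0, 2]     # sample covector in (E^3)*
t = [3, -1, 2]    # sample vector in E^3

x = apply(C, t)        # L(t)
b = apply_dual(C, a)   # L*(a)

print(x, b)
print(dot(a, x), dot(b, t))
```

The two printed scalars agree, as the duality between L and L* suggests.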

Note that Formulas (4.6) and (4.7) are statements about covectors corresponding to Formulas (4.4) and (4.2) about vectors. We say that (4.6) is dual to (4.4), and (4.7) dual to (4.2). Further aspects of the duality between L and L* are mentioned in Problem 5.

PROBLEMS

1. Let r = 3, n = 2, and L be the linear transformation such that $L(\epsilon_1) = e_1 - 2e_2$, $L(\epsilon_2) = e_1$, $L(\epsilon_3) = 5e_1 + e_2$. Find the matrix of L, the rank, and the kernel.
2. Show that the kernel of a linear transformation is a vector subspace of its domain.
3. Let r = n, and let $L^i(t) = c^i t^i$ for every $t \in E^n$, where $c^1, \dots, c^n$ are scalars.
(a) What is the matrix?
(b) Find $L^{-1}$ if it exists.
(c) If $c^1 = \cdots = c^n > 0$, then L is called homothetic about 0. Describe L geometrically. Show that if L and M are homothetic about 0, then $L^{-1}$ and $M \circ L$ are also homothetic about 0.
4. (a) Show directly from the definitions that the composite of two linear transformations is also linear.
(b) Let $(c^i_j)$, $(d^l_i)$, and $(b^l_j)$ denote respectively the matrices of L, M, and $M \circ L$. Show that
(4.8)  $b^l_j = \sum_{i=1}^{n} d^l_i c^i_j$, for l = 1, ..., p, j = 1, ..., r.

5. Let L* be the dual of the linear transformation L. Show that:
(a) $L^*(e^i) = w^i$, i = 1, ..., n, where $\{e^1, \dots, e^n\}$ is the standard basis for $(E^n)^*$, Section 3.2.
(b) $a \cdot L(t) = L^*(a) \cdot t$, for every covector $a \in (E^n)^*$ and vector $t \in E^r$.
(c) $L(E^r)$, $L^*[(E^n)^*]$ have the same dimension $\rho$, and L, L* have the same nullity $\nu$. (You may use the fact that row rank of a matrix equals column rank.)


4.2 Affine transformations


If L is linear, then L(0) = 0. This fact gives 0 a special role that is somewhat unnatural from the geometric viewpoint. Before proceeding to study rather general nonlinear transformations in Section 4.3, let us consider another special class of transformations called affine. The main result of the section is a characterization of the isometries of $E^n$ (Theorem 4.1).
The present section can be omitted at first reading. The material in it is mainly used later in the book in examples.

Definition. A transformation g is affine if there exist a linear transformation L and $x_0 \in E^n$ such that
$$g(t) = L(t) + x_0 \quad \text{for every } t \in E^r.$$

If r = n and L = I, then g is a translation.

A translation merely takes each t into $t + x_0$. If g is affine, then $g(0) = x_0$. Hence an affine transformation g is linear if and only if g(0) = 0. Every affine transformation is the composite of a translation and a linear transformation.

Isometries of En
Let g be a transformation from $E^n$ into $E^n$. If g preserves the distance between each pair of points, then g is called an isometry.

Definition. If $|g(s) - g(t)| = |s - t|$ for every $s, t \in E^n$, then g is an isometry of $E^n$.

Let us first suppose that g is an isometry of $E^n$ that leaves 0 fixed, namely, g(0) = 0. Then taking t = 0, we have $|g(s)| = |s|$ for every $s \in E^n$. Using the formula $|x - y|^2 = |x|^2 - 2x \cdot y + |y|^2$, we have
$$|g(s)|^2 - 2g(s) \cdot g(t) + |g(t)|^2 = |s|^2 - 2s \cdot t + |t|^2,$$
for every $s, t \in E^n$. Therefore
(4.9)  $g(s) \cdot g(t) = s \cdot t,$

which says that g preserves the euclidean inner product. Let $v_j = g(\epsilon_j)$, j = 1, ..., n. Then $|v_j| = 1$ and from (4.9)
$$v_i \cdot v_j = \delta_{ij}$$
for each i, j = 1, ..., n where, as in Section 1.2, $\delta_{ij}$ is Kronecker's delta. Hence $v_1, \dots, v_n$ form an orthonormal basis for $E^n$. Let us show that g is a linear transformation. For each s, t, we have from (4.9)
$$g(s + t) \cdot v_j = (s + t) \cdot \epsilon_j = s \cdot \epsilon_j + t \cdot \epsilon_j = g(s) \cdot v_j + g(t) \cdot v_j,$$
and hence
$$[g(s + t) - g(s) - g(t)] \cdot v_j = 0,$$
for each j = 1, ..., n. The vector g(s + t) - g(s) - g(t) has component 0 with respect to each basis vector $v_j$. Hence

$$g(s + t) = g(s) + g(t).$$

Similarly, g(cs) = cg(s) for every s and scalar c; thus g is linear. The column vectors of its matrix are $v_1, \dots, v_n$.

Definition. A linear transformation that preserves the euclidean inner product is an orthogonal transformation.

Proposition 4.1 (r = n). L is an orthogonal transformation if and only if the column vectors $v_1, \dots, v_n$ form an orthonormal basis for $E^n$.
PROOF. We have already shown that if L is orthogonal, then $v_1, \dots, v_n$ form an orthonormal basis. To prove the converse, we see from (4.4) that
$$L(s) \cdot L(t) = \sum_{i,j=1}^{n} s^i t^j\, v_i \cdot v_j.$$

If $v_1, \dots, v_n$ is an orthonormal basis, then $v_i \cdot v_j = \delta_{ij}$, and
$$L(s) \cdot L(t) = s \cdot t$$
for every $s, t \in E^n$. □

Theorem 4.1 (r = n). A transformation g is an isometry of $E^n$ if and only if g is an affine transformation of the form $g(t) = L(t) + x_0$ for every $t \in E^n$, where L is orthogonal.
PROOF. Let g be an isometry of $E^n$. Let $f(t) = g(t) - x_0$ for every $t \in E^n$, where $x_0 = g(0)$. Then
$$|f(s) - f(t)| = |g(s) - g(t)| = |s - t|$$
for every $s, t \in E^n$. Hence f is an isometry. Moreover, f(0) = 0. We have already shown that f must be orthogonal, so $g(t) = f(t) + x_0$ has the required form.
Conversely, let L be orthogonal. Then $L(s) \cdot L(t) = s \cdot t$ for every $s, t \in E^n$. Taking s = t, we have
$$|L(s)|^2 = L(s) \cdot L(s) = s \cdot s = |s|^2.$$
Hence $|L(s)| = |s|$. Replacing s by s - t, we have
$$|L(s) - L(t)| = |L(s - t)| = |s - t|$$
for every $s, t \in E^n$. Hence L is an isometry of $E^n$. Since $|g(s) - g(t)| = |L(s) - L(t)|$, g is also an isometry of $E^n$. □


If L is linear from $E^n$ into $E^n$, let $L^t$ denote the linear transformation whose matrix $(c^j_i)$ is obtained by exchanging rows and columns of the matrix $(c^i_j)$ of L. We call $L^t$ the adjoint of L. It is characterized by the formula
(4.10)  $y \cdot L(t) = L^t(y) \cdot t$
for every $y, t \in E^n$ (see Problem 5).
Let us apply (4.10) with y = L(s). We get
$$L(s) \cdot L(t) = (L^t \circ L)(s) \cdot t.$$
From this equation, L is orthogonal if and only if $s \cdot t = (L^t \circ L)(s) \cdot t$ for every $s, t \in E^n$. But this is equivalent to the statement that $s = (L^t \circ L)(s)$ for every s, in other words, that $I = L^t \circ L$. Hence L is orthogonal if and only if $L^t = L^{-1}$.
If L is orthogonal, then
$$1 = \det I = \det L^t \det L.$$
But $\det L^t = \det L$ ([12], p. 146). Hence $1 = (\det L)^2$, and $\det L = \pm 1$. If L is orthogonal and det L = 1, then L is called a rotation of $E^n$ about 0.

EXAMPLE 1. Any translation is an isometry of $E^n$, and L = I.

EXAMPLE 2. Let S be the orthogonal transformation which takes each $t = (t^1, \dots, t^n)$ into $S(t) = (t^1, \dots, t^{n-1}, -t^n)$. S is a reflection of $E^n$ about the hyperplane $t^n = 0$. Its matrix is
$$\begin{pmatrix} 1 & & & \\ & \ddots & & \\ & & 1 & \\ & & & -1 \end{pmatrix}.$$

Two such reflections take each t into itself; that is, $S \circ S = I$. Hence $S = S^{-1} = S^t$. If M is any orthogonal transformation with det M = -1, then $L = S \circ M$ is a rotation of $E^n$ about 0 and
$$M = S^{-1} \circ L = S \circ L.$$
Thus any orthogonal transformation is either a rotation or the composite of S and a rotation.

EXAMPLE 3. Let n = 2, and L be a rotation of the plane $E^2$ about (0, 0). Since $|v_1| = 1$, $v_1 = (\cos\theta)e_1 + (\sin\theta)e_2$ for some $\theta \in [0, 2\pi)$. Since L is a rotation, $v_2 = (-\sin\theta)e_1 + (\cos\theta)e_2$. The matrix is

$$\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.$$
The angle of rotation is $\theta$.
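
The properties of Examples 2 and 3 can be checked in floating point for one angle. In the sketch below the helper names, the angle $\pi/3$, and the tolerance used when comparing with the identity are illustrative assumptions.

```python
import math

def rotation(theta):
    """Matrix of the rotation of E^2 about (0, 0) through the angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def transpose(A):
    """Exchange rows and columns (the matrix of the adjoint)."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def det2(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

R = rotation(math.pi / 3)
S = [[1.0, 0.0], [0.0, -1.0]]   # reflection about the line t^2 = 0 (n = 2)

P = matmul(transpose(R), R)     # should be the identity up to rounding
print(P)
print(det2(R), det2(matmul(S, R)))
```

Up to rounding, the transpose times the matrix is the identity and the determinant is 1, while composing with the reflection S changes the determinant to -1.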


PROBLEMS

1. Let n = r = 2.
(a) Describe geometrically the linear transformation L with matrix G~).
(b) Find $S \circ L$ and $L \circ S$, where $S$ is the same as in Example 2. Show that both are rotations of $E^2$ about $(0, 0)$.

2. (a) Show that the vectors $v_1 = (1/\sqrt{5})(e_1 + 2e_3)$, $v_2 = (1/\sqrt{10})(-2e_1 + \sqrt{5}e_2 + e_3)$, $v_3 = (1/\sqrt{10})(2e_1 + \sqrt{5}e_2 - e_3)$ form an orthonormal basis for $E^3$.
(b) Let $L$ be the orthogonal transformation whose matrix has $v_1, v_2, v_3$ as column vectors. Find $L^t$ and verify that $L^t \circ L = I$. Is $L$ a rotation?

3. Let $L$ and $M$ be rotations of $E^n$ about $0$. Show that $L^{-1}$ and $M \circ L$ are also rotations.
4. (a) Show that the composite of two affine transformations is also affine.
(b) Which affine transformations are univalent?
5. Prove (4.10).

4.3 Differentiable transformations


We now begin a discussion of differential calculus for transformations from a portion of one euclidean space $E^r$ into another euclidean space $E^n$. The first step is to define the concept of differentiability.
Let $g$ be a transformation from $\Delta \subset E^r$ into $E^n$, and let $t_0$ be an interior point of $\Delta$. We would like to find a local linear approximation for the difference $g(t) - g(t_0)$. If there is such an approximation, then $g$ is said to be differentiable at $t_0$. More precisely:

Definition. A transformation $g$ is differentiable at $t_0$ if there exists a linear transformation $L$ (depending on $t_0$) such that
(4.11)  $\lim_{k \to 0} \frac{1}{|k|}[g(t_0 + k) - g(t_0) - L(k)] = 0.$

If we set $t = t_0 + k$, then $L(t - t_0)$ is the desired local approximation to $g(t) - g(t_0)$. If $n = 1$, the definition agrees with the one in Section 3.3 for real valued functions.

Definition. The linear transformation $L$ in (4.11) is called the differential of $g$ at $t_0$ and is denoted by $Dg(t_0)$.

The following proposition is an easy consequence of the definitions, and Proposition 2.1 (1) about limits of sums.

Proposition 4.2. Let $g$ and $G$ be differentiable at $t_0$. Then the sum $g + G$ is differentiable at $t_0$, and
$D(g + G)(t_0) = Dg(t_0) + DG(t_0).$

For formulas for the differentials of products, see Problem 12, Section 4.4.
In order to find the matrix associated with the differential $Dg(t_0)$ of a linear transformation $g$, let us first prove the following proposition. For each $t \in \Delta$,
$g(t) = g^1(t)e_1 + \cdots + g^n(t)e_n.$
The real valued functions $g^1, \dots, g^n$ are the components of $g$. Their partial derivatives are denoted by $g^i_j$ or $\partial g^i/\partial t^j$.

Proposition 4.3. (a) A transformation $g$ is differentiable at $t_0$ if and only if each of its components $g^1, \dots, g^n$ is differentiable at $t_0$. (b) If $g$ is differentiable at $t_0$, then the matrix of the linear transformation $L = Dg(t_0)$ is the matrix of partial derivatives $g^i_j(t_0)$.
PROOF. Let $L^1, \dots, L^n$ denote the components of $L$, and let
$\phi(k) = |k|^{-1}[g(t_0 + k) - g(t_0) - L(k)].$
The components of $\phi(k)$ are
$\phi^i(k) = |k|^{-1}[g^i(t_0 + k) - g^i(t_0) - L^i(k)], \quad i = 1, \dots, n.$
By Proposition 2.2, $\phi(k) \to 0$ as $k \to 0$ if and only if $\phi^i(k) \to 0$ as $k \to 0$, $i = 1, \dots, n$. This proves (a). It also shows that, if $L = Dg(t_0)$, then the rows of the matrix of $L$ are the covectors corresponding to the linear functions $L^1, \dots, L^n$ (in Section 4.1, these covectors were denoted by $w^1, \dots, w^n$). From Section 3.3 we recall that $L^i(k) = dg^i(t_0) \cdot k$ for every $k \in E^r$, where $dg^i(t_0)$ is the differential of the component $g^i$ at $t_0$. Therefore, the row covectors of the matrix of $L$ are $dg^1(t_0), \dots, dg^n(t_0)$. The $j$th component of $dg^i(t_0)$ is the partial derivative $g^i_j(t_0)$; see Formula (3.8). This proves (b). □

The partial derivatives of $g$ are defined just as for real valued functions:
$g_j(t_0) = \lim_{s \to 0} \frac{g(t_0 + s\epsilon_j) - g(t_0)}{s}, \quad j = 1, \dots, r,$
where $\epsilon_j$ denotes the $j$th standard basis vector of $E^r$.

By an argument like the proof of Proposition 4.3(a), the components of the vector $g_j(t_0)$ are the partial derivatives $g^i_j(t_0)$, $i = 1, \dots, n$. Thus $g_j(t_0)$ is the $j$th column vector of the matrix $(g^i_j(t_0))$ of partial derivatives. In the proof of Proposition 4.3, we showed that $dg^i(t_0)$ is the $i$th row covector of the matrix $(g^i_j(t_0))$.
In case $r = n$ the determinant of the linear transformation $Dg(t)$ is called the Jacobian of $g$ at $t$. It is denoted by $Jg(t)$. Thus
(4.12)  $Jg(t) = \det Dg(t) = \det(g^i_j(t)).$
Another common notation for the Jacobian is
$\frac{\partial(g^1, \dots, g^n)}{\partial(t^1, \dots, t^n)}.$


We see later that the Jacobian often plays the same role in the calculus of functions of several variables as the derivative does in the case $r = n = 1$. In particular, this is so in the theorems about inverses (Section 4.5) and transforming multiple integrals (Section 5.8). If the Jacobian is 0 at a point $t_0$, then $Dg(t_0)$ is singular. This suggests some kind of irregularity in the behavior of $g$ near $t_0$. In order to exclude such irregularities we repeatedly make the assumption in later sections that the Jacobian is not 0.

Transformation         $g = (g^1, \dots, g^n) = \sum_{i=1}^n g^i e_i$
Differential at $t$    $Dg(t)$
Matrix of $Dg(t)$      $(g^i_j(t))$, $i = 1, \dots, n$, $j = 1, \dots, r$
Column vectors         $g_j(t)$, $j = 1, \dots, r$
Row covectors          $dg^i(t)$, $i = 1, \dots, n$
Jacobian ($r = n$)     $Jg(t) = \det(g^i_j(t))$

$g^i_j(t) = \frac{\partial g^i}{\partial t^j}$,  $g_j(t) = \sum_{i=1}^n g^i_j(t)e_i$,  $dg^i(t) = \sum_{j=1}^r g^i_j(t)\epsilon^j$

$\{e_1, \dots, e_n\}$ standard basis for $E^n$
$\{\epsilon^1, \dots, \epsilon^r\}$ standard basis for $(E^r)^*$

EXAMPLE 1. If $n = 1$, then $Dg(t)$ has the single row covector $dg(t)$. We may identify $Dg(t)$ with $dg(t)$. If $dg(t) \ne 0$, the rank of $Dg(t)$ is 1; otherwise it is 0.

EXAMPLE 2. If $r = 1$, then there is a single column vector. It is the derivative, denoted by $g'(t)$. If $g'(t) \ne 0$, the rank is 1; otherwise it is 0.

EXAMPLE 3. Let $n = r = 2$. Points of the domain $\Delta$ are denoted by $(s, t)$ and those of the image $g(\Delta)$ by $(x, y)$. It is helpful to think of two copies of the plane $E^2$. The first contains $\Delta$ and is called the $st$-plane. The second contains $g(\Delta)$ and is called the $xy$-plane.
In our example we let $\Delta$ be the whole $st$-plane, and
$g(s, t) = (s^2 + t^2)e_1 + 2st\,e_2$
for every $(s, t) \in E^2$. If $(x, y) \in g(\Delta)$, then $x = s^2 + t^2$, $y = 2st$ and $x + y \ge 0$, $x - y \ge 0$. Therefore $g(\Delta)$ is contained in the quadrant $Q$ shown in Figure 4.2. In fact, $g(\Delta) = Q$. This is seen as follows. Let $C$ be a circle with center $(0, 0)$ and radius $a > 0$. Points of $C$ are given by $s = a\cos\phi$, $t = a\sin\phi$, where $0 \le \phi \le 2\pi$. The image $g(C)$ is the set of points
$g(a\cos\phi, a\sin\phi) = a^2 e_1 + (a^2 \sin 2\phi)e_2, \quad 0 \le \phi \le 2\pi.$

[Figure 4.2. The transformation $x = s^2 + t^2$, $y = 2st$.]

This set is the vertical line segment shown in Figure 4.2. By letting $a$ take all possible nonnegative values, one gets a collection of line segments covering $Q$. If $a = 0$, the line segment degenerates to the point $(0, 0)$. This shows that $Q = g(\Delta)$.
In Example 3, $g^1(s, t) = s^2 + t^2$, $g^2(s, t) = 2st$. The partial derivatives of $g^1$ appear in the first row of the matrix of $Dg(s, t)$, and those of $g^2$ in the second row. This matrix is therefore
$\begin{pmatrix} 2s & 2t \\ 2t & 2s \end{pmatrix}.$
The column vectors are the partial derivatives of $g$, namely,
$g_1(s, t) = 2s\,e_1 + 2t\,e_2,$
$g_2(s, t) = 2t\,e_1 + 2s\,e_2.$
The Jacobian is
$Jg(s, t) = \det\begin{pmatrix} 2s & 2t \\ 2t & 2s \end{pmatrix} = 4(s^2 - t^2).$
Note that $Jg(s, t) \ne 0$ if $s^2 \ne t^2$. In Section 4.5 we show that $g$ has a local inverse near any such point $(s, t)$. The points $(s, \pm s)$ where $s^2 = t^2$ map under $g$ onto the boundary half-lines $y = \pm x$, $x \ge 0$, on the right-hand side of Figure 4.2. The transformation $g$ has no local inverse near $(s, \pm s)$.
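
The matrix and Jacobian of Example 3 can be compared with a finite-difference approximation. The sketch below is an illustration only: the central-difference helper `numerical_jacobian_matrix`, the step size, and the sample points are assumptions made here, not part of the text.

```python
def g(s, t):
    """The transformation of Example 3: (x, y) = (s^2 + t^2, 2st)."""
    return (s * s + t * t, 2.0 * s * t)

def numerical_jacobian_matrix(f, s, t, h=1e-6):
    """Central-difference approximation of the 2 x 2 matrix of partial derivatives."""
    fs_plus, fs_minus = f(s + h, t), f(s - h, t)
    ft_plus, ft_minus = f(s, t + h), f(s, t - h)
    col1 = [(fs_plus[i] - fs_minus[i]) / (2 * h) for i in range(2)]  # approximates g_1
    col2 = [(ft_plus[i] - ft_minus[i]) / (2 * h) for i in range(2)]  # approximates g_2
    return [[col1[0], col2[0]], [col1[1], col2[1]]]

def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

s0, t0 = 1.5, 0.5                                  # sample point with s^2 != t^2
jac_numeric = det2(numerical_jacobian_matrix(g, s0, t0))
jac_exact = 4.0 * (s0 ** 2 - t0 ** 2)              # formula from the text

# On the line s = t the Jacobian should vanish.
jac_on_diagonal = det2(numerical_jacobian_matrix(g, 1.0, 1.0))
```

At the sample point the two values agree to within rounding error, and on the line $s = t$ the approximate Jacobian is close to 0, as the formula $4(s^2 - t^2)$ predicts.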
Let us now suppose that the domain $\Delta$ of $g$ is an open set.

Definition. If the components $g^1, \dots, g^n$ are of class $C^{(q)}$, $q \ge 0$, then $g$ is a transformation of class $C^{(q)}$. Similarly, if $g^1, \dots, g^n$ are of class $C^{(q)}$ on $B \subset \Delta$, then $g$ is of class $C^{(q)}$ on $B$.

If $g$ is differentiable at every $t_0 \in \Delta$, then $g$ is a differentiable transformation.


Theorem 4.2. Every differentiable transformation is of class $C^{(0)}$. Every transformation of class $C^{(1)}$ is differentiable.

PROOF. Apply Theorems 3.1, 3.2, and Proposition 4.3(a). □


For most theorems in the differential calculus of transformations, one needs to assume that $g$ is of class $C^{(1)}$ at least. An exception is the composite function theorem (Section 4.4), in which only differentiability need be assumed.
In the remainder of this section we establish several inequalities of a rather technical nature. These inequalities are used in the proofs of theorems to follow.
We first need to introduce a norm that measures the "size" of a linear transformation. Let
$\|L\| = \max\{|L(t)| : |t| \le 1\}.$
The set of all linear transformations with domain $E^r$ and values in $E^n$ forms a vector space of dimension $nr$. The usual properties of a norm are satisfied (Problem 7).
Let us show that
(4.13)  $|L(t)| \le \|L\|\,|t|$
for every $t \in E^r$. If $t = 0$, then both sides are 0. If $t \ne 0$, let $c = |t|^{-1}$. Since $L$ is linear, $L(ct) = cL(t)$. Since $|ct| = 1$, $|L(ct)| \le \|L\|$. Thus $|t|^{-1}|L(t)| \le \|L\|$, which is the same as (4.13).
Since $L(s) - L(t) = L(s - t)$, we have upon replacing $t$ by $s - t$ in (4.13)
(4.14)  $|L(s) - L(t)| \le \|L\|\,|s - t|.$
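
Inequality (4.13) can be tested on a concrete matrix. The following sketch makes two assumptions that are not in the text: the sample matrix is chosen arbitrarily, and $\|L\|$ is computed for a $2 \times 2$ matrix from the standard fact that $\|L\|^2$ is the largest eigenvalue of $L^t \circ L$. A sampled maximum of $|L(t)|$ over the unit circle is included as an independent check on that value.

```python
import math
import random

def apply(L, t):
    """Value L(t) for a 2 x 2 matrix L and a vector t in E^2."""
    return (L[0][0] * t[0] + L[0][1] * t[1], L[1][0] * t[0] + L[1][1] * t[1])

def length(v):
    """Euclidean length |v|."""
    return math.hypot(v[0], v[1])

def operator_norm_2x2(L):
    """||L|| as the square root of the largest eigenvalue of L^t L (2 x 2 case only)."""
    trace = sum(x * x for row in L for x in row)        # trace of L^t L
    det = L[0][0] * L[1][1] - L[0][1] * L[1][0]         # det(L^t L) = det(L)^2
    disc = max(trace * trace - 4.0 * det * det, 0.0)
    return math.sqrt((trace + math.sqrt(disc)) / 2.0)

L = [[2.0, 1.0], [0.0, 1.0]]                            # arbitrary sample matrix
norm_L = operator_norm_2x2(L)

# Independent estimate: maximum of |L(t)| over sampled unit vectors.
angles = (2.0 * math.pi * k / 3600 for k in range(3600))
sampled_max = max(length(apply(L, (math.cos(a), math.sin(a)))) for a in angles)

# Check (4.13) on random vectors; the small tolerance absorbs rounding error.
random.seed(0)
samples = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(1000)]
holds = all(length(apply(L, t)) <= norm_L * length(t) + 1e-12 for t in samples)
```

The sampled maximum never exceeds `norm_L` and approaches it as the sampling is refined, which matches the definition of $\|L\|$ as a maximum over $|t| \le 1$.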

Proposition 4.4. Let $g$ be differentiable at $t_0$. Then given $\epsilon > 0$ there exists a neighborhood $\Omega_0$ of $t_0$ such that $\Omega_0 \subset \Delta$ and
(4.15)  $|g(t) - g(t_0)| \le (\|Dg(t_0)\| + \epsilon)|t - t_0|$
for every $t \in \Omega_0$.
PROOF. Let $L = Dg(t_0)$ and set $\tilde{g}(t) = g(t) - L(t)$. Since $DL(t_0) = L$, $D\tilde{g}(t_0) = 0$ (the zero linear transformation). By (4.11), in which $g$ is replaced by $\tilde{g}$, there is a neighborhood $\Omega_0$ of $t_0$ such that
(*)  $|\tilde{g}(t) - \tilde{g}(t_0)| \le \epsilon|t - t_0|$
for every $t \in \Omega_0$. But
$g(t) - g(t_0) = [L(t) - L(t_0)] + [\tilde{g}(t) - \tilde{g}(t_0)].$
From (*), (4.14), and the triangle inequality we get (4.15). □
If $g$ is of class $C^{(1)}$, there is a stronger version of Proposition 4.4.


Proposition 4.5. Let $g$ be of class $C^{(1)}$ and $t_0 \in \Delta$. Then given $\epsilon > 0$ there exists a neighborhood $\Omega$ of $t_0$ such that $\Omega \subset \Delta$ and
(4.16)  $|g(s) - g(t)| \le (\|Dg(t_0)\| + \epsilon)|s - t|$
for every $s, t \in \Omega$.

PROOF. Let $\tilde{g}$ be as before. The row covectors $d\tilde{g}^i(t_0)$ are all 0. Since the partial derivatives of $\tilde{g}$ are continuous, given $\epsilon > 0$ there is a neighborhood $\Omega$ of $t_0$ such that $|d\tilde{g}^i(u)| < \epsilon/n$ for every $u \in \Omega$ and $i = 1, \dots, n$. By Corollary 1, Section 3.3, for every $s, t \in \Omega$,
$|\tilde{g}^i(s) - \tilde{g}^i(t)| \le \frac{\epsilon}{n}|s - t|,$
(**)  $|\tilde{g}(s) - \tilde{g}(t)| \le \sum_{i=1}^n |\tilde{g}^i(s) - \tilde{g}^i(t)| \le \epsilon|s - t|.$

From (**) and (4.14) we obtain (4.16) in the same way as before. □

PROBLEMS

1. In Example 3, find:
(a) The image of any vertical line s = c.
(b) The inverse image of any line y = mx through the origin.
(c) The image of the circular disk bounded by C in Figure 4.2.

2. Let $g(s, t) = |s - t|e_1 + |s + t|e_2$, $\Delta = E^2$. Find $g(E^2)$ and answer questions (a), (b), and (c) in Problem 1 for this transformation.

3. Let $g(s, t) = (t\cos 2\pi s)e_1 + (t\sin 2\pi s)e_2 + (1 - t)e_3$, $\Delta = E^2$.
(a) Show that $g(E^2)$ is a cone with vertex $e_3$.
(b) What is the image of the square $\{(s, t) : 0 \le s \le 1, 0 \le t \le 1\}$?
(c) Find $g^{-1}(\{e_3\})$ and $g^{-1}(\{e_1\})$.

4. Let $g(s, t) = \frac{1}{s^2 + st + t^2}e_1 + \frac{1}{(s^2 + st + t^2)^2}e_2$, $\Delta = \{(s, t) : 0 < s^2 + t^2 \le 1\}$.
(a) Show that $g(\Delta)$ is part of the parabola $y = x^2$, and find it.
(b) Find $g^{-1}(\{(c, c^2)\})$.

5. For each of Problems 2, 3, and 4, find:


(a) Where g is differentiable.
(b) The partial derivatives of g.
(c) The rank of Dg(s, t).
(d) The Jacobian Jg(s, t), where applicable.

6. (a) Let $g$ be affine, $g(t) = L(t) + x_0$ for every $t \in E^r$. Show that $Dg(t) = L$ for every $t \in E^r$.
(b) Let $g$ be a differentiable transformation such that $Dg$ is a constant function and $\Delta$ is a connected open set. Show that $g$ is the restriction to $\Delta$ of an affine transformation.


7. Show that:
(a) $\|L\| > 0$ unless $L$ has rank 0.
(b) $\|cL\| = |c|\,\|L\|$.
(c) $\|L_1 + L_2\| \le \|L_1\| + \|L_2\|$.
(d) $\|M \circ L\| \le \|M\|\,\|L\|$.
(e) The vector space $\mathcal{L}(E^r, E^n)$ of all linear $L$ from $E^r$ into $E^n$ has dimension $nr$.

8. Another norm for linear transformations, which we denote by $|||\ |||$, is defined as follows:
$|||L||| = |w^1| + \cdots + |w^n|,$
where $w^1, \dots, w^n$ are the row covectors. Show that properties (a)-(d) of Problem 7 hold for this norm. Show that $\|L\| \le |||L|||$.
9. Let $r = n$ and $g$ be a differentiable transformation. Then $g$ is called conformal if there exists a real-valued function $\mu$ such that $\mu(t) > 0$ and $\mu(t)Dg(t)$ is a rotation of $E^n$ for every $t \in \Delta$.
(a) Using Proposition 4.1 show that $g$ is conformal if and only if, for every $t \in \Delta$, $Jg(t) > 0$ and the partial derivatives of $g$ satisfy:

(1)

and

(2)

(b) Show that if $g$ is conformal, then $\mu(t) = [Jg(t)]^{-1/n}$.
(c) Let $n = 2$. Show that $g$ is conformal if and only if $Jg(t) > 0$ and $g^1_1(t) = g^2_2(t)$, $g^1_2(t) = -g^2_1(t)$ for every $t \in \Delta$. [Note: The partial differential equations $g^1_1 = g^2_2$, $g^1_2 = -g^2_1$ are the Cauchy-Riemann equations in the theory of complex analytic functions; see the work by Nehari [20], for example.]
10. Show that $g$ is conformal if $g^1(s, t) = \exp(s^2 - t^2)\cos 2st$, $g^2(s, t) = \exp(s^2 - t^2)\sin 2st$, $\Delta = E^2 - \{(0, 0)\}$.

11. Let $g$ be of class $C^{(1)}$. The maximum rank possible for $Dg(t)$ is $\min(r, n)$. Show that $\{t : \operatorname{rank} Dg(t) = \min(r, n)\}$ is open.

4.4 Composition
We now derive a rule for differentiating the composite of two differentiable transformations. As corollaries of the basic formula (4.17) we then obtain a formula for Jacobians and the chain rule for partial derivatives.
Let $g$ be a transformation from an open set $\Delta \subset E^r$ into an open set $D \subset E^n$, and let $f$ be a transformation from $D$ into $E^p$.

Composite function theorem. Let $g$ be differentiable at $t_0$ and $f$ be differentiable at $x_0 = g(t_0)$. Then the composite $F = f \circ g$ is differentiable at $t_0$ and
(4.17)  $DF(t_0) = Df(x_0) \circ Dg(t_0).$


PROOF. Let $L = Dg(t_0)$, $M = Df(x_0)$. Let us first prove the theorem for two special cases.
Case 1. $M = 0$. Let us show that
(*)  $\lim_{k \to 0} \frac{1}{|k|}[F(t_0 + k) - F(t_0)] = 0.$
Let $C = \|L\| + 1$. By Proposition 4.4 with $\epsilon = 1$, there exists $\delta_0 > 0$ such that
$|g(t_0 + k) - g(t_0)| \le C|k|$
whenever $|k| < \delta_0$. Since $M = 0$, $\|M\| = \|Df(x_0)\| = 0$. Proposition 4.4 (applied this time to $f$ rather than $g$ and $\epsilon$ replaced by $C^{-1}\epsilon$) implies that given $\epsilon > 0$ there exists $\eta > 0$ such that
$|f(x) - f(x_0)| \le \frac{\epsilon}{C}|x - x_0|$
whenever $|x - x_0| < \eta$. Let $\delta = \min\{C^{-1}\eta, \delta_0\}$. Then we get by taking $x = g(t_0 + k)$ and using
$f[g(t_0 + k)] = F(t_0 + k), \quad f[g(t_0)] = F(t_0),$
$|F(t_0 + k) - F(t_0)| \le \frac{\epsilon}{C}\,C|k| = \epsilon|k|,$
if $|k| < \delta$. This proves (*). But (*) says that $F$ is differentiable at $t_0$ with $DF(t_0) = 0$.
Case 2. When $f$ is a linear transformation, $f(x) = Mx$ for all $x$. In this case (4.11), together with Propositions 2.1 and 2.2, implies that
(**)  $\lim_{k \to 0} \frac{1}{|k|}M[g(t_0 + k) - g(t_0) - L(k)] = 0.$
Since $F(t) = M[g(t)]$ in the special case when $f$ is linear, (**) implies in this case that $F$ is differentiable at $t_0$, with $DF(t_0) = M \circ L$.
In the general case, let $\tilde{f} = f - M$. Then $D\tilde{f}(x_0) = 0$, and Case 1 applies to the transformation $\tilde{F} = \tilde{f} \circ g$. Moreover, Case 2 applies to $G = M \circ g$. Since $F = \tilde{F} + G$, the theorem now follows by Proposition 4.2. □

By (4.17), the matrix of $DF(t_0)$ is the product of the matrices of $M$ and $L$.
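
The matrix form of (4.17) can be illustrated numerically. In the sketch below, $g$ is the transformation of Example 3, Section 4.3, while $f(x, y) = (xy, x + y^2)$ is a hypothetical choice made only for this illustration. The product of the two matrices of partial derivatives is compared with a central-difference approximation of the matrix of $DF$.

```python
def g(s, t):
    """Transformation of Example 3, Section 4.3."""
    return (s * s + t * t, 2.0 * s * t)

def f(x, y):
    """Hypothetical second transformation, chosen only for this sketch."""
    return (x * y, x + y * y)

def F(s, t):
    """The composite F = f o g."""
    return f(*g(s, t))

def Dg(s, t):
    """Matrix of partial derivatives of g."""
    return [[2 * s, 2 * t], [2 * t, 2 * s]]

def Df(x, y):
    """Matrix of partial derivatives of f."""
    return [[y, x], [1.0, 2 * y]]

def matmul(a, b):
    """Product of two 2 x 2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def numerical_matrix(h, s, t, step=1e-6):
    """Central-difference approximation of the matrix of Dh(s, t)."""
    ds = [(a - b) / (2 * step) for a, b in zip(h(s + step, t), h(s - step, t))]
    dt = [(a - b) / (2 * step) for a, b in zip(h(s, t + step), h(s, t - step))]
    return [[ds[0], dt[0]], [ds[1], dt[1]]]

s0, t0 = 1.0, 2.0
x0, y0 = g(s0, t0)                              # x0 = g(t0) in the theorem
product = matmul(Df(x0, y0), Dg(s0, t0))        # matrix of M o L
numeric = numerical_matrix(F, s0, t0)           # approximate matrix of DF
max_error = max(abs(product[i][j] - numeric[i][j])
                for i in range(2) for j in range(2))
```

Note the order of the factors: the matrix of $Df$ is evaluated at $x_0 = g(t_0)$ and stands on the left.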


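Let us specialize to the case when $f$ and $F$ are real valued ($p = 1$) with $F = f \circ g$. Moreover, let us abbreviate by writing
$F_j = F_j(t_0), \quad f_i = f_i(x_0), \quad g^i_j = g^i_j(t_0).$
The matrix product becomes
$(F_1, \dots, F_r) = (f_1, \dots, f_n)\begin{pmatrix} g^1_1 & \cdots & g^1_r \\ \vdots & & \vdots \\ g^n_1 & \cdots & g^n_r \end{pmatrix}.$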


This is called the chain rule for partial derivatives. We rewrite it in more compact notation as:

Corollary 1 (chain rule).
(4.18)  $F_j = \sum_{i=1}^n f_i g^i_j, \quad j = 1, \dots, r.$

Note that $F_1, \dots, F_r$ are the components of the covector $dF(t_0)$, and $f_1, \dots, f_n$ the components of the covector $df(x_0)$. According to Formula (4.7), the chain rule (4.18) states that $dF(t_0) = L^*[df(x_0)]$ when $L = Dg(t_0)$. Thus, the chain rule can be regarded as a particular case of the formula for changing the components of a covector under the dual $L^*$ of the linear transformation $L$.
Another suggestive form for the important formula (4.18) is obtained by writing it with the other notation for partial derivatives:
$\frac{\partial F}{\partial t^j} = \frac{\partial f}{\partial x^1}\frac{\partial g^1}{\partial t^j} + \cdots + \frac{\partial f}{\partial x^n}\frac{\partial g^n}{\partial t^j}, \quad j = 1, \dots, r.$

Let us next consider the case when $F$ and $g$ are vector-valued functions of one real variable ($r = 1$). Their derivatives are denoted by $F'(t_0)$ and $g'(t_0)$. If we identify the $p \times 1$ matrix of $DF(t_0)$ with $F'(t_0)$, regarded as a column vector, and the $n \times 1$ matrix of $Dg(t_0)$ with $g'(t_0)$, then we get from (4.17):

Corollary 2. Let $r = 1$. Then $F'(t_0) = Df(x_0)[g'(t_0)]$.

Corollary 2 is used in the discussion of tangent vectors in Section 4.7.

Again let $p = 1$, and suppose that $f$ and $g$ are of class $C^{(q)}$ for some $q \ge 1$. In particular, $f$ and $g$ are differentiable. Formula (4.18) applies at every point $t \in \Delta$ and the corresponding point $x = g(t) \in D$. Thus
$F_j(t) = \sum_{i=1}^n (f_i \circ g)(t)\,g^i_j(t),$
for every $t \in \Delta$ and $j = 1, \dots, r$. Since all of the functions $f_i$, $g$, $g^i_j$ are continuous, each partial derivative $F_j$ is continuous. Hence $F$ is of class $C^{(1)}$. If $q \ge 2$, then repeated application of the chain rule shows that $F$ is of class $C^{(q)}$ and gives formulas for calculating its partial derivatives of orders $1, 2, \dots, q$.
In case $p > 1$ and $f$, $g$ are of class $C^{(q)}$, the preceding discussion shows that the components $F^1, \dots, F^p$ of $F$ are of class $C^{(q)}$, since $F^l = f^l \circ g$ for each $l = 1, \dots, p$. Therefore $F$ is of class $C^{(q)}$. We have proved:

Corollary 3. If $f$ and $g$ are of class $C^{(q)}$, then $F$ is of class $C^{(q)}$.


EXAMPLE 1. Let $r = p = 1$. The chain rule becomes
$F'(t) = \sum_{i=1}^n f_i[g(t)]\,g^{i\prime}(t),$
which can also be written
$F'(t) = df[g(t)] \cdot g'(t).$
If in addition $n = 1$, it becomes $F'(t) = f'[g(t)]g'(t)$, which is the composite function rule of elementary calculus.

EXAMPLE 2. Let $F(x) = f[x, g(x)]$, where $f$ and $g$ are of class $C^{(2)}$. In this case $g^1(x) = x$, $g^2(x) = g(x)$, and the formula in Example 1 becomes
$F'(x) = f_1[x, g(x)] + f_2[x, g(x)]g'(x).$
Another application of the chain rule together with the formula for the derivative of a product gives
$F''(x) = f_{11} + 2f_{12}g'(x) + f_{22}[g'(x)]^2 + f_2 g''(x).$
In this formula the partial derivatives of $f$ are evaluated at $(x, g(x))$.

EXAMPLE 3. Let $f$ be of class $C^{(2)}$ and let
$F(r, \theta) = f[r\cos\theta, r\sin\theta].$
Let us show that
$f_{11} + f_{22} = F_{11} + \frac{1}{r^2}F_{22} + \frac{1}{r}F_1.$
The expression on the left-hand side is called the Laplacian of $f$. The partial differential equation $f_{11} + f_{22} = 0$ is called Laplace's equation. Its solutions are called harmonic functions. The formula above expresses the Laplacian in polar coordinates.
In this example
$g^1(r, \theta) = r\cos\theta, \quad g^2(r, \theta) = r\sin\theta.$
Using the chain rule, we get
$F_1 = f_1 g^1_1 + f_2 g^2_1 = f_1\cos\theta + f_2\sin\theta,$
$F_2 = f_1 g^1_2 + f_2 g^2_2 = f_1(-r\sin\theta) + f_2(r\cos\theta).$
Further application of the chain rule gives
$F_{11} = \cos\theta(f_{11}\cos\theta + f_{12}\sin\theta) + \sin\theta(f_{21}\cos\theta + f_{22}\sin\theta),$
$F_{22} = -r\sin\theta[f_{11}(-r\sin\theta) + f_{12}(r\cos\theta)] + r\cos\theta[f_{21}(-r\sin\theta) + f_{22}(r\cos\theta)] - f_1 r\cos\theta - f_2 r\sin\theta.$
Combining terms and using the fact that $f_{21} = f_{12}$, we get
$F_{11} + \frac{1}{r^2}F_{22} = f_{11} + f_{22} - \frac{1}{r}(f_1\cos\theta + f_2\sin\theta) = f_{11} + f_{22} - \frac{1}{r}F_1.$
This is what we wished to show.

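Corollary 4. Let $n = r = p$, and let $f$ and $g$ be differentiable. Then, for every $t \in \Delta$,
(4.19)  $\frac{\partial(F^1, \dots, F^n)}{\partial(t^1, \dots, t^n)} = \frac{\partial(f^1, \dots, f^n)}{\partial(x^1, \dots, x^n)}\,\frac{\partial(g^1, \dots, g^n)}{\partial(t^1, \dots, t^n)},$
the Jacobians being evaluated at $t$ and at $x = g(t)$.
PROOF. By (4.17), $DF(t) = Df(x) \circ Dg(t)$. Hence
$\det DF(t) = \det Df(x)\,\det Dg(t).$ □

EXAMPLE 4. Let $n = r = p = 2$. Let $f(x, y) = f^1(x, y)E_1 + f^2(x, y)E_2$, $g(r, \theta) = (r\cos\theta)e_1 + (r\sin\theta)e_2$. As before, $E_1$, $E_2$ denote the standard basis vectors for the plane $E^2$ in which $f$ takes its values. Then
$\frac{\partial(g^1, g^2)}{\partial(r, \theta)} = \det\begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix} = r.$
Hence, by (4.19),
$\frac{\partial(F^1, F^2)}{\partial(r, \theta)} = r\,\frac{\partial(f^1, f^2)}{\partial(x, y)}.$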

PROBLEMS

Assume that all functions which occur in these problems are of class $C^{(2)}$.
1. Let $F(x, y) = f(x, xy)$. Find the mixed partial derivative $F_{12}$.

2. Let $F(x, y) = f[x, y, g(x, y)]$. Express the partial derivatives of $F$ of orders 1 and 2 in terms of those of $f$ and $g$.

3. Let $n = r = p = 2$. Find the Jacobian $\partial(F^1, F^2)/\partial(s, t)$ at the indicated point by means of Corollary 4.
(a) $f(x, y) = xyE_1 + x^2 yE_2$, $g(s, t) = (s + t)e_1 + (s^2 - t^2)e_2$, $(s_0, t_0) = (2, 1)$.
(b) $f(x, y) = \phi(x + y)E_1 + \phi(x - y)E_2$, $g(s, t) = (\exp t)e_1 + \exp(-s)e_2$, $(s_0, t_0) = (\log 2, 0)$.
4. (a) Show that the chain rule is still true if $p > 1$, namely,
$\frac{\partial F}{\partial t^j} = \sum_{i=1}^n \frac{\partial g^i}{\partial t^j}\frac{\partial f}{\partial x^i}, \quad j = 1, \dots, r.$
(b) Use it to find $(\partial F/\partial s)(s_0, t_0)$ and $(\partial F/\partial t)(s_0, t_0)$ in Problem 3(a).

5. Let $f(x, y) = \phi(x - cy) + \psi(x + cy)$, where $c$ is a scalar. Show that $f_{22} = c^2 f_{11}$.
6. Let
n=4 and
Let

Show that

[Note: The partial differential equation $f_{nn} = c^2(f_{11} + \cdots + f_{n-1,n-1})$ is called the wave equation in $n$ variables. Problem 5 gives D'Alembert's solution for $n = 2$. Solutions of the type in Problem 6 are called spherical waves.]

7. Suppose that $f$ satisfies the partial differential equation $f_2 = f_{11} + bf$, where $b$ is a scalar. Let $F(x, y) = \exp(-by)f(x, y)$. Show that $F_2 = F_{11}$.

8. Let $F = f \circ L$, where $L$ is a linear transformation with matrix $(c^i_j)$. Show that the second-order partial derivatives of $F$ satisfy

$j, l = 1, \dots, r.$

9. Using Problem 8 show that if $r = n$ and $L$ is orthogonal, then $F_{11} + \cdots + F_{nn} = f_{11} + \cdots + f_{nn}$. In other words, the Laplacian is invariant under orthogonal transformations of $E^n$. [Hint: $L^t = L^{-1}$.]

10. Let $n = r$. A linear transformation $L$ is a Lorentz transformation of $E^n$ if $L^{-1} = S \circ L^t \circ S$, where $S$ is as in Example 2, Section 4.2.
(a) Show that if $M$ and $L$ are Lorentz, then $M \circ L$ and $L^{-1}$ are also Lorentz. [Hint: $S^2 = I$.]
(b) Show that $L$ is Lorentz if and only if $S = L^t \circ S \circ L$.
(c) Show that $L$ is Lorentz if and only if
$\sum_{i=1}^{n-1}[L^i(t)]^2 - [L^n(t)]^2 = \sum_{i=1}^{n-1}(t^i)^2 - (t^n)^2$
for every $t$. [Hint: The right-hand side is $S(t) \cdot t$. Use (b).]

11. Show that if $L$ is Lorentz, then $F_{11} + \cdots + F_{n-1,n-1} - F_{nn} = f_{11} + \cdots + f_{n-1,n-1} - f_{nn}$. In other words, the wave operator is invariant under Lorentz transformations ($c = 1$).

12. Use the composite function theorem to establish the following rule for differentials of inner products. Let $\phi$ and $\psi$ be differentiable at $t_0$, both with values in $E^n$. Show that $\phi \cdot \psi$ is differentiable at $t_0$ and that $d(\phi \cdot \psi)(t_0) = \phi(t_0) \cdot D\psi(t_0) + \psi(t_0) \cdot D\phi(t_0)$. [Note: If $v \in E^n$ and $L$ is a linear transformation from $E^r$ into $E^n$, then $v \cdot L$ denotes the real valued linear function such that $(v \cdot L)(k) = v \cdot L(k)$ for all $k \in E^r$.] [Hint: Take $g = (\phi, \psi)$ with values in $E^{2n}$ and $f(x, y) = x \cdot y$.]


4.5 The inverse function theorem


In this section we suppose that $r = n$. Thus $g$ denotes a transformation from an open set $\Delta \subset E^n$ into $E^n$. We recall that $g$ is called univalent if $s \ne t$ implies $g(s) \ne g(t)$. A univalent transformation $g$ has an inverse, denoted by $g^{-1}$. The inverse $g^{-1}$ has as domain the set $g(\Delta)$. It is defined by $t = g^{-1}(x)$, where $t$ is the unique point of $\Delta$ such that $x = g(t)$. It occasionally happens that $g^{-1}$ can be found explicitly by solving the system of equations $x^i = g^i(t)$, $i = 1, \dots, n$, for the components $t^1, \dots, t^n$ in terms of $x$. However, the more common situation is either that these equations cannot be explicitly solved, or that it is inconvenient to solve them explicitly. One would like a criterion which guarantees that the inverse $g^{-1}$ exists, and a formula for its differential, without explicitly finding $g^{-1}$ itself.
Let us begin by reviewing the situation when $n = 1$. Let $g$ be a real valued function of class $C^{(1)}$, with domain an open interval $\Delta$. The inverse $g^{-1}$ exists if and only if $g$ is strictly monotone on $\Delta$ (Appendix A.4). It is proved in elementary calculus that the inverse is differentiable provided $g'(t) \ne 0$ for all $t \in \Delta$, and that
(4.20)  $(g^{-1})'(x) = \frac{1}{g'(t)}, \quad \text{if } x = g(t).$

In two or more dimensions the Jacobian $Jg(t)$ takes the place of the derivative $g'(t)$. However, the situation is by no means as simple as before. First of all, we have to assume that $g$ is at least of class $C^{(1)}$, a stronger condition than differentiability. Second, and more important, the fact that $Jg(t) \ne 0$ does not imply that $g$ has an inverse (see Example 2 below). However, the nonvanishing of the Jacobian $Jg(t_0)$ at a point $t_0$ implies that the restriction $g|\Delta_0$ has an inverse for some open set $\Delta_0$ containing $t_0$. This is part of the statement of the inverse function theorem. We sometimes call $(g|\Delta_0)^{-1}$ a local inverse of $g$.

Inverse function theorem. Let $g$ be a transformation of class $C^{(q)}$, $q \ge 1$, from an open set $\Delta \subset E^n$ into $E^n$. If $Jg(t_0) \ne 0$, then there exists an open set $\Delta_0$ containing $t_0$, such that:
(1) The restriction $g|\Delta_0$ is univalent.
(2) The set $g(\Delta_0)$ is open.
(3) The inverse $f$ of $g|\Delta_0$ is of class $C^{(q)}$.
(4) $Df(x) = [Dg(t)]^{-1}$, if $x = g(t)$, $t \in \Delta_0$.

The following argument should make the existence of a local inverse plausible. For $t$ near $t_0$, $g(t)$ is approximated by $G(t) = g(t_0) + L(t - t_0)$, where $L = Dg(t_0)$. Since the determinant of $L$, namely $Jg(t_0)$, is not 0, $L$ has an inverse. Hence, the affine transformation $G$ also has an inverse. Since $G$ is a good approximation to $g$ near $t_0$, this suggests that the restriction of $g$ to a small enough open set $\Delta_0$ containing $t_0$ should also have an inverse. This plausibility argument does not, of course, constitute a proof. In fact, it is a moderately difficult task to prove the inverse function theorem. We give a proof at the end of the section. In the meantime, we state some consequences of this theorem and examples.

[Figure 4.3. The transformation $g$ and its local inverse $f$, with $x_0 = g(t_0)$ ($f$ has domain $g(\Delta_0)$).]

The determinant of the inverse $L^{-1}$ of a linear transformation $L$ is $(\det L)^{-1}$. Therefore, part (4) of the inverse function theorem implies that:
(4.21)  $Jf(x) = \frac{1}{Jg(t)}, \quad t = f(x).$

The inverse function theorem has the following corollary.

Corollary. Let $g$ be of class $C^{(1)}$, and suppose that $Jg(t) \ne 0$ for all $t \in \Delta$. Then the image $g(B)$ of any open set $B \subset \Delta$ is an open set.

PROOF. Let $B \subset \Delta$ be open, and consider any $t_1 \in B$. We apply the inverse function theorem, with $\Delta$ replaced by $B$. There exists an open set $\Delta_1$ containing $t_1$, with $\Delta_1 \subset B$, such that $g(\Delta_1)$ is open. Therefore the point $x_1 = g(t_1)$ has a neighborhood $U_1$ such that $U_1 \subset g(\Delta_1)$. But $g(\Delta_1) \subset g(B)$, and hence $x_1$ is an interior point of $g(B)$. Since this is true for each $x_1 \in g(B)$, $g(B)$ is open. □

EXAMPLE 1. Let $g$ be as in Example 3, Section 4.3. We saw that $Jg(s, t) \ne 0$ provided $s^2 \ne t^2$. The lines $s = \pm t$ divide the plane into four open quadrants. The restriction of $g$ to any of these quadrants has an inverse, which can be found by explicitly solving the equations $x = s^2 + t^2$, $y = 2st$ for $(s, t)$ in terms of $(x, y)$. For instance, consider the quadrant $\Delta_0 = \{(s, t) : |t| < s, s > 0\}$. Then
$x + y = (s + t)^2, \quad x - y = (s - t)^2,$
(*)  $s = \frac{\sqrt{x + y} + \sqrt{x - y}}{2}, \quad t = \frac{\sqrt{x + y} - \sqrt{x - y}}{2}.$
The restriction $g|\Delta_0$ has all of the properties (1)-(4) of the inverse function theorem. Its inverse is $f = (f^1, f^2)$ where $f^1(x, y)$ and $f^2(x, y)$ are on the right side of equations (*). The open set $g(\Delta_0)$ is the interior of the quadrant $Q$ in Figure 4.2.
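
The explicit local inverse of Example 1 lends itself to a direct check. The sketch below, whose sample points and finite-difference step are assumptions made for the illustration, composes the inverse on the quadrant $|t| < s$ with $g$ and also compares the Jacobians as in (4.21).

```python
import math

def g(s, t):
    """Transformation of Example 3, Section 4.3."""
    return (s * s + t * t, 2.0 * s * t)

def f(x, y):
    """Local inverse of g on the quadrant |t| < s, from the equations (*)."""
    a, b = math.sqrt(x + y), math.sqrt(x - y)
    return ((a + b) / 2.0, (a - b) / 2.0)

# Sample points in the quadrant |t| < s, s > 0 (chosen arbitrarily).
points = [(2.0, 1.0), (3.0, -2.5), (0.5, 0.0)]
round_trip = [f(*g(s, t)) for s, t in points]

# Compare Jacobians at the first sample point, as in (4.21).
s0, t0 = points[0]
x0, y0 = g(s0, t0)
jac_g = 4.0 * (s0 ** 2 - t0 ** 2)

h = 1e-6  # finite-difference step (an assumption of this sketch)
fx = [(a - b) / (2 * h) for a, b in zip(f(x0 + h, y0), f(x0 - h, y0))]
fy = [(a - b) / (2 * h) for a, b in zip(f(x0, y0 + h), f(x0, y0 - h))]
jac_f = fx[0] * fy[1] - fy[0] * fx[1]   # approximate Jf(x0, y0)
```

Each entry of `round_trip` reproduces the corresponding sample point, and the product `jac_f * jac_g` is 1 up to the finite-difference error.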

EXAMPLE 2. Let $n = 2$, and
$g(s, t) = (\cosh s\cos t)e_1 + (\sinh s\sin t)e_2,$
where cosh and sinh are hyperbolic functions. Then
$g^1_1(s, t) = \sinh s\cos t, \quad g^1_2(s, t) = -\cosh s\sin t,$
$g^2_1(s, t) = \cosh s\sin t, \quad g^2_2(s, t) = \sinh s\cos t.$
The Jacobian is $\sinh^2 s\cos^2 t + \cosh^2 s\sin^2 t$, which simplifies because $\cosh^2 s = 1 + \sinh^2 s$ and $\cos^2 t + \sin^2 t = 1$ to
$Jg(s, t) = \sinh^2 s + \sin^2 t.$
If we take for $\Delta$ the right half-plane $s > 0$, then $\sinh s > 0$ and $Jg(s, t) > 0$. The hypotheses of the inverse function theorem are satisfied; hence local inverses exist. Since cos and sin are periodic, $g(s, t + 2\pi) = g(s, t)$. The transformation $g$ is not univalent, and consequently has no inverse. By the corollary, $g(\Delta)$ is an open set which, as we shall soon see, is $E^2$ with a line segment removed.
Let $\Lambda = \{(s, t) : s > 0, 0 < t < 2\pi\}$, and let $\tilde{g}$ be the restriction of $g$ to $\Lambda$. Let us show that $\tilde{g}$ has an inverse. It is not easy to solve the equations
$x = g^1(s, t) = \cosh s\cos t, \quad y = g^2(s, t) = \sinh s\sin t$
explicitly for $s$ and $t$. However, let us consider what happens on vertical straight lines $s = c$. For each $c > 0$, $g(c, t)$ represents on $[0, 2\pi]$ an ellipse with major semiaxis of length $\cosh c > 1$ and minor semiaxis of length $\sinh c$. Each of these ellipses has $\pm e_1$ as foci, and $g(c, 0) = g(c, 2\pi) = (\cosh c)e_1$. If $s_1 \ne s_2$, then the points $g(s_1, t_1)$ and $g(s_2, t_2)$ lie on different ellipses.

[Figure 4.4. The transformation $g$; the points $(s, t + 2\pi)$ and $(s, t - 2\pi)$ are marked in the $st$-plane.]

Moreover, $\tilde{g}(s, t_1) = \tilde{g}(s, t_2)$ implies $t_1 = t_2$. Hence $\tilde{g}(s_1, t_1) = \tilde{g}(s_2, t_2)$ implies that $(s_1, t_1) = (s_2, t_2)$, and $\tilde{g}$ is univalent. The image of $\Lambda$ is $E^2$ with the semi-infinite line on the $x$-axis from $-e_1$ to $\infty$ deleted. The part of the boundary of $\Lambda$ on the $s$-axis is transformed onto the part of the line from $e_1$ to $\infty$, and the vertical part of the boundary onto the part from $-e_1$ to $e_1$. Hence $g(\operatorname{cl}\Lambda) = E^2$. By periodicity each value which $g$ takes on $\Delta$ is also taken somewhere on $\Lambda$ or its lower boundary. Hence $g(\Delta)$ is $E^2$ with the line segment joining $-e_1$ and $e_1$ removed (see Figure 4.4).
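
Several claims of Example 2 can be checked at a sample point. The sketch below, with the point $(0.8, 1.1)$ chosen arbitrarily, evaluates the Jacobian from the four partial derivatives, compares it with the simplified form $\sinh^2 s + \sin^2 t$, confirms the periodicity that prevents $g$ from being univalent, and checks that $g(c, t)$ lies on the ellipse with semiaxes $\cosh c$ and $\sinh c$.

```python
import math

def g(s, t):
    """Transformation of Example 2."""
    return (math.cosh(s) * math.cos(t), math.sinh(s) * math.sin(t))

def jacobian(s, t):
    """Determinant of the matrix of partial derivatives listed in Example 2."""
    g11, g12 = math.sinh(s) * math.cos(t), -math.cosh(s) * math.sin(t)
    g21, g22 = math.cosh(s) * math.sin(t), math.sinh(s) * math.cos(t)
    return g11 * g22 - g12 * g21

s0, t0 = 0.8, 1.1                               # arbitrary point with s > 0
jac = jacobian(s0, t0)
simplified = math.sinh(s0) ** 2 + math.sin(t0) ** 2

# Periodicity: two distinct points of the half-plane with the same image.
p = g(s0, t0)
q = g(s0, t0 + 2.0 * math.pi)

# The point g(c, t) satisfies the equation of the ellipse for s = c.
ellipse_value = (p[0] / math.cosh(s0)) ** 2 + (p[1] / math.sinh(s0)) ** 2
```

The Jacobian is positive at the sample point, yet `p` and `q` coincide up to rounding, which illustrates that a nonvanishing Jacobian gives only local inverses.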

Regular transformations
By adding to the hypotheses of the corollary above the assumption that $g$ is univalent we obtain a property that we call regularity.

Definition ($r = n$). A transformation $g$ is regular if:
(1) $g$ is of class $C^{(1)}$,
(2) $g$ is univalent, and
(3) $Jg(t) \ne 0$ for every $t \in \Delta$.

A regular transformation $g$ has an inverse $g^{-1}$ which is also of class $C^{(1)}$. Regular transformations are called by many authors diffeomorphisms of class $C^{(1)}$. A transformation of class $C^{(0)}$ which has an inverse of class $C^{(0)}$ is called a homeomorphism (Section 2.6). By Theorem 4.2 every regular transformation is a homeomorphism.
One might expect naively that at worst a transformation distorts shapes, and that the image of a set has basically the same structure as the original. For instance, the image of a line segment should be a smooth curve with no self intersections, the interior of a set should transform onto the interior of the image, and so on. From various examples we know by now that this need not be the case at all. However, it is so for regular transformations. They are the ones that behave properly throughout calculus.
The notion of regular transformation is the basis for the discussion in Chapter 8 of coordinate changes on manifolds. The transformation law for multiple integrals will be proved in Chapter 5 only for regular transformations.
We turn now to a proof of the inverse function theorem. In preparation, we first establish two lemmas. In Lemma 1, $\phi$ denotes a transformation from some neighborhood $\Omega_1$ of $0$ of radius $\delta_1$ into $E^n$. We assume that
(4.22)  $|\phi(t)| \le c|t|$ for all $t \in \Omega_1$,
where $0 < c < 1$. This implies $\phi(\Omega_1) \subset \Omega_1$. Let us use the notation
$\phi^{[m]} = \phi \circ \cdots \circ \phi$
for the composition of $\phi$ with itself $m$ times. We set $\phi^{[0]} = I$, where $I$ is the identity transformation of $E^n$. By induction on $m$, $\phi^{[m]}(\Omega_1) \subset \Omega_1$, and in fact
(4.23)  $|\phi^{[m]}(t)| \le c^m|t|$ for all $t \in \Omega_1$.


Lemma 1. Let $\phi$ be continuous on $\Omega_1$ and satisfy (4.22) with $0 < c < 1$. Let $\Omega$ be the $\delta$-neighborhood of $0$, where $\delta \le (1 - c)\delta_1$; and let
$\psi(t) = \sum_{m=0}^{\infty}\phi^{[m]}(t)$ for all $t \in \Omega$.
Then, for all $t \in \Omega$,
$|\psi(t)| \le \frac{|t|}{1 - c}, \quad \psi(t) - \phi[\psi(t)] = t.$
PROOF. Consider, for $r = 1, 2, \dots$, the partial sum
$\psi_r(t) = \sum_{m=0}^{r}\phi^{[m]}(t).$
By (4.23) and comparison with the geometric series $|t|(1 + c + c^2 + \cdots)$, the series defining $\psi(t)$ converges. Moreover,
$|\psi_r(t)| \le \frac{|t|}{1 - c}, \quad |\psi(t)| \le \frac{|t|}{1 - c}.$
In particular, $\psi_r(t) \in \Omega_1$ and $\psi(t) \in \Omega_1$ for all $t \in \Omega$. Moreover,
$\psi_r - \phi \circ \psi_r = \sum_{m=0}^{r}\phi^{[m]} - \sum_{m=1}^{r+1}\phi^{[m]} = I - \phi^{[r+1]}.$
In other words,
$\psi_r(t) - \phi[\psi_r(t)] = t - \phi^{[r+1]}(t)$, for all $t \in \Omega$.
As $r \to \infty$, $\psi_r(t) \to \psi(t)$ and $\phi^{[r+1]}(t) \to 0$. Since $\phi$ is continuous, $\phi[\psi_r(t)] \to \phi[\psi(t)]$ as $r \to \infty$. This proves the desired relation between $\phi$ and $\psi$. □
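
The series of Lemma 1 can be summed numerically in a simple special case. The sketch below assumes, purely for illustration, that $\phi$ is the linear transformation of $E^2$ with the matrix shown in the code; in that case $\phi$ applied to a partial sum is the sum of the shifted terms, and $\|\phi\| < c = 1/2$, so (4.22) holds on every neighborhood of $0$. The number of terms retained is likewise an arbitrary choice.

```python
import math

C = 0.5  # contraction constant c of (4.22), valid for the matrix below

def phi(t):
    """A linear contraction of E^2 (an assumption of this sketch): |phi(t)| <= C|t|."""
    a = [[0.3, 0.2], [-0.1, 0.4]]
    return (a[0][0] * t[0] + a[0][1] * t[1], a[1][0] * t[0] + a[1][1] * t[1])

def psi(t, terms=60):
    """Partial sum of the series psi(t) = sum of the iterates phi^[m](t)."""
    total = (0.0, 0.0)
    iterate = t                                  # phi^[0](t) = t
    for _ in range(terms):
        total = (total[0] + iterate[0], total[1] + iterate[1])
        iterate = phi(iterate)                   # next iterate phi^[m+1](t)
    return total

t = (0.2, -0.1)
p = psi(t)
fp = phi(p)
# Residual of the relation psi(t) - phi[psi(t)] = t.
residual = (p[0] - fp[0] - t[0], p[1] - fp[1] - t[1])
bound = math.hypot(*t) / (1.0 - C)               # the bound |t| / (1 - c)
```

Both components of `residual` vanish up to rounding, and `math.hypot(*p)` stays below `bound`, in agreement with the two conclusions of the lemma for this choice of $\phi$.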

Lemma 2. Let g be of class C(l),from ~ into En, with Jg(t 1 ) =I- 0, tl E~. Then
there exist a neighborhood n oft 1 with n c ~,a neighborhood U afx 1 =
get!), and afunction F defined on U, such that:
(i) The restriction gin is univalent.
(ii) U c g(n) and F(U) c n.
(iii) g[F(x)J = xfor all x E U.
(iv) F is differentiable at Xl' and
DF(x 1) = [Dg(t 1)J-l.

PROOF. We may assume that $t_1 = 0$ and $x_1 = 0$. If this is not the case, we
first make translations in both $t$-space and $x$-space, replacing $g(t)$ by
$\tilde g(t) = g(t + t_1) - x_1$. Let $L = Dg(0)$, and $\phi = I - L^{-1} \circ g$. Note that the inverse
$L^{-1}$ exists since $Jg(0) \ne 0$. Moreover, $\phi(0) = 0$ and
$$D\phi(0) = I - L^{-1} \circ Dg(0) = I - L^{-1} \circ L = 0.$$
We apply Proposition 4.4 to $\phi$, with $\varepsilon = \tfrac12$. Since $D\phi(0) = 0$ we have, in some
neighborhood $\Omega_0$ of $0$ of radius $\delta_0$, $|\phi(t)| \le \tfrac12 |t|$. In (4.22) let $c = \tfrac12$. Choose
$\Omega$ to be the $\delta$-neighborhood of $0$, with $\delta \le \tfrac12 \delta_0$ sufficiently small that
Proposition 4.5 applies to $\phi$ when $\varepsilon = \tfrac12$.
To prove (i), suppose that $g(s) = g(t)$ with $s, t \in \Omega$. Then
$$\phi(s) = s - L^{-1}[g(s)], \qquad \phi(t) = t - L^{-1}[g(t)].$$
By subtracting, we get
$$\phi(s) - \phi(t) = s - t.$$
But $|\phi(s) - \phi(t)| \le \tfrac12 |s - t|$ by Proposition 4.5. Therefore, $s = t$. This
proves that $g|\Omega$ is univalent.
Let $\eta = \delta(2\|L^{-1}\|)^{-1}$, and let $U$ be the $\eta$-neighborhood of $0$. Define $F$
by
$$F(x) = \psi[L^{-1}(x)] \quad \text{for all } x \in U,$$
with $\psi$ as in Lemma 1. Since $c = \tfrac12$,
$$|F(x)| \le 2|L^{-1}(x)| \le 2\|L^{-1}\|\,|x| < \delta$$
if $x \in U$. Thus $F(U) \subset \Omega$. By definition of $\phi$ and Lemma 1,
$$I - \phi = L^{-1} \circ g, \qquad (I - \phi) \circ \psi = I,$$
the transformations in these equations being restricted to $\Omega$. Then
$$L^{-1} \circ g \circ \psi = I;$$
and $g \circ \psi = L$. In other words,
$$g[\psi(t)] = L(t), \quad \text{for all } t \in \Omega.$$
Set $t = L^{-1}(x)$, $x \in U$. Then $t \in \Omega$ and
$$g[F(x)] = g[\psi(t)] = L[L^{-1}(x)] = x.$$
This shows that $U \subset g(\Omega)$ and that $g[F(x)] = x$, completing the proof of
(ii) and (iii).
To prove (iv), we have by Lemma 1
$$\psi(t) - t = \phi[\psi(t)], \quad \text{for all } t \in \Omega.$$
Let $0 < c \le \tfrac12$. Since $\phi(0) = 0$ and $D\phi(0) = 0$, inequality (4.22) holds in some
neighborhood $\tilde\Omega_0$ of $0$, of radius $\tilde\delta_0$. Let $\tilde\delta \le (1 - c)\tilde\delta_0$. Then
$$|\psi(t) - t| \le c|\psi(t)| \le \frac{c}{1 - c}|t|, \quad \text{if } |t| < \tilde\delta.$$
Since $|t| \le \|L^{-1}\|\,|x|$ and $F(x) = \psi(t)$, we have
$$|F(x) - L^{-1}(x)| \le \frac{c\,\|L^{-1}\|\,|x|}{1 - c}, \quad \text{if } |x| < \tilde\eta,$$
where $\tilde\eta = \tilde\delta(\|L^{-1}\|)^{-1}$. Given any $\varepsilon > 0$ choose $c$ such that $0 < c \le \tfrac12$
and
$$\frac{c\,\|L^{-1}\|}{1 - c} \le \varepsilon.$$
For the corresponding $\tilde\eta$ we have
$$\frac{1}{|x|}|F(x) - L^{-1}(x)| \le \varepsilon, \quad \text{if } 0 < |x| < \tilde\eta.$$
Since $F(0) = 0$, this shows that
$$\lim_{x \to 0} \frac{1}{|x|}|F(x) - F(0) - L^{-1}(x)| = 0.$$
Thus $F$ is differentiable at $0$, and $L^{-1} = DF(0)$. $\square$


Proof of inverse function theorem
Given $t_0$, let us take for $\Delta_0$ a neighborhood of $t_0$ such that $g|\Delta_0$ is univalent.
Such a neighborhood exists by Lemma 2. Since $g$ is of class $C^{(q)}$, $q \ge 1$, $Jg$ is
continuous. Hence we can choose $\Delta_0$ such that $Jg(t) \ne 0$ for all $t \in \Delta_0$. Let
$f = (g|\Delta_0)^{-1}$; its domain is $g(\Delta_0)$. Consider any $t_1 \in \Delta_0$; and let $x_1 = g(t_1)$,
$\Omega$, $U$, $F$ be as in Lemma 2, with $\Omega \subset \Delta_0$. Since $U \subset g(\Omega) \subset g(\Delta_0)$ and
$g|\Delta_0$ is univalent, $F(x) = f(x)$ for all $x \in U$. Since each $x_1 \in g(\Delta_0)$ has such a
neighborhood $U$, $g(\Delta_0)$ is an open set. Moreover, by (iv) (Lemma 2) $f$ is
differentiable at $x_1$ and
$$Df(x_1) = [Dg(t_1)]^{-1}.$$
To complete the proof it remains only to show that $f$ is of class $C^{(q)}$.
By Theorem 4.2, $f$ is continuous since $f$ is differentiable at every point of
$g(\Delta_0)$. Since $Df(x) = [Dg(t)]^{-1}$, the product of the corresponding matrices
of first-order partial derivatives is the identity:
$$\delta^i_j = \sum_{l=1}^{n} f^i_l(x)\, g^l_j(t), \quad i, j = 1, \ldots, n,$$
where $t = f(x)$. Since each $g^l_j$ is continuous and $f$ is continuous, the composite
$g^l_j \circ f$ is continuous. Let $(y^l_j)$ be a nonsingular matrix. Cramer's rule [12]
expresses the solution $(z_1, \ldots, z_n)$ of the system of linear equations
$$b_j = \sum_{l=1}^{n} z_l\, y^l_j, \quad j = 1, \ldots, n,$$
as a rational function (quotient of two polynomials) in $y^1_1, \ldots, y^n_n$, with
denominator not $0$. Let us take $b_j = \delta^i_j$, $z_l = f^i_l(x)$, $y^l_j = g^l_j(t)$. Since $Jg(t) \ne 0$,
the matrices $(g^l_j(t))$ are nonsingular. Thus $f^i_l$ is the composite of a rational
function, with nonzero denominator, and the continuous functions $g^l_j \circ f$.
Thus, each partial derivative $f^i_l$ is continuous. This shows that $f$ is of class
$C^{(1)}$. If $g$ is of class $C^{(2)}$, then each $g^l_j$ is of class $C^{(1)}$. By Corollary 3, Section 4.4,
$g^l_j \circ f$ is of class $C^{(1)}$. Hence each $f^i_l$ is of class $C^{(1)}$, and $f$ of class $C^{(2)}$. By
repeating this argument, we find that $f$ is of class $C^{(q)}$ if $g$ is of class $C^{(q)}$. $\square$
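
The formula $Df(x_1) = [Dg(t_1)]^{-1}$ can be checked numerically. The following is a minimal sketch, not part of the proof: it assumes the map $g(s, t) = (e^s \cos t)e_1 + (e^s \sin t)e_2$ of Problem 5 below, uses a local inverse written out by hand, and compares a finite-difference approximation of $Df$ with the inverse matrix of $Dg$. All function names and the sample point are illustrative choices.

```python
import math

def g(s, t):
    """The map g(s, t) = (e^s cos t, e^s sin t)."""
    return (math.exp(s) * math.cos(t), math.exp(s) * math.sin(t))

def Dg(s, t):
    """Matrix of first-order partial derivatives of g at (s, t)."""
    es = math.exp(s)
    return ((es * math.cos(t), -es * math.sin(t)),
            (es * math.sin(t),  es * math.cos(t)))

def f(x, y):
    """A local inverse of g, used here only for x > 0."""
    return (0.5 * math.log(x * x + y * y), math.atan2(y, x))

def inverse_2x2(a):
    """Inverse of a 2 x 2 matrix with nonzero determinant."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return (( a[1][1] / det, -a[0][1] / det),
            (-a[1][0] / det,  a[0][0] / det))

def numerical_Df(x, y, h=1e-6):
    """Central-difference approximation to the matrix of Df at (x, y)."""
    fx_p, fx_m = f(x + h, y), f(x - h, y)
    fy_p, fy_m = f(x, y + h), f(x, y - h)
    return tuple(
        ((fx_p[i] - fx_m[i]) / (2 * h), (fy_p[i] - fy_m[i]) / (2 * h))
        for i in range(2)
    )

t1 = (0.3, 0.4)                      # sample point in t-space
x1 = g(*t1)                          # its image x1 = g(t1)
expected = inverse_2x2(Dg(*t1))      # [Dg(t1)]^{-1}
observed = numerical_Df(*x1)         # approximate Df(x1)
max_error = max(abs(expected[i][j] - observed[i][j])
                for i in range(2) for j in range(2))
```

In this sketch `max_error` should be small, limited only by the finite-difference step; it says nothing about how large the neighborhood $U$ of the theorem can be taken.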

PROBLEMS

1. Determine whether $Jg(t) \ne 0$ for all $t \in \Delta$. Find $g(\Delta)$. If $g$ is univalent, find $g^{-1}$
explicitly.
(a) $g(t) = t + x_0$ (a translation), $\Delta = E^n$.
(b) $g(s, t) = (s + 2t)e_1 + (s - t)e_2$, $\Delta = E^2$.
(c) $g(s, t) = (s^2 - s - 2)e_1 + 3te_2$, $\Delta = E^2$.
(d) $g(s, t) = (s^2 - t^2)e_1 + ste_2$, $\Delta = E^2 - \{(0, 0)\}$.
(e) $g(s, t) = (\log st)e_1 + [1/(s^2 + t^2)]e_2$, $\Delta = \{(s, t): 0 < t < s\}$.
2. Let $g(t) = t^4 + 2t^2$, $\Delta = (0, \infty)$. Find $g^{-1}$.
3. Let $r = n = 3$, and $g(s, t, u) = (u \cos st)e_1 + (u \sin st)e_2 + (s + u)e_3$. Then
$g(\epsilon_1 + \epsilon_3) = e_1 + 2e_3$. Let $f$ be a local inverse of $g$ such that $f(e_1 + 2e_3) = \epsilon_1 + \epsilon_3$. Find
$Df(e_1 + 2e_3)$ using (4) of the inverse function theorem.
4. In Example 2, what are the images of horizontal straight lines? Show that $g$ is a
conformal transformation (Problem 9, Section 4.3), and hence that the images of
vertical and horizontal straight lines intersect at right angles. Illustrate with a sketch.
5. Let $g(s, t) = (\exp s \cos t)e_1 + (\exp s \sin t)e_2$ and $\Delta = E^2$.
(a) Show that $Jg(s, t) \ne 0$ for all $(s, t)$, but $g$ is not univalent.
(b) Let $\Delta_1 = \{(s, t): 0 < t < 2\pi\}$. Show that the restriction of $g$ to $\Delta_1$ is univalent,
and find its inverse.
(c) Find $g(E^2)$.
(d) Show that $g$ is conformal.
6. Let $\Delta$ be an open convex set and $g$ a differentiable transformation such that
$\sum_{i,j=1}^{n} g^i_j(t)h^i h^j > 0$ for every $t \in \Delta$ and $h \ne 0$. Show that $g$ is univalent. [Hint:
Suppose that $g(t_1) = g(t_2)$. Let $h = t_2 - t_1$, $f(t) = [g(t) - g(t_1)] \cdot h$, and apply the
mean value theorem to $f$.] This result is due to H. Nikaidô.
7. Suppose that $g$ is of class $C^{(1)}$, with $Jg(t) \ne 0$, for all $t \in \Delta$. Given $x \notin g(\Delta)$, let
$\psi(t) = |x - g(t)|^2$. Show that $d\psi(t) \ne 0$ for all $t \in \Delta$.
8. Let $g$ be of class $C^{(1)}$. Suppose that there is a number $c > 0$ such that
$|g(s) - g(t)| \ge c|s - t|$ for all $s, t \in E^n$. Show that:
(a) $g$ is univalent.
(b) $Jg(t) \ne 0$ for all $t \in E^n$.
(c) $g(E^n) = E^n$. [Hint: Use Problem 7, and show that $\psi(t)$ has a minimum.]

4.6 The implicit function theorem


In elementary calculus there is a principle, often carelessly stated, that an
equation $\Phi(x, y) = 0$ "implicitly determines one of the variables $x$ or $y$ as a
function of the other." Actually, this statement is correct in a neighborhood
$U$ of any point $(x_0, y_0)$ such that $\Phi(x_0, y_0) = 0$ and at least one of the partial
derivatives $\Phi_1(x_0, y_0)$, $\Phi_2(x_0, y_0)$ is not $0$. This is a special case of the implicit
function theorem below.
More generally, suppose that $1 \le m < n$. Consider a transformation

(4.24) $\Phi = (\Phi^1, \ldots, \Phi^m)$

from an open set $D \subset E^n$ into $E^m$. The implicit function theorem ensures that
the equation $\Phi(x) = 0$ determines $m$ components of $x$ as functions of the
remaining $n - m$ components, in a neighborhood of any $x_0$ such that
$\Phi(x_0) = 0$ and $D\Phi(x_0)$ has maximum rank $m$.
The matrix of $D\Phi(x_0)$ is $m \times n$, with elements the partial derivatives
$\Phi^i_j(x_0)$, $i = 1, \ldots, m$, $j = 1, \ldots, n$.
If $D\Phi(x_0)$ has maximum rank $m$, then some set of $m$ columns of its matrix
is linearly independent. For the present, let us assume that the last $m$ columns
are linearly independent. Let $r = n - m$, and let

(4.25) $\mathcal{J}\Phi(x_0) = \dfrac{\partial(\Phi^1, \ldots, \Phi^m)}{\partial(x^{r+1}, \ldots, x^n)}$

denote the determinant of the submatrix with these columns. We also let

$\hat x = (x^1, \ldots, x^r)$, $\hat x_0 = (x_0^1, \ldots, x_0^r)$

denote the vectors obtained by taking only the first $r$ components of $x$ and
$x_0$. We seek to write the remaining components as $x^{r+l} = \phi^l(\hat x)$ for
$l = 1, \ldots, m$, and for $\hat x$ in some open set $R$ containing $\hat x_0$. When this is possible
we write for brevity $\phi = (\phi^1, \ldots, \phi^m)$ and $x = (\hat x, \phi(\hat x))$.

Implicit function theorem. Let $\Phi$ be of class $C^{(q)}$ from an open set $D \subset E^n$
into $E^m$, where $q \ge 1$ and $1 \le m < n$. Let $x_0 \in D$ be such that $\Phi(x_0) = 0$ and
$\mathcal{J}\Phi(x_0) \ne 0$. Then there exist a neighborhood $U$ of $x_0$, an open set $R \subset E^r$
containing $\hat x_0$, and $\phi = (\phi^1, \ldots, \phi^m)$ of class $C^{(q)}$ on $R$ such that:

$\mathcal{J}\Phi(x) \ne 0$ for all $x \in U$;

and
$\{x \in U: \Phi(x) = 0\} = \{(\hat x, \phi(\hat x)): \hat x \in R\}$.

PROOF. Since $\Phi$ is at least of class $C^{(1)}$, the Jacobian $\mathcal{J}\Phi$ is a continuous
function. By assumption it is not zero at $x_0$, and therefore is not zero for $x$
in some neighborhood $U_0$ of $x_0$.
Let us consider the transformation $f$, with domain $U_0$ and values in $E^n$,

$f^i(x) = x^i$, $i = 1, \ldots, r$,
$f^{r+l}(x) = \Phi^l(x)$, $l = 1, \ldots, m$, $m + r = n$.

The transformation $f$ is of class $C^{(q)}$. Its matrix of partial derivatives is

$$\left(\begin{array}{ccc|ccc}
1 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & \cdots & 1 & 0 & \cdots & 0 \\
\hline
\Phi^1_1 & \cdots & \Phi^1_r & \Phi^1_{r+1} & \cdots & \Phi^1_n \\
\vdots & & \vdots & \vdots & & \vdots \\
\Phi^m_1 & \cdots & \Phi^m_r & \Phi^m_{r+1} & \cdots & \Phi^m_n
\end{array}\right)$$

By properties of determinants, the Jacobian $Jf(x)$ equals the determinant
$\mathcal{J}\Phi(x)$ of the $m \times m$ block in the lower right-hand corner. Therefore
$Jf(x) \ne 0$. By the inverse function theorem, there is a neighborhood $U$ of $x_0$
such that $f(U)$ is an open set and the restriction $f|U$ has an inverse $g$ of
class $C^{(q)}$. Note here that the roles of the symbols $f$ and $g$ in Section 4.5 have
been reversed.

[Figure 4.5: the set $\{x: \Phi(x) = 0\}$ near $x_0$, drawn over the open set $R$ containing $\hat x_0$ in the $(x^1, \ldots, x^r)$-space as the graph $x^{r+l} = \phi^l(\hat x)$.]

Writing $(\hat x, 0)$ for $(x^1, \ldots, x^r, 0, \ldots, 0)$, let (see Figure 4.5)
$R = \{\hat x: (\hat x, 0) \in f(U)\}$.
Since $f(U)$ is an open set, $R$ is open. For every $\hat x \in R$, let
$\phi^l(\hat x) = g^{r+l}(\hat x, 0)$, $l = 1, \ldots, m$.
Then $x \in U$ and $\Phi(x) = 0$ if and only if $\hat x \in R$ and $f(x) = (\hat x, 0)$. Since $f|U$ and
$g$ are inverses, $f(x) = (\hat x, 0)$ if and only if $x = g(\hat x, 0)$. $\square$

EXAMPLE 1. Let $\Phi(x, y) = x^2 - y^2 - 1$. The set $H = \{(x, y): \Phi(x, y) = 0\}$
is a hyperbola. If $(x_0, y_0) \in H$ and $y_0 \ne 0$, then $\Phi_2(x_0, y_0) = -2y_0 \ne 0$.
The implicit function theorem states that $(x_0, y_0)$ has a neighborhood $U$
such that $H \cap U = \{(x, \phi(x)): x \in R\}$, where $R$ is an open interval containing
$x_0$ and $\phi$ is a suitable function. In this example, $\phi$ can be found explicitly by
solving the equation $\Phi(x, y) = 0$ for $y$. We get $\phi(x) = \pm(x^2 - 1)^{1/2}$, the
sign depending on whether $y_0 > 0$ or $y_0 < 0$. The interval $R$ is the projection
of $H \cap U$ onto the $x$-axis.
If $y_0 = 0$, $x_0 = \pm 1$, then $\Phi_2(x_0, y_0) = 0$ and the reasoning above fails.
However, at these points, $\Phi_1(x_0, y_0) \ne 0$ and $(x_0, y_0)$ has a neighborhood
$U$ such that $H \cap U = \{(\psi(y), y): y \in R\}$, where $\psi(y) = \pm(y^2 + 1)^{1/2}$ and $R$ is
the projection of $H \cap U$ onto the $y$-axis.
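
As a quick numerical illustration of Example 1, the sketch below compares the slope obtained from the chain rule, $\phi'(x) = -\Phi_1/\Phi_2$ evaluated at $(x, \phi(x))$, with a finite-difference slope of the explicit branch $\phi(x) = \pm(x^2 - 1)^{1/2}$. The sample point $x_0 = 2$ on the lower branch and the function names are assumptions made only for the illustration.

```python
import math

def Phi(x, y):
    """Phi(x, y) = x^2 - y^2 - 1, whose zero set is the hyperbola H."""
    return x * x - y * y - 1.0

def Phi_1(x, y):
    """Partial derivative of Phi with respect to x."""
    return 2.0 * x

def Phi_2(x, y):
    """Partial derivative of Phi with respect to y."""
    return -2.0 * y

def phi(x, sign):
    """Explicit branch y = sign * sqrt(x^2 - 1), with sign = +1 or -1."""
    return sign * math.sqrt(x * x - 1.0)

def implicit_slope(x, y):
    """phi'(x) = -Phi_1 / Phi_2 at (x, y); requires Phi_2(x, y) != 0."""
    return -Phi_1(x, y) / Phi_2(x, y)

x0, sign = 2.0, -1                 # a point with y0 < 0, so the minus sign applies
y0 = phi(x0, sign)
h = 1e-6
finite_difference = (phi(x0 + h, sign) - phi(x0 - h, sign)) / (2 * h)
slope = implicit_slope(x0, y0)
```

At the points $(\pm 1, 0)$ the function `implicit_slope` would divide by zero, which matches the remark that the reasoning fails there and one must solve for $x$ in terms of $y$ instead.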

The partial derivatives of $\phi^1, \ldots, \phi^m$ in the implicit function theorem
can be calculated in terms of those of $\Phi^1, \ldots, \Phi^m$ by means of the chain
rule and Cramer's rule. We illustrate the technique in two special cases.
Let us suppose that $q \ge 2$.
We first consider $n = 3$, $m = 1$. Suppose as in the implicit function theorem
that $\Phi(x_0, y_0, z_0) = 0$ and $\Phi_3(x_0, y_0, z_0) \ne 0$. Then there exist an open set $R$
containing $(x_0, y_0)$ and $\phi$ such that
(*) $\Phi[x, y, \phi(x, y)] = 0$,
and $\Phi_3[x, y, \phi(x, y)] \ne 0$ for every $(x, y) \in R$. Applying the chain rule to (*),
we get

(**) $\Phi_1 + \Phi_3\phi_1 = 0$, $\Phi_2 + \Phi_3\phi_2 = 0$,

so that $\phi_1 = -\Phi_1/\Phi_3$ and $\phi_2 = -\Phi_2/\Phi_3$.
In the formulas (**) the partial derivatives of $\Phi$ are evaluated at $(x, y, \phi(x, y))$.
To calculate the second-order partial derivatives $\phi_{11}$, $\phi_{12}$, $\phi_{22}$, the chain
rule is applied again. For instance, taking the partial derivative with respect
to the second variable in the first of equations (**), we get
$$\Phi_{12} + \Phi_{13}\phi_2 + [\Phi_{32} + \Phi_{33}\phi_2]\phi_1 + \Phi_3\phi_{12} = 0.$$
Substituting the expressions for $\phi_1$, $\phi_2$ obtained above and solving for $\phi_{12}$,
we get
$$\phi_{12} = -\frac{(\Phi_3)^2\Phi_{12} - \Phi_2\Phi_3\Phi_{13} - \Phi_1\Phi_3\Phi_{32} + \Phi_1\Phi_2\Phi_{33}}{(\Phi_3)^3}.$$
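
The formula for $\phi_{12}$ can be tested on a concrete case. The sketch below assumes the illustrative function $\Phi(x, y, z) = z^3 + z - xy - x$ (chosen here, not taken from the text, because $\Phi_3 = 3z^2 + 1$ never vanishes), solves $\Phi = 0$ for $z$ by Newton's method, and compares the formula with a finite-difference mixed partial derivative.

```python
def Phi_3(z):
    """Partial derivative of Phi = z^3 + z - x*y - x with respect to z."""
    return 3.0 * z * z + 1.0          # always positive

def phi(x, y):
    """Solve z^3 + z - x*y - x = 0 for z by Newton's method."""
    a = x * y + x
    z = 0.0
    for _ in range(100):
        z -= (z ** 3 + z - a) / Phi_3(z)
    return z

def phi_12_formula(x, y):
    """Mixed partial of phi computed from the formula for phi_12."""
    z = phi(x, y)
    P1, P2, P3 = -(y + 1.0), -x, Phi_3(z)          # first-order partials of Phi
    P12, P13, P32, P33 = -1.0, 0.0, 0.0, 6.0 * z   # second-order partials of Phi
    numerator = P3 ** 2 * P12 - P2 * P3 * P13 - P1 * P3 * P32 + P1 * P2 * P33
    return -numerator / P3 ** 3

def phi_12_numeric(x, y, h=1e-3):
    """Central-difference approximation to the mixed partial of phi."""
    return (phi(x + h, y + h) - phi(x + h, y - h)
            - phi(x - h, y + h) + phi(x - h, y - h)) / (4 * h * h)

formula_value = phi_12_formula(1.0, 0.5)
numeric_value = phi_12_numeric(1.0, 0.5)
```

The two values should agree up to the discretization error of the difference quotient; the comparison is only a sanity check at one point, not a proof of the formula.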
Let $m = 2$, $n = 3$, $r = n - m = 1$. Writing $\boldsymbol{\Phi} = (\Phi, \Psi)$ rather than
$(\Phi^1, \Phi^2)$, and $\phi$, $\psi$ rather than $\phi^1$, $\phi^2$, we have
$$\Phi[x, \phi(x), \psi(x)] = 0,$$
$$\Psi[x, \phi(x), \psi(x)] = 0,$$
and $\Phi_2\Psi_3 - \Phi_3\Psi_2 \ne 0$ for every $x \in R$. The partial derivatives in question
are evaluated at $(x, \phi(x), \psi(x))$. By the chain rule
$$\Phi_1 + \Phi_2\phi' + \Phi_3\psi' = 0,$$
$$\Psi_1 + \Psi_2\phi' + \Psi_3\psi' = 0,$$
and by Cramer's rule
$$\phi' = \frac{\Phi_3\Psi_1 - \Phi_1\Psi_3}{\Phi_2\Psi_3 - \Phi_3\Psi_2}, \qquad \psi' = \frac{\Phi_1\Psi_2 - \Phi_2\Psi_1}{\Phi_2\Psi_3 - \Phi_3\Psi_2}.$$
The second derivatives $\phi''$, $\psi''$ can be found by another application of the
chain rule.
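
The Cramer's rule formulas for $\phi'$ and $\psi'$ can be sketched in code. The example assumes the illustrative pair $\Phi(x, y, z) = y + z - x^2$, $\Psi(x, y, z) = y - z - x^3$, which is not from the text; it was picked because the equations can be solved by hand, giving $\phi(x) = (x^2 + x^3)/2$ and $\psi(x) = (x^2 - x^3)/2$ for comparison.

```python
def derivatives_by_cramer(P, Q):
    """Return (phi', psi') from the partials P = (Phi_1, Phi_2, Phi_3)
    and Q = (Psi_1, Psi_2, Psi_3), all evaluated at (x, phi(x), psi(x))."""
    det = P[1] * Q[2] - P[2] * Q[1]        # Phi_2 Psi_3 - Phi_3 Psi_2
    if det == 0:
        raise ValueError("Phi_2 Psi_3 - Phi_3 Psi_2 vanishes at this point")
    phi_prime = (P[2] * Q[0] - P[0] * Q[2]) / det
    psi_prime = (P[0] * Q[1] - P[1] * Q[0]) / det
    return phi_prime, psi_prime

x = 0.5
P = (-2.0 * x, 1.0, 1.0)           # partials of Phi = y + z - x^2
Q = (-3.0 * x * x, 1.0, -1.0)      # partials of Psi = y - z - x^3
phi_prime, psi_prime = derivatives_by_cramer(P, Q)

# Derivatives of the explicit solutions, for comparison.
explicit_phi_prime = (2.0 * x + 3.0 * x * x) / 2.0
explicit_psi_prime = (2.0 * x - 3.0 * x * x) / 2.0
```

Raising an error when the determinant vanishes mirrors the hypothesis $\Phi_2\Psi_3 - \Phi_3\Psi_2 \ne 0$; at such points a different pair of variables would have to be solved for.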
For convenience we assumed in (4.25) that the last $m$ columns of the
$m \times n$ matrix of partial derivatives $(\Phi^l_j(x_0))$ were linearly independent. More
generally, one need merely suppose that some set of $m$ columns is linearly
independent, in other words, that the linear transformation $D\Phi(x_0)$ has
maximum rank $m$. Let us suppose that columns $j_1, j_2, \ldots, j_m$ form a linearly
independent set, where we may suppose that $j_1 < j_2 < \cdots < j_m$. Let
$i_1, \ldots, i_r$ be those integers between $1$ and $n$ not included among $j_1, \ldots, j_m$,
with $i_1 < \cdots < i_r$. For brevity let us write $\lambda$ for the $r$-tuple of integers
$(i_1, \ldots, i_r)$ and $x^\lambda$ for the $r$-tuple $(x^{i_1}, \ldots, x^{i_r})$.
The implicit function theorem now states, roughly speaking, that locally
the equation $\Phi(x) = 0$ determines $x^{j_1}, \ldots, x^{j_m}$ as functions of $x^\lambda$. More
precisely, $U$, $R$, and $\phi^1, \ldots, \phi^m$ exist as before such that
$$\frac{\partial(\Phi^1, \ldots, \Phi^m)}{\partial(x^{j_1}, \ldots, x^{j_m})} \ne 0 \quad \text{at every } x \in U,$$
and
$$\{x \in U: \Phi(x) = 0\} = \{x: x^\lambda \in R,\ x^{j_l} = \phi^l(x^\lambda) \text{ for } l = 1, \ldots, m\}.$$
In the case we considered above, $j_1, \ldots, j_m$ are the integers $r + 1$,
$r + 2, \ldots, n$, $\lambda = (1, 2, \ldots, r)$, and then $x^\lambda = \hat x$.
EXAMPLE 2. Suppose that $m = 2$, $n = 5$, and
$$\frac{\partial(\Phi^1, \Phi^2)}{\partial(x^1, x^4)} \ne 0 \quad \text{at } x_0.$$
Then we can take $j_1 = 1$, $j_2 = 4$, $\lambda = (2, 3, 5)$.

Let $m = 1$. Then $D\Phi(x_0) = d\Phi(x_0)$, and $D\Phi(x_0)$ has maximum rank $1$ if
and only if at least one partial derivative $\Phi_j(x_0)$ is not zero. If $\Phi_j(x_0) \ne 0$,
then we can take $j_1 = j$ and $x^\lambda = (x^1, \ldots, x^{j-1}, x^{j+1}, \ldots, x^n)$.

EXAMPLE 3. Let
$\Phi(x, y, z) = x^2 + y^2 + z^2 - 2xz - 4$.
Then
$\Phi_1(x, y, z) = 2x - 2z$,
$\Phi_2(x, y, z) = 2y$,
$\Phi_3(x, y, z) = 2z - 2x$.
If $d\Phi(x, y, z) = 0$, then $y = 0$, $x = z$, and $\Phi(x, y, z) = -4 \ne 0$. The implicit
function theorem applies at any $(x_0, y_0, z_0)$ where $\Phi(x_0, y_0, z_0) = 0$. If
$x_0 \ne z_0$, then $\Phi_3(x_0, y_0, z_0) \ne 0$. We may take $j_1 = 3$, $\lambda = (1, 2)$, and
proceed as above. We may equally well take $j_1 = 1$, $\lambda = (2, 3)$. However, if
$x_0 = z_0$, then we must take $j_1 = 2$, $\lambda = (1, 3)$.

PROBLEMS

In each problem assume that $\Phi$ is of class $C^{(2)}$.

1. Let $\Phi[x, \phi(x)] = 0$ and $\Phi_2[x, \phi(x)] \ne 0$ for every $x \in R$. Find $\phi'$ and $\phi''$.
2. Let $\Phi[\phi(y, z), y, z] = 0$ and $\Phi_1[\phi(y, z), y, z] \ne 0$ for every $(y, z) \in R$. Find $\phi_{11}$.
3. Let $m = 2$, $n = 4$, $\Phi(x) = (x^2)^2 + (x^4)^2 - 2x^1x^3$, $\Psi(x) = (x^2)^3 + (x^4)^3 + (x^1)^3 - (x^3)^3$,
and $\boldsymbol{\Phi} = (\Phi, \Psi)$. Let $x_0 = (1, -1, 1, 1)$, $j_1 = 1$, $j_2 = 3$.
(a) Show that the hypotheses of the implicit function theorem are satisfied.
(b) Write $\phi^1 = \phi$, $\phi^2 = \psi$, where according to the theorem,
$(x^2)^2 + (x^4)^2 - 2\phi(x^2, x^4)\psi(x^2, x^4) = 0$,
$(x^2)^3 + (x^4)^3 + [\phi(x^2, x^4)]^3 - [\psi(x^2, x^4)]^3 = 0$
for every $(x^2, x^4) \in R$. Find the first-order partial derivatives of $\phi$ and $\psi$ at
$x_0^\lambda = (-1, 1)$.
4. Let $\Phi(x, y, z) = x^2 + 4y^2 - 2yz - z^2$, $x_0 = 2e_1 + e_2 - 4e_3$.
(a) Verify the hypotheses of the implicit function theorem.
(b) Find the largest neighborhood $U$ of $x_0$ such that $\Phi_3(x, y, z) \ne 0$ for every
$(x, y, z) \in U$.
(c) Find the largest neighborhood of $x_0$ containing no critical point of $\Phi$.
5. Let $\Phi(x, y) = x^2 - y^2$, $x_0 = (0, 0)$.
(a) Let $U$ be any neighborhood of $(0, 0)$, of radius $a$, and $R = (-a/\sqrt{2}, a/\sqrt{2})$.
Find a function $\phi$ such that $\Phi[x, \phi(x)] = 0$ for every $x \in R$.
(b) Show that no $\phi$ exists such that $\{(x, y) \in U: \Phi(x, y) = 0\} = \{(x, \phi(x)): x \in R\}$.
6. (a) Let $m = 2$. Let $\boldsymbol{\Phi} = (\Phi, \Psi)$, where $\Psi(x) = \theta(x)\Phi(x)$ for every $x \in D$ and $\theta$ is a
real valued function. Show that $D\boldsymbol{\Phi}(x_0)$ has rank less than $2$ at any $x_0$ such that
$\boldsymbol{\Phi}(x_0) = 0$.
(b) State and prove a corresponding result for $m > 2$.
7. Give an alternate proof of the implicit function theorem, in case $m = 1$, $n = 2$, by
carrying out the following steps. Let $\Phi$ be of class $C^{(1)}$ and suppose that $\Phi(x_0, y_0) = 0$,
$\Phi_2(x_0, y_0) \ne 0$. For definiteness assume that $\Phi_2(x_0, y_0) > 0$.
(a) Show that there exists $\varepsilon > 0$ such that $\Phi(x_0, y) < 0$ if $y_0 - \varepsilon \le y < y_0$ and
$\Phi(x_0, y) > 0$ if $y_0 < y \le y_0 + \varepsilon$.
(b) Show that there exists $\delta > 0$ such that $\Phi(x, y_0 - \varepsilon) < 0$ and $\Phi(x, y_0 + \varepsilon) > 0$
if $|x - x_0| < \delta$.
(c) Let $I = \{(x, y): |x - x_0| < \delta, |y - y_0| < \varepsilon\}$. The numbers $\varepsilon$ and $\delta$ in (a) and (b)
may be so chosen that $\Phi_2(x, y) > 0$ for every $(x, y) \in I$. Show that if $|x_1 - x_0| < \delta$
the equation $\Phi(x_1, y) = 0$ has exactly one solution $y_1$ with $(x_1, y_1) \in I$. Set
$y_1 = \phi(x_1)$. This defines $\phi$ on the open interval $(x_0 - \delta, x_0 + \delta)$.
(d) Show that $\phi$ is differentiable and that
$$\phi'(x) = -\frac{\Phi_1[x, \phi(x)]}{\Phi_2[x, \phi(x)]}.$$
In this proof of the theorem, the rectangle $I$ replaces the circular neighborhood $U$,
but this is unimportant. Can you extend this proof to the case $m = 1$, $n > 2$?

4.7 Manifolds
The word manifold is used in mathematics to describe a topological space that
locally is "like" euclidean $E^r$, for some $r$ called the dimension of the manifold.
For instance, a circle is locally like $E^1$. Such geometric figures in $E^3$ as
ellipsoids, cylinders, and tori are locally like $E^2$. A cone is not locally like $E^2$ near
its vertex.
We approach the idea of manifold from a rather concrete viewpoint.
For us, a manifold $M$ is a subset of some euclidean $E^n$ that can locally be
described by an equation $\Phi(x) = 0$, where $D\Phi(x)$ must have maximum rank.
Another definition of manifold can be given abstractly in terms of coordinate
systems. It has the advantage that one need not presuppose that $M$ is a subset
of some euclidean space. This will be discussed in Chapter 8.

Definition. Let $1 \le r < n$, $q \ge 1$. A nonempty set $M \subset E^n$ is a manifold
of dimension $r$ and class $C^{(q)}$ if $M$ has the property that for every $x_0 \in M$
there exist a neighborhood $U$ of $x_0$ and $\Phi = (\Phi^1, \ldots, \Phi^{n-r})$ of class
$C^{(q)}$ on $U$, such that $D\Phi(x)$ has rank $n - r$ for every $x \in U$ and

$M \cap U = \{x \in U: \Phi(x) = 0\}$.

Throughout the following discussion we take $q = 1$. For brevity we say
$r$-manifold instead of "manifold of dimension $r$ and class $C^{(1)}$." If $r = n$, let
us call any open subset of $E^n$ an $n$-manifold.

Let us indicate how an $r$-manifold $M$ is locally like $E^r$. First assume that
$\mathcal{J}\Phi(x_0) \ne 0$ with $\mathcal{J}\Phi(x)$ as in (4.25), and define $f$ as in the proof of the implicit
function theorem. The neighborhood of $x_0$ chosen in Section 4.6 need not
coincide with the neighborhood $U$ in the definition of manifold. In this section
let us denote the former neighborhood by $U_1$ rather than $U$, and let us denote
the set $R$ in Section 4.6 by $R_1$. We may suppose that $U_1 \subset U$. Now $f|U_1$ is a
regular transformation (Section 4.5), and
$$f(M \cap U_1) = \{(\hat x, 0): \hat x \in R_1\}$$
is a relatively open subset of the $r$-dimensional subspace of $E^n$ spanned by
$e_1, \ldots, e_r$. Therefore it is reasonable to say that $M \cap U_1$ is "like" $E^r$ (Figure
4.6). In case $\mathcal{J}\Phi(x_0) = 0$, one must replace the $r$-tuple of integers $1, 2, \ldots, r$
by some other $r$-tuple $\lambda$, as indicated in Section 4.6.

[Figure 4.6: the transformation $f$ carries $M \cap U_1$ onto $f(M \cap U_1)$, a flat piece of $f(U_1)$.]

EXAMPLE 1. Let $n = 2$, $r = 1$, and $H$ be the hyperbola $x^2 - y^2 = 1$. To show
that $H$ is a 1-manifold, let us take $\Phi(x, y) = x^2 - y^2 - 1$. Then $D\Phi(x, y) =
d\Phi(x, y) = 2xe^1 - 2ye^2$. If $(x, y) \ne (0, 0)$, then $d\Phi(x, y) \ne 0$ and the rank is $1$.
Given $(x_0, y_0) \in H$, let $U$ be any neighborhood of $(x_0, y_0)$ that does not
contain $(0, 0)$. In this example the choice of $\Phi$ does not depend on $(x_0, y_0)$.

EXAMPLE 2. Let $M$ be the union of $H$ and one of its asymptotes,
$M = \{(x, y): x^2 - y^2 = 1\} \cup \{(x, y): y = x\}$.
To show that $M$ is a 1-manifold we must show that given $(x_0, y_0) \in M$,
there exist $U$ and $\Phi$ such that $d\Phi(x, y) \ne 0$ in $U$ and
$M \cap U = \{(x, y) \in U: \Phi(x, y) = 0\}$.
If $(x_0, y_0) \in H$, then we let $\Phi(x, y) = x^2 - y^2 - 1$ as before, and let $U$ be any
neighborhood of $(x_0, y_0)$ that does not meet the asymptote $y = x$. However, if
$(x_0, y_0)$ is on the asymptote, we take $\Phi(x, y) = y - x$ and $U$ any neighborhood
of $(x_0, y_0)$ that does not meet $H$. In this example our choice of $\Phi$ depends on
$(x_0, y_0)$.

EXAMPLE 3. Let $M = \{(x, y): x^2 = y^2\}$. This set consists of the two lines
$y = \pm x$, and is not a 1-manifold. Roughly speaking, $M$ is not like $E^1$ near
the crossing point $(0, 0)$. More precisely, if $M$ were a 1-manifold, then by the
implicit function theorem the following would be true: Each $(x_0, y_0) \in M$ has
a neighborhood $U_1$ such that either $M \cap U_1 = \{(x, \phi(x)): x \in R_1\}$ or
$M \cap U_1 = \{(\psi(y), y): y \in R_2\}$, where $R_1$, $R_2$ are open. In the present example,
$(0, 0)$ has no such neighborhood.

Most examples of manifolds that we consider are obtained in the following
way. Let $\Phi$ be a transformation of class $C^{(1)}$, from an open set $D \subset E^n$ into
$E^m$. Let
(4.26) $M = \{x: \Phi(x) = 0 \text{ and } D\Phi(x) \text{ has rank } m\}$.
If $M$ is not empty, then it is an $r$-manifold, where $m = n - r$. To show that
$M$ is an $r$-manifold, in the definition of manifold let us choose this same $\Phi$ for
every $x_0 \in M$. By Problem 11, Section 4.3, $\{x: D\Phi(x) \text{ has rank } m\}$ is open.
Hence any $x_0 \in M$ has a neighborhood $U$ such that $D\Phi(x)$ has rank $m$ for
every $x \in U$, and $M \cap U = \{x \in U: \Phi(x) = 0\}$.

Definition. When (4.26) holds, we say that $M$ is the $r$-manifold determined by $\Phi$.

EXAMPLE 4. The $(n - 1)$-sphere $\{x: |x| = 1\}$ is an $(n - 1)$-manifold. In fact,
it is the $(n - 1)$-manifold determined by $\Phi$, where $\Phi(x) = |x|^2 - 1$. The only
critical point of $\Phi$ is $0$, which is not on the $(n - 1)$-sphere.

EXAMPLE 5. Let $F$ be real-valued and of class $C^{(1)}$. Consider a level set
$B_c = \{x: F(x) = c\}$.
Let $\Phi(x) = F(x) - c$. Then $d\Phi(x) = dF(x)$. If $B_c$ is not empty and contains
no critical points of $F$, then $B_c$ is the $(n - 1)$-manifold determined by $\Phi$. If $B_c$
contains critical points, then the $(n - 1)$-manifold determined by $\Phi$ is
$M_c = B_c - (\text{set of critical points of } F)$,
unless $M_c$ happens to be empty. We saw in Section 3.5 that $B_c$ need not
resemble $E^{n-1}$ near a critical point contained in $B_c$.

EXAMPLE 6. Let $F(x, y) = \exp(xy)$. The partial derivatives are $F_1(x, y) =
y \exp(xy)$ and $F_2(x, y) = x \exp(xy)$. The only critical point is $(0, 0)$, and
$F(0, 0) = 1$. If $c \le 0$, then $B_c$ is empty. If $c > 0$, $c \ne 1$, then $B_c$ is a 1-manifold.
In fact, $B_c$ is the hyperbola $xy = \log c$. The level set $B_1$ is the union of the $x$-
and $y$-axes, and is not a 1-manifold. $M_1 = B_1 - \{(0, 0)\}$ is a 1-manifold.

EXAMPLE 7. As in Example 5, if $F$ is of class $C^{(1)}$ and has values in $E^m$, $m =
n - r$, then
$M_c = \{x: F(x) = c \text{ and } DF(x) \text{ has rank } m\}$
is either empty or an $r$-manifold.

Tangent vectors to a manifold

Let $M$ be a manifold and $x_0 \in M$.

Definition. A vector $h$ is a tangent vector to $M$ at $x_0$ if there exists a function
$\psi$ from an interval $(-\delta, \delta)$ into $M$ such that $\psi(0) = x_0$ and $\psi'(0) = h$.

The definition can be restated in a way that is more appealing
geometrically. For brevity let us set $x_t = x_0 + th$. Then $h$ is a tangent vector if,
for some $\delta > 0$, there exists $y_t \in M$ whenever $0 < |t| < \delta$, such that

(*) $\displaystyle\lim_{t \to 0} \frac{|y_t - x_t|}{|t|} = 0$.

[Figure 4.7: a point $y_t \in M$ close to $x_t = x_0 + th$.]

If we set $\psi(t) = y_t$ for $0 < |t| < \delta$, and $\psi(0) = x_0$, then (*) states that $\psi'(0) = h$
(see Figure 4.7).
Let $T(x_0)$ denote the set of all tangent vectors at $x_0$. It is called the tangent
space to $M$ at $x_0$. If $r$ is the dimension of $M$, then it is plausible that the
tangent space is a vector space of dimension $r$. Let us show that this is true.
Let $U$ and $\Phi$ be the same as in the definition of manifold.

Theorem 4.3. The tangent space $T(x_0)$ is the kernel of the linear transformation
$D\Phi(x_0)$.

Since $D\Phi(x_0)$ has rank $m$, the kernel $T(x_0)$ is a vector subspace of $E^n$ with
dimension $r = n - m$.

PROOF OF THEOREM 4.3. Let $L = D\Phi(x_0)$. We must show that $h$ is a tangent
vector if and only if $L(h) = 0$.
Let $h \in T(x_0)$. Let $\psi$ be as in the definition of tangent vector. Then
$\Phi[\psi(t)] = 0$ for every $t \in (-\delta, \delta)$. Calculating the derivative of $\Phi \circ \psi$ by
Corollary 2, Section 4.4, we have
$$0 = L[\psi'(0)] = L(h).$$
Conversely, let $L(h) = 0$. For simplicity let us assume, as in Section
4.6, that the last $m$ columns of the matrix of $L$ are linearly independent.
Let $f$ be as in the proof of the implicit function theorem, and let $U_1$ be a
neighborhood of $x_0$ such that the restriction of $f$ to $U_1$ is regular. There
exists $\delta > 0$ such that $\hat x_0 + t\hat h \in R_1$ for every $t \in (-\delta, \delta)$. Let $g = (f|U_1)^{-1}$
and
$$\psi(t) = g(\hat x_0 + t\hat h, 0).$$
Then $\psi(t) \in M$ and $\psi$ is of class $C^{(1)}$ (Figure 4.8). We must show that $\psi'(0) = h$.
Let $\Lambda = Df(x_0)$. Then $\Lambda^{-1}$ is the differential of $g$ at the point $(\hat x_0, 0) =
f(x_0)$. Therefore
$$\psi'(0) = \Lambda^{-1}(\hat h, 0).$$

[Figure 4.8]

By definition of $f$,
$\Lambda^i(h) = h^i$, $i = 1, \ldots, r$,
$\Lambda^{l+r}(h) = d\Phi^l(x_0) \cdot h$, $l = 1, \ldots, m$.
Since $h$ is in the kernel of $D\Phi(x_0)$, $\Lambda^{l+r}(h) = 0$. Thus $\Lambda(h) = (\hat h, 0)$, and
$h = \Lambda^{-1}(\hat h, 0) = \psi'(0)$. $\square$

Normal vectors to a manifold

A vector $n$ is called normal to $M$ at $x_0$ if $n \cdot h = 0$ for every $h \in T(x_0)$. The
normal vectors form a vector space of dimension $m = n - r$, the orthogonal
complement of the tangent space $T(x_0)$. The gradient $\operatorname{grad} \Phi^l(x_0)$ is the vector
with the same components as the covector $d\Phi^l(x_0)$. By Theorem 4.3
$$\operatorname{grad} \Phi^l(x_0) \cdot h = d\Phi^l(x_0) \cdot h = 0$$
for every $h \in T(x_0)$ and $l = 1, \ldots, m$. Hence $\operatorname{grad} \Phi^1(x_0), \ldots, \operatorname{grad} \Phi^m(x_0)$ are
normal vectors to $M$ at $x_0$. Since $D\Phi(x_0)$ has rank $m$, these vectors are linearly
independent.
Thus, we have proved the following.

Corollary 1. The gradients $\operatorname{grad} \Phi^1(x_0), \ldots, \operatorname{grad} \Phi^m(x_0)$ form a basis for
the space of normal vectors to $M$ at $x_0$.

In particular, if $M$ is an $(n - 1)$-manifold, then $m = 1$; $\operatorname{grad} \Phi(x_0)$ is a
normal vector, and all others are scalar multiples of it.

Tangent r-planes
The $r$-plane tangent to $M$ at $x_0$ is
$\{x: x = x_0 + h,\ h \in T(x_0)\}$
(see Figure 4.9). The terms tangent line, tangent plane, and tangent hyperplane
are used when $r = 1$, $2$, and $n - 1$, respectively. The tangent $r$-plane is
$\{x: \operatorname{grad} \Phi^l(x_0) \cdot (x - x_0) = 0 \text{ for } l = 1, \ldots, m\}$.

[Figure 4.9: the tangent $r$-plane through $x_0$, with a typical point $x_0 + h$.]

EXAMPLE 8. Let
$M = \{(x, y, z): x^2 + y^2 + z^2 - 2xz - 4 = 0\}$.
According to Example 3, Section 4.6, $M$ is the 2-manifold determined by the
function $\Phi$ of that example. Then
$$\operatorname{grad} \Phi(x, y, z) = 2(x - z)e_1 + 2ye_2 + 2(z - x)e_3.$$
Let us find the spaces of tangent and normal vectors to $M$ at $(2, \sqrt{3}, 1)$.
$$\operatorname{grad} \Phi(2, \sqrt{3}, 1) = 2e_1 + 2\sqrt{3}e_2 - 2e_3.$$
This is a normal vector, and any other is a scalar multiple of it. The tangent
vectors $h$ satisfy
$$0 = \operatorname{grad} \Phi(2, \sqrt{3}, 1) \cdot h = 2h^1 + 2\sqrt{3}h^2 - 2h^3.$$
Two linearly independent solutions of this equation are $e_1 + e_3$ and
$\sqrt{3}e_1 - e_2$. These vectors form a basis for $T(2, \sqrt{3}, 1)$. The equation of the
tangent plane is $2(x - 2) + 2\sqrt{3}(y - \sqrt{3}) - 2(z - 1) = 0$, or
$x + \sqrt{3}y - z = 4$.
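
The arithmetic in Example 8 can be checked with a few lines of code. This is only a sanity-check sketch; the helper names `grad_Phi` and `dot` are illustrative.

```python
import math

def Phi(p):
    """Phi(x, y, z) = x^2 + y^2 + z^2 - 2xz - 4."""
    x, y, z = p
    return x * x + y * y + z * z - 2.0 * x * z - 4.0

def grad_Phi(p):
    """Gradient of Phi: (2(x - z), 2y, 2(z - x))."""
    x, y, z = p
    return (2.0 * (x - z), 2.0 * y, 2.0 * (z - x))

def dot(u, v):
    """Euclidean inner product of two vectors given as tuples."""
    return sum(a * b for a, b in zip(u, v))

root3 = math.sqrt(3.0)
x0 = (2.0, root3, 1.0)
normal = grad_Phi(x0)                              # (2, 2*sqrt(3), -2)
basis = [(1.0, 0.0, 1.0), (root3, -1.0, 0.0)]      # e1 + e3 and sqrt(3) e1 - e2
residuals = [dot(normal, h) for h in basis]        # both should vanish
plane_constant = dot(normal, x0) / 2.0             # right side of x + sqrt(3) y - z = c
```

Here `residuals` confirms that both basis vectors are orthogonal to the gradient, and `plane_constant` recovers the constant in the tangent plane equation up to rounding.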
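EXAMPLE 9. Let $M \subset E^3$ be a 1-manifold. Let us find the tangent space at
$(x_0, y_0, z_0) \in M$. Let us write $\boldsymbol{\Phi} = (\Phi, \Psi)$. The tangent vectors satisfy
$$0 = \operatorname{grad} \Phi(x_0, y_0, z_0) \cdot h = \operatorname{grad} \Psi(x_0, y_0, z_0) \cdot h.$$
From Cramer's rule one solution of this pair of linear equations is the
vector $d$ with components (Problem 8)
$$d^1 = \Phi_2\Psi_3 - \Phi_3\Psi_2, \quad d^2 = \Phi_3\Psi_1 - \Phi_1\Psi_3, \quad d^3 = \Phi_1\Psi_2 - \Phi_2\Psi_1,$$
the partial derivatives being evaluated at $(x_0, y_0, z_0)$. The tangent space
$T(x_0, y_0, z_0)$ consists of all scalar multiples of $d$.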
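
A vector orthogonal to both gradients in $E^3$ can be computed as their cross product; the sketch below does this for an illustrative pair of surfaces chosen here (a sphere and a horizontal plane, not from the text) and checks that the result is orthogonal to both gradients.

```python
def cross(u, v):
    """Cross product of two vectors in E^3."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

# Assumed example: Phi = x^2 + y^2 + z^2 - 3 and Psi = z - 1, at the point (1, 1, 1).
point = (1.0, 1.0, 1.0)
grad_Phi = (2.0 * point[0], 2.0 * point[1], 2.0 * point[2])
grad_Psi = (0.0, 0.0, 1.0)

d = cross(grad_Phi, grad_Psi)        # a candidate tangent vector to the curve
checks = (dot(grad_Phi, d), dot(grad_Psi, d))
```

For this pair the intersection is the circle $x^2 + y^2 = 2$, $z = 1$, and `d` comes out as $(2, -2, 0)$, which is tangent to that circle at $(1, 1, 1)$. If the two gradients were linearly dependent, `d` would be the zero vector and would not span a tangent line.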

In Example 9 one can think of $M$ as the intersection of two surfaces,
obtained by setting $\Phi = 0$ and $\Psi = 0$ separately. Since $\operatorname{grad} \Phi(x, y, z)$ and
$\operatorname{grad} \Psi(x, y, z)$ are normal to these surfaces and are linearly independent for
$(x, y, z)$ near $(x_0, y_0, z_0)$, the tangent planes cannot coincide at points on the
intersection $M$.
This idea can be extended to the intersection of manifolds of other
dimensions.

*Intersections of manifolds
Let $M$ be an $r$-manifold and $N$ an $s$-manifold, with $M \cap N$ nonempty. Let us
assume that $r + s > n$. Let $T_1(x_0)$ denote the tangent space to $M$ at a point
$x_0 \in M \cap N$, and $T_2(x_0)$ the tangent space to $N$ at $x_0$. There exist a
neighborhood $U_1$ of $x_0$ and $\Phi = (\Phi^1, \ldots, \Phi^{n-r})$ such that $D\Phi(x)$ has rank $n - r$ for
every $x \in U_1$ and
$M \cap U_1 = \{x \in U_1: \Phi(x) = 0\}$.
In the same way, there exist a neighborhood $U_2$ of $x_0$ and $\Psi = (\Psi^1, \ldots, \Psi^{n-s})$
such that $D\Psi(x)$ has rank $n - s$ for every $x \in U_2$ and
$N \cap U_2 = \{x \in U_2: \Psi(x) = 0\}$.
Let $U = U_1 \cap U_2$, and $\Theta = (\Phi^1, \ldots, \Phi^{n-r}, \Psi^1, \ldots, \Psi^{n-s})$. Then
$(M \cap N) \cap U = \{x \in U: \Theta(x) = 0\}$.
Let $T_1(x_0) \cap T_2(x_0)$ have dimension $r + s - n$ at each $x_0 \in M \cap N$. The
kernel of $D\Theta(x_0)$ is $T_1(x_0) \cap T_2(x_0)$. Hence $D\Theta(x_0)$ has the desired rank
$n - (r + s - n) = (n - r) + (n - s)$. From the definition, $M \cap N$ is an
$(r + s - n)$-manifold.

EXAMPLE 10. Let $n = 3$, $r = s = 2$. If the tangent planes of $M$ and $N$ do not
coincide at any point of $M \cap N$, then $M \cap N$ is a 1-manifold. The tangent
line at $x_0 \in M \cap N$ is the intersection of the tangent planes to $M$ and $N$ at $x_0$.

PROBLEMS

In Problems 1-6 and 11, one can show that the set in question is a manifold
by verifying (4.26) for suitable $\Phi$.
1. Let $F(x, y) = \exp(x^2 + 2y^2 + 2)$. Find the level sets and determine which are
1-manifolds.
2. (a) Show that if $c \ne 0$, then the hyperboloid $x^2 + y^2 - 4z^2 = c$ is a 2-manifold.
Is the cone $x^2 + y^2 = 4z^2$ a 2-manifold?
(b) Find the tangent plane at $2e_1 - e_2 + e_3$ to the hyperboloid $x^2 + y^2 - 4z^2 = 1$.
3. Let $M = \{(x, y, z): xy = 0, x^2 + y^2 + z^2 = 1, z \ne \pm 1\}$. Show that $M$ is a
1-manifold. Sketch $M$.
4. Let $f$ be of class $C^{(1)}$ on an open set $A \subset E^2$. Let $M = \{(x, y, f(x, y)): (x, y) \in A\}$.
(a) Show that $M$ is a 2-manifold.
(b) Show that $(f_1(x, y), f_2(x, y), -1)$ is a normal vector to $M$ at $(x, y, f(x, y))$.
(c) Show that the equation for a tangent plane agrees with the one in Section 3.3.
(d) State the corresponding results when $f$ is a function of $n$ variables.

5. Let $A \subset E^1$ be open, and $f$, $g$ be real valued functions of class $C^{(1)}$ on $A$.
(a) Show that $M = \{(x, f(x), g(x)): x \in A\}$ is a 1-manifold.
(b) Show that $(1, f'(x), g'(x))$ is a tangent vector to $M$ at $(x, f(x), g(x))$.
6. Let $M = \{(x, y): x^y = y^x, x > 0, y > 0, (x, y) \ne (e, e)\}$, where $e$ is the base for
natural logarithms. Show that $M$ is a 1-manifold. Make a sketch.
7. Let $M = \{(x, y, z): xy = xz = 0\}$. Is $M$ a 1-manifold?
8. In Example 9 show that $d$ is a tangent vector, using Cramer's rule.
9. Let $M$ and $N$ be $r$-manifolds such that $(\operatorname{cl} M) \cap N$ and $M \cap \operatorname{cl} N$ are empty, $M \subset E^n$,
$N \subset E^n$. Prove that $M \cup N$ is an $r$-manifold.
10. Let $M$ be an $r$-manifold and $A$ an open set such that $M \cap A$ is not empty. Prove
that $M \cap A$ is an $r$-manifold.
11. Let $M = \{x: \sum_{i,j=1}^{n} c_{ij}x^i x^j = 1\}$, where the matrix $(c_{ij})$ has rank $n$ and is symmetric.
(a) Show that $M$ is an $(n - 1)$-manifold.
(b) Show that the equation of the tangent hyperplane at $x_0 \in M$ is
$$\sum_{i,j=1}^{n} c_{ij}x^i x_0^j = 1.$$
12. (Product manifolds.) Let $M \subset E^n$ be an $r$-manifold and $N \subset E^m$ an $s$-manifold.
Regarding $E^{m+n}$ as the cartesian product $E^n \times E^m$, show that $M \times N$ is an
$(r + s)$-manifold. Show that the tangent space at a point of $M \times N$ is the cartesian product
of the tangent spaces at the corresponding points of $M$ and $N$.
13. Let points of $E^4$ be denoted by $(x, y, u, v)$. Let $C = \{x: x^2 + y^2 = 1, u^2 + v^2 = 1\}$,
$K = \{x: x^2 + y^2 \le 1, u^2 + v^2 \le 1\}$, and $B = \operatorname{fr} K$ (boundary of $K$).
(a) Show that $C$ is a 2-manifold.
(b) Show that $B - C$ is a 3-manifold. [Hint: Use Problems 9 and 12.]
14. Let $M$, $\Phi$, and $U$ be as in the definition of "manifold." For each $l = 1, \ldots, m$,
let $\Psi^l(x) = g^l(x)\Phi^l(x)$, where $g^l$ is of class $C^{(1)}$ and $g^l(x_0) \ne 0$. Show that there
is a neighborhood $U_0$ of $x_0$ such that $D\Psi(x)$ has rank $m$ for every $x \in U_0$ and
$M \cap U_0 = \{x \in U_0: \Psi(x) = 0\}$. [Hint: Show that $d\Psi^1(x_0), \ldots, d\Psi^m(x_0)$ are linearly
independent.]
15. Let $f$ be a function of class $C^{(2)}$ on $E^2$, each of whose critical points is isolated.
Suppose that $A = \{x: f(x) > 0\}$ is connected and contains two points of relative
maximum. Show that $A$ contains a saddle point.
16. Let us identify $E^{n^2}$ with the set of all linear transformations from $E^n$ into itself by
associating with each linear transformation $L$ the vector
$$\sum_{i,j=1}^{n} c^i_j e_{i + (j-1)n},$$
where $(c^i_j)$ is the matrix of $L$. Let $O(n)$ be the set of orthogonal transformations of $E^n$.
(a) Show that $O(n)$ is a manifold of dimension $\tfrac12 n(n - 1)$ and class $C^{(\infty)}$. [Hints: It
suffices to show that $I$ has a neighborhood $U$ with the properties required in
the definition of manifold. Use Proposition 4.1. Here $I$ is the identity
transformation.]
(b) Show that $L$ is a tangent vector to $O(n)$ at $I$ if and only if $L^t = -L$. (Such a linear
transformation is called skew symmetric.)
(c) Let $SO(n)$ be the set of all rotations of $E^n$ about $0$. Show that $SO(n)$ is a relatively
open subset of $O(n)$, and hence $SO(n)$ is also a manifold of dimension $\tfrac12 n(n - 1)$.
Show that $SO(n)$ is the largest connected subset of $O(n)$ which contains $I$. [Hint:
Using induction on $n$, show that any $L \in SO(n)$ can be joined with $I$ by a path
in $SO(n)$.]

4.8 The multiplier rule


Let $M$ be a manifold and $f$ be a real valued function of class $C^{(1)}$ on some
open set containing $M$. Let us consider the problem of finding the extrema of
the function $f|M$. This is called a problem of constrained extrema.
If $x_0$ is a point of $M$ at which $f$ has a constrained relative maximum, then
$x_0$ has a neighborhood $U_0$ such that
$$f(x) \le f(x_0) \quad \text{for every } x \in M \cap U_0$$
(recall the definition in Section 3.5). If $f$ has a constrained relative minimum
at $x_0$, then the inequality sign is reversed.
Since $M$ is a manifold, there exist a neighborhood $U$ of $x_0$ and $\Phi$ of class
$C^{(1)}$ on $U$ such that
$$M \cap U = \{x \in U : \Phi(x) = 0\}$$
and $D\Phi(x)$ has maximum rank $m$ for every $x \in U$. We may assume that
$U \subset U_0$.
Roughly speaking, the multiplier rule states that by introducing suitable
multipliers $\sigma_1, \ldots, \sigma_m$ the constrained extremum problem can be treated as
one of ordinary (unconstrained) extremum. More precisely:

Lagrange multiplier rule. Let $f$ have a constrained relative extremum at $x_0$.
Then there exist real numbers $\sigma_1, \ldots, \sigma_m$ such that $x_0$ is a critical point of the
function
$$F = f + \sigma_1 \Phi^1 + \cdots + \sigma_m \Phi^m.$$
PROOF. It suffices to consider the case of a constrained relative maximum.
Let $h$ be any tangent vector to $M$ at $x_0$. Let $\phi(t) = f[\psi(t)]$, where $\psi$ is the
same as in the definition of tangent vector. Since $\psi(t) \in M$ and $f|M$ has a
relative maximum at $x_0$, $\phi$ has a relative maximum at $0$. Therefore $\phi'(0) = 0$.
By the chain rule,
$$\phi'(0) = \operatorname{grad} f[\psi(0)] \cdot \psi'(0) = \operatorname{grad} f(x_0) \cdot h.$$
This shows that $\operatorname{grad} f(x_0)$ is a normal vector to $M$ at $x_0$. However,
$\operatorname{grad} \Phi^1(x_0), \ldots, \operatorname{grad} \Phi^m(x_0)$ form a basis for the space of normal vectors to
$M$ at $x_0$ (Corollary 1, Section 4.7). Therefore, $\operatorname{grad} f(x_0)$ is a linear
combination of these vectors:
$$\operatorname{grad} f(x_0) = a_1 \operatorname{grad} \Phi^1(x_0) + \cdots + a_m \operatorname{grad} \Phi^m(x_0).$$
Let $\sigma_l = -a_l$ for $l = 1, \ldots, m$. Then $\operatorname{grad} F(x_0) = 0$. □

EXAMPLE 1. Let $f(x, y, z) = x - y + 2z$. Let us find the maximum and
minimum values of $f$ on the ellipsoid
$$M = \{(x, y, z) : x^2 + y^2 + 2z^2 = 2\}.$$
Let $\Phi(x, y, z) = 2 - (x^2 + y^2 + 2z^2)$ and $F = f + \sigma\Phi$. The multiplier $\sigma$ is
yet to be determined. From the multiplier rule we get three equations
$$F_1 = 1 - 2\sigma x_0 = 0, \qquad F_2 = -1 - 2\sigma y_0 = 0, \qquad F_3 = 2 - 4\sigma z_0 = 0.$$
From these and the fourth equation $\Phi = 0$, we get
$$x_0 = \frac{1}{2\sigma}, \qquad y_0 = -\frac{1}{2\sigma}, \qquad z_0 = \frac{1}{2\sigma}, \qquad \sigma = \pm\frac{\sqrt{2}}{2}.$$
Therefore $\mathbf{x}_0 = \pm(\sqrt{2}/2)(e_1 - e_2 + e_3)$, depending on which of the two
possible values for $\sigma$ is used. Since $f$ is continuous and $M$ is a compact set, $f$
has a maximum and a minimum value on $M$. One of the two critical points
obtained by the multiplier rule must give the maximum and the other the
minimum. Since
$$f\left[\tfrac{\sqrt{2}}{2}(e_1 - e_2 + e_3)\right] = 2\sqrt{2}$$
and
$$f\left[-\tfrac{\sqrt{2}}{2}(e_1 - e_2 + e_3)\right] = -2\sqrt{2},$$
these numbers are the maximum and minimum values, respectively.
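
The computation in Example 1 can be checked numerically. The following is a minimal sketch, not part of the original text: it evaluates $f$ at the two critical points given by the multiplier rule and compares the results with a brute-force sample of the ellipsoid. The helper `point_on_ellipsoid` and its angle parametrization are assumptions introduced only for this check.

```python
import math

def f(x, y, z):
    """Objective function of Example 1."""
    return x - y + 2 * z

def phi(x, y, z):
    """Constraint function: phi = 0 exactly on the ellipsoid M."""
    return 2 - (x * x + y * y + 2 * z * z)

def critical_point(sigma):
    """Solve F_1 = F_2 = F_3 = 0 for (x0, y0, z0), given the multiplier sigma."""
    return (1 / (2 * sigma), -1 / (2 * sigma), 1 / (2 * sigma))

def point_on_ellipsoid(theta, psi):
    """Hypothetical parametrization of M, used only for the brute-force check."""
    r = math.sqrt(2) * math.sin(theta)
    return (r * math.cos(psi), r * math.sin(psi), math.cos(theta))

# The two multipliers allowed by the fourth equation phi = 0.
sigmas = (math.sqrt(2) / 2, -math.sqrt(2) / 2)
critical_values = [f(*critical_point(s)) for s in sigmas]

# Brute-force comparison: sample f on a grid of points of M.
n = 200
samples = [
    f(*point_on_ellipsoid(math.pi * i / n, 2 * math.pi * j / n))
    for i in range(n + 1)
    for j in range(n)
]

print(critical_values)             # approximately [2.828, -2.828]
print(max(samples), min(samples))  # sampled extremes, for comparison
```

The sampled values stay between the two critical values, which is consistent with the compactness argument in the example.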

EXAMPLE 2. Let $f(x) = \sum_{i=1}^{n} b_i (x^i)^2$, where $b_i \ne 0$ for each $i = 1, \ldots, n$. Let
$M$ be the hyperplane $\{x : y \cdot x = 1\}$, and $F(x) = f(x) + \sigma(1 - y \cdot x)$. If $x_0$ is a
critical point of $F$, then
$$0 = \operatorname{grad} F(x_0) = \operatorname{grad} f(x_0) - \sigma y.$$
Thus, $\operatorname{grad} f(x_0) = \sigma y$, or
$$f_i(x_0) = 2 b_i x_0^i = \sigma y_i, \qquad i = 1, \ldots, n.$$
From this and the equation $y \cdot x_0 = 1$,
$$x_0^i = \frac{\sigma y_i}{2 b_i}, \qquad \sigma = 2\left[\sum_{i=1}^{n} \frac{(y_i)^2}{b_i}\right]^{-1},$$
provided the sum is not zero. To determine whether $x_0$ gives an extremum,
we use the formula
$$f(x_0 + h) = f(x_0) + \operatorname{grad} f(x_0) \cdot h + f(h),$$
which is valid for homogeneous quadratic polynomials. Points of the
hyperplane $M$ are of the form $x_0 + h$, where $y \cdot h = 0$. Since $\operatorname{grad} f(x_0) = \sigma y$,
the above formula simplifies to
$$f(x_0 + h) = f(x_0) + f(h).$$
If $f(h) \ge 0$ for every $h$ satisfying $y \cdot h = 0$, then $f$ has an absolute constrained
minimum at $x_0$.
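
As an illustration of Example 2, the sketch below solves the critical-point equations $2 b_i x_0^i = \sigma y_i$ together with $y \cdot x_0 = 1$ in exact rational arithmetic, and checks the identity $f(x_0 + h) = f(x_0) + f(h)$ for one vector $h$ with $y \cdot h = 0$. The data `b`, `y`, and `h` are assumptions chosen only for the demonstration.

```python
from fractions import Fraction

def f(b, x):
    """Quadratic polynomial f(x) = sum_i b_i (x^i)^2."""
    return sum(bi * xi * xi for bi, xi in zip(b, x))

def critical_point(b, y):
    """Solve 2 b_i x0^i = sigma y_i together with y . x0 = 1 for (sigma, x0)."""
    s = sum(yi * yi / bi for bi, yi in zip(b, y))
    if s == 0:
        raise ValueError("the sum of y_i^2 / b_i is zero; no critical point")
    sigma = 2 / s
    x0 = [sigma * yi / (2 * bi) for bi, yi in zip(b, y)]
    return sigma, x0

# Sample data (an assumption, chosen only for illustration).
b = [Fraction(1), Fraction(2), Fraction(4)]
y = [Fraction(1), Fraction(1), Fraction(1)]
sigma, x0 = critical_point(b, y)

# A vector h with y . h = 0, so that x0 + h stays on the hyperplane M.
h = [Fraction(1), Fraction(-1), Fraction(0)]
shifted = [xi + hi for xi, hi in zip(x0, h)]

print(sigma, x0)                             # sigma = 8/7, x0 = (4/7, 2/7, 1/7)
print(f(b, shifted) == f(b, x0) + f(b, h))   # True
```

In this sample every $b_i$ is positive, so $f(h) \ge 0$ for all $h$ and the critical point is an absolute constrained minimum.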
The characteristic values of a symmetric matrix
Let $(c^i_j)$ be an $n \times n$ matrix and $L$ the corresponding linear transformation.
A number $\lambda$ is a characteristic value of $L$ if the linear transformation $L - \lambda I$
is singular. There are $n$ characteristic values $\lambda_1, \ldots, \lambda_n$, counting multiplicities.
If $L(x) = \lambda_i x$ and $x \ne 0$, then $x$ is a characteristic vector corresponding to the
characteristic value $\lambda_i$. The numbers $\lambda_1, \ldots, \lambda_n$ may be complex, and the
characteristic vectors may have complex components [12, p. 164].
Let us suppose, however, that $(c^i_j)$ is a symmetric matrix, $c^i_j = c^j_i$ for
$i, j = 1, \ldots, n$. Let us show that the characteristic values are real. In fact,
$\lambda_i$ can be characterized as the value of a certain constrained maximum.
Consider the homogeneous quadratic polynomial
$$f(x) = L(x) \cdot x = \sum_{i,j=1}^{n} c^i_j x^i x^j,$$
and let $M_1$ be the unit $(n - 1)$-sphere in $E^n$, $M_1 = \{x : |x| = 1\}$. Let
$$\lambda_1 = \max\{f(x) : x \in M_1\}, \tag{4.27}$$
and let $v_1$ be a point of $M_1$ at which the maximum is attained. By the
multiplier rule, with
$$F(x) = f(x) + \sigma(1 - x \cdot x),$$
there is a multiplier $\sigma$ such that $v_1$ is a critical point of $F$. Now $\operatorname{grad} f(x) =
2L(x)$ since the matrix $(c^i_j)$ is symmetric. Hence
$$0 = \operatorname{grad} F(v_1) = 2L(v_1) - 2\sigma v_1.$$
Hence $L(v_1) = \sigma v_1$, which shows that $\sigma$ is a characteristic value and $v_1$ a
characteristic vector. Since $f(x) = L(x) \cdot x$,
$$\lambda_1 = f(v_1) = L(v_1) \cdot v_1 = \sigma v_1 \cdot v_1,$$
and since $v_1 \cdot v_1 = 1$, $\sigma = \lambda_1$.
We next let
$$M_2 = \{x : |x| = 1,\ x \cdot v_1 = 0\}, \qquad \lambda_2 = \max\{f(x) : x \in M_2\},$$
and $v_2 \in M_2$ be a point such that $f(v_2) = \lambda_2$. We have added another
constraint. Hence $M_2 \subset M_1$ and $\lambda_2 \le \lambda_1$. Obviously $v_2 \cdot v_1 = 0$. For $k = 3, \ldots, n$,
let
$$M_k = \{x : |x| = 1,\ x \cdot v_i = 0 \text{ for } i = 1, \ldots, k - 1\},$$
$$\lambda_k = \max\{f(x) : x \in M_k\}, \tag{4.28}$$


and $v_k \in M_k$ be such that $\lambda_k = f(v_k)$. Then
$$\lambda_n \le \lambda_{n-1} \le \cdots \le \lambda_1$$
and $\{v_1, \ldots, v_n\}$ is an orthonormal basis for $E^n$. Let us show by induction
on $k$ that $\lambda_k$ is a characteristic value and $v_k$ a characteristic vector. This is
true if $k = 1$. Let $k \ge 2$. Applying the multiplier rule with
$$F(x) = f(x) + \sigma_1(1 - x \cdot x) + \sum_{i=1}^{k-1} \sigma_{i+1}\, x \cdot v_i,$$
we get
$$0 = 2L(v_k) - 2\sigma_1 v_k + \sum_{i=1}^{k-1} \sigma_{i+1} v_i.$$
Since $v_i \cdot v_j = 0$ for $i \ne j$, we get upon taking the inner product with $v_j$,
$$0 = 2L(v_k) \cdot v_j + \sigma_{j+1} \quad \text{if } j < k,$$
$$\sigma_1 = L(v_k) \cdot v_k = f(v_k),$$
or $\sigma_1 = \lambda_k$. Since $(c^i_j)$ is symmetric, $L(x) \cdot y = x \cdot L(y)$ for all $x$, $y$. In particular,
$$L(v_k) \cdot v_j = v_k \cdot L(v_j).$$
Using the induction hypothesis, we get
$$L(v_k) \cdot v_j = v_k \cdot (\lambda_j v_j) = 0 \quad \text{if } j < k.$$
Hence $\sigma_2 = \cdots = \sigma_k = 0$, and $L(v_k) = \lambda_k v_k$.
If $k = n$, the multiplier rule does not apply. However, we used the
multiplier rule only to show that $L(v_k)$ is a linear combination of $v_1, \ldots, v_k$. If
$k = n$, this is clear from the fact that $\{v_1, \ldots, v_n\}$ is a basis for $E^n$.
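
The successive maximization described above can be tried on a small case. The sketch below is only an illustration under stated assumptions: it takes one sample $2 \times 2$ symmetric matrix, finds $v_1$ by brute-force sampling of the unit circle (a stand-in for the exact maximum, not a method used in the text), and then uses the fact that for $n = 2$ the set $M_2$ consists of just two antipodal points.

```python
import math

# A sample symmetric matrix (c^i_j); an assumption for illustration only.
C = [[2.0, 1.0],
     [1.0, 2.0]]

def L(x):
    """Linear transformation with matrix C."""
    return [sum(C[i][j] * x[j] for j in range(2)) for i in range(2)]

def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

def f(x):
    """Quadratic polynomial f(x) = L(x) . x."""
    return dot(L(x), x)

# Sample the unit circle M_1 and pick a point where f is largest.
N = 3600
circle = [(math.cos(2 * math.pi * k / N), math.sin(2 * math.pi * k / N))
          for k in range(N)]
v1 = max(circle, key=f)
lam1 = f(v1)

# For n = 2, M_2 = {x : |x| = 1, x . v1 = 0} consists of the two points +v2, -v2.
v2 = (-v1[1], v1[0])
lam2 = f(v2)

# Residuals of L(v_k) = lambda_k v_k; both should be close to zero.
res1 = max(abs(a - lam1 * b) for a, b in zip(L(v1), v1))
res2 = max(abs(a - lam2 * b) for a, b in zip(L(v2), v2))
print(lam1, lam2, res1, res2)
```

For this matrix the characteristic values are 3 and 1, so `lam1` and `lam2` should come out close to those numbers, with `lam2 <= lam1` as in the text.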

Theorem 4.4. The characteristic values of the symmetric matrix $(c^i_j)$ are
$\lambda_1, \ldots, \lambda_n$. For each $i = 1, \ldots, n$, $v_i$ is a characteristic vector corresponding
to $\lambda_i$. If $\xi^1, \ldots, \xi^n$ denote the components of $x$ with respect to the
orthonormal basis $\{v_1, \ldots, v_n\}$, then
$$f(x) = \lambda_1(\xi^1)^2 + \cdots + \lambda_n(\xi^n)^2 \tag{4.29}$$
for every $x \in E^n$.
PROOF. The first two statements have already been proved. To prove the
third, we have
$$x = \sum_{i=1}^{n} \xi^i v_i,$$
$$L(x) = \sum_{i=1}^{n} \xi^i L(v_i) = \sum_{i=1}^{n} \lambda_i \xi^i v_i,$$
$$f(x) = L(x) \cdot x = \sum_{i=1}^{n} \lambda_i \xi^i\, v_i \cdot x,$$
which is just (4.29). □



Corollary. $f$ is positive definite if and only if $\lambda_i > 0$ for each $i = 1, \ldots, n$.

PROOF. If each $\lambda_i$ is positive, then by (4.29), $f(x) > 0$ whenever $x \ne 0$.
Conversely, if $f(x) > 0$ for every $x \ne 0$, then $\lambda_i = f(v_i) > 0$ for each $i$. □
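
Formula (4.29) and the Corollary can be checked on a concrete case. In the sketch below the matrix, its characteristic values 3 and 1, and the orthonormal characteristic vectors are sample data assumed for illustration; the function names are hypothetical.

```python
import math

# Sample symmetric matrix with characteristic values 3 and 1 (assumed data).
C = [[2.0, 1.0],
     [1.0, 2.0]]
lams = [3.0, 1.0]
s = 1 / math.sqrt(2)
basis = [(s, s), (-s, s)]   # orthonormal characteristic vectors v_1, v_2

def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

def L(x):
    """Linear transformation with matrix C."""
    return [sum(C[i][j] * x[j] for j in range(2)) for i in range(2)]

def f(x):
    """Quadratic polynomial f(x) = L(x) . x."""
    return dot(L(x), x)

def f_diagonal(x):
    """Right-hand side of (4.29): sum of lambda_i (xi^i)^2 with xi^i = x . v_i."""
    return sum(lam * dot(x, v) ** 2 for lam, v in zip(lams, basis))

def is_positive_definite(characteristic_values):
    """Criterion of the Corollary: every characteristic value is positive."""
    return all(lam > 0 for lam in characteristic_values)

x = (0.3, -1.7)
print(f(x), f_diagonal(x))          # the two values agree up to rounding
print(is_positive_definite(lams))   # True
```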

These results have a geometric interpretation. Let
$$B = \{x : f(x) = 1\}.$$
If $f$ is positive definite, then $B$ is called an $(n - 1)$-dimensional ellipsoid.
Setting $\mu_i = 1/\sqrt{\lambda_i}$, we have (Figure 4.10)
$$B = \left\{x : \sum_{i=1}^{n} (\xi^i/\mu_i)^2 = 1\right\},$$
if $\lambda_i > 0$ for each $i = 1, \ldots, n$. If $n = 2$ and $\lambda_1 > 0$, $\lambda_2 < 0$, then $B$ is a
hyperbola. If $n = 3$ and $\lambda_1 > 0$, $\lambda_3 < 0$, then $B$ is a hyperboloid. It has one sheet
if $\lambda_2 > 0$ and two sheets if $\lambda_2 < 0$.
[Figure 4.10. The set $B$, drawn against the $x$ and $y$ axes.]

PROBLEMS

1. Set $M = \{(x, y, z) : x + y + z = 1\}$ and $f(x, y, z) = 3x^2 + 3y^2 + z^2$. Show that
there is a constrained absolute minimum, and find the minimum value of $f$ on $M$.
2. Use the multiplier rule to find the distance to the parabola $y^2 = x$ from the point
$c\,e_1$, $c \ne 0$. [Hint: Let $f(x, y) = (x - c)^2 + y^2$, which is the square of the distance.]
3. Find the distance from the point $e_1 - 2e_2 - e_3$ to the line $\{(x, y, z) : x = y = z\}$.

4. (a) Use the multiplier rule to show that $|a| = \max\{a \cdot x : |x| = 1\}$.
(b) Deduce the same result from Cauchy's inequality.
5. Let $M$ be a manifold, $x_1 \notin M$, and suppose that $x_0$ is a point of $M$ nearest $x_1$. Using
the multiplier rule, show that $x_1 - x_0$ is a normal vector to $M$ at $x_0$.
6. Show that the distance from a point $x_1$ to the hyperplane $\{x : y \cdot x = b\}$ is
$|y \cdot x_1 - b| / |y|$.


7. Let $f(x) = x^1 x^2 \cdots x^n$ and $M = \{x : x^1 + \cdots + x^n = 1,\ x^i > 0 \text{ for } i = 1, \ldots, n\}$.
(a) Show that $f(x) \le n^{-n}$ for every $x \in M$, with equality if $x^1 = \cdots = x^n = n^{-1}$.
[Hint: First show that $f$ has an absolute maximum on $M$. Apply the multiplier
rule to $\log f$, which has a maximum at the same point where $f$ has one.]
(b) Using (a), prove that the geometric mean of $n$ positive numbers is no more
than their arithmetic mean. See Problem 9(b), Section 3.6.

8. Let $p > 1$ and $p'$ be the number such that $p^{-1} + (p')^{-1} = 1$. Let $\|x\|$ be as in
Problem 4, Section 2.11, and for each covector $a$ let $\|a\| = \max\{a \cdot x : \|x\| = 1\}$.
Show that
$$\|a\| = \left(\sum_{i=1}^{n} |a_i|^{p'}\right)^{1/p'}.$$
[Note: For these norms the inequality $|a \cdot x| \le \|a\|\,\|x\|$ is called Hölder's inequality.
A related inequality for integrals is given in Section 5.13.]

9. Let $f(x, y, z) = 2xz + y^2$.
(a) Find the characteristic values $\lambda_1, \lambda_2, \lambda_3$.
(b) Sketch the surface with equation $2xz + y^2 = 1$. With equation $2xz + y^2 = 0$.

10. Let $\|L\|$ be defined as in Section 4.3. Show that $\|L\|^2$ is the largest characteristic
value of $L' \circ L$. [Hint: Use (4.10) with y = L(t).]

11. (Second derivative test for constrained relative maxima.) Let $f$ and $\Phi$ be of class $C^{(2)}$,
and let $Q(x, h) = \sum_{i,j=1}^{n} F_{ij}(x) h^i h^j$, where $F$ is as in the multiplier rule. Show that:
(a) If $f|M$ has a relative maximum at $x_0$, then $Q(x_0, h) \le 0$ for every $h \in T(x_0)$.
(b) If $Q(x_0, h) < 0$ for every $h \in T(x_0)$, $h \ne 0$, then $f|M$ has a strict relative
maximum at $x_0$. [Hints: See the proof of Theorem 3.4. Set $h_t = t^{-1}(y_t - x_0)$,
and show that $\lim_{t \to 0} Q(x_0, h_t) = Q(x_0, h)$.]

5
Integration

The integral of a real valued function over a set is a generalization of the
notion of sum. It is defined by approximating in a suitable way by certain
finite sums. The first careful definition was due to Riemann (1854). Riemann
defined the integral of a function over an interval $[a, b]$ of the real line $E^1$.
In the succeeding years Riemann's idea was extended in several ways.
However, the Riemann integral has several intrinsic drawbacks, and for
a truly satisfactory treatment of integration a different approach had to be
found.
About 1900 Lebesgue discovered a more sophisticated and flexible theory
of integrals. In this chapter the elements of the Lebesgue theory are given.
The first step is to define the measure of a set $A \subset E^n$. For $n = 1$, 2, or 3, the
measure is respectively the length, area, or volume of $A$. An important
property of Lebesgue measure is its countable additivity [Formula (5.9)]. While
not every set $A$ is assigned a measure, countable additivity ensures that the
class of measurable sets is large enough for all applications encountered in
mathematical analysis.
After measure, the integral of a bounded function f over a bounded set A
is defined using upper and lower integrals. The integral exists under the very
mild assumptions that A is measurable and f is measurable on A. Later
(Sections 5.6 and 5.11), the integral is studied without those boundedness
assumptions.
The definition of an integral does not furnish an effective procedure for
the actual evaluation of integrals. However, the theorems on iterated integrals
and transformation of integrals (Sections 5.5 and 5.8), together with the
fundamental theorem of calculus, provide a useful technique for this purpose.
Among the important features of the Lebesgue theory are the theorems
about integration term by term in sequences of functions. Such questions are
treated in Section 5.11.


Notation. The $n$-dimensional measure of a set $A$ is denoted by $V_n(A)$.
If the dimension $n$ is clear from the context, then we write simply "measure"
rather than "$n$-dimensional measure" and $V(A)$ rather than $V_n(A)$. The
integral of $f$ over $A$ is denoted by
$$\int_A f\,dV_n \quad \text{or} \quad \int_A f(x)\,dV_n(x).$$
If $A = E^n$, we write simply $\int f\,dV_n$. The symbols $dV_n$ and $dV_n(x)$ are used
after an integral sign merely for convenience and for traditional reasons. They
will have no significance by themselves.
If $n = 1$, then we write, as is customary, $\int_A f(x)\,dx$ instead of $\int_A f(x)\,dV_1(x)$,
and if $A = [a, b]$, we write $\int_a^b f(x)\,dx$.

5.1 Intervals
What is the n-dimensional measure of a subset A of En? To start with,
let us consider the simplest possible case-where A is an n-dimensional
interval.

[Figure 5.1]

A 2-dimensional interval is a rectangle with sides parallel to the coordinate
axes (Figure 5.1). Its area is the product of the lengths of its sides. Since $E^2$ is
the cartesian product $E^1 \times E^1$, a 2-dimensional interval is just the cartesian
product of 1-dimensional intervals. Similarly, a set $I \subset E^n$ is called an
$n$-dimensional interval if $I$ is the cartesian product of 1-dimensional intervals:
$$I = J_1 \times \cdots \times J_n,$$
where each $J_i$ is a finite interval of $E^1$. The interval $I$ is closed if each $J_i$ is
closed, and open if each $J_i$ is open. For instance, if $I$ is closed, then there exist
$$x_0 = (x_0^1, \ldots, x_0^n), \qquad x_1 = (x_1^1, \ldots, x_1^n),$$
with $x_0^i < x_1^i$ for each $i = 1, \ldots, n$, such that $J_i = [x_0^i, x_1^i]$ and
$$I = \{x : x_0^i \le x^i \le x_1^i,\ i = 1, \ldots, n\}.$$


The $n$-dimensional measure $V(I)$ of $I$ is the product of the lengths of the
intervals $J_1, \ldots, J_n$.
We next define the measure of a set that is a finite union of $n$-dimensional
intervals. For this purpose the idea of a grid of hyperplanes is introduced.

Grids
For each $i = 1, \ldots, n$ let us take a finite set of real numbers; let the elements
of these sets be denoted by $x^i_j$, where
$$x^i_1 < x^i_2 < \cdots < x^i_{m_i + 1}$$
and $m_i + 1$ is the number of elements of the $i$th set. Let $P^i_j$ be the hyperplane
with equation $x^i = x^i_j$, and let $\Pi$ be the union of all these hyperplanes.
Such a set $\Pi$ is called a grid of hyperplanes. A grid divides $E^n$ into a finite
number of $n$-dimensional intervals, called intervals of $\Pi$, and a finite number
of unbounded sets. The latter could be called semi-infinite intervals of $\Pi$, but
we have no occasion to do so. The intervals of $\Pi$ have the form $I =
J_1 \times \cdots \times J_n$ where $J_i = [x^i_{j_i}, x^i_{j_i + 1}]$ and the integers $j_1, \ldots, j_n$ may be
chosen arbitrarily subject to $1 \le j_i \le m_i$. There are $m_1 \cdots m_n$ intervals
of the grid, and for convenience we have taken them to be closed (see Figure
5.2).
[Figure 5.2. An interval $I$ of a grid, drawn against the $x^1$ and $x^2$ axes.]

Let us call a set $Y$ a figure if $Y$ is the union of certain intervals $I_1, \ldots, I_p$
of some grid $\Pi$. The measure of $Y$ is
$$V(Y) = V(I_1) + \cdots + V(I_p). \tag{5.1}$$
There are many possible choices for $\Pi$. Consequently, we must show that
$V(Y)$ depends only on $Y$ and not on the particular grid chosen. Let us call
$\Pi'$ a refinement of $\Pi$ if $\Pi \subset \Pi'$. It is easy to show that $V(Y)$ is unchanged
if $\Pi'$ is obtained by adding one hyperplane to $\Pi$, and hence by induction if
$\Pi$ is replaced by any refinement of it. Now let $\Pi$ and $\Pi'$ be any two grids such
that $Y$ is the union of intervals of $\Pi$ and also the union of intervals of $\Pi'$. Then
$\Pi \cup \Pi'$ is a refinement of both. Consequently, $V(Y)$ is the same whether
$\Pi$ or $\Pi'$ is used.

This same reasoning shows that if $Y$ and $Z$ are figures, then $Y$ and $Z$ can
be written as unions of intervals of the same grid $\Pi$. Therefore $Y \cup Z$ is a
figure. Moreover,
$$V(Y \cup Z) \le V(Y) + V(Z). \tag{5.2}$$

If $Y \cap Z$ is empty, then equality holds in (5.2).
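
The grid construction can be turned into a small computation. The sketch below is an illustration under assumptions, not the author's procedure: a figure is given as a list of closed intervals (each a tuple of `(lo, hi)` pairs, one per axis), the grid is built from all their endpoints, and the measure is the sum of the volumes of the grid intervals contained in some interval of the list, in the spirit of Formula (5.1). The name `figure_measure` and the sample rectangles are hypothetical.

```python
from itertools import product

def figure_measure(boxes):
    """Measure of the union of closed n-dimensional intervals.

    Each box is a tuple of (lo, hi) pairs, one pair per coordinate axis.
    """
    if not boxes:
        return 0
    n = len(boxes[0])
    # Grid: for each axis, the sorted endpoints of all the boxes.
    cuts = [sorted({c for box in boxes for c in box[i]}) for i in range(n)]
    total = 0
    # Visit every interval (cell) of the grid.
    for idx in product(*(range(len(c) - 1) for c in cuts)):
        cell = [(cuts[i][j], cuts[i][j + 1]) for i, j in enumerate(idx)]
        # The cell belongs to the figure if some box contains it.
        inside = any(
            all(lo <= a and b <= hi for (a, b), (lo, hi) in zip(cell, box))
            for box in boxes
        )
        if inside:
            volume = 1
            for a, b in cell:
                volume *= b - a
            total += volume
    return total

Y = [((0, 2), (0, 1))]   # the rectangle [0, 2] x [0, 1]
Z = [((1, 3), (0, 2))]   # the rectangle [1, 3] x [0, 2]
print(figure_measure(Y), figure_measure(Z), figure_measure(Y + Z))  # 2 4 5
```

Here $V(Y \cup Z) = 5 < 6 = V(Y) + V(Z)$, because the two rectangles overlap in a set of area 1; for figures that do not intersect the same function returns the sum of the two measures.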

PROBLEMS

1. Let $n = 1$ and $Y = [0, 1] \cup [2, 3]$, $Z = [1, 3] \cup [4, 5]$. Verify Formula (5.2) in
this example.
2. Let $n = 2$ and $Y = [0, 2] \times [0, 1] \cup [1, 3] \times [1, 2]$, $Z = [-1, 2] \times [-1, 3]$. Find
a grid $\Pi$ such that both $Y$ and $Z$ are unions of intervals of $\Pi$. Find the areas of
$Y$, $Z$, $Y \cup Z$, and $Y \cap Z$, and verify that $V(Y) + V(Z) - V(Y \cup Z) = V(Y \cap Z)$.
3. Let $I_1 = [0, 1] \times [0, 1] \times [0, 1]$ and $I_2 = [1, 2] \times [0, 2] \times [-1, 2]$. Find the
volume of $I_1 \cup I_2$ and of $I_1 \cap I_2$.
4. Let $m$ be a positive integer, $f(x) = \exp x$, and $Y = I_1 \cup \cdots \cup I_m$, where for
$k = 1, \ldots, m$
$$I_k = [(k - 1)/m,\ k/m] \times [0, f(k/m)].$$
Find the area $V_2(Y)$. Show that it is approximately $e - 1$ if $m$ is large.

5. (a) Let $I_1$ and $I_2$ be $n$-dimensional intervals. Show that $I_1 \cap I_2$ is also an interval
provided that it has nonempty interior.
(b) When is $I_1 \cup I_2$ an interval?

5.2 Measure
We now define measure in the Lebesgue sense for a large class of subsets of
$E^n$. A set $A$ in this class is called measurable, and the measure of $A$ is denoted
by $V(A)$. Some main properties of measurable sets and of measure are
stated in Theorems 5.1 through 5.3. In studying the present section, the
reader should first concentrate on understanding the definitions and the
statements of these main theorems. A careful study of the various lemmas
and propositions in the section might be postponed.
We begin by defining the measure of a bounded set A. This is done in two
stages. First, the measure of an open set G is defined by approximating G
from within by figures, and that of a compact set K by approximating K from
without by figures. In the second stage, A is approximated from within by
compact sets and from without by open sets. This two-stage approximation
process is an important feature of the Lebesgue theory of measure.
There is an older theory of measure due to Jordan. In this theory $A$ is
approximated simultaneously from within and without by figures. The
Jordan theory is unsatisfactory for several reasons. Among them is the fact
that the class of sets to which it applies is too small. For instance, there are
compact sets to which the Jordan theory does not assign any measure.
Let $G$ be an open set. If $Y$ is a figure contained in $G$, then the measure
of $G$ must be more than $V(Y)$. It is defined to be the least upper bound of the
set of all such numbers $V(Y)$ (see Figure 5.3).

[Figure 5.3. A figure $Y$ contained in an open set $G$.]

Definition. The measure of an open set $G$ is $V(G) = \sup\{V(Y) : Y \subset G\}$.

If the set $S = \{V(Y) : Y \subset G\}$ has no upper bound, then we set $V(G) =
+\infty$. For instance, $V(E^n) = +\infty$. If $G$ is bounded, then $G$ is contained in
some interval $I$, and $V(I)$ is an upper bound for $S$. In this case $V(G)$ is finite.
By definition (Section 1.1), the least upper bound $\sup S$ has the property
that $s \le \sup S$ for every $s \in S$, and given $\varepsilon > 0$ there exists an $s \in S$ such
that $\sup S < s + \varepsilon$.

EXAMPLE 1. Let $G = \operatorname{int} Z$ where $Z$ is a figure (recall that any figure is a
compact set). Let us show that $V(\operatorname{int} Z) = V(Z)$. For any figure $Y \subset \operatorname{int} Z$,
$V(Y) \le V(Z)$. Given $\varepsilon > 0$ one can find such a figure $Y$ with $V(Z) < V(Y) + \varepsilon$.
To see this, write $Z = I_1 \cup \cdots \cup I_p$, where $I_1, \ldots, I_p$ are intervals. Let
$Y = I'_1 \cup \cdots \cup I'_p$, where each $I'_j$ is an interval such that $I'_j \subset \operatorname{int} I_j$ and
$V(I_j) < V(I'_j) + p^{-1}\varepsilon$. Then $Y \subset \operatorname{int} Z$ and $V(Z) < V(Y) + \varepsilon$ as required.
Thus, $V(Z) = \sup\{V(Y) : Y \subset \operatorname{int} Z\}$.

Now let K be a compact set. If Z is any figure whose interior int Z contains
K, then V(Z) must exceed the measure of K.

Definition. The measure of a compact set K is V(K) = inf{ V(Z) : K c int Z}.

EXAMPLE 1 (continued). Let $K = Y$, where $Y$ is a figure. Then $V(Y)$ as defined
immediately above agrees with the definition of $V(Y)$ in Section 5.1.
Moreover, $V(Y) = V(\operatorname{int} Y)$ as was proved above.
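
The two definitions can be tried on a familiar set. The sketch below is an illustration under assumptions: $G$ is taken to be the open unit disk in $E^2$ and $K$ the closed unit disk, and the only figures considered are unions of grid squares of side $1/m$. Squares lying in $G$ give a figure $Y \subset G$. Squares meeting $K$ give a figure $Z$ with $K \subset \operatorname{int} Z$, since every square containing a given point of $K$ is included. Integer arithmetic is used for the comparisons so that no rounding enters the tests.

```python
import math

def inner_figure_area(m):
    """Area of the grid squares of side 1/m lying in the open unit disk."""
    count = 0
    for i in range(-m, m):
        for j in range(-m, m):
            # Farthest corner of [i/m, (i+1)/m] x [j/m, (j+1)/m] from the origin,
            # measured in units of 1/m.
            fx = max(abs(i), abs(i + 1))
            fy = max(abs(j), abs(j + 1))
            if fx * fx + fy * fy < m * m:
                count += 1
    return count / (m * m)

def outer_figure_area(m):
    """Area of the grid squares of side 1/m that meet the closed unit disk."""
    count = 0
    for i in range(-m - 1, m + 1):
        for j in range(-m - 1, m + 1):
            # Nearest point of the square to the origin, coordinate by coordinate.
            nx = 0 if i <= 0 <= i + 1 else min(abs(i), abs(i + 1))
            ny = 0 if j <= 0 <= j + 1 else min(abs(j), abs(j + 1))
            if nx * nx + ny * ny <= m * m:
                count += 1
    return count / (m * m)

for m in (10, 40, 160):
    # The inner value increases and the outer value decreases as the grid is refined;
    # the number pi lies between them.
    print(m, inner_figure_area(m), outer_figure_area(m), math.pi)
```

Both columns approach $\pi$ as $m$ grows, which is consistent with $V(G) = V(K) = \pi$ for the disk; the code itself only exhibits the bracketing, not a proof.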


Lemma 1. Let $K$ be a compact set and $G$ an open set such that $K \subset G$. Then
there is a figure $Y$ such that $K \subset \operatorname{int} Y$ and $Y \subset G$.

The proof is left to the reader (Problem 6).

Now let $A$ be any bounded set. Its outer measure, denoted by $\overline{V}(A)$, is
defined by approximating from without by open sets:
$$\overline{V}(A) = \inf\{V(G) : A \subset G\}.$$
Similarly, the inner measure $\underline{V}(A)$ is defined by approximating from within
by compact sets:
$$\underline{V}(A) = \sup\{V(K) : K \subset A\}.$$
If $K \subset A$, then by Lemma 1, $V(K) \le V(G)$ whenever $A \subset G$. Hence
$V(K)$ is a lower bound for $\{V(G) : A \subset G\}$, and $V(K) \le \overline{V}(A)$. Thus $\overline{V}(A)$
is an upper bound for $\{V(K) : K \subset A\}$, and
$$\underline{V}(A) \le \overline{V}(A).$$
It is easy to show that if $B \subset A$, then
$$\underline{V}(B) \le \underline{V}(A), \qquad \overline{V}(B) \le \overline{V}(A).$$
We are interested in those bounded sets $A$ whose outer and inner measures
are equal.

Definition. A bounded set $A$ is called measurable if its outer and inner measures
are equal. If $A$ is measurable, then the number
$$V(A) = \underline{V}(A) = \overline{V}(A)$$
is called the $n$-dimensional measure of $A$.

Let us show that bounded open sets and compact sets are measurable
and that the new definition of their measures agrees with the previous one.
Let $H$ be a bounded open set. If $G$ is open and $H \subset G$, then any figure $Y$
contained in $H$ is also contained in $G$. By the definition of measure for open
sets, $V(H) \le V(G)$. When $G = H$, equality holds. From the definition of
outer measure, $\overline{V}(H) = V(H)$. Given $\varepsilon > 0$, there is a figure $Y \subset H$ with
$V(H) - \varepsilon < V(Y)$. Since $Y$ is a compact set, $V(Y) \le \underline{V}(H)$. Since this is true
for every $\varepsilon > 0$, $V(H) \le \underline{V}(H)$. But $\underline{V}(H) \le \overline{V}(H)$, and therefore
$$\underline{V}(H) = \overline{V}(H) = V(H).$$
Similarly, if $L$ is a compact set, then
$$\underline{V}(L) = \overline{V}(L) = V(L).$$
In addition to open sets and compact sets, many other sets are measurable.
In fact, the only examples of nonmeasurable sets are obtained in a quite
nonconstructive way using the "axiom of choice" of set theory [18, p. 157].


Let us next show that the union, intersection, and difference of two
bounded measurable sets are measurable (Proposition 5.2). For this a series of
preparatory lemmas is needed.

Lemma 2. Let $G$ and $H$ be open sets and $K$ be a compact subset of $G \cup H$.
Then there exists $d > 0$ such that the $d$-neighborhood of any $x \in K$ is either
contained in $G$ or in $H$.
PROOF. The sets $E^n - G$ and $E^n - H$ are closed. Let
$$f(x) = \operatorname{dist}(x, E^n - G), \qquad g(x) = \operatorname{dist}(x, E^n - H),$$
where $\operatorname{dist}(x, A)$ is the distance from $x$ to the set $A$ (Section 2.8). The functions
$f$ and $g$ are continuous and $f(x) + g(x) > 0$ for every $x \in G \cup H$. The
continuous function $f + g$ has a positive minimum value $c$ on the compact set
$K$. Let $d = c/2$. For every $x \in K$, either $f(x) \ge d$ or $g(x) \ge d$. □

Lemma 3a. Let $G$ and $H$ be open sets of finite measure. Then
$$V(G \cup H) \le V(G) + V(H). \tag{5.3}$$
PROOF. Let $W$ be any figure such that $W \subset G \cup H$. Let $d$ be as in Lemma 2
with $K = W$. The figure $W$ is the union of intervals of a grid $\Pi$. By refining
$\Pi$ if necessary we may suppose that each interval of $\Pi$ has diameter less than
$d$. Let $Y$ be the union of those intervals of $\Pi$ which are contained in $G$, and $Z$
the union of those contained in $H$. By Lemma 2, $W \subset Y \cup Z$. Consequently,
$$V(W) \le V(Y \cup Z) \le V(Y) + V(Z).$$
Since $Y \subset G$, $V(Y) \le V(G)$; similarly, $V(Z) \le V(H)$. Hence
$$V(W) \le V(G) + V(H).$$
The number $V(G) + V(H)$ is an upper bound for $\{V(W) : W \subset G \cup H\}$; and
hence it is no less than the least upper bound $V(G \cup H)$. This proves (5.3).
□

Lemma 4. Let $K$ and $L$ be compact sets such that $K \cap L$ is empty. Then
$$V(K \cup L) \ge V(K) + V(L). \tag{5.4}$$
PROOF. Let $f(x) = \operatorname{dist}(x, L)$. Since $K \cap L$ is empty and $L$ is closed, $f(x) > 0$
for every $x \in K$. Since $K$ is compact and $f$ is continuous, $f$ has a positive
minimum value $d$ on $K$.
Let $W$ be any figure such that $K \cup L \subset \operatorname{int} W$. Then $W$ is the union of
intervals $I_1, \ldots, I_p$ with diameters less than $d/2$. Let $Y$ be the union of those
intervals $I_j$ such that $I_j \cap K$ is not empty, and $Z$ the union of those such that
$I_j \cap L$ is not empty. Then $Y \cup Z \subset W$, $Y \cap Z$ is empty, and $K \subset \operatorname{int} Y$,
$L \subset \operatorname{int} Z$. Hence
$$V(K) + V(L) \le V(Y) + V(Z) \le V(W).$$


This shows that $V(K) + V(L)$ is a lower bound for $\{V(W) : K \cup L \subset \operatorname{int} W\}$,
and hence is no more than the greatest lower bound $V(K \cup L)$. □

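Lemma 5. Let $A$ and $B$ be bounded sets. Then
$$\overline{V}(A \cup B) \le \overline{V}(A) + \overline{V}(B). \tag{5.5}$$
If $A \cap B$ is empty, then
$$\underline{V}(A \cup B) \ge \underline{V}(A) + \underline{V}(B). \tag{5.6}$$
PROOF. Given $\varepsilon > 0$, there are open sets $G \supset A$, $H \supset B$ such that
$$V(G) < \overline{V}(A) + \varepsilon/2, \qquad V(H) < \overline{V}(B) + \varepsilon/2.$$
Then $G \cup H$ is an open set containing $A \cup B$, and from Lemma 3a
$$\overline{V}(A \cup B) \le V(G \cup H) \le V(G) + V(H),$$
$$\overline{V}(A \cup B) < \overline{V}(A) + \overline{V}(B) + \varepsilon.$$
Since the last inequality is true for every $\varepsilon > 0$, we must have (5.5).
Similar reasoning, using Lemma 4, gives (5.6). □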

Proposition 5.1a. Let $A$ and $B$ be bounded measurable sets such that $A \cap B$
is empty. Then $A \cup B$ is measurable and
$$V(A \cup B) = V(A) + V(B). \tag{5.7}$$
PROOF. By Lemma 5,
$$V(A) + V(B) \le \underline{V}(A \cup B) \le \overline{V}(A \cup B) \le V(A) + V(B).$$
Since the extreme left-hand and right-hand sides are the same, both $\underline{V}(A \cup B)$
and $\overline{V}(A \cup B)$ must equal $V(A) + V(B)$. □

Corollary 1. Let $A$ be a bounded set. Then $A$ is measurable if and only if for
every $\varepsilon > 0$ there exist a compact set $K$ and an open set $G$ such that
$K \subset A \subset G$ and $V(G - K) < \varepsilon$.

The proof is left to the reader (Problem 7).

Proposition 5.2. Let $A$ and $B$ be bounded measurable sets. Then $A - B$,
$A \cap B$, and $A \cup B$ are also measurable.
PROOF. Let us first prove that $A - B$ is measurable. Given $\varepsilon > 0$, let $G$, $G'$
be open sets and $K$, $K'$ compact sets such that $K \subset A \subset G$, $K' \subset B \subset G'$,
and
$$V(G - K) < \frac{\varepsilon}{2}, \qquad V(G' - K') < \frac{\varepsilon}{2}.$$
Let $H = G - K'$, $L = K - G'$. Then $H$ is open, $L$ is compact,
$$L \subset A - B \subset H.$$

Moreover, $H - L$ is open and
$$H - L \subset (G - K) \cup (G' - K').$$
By Lemma 3a,
$$V(H - L) \le V(G - K) + V(G' - K') < \varepsilon.$$
This shows that $A - B$ is measurable.
Now $A \cap B = A - (A - B)$. Since $A$ and $A - B$ are bounded measurable
sets, by the first part of the proposition their difference $A \cap B$ is measurable.
Finally,
$$A \cup B = (A - B) \cup B,$$
and the two sets on the right-hand side are measurable and do not intersect.
By Proposition 5.1a, $A \cup B$ is measurable. □

Countable additivity of measure


If $A_1, A_2, \ldots, A_m$ are bounded measurable sets and $A_k \cap A_l$ is empty for
$k \ne l$, then
$$V(A_1 \cup \cdots \cup A_m) = \sum_{k=1}^{m} V(A_k). \tag{5.8}$$
This follows from Proposition 5.1a and induction on $m$. Formula (5.8)
expresses a property called finite additivity of measure. Let us now prove a
stronger result.
A series $\sum_{k=1}^{\infty} a_k$ of nonnegative numbers converges if the partial sums
$s_m = \sum_{k=1}^{m} a_k$ are bounded, since the partial sums then form a bounded
nondecreasing sequence (see Section 2.3). If the sequence of partial sums is
unbounded, the series is said to diverge to $+\infty$. A sequence $A_1, A_2, \ldots$ of
sets is disjoint if $A_k \cap A_l$ is empty whenever $k \ne l$.

Proposition 5.1b. If $A_1, A_2, \ldots$ is a disjoint sequence of measurable sets
and if $A = A_1 \cup A_2 \cup \cdots$ is bounded, then $A$ is measurable and
$$V(A) = \sum_{k=1}^{\infty} V(A_k). \tag{5.9}$$
This property is called countable additivity of measure. If we let $A_k$ be empty
for $k > m$, then (5.8) is a special case of (5.9).
To prove Proposition 5.1b, we first state:

Lemma 3b. Let $G_1, G_2, \ldots$ be open sets each of which has finite measure.
If $G = G_1 \cup G_2 \cup \cdots$, then
$$V(G) \le \sum_{k=1}^{\infty} V(G_k).$$


PROOF. Let $Y \subset G$. The figure $Y$ is a compact set and $G_1, G_2, \ldots$ form an
open covering of $Y$. Hence $Y \subset G_1 \cup \cdots \cup G_m$ for some $m$. By Lemma 3a
and induction on $m$,
$$V(G_1 \cup \cdots \cup G_m) \le \sum_{k=1}^{m} V(G_k).$$
Then
$$V(Y) \le \sum_{k=1}^{m} V(G_k) \le \sum_{k=1}^{\infty} V(G_k).$$
Since this is true for every such $Y$, the lemma follows. □

PROOF OF PROPOSITION 5.1b. We have $A_1 \cup \cdots \cup A_m \subset A$ for every $m$.
Using (5.8),
$$\sum_{k=1}^{m} V(A_k) \le \underline{V}(A).$$
Since this is true for each $m$,
$$\sum_{k=1}^{\infty} V(A_k) \le \underline{V}(A).$$
On the other hand, given $\varepsilon > 0$ let $G_k$ be an open set such that $A_k \subset G_k$ and
$V(G_k) < V(A_k) + \varepsilon 2^{-k}$, $k = 1, 2, \ldots$, and let $G = G_1 \cup G_2 \cup \cdots$. Then
$A \subset G$ and therefore $\overline{V}(A) \le V(G)$. By Lemma 3b,
$$\overline{V}(A) < \sum_{k=1}^{\infty} V(A_k) + \varepsilon \sum_{k=1}^{\infty} 2^{-k}$$
and $\sum 2^{-k} = 1$. Since this is true for any $\varepsilon > 0$, $A$ is measurable and (5.9)
holds. □

If in Proposition 5.1b the sets $A_1, A_2, \ldots$ are not disjoint, then the equality
(5.9) becomes an inequality. This is part of Theorem 5.2 below.

Unbounded sets
For a possibly unbounded set $A$ the concepts of $A$ being measurable and
measure of $A$ are defined as follows. Let $U_r = \{x : |x| < r\}$. A set $A$ is called
measurable if $A \cap U_r$ is measurable for every $r > 0$. The measure of $A$ is
$$V(A) = \lim_{r \to +\infty} V(A \cap U_r).$$
If we set $\phi(r) = V(A \cap U_r)$, then $\phi$ is a nondecreasing function. Therefore
the limit exists. It may be finite or $+\infty$. If $A$ is a bounded set, then $A \subset U_{r_0}$
for some $r_0$. For $r \ge r_0$, $A = A \cap U_r$. For bounded sets this definition agrees
with the previous one.
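
A one-dimensional case shows how the limit behaves. The following sketch is an illustration under assumptions: the unbounded set is taken to be $A = [1, 1 + 2^{-1}] \cup [2, 2 + 2^{-2}] \cup \cdots$ in $E^1$, so that $U_r = (-r, r)$, and the function `phi` (a hypothetical name) adds up the lengths of the pieces of $A$ inside $U_r$ in exact rational arithmetic.

```python
from fractions import Fraction

def phi(r, terms=200):
    """V(A intersect U_r) for A = union of [k, k + 2^-k], k = 1, 2, ..., U_r = (-r, r).

    Only the first `terms` intervals are examined, which is exact for r <= terms.
    """
    total = Fraction(0)
    for k in range(1, terms + 1):
        lo, hi = Fraction(k), k + Fraction(1, 2 ** k)
        # Length of [lo, hi] intersected with (-r, r); endpoints do not affect length.
        total += max(Fraction(0), min(hi, Fraction(r)) - lo)
    return total

values = [phi(r) for r in (1, 2, 3, 5, 10, 20)]
print([float(v) for v in values])   # nondecreasing, approaching 1
```

The values increase with $r$ and approach $\sum 2^{-k} = 1$, so this unbounded set has finite measure 1.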
We summarize several important facts about measurable sets and measure
in the following theorems.


Theorem 5.1
(a) Every open set $G$ is measurable.
(b) If $A$ is any measurable set, then its complement $A^c$ is measurable.
(c) Let $A_1, A_2, \ldots$ be measurable sets. Then the union $A_1 \cup A_2 \cup \cdots$
and the intersection $A_1 \cap A_2 \cap \cdots$ are measurable sets.

Theorem 5.2
(a) If $Y$ is a figure, then its measure $V(Y)$ is given by the elementary formula
(5.1).
(b) If $G$ is open, then $V(G) = \sup\{V(Y) : Y \subset G\}$, where $Y$ denotes a
figure.
(c) If $K$ is compact, then $V(K) = \inf\{V(Z) : K \subset \operatorname{int} Z\}$, where $\operatorname{int} Z$
denotes the interior of a figure $Z$.
(d) If $A = A_1 \cup A_2 \cup \cdots$, where $A_k$ is measurable for each $k = 1, 2, \ldots$,
then
$$V(A) \le \sum_{k=1}^{\infty} V(A_k). \tag{5.10}$$
(e) Equality holds in (5.10) if $A_1, A_2, \ldots$ are disjoint.


PROOF OF THEOREM 5.1. If $G$ is open, then $G \cap U_r$ is open and bounded for
each $r > 0$. Hence $G \cap U_r$ is measurable. Since this is true for each $r$, $G$ is
measurable; this proves (a). To prove (b), let $A$ be measurable. We note that
$A^c \cap U_r = U_r - (A \cap U_r)$. By Proposition 5.2, $A^c \cap U_r$ is measurable since
$U_r$ and $A \cap U_r$ are measurable sets. Since this is true for each $r > 0$, $A^c$ is
measurable. To prove (c), first consider the case when $A = A_1 \cup A_2 \cup \cdots$
is bounded. Let
$$B_1 = A_1, \quad B_2 = A_2 - A_1, \quad \ldots, \quad B_k = A_k - (A_1 \cup \cdots \cup A_{k-1}), \quad \ldots.$$
By Proposition 5.2, each $B_k$ is measurable. Moreover, $B_1, B_2, \ldots$ are disjoint
and their union is $A$. By Proposition 5.1b, $A$ is measurable in case $A$ is a
bounded set. The case when $A$ is unbounded reduces to this one, by observing
that
$$A \cap U_r = (A_1 \cap U_r) \cup (A_2 \cap U_r) \cup \cdots.$$
Finally, the measurability of $A_1 \cap A_2 \cap \cdots$ follows from what has already
been proved and
$$(A_1 \cap A_2 \cap \cdots)^c = A_1^c \cup A_2^c \cup \cdots.$$

PROOF OF THEOREM 5.2. Part (a) of Theorem 5.2 was verified earlier (see
Example 1), and parts (b), (c) are just the definitions. Let us prove (d) when
$A$ is bounded and postpone the proof for unbounded $A$ to Section 5.10.


Define the disjoint sets $B_1, B_2, \ldots$ as in the proof of Theorem 5.1. By
Proposition 5.1b,
$$V(A) = \sum_{k=1}^{\infty} V(B_k).$$
Since $B_k \subset A_k$, we have $V(B_k) \le V(A_k)$. This implies (5.10), in case $A$ is
bounded. If $A_1, A_2, \ldots$ are disjoint, then $B_k = A_k$ and (5.9) holds. This is (e).
□
EXAMPLE 2. Let $G$ be an open subset of $E^1$. If $G$ is the union of a finite number
of disjoint open intervals $I_1, \ldots, I_m$, then $V(G) = V(I_1) + \cdots + V(I_m)$. It can
also happen that $G$ is the union of a disjoint infinite sequence $I_1, I_2, \ldots$ of
open intervals, in which case
$$V(G) = \sum_{k=1}^{\infty} V(I_k).$$
The third possibility is that $G$ contains some half-line. In that case $V(G) =
+\infty$.
The sets of measure 0 play a special role. They turn out to be negligible
in integration theory, and for that reason we shall call them null sets.

Definition. If $V(A) = 0$, then $A$ is a null set.

Corollary 2. If $A_1, A_2, \ldots$ are null sets, then $A_1 \cup A_2 \cup \cdots$ is a null set.
If $B \subset A$ and $A$ is a null set, then $B$ is a null set.
PROOF. By (5.10),
$$0 \le V(A_1 \cup A_2 \cup \cdots) \le \sum_{k=1}^{\infty} 0 = 0,$$
which proves the first assertion. If $A$ is a bounded null set and $B \subset A$, then
$$0 \le \underline{V}(B) \le \overline{V}(B) \le V(A) = 0.$$
Hence $B$ is measurable and is a null set. If $A$ is any null set, then $A \cap U_r$ is a
null set for each $r$ and $B \cap U_r \subset A \cap U_r$. Hence $B \cap U_r$ is a null set, which
implies that $B$ is measurable and is a null set. □

EXAMPLE 3. A set $A$ is countable if either $A$ is a finite set or its elements can be
arranged in an infinite sequence, $A = \{x_1, x_2, \ldots\}$ where $x_k \ne x_l$ for $k \ne l$.
Any one-point set is a null set. Hence, taking $A_k = \{x_k\}$, we find that any
countable set is a null set.

EXAMPLE 4. Let $A$ be the set of rational numbers in the interval $(0, 1)$. Then
$A$ is countable. For instance, one can write
$$A = \{\tfrac{1}{2}, \tfrac{1}{3}, \tfrac{2}{3}, \tfrac{1}{4}, \tfrac{3}{4}, \tfrac{1}{5}, \ldots\}.$$
Hence $A$ is a null set. Since $V_1(A) + V_1[(0, 1) - A] = V_1[(0, 1)] = 1$, the
set of irrational numbers in $(0, 1)$ has measure 1 and therefore must be an
uncountable set.
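
That the rationals in $(0, 1)$ form a null set can also be seen directly with the $\varepsilon 2^{-k}$ device from the proof of Proposition 5.1b. The sketch below is an illustration under assumptions: it lists the rationals by increasing denominator (one possible enumeration) and covers the $k$th one by an open interval of length $\varepsilon 2^{-k}$, so that any finite number of these intervals has total length less than $\varepsilon$. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import islice
from math import gcd

def rationals_in_unit_interval():
    """Yield 1/2, 1/3, 2/3, 1/4, 3/4, ... : each rational in (0, 1) exactly once."""
    q = 2
    while True:
        for p in range(1, q):
            if gcd(p, q) == 1:
                yield Fraction(p, q)
        q += 1

def cover(eps, count):
    """Open intervals around the first `count` rationals; the k-th has length eps / 2^k."""
    intervals = []
    for k, x in enumerate(islice(rationals_in_unit_interval(), count), start=1):
        half = Fraction(eps) / 2 ** (k + 1)
        intervals.append((x - half, x + half))
    return intervals

eps = Fraction(1, 100)
intervals = cover(eps, 50)
total_length = sum(b - a for a, b in intervals)
print(total_length < eps)   # True
```

Since the total length stays below $\varepsilon$ however many rationals are covered, the outer measure of $A$ is at most $\varepsilon$ for every $\varepsilon > 0$.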

EXAMPLE 5. Let $A \subset M$, where $M$ is an $(n - 1)$-manifold. It is plausible that
the $n$-dimensional measure of $A$ is 0, and this fact is proved in Section 5.8.
Hence any such set $A$ is a null set.

A sequence of sets $A_1, A_2, \ldots$ is called monotone if either $A_1 \subset A_2 \subset \cdots$
or $A_1 \supset A_2 \supset \cdots$. In the first instance the sequence is called nondecreasing,
and in the second instance nonincreasing.

Theorem 5.3
(a) Let $A_1, A_2, \ldots$ be a nondecreasing sequence of measurable sets. Then
$$V\Bigl(\bigcup_{\nu=1}^{\infty} A_\nu\Bigr) = \lim_{\nu \to \infty} V(A_\nu). \tag{5.11}$$
(b) Let $A_1, A_2, \ldots$ be a nonincreasing sequence of measurable sets, such that $V(A_1) < \infty$. Then
$$V\Bigl(\bigcap_{\nu=1}^{\infty} A_\nu\Bigr) = \lim_{\nu \to \infty} V(A_\nu). \tag{5.12}$$

PROOF. Let us prove the theorem under the assumption that there is a spherical ball $U$ such that $A_\nu \subset U$ for each $\nu = 1, 2, \ldots$. This restriction will be removed in Section 5.10. To prove (a), define $B_1, B_2, \ldots$ as in the proofs of Theorems 5.1 and 5.2. Then
$$V\Bigl(\bigcup_{\nu=1}^{\infty} A_\nu\Bigr) = \sum_{k=1}^{\infty} V(B_k).$$
Since $A_1 \subset A_2 \subset \cdots$, we have $B_1 \cup \cdots \cup B_\nu = A_\nu$. Therefore
$$V(A_\nu) = \sum_{k=1}^{\nu} V(B_k).$$
We get (5.11) by taking the limit as $\nu \to \infty$. To get (5.12), we apply (5.11) to the nondecreasing sequence of sets $C_\nu = U - A_\nu$, and note that $V(C_\nu) = V(U) - V(A_\nu)$ since $A_\nu \subset U$. □

EXAMPLE 6. To see the need for the assumption $V(A_1) < \infty$ in Theorem 5.3(b), let $n = 1$ and $A_\nu = [\nu, \infty)$. Then $A_1 \supset A_2 \supset \cdots$ and $V(A_\nu) = +\infty$ for each $\nu = 1, 2, \ldots$. However, $A_1 \cap A_2 \cap \cdots$ is empty, and hence $V(A_1 \cap A_2 \cap \cdots) = 0$.


PROBLEMS

In 1, 2, and 3 assume that the sets are bounded.

1. Let $A$ and $B$ be measurable. Show that:
(a) $V(A - B) = V(A) - V(A \cap B)$.
(b) $V(A \cup B) + V(A \cap B) = V(A) + V(B)$.

2. Show that if $A$, $B$, and $C$ are measurable, then
$$V(A \cup B \cup C) = V(A) + V(B) + V(C) - V(A \cap B) - V(A \cap C) - V(B \cap C) + V(A \cap B \cap C).$$

3. Show that if $A$ is measurable and $B$ is a null set, then
$$V(A \cup B) = V(A - B) = V(A).$$

4. Let $A = A_1 \cup A_2 \cup \cdots$, where $A_k = \{(x, y): x = 1/k,\ 0 \le y \le 1\}$ for $k = 1, 2, \ldots$. Show that $V_2(A) = 0$.

5. Let $A_0$ be the circular disk with center $(0, 0)$ and radius 1. For $k = 1, 2, \ldots$, let $A_k$ be the circular disk with center $(1 - 4^{-k})e_1$ and radius $4^{-k-1}$. Let $A = A_0 - (A_1 \cup A_2 \cup \cdots)$. Find $V_2(A)$.

6. Prove Lemma 1. [Hint: Consider the collection of all intervals $I$ such that $I \subset G$. The interiors $\operatorname{int} I$ of these intervals form an open covering of $K$.]

7. Prove Corollary 1 to Proposition 5.1a. [Hint: If $K \subset G$, then $G = K \cup (G - K)$. By Proposition 5.1a, $V(G) = V(K) + V(G - K)$.]

8. (a) Show that if $A$ and $B$ are countable sets, then $A \cup B$ is countable.
(b) Show that if $B \subset A$ and $A$ is countable, then $B$ is countable.
(c) Show that if $A_1, A_2, \ldots$ are countable sets, then $A_1 \cup A_2 \cup \cdots$ is countable.

9. Let $A = \{x_1, x_2, \ldots\}$ be a countable subset of $(0, 1)$. Given $0 < \varepsilon < 1$, let $\varepsilon_k = \varepsilon 2^{-k-1}$, $I_k = (x_k - \varepsilon_k, x_k + \varepsilon_k)$, and $G = I_1 \cup I_2 \cup \cdots$.
(a) Show that $V_1(G) \le \varepsilon$.
(b) In particular, let $A$ be the set of rational numbers in $(0, 1)$. Let $K = [0, 1] - G$. Then $K$ is a compact subset of the irrational numbers. Show that $V_1(K) \ge 1 - \varepsilon$.
(c) Show that $K = \operatorname{fr} K$.

10. Let $C$ be the Cantor set, defined in Problem 5, Section 2.4.
(a) Show that $C$ is a null set ($V_1(C) = 0$).
(b) Show that $x \in C$ if and only if $x = \sum_{i=1}^{\infty} a_i 3^{-i}$ where $a_i = 0$ or $2$, $i = 1, 2, \ldots$.
(c) Let $f(x) = \sum_{i=1}^{\infty} a_i 2^{-i-1}$ for $x \in C$. Show that $f(C) = [0, 1]$. Hence $C$ is uncountable.
(d) For $x$ in the $k$th interval of $A_j$ (Problem 5, Section 2.4) let $f(x)$ have the constant value $(2k - 1)2^{-j}$, $k = 1, 2, \ldots, 2^{j-1}$, $j = 1, 2, \ldots$. Show that $f$ is continuous and nondecreasing on $[0, 1]$. [Note: $f$ is called the Cantor function.]

11. Show that any straight line in $E^2$ has area 0.

12. Show that if $A$ is an unbounded measurable set, then
$$V(A) = \sup\{V(K): K \subset A\}.$$
(If $V(A) = +\infty$, this means that for every $C > 0$ there is a compact set $K \subset A$ with $V(K) \ge C$.)

5.3 Integrals over En


We now begin the theory of integration. It is convenient, for technical reasons, to begin with a rather artificial case. This concerns the integral over all of $E^n$ of a bounded function $f$, such that $f(x) = 0$ for all $x$ outside some compact set. The integral is defined by approximating $f$ from above and below by functions (called step functions), which take only a finite number of values. We denote the integral of $f$ over all of $E^n$ by $\int f\,dV$.

Consider first functions taking only a finite number of values. In that case the integral is just a certain finite sum. A function $\phi$ is called a step function if there exists a disjoint collection $\{A_1, \ldots, A_m\}$ of bounded measurable sets such that $\phi(x)$ is constant on each $A_k$ and $\phi(x) = 0$ for $x \notin A_1 \cup \cdots \cup A_m$. If $\phi(x) = c_k$ for every $x \in A_k$, then the integral of $\phi$ over $E^n$ is
$$\int \phi\,dV = \sum_{k=1}^{m} c_k V(A_k). \tag{5.13}$$

EXAMPLE 1. Let $\phi(x) = k/m$ for $x \in [(k - 1)/m, k/m)$, $k = 1, 2, \ldots, m$, and $\phi(x) = 0$ for $x \notin [0, 1)$. Then $A_k = [(k - 1)/m, k/m)$, $c_k = k/m$, and
$$\int \phi\,dx = \frac{1}{m^2} + \frac{2}{m^2} + \cdots + \frac{m}{m^2} = \frac{m + 1}{2m}.$$
We write $dx$ instead of $dV_1$ in case $n = 1$. Note that for large $m$ the integral is approximately $\tfrac12$, which is the area of the triangle bounded by the lines $y = x$, $y = 0$, and $x = 1$ in $E^2$.
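The finite sum in (5.13) is easy to evaluate exactly for this step function. The following is a minimal sketch using exact rational arithmetic; the function name `step_integral` is an assumption introduced here for illustration.

```python
from fractions import Fraction

def step_integral(m):
    """Integral of phi, where phi = k/m on [(k-1)/m, k/m) for k = 1..m, using (5.13)."""
    length = Fraction(1, m)                      # V(A_k) is the same for every k
    return sum(Fraction(k, m) * length for k in range(1, m + 1))

for m in (1, 2, 10, 1000):
    # the sum agrees with the closed form (m + 1) / (2m) and approaches 1/2
    assert step_integral(m) == Fraction(m + 1, 2 * m)
    print(m, step_integral(m), float(step_integral(m)))
```

As `m` grows the printed values decrease toward 1/2, the area of the triangle mentioned in the example.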

EXAMPLE 2. Let $\phi(x, y) = k/m$ for $(x, y) \in A_k$, $k = 1, \ldots, m$, where $A_k = \{(x, y): r \in [(k - 1)/m, k/m)\}$, $r = (x^2 + y^2)^{1/2}$ (Figure 5.4). For $r \ge 1$, we set $\phi(x, y) = 0$. Then
$$\int \phi\,dV = \sum_{k=1}^{m} \frac{k}{m} V(A_k) = \pi \sum_{k=1}^{m} \frac{k}{m}\left[\left(\frac{k}{m}\right)^2 - \left(\frac{k - 1}{m}\right)^2\right],$$
$$\int \phi\,dV = \frac{\pi}{m^3}\left[2\sum_{k=1}^{m} k^2 - \sum_{k=1}^{m} k\right] = \frac{\pi}{m^3}\left[\frac{2m(m + 1)(2m + 1)}{6} - \frac{m(m + 1)}{2}\right],$$
$$\int \phi\,dV = \frac{\pi(m + 1)(4m - 1)}{6m^2}.$$
For large $m$ the integral is approximately $2\pi/3$, which is the volume of the solid in $E^3$ bounded above by the cone $z = (x^2 + y^2)^{1/2}$, below by the plane $z = 0$, and with lateral boundary the cylinder $x^2 + y^2 = 1$.
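The closed form obtained in Example 2 can be compared with a direct evaluation of the sum $\sum_k (k/m)V(A_k)$. The sketch below assumes only that $V(A_k)$ is the area of the annulus between radii $(k-1)/m$ and $k/m$; the function names are hypothetical.

```python
import math

def annulus_area(k, m):
    """Area of A_k = {(x, y): (k-1)/m <= r < k/m}."""
    return math.pi * ((k / m) ** 2 - ((k - 1) / m) ** 2)

def step_integral_cone(m):
    """Integral of the step function phi = k/m on A_k, summed as in (5.13)."""
    return sum((k / m) * annulus_area(k, m) for k in range(1, m + 1))

def closed_form(m):
    """The value pi (m + 1)(4m - 1) / (6 m^2) derived in the example."""
    return math.pi * (m + 1) * (4 * m - 1) / (6 * m ** 2)

for m in (1, 3, 50, 2000):
    print(m, step_integral_cone(m), closed_form(m))
print("limit 2*pi/3 =", 2 * math.pi / 3)
```

The two columns agree up to rounding, and both approach $2\pi/3$ as `m` increases.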


Figure 5.4

Proposition 5.3. Let $\phi$ and $\psi$ be step functions, and $c$ any real number. Then $\phi + \psi$ and $c\phi$ are step functions. Moreover,
$$\int (\phi + \psi)\,dV = \int \phi\,dV + \int \psi\,dV. \tag{5.14a}$$
$$\int (c\phi)\,dV = c \int \phi\,dV \text{ for any scalar } c. \tag{5.14b}$$
$$\int \psi\,dV \le \int \phi\,dV \text{ if } \psi \le \phi. \tag{5.14c}$$
The notation $\psi \le \phi$ means that $\psi(x) \le \phi(x)$ for every $x \in E^n$.

PROOF. Let $A_k$, $c_k$ be as in the definition of $\int \phi\,dV$ above, $k = 1, \ldots, m$. Let $A_0 = (A_1 \cup \cdots \cup A_m)^c$ and $c_0 = 0$. Similarly, there are disjoint bounded measurable sets $B_1, \ldots, B_p$ and $d_1, \ldots, d_p$ such that $\psi(x) = d_l$ for $x \in B_l$. Let $B_0 = (B_1 \cup \cdots \cup B_p)^c$ and $d_0 = 0$. Then
$$\int \phi\,dV = \sum_{k=0}^{m} c_k V(A_k) = \sum_{k=0}^{m} \sum_{l=0}^{p} c_k V(A_k \cap B_l),$$
$$\int \psi\,dV = \sum_{l=0}^{p} d_l V(B_l) = \sum_{l=0}^{p} \sum_{k=0}^{m} d_l V(A_k \cap B_l).$$
The order of summation on the right-hand side can be reversed. Moreover, the sets $A_k \cap B_l$ form a disjoint collection; and both $\phi(x)$ and $\psi(x)$ are constant on each set $A_k \cap B_l$. We add to get (5.14a). To get (5.14c) we note that $c_k \ge d_l$ on $A_k \cap B_l$, and (5.14b) is immediate. □

Definition. Let $f$ be real valued with domain $E^n$. The support of $f$ is the smallest closed set $K$ such that $f(x) = 0$ for every $x \notin K$.

EXAMPLE 3. Let $f(x) = x + 1$ if $x \in (0, 1)$ and $f(x) = 0$ if $x \notin (0, 1)$. The support of $f$ is the closed interval $[0, 1]$.

We are now ready to define upper and lower integrals for a bounded function $f$ that has compact support. The upper integral of $f$ is denoted by $\overline{\int} f\,dV$. If $\phi$ is any step function such that $\phi \ge f$, then $\int \phi\,dV$ is an upper estimate for it. We take the greatest lower bound of the set of all such numbers $\int \phi\,dV$.

Definition. The upper integral over $E^n$ of a bounded function $f$ with compact support is
$$\overline{\int} f\,dV = \inf\left\{\int \phi\,dV : \phi \ge f\right\}. \tag{5.15a}$$

We must check that there is at least one such step function, to ensure that the set on the right-hand side is not empty. However, since $f$ is bounded, there is a number $C$ such that $|f(x)| \le C$ for every $x$. Since its support is compact there is an interval $I$ such that $f(x) = 0$ for every $x \notin I$. Let $\phi_0(x) = C$ if $x \in I$ and $\phi_0(x) = 0$ if $x \notin I$. Then $\phi_0 \ge f$ and $\phi_0$ is a step function.

In the same way, the lower integral of $f$ over $E^n$ is denoted by $\underline{\int} f\,dV$. It is the least upper bound of the set of all numbers $\int \psi\,dV$, where $\psi$ is a step function and $\psi \le f$:
$$\underline{\int} f\,dV = \sup\left\{\int \psi\,dV : \psi \le f\right\}. \tag{5.15b}$$

If $\psi \le f \le \phi$, then by (5.14c), $\int \psi\,dV \le \int \phi\,dV$. This implies that
$$\underline{\int} f\,dV \le \overline{\int} f\,dV.$$
If $\phi$ is a step function, then $\underline{\int} \phi\,dV = \overline{\int} \phi\,dV = \int \phi\,dV$.


Definition. A bounded function $f$ with compact support is integrable if its upper and lower integrals are equal. Its integral over $E^n$ is
$$\int f\,dV = \overline{\int} f\,dV = \underline{\int} f\,dV. \tag{5.16}$$

We just observed that any step function is integrable. In Section 5.4 it is shown that every function in a much larger class is integrable.

Proposition 5.4. Let $f$ and $g$ be integrable functions. Then $f + g$ is integrable and $cf$ is integrable for any scalar $c$. Moreover,
$$\int (f + g)\,dV = \int f\,dV + \int g\,dV. \tag{5.17a}$$
$$\int (cf)\,dV = c \int f\,dV. \tag{5.17b}$$
$$\int f\,dV \le \int g\,dV \text{ if } f \le g. \tag{5.17c}$$

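PROOF. Given $\varepsilon > 0$ there exists a step function $\phi_1 \ge f$ such that
$$\int \phi_1\,dV < \overline{\int} f\,dV + \frac{\varepsilon}{2},$$
and a step function $\phi_2 \ge g$ such that
$$\int \phi_2\,dV < \overline{\int} g\,dV + \frac{\varepsilon}{2}.$$
Then $\phi_1 + \phi_2$ is a step function and $\phi_1 + \phi_2 \ge f + g$. Consequently,
$$\overline{\int} (f + g)\,dV \le \int (\phi_1 + \phi_2)\,dV.$$
Using (5.14a), we have
$$\overline{\int} (f + g)\,dV < \overline{\int} f\,dV + \overline{\int} g\,dV + \varepsilon.$$
Since this is true for every $\varepsilon > 0$,
$$\overline{\int} (f + g)\,dV \le \overline{\int} f\,dV + \overline{\int} g\,dV. \tag{5.18a}$$
Similarly,
$$\underline{\int} (f + g)\,dV \ge \underline{\int} f\,dV + \underline{\int} g\,dV. \tag{5.18b}$$
If $f$ and $g$ are integrable, then the right-hand sides of (5.18a) and (5.18b) are equal. The left-hand side of (5.18b) is not greater than the left-hand side of (5.18a). Hence both upper and lower integrals of $f + g$ equal $\int f\,dV + \int g\,dV$. This proves that $f + g$ is integrable and (5.17a). The rest of the proof is left to the reader (Problem 4). □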

An $n$-dimensional interval $I = J_1 \times \cdots \times J_n$ is half-open to the right if each of the 1-dimensional intervals $J_i$ is half-open to the right. (In the definition of figure we could equally well have used intervals half-open to the right instead of closed intervals.) Let us call a function $\phi$ an elementary step function if $\phi$ is constant on each interval of some grid $\Pi$ and $\phi$ has the value 0 outside the intervals of $\Pi$. To avoid ambiguity about the values of $\phi$ on the bounding faces of intervals we take the intervals of $\Pi$ half-open to the right.

*Riemann integral

If in (5.15a) only elementary step functions $\phi$ are allowed, then the upper Riemann integral of $f$ is obtained. The lower Riemann integral of $f$ is obtained by allowing only elementary step functions $\psi$ in (5.15b). Let us denote upper and lower Riemann integrals by $\overline{S}(f)$ and $\underline{S}(f)$. Then
$$\underline{S}(f) \le \underline{\int} f\,dV \le \overline{\int} f\,dV \le \overline{S}(f). \tag{5.19}$$
If $\underline{S}(f) = \overline{S}(f)$, then $f$ is called Riemann integrable. Their common value $S(f)$ is the Riemann integral of $f$. From (5.19), if $f$ is Riemann integrable, then $f$ is integrable [in the sense of (5.16)] and
$$S(f) = \int f\,dV. \tag{5.20}$$

It can be proved that a bounded function $f$ with compact support is Riemann integrable if and only if $V(\{x : f \text{ is discontinuous at } x\}) = 0$ [1, pp. 230 and 260].
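To make the role of elementary step functions concrete, one can bracket a simple function between two of them on a uniform grid. The sketch below is an illustration under stated assumptions: the function $f(x) = x^2$ on $[0, 1)$ (zero elsewhere) and the name `riemann_bounds` are choices made here, not taken from the text. Because this $f$ is increasing on $[0, 1)$, the value $(k/m)^2$ bounds $f$ from above and $((k-1)/m)^2$ from below on the grid interval $[(k-1)/m, k/m)$.

```python
from fractions import Fraction

def riemann_bounds(m):
    """Integrals of elementary step functions below and above f(x) = x**2 on [0, 1)."""
    width = Fraction(1, m)                       # length of each grid interval
    upper = sum(Fraction(k, m) ** 2 * width for k in range(1, m + 1))
    lower = sum(Fraction(k - 1, m) ** 2 * width for k in range(1, m + 1))
    return lower, upper

for m in (1, 10, 100):
    lower, upper = riemann_bounds(m)
    print(m, float(lower), float(upper), upper - lower)
```

The gap between the two bounds is exactly $1/m$, so the upper and lower Riemann integrals of this $f$ coincide, with common value $1/3$.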

PROBLEMS

1. Determine whether $f$ is bounded. Find its support.
(a) $f(x) = x - |x|$.
(b) $f(x, y) = x \exp(-x^2 - y^2)$.
(c) $f(x, y) = 1$ if either $x$ or $y$ is a rational number, $f(x, y) = 0$ if both $x$ and $y$ are irrational.
(d) $f(x, y) = (x - y)|x + y| - (x + y)|x - y|$ if $|x| + |y| < 1$, $f(x, y) = 0$ if $|x| + |y| \ge 1$. Illustrate with a sketch.
2. Let $[a]$ denote the largest integer which is no greater than $a$ (for instance, $[\pi] = 3$). Let $\phi(x, y) = [x + y]$ if $0 \le x < r$, $0 \le y < s$, where $r$ and $s$ are positive integers. For all other $(x, y)$ let $\phi(x, y) = 0$. Show that
$$\int \phi\,dV_2 = \frac{rs(r + s - 1)}{2}.$$

3. Let a unit square be divided into a square of side $(4m + 1)^{-1}$ in the center and $2m$ annular figures of equal width $(4m + 1)^{-1}$ surrounding it, as shown in Figure 5.5.

Figure 5.5

Let $\phi(x, y) = 0$ for $(x, y)$ in the small square or outside the large square. Let $\phi(x, y) = (-1)^k k$ in the $k$th annular figure, $k = 1, \ldots, 2m$. Show that
$$\int \phi\,dV_2 = \frac{8m(2m + 1)}{(4m + 1)^2}.$$
What is this approximately when $m$ is large?
4. (a) Show that if $f$ is integrable, then $\int (cf)\,dV = c \int f\,dV$. [Hint: Show that this is true if $c \ge 0$, and that $-\underline{\int} g\,dV = \overline{\int} (-g)\,dV$ for every $g$. If $c < 0$, set $g = cf$ and $g = -cf$.]
(b) Show that $\int f\,dV \le \int g\,dV$ if $f \le g$.

5.4 Integrals over bounded sets


Let $A$ be a bounded measurable set and $f$ be a function that is bounded on $A$. More precisely, the domain of $f$ contains $A$ and there is a number $C$ such that $|f(x)| \le C$ for every $x \in A$. Let us consider a new function with the same values as $f$ on $A$ and the value 0 otherwise. This function is denoted by $f_A$. Thus
$$f_A(x) = \begin{cases} f(x) & \text{if } x \in A \\ 0 & \text{if } x \notin A. \end{cases}$$
The function $f_A$ is bounded and has compact support. The values of $f$ outside $A$ should contribute nothing to the integral of $f$ over $A$.

Definition. The function $f$ is integrable over $A$ if $f_A$ is an integrable function. The integral of $f$ over $A$ is the number
$$\int_A f\,dV = \int f_A\,dV. \tag{5.21}$$

In later sections it is sometimes convenient to use the notation $\int_A f(x)\,dV(x)$ for the integral $\int_A f\,dV$. Moreover, we sometimes emphasize the role of the dimension $n$ by writing $dV_n$ instead of $dV$. When $n = 1$, we usually write $\int_A f(x)\,dx$ instead of $\int_A f(x)\,dV_1(x)$.

Proposition 5.4 implies that sums and scalar multiples of functions integrable over $A$ are also integrable over $A$. Theorem 5.5 gives a widely applicable condition for integrability of $f$. In the meantime, we summarize a number of properties of the integral in the following theorem.

Theorem 5.4. If all the integrals involved exist, then:
(1) $\int_A (f + g)\,dV = \int_A f\,dV + \int_A g\,dV$.
(2) $\int_A (cf)\,dV = c \int_A f\,dV$.
(3) $\int_A 1\,dV = V(A)$.
(4) If $f(x) \le g(x)$ for every $x \in A$, then $\int_A f\,dV \le \int_A g\,dV$.
(5) If $|f(x)| \le C$ for every $x \in A$, then $\left|\int_A f\,dV\right| \le \int_A |f|\,dV \le CV(A)$.
(6) If $A$ is a null set, then $\int_A f\,dV = 0$.
(7) If $A \cap B$ is a null set, then $\int_{A \cup B} f\,dV = \int_A f\,dV + \int_B f\,dV$.

PROOF. First of all,
$$(f + g)_A = f_A + g_A, \qquad (cf)_A = c f_A.$$
Therefore (1) and (2) follow from Proposition 5.4. Let $1_A$ denote the function with the value 1 on $A$ and otherwise 0. It is a step function, called the characteristic function of $A$, and by (5.13) with $m = 1$, $c_1 = 1$, $\int 1_A\,dV = V(A)$. This establishes (3). For (4) we have $f_A \le g_A$ and apply Proposition 5.4. To prove (5) we have
$$f(x) \le |f(x)| \le C \text{ for every } x \in A.$$
Hence from (4)
$$\int_A f\,dV \le \int_A |f|\,dV \le \int_A C\,dV,$$
and the right-hand side is $CV(A)$. Similarly, $-f(x) \le |f(x)|$ and
$$-\int_A f\,dV \le \int_A |f|\,dV \le CV(A).$$
Since $\left|\int_A f\,dV\right|$ is either $\int_A f\,dV$ or its negative, this proves (5). Part (6) follows from (5). To prove (7), $f_{A \cup B} = f_A + f_B - f_{A \cap B}$. By Proposition 5.4
$$\int f_{A \cup B}\,dV = \int f_A\,dV + \int f_B\,dV - \int f_{A \cap B}\,dV,$$
and the last term is 0 by (6). □

From (6) and (7), the integral is unchanged if $A$ is replaced by $A \cup N$ or $A - N$, where $N$ is any null set. Similarly, if $f(x) = g(x)$ except for $x$ in some null set, then $f$ and $g$ have the same integral.

EXAMPLE 1. If $\operatorname{fr} A$ is a null set, then
$$\int_A f\,dV = \int_{\operatorname{int} A} f\,dV = \int_{\operatorname{cl} A} f\,dV.$$
In elementary examples $\operatorname{fr} A$ is always a null set. If $A$ is the set of rational numbers in $[0, 1]$, then $\operatorname{fr} A = [0, 1]$, which is not a null set. Problem 9, Section 5.2, furnishes an example of a compact subset of $E^1$ with frontier of positive length. There are open, connected subsets of $E^2$ with frontiers that have positive area.

Let us next show that under quite mild assumptions about $f$, the integral exists. It is for this purpose that the idea of measurable function is introduced. Let $f$ have domain $E^n$.

Definition. If $\{x: f(x) > c\}$ is a measurable set for every scalar $c$, then $f$ is a measurable function.

Figure 5.6

It is shown in Section 5.10 that such operations as taking the sum of two measurable functions or the limit of a sequence of measurable functions lead again to measurable functions. Just as for nonmeasurable sets, the only examples of nonmeasurable functions are obtained in a nonconstructive way using the axiom of choice.

Lemma 1. If $f$ is a bounded, nonnegative, measurable function with compact support, then $f$ is integrable.

PROOF. By replacing $f$ by $f/C$, where $C$ is an upper bound for $f(x)$, we may assume that $0 \le f(x) \le 1$ for every $x$. Let $I$ be an interval containing the support of $f$. Given $\varepsilon > 0$, let $m$ be a positive integer such that $V(I) < \varepsilon m$. For each $k = 1, \ldots, m$ let
$$E_k = \left\{x: f(x) > \frac{k - 1}{m}\right\}, \qquad A_k = \left\{x: \frac{k - 1}{m} < f(x) \le \frac{k}{m}\right\}$$
(see Figure 5.6). Since $f$ is measurable, each $E_k$ is a measurable subset of $I$, hence $A_k = E_k - E_{k+1}$ is also measurable. Let $\phi$ and $\psi$ be step functions defined by
$$\phi(x) = \frac{k}{m}, \qquad \psi(x) = \frac{k - 1}{m}$$
for $x \in A_k$, and $\phi(x) = \psi(x) = 0$ if $f(x) = 0$. Then $\phi(x) - \psi(x) = 1/m$ on $E_1 = \{x: f(x) > 0\}$ and is 0 otherwise. Hence
$$\int \phi\,dV - \int \psi\,dV = \frac{V(E_1)}{m}, \qquad \frac{V(E_1)}{m} \le \frac{V(I)}{m} < \varepsilon.$$
Moreover, $\psi \le f \le \phi$, from which
$$\int \psi\,dV \le \underline{\int} f\,dV \le \overline{\int} f\,dV \le \int \phi\,dV.$$
Hence the upper and lower integrals differ by less than $\varepsilon$. Since this is true for every positive $\varepsilon$, $f$ is integrable. □
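The construction in this proof subdivides the range of $f$ rather than its domain, and that can be tried out on a concrete function. The sketch below assumes $f(x) = x^2$ on $[0, 1]$ and $f = 0$ elsewhere, a choice made here only because the sets $A_k = \{x: (k-1)/m < f(x) \le k/m\}$ are then the intervals $(\sqrt{(k-1)/m}, \sqrt{k/m}\,]$ with an explicit length; the name `lebesgue_bounds` is hypothetical.

```python
import math

def lebesgue_bounds(m):
    """Integrals of psi and phi from the proof of Lemma 1 for f(x) = x**2 on [0, 1]."""
    lower = upper = 0.0
    for k in range(1, m + 1):
        # V(A_k), where A_k = {x : (k-1)/m < x**2 <= k/m}
        measure = math.sqrt(k / m) - math.sqrt((k - 1) / m)
        upper += (k / m) * measure          # phi = k/m on A_k
        lower += ((k - 1) / m) * measure    # psi = (k-1)/m on A_k
    return lower, upper

for m in (4, 100, 10000):
    lower, upper = lebesgue_bounds(m)
    print(m, lower, upper, upper - lower)
```

Here $E_1 = (0, 1]$ has measure 1, so the difference of the two integrals is $V(E_1)/m = 1/m$, as in the proof, and both values close in on $1/3$.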

The construction of the step functions $\phi$, $\psi$ in the proof just completed is a key to understanding the power of the Lebesgue theory of integrals. In Figure 5.6 we first subdivided the vertical axis, and then obtained the sets $A_k$ on which $\phi$ and $\psi$ are defined. From their definition, $\phi(x) - \psi(x)$ is uniformly small for large $m$. In fact, $\phi(x) - \psi(x) \le 1/m$ for all $x \in E^n$. This construction is quite different from that used in defining the Riemann integral. As mentioned at the end of Section 5.3, the latter begins with subdividing $E^n$ into intervals, on each of which elementary step functions approximating $f$ from above and below must be constant. Thus, the Lebesgue theory subdivides the 1-dimensional range of $f$, while the Riemann theory subdivides the $n$-dimensional domain of $f$.

To apply this key lemma, we let $A$ be a bounded measurable set, and $f$ be a function whose domain contains $A$. If $\{x \in A : f(x) > c\}$ is measurable for every scalar $c$, then we call $f$ measurable on $A$.

Theorem 5.5. If $f$ is bounded and measurable on $A$, then $f$ is integrable over $A$.

PROOF. Let us first assume that $f \ge 0$. If $c \ge 0$, then the set
$$\{x : f_A(x) > c\} = \{x \in A : f(x) > c\}$$
is measurable since $f$ is measurable on $A$. If $c < 0$, the left-hand side is $E^n$ which is measurable. Hence $f_A$ is measurable. By Lemma 1, $f_A$ is integrable.
If $f$ has negative values on $A$, let $g(x) = f(x) + C$ where $C$ is an upper bound for $|f(x)|$ on $A$. Then $g \ge 0$, and $g$ is bounded and measurable on $A$. Hence $g$ is integrable over $A$, and so is $f$. □

Corollary. If $f$ is bounded and continuous on $A$, then $f$ is integrable over $A$.

PROOF. For every $c$, $\{x \in A : f(x) > c\}$ is open relative to $A$. In other words, it is the intersection of $A$ with an open set, and hence is measurable. □

In particular, if $A$ is compact, then any $f$ continuous on $A$ is bounded and therefore integrable over $A$.

Theorem 5.5 has a sort of converse. If a bounded function $f$ is integrable over $A$, then $f$ is measurable on $A$. We do not prove this.

*Relation to the Riemann integral

A function $f$ is Riemann integrable over $A$ if $f_A$ is Riemann integrable according to the definition in Section 5.3. Its Riemann integral over $A$ is $S(f_A)$. If $f$ is Riemann integrable over $A$, then from (5.20), $S(f_A) = \int_A f\,dV$.

A bounded set $A$ is Jordan measurable if its characteristic function $1_A$ is Riemann integrable. It can be shown that $A$ is Jordan measurable if and only if $\operatorname{fr} A$ is a null set [1, p. 256].

If $A = [a, b]$, a closed interval of $E^1$, then the definition of Riemann integral given above can rather easily be shown to agree with Riemann's original definition of integral as the limit of sums.

PROBLEMS

1. Let $f(x) = 2x - x^2$ if $0 \le x \le 2$, $f(x) = 0$ otherwise. Using the notation in the proof of Lemma 1, describe the sets $A_1, \ldots, A_m$. Sketch the step functions $\phi$ and $\psi$ in the case $m = 4$.
2. In each case show that $f$ is integrable over $A$.
(a) $f(x) = x^2 \exp x$, $A = [0, a]$.
(b) $f(x) = \sin(1/x)$ if $x \ne 0$, $f(0) = 5$, $A = [-1, 1]$.
(c) $f(x, y) = (x^4 - y^2)/(x^2 - y)$, $A = \{(x, y): |x| \le 1,\ |y| \le 1,\ x^2 \ne y\}$.
(d) $f(x) = 0$ if $x$ is irrational, $f(x) = 1/q$ if $x = p/q$ where $p$ and $q$ are integers with no common factor, $A = (0, 1]$.
(e) $f(x) = 1$ if $x$ is irrational, $f(x) = 0$ if $x$ is rational, $A = [a, b]$.
3. For each part of Problem 2 describe the sets $\{x \in A: f(x) > c\}$.
4. Show that if $f$, $g$, and $h$ are integrable over $A$ and $|f(x) - g(x)| \le h(x)$ for every $x \in A$, then $\left|\int_A f\,dV - \int_A g\,dV\right| \le \int_A h\,dV$.
5. Let $f$ be of class $C^{(2)}$ on $[0, a]$ and $b = \max\{|f''(x)| : 0 \le x \le a\}$. Let $g(x) = f(0) + f'(0)x$. Using Problem 4, show that $\left|\int_0^a f\,dx - af(0) - a^2 f'(0)/2\right| \le a^3 b/6$. Use this result to estimate $\int_0^{1/2} \exp(-x^2/2)\,dx$.
6. (Mean value theorem for integrals.) Let $A$ be compact and connected. Let $f$ be continuous on $A$ and $g$ be integrable over $A$ with $g(x) \ge 0$ for every $x \in A$. Prove that there exists $x^* \in A$ such that
$$\int_A fg\,dV = f(x^*) \int_A g\,dV.$$
[Hint: Let $C$ and $c$ be the maximum and minimum values of $f$ on $A$. Then $cg \le fg \le Cg$. Use (2) and (4) of Theorem 5.4 and the intermediate value theorem.]

5.5 Iterated integrals


Thus far we have given no effective procedure for the actual evaluation of integrals. One method for doing this is by writing the integral as an iterated integral and applying the fundamental theorem of calculus. Let $1 \le s < n$. In most cases we shall take $s = 1$ or $s = n - 1$. Then $E^n$ can be regarded as the cartesian product $E^s \times E^{n-s}$. Let us write $x = (x', x'')$, where
$$x' = (x^1, \ldots, x^s) \in E^s, \qquad x'' = (x^{s+1}, \ldots, x^n) \in E^{n-s}.$$

Let $A$ be a set and $f$ a function whose domain contains $A$. In the present section we assume that $A$ is compact, and that $f$ is continuous on $A$. These assumptions are relaxed in Section 5.11. We show that $\int_A f\,dV$ can be expressed as an iterated integral as follows. First for fixed $x'$, integrate over a set $A(x') \subset E^{n-s}$. Then integrate the result of the first integration over a set $R \subset E^s$. The sets $A(x')$ and $R$ are defined as follows:
$$A(x') = \{x'' : (x', x'') \in A\}, \qquad R = \{x' : A(x') \text{ is not empty}\}. \tag{5.22}$$
The set $R$ is simply the projection of $A$ onto $E^s$ (see Figure 5.7).

Figure 5.7

Theorem 5.6. Let $f$ be continuous on a compact set $A$. Then
$$\int_A f(x)\,dV_n(x) = \int_R \left\{\int_{A(x')} f(x', x'')\,dV_{n-s}(x'')\right\} dV_s(x'). \tag{5.23}$$

As mentioned earlier, the left-hand side of (5.23) is another notation for $\int_A f\,dV$. A precise interpretation of the right-hand side is as follows. For fixed $x' \in R$, let $f(x', \cdot)$ denote the function whose value at each $x'' \in A(x')$ is $f(x', x'')$. The inner integral is the integral over $A(x')$ of the function $f(x', \cdot)$. Let
$$g(x') = \int_{A(x')} f(x', x'')\,dV_{n-s}(x'').$$
The outer integral on the right-hand side of (5.23) is the integral of $g$ over $R$. The set $R$ is compact, since $R$ is the projection of the compact set $A$ on $E^s$. It is easily shown that $A(x')$ is compact for each $x'$, and that $f(x', \cdot)$ is continuous on $A(x')$. However, we do not need to use these facts in the proof of Theorem 5.6. The function $g$ need not be continuous. However, it is shown in the course of proving Theorem 5.6 that $g$ is integrable over $R$. In the proof of Theorem 5.6, the following fact about monotone sequences of functions is used. For each $\nu = 1, 2, \ldots$ let $F_\nu$ be a bounded measurable function with compact support such that
$$F_1 \ge F_2 \ge \cdots \ge 0.$$
Let
$$F(x) = \lim_{\nu \to \infty} F_\nu(x) \text{ for every } x.$$
Then $F$ is measurable (and of course bounded), and
$$\int F\,dV_n = \lim_{\nu \to \infty} \int F_\nu\,dV_n.$$
A proof of this fact is given in the section on convergence theorems (Corollary 3, Section 5.11).
PROOF. The proof of Theorem 5.6 proceeds by observing that the theorem is true for elementary step functions, and then by constructing a monotone sequence of elementary step functions tending to $f_A$.

Let us first show that
$$\int \Phi(x)\,dV_n(x) = \int \left\{\int \Phi(x', x'')\,dV_{n-s}(x'')\right\} dV_s(x') \tag{$*$}$$
if $\Phi$ is any elementary step function. If $I$ is any $n$-dimensional interval, then $I = I' \times I''$, where $I'$ and $I''$ are $s$- and $(n - s)$-dimensional intervals. If $\Phi$ is the characteristic function $1_I$ of $I$, then $(*)$ becomes $V_n(I) = V_s(I')V_{n-s}(I'')$, which is true by definition of measure for intervals. But any elementary step function can be written as a linear combination $\Phi = c_1\Phi_1 + \cdots + c_p\Phi_p$, where each $\Phi_k$ is the characteristic function of an $n$-dimensional interval. Then
$$\int \Phi\,dV_n = \sum_{k=1}^{p} c_k \int \Phi_k\,dV_n = \sum_{k=1}^{p} c_k \int \left\{\int \Phi_k(x', x'')\,dV_{n-s}(x'')\right\} dV_s(x') = \int \left\{\int \Phi(x', x'')\,dV_{n-s}(x'')\right\} dV_s(x'),$$
which is just $(*)$.

Now let $A$ be a compact set and $f$ be continuous on $A$. For the moment, assume that $f \ge 0$. Let $F = f_A$. Let us define a monotone sequence of elementary step functions $F_1 \ge F_2 \ge \cdots$ as follows. Let $I_0$ be some interval containing $A$ and let $C$ be the maximum value of $f$ on $A$. Let $F_1(x) = C$ for $x \in I_0$, and $F_1(x) = 0$ otherwise. Divide $I_0$ into $2^n$ congruent subintervals $I_1, \ldots, I_m$, $m = 2^n$, and let $A_k = A \cap \operatorname{cl} I_k$. If $x \in I_k$ and $A_k$ is not empty, let $F_2(x)$ be the maximum value of $f$ on $A_k$. Otherwise, let $F_2(x) = 0$. Then $F_1 \ge F_2 \ge F$. The function $F_3$ is defined similarly by dividing each interval $I_k$ into $2^n$ congruent subintervals, and so on (Figure 5.8).

Figure 5.8

Let $d_0 = \operatorname{diam} I_0$. At the $\nu$th step the diameter $d_\nu$ of each interval $I$ is $2^{-\nu+1} d_0$, and $F_\nu(x) = 0$ except on those intervals of diameter $d_\nu$ whose closures meet $A$. Let us show that for every $x$
$$F(x) = \lim_{\nu \to \infty} F_\nu(x). \tag{$**$}$$
If $x \notin A$, then since $A$ is closed there exists $\nu(x)$ such that $d_\nu < \operatorname{dist}(x, A)$ for every $\nu \ge \nu(x)$. In this case $0 = F(x) = F_\nu(x)$ when $\nu \ge \nu(x)$. Thus $(**)$ holds for every $x \notin A$. Suppose that $x \in A$. Since $f$ is continuous, given $\varepsilon > 0$ there exists $\delta(x) > 0$ such that $|f(y) - f(x)| < \varepsilon$ for every $y \in A$ such that $|y - x| < \delta(x)$. Choose $\nu(x)$ such that $d_\nu < \delta(x)$ for every $\nu \ge \nu(x)$. Then $F(x) \le F_\nu(x) < F(x) + \varepsilon$ for every $\nu \ge \nu(x)$. Therefore $(**)$ also holds when $x \in A$.

By the monotone sequences theorem
$$\int F\,dV_n = \lim_{\nu \to \infty} \int F_\nu\,dV_n.$$
For each $x'$, the functions $F_\nu(x', \cdot)$ form a monotone sequence tending to $F(x', \cdot)$. Let
$$G_\nu(x') = \int F_\nu(x', \cdot)\,dV_{n-s}, \qquad G(x') = \int F(x', \cdot)\,dV_{n-s}.$$
Applying the monotone sequences theorem to the sequence $[F_\nu(x', \cdot)]$,
$$\lim_{\nu \to \infty} G_\nu(x') = G(x')$$
for every $x'$. Moreover, $G_1 \ge G_2 \ge \cdots$. Applying the monotone sequences theorem to this sequence,
$$\lim_{\nu \to \infty} \int G_\nu\,dV_s = \int G\,dV_s.$$
Since $F_\nu$ is an elementary step function, we have by $(*)$ with $\Phi = F_\nu$,
$$\int F_\nu\,dV_n = \int G_\nu\,dV_s.$$
Therefore
$$\int F\,dV_n = \int G\,dV_s. \tag{5.23a}$$
Since $F = f_A$, $\int F\,dV_n = \int_A f\,dV_n$. For $x' \in R$, the inner integral on the right-hand side of (5.23) is $G(x')$; and $G(x') = 0$ for $x' \notin R$. Thus, $\int G\,dV_s$ equals the outer integral on the right-hand side of (5.23). From (5.23a), Theorem 5.6 is proved in case $f \ge 0$.

To remove the assumption $f \ge 0$, write $f = (f + C) - C$, where $C$ is the maximum value of $|f|$ on $A$. Then $f$ is the difference of nonnegative continuous functions for each of which (5.23) holds. By subtraction, (5.23) holds for $f$. □
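The monotone sequence of elementary step functions used in this proof can be simulated in a simple case. The sketch below makes several assumptions for illustration only: $n = 2$, $A$ is the closed unit disk, $f = 1$ on $A$, and $I_0 = [-2, 2) \times [-2, 2)$. With these choices $F_\nu$ equals 1 on each cell of the $\nu$th subdivision whose closure meets $A$ and 0 elsewhere, so its integral is the number of such cells times the cell area. The function name `outer_area` is hypothetical.

```python
import math

def outer_area(nu, half_width=2.0):
    """Integral of F_nu for f = 1 on the closed unit disk, I_0 = [-2, 2) x [-2, 2)."""
    cells_per_side = 2 ** (nu - 1)               # step 1 uses I_0 itself
    side = 2 * half_width / cells_per_side
    count = 0
    for i in range(cells_per_side):
        x_lo = -half_width + i * side
        # distance from 0 to the closed interval [x_lo, x_lo + side]
        dx = max(x_lo, 0.0, -(x_lo + side))
        for j in range(cells_per_side):
            y_lo = -half_width + j * side
            dy = max(y_lo, 0.0, -(y_lo + side))
            if dx * dx + dy * dy <= 1.0:         # closure of the cell meets A
                count += 1
    return count * side * side

for nu in range(1, 10):
    print(nu, outer_area(nu))
print("pi =", math.pi)
```

The printed values are nonincreasing and approach $\pi = V_2(A)$, in line with the monotone sequences theorem applied to $F_1 \ge F_2 \ge \cdots$.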

It is not essential in Theorem 5.6 that $x' = (x^1, \ldots, x^s)$. One can equally well take integers $i_1 < \cdots < i_s$ and
$$x' = (x^{i_1}, \ldots, x^{i_s}), \qquad x'' = (x^{j_1}, \ldots, x^{j_{n-s}}),$$
where $j_1 < \cdots < j_{n-s}$ are those integers between 1 and $n$ not included among $i_1, \ldots, i_s$.

In particular, let $n = 2$. Then $s = 1$, and we can take either $x' = x$ or $x' = y$. To avoid writing parentheses we write $\int dx \int f\,dy$ instead of $\int \{\int f\,dy\}\,dx$. Then
$$\int_A f\,dV_2 = \int dx \int f\,dy = \int dy \int f\,dx,$$
where the iterated integrals are taken over the appropriate subsets of $E^1$. Many authors write the iterated integral as $\iint f\,dy\,dx$, but this notation would lead to confusion when we come to the exterior differential calculus in later chapters.

The iterated integral is usually easier to evaluate when taken in one of the two possible orders than in the other order.

EXAMPLE 1. Consider the iterated integral
$$\int_{-1}^{1} dx \int_{x^2}^{1} (x^2 + y)\,dy.$$
Then $f(x, y) = x^2 + y$ and $A$ is as shown (Figure 5.9). Evaluating the inner integral first, we get
$$\int_{-1}^{1} \left(\frac12 + x^2 - \frac32 x^4\right) dx = \frac{16}{15}.$$
Writing $\int_A f\,dV_2$ as an iterated integral in the opposite order, we get
$$\int_0^1 dy \int_{-\sqrt{y}}^{\sqrt{y}} (x^2 + y)\,dx = \int_0^1 \left(\frac{x^3}{3} + xy\right)\Bigg|_{-\sqrt{y}}^{\sqrt{y}} dy = \frac83 \int_0^1 y^{3/2}\,dy = \frac{16}{15}.$$
In evaluating these integrals we have, of course, used the fundamental theorem of calculus (Appendix A.3). If $n = 3$, then the integral can be written as an iterated triple integral.
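The agreement of the two orders of integration in Example 1 can also be checked numerically. The sketch below is only an illustration: it replaces the fundamental theorem of calculus by a midpoint rule, and the helper names `midpoint`, `integrate_y_first`, and `integrate_x_first` are assumptions introduced here.

```python
import math

def midpoint(g, a, b, n=200):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def f(x, y):
    return x * x + y

def integrate_y_first():
    """Integral over x in [-1, 1] of the inner integral over y in [x^2, 1]."""
    return midpoint(lambda x: midpoint(lambda y: f(x, y), x * x, 1.0), -1.0, 1.0)

def integrate_x_first():
    """Integral over y in [0, 1] of the inner integral over x in [-sqrt(y), sqrt(y)]."""
    return midpoint(
        lambda y: midpoint(lambda x: f(x, y), -math.sqrt(y), math.sqrt(y)), 0.0, 1.0)

print(integrate_y_first(), integrate_x_first(), 16 / 15)
```

Both approximations land close to $16/15$, the value found by hand in either order.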

Figure 5.9

EXAMPLE 2. Let $A = \{(x, y, z): x \ge 0,\ z \ge 0,\ 0 \le y \le 4 - x^2 - z^2\}$. Then
$$\int_A f\,dV_3 = \int_R \left\{\int_{A(x, y)} f\,dz\right\} dV_2(x, y),$$
and $A(x, y)$ is the interval $[0, \sqrt{4 - x^2 - y}\,]$ (see Figure 5.10). Writing the integral over $R$ as an iterated integral, we get
$$\int_A f\,dV_3 = \int_0^2 dx \int_0^{4 - x^2} dy \int_0^{\sqrt{4 - x^2 - y}} f\,dz.$$
There are five other possible ways of writing $\int_A f\,dV_3$ as an iterated triple integral. For instance,
$$\int_A f\,dV_3 = \int_S \left\{\int_0^{4 - x^2 - z^2} f\,dy\right\} dV_2(x, z) = \int_0^2 dz \int_0^{\sqrt{4 - z^2}} dx \int_0^{4 - x^2 - z^2} f\,dy,$$
where $S = \{(x, z): x \ge 0,\ z \ge 0,\ x^2 + z^2 \le 4\}$.

Figure 5.10

In the same way, for any $n$ the integral can be written in $n!$ possible ways as an $n$-fold iterated integral.

As a special case of Theorem 5.6, one can take $f(x) = 1$ for all $x \in A$. One gets the following formula for the measure of $A$.

Corollary
$$V_n(A) = \int_R V_{n-s}[A(x')]\,dV_s(x'). \tag{5.24}$$

In particular, let $s = 1$. Writing $x' = x^1 = u$, (5.24) becomes
$$V_n(A) = \int_R V_{n-1}[A(u)]\,du. \tag{5.25}$$
The set $A(u)$ is congruent to the intersection of $A$ and the hyperplane $x^1 = u$. In effect, (5.25) states that $V_n(A)$ is the integral of the $(n - 1)$-dimensional measures of these intersections. For $n = 3$, this is the method of "volumes by slices" of elementary calculus.
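As a small numerical illustration of (5.25), the volume of the unit ball in $E^3$ can be recovered from the areas of its slices. The sketch below assumes that the slice $A(u)$ is a disk of radius $(1 - u^2)^{1/2}$, with area $\pi(1 - u^2)$, and approximates the integral over $R = [-1, 1]$ by a midpoint rule; the names `volume_by_slices` and `disk_slice` are hypothetical.

```python
import math

def volume_by_slices(slice_measure, a, b, n=1000):
    """Midpoint-rule approximation of the integral of V_{n-1}[A(u)] over [a, b]."""
    h = (b - a) / n
    return h * sum(slice_measure(a + (i + 0.5) * h) for i in range(n))

def disk_slice(u):
    """Area of the slice of the unit 3-ball by the plane x^1 = u."""
    return math.pi * (1.0 - u * u)

print(volume_by_slices(disk_slice, -1.0, 1.0), 4 * math.pi / 3)
```

The approximation agrees with the familiar value $4\pi/3$ to several decimal places.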

Figure 5.11

EXAMPLE 3. Let $A$ be the $n$-simplex with vertices $0, ce_1, \ldots, ce_n$, where $c > 0$. For $c = 1$ this is the standard $n$-simplex (Section 1.5). Let us show by induction on $n$ that $V_n(A) = c^n/n!$. If $n = 1$, then $A = [0, c]$ and $V_1(A) = c$. Assuming the result in dimension $n - 1$, we apply (5.25) (see Figure 5.11). Now
$$A = \{x: x^1 + \cdots + x^n \le c,\ x^i \ge 0 \text{ for } i = 1, \ldots, n\},$$
$$A(u) = \{x'': x^2 + \cdots + x^n \le c - u,\ x^i \ge 0 \text{ for } i = 2, \ldots, n\},$$
and $R = [0, c]$. Therefore
$$V_n(A) = \int_0^c \frac{(c - u)^{n-1}}{(n - 1)!}\,du = -\frac{(c - u)^n}{n(n - 1)!}\Bigg|_0^c = \frac{c^n}{n!}.$$
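The induction in Example 3 can be mirrored in a few lines, and the result cross-checked by random sampling. This is a sketch under stated assumptions: `simplex_volume` carries out the slice recursion with exact fractions, while `monte_carlo_volume` is an independent, approximate check using a seeded generator; both names and the sample size are choices made here.

```python
import random
from fractions import Fraction
from math import factorial

def simplex_volume(n, c):
    """V_n of the simplex with vertices 0, c*e_1, ..., c*e_n, via the recursion from (5.25)."""
    coeff = Fraction(1)                  # V_1 = c, i.e. coefficient 1 of c**1
    for dim in range(2, n + 1):
        # integral over [0, c] of coeff * (c - u)**(dim - 1) du = coeff * c**dim / dim
        coeff /= dim
    return coeff * Fraction(c) ** n

def monte_carlo_volume(n, c, samples=200_000, seed=0):
    """Estimate the same volume as (fraction of cube points in the simplex) * c**n."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(samples)
               if sum(rng.uniform(0, c) for _ in range(n)) <= c)
    return hits / samples * c ** n

print(simplex_volume(4, 2), Fraction(2 ** 4, factorial(4)))
print(monte_carlo_volume(3, 1.0), 1 / 6)
```

The exact recursion reproduces $c^n/n!$, and the sampled estimate for $n = 3$, $c = 1$ falls near $1/6$.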


EXAMPLE 4. Let $s = n - 1$. Suppose that $A$ has the particular form
$$A = \{x: h(x') \le x^n \le H(x'),\ x' \in R\}.$$
Then
$$A(x') = [h(x'), H(x')]$$
and
$$V_n(A) = \int_R [H(x') - h(x')]\,dV_{n-1}(x').$$
For instance, if $n = 3$ then $A$ is a solid bounded above by the surface with equation $z = H(x, y)$ and below by the one with equation $z = h(x, y)$, $(x, y) \in R$.

Moments about hyperplanes, centroids

Let $P$ be a hyperplane (Section 1.3). Then $P = \{x: y \cdot x = c\}$. We may suppose that $|y| = 1$. This determines the vector $y$ up to a change in sign. We make a particular choice for $y$, which amounts to choosing one of the two half spaces into which $P$ divides $E^n$ as "positive" and the other "negative." The moment of a point $x$ about $P$ is defined to be $y \cdot x - c$. By Problem 6, Section 4.8, the moment is in absolute value equal to the distance from $x$ to $P$. The moment of a bounded measurable set $A$ about $P$ is defined as $\int_A (y \cdot x - c)\,dV_n(x)$.

In particular, let us take $y = e_i$ a standard basis vector and $c = 0$. Then $P = \{x: x^i = 0\}$. We denote the moment of $A$ about $P$ by $m_i$ in this case. Thus
$$m_i = \int_A x^i\,dV_n(x), \qquad i = 1, \ldots, n.$$
Let $m = (m_1, \ldots, m_n)$. The centroid of $A$ is the point $\bar{x}$ such that $m = V_n(A)\bar{x}$, provided $V_n(A) > 0$. The components of $\bar{x}$ satisfy
$$\bar{x}^i = \frac{m_i}{V_n(A)}, \qquad i = 1, \ldots, n.$$

The centroid has the following simple interpretation in elementary mechanics. Suppose that $A$ is a body made of some material of density $\rho(x)$. The mass of $A$ is $\int_A \rho(x)\,dV_n(x)$, and the moment of mass about hyperplane $P$ is $\int_A (y \cdot x - c)\rho(x)\,dV_n(x)$. The center of mass is the point $x^*$ such that
$$x^{*i} \int_A \rho(x)\,dV_n(x) = \int_A x^i \rho(x)\,dV_n(x), \qquad i = 1, \ldots, n.$$
If $\rho$ is constant, then $x^* = \bar{x}$.

EXAMPLE 5. Find the centroid of the hemispherical ball
$$H = \{x: |x| \le 1,\ x^1 \ge 0\}.$$
Let $\alpha_n$ denote the measure of an $n$-ball of radius 1. Then $V_n(H) = \tfrac12\alpha_n$. From elementary geometry $\alpha_n$ is known for $n \le 3$. Problem 7 gives $\alpha_4$; in Section 5.9 a general formula for $\alpha_n$ is given. Let us use the intuitively evident fact that the measure of an $n$-ball of radius $r$ is $\alpha_n r^n$. This follows from Theorem 5.7, Section 5.7.

Figure 5.12

The components of the centroid $\bar{x}$ are $\bar{x}^i = m_i/V_n(H)$. By symmetry considerations, $m_i = 0$ if $i > 1$. By the same method as in Example 3, using the fact that $H(u)$ is an $(n - 1)$-ball of radius $(1 - u^2)^{1/2}$ (Figure 5.12),
$$m_1 = \int_H x^1\,dV_n(x) = \int_0^1 u\,du \int_{H(u)} 1\,dV_{n-1}(x'') = \alpha_{n-1} \int_0^1 u(1 - u^2)^{(n-1)/2}\,du = \frac{\alpha_{n-1}}{n + 1}.$$
Thus
$$\bar{x}^1 = \frac{2\alpha_{n-1}}{(n + 1)\alpha_n}.$$
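The first centroid component of the hemispherical ball, $\bar{x}^1 = m_1/V_n(H)$ with $m_1 = \alpha_{n-1}/(n + 1)$ and $V_n(H) = \alpha_n/2$, can be evaluated for specific $n$. The sketch below assumes the standard closed form $\alpha_n = \pi^{n/2}/\Gamma(n/2 + 1)$ for the measure of the unit $n$-ball, which is not derived in this section; the function names are hypothetical.

```python
import math

def unit_ball_measure(n):
    """alpha_n, assuming the standard formula pi**(n/2) / Gamma(n/2 + 1)."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)

def hemisphere_centroid(n):
    """First component of the centroid of H = {x : |x| <= 1, x^1 >= 0}."""
    m1 = unit_ball_measure(n - 1) / (n + 1)      # moment about the hyperplane x^1 = 0
    return m1 / (unit_ball_measure(n) / 2)       # divide by V_n(H) = alpha_n / 2

for n in (2, 3, 4, 10):
    print(n, hemisphere_centroid(n))
```

For $n = 2$ this gives $4/(3\pi)$ and for $n = 3$ it gives $3/8$, the familiar centroids of a half disk and a solid hemisphere.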

Even-order moments about a closed set


Let B be a closed set, and k an even positive integer (k = 2, 4, ...). The kth
order moment of a point x about B is $[\operatorname{dist}(x, B)]^k$, where dist(x, B) is the
distance from x to B. The kth order moment of a bounded measurable set A
about B is

$$\int_A [\operatorname{dist}(x, B)]^k\, dV_n(x).$$


The case k = 2 is of interest in mechanics.


PROBLEMS
1. Find the area and also the centroid of:
(a) $\{(x, y) : x^2 \le y \le x + 2\}$. (b) $\{(x, y) : |y| - 1 \le x \le \sqrt{1 - y^2}\}$.
2. Express the iterated integral

$$\int_0^1 dy \int_0^{f(y)} xy\, dx, \quad \text{where } f(y) = \min[1, \log(1/y)],$$

as an integral over a set A c E2, and then as an iterated integral in the opposite
order. Evaluate it.
3. Express as an iterated triple integral:

$$\int_A f\, dV_3, \quad \text{where } A = \{(x, y, z) : x^2 + z^2 \le y^2 \le 8 - (x^2 + z^2)\}.$$

4. Find the volume of


$$\{(x, y, z) : |x| + |y| + |z| \le 2,\ |x| \le 1,\ |y| \le 1\}.$$
5. Find the volume of

$$\{(x, y, z) : |x| + |y| + |z| \le 2,\ z^2 \le y\}.$$


6. (a) Suppose that $f(x, y) = g(x)h(y)$ for every $(x, y) \in A$ and that $A = R \times S$.
Show that
$$\int_A f\, dV = \left(\int_R g\, dV\right)\left(\int_S h\, dV\right).$$
(b) Evaluate g dx SA exp(x + y)dy.


(c) Evaluate So dy S(Jf2 xy cos(x + y)dx.
7. Let $\alpha_n$ be the measure of the unit n-ball $\{x : |x| \le 1\}$. Show that

$$\alpha_n = 2\alpha_{n-1} \int_0^1 (1 - u^2)^{(n-1)/2}\, du.$$

Show that $\alpha_4 = \pi^2/2$. (In Section 5.9 we give a general formula for $\alpha_n$.)

8. Write points of $E^{n+1}$ as (x, z), where $x = (x^1, \ldots, x^n)$. Let
$$A = \{(x, z) : 0 \le z \le 1 - |x|^2\}.$$

Show that $V_{n+1}(A) = 2\alpha_n/(n + 2)$.


9. Let $\Sigma$ be the standard n-simplex.
(a) Show that the centroid of $\Sigma$ is at the barycenter.
(b) Show that the second moment of $\Sigma$ about the (n - r)-dimensional plane
spanned by $e_{r+1}, \ldots, e_n$ is $2r/(n + 2)!$.
10. Recall the definition of uniform convergence of a sequence of functions (Section
2.10). Show that the sequence $F_1, F_2, \ldots$ constructed in the proof of Theorem 5.6
converges uniformly to $f_A$ if and only if $f(x) = 0$ for every $x \in \operatorname{fr} A$. [Hint: f is
uniformly continuous on the compact set A (see Problem 8, Section 2.5).]


5.6 Integrals of continuous functions


In the previous sections the integral of a bounded function f over a bounded
set A was considered. However, in some instances the integral can be
defined without these boundedness assumptions. In the present section we
suppose that f is continuous on A. In Section 5.11 we make the weaker
assumption that f is a measurable function.
Let us first assume that $f \ge 0$. Let A be a measurable set. For any compact
set $K \subset A$ the integral $\int_K f\, dV$ exists in the sense of Section 5.4. The integral
of f over A is defined as the least upper bound of such integrals:

$$\int_A f\, dV = \sup\left\{\int_K f\, dV : K \subset A,\ K \text{ compact}\right\}. \tag{5.26}$$

If the set of numbers on the right-hand side has an upper bound, then f is
integrable over A. Otherwise, the integral diverges to $+\infty$. The definition has
two immediate consequences, which we state in the following form.

Lemma 1. Let g and h be continuous on A, with $g \ge 0$, $h \ge 0$.

(a) If $h \le g$ and g is integrable over A, then h is integrable over A.
(b) If g and h are both integrable over A, then g + h is integrable over A.
PROOF. To prove (a), for every compact set $K \subset A$,

$$\int_K h\, dV \le \int_K g\, dV \le \int_A g\, dV.$$

Hence $\int_A g\, dV$ is an upper bound for $\{\int_K h\, dV : K \subset A\}$.
To prove (b), for every compact set $K \subset A$,

$$\int_K (g + h)\, dV = \int_K g\, dV + \int_K h\, dV \le \int_A g\, dV + \int_A h\, dV.$$

Hence the right-hand side is an upper bound for $\{\int_K (g + h)\, dV : K \subset A\}$. □

Let us consider some important special cases of (5.26).


Case 1. A is closed, f is continuous on A, and $f \ge 0$. For every $r \ge 0$ let
$A(r) = \{x \in A : |x| \le r\}$, and let

$$\psi(r) = \int_{A(r)} f\, dV.$$

Since $f \ge 0$, $\psi(r)$ is a nondecreasing function. Hence, as $r \to \infty$, $\psi(r)$ tends to
a limit l, finite or $+\infty$. Since A(r) is a compact subset of A, by (5.26)
$\psi(r) \le \int_A f\, dV$ for each r. Hence $l \le \int_A f\, dV$. On the other hand, if $K \subset A$
and K is compact, then $K \subset A(r)$ for large enough r. Hence

$$\int_K f\, dV \le \int_{A(r)} f\, dV \le l.$$

Since this is true for each such K, $\int_A f\, dV \le l$. We have shown that

$$\int_A f\, dV = \lim_{r \to +\infty} \int_{A(r)} f\, dV. \tag{5.27}$$

In particular, if n = 1 and $A = [a, \infty)$, then

$$\int_a^\infty f\, dx = \lim_{r \to +\infty} \int_a^r f\, dx.$$

EXAMPLE 1. Let $f(x) = x^{-p}$. Then

$$\int_1^\infty x^{-p}\, dx = \frac{1}{p - 1}$$

if $p > 1$. If $p \le 1$, then the integral diverges to $+\infty$.


EXAMPLE 2. Let $f(x) = \exp(-bx)$, $b > 0$. Then f is integrable over $[a, \infty)$
for any a.

Figure 5.13

Case 2. Let $A = K - \{x_0\}$, where K is compact. The function f is continuous
on A, $f \ge 0$, but f may be unbounded on any neighborhood of $x_0$.
For each $\delta > 0$ let $A'(\delta) = \{x \in K : |x - x_0| \ge \delta\}$ (Figure 5.13). Each of the
sets $A'(\delta)$ is compact, and

$$\int_A f\, dV = \lim_{\delta \to 0^+} \int_{A'(\delta)} f\, dV. \tag{5.28}$$
The proof of (5.28) is similar to that for (5.27). The same formula holds if f
is continuous on $A = K - L$, where L is any closed set. In this case $A'(\delta)$
is the set of points of K distant at least $\delta$ from L.

EXAMPLE 3. Let $A = (0, 1]$ and $f(x) = x^{-p}$. Then $\int_0^1 x^{-p}\, dx = 1/(1 - p)$ if
$p < 1$, and the integral diverges to $+\infty$ if $p \ge 1$. [Note: If $A \subset [a, b]$ and
$[a, b] - A$ is a null set, then we still use the notation $\int_a^b$ for $\int_A$.]

Case 3. Let f be continuous on $A = K - \{x_0\}$, where K is closed but not
compact. Let $A_1 = \{x \in K : |x - x_0| \ge 1\}$, $A_2 = \{x \in K : 0 < |x - x_0| \le 1\}$.
Then $\int_{A_1} f\, dV$ and $\int_{A_2} f\, dV$ can be treated respectively as in Cases 1 and 2.
Since $A_1 \cap A_2$ is a null set, if both of these integrals exist their sum is $\int_A f\, dV$.
If either the integral of f over $A_1$ or the integral over $A_2$ does not exist, then
f is not integrable over A.

EXAMPLE 3 (continued). Let $A = (0, \infty)$ and $f(x) = x^{-p}$. Taking $A_1 = [1, \infty)$,
$A_2 = (0, 1]$, the integral over $A_1$ exists only if $p > 1$, and over $A_2$ only if
$p < 1$. Hence $\int_0^\infty x^{-p}\, dx$ diverges to $+\infty$ for every p.
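The two limits in Examples 1 and 3 can be watched directly, since the truncated integrals have elementary closed forms. The sketch below is illustrative only; the function names are hypothetical and the closed forms are the standard antiderivatives of $x^{-p}$.

```python
import math


def tail_integral(p: float, r: float) -> float:
    """Integral of x^(-p) over [1, r], as in Case 1 with A(r) = [1, r]."""
    if p == 1:
        return math.log(r)
    return (r ** (1 - p) - 1) / (1 - p)


def head_integral(p: float, delta: float) -> float:
    """Integral of x^(-p) over [delta, 1], as in Case 2 with A'(delta) = [delta, 1]."""
    if p == 1:
        return -math.log(delta)
    return (1 - delta ** (1 - p)) / (1 - p)
```

For p = 2 the values `tail_integral(2, r)` approach 1/(p - 1) = 1 as r grows, while `head_integral(2, delta)` grows without bound as delta shrinks. For p = 1/2 the roles are reversed, so no single p makes both pieces finite.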

Let us now consider a function f continuous on A, such that f may have
both positive and negative values. Its integral is defined by (5.30), after
writing f as the difference of two nonnegative functions. Let
$$f^+(x) = \max\{f(x), 0\},$$
$$f^-(x) = \max\{-f(x), 0\}.$$
Then $f^+$ and $f^-$ are continuous on A (Problem 10) and
$$f(x) = f^+(x) - f^-(x) \tag{5.29}$$
for every $x \in A$ (see Figure 5.14). The function f is called integrable over A
if $f^+$ and $f^-$ are integrable over A. Its integral is

$$\int_A f\, dV = \int_A f^+\, dV - \int_A f^-\, dV. \tag{5.30}$$

If both f and A are bounded, the new definition of integral agrees with
that in Section 5.4 (Problem 11).

Figure 5.14

Some authors assign a value $+\infty$ or $-\infty$ to the integral in case at most
one of the functions $f^+$ and $f^-$ has a divergent integral. However, in no case
should one try to evaluate the meaningless expression $\infty - \infty$ if both $f^+$
and $f^-$ have divergent integrals.
Lemma 1 above has two important corollaries.

Corollary 1. Let f be continuous on A. Then f is integrable over A if and
only if $|f|$ is integrable over A.

PROOF. Let f be integrable over A. Then $f^+$ and $f^-$ are integrable over A,
and by (b) of Lemma 1 so is their sum $|f| = f^+ + f^-$. Conversely, let $|f|$
be integrable over A. Since $0 \le f^+ \le |f|$, $0 \le f^- \le |f|$, by (a) of Lemma 1
$f^+$ and $f^-$ are integrable over A. Hence f is integrable over A. □

Corollary 2 (Comparison test). Let f and g be continuous on A. If $|f| \le g$
and g is integrable over A, then f is integrable over A.
PROOF. In (a) of Lemma 1, let $h = |f|$. Then $|f|$ is integrable over A, and by
Corollary 1 so is f. □

In many instances it can be shown that f is integrable by comparing $|f|$
with a function g known to be integrable (Corollary 2).

EXAMPLE 4. Let $A = K - \{x_0\}$, where K is an n-ball with center $x_0$. Then

$$\int_A |x - x_0|^{-p}\, dV(x)$$

exists if $p < n$ and diverges to $+\infty$ if $p \ge n$. This can be proved by introducing
generalized spherical coordinates (Section 5.9). Let $f(x) =
\phi(x)|x - x_0|^{-p}$, where $\phi$ is continuous and $|\phi(x)| \le C$ for every $x \in A$.
Let $g(x) = C|x - x_0|^{-p}$. By the comparison test, f is integrable over A
if $p < n$. If $\phi(x) \ge c > 0$ for every $x \in A$, let $h(x) = c|x - x_0|^{-p}$. If $p \ge n$,
then $f \ge h$. The integral of h over A diverges to $+\infty$, and hence so does the
integral of f.
In the same way

$$\int_{E^n - K} |x - x_0|^{-p}\, dV(x)$$

exists if $p > n$ and diverges to $+\infty$ if $p \le n$. A similar discussion applies.
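The threshold p = n in Example 4 can be made concrete. The sketch below is an illustration under stated assumptions, not the book's proof: it takes K to be the unit ball, uses the radial formula of Example 1, Section 5.9, with $\beta_n = n\alpha_n$, and assumes the closed form $\alpha_n = \pi^{n/2}/\Gamma(n/2 + 1)$. The function names are hypothetical.

```python
import math


def unit_ball_measure(n: int) -> float:
    """alpha_n via the closed form pi^(n/2) / Gamma(n/2 + 1) (assumed, see lead-in)."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)


def punctured_ball_integral(n: int, p: float, delta: float) -> float:
    """Integral of |x - x0|^(-p) over delta <= |x - x0| <= 1 in E^n.

    Uses n * alpha_n * integral_delta^1 r^(n - 1 - p) dr, evaluated in closed form.
    """
    beta = n * unit_ball_measure(n)
    if p == n:
        return beta * (-math.log(delta))
    return beta * (1 - delta ** (n - p)) / (n - p)
```

With n = 3 and p = 2 the values settle at 4π as delta tends to 0, whereas with p = 3 they grow like log(1/delta).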

EXAMPLE 5. Let

$$\Gamma(u) = \int_0^\infty x^{u-1} \exp(-x)\, dx, \qquad u > 0. \tag{5.31}$$

The function $\Gamma$ is called the gamma function. Let $f(x) = x^{u-1} \exp(-x)$. Since
$f(x) \le x^{u-1}$ and $p = 1 - u < 1$, f is integrable over (0, 1] by comparison
with $x^{-p}$. For any b, $x^b \exp(-x) \to 0$ as $x \to +\infty$ and therefore is bounded
on $[1, \infty)$. Letting $b = u + 1$ we see that there is a number C such that
$f(x) \le C/x^2$ for every $x \in [1, \infty)$. Thus f is integrable over $[1, \infty)$ and over
(0, 1], therefore over $(0, \infty)$.
The gamma function generalizes the factorial. First of all,

$$\Gamma(1) = \int_0^\infty \exp(-x)\, dx = 1. \tag{5.32}$$

Integrating by parts,

$$\Gamma(u + 1) = \int_0^\infty x^u \exp(-x)\, dx
= -x^u \exp(-x)\Big|_0^\infty + u \int_0^\infty x^{u-1} \exp(-x)\, dx,$$

which gives
$$\Gamma(u + 1) = u\Gamma(u). \tag{5.33}$$
The integration by parts over $(0, \infty)$ is justified by taking it first over intervals
$[\delta, 1]$ and $[1, r]$ and letting $\delta \to 0^+$, $r \to +\infty$. In particular, if m is a positive integer,
then $\Gamma(m + 1) = m\Gamma(m)$. Since $\Gamma(1) = 1$,
$$\Gamma(m + 1) = m!. \tag{5.34}$$
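Formulas (5.33) and (5.34) are easy to spot-check with the standard library's `math.gamma`. The sketch below also approximates (5.31) by a midpoint sum over a truncated interval $[\delta, r]$, mirroring the limits $\delta \to 0^+$, $r \to +\infty$. The helper name and its default parameters are illustrative choices, and the crude midpoint rule is only adequate here for $u \ge 1$, where the integrand is bounded.

```python
import math


def gamma_truncated(u: float, delta: float = 1e-8, r: float = 60.0,
                    steps: int = 400_000) -> float:
    """Midpoint-rule approximation of the integral of x^(u-1) exp(-x) over [delta, r]."""
    h = (r - delta) / steps
    total = 0.0
    for k in range(steps):
        x = delta + (k + 0.5) * h
        total += x ** (u - 1) * math.exp(-x)
    return total * h


def recurrence_gap(u: float) -> float:
    """Relative difference between Gamma(u + 1) and u * Gamma(u), cf. (5.33)."""
    return abs(math.gamma(u + 1) - u * math.gamma(u)) / math.gamma(u + 1)
```

For instance, `gamma_truncated(2.5)` should agree closely with `math.gamma(2.5)`, and `math.gamma(m + 1)` with `math.factorial(m)` for small integers m.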

Conditionally convergent integrals


Let us return to the special case when A is closed (Case 1, above). If f is
integrable on A, then Formula (5.27) remains correct. This follows by applying
(5.27) to $f^+$, $f^-$, and subtraction. On the other hand, if f has both positive
and negative values on A, then there may be a finite limit in (5.27) while the
integral of $|f|$ diverges to $+\infty$. In that case, the right-hand side of (5.27)
defines a conditionally convergent integral. While conditionally convergent
integrals are important in some parts of mathematical analysis (e.g., in the
treatment of Fourier integrals), they are not within the scope of the Lebesgue
theory. We treat them only in Problems 5, 8, and 9.
In a similar way, one can define in Case 2 a conditionally convergent
integral by (5.28), if the limit there exists while $\int_A |f|\, dV$ diverges to $+\infty$.

Iterated integrals
Theorem 5.6 regarding iterated integrals has the following extension (we
defer the proof to the end of Section 5.11). Let f be continuous on A. If f is
integrable over A, then Formula (5.23) holds. In case $f \ge 0$, a stronger
statement can be made. If either side of (5.23) diverges to $+\infty$, then so does
the other. Therefore, if $f \ge 0$, one way to show that f is integrable over A is
to show that the iterated integral exists.

EXAMPLE 6. Let $f(x, y) = |x|/(1 + x^2 + y^2)^2$, $A = E^2$. By symmetry

$$\int_{E^2} f\, dV_2 = 4 \int_0^\infty dy \int_0^\infty \frac{x\, dx}{(1 + x^2 + y^2)^2}
= 2 \int_0^\infty \frac{dy}{1 + y^2} = \pi.$$

The reader is cautioned that if f has both positive and negative values on A,
then one cannot conclude that f is integrable from the fact that the iterated
integral exists.


EXAMPLE 7. Let $f(x, y) = y^{-1} \cos x$, $A = [0, \pi] \times (0, 1]$. Then

$$\int_0^1 dy \int_0^\pi y^{-1} \cos x\, dx = \int_0^1 0\, dy = 0.$$

The iterated integral in the opposite order does not exist, hence f is not
integrable over A.

PROBLEMS

1. Determine whether the integral exists or is divergent to + 00.

(a) $\displaystyle\int_1^2 \frac{dx}{\sqrt{x^2 - 1}}$. [Hint: Let $\phi(x) = 1/\sqrt{x + 1}$ as in Example 4.]

(b) $\displaystyle\int_{-\pi/2}^{\pi/2} |\sin x|^{-p}\, dx$. [Hint: $x/\sin x \to 1$ as $x \to 0$.]

(c) $\displaystyle\int_0^\infty P(x)\exp(-cx)\, dx$, P a polynomial, $c > 0$.

(d) foo
-00
31:
~x(lxl
dx

+ 1)
.

(e) foo ~dx.


1 (x - I)P

(f) fAX
~ dV2(x, y), A =
-2--2
+y
E2 - {(O,O)}

(g) $\displaystyle\int_{E^2} \frac{\exp(-|x - y|)}{1 + |x - y|^2}\, dV_2(x, y)$.

2. Show that $\{(x, y, z) : 0 \le z \le (x^2 + y^2)/xy,\ 0 < x \le 1,\ 0 < y \le 1\}$ has infinite
volume.
3. Show that the volume of $\{(x, y, z) : 0 \le z \le |xy| \exp(-x^2 - y^2)\}$ is 1.
4. Show that $\int_{E^2} f(x, y)\, dV_2(x, y)$ exists if f is bounded, continuous, and $|f(x, y)| \le
C(1 + x^2 + y^2)^{-p/2}$, $p > 2$.

5. If $\lim_{r \to \infty} \int_{-r}^{r} f(x)\, dx$ exists but f is not integrable on $E^1$, then this limit is called
the Cauchy principal value. Find the Cauchy principal value:
(a) $f(x) = x/(1 + x^2)$.
(b) $f(x) = x + 1/(1 + x^2)$.
(c) Any odd function f, $f(-x) = -f(x)$ for every x.
6. Let A be open, and f a function continuous and integrable on A. For $r > 0$, let
$A(r) = \{x \in A : |x| \le r,\ \operatorname{dist}(x, A^c) \ge r^{-1}\}$. Show that:
(a) A(r) is compact.
(b) $\int_A f\, dV = \lim_{r \to \infty} \int_{A(r)} f\, dV$.
7. Show that each of the following integrals over $E^n$ converges.
(a) $\int (|x|^2 + 1)^{-p/2}\, dV(x)$, $p > n$.
(b) $\int |x|^{-|x|}\, dV(x)$.


8. Let $f(x) = x^{-1} \sin x$.


(a) Show that Iimr _ oo S~ f(x)dx exists. [Hint: Integrate by parts.J
(b) Show that filf(x)ldx diverges to +00.
9. Let $f(x) = (-1)^m/m$ if $x \in [m, m + 1)$, m = 1, 2, ....
(a) Show that $\lim_{r \to \infty} \int_1^r f(x)\, dx$ exists and equals $-1 + \tfrac{1}{2} - \tfrac{1}{3} + \cdots$.
(b) Let Kv = (U~= I [2k - 1, 2kJ) u (Ul~ 1[21,21 + IJ). Show that K I U K z U ...

$= [1, \infty)$, but

$$\lim_{\nu \to \infty} \int_{K_\nu} f(x)\, dx \ne \lim_{r \to \infty} \int_1^r f(x)\, dx.$$

10. (a) Assume that for every real number c, both of the sets $\{x \in A : f(x) < c\}$ and
$\{x \in A : f(x) > c\}$ are open relative to A. Show that f is continuous on A.
[Hint: Show that $\{x \in A : c < f(x) < d\}$ is open relative to A.]
(b) Using (a) and Corollary 2, Section 2.6, prove: if f is continuous on A, then $f^+$
is continuous on A.
11. Let A be a bounded, measurable set, and let f be bounded and continuous on A.
Show that $\int_A f\, dV$, as defined by (5.30), is the same as in Section 5.4. [Hint: Suppose
first that $f \ge 0$. For any compact set $K \subset A$, we have, taking the integral over A
in the previous sense, $0 \le \int_A f\, dV - \int_K f\, dV \le C\,V(A - K)$ provided $f(x) \le C$
for every $x \in A$. But $V(A - K) = V(A) - V(K)$ is arbitrarily small. Apply this to
$f^+$, $f^-$, and subtract.]
12. (Difficult.) Let f be continuous on an open set D. Assume that the integrals of
$f^+$ and $f^-$ over D both diverge to $+\infty$. Show that given any number l there is a
sequence of compact sets $K_1 \subset K_2 \subset \cdots$ such that $D = K_1 \cup K_2 \cup \cdots$ and
$\lim_{\nu \to \infty} \int_{K_\nu} f\, dV = l$.

5.7 Change of measure under affine transformations

Our next objective is to give formulas which describe how measure and
integrals change under a regular transformation g. In this section we consider
the special case when g is affine and prove the formula for the measure of g(A)
when A is a compact set. This special result is then used in the proof of the
general transformation formula (5.38).
We begin with the following.

Lemma 1. Let L be a linear transformation from $E^n$ into $E^n$, and I an
n-dimensional interval. Then $V[L(I)] = |\det L|\, V(I)$.

PROOF. First of all, Lemma 1 is valid for the following particular kinds of
linear transformations. Points in the domain of L are denoted by t.
(1) For some k and l, the transformation L merely interchanges the components
$t^k$ and $t^l$ of t. Then $V[L(I)] = V(I)$ and $\det L = -1$.


(2) The matrix of L is diagonal. If $x = L(t)$, then $x^i = c^i_i t^i$, where $c^1_1, \ldots, c^n_n$
are the diagonal elements. We have $I = I_1 \times \cdots \times I_n$, where each $I_i$
is a 1-dimensional interval. Then $L(I) = I'_1 \times \cdots \times I'_n$, where $I'_i$ is a
1-dimensional interval of length $|c^i_i|$ times the length of $I_i$. Since
$\det L = c^1_1 \cdots c^n_n$, we have $V[L(I)] = |\det L|\, V(I)$.
(3) There exist k and l, $k \ne l$, such that $x^k = t^k + ct^l$, $x^i = t^i$ for $i \ne k$.
For notational simplicity we take k = 1, l = 2, and regard $E^n$ as $E^2 \times E^{n-2}$.
Then $I = I' \times I''$, where $I' = I_1 \times I_2$, $I'' = I_3 \times \cdots \times I_n$. Let $L'$ be the
transformation from $E^2$ into $E^2$, such that
$$L'(t^1, t^2) = (t^1 + ct^2, t^2).$$
Then $L'(I') = P'$, where $P'$ is a parallelogram of the same area as $I'$, and
$L(I) = P' \times I''$. By (5.24),
$$V_n[L(I)] = V_2(P')\, V_{n-2}(I'') = V_2(I')\, V_{n-2}(I'') = V_n(I).$$
Since $\det L = 1$, Lemma 1 is valid for L of Type (3).
Next, we observe that if M and N are any two linear transformations for
which the theorem holds, then
$$V[(M \circ N)(K)] = |\det M|\, V[N(K)] = |\det M|\, |\det N|\, V(K),$$
and $(\det M)(\det N) = \det M \circ N$. Hence the theorem also holds for their
composite $M \circ N$.
If N has row covectors $w^1, \ldots, w^n$ and M is of Type (3), then the kth
row covector of $M \circ N$ is $w^k + cw^l$ and the others are unchanged. The kth
column vector of $N \circ M$ is $v_k + cv_l$ and the others are unchanged, where
$v_1, \ldots, v_n$ are the column vectors of N. A transformation M of Type (1) when
applied on the left interchanges the kth and lth row covectors of N, and when
applied on the right interchanges the kth and lth column vectors. Moreover,
the inverse of a transformation of Type (1) or (3) is of the same type.
Now let L be any linear transformation. Then
$$L = M_1 \circ M_2 \circ \cdots \circ M_p,$$
where $M_1, \ldots, M_p$ are of Types (1), (2), and (3). Since Lemma 1 holds for
each $M_i$, it is valid for L. □

Now let g be an affine transformation from $E^n$ into $E^n$. According to
Section 4.2, there exist $x_0$ and a linear transformation L such that
$$g(t) = L(t) + x_0$$
for every $t \in E^n$.

Theorem 5.7. For every compact set K,

$$V[g(K)] = |\det L|\, V(K). \tag{5.35}$$


PROOF. First of all, if g is a translation (L = I, the identity), then $V[g(Z)] = V(Z)$
for any figure Z. Hence $V[g(K)] = V(K)$ for any compact K, if g is a
translation. We may assume from now on that $x_0 = 0$, g = L. By Lemma 1
above, (5.35) holds for any n-dimensional interval I. Hence (5.35) also holds
for any figure Z. If K is compact, the construction in the proof of Theorem 5.6
gives a sequence of figures $Y_1 \supset Y_2 \supset \cdots$ such that
$$V(K) = \lim_{\nu \to \infty} V(Y_\nu).$$

Then $g(Y_1) \supset g(Y_2) \supset \cdots$,

$$g(K) = g(Y_1) \cap g(Y_2) \cap \cdots, \qquad V[g(K)] = \lim_{\nu \to \infty} V[g(Y_\nu)].$$

Since $V[g(Y_\nu)] = |\det L|\, V(Y_\nu)$ for $\nu = 1, 2, \ldots$, we get (5.35). □

Corollary. If g is an isometry of $E^n$, then $V[g(K)] = V(K)$.
PROOF. For then $\det L = \pm 1$. □
Let us apply Theorem 5.7 to calculate the measures of simplexes and
parallelepipeds.

Measure of an n-simplex
Let S be an n-simplex with vertices $x_0, x_1, \ldots, x_n$ (see Section 1.5). Let
$v_i = x_i - x_0$. The vectors $v_1, \ldots, v_n$ are linearly independent. Let L be the
linear transformation with $v_1, \ldots, v_n$ as column vectors, and let $g(t) =
L(t) + x_0$. As before, let $\Sigma$ be the standard n-simplex. If $t = (t^1, \ldots, t^n) \in \Sigma$,
let $t^0 = 1 - (t^1 + \cdots + t^n)$ and let $x = g(t)$. Then

$$x = x_0 + \sum_{i=1}^{n} t^i (x_i - x_0) = \sum_{i=0}^{n} t^i x_i.$$

Hence $x \in S$ and $t^0, t^1, \ldots, t^n$ are its barycentric coordinates. Conversely,
every $x \in S$ is obtained in this way. Thus $S = g(\Sigma)$. By Example 3, Section
5.5, $V(\Sigma) = 1/n!$. Hence, from (5.35),
$$V(S) = |\det L|/n!. \tag{5.36}$$

Measure of an n-parallelepiped
Given $x_0$ and linearly independent vectors $v_1, \ldots, v_n$, let

$$P = \Big\{x : x = x_0 + \sum_{i=1}^{n} t^i v_i,\ 0 \le t^i \le 1,\ i = 1, \ldots, n\Big\}.$$

Then P is the n-parallelepiped spanned by $v_1, \ldots, v_n$ with $x_0$ as vertex
(Figure 5.15). Let $I_0$ be the unit n-cube. Then $P = g(I_0)$, where g is as before.
Since $V(I_0) = 1$,
$$V(P) = |\det L|. \tag{5.37}$$


Figure 5.15

If $v_1, \ldots, v_n$ are linearly dependent, then P is called a degenerate
n-parallelepiped. In that case $V(P) = 0$.
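Formulas (5.36) and (5.37) translate directly into a short computation. The following is a sketch only, with hypothetical function names: it forms the matrix L whose columns are the edge vectors $v_i$ and evaluates the determinant by Gaussian elimination. The row exchanges and row additions it performs are transformations of Types (1) and (3) from the proof of Lemma 1.

```python
from math import factorial


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [[float(entry) for entry in row] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(a[row][col]))
        if a[pivot][col] == 0.0:
            return 0.0  # columns are linearly dependent
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]  # Type (1): changes the sign
            result = -result
        result *= a[col][col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for c in range(col, n):
                a[row][c] -= factor * a[col][c]  # Type (3): determinant unchanged
    return result


def simplex_measure(vertices):
    """V(S) = |det L| / n!, where column i of L is v_i = x_i - x_0."""
    x0, rest = vertices[0], vertices[1:]
    n = len(x0)
    L = [[rest[j][i] - x0[i] for j in range(n)] for i in range(n)]
    return abs(det(L)) / factorial(n)


def parallelepiped_measure(vectors):
    """V(P) = |det L|, where the spanning vectors are the columns of L."""
    n = len(vectors)
    L = [[vectors[j][i] for j in range(n)] for i in range(n)]
    return abs(det(L))
```

The vertex $x_0$ does not enter `parallelepiped_measure`, in line with the translation invariance used at the start of the proof of Theorem 5.7. A linearly dependent family of vectors returns 0, the degenerate case.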

PROBLEMS

1. Find the volume of the tetrahedron with vertices $e_1$, $-e_2$, $e_3$, $e_1 + 2e_2 + e_3$.
2. Find the area of the parallelogram with vertices $e_1 - e_2$, $2e_1 + e_2$, $-2e_1$, $-e_1 + 2e_2$.
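3. Let S be an n-simplex, with vertices $x_0, x_1, \ldots, x_n$. Show that

$$V(S) = \frac{1}{n!} \left| \det \begin{pmatrix}
x_0^1 & x_1^1 & \cdots & x_n^1 \\
\vdots & \vdots & & \vdots \\
x_0^n & x_1^n & \cdots & x_n^n \\
1 & 1 & \cdots & 1
\end{pmatrix} \right|.$$

[Hint: Subtract the first column from each of the other columns. This does not
change the determinant.]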
4. Let $(c^i_j)$ be a positive definite symmetric n x n matrix, and let $\lambda_1, \ldots, \lambda_n$ be its
characteristic values. Show that

where $\alpha_n$ is the measure of the unit n-ball. [Hint: See Theorem 4.4.]
5. Show that if $|v_i| \le C$ for each i = 1, ..., n and P is a parallelepiped spanned by
$v_1, \ldots, v_n$, then $V(P) \le C^n$. [Hint: Use induction, a suitable rotation of $E^n$, and the
method of slices.]

5.8 Transformation of integrals


Let g be a regular transformation with domain an open set $\Delta \subset E^n$. Regularity
means that g is of class $C^{(1)}$ and has an inverse $g^{-1}$ of class $C^{(1)}$ (Section 4.5).
Let f be continuous on the open set $D = g(\Delta)$. The object of this section is
to prove the formula (5.38) which expresses the integral of f over a set $A \subset D$
as an integral over the corresponding subset $B = g^{-1}(A)$ of $\Delta$.


Figure 5.16

Let us first give an imprecise derivation of the transformation formula
(5.38) (see Figure 5.16). For the moment, let B be compact, and let Z be a
figure approaching B from without. Let $I \subset Z$ and $t_0$ be a point of I. Let
$$x_0 = g(t_0), \qquad L = Dg(t_0).$$
The affine approximation G to g at $t_0$ is given by
$$G(t) = x_0 + L(t - t_0).$$
If I is small, then G(I) and g(I) nearly coincide. Hence $V[g(I)]$ is nearly
$V[G(I)]$, which by Theorem 5.7 equals $|\det L|\, V(I)$. Since f is continuous, its
integral over g(I) is approximately $f(x_0)V[g(I)]$, and since det L is the
Jacobian $Jg(t_0)$,

$$\int_{g(I)} f\, dV \approx f[g(t_0)]\, |Jg(t_0)|\, V(I).$$
Let
$$\phi(t) = f[g(t)]\, |Jg(t)|.$$
Since f is continuous and g is of class $C^{(1)}$, the composite $f \circ g$ is continuous
and $|Jg|$ is continuous. Hence $\phi$ is continuous. The integral of $\phi$ over I is
approximately $\phi(t_0)V(I)$, which is the right-hand side of the above expression.
The figure Z is the union of small intervals $I_1, \ldots, I_p$. Let $t_k \in I_k$. We
should have approximately

$$\int_{g(Z)} f\, dV \approx \sum_{k=1}^{p} \phi(t_k) V(I_k) \approx \int_Z \phi\, dV.$$

This suggests the formula

$$\int_{g(B)} f\, dV = \int_B \phi\, dV.$$

The following theorem states that the formula is correct, and not merely for
compact sets.
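The key step of the heuristic, that $V[g(I)]$ is nearly $|Jg(t_0)|\,V(I)$ for a small interval I, can be seen in a case where both sides are known exactly. The sketch below assumes g is the polar coordinate map $g(r, \theta) = (r\cos\theta, r\sin\theta)$, whose Jacobian is r (Section 5.9), and takes $I = [r_0, r_0 + h] \times [\theta_0, \theta_0 + h]$ with $t_0 = (r_0, \theta_0)$. The function name is hypothetical.

```python
def polar_cell_ratio(r0: float, h: float) -> float:
    """Ratio V[g(I)] / (|Jg(t0)| V(I)) for the polar map on a square cell of side h.

    g(I) is an annular sector with radii r0, r0 + h and opening angle h, so its
    area is exactly ((r0 + h)^2 - r0^2) * h / 2. The affine estimate is r0 * h^2.
    """
    exact_area = 0.5 * ((r0 + h) ** 2 - r0 ** 2) * h
    affine_estimate = r0 * h * h
    return exact_area / affine_estimate
```

The ratio equals $1 + h/(2r_0)$, so it tends to 1 as the cell shrinks, and the relative error of the affine estimate is of order h.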


Theorem 5.8. Let g be a regular transformation from $\Delta$ onto D. Let f be
continuous on D and A be any measurable subset of D. Then

$$\int_A f(x)\, dV(x) = \int_{g^{-1}(A)} f[g(t)]\, |Jg(t)|\, dV(t), \tag{5.38}$$

provided either integral exists.

More generally, Theorem 5.8 is valid if f is any function measurable on D.
We do not prove this.

To prove Theorem 5.8 let us first prove three lemmas. In these lemmas I
denotes an n-dimensional cube. It is convenient to take I half-open to the
right. Then $I = \operatorname{cl} I - N$, where N is a certain compact set composed of
(n - 1)-dimensional faces of I. Hence $g(I) = g(\operatorname{cl} I) - g(N)$ is the difference
of compact sets, and so it is measurable. Let l be the side length of I, d its
diameter, and $I'$ the concentric closed n-cube of side length $(1 + \tau)l$. Let G, L,
$x_0$ be as defined at the beginning of the section.

Lemma 1. Let $t_0 \in \Delta$ and $\tau > 0$ be given. Then $t_0$ has a neighborhood $\Omega \subset \Delta$
such that $g(I) \subset G(I')$ for every n-cube $I \subset \Omega$ with $t_0 \in \operatorname{cl} I$ (see Figure 5.17).

Figure 5.17

PROOF. If $x = G(s)$, $y = G(t)$, then $x - y = L(s - t_0) - L(t - t_0)$. Since L is
linear, $x - y = L(s - t)$. Let $C = \|L^{-1}\|$ (Section 4.3). Then $s - t =
L^{-1}(x - y)$, and
$$|s - t| \le C|x - y|. \tag{*}$$

Let $a = \tau/(2\sqrt{n}\, C)$. Since g is differentiable, $t_0$ has a neighborhood
$\Omega \subset \Delta$ such that

$$|g(t) - G(t)| \le a|t - t_0| \tag{**}$$

for every $t \in \Omega$.

Let $x \in g(I)$ and $s = G^{-1}(x)$. Then $x = G(s) = g(t)$ for some $t \in I$. By (*),
applied with $y = G(t)$, and (**),
$$|s - t| \le C|g(t) - G(t)| \le Ca|t - t_0|.$$

Since $t_0 \in \operatorname{cl} I$, $|t - t_0| \le d$. Since $d = \sqrt{n}\, l$,

$$|s - t| \le Ca\sqrt{n}\, l \le \tau l/2.$$
This implies that $s \in I'$ and $x \in G(I')$. □

In Lemmas 2 and 3 we assume that $f \ge 0$. As above, denote the integrand
on the right-hand side of (5.38) by $\phi(t) = f[g(t)]\, |Jg(t)|$.

Lemma 2. Let $t_0 \in \Delta$ and $\varepsilon > 0$ be given. Then $t_0$ has a neighborhood
$\Omega_1 \subset \Delta$ such that

$$\int_{g(I)} f\, dV < \int_I \phi\, dV + \varepsilon V(I)$$

for every n-cube $I \subset \Omega_1$ with $t_0 \in \operatorname{cl} I$.

PROOF. Let
$$a = f(x_0), \qquad b = |Jg(t_0)|.$$
Then $a \ge 0$ since $f \ge 0$, and $b > 0$ since g is regular. Let $\psi(\xi, \tau) =
(a - \xi)(b - \xi) - (a + \xi)b(1 + \tau)^n$. Then $\psi(0, 0) = 0$ and $\psi$ is continuous.
Hence $|\psi(\xi, \tau)| < \varepsilon$ for every $(\xi, \tau)$ in some neighborhood V of (0, 0). Choose
some $\xi > 0$, $\tau > 0$ small enough that $(\xi, \tau) \in V$ and $\xi < b$. Then
$$(a + \xi)(1 + \tau)^n b < (a - \xi)(b - \xi) + \varepsilon. \tag{*}$$

Since f is continuous, there is a neighborhood U of $x_0$ such that $a - \xi <
f(x) < a + \xi$ for every $x \in U$. Since g and $|Jg|$ are continuous, there is a
neighborhood $\Omega_1$ of $t_0$ such that $g(t) \in U$ and $b - \xi < |Jg(t)|$ for every $t \in \Omega_1$.
We may assume that $\Omega_1 \subset \Omega$, where $\Omega$ is as in Lemma 1.
Then

$$\int_{g(I)} f\, dV \le (a + \xi)V[g(I)],$$
and by Lemma 1, $V[g(I)] \le V[G(I')]$. By Theorem 5.7,
$$V[G(I')] = bV(I') = b(1 + \tau)^n V(I).$$
Therefore

$$\int_{g(I)} f\, dV \le (a + \xi)b(1 + \tau)^n V(I). \tag{**}$$
On the other hand,

$$\int_I \phi\, dV \ge (a - \xi)(b - \xi)V(I). \tag{***}$$

This follows from the inequality

$$\phi(t) = f[g(t)]\, |Jg(t)| \ge (a - \xi)(b - \xi)$$

which holds for $t \in \Omega_1$ provided $a - \xi \ge 0$. If $a - \xi < 0$, (***) holds trivially
since its left side is nonnegative and the right side is negative. Inequalities
(*), (**), and (***) give Lemma 2. □

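Lemma 3. Let $I \subset \Delta$ be an n-cube. Then

$$\int_{g(I)} f\, dV \le \int_I \phi\, dV.$$
PROOF. Suppose this is false for some n-cube $I^0$. Then

$$c = \int_{g(I^0)} f\, dV - \int_{I^0} \phi\, dV$$

is positive. Divide $I^0$ into $m = 2^n$ congruent n-cubes $I_1, \ldots, I_m$, half-open
to the right. Since $I_1, \ldots, I_m$ are disjoint and g is univalent, the sets
$g(I_1), \ldots, g(I_m)$ are disjoint. Hence

$$\int_{g(I^0)} f\, dV = \sum_{j=1}^{m} \int_{g(I_j)} f\, dV, \qquad
\int_{I^0} \phi\, dV = \sum_{j=1}^{m} \int_{I_j} \phi\, dV.$$

For at least one j we must have

$$\int_{g(I_j)} f\, dV - \int_{I_j} \phi\, dV \ge c\, 2^{-n}.$$

Choose some such j and let $I^1 = I_j$. By dividing $I^1$ into $2^n$ congruent n-cubes
and repeating the argument, we obtain $I^2$. Continuing, we obtain a sequence
of n-cubes $I^1 \supset I^2 \supset \cdots$ such that

$$\int_{g(I^l)} f\, dV - \int_{I^l} \phi\, dV \ge c\, 2^{-ln}, \qquad l = 1, 2, \ldots,$$

and the diameter of $I^l$ tends to 0 as $l \to \infty$. By Theorem 2.3, $(\operatorname{cl} I^1) \cap (\operatorname{cl} I^2) \cap \cdots$
contains just one point $t_0$. Let $0 < \varepsilon < c/V(I^0)$. Then
$$\varepsilon V(I^l) = \varepsilon 2^{-ln} V(I^0) < c\, 2^{-ln}.$$

If $\Omega_1$ is as in Lemma 2, then $I^l \subset \Omega_1$ for sufficiently large l and we obtain a
contradiction. □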

Let us now prove Theorem 5.8 in case A is compact and $f \ge 0$. Let
$B = g^{-1}(A)$. The inverse $g^{-1}$ is a regular transformation, since g is regular.
In particular, $g^{-1}$ is continuous, and therefore B is compact since A is compact.
Let $Z_0$ be some n-cube, half-open to the right, containing B. Divide $Z_0$ into
$2^n$ congruent n-cubes and let $Z_1$ be the union of those which meet B. Then
divide each n-cube of $Z_1$ into $2^n$ congruent n-cubes, and let $Z_2$ be the union
of those meeting B, and so on. Then $Z_1 \supset Z_2 \supset \cdots$ and their intersection
is B. There exists $\nu_0$ such that $Z_\nu \subset \Delta$ for all $\nu \ge \nu_0$.
Applying Lemma 3 to each of the congruent disjoint n-cubes comprising
$Z_\nu$, and adding, we get

$$\int_A f\, dV \le \int_{g(Z_\nu)} f\, dV \le \int_{Z_\nu} \phi\, dV$$

for each $\nu \ge \nu_0$. Moreover,

$$\int_B \phi\, dV = \lim_{\nu \to \infty} \int_{Z_\nu} \phi\, dV.$$

This fact will be proved in the section on convergence theorems [Corollary
5(b), Section 5.11]. Thus

$$\int_A f\, dV \le \int_B \phi\, dV.$$
But $g^{-1}$ is also a regular transformation. Interchanging the roles of A
and B,

$$\int_B \phi\, dV \le \int_A (\phi \circ g^{-1})\, |Jg^{-1}|\, dV.$$

But $|Jg^{-1}(x)| = |Jg(t)|^{-1}$ if $x = g(t)$, and

$$\phi[g^{-1}(x)]\, |Jg^{-1}(x)| = f(x).$$

Therefore

$$\int_B \phi\, dV \le \int_A f\, dV,$$

which proves (5.38) for the case of a compact set A.


Now let A be measurable, and again assume that $f \ge 0$. Then $B = g^{-1}(A)$
is also measurable (Problem 8). For any compact set $K \subset A$, we have
$g^{-1}(K) \subset B$ and

$$\int_K f\, dV = \int_{g^{-1}(K)} \phi\, dV \le \int_B \phi\, dV.$$
By (5.26),

$$\int_A f\, dV \le \int_B \phi\, dV. \tag{\#}$$

Note, in particular, that f is integrable provided $\phi$ is integrable. By considering
$g^{-1}$ as above, we find that $\phi$ is integrable provided that f is integrable,
and get the inequality opposite to (#) with $\ge$ rather than $\le$. This
proves Theorem 5.8 if $f \ge 0$.


Finally, write any continuous f as $f = f^+ - f^-$, as in Section 5.6. Then
$\phi = \phi^+ - \phi^-$. Since $|Jg| > 0$, we have
$$\phi^+ = (f^+ \circ g)|Jg|, \qquad \phi^- = (f^- \circ g)|Jg|.$$

If either f or $\phi$ is integrable, then so is the other, and (5.38) holds. This
completes the proof of Theorem 5.8. □

In particular, taking $f(x) \equiv 1$ in (5.38) we have:

Corollary 1

$$V(A) = \int_{g^{-1}(A)} |Jg(t)|\, dV(t). \tag{5.39}$$

Corollary 2. If B is a null set, then g(B) is a null set.

Corollary 3. If A is a measurable subset of an r-manifold M, where $r \le n - 1$,
then A is a null set.

PROOF. By the implicit function theorem (Section 4.6), any $x_0 \in M$ has a
neighborhood U such that $U \cap M = g(R)$, where R is a relatively open
subset of an r-dimensional vector subspace. Then R is a null set. By Corollary
2, $U \cap M$ is a null set.
If $K \subset M$ and K is compact, then there exist a finite number of such
neighborhoods $U_1, \ldots, U_m$ such that $K \subset (U_1 \cup \cdots \cup U_m) \cap M$. Hence
K is a null set. If A is any measurable set, then $V(A) = \sup\{V(K) : K \subset A,\ K \text{ compact}\}$.
If $A \subset M$, then A is a null set. □

EXAMPLE 1. Let $A = \{(x, y) : x > 0,\ y > 0,\ 0 < xy < 3,\ x < y < 2x\}$, $f(x, y)
= y^2$, $g(s, t) = \sqrt{s/t}\, e_1 + \sqrt{st}\, e_2$ for $s > 0$, $t > 0$. We show that g is univalent
by solving the equations

$$x = \sqrt{s/t}, \qquad y = \sqrt{st}$$

explicitly, and find that $s = xy$, $t = y/x$. Since $Jg(s, t) = 1/2t \ne 0$, g is regular.
Moreover, the part in A of the hyperbola $xy = c$, $0 < c < 3$, corresponds to
the segment $s = c$, $1 < t < 2$ in B. Hence B is as shown in Figure 5.18, and

$$\int_A y^2\, dV_2(x, y) = \frac{1}{2} \int_1^2 dt \int_0^3 s\, ds = \frac{9}{4}.$$
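The value in Example 1 can be cross-checked without the change of variables. The sketch below, with hypothetical function names, integrates $y^2$ over A directly by slicing in x (the inner integral in y is done in closed form, the outer one by the midpoint rule) and compares with the transformed integral of $\phi(s, t) = (st) \cdot 1/(2t) = s/2$ over $B = (0, 3) \times (1, 2)$.

```python
import math


def direct_integral(steps: int = 200_000) -> float:
    """Integral of y^2 over A = {x > 0, 0 < xy < 3, x < y < 2x}, slicing in x."""
    x_max = math.sqrt(3.0)  # x < y and xy < 3 force x^2 < 3
    h = x_max / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        y_low, y_high = x, min(2.0 * x, 3.0 / x)
        total += (y_high ** 3 - y_low ** 3) / 3.0 * h  # exact integral of y^2 in y
    return total


def transformed_integral() -> float:
    """Integral of s/2 over B = (0, 3) x (1, 2), evaluated in closed form."""
    return 0.5 * (2.0 - 1.0) * (3.0 ** 2 / 2.0)
```

Both routes should give 9/4, the direct one up to a small discretization error.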


EXAMPLE 2. Let P be an n-parallelepiped. Then $P = g(I_0)$ where, as in
Section 5.7, g is affine and $I_0$ is the unit n-cube. Then

$$\int_P f(x)\, dV(x) = |\det L| \int_{I_0} f[g(t)]\, dV(t).$$


Figure 5.18

PROBLEMS

1. Let n = 1, $g(t) = t^2 - 2t + 3$, B = (0, 1). What does (5.38) become?
2. Let $g(s, t) = (s^2 + t^2)e_1 + (s^2 - t^2)e_2$, $s > 0$, $t > 0$, and $A = \{(x, y) : 2 < x + y < 4,\
x - y > 0,\ y > 0\}$. Show that g is regular and evaluate $\int_A x^{-1}\, dV_2(x, y)$.

3. Find the second moment about (0, 0) of the parallelogram with vertices (0, 0),
$e_1 + e_2$, $-2e_1 + 3e_2$, $-e_1 + 4e_2$. [Hint: Let g be the linear transformation L with
column vectors $e_1 + e_2$, $-2e_1 + 3e_2$.]
4. Let B be a compact set, $\bar{x}$ its centroid, and g an affine transformation. Show that
$g(\bar{x})$ is the centroid of g(B).
5. Let A be symmetric about 0, that is, $x \in A$ implies $-x \in A$. Let f be integrable over
A and $f(-x) = -f(x)$ for every $x \in A$. Show that $\int_A f\, dV = 0$.
6. Let g be of class $C^{(1)}$ on $\Delta$ and K be a compact subset of $\Delta$. Show that there is a
number C such that $|g(s) - g(t)| \le C|s - t|$ for every $s, t \in K$. [Hint: Proposition
4.5, Section 4.3.]
7. Let g be regular and K, C as in Problem 6. Show that if $B \subset \operatorname{int} K$, then $V[g(B)] \le
C^n V(B)$. [Hint: For $t \in B$ the partial derivatives of g satisfy $|g_i(t)| \le C$. Use Problem 5,
Section 5.7, to see that $|Jg(t)| \le C^n$.]
8. Let g be a regular transformation from an open set $\Delta$ onto D. Let $B = g^{-1}(A)$, where
A is a measurable subset of D. Show that B is measurable. [Hints: First consider the
case when $A \subset L \subset D$, where L is compact. Also, $g^{-1}$ is regular. See Problem 7.
Find compact sets $L_1 \subset L_2 \subset \cdots$ with union D.]

5.9 Coordinate systems in $E^n$

Let D be an open subset of $E^n$. Let $f^1, \ldots, f^n$ be functions of class $C^{(1)}$
on D such that the transformation $f = (f^1, \ldots, f^n)$ is regular. Since a regular
transformation f is univalent, the numbers $f^1(x), \ldots, f^n(x)$ uniquely specify
x and can be regarded as a set of "coordinates" for x.

Definition. A regular transformation f from D into $E^n$ is a coordinate system
for D. The numbers $f^1(x), \ldots, f^n(x)$ are the coordinates of x in this
coordinate system.


Since we have already considered regular transformations, this definition
involves nothing new except a change of viewpoint. In many problems it is D
that has actual geometric or physical significance. The transformation f
is introduced simply as a device for solving the problem, and the open set
$\Delta = f(D)$ has only an auxiliary status.
In particular, many integrals can be evaluated by introducing a suitable
coordinate system. The transformation formula (5.38) is applied with $B =
f(A)$, $g = f^{-1}$. The objective usually is to choose a coordinate system for
which B is simpler than A (for instance an interval) or $\phi$ is simpler than f,
or both.
Let us consider some particular coordinate systems.
1. The identity transformation I gives the standard cartesian coordinate
system for $E^n$. The components $x^1, \ldots, x^n$ are the standard cartesian
coordinates of x.
2. If f is an affine transformation, then the coordinate system is called
affine.
3. If M is an r-manifold, $\Phi = (\Phi^1, \ldots, \Phi^{n-r})$ is as in Section 4.7 and
$$\frac{\partial(\Phi^1, \ldots, \Phi^{n-r})}{\partial(x^{r+1}, \ldots, x^n)} \ne 0$$
at $x_0$, then $(X^1, \ldots, X^r, \Phi^1, \ldots, \Phi^{n-r})$ is a coordinate system for some
neighborhood V of $x_0$. Here $X^i(x) = x^i$ for all x. The coordinates of a
point $x \in M \cap V$ in this system are $(x^1, \ldots, x^r, 0, \ldots, 0)$.
4. (Polar coordinates.) Let D be $E^2$ with the positive x-axis N removed.
Let $R(x, y) = \sqrt{x^2 + y^2}$ and $\Theta(x, y)$ be the angle from N to the half-line
from (0, 0) through (x, y), with $0 < \Theta(x, y) < 2\pi$. Then $(R, \Theta)$ is the polar
coordinate system for D. If $r = R(x, y)$, $\theta = \Theta(x, y)$, then $(r, \theta)$ are the
coordinates of (x, y) and
$$x = r\cos\theta = g^1(r, \theta), \qquad y = r\sin\theta = g^2(r, \theta),$$
where $g = (R, \Theta)^{-1}$. Since $Jg(r, \theta) = r$, the transformation formula
becomes

$$\int_A f(x, y)\, dV_2(x, y) = \int_B f[r\cos\theta, r\sin\theta]\, r\, dV_2(r, \theta),$$

where $g(B) = A - N$. Since N is a null set, the integral over A is the same
as the integral over $A - N$.
5. (Spherical coordinates in $E^n$.) For $n = 2$, this is the polar coordinate system. Proceeding inductively, let $r = |x|$ and let $\theta^1$ be the angle from the positive $x^1$-axis to $x$; more precisely,
$$\theta^1 = \cos^{-1}(x^1/r), \qquad 0 < \theta^1 < \pi,$$
and let $(\rho, \theta^2, \ldots, \theta^{n-1})$ be spherical coordinates for $x'' = (x^2, \ldots, x^n)$, where $\rho = |x''| = r\sin\theta^1$. The coordinates of $x$ are
$$\begin{aligned}
x^1 &= r\cos\theta^1, \\
x^2 &= r\sin\theta^1\cos\theta^2, \\
&\ \ \vdots \\
x^{n-1} &= r\sin\theta^1\sin\theta^2\cdots\sin\theta^{n-2}\cos\theta^{n-1}, \\
x^n &= r\sin\theta^1\sin\theta^2\cdots\sin\theta^{n-2}\sin\theta^{n-1}.
\end{aligned}$$
This defines the spherical coordinate system $(R, \Theta^1, \ldots, \Theta^{n-1})$ on $D = E^n - N$, where $N$ is a certain null set. The Jacobian is
$$Jg(r, \theta^1, \ldots, \theta^{n-1}) = r^{n-1}\sin^{n-2}\theta^1\,\sin^{n-3}\theta^2\cdots\sin\theta^{n-2}.$$

EXAMPLE 1. Suppose that $f(x) = \phi(|x|)$, where $\phi$ is continuous on $(r_1, r_2)$. Then
$$\int_{r_1 < |x| < r_2} f\,dV = \beta_n \int_{r_1}^{r_2} \phi(r)\,r^{n-1}\,dr,$$
where $\beta_n$ is a number not depending on $\phi$, $r_1$, or $r_2$. To find $\beta_n$, set $\phi \equiv 1$. Then
$$\alpha_n(r_2^n - r_1^n) = \frac{\beta_n}{n}(r_2^n - r_1^n),$$
where $\alpha_n$ is the measure of the unit n-ball. Hence $\beta_n = n\alpha_n$, which turns out to be the $(n - 1)$-dimensional measure of the unit $(n - 1)$-sphere.
6. (Cylindrical coordinates in $E^3$.) Let $(R, \Theta)$ be the polar coordinate system. Then $(R, \Theta, Z)$ is a coordinate system for $D = E^3 - \{(x, 0, z): x \ge 0\}$. The equations $x = r\cos\theta$, $y = r\sin\theta$, $z = z$ relate the cylindrical and standard cartesian coordinates of a point $x \in D$. In a similar way cylindrical coordinates can be introduced in $E^n$.
7. The idea of barycentric coordinates (Section 1.5) does not agree precisely with the definition in this section. However, let $t^0, t^1, \ldots, t^n$ be the barycentric coordinates of $x$ with respect to the vertices $x_0, x_1, \ldots, x_n$ of an n-simplex $S$. Let $g$ be the affine transformation defined at the end of Section 5.7, and let $f = g^{-1}$. Then $t^1, \ldots, t^n$ are the coordinates of $x$ in the affine coordinate system $f$, and $t^0 = 1 - (t^1 + \cdots + t^n)$.
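The spherical coordinate formulas of item 5 (with polar coordinates as the case n = 2) can be checked numerically. The following is a minimal sketch, not part of the text: the function names, the finite-difference step, and the grid sizes are illustrative assumptions. It compares the stated Jacobian with a finite-difference determinant and evaluates one integral over the unit disk using the weight $r$.

```python
import math

def spherical_to_cartesian(r, thetas):
    """Map (r, theta^1, ..., theta^{n-1}) to (x^1, ..., x^n) as in item 5."""
    x, prod = [], r
    for theta in thetas:
        x.append(prod * math.cos(theta))
        prod *= math.sin(theta)
    x.append(prod)
    return x

def jacobian_formula(r, thetas):
    """r^(n-1) sin^(n-2)(theta^1) sin^(n-3)(theta^2) ... sin(theta^(n-2))."""
    n = len(thetas) + 1
    value = r ** (n - 1)
    for k, theta in enumerate(thetas[:-1], start=1):
        value *= math.sin(theta) ** (n - 1 - k)
    return value

def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n, d = len(a), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(a[i][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for i in range(c + 1, n):
            factor = a[i][c] / a[c][c]
            for j in range(c, n):
                a[i][j] -= factor * a[c][j]
    return d

def numeric_jacobian(r, thetas, h=1e-6):
    """Central-difference approximation of the Jacobian determinant."""
    p = [r] + list(thetas)
    rows = []
    for k in range(len(p)):
        up, dn = p[:], p[:]
        up[k] += h
        dn[k] -= h
        xu = spherical_to_cartesian(up[0], up[1:])
        xd = spherical_to_cartesian(dn[0], dn[1:])
        # rows[k][i] approximates d x^i / d p^k; transposing does not change det
        rows.append([(u - d) / (2 * h) for u, d in zip(xu, xd)])
    return det(rows)

def integrate_polar(f, r_max, n_r=300, n_theta=300):
    """Midpoint rule for the integral of f over the disk |(x, y)| < r_max,
    written as the integral of f(r cos t, r sin t) * r over 0<r<r_max, 0<t<2 pi."""
    dr, dt = r_max / n_r, 2.0 * math.pi / n_theta
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * dr
        for j in range(n_theta):
            t = (j + 0.5) * dt
            total += f(r * math.cos(t), r * math.sin(t)) * jacobian_formula(r, [t])
    return total * dr * dt
```

For instance, `integrate_polar(lambda x, y: x * x + y * y, 1.0)` should come out close to $\pi/2$, the exact value of $\int_0^{2\pi}\int_0^1 r^3\,dr\,d\theta$.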

*Gamma and beta functions


The gamma function was defined in Section 5.6. If we let $x = g(s) = s^2/2$, then $g'(s) = s$ and we obtain another expression for $\Gamma(u)$:
$$\Gamma(u) = 2^{1-u}\int_0^\infty s^{2u-1}\exp\left(-\frac{s^2}{2}\right)ds, \qquad u > 0. \tag{5.40}$$

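Let us calculate the product $\Gamma(u)\Gamma(v)$. Now
$$\Gamma(u)\Gamma(v) = 2^{2-u-v}\int_0^\infty s^{2u-1}\exp\left(-\frac{s^2}{2}\right)ds \int_0^\infty t^{2v-1}\exp\left(-\frac{t^2}{2}\right)dt.$$
Writing the iterated integral as an integral over the first quadrant $Q$ and introducing polar coordinates,
$$\begin{aligned}
\Gamma(u)\Gamma(v) &= 2^{2-u-v}\int_Q s^{2u-1}t^{2v-1}\exp\left(-\frac{s^2 + t^2}{2}\right)dV_2(s, t) \\
&= 2^{2-u-v}\int_0^\infty r^{2(u+v)-1}\exp\left(-\frac{r^2}{2}\right)dr \int_0^{\pi/2}(\cos\theta)^{2u-1}(\sin\theta)^{2v-1}\,d\theta.
\end{aligned}$$
The first integral on the right-hand side is $2^{u+v-1}\Gamma(u + v)$ by (5.40). Let
$$B(u, v) = 2\int_0^{\pi/2}(\cos\theta)^{2u-1}(\sin\theta)^{2v-1}\,d\theta, \qquad u > 0,\ v > 0. \tag{5.41}$$
The function $B$ is called the beta function, and we have just shown that
$$B(u, v) = \frac{\Gamma(u)\Gamma(v)}{\Gamma(u + v)}. \tag{5.42}$$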
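Formula (5.42) can be spot-checked numerically. The sketch below is only an illustration under stated assumptions: the function names and the midpoint rule with a fixed number of steps are choices made here, and `math.gamma` from the Python standard library stands in for $\Gamma$.

```python
import math

def beta_trig(u, v, n=100000):
    """Midpoint approximation of (5.41):
    2 * integral over (0, pi/2) of cos(t)^(2u-1) * sin(t)^(2v-1)."""
    h = (math.pi / 2) / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        total += math.cos(t) ** (2 * u - 1) * math.sin(t) ** (2 * v - 1)
    return 2.0 * total * h

def beta_gamma(u, v):
    """Right-hand side of (5.42)."""
    return math.gamma(u) * math.gamma(v) / math.gamma(u + v)
```

With $u = 3/2$, $v = 2$ both expressions should be close to $4/15$, which can also be found by hand from $2\int_0^{\pi/2}\cos^2\theta\,\sin^3\theta\,d\theta$.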

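EXAMPLE 2. Let $u = v = \tfrac12$. Then $B(\tfrac12, \tfrac12) = 2\int_0^{\pi/2} d\theta = \pi$. Hence $[\Gamma(\tfrac12)]^2 = \pi\,\Gamma(1) = \pi$, or
$$\Gamma(\tfrac12) = \sqrt{\pi}. \tag{5.43}$$
Using the formula
$$\Gamma(u + 1) = u\,\Gamma(u), \tag{5.44}$$
proved earlier, $\Gamma(m + \tfrac12)$ can be found explicitly for any positive integer $m$. For instance,
$$\Gamma(\tfrac52) = \tfrac32\,\Gamma(\tfrac32) = \tfrac32\cdot\tfrac12\,\Gamma(\tfrac12) = \tfrac34\sqrt{\pi}.$$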
If $u \le 0$, then the integral defining $\Gamma(u)$ diverges. However, if $u$ is not one of the integers $m = 0, -1, -2, \ldots$, one can use (5.44) to define $\Gamma(u)$. For instance, if $-1 < u < 0$, then $0 < u + 1 < 1$ and by definition $\Gamma(u) = \Gamma(u + 1)/u$. Next $\Gamma(u)$ is defined for $-2 < u < -1$, and so on. The gamma function can also be defined for complex values of $u$ [23, p. 148].
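The step-by-step extension of $\Gamma$ to negative non-integer arguments can be sketched in code. This is an illustrative sketch, not the text's construction verbatim: the name `gamma_extended` is hypothetical, and `math.gamma` is used only for positive arguments so that the recursion $\Gamma(u) = \Gamma(u + 1)/u$ does the rest.

```python
import math

def gamma_extended(u):
    """Gamma(u) for u > 0, or for non-integer u < 0 via Gamma(u) = Gamma(u + 1) / u."""
    if u > 0:
        return math.gamma(u)
    if u == math.floor(u):
        raise ValueError("Gamma is not defined at 0, -1, -2, ...")
    # Each call moves the argument one unit to the right until it is positive.
    return gamma_extended(u + 1) / u
```

For example, `gamma_extended(-0.5)` equals $\Gamma(\tfrac12)/(-\tfrac12) = -2\sqrt{\pi}$.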


To obtain another expression for $B(u, v)$, set $\cos^2\theta = g(\theta) = z$ and apply the transformation formula. Then $|g'(\theta)| = 2\cos\theta\sin\theta$, and
$$B(u, v) = \int_0^{\pi/2}(\cos^2\theta)^{u-1}(\sin^2\theta)^{v-1}\,2\cos\theta\sin\theta\,d\theta,$$
$$B(u, v) = \int_0^1 z^{u-1}(1 - z)^{v-1}\,dz. \tag{5.45}$$
A variety of integrals can be reduced to either (5.41) or (5.45) and hence can be evaluated in terms of the gamma function (see Problem 11).

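The measure $\alpha_n$ of the unit n-ball

According to Problem 7, Section 5.5,
$$\frac{\alpha_n}{\alpha_{n-1}} = 2\int_0^1 (1 - u^2)^{(n-1)/2}\,du.$$
Setting $u = g(z) = \sqrt{z}$, then $g'(z) = \tfrac12 z^{-1/2}$ and
$$\frac{\alpha_n}{\alpha_{n-1}} = \int_0^1 z^{-1/2}(1 - z)^{(n-1)/2}\,dz = B\left(\tfrac12, \tfrac{n+1}{2}\right).$$
Hence, by (5.42),
$$\alpha_n = \alpha_{n-1}\,\frac{\Gamma(\tfrac12)\,\Gamma\left(\tfrac{n+1}{2}\right)}{\Gamma\left(\tfrac{n}{2} + 1\right)}.$$
Moreover, $\alpha_1 = 2$. By induction on $n$ and Formula (5.43),
$$\alpha_n = \frac{\pi^{n/2}}{(n/2)\,\Gamma(n/2)}, \qquad n = 1, 2, \ldots \tag{5.46}$$
If $n$ is even, $n = 2l$, then $l\,\Gamma(l) = l!$ and $\alpha_{2l} = \pi^l/l!$.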
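Formula (5.46) and the recursion that precedes it are easy to tabulate. A minimal sketch follows; the function names are illustrative assumptions, and `math.gamma` stands in for $\Gamma$.

```python
import math

def unit_ball_measure(n):
    """alpha_n from (5.46): pi^(n/2) / ((n/2) * Gamma(n/2))."""
    return math.pi ** (n / 2) / ((n / 2) * math.gamma(n / 2))

def ratio_from_beta(n):
    """alpha_n / alpha_(n-1) = B(1/2, (n+1)/2), evaluated through (5.42)."""
    return math.gamma(0.5) * math.gamma((n + 1) / 2) / math.gamma(n / 2 + 1)

# Build alpha_1, ..., alpha_6 from alpha_1 = 2 and the recursion.
by_recursion = [2.0]
for n in range(2, 7):
    by_recursion.append(by_recursion[-1] * ratio_from_beta(n))
```

The list `by_recursion` should agree with `unit_ball_measure(n)` for n = 1, ..., 6; in particular $\alpha_2 = \pi$ and $\alpha_3 = 4\pi/3$.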

PROBLEMS

1. Let $A = \{(x, y): x^2 + y^2 \le a^2,\ x \ge 0\}$. Evaluate $\int_A xy^2\,dV_2(x, y)$ by introducing polar coordinates.
2. Find the area of $\{(x, y): x < y < 2x,\ 1 < x + 4y < 4\}$ by introducing $f^1(x, y) = y/x$, $f^2(x, y) = x + 4y$ as coordinates of $(x, y)$.
3. Let $A = \{(x, y): 0 \le x^2 + y^2 \le 2,\ x^2 - y^2 \le 1,\ x \ge 0,\ y \ge 0\}$. Find $\int_A x\,dV_2(x, y)$ by introducing the coordinates $f^1(x, y) = x^2 + y^2$, $f^2(x, y) = x^2 - y^2$.
4. Write the iterated integral S6 dx g-x dy Wx.y ) f(x, y, z)dz as an iterated integral in
cylindrical coordinates.
5. Find $V_3(\{(x, y, z): x^2 + y^2 + z^2 \le a^2,\ x^2 + y^2 \ge b^2\})$ where $a > b$.


Figure 5.19

6. (Solids of revolution.) Let $S$ be a compact subset of the right half-plane and
$$A = \{(x, y, z): (r, z) \in S\},$$
where $r^2 = x^2 + y^2$. Show that $V_3(A) = 2\pi\bar{r}\,V_2(S)$, where $(\bar{r}, \bar{z})$ is the centroid of $S$ (Figure 5.19).
7. Find $V_3(\{(x, y, z): \exp(-2x) > y^2 + z^2,\ x > 0\})$.
8. Let $f^1(x, y)$ = distance between $(x, y)$ and $(1, 0)$, and $f^2(x, y)$ = distance between $(x, y)$ and $(-1, 0)$. Show that $f = (f^1, f^2)$ is a coordinate system for the half-plane $D = \{(x, y): y > 0\}$.
9. Let $(f^1, \ldots, f^n)$ be a coordinate system for $D_1 \subset E^n$, and $(\phi^1, \ldots, \phi^p)$ a coordinate system for $D_2 \subset E^p$. Show that $(f^1, \ldots, f^n, \phi^1, \ldots, \phi^p)$, regarded as functions on $D_1 \times D_2$, form a coordinate system for $D_1 \times D_2$.
10. (Bipolar coordinates in $E^4$.) In this system the coordinates of $x$ are $r\cos\theta$, $r\sin\theta$, $\rho\cos\alpha$, $\rho\sin\alpha$, where $(r, \theta)$ are polar coordinates in the $x^1x^2$-plane and $(\rho, \alpha)$ polar coordinates in the $x^3x^4$-plane. Find, using bipolar coordinates,
$$\int_{K \times K} (x^1)^2\,dV_4(x),$$
where $K$ is the unit circular disk $x^2 + y^2 \le 1$.
11. In terms of the gamma function, find:
(a) $\int_0^1 \sqrt{1 - x^3}\,dx$. [Hint: Let $x^3 = z$.]
(b) The area of $\{(x, y): 0 \le y \le \sqrt{\cos x},\ -\pi/2 \le x \le \pi/2\}$.
(c) $\int_0^\infty x^a \exp(-x^b)\,dx$, $a > -1$, $b > 0$.
(d) $\int_1^\infty (\log x)^c x^d\,dx$, $c > -1$, $d < -1$.
(e) $\int_\Sigma (x^1)^k\,dV_n(x)$, $k = 1, 2, \ldots$, $\Sigma$ the standard n-simplex.
(f) $\int (x^1)^k \exp\left(-\sum_{i=1}^n a_i (x^i)^2\right)dV_n(x)$, $k = 1, 2, \ldots$, $a_i > 0$ for $i = 1, \ldots, n$.

12. Let $Q(x) = \sum_{i,j=1}^n c^i_j x^i x^j > 0$ for every $x \neq 0$, where $c^i_j = c^j_i$ for $i, j = 1, \ldots, n$. Show that
$$\int \exp[-Q(x)/2]\,dV_n(x) = (2\pi)^{n/2}[\det(c^i_j)]^{-1/2}.$$
[Hint: Make a suitable orthogonal transformation.]


5.10 Measurable sets and functions; further properties
The treatment of measurable sets, begun in Section 5.2, is continued here by completing the proofs of Theorems 5.2 and 5.3 for unbounded sets. Then we list and prove several properties of measurable functions.
For the rest of the chapter it is convenient to adjoin to $E^1$ two "ideal points" $-\infty$ and $+\infty$. We call the ordered number field $E^1$ with $-\infty$ and $+\infty$ adjoined the extended real number system. The points $-\infty$, $+\infty$ are not numbers. However, for present purposes we agree that $-\infty < a < +\infty$ for every number $a$, and that
$$(+\infty) + (+\infty) = +\infty, \qquad (+\infty) + a = a + (+\infty) = +\infty,$$
$$(+\infty)a = a(+\infty) = \begin{cases} +\infty & \text{if } a > 0, \\ -\infty & \text{if } a < 0. \end{cases}$$
Similar conventions are made regarding $-\infty$. However, $(+\infty) + (-\infty)$ is undefined.
The extended real number system is denoted by $\bar{E}^1$. If $S$ is a nonempty subset of $\bar{E}^1$, let $\sup S$ be the smallest $b \in \bar{E}^1$ such that $x \le b$ for every $x \in S$. If $+\infty \in S$, then clearly $\sup S = +\infty$. Moreover, if $x < +\infty$ for every $x \in S$ but $S$ has no upper bound in $E^1$, then $\sup S = +\infty$.
The definition of $\inf S$ is similar. By a neighborhood of $+\infty$ let us mean any set $\{x \in \bar{E}^1: x > c\}$. Then convergence to $+\infty$ of sequences in $\bar{E}^1$ makes sense. Any nondecreasing sequence $x_1, x_2, \ldots$ in $\bar{E}^1$ such that $x_m > -\infty$ for some $m$ has a limit $x_0$ which is finite (i.e., in $E^1$) or $+\infty$.
Lemma 1, on reversing the order of limits for doubly indexed monotone sequences in $\bar{E}^1$, is used several times.

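Lemma 1. Assume that $a_{m\nu} \ge 0$, $a_{m\nu} \le a_{m,\nu+1}$, $a_{m\nu} \le a_{m+1,\nu}$ for every $m = 1, 2, \ldots$, $\nu = 1, 2, \ldots$ Then
$$\lim_{m\to\infty}\lim_{\nu\to\infty} a_{m\nu} = \lim_{\nu\to\infty}\lim_{m\to\infty} a_{m\nu}. \tag{5.47}$$

PROOF. Let
$$b_m = \lim_{\nu\to\infty} a_{m\nu}, \qquad c_\nu = \lim_{m\to\infty} a_{m\nu}.$$
These limits exist since $a_{m1} \le a_{m2} \le \cdots$ and $a_{1\nu} \le a_{2\nu} \le \cdots$. Moreover, $b_1 \le b_2 \le \cdots$ and $c_1 \le c_2 \le \cdots$. Let
$$b = \lim_{m\to\infty} b_m, \qquad c = \lim_{\nu\to\infty} c_\nu.$$
For every $m$ and $\nu$, $a_{m\nu} \le c_\nu \le c$. Hence $b_m \le c$ for every $m$, and it follows that $b \le c$. Similarly, $c \le b$. □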
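The role of the monotonicity hypotheses in Lemma 1 can be illustrated numerically. The sketch below is only a heuristic: the two sample sequences are hypothetical examples chosen here, and each iterated limit is replaced by a crude stand-in in which the inner index is taken much larger than the outer one.

```python
def iterated_limits(a, inner=10**6, outer=10**3):
    """Crude stand-ins for the two sides of (5.47)."""
    lim_m_lim_nu = a(outer, inner)   # inner limit in nu, then m
    lim_nu_lim_m = a(inner, outer)   # inner limit in m, then nu
    return lim_m_lim_nu, lim_nu_lim_m

def monotone(m, nu):
    """Nonnegative and nondecreasing in each index; both iterated limits are 1."""
    return (m / (m + 1)) * (1 - 1 / (nu + 1) ** 2)

def not_monotone(m, nu):
    """Nondecreasing in m but not in nu; the iterated limits are 0 and 1."""
    return 1.0 if m > nu else 0.0
```

Calling `iterated_limits(monotone)` gives two values near 1, while `iterated_limits(not_monotone)` gives `(0.0, 1.0)`, so the order of the limits matters once monotonicity in $\nu$ is dropped.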

As a first application of Lemma 1, let us complete the proof of Theorem 5.3, which was postponed in Section 5.2 in the case of unbounded sets. Using Theorem 5.3 we then complete the proof of Theorem 5.2(d, e). Formulas (5.10) and (5.11) are now to be understood as holding in the extended real number system $\bar{E}^1$.

Completion of proof of Theorem 5.3

First consider a nondecreasing sequence of measurable sets $A_1 \subset A_2 \subset \cdots$. As in Section 5.2, let $U_r$ denote the r-neighborhood of $0$. Consider integer values $m = 1, 2, \ldots$ for $r$, and let $a_{m\nu} = V(A_\nu \cap U_m)$. This doubly indexed sequence satisfies the hypotheses of Lemma 1. Let $A = A_1 \cup A_2 \cup \cdots$. For each $m$,
$$A \cap U_m = (A_1 \cap U_m) \cup (A_2 \cap U_m) \cup \cdots.$$
By definition of measure for possibly unbounded sets (Section 5.2),
$$V(A_\nu) = \lim_{m\to\infty} V(A_\nu \cap U_m), \qquad V(A) = \lim_{m\to\infty} V(A \cap U_m).$$
By (5.11) for the case of bounded nondecreasing sequences of sets,
$$V(A \cap U_m) = \lim_{\nu\to\infty} V(A_\nu \cap U_m).$$
By Lemma 1,
$$V(A) = \lim_{\nu\to\infty} V(A_\nu).$$
This proves (5.11).

Next suppose that $A_1 \supset A_2 \supset \cdots$, with $V(A_1) < \infty$. Let $A = A_1 \cap A_2 \cap \cdots$. For each $m$,
$$(A_1 - A) \cap U_m = A_1 \cap U_m - A \cap U_m.$$
Since $A \cap U_m \subset A_1 \cap U_m$,
$$V[(A_1 - A) \cap U_m] = V(A_1 \cap U_m) - V(A \cap U_m).$$
Let $m \to \infty$. We get
$$V(A_1 - A) = V(A_1) - V(A).$$
Similarly, $V(A_1 - A_\nu) = V(A_1) - V(A_\nu)$ for each $\nu$. We then get (5.12) by applying (5.11) to the nondecreasing sequence $A_1 - A_1, A_1 - A_2, \ldots, A_1 - A_\nu, \ldots$. □

Completion of proof of Theorem 5.2(d, e)

First consider two measurable sets $A$ and $B$ (possibly unbounded). From Section 5.2, we know that for each $r$
$$V[(A \cup B) \cap U_r] \le V(A \cap U_r) + V(B \cap U_r).$$
Upon letting $r \to \infty$,
$$V(A \cup B) \le V(A) + V(B).$$
Equality holds if $A \cap B$ is empty. By induction, the same results hold for unions of finitely many measurable sets $A_1, \ldots, A_\nu$.
Now consider any sequence of measurable sets $A_1, A_2, \ldots$, and let $A = A_1 \cup A_2 \cup \cdots$. Let $D_\nu = A_1 \cup \cdots \cup A_\nu$. Then $D_1 \subset D_2 \subset \cdots$ and $A = D_1 \cup D_2 \cup \cdots$. Moreover,
$$V(D_\nu) \le \sum_{k=1}^{\nu} V(A_k).$$
By (5.11),
$$V(A) = \lim_{\nu\to\infty} V(D_\nu) \le \sum_{k=1}^{\infty} V(A_k).$$
If $A_1, A_2, \ldots$ are disjoint, then the inequalities become equalities. □

For the discussion of integrals to follow it is convenient to consider functions which may have the values $-\infty$ or $+\infty$.

Properties of measurable functions


Let $f$ be a function with domain $E^n$ and values in the extended real number system $\bar{E}^1$. As in Section 5.4, $f$ is measurable if
$$\{x: f(x) > c\} \tag{5.48}$$
is a measurable set for every real number $c$. Let us list and prove some properties of measurable functions.
(1) If $f$ is measurable, then for every $c$ the sets $\{x: f(x) \le c\}$, $\{x: f(x) \ge c\}$, and $\{x: f(x) < c\}$ are measurable sets.
PROOF. The complement of a measurable set is measurable. Since $\{x: f(x) \le c\}$ is the complement of the set in (5.48), it is measurable. Now $\{x: f(x) \ge c\} = \bigcap_{m=1}^{\infty} \{x: f(x) > c - 1/m\}$. Each set on the right is measurable, and hence their intersection is measurable. Taking complements, the third set is measurable. □

From the definition and (1), $f^{-1}(I)$ is measurable if $I$ is any semi-infinite interval. It can be shown that $f^{-1}(E)$ is measurable if $E$ is any open set or closed set.
In the next statement we agree that $0f = 0$ even when $f$ has extended real values.
(2) If $f$ is measurable, then $af$ is measurable for any real number $a$.
PROOF. This follows from the definition and (1). □

(3a) If $f$ and $g$ are measurable and
$$h(x) = \max\{f(x), g(x)\} \quad \text{for every } x,$$
then $h$ is measurable.
PROOF. $\{x: h(x) > c\} = \{x: f(x) > c\} \cup \{x: g(x) > c\}$. □

In particular, if $f$ is measurable and
$$f^+(x) = \max\{f(x), 0\}, \qquad f^-(x) = \max\{-f(x), 0\},$$
then $f^+$ and $f^-$ are measurable.
Statement (3a) extends to the maximum of a finite number of measurable functions and, more importantly, to sequences of functions.

(3b) If $f_1, f_2, \ldots$ are measurable and
$$g(x) = \sup\{f_1(x), f_2(x), \ldots\} \quad \text{for every } x,$$
then $g$ is measurable.

PROOF.
$$\{x: g(x) > c\} = \bigcup_{\nu=1}^{\infty} \{x: f_\nu(x) > c\}. \qquad \Box$$

In particular, if $f_1 \le f_2 \le \cdots$ then $g$ is the limit of this nondecreasing sequence.
Similarly, if
$$h(x) = \inf\{f_1(x), f_2(x), \ldots\} \quad \text{for every } x,$$
then $h$ is measurable.
Let $y_1, y_2, \ldots$ be any sequence in $\bar{E}^1$. Let
$$z_\nu = \inf\{y_\nu, y_{\nu+1}, \ldots\}, \qquad \nu = 1, 2, \ldots$$
Then $z_1 \le z_2 \le \cdots$. The limit of the monotone sequence $z_1, z_2, \ldots$ is called the lower limit of the sequence $y_1, y_2, \ldots$, and is denoted by $\liminf_{\nu\to\infty} y_\nu$. Similarly, if
$$w_\nu = \sup\{y_\nu, y_{\nu+1}, \ldots\}, \qquad \nu = 1, 2, \ldots,$$
then $w_1 \ge w_2 \ge \cdots$. The limit of the monotone sequence $w_1, w_2, \ldots$ is the upper limit $\limsup_{\nu\to\infty} y_\nu$. Since $z_\nu \le w_\nu$ for each $\nu = 1, 2, \ldots$, we must have
$$\liminf_{\nu\to\infty} y_\nu \le \limsup_{\nu\to\infty} y_\nu.$$
Equality holds if and only if the sequence $y_1, y_2, \ldots$ has a limit.
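The tail infima $z_\nu$ and tail suprema $w_\nu$ can be computed for a finite stretch of a sequence. The sketch below is an illustration only: it works on a finite list, so its values agree with the true $z_\nu$, $w_\nu$ only when the extremes of each tail are attained inside the list, as they are for the hypothetical example $y_\nu = (-1)^\nu(1 + 1/\nu)$ used here (except at the very last index).

```python
def tail_inf_sup(y):
    """For a finite list y[0..N-1], return (z, w) with
    z[k] = min(y[k:]) and w[k] = max(y[k:])."""
    n = len(y)
    z, w = [0.0] * n, [0.0] * n
    lo, hi = float("inf"), float("-inf")
    for k in range(n - 1, -1, -1):   # sweep from the end so each tail is updated in O(1)
        lo, hi = min(lo, y[k]), max(hi, y[k])
        z[k], w[k] = lo, hi
    return z, w

# Example: y_nu = (-1)^nu * (1 + 1/nu), nu = 1, ..., 1000.
y = [(-1) ** nu * (1 + 1 / nu) for nu in range(1, 1001)]
z, w = tail_inf_sup(y)
```

Here `z` is nondecreasing with values approaching $-1$ and `w` is nonincreasing with values approaching $1$, consistent with $\liminf y_\nu = -1 < 1 = \limsup y_\nu$ for a sequence that has no limit.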

(4) If $f_1, f_2, \ldots$ are measurable and
$$f(x) = \liminf_{\nu\to\infty} f_\nu(x) \quad \text{for every } x,$$
then $f$ is measurable.

PROOF. For each $\nu = 1, 2, \ldots$ let $h_\nu(x) = \inf\{f_\nu(x), f_{\nu+1}(x), \ldots\}$. By (3b) each $h_\nu$ is measurable; $h_1 \le h_2 \le \cdots$, and by the definition of "lim inf," $f(x) = \sup\{h_1(x), h_2(x), \ldots\}$. Therefore $f$ is measurable. □

(5) If $f_1, f_2, \ldots$ are measurable and
$$f(x) = \lim_{\nu\to\infty} f_\nu(x) \quad \text{for every } x,$$
then $f$ is measurable.
Note: This is a particular case of (4).

In the next statement it is assumed that $f(x) + g(x)$ is everywhere defined. In other words, for no $x$ is it true that $f(x) = +\infty$, $g(x) = -\infty$ or vice versa.
(6) If $f$ and $g$ are measurable, then $f + g$ is measurable.
PROOF. For each $\nu = 1, 2, \ldots$ let
$$f_\nu(x) = \begin{cases} \nu & \text{if } f(x) > \nu, \\ -\nu & \text{if } f(x) \le -\nu, \\ j/\nu & \text{if } \dfrac{j - 1}{\nu} < f(x) \le \dfrac{j}{\nu}, \end{cases}$$
where $j$ is an integer, and $1 - \nu^2 \le j \le \nu^2$. This construction is like that in Section 5.4, Lemma 1. Then
$$f(x) = \lim_{\nu\to\infty} f_\nu(x)$$
for every $x$. Defining $g_\nu$ in the same way,
$$g(x) = \lim_{\nu\to\infty} g_\nu(x).$$
For each $\nu$ the functions $f_\nu$ and $g_\nu$ are measurable and take a finite number of values. It is easy to show that their sum $f_\nu + g_\nu$ is measurable. Since for every $x$
$$f(x) + g(x) = \lim_{\nu\to\infty} [f_\nu(x) + g_\nu(x)],$$
$f + g$ is measurable by (5). □
PROBLEMS

1. Let $a_{m\nu} = m(m + \nu)^{-1}$. Show that (5.47) does not hold. Which assumption in Lemma 1 fails to be satisfied?
2. Let $f$ and $g$ be measurable real-valued functions. Show that their product is measurable. [Hint: Show that the square of a measurable function is measurable. Then $2fg = (f + g)^2 - f^2 - g^2$.]
3. Let $A$ be a measurable set with $V(A) < \infty$. Show that there exists a measurable set $B \subset A$ with $V(B) = \tfrac12 V(A)$.
4. Show that:
(a) If $A$ is measurable, $N$ is a null set, and $A - N \subset B \subset A \cup N$, then $B$ is measurable and $V(A) = V(B)$. [Hint: First consider the case of bounded sets.]
(b) If $f$ is measurable and $f(x) = g(x)$ except for $x$ in a null set $N$, then $g$ is measurable.
5. A set $A$ is called σ-compact if there is a sequence of compact sets $K_1 \subset K_2 \subset \cdots$ such that $A = K_1 \cup K_2 \cup \cdots$. Show that:
(a) Any closed set is σ-compact.
(b) Any open set is σ-compact.
(c) If $A_1, A_2, \ldots$ are σ-compact, then $A_1 \cup A_2 \cup \cdots$ is σ-compact.
(d) If $A$ is any measurable set, then $A = B \cup N$ where $B$ is σ-compact and $N$ is a null set. [Hint: Show that the result is true for bounded measurable sets.]

5.11 Integrals: General definition, convergence theorems

Let us now define the integral, in the Lebesgue sense, without the special assumptions (of boundedness, or continuity) imposed in earlier sections. A second objective of this section is to find general conditions under which the symbols "$\lim_{\nu\to\infty}$" and "$\int$" can be interchanged. Two main theorems giving such conditions are the monotone sequences theorem and Lebesgue's dominated convergence theorem.

Definition. The integral of a bounded measurable function with compact support was defined in Section 5.3. Let us extend the definition in three steps:
(a) Let $f$ be bounded, measurable, and $f \ge 0$. For each $r > 0$, $f$ is measurable on the r-neighborhood $U_r$ of $0$. By Theorem 5.5, $f$ is integrable over $U_r$. Let $\phi(r)$ equal the integral of $f$ over $U_r$. Since $f \ge 0$, $\phi$ is nondecreasing. Hence $\phi(r)$ tends to a limit, finite or $+\infty$, as $r \to +\infty$. Let
$$\int f\,dV = \lim_{r\to+\infty} \int_{U_r} f\,dV. \tag{5.49}$$
If the limit is finite, then $f$ is integrable (over $E^n$). Otherwise the left-hand side of (5.49) equals $+\infty$. If $0 \le f \le g$, then $\int f\,dV \le \int g\,dV$.
(b) Let $f$ be measurable and $f \ge 0$. For any $t > 0$ consider the function ${}_t f$ such that
$${}_t f(x) = \min\{f(x), t\} \quad \text{for every } x.$$
For each $t$, ${}_t f$ is a bounded, measurable function. It is called the truncation of $f$ at height $t$. If $s \le t$, then ${}_s f \le {}_t f$ and hence $\int {}_s f\,dV \le \int {}_t f\,dV$. Let
$$\int f\,dV = \lim_{t\to+\infty} \int {}_t f\,dV. \tag{5.50}$$
If the limit is finite, then $f$ is integrable. Otherwise $\int f\,dV = +\infty$. If $0 \le f \le g$, then ${}_t f \le {}_t g$ for every $t$. Consequently, $\int f\,dV \le \int g\,dV$.
If $f$ is bounded, then ${}_t f = f$ for all sufficiently large $t$, so the definitions in (a) and (b) agree. Moreover, if $f$ has compact support, then $\phi(r)$ is constant for all sufficiently large $r$ and the definition of integral in (a) agrees with the one in Section 5.3. For Step (c), $f$ may take both positive and negative values. We write $f = f^+ - f^-$, where $f^+$, $f^-$ are nonnegative as in Section 5.6.
(c) A measurable function $f$ is integrable if $f^+$ and $f^-$ are integrable. If $f$ is integrable, then
$$\int f\,dV = \int f^+\,dV - \int f^-\,dV. \tag{5.51}$$
If $f$ has compact support and is bounded, then the definition in (5.51) agrees with that in Section 5.3. This follows from Proposition 5.4 and the fact that the integrals of $f^+$, $f^-$ in the new sense and in the sense of Section 5.3 agree.
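Let us now give some general theorems about the validity of interchanging the symbols "$\lim_{\nu\to\infty}$" and "$\int$."

Lemma 1. Let $K$ be a compact set and $\phi_1, \phi_2, \ldots$ be bounded measurable functions with supports contained in $K$ such that $\phi_1 \ge \phi_2 \ge \cdots$ and $\lim_{\nu\to\infty} \phi_\nu(x) = 0$ for every $x$. Then
$$\lim_{\nu\to\infty} \int \phi_\nu\,dV = 0.$$
PROOF. There exists $C$ such that $\phi_\nu(x) \le C$ for every $x$ and $\nu = 1$, hence also for $\nu = 2, 3, \ldots$. Given $\varepsilon > 0$ let $c = \varepsilon/2V(K)$. Let
$$A_\nu = \{x: \phi_\nu(x) > c\}.$$
By hypothesis $K \supset A_1 \supset A_2 \supset \cdots$ and $A_1 \cap A_2 \cap \cdots$ is the empty set. Since $V(A_1)$ is finite, we may apply Formula (5.12), obtaining
$$\lim_{\nu\to\infty} V(A_\nu) = V(A_1 \cap A_2 \cap \cdots) = 0.$$
Let $\nu_0$ be such that $V(A_\nu) < \varepsilon/2C$ for every $\nu \ge \nu_0$. Then
$$\int_{A_\nu} \phi_\nu\,dV \le C\,V(A_\nu) < \frac{\varepsilon}{2} \quad \text{if } \nu \ge \nu_0,$$
while for every $\nu$,
$$\int_{K - A_\nu} \phi_\nu\,dV \le c\,V(K - A_\nu) \le c\,V(K) = \frac{\varepsilon}{2}.$$
Hence for every $\nu \ge \nu_0$,
$$0 \le \int \phi_\nu\,dV = \int_{A_\nu} \phi_\nu\,dV + \int_{K - A_\nu} \phi_\nu\,dV < \varepsilon. \qquad \Box$$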
Monotone sequences theorem. Let $f_1, f_2, \ldots$ be measurable functions with $0 \le f_1 \le f_2 \le \cdots$, and let $f(x) = \lim_{\nu\to\infty} f_\nu(x)$ for every $x$. Then
$$\int f\,dV = \lim_{\nu\to\infty} \int f_\nu\,dV. \tag{5.52}$$

PROOF. We have shown in Section 5.10 that the limit $f$ is measurable. First let us assume that $f$ is bounded and has compact support. Let $\phi_\nu = f - f_\nu$. Since $0 \le f_\nu \le f$, each function $f_\nu$ is also bounded and has compact support. Moreover,
$$\int \phi_\nu\,dV = \int f\,dV - \int f_\nu\,dV, \qquad \nu = 1, 2, \ldots$$
From Lemma 1, we get (5.52).

Next let us suppose merely that $f$ is bounded. Let
$$a_{m\nu} = \int_{U_m} f_\nu\,dV, \qquad m, \nu = 1, 2, \ldots$$
and apply Lemma 1 in Section 5.10. Finally, if $f$ is unbounded observe that for each $m = 1, 2, \ldots$,
$$\int {}_m f\,dV = \lim_{\nu\to\infty} \int {}_m f_\nu\,dV,$$
where ${}_m f_\nu$ is the truncation of $f_\nu$ at height $m$. Apply Lemma 1 again with
$$a_{m\nu} = \int {}_m f_\nu\,dV, \qquad m, \nu = 1, 2, \ldots \qquad \Box$$

Note. The theorem states in particular that $f$ is integrable if and only if the nondecreasing sequence of numbers $\int f_1\,dV, \int f_2\,dV, \ldots$ has a finite limit.
The double limiting process (5.49) and (5.50) can be replaced by a single one. Let $f$ be measurable and $f \ge 0$. Let
$$f_\nu(x) = \begin{cases} \nu & \text{if } f(x) \ge \nu,\ |x| \le \nu, \\ 0 & \text{if } |x| > \nu, \\ f(x) & \text{otherwise.} \end{cases} \tag{*}$$
Then $0 \le f_1 \le f_2 \le \cdots$ and $f_\nu(x)$ tends to $f(x)$ for every $x$. Hence by the theorem,
$$\int f\,dV = \lim_{\nu\to\infty} \int f_\nu\,dV.$$
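The single limiting process through the functions $f_\nu$ of (*) can be tried on a concrete unbounded function. The sketch below is an illustration under assumptions made here: the example integrand on $E^1$ ($x^{-1/2}$ on $(0, 1)$, zero elsewhere), the function names, and the midpoint rule are all choices of this sketch. For this example the exact values are $\int f_\nu\,dx = 2 - 1/\nu$, which increase to $\int f\,dx = 2$.

```python
def f(x):
    """Example integrand (an assumption for illustration):
    x^(-1/2) on (0, 1), zero elsewhere. It is unbounded near 0."""
    return x ** -0.5 if 0.0 < x < 1.0 else 0.0

def f_nu(x, nu):
    """The function f_nu of (*): f cut off at height nu and outside |x| <= nu."""
    if abs(x) > nu:
        return 0.0
    return min(f(x), nu)

def integral_f_nu(nu, n=100000):
    """Midpoint rule on (0, 1), the only place where f_nu is nonzero when nu >= 1."""
    h = 1.0 / n
    return sum(f_nu((k + 0.5) * h, nu) for k in range(n)) * h
```

Evaluating `integral_f_nu(nu)` for `nu = 1, 2, 5` gives values close to 1, 1.5, and 1.8, a nondecreasing sequence bounded by 2.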

Corollary 1. If $f$ and $g$ are real valued integrable functions, then $f + g$ is integrable and
$$\int (f + g)\,dV = \int f\,dV + \int g\,dV. \tag{5.53}$$

PROOF. By (6) (Section 5.10) $f + g$ is measurable. If $f \ge 0$, $g \ge 0$, define $f_\nu$ and $g_\nu$ by (*). For each $\nu$, $f_\nu$ and $g_\nu$ are bounded and have compact supports. By Proposition 5.4,
$$\int (f_\nu + g_\nu)\,dV = \int f_\nu\,dV + \int g_\nu\,dV.$$
The sequences $f_1, f_2, \ldots$; $g_1, g_2, \ldots$; $f_1 + g_1, f_2 + g_2, \ldots$ are nondecreasing and tend respectively to $f$, $g$, $f + g$. By the monotone sequences theorem, the corollary is true when $f \ge 0$, $g \ge 0$.
In the general case,
$$0 \le (f + g)^+ \le f^+ + g^+.$$
Since $f^+$ and $g^+$ are integrable, so is $(f + g)^+$. Similarly $(f + g)^-$ is integrable. Then
$$f^+ + g^+ = (f + g)^+ + \phi,$$
$$f^- + g^- = (f + g)^- + \phi,$$
where $\phi \ge 0$. Since the corollary is true for nonnegative functions,
$$\int f^+\,dV + \int g^+\,dV = \int (f + g)^+\,dV + \int \phi\,dV,$$
$$\int f^-\,dV + \int g^-\,dV = \int (f + g)^-\,dV + \int \phi\,dV.$$
Subtracting, we get (5.53). □

Corollary 2. A measurable function $f$ is integrable if and only if $|f|$ is integrable.

This is proved in the same way as Corollary 1, Section 5.6, using $f = f^+ - f^-$, $|f| = f^+ + f^-$, and Corollary 1 immediately above. It is easy to show that $cf$ is integrable if $f$ is integrable, and
$$\int (cf)\,dV = c\int f\,dV.$$

Corollary 3. Let $f_1, f_2, \ldots$ be measurable, $f_1$ be integrable, $f_1 \ge f_2 \ge \cdots \ge 0$, and let $f(x) = \lim_{\nu\to\infty} f_\nu(x)$ for every $x$. Then (5.52) holds.
PROOF. Let $g_\nu = f_1 - f_\nu$ and apply the theorem to the nondecreasing sequence $g_1, g_2, \ldots$, which has $f_1 - f$ as limit. □

For sequences that are not necessarily monotone, the result is as follows.

Fatou's lemma. Let $f_\nu$ be measurable and $f_\nu \ge 0$ for each $\nu = 1, 2, \ldots$. Then
$$\int \left(\liminf_{\nu\to\infty} f_\nu\right)dV \le \liminf_{\nu\to\infty} \int f_\nu\,dV. \tag{5.54a}$$

PROOF. Let $h_\nu = \inf\{f_\nu, f_{\nu+1}, \ldots\}$. Since $h_\nu \le f_m$ whenever $\nu \le m$,
$$\int h_\nu\,dV \le \int f_m\,dV, \qquad m = \nu, \nu + 1, \ldots,$$
and therefore
$$\int h_\nu\,dV \le \liminf_{m\to\infty} \int f_m\,dV, \qquad \nu = 1, 2, \ldots \tag{**}$$
Let $f = \liminf_{\nu\to\infty} f_\nu$. The sequence $h_1, h_2, \ldots$ is nondecreasing and tends to $f$. By the monotone sequences theorem,
$$\int f\,dV = \lim_{\nu\to\infty} \int h_\nu\,dV.$$
Since (**) holds for each $\nu$, the limit is no more than the right-hand side of (**). □

A statement about measurable functions is said to hold almost everywhere (or for almost all $x$) if it is true except for $x$ in some null set.

Proposition 5.5. If $f$ is integrable, then $f(x)$ is finite almost everywhere.

PROOF. If $f \ge 0$, let
$$A_m = \{x: f(x) \ge m\}, \qquad A_\infty = \{x: f(x) = +\infty\}.$$
Then $A_\infty = A_1 \cap A_2 \cap \cdots$ and $A_1 \supset A_2 \supset \cdots$. For each $m$, let $\phi_m(x) = f(x)$ if $x \in A_m$, and otherwise $\phi_m(x) = 0$. Then $\phi_m \ge m1_{A_m}$, where $1_A$ is the characteristic function of $A$ (Section 5.4). Hence
$$mV(A_m) = \int m1_{A_m}\,dV \le \int \phi_m\,dV.$$
Since $\phi_m \le f$ we have, dividing by $m$,
$$V(A_m) \le \frac{1}{m}\int f\,dV.$$
Since the right-hand side tends to $0$ as $m \to \infty$, $V(A_\infty) = 0$.

In the general case, $f^+$ and $f^-$ are integrable and
$$\{x: f(x) = \pm\infty\} = \{x: f^+(x) = +\infty\} \cup \{x: f^-(x) = +\infty\}. \qquad \Box$$

If $f$ is measurable and $g(x) = f(x)$ almost everywhere, then $g$ is measurable. Moreover, if either is integrable then so is the other and $\int f\,dV = \int g\,dV$ (Problem 1).
In Fatou's lemma the hypothesis "$f_\nu \ge 0$ for every $\nu$" can be replaced by "$f_\nu \ge \phi$ for every $\nu$, where $\phi$ is integrable." For this purpose we make the following convention. If $f \ge \phi$ and $\phi$ is integrable, then $\phi(x)$ is finite almost everywhere. By $f - \phi$ let us mean the function with value $f(x) - \phi(x)$ if $\phi(x) \neq \pm\infty$, and value $0$ if $\phi(x) = \pm\infty$. Then $\int (f - \phi)\,dV = \int f\,dV - \int \phi\,dV$ if $f - \phi$ is integrable. If the integral of $f - \phi$ diverges to $+\infty$, then we agree that the integral of $f$ also diverges to $+\infty$.
Let $g_\nu = f_\nu - \phi$. By assumption, $g_\nu \ge 0$ for each $\nu = 1, 2, \ldots$. Moreover,
$$\liminf_{\nu\to\infty} g_\nu(x) = \liminf_{\nu\to\infty} f_\nu(x) - \phi(x),$$
provided $\phi(x) \neq \pm\infty$, hence almost everywhere. Since for each $\nu$
$$\int g_\nu\,dV = \int f_\nu\,dV - \int \phi\,dV,$$
we have
$$\liminf_{\nu\to\infty} \int g_\nu\,dV = \liminf_{\nu\to\infty} \int f_\nu\,dV - \int \phi\,dV.$$
Applying Fatou's lemma to the sequence $g_1, g_2, \ldots$ and adding $\int \phi\,dV$ to each side, we again get (5.54a).
Similarly, if for each $\nu$, $f_\nu \le \phi$ where $\phi$ is integrable, then
$$\int \left(\limsup_{\nu\to\infty} f_\nu\right)dV \ge \limsup_{\nu\to\infty} \int f_\nu\,dV. \tag{5.54b}$$

Note. In the monotone sequences theorem and its corollary the hypothesis $f_\nu \ge 0$ can also be replaced by $f_\nu \ge \phi$, where $\phi$ is integrable.

Lebesgue's dominated convergence theorem. Let $f_1, f_2, \ldots$ be measurable functions such that:
(a) $\lim_{\nu\to\infty} f_\nu(x) = f(x)$ almost everywhere.
(b) There is an integrable function $g$ such that $|f_\nu| \le g$ for $\nu = 1, 2, \ldots$
Then
$$\int f\,dV = \lim_{\nu\to\infty} \int f_\nu\,dV.$$

PROOF. Since $-g \le f_\nu \le g$ and both $-g$ and $g$ are integrable, we can apply (5.54a) and (5.54b). But
$$\liminf_{\nu\to\infty} f_\nu(x) = \limsup_{\nu\to\infty} f_\nu(x) = f(x)$$
almost everywhere. Hence
$$\limsup_{\nu\to\infty} \int f_\nu\,dV \le \int f\,dV \le \liminf_{\nu\to\infty} \int f_\nu\,dV.$$
But $\liminf \le \limsup$, and hence $\int f_\nu\,dV$ tends to $\int f\,dV$. □

Note: In (b) it suffices that $|f_\nu(x)| \le g(x)$ almost everywhere. One can redefine $f_\nu(x)$ to be $0$ on the null set where this inequality does not hold without changing $\int f_\nu\,dV$.
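A simple case in which the dominated convergence theorem applies can be checked numerically. The example below is an assumption made for illustration, not taken from the text: $f_\nu(x) = x^\nu$ on $[0, 1]$ and $0$ elsewhere, dominated by the characteristic function of $[0, 1]$, with limit $0$ except at $x = 1$. The exact integrals are $1/(\nu + 1)$, which tend to $0 = \int f\,dx$.

```python
def integral_power(nu, n=100000):
    """Midpoint approximation of the integral of x**nu over (0, 1)."""
    h = 1.0 / n
    return sum(((k + 0.5) * h) ** nu for k in range(n)) * h

# Integrals of f_nu for a few values of nu; they should decrease toward 0.
samples = {nu: integral_power(nu) for nu in (1, 10, 100)}
```

The values in `samples` should be close to 1/2, 1/11, and 1/101 respectively.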

232
5.11 Integrals: General definition, convergence theorems

Integrals over measurable sets


Let A be a measurable set. Just as in Section 5.4, a function f is called
measurable on A if, for every real c, {x E A :f(x) > c} is measurable. If the
function fA (Section 5.4) is integrable, then f is integrable over A and we set

{ f dV = ffAdV.

If f and g are integrable over A, then f + g is integrable over A. This follows


from Corollary 1, since (f + g)A = fA + gA' Similarly cfis integrable over A.
The basic properties of integrals listed in Theorem 5.4, Section 5.4, remain
valid.
The monotone sequences theorem, Fatou's lemma, and Lebesgue's
dominated convergence theorem remain true for integrals over A. In these
theorems one has simply to write SA in place of J, and to replace the phrases

"measurable" by "measurable on A,"


"integrable" by "integrable over A,"
"for every x" by "for every x E A,"
"almost everywhere" by "almost everywhere in A."
"I fv I s g" by "I f.(x) I s g(x) for every x EA."

In each case let F v = (fJA, F = fA' Then

f Fv dV = {fv dV,

If, for every x E A, 0 s fl(X) S f2(X) S '" and fv(x) tends to f(x) as v ~ 00,
then 0:$ F 1(x) :$ F 2(X):$ .,. and F,,(x) tends to F(x) for every x E P.
Applying the monotone sequence theorem to the sequence F 1- F 2 _ . . . _ we get

J
A
f dV = lim
v-oo
J
A
fv d V.

The proofs of Fatou's lemma and the dominated convergence theorem for
integrals over A are similar.

Corollary 4. Let $A$ be a measurable set of finite measure and $f_1, f_2, \ldots$ measurable on $A$. Assume that:
(a) $\lim_{\nu\to\infty} f_\nu(x) = f(x)$ almost everywhere in $A$.
(b) There is a number $C$ such that $|f_\nu(x)| \le C$ for every $x \in A$ and $\nu = 1, 2, \ldots$.
Then
$$\int_A f\,dV = \lim_{\nu\to\infty} \int_A f_\nu\,dV.$$

PROOF. Let $g(x) = C$ for every $x$. Since $V(A)$ is finite, $g$ is integrable over $A$. The corollary is then a special case of the dominated convergence theorem for integrals over $A$. □

Corollary 5. (a) Let $A_1, A_2, \ldots$ be a nondecreasing sequence of measurable sets. Let $f$ be integrable over $A = A_1 \cup A_2 \cup \cdots$. Then
$$\int_A f\,dV = \lim_{\nu\to\infty} \int_{A_\nu} f\,dV. \tag{5.55}$$
(b) Let $A_1, A_2, \ldots$ be a nonincreasing sequence of measurable sets, and let $A = A_1 \cap A_2 \cap \cdots$. Then (5.55) holds provided $f$ is integrable over $A_1$.

PROOF. To prove (a), let $f_\nu = f_{A_\nu}$, $\nu = 1, 2, \ldots$. Then $\lim_{\nu\to\infty} f_\nu(x) = f(x)$ for every $x \in A$ and $|f_\nu(x)| \le |f(x)|$. The conclusion follows from the dominated convergence theorem, with $g = |f|$.
The proof of (b) is similar. □

Note. If $f \ge 0$ and the sequence $A_1, A_2, \ldots$ is nondecreasing, then we could have appealed instead to the monotone sequences theorem. In that case it is unnecessary to assume that $f$ is integrable over $A$. This observation is useful in proving Corollary 6.

Corollary 6. Let $A$ be measurable, $f$ be continuous on $A$, and $f \ge 0$. Then
$$\int_A f\,dV = \sup\left\{\int_K f\,dV: K \subset A,\ K \text{ is compact}\right\}. \tag{5.56}$$

PROOF. Let $s$ denote the right-hand side of (5.56). Since $\int_K f\,dV \le \int_A f\,dV$ whenever $K \subset A$, we must have $s \le \int_A f\,dV$. There is a sequence of compact sets $K_1 \subset K_2 \subset \cdots$ and a null set $N$ such that $A = N \cup K_1 \cup K_2 \cup \cdots$ (see Problem 5, Section 5.10). Then $\int_{K_\nu} f\,dV \le s$ for each $\nu = 1, 2, \ldots$. Moreover, by (5.55) with $A_\nu = K_\nu$,
$$\int_A f\,dV = \int_{A - N} f\,dV = \lim_{\nu\to\infty} \int_{K_\nu} f\,dV.$$
Thus $\int_A f\,dV \le s$. Hence $\int_A f\,dV = s$. □

The right-hand side of (5.56) was taken in Section 5.6 as the definition. Corollary 6 shows that the definition there agrees with the one in the present section in case $f \ge 0$. Since the procedure for defining the integral when $f$ also has negative values was the same in both sections [see (5.30) and (5.51)], the two definitions agree in general.

Iterated integrals

Formula (5.23) expresses $\int_A f\,dV$ as an iterated integral. In Theorem 5.6 we showed that (5.23) is correct if $A$ is compact and $f$ is continuous on $A$. Actually, (5.23) is correct if $A$ is any measurable set and $f$ is integrable over $A$. This is called Fubini's theorem. Let us show that Fubini's theorem is correct if $A$ is measurable and $f$ is continuous on $A$. For the general case, when $f$ is integrable but not necessarily continuous on $A$, we refer to the work by McShane and Botts [18, p. 143].
Consider first $f$ continuous on $A$ and $f \ge 0$. Let $K_1, K_2, \ldots$ and $N$ be as in the proof of Corollary 6. Let $\tilde{A} = K_1 \cup K_2 \cup \cdots$; then $A = \tilde{A} \cup N$. Let
$$g_\nu(x') = \int_{K_\nu(x')} f(x', x'')\,dV_{n-s}(x''), \qquad \tilde{g}(x') = \int_{\tilde{A}(x')} f(x', x'')\,dV_{n-s}(x''),$$
where the sets $K_\nu(x')$, $\tilde{A}(x')$ are defined by (5.22) with $A$ replaced by $K_\nu$, $\tilde{A}$. By Theorem 5.6,
$$\int_{K_\nu} f(x)\,dV_n(x) = \int_R g_\nu(x')\,dV_s(x').$$
Since $K_1(x'), K_2(x'), \ldots$ is a nondecreasing sequence of sets with union $\tilde{A}(x')$, Corollary 5 implies that $g_\nu(x') \to \tilde{g}(x')$ as $\nu \to \infty$ for each $x'$. Since $f \ge 0$, $g_1 \le g_2 \le \cdots$. The monotone sequences theorem and Corollary 5 then imply
$$\int_{\tilde{A}} f(x)\,dV_n(x) = \int_R \tilde{g}(x')\,dV_s(x'). \tag{*}$$
We leave it to the reader to show that $V_{n-s}[N(x')] = 0$ for almost all $x' \in R$ (Problem 11). Since $A(x') = \tilde{A}(x') \cup N(x')$, we have for almost all $x'$
$$\tilde{g}(x') = \int_{A(x')} f(x', x'')\,dV_{n-s}(x'').$$
Then (*) becomes the desired formula (5.23). The proof shows that if either side of (*) diverges to $+\infty$, then so does the other side, provided $f \ge 0$.
If $f$ is continuous and integrable on $A$, with both positive and negative values, we write $f = f^+ - f^-$ with $f^+$, $f^-$ as above. Since both $f^+$ and $f^-$ satisfy (5.23), $f$ does also.
Note. If $A$ is σ-compact (Problem 5, Section 5.10), then we can take $\tilde{A} = A$ and avoid mentioning the sets $N(x')$ in the proof.

PROBLEMS

1. Let f be integrable, and f(x) = g(x) almost everywhere. Show that 9 is integrable,
and that Sf dV = J9 dV. [Hint: This is known from Section 5.4 if f and 9 are
bounded and have compact supports.]

235
5: Integration

2. Let f(x, y) = (x 2 + y2)- p/2 if 0 < x 2 + l < 1; and f(x, y) = 0 otherwise (p > 0).
Find If and ft! dV2 • What happens as t -+ oo?

3. Let f be measurable and f :2: O. Show that if f f dV = 0, then f(x) = 0 almost


everywhere.

4. Let fv(x) = sin v1tx. Show that:


(a) lim infv.... oo fv(x) = -1 and lim supv .... oo fv(x) = 1 whenever x is irrational. [Hint:
If x is irrational, then every arc of the circle S2 + t 2 = 1 contains (cos V1tX, sin V1tx)
for infinitely many v.]
(b) f& (lim infv.... oo fv)dx = -1, lim infv.... oo g J,. dx = O.

5. Let fv(x) = v if x E (0, V-I) and fv(x) = 0 otherwise, v = 1,2, ... Show that
lim v.... oo fv(x) = 0 for every x, but f fv dx = 1. Why does this not contradict
Lebesgue's dominated convergence theorem?

6. Let fv(x) = v/(x 2 + v2), v = 1,2, ... Show that 0 ~ fv(x) ~ 1, limv.... oo !v(x) = 0
f
for every x, and fv dx = 1t. Why does this not contradict the dominated con-
vergence theorem?

7. Let fl,f2' ... be functions on EI, which satisfy the hypotheses of Lebesgue's
dominated convergence theorem. Let Fv(x) = fo
fv(t)dt, F(x) = fo
f(t)dt. Show that
Fv tends to F uniformly on EI as v -+ 00. [Note: As is customary notation in
elementary calculus, SO = fro.
x] if x :2: 0, and SO = - f~ if x < 0.]

8. (a) Let $f_1, f_2, \dots$ be integrable over a measurable set $A$, and assume that
$\sum_{k=1}^{\infty} \int_A |f_k|\,dV$ is finite. Let $G(x) = \sum_{k=1}^{\infty} |f_k(x)|$. Since the terms of this series
are nonnegative, it either converges or diverges to $+\infty$ for every $x \in A$. Show
that $G$ is integrable over $A$ and hence $G(x)$ is finite for almost every $x \in A$.
[Hint: Apply the monotone sequences theorem to the sequence $G_1, G_2, \dots$,
where $G_\nu = |(f_1)_A| + \cdots + |(f_\nu)_A|$.]
(b) By (a) the series $\sum_{k=1}^{\infty} f_k(x)$ converges absolutely for almost every $x \in A$. Let
$F(x)$ be the sum of the series. Show that $\int_A F\,dV = \sum_{k=1}^{\infty} \int_A f_k\,dV$. [Hint:
Apply the dominated convergence theorem to the sequence $F_1, F_2, \dots$, where
$F_\nu = (f_1)_A + \cdots + (f_\nu)_A$.]

9. Let $f_1, f_2, \dots$ be integrable over $A$, where $A$ has finite measure. Assume that
$|f_k(x)| \le c_k$ for every $x \in A$ and $k = 1, 2, \dots$, and that the series $\sum_{k=1}^{\infty} c_k$ converges.
Show that the series $\sum_{k=1}^{\infty} f_k(x)$ converges absolutely for almost every $x \in A$;
and if $F(x)$ is the sum of the series, then $\int_A F\,dV = \sum_{k=1}^{\infty} \int_A f_k\,dV$. [Hint: Use
Problem 8.]

10. Suppose that $f_\nu \ge 0$ for $\nu = 1, 2, \dots$ and that $\int f_\nu\,dV \to 0$ as $\nu \to \infty$. Show that
$\liminf_{\nu \to \infty} f_\nu(x) = 0$ almost everywhere. [Hint: Problem 3, Fatou's lemma.]

11. Write $x = (x', x'')$ with $x' \in E^s$, $x'' \in E^{n-s}$. Suppose that $V_n(N) = 0$. Let $N(x') =
\{x'' : (x', x'') \in N\}$. Show that $V_{n-s}[N(x')] = 0$ except for $x' \in N'$ where $V_s(N') = 0$.
[Hint: First take $N$ bounded. Let $G_1 \supset G_2 \supset \cdots$ be open sets such that $N \subset G_\nu$
for $\nu = 1, 2, \dots$ and $V_n(G_\nu) \to 0$ as $\nu \to \infty$. Let $g_\nu(x') = V_{n-s}[G_\nu(x')]$. Then $V_n(G_\nu) =
\int g_\nu\,dV_s$. Apply the monotone sequences theorem to $g_1, g_2, \dots$.]


5.12 Differentiation under the integral sign


Let $A$ be a measurable subset of $E^n$, $B$ be an open subset of $E^l$, and $A \times B =
\{(x, t) : x \in A,\ t \in B\}$ be their cartesian product. In this section we are
concerned with the validity of the formula

(5.57)  $\dfrac{\partial}{\partial t^j} \int_A f(x, t)\,dV_n(x) = \int_A \dfrac{\partial f}{\partial t^j}(x, t)\,dV_n(x), \quad j = 1, \dots, l.$

Lemma 1. Let $f$ be continuous on $A \times B$. Assume that there is a function
$g$ integrable over $A$ such that $|f(x, t)| \le g(x)$ for every $x \in A$, $t \in B$. Let

(5.58)  $\phi(t) = \int_A f(x, t)\,dV_n(x), \quad t \in B.$

Then $\phi$ is continuous on $B$.

PROOF. Let $t_0$ be any point of $B$. If $t_1, t_2, \dots$ is any sequence in $B$ tending
to $t_0$, then since $f(x, \cdot)$ is continuous at $t_0$,

$f(x, t_0) = \lim_{m \to \infty} f(x, t_m)$ for every $x \in A$.

Since $|f(x, t_m)| \le g(x)$, by Lebesgue's dominated convergence theorem

$\int_A f(x, t_0)\,dV_n(x) = \lim_{m \to \infty} \int_A f(x, t_m)\,dV_n(x).$

Thus $\phi(t_m) \to \phi(t_0)$ as $m \to \infty$. Since this is true for every sequence in $B$
tending to $t_0$, $\phi$ is continuous at $t_0$. $\square$

Lemma 2. Let $l = 1$. Assume that $f$ and $\partial f/\partial t$ are continuous on $A \times B$ and
satisfy

$|f(x, t)| \le g(x), \quad \left|\dfrac{\partial f}{\partial t}(x, t)\right| \le h(x)$ for every $x \in A$, $t \in B$,

where $g$ and $h$ are integrable over $A$. Then

(5.59)  $\phi'(t) = \int_A \dfrac{\partial f}{\partial t}(x, t)\,dV_n(x), \quad t \in B.$

PROOF. Let $t_0 \in B$, and let $\delta > 0$ be such that $B$ contains the interval
$(t_0 - \delta, t_0 + \delta)$. If $0 < u < \delta$, then $\partial f/\partial t$ is integrable over $A \times [t_0, t_0 + u]$.
This set is measurable, and by the iterated integral theorem,

(*)  $\int_{t_0}^{t_0 + u} dt \int_A \dfrac{\partial f}{\partial t}(x, t)\,dV_n(x) = \int_A \left\{ \int_{t_0}^{t_0 + u} \dfrac{\partial f}{\partial t}(x, t)\,dt \right\} dV_n(x).$

By the fundamental theorem of calculus the inner integral on the right-hand
side is $f(x, t_0 + u) - f(x, t_0)$. If we let $\psi(t)$ denote the right-hand side of
(5.59), then (*) becomes

(**)  $\int_{t_0}^{t_0 + u} \psi(t)\,dt = \phi(t_0 + u) - \phi(t_0).$

Similarly (**) is true when $-\delta < u < 0$. Lemma 1 implies that $\psi$ is
continuous; hence by the fundamental theorem of calculus, $\psi(t_0) = \phi'(t_0)$. $\square$

If, instead of an open set, $B$ is a closed interval $[a, b]$, then the proof of
Lemma 2 shows that (5.59) is still true provided $\phi'(t)$ means the one-sided
derivative at the endpoints.

Theorem 5.9. Let $A$ be measurable and $B$ open. Let $f$ and $\partial f/\partial t^j$, $j = 1, \dots, l$,
be continuous on $A \times B$ and satisfy

$|f(x, t)| \le g(x), \quad \left|\dfrac{\partial f}{\partial t^j}(x, t)\right| \le h_j(x)$ for every $x \in A$, $t \in B$,

where $g$ and $h_1, \dots, h_l$ are integrable over $A$. Then the function $\phi$ in (5.58)
is of class $C^{(1)}$ on $B$ and its partial derivatives are given by (5.57).

PROOF. Applying Lemma 2 with $t^1, \dots, t^{j-1}, t^{j+1}, \dots, t^l$ fixed, we get

$\dfrac{\partial \phi}{\partial t^j}(t) = \int_A \dfrac{\partial f}{\partial t^j}(x, t)\,dV_n(x), \quad j = 1, \dots, l.$

Applying Lemma 1 to the function $\partial f/\partial t^j$, we find that $\partial \phi/\partial t^j$ is continuous
on $B$ for each $j$. Hence $\phi$ is of class $C^{(1)}$. $\square$

Corollary. The conclusion of the theorem holds if $A$ is compact, $B$ is open, and
$f$ together with $\partial f/\partial t^j$, $j = 1, \dots, l$, are continuous on $A \times B$.

PROOF. Let $U$ be any neighborhood whose closure is contained in $B$. Since
$A \times \operatorname{cl} U$ is compact, $|f(x, t)|$ and

$\left|\dfrac{\partial f}{\partial t^j}(x, t)\right|, \quad j = 1, \dots, l,$

are bounded on $A \times U$ by some number $C$. Let $g = h_j = C$. By the theorem
with $B$ replaced by $U$, $\phi$ is of class $C^{(1)}$ on $U$ and (5.57) holds there. Since this
is true for every such $U$, $\phi$ is of class $C^{(1)}$ on $B$ and (5.57) holds for every
$t \in B$. $\square$

EXAMPLE 1. Let $\phi(t) = \int_1^\infty x^{-1} \exp(-xt)\,dx$, $t > 0$. Find $\phi'(t)$. Using (5.59),

$\phi'(t) = -\int_1^\infty \exp(-xt)\,dx = -t^{-1} \exp(-t).$

If $B$ is an interval $(a, \infty)$, $a > 0$, the hypotheses of Lemma 2 are satisfied with
$g(x) = h(x) = \exp(-ax)$. The formula for $\phi'(t)$ is correct for all $t$ in any such
interval, and hence for every $t > 0$.
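
The formula found in Example 1 can be checked numerically. The following is only a sketch under stated assumptions: the improper integral is truncated at $x = 60$ (for $t \ge 1$ the neglected tail is smaller than $\exp(-60)$), the integral is approximated by a composite Simpson rule, and $\phi'(t)$ is approximated by a central difference quotient. The helper names `simpson`, `phi`, `phi_prime_numeric`, and `phi_prime_formula` are introduced here for illustration and do not appear in the text.

```python
import math

def simpson(func, a, b, n=20000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def phi(t, upper=60.0):
    # phi(t) = integral over [1, oo) of exp(-x t)/x dx, truncated at x = upper
    # (an assumption made for the numerical check only).
    return simpson(lambda x: math.exp(-x * t) / x, 1.0, upper)

def phi_prime_numeric(t, step=1e-4):
    # Central difference quotient approximating phi'(t).
    return (phi(t + step) - phi(t - step)) / (2 * step)

def phi_prime_formula(t):
    # Value given by (5.59) in Example 1: -exp(-t)/t.
    return -math.exp(-t) / t

for t in (1.0, 2.0, 3.0):
    print(t, phi_prime_numeric(t), phi_prime_formula(t))
```

The two printed columns should agree to several decimal places, which is consistent with differentiating under the integral sign in this example.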


PROBLEMS

1. Find $\phi'(t)$ if $\phi(t)$ is:


(a) g log(x2 + t 2 )dx, t -=I- O. (b) So X-I exp(xt)sin x dx.
2. Let $\phi(t) = \int_0^1 \log(2 - x^2 t^2)\,dx$. Show that $\phi'(0) = 0$ and that $\phi$ is concave on the
interval $(-1, 1)$.
3. For $x \in E^1$ let $\phi(x) = \int_0^\infty \exp(-t^2) \cos xt\,dt$. Show that $\phi'(x) = -\tfrac{1}{2} x \phi(x)$ and find
$\phi(x)$.
4. (Leibniz's rule). Show that

$\dfrac{d}{dt} \left[ \int_{a(t)}^{b(t)} f(x, t)\,dx \right] = f[b(t), t]\,b'(t) - f[a(t), t]\,a'(t) + \int_{a(t)}^{b(t)} \dfrac{\partial f}{\partial t}(x, t)\,dx,$

provided $f$ and $\partial f/\partial t$ are continuous on $[a_0, b_0] \times B$ where $B$ is open, that
$a_0 \le a(t) \le b_0$ and $a_0 \le b(t) \le b_0$ for every $t \in B$, and that the functions $a$, $b$ are
of class $C^{(1)}$. [Hint: Let $G(x, t) = \int_{a_0}^{x} f(s, t)\,ds$, so that $\partial G/\partial x = f$. Calculate the
derivative of $G[b(t), t] - G[a(t), t]$.]
5. Let c/J(t) = Slog t X-I exp(x 2t 2)dx, t > 1. Find c/J'(t).
6. Let $\phi(x, t) = \int_{x - ct}^{x + ct} f(s)\,ds$, where $c > 0$ and $f$ is of class $C^{(1)}$ on $E^1$. Show that
$\partial^2 \phi/\partial t^2 = c^2\,(\partial^2 \phi/\partial x^2)$.
7. Let $\phi(x, y) = \int_0^y dt \int_0^x f(s, t)\,ds$, $x > 0$, $y > 0$. Show that if $f$ is continuous on
$\{(x, y) : x \ge 0,\ y \ge 0\}$, then $\partial^2 \phi/\partial x\,\partial y = f$.
8. Let

$\phi(x) = \int_0^\infty \exp\left(-t^2 - \dfrac{x^2}{t^2}\right) dt$, for $x \in E^1$.

(a) Show that $\phi(x) = \tfrac{1}{2}\sqrt{\pi}\,\exp(-2|x|)$ for all $x$. [Hint: For $x > 0$, show that
$\phi'(x) = -2\phi(x)$ using the substitution $s = x/t$.]
(b) Note that application of (5.59) gives a false result at $x = 0$. Why is this not
surprising?
9. Let $\phi(t) = \int_0^\infty f(x, t)\,dx$, for $t > 0$, where $f(x, t) = \exp(-xt)\,x^{-1} \sin x$. Show that:
(a) $\phi'(t) = -1/(1 + t^2)$.
(b) $\phi(t) \to 0$ as $t \to +\infty$. [Hint: Apply the dominated convergence theorem with
$f_\nu(x) = f(x, t_\nu)$ where $1 \le t_1 \le t_2 \le \cdots$ and $t_\nu \to +\infty$ as $\nu \to \infty$.]
(c) $\lim_{r \to \infty} \int_0^r \dfrac{\sin x}{x}\,dx = \dfrac{\pi}{2}$.
(d) $\int_0^\infty \dfrac{|\sin x|}{x}\,dx = +\infty$.
[Hints: From (a) and (b), $\phi(t) = \pi/2 - \tan^{-1} t$. Integrate by parts to show that

$\left| \int_r^\infty f(x, t)\,dx \right| \le \dfrac{c}{r}$ for suitable $c$.

By the dominated convergence theorem,

$\lim_{t \to 0^+} \int_0^r f(x, t)\,dx = \int_0^r \dfrac{\sin x}{x}\,dx.$]


5.13 $L^p$-spaces
The concept of normed vector space is defined in Section 2.9. The spaces
$\mathcal{C}(S)$ of bounded, continuous functions in Section 2.10 furnish one interesting
class of infinite dimensional normed vector spaces. We now consider another
class, the spaces $L^p(A)$, which are defined as follows.

Definition. Let $A$ be a measurable set of positive measure, and let $p$ be a
number such that $p \ge 1$. The collection of all measurable real valued
functions $f$ with domain $A$ such that $|f|^p$ is integrable over $A$ is denoted
by $L^p(A)$. For instance, $L^1(A)$ is just the collection of all real valued
functions integrable over $A$.

We regard two functions $f$ and $g$ as defining the same element of $L^p(A)$
if $f(x) = g(x)$ for almost all $x \in A$.
Since $p \ge 1$, by Problem 3, Section 2.11,

$|f(x) + g(x)|^p \le 2^{p-1} \left( |f(x)|^p + |g(x)|^p \right).$

Hence if $|f|^p$ and $|g|^p$ are integrable over $A$, then $|f + g|^p$ is also integrable
over $A$. This shows that the sum of two elements in $L^p(A)$ is also in $L^p(A)$.
Clearly, $cf \in L^p(A)$ if $f \in L^p(A)$ and $c$ is a real number. Thus $L^p(A)$ is a vector
space over the real number field. The p-norm of a function $f \in L^p(A)$ is the
number

(5.60)  $\|f\|_p = \left( \int_A |f|^p\,dV \right)^{1/p}.$

The p-norm has the following three properties:

(a) $\|f\|_p = 0$ if and only if $f(x) = 0$ for almost all $x \in A$.
(b) $\|cf\|_p = |c|\,\|f\|_p$ for every real $c$.
(c) $\|f + g\|_p \le \|f\|_p + \|g\|_p$.

The proofs of (a) and (b) are left to the reader (Problem 4). Property (c)
is called Minkowski's inequality; its proof is deferred to later in this section. A
finite dimensional version of Minkowski's inequality is given in Problem 4,
Section 2.11. Properties (a), (b), and (c) state that $\|\ \|_p$ is a norm on the space
$L^p(A)$. (Recall that any function $f$ such that $f(x) = 0$ for almost all $x \in A$ is
considered as equivalent to the zero function.) In any normed vector space
$\mathcal{V}$, the norm defines a distance (Section 2.9). The distance between two
functions $f, g \in L^p(A)$, taken in this sense, is $\|f - g\|_p$. It is called the distance
in mean of order $p$. The concepts of convergent and Cauchy sequences are
defined in any metric space (Section 2.9). In $L^p(A)$ these concepts become
the following. A sequence $f_1, f_2, \dots$ converges to $f$ in mean of order $p$ if
$\|f_\nu - f\|_p \to 0$ as $\nu \to \infty$. If for every $\varepsilon > 0$ there exists $N$ such that
$\|f_\nu - f_\mu\|_p < \varepsilon$ for every $\mu, \nu \ge N$, then the sequence $f_1, f_2, \dots$ is Cauchy in
mean of order $p$.


Theorem 5.10. Every Cauchy sequence $f_1, f_2, \dots$ in $L^p(A)$ converges in mean
of order $p$ to a limit $f \in L^p(A)$.

This theorem is one of the remarkable features of the Lebesgue theory.
It states that $L^p(A)$ is a complete normed vector space (i.e., a Banach space).
PROOF. Let $f_1, f_2, \dots$ be a Cauchy sequence in $L^p(A)$. There is an increasing
sequence of positive integers $N_1, N_2, \dots$ such that $\|f_\nu - f_\mu\|_p < 2^{-k - k/p}$ for
every $\nu, \mu \ge N_k$. Let $g_k = f_{N_k}$, and let

(*)  $F(x) = \sum_{k=1}^{\infty} 2^{kp}\,|g_k(x) - g_{k+1}(x)|^p.$

Since its terms are nonnegative, this series either converges or diverges
to $+\infty$ for every $x \in A$. But

$\sum_{k=1}^{\infty} \int_A 2^{kp}\,|g_k - g_{k+1}|^p\,dV = \sum_{k=1}^{\infty} 2^{kp} \left( \|g_k - g_{k+1}\|_p \right)^p < \sum_{k=1}^{\infty} 2^{-k}.$

Since the series on the right converges, so does the one on the left. Therefore,
$F$ is integrable over $A$, and $B = \{x \in A : F(x) = +\infty\}$ is a null set [Problem
8(a), Section 5.11]. Now each term of a nonnegative series is no more than the
sum of the series. Applying this observation to (*), we find that for $x \in A - B$

$|g_k(x) - g_{k+1}(x)| \le 2^{-k}\,[F(x)]^{1/p}.$

Therefore, for $s = 1, 2, \dots$ and $x \in A - B$,

$|g_k(x) - g_{k+s}(x)| \le \sum_{r=0}^{s-1} |g_{k+r}(x) - g_{k+r+1}(x)|$

(**)  $\le [F(x)]^{1/p} \sum_{r=0}^{s-1} 2^{-(k+r)} < 2^{1-k}\,[F(x)]^{1/p}.$

Since the right-hand side approaches 0 as $k \to \infty$, the sequence of real
numbers $g_1(x), g_2(x), \dots$ is Cauchy. Let

$f(x) = \lim_{k \to \infty} g_k(x)$ if $x \in A - B$,

and let $f(x) = 0$ if $x \in B$. Letting $s \to \infty$ in (**),

$|g_k(x) - f(x)| \le 2^{1-k}\,[F(x)]^{1/p}$

for $x \in A - B$. Since $B$ is a null set,

$\int_A |g_k - f|^p\,dV \le 2^{(1-k)p} \int_A F\,dV.$

In particular $g_k - f \in L^p(A)$, and hence $f \in L^p(A)$. Since $F$ is integrable
over $A$, the right-hand side tends to 0 as $k \to \infty$. Therefore

$\lim_{k \to \infty} \|g_k - f\|_p = 0.$

If $\nu \ge N_k$, then $\|g_k - f_\nu\|_p < 2^{-k - k/p}$ (recall the definition of $g_k$). Using
Minkowski's inequality,

$\|f_\nu - f\|_p \le \|f_\nu - g_k\|_p + \|g_k - f\|_p.$

Since the right-hand side tends to 0 as $k \to \infty$, this shows that

$\lim_{\nu \to \infty} \|f_\nu - f\|_p = 0.$ $\square$

If $p > 1$, then the number $p'$ such that

$\dfrac{1}{p} + \dfrac{1}{p'} = 1$

is called conjugate to $p$.

Theorem 5.11. If $f \in L^p(A)$ and $g \in L^{p'}(A)$, then $fg$ is integrable over $A$ and

(5.61)  $\|fg\|_1 \le \|f\|_p\,\|g\|_{p'}$  (Hölder's inequality).


PROOF. Let $\phi(t) = t^{1/p}$ for $t \ge 0$. Since $0 < 1/p < 1$, $\phi''(s) < 0$ for all $s > 0$
(this implies that $\phi$ is concave). Hence $\phi(t) \le \phi(1) + \phi'(1)(t - 1)$, or

$t^{1/p} \le 1 + \dfrac{1}{p}(t - 1) = \dfrac{t}{p} + \dfrac{1}{p'}.$

Setting $t = u^p v^{-p'}$, where $u \ge 0$, $v > 0$, and multiplying by $v^{p'}$, we find since
$1 - p' = -p'/p$ that

(5.62)  $uv \le \dfrac{u^p}{p} + \dfrac{v^{p'}}{p'}.$

Obviously, this inequality also holds when $v = 0$.
If $\|f\|_p = 0$, then $f(x) = 0$ almost everywhere in $A$ and both sides of
Hölder's inequality are 0. Similarly both sides are 0 if $\|g\|_{p'} = 0$. Suppose
that $\|f\|_p > 0$, $\|g\|_{p'} > 0$, and let

$\tilde{f} = (1/\|f\|_p)\,f, \quad \tilde{g} = (1/\|g\|_{p'})\,g.$

Then

$\int_A |\tilde{f}|^p\,dV = \int_A |\tilde{g}|^{p'}\,dV = 1,$

and setting $u = |\tilde{f}(x)|$, $v = |\tilde{g}(x)|$ in (5.62) and integrating over $A$,

$\int_A |\tilde{f}(x)|\,|\tilde{g}(x)|\,dV_n(x) \le \dfrac{1}{p} + \dfrac{1}{p'} = 1.$

But the left-hand side is $\|fg\|_1 / (\|f\|_p\,\|g\|_{p'})$. This proves Hölder's inequality.
$\square$
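
Inequality (5.62) and Hölder's inequality can be tried out on finite sums, which correspond to the finite dimensional version mentioned in the Note that follows. The sketch below is an illustration only: it replaces the integral over $A$ by a sum over 20 randomly chosen values, and the helper names `conjugate`, `p_norm`, `young_gap`, and `holder_gap` are introduced here and are not part of the text. Each "gap" is the right-hand side minus the left-hand side, so it should never be negative beyond rounding error.

```python
import random

def conjugate(p):
    """Return p' with 1/p + 1/p' = 1 (requires p > 1)."""
    return p / (p - 1.0)

def p_norm(values, p):
    """(sum |v|^p)^(1/p): the p-norm with the integral replaced by a finite sum."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)

def young_gap(u, v, p):
    """Right side minus left side of (5.62), for u, v >= 0."""
    q = conjugate(p)
    return u ** p / p + v ** q / q - u * v

def holder_gap(f, g, p):
    """||f||_p ||g||_p' - ||fg||_1 for finite sequences f and g."""
    q = conjugate(p)
    return p_norm(f, p) * p_norm(g, q) - sum(abs(a * b) for a, b in zip(f, g))

random.seed(0)  # fixed seed so that the run is reproducible
worst_young = min(young_gap(random.uniform(0, 5), random.uniform(0, 5), p)
                  for p in (1.5, 2.0, 3.0, 7.0) for _ in range(1000))
worst_holder = min(holder_gap([random.uniform(-1, 1) for _ in range(20)],
                              [random.uniform(-1, 1) for _ in range(20)], p)
                   for p in (1.5, 2.0, 3.0, 7.0) for _ in range(200))
print(worst_young, worst_holder)
```

Equality in (5.62) occurs when $u^p = v^{p'}$; for instance `young_gap(3.0, 3.0, 2.0)` is zero.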


Note. The finite dimensional version of Hölder's inequality (Problem 8,
Section 4.8) expresses the fact that the p-norm on $E^n$ and the $p'$-norm on the
dual space $(E^n)^*$ are dually related. There is an infinite dimensional analog
which we state without proof. Let $[L^p(A)]^*$ denote the set of all real valued
linear functions on $L^p(A)$ that are continuous. On $[L^p(A)]^*$ there is defined a
dual norm, just as in the finite dimensional case. Then $[L^p(A)]^*$ is isomorphic
with $L^{p'}(A)$, and the dual norm is just the $p'$-norm [18, p. 211] for $p > 1$.

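PROOF OF MINKOWSKI'S INEQUALITY. For $p = 1$ the inequality follows at once from
$|f + g| \le |f| + |g|$. Let $p > 1$. Since $|f + g|^p \le (|f| + |g|)\,|f + g|^{p-1}$,

(*)  $\int_A |f + g|^p\,dV \le \int_A |f|\,|f + g|^{p-1}\,dV + \int_A |g|\,|f + g|^{p-1}\,dV.$

By Hölder's inequality,

$\int_A |f|\,|f + g|^{p-1}\,dV \le \left( \int_A |f|^p\,dV \right)^{1/p} \left( \int_A |f + g|^{(p-1)p'}\,dV \right)^{1/p'}.$

But $(p - 1)p' = p$. Estimating similarly the last term in (*), we get

$\int_A |f + g|^p\,dV \le \left( \|f\|_p + \|g\|_p \right) \left( \int_A |f + g|^p\,dV \right)^{1/p'}.$

If $\|f + g\|_p = 0$, both sides are 0. Otherwise we divide both sides by
$\left( \int_A |f + g|^p\,dV \right)^{1/p'}$. Since $1 - 1/p' = 1/p$, we get Minkowski's inequality. $\square$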

If $p = 1$, then (formally) $p' = \infty$. Let us call $f$ essentially bounded if $f$ is
equivalent to a bounded function $g$ [equivalent means that $f(x) = g(x)$
almost everywhere in $A$]. Let $L^\infty(A)$ be the collection of all essentially
bounded measurable functions. For $f \in L^\infty(A)$ let

$\|f\|_\infty = \operatorname{ess\,sup}\{|f(x)| : x \in A\},$

where the right-hand side means $\inf\{\sup\{|g(x)| : x \in A\} : g \text{ is equivalent to } f\}$.
Then Hölder's inequality is still true when $p = 1$.
If $p = 2$, then $p' = 2$. In $L^2(A)$ let us introduce an inner product $\cdot$ as
follows:

$f \cdot g = \int_A fg\,dV_n.$

Then $f \cdot f = (\|f\|_2)^2$. Moreover, $|f \cdot g| \le \|fg\|_1$ and hence

(5.63)  $|f \cdot g| \le \|f\|_2\,\|g\|_2.$

This formula corresponds to Cauchy's inequality in $E^n$. In fact, the space
$L^2(A)$ is an infinite dimensional analog of euclidean $E^n$.
An inner product space $H$ is a vector space with an inner product $\cdot$ satisfying
a list of axioms corresponding to those in Section 1.2, Problem 2. If $H$ is
infinite dimensional and complete, then $H$ is a Hilbert space. By Theorem 5.10,
$L^2(A)$ is complete; hence $L^2(A)$ is a Hilbert space.


PROBLEMS

1. Let $f(x) = |x|^{-\alpha}$. Show that if $A$ is the unit n-ball, then $f \in L^p(A)$ for $p < n/\alpha$, but not
for $p \ge n/\alpha$. [Hint: See Section 5.6, Example 4.]
2. Let $f(x) = x^{-1}(\log x)^{-2}$, $A = (0, \tfrac{1}{2})$. Show that $f \in L^p(A)$ for $p = 1$, but not for $p > 1$.
3. Let $1 \le q < p$. Using Hölder's inequality show that if $A$ has finite measure, then
$f \in L^p(A)$ implies that $f \in L^q(A)$. Give an example to show that it is necessary to
assume $A$ has finite measure. [Hint: Apply Hölder's inequality to the functions $|f|^q$
and 1 with $p$ replaced by $p/q$.]
4. Prove properties (a) and (b) for the p-norm.
5. Let $A = [0, 1]$.
(a) Describe a sequence $I_1, I_2, \dots$ of closed intervals such that $V_1(I_\nu) \to 0$ as $\nu \to \infty$,
but every $x \in [0, 1]$ belongs to infinitely many $I_\nu$.
(b) Let $f_\nu(x) = 1$ for $x \in I_\nu$, $f_\nu(x) = 0$ for $x \notin I_\nu$. Show that $\|f_\nu\|_p \to 0$ as $\nu \to \infty$, for
$1 \le p < \infty$, but $f_\nu(x)$ tends to a limit as $\nu \to \infty$ for no $x \in [0, 1]$.

6 Curves and line integrals

Consider a vector-valued function $g$, from a 1-dimensional interval $J$ into
$E^n$ with derivative $g'(t) \ne 0$. One can think of the point $x = g(t)$ as traversing
a curve while $t$ traverses $J$ from left to right. We do not call $g$ itself a curve.
Instead, any vector-valued function $f$ obtained from $g$ by a suitable parameter
change is regarded as representing the same curve $\gamma$ as $g$. In Section 6.3 we
define the concept of differential form $\omega$ of degree 1. Then the line integral of a
differential form $\omega$ along a curve $\gamma$ is defined in Section 6.4. The differential
$df$ of a function $f$ is called an exact differential form of degree 1. It is shown that
the line integral of $\omega$ depends just on the endpoints of $\gamma$ if and only if $\omega$ is
exact (Theorem 6.1).
This chapter may be read independently of Chapters 4 and 5, except for
the chain rule (Section 4.4). Section 6.1 repeats some material included in
Chapter 4.

6.1 Derivatives
Let $g$ be a function from a set $J \subset E^1$ into $E^n$. Let $t$ be an interior point of $J$.
Then the derivative of $g$ at $t$ is the vector

(6.1)  $g'(t) = \lim_{u \to 0} \dfrac{1}{u}\,[g(t + u) - g(t)],$

provided the limit exists.
The derivative of a vector-valued function has many of the same properties
as in the case of real-valued functions. If $f$ and $g$ both have a derivative at $t$,
then

(6.2)  $(f + g)'(t) = f'(t) + g'(t),$
       $(f \cdot g)'(t) = f'(t) \cdot g(t) + f(t) \cdot g'(t).$


Here $f \cdot g$ is the real-valued function whose value at each $t$ is the inner product
$f(t) \cdot g(t)$.
The derivative has a geometric interpretation as a tangent vector. Let
us suppose that $J$ is an interval. As $t$ traverses $J$ from left to right, the point
$g(t)$ traverses some curve in $E^n$. A precise definition of the term "curve" is
given in Section 6.2.
Let us assume that $t_0$ is a point of $J$ at which $g'(t_0) \ne 0$. Let

$x_0 = g(t_0), \quad v_0 = g'(t_0), \quad x = g(t), \quad y = x_0 + u v_0,$

where $t = t_0 + u$ and $|u|$ is small enough that $t \in J$. The ratio of the distances
$|x - y|$ and $|x - x_0|$ may be written, upon multiplying numerator and
denominator by $1/|u|$, in the form

$\dfrac{|x - y|}{|x - x_0|} = \dfrac{\left| \frac{1}{u}[g(t) - g(t_0)] - g'(t_0) \right|}{\left| \frac{1}{u}[g(t) - g(t_0)] \right|}.$

Hence

$\lim_{u \to 0} \dfrac{|x - y|}{|x - x_0|} = 0 \cdot \dfrac{1}{|g'(t_0)|} = 0.$

This justifies calling $v_0$ a tangent vector at $x_0$, and the line through $x_0$ and
$x_0 + v_0$ a tangent line at $x_0$ (Figure 6.1). Note that we have used the
assumption that $g'(t_0) \ne 0$.

Figure 6.1

The number $t$ is often called a parameter. It need not have any geometric
or physical significance. However, if $n = 3$ and $t$ happens to denote time in
a physical problem, then $g'(t)$ is the velocity vector.
A vector-valued function $g$ has components $g^1, \dots, g^n$, which are the
real valued functions such that

$g(t) = \sum_{i=1}^{n} g^i(t)\,e_i$

for every $t \in J$. If $g'(t)$ exists, then the ith component $u^{-1}[g^i(t + u) - g^i(t)]$
of the expression on the right side of (6.1) tends to $g^{i\prime}(t)$ as $u \to 0$, by
Proposition 2.2, and

(6.3)  $g'(t) = \sum_{i=1}^{n} g^{i\prime}(t)\,e_i.$

Conversely, if $g^{i\prime}(t)$ exists for each $i = 1, \dots, n$, then $g'(t)$ exists and is given
by (6.3).

EXAMPLE 1. Let $n = 2$, $g(t) = t^2 e_1 + (\log t)\,e_2$. Find the tangent line at $e_1$.
In this example $g^1(t) = t^2$, $g^2(t) = \log t$, and $t_0 = 1$, $x_0 = g(1) = e_1$. Then
$g'(t) = 2t\,e_1 + t^{-1} e_2$, and $v_0 = g'(1) = 2e_1 + e_2$. The tangent line goes
through $e_1$ and $3e_1 + e_2$. Its equation is $2y = x - 1$.
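
The limit that justifies the term "tangent vector" can be observed numerically for the curve of Example 1. The sketch below is an illustration under the assumption that floating point arithmetic is accurate enough for the chosen values of $u$; the function names `g` and `tangent_ratio` are introduced here only for this purpose. It evaluates the ratio $|x - y| / |x - x_0|$ with $x = g(t_0 + u)$ and $y = x_0 + u v_0$ for decreasing $u$.

```python
import math

def g(t):
    # g(t) = t^2 e_1 + (log t) e_2, stored as the pair of components (g^1, g^2).
    return (t * t, math.log(t))

def tangent_ratio(u, t0=1.0):
    """|x - y| / |x - x0| where x = g(t0 + u) and y = x0 + u * g'(t0)."""
    x0 = g(t0)
    v0 = (2.0 * t0, 1.0 / t0)  # g'(t0), computed from g'(t) = 2t e_1 + t^(-1) e_2
    x = g(t0 + u)
    y = (x0[0] + u * v0[0], x0[1] + u * v0[1])
    return math.hypot(x[0] - y[0], x[1] - y[1]) / math.hypot(x[0] - x0[0], x[1] - x0[1])

ratios = [tangent_ratio(u) for u in (1e-1, 1e-2, 1e-3)]
print(ratios)
```

The printed ratios decrease toward 0 roughly in proportion to $u$, in agreement with the limit computed above.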

PROBLEMS
1. Find the tangent line at $2^{-1/2} e_1 - 2^{1/2} e_2$ to the ellipse represented by $g(t) =
(\cos t)\,e_1 + (2 \sin t)\,e_2$, $J = [0, 2\pi]$. Illustrate with a sketch.

2. Find the tangent line at $e_1 + e_2 + e_3$ to the curve represented by

$g(t) = t\,e_1 + t^{1/2} e_2 + t^{1/3} e_3, \quad \tfrac{1}{2} \le t \le 2.$

3. A particle moves along the parabola $y^2 = 4x$ with constant speed 2 and so that
$dy/dt = g^{2\prime}(t) > 0$. Find the velocity vector $g'(t)$ at $e_1 - 2e_2$. [Note: The speed is
$|g'(t)|$.]
4. Give a proof of Formulas (6.2):
(a) Using the corresponding formulas for derivatives of real valued functions and
(6.3).
(b) Directly from the definition (6.1).

5. Let $g(t) = [3t/(1 + t^3)]\,e_1 + [3t^2/(1 + t^3)]\,e_2$, $t \ne -1$.
(a) Sketch the curve traversed by $g(t)$ on the interval $(-\infty, -1)$. On the interval
$(-1, \infty)$.
(b) Show that $\{g(t) : t \ne -1\} = \{(x, y) : x^3 + y^3 = 3xy\}$. This set is called the
folium of Descartes.

6.2 Curves in $E^n$
Let $g$ be a function from an interval $J \subset E^1$ into $E^n$. Then $g(t)$ traverses a
curve in $E^n$ as the "parameter" $t$ traverses $J$. It is better not to call $g$ itself
a curve. Instead one should regard any vector-valued function $f$ obtained from
$g$ by a suitable change of parameter as representing the same curve as $g$. We
define a curve as an equivalence class of equivalent parametric
representations. To simplify matters we first consider only curves with continuously
changing tangents.
Let us now be more precise. Let us for simplicity assume that $J = [a, b]$,
a closed bounded interval, and that the components $g^1, \dots, g^n$ are of class
$C^{(1)}$ on $[a, b]$. By $g^{i\prime}(a)$ and $g^{i\prime}(b)$ we mean, respectively, right-hand and
left-hand derivatives. They are equal to the derivatives at $a$ and $b$ of any class $C^{(1)}$
extension of $g^i$ to an open set containing $[a, b]$.

Definition. If $g'(t) \ne 0$ for every $t \in [a, b]$, then $g$ is a parametric representation
of class $C^{(1)}$ on $[a, b]$.


To motivate the definition of equivalence that we are going to make, let
us first consider an example.

EXAMPLE 1. Let $g(t) = t\,e_1 + t^2 e_2$, $1 \le t \le 2$. Then $g^1(t) = t$, $g^2(t) = t^2$,
$g'(t) = e_1 + 2t\,e_2 \ne 0$. Hence $g$ is a parametric representation of class $C^{(1)}$ on
the interval $[1, 2]$. In fact, it represents the arc of the parabola $y = x^2$ between
$(1, 1)$ and $(2, 4)$, traversed from left to right (Figure 6.2). If we let $f(\tau) = (\exp \tau)\,e_1
+ (\exp 2\tau)\,e_2$, $0 \le \tau \le \log 2$, then $f$ also represents this same parabolic arc.
In effect, $f$ is obtained from $g$ by the parameter change $t = \exp \tau$. It is
reasonable to regard $f$ and $g$ as equivalent, and we do so.

Figure 6.2 (the arc $x = t$, $y = t^2$ ending at $(2, 4)$)

Now let $g$ be any parametric representation of class $C^{(1)}$ on $[a, b]$. Let $\phi$
be any real-valued function of class $C^{(1)}$ on some closed interval $[\alpha, \beta]$ such
that

(6.4)  $\phi'(\tau) > 0$ for every $\tau \in [\alpha, \beta]$, $\quad \phi(\alpha) = a$, $\quad \phi(\beta) = b$.

Let $f$ be the composite of $g$ and $\phi$, denoted by $f = g \circ \phi$. Then

$f(\tau) = g[\phi(\tau)]$ for every $\tau \in [\alpha, \beta]$.

From the composite function theorem

$f^{i\prime}(\tau) = g^{i\prime}[\phi(\tau)]\,\phi'(\tau)$, for $i = 1, \dots, n$,

which is the same as

(6.5)  $f'(\tau) = g'[\phi(\tau)]\,\phi'(\tau).$

In particular, $f'(\tau) \ne 0$ and $f$ is also a parametric representation of class $C^{(1)}$.
The tangent vector $f'(\tau)$ differs from the tangent vector $g'[\phi(\tau)]$ by the positive
scalar multiple $\phi'(\tau)$. (Scalar multiplication on the right means the same thing
as on the left, $vc = cv$.)


Definition. We say that $f$ is equivalent to $g$ if there exists $\phi$ satisfying the above
conditions such that $f = g \circ \phi$.

The properties of reflexivity, symmetry, and transitivity required of an
equivalence relation hold (Problem 6). By an equivalence class is meant the
collection of all parametric representations of class $C^{(1)}$ equivalent to a given
one. The reader may have encountered the notion of equivalence class
elsewhere in mathematics. An example is the definition of the rational numbers
starting from the integers.

Definition. A curve $\gamma$ of class $C^{(1)}$ is an equivalence class of parametric
representations of class $C^{(1)}$.

By requiring that the components $g^1, \dots, g^n$ be of class $C^{(q)}$, $q \ge 2$, and
allowing only parameter changes $\phi$ of class $C^{(q)}$, the notion of curve of class
$C^{(q)}$ can be defined in the same way. To study curvature of curves one needs
to assume class $C^{(2)}$ at least [25]. However, for present purposes we need only
class $C^{(1)}$. From now on we say "curve" instead of "curve of class $C^{(1)}$," and
"representation" instead of "parametric representation of class $C^{(1)}$."
Each curve has an infinite number of representations. If $g$ is one such
representation, then each parameter change $\phi$ leads to another. It is often
highly advantageous to make a judicious choice of parameter. In a physical
problem, time (measured according to some preassigned scale) may be the
preferred parameter. For certain curves one of the components $x^1, \dots, x^n$
can be taken as the parameter. For example, if the first component $g^{1\prime}(t)$ of
the tangent vector $g'(t)$ is everywhere positive, then $g^1$ has an inverse (Section
A.4). Let us take for $\phi$ the inverse of $g^1$. Formally this amounts simply to
solving the equation $x^1 = g^1(t)$ for $t$, obtaining $t = \phi(x^1)$. Set $\tau = x^1$. Then
$x^1$ is the new parameter and $f^1(x^1) = x^1$. Figure 6.3 illustrates this situation
for $n = 2$.
A curve $\gamma$ is to be regarded as the path traversed by a moving point, and
we have not excluded the possibility that $\gamma$ passes through the same point
several times. The multiplicity of a point $x$ is the number of points $t \in [a, b]$
such that $g(t) = x$. The multiplicity does not depend on the particular
representation $g$ chosen for $\gamma$, since any $\phi$ satisfying (6.4) is a univalent
function, namely, $\phi(\tau_1) \ne \phi(\tau_2)$ if $\tau_1 \ne \tau_2$. The trace of $\gamma$ is the set of points
of positive multiplicity, that is, the set of points through which $\gamma$ passes at
least once. If $x$ has multiplicity 1, then $x$ is called a simple point. If every
point of the trace is simple, then $\gamma$ is called a simple arc.
The point $g(a)$ is called the initial endpoint of $\gamma$ and $g(b)$ the final endpoint.
If $g(a) = g(b)$, then $\gamma$ is called a closed curve. A closed curve is called simple
if every point of the trace is simple except $g(a)$, which has multiplicity 2
(Figure 6.4).

Figure 6.3 ($f^1(x) = x$, $f^2(x) = f(x)$)

Figure 6.4 (Simple arc; simple closed curve; not simple)

EXAMPLE 2. Let $g(t) = x_0 + t(x_1 - x_0)$, $0 \le t \le 1$. Then $\gamma$ is the line segment
joining $x_1$ and $x_0$, traversed from $x_0$ to $x_1$. It is a simple arc.

EXAMPLE 3. Let $g(t) = (\cos mt)\,e_1 + (\sin mt)\,e_2$, $0 \le t \le 2\pi$, where $m$ is an
integer not 0. The trace is the unit circle $x^2 + y^2 = 1$. The closed curve $\gamma$ that
$g$ represents goes around the circle $|m|$ times, counterclockwise if $m > 0$ and
clockwise if $m < 0$. If $m = \pm 1$, then $\gamma$ is a simple closed curve.

At this point we need some properties of integrals, which are reviewed in
Section A.3. In the present chapter we employ the Riemann definition of
integral, as is customary in calculus. The more sophisticated Lebesgue theory
of integrals was developed in Chapter 5.

Definition. The length $l$ of a curve $\gamma$ is

(6.6)  $l = \int_a^b |g'(t)|\,dt.$

If $f$ is equivalent to $g$, then

$\int_\alpha^\beta |f'(\tau)|\,d\tau = \int_\alpha^\beta |g'[\phi(\tau)]|\,\phi'(\tau)\,d\tau = \int_a^b |g'(t)|\,dt,$

by (6.5) and the theorem about change of variables in integrals (Section A.3).
Thus $l$ does not depend on the particular representation chosen for $\gamma$.
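
The independence of the length from the representation can be illustrated with the parabolic arc of Example 1, which is represented both by $g(t) = t\,e_1 + t^2 e_2$ on $[1, 2]$ and by $f(\tau) = (\exp \tau)\,e_1 + (\exp 2\tau)\,e_2$ on $[0, \log 2]$. The sketch below approximates (6.6) for each representation by a composite Simpson rule; the rule and the names `simpson`, `speed_g`, and `speed_f` are choices made here for illustration, not part of the text.

```python
import math

def simpson(func, a, b, n=2000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def speed_g(t):
    # |g'(t)| for g(t) = t e_1 + t^2 e_2, where g'(t) = e_1 + 2t e_2
    return math.hypot(1.0, 2.0 * t)

def speed_f(tau):
    # |f'(tau)| for f(tau) = exp(tau) e_1 + exp(2 tau) e_2
    return math.hypot(math.exp(tau), 2.0 * math.exp(2.0 * tau))

length_g = simpson(speed_g, 1.0, 2.0)
length_f = simpson(speed_f, 0.0, math.log(2.0))
print(length_g, length_f)
```

Both numbers approximate the same length $l$, as the change of variables $t = \exp \tau$ predicts.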


Formula (6.6) is suggested by considering inscribed polygons. Let
$a = t_0 < t_1 < \cdots < t_{m-1} < t_m = b$, and let $\mu = \max\{t_1 - t_0, t_2 - t_1, \dots,
t_m - t_{m-1}\}$. The polygon which joins successively $g(t_{j-1})$ with $g(t_j)$ has
elementary length

(*)  $\sum_{j=1}^{m} |g(t_j) - g(t_{j-1})|.$

The length $l$ is the limit of the elementary lengths of polygons inscribed in $\gamma$.
More precisely, given $\varepsilon > 0$, there exists $\delta > 0$ such that $|(*) - l| < \varepsilon$
whenever $\mu < \delta$. Since we mention this fact just to motivate the definition
(6.6), the proof will only be indicated. Since the derivative $g'$ is continuous,
$g(t_j) - g(t_{j-1})$ can be replaced by $g'(s_j)(t_j - t_{j-1})$ and the sum (*) by

(**)  $\sum_{j=1}^{m} |g'(s_j)|\,(t_j - t_{j-1}),$

with error tending to 0 as $\mu \to 0$, where $s_j$ can be chosen arbitrarily in
$[t_{j-1}, t_j]$. But (**) is a Riemann sum for the integral (6.6), and tends to $l$ as
$\mu \to 0$. The proof that (*) can be replaced by (**) with small error makes use
of the fact that the continuous function $g'$ is uniformly continuous on the
compact set $[a, b]$ (Section 2.5, Problem 8).
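
The convergence of the elementary lengths (*) can be watched on the unit circle of Example 3 with $m = 1$, whose length is $2\pi$. The sketch below uses partitions of $[0, 2\pi]$ into equal parts; this choice of partition and the names `g` and `polygon_length` are assumptions made here for illustration.

```python
import math

def g(t):
    # g(t) = (cos t) e_1 + (sin t) e_2, the unit circle traversed once
    return (math.cos(t), math.sin(t))

def polygon_length(curve, a, b, m):
    """Elementary length (*) for the partition of [a, b] into m equal parts."""
    points = [curve(a + (b - a) * j / m) for j in range(m + 1)]
    return sum(math.hypot(q[0] - p[0], q[1] - p[1])
               for p, q in zip(points, points[1:]))

partitions = (4, 16, 64, 256, 1024)
lengths = [polygon_length(g, 0.0, 2.0 * math.pi, m) for m in partitions]
for m, value in zip(partitions, lengths):
    print(m, value, 2.0 * math.pi - value)
```

The last column, the difference between $2\pi$ and the polygon length, shrinks as the partition is refined.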
Every smooth curve $\gamma$ has a representation of particular geometric interest.
It is called the standard representation, or representation with arc length $s$
as parameter, and is defined in the following way. Let $g$ represent $\gamma$ on $[a, b]$,
and let

$S(t) = \int_a^t |g'(u)|\,du$ for every $t \in [a, b]$.

The length of the part of $\gamma$ represented on $[a, t]$ is $S(t)$. Clearly, $S(a) = 0$ and
$S(b) = l$. By the fundamental theorem of calculus,

$S'(t) = |g'(t)| > 0$

for every $t \in [a, b]$. In particular, if $t$ signifies time then $S'(t)$ is the length of
the velocity vector, that is, the speed of motion.
Since $S'(t) > 0$ the equation $s = S(t)$ can be solved for $t$. More precisely,
the function $S$ has an inverse $\psi$ of class $C^{(1)}$ on $[0, l]$. Let $G = g \circ \psi$. Then $G$
is the standard representation of $\gamma$. From (6.5)

$G'(s) = g'[\psi(s)]\,\psi'(s)$, for every $s \in [0, l]$.

Since

$\psi'(s) = \dfrac{1}{S'[\psi(s)]} = \dfrac{1}{|g'[\psi(s)]|},$

we find that

(6.7a)  $|G'(s)| = 1$

for every $s \in [0, l]$. Hence $G'(s)$ is a unit tangent vector at the point $G(s)$. If we
write $dx^i/ds$ for $G^{i\prime}(s)$, then (6.7a) can be rewritten

(6.7b)  $\left( \dfrac{dx^1}{ds} \right)^2 + \cdots + \left( \dfrac{dx^n}{ds} \right)^2 = 1.$

EXAMPLE 3 (continued). Let $m > 0$. Then $|g'(t)| = m$, $S(t) = mt$. Solving the
equation $s = S(t)$ for $t$, we obtain the standard representation

$G(s) = (\cos s)\,e_1 + (\sin s)\,e_2, \quad 0 \le s \le 2m\pi.$
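
When $S$ cannot be inverted in closed form, the standard representation can still be approximated numerically. The sketch below does this for the parabolic arc $g(t) = t\,e_1 + t^2 e_2$, $1 \le t \le 2$, of Example 1 in this section. It is an illustration under stated assumptions: $S$ is approximated by a composite Simpson rule, its inverse $\psi$ is found by bisection, and $|G'(s)|$ is estimated by a central difference quotient. The names `speed`, `simpson`, `S`, `psi`, `G`, and `G_speed` are introduced here and are not part of the text.

```python
import math

A, B = 1.0, 2.0  # parameter interval [a, b]

def g(t):
    # g(t) = t e_1 + t^2 e_2
    return (t, t * t)

def speed(t):
    # |g'(t)| where g'(t) = e_1 + 2t e_2
    return math.hypot(1.0, 2.0 * t)

def simpson(func, a, b, n=200):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def S(t):
    # arc length of the part of the curve represented on [a, t]
    return simpson(speed, A, t)

def psi(s):
    # inverse of S by bisection; valid because S is strictly increasing
    lo, hi = A, B
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if S(mid) < s:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def G(s):
    # approximate standard representation G = g o psi
    return g(psi(s))

def G_speed(s, step=1e-4):
    # central difference estimate of |G'(s)|
    x1, y1 = G(s - step)
    x2, y2 = G(s + step)
    return math.hypot(x2 - x1, y2 - y1) / (2 * step)

length = S(B)
for fraction in (0.25, 0.5, 0.75):
    print(fraction, G_speed(fraction * length))
```

Each printed estimate of $|G'(s)|$ should be close to 1, in agreement with (6.7a).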
Piecewise smooth curves
It is not difficult to adapt the preceding discussion to curves of class $C^{(1)}$
except for a finite number of corners and cusps. By a parametric representation
of a piecewise smooth curve is meant a continuous function $g$ on an interval
$[a, b]$ with the following property: There exist $t_0, t_1, \dots, t_p$ with

$a = t_0 < t_1 < \cdots < t_{p-1} < t_p = b$

such that the restriction of $g$ to each of the closed subintervals $[t_{j-1}, t_j]$, $j =
1, \dots, p$, is a parametric representation of class $C^{(1)}$. In particular, $g$ has at
each $t_j$ interior to $[a, b]$ right- and left-hand derivatives which need not be
equal. Parameter changes which are piecewise of class $C^{(1)}$ are admitted. A
piecewise smooth curve is an equivalence class of parametric representations
which are piecewise of class $C^{(1)}$.

EXAMPLE 4. Let $n = 2$ and $g(t) = t\,e_1 + |t - 1|\,e_2$, $0 \le t \le 2$. This is a
piecewise $C^{(1)}$ representation of the polygon from $e_2$ to $e_1$ to $2e_1 + e_2$ with a corner
at $e_1$. Let $\phi(\tau) = \tau^3 + 1$ and $f(\tau) = g(\tau^3 + 1) = (\tau^3 + 1)\,e_1 + |\tau^3|\,e_2$,
$-1 \le \tau \le 1$. Then $f^1(\tau) = \tau^3 + 1$ and $f^2(\tau) = |\tau^3|$. The components
$f^1$, $f^2$ are of class $C^{(1)}$, which might lead one to think that there is no corner.
However, $\phi$ does not define an admissible parameter change, since $\phi'(0) = 0$
contrary to (6.4). Since $f'(0) = 0$, $f$ is not a parametric representation of
class $C^{(1)}$. This example emphasizes the importance of the restriction $\phi'(\tau) > 0$
in (6.4).

PROBLEMS

1. Which of the following represent simple arcs? Simple closed curves? Illustrate with
a sketch.
(a) $g(t) = (a \cos t)\,e_1 + (b \sin t)\,e_2$, $a > 0$, $b > 0$, $J = [0, 2\pi]$.
(b) Same as (a) except $J = [-\pi, 2\pi]$.
(c) $g(t) = (-\cosh t)\,e_1 + (\sinh t)\,e_2$, $J = [-1, 1]$ (see Section 3.5 for the definition
of cosh and sinh).

2. (a) Let $\gamma$ be represented by $\mathbf{f}(x) = x\,e_1 + f(x)\,e_2$, $a \le x \le b$, where $f$ is of class $C^{(1)}$
on $[a, b]$. Show that

$l = \int_a^b \sqrt{1 + [f'(x)]^2}\,dx.$

(b) Find $l$ in case $f(x) = |x|^{3/2}$, $a = -b$.

3. Find the standard representation of the helical curve represented on $[0, 2\pi]$ by
$g(t) = (\cos t)\,e_1 + (\sin t)\,e_2 + t\,e_3$. Sketch the trace.
4. Sketch the trace of the curve $\gamma$ represented on $[0, 2\pi]$ by $g(t) = (\cos t)\,e_1 + (\sin 2t)\,e_2$.
Find the tangent vectors to $\gamma$ at the double point $(0, 0)$.
5. Let $g^1(t) = \cos(1/t) \exp(-1/t)$, $g^2(t) = \sin(1/t) \exp(-1/t)$ if $0 < t \le 1$, and $g^1(0) =
g^2(0) = 0$.
(a) Show that $g^1$ and $g^2$ are of class $C^{(1)}$ on $[0, 1]$. [Hint: $u^k \exp(-u) \to 0$ as
$u \to +\infty$.]
(b) Does $g = g^1 e_1 + g^2 e_2$ represent a curve of class $C^{(1)}$? Illustrate with a sketch.

6. Let us write $f \sim g$ to mean $f$ is equivalent to $g$. Prove that:
(a) $g \sim g$ (reflexivity).
(b) If $f \sim g$, then $g \sim f$ (symmetry).
(c) If $g_1 \sim g_2$ and $g_2 \sim g_3$, then $g_1 \sim g_3$ (transitivity).
7. Let $\gamma$ be a curve of class $C^{(1)}$. Prove that the multiplicity of any point $x$ is finite.
8. Let $\gamma_0$ and $\gamma_1$ be curves represented on $[a, b]$ by $g_0$ and $g_1$, respectively. For every
$u \in [0, 1]$ let $\gamma_u$ be the curve represented by $g_u(t) = u\,g_1(t) + (1 - u)\,g_0(t)$, $a \le t \le b$.
Let $l(u)$ be the length of $\gamma_u$. Prove that $l$ is a convex function on $[0, 1]$. When is the
convexity strict?
9. Let $G$ be the standard representation of a curve $\gamma$ of class $C^{(2)}$.
(a) Show that $G'(s) \cdot G''(s) = 0$. [Hint: Use the fact that $|G'(s)|^2 = 1$.]
(b) Let $g$ be any parametric representation of class $C^{(2)}$ of $\gamma$ and define $S(t)$ as in
this section. Show that $S''(t) = g'(t) \cdot g''(t)/S'(t)$.
(c) If $G''(s) \ne 0$, then $G''(s)$ is called the principal normal vector and $|G''(s)|$ the
absolute curvature at $G(s)$. Show that

$|G''[S(t)]| = \dfrac{\left[ |g'(t)|^2\,|g''(t)|^2 - (g'(t) \cdot g''(t))^2 \right]^{1/2}}{|g'(t)|^3}.$

6.3 Differential 1-forms

Let us first give a rough description of this notion and afterward be more
precise. A differential form $\omega$ of degree 1 is supposed to be an "expression
linear in the differentials $dx^1, \dots, dx^n$":

(6.8)  $\omega = \omega_1\,dx^1 + \cdots + \omega_n\,dx^n,$

where the coefficients $\omega_1, \dots, \omega_n$ are real valued functions. In case there is
a real valued differentiable function $f$ such that $\omega_i$ is the ith partial derivative
$f_i$ for each $i = 1, \dots, n$, then $\omega$ is called the differential of $f$ and is written $df$.
Thus

(6.9)  $df = f_1\,dx^1 + \cdots + f_n\,dx^n.$

It is important to know whether a given differential form co is the differential


of a function. A considerable part of the discussion in this section and
Section 6.4 is directed to just this question. One gets a necessary condition
(6.11) from the fact that the mixed partial derivatives hj and hi of a function
f of class e(2) are equal. This necessary condition turns out to be sufficient
if the domain is simply connected (Section 8.10).
We recall from Section 3.2 that the elements of the space $(E^n)^*$ dual to $E^n$ are called covectors, and that the components $a_i$ of a covector $a$ are written with subscripts. No matter what precise meaning we give to the symbols $dx^1, \ldots, dx^n$, the functions $\omega_1, \ldots, \omega_n$ must determine the differential form $\omega$. For each $x$, the numbers $\omega_1(x), \ldots, \omega_n(x)$ are the components of a covector. This suggests that we may define a differential form as a function whose values are covectors.
To state this precisely:

Definition. A differential form of degree 1 is a function $\omega$ with domain $D \subset E^n$ and values in $(E^n)^*$.

For short we usually say "1-form" instead of "differential form of degree 1." In Chapter 7 differential forms of any degree $r = 0, 1, 2, \ldots, n$ are defined.
The value of $\omega$ at $x$ is denoted by $\omega(x)$. It is the covector
$$\omega(x) = \omega_1(x)e^1 + \cdots + \omega_n(x)e^n, \tag{6.10}$$
where, as in Section 3.2, $e^1, \ldots, e^n$ are the standard basis covectors.
A 1-form $\omega$ is a constant form if there is a covector $a$ such that $\omega(x) = a$ for every $x \in D$. In particular, for each $i = 1, \ldots, n$ let us consider the constant 1-form with value $e^i$. This 1-form is denoted by $dx^i$. Since $(E^n)^*$ is a vector space, the sum $\omega + \zeta$ of two functions $\omega$ and $\zeta$ with the same domain $D$ and values in $(E^n)^*$ is defined (Section 2.1). Similarly, the product $f\omega$ is defined if $f$ is a real-valued function and $\omega$ is a 1-form, with the same domain $D$. In particular, $\omega_i\,dx^i$ is the 1-form whose value at each $x$ is $\omega_i(x)e^i$. From (6.10), $\omega_1\,dx^1 + \cdots + \omega_n\,dx^n$ is the 1-form whose value at each $x$ is $\omega(x)$. Therefore Formula (6.8) is correct.

The differential of a function


Let us now suppose that $D$ is an open set. Let $f$ be a real-valued differentiable function with domain $D$. The differential of $f$ at $x$ is the covector $df(x)$ whose components are the partial derivatives $f_1(x), \ldots, f_n(x)$.

Definition. The differential of $f$ is the differential form $df$ of degree 1 whose value at each $x \in D$ is the covector $df(x)$.


Some authors define $df$ as the real valued function whose domain is the cartesian product $D \times E^n$ and whose value at each pair $(x, h)$ is the number $df(x) \cdot h$. Knowing $df(x)$, one can find $df(x) \cdot h$ for every $h \in E^n$, and vice versa. Hence this definition is equivalent to the one we have given.
If $f$ and $g$ are differentiable functions with the same domain $D$, then
$$d(f + g) = df + dg, \qquad d(fg) = f\,dg + g\,df.$$
These formulas follow from Problem 7, Section 3.3. Similarly, writing $c$ for the constant function with value $c$,
$$dc = 0,$$
where $0$ denotes the "zero form" whose value is $0$ everywhere. If $D$ is connected, then, conversely, $df = 0$ implies that $f$ is a constant function. This is just a restatement of Corollary 2, Section 3.3.
If $L$ is a linear function, then $dL$ is a constant 1-form. For let $L(x) = a \cdot x$, as in Proposition 3.1. Then the $i$th partial derivative of $L$ is $a_i$ and $dL(x) = a$ for every $x$.
In particular, the standard cartesian coordinate functions $X^1, \ldots, X^n$ (Section 3.2) are linear. In fact $X^i(x) = e^i \cdot x = x^i$, and $dX^i(x) = e^i$. Hence $dX^i$ is just the constant 1-form which we have denoted by $dx^i$. The common practice of writing $dx^i$ instead of $dX^i$ arises from the habit of confusing notationally a function with its value at some particular point $x$, in this case of confusing $X^i$ with $x^i = X^i(x)$. Nevertheless, following custom, we adhere to the notation $dx^i$.

Definition. A 1-form $\omega$ is exact if there is a function $f$ such that $\omega = df$.

If $df = dg$, then $d(f - g) = 0$. If $D$ is connected, $f - g$ is a constant function. Hence the function $f$ whose differential is a given exact 1-form $\omega$ is determined up to the addition of a constant function, if the domain is connected.
A 1-form $\omega$ is of class $C^{(q)}$ if its components $\omega_i$ are functions of class $C^{(q)}$. If $\omega = df$, then $\omega_i = \partial f/\partial x^i$. In this case $\omega$ is of class $C^{(q)}$ if and only if $f$ is of class $C^{(q+1)}$.
Let us look for some criteria to determine whether a 1-form $\omega$ is exact. If $\omega$ is of class $C^{(1)}$ and $\omega = df$, then $f$ is of class $C^{(2)}$. Using Theorem 3.3 and the $\partial/\partial x^i$ notation for partial derivatives,
$$\frac{\partial \omega_i}{\partial x^j} = \frac{\partial^2 f}{\partial x^j\,\partial x^i} = \frac{\partial^2 f}{\partial x^i\,\partial x^j} = \frac{\partial \omega_j}{\partial x^i}.$$
Thus the conditions
$$\frac{\partial \omega_i}{\partial x^j} = \frac{\partial \omega_j}{\partial x^i}, \qquad i, j = 1, \ldots, n, \tag{6.11}$$
are necessary for exactness of $\omega$.


Definition. A 1-form $\omega$ of class $C^{(1)}$ that satisfies (6.11) is called a closed 1-form.

In (6.11) we may as well suppose that $i < j$. Thus the definition says in effect that $\omega$ is a closed differential form if its components $\omega_1, \ldots, \omega_n$ satisfy these $n(n - 1)/2$ conditions. For instance, if $n = 2$ let us write $dx$ and $dy$ instead of $dx^1$ and $dx^2$, and
$$M(x, y) = \omega_1(x, y), \qquad N(x, y) = \omega_2(x, y).$$
The expression for a 1-form is then
$$\omega = M\,dx + N\,dy.$$
The condition that $\omega$ be closed is that the components $M$ and $N$ satisfy
$$\frac{\partial M}{\partial y} = \frac{\partial N}{\partial x}.$$
We have shown that every exact 1-form is closed. The converse is false, as Example 2 (below) shows. It is comparatively easy to check whether conditions (6.11) are satisfied. Therefore, it is very desirable to find some additional condition that will guarantee that the converse holds. Such a condition is that the domain $D$ be simply connected, a term to be defined in Section 8.10. We prove in Chapter 8 that if $D$ is simply connected, then every closed 1-form with domain $D$ is exact. It turns out that any convex set, and in particular $E^n$, is simply connected. For $n = 2$, an open, connected set $D$ is simply connected if and only if, roughly speaking, $D$ has no holes.
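Since conditions (6.11) are easy to check, they can also be tested numerically. The following is a minimal sketch, not part of the text's development: it approximates the partial derivatives by central differences and reports the largest violation of (6.11) at one point. The function name `closedness_defect` and the two sample forms on $E^3$ are illustrative assumptions.

```python
def closedness_defect(omega, x, h=1e-5):
    """Largest |d(omega_i)/dx^j - d(omega_j)/dx^i| over i < j at the point x.

    omega(x) returns the list of components (omega_1, ..., omega_n), n >= 2.
    Partial derivatives are approximated by central differences, so a small
    value only suggests (does not prove) that (6.11) holds at x.
    """
    n = len(x)

    def d(i, j):
        # central-difference approximation to the partial of omega_i w.r.t. x^j
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        return (omega(xp)[i] - omega(xm)[i]) / (2 * h)

    # the n(n - 1)/2 conditions with i < j
    return max(abs(d(i, j) - d(j, i)) for i in range(n) for j in range(i + 1, n))


# Hypothetical sample forms on E^3 (not taken from the text):
def exact_form(p):        # d(xyz) = yz dx + xz dy + xy dz
    x, y, z = p
    return [y * z, x * z, x * y]


def non_closed_form(p):   # y dx
    x, y, z = p
    return [y, 0.0, 0.0]


point = [0.4, -1.2, 2.0]
print(closedness_defect(exact_form, point))       # close to 0
print(closedness_defect(non_closed_form, point))  # close to 1
```

A vanishing defect at sample points is only evidence of closedness; and closedness gives exactness only under an extra hypothesis on $D$ such as simple connectedness.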
We have been careful to distinguish notationally between functions and their values. One can scarcely attain a sound knowledge of calculus until this distinction is recognized. Nevertheless, in examples we sometimes abuse the notation for brevity. For instance, $d(x^2 y) = 2xy\,dx + x^2\,dy$ is short for the statement "$df = f_1\,dx + f_2\,dy$, where $f(x, y) = x^2 y$, $f_1(x, y) = 2xy$, $f_2(x, y) = x^2$ for every $(x, y) \in E^2$."

EXAMPLE 1. Let $\omega = 2xy\,dx + (x^2 + 2y)\,dy$, $D = E^2$. This is, of course, an abbreviation for $\omega = M\,dx + N\,dy$, where $M(x, y) = 2xy$, $N(x, y) = x^2 + 2y$, for every $(x, y) \in E^2$. In this example,
$$\frac{\partial M}{\partial y}(x, y) = 2x = \frac{\partial N}{\partial x}(x, y)$$
for every $(x, y)$. Hence $\omega$ is a closed 1-form. Since $E^2$ is connected and simply connected, $\omega = df$ where $f$ is determined up to the addition of a constant function. The function $f$ can be found by partial integration with respect to the first variable, as follows:
$$\frac{\partial f}{\partial x}(x, y) = M(x, y) = 2xy, \qquad f(x, y) = x^2 y + \phi(y),$$
where the function $\phi$ is determined from
$$\frac{\partial f}{\partial y}(x, y) = N(x, y) = x^2 + 2y, \qquad x^2 + 2y = x^2 + \phi'(y).$$
Of course these equations hold for every $(x, y) \in E^2$. Then $\phi'(y) = 2y$, and $\phi(y) = y^2 + c$ for every $y$, where the "constant of integration" $c$ is a number that may be chosen arbitrarily. Hence for every $(x, y) \in E^2$,
$$f(x, y) = x^2 y + y^2 + c.$$
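The result of Example 1 can be sanity-checked numerically. The sketch below, an illustration only, compares central-difference approximations of the partial derivatives of $f(x, y) = x^2 y + y^2 + c$ with $M$ and $N$ at a few arbitrarily chosen sample points; the helper names `partials` and `max_defect` are assumptions of this sketch.

```python
# omega = M dx + N dy with M = 2xy, N = x^2 + 2y; candidate potential f = x^2 y + y^2 + c.

def M(x, y):
    return 2 * x * y


def N(x, y):
    return x * x + 2 * y


def f(x, y, c=0.0):
    return x * x * y + y * y + c


def partials(func, x, y, h=1e-5):
    """Central-difference approximations to the two first-order partial derivatives."""
    fx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    fy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return fx, fy


def max_defect(points):
    """Largest deviation of (f_1, f_2) from (M, N) over the sample points."""
    worst = 0.0
    for x, y in points:
        fx, fy = partials(f, x, y)
        worst = max(worst, abs(fx - M(x, y)), abs(fy - N(x, y)))
    return worst


sample = [(-1.5, 0.3), (0.0, 0.0), (0.7, -2.0), (2.0, 1.0)]
print(max_defect(sample))   # small, up to rounding error
```

The constant $c$ drops out of every difference quotient, which reflects the fact that $f$ is determined only up to an additive constant.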

EXAMPLE 2. Let $D = E^2 - \{(0, 0)\}$. By removing $(0, 0)$ we have made a hole, and $D$ is not simply connected. Let $\omega = M\,dx + N\,dy$, where for every $(x, y) \in D$
$$M(x, y) = -\frac{y}{x^2 + y^2}, \qquad N(x, y) = \frac{x}{x^2 + y^2}.$$
A computation shows that $\partial M/\partial y = \partial N/\partial x$ in $D$, hence $\omega$ is closed. Let us show that $\omega$ is not exact. Let $D_1$ be the open subset of $D$ obtained by deleting the positive $x$-axis. For every $(x, y) \in D_1$ let $\Theta(x, y)$ be the angle from the positive $x$-axis to $(x, y)$, $0 < \Theta(x, y) < 2\pi$ (Figure 6.5). Using elementary calculus, we find that in $D_1$, $d\Theta = \omega$ [Problem 6(a)]. If there were a function $f$ of class $C^{(2)}$ on $D$ such that $\omega = df$, then upon restricting $f$ to $D_1$ we would have $d(f - \Theta) = 0$. Since $D_1$ is connected, $f - \Theta$ would be constant on $D_1$. This would imply that $\Theta$ can be continuously extended across the positive $x$-axis, which is false. Hence $\omega$ is not exact.

EXAMPLE 3. In some cases it can be seen by inspection that $\omega$ is exact. For instance, if $\omega = 2x^1\,dx^1 + \cdots + 2x^n\,dx^n$ and $D = E^n$, then
$$\omega = d[(x^1)^2 + \cdots + (x^n)^2 + c] = d(x \cdot x + c).$$
The reader may have also discovered by inspection that the form $\omega$ in Example 1 is exact.

Figure 6.5

PROBLEMS

1. Let $n = 1$. Give a precise interpretation of the formula $df/dx = f'$ from elementary calculus. [Hint: The quotient of two real valued functions is defined wherever the denominator does not have the value 0.]
2. Let $n = 3$ and $\omega = M\,dx + N\,dy + Q\,dz$. What do conditions (6.11) become in this case?
3. In each case determine whether $\omega$ is exact. If exact, find all functions $f$ such that $\omega = df$.
(a) $\omega = xy\,dx + (x^2/2)\,dy$, $D = E^2$.
(b) $\omega = x\,dx + xz\,dy + xy\,dz$, $D = E^3$.
(c) $\omega = y\,dx$, $D = E^2$.
(d) $\omega = (1/x^2 + 1/y^2)(y\,dx - x\,dy)$, $D = \{(x, y) : x \ne 0 \text{ and } y \ne 0\}$.
4. Let $\omega = dy + p(x)y\,dx$ and $D$ be the vertical strip $\{(x, y) : a < x < b\}$. Let $p$ be continuous on $(a, b)$ and $P$ be an antiderivative of $p$, that is, a function such that $P'(x) = p(x)$ for every $x \in (a, b)$. Let $f(x) = \exp[P(x)]$. Show that $f\omega$ is exact.
5. Show that
(a) $d\left[\left(\sum_{i=1}^n x^i\right)^2\right] = 2 \sum_{i,j=1}^n x^i\,dx^j$.
(b) $d\left[\sum_{i \ne j} x^i x^j\right] = 2 \sum_{i=1}^n \sum_{k \ne i} x^k\,dx^i$.
[Hint for (b): What is $\left(\sum x^i\right)^2 - \sum (x^i)^2$?]
6. In Example 2:
(a) Show that $\Theta_1(x, y) = M(x, y)$, $\Theta_2(x, y) = N(x, y)$ for every $(x, y) \in D_1$. You may use the formulas for the derivatives of the inverse trigonometric functions.
(b) Verify that $\partial M/\partial y = \partial N/\partial x$ by calculating these partial derivatives.
7. Let $g$ be continuous on $E^1$. Show that

i= 1

is an exact 1-form. [Hint: Let $h(u) = u\,g(u)$ for every $u \in E^1$. The function $h$ has an antiderivative.]

6.4 Line integrals


Let $D$ be an open subset of $E^n$. Let $\omega$ be a 1-form with domain $D$, and $\gamma$ a curve whose trace is contained in $D$. We assume that $\omega$ is continuous. Let us consider an inscribed polygon joining successively the points $g(t_{j-1})$ and $g(t_j)$ as in Section 6.2. If $s_j \in [t_{j-1}, t_j]$, then $\omega[g(s_j)]$ is a covector, and its scalar product with the vector $g(t_j) - g(t_{j-1})$ is a number. Let us consider the sum
$$\sum_{j=1}^{m} \omega[g(s_j)] \cdot [g(t_j) - g(t_{j-1})]. \tag{$*$}$$
By reasoning such as that indicated in Section 6.2, the sum $(*)$ tends to the integral (6.12a) as $\mu \to 0$. This integral is called a line integral.


Definition. Let $\omega$ be continuous and $\gamma$ piecewise smooth. The line integral of $\omega$ along $\gamma$ is
$$\int_a^b \omega[g(t)] \cdot g'(t)\,dt. \tag{6.12a}$$

The line integral exists, since if $g$ is piecewise of class $C^{(1)}$ the integrand in (6.12a) is bounded and has a finite number of discontinuities. If $f$ is equivalent to $g$, then using (6.5)
$$\int_a^b \omega[g(t)] \cdot g'(t)\,dt = \int_\alpha^\beta \omega[g(\phi(\tau))] \cdot g'[\phi(\tau)]\,\phi'(\tau)\,d\tau = \int_\alpha^\beta \omega[f(\tau)] \cdot f'(\tau)\,d\tau.$$
Hence the line integral does not depend on the particular representation $g$ chosen for $\gamma$.
Line integrals have an important role in many parts of mathematical analysis, such as in the theory of complex analytic functions. Several fundamental physical concepts are also expressed in terms of line integrals. The concept of work is mentioned at the end of the present section. In Section 6.6 we discuss thermal systems, and in Section 8.5, circulation in a flowing fluid.
The notation for line integral is $\int_\gamma \omega$. Writing out the scalar product in (6.12a),
$$\int_\gamma \omega = \int_a^b \sum_{i=1}^{n} \omega_i[g(t)]\,g^{i\prime}(t)\,dt. \tag{6.12b}$$

The notation for differential form is supposed to suggest (6.12b). Let us write $\omega = \omega_1\,dx^1 + \cdots + \omega_n\,dx^n$ as in (6.8) and formally multiply and divide the right-hand side by $dt$. If we set $x = g(t)$ and write $dx^i/dt$ for $g^{i\prime}(t)$, then we get the integrand on the right-hand side of (6.12b).

EXAMPLE 1. Let $\gamma$ be the semicircle with center $(0, 0)$ and endpoints $\pm a e_2$, directed from $-a e_2$ to $a e_2$. Let us evaluate $\int_\gamma x\,dy - y\,dx$. Points $(x, y)$ of the semicircle satisfy the equations $x = a \cos\theta$, $y = a \sin\theta$, where $-\pi/2 \le \theta \le \pi/2$. The most convenient representation for $\gamma$ is on $[-\pi/2, \pi/2]$ with $g(\theta) = (a \cos\theta)e_1 + (a \sin\theta)e_2$. Then
$$\int_\gamma (x\,dy - y\,dx) = \int_{-\pi/2}^{\pi/2} \left( x \frac{dy}{d\theta} - y \frac{dx}{d\theta} \right) d\theta = \int_{-\pi/2}^{\pi/2} a^2\,d\theta = a^2 \pi.$$
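The value $a^2\pi$ can be compared with the inscribed-polygon sum $(*)$ that motivates the definition. The following sketch, given only as an illustration, evaluates that sum with $s_j$ chosen as the midpoint of $[t_{j-1}, t_j]$; the helper name `polygon_sum`, the number of subdivisions, and the choice $a = 2$ are assumptions of this sketch.

```python
import math


def polygon_sum(omega, g, a, b, steps=20000):
    """Inscribed-polygon sum (*) for the 1-form omega along the curve g on [a, b].

    omega(x) returns the covector components at x; g(t) returns the point g(t).
    The point s_j is taken as the midpoint of [t_{j-1}, t_j].
    """
    total = 0.0
    dt = (b - a) / steps
    prev = g(a)
    for j in range(1, steps + 1):
        cur = g(a + j * dt)
        w = omega(g(a + (j - 0.5) * dt))
        # scalar product of the covector with the displacement g(t_j) - g(t_{j-1})
        total += sum(wi * (ci - pi) for wi, ci, pi in zip(w, cur, prev))
        prev = cur
    return total


radius = 2.0


def omega(p):            # x dy - y dx has components (-y, x)
    x, y = p
    return (-y, x)


def semicircle(theta):   # g(theta) = (a cos theta) e_1 + (a sin theta) e_2
    return (radius * math.cos(theta), radius * math.sin(theta))


result = polygon_sum(omega, semicircle, -math.pi / 2, math.pi / 2)
print(result, radius ** 2 * math.pi)   # the two numbers agree closely
```

Refining the subdivision (larger `steps`) makes the mesh $\mu$ smaller and brings the sum closer to the integral.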

Elementary properties of line integrals


From the corresponding linearity property of ordinary integrals,
$$\int_\gamma (\omega + \zeta) = \int_\gamma \omega + \int_\gamma \zeta, \qquad \int_\gamma (c\,\omega) = c \int_\gamma \omega, \tag{6.13}$$
for any pair of 1-forms $\omega$, $\zeta$, and scalar $c$. Let $g$ represent $\gamma$ on $[a, b]$, and let $\phi$ be of class $C^{(1)}$ on $[\alpha, \beta]$ with
$$\phi'(\tau) < 0 \text{ for every } \tau \in [\alpha, \beta], \qquad \phi(\alpha) = b, \qquad \phi(\beta) = a.$$
The formula for change of variables in integrals still holds if we agree, as usual in calculus, that $\int_b^a = -\int_a^b$. The composite $f = g \circ \phi$ represents a curve, which is denoted by $-\gamma$ and is called the curve obtained by reversing the sense of direction of $\gamma$. From the change of variables formula
$$\int_{-\gamma} \omega = -\int_\gamma \omega. \tag{6.14}$$

Let $\gamma_1, \ldots, \gamma_p$ be piecewise smooth curves such that the final endpoint of $\gamma_j$ is the initial endpoint of $\gamma_{j+1}$ for $j = 1, \ldots, p - 1$. Let $\gamma$ be obtained by "joining together" the curves $\gamma_1, \ldots, \gamma_p$. More precisely, let us divide $[0, 1]$ into $p$ subintervals $[(j - 1)/p, j/p]$ of the same length. Each curve $\gamma_j$ has a representation on an interval $[a_j, b_j]$. By a linear change of parameter we may assume that $a_j = (j - 1)/p$, $b_j = j/p$. Let $g_j$ be such a representation of $\gamma_j$ for each $j = 1, \ldots, p$. Then $g_j(j/p) = g_{j+1}(j/p)$. Let $g$ be the function such that $g(t) = g_j(t)$ for $t \in [(j - 1)/p, j/p]$. Then $g$ is a parametric representation that is piecewise of class $C^{(1)}$, and $\gamma$ is the curve that $g$ represents. Let us call $\gamma$ the sum of these curves and write $\gamma = \gamma_1 + \cdots + \gamma_p$. Since an ordinary integral over $[0, 1]$ is the sum of the integrals over the subintervals $[(j - 1)/p, j/p]$, we have
$$\int_{\gamma_1 + \cdots + \gamma_p} \omega = \int_{\gamma_1} \omega + \cdots + \int_{\gamma_p} \omega. \tag{6.15}$$

EXAMPLE 2. Let $\gamma$ be the boundary of a rectangle in $E^2$, directed counterclockwise as in Figure 6.6. Let $\omega = M\,dx + N\,dy$. Then
$$\int_\gamma \omega = \sum_{j=1}^{4} \int_{\gamma_j} \omega.$$
The most convenient representation for $\gamma_1$ is obtained by setting $g^1(t) = t$, $g^2(t) = c$, $a \le t \le b$. Taking similar representations for $\gamma_2$, $-\gamma_3$, $-\gamma_4$ and using (6.14) we find that
$$\int_\gamma \omega = \int_a^b M(t, c)\,dt + \int_c^d N(b, t)\,dt - \int_a^b M(t, d)\,dt - \int_c^d N(a, t)\,dt.$$

Figure 6.6

Let us now consider the case when $\omega$ is an exact differential form, $\omega = df$, where $f$ is of class $C^{(1)}$ on an open set $D$ containing the trace of $\gamma$. By the definition (6.12a),
$$\int_\gamma df = \int_a^b df[g(t)] \cdot g'(t)\,dt.$$
Let us use the formula
$$(f \circ g)'(t) = df[g(t)] \cdot g'(t).$$
This is a special case of the chain rule (Section 4.4). When arc length is the parameter it says that the derivative at $s$ of $f \circ G$ equals the derivative of $f$ in the direction of the unit tangent vector at $G(s)$. By the fundamental theorem of calculus,
$$\int_a^b (f \circ g)'(t)\,dt = f[g(b)] - f[g(a)].$$
Let $x_0 = g(a)$ and $x_1 = g(b)$ be the endpoints of $\gamma$. Then
$$\int_\gamma df = f(x_1) - f(x_0). \tag{6.16}$$
This is a generalization of the fundamental theorem of calculus. It shows that the line integral of an exact 1-form depends only on the endpoints of $\gamma$. In particular, if $\gamma$ is closed then the line integral is 0.
The following theorem shows that each of these properties characterizes exact 1-forms. We say that $\gamma$ lies in $D$ if its trace is a subset of $D$. By curve we mean here piecewise smooth curve.
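Formula (6.16) can be illustrated numerically. The sketch below approximates the integral in (6.12a) by the midpoint rule for the exact form $2xy\,dx + (x^2 + 2y)\,dy$ of Example 1, Section 6.3, along two curves with the same endpoints. The two curves, the helper name `line_integral`, and the step count are assumptions made for the illustration.

```python
def line_integral(omega, g, dg, a, b, steps=10000):
    """Midpoint-rule approximation to the integral over [a, b] of omega[g(t)] . g'(t) dt."""
    h = (b - a) / steps
    total = 0.0
    for k in range(steps):
        t = a + (k + 0.5) * h
        w = omega(g(t))   # covector components at g(t)
        v = dg(t)         # derivative g'(t), supplied analytically
        total += sum(wi * vi for wi, vi in zip(w, v)) * h
    return total


def omega(p):             # 2xy dx + (x^2 + 2y) dy
    x, y = p
    return (2 * x * y, x * x + 2 * y)


def f(p):                 # a potential: f(x, y) = x^2 y + y^2
    x, y = p
    return x * x * y + y * y


# Two hypothetical curves from (0, 0) to (1, 2), both represented on [0, 1].
segment = (lambda t: (t, 2 * t), lambda t: (1.0, 2.0))
parabola = (lambda t: (t, 2 * t * t), lambda t: (1.0, 4 * t))

I1 = line_integral(omega, segment[0], segment[1], 0.0, 1.0)
I2 = line_integral(omega, parabola[0], parabola[1], 0.0, 1.0)
print(I1, I2, f((1.0, 2.0)) - f((0.0, 0.0)))   # all three are approximately 6
```

Both integrals agree with $f(x_1) - f(x_0)$ although the curves differ, as (6.16) predicts.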

Theorem 6.1. Let $D \subset E^n$ be open, and $\omega$ a continuous 1-form with domain $D$. The following three statements are equivalent:
(1) $\omega$ is exact.
(2) For every closed curve $\gamma$ lying in $D$, $\int_\gamma \omega = 0$.
(3) If $\gamma_1$ and $\gamma_2$ are any two curves lying in $D$ with the same initial endpoint and the same final endpoint, then $\int_{\gamma_1} \omega = \int_{\gamma_2} \omega$ (see Figure 6.7).

Figure 6.7

PROOF. We have seen that (1) implies (2). If $\gamma_1$ and $\gamma_2$ have the same endpoints, then $\gamma_1 - \gamma_2$ is closed. If (2) holds, then by (6.14) and (6.15)
$$0 = \int_{\gamma_1 - \gamma_2} \omega = \int_{\gamma_1} \omega - \int_{\gamma_2} \omega.$$
Hence (2) implies (3).

It remains to show that (3) implies (1). For simplicity let us assume that $D$ is connected. If $D$ is not connected, the construction to follow must be applied separately to each component (that is, maximal connected subset) of $D$.
Let $x_0$ be some point of $D$, and define $f$ as follows. Since $D$ is open and connected, any point of $D$ can be joined to $x_0$ by a curve whose trace is contained in $D$ (see Problem 10, Section 2.7). For every $x \in D$ let
$$f(x) = \int_\gamma \omega,$$
where $\gamma$ is any curve lying in $D$ with initial endpoint $x_0$ and final endpoint $x$. Since we are assuming (3) in Theorem 6.1, it does not matter which curve with these properties is chosen. Let us show that $df = \omega$.

Figure 6.8

Given $x \in D$, let $U$ be a neighborhood of $x$ contained in $D$ and $\delta$ the radius of $U$. Let $0 < u < \delta$ and for each $i = 1, \ldots, n$ let $\gamma_i$ be the line segment from $x$ to $x + u e_i$ (see Figure 6.8). Then
$$f(x + u e_i) - f(x) = \int_{\gamma + \gamma_i} \omega - \int_\gamma \omega = \int_{\gamma_i} \omega.$$
Let $g_i(t) = x + t e_i$, $\psi_i(t) = \omega_i(x + t e_i)$. Then $g_i$ represents $\gamma_i$ on $[0, u]$, and hence
$$\frac{1}{u}\,[f(x + u e_i) - f(x)] = \frac{1}{u} \int_{\gamma_i} \omega = \frac{1}{u} \int_0^u \psi_i(t)\,dt.$$
Since $\omega_i$ is a continuous function, $\psi_i$ is continuous. Consequently, the right-hand side tends to $\psi_i(0) = \omega_i(x)$ as $u \to 0^+$ by the fundamental theorem of calculus. Similarly, $u^{-1}[f(x + u e_i) - f(x)]$ tends to $\omega_i(x)$ as $u \to 0^-$.
We have shown that each partial derivative of $f$ of order 1 exists at $x$ and that
$$f_i(x) = \omega_i(x), \qquad i = 1, \ldots, n.$$
Since these partial derivatives are continuous, $f$ is of class $C^{(1)}$ and hence differentiable. Therefore $df(x) = \omega(x)$. Since this is true for every $x \in D$, $\omega = df$. $\square$
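Statement (2) of Theorem 6.1 gives a practical test for non-exactness: a single closed curve with nonzero line integral suffices. As an illustration only, the sketch below approximates (6.12a) by the midpoint rule for the form of Example 2, Section 6.3, around two circles. The choice of circles, the helper names, and the step count are assumptions of this sketch.

```python
import math


def line_integral(omega, g, dg, a, b, steps=10000):
    """Midpoint-rule approximation to the integral over [a, b] of omega[g(t)] . g'(t) dt."""
    h = (b - a) / steps
    total = 0.0
    for k in range(steps):
        t = a + (k + 0.5) * h
        w = omega(g(t))
        v = dg(t)
        total += sum(wi * vi for wi, vi in zip(w, v)) * h
    return total


def omega(p):   # M dx + N dy with M = -y/(x^2 + y^2), N = x/(x^2 + y^2)
    x, y = p
    r2 = x * x + y * y
    return (-y / r2, x / r2)


def circle(cx, cy, r):
    """Representation on [0, 2*pi] of the circle of radius r about (cx, cy), counterclockwise."""
    g = lambda t: (cx + r * math.cos(t), cy + r * math.sin(t))
    dg = lambda t: (-r * math.sin(t), r * math.cos(t))
    return g, dg


around_origin = line_integral(omega, *circle(0.0, 0.0, 1.0), 0.0, 2 * math.pi)
away_from_origin = line_integral(omega, *circle(3.0, 0.0, 1.0), 0.0, 2 * math.pi)
print(around_origin, away_from_origin)   # approximately 2*pi and 0
```

The nonzero value around the origin is consistent with the conclusion of Example 2 that this closed form is not exact on $E^2 - \{(0, 0)\}$; the circle that does not surround the hole gives 0.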


Work
Let $D$ be an open connected subset of $E^3$. In mechanics the idea of force field is considered. A force field assigns at each $x \in D$ a linear function, which we call the force covector acting at $x$ and denote by $\omega(x)$. If $h$ is a "small displacement" from $x$, then the work done moving a particle along the line segment from $x$ to $x + h$ is approximately $\omega(x) \cdot h$. The force field is the differential form $\omega$ of degree 1 whose value at each $x \in D$ is the force covector $\omega(x)$.
For present purposes it is more natural to regard force as a covector rather than a vector. However, one can also consider the force vector $F(x) = \omega_1(x)e_1 + \omega_2(x)e_2 + \omega_3(x)e_3$ with the same components as $\omega(x)$. This simple device for changing covectors into vectors is justified since we use the standard euclidean inner product. If $E^n$ is given another inner product $B(x, y)$, then the components of $\omega(x)$ and $F(x)$ would be related by
$$\omega_i = \sum_{j=1}^{n} c_{ij} F^j,$$
where $(c_{ij})$ is the matrix of $B$ and $(c^{ij}) = (c_{ij})^{-1}$. Compare with (2.2) and (3.12).
Let $\gamma$ be a piecewise smooth curve lying in $D$. Using the notation in Section 6.2, with for sake of simplicity $s_j = t_{j-1}$, the vector $h_j = g(t_j) - g(t_{j-1})$ is a displacement from $g(t_{j-1})$, which is small if $\mu$ is small. The work done going along $\gamma$ from $g(t_{j-1})$ to $g(t_j)$ should be approximately $\omega[g(t_{j-1})] \cdot h_j$. This suggests the following.

Definition. The work $w$ done in moving a particle along $\gamma$ is
$$w = \int_\gamma \omega.$$

If arc length is used as parameter, then from the definition (6.12a),
$$w = \int_0^l \omega[G(s)] \cdot G'(s)\,ds.$$
The expression $\omega[G(s)] \cdot G'(s)$ is called the component of the field at $G(s)$ in the direction of the unit tangent vector $G'(s)$ to $\gamma$.
A force field $\omega$ of class $C^{(1)}$ is called conservative if $\omega$ is closed. In Section 8.10 we define the concept of simply connected open set. It is shown there that any closed 1-form $\omega$ is exact if $D$ is simply connected. If $\omega$ is exact and $\omega = df$, then $f$ is a potential of the field $\omega$. If $D$ is connected, $f$ is determined up to the addition of a constant function.


EXAMPLE 3. Let $\omega = -\rho^{-3}(x\,dx + y\,dy + z\,dz)$, where $\rho^2 = x^2 + y^2 + z^2$ and $D = E^3 - \{0\}$. If we agree that $f(x, y, z) \to 0$ as $\rho \to \infty$, then the potential $f$ is given by $f(x) = \rho^{-1}$. Except for a multiplicative constant it is the Newtonian potential due to a mass concentrated at $0$.

PROBLEMS
1. Evaluate $\tfrac{1}{2} \int_\gamma x\,dy - y\,dx$ in case:
(a) $\gamma$ bounds the triangle shown in Figure 6.9.
(b) $\gamma$ is represented by $g(t) = (a \cos t)e_1 + (b \sin t)e_2$, $0 \le t \le 2\pi$, where $a, b > 0$.
Your answer should be the area of the set enclosed by $\gamma$. This is a very special case of Green's theorem (Section 8.7).

Figure 6.9

2. Let $g_1(t) = t e_1 + (2t - 1)e_2$, $1 \le t \le 2$, and $g_2(t) = (t + 1)e_1 + (t^2 + t + 1)e_2$, $0 \le t \le 1$. Evaluate $\int_{\gamma_1} \omega$ and $\int_{\gamma_2} \omega$ for each of the 1-forms (a), (c), and (d) in Problem 3, Section 6.3, where $\gamma_1$ and $\gamma_2$ are the curves represented by $g_1$ and $g_2$, respectively.

3. Let $n = 1$, $\omega = f\,dx$, and $\gamma$ be the interval $[a, b]$ directed from $a$ to $b$. Show that
$$\int_\gamma \omega = \int_a^b f(x)\,dx.$$

4. Let $D \subset E^2$ be open and simply connected, and $u$, $v$ functions of class $C^{(1)}$ that satisfy
$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad \frac{\partial v}{\partial x} = -\frac{\partial u}{\partial y}.$$
[Note: These two first-order partial differential equations are called the Cauchy-Riemann equations. They are fundamental to the theory of complex analytic functions.]
Show that for any closed curve $\gamma$ lying in $D$,
$$\int_\gamma u\,dx - v\,dy = 0, \qquad \int_\gamma v\,dx + u\,dy = 0.$$
[Hint: Any closed 1-form is exact since $D$ is simply connected.]

5. Show that the force field $\omega = \psi(\rho^2)(x\,dx + y\,dy + z\,dz)$ is conservative, where $\psi$ is any function of class $C^{(1)}$ on $E^1$ and $\rho^2 = x^2 + y^2 + z^2$. Find its potential $f$, if $f(0) = 0$.

(Other line integrals.) The following problems deal not with integrals of 1-forms, but with some other types of line integrals that often occur.

6. If $f$ is continuous on $D$, then (by definition)
$$\int_\gamma f\,ds = \int_a^b f[g(t)]\,|g'(t)|\,dt.$$
Show that this integral does not depend on the particular representation $g$ chosen for $\gamma$. In particular, if arc length is the parameter, then
$$\int_\gamma f\,ds = \int_0^l f[G(s)]\,ds.$$

7. The moment of inertia of a curve $\gamma$ about a point $x_0$ is $\int_\gamma |x - x_0|^2\,ds$. Find the moment of inertia about $0$ of the line segment in $E^3$ joining $e_1$ and $e_2 + 2e_3$.
8. The centroid $\bar{x}$ of a curve $\gamma$ is the point such that
$$\bar{x}^i = \frac{1}{l} \int_\gamma x^i\,ds, \qquad i = 1, \ldots, n,$$
where $l$ is the length of $\gamma$.
(a) Find the centroid of the helical curve in Problem 3, Section 6.2.
(b) Find its moment of inertia about $\pi e_3$.

9. Let $W$ be continuous on $D \times E^n$ and satisfy the homogeneity condition $W(x, ch) = cW(x, h)$ whenever $c \ge 0$. Then (by definition)
$$\int_\gamma W = \int_a^b W[g(t), g'(t)]\,dt.$$
Show that:
(a) This integral does not depend on the particular representation $g$ of $\gamma$.
(b) If $W(x, h) = \omega(x) \cdot h$, then $\int_\gamma W = \int_\gamma \omega$; and if $W(x, h) = f(x)|h|$, $\int_\gamma W = \int_\gamma f\,ds$.
(c) Let $W(x, h) = \|h\|$, where $\|\ \|$ is any norm on $E^n$ (Section 2.11). Then $\int_\gamma W$ is called the length of $\gamma$ with respect to this norm. Show that if $\gamma$ is the line segment joining $x_1$ and $x_2$, then the length is $\|x_1 - x_2\|$.

*6.5 Gradient method


In Section 3.5 we found relative extrema of a function $f$ by calculating the critical points and testing them by Theorem 3.4. However, in practice the equation $df(x) = 0$ for the critical points can be explicitly solved only when $f$ has some special form. In Section 3.5 a method for finding critical points approximately was indicated. It is called the gradient method, or method of steepest ascent, and will now be described more precisely.
Let $D$ be an open set and $F = (F^1, \ldots, F^n)$ a vector-valued function whose components $F^i$ are of class $C^{(1)}$ on $D$. A function $g$ from an interval $J$ into $E^n$ is called a solution of the system of first-order ordinary differential equations
$$\frac{dx^i}{dt} = F^i(x), \qquad i = 1, \ldots, n,$$
if $g'(t) = F[g(t)]$ for every $t \in J$. An existence theorem for such systems [5, 11] states that given $x_1 \in D$ there is a solution $g$ on some open interval $J$ containing $0$ such that $g(0) = x_1$.
Now let $f$ be of class $C^{(2)}$ on $D$. Let $F(x) = \operatorname{grad} f(x)$ be the gradient vector of $f$ at $x$. Assume that $x_1$ is not a critical point of $f$. A solution $g$ of
$$g'(t) = \operatorname{grad} f[g(t)] \tag{6.17}$$
with $g(0) = x_1$ is called a gradient trajectory of $f$ through $x_1$. It is shown in Section 4.7 that $\operatorname{grad} f(x)$ is a normal vector to the level set of $f$ containing $x$. Therefore, the gradient trajectories are normal to the level sets, as indicated in Figure 6.10. Let $\phi(t) = f[g(t)]$. By the chain rule,
$$\phi'(t) = \operatorname{grad} f[g(t)] \cdot g'(t) = |\operatorname{grad} f[g(t)]|^2 > 0.$$
Hence $\phi$ is increasing. In other words, the values of $f$ increase along each gradient trajectory as $t$ increases.
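The monotonicity of $f$ along a gradient trajectory can be observed numerically. The following is a minimal sketch, assuming a forward-Euler discretization of (6.17) with a small fixed step; the function $f$, the starting point, the step size, and the helper names are all hypothetical choices made for the illustration. A discrete scheme only approximates the trajectory, so the increase of $f$ from step to step holds here because the step is small relative to the second derivatives of this particular $f$.

```python
def f(p):
    # hypothetical concave quadratic with its maximum at (1, -2)
    x, y = p
    return -(x - 1.0) ** 2 - 3.0 * (y + 2.0) ** 2


def grad_f(p):
    x, y = p
    return (-2.0 * (x - 1.0), -6.0 * (y + 2.0))


def gradient_trajectory(x1, step=0.01, n_steps=200):
    """Forward-Euler approximation to g'(t) = grad f[g(t)], g(0) = x1."""
    p = tuple(x1)
    points = [p]
    for _ in range(n_steps):
        g = grad_f(p)
        p = tuple(pi + step * gi for pi, gi in zip(p, g))
        points.append(p)
    return points


path = gradient_trajectory((3.0, 0.0))
values = [f(p) for p in path]
increasing = all(b > a for a, b in zip(values, values[1:]))
print(values[0], values[-1], increasing)
```

In practice a step that is too large can make the discrete values decrease, which is why step-size control is part of any usable steepest-ascent method.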

Figure 6.10

Let us assume that there is a gradient trajectory $g$ through $x_1$ which is defined for every $t \ge 0$.
By the uniqueness theorem for systems of differential equations [5, 11], $g(t)$ is never a critical point. However, one may ask whether $g(t)$ approaches a critical point $x_0$ as $t \to +\infty$. While this is not always true, we prove two partial results in this direction.

Proposition 6.1. If $g(t)$ tends to a limit $x_0$ as $t \to +\infty$, then $x_0$ is a critical point of $f$.

PROOF. Define $\phi$ as above. Then $\phi$ is increasing and
$$f(x_0) = \lim_{t \to +\infty} \phi(t).$$
Suppose that $\operatorname{grad} f(x_0) \ne 0$. Since $\operatorname{grad} f$ is continuous there exists a neighborhood $U$ of $x_0$ and $m > 0$ such that $|\operatorname{grad} f(x)| \ge m$ for every $x \in U$. There exists $t_1$ such that $g(t) \in U$ for every $t \ge t_1$. By the fundamental theorem of calculus, if $t_1 < t_2$ then
$$\phi(t_2) = \phi(t_1) + \int_{t_1}^{t_2} \phi'(t)\,dt \ge \phi(t_1) + m^2 (t_2 - t_1).$$
The right-hand side tends to $+\infty$ as $t_2 \to +\infty$, but $\phi(t_2) \le f(x_0)$. This is a contradiction. Hence $\operatorname{grad} f(x_0) = 0$. $\square$

Note. It may happen that a trajectory $g$ remains in a compact set $K \subset D$ for every $t \ge 0$, but $g(t)$ does not approach a limit $x_0$ as $t \to +\infty$. In that case it can be shown that $g$ has a "limit set" $B$, which consists of all accumulation points of sequences $[g(t_k)]$ for all possible sequences $t_1, t_2, \ldots$ tending to $+\infty$. $B$ is a compact, connected subset of the level set $\{x : f(x) = C\}$, where $C = \lim_{t \to \infty} \phi(t)$, and every point of $B$ is critical. If $f$ has only isolated critical points, then $B$ is a single point $x_0$ and $g(t) \to x_0$ as $t \to \infty$. We do not prove this.
However, let us show that if $x_0$ is a nondegenerate critical point at which $f$ has a relative maximum, then any trajectory starting sufficiently near $x_0$ leads to $x_0$.

Proposition 6.2. Let $x_0$ be a nondegenerate critical point such that $Q(x_0, \cdot\,)$ is negative definite. Then there is a neighborhood $U$ of $x_0$ such that $x_0 = \lim_{t \to \infty} g(t)$ provided $x_1 \in U$.

PROOF. By Problem 8, Section 3.5, there exists $m > 0$ such that $Q(x_0, h) \le -m|h|^2$ for every $h$. Since $f$ is of class $C^{(2)}$ there is a neighborhood $U$ of $x_0$ such that $|f_{ij}(x_0 + sh) - f_{ij}(x_0)| \le m/2n^2$ for $i, j = 1, \ldots, n$, whenever $x_0 + h \in U$ and $s \in (0, 1)$. Since $f_i(x_0) = 0$ there is by the mean value theorem $s_i \in (0, 1)$ such that
$$f_i(x_0 + h) = \sum_{j=1}^{n} f_{ij}(x_0 + s_i h)\,h^j.$$
If $x \in U$ and $h = x - x_0$, then
$$\operatorname{grad} f(x) \cdot (x - x_0) = \sum_{i,j=1}^{n} f_{ij}(x_0 + s_i h)\,h^i h^j \le -\frac{m}{2}\,|h|^2.$$
Now let $\psi(t) = |g(t) - x_0|^2$, the square of the distance from $x_0$. Taking $x = g(t)$ and using (6.17), we have provided $g(t) \in U$
$$\psi'(t) = 2[g(t) - x_0] \cdot g'(t) \le -m\,|g(t) - x_0|^2,$$
which becomes $\psi'(t) \le -m\psi(t)$. Now $g(0) = x_1 \in U$, and since $\psi$ is decreasing, $g(t) \in U$ for every $t \ge 0$. Dividing by $\psi(t)$ in the inequality $\psi'(t) \le -m\psi(t)$ and integrating over $[0, t]$,
$$\log \psi(t) - \log \psi(0) \le -mt.$$
The right-hand side tends to $-\infty$ as $t \to \infty$. Hence so does $\log \psi(t)$; and $\psi(t)$ tends to $0$. $\square$
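The estimate $\log \psi(t) - \log \psi(0) \le -mt$ says that $\psi(t) \le \psi(0)e^{-mt}$. The sketch below, offered only as an illustration, integrates (6.17) by the classical fourth-order Runge-Kutta method for a hypothetical quadratic $f(x, y) = -x^2 - xy - y^2$ with maximum at $x_0 = 0$. For this $f$ one may take $m = 1$, since $Q(x_0, h) = -2(h^1)^2 - 2h^1h^2 - 2(h^2)^2 \le -|h|^2$. The starting point and step size are assumptions of the sketch.

```python
import math


def grad_f(p):
    # gradient of the hypothetical f(x, y) = -x^2 - x*y - y^2
    x, y = p
    return (-2.0 * x - y, -x - 2.0 * y)


def rk4_step(p, h):
    """One classical Runge-Kutta step for g'(t) = grad f[g(t)]."""
    def shifted(u, v, s):
        return tuple(ui + s * vi for ui, vi in zip(u, v))

    k1 = grad_f(p)
    k2 = grad_f(shifted(p, k1, h / 2))
    k3 = grad_f(shifted(p, k2, h / 2))
    k4 = grad_f(shifted(p, k3, h))
    return tuple(pi + h / 6 * (a + 2 * b + 2 * c + d)
                 for pi, a, b, c, d in zip(p, k1, k2, k3, k4))


def psi_along_trajectory(x1, h=0.01, n_steps=500):
    """Pairs (t, psi(t)) with psi(t) = |g(t) - x0|^2 and x0 = 0."""
    p = tuple(x1)
    out = [(0.0, sum(c * c for c in p))]
    for k in range(1, n_steps + 1):
        p = rk4_step(p, h)
        out.append((k * h, sum(c * c for c in p)))
    return out


m = 1.0
samples = psi_along_trajectory((0.5, -0.3))
psi0 = samples[0][1]
bound_holds = all(psi <= psi0 * math.exp(-m * t) + 1e-15 for t, psi in samples)
print(bound_holds, samples[-1][1])
```

For a quadratic $f$ the decay is in fact faster than the bound from the proof, which is stated for general $f$ of class $C^{(2)}$ and loses a factor in passing from $Q(x_0, h)$ to nearby points.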


Figure 6.11

Figure 6.10 indicates the behavior of the level sets and gradient trajectories near a nondegenerate maximum. The situation is similar near a nondegenerate minimum. The gradient trajectories in that case are followed as $t \to -\infty$. Near a saddle point ($n = 2$) the behavior of the trajectories is indicated in Figure 6.11.

EXAMPLE 1. Let $f(x, y) = x^2 - y^2$. The equations of the gradient trajectories are
$$\frac{dx}{dt} = 2x, \qquad \frac{dy}{dt} = -2y,$$
whose solutions are $g^1(t) = x_1 \exp(2t)$, $g^2(t) = y_1 \exp(-2t)$. The trajectories lie on the hyperbolas $xy = k$ orthogonal to the level sets $x^2 - y^2 = c$, and on the coordinate axes. Only trajectories starting from points $(0, y_1)$ lead to the saddle point $(0, 0)$.
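The claims of this example can be checked directly from the closed-form solutions. The sketch below is an illustration with arbitrarily chosen starting points: it evaluates a trajectory at several times and confirms that the product $xy$ stays constant, that $f$ increases, and that a trajectory starting on the $y$-axis approaches the saddle point. The helper name `trajectory` is an assumption of the sketch.

```python
import math


def trajectory(x1, y1, t):
    """Closed-form gradient trajectory of f(x, y) = x^2 - y^2 through (x1, y1)."""
    return (x1 * math.exp(2 * t), y1 * math.exp(-2 * t))


def f(x, y):
    return x * x - y * y


ts = [0.1 * k for k in range(21)]              # t in [0, 2]
pts = [trajectory(0.5, 1.5, t) for t in ts]    # hypothetical starting point (0.5, 1.5)
products = [x * y for x, y in pts]             # stays at x1 * y1 = 0.75
values = [f(x, y) for x, y in pts]             # increases with t
on_axis = trajectory(0.0, 1.5, 10.0)           # starts on the y-axis

print(max(abs(p - 0.75) for p in products))    # tiny, rounding error only
print(on_axis)                                 # very close to (0, 0)
```

A trajectory with $x_1 \ne 0$ instead moves away from the origin along its hyperbola, since $g^1(t) = x_1 \exp(2t)$ is unbounded.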

PROBLEMS

1. Sketch the level sets and gradient trajectories.
(a) $f(x, y) = 1 - x^2 - 2y^2$.
(b) $f(x, y) = xy + y$.

2. Let $f$ be a strictly concave function on $E^n$ which has an absolute maximum at $x_0$. By Theorem 3.7, $f$ has no other critical points. Consider any gradient trajectory $g$.
(a) Let $\psi(t) = |g(t) - x_0|^2$. Show that $\psi$ is nonincreasing.
(b) Show that $\psi(t) \to 0$, and hence $g(t) \to x_0$, as $t \to +\infty$.
3. Let $G$ be the representation with arc length as parameter of a gradient trajectory $g$. Show that $G'(s)$ is the direction of the gradient at $G(s)$ (see Section 6.2).

*6.6 Integrating factors; thermal systems


Let $\omega$ be a differential form of degree 1, with domain $D \subset E^n$. It may happen that $\omega$ is not exact, but that the product of a suitable real valued function $\phi$ and $\omega$ is exact. Let us assume that $\omega(x) \ne 0$ for all $x \in D$.

Definition. A function $\phi$ is an integrating factor if $\phi(x) \ne 0$ for all $x \in D$ and $\phi\omega = df$ for some function $f$.

In some cases an integrating factor can be found by inspection.

EXAMPLE 1. Let $D = \{(x, y) : x > 0\}$, $\omega = 2y\,dx + x\,dy$. Then $\omega$ is not exact, but
$$x\omega = 2xy\,dx + x^2\,dy = d(x^2 y).$$
In this example, $\phi(x, y) = x$ is the integrating factor and $f(x, y) = x^2 y$.
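A quick numerical check shows the effect of the integrating factor on the closedness condition $\partial M/\partial y = \partial N/\partial x$. The sketch below is an illustration only: it approximates $\partial N/\partial x - \partial M/\partial y$ by central differences for $\omega$ and for $x\omega$ at a few sample points of $D$; the helper name `closedness_gap` and the sample points are assumptions.

```python
def closedness_gap(M, N, x, y, h=1e-5):
    """Central-difference approximation to dN/dx - dM/dy at (x, y)."""
    dN_dx = (N(x + h, y) - N(x - h, y)) / (2 * h)
    dM_dy = (M(x, y + h) - M(x, y - h)) / (2 * h)
    return dN_dx - dM_dy


# omega = 2y dx + x dy
M = lambda x, y: 2 * y
N = lambda x, y: x

# phi * omega with phi(x, y) = x, that is, 2xy dx + x^2 dy
phi_M = lambda x, y: x * M(x, y)
phi_N = lambda x, y: x * N(x, y)

points = [(0.5, -1.0), (1.0, 2.0), (3.0, 0.25)]   # sample points with x > 0
gaps_omega = [closedness_gap(M, N, x, y) for x, y in points]
gaps_scaled = [closedness_gap(phi_M, phi_N, x, y) for x, y in points]
print(gaps_omega)    # each close to -1, so omega is not closed, hence not exact
print(gaps_scaled)   # each close to 0, consistent with x*omega = d(x^2 y)
```

Since every exact 1-form of class $C^{(1)}$ is closed, the nonzero gap for $\omega$ already rules out exactness; the zero gap for $x\omega$ is only a necessary condition, and exactness there follows from the explicit potential $x^2 y$.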


Let us require $\phi$ and $\omega$ to be of class $C^{(1)}$ on $D$. Then $f$ is of class $C^{(2)}$, and its first-order partial derivatives are $f_i = \phi\omega_i$, $i = 1, \ldots, n$. By Theorem 3.3, $f_{ij} = f_{ji}$. Thus an integrating factor $\phi$ must satisfy the equations
$$\frac{\partial}{\partial x^j}(\phi\omega_i) = \frac{\partial}{\partial x^i}(\phi\omega_j), \qquad i < j. \tag{6.18}$$
When $n = 2$, there is only one equation in (6.18) to be satisfied by $\phi$. We shall see that when $n = 2$ integrating factors exist, at least locally (Problem 2). For $n > 2$, there are $n(n - 1)/2$ equations in (6.18). Further conditions must be imposed on $\omega$ in order that there be an integrating factor. Theorem 6.2 (below) gives one set of such conditions. These conditions have a natural interpretation in the discussion of thermal systems to follow. Another, more easily verifiable, set of conditions will be given later in Section 7.10.
Let $\gamma$ be a curve in $E^n$, and $g$ a parametric representation of $\gamma$ on $[a, b]$.

Definition. We say that $\omega = 0$ on $\gamma$ if $\omega[g(t)] \cdot g'(t) = 0$ for $a \le t \le b$.

If $\phi\omega = df$ and $\phi(x) \ne 0$, then the curves on which $\omega = 0$ are just those on which $f(x)$ is constant. To see this, note that $f[g(t)]$ is constant if and only if $df = 0$ on $\gamma$. But $df = 0$ on $\gamma$ if and only if $\omega = 0$ on $\gamma$.
Note. In the terminology of differential equations, a curve $\gamma$ on which $\omega = 0$ is called a solution curve for the differential equation $\omega = 0$. Such a differential equation is called of Pfaffian type. Since $\omega(x) \ne 0$, we have also $df(x) \ne 0$ if there is an integrating factor $\phi$. Thus the level sets $B_c = \{x \in D : f(x) = c\}$ are $(n - 1)$-manifolds (see Section 4.7). Every solution curve lies in some level set $B_c$.
Let us now assume that $\omega_1(x) = 1$, i.e., that $\omega$ has the special type
$$\omega = dx^1 + \zeta, \qquad \zeta = \sum_{i=2}^{n} \omega_i\,dx^i. \tag{6.19}$$
This assumption holds in the application to thermal systems below. Moreover, the problem of local existence of an integrating factor reduces to considering $\omega$ of this type (see the remark following Theorem 6.2).


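Let us denote points of $E^n$ by $x = (x^1, z)$ where $z = (x^2, \ldots, x^n)$. If $\omega = 0$ on $\gamma$, then
$$g^{1\prime}(t) = -\zeta[g(t)] \cdot g'(t), \qquad a \le t \le b.$$
Hence, if $x_0 = (x_0^1, z_0)$ and $x_1 = (x_1^1, z_1)$ are the endpoints of $\gamma$,
$$x_1^1 - x_0^1 = -\int_\gamma \zeta. \tag{6.20}$$

Definition. We say that a curve $\gamma$ is of type $\mathcal{T}$ if $z_0 = z_1$ and $\omega = 0$ on $\gamma$. Let $\Omega \subset D$. We say that $\omega$ satisfies property (P) in $\Omega$ if $\int_\gamma \zeta = 0$ for any curve $\gamma$ of type $\mathcal{T}$ lying in $\Omega$.
By (6.20), property (P) is equivalent to the statement that any curve $\gamma$ of type $\mathcal{T}$ lying in $\Omega$ is a closed curve ($x_0 = x_1$).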

Theorem 6.2. Let $\Omega_1$ be a neighborhood of $x^*$, with $\Omega_1 \subset D$.

(a) If $\omega$ has an integrating factor in $\Omega_1$, then $\omega$ satisfies property (P) in $\Omega_1$.
(b) If $\omega$ satisfies property (P) in $\Omega_1$, then there exists an open set $\Delta$ containing $x^*$ such that $\Delta \subset \Omega_1$ and $\omega$ has an integrating factor in $\Delta$.

Remark. In Theorem 6.2, $\omega$ is assumed of type (6.19). However, this is actually no restriction. Since $\omega(x^*) \ne 0$, some component $\omega_i(x)$ is not $0$ for all $x$ in a neighborhood $\Omega_1$ of $x^*$. We may rearrange indices so that $i = 1$. Let $\tilde{\omega} = (\omega_1)^{-1}\omega$. Then $\tilde{\omega}_1(x) = 1$, and for $\Delta \subset \Omega_1$, $\omega$ has an integrating factor in $\Delta$ if and only if $\tilde{\omega}$ has an integrating factor in $\Delta$.

PROOF. To prove Theorem 6.2(a), suppose that $\phi\omega = df$ in $\Omega_1$. Since $\omega_1 = 1$, $\partial f/\partial x^1 = \phi \ne 0$ in $\Omega_1$. Consider any curve $\gamma$ of type $\mathcal{T}$ lying in $\Omega_1$. Then $\gamma \subset B_c \cap \Omega_1$, for some level set $B_c = \{x : f(x) = c\}$. By the implicit function theorem,
$$B_c \cap \Omega_1 = \{(\psi(z), z) : z \in R\},$$
where $R \subset E^{n-1}$ is open. At the endpoints $x_0$, $x_1$ of $\gamma$ we have $x_0^1 = \psi(z_0)$ and $x_1^1 = \psi(z_1)$. Since $z_0 = z_1$, we have $x_0^1 = x_1^1$, which by (6.20) is equivalent to property (P).

For proof of Theorem 6.2(b), to simplify the notation let us suppose that
$x^* = 0$. Suppose that $\omega$ has property (P) in $\Omega_1$. Let us define $F(u, z)$ for $(u, z)$
in a suitable neighborhood $\Omega_2$ of 0 as follows ($\Omega_2 \subset \Omega_1$). Let $\gamma$ be any curve
lying in $\Omega_1$, such that $\omega = 0$ on $\gamma$ and $\gamma$ has endpoints $(u, 0)$, $(x^1, z)$ (see Figure
6.12). We set $F(u, z) = x^1$. Let us show that $F(u, z)$ does not depend on the
particular choice of $\gamma$. Suppose that $\tilde\gamma$ is another such curve, with endpoints
$(u, 0)$ and $(\tilde x^1, z)$. The curve $\gamma_1$ obtained by traversing $\gamma$ in the opposite sense
followed by $\tilde\gamma$ is of type $\mathcal{T}$. By property (P), $\tilde x^1 = x^1$.

Figure 6.12 [figure not reproduced; it shows the $x^1$-axis and the point $(u, 0)$]

In order to show that $F(u, z)$ is of class $C^{(1)}$, we make a particular choice for
$\gamma$ and appeal to a theorem about differential equations. This curve $\gamma$ is
represented on $[0, 1]$ by $g(t) = (g^1(t), tz)$, where $g^1$ is the solution of the
differential equation

$\dfrac{dg^1}{dt} = -\sum_{j=2}^{n} \omega_j[g^1(t), tz]\,x^j, \qquad 0 \le t \le 1,$

with $g^1(0) = u$, $z = (x^2, \ldots, x^n)$. Then $\omega = 0$ on $\gamma$ and $\gamma$ has endpoints
$(u, 0)$, $(g^1(1), z)$. Thus $F(u, z) = g^1(1)$. By [11, Section 1.3], $F$ is of class $C^{(1)}$
on a suitable neighborhood $\Omega_2$ of 0.
Let us next show that

(6.21)  $\dfrac{\partial F}{\partial x^i}(u, z) = -\omega_i(x^1, z), \qquad i = 2, \ldots, n,$

where $x^1 = F(u, z)$. For this purpose, given $v > 0$ sufficiently small, consider
the curve represented on $[0, 1]$ by $G(t) = (G^1(t), z + tve_i)$, where

$\dfrac{dG^1}{dt} = -\omega_i[G^1(t), z + tve_i]\,v,$

with $G^1(0) = x^1$. Then $\omega = 0$ on this curve; its endpoints are $(x^1, z)$,
$(G^1(1), z + ve_i)$. Thus $F(u, z + ve_i) = G^1(1)$. By the mean value theorem,

$\dfrac{F(u, z + ve_i) - F(u, z)}{v} = \dfrac{G^1(1) - G^1(0)}{v} = -\omega_i[G^1(s), z + sve_i]$

for some $s \in (0, 1)$. When $v \to 0$ we get (6.21).
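Let $\omega^{\#}$ be the 1-form obtained by substituting, in (6.19), $x^1 = F(u, z)$ and
replacing $dx^1$ by $dF$. Then by (6.21),

$\omega^{\#} = dF + \sum_{j=2}^{n} \omega_j\,dx^j = \dfrac{\partial F}{\partial u}\,du.$

Since $F(u, 0) = u$, we have

(6.22)  $\dfrac{\partial F}{\partial u}(u, 0) = 1.$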


Consider the transformation $\mathbf{F}(u, z) = (F(u, z), z)$ from $\Omega_2$ into $E^n$. By (6.22),
$J\mathbf{F}(u, 0) = 1$. The inverse function theorem implies that 0 has a neighborhood
$\Omega \subset \Omega_2$ such that $\mathbf{F}|\Omega$ has an inverse of class $C^{(1)}$, defined on an open set
$\Delta = \mathbf{F}(\Omega)$ containing $\mathbf{F}(0) = 0$. The inverse has the form $(\mathbf{F}|\Omega)^{-1}(x) =
(f(x), z)$, where $f(x)$ is that $u$ for which $F(u, z) = x^1$. We set

$\tau(x) = \dfrac{\partial F}{\partial u}(f(x), z), \qquad \phi(x) = \dfrac{1}{\tau(x)}.$

Then $\omega = \tau\,df$ in $\Delta$ and $\phi$ is an integrating factor. □

When $n = 2$, the property (P) is not needed to construct the function $F$ in
the proof of Theorem 6.2. See Problem 2. Thus an integrating factor always
exists in a neighborhood of any $(x, y)$ when $n = 2$.
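The construction in the proof can be tried numerically when $n = 2$. The sketch below is only an illustration under stated assumptions: it picks the hypothetical coefficient $N(x, y) = x$, so the 1-form is $dx + x\,dy$, takes $(x^*, y^*) = (0, 0)$, and integrates $dx/dy = -N(x, y)$ from $(u, 0)$ with a classical Runge-Kutta step. For this choice the solution curves are $x = ue^{-y}$, so one expects $F(u, y) = ue^{-y}$, $\partial F/\partial u = e^{-y}$, and the integrating factor $1/(\partial F/\partial u) = e^{y}$. The names `N`, `F`, `tau` are illustrative.

```python
import math

def N(x, y):
    # Hypothetical coefficient chosen for illustration: the form is dx + x dy.
    return x

def F(u, y, steps=2000):
    """Integrate dx/dy = -N(x, y) from (u, 0) up to height y (classical RK4)."""
    h = y / steps
    x, s = u, 0.0
    for _ in range(steps):
        k1 = -N(x, s)
        k2 = -N(x + 0.5 * h * k1, s + 0.5 * h)
        k3 = -N(x + 0.5 * h * k2, s + 0.5 * h)
        k4 = -N(x + h * k3, s + h)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        s += h
    return x

def tau(u, y, eps=1e-6):
    """Partial derivative of F with respect to u, by a central difference."""
    return (F(u + eps, y) - F(u - eps, y)) / (2 * eps)

u, y = 0.7, 0.4
print(F(u, y), u * math.exp(-y))        # numerical and closed-form F
print(1.0 / tau(u, y), math.exp(y))     # integrating factor 1/tau versus e^y
```

For this particular form one can check by hand that $e^{y}(dx + x\,dy) = d(xe^{y})$, which agrees with the numerical values.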

Thermal systems
The subject of thermodynamics is concerned with relationships between heat
and other forms of energy such as mechanical, chemical, electrical, or
magnetic. It deals with gross effects such as temperature and pressure,
without reference to the underlying atomic structure of the matter being
studied.
The following mathematical model is used to describe various thermal
systems [16]. A possible state of the system is a vector $x = (x^1, \ldots, x^n)$. The
number $x^1$ represents the total energy of the system, and $x^2, \ldots, x^n$ other
quantities such as volume and mass. Let $D$ denote the set of possible states.
It is assumed that $D$ is open, convex, and that $x \in D$ implies $cx \in D$ for any
$c > 0$. A 1-form $\omega = dx^1 + \zeta$ of type (6.19) is given. The coefficients $\omega_2, \ldots, \omega_n$
are functions on $D$ homogeneous of degree 0, namely, $\omega_i(cx) = \omega_i(x)$ for
$c > 0$. [Note: In thermodynamics literature the components $x^i$ are called
extensive variables and the functions $\omega_i$, intensive variables.] The physical
significance of $\zeta$ is as follows. If a thermal system in a state $x$ is displaced to a
new state $x + h$, then for $|h|$ small, $\zeta(x) \cdot h$ represents approximately the
work done (cf. Section 6.4). Let us now suppose that the states of a thermal
system change with time $t$, and let $x = g(t)$ be the state at time $t \in [a, b]$.
Let $\gamma$ be the curve represented by $g$. Then $\int_\gamma \zeta$ is the work done in moving the
thermal system along $\gamma$. On the other hand,

$\int_\gamma dx^1 = g^1(b) - g^1(a)$

is the change in total energy. The sum $\int_\gamma \omega$ is interpreted as the heat energy
transferred to the system from outside it.
The curve $\gamma$ is called adiabatic if $\omega = 0$ on $\gamma$. In other words, no heat is
transferred as the thermal system states move along an adiabatic curve $\gamma$.
As above, let $z = (x^2, \ldots, x^n)$ denote the vector of system states, omitting
energy $x^1$, and denote by $x_0 = (x_0^1, z_0)$, $x_1 = (x_1^1, z_1)$ the initial and final states
as the system moves along $\gamma$. It is assumed that:


Principle 1 (Conservation of energy). If $\gamma$ is an adiabatic curve such that
$z_0 = z_1$, then $x_0^1 = x_1^1$.

Principle 1 is called the first law of thermodynamics. It is equivalent to
property (P) above. Theorem 6.2(b) guarantees that integrating factors exist,
at least locally. Principle 2 is an additional assumption that there is an
integrating factor of a special form.
Let $\theta(x)$ denote the temperature of a thermal system in state $x$, measured
on some scale. It is assumed that $\theta$ is of class $C^{(1)}$, homogeneous of degree 0,
and that $\partial\theta/\partial x^1 > 0$.

Principle 2. There exist functions $T$ and $S$ such that $\omega = T\,dS$ and $T = \tau \circ \theta$,
where $\tau$ is of class $C^{(1)}$, $\tau(\theta) > 0$ and $\tau'(\theta) > 0$ for all $\theta$.

$T(x)$ is called the thermodynamic temperature and $S(x)$, the entropy of a
thermal system in state $x$. Thermodynamic temperature $T$ is measured on a
scale natural to the problem. Adiabatic curves are just those on which $dS = 0$,
in other words, curves on which entropy $S$ is constant. The function $T$ is
homogeneous of degree 0, and $S$ is homogeneous of degree 1 [$S(cx) = cS(x)$
for $c > 0$].
A curve $\gamma$ is called isothermal if $T(x)$ is constant on $\gamma$; that is, if $T[g(t)] = k$
for some $k$, where $g$ represents $\gamma$. If $\gamma$ is isothermal, then

(6.23)  $\int_\gamma \omega = k \int_\gamma dS = k[S(x_1) - S(x_0)].$

We have

$\dfrac{\partial T}{\partial x^1} = \tau'[\theta(x)]\,\dfrac{\partial\theta}{\partial x^1} > 0.$

Hence, $T$ is an increasing function of $x^1$ for fixed $z$. Suppose that $\gamma$ is
isothermal, with endpoints $x_0 = (x_0^1, z_0)$, $x_1 = (x_1^1, z_1)$. If $z_0 = z_1$, then we must
have $x_0^1 = x_1^1$ since $T(x_0) = T(x_1)$. By (6.23) we have $\int_\gamma \omega = 0$ and hence
$\int_\gamma \zeta = 0$. Thus there is neither a change in the total energy nor work done as
the system moves along an isothermal curve $\gamma$ for which $z_0 = z_1$.
Principle 2 is called the second law of thermodynamics for thermal systems
that can be described by the above model. This model is an idealization of
actual physical systems. For one thing, a curve $\gamma$ leading from thermal state
$x_0$ to state $x_1$ can be traversed in the opposite sense, from $x_1$ to $x_0$. Thus the
model applies to reversible processes. Moreover, in the model $x$ is to be
regarded as an equilibrium state of the system. For example, if the model
describes a gas in a container, then such variables as pressure, volume, and
temperature should be the same at all points of the container.

EXAMPLE 2 (Ideal gas). Let $n = 3$ and $x^1 = U$, $x^2 = V$, $x^3 = N$, where $U$ is
the total kinetic energy of the molecules in the gas, $V$ the volume, and $N$ the
mass. Let

$\omega = dU + P\,dV - \mu\,dN,$

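where $P$ is pressure and $\mu$ is called the chemical potential. If $\omega = T\,dS$, then
$\phi = 1/T$ is an integrating factor and

$\dfrac{\partial S}{\partial U} = \dfrac{1}{T}, \qquad \dfrac{\partial S}{\partial V} = \dfrac{P}{T}, \qquad \dfrac{\partial S}{\partial N} = -\dfrac{\mu}{T}.$

Since $S$ is homogeneous of degree 1, Euler's formula (Problem 8, Section 3.3)
gives

$S = U\,\dfrac{\partial S}{\partial U} + V\,\dfrac{\partial S}{\partial V} + N\,\dfrac{\partial S}{\partial N}.$

Therefore

(6.24)  $S = \dfrac{U + PV - \mu N}{T}.$

For an ideal gas, it is required that

(6.25)  $P = a\,\dfrac{U}{V}, \qquad T = b\,\dfrac{U}{N},$

for suitable constants $a$, $b$.
Formula (6.18) gives three equations that $\phi = 1/T$ must satisfy. One of
these holds by (6.25). The other two equations determine $\mu$, and then $S$,
from (6.24):

(6.26)  $S = \dfrac{a}{b}\,N \log\dfrac{V}{N} + \dfrac{1}{b}\,N \log\dfrac{U}{N} + CN,$

where $C$ is any constant (see Problem 4).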
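A quick numerical sanity check of (6.24) through (6.26) is sketched below. It is not a derivation: the constants $a$, $b$, $C$ and the state $(U, V, N)$ are arbitrary illustrative values, and the partial derivatives are approximated by central differences. The check confirms that the entropy in (6.26) has $\partial S/\partial U = 1/T$ and $\partial S/\partial V = P/T$ with $P$, $T$ as in (6.25), that $S$ is homogeneous of degree 1, and that (6.24) holds when $\mu$ is defined by $\partial S/\partial N = -\mu/T$.

```python
import math

a, b, C = 2.0 / 3.0, 0.5, 1.0        # illustrative constants, not physical data

def S(U, V, N):
    """Entropy of formula (6.26)."""
    return (a / b) * N * math.log(V / N) + (1.0 / b) * N * math.log(U / N) + C * N

def partial(f, args, i, eps=1e-6):
    """Central-difference approximation of the i-th partial derivative of f."""
    lo, hi = list(args), list(args)
    lo[i] -= eps
    hi[i] += eps
    return (f(*hi) - f(*lo)) / (2 * eps)

U, V, N = 3.0, 2.0, 1.5               # an arbitrary state
T = b * U / N                         # (6.25)
P = a * U / V                         # (6.25)
dS_dU = partial(S, (U, V, N), 0)
dS_dV = partial(S, (U, V, N), 1)
mu = -T * partial(S, (U, V, N), 2)    # chemical potential from dS/dN = -mu/T

print(dS_dU, 1.0 / T)
print(dS_dV, P / T)
print(S(U, V, N), (U + P * V - mu * N) / T)
```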

PROBLEMS

1. Let $n = 3$, $x = (x, y, z)$. Show that the 1-form $\omega = dx + y\,dz$ has no integrating
factor. [Hint: Show that property (P) does not hold. Try curves $\gamma$ on which $y = a\cos t$,
$z = a\sin t$, $0 \le t \le 2\pi$, and $\omega = 0$ on $\gamma$.]
2. Let $n = 2$ and $\omega = dx + N(x, y)\,dy$. Given $(x^*, y^*)$, define $F(u, y)$ by

$\dfrac{\partial F}{\partial y} = -N(F(u, y), y), \quad$ with $F(u, y^*) = u.$

[In other words, for fixed $u$, $F(u, y)$ satisfies the differential equation $dx/dy =
-N(x, y)$.] Following the last part of the proof of Theorem 6.2(b), show that $\omega = \tau\,df$
in some open set $\Delta$ containing $(x^*, y^*)$.
3. Let $n = 2$ and $\omega = dU + P\,dV = T\,dS$.
(a) Show that $S = T^{-1}(U + PV)$.
(b) Find $T$ if $P = aUV^{-1}$.
4. In Example 2, derive (6.26).

7 Exterior algebra and differential calculus

In this chapter we introduce the algebra and calculus of differential forms,
which also goes by the name "exterior differential calculus." In Section 6.3
we defined the concept of differential form of degree 1, and in Section 6.4, the
integral $\int_\gamma \omega$ of such a differential form $\omega$ along a curve $\gamma$. This integral
changes sign if the direction in which $\gamma$ is traversed is reversed. In Chapter 8
we define the integral of a differential form $\omega$ of higher degree $r$ over a portion
$A$ of an $r$-dimensional manifold. For example, for $r = 2$ a differential form
of degree 2 can be integrated over a piece of surface $A$. The integral of a
differential form depends on the orientation assigned to $A$ and changes sign
if the orientation is reversed.
We begin in Section 7.1 with the special case of degree $r = 2$. The purpose
is to motivate the treatment of differential forms of arbitrary degree $r$, which
follows in Section 7.4. For similar reasons we begin the discussion of
multivectors, frames, and orientations in Section 7.5 with the case $r = 2$.
We recall from Section 6.3 that a differential form of degree 1 is a
covector-valued function. In order to define differential forms of higher degree $r$ we
first introduce in Sections 7.2 and 7.3 multicovectors of degree $r$. For brevity,
they are called $r$-covectors. An $r$-covector is an alternating, multilinear
function with domain the $r$-fold cartesian product $E^n \times \cdots \times E^n$. It turns out
that the $r$-covectors form a vector space of dimension $\binom{n}{r}$, which is denoted
by $(E^n_r)^*$.
There is a natural product for multicovectors called the exterior product
and denoted by the symbol $\wedge$. If $\omega$ is an $r$-covector and $\zeta$ an $s$-covector, then
$\omega \wedge \zeta$ is a certain $(r + s)$-covector. The exterior product is associative, and
it is commutative except for a possible sign change (Proposition 7.3).
A differential form of degree $r$ is defined as an $r$-covector-valued function.
For brevity we say $r$-form for differential form of degree $r$. The exterior
product $\omega \wedge \zeta$ of an $r$-form $\omega$ and an $s$-form $\zeta$ is defined using the exterior
product for multicovectors. Its properties are summarized in Theorem 7.1.
Every $r$-form can be represented as a homogeneous polynomial in $dx^1, \ldots, dx^n$
of degree $r$, whose coefficients are real valued functions [see Formula (7.23)].
As in Section 6.3, the 1-forms $dx^1, \ldots, dx^n$ are the differentials of the standard
cartesian coordinate functions. Their exterior products are anticommutative,
namely, $dx^i \wedge dx^j = -dx^j \wedge dx^i$.
Every $r$-form $\omega$ of class $C^{(1)}$ has an exterior differential $d\omega$, which is a form
of degree $r + 1$. The usual formulas for differentials of sums and products
remain true, except for a possible change of sign in the product rule. Another
important fact is that $d(d\omega) = 0$ for any form of class $C^{(2)}$ (see Theorem 7.2).
One of the main results of Chapter 8 is Stokes's formula (8.20). This equates
the integral of $d\omega$ over a portion $A$ of an $(r + 1)$-dimensional manifold with
the integral of $\omega$ over the boundary of $A$, provided $A$ and its boundary are
oriented consistently. It is Stokes's formula that justifies the way in which the
exterior differential is defined.
In Section 7.5 we define the concept of $r$-vector. The $r$-vectors form a vector
space $E^n_r$, whose dual space turns out to be the space $(E^n_r)^*$ of $r$-covectors.
Certain multivectors, called decomposable, have an interesting geometric
interpretation. An $r$-vector $\alpha$ is decomposable if there are vectors $h_1, \ldots, h_r$
such that $\alpha = h_1 \wedge \cdots \wedge h_r$. It turns out (Theorem 7.3) that if $\alpha \neq 0$, then
$h_1, \ldots, h_r$ span an $r$-dimensional vector subspace $P$ of $E^n$. With $\alpha$ is associated
an orientation of $P$. If two of the vectors $h_1, \ldots, h_r$ are interchanged, then
$\alpha$ changes sign and the orientation of $P$ changes. The norm $|\alpha|$ of a
decomposable $r$-vector $\alpha$ equals the $r$-dimensional measure of a certain
$r$-parallelepiped.
Besides its differential $d\omega$, an $r$-form $\omega$ of class $C^{(1)}$ has a codifferential,
which is a form of degree $r - 1$ (Section 7.8). In Chapter 8 the codifferential
is used only for $r = 1$, in which case it becomes the divergence. In Section
7.9 the basic formulas of vector analysis in $E^3$ are derived.

7.1 Covectors and differential forms of degree 2
Let us call a real valued function $L$ with domain $E^n$ 1-linear if $L$ is linear. We
recall from Section 3.2 that a 1-linear function $L$ may be identified with the
corresponding covector $a = a_1 e^1 + \cdots + a_n e^n$, where $a_i = L(e_i)$. Here
$\{e_1, \ldots, e_n\}$ and $\{e^1, \ldots, e^n\}$ are, respectively, the standard bases for $E^n$ and
its dual $(E^n)^*$. The linear function corresponding to $e^i$ is the standard cartesian
coordinate function $X^i$, such that $X^i(x) = x^i$ for all $x$. In this chapter a
covector $a$ is also called a covector of degree 1, or for brevity, a 1-covector.

Alternating bilinear functions


In order to define covectors of degree 2, let us first recall the concept of
bilinear function. Let $B$ be a real valued function with domain the cartesian
product $E^n \times E^n$. The elements of $E^n \times E^n$ are pairs of vectors, denoted by
$(h, k)$. For fixed $h$, let $B(h, \cdot\,)$ denote the function whose value at $k$ is $B(h, k)$.
Similarly, for fixed $k$, $B(\,\cdot, k)$ is the function whose value at $h$ is $B(h, k)$. The
function $B$ is bilinear if $B(h, \cdot\,)$ and $B(\,\cdot, k)$ are linear functions for every $h$, $k$.
We have

$h = \sum_{i=1}^{n} h^i e_i, \qquad k = \sum_{j=1}^{n} k^j e_j.$

If $B$ is bilinear, then

$B(h, k) = \sum_{i=1}^{n} h^i B(e_i, k) = \sum_{i,j=1}^{n} h^i k^j B(e_i, e_j).$

Let

(7.1)  $\omega_{ij} = B(e_i, e_j), \qquad i, j = 1, \ldots, n;$

then for every $(h, k)$

(7.2)  $B(h, k) = \sum_{i,j=1}^{n} \omega_{ij} h^i k^j.$

In this chapter we are interested in a special class of $r$-linear functions,
called alternating. For $r = 2$, $B$ is alternating if $B(h, k) = -B(k, h)$ for every
$(h, k)$. If $B$ is bilinear and alternating, then $\omega_{ij} = -\omega_{ji}$, and in particular
$\omega_{ii} = 0$. Formula (7.2) can be rewritten

$B(h, k) = \sum_{i<j} (\omega_{ij} h^i k^j + \omega_{ji} h^j k^i)$

or

(7.3)  $B(h, k) = \sum_{i<j} \omega_{ij}(h^i k^j - h^j k^i).$

Conversely, given $n(n - 1)/2$ numbers $\omega_{ij}$, $i < j$, Formula (7.3) defines an
alternating bilinear function.
It is convenient to introduce a different name and notation for alternating
bilinear functions.

Definition. A covector of degree 2 (or 2-covector) is an alternating bilinear
function on $E^n \times E^n$. The space of 2-covectors is denoted by $(E^n_2)^*$.

If $B_1$ and $B_2$ are alternating bilinear functions, then the sum $B_1 + B_2$ is
also alternating and bilinear. Similarly, $cB$ is alternating and bilinear if $c$
is real and $B$ is alternating and bilinear. Thus $(E^n_2)^*$ satisfies the axioms for a vector
space (Appendix A.1).
From now on, we generally use lowercase Greek letters, such as $\omega, \zeta, \eta, \ldots$,
to denote 2-covectors rather than $B$. Let us introduce some particular
2-covectors, denoted by $e^{ij}$. For every pair of vectors $(h, k) \in E^n \times E^n$, let

$e^{ij}(h, k) = h^i k^j - h^j k^i.$


The function $e^{ij}$ is alternating and bilinear, for any $i, j = 1, \ldots, n$. Thus $e^{ij}$
is a 2-covector. We also have

(7.4)  $e^{ji} = -e^{ij}.$

Since (7.3) holds for all $(h, k)$, we have upon writing $\omega$ instead of $B$ to denote a
2-covector,

(7.5)  $\omega = \sum_{i<j} \omega_{ij} e^{ij}.$

Conversely, given $n(n - 1)/2$ numbers $\omega_{ij}$, $i < j$, Formula (7.5) defines an
alternating bilinear function. The 2-covectors $e^{ij}$, $i < j$, are linearly
independent (Problem 2). These $n(n - 1)/2$ 2-covectors form a basis for $(E^n_2)^*$, called
the standard basis. In particular, $(E^n_2)^*$ is of dimension $n(n - 1)/2$.
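The expansion (7.5) can be tried out directly. The following is a minimal sketch, assuming 1-based indices as in the text and plain Python lists for vectors; the helper names `e` and `two_covector` and the sample coefficients are illustrative choices. It builds a 2-covector on $E^3$ from its three coefficients, evaluates it on a pair of vectors, and checks the alternating property and the recovery of a coefficient as the value on a pair of standard basis vectors, as in (7.1).

```python
def e(i, j):
    """Basis 2-covector e^{ij}: (h, k) -> h^i k^j - h^j k^i (1-based indices)."""
    def value(h, k):
        return h[i - 1] * k[j - 1] - h[j - 1] * k[i - 1]
    return value

def two_covector(w, n):
    """Build the 2-covector sum_{i<j} w[(i, j)] e^{ij} from a dict of coefficients."""
    def value(h, k):
        return sum(w.get((i, j), 0.0) * e(i, j)(h, k)
                   for i in range(1, n + 1) for j in range(i + 1, n + 1))
    return value

n = 3
w = {(1, 2): 2.0, (1, 3): -1.0, (2, 3): 0.5}   # sample coefficients
omega = two_covector(w, n)
h = [1.0, 2.0, 3.0]
k = [0.0, -1.0, 4.0]

print(omega(h, k), omega(k, h))                 # opposite signs
print(omega([1, 0, 0], [0, 1, 0]))              # recovers the coefficient w[(1, 2)]
```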

EXAMPLE 1. Let $n = 3$. The 2-covectors $\{e^{12}, e^{13}, e^{23}\}$ form the standard
basis for $(E^3_2)^*$, which is 3-dimensional. Every $\omega \in (E^3_2)^*$ can be uniquely
written as $\omega = \omega_{12} e^{12} + \omega_{13} e^{13} + \omega_{23} e^{23}$, where the coefficients $\omega_{12}$, $\omega_{13}$,
$\omega_{23}$ are real numbers.

Exterior products of 1-covectors

One way to get a 2-covector is by multiplying two 1-covectors, using a new
kind of product (called the exterior product). The symbol for this kind of
multiplication is $\wedge$. To define the exterior product $a \wedge b$ of 1-covectors $a$ and
$b$, we first specify how to multiply standard basis covectors $e^i$ and $e^j$, and then
we agree that $\wedge$ distributes with addition and scalar multiplication. Let us
set $e^i \wedge e^j = e^{ij}$. If

$a = \sum_{i=1}^{n} a_i e^i, \qquad b = \sum_{j=1}^{n} b_j e^j,$

we then set

$a \wedge b = \sum_{i,j=1}^{n} a_i b_j e^{ij} = \sum_{i<j} (a_i b_j - a_j b_i) e^{ij}.$

In the last equality, we have used (7.4). Thus, the components of $a \wedge b$ in
(7.5) are $\omega_{ij} = a_i b_j - a_j b_i$.
We see in Section 7.5 that every 2-covector is of the form $\omega = a \wedge b$ for
dimensions $n = 2, 3$, but this does not apply for $n > 3$.
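The component formula for the exterior product of two 1-covectors translates into a few lines of code. The sketch below is illustrative (the names `wedge`, `evaluate`, `dot` and the sample covectors are assumptions). It also checks an identity that follows from the definition by expanding the double sum: $(a \wedge b)(h, k) = (a \cdot h)(b \cdot k) - (a \cdot k)(b \cdot h)$.

```python
def wedge(a, b):
    """Components w[(i, j)], i < j, of the exterior product of 1-covectors a and b."""
    n = len(a)
    return {(i, j): a[i - 1] * b[j - 1] - a[j - 1] * b[i - 1]
            for i in range(1, n + 1) for j in range(i + 1, n + 1)}

def evaluate(w, h, k):
    """Value of the 2-covector with components w on the pair (h, k), as in (7.3)."""
    return sum(c * (h[i - 1] * k[j - 1] - h[j - 1] * k[i - 1])
               for (i, j), c in w.items())

def dot(a, h):
    """Value a . h of the 1-covector a at the vector h."""
    return sum(p * q for p, q in zip(a, h))

a = [1, 2, 0]
b = [0, 1, 3]
w = wedge(a, b)
h = [1, 0, 2]
k = [-1, 1, 1]

print(w)                                              # components for i < j
print(evaluate(w, h, k),
      dot(a, h) * dot(b, k) - dot(a, k) * dot(b, h))  # the two values agree
```

Swapping the factors negates every component, and `wedge(a, a)` has all components 0, in line with the anticommutativity expressed by (7.4).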

Differential forms of degree 2


The definition is quite analogous to the case of degree 1 in Section 6.3.

Definition. A differential form of degree 2 (or 2-form) is a function $\omega$ with
domain $D \subset E^n$ and values in $(E^n_2)^*$. The value of $\omega$ at $x$ is denoted by $\omega(x)$.

The values of $\omega$ are 2-covectors. The same Greek letters $\omega, \zeta, \eta, \ldots$ used
above to denote 2-covectors are now used to denote differential forms.


One way to get a 2-form is by multiplying two 1-forms, using the product
$\wedge$. If $\zeta$ and $\eta$ are 1-forms, then $\zeta \wedge \eta$ is the 2-form obtained by multiplying
their values at each $x$:

$(\zeta \wedge \eta)(x) = \zeta(x) \wedge \eta(x).$

In particular, $dx^i$ denotes the 1-form whose (constant) value at any $x$ is the
1-covector $e^i$. Then $dx^i \wedge dx^j$ has the constant value $e^i \wedge e^j = e^{ij}$. By (7.5),

$\omega(x) = \sum_{i<j} \omega_{ij}(x) e^{ij}$

for each $x \in D$. Thus

(7.6)  $\omega = \sum_{i<j} \omega_{ij}\,dx^i \wedge dx^j.$

By (7.4), we have

(7.7)  $dx^j \wedge dx^i = -dx^i \wedge dx^j.$

Formula (7.6) represents any 2-form $\omega$ as a homogeneous quadratic
polynomial in $dx^1, \ldots, dx^n$, whose coefficients $\omega_{ij}$ are real valued functions on $D$.

EXAMPLE 2. Let $n = 3$ and $\zeta = dz - y\,dy$, $\eta = z\,dx - \cos x\,dz$. Then

$\zeta \wedge \eta = (dz - y\,dy) \wedge (z\,dx - \cos x\,dz)$
$\qquad = yz\,dx \wedge dy - z\,dx \wedge dz + y \cos x\,dy \wedge dz.$

We have used the fact that $\wedge$ obeys the usual rules of algebra, except that
(7.7) replaces the commutative law. For instance, $dy \wedge dx = -dx \wedge dy$ and
$dz \wedge dz = 0$.

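The exterior differential of a 1-form

We recall from Section 6.3 that the differential $df$ of a real valued function $f$
of class $C^{(1)}$ is the 1-form whose value at $x$ is $df(x)$. Now consider a 1-form
$\omega = \sum_{i=1}^{n} \omega_i\,dx^i$, where the coefficient functions $\omega_i$ are of class $C^{(1)}$ on $D$.
The exterior differential $d\omega$ is the 2-form defined by

$d\omega = \sum_{i=1}^{n} d\omega_i \wedge dx^i.$

We have

$d\omega = \sum_{i=1}^{n} \sum_{j=1}^{n} \dfrac{\partial \omega_i}{\partial x^j}\,dx^j \wedge dx^i.$

By (7.7), this can be rewritten

(7.8)  $d\omega = \sum_{i<j} \left( \dfrac{\partial \omega_j}{\partial x^i} - \dfrac{\partial \omega_i}{\partial x^j} \right) dx^i \wedge dx^j.$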


In Section 6.3 we defined the concepts of closed and exact differential forms
of degree 1. By (6.11) and (7.8), a 1-form $\omega$ of class $C^{(1)}$ is closed if and only if
$d\omega = 0$. If $f$ is of class $C^{(2)}$, then $df$ is exact (hence closed) and of class $C^{(1)}$.
Therefore $d(df) = 0$.

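EXAMPLE 3. Let $n = 2$ and $\omega = M\,dx + N\,dy$. Then

$dM = \dfrac{\partial M}{\partial x}\,dx + \dfrac{\partial M}{\partial y}\,dy, \qquad dN = \dfrac{\partial N}{\partial x}\,dx + \dfrac{\partial N}{\partial y}\,dy,$

$d\omega = dM \wedge dx + dN \wedge dy = \left( \dfrac{\partial N}{\partial x} - \dfrac{\partial M}{\partial y} \right) dx \wedge dy.$

As in Section 6.3, the condition that $\omega$ be closed is $\partial M/\partial y = \partial N/\partial x$.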

EXAMPLE 4. If $n = 3$ and $\omega = 2\,dx + z^2\,dy + x^2 y\,dz$, then

$d\omega = d(2) \wedge dx + d(z^2) \wedge dy + d(x^2 y) \wedge dz$
$\qquad = 2z\,dz \wedge dy + 2xy\,dx \wedge dz + x^2\,dy \wedge dz.$
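Formula (7.8) lends itself to a numerical check. The sketch below is an illustration only: it approximates the partial derivatives by central differences (the helper `d_one_form` and the evaluation point are assumptions) and applies (7.8) to the 1-form $2\,dx + z^2\,dy + x^2 y\,dz$ of Example 4. Collecting the result of Example 4 in the basis $dx^i \wedge dx^j$, $i < j$, gives $2xy\,dx \wedge dz + (x^2 - 2z)\,dy \wedge dz$, so at the point $(1, 2, 3)$ one expects the coefficients 0, 4, and $-5$.

```python
def d_one_form(omega, x, eps=1e-5):
    """Coefficients of the exterior differential at the point x, following (7.8):
    c[(i, j)] = d(omega_j)/dx^i - d(omega_i)/dx^j for i < j (1-based keys)."""
    n = len(x)

    def partial(f, i):
        hi, lo = list(x), list(x)
        hi[i] += eps
        lo[i] -= eps
        return (f(*hi) - f(*lo)) / (2 * eps)

    return {(i + 1, j + 1): partial(omega[j], i) - partial(omega[i], j)
            for i in range(n) for j in range(i + 1, n)}

# Coefficient functions of 2 dx + z^2 dy + x^2 y dz (Example 4).
omega = [lambda x, y, z: 2.0,
         lambda x, y, z: z * z,
         lambda x, y, z: x * x * y]

c = d_one_form(omega, [1.0, 2.0, 3.0])
print(c)   # approximately {(1, 2): 0, (1, 3): 4, (2, 3): -5}
```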

The definition of $d\omega$ given above may at first seem rather arbitrary. We
show at the end of Section 7.4 that the exterior differential is uniquely
specified by a few reasonable properties that may be required of it. However, let
us now show how to arrive at $d\omega$ in another way, starting from a special case
of Stokes's formula. We require that Stokes's formula hold for any
parallelogram (in Section 8.8 it is proved under much more general circumstances).
The following discussion is intended purely as motivation, and can be
omitted if desired. Consider a parallelogram $K$ with vertices $x_0$, $x_0 + h$,
$x_0 + k$, $x_0 + h + k$, and let

$g(s, t) = x_0 + sh + tk.$

Then $K = g(I)$, where $I = [0, 1] \times [0, 1]$ is a square in the $(s, t)$ plane. The
pair of vectors $(h, k)$, written in the given order, assigns an orientation to $K$
(we postpone a formal definition of the concept of orientation until Section
7.5). The integral of a 2-form $\eta$ over an oriented parallelogram $K$ is denoted
by $\int_{K^o} \eta$. It is defined as follows. Let $\eta(x)$ be the value of $\eta$ at $x$; $\eta(x)$ is a
2-covector, that is, an alternating bilinear function, which assigns to the pair
$(h, k)$ the real number $\eta(x)(h, k)$. Let

(*)  $\int_{K^o} \eta = \int_I \eta[g(s, t)](h, k)\,dV_2(s, t).$
Note that for a constant 2-form, namely, $\eta(x) = \eta_0$ for all $x$, the integral
becomes simply $\eta_0(h, k)$. The definition (*) can then be motivated by dividing
$K$ into a large number $m^2$ of small parallelograms congruent to $K$. The right
side of (*) is the limit as $m \to \infty$ of the sum

$\sum_{l=1}^{m^2} \eta(x_l)(m^{-1}h, m^{-1}k) = m^{-2} \sum_{l=1}^{m^2} \eta(x_l)(h, k),$

where $x_l$ is some point of the $l$th small parallelogram.

Figure 7.1 [figure not reproduced]


Now let $\gamma$ be the polygon bounding $K$, traversed in the sense indicated in
Figure 7.1. Let $\omega$ be a 1-form of class $C^{(1)}$. Then

(**)  $\int_\gamma \omega = \int_0^1 \{\omega[g(1, t)] - \omega[g(0, t)]\} \cdot k\,dt - \int_0^1 \{\omega[g(s, 1)] - \omega[g(s, 0)]\} \cdot h\,ds.$

The first integral on the right side is the contribution from the second and
fourth segments of $\gamma$, starting from $x_0$, and the second integral the
contribution from the first and third segments. Stokes's formula asserts that

(7.9)  $\int_\gamma \omega = \int_{K^o} \eta$

for a suitable 2-form $\eta$. Let us show that $\eta = d\omega$.
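Now

$\omega(x) \cdot k = \sum_{i=1}^{n} \omega_i(x) k^i,$

where $\omega_1, \ldots, \omega_n$ are the components of the 1-form $\omega$. By the fundamental
theorem of calculus

$\omega_i[g(1, t)] - \omega_i[g(0, t)] = \int_0^1 \dfrac{\partial}{\partial s}\,\omega_i[g(s, t)]\,ds = \int_0^1 d\omega_i[g(s, t)] \cdot \dfrac{\partial g}{\partial s}\,ds = \int_0^1 d\omega_i[g(s, t)] \cdot h\,ds.$

Similarly,

$\omega_i[g(s, 1)] - \omega_i[g(s, 0)] = \int_0^1 d\omega_i[g(s, t)] \cdot k\,dt.$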


By comparing these formulas with (*) and (**) we see that, if (7.9) holds for any
parallelogram $K$, we must require

$\eta(x)(h, k) = \sum_{i=1}^{n} \{(d\omega_i(x) \cdot h) k^i - (d\omega_i(x) \cdot k) h^i\}.$

But then

$\eta(x)(h, k) = \sum_{i=1}^{n} \sum_{j=1}^{n} \dfrac{\partial \omega_i}{\partial x^j}\,(h^j k^i - k^j h^i).$

Recall that $dx^j \wedge dx^i$ is the constant 2-form whose value is the 2-covector
$e^{ji}$, which assigns to $(h, k)$ the number $h^j k^i - k^j h^i$. Thus

$\eta = \sum_{i=1}^{n} \sum_{j=1}^{n} \dfrac{\partial \omega_i}{\partial x^j}\,dx^j \wedge dx^i,$

which is just the formula preceding (7.8) for $d\omega$. Thus $\eta = d\omega$.
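The parallelogram case of Stokes's formula can be checked numerically. The sketch below is a rough illustration under stated assumptions: it uses the 1-form $2\,dx + z^2\,dy + x^2 y\,dz$ of Example 4, whose exterior differential by (7.8) is $2xy\,dx \wedge dz + (x^2 - 2z)\,dy \wedge dz$, an arbitrarily chosen parallelogram, and the midpoint rule for both sides. The boundary integral follows the right side of (**); the surface integral follows (*) with the exterior differential as integrand. All function names are illustrative.

```python
def omega(x):
    """Components of the 1-form 2 dx + z^2 dy + x^2 y dz at the point x."""
    return (2.0, x[2] ** 2, x[0] ** 2 * x[1])

def d_omega(x, h, k):
    """Value of the exterior differential at x on the pair (h, k), from (7.8)."""
    c13 = 2.0 * x[0] * x[1]
    c23 = x[0] ** 2 - 2.0 * x[2]
    return (c13 * (h[0] * k[2] - h[2] * k[0])
            + c23 * (h[1] * k[2] - h[2] * k[1]))

def g(x0, h, k, s, t):
    """Parametrization g(s, t) = x0 + s h + t k of the parallelogram."""
    return [x0[i] + s * h[i] + t * k[i] for i in range(3)]

def dot(a, v):
    return sum(p * q for p, q in zip(a, v))

def boundary_integral(x0, h, k, m=400):
    """Right side of (**), by the midpoint rule with m subintervals."""
    total = 0.0
    for q in range(m):
        u = (q + 0.5) / m
        total += (dot(omega(g(x0, h, k, 1.0, u)), k)
                  - dot(omega(g(x0, h, k, 0.0, u)), k)) / m
        total -= (dot(omega(g(x0, h, k, u, 1.0)), h)
                  - dot(omega(g(x0, h, k, u, 0.0)), h)) / m
    return total

def surface_integral(x0, h, k, m=200):
    """Right side of (*) for the exterior differential, midpoint rule on an m x m grid."""
    total = 0.0
    for p in range(m):
        for q in range(m):
            s, t = (p + 0.5) / m, (q + 0.5) / m
            total += d_omega(g(x0, h, k, s, t), h, k) / (m * m)
    return total

x0, h, k = [0.5, 1.0, -1.0], [1.0, 0.5, 0.0], [0.0, 1.0, 2.0]
print(boundary_integral(x0, h, k), surface_integral(x0, h, k))
```

The two printed values should agree up to the discretization error of the midpoint rule.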

PROBLEMS

1. Following Example 1, write out for $n = 2$ and $n = 4$ the standard basis for $(E^n_2)^*$ and
express any 2-covector as a linear combination of the basis covectors $e^{ij}$.
2. Show that the 2-covectors $e^{ij}$, $1 \le i < j \le n$, are linearly independent.
3. Let n = 3. Simplify:
(a) (e l + el ) /\ (e! - 3e 3 ).
(b) (dy - x dz) /\ (xy dx + 3 dy + z dz).
4. Let n = 2. Find the exterior differential of:
(a) xly dy - xyl dx.
(b) x dy + y dx.
(c) f(x)dx + g(y)dy.
(d) $f(x, y)\,dy$.

5. Let a, b, and c be 1-covectors. Show that:


(a) (a + b) /\ C = a /\ c + b /\ c.
(b) (ka) /\ b = k(a /\ b), if k is real.
(c) b /\ a = -a /\ b.
(d) a /\ a = O.
(e) (a + b) /\ (a - b) = - 2a /\ b.

6. Show that if $a$, $b$, $c$ are 1-covectors, then

$a \wedge b + b \wedge c + c \wedge a = (a - b) \wedge (b - c).$

7. Show that if $\omega = a \wedge b$, then $\omega_{ij}\omega_{kl} + \omega_{ik}\omega_{lj} + \omega_{il}\omega_{jk} = 0$ for $i, j, k, l = 1, \ldots, n$.
[Hint: $\omega_{ij} = a_i b_j - a_j b_i$. Show that


7.2 Alternating multilinear functions


In preparation for studying differential forms of any degree $r$, let us first
discuss alternating multilinear functions of degree $r$.
For any $r \ge 2$ let $M$ be a real valued function with domain the $r$-fold
cartesian product $E^n \times \cdots \times E^n$. The elements of $E^n \times \cdots \times E^n$ are $r$-tuples
of vectors, denoted by $(h_1, \ldots, h_r)$.

Definition. The function $M$ is multilinear of degree $r$ if for each $l = 1, \ldots, r$
and $h_1, \ldots, h_{l-1}, h_{l+1}, \ldots, h_r$ the function $M(h_1, \ldots, h_{l-1}, \,\cdot\,, h_{l+1}, \ldots, h_r)$
is linear.

For brevity we write $r$-linear instead of multilinear of degree $r$. When
$r = 2$ we wrote $h_1 = h$, $h_2 = k$. The new definition agrees for $r = 2$ with the
definition of bilinear function. The formula which generalizes (7.2) to
multilinear functions is

(7.10)  $M(h_1, \ldots, h_r) = \sum_{i_1, \ldots, i_r = 1}^{n} \omega_{i_1 \cdots i_r}\,h_1^{i_1} \cdots h_r^{i_r},$

where

(7.11)  $\omega_{i_1 \cdots i_r} = M(e_{i_1}, \ldots, e_{i_r}).$

This is proved by induction on $r$.

Interchanges
Let $S$ be some set. For our purposes we shall take either $S = E^n$ or $S =
\{1, 2, \ldots, n\}$. If $(p_1, \ldots, p_r)$ and $(p_1', \ldots, p_r')$ are $r$-tuples of elements of $S$, let us
say that the second $r$-tuple is obtained from the first by interchanging $p_s$
and $p_t$ if $p_s' = p_t$, $p_t' = p_s$, and $p_l' = p_l$ for $l \neq s, t$.

EXAMPLE 1. The triple of vectors $(h_3, h_2, h_1)$ is obtained from $(h_1, h_2, h_3)$ by
interchanging $h_3$ and $h_1$. The 4-tuple of integers $(1, 5, 3, 7)$ is obtained from
the 4-tuple $(1, 7, 3, 5)$ by interchanging 5 and 7.

Definition. An $r$-linear function $M$ is alternating if $M(h_1, \ldots, h_r)$ changes
sign whenever two vectors in an $r$-tuple $(h_1, \ldots, h_r)$ are interchanged.

We recall that an $r$-tuple $(h_1, \ldots, h_r)$ is called linearly dependent if there
exist scalars $c^1, \ldots, c^r$, not all 0, such that $c^1 h_1 + \cdots + c^r h_r = 0$.

Proposition 7.1. Let $M$ be $r$-linear and alternating. If $(h_1, \ldots, h_r)$ is a linearly
dependent $r$-tuple, then $M(h_1, \ldots, h_r) = 0$.


PROOF. First of all, the conclusion is true if some vector in the $r$-tuple is
repeated. For instance, suppose that $h_1 = h_2$. Since $M$ is alternating,

$M(h_1, h_2, h_3, \ldots, h_r) = -M(h_2, h_1, h_3, \ldots, h_r).$

Then $M(h_1, h_1, h_3, \ldots, h_r)$ is its own negative, and must be 0.
Suppose for instance that $h_r$ is a linear combination of the vectors
preceding it,

$h_r = c^1 h_1 + \cdots + c^{r-1} h_{r-1}.$

Since $M(h_1, \ldots, h_{r-1}, \,\cdot\,)$ is a linear function,

$M(h_1, \ldots, h_r) = \sum_{l=1}^{r-1} c^l M(h_1, \ldots, h_{r-1}, h_l).$

In the $l$th term on the right-hand side, the vector $h_l$ is repeated, and hence
each term is 0. Thus $M(h_1, \ldots, h_r) = 0$. □

For any $r \ge 2$ there is the trivial alternating $r$-linear function 0, which
has the value 0 for every $r$-tuple $(h_1, \ldots, h_r)$. If $r > n$, then $(h_1, \ldots, h_r)$ must be
linearly dependent and from the proposition we get the following.

Corollary. If $r > n$, then 0 is the only alternating $r$-linear function.

When $r = n$, an alternating $n$-linear function is

(7.12)  $D(h_1, \ldots, h_n) = \det(h^i_j),$

namely, the determinant of the matrix that has $h_1, \ldots, h_n$ as its column vectors.
We show at the end of this section that any alternating $n$-linear function $M$ is
a scalar multiple of $D$. The determinant is the alternating $n$-linear function
with the additional property $D(e_1, \ldots, e_n) = 1$.
Let us now turn to the case of alternating $r$-linear functions, for any
$r \le n$. We know that the sum of two linear functions is a linear function. From
this fact and the definition of multilinear function, the sum $M + N$ of two
$r$-linear functions $M$ and $N$ is $r$-linear. If $M$ and $N$ are alternating, then
$M + N$ is alternating. Similarly, if $c$ is a scalar then $cM$ is $r$-linear when $M$
is $r$-linear and alternating when $M$ is alternating. Thus, the set of alternating
$r$-linear functions satisfies the axioms for a vector space (Appendix A.1). In
later sections this space is denoted by $(E^n_r)^*$.
To obtain a representation of any alternating $r$-linear function $M$, in
Proposition 7.2, we need to introduce some more notation. The letter $\lambda$
denotes an $r$-tuple of integers,

$\lambda = (i_1, \ldots, i_r),$

where $1 \le i_k \le n$ for each $k = 1, \ldots, r$. There are $n^r$ such $r$-tuples of integers.
If $i_1 < \cdots < i_r$, then $\lambda$ is called an increasing $r$-tuple. There are $\binom{n}{r}$ increasing
$r$-tuples, where $\binom{n}{r} = n!/r!(n - r)!$ is a binomial coefficient. We write $\sum_\lambda$
for a sum over all $r$-tuples and $\sum_{[\lambda]}$ for a sum over all increasing $r$-tuples.


The following generalization of the Kronecker symbol $\delta^i_j$ (Section 1.2) is
used. Let $\lambda = (i_1, \ldots, i_r)$, $\mu = (j_1, \ldots, j_r)$ be $r$-tuples of integers. Then $\delta^{i_k}_{j_l}$ is an
element of an $r \times r$ matrix, and is 1 if $i_k = j_l$, 0 otherwise. Let

$\delta^\lambda_\mu = \det(\delta^{i_k}_{j_l}).$

The important properties of $\delta^\lambda_\mu$ are:

(1) If no integer is repeated in the $r$-tuple $\lambda$ and $\mu = \lambda$, then $\delta^\lambda_\mu = 1$.
In this case $i_k = j_l$ if and only if $k = l$. Hence $\delta^\lambda_\lambda = \det(\delta^k_l) = 1$.
(2) If no integer is repeated in the $r$-tuple $\mu$ and $\lambda$ is obtained from $\mu$ by $p$
interchanges, then $\delta^\lambda_\mu = (-1)^p$.
Each interchange of elements of $\mu$ interchanges two column vectors of the
matrix $(\delta^{i_k}_{j_l})$ and changes the sign of the determinant. Therefore (2) follows
from (1).
(3) In all other cases, $\delta^\lambda_\mu = 0$.
If some integer is repeated in $\mu$, then two column vectors of the matrix are
the same and the determinant is 0. If the integers $j_1, \ldots, j_r$ are distinct and
some $i_k$ does not appear among them, then the $k$th row covector of the matrix
is 0 and the determinant is 0.
Now let $M$ be an alternating $r$-linear function. For brevity let us set

$\omega_\lambda = \omega_{i_1 \cdots i_r}.$

Sometimes we will still write $\omega_{i_1 \cdots i_r}$ rather than $\omega_\lambda$, particularly when $r \le 3$
or $r = n$. If $\lambda$ is obtained from $\mu$ by one interchange, then $\omega_\lambda = -\omega_\mu$. In
particular, $\omega_\lambda = 0$ if any integer is repeated. If $\lambda$ is obtained from $\mu$ by $p$
interchanges, then $\omega_\mu = (-1)^p \omega_\lambda = \delta^\lambda_\mu \omega_\lambda$.
If $\mu$ has no repetitions, then exactly one increasing $\lambda$ is obtained from $\mu$
by interchanges. Hence for every $\mu$,

(7.13)  $\omega_\mu = \sum_{[\lambda]} \omega_\lambda \delta^\lambda_\mu,$

where at most one term on the right-hand side is different from 0.

EXAMPLE 2. Let $n = 5$, $r = 4$. Then $\omega_{1231} = 0$ since 1 is repeated in the 4-tuple
$\lambda = (1, 2, 3, 1)$. Since $(2, 3, 4, 5)$ is obtained from $(5, 4, 2, 3)$ by an odd number of
interchanges, $\omega_{2345} = -\omega_{5423}$.
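The generalized Kronecker symbol is easy to compute for small $r$ straight from its definition as a determinant of 0s and 1s. The sketch below is an illustration, with hypothetical helper names `sign`, `det`, and `delta`; it expands the determinant over permutations, which is adequate only for small matrices. It reproduces the three properties listed above on a few sample tuples, including the pair of 4-tuples from Example 2.

```python
from itertools import permutations

def sign(perm):
    """Sign of a permutation of range(len(perm)), found by counting inversions."""
    inversions = sum(1 for a in range(len(perm))
                     for b in range(a + 1, len(perm)) if perm[a] > perm[b])
    return -1 if inversions % 2 else 1

def det(m):
    """Determinant by expansion over permutations (suitable for small matrices)."""
    r = len(m)
    total = 0
    for perm in permutations(range(r)):
        term = sign(perm)
        for row in range(r):
            term *= m[row][perm[row]]
        total += term
    return total

def delta(lam, mu):
    """Generalized Kronecker symbol: determinant of the r x r matrix whose
    (k, l) entry is 1 if i_k == j_l and 0 otherwise."""
    return det([[1 if i == j else 0 for j in mu] for i in lam])

print(delta((1, 2, 3), (1, 2, 3)))         # property (1)
print(delta((2, 3, 4, 5), (5, 4, 2, 3)))   # property (2), odd number of interchanges
print(delta((1, 2), (1, 3)))               # property (3)
```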

Let us now consider some particular alternating $r$-linear functions, which
play an important role in later sections. For each $r$-tuple $\lambda = (i_1, \ldots, i_r)$ let
$e^\lambda$ be the function such that

(7.14)  $e^\lambda(h_1, \ldots, h_r) = \det(h^{i_k}_l)$

for every $r$-tuple of vectors $(h_1, \ldots, h_r)$. Note that the $r \times r$ matrix $(h^{i_k}_l)$ is
formed from rows $i_1, \ldots, i_r$ of the $n \times r$ matrix $(h^i_l)$ that has $h_1, \ldots, h_r$ as
column vectors. By properties of determinants, $e^\lambda$ is $r$-linear and alternating.
When $r = 2$ and $\lambda = (i, j)$, then $e^{ij}(h, k) = h^i k^j - h^j k^i$. Formula (7.14)
agrees with the definition of $e^{ij}$ in Section 7.1.
Taking, in particular, $h_1, \ldots, h_r$ to be standard basis vectors, $h_l = e_{j_l}$ for
$l = 1, \ldots, r$, we obtain in (7.14) the matrix $(\delta^{i_k}_{j_l})$ whose determinant is $\delta^\lambda_\mu$.
Thus

(7.15)  $e^\lambda(e_{j_1}, \ldots, e_{j_r}) = \delta^\lambda_\mu.$

If $\lambda$ is obtained from $\mu$ by an interchange, then two row covectors of the matrix
$(h^{i_k}_l)$ in (7.14) are interchanged. The determinant changes sign. Hence

$e^\lambda(h_1, \ldots, h_r) = -e^\mu(h_1, \ldots, h_r)$

for every $r$-tuple $(h_1, \ldots, h_r)$, which means that $e^\lambda = -e^\mu$. In particular,
$e^\lambda = 0$ if $\lambda$ has any repetitions. If $\lambda$ is obtained from $\mu$ by $p$ interchanges,
then $e^\lambda = (-1)^p e^\mu$.
Let us make the convention that $e^\lambda = 0$ in case $r > n$. This is useful in
defining the exterior product in Section 7.3.

Proposition 7.2 ($r \le n$). Let $M$ be alternating and $r$-linear. Then

(7.16)  $M = \sum_{[\lambda]} \omega_\lambda e^\lambda,$

where the numbers $\omega_\lambda$ are defined by (7.11) and the alternating $r$-linear
functions $e^\lambda$ by (7.14). The notation $[\lambda]$ indicates a sum over all increasing
$r$-tuples $\lambda = (i_1, \ldots, i_r)$.
PROOF. Let $\tilde M$ equal the right-hand side of (7.16). For each $\mu = (j_1, \ldots, j_r)$,

$\tilde M(e_{j_1}, \ldots, e_{j_r}) = \sum_{[\lambda]} \omega_\lambda e^\lambda(e_{j_1}, \ldots, e_{j_r}).$

From (7.13) and (7.15),

$\tilde M(e_{j_1}, \ldots, e_{j_r}) = \sum_{[\lambda]} \omega_\lambda \delta^\lambda_\mu = \omega_\mu.$

But $M$ and $\tilde M$ are $r$-linear and have the same value $\omega_\mu$ at $(e_{j_1}, \ldots, e_{j_r})$ for
each $\mu$. By (7.10), $M = \tilde M$. □

Let us return to the special case $r = n$ mentioned earlier. The only
increasing $n$-tuple of integers between 1 and $n$ is $(1, 2, \ldots, n)$. By Proposition 7.2,
any alternating, $n$-linear function has the form $M = ce^{1 \cdots n}$, where $c = \omega_{1 \cdots n}$
is a scalar. In the notation of (7.12), $e^{1 \cdots n} = D$.
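Proposition 7.2 can be checked on a concrete example. The sketch below is illustrative only: it takes $n = 4$, $r = 3$, and a sample alternating 3-linear function $M$ defined (as an assumption for the example, not taken from the text) as the determinant of the values of three fixed covectors on $h_1, h_2, h_3$. It computes the coefficients by (7.11) for increasing triples, builds the right-hand side of (7.16) with the functions of (7.14), and compares the two on a sample triple of vectors. Integer entries keep the arithmetic exact.

```python
from itertools import combinations, permutations

def sign(perm):
    """Sign of a permutation of range(len(perm)), found by counting inversions."""
    inversions = sum(1 for a in range(len(perm))
                     for b in range(a + 1, len(perm)) if perm[a] > perm[b])
    return -1 if inversions % 2 else 1

def det(m):
    """Determinant by expansion over permutations (suitable for small matrices)."""
    total = 0
    for perm in permutations(range(len(m))):
        term = sign(perm)
        for row in range(len(m)):
            term *= m[row][perm[row]]
        total += term
    return total

def e(lam, vectors):
    """Formula (7.14): determinant of rows i_1, ..., i_r (1-based) of the matrix
    whose columns are the given vectors."""
    return det([[h[i - 1] for h in vectors] for i in lam])

def M(h1, h2, h3):
    """Sample alternating 3-linear function on E^4 (an illustrative assumption):
    determinant of the values of three fixed covectors on h1, h2, h3."""
    covectors = ([1, 0, 2, -1], [0, 1, 1, 3], [2, -1, 0, 1])
    return det([[sum(p * q for p, q in zip(a, h)) for h in (h1, h2, h3)]
                for a in covectors])

n, r = 4, 3
basis = [[1 if p == q else 0 for p in range(n)] for q in range(n)]
# Coefficients by (7.11), for increasing triples only.
w = {lam: M(*[basis[i - 1] for i in lam])
     for lam in combinations(range(1, n + 1), r)}

def expansion(*vectors):
    """Right-hand side of (7.16)."""
    return sum(w[lam] * e(lam, vectors) for lam in w)

hs = ([1, 2, 0, -1], [3, 0, 1, 1], [0, 1, 1, 2])
print(M(*hs), expansion(*hs))   # the two values coincide
```

The same functions can be used to observe Proposition 7.1: replacing the third vector by the sum of the first two gives the value 0.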

PROBLEMS

1. Let n = 5. Find

2. Let $n = 4$, $r = 3$, $\omega_{123} = 2$, $\omega_{134} = -1$, and $\omega_\lambda = 0$ for every other increasing
triple $\lambda$. Find $M(e_4, e_1 - e_3, e_2 + e_3)$.


3. Show that:
(a) b~ = b~.

[Hint for (c): Use (b) and (7.15).]


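4. Let $M$ be $r$-linear, not necessarily alternating. Let $\omega_\lambda$ be as in (7.11) and
$\tilde{\omega}_\mu = (1/r!) \sum_\lambda \omega_\lambda \delta^\lambda_\mu$. The function $M_1 = \sum_{[\lambda]} \tilde{\omega}_\lambda e^\lambda$ is $r$-linear and alternating.
(a) Show that $M_1 = (1/r!) \sum_\lambda \tilde{\omega}_\lambda e^\lambda$. [Hint: $\tilde{\omega}_\mu = \sum_{[\lambda]} \tilde{\omega}_\lambda \delta^\lambda_\mu$; use Problem 3(c).]
(b) Show that if $M$ is alternating, then $\tilde{\omega}_\mu = \omega_\mu$ and hence $M_1 = M$.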

7.3 Multicovectors
Let us now introduce a different name and a different notation for alter-
nating, multilinear functions.

Definition. A multicovector of degree $r$ is an alternating $r$-linear function with
domain the $r$-fold cartesian product $E^n \times \cdots \times E^n$.

For brevity, multicovectors of degree $r$ are called $r$-covectors. From now
on multicovectors are ordinarily denoted by the Greek letters $\omega$ or $\zeta$ rather
than $M$ as in Section 7.2.
We observed in Section 7.2 that the set of all $r$-covectors satisfies the
axioms for a vector space. Let us denote this space by $(E^n_r)^*$. When $r > n$, its
only element is 0 by Proposition 7.1. When $1 \le r \le n$, Proposition 7.2
states that if $\omega$ is any $r$-covector, then
(7.17) $\omega = \sum_{[\lambda]} \omega_\lambda e^\lambda$.

Therefore the $r$-covectors $e^\lambda$ with $\lambda$ increasing span $(E^n_r)^*$. These $r$-covectors
form a linearly independent set (Problem 4), which is therefore a basis for
$(E^n_r)^*$. It is called the standard basis. The number $\omega_\lambda$ is the component of $\omega$
with respect to the basis element $e^\lambda$. Since there are $\binom{n}{r}$ increasing $r$-tuples
of integers between 1 and $n$, $(E^n_r)^*$ has dimension $\binom{n}{r}$.
Every 1-linear function is alternating. Thus a 1-covector is just a covector,
and $(E^n_1)^* = (E^n)^*$ is the dual space of $E^n$. If we identify the 1-tuple $(i)$ with
$i$, then the standard basis 1-covectors $e^1, \dots, e^n$ are just those introduced
in Section 3.2. As in previous chapters, we use the letters $a$, $b$ to denote
1-covectors, and $a \cdot h = \sum_{i=1}^n a_i h^i$ for the value of the 1-covector (i.e., linear
function) $a$ at $h$. For $r = 2$, Formula (7.17) becomes (7.5), already derived in
Section 7.1. We define in Section 7.5 a space $E^n_r$, whose dual turns out to be
$(E^n_r)^*$. The elements of $E^n_r$ are called multivectors.


If $r = n$, then we showed in Section 7.2 that the $n$-covector $e^{1 \cdots n}$ is essentially
the determinant function. Its value at $(h_1, \dots, h_n)$ is $\det(h^i_j)$, which is the
determinant of the $n \times n$ matrix with column vectors $h_1, \dots, h_n$. Since $(E^n_n)^*$
is one-dimensional, every $n$-covector has the form $\omega = c\,e^{1 \cdots n}$ where
$c = \omega_{1 \cdots n}$.

EXAMPLE 1. Let $n = 5$, $r = 3$, and $\omega = 6e^{145} - 2e^{431} - e^{514}$. The increasing
triple $(1, 4, 5)$ is obtained from $(5, 1, 4)$ by an even number of interchanges.
Hence $e^{514} = e^{145}$. The increasing triple $(1, 3, 4)$ is obtained from $(4, 3, 1)$ by
one interchange. Hence $e^{431} = -e^{134}$, and $\omega = 2e^{134} + 5e^{145}$. This expresses
$\omega$ as a linear combination of the standard basis 3-covectors. The
components of $\omega$ are $\omega_{134} = 2$, $\omega_{145} = 5$, and $\omega_\lambda = 0$ for every other
increasing triple $\lambda$.
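The bookkeeping in Example 1 (sort each index tuple, track the sign, collect like terms) is mechanical. A minimal sketch follows; the function names `normalize` and `standard_form` and the dictionary representation are assumptions made for illustration only.

```python
def normalize(index):
    """Return (sign, increasing tuple); the sign is 0 if an integer repeats."""
    idx = list(index)
    if len(set(idx)) < len(idx):
        return 0, tuple(sorted(idx))
    sign = 1
    # Bubble sort: every adjacent swap is one interchange and flips the sign.
    for i in range(len(idx)):
        for j in range(len(idx) - 1 - i):
            if idx[j] > idx[j + 1]:
                idx[j], idx[j + 1] = idx[j + 1], idx[j]
                sign = -sign
    return sign, tuple(idx)

def standard_form(terms):
    """terms: list of (coefficient, index tuple). Returns {increasing tuple: component}."""
    out = {}
    for c, idx in terms:
        s, lam = normalize(idx)
        if s:
            out[lam] = out.get(lam, 0) + s * c
    return {lam: c for lam, c in out.items() if c != 0}

# omega = 6 e^{145} - 2 e^{431} - e^{514} from Example 1
omega = standard_form([(6, (1, 4, 5)), (-2, (4, 3, 1)), (-1, (5, 1, 4))])
print(omega)   # {(1, 4, 5): 5, (1, 3, 4): 2}
```

The parity of the number of adjacent swaps used by the sort equals the parity of any other sequence of interchanges producing the same rearrangement, so the sign does not depend on the sorting method.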

Products
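In $(E^n_r)^*$ we define the euclidean inner product

$\omega \cdot \zeta = \sum_{[\lambda]} \omega_\lambda \zeta_\lambda$

and set $|\omega|^2 = \omega \cdot \omega$. The standard basis elements are orthonormal with
respect to this inner product.
Another important product is the exterior product, denoted by the symbol
$\wedge$. The exterior product of an $r$-covector and an $s$-covector is an $(r + s)$-covector,
defined as follows: If
$\lambda = (i_1, \dots, i_r)$, $\nu = (j_1, \dots, j_s)$,
let us write $\lambda, \nu$ for the $(r + s)$-tuple
$\lambda, \nu = (i_1, \dots, i_r, j_1, \dots, j_s)$.

Definition. Let $1 \le r \le n$, $1 \le s \le n$. If $\lambda$ and $\nu$ are increasing, then

(7.18) $e^\lambda \wedge e^\nu = e^{\lambda, \nu}$.

If $\omega$ is an $r$-covector and $\zeta$ is an $s$-covector, with respective components
$\omega_\lambda$, $\zeta_\nu$, then

$\omega \wedge \zeta = \sum_{[\lambda][\nu]} \omega_\lambda \zeta_\nu \, e^\lambda \wedge e^\nu$.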

Note that if $r + s > n$ then $\omega \wedge \zeta$, being an $(r + s)$-covector, must be 0.

EXAMPLE 2. Let $n = 4$. Then $e^{12} \wedge e^{34} = e^{1234}$.

since the integer 4 is repeated.


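Proposition 7.3. The exterior product has the following properties:

(1) $(\omega + \zeta) \wedge \eta = (\omega \wedge \eta) + (\zeta \wedge \eta)$.
(2) $(c\omega) \wedge \zeta = c(\omega \wedge \zeta)$.
(3) $\zeta \wedge \omega = (-1)^{rs}\, \omega \wedge \zeta$, if $\omega$ has degree $r$ and $\zeta$ has degree $s$.
(4) $(\zeta \wedge \omega) \wedge \eta = \zeta \wedge (\omega \wedge \eta)$.

PROOF. The proof of (1) and (2) is almost immediate from the definition and
is left to the reader (Problem 5). To prove (3), let $\lambda = (i_1, \dots, i_r)$ and
$\nu = (j_1, \dots, j_s)$, so that $\nu, \lambda = (j_1, \dots, j_s, i_1, \dots, i_r)$.
By $s$ interchanges we may bring $i_1$ to the left past $j_1, \dots, j_s$. Similarly, $s$
interchanges bring each of $i_2, \dots, i_r$ in turn past $j_1, \dots, j_s$. Thus $\lambda, \nu$ is obtained
from $\nu, \lambda$ by $rs$ interchanges, and $e^{\nu, \lambda} = (-1)^{rs} e^{\lambda, \nu}$. Hence

$\zeta \wedge \omega = \sum_{[\nu][\lambda]} \zeta_\nu \omega_\lambda e^{\nu, \lambda} = (-1)^{rs} \sum_{[\lambda][\nu]} \omega_\lambda \zeta_\nu e^{\lambda, \nu}$,

which proves (3).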
Let us first prove the associative law (4) for basis elements. Let $\lambda =
(i_1, \dots, i_r)$, $\nu = (j_1, \dots, j_s)$, and $\rho = (k_1, \dots, k_t)$ be increasing $r$-, $s$-, and
$t$-tuples, respectively. Let
$\lambda, \nu, \rho = (i_1, \dots, i_r, j_1, \dots, j_s, k_1, \dots, k_t)$.
Let us show that

$(e^\lambda \wedge e^\nu) \wedge e^\rho = e^{\lambda, \nu, \rho}$.

If some integer is repeated in the $(r + s)$-tuple $\lambda, \nu$, then both sides are 0. If
no integer is repeated, then

$(e^\lambda \wedge e^\nu) \wedge e^\rho = e^{\lambda, \nu} \wedge e^\rho = (-1)^p e^\tau \wedge e^\rho = (-1)^p e^{\tau, \rho}$,

where $\tau$ is an increasing $(r + s)$-tuple obtained from $\lambda, \nu$ by $p$ interchanges.
These same $p$ interchanges change the $(r + s + t)$-tuple $\lambda, \nu, \rho$ into $\tau, \rho$.
Hence

$e^{\lambda, \nu, \rho} = (-1)^p e^{\tau, \rho} = (e^\lambda \wedge e^\nu) \wedge e^\rho$.

Similarly $e^{\lambda, \nu, \rho} = e^\lambda \wedge (e^\nu \wedge e^\rho)$, and hence

(7.19) $(e^\lambda \wedge e^\nu) \wedge e^\rho = e^\lambda \wedge (e^\nu \wedge e^\rho)$.

From this formula it is a straightforward matter to obtain (4) (Problem 6). □

If either $r$ or $s$ is even, then the exterior product is commutative. If $r =
s = 1$, we have $a \wedge b = -b \wedge a$.
The exterior product of any finite number of multicovectors is defined by
induction. Using (7.18) repeatedly, we find that if $\lambda$ is increasing,
(7.20) $e^\lambda = e^{i_1} \wedge \cdots \wedge e^{i_r}$.

Since both sides of (7.18) change sign under interchanges in $\lambda$ or $\nu$, formula
(7.18) also holds for nonincreasing tuples. Thus (7.20) is valid regardless
of whether $\lambda$ is increasing.

EXAMPLE 3. Let $n = 5$. Then

$(e^1 + 3e^4) \wedge (e^{24} - 2e^{15}) = e^{124} + 3e^{424} - 2e^{115} - 6e^{415} = e^{124} + 6e^{145}$.
$e^2 \wedge (3e^1 - 2e^3) \wedge e^5 \wedge e^3 = (3e^{21} - 2e^{23}) \wedge e^{53} = 3e^{2153} = 3e^{1235}$.
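The computations in Examples 2 and 3 follow directly from (7.18) and bilinearity. Here is a small sketch that represents a multicovector as a dictionary from increasing index tuples to components; this representation and the names `normalize` and `wedge` are illustrative assumptions, not notation from the text.

```python
def normalize(index):
    """Return (sign, increasing tuple); the sign is 0 if an integer repeats."""
    idx = list(index)
    if len(set(idx)) < len(idx):
        return 0, tuple(sorted(idx))
    sign = 1
    for i in range(len(idx)):
        for j in range(len(idx) - 1 - i):
            if idx[j] > idx[j + 1]:
                idx[j], idx[j + 1] = idx[j + 1], idx[j]
                sign = -sign
    return sign, tuple(idx)

def wedge(omega, zeta):
    """Exterior product of multicovectors stored as {increasing tuple: component}."""
    out = {}
    for lam, w in omega.items():
        for nu, z in zeta.items():
            s, tau = normalize(lam + nu)     # e^lam ^ e^nu = e^{lam,nu} = s * e^tau
            if s:
                out[tau] = out.get(tau, 0) + s * w * z
    return {t: c for t, c in out.items() if c != 0}

# First line of Example 3: (e^1 + 3e^4) ^ (e^24 - 2e^15)
print(wedge({(1,): 1, (4,): 3}, {(2, 4): 1, (1, 5): -2}))   # {(1, 2, 4): 1, (1, 4, 5): 6}

# Second line of Example 3: e^2 ^ (3e^1 - 2e^3) ^ e^5 ^ e^3
x = wedge({(2,): 1}, {(1,): 3, (3,): -2})
x = wedge(wedge(x, {(5,): 1}), {(3,): 1})
print(x)                                                     # {(1, 2, 3, 5): 3}
```

Because the product is associative, the second computation may be grouped in any order; the left-to-right grouping used here is just one choice.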

Let us next give two useful formulas, which hold if $\omega$ is the exterior
product of 1-covectors.

Proposition 7.4. Let $\omega = a^1 \wedge \cdots \wedge a^r$. Then

(7.21) $\omega(h_1, \dots, h_r) = \det(a^k \cdot h_l)$
for every $r$-tuple of vectors $(h_1, \dots, h_r)$. For every $r$-tuple of integers $\mu =
(j_1, \dots, j_r)$, $1 \le j_l \le n$,
(7.22) $\omega_\mu = \det(a^k_{j_l})$.

Note that the matrices whose determinants appear on the right side of
(7.21) and (7.22) are $r \times r$. The second of them is obtained by deleting all
columns except $j_1, \dots, j_r$ in the $r \times n$ matrix whose row covectors are
$a^1, \dots, a^r$.

PROOF. By (7.14) and (7.20), Formula (7.21) is valid if $a^1, \dots, a^r$ are standard
basis covectors, $a^k = e^{i_k}$. Formula (7.21) then holds in general, since both
sides of (7.21) are multilinear in $a^1, \dots, a^r$. If $h_l = e_{j_l}$, then $\omega_\mu = \omega(e_{j_1}, \dots, e_{j_r})$
and $a^k \cdot h_l = a^k_{j_l}$.
Thus (7.22) follows from (7.21). □
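Formula (7.22) says that the components of $a^1 \wedge \cdots \wedge a^r$ are the $r \times r$ minors of the matrix whose rows are the $a^k$. The sketch below compares those minors with the components obtained by multiplying out the exterior product term by term. The covectors chosen, the dictionary representation, and all function names are assumptions for illustration.

```python
from itertools import combinations, permutations

def det(m):
    """Determinant by the permutation expansion."""
    n = len(m)
    total = 0
    for perm in permutations(range(n)):
        inv = sum(1 for a in range(n) for b in range(a + 1, n) if perm[a] > perm[b])
        term = -1 if inv % 2 else 1
        for row, col in enumerate(perm):
            term *= m[row][col]
        total += term
    return total

def normalize(index):
    """Return (sign, increasing tuple); the sign is 0 if an integer repeats."""
    idx = list(index)
    if len(set(idx)) < len(idx):
        return 0, tuple(sorted(idx))
    sign = 1
    for i in range(len(idx)):
        for j in range(len(idx) - 1 - i):
            if idx[j] > idx[j + 1]:
                idx[j], idx[j + 1] = idx[j + 1], idx[j]
                sign = -sign
    return sign, tuple(idx)

def wedge(omega, zeta):
    """Exterior product of multicovectors stored as {increasing tuple: component}."""
    out = {}
    for lam, w in omega.items():
        for nu, z in zeta.items():
            s, tau = normalize(lam + nu)
            if s:
                out[tau] = out.get(tau, 0) + s * w * z
    return {t: c for t, c in out.items() if c != 0}

n = 4
rows = [(1, 0, 2, -1), (0, 3, 1, 1), (2, 1, 0, 1)]     # assumed covectors a^1, a^2, a^3

def as_form(a):
    """A 1-covector as a dictionary {(i,): a_i} with 1-based i."""
    return {(i + 1,): c for i, c in enumerate(a) if c != 0}

omega = wedge(wedge(as_form(rows[0]), as_form(rows[1])), as_form(rows[2]))

# (7.22): omega_mu is the minor formed from columns j_1, ..., j_r.
minors = {mu: det([[a[j - 1] for j in mu] for a in rows])
          for mu in combinations(range(1, n + 1), 3)}
print(all(omega.get(mu, 0) == m for mu, m in minors.items()))   # True
```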

*Remarks. The exterior product has been defined in terms of the standard
bases. It is not clear that it is "coordinate free," in other words, that the same
exterior product would be obtained starting from different bases. However,
let us add Formula (7.21) to the list [(1)-(4)] of properties in Proposition 7.3.
Then $\wedge$ is the only product with these five properties. To see this, we first
observe that Formula (7.20) is a special case of (7.21), upon taking $a^k = e^{i_k}$
and using (7.14). Moreover, (7.18) is a consequence of (7.20) and the associative
law (4). Once the product is known for basis elements, Properties (1) and (2)
determine it in general. Thus $\wedge$ is the only product with Properties (1)
through (4) and (7.21). In fact, (3) can be omitted from the list since it follows
from the other four. Since none of these five properties refers to bases, the
exterior product is coordinate free.
*Note about terminology. A multilinear function $M$ of degree $r$ and domain
$E^n \times \cdots \times E^n$ is often called a covariant tensor of rank $r$. An $r$-covector is then
called an alternating covariant tensor of rank $r$.


The sum of an $r$-covector and an $s$-covector has been defined only when
$r = s$. However, one may form the direct sum
$(A^n)^* = (E^n_0)^* \oplus (E^n_1)^* \oplus \cdots \oplus (E^n_r)^* \oplus \cdots$,
where we agree that $(E^n_0)^*$ is the scalar field. The exterior product induces a
product in $(A^n)^*$, which is then an algebra over the real numbers. This algebra
is called the exterior algebra of $(E^n)^*$ [4]. $(A^n)^*$ is sometimes called the
covariant Grassmann algebra or the covariant alternating tensor algebra
of $E^n$.

PROBLEMS

1. Write down the standard basis for (E~)* for each r = 1,2,3,4. Find all products
$e^\lambda \wedge e^\nu$ where $\lambda = (i)$ and $\nu = (j, k, l)$ is an increasing triple.

2. Let $n = 3$. Simplify:
(a) $(2e^1 - e^2) \wedge (3e^2 + e^3)$. (b) $e^{21} \wedge e^{23}$.
(c) $(e^1 - e^2 + 3e^3) \wedge e^{21}$. (d) $(e^{23} + e^{31}) \wedge (5e^1 - e^2)$.

3. Let $n = 5$. Simplify:
(a) $e^{253} \wedge (e^{14} + e^{42})$.

4. Show that if $\sum_{[\lambda]} c_\lambda e^\lambda = 0$, then $c_\lambda = 0$ for every increasing $\lambda$. [Hint: See (7.15).]
5. Prove (1) and (2) of Proposition 7.3.

6. Prove the associative law (4) of Proposition 7.3, using (1), (2), and (7.19).

7. Show that $\omega \wedge \zeta \wedge \eta = -\eta \wedge \zeta \wedge \omega$ if $\omega$ has degree $r$, $\eta$ has degree $t$, and both $r$, $t$
are odd.

7.4 Differential forms


In Section 6.3 a differential form of degree 1 was defined as a covector-valued
function. It was shown that any such differential form $\omega$ is a linear
combination of $dx^1, \dots, dx^n$,

$\omega = \sum_{i=1}^n \omega_i \, dx^i$,

where the coefficients $\omega_1, \dots, \omega_n$ are real-valued functions. For $r \ge 2$ a
differential form of degree $r$ is supposed to be an alternating polynomial of
degree $r$ in $dx^1, \dots, dx^n$ with coefficients $\omega_\lambda$ which are real-valued functions.
This idea is expressed more precisely by the following definition.

Definition. A differential form of degree $r$ is a function $\omega$ with domain $D \subset E^n$
and values in $(E^n_r)^*$. The value of $\omega$ at $x$ is denoted by $\omega(x)$.

The values of $\omega$ are $r$-covectors. The Greek letters $\omega$ and $\zeta$ used in Section
7.3 to denote $r$-covectors are now used to denote differential forms. The
context indicates clearly which is intended.


For brevity we say "$r$-form" instead of "differential form of degree $r$."
It is convenient to call any real-valued function $f$ a 0-form. If $r > n$, then the
only $r$-form is the one which has the value 0 for every $x \in D$. We also use 0
to denote this $r$-form.
Let $\omega$ be an $r$-form and $\zeta$ be an $s$-form, with the same domain $D$. The
exterior product $\omega \wedge \zeta$ is the $(r + s)$-form defined by
$(\omega \wedge \zeta)(x) = \omega(x) \wedge \zeta(x)$

for every $x \in D$. Similarly, if $f$ is a real-valued function, then $f\omega$ is the $r$-form
such that
$(f\omega)(x) = f(x)\omega(x)$
for every $x \in D$.

Theorem 7.1. Let $\omega$, $\zeta$, and $\eta$ be differential forms, and $f$ a real-valued function,
all with the same domain $D$. Then
(1) $(\omega + \zeta) \wedge \eta = \omega \wedge \eta + \zeta \wedge \eta$.
(2) $(f\omega) \wedge \zeta = f(\omega \wedge \zeta)$.
(3) $\zeta \wedge \omega = (-1)^{rs}\, \omega \wedge \zeta$, if $\omega$ has degree $r$ and $\zeta$ has degree $s$.
(4) $(\zeta \wedge \omega) \wedge \eta = \zeta \wedge (\omega \wedge \eta)$.
PROOF. This is immediate from the definitions above, and Proposition 7.3. □

We recall that $dx^i$ is the 1-form with constant value $e^i$. Similarly, for any
$r$-tuple $\lambda = (i_1, \dots, i_r)$ the $r$-form $dx^{i_1} \wedge \cdots \wedge dx^{i_r}$ has constant value $e^\lambda$.
Hence if $\omega$ is an $r$-form, then
(7.23) $\omega = \sum_{[\lambda]} \omega_\lambda \, dx^{i_1} \wedge \cdots \wedge dx^{i_r}$,

where the value of $\omega_\lambda$ at $x$ is $\omega_\lambda(x)$. Using Problem 4, Section 7.2, one can
also write
$\omega = \frac{1}{r!} \sum_\lambda \omega_\lambda \, dx^{i_1} \wedge \cdots \wedge dx^{i_r}$.
We say that an $r$-form $\omega$ is of class $C^{(q)}$ if the functions $\omega_\lambda$ in (7.23) are
of class $C^{(q)}$.
We recall that if $f$ is a 0-form of class $C^{(1)}$, then $df$ is the 1-form
$df = f_1 \, dx^1 + \cdots + f_n \, dx^n$,
where $f_1, \dots, f_n$ are the partial derivatives. In particular, if $\omega$ is an $r$-form of
class $C^{(1)}$, then $\omega_\lambda$ is a 0-form of class $C^{(1)}$ and $d\omega_\lambda$ is defined.

Definition. Let $\omega$ be an $r$-form of class $C^{(1)}$. The exterior differential $d\omega$ is the
$(r + 1)$-form defined by the formula
(7.24) $d\omega = \sum_{[\lambda]} d\omega_\lambda \wedge dx^{i_1} \wedge \cdots \wedge dx^{i_r}$.


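EXAMPLE 1. Let $n = 3$, $r = 2$, and $\omega = f \, dx \wedge dy$, where $f$ is of class $C^{(1)}$ on
$D \subset E^3$. Then

$d\omega = df \wedge dx \wedge dy = \left( \frac{\partial f}{\partial x} dx + \frac{\partial f}{\partial y} dy + \frac{\partial f}{\partial z} dz \right) \wedge dx \wedge dy$,

$d\omega = \frac{\partial f}{\partial z} \, dz \wedge dx \wedge dy = \frac{\partial f}{\partial z} \, dx \wedge dy \wedge dz$.

Problem 2 generalizes this result.

EXAMPLE 2. If $r = n$, then
$\omega = f \, dx^1 \wedge \cdots \wedge dx^n$
where $f = \omega_{1 \cdots n}$. Since $d\omega$ is an $(n + 1)$-form, $d\omega = 0$.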
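Definition (7.24) can be carried out symbolically when the coefficients are polynomials. The following is a minimal sketch under that restriction: a polynomial is stored as a dictionary from exponent tuples to coefficients, a form as a dictionary from increasing index tuples to polynomials, and the names `normalize`, `partial`, `padd`, and `d` are hypothetical helpers, not notation from the text.

```python
def normalize(index):
    """Return (sign, increasing tuple); the sign is 0 if an integer repeats."""
    idx = list(index)
    if len(set(idx)) < len(idx):
        return 0, tuple(sorted(idx))
    sign = 1
    for i in range(len(idx)):
        for j in range(len(idx) - 1 - i):
            if idx[j] > idx[j + 1]:
                idx[j], idx[j + 1] = idx[j + 1], idx[j]
                sign = -sign
    return sign, tuple(idx)

def padd(p, q):
    """Sum of two polynomials stored as {exponent tuple: coefficient}."""
    out = dict(p)
    for m, c in q.items():
        out[m] = out.get(m, 0) + c
    return {m: c for m, c in out.items() if c != 0}

def partial(p, i):
    """Partial derivative of the polynomial p with respect to x^i (1-based i)."""
    out = {}
    for m, c in p.items():
        if m[i - 1] > 0:
            m2 = m[:i - 1] + (m[i - 1] - 1,) + m[i:]
            out[m2] = out.get(m2, 0) + c * m[i - 1]
    return out

def d(form, n):
    """Exterior differential (7.24) of a form {increasing tuple: polynomial}."""
    out = {}
    for lam, coeff in form.items():
        for i in range(1, n + 1):
            s, tau = normalize((i,) + lam)       # dx^i ^ dx^lam = s * dx^tau
            if s == 0:
                continue
            term = {m: s * c for m, c in partial(coeff, i).items()}
            out[tau] = padd(out.get(tau, {}), term)
    return {t: p for t, p in out.items() if p}

# Example 1 with the assumed choice f = x y z^2, so omega = f dx ^ dy in E^3.
omega = {(1, 2): {(1, 1, 2): 1}}
print(d(omega, 3))            # {(1, 2, 3): {(1, 1, 1): 2}}, i.e. 2xyz dx^dy^dz

# d(df) = 0 for the 0-form f = x^2 y z.
f = {(): {(2, 1, 1): 1}}
print(d(d(f, 3), 3))          # {}
```

The first output agrees with Example 1, since $\partial f / \partial z = 2xyz$ for this choice of $f$.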
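Theorem 7.2. The exterior differential has the following properties:
(1) $d(\omega + \zeta) = d\omega + d\zeta$, if $\omega$ and $\zeta$ are $r$-forms of class $C^{(1)}$.
(2) $d(\omega \wedge \zeta) = d\omega \wedge \zeta + (-1)^r \omega \wedge d\zeta$, if $\omega$ is an $r$-form and $\zeta$ is an $s$-form,
both of class $C^{(1)}$.
(3) $d(d\omega) = 0$ if $\omega$ is an $r$-form of class $C^{(2)}$.

[If $r = 0$, we agree that $f \wedge \zeta = f\zeta$. Similarly, if $s = 0$ then $\omega \wedge f = f\omega$.
The theorem remains true if $r = 0$ or $s = 0$.]

PROOF. The coefficients of $\omega + \zeta$ in (7.23) are $\omega_\lambda + \zeta_\lambda$, and $d(\omega_\lambda + \zeta_\lambda) =
d\omega_\lambda + d\zeta_\lambda$. Therefore (1) holds. Similarly, $d(c\omega) = c \, d\omega$.
To prove (2) let us for brevity set
$E^\lambda = dx^{i_1} \wedge \cdots \wedge dx^{i_r}$.
This is the constant differential form such that $E^\lambda(x) = e^\lambda$ for all $x$. Let us
first show that
(*) $d(f E^\lambda \wedge E^\nu) = df \wedge E^\lambda \wedge E^\nu$.
If any integer is repeated in the $(r + s)$-tuple $\lambda, \nu$ then both sides are 0. Otherwise,
$E^\lambda \wedge E^\nu = (-1)^p E^\tau$ where $\tau$ is increasing. By definition, $d(f E^\tau) =
df \wedge E^\tau$. Multiplying both sides by $(-1)^p$ we get (*). Now
$\omega \wedge \zeta = \sum_{[\lambda][\nu]} \omega_\lambda \zeta_\nu E^\lambda \wedge E^\nu$.

By the ordinary product rule
$d(\omega_\lambda \zeta_\nu) = \zeta_\nu \, d\omega_\lambda + \omega_\lambda \, d\zeta_\nu$.
By (*) with $f = \omega_\lambda \zeta_\nu$,
$d(\omega_\lambda \zeta_\nu E^\lambda \wedge E^\nu) = (\zeta_\nu \, d\omega_\lambda + \omega_\lambda \, d\zeta_\nu) \wedge E^\lambda \wedge E^\nu$.
Since $d\zeta_\nu$ has degree 1 and $E^\lambda$ degree $r$, by (3) of Theorem 7.1
$d\zeta_\nu \wedge E^\lambda = (-1)^r E^\lambda \wedge d\zeta_\nu$.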


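The scalar-valued function $\zeta_\nu$ commutes with any differential form. Hence

(**) $d(\omega_\lambda \zeta_\nu E^\lambda \wedge E^\nu) = (d\omega_\lambda \wedge E^\lambda) \wedge (\zeta_\nu E^\nu) + (-1)^r (\omega_\lambda E^\lambda) \wedge (d\zeta_\nu \wedge E^\nu)$.
Using (1),
$d(\omega \wedge \zeta) = \sum_{[\lambda][\nu]} d(\omega_\lambda \zeta_\nu E^\lambda \wedge E^\nu)$,

while
$\sum_{[\lambda][\nu]} (d\omega_\lambda \wedge E^\lambda) \wedge (\zeta_\nu E^\nu) = \left[ \sum_{[\lambda]} d\omega_\lambda \wedge E^\lambda \right] \wedge \left[ \sum_{[\nu]} \zeta_\nu E^\nu \right] = d\omega \wedge \zeta$.

Similarly,
$(-1)^r \sum_{[\lambda][\nu]} (\omega_\lambda E^\lambda) \wedge (d\zeta_\nu \wedge E^\nu) = (-1)^r \omega \wedge d\zeta$,

which proves (2).

If $f$ is of class $C^{(2)}$, then from (7.8)

$d(df) = \sum_{i<j} \left( \frac{\partial^2 f}{\partial x^i \partial x^j} - \frac{\partial^2 f}{\partial x^j \partial x^i} \right) dx^i \wedge dx^j = 0$.

The form $E^\lambda$ has constant coefficients and hence $dE^\lambda = 0$. Using the product
rule (2), $d(df \wedge E^\lambda) = 0$. Taking $f = \omega_\lambda$ and using (1),

$d(d\omega) = d\left( \sum_{[\lambda]} d\omega_\lambda \wedge E^\lambda \right) = \sum_{[\lambda]} d(d\omega_\lambda \wedge E^\lambda) = 0$. □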

Definition. An $r$-form $\omega$ is closed if $d\omega = 0$. If $\omega = d\zeta$ for some $(r - 1)$-form
$\zeta$, then $\omega$ is an exact $r$-form.

If $r = 1$, these definitions agree with those given in Section 6.3. If $\omega$ is
exact and $\zeta$ can be chosen to be of class $C^{(2)}$, then $d\omega = d(d\zeta) = 0$. Hence
$\omega$ is closed. Poincaré's lemma states that if the domain $D$ is star-shaped, then
conversely any closed form $\omega$ is exact. This is proved in Section 8.10.
*Remark. The exterior differential $d$ is uniquely determined by Properties
(1), (2), (3), and the following property.
(4) For $r = 0$, $df$ agrees with its definition in Section 6.3.
Let $d'$ also have these four properties. Then $d'E^\lambda = d'(d'x^{i_1} \wedge E^\mu)$, where
$\mu = (i_2, \dots, i_r)$. But $dx^i = d'x^i$ by (4), since $dx^i$ stands for the differential of
the coordinate function $x^i$. Using (2), (3), and induction on $r$, $d'E^\lambda = 0$.
Using (2) and (4), $d'(fE^\lambda) = df \wedge E^\lambda$. Using (1),

$d'\omega = d'\left( \sum_{[\lambda]} \omega_\lambda E^\lambda \right) = \sum_{[\lambda]} d'(\omega_\lambda E^\lambda) = \sum_{[\lambda]} d\omega_\lambda \wedge E^\lambda$.


Thus $d'\omega = d\omega$ for every $\omega$ of class $C^{(1)}$. In particular, this proves that the
exterior differential $d$ is "coordinate free."

PROBLEMS

Assume that all forms that appear are of class e(l).

1. Find the exterior differential of:


(a) cos(xi)dx /\ dz.
(b) $x \, dy \wedge dz + y \, dz \wedge dx + z \, dx \wedge dy$.
2. Let $P$, $Q$, $R$ have domain $D \subset E^3$. Show that

$d(P \, dy \wedge dz + Q \, dz \wedge dx + R \, dx \wedge dy) = \left( \frac{\partial P}{\partial x} + \frac{\partial Q}{\partial y} + \frac{\partial R}{\partial z} \right) dx \wedge dy \wedge dz$.

3. (a) Generalize Problem 1(b) to find an $(n - 1)$-form $\zeta$ such that $d\zeta = dx^1 \wedge \cdots \wedge dx^n$.
(b) Find an $(r - 1)$-form $\zeta^\lambda$ such that $d\zeta^\lambda = E^\lambda$.
(c) Show that if the coefficients $\omega_\lambda$ in (7.23) are constant functions, then $\omega$ is exact.

4. (a) Show that if $\omega$ and $\zeta$ are closed differential forms, then $\omega \wedge \zeta$ is closed.
(b) Show that if $\omega$ is closed and $\zeta$ is exact, then $\omega \wedge \zeta$ is exact.

5. Find the exterior differential of:

(a) $d\omega \wedge \zeta - \omega \wedge d\zeta$.
(b) $d\omega \wedge \zeta \wedge \eta + \omega \wedge d\zeta \wedge \eta + \omega \wedge \zeta \wedge d\eta$, if $\omega$ and $\zeta$ are of even degree.

6. A function $f$ is an integrating factor for a 1-form $\omega$ if $f(x) \ne 0$ for every $x \in D$ and $f\omega$
is exact. Show that if $\omega$ has an integrating factor then $\omega \wedge d\omega = 0$.

7. Show that if co is a 2-form, then

8. Let $\omega^1, \dots, \omega^p$ be 1-forms such that $\omega^i = \sum_{j=1}^p f^i_j \, dg^j$, $i = 1, \dots, p$. Assume that
the functions $f^i_j$ are of class $C^{(1)}$, the $g^j$ are of class $C^{(2)}$, and that the 1-covectors
$\omega^1(x), \dots, \omega^p(x)$ are linearly independent for every $x \in D$. Find 1-forms $\theta^i_j$ such that
$d\omega^i = \sum_{j=1}^p \theta^i_j \wedge \omega^j$. [Hint: The $p \times p$ matrix $(f^i_j(x))$ must be nonsingular.]

Note. Conversely, if $d\omega^i$ is a linear combination of $\omega^1, \dots, \omega^p$ with coefficients
1-forms $\theta^i_j$, then locally functions $f^i_j$, $g^j$ as above can be found. This result is called the
Frobenius integration theorem and has important applications in geometry and differential
equations [8, p. 97]. We give a proof for $p = 1$ in Section 7.10.

7.5 Multivectors
Let us now define objects called multivectors of degree $r$, or, for brevity,
$r$-vectors. They have a role dual to the $r$-covectors. Formally, one passes from
multicovectors to multivectors by merely exchanging subscripts and superscripts.
Multivectors are usually denoted by $\alpha, \beta, \dots$. We are interested
mainly in multivectors which can be written as the exterior product of
vectors, namely, $\alpha = h_1 \wedge \cdots \wedge h_r$. Such multivectors are called decomposable.
The concept of orientation is defined in terms of decomposable
multivectors. Moreover, there are simple formulas for the measure of an
$r$-parallelepiped or an $r$-simplex, in terms of the norm of a corresponding
decomposable $r$-vector.

The case r =2
By 1-vector we mean a vector $h \in E^n$. To motivate the general treatment of
$r$-vectors, let us begin with a discussion of 2-vectors. They are defined as
alternating bilinear functions on $(E^n)^* \times (E^n)^*$. This is a special case of the
definition of $r$-vector given below. Let $E^n_2$ denote the space of 2-vectors. In
exactly the same way as in Section 7.1, one introduces 2-vectors $e_{ij}$ such that
$e_{ji} = -e_{ij}$, $e_{ii} = 0$, and the $n(n - 1)/2$ 2-vectors $e_{ij}$ for $i < j$ form a basis for
$E^n_2$. This is called the standard basis for $E^n_2$; we see below that the basis for
$(E^n_2)^*$ of 2-covectors $e^{ij}$ is dual to it.
Let us denote the component of a 2-vector $\alpha$ with respect to $e_{ij}$ by $\alpha^{ij}$.
Thus
(7.25) $\alpha = \sum_{i<j} \alpha^{ij} e_{ij}$.

This formula is dual to (7.5); note that subscripts and superscripts have been
exchanged.
The exterior product $h \wedge k$ of vectors $h$, $k$ is defined, as in Section 7.1,
by agreeing that $e_i \wedge e_j = e_{ij}$, and that $\wedge$ distributes with addition and scalar
multiplication. We are particularly interested in decomposable 2-vectors,
namely, those that can be written as $\alpha = h \wedge k$ for suitable vectors $h$, $k$.
The reason is that several interesting geometric quantities are associated
with a decomposable 2-vector.
Let $h$ and $k$ be linearly independent vectors, and $P$ the 2-dimensional
vector subspace of $E^n$ that they span:
$P = \{x = sh + tk, \; s, t \text{ real}\}$.
We call the pair $(h, k)$, in the given order, a frame for $P$. Suppose that $(h', k')$
is another frame for $P$. Then, for suitable real numbers $s$, $t$, $u$, $v$,

$h' = sh + tk$, $k' = uh + vk$,

$h' \wedge k' = (sh + tk) \wedge (uh + vk)$.
Since $k \wedge h = -h \wedge k$, and $h \wedge h = k \wedge k = 0$,
(7.26) $h' \wedge k' = c \, h \wedge k$,

where $c = sv - tu$. Thus the 2-vector $h' \wedge k'$ is a scalar multiple of $h \wedge k$.
We must have $c \ne 0$, since $h'$, $k'$ are linearly independent. It is shown below
(Theorem 7.3) that, conversely, if (7.26) holds for some $c \ne 0$, then $(h, k)$ and


$(h', k')$ are frames for the same 2-dimensional subspace $P$. Thus, although
$P$ has many frames $(h, k)$, the associated 2-vectors $h \wedge k$ are determined by $P$
up to a scalar multiple.
The concept of orientation is essential in defining integrals of differential
forms in Section 8.7. As a preliminary step, we define orientation for vector
subspaces of $E^n$ in the present section. Any frame $(h, k)$ determines an orientation
for the 2-dimensional subspace $P$ spanned by $(h, k)$. Let us agree that
another frame $(h', k')$ determines the same orientation for $P$ as $(h, k)$ if $c > 0$
in (7.26), and the opposite orientation if $c < 0$. This is a somewhat clumsy
way to define orientation. We give in Formula (7.29) below an equivalent
definition in terms of 2-vectors, which is the one used in this book.

EXAMPLE 1. Show that $(e_1 + 3e_3, \; e_2 - e_3)$ is a frame for the same 2-dimensional
vector subspace of $E^3$ as $(2e_1 + e_2 + 5e_3, \; 4e_1 + e_2 + 11e_3)$. Calculating
the exterior products, we get

$(e_1 + 3e_3) \wedge (e_2 - e_3) = e_{12} - e_{13} - 3e_{23}$,

$(2e_1 + e_2 + 5e_3) \wedge (4e_1 + e_2 + 11e_3) = -2e_{12} + 2e_{13} + 6e_{23}$.
The second 2-vector is $-2$ times the first. These two frames determine
opposite orientations for the 2-dimensional subspace $P$ that they span.
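The comparison of frames in Example 1 can be automated: compute the components $h^i k^j - h^j k^i$ of each exterior product and test whether one 2-vector is a scalar multiple of the other. The sketch below does this with exact rational arithmetic; the function names `wedge2` and `ratio` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def wedge2(h, k):
    """Components alpha^{ij} = h^i k^j - h^j k^i (i < j, 1-based) of h ^ k."""
    n = len(h)
    return {(i + 1, j + 1): h[i] * k[j] - h[j] * k[i]
            for i, j in combinations(range(n), 2)}

def ratio(alpha_prime, alpha):
    """Return c with alpha_prime = c * alpha, or None if no such scalar exists."""
    pivot = next((key for key, v in alpha.items() if v != 0), None)
    if pivot is None:
        return None
    c = Fraction(alpha_prime[pivot], alpha[pivot])
    return c if all(alpha_prime[key] == c * alpha[key] for key in alpha) else None

alpha = wedge2((1, 0, 3), (0, 1, -1))          # e_12 - e_13 - 3 e_23
alpha_prime = wedge2((2, 1, 5), (4, 1, 11))    # -2 e_12 + 2 e_13 + 6 e_23
c = ratio(alpha_prime, alpha)
print(c)                                       # -2
print("same" if c > 0 else "opposite")         # opposite
```

A return value of `None` would mean that the two frames span different 2-dimensional subspaces, in line with Theorem 7.3.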
There is a convenient formula for areas of parallelograms and triangles,
in terms of 2-vectors. In $E^n_2$ we introduce the euclidean inner product and
norm:

$\alpha \cdot \beta = \sum_{i<j} \alpha^{ij} \beta^{ij}$,

$|\alpha| = (\alpha \cdot \alpha)^{1/2} = \left[ \sum_{i<j} (\alpha^{ij})^2 \right]^{1/2}$.

Let $K$ be a parallelogram with vertices $x_0$, $x_0 + h$, $x_0 + k$, $x_0 + h + k$ (see
Figure 7.1, Section 7.1). Then the area of $K$ is defined as

(7.27) $V_2(K) = |h \wedge k|$.

To see that this agrees with the usual definition of area when $K \subset E^2$, by
translating and rotating axes we may take $x_0 = 0$, $h = h^1 e_1$, $k = k^1 e_1 + k^2 e_2$.
Then
$h \wedge k = h^1 k^2 e_{12}$, $|h \wedge k| = |h^1| \, |k^2| \, |e_{12}|$.
Since $|e_{12}| = 1$, we have $|h \wedge k| = |h^1| \, |k^2|$, which is the elementary formula
for area of a parallelogram.
Let $S$ be the triangle with vertices $x_0$, $x_0 + h$, $x_0 + k$. Its area must be
half that of the parallelogram $K$. Thus
(7.28) $V_2(S) = \tfrac{1}{2} |h \wedge k|$.


EXAMPLE 2. The area of the triangle in $E^3$ with vertices $0$, $3e_1 + e_2$, $e_3 - e_2$
is $\tfrac{1}{2} |(3e_1 + e_2) \wedge (e_3 - e_2)|$. Since
$(3e_1 + e_2) \wedge (e_3 - e_2) = -3e_{12} + 3e_{13} + e_{23}$,
the components are $\alpha^{12} = -3$, $\alpha^{13} = 3$, $\alpha^{23} = 1$. Since $|\alpha|^2 = \sum_{[\lambda]} (\alpha^\lambda)^2$, the
area is $\sqrt{19}/2$. The area can also be calculated from Formula (7.36) below.
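Example 2 can be reproduced in a few lines, both from the components of $h \wedge k$ and from the $2 \times 2$ determinant of inner products that Formula (7.36) gives for $r = 2$. This is a sketch; the helper names are hypothetical.

```python
from itertools import combinations
from math import sqrt, isclose

def wedge2(h, k):
    """Components alpha^{ij} = h^i k^j - h^j k^i (i < j, 1-based) of h ^ k."""
    n = len(h)
    return {(i + 1, j + 1): h[i] * k[j] - h[j] * k[i]
            for i, j in combinations(range(n), 2)}

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

h, k = (3, 1, 0), (0, -1, 1)            # 3e_1 + e_2 and e_3 - e_2
alpha = wedge2(h, k)
print(alpha)                            # {(1, 2): -3, (1, 3): 3, (2, 3): 1}

norm_sq = sum(c * c for c in alpha.values())
gram = dot(h, h) * dot(k, k) - dot(h, k) ** 2
print(norm_sq, gram)                    # 19 19

area = 0.5 * sqrt(norm_sq)              # (7.28): half the norm of h ^ k
print(isclose(area, sqrt(19) / 2))      # True
```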

We now define orientations for a 2-dimensional vector subspace $P$ of $E^n$
as follows. Let $(h, k)$ be a frame for $P$, and set

(7.29) $\alpha_0 = \dfrac{h \wedge k}{|h \wedge k|}$.

Then $\alpha_0$ is a 2-vector and $|\alpha_0| = 1$. We call $\alpha_0$ an orientation for $P$. Since all
2-vectors $\alpha$ associated with $P$ are scalar multiples of each other, only two of
them have norm $|\alpha| = 1$. They are $\alpha_0$ and $-\alpha_0$. These are the two possible
orientations for $P$.

EXAMPLE 1 (continued). Since $|e_{12} - e_{13} - 3e_{23}| = \sqrt{11}$, the orientations
are $\pm (11)^{-1/2} (e_{12} - e_{13} - 3e_{23})$.

Multivectors of any degree r


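These are defined as follows. If $\mathcal{V}$ is any vector space, then alternating
$r$-linear functions on $\mathcal{V} \times \cdots \times \mathcal{V}$ can be defined just as in Section 7.2,
where we took $\mathcal{V} = E^n$. Let us now take $\mathcal{V} = (E^n)^*$, the dual space to $E^n$.

Definition. A multivector of degree $r$ is an alternating $r$-linear function with
domain the $r$-fold cartesian product $(E^n)^* \times \cdots \times (E^n)^*$.

For brevity, multivectors of degree $r$ are called $r$-vectors.
For every statement about multicovectors in Sections 7.2 and 7.3, there
is a dual statement about multivectors obtained by everywhere exchanging
the words "vector" and "covector." For instance, if $\alpha$ is an $r$-vector and
$\lambda = (i_1, \dots, i_r)$, let
(7.30) $\alpha^\lambda = \alpha(e^{i_1}, \dots, e^{i_r})$.
This is dual to the following formula [see (7.11) with $M = \omega$]:

$\omega_\lambda = \omega(e_{i_1}, \dots, e_{i_r})$.

Let $e_\lambda$ be the $r$-vector defined by the following formula dual to (7.14):
(7.31) $e_\lambda(a^1, \dots, a^r) = \det(a^k_{i_l})$
for every $r$-tuple $(a^1, \dots, a^r)$ of covectors.
Let $E^n_r$ denote the set of all $r$-vectors. Then $E^n_r$ satisfies the axioms for a
vector space. If $r > n$, it consists of 0 alone. For $1 \le r \le n$ the $r$-vectors
$e_\lambda$ with $\lambda$ increasing form the standard basis for $E^n_r$. The number $\alpha^\lambda$ is the
component of $\alpha$ with respect to $e_\lambda$.

Table 7.1

                           r-vectors                                                r-covectors

Elements of                $E^n_r$                                                  $(E^n_r)^*$
Standard basis elements    $e_\lambda = e_{i_1} \wedge \cdots \wedge e_{i_r}$       $e^\lambda = e^{i_1} \wedge \cdots \wedge e^{i_r}$
($\lambda$ increasing)     $\alpha = \sum_{[\lambda]} \alpha^\lambda e_\lambda$     $\omega = \sum_{[\lambda]} \omega_\lambda e^\lambda$
Euclidean inner product    $\alpha \cdot \beta = \sum_{[\lambda]} \alpha^\lambda \beta^\lambda$     $\omega \cdot \zeta = \sum_{[\lambda]} \omega_\lambda \zeta_\lambda$
Euclidean norm             $|\alpha|^2 = \alpha \cdot \alpha$                       $|\omega|^2 = \omega \cdot \omega$
Scalar product             $\omega \cdot \alpha = \sum_{[\lambda]} \omega_\lambda \alpha^\lambda$
                           $e^\lambda \cdot e_\mu = \delta^\lambda_\mu$
                           $\omega \cdot e_\lambda = \omega_\lambda$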
The inner product $\alpha \cdot \beta$ of two $r$-vectors, and the exterior product $\alpha \wedge \beta$
of an $r$-vector $\alpha$ and an $s$-vector $\beta$ are defined by the formulas dual to those
in Section 7.3. In each instance subscripts are replaced by superscripts and
vice versa. The exterior product of multivectors has the same properties
listed in Proposition 7.3. The scalar product $\omega \cdot \alpha$ of an $r$-covector $\omega$ and an
$r$-vector $\alpha$ is defined in the third from last line of Table 7.1. The last two
lines of the table are particular cases of the formula for $\omega \cdot \alpha$.
The formulas in the second line and in the last two lines of Table 7.1 are
valid regardless of whether $\lambda$ and $\mu$ are increasing, since they are known to be
valid for increasing $r$-tuples, and both sides of each formula change sign
under interchanges.
The reader should compare this table with the corresponding table for
$r = 1$ (Section 3.2). According to the definition (Section A.1), the dual space
of $E^n_r$ consists of all real-valued linear functions $F$ with domain $E^n_r$. The dual
space may be identified with $(E^n_r)^*$ in the following way. Given an $r$-covector
$\omega$, let $F(\alpha) = \omega \cdot \alpha$ for every $\alpha \in E^n_r$. This establishes an isomorphism between
$(E^n_r)^*$ and the dual space of $E^n_r$. The next to last line of the table implies that
the standard bases for $E^n_r$ and $(E^n_r)^*$ are dual.

Note about terminology


Multivectors of degree $r$ are also called alternating contravariant tensors of
rank $r$. The exterior algebra $A^n$ of $E^n$ can be introduced in the way indicated
at the end of Section 7.3.

Definition. An $r$-vector $\alpha$ is decomposable if there exist vectors $h_1, \dots, h_r$
such that $\alpha = h_1 \wedge \cdots \wedge h_r$.


Similarly, an $r$-covector $\omega$ is decomposable if there exist covectors
$a^1, \dots, a^r$ such that $\omega = a^1 \wedge \cdots \wedge a^r$.
A 1-vector is simply a vector $h$. Hence every 1-vector is decomposable. If
$\alpha$ is an $n$-vector, then

$\alpha = c \, e_{1 \cdots n} = (c e_1) \wedge e_2 \wedge \cdots \wedge e_n$,

where $c = \alpha^{1 \cdots n}$. Hence every $n$-vector is decomposable. In Section 7.8 it is
shown that any $(n - 1)$-vector is decomposable. However, for $2 \le r \le n - 2$
there are nondecomposable $r$-vectors (see Problem 9). Since $e_\lambda =
e_{i_1} \wedge \cdots \wedge e_{i_r}$, the standard basis $r$-vectors are decomposable.
It is not correct to identify a decomposable $r$-vector $\alpha$ with the $r$-tuple
$(h_1, \dots, h_r)$, since there are many ways to write $\alpha$ as an exterior product of
vectors. The corollary to Theorem 7.3 below furnishes a geometric description
of all possible such decompositions of $\alpha$.

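Proposition 7.5. If $\alpha = h_1 \wedge \cdots \wedge h_r$, then for every $r$-covector $\omega$,

$\omega \cdot \alpha = \omega(h_1, \dots, h_r)$.

PROOF. Let us define a function $\tilde{\omega}$ on $E^n \times \cdots \times E^n$ by $\tilde{\omega}(h_1, \dots, h_r) =
\omega \cdot (h_1 \wedge \cdots \wedge h_r)$ for every $r$-tuple $(h_1, \dots, h_r)$. Then $\tilde{\omega}$ is an alternating
$r$-linear function. Moreover,

$\tilde{\omega}(e_{j_1}, \dots, e_{j_r}) = \omega \cdot e_\mu = \omega_\mu = \omega(e_{j_1}, \dots, e_{j_r})$

for every $\mu = (j_1, \dots, j_r)$. Hence $\tilde{\omega} = \omega$. □

If $\omega = a^1 \wedge \cdots \wedge a^r$ is also decomposable, then by (7.21) and Proposition 7.5,
(7.32) $(a^1 \wedge \cdots \wedge a^r) \cdot (h_1 \wedge \cdots \wedge h_r) = \det(a^k \cdot h_l)$.

Proposition 7.6. If $\omega \cdot \alpha = \omega \cdot \beta$ for every decomposable $r$-covector $\omega$, then
$\alpha = \beta$.
PROOF. The standard basis $r$-covectors $e^\lambda$ are decomposable. Hence for
every increasing $\lambda$,
$\alpha^\lambda = e^\lambda \cdot \alpha = e^\lambda \cdot \beta = \beta^\lambda$. □

If $\alpha = h_1 \wedge \cdots \wedge h_r$ and $\lambda = (i_1, \dots, i_r)$, then the component $\alpha^\lambda$ satisfies
the formula dual to (7.22):
(7.33) $\alpha^\lambda = \det(h^{i_k}_l)$.
The decomposable $r$-vectors have an important geometric significance,
which is described next.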


First we recall the following results from linear algebra:

(1) Any linearly independent set $\{h_1, \dots, h_r\}$ is a basis for the vector subspace
$P \subset E^n$ spanned by these vectors (definition).
(2) Given any such set there exist $h_{r+1}, \dots, h_n$ such that $\{h_1, \dots, h_n\}$ is a
basis for $E^n$.
(3) For every basis $\{h_1, \dots, h_n\}$ for $E^n$ there is a dual basis $\{a^1, \dots, a^n\}$ for
$(E^n)^*$, $a^k \cdot h_l = \delta^k_l$ for $k, l = 1, \dots, n$ (Section A.1).

Definition. A linearly independent $r$-tuple $(h_1, \dots, h_r)$ is called a frame for
the vector subspace $P$ spanned by $h_1, \dots, h_r$.

The only difference between the notions of basis and frame is that the
latter takes into account the order in which the basis vectors $h_1, \dots, h_r$ are
written.

Theorem 7.3

(a) An $r$-tuple $(h_1, \dots, h_r)$ is linearly dependent if and only if $h_1 \wedge \cdots \wedge h_r = 0$.
(b) Let $P \subset E^n$ be an $r$-dimensional vector subspace and $(h_1, \dots, h_r)$, $(h'_1, \dots, h'_r)$
be any two frames for $P$. Then there is a scalar $c$ such that
(7.34) $h'_1 \wedge \cdots \wedge h'_r = c \, h_1 \wedge \cdots \wedge h_r$.
(c) Conversely, if $(h_1, \dots, h_r)$ and $(h'_1, \dots, h'_r)$ are frames which satisfy (7.34)
for some scalar $c$, then they are frames for the same vector subspace $P$
(see Figure 7.2).

PROOF. To prove (a), let $(h_1, \dots, h_r)$ be linearly dependent and $\alpha =
h_1 \wedge \cdots \wedge h_r$. By Propositions 7.1 and 7.5, $\omega \cdot \alpha = 0$ for every $\omega$. By Proposition 7.6,
$\alpha = 0$. On the other hand, if $(h_1, \dots, h_r)$ is linearly independent,
let $a^1, \dots, a^r$ be covectors such that $a^k \cdot h_l = \delta^k_l$, and let $\omega = a^1 \wedge \cdots \wedge a^r$.
By (7.32),
$\omega \cdot \alpha = \det(\delta^k_l) = 1$.

Hence $\alpha \ne 0$.

[Figure 7.2]

To prove (b), let each $h'_l$ be a linear combination of $h_1, \dots, h_r$,

$h'_l = \sum_{m=1}^r c^m_l h_m$, $l = 1, \dots, r$.

If $\omega = a^1 \wedge \cdots \wedge a^r$ is any decomposable $r$-covector, then by (7.32),

$\omega \cdot \alpha' = \det(a^k \cdot h'_l) = \det\left( \sum_{m=1}^r (a^k \cdot h_m) c^m_l \right)$,

where $\alpha' = h'_1 \wedge \cdots \wedge h'_r$. The matrix on the right is the product of the
matrices $(a^k \cdot h_m)$ and $(c^m_l)$. Hence if $c = \det(c^m_l)$,
$\omega \cdot \alpha' = c \det(a^k \cdot h_m) = c \, \omega \cdot \alpha = \omega \cdot (c\alpha)$.
This is true for every decomposable $\omega$; hence $\alpha' = c\alpha$ by Proposition 7.6.
To prove (c), again let $\alpha = h_1 \wedge \cdots \wedge h_r$, $\alpha' = h'_1 \wedge \cdots \wedge h'_r$. By hypothesis,
$\{h_1, \dots, h_r\}$ and $\{h'_1, \dots, h'_r\}$ are linearly independent sets. It suffices to
show that each $h'_l$ is a linear combination of $h_1, \dots, h_r$. Suppose that this is
false for some $l$, say for $l = 1$. Then $\{h_1, \dots, h_r, h'_1\}$ is a linearly independent
set. Let $a^1, \dots, a^{r+1}$ be such that $a^k \cdot h_m = \delta^k_m$ for $k, m = 1, \dots, r + 1$, where
we have set $h'_1 = h_{r+1}$. Let $\omega = a^1 \wedge \cdots \wedge a^r$. Then $\omega \cdot \alpha = 1$, but $\omega \cdot \alpha' = 0$
since the elements $a^k \cdot h'_1 = \delta^k_{r+1}$ of the first column of the $r \times r$ matrix
$(a^k \cdot h'_l)$ are 0. This contradicts the assumption that $\alpha' = c\alpha$, $c \ne 0$. □

EXAMPLE 3. Show that $2e_1 + 3e_2 - e_3$, $e_1 + 2e_2$, $e_1 - 2e_3$ are linearly
dependent. Their exterior product is
$(2e_1 + 3e_2 - e_3) \wedge (e_1 + 2e_2) \wedge (e_1 - 2e_3)$
$= (e_{12} - e_{31} - 2e_{32}) \wedge (e_1 - 2e_3) = -2e_{123} - 2e_{321} = 0$.
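Part (a) of Theorem 7.3 gives a practical dependence test: by (7.33) the components of $h_1 \wedge \cdots \wedge h_r$ are the $r \times r$ minors of the matrix with columns $h_1, \dots, h_r$, and the vectors are dependent exactly when all of them vanish. A minimal sketch follows; the helper names are hypothetical and index tuples are 0-based inside the code.

```python
from itertools import combinations, permutations

def det(m):
    """Determinant by the permutation expansion."""
    n = len(m)
    total = 0
    for perm in permutations(range(n)):
        inv = sum(1 for a in range(n) for b in range(a + 1, n) if perm[a] > perm[b])
        term = -1 if inv % 2 else 1
        for row, col in enumerate(perm):
            term *= m[row][col]
        total += term
    return total

def wedge_components(hs):
    """Components of h_1 ^ ... ^ h_r per (7.33), keyed by 0-based increasing tuples."""
    n, r = len(hs[0]), len(hs)
    return {lam: det([[h[i] for h in hs] for i in lam])
            for lam in combinations(range(n), r)}

def is_dependent(hs):
    """Theorem 7.3(a): dependent if and only if the exterior product is 0."""
    return all(c == 0 for c in wedge_components(hs).values())

# The three vectors of Example 3 in E^3.
print(is_dependent([(2, 3, -1), (1, 2, 0), (1, 0, -2)]))   # True
# Two vectors in E^4 that form a frame.
print(is_dependent([(1, 0, 2, 0), (0, 1, 1, 3)]))          # False
```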

Let $\alpha \ne 0$ be a decomposable $r$-vector. Then $\alpha = h_1 \wedge \cdots \wedge h_r$, and
the vector subspace $P$ spanned by $h_1, \dots, h_r$ is called the $r$-space of $\alpha$. If
$\alpha = h'_1 \wedge \cdots \wedge h'_r$, then taking $c = 1$ in part (c) of Theorem 7.3, we see
that $h'_1, \dots, h'_r$ also span this same vector subspace $P$. Thus $P$ depends only
on $\alpha$ and not on the particular way $\alpha$ is written as the exterior product of
vectors.
If $c \ne 0$, then $(ch_1, h_2, \dots, h_r)$ is another frame for $P$ and $c\alpha = (ch_1) \wedge
h_2 \wedge \cdots \wedge h_r$. Thus $\alpha$ and $c\alpha$ have the same $r$-space $P$. On the other hand,
if $\alpha'$ is not a scalar multiple of $\alpha$, then $\alpha$ and $\alpha'$ have different $r$-spaces.
In much the same way as for the case $r = 2$ above, let us now define
$r$-dimensional measure for parallelepipeds and simplices.


Measure for r-parallelepipeds


It is shown in Section 5.7 that if $K$ is an $n$-parallelepiped spanned by $h_1, \dots, h_n$
with $x_0$ as vertex, then

$V_n(K) = |\det(h^i_l)|$.

More generally, if $x_0, h_1, \dots, h_r$ are vectors, then

$K = \left\{ x : x = x_0 + \sum_{k=1}^r t^k h_k, \; 0 \le t^k \le 1, \; k = 1, \dots, r \right\}$

is the $r$-parallelepiped spanned by $h_1, \dots, h_r$ with $x_0$ as vertex.

Definition. The $r$-dimensional measure of $K$ is

(7.35) $V_r(K) = |h_1 \wedge \cdots \wedge h_r|$.

By part (a) of Theorem 7.3, $V_r(K) = 0$ if and only if $(h_1, \dots, h_r)$ is linearly
dependent.
To show that the definition (7.35) of $r$-measure for parallelepipeds is
reasonable, let us show that $V_r(K)$ is the product of the lengths of the vectors
$h_1, \dots, h_r$ in case these vectors are mutually orthogonal. If $v_1, \dots, v_r$ are
vectors, let $a^k$ be the covector with the same components as the vector $v_k$.
Then $a^1 \wedge \cdots \wedge a^r$ is the $r$-covector with the same components as the
$r$-vector $v_1 \wedge \cdots \wedge v_r$. By (7.32),
$(v_1 \wedge \cdots \wedge v_r) \cdot (h_1 \wedge \cdots \wedge h_r) = \det(v_k \cdot h_l)$,
where the $\cdot$ now denotes inner product. In particular, let $v_k = h_k$. Then
(7.36) $|h_1 \wedge \cdots \wedge h_r|^2 = \det(h_k \cdot h_l)$.
Taking square roots, we get a formula for $V_r(K)$. If $h_1, \dots, h_r$ are mutually
orthogonal, then $h_k \cdot h_l = 0$ for $k \ne l$ and $\det(h_k \cdot h_l) = |h_1|^2 \cdots |h_r|^2$. In
this case $V_r(K) = |h_1| \cdots |h_r|$ as required.

Measure for r-simplices


Let $S$ be an $r$-simplex with vertices $x_0, \ldots, x_r$. Let $h_k = x_k - x_0$, $k = 1, \ldots, r$.
Reasoning as in Section 5.7, we have

        $S = \{x : x = x_0 + \sum_{k=1}^{r} t^k h_k,\ t^k \ge 0 \text{ for } k = 1, \ldots, r,\ \sum_{k=1}^{r} t^k \le 1\}.$

The $r$-dimensional measure of $S$ is defined to be

(7.37)  $V_r(S) = \dfrac{1}{r!}\, |h_1 \wedge \cdots \wedge h_r|.$

Both (7.35) and (7.37) are very special cases of a general formula (8.2) for
$r$-dimensional measure.
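
As a numerical illustration of (7.35) through (7.37), the following Python sketch computes $V_r(K)$ as the square root of the Gram determinant $\det(h_k \cdot h_l)$ and divides by $r!$ for a simplex. The function names and the sample vectors are illustrative assumptions, not notation from the text, and the determinant is expanded naively, which is adequate only for small $r$.

```python
from math import factorial, sqrt

def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if not m:
        return 1
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def dot(u, v):
    """Euclidean inner product of two vectors given as component lists."""
    return sum(a * b for a, b in zip(u, v))

def parallelepiped_measure(hs):
    """V_r(K) = sqrt(det(h_k . h_l)), combining (7.35) and (7.36)."""
    gram = [[dot(hk, hl) for hl in hs] for hk in hs]
    return sqrt(det(gram))

def simplex_measure(vertices):
    """V_r(S) = V_r(K) / r! as in (7.37), with h_k = x_k - x_0."""
    x0, rest = vertices[0], vertices[1:]
    hs = [[a - b for a, b in zip(x, x0)] for x in rest]
    return parallelepiped_measure(hs) / factorial(len(hs))

# Hypothetical data: two orthogonal vectors in E^3 and a right triangle.
area_rectangle = parallelepiped_measure([[3, 0, 0], [0, 4, 0]])
area_triangle = simplex_measure([[0, 0, 0], [1, 0, 0], [0, 1, 0]])
degenerate = parallelepiped_measure([[1, 2, 3], [2, 4, 6]])
```

For the orthogonal pair the result is the product of the lengths, as the argument above predicts, and a linearly dependent pair gives measure 0 in agreement with part (a) of Theorem 7.3.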


Orientations
Let $P$ be an $r$-dimensional vector subspace of $E^n$, and $(h_1, \ldots, h_r)$ a frame for
$P$. The $r$-vector

        $\alpha_0 = \dfrac{h_1 \wedge \cdots \wedge h_r}{|h_1 \wedge \cdots \wedge h_r|}$

has $P$ as its $r$-space, and $|\alpha_0| = 1$. We call $\alpha_0$ an orientation for $P$:

Definition. A decomposable $r$-vector $\alpha_0$ is an orientation for $P$ if $|\alpha_0| = 1$ and
$P$ is the $r$-space of $\alpha_0$.

There are two orientations for $P$. If $\alpha_0$ is one of them, then $-\alpha_0$ is the other.
Two frames $(h_1, \ldots, h_r)$, $(h'_1, \ldots, h'_r)$ for $P$ determine the same orientation for
$P$ if and only if $c > 0$ in (7.34). Note that $c$ is the determinant of the $r \times r$
matrix $(c^i_j)$ in the proof of Theorem 7.3(b).
If two vectors in the frame $(h_1, \ldots, h_r)$ are interchanged, then the exterior
product $h_1 \wedge \cdots \wedge h_r$ changes sign. Thus interchanging two vectors of a frame
reverses its orientation.
We now have a criterion that shows when two frames lead to the same
$r$-vector.

Corollary. Let $(h_1, \ldots, h_r)$, $(h'_1, \ldots, h'_r)$ be frames. Then $h_1 \wedge \cdots \wedge h_r =
h'_1 \wedge \cdots \wedge h'_r$ if and only if these frames span the same $r$-space $P$, have
the same orientation, and their parallelepipeds with 0 as vertex have the same
$r$-measure.

The case r = n
In this case $P = E^n$ and $\pm e_{1 \cdots n}$ are the two orientations. Let us call $e_{1 \cdots n}$
the standard, or positive, orientation of $E^n$ and $-e_{1 \cdots n}$ the negative orientation
of $E^n$. When $r < n$, we do not attempt to call one orientation of $P$ positive
and the other negative. If $(h_1, \ldots, h_n)$ is a frame for $E^n$, then by (7.33)

(7.38)  $h_1 \wedge \cdots \wedge h_n = \det(h^i_j)\, e_{1 \cdots n}.$

The frame has positive orientation if $\det(h^i_j) > 0$ and negative orientation
if $\det(h^i_j) < 0$. For small values of $n$, (7.38) sometimes furnishes a convenient
way to find the determinant of the matrix $(h^i_j)$, since the exterior product on
the left side of (7.38) is readily computed.

EXAMPLE 4. Let $e_1 + e_2$, $e_1 - e_3$, $e_2 + 3e_3$ be the column vectors of a $3 \times 3$
matrix $M$. Then

        $(e_1 + e_2) \wedge (e_1 - e_3) \wedge (e_2 + 3e_3) = -e_{132} + 3e_{213} = -2e_{123}.$

Therefore $\det M = -2$. Moreover, $(e_1 + e_2, e_1 - e_3, e_2 + 3e_3)$ is a frame
for $E^3$, which determines the negative orientation $-e_{123}$ for $E^3$.
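
Computations like those in Examples 3 and 4 can be checked mechanically. The sketch below is one possible encoding, not part of the text: a multivector is stored as a dictionary from increasing index tuples to coefficients, and the helper names `sort_sign`, `wedge`, and `vec` are hypothetical. Sorting a concatenated index tuple by adjacent interchanges supplies the sign, and a repeated index gives 0.

```python
from itertools import product

def sort_sign(indices):
    """Return (sign, sorted tuple); sign is 0 when an index is repeated."""
    idx = list(indices)
    if len(set(idx)) < len(idx):
        return 0, tuple(sorted(idx))
    sign = 1
    # Bubble sort: every adjacent interchange flips the sign.
    for i in range(len(idx)):
        for j in range(len(idx) - 1 - i):
            if idx[j] > idx[j + 1]:
                idx[j], idx[j + 1] = idx[j + 1], idx[j]
                sign = -sign
    return sign, tuple(idx)

def wedge(a, b):
    """Exterior product of multivectors stored as {increasing tuple: coefficient}."""
    out = {}
    for (la, ca), (lb, cb) in product(a.items(), b.items()):
        sign, lam = sort_sign(la + lb)
        if sign:
            out[lam] = out.get(lam, 0) + sign * ca * cb
    return {lam: c for lam, c in out.items() if c != 0}

def vec(*components):
    """The vector with components h^1, ..., h^n, viewed as a 1-vector."""
    return {(i + 1,): c for i, c in enumerate(components) if c != 0}

# Example 3: the product vanishes, so the three vectors are linearly dependent.
ex3 = wedge(wedge(vec(2, 3, -1), vec(1, 2, 0)), vec(1, 0, -2))
# Example 4: the coefficient of e_123 is det M, by (7.38).
ex4 = wedge(wedge(vec(1, 1, 0), vec(1, 0, -1)), vec(0, 1, 3))
```

With this encoding the zero multivector is the empty dictionary, so a frame for $E^n$ is detected by a nonzero coefficient on the single key $(1, \ldots, n)$, and its sign gives the orientation.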


PROBLEMS

1. Simplify ($n = 6$):
(a) $e_3 \wedge e_5 \wedge e_{24}$.
(b) $e_2 \wedge e_3 \wedge e_{62}$.
(c) $e_1 \wedge (e_{14} + e_{64})$.
(d) $(e_1 + 3e_4 - e_6) \wedge (2e_{23} + e_{36}) \wedge e_{45}$.
(e) $(e_{12} + e_{13}) \wedge (e_{34} + e_{25}) \wedge (e_{56} + e_{46})$.
2. Evaluate the indicated scalar products ($n = 4$).
(a) $(e^1 + e^2) \cdot (e_1 + e_2)$.
(b) $e^{12} \cdot e_{34}$.
(c) $e^{134} \cdot (e_{431} + 3e_{124})$.
(d) $(e^1 - e^4) \wedge (e^2 + e^4) \cdot (e_1 + 2e_4) \wedge (e_2 - 2e_4)$.
3. Using Theorem 7.3 show that $(2e_1 + e_3, e_2 + e_4, e_1 + e_4, e_3 + e_4)$ is a frame for
$E^4$. What is its orientation? Find the determinant of the matrix that has these vectors
as columns, using (7.38).
4. Show that $(e_1 + e_4, e_2 + e_5, e_3 + e_6, e_1 + e_5, e_2 + e_6, e_3 - e_4)$ is a frame for $E^6$.
What is its orientation?
5. Show that $(e_1 - e_2, e_2 - e_3)$ and $(3e_1 - e_2 - 2e_3, 2e_1 - e_2 - e_3)$ are frames for
the same vector subspace of $E^3$. Do their orientations agree?
6. Find the area of the triangle with vertices $2e_3$, $e_1 - e_2 + 2e_3$, $e_1 + 3e_3$.
7. Find the volume of the 3-simplex in $E^4$ with vertices $0$, $e_1 - e_3$, $e_2$, $e_3 + 2e_4$.
8. Let $K$ be an $r$-parallelepiped spanned by $h_1, \ldots, h_r$ with 0 as vertex. For each
increasing $\lambda$ let $K_\lambda = \pi_\lambda(K)$ where $\pi_\lambda$ is the projection onto the $r$-space of $e_\lambda$.
($\pi_\lambda$ leaves the components $x^{i_1}, \ldots, x^{i_r}$ of any $x$ unchanged and replaces each of the
other components by 0.) Let $\alpha = h_1 \wedge \cdots \wedge h_r$. Show that $|\alpha^\lambda| = V_r(K_\lambda)$ and
hence $[V_r(K)]^2 = \sum_{[\lambda]} [V_r(K_\lambda)]^2$. Illustrate for $n = 3$ and $r = 1, 2$.
9. Show that:
(a) If $\alpha$ is decomposable, then $\alpha \wedge \alpha = 0$.
(b) If $\alpha$ and $\beta$ are decomposable $r$-vectors, then $(\alpha + \beta) \wedge (\alpha + \beta) = 2\alpha \wedge \beta$ if $r$
is even and is 0 if $r$ is odd.
(c) The 2-vector $e_{12} + e_{34}$ is not decomposable. [Hint: Use (a).]
10. Let $\alpha$ and $\beta$ be decomposable nonzero 2-vectors, and $P$, $Q$ be their respective
2-spaces. Show that if $P \cap Q = \{0\}$, then $\alpha + \beta$ is not decomposable; and if $P \cap Q$
is a line through 0, then $\alpha + \beta$ is decomposable and $\alpha \ne c\beta$. [Hints: In the first
instance $\alpha = h \wedge k$, $\beta = h' \wedge k'$, where $\{h, k, h', k'\}$ is a linearly independent set.
In the second $\alpha = h \wedge k$, $\beta = h \wedge k'$, where $h \in P \cap Q$.]
11. Let $\alpha = h \wedge k$, $\alpha \ne 0$. Show that the matrix $(\alpha^{ij})$ has rank 2. [Hint: Show that each
column vector of the matrix is a linear combination of $h$ and $k$.]
12. Let $(x_0, x_1, \ldots, x_r)$ be an $(r + 1)$-tuple such that the vectors $x_1 - x_0, \ldots, x_r - x_0$
are linearly independent. Such an $(r + 1)$-tuple defines an oriented $r$-simplex.
Its $r$-vector is $\frac{1}{r!}(x_1 - x_0) \wedge \cdots \wedge (x_r - x_0)$. Let $\beta_i$ be the $(r - 1)$-vector of the
$i$th oriented face $(x_0, x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_r)$. Show that

        $\sum_{i=0}^{r} (-1)^i \beta_i = 0.$


7.6 Induced linear transformations


Let $m$ and $n$ be positive integers. With any linear transformation $L$ from
$E^m$ into $E^n$ is associated for each $r = 1, 2, \ldots$ a linear transformation $L_r$ from
$E^m_r$ into $E^n_r$ with the following property. If $(k_1, \ldots, k_r)$ is any $r$-tuple of vectors
in $E^m$, then we require that

(7.39)  $L_r(k_1 \wedge \cdots \wedge k_r) = L(k_1) \wedge \cdots \wedge L(k_r).$

For $r = 2$ this is illustrated in Figure 7.3.

[Figure 7.3: $h_1 = L(k_1)$, $L_2(k_1 \wedge k_2) = h_1 \wedge h_2$.]
At the end of the section we show that $L_r$ is well defined by Formula
(7.39). The linear transformations $L_r$ are said to be induced by $L$.
There is a geometric way to think about $L_r$. Let $Q$ be the vector subspace
of $E^m$ spanned by $\{k_1, \ldots, k_r\}$, and let $h_i = L(k_i)$, $P = L(Q)$. Then $P$ is the
vector subspace of $E^n$ spanned by $\{h_1, \ldots, h_r\}$. Let us suppose that $Q$ and $P$
are both of dimension $r$. [Note: This holds in particular if $r = m$, $m \le n$,
and $L$ has maximum possible rank $r$. This case is encountered repeatedly in
Chapter 8.] When $Q$ and $P$ have dimension $r$, $(k_1, \ldots, k_r)$ is a frame for $Q$
and $(h_1, \ldots, h_r)$ a frame for $P$. Moreover, $Q$ is the $r$-space of the $r$-vector
$\beta = k_1 \wedge \cdots \wedge k_r$; and $P$ is the $r$-space of the $r$-vector $\alpha = h_1 \wedge \cdots \wedge h_r =
L_r(\beta)$ according to (7.39).
Let $I$ be the $r$-parallelepiped spanned by $(k_1, \ldots, k_r)$ with 0 as vertex,
and let $K = L(I)$. Then $K$ is the $r$-parallelepiped spanned by $(h_1, \ldots, h_r)$,
with 0 as vertex (see Figure 7.3). Consider the ratio of $r$-measures:

(7.40)  $\kappa = \dfrac{V_r(K)}{V_r(I)}.$

By (7.35), $\kappa = |\alpha|\,|\beta|^{-1}$. Let us show that $\kappa$ depends only on the $r$-space $Q$
and on $L$. Let $(k'_1, \ldots, k'_r)$ be another frame for $Q$, and $h'_i = L(k'_i)$. Then
$(h'_1, \ldots, h'_r)$ is a frame for $P$. Let $\beta' = k'_1 \wedge \cdots \wedge k'_r$, $\alpha' = h'_1 \wedge \cdots \wedge h'_r =
L_r(\beta')$. By Theorem 7.3, $\beta' = c\beta$ for some real $c \ne 0$. Since $L_r$ is linear,

        $\alpha' = L_r(c\beta) = cL_r(\beta) = c\alpha.$


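Therefore

        $|\alpha'|\,|\beta'|^{-1} = |c|\,|\alpha|\,(|c|\,|\beta|)^{-1} = |\alpha|\,|\beta|^{-1}.$

The case $r = m$ will be of particular interest in Section 8.1. If $L$ is linear
from $E^r$ into $E^n$, then we may in particular take $k_j = \epsilon_j$, where $\{\epsilon_1, \ldots, \epsilon_r\}$ is
the standard basis for $E^r$. Then $h_j = v_j = L(\epsilon_j)$, where $v_1, \ldots, v_r$ are the
column vectors of the matrix of $L$. We have $\beta = \epsilon_{1 \cdots r}$, $\alpha = v_1 \wedge \cdots \wedge v_r$.
Since $|\beta| = 1$, we have

(7.40a)  $\kappa = |v_1 \wedge \cdots \wedge v_r|$, if $r = m$.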

The dual transformation


We recall from Section 4.1 that the dual $L^*$ of $L$ is a linear transformation
from $(E^n)^*$ into $(E^m)^*$. It has the property (Problem 5b, Section 4.1) that
$a \cdot L(k) = L^*(a) \cdot k$ for every covector $a \in (E^n)^*$ and vector $k \in E^m$. In a similar
way $L_r$ has a dual $L_r^*$, which is a linear transformation from $(E^n_r)^*$ into $(E^m_r)^*$.
It is required to have the property that

(7.41)  $\omega \cdot L_r(\beta) = L_r^*(\omega) \cdot \beta$

for every $\beta \in E^m_r$, $\omega \in (E^n_r)^*$.
Let us now give a definition of $L_r$ (postponed in the preceding discussion)
and of $L_r^*$. The linear transformation $L_r$ is determined once its values are
given for standard basis $r$-vectors. Let $\epsilon_1, \ldots, \epsilon_m$ be the standard basis vectors
of $E^m$. Let $v_j = L(\epsilon_j)$; then $v_j$ is the $j$th column vector of the matrix of $L$
(Section 4.1). The standard basis elements of $E^m_r$ are $\epsilon_\mu = \epsilon_{j_1} \wedge \cdots \wedge \epsilon_{j_r}$,
with $\mu = (j_1, \ldots, j_r)$ increasing.
Keeping in mind that we want (7.39) to be correct, we set

(7.42)  $L_r(\epsilon_\mu) = v_{j_1} \wedge \cdots \wedge v_{j_r}.$

Since $L_r$ is to be linear, its value at any $\beta$ is determined once the values at the
basis elements $\epsilon_\mu$ are known. For any $\beta = \sum_{[\mu]} \beta^\mu \epsilon_\mu$,

(7.43)  $L_r(\beta) = \sum_{[\mu]} \beta^\mu L_r(\epsilon_\mu).$

This defines the induced linear transformation $L_r$ for $1 < r \le m$. We set
$L_1 = L$. If $r > m$, then $E^m_r$ has the single element 0 and $L_r(0) = 0$.
Let us show that for every $\beta \in E^m_r$, $\gamma \in E^m_s$,

(7.44)  $L_{r+s}(\beta \wedge \gamma) = L_r(\beta) \wedge L_s(\gamma).$

Let $\mu = (j_1, \ldots, j_r)$ and $\nu = (k_1, \ldots, k_s)$ be increasing. Then

        $L_r(\epsilon_\mu) \wedge L_s(\epsilon_\nu) = v_{j_1} \wedge \cdots \wedge v_{j_r} \wedge v_{k_1} \wedge \cdots \wedge v_{k_s}.$

If any integer is repeated in the $(r + s)$-tuple $(\mu, \nu)$, then this is 0. Otherwise,
the right-hand side is $(-1)^p L_{r+s}(\epsilon_\tau)$ where $\tau$ is the increasing $(r + s)$-tuple
obtained from $(\mu, \nu)$ by $p$ interchanges. Since $\epsilon_{\mu,\nu} = (-1)^p \epsilon_\tau$ and $L_{r+s}$ is
linear, $(-1)^p L_{r+s}(\epsilon_\tau) = L_{r+s}(\epsilon_{\mu,\nu})$. Thus

        $L_r(\epsilon_\mu) \wedge L_s(\epsilon_\nu) = L_{r+s}(\epsilon_{\mu,\nu}).$

Therefore (7.44) is correct for basis elements of $E^m_r$ and $E^m_s$. Since each of
these transformations is linear, (7.44) then holds in general.
By induction there is a generalization of (7.44) for products of any number
of multivectors. In particular, in this way we get the required formula (7.39)
for products of vectors.
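Let $\beta \in E^m_r$ and $\alpha = L_r(\beta)$. Let us find a formula for the components $\alpha^\lambda$
in terms of the components of $\beta$. If $\lambda = (i_1, \ldots, i_r)$, $\mu = (j_1, \ldots, j_r)$, and
$(c^i_j)$ is the matrix of $L$, let

        $c^\lambda_\mu = \det(c^{i_k}_{j_l}).$

By (7.33), $c^\lambda_\mu$ is the $\lambda$th component of $v_{j_1} \wedge \cdots \wedge v_{j_r}$. By (7.42) and (7.43)

        $\alpha = \sum_{[\mu]} \beta^\mu\, v_{j_1} \wedge \cdots \wedge v_{j_r}.$

Since both sides have the same components,

(7.45)  $\alpha^\lambda = \sum_{[\mu]} c^\lambda_\mu \beta^\mu.$

When $r = 1$, this becomes (4.2).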
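
Formula (7.45) says that the matrix of $L_r$ consists of the $r \times r$ minors of the matrix of $L$. The following Python sketch, with hypothetical helper names and made-up sample data, applies (7.45) to $\beta = k_1 \wedge k_2$ and compares the result with $L(k_1) \wedge L(k_2)$, as (7.39) requires. It assumes, as stated by (7.33) in the text, that the $\lambda$th component of a product of vectors is the minor formed from rows $\lambda$ of the matrix whose columns are those vectors.

```python
from itertools import combinations

def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if not m:
        return 1
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def wedge_components(vectors, n):
    """Components of v_1 ^ ... ^ v_r, indexed by increasing r-tuples (1-based)."""
    r = len(vectors)
    return {lam: det([[v[i - 1] for v in vectors] for i in lam])
            for lam in combinations(range(1, n + 1), r)}

def induced(c, beta, r):
    """alpha^lambda = sum over increasing mu of c^lambda_mu * beta^mu, as in (7.45)."""
    n, m = len(c), len(c[0])
    return {lam: sum(det([[c[i - 1][j - 1] for j in mu] for i in lam]) * beta[mu]
                     for mu in combinations(range(1, m + 1), r))
            for lam in combinations(range(1, n + 1), r)}

def apply(c, k):
    """Image L(k) of the vector k under the matrix c."""
    return [sum(row[j] * k[j] for j in range(len(k))) for row in c]

# Hypothetical data: L maps E^3 into E^4; k1, k2 are vectors in E^3.
C = [[1, 2, 0], [0, 1, -1], [3, 0, 1], [1, 1, 1]]
k1, k2 = [1, 0, 2], [0, 1, 1]
lhs = induced(C, wedge_components([k1, k2], 3), 2)          # L_2(k1 ^ k2)
rhs = wedge_components([apply(C, k1), apply(C, k2)], 4)     # L(k1) ^ L(k2)
```

The agreement of `lhs` and `rhs` for arbitrary data is an instance of the Cauchy-Binet identity for minors of a matrix product.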


Let us next define the dual linear transformation $L_r^*$. The standard basis
covectors for $(E^n)^*$ are $e^1, \ldots, e^n$. Let $w^i = L^*(e^i)$. We recall (Problem 5a,
Section 4.1) that $w^i$ is the $i$th row covector of the matrix of $L$. Consider any
standard basis $r$-covector $e^\lambda = e^{i_1} \wedge \cdots \wedge e^{i_r}$ for $(E^n_r)^*$. We set

(7.46)  $L_r^*(e^\lambda) = w^{i_1} \wedge \cdots \wedge w^{i_r}.$

Since $L_r^*$ is to be linear, we set for any $r$-covector $\omega = \sum_{[\lambda]} \omega_\lambda e^\lambda$ in $(E^n_r)^*$

        $L_r^*(\omega) = \sum_{[\lambda]} \omega_\lambda L_r^*(e^\lambda).$

This defines $L_r^*$.


Let us establish the relation (7.41) between the induced linear transformation
$L_r$ and its dual $L_r^*$. Since both $L_r$ and $L_r^*$ are linear, it suffices to show
that (7.41) is correct for standard basis elements, namely, for $\beta = \epsilon_\mu$ and
$\omega = e^\lambda$. We recall that $e^i \cdot L(\epsilon_j) = L^*(e^i) \cdot \epsilon_j$. In other words, $e^i \cdot v_j = w^i \cdot \epsilon_j$.
By (7.32),

        $e^\lambda \cdot L_r(\epsilon_\mu) = (e^{i_1} \wedge \cdots \wedge e^{i_r}) \cdot (v_{j_1} \wedge \cdots \wedge v_{j_r}) = \det(e^{i_k} \cdot v_{j_l}),$
        $L_r^*(e^\lambda) \cdot \epsilon_\mu = (w^{i_1} \wedge \cdots \wedge w^{i_r}) \cdot (\epsilon_{j_1} \wedge \cdots \wedge \epsilon_{j_r}) = \det(w^{i_k} \cdot \epsilon_{j_l}).$

Since the right-hand sides are equal, we have $e^\lambda \cdot L_r(\epsilon_\mu) = L_r^*(e^\lambda) \cdot \epsilon_\mu$ and hence
(7.41).
The formulas dual to (7.44) and (7.45) are correct (Problems 2, 3).


PROBLEMS

1. Let $m = 2$, $n = 3$, $L(s, t) = (s - 2t)e_1 - se_2 + (2s + 3t)e_3$. Find:
(a) $L^*(a)$.            (b) $L_2(\beta)$.
(c) $L_2^*(\omega)$.     (d) $L_3^*(e^{123})$.
(e) $\kappa$, if $r = 2$.

2. Prove the dual of (7.44):

(7.47)  $L_{r+s}^*(\omega \wedge \zeta) = L_r^*(\omega) \wedge L_s^*(\zeta)$

if $\omega \in (E^n_r)^*$, $\zeta \in (E^n_s)^*$.

3. Prove the dual of (7.45):

(7.48)  $\zeta_\mu = \sum_{[\lambda]} \omega_\lambda c^\lambda_\mu$, if $\zeta = L_r^*(\omega)$.

4. Let $L$ be an orthogonal transformation of $E^n$. Show that $|L_r(\beta)| = |\beta|$:
(a) If $\beta$ is decomposable. [Hint: (7.36).]
(b) For any $r$-vector $\beta$.

7.7 Transformation law for differential forms


Let $g$ be a transformation of class $C^{(1)}$ from an open set $\Delta \subset E^m$ into $E^n$. Let
$D$ be an open set containing the image $g(\Delta)$ (see Figure 7.4). If $\omega$ is any $r$-form
with domain $D$, then there is a corresponding $r$-form denoted by $\omega^\#$ with
domain $\Delta$. Formally, $\omega^\#$ is obtained by merely substituting $g(t)$ for $x$ and
$dg^i$ for $dx^i$, with $g^i$ the $i$th component of $g$. The precise definition of $\omega^\#$ is
in terms of induced linear transformations. For each $t \in \Delta$, the differential
$L = Dg(t)$ is a linear transformation from $E^m$ into $E^n$ (Section 4.3). The
usefulness of the induced linear transformation $L_r$ becomes apparent in
Chapter 8. Let us consider the dual induced linear transformation $L_r^*$.
By definition of $r$-form, $\omega(x)$ is an $r$-covector in $(E^n_r)^*$. Then $L_r^*[\omega(x)]$ is
an $r$-covector in $(E^m_r)^*$, which we take as $\omega^\#(t)$.

Definition. For each $t \in \Delta$, the value of $\omega^\#$ at $t$ is

(7.49)  $\omega^\#(t) = L_r^*[\omega(x)],$

where $x = g(t)$, $L = Dg(t)$.
In case $r = 0$ we agree that $f^\# = f \circ g$. (The notation $\#$ is used for brevity
even though it does not indicate the dependence on $g$ and the degree $r$.)

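Theorem 7.4. The operation $\#$ has the following properties:

(1) $(\omega + \zeta)^\# = \omega^\# + \zeta^\#$, if $\omega$ and $\zeta$ are of degree $r$.
(2) $(\omega \wedge \zeta)^\# = \omega^\# \wedge \zeta^\#$, if $\omega$ is of degree $r$ and $\zeta$ of degree $s$.
(3) $(df)^\# = d(f \circ g)$, if $f$ is of class $C^{(1)}$.
(4) $(dx^{i_1} \wedge \cdots \wedge dx^{i_r})^\# = dg^{i_1} \wedge \cdots \wedge dg^{i_r}$.
(5) $d\omega^\# = (d\omega)^\#$, if $\omega$ is of class $C^{(1)}$ and $g$ of class $C^{(2)}$.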

[Figure 7.4]

PROOF. Since $L_r^*$ is linear,

        $(\omega + \zeta)^\#(t) = \omega^\#(t) + \zeta^\#(t).$

Since this is true for every $t \in \Delta$, this proves (1). Using (7.47), we get

        $(\omega \wedge \zeta)^\#(t) = \omega^\#(t) \wedge \zeta^\#(t)$

for every $t \in \Delta$, which proves (2). By the chain rule (Section 4.3), $d(f \circ g)(t) =
L_1^*[df(x)]$. By (7.49) the right-hand side is $(df)^\#(t)$. Thus (3) holds. Recall
that $dx^i$ stands for $dX^i$, where $X^i(x) = x^i$ for each $x$. Then $g^i = X^i \circ g$ and
from (3) with $f = X^i$,

(7.50)  $(dx^i)^\# = dg^i.$

Then (4) follows from this and (2). To prove (5) we have, from (1) through (4),

        $\omega^\# = \sum_{[\lambda]} \omega_\lambda \circ g\ dg^{i_1} \wedge \cdots \wedge dg^{i_r},$

        $d\omega^\# = \sum_{[\lambda]} d(\omega_\lambda \circ g\ dg^{i_1} \wedge \cdots \wedge dg^{i_r}).$

By (3), $(d\omega_\lambda)^\# = d(\omega_\lambda \circ g)$. Since $g$ is of class $C^{(2)}$, $d(dg^i) = 0$. Therefore, by
the product rule

        $d(dg^{i_1} \wedge \cdots \wedge dg^{i_r}) = 0.$

Using the product rule again, and with (7.24), we have

        $d\omega^\# = \sum_{[\lambda]} (d\omega_\lambda)^\# \wedge dg^{i_1} \wedge \cdots \wedge dg^{i_r} = (d\omega)^\#.$ □

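As in Chapter 4, let $g^i_j$ denote the $j$th partial derivative of the component
$g^i$. Let

        $g^\lambda_\mu = \det(g^{i_k}_{j_l}) = \dfrac{\partial(g^{i_1}, \ldots, g^{i_r})}{\partial(t^{j_1}, \ldots, t^{j_r})}.$

The matrix of $Dg(t)$ is $(g^i_j(t))$, and the row covectors are $dg^1(t), \ldots, dg^n(t)$.
By (7.22), the $\mu$th component of $dg^{i_1} \wedge \cdots \wedge dg^{i_r}$ is $g^\lambda_\mu$. Therefore

(7.51)  $(dx^{i_1} \wedge \cdots \wedge dx^{i_r})^\# = \sum_{[\mu]} g^\lambda_\mu\, dt^{j_1} \wedge \cdots \wedge dt^{j_r}.$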
[1']

In applying this formula in Chapter 8 we usually take $r = m$. In that case the
only increasing $r$-tuple is $\mu = (1, 2, \ldots, r)$ and the right-hand side of (7.51)
has just one term.

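EXAMPLE 1. Let $n = 3$, $r = m = 2$, $(x, y, z) = g(s, t)$. Then

        $(dx \wedge dz)^\# = \dfrac{\partial(g^1, g^3)}{\partial(s, t)}\, ds \wedge dt.$

If $\omega = f\,dx \wedge dz$, then $\omega^\# = f \circ g\,(dx \wedge dz)^\#$.

EXAMPLE 2. Let $m = n = r$. Then, writing $f = \omega_{1 \cdots n}$, we have

        $\omega^\# = (f\,dx^1 \wedge \cdots \wedge dx^n)^\# = f \circ g\,(dx^1 \wedge \cdots \wedge dx^n)^\#$

        $\phantom{\omega^\#} = f \circ g\ \dfrac{\partial(g^1, \ldots, g^n)}{\partial(t^1, \ldots, t^n)}\, dt^1 \wedge \cdots \wedge dt^n.$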
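
A familiar instance of Example 2 is the change to polar coordinates, where $(dx \wedge dy)^\# = s\,ds \wedge dt$. The Python sketch below checks this numerically: it estimates the matrix $(g^i_j)$ by central differences and takes its determinant, which by (7.51) is the single coefficient when $r = m = n = 2$. The choice of $g$, the step size, and the function names are illustrative assumptions.

```python
from math import cos, sin

def g(s, t):
    """Polar coordinates: (x, y) = g(s, t) = (s cos t, s sin t)."""
    return (s * cos(t), s * sin(t))

def jacobian(fn, point, h=1e-6):
    """Matrix (g^i_j) of partial derivatives, estimated by central differences."""
    cols = []
    for j in range(len(point)):
        plus, minus = list(point), list(point)
        plus[j] += h
        minus[j] -= h
        fp, fm = fn(*plus), fn(*minus)
        cols.append([(a - b) / (2 * h) for a, b in zip(fp, fm)])
    # cols[j][i] approximates d g^i / d t^j; transpose so rows are indexed by i.
    return [list(row) for row in zip(*cols)]

def pullback_coefficient(fn, point):
    """Coefficient of ds ^ dt in (dx ^ dy)^#: the 2 x 2 Jacobian determinant."""
    (a, b), (c, d) = jacobian(fn, point)
    return a * d - b * c

coefficient = pullback_coefficient(g, (2.0, 0.7))  # should be close to s = 2
```

The computed coefficient approximates $s$ at the chosen point, which matches the determinant $\partial(g^1, g^2)/\partial(s, t) = s\cos^2 t + s\sin^2 t$.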

EXAMPLE 3. Let $m = r = 1$. Then $\omega^\#(t) = \omega[g(t)] \cdot g'(t)$, and the definition
(6.12a) of the line integral can be rewritten

PROBLEMS

1. Let $m = 2$, $n = 3$, and $g(s, t) = (st)e_1 + (s \cos t)e_2 + (\exp t)e_3$. Find $\omega^\#$ if:
(a) $\omega = x\,dy$.                    (b) $\omega = y\,dz \wedge dx$.
(c) $\omega = dx \wedge dy \wedge dz$.   (d) $\omega = f$ (a 0-form).

2. (a) Let $\zeta = \omega^\#$. Show that the components satisfy

(7.52)  $\zeta_\mu = \sum_{[\lambda]} (\omega_\lambda \circ g)\, g^\lambda_\mu.$

(b) Show that $\omega^\# = 0$, if $Dg(t)$ has rank $< r$ for each $t \in \Delta$ and $\omega$ is of degree $r$.
[Note: In tensor language a differential form of degree $r$, being an $r$-covector-valued
function, is called a covariant alternating tensor field of rank $r$. Formula
(7.52) is the transformation law for the components of such a tensor field.]

3. Let $n = m = 2$, $r = 1$, $\omega = M\,dx + N\,dy$. Find explicitly $d\omega$ and $\omega^\#$ and verify that
$(d\omega)^\# = d\omega^\#$.
4. Let $n = m = 3$, $g(s, t, u) = (s \cos t)e_1 + (s \sin t)e_2 + ue_3$. Find:
(a) $(f\,dx \wedge dy \wedge dz)^\#$.   (b) $(x\,dy \wedge dz)^\#$.

7.8 The adjoint and codifferential


To each $r$-vector $\alpha$ we now assign a certain $(n - r)$-vector, which is called the
adjoint of $\alpha$ and is denoted by $*\alpha$. Let us begin with the special dimension
$r = n - 1$, which is the only one needed in connection with the divergence
theorem in Chapter 8.

[Figure 7.5: $n = 3$, $r = 2$, $\alpha = h_1 \wedge h_2$.]

Let $\alpha = h_1 \wedge \cdots \wedge h_{n-1}$. If $\alpha = 0$, then we set $*\alpha = 0$. If $\alpha \ne 0$, then $*\alpha$ will
turn out to be the vector $h$ with the following three properties: (1) $h$ is a vector
normal to the $(n - 1)$-space $P$ spanned by $h_1, \ldots, h_{n-1}$; (2) $(h, h_1, \ldots, h_{n-1})$
is a positively oriented frame for $E^n$; (3) $|h| = |\alpha|$. Condition (3) says that the
length of $h$ equals $V_{n-1}(K)$, where $K$ is the $(n - 1)$-parallelepiped with vertex 0
spanned by $h_1, \ldots, h_{n-1}$ (see Figure 7.5).
With this in mind, let us define $*$ first for the standard basis $(n - 1)$-vectors.
Let $i' = (1, 2, \ldots, i - 1, i + 1, \ldots, n)$. Since $i - 1$ interchanges will
change the $n$-tuple $(i, i')$ into the increasing $n$-tuple $(1, \ldots, n)$,

        $(-1)^{i-1} e_i \wedge e_{i'} = e_{1 \cdots n}.$

Therefore we set

(7.53)  $*e_{i'} = (-1)^{i-1} e_i.$

We want the operation $*$ to behave linearly.
For any $(n - 1)$-vector $\alpha = \sum_{i=1}^{n} \alpha^{i'} e_{i'}$, let $*\alpha$ be the vector $h =
\sum_{i=1}^{n} \alpha^{i'} (*e_{i'})$. Its components are

(7.54)  $h^i = (-1)^{i-1} \alpha^{i'}$, $i = 1, \ldots, n$.

EXAMPLE 1. Let $n = 3$. In this particular dimension it is useful to consider,
instead of the standard basis $\{e_{23}, e_{13}, e_{12}\}$ for $E^3_2$, the basis $\{e_{23}, e_{31}, e_{12}\}$,
where $e_{31} = -e_{13}$. Then any 2-vector $\alpha$ can be written $\alpha = \alpha^{23} e_{23} +
\alpha^{31} e_{31} + \alpha^{12} e_{12}$, where $\alpha^{31} = -\alpha^{13}$; and

        $*e_{23} = e_1$,  $*e_{31} = e_2$,  $*e_{12} = e_3$,
        $\alpha^{23} = h^1$,  $\alpha^{31} = h^2$,  $\alpha^{12} = h^3$  if $h = *\alpha$.

Proposition 7.7. The vector $h = *\alpha$ defined by (7.54) has properties (1), (2), (3)
above.
PROOF. Given a frame $(h_1, \ldots, h_{n-1})$, these three properties determine a
vector, which we denote temporarily by $\eta$. Let $(h'_1, \ldots, h'_{n-1})$ be an orthogonal
frame for $P$, $h'_k \cdot h'_l = 0$ if $k \ne l$. Then $\alpha$ is a scalar multiple $b$ of $h'_1 \wedge \cdots \wedge h'_{n-1}$,
and replacing $h'_1$ by $bh'_1$ we may assume that $\alpha = h'_1 \wedge \cdots \wedge h'_{n-1}$. By (1),
$\eta \cdot h'_k = 0$ for each $k = 1, \ldots, n - 1$. Therefore

        $|\eta \wedge \alpha| = |\eta|\,|h'_1| \cdots |h'_{n-1}| = |\eta|\,|\alpha|.$

By (2), $\eta \wedge \alpha = ce_{1 \cdots n}$ where $c > 0$. In fact, $c = |\eta \wedge \alpha|$. Let $h = *\alpha$. From
(7.54), $|h| = |\alpha|$ and consequently $c = |\eta|\,|h|$.
On the other hand,

        $\eta \wedge \alpha = \Big(\sum_{i=1}^{n} \eta^i e_i\Big) \wedge \Big(\sum_{j=1}^{n} \alpha^{j'} e_{j'}\Big) = \Big[\sum_{i=1}^{n} \eta^i \alpha^{i'} (-1)^{i-1}\Big] e_{1 \cdots n},$

since $e_i \wedge e_{j'} = 0$ unless $i = j$. By (7.54)

        $\eta \wedge \alpha = (\eta \cdot h)\, e_{1 \cdots n},$

and hence $\eta \cdot h = c = |\eta|\,|h|$. Equality holds in Cauchy's inequality (Section
1.2), and therefore $\eta$ is a positive scalar multiple of $h$. By (3), $|\eta| = |h|$, and
hence $\eta = h$ as required. □
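
Proposition 7.7 can be tested on concrete data. In the Python sketch below, which uses hypothetical names and a made-up frame in $E^4$, the components $\alpha^{i'}$ are taken to be the $(n-1) \times (n-1)$ minors of the matrix whose columns are $h_1, \ldots, h_{n-1}$ (the component formula attributed to (7.33) in the text), and $h$ is then formed by (7.54). The three properties are checked with exact integer arithmetic, property (3) in the squared form given by (7.36).

```python
def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if not m:
        return 1
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def dot(u, v):
    """Euclidean inner product of two vectors given as component lists."""
    return sum(a * b for a, b in zip(u, v))

def adjoint_of_frame(hs):
    """h = *(h_1 ^ ... ^ h_{n-1}) via (7.54): h^i = (-1)^(i-1) alpha^{i'}."""
    n = len(hs) + 1
    h = []
    for i in range(n):  # i is 0-based here, so the sign (-1)^(i-1) becomes (-1)**i
        # alpha^{i'}: minor of the n x (n-1) column matrix with row i deleted.
        rows = [[v[k] for v in hs] for k in range(n) if k != i]
        h.append((-1) ** i * det(rows))
    return h

# Hypothetical frame for a 3-space in E^4.
hs = [[1, 2, 0, 1], [0, 1, 1, 0], [2, 0, 1, 3]]
h = adjoint_of_frame(hs)
is_normal = all(dot(h, v) == 0 for v in hs)                              # property (1)
is_positive = det([[h[k]] + [v[k] for v in hs] for k in range(4)]) > 0   # property (2)
same_norm = dot(h, h) == det([[dot(u, v) for v in hs] for u in hs])      # property (3)
```

For $n = 3$ the same function returns the components of the usual cross product of the two frame vectors.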

Corollary 1. Every $(n - 1)$-vector $\alpha$ is decomposable.

PROOF. Let $\alpha \ne 0$, and let $h = *\alpha$. The vector $h$ is normal to an $(n - 1)$-dimensional
subspace $P$. Let $\bar\alpha$ be an $(n - 1)$-vector of $P$ whose orientation
and norm are chosen such that $\bar\alpha$ and $h$ are related by (1) through (3). Then $\bar\alpha$ is
decomposable and $h = *\bar\alpha$. Thus $*\bar\alpha = *\alpha$, which by (7.54) implies that $\bar\alpha = \alpha$. □
The adjoint $*\omega$ of an $(n - 1)$-covector $\omega$ is defined in a similar way. We set
$*\omega = \zeta$, where $\zeta$ is the 1-covector whose components satisfy the formula dual
to (7.54):

(7.55)  $\zeta_i = (-1)^{i-1} \omega_{i'}$, $i = 1, \ldots, n$.

If $\omega$ is an $(n - 1)$-covector and $\alpha$ an $(n - 1)$-vector, then

(7.56)  $(*\omega) \cdot (*\alpha) = \omega \cdot \alpha,$

since the two sides equal $\sum_{i=1}^{n} (-1)^{2i-2} \omega_{i'} \alpha^{i'} = \sum_{i=1}^{n} \omega_{i'} \alpha^{i'}$.
If $\omega$ is an $(n - 1)$-form, then $*\omega$ is defined as the 1-form $\zeta$ whose components
$\zeta_i$ satisfy (7.55). Thus $\zeta(x) = *\omega(x)$ for each $x$ in the domain of $\omega$.
We have, in particular,

(7.57)  $*(dx^1 \wedge \cdots \wedge dx^{i-1} \wedge dx^{i+1} \wedge \cdots \wedge dx^n) = (-1)^{i-1} dx^i.$

EXAMPLE 1 (continued). For $n = 3$, (7.57) says that $*(dx \wedge dy) = dz$,
$*(dz \wedge dx) = dy$, $*(dy \wedge dz) = dx$. From these formulas one gets $*\omega$ for any
2-form $\omega$ (Problem 2).

If $\omega$ is an $(n - 1)$-form of class $C^{(1)}$, then $d\omega$ is an $n$-form. Consequently,
$d\omega = f\,dx^1 \wedge \cdots \wedge dx^n$, where $f$ is a scalar-valued function. To get a convenient
expression for $f$, let $\zeta = *\omega$. Its components are given by (7.55).
Let

(7.58)  $\operatorname{div} \zeta = \sum_{i=1}^{n} \dfrac{\partial \zeta_i}{\partial x^i}.$

This function is called the divergence of the 1-form $\zeta$. By a short calculation
(Problem 2), the desired function $f$ is just $\operatorname{div} \zeta$. Thus

(7.59)  $d\omega = \operatorname{div} \zeta\ dx^1 \wedge \cdots \wedge dx^n$, if $\zeta = *\omega$.

When $n = 3$ the divergence has an important physical significance, which
is indicated in Section 8.4.

The remainder of this section is not used in Chapter 8. Let us define $*\alpha$
for any $r$-vector $\alpha$ when $0 \le r \le n$. If $r = 0$ or $n$, we set

        $*c = ce_{1 \cdots n}$,   $*(ce_{1 \cdots n}) = c.$

If $0 < r < n$, let $\lambda = (i_1, \ldots, i_r)$ be any increasing $r$-tuple, and let $\lambda' =
(j_1, \ldots, j_{n-r})$ be the increasing $(n - r)$-tuple whose entries are those integers
$j_l$ between 1 and $n$ which do not appear in $\lambda$. Let

        $\epsilon_\lambda = \delta^{1 \cdots n}_{\lambda', \lambda}.$

It is $+1$ if an even number of interchanges puts $(\lambda', \lambda)$ in increasing order,
and $-1$ if an odd number does. If $\alpha$ is any $r$-vector, then its adjoint is the
$(n - r)$-vector $*\alpha$ whose components satisfy

(7.60)  $(*\alpha)^{\lambda'} = \alpha^\lambda \epsilon_\lambda.$

If $r = n - 1$ and $\lambda = i'$, then $\lambda' = (i)$, $\epsilon_\lambda = (-1)^{i-1}$, and (7.60) agrees with
(7.54). From the definition (7.60),

        $*(\alpha + \beta) = *\alpha + *\beta$,   $*(c\alpha) = c{*\alpha}.$

Moreover, $*\alpha = 0$ if and only if $\alpha = 0$. Thus the operation $*$ gives an isomorphism
between $E^n_r$ and $E^n_{n-r}$. This isomorphism preserves inner products.
In fact, if $\alpha$ and $\beta$ are $r$-vectors, then

        $*\alpha \cdot *\beta = \sum_{[\lambda']} (*\alpha)^{\lambda'} (*\beta)^{\lambda'} = \sum_{[\lambda]} (\epsilon_\lambda)^2 \alpha^\lambda \beta^\lambda.$

Since $(\epsilon_\lambda)^2 = 1$, $*\alpha \cdot *\beta = \alpha \cdot \beta$. Taking $\alpha = \beta$ we have in particular $|*\alpha| = |\alpha|$.
Since $\epsilon_\lambda \epsilon_{\lambda'} = (-1)^{r(n-r)}$,

        $**\alpha = (-1)^{r(n-r)} \alpha.$
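
The definition (7.60) is easy to mechanize when an $r$-vector is stored by its components on increasing $r$-tuples. The Python sketch below is an illustration under that assumed encoding (a dictionary from increasing tuples to coefficients; the names `perm_sign` and `star` are hypothetical). It computes the sign $\epsilon_\lambda$ by counting inversions in $(\lambda', \lambda)$ and checks the rule for $**\alpha$ on sample data with $n = 4$.

```python
def perm_sign(seq):
    """Sign of the permutation sorting seq (distinct entries), by counting inversions."""
    inversions = sum(1 for a in range(len(seq))
                     for b in range(a + 1, len(seq)) if seq[a] > seq[b])
    return -1 if inversions % 2 else 1

def star(alpha, n):
    """(*alpha)^{lambda'} = alpha^lambda * eps_lambda, formula (7.60), for 0 < r < n."""
    out = {}
    for lam, coeff in alpha.items():
        comp = tuple(i for i in range(1, n + 1) if i not in lam)  # lambda'
        out[comp] = coeff * perm_sign(comp + lam)                 # eps_lambda
    return out

# Hypothetical sample data in E^4.
one_vector = {(1,): 2, (3,): -1, (4,): 5}          # r = 1, r(n - r) = 3 is odd
two_vector = {(1, 2): 1, (1, 3): 2, (3, 4): -1}    # r = 2, r(n - r) = 4 is even
double_star_1 = star(star(one_vector, 4), 4)
double_star_2 = star(star(two_vector, 4), 4)
```

For $n = 3$ the function reproduces $*e_{23} = e_1$ and $*e_{13} = -e_2$, in agreement with Example 1.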
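Now let $\nu$ be any increasing $(n - r)$-tuple. Then $e_\nu \wedge e_\lambda$
is 0 if $\nu \ne \lambda'$ and is $\epsilon_\lambda e_{1 \cdots n}$ if $\nu = \lambda'$. If $\beta$ is any $(n - r)$-vector, then

        $\beta \wedge \alpha = \sum_{[\nu][\lambda]} \beta^\nu \alpha^\lambda e_\nu \wedge e_\lambda = \Big(\sum_{[\lambda]} \beta^{\lambda'} \alpha^\lambda \epsilon_\lambda\Big) e_{1 \cdots n},$

and

(7.61)  $\beta \wedge \alpha = (\beta \cdot *\alpha)\, e_{1 \cdots n}.$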

If $\alpha \ne 0$ is decomposable, then $*\alpha$ has the following geometric interpretation.
Let $\alpha = h_1 \wedge \cdots \wedge h_r$, and let $h_{r+1}, \ldots, h_n$ be vectors such that:
(1) $(h_{r+1}, \ldots, h_n)$ is a frame for the orthogonal complement of the $r$-space
of $\alpha$; (2) $(h_{r+1}, \ldots, h_n, h_1, \ldots, h_r)$ is a positively oriented frame for $E^n$; (3)
$|h_{r+1} \wedge \cdots \wedge h_n| = |\alpha|$. Then $*\alpha = h_{r+1} \wedge \cdots \wedge h_n$. The proof is similar
to that given above for $r = n - 1$.
If $\omega$ is an $r$-covector, then $*\omega$ is the $(n - r)$-covector such that

(7.62)  $(*\omega)_{\lambda'} = \omega_\lambda \epsilon_\lambda.$

If $\omega$ is a differential form of degree $r$, then $*\omega$ is the $(n - r)$-form such that
$(*\omega)(x) = *\omega(x)$ for every $x$ in the domain of $\omega$.

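EXAMPLE 2. $*(f\,dx^{i_1} \wedge \cdots \wedge dx^{i_r}) = \epsilon_\lambda f\,dx^{j_1} \wedge \cdots \wedge dx^{j_{n-r}}$, where
$\lambda = (i_1, \ldots, i_r)$ and $\lambda' = (j_1, \ldots, j_{n-r})$. For $r = 0$ or $n$,

        $*f = f\,dx^1 \wedge \cdots \wedge dx^n$,   $*(f\,dx^1 \wedge \cdots \wedge dx^n) = f.$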


Let $\omega$ be an $r$-form of class $C^{(1)}$. Then $d(*\omega)$ is an $(n - r + 1)$-form and
$*d(*\omega)$ is an $(r - 1)$-form.

Definition. The codifferential of $\omega$ is

(7.63)  $\delta\omega = (-1)^{r(n-r)} *d(*\omega).$

Since $**\omega = (-1)^{r(n-r)} \omega$, substituting $*\omega$ for $\omega$ we get

(7.64)  $\delta(*\omega) = *d\omega.$

If $r = 0$, we invent a form 0 of degree $-1$ and agree that $\delta f = 0$. If $\zeta$ is a
1-form, consider the $(n - 1)$-form $\omega = (-1)^{n-1} *\zeta$. Then $*\omega = \zeta$ and by
(7.64) $\delta\zeta = *d\omega$. By (7.59), $\delta\zeta = \operatorname{div} \zeta$. Thus the codifferential of a 1-form $\zeta$
is just the 0-form $\operatorname{div} \zeta$.
Note. Many authors define the adjoint so that in (2) above $(h_1, \ldots, h_r,
h_{r+1}, \ldots, h_n)$ is a positively oriented frame for $E^n$. When $r(n - r)$ is odd,
according to that definition $*\alpha$ has opposite sign to the one here.
The definition of the adjoint involves the euclidean norm. Hence both
the adjoint and the codifferential depend on the euclidean structure inherited
by $E^n_r$ and $(E^n_r)^*$ from the euclidean inner product in $E^n$; while the notions of
$\wedge$ and $d$ actually depend only on the vector space structure and not the inner
product.
In riemannian geometry one is provided at each point $x$ with an inner
product $B_x$, not necessarily the euclidean inner product. The definition of
adjoint must be modified accordingly. The codifferential is again defined by
(7.63). However, the formula (7.58) for the divergence and its generalization
(Problem 6) must be modified [8; 21, Chapter V].


PROBLEMS

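1. Let $n = 2$. Show that
(a) $*h = h^2 e_1 - h^1 e_2$.
(b) $*(M\,dx + N\,dy) = N\,dx - M\,dy$.
(c) $*d(N\,dx - M\,dy) = -(\partial M/\partial x + \partial N/\partial y)$.

2. (a) Let $n = 3$, and $\omega = P\,dy \wedge dz + Q\,dz \wedge dx + R\,dx \wedge dy$ be a 2-form. Show that

        $*\omega = P\,dx + Q\,dy + R\,dz$,   $d\omega = \Big(\dfrac{\partial P}{\partial x} + \dfrac{\partial Q}{\partial y} + \dfrac{\partial R}{\partial z}\Big) dx \wedge dy \wedge dz$

and that $**\omega = \omega$.
(b) Let $\omega$ be an $(n - 1)$-form,

        $\omega = \sum_{i=1}^{n} \omega_{i'}\, dx^1 \wedge \cdots \wedge dx^{i-1} \wedge dx^{i+1} \wedge \cdots \wedge dx^n,$

and let $\zeta = *\omega$. Show that $d\omega = \operatorname{div} \zeta\ dx^1 \wedge \cdots \wedge dx^n$.

3. Show that: (a) $\operatorname{div}(df)$ is the Laplacian of $f$. (b) $\operatorname{div}(f\omega) = f \operatorname{div} \omega + df \cdot \omega$, where
$(\zeta \cdot \omega)(x) = \zeta(x) \cdot \omega(x)$ for 1-forms $\zeta$, $\omega$.

4. Let $L$ be a linear transformation from $E^n$ into $E^n$ such that $V_{n-1}(K) = V_{n-1}[L(K)]$
for every $(n - 1)$-parallelepiped $K$. Show that $L$ is an isometry.

5. Let $\alpha$ and $\beta$ be $r$-vectors. Show that:
(a) $(*\alpha) \wedge \beta = \alpha \cdot \beta\, e_{1 \cdots n}$.
(b) $(*\alpha) \wedge \beta = (-1)^{r(n-r)} \alpha \wedge (*\beta)$.
(c) $(*\omega) \cdot (*\alpha) = \omega \cdot \alpha$ for any $r$-covector $\omega$.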
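6. Show that the components of $\delta\omega$ satisfy $(\delta\omega)_\nu = \sum_{i=1}^{n} \partial\omega_{\nu,i}/\partial x^i$, where $(\nu, i) =
(i_1, \ldots, i_{r-1}, i)$.

7. Show that the components of $d\omega$ satisfy $(d\omega)_\lambda = \sum_{j=1}^{r+1} (-1)^{j-1} \partial\omega_{\lambda_j}/\partial x^{i_j}$, where $\lambda_j$
is the $r$-tuple $(i_1, \ldots, i_{j-1}, i_{j+1}, \ldots, i_{r+1})$.
8. If $\zeta$ is a 1-form and $\omega$ is an $r$-form let $\zeta \cdot \omega$ be the $(r - 1)$-form such that $*(\zeta \cdot \omega) =
(-1)^{n-1} \zeta \wedge (*\omega)$. Show that:
(a) $\delta(f\omega) = f\,\delta\omega + df \cdot \omega$.   (b) $(\zeta \cdot \omega)_\nu = \sum_{i=1}^{n} \zeta_i \omega_{\nu,i}$.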

*7.9 Special results for n = 3


Vector analysis in $E^3$ is traditionally based on four operations besides the
usual vector addition, scalar multiplication, and inner product. These operations
are the cross product, triple scalar product, curl, and divergence. The
last of these was defined in the previous section, for any dimension $n$. The
other three are special to three dimensions, and can be expressed in terms of
$\wedge$, $*$, and $d$ as follows.
If $h_1$ and $h_2$ are vectors, their cross product is denoted by $h_1 \times h_2$. It is
the vector

(7.65)  $h_1 \times h_2 = *(h_1 \wedge h_2)$

(see Figure 7.5). The cross product distributes with vector addition and scalar
multiplication, and $h_2 \times h_1 = -h_1 \times h_2$. However, it is not associative.
The triple scalar product of three vectors is denoted by $[h_1, h_2, h_3]$. It is
given by

(7.66)  $[h_1, h_2, h_3] = *(h_1 \wedge h_2 \wedge h_3).$

Its absolute value equals $|h_1 \wedge h_2 \wedge h_3|$, which is the volume of the parallelepiped
spanned by $h_1, h_2, h_3$ with vertex 0. The sign of the triple scalar
product is positive if $(h_1, h_2, h_3)$ is a positively oriented frame for $E^3$ and
negative if this frame is negatively oriented.
When $n = 3$, $r(n - r)$ is always even and $(-1)^{r(n-r)} = 1$. Then $*(h_1 \times h_2) =
h_1 \wedge h_2$. Using Problem 5(a), Section 7.8,

        $h_1 \wedge h_2 \wedge h_3 = (h_1 \times h_2) \cdot h_3\, e_{123},$

which gives another formula for the triple scalar product:

        $[h_1, h_2, h_3] = (h_1 \times h_2) \cdot h_3.$
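
In components, the cross product lists the coefficients $(\alpha^{23}, \alpha^{31}, \alpha^{12})$ of $\alpha = h_1 \wedge h_2$, as in Example 1 of Section 7.8. The short Python sketch below (function names are illustrative assumptions) uses this to evaluate the triple scalar product as $(h_1 \times h_2) \cdot h_3$, and applies it to the three column vectors of Example 4 in Section 7.5.

```python
def cross(h, k):
    """h x k = *(h ^ k): the components (a^23, a^31, a^12) of the 2-vector h ^ k."""
    return [h[1] * k[2] - h[2] * k[1],
            h[2] * k[0] - h[0] * k[2],
            h[0] * k[1] - h[1] * k[0]]

def triple(h1, h2, h3):
    """[h1, h2, h3] = (h1 x h2) . h3, the coefficient of e_123 in h1 ^ h2 ^ h3."""
    return sum(a * b for a, b in zip(cross(h1, h2), h3))

# The frame (e1 + e2, e1 - e3, e2 + 3e3) from Example 4 of Section 7.5.
value = triple([1, 1, 0], [1, 0, -1], [0, 1, 3])
```

The value agrees with the determinant found in that example, and its negative sign reflects the negative orientation of the frame.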
The cross product of two covectors, or of two 1-forms, is given by

(7.67)  $\omega \times \zeta = *(\omega \wedge \zeta).$

The curl of a 1-form $\omega$ is the 1-form $\operatorname{curl} \omega$ given by

(7.68)  $\operatorname{curl} \omega = *d\omega.$

Its physical significance is indicated in Section 8.8 in connection with
Stokes's formula.

EXAMPLE 1. Show that $\operatorname{div}(\operatorname{curl} \omega) = 0$ for every 1-form $\omega$ of class $C^{(2)}$. Using
the fact that $\delta * = *d$ (Formula 7.64),

        $\operatorname{div}(\operatorname{curl} \omega) = \delta(*d\omega) = *d(d\omega) = *0 = 0.$
PROBLEMS

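Assume that all forms are of class $C^{(2)}$.

1. Show that:
(a) $h \times k = -k \times h$.

2. Let $\omega = M\,dx + N\,dy + O\,dz$. Show that

        $\operatorname{curl} \omega = (\partial O/\partial y - \partial N/\partial z)\,dx + (\partial M/\partial z - \partial O/\partial x)\,dy + (\partial N/\partial x - \partial M/\partial y)\,dz.$

3. Find $e_i \times e_j$ for all pairs $i, j = 1, 2, 3$.
4. With the aid of (7.67) and (7.68), show that:
(a) $\operatorname{div}(\zeta \times \omega) = 0$ if $\zeta$ and $\omega$ are closed.
(b) $\operatorname{curl}(f\omega) = f \operatorname{curl} \omega + df \times \omega$.
(c) $\operatorname{curl}(f\,df) = 0$.
(d) $\operatorname{curl}(\zeta \times \omega) = \delta(\zeta \wedge \omega)$.
(e) $\operatorname{curl}(\operatorname{curl} \omega) = d(\operatorname{div} \omega) - \operatorname{Lapl} \omega$, where $\operatorname{Lapl}(M\,dx + N\,dy + O\,dz) =
(\operatorname{Lapl} M)\,dx + (\operatorname{Lapl} N)\,dy + (\operatorname{Lapl} O)\,dz$ and $\operatorname{Lapl} f$ is the Laplacian of the
function $f$.
(f) $\zeta \cdot \operatorname{curl} \omega - \omega \cdot \operatorname{curl} \zeta = \operatorname{div}(\omega \times \zeta)$. [Hint: By the dual to (7.61), $\zeta \cdot *d\omega =
*(\zeta \wedge d\omega)$.]

5. Show that
(a) $h_1 \times (h_2 \times h_3) = (h_1 \cdot h_3)h_2 - (h_1 \cdot h_2)h_3$. [Hint: Since both sides are trilinear
in $(h_1, h_2, h_3)$, it suffices to prove this when $h_1, h_2, h_3$ are standard basis vectors.
Use Problem 3.]
(b) The cross product is not associative.
(c) $(h_1 \times h_2) \times (h_3 \times h_4) = [h_1, h_2, h_4]h_3 - [h_1, h_2, h_3]h_4$.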

6. Let $\omega = (E_1\,dx^1 + E_2\,dx^2 + E_3\,dx^3) \wedge dx^4 + B_1\,dx^2 \wedge dx^3 + B_2\,dx^3 \wedge dx^1 +
B_3\,dx^1 \wedge dx^2$, where the functions $B_i$, $E_i$ are of class $C^{(1)}$ on an open subset of $E^4$.
Show that $d\omega = 0$ if and only if $\operatorname{curl} E + \partial B/\partial x^4 = 0$, $\operatorname{div} B = 0$. Here curl and
div are taken in the variables $(x^1, x^2, x^3)$. {Note: The equation $d\omega = 0$ represents
one-half of Maxwell's equations for an electromagnetic field in free space. The
functions $E_1, E_2, E_3$ represent the electrical components of the field and $B_1, B_2, B_3$
the components of a magnetic induction vector. There is a similar equation repre-
senting the other half of Maxwell's equations [S, p. 45], T. Frankel, 1974, Amer. Math.
Monthly, 81, 343-34S]}.

*7.10 Integrating factors (continued)


Let $\omega$ be a 1-form with domain $D$. We assume that $\omega$ is of class $C^{(1)}$, with
$\omega(x) \ne 0$. In Section 6.6 we gave necessary and sufficient conditions for $\omega$
to have an integrating factor. The following theorem gives another set of
conditions, which are easier to apply.

Theorem 7.5
(a) If $\omega$ has an integrating factor in $D$, then $\omega \wedge d\omega = 0$.
(b) Suppose that $\omega \wedge d\omega = 0$. Then, for any $x^* \in D$, there exists an open
set $\Delta$ such that $x^* \in \Delta$ and $\omega$ has an integrating factor in $\Delta$.

The proof of (a) is an easy exercise, which we leave to the reader (see
Problem 6, Section 7.4). As already mentioned (Section 6.6), in proving (b) it
suffices to suppose that $\omega$ has the special form

(7.69)  $\omega = dx^1 + \sum_{i=2}^{n} \omega_i\, dx^i.$

In preparation for the proof of (b), we first prove the following.

Lemma 1. Let $\zeta$ be a 2-form such that $\zeta$ is continuous on $D$ and $\omega \wedge \zeta = 0$.
Let $x^* \in D$. Then there exist a neighborhood $\Omega$ of $x^*$ and a 1-form $\theta$ continuous
on $\Omega$, such that $\zeta = \theta \wedge \omega$ in $\Omega$.

PROOF. For sufficiently small $\Omega$ there exist 1-forms $\eta^1, \ldots, \eta^n$ continuous
on $\Omega$ such that $\{\eta^1(x), \ldots, \eta^n(x)\}$ is an orthonormal basis for $(E^n)^*$ and
$\eta^1 = |\omega|^{-1}\omega$ (see Problem 1). Then

        $\zeta(x) = \sum_{i<j} \zeta_{ij}(x)\, \eta^i(x) \wedge \eta^j(x),$

where $\zeta_{ij}(x)$ is the component of $\zeta(x)$ with respect to the $(i, j)$th element of the
corresponding orthonormal basis $\{\eta^i(x) \wedge \eta^j(x),\ i < j\}$ for $(E^n_2)^*$. Since
$\omega \wedge \zeta = 0$,

        $0 = \eta^1(x) \wedge \zeta(x) = \sum_{2 \le i < j} \zeta_{ij}(x)\, \eta^1(x) \wedge \eta^i(x) \wedge \eta^j(x).$

Hence $\zeta_{ij} = 0$ for $2 \le i < j$, and

        $\zeta(x) = \sum_{j=2}^{n} \zeta_{1j}(x)\, \eta^1(x) \wedge \eta^j(x).$

We take $\theta = -\sum_{j=2}^{n} \zeta_{1j}\eta^j/|\omega|$. Then $\zeta = \theta \wedge \omega$ in $\Omega$. □

PROOF OF THEOREM 7.5(b). Suppose that ro is as in (7.69) with ro /\ dro = O.


We may suppose that x* = O. By Lemma I we may also suppose that dro =
9 /\ ro, for some I-form 9.
In the proof we use a construction similar to that used in proving Theorem
6.2(b). We define F(u, z) below in a way similar to the proof of that theorem,
and show that (6.21) holds under the present hypotheses. This implies that
an integrating factor exists in some open set ~ containing 0, by the last step
of the proof of Theorem 6.2(b).
As in Section 6.6, we define gl(t) as the solution of the differential equation

(7.70) z = (x 2 , ... , x")

with gl(O) = u, for 0 :::; t :::; 1, (u, z) E Q2, where Q 2 is some neighborhood of O.
By definition, gl(l) = F(u, z). Moreover, gl(t) = F(u, tz) (see Problem 2).
Hence
dg l
(7.71) -d
t
= L Fi(u, tz)x'
i=2
" .

where Fi = OF/OXi. Let H(t, u, z) = (F(u, tz), tz). By (7.70) and (7.71) we have

(7.72) o = L" (Fi + Wi 0 H)x i


i=2

where $F_i$ is evaluated at $(u, tz)$. Let $\#$ denote the transformation of differential forms induced by the transformation $H$. Then from (7.69),
\[ \omega^\# = F_1\, du + \sum_{i=2}^{n} F_i\,(x^i\, dt + t\, dx^i) + \sum_{i=2}^{n} (\omega_i \circ H)(x^i\, dt + t\, dx^i), \]
where $F_1 = \partial F/\partial u$. Let
\[ \psi_i(t, u, z) = \{F_i(u, tz) + \omega_i[H(t, u, z)]\}\, t. \]
By (7.72), the terms multiplying $dt$ are zero, and
\[ \omega^\# = F_1\, du + \sum_{i=2}^{n} \psi_i\, dx^i. \]

By the fact that $d\omega = \theta \wedge \omega$ and Theorem 7.4,
\[ d\omega^\# = \theta^\# \wedge \omega^\#. \]
Hence the coefficient of $dt \wedge dx^i$, $i \ge 2$, must be the same on both sides. On the left side this coefficient is $\partial\psi_i/\partial t$, and on the right side $\phi\psi_i$, where
\[ \theta^\# = \phi\, dt + \text{other terms}. \]
The precise expression for the coefficient $\phi(t, u, z)$ is unimportant. For fixed $u, z$, the function $\psi_i$ satisfies the linear differential equation
\[ \frac{\partial \psi_i}{\partial t} = \phi\,\psi_i. \]
Since $\psi_i = 0$ when $t = 0$, we must have $\psi_i = 0$ for $0 \le t \le 1$. In particular, when $t = 1$ we obtain
\[ 0 = F_i(u, z) + \omega_i[F(u, z), z], \qquad i = 2, \dots, n. \]
This is equivalent to (6.21). □

PROBLEMS

1. Let $\omega$ be a continuous 1-form with $\omega(0) \ne 0$. Show that there exist a neighborhood $\Omega$ of $0$ and 1-forms $\eta^1, \dots, \eta^n$ continuous on $\Omega$ such that $\eta^i(x) \cdot \eta^j(x) = \delta_{ij}$, $i, j = 1, \dots, n$, and $\eta^1(x) = |\omega(x)|^{-1}\omega(x)$, for all $x \in \Omega$. Here $\delta_{ij} = 1$ if $i = j$, $\delta_{ij} = 0$ if $i \ne j$. [Hint: Apply the Gram-Schmidt process to $\{\omega^1, \dots, \omega^n\}$ where $\omega^1 = \omega$ and $\omega^2, \dots, \omega^n$ are chosen so that the covectors $\omega^1(0), \dots, \omega^n(0)$ are linearly independent.]
2. Let $g^1(t)$ be as in (7.70), and $g^1(1) = F(u, z)$. Show that $g^1(k) = F(u, kz)$ for $0 \le k \le 1$. [Hint: Consider the change of variable $\tau = kt$.]

8
Integration on manifolds

The topic of this chapter is integration over subsets of an r-manifold $M \subset E^n$. For this purpose we first study in Section 8.1 regular transformations from $E^r$ into $M$. Then we find that coordinates can be introduced on portions of $M$, using the inverses of regular transformations. Such a portion $S$ is called a coordinate patch on $M$. It is not always possible to find a single coordinate system for all of $M$. However, from the implicit function theorem, coordinates can be introduced locally. Using this fact, together with a device called partition of unity, the integral of a continuous function $f$ over a set $A \subset M$ is defined in Section 8.3.
One of the main results of the chapter is the divergence theorem. If $D \subset E^n$ is an open set satisfying certain regularity conditions, then the divergence theorem equates the integral over the boundary $\operatorname{fr} D$ of the normal component of a 1-form $\zeta$ and the integral over $D$ of the divergence $\operatorname{div} \zeta$. An equivalent version of the divergence theorem states that the integral of an $(n-1)$-form $\omega$ over $\operatorname{fr} D$ equals the integral of $d\omega$ over $D$, provided $D$ and $\operatorname{fr} D$ are oriented consistently. When $n = 2$ and $3$, the divergence theorem is equivalent to theorems in vector analysis commonly attributed to Green and to Gauss.
The divergence theorem is a special case of a result that states that the integral of the differential $d\omega$ of an $(r-1)$-form $\omega$ over a portion $A$ of an oriented r-manifold $M$ equals the integral of $\omega$ over the suitably oriented boundary of $A$. This is called Stokes's formula. In Section 8.10 the idea of homotopy between two transformations is introduced and is applied to give sufficient conditions in order that a closed differential form be exact. Some physical applications are mentioned in Section 8.5 and at the end of the chapter.


8.1 Regular transformations


Consider integers $r$ and $n$ with $1 \le r \le n$. We recall from Section 4.7 the concept of r-manifold $M \subset E^n$. One may think intuitively of $M$ as being locally "like" $E^r$. For instance, if $r = 2$ and $n = 3$, then $M$ is a 2-dimensional surface in $E^3$. In Chapter 4 an r-manifold $M$ is defined as a subset of some euclidean $E^n$ that can locally be described by setting equal to $0$ functions $\Phi^1, \dots, \Phi^{n-r}$ with linearly independent differentials. An r-manifold $M$ has at each $x \in M$ a tangent space, denoted in the present section by $T_M(x)$.
For purposes of integration it is necessary to consider manifolds from a different point of view. We must show that a manifold can be locally described by a system of coordinates. This idea will be made precise in Section 8.2. In the present section we study transformations from a set $\Delta \subset E^r$ into $M$ that are regular in the sense defined below. By means of such transformations we can find geometric quantities associated with $M$, such as tangent spaces and the r-dimensional measure of certain subsets of $M$. We recall from Section 4.3 the concepts of differential $Dg(t)$ of a transformation $g$, and of the partial derivatives $g_1(t), \dots, g_r(t)$.

Definition. Let $g$ be a transformation from an open set $\Delta \subset E^r$ into an r-manifold $M \subset E^n$, where $r \le n$. Then $g$ is regular if:
(1) $g$ is of class $C^{(1)}$,
(2) $g$ is univalent, and
(3) $Dg(t)$ has rank $r$, for every $t \in \Delta$.

Since $r \le n$, Condition (3) states that the differential $Dg(t)$ has maximum possible rank. We see below that (3) can be replaced by an equivalent condition (3'). When $r = 1$, Condition (3) states that $g'(t) \ne 0$. This is the same condition imposed in Section 6.2 in defining parametric representations of curves. The concepts of Section 6.2, such as equivalence, could be generalized to $r > 1$. However, this is not done in this book.

Proposition 8.1. Let $M$ be an r-manifold and $g$ a regular transformation from $\Delta \subset E^r$ into $M$. Let $x_0 = g(t_0)$, $t_0 \in \Delta$. Then the partial derivatives $g_1(t_0), \dots, g_r(t_0)$ form a basis for the tangent space $T_M(x_0)$.

PROOF. Let $\{\epsilon_1, \dots, \epsilon_r\}$ be the standard basis for $E^r$. Since $\Delta$ is open, it contains the $\delta$-neighborhood of $t_0$ for some $\delta > 0$. For $|s| < \delta$, let $\psi(s) = g(t_0 + s\epsilon_1)$. Then $\psi$ is of class $C^{(1)}$, and $\psi(s) \in M$. (One may think of $\psi$ as representing a curve on $M$ through $x_0$, obtained by fixing the components $t^j = t_0^j$ of $t$ for $j = 2, \dots, r$.) By definition of tangent vector (Section 4.7), $\psi'(0)$ is a tangent vector to $M$ at $x_0$. But $\psi'(0) = g_1(t_0)$ by definition of partial derivative. Thus $g_1(t_0) \in T_M(x_0)$. Similarly, the partial derivatives $g_2(t_0), \dots, g_r(t_0)$ are tangent vectors to $M$ at $x_0$. Since $Dg(t_0)$ has rank $r$, the vectors $g_1(t_0), \dots, g_r(t_0)$ are linearly independent. Hence they form a basis for $T_M(x_0)$, which is an r-dimensional vector subspace of $E^n$. □

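EXAMPLE 1. Let $M = \{(x, y, z) : z^2 = x^2 + y^2,\ z > 0\}$. This 2-manifold $M \subset E^3$ is the upper half of a cone, with the vertex deleted. We take $\Delta = \{(s, t) : s > 0,\ 0 < t < \pi\}$, $g(s, t) = (s \cos t)e_1 + (s \sin t)e_2 + s e_3$. Then $g(\Delta) \subset M$. In fact, $g(\Delta) = \{(x, y, z) \in M : y > 0\}$. Let us denote partial derivatives of $g$ by $\partial g/\partial s$, $\partial g/\partial t$ rather than $g_1$, $g_2$. Then
\[ \frac{\partial g}{\partial s} = (\cos t)e_1 + (\sin t)e_2 + e_3, \qquad \frac{\partial g}{\partial t} = (-s \sin t)e_1 + (s \cos t)e_2. \]
The vectors $\partial g/\partial s$, $\partial g/\partial t$ are linearly independent. By Proposition 8.1 they form a basis for the tangent space at $x = g(s, t)$. For instance, at the point $g(2, \pi/3) = e_1 + \sqrt{3}\, e_2 + 2e_3$, this basis is
\[ \frac{\partial g}{\partial s} = \tfrac{1}{2} e_1 + \tfrac{\sqrt{3}}{2} e_2 + e_3, \qquad \frac{\partial g}{\partial t} = -\sqrt{3}\, e_1 + e_2. \]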
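
The tangent basis in Example 1 can be checked numerically. The following is only a sketch, not part of the text's development: it approximates the two partial derivatives of $g$ at $(s, t) = (2, \pi/3)$ by central differences and verifies that both are orthogonal to the gradient of $\Phi(x, y, z) = z^2 - x^2 - y^2$, the function whose zero set contains the cone. The helper names `partials` and `dot` and the step size `h` are illustrative assumptions.

```python
import math

def g(s, t):
    # Parametrization of the half-cone from Example 1: (s cos t, s sin t, s).
    return (s * math.cos(t), s * math.sin(t), s)

def partials(f, s, t, h=1e-6):
    # Central-difference approximations to the two partial derivatives of f at (s, t).
    f_s = tuple((p - q) / (2 * h) for p, q in zip(f(s + h, t), f(s - h, t)))
    f_t = tuple((p - q) / (2 * h) for p, q in zip(f(s, t + h), f(s, t - h)))
    return f_s, f_t

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

s0, t0 = 2.0, math.pi / 3
x0 = g(s0, t0)
g_s, g_t = partials(g, s0, t0)

# Gradient at x0 of Phi(x, y, z) = z^2 - x^2 - y^2; tangent vectors are orthogonal to it.
grad_Phi = (-2 * x0[0], -2 * x0[1], 2 * x0[2])

print("x0  =", x0)
print("g_s =", g_s)
print("g_t =", g_t)
print("g_s . grad Phi =", dot(g_s, grad_Phi))
print("g_t . grad Phi =", dot(g_t, grad_Phi))
```

Both dot products should vanish up to rounding, which is the numerical counterpart of the statement that the partial derivatives lie in the tangent space.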

By the same method of proof as for Proposition 8.1 one can show that $h = Dg(t_0)(k)$ is a tangent vector to $M$ at $x_0$, for any $k \in E^r$ (see Problem 6). The special choices $k = \epsilon_j$, $j = 1, \dots, r$, gave the basis in Proposition 8.1 for the space $T_M(x_0)$ of tangent vectors at $x_0$. Thus we have:

Corollary. $T_M(x_0)$ is the image of $E^r$ under the linear transformation $Dg(t_0)$.

The exterior product of the partial derivatives of $g$ at $t$ is the r-vector $g_1(t) \wedge \cdots \wedge g_r(t)$. Let us denote the euclidean norm of this r-vector by $\mathcal{J}g(t)$. Thus
\[ \mathcal{J}g(t) = |g_1(t) \wedge \cdots \wedge g_r(t)|. \tag{8.1} \]
By Theorem 7.3, $\mathcal{J}g(t) > 0$ if and only if the vectors $g_1(t), \dots, g_r(t)$ are linearly independent. Since these vectors are the columns of the matrix of $Dg(t)$, their linear independence is equivalent to the fact that $Dg(t)$ has maximum rank $r$. Thus condition (3) in the definition of regular transformation is equivalent to

(3') $\mathcal{J}g(t) > 0$ for every $t \in \Delta$.

The notation $\mathcal{J}g(t)$ gives a convenient way to define the concept of r-dimensional measure for certain subsets of an r-manifold $M$.


Definition. Let $g$ be a regular transformation from $\Delta$ into $M$. Let $A = g(B)$, where $B$ is a measurable subset of $\Delta$. Then the r-dimensional measure of $A$ is
\[ V_r(A) = \int_B \mathcal{J}g(t)\, dV_r(t). \tag{8.2} \]
For brevity, we also call $V_r(A)$ the r-measure of $A$. When $r = 1, 2, 3$, we often say length, area, or volume of $A$ as usual, rather than r-measure. It will be shown in Section 8.3 that if $\tilde{g}$ is another regular transformation and $\tilde{B}$ a set such that $A = \tilde{g}(\tilde{B})$, then $V_r(A) = \int_{\tilde{B}} \mathcal{J}\tilde{g}(\tilde{t})\, dV_r(\tilde{t})$. In the language introduced in Section 8.2, this means that $V_r(A)$ does not depend on the particular coordinate system chosen on $A$.
The motivation for Formula (8.2) is as follows. Consider first a point $t_0 \in \Delta$, and $I$ an r-cube with $t_0$ as vertex and side length $a$, as in Figure 8.1. Let $x_0 = g(t_0)$ and $K$ be the r-parallelepiped in $E^n$ spanned by $a g_1(t_0), \dots, a g_r(t_0)$ with $x_0$ as vertex. By (7.35),
\[ V_r(K) = |a g_1(t_0) \wedge \cdots \wedge a g_r(t_0)| = a^r \mathcal{J}g(t_0). \]
Since $V_r(I) = a^r$, we have $V_r(K) = \mathcal{J}g(t_0) V_r(I)$. The affine approximation to $g$ at $t_0$ is
\[ G(t) = x_0 + L(t - t_0), \qquad L = Dg(t_0). \]
Moreover, $K = G(I)$. This suggests that $V_r(K)$ should be a very good approximation to $V_r[g(I)]$. If $Z \subset \Delta$ is a figure that represents the union of small nonoverlapping r-cubes $I_1, \dots, I_p$, then $V_r[g(Z)]$ should approximately equal
\[ \sum_{k=1}^{p} V_r(K_k) = \sum_{k=1}^{p} \mathcal{J}g(t_k) V_r(I_k), \]
where $K_k$ is the r-parallelepiped obtained by affine approximation at an appropriate vertex $t_k \in I_k$. Since a measurable set $B$ can be approximated by figures, this rough argument should make plausible the definition (8.2) of $V_r(A)$.

Figure 8.1
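
The approximating sum in this heuristic argument can be tried out directly for $r = 2$. The sketch below is an illustration under stated assumptions rather than the book's method: it evaluates $\mathcal{J}g$ through the identity $|a \wedge b|^2 = (a \cdot a)(b \cdot b) - (a \cdot b)^2$ for two vectors, approximates the partial derivatives by central differences, and samples each small square at its midpoint instead of at a vertex. The names `jac_norm`, `area`, and `cone` are hypothetical, and the half-cone of Example 1 with $1 < s < 2$ is used as test surface.

```python
import math

def jac_norm(g, s, t, h=1e-6):
    # |g_s ^ g_t| for a map g from E^2 into E^n, using
    # |a ^ b|^2 = (a.a)(b.b) - (a.b)^2 and central-difference partials.
    g_s = [(p - q) / (2 * h) for p, q in zip(g(s + h, t), g(s - h, t))]
    g_t = [(p - q) / (2 * h) for p, q in zip(g(s, t + h), g(s, t - h))]
    aa = sum(p * p for p in g_s)
    bb = sum(p * p for p in g_t)
    ab = sum(p * q for p, q in zip(g_s, g_t))
    return math.sqrt(max(aa * bb - ab * ab, 0.0))

def area(g, s_range, t_range, n=200):
    # Sum of Jg(t_k) * V_2(I_k) over an n-by-n grid of small squares I_k,
    # with t_k taken at the midpoint of each square.
    (s0, s1), (t0, t1) = s_range, t_range
    ds, dt = (s1 - s0) / n, (t1 - t0) / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += jac_norm(g, s0 + (i + 0.5) * ds, t0 + (j + 0.5) * dt)
    return total * ds * dt

def cone(s, t):
    # Half-cone of Example 1.
    return (s * math.cos(t), s * math.sin(t), s)

approx = area(cone, (1.0, 2.0), (0.0, math.pi))
# Value from elementary geometry: half the lateral area between z = 1 and z = 2.
exact = math.sqrt(2) / 2 * math.pi * (2.0 ** 2 - 1.0 ** 2)
print(approx, exact)
```

For this surface the integrand is linear in $s$ and independent of $t$, so the midpoint sum agrees with the geometric value up to the finite-difference error.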

EXAMPLE 1 (continued). Let us find $\mathcal{J}g(s, t)$. From the rules of exterior multiplication,
\[ \frac{\partial g}{\partial s} \wedge \frac{\partial g}{\partial t} = [(\cos t)e_1 + (\sin t)e_2 + e_3] \wedge [(-s \sin t)e_1 + (s \cos t)e_2] = s\, e_{12} - (s \sin t)e_{31} - (s \cos t)e_{23}, \]
\[ \mathcal{J}g(s, t) = |s\, e_{12} - (s \sin t)e_{31} - (s \cos t)e_{23}| = (s^2 + s^2 \sin^2 t + s^2 \cos^2 t)^{1/2} = \sqrt{2}\, s. \]
Let us take $B = \{(s, t) : a < s < b,\ 0 < t < \pi\}$. Then $A = g(B)$ is the portion of the cone $z^2 = x^2 + y^2$ between the planes $z = a$ and $z = b$, for which $y > 0$. We have
\[ V_2(A) = \int_B \sqrt{2}\, s\, dV_2(s, t) = \int_0^{\pi} dt \int_a^b \sqrt{2}\, s\, ds = \frac{\sqrt{2}}{2}\, \pi (b^2 - a^2). \]
This result can, of course, also be found from elementary geometry.

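EXAMPLE 2. Let $S \subset E^3$ be a set such that
\[ S = \{(x, y, \phi(x, y)) : (x, y) \in R\}, \]
where $R$ is an open subset of $E^2$ and $\phi$ is of class $C^{(1)}$ on $R$. The set $S$ is a 2-manifold. To see this, let $\Phi(x, y, z) = z - \phi(x, y)$, $D = \{(x, y, z) : (x, y) \in R\}$. Then $\operatorname{grad} \Phi(x) \ne 0$ and $S = \{x \in D : \Phi(x) = 0\}$. Hence $S$ is the 2-manifold determined by $\Phi$; see Section 4.7.
Let
\[ g(x, y) = x e_1 + y e_2 + \phi(x, y) e_3. \]
Then $g$ is of class $C^{(1)}$ from $R$ onto $S$ and is univalent. The vectors $\partial g/\partial x$ and $\partial g/\partial y$ give a basis for the tangent space $T_S[g(x, y)]$. By (8.1), $\mathcal{J}g(x, y) = |\partial g/\partial x \wedge \partial g/\partial y|$. (See Figure 8.2.)
Calculating these partial derivatives, we get
\[ \frac{\partial g}{\partial x} = e_1 + \frac{\partial \phi}{\partial x} e_3, \qquad \frac{\partial g}{\partial y} = e_2 + \frac{\partial \phi}{\partial y} e_3, \]
\[ \frac{\partial g}{\partial x} \wedge \frac{\partial g}{\partial y} = e_{12} - \frac{\partial \phi}{\partial y} e_{31} - \frac{\partial \phi}{\partial x} e_{23}, \]
\[ \mathcal{J}g(x, y) = \left[ 1 + \left( \frac{\partial \phi}{\partial x} \right)^2 + \left( \frac{\partial \phi}{\partial y} \right)^2 \right]^{1/2}. \]

Figure 8.2

Since the last line is positive, $\mathcal{J}g(x, y) > 0$. Thus $g$ is a regular transformation from $R$ onto $S$. The area of a set $A \subset S$ lying above a measurable set $B \subset R$ is
\[ V_2(A) = \int_B \left[ 1 + \left( \frac{\partial \phi}{\partial x} \right)^2 + \left( \frac{\partial \phi}{\partial y} \right)^2 \right]^{1/2} dV_2(x, y). \]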
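
As a small illustration of the area formula for a graph, the sketch below applies a midpoint rule to the integrand $[1 + |\operatorname{grad} \phi|^2]^{1/2}$ over a rectangle. The surface $\phi(x, y) = x^2$ over the unit square is an assumed test case, chosen because its area has the closed form $\sqrt{5}/2 + \tfrac{1}{4}\sinh^{-1} 2$; the function name `graph_area` and the grid size are likewise assumptions.

```python
import math

def graph_area(phi_x, phi_y, x_range, y_range, n=300):
    # Midpoint-rule approximation of the area of the graph of phi over a rectangle.
    # phi_x and phi_y are the partial derivatives of phi, supplied by the caller.
    (x0, x1), (y0, y1) = x_range, y_range
    dx, dy = (x1 - x0) / n, (y1 - y0) / n
    total = 0.0
    for i in range(n):
        x = x0 + (i + 0.5) * dx
        for j in range(n):
            y = y0 + (j + 0.5) * dy
            total += math.sqrt(1.0 + phi_x(x, y) ** 2 + phi_y(x, y) ** 2)
    return total * dx * dy

# Assumed test surface: phi(x, y) = x^2 over the unit square.
approx = graph_area(lambda x, y: 2.0 * x, lambda x, y: 0.0, (0.0, 1.0), (0.0, 1.0))
# Closed form of the integral of sqrt(1 + 4 x^2) over [0, 1].
exact = math.sqrt(5) / 2 + math.asinh(2.0) / 4
print(approx, exact)
```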

Example 2 can be generalized to a set $S$ described by expressing $n - r$ of the components $x^1, \dots, x^n$ as functions of the remaining $r$ components.

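EXAMPLE 3. Let $\lambda = (i_1, \dots, i_r)$ be an increasing r-tuple of integers, $1 \le i_k \le n$ for $k = 1, \dots, r$; and let $(j_1, \dots, j_{n-r})$ be the increasing $(n-r)$-tuple complementary to $\lambda$. Let $R$ be an open subset of $E^r$, and $\phi^1, \dots, \phi^{n-r}$ of class $C^{(1)}$ on $R$. Let $g$ be the transformation from $R$ into $E^n$ such that
\[ g^{i_k}(x^\lambda) = x^{i_k}, \quad k = 1, \dots, r, \]
\[ g^{j_l}(x^\lambda) = \phi^l(x^\lambda), \quad l = 1, \dots, n - r, \]
where $x^\lambda = (x^{i_1}, \dots, x^{i_r})$.
Let $S = g(R)$. Then $S$ is the r-manifold determined by $\Phi = (\Phi^1, \dots, \Phi^{n-r})$, where $\Phi^l(x) = x^{j_l} - \phi^l(x^\lambda)$ for $x \in D$ with $D = \{x : x^\lambda \in R\}$. Then $g$ is of class $C^{(1)}$ and univalent. The explicit formula for $\mathcal{J}g(x^\lambda)$ is complicated. However, we can show that $\mathcal{J}g(x^\lambda) > 0$ as follows. Since $\partial g/\partial x^{i_k} = e_{i_k} + \sum_{l} (\partial \phi^l/\partial x^{i_k})\, e_{j_l}$ for each $k$,
\[ \frac{\partial g}{\partial x^{i_1}} \wedge \cdots \wedge \frac{\partial g}{\partial x^{i_r}} = e_\lambda + \text{other terms}. \]
This r-vector is not $0$ since its $\lambda$th component (the coefficient of $e_\lambda$) is $1$. Hence $\mathcal{J}g(x^\lambda) > 0$, which shows that $g$ is regular from $R$ onto $S$. If we set $X^\lambda(x) = x^\lambda$, then $X^\lambda$ projects $E^n$ onto $E^r$ and $g = (X^\lambda | S)^{-1}$.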


The case r = n
Let $\phi$ be a transformation of class $C^{(1)}$ from $\Delta \subset E^r$ into $E^r$. In this chapter we call $\phi$ a flat transformation. Condition (3) in the definition of regular transformation above is equivalent to $J\phi(t) \ne 0$ for every $t \in \Delta$, where $J\phi(t)$ is the Jacobian. For flat transformations, this definition of regularity agrees with that in Section 4.5. Regular flat transformations will be an essential tool in relating overlapping coordinate systems on a manifold $M$. As a first step in this direction, we show the following.

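Proposition 8.2. Let $\tilde{g}$ be a regular transformation from $\tilde{\Delta}$ into $M$, and $\phi$ a regular flat transformation from $\Delta$ into $\tilde{\Delta}$. Then $g = \tilde{g} \circ \phi$ is a regular transformation from $\Delta$ into $M$. Moreover,
\[ \mathcal{J}g(t) = \mathcal{J}\tilde{g}[\phi(t)]\, |J\phi(t)|. \tag{8.3} \]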

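PROOF. Since $\tilde{g}$ and $\phi$ are both $C^{(1)}$ and univalent, their composite $g$ is also of class $C^{(1)}$ and univalent. By the composite function theorem (Section 4.3), $Dg(t) = D\tilde{g}[\phi(t)] \circ D\phi(t)$. Since $D\phi(t)$ is a nonsingular linear transformation from $E^r$ onto $E^r$, and $D\tilde{g}[\phi(t)]$ is a linear transformation of rank $r$, $Dg(t)$ also has rank $r$. Thus $g$ is regular. It remains to prove (8.3). Given $t$, let $L = D\tilde{g}[\phi(t)]$. If $k_1, \dots, k_r$ are linearly independent vectors in $E^r$ and $h_j = L(k_j)$, $j = 1, \dots, r$, then
\[ \mathcal{J}\tilde{g}[\phi(t)] = \frac{|h_1 \wedge \cdots \wedge h_r|}{|k_1 \wedge \cdots \wedge k_r|}. \]
This follows by Formulas (7.40) and (7.40a) with $v_j = \tilde{g}_j[\phi(t)]$. In particular, let $k_j = \phi_j(t)$, $j = 1, \dots, r$, namely, the partial derivatives of $\phi$ at $t$. By the composite function theorem, $h_j = g_j(t)$. Therefore,
\[ \mathcal{J}\tilde{g}[\phi(t)] = \frac{|g_1(t) \wedge \cdots \wedge g_r(t)|}{|\phi_1(t) \wedge \cdots \wedge \phi_r(t)|} = \frac{\mathcal{J}g(t)}{\mathcal{J}\phi(t)}. \]
By (7.38),
\[ \phi_1(t) \wedge \cdots \wedge \phi_r(t) = \det(\phi^i_j(t))\, e_{1 \cdots r} = J\phi(t)\, e_{1 \cdots r}. \]
Therefore $\mathcal{J}\phi(t) = |\phi_1(t) \wedge \cdots \wedge \phi_r(t)| = |J\phi(t)|$. This proves (8.3). □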
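
Formula (8.3) can be spot-checked numerically. In the sketch below, which is an illustration under assumptions and not part of the proof, the outer regular transformation (called `g_tilde`) is the graph parametrization of the cone $z = (x^2 + y^2)^{1/2}$, the flat transformation `phi` is the polar-coordinate map, and `g` is their composite. All derivatives are central-difference approximations; the helper names and the sample point are hypothetical.

```python
import math

def partials(f, u, v, h=1e-6):
    # Central-difference approximations to the two partial derivatives of f at (u, v).
    f_u = [(p - q) / (2 * h) for p, q in zip(f(u + h, v), f(u - h, v))]
    f_v = [(p - q) / (2 * h) for p, q in zip(f(u, v + h), f(u, v - h))]
    return f_u, f_v

def wedge_norm(a, b):
    # |a ^ b| computed from |a ^ b|^2 = (a.a)(b.b) - (a.b)^2.
    aa = sum(p * p for p in a)
    bb = sum(p * p for p in b)
    ab = sum(p * q for p, q in zip(a, b))
    return math.sqrt(max(aa * bb - ab * ab, 0.0))

def g_tilde(x, y):
    # Assumed outer map: the cone z = sqrt(x^2 + y^2) as a graph over the (x, y) plane.
    return (x, y, math.hypot(x, y))

def phi(r, theta):
    # Assumed flat transformation: polar coordinates, with Jacobian r.
    return (r * math.cos(theta), r * math.sin(theta))

def g(r, theta):
    # Composite g = g_tilde o phi.
    return g_tilde(*phi(r, theta))

r0, theta0 = 1.5, 0.7
lhs = wedge_norm(*partials(g, r0, theta0))            # Jg(t)
p_r, p_theta = partials(phi, r0, theta0)
jac_phi = p_r[0] * p_theta[1] - p_r[1] * p_theta[0]   # J phi(t)
rhs = wedge_norm(*partials(g_tilde, *phi(r0, theta0))) * abs(jac_phi)
print(lhs, rhs)
```

Both sides should be close to $\sqrt{2}\, r_0$, since the graph parametrization has constant $\mathcal{J}\tilde{g} = \sqrt{2}$ on the cone and the polar map has Jacobian $r$.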

We recall that an r-manifold $M \subset E^n$ is a topological space, when given the relative topology (Section 2.6).

Proposition 8.3. If $g$ is a regular transformation from $\Delta$ into $M$, then $g(\Delta)$ is a relatively open subset of $M$.

Figure 8.3

PROOF. Consider any $t_0 \in \Delta$, and let $x_0 = g(t_0)$. It suffices to show that a neighborhood $\Omega$ of $t_0$ exists such that $g(\Omega)$ is a relatively open subset of $M$ containing $x_0$. If $\lambda = (i_1, \dots, i_r)$ is an increasing r-tuple, then let $x^\lambda = (x^{i_1}, \dots, x^{i_r})$, and $X^\lambda(x) = x^\lambda$ as in Example 3 above.
By the implicit function theorem, there exist an increasing r-tuple $\lambda$ and a neighborhood $U$ of $x_0$ with the following properties. Let $S = M \cap U$ and $R = X^\lambda(S)$. Then $R$ is an open subset of $E^r$ containing $x_0^\lambda$, $X^\lambda | S$ is univalent, and $G = (X^\lambda | S)^{-1}$ is a regular transformation from $R$ onto $S$. Note that $G$ is the kind of regular transformation denoted by $g$ in Example 3.
Let $g^\lambda = X^\lambda \circ g$. Note that $g^\lambda$ is the flat transformation obtained by considering only the components $g^{i_1}, \dots, g^{i_r}$ of $g$. Let $\Omega$ be a neighborhood of $t_0$ such that $g(\Omega) \subset S$. For $t \in \Omega$, $g(t) = G[g^\lambda(t)]$ (see Figure 8.3). The transformation $g^\lambda$ is $C^{(1)}$, and $g^\lambda | \Omega$ is univalent. By the composite function theorem, $Dg(t) = DG(x^\lambda) \circ Dg^\lambda(t)$. Since $Dg(t)$ has maximum rank $r$, $Dg^\lambda(t)$ must also have rank $r$ for any $t \in \Omega$. Thus $g^\lambda | \Omega$ is a regular flat transformation from $\Omega$ into $R$. By the corollary to the inverse function theorem (Section 4.5), $g^\lambda(\Omega)$ is an open subset of $R$. By the way $G$ is constructed, $G(R_1)$ is open relative to $M$, for any open $R_1 \subset R$. But $g(\Omega) = G(R_1)$ when $R_1 = g^\lambda(\Omega)$. Thus $g(\Omega)$ is a relatively open subset of $M$ containing $x_0$. □

Remark. In the definition of regular transformation at the beginning of the section, we assumed not only Conditions (1), (2), and (3) but also that the set $g(\Delta)$ is contained in an r-manifold $M$. It is shown in Section 8.9 that if $g$ is any transformation from $\Delta$ into $E^n$, then $g(\Delta)$ is an r-manifold provided (1), (2), and (3) hold and that $g(\Delta_1)$ is open relative to $g(\Delta)$ for any open set $\Delta_1 \subset \Delta$. Problem 3 illustrates the difficulty encountered if simply (1), (2), and (3) are assumed.


PROBLEMS

1. For each of the following transformations from $\Delta \subset E^2$ into $E^3$, find $\mathcal{J}g(s, t)$ and $g(\Delta)$. Show that $g$ is univalent.
(a) $g(s, t) = (s + t)e_1 + (s - 3t)e_2 + (-2s + 2t + 2)e_3$, $\Delta = \{(s, t) : 0 < s + t < 1,\ s > 0,\ t > 0\}$.
(b) $g(\rho, \theta) = (\rho \cos \alpha)e_1 + (\rho \sin \alpha \cos \theta)e_2 + (\rho \sin \alpha \sin \theta)e_3$, $0 < \alpha < \pi/2$ ($\alpha$ fixed), $\Delta = (0, \infty) \times (0, 2\pi)$.
(c) $g(s, t) = st\, e_1 + s e_2 + t e_3$, $\Delta = E^2$.

2. In each case find the tangent space $T_S(x_0)$ to $S = g(\Delta)$ at the indicated point $x_0$.
(a) In Problem I(a), take Xo = jel - je2 + 2e 3 ·
(b) In Problem 1(b), take $x_0 = (\cos \alpha)e_1 + (\sqrt{2}/2) \sin \alpha\, (e_2 + e_3)$.
(c) In Problem 1(c), take $x_0 = e_1 + e_2 + e_3$.
3. Let $g(t) = (\cos t)e_1 + (\sin 2t)e_2$, $\Delta = (0, 3\pi/2)$. Sketch $g(\Delta)$. Show that $g$ is univalent, that $\mathcal{J}g(t) > 0$, but that $g(\Delta)$ is not a 1-manifold.

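4. Find the area of $\{(x, y, xy) : x^2 + y^2 \le 1\}$.

5. Let $n = 3$, $r = 2$. Show that (8.2) becomes
\[ V_2(A) = \int_B \left\{ \left[ \frac{\partial(g^2, g^3)}{\partial(s, t)} \right]^2 + \left[ \frac{\partial(g^3, g^1)}{\partial(s, t)} \right]^2 + \left[ \frac{\partial(g^1, g^2)}{\partial(s, t)} \right]^2 \right\}^{1/2} dV_2(s, t). \]

6. Let $g$ be regular from $\Delta$ into $M$. Show that $Dg(t_0)(k)$ is a tangent vector to $M$ at $x_0 = g(t_0)$, for any vector $k \in E^r$.

7. Let $n = 4$, $r = 2$, $\lambda = (1, 2)$.
(a) Show that in Example 3,
\[ \mathcal{J}g(x^1, x^2) = [1 + |\operatorname{grad} \phi^1|^2 + |\operatorname{grad} \phi^2|^2 + (\phi^1_1 \phi^2_2 - \phi^1_2 \phi^2_1)^2]^{1/2}. \]
(b) Find $V_2(A)$ if
\[ A = \{x \in E^4 : x^3 = (x^1)^2 - (x^2)^2,\ x^4 = 2x^1 x^2,\ (x^1)^2 + (x^2)^2 \le 1\}. \]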

8.2 Coordinate systems on manifolds


Let $M$ be an r-manifold. Since $M$ is r-dimensional, it should be possible to find, at least locally on $M$, $r$ functions $F^1, \dots, F^r$ such that the numbers $F^1(x), \dots, F^r(x)$ serve as coordinates of a point $x \in M$. If $S$ is a portion of $M$ coordinatized in this way, then $F = (F^1, \dots, F^r)$ should be a univalent function from $S$ into $E^r$. We also require that the inverse $g = F^{-1}$ be regular, as defined in Section 8.1.

Definition. Let $S$ be a nonempty, relatively open subset of an r-manifold $M \subset E^n$. A univalent transformation $F$ from $S$ into $E^r$ is a coordinate system for $S$ if $F = g^{-1}$, where $g$ is a regular transformation from an open set $\Delta$ onto $S$. The coordinates of a point $x \in S$ in this system are $F^1(x), \dots, F^r(x)$.


One might prefer a criterion in terms of F rather than its inverse, for F
to be a coordinate system. Such a criterion is given in Problem 5. On the
other hand, the inverse g rather than F is ordinarily used for calculations in
the sections to follow.
When $r = n$, we called in Section 5.9 any regular flat transformation $F = (F^1, \dots, F^n)$ a coordinate system. The inverse $g = F^{-1}$ of a regular flat transformation is regular. Hence the present definition is consistent with that in Section 5.9, when $r = n$.

EXAMPLE 1. Let $M = \{(x, y, z) : z^2 = k^2(x^2 + y^2),\ z \ne 0\}$ where $k > 0$. This 2-manifold $M$ is a cone in $E^3$ with the vertex removed. Let us consider three possible coordinate systems.
(a) Let $S$ be the upper half of $M$, namely, $S = \{(x, y, z) \in M : z > 0\}$. Let us take $x, y$ as coordinates of a point on $S$. Thus, $F = (X|S, Y|S)$ where $X, Y, Z$ are the standard cartesian coordinate functions for $E^3$. The inverse is
\[ g(x, y) = x e_1 + y e_2 + k(x^2 + y^2)^{1/2} e_3, \]
and $\Delta = E^2 - \{(0, 0)\}$. The lower half of $M$ is coordinatized similarly, replacing $k$ by $-k$.
(b) Let us use polar coordinates $(r, \theta)$ in the $(x, y)$ plane as coordinates of points on the upper half $S$ of the cone $M$. Let
\[ \tilde{g}(r, \theta) = (r \cos \theta)e_1 + (r \sin \theta)e_2 + kr\, e_3, \]
for $(r, \theta) \in \tilde{\Delta}$, where $\tilde{\Delta} = (0, \infty) \times (0, 2\pi)$. This coordinatizes $\tilde{S} = \tilde{g}(\tilde{\Delta})$. The set $\tilde{S}$ is a relatively open subset of $S$, obtained by removing the ray on $S$ corresponding to $\theta = 0$ or $2\pi$. The restriction $0 < \theta < 2\pi$ is needed to ensure that $\tilde{g}$ is univalent. When $k = 1$, $\tilde{g}$ is the same transformation considered in Example 1, Section 8.1.
(c) Spherical coordinates (see Problem 3).

EXAMPLE 2. Let us return to Examples 2 and 3, Section 8.1. In the first of these examples, $F = (X|S, Y|S)$ is a coordinate system for $S$. The coordinates of a point $(x, y, \phi(x, y)) \in S$ in this system are $x, y$. In Example 3, $F = X^\lambda | S$. The coordinates of a point $x$ in the coordinate system $F$ are $x^{i_1}, \dots, x^{i_r}$. Such a coordinate system is called cartesian. The choice of cartesian or some other coordinate system is often determined by convenience.

Proposition 8.4. If $F$ is a coordinate system for $S$, then $F$ is a homeomorphism from $S$ onto $\Delta = F(S)$.
PROOF. By definition, $F$ is univalent and its inverse $g$ is $C^{(1)}$, hence continuous. We must show that $F$ is continuous from $S$ onto $\Delta$. By Theorem 2.6, this is valid provided $F^{-1}(B)$ is open relative to $S$ for any open set $B \subset \Delta$. But $F^{-1}(B) = g(B)$; by Proposition 8.3 (with $\Delta$ there replaced by $B$ and $M$ by $S$), $g(B)$ is open relative to $S$. □

It is not ordinarily possible to find a single coordinate system for all of a manifold $M$. If $S \subset M$ has a coordinate system $F$, then by Proposition 8.4 $S$ is homeomorphic with an open set $\Delta \subset E^r$. Since $\Delta$ cannot be both open and compact, $S$ is not compact. In particular, a compact manifold $M$ (e.g., a sphere or torus) cannot be coordinatized by a single system.

Definition. A relatively open set $S \subset M$ which has a coordinate system is a coordinate patch on $M$.

By the implicit function theorem every point of $M$ lies in some coordinate patch $S$ of the sort in Example 3, Section 8.1. Actually, each point of $M$ lies in an infinite number of coordinate patches. Let us now show how different coordinate systems are related in overlapping patches.

Coordinate changes
We recall from Proposition 8.2 that the composite $g = \tilde{g} \circ \phi$ of a regular transformation $\tilde{g}$ and a regular flat transformation $\phi$ is regular. If $\tilde{\Delta}$ and $\Delta$ are the respective domains and if $\tilde{\Delta} = \phi(\Delta)$, then $\tilde{g}(\tilde{\Delta}) = g(\Delta)$. If we denote this set by $S$, then $S$ is a coordinate patch on $M$. Moreover, $F = g^{-1}$ and $\tilde{F} = \tilde{g}^{-1}$ are both coordinate systems for $S$. In this way, any regular flat transformation leads to a change of coordinates.

EXAMPLE 1 (continued). In this example, $\tilde{g}(r, \theta) = g(r \cos \theta, r \sin \theta)$. Let us replace $\Delta$ in part (a) by $\Delta_1 = E^2 - \{(x, 0) : x \ge 0\}$. Such a change makes no difference in computing integrals over a part of $\Delta$, since the half-line that has been removed has 2-dimensional measure $0$. For $(x, y) \in \Delta_1$, let $R(x, y) = (x^2 + y^2)^{1/2}$ and $\Theta(x, y)$ the angle from the positive x-axis to $(x, y)$ with $0 < \Theta(x, y) < 2\pi$. Then $\phi = (R, \Theta)$ and $g = \tilde{g} \circ \phi$.

Let us next consider overlapping coordinate patches $S$ and $\tilde{S}$ on $M$, with respective coordinate systems $F$ and $\tilde{F}$. Let us show that on $S \cap \tilde{S}$ these coordinate systems are related by such a regular flat transformation $\phi$. Let
\[ \Delta_0 = F(S \cap \tilde{S}), \qquad \tilde{\Delta}_0 = \tilde{F}(S \cap \tilde{S}), \]
and $g = F^{-1}$, $\tilde{g} = \tilde{F}^{-1}$. Figure 8.4 shows the appropriate choice for $\phi$.

Proposition 8.5. Let $\phi = \tilde{F} \circ (g | \Delta_0)$ as in Figure 8.4. Then $\phi$ is a regular flat transformation from $\Delta_0$ onto $\tilde{\Delta}_0$, and $g(t) = \tilde{g}[\phi(t)]$ for all $t \in \Delta_0$.

Figure 8.4

PROOF. The final assertion holds by construction. It remains to show that $\phi$ is regular. Since $\phi$ is univalent by construction, it suffices to show that any $t_0 \in \Delta_0$ has a neighborhood $\Omega_1$ such that $\phi | \Omega_1$ is regular. Let $x_0 = g(t_0)$. Let $\lambda$ and a neighborhood $U$ of $x_0$ be determined from the implicit function theorem, as in the proof of Proposition 8.3. It was shown there that $t_0$ has a neighborhood $\Omega$ such that $g^\lambda | \Omega$ is a regular flat transformation. Similarly, $\tilde{t}_0 = \phi(t_0)$ has a neighborhood $\tilde{\Omega}$ such that $\tilde{g}^\lambda | \tilde{\Omega}$ is regular. By the corollary to the inverse function theorem, $\tilde{R} = \tilde{g}^\lambda(\tilde{\Omega})$ is an open set. Moreover, $x_0^\lambda = \tilde{g}^\lambda(\tilde{t}_0) \in \tilde{R}$. Let $\Omega_1$ be a neighborhood of $t_0$ of radius small enough that $\Omega_1 \subset \Omega \cap \Delta_0$ and $g^\lambda(\Omega_1) \subset \tilde{R}$. Let $f = (\tilde{g}^\lambda | \tilde{\Omega})^{-1}$. Then $\phi(t) = f[g^\lambda(t)]$ for all $t \in \Omega_1$. Since $f$ is the inverse of a regular flat transformation, $f$ is regular. Since $g^\lambda | \Omega_1$ is regular, the composite $\phi | \Omega_1$ is regular. □

*Manifolds defined by coordinate systems

We have seen that an r-manifold $M \subset E^n$ is covered by coordinate patches. Conversely, let $M$ be a subset of $E^n$ with the following property. There is a collection $\mathcal{S}$ of relatively open subsets of $M$ that cover $M$, and with each $S \in \mathcal{S}$ there is associated a homeomorphism $F$ from $S$ onto an open set $\Delta \subset E^r$ such that the inverse $g = F^{-1}$ is regular from $\Delta$ onto $S$. Since each $S \in \mathcal{S}$ is an r-manifold, $M$ is an r-manifold. This shows that instead of the approach in Section 4.7 we could have defined manifolds in terms of coordinate systems.
We have taken a rather concrete approach to the idea of manifold, considering only manifolds given as subsets of some euclidean space. However, the manifolds encountered in practice often are not given in this way. The approach via coordinate systems allows one to take a more abstract point of view. From this viewpoint the definition of manifold is as follows. An r-manifold of class $C^{(1)}$ is a Hausdorff topological space $Z$ provided with a collection of open subsets $S$ (called coordinate patches) covering $Z$ and for each coordinate patch a homeomorphism $F$ from $S$ onto an open set $\Delta \subset E^r$. The regularity of the flat transformation $\phi$ in Figure 8.4 is now imposed as an axiom.
If $Z$ is an r-manifold according to this more abstract definition and $Z$ is separable, then $Z$ can be realized as a submanifold of some euclidean space, in fact as a submanifold of $E^{2r+1}$ [24, Chapter IV]. A result of this type is called an embedding theorem. By separable we mean that $Z$ has a covering either by finitely many coordinate patches or by a sequence of coordinate patches.

PROBLEMS

1. Let $M = \{(x, y, z) : x^2 + 2y + z^2 = 3,\ z > 0,\ y > |x|\}$. Let $F = (X|M, Y|M)$ and $\tilde{F} = (X|M, Z|M)$. Describe $\Delta$, $\tilde{\Delta}$, $g$, $\tilde{g}$, and $\phi$ (see Figure 8.4).
2. Let $M = \{(y^2 + z^2, y, z) : y > 0\}$ and let $F(x, y, z) = (y + z, \exp z)$ for $(x, y, z) \in M$. Show that $F$ is a coordinate system for $M$ and find $F(M)$. [Hint: First take $y$ and $z$ as coordinates on $M$ and then find a suitable coordinate change $\phi$ giving the system $F$.]
3. In Example 1, take $(\rho, \theta)$ as coordinates on $S$, where $\rho = (x^2 + y^2 + z^2)^{1/2}$. Find a regular flat transformation $\phi$ taking $(\rho, \theta)$ into $(r, \theta)$. Show that $\hat{F} = (\tilde{g} \circ \phi)^{-1}$ is such that $\hat{F}(x, y, z) = (\rho, \theta)$.


4. Let $g$ be regular from an open interval $\Delta \subset E^1$ into a 1-manifold $M$. Let $S = g(\Delta)$. Describe a coordinate system $F$ for $S$ such that $|F(x_1) - F(x_2)|$ is the length of the subarc of $S$ with endpoints $x_1$ and $x_2$. [Hint: A curve can be parameterized by arc length (Section 6.2).]
5. Let $M \subset E^n$ be an r-manifold and $S$ a relatively open subset of $M$. Let $F$ be of class $C^{(1)}$ from an open set containing $S$ into $E^r$. Suppose that $F|S$ is univalent and that $DF(x) | T_M(x)$ is a linear transformation of rank $r$ for each $x \in S$. Show that $\Delta = F(S)$ is open and $g = (F|S)^{-1}$ is regular. [Hint: Consider $G$ as in the proof of Proposition 8.3. Show that $F \circ G$ is a regular flat transformation if $R$ is a sufficiently small open set containing $x_0^\lambda$.]
6. (Stereographic projection.) Let $M$ be the sphere $x^2 + y^2 + (z - 1)^2 = 1$. For each $x = (x, y, z) \in M$ except the "north pole" $2e_3$, let $(s, t, 0)$ be the point where the line through $2e_3$ and $x$ meets the plane $z = 0$. Let $F(x) = (s, t)$.
(a) Show that $F$ is a coordinate system for $M - \{2e_3\}$.
(b) Let $h_1, h_2$ be tangent vectors to $M$ at $x$, and let $k_l = DF(x)(h_l)$, $l = 1, 2$. Show that the angle between $k_1$ and $k_2$ equals the angle between $h_1$ and $h_2$.
7. Let $f^1, \dots, f^r, \Phi^1, \dots, \Phi^{n-r}$ be functions of class $C^{(1)}$ on an open set $D$. Suppose that $F = (f^1|S, \dots, f^r|S)$ is a coordinate system for $S$, that $S = \{x \in D : \Phi(x) = 0\}$, and that $D\Phi(x)$ has rank $n - r$ for every $x \in D$. Show that each $x_0 \in S$ has a neighborhood $U$ such that $(f^1, \dots, f^r, \Phi^1, \dots, \Phi^{n-r})$ restricted to $U$ is a coordinate system for $U$.
8. Let $1 \le r < n$, and let $\mathcal{M}(n, r) = \{\alpha \in E^n_r : |\alpha| = 1,\ \alpha \text{ is decomposable}\}$. Identify $E^n_r$ with $E^m$, and show that $\mathcal{M}(n, r)$ is a manifold of dimension $r(n - r)$. [Hint: Given $\alpha_0 \in \mathcal{M}(n, r)$, let $(v_1, \dots, v_n)$ be an orthonormal frame for $E^n$ such that $\alpha_0 = v_1 \wedge \cdots \wedge v_r$. Show that if $\alpha$ is in a small enough neighborhood of $\alpha_0$, then $\alpha$ can be uniquely written in the form
\[ \alpha = c \left( v_1 + \sum_{k=r+1}^{n} t_{1k} v_k \right) \wedge \cdots \wedge \left( v_r + \sum_{k=r+1}^{n} t_{rk} v_k \right). \]
The $r(n - r)$ numbers $t_{jk}$ can be taken as coordinates of $\alpha$.]


8.3 Measure and integration on manifolds


Let us now define r-dimensional measure for subsets of an r-manifold $M$ and integrals with respect to it. The r-dimensional measure of a set $A \subset M$ is denoted by $V_r(A)$, and the integral of a function $f$ over $A$ by $\int_A f\, dV_r$ or by $\int_A f(x)\, dV_r(x)$. When $r = n$, these turn out to have the same meaning as in Chapter 5.
For simplicity, the integral will be defined only for continuous functions. By so doing, we avoid some slightly tedious discussions of measurability of functions, which for present purposes are irrelevant.
Let us first consider the case when $A$ is contained in some coordinate patch $S$. Then $S = g(\Delta)$, where $g$ is a regular transformation from $\Delta$ into $M$, and $A = g(B)$ for some $B \subset \Delta$. We call $A$ an r-measurable subset of $S$ if $B$ is measurable. The requirement that $A$ be r-measurable is a mild one. For example, any relatively open set $A \subset S$ is r-measurable. Since $g$ is a homeomorphism (by Proposition 8.4), $A$ is relatively open if and only if $B$ is open. The r-measure $V_r(A)$ of an r-measurable set is defined in Section 8.1.
The definition of $\int_A f\, dV_r$ can be motivated by a discussion such as that following the definition (8.2) of $V_r(A)$. Using the notation there, $f[g(t_0)] V_r(K)$ should furnish a good approximation to $\int_{g(I)} f\, dV_r$. If $Z$ is a figure composed of small nonoverlapping r-cubes $I_1, \dots, I_p$, then $\int_{g(Z)} f\, dV_r$ should be approximately the corresponding sum
\[ \sum_{k=1}^{p} f[g(t_k)]\, \mathcal{J}g(t_k)\, V_r(I_k). \]
In the exact formula $Z$ is replaced by the set $B = g^{-1}(A)$, and the sum by an integral.

Definition. Let $A$ be an r-measurable subset of a coordinate patch $S$, and let $f$ be continuous on $A$. Then
\[ \int_A f(x)\, dV_r(x) = \int_B f[g(t)]\, \mathcal{J}g(t)\, dV_r(t), \qquad A = g(B), \tag{8.4} \]
provided the function $(f \circ g)\mathcal{J}g$ is integrable over $B$.

The integral over $B$ is taken in the sense of Section 5.6. By (8.1), $\mathcal{J}g(t) = |g_1(t) \wedge \cdots \wedge g_r(t)|$. Since $g$ is of class $C^{(1)}$, $\mathcal{J}g$ is continuous. Hence $(f \circ g)\mathcal{J}g$ is continuous. If $f \ge 0$, then the integral over $B$ either exists or diverges to $+\infty$. When the latter occurs we agree that the integral of $f$ over $A$ also diverges to $+\infty$.
We must show that the integral does not depend on the particular choice of coordinate system. Let $\tilde{S}$ be another coordinate patch such that $A \subset \tilde{S}$, and let $\tilde{F}$ be a coordinate system for $\tilde{S}$. Let us adopt the notation of Figure 8.4. Then $g(t) = \tilde{g}[\phi(t)]$ for all $t \in \Delta_0$ and $A = g(B) = \tilde{g}(\tilde{B})$, where $B \subset \Delta_0$ and $\tilde{B} = \phi(B) \subset \tilde{\Delta}_0$. By the transformation formula for integrals (Theorem 5.8) and Formula (8.3),
\[ \int_{\tilde{B}} f[\tilde{g}(\tilde{t})]\, \mathcal{J}\tilde{g}(\tilde{t})\, dV_r(\tilde{t}) = \int_B f[g(t)]\, \mathcal{J}\tilde{g}[\phi(t)]\, |J\phi(t)|\, dV_r(t) = \int_B f[g(t)]\, \mathcal{J}g(t)\, dV_r(t), \]
as required.
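
The independence of the coordinate system can be illustrated numerically. The sketch below is an assumed example, not taken from the text: the surface is the graph of $x^2 + y$ over $(0, 2) \times (0, 1)$, the second parametrization is obtained from the hypothetical coordinate change $(u, v) \mapsto (u^3 + u, v)$, which is a regular flat transformation of $(0, 1) \times (0, 1)$ onto $(0, 2) \times (0, 1)$, and the integrand is $f(x, y, z) = x + z$. Both sides of the displayed identity are approximated by midpoint sums with $\mathcal{J}g$ computed from central differences.

```python
import math

def jac_norm(g, u, v, h=1e-6):
    # |g_u ^ g_v| via |a ^ b|^2 = (a.a)(b.b) - (a.b)^2, with central-difference partials.
    g_u = [(p - q) / (2 * h) for p, q in zip(g(u + h, v), g(u - h, v))]
    g_v = [(p - q) / (2 * h) for p, q in zip(g(u, v + h), g(u, v - h))]
    aa = sum(p * p for p in g_u)
    bb = sum(p * p for p in g_v)
    ab = sum(p * q for p, q in zip(g_u, g_v))
    return math.sqrt(max(aa * bb - ab * ab, 0.0))

def integrate(f, g, u_range, v_range, n=200):
    # Midpoint approximation of the right side of (8.4) over a rectangle B.
    (u0, u1), (v0, v1) = u_range, v_range
    du, dv = (u1 - u0) / n, (v1 - v0) / n
    total = 0.0
    for i in range(n):
        u = u0 + (i + 0.5) * du
        for j in range(n):
            v = v0 + (j + 0.5) * dv
            total += f(g(u, v)) * jac_norm(g, u, v)
    return total * du * dv

def g(x, y):
    # Assumed surface: graph of x^2 + y, with x and y as coordinates.
    return (x, y, x * x + y)

def g_alt(u, v):
    # Same surface in other coordinates, via the hypothetical change x = u^3 + u, y = v.
    return g(u ** 3 + u, v)

def f(p):
    x, y, z = p
    return x + z

I_xy = integrate(f, g, (0.0, 2.0), (0.0, 1.0))
I_uv = integrate(f, g_alt, (0.0, 1.0), (0.0, 1.0))
print(I_xy, I_uv)
```

The two printed values should agree to within the discretization error of the midpoint rule, even though the integrands and the parameter rectangles differ.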

EXAMPLE 1. Let $M \subset E^3$ be a 2-manifold, and $S$ a relatively open subset of $M$ on which $x$ and $y$ can be taken as coordinates, as in Example 2, Section 8.1. Since $\mathcal{J}g = [1 + |\operatorname{grad} \phi|^2]^{1/2}$, we have
\[ \int_A f\, dV_2 = \int_B f[x, y, \phi(x, y)]\, (1 + |\operatorname{grad} \phi(x, y)|^2)^{1/2}\, dV_2(x, y). \]

EXAMPLE 2. Let $M$ be a 1-manifold and $B$ a closed interval $[a, b]$. Using the terminology of Section 6.2, $A$ is the trace of the simple arc $\gamma$ represented on $[a, b]$ by $g$. Formula (8.4) becomes
\[ \int_A f(x)\, dV_1(x) = \int_a^b f[g(t)]\, |g'(t)|\, dt. \]
The right-hand side is $\int_\gamma f\, ds$, as defined in Section 6.4, Problem 6.

EXAMPLE 3. Let H be the hemisphere $x^2 + y^2 + z^2 = 1$, $z > 0$. Introducing
spherical coordinates on H, let

$$g(\phi, \theta) = (\sin\phi\cos\theta)e_1 + (\sin\phi\sin\theta)e_2 + (\cos\phi)e_3.$$

The image of a small square $[\phi, \phi + a] \times [\theta, \theta + a]$ in the
$(\phi, \theta)$ plane is a small sector of the hemisphere that is approximately a
rectangle of side lengths a and $a\sin\phi$. Since
$|\partial g/\partial\phi \wedge \partial g/\partial\theta|\,a^2$ is approximately the
area of the sector, this suggests that
$\mathcal{J}g(\phi, \theta) = |\partial g/\partial\phi \wedge \partial g/\partial\theta| = \sin\phi$.
The reader should check this formula (Problem 2). If we take
$B = (0, \pi/2) \times (0, 2\pi)$ then $H - g(B)$ is an arc of a great circle
corresponding to $\theta = 0$. This arc is 2-dimensionally null in the sense defined
below, and hence

$$\int_H f(x)\,dV_2(x) = \int_{g(B)} f(x)\,dV_2(x) = \int_0^{\pi/2} d\phi \int_0^{2\pi} f[g(\phi, \theta)]\sin\phi\,d\theta.$$
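The iterated integral of Example 3 is easy to approximate numerically. The following is a minimal sketch, not part of the text: the function names, the midpoint rule, and the grid size are illustrative assumptions. It checks the Jacobian $\sin\phi$ by finite differences and evaluates the integral for f = 1 (the area of H) and f = z.

```python
import math

def g(phi, theta):
    """Spherical-coordinate parametrization of the unit hemisphere H."""
    return (math.sin(phi) * math.cos(theta),
            math.sin(phi) * math.sin(theta),
            math.cos(phi))

def jacobian(phi, theta, h=1e-6):
    """|dg/dphi x dg/dtheta| by central differences (hypothetical helper)."""
    gp = [(a - b) / (2 * h) for a, b in zip(g(phi + h, theta), g(phi - h, theta))]
    gt = [(a - b) / (2 * h) for a, b in zip(g(phi, theta + h), g(phi, theta - h))]
    cross = (gp[1] * gt[2] - gp[2] * gt[1],
             gp[2] * gt[0] - gp[0] * gt[2],
             gp[0] * gt[1] - gp[1] * gt[0])
    return math.sqrt(sum(c * c for c in cross))

def integrate_on_hemisphere(f, n=200):
    """Midpoint rule for the iterated integral over (0, pi/2) x (0, 2 pi)."""
    dphi = (math.pi / 2) / n
    dtheta = (2 * math.pi) / n
    total = 0.0
    for i in range(n):
        phi = (i + 0.5) * dphi
        for j in range(n):
            theta = (j + 0.5) * dtheta
            # The factor sin(phi) is the Jacobian of Example 3.
            total += f(*g(phi, theta)) * math.sin(phi)
    return total * dphi * dtheta

area = integrate_on_hemisphere(lambda x, y, z: 1.0)    # exact value: 2 pi
z_moment = integrate_on_hemisphere(lambda x, y, z: z)  # exact value: pi
```

The ratio `z_moment / area` approximates the third component of the centroid of H, which is the quantity asked for in Problem 3.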

Let us turn to the matter of giving a general definition of the r-dimensional
measure $V_r(A)$ and the integral $\int_A f\,dV_r$ when A is not necessarily contained
in some coordinate patch. To simplify matters let us at first assume that M
is a compact manifold and f is continuous everywhere on M. The traditional
way to proceed is to dissect M into a finite number of nonoverlapping pieces
$S_1, \ldots, S_m$ each of which has a coordinate system, with
$\operatorname{fr} S_k \cap \operatorname{fr} S_l$ contained in a finite union of
(r - 1)-manifolds and $M = \operatorname{cl} S_1 \cup \cdots \cup \operatorname{cl} S_m$. Then

$$\int_A f\,dV_r = \sum_{k=1}^m \int_{A \cap S_k} f\,dV_r. \tag{8.5}$$

In simple examples it is easy to find such dissections of M. However, the


theorem that every compact r-manifold M has such a dissection is a difficult
one to prove [24, Chapter IV]. Nor is it evident that the integral is independent
of the particular dissection chosen.
The same result can be achieved by a simpler device called partition of
unity. The basic difficulty with dissections is that $S_k$ and $S_l$ cannot overlap.
With partitions of unity this problem is avoided.
Remark. In examples and homework problems considered in this book, A
is usually contained in a single coordinate patch, except perhaps for a subset
of r-dimensional measure 0 that contributes nothing to $V_r(A)$ or to the integral.
If this is not the case, (8.5) can be used in these examples and homework
problems after a rather obvious dissection of M. Thus, the reader should
regard partitions of unity as an important theoretical tool rather than as an
aid to calculating integrals.

Partition of unity
Let us recall from Section 5.3 that the support of a function $\psi$ is the smallest
closed set outside of which $\psi(x) = 0$. Let us first find for every $x_0$ and
$\delta > 0$ a function $\psi$ of class $C^{(\infty)}$ on $E^n$ such that $\psi(x) > 0$
on the $\delta$-neighborhood of $x_0$ and the support of $\psi$ is the closure of that
neighborhood. In fact, let

$$h(x) = \exp\left(\frac{-1}{1 - x^2}\right), \quad -1 < x < 1,$$

$$h(x) = 0, \quad |x| \geq 1.$$

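From the example at the end of Section 3.4 and the composite function
theorem, h is of class $C^{(\infty)}$ on $E^1$. Let, for instance,

$$\psi(x) = h\left(\frac{|x - x_0|}{\delta}\right).$$

For $|x - x_0| < \delta$ this equals $\exp[-\delta^2/(\delta^2 - |x - x_0|^2)]$, so
$\psi$ is of class $C^{(\infty)}$, positive on the $\delta$-neighborhood of $x_0$, and 0
outside it.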

Definition. Let M be a compact manifold. A collection of functions
$\{\pi_1, \ldots, \pi_m\}$ is a partition of unity for M if:
(1) $\pi_k$ is of class $C^{(\infty)}$ on M and $\pi_k \geq 0$, k = 1, ..., m;
(2) The support of $\pi_k$ is a compact subset of some coordinate patch,
k = 1, ..., m; and
(3) $\sum_{k=1}^m \pi_k(x) = 1$ for every $x \in M$.

Proposition 8.6. Any compact manifold M has a partition of unity.


PROOF. Every $x \in M$ is contained in some coordinate patch S. Since S
is relatively open there is a neighborhood U of x such that
$M \cap \operatorname{cl} U \subset S$. Since M is compact, a finite collection
$\{U_1, \ldots, U_m\}$ of such neighborhoods covers M. Let $x_k$ be the center of $U_k$,
$\delta_k$ the radius, and $\psi_k$ the function of class $C^{(\infty)}$ constructed
above with $x_0 = x_k$, $\delta = \delta_k$. The collection of functions
$\{\psi_1, \ldots, \psi_m\}$ satisfies (1) and (2) of the definition, but not necessarily (3).
However, by construction $\psi_1(x) + \cdots + \psi_m(x) > 0$ for every $x \in M$. Let

$$\pi_k(x) = \frac{\psi_k(x)}{\psi_1(x) + \cdots + \psi_m(x)}, \quad k = 1, \ldots, m, \quad x \in M.$$

The collection $\{\pi_1, \ldots, \pi_m\}$ is a partition of unity for M. $\square$
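The construction in the proof can be carried out explicitly. The following is a minimal sketch under stated assumptions: M is taken to be the unit circle in $E^2$, the bump function is taken to be $\psi(x) = h(|x - x_0|/\delta)$ (one choice with the required properties), and the six centers and the radius $\delta = 1$ are hypothetical choices made only for illustration.

```python
import math

def h(x):
    """The function h of the text: exp(-1/(1 - x^2)) on (-1, 1), and 0 elsewhere."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0

def psi(x, center, delta):
    """Bump function that is positive exactly on the delta-neighborhood of center.
    Assumption: psi(x) = h(|x - center| / delta)."""
    return h(math.dist(x, center) / delta)

def partition_of_unity(centers, delta):
    """Return the functions pi_k = psi_k / (psi_1 + ... + psi_m)."""
    def make(k):
        def pi_k(x):
            total = sum(psi(x, c, delta) for c in centers)
            return psi(x, centers[k], delta) / total
        return pi_k
    return [make(k) for k in range(len(centers))]

# Hypothetical cover of the unit circle by m = 6 neighborhoods of radius 1.
m, delta = 6, 1.0
centers = [(math.cos(2 * math.pi * k / m), math.sin(2 * math.pi * k / m))
           for k in range(m)]
pis = partition_of_unity(centers, delta)

point = (math.cos(0.4), math.sin(0.4))   # a point of M
values = [p(point) for p in pis]         # nonnegative numbers that sum to 1
```

Every point of the circle lies within distance less than 1 of some center, so the denominator is positive on M, as the proof requires.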
Definition. A set $A \subset M$ is r-measurable if $A \cap S$ is r-measurable for any
coordinate patch S.

An n-manifold $M \subset E^n$ is just an open subset of $E^n$. For r = n, an
n-measurable set A is just a measurable subset of M in the sense of Section 5.2.
It can be shown that, for r < n, the collection of r-measurable subsets of an
r-manifold M has the same properties listed in Theorem 5.1 for measurable
sets (Problem 9).
Let f be continuous on M. Since M is compact, f is bounded on M. This
ensures that all of the following integrals exist. If the support of f is a compact
subset K of a coordinate patch, then we let

$$\int_A f\,dV_r = \int_{A \cap K} f\,dV_r.$$

In particular, if $\{\pi_1, \ldots, \pi_m\}$ is a partition of unity, then for any f the
support of $f\pi_k$ is compact and lies in some coordinate patch. Hence the integral of
$f\pi_k$ is defined.

Definition. Let A be an r-measurable subset of a compact manifold M, and
$\{\pi_1, \ldots, \pi_m\}$ be a partition of unity for M. Then for any f continuous
on M

$$\int_A f\,dV_r = \sum_{k=1}^m \int_A f\pi_k\,dV_r. \tag{8.6}$$

In case A is contained in some coordinate patch S, this agrees with the
earlier definition (8.4), since $\sum \pi_k(x) = 1$ for every $x \in A$. We must show
that the integral does not depend on the particular partition of unity chosen
for M. Let $\{\chi_1, \ldots, \chi_p\}$ be another partition of unity for M. Then for every
$x \in M$

$$f(x)\chi_l(x) = \sum_{k=1}^m f(x)\chi_l(x)\pi_k(x), \quad l = 1, \ldots, p.$$


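Since the support of $f\chi_l$ is contained in some coordinate patch, its integral
over A can be written according to (8.4) as an integral over a set $B \subset E^r$.
By Theorem 5.4, the integral over B of a finite sum is the sum of the integrals.
Hence

$$\int_A f\chi_l\,dV_r = \sum_{k=1}^m \int_A f\chi_l\pi_k\,dV_r, \quad l = 1, \ldots, p,$$

$$\sum_{l=1}^p \int_A f\chi_l\,dV_r = \sum_{l=1}^p \sum_{k=1}^m \int_A f\chi_l\pi_k\,dV_r.$$

In the same way

$$\sum_{k=1}^m \int_A f\pi_k\,dV_r = \sum_{k=1}^m \sum_{l=1}^p \int_A f\chi_l\pi_k\,dV_r.$$

Since the right-hand sides are equal, the integral of f over A does not depend
on which partition of unity is chosen.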
If f(x) = 1 for every $x \in M$, then the integral gives the r-dimensional
measure

$$V_r(A) = \sum_{k=1}^m \int_A \pi_k\,dV_r.$$

When A is a subset of some coordinate patch, this agrees with the previous
definition. If $V_r(A) = 0$, then A is called an r-null set. The integral has the
same elementary properties listed in Theorem 5.4 for r = n (Problem 11).
Moreover, $V_r$ is countably additive (Problem 10).
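The independence of the integral from the partition of unity can be observed numerically. The following is a minimal sketch under stated assumptions: M = A is the unit circle in $E^2$, the two partitions are built from equally spaced bump functions of the form $h(|x - x_k|/\delta)$, and each term of (8.6) is computed in the coordinate patch $g(t) = (\cos t, \sin t)$, for which $|g'(t)| = 1$. All names and parameter values are hypothetical.

```python
import math

def h(x):
    """exp(-1/(1 - x^2)) on (-1, 1), and 0 elsewhere."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0

def make_partition(m, delta):
    """Partition of unity on the unit circle from m equally spaced bumps."""
    angles = [2 * math.pi * k / m for k in range(m)]
    centers = [(math.cos(a), math.sin(a)) for a in angles]
    def pi_k(k, x):
        bumps = [h(math.dist(x, c) / delta) for c in centers]
        return bumps[k] / sum(bumps)
    return angles, pi_k

def integral_over_circle(f, m, delta, n=2000):
    """Formula (8.6): sum over k of the integral of f * pi_k over one patch."""
    angles, pi_k = make_partition(m, delta)
    dt = 2 * math.pi / n
    total = 0.0
    for k, a in enumerate(angles):
        # Patch k: g(t) = (cos t, sin t) for a - pi < t < a + pi, |g'(t)| = 1.
        for i in range(n):
            t = a - math.pi + (i + 0.5) * dt
            x = (math.cos(t), math.sin(t))
            total += f(x) * pi_k(k, x) * dt
    return total

f = lambda x: x[0] ** 2                            # integral over the circle is pi
I6 = integral_over_circle(f, m=6, delta=1.0)       # first partition of unity
I4 = integral_over_circle(f, m=4, delta=1.2)       # a different partition of unity
```

Both values approximate $\int_0^{2\pi} \cos^2 t\,dt = \pi$, in agreement with the argument above.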

Measure and integration on noncompact manifolds


If M is not compact, the discussion is somewhat more complicated. Partitions
of unity consisting of infinite collections $\{\pi_1, \pi_2, \ldots\}$ must be considered.
To Conditions (1) through (3) must be added:

(4) If K is any compact subset of M, then the support of $\pi_k$ meets K for
only finitely many k.
The sum in (3) is now an infinite series. However, on any compact set only
finitely many terms are different from 0. Every manifold has a partition
of unity. This can be proved by an elaboration of the proof of Proposition 8.6,
which we do not give.
Let f be continuous on A. Then f is called integrable over A if
$\sum_{k=1}^\infty \int_A |f|\pi_k\,dV_r$ is finite. If f is integrable over A, then its
integral is $\sum_{k=1}^\infty \int_A f\pi_k\,dV_r$.

Proposition 8.7. If A is a subset of M that is an (r - 1)-manifold, then A is an
r-null set.

This proposition can be proved as follows. By using partitions of unity,
it suffices to show that $V_r(A \cap S) = 0$ if S is any coordinate patch. Let
$A \cap S = g(B)$, where g is regular from an open set $\Delta \subset E^r$ onto S. It
follows from Theorem 8.3 (Section 8.9) and Corollary 3 (Section 5.8) that $V_r(B) = 0$.
Hence by Equation (8.2), $V_r(A \cap S) = 0$.

Corollary. If A is contained in a countable union of (r - 1)-manifolds, then A
is an r-null set.

*Note. The line integral $\int_\gamma f\,ds$ was defined in Section 6.4 without requiring
that $\gamma$ be simple. If $\gamma$ is not simple, then its trace A = g([a, b]) need not be
contained in a 1-manifold. There is a more general notion of r-dimensional
measure and integral for sets that are not necessarily subsets of an r-manifold.
The general formula, which becomes for simple arcs the one in Example 2, is

$$\int_A f(x)N(x)\,dV_1(x) = \int_a^b f[g(t)]\,|g'(t)|\,dt,$$

where N(x) is the multiplicity of the point x. For any $r \geq 1$ there is a similar
formula

$$\int_A f(x)N(x)\,dV_r(x) = \int_B f[g(t)]\,\mathcal{J}g(t)\,dV_r(t), \tag{$*$}$$

where B is a measurable subset of $E^r$, g is of class $C^{(1)}$ on B, A = g(B), and
again N(x) (= number of points $t \in B$ such that g(t) = x) is the multiplicity of
x [7, p. 244].
In Formula (*) it is not necessary to assume that Dg(t) has maximum
rank r. If $B' = \{t \in B : \operatorname{rank} Dg(t) < r\}$, then $\mathcal{J}g(t) = 0$
for every $t \in B'$. Therefore B' contributes nothing to the integral on the right-hand
side of (*). It turns out that g(B') has r-dimensional measure 0, and therefore contributes
nothing to the integral on the left-hand side of (*).
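The role of the multiplicity N(x) in the case r = 1 can be seen in a small computation. The following is a minimal sketch under stated assumptions: the arc is $g(t) = (\cos t, \sin t)$ on $[0, 3\pi]$, so the upper half of the unit circle is traced twice and the lower half once, and f(x, y) = 1 + y. The arc, the function, and the helper names are hypothetical choices made only for illustration.

```python
import math

T_END = 3 * math.pi   # parameter interval [0, 3 pi]: the upper semicircle is traced twice

def multiplicity(s):
    """N(x) for x = (cos s, sin s), 0 <= s < 2 pi: number of t in [0, T_END] with g(t) = x."""
    return sum(1 for j in range(2) if s + 2 * math.pi * j <= T_END)

def midpoint(func, a, b, n=20000):
    """Midpoint rule for the integral of func over [a, b]."""
    dt = (b - a) / n
    return sum(func(a + (i + 0.5) * dt) for i in range(n)) * dt

f = lambda x, y: 1.0 + y

# Right-hand side: integral of f[g(t)] |g'(t)| over the parameter interval (|g'(t)| = 1).
rhs = midpoint(lambda t: f(math.cos(t), math.sin(t)), 0.0, T_END)

# Left-hand side: integral of f * N over the trace A (the unit circle), by arc length s.
lhs = midpoint(lambda s: f(math.cos(s), math.sin(s)) * multiplicity(s), 0.0, 2 * math.pi)
```

Both sides equal $3\pi + 2$ in this example. The finitely many points where N takes a different value form a set of 1-dimensional measure 0 and do not affect the result.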

PROBLEMS

1. Let $A = \{(x, y, z) : x^2 + z^2 = a^2,\ 0 \leq y \leq b\}$. Find
$\int_A yz^2\,dV_2(x, y, z)$. [Hint: Cylindrical coordinates.]

2. In Example 3, verify that $\mathcal{J}g(\phi, \theta) = \sin\phi$.


Moments and centroids. These are defined in the same way as for r = n. For
example, if A has positive r-dimensional measure, then the components of its
centroid are

$$\bar{x}^i = \frac{1}{V_r(A)} \int_A x^i\,dV_r(x), \quad i = 1, \ldots, n.$$

If r = 2, n = 3, and A is thought of as a surface with continuous density $\rho(x)$
(mass per unit of area), then the mass is $\int_A \rho(x)\,dV_2(x)$ and the components of
the center of mass are

$$\bar{x}^i = \frac{\int_A x^i\rho(x)\,dV_2(x)}{\int_A \rho(x)\,dV_2(x)}, \quad i = 1, 2, 3.$$

3. Show that $\tfrac{1}{2}e_3$ is the centroid of the hemisphere H in Example 3. Use
spherical coordinates on H.
4. Find the second moment about the z-axis of:
(a) The sphere $x^2 + y^2 + z^2 = 1$.
(b) The triangle with vertices $e_1, e_2, e_3$.
5. Let $x_0, x_1, \ldots, x_r$ be the vertices of an r-simplex. Show that the centroid of
the simplex is $(r + 1)^{-1}(x_0 + x_1 + \cdots + x_r)$.
6. (Surfaces of revolution.) Let $\gamma$ be a simple arc (or simple closed curve) lying in the
half y > 0 of the (x, y) plane. From Section 6.2, $\gamma$ has a standard representation G
on [0, l], where l is the length and $|G'(s)| = 1$ for $0 \leq s \leq l$. Let
$g(s, t) = G^1(s)e_1 + G^2(s)[(\cos t)e_2 + (\sin t)e_3]$, and let
$M = g((0, l) \times [0, 2\pi])$.
(a) Prove Pappus's theorem: $V_2(M) = 2\pi\bar{y}l$, where $(\bar{x}, \bar{y})$ is the
centroid of $\gamma$.
(b) Find the area of a torus (doughnut) of major radius $r_1$ and minor radius $r_2$.
7. Let $S = \{x : |x| = 1\}$ be the unit (n - 1)-sphere in $E^n$. Show that the (n - 1)-
measure of the "zone" $\{x \in S : a < x^n < b\}$ depends only on the difference b - a
when n = 3, but this is false when $n \neq 3$. Assume that $-1 \leq a < b \leq 1$.
8. Let $v(r) = \alpha_n r^n$ be the n-dimensional measure of a spherical n-ball of radius r.
Show that v'(r) is the (n - 1)-measure of its boundary. [Hint: Spherical coordinates
(Section 5.9).] [Note: $\alpha_n$ was calculated in (5.46). If
$\beta_n = V_{n-1}$[unit (n - 1)-sphere], then $\beta_n = n\alpha_n$.]
9. Let M be an r-manifold. Show that:
(a) Any set A open relative to M is r-measurable.
(b) If A is an r-measurable subset of M, then M - A is r-measurable.
(c) If $A_1, A_2, \ldots$ are r-measurable subsets of M, then $A_1 \cup A_2 \cup \cdots$ is
r-measurable.
10. Let M be a compact r-manifold. Let $A = A_1 \cup A_2 \cup \cdots$, where $A_1, A_2, \ldots$
are disjoint r-measurable subsets of M. Show that
$V_r(A) = V_r(A_1) + V_r(A_2) + \cdots$. [Hint: Use a partition of unity.]
11. Prove that the statements (1)-(7) obtained by replacing n by r everywhere in Theorem
5.4 are true for integrals over r-measurable subsets of a compact r-manifold.

8.4 The divergence theorem


This is an n-dimensional generalization of the fundamental theorem of
calculus and has numerous applications in geometry and in physics. We
begin by stating the theorem and deriving some consequences of it. A proof
is given later in the present section. In Section 8.7 we give another formulation
of the divergence theorem, which is then used to obtain Stokes's formula
in Section 8.8.

Let $\zeta = \zeta_1\,dx^1 + \cdots + \zeta_n\,dx^n$ be a class $C^{(1)}$ differential form
of degree 1. We recall from Formula (7.58) that its divergence is the real valued function

$$\operatorname{div}\zeta = \sum_{i=1}^n \frac{\partial\zeta_i}{\partial x^i}.$$

The divergence theorem equates the integral of div $\zeta$ over a set D with an
integral over the boundary fr D, provided D belongs to a suitably restricted
class of sets called regular domains.

Definition. A set D is a regular domain if:
(1) D is open and bounded; and
(2) For any $x_0 \in \operatorname{fr} D$ there exist a neighborhood U of $x_0$ and a
function $\Phi$ of class $C^{(1)}$ on U, such that $\operatorname{grad}\Phi(x) \neq 0$ and

$$(\operatorname{fr} D) \cap U = \{x \in U : \Phi(x) = 0\},$$

$$D \cap U = \{x \in U : \Phi(x) < 0\}.$$

If D is a regular domain, then fr D is a compact (n - 1)-manifold. Condition (2)
ensures that D cannot be "on both sides" of its boundary fr D.

EXAMPLE 1. Let $D = \{x : |x| < 1 \text{ or } 1 < |x| < 2\}$. Then fr D is the union of
two concentric (n - 1)-spheres of radii 1 and 2. However, D is on both sides
of the inner (n - 1)-sphere.
Actually, this example is rather artificial. If D is the interior of its closure,
D is bounded, and fr D is an (n - 1)-manifold, then using the implicit function
theorem it can be shown that D is a regular domain.

Definition. Let $x \in \operatorname{fr} D$, and $n \neq 0$ be a vector normal to fr D at x.
Then n is an exterior normal at x if there exists $\delta > 0$ such that $x + tn \in D$
for $-\delta < t < 0$ and $x + tn \in (\operatorname{cl} D)^c$ for $0 < t < \delta$.

From the definition, all exterior normals at x are positive scalar multiples
of any particular one. We shall be principally concerned with the unit exterior
normal, which will be denoted by $\nu(x)$ ($|\nu(x)| = 1$).
Let us show that the exterior unit normal $\nu$ is a continuous function on
fr D, if D is a regular domain. Let $x_0$ be any point of fr D, and U, $\Phi$ as in the
definition above. By Corollary 1, Section 4.7, the vector
$n(x) = \operatorname{grad}\Phi(x)$ is normal to fr D at $x \in (\operatorname{fr} D) \cap U$.
Let $\psi(t) = \Phi(x + tn(x))$. Then $\psi(0) = 0$ and

$$\psi'(0) = \operatorname{grad}\Phi(x) \cdot n(x) = |\operatorname{grad}\Phi(x)|^2 > 0.$$

There exists $\delta > 0$ such that $\psi(t) < 0$ for $-\delta < t < 0$ and $\psi(t) > 0$ for
$0 < t < \delta$. Therefore grad $\Phi(x)$ is an exterior normal at x. The vector

$$\nu(x) = |\operatorname{grad}\Phi(x)|^{-1}\operatorname{grad}\Phi(x)$$

is the unit exterior normal to D at x. Since $\Phi$ is of class $C^{(1)}$, $\nu$ is a
continuous function on $(\operatorname{fr} D) \cap U$ (see Figure 8.5). Since any
$x_0 \in \operatorname{fr} D$ has such a neighborhood U, $\nu$ is continuous on fr D.

Figure 8.5
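The formula for the unit exterior normal is easy to evaluate in a concrete case. The following is a minimal sketch, assuming a hypothetical regular domain, the solid ellipsoid $D = \{\Phi < 0\}$ with $\Phi(x, y, z) = x^2/4 + y^2 + z^2 - 1$. The point $x_0$ and the step t are also illustrative choices.

```python
import math

def unit_exterior_normal(grad_phi, x):
    """nu(x) = grad Phi(x) / |grad Phi(x)| at a point x of fr D."""
    g = grad_phi(x)
    norm = math.sqrt(sum(c * c for c in g))
    return tuple(c / norm for c in g)

# Hypothetical regular domain: D = {Phi < 0}, Phi = x^2/4 + y^2 + z^2 - 1.
def phi(x):
    return x[0] ** 2 / 4 + x[1] ** 2 + x[2] ** 2 - 1

def grad_phi(x):
    return (x[0] / 2, 2 * x[1], 2 * x[2])

x0 = (math.sqrt(2), math.sqrt(0.5), 0.0)   # Phi(x0) = 2/4 + 1/2 - 1 = 0, so x0 is on fr D
nu = unit_exterior_normal(grad_phi, x0)

t = 1e-3
inside = phi(tuple(a - t * b for a, b in zip(x0, nu)))    # Phi at x0 - t nu
outside = phi(tuple(a + t * b for a, b in zip(x0, nu)))   # Phi at x0 + t nu
```

Here `inside` is negative and `outside` is positive, which is the sign pattern of $\psi(t)$ used in the argument above.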

Divergence theorem. Let D be a regular domain and $\zeta$ a 1-form of class $C^{(1)}$
on cl D. Then

$$\int_{\operatorname{fr} D} \zeta(x) \cdot \nu(x)\,dV_{n-1}(x) = \int_D \operatorname{div}\zeta(x)\,dV_n(x). \tag{8.7}$$

The number $\zeta(x) \cdot \nu(x)$ is called the (exterior) normal component of the
covector $\zeta(x)$. Let us defer the proof until later in the present section. The
somewhat restrictive assumption that fr D is a $C^{(1)}$ manifold is made to
simplify the proof. The theorem is still true if fr D is not a manifold but
instead consists of a finite number of pieces of class $C^{(1)}$ intersecting in sets of
dimension n - 2. For example, if D is the interior of an n-cube then the
pieces are the faces, which are cubes of dimension n - 1 and intersect in
(n - 2)-dimensional cubes. This more general form of the divergence theorem
will be precisely stated at the end of the section. For certain special kinds of
sets D there is an easy proof of the theorem (Problem 3).
Note. In applying (8.7) we sometimes ignore the distinction between
vectors and covectors. If F is vector-valued with the same components as
$\zeta$ ($F^i = \zeta_i$, i = 1, ..., n), then we set div F = div $\zeta$. On the left side of
(8.7), $\zeta \cdot \nu$ becomes $F \cdot \nu$, the euclidean inner product of F and $\nu$.
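Formula (8.7) can be checked numerically in a simple case. The following is a minimal sketch under stated assumptions: n = 2, D is the open unit disk (so $\nu(x) = x$ on fr D), and the vector field F is a hypothetical example chosen so that both sides equal $3\pi/4$. Function names and grid sizes are illustrative.

```python
import math

def F(x, y):
    """Hypothetical vector field, with the same components as a 1-form zeta."""
    return (x ** 3 + y, x * y ** 2)

def div_F(x, y):
    """Divergence of F, computed by hand: d/dx (x^3 + y) + d/dy (x y^2)."""
    return 3 * x ** 2 + 2 * x * y

def boundary_flux(n=2000):
    """Left side of (8.7): integral of F . nu over the unit circle, where nu(x) = x."""
    dth = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        th = (i + 0.5) * dth
        x, y = math.cos(th), math.sin(th)
        Fx, Fy = F(x, y)
        total += (Fx * x + Fy * y) * dth
    return total

def divergence_integral(n=400):
    """Right side of (8.7): integral of div F over the unit disk, in polar coordinates."""
    dr, dth = 1.0 / n, 2 * math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        for j in range(n):
            th = (j + 0.5) * dth
            total += div_F(r * math.cos(th), r * math.sin(th)) * r * dr * dth
    return total
```

Calling `boundary_flux()` and `divergence_integral()` gives two approximations of the same number, $3\pi/4$, up to the discretization error of the midpoint rule.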
For n = 3, the divergence theorem is often called Gauss's or Ostrogradsky's
theorem. It has various interesting physical interpretations. Let $\zeta$ be a force
field acting in some open set $D_0 \subset E^3$. For each $x \in D_0$, $\zeta(x)$ is the force
acting at x (Section 6.4). For notational simplicity, let us set M = fr D throughout
the discussion to follow. The number $\int_M \zeta(x) \cdot \nu(x)\,dV_2(x)$ is called the
outward flux across the boundary M. The divergence theorem expresses
the outward flux as a volume integral over D. If D has small diameter and
contains $x_0$, then the outward flux is approximately
$V_3(D)\operatorname{div}\zeta(x_0)$. To make this statement more precise let us prove the
following.


Lemma 1. If f is continuous on an open set $D_0$ containing $x_0$, then

$$f(x_0) = \lim_{\operatorname{diam} D \to 0} \frac{1}{V_n(D)} \int_D f(x)\,dV_n(x).$$

In words, this formula says that given $\varepsilon > 0$ there exists $\delta > 0$ such that
if D is any open set of diameter less than $\delta$ with $x_0 \in D$, then

$$\left| V_n(D)f(x_0) - \int_D f(x)\,dV_n(x) \right| < \varepsilon V_n(D).$$

PROOF. Given $\varepsilon > 0$, let $\delta > 0$ be such that $|f(x) - f(x_0)| < \varepsilon$ whenever
$|x - x_0| < \delta$. If $x_0 \in D$ and diam $D < \delta$, then

$$\left| V_n(D)f(x_0) - \int_D f(x)\,dV_n(x) \right| = \left| \int_D [f(x_0) - f(x)]\,dV_n(x) \right| \leq \int_D |f(x_0) - f(x)|\,dV_n(x) < \varepsilon V_n(D). \qquad \square$$

If in Lemma 1 we take D regular and f = div $\zeta$, then for any n and $x_0$
in the domain of $\zeta$

$$\operatorname{div}\zeta(x_0) = \lim_{\operatorname{diam} D \to 0} \frac{1}{V_n(D)} \int_M \zeta(x) \cdot \nu(x)\,dV_{n-1}(x). \tag{8.8}$$

If div $\zeta(x) = 0$ for every x in its domain $D_0$, then $\zeta$ is called divergence
free (or solenoidal). The divergence theorem has the following corollary.
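The limit in (8.8) can be watched numerically. The following is a minimal sketch under stated assumptions: n = 2, the domains D are disks of radius eps centered at the point (1, 0.5), and the vector field F is a hypothetical example. The names and radii are illustrative only.

```python
import math

def F(x, y):
    """Hypothetical vector field with components (x^2 y, sin y)."""
    return (x * x * y, math.sin(y))

def div_F(x, y):
    """Divergence of F, computed by hand."""
    return 2 * x * y + math.cos(y)

def flux_per_volume(x0, y0, eps, n=2000):
    """Outward flux of F across the circle |x - x0| = eps, divided by V_2(D) = pi eps^2."""
    dth = 2 * math.pi / n
    flux = 0.0
    for i in range(n):
        th = (i + 0.5) * dth
        nx, ny = math.cos(th), math.sin(th)        # unit exterior normal
        Fx, Fy = F(x0 + eps * nx, y0 + eps * ny)
        flux += (Fx * nx + Fy * ny) * eps * dth    # dV_1 = eps dtheta on the circle
    return flux / (math.pi * eps ** 2)

ratios = [flux_per_volume(1.0, 0.5, eps) for eps in (0.5, 0.1, 0.02)]
exact = div_F(1.0, 0.5)
```

As eps decreases, the entries of `ratios` approach `exact`, the value of the divergence at the center.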

Corollary. Let $\zeta$ be of class $C^{(1)}$ on an open set $D_0$. Then $\zeta$ is divergence
free if and only if

$$\int_M \zeta(x) \cdot \nu(x)\,dV_{n-1}(x) = 0, \quad M = \operatorname{fr} D, \tag{$*$}$$

for every regular domain D such that $\operatorname{cl} D \subset D_0$.

PROOF. If $\zeta$ is divergence free, then the equation (*) is immediate from (8.7).
Conversely if (*) holds for every such D, then by (8.8) div $\zeta(x_0) = 0$ for every
$x_0 \in D_0$. $\square$

EXAMPLE 2. Let n = 3 and $\zeta = yz\,dx$. Then $\zeta_1(x, y, z) = yz$, $\zeta_2 = \zeta_3 = 0$,
and $\zeta \cdot \nu = yz\nu^1$ where $\nu^1 = \nu^1(x, y, z)$ is the first component of the
exterior unit normal $\nu$. Moreover, div $\zeta = \partial\zeta_1/\partial x = 0$. Thus
$\int_{\operatorname{fr} D} yz\nu^1\,dV_2 = 0$, by the divergence theorem. For special choices of
D this could also be checked directly. For instance, if D is a spherical ball with center 0
and radius a, then $\nu(\mathbf{x}) = a^{-1}\mathbf{x}$, $\mathbf{x} = (x, y, z)$. The integral
over the sphere fr D can be easily evaluated using spherical coordinates.


*Flows in $E^n$

Let t denote time, and imagine that points of $E^n$ move in such a way that a
point at x when t = 0 moves to another position y at time t. For instance,
the movement might be due to the flow of a fluid in $E^3$ (Section 8.5). If we
set $y = T_t(x)$, then $T_t$ is a transformation from some portion of $E^n$ into
$E^n$ for each t. Let us require that $T_{t+s} = T_t \circ T_s$. Thus a point initially at x
moves first to $y = T_s(x)$ at time s, then to $z = T_t(y)$ at time t + s. We also
require that each $T_t$ be regular and possess continuous second order partial
derivatives (see Figure 8.6).

Figure 8.6

Definition. Let $D_0 \subset E^n$ be open, and for each $t \geq 0$, let $T_t$ be a
transformation from $D_0$ into $D_0$. Assume that: (1) $T_t(x)$ is of class $C^{(2)}$ as a
function of (x, t) on $D_0 \times [0, \infty)$; (2) $T_t$ is regular for each $t \geq 0$;
(3) $T_0(x) = x$ for all $x \in D_0$; and (4) $T_{t+s} = T_t \circ T_s$ for each
$s, t \geq 0$. Then the collection $\{T_t\}$ is a flow.

If D is an open subset of $D_0$, then the flow takes points initially in D into
points of $D_t = T_t(D)$ at time t. Let $v(t) = V_n(D_t)$. The derivative v'(t) is the rate
of change of the n-dimensional measure of $D_t$. Let $W_t = \partial T_t/\partial t$. Then

$$v'(t) = \int_D JT_t(x)\,\operatorname{div} W_0(x)\,dV_n(x), \tag{8.9}$$

provided $\operatorname{cl} D \subset D_0$. We leave the proof to the reader (Problem 4). In
particular, (8.9) implies that the n-dimensional measure v(t) is constant if
div $W_0(x) = 0$ for all $x \in D$.
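Formula (8.9) can be checked by hand for a flow in which every quantity is explicit. The following worked example is a sketch under stated assumptions: the linear flow on $D_0 = E^2$ and the constants a, b are hypothetical choices, not taken from the text. In this example div $W_0$ is constant.

```latex
% Hypothetical example: a linear flow on D_0 = E^2 with real constants a, b.
\begin{align*}
T_t(x^1, x^2) &= (e^{at}x^1,\; e^{bt}x^2), \qquad T_0 = \text{identity}, \qquad T_{t+s} = T_t \circ T_s, \\
W_t(x) &= \frac{\partial T_t}{\partial t}(x) = (a e^{at}x^1,\; b e^{bt}x^2), \qquad \operatorname{div} W_0 = a + b, \\
JT_t(x) &= \det\begin{pmatrix} e^{at} & 0 \\ 0 & e^{bt} \end{pmatrix} = e^{(a+b)t}, \\
v(t) &= V_2(T_t(D)) = \int_D JT_t\,dV_2 = e^{(a+b)t}\,V_2(D), \\
v'(t) &= (a+b)\,e^{(a+b)t}\,V_2(D) = \int_D JT_t(x)\,\operatorname{div} W_0(x)\,dV_2(x).
\end{align*}
```

In particular, the area of $D_t$ stays constant in this example exactly when a + b = 0.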

Green's formulas
Formulas (8.10) and (8.11) are consequences of the divergence theorem and
are useful in many applications. Let f be of class $C^{(2)}$ on cl D, and let M =
fr D as above. Let $f_\nu(x)$ denote the derivative of f in the direction of the exterior
normal at $x \in M$, namely,

$$f_\nu(x) = df(x) \cdot \nu(x).$$


Let $\phi$ be another function of class $C^{(2)}$ on cl D, and let
$\zeta(x) = \phi(x)\,df(x)$. Then

$$\zeta(x) \cdot \nu(x) = \phi(x)f_\nu(x),$$

$$\operatorname{div}\zeta = \sum_{i=1}^n \frac{\partial}{\partial x^i}\left(\phi\frac{\partial f}{\partial x^i}\right) = d\phi \cdot df + \phi\,\operatorname{Lapl} f.$$

Hence we get the first Green's formula:

$$\int_M \phi f_\nu\,dV_{n-1} = \int_D [d\phi \cdot df + \phi\,\operatorname{Lapl} f]\,dV_n. \tag{8.10}$$

In the same way

$$\int_M f\phi_\nu\,dV_{n-1} = \int_D [df \cdot d\phi + f\,\operatorname{Lapl}\phi]\,dV_n.$$

Subtracting, we get the second Green's formula

$$\int_M [\phi f_\nu - f\phi_\nu]\,dV_{n-1} = \int_D [\phi\,\operatorname{Lapl} f - f\,\operatorname{Lapl}\phi]\,dV_n. \tag{8.11}$$

EXAMPLE 3. A function f is called harmonic if Lapl f = 0. Let f be harmonic,
and apply the first Green's formula with $\phi = f$. Then

$$\int_M ff_\nu\,dV_{n-1} = \int_D |df|^2\,dV_n. \tag{8.12}$$

When n = 3 the right-hand side often has (except for a suitable multiplicative
constant) the physical interpretation of energy.
If f is harmonic and f(x) = 0 for every $x \in M$, then from (8.12) the integral
of the nonnegative continuous function $|df|^2$ is 0. Hence df(x) = 0 for every
$x \in \operatorname{cl} D$. Given $x_0 \in D$ let $x_1$ be a point of M nearest $x_0$. The line
joining $x_0$ and $x_1$ lies in cl D, and from the mean value theorem f is constant on it.
Since $f(x_1) = 0$, we must have $f(x_0) = 0$. Thus f(x) = 0 for every
$x \in \operatorname{cl} D$.
Suppose that f and g are both of class $C^{(2)}$ on cl D and harmonic, and that
f(x) = g(x) for every $x \in M$. Then $\phi(x) = f(x) - g(x) = 0$ for $x \in M$ and $\phi$ is
harmonic. Hence $\phi(x) = 0$, and f(x) = g(x), for every $x \in \operatorname{cl} D$. This shows
that there is at most one harmonic function of class $C^{(2)}$ on cl D with given
values on the boundary M. It is more difficult to show that there is in fact a
harmonic function f with given boundary values. This is called Dirichlet's
problem. If the boundary data $f|M$ are merely continuous, then f is continuous
on cl D and of class $C^{(2)}$ and harmonic on D [15, Chapter XI]. If the boundary
data are smooth enough, then f is of class $C^{(2)}$ and harmonic on cl D. For
instance this is true if M is of class $C^{(3)}$ and f is of class $C^{(3)}$ on M.
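Formula (8.12) can be verified directly in a small case. The following worked example is a sketch under stated assumptions: n = 2, D is the open unit disk, and the harmonic function $f = x^2 - y^2$ is a hypothetical choice made only for illustration.

```latex
% Hypothetical check of (8.12): n = 2, D the unit disk, f(x, y) = x^2 - y^2.
\begin{align*}
\operatorname{Lapl} f &= 2 - 2 = 0, \qquad df = 2x\,dx - 2y\,dy, \qquad |df|^2 = 4(x^2 + y^2), \\
\nu(x, y) &= (x, y) \ \text{on } M, \qquad
f_\nu = df \cdot \nu = 2(x^2 - y^2) = 2\cos 2\theta \ \text{at } (\cos\theta, \sin\theta), \\
\int_M f f_\nu\,dV_1 &= \int_0^{2\pi} 2\cos^2 2\theta\,d\theta = 2\pi, \\
\int_D |df|^2\,dV_2 &= \int_0^{2\pi}\!\!\int_0^1 4r^2 \cdot r\,dr\,d\theta = 2\pi.
\end{align*}
```

Both sides equal $2\pi$, as (8.12) predicts.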

Let us now turn to the proof of the divergence theorem. The proof proceeds
by first considering two particular cases (Lemmas 2 and 3). The general
result is reduced to these via a partition of unity. As in Chapter 5, $\int f\,dV_n$
denotes the integral of f over all of $E^n$.


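Lemma 2. Let $\zeta$ be a 1-form of class $C^{(1)}$ on $E^n$ such that $\zeta$ has compact
support. Then $\int \operatorname{div}\zeta\,dV_n = 0$.
PROOF. Let $1 \leq i \leq n$. By the iterated integrals theorem

$$\int \frac{\partial\zeta_i}{\partial x^i}\,dV_n = \int \left\{ \int_{-\infty}^{\infty} \frac{\partial\zeta_i}{\partial x^i}\,dx^i \right\} dV_{n-1}(x^{i\prime}),$$

where $x^{i\prime} = (x^1, \ldots, x^{i-1}, x^{i+1}, \ldots, x^n)$. Since $\zeta_i$ has compact
support, the inner integral is 0 by the fundamental theorem of calculus. Therefore
$\int \partial\zeta_i/\partial x^i\,dV_n = 0$. Summing from 1 to n we get the lemma. $\square$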

In Lemma 3 we write (as in Section 5.5) $x' = (x^1, \ldots, x^{n-1})$ instead of
$x^{n\prime}$. We return to the situation depicted in Figure 8.2. Let $R \subset E^{n-1}$ be
an open set, and $\phi$ of class $C^{(1)}$ on R with $\phi(x') > 0$. Let
$S = \{(x', \phi(x')) : x' \in R\}$, and $\nu(x) = (\nu^1(x), \ldots, \nu^n(x))$ the normal
vector to S at x with $|\nu(x)| = 1$ and $\nu^n(x) > 0$. Let C be the cylindrical region
bounded above by S and below by the hyperplane $x^n = 0$, namely,

$$C = \{(x', x^n) : 0 < x^n < \phi(x'),\ x' \in R\}.$$

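Lemma 3. Suppose that $\zeta_n(x', 0) = 0$ for all $x' \in R$. Then

$$\int_C \frac{\partial\zeta_n}{\partial x^n}\,dV_n = \int_S \zeta_n\nu^n\,dV_{n-1}.$$

PROOF. By the iterated integrals theorem,

$$\int_C \frac{\partial\zeta_n}{\partial x^n}\,dV_n = \int_R \left\{ \int_0^{\phi(x')} \frac{\partial\zeta_n}{\partial x^n}\,dx^n \right\} dV_{n-1}(x').$$

Since $\zeta_n(x', 0) = 0$, the fundamental theorem of calculus implies

$$\int_C \frac{\partial\zeta_n}{\partial x^n}\,dV_n = \int_R \zeta_n(x', \phi(x'))\,dV_{n-1}(x'). \tag{$*$}$$

Let us next show that

$$\nu^n(x)\,(1 + |\operatorname{grad}\phi(x')|^2)^{1/2} = 1 \tag{$**$}$$

for any $x = (x', \phi(x')) \in S$. To see this, let $\Phi(x) = x^n - \phi(x')$ as in Example 2,
Section 8.1. Then

$$\operatorname{grad}\Phi(x) = e_n - \operatorname{grad}\phi(x'),$$

$$|\operatorname{grad}\Phi(x)| = (1 + |\operatorname{grad}\phi(x')|^2)^{1/2}.$$

Since $\nu(x) = |\operatorname{grad}\Phi(x)|^{-1}\operatorname{grad}\Phi(x)$, we get (**). In the
same way as for Example 1, Section 8.3, if f is continuous on S, then

$$\int_S f\,dV_{n-1} = \int_R f(x', \phi(x'))\,(1 + |\operatorname{grad}\phi(x')|^2)^{1/2}\,dV_{n-1}(x').$$

We get Lemma 3 by taking $f = \zeta_n\nu^n$, and using (*), (**). $\square$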


Let us next show that the divergence theorem is correct for 1-forms with
sufficiently small support.

Lemma 4. Let D be a regular domain. Then any $x_0 \in \operatorname{cl} D$ has a neighborhood
U such that (8.7) holds, provided $\zeta$ is a 1-form of class $C^{(1)}$ and the support of
$\zeta$ is contained in U.
PROOF. If $x_0 \in D$, let U be any neighborhood of $x_0$ of radius less than the
distance from $x_0$ to fr D. Then $\zeta(x) = 0$ for all $x \in \operatorname{fr} D$, since $\zeta$
has support contained in U. Thus the left side of (8.7) is 0. By Lemma 2 the right side of
(8.7) is 0.
Next suppose that $x_0 \in \operatorname{fr} D$. By a translation and rotation in $E^n$ we may
assume that the components of both $x_0$ and the exterior unit normal $\nu(x_0)$
are positive:

$$x_0^i > 0, \quad \nu^i(x_0) > 0, \quad i = 1, \ldots, n$$

(see Problem 9). Let us show that, for each i = 1, ..., n, there exists a
neighborhood $U_i$ of $x_0$ such that

$$\int_{\operatorname{fr} D} \zeta_i\nu^i\,dV_{n-1} = \int_D \frac{\partial\zeta_i}{\partial x^i}\,dV_n$$

provided $\zeta$ has support contained in $U_i$. One then gets (8.7) by summing
from i = 1 to n, and taking $\zeta$ with support contained in
$U = U_1 \cap \cdots \cap U_n$. For notational simplicity, consider i = n. The same proof
applies when i < n.
By the implicit function theorem, there is a relatively open set
$S \subset \operatorname{fr} D$ with $x_0 \in S$, such that

$$S = \{(x', \phi(x')) : x' \in R\},$$

with $R \subset E^{n-1}$ an open set and $\phi$ of class $C^{(1)}$. Since $x_0^n > 0$ and
$x_0^n = \phi(x_0')$, we may choose S small enough that $\phi(x') > 0$ for all $x' \in R$.
Let $\Phi$ have the properties in the definition of regular domain. Since
grad $\Phi(x_0)$ is a positive scalar multiple of $\nu(x_0)$ and $\nu^n(x_0) > 0$, the partial
derivative $\Phi_n(x_0)$ is positive. Let us choose a neighborhood U of $x_0$ sufficiently
small that $x' \in R$, $x^n > 0$, and $\Phi_n(x) > 0$ for any $x = (x', x^n) \in U$. Then
$\Phi(x', \cdot)$ is an increasing function on the interval
$I_{x'} = \{x^n : (x', x^n) \in U\}$ and $\Phi(x', \phi(x')) = 0$. Let
$U_n \subset U$ be a neighborhood of $x_0$ small enough that $(x', \phi(x')) \in U$ for any
$x \in U_n$. For $x \in D \cap U_n$, we have $\Phi(x) < 0$, $\Phi(x', \phi(x')) = 0$, and
$\Phi(x', \cdot)$ increasing on $I_{x'}$. Thus, $x^n < \phi(x')$. Therefore,
$D \cap U_n \subset C$ with C as in Lemma 3 (see Figure 8.7).
If $\zeta$ has support contained in $U_n$, then

$$\int_{\operatorname{fr} D} \zeta_n\nu^n\,dV_{n-1} = \int_S \zeta_n\nu^n\,dV_{n-1},$$

$$\int_D \frac{\partial\zeta_n}{\partial x^n}\,dV_n = \int_{D \cap U_n} \frac{\partial\zeta_n}{\partial x^n}\,dV_n = \int_C \frac{\partial\zeta_n}{\partial x^n}\,dV_n.$$

By Lemma 3, the right-hand sides are equal. $\square$



Figure 8.7

Proof of divergence theorem


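Let $\zeta$ be a 1-form of class $C^{(1)}$ on cl D. Then by definition $\zeta$ has a
$C^{(1)}$ extension to an open set $D_0$ containing cl D (Section 3.4). We also denote this
extension by $\zeta$. Every $x_0 \in \operatorname{cl} D$ has a neighborhood $U \subset D_0$ with
the property stated in Lemma 4. Since cl D is a compact set, a finite number of such
neighborhoods $U_1, \ldots, U_m$ cover cl D. Let $\{\pi_1, \ldots, \pi_m\}$ be a partition
of unity of the kind constructed in the proof of Proposition 8.6, with the support of
$\pi_k$ contained in $U_k$ for k = 1, ..., m. Then $\pi_k\zeta$ has support in $U_k$, and by
the product rule for derivatives,

$$\operatorname{div}(\pi_k\zeta) = \pi_k\operatorname{div}\zeta + d\pi_k \cdot \zeta.$$

By Lemma 4,

$$\int_{\operatorname{fr} D} (\pi_k\zeta) \cdot \nu\,dV_{n-1} = \int_D \operatorname{div}(\pi_k\zeta)\,dV_n, \quad k = 1, \ldots, m.$$

Since $\sum_{k=1}^m \pi_k = 1$, we have $\sum_{k=1}^m d\pi_k = 0$ and

$$\int_{\operatorname{fr} D} \zeta \cdot \nu\,dV_{n-1} = \sum_{k=1}^m \int_{\operatorname{fr} D} (\pi_k\zeta) \cdot \nu\,dV_{n-1}, \qquad \int_D \operatorname{div}\zeta\,dV_n = \sum_{k=1}^m \int_D \operatorname{div}(\pi_k\zeta)\,dV_n.$$

This gives (8.7), completing a proof of the divergence theorem. $\square$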

The assumption that fr D is a manifold of class $C^{(1)}$ can be considerably
weakened. Let us state without proof a somewhat more general version of the
divergence theorem. Let D be an open, bounded set. Assume that

$$\operatorname{fr} D = A_1 \cup \cdots \cup A_p \cup B,$$

where: (a) $A_k$ is a relatively open subset of fr D and cl $A_k$ is a compact subset
of an (n - 1)-manifold $M_k$, k = 1, ..., p; and (b) B is a compact set contained
in a finite union of (n - 2)-manifolds and
$(\operatorname{cl} A_k) \cap (\operatorname{cl} A_l) \subset B$ whenever $k \neq l$. Moreover, assume
that D is the interior of its closure. Then

$$\sum_{k=1}^p \int_{A_k} \zeta \cdot \nu\,dV_{n-1} = \int_D \operatorname{div}\zeta\,dV_n$$

provided $\zeta$ is of class $C^{(1)}$ on cl D (see Figure 8.8). Let us say that such a set D
has a boundary which is piecewise of class $C^{(1)}$.

Figure 8.8

EXAMPLE 4. Let D be an n-simplex. Let $A_0, \ldots, A_n$ be its (open) (n - 1)-
dimensional faces, let $M_k$ be the hyperplane containing $A_k$, and let B be the
union of the (n - 2)-dimensional faces of D.

PROBLEMS

1. Let n = 2 and $\zeta = y\,dx - x\,dy$. Verify (8.7) by evaluating the integrals on both
sides if:
(a) $D = \{(x, y) : x^2 + y^2 < a^2\}$.
(b) D is the triangle with vertices $0, e_1, e_1 + e_2$.
2. Let D be the solid cone in $E^3$ with vertex $e_3$ and base
$B = \{(x, y, 0) : x^2 + y^2 < 1\}$. Let $\zeta = x^2\,dx + y^2\,dz$. Evaluate
(a) $\int_B \zeta \cdot \nu\,dV_2$.
(b) $\int_A \zeta \cdot \nu\,dV_2$, where A = fr D - B. Use the divergence theorem.

3. Prove the divergence theorem directly from the fundamental theorem of calculus
when D is:
(a) The unit n-cube $\{x : 0 < x^i < 1,\ i = 1, \ldots, n\}$.
(b) The standard n-simplex.
4. Consider a flow $\{T_t\}$, and define $W_t$, $v(t) = V_n(D_t)$ as in the discussion
preceding (8.9). Show that:
(a) $JT_t(x) > 0$ for all $x \in D_0$ and $t \geq 0$.
(b) $(\partial/\partial t)JT_t = \operatorname{div} W_0$ when t = 0.
(c) $(\partial/\partial t)JT_t = JT_t\,\operatorname{div} W_0$.
(d) Formula (8.9) holds.
5. Let $f(x, y) = \tfrac{1}{2}\log(x^2 + y^2)$, $(x, y) \neq (0, 0)$. Let
$D = \{(x, y) : 0 < x^2 + y^2 < a^2\}$, and let $\zeta = df$. Show that the left side of (8.7)
is $2\pi$, but the right side is 0. Why does this not contradict the divergence theorem?


In Problems 6 and 7, let M = fr D.


6. Show that:
(a) $\int_M \nu^i(x)\,dV_{n-1}(x) = 0$.
(b) $\int_M x \cdot \nu(x)\,dV_{n-1}(x) = nV_n(D)$.
(c) $\int_M f_\nu(x)\,dV_{n-1}(x) = \int_D \operatorname{Lapl} f(x)\,dV_n(x)$.
7. Let D be connected, f harmonic, and $f_\nu(x) = 0$ for every $x \in M$. Show that f(x) is
constant on D.
8. Let $D = \{x : a < |x| < b\}$, where 0 < a < b.
(a) Show that if $f(x) = \psi(|x|)$, then $f_\nu(x) = \psi'(|x|)$ when $|x| = b$ and
$f_\nu(x) = -\psi'(|x|)$ when $|x| = a$.
(b) Let $\psi(r) = r^k$, and let n > 2. Find the value of k for which the function f in (a) is
harmonic.
(c) Let $\phi$ be harmonic on the n-ball $B = \{x : |x| \leq b\}$. Show that

$$\phi(0) = (\beta_n b^{n-1})^{-1} \int_{\operatorname{fr} B} \phi\,dV_{n-1}.$$

[Hint: Apply the second Green's formula with D and f as above and let $a \to 0^+$.]
$\beta_n$ is as in Problem 8, Section 8.3.
9. Let $g(t) = x_0 + L(t)$, where L is a rotation of $E^n$. Show that both sides of (8.7) equal
the corresponding integrals in which $\zeta$ is replaced by
$\zeta^\# = \sum_{i=1}^n (\zeta_i \circ g)\,dg^i$ and D by $g^{-1}(D)$.

*8.5 Fluid flow


As another illustration of the role of the divergence theorem in physical
applications, consider a fluid flowing in an open set $D_0 \subset E^3$. Let t denote time
and $\mathbf{x} = (x, y, z)$. Let $\rho(\mathbf{x}, t)$ be the density and
$\mathbf{v}(\mathbf{x}, t)$ the velocity at $\mathbf{x}$ and time t. They are assumed to be of
class $C^{(1)}$. Let m(t) denote the mass of the fluid at time t, in a portion D of $D_0$. Then

$$m(t) = \int_D \rho(\mathbf{x}, t)\,dV_3(\mathbf{x}).$$

Suppose that D is a regular domain with $\operatorname{cl} D \subset D_0$. The rate at which mass
flows out of D is

$$-\frac{dm}{dt} = \int_{\operatorname{fr} D} (\rho\mathbf{v}) \cdot \nu\,dV_2.$$

From the divergence theorem,

$$-\frac{dm}{dt} = \int_D \operatorname{div}(\rho\mathbf{v})\,dV_3.$$

On the other hand, by differentiating under the integral sign (Section 5.12),

$$\frac{dm}{dt} = \int_D \frac{\partial\rho}{\partial t}\,dV_3.$$

For each $t_0$ the functions $-\operatorname{div}(\rho\mathbf{v})$ and $\partial\rho/\partial t$
have the same integral over every regular D with $\operatorname{cl} D \subset D_0$. By Lemma 1,
Section 8.4, for every $x_0 \in D_0$ these functions have the same value at $(x_0, t_0)$. In
other words,

$$\frac{\partial\rho}{\partial t} = -\operatorname{div}(\rho\mathbf{v}).$$

If the density $\rho$ is constant, then the fluid is incompressible. Thus for
incompressible fluids div $\mathbf{v} = 0$ at every time t.
In the above discussion the only physical principle used is conservation
of mass. To proceed further, some additional description of the physical
properties of the flow is needed. Let us consider the following simple model
of fluid flow. It is assumed that: (1) the fluid is incompressible, (2) the velocity
$\mathbf{v}(\mathbf{x})$ at a point $\mathbf{x}$ is not time dependent, and (3) there exists a
function f of class $C^{(2)}$ on $D_0$ such that $\mathbf{v} = \operatorname{grad} f$.
Flows that satisfy (2) are called steady. The function f in (3) is called a
velocity potential. To better understand Condition (3), let us introduce the
concept of circulation along a curve $\gamma$ lying in the region $D_0$ of flow. Let us
choose arc length s as parameter on $\gamma$, and let G be the corresponding
standard representation of $\gamma$ on [0, l], where l is the length of $\gamma$ (Section 6.2).
Then G'(s) is a unit tangent vector at the point $\mathbf{x} = G(s)$ on $\gamma$. The tangential
component of the flow is $\mathbf{v}[G(s)] \cdot G'(s)$, and the circulation along $\gamma$ is

$$c = \int_0^l \mathbf{v}[G(s)] \cdot G'(s)\,ds.$$

The circulation can be rewritten as follows. Consider the 1-form
$\zeta = v^1\,dx + v^2\,dy + v^3\,dz$, where $v^1(\mathbf{x}), v^2(\mathbf{x}), v^3(\mathbf{x})$
are the components of the velocity $\mathbf{v}(\mathbf{x})$. Then

$$c = \int_\gamma \zeta.$$

Condition (3) states that $\zeta = df$ is an exact differential form. By Theorem
6.1, Condition (3) is equivalent to the statement that the circulation along
any closed curve $\gamma$ lying in $D_0$ is 0.
A weaker assumption than (3) is that: (3') $\zeta$ is a closed differential form
($d\zeta = 0$). The flow is called irrotational if (3') holds, for reasons indicated in
Section 8.8. Theorem 8.4 (Section 8.10) implies that (3) and (3') are equivalent
if $D_0$ is simply connected.

EXAMPLE 1. Let $D_0$ be $E^3$ with the z-axis removed, and let
$\mathbf{v}(x, y, z) = (x^2 + y^2)^{-1}(-ye_1 + xe_2)$. The corresponding 1-form is

$$\zeta = \frac{x\,dy - y\,dx}{x^2 + y^2}.$$

By Example 2, Section 6.3, $\zeta$ is closed but not exact. Condition (3') is satisfied,
but there is no velocity potential. In this example, z plays no role; the flow is
effectively 2-dimensional. Particles move in circles $x^2 + y^2 = a^2$ about
(0, 0) with speed 1/a, hence with period $2\pi a^2$. The point (0, 0) is called a vortex
[19, p. 24].
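The failure of exactness in Example 1 shows up as a nonzero circulation around the vortex. The following is a minimal sketch: the function names, the two test circles, and the midpoint rule are illustrative assumptions, while the velocity field is the one of Example 1 restricted to the (x, y) plane.

```python
import math

def v(x, y):
    """Velocity field of Example 1: (x^2 + y^2)^(-1) (-y e1 + x e2)."""
    r2 = x * x + y * y
    return (-y / r2, x / r2)

def circulation(cx, cy, a, n=4000):
    """Circulation of v along the circle of radius a about (cx, cy), counterclockwise."""
    dt = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * dt
        x, y = cx + a * math.cos(t), cy + a * math.sin(t)
        tx, ty = -math.sin(t), math.cos(t)        # unit tangent vector G'(s)
        vx, vy = v(x, y)
        total += (vx * tx + vy * ty) * a * dt     # ds = a dt
    return total

around_vortex = circulation(0.0, 0.0, 2.0)   # circle enclosing the z-axis
away_from_it = circulation(3.0, 0.0, 1.0)    # circle not enclosing the z-axis
```

The first value approximates $2\pi$, so no velocity potential can exist on all of $D_0$. The second is 0 up to rounding, as expected for a closed form on a simply connected region.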
Let us return to fluid flows satisfying (1) through (3). Since

$$0 = \operatorname{div}\zeta = \operatorname{div}(df) = \operatorname{Lapl} f,$$

the velocity potential must be a harmonic function in $D_0$. Let us assume that
fr $D_0$ is a $C^{(1)}$ manifold and impose the additional condition at the boundary:
(4) $f_\nu(\mathbf{x}) = 0$ at each $\mathbf{x} \in \operatorname{fr} D_0$, where
$f_\nu(\mathbf{x})$ is the derivative in the direction of the exterior unit normal
$\nu(\mathbf{x})$. Since
$f_\nu(\mathbf{x}) = \operatorname{grad} f(\mathbf{x}) \cdot \nu(\mathbf{x}) = \mathbf{v}(\mathbf{x}) \cdot \nu(\mathbf{x})$,
Condition (4) states that the component of velocity normal to fr $D_0$ is 0. This
is a reasonable condition since the fluid cannot enter or leave $D_0$ through
fr $D_0$. In addition, a condition may be imposed on $\mathbf{v}(\mathbf{x})$ for large
$|\mathbf{x}|$, if $D_0$ is unbounded.

EXAMPLE 2. Let us consider 2-dimensional flow past a circular obstacle. Let
$D_0 = \{(x, y) : x^2 + y^2 > 1\}$. We require that $v(x, y) \to v_0 e_1$ as $x^2 + y^2 \to \infty$.
In other words, the flow is nearly in the positive $x$ direction with speed $v_0$
far from the obstacle. Let us find the velocity potential $f(x, y)$. For this
purpose we suppose that
$$f(r\cos\theta, r\sin\theta) = g(r)h(\theta).$$

We assume that $g'(1) = 0$, which amounts to Condition (4). Moreover, we
assume that $h(\theta)$ is decreasing for $0 < \theta < \pi/2$, and that $h(-\theta) = h(\theta) = -h(\pi - \theta)$.
The last condition imposes reasonable symmetries on the flow.
Since $\operatorname{Lapl} f = 0$, one gets (Problem 1):
$$h(\theta) = \cos\theta, \qquad r^2 g''(r) + r g'(r) = g(r). \tag{*}$$

This second-order linear differential equation for $g$ can be solved by observing
that $r$ and $r^{-1}$ are particular solutions. The general solution is $g(r) = C_1 r + C_2 r^{-1}$.
The choice $C_1 = C_2 = v_0$ gives $g(r) = v_0(r + r^{-1})$, satisfying
$g'(1) = 0$ and $g'(r) \to v_0$ as $r \to \infty$. Then

$$f(r\cos\theta, r\sin\theta) = v_0\left(r + \frac{1}{r}\right)\cos\theta$$

is the desired velocity potential. It can be shown that $f$ is uniquely determined
up to an additive constant. Figure 8.9 indicates the flow. At each point $(x, y)$
of a flow curve, v(x, y) is a tangent vector. Our solution to this problem de-
pended on solving differential equations for g and h. The theory of complex
analytic functions and conformal mapping gives another method for finding
velocity potentials for 2-dimensional flows past obstacles of various shapes
[19, Chapter 1]. The reader who has observed the flow of a swiftly moving
river past obstacles may question the realism of this simple model of fluid
flow. In particular, the solution does not account for eddies downstream from
the obstacle, in which the velocity vector differs greatly in direction from the
general direction of flow.

Figure 8.9
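
As a sanity check on Example 2, the three requirements on the potential (harmonic in $D_0$, zero normal velocity on the unit circle, velocity tending to $v_0 e_1$ far away) can be tested with finite differences. This is a rough sketch: the value `V0 = 1.0`, the step sizes, and the function names are assumptions made only for illustration.

```python
import math

V0 = 1.0  # far-field speed v_0; the value is an arbitrary illustrative choice

def f(x, y):
    """Velocity potential v_0 (r + 1/r) cos(theta), rewritten in x and y."""
    r2 = x * x + y * y
    return V0 * x * (1.0 + 1.0 / r2)

def grad(x, y, h=1e-5):
    """Central-difference gradient of f, that is, the velocity v = grad f."""
    return ((f(x + h, y) - f(x - h, y)) / (2 * h),
            (f(x, y + h) - f(x, y - h)) / (2 * h))

def laplacian(x, y, h=1e-3):
    """Five-point finite-difference approximation of Lapl f."""
    return (f(x + h, y) + f(x - h, y) + f(x, y + h) + f(x, y - h)
            - 4.0 * f(x, y)) / (h * h)

def normal_velocity_on_circle(theta):
    """Radial component of v at the point of the unit circle with angle theta."""
    x, y = math.cos(theta), math.sin(theta)
    vx, vy = grad(x, y)
    return vx * x + vy * y

if __name__ == "__main__":
    print("Lapl f at (2, 1):", laplacian(2.0, 1.0))
    print("normal velocity at theta = 0.8:", normal_velocity_on_circle(0.8))
    print("velocity far away:", grad(1000.0, 500.0))
```

All three printed quantities should be close to 0, 0, and $(v_0, 0)$ respectively, up to discretization error.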

PROBLEMS

1. In Example 2, show first that $g^{-1}(r^2 g'' + r g') = -h^{-1}h'' = C$ where $C$ is a positive
constant. Then show that $C = 1$ from the conditions imposed on $h$.
2. Suppose that the velocity potential has the form $f(r\cos\theta, r\sin\theta) = g(r)$ for all
$r > 0$. Show that $g(r) = C_1 \log r + C_2$, where $C_1$, $C_2$ are constants. Find the velocity
vector $v(x, y)$ and describe the flow.
3. Consider the flow in $E^2$ such that $v = \operatorname{grad} f$, where $f(x, y) = xy^2$. Show that the flow
is not incompressible. Find the flow curves and sketch them; each such curve is
represented by a solution $g$ of the differential equation $g'(t) = v[g(t)]$.

8.6 Orientations
Let $M$ be an $r$-manifold. For each $x \in M$, the tangent space $T_M(x)$ is an
$r$-dimensional vector subspace of $E^n$. According to Section 7.5, $T_M(x)$ has two
possible orientations, each of which is an $r$-vector of norm 1. If one of these
orientations is denoted by $\mathbf{o}(x)$, then the other is $-\mathbf{o}(x)$. We would like to
choose the orientation for $T_M(x)$ consistently on $M$; in other words, we want
the function $\mathbf{o}$ whose value at $x$ is $\mathbf{o}(x)$ to be continuous on $M$.

Definition. $M$ is an orientable manifold if there exists a continuous
$r$-vector-valued function $\mathbf{o}$ such that $\mathbf{o}(x)$ is an orientation for the tangent space
$T_M(x)$ for every $x \in M$. The function $\mathbf{o}$ is an orientation for $M$.

It can be shown that a connected manifold has at most two orientations.
Let us find out what orientability means in the extreme dimensions $r = 1$,
$n - 1$, $n$.
The case r = 1
If $M$ is a 1-manifold, then the two orientations for the 1-dimensional vector
space $T_M(x)$ are unit tangent vectors at $x$ pointing in opposite directions. $M$
is oriented by assigning a unit tangent vector $\tau(x)$ continuously on $M$ (Figure
8.10). It can be shown that every 1-manifold is orientable.


Figure 8.10

The case r = n
Here the $n$-manifold $M$ is an open subset of $E^n$. The possible values for
$\mathbf{o}(x)$ are $\pm e_{1 \cdots n}$. If $M$ is connected, then $\mathbf{o}(x)$ must be constant. If
$\mathbf{o}(x) = e_{1 \cdots n}$ for every $x \in M$, then $M$ is positively oriented; and if
$\mathbf{o}(x) = -e_{1 \cdots n}$ for every $x \in M$, then $M$ is negatively oriented.
The case r = n - 1
If $M$ is an $(n - 1)$-manifold in $E^n$, then the adjoint $n(x) = *\mathbf{o}(x)$ is a unit normal
vector to $M$ at $x$. The condition that $M$ be orientable is that a unit normal can
be chosen continuously on $M$. If $D$ is an open set that is a regular domain
(Section 8.4), then the exterior unit normal orients fr $D$. We call this the
positive orientation on fr $D$. If $M$ is not the boundary of an open set, then $M$
may not be orientable. This is shown by the following famous surface.

EXAMPLE 1 (Möbius strip). This is a 2-manifold $M \subset E^3$ that is not orientable.
It may be visualized by twisting a strip of paper and pasting together
the ends (Figure 8.11). The edge of the strip must be omitted in order that $M$
be locally like $E^2$. The fact that a unit normal cannot be chosen continuously
may be expressed more picturesquely by saying that the Möbius strip is a
surface with "only one side."

Figure 8.11
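
The failure of orientability can be seen numerically by carrying a unit normal once around the centre line of the strip. The sketch below assumes one common parametrization of the Möbius strip, $g(u, v) = ((1 + v\cos(u/2))\cos u,\ (1 + v\cos(u/2))\sin u,\ v\sin(u/2))$; this formula and the function name `normal` are illustrative assumptions, not taken from the text.

```python
import math

def normal(u):
    """Unit normal g_u x g_v on the centre line (v = 0) of the Mobius strip
    g(u, v) = ((1 + v cos(u/2)) cos u, (1 + v cos(u/2)) sin u, v sin(u/2))."""
    # Partial derivatives of g evaluated at v = 0.
    gu = (-math.sin(u), math.cos(u), 0.0)
    gv = (math.cos(u / 2) * math.cos(u),
          math.cos(u / 2) * math.sin(u),
          math.sin(u / 2))
    n = (gu[1] * gv[2] - gu[2] * gv[1],
         gu[2] * gv[0] - gu[0] * gv[2],
         gu[0] * gv[1] - gu[1] * gv[0])
    length = math.sqrt(sum(c * c for c in n))
    return tuple(c / length for c in n)

if __name__ == "__main__":
    start, end = normal(0.0), normal(2.0 * math.pi)
    print("normal at u = 0:    ", start)
    print("normal at u = 2*pi: ", end)
```

The parameters $u = 0$ and $u = 2\pi$ give the same point of the strip, yet the continuously transported normal comes back reversed, so no continuous choice of unit normal exists.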

EXAMPLE 2. The Möbius strip is not a compact 2-manifold. An example of
a compact, nonorientable 2-manifold is the Klein bottle, or twisted torus. It is
obtained by also joining together the lateral edges of the rectangle used to
make the Möbius strip, as indicated in Figure 8.12. The Klein bottle cannot
be realized as a submanifold of $E^3$, since it can be proved that any compact
$(n - 1)$-manifold $M \subset E^n$ is the boundary of an open set and hence is orientable.
However, the Klein bottle can be realized as a submanifold of $E^4$.

Figure 8.12

Orientation induced by a regular transformation


Let $S$ be a coordinate patch on an $r$-manifold $M$, and $g$ a regular transformation
from an open set $\Delta \subset E^r$ onto $S$. By Proposition 8.1, the partial derivatives
of $g$ give a frame $(g_1(t), \ldots, g_r(t))$ for the tangent space to $M$ at $x = g(t)$.
These frames vary continuously with $t$, and thus assign an orientation $\alpha_0$
varying continuously on $S$. In fact, from the discussion of orientations in
Section 7.5
$$\alpha_0(x) = [\mathcal{J}g(t)]^{-1}\, g_1(t) \wedge \cdots \wedge g_r(t), \qquad x = g(t), \tag{8.13}$$
with $\mathcal{J}g(t) = |g_1(t) \wedge \cdots \wedge g_r(t)|$ as in (8.1).

Definition. $\alpha_0$ is the orientation induced on $S$ by $g$ from the positive orientation
on $\Delta$.

For evaluating integrals of $r$-forms in later sections, we need to know
whether an induced orientation $\alpha_0$ agrees with a preassigned orientation $\mathbf{o}$.
If $S$ is connected, then either $\alpha_0(x) = \mathbf{o}(x)$ for all $x \in S$ or else $\alpha_0(x) = -\mathbf{o}(x)$
for all $x \in S$. This is true since $\alpha_0$ and $\mathbf{o}$ are continuous on $S$, and
$\alpha_0(x) = \pm\mathbf{o}(x)$ for all $x \in S$.
The following proposition is often useful in this connection. Let $\mathbf{o}$ be an
orientation for $S$. Then
$$\mathbf{o}(x) = \sum_{[\lambda]} o^\lambda(x)\, e_\lambda,$$

where $o^\lambda(x)$ is the component of $\mathbf{o}(x)$ with respect to the standard basis
$r$-vector $e_\lambda$ and $\lambda = (i_1, \ldots, i_r)$ denotes an increasing $r$-tuple as in Section 7.5.
As in Section 8.1, let $g^\lambda = (g^{i_1}, \ldots, g^{i_r})$ be the flat transformation obtained by
considering only those components of $g$.

Proposition 8.8. The induced orientation agrees with $\mathbf{o}(x)$, at $x = g(t)$, if $o^\lambda(x)$
and the Jacobian $Jg^\lambda(t)$ have the same sign. If $o^\lambda(x)$ and $Jg^\lambda(t)$ have opposite
signs, then the two orientations are opposite.
PROOF. By (7.33) with $h_j = g_j(t)$, the $\lambda$th component of $g_1(t) \wedge \cdots \wedge g_r(t)$ is
$Jg^\lambda(t)$. Therefore $\alpha_0^\lambda(x) = [\mathcal{J}g(t)]^{-1} Jg^\lambda(t)$. Since $\alpha_0(x) = \pm\mathbf{o}(x)$, $\alpha_0(x) = \mathbf{o}(x)$
if and only if $\alpha_0^\lambda(x) = o^\lambda(x)$. □

EXAMPLE 3. Let $H$ be the hemisphere in Example 3, Section 8.3, oriented so
that $o^{12}(x) > 0$ for every $x \in H$. If $(\phi, \theta)$ are spherical coordinates and $g$ is as
in that example, then

$$Jg^{12}(\phi, \theta) = \frac{\partial(g^1, g^2)}{\partial(\phi, \theta)} = \sin\phi\cos\phi > 0.$$

Therefore, the orientation induced by $g$ agrees with the given orientation $\mathbf{o}$.
In this example the vector $n(x) = *\mathbf{o}(x)$ is normal to $H$, and its third component
$n^3(x) = o^{12}(x)$. We have oriented $H$ so that the normal "points upward."
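
The sign test of Proposition 8.8 is easy to carry out numerically. The sketch below assumes that the spherical parametrization is $g(\phi, \theta) = (\sin\phi\cos\theta, \sin\phi\sin\theta, \cos\phi)$ on the unit sphere (the exact form used in Example 3, Section 8.3 is not reproduced here, so this is an assumption). It computes the Jacobians $Jg^{ij}$ by central differences and normalizes them to obtain the components of the induced orientation; all function names are illustrative.

```python
import math
from itertools import combinations

def g(phi, theta):
    """Assumed spherical-coordinate parametrization of the unit sphere."""
    return (math.sin(phi) * math.cos(theta),
            math.sin(phi) * math.sin(theta),
            math.cos(phi))

def partials(phi, theta, h=1e-6):
    """Central-difference approximations to g_1 = dg/dphi and g_2 = dg/dtheta."""
    g1 = [(a - b) / (2 * h) for a, b in zip(g(phi + h, theta), g(phi - h, theta))]
    g2 = [(a - b) / (2 * h) for a, b in zip(g(phi, theta + h), g(phi, theta - h))]
    return g1, g2

def jacobians(phi, theta):
    """Jacobians Jg^{ij} for increasing pairs (i, j): the components of g_1 ^ g_2."""
    g1, g2 = partials(phi, theta)
    return {(i + 1, j + 1): g1[i] * g2[j] - g1[j] * g2[i]
            for i, j in combinations(range(3), 2)}

def induced_orientation(phi, theta):
    """Components of the unit 2-vector (g_1 ^ g_2) / |g_1 ^ g_2|."""
    comps = jacobians(phi, theta)
    norm = math.sqrt(sum(c * c for c in comps.values()))
    return {lam: c / norm for lam, c in comps.items()}

if __name__ == "__main__":
    phi, theta = 0.7, 1.1
    print("Jg^12:", jacobians(phi, theta)[(1, 2)])
    print("sin(phi) cos(phi):", math.sin(phi) * math.cos(phi))
    print("induced orientation:", induced_orientation(phi, theta))
```

On the upper hemisphere ($0 < \phi < \pi/2$) the $(1, 2)$ component is positive, matching the orientation with $o^{12} > 0$.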

PROBLEMS

1. Let $M \subset E^2$ be a 1-manifold determined by a $C^{(1)}$ function $\Phi$, namely,
$M = \{(x, y) : \Phi(x, y) = 0,\ \operatorname{grad}\Phi(x, y) \ne 0\}$. Let
$\tau(x, y) = |\operatorname{grad}\Phi(x, y)|^{-1}(\Phi_2(x, y)e_1 - \Phi_1(x, y)e_2)$.
(a) Show that $\tau$ is an orientation for $M$.
(b) Show that the only orientations for $M$ are $\tau$ and $-\tau$, if $M$ is connected.
(c) Find all possible orientations for $M$ if $\Phi(x, y) = x^2 - y^2 - 1$.
2. Let $S = g(R)$, where $g(x, y) = xe_1 + ye_2 + \phi(x, y)e_3$ and $R \subset E^2$ is open. Find the
orientation $\mathbf{o}$ on $S$ induced by $g$ from the positive orientation of $E^2$. Find the normal
$n(x) = *\mathbf{o}(x)$.

3. Let $M = \{(x, y, z) : y^2 = x^2 + z^2 + 1,\ y < 0\}$, and $\mathbf{o}$ the orientation for $M$ such that
$o^{31}(x, y, z) < 0$. Let $(r, \theta)$ be polar coordinates in the $(x, z)$ plane, and
$g(r, \theta) = (r\cos\theta)e_1 - (r^2 + 1)^{1/2}e_2 + (r\sin\theta)e_3$. Use Proposition 8.8 to show that the induced
orientation agrees with $\mathbf{o}$.
4. Suppose that $M$ is the $r$-manifold determined by $\Phi$, in the sense that $M$ satisfies
(4.26), Section 4.7. Show that $M$ is orientable.

8.7 Integrals of r-forms


Let us first define the integral of a differential form $\omega$ of degree $r$ over a portion
$A$ of an $r$-manifold $M$. The integral depends on the orientation assigned to $M$,
and changes sign if the orientation is reversed.
Let $M$ be an $r$-manifold with orientation $\mathbf{o}$, and $\omega$ an $r$-form continuous on
$M$. For each $x \in M$ consider $\omega(x) \cdot \mathbf{o}(x)$, the scalar product of the $r$-covector
$\omega(x)$ and the $r$-vector $\mathbf{o}(x)$. Since $\omega$ and $\mathbf{o}$ are continuous functions, $\omega \cdot \mathbf{o}$ is a
continuous real valued function. Let $A$ be an $r$-measurable subset of $M$.

Definition. The integral of $\omega$ over $A$ with the orientation $\mathbf{o}$ is

$$\int_{A^{\mathbf{o}}} \omega = \int_A \omega(x) \cdot \mathbf{o}(x)\, dV_r(x), \tag{8.14}$$

provided $\omega \cdot \mathbf{o}$ is integrable over $A$.

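In particular, if $M$ is compact, then $\omega \cdot \mathbf{o}$ is continuous, bounded, and
hence integrable over any $r$-measurable subset of $M$. The integral has the
following elementary properties:

(1) $\int_{A^{\mathbf{o}}} (\omega^1 + \omega^2) = \int_{A^{\mathbf{o}}} \omega^1 + \int_{A^{\mathbf{o}}} \omega^2$.
(2) $\int_{A^{\mathbf{o}}} c\omega = c \int_{A^{\mathbf{o}}} \omega$, for any scalar $c$.
(3) $\int_{A^{-\mathbf{o}}} \omega = -\int_{A^{\mathbf{o}}} \omega$.
(4) If $|\omega(x)| \le C$ for every $x \in A$, then $\left|\int_{A^{\mathbf{o}}} \omega\right| \le C\, V_r(A)$.
(5) $\int_{A^{\mathbf{o}}} \omega = \int_{A_1^{\mathbf{o}}} \omega + \int_{A_2^{\mathbf{o}}} \omega$ if $A = A_1 \cup A_2$ and $A_1 \cap A_2$ is empty.
These follow at once from corresponding elementary properties of the
right-hand side of (8.14) (see Problem 11, Section 8.3).
For instance, in (3),

$$\int_A \omega(x) \cdot [-\mathbf{o}(x)]\, dV_r(x) = -\int_A \omega(x) \cdot \mathbf{o}(x)\, dV_r(x).$$

Since $|\mathbf{o}(x)| = 1$, $|\omega(x) \cdot \mathbf{o}(x)| \le |\omega(x)|$. Then in (4),

$$\left|\int_A \omega(x) \cdot \mathbf{o}(x)\, dV_r(x)\right| \le \int_A |\omega(x)|\, dV_r(x) \le C\, V_r(A).$$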

The case r = n
Let $A^+$ denote $A$ with the positive orientation $e_{1 \cdots n}$ of $E^n$. Let
$\omega = f\, dx^1 \wedge \cdots \wedge dx^n$ be an $n$-form. Then
$$\omega(x) \cdot e_{1 \cdots n} = \omega_{1 \cdots n}(x) = f(x),$$
and (8.14) becomes

$$\int_{A^+} f\, dx^1 \wedge \cdots \wedge dx^n = \int_A f\, dV_n. \tag{8.15}$$

The left-hand side of (8.15) changes sign if either the orientation of $A$ is reversed
or two differentials $dx^i$ and $dx^j$ are interchanged. For instance, if $n = 2$ then

$$\int_{A^+} f\, dx \wedge dy = \int_A f\, dV_2,$$

$$\int_{A^-} f\, dx \wedge dy = \int_{A^+} f\, dy \wedge dx = -\int_A f\, dV_2.$$
Let us next show how to rewrite $\int_{A^{\mathbf{o}}} \omega$ if $r < n$ and $A$ is an $r$-measurable
subset of some coordinate patch $S$. Let $g$ be regular from an open set $\Delta \subset E^r$
onto $S$. We recall from Section 7.7 that $\omega^\#$ is the corresponding $r$-form on $\Delta$,
obtained formally by substituting everywhere $g(t)$ for $x$ and $dg^i$ for $dx^i$.

Proposition 8.9. If $\mathbf{o}$ is the orientation on $S$ induced by $g$ from the positive
orientation on $\Delta$, then

$$\int_{A^{\mathbf{o}}} \omega = \int_{B^+} \omega^\#, \qquad A = g(B), \tag{8.16}$$

provided either integral exists.


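PROOF. Given $t$, let $x = g(t)$ and $L = Dg(t)$. Recall that $g_i(t) = L(\epsilon_i)$, where
$\{\epsilon_1, \ldots, \epsilon_r\}$ denotes the standard basis for $E^r$. By (7.39),
$$g_1(t) \wedge \cdots \wedge g_r(t) = L_r(\epsilon_1 \wedge \cdots \wedge \epsilon_r) = L_r(\epsilon_{1 \cdots r}).$$
By (7.49), $\omega^\#(t) = L_r^*[\omega(x)]$. Since $\epsilon_{1 \cdots r}$ is the positive orientation for $E^r$,

$$\int_{B^+} \omega^\# = \int_B \omega^\#(t) \cdot \epsilon_{1 \cdots r}\, dV_r(t). \tag{*}$$

In (7.41), let $\beta = \epsilon_{1 \cdots r}$. Then
$$\omega[g(t)] \cdot (g_1(t) \wedge \cdots \wedge g_r(t)) = \omega^\#(t) \cdot \epsilon_{1 \cdots r}.$$
On the other hand, let $f = \omega \cdot \mathbf{o}$ in (8.4). Then
$$\int_{A^{\mathbf{o}}} \omega = \int_A \omega(x) \cdot \mathbf{o}(x)\, dV_r(x) = \int_B \omega[g(t)] \cdot \mathbf{o}[g(t)]\, \mathcal{J}g(t)\, dV_r(t). \tag{**}$$
Since $\mathbf{o}$ is the induced orientation, we have $\mathbf{o}(x) = \alpha_0(x)$ in (8.13). Therefore
$$g_1(t) \wedge \cdots \wedge g_r(t) = \mathcal{J}g(t)\, \mathbf{o}[g(t)].$$
Then (8.16) follows from (*) and (**). □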
EXAMPLE 1. Let $r = 1$, $\Delta \subset E^1$. Then $\omega^\#(t) = \omega[g(t)] \cdot g'(t)$. If $B$ is an interval,
then $\int_{B^+} \omega^\#$ is the line integral of $\omega$ along the curve in $E^n$ represented
parametrically on $B$ by $g$.

EXAMPLE 2. Consider the hyperboloid $M = \{(x, y, z) : x^2 = y^2 + z^2 + 1\}$.
$M$ is a 2-manifold. Let $A = \{(x, y, z) \in M : 1 \le x < \sqrt{2}\}$, and $\mathbf{o}$ the orientation
such that $o^{23}(x, y, z) > 0$. It is convenient to coordinate $A$ using polar
coordinates $(r, \theta)$ in the $(y, z)$ plane. Thus, let
$g(r, \theta) = (r^2 + 1)^{1/2}e_1 + (r\cos\theta)e_2 + (r\sin\theta)e_3$, and
$B = \{(r, \theta) : 0 < r < 1,\ 0 < \theta < 2\pi\}$. Now
$$(dx)^\# = d[(r^2 + 1)^{1/2}] = r(r^2 + 1)^{-1/2}\, dr$$
$$(dy)^\# = d(r\cos\theta) = \cos\theta\, dr - r\sin\theta\, d\theta$$
$$(dz)^\# = d(r\sin\theta) = \sin\theta\, dr + r\cos\theta\, d\theta$$
$$(dx \wedge dy)^\# = (dx)^\# \wedge (dy)^\# = -r^2(r^2 + 1)^{-1/2}\sin\theta\, dr \wedge d\theta$$
$$(dz \wedge dx)^\# = (dz)^\# \wedge (dx)^\# = -r^2(r^2 + 1)^{-1/2}\cos\theta\, dr \wedge d\theta$$
$$(dy \wedge dz)^\# = (dy)^\# \wedge (dz)^\# = r\, dr \wedge d\theta.$$
In the process of computing $(dy \wedge dz)^\#$ we have computed the Jacobian

$$Jg^{23}(r, \theta) = \frac{\partial(g^2, g^3)}{\partial(r, \theta)} = r.$$
Compare with Example 1, Section 7.7. Since $Jg^{23}$ and $o^{23}$ are both positive,
Proposition 8.8 implies that $\mathbf{o}$ agrees with the orientation induced by $g$.
If $\omega = P\, dx \wedge dy + Q\, dz \wedge dx + R\, dy \wedge dz$, then
$$\omega^\# = P \circ g\, (dx \wedge dy)^\# + Q \circ g\, (dz \wedge dx)^\# + R \circ g\, (dy \wedge dz)^\#.$$

We may use Proposition 8.9 here. Note that $g(B)$ differs from $A$ by a portion
of the curve on $M$ corresponding to $\theta = 0$ or $2\pi$. The curve contributes
nothing to integrals over $A$. For instance, let $\omega = xz\, dx \wedge dy$. Then
$\omega^\# = -r^3\sin^2\theta\, dr \wedge d\theta$. By (8.16),

$$\int_{A^{\mathbf{o}}} \omega = \int_{B^+} (-r^3\sin^2\theta)\, dr \wedge d\theta = -\int_B r^3\sin^2\theta\, dV_2(r, \theta)
= -\int_0^{2\pi} \sin^2\theta\, d\theta \int_0^1 r^3\, dr = -\frac{\pi}{4}.$$
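
The value $-\pi/4$ can be confirmed by integrating the pulled-back form over $B$ numerically. The following is a sketch with a midpoint rule; the function names and the grid size are illustrative choices.

```python
import math

def pullback_coefficient(r, theta):
    """Coefficient of dr ^ dtheta in the pullback of omega = x z dx ^ dy under
    g(r, theta) = (sqrt(r^2 + 1), r cos(theta), r sin(theta))."""
    x = math.sqrt(r * r + 1.0)
    z = r * math.sin(theta)
    # Jacobian d(g^1, g^2)/d(r, theta); note that dg^1/dtheta = 0.
    dx_dr = r / math.sqrt(r * r + 1.0)
    dy_dr, dy_dtheta = math.cos(theta), -r * math.sin(theta)
    jac_12 = dx_dr * dy_dtheta - 0.0 * dy_dr
    return x * z * jac_12

def integrate_over_B(n=400):
    """Midpoint rule over B = (0, 1) x (0, 2*pi)."""
    hr, ht = 1.0 / n, 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * hr
        for j in range(n):
            theta = (j + 0.5) * ht
            total += pullback_coefficient(r, theta)
    return total * hr * ht

if __name__ == "__main__":
    print(integrate_over_B(), -math.pi / 4)
```

The two printed numbers should agree to several decimal places.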
Example 2 suggests how Propositions 8.8 and 8.9 can be generally applied.
By Formula (7.23) it suffices to consider $\omega = f\, dx^{i_1} \wedge \cdots \wedge dx^{i_r}$, since any
$r$-form is the sum of terms of this type. Then
$$(dx^{i_1} \wedge \cdots \wedge dx^{i_r})^\# = dg^{i_1} \wedge \cdots \wedge dg^{i_r} = Jg^\lambda(t)\, dt^1 \wedge \cdots \wedge dt^r.$$
If $o^\lambda(x)$ and $Jg^\lambda(t)$ have the same sign, then

$$\int_{A^{\mathbf{o}}} f\, dx^{i_1} \wedge \cdots \wedge dx^{i_r} = \int_B f \circ g\, Jg^\lambda\, dV_r. \tag{8.17}$$

A second version of the divergence theorem


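Let $D \subset E^n$ be a regular domain. The exterior unit normal $\nu$ provides an
orientation $\mathbf{o}$ for fr $D$, called the positive orientation. For each $x \in \operatorname{fr} D$, $\mathbf{o}(x)$
is the $(n - 1)$-vector such that $\nu(x) = *\mathbf{o}(x)$. Let us write $\partial D^+$, instead of
$(\operatorname{fr} D)^{\mathbf{o}}$, for fr $D$ with this orientation. (The symbol $\partial$ is widely used to denote a
boundary.)

Divergence theorem (Second version). Let $D$ be a regular domain and $\omega$ an
$(n - 1)$-form of class $C^{(1)}$ on cl $D$. Then

$$\int_{\partial D^+} \omega = \int_{D^+} d\omega. \tag{8.18}$$

PROOF. Consider the 1-form $\zeta = *\omega$. By Formulas (7.56) and (7.59),
$$\omega(x) \cdot \mathbf{o}(x) = *\omega(x) \cdot *\mathbf{o}(x) = \zeta(x) \cdot \nu(x),$$
$$d\omega = \operatorname{div}\zeta\, dx^1 \wedge \cdots \wedge dx^n.$$
Therefore

$$\int_{\partial D^+} \omega = \int_{\operatorname{fr} D} \omega \cdot \mathbf{o}\, dV_{n-1} = \int_{\operatorname{fr} D} \zeta \cdot \nu\, dV_{n-1},$$
$$\int_{D^+} d\omega = \int_{D^+} \operatorname{div}\zeta\, dx^1 \wedge \cdots \wedge dx^n = \int_D \operatorname{div}\zeta\, dV_n.$$

Then (8.18) follows from (8.7). □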

One can characterize the orientation $\mathbf{o}(x)$ in terms of frames. Let
$(h_1, \ldots, h_{n-1})$ be a frame for the tangent space to the $(n - 1)$-manifold
fr $D$ at $x$. This frame determines the orientation $\mathbf{o}(x)$ if $(\nu(x), h_1, \ldots, h_{n-1})$ is a
positively oriented frame for $E^n$; otherwise it determines the orientation
$-\mathbf{o}(x)$.

The case n = 2
Suppose that $\operatorname{fr} D = C_1 \cup \cdots \cup C_m$, where each $C_k$ is the trace of a simple
closed curve $\gamma_k$, and $C_1, \ldots, C_m$ are disjoint. The orientation is chosen by
selecting the unit tangent vector $\tau(x, y)$ so that $(\nu(x, y), \tau(x, y))$ is a positively
oriented orthonormal frame for $E^2$. Intuitively speaking, this means that as
the boundary is traversed, $D$ is always on the left. Then

$$\int_{\partial D^+} \omega = \int_{\gamma_1} \omega + \cdots + \int_{\gamma_m} \omega.$$

If we write $\omega = M\, dx + N\, dy$, then (8.18) becomes

$$\sum_{k=1}^m \int_{\gamma_k} M\, dx + N\, dy = \int_{D^+} \left(\frac{\partial N}{\partial x} - \frac{\partial M}{\partial y}\right) dx \wedge dy. \tag{8.19}$$

This is known as Green's theorem (see Figure 8.13).

Figure 8.13

EXAMPLE 3. Let $\omega = \tfrac{1}{2}(x\, dy - y\, dx)$. Then $d\omega = dx \wedge dy$ and
$V_2(D) = \int_{D^+} dx \wedge dy$. Hence the area of $D$ can be written as an integral over the
boundary:

$$V_2(D) = \frac{1}{2} \int_{\partial D^+} x\, dy - y\, dx.$$
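
The boundary formula for area lends itself to a short numerical experiment. The sketch below approximates $\tfrac{1}{2}\int_{\partial D^+} x\,dy - y\,dx$ by summing over short chords of the boundary curve (which amounts to the polygon "shoelace" formula) and compares the result with the known area of an ellipse. The function name, the parametrization on $[0, 1]$, and the semi-axes 3 and 2 are illustrative assumptions.

```python
import math

def area_from_boundary(curve, n=2000):
    """Approximate (1/2) * integral of (x dy - y dx) over a closed curve.

    `curve` maps a parameter s in [0, 1] to a point (x, y), with curve(0) equal
    to curve(1) and the region kept on the left (counterclockwise traversal).
    """
    total = 0.0
    for k in range(n):
        x0, y0 = curve(k / n)
        x1, y1 = curve((k + 1) / n)
        # Contribution (1/2)(x dy - y dx) of the chord from (x0, y0) to (x1, y1).
        total += 0.5 * (x0 * y1 - x1 * y0)
    return total

def ellipse(s, a=3.0, b=2.0):
    """Counterclockwise parametrization of the ellipse x^2/a^2 + y^2/b^2 = 1."""
    angle = 2.0 * math.pi * s
    return a * math.cos(angle), b * math.sin(angle)

if __name__ == "__main__":
    print(area_from_boundary(ellipse), math.pi * 3.0 * 2.0)
```

Reversing the direction of traversal changes the sign of the result, in line with the dependence of the integral on orientation.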

PROBLEMS

1. Let $\Delta \subset E^n$ have the positive orientation and let $g$ be a regular flat transformation
from $\Delta$ into $E^n$.
(a) Show that $g$ induces the positive orientation on $g(\Delta)$ if and only if $Jg(t) > 0$
for every $t \in \Delta$.
(b) Show that (8.16) becomes

$$\int_{A^+} f\, dx^1 \wedge \cdots \wedge dx^n = \int_{B^+} f \circ g\, Jg\, dt^1 \wedge \cdots \wedge dt^n,$$

provided $Jg(t) > 0$ for every $t \in \Delta$.

2. Let $A = \{(x, y, z) : y = x^2 + z^2,\ y \le 4\}$, oriented so that $o^{31}(x) > 0$. Evaluate:
(a) $\int_{A^{\mathbf{o}}} z\, dx \wedge dy$. (b) $\int_{A^{\mathbf{o}}} \exp y\, dz \wedge dx$.
[Hint: Use polar coordinates in the $(x, z)$-plane.]

3. Let $A$ be the triangle in $E^3$ with vertices $e_1$, $-e_2$, $2e_3$.
(a) Show that $\mathbf{o} = \tfrac{1}{3}(2e_{23} - 2e_{31} + e_{12})$ is an orientation for the plane containing $A$.
(b) Evaluate $\int_{A^{\mathbf{o}}} x\, dy \wedge dz$. [Hint: Take $g$ affine such that $g(\Sigma) = A$, where $\Sigma$ is the
standard 2-simplex.]
4. Let $A = \{(x, y, z) : x^2 + y^2 = z^2,\ x > 0,\ 0 < z < 1\}$, oriented so that $o^{12}(x) < 0$.
Evaluate

$$\int_{A^{\mathbf{o}}} z^2\, dy \wedge dz.$$

5. Let $n = 4$ and $M = \{x : (x^1)^2 + (x^2)^2 = 1,\ (x^3)^2 + (x^4)^2 = 1\}$. Let
$g(s, t) = (\cos s)e_1 + (\sin s)e_2 + (\cos t)e_3 + (\sin t)e_4$, $0 \le s, t \le 2\pi$.
(a) Find the orientation $\mathbf{o}$ induced by $g$ from the positive orientation of $E^2$.
(b) Evaluate

$$\int_{M^{\mathbf{o}}} dx^3 \wedge dx^4 + x^1 x^3\, dx^1 \wedge dx^4.$$

6. Let $A = g(B)$, where $g$ is as in Problem 2, Section 8.6, and $\mathbf{o}$ the orientation induced
by $g$. Using (8.16), show that

$$\int_{A^{\mathbf{o}}} P(x, y, z)\, dx \wedge dy = \int_B P(x, y, \phi(x, y))\, dV_2(x, y).$$

7. Let $D \subset E^2$ be a regular domain. Show that:
(a) $V_2(D) = -\int_{\partial D^+} y\, dx$.
(b) $\int_D (x^2 + y^2)\, dV_2 = \tfrac{1}{3} \int_{\partial D^+} x^3\, dy - y^3\, dx$.

8. Evaluate $\int_{\partial\Sigma^+} y^2\, dx \wedge dz$, where $\Sigma$ is the standard 3-simplex.

9. Let $D$ be the disk $x^2 + y^2 < 1$ and $\omega = (x\, dy - y\, dx)/(x^2 + y^2)$. Then $\int_{\partial D^+} \omega = 2\pi$
while $\int_{D^+} d\omega = 0$. Why does this not contradict Green's theorem?
10. Let $n = 4$ and $D = \{x : (x^1)^2 + (x^2)^2 + (x^3)^2 < (x^4)^2,\ 0 < x^4 < 1\}$. Evaluate:
(a) $\int_{\partial D^+} (x^1 + x^4)\, dx^1 \wedge dx^2 \wedge dx^3$.
(b) $\int_{\partial D^+} |x|^2\, dx^1 \wedge dx^2 \wedge dx^3$.

11. Suppose that $D = \{(x, y) : f(x) < y < g(x),\ a < x < b\} = \{(x, y) : \phi(y) < x < \psi(y),\ c < y < d\}$.
Show directly from the fundamental theorem of calculus and properties
of line integrals that

$$\int_{\partial D^+} N\, dy = \int_D \frac{\partial N}{\partial x}\, dV_2, \qquad \int_{\partial D^+} M\, dx = -\int_D \frac{\partial M}{\partial y}\, dV_2.$$

Adding, we get Green's theorem for regular domains of this special type (see
Figure 8.14).

Figure 8.14

12. The winding number of a closed curve $\gamma$ in $E^2$ about a point $(x_0, y_0)$ not in the trace
of $\gamma$ is
$$w(x_0, y_0) = \frac{1}{2\pi} \int_\gamma \frac{(x - x_0)\, dy - (y - y_0)\, dx}{(x - x_0)^2 + (y - y_0)^2}.$$
Let $\gamma$ be the positively oriented boundary of a regular domain $D$.
(a) Show that $w(x_0, y_0) = 1$ if $(x_0, y_0) \in D$. [Hint: Apply Green's theorem to
$$D_\epsilon = \{(x, y) \in D : (x - x_0)^2 + (y - y_0)^2 > \epsilon^2\},$$
where $\epsilon < \operatorname{dist}[(x_0, y_0), \operatorname{fr} D]$. Note that $m = 2$ in Formula (8.19).]
(b) Show that $w(x_0, y_0) = 0$ if $(x_0, y_0) \notin \operatorname{cl} D$.

8.8 Stokes's formula


The divergence theorem is a special case of a result which is nowadays
called Stokes's formula. Let $\omega$ be an $(r - 1)$-form. Stokes's formula equates
the integral of $d\omega$ over a portion $A$ of an oriented $r$-manifold $M$ and the
integral of $\omega$ over the (suitably oriented) boundary of $A$.
Let us begin with the following particular case and afterward generalize.
Let $B \subset E^r$ be a regular domain, and let $A = g(B)$ where $g$ is a regular
transformation of class $C^{(2)}$ from some open set containing cl $B$ into $M$. Let $\mathbf{o}$ be
the orientation induced on $A$ from the positive orientation of $E^r$. The set
$K = g(\operatorname{fr} B)$ is the boundary of $A$ relative to $M$.
Let us quote two results whose proofs are somewhat technical and are
deferred to Section 8.9. When $r = 2$, these results can also be proved directly
(Problem 7). The set $K$ is an $(r - 1)$-manifold by Theorem 8.3. We orient $K$
in a manner consistent with the orientation $\mathbf{o}$ on $A$, as follows. Let $\alpha$ be the
positive orientation on fr $B$, and let $\beta_1(x) = L_{r-1}[\alpha(t)]$, where $x = g(t)$,
$L = Dg(t)$, and $L_{r-1}$ is the induced linear transformation of $(r - 1)$-vectors
(Section 7.6). The desired orientation on $K$ is $\beta = |\beta_1|^{-1}\beta_1$. Let $\partial A^{\mathbf{o}}$ denote
$K$ with this orientation.
Let $\omega$ be an $(r - 1)$-form of class $C^{(1)}$ on cl $A$. By Proposition 8.12,

$$\int_{\partial A^{\mathbf{o}}} \omega = \int_{\partial B^+} \omega^\#.$$

On the other hand, by (8.16),

$$\int_{A^{\mathbf{o}}} d\omega = \int_{B^+} (d\omega)^\# = \int_{B^+} d(\omega^\#).$$

By (8.18), the right-hand sides are equal. Therefore we have

Stokes's formula:

$$\int_{\partial A^{\mathbf{o}}} \omega = \int_{A^{\mathbf{o}}} d\omega. \tag{8.20}$$

The case r = 2, n = 3
Here $\omega = P\, dx + Q\, dy + R\, dz$ is a 1-form and $d\omega$ is a 2-form. In this case,
(8.20) equates the integral of $d\omega$ over an oriented surface $A$ in $E^3$ with the
integral over the boundary of $A$ oriented consistently with the orientation on
$A$. Figure 8.15 illustrates a case when $\partial A^{\mathbf{o}}$ is a single simple closed curve $\gamma$.
The set $A$ is the image under $g$ of $B$, and the orientation $\mathbf{o}$ on $A$ is that induced
by $g$ from the positive orientation on $B$. The curve $\lambda$ bounding $B$ is taken
onto $\gamma$ by $g$, with corresponding orientations.

Figure 8.15

EXAMPLE 1. Let $A$ be the portion of the hyperboloid $x^2 = y^2 + z^2 + 1$ in
Example 2, Section 8.7. Then $\gamma$ is a circle in the plane $x = \sqrt{2}$, represented
by $f(\theta) = g(1, \theta) = \sqrt{2}\,e_1 + (\cos\theta)e_2 + (\sin\theta)e_3$, $0 \le \theta \le 2\pi$. Let
$\eta = y\, dy \wedge dz$. Then $\eta = d\omega$, where $\omega = \tfrac{1}{2}y^2\, dz$. By Stokes's formula,

$$\int_{A^{\mathbf{o}}} y\, dy \wedge dz = \frac{1}{2} \int_\gamma y^2\, dz = \frac{1}{2} \int_0^{2\pi} \cos^2\theta\, d(\sin\theta) = 0.$$

At the last step we used $\cos^2\theta = 1 - \sin^2\theta$. In this example, one also has
$\int_\gamma \omega = \int_{\partial B^+} \omega^\#$, where $\partial B^+$ is the perimeter of the rectangle $B = (0, 1) \times (0, 2\pi)$
in the $(r, \theta)$ plane traversed counterclockwise. We evaluated $\int_\gamma \omega$ by integrating
along the segment $\lambda$ where $r = 1$, which corresponds to $\gamma$. The reader should
verify that the rest of $\partial B^+$ corresponds to an arc on $M$ traversed once in each
direction. Since a line integral of $\omega$ changes sign when direction is reversed,
the contribution from the rest of $\partial B^+$ is 0 (Problem 3).
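
Both sides of Stokes's formula for this portion of the hyperboloid can be compared numerically. The sketch below uses the pullback $(dy \wedge dz)^\# = r\, dr \wedge d\theta$ from Example 2, Section 8.7, and treats 1-forms of the shape $c(y, z)\, dz$ only. Besides the form $\omega = \tfrac{1}{2}y^2\, dz$ of the example (where both sides vanish), it also tries $\omega = y\, dz$, which is an extra illustrative choice not made in the text; function names are likewise illustrative.

```python
import math

def surface_integral(coeff, n=400):
    """Integral over A of coeff(y, z) dy ^ dz, computed on B = (0,1) x (0,2*pi)
    using the pullback (dy ^ dz)^# = r dr ^ dtheta (midpoint rule)."""
    hr, ht = 1.0 / n, 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * hr
        for j in range(n):
            theta = (j + 0.5) * ht
            total += coeff(r * math.cos(theta), r * math.sin(theta)) * r
    return total * hr * ht

def boundary_integral(coeff, n=4000):
    """Integral of coeff(y, z) dz along gamma: theta -> (sqrt 2, cos theta, sin theta)."""
    ht = 2.0 * math.pi / n
    total = 0.0
    for j in range(n):
        theta = (j + 0.5) * ht
        y, z = math.cos(theta), math.sin(theta)
        total += coeff(y, z) * math.cos(theta)  # dz = cos(theta) dtheta
    return total * ht

if __name__ == "__main__":
    # omega = (1/2) y^2 dz, so d(omega) = y dy ^ dz (the form of the example).
    print(surface_integral(lambda y, z: y), boundary_integral(lambda y, z: 0.5 * y * y))
    # omega = y dz, so d(omega) = dy ^ dz (illustrative extra case).
    print(surface_integral(lambda y, z: 1.0), boundary_integral(lambda y, z: y))
```

In the second case both integrals should be close to $\pi$, the area of the unit disk in the $(y, z)$ plane.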

When $r = 2$, $n = 3$, Stokes's formula can be rewritten in vector-analysis
notation. The 1-form $*d\omega$ is called curl $\omega$, and the vector $n(x) = *\mathbf{o}(x)$ is a
unit normal to $A$. Since $d\omega(x) \cdot \mathbf{o}(x) = \operatorname{curl}\omega(x) \cdot n(x)$, Formula (8.20)
becomes

$$\int_{\partial A^{\mathbf{o}}} \omega = \int_A \operatorname{curl}\omega(x) \cdot n(x)\, dV_2(x). \tag{8.21}$$

The name Stokes's formula was traditionally applied to (8.21) and not its
generalization (8.20).

The normal $n(x)$ varies continuously on $A$. At a boundary point $x$ of $A$,
$n(x)$ can be visualized in the following way. Let $x = g(s, t)$, where $(s, t) \in \operatorname{fr} B$.
Let $\nu$ be the exterior normal and $\tau$ the positively oriented unit tangent vector
to $\lambda$ at $(s, t)$. The vector $h = Dg(s, t)(\tau)$ is a tangent vector to $\gamma$ at $x$. If
$h_0 = Dg(s, t)(\nu)$, then $(h_0, h)$ is a frame for the tangent space to $M$ at $x$ and has the
required orientation $\mathbf{o}(x)$. Hence $(n(x), h_0, h)$ is a positively oriented frame for
$E^3$ (see Figure 8.15).
In particular, let $A$ lie in a plane $\Pi$, oriented by a unit vector $n_0$ normal to
$\Pi$. Then $n(x) = n_0$ for every $x \in A$. Let $x_0$ be a point of the domain of $\omega$.
If $A$ contains $x_0$ and $A$ has small diameter, then the right-hand side of (8.21)
is approximately $\operatorname{curl}\omega(x_0) \cdot n_0\, V_2(A)$. More precisely,

$$\operatorname{curl}\omega(x_0) \cdot n_0 = \lim_{\operatorname{diam} A \to 0} \frac{1}{V_2(A)} \int_\gamma \omega. \tag{8.22}$$

This is proved using a lemma similar to that for the proof of the corresponding
formula (8.8) for the divergence. Note that $\operatorname{curl}\omega = 0$ if and only if $\omega$ is
closed ($d\omega = 0$).
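
The limit formula for the curl can be illustrated in the plane $z = 0$ with $n_0 = e_3$, where $\operatorname{curl}\omega \cdot e_3 = \partial Q/\partial x - \partial P/\partial y$ for $\omega = P\, dx + Q\, dy$. The sketch below divides the circulation around a small circle by its area. The sample form $\omega = -x^2 y\, dx + x y^2\, dy$, the base point $(1, 2)$, and the function name are illustrative assumptions; for this form the curl component is $x^2 + y^2$, which equals 5 at the base point.

```python
import math

def circulation_over_area(P, Q, x0, y0, rho, n=2000):
    """Circulation of omega = P dx + Q dy around the circle of radius rho about
    (x0, y0), traversed counterclockwise, divided by the area of the disk."""
    dtheta = 2.0 * math.pi / n
    total = 0.0
    for k in range(n):
        theta = (k + 0.5) * dtheta
        x = x0 + rho * math.cos(theta)
        y = y0 + rho * math.sin(theta)
        dx = -rho * math.sin(theta) * dtheta
        dy = rho * math.cos(theta) * dtheta
        total += P(x, y) * dx + Q(x, y) * dy
    return total / (math.pi * rho * rho)

def P(x, y):
    return -x * x * y

def Q(x, y):
    return x * y * y

if __name__ == "__main__":
    for rho in (1.0, 0.1, 0.01):
        print(rho, circulation_over_area(P, Q, 1.0, 2.0, rho))
```

As the radius shrinks, the printed ratios should approach 5.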
*Note on fluid flow. In Section 8.5 we defined the circulation $c$ of a fluid
along a curve $\gamma$ lying in the region $D_0$ of flow. We saw that $c = \int_\gamma \zeta$, where $\zeta$ is
a 1-form with the same components as the velocity vector. By Stokes's
formula, $c = \int_{A^{\mathbf{o}}} d\zeta$ if $\gamma = \partial A^{\mathbf{o}}$ and $A \subset D_0$. We call the flow irrotational if
$d\zeta = 0$. Thus irrotational flows are those for which the circulation is 0 along
any simple closed curve $\gamma$ that bounds such a surface $A$.
Some generalizations. Let $M$ be an orientable manifold of class $C^{(2)}$.
We stated Stokes's formula above in case cl $A$ is contained in some coordinate
patch. By using partitions of unity, this restriction can be removed.

Proposition 8.10. Let $M$ be compact and $\mathbf{o}$ an orientation for $M$. Then

$$\int_{M^{\mathbf{o}}} d\omega = 0$$
for every $(r - 1)$-form $\omega$ of class $C^{(1)}$ on $M$.
PROOF. Let $\{\pi_1, \ldots, \pi_m\}$ be a partition of unity for $M$. Let $g^{(k)}$ be a regular
transformation of class $C^{(2)}$ from an open set $\Delta_k \subset E^r$ onto a coordinate patch
$S_k$ containing the support of $\pi_k$. Then

$$\int_{M^{\mathbf{o}}} d(\pi_k\omega) = \pm\int_{\Delta_k^+} [d(\pi_k\omega)]^\# = \pm\int_{\Delta_k^+} d[(\pi_k\omega)^\#] = 0,$$

by the divergence theorem and the fact that $(\pi_k\omega)^\# = 0$ near fr $\Delta_k$ since
$\pi_k(x) = 0$ outside a compact subset of $S_k$. Since $\sum_k \pi_k = 1$ and
$\sum_k d\pi_k = d(\sum_k \pi_k) = 0$,

$$0 = \sum_k \int_{M^{\mathbf{o}}} d(\pi_k\omega) = \int_{M^{\mathbf{o}}} \Big(\sum_k \pi_k\Big)\, d\omega + \int_{M^{\mathbf{o}}} \Big(\sum_k d\pi_k\Big) \wedge \omega = \int_{M^{\mathbf{o}}} d\omega. \qquad \square$$

Since $M$ has empty boundary relative to itself, one would expect to obtain
0 on the left-hand side of (8.20) when $A = M$. Proposition 8.10 states that
this is correct.
Now let $M$ be any orientable $r$-manifold of class $C^{(2)}$. Let us call a relatively
open set $A \subset M$ a regular domain on $M$ if:
(1) cl $A$ is a compact subset of $M$;
(2) the boundary $K$ of $A$ relative to $M$ is an $(r - 1)$-manifold of class $C^{(2)}$;
(3) $A$ is the interior, relative to $M$, of cl $A$.
Condition (3) rules out oddities of the sort in Example 1, Section 8.4, in
which $A$ is on both sides of $K$. Problem 8 gives a condition equivalent to (3),
stated in terms of coordinate systems on $M$.
Let $\mathbf{o}$ be an orientation for $M$. If $S$ is a coordinate patch that intersects $K$,
then $K \cap S$ can be oriented consistently with $\mathbf{o}$, as explained earlier in the
present section. It can be shown that these orientations agree in overlapping
coordinate patches. This defines an orientation $\beta$ on $K$. Let $K$ with orientation
$\beta$ be denoted by $\partial A^{\mathbf{o}}$.
Theorem 8.1. Let $A$ be a regular domain on $M$, and let $\omega$ be an $(r - 1)$-form
of class $C^{(1)}$ on cl $A$. Then

$$\int_{\partial A^{\mathbf{o}}} \omega = \int_{A^{\mathbf{o}}} d\omega. \tag{8.20}$$

This theorem can be proved using the divergence theorem and a partition
of unity in much the same way as for Proposition 8.10. We do not give the
details.

We have assumed that $M$ is of class $C^{(2)}$, but Theorem 8.1 is still valid for
manifolds of class $C^{(1)}$. Moreover, the relative boundary $K$ may be piecewise
of class $C^{(1)}$ in the sense explained at the end of Section 8.4. For instance, if
$M$ is an $r$-plane and $A$ an $r$-simplex contained in $M$, then the boundary of $A$
relative to $M$ is piecewise of class $C^{(1)}$.

PROBLEMS

1. Let $\omega = yz\, dx + x\, dy + dz$. Let $\gamma$ be the unit circle in the $xy$-plane, oriented in the
counterclockwise direction. Calculate $\int_\gamma \omega$ and $\int_{A^{\mathbf{o}}} d\omega$ and verify that they are equal,
where the orientation $\mathbf{o}$ is chosen so that $\partial A^{\mathbf{o}} = \gamma$ and:
(a) $A$ is the disk $x^2 + y^2 < 1$ in the $xy$-plane.
(b) $A = \{(x, y, 1 - x^2 - y^2) : x^2 + y^2 < 1\}$.
2. Let $\omega = z\exp(-y)\, dx + z\, dy + y\, dz$. Evaluate $\int_{A^{\mathbf{o}}} d\omega$ when $A$ is:
(a) The ellipsoid $x^2/a^2 + y^2/b^2 + z^2/c^2 = 1$ oriented by the exterior normal.
(b) The square with vertices $0$, $e_1 + e_2$, $\sqrt{2}\,e_3$, $e_1 + e_2 + \sqrt{2}\,e_3$, oriented so that
$o^{23}(x) > 0$.
(c) The paraboloid $y = x^2 + z^2$ oriented so that $o^{31}(x) > 0$.

3. In Example 1, show that $\int_{\partial B^+} \omega^\# = \int_\lambda \omega^\#$.

4. Let $M = \operatorname{fr} D$, where $D$ is a regular domain in $E^n$. Show that $\int_M (*d\omega) \cdot \nu\, dV_{n-1} = 0$
if $\omega$ is an $(n - 2)$-form of class $C^{(1)}$ on $M$.

5. Prove the following:

$$d\omega(x_0) \cdot \alpha_0 = \lim_{\operatorname{diam} A \to 0} [V_r(A)]^{-1} \int_{\partial A^{\alpha_0}} \omega,$$

where $x_0 \in A$ and $A$ lies in an $r$-plane $\Pi$ oriented by $\alpha_0$.

6. Let $\alpha$ be the $r$-vector of an $r$-simplex $S_0$ and $\beta_0, \beta_1, \ldots, \beta_r$ the $(r - 1)$-vectors of its
oriented faces (Problem 12, Section 7.5). Show that

$$d\omega(x_0) \cdot \alpha = \sum_{i=0}^{r} (-1)^i\, \omega(x_0) \cdot \beta_i.$$

[Hint: Consider simplexes $S$ similar to $S_0$ and containing $x_0$. Apply Problem 5
with $A = S$.]
7. Let $r = 2$, and $A$, $B$, $K$, $\omega$ as in the discussion preceding (8.20).
(a) Show that $K$ is a 1-manifold.
(b) Show that $\int_{\partial A^{\mathbf{o}}} \omega = \int_{\partial B^+} \omega^\#$.
8. Show that condition (3) in the definition of regular domain $A$ is equivalent to the
following property. Let $S$ be any coordinate patch on $M$ such that $S \cap K$ is not empty.
Let $B = g^{-1}(A)$, where $g$ is regular from an open set $\Delta$ onto $S$. Then for any
$t_0 \in (\operatorname{fr} B) \cap \Delta$ there exist a neighborhood $U$ of $t_0$ and $\Phi$ of class $C^{(1)}$ on $U$ such that
$\operatorname{grad}\Phi(t) \ne 0$ for every $t \in U$ and $(\operatorname{fr} B) \cap U = \{t \in U : \Phi(t) = 0\}$,
$B \cap U = \{t \in U : \Phi(t) < 0\}$. (cf. Section 8.4.)


8.9 Regular transformations of submanifolds


Throughout the present chapter we consider regular transformations from
$\Delta \subset E^r$ into an $r$-manifold $M$, but there remains the following question.
Given a transformation $g$ from $\Delta$ into $E^n$, under what conditions is $g(\Delta)$ an
$r$-manifold if we no longer assume that $g(\Delta)$ is contained in an $r$-manifold $M$?

Proposition 8.11. Let $g$ be of class $C^{(1)}$ from an open set $\Delta \subset E^r$ into $E^n$. If
$Dg(t_0)$ has rank $r$, then there exists a neighborhood $\Omega$ of $t_0$ such that $g(\Omega)$
is an $r$-manifold.

PROOF. Since $Dg(t_0)$ has rank $r$, there exists $\lambda = (i_1, \ldots, i_r)$ such that
$Jg^\lambda(t_0) \ne 0$, where $g^\lambda = (g^{i_1}, \ldots, g^{i_r})$. By the inverse function theorem, $t_0$
has a neighborhood $\Omega$ such that $g^\lambda|\Omega$ is a regular flat transformation and
$R_0 = g^\lambda(\Omega)$ is open. Then $g|\Omega = G \circ (g^\lambda|\Omega)$, with $G$ as in Figure 8.3
(Section 8.1), and $g(\Omega) = G(R_0)$ is an $r$-manifold. □
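
The first step of the proof, locating an increasing $r$-tuple $\lambda$ with $Jg^\lambda(t_0) \ne 0$, is a finite search over the $r \times r$ minors of the matrix of $Dg(t_0)$. The sketch below carries out that search for a matrix supplied as a list of rows (row $i$ holding the partial derivatives of $g^i$); the function names, the tolerance, and the sample matrices are illustrative assumptions.

```python
from itertools import combinations

def det(m):
    """Determinant by Laplace expansion along the first row (adequate for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def nonzero_minor(dg, tol=1e-12):
    """Given the n x r matrix dg of Dg(t0), return (lambda, Jg^lambda) for the first
    increasing r-tuple lambda (1-based indices) whose minor is nonzero, or None
    if every r x r minor vanishes, i.e. if the rank is less than r."""
    n, r = len(dg), len(dg[0])
    for lam in combinations(range(n), r):
        value = det([dg[i] for i in lam])
        if abs(value) > tol:
            return tuple(i + 1 for i in lam), value
    return None

if __name__ == "__main__":
    # Dg(0, 0) for the illustrative map g(s, t) = (s^2, s, t).
    print(nonzero_minor([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]))
    # A rank-1 matrix: no 2 x 2 minor is nonzero.
    print(nonzero_minor([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]))
```

For the first matrix the search skips $\lambda = (1, 2)$ and $(1, 3)$, whose minors vanish, and stops at $\lambda = (2, 3)$.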

Definition. A transformation $g$ from an open set $\Delta$ into $E^n$ is open if $g(\Delta_1)$ is a
relatively open subset of $g(\Delta)$ for any open set $\Delta_1 \subset \Delta$.

Theorem 8.2. Let $g$ satisfy conditions (1), (2), and (3) in the definition of regular
transformation (Section 8.1). Moreover, let $g$ be open. Then $g(\Delta)$ is an
$r$-manifold.

PROOF. By Proposition 8.11, each $t_0 \in \Delta$ has a neighborhood $\Omega$ such that
$g(\Omega)$ is an $r$-manifold. Since $g$ is an open transformation, $g(\Omega)$ is a relatively
open subset of $g(\Delta)$. Hence $g(\Delta)$ is an $r$-manifold. □

Corollary. If $g$ satisfies (1), (2), and (3) and $g^{-1}$ is continuous, then $g(\Delta)$ is an
$r$-manifold.
PROOF. Let $\Delta_1 \subset \Delta$ be open. Then $g(\Delta_1) = (g^{-1})^{-1}(\Delta_1)$ is open relative to
$g(\Delta)$ by Theorem 2.6. Therefore, $g$ is an open transformation. □

Problem 3, Section 8.1, shows that $g(\Delta)$ need not be an $r$-manifold if
merely (1), (2), and (3) are assumed.
In this section we give conditions under which the image of a manifold $N$
under a transformation $g$ is a manifold, and under which the inverse image
of a manifold is a manifold. In particular, we are concerned with manifolds
of lower dimension $s$, where $1 \le s < r$ and the domain of $g$ is an open subset
of $E^r$. Many of the results have already been quoted in earlier sections,
especially Section 8.8, when $s = r - 1$.
In showing that sets are manifolds, we use the following observation,
which is an immediate consequence of the definition of manifold in Section
4.7. A set $P \subset E^n$ is an $s$-manifold if any $x_0 \in P$ is contained in a set $Q$ open
relative to $P$, such that $Q$ is an $s$-manifold.

Theorem 8.3. Let $g$ be regular from an open set $\Delta \subset E^r$ into an $r$-manifold $M$.
Let $N \subset \Delta$ and $P = g(N)$. Let $1 \le s < r$. Then
(a) $P$ is an $s$-manifold if and only if $N$ is an $s$-manifold.
(b) If $N$ is an $s$-manifold and $x = g(t)$, $t \in N$, the tangent space $T_P(x)$ is the
image under $Dg(t)$ of $T_N(t)$.

PROOF. We divide the proof into three parts.


(i) Suppose that N is an s-manifold. Let Xo E P. Xo = g(t o) and let I: be a
coordinate patch on N containing to. There exists an open set r c E'
and a regular transformation", from r onto I:. Let f = go"'. Since
Df(t) = Dg(t) a D"'(t). if t = "'(t),
and Dg(t), D"'(t) have maximum rank, Df(t) also has maximum rank s.
By Propositions 8.3 and 8.4. g(Ll) is a relatively open subset of M, and
g-I is continuous. Similarly, I: is a relatively open subset of N, and
",-I is continuous. Let PI = g(I:). Then PI is open relative to P = g(N),
since g is a homeomorphism. Moreover, f- I = ",-I a (g-II PI) is
continuous. By the corollary above. f(r) is an s-manifold. But PI = f(o.
Since any Xo E P lies in such a relatively open set PI' P is an s-manifold.
(ii) Continuing the notation in (i). let", I' .... "'. be the partial derivatives of
"'. Given t = "'(t), let
k, = "'I(t). hi = Dg(tHk,), I = ], ...• s.
Since f = go"" the composite function theorem implies that hi = f ,(.).
By Proposition 8.1. the vectors k l •...• k. form a basis for T~t) and
hi, ... ,h. a basis for Tp(x), if x = g(t). This proves (b).
(iii) It remains to show that $N$ is an $s$-manifold if $P$ is an $s$-manifold. Let
$t_0 \in N$, $x_0 = g(t_0)$. Choose $\lambda = (i_1, \ldots, i_r)$ and a coordinate patch $S$
on $M$ as in the proof of Proposition 8.3. Let $Q$ be a coordinate patch on
$P$ containing $x_0$, with $Q \subset S$, and $\chi$ a regular transformation from an
open set $B \subset E^s$ onto $Q$. Let $\chi^\lambda = X^\lambda \circ \chi$; in other words, $\chi^\lambda =
(\chi^{i_1}, \ldots, \chi^{i_r})$. Let $T = X^\lambda(Q) = \chi^\lambda(B)$. Now $\chi^\lambda$ is $C^{(1)}$ and univalent. The
differential $D\chi^\lambda(\tau)$ has maximum rank $s$ (Problem 1). Since $\chi^{-1}$ and
$G = (X^\lambda | S)^{-1}$ are continuous, $(\chi^\lambda)^{-1}$ is continuous. By the corollary,
$T$ is an $s$-manifold. Let $\Omega$ be a neighborhood of $t_0$ such that $g^\lambda | \Omega$ is
regular, from $\Omega$ onto an open set $R_1 = g^\lambda(\Omega)$; see the proof of Proposition
8.3. Let $C = \{x : x^\lambda \in R_1\}$. Then $C$ is open; by choosing $\Omega$ small enough,
we may assume $P \cap C \cap S = Q \cap C$. The set $N_1 = \Omega \cap N \cap g^{-1}(C \cap S)$
is open relative to $N$. Moreover, $N_1 = (g^\lambda | \Omega)^{-1}(T \cap R_1)$. Since $(g^\lambda | \Omega)^{-1}$
is regular and $T \cap R_1$ is an $s$-manifold, the corollary implies that $N_1$ is
an $s$-manifold. Since any $t_0 \in N$ lies in such a set $N_1$, $N$ is an $s$-manifold. $\square$
If $\alpha$ is an orientation for $N$, then $g$ induces an orientation $\beta$ on $P$ as follows.
Let $x = g(t)$, $t \in N$, and let $\beta_1(x) = L_s[\alpha(t)]$, where $L_s$ is the linear transformation
of $s$-vectors induced by $L = Dg(t)$. We take $\beta(x) = |\beta_1(x)|^{-1}\beta_1(x)$.

Proposition 8.12. Let $g$, $N$, and $P$ be as in Theorem 8.3. Let $\alpha$ be an orientation
for the $s$-manifold $N$ and $\beta$ the induced orientation on $P$. Let $Y$ be an $s$-measurable
subset of $P$, and $\omega$ an $s$-form. Then

(8.23)  $\int_{Y^\beta} \omega = \int_{Z^\alpha} \omega^{\#}$,  $Y = g(Z)$,

provided either integral exists.


PROOF. First let us consider the case when $Z \subset \Sigma$, with $\Sigma$ a coordinate patch
on $N$. Then $\Sigma = \psi(\Gamma)$, with $\Gamma$ an open subset of $E^s$ and $\psi$ regular from $\Gamma$ onto
$\Sigma$. Let $f = g \circ \psi$, as in the proof of Theorem 8.3. Then $f$ is regular from $\Gamma$
onto $g(\Sigma)$. Let us suppose that $\alpha$ is the orientation on $\Sigma$ induced by $\psi$ from
the positive orientation on $E^s$. Then $\beta$ is the orientation on $g(\Sigma)$ induced
by $f$ from the positive orientation on $E^s$. Let us write $\omega_g^{\#}$ to denote the
dependence on $g$ of $\omega^{\#}$. Then $\omega_f^{\#} = (\omega_g^{\#})_\psi^{\#}$. Let $Z = \psi(W)$. By Proposition 8.9,

$\int_{Z^\alpha} \omega_g^{\#} = \int_{W^+} \omega_f^{\#} = \int_{Y^\beta} \omega.$

If $\alpha$ is induced by $\psi$ from the negative orientation, then we replace $W^+$ by
$W^-$. This proves (8.23) if $Z$ is contained in a coordinate patch.
The general case can be reduced to this one by introducing a partition of
unity. One requires that each function $\pi_k$ of the partition of unity has support
in an open set $U_k$ such that $g^{-1}(U_k \cap P)$ is contained in some coordinate
patch on $N$. $\square$

PROBLEMS

1. Let $\chi^\lambda = X^\lambda \circ \chi$, as in Part (iii) of the proof of Theorem 8.3. Show that $D\chi^\lambda(\tau)$ has
rank $s$ for any $\tau \in B$. Hint. $[DG(x^\lambda)]^{-1} = DX^\lambda(x) \,|\, T_M(x)$.
2. Let $Q$ and $M$ be $r$-manifolds, with $Q \subset E^m$, $M \subset E^n$. Let $g$ be a transformation of
class $C^{(1)}$ on an open set containing $Q$, with $g(Q) \subset M$. Call $g | Q$ regular if $g | Q$ is
univalent and $Dg(t) \,|\, T_Q(t)$ has rank $r$ for each $t \in Q$. Let $N \subset Q$ and $P = g(N)$. Prove
that statements (a) and (b) in Theorem 8.3 remain true provided $g | Q$ is regular.

8.10 Closed and exact differential forms


Any exact differential form $\omega = d\eta$ is closed, provided $\eta$ is of class $C^{(2)}$. This
is a consequence of the formula $d(d\eta) = 0$ in Section 7.4. Whether, conversely,
every closed form $\omega$ is exact depends on the topological nature of the domain
$D$ of $\omega$. In this section we give two sufficient conditions that every closed
$r$-form with domain $D$ be exact. The first is that $D$ be simply connected and
applies when $r = 1$. The second is that $D$ be star-shaped and applies for any
degree $r$.
The results in this section do not depend on the rest of Chapter 8, except
for the use of Green's theorem in the easy special case of a rectangle.

Homotopies
Let $f$ and $g$ be transformations of class $C^{(2)}$ from a set $B \subset E^m$ into a set
$A \subset E^n$. We are interested in whether it is possible to interpolate smoothly
in $A$ between $f$ and $g$. If this is possible then $f$ and $g$ are called homotopic in $A$.
To state this more precisely, let us consider the subset $[0, 1] \times B$ of $E^{m+1}$.

Definition. If there is a transformation $H$ of class $C^{(2)}$ on $[0, 1] \times B$ such that
$H(s, t) \in A$ for every $(s, t) \in [0, 1] \times B$ and $H(0, t) = f(t)$, $H(1, t) = g(t)$
for every $t \in B$, then $f$ and $g$ are homotopic in $A$.

In the usual definition of homotopy in topology, $H$ is required to be merely
continuous. What we call homotopy is then called a homotopy of class $C^{(2)}$.

EXAMPLE 1. Let $A$ be convex. Then we may take

$H(s, t) = s\,g(t) + (1 - s)\,f(t).$

Therefore, any two transformations $f$ and $g$ of class $C^{(2)}$ with values in a convex
set $A$ are homotopic in $A$. In particular, this is true when $A = E^n$.
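
The straight-line homotopy of Example 1 can be sketched numerically. This is only an illustration and not part of the original text: the helper name `straight_line_homotopy` and the two sample curves with values in the unit disk are assumptions chosen for the demonstration.

```python
import math

def straight_line_homotopy(f, g):
    """Return H with H(s, t) = s*g(t) + (1 - s)*f(t), computed componentwise."""
    def H(s, t):
        return tuple(s * gi + (1.0 - s) * fi for fi, gi in zip(f(t), g(t)))
    return H

# Two sample curves with values in the closed unit disk (a convex set).
def f(t):
    return (0.5 * math.cos(t), 0.5 * math.sin(t))

def g(t):
    return (math.cos(2 * t), 0.0)

H = straight_line_homotopy(f, g)
```

Because the disk is convex, every intermediate point `H(s, t)` with `0 <= s <= 1` stays in the disk, which is exactly the property the example relies on.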

To define simple connectedness one may take $B$ to be a circle. However,
instead of a circle it is more convenient to let $B$ be an interval $[a, b]$ with the
endpoints identified. Let $f$ and $g$ be transformations from $[a, b]$ into $A$ such
that $f(a) = f(b)$ and $g(a) = g(b)$. Then $f$ and $g$ are strictly homotopic in $A$ if
the homotopy $H$ in the definition above can be chosen so that $H(s, a) =
H(s, b)$ for every $s \in [0, 1]$.
If $\partial H/\partial t \ne 0$, then for each $s$ the transformation $H(s, \cdot)$ represents on
$[a, b]$ a closed curve $\gamma_s$ of class $C^{(2)}$ in the sense of Section 6.2. Intuitively, one
may regard a homotopy as a smooth interpolation by the curves $\gamma_s$ between the
curve $\gamma_0$ represented by $f$ and the curve $\gamma_1$ represented by $g$. However, for
technical reasons it is disadvantageous to include the condition $\partial H/\partial t \ne 0$
in the definition of homotopy.

Definition. If $g$ is strictly homotopic in $A$ to a constant transformation $f$,
then $g$ is null homotopic in $A$.

If $f(t) = x_0$ for every $t \in [a, b]$, then one should think intuitively that
$\gamma_s$ shrinks to the point $x_0$ as $s \to 0^+$. When $A$ is an open subset of $E^2$ this is
possible, roughly speaking, provided $\gamma_s$ does not loop around any holes which
may be present in $A$. In Figure 8.16, $A$ has two holes and the curves $\gamma_s$ in the
figure are not null homotopic in $A$.


Figure 8.16

Let $D$ be an open set, and $\omega$ a 1-form with domain $D$. Let us set

$\langle g, \omega \rangle = \int_a^b \omega[g(t)] \cdot g'(t)\,dt.$

In case $g'(t) \ne 0$, $\langle g, \omega \rangle$ is just another notation for the line integral of $\omega$ along
the curve represented by $g$.

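Proposition 8.13. Let $\omega$ be closed. If $f$ and $g$ are strictly homotopic in $D$, then
$\langle f, \omega \rangle = \langle g, \omega \rangle$.

PROOF. Let $\omega^{\#}$ be the 1-form on the rectangle $R = [0, 1] \times [a, b]$ induced
by the transformation $H$. Since $d\omega = 0$, $d\omega^{\#} = (d\omega)^{\#} = 0$. By Green's
theorem,

$\int_{\partial R^+} \omega^{\#} = \int_R d\omega^{\#} = 0.$

The integral over $\partial R^+$ is the sum of the integrals over the four segments
$\lambda_1, \ldots, \lambda_4$ indicated in Figure 8.17. Now

$\omega^{\#} = \sum_{i=1}^n \omega_i \circ H\,(dx^i)^{\#} = \sum_{i=1}^n \omega_i \circ H \left( \frac{\partial H^i}{\partial s}\,ds + \frac{\partial H^i}{\partial t}\,dt \right),$

so that

$\int_{\lambda_2} \omega^{\#} = \int_a^b \sum_{i=1}^n \omega_i \circ H\,\frac{\partial H^i}{\partial t}\,dt,$

$H$ and $\partial H^i/\partial t$ being evaluated at $(1, t)$. Since $H(1, t) = g(t)$, the right-hand
side is just $\langle g, \omega \rangle$. Similarly, since $H(0, t) = f(t)$,

$\int_{\lambda_4} \omega^{\#} = -\langle f, \omega \rangle.$

Since $H(s, a) = H(s, b)$, the integrals over $\lambda_1$ and $\lambda_3$ cancel:

$\int_{\lambda_1} \omega^{\#} + \int_{\lambda_3} \omega^{\#} = 0.$

Therefore $\langle g, \omega \rangle - \langle f, \omega \rangle = \int_{\partial R^+} \omega^{\#} = 0$. $\square$

Figure 8.17 (the rectangle $R$ with sides $\lambda_1$ at $t = a$, $\lambda_2$ at $s = 1$, $\lambda_3$ at $t = b$, $\lambda_4$ at $s = 0$)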

EXAMPLE 2. Let $n = 2$ and let $D$ be the plane with $(0, 0)$ removed. Let $\omega =
(x\,dy - y\,dx)/(x^2 + y^2)$. Formally, $\omega = d\Theta$, where $\Theta(x, y)$ is the angle from
the positive $x$-axis to $(x, y)$, $0 < \Theta(x, y) < 2\pi$. However, $\Theta$ is defined only in
the plane with a slit removed even though $\omega$ is defined and of class $C^{(\infty)}$ in $D$.
For each integer $m \ne 0$ let $g_m(t) = (\cos mt)e_1 + (\sin mt)e_2$, $0 \le t \le 2\pi$. Then
$\langle g_m, \omega \rangle = 2m\pi$, which shows that $g_m$ and $g_l$ are not strictly homotopic in $D$
when $m \ne l$. The transformation $g_m$ represents the unit circle traversed $|m|$
times, counterclockwise if $m > 0$ and clockwise if $m < 0$.
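
The value $\langle g_m, \omega \rangle = 2m\pi$ in Example 2 can be checked numerically. The sketch below is an illustration only: the function names `pairing` and `circle`, the midpoint rule, and the off-center test circle are assumptions made for the demonstration, not part of the text.

```python
import math

def pairing(g, dg, a, b, n=2000):
    """Midpoint-rule approximation of <g, omega>, the integral over [a, b] of
    omega[g(t)] . g'(t) dt, for omega = (x dy - y dx) / (x^2 + y^2)."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        x, y = g(t)
        dx, dy = dg(t)
        total += (x * dy - y * dx) / (x * x + y * y) * h
    return total

def circle(m, center=(0.0, 0.0)):
    """g_m(t) = center + (cos mt, sin mt), returned together with its derivative."""
    cx, cy = center
    g = lambda t: (cx + math.cos(m * t), cy + math.sin(m * t))
    dg = lambda t: (-m * math.sin(m * t), m * math.cos(m * t))
    return g, dg

for m in (-2, 1, 3):
    g, dg = circle(m)
    # The ratio to 2*pi should be close to the integer m.
    print(m, pairing(g, dg, 0.0, 2 * math.pi) / (2 * math.pi))
```

A unit circle centered at $(3, 0)$ does not loop around the removed origin, and the same routine returns a value close to 0 for it, in agreement with Corollary 1.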

Proposition 8.13 has the following corollaries.

Corollary 1. If $g$ is null homotopic in $D$, then $\langle g, \omega \rangle = 0$.

PROOF. If $f$ is constant, then $\langle f, \omega \rangle = 0$. $\square$

Definition. An open set $D$ is simply connected if every transformation $g$ of
class $C^{(2)}$ from an interval $[a, b]$ into $D$, satisfying $g(a) = g(b)$, is null
homotopic in $D$.

Roughly speaking, $D$ is simply connected if every closed curve in $D$ can
be shrunk in $D$ to a point. When $D \subset E^2$ this amounts to saying that $D$ "has
no holes." Removal of a single point, as in Example 2, must be counted as
introducing a hole.
If $D = \{x \in E^3 : |x| > 1\}$, then $D$ is simply connected, yet $D$ has a "hole."

Theorem 8.4. If $D$ is a simply connected open subset of $E^n$, then every closed
1-form with domain $D$ is exact.
PROOF. By Theorem 6.1, it suffices to show that $\int_\gamma \omega = 0$ for every piecewise
smooth closed curve $\gamma$ lying in $D$. Let $g$ be a representation of such a curve $\gamma$
on $[0, 1]$, such that $g$ is piecewise of class $C^{(1)}$. There is a sequence $g_1, g_2, \ldots$ of
transformations of class $C^{(\infty)}$ on $[0, 1]$ such that: (1) $g_m(0) = g_m(1)$ for $m =
1, 2, \ldots$; (2) $g_m(t) \to g(t)$ for every $t \in [0, 1]$, and $g_m'(t) \to g'(t)$ except at the
(finitely many) points of discontinuity of $g'$, as $m \to \infty$; (3) $|g_m(t)|$ and $|g_m'(t)|$
are bounded by some number $C$. Such a sequence can be found by a standard
smoothing technique (Problem 4). By Lebesgue's dominated convergence
theorem, $\langle g_m, \omega \rangle \to \langle g, \omega \rangle$ as $m \to \infty$. By Problem 7, Section 5.11, $g_m$
tends to $g$ uniformly on $[0, 1]$. Since $g([0, 1])$ is a compact subset of the open
set $D$, there exists $m_1$ such that $g_m([0, 1]) \subset D$ for all $m \ge m_1$. By Corollary 1,
$\langle g_m, \omega \rangle = 0$ for each $m \ge m_1$. Therefore $\langle g, \omega \rangle = \int_\gamma \omega = 0$. $\square$

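Let us turn to the question of finding a condition on $D$ that ensures that
any closed form of arbitrary degree $r$ is exact. For this purpose, let $B$ be an
open subset of $E^m$. Let us introduce an operation which changes any $r$-form
$\eta$ of class $C^{(1)}$ on $[0, 1] \times B$ into an $(r - 1)$-form of class $C^{(1)}$ on $B$. The latter
form is denoted by $\int_0^1 \eta$. If $r = 1$, then $\eta = f\,ds + \eta_1$, where $\eta_1$ involves the
differentials $dt^1, \ldots, dt^m$. In this case $\int_0^1 \eta = \int_0^1 f(s, \cdot)\,ds$, which is of class $C^{(1)}$
on $B$. Next, if $\eta = ds \wedge \theta = \sum_{[\mu]} \theta_\mu\,ds \wedge dt^{j_1} \wedge \cdots \wedge dt^{j_{r-1}}$, then we set

(8.24)  $\int_0^1 ds \wedge \theta = \sum_{[\mu]} \left( \int_0^1 \theta_\mu\,ds \right) dt^{j_1} \wedge \cdots \wedge dt^{j_{r-1}}.$

Finally, any $r$-form $\eta$ on $[0, 1] \times B$ can be written $\eta = ds \wedge \theta + \eta_1$, where
$\eta_1$ involves only the differentials $dt^1, \ldots, dt^m$:

$\eta_1 = \sum_{[\lambda]} \eta_\lambda\,dt^{i_1} \wedge \cdots \wedge dt^{i_r}.$

We set $\int_0^1 \eta = \int_0^1 ds \wedge \theta$. Using the rules for exterior differentiation, we find
that

$d\eta = d(ds \wedge \theta) + d\eta_1 = -ds \wedge d'\theta + ds \wedge \frac{\partial \eta_1}{\partial s} + d'\eta_1,$

where $d'$ denotes the differential with respect to $t$ of a form on $[0, 1] \times B$ and
the components of the $r$-form $\partial \eta_1/\partial s$ are the partial derivatives $\partial \eta_\lambda/\partial s$.
Therefore

(*)  $\int_0^1 d\eta = -\int_0^1 ds \wedge d'\theta + \int_0^1 ds \wedge \frac{\partial \eta_1}{\partial s} = -\int_0^1 ds \wedge d'\theta + \eta_1(1) - \eta_1(0),$

where $\eta_1(s)$ is the $r$-form on $B$ with coefficients $\eta_\lambda(s, \cdot)$. Differentiating under
the integral sign,

$\frac{\partial}{\partial t^j} \int_0^1 f\,ds = \int_0^1 \frac{\partial f}{\partial t^j}\,ds, \quad j = 1, \ldots, m,$

provided $f$ is of class $C^{(1)}$. Hence

$d \int_0^1 f\,ds = \sum_{j=1}^m \left( \frac{\partial}{\partial t^j} \int_0^1 f\,ds \right) dt^j = \int_0^1 ds \wedge d'f.$

Applying this in (8.24), if $\eta$ is of class $C^{(1)}$ we get

(**)  $d \int_0^1 \eta = \sum_{[\mu]} \int_0^1 ds \wedge d'\theta_\mu \wedge dt^{j_1} \wedge \cdots \wedge dt^{j_{r-1}} = \int_0^1 ds \wedge \Big( \sum_{[\mu]} d'\theta_\mu \wedge dt^{j_1} \wedge \cdots \wedge dt^{j_{r-1}} \Big) = \int_0^1 ds \wedge d'\theta.$

From (*) and (**) we get

(8.25)  $\int_0^1 d\eta + d \int_0^1 \eta = \eta_1(1) - \eta_1(0).$

Now let $\omega$ be an $r$-form of class $C^{(1)}$ on $A$. Let $H$ be a homotopy between
transformations $f$ and $g$, and let $\omega_f^{\#}$, $\omega_g^{\#}$, $\omega_H^{\#}$ denote the $r$-forms induced
respectively by $f$, $g$, and $H$. Let $\eta = \omega_H^{\#}$. Then $\eta_1(1) = \omega_g^{\#}$ and $\eta_1(0) = \omega_f^{\#}$.
Therefore

(8.26)  $\int_0^1 d\omega_H^{\#} + d \int_0^1 \omega_H^{\#} = \omega_g^{\#} - \omega_f^{\#}.$

With this formula we can readily deduce a result about closed forms,
called Poincaré's lemma.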

Definition. A set $A$ is star-shaped if there is a point $x_0 \in A$ such that for
every $x \in A$ the line segment joining $x_0$ and $x$ is contained in $A$ (Figure 8.18).

Figure 8.18

Poincaré's lemma. Let $D$ be a star-shaped open set and let $1 \le r \le n$. Then
every closed $r$-form with domain $D$ is exact.
PROOF. Let $x_0$ be a point with respect to which $D$ is star-shaped. Let $f(x) = x_0$,
$g(x) = x$, $H(s, x) = x_0 + s(x - x_0)$, $B = D$. (This homotopy merely shrinks
$D$ radially to the point $x_0$.) Then $\omega_g^{\#} = \omega$, and since $r > 0$ and $df = 0$,
$\omega_f^{\#} = 0$. Since $\omega$ is closed, $d\omega_H^{\#} = (d\omega)_H^{\#} = 0$. Let $\zeta = \int_0^1 \omega_H^{\#}$. Then by (8.26),
$d\zeta = \omega$. $\square$
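
For $r = 1$ the construction in the proof reduces to the formula $\zeta(x) = \int_0^1 \omega[x_0 + s(x - x_0)] \cdot (x - x_0)\,ds$, which can be tried on a concrete form. The sketch below is illustrative only: the function name, the use of Simpson's rule, and the sample closed 1-form $(2xy + 1)\,dx + (x^2 + 3y^2)\,dy$ on $E^2$ are assumptions chosen for the demonstration.

```python
def primitive_of_closed_1form(omega, x0, n=200):
    """Return zeta with zeta(x) = integral over [0, 1] of
    omega(x0 + s (x - x0)) . (x - x0) ds, using Simpson's rule (n must be even)."""
    def zeta(x):
        d = [xi - ci for xi, ci in zip(x, x0)]
        def integrand(s):
            point = [ci + s * di for ci, di in zip(x0, d)]
            return sum(w * di for w, di in zip(omega(point), d))
        h = 1.0 / n
        total = integrand(0.0) + integrand(1.0)
        for i in range(1, n):
            total += (4 if i % 2 else 2) * integrand(i * h)
        return total * h / 3.0
    return zeta

# Sample closed 1-form on E^2: omega = (2xy + 1) dx + (x^2 + 3y^2) dy.
def omega(p):
    x, y = p
    return (2 * x * y + 1, x * x + 3 * y * y)

zeta = primitive_of_closed_1form(omega, x0=(0.0, 0.0))
```

For this sample form the result agrees with $x^2 y + x + y^3$, whose differential is the given form; the plane is star-shaped with respect to the origin, so the lemma applies.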
Note. Poincaré's lemma gives only a sufficient condition on $D$ that every
closed form be exact. A necessary and sufficient condition can be obtained
from de Rham's theorem [21, Chapter IV; 24, Chapter IV].
Let us state without proof the following version of the theorem. Let
$\mathcal{Z}^r(D)$ denote the set of all closed $r$-forms of class $C^{(\infty)}$ on $D$. If $\omega$ and $\zeta$ are
closed, then $\omega + \zeta$ is closed and $c\omega$ is closed for any scalar $c$. Thus $\mathcal{Z}^r(D)$ is
a vector space over $E^1$. Similarly, let $\mathcal{E}^r(D)$ denote the vector space consisting
of all exact $r$-forms of the type $\omega = d\zeta$ where $\zeta$ is of class $C^{(\infty)}$ on $D$. Then
$\mathcal{E}^r(D) \subset \mathcal{Z}^r(D)$. According to de Rham's theorem, the quotient vector space
$\mathcal{H}^r(D) = \mathcal{Z}^r(D)/\mathcal{E}^r(D)$ is isomorphic to the $r$-dimensional cohomology group
of $D$ with real coefficients. (The homology and cohomology groups of a space
are defined in algebraic topology. They contain a great deal of topological
information about the space.) In particular, every closed $r$-form is exact if
and only if $\mathcal{H}^r(D) = 0$.

PROBLEMS

1. Let $D$ be the solid torus obtained by rotating the circular disk $(y - a)^2 + z^2 < b^2$,
$0 < b < a$, about the $z$-axis. Let $\gamma$ be the circular path traversed by the center of the
disk. Show that $\int_\gamma (x\,dy - y\,dx)/(x^2 + y^2) \ne 0$. Hence by Corollary 1, $\gamma$ is not null
homotopic in $D$.
2. Let $S$ be the sphere $x^2 + y^2 + z^2 = a^2$, oriented by the unit exterior normal. Let
$\omega = \rho^{-3}(x\,dy \wedge dz + y\,dz \wedge dx + z\,dx \wedge dy)$, $\rho^2 = x^2 + y^2 + z^2$,
the domain of $\omega$ being $E^3 - \{0\}$. Show that:
(a) $\omega$ is closed.
(b) $\int_S \omega = 4\pi$, with $S$ oriented as above. [Hint: Find $*\omega(x) \cdot \nu(x)$, where $\nu(x)$ is the exterior normal.]
(c) $E^3 - \{0\}$ is simply connected.
3. Let $D$ be star-shaped and let $\tilde{D} = g(D)$, where $g$ is a regular flat transformation
of class $C^{(2)}$. Show that every closed form with domain $\tilde{D}$ is exact.
4. For $m = 1, 2, \ldots$ let $h_m$ be a function of class $C^{(\infty)}$ on $E^1$ such that $h_m \ge 0$,
$\int_{-\infty}^{\infty} h_m\,dx = 1$, $h_m(x) = 0$ whenever $|x| \ge 1/m$. [For instance, we may take
$h_m(x) = m\,h(mx)$, where $h$ is as in Section 3.4.] Let $\psi$ be a piecewise continuous
function on $E^1$ which is periodic of period 1. Let
$\psi_m(x) = \int_{-\infty}^{\infty} \psi(y) h_m(x - y)\,dy = \int_{-\infty}^{\infty} \psi(x + z) h_m(z)\,dz$. Show that:
(a) If $|\psi(x)| \le C$ for every $x$, then $|\psi_m(x)| \le C$ for every $x$ and $m = 1, 2, \ldots$.
(b) $\psi_m$ is of class $C^{(\infty)}$ and of period 1, $m = 1, 2, \ldots$.
(c) If $\int_0^1 \psi\,dx = 0$, then $\int_0^1 \psi_m\,dx = 0$, $m = 1, 2, \ldots$.
(d) At each point $x_0$ of continuity of $\psi$, $\psi_m(x_0) \to \psi(x_0)$ as $m \to \infty$. [Hint:
$\psi_m(x_0) - \psi(x_0) = \int_{-1/m}^{1/m} [\psi(x_0 + z) - \psi(x_0)] h_m(z)\,dz$.]

[Note: In the proof of Theorem 8.4, let $\psi$ be a periodic extension of $(g^i)'$, and let
$g_m^i(t) = g^i(0) + \int_0^t \psi_m(x)\,dx$, $i = 1, \ldots, n$.]

8.11 Motion of a particle


In this section we consider the motion of a particle in $E^3$, subject to a force $\mathbf{F}$
of the type (8.28). It is shown that energy is conserved, and that angular
momentum is also conserved if (8.30) holds. Kepler's laws of motion of the planets
about the sun are derived from these two conservation principles.
Let $\mathbf{x} = (x, y, z)$ denote a possible position of the particle. We denote time by
$t$, and by $\mathbf{x}(t)$ the position of the particle at time $t$. [Note: In the notation of
earlier sections the position would be denoted by $g(t)$. It should be clear from
the context whether $\mathbf{x}$ stands for a point in $E^3$ or a function of $t$.] Following
traditional notation in mechanics, time derivatives are denoted by a dot. Thus
$\dot{\mathbf{x}}(t) = d\mathbf{x}/dt$. This is the velocity vector, $\mathbf{v} = \dot{\mathbf{x}}$.
Similarly, $\dot{\mathbf{v}} = \ddot{\mathbf{x}}$ is the acceleration vector. Let $m$ denote the mass of the
particle. Then Newton's law states that
(8.27)  $m\ddot{\mathbf{x}} = \mathbf{F}$,
where $\mathbf{F}$ is the force vector. A general model for the motion of particles
assumes that $\mathbf{F} = \mathbf{F}(t, \mathbf{x}, \mathbf{v})$ depends on time, position, and velocity. However,
we consider here only the case when $\mathbf{F} = \mathbf{F}(\mathbf{x})$. Moreover, we assume that
(8.28)  $\mathbf{F}(\mathbf{x}) = -\operatorname{grad} U(\mathbf{x})$,
where $U$ is a function of class $C^{(2)}$ on an open set $D \subset E^3$. $D$ is the set of possible
positions $\mathbf{x}$ for the particle. $U(\mathbf{x})$ is called the potential energy of position $\mathbf{x}$.
The kinetic energy is $\tfrac{1}{2} m |\mathbf{v}|^2$. The total energy of a particle in position $\mathbf{x}$ with
velocity $\mathbf{v}$ is
(8.29)  $E = \tfrac{1}{2} m |\mathbf{v}|^2 + U(\mathbf{x})$.
Since $\dot{\mathbf{x}} = \mathbf{v}$, $\ddot{\mathbf{x}} = \dot{\mathbf{v}}$, we have
$\dot{E} = m\,\mathbf{v} \cdot \dot{\mathbf{v}} + \operatorname{grad} U \cdot \mathbf{v} = \mathbf{v} \cdot (m\dot{\mathbf{v}} - \mathbf{F}) = 0.$

Thus $E[\mathbf{x}(t)]$ is constant. In other words, total energy is conserved during
motion of the particle. This property is not sufficient to determine the motion.
To proceed further, one may seek other quantities that are conserved.
Let us now suppose that potential energy depends only on the distance
$r$ from the origin $0$:

(8.30)  $U(\mathbf{x}) = u(r)$, $r = |\mathbf{x}|$.

The vector $\mathbf{p} = m\mathbf{v}$ is called the momentum of the particle. The basic equation
(8.27) can be rewritten as $\dot{\mathbf{p}} = \mathbf{F}$. The 2-vector $\mathbf{x} \wedge \mathbf{p}$ is called the angular
momentum of the particle (about $0$). Using the product rule, one gets

(8.31)  $\frac{d}{dt}(\mathbf{x} \wedge \mathbf{p}) = \mathbf{x} \wedge \mathbf{F}(\mathbf{x})$

(see Problem 1). The product $\mathbf{x} \wedge \mathbf{F}(\mathbf{x})$ is called the torque. When (8.30) holds,

$\mathbf{x} \wedge \mathbf{F}(\mathbf{x}) = \mathbf{x} \wedge \left( -\frac{u'(r)}{r}\,\mathbf{x} \right) = -\frac{u'(r)}{r}\,\mathbf{x} \wedge \mathbf{x} = 0.$

Thus, angular momentum is constant during motion of the particle. This result
has the following two consequences.

(1) The particle moves in a plane $P$.

In fact, if $\alpha = \mathbf{x} \wedge \mathbf{p}$, then $P$ is the 2-dimensional vector subspace of $E^3$
that is the 2-space of $\alpha$ (Section 7.5).
Let us make a rotation in $E^3$ such that $P$ is the $(x, y)$ plane $E^2$. From now
on, we suppose that the particle moves in $E^2$. In terms of polar coordinates
$(r, \theta)$,

$\mathbf{x} = r[(\cos\theta)e_1 + (\sin\theta)e_2],$
$\mathbf{p} = m\dot{r}[(\cos\theta)e_1 + (\sin\theta)e_2] + mr[(-\sin\theta)e_1 + (\cos\theta)e_2]\dot{\theta},$
$\mathbf{x} \wedge \mathbf{p} = mr^2\dot{\theta}\,e_{12}.$

Since angular momentum is conserved, $mr^2\dot{\theta} = A$, for the constant $A =
|\mathbf{x} \wedge \mathbf{p}|$. We assume $A > 0$, and therefore $\dot{\theta} > 0$. The case $A < 0$ is entirely
similar; see Problem 2 for the exceptional case $A = 0$. Since $\dot{\theta} > 0$, $\theta$ is an
increasing function of $t$.

Figure 8.19

Consider two times $t_1$, $t_2$, with $0 < t_2 - t_1$ and $\theta(t_2) - \theta(t_1) \le 2\pi$. Let
$\gamma$ denote the curve along which the particle moves for $t_1 \le t \le t_2$, and $B$ the
region shown in Figure 8.19. Then using Green's theorem,

$\frac{1}{2} \int_{t_1}^{t_2} r^2 \dot{\theta}\,dt = \frac{1}{2} \int_{\partial B^+} (x\,dy - y\,dx) = \int_B dx\,dy.$

The right-hand side is the area $V_2(B)$. Since $r^2\dot{\theta} = A/m$,

(8.32)  $\frac{A(t_2 - t_1)}{2m} = V_2(B).$

This expression depends only on the difference $t_2 - t_1$. Thus we have:

(2) The particle sweeps out area at a constant rate.
This is Kepler's second law.
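
Both conservation laws can be observed in a numerical experiment. The sketch below is an illustration under stated assumptions, not a method from the text: it integrates $m\ddot{\mathbf{x}} = -\operatorname{grad} U$ for the sample potential $u(r) = -k/r$ with the velocity-Verlet scheme, and the function name `simulate`, the parameter values, and the step size are all hypothetical choices.

```python
import math

def simulate(m=1.0, k=1.0, x=(1.0, 0.0), v=(0.0, 0.8), dt=5e-4, steps=10000):
    """Velocity-Verlet integration of m x'' = -grad U with U = u(r) = -k/r in the plane.
    Returns the total energy E and A = m (x1 v2 - x2 v1) recorded at every step."""
    def force(p):
        r = math.hypot(p[0], p[1])
        c = -k / r**3          # F(x) = -(u'(r)/r) x with u'(r) = k / r^2
        return (c * p[0], c * p[1])

    def energy(p, w):
        return 0.5 * m * (w[0]**2 + w[1]**2) - k / math.hypot(p[0], p[1])

    def angular_momentum(p, w):
        return m * (p[0] * w[1] - p[1] * w[0])

    f = force(x)
    energies, momenta = [energy(x, v)], [angular_momentum(x, v)]
    for _ in range(steps):
        v_half = (v[0] + 0.5 * dt * f[0] / m, v[1] + 0.5 * dt * f[1] / m)
        x = (x[0] + dt * v_half[0], x[1] + dt * v_half[1])
        f = force(x)
        v = (v_half[0] + 0.5 * dt * f[0] / m, v_half[1] + 0.5 * dt * f[1] / m)
        energies.append(energy(x, v))
        momenta.append(angular_momentum(x, v))
    return energies, momenta

energies, momenta = simulate()
# Spread of E and of A along the computed motion; both should be small.
print(max(energies) - min(energies), max(momenta) - min(momenta))
```

The constant value of $A$ divided by $2m$ is the rate at which area is swept out, as in (8.32). The energy is conserved only up to the discretization error of the scheme, which shrinks with the step size.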
We have not yet used the fact that total energy $E$ is constant during motion
of the particle. For motion in $E^2$, $|\mathbf{v}|^2 = \dot{r}^2 + r^2\dot{\theta}^2$. Since $\dot{\theta} = A(mr^2)^{-1}$, we
have from (8.29) and (8.30),

$E = \frac{m\dot{r}^2}{2} + \frac{A^2}{2mr^2} + u(r).$

This is a first-order differential equation for $r(t)$. Let us suppose that $\phi(r) =
A^2(2mr^2)^{-1} + u(r)$ has the shape indicated in Figure 8.20.

Figure 8.20

Now $(m/2)\dot{r}^2 = E - \phi(r)$, and hence $\phi(r) \le E$. Consider the case $E < E_0$,
and $r_1$, $r_2$ the two solutions of $\phi(r) = E$ (Figure 8.20). Then $r_1 \le r(t) \le r_2$.
We seek those solutions for which $\dot{r}(t) = 0$ only for isolated values of $t$. By
separation of variables, one can express $t = t(r)$ as a function of $r$ on each
time interval where $r(t)$ is monotone. In fact, on such an interval

$t(r) - t(r_0) = \pm \left( \frac{m}{2} \right)^{1/2} \int_{r_0}^{r} \frac{dy}{(E - \phi(y))^{1/2}}.$

Moreover, for any $r_0$ in this interval, $r(t)$ is a periodic function with period
$T/2$, where

(8.33)  $T = 4 \left( \frac{m}{2} \right)^{1/2} \int_{r_1}^{r_2} \frac{dy}{(E - \phi(y))^{1/2}}.$

The integral converges since $\phi'(r) \ne 0$ for $r = r_1, r_2$. On each time interval
where $r(t)$ is monotone, we can write $\theta(t) = \Theta[r(t)]$. Using the facts that
$\dot{\theta} = \Theta'[r(t)]\dot{r}$ and $\dot{\theta} = A(mr^2)^{-1}$,

(8.34)  $\Theta(r) = \pm \left( \frac{1}{2m} \right)^{1/2} \int_{r_0}^{r} \frac{A\,dy}{y^2 (E - \phi(y))^{1/2}} + \Theta(r_0).$

We leave the details of this argument to the reader (Problem 3). When
$E < E_0$, the particle moves periodically in orbits of period $T$. If $E \ge E_0$, the
particle escapes toward infinity as $t \to \infty$ (Problem 4).
Let us consider the important particular case $u(r) = -kr^{-1}$, where $k$ is a
positive constant. This corresponds to the inverse square law of gravitational
attraction. In this case,

$\phi(r) = \frac{A^2}{2mr^2} - \frac{k}{r}$

and $E_0 = 0$. Let us take $r_0 = r_1$. By a rotation of axes, we may suppose that
$\Theta(r_1) = 0$. The integral in (8.34) can be evaluated to give

$\Theta(r) = \arccos \frac{(A/r) - (mk/A)}{(2mE + (m^2k^2/A^2))^{1/2}}.$

If we take

$c = \frac{A^2}{mk}, \quad e = \left( 1 + \frac{2EA^2}{mk^2} \right)^{1/2},$

then the particle moves on the curve

(8.35)  $\frac{c}{r} = 1 + e\cos\theta.$

This is the equation of a conic section (Kepler's first law). For $E < 0$, $e < 1$.
In this case the curve is an ellipse. For $E = 0$ and $E > 0$, the curve is a parabola
and hyperbola, respectively. When $E < 0$, the minor and major semiaxes
of the ellipse are

$r_1 = \frac{c}{(1 - e^2)^{1/2}} = \frac{A}{(2m|E|)^{1/2}}, \quad r_2 = \frac{c}{1 - e^2} = \frac{k}{2|E|}.$

By Equation (8.32), with $B$ the interior of the ellipse and $t_2 - t_1 = T$,

$\frac{AT}{2m} = \pi r_1 r_2 = \pi r_2 \frac{A}{(2m|E|)^{1/2}}.$

Therefore

$T = 2\pi (r_2)^{3/2} \left( \frac{m}{k} \right)^{1/2}.$

This is Kepler's third law, stating that the period is proportional to the linear
dimension to the power 3/2.
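
Kepler's third law can be compared with a direct quadrature. The sketch below is an illustration, not part of the text. It computes by separation of variables the time for $r$ to travel from its smallest to its largest value and back, namely $2(m/2)^{1/2}\int dy/(E - \phi(y))^{1/2}$ taken between the two turning points, and compares it with $2\pi a^{3/2}(m/k)^{1/2}$, where $a = k/(2|E|)$ is the major semiaxis. The function name, the substitution $y = c - d\cos\vartheta$ used to tame the endpoint singularities, and the parameter values are assumptions.

```python
import math

def radial_round_trip_time(phi, E, r_lo, r_hi, m=1.0, n=4000):
    """2 (m/2)^(1/2) times the integral over [r_lo, r_hi] of dy / (E - phi(y))^(1/2).
    The substitution y = c - d cos(theta) removes the endpoint singularities,
    and the midpoint rule never evaluates the integrand at the turning points."""
    c, d = 0.5 * (r_lo + r_hi), 0.5 * (r_hi - r_lo)
    h = math.pi / n
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * h
        y = c - d * math.cos(theta)
        total += d * math.sin(theta) / math.sqrt(E - phi(y)) * h
    return 2.0 * math.sqrt(m / 2.0) * total

# Hypothetical sample values with E < 0 (bounded motion).
m, k, A, E = 1.0, 1.0, 0.8, -0.68
phi = lambda r: A * A / (2.0 * m * r * r) - k / r
e = math.sqrt(1.0 + 2.0 * E * A * A / (m * k * k))
c = A * A / (m * k)
r_min, r_max = c / (1.0 + e), c / (1.0 - e)   # turning points, where phi(r) = E
semi_major = k / (2.0 * abs(E))

T_numeric = radial_round_trip_time(phi, E, r_min, r_max, m)
T_kepler = 2.0 * math.pi * semi_major**1.5 * math.sqrt(m / k)
print(T_numeric, T_kepler)   # the two values should agree closely
```

The turning points are read off from (8.35) at $\theta = 0$ and $\theta = \pi$, so the quadrature uses no information about the period beyond the energy equation.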

PROBLEMS

1. (a) Let $\mathbf{f}$ and $\mathbf{g}$ be functions of class $C^{(1)}$ on $[a, b]$. Show that

$\frac{d}{dt}(\mathbf{f} \wedge \mathbf{g}) = \frac{d\mathbf{f}}{dt} \wedge \mathbf{g} + \mathbf{f} \wedge \frac{d\mathbf{g}}{dt}.$

(b) Derive (8.31).
2. Suppose that (8.28) and (8.30) hold and that angular momentum is 0.
(a) Show that the particle moves on a straight line through $0$.
(b) Let $u(r) = r + r^{-1}$. Show that the motion is periodic for any energy level $E > 2$.
3. Let $\phi(r)$ be as in Figure 8.20, $A > 0$, and $T$ as in (8.33). Show that:
(a) The distance $r(t)$ from $0$ oscillates between $r_1$ and $r_2$, taking time $T/4$ to go from
$r_1$ to $r_2$ or vice versa.
(b) The angle $\theta(t)$ is periodic with period $T$.
4. Let $E \ge E_0$ in Figure 8.20, and let $r_1$ be the unique $r$ such that $\phi(r) = E$. Show that:
(a) If $\dot{r} > 0$ initially, then $r(t)$ is increasing and $r(t) \to \infty$ as $t \to \infty$.
(b) If $\dot{r} < 0$ initially, then $r(t)$ first decreases to $r_1$ and afterward increases
to $\infty$ as in (a).

8.12 Motion of several particles


An important part of classical theoretical mechanics concerns the motion of
$N$ particles, $N \ge 2$, subject to forces depending on the positions of all of the
particles. The motion may be subject to certain physical restrictions. Thus,
let $\mathbf{x}_1, \ldots, \mathbf{x}_N$ denote the positions of the particles, with $\mathbf{x}_j \in E^3$ for each
$j = 1, \ldots, N$. Let us suppose that $\mathbf{x} = (\mathbf{x}_1, \ldots, \mathbf{x}_N)$ is restricted to belong to
a manifold $M \subset E^{3N}$. Let $n$ be the dimension of $M$. The equations of motion
have the form
(8.36)  $m_j \ddot{\mathbf{x}}_j = \mathbf{F}_j(\mathbf{x})$, $j = 1, \ldots, N$,
where $m_1, \ldots, m_N$ are the masses of the particles. The force functions
$\mathbf{F}_1, \ldots, \mathbf{F}_N$ cannot be arbitrary, since the particles must move in $M$.
Let $S \subset M$ be a coordinate patch. For simplicity we consider only
motions during which $\mathbf{x}$ remains in $S$. We find an analog of the potential
energy function $U$ in Section 8.11, and rewrite equations (8.36) in terms of a
coordinate system for $S$. We denote the coordinates of $\mathbf{x}$ in this system by
$q = (q^1, \ldots, q^n)$. Thus $\mathbf{x} = G(q)$, where $G$ is a regular transformation from
an open set $\Delta \subset E^n$ into $M$. At time $t$, the coordinates of the position $\mathbf{x}(t)$
are $q(t) = G^{-1}[\mathbf{x}(t)]$. The position of the $j$th particle is $\mathbf{x}_j(t) = G_j[q(t)]$,
where $G = (G_1, \ldots, G_N)$. By the chain rule the velocity vectors satisfy

$\dot{\mathbf{x}}_j = \sum_{k=1}^n \frac{\partial G_j}{\partial q^k}\,\dot{q}^k, \quad j = 1, \ldots, N.$

Consider the following function $T$, defined on $\Delta \times E^n$:

$T(q, \dot{q}) = \frac{1}{2} \sum_{k,l=1}^n a_{kl}(q)\,\dot{q}^k \dot{q}^l,$

where

$a_{kl}(q) = \sum_{j=1}^N m_j\,\frac{\partial G_j}{\partial q^k} \cdot \frac{\partial G_j}{\partial q^l}.$

Then

(8.37)  $T[q(t), \dot{q}(t)] = \sum_{j=1}^N \frac{m_j}{2}\,|\dot{\mathbf{x}}_j|^2$

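is the kinetic energy at time $t$. The matrix $(a_{kl}(q))$ is symmetric and positive
definite for each $q$. For $k = 1, \ldots, n$,

(8.38)  $\frac{\partial T}{\partial \dot{q}^k} = \sum_{l=1}^n a_{kl}\,\dot{q}^l = \sum_{j=1}^N m_j\,\dot{\mathbf{x}}_j \cdot \frac{\partial G_j}{\partial q^k},$

(8.39)  $\frac{\partial T}{\partial q^k} = \frac{1}{2} \sum_{p,l=1}^n \frac{\partial a_{pl}}{\partial q^k}\,\dot{q}^p \dot{q}^l.$

By a short calculation (Problem 3), the equations of motion (8.36) imply
(in the coordinates $q$)

(8.40)  $\frac{d}{dt}\left( \frac{\partial T}{\partial \dot{q}^k} \right) = \frac{\partial T}{\partial q^k} + \mathbf{F} \cdot \frac{\partial G}{\partial q^k}, \quad k = 1, \ldots, n,$

where $\mathbf{F} = (\mathbf{F}_1, \ldots, \mathbf{F}_N)$ is the function with values in $E^{3N}$ describing all of
the forces.
Let us write $\mathbf{F} = \mathbf{F}^a + \mathbf{F}^c$, where $\mathbf{F}^a(\mathbf{x})$ and $\mathbf{F}^c(\mathbf{x})$ are the components of
$\mathbf{F}(\mathbf{x})$ tangent to $M$ and normal to $M$ at $\mathbf{x}$, respectively. $\mathbf{F}^a(\mathbf{x})$ is called the
applied force at $\mathbf{x}$, and $\mathbf{F}^c(\mathbf{x})$ the constraining force at $\mathbf{x}$. By Proposition 8.1,
$\partial G/\partial q^k$ is a tangent vector to $M$ at $\mathbf{x} = G(q)$. Since $\mathbf{F}^c(\mathbf{x})$ is a normal vector
to $M$ at $\mathbf{x}$, $\mathbf{F}^c \cdot (\partial G/\partial q^k) = 0$. Thus, constraining forces contribute nothing to
the last term of (8.40). Generalizing (8.28), let us assume that there is a
potential energy function $U$, satisfying
(8.41)  $\mathbf{F}_j^a(\mathbf{x}) = -\operatorname{grad}_j U(\mathbf{x})$,
where $\operatorname{grad}_j$ denotes gradient in the variables $\mathbf{x}_j$. Let $V(q) = U[G(q)]$, and
let $L = T - V$. Then Equations (8.40) become

(8.42)  $\frac{d}{dt}\left( \frac{\partial L}{\partial \dot{q}^k} \right) = \frac{\partial L}{\partial q^k}, \quad k = 1, \ldots, n.$

These are called Lagrange's equations for the motion of the particles. The
total energy $E = T + V$ is constant during the motion (Problem 2).
Lagrange's equations furnish a starting point for further development
of theory. Hamilton's equations furnish another useful formulation (Problem
5) [10; 17, Chapter 13].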
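
The coefficients $a_{kl}$ and the identity (8.37) can be illustrated in the simplest setting. The sketch below is a hypothetical example, not taken from the text: it assumes a single particle of mass $m$ in the plane with polar coordinates $q = (r, \theta)$, so that $G(r, \theta) = (r\cos\theta, r\sin\theta)$, and all function names are invented for the illustration.

```python
import math

def kinetic_metric(m, r, theta):
    """a_kl(q) = m * dG/dq^k . dG/dq^l for G(r, theta) = (r cos theta, r sin theta).
    Returns the 2 x 2 matrix a together with the partial derivatives of G."""
    dG = [(math.cos(theta), math.sin(theta)),            # dG/dr
          (-r * math.sin(theta), r * math.cos(theta))]   # dG/dtheta
    a = [[m * (dG[k][0] * dG[l][0] + dG[k][1] * dG[l][1]) for l in range(2)]
         for k in range(2)]
    return a, dG

def kinetic_energy_from_metric(m, q, qdot):
    """T(q, qdot) = (1/2) sum_{k,l} a_kl(q) qdot^k qdot^l."""
    a, _ = kinetic_metric(m, *q)
    return 0.5 * sum(a[k][l] * qdot[k] * qdot[l] for k in range(2) for l in range(2))

def kinetic_energy_from_velocity(m, q, qdot):
    """(m/2) |xdot|^2 with xdot = sum_k dG/dq^k * qdot^k (chain rule)."""
    _, dG = kinetic_metric(m, *q)
    xdot = [sum(dG[k][i] * qdot[k] for k in range(2)) for i in range(2)]
    return 0.5 * m * (xdot[0]**2 + xdot[1]**2)
```

In this case the matrix is diagonal with entries $m$ and $mr^2$, which recovers the familiar expression $\tfrac{1}{2}m(\dot{r}^2 + r^2\dot{\theta}^2)$ for the kinetic energy.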

PROBLEMS

1. A single particle moves on a 2-dimensional manifold $M \subset E^3$, subject only to a
constraining force ($\mathbf{F}^a = 0$). Show that the principal normal vector at $\mathbf{x}(t)$ (Section 6.2,
Problem 9) to the curve of motion is also a normal vector to $M$ at $\mathbf{x}(t)$. [Note: Such
curves are called geodesics of $M$.]
2. Show that $E[q(t)]$ is constant, where $E = T + V$ is the total energy. [Hint: By
Euler's formula (Section 3.3, Problem 8),

$2T = \sum_{k=1}^n \frac{\partial T}{\partial \dot{q}^k}\,\dot{q}^k.$]

3. Derive (8.40), using (8.38) and (8.39).

4. Let $M = \{\mathbf{x} : \mathbf{x}_j \ne \mathbf{x}_i$ for $i, j = 1, \ldots, N$, $i \ne j\}$. Suppose that $U(\mathbf{x})$ depends only on
the mutual distances between particles, namely,

$U(\mathbf{x}) = \Psi(|\mathbf{x}_2 - \mathbf{x}_1|, |\mathbf{x}_3 - \mathbf{x}_1|, \ldots, |\mathbf{x}_N - \mathbf{x}_{N-1}|).$

In (8.41), take $\mathbf{F} = \mathbf{F}^a$ (no constraining forces, since $M$ is an open subset of $E^{3N}$,
$n = 3N$). Let $\mathbf{p}_j = m_j \dot{\mathbf{x}}_j$. Show that:
(a) $\sum_{j=1}^N \mathbf{p}_j(t)$ is constant.
(b) $\sum_{j=1}^N \mathbf{x}_j(t) \wedge \mathbf{p}_j(t)$ is constant.

5. Let $\pi = (\pi_1, \ldots, \pi_n)$ denote a covector, $\pi \in (E^n)^*$. Let

(8.43)  $H(q, \pi) = \min\{L(q, \dot{q}) + \pi \cdot \dot{q} : \dot{q} \in E^n\}.$

Show that:
(a) The minimum (for fixed $q$) is attained at the unique $\dot{q}$ such that

(8.44)  $\frac{\partial L}{\partial \dot{q}^k} = -\pi_k, \quad k = 1, \ldots, n.$

Let us write $\dot{q} = \Phi(q, \pi)$.

(b) $L(q, \dot{q}) = \max\{H(q, \pi) - \pi \cdot \dot{q} : \pi \in (E^n)^*\}$.
The maximum is attained at the unique $\pi$ such that

(8.45)  $\frac{\partial H}{\partial \pi_k} = \dot{q}^k, \quad k = 1, \ldots, n.$

Let us write $\pi = \Psi(q, \dot{q})$.

(c) For fixed $q$, the transformation $\Phi(q, \cdot)$ from $(E^n)^*$ into $E^n$ is inverse to $\Psi(q, \cdot)$.
(d) The Lagrange equations (8.42) are equivalent to the system of $2n$ differential
equations

(8.46) k = 1, ..., n,

if $\pi(t) = \Psi[q(t), \dot{q}(t)]$.

(e) $H[q(t), \pi(t)] = E$, where $E$ is the total energy (Problem 2).

[Note: $\Psi$ is called the canonical transformation in mechanics, and (8.46) are the
equations of motion in Hamiltonian form.]

Appendix

A.1 Axioms for a vector space


A vector space over the real number field is a nonempty set $\mathcal{V}$ together with
two operations called addition and scalar multiplication. The sum $u + v$ of
two elements $u, v \in \mathcal{V}$ is also an element of $\mathcal{V}$, and the scalar multiple $cu$ of
$u \in \mathcal{V}$ by the real number $c$ is an element of $\mathcal{V}$. These operations are required
to satisfy the following axioms:
(1) Addition is associative and commutative.
(2) There is a zero element $\theta$ such that $u + \theta = u$ for every $u \in \mathcal{V}$.
(3) The distributive laws hold:
$(c + d)u = cu + du$, $c(u + v) = cu + cv$
for every real $c$, $d$ and $u, v \in \mathcal{V}$.
(4) $(cd)u = c(du)$ for every real $c$, $d$, and $u \in \mathcal{V}$.
(5) $0u = \theta$, $1u = u$, for every $u \in \mathcal{V}$.
It is easy to show that $E^n$ satisfies these five axioms. However, a multitude
of other important vector spaces besides $E^n$ occur in mathematics.
A subset $B$ of a vector space $\mathcal{V}$ is called a linearly dependent set if there
exist distinct elements $u_1, \ldots, u_m \in B$ and real numbers $c^1, \ldots, c^m$ not all 0
such that
$c^1 u_1 + \cdots + c^m u_m = \theta.$
If $B$ is not linearly dependent, then $B$ is a linearly independent set. $\mathcal{V}$ is a finite-dimensional
vector space if some finite subset $B$ of $\mathcal{V}$ spans $\mathcal{V}$, namely, if
every element $u \in \mathcal{V}$ is a linear combination $u = c^1 u_1 + \cdots + c^m u_m$ where
$u_1, \ldots, u_m \in B$.

A basis for $\mathcal{V}$ is a linearly independent set that spans $\mathcal{V}$. If $\mathcal{V}$ is finite
dimensional, then every basis $B$ has the same number $n$ of elements [12, p. 43].
The number $n$ is the dimension of $\mathcal{V}$. If $n = 0$, then $\mathcal{V}$ has the single element
$\theta$. If $n > 0$ and $B = \{u_1, \ldots, u_n\}$ is a basis for $\mathcal{V}$, then every $u \in \mathcal{V}$ can be
uniquely written as a linear combination
$u = c^1 u_1 + \cdots + c^n u_n.$
A nonempty set $\mathcal{U} \subset \mathcal{V}$ is called a vector subspace of $\mathcal{V}$ if: $u, v \in \mathcal{U}$
implies $u + v \in \mathcal{U}$ and $u \in \mathcal{U}$ implies $cu \in \mathcal{U}$ for any real $c$. In other words,
this says that $\mathcal{U}$ when provided with the addition and scalar multiplication in
$\mathcal{V}$ is a vector space.
Let $\mathcal{V}$ and $\mathcal{W}$ be vector spaces. Let $L$ be a function with domain $\mathcal{V}$ and
values in $\mathcal{W}$. Then $L$ is linear if
(a) $L(u + v) = L(u) + L(v)$ for every $u, v \in \mathcal{V}$.
(b) $L(cu) = cL(u)$ for every $u \in \mathcal{V}$ and real $c$.
Let $L$ and $M$ be linear. The sum $L + M$ is given by
$(L + M)(u) = L(u) + M(u)$
for every $u \in \mathcal{V}$. The function $L + M$ has Properties (a) and (b), and thus
is linear. If $c$ is a real number, then $cL$ is the linear function given by $(cL)(u) =
cL(u)$ for every $u \in \mathcal{V}$.
Let $\mathcal{L}(\mathcal{V}, \mathcal{W})$ denote the set of all linear functions with domain $\mathcal{V}$ and
values in $\mathcal{W}$, together with these operations of sum of functions and multiplication
of functions by scalars. Then $\mathcal{L}(\mathcal{V}, \mathcal{W})$ satisfies Axioms (1) through
(5) for a vector space. The zero element of $\mathcal{L}(\mathcal{V}, \mathcal{W})$ is the function whose
value at every $u \in \mathcal{V}$ is the zero element of $\mathcal{W}$.

The dual space of $\mathcal{V}$


Let us now suppose that $\mathcal{W} = E^1$ and set $\mathcal{V}^* = \mathcal{L}(\mathcal{V}, E^1)$. The vector space
$\mathcal{V}^*$ is called the dual space of $\mathcal{V}$. Let us show that if $\mathcal{V}$ has positive, finite
dimension $n$, then $\mathcal{V}^*$ also has dimension $n$. Let $B = \{u_1, \ldots, u_n\}$ be a basis
for $\mathcal{V}$. Let $L^1, \ldots, L^n$ be the real valued functions such that for each $i =
1, \ldots, n$ and $u = c^1 u_1 + \cdots + c^n u_n$,
$L^i(c^1 u_1 + \cdots + c^n u_n) = c^i.$
These functions $L^i$ are linear, and therefore belong to $\mathcal{V}^*$. They are specified
by their values at the basis elements:
(*)  $L^i(u_j) = \delta^i_j$, $i, j = 1, \ldots, n$,
where $\delta^i_j = 1$ if $i = j$ and $\delta^i_j = 0$ if $i \ne j$.
Let us show that $B^* = \{L^1, \ldots, L^n\}$ is a basis for $\mathcal{V}^*$. Suppose that
$b_1 L^1 + \cdots + b_n L^n = \theta$, where $\theta$ is the zero function. Then, for every $u \in \mathcal{V}$,
$b_1 L^1(u) + \cdots + b_n L^n(u) = \theta(u) = 0.$
Taking $u = u_i$ and applying formula (*), $b_i = 0$ for each $i = 1, \ldots, n$. Thus
$B^*$ is a linearly independent set. To show that $B^*$ spans $\mathcal{V}^*$, given $L \in \mathcal{V}^*$
let $a_i = L(u_i)$. If $u = c^1 u_1 + \cdots + c^n u_n$, then since $L$ is linear $L(\sum c^i u_i) =
\sum c^i L(u_i)$. Therefore

$L(u) = \sum_{i=1}^n a_i c^i = \sum_{i=1}^n a_i L^i(u).$

Since this is true for every $u \in \mathcal{V}$,
$L = a_1 L^1 + \cdots + a_n L^n,$
which shows that $B^*$ spans $\mathcal{V}^*$.
The basis $B^*$ is called dual to the basis $B$.
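
For $\mathcal{V} = E^n$ the dual basis can be computed explicitly: if the basis vectors are the columns of a matrix, the rows of its inverse represent the functions $L^i$. The sketch below is an illustration with assumed names (`dual_basis`, `apply`) and an arbitrary sample basis of $E^3$; it uses exact rational arithmetic and Gauss-Jordan elimination.

```python
from fractions import Fraction

def dual_basis(basis):
    """Given a basis u_1, ..., u_n of E^n, return covectors L^1, ..., L^n (as rows)
    with L^i(u_j) = 1 if i == j and 0 otherwise. Raises StopIteration if the
    given vectors are linearly dependent."""
    n = len(basis)
    # Augmented matrix [U | I], where column j of U is the basis vector u_j.
    rows = [[Fraction(basis[j][i]) for j in range(n)] +
            [Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        p = rows[col][col]
        rows[col] = [v / p for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [row[n:] for row in rows]

def apply(covector, vector):
    """Value of a covector on a vector."""
    return sum(a * x for a, x in zip(covector, vector))

B = [(1, 1, 0), (0, 1, 1), (1, 0, 1)]   # sample basis of E^3
B_dual = dual_basis(B)
```

Applying `B_dual[i]` to a vector $u$ returns its $i$th coefficient $c^i$ with respect to `B`, which is the defining property of $L^i$.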
A function $\phi$ from a vector space $\mathcal{V}$ into a vector space $\mathcal{W}$ is an isomorphism
if $\phi$ is linear and $\phi(u) \ne \phi(v)$ whenever $u \ne v$. If there is such an isomorphism
from $\mathcal{V}$ onto $\mathcal{W}$, then $\mathcal{V}$ and $\mathcal{W}$ are isomorphic vector spaces. All $n$-dimensional
vector spaces are isomorphic. If $\{u_1, \ldots, u_n\}$ is a basis for $\mathcal{V}$ and
$\{w_1, \ldots, w_n\}$ a basis for $\mathcal{W}$, then the linear function $\phi$ such that $\phi(u_i) = w_i$
for $i = 1, \ldots, n$ is an isomorphism from $\mathcal{V}$ onto $\mathcal{W}$.
In particular, any finite-dimensional vector space $\mathcal{V}$ is isomorphic with
its dual $\mathcal{V}^*$. However, this isomorphism is unnatural from several points
of view. In this book we maintain the distinction between $\mathcal{V}$ and $\mathcal{V}^*$.
A more natural isomorphism is the following one from a vector space $\mathcal{V}$
into the dual $\mathcal{V}^{**}$ of $\mathcal{V}^*$. For each $u \in \mathcal{V}$ let $\phi(u) = l_u$, where $l_u \in \mathcal{V}^{**}$ is
the real valued linear function such that $l_u(L) = L(u)$ for every $L \in \mathcal{V}^*$.
This isomorphism is onto $\mathcal{V}^{**}$ if $\mathcal{V}$ is finite dimensional.

A.2 Mean value theorem; Taylor's theorem


In this section and Section A.3 we review some basic theorems from calculus
for functions of one variable. In the proof of the Mean Value Theorem we use
the fact that a continuous real valued function has a maximum and minimum
on any closed interval [a, b] (see Theorem 2.5).

Mean value theorem. Let $f$ be real valued and continuous on a closed interval
$[a, b]$, and let the derivative $f'(x)$ exist for every $x \in (a, b)$. Then there
exists $c \in (a, b)$ such that
$$f(b) - f(a) = f'(c)(b - a).$$
PROOF. Let $m = [f(b) - f(a)]/(b - a)$ and let $F(x) = f(b) - f(x) - m(b - x)$.
Then $F$ is continuous on $[a, b]$ and $F'(x) = -f'(x) + m$ for $x \in (a, b)$. Since
$[a, b]$ is compact, $F$ has a maximum and a minimum value on $[a, b]$. If the
maximum value is positive, then since $F(a) = F(b) = 0$, the maximum must
occur at some $x_1 \in (a, b)$. By elementary calculus $F'(x_1) = 0$ and we may
take $c = x_1$. Similarly, if the minimum value is negative we may take $c = x_2$,
where $F(x_2)$ is the minimum value. If neither of these possibilities occurs, then
$F(x) = 0$ on $[a, b]$ and $c$ is arbitrary. □
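
The construction in the proof can be followed numerically. The sketch below takes $f(x) = x^3$ on $[0, 2]$ (an illustrative choice, not from the text), forms the auxiliary function $F$, and locates a point $c$ with $f'(c) = m$ by bisection. Bisection is used only as a convenient root finder here; the proof itself argues through the maximum of $F$.

```python
import math

def f(x):
    return x ** 3

def fprime(x):
    return 3 * x ** 2

a, b = 0.0, 2.0
m = (f(b) - f(a)) / (b - a)        # slope of the chord, here 4

def F(x):
    """Auxiliary function from the proof; it vanishes at both a and b."""
    return f(b) - f(x) - m * (b - x)

def bisect(g, lo, hi, tol=1e-12):
    """Root of g on [lo, hi], assuming g(lo) and g(hi) have opposite signs."""
    g_lo = g(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        g_mid = g(mid)
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return (lo + hi) / 2

# F'(x) = -f'(x) + m, so a critical point of F solves f'(c) = m.
c = bisect(lambda x: fprime(x) - m, a, b)
print(c, F(c), f(b) - f(a), fprime(c) * (b - a))
```

For this choice of $f$ the point found is $c = 2/\sqrt{3}$, which lies in $(0, 2)$ and is where $F$ attains its positive maximum.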


The mean value theorem has the following generalization.

Taylor's theorem with remainder. Let $f$, together with its derivatives $f'$,
$f''$, ..., $f^{(q-1)}$, be continuous on a closed interval $[a, b]$ and let the $q$th-order
derivative $f^{(q)}(x)$ exist for every $x \in (a, b)$. Then there exists $c \in (a, b)$ such
that

$$f(b) - f(a) = f'(a)(b - a) + \frac{f''(a)}{2!}(b - a)^2 + \cdots + \frac{f^{(q-1)}(a)}{(q-1)!}(b - a)^{q-1} + R_q,$$

where

$$R_q = \frac{f^{(q)}(c)}{q!}(b - a)^q.$$

PROOF. Let

$$G(x) = f(b) - f(x) - f'(x)(b - x) - \frac{f''(x)}{2!}(b - x)^2 - \cdots - \frac{f^{(q-1)}(x)}{(q-1)!}(b - x)^{q-1} - \frac{K}{q!}(b - x)^q,$$

where the number $K$ is so chosen that $G(a) = 0$. Then $G(b) = 0$ and, using
the product rule,

$$G'(x) = \frac{(b - x)^{q-1}}{(q - 1)!}\left[-f^{(q)}(x) + K\right].$$

Repeating the reasoning in the proof of the mean value theorem, there exists
$c \in (a, b)$ such that $G'(c) = 0$. Then $f^{(q)}(c) = K$. □
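
For a function whose derivatives are known in closed form, the point $c$ can be computed explicitly. The sketch below uses $f = \exp$ (so every derivative is again $\exp$) with $a = 0$, $b = 1$; these choices and the function name are illustrative assumptions. It computes the remainder $R_q$, the constant $K$ with $K(b - a)^q/q! = R_q$, and then solves $f^{(q)}(c) = K$.

```python
import math

def taylor_remainder_point(q, a=0.0, b=1.0):
    """For f = exp, return (R_q, K, c) with exp(c) = K."""
    # Terms k = 0, ..., q-1 of the Taylor polynomial about a, evaluated at b.
    partial = sum(math.exp(a) * (b - a) ** k / math.factorial(k) for k in range(q))
    R_q = math.exp(b) - partial
    # K is defined by R_q = K (b - a)^q / q!, as in the choice G(a) = 0.
    K = R_q * math.factorial(q) / (b - a) ** q
    c = math.log(K)                 # solves f^(q)(c) = exp(c) = K
    return R_q, K, c

for q in range(1, 6):
    R_q, K, c = taylor_remainder_point(q)
    print(q, R_q, c)
```

In each case the computed $c$ falls strictly between $a$ and $b$, in agreement with the theorem.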

A.3 Review of Riemann integration


Let $f$ be real valued and continuous on an interval $[a, b]$. Then $f$ has an
integral over $[a, b]$, denoted by $\int_a^b f(t)\,dt$. According to Riemann's definition
of the integral, it is the limit of sums:

$$\int_a^b f(t)\,dt = \lim_{\mu \to 0} \sum_{j=1}^{m} f(s_j)(t_j - t_{j-1}),$$

where
$$a = t_0 < t_1 < \cdots < t_{m-1} < t_m = b,$$
each $s_j$ is any point of $[t_{j-1}, t_j]$, and
$$\mu = \max\{t_1 - t_0, t_2 - t_1, \ldots, t_m - t_{m-1}\}.$$
More generally, the Riemann integral exists for any bounded function
with a finite number of discontinuities. It agrees with the integral in Lebesgue's
sense, which is defined in Chapter 5 for a much wider class of functions.
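
The definition can be tried out directly. The following sketch forms Riemann sums for $f(t) = t^2$ on $[0, 1]$ over uniform partitions, taking each $s_j$ to be the left endpoint of its subinterval. The function names and the choice of $f$ are illustrative assumptions; any other choice of the points $s_j$ would do equally well in the limit.

```python
def riemann_sum(f, partition, choose=lambda lo, hi: lo):
    """Sum of f(s_j) (t_j - t_{j-1}), with s_j = choose(t_{j-1}, t_j)."""
    total = 0.0
    for lo, hi in zip(partition, partition[1:]):
        total += f(choose(lo, hi)) * (hi - lo)
    return total

def mesh(partition):
    """The number mu: the length of the longest subinterval."""
    return max(hi - lo for lo, hi in zip(partition, partition[1:]))

def uniform(a, b, m):
    """Partition of [a, b] into m subintervals of equal length."""
    return [a + (b - a) * j / m for j in range(m + 1)]

square = lambda t: t * t
for m in (10, 100, 1000):
    p = uniform(0.0, 1.0, m)
    print(m, mesh(p), riemann_sum(square, p))
```

As the mesh $\mu$ decreases, the sums approach $1/3$, the value of $\int_0^1 t^2\,dt$.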


Fundamental theorem of calculus. Let $f$ be continuous on $[a, b]$ and let

$$F(t) = \int_a^t f(s)\,ds, \quad a \le t \le b.$$

Then $F'(t) = f(t)$ for every $t \in [a, b]$.

PROOF. By elementary properties of the integral, if $h > 0$ and $t + h \le b$,
then

$$F(t + h) - F(t) = \int_t^{t+h} f(s)\,ds.$$

Since $f$ is continuous, given $\varepsilon > 0$ there exists $\delta > 0$ such that $f(t) - \varepsilon <
f(s) < f(t) + \varepsilon$ whenever $|s - t| < \delta$. Then if $h < \delta$,

$$h[f(t) - \varepsilon] < \int_t^{t+h} f(s)\,ds < h[f(t) + \varepsilon],$$

$$f(t) - \varepsilon < \frac{F(t + h) - F(t)}{h} < f(t) + \varepsilon.$$

Hence

$$f(t) = \lim_{h \to 0^+} \frac{F(t + h) - F(t)}{h}.$$

The right-hand side is the right-hand derivative of $F$ at $t$. Similarly, $f(t)$
equals the left-hand derivative of $F$ at $t$. □

In the theorem, F'(a) means the right-hand derivative and F'(b) means
the left-hand derivative.
The fundamental theorem says that $F$ is an antiderivative of $f$. If $G$ is
any antiderivative of $f$, then $G'(t) - F'(t) = 0$ for every $t \in [a, b]$, and by
the mean value theorem $G(t) - F(t)$ is constant on $[a, b]$. Thus $G(t) - F(t) =
G(a) - F(a) = G(a)$, and upon setting $t = b$ we obtain

$$G(b) - G(a) = \int_a^b f(s)\,ds.$$
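
Both statements can be checked numerically. The sketch below takes $f = \cos$ on $[0, 2]$ with antiderivative $G = \sin$ (illustrative choices) and approximates integrals by the midpoint rule, which stands in here for the exact integral. It compares the right-hand difference quotient of $F$ at $t = 1$ with $f(t)$, and $G(b) - G(a)$ with the integral of $f$.

```python
import math

def integral(f, a, b, n=2000):
    """Midpoint-rule approximation of the integral of f from a to b."""
    h = (b - a) / n
    return h * sum(f(a + (j + 0.5) * h) for j in range(n))

f = math.cos
a, b = 0.0, 2.0
t, h = 1.0, 1e-3

# (F(t + h) - F(t)) / h is the integral of f over [t, t + h] divided by h.
right_quotient = integral(f, t, t + h) / h

# G = sin is an antiderivative of f = cos.
G = math.sin
print(right_quotient, f(t))
print(G(b) - G(a), integral(f, a, b))
```

The quotient differs from $f(t)$ by an amount that shrinks with $h$, which is the content of the inequality $f(t) - \varepsilon < [F(t+h) - F(t)]/h < f(t) + \varepsilon$.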

Change of variables in integrals


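Let $\phi$ be any real valued function possessing a continuous derivative on some
closed interval $[\alpha, \beta]$ such that
$$\phi'(\tau) \ge 0 \text{ for every } \tau \in [\alpha, \beta], \quad \phi(\alpha) = a, \quad \phi(\beta) = b.$$
Then $a \le \phi(\tau) \le b$ for every $\tau \in [\alpha, \beta]$.
If $U = F \circ \phi$, then
$$U'(\tau) = F'[\phi(\tau)]\phi'(\tau) = f[\phi(\tau)]\phi'(\tau)$$
for every $\tau \in [\alpha, \beta]$. Since
$$F(a) = U(\alpha), \quad F(b) = U(\beta),$$
the fundamental theorem, applied to both $F$ and $U$, gives
$$\int_a^b f(t)\,dt = \int_\alpha^\beta f[\phi(\tau)]\phi'(\tau)\,d\tau.$$

This is the formula for change of variables in integrals. If $\phi'(\tau) \le 0$ for every
$\tau \in [\alpha, \beta]$, then $\phi(\alpha) \ge \phi(\beta)$. The same formula holds if we agree that

$$\int_a^b = -\int_b^a.$$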
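
The change of variables formula, including the sign convention for a decreasing substitution, can be tested numerically. In the sketch below $f = \cos$, the increasing substitution is $\phi(\tau) = \tau^2$ on $[0, 1.5]$, and the decreasing one is $\psi(\tau) = 1 - \tau$ on $[0, 1]$; all of these are illustrative assumptions. The midpoint rule is written so that an integral from a larger to a smaller limit comes out with the opposite sign.

```python
import math

def integral(f, a, b, n=2000):
    """Midpoint rule; if a > b the step is negative, so the sign flips."""
    h = (b - a) / n
    return h * sum(f(a + (j + 0.5) * h) for j in range(n))

f = math.cos

# Increasing substitution phi(tau) = tau^2 on [0, 1.5]: a = 0, b = 2.25.
alpha, beta = 0.0, 1.5
phi = lambda tau: tau * tau
dphi = lambda tau: 2.0 * tau
lhs = integral(f, phi(alpha), phi(beta))
rhs = integral(lambda tau: f(phi(tau)) * dphi(tau), alpha, beta)

# Decreasing substitution psi(tau) = 1 - tau on [0, 1]: a = 1, b = 0.
psi = lambda tau: 1.0 - tau
dpsi = lambda tau: -1.0
lhs_dec = integral(f, psi(0.0), psi(1.0))      # integral from 1 to 0
rhs_dec = integral(lambda tau: f(psi(tau)) * dpsi(tau), 0.0, 1.0)

print(lhs, rhs)
print(lhs_dec, rhs_dec)
```

In the first case both sides approximate $\sin 2.25$; in the second both approximate $-\sin 1$, since the integral from $1$ to $0$ is the negative of the integral from $0$ to $1$.
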
A.4 Monotone functions
Let $f$ be real valued with domain $S \subset E^1$.

Definition. If for every $x, y \in S$ such that $x < y$,

$f(x) < f(y)$, then $f$ is increasing.
$f(x) \le f(y)$, then $f$ is nondecreasing.
$f(x) > f(y)$, then $f$ is decreasing.
$f(x) \ge f(y)$, then $f$ is nonincreasing.
If $f$ is either an increasing function or a decreasing function, then $f$ is
called strictly monotone. If $f$ is either nondecreasing or nonincreasing, then
$f$ is monotone. If the restriction of $f$ to $A$ is monotone, then $f$ is monotone
on $A$.
A function $f$ is univalent if $f(x) \neq f(y)$ whenever $x \neq y$. Clearly, any
strictly monotone function is univalent. If $S$ is an interval, then conversely
any continuous univalent function must be strictly monotone. This can
be proved from the intermediate value theorem.
Let $A$ be an interval and assume that the derivative $f'(x)$ exists for every
$x \in A$. It is proved in elementary calculus that:
(a) $f$ is nondecreasing on $A$ if and only if $f'(x) \ge 0$ for every $x \in A$.
(b) If $f'(x) > 0$ except at a finite number of points of $A$, then $f$ is increasing
on $A$.
(c) $f$ has an inverse $f^{-1}$ if and only if $f$ is strictly monotone on $A$.
(d) If $f'(x) \neq 0$ for all $x \in A$, then the derivative of the inverse is given by
$$(f^{-1})'(t) = \frac{1}{f'(x)}, \quad \text{if } t = f(x).$$

Among the examples of strictly monotone functions from calculus are
the exponential function exp, whose inverse is log. The restriction to
$[-\pi/2, \pi/2]$ of the function sin is strictly monotone. Its inverse is denoted by
$\sin^{-1}$.
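
The sine example lends itself to a quick numerical check of the inverse-derivative formula in (d). The sketch below samples sin on a grid to test monotonicity (a sampled check is only a necessary condition, not a proof) and compares a central-difference estimate of the derivative of $\sin^{-1}$ at $t = \sin x$ with $1/\cos x$. The helper names, the grid, and the point $x = 0.3$ are illustrative assumptions.

```python
import math

def is_increasing(f, points):
    """True if f is strictly increasing along the given increasing points."""
    values = [f(x) for x in points]
    return all(u < v for u, v in zip(values, values[1:]))

def central_difference(g, t, h=1e-6):
    """Symmetric difference quotient approximating g'(t)."""
    return (g(t + h) - g(t - h)) / (2 * h)

# sin is increasing on [-pi/2, pi/2] but not on [0, pi].
restricted = [-math.pi / 2 + math.pi * k / 100 for k in range(101)]
wider = [math.pi * k / 100 for k in range(101)]
print(is_increasing(math.sin, restricted), is_increasing(math.sin, wider))

# Formula (d): the derivative of the inverse at t = f(x) is 1 / f'(x).
x = 0.3
t = math.sin(x)
lhs = central_difference(math.asin, t)
rhs = 1 / math.cos(x)
print(lhs, rhs)
```

The two printed derivative values agree to within the accuracy of the difference quotient.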

References

1. Apostol, T. M. 1957. Mathematical Analysis. Reading, Mass.: Addison-Wesley.


2. Birkhoff, G., MacLane, S. 1953. A Survey of Modern Algebra. New York: Macmillan.
3. Bôcher, M. 1929. Introduction to Higher Algebra. New York: Macmillan.
4. Bourbaki, N. 1974. Elements of Mathematics, Book 2, Algebra, Part I, Chapters 1-3.
Paris: Hermann; Reading, Mass: Addison-Wesley.
5. Coddington, E. A., Levinson, N. 1955. Theory of Ordinary Differential Equations.
New York: McGraw-Hill.
6. Eggleston, H. G. 1955. Convexity. New York: Cambridge University Press.
7. Federer, H. 1969. Geometric Measure Theory. Berlin-Heidelberg-New York:
Springer.
8. Flanders, H. 1963. Differential Forms with Applications to the Physical Sciences.
New York: Academic Press.
9. Gale, D. 1960. The Theory of Linear Economic Models. New York: McGraw-Hill.
10. Goldstein, H. 1952. Classical Mechanics. Reading, Mass.: Addison-Wesley.
11. Hale, J. K. 1969. Ordinary Differential Equations. New York: Wiley-Interscience.
12. Hoffman, K., Kunze, R. 1961. Linear Algebra. Englewood Cliffs, N.J.: Prentice-Hall.
13. Karlin, S. 1959. Mathematical Methods and Theory in Games, Programming, and
Economics. Reading, Mass.: Addison-Wesley.
14. Kelley, J. L. 1975. General Topology. Princeton, N.J.: Van Nostrand; New York:
Springer.
15. Kellogg, O. D. 1929. Foundations of Potential Theory. Berlin: Springer.
16. Kestin, J. 1966. A Course in Thermodynamics. Waltham, Mass.: Blaisdell.
17. Loomis, L. H., Sternberg, S. 1965. Advanced Calculus. Reading, Mass.: Addison-
Wesley.
18. McShane, E. J., Botts, T. A. 1959. Real Analysis. Princeton, N.J.: Van Nostrand.
19. Meyer, R. E. 1971. Introduction to Mathematical Fluid Dynamics. New York:
Wiley-Interscience.


20. Nehari, Z. 1968. Introduction to Complex Analysis. Boston, Mass.: Allyn and Bacon.
21. De Rham, G. 1955. Variétés Différentiables, Actualités Scientifiques et Industrielles,
No. 1222. Paris: Hermann.
22. Taylor, A. 1958. Introduction to Functional Analysis. New York: Wiley.
23. Titchmarsh, E. C. 1939. The Theory of Functions. London: Oxford University Press.
24. Whitney, H. 1957. Geometric Integration Theory. Princeton: Princeton University
Press.
25. Willmore, T. J. 1959. An Introduction to Differential Geometry. London: Oxford
University Press.

Answers to problems

SECTION 1.1
1 (a) 2, 1.
(b) fi, no lower bound.
(c) fi, -1.
(d) 0, $-e^{-1}$.
(e) 1, t.
sup S E S for (b), (c), (e); inf S E S for (c), (d).

SECTION 1.2
1 4e 1 -+ e 3 + 3e4 , -2e 1 - e 3 + e4 , j30, j6, j6 j12,
2e l 6.
8 V4 = ±(j2jlO)(4e 1 + 3e l - 3e 3 + 4e4 ).
9 Jm, m = 1, 2, ... , n.

SECTION 1.3
1 {x:x 1 + Xl + 4x 3 = I}.
2 (a) {x:3x 1 - 3Xl - 3x 3 - X4 = O}.
(b) t = ~.

3 _2Xl +Xl + X3 + 2X4 = -3.


4 One basis for Y is {e 1 + 2e 3 , e l + 3e 3 }.

SECTION 1.4
2 (a) {x:O<lx-xol<b}, {x:lxl=Oorb}, {x:lxl=:;b}.
(b) Empty set, A, A.


(c) A, the union of the half-lines $y = 0$, $y = x + 1$, $x \ge -1$, $\{(x, y): 0 \le y \le x + 1,
x \ge -1\}$.
(d) A, the union of the circle $x^2 + y^2 = 1$ and the line segment joining (0, 0) and
(1, 0), the closed circular disk $x^2 + y^2 \le 1$.
(e) Empty set, $E^2$, $E^2$.
(f) Empty set, A, A.
(g) Empty set, A u {O}, A u {O}.
3 Open in (c), (d); closed in (b), (f).

SECTION 1.5

7 (b) The barycenter is at the intersection of the line segments that join the vertices
with the barycenters of the opposite (r - I)-dimensional faces.

SECTION 2.1
1 (a) [-1, 1], [0, 1], {x: x = 2mn: + y, y E [ - ~,~J m any integer}.

(b) [-1, 1], [0, ~J. Yes.


2 (a) The triangle with vertices (0, 0), (a, ±j2a).

(b) {(X, y): x 2 - xy + y2 < ~}

SECTION 2.2
(a) t.
(b) No limit.
(c) t.
(d) (1,5) = e l + 5e 2 •
(e) No limit.
Continuity: In (a), (b), (e) except at (0, 0). In (c) except at 0. If we set $f(0) = 1$, then $f$ is
also continuous at 0. In (d) at every point of $E^1$.
5 Continuous at any (x, y) with $y \neq 0$, also at (0, 0).
6 (a) O. (c) No limit.
(b) No limit. (d) 1.

SECTION 2.3
1 (a) O.
(b) No limit.
(c) o.
2 (a) (-t, 0).
(b) No limit.
(c) (1, 0).


SECTION 2.4
1 (a) ± 1.
(b) No accumulation points.

(c) {(cos 2~TC, sin 2~TC): m = 0, 1, ... , 4}-


(d) {(x, y): l - x2 + I :-:; o}.
(e) [- 1, 1].

SECTION 2.5
1 (a) S is not closed.
(b) S is not bounded.

SECTION 2.6
3 (a) Both open and closed relative to S.
(b) Closed relative to S.
(c) Neither.
4 (a) Closed relative to S.
(b) Open relative to S.

SECTION 2.11
(a) The square $\{(x, y): |x| \le 1, |y| \le 1\}$.
2 (a) II(x, y)11 = [x 2 + xy + 41]1/2.
(b) 2.

SECTION 3.1
(a) fl(X, y) = 1 + log(xy), f2(X, y) = x/y
(b) fl(x, y, z) = 6x(x 2 + 2yZ + z)Z, fz(x, y, z) = 12y(xz + 21 + z)Z,
f3(X, y, z) = 3(x Z + 2yZ + z)Z
(c) j;(x) = 2Xi

2 The derivative in direction (cos 8, sin 8) is - 2(cos 8 + sin 8).


3 sin 28, if v = (cos 8, sin 8).
5 The derivative is 0 in those directions for which Vi + v2 + v3 = O. There is no
derivative in other directions.

SECTION 3.2
1 (a) a = e l + e Z + 2e 3.
(b) The plane x + y + 2z = c in E3. The line through
c- 2 c
el + ez + - - e 3 and - e3.
2 2


SECTION 3.3
1 It has the equation 4x + 5y + z + 4 = o.
2 (a) )5.
(b) I/J2e.
(c) o.
3 (a) [2x(x Z + 2y + 1)-1 + cos(xZ)]e l + 2(x z + 2y + 1)- l e 2.
(b) 0.09.
4 (a) xo.
(b) lxi-Ix.
(c) 2(xo· x)xo.
6 (b) {(x, y): x = yZ, x#- O}, {(x, y): x = - yZ, x#- O}.
(c) The union of the sets in 6(b) and the x-axis, with (0,0) excluded since f is not
differentiable there.
(d) {(x, y): x = yZ(c- 1 ± )c- 2 - I), x#- O} if lei s 1, c #- 0; the x- and y-axes if
c = O.

SECTION 3.4
1 xyz = -z + (y + l)z - (x - l)z + (x - l)(y + 1)z
2 (a) f(x, y) = 1 - (x - 1) + Rz(x, y);
IRz(x,y)1 s m- 3 [(x - 1)z + lJ where m = min {x, I}.
7 (c) fdO,O) = -1, f21(O,O) = 1.

8 (n+ qq - 1),. (nq -_ 11) if n S q, 0 if n > q.

SECTION 3.5
1 (a) Maximum at -}el.
(b) Saddle point at -e l + 2e z .
(c) Maximum at each point where xy = nl2 + 2mn; minimum at each point where
xy = -n12 + 2mn, m any integer. Saddle point at 0
(d) Saddle points at 0 and at e l .
2 (a) Maximum at -e l + e z , saddle point at 1< -el + e z).
(b) Saddle points at -e z and at e l + e 2 .
(c) Saddle points at mne l , m any integer.
5 x = (1Im)(x I + ... + xm). The minimum value is

6 (a) i, -2.
(b) sin 1, -sin 1.


7 (a) Let $c$ be any number such that $\psi'(c) = 0$. Then all points $x$ on the hyperplane
$\{x: a \cdot x = c\}$ are critical.
(b) All points on the line y = x are critical.

SECTION 3.6
1 If $f(x) = a_0 x^4 + a_1 x^3 + a_2 x^2 + a_3 x + a_4$, then $a_0 > 0$, $8 a_0 a_2 \ge 3 a_1^2$.
2 (a) Neither.
(b) Concave, not strictly.
(c) Convex if $p \le 0$ or $p \ge 1$. Concave if $0 \le p \le 1$. Not strictly.
(d) Strictly convex.
(e) Neither. However, f is strictly convex on each half of {(x, y): 2xy < -I}.
4 (a) a = 1
(b) a 2 satisfies the equation cot a2 = 2a 2, a2 < n12.

SECTION 4.1

1 ( 1 1
-2 0 1
5). The rank is 2. The kernel consists of all scalar multiples of
tl - I1t2 + 2t3'
3 (a) The diagonal elements are e l , ... , en; all other elements are O.
(b) (L -I )i(X) = (ei)-I Xi provided e i #- 0 for every i.

SECTION 4.2
1 (a) Reflection in the line s = t.
(b) They are rotations through angle $3\pi/2$, $\pi/2$ respectively.
2 L is not a rotation.
4 (b) If g(t) = L(t) + x o , then L must be nonsingular.

SECTION 4.3
1 (a) The parabola x = c 2 + y2/4e 2 if c #- 0; the positive x-axis if e = O.
(b) The lines t = s(k ± Jk2=l) if k = 11m, m2 < 1; the line t = s if m = 1; the
line t = - s if m = - 1; {(O, O)} if m 2 > 1.
(c) {(x, y) E Q: x :-;; a2 }.

2 g(E2) = {(x,y):x ~ O,y ~ O}.


(a) The parts of the lines x + y = 21el, y = x + 21e1, y + 21e1 = x in g(E2).
(b) The union of the lines s + t = m(s - t) and s + t = -m(s - t).
(c) {(x, y): x 2 + y2 ~ 2a 2, X ~ 0, y ~ O}.
3 (b) The part of the cone between the plane z = 0 and the vertex.
(c) The s-axis; {(m, 1): m any integer}.
4 (a) g(L1) = {(x, x 2 ): x ~ f}.
(b) If e ~ t, the part of the ellipse S2 + st + t 2 = lie in L1; if e < t, the empty set.

5 (a) Except where S2 = t 2 ; everywhere in E2; everywhere in ~.


(b) gl = e l + e 2, g2 = -e l + e 2 in QI = frs, t): s - t > 0, s + t > O},
gl = -e l + e 2 , g2 = e l + e 2 in Q2 = frs, t): s - t < 0, s + t > O},
with similar expressions in the other two of the four quadrants Q3, Q4 into which
the lines s = ±t divide E2.
gl = - 2nt(sin 2n:s)e l + 2nt(cos 2n:S)e2'
g2 = (cos 2n:s)e l + (sin 2n:S)e2 - e 3 ;
r
gl = _(S2 + st + t 2 2(2s + t)[e l + 2(S2 + st + t 2)-l e2 ],
g2 = _(S2 + st + t 2)-2(S + 2t)[e l + 2(S2 + st + t 2 )-l e2 ],
(c) 2; 2 unless t = 0, and 1 ift = 0; 1.
(d) 2in QI' Q3' -2 in Q2' Q4; not applicable; O.

SECTION 4.4

F 12 = xfl2 + Xyf22 + f2' the partial derivatives of f being evaluated at (x, xy).

2 FI = fl + f3gl, F2 = f2 + f3g2'
Fll = fll + 2f13g1 + f33(gl)2 + f3gll'
F12 = f12 + f1392 + f32g1 + f33glg2 + f3912,
F 22 = f22 + 2f23g2 + fdg2)2 + f3g22'
3 (a) 162
(b) -1>I@1>'H).

4 (b) 15E I + 54E 2 , -3E I •

SECTION 4.5

1 (a) Yes, g(En) = En, g-I(X) = X - Xo.


(b) Yes, g(E2) = E2. g-I(X, y) = tux + 2Y)E I + (x - y)E 2].
(c) No, g(E2) is the half-plane 4x ~ - 9.
(d) Yes, g(~) = ~, g is not univalent since g( - s, - t) = g(s, t).
t
(e) Yes, g(M = {(x, y): 0 < y < exp( - x)}, g-I(X, y) = t[(a + b)EI + (a - b)E 2],
where a = (y-I + 2ex )I/2,b = (y-I - 2e X )I/2.

2 g-I(X) = [(x + 1)1/2 - lr/ 2, x> O.

3 Its matrix is (- ~ ~ ~).


100
4 If 0 < cos c < 1, the image of the line t = c is the right half of the hyperbola
X2/COS 2 C - y2/sin 2 c = 1; and if -1 < c < 0 it is the left half. If cos c = 0, it is the
y-axis, if cos c = 1 the right half of the x-axis; if cos c = -1, the left half.

5 (b) (gl.1)-I(X, y) = log R(x, Y)EI + 8(x, y)E2' where R(x, y) = (x 2 + yl)I/2, 8(x, y)
is the angle from the positive x-axis to (x, y).
(c) g(E2) = E2 - {(O,O)}.


SECTION 4.6
1 (j/ = -<I>d<l>2' 4Y" = -[(<1>2)2<1>11 - 2<1>1<1>2<1>12 + (<I>d 2<1>22]/(<I>2)3.
2 4YII = -[(<1>2)2<1>11 - 2<1>1<1>2<1>12 + (<I>d 2<1>22]/(<I>d 3.
3 4YI = -1, 1/11 = 0, 4Y2 = 0, 1/12 = 1 at (-1,1).

4 (b) Radius 3J"2/2.


(c) Radius j2t.

SECTION 4.7
1 $\{(x, y): F(x, y) = c\}$ is an ellipse if $\log c > 2$, is the one point set $\{(0, 0)\}$ if $\log c = 2$,
and is empty if $\log c < 2$. Any ellipse is a 1-manifold.
2 (a) The cone is not a 2-manifold.
(b) $2(x - 2) - (y + 1) + 4(z - 1) = 0$.
7 No.

SECTION 4.8
1 l

3 JI4/3.
9 (a) Al = A2 = I, A3 = - 1.

SECTION 5.1
2 V(Y) = 4, ViZ) = 12, V(Yu Z) = 13, V(Y n Z) = 3.

3 V(IIU[2)=¥, V(Iln[2)=!.

4 (e - l)eXP(~) I/m .
m exp(1/m) - 1

SECTION 5.2
5 239n/240.

SECTION 5.3
1 (a) Unbounded; (- 00, OJ.
(b) Bounded; £2.
(c) Bounded; £2.
(d) Bounded; {(x,y):y2:2: x 2 , Ixl + Iyl:::; I}.
3 1.


SECTION 5.5
1 (a) t, (t, I)·
(b) 1 + n/2, (x, 0) where x = 2/(6 + 3n).
2 (1 - 3e- 2 )/4.

3 f2 dx f~~~:X2 dz{r:'~::-Z2f dy + r,v:~::~z/ dY}.


4 8.

6 (b) (e - If.
(e) 2 - 2n.

SECTION 5.6
1 (a) Exists.
(b) Exists if $p < 1$; divergent if $p \ge 1$.
(c) Exists.
(d) Exists.
(e) Exists if p < 1, p + q > 1; divergent otherwise.
(f) Diverges.
(g) Diverges.
5 (a) O.
(b) n.
(e) O.

SECTION 5.7
1
6'

2 7.

SECTION 5.8

1 f f(x)dx = s: f[g(t)] (2 - 2t)dt, provided either integral exists.

2 2 log 2.
3 1p-.

SECTION 5.9
1 2a s/15.
2 i.
3 t.

4 I X/2

0 0 0
de
I[COS 6+ sin 6) - 1

r dr
I9(r cos 6, r sin 6)
f[r cos e, r sin e, z Jdz.

4n
5 - (a 2 _ b2)3/2.
3
7 n/2.
10 n 2 /4.
11 (a) trH)r(t)freh
(b) r(iW(1)fr(i).

(c) ~ r(a : I}
1
(d) ( - -
)c+' r(c + I).
d+1
(e) r(k + l)fr(n + k + 1).

SECTION 5.11

2 If
t
dV2 = ~
2-p
+ nt'-2IP(~)
2-p
if p of- 2;

If
t
dV2 = n(l + log t) if p = 2.

The limit as $t \to \infty$ is $2\pi/(2 - p)$ if $0 < p < 2$, and $+\infty$ if $p \ge 2$.

SECTION 5.12
1 (a) 2 tan -'(lft).
1 + exp(nt)
(b) t2 + 1

3 cP(x)= (j;/2)exp( - x 2/4).


5 t-'[2 exp(t 4 ) - (1 + l/Iog t)exp(t 2 log2 t)].

SECTION 6.1
1 Y = 2(x - fi)·
2 Y - 1 = 1{x - I), z - 1 = 1<x - 1).

3 fi( -e, + e2 )·

SECTION 6.2
1 (a) Simple closed curve.
(b) Neither.
(c) Simple arc.
2 (b) -1-;[(4 + 9b)3!2 - 8].

3 G(s) = g[s/J2], 0 ~ s ~ 2J2n.


4 -e l - 2e 2 , e l - 2e 2 , and any scalar multiples of these tangent vectors.
5 (b) No, since g'(O) = O.

SECTION 6.3
2 M2 = NI, N3 = O2 ,0 1 = M 3.
3 (a) f(x, y) = tx 2y + c.
(b) Not exact.
(c) Not exact.
(d) f(x, y) = x/y - y/x + cjJ(x, y), where cjJ is constant on each of the four quadrants
into which the coordinate axes divide E2.

SECTION 6.4
1 (a) taco
(b) nab.

2 (a) Ii.
(e) 2,¥.
(d) -i.
5 f(x, y, z) = tcjJ(p2), where cjJ(u) = J~ I/J(v)dv.

7 2J6.
8 (a) ne 3 •
(b) tJ2n 3 + 2J2n.

SECTION 6.6
3 (b) T = Cpa!(1 +a).

SECTION 7.1

3 (a) _e 12 - 3e 13 - 3e23
(b) -xy dx 1\ dy + x 2y dx 1\ dz + (3x + z)dy 1\ dz.
4 (a) 4xy dx 1\ dy.
(b) O.
(c) O.
(d) (of/ox)dx 1\ dy.


SECTION 7.2
1 -1, 0, 1, 0, O.
2 -1.

SECTION 7.3

2 (a) 6e l2 + 2e l3 _ e 23 .
(b) O.
(c) _3e I23 .
(d) 4e 123.
3 (a) _eI2345.
(b) e l235 _ e l234 _ e 1345 .

SECTION 7.4
1 (a) 2xy sin(xi)dx /\ dy /\ dz.
(b) 3 dx /\ dy /\ dz.

3 (a) ~ i:
n i= I
(_I)i+ IXi dx l /\ ... /\ dX i - 1 /\ dx i + I /\ ... /\ dx".

5 (a) 0 ifr is odd; -2 dw /\ d( ifr is even.


(b) O.

SECTION 7.5
(a) -e 2345 .
(b) O.
(c) -e I46 •
(d) 2e 6 , + e 2, - 2e l " where i' = (I, ... , i - I , i + 1, ... ,6).
(e) 2e l ... 6 •

2 (a) 2.
(b) O.
(c) -1.
(d) - 3.

3 Negative; - 1
4 Negative.
5 No.

6 -fi12.


SECTION 7.6
1 (a) Ifb = L*(a), then b l = al - a2 + 2a 3, b2 = -2al + 3a3'
(b) If ~ = CE12, then L2(~) = CV l A V2 = c(2e 12 + 7e13 - 3e23 ), where VI and v2
are the column vectors.
(c) If (0 = W12e12 + w13 e 13 + W23e23, then L!((O) = (-2W12 + 7W13 - 3W23)E12.
(d) O.
(e) j6i.
SECTION 7.7
1 (a) st(cos t ds - s sin t dt).
(b) -st cos t exp t ds A dt.
(c) 0
(d) f(st, s cos t, exp t).

3 (01 = M g dg l
0 + No g dg 2 • (d(O)' = (-aN - -OM) 0
O(gl, g2)
g -~- - ds A dt.
iJy iJx o(s, t)
4 (a) fog s ds A dt A duo
(b) s cos t sin t ds A du + S2 cos 2 t dt A duo

SECTION 7.9

SECTION 8.1
1 (a) 4.)3; the triangle with vertices 2e 3, e l + e 2, e l - 3e2 + 4e3'
(b) p sin IX; a cone from which points (x, y, 0), y ;::: 0, are deleted.
(c) [1 + S2 + t 2]1/2; the hyperboloid x = yz.
2 A basis for T..(x o) is:
(a) {e l + e 2 - 2e3, e l - 3e2 + 2e3}.

(b) {(COS lX)e l + f sin lX(e 2 + e 3), f sin lX(e 3 - e 2)}.

(c) {e l + e 2, e l + e 3}.

4 231t [23/2 - 1].

7 (b) ~ (13 3 / 2 - 1).


18

SECTION 8.2
1 8 = {(x, y): Ixl < 1, Ixl < y < (3 - x 2 )/2},
~ = {(x,z):lxl < 1,z > O,Z2 < 3 - x 2 - 2Ixl}.
g(x, y) = xe l + ye2 + (3 - x 2 - 2y)1/2e 3,
g(x, z) = xel + 1(3 - x 2 - Z2)e2 + ze 3,
4>(x, y) = (x, (3 - x 2 - 2y)1/2).
2 F(M) = {(s, t): t < exp s}.

SECTION 8.3

2
4 (a) 8n/3.
(b) -!3/6.
6 (b) 4n 2 r l r z .

SECTION 8.4
1 The integrals equal O.
2 (a) -n/4. (b) n/4.

5 ro is not of class C(l) on cl D.


8 (b) k = 2 - n.

SECTION 8.5
2 v(x,y) = C l r- 2 (xe l + ye z ).

SECTION 8.6
1 (c) Four possible orientations; on each half of the hyperbola M use either orientation
v or orientation -v, with v as in part (a).

SECTION 8.7
2 (a) - 8n. (b) n(e 4 - 1).

3 t.
4 t.
5 (a) o(x) = [( - sin S)el + (cos s)ez] /\ [( - sin t)e3 + (cos t)ell
(b) n 2 •

8 112 ,

9 ro is not of class Cl) on cl D.


n 8n
10 (a) - 3" (b)
15

SECTION 8.8
1 n.
2 (a) O.
(b) J2(1 - e- I ).

(c) O.

Index

A c
Absolute curvature 253 Canonical transformation 382
Accumulation point 43 Cantor's
Adiabatic curve 272 function 180
Adjoint set 47
of a linear transformation 127 theorem 41
of a multicovector 314 Cartesian product 28
of a multivector 311 Cauchy
Affine transformation 125 convergence criterion 40
Almost everywhere 231 inequality 6
Alternating multilinear function 283 principal value 205
Analytic function (real) 97 sequence 37, 64
Angle between two vectors 7 Cauchy- Riemann equations 134
Archimedean property 4 Center of mass 197
Centroids 197, 265, 339
Chain rule 136
Change of variables in Riemann
B integrals 387
Characteristic
Ball, n-dimensional 12 function 187
unit 198, 220, 340 values and vectors 163
Banach space 65 Closed
Barycenter 27 differential form 265, 294
Barycentric coordinates 24 set 17, 51
Beta function 219 Closure of a set 15,51
Bilinear function 71 Codifferential 315
alternating 277 Compact
Bolzano-Weierstrass theorem 45 set 60
Boundary (or frontier) of a set 15,51 topological space 60
Bounded set 41 Comparison tests 43, 203
transformation (or function) 33 Complete metric space 64


Components 8, 80, 287, 299


Composite function 53 D
theorem 134
Concave function 107 Decomposable multi vector 299
Conformal transformation 134 Dense subset 47
Connected De Rham's theorem 374
set 56 Derivative
topological space 56 in a direction 77
Continuous functions 34, 52 of vector-valued functions 245
spaces of 67 partial 78, 129
transformations 34 Diameter of a set 41
Convergence theorems, for integrals 227 Diffeomorphism 143
Convergent sequence 37, 64 Differentiable
Convergent sequence of functions function 83, 86
almost everywhere 232 transformation 128, 131
in mean of order p 240 Differential
pointwise 68 of a function 83, 254
uniformly 68 of a transformation 128
Convex Differential form 254, 278, 291
combinations 22, 24 closed 256, 294
function 107 exact 255, 294
hull 27 integral of 259, 356
set 12 transformation law 309
Coordinate Differentiation under integral sign 237
patch 331 Directional derivative 77
system, in E" 216; on manifolds 329 Directions in E" 76
Coordinates Dirichlet problem 345
barycentric 24 Disconnected set 56
cylindrical 218 Disjoint sequence of sets 175
spherical 217 Distance 7
standard cartesian 217 from a point to a set 62
Countable in a metric space 63
additivity of measure 175 Domain of a function 29
set 66, 178 Dual
Covector 80 of a linear transformation 123, 307
of degree 2 277 space of E" 80
of degree r 287 space of E~ 299
Covering 60 vector space 384
Critical point 100
nondegenerate 103
Cross product 316
E
Cube, n-dimensional 10,44
Curl 317,364 Elementary step function 184
Curve 249 Energy 272, 376, 381
length 250 Entropy 273
multiplicity of points 249 Essentially bounded function 243
piecewise smooth 252 Euclidean
standard representation 251 distance 6
trace 250 inner product 6, 81, 288, 299


length 6 essentially bounded 243


norm 6, 81, 299 gamma 203, 218
space E^n 5 harmonic 137, 345
Euler's formula 89 homogeneous 89
Exact differential form 255, 294 integrable 183, 189, 228, 233
Extended real number system 222 linear 79, 384
Extension of functions 96 measurable 187, 189, 224, 233
Exterior monotone 388
algebra 291 multilinear 283
differential 279, 292 of class C<q) 89,91,96
nonnal 341 of class C<OO) 97
point 15 step 181
product 278, 282, 292, 296 support of 182
Extreme points 116 univalent 29
Extremum 100 vector-valued 29, 119
constrained 161 Fundamental theorem of calculus 387
of a linear function 116
relative 100
G
Gamma function 203, 218
F Gauss's theorem 342
Face of a simplex 24, 305 Gradient
Fatou's lemma 230 direction of 100
Field, ordered 2 method 265
Figure 169 vector 87
Flat transformation 327 Gram-Schmidt orthogonalization
Flow 344 process 10
Fluid flow 350 Grassmann algebra 291
irrotational 351, 364 Greatest lower bound 3
Flux, outward 342 Green's
Force covector 263 formulas 345
Force field 263 theorem 360
conservative 263 Grid 169
Force vector 263
Frame 301
Frobenius integration theorem 295 H
Frontier point 15 Half-space 12
Fubini's theorem 235 Hamiltonian 382
Function 29 Harmonic functions 137, 345
analytic (real) 97 Hausdorff space 66
beta 219 Heine- Borel theorem 60
bilinear 71, 277 Hessian determinant 103
bounded 33 Hilbert space 243
Cantor's 180 Holder's inequality 166, 242
characteristic 187 Homeomorphic topological spaces 55
concave 107 Homeomorphism 55
continuous 34, 52 Homogeneous function 89
convex 107 Homotopy 370
differentiable 83, 86 null, strict 370


Hyperplane 11 Inversion of order of partial


supporting 20 derivatives 92
Isolated point 43
I Isometry 125
Image of a set 30 Isomorphism 385
Implicit function theorem 148 Isothermal curve 273
Increasing Iterated integrals 191
function 388
r-tuple 284 J
Increment 83
Indexed collections of sets 16 Jacobian of a transformation 129
intersections of 17 Jordan measurable set 190
unions of 17
Induced linear transformation 306 K
Infinite series 42 Kepler's laws 377, 378, 379
Inner
Klein bottle 354
measure 172
Kronecker symbol 8, 81
product 6,71,81,88,288,289
generalized 285
product space 243
Integrable function 183, 189,228,233
continuous 200, 202 L
Integral Lagrange multiplier rule 161
change of variables 211, 387 Lagrange's equations of motion 381
conditionally convergent 204 Laplace's equation 137,345
differentiation under sign of 237 Laplacian 137
iterated 191 of a sequence 37, 64
line 259, 264 Lebesgue's dominated convergence
lower 183 theorem 232
of continuous functions 200 Leibniz's rule 239
of a differential form 259, 356 Length 6
over bounded sets 186 of a curve 250
over E" 183, 227 Level sets 104, 155
over manifolds 334, 337 Limit
over measurable sets 233 lower 225
Riemann 184, 386 of a sequence 37, 64
transformation of 209 of a transformation 31
upper 183 upper 225
Integrating factor 269, 318 Line
Interior in E" 10
of a set 15, 5 I integral 259
point 15 segment 10
Intermediate value theorem 58 Linear
Intersection of manifolds 159 dependence 383
Interval 3, 57 function 79, 384
half-open 3, 184 independence 383
n-dimensional 168 Linear transformation
semi infinite 3, 169 adjoint of 127
Inverse function theorem 140 characteristic values and vectors 163
image of a set 30 column vectors 120

408
Index

dual of 123, 307


homothetic 124 N
induced 306 Neighborhood in E" 15
kernel 120 in a metric space 63
matrix of 120 in a topological space 50
nonsingular 121 noneuclidean, in E" 70
nullity 120 of infinity 36
rank 120 punctured 31
row covectors 121 relati ve 47, 54
skew symmetric 160 Nondecreasing, nonincreasing
Lorentz transformation 139 function 388
Lower bound 3 sequence of numbers 38
LP -spaces 240 sequence of sets 179
Nondegenerate critical point 103
Noneuclidean
M inner product 71, 88
norm 70
Manifolds 153, 155, 321 Norm 65, 70
Matrix of a linear transformation 120 Normal vector to a manifold 157, 341
Maximum 49, 100 Normed vector space 65
Maxwell's equations 318 Null sets 178
Mean value theorem 86, 385 subsets of a manifold 338
for integrals 190
Measurable
function 187, 189,224,233 o
set 172, 176 Open
subset of a manifold 334, 337 covering 60
Measure 170, 176 set 16,51
and integration on manifolds 324,334, transformation 367
337 Orientable manifold 353
of a parallelepiped 208, 303 Orientations 77, 304
of a simplex 208, 303 for manifolds 353
of the unit n-ball 198, 220 induced 355
outer and inner 172 Orthogonal
Metric space 63 complement 10
complete 64 transformation 126
Minimum 49 vectors 8
absolute 99 Orthogonalization process of
relative 100 Gram-Schmidt 10
strict 99 Orthonormal bases 8
Minkowski's inequality 75, 240 Ostrogradsky's theorem 342
Mobius strip 354 Outer measure 172
Moments 197, 198,203,265,339
Momentum 376
p
angular 376
Multicovector 287 Pappus's theorem 340
Multilinear function, alternating 283 Parallel hyperplanes 11
Multivector 295 Parallelepiped 14, 208, 303
decomposable 299 degenerate 209


Parametric representations 247


equivalence of 249 S
Partial derivative of a function 78 Saddle point 103
of a transformation 129 Scalar 5
Partition of unity 336 product 80, 299
Pathwise connectedness 58 Second derivative test 102
Pfaffian differential equation 269 for constrained extrema 166
Plane, k-dimensional 14 Second difference quotient 92
Poincare's lemma 374 Seminorm 75
Point Sequence, infinite 37
critical 100 bounded 38
isolated 43 Cauchy 39, 64
of accumulation 43 convergent, divergent 37, 64
Polygonal path 59 limit of 37, 64
Polytope, convex 19 monotone 38
Potential nonincreasing, nondecreasing 38
energy 376 subsequence of 47
of a force field 263 upper and lower limits 225
velocity 351 Sequentially compact 47
Principal Series, infinite 42
minor determinant 105 absolutely convergent 43
normal vector 253 comparison test for convergence 43
convergent 42
divergent 42
Sets, related notions
Q Sigma compact set 227
Simple arc, closed curve 250
Quadratic form 102 Simplex 24, 208, 303
negative definite 102 standard 24
positive definite 102 Simply connected set 256, 372
Solenoidal 343
Sphere, (n - I)-dimensional 12
unit 218,340
R Spherical coordinates 217
Standard
Raising indices 81 cartesian coordinate functions 35, 255
Real numbers 2 n-simplex 24
Regular domain 341 Standard basis
on a manifold 365 for £" 8
Regular transformation 143, 322 for (£")* 81
flat 327 for E~, (E~)* 287, 299
of a submanifold 367 Star-shaped set 374
Relative Step function 181
extrema 100 elementary I 84
neighborhood 47, 54 Stereographic projection 333
topology 51, 54 Stokes's formula 281, 363, 364
Restriction of a function 30 Support of a function 182
Riemann integral 184, 386 Supporting hyperplane 20
Rotations 127 Surface of revolution 340

410
Index

T v
Tangent Vector 5
plane to a manifold 84, 157 Vector space 383
space 156, 323 bases 384
vector 155, 246, 252 dimension 384
Taylor's subspaces 120, 384
formula 94
series 97
theorem 386 w
Tensor 290, 299 Wave equation 139
algebra 291 D'Alembert's solution, spherical
field 311 waves 139
Thermal system 272 Whitney's extension theorems 96
extensive, intensive variables 272 Work 263
Thermodynamics, first and second
laws 273
Topological space, axioms for 50
Topology of E"
basic notions 14
relative topology 51
Totally disconnected set 59
Transformation 29
affine 125
bounded 33
composition 134
conformal 134
continuous 34, 48
differentiable 128, 131
flat (regular) 327
Jacobian of a 129
linear 120
of class C<q) 131
of integrals 209
open 367
orthogonal 126
partial derivative of 129
regular 143, 322
Translation 125
Triangle inequality 6, 63
Triple scalar product 317

u
Uniform
continuity 50
convergence of sequences 68
distance, norm 67
Upper bound 3
