Contents

1 Week 1 . . . 3
  1.1 Lecture 1 . . . 3
  1.2 Lecture 2 . . . 7
  1.3 Lecture 3 . . . 11
  1.4 Lecture 4 . . . 16
2 Week 2 . . . 20
  2.1 Lecture 5 - Convergence of Sums and Some Exam Problems . . . 20
  2.2 Lecture 6 - Some More Exam Problems and Continuity . . . 25
  2.3 Lecture 7 - Path-Connectedness, Lipschitz Functions and Contractions, and Fixed Point Theorems . . . 30
  2.4 Lecture 8 - Uniformity, Normed Spaces and Sequences of Functions . . . 34
3 Week 3 . . . 39
  3.1 Lecture 9 - Arzela-Ascoli, Differentiation and Associated Rules . . . 39
  3.2 Lecture 10 - Applications of Differentiation: Mean Value Theorem, Rolle's Theorem, L'Hopital's Rule and Lagrange Interpolation . . . 45
  3.3 Lecture 11 - The Riemann Integral (I) . . . 51
  3.4 Lecture 12 - The Riemann Integral (II) . . . 58
4 Week 4 . . . 65
  4.1 Lecture 13 - Limits of Integrals, Mean Value Theorem for Integrals, and Integral Inequalities . . . 65
  4.2 Lecture 14 - Power Series (I), Taylor Series, and Abel's Lemma/Theorem . . . 72
  4.3 Lecture 15 - Stone-Weierstrass and Taylor Series Error Approximation . . . 80
  4.4 Lecture 16 - Power Series (II), Fubini's Theorem, and exp(x) . . . 87
5 Week 5 . . . 95
  5.1 Lecture 17 - Some Special Functions and Differentiation in Several Variables . . . 95
  5.2 Lecture 18 - Inverse Function Theorem, Implicit Function Theorem and Lagrange Multipliers . . . 98
  5.3 Lecture 19 - Multivariable Integration and Vector Calculus . . . 99
Week 1
As per the syllabus, Week 1 topics include: cardinality, the real line, completeness, topology,
connectedness, compactness, metric spaces, sequences, and convergence.
1.1 Lecture 1

Today's main goal will be the construction of the real numbers. We will take the constructions of N, Z, and Q for granted.
Let's start with a fact. The rationals form a dense linear order with no endpoints.
Unpacked, this means:
(i) Dense: For all x and y, there exists z such that x < z < y
(ii) Linear: For all x and y, either x < y or x = y or y < x
(iii) No endpoints: For all x, there exists y such that y < x; for all x, there exists y such
that y > x
It turns out that every countable dense linear order with no endpoints is isomorphic to
(Q; <). We will come back to this result after a brief discussion of cardinality.
Cardinality:

Definition 1.1.1. Two sets are equinumerous (written A ≅ B) if there is a bijection f : A → B.

Note that ≅ determines an equivalence relation:

(i) ≅ is reflexive (take the identity)
(ii) ≅ is symmetric (if f : A → B is a bijection, then f⁻¹ : B → A is also a bijection)
(iii) ≅ is transitive (if f : A → B and g : B → C are bijections, then g ∘ f : A → C is a bijection)

Definition 1.1.2. A set x is finite if there exists n ∈ N such that x ≅ {0, 1, ..., n − 1}.

Definition 1.1.3. A set x is infinite if x is not finite.

We write A ⪯ B if there exists an injection f : A → B.

Theorem 1.1.4 (Cantor-Schroeder-Bernstein). If A ⪯ B and B ⪯ A, then A ≅ B.

The proof of CSB is beyond the scope of this lecture, so we omit it here. Using CSB, we can prove several useful facts.

Corollary 1.1.5 (Pigeonhole Principle). For all n, m ∈ N, if n < m, then {0, 1, ..., m − 1} does not inject into {0, 1, ..., n − 1}.
Now we will construct the real numbers, using equivalence classes of strictly increasing sequences of bounded rational numbers, which we will naturally identify with their supremums.

Definition 1.1.16. A sequence (a_n)_{n=0}^∞ is strictly increasing if for all n, m ∈ N, n < m implies a_n < a_m. We say (a_n)_{n=0}^∞ is bounded (in Q) if there exists c ∈ Q such that for all n ∈ N, a_n < c.

Let E be the set of all strictly increasing bounded sequences of rationals. For (a_n), (b_n) ∈ E, set (a_n) ∼ (b_n) if and only if for all n ∈ N there exists m ∈ N such that b_m > a_n, and for all n ∈ N there exists m ∈ N such that a_m > b_n. In colloquial terms, the sequences (a_n) and (b_n) are interleaved.

Proposition 1.1.17. ∼ is an equivalence relation on E.

Proof. It is straightforward to verify the three necessary conditions.

(1) (a_n) ∼ (a_n), since (a_n) is strictly increasing
(2) ∼ is symmetric, since the requirement for equivalence is symmetric in (a_n) and (b_n)
(3) ∼ is also transitive, by a (layered) application of the requirement for equivalence

For (a_n) ∈ E, let [a_n] be the equivalence class of (a_n), which formally is the set {(b_n) : (b_n) ∼ (a_n)}. By moving to equivalence classes, we have [a_n] = [b_n] if and only if (a_n) ∼ (b_n), i.e. we translate equivalence to equality. Let E* be the set of equivalence classes. Define < on E* by setting [a_n] < [b_n] if there exists k ∈ N such that for all n ∈ N, a_n < b_k. Informally, [a_n] < [b_n] if the terms of (b_n) eventually bound the terms of (a_n).

There are two things to check here, namely that < is well-defined, and that < is a linear order on E*.

Well-defined: If (a_n) ∼ (a′_n) and (b_n) ∼ (b′_n), suppose there exists k ∈ N such that for all n ∈ N, a_n < b_k. Then take l such that b_k < b′_l. For each n ∈ N, there exists m ∈ N such that a′_n < a_m. So a′_n < a_m < b_k < b′_l, so < is well-defined.

Linear order on E*: This is precisely what we have rigged in our definition of the equivalence relation on E. That is, if [a_n] ≮ [b_n] and [b_n] ≮ [a_n], then (a_n) and (b_n) are interleaved, so (a_n) ∼ (b_n), i.e. [a_n] = [b_n].
Now, there is the matter of identifying the rationals in E*. The map p ↦ [(p − 1/n)_{n=1}^∞] embeds Q into E* (that is, is an order-preserving injection). From now on, we will identify [(p − 1/n)_{n=1}^∞] with p for p ∈ Q. Replacing E* with an isomorphic copy, we have Q ⊆ E*; call this isomorphic copy R, the real line.

Proposition 1.1.18. Q is dense in R. This means that for all x, y ∈ R with x < y, there exists z ∈ Q such that x < z < y.

Proof. Say x = [a_n], y = [b_n]. Since x < y, there exists k ∈ N such that a_n < b_k for all n ∈ N. Take z = b_{k+1} ∈ Q. Then z < y, since b_{k+1} < b_{k+2}, and every term of the sequence representing z is bounded by b_{k+1}, hence by b_{k+2}. We also have z > x, since (by the archimedean property of the rationals) there exists n ∈ N such that b_k < b_{k+1} − 1/n.
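The archimedean step in this proof can be sketched numerically: pick a denominator q with 1/q smaller than the gap y − x, then take the least numerator putting p/q above x. A minimal Python sketch (the function name and the sample endpoints are illustrative, not from the lecture):

```python
from math import floor

def rational_between(x, y):
    """Return (p, q) with x < p/q < y, assuming x < y."""
    q = int(1 / (y - x)) + 1   # archimedean: q > 1/(y - x), so 1/q < y - x
    p = floor(x * q) + 1       # least integer with p > x * q
    return p, q

p, q = rational_between(2.236, 2.237)
# then x < p/q, and p/q <= x + 1/q < x + (y - x) = y
```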
1.2 Lecture 2
Today, we will talk about the properties which characterize R, as well as some general
topology.
Definition 1.2.1. An order (L; <) is Dedekind complete if

(i) Every A ⊆ L which is bounded above has a supremum (i.e., a <-least upper bound)
(ii) Every A ⊆ L which is bounded below has an infimum (i.e., a <-greatest lower bound)
Proposition 1.2.2. R is Dedekind complete.
Proof. Let A ⊆ R be bounded above. Let f : N → Q be onto, i.e. an enumeration of the rationals. Put A_n = {f(i) : i ≤ n, there exists x ∈ A such that f(i) ≤ x}, and let (a_n) be the sequence defined by a_n = max A_n. Clearly, a_n ≤ a_{n+1}, since A_n ⊆ A_{n+1}. Colloquially, we are building (a_n) to be a nondecreasing sequence of elements of Q which are less than some element of A, with the goal of showing that [a_n] is the sup of A. We break into two cases:

Case 1: Suppose (a_n)_{n=0}^∞ is eventually constant. Say that a_n = p ∈ Q for all n ≥ k for some k ∈ N. One can check that p is a sup for A.

Definition. A topological space is a pair (X, T), where T is a collection of subsets of X such that ∅, X ∈ T, T is closed under finite intersections (if V_1, ..., V_n ∈ T, then ⋂_{i=1}^n V_i ∈ T), and T is closed under arbitrary unions (if V_i ∈ T for all i ∈ I, then ⋃_{i∈I} V_i ∈ T).
T is called the topology of the space, and the elements of T are called the open sets. We
will refer to X as the space when T is clear.
Definition 1.2.7. We say T is generated from U ⊆ T if T consists of arbitrary unions of sets from U (note then U must be closed under finite intersection). U is called a basis for T. Elements of U are basic open sets.
Example 1.2.8. If (L; <) is a linear order with no endpoints, the open intervals generate a
topology, called the order topology.
Definition 1.2.9. V is an (open) neighborhood of x if V is open and x ∈ V. V is a basic open neighborhood if in addition V is a basic open set.
A basis for the neighborhoods of x is any U consisting of neighborhoods of x such that every neighborhood of x contains some V ∈ U.

Proposition 1.2.10. A ⊆ X is open if and only if for all x ∈ A, there exists an open neighborhood V of x such that V ⊆ A.

Proof. (⟹) If A is open, then for each x ∈ A, A itself is an open neighborhood of x contained in A.
(⟸) For each x ∈ A, there is an open neighborhood V of x such that V ⊆ A. Then A is the union of all such neighborhoods, and is therefore open.
Definition 1.2.11. The interior of E ⊆ X is the union of all open subsets of X contained in E. It is the largest open subset of X contained in E.
The exterior of E ⊆ X is the union of all open subsets of X which have empty intersection with E. It is the largest open subset of X contained in X \ E, hence is the interior of X \ E.
The boundary of E ⊆ X is the set of all points of X which are not in Int E or Ext E.

Definition 1.2.12. A ⊆ X is closed if X \ A is open. Note that arbitrary intersections of closed sets are closed, as are finite unions.

Definition 1.2.13. The closure of A in X, denoted Ā, is the intersection of all closed sets in X containing A. It is the smallest closed set of X containing A.

Definition 1.2.14. D ⊆ X is dense in X if every nonempty open subset V of X contains a point of D.
Definition 1.2.15. Let X̂ ⊆ X, T a topology on X. Then the relative or induced topology T̂ on X̂ is defined to be T̂ = {V ∩ X̂ : V ∈ T}.

Definition 1.2.16. An open cover of X is a collection {V_i}_{i∈I} of open subsets of X such that ⋃_{i∈I} V_i = X. A subcover of X is a subcollection {V_i}_{i∈J}, J ⊆ I, such that ⋃_{i∈J} V_i = X.

Definition 1.2.17. X is compact if every open cover has a finite subcover. Y ⊆ X is compact if Y is compact in the relative topology. Equivalently, whenever {V_i}_{i∈I} are open in X such that ⋃_{i∈I} V_i ⊇ Y, then there exists finite J ⊆ I such that ⋃_{i∈J} V_i ⊇ Y.
Proposition 1.2.18. If N ⊆ X is compact and V is open, then N \ V is compact.

Proof. Let U be an open cover of N \ V. Since V is open, U ∪ {V} is an open cover of N. Since N is compact, there is a finite subcover U_0 ⊆ U ∪ {V} of N. If V ∈ U_0, replace U_0 by U_0 \ {V}, which yields a finite subcover of N \ V.
Definition 1.2.19. (X, T) is locally compact if for every x ∈ X, there is a compact N containing a neighborhood of x.

Definition 1.2.20. (X, T) is connected if it cannot be partitioned into two nonempty open sets, i.e. there are no nonempty open, disjoint A, B such that A ∪ B = X.
Proposition 1.2.21. R (with the order topology) is connected.

Proof. Suppose R = A ∪ B for some nonempty open, disjoint subsets A, B. Let a ∈ A, b ∈ B; WLOG, a < b. Put E = {x ∈ A : x < b}; since a ∈ E, E is nonempty and bounded above. Let z = sup(E), which exists since R is Dedekind complete. We must have z ∈ A or z ∈ B. We break into cases:

Case 1: Suppose z ∈ A. Since A is open, there is an open interval (x, y) such that z ∈ (x, y) ⊆ A. Since R is dense, we can find z′ with z < z′ < min{b, y}. Then z′ ∈ (x, y), so z′ ∈ A, but z′ < b, contradicting that z is an upper bound for E.

Case 2: Suppose z ∈ B. Since B is open, there is an open interval (x, y) such that z ∈ (x, y) ⊆ B. Then x < z, so x is not an upper bound for E (z is the least upper bound for E). Thus, we can find ẑ ∈ E such that ẑ > x. Since z is an upper bound for E, ẑ ≤ z, so ẑ ∈ (x, y) ⊆ B. This is a contradiction, since ẑ ∈ A.
1.3 Lecture 3
Today, we will give a few more results on compactness, and will introduce the Baire Category
Theorem.
Proposition 1.3.1. Let (X, T) be a compact space. Let {C_i}_{i∈N} be a collection of closed subsets of X. Suppose that for all n ∈ N, ⋂_{i≤n} C_i ≠ ∅. Then ⋂_{i∈N} C_i ≠ ∅.
(⋂_{n∈N} G_n) ∩ (⋂_{n∈N} D_n) ≠ ∅. But this is a contradiction, as ⋂_{n∈N} G_n = Q and ⋂_{n∈N} D_n = R \ Q.
Metric Spaces
Definition 1.3.13. A metric space is a pair (X, d) where d : X × X → [0, ∞) satisfies:

(1) For all x ∈ X, d(x, x) = 0
(2) For all x ≠ y ∈ X, d(x, y) ≠ 0
(3) For all x, y ∈ X, d(x, y) = d(y, x)
(4) (Triangle inequality): For all x, y, z ∈ X, d(x, z) ≤ d(x, y) + d(y, z)
Intuitively, d(x, y) can be thought of as the distance between x and y.
Proposition 1.3.14. Let (X, d) be a metric space. Then the sets
B(z, r) = {x : d(z, x) < r}, for z ∈ X, r > 0,
generate a topology on X called the metric topology; B(z, r) is the open ball of radius r centered at z.

Proof. Let T be the collection of all unions of open balls. To show T is a topology, it suffices to check that the intersection of any two open balls is a union of open balls. For this, it is enough to show that for all z_1, z_2 ∈ X and r_1, r_2 ∈ (0, ∞), for all x ∈ B(z_1, r_1) ∩ B(z_2, r_2), there exists s > 0 such that B(x, s) ⊆ B(z_1, r_1) ∩ B(z_2, r_2).

This essentially boils down to the triangle inequality. Let d_1 = d(z_1, x) < r_1 and d_2 = d(z_2, x) < r_2. Let s > 0 be small enough that d(z_1, x) + s < r_1 and d(z_2, x) + s < r_2; this is possible because R is dense.

Fix y ∈ B(x, s). Then d(z_i, y) ≤ d(z_i, x) + d(x, y) < d_i + s < r_i for i = 1, 2. Hence, B(x, s) ⊆ B(z_1, r_1) ∩ B(z_2, r_2).
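The containment B(x, s) ⊆ B(z_1, r_1) ∩ B(z_2, r_2) from this proof can be checked numerically for the Euclidean plane (a hedged sketch; the points and radii are made up for the demo):

```python
import math
import random

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

z1, r1 = (0.0, 0.0), 1.0   # two overlapping open balls
z2, r2 = (1.2, 0.0), 1.0
x = (0.6, 0.0)             # a point in their intersection

# as in the proof: pick s > 0 with d(z1, x) + s < r1 and d(z2, x) + s < r2
s = min(r1 - dist(z1, x), r2 - dist(z2, x)) / 2

random.seed(0)
inside = True
for _ in range(1000):
    # random point y in B(x, s)
    t, u = random.uniform(0, 2 * math.pi), s * random.random()
    y = (x[0] + u * math.cos(t), x[1] + u * math.sin(t))
    inside = inside and dist(z1, y) < r1 and dist(z2, y) < r2
```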
Example 1.3.15. The usual metric on R: d(x, y) = |x − y|. The metric topology on R is the usual order topology, generated by open intervals.

Example 1.3.16. The discrete metric on any set X:
d(x, y) = 0 if x = y, and d(x, y) = 1 if x ≠ y.
Every subset of X is open with respect to this metric.
Definition 1.3.17. A metric space is compact if it is compact with the metric topology. Equivalently, any covering of X with open balls has a finite subcover.
Y ⊆ X is compact if (Y, d↾Y×Y) is compact. Equivalently, any covering of Y with open balls has a finite subcover.

Definition 1.3.18. A sequence (x_n)_{n=1}^∞ is Cauchy if for every ε > 0, there exists N ∈ N such that for all k, l ≥ N, d(x_k, x_l) < ε. Intuitively, the points of a Cauchy sequence cluster arbitrarily closely together if you go far out enough in the sequence.
For a bounded sequence (x_n)_{n=1}^∞, define
lim sup x_n = inf_{n∈N} sup_{l≥n} x_l and lim inf x_n = sup_{n∈N} inf_{l≥n} x_l.
These are well-defined: when we remove elements from our sequence, the inf can only go up, whereas the sup can only decrease, but the sups will never fall below the infs. In particular, this shows
sup_{n∈N} inf_{l≥n} x_l ≤ inf_{n∈N} sup_{l≥n} x_l.

Claim 1.3.27. For every ε > 0, there exists N ∈ N such that for all k ≥ N, x_k < lim sup x_n + ε.

Proof. Let N be large enough that
sup_{l≥N} x_l − lim sup x_n < ε, i.e. sup_{l≥N} x_l < lim sup x_n + ε;
such N exists because the sequence (sup_{l≥n} x_l)_{n=1}^∞ is nonincreasing with infimum lim sup x_n.
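The two iterated extrema can be approximated on finite truncations. For x_n = (−1)^n (1 + 1/(n+1)) the tail sups decrease toward 1 and the tail infs increase toward −1, so lim sup = 1 and lim inf = −1 (a rough numerical sketch; the truncation parameters are chosen only for the demo):

```python
def tail_extrema(x, N):
    """Approximate (lim sup, lim inf) of x(0), x(1), ... truncated at N."""
    vals = [x(n) for n in range(N)]
    sups, infs = [], []
    hi, lo = float("-inf"), float("inf")
    for v in reversed(vals):          # suffix max/min in one backward pass
        hi, lo = max(hi, v), min(lo, v)
        sups.append(hi)
        infs.append(lo)
    sups.reverse()
    infs.reverse()
    # inf of tail sups, sup of tail infs (tails starting in the first half)
    return min(sups[:N // 2]), max(infs[:N // 2])

x = lambda n: (-1) ** n * (1 + 1 / (n + 1))
limsup_approx, liminf_approx = tail_extrema(x, 10000)
```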
1.4 Lecture 4
Today, we will talk more about completeness, as well as its connection with compactness.
Definition 1.4.1. A limit point of A (in a metric space (X, d)) is a point z such that for all ε > 0, A ∩ B(z, ε) ≠ ∅. Equivalently, z is a limit point of A if z is the limit of a sequence of values in A.
Proposition 1.4.2. Let (X, d) be a metric space. Then A ⊆ X is closed if and only if A contains each of its limit points.

Proof. (⟹) Suppose A is closed, and let z be a limit point of A. If z ∉ A, then since X \ A is open in the metric topology, there exists ε > 0 such that B(z, ε) ⊆ X \ A. But then A ∩ B(z, ε) = ∅, a contradiction since z is a limit point.
(⟸) Suppose A contains all its limit points. Pick z ∈ X \ A. Then z is not a limit point of A, so there exists ε > 0 such that B(z, ε) ∩ A = ∅, i.e. B(z, ε) ⊆ X \ A. Hence, X \ A is open, so A is closed.
Definition 1.4.3. Let (X, d) be a metric space. A subset A ⊆ X is complete if and only if (A, d↾A×A) is complete.
Proposition 1.4.4. If (X, d) is complete, then A ⊆ X is complete if and only if A is closed.

Proof. (⟹) Suppose A is complete. Let z be a limit point of A. Then there is a sequence (x_n)_{n=1}^∞ of points in A such that x_n → z; (x_n) is convergent, hence Cauchy in (X, d). Since (x_n) is a sequence of points in A, it is also Cauchy in (A, d↾A×A). Since A is complete, (x_n) must have a limit in (A, d↾A×A); call it z′ ∈ A. Then x_n → z′ also in (X, d), so z = z′, whence z ∈ A. (Note that we have not used the completeness of X here.)
(⟸) Suppose A is closed. Let (x_n) be Cauchy in (A, d↾A×A). Then (x_n) ⊆ A and (x_n) is Cauchy in (X, d). Since X is complete, there exists z ∈ X such that x_n → z. Since A is closed and z is a limit point of A, z ∈ A.
Proposition 1.4.5. Let (X, d) be a metric space. The sets B̄(z, r) = {x : d(z, x) ≤ r} are closed.

Proposition 1.4.6. Let (X, d) be a metric space. If U is open, then for each z ∈ U, there exists ε > 0 such that B̄(z, ε) ⊆ U.

Proof. Let ε be sufficiently small that B(z, 2ε) ⊆ U. Then B̄(z, ε) ⊆ B(z, 2ε) ⊆ U.
Proposition 1.4.7. Let (X, d) be complete. Let {B_n}_{n∈N} be open balls of radius r_n with lim_{n→∞} r_n = 0 and B_{n+1} ⊆ B_n. Then ⋂_{n∈N} B_n is nonempty.

Proof. Say B_n = B(x_n, r_n). Then for all N ∈ N and all k, l ≥ N, we have x_k, x_l ∈ B_N, since the open balls are nested, which gives d(x_k, x_l) < 2r_N. Since r_n → 0, this implies that (x_n) is Cauchy: we can take N large enough that 2r_N is arbitrarily small, which bounds d(x_k, x_l) for all k, l ≥ N. By the completeness of (X, d), (x_n) has a limit; call it z. For every N, z is the limit of the tail (x_n)_{n≥N+1}. Hence z is a limit point of B_{N+1} ⊆ B̄_{N+1}. Since B̄_{N+1} is closed, z ∈ B̄_{N+1}, so z ∈ B_N, since B̄_{N+1} ⊆ B_N.
Theorem. A metric space (X, d) is compact if and only if it is sequentially compact, i.e. every sequence in X has a convergent subsequence.

Proof. (⟹) Suppose X is compact, and let (x_n)_{n=1}^∞ be a sequence in X; we show some z ∈ X is a limit of a subsequence of (x_n)_{n=1}^∞. Fix z ∈ X; if for every r > 0 and N ∈ N, there exists k ≥ N such that x_k ∈ B(z, r), then we can inductively construct a subsequence of (x_n) which converges to z. So suppose no such z exists: for each z ∈ X, there exist r_z > 0 and N_z ∈ N such that for all n ≥ N_z, x_n ∉ B(z, r_z). The set {B(z, r_z) : z ∈ X} is an open cover of X. By compactness, there are finitely many z_1, ..., z_k such that B(z_1, r_{z_1}) ∪ ... ∪ B(z_k, r_{z_k}) = X. But for any n ≥ max{N_{z_1}, ..., N_{z_k}}, we have x_n ∉ B(z_1, r_{z_1}) ∪ ... ∪ B(z_k, r_{z_k}) = X, a contradiction.

(⟸) Let {V_i}_{i∈I} be an open cover of (X, d). Suppose {V_i}_{i∈I} has no finite subcover. We will aim to construct a sequence (x_n) with no convergent subsequence. The intuitive idea is that we can construct a sequence (x_n) whose terms are spaced sufficiently far from one another that (x_n) cannot have a convergent subsequence. We work inductively as follows. Let x_0 be some point in X. Suppose we have constructed x_n. Then:
(i) There exists i_n ∈ I such that x_n ∈ V_{i_n}; this is possible since {V_i}_{i∈I} is an open cover of X
(ii) Fix r_n > 0 such that B(x_n, r_n) ⊆ V_{i_n}; this is possible since V_{i_n} is open
(iii) Pick i_n, r_n so that r_n ≥ 1/(2L) for the smallest possible L ∈ N; in other words, we want to pick i_n and r_n so that we can put the largest open ball possible around x_n inside V_{i_n}
(iv) Finally, pick x_{n+1} ∈ X \ (V_{i_0} ∪ ... ∪ V_{i_n}); this is possible since {V_i}_{i∈I} has no finite subcover
Claim 1.4.12. (x_n) constructed above has no convergent subsequence.

Proof. Suppose for contradiction that (x_n) has a subsequence converging to some z ∈ X, i.e. for every ε > 0 and N ∈ N, there exists n ≥ N such that x_n ∈ B(z, ε).
Fix i such that z ∈ V_i. Fix L ∈ N such that B(z, 1/L) ⊆ V_i. Fix n ∈ N such that x_n ∈ B(z, 1/(2L)); we can do so by the above. Note then that B(x_n, 1/(2L)) ⊆ B(z, 1/L) ⊆ V_i, where the first inclusion follows from the triangle inequality. At stage n in the induction above, we could have picked i_n = i and r_n = 1/(2L). Hence, by construction, the r_n we did pick must be ≥ 1/(2L), whence B(x_n, 1/(2L)) ⊆ V_{i_n}.
Hence, for all k > n, x_k ∉ B(x_n, 1/(2L)), since by construction we have x_k ∉ V_{i_n} for each k > n. From this, we get that for all k > n,
d(z, x_k) ≥ d(x_k, x_n) − d(z, x_n) ≥ 1/(2L) − d(z, x_n) =: ε > 0,
where ε > 0 since d(z, x_n) < 1/(2L). Thus, x_k ∉ B(z, ε) for every k > n, so z is not the limit of a subsequence of (x_n), a contradiction.
This completes the proof.
Theorem 1.4.13 (Heine-Borel). In R, a set A is compact if and only if it is closed (complete) and bounded (both above and below).

Proof. (⟹) We proved earlier that every compact subset of R (a Hausdorff space) is closed. If A were unbounded in either direction, then we would have a monotone sequence of points in A with no convergent subsequence, contradicting sequential compactness.
(⟸) Suppose A is bounded and closed. We prove A is sequentially compact. Our method will be a "lion in the desert" style proof. To hunt a lion in the desert, divide the desert into two halves; the lion must be in one of them, so follow him there. Repeat until you've caught the lion.
Let (x_n) be a sequence of points in A. It is sufficient to find a Cauchy subsequence, by the completeness of R. Let (a, b) ⊆ R be such that A ⊆ (a, b) (possible since A is bounded above and below). Working inductively, we define open intervals B_k such that (∗): for infinitely many n, x_n ∈ B_k (i.e. B_k contains infinitely many terms from our sequence):

(i) Set B_0 = (a, b).
(ii) Having defined B_k, say B_k = (a_k, b_k), let c be the midpoint of (a_k, b_k). Then (x_n) either has infinitely many terms in (a_k, c), or in (c, b_k) (if (x_n) has infinitely many terms equal to c, then that constant subsequence is clearly convergent).
(iii) If (x_n) has infinitely many terms in (a_k, c), take B_{k+1} = (a_k, c). Otherwise, set B_{k+1} = (c, b_k).

Using (∗), find a subsequence (x_{n_k}) such that x_{n_k} ∈ B_k for each k ∈ N. Since the lengths of the B_k shrink to 0, (x_{n_k}) is Cauchy.
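The halving argument can be mimicked on a finite sample: keep the half containing the most sample terms (a finite stand-in for "infinitely many"), and pick terms with strictly increasing indices. A hedged sketch on a made-up bounded sequence clustering at ±1:

```python
def halving_subsequence(xs, depth=14):
    """'Lion in the desert': repeatedly halve [a, b], keeping the half that
    holds the most sample terms, and pick one term (with increasing index)
    from the surviving half each round."""
    a, b = min(xs), max(xs)
    last, sub = -1, []
    for _ in range(depth):
        c = (a + b) / 2
        left = sum(1 for v in xs if a <= v <= c)    # terms in the lower half
        right = sum(1 for v in xs if c < v <= b)    # terms in the upper half
        if left >= right:
            b = c
        else:
            a = c
        # next term (index > last) landing in the surviving half
        last = next(i for i in range(last + 1, len(xs)) if a <= xs[i] <= b)
        sub.append(xs[last])
    return sub, (a, b)

xs = [(-1) ** n * (1 + 1 / (n + 1)) for n in range(100000)]
sub, (a, b) = halving_subsequence(xs)
# the bracketing interval shrinks geometrically, so the picks cluster
```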
Definition 1.4.14. (X, d) is totally bounded if for every ε > 0, X can be covered with finitely many balls of radius less than ε.

A proof similar to the one above gives the "only if" direction of the following:

Theorem 1.4.15 (S09.4(e), S13.3). A metric space (X, d) is compact if and only if X is complete and totally bounded.
Week 2
As per the syllabus, Week 2 topics include: convergence of sums, rearrangements and absolute convergence, continuity in topological and metric spaces, path-connectedness, the intermediate value theorem, contraction maps and the fixed point theorem, uniform continuity, uniform convergence, and the Arzela-Ascoli theorem.
2.1 Lecture 5 - Convergence of Sums and Some Exam Problems
Today, we will discuss convergence of sums, and will complete some exam problems concerning the evaluation and convergence of sums.
Definition 2.1.1 (Convergence of sums in R). For a sequence (a_i) ⊆ R, Σ_{i=1}^∞ a_i converges to s ∈ R if the sequence s_n = Σ_{i=1}^n a_i converges to s. Σ_{i=1}^∞ a_i converges absolutely if Σ_{i=1}^∞ |a_i| converges.

Define (s_n) by s_n = Σ_{i=1}^n a_i. Then to check that Σ_{i=1}^∞ a_i converges, it is sufficient (in fact, necessary) to show that (s_n) is Cauchy, i.e. for every ε > 0, there exists N ∈ N such that for all k, l ≥ N with k ≤ l,
|s_k − s_l| = |Σ_{i=k+1}^l a_i| < ε.
Proposition 2.1.2. Suppose (b_n) is a sequence of nonnegative reals bounded by some M > 0. Then Σ_{n=1}^∞ b_n/10^n converges.

Proof. It is sufficient to show that the tail sums are arbitrarily small, i.e. for N sufficiently large,
Σ_{n=N}^∞ b_n/10^n < ε.
We compare:
Σ_{n=N}^∞ b_n/10^n ≤ M Σ_{n=N}^∞ 1/10^n = M · (1/10^N) · 1/(1 − 1/10) = M/(9 · 10^{N−1}).
Since we can always take N sufficiently large so that ε · 9 · 10^{N−1} > M, we are done.
Corollary 2.1.3 (S04.1). P(N) = {b : b ⊆ N} injects into R.

Proof. For b ⊆ N, set
f(b) = Σ_{n=1}^∞ b_n/10^n,
where b_n = 1 if n ∈ b and b_n = 0 if n ∉ b. The sum
always converges by the previous proposition. Moreover, if a ≠ b, then taking the least N which belongs to one of a, b but not the other (assume WLOG N ∈ a, N ∉ b), we have

f(a) = Σ_{n=1}^{N−1} a_n/10^n + 1/10^N + Σ_{n=N+1}^∞ a_n/10^n
f(b) = Σ_{n=1}^{N−1} b_n/10^n + 0/10^N + Σ_{n=N+1}^∞ b_n/10^n

Since no integer less than N is in one of a, b but not the other, by the definition of N, we have
Σ_{n=1}^{N−1} a_n/10^n = Σ_{n=1}^{N−1} b_n/10^n.
Note
Σ_{n=N+1}^∞ |a_n − b_n|/10^n ≤ 1/(9 · 10^N) < 1/10^N,
so the difference in the N-th digit outweighs the tails, and f(a) > f(b). Hence f is injective.
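The injectivity argument above can be checked with exact rational arithmetic (a small sketch; the sample sets are arbitrary):

```python
from fractions import Fraction

def f(b):
    """f(b) = sum over n in b of 10^(-n), for a finite set b of positive integers."""
    return sum(Fraction(1, 10 ** n) for n in b)

# the digit at the least disagreement position outweighs all later digits
# combined: f({2}) = 0.01 while f({3, 4, 5, 6}) = 0.001111
```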
Proposition 2.1.7 (S09.1). Let a_0 = 0, a_{n+1} = √(6 + a_n). Show (a_n) converges, and find its limit.

Solution. We proceed directly, without use of continuity. By induction, we show that a_n < 3 for all n ∈ N:
Note a_0 = 0 < 3. Suppose a_n < 3. Then
a_{n+1}² = 6 + a_n < 6 + 3 = 9,
so a_{n+1} < 3. To prove a_{n+1} > a_n, note
a_{n+1}² = 6 + a_n > 2a_n + a_n = 3a_n ≥ a_n²,
using a_n < 3. So (a_n) is increasing and bounded above by 3, so (a_n) has a limit ≤ 3.
Claim 2.1.8. For every ε > 0, 3 − ε is not a bound for (a_n). This will show the limit is 3.

Proof. Restrict to ε < 4. We show that if 3 − ε is a bound, then so is 3 − 2ε. It is enough to show that a_{n+1} < 3 − ε implies a_n < 3 − 2ε. Say a_{n+1} < 3 − ε. Then
√(6 + a_n) < 3 − ε ⟹ 6 + a_n < 9 − 6ε + ε².
So a_n < 3 − 6ε + ε² < 3 − 2ε, where the last inequality is obtained because ε² < 4ε, since ε < 4 by assumption.
Using repeated application of this to keep doubling ε, we eventually get a bound of at most 3 − 4 = −1. This is a contradiction, since a_0 > 3 − 4.
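A quick numerical check of this limit (illustrative only):

```python
from math import sqrt

a = 0.0
trace = []
for _ in range(60):
    trace.append(a)
    a = sqrt(6 + a)   # a_{n+1} = sqrt(6 + a_n)
# the iterates increase, stay at most 3, and approach the claimed limit 3
```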
Proposition 2.1.9. Let (a_n), (b_n), (c_n) be sequences. If a_n ≤ b_n ≤ c_n for each n ∈ N, and (a_n) and (c_n) converge to the same limit l, then (b_n) converges to l.

Proof. Let ε > 0. Pick N large enough that for n ≥ N, |a_n − l| < ε and |c_n − l| < ε. Then for n ≥ N,
b_n − l ≤ c_n − l < ε and l − b_n ≤ l − a_n < ε.
Hence, |b_n − l| < ε.
Proposition 2.1.10. If lim_{n→∞} d_n = l_1, lim_{n→∞} u_n = l_2, and lim_{n→∞} (d_n − u_n) = 0, then l_1 = l_2.

Proof. Suppose not. Let ε > 0 be such that |l_1 − l_2| > 3ε. Take N large enough that
|d_N − l_1| < ε, |u_N − l_2| < ε, and |d_N − u_N| < ε.
Then
|l_1 − l_2| = |l_1 − d_N + d_N − u_N + u_N − l_2| ≤ |l_1 − d_N| + |d_N − u_N| + |u_N − l_2| < 3ε,
a contradiction.
Proposition 2.1.11. Σ_{n=0}^∞ (−1)^n/(n+1) converges.

Proof. Set
s_n = Σ_{i=0}^n (−1)^i/(i+1).
The basic idea here is that (s_n) alternates between increasing and decreasing; that is, s_n > s_{n+2} for each n even, and s_n < s_{n+2} for n odd. We will build a strictly increasing sequence out of elements of (s_n) and a strictly decreasing sequence out of elements of (s_n), and wedge the terms of (s_n) between these two sequences.

Let (u_n) be the sequence s_0, s_0, s_2, s_2, s_4, s_4, ..., i.e. u_n = s_{2⌊n/2⌋}. Note that (u_n) is decreasing.
Let (d_n) be the sequence s_1, s_1, s_3, s_3, s_5, s_5, ..., i.e. d_n = s_{2⌊n/2⌋+1}. Note that (d_n) is increasing.

Then for all n, d_n ≤ s_n ≤ u_n. Note (u_n) and (d_n) are bounded below by 0 and bounded above by 1 respectively, so both sequences converge. Finally, note
|u_n − d_n| ≤ 1/(n+1),
so by the above proposition, (u_n) and (d_n) converge to the same limit. Thus, by the previous proposition, (s_n) converges.
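Numerically, the even- and odd-indexed partial sums play the roles of (u_n) and (d_n): one wedge decreases, the other increases, and they trap the partial sums. A quick sketch (that the limit is ln 2 is a standard fact, not proved in the lecture):

```python
partial = []
s = 0.0
for n in range(200000):
    s += (-1) ** n / (n + 1)   # s_n = sum of (-1)^i / (i+1)
    partial.append(s)

evens = partial[0::2]   # decreasing upper wedge (the u's)
odds = partial[1::2]    # increasing lower wedge (the d's)
```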
Proposition 2.1.12 (S05.5). Suppose (a_n) converges to a. Let
S_N = (1/N) Σ_{n=1}^N a_n.
Show that (S_N) converges to a.
Consider the space X = {0, 1}^N of binary sequences, with the metric
d((α_n), (β_n)) = Σ_{n=1}^∞ |α_n − β_n|/2^n.
Prove (directly) that every infinite A ⊆ X has an accumulation point, i.e. X is sequentially compact.

Proof. This will be another "lion in the desert" proof. The essential idea is that the distance between two sequences is very small if they agree on a large number of initial digits. Let A ⊆ X be infinite. We will define a_k ∈ {0, 1} for k ≥ 1 so that

(∗)_k: There are infinitely many (α_n) ∈ A extending (a_1, ..., a_k) (i.e., the first k digits of (α_n) are a_1, ..., a_k)

holds for each k. Note that (∗)_1 can be arranged because A is infinite, so there must either be an infinite number of sequences in A starting with 0 or an infinite number starting with 1. Let a_1 be 0 or 1 depending on which is the case.

Now suppose (∗)_k holds. Note that
{(α_n) extending (a_1, ..., a_k)} = {(α_n) extending (a_1, ..., a_k, 0)} ∪ {(α_n) extending (a_1, ..., a_k, 1)} = B_0 ∪ B_1.
Since (∗)_k holds, A must have infinite intersection with at least one of the sets in the union. If A ∩ B_0 is infinite, put a_{k+1} = 0. Otherwise, set a_{k+1} = 1. It remains to show that (a_k)_{k=1}^∞ is an accumulation point of A.

Let ε > 0 be given. Let N be large enough so that 1/2^N < ε. Using (∗)_N, find (α_n) ∈ A which extends (a_1, ..., a_N). Then
d((α_n), (a_n)) = Σ_{n=1}^N |α_n − a_n|/2^n + Σ_{n=N+1}^∞ |α_n − a_n|/2^n = 0 + Σ_{n=N+1}^∞ |α_n − a_n|/2^n ≤ Σ_{n=N+1}^∞ 1/2^n = 1/2^N < ε.
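The key estimate here - agreement on the first N digits forces distance at most 2^{−N} - is easy to check numerically (sequences are truncated to finitely many terms for the sketch; the particular sequences are made up):

```python
def d(alpha, beta, terms=60):
    """Truncated version of d((a_n), (b_n)) = sum_{n>=1} |a_n - b_n| / 2^n."""
    return sum(abs(alpha(n) - beta(n)) / 2 ** n for n in range(1, terms + 1))

alpha = lambda n: n % 2                               # 0, 1, 0, 1, ...
beta = lambda n: n % 2 if n <= 10 else 1 - (n % 2)    # agrees on digits 1..10
```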
2.2 Lecture 6 - Some More Exam Problems and Continuity
Today, we will do some more exam problems. We will also introduce continuity.
Proposition 2.2.1 (F07.8). Suppose (a_n) is a sequence such that a_n > 0 for all n ∈ N, and Σ_{n=1}^∞ a_n = ∞. Does
Σ_{n=1}^∞ a_n/(1 + a_n) = ∞?

Solution. Yes. We break into two cases.

Case 1: Suppose a_n ≤ 1 for all but finitely many n; say a_n ≤ 1 for all n ≥ M. Since Σ_{n=1}^∞ a_n = ∞, the partial sums Σ_{n=M}^k a_n are unbounded. For n ≥ M we have 1 + a_n ≤ 2, so
Σ_{n=M}^k a_n/(1 + a_n) ≥ (1/2) Σ_{n=M}^k a_n
for each k ≥ M; since the RHS is not bounded, the LHS is not bounded either. Hence, certainly
Σ_{n=1}^k a_n/(1 + a_n)
is not bounded.

Case 2: Suppose there are infinitely many n ∈ N such that a_n > 1. Then for each such n, 2a_n > a_n + 1, so
a_n/(1 + a_n) > 1/2.
Thus, (a_n/(1 + a_n))_{n=1}^∞ is a sequence of positive real numbers with infinitely many terms greater than 1/2, so
Σ_{n=1}^∞ a_n/(1 + a_n) = ∞.
Proposition 2.2.2 (F13.1). Suppose (a_n) is a sequence such that a_n ≥ 0 for all n ∈ N. Let
P_n = Π_{j=1}^n (1 + a_j).
Prove that
lim_{n→∞} P_n < ∞ if and only if Σ_{n=1}^∞ a_n < ∞.

Proof. (⟹) Suppose lim P_n < ∞. It suffices to show that the sum is bounded, since each of the terms is positive. Expanding the product,
P_n = Π_{j=1}^n (1 + a_j) = (1 + a_1) ··· (1 + a_n) ≥ 1 + a_1 + ··· + a_n,
so
Σ_{j=1}^n a_j ≤ P_n < ∞.

(⟸) Suppose Σ_{n=1}^∞ a_n < ∞. Choose N such that
Σ_{n=N}^∞ a_n < 1/2.
For k ≥ N, expanding the product and grouping the cross terms,
Π_{j=N}^k (1 + a_j) ≤ 1 + (Σ_{j=N}^k a_j) + (Σ_{j=N}^k a_j)² + ··· + (Σ_{j=N}^k a_j)^{k−N+1} ≤ 1 + 1/2 + (1/2)² + ··· + (1/2)^{k−N+1} ≤ 2.
Thus, for every k ≥ N,
Π_{j=1}^k (1 + a_j) = Π_{j=1}^{N−1} (1 + a_j) · Π_{j=N}^k (1 + a_j) ≤ 2 Π_{j=1}^{N−1} (1 + a_j),
so (P_k) is bounded; since it is nondecreasing, lim P_n < ∞.
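This equivalence can be sanity-checked numerically: for the summable choice a_n = 1/n² both partial sums and partial products stay bounded, while for a_n = 1/n both blow up (the product Π_{n=1}^N (1 + 1/n) telescopes to N + 1). The sample sequences are chosen for the demo:

```python
from math import prod

def partials(a, N):
    """Partial sum and partial product of (1 + a(n)) for n = 1..N."""
    terms = [a(n) for n in range(1, N + 1)]
    return sum(terms), prod(1 + t for t in terms)

s_sq, p_sq = partials(lambda n: 1 / n ** 2, 100000)   # summable case
s_h, p_h = partials(lambda n: 1 / n, 100000)          # harmonic case
```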
Proposition 2.2.3 (S10.11). Suppose Σ_{n=1}^∞ a_n converges absolutely. Show that every rearrangement of Σ_{n=1}^∞ a_n converges to the same limit. (A rearrangement of Σ_{n=1}^∞ a_n is Σ_{n=1}^∞ a_{σ(n)}, where σ : N → N is a bijection.)

Proof. Let a = Σ_{n=1}^∞ a_n. Let ε > 0 be given. We must show that there exists N ∈ N such that for all k ≥ N,
|Σ_{n=1}^k a_{σ(n)} − a| < ε.
Choose N_1 such that
|Σ_{n=1}^{N_1−1} a_n − a| < ε/2 and Σ_{n=N_1}^∞ |a_n| < ε/2,
where the first inequality is obtained using the convergence of Σ_{n=1}^∞ a_n to a, and the second using absolute convergence. Now let N be large enough that {1, ..., N_1 − 1} ⊆ {σ(1), ..., σ(N)}. For k ≥ N, let A = {σ(1), ..., σ(k)} \ {1, ..., N_1 − 1}; every i ∈ A satisfies i ≥ N_1, so
|Σ_{n=1}^k a_{σ(n)} − a| ≤ |Σ_{n=1}^{N_1−1} a_n − a| + Σ_{i∈A} |a_i| < ε/2 + ε/2 = ε.
Proposition 2.2.4 (F08.5). Suppose Σ_{n=1}^∞ a_n converges, but not absolutely. Then for every a ∈ R, there is a rearrangement (a_{σ(n)})_{n=1}^∞ such that Σ_{n=1}^∞ a_{σ(n)} = a.

Proof. The key property here is that the sum of all positive (resp. negative) terms tends to positive (resp. negative) infinity, but the terms themselves converge to 0. Let
X = {n : a_n ≥ 0}, Y = {n : a_n < 0}.
Suppose toward a contradiction that there were M such that Σ_{n≤k, n∈X} a_n < M for all k. Since Σ_{n=1}^∞ a_n converges, its partial sums are bounded, so also Σ_{n≤k, n∈Y} a_n > −M′ for some M′. But then
Σ_{n=1}^k |a_n| = Σ_{n≤k, n∈X} a_n − Σ_{n≤k, n∈Y} a_n < M + M′,
so Σ a_n would converge absolutely, a contradiction. Symmetrically, the sums over Y are unbounded below.

We define σ in stages; σ is injective by construction, and, as we will see, σ will cover every element of X and Y, so σ is also surjective. Let u_1, v_1 be sufficiently large that
i > u_1, i ∈ X ⟹ a_i < 1/2 and i > v_1, i ∈ Y ⟹ a_i > −1/2.
This is possible since (a_n) converges to 0. This defines σ up to some stage j_1, so it also determines Σ_{n=1}^{j_1} a_{σ(n)}. Say WLOG that
Σ_{n=1}^{j_1} a_{σ(n)} < a.
Then set u_2, picked so that adding the elements of X in (u_1, u_2] puts the sum between a and a + 1/2; we can do this since a_i < 1/2 for each i > u_1. Similarly, set v_2, picked so that adding the elements of Y in (v_1, v_2] puts the sum between a − 1/2 and a. Keep repeating this process until u_k, v_k are sufficiently large that
i > u_k, i ∈ X ⟹ a_i < 1/4 and i > v_k, i ∈ Y ⟹ a_i > −1/4.
Once again, this is possible since (a_n) converges to 0. Repeat the above step with 1/4 instead of 1/2; continuing with 1/8, 1/16, and so on, it's clear that the partial sums of the rearrangement are Cauchy with limit a.
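The greedy construction above can be simulated on the alternating harmonic series (which converges conditionally): spend positive terms while below the target and negative terms while above. The target value is arbitrary:

```python
target = 0.3
terms = [(-1) ** n / (n + 1) for n in range(200000)]
pos = [t for t in terms if t > 0]   # 1, 1/3, 1/5, ...
neg = [t for t in terms if t < 0]   # -1/2, -1/4, ...

s, i, j = 0.0, 0, 0
while i < len(pos) and j < len(neg):
    if s <= target:
        s += pos[i]; i += 1   # below the target: add the next positive term
    else:
        s += neg[j]; j += 1   # above the target: add the next negative term
# after the first crossing, s stays within the size of the last term used,
# which tends to 0 - so the rearranged partial sums approach the target
```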
Continuity
Definition 2.2.6. Let X, Y be topological spaces. Then f : X → Y is continuous at x_0 ∈ X if for every open neighborhood V of f(x_0), f⁻¹(V) contains an open neighborhood of x_0.

If the topologies are generated by some basic open sets, then f is continuous at x_0 if and only if for every basic open neighborhood V of f(x_0), there exists a basic open neighborhood U of x_0 such that U ⊆ f⁻¹(V), i.e. f(U) ⊆ V.

In particular, in metric spaces (where the topology is generated by open balls), f is continuous at x_0 if and only if for every ε > 0 (standing for the basic open ball B(f(x_0), ε)), there exists δ > 0 (standing for the basic open ball B(x_0, δ)) such that x ∈ B(x_0, δ) implies f(x) ∈ B(f(x_0), ε), i.e. d_X(x, x_0) < δ ⟹ d_Y(f(x), f(x_0)) < ε.
Proposition 2.2.7. Let X and Y be metric spaces. Then f : X → Y is continuous at x if and only if whenever (x_n) converges to x, (f(x_n)) converges to f(x).

Proof. (⟹) Suppose f is continuous at x, and (x_n) converges to x. Let ε > 0 be given. We must show that there exists N ∈ N such that n ≥ N implies d(f(x_n), f(x)) < ε. Since f is continuous at x, there exists δ > 0 such that
d(y, x) < δ ⟹ d(f(y), f(x)) < ε.
Since x_n → x, there exists N ∈ N such that for all n ≥ N, d(x_n, x) < δ, whence d(f(x_n), f(x)) < ε.
(⟸) Fix f, x, and let ε > 0 be given. We must find δ > 0 such that d(x, y) < δ implies d(f(x), f(y)) < ε. Suppose no such δ exists. In particular, for each n, δ = 1/n does not satisfy the desired condition: there exists y ∈ X such that d(x, y) < 1/n but d(f(x), f(y)) ≥ ε. Let x_n be some such y for each n ∈ N. Then (x_n) converges to x, but (f(x_n)) does not converge to f(x), a contradiction.
Definition 2.2.8. Let X, Y be topological spaces. Then f : X → Y is continuous if and only if it is continuous at all x ∈ X.

Proposition 2.2.9. f : X → Y is continuous if and only if for every open V ⊆ Y, f⁻¹(V) is open in X.

Proof. (⟹) Let f be continuous, V ⊆ Y be open, and put U = f⁻¹(V). Fix x ∈ U. By continuity of f at x, since V is an open neighborhood of f(x), U must contain an open neighborhood of x. Hence U is open.
(⟸) Fix x ∈ X, and let V ⊆ Y be an open neighborhood of f(x). Since V is open, f⁻¹(V) is an open neighborhood of x, whence f is continuous at x.

Corollary 2.2.10. The composition of continuous functions is continuous.

Note: +, −, · are continuous as functions on R.
Theorem 2.2.11 (Intermediate Value Theorem). Let f : X → R be continuous. Suppose X is connected. Then if f takes values y_0, y_1 (with y_0 < y_1, say), then f takes every value in (y_0, y_1). Precisely, for every y ∈ (y_0, y_1), there exists x ∈ X such that f(x) = y.

Proof. Suppose y ∈ (y_0, y_1) and for all x ∈ X, f(x) ≠ y. Put
A = {x ∈ X : f(x) < y}; B = {x ∈ X : f(x) > y}.
It is clear that A ∩ B = ∅ and X = A ∪ B by assumption. Both A and B are open in X by continuity, since A = f⁻¹((−∞, y)) and B = f⁻¹((y, ∞)). Additionally, both A and B are nonempty, since there exists x ∈ X such that f(x) = y_0 < y and z ∈ X such that f(z) = y_1 > y. This is a contradiction, since X is connected.
Corollary 2.2.12 (W06.6). Let f : [a, b] → ℝ be continuous. Then f takes every value
between f (a) and f (b).
Proof. Since we showed earlier that [a, b] is connected, this is immediate by the Intermediate
Value Theorem.
2.3 Lecture 7 – Path-Connectedness, Lipschitz Functions and Contractions, and Fixed Point Theorems
Today, we will discuss the idea of path-connectedness, and show that there are sets in ℝ² which are connected but not path-connected. We will also introduce classes of continuous
functions which have stronger conditions on how much the function can grow between points
which are close together. Finally, we will prove some valuable fixed point theorems for these
classes of functions.
But this is a contradiction, since max{y₀, …, y_k} = yᵢ for some i ∈ {1, …, k}, so yᵢ ∈ im(f). The proof is similar for minimum values.
Here is an alternative proof. Note that the continuous image of a compact set is compact (we will prove this later). Hence, f(X) ⊆ ℝ is compact, whence it is closed and bounded by Heine–Borel. Since f(X) is bounded, it has a least upper bound α. Note α is a limit point of f(X) (every open ball about α must contain a point of f(X), or else α would not be the least upper bound for f(X)), whence α ∈ f(X) since f(X) is closed. Hence, there exists x ∈ X such that f(x) = α. The proof that f takes minimum values is nearly identical.
Definition 2.3.8. Let (X, T_X), (Y, T_Y) be topological spaces. The product topology on X × Y is the topology generated by the sets U × V, U ∈ T_X, V ∈ T_Y.
… < d(xₙ, xₙ₊₁) Σᵢ₌₁^∞ Lⁱ = d(xₙ, xₙ₊₁) · L/(1 − L) < d(x₀, x₁) · Lⁿ⁺¹/(1 − L)
Theorem 2.3.14 (W08.1(a) – Brouwer Fixed Point Theorem). Let g : [a, b] → [a, b] be continuous. Prove g has a fixed point.
Proof. Let f(x) = g(x) − x; note f is continuous. We have g(a) ≥ a, since g(a) ∈ [a, b], so f(a) ≥ 0. Similarly, f(b) ≤ 0. Hence, by the Intermediate Value Theorem, there exists some x ∈ [a, b] such that f(x) = 0, i.e. g(x) − x = 0, so g(x) = x.
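The Intermediate Value Theorem argument above is effectively constructive: bisecting on f(x) = g(x) − x homes in on a fixed point. A minimal numerical sketch (the helper name and tolerance are my own, not from the notes):

```python
import math

def fixed_point_bisect(g, a, b, tol=1e-10):
    """Locate a fixed point of a continuous g: [a, b] -> [a, b] by
    bisection on f(x) = g(x) - x, using f(a) >= 0 >= f(b)."""
    lo, hi = a, b
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) - mid >= 0:
            lo = mid  # a fixed point still lies in [mid, hi]
        else:
            hi = mid
    return (lo + hi) / 2

# g(x) = (x + 2/x)/2 maps [1, 2] into itself; its fixed point is sqrt(2).
x = fixed_point_bisect(lambda t: (t + 2 / t) / 2, 1.0, 2.0)
```

The sign invariant g(lo) − lo ≥ 0 ≥ g(hi) − hi mirrors the f(a) ≥ 0 ≥ f(b) step of the proof.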
Proposition 2.3.15 (F11.1). Let f : X → X. Suppose for all x ≠ y, d(f(x), f(y)) < d(x, y). Suppose X is compact. Prove f has a unique fixed point.
Proof. Uniqueness: If f(x) = x, f(y) = y for x ≠ y, then d(x, y) = d(f(x), f(y)) < d(x, y), a contradiction.
Existence: Let g(x) = d(x, f(x)). Since each of the component functions x ↦ x and x ↦ f(x) is continuous and d is continuous, g : X → [0, ∞) is continuous. Since X is compact, g attains a minimum value. Let x ∈ X be such that g(x) is a minimal value of g on X. Suppose g(x) = d(x, f(x)) = ε > 0. Then by our assumption on f, d(f(x), f(f(x))) < d(x, f(x)) = ε, contradicting the minimality of ε. This finishes the proof, since then g(x) = d(x, f(x)) = 0, so x = f(x).
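On an interval, one can also watch such a fixed point emerge by iteration; cos on [0, 1] strictly decreases distances there (by the Mean Value Theorem, since |sin| < 1 on [0, 1]), so Proposition 2.3.15 applies. A small sketch, with my own helper name:

```python
import math

def iterate_map(f, x0, steps=200):
    """Iterate x_{n+1} = f(x_n) and return the final iterate."""
    x = x0
    for _ in range(steps):
        x = f(x)
    return x

# cos maps [0, 1] into [cos 1, 1] and strictly decreases distances there,
# so the unique fixed point of Proposition 2.3.15 exists; iteration finds it.
x = iterate_map(math.cos, 0.5)
```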
2.4 Lecture 8 – Uniformity, Normed Spaces and Sequences of Functions
Today, we will discuss the idea of uniform continuity. We will also introduce pointwise and
uniform convergence of sequences of functions.
Uniformity
… < ε/2 + ε/2 = ε
Note that {B(z, δ_z/2) : z ∈ X} is an open cover for X. By compactness, there are z₁, …, z_k such that B = {B(z₁, δ_{z₁}/2), …, B(z_k, δ_{z_k}/2)} covers X. Set δ = min{δ_{z₁}, …, δ_{z_k}}/2. Suppose d(x, y) < δ. Note x ∈ B(zᵢ, δ_{zᵢ}/2) for some i ∈ {1, …, k}, since B is an open cover for X. Since d(x, y) < δ ≤ δ_{zᵢ}/2, by the triangle inequality we have y ∈ B(zᵢ, δ_{zᵢ}). By (∗), d(f(x), f(y)) < ε.
√x = √y + (√x − √y), so on [1, ∞)
x = y + 2√y(√x − √y) + (√x − √y)²
Thus, on [1, ∞)
√x − √y ≤ ½(x − y)
Step 3: We now need to merge these two observations to get uniform continuity on the full space [0, ∞). Fix ε > 0. By steps 1, 2, there exist δ₁, δ₂ > 0 such that x, y ∈ [0, 3] and |x − y| < δ₁ implies |f(x) − f(y)| < ε, and x, y ∈ [1, ∞) and |x − y| < δ₂ implies |f(x) − f(y)| < ε. Take δ = min(δ₁, δ₂, 1). If |x − y| < δ, then either x, y are both in [0, 3] or x, y are both in [1, ∞), and in each case |f(x) − f(y)| < ε.
Of course, you should now be asking: why is √x continuous in the first place? In fact, a much more general class of functions is continuous, of which √x is a particular example.
Proposition 2.4.6. The forward image of a compact set under a continuous function is compact.
Proof. Let f : X → Y be continuous, and let K ⊆ X be compact. Let {Vᵢ}_{i∈I} be an open cover of f(K). Then {f⁻¹(Vᵢ)}_{i∈I} is an open cover of K by the continuity of f. By compactness of K, there is a finite subcover {f⁻¹(V_{i₁}), …, f⁻¹(V_{iₙ})}. Then V_{i₁}, …, V_{iₙ} cover f(K).
Proposition 2.4.7. If f : X → Y is a continuous bijection, Y is Hausdorff, and X is compact, then f⁻¹ : Y → X is also continuous. In particular, this shows √x is continuous on [0, M] for each M, hence on [0, ∞).
Proof. Fix U ⊆ X open. We must show f(U) is open. Let K = X \ U. Then K is compact, since X is compact and U is open. Note f(U) = Y \ f(X \ U) since f is a bijection. By the previous proposition, f(X \ U) is compact, hence closed, since Y is Hausdorff. So f(U) is open.
Limits of functions
Definition 2.4.8. Let X, Y be metric spaces, E ⊆ X, and f : E → Y. Let a ∈ X (with possibly a ∉ E) with points of E arbitrarily close to a. Then we say lim_{x→a, x∈E} f(x) exists and is equal to y if for every ε > 0, there exists δ > 0 such that d(x, a) < δ, x ∈ E, x ≠ a implies d(f(x), y) < ε.
Note: some people allow x = a in the limit. We will also use this with a = ∞.
Example 2.4.9. Let f(x) = 1/x from (0, ∞) to (0, ∞). Then lim_{x→∞} f(x) = 0.
Convergence of sequences of functions
Definition 2.4.10. Let fₙ : X → Y, n ∈ ℕ be a sequence of functions. Let f : X → Y. We say (fₙ) converges to f pointwise on X if for all x ∈ X, lim_{n→∞} fₙ(x) exists and equals f(x).
Example 2.4.11. Take fₙ : [0, 1] → [0, 1] given by fₙ(x) = xⁿ. Then fₙ(x) converges pointwise to
f(x) = 0 if x ∈ [0, 1); 1 if x = 1
This is an example of a pointwise limit of continuous functions which is not continuous. Another way to put it is:
lim_{n→∞} lim_{x→a} fₙ(x) need not equal lim_{x→a} lim_{n→∞} fₙ(x)
even if both limits exist (e.g., in the example above, take a = 1).
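A quick numerical look at Example 2.4.11 (the sample points are arbitrary choices):

```python
# f_n(x) = x^n on [0, 1]: pointwise limit is 0 on [0, 1) and 1 at x = 1.
def f_n(n, x):
    return x ** n

vals = [f_n(1000, x) for x in (0.0, 0.5, 0.9, 1.0)]

# Convergence is slow near 1: sup_{x < 1} |x^n - 0| = 1 for every n,
# so the convergence is pointwise but not uniform.
near_one = f_n(1000, 0.999)
```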
… < ε/3 + ε/3 + ε/3 = ε
We can also discuss convergence of sums of functions. For a sequence of functions (fₙ) between metric spaces X and Y, for each x ∈ X we take
Σ_{n=1}^∞ fₙ(x)
to be the limit as k → ∞ of
s_k(x) = Σ_{n=1}^k fₙ(x)
If Σ_{n=1}^∞ ‖fₙ‖_sup converges, then Σ_{n=1}^∞ fₙ converges uniformly.
Week 3
As per the syllabus, Week 3 topics include: definition of derivative, derivative of an inverse function, local maxima and minima, Rolle's theorem, mean value theorem, Rolle's theorem for higher order derivatives and applications to error bounding for approximations by Lagrange interpolations, monotonicity, L'Hopital's rule, uniform convergence limits of derivatives (in homework), upper and lower Riemann integrals, upper and lower Riemann sums, definition of the Riemann integral, integrability of bounded continuous functions on bounded intervals, basic properties of the Riemann integral, integrability of mins, maxes, sums, and products, Riemann–Stieltjes integral, the fundamental theorems of calculus, integration by parts, change of variables in integration, improper integrals, integrals of uniform convergence limits (in homework), Cauchy–Schwarz inequality.
3.1 Lecture 9 – Arzela–Ascoli, Differentiation and Associated Rules
Today, we will discuss the Arzela–Ascoli theorem. We will also introduce differentiation, and discuss some important rules which allow us to compute derivatives of certain classes of functions.
Recall, from last time:
Theorem 3.1.1 (Arzela–Ascoli). Let X be a separable metric space. Suppose fₙ : X → ℝ, and
(i) (fₙ) is pointwise bounded
(ii) (fₙ) is pointwise equicontinuous
Then (fₙ) has a subsequence which converges pointwise to a continuous function. Moreover, the convergence is uniform on compact subsets of X. Note: If X is a compact metric space, then X is automatically separable.
Proof. Let D ⊆ X be countable and dense (such a D exists since X is separable). We will
proceed in four parts.
(1) Get a subsequence which converges pointwise on D (using pointwise boundedness)
(2) Show it converges pointwise everywhere on X (using equicontinuity)
(3) Check the limit is continuous (using equicontinuity)
(4) Show uniform convergence on compact sets
Part 1: We will show (fₙ) has a subsequence which converges on D. Let (z_l)_{l=1}^∞ enumerate
D (this is possible since D is countable). The idea here is that we will find a subsequence of
(fn ) which converges at zl for each l N. We will then patch these subsequences together
in a way that one particular subsequence of (fn ) converges at zl for every l N.
Let A₀ = ℕ. By induction, we define A_{l+1} ⊆ A_l such that (fₙ(z_l))_{n∈A_{l+1}} converges. To do so, suppose A_l has been defined. The sequence (fₙ(z_l))_{n∈A_l} is a bounded sequence in ℝ, since (fₙ) is pointwise bounded.
… < ε/3 + ε/3 + ε/3 = ε
where we have arrived at the first term using equicontinuity, the second term using part (1),
and the third term using equicontinuity again.
Part 3: For x ∈ X, let f(x) = lim_{k→∞} f_{n_k}(x); this limit exists, as established by part (2). We now show f is continuous. Fix ε > 0 and x ∈ X. Let δ > 0 be given such that d(x, z) < δ implies |f_{n_k}(x) − f_{n_k}(z)| < ε/3 for all k ∈ ℕ; such a δ exists and is independent of n_k by equicontinuity.
Our strategy is now essentially the same as that of part (2). We use the convergence of our
subsequence on all of X from part (2) together with equicontinuity to establish the continuity
of f. Fix z ∈ X with d(x, z) < δ. Since f_{n_k}(x) → f(x) and f_{n_k}(z) → f(z), for k sufficiently large, we have
|f_{n_k}(x) − f(x)| < ε/3 and |f_{n_k}(z) − f(z)| < ε/3
Then
|f(x) − f(z)| ≤ |f(x) − f_{n_k}(x)| + |f_{n_k}(x) − f_{n_k}(z)| + |f_{n_k}(z) − f(z)| < ε/3 + ε/3 + ε/3 = ε
where we have arrived at the first term using convergence from part (2), the second term
using equicontinuity, and the third term using convergence again.
Part 4: Let K be a compact subset of X. To prove that (fnk ) converges to f uniformly on
K, we will first establish uniform convergence on small neighborhoods. Using compactness,
we will patch this convergence together on an open cover, then trim down to a finite set of
for all k ∈ ℕ. Since f is continuous on all of X, there similarly exists δ_{x,2} > 0 such that for all y ∈ B(x, δ_{x,2}), |f(y) − f(x)| < ε/3. Take δ_x = min(δ_{x,1}, δ_{x,2}); then for all j > N_x,
|f_{n_j}(y) − f(y)| ≤ |f_{n_j}(y) − f_{n_j}(x)| + |f_{n_j}(x) − f(x)| + |f(x) − f(y)| < ε/3 + ε/3 + ε/3 = ε
for all y ∈ B(x, δ_x). Cover K with the open balls B(x, δ_x) for each x ∈ K. Then there exists a finite subcover B(x₁, δ_{x₁}), …, B(x_l, δ_{x_l}). Put N = max{N_{x₁}, …, N_{x_l}}. Then as shown above, for all j > N, we have
|f_{n_j}(x) − f(x)| < ε
for each x ∈ K, completing the proof.
Derivatives
We will work in R throughout our discussion of derivatives, though there are select propositions we will prove that hold for general metric spaces.
Definition 3.1.2. Let X ⊆ ℝ, f : X → ℝ, and x₀ ∈ X be a limit point of X. We say f is differentiable at x₀ if
lim_{x→x₀} (f(x) − f(x₀))/(x − x₀)
exists. The derivative of f at x₀, denoted f′(x₀), is the value of the limit above. If the limit does not exist, f is not differentiable at x₀. Finally, f is differentiable on X if f is differentiable at every point x₀ ∈ X.
Example 3.1.3. We compute some fundamental examples of derivatives below.
(1) Let f(x) = c for some c ∈ ℝ. Then
lim_{x→x₀} (f(x) − f(x₀))/(x − x₀) = lim_{x→x₀} (c − c)/(x − x₀) = lim_{x→x₀} 0 = 0
(2) Let f(x) = x. Then
lim_{x→x₀} (f(x) − f(x₀))/(x − x₀) = lim_{x→x₀} (x − x₀)/(x − x₀) = lim_{x→x₀} 1 = 1
(3) Let f(x) = xⁿ for n ∈ ℕ. Then
lim_{x→x₀} (xⁿ − x₀ⁿ)/(x − x₀) = lim_{x→x₀} (xⁿ⁻¹ + xⁿ⁻²x₀ + ⋯ + x x₀ⁿ⁻² + x₀ⁿ⁻¹) = n x₀ⁿ⁻¹
where the third step is true by the continuity of multiplication and addition on ℝ (hence, polynomials are continuous functions). So f is differentiable on ℝ, and f′(x) = nxⁿ⁻¹ for all x ∈ ℝ.
(4) Let f(x) = 1/x for x ∈ ℝ \ {0}. Then
lim_{x→x₀} (f(x) − f(x₀))/(x − x₀) = lim_{x→x₀} (1/x − 1/x₀)/(x − x₀) = lim_{x→x₀} ((x₀ − x)/(x₀x))/(x − x₀) = lim_{x→x₀} −1/(x₀x) = −1/x₀²
where the fourth step is justified by the continuity of multiplication and division on ℝ \ {0}. So f is differentiable on ℝ \ {0}, and f′(x) = −1/x² for all x ∈ ℝ \ {0}.
Proposition 3.1.4. If f is differentiable at x₀, then f is continuous at x₀.
Proof. Fix ε > 0. We will use a lot less than full differentiability here. Since
lim_{x→x₀} (f(x) − f(x₀))/(x − x₀) = f′(x₀)
there exists δ₀ > 0 such that |x − x₀| < δ₀ implies
|f(x) − f(x₀)| < |x − x₀| (|f′(x₀)| + 1)
Hence, it follows that
|x − x₀| < min(δ₀, ε/(|f′(x₀)| + 1)) ⟹ |f(x) − f(x₀)| < ε
Proposition. Let f(x) = 1/xⁿ on ℝ \ {0} for n ∈ ℕ. Then f is differentiable on ℝ \ {0}, and f′(x₀) = −n/x₀ⁿ⁺¹.
Proof. Let g(y) = 1/y on ℝ \ {0}. Take h(x) = xⁿ on ℝ. Then f = g ∘ h, so by the chain rule, f′(x₀) = g′(y₀) · h′(x₀), where y₀ = h(x₀) = x₀ⁿ, so
f′(x₀) = (−1/y₀²)(n x₀ⁿ⁻¹) = (−1/x₀²ⁿ)(n x₀ⁿ⁻¹) = −n/x₀ⁿ⁺¹
Proposition. Suppose f is bijective and differentiable at x₀ with f′(x₀) ≠ 0, and that g = f⁻¹ is differentiable at y₀ = f(x₀). Then
g′(y₀) = 1/f′(x₀)
Proof. Proof is immediate from the chain rule applied to g ∘ f = id, and the fact that the derivative of the identity is 1.
We will slightly weaken the assumptions of the previous proposition to obtain an identical
result.
Proposition 3.1.8. Suppose f : X → Y is bijective for X, Y ⊆ ℝ and g = f⁻¹. Let x₀ ∈ X, y₀ ∈ Y be such that y₀ = f(x₀) and x₀ = g(y₀). Suppose
(i) f is differentiable at x₀
(ii) f′(x₀) ≠ 0
(iii) g is continuous on Y
Then g is differentiable at y₀, and
g′(y₀) = 1/f′(x₀)
Proof. Write x = g(y), so that f(x) = y. Fix ε > 0. Since f′(x₀) ≠ 0, by (i) there is δ > 0 such that 0 < |x − x₀| < δ implies (x − x₀)/(f(x) − f(x₀)) is within ε of 1/f′(x₀); call this condition (∗). By (iii), g is continuous at y₀, so for y sufficiently close to y₀ we have |g(y) − g(y₀)| = |x − x₀| < δ, and for y ≠ y₀,
(g(y) − g(y₀))/(y − y₀) = (x − x₀)/(f(x) − f(x₀))
so by (∗), we have
|(g(y) − g(y₀))/(y − y₀) − 1/f′(x₀)| < ε
Corollary. Let g(x) = ⁿ√x = x^(1/n). Then g is differentiable on (0, ∞), and, writing y₀ = x₀ⁿ,
g′(y₀) = 1/(n x₀ⁿ⁻¹) = 1/(n y₀^((n−1)/n)) = (1/n) y₀^(1/n − 1)
Corollary 3.1.10. For any rational α, the function f(x) = x^α is differentiable on (0, ∞), and f′(x) = α x^(α−1).
Proof. This is an easy consequence of the chain rule on the maps x ↦ xⁿ, x ↦ x^(1/k), together with the previous propositions computing their derivatives.
We conclude with a few more basic differentiation rules.
Proposition 3.1.11. Let f, g : X → ℝ be differentiable functions on some X ⊆ ℝ. Then
(1) (f + g)′ = f′ + g′
(2) (fg)′ = f′g + g′f (product rule)
3.2 Lecture 10 – Applications of Differentiation: Mean Value Theorem, Rolle's Theorem, L'Hopital's Rule and Lagrange Interpolation
Today, we will highlight some useful applications of derivatives. Chief among them are Rolle's Theorem and the Mean Value Theorem (of which Rolle's Theorem is a special case). We will also discuss how derivatives can be used to analyze polynomial approximation via Lagrange interpolation. Finally, we will prove several incarnations of L'Hopital's rule, which is a useful tool for computing certain kinds of limits.
Proposition 3.2.1 (Newton's approximation). f is differentiable at x₀ with derivative L if and only if for every ε > 0, there exists δ > 0 such that for all x,
(∗) |x − x₀| < δ ⟹ |f(x) − l(x)| ≤ ε|x − x₀|
where l(x) is the line through (x₀, f(x₀)) with slope L, i.e. l(x) = f(x₀) + L(x − x₀).
Informally, this proposition says f has derivative L at x₀ if and only if the line with slope L through (x₀, f(x₀)) tracks the curve very closely around x₀.
Proof. Note
(f(x) − l(x))/(x − x₀) = (f(x) − f(x₀))/(x − x₀) − L
Hence, the equivalence between the prescribed condition and differentiability at x₀ is clear.
Theorem (Mean Value Theorem). Let f : [a, b] → ℝ be continuous and differentiable on (a, b). Then there exists x ∈ (a, b) such that
f′(x) = (f(b) − f(a))/(b − a)
Proof. The geometric picture you should have in your head is this. For a function which is continuous on a closed interval and differentiable on the open interval, between any two points in the interval, there exists a third point in between the two chosen such that the tangent line to the curve at that point is parallel to the secant line through the first two points.
Let l(x) be the line between (a, f(a)) and (b, f(b)). Precisely,
l(x) = f(a) + (f(b) − f(a))(x − a)/(b − a)
Note
l′(x) = (f(b) − f(a))/(b − a)
Let g = f − l. Then g is continuous on [a, b] and differentiable on (a, b), with
g′(x) = f′(x) − (f(b) − f(a))/(b − a)
and g(a) = g(b) = 0. By Rolle's theorem, there exists x ∈ (a, b) such that g′(x) = 0. Then
f′(x) = (f(b) − f(a))/(b − a)
Example 3.2.6. For every rational α ∈ (0, 1) and every x, y ≥ 1, |y^α − x^α| ≤ |y − x|. Hence, x ↦ x^α is Lipschitz on [1, ∞). The proof uses the Mean Value Theorem, as one might imagine.
Fix x, y ∈ [1, ∞), and assume WLOG x < y. Let f be the function x ↦ x^α. From a previous proposition, we know f′(x) = αx^(α−1). By the mean value theorem applied to [x, y], there exists z ∈ (x, y) such that
(f(y) − f(x))/(y − x) = f′(z) = αz^(α−1) ≤ z^(α−1) = 1/z^(1−α) ≤ 1
since z ≥ 1. Hence
f(y) − f(x) = y^α − x^α ≤ y − x
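Example 3.2.6 is easy to spot-check numerically (the exponent and sample pairs below are arbitrary choices):

```python
# |y**a - x**a| <= |y - x| for rational a in (0, 1) and x, y >= 1,
# as proved via the Mean Value Theorem above.
a = 1 / 3
pairs = [(1.0, 2.0), (1.5, 9.0), (3.0, 3.0001), (1.0, 100.0)]
lipschitz_ok = all(abs(y ** a - x ** a) <= abs(y - x) for x, y in pairs)
```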
Theorem 3.2.7 (Higher Order Rolle's Theorem). Let f : [a, b] → ℝ be continuous. Suppose f is n-times differentiable on (a, b), i.e. f′, f″, f⁽³⁾, …, f⁽ⁿ⁾ all exist. Suppose a = a₀ < a₁ < ⋯ < aₙ = b are such that for all i ∈ {0, 1, …, n}, f(aᵢ) = 0. Then there exists x ∈ (a, b) such that f⁽ⁿ⁾(x) = 0.
Proof. A small note on the statement above: while we take f(aᵢ) = 0 for all i, this is more a matter of convenience. This theorem is true as long as f(aᵢ) = c ∈ ℝ for each i, but we can always translate this case into the above form by replacing f with f − c. We proceed by induction on n. Note the case n = 1 is Rolle's Theorem, so our claim is true for n = 1.
Suppose the claim holds for n ≥ 1, and suppose a = a₀ < ⋯ < aₙ₊₁ = b are such that for all i ∈ {0, 1, …, n + 1}, f(aᵢ) = 0. By Rolle's theorem, for each i ∈ {0, …, n}, we can find cᵢ ∈ (aᵢ, aᵢ₊₁) such that f′(cᵢ) = 0. Now we have c₀, …, cₙ such that f′(cⱼ) = 0 for each j, so using the inductive hypothesis on f′ (since f′ is n-times differentiable), we get x ∈ (c₀, cₙ) such that (f′)⁽ⁿ⁾(x) = 0, i.e. f⁽ⁿ⁺¹⁾(x) = 0.
Example 3.2.8 (Lagrange Interpolation). We can use the higher order Rolle's theorem to bound error in approximation by polynomials. We will start by approximating a function by a line. Suppose f : [a, b] → ℝ is continuous, and twice differentiable on (a, b). Suppose |f⁽²⁾(x)| is bounded on (a, b) by M. We approximate f by the line
p₁(x) = f(a) + (x − a)(f(b) − f(a))/(b − a)
Let g(x) = f(x) − p₁(x). This is the approximation error. Note that p₁⁽²⁾(x) = 0 for all x ∈ [a, b], so g⁽²⁾(x) = f⁽²⁾(x). Take c ∈ (a, b). Approximate g by a second degree polynomial p₂ that hits g at a, b and c:
p₂(x) = g(c)(x − a)(x − b)/((c − a)(c − b))
Let h = g − p₂. Then h(a) = h(c) = h(b) = 0 and
h⁽²⁾(x) = f⁽²⁾(x) − 2g(c)/((c − a)(c − b))
By the higher order Rolle's theorem for h, and since h(a) = h(c) = h(b) = 0, we have some z ∈ (a, b) such that h⁽²⁾(z) = 0. Then
f⁽²⁾(z) = 2g(c)/((c − a)(c − b))
So
|g(c)| ≤ ½ |f⁽²⁾(z)| |c − a| |c − b| ≤ ½ M(b − a)²
Since the choice of c was arbitrary, the error |g(x)| is bounded on (a, b) by M(b − a)²/2.
We can continue this same process with higher degree polynomial approximations. Set q₂ = p₁ + p₂. Then q₂ is a degree-2 polynomial with q₂(a) = f(a), q₂(b) = f(b), q₂(c) = f(c). By a similar argument to the above, assuming f is 3-times differentiable and M bounds |f⁽³⁾|, using n = 3 higher order Rolle's, we get
|(f − q₂)(x)| ≤ M(b − a)³/(3 · 2)
and, continuing inductively with degree-(n − 1) interpolants,
|(f − qₙ₋₁)(x)| ≤ M(b − a)ⁿ/n!
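The first error bound in Example 3.2.8 can be exercised numerically. Below, f = sin on [0, 1], so |f⁽²⁾| ≤ 1 and the bound is M(b − a)²/2 = 1/2; the grid and helper name are my own choices:

```python
import math

def secant_line_error(f, a, b, xs):
    """Max |f - p1| over xs, where p1 is the line through
    (a, f(a)) and (b, f(b))."""
    slope = (f(b) - f(a)) / (b - a)
    return max(abs(f(x) - (f(a) + slope * (x - a))) for x in xs)

xs = [k / 100 for k in range(101)]
worst = secant_line_error(math.sin, 0.0, 1.0, xs)
bound = 1.0 * (1.0 - 0.0) ** 2 / 2  # M (b - a)^2 / 2 with M = 1
```

The observed worst error is far below the (non-sharp) bound, as expected.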
lim_{x→x₀} (f(x) − f(x₀))/(x − x₀)
must be 0 if it exists.
Proposition 3.2.11. If f : [a, b] → ℝ is continuous and differentiable on (a, b), and for all x ∈ (a, b), f′(x) ≥ 0, then f is monotone increasing on [a, b]. Similarly, if for all x ∈ (a, b), f′(x) ≤ 0, then f is monotone decreasing on [a, b]. This is a kind of converse to the above proposition.
Proof. As one might expect, we proceed using the Mean Value Theorem. Fix x < y in [a, b]. By the Mean Value Theorem on [x, y], there exists w ∈ (x, y) such that
(f(y) − f(x))/(y − x) = f′(w)
Since f′(w) ≥ 0 by hypothesis, we get
f(y) − f(x) = f′(w)(y − x) ≥ 0
so f(y) ≥ f(x).
L'Hopital's Rule
We conclude today by presenting L'Hopital's rule in several forms, differing mainly in the hypotheses assumed.
Proposition 3.2.12 (L'Hopital's Rule). Let X ⊆ ℝ, f, g : X → ℝ, and let x₀ be a limit point of X. Suppose f(x₀) = g(x₀) = 0. Suppose f, g are both differentiable at x₀. Suppose g′(x₀) ≠ 0. Suppose there is a neighborhood (x₀ − δ, x₀ + δ) of x₀ where g(x) is never 0 except at x₀. Then
lim_{x→x₀, x∈X} f(x)/g(x) = f′(x₀)/g′(x₀)
Proof. Note for x ≠ x₀, we have
f(x)/g(x) = (f(x) − f(x₀))/(g(x) − g(x₀)) = [(f(x) − f(x₀))/(x − x₀)] / [(g(x) − g(x₀))/(x − x₀)]
As x → x₀,
(f(x) − f(x₀))/(x − x₀) → f′(x₀)
and
(x − x₀)/(g(x) − g(x₀)) → 1/g′(x₀)
so
f(x)/g(x) → f′(x₀)/g′(x₀)
Proposition 3.2.13. With hypotheses as in the previous proposition, but with the assumption that g is nonzero in a punctured neighborhood of x₀ replaced by the assumption that g′ is nonzero in a neighborhood of x₀, we still have
lim_{x→x₀, x∈X} f(x)/g(x) = f′(x₀)/g′(x₀)
Proof. We show there exists δ > 0 such that |x − x₀| < δ, x ≠ x₀ implies g(x) ≠ 0, reducing our claim to the previous proposition.
Let δ > 0 be small enough such that |x − x₀| < δ implies g′(x) ≠ 0. Suppose for contradiction there is x ∈ B(x₀, δ) \ {x₀} such that g(x) = 0. By Rolle's Theorem on [x, x₀] (or [x₀, x] if x₀ < x), there exists w between x and x₀ such that g′(w) = 0, a contradiction.
Proposition 3.2.14. Suppose a < b and f, g : [a, b] → ℝ are continuous and differentiable on (a, b]. Suppose f(a) = g(a) = 0, and g′ is nonzero on (a, b]. If
lim_{x→a} f′(x)/g′(x) = L
exists, then lim_{x→a} f(x)/g(x) = L.
Proof. First, by Rolle's Theorem as above, g is nonzero on (a, b]. For each z ∈ (a, b), let h_z(x) = f(x)g(z) − g(x)f(z). Note h_z(x) is continuous on [a, z], h_z(a) = h_z(z) = 0, and h_z′(x) = f′(x)g(z) − g′(x)f(z). By Rolle's theorem, there exists w ∈ (a, z) such that h_z′(w) = 0, so f′(w)g(z) − g′(w)f(z) = 0, i.e.
f(z)/g(z) = f′(w)/g′(w)
Now by the assumption that
lim_{x→a} f′(x)/g′(x) = L
for z (and hence w ∈ (a, z)) sufficiently close to a, we have
|f(z)/g(z) − L| < ε
3.3 Lecture 11 – The Riemann Integral (I)
Today, we will introduce Riemann integration as a method for computing areas under curves. We will develop the Riemann integral, define what it means to be Riemann integrable, and will prove that several classes of functions are Riemann integrable.
Definition 3.3.1. An interval I is any of [a, b], (a, b], [a, b), (a, b) for a ≤ b (allowing for a = −∞, b = ∞). Note that under this definition, ∅ and single points are considered intervals.
Definition 3.3.2. An interval is bounded if a ≠ −∞ and b ≠ ∞. We define the length of a bounded interval to be b − a, denoted |I|.
Definition 3.3.3. A partition of an interval I is a (finite) set P of pairwise disjoint intervals such that ⋃_{J∈P} J = I.
Proposition 3.3.4. If P is a (finite) partition of a bounded interval I, then |I| = Σ_{J∈P} |J|.
Proof. Let P be a partition of a bounded interval I whose endpoints are a, b where a < b ∈ ℝ. Let P = {J₁, …, Jₙ}, and let Jᵢ have endpoints xᵢ < yᵢ for each i. Since P is finite and every pair of elements of P is disjoint, we can rearrange the intervals so that the endpoints are ordered, i.e. x₁ ≤ y₁ ≤ x₂ ≤ y₂ ≤ ⋯ ≤ xₙ ≤ yₙ. Since I is the union of J₁, …, Jₙ, it follows that x₁ = a and yₙ = b, where the intervals J₁ and Jₙ are inclusive on the left and right respectively. Further, we must have yᵢ = xᵢ₊₁ for each i; if not, then there exists z ∈ [a, b] such that yᵢ < z < xᵢ₊₁, contradicting the fact that I is the union of elements of P. Finally, we have
Σ_{J∈P} |J| = Σᵢ₌₁ⁿ (yᵢ − xᵢ) = Σᵢ₌₁ⁿ⁻¹ (xᵢ₊₁ − xᵢ) + (yₙ − xₙ) = yₙ − x₁ = b − a = |I|
Then given partitions P1 , P2 of I, we can pick a partition P 0 which refines both P1 , P2 , whence
the above remark completes the proof.
Definition 3.3.11. Let I be a bounded interval, and f : I → ℝ be piecewise constant. Then
p.c. ∫_I f = p.c. ∫_[P] f
for some (equivalently, by the last proposition, all) partition(s) P of I such that f is piecewise constant with respect to P.
Definition 3.3.12. We say f̄ majorizes f on I if for all x ∈ I, f̄(x) ≥ f(x). Similarly, f̲ minorizes f on I if for all x ∈ I, f̲(x) ≤ f(x).
Definition 3.3.13. Let I be a bounded interval, and f : I → ℝ be bounded. Then the upper Riemann integral of f on I is defined as
∫̄_I f = inf{ p.c. ∫_I f̄ : f̄ majorizes f and is piecewise constant on I }
The lower Riemann integral of f on I is defined as
∫̲_I f = sup{ p.c. ∫_I f̲ : f̲ minorizes f and is piecewise constant on I }
Since any piecewise constant minorant is at most any piecewise constant majorant,
p.c. ∫_[P] f̲ ≤ p.c. ∫_[P] f̄
and hence
∫̲_I f ≤ ∫̄_I f
We say f is Riemann integrable on I, with ∫_I f defined as the common value, if
∫̲_I f = ∫̄_I f
Given a partition P of I, define g_P : I → ℝ by g_P(y) = sup_{x∈J} f(x) for y ∈ J ∈ P. Then g_P is piecewise constant with respect to P, and it majorizes f. Moreover, for any f̄ majorizing f which is piecewise constant with respect to P, we have f̄(y) ≥ g_P(y). By definition, we have
p.c. ∫_[P] g_P = Σ_{J∈P} |J| sup_{x∈J} f(x)
We refer to the sum on the RHS as the upper Riemann sum of f, P, denoted U(f, P). Since g_P majorizes f for each partition P, and is piecewise constant with respect to P, we have
∫̄_I f = inf{ p.c. ∫_I f̄ : f̄ majorizes f and is piecewise constant on I } ≤ U(f, P)
Similarly, since f̄(y) ≥ g_P(y) for any particular partition P, we have
inf{ U(f, P) : P a partition of I } ≤ p.c. ∫_[P] f̄
so
∫̄_I f = inf{ U(f, P) : P a partition of I }
Similarly, define the lower Riemann sum
L(f, P) = Σ_{J∈P} |J| inf_{x∈J} f(x)
By the symmetric argument,
∫̲_I f = sup{ L(f, P) : P a partition of I }
In particular, if for every ε > 0 there is a partition P with U(f, P) ≤ L(f, P) + ε, then ∫̄_I f ≤ ∫̲_I f + ε for every ε > 0, so ∫̄_I f = ∫̲_I f and f is Riemann integrable.
Theorem. Let I be a bounded interval, and let f : I → ℝ be uniformly continuous. Then f is Riemann integrable on I.
Proof. Fix ε > 0; by the above, it suffices to find a partition P of I with U(f, P) ≤ L(f, P) + ε.
Since f is uniformly continuous, there exists δ > 0 such that |x − y| < δ ⟹ |f(x) − f(y)| < ε/|I|. Take any partition P such that every interval J in P has length less than δ. Then for each J ∈ P, for any x, y ∈ J,
f(x) < f(y) + ε/|I|
So
sup_{x∈J} f(x) ≤ inf_{y∈J} f(y) + ε/|I|
and thus
|J| sup_{x∈J} f(x) ≤ |J| inf_{x∈J} f(x) + ε|J|/|I|
Summing over J ∈ P,
U(f, P) ≤ L(f, P) + ε Σ_{J∈P} |J|/|I| = L(f, P) + ε
i.e.
∫̄_I f ≤ ∫̲_I f + ε
as desired.
Corollary 3.3.18 (S07.9). Let f : [a, b] → ℝ be continuous ([a, b] bounded). Then f is Riemann integrable on [a, b].
Proof. This is immediate, since [a, b] is compact, so f is in fact uniformly continuous on
[a, b], hence Riemann integrable by the last theorem.
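To see the upper and lower sums squeeze together for a continuous function, here is a small sketch for f(x) = x² on [0, 1]; endpoint evaluation gives the exact sup/inf on each piece because f is monotone there (the helper is my own):

```python
def riemann_sums(f, a, b, n):
    """Upper and lower Riemann sums over n equal subintervals,
    taking sup/inf from the endpoint values (valid when f is
    monotone on each subinterval)."""
    h = (b - a) / n
    upper = lower = 0.0
    for k in range(n):
        left, right = f(a + k * h), f(a + (k + 1) * h)
        upper += h * max(left, right)
        lower += h * min(left, right)
    return upper, lower

U, L = riemann_sums(lambda x: x * x, 0.0, 1.0, 1000)
# U - L telescopes to (f(1) - f(0))/n = 1/1000; both squeeze 1/3.
```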
Proposition 3.3.19. If f : I → ℝ is bounded by M, then
−M|I| ≤ ∫̲_I f ≤ ∫̄_I f ≤ M|I|
Proof. This is clear, since the constant function M majorizes f (resp. −M minorizes f).
Theorem 3.3.20. Let f : I → ℝ be continuous. Suppose I is a bounded interval and f is bounded. Then f is Riemann integrable on I.
Proof. Let ε > 0. We will find a partition P of I such that
U(f, P) ≤ L(f, P) + ε
whence f is Riemann integrable on I, as shown in the previous theorem. Let a, b be the left and right endpoints respectively of I. Let M bound |f| on I. Let δ > 0 be small enough so that 2δ < b − a and 2Mδ < ε/3.
By the previous theorem, since f is uniformly continuous on [a + δ, b − δ] (a continuous function on a compact interval), f is Riemann integrable on [a + δ, b − δ]. Hence, there exists a partition P̃ of [a + δ, b − δ] such that
U(f, P̃) ≤ L(f, P̃) + ε/3
Let P be P̃ together with the two end intervals of length δ. Since |f| ≤ M,
inf_{x∈(a,a+δ)} f(x) ≥ −M, inf_{x∈(b−δ,b)} f(x) ≥ −M, sup_{x∈(a,a+δ)} f(x) ≤ M, sup_{x∈(b−δ,b)} f(x) ≤ M
Then
U(f, P) = δ sup_{x∈(a,a+δ)} f(x) + U(f, P̃) + δ sup_{x∈(b−δ,b)} f(x)
≤ 2Mδ + U(f, P̃)
≤ L(f, P̃) + ε/3 + 2Mδ
= L(f, P̃) − 2Mδ + ε/3 + 4Mδ
≤ δ inf_{x∈(a,a+δ)} f(x) + L(f, P̃) + δ inf_{x∈(b−δ,b)} f(x) + ε/3 + 4Mδ
= L(f, P) + ε/3 + 4Mδ
< L(f, P) + ε/3 + 2ε/3
= L(f, P) + ε
U(f, P′) − L(f, P′) = Σ_{J∈P} (U(f, P_J) − L(f, P_J)) ≤ n · (ε/n) = ε
Proposition 3.3.23. If f : [a, b] → ℝ is monotone, then f is Riemann integrable on [a, b].
Proof. Homework (S13.1, F12.2).
Example 3.3.24. We now exhibit the standard example of a function f : [0, 1] → ℝ which is not Riemann integrable. Let
f(x) = 0 if x ∈ [0, 1] ∩ ℚ; 1 if x ∈ [0, 1] \ ℚ
Then for every interval J ⊆ [0, 1] of nonzero length,
sup_{x∈J} f(x) = 1, inf_{x∈J} f(x) = 0
so U(f, P) = 1 and L(f, P) = 0 for every partition P into intervals of nonzero length, whence ∫̄_{[0,1]} f = 1 ≠ 0 = ∫̲_{[0,1]} f.
(2) If f(x) ≥ 0 for all x ∈ I, then
∫_I f ≥ 0
Using (1) and (2), if f(x) ≥ g(x) for all x ∈ I, then
∫_I f ≥ ∫_I g
(6) The functions min(f, g) : x ↦ min(f(x), g(x)) and max(f, g) : x ↦ max(f(x), g(x)) are both Riemann integrable on I.
3.4 Lecture 12 – The Riemann Integral (II)
Today, we will prove several theorems which will help us calculate integrals. In particular, we will prove the 1st and 2nd fundamental theorems of calculus; we will introduce the Riemann–Stieltjes integral; and prove a kind of product rule (integration by parts) and chain rule (change of variables) for integrals. We will begin by proving (6) from the theorem introduced at the end of Lecture 11.
Proof. We will prove the result for the max function; the proof for the min function is similar. We start with a claim.
Claim 3.4.1. Let a, ā, b, b̄ ∈ ℝ be such that ā ≥ a, b̄ ≥ b. Then max{ā, b̄} − max{a, b} ≤ (ā − a) + (b̄ − b).
Proof. Note
max(ā, b̄) − max(a, b) = min(max(ā, b̄) − a, max(ā, b̄) − b)
≤ ā − a if max(ā, b̄) = ā; b̄ − b if max(ā, b̄) = b̄
≤ (ā − a) + (b̄ − b)
Fix ε > 0. Since f, g are Riemann integrable on I, there exist partitions P₁, P₂ of I such that
U(f, P₁) − L(f, P₁) ≤ ε/2 and U(g, P₂) − L(g, P₂) ≤ ε/2
Let P refine P₁, P₂. Note that for any refinement K′ of a partition K, we have U(f, K′) ≤ U(f, K) and L(f, K′) ≥ L(f, K). Hence,
U(f, P) − L(f, P) ≤ ε/2 and U(g, P) − L(g, P) ≤ ε/2
Fix J ∈ P. Note
max(f(x), g(x)) ≤ max(sup_{x∈J} f(x), sup_{x∈J} g(x))
for each x ∈ J, so
max(sup_{x∈J} f(x), sup_{x∈J} g(x)) ≥ sup_{x∈J} max(f(x), g(x))
Similarly,
max(inf_{x∈J} f(x), inf_{x∈J} g(x)) ≤ max(f(x), g(x))
for each x ∈ J, so
max(inf_{x∈J} f(x), inf_{x∈J} g(x)) ≤ inf_{x∈J} max(f(x), g(x))
Combining these with Claim 3.4.1 (applied with ā = sup_{x∈J} f(x), b̄ = sup_{x∈J} g(x), a = inf_{x∈J} f(x), b = inf_{x∈J} g(x)), we get
U(max(f, g), P) − L(max(f, g), P)
= Σ_{J∈P} |J| [ sup_{x∈J} max(f(x), g(x)) − inf_{x∈J} max(f(x), g(x)) ]
≤ Σ_{J∈P} |J| [ (sup_{x∈J} f(x) − inf_{x∈J} f(x)) + (sup_{x∈J} g(x) − inf_{x∈J} g(x)) ]
= (U(f, P) − L(f, P)) + (U(g, P) − L(g, P))
≤ ε/2 + ε/2 = ε
whence max(f, g) is Riemann integrable on I. (The analogous estimate for products uses the telescoping identity
sup f · sup g − inf f · inf g = (sup f − inf f) · sup g + inf f · (sup g − inf g)
on each J ∈ P.)
Calculating Integrals
Theorem 3.4.7 (Fundamental Theorem of Calculus I). Let a < b, f : [a, b] → ℝ be Riemann integrable. Let F : [a, b] → ℝ be the function F(x) = ∫_{[a,x]} f. Then F is continuous on [a, b], and for every x₀ ∈ [a, b], if f is continuous at x₀, then F is differentiable at x₀ and F′(x₀) = f(x₀).
Proof. Let M be a bound for |f| on [a, b]. Then for every x < y in [a, b], we have:
|F(y) − F(x)| = |∫_{[a,y]} f − ∫_{[a,x]} f| = |∫_{[x,y]} f| ≤ ∫_{[x,y]} |f| ≤ ∫_{[x,y]} M = M(y − x)
so F is Lipschitz, hence continuous on [a, b]. Now suppose f is continuous at x₀, and fix ε > 0. Choose δ > 0 such that |x − x₀| < δ implies |f(x) − f(x₀)| < ε. So |x − x₀| < δ implies (taking x₀ < x; the case x < x₀ is symmetric)
(x − x₀)(f(x₀) − ε) < F(x) − F(x₀) < (x − x₀)(f(x₀) + ε)
whence
f(x₀) − ε < (F(x) − F(x₀))/(x − x₀) < f(x₀) + ε
i.e.
|x − x₀| < δ ⟹ |(F(x) − F(x₀))/(x − x₀) − f(x₀)| < ε
Theorem (Fundamental Theorem of Calculus II). Let F : [a, b] → ℝ be differentiable, and suppose f = F′ is Riemann integrable on [a, b]. Then ∫_{[a,b]} f = F(b) − F(a).
Proof. Let P be a partition of [a, b]. Let J ∈ P (with |J| > 0), say the left and right endpoints of J are y and z. Then by the Mean Value Theorem, since F′ = f by assumption, F(z) − F(y) = (z − y)f(w) = |J|f(w) for some w ∈ J. So
|J| inf_{x∈J} f(x) ≤ F(z) − F(y) ≤ |J| sup_{x∈J} f(x)
For each J ∈ P, denote the left and right endpoints of J by y_J and z_J respectively. Then summing over all J ∈ P, we find
L(f, P) ≤ Σ_{J∈P} (F(z_J) − F(y_J)) ≤ U(f, P)
The middle sum is telescoping, so only the first and last terms of the sum survive, and we get
L(f, P) ≤ F(b) − F(a) ≤ U(f, P)
Since f is Riemann integrable, we can get L(f, P) and U(f, P) arbitrarily close to ∫_{[a,b]} f. By the last inequality, then
∫_{[a,b]} f = F(b) − F(a)
Hereafter, we denote ∫_{[a,b]} f by ∫_a^b f.
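A numeric check of the 2nd fundamental theorem; the midpoint-rule integrator and the test pair F = sin, F′ = cos are my own choices:

```python
import math

def integral(f, a, b, n=100000):
    """Midpoint-rule approximation of the Riemann integral of f on [a, b]."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

# int_0^{pi/2} cos = sin(pi/2) - sin(0) = 1, per the 2nd FTC.
approx = integral(math.cos, 0.0, math.pi / 2)
```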
Proposition (Integration by Parts). Let F, G : [a, b] → ℝ be differentiable with F′, G′ Riemann integrable on [a, b]. Then
∫_a^b F G′ = F(b)G(b) − F(a)G(a) − ∫_a^b F′ G
Proof. F is differentiable, hence continuous, so F is Riemann integrable on [a, b]; the same holds for G. Since F′, G′ are Riemann integrable on [a, b], G′F and FG′ are also Riemann integrable on [a, b]. By the 2nd fundamental theorem of calculus and the product rule for differentiation,
∫_a^b F′ G + ∫_a^b F G′ = ∫_a^b (FG)′ = (FG)(b) − (FG)(a) = F(b)G(b) − F(a)G(a)
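Checking integration by parts with F(x) = x, G(x) = sin x on [0, π]; the integrator below is a plain midpoint rule, used only for illustration:

```python
import math

def integral(f, a, b, n=100000):
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

lhs = integral(lambda x: x * math.cos(x), 0.0, math.pi)  # int F G'
rhs = math.pi * math.sin(math.pi) - 0.0 * math.sin(0.0) \
      - integral(math.sin, 0.0, math.pi)                 # F G |_0^pi - int F' G
```

Both sides evaluate to −2 here, since ∫₀^π x cos x dx = [x sin x + cos x]₀^π = −2.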
Proposition. Let α : [a, b] → ℝ be monotone increasing and differentiable, with α′ Riemann integrable on [a, b]. Let f : [a, b] → ℝ be Riemann–Stieltjes integrable with respect to α. Then
∫_a^b f dα = ∫_a^b f α′
Proof. Consider first the case that f is piecewise constant on [a, b] with respect to P, taking the value c_J on each J ∈ P. Then for each J ∈ P with endpoints y ≤ z, by the 2nd fundamental theorem of calculus,
∫_J f α′ = c_J ∫_J α′ = c_J (α(z) − α(y)) = c_J α[J]
so
∫_a^b f α′ = Σ_{J∈P} c_J α[J] = ∫_a^b f dα
For the general case, approximate f by piecewise constant functions. See Corollary 11.10.3 from Tao I.
Proposition 3.4.13. Let φ : [a, b] → [φ(a), φ(b)] be continuous and monotone increasing. Let f : [φ(a), φ(b)] → ℝ be Riemann integrable. Then f ∘ φ : [a, b] → ℝ is Riemann–Stieltjes integrable with respect to φ, and
∫_a^b (f ∘ φ) dφ = ∫_{φ(a)}^{φ(b)} f
Proof. We will show this for the case of piecewise constant functions f. The general case follows by a standard argument (see Tao I, 11.10.6). Say f is piecewise constant with respect to P, taking the value c_J on each J ∈ P. Note if J ⊆ [φ(a), φ(b)] is an interval, then the endpoints of J are taken as values by φ by the Intermediate Value Theorem. So there is an interval J̃ ⊆ [a, b] such that φ(J̃) = J. Then
∫_{φ(a)}^{φ(b)} f = Σ_{J∈P} c_J |J| = Σ_{J∈P} c_J φ[J̃] = ∫_a^b (f ∘ φ) dφ
Combining this with the previous proposition: if moreover φ is differentiable with φ′ Riemann integrable, then
∫_a^b (f ∘ φ) φ′ = ∫_{φ(a)}^{φ(b)} f
Week 4
As per the syllabus, Week 4 topics include: Young's, Hölder's, and Minkowski's inequalities, formal power series, radius of convergence, real analytic functions, absolute and uniform convergence on closed subintervals, derivatives and integrals of power series, Taylor's formula, Abel's lemma, Abel's theorem for uniform convergence and continuity, Stone–Weierstrass theorem, Cauchy mean value theorem, Taylor's theorem with remainder in Lagrange, Cauchy, and integral forms, Newton's method for finding roots of a single function, error bounds in numerical integration and differentiation (homework).
4.1 Lecture 13 – Limits of Integrals, Mean Value Theorem for Integrals, and Integral Inequalities
Today, we will discuss how limits of integrals behave with respect to certain classes of functions. We will prove an analogue of the Mean Value Theorem for integrals. Finally, we prove some useful integral inequalities, namely Cauchy–Schwarz and Young's, Hölder's, and Minkowski's inequalities.
A few words on limits
Definition 4.1.1. If f : [a, b) → ℝ is not bounded, we can still try to make sense of ∫_a^b f as lim_{x→b} ∫_a^x f. If the limit exists, take ∫_a^b f to be this limit. We can similarly define an interpretation of ∫_a^b f on the left side of the interval, as well as for integrals of the form ∫_a^∞ f, ∫_{−∞}^b f.
Theorem 4.1.2 (F04.3). If fₙ : [a, b] → ℝ are Riemann integrable and (fₙ) converges uniformly to f : [a, b] → ℝ, then f is Riemann integrable, and
∫_a^b f = lim_{n→∞} ∫_a^b fₙ
Proof. Homework F04.3 (and a few more). For a counterexample when uniform convergence
is not assumed, see S08.2 (among others).
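An illustration of Theorem 4.1.2 (the particular sequence is my own choice): fₘ(x) = √(x² + 1/m²) converges uniformly to |x| on [−1, 1], with sup difference exactly 1/m, so the integrals converge to ∫_{−1}^1 |x| = 1.

```python
import math

def integral(f, a, b, n=20000):
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

ints = [integral(lambda x, m=m: math.sqrt(x * x + 1.0 / (m * m)), -1.0, 1.0)
        for m in (1, 10, 100)]
# The integrals decrease monotonically toward int_{-1}^1 |x| dx = 1.
```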
This gives the following special case for sums:
Theorem 4.1.3. If fₙ : [a, b] → ℝ are Riemann integrable and Σ_{n=1}^∞ fₙ converges uniformly on [a, b], then
∫_a^b Σ_{n=1}^∞ fₙ = Σ_{n=1}^∞ ∫_a^b fₙ
A useful proposition to know in the context of the theorem above is the Weierstrass M-test:
Theorem 4.1.4 (Weierstrass M-test). If ‖fₙ‖_sup ≤ Mₙ for each n, and Σ_{n=1}^∞ Mₙ converges, then Σ_{n=1}^∞ fₙ converges uniformly.
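A concrete instance of the M-test (the series and grid are chosen for illustration): with fₙ(x) = sin(nx)/n², we have ‖fₙ‖_sup ≤ 1/n² and Σ 1/n² < ∞, so the partial sums are uniformly Cauchy.

```python
import math

def partial_sum(N, x):
    return sum(math.sin(n * x) / (n * n) for n in range(1, N + 1))

xs = [k * 0.01 for k in range(629)]  # grid covering [0, 2*pi]
gap = max(abs(partial_sum(200, x) - partial_sum(100, x)) for x in xs)
tail_bound = sum(1.0 / (n * n) for n in range(101, 201))
# gap <= tail_bound < 1/100, uniformly in x.
```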
Theorem (Mean Value Theorem for Integrals II). Let f : [a, b] → ℝ be continuous and φ : [a, b] → [0, ∞) be Riemann integrable. Then there exists c ∈ [a, b] such that
∫_a^b φ f = f(c) ∫_a^b φ
Note that Mean Value Theorem I follows from the special case φ = 1.
Proof. First, since f is continuous and [a, b] is compact, note that f achieves its minimum and maximum, say at points x_min, x_max ∈ [a, b] respectively. Then in particular, |f| is bounded by some M ∈ ℝ, so if
∫_a^b φ = 0
then
|∫_a^b φ f| ≤ ∫_a^b M φ = 0
so
∫_a^b φ f = 0 = f(x) ∫_a^b φ
for any x ∈ [a, b]. Now assume
∫_a^b φ ≠ 0
Replace φ by
φ / ∫_a^b φ
so that ∫_a^b φ = 1. This normalization does not change the nature of the proof, but makes some later details simpler. We then note
∫_a^b f(x_min) φ ≤ ∫_a^b φ f ≤ ∫_a^b f(x_max) φ
i.e.
f(x_min) ∫_a^b φ ≤ ∫_a^b φ f ≤ f(x_max) ∫_a^b φ
hence
f(x_min) ≤ ∫_a^b φ f ≤ f(x_max)
By the Intermediate Value Theorem, since f(x_min), f(x_max) ∈ f([a, b]), there exists c ∈ [a, b] (in fact, c is in between x_min and x_max) such that
f(c) = ∫_a^b φ f
Then
∫_a^b φ f = f(c) = f(c) ∫_a^b φ
Theorem (Cauchy–Schwarz Inequality). Let $f, g : I \to \mathbb{R}$ be continuous and Riemann integrable. Then
\[ \left| \int_I fg \right| \le \sqrt{\int_I f^2}\,\sqrt{\int_I g^2} \]
with equality if and only if $f$ and $g$ are linearly dependent, meaning there exists $c \in \mathbb{R}$ such
that $f(x) = c\,g(x)$ for all $x \in I$ (or $g(x) = c\,f(x)$ for all $x \in I$).
Proof. If
\[ \int_I f^2 = 0 \]
then for all $x \in I$, $f(x) = 0$ by the previous proposition, and the theorem is clear. One
proceeds similarly if
\[ \int_I g^2 = 0 \]
Suppose
\[ \int_I f^2 \ne 0, \qquad \int_I g^2 \ne 0 \]
Note that for all $u, v \in \mathbb{R}$, $(u - v)^2 \ge 0$, i.e. $u^2 - 2uv + v^2 \ge 0$, so $2uv \le u^2 + v^2$, with equality
if and only if $u = v$. We apply this with
\[ u = \frac{|f(x)|}{\sqrt{\int_I f^2}}, \qquad v = \frac{|g(x)|}{\sqrt{\int_I g^2}} \]
These can be thought of as the values at $x$ of normalizations of $|f|$ and $|g|$. Then for all $x \in I$,
\[ \frac{2|f(x)g(x)|}{\sqrt{\int_I f^2}\sqrt{\int_I g^2}} \le \frac{f(x)^2}{\int_I f^2} + \frac{g(x)^2}{\int_I g^2} \]
Integrating over $I$,
\[ \frac{2\int_I |fg|}{\sqrt{\int_I f^2}\sqrt{\int_I g^2}} \le 1 + 1 = 2 \]
so
\[ \left| \int_I fg \right| \le \int_I |fg| \le \sqrt{\int_I f^2}\,\sqrt{\int_I g^2} \]
Equality requires $u = v$ for every $x \in I$, i.e. $|f(x)|/\sqrt{\int_I f^2} = |g(x)|/\sqrt{\int_I g^2}$ for all $x \in I$.
In this case, $f$ and $g$ are clearly linearly dependent; if $f = c\,g$ for some $c \in \mathbb{R}$, then one can
trace back through these steps and in fact show that
\[ |c| = \sqrt{\frac{\int_I f^2}{\int_I g^2}} \]
and that equality holds.
Example. (Minimizing $\int_0^1 f'(x)^2\,dx$ over continuously differentiable $f : [0,1] \to \mathbb{R}$ with $f(0) = 0$, $f(1) = 1$.) By Cauchy–Schwarz applied to $f'$ and the constant function 1,
\[ \left| \int_0^1 f'(x) \cdot 1\,dx \right| \le \sqrt{\int_0^1 f'(x)^2\,dx}\,\sqrt{\int_0^1 1\,dx} \]
with equality if and only if $f'(x) = c$ for some constant $c \in \mathbb{R}$. By the 2nd fundamental
theorem of calculus,
\[ \int_0^1 f'(x) \cdot 1\,dx = f(1) - f(0) = 1 - 0 = 1 \]
so
\[ \int_0^1 f'(x)^2\,dx \ge 1 \qquad (*) \]
with equality if and only if $f' = c$ for a constant $c$ (given $f(1) = 1$, $f(0) = 0$). Hence, any
$f$ for which equality holds in $(*)$ will minimize the expression in question. Note the function
$g(x) = x$ achieves equality. Also, $g$ is the only function satisfying $f(0) = 0$, $f(1) = 1$ and $f'$
constant, so $g$ is the unique minimizing function.
Theorem 4.1.11 (Young's Inequality). Let $\varphi : [0, \infty) \to [0, \infty)$ be continuous and strictly
monotone increasing, with $\varphi(0) = 0$. (Then note we also have $\varphi^{-1} : [0, \infty) \to [0, \infty)$, which
is continuous, strictly monotone increasing, and satisfies $\varphi^{-1}(0) = 0$.) Then for every $a, b \ge 0$,
\[ ab \le \int_0^a \varphi(x)\,dx + \int_0^b \varphi^{-1}(x)\,dx \]

Corollary 4.1.12. Let $p, q > 1$ with $1/p + 1/q = 1$. Then for every $a, b \ge 0$,
\[ ab \le \frac{a^p}{p} + \frac{b^q}{q} \]

Proof. Apply Young's inequality with $\varphi(x) = x^{p-1}$. From $1/p + 1/q = 1$ we get
\[ \frac{1}{q} = \frac{p-1}{p}, \qquad \frac{1}{p} = \frac{q-1}{q} \]
so
\[ \frac{1}{p-1} = q - 1, \qquad \text{i.e.} \qquad \varphi^{-1}(y) = y^{1/(p-1)} = y^{q-1} \]
Hence
\[ ab \le \int_0^a x^{p-1}\,dx + \int_0^b y^{q-1}\,dy = \frac{x^p}{p}\bigg|_0^a + \frac{y^q}{q}\bigg|_0^b = \frac{a^p}{p} + \frac{b^q}{q} \]

Note that the special case $p = q = 2$ gives
\[ ab \le \frac{a^2}{2} + \frac{b^2}{2} \]
which is exactly what we used to get Cauchy–Schwarz. We can use a similar argument to
the proof of Cauchy–Schwarz, using the more general inequality proved above, to show the
following inequality.
Theorem 4.1.13 (Hölder's Inequality). Let $f, g : I \to \mathbb{R}$ be continuous and Riemann integrable, and let $|I| > 0$. Let $p, q > 1$ such that $1/p + 1/q = 1$. Then
\[ \int_I |fg| \le \sqrt[p]{\int_I |f|^p}\,\sqrt[q]{\int_I |g|^q} \]

Theorem 4.1.14 (Minkowski's Inequality). With $f, g, I, p$ as above,
\[ \sqrt[p]{\int_I |f+g|^p} \le \sqrt[p]{\int_I |f|^p} + \sqrt[p]{\int_I |g|^p} \]

Proof. First note
\[ q = 1 + \frac{1}{p-1} \]
so
\[ q(p-1) = \left(1 + \frac{1}{p-1}\right)(p-1) = p - 1 + 1 = p \]
By the triangle inequality,
\[ \int_I |f+g|^p = \int_I |f+g|\,|f+g|^{p-1} \le \int_I (|f| + |g|)\,|f+g|^{p-1} \]
with equality if and only if $f(x)$ and $g(x)$ have the same sign for
each $x \in I$. Then by Hölder's Inequality, we obtain
\[ \int_I |f|\,|f+g|^{p-1} + \int_I |g|\,|f+g|^{p-1} \le \sqrt[p]{\int_I |f|^p}\,\sqrt[q]{\int_I |f+g|^{q(p-1)}} + \sqrt[p]{\int_I |g|^p}\,\sqrt[q]{\int_I |f+g|^{q(p-1)}} \]
\[ = \left(\sqrt[p]{\int_I |f|^p} + \sqrt[p]{\int_I |g|^p}\right)\sqrt[q]{\int_I |f+g|^p} \]
Dividing by
\[ \sqrt[q]{\int_I |f+g|^p} \]
we get
\[ \left(\int_I |f+g|^p\right)^{1 - 1/q} \le \sqrt[p]{\int_I |f|^p} + \sqrt[p]{\int_I |g|^p} \]
and since $1 - 1/q = 1/p$,
\[ \sqrt[p]{\int_I |f+g|^p} \le \sqrt[p]{\int_I |f|^p} + \sqrt[p]{\int_I |g|^p} \]
For equality, through the use of Hölder's Inequality, we need $|f|^p$, $|f+g|^{q(p-1)}$ to be linearly dependent, and $|g|^p$, $|f+g|^{q(p-1)}$ to be linearly dependent, so in particular, we need $|f|$, $|g|$ to be
linearly dependent. Also, through the use of the triangle inequality, we need $f(x)$ and $g(x)$ to
have the same sign for all $x \in I$. Combining these conditions, we need $f$ and $g$ to be linearly
dependent with a constant $c \ge 0$. This is a necessary condition for equality, and it is
straightforward to check it is also sufficient.
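The inequalities above are easy to spot-check numerically. Below is a short Python sketch (the particular functions $f(x) = \sin x + 2$, $g(x) = x^2$ and exponents $p = 3$, $q = 3/2$ are our choices for illustration), approximating the integrals by midpoint Riemann sums:

```python
import math

def integral(f, a, b, steps=20_000):
    """Midpoint Riemann sum approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

f = lambda x: math.sin(x) + 2
g = lambda x: x * x
a, b, p, q = 0.0, 1.0, 3.0, 1.5      # 1/3 + 1/1.5 = 1

# Hölder: integral of |f g| <= (integral |f|^p)^(1/p) * (integral |g|^q)^(1/q)
lhs = integral(lambda x: abs(f(x) * g(x)), a, b)
rhs = (integral(lambda x: abs(f(x))**p, a, b)**(1 / p)
       * integral(lambda x: abs(g(x))**q, a, b)**(1 / q))
assert lhs <= rhs + 1e-9

# Minkowski: the p-norm of f + g is at most the sum of the p-norms
mink_lhs = integral(lambda x: abs(f(x) + g(x))**p, a, b)**(1 / p)
mink_rhs = (integral(lambda x: abs(f(x))**p, a, b)**(1 / p)
            + integral(lambda x: abs(g(x))**p, a, b)**(1 / p))
assert mink_lhs <= mink_rhs + 1e-9
```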
4.2 Lecture 14 – Power Series (I), Taylor Series, and Abel's Lemma/Theorem
Today, we will introduce formal power series. We will discuss how the radius of convergence
affects the convergence of power series, including two powerful tools, namely Abel's Lemma
and Abel's Theorem. We will also briefly introduce Taylor series.
Definition 4.2.1. A formal power series centered at $a \in \mathbb{R}$ is any series of the form
\[ \sum_{n=0}^\infty c_n(x-a)^n \]
where $c_n \in \mathbb{R}$ is called the $n$th coefficient of the series. The radius of convergence of the
series is defined to be
\[ R = \frac{1}{\limsup_{n\to\infty} |c_n|^{1/n}} \]
We allow $R = +\infty$ if $\limsup_{n\to\infty} |c_n|^{1/n} = 0$ and $R = 0$ if $\limsup_{n\to\infty} |c_n|^{1/n} = \infty$.
Theorem 4.2.2. (a) If $|x - a| > R$, then
\[ \sum_{n=0}^\infty c_n(x-a)^n \]
diverges.
(b) (S06.2) If $|x - a| < R$, then
\[ \sum_{n=0}^\infty c_n(x-a)^n \]
converges absolutely.
Proof. (a) It is enough to show that $|c_n(x-a)^n|$ does not converge to 0; to do this, it is
enough to find infinitely many $n$ where $|c_n(x-a)^n| \ge 1$. By the definition of lim sup, for
each $\varepsilon > 0$, there are infinitely many $n$ such that
\[ |c_n|^{1/n} > \limsup_{n\to\infty} |c_n|^{1/n} - \varepsilon = \frac{1}{R} - \varepsilon \]
Pick $\varepsilon = \frac{1}{|x-a|}\left(\frac{|x-a|}{R} - 1\right) > 0$. Then for infinitely many $n$,
\[ |c_n|^{1/n}|x-a| > \left(\frac{1}{R} - \varepsilon\right)|x-a| = \frac{|x-a|}{R} - \frac{|x-a|}{R} + 1 = 1 \]
so $|c_n(x-a)^n| > 1$ for infinitely many $n$. In particular, $(c_n(x-a)^n)_{n\in\mathbb{N}}$ does not converge
to 0.
(b) Again using the definition of $R$ and lim sup, for every $\varepsilon > 0$,
\[ |c_n|^{1/n} < \frac{1}{R} + \varepsilon \]
for all but finitely many $n$ (say for all $n \ge k \in \mathbb{N}$). Since $|x-a| < R$ by assumption, put
\[ \varepsilon = \left(1 - \frac{|x-a|}{R}\right)\frac{1}{2|x-a|} > 0 \]
Then for all but finitely many $n$,
\[ |c_n|^{1/n}|x-a| < \left(\frac{1}{R} + \varepsilon\right)|x-a| = \frac{|x-a|}{R} + \frac{1}{2}\left(1 - \frac{|x-a|}{R}\right) = \frac{R + |x-a|}{2R} < \frac{2R}{2R} = 1 \]
whence
\[ |c_n(x-a)^n| < L^n \]
where $L = \frac{R + |x-a|}{2R} < 1$, so
\[ \sum_{n=0}^\infty |c_n(x-a)^n| \le \sum_{n=0}^{k-1} |c_n(x-a)^n| + \sum_{n=k}^\infty L^n < \infty \]
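The root-test formula for $R$ is easy to probe numerically. A minimal Python sketch (the coefficient sequence $c_n = n/2^n$, for which $R = 2$ since $n^{1/n} \to 1$, is our example):

```python
import math

# Estimate R = 1 / limsup |c_n|**(1/n) for c_n = n / 2**n; here R should be 2,
# because the polynomial factor n does not change the limsup.
def root_test_estimate(n):
    log_c = math.log(n) - n * math.log(2)   # log of c_n, avoids float overflow
    return 1.0 / math.exp(log_c / n)        # 1 / |c_n|**(1/n)

print(root_test_estimate(100), root_test_estimate(10_000))  # approaches 2
```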
Theorem 4.2.3. Let
\[ f(x) = \sum_{n=0}^\infty c_n(x-a)^n \]
have radius of convergence $R$, and let $0 < r < R$. Then:
(a) The partial sums $\sum_{n=0}^m c_n(x-a)^n$ converge uniformly to $f$ on $[a-r, a+r]$.
(b) $f$ is differentiable on $(a-R, a+R)$, and term-by-term differentiation is valid:
\[ f'(x) = \sum_{n=1}^\infty n\,c_n(x-a)^{n-1} \]
(c) Term-by-term integration is valid: for $[y, z] \subseteq (a-R, a+R)$,
\[ \int_y^z f = \sum_{n=0}^\infty \frac{c_n(x-a)^{n+1}}{n+1}\bigg|_y^z \]
Proof. The proofs here are relatively straightforward with the help of the previous theorem,
with the exception of (b), which is a little tricky.

(a) The key here is to use the Weierstrass M-test with the help of the bound from the
previous proof. That is, in our proof of (b) in the last theorem, used with $x = a + r$, we
found $L < 1$ such that for all $n \ge k$ for some $k \in \mathbb{N}$,
\[ |c_n r^n| < L^n \]
Then certainly, since $|x - a| \le r$ for each $x \in [a-r, a+r]$, we have
\[ |c_n(x-a)^n| \le |c_n r^n| < L^n \]
for all $n \ge k$. Thus, it follows that
\[ \sum_{n=0}^\infty \sup_{x\in[a-r,a+r]} |c_n(x-a)^n| \le \sum_{n=0}^{k-1} \sup_{x\in[a-r,a+r]} |c_n(x-a)^n| + \sum_{n=k}^\infty L^n < \infty \]
Hence, by the Weierstrass M-test,
\[ \sum_{n=0}^m c_n(x-a)^n \]
converges uniformly to $f$ on $[a-r, a+r]$ as $m \to \infty$.
(b) By Tao II 3.7.2, it is sufficient to show that
\[ \left(\left(\sum_{n=0}^l c_n(x-a)^n\right)'\right)_{l\in\mathbb{N}} = \left(\sum_{n=1}^l n\,c_n(x-a)^{n-1}\right)_{l\in\mathbb{N}} \]
converges uniformly on $[a-r, a+r]$. For this, by part (a) of this theorem, it is enough to show that the radius of convergence of
\[ \sum_{n=1}^\infty n\,c_n(x-a)^{n-1} \]
is $> r$. For this, it is sufficient to find one point $x$ outside
$[a-r, a+r]$ at which
\[ \sum_{n=1}^\infty n\,c_n(x-a)^{n-1} \]
converges; this is applying part (a) of the previous theorem (the series must diverge
at EVERY point outside the radius of convergence). Pick some $x, w \in \mathbb{R}$ such that
$r < |x - a| < |w - a| < R$. Since $|w - a| < R$, by part (b) of the previous theorem, $f(w)$ converges absolutely;
in particular, $|c_n(w-a)^n|$ is bounded by $M$ for some $M \in \mathbb{R}$. Then we compute
\[ \sum_{n=1}^\infty |n\,c_n(x-a)^{n-1}| = \sum_{n=1}^\infty n|c_n|\,|w-a|^{n-1}\,\frac{|x-a|^{n-1}}{|w-a|^{n-1}} \le \frac{M}{|w-a|} \sum_{n=1}^\infty n\,\frac{|x-a|^{n-1}}{|w-a|^{n-1}} < \infty \]
where the last series converges (e.g. by the ratio test), since $|x-a|/|w-a| < 1$.
(c) Since
\[ \sum_{n=0}^m c_n(x-a)^n \]
converges uniformly to $f$ on $[a-r, a+r]$ for any $r < R$ by part (a), it follows that on any
subinterval $[y, z] \subseteq (a-R, a+R)$,
\[ \int_y^z f_m \to \int_y^z f \]
where $f_m$ denotes the $m$th partial sum, i.e.
\[ \int_y^z \sum_{n=0}^\infty c_n(x-a)^n\,dx = \lim_{m\to\infty} \int_y^z \sum_{n=0}^m c_n(x-a)^n\,dx = \lim_{m\to\infty} \sum_{n=0}^m \int_y^z c_n(x-a)^n\,dx \]
by Theorem 4.1.2, and each summand integrates to $\frac{c_n(x-a)^{n+1}}{n+1}\big|_y^z$.
Definition 4.2.4. $f : E \to \mathbb{R}$ is real analytic at $a \in \operatorname{int}(E)$ if on some neighborhood
$(a-r, a+r) \subseteq E$, $f$ is equal to a power series with radius of convergence $\ge r$. We say $f$ is
real analytic on an open set $E$ if $f$ is real analytic at each $a \in E$.
Proposition 4.2.5. If $f$ is real analytic on $E$, then $f$ is smooth ($k$-times continuously
differentiable for all $k \in \mathbb{N}$) and for each $k$, $f^{(k)}$ is real analytic on $E$.
Proof. We proved both the base case and the induction step in part (b) of the previous
theorem.
Corollary 4.2.6. If $f$ is real analytic at $a$, equal on $(a-r, a+r)$ to
\[ \sum_{n=0}^\infty c_n(x-a)^n \]
then this series is the Taylor series of $f$ at $a$:
\[ \sum_{n=0}^\infty c_n(x-a)^n = \sum_{n=0}^\infty \frac{f^{(n)}(a)}{n!}(x-a)^n \]
Proof. Since $f$ is real analytic at $a$, we can apply part (b) of the previous theorem to
differentiate $f(x)$ term by term. Then a simple induction argument shows that the constant
term of the power series expansion of $f^{(k)}(x)$ at $a$ is $k!\,c_k$. Plugging in at $a$ makes all the
higher order terms vanish, so we obtain $f^{(k)}(a) = k!\,c_k$. Isolating $c_k$ gives $c_k = f^{(k)}(a)/k!$,
yielding the familiar Taylor expansion on $(a-r, a+r)$.
Corollary 4.2.7. If f is representable by (equal to) two power series with coefficients
(cn ), (dn ), then cn = dn for each n.
Proof. By the above corollary, the Taylor coefficient formula forces $c_n = f^{(n)}(a)/n! = d_n$ for each $n$.
Next, we want to consider the behavior of power series at the endpoints of the interval of
convergence. The series may converge or diverge, but we show that if the series converges
at $a - R$ (resp. $a + R$), then the series converges uniformly on $[a-R, a]$ (resp. $[a, a+R]$).
Lemma 4.2.8 (Abel's Lemma (F12.1)). Let $(b_n)$ be a (non-strictly) decreasing sequence of
nonnegative reals. Let $(a_n)$ be such that the partial sums $\left(\sum_{n=1}^m a_n\right)_m$ are bounded on both sides, say by $A$.
Then
\[ \left| \sum_{j=m+1}^n a_j b_j \right| \le 2A\,b_{m+1} \]
Proof. The key here is to think of this as a special kind of integration by parts for sums.
Let
\[ s_m = \sum_{n=1}^m a_n \]
Claim (summation by parts):
\[ \sum_{j=m+1}^n a_j b_j = s_n b_{n+1} - s_m b_{m+1} - \sum_{j=m+1}^n s_j(b_{j+1} - b_j) \]
Proof of claim. Note
\[ \sum_{j=m+1}^n s_j(b_{j+1} - b_j) = \sum_{j=m+1}^n s_j b_{j+1} - \sum_{j=m+1}^n s_j b_j \]
while, since $a_j = s_j - s_{j-1}$,
\[ \sum_{j=m+1}^n a_j b_j = \sum_{j=m+1}^n s_j b_j - \sum_{j=m+1}^n s_{j-1} b_j = \sum_{j=m+1}^n s_j b_j - \sum_{j=m}^{n-1} s_j b_{j+1} \]
Adding the two identities, the sums $\sum_{j=m+1}^n s_j b_{j+1}$ and $\sum_{j=m}^{n-1} s_j b_{j+1}$ cancel except for their boundary terms, leaving
\[ \sum_{j=m+1}^n a_j b_j + \sum_{j=m+1}^n s_j(b_{j+1} - b_j) = s_n b_{n+1} - s_m b_{m+1} \]
which is the claim.
Now to prove Abel's Lemma, note (by the claim above and since $(b_j)$ is decreasing),
\[ \left| \sum_{j=m+1}^n a_j b_j \right| \le |s_n b_{n+1} - s_m b_{m+1}| + \left| \sum_{j=m+1}^n s_j(b_{j+1} - b_j) \right| \le |s_n| b_{n+1} + |s_m| b_{m+1} + \sum_{j=m+1}^n |s_j|(b_j - b_{j+1}) \]
\[ \le A\,b_{n+1} + A\,b_{m+1} + \sum_{j=m+1}^n A(b_j - b_{j+1}) = A\,b_{n+1} + A\,b_{m+1} + A(b_{m+1} - b_{n+1}) = 2A\,b_{m+1} \]
Theorem 4.2.10 (Abel's Theorem). If
\[ \sum_{n=0}^\infty c_n(x-a)^n \]
has radius of convergence $R < \infty$ and converges at $x = a + R$, then it converges uniformly on $[a, a+R]$.

Proof. By translating, we may assume $a = 0$, so $\sum_n c_n x^n$ converges at $x = R$. Fix $\varepsilon > 0$. Since $\sum_n c_n R^n$ converges, its partial sums are Cauchy, so there is $m_0$ such that
\[ \left| \sum_{j=m+1}^n c_j R^j \right| \le \varepsilon \qquad \text{for all } n > m \ge m_0 \]
Now fix $x \in [0, R]$ and consider
\[ \sum_{j=m+1}^n c_j x^j = \sum_{j=m+1}^n (c_j R^j)\left(\frac{x}{R}\right)^j \]
Apply Abel's Lemma to $a_j = c_j R^j$ (partial sums of the tail bounded by $A = \varepsilon$) and the nonnegative decreasing sequence $b_j = (x/R)^j \le 1$: for all $n > m \ge m_0$,
\[ \left| \sum_{j=m+1}^n c_j x^j \right| \le 2\varepsilon \left(\frac{x}{R}\right)^{m+1} \le 2\varepsilon \]
This bound is independent of $x \in [0, R]$, so the partial sums are uniformly Cauchy, whence the series converges uniformly on $[0, R]$. Similarly at $a - R$.

Corollary 4.2.11. In the situation above, the sum is continuous on $[a, a+R]$; in particular
\[ \lim_{x \to (a+R)^-} \sum_{n=0}^\infty c_n(x-a)^n = \sum_{n=0}^\infty c_n R^n \]
Similarly at $a - R$.

Proof. A uniform limit of continuous functions is continuous, so this is immediate by the
previous theorem.
Example (Taylor series of $\sqrt{1+x}$). Computing derivatives:
\[ f(x) = (1+x)^{1/2} \implies f(0) = 1 \]
\[ f'(x) = \tfrac{1}{2}(1+x)^{-1/2} \implies f'(0) = \tfrac{1}{2} \]
\[ f''(x) = -\tfrac{1}{4}(1+x)^{-3/2} \implies f''(0) = -\tfrac{1}{4} \]
\[ f'''(x) = \tfrac{3}{8}(1+x)^{-5/2} \implies f'''(0) = \tfrac{3}{8} \]
\[ \vdots \]
\[ f^{(k)}(x) = \frac{(-1)^{k+1}(2k-2)!}{(k-1)!\,2^{2k-1}}(1+x)^{-(2k-1)/2} \implies f^{(k)}(0) = \frac{(-1)^{k+1}(2k-2)!}{(k-1)!\,2^{2k-1}} \]
The Taylor series at 0 is therefore
\[ \sum_{k=0}^\infty \frac{(-1)^{k+1}(2k)!}{(k!)^2\,4^k\,(2k-1)}\,x^k \]
Call
\[ c_k = \frac{(-1)^{k+1}(2k)!}{(k!)^2\,4^k\,(2k-1)} \]
(This formula for $c_k$ works also at $k = 0, 1$.) We take for granted for now that $\sum_k c_k x^k$
converges to $\sqrt{1+x}$ on $(-1, 0]$. At $x = -1$, we have
\[ \sum_{k=0}^\infty \frac{(-1)^{k+1}(2k)!}{(k!)^2 4^k(2k-1)}(-1)^k = \frac{(-1)\,0!}{(0!)^2 4^0(-1)} - \sum_{k=1}^\infty \frac{(2k)!}{(k!)^2 4^k(2k-1)} = 1 - \sum_{k=1}^\infty \frac{(2k)!}{(k!)^2 4^k(2k-1)} \]
This is the limit of a decreasing sequence of partial sums; in fact the same is true for any $x \in (-1, 0]$. Let
\[ s_n(x) = \sum_{j=0}^n c_j x^j \]
Then on $(-1, 0]$, $(s_n(x))_n$ is decreasing and converges to $\sqrt{1+x} \ge 0$. At $x = -1$, $s_n(-1) \ge 0$ by continuity, so
$(s_n(-1))_{n\in\mathbb{N}}$ is decreasing and bounded below by 0, whence it converges. Thus, our power
series converges at $x = -1$; by the theorem, it converges uniformly on $[-1, 0]$.
Let's recap. We saw that the polynomials $s_n$ converge uniformly to $\sqrt{1+x}$ on $[-1, 0]$. In
the next lecture, we will prove some more general theorems about functions which can be
uniformly approximated by polynomials. We also found
\[ \lim_{n\to\infty} s_n(-1) = \sqrt{1 + (-1)} = 0 \]
so
\[ 1 - \sum_{k=1}^\infty \frac{(2k)!}{(k!)^2 4^k(2k-1)} = 0, \qquad \text{i.e.} \qquad \sum_{k=1}^\infty \frac{(2k)!}{(k!)^2 4^k(2k-1)} = 1 \]

4.3 Lecture 15 – Stone–Weierstrass and Taylor Series Error Approximation
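The identity $\sum_{k\ge 1} \frac{(2k)!}{(k!)^2 4^k (2k-1)} = 1$ can be checked numerically. Here is a small Python sketch (the term-ratio recurrence $t_{k+1} = t_k \cdot \frac{2k-1}{2k+2}$, used to avoid huge factorials, is our own device; the series converges slowly, like $k^{-3/2}$):

```python
# Check: sum_{k>=1} (2k)! / ((k!)**2 * 4**k * (2k-1)) = 1.
# Successive terms satisfy t_{k+1}/t_k = (2k-1)/(2k+2), with t_1 = 1/2.
def partial(n):
    total, t = 0.0, 0.5
    for k in range(1, n + 1):
        total += t
        t *= (2 * k - 1) / (2 * k + 2)
    return total

print(partial(100), partial(100_000))   # creeps up toward 1
```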
Today, we will discuss the uniform approximation of certain classes of functions by polynomials; we will use this to prove the Stone–Weierstrass theorem. We will also give various
forms of the error bounds for Taylor expansions.

Last time, we showed there is a sequence of polynomials which converges uniformly to
$\sqrt{1+x}$ on $[-1, 0]$. We still need to show that the Taylor expansion of $\sqrt{1+x}$ at 0 converges
to $\sqrt{1+x}$ on $(-1, 0]$. We will prove this later.
Proposition 4.3.1. Let $X$, $Y$ and $Z$ be metric spaces. Suppose $g_n : X \to Y$ converge
uniformly to $g$, and $f_n : Y \to Z$ converge uniformly to $f$, and $f$ is uniformly continuous.
Then $(f_n \circ g_n)$ converges uniformly to $f \circ g$.

Proof. Fix $\varepsilon > 0$. By the uniform continuity of $f$, there exists $\delta > 0$ such that $d(y_1, y_2) < \delta \implies d(f(y_1), f(y_2)) < \varepsilon/2$. Using the uniform convergence of $f_n$ and $g_n$, there exists $N \in \mathbb{N}$ such
that for all $n \ge N$, $d(g_n(x), g(x)) < \delta$ for all $x \in X$ and $d(f_n(y), f(y)) < \varepsilon/2$ for all $y \in Y$. Then for all $n \ge N$ and $x \in X$,
\[ d(f_n(g_n(x)), f(g(x))) \le d(f_n(g_n(x)), f(g_n(x))) + d(f(g_n(x)), f(g(x))) < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon \]
(4) Note
\[ \min(x, y) = \frac{x+y}{2} - \frac{|x-y|}{2} \]
The first term is a polynomial in $x$ and $y$, and the second term is a uniform limit of
polynomials by (2), so a similar argument to (1) and (2) does the job. (Likewise,
$\max(x, y) = \frac{x+y}{2} + \frac{|x-y|}{2}$.)
(5) This can be done using a simple induction argument, using the composition
min(x1 , . . . , xk , xk+1 ) = min(min(x1 , . . . , xk ), xk+1 )
(6) This can similarly be done using a simple induction argument, using the composition
max(x1 , . . . , xk , xk+1 ) = max(max(x1 , . . . , xk ), xk+1 )
Proposition 4.3.3. The previous proposition holds true if you replace $[-1, 1]$ in the domains by $[-M, M]$ for any $M \in \mathbb{R}$.
Proof. This is clear by composing the functions above with the linear polynomial which
scales the interval $[-M, M]$ to $[-1, 1]$.
Theorem 4.3.4 (Stone–Weierstrass). Let $X$ be a compact metric space. Let $A \subseteq C(X)$
(where $C(X)$ is the set of continuous functions from $X$ into $\mathbb{R}$) satisfy:
(1) $A$ is closed under sums, products, and products with scalars. Precisely:
$g \in A,\ c \in \mathbb{R} \implies c\,g \in A$
$f, g \in A \implies f + g \in A$
$f, g \in A \implies fg \in A$
I.e., $A$ is an algebra.
(2) The constant function $x \mapsto 1$ belongs to $A$ (i.e. $A$ is unital).
(3) For every $x_1 \ne x_2 \in X$, there exists $f \in A$ such that $f(x_1) \ne f(x_2)$ ($A$ separates
points in $X$).
Then every $f \in C(X)$ is a uniform limit of functions in $A$.
Example 4.3.5. If $X = [a, b] \subseteq \mathbb{R}$, and $A$ is the set of all polynomials, then $A$ satisfies the
criteria in Stone–Weierstrass, so every continuous function $f : [a, b] \to \mathbb{R}$ is a uniform limit
of polynomials.
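A concrete witness for this special case is Bernstein's construction, sketched below in Python. (Bernstein polynomials are a standard alternative proof of the polynomial case; they are not the proof given in these notes.)

```python
from math import comb

# Bernstein polynomials B_n(f)(x) = sum_k f(k/n) C(n,k) x**k (1-x)**(n-k)
# converge uniformly to f on [0, 1] for continuous f.
def bernstein(f, n, x):
    return sum(f(k / n) * comb(n, k) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

f = lambda x: abs(x - 0.5)                      # continuous, not smooth
grid = [i / 200 for i in range(201)]
err = lambda n: max(abs(bernstein(f, n, x) - f(x)) for x in grid)
assert err(200) < err(20) < err(5)              # uniform error shrinks with degree
```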
Proof. As one might imagine, there is some heavy lifting to be done here. The proof is
essentially broken up into three steps. The first is to find functions in our algebra which
match $f$ at any two given points. Next, we use the continuity of our
approximating functions to keep these functions close to $f$ on small neighborhoods about
one of the two points, then shrink down to a finite set of neighborhoods using compactness.
We then take the minimum of the finitely many functions we recover, and we can approximate
the min function on a finite set of variables uniformly using polynomials (using the previous
proposition!), bounding the difference between $f$ and our approximating functions from
above. The third and final crucial step is to repeat this process using the max function to
bound the difference between $f$ and our approximating functions from below. This is all made
precise in the following claims.

Fix $f \in C(X)$ and $\varepsilon > 0$. We need $g \in A$ such that for all $x \in X$,
\[ |g(x) - f(x)| < \varepsilon \]
Claim 4.3.6. For every $s, t \in X$, there exists $f_{s,t} \in A$ such that $f_{s,t}(s) = f(s)$ and $f_{s,t}(t) = f(t)$.

Proof. If $s = t$, then we can use the constant function $x \mapsto f(s)$. Suppose $s \ne t$. By (3),
there exists $h \in A$ such that $h(s) \ne h(t)$. Since $A$ is closed under multiplication by a scalar,
we can arrange $h(t) - h(s) = f(t) - f(s)$ by multiplying $h$ by
\[ \frac{f(s) - f(t)}{h(s) - h(t)} \]
where the denominator is nonzero by hypothesis. Then adding the constant function $x \mapsto f(s) - h(s)$, our function still lives in $A$ by (1) and (2), and the resulting function $f_{s,t}$ satisfies
\[ f_{s,t}(s) = f(s) \quad \text{and} \quad f_{s,t}(t) = f(t) \]
Claim 4.3.7. For every $s \in X$, there is $\tilde h_s \in A$ such that
(1) $|\tilde h_s(s) - f(s)| < \varepsilon/2$
(2) For all $x \in X$, $\tilde h_s(x) < f(x) + \varepsilon/2$
Proof. The first condition might lead you to think we are taking a step backwards here; after
all, we previously found elements of A which agreed exactly with f at s. However, the first
condition comes at the cost of the second condition, which is clearly much stronger than
exact agreement at two points; our loss of exactness at s is the price for condition (2).
Fix $s \in X$. For each $t \in X$, by the previous claim there exists $f_{s,t} \in A$ such that
\[ f_{s,t}(s) = f(s) \quad \text{and} \quad f_{s,t}(t) = f(t) \]
By the continuity of $f - f_{s,t}$, there exists an open neighborhood $U_t$ of $t$ such that for all
$x \in U_t$,
\[ f_{s,t}(x) < f(x) + \frac{\varepsilon}{4} \]
The neighborhoods $(U_t)_{t\in X}$ form an open cover of $X$, so by compactness there are finitely many $t_1, \ldots, t_k$ such that $U_{t_1} \cup \cdots \cup U_{t_k} = X$. Let $M > 0$ bound $f_{s,t_1}, \ldots, f_{s,t_k}$ on $X$, which is possible since each of $f_{s,t_1}, \ldots, f_{s,t_k}$ is
continuous and $X$ is compact. Let $p : [-M, M]^k \to \mathbb{R}$ be a polynomial such that for
all $(y_1, \ldots, y_k) \in [-M, M]^k$,
\[ |p(y_1, \ldots, y_k) - \min(y_1, \ldots, y_k)| < \frac{\varepsilon}{4} \]
This is possible by our earlier result on uniform approximation of the min in $k$ variables
by polynomials. Now let $\tilde h_s(x) = p(f_{s,t_1}(x), \ldots, f_{s,t_k}(x))$. Then $\tilde h_s \in A$, since $A$ is closed under
polynomial operations and is unital, and for all $x \in X$,
\[ \left| \tilde h_s(x) - \min\bigl(f_{s,t_1}(x), \ldots, f_{s,t_k}(x)\bigr) \right| < \frac{\varepsilon}{4} \]
For condition (2): each $x \in X$ lies in some $U_{t_i}$, so $\min_i f_{s,t_i}(x) < f(x) + \varepsilon/4$, whence $\tilde h_s(x) < f(x) + \varepsilon/2$. For condition (1): $f_{s,t_i}(s) = f(s)$ for each $i$, so $\min_i f_{s,t_i}(s) = f(s)$ and $|\tilde h_s(s) - f(s)| < \varepsilon/4 < \varepsilon/2$.

Claim 4.3.8. There is $g \in A$ such that for all $x \in X$, $|g(x) - f(x)| < \varepsilon$.

Proof. For each $s \in X$, by condition (1) and the continuity of $\tilde h_s - f$, there is an open neighborhood $V_s$ of $s$ such that for all $x \in V_s$,
\[ f(x) - \frac{3\varepsilon}{4} < \tilde h_s(x) \]
Once again, the set of all neighborhoods $V_s$ for each $s \in X$ forms an open cover of $X$, so by
the compactness of $X$, there are finitely many $s_1, \ldots, s_l$ such that $V_{s_1} \cup \cdots \cup V_{s_l} = X$. Let
$g \in A$ approximate $\max(\tilde h_{s_1}, \ldots, \tilde h_{s_l})$ to within $\varepsilon/4$, exactly as we approximated the min
above. Then for each $x \in X$, there is an $i \in \{1, \ldots, l\}$ such that $x \in V_{s_i}$, whence
\[ g(x) > \max\bigl(\tilde h_{s_1}(x), \ldots, \tilde h_{s_l}(x)\bigr) - \frac{\varepsilon}{4} \ge \tilde h_{s_i}(x) - \frac{\varepsilon}{4} > f(x) - \frac{3\varepsilon}{4} - \frac{\varepsilon}{4} = f(x) - \varepsilon \]
and, since each $\tilde h_{s_j}$ satisfies condition (2),
\[ g(x) < \max_j \tilde h_{s_j}(x) + \frac{\varepsilon}{4} < f(x) + \frac{\varepsilon}{2} + \frac{\varepsilon}{4} < f(x) + \varepsilon \]
These three claims together complete the proof. Note that we can replace the assumption
that A is unital with the weaker assumption that for all x X, there exists h A such that
h(x) 6= 0. One can then still prove the first claim with this weaker assumption. We also
used that A is unital to show that A is closed under compositions with polynomials p. But
this was only needed if p has a nonzero constant term; one can avoid this by showing that
the min and max functions can in fact be uniformly approximated by polynomials with a
zero constant term.
Some Error Bounds
Theorem 4.3.9 (Generalization of the Mean Value Theorem). Let $f, g : [a, b] \to \mathbb{R}$ be continuous
on $[a, b]$ and differentiable on $(a, b)$. Then there exists $c \in (a, b)$ such that
\[ f'(c)(g(b) - g(a)) = g'(c)(f(b) - f(a)) \]
Moreover, if $g'(x)$ is never 0 on $(a, b)$, then
\[ \frac{f'(c)}{g'(c)} = \frac{f(b) - f(a)}{g(b) - g(a)} \]
Proof. Let
\[ h(x) = (f(x) - f(a))(g(b) - g(a)) - (g(x) - g(a))(f(b) - f(a)) \]
Then $h$ is continuous on $[a, b]$, differentiable on $(a, b)$, and $h(a) = 0 = h(b)$, so by Rolle's
theorem, there exists $c \in (a, b)$ such that $h'(c) = 0$, i.e.
\[ h'(c) = f'(c)(g(b) - g(a)) - g'(c)(f(b) - f(a)) = 0 \]
proving our first claim. Now if $g'(x) \ne 0$ for all $x \in (a, b)$, then $g'(c) \ne 0$, and $g(b) - g(a) \ne 0$
by Rolle's Theorem, so
\[ \frac{f'(c)}{g'(c)} = \frac{f(b) - f(a)}{g(b) - g(a)} \]
As an aside, note that under the right hypotheses, L'Hôpital's rule is an easy consequence of
this theorem. See the third incarnation from Lecture 10 for the analogy.
Setup for the Taylor remainder theorems. Let $f$ be $n$-times continuously differentiable on $[a, x]$ and $(n+1)$-times differentiable on $(a, x)$. Define
\[ F(u) = f(u) + f'(u)(x-u) + \frac{f''(u)}{2!}(x-u)^2 + \cdots + \frac{f^{(n)}(u)}{n!}(x-u)^n \]
as a function of $u$ on the interval $[a, x]$. $F$ is continuous on $[a, x]$ and once differentiable
on $(a, x)$. Let $g : [a, x] \to \mathbb{R}$ be continuous on $[a, x]$ and differentiable on $(a, x)$. By the
generalization of the Mean Value Theorem, there exists $c \in (a, x)$ such that
\[ \frac{F'(c)}{g'(c)} = \frac{F(x) - F(a)}{g(x) - g(a)} \]
Note that differentiating $F$ as a function of $u$ gives a telescoping sum:
\[ F'(u) = f'(u) + \left[-f'(u) + f''(u)(x-u)\right] + \left[-f''(u)(x-u) + \frac{f^{(3)}(u)}{2!}(x-u)^2\right] + \cdots + \left[-\frac{f^{(n)}(u)}{(n-1)!}(x-u)^{n-1} + \frac{f^{(n+1)}(u)}{n!}(x-u)^n\right] \]
\[ = \frac{f^{(n+1)}(u)}{n!}(x-u)^n \]
Then note that $F(x) = f(x)$, while $F(a)$ is the Taylor expansion of $f$ around $a$ to the $n$th term, so
\[ F(x) - F(a) = f(x) - \sum_{k=0}^n \frac{f^{(k)}(a)}{k!}(x-a)^k \]
is the remainder of the $n$th-order Taylor expansion, denoted $R_n(x)$. By the generalized
Mean Value Theorem, there then exists $c \in (a, x)$ such that
\[ \frac{\frac{f^{(n+1)}(c)}{n!}(x-c)^n}{g'(c)} = \frac{f(x) - \sum_{k=0}^n \frac{f^{(k)}(a)}{k!}(x-a)^k}{g(x) - g(a)} \]
Theorem 4.3.10 (Taylor Expansion with Lagrange Remainder). Let $f$ be $n$-times continuously differentiable on $[a, x]$ and $(n+1)$-times differentiable on $(a, x)$. Then
\[ f(x) = \sum_{k=0}^n \frac{f^{(k)}(a)}{k!}(x-a)^k + R_n(x) \]
where, for some $c \in (a, x)$,
\[ R_n(x) = \frac{f^{(n+1)}(c)}{(n+1)!}(x-a)^{n+1} \]
Proof. In the setup above, take $g(u) = (x-u)^{n+1}$, so that $g(x) - g(a) = 0 - (x-a)^{n+1}$ and $g'(c) = -(n+1)(x-c)^n$. Then
\[ R_n(x) = \frac{\frac{f^{(n+1)}(c)}{n!}(x-c)^n}{-(n+1)(x-c)^n}\,\bigl(0 - (x-a)^{n+1}\bigr) = \frac{f^{(n+1)}(c)}{(n+1)!}(x-a)^{n+1} \]
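The Lagrange form gives a computable error bound. A short Python check (using $f = \exp$ at $a = 0$, our choice, for which every derivative is again $\exp$, so $|R_n(x)| \le e^x\,x^{n+1}/(n+1)!$ for $x > 0$):

```python
import math

# Partial Taylor sums of exp at 0, compared against the Lagrange bound.
def taylor_exp(x, n):
    return sum(x**k / math.factorial(k) for k in range(n + 1))

x = 1.5
for n in range(1, 10):
    remainder = abs(math.exp(x) - taylor_exp(x, n))
    bound = math.exp(x) * x**(n + 1) / math.factorial(n + 1)
    assert remainder <= bound      # the Lagrange estimate always holds
```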
Theorem 4.3.11 (Taylor Expansion with Cauchy Remainder). Let $f$ be $n$-times continuously differentiable on $[a, x]$ and $(n+1)$-times differentiable on $(a, x)$. Then
\[ f(x) = \sum_{k=0}^n \frac{f^{(k)}(a)}{k!}(x-a)^k + R_n(x) \]
where, for some $c \in (a, x)$,
\[ R_n(x) = \frac{f^{(n+1)}(c)}{n!}(x-c)^n(x-a) \]
(This follows from the same setup by taking $g(u) = u$, so that $g(x) - g(a) = x - a$ and $g'(c) = 1$.)
4.4 Lecture 16 – Power Series (II), Fubini's Theorem, and exp(x)

Today, we will discuss yet another remainder term for Taylor expansions, the integral form
of the Taylor remainder. We will also prove Fubini's theorem, and use this to justify the
multiplication of power series. We will use these tools to introduce the exponential function.
Theorem 4.4.1 (Taylor Expansion with Integral Remainder). Let $f$ be $(n+1)$-times differentiable on $[a, x]$ with $f^{(n+1)}$ Riemann integrable on $[a, x]$. Then
\[ f(x) = \sum_{k=0}^n \frac{f^{(k)}(a)}{k!}(x-a)^k + R_n(x) \]
where
\[ R_n(x) = \int_a^x \frac{f^{(n+1)}(t)}{n!}(x-t)^n\,dt \]
Proof. We induct on $n$. The base case $n = 0$ states
\[ f(x) = f(a) + \int_a^x f'(t)\,dt \]
which is true by the 1st fundamental theorem of calculus. Assume the theorem is true for
$n \ge 0$. Let $f$ be $(n+2)$-times differentiable, with $f^{(n+2)}$ Riemann integrable. Then in
particular, $f^{(n+1)}$ is continuous, hence Riemann integrable. By the theorem for $n$ then:
\[ f(x) = \sum_{k=0}^n \frac{f^{(k)}(a)}{k!}(x-a)^k + \int_a^x \frac{f^{(n+1)}(t)}{n!}(x-t)^n\,dt \]
Note (when differentiating with respect to $t$)
\[ (x-t)^n = \left(-\frac{(x-t)^{n+1}}{n+1}\right)' \]
so integrating by parts,
\[ \int_a^x \frac{f^{(n+1)}(t)}{n!}(x-t)^n\,dt = -\frac{f^{(n+1)}(t)}{n!}\,\frac{(x-t)^{n+1}}{n+1}\bigg|_a^x + \int_a^x \frac{f^{(n+2)}(t)}{n!}\,\frac{(x-t)^{n+1}}{n+1}\,dt \]
\[ = \frac{f^{(n+1)}(a)}{n!}\,\frac{(x-a)^{n+1}}{n+1} + \int_a^x \frac{f^{(n+2)}(t)}{(n+1)!}(x-t)^{n+1}\,dt \]
So
\[ f(x) = \sum_{k=0}^n \frac{f^{(k)}(a)}{k!}(x-a)^k + \frac{f^{(n+1)}(a)(x-a)^{n+1}}{(n+1)!} + \int_a^x \frac{f^{(n+2)}(t)}{(n+1)!}(x-t)^{n+1}\,dt \]
\[ = \sum_{k=0}^{n+1} \frac{f^{(k)}(a)}{k!}(x-a)^k + \int_a^x \frac{f^{(n+2)}(t)}{(n+1)!}(x-t)^{n+1}\,dt \]
completing the induction.
Proposition 4.4.2 (Newton's Method). Let $f$ be twice continuously differentiable with $f(x^*) = 0$ and $f'(x^*) \ne 0$, and let $x_{n+1} = F(x_n)$ where $F(x) = x - f(x)/f'(x)$. Show that if $x_0$ is close enough to $x^*$, then there is a constant $C$ such that for all $n \ge 1$,
\[ |x_n - x^*| \le C|x_{n-1} - x^*|^2 \]
Solution. Let $M$ bound $|f''|$. Since $f'(x^*) \ne 0$, we can fix $\delta > 0$ and $L > 0$ using the continuity
of $f'$ such that
\[ |x - x^*| < \delta \implies |f'(x)| > L \]
Now for $x \in B(x^*, \delta)$ we have (since $f(x^*) = 0$)
\[ F(x) - x^* = x - \frac{f(x)}{f'(x)} - x^* = (x - x^*) - \frac{f(x) - f(x^*)}{f'(x)} \]
By the Taylor expansion of $f$ around $x$, we have
\[ f(x^*) = f(x) + f'(x)(x^* - x) + R_2(x^*) \]
So
\[ f(x^*) - f(x) = f'(x)(x^* - x) + R_2(x^*) \]
Plug this in above to get
\[ F(x) - x^* = (x - x^*) + \frac{f'(x)(x^* - x) + R_2(x^*)}{f'(x)} = \frac{R_2(x^*)}{f'(x)} \]
Using the Lagrange term for the remainder,
\[ R_2(x^*) = \frac{f''(c)(x^* - x)^2}{2} \]
for some $c$ between $x$ and $x^*$, so
\[ |F(x) - x^*| = \left| \frac{f''(c)(x^* - x)^2}{2f'(x)} \right| \le \frac{M}{2L}|x - x^*|^2 \]
Taking $C = M/2L$ completes the proof, with one subtlety: we need to make sure $F(x) \in (x^* - \delta, x^* + \delta)$ in order for the argument to work for all $x_n$. To do so, take $x_0$ such that
$|x_0 - x^*| < \min(\delta, 2L/M)$.
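The quadratic error bound is visible numerically after only a few iterations. A minimal Python sketch (the example $f(x) = x^2 - 2$, root $x^* = \sqrt{2}$, is ours; for it $M = 2$ and $L \approx 2\sqrt{2}$ near the root, so $C = M/2L \approx 0.35$):

```python
# Newton's method for f(x) = x**2 - 2; errors should satisfy
# |x_n - x*| <= C |x_{n-1} - x*|**2 (quadratic convergence).
f = lambda x: x * x - 2.0
df = lambda x: 2.0 * x
root = 2.0 ** 0.5

x = 1.0                          # x_0, close enough to x*
errors = []
for _ in range(5):
    x = x - f(x) / df(x)         # F(x) = x - f(x)/f'(x)
    errors.append(abs(x - root))

for prev, cur in zip(errors, errors[1:]):
    if prev > 1e-6:              # skip pairs at the limit of float precision
        assert cur <= 0.6 * prev**2
```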
Proposition 4.4.3 (W06.3). Let $f : [a, b] \to \mathbb{R}$ be twice continuously differentiable. Find a
good error bound for the trapezoid approximation of $\int_a^b f\,dx$, where the trapezoid approximation
for $n = 1$ is:
\[ \frac{f(b) + f(a)}{2}(b - a) \]
Solution. The trapezoid approximation is given by
\[ \int_a^b l(x)\,dx \]
where
\[ l(x) = f(a) + (f(b) - f(a))\frac{x - a}{b - a} \]
In Lecture 10, we got the following error bound on $f(x) - l(x)$ using the higher-order
Rolle's theorem:
\[ f(x) - l(x) = -\frac{(x-a)(b-x)}{2}f''(c) \]
for some $c \in (a, b)$. Since $f''$ is continuous on $[a, b]$, let $M$ be a bound for $|f''|$. Then the
error is bounded as follows:
\[ \left| \int_a^b f(x) - l(x)\,dx \right| \le \int_a^b \frac{(x-a)(b-x)}{2}M\,dx \le \int_a^b \frac{b-a}{2}\cdot\frac{b-a}{2}\cdot\frac{M}{2}\,dx = M\frac{(b-a)^3}{8} \]
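A quick Python check of the bound (the test function $f = \sin$ on $[0, 0.5]$, with $M = 1$, is our choice; note $(b-a)^3/8$ is the crude bound derived above, while the sharp trapezoid constant is $(b-a)^3/12$):

```python
import math

# Single-interval trapezoid rule vs. the error bound M (b-a)**3 / 8.
def trapezoid(f, a, b):
    return (f(a) + f(b)) / 2.0 * (b - a)

f = math.sin                           # |f''| = |sin| <= M = 1
a, b = 0.0, 0.5
exact = math.cos(a) - math.cos(b)      # integral of sin over [a, b]
err = abs(trapezoid(f, a, b) - exact)
assert err <= 1.0 * (b - a)**3 / 8
```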
Theorem (Fubini's Theorem for Sums). If
\[ \sum_{n=1}^\infty \sum_{m=1}^\infty |a_{n,m}| < \infty \]
then
\[ \sum_{n=1}^\infty \sum_{m=1}^\infty a_{n,m} \quad \text{and} \quad \sum_{m=1}^\infty \sum_{n=1}^\infty a_{n,m} \]
both converge, and
\[ \sum_{n=1}^\infty \sum_{m=1}^\infty a_{n,m} = \sum_{m=1}^\infty \sum_{n=1}^\infty a_{n,m} \]
Proof. Let
\[ s_n = \sum_{m=1}^\infty |a_{n,m}| \]
By assumption,
\[ \sum_{n=1}^\infty s_n < \infty \]
so given $\varepsilon > 0$ we can find $N$ such that
\[ \sum_{n=N+1}^\infty s_n < \frac{\varepsilon}{2} \]
Moreover, for each $n \le N$,
\[ \sum_{m=1}^\infty |a_{n,m}| < \infty \]
Hence, we can find $M > N$ such that for all $n \le N$ and for all $l > M$,
\[ \sum_{m=M+1}^l |a_{n,m}| < \frac{\varepsilon}{2N} \]
Then for $k, l > M$,
\[ \left| \sum_{n=1}^k \sum_{m=1}^l a_{n,m} - \sum_{n=1}^N \sum_{m=1}^M a_{n,m} \right| \le \sum_{n=N+1}^k \sum_{m=1}^l |a_{n,m}| + \sum_{n=1}^N \sum_{m=M+1}^l |a_{n,m}| < \frac{\varepsilon}{2} + N\,\frac{\varepsilon}{2N} = \varepsilon \]
Using a similar calculation, one can show the same estimate with the order of summation
reversed. Hence, for large enough $M$ and all $l, k > M$, both
\[ \sum_{n=1}^l \sum_{m=1}^k a_{n,m} \quad \text{and} \quad \sum_{m=1}^l \sum_{n=1}^k a_{n,m} \]
lie within $\varepsilon$ of $\sum_{n=1}^N \sum_{m=1}^M a_{n,m}$, so both iterated sums converge to
\[ \lim_{M\to\infty} \sum_{n=1}^M \sum_{m=1}^M a_{n,m} \]
and in particular they are equal.
Theorem 4.4.6. Suppose $f$ and $g$ are real analytic at $a$, equal on $(a-r, a+r)$ to
\[ \sum_{n=0}^\infty c_n(x-a)^n \quad \text{and} \quad \sum_{n=0}^\infty d_n(x-a)^n \]
respectively. Then $fg$ is real analytic at $a$ with radius of convergence $\ge r$, with coefficients
given by the convolution of $(c_n)$ and $(d_n)$: $e_k = \sum_{n=0}^k c_n d_{k-n}$.
Proof. For $|x - a| < r$, note
\[ f(x)g(x) = \left(\sum_{n=0}^\infty c_n(x-a)^n\right)\left(\sum_{m=0}^\infty d_m(x-a)^m\right) = \sum_{n=0}^\infty \sum_{m=0}^\infty c_n d_m (x-a)^{n+m} \]
Reindexing with $k = n + m$,
\[ = \sum_{n=0}^\infty \sum_{k=n}^\infty c_n d_{k-n}(x-a)^k = \sum_{k=0}^\infty \sum_{n=0}^k c_n d_{k-n}(x-a)^k = \sum_{k=0}^\infty \left(\sum_{n=0}^k c_n d_{k-n}\right)(x-a)^k = \sum_{k=0}^\infty e_k(x-a)^k \]
where interchanging the order of the sums is justified by Fubini's theorem, since both series
converge absolutely for $|x - a| < r$.
Corollary 4.4.7. More generally, if $\sum c_n x^n$ and $\sum d_n y^n$ converge absolutely at $x, y \in \mathbb{R}$, then
\[ \sum_{n=0}^\infty \left(\sum_{j=0}^n c_j d_{n-j}\,x^j y^{n-j}\right) \]
converges, to
\[ \left(\sum_{n=0}^\infty c_n x^n\right)\left(\sum_{n=0}^\infty d_n y^n\right) \]
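The convolution formula is easy to verify numerically. Below is a short Python sketch (the choice $c_n = d_n = 1/n!$, for which the convolution gives $e_n = \sum_j \frac{1}{j!(n-j)!} = \frac{2^n}{n!}$, i.e. the series of $\exp(2x)$, is ours):

```python
import math

# Multiply two power series by convolving their coefficient sequences.
def convolution(c, d, n):
    return sum(c[j] * d[n - j] for j in range(n + 1))

N = 30
c = [1.0 / math.factorial(n) for n in range(N)]   # coefficients of exp
e = [convolution(c, c, n) for n in range(N)]      # should equal 2**n / n!

x = 0.7
product = sum(e[n] * x**n for n in range(N))
assert abs(product - math.exp(2 * x)) < 1e-9      # exp(x)*exp(x) = exp(2x)
```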
Definition. Define
\[ \exp(x) = \sum_{k=0}^\infty \frac{x^k}{k!} \]
The function $\exp$ has the following properties:
(1) Since
\[ \limsup_{k\to\infty} \left(\frac{1}{k!}\right)^{1/k} = 0 \]
the radius of convergence is infinite, so $\exp$ is defined on all of $\mathbb{R}$.
(2) By our earlier theorem, $\exp(x)$ is differentiable, and its derivative is given by term-by-term differentiation of the series. One can easily show by induction that the coefficients
of $\exp'(x)$ are exactly those of $\exp(x)$, whence $\exp' = \exp$.
(3) Since $\exp$ is differentiable, it is clearly continuous. One can show that
\[ \int_a^b \exp(x)\,dx = \exp(b) - \exp(a) \]
(4) By the multiplication of power series (Corollary 4.4.7),
\[ \exp(x)\exp(y) = \left(\sum_{k=0}^\infty \frac{x^k}{k!}\right)\left(\sum_{k=0}^\infty \frac{y^k}{k!}\right) = \sum_{n=0}^\infty \sum_{j=0}^n \frac{x^j}{j!}\,\frac{y^{n-j}}{(n-j)!} = \sum_{n=0}^\infty \frac{1}{n!} \sum_{j=0}^n \frac{n!}{j!(n-j)!}\,x^j y^{n-j} = \sum_{n=0}^\infty \frac{1}{n!}(x+y)^n = \exp(x+y) \]
(5) In particular, $\exp(x)\exp(-x) = \exp(0) = 1$, so
\[ \exp(-x) = \frac{1}{\exp(x)} \]
(6) It is straightforward to show that $\exp(x)$ is strictly positive, so (2) shows that $\exp'(x)$ is
strictly positive; hence $\exp$ is strictly monotone increasing.
Definition 4.4.10. Define
\[ e = \exp(1) = \sum_{n=0}^\infty \frac{1}{n!} \]
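The series for $e$ converges extremely fast (the tail after $n$ terms is below $2/(n+1)!$), which a short Python check makes concrete:

```python
import math

# Partial sums of e = sum 1/n!; a handful of terms already gives
# double-precision accuracy.
def e_partial(n):
    return sum(1.0 / math.factorial(k) for k in range(n + 1))

print(e_partial(5), e_partial(17), math.e)
```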
Week 5
As per the syllabus, Week 5 topics include: the Fubini theorem for sequences, multiplication
of power series, the exponential and logarithm, sine and cosine, uniform approximation of
periodic functions by trigonometric polynomials, multivariable differentiation, the chain
rule, partial derivatives, directional derivatives, differentiability of functions with continuous
partial derivatives, the inverse function theorem, the implicit function theorem, Lagrange multipliers,
integrals in several variables, change of variables, differentiation under the integral sign,
integration over products of spaces and double integrals, Clairaut's theorem on equality of
mixed partial derivatives, local minima, maxima, and saddle points in two variables, Taylor's
formula with remainder for functions of several variables, the connection to Newton's method in
several variables, line integrals, Green's theorem, the divergence theorem, and Stokes' theorem in $\mathbb{R}^3$.
5.1 Lecture 17 – Some Special Functions and Differentiation in Several Variables
Today, we will introduce the natural logarithm and the trigonometric functions sin and cos.
We will prove several key results about these functions, and will introduce basic Fourier
analysis. We will also discuss differentiation in several variables, and will prove the chain
rule. We will also introduce directional derivatives.
Definition 5.1.1. $\ln : (0, \infty) \to \mathbb{R}$ is the inverse of $\exp$. Note that $\ln$ exists and is continuous
since $\exp$ is strictly monotone increasing, continuous, and onto $(0, \infty)$.
Proposition 5.1.2. $\ln'(y) = 1/y$.
Proof. By our earlier theorem on derivatives of inverses (see Lecture 9, Prop 3.1.8), $\ln$ is
differentiable, and
\[ \ln'(y) = \frac{1}{\exp'(x)} \]
where $y = \exp(x)$, so
\[ \ln'(y) = \frac{1}{\exp(x)} = \frac{1}{y} \]
Proposition 5.1.3. For $|x| < 1$,
\[ \ln\left(\frac{1}{1-x}\right) = \sum_{n=1}^\infty \frac{x^n}{n} \]
Proof. The geometric series
\[ f(t) = \sum_{n=0}^\infty t^n \]
converges absolutely to
\[ \frac{1}{1-t} \]
for all $|t| < 1$. By our earlier theorem on integration of power series, we have
\[ \int_0^x \frac{1}{1-t}\,dt = \sum_{k=0}^\infty \int_0^x t^k\,dt = \sum_{k=0}^\infty \frac{x^{k+1}}{k+1} = \sum_{n=1}^\infty \frac{x^n}{n} \]
and the left-hand side equals $-\ln(1-x) = \ln\left(\frac{1}{1-x}\right)$. In particular, at $x = 1/2$,
\[ \sum_{n=1}^\infty \frac{1}{n\,2^n} = \ln(2) \]
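A quick Python check of the $\ln 2$ series (convergence here is geometric, so few terms are needed):

```python
import math

# The series ln(1/(1-x)) = sum x**n / n, evaluated at x = 1/2,
# gives sum 1/(n * 2**n) = ln 2.
def partial(n):
    return sum(1.0 / (k * 2.0**k) for k in range(1, n + 1))

print(partial(10), partial(50), math.log(2))
```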
Proposition 5.1.12. $\sin$ and $\cos$ are differentiable, and $\sin' = \cos$, $\cos' = -\sin$.
Proof. We give the proof for only $\sin$. Let $u$, $h$, and $\theta$ be as in the previous proof, so that
\[ \sin(\theta + h) - \sin(\theta) = u\cos\left(\theta + \frac{h}{2}\right) \]
So
\[ \frac{\sin(\theta + h) - \sin(\theta)}{h} = \frac{u}{h}\cos\left(\theta + \frac{h}{2}\right) \]
As $h \to 0$, $u/h \to 1$, and by the continuity of $\cos$, $\cos(\theta + h/2) \to \cos(\theta)$, so
\[ \frac{\sin(\theta + h) - \sin(\theta)}{h} \to \cos(\theta) \]
(Recall that the Taylor remainders for these series are bounded by $\frac{|x|^{n+1}}{(n+1)!}$.)
Definition 5.1.16. The trigonometric polynomials are the functions obtained as linear
combinations of $\sin(nx)$, $\cos(nx)$ for $n = 0, 1, 2, \ldots$. Note all trigonometric polynomials
are periodic with period $2\pi$. We can then view them as functions on the interval $[0, 2\pi]$
where we identify 0 and $2\pi$, or equivalently, as functions on the unit circle (denoted $\mathbb{R}/2\pi\mathbb{Z}$).

Proposition 5.1.17. The trigonometric polynomials form an algebra on $\mathbb{R}/2\pi\mathbb{Z}$.
Proof. By definition, the trigonometric polynomials are closed under addition and scalar
multiplication, so we only need to show closure under products. It is sufficient to show that
for all $m, n \in \mathbb{N}$,
\[ \sin(nx)\cos(mx), \quad \sin(nx)\sin(mx), \quad \cos(nx)\cos(mx) \]
are trigonometric polynomials. Using the following trigonometric identities,
\[ \sin A \sin B = \frac{1}{2}[\cos(A - B) - \cos(A + B)] \]
\[ \cos A \cos B = \frac{1}{2}[\cos(A - B) + \cos(A + B)] \]
\[ \sin A \cos B = \frac{1}{2}[\sin(A - B) + \sin(A + B)] \]
we note
\[ \sin(nx)\sin(mx) = \frac{1}{2}[\cos((n - m)x) - \cos((n + m)x)] \]
\[ \cos(nx)\cos(mx) = \frac{1}{2}[\cos((n - m)x) + \cos((n + m)x)] \]
\[ \sin(nx)\cos(mx) = \frac{1}{2}[\sin((n - m)x) + \sin((n + m)x)] \]
which completes the proof.
Proposition 5.1.18. The algebra of trigonometric polynomials is unital and separates points.
Proof. The constant function 1 is $\cos(0x)$, hence is a trigonometric polynomial. One can
check that for any $x \ne y \in [0, 2\pi)$, either $\sin x \ne \sin y$ or $\cos x \ne \cos y$.
Corollary 5.1.19. Any continuous function on $\mathbb{R}/2\pi\mathbb{Z}$ (or equivalently, any continuous
function on $\mathbb{R}$ with period $2\pi$) is the uniform limit of trigonometric polynomials.
Proof. Immediate by Stone–Weierstrass.
Note that Tao's proof of Stone–Weierstrass also gives formulas for the coefficients of the
approximating trigonometric polynomials, using convolutions. The coefficients obtained are
the start of Fourier analysis.
Multivariable Calculus (Differentiation and Integration)
5.2 Lecture 18 – Inverse Function Theorem, Implicit Function Theorem and Lagrange Multipliers
Today, we will prove two essential results of multivariable calculus, namely the Inverse Function Theorem and the Implicit Function Theorem. We will then use the Implicit Function
Theorem to rigorously prove the Lagrange multipliers method for finding extrema constrained
to a particular surface.
5.3 Lecture 19 – Multivariable Integration and Vector Calculus
Today, we will introduce Riemann integration in several variables, and we will prove Fubini's
theorem, which allows you to interchange the order of integration under the appropriate
circumstances. We will also prove the standard results of vector calculus, namely Green's
Theorem, Stokes' Theorem, and the Divergence Theorem.