Michael Damron
Princeton University
Contents
1 Fundamentals
1.1 Sets
1.2 Relations and functions
1.3 Cardinality
1.4 Natural numbers and induction
1.5 Cardinality and the natural numbers
1.6 Exercises
3 Metric spaces
3.1 Definitions
3.2 Open and closed sets
3.3 Limit points
3.4 Compactness
3.5 Heine-Borel Theorem: compactness in $\mathbb{R}^n$
3.6 The Cantor set
3.7 Exercises
4 Sequences
4.1 Definitions
4.2 Subsequences, Cauchy sequences and completeness
4.3 Special sequences
4.4 Exercises
5 Series
5.1 Definitions
5.2 Ratio and root tests
5.3 Non non-negative series
5.4 Exercises
7 Derivatives
7.1 Introduction
7.2 Properties
7.3 Mean value theorem
7.4 L'Hôpital's rule
7.5 Power series
7.6 Taylor's theorem
7.7 Exercises
8 Integration
8.1 Definitions
8.2 Properties of integration
8.3 Fundamental theorems
8.4 Change of variables, integration by parts
8.5 Exercises
1 Fundamentals
1.1 Sets
We begin with the concepts of set, object and set membership. We will leave these as primitive in a sense; that is, undefined. You can think of a set as a collection of objects, and if $a$ is an object and $A$ is a set then $a \in A$ means $a$ is a member of $A$. If $A$ and $B$ are sets, we say that $A$ is a subset of $B$ (written $A \subset B$) if whenever $a \in A$ we have $a \in B$. If $A \subset B$ and $B \subset A$ we say the sets are equal and we write $A = B$. $A$ is a proper subset of $B$ if $A \subset B$ but $A \neq B$. Note that $\emptyset$, the set with no elements, is a subset of every set.
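There are many operations we can perform with sets.
$$A \cup B = \{a : a \in A \text{ or } a \in B\} .$$
$$A \cap B = \{a : a \in A \text{ and } a \in B\} .$$
These operations satisfy the distributive laws
$$A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$$
$$A \cup (B \cap C) = (A \cup B) \cap (A \cup C) .$$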
Let us give a proof of the first. To show these sets are equal, we must show each is contained in the other. So let $a \in A \cap (B \cup C)$. We would like to show that $a \in (A \cap B) \cup (A \cap C)$. We know $a \in A$ and $a \in B \cup C$. One possibility is that $a \in A$ and $a \in B$, in which case $a \in A \cap B$, giving $a \in (A \cap B) \cup (A \cap C)$. The only other possibility is that $a \in A$ and $a \in C$, since $a$ must be in either $B$ or $C$. Then $a \in A \cap C$ and the same conclusion holds. The other direction is an exercise.
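If $A$ and $B$ are sets then define the difference $A \setminus B$ as
$$A \setminus B = \{a : a \in A \text{ but } a \notin B\} .$$
One can verify the following as well.
$$A \setminus (B \cup C) = (A \setminus B) \cap (A \setminus C)$$
$$A \setminus (B \cap C) = (A \setminus B) \cup (A \setminus C) .$$
Finally the symmetric difference is
$$A \Delta B = (A \setminus B) \cup (B \setminus A) .$$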
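The identities of this section can be spot-checked by brute force on a small universe. The following is a minimal sketch, not a proof: it uses Python's built-in set operators, and the helper names `subsets` and `check_set_identities` are illustrative choices that do not appear in the notes.

```python
from itertools import combinations

def subsets(universe):
    """Return every subset of `universe` as a frozenset."""
    items = list(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

def check_set_identities(universe):
    """Check the distributive, difference and symmetric-difference
    identities for all triples of subsets of `universe`."""
    all_subsets = subsets(universe)
    for A in all_subsets:
        for B in all_subsets:
            # Symmetric difference: A delta B = (A \ B) union (B \ A)
            if A ^ B != (A - B) | (B - A):
                return False
            for C in all_subsets:
                # Distributive laws
                if A & (B | C) != (A & B) | (A & C):
                    return False
                if A | (B & C) != (A | B) & (A | C):
                    return False
                # Difference identities
                if A - (B | C) != (A - B) & (A - C):
                    return False
                if A - (B & C) != (A - B) | (A - C):
                    return False
    return True

identities_hold = check_set_identities({1, 2, 3})
```

A check on one finite universe only illustrates the identities; the element-chasing argument given above for the first distributive law is what proves them in general.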
The picture is that the equivalence classes of $R$ partition $A$.
Definition 1.2.4. A partition of $A$ is a collection $\mathcal{P}$ of subsets of $A$ such that
1. $A = \bigcup_{S \in \mathcal{P}} S$ and
2. $S_1 \cap S_2 = \emptyset$ whenever $S_1$ and $S_2$ in $\mathcal{P}$ are not equal.
Using this definition, we can say that if $R$ is an equivalence relation on a set $A$ then the collection
$$\mathcal{C}_R = \{[a]_R : a \in A\}$$
of equivalence classes forms a partition of $A$.
Just a note to conclude. If we have an equivalence relation $R$ on a set $A$, it is standard notation to write
$$A/R = \{[a]_R : a \in A\}$$
for the set of equivalence classes of $A$ under $R$. This is known as taking the quotient by an equivalence relation. At times the relation $R$ is written in an implied manner using a symbol like $\sim$. For instance, $(a, b) \in R$ would be written $a \sim b$. In this case, the quotient is $A/\sim$.
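To make the quotient concrete, here is a small sketch that computes the equivalence classes of a finite set and checks the two conditions of Definition 1.2.4. The example relation (equal remainder mod 3) and the function names are assumptions chosen for illustration; the code also assumes that `related` really is an equivalence relation.

```python
def equivalence_classes(A, related):
    """Group the elements of A into classes, assuming `related` is an
    equivalence relation (so comparing with one representative suffices)."""
    classes = []
    for a in A:
        for cls in classes:
            representative = next(iter(cls))
            if related(a, representative):
                cls.add(a)
                break
        else:
            classes.append({a})
    return classes

def is_partition(A, classes):
    """Check conditions 1 and 2 of the definition of a partition."""
    covers = set().union(*classes) == set(A)
    disjoint = all(c1.isdisjoint(c2)
                   for i, c1 in enumerate(classes)
                   for c2 in classes[i + 1:])
    return covers and disjoint

A = range(10)
same_remainder = lambda a, b: a % 3 == b % 3
quotient = equivalence_classes(A, same_remainder)
```

Here `quotient` plays the role of the set of equivalence classes, and `is_partition(A, quotient)` checks that they cover $A$ and are pairwise disjoint.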
We will spend much of the course talking about functions, which are special kinds of
relations.
Definition 1.2.5. Let $A$ and $B$ be sets and $f$ a relation between $A$ and $B$. We say that $f$ is a (well-defined) function from $A$ to $B$, written $f : A \to B$, if the following hold.
1. For each $a \in A$, there is at least one $b \in B$ such that $(a, b) \in f$.
2. For each $a \in A$, there is at most one $b \in B$ such that $(a, b) \in f$. That is, if we ever have $(a, b_1) \in f$ and $(a, b_2) \in f$ for $b_1, b_2 \in B$, it follows that $b_1 = b_2$.
The set $A$ is called the domain of $f$ and the set $B$ is called the codomain of $f$.
Of course we will not continue to use this notation for a function, but the more familiar notation: if $(a, b) \in f$ then because of item 2 above, we can unambiguously write $f(a) = b$.
We will be interested in certain types of functions.
Definition 1.2.6. The function $f : A \to B$ is called one-to-one (injective) if whenever $a_1 \neq a_2$ then $f(a_1) \neq f(a_2)$. It is called onto (surjective) if for each $b \in B$ there exists $a \in A$ such that $f(a) = b$.
Another way to define onto is to first define the range of a function $f : A \to B$ by
$$f(A) = \{f(a) : a \in A\}$$
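Formally speaking we define $g \circ f \subset A \times C$ by
$$(a, c) \in g \circ f \text{ if } (a, b) \in f \text{ and } (b, c) \in g \text{ for some } b \in B .$$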
Proof. We start with the first statement. Suppose that $f$ and $g$ are one-to-one; we will show that $g \circ f$ must be one-to-one. Suppose then that $a$ and $a'$ in $A$ are such that $(g \circ f)(a) = (g \circ f)(a')$. Then by definition, $g(f(a)) = g(f(a'))$. But $g$ is one-to-one, so $f(a) = f(a')$. Now since $f$ is one-to-one, we find $a = a'$. This shows that if $(g \circ f)(a) = (g \circ f)(a')$ then $a = a'$, proving $g \circ f$ is one-to-one.
Suppose then that $f$ and $g$ are onto. To show that $g \circ f$ is onto we must show that for each $c \in C$ there exists $a \in A$ such that $(g \circ f)(a) = c$. This is the same statement as $g(f(a)) = c$. We know that $g$ is onto, so there exists $b \in B$ such that $g(b) = c$. Furthermore, $f$ is onto, so for this specific $b$, there exists $a \in A$ such that $f(a) = b$. Putting these together, $g(f(a)) = g(b) = c$, so $g \circ f$ is onto.
if and only if $f$ is a bijection. The meaning of the above equations is $f^{-1}(f(a)) = a$ and $f(f^{-1}(b)) = b$ for all $a \in A$ and $b \in B$.
$$f^{-1} = \{(b, a) : (a, b) \in f\} .$$
This is clearly a relation. We claim it is a function. To show this we must prove that
for all $b \in B$ there exists at least one $a \in A$ such that $(b, a) \in f^{-1}$ and
for all $b \in B$ there exists at most one $a \in A$ such that $(b, a) \in f^{-1}$.
Restated, these are
for all $b \in B$ there exists $a \in A$ such that $f(a) = b$ and
for all $b \in B$ there exists at most one $a \in A$ such that $f(a) = b$.
These are exactly the conditions that $f$ be a bijection, so $f^{-1}$ is a function.
Now we must show that $f^{-1} \circ f = \mathrm{id}_A$ and $f \circ f^{-1} = \mathrm{id}_B$. We show only the first; the second is an exercise. For each $a \in A$, there is a $b \in B$ such that $f(a) = b$. By definition of $f^{-1}$, we then have $(b, a) \in f^{-1}$; that is, $f^{-1}(b) = a$. Therefore $(a, b) \in f$ and $(b, a) \in f^{-1}$, giving $(a, a) \in f^{-1} \circ f$, or
$$(f^{-1} \circ f)(a) = a = \mathrm{id}_A(a) .$$
We have now shown that if $f$ is a bijection then there is a function $f^{-1}$ that satisfies (1).
For the other direction, suppose that $f : A \to B$ is a function and $g : B \to A$ is a function such that
$$g \circ f = \mathrm{id}_A \text{ and } f \circ g = \mathrm{id}_B .$$
We must show then that $f$ is a bijection. To show one-to-one, suppose that $f(a_1) = f(a_2)$. Then $a_1 = \mathrm{id}_A(a_1) = g(f(a_1)) = g(f(a_2)) = \mathrm{id}_A(a_2) = a_2$, giving that $f$ is one-to-one. To show onto, let $b \in B$; we claim that $f$ maps the element $g(b)$ to $b$. To see this, compute $b = \mathrm{id}_B(b) = f(g(b))$. This shows that $f$ is onto and completes the proof.
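The relation-based view of functions used in this proof is easy to experiment with on finite sets. The sketch below represents a function as a set of ordered pairs, forms the inverse relation by swapping each pair, and checks when that inverse is again a function. The sets `A`, `B` and the relations `f`, `g` are made-up examples, and the helper names are illustrative.

```python
def inverse_relation(f):
    """Swap every pair: {(b, a) : (a, b) in f}."""
    return {(b, a) for (a, b) in f}

def is_function(rel, domain):
    """A relation is a function on `domain` when every element of the
    domain appears as a first coordinate exactly once."""
    return all(sum(1 for (x, _) in rel if x == a) == 1 for a in domain)

def compose(g, f):
    """(a, c) is in g o f if (a, b) in f and (b, c) in g for some b."""
    return {(a, c) for (a, b) in f for (b2, c) in g if b == b2}

A = {1, 2, 3}
B = {"x", "y", "z"}
f = {(1, "x"), (2, "y"), (3, "z")}   # a bijection from A to B
g = {(1, "x"), (2, "x"), (3, "y")}   # neither one-to-one nor onto

f_inv = inverse_relation(f)
identity_on_A = {(a, a) for a in A}
identity_on_B = {(b, b) for b in B}
```

For the bijection `f`, the swapped relation `f_inv` is a function and composes with `f` to the identity on either side. For `g`, the swapped relation fails both function conditions: `"x"` has two partners and `"z"` has none.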
Here are some more facts about inverses and injectivity/surjectivity.
If $f : A \to B$ is a bijection then so is $f^{-1} : B \to A$.
If $f : A \to B$ and $g : B \to C$ are bijections then so is $g \circ f$.
The identity map $\mathrm{id}_A : A \to A$ is a bijection.
If a function $f : A \to B$ is not a bijection then there is no inverse function $f^{-1} : B \to A$. However we can in all cases consider the inverse image.
Definition 1.2.10. Given $f : A \to B$ and $C \subset B$ we define the inverse image of $C$ as
$$f^{-1}(C) = \{a \in A : f(a) \in C\} .$$
Note that if we let $C$ be a singleton set $\{b\}$ for some $b \in B$ then we retrieve all elements $a \in A$ mapped to $b$:
$$f^{-1}(\{b\}) = \{a \in A : f(a) = b\} .$$
In the case that $f$ is invertible, this just gives the singleton set consisting of the point $f^{-1}(b)$. We note the following properties of inverse images (proved in the homework). For $f : A \to B$ and $C_1, C_2 \subset B$,
$$f^{-1}(C_1 \cap C_2) = f^{-1}(C_1) \cap f^{-1}(C_2) .$$
$$f^{-1}(C_1 \cup C_2) = f^{-1}(C_1) \cup f^{-1}(C_2) .$$
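As an illustration of Definition 1.2.10, the following sketch computes inverse images for a function that is not a bijection. The choice $f(a) = a^2$ on the integers from $-3$ to $3$ and the sets `C1`, `C2` are assumptions made only for this example.

```python
def inverse_image(f, domain, C):
    """Return {a in domain : f(a) in C}."""
    return {a for a in domain if f(a) in C}

def square(a):
    return a * a

domain = set(range(-3, 4))
C1 = {0, 1, 4}
C2 = {4, 9}

preimage_of_one = inverse_image(square, domain, {1})
preimage_of_intersection = inverse_image(square, domain, C1 & C2)
preimage_of_union = inverse_image(square, domain, C1 | C2)
```

Since `square` is not one-to-one, the inverse image of the singleton `{1}` contains two points, and the two identities for intersections and unions can be compared directly with `inverse_image(square, domain, C1)` and `inverse_image(square, domain, C2)`.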
1.3 Cardinality
The results of the previous section allow us to define an equivalence relation on sets:
Definition 1.3.1. If $A$ and $B$ are sets, we say that $A$ and $B$ are equivalent ($A \simeq B$, or $A$ and $B$ have the same cardinality) if there exists a bijection $f : A \to B$. The cardinality of a set $A$ (written $\sharp(A)$) is defined as the equivalence class of $A$ under this relation. That is
$$\sharp(A) = \{B : A \simeq B\} .$$
Definition 1.3.2. If $A$ and $B$ are sets then we write $\sharp(A) \leq \sharp(B)$ if there exists a one-to-one function $f : A \to B$. Write $\sharp(A) < \sharp(B)$ if $\sharp(A) \leq \sharp(B)$ but $\sharp(A) \neq \sharp(B)$.
2. (transitivity) For all sets $A, B, C$, if $\sharp(A) \leq \sharp(B)$ and $\sharp(B) \leq \sharp(C)$ then $\sharp(A) \leq \sharp(C)$.
3. (antisymmetry) For all sets $A$ and $B$, if $\sharp(A) \leq \sharp(B)$ and $\sharp(B) \leq \sharp(A)$ then $\sharp(A) = \sharp(B)$.
Any relation on a set that satisfies these properties is called a partial order. For cardinality, establishment of antisymmetry is done by the Cantor-Bernstein theorem, which we will skip.
Theorem 1.3.3 (Cantor's Theorem). For any set $A$ let $\mathcal{P}(A)$ be the power set of $A$; that is, the set whose elements are the subsets of $A$. Then $\sharp(A) < \sharp(\mathcal{P}(A))$.
Proof. We first show that $\sharp(A) \neq \sharp(\mathcal{P}(A))$. We proceed by contradiction. Suppose that $A$ is a set but assume that $\sharp(A) = \sharp(\mathcal{P}(A))$. Then there exists a bijection $f : A \to \mathcal{P}(A)$. Using this function, define the set
$$S = \{a \in A : a \notin f(a)\} .$$
Since this is a subset of $A$, it is an element of $\mathcal{P}(A)$. As $f$ is a bijection, it is onto and therefore there exists $s \in A$ such that $f(s) = S$. There are now two possibilities; either $s \in S$ or $s \notin S$. In either case we will derive a contradiction, proving that the assumption we made cannot be true: no such $f$ can exist and $\sharp(A) \neq \sharp(\mathcal{P}(A))$.
In the first case, $s \in S$. Then as $S = f(s)$, we have $s \in f(s)$. But then by definition of $S$, it must actually be that $s \notin S$, a contradiction. In the second case, $s \notin S$, giving by the definition of $S$ that $s \in f(s)$. However $f(s) = S$ so $s \in S$, another contradiction.
Second we must show that $\sharp(A) \leq \sharp(\mathcal{P}(A))$. To do this we define the function $f : A \to \mathcal{P}(A)$ by $f(a) = \{a\}$.
To prove injectivity, suppose that $f(a_1) = f(a_2)$. Then $\{a_1\} = \{a_2\}$ and therefore $a_1 = a_2$.
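The diagonal set $S$ from the proof can be examined exhaustively when $A$ is small. The sketch below enumerates every function from a three-element set to its power set and confirms that $S$ is never in the range of the function. This is only a finite illustration of the argument; the helper names are hypothetical.

```python
from itertools import combinations, product

def power_set(A):
    """Return all subsets of A as frozensets."""
    items = list(A)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

def diagonal_set(A, f):
    """S = {a in A : a not in f(a)}, where f is a dict from A to subsets of A."""
    return frozenset(a for a in A if a not in f[a])

def diagonal_always_missed(A):
    """Check, for every function f : A -> P(A), that S is not a value of f."""
    elements = list(A)
    subsets_of_A = power_set(elements)
    for images in product(subsets_of_A, repeat=len(elements)):
        f = dict(zip(elements, images))
        if diagonal_set(elements, f) in f.values():
            return False
    return True

no_function_is_onto = diagonal_always_missed({0, 1, 2})
```

For a three-element set this loops over $8^3 = 512$ functions. In each case the set $S$ witnesses that the function is not onto, exactly as in the proof.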
Let us now give an example of two sets with the same cardinality. If $A$ and $B$ are sets we write $B^A$ for the set of functions $f : A \to B$. Let $F_2$ be a set with two elements, which we call 0 and 1. We claim that
$$\sharp(\mathcal{P}(A)) = \sharp(F_2^A) .$$
To see this we must display a bijection between the two. Define $f : \mathcal{P}(A) \to F_2^A$ by the following. For any subset $S \subset A$ associate the characteristic function $\chi_S : A \to F_2$ by
$$\chi_S(a) = \begin{cases} 1 & \text{if } a \in S \\ 0 & \text{if } a \notin S \end{cases} .$$
2. $s$ is injective.
3. (Inductive axiom) If any subset $S \subset \mathbb{N}$ contains 1 and has the property that whenever $n \in S$ then $s(n) \in S$, it follows that $S = \mathbb{N}$.
The third property seems a bit weird at first, but actually there are many sets which satisfy the first two properties and are not $\mathbb{N}$. For instance, the set $\{n/2 : n \in \mathbb{N}\}$ does. So we need it to really pin down $\mathbb{N}$.
From these axioms many properties follow. Here is one.
Proof. Let $S = \{n \in \mathbb{N} : s(n) \neq n\}$. Clearly $1 \in S$. Now suppose that $n \in S$ for some $n$. Then we claim that $s(n) \in S$. To see this, note that by injectivity of $s$, $s(n) \neq n$ implies that $s(s(n)) \neq s(n)$. Thus $s(n) \in S$. By the inductive axiom, since $1 \in S$ and whenever $n \in S$ we have $s(n) \in S$, we see that $S = \mathbb{N}$. In other words, $s(n) \neq n$ for all $n$.
Addition
It is customary to call s(1) = 2, s(2) = 3, and so on. We define addition on the natural
numbers in a recursive manner:
for any $n, m \in \mathbb{N}$, define $n + s(m)$ to be $s(n + m)$.
That this indeed defines a function $+ : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$ requires proof, but we will skip this and assume that addition is defined normally. Of course, addition satisfies the commutative and associative laws.
1. For any $m, n, r \in \mathbb{N}$, $m + (n + r) = (m + n) + r$.
$$m + (n + 1) = m + s(n) = s(m + n) = (m + n) + 1 ,$$
where we have used the inductive definition of addition. Now suppose that the formula holds for some $r \in \mathbb{N}$; we will show it holds for $s(r)$. Indeed,
$$S = \{r \in \mathbb{N} : m + (n + r) = (m + n) + r \text{ for all } m, n \in \mathbb{N}\}$$
2. For any $m, n \in \mathbb{N}$, $m + n = n + m$.
$$S = \{n \in \mathbb{N} : n + m = m + n \text{ for all } m \in \mathbb{N}\} .$$
The first step is to show that $1 \in S$; that is, that $1 + m = m + 1$ for all $m \in \mathbb{N}$. For this we also do an induction. Set
$$T = \{m \in \mathbb{N} : 1 + m = m + 1\} .$$
$$1 + (m + 1) = (1 + m) + 1 = (m + 1) + 1 .$$
By the induction, $T = \mathbb{N}$.
Now that we have shown $1 \in S$, we assume $n \in S$ and prove $n + 1 \in S$. For $m \in \mathbb{N}$,
$$(n + 1) + m = n + (1 + m) = n + (m + 1) = (n + m) + 1 = (m + n) + 1 = m + (n + 1) .$$
3. For all $n, m \in \mathbb{N}$, $n + m \neq n$.
$$S = \{n \in \mathbb{N} : n + m \neq n \text{ for all } m \in \mathbb{N}\} .$$
$$(n + 1) + m = (n + m) + 1 = s(n + m) \neq s(n) = n + 1 ,$$
Proof. We know $s$ does not map any element to 1 so $s$ is in fact a function to $\mathbb{N} \setminus \{1\}$. Also it is injective. To show surjective, consider the set
$$S = \{1\} \cup \{s(n) : n \in \mathbb{N}\} .$$
Ordering
We also define an ordering on the natural numbers. We say that $m \leq n$ for $m, n \in \mathbb{N}$ if either $m = n$ or $m + a = n$ for some $a \in \mathbb{N}$. This defines a total ordering of $\mathbb{N}$; that is, it is a partial ordering that also satisfies
for all $m, n \in \mathbb{N}$, $m \leq n$ or $n \leq m$.
In the case that $m \leq n$ but $m \neq n$ we write $m < n$. Note that by item 3 above, $n < n + m$ for all $n, m \in \mathbb{N}$. In particular, $n < s(n)$.
Proof. First each $n \leq n$ so it is reflexive. Next if $n_1 \leq n_2$ and $n_2 \leq n_3$ then if $n_1 = n_2$ or $n_2 = n_3$, we clearly have $n_1 \leq n_3$. Otherwise there exist $m_1, m_2 \in \mathbb{N}$ such that $n_1 + m_1 = n_2$ and $n_2 + m_2 = n_3$. In this case,
$$n_3 = n_2 + m_2 = (n_1 + m_1) + m_2 = n_1 + (m_1 + m_2) ,$$
giving $n_1 \leq n_3$.
For antisymmetry, suppose that $m \leq n$ and $n \leq m$. For a contradiction, if $m \neq n$ then there exist $a, b \in \mathbb{N}$ such that $m = n + a$ and $n = m + b$. Then $m = (m + b) + a = m + (b + a)$, a contradiction with item 3 above. Therefore $m = n$.
So far we have proved that $\leq$ is a partial order. We now prove $\leq$ is a total ordering. To begin with, we claim that for all $n \in \mathbb{N}$, $1 \leq n$. Clearly this is true for $n = 1$. If we assume it holds for some $n$ then
$$n + 1 = 1 + n \geq 1 ,$$
verifying the claim by induction.
Now for any $m > 1$ (that is, $m \in \mathbb{N}$ with $m \neq 1$), define the set
$$S = \{n \in \mathbb{N} : n \leq m\} \cup \{n \in \mathbb{N} : m \leq n\} .$$
By the above remarks, $1 \in S$. Supposing now that $n \in S$ for some $n \in \mathbb{N}$, we claim that $n + 1 \in S$. To show this, we have three cases.
1. Case 1: $n = m$. In this case, $n + 1 = m + 1 \geq m$, giving $n + 1 \in S$.
2. Case 2: $n > m$, so there exists $a \in \mathbb{N}$ such that $n = m + a$. Then $n + 1 = m + a + 1 \geq m$, giving $n + 1 \in S$.
$$m = n + a = n + a - 1 + 1 = (n + 1) + a - 1 > n + 1 ,$$
If $n < k$ then $n + 1 \leq k$.
Multiplication.
We define multiplication inductively by
$$n \cdot 1 = n \text{ for all } n \in \mathbb{N}$$
$$n \cdot s(m) = n + (n \cdot m) .$$
2. $n \cdot m = m \cdot n$.
3. $(n \cdot m) \cdot r = n \cdot (m \cdot r)$.
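The recursive definitions of addition and multiplication translate almost directly into code. The sketch below is an illustration under stated assumptions: natural numbers are modelled by Python integers starting at 1, the successor $s$ is modelled by `n + 1`, and the base case $n + 1 = s(n)$ is taken from the way the text uses $m + (n + 1) = m + s(n)$. Writing `m - 1` for the predecessor of $m$ is a convenience of this model and not part of the axioms.

```python
def s(n):
    """Successor map, modelled on Python integers (an assumption)."""
    return n + 1

def add(n, m):
    """n + 1 = s(n) and n + s(m) = s(n + m)."""
    if m == 1:
        return s(n)
    return s(add(n, m - 1))      # here m = s(m - 1)

def mul(n, m):
    """n * 1 = n and n * s(m) = n + (n * m)."""
    if m == 1:
        return n
    return add(n, mul(n, m - 1))  # here m = s(m - 1)

small = range(1, 8)
addition_commutes = all(add(n, m) == add(m, n) for n in small for m in small)
multiplication_commutes = all(mul(n, m) == mul(m, n) for n in small for m in small)
```

Checking commutativity on a finite range only illustrates the laws listed above; the inductive proofs are what establish them for all of $\mathbb{N}$.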
$$J_n = \{m \in \mathbb{N} : m \leq n\} .$$
$$J_{n+1} = J_n \cup \{n + 1\} .$$
To show this let $k$ be in the right side. If $k = n + 1$ then $k \in J_{n+1}$. Otherwise $k \leq n$, giving by $n \leq n + 1$ the inequality $k \leq n + 1$, or $k \in J_{n+1}$. To prove the inclusion $\subset$, suppose that $k \in J_{n+1}$. If $k \in J_n$ we are done, so suppose that $k \notin J_n$. Therefore $k > n$, so $k \geq n + 1$. On the other hand, $k \leq n + 1$, so $k = n + 1$.
Definition 1.5.1. For an arbitrary set $A$ we say that $A$ has cardinality $n$ if $A \simeq J_n$. In this case we say $A$ is finite and we write $\sharp(A) = n$. If $A$ is not equivalent to any $J_n$ we say $A$ is infinite.
In this definition, $\sharp(A)$ is an equivalence class of sets and $n$ is a number, so what we have written here is purely symbolic: it means $A \simeq J_n$.
Lemma 1.5.2. If $A$ and $B$ are sets such that $A \subset B$ then $\sharp(A) \leq \sharp(B)$.
Proof. Define $f : A \to B$ by $f(a) = a$. Then $f$ is an injection.
Theorem 1.5.3. For all $n \in \mathbb{N}$, $\sharp(J_n) < \sharp(J_{n+1}) < \sharp\mathbb{N}$.
Proof. Each set above is a subset of the next, so the proposition holds using $\leq$ instead of $<$. We must then prove $\neq$ in each spot above. Assume first that we have proved that $\sharp(J_n) \neq \sharp(J_{n+1})$ for all $n \in \mathbb{N}$; we will show that $\sharp(J_n) \neq \sharp\mathbb{N}$ for all $n \in \mathbb{N}$. If we had equality, then we would find $\sharp(J_{n+1}) \leq \sharp\mathbb{N} = \sharp(J_n)$. This contradicts the first inequality.
To prove the inequality $\sharp(J_n) \neq \sharp(J_{n+1})$, we use induction. Clearly it holds for $n = 1$ since $J_1 = \{1\}$ and $J_2 = \{1, 2\}$ and any function from $J_1$ to $J_2$ can only have one element in its range (cannot be onto). Suppose then that $\sharp(J_n) \neq \sharp(J_{n+1})$; we will prove that $\sharp(J_{n+1}) \neq \sharp(J_{n+2})$ by contradiction. Assume that there is a bijection $f : J_{n+1} \to J_{n+2}$. Then some element must be mapped to $n + 2$; call this $k \in J_{n+1}$. Define $h : J_{n+1} \to J_{n+1}$ by
$$h(m) = \begin{cases} m & m \neq k, n + 1 \\ n + 1 & m = k \\ k & m = n + 1 \end{cases} .$$
This function just swaps $k$ and $n + 1$. It follows then that $\hat{f} = f \circ h : J_{n+1} \to J_{n+2}$ is a bijection that maps $n + 1$ to $n + 2$.
Now $J_n$ is just $J_{n+1} \setminus \{n + 1\}$ and $J_{n+1}$ is just $J_{n+2} \setminus \{n + 2\}$, so define $g : J_n \to J_{n+1}$ to do exactly what $\hat{f}$ does: $g(m) = \hat{f}(m)$. It follows that $g$ is a bijection from $J_n$ to $J_{n+1}$, giving $J_n \simeq J_{n+1}$, a contradiction.
Because of the proposition, if a set $A$ has $A \simeq \mathbb{N}$ it must be infinite. In this case we say that $A$ is countable. Otherwise, if $A$ is infinite and $\sharp(A) \neq \sharp\mathbb{N}$, we say it is uncountable. From this point on, we will be looser about working with the natural numbers. For example, we will use the terms finite and infinite in the same way that we normally do: a set is finite if it has finitely many elements and infinite otherwise. Of course every proof we write from now on could be done using the Peano axioms, but we will be spared that.
Proof. We must construct a bijection from $\mathbb{N}$ to $S$. We can actually do this using the well-ordering property: that each non-empty subset of $\mathbb{N}$ has a least element. Define $f : \mathbb{N} \to S$ recursively: $f(1)$ is the least element of $S$ and, assuming we have defined $f(1), \ldots, f(n)$, define $f(n + 1)$ to be the least element of $S \setminus \{f(1), \ldots, f(n)\}$.
This is a bijection.
Note that $A$ is countable if and only if there is an injection $f : A \to \mathbb{N}$; that is, that $\sharp(A) \leq \sharp\mathbb{N}$.
Proof. To prove this we need to construct a bijection from $\mathbb{N}$. We will do this somewhat non-rigorously, thinking of a bijection from $\mathbb{N}$ as a listing of elements of $\bigcup_{A \in \mathcal{C}} A$ in sequence. For example, given a countably infinite set $S$ we may take a bijection $f : \mathbb{N} \to S$ and list all of the elements of $S$ as
$$f(1), f(2), f(3), \ldots$$
If $S$ is finite then this corresponds to a finite list.
Since each $A \in \mathcal{C}$ is countable, we may list its elements. The collection $\mathcal{C}$ itself is countable so we can list the elements of $\bigcup_{A \in \mathcal{C}} A$ in an array:
a1 a2
b1 b2 b3
c1
d1 d2 d3 d4
Note that some rows are finite. We now list the elements according to diagonals. That is,
we write the list as
a1 , b1 , a2 , c1 , b2 , d1 , b3 , d2 , . . .
Because we want the list to correspond to a bijection, we need to make sure that no element
is repeated. So, for instance, if b1 and a2 are equal we would only include the first.
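The diagonal listing described in this proof can be sketched for a finite ragged array. The function below is an illustration only: it handles finitely many finite rows (the proof allows countably many), walks the diagonals in the same order as the list above, and skips any element that has already appeared. The name `diagonal_listing` is hypothetical.

```python
def diagonal_listing(rows):
    """List the entries of a ragged array along diagonals, skipping repeats.

    Diagonal k holds the entries in row i, column k - i; within a diagonal
    we move from the lowest row up to the first row.
    """
    seen = set()
    listing = []
    longest = max((len(row) for row in rows), default=0)
    for k in range(len(rows) + longest - 1):
        for i in range(min(k, len(rows) - 1), -1, -1):
            j = k - i
            if j < len(rows[i]) and rows[i][j] not in seen:
                seen.add(rows[i][j])
                listing.append(rows[i][j])
    return listing

array = [
    ["a1", "a2"],
    ["b1", "b2", "b3"],
    ["c1"],
    ["d1", "d2", "d3", "d4"],
]
listing = diagonal_listing(array)
```

On the array from the text, the listing begins with `a1, b1, a2, c1, b2, d1, b3, d2`, matching the order given above. Every entry is reached after finitely many steps, which is the point of listing by diagonals rather than row by row.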
1.6 Exercises
1. Let $f : A \to B$ and $g : B \to C$ be functions. Show that the relation $g \circ f \subset A \times C$, defined by
$$(a, c) \in g \circ f \text{ if } (a, b) \in f \text{ and } (b, c) \in g \text{ for some } b \in B$$
is a function.
5. Strong Induction. In this exercise we introduce strong mathematical induction,
which, although being referred to as strong, is actually equivalent to mathematical
induction. Suppose we are given a collection {P (n) : n N} of mathematical state-
ments. To show P (n) is true for all n, mathematical induction dictates that we show
two things hold: P (1) is true and if P (n) is true for some n N then P (n + 1) is true.
To argue instead using strong induction we prove that
(Here $[n/2]$ is the largest integer no bigger than $n/2$.) Prove by strong induction that $a_n \leq 2^{n-1}$ for $n \geq 2$. Is it possible to find $b < 2$ such that $a_n \leq b^{n-1}$ for all $n \geq 2$?
(b) Why does strong induction follow from mathematical induction? In other words, in the second step of strong induction, why are we allowed to assume that $P(k)$ is true for all $k \leq n$ to prove that $P(n + 1)$ is true?
6. Prove that any non-empty subset $S \subset \mathbb{N}$ has a least element. That is, there is an $s \in S$ such that for all $t \in S$ we have $s \leq t$. This is a major result about $\mathbb{N}$, expressed by saying that $\mathbb{N}$ is well-ordered.
M = {m N : t S, m t} .
Use Peano's induction axiom to prove that $M = \mathbb{N}$. Does this lead to a contradiction?
2 The real numbers
2.1 Rationals and suprema
From now on we will proceed through Rudin, using the standard notations
$$\mathbb{Z} = \{\ldots, -1, 0, 1, \ldots\}$$
$$\mathbb{Q} = \{m/n : m, n \in \mathbb{Z} \text{ and } n \neq 0\} .$$
When thinking about the rational numbers, we quickly come to realize that they do not
capture all that we wish to express using numbers. For instance,
$$2n^2 = m^2 ,$$
so $m^2$ is even. This actually implies that $m$ must be even, for otherwise $m^2$ would be odd (since the square of an odd number is odd). Therefore we can write $m = 2s$ for some $s \in \mathbb{Z}$. Plugging back in, we find
$$2n^2 = 4s^2 \text{ or } n^2 = 2s^2 ,$$
so $n^2$ is also even, giving that $n$ is even. This is a contradiction.
From the previous theorem, what we know as $\sqrt{2}$ is not a rational number. Therefore if we were to construct a theory from only rationals, we would have a hole where we think $\sqrt{2}$ should be. What is even stranger is that there are rational numbers arbitrarily close to this hole.
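Theorem 2.1.2. If $q \in \mathbb{Q}$ satisfies $0 < q^2 < 2$ then we can find another rational $\hat{q} \in \mathbb{Q}$ such that
$$q^2 < \hat{q}^2 < 2 .$$
Similarly, for each $r \in \mathbb{Q}$ such that $r^2 > 2$, there is another rational $\hat{r}$ such that $2 < \hat{r}^2 < r^2$.
$$\hat{q} = q + \frac{2 - q^2}{q + 2} .$$
Then $\hat{q} > q$ and
$$\hat{q}^2 - 2 = \frac{2(q^2 - 2)}{(q + 2)^2} ,$$
giving $\hat{q}^2 < 2$.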
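The formula in the proof can be iterated with exact rational arithmetic to watch the squares climb toward 2 without reaching it. This is a minimal sketch using Python's `fractions.Fraction`; the starting value $q = 1$ and the name `improve` are choices made for the example, and the sketch assumes $q > 0$.

```python
from fractions import Fraction

def improve(q):
    """Return q_hat = q + (2 - q^2) / (q + 2), computed exactly."""
    return q + (2 - q * q) / (q + 2)

q = Fraction(1)
approximations = [q]
for _ in range(5):
    q = improve(q)
    approximations.append(q)

# Check the identity q_hat^2 - 2 = 2 (q^2 - 2) / (q + 2)^2 at each step.
identity_holds = all(
    improve(p) ** 2 - 2 == 2 * (p * p - 2) / (p + 2) ** 2
    for p in approximations
)
```

Starting from 1, the first few values are 4/3, 7/5 and 24/17. Each one is larger than the previous and still has square less than 2, which is why the set of rationals with square less than 2 has no largest element.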
We see from above that the set $\{q \in \mathbb{Q} : q^2 < 2\}$ does not have a largest element. This leads us to study largest elements of sets more carefully.
Note that if $a$ is a least upper bound for $B$ then $a$ is unique. Indeed, assume that $a$ and $a'$ are least upper bounds. Since they are both upper bounds, we have $a \leq a'$ and $a' \leq a$, so $a = a'$.
Proposition 2.1.4. Let $A$ be a totally ordered set and $B$ a subset. Define $C$ to be the set of all upper bounds for $B$. Then $\sup B = \inf C$.
Proof. We are trying to show that some element ($\inf C$) is the supremum of $B$, so we must show two things: $\inf C$ is an upper bound for $B$ and any other upper bound $a$ for $B$ satisfies $\inf C \leq a$. The second statement is easy because if $a$ is an upper bound for $B$ then $a \in C$. As $\inf C$ is a lower bound for $C$ we then have $\inf C \leq a$.
For the first, assume that $\inf C$ is not an upper bound for $B$, so there exists $b \in B$ such that $\inf C$ is not $\geq b$. By trichotomy, $\inf C < b$. We claim then that $b$ is a lower bound for $C$ which is larger than the greatest lower bound, a contradiction. Why is this? If $c \in C$ then $c$ is an upper bound for $B$, giving $c \geq b$, or $b \leq c$.
Note that the second statement of Theorem 2.1.2 implies that the set $\{q \in \mathbb{Q} : q^2 < 2\}$ does not have a supremum in $\mathbb{Q}$. Indeed, if it did have a supremum $r$, then $r$ would be a rational upper bound for this set and then we could find a smaller $\hat{r}$ that is still an upper bound, a contradiction. So one way of formulating the fact that there are holes in $\mathbb{Q}$ is to say that it does not have the least upper bound property.
Definition 2.1.5. Let $A$ be a totally ordered set with order $\leq$. We say that $A$ has the least upper bound property if each nonempty subset $B \subset A$ with an upper bound in $A$ has a least upper bound in $A$.
For the statement, one needs the definition of an ordered field, which is a certain type of
totally ordered set with multiplication and addition (like the rationals).
Theorem 2.2.1 (Existence and uniqueness of R). There exists a unique ordered field with
the least upper bound property.
The sense in which uniqueness holds is somewhat technical; it is not that any two ordered
fields as above must be equal, but they must be isomorphic. Again we defer to Rudin for
these definitions. We will now assume the existence of R, that it contains Q and Z, and its
usual properties.
One extremely useful property of R that follows from the least upper bound property is
Theorem 2.2.2 (Archimedean property of $\mathbb{R}$). Given $x, y \in \mathbb{R}$ with $x \neq 0$, there exists $n \in \mathbb{Z}$ such that
$$nx > y .$$
Proof. First let $x, y \in \mathbb{R}$ such that $x, y > 0$ and assume that there is no such $n$. Then the set
$$\{nx : n \in \mathbb{N}\}$$
is bounded above by $y$. As it is clearly nonempty, it has a supremum $s$. Then $s - x < s$, so $s - x$ cannot be an upper bound, giving the existence of some $m \in \mathbb{N}$ such that
$$s - x < mx .$$
However this implies that $s < (m + 1)x$, so $s$ was actually not an upper bound, contradiction. This proves the statement for the case $x, y > 0$. The other cases can be obtained from this one by instead considering $-x$ and/or $-y$.
The Archimedean property implies
Corollary 2.2.3 (Density of $\mathbb{Q}$ in $\mathbb{R}$). Let $x, y \in \mathbb{R}$ with $x < y$. There exists $q \in \mathbb{Q}$ such that $x < q < y$.
Proof. Apply the Archimedean property to $y - x$ and 1 to find $n \in \mathbb{Z}$ such that $n(y - x) > 1$. We can also find $m_1 > nx$ and $m_2 > -nx$, so
$$-m_2 < nx < m_1 .$$
$$nx < m \leq 1 + nx < ny .$$
Proof. We already know that $\mathbb{N} \times \mathbb{N}$ is countable: this is from setting up the array
(1, 1) (2, 1) (3, 1)
(1, 2) (2, 2) (3, 2)
(1, 3) (2, 3) (3, 3)
and listing the elements along diagonals. On the other hand, there is an injection
$$f : \mathbb{Q}^+ \to \mathbb{N} \times \mathbb{N} ,$$
where $\mathbb{Q}^+$ is the set of positive rationals. One such $f$ is given by $f(m/n) = (m, n)$, where $m/n$ is the reduced fraction for the rational, expressed with $m, n \in \mathbb{N}$. Therefore $\mathbb{Q}^+$ is countable. Similarly, $\mathbb{Q}^-$, the set of negative rationals, is countable. Last, $\mathbb{Q} = \mathbb{Q}^+ \cup \mathbb{Q}^- \cup \{0\}$ is a union of 3 countable sets and is thus countable.
To prove $\mathbb{R}$ is uncountable, we will use decimal expansions for real numbers. In other words, we write
$$x = .a_1 a_2 a_3 \ldots$$
where $a_i \in \{0, \ldots, 9\}$ for all $i$. Since we have not proved anything about decimal expansions, we are certainly assuming a lot here, but this is how things go. Note that each real number has at most 2 decimal expansions (for instance, $1/4 = .2500\ldots = .2499\ldots$).
Assume that $\mathbb{R}$ is countable. Then as there are at most two decimal expansions for each real number, the set of decimal expansions is countable (check this!). Now write the set of all expansions in a list:
1   $.a_0 a_1 a_2 \ldots$
2   $.b_0 b_1 b_2 \ldots$
3   $.c_0 c_1 c_2 \ldots$
We will show that no matter what list we are given (as above), there must be a sequence that is not in the list. This implies that there can be no such list, and thus $\mathbb{R}$ is uncountable.
Consider the diagonal element of the list. That is, we take $a_0$ for the first digit, $b_1$ for the second, $c_2$ for the third and so on:
$$.a_0 b_1 c_2 d_3 \ldots$$
We now have a rule to transform this diagonal element into a new one. We can use many, but here is one: change each digit to a 0 if it is not 0, and replace it with 9 if it is 0. For example,
$$.0119020\ldots \to .9000909\ldots$$
Note that this procedure changes the diagonal number into a new one that differs from the diagonal element in every decimal place. Call this new expansion $A = .\hat{a}_0 \hat{a}_1 \ldots$
Now our original list contains all expansions, so it must contain $A$ at some point; let us say that the $n$-th element of the list is $A$. Then consider the $n$-th digit $\hat{a}_n$ of $A$. On the one hand, by construction, $\hat{a}_n$ is not equal to the $n$-th digit of the diagonal element. On the other hand, by the position in the list, $\hat{a}_n$ equals the $n$-th digit of the diagonal element. This is a contradiction.
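The digit rule from the argument can be tried on a finite list of truncated expansions. The sketch below is only an illustration: a real list would be infinite, the sample digit strings are made up, and the names `flip_digit` and `diagonal_expansion` are hypothetical.

```python
def flip_digit(d):
    """Change a digit to 0 if it is not 0, and to 9 if it is 0."""
    return "0" if d != "0" else "9"

def diagonal_expansion(expansions):
    """Flip the i-th digit of the i-th expansion, for every row i.

    Each expansion is a string of digits (the part after the decimal
    point) and must have at least as many digits as there are rows.
    """
    return "".join(flip_digit(row[i]) for i, row in enumerate(expansions))

# Hypothetical truncated list of expansions, seven digits each.
listed = [
    "1415926",
    "7182818",
    "4142135",
    "0000000",
    "9999999",
    "2500000",
    "3333333",
]
new_expansion = diagonal_expansion(listed)

# The worked example from the text, applied digit by digit.
text_example = "".join(flip_digit(d) for d in "0119020")
```

By construction `new_expansion` differs from the $i$-th listed string in its $i$-th digit, so it cannot equal any row of the list, which is the heart of the contradiction.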
2.3 $\mathbb{R}^n$ for $n \geq 2$
A very important extension of $\mathbb{R}$ is given by $n$-dimensional Euclidean space.
$$\vec{a} \cdot \vec{b} = (a_1, \ldots, a_n) \cdot (b_1, \ldots, b_n) = a_1 b_1 + \cdots + a_n b_n .$$
2. $|c\vec{a}| = |c||\vec{a}|$.
Proof. The first two follow easily; for instance since $a^2 \geq 0$ for all $a \in \mathbb{R}$ (this is actually part of the definition of ordered field), we get $a_1^2 + \cdots + a_n^2 \geq 0$ and therefore $|\vec{a}| \geq 0$. If $|\vec{a}| = 0$ then by uniqueness of square roots, $a_1^2 + \cdots + a_n^2 = 0$ and so $0 \geq a_i^2$ for all $i$, giving $a_i = 0$ for all $i$.
For the third item, we first give a lemma.
Lemma 2.3.5. If $ax^2 + bx + c \geq 0$ for all $x \in \mathbb{R}$ then $b^2 \leq 4ac$.
Proof. If $a = 0$ then $bx \geq -c$ for all $x$. Then we claim $b$ must be zero. If not, then plugging in $x = -(c + 1)/b$ will give $bx = -c - 1 < -c$, a contradiction. Therefore if $a = 0$ we must have $b = 0$ and therefore $b^2 \leq 4ac$ as claimed.
Otherwise $a \neq 0$. First assume that $a > 0$. Plug in $x = -b/(2a)$ to get
$$-b^2/(4a) + c \geq 0$$
giving $b^2 \leq 4ac$. Last, the case $a < 0$ cannot occur under the hypothesis: if $a < 0$ then $ax^2 + bx + c$ is negative once $|x|$ is sufficiently large, so there is nothing to prove.
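To prove Cauchy-Schwarz, note that for all $x \in \mathbb{R}$,
$$0 \leq (a_1 x - b_1)^2 + \cdots + (a_n x - b_n)^2$$
$$= (a_1^2 + \cdots + a_n^2)x^2 - 2(a_1 b_1 + \cdots + a_n b_n)x + (b_1^2 + \cdots + b_n^2)$$
$$= |\vec{a}|^2 x^2 - 2(\vec{a} \cdot \vec{b})x + |\vec{b}|^2 .$$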
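Applying Lemma 2.3.5 to the quadratic above, with coefficients $|\vec{a}|^2$, $-2(\vec{a} \cdot \vec{b})$ and $|\vec{b}|^2$, gives $4(\vec{a} \cdot \vec{b})^2 \leq 4|\vec{a}|^2|\vec{b}|^2$, which is the Cauchy-Schwarz inequality. The sketch below checks this discriminant condition on randomly chosen integer vectors, so that all arithmetic is exact. The function names, the seed and the sample sizes are assumptions made for the example.

```python
import random

def dot(a, b):
    """Dot product a_1 b_1 + ... + a_n b_n."""
    return sum(x * y for x, y in zip(a, b))

def quadratic_coefficients(a, b):
    """Coefficients (A, B, C) with sum_i (a_i x - b_i)^2 = A x^2 + B x + C."""
    return dot(a, a), -2 * dot(a, b), dot(b, b)

def discriminant_condition(a, b):
    """Check B^2 <= 4 A C, i.e. (a . b)^2 <= |a|^2 |b|^2."""
    A, B, C = quadratic_coefficients(a, b)
    return B * B <= 4 * A * C

rng = random.Random(0)
samples = [
    ([rng.randint(-9, 9) for _ in range(4)],
     [rng.randint(-9, 9) for _ in range(4)])
    for _ in range(200)
]
all_samples_pass = all(discriminant_condition(a, b) for a, b in samples)
```

For parallel vectors such as $(1, 2)$ and $(2, 4)$ the condition holds with equality, which corresponds to the quadratic having a real root.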
2.4 Exercises
1. For each of the following examples, find the supremum and the infimum of the set S.
Also state whether or not they are elements of S.
(a) Define the sum set
$$A + B = \{a + b : a \in A, b \in B\} .$$
Prove that $\sup(A + B) = \sup A + \sup B$.
(b) Define the product set
$$A \cdot B = \{a \cdot b : a \in A, b \in B\} .$$
Is it true that $\sup(A \cdot B) = \sup A \cdot \sup B$? If so, provide a proof; otherwise, provide a counterexample.
4. Let $\mathcal{C}$ be a collection of open intervals (sets $I = (a, b)$ for $a < b$) such that
Hint. Define a function $f : \mathcal{C} \to S$ for some countable set $S \subset \mathbb{R}$ by setting $f(I)$ equal to some carefully chosen number.
3 Metric spaces
3.1 Definitions
Definition 3.1.1. A set $X$ with a function $d : X \times X \to \mathbb{R}$ is a metric space if for all $x, y, z \in X$,
1. $d(x, y) \geq 0$ and equals 0 if and only if $x = y$ and
2. $d(x, y) \leq d(x, z) + d(z, y)$.
Then we call $d$ a metric.
Examples.
1. A useful example of a metric space is $\mathbb{R}^n$ with metric $d(\vec{a}, \vec{b}) = |\vec{a} - \vec{b}|$.
2. If $X$ is any nonempty set we can define the discrete metric by
$$d(x, y) = \begin{cases} 1 & \text{if } x \neq y \\ 0 & \text{if } x = y \end{cases} .$$
3. The set $F[0, 1]$ of bounded functions $f : [0, 1] \to \mathbb{R}$ is a metric space with metric
$$d(f, g) = \sup\{|f(x) - g(x)| : x \in [0, 1]\} .$$
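The two conditions of Definition 3.1.1 can be tested numerically on a finite sample of points. The sketch below checks them for the Euclidean metric on a small grid in $\mathbb{R}^2$ and for the discrete metric, and shows a function that fails the triangle inequality. The sample points, the tolerance `tol` for floating-point rounding, and the non-example `squared_gap` are assumptions made for the illustration.

```python
import math
from itertools import product

def euclidean(a, b):
    """d(a, b) = |a - b| for points of R^n given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def discrete(x, y):
    """Discrete metric: 1 if x != y, and 0 if x == y."""
    return 0 if x == y else 1

def squared_gap(x, y):
    """Not a metric on R: it violates the triangle inequality."""
    return (x - y) ** 2

def satisfies_axioms(points, d, tol=1e-9):
    """Check conditions 1 and 2 of the definition on a finite sample."""
    for x, y in product(points, repeat=2):
        if d(x, y) < 0 or (d(x, y) == 0) != (x == y):
            return False
    for x, y, z in product(points, repeat=3):
        if d(x, y) > d(x, z) + d(z, y) + tol:
            return False
    return True

grid = [(i, j) for i in range(-2, 3) for j in range(-2, 3)]
```

A finite check like this can refute a candidate metric, as it does for `squared_gap` on the points 0, 1, 2, but it cannot prove the axioms for all points.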
Proposition 3.2.3. Any neighborhood is open.
Proof. Let $x \in X$ and $r > 0$. To show that $B_r(x)$ is open we must choose $y \in B_r(x)$ and show that there exists some $s > 0$ such that $B_s(y) \subset B_r(x)$. The radius $s$ will depend on how close $y$ is to the boundary. Therefore, choose
$$s = r - d(x, y) .$$
To show that for this $s$, we have $B_s(y) \subset B_r(x)$ we take $z \in B_s(y)$. Then
$$d(x, z) \leq d(x, y) + d(y, z) < d(x, y) + s = r ,$$
so $z \in B_r(x)$.
1. In $\mathbb{R}$, the only intervals that are open are the (surprise!) open intervals. For instance, let's consider the half-open interval $(0, 1] = \{x \in \mathbb{R} : 0 < x \leq 1\}$. If it were open, we would be able to, given any $x \in (0, 1]$, find $r > 0$ such that $B_r(x) \subset (0, 1]$. But clearly this is false because $B_r(1)$ contains $1 + r/2$.
Proof. Let $x \in \bigcup_{O \in \mathcal{C}} O$. Then there exists $O \in \mathcal{C}$ such that $x \in O$. Since $O$ is open, there exists $r > 0$ such that $B_r(x) \subset O$. This is also a subset of $\bigcup_{O \in \mathcal{C}} O$ so this set is open.
To show that we cannot allow infinite intersections, consider the sets $(-1/n, 1 + 1/n)$ in $\mathbb{R}$. We have
$$\bigcap_{n=1}^{\infty} (-1/n, 1 + 1/n) = [0, 1] ,$$
Definition 3.2.5. An interior point of $Y \subset X$ is a point $y \in Y$ such that there exists $r > 0$ with $B_r(y) \subset Y$. Write $Y^\circ$ for the set of interior points of $Y$.
Examples:
1. The set of interior points of [0, 1] (under the usual metric) is (0, 1).
Sets can be both open and closed. Consider $\emptyset$, whose complement is clearly open, making $\emptyset$ closed. It is also open.
Definition 3.3.1. Let $Y \subset X$. A point $x \in X$ is a limit point of $Y$ if for each $r > 0$ there exists $y \in Y$ such that $y \neq x$ and $y \in B_r(x)$. Write $Y'$ for the set of limit points of $Y$.
Examples:
{(x, y) : x2 + y 2 1} {(0, y) : y R} .
Proposition 3.3.2. $x \in X$ is a limit point of $Y$ if and only if for each $r > 0$ there are infinitely many points of $Y$ in $B_r(x)$.
Proof. We need only show that if $x$ is a limit point of $Y$ and $r > 0$ then there are infinitely many points of $Y$ in $B_r(x)$. We argue by contradiction; assume there are only finitely many and label the ones that are not equal to $x$ as $y_1, \ldots, y_n$. Choosing $r' = \min\{d(x, y_1), \ldots, d(x, y_n)\}$, we then have that $B_{r'}(x)$ contains no points of $Y$ except possibly $x$. This contradicts the fact that $x$ is a limit point of $Y$.
Here is yet another definition of closed.
Theorem 3.3.3. $Y$ is closed if and only if $Y' \subset Y$.
Proof. Suppose $Y$ is closed and let $y$ be a limit point of $Y$. If $y \notin Y$ then because $X \setminus Y$ is
open, we can find $r > 0$ such that $B_r(y) \subset (X \setminus Y)$. But for this $r$, there is no $x \in B_r(y)$
that is also in $Y$, so that $y$ is not a limit point of $Y$, a contradiction.
Suppose conversely that $Y' \subset Y$; we will show that $Y$ is closed by showing that $X \setminus Y$ is
open. To do this, let $z \in X \setminus Y$. Since $z \notin Y$ and $Y' \subset Y$, $z$ cannot be a limit point of $Y$.
Therefore there is an $r > 0$ such that $B_r(z)$ contains no points $p \ne z$ such that $p \in Y$. Since
$z$ is also not in $Y$, we must have $B_r(z) \subset (X \setminus Y)$, implying that $X \setminus Y$ is open.
Examples:
1. Again the set {1, 2, 3} has no limit points (because from the above proposition, a finite
set cannot have limit points). However it is closed by the above theorem.
Proof. We first show the inclusion $\subset$. To do this we need to show that each $y \in Y$ and
each $y \in Y'$ must be in the intersection on the right (call it $J$). First if $y \in Y$ then because
each $C \in \mathcal{C}$ contains $Y$, we have $y \in J$. Second, if $y \in Y'$ and $C \in \mathcal{C}$ we also claim that
$y \in C$. This is because $y$, being a limit point of $Y$, is also a limit point of $C$ (directly from
the definition). However $C$ is closed, so it contains its limit points, and $y \in C$.
For the inclusion $\supset$, we will show that $\overline{Y} \in \mathcal{C}$. This implies that $\overline{Y}$ is one of the sets we
are intersecting to form $J$, and so $J \subset \overline{Y}$. Clearly $Y \subset \overline{Y}$, so we need to show that $\overline{Y}$ is
closed. If $x \notin \overline{Y}$ then $x$ is not in $Y$ and $x$ is not a limit point of $Y$, so there exists $r > 0$ such
that $B_r(x)$ does not intersect $Y$. Since $B_r(x)$ is open, each point in it has a neighborhood that is
contained in $B_r(x)$ and therefore does not intersect $Y$. This means that each point in $B_r(x)$
is not in $Y$ and is not a limit point of $Y$, giving $B_r(x) \subset \overline{Y}^c$, so the complement of $\overline{Y}$ is
open. Thus $\overline{Y}$ is closed.
From the theorem above, we have a couple of consequences:
1. For all $Y \subset X$, $\overline{Y}$ is closed. This is because the intersection of closed sets is closed.
2. $\overline{Y} = Y$ if and only if $Y$ is closed. One direction is clear: that if $\overline{Y} = Y$ then $Y$ is closed.
For the other direction, if $Y$ is closed then $Y' \subset Y$ and therefore $\overline{Y} = Y \cup Y' \subset Y$.
Examples:
1. $\overline{\mathbb{Q}} = \mathbb{R}$.
2. $\overline{\mathbb{R} \setminus \mathbb{Q}} = \mathbb{R}$.
3. $\overline{\{1, 1/2, 1/3, \ldots\}} = \{1, 1/2, 1/3, \ldots\} \cup \{0\}$.
For some practice, we give Theorem 2.28 from Rudin:
Theorem 3.3.6. Let $Y \subset \mathbb{R}$ be nonempty and bounded above. Then $\sup Y \in \overline{Y}$ and therefore
$\sup Y \in Y$ if $Y$ is closed.
Proof. By the least upper bound property, $s = \sup Y$ exists. To show $s \in \overline{Y}$ we need to
show that $s \in Y$ or $s \in Y'$. If $s \in Y$ we are done, so we assume $s \notin Y$ and prove that $s \in Y'$.
Since $s$ is the least upper bound, given $r > 0$ there must exist $y \in Y$ such that
$s - r < y \le s$.
If this were not true, then $s - r$ would be an upper bound for $Y$. But now we have found
$y \in Y$ such that $y \ne s$ and $y \in B_r(s)$, proving that $s$ is a limit point for $Y$.
Note that $\sup Y$ is not always a limit point of $Y$. Indeed, consider the set
$Y = \{0\}$.
This set has $\sup Y = 0$ but has no limit points. The set $Y$ can even have limit points but
just not with $\sup Y$ a limit point. Consider $Y = \{0\} \cup [-2, -1]$.
3.4 Compactness
It will be very important for us, during the study of continuity for instance, to understand
exactly which sets $Y \subset \mathbb{R}$ have the following property: for each infinite subset $E \subset Y$, $E$
has a limit point in $Y$. We will soon see that the interval $[0, 1]$ has this property, whereas
$(0, 1)$ does not (take for example the subset $\{1/2, 1/3, \ldots\}$). The reason this matters is that we will
many times find ourselves exactly in this situation: with an infinite subset $E$ of some set $Y$,
wanting to find a limit point for $E$ (and hoping that it is also in $Y$). This property
is what we will call, on the problem set, limit point compactness.
Limit point compactness was apparently one of the original notions of compactness (see
the discussion in Munkres' topology book at the beginning of the compactness section;
thanks to Prof. McConnell). However over time it became apparent that there was a stronger
and more general version of compactness (equivalent in metric spaces, but not in all topological
spaces) which could be formulated only in terms of open sets. We give this definition,
now taken to be the standard one, below.
Definition 3.4.1. A subset $K$ of a metric space $X$ is compact if for every collection $\mathcal{C}$
of open sets such that $K \subset \bigcup_{C \in \mathcal{C}} C$, there are finitely many sets $C_1, \ldots, C_n \in \mathcal{C}$ such that
$K \subset \bigcup_{i=1}^{n} C_i$.
The collection C is called an open cover for K and {C1 , . . . , Cn } is a finite subcover.
The process of choosing this finite number of sets from C is referred to as extracting a finite
subcover. The definition, in this language, states that K is compact if from every open cover
of K we can extract a finite subcover of K.
It is quite difficult to gain intuition about the above definition, but it will develop as we
go on and use compactness in various circumstances. The main point is that finite collections
are much more useful than infinite collections. This is true for example with numbers: we
already know that a set of finitely many numbers has a min and a max, whereas an infinite set
does not necessarily. As we go through the course, to develop a clearer view of compactness,
you should revisit the following phrase: often times, compactness allows us to pass from
local information (valid in each open set from the cover) to global information (valid on
the whole space), by patching together the sets in the finite subcover.
Let us now give some properties of compact sets and try to emphasize where the ability
to extract finite subcovers comes into the proofs.
Theorem 3.4.2. Any compact set is limit point compact.
Proof. Let $K \subset X$ be compact and let $E \subset K$ be an infinite set. Assume for a contradiction
that $E$ has no limit point in $K$, so for each $x \in K$ we can find $r_x > 0$ such that $B_{r_x}(x)$
intersects $E$ only possibly at $x$. The collection $\mathcal{C} = \{B_{r_x}(x) : x \in K\}$ is an open cover of
$K$, so by compactness it can be reduced to a finite subcover of $K$ (and thus of $E$). Each set in
this subcover contains at most one point of $E$, so $E$ must have been finite, a contradiction.
Definition 3.4.3. A set $E \subset X$ is bounded if there exists $x \in X$ and $R > 0$ such that
$E \subset B_R(x)$.
Theorem 3.4.4. Any compact $K \subset X$ is bounded.
Proof. Pick $x \in X$ and define a collection $\mathcal{C}$ of open sets by
$\mathcal{C} = \{B_R(x) : R \in \mathbb{N}\}$.
We claim that $\mathcal{C}$ is an open cover of $K$. We need just to show that each point of $X$ is in at
least one of the sets of $\mathcal{C}$. So let $y \in X$ and choose $R \in \mathbb{N}$ with $R > d(y, x)$. Then $y \in B_R(x)$.
Since $K$ is compact, there exist $C_1, \ldots, C_n \in \mathcal{C}$ such that $K \subset \bigcup_{i=1}^{n} C_i$. By definition
of the sets in $\mathcal{C}$ we can then find $R_1, \ldots, R_n$ such that $K \subset \bigcup_{i=1}^{n} B_{R_i}(x)$. Taking
$R = \max\{R_1, \ldots, R_n\}$, we then have $K \subset B_R(x)$, completing the proof.
In the proof it was essential to extract a finite subcover because we wanted to take $R$ to
be the maximum of the radii of the chosen sets. This maximum can be infinite if we only have an
infinite subcover, and so in that case the proof would break down (that is, if we were not able
to extract a finite subcover).
Examples.
1. The set $\{1/2, 1/3, \ldots\}$ is not compact. This is because we can find an open cover that
admits no finite subcover. Indeed, consider
$\mathcal{C} = \left\{ \left( \frac{1}{n} - \frac{1}{2n}, \frac{1}{n} + \frac{1}{2n} \right) : n \ge 2 \right\}$.
Each one of the sets in the above collection covers only finitely many elements from
$\{1/2, 1/3, \ldots\}$, and so any finite subcollection cannot cover the whole set.
2. However if we add 0, by considering the set $\{1/2, 1/3, \ldots\} \cup \{0\}$, it becomes compact.
To prove this, let $\mathcal{C}$ be any open cover; we will show that there are finitely many sets
from $\mathcal{C}$ that still cover our set.
To do this, note first that there must be some $C \in \mathcal{C}$ such that $0 \in C$. Since $C$ is
open, it contains some interval $(-r, r)$ for $r > 0$. Then for $n > 1/r$, all points $1/n$ are
in this interval, and thus $C$ contains all but finitely many of the points from our set.
Now we just need to cover the other points, of which there are finitely many. Writing
$1/2, \ldots, 1/N$ for these points, choose for each $i$ a set $C_i$ from $\mathcal{C}$ such that $1/i \in C_i$.
Then
$\{C, C_2, \ldots, C_N\}$
is a finite subcover.
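The two examples can be checked by hand for small cases. The following is a minimal sketch using exact rational arithmetic; the function names `covered_by` and `leftover_indices` are ours, and we assume the cover in Example 1 consists of the intervals $(1/n - 1/(2n), 1/n + 1/(2n))$.

```python
from fractions import Fraction

def covered_by(n, max_m=1000):
    """Indices m in [2, max_m] with 1/m inside (1/n - 1/(2n), 1/n + 1/(2n))."""
    lo = Fraction(1, n) - Fraction(1, 2 * n)
    hi = Fraction(1, n) + Fraction(1, 2 * n)
    return [m for m in range(2, max_m + 1) if lo < Fraction(1, m) < hi]

def leftover_indices(r):
    """Indices n >= 2 with 1/n outside (-r, r): the finitely many points
    of Example 2 that still need their own set from the cover."""
    out = []
    n = 2
    while Fraction(1, n) >= r:
        out.append(n)
        n += 1
    return out

print(covered_by(3))                       # a single interval catches few points
print(leftover_indices(Fraction(1, 10)))   # points not caught by (-1/10, 1/10)
```

Each interval in Example 1 catches only a handful of the points $1/m$, while in Example 2 a single open set around 0 leaves only finitely many points uncovered.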
The main problem in example 1 was actually that the set was not closed. It is not
immediately apparent how that was manifested in our inability to produce a finite subcover,
but it is a general fact:
$r = \min\{d(x, y_i)/2 : i = 1, \ldots, n\}$,
we claim then that $B_r(x) \subset K^c$. To show this, let $z \in B_r(x)$. Then $d(z, x) < r$ and by the
triangle inequality,
$d(y_i, x) \le d(z, y_i) + d(z, x) < d(z, y_i) + r$,
giving
$d(x, y_i)/2 \le d(y_i, x) - r < d(z, y_i)$ for all $i = 1, \ldots, n$.
In other words, $z \notin B_{y_i}$ for all $i$. But the $B_{y_i}$'s cover $K$ and therefore $z \notin K$. This means
$K^c$ is open, or $K$ is closed.
We now mention a useful way to produce new compact sets from old ones.
$\mathcal{D} = \mathcal{C} \cup \{L^c\}$
and note that $\mathcal{D}$ is actually an open cover of $K$. Therefore, as $K$ is compact, we can extract
from $\mathcal{D}$ a finite subcover $\{D_1, \ldots, D_n\}$. If $D_i \in \mathcal{C}$ for all $i$, then we are done; otherwise $L^c$
is in this set (say it is $D_n$) and we consider the collection $\{D_1, \ldots, D_{n-1}\}$. This is a finite
subcollection of $\mathcal{C}$. We claim that it is an open cover of $L$ as well. Indeed, if $x \in L$ then
there exists $i = 1, \ldots, n$ such that $x \in D_i$. Since $x \notin L^c$, $D_i$ cannot equal $L^c$, meaning that
$i \ne n$. This completes the proof.
To prove this theorem, we will need some preliminary results. Recall that Rudin defines
an n-cell to be a subset of $\mathbb{R}^n$ of the form
$[a_1, b_1] \times \cdots \times [a_n, b_n]$.
Lemma 3.5.2. Suppose that $C_1, C_2, \ldots$ are n-cells that are nested; that is, if
$C_i = \prod_{k=1}^{n} [a_i^{(k)}, b_i^{(k)}]$,
then
$[a_i^{(k)}, b_i^{(k)}] \supset [a_{i+1}^{(k)}, b_{i+1}^{(k)}]$ for all $i$ and $k$.
Then $\bigcap_i C_i$ is nonempty.
Proof. We first consider the case $n = 1$. That is, take $C_i = [a_i, b_i]$ for $i \ge 1$ and $a_i \le b_i$.
Define $A = \{a_1, a_2, \ldots\}$ and $a = \sup A$. We claim that
$a \in \bigcap_i C_i$.
To see this, note that $a_i \le b_j$ for all $i, j$. Indeed,
$a_i \le a_j \le b_j$ if $i \le j$
and
$a_i \le b_i \le b_j$ if $i \ge j$.
Therefore $b_j$ is an upper bound for $A$. But $a$ is the least upper bound of $A$, so $a \le b_j$ for all
$j$. This gives
$a_i \le a \le b_i$ for all $i$,
or $a \in \bigcap_i C_i$.
For the case $n \ge 2$ we just do the same argument on each of the coordinates to find
$(a(1), \ldots, a(n))$ such that
$a_i^{(k)} \le a(k) \le b_i^{(k)}$ for all $i, k$,
or $(a(1), \ldots, a(n)) \in \bigcap_i C_i$.
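To make the one-dimensional argument concrete, here is a small sketch on a finite family of nested intervals. The family $[1 - 1/i, 1 + 1/i]$ and the helper name `nested_intervals` are hypothetical choices of ours; for a finite family the supremum of the left endpoints is simply their maximum.

```python
from fractions import Fraction

def nested_intervals(count):
    """Hypothetical nested family [a_i, b_i] = [1 - 1/i, 1 + 1/i], i = 1..count."""
    return [(1 - Fraction(1, i), 1 + Fraction(1, i)) for i in range(1, count + 1)]

intervals = nested_intervals(200)
lefts = [lo for lo, _ in intervals]
rights = [hi for _, hi in intervals]

# Every left endpoint is at most every right endpoint (a_i <= b_j for all i, j).
cross_ok = max(lefts) <= min(rights)

# The sup of the left endpoints lies in every interval of the family.
a = max(lefts)
in_all = all(lo <= a <= hi for lo, hi in intervals)
print(a, cross_ok, in_all)
```

For the full infinite family the supremum would be 1, which is indeed in every interval; the finite check only illustrates the two inequalities used in the proof.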
Lemma 3.5.3. Any n-cell is compact in $\mathbb{R}^n$.
Proof. For simplicity, take $K = [0, 1] \times \cdots \times [0, 1] = [0, 1]^n$. Since $\mathbb{R}^n$ is a metric space (with
the usual metric), it suffices to prove that $K$ is limit point compact; that is, that each infinite
subset of $K$ has a limit point in $K$. This is from exercise 11 at the end of the Chapter. It
states that compactness and limit point compactness are equivalent in metric spaces.
Suppose that $E \subset K$ is infinite. We will produce a limit point of $E$ inside $K$. We begin
by dividing $K$ into $2^n$ sub-cells by cutting each interval $[0, 1]$ into two equal pieces. For
instance, in $\mathbb{R}^2$ we would consider the 4 sub-cells
$[0, 1/2] \times [0, 1/2]$, $[0, 1/2] \times [1/2, 1]$, $[1/2, 1] \times [0, 1/2]$, $[1/2, 1] \times [1/2, 1]$.
At least one of these $2^n$ sub-cells must contain infinitely many points of $E$. Call this sub-cell
$K_1$. Repeat, by dividing $K_1$ into $2^n$ equal sub-cells to find a sub-sub-cell $K_2$ which contains
infinitely many points of $E$.
We continue this procedure ad infinitum, at stage $i \ge 1$ finding a sub-cell $K_i$ of $K$ of the
form
$K_i = [r_{1,i} 2^{-i}, (r_{1,i} + 1) 2^{-i}] \times \cdots \times [r_{n,i} 2^{-i}, (r_{n,i} + 1) 2^{-i}]$
which contains infinitely many points of $E$. Note that the $K_i$'s satisfy the conditions of the
previous lemma: they are nested n-cells. Therefore there exists $z \in \bigcap_i K_i$. Because each $K_i$
is a subset of $K$, we have $z \in K$.
We claim that $z$ is a limit point of $E$. To show this, let $r > 0$. Note that for all points
$x, y \in K_i$ we have
$|x - y|^2 = (x_1 - y_1)^2 + \cdots + (x_n - y_n)^2 \le n (2^{-i})^2 = \frac{n}{4^i}$.
Therefore
$\mathrm{diam}(K_i) = \sup\{|x - y| : x, y \in K_i\} \le \frac{\sqrt{n}}{2^i} \le \frac{\sqrt{n}}{i}$.
(You can prove the inequality $i \le 2^i$ for all $i$ by induction.) So fix any $i > \sqrt{n}/r$; then for all
$x \in K_i$ we have (because $z \in K_i$)
$|x - z| \le \mathrm{diam}(K_i) \le \frac{\sqrt{n}}{i} < r$,
so that $K_i \subset B_r(z)$. However $K_i$ contains infinitely many points of $E$, so we can find one
not equal to $z$ in $B_r(z)$. This means $z$ is a limit point of $E$.
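The halving procedure in this proof can be imitated on a computer, with one caveat: a program only sees finitely many points, so "contains infinitely many points of $E$" is replaced below by "contains the most sample points". This is a sketch under that assumption; the names `halve`, `count_in`, `shrink` and the sample set $\{(1/m, 1/m)\}$ are ours.

```python
from fractions import Fraction
from itertools import product

def halve(cell):
    """Split an n-cell, given as a list of (lo, hi) pairs, into 2^n sub-cells."""
    halves = []
    for lo, hi in cell:
        mid = (lo + hi) / 2
        halves.append([(lo, mid), (mid, hi)])
    return [list(choice) for choice in product(*halves)]

def count_in(cell, points):
    """Number of sample points lying in the (closed) cell."""
    return sum(all(lo <= x <= hi for x, (lo, hi) in zip(p, cell)) for p in points)

def shrink(points, dim, steps):
    """Starting from [0, 1]^dim, repeatedly keep the fullest sub-cell."""
    cell = [(Fraction(0), Fraction(1))] * dim
    for _ in range(steps):
        cell = max(halve(cell), key=lambda c: count_in(c, points))
    return cell

# Finite sample from E = {(1/m, 1/m) : m >= 1}, whose only limit point is (0, 0).
sample = [(Fraction(1, m), Fraction(1, m)) for m in range(1, 2001)]
cell = shrink(sample, dim=2, steps=8)
print(cell)
```

After $i$ steps the cell has side length $2^{-i}$, and for this sample the cells close in on the origin, the limit point that the proof produces.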
Proof of Heine-Borel. Suppose that $K$ is closed and bounded in $\mathbb{R}^n$. Then there exists an
n-cell $C$ such that $K \subset C$. By the previous lemma, $C$ is compact. But $K$ is a closed subset
of $C$ so $K$ is compact.
Suppose conversely that $K$ is compact. Then we have already shown $K$ is closed and
bounded.
Therefore we find that
(closed and bounded) $\Leftrightarrow$ (compact) in $\mathbb{R}^n$,
(closed and bounded) $\Leftarrow$ (compact) in metric spaces,
and
(compact) $\Leftrightarrow$ (limit point compact) in metric spaces.
1. C is closed because it is an intersection of closed sets.
2. C is compact because it is closed and bounded (in R).
3. $C$ has total length 0. Although we have not defined this, we can compute the length
of $C_n$: it is composed of $2^n$ intervals of length $3^{-n}$. Thus its length is $(2/3)^n$. This
number tends to 0 as $n$ goes to infinity (don't worry, we will define these things
rigorously later), and
$\mathrm{length}(C) \le \mathrm{length}(C_n) = (2/3)^n$
for all $n$, giving $\mathrm{length}(C) = 0$.
4. Although it looks like all that will remain in the end is the endpoints of the intervals
used to construct $C$, in fact there is much more. The set of such endpoints is countable,
whereas $C$ is uncountable. To see why this is true, note that each $x \in C$ can be given
an "address." The point $x$ is in $C_1$, so it is in exactly one of the two intervals of $C_1$;
assign the value 0 to $x$ if it is in the first and 1 if it is in the second. Similarly, the
set $C_2$ splits each interval of $C_1$ into two: give $x$ the value 0 if it is in the left such
interval and 1 if it is in the right. Continuing in this way, we can assign to $x$ an infinite
sequence of 0's and 1's:
$x \mapsto 0111000110101\ldots$
(In fact, this is nothing but the ternary expansion of $x$, replacing 2's by 1's.) The map
sending $x$ to a sequence is actually a bijection from $C$ to the set of sequences of 0's
and 1's, which we know is uncountable.
One example of an element of $C$ that is not an endpoint is 1/4: its address is
$1/4 \mapsto 010101\ldots$
(Endpoints have 1-term repeating addresses, like $1/3 \mapsto 011111\ldots$)
5. Every point of $C$ is a limit point of $C$. To show this, we will prove more. We will show
that for each $x \in C$ and each $r > 0$, there are points $y$ and $z$ in $(x - r, x + r)$ such
that $y \ne x \ne z$ and $y \in C$, $z \notin C$. To do this, choose $N$ such that $N > 1/r$ and note
that since $2^N > N$ we certainly have $3^N > N$, giving $3^{-N} < r$. Since $x \in C$ it follows
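The address map and the length computation from properties 3 and 4 are easy to experiment with. Below is a minimal sketch with exact fractions; the function names `address` and `stage_length` are ours, and `address` assumes its input lies in $[0, 1]$.

```python
from fractions import Fraction

def address(x, digits):
    """First `digits` entries of the 0/1 address of x in [0, 1],
    or None if x falls in a removed middle third (so x is not in C)."""
    out = []
    for _ in range(digits):
        if x <= Fraction(1, 3):
            out.append(0)        # left interval: rescale [0, 1/3] to [0, 1]
            x = 3 * x
        elif x >= Fraction(2, 3):
            out.append(1)        # right interval: rescale [2/3, 1] to [0, 1]
            x = 3 * x - 2
        else:
            return None
    return out

def stage_length(n):
    """Total length of C_n: 2^n intervals of length 3^(-n)."""
    return Fraction(2, 3) ** n

print(address(Fraction(1, 4), 6))   # non-endpoint with alternating address
print(address(Fraction(1, 3), 6))   # endpoint with eventually constant address
print(stage_length(3))
```

The alternating address of 1/4 and the eventually constant address of 1/3 match the two addresses displayed in property 4.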
Let's finish with an observation. In exercise 6, you are asked to show that if $(X, d)$ is
a metric space and $(O_n)$ is a countable collection of open dense subsets of $X$ then $\bigcap_{n=1}^{\infty} O_n$
is nonempty. From this we can actually derive uncountability of the real numbers. Indeed,
assume for a contradiction that $\mathbb{R}$ is countable and list its elements as $\{x_1, x_2, \ldots\}$. Define
$O_n = \mathbb{R} \setminus \{x_n\}$. Each $O_n$ is open and dense in $\mathbb{R}$, so the intersection of all $O_n$'s is nonempty.
This is a contradiction, since
$\bigcap_n O_n = \bigcap_n [\mathbb{R} \setminus \{x_n\}] = \emptyset$.
3.7 Exercises
1. For $X = \mathbb{R}^2$ define the function $d : X \times X \to \mathbb{R}$ by
Prove that $d$ is a metric on $X$. Describe the unit ball centered at the origin geometrically.
Repeat this question using
2. Let $F[0, 1]$ be the set of all bounded functions from $[0, 1]$ to $\mathbb{R}$. Show that $d$ is a metric,
where
$d(f, g) = \sup\{|f(x) - g(x)| : x \in [0, 1]\}$.
3. Let $X$ be the set of real valued sequences with only finitely many nonzero terms:
For an element $x \in X$ write $n(x)$ for the largest $i \in \mathbb{N}$ such that $x_i \ne 0$. Define the
function $d : X \times X \to \mathbb{R}$ by
$d(x, y) = \left( \sum_{i=1}^{\max\{n(x), n(y)\}} (x_i - y_i)^2 \right)^{1/2}$.
4. For each of the following examples, verify that the collection C is an open cover of E
and determine if it can be reduced to a finite subcover. If it can, give a finite subcover;
otherwise, show why it cannot be reduced.
(b) $E = [0, 1]$, $\mathcal{C} = \{(x - 10^{-4}, x + 10^{-4}) : x \in \mathbb{Q} \cap [0, 1]\}$.
5. Prove that an uncountable set $E \subset \mathbb{R}$ cannot have countably many limit points.
Hint. Argue by contradiction and assume that there is an uncountable set $E \subset \mathbb{R}$ such that $E'$, the
set of limit points of $E$, is countable. What can you say about $E \setminus E'$?
Let $O \subset \mathbb{R}$ be a nonempty open set and let $x \in O$. Define $O_x$ as the union of all open
intervals $I$ such that $x \in I$ and $I \subset O$. Prove that $O_x$ is a nonempty open interval.
7. Let $O \subset \mathbb{R}$ be a nonempty open set. By completing the following two steps, show that
there exists a countable collection $\mathcal{C}$ of open intervals such that
8. The Kuratowski closure and complement problem. For subsets $A$ of a metric space $X$,
consider two operations, the closure $\overline{A}$, and the complement $A^c = X \setminus A$. We can
perform these multiple times, as in $\overline{A^c}$, $(\overline{A})^c$, etc.
(a) Prove that, starting with a given $A$, one can form no more than 14 distinct sets by
applying the two operations successively.
(b) Letting $X = \mathbb{R}$, find a subset $A \subset \mathbb{R}$ for which the maximum of 14 is attained.
Hint to get started. Clearly $(A^c)^c = A$, so two complements in a row get you nothing
new. What about two closures in a row? See Rudin, Thm. 2.27.
Hint. Define a sequence of sets as follows. Choose $x_1 \in A_1$ and $r_1 > 0$ such that
$B_{r_1}(x_1) \subset A_1$. Then argue that there exists $x_2 \in A_1 \cap A_2$ and $r_2 > 0$ such that
$B_{r_2}(x_2) \subset B_{r_1/2}(x_1)$. Continuing, find infinite sequences $r_1, r_2, \ldots$ and $x_1, x_2, \ldots$ such
that $x_n \in \bigcap_{k=1}^{n} A_k$ and $B_{r_n}(x_n) \subset B_{r_{n-1}/2}(x_{n-1})$. Then, for each $n$, define
$B_n = B_{r_n/2}(x_n)$. What can you say about $\bigcap_n B_n$?
10. Show that both Q and R \ Q are dense in R with the usual metric.
11. We now extend the definition of dense. If E1 , E2 are subsets of a metric space X then
E1 is dense in E2 if each point of E2 is in E1 or is a limit point of E1 . Show that if
E1 , E2 , E3 are subsets of X such that E1 is dense in E2 and E2 is dense in E3 then E1
is dense in E3 .
12. We say a metric space $(X, d)$ has the finite intersection property if whenever $\mathcal{C}$ is a collection
of closed sets in $X$ such that each finite subcollection has nonempty intersection,
the full collection has nonempty intersection:
$\bigcap_{C \in \mathcal{C}} C \ne \emptyset$.
Show that $X$ has the finite intersection property if and only if $X$ is compact. (You
may use Rudin, Theorem 2.36.)
14. Let (X, d) be a metric space. We say that a subset E of X is limit point compact if
every infinite subset of E has a limit point in E. We have seen in class that if E is
compact then E is limit point compact. This exercise will serve to show the converse:
that if E is limit point compact then it is compact. For the following questions, fix a
subset E that is limit point compact.
(e) Use the previous parts to argue that E is compact.
Hint. Argue by contradiction and assume that there is an open cover $\mathcal{C}$ of $E$ that
cannot be reduced to a finite subcover. Begin with $\epsilon_1 = 1/2$ and apply part (b)
to get points $x^1_1, \ldots, x^1_{n_1} \in E$ such that $E \subset \bigcup_{k=1}^{n_1} B_{\epsilon_1}(x^1_k)$. Clearly
$E \subset \bigcup_{k=1}^{n_1} \left[ B_{\epsilon_1}(x^1_k) \cap E \right]$.
At least one of these sets, say $B_{\epsilon_1}(x^1_{j_1}) \cap E$, cannot be covered by a finite number
of sets from $\mathcal{C}$, or else we would have a contradiction. By parts (a) and (c), it has
the limit point property and $\mathcal{C}$ is a cover of it, so repeat the construction using
this set instead of $E$ and $\epsilon_2 = 1/4$. Continue, at step $n \ge 3$ using $\epsilon_n = 2^{-n}$, to
create a decreasing sequence of closed subsets of $E$. Use part (d).
15. In this exercise we will consider a construction similar to that of the Cantor set. We
will define a countable collection of subsets $\{E_n : n \ge 0\}$ of the interval $[0, 1]$ and we
will set $E = \bigcap_{n=0}^{\infty} E_n$.
Next, let $E_2$ be the set obtained by removing two subintervals, each of length 1/16,
from the middle of each piece of $E_1$. Thus
$E_2 = \left[0, \frac{5}{32}\right] \cup \left[\frac{7}{32}, \frac{3}{8}\right] \cup \left[\frac{5}{8}, \frac{25}{32}\right] \cup \left[\frac{27}{32}, 1\right]$.
4 Sequences
4.1 Definitions
Definition 4.1.1. Let $(X, d)$ be a metric space. A sequence is a function $f : \mathbb{N} \to X$.
We think of a sequence as a list of its elements. We typically write $x_1 = f(1)$, $x_2 = f(2)$
and forget about $f$, denoting the sequence as $(x_n)$ and the elements $x_1, x_2, \ldots$.
The most fundamental notion related to sequences is that of convergence.
Definition 4.1.2. A sequence $(x_n)$ converges to a point $x \in X$ if for every $\epsilon > 0$ there exists
$N$ such that if $n \ge N$ then
$d(x_n, x) < \epsilon$.
In this case we write $x_n \to x$.
We can think of proving convergence of a sequence as follows. We have a sequence $(x_n)$
and you tell me it has a limit $x$. I ask "Oh yeah? Well can you show that the terms of the
sequence get very close to $x$?" You say yes and I ask "Can you show that all but finitely
many terms are within distance $\epsilon = 1$ of $x$?" You say yes and provide an $N$ equal to 600.
Then you proceed to show me that all $x_n$ for $n \ge 600$ have $d(x_n, x) < 1$. Temporarily
satisfied, I ask, "Well you did it for 1, what about for $\epsilon = .00001$?" You then dream up
an $N$ equal to 40 billion such that for $n \ge N$, $d(x_n, x) < .00001$. This game can continue
indefinitely, and as long as you can come up with an $N$ for each of my values of $\epsilon$, then we
say $x_n$ converges to $x$.
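Example.
We all believe that the sequence $(x_n)$ given by $x_n = \frac{1}{n^2 + n}$ (in $\mathbb{R}$) converges to 0. How do
we prove it? Let $\epsilon > 0$. We want $|x_n - 0| < \epsilon$, so we solve:
$\frac{1}{n^2 + n} < \epsilon$, which is equivalent to $n^2 + n > \frac{1}{\epsilon}$.
This will certainly be true if $n \ge \frac{1}{\epsilon}$, so set
$N = \lceil 1/\epsilon \rceil$.
Now if $n \ge N$ then
$\frac{1}{n^2 + n} \le \frac{1}{N^2 + N} < \frac{1}{N^2} \le \epsilon$.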
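The $\epsilon$ and $N$ game for this example can be played by machine. The sketch below assumes the cutoff $N = \lceil 1/\epsilon \rceil$ (our reading of the choice of $N$ in the example) and only spot-checks finitely many terms, so it illustrates the proof rather than replacing it; the names `x`, `cutoff` and `check` are ours.

```python
from fractions import Fraction
from math import ceil

def x(n):
    """The n-th term 1/(n^2 + n), as an exact rational."""
    return Fraction(1, n * n + n)

def cutoff(eps):
    """N = ceil(1/eps), the cutoff assumed for this example."""
    return ceil(1 / Fraction(eps))

def check(eps, span=1000):
    """Spot check that |x_n - 0| < eps for n = N, ..., N + span - 1."""
    N = cutoff(eps)
    return all(x(n) < eps for n in range(N, N + span))

print(cutoff(Fraction(1, 100000)), check(Fraction(1, 100000)))
```

For the challenge $\epsilon = .00001$ from the discussion above, this cutoff gives $N = 100000$, far smaller than the 40 billion offered in the story, which is fine: any valid $N$ wins the game.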
In the previous example, to show convergence to something we could have noticed that
the sequence is monotonic and bounded.
Definition 4.1.3. A sequence $(x_n)$ in $\mathbb{R}$ is
1. monotone increasing if $x_n < x_{n+1}$ for all $n$ (monotone non-decreasing if $x_n \le x_{n+1}$)
and
2. monotone decreasing if $x_n > x_{n+1}$ for all $n$ (monotone non-increasing if $x_n \ge x_{n+1}$).
Theorem 4.1.4. If $(x_n)$ is monotone (any of the types above) and bounded (that is,
$\{x_n : n \in \mathbb{N}\}$ is bounded) then it converges.
Proof. Suppose $(x_n)$ is monotone increasing. The other cases are similar. Then
$X := \{x_n : n \in \mathbb{N}\}$
Definition 4.1.5. A sequence $(x_n)$ in a metric space $X$ is bounded if there exists $q \in X$ and
$M \in \mathbb{R}$ such that
$d(x_n, q) \le M$ for all $n \in \mathbb{N}$.
Note that this is the same as saying that the set $\{x_n : n \in \mathbb{N}\}$ is bounded.
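Theorem 4.1.6 (Rudin, Theorem 3.2). Let $(x_n)$ be a sequence in a metric space $X$.
1. $(x_n)$ converges to $x \in X$ if and only if every neighborhood of $x$ contains $x_n$ for all but
finitely many $n$.
Proof. Part 1 is just a restatement of the definition of a limit. For the second part, suppose
that $(x_n)$ converges to $x$ and to $y$. Let $\epsilon > 0$, so that there exist $N_1$ and $N_2$ such that
if $n \ge N_1$ then $d(x_n, x) < \epsilon/2$ and if $n \ge N_2$ then $d(x_n, y) < \epsilon/2$.
Taking $n = \max\{N_1, N_2\}$, the triangle inequality gives $d(x, y) \le d(x, x_n) + d(x_n, y) < \epsilon$.
Thus $d(x, y) < \epsilon$ for all $\epsilon > 0$; this is only possible if $d(x, y) = 0$ and thus $x = y$.
For part 3, suppose that $(x_n)$ converges to $x \in X$ and let $\epsilon = 1$. Then there exists $N$
such that if $n \ge N$ then $d(x, x_n) < 1$. Now choose
$r = 1 + \max\{d(x, x_1), \ldots, d(x, x_N)\}$.
It follows then that $d(x, x_n) < r$ for all $n$, and so the set $\{x_n : n \in \mathbb{N}\}$ is contained in $B_r(x)$.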
We now show part 4. Suppose $x$ is a limit point of $E$. For each $n$, choose any point (call
it $x_n$) in the set $B_{1/n}(x) \cap E$. We claim that this sequence of points $(x_n)$ converges to $x$. To
see this, let $\epsilon > 0$ and pick
$N = \lceil 1/\epsilon \rceil$.
Then if $n \ge N$, $d(x_n, x) < 1/n \le 1/N \le \epsilon$.
In the above, we see that a limit of a sequence is unique. This is in contrast to the limit
points (plural!) of a subset E of X. The points in E are in no particular order, and E may
have many limit points. But in a sequence, the points are ordered, and there can be at most
one limit as n runs through that chosen order.
In the case that the sequence is of real numbers, there is a nice compatibility with
arithmetic operations.
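Properties of real sequences. Let $(x_n)$ and $(y_n)$ be real sequences such that $x_n \to x$ and
$y_n \to y$.
1. $x_n + y_n \to x + y$.
2. If $c \in \mathbb{R}$ then $c x_n \to c x$.
3. $x_n y_n \to x y$.
4. If $y \ne 0$ and $y_n \ne 0$ for all $n \in \mathbb{N}$ then $\frac{x_n}{y_n} \to \frac{x}{y}$.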
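Proofs of properties. Many of these are similar so we will prove only 1 and 3. Rudin contains
all of the proofs. Suppose first that $x_n \to x$ and $y_n \to y$. Given $\epsilon > 0$ choose $N_1$ and $N_2$
such that
if $n \ge N_1$ then $|x_n - x| < \epsilon/2$ and
if $n \ge N_2$ then $|y_n - y| < \epsilon/2$.
Letting $N = \max\{N_1, N_2\}$, then if $n \ge N$ we have
$|(x_n + y_n) - (x + y)| \le |x_n - x| + |y_n - y| < \epsilon$.
Now note that since $(y_n)$ converges, it is bounded. Therefore we can find $M > 0$ such that
$|x| \le M$ and $|y_n| \le M$ for all $n$. Given $\epsilon > 0$ choose $N$ such that if $n \ge N$ then both
$|x_n - x| < \epsilon/(2M)$ and $|y_n - y| < \epsilon/(2M)$.
Then if $n \ge N$,
$|x_n y_n - xy| \le |y_n| |x_n - x| + |x| |y_n - y| < M \epsilon/(2M) + M \epsilon/(2M) = \epsilon$.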
Note that for the last item above we required $y_n \ne 0$ for all $n$. This is not necessary, for
the following reason.
Lemma 4.1.7. If $(y_n)$ is a real sequence such that $y_n \to y$ and $y \ne 0$ then $y_n = 0$ for at
most finitely many $n \in \mathbb{N}$.
Proof. Suppose that $y_n \to y$ with $y \ne 0$ and let $\epsilon = |y|$. Then there exists $N \in \mathbb{N}$ such that
if $n \ge N$ then $|y_n - y| < \epsilon$. By the triangle inequality, if $n \ge N$ then
$|y_n| \ge |y| - |y_n - y| > 0$,
giving $y_n \ne 0$.
The next lemma says that if we remove a finite number of terms from a convergent
sequence, this does not affect the limit.
Lemma 4.1.8. Let $(y_n)$ be a sequence in a metric space $X$. For a fixed $k \in \mathbb{N}$ define a
sequence $(z_n)$ by
$z_n = y_{n+k}$ for $n \in \mathbb{N}$.
Then $(y_n)$ converges if and only if $(z_n)$ does. If $y_n \to y$ then $z_n \to y$.
Proof. Suppose $y_n \to y$. If $\epsilon > 0$ we can pick $N \in \mathbb{N}$ such that $d(y_n, y) < \epsilon$ for $n \ge N$. For
$n \ge N$,
$d(z_n, y) = d(y_{n+k}, y) < \epsilon$,
since $n + k \ge N$ also. This means $z_n \to y$.
Conversely, if $z_n \to y$ then given $\epsilon > 0$ we can find $N \in \mathbb{N}$ such that $n \ge N$ implies that
$d(z_n, y) < \epsilon$. Define $N' = N + k$. Then if $n \ge N'$, we have $n - k \ge N$ and so
$d(y_n, y) = d(z_{n-k}, y) < \epsilon$.
Thus $y_n \to y$.
Now we can change the last property of real sequences as follows. If $(x_n)$ and $(y_n)$ are
real sequences such that $x_n \to x$ and $y_n \to y$ with $y \ne 0$ then $x_n/y_n \to x/y$. To do this,
we use the first lemma to find $k$ such that for all $n$, $y_{n+k} \ne 0$. Then we can consider the
sequences $(x_{n+k})$ and $(y_{n+k})$ and prove the property for them. Since they only differ from
$(x_n)$ and $(y_n)$ by a finite number of terms, the property also holds for $(x_n)$ and $(y_n)$.
We will mostly deal with sequences of real numbers (or elements of an arbitrary metric
space), but it is useful to understand convergence in $\mathbb{R}^k$, $k \ge 2$. It can be reformulated
in terms of convergence of each coordinate. That is, if $(x_n)$ is a sequence in $\mathbb{R}^k$, we write
$x_n = (x_n^{(1)}, \ldots, x_n^{(k)})$. The sequence $(x_n)$ converges to $x \in \mathbb{R}^k$ if and only if each coordinate
sequence $(x_n^{(j)})$ converges to $x^{(j)}$, the j-th coordinate of $x$.
Theorem 4.1.9. Let $(x_n)$ and $(y_n)$ be sequences in $\mathbb{R}^k$ and $(\beta_n)$ a sequence of real numbers.
1. $(x_n)$ converges to $x \in \mathbb{R}^k$ if and only if $x_n^{(j)} \to x^{(j)}$ (in $\mathbb{R}$) for all $j = 1, \ldots, k$.
2. If $x_n \to x$, $y_n \to y$ in $\mathbb{R}^k$ and $\beta_n \to \beta$ in $\mathbb{R}$, then
$x_n + y_n \to x + y$, $x_n \cdot y_n \to x \cdot y$, and $\beta_n x_n \to \beta x$.
Proof. The second part follows from the first part and properties of limits in $\mathbb{R}$ we discussed
above. To prove the first, suppose that $x_n \to x$ and let $j \in \{1, \ldots, k\}$. Given $\epsilon > 0$, let $N$
be such that $n \ge N$ implies that $|x_n - x| < \epsilon$. Then we have
$|x_n^{(j)} - x^{(j)}| = \sqrt{(x_n^{(j)} - x^{(j)})^2} \le \sqrt{(x_n^{(1)} - x^{(1)})^2 + \cdots + (x_n^{(k)} - x^{(k)})^2} = |x_n - x| < \epsilon$.
So $x_n^{(j)} \to x^{(j)}$.
For the converse, suppose that $x_n^{(j)} \to x^{(j)}$ for all $j = 1, \ldots, k$ and let $\epsilon > 0$. Pick
$N_1, \ldots, N_k$ such that for $j = 1, \ldots, k$, if $n \ge N_j$ then $|x_n^{(j)} - x^{(j)}| < \epsilon/\sqrt{k}$. Then for
$N = \max\{N_1, \ldots, N_k\}$ and $n \ge N$, we have
$|x_n - x| = \sqrt{(x_n^{(1)} - x^{(1)})^2 + \cdots + (x_n^{(k)} - x^{(k)})^2} < \sqrt{\epsilon^2/k + \cdots + \epsilon^2/k} = \epsilon$.
Definition 4.1.10. A real sequence $(x_n)$ converges to $\infty$ if for each $M > 0$ there exists
$N \in \mathbb{N}$ such that
$n \ge N$ implies $x_n > M$.
It converges to $-\infty$ if $(-x_n)$ converges to $\infty$.
4.2 Subsequences, Cauchy sequences and completeness
We now move back to sequences in general metric spaces. Sometimes the sequence does not
converge, but if we remove many of the terms we can make it converge. Another way to say
this is that a sequence might not converge but it may have a convergent subsequence.
Note that a sequence $(x_n)$ converges to $x$ if and only if each subsequence of $(x_n)$ converges
to $x$. To prove this, suppose first that $x_n \to x$ and let $(x_{n_k})$ be a subsequence. Given $\epsilon > 0$
we can find $N$ such that if $n \ge N$ then $d(x_n, x) < \epsilon$. Because $(n_k)$ is monotone increasing, it
follows that $n_k \ge k$ for all $k$, so choose $K = N$. Then for $k \ge K$, the element $x_{n_k}$ is a term
of the sequence $(x_n)$ with index at least equal to $N$, giving $d(x_{n_k}, x) < \epsilon$.
Conversely, suppose that each subsequence of $(x_n)$ converges to $x$. Then as $(x_n)$ is a
subsequence of itself, we also have $x_n \to x$!
The next theorem is one of the most important in the course. It is a restatement of
compactness; in general topological spaces, it is called sequential compactness.
Theorem 4.2.2. Let $(x_n)$ be a sequence in a compact metric space $X$. Then some subsequence
of $(x_n)$ converges to a point $x$ in $X$.
Proof. It may be that the set of sequence elements $\{x_n : n \in \mathbb{N}\}$ is finite. In this case, at
least one of these elements must appear in the sequence infinitely often. That is, there exists
$x \in \{x_n : n \in \mathbb{N}\}$ and a monotone increasing sequence $(n_k)$ such that $x_{n_k} = x$ for all $k$.
Clearly then $x_{n_k} \to x$ and the element $x \in X$ because the sequence terms are.
Otherwise, the set $\{x_n : n \in \mathbb{N}\}$ is infinite. Because compactness implies limit point compactness,
there exists $x \in X$ which is a limit point of this set. Then we build a subsequence
that converges to $x$ as follows. Since $d(x_n, x) < 1$ for infinitely many $n$, we can pick $n_1$ such
that $d(x_{n_1}, x) < 1$. Continuing in this fashion, at stage $i$ we note that $d(x_n, x) < 1/i$ for
infinitely many $n$, so we can pick $n_i > n_{i-1}$ such that $d(x_{n_i}, x) < 1/i$. Because $n_1 < n_2 < \cdots$,
the sequence $(x_{n_i})$ is a subsequence of $(x_n)$. Further, given $\epsilon > 0$, choose $I > 1/\epsilon$, so that if
$i \ge I$,
$d(x_{n_i}, x) \le 1/i \le 1/I < \epsilon$.
Proof. If $(x_n)$ is bounded in $\mathbb{R}^k$ then we can fit the set $\{x_n : n \in \mathbb{N}\}$ into a k-cell, which is
compact. Now, viewing $(x_n)$ as a sequence in this compact k-cell, we see by the previous
theorem that it has a convergent subsequence.
The last topic of the section is Cauchy sequences. The motivation is as follows. Many
times we are in a metric space $X$ that has "holes." For instance, we may consider $\mathbb{Q}$ as a
metric space inside of $\mathbb{R}$ (that is, using the metric $d(x, y) = |x - y|$ from $\mathbb{R}$). In this space,
the sequence
(1, 1.4, 1.41, 1.414, . . .)
does not converge (it should only converge in $\mathbb{R}$ to $\sqrt{2}$, but this element is not in our
space). Although we cannot talk about this sequence converging, that is, getting close to
some limit $x$, we can do the next best thing. We can say that the terms of the sequence get
close to each other.
Definition 4.2.4. Let $(x_n)$ be a sequence in a metric space $X$. We say that $(x_n)$ is Cauchy
if for each $\epsilon > 0$ there exists $N \in \mathbb{N}$ such that
$d(x_m, x_n) < \epsilon$ for all $m, n \ge N$.
Just like before, the number $N$ gives us a cutoff in the sequence after which all terms
are close to each other. Each convergent sequence $(x_n)$ (with some limit $x$) is Cauchy, for if
$\epsilon > 0$ then we can pick $N$ such that if $n \ge N$ then $d(x_n, x) < \epsilon/2$. Then for $m, n \ge N$,
$d(x_m, x_n) \le d(x_m, x) + d(x, x_n) < \epsilon/2 + \epsilon/2 = \epsilon$.
One reason a Cauchy sequence might not converge was illustrated above; the limit may
not be in the space. This is not possible, though, in a compact space.
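The decimal truncations of $\sqrt{2}$ give a concrete Cauchy sequence of rationals to experiment with. The sketch below builds the truncations exactly with integer square roots; the function name `x` and the cutoff used in the check are our own choices. Since $x_n \le \sqrt{2} < x_n + 10^{-n}$, any two terms with indices at least $N$ differ by less than $10^{-N}$.

```python
from fractions import Fraction
from math import isqrt

def x(n):
    """Truncation of sqrt(2) to n decimal places, as an exact rational."""
    return Fraction(isqrt(2 * 10 ** (2 * n)), 10 ** n)

# First terms: 1, 1.4, 1.41, 1.414
print([str(x(n)) for n in range(4)])

# Cauchy check for the cutoff N = 5: terms beyond N are within 10^(-N).
N = 5
close = all(abs(x(m) - x(n)) < Fraction(1, 10 ** N)
            for m in range(N, N + 20) for n in range(N, N + 20))

# No term squares to 2, consistent with the limit not lying in Q.
never_two = all(x(n) ** 2 != 2 for n in range(30))
print(close, never_two)
```

The check covers only finitely many index pairs, so it illustrates the definition rather than proving the Cauchy property.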
Theorem 4.2.5. If X is a compact metric space, then all Cauchy sequences in X converge.
Proof. Let $X$ be compact and $(x_n)$ a Cauchy sequence. By the previous theorem, $(x_n)$ has
a subsequence $(x_{n_k})$ such that $x_{n_k} \to x$, some point of $X$. We will show that since $(x_n)$
is already Cauchy, the full sequence must converge to $x$. The idea is to fix some element
$x_{n_k}$ of the subsequence which is close to $x$. This term is chosen far enough along the initial
sequence so that all terms are close to it, and thus close to $x$.
Let $\epsilon > 0$ and choose $N$ such that if $m, n \ge N$ then $d(x_m, x_n) < \epsilon/2$. Choose also some
$K$ such that if $k \ge K$ then $d(x_{n_k}, x) < \epsilon/2$. Last, set $N' = \max\{N, K\}$. Because $(n_k)$ is
monotone increasing, we have $n_k \ge k$, so we can fix $k \ge N'$ and then $n_k \ge N'$. Then for any $n \ge N'$, we have
$d(x_n, x) \le d(x_n, x_{n_k}) + d(x_{n_k}, x) < \epsilon/2 + \epsilon/2 = \epsilon$.
Definition 4.2.6. A metric space $X$ in which all Cauchy sequences converge is said to be
complete.
The above theorem says that compact spaces are complete. This is also true of $\mathbb{R}^k$,
though it is not compact.
Proof. Let $(x_n)$ be a Cauchy sequence. We claim that it is bounded. The proof is almost
the same as that of the fact that a convergent sequence is bounded. We can find $N$ such
that if $n, m \ge N$ then $d(x_n, x_m) < 1$. Therefore $d(x_n, x_N) < 1$ for all $n \ge N$. Putting
$R = \max\{d(x_N, x_1), \ldots, d(x_N, x_{N-1}), 1\}$, we then have $d(x_j, x_N) \le R$ for all $j$, so $(x_n)$ is
bounded.
Since $(x_n)$ is bounded, we can put it in a k-cell $C$. Then we can view the sequence
as being in the space $C$, which is compact. Now we use the fact that compact spaces are
complete, giving some $x \in C$ such that $x_n \to x$. But $x \in \mathbb{R}^k$, so we are done.
If $p > 0$ then $n^{-p} \to 0$.
$n^{1/n} \to 1$.
If $p > 0$ and $\alpha \in \mathbb{R}$ then $\frac{n^\alpha}{(1+p)^n} \to 0$.
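Assuming it holds for some $n$, we show it holds for $n + 1$. We have
$(x + y)^{n+1} = (x + y)(x + y)^n = (x + y) \sum_{j=0}^{n} \binom{n}{j} x^j y^{n-j}$
$= \sum_{j=0}^{n} \binom{n}{j} x^{j+1} y^{n-j} + \sum_{j=0}^{n} \binom{n}{j} x^j y^{n+1-j}$
$= \sum_{j=1}^{n+1} \binom{n}{j-1} x^j y^{n+1-j} + \sum_{j=0}^{n} \binom{n}{j} x^j y^{n+1-j}$
$= \binom{n}{0} y^{n+1} + \binom{n}{n} x^{n+1} + \sum_{j=1}^{n} \left[ \binom{n}{j-1} + \binom{n}{j} \right] x^j y^{n+1-j}$.
But now we use the identity $\binom{n}{j-1} + \binom{n}{j} = \binom{n+1}{j}$, valid for $n \ge 0$ and $j = 1, \ldots, n$. This gives
$y^{n+1} + x^{n+1} + \sum_{j=1}^{n} \binom{n+1}{j} x^j y^{n+1-j}$,
which is $\sum_{j=0}^{n+1} \binom{n+1}{j} x^j y^{n+1-j}$.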
Returning to the proof of the second limit, we first assume that $p > 1$ and set $y_n = p^{1/n} - 1$.
Computing,
$$p = (y_n + 1)^n \geq 1 + n y_n ,$$
where we have taken only the first two terms from the binomial theorem. This means
$$0 \leq y_n \leq \frac{p - 1}{n}$$
and letting $n \to \infty$ we get $y_n \to 0$, completing the proof in the case $p > 1$.
If $0 < p < 1$ then we consider $1/p$ and see that $(1/p)^{1/n} \to 1$. Taking reciprocals, we get
$p^{1/n} \to 1$.
For the third limit, we use a different term in the binomial theorem. Set $x_n = n^{1/n} - 1$
and compute
$$n = (1 + x_n)^n \geq \binom{n}{2} x_n^2 = \frac{n(n-1)}{2} x_n^2 ,$$
so $0 \leq x_n \leq \sqrt{\frac{2}{n-1}} \leq \sqrt{\frac{4}{n}}$ if $n \geq 2$. Since $n^{-1/2} \to 0$ we are done.
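The bound from the third limit can be checked numerically. The following is a minimal sketch, not part of the original notes; the function names `root_gap` and `binomial_bound` are illustrative choices.

```python
import math

def root_gap(n: int) -> float:
    """Return x_n = n**(1/n) - 1, the quantity shown to tend to 0."""
    return n ** (1.0 / n) - 1.0

def binomial_bound(n: int) -> float:
    """Upper bound sqrt(2/(n-1)) coming from the j = 2 binomial term (needs n >= 2)."""
    return math.sqrt(2.0 / (n - 1))

# Print x_n next to its bound for a few values of n.
for n in (2, 10, 100, 10_000, 1_000_000):
    print(f"n={n:>8}  x_n={root_gap(n):.6f}  bound={binomial_bound(n):.6f}")
```

Both columns shrink toward 0, with the bound always sitting above $x_n$, as the inequality predicts.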
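The fourth limit is a bit more difficult. Choose any $k \in \mathbb{N}$ with $k > \alpha$ and consider $n > 2k$. Then
$$(1 + p)^n \geq \binom{n}{k} p^k = \frac{n(n-1) \cdots (n-k+1)}{k!} p^k \geq \frac{(n/2)^k p^k}{k!} .$$
This gives
$$0 \leq \frac{n^\alpha}{(1+p)^n} \leq \frac{k!}{(p/2)^k} n^{\alpha - k} \to 0$$
since $\alpha < k$.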
The last limit is proved in Chapter 5 in the theorem on geometric series.
4.4 Exercises
1. For the following sequences, find the limit and prove your answer (using an $\epsilon$-$N$
argument).
(a) $x_n = \sqrt{n^2 + 1} - n$, $n \in \mathbb{N}$.
(b) $x_n = n 2^{-n}$, $n \in \mathbb{N}$.
2. Determine whether or not the following sequence converges. If it does not, give a
convergent subsequence (if one exists).
$$x_n = \sin\left(\frac{n\pi}{2}\right) + \cos(n\pi), \quad n \in \mathbb{N} .$$
3. Let $a_1, \ldots, a_k$ be positive numbers. Show that
$$\lim_{n \to \infty} \left( \frac{a_1^n + \cdots + a_k^n}{k} \right)^{1/n} = \max\{a_1, \ldots, a_k\} .$$
4. We have seen that if a metric space (X, d) is compact then it must be complete. In
this exercise we investigate the converse.
(a) Show that if X is complete then it need not be compact.
(b) We say that a metric space $(X, d)$ is totally bounded if for each $\epsilon > 0$ we can find
finitely many points $x_1, \ldots, x_n$ such that
$$X \subset \bigcup_{i=1}^{n} B_\epsilon(x_i) .$$
(Here, $B_r(x)$ is the neighborhood $\{y \in X : d(x, y) < r\}$.) Show that $X$ is compact
if and only if $X$ is both totally bounded and complete via the following steps.
i. Show that if $X$ is compact then $X$ is totally bounded.
ii. Assume that $X$ is totally bounded and let $E \subset X$ be infinite. We will try to
construct a limit point for $E$ in $X$. Begin by finding $x_1^{(1)}, \ldots, x_{n_1}^{(1)} \in X$ such
that
$$X \subset \bigcup_{i=1}^{n_1} B_{1/2}\big(x_i^{(1)}\big) .$$
Continue, at stage $n \geq 2$ choosing $x_{k_n}^{(n)} \in X$ such that $E_n = B_{2^{-n}}\big(x_{k_n}^{(n)}\big) \cap E_{n-1}$
is infinite. Show that $\big(x_{k_1}^{(1)}, x_{k_2}^{(2)}, \ldots\big)$ is a Cauchy sequence.
Hint. You may want to use the following fact (without proof). For $n \geq 1$,
define $s_n = 1/2 + 1/4 + \cdots + 1/2^n$. Then $(s_n)$ converges.
iii. Assume that $X$ is totally bounded and complete and show that any $E \subset X$
which is infinite has a limit point in $X$. Conclude that $X$ is compact.
(c) Show that if $X$ is totally bounded then it need not be compact.
5. Sometimes we want to analyze sequences that do not converge. For this purpose we
define upper and lower limits; numbers that exist for all real sequences. Let $(a_n)$ be a
sequence in $\mathbb{R}$ and for each $n \geq 1$ define
$$u_n = \sup\{a_k : k \geq n\} \quad \text{and} \quad l_n = \inf\{a_k : k \geq n\} .$$
(a) Show that $(u_n)$ and $(l_n)$ are monotonic. If $(a_n)$ is bounded, show that there exist
numbers $u, l \in \mathbb{R}$ such that $u_n \to u$ and $l_n \to l$. We denote these numbers as the
limit superior (upper limit) and limit inferior (lower limit) of $(a_n)$ and write
$$\limsup_{n \to \infty} a_n = u \quad \text{and} \quad \liminf_{n \to \infty} a_n = l .$$
(b) Give reasonable definitions of $\limsup_{n \to \infty} a_n$ and $\liminf_{n \to \infty} a_n$ in the unbounded
case. (Here your definitions should allow for the possibilities $\pm\infty$.)
(c) Show that (an ) converges if and only if
(You may want to separate into cases depending on whether the lim inf and/or
lim sup is finite or infinite.)
6. Let (an ) be a real sequence and write E for the set of all subsequential limits of (an );
that is,
Assume that (an ) is bounded and prove that lim supn an = sup E. Explain how you
would modify your proof to show that lim inf n an = inf E. (These results are also
true if (an ) is unbounded but you do not have to prove that.)
8. This problem is not assigned; it is just for fun. Let x0 = 1 and xn+1 = sin xn .
(a) Prove $x_n \to 0$.
(b) Find limn nxn .
5 Series
We now introduce series, which are special types of sequences. We will concentrate on them
for the next couple of lectures.
5.1 Definitions
Definition 5.1.1. Let $(x_n)$ be a real sequence. For each $n \in \mathbb{N}$, define the partial sum
$$s_n = x_1 + \cdots + x_n = \sum_{j=1}^{n} x_j .$$
We say that the series $\sum x_n$ converges if $(s_n)$ converges.
Just as before, the tail behavior is all that matters (we can chop off as many initial terms
as we want). In other words
$$\sum_{n=1}^{\infty} x_n \text{ converges iff } \sum_{n=N}^{\infty} x_n \text{ converges for each } N \geq 1 .$$
Proof. To prove this, we give a lemma that allows us to handle series of non-negative terms
more easily.
Lemma 5.1.4. Let $(x_n)$ be a sequence of non-negative terms. Then $\sum x_n$ converges if and
only if the sequence of partial sums $(s_n)$ is bounded.
Proof. This comes directly from the monotone convergence theorem. If $x_n \geq 0$ for all $n$,
then
$$s_{n+1} = s_n + x_{n+1} \geq s_n ,$$
giving that $(s_n)$ is monotone, and converges if and only if it is bounded.
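Returning to the proof, we will show that the partial sums of the harmonic series are
unbounded. Let $M > 0$ and choose $n$ of the form $n = 2^k$ for $k > 2M$. Then we give a lower
bound:
$$\begin{aligned}
s_n &= 1 + \frac12 + \frac13 + \cdots + \frac{1}{2^k} \\
&> 1 + \left( \frac12 + \frac13 \right) + \left( \frac14 + \frac15 + \frac16 + \frac17 \right) + \cdots + \left( \frac{1}{2^{k-1}} + \cdots + \frac{1}{2^k - 1} \right) \\
&> \frac12 + 2 \cdot \frac14 + 4 \cdot \frac18 + \cdots + 2^{k-1} \cdot \frac{1}{2^k} \\
&= \frac12 (1 + 1 + \cdots + 1) = \frac{k}{2} > M .
\end{aligned}$$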
So given any M > 0, there exists n such that sn > M . This implies that (sn ) is unbounded
and we are done.
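The lower bound $s_{2^k} > k/2$ is easy to observe numerically. This is a small illustrative sketch (the helper name `harmonic_partial_sum` is not from the notes); it only confirms the inequality for a few values of $k$ and proves nothing by itself.

```python
def harmonic_partial_sum(n: int) -> float:
    """Return s_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / j for j in range(1, n + 1))

# Compare s_n at n = 2**k with the lower bound k/2 from the grouping argument.
for k in range(1, 16):
    n = 2 ** k
    print(f"k={k:>2}  n={n:>6}  s_n={harmonic_partial_sum(n):.4f}  k/2={k / 2:.1f}")
```

The partial sums grow without bound, but very slowly: doubling $n$ adds only about $\log 2$ to the sum.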
In the proof above, we used an argument that can be generalized a bit.
Theorem 5.1.5 (Comparison test). Let $(x_n)$ and $(y_n)$ be non-negative real sequences such
that $x_n \geq y_n$ for all $n$.
1. If $\sum x_n$ converges, then so does $\sum y_n$.
2. If $\sum y_n$ diverges, then so does $\sum x_n$.
Proof. The first part is implied by the second, so we need only show the second. Write $(s_n)$
and $(t_n)$ for the partial sums
$$s_n = x_1 + \cdots + x_n \quad \text{and} \quad t_n = y_1 + \cdots + y_n .$$
Since $y_n \geq 0$ for all $n$ we can use the above lemma to say that $(t_n)$ is unbounded, so given
$M > 0$ choose $N$ such that $n \geq N$ implies that $t_n > M$. Now for such $n$,
$$s_n = x_1 + \cdots + x_n \geq y_1 + \cdots + y_n = t_n > M ,$$
2. In the first part, we do not even need $y_n \geq 0$ as long as we modify the statement.
Suppose that $(x_n)$ is non-negative such that $\sum x_n$ converges and $|y_n| \leq x_n$ for all $n$.
Then setting $s_n$ and $t_n$ as before, we can just show that $(t_n)$ is Cauchy. Since $(s_n)$ is,
given $\epsilon > 0$ we can find $N$ such that if $n > m \geq N$ then $|s_n - s_m| < \epsilon$. Then
To use the comparison test, let us first introduce one of the simplest series of all time.
Theorem 5.1.6 (Geometric series). For $a \in \mathbb{R}$ define a sequence $(x_n)$ by $x_n = a^n$. Then the
geometric series $\sum_n x_n$ converges if and only if $|a| < 1$. Furthermore,
$$\sum_{n=0}^{\infty} a^n = \frac{1}{1 - a} \quad \text{if } |a| < 1 .$$
Proof. The first thing to note is that $a^n \to 0$ if $|a| < 1$. We can prove this by showing that
$|a|^n \to 0$. So if $0 \leq |a| < 1$ then the sequence $|a|^n$ is monotone decreasing:
but as $|a| \neq 0$ we have $L = 0$.
Now continue to assume that $|a| < 1$ and compute the partial sum for $n \geq 1$
$$s_n = \frac{1 - a^{n+1}}{1 - a} .$$
We let $n \to \infty$ to get the result.
If $|a| \geq 1$ then the terms $a^n$ do not even go to zero, since $|a|^n \geq |a| \neq 0$, so the series
diverges.
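As a sanity check on the partial sum formula, the sketch below sums $a^0 + \cdots + a^n$ term by term and compares it with $(1 - a^{n+1})/(1 - a)$ and with the limit $1/(1 - a)$. The function names are illustrative and the values of $a$ are arbitrary test choices.

```python
def geometric_partial_sum(a: float, n: int) -> float:
    """Return s_n = a**0 + a**1 + ... + a**n, added term by term."""
    total, term = 0.0, 1.0
    for _ in range(n + 1):
        total += term
        term *= a
    return total

def closed_form(a: float, n: int) -> float:
    """Return (1 - a**(n+1)) / (1 - a), valid for a != 1."""
    return (1.0 - a ** (n + 1)) / (1.0 - a)

for a in (0.5, -0.9, 0.99):
    n = 2000
    print(a, geometric_partial_sum(a, n), closed_form(a, n), 1.0 / (1.0 - a))
```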
Now we can prove facts about the p-series.
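Theorem 5.1.7. The series $\sum n^{-p}$ converges if and only if $p > 1$.
Proof. For $p \leq 1$ we have $n^{-p} \geq 1/n$ and so the comparison test gives divergence. Suppose
then that $p > 1$. We can group terms as before: taking $n = 2^k - 1$,
$$\begin{aligned}
1 + \frac{1}{2^p} + \frac{1}{3^p} + \cdots + \frac{1}{(2^k - 1)^p}
&= 1 + \left( \frac{1}{2^p} + \frac{1}{3^p} \right) + \left( \frac{1}{4^p} + \frac{1}{5^p} + \frac{1}{6^p} + \frac{1}{7^p} \right) + \cdots + \left( \frac{1}{2^{p(k-1)}} + \cdots + \frac{1}{(2^k - 1)^p} \right) \\
&\leq 1 + 2 \cdot \frac{1}{2^p} + 4 \cdot \frac{1}{4^p} + \cdots + 2^{k-1} \cdot \frac{1}{2^{(k-1)p}} \\
&= 1 + 2^{1-p} + 4^{1-p} + \cdots + (2^{k-1})^{1-p} \\
&= (2^{1-p})^0 + (2^{1-p})^1 + \cdots + (2^{1-p})^{k-1} < \frac{1}{1 - 2^{1-p}} \quad \text{since } p > 1 .
\end{aligned}$$
This means that if $s_n = \sum_{j=1}^{n} j^{-p}$, then $s_{2^k - 1} \leq \frac{1}{1 - 2^{1-p}}$ for all $k$. Since $(s_n)$ is monotone, it
is then bounded and the series converges.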
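The bound $s_{2^k - 1} \leq 1/(1 - 2^{1-p})$ can be watched in action. This is a minimal sketch with illustrative names (`p_series_partial_sum`, `dyadic_bound`); the exponents tested are arbitrary choices with $p > 1$.

```python
def p_series_partial_sum(p: float, n: int) -> float:
    """Return s_n = sum of j**(-p) for j = 1..n."""
    return sum(j ** (-p) for j in range(1, n + 1))

def dyadic_bound(p: float) -> float:
    """Return 1 / (1 - 2**(1 - p)), the bound from the grouping argument (p > 1)."""
    return 1.0 / (1.0 - 2.0 ** (1.0 - p))

for p in (1.5, 2.0, 3.0):
    n = 2 ** 14 - 1
    print(f"p={p}  s_n={p_series_partial_sum(p, n):.6f}  bound={dyadic_bound(p):.6f}")
```

The bound is not sharp (for $p = 2$ it gives 2, while the partial sums stay below about 1.645), but boundedness is all the proof needs.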
$n \geq N$ implies $|x_n| \leq p^n$.
Now we just use the comparison test. Since $\sum p^n$ converges (as $0 < p < 1$), so does $\sum x_n$.
Suppose now that $\alpha > 1$. Recall from the homework that given a real sequence $(y_n)$,
there always exists a subsequence $(y_{n_k})$ such that $y_{n_k} \to \limsup_{n \to \infty} y_n$. So we can find an
increasing sequence $(n_k)$ such that $|x_{n_k}|^{1/n_k} \to \alpha$. Thus there exists $K$ such that
Last if $\alpha = 1$ we cannot tell anything. First $(1/n)^{1/n} \to 1$ but also $(1/n^2)^{1/n} =
\left( (1/n)^{1/n} \right)^2 \to 1$. Since $\sum 1/n$ diverges and $\sum 1/n^2$ converges, the root test tells us nothing.
Applications.
1. The series $\sum \frac{n^2}{2^n}$ converges. We can see this by the root test.
$$\limsup_{n \to \infty} \left( \frac{n^2}{2^n} \right)^{1/n} = \limsup_{n \to \infty} \frac{(n^{1/n})^2}{2} = 1/2 < 1 .$$
2. Power series. Let $x \in \mathbb{R}$ and for a given real sequence $(a_n)$, consider the series
$$\sum_{n=0}^{\infty} a_n x^n .$$
We would like to know for which values of $x$ this series converges. To solve for this, we
simply use the root test. Consider
Setting $\alpha = \limsup_{n \to \infty} |a_n|^{1/n}$, we find that the series converges if $|x| < 1/\alpha$ and
diverges if $|x| > 1/\alpha$. So it makes sense to define
$$R := 1/\alpha \text{ as the radius of convergence of the power series } \sum_{n=0}^{\infty} a_n x^n .$$
Of course we cannot tell from the root test what happens when $x = \pm R$.
If $|x_{n+1}| \geq |x_n| > 0$ for all $n \geq N_0$ (a fixed natural number) then $\sum x_n$ diverges.
Proof. Assume the limsup $\alpha$ is $< 1$. Then as before, choosing $p \in (\alpha, 1)$ we can find $N$ such
that if $n \geq N$ then $|x_{n+1}| < p|x_n|$. Iterating this from $n = N$ we find
Applications.
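1. The series $\sum x^n \frac{n^n}{n!}$ converges if $|x| < 1/C$, where $C = \sum_{n=0}^{\infty} 1/n!$. To see this, set
$b_n = x^n \frac{n^n}{n!}$:
$$\left| \frac{b_{n+1}}{b_n} \right| = |x| \frac{(n+1)^{n+1}}{n^n (n+1)} = |x| \left( 1 + \frac{1}{n} \right)^n .$$
But
$$\left( 1 + \frac{1}{n} \right)^n = \sum_{j=0}^{n} \binom{n}{j} n^{-j} = \sum_{j=0}^{n} \frac{n(n-1) \cdots (n-j+1)}{j! \, n^j} \leq \sum_{j=0}^{n} \frac{1}{j!} \leq C .$$
So $\limsup_{n \to \infty} \left| \frac{b_{n+1}}{b_n} \right| \leq C|x| < 1$.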
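The chain $(1 + 1/n)^n \leq \sum_{j=0}^{n} 1/j! \leq C$ can be checked numerically. In the sketch below the helper names are illustrative, and `math.e` is used as the value of $C = \sum_{n \geq 0} 1/n!$, which is the standard series for $e$.

```python
import math

def compound(n: int) -> float:
    """Return (1 + 1/n)**n."""
    return (1.0 + 1.0 / n) ** n

def factorial_sum(n: int) -> float:
    """Return the partial sum 1/0! + 1/1! + ... + 1/n!."""
    return sum(1.0 / math.factorial(j) for j in range(n + 1))

# Each row should be non-decreasing from left to right.
for n in (1, 2, 5, 10, 50):
    print(f"n={n:>2}  (1+1/n)^n={compound(n):.6f}  sum={factorial_sum(n):.6f}  C={math.e:.6f}")
```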
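2. Power series. Generally we can also test convergence of power series using the ratio
test. Considering the series $\sum a_n x^n$, we compute
$$\limsup_{n \to \infty} \left| \frac{a_{n+1} x^{n+1}}{a_n x^n} \right| = |x| \limsup_{n \to \infty} \left| \frac{a_{n+1}}{a_n} \right| = |x| \beta ,$$
where $\beta = \limsup_{n \to \infty} \left| \frac{a_{n+1}}{a_n} \right|$. So if $|x| < 1/\beta$ the series converges, whereas if $|x| \geq 1/\beta$
we cannot tell. However, if $\beta = \lim_{n \to \infty} \left| \frac{a_{n+1}}{a_n} \right|$ exists then for $|x| > 1/\beta$ we have
divergence.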
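Remark (from class). The root and ratio tests can give different answers. Consider the
sequence $(a_n)$ given by
$$a_n = \begin{cases} 1 & \text{if } n \text{ is even} \\ 2 & \text{if } n \text{ is odd} \end{cases} .$$
Then
$$\limsup_{n \to \infty} \left| \frac{a_{n+1}}{a_n} \right| = 2 \quad \text{but} \quad \limsup_{n \to \infty} (|a_n|)^{1/n} = 1 .$$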
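The two values in the remark can be approximated by taking a maximum over a long tail of the sequence. This is only a rough numerical stand-in for a limsup, and the helper names and the tail window (start 1000, length 100) are arbitrary illustrative choices.

```python
def a(n: int) -> float:
    """The sequence from the remark: 1 for even n, 2 for odd n."""
    return 1.0 if n % 2 == 0 else 2.0

def tail_sup_ratio(start: int, length: int) -> float:
    """Maximum of |a_{n+1} / a_n| over n in [start, start + length)."""
    return max(abs(a(n + 1) / a(n)) for n in range(start, start + length))

def tail_sup_root(start: int, length: int) -> float:
    """Maximum of |a_n|**(1/n) over n in [start, start + length)."""
    return max(abs(a(n)) ** (1.0 / n) for n in range(start, start + length))

print("ratio:", tail_sup_ratio(1000, 100))
print("root: ", tail_sup_root(1000, 100))
```

The ratio keeps returning to 2 no matter how far out the tail starts, while the root values approach 1.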
Theorem 5.3.1 (Dirichlet test). Let $(a_n)$ and $(b_n)$ be real sequences such that, setting $A_n =
\sum_{j=0}^{n} a_j$, we have $(A_n)$ bounded. If $(b_n)$ is monotonic with $b_n \to 0$ then $\sum a_n b_n$ converges.
Now since $(A_n)$ is bounded and $b_n \to 0$ we have $A_n b_n \to 0$. We can show this as follows.
Suppose that $|A_n| \leq M$ for all $n$ and let $\epsilon > 0$. Choose $N_0$ such that $n \geq N_0$ implies that
$|b_n| < \epsilon/M$. Then for $n \geq N_0$, $|A_n b_n| < M \epsilon/M = \epsilon$.
Since $A_N b_N \to 0$, the above representation gives that $\sum a_n b_n$ converges if and only
if $\sum A_n (b_n - b_{n+1})$ converges. But now we use the comparison test: $|A_n (b_n - b_{n+1})| \leq
M |b_n - b_{n+1}| = M (b_n - b_{n+1})$, where we have used monotonicity to get $b_n - b_{n+1} \geq 0$. But
the telescoping sum
$$\sum_{n=0}^{N} M (b_n - b_{n+1}) = M \sum_{n=0}^{N} (b_n - b_{n+1}) = M (b_0 - b_{N+1}) \to M b_0$$
converges as $N \to \infty$, so $\sum A_n (b_n - b_{n+1})$ converges, completing the proof.
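The representation used in the proof is a summation by parts. The display stating it did not survive in this copy, so the sketch below assumes the standard form $\sum_{n=0}^{N} a_n b_n = A_N b_N + \sum_{n=0}^{N-1} A_n (b_n - b_{n+1})$ and checks it numerically; the function names and the sample sequences $a_n = (-1)^n$, $b_n = 1/(n+1)$ are illustrative.

```python
from itertools import accumulate

def direct_sum(a: list, b: list) -> float:
    """Return sum of a_n * b_n for n = 0..N."""
    return sum(x * y for x, y in zip(a, b))

def summed_by_parts(a: list, b: list) -> float:
    """Return A_N * b_N + sum_{n=0}^{N-1} A_n * (b_n - b_{n+1}), with A_n = a_0 + ... + a_n."""
    A = list(accumulate(a))
    N = len(a) - 1
    return A[N] * b[N] + sum(A[n] * (b[n] - b[n + 1]) for n in range(N))

N = 50
a = [(-1.0) ** n for n in range(N + 1)]      # partial sums A_n stay in {0, 1}
b = [1.0 / (n + 1) for n in range(N + 1)]    # decreasing to 0
print(direct_sum(a, b), summed_by_parts(a, b))
```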
Note in the previous proof that we used a technique similar to integration by parts.
Recall from calculus that the integral $\int_a^b u(x) v(x) \, dx$ can be written as
$$\int_a^b u(x) v(x) \, dx = U(b) v(b) - U(a) v(a) - \int_a^b U(x) v'(x) \, dx ,$$
$$\sum (-1)^n / n \text{ converges although } \sum 1/n \text{ does not} .$$
2. For $n \in \mathbb{N}$, let $f(n)$ be the largest value of $k$ such that $2^k \leq n$ (this is the integer part
of $\log_2 n$). Then
$$\sum (-1)^n / f(n) \text{ converges} .$$
3. A series $\sum a_n$ is said to converge absolutely if $\sum |a_n|$ converges. If it does not converge
absolutely but does converge then we say $\sum a_n$ converges conditionally. It is a famous
theorem of Riemann that given any $L \in \mathbb{R}$ and a conditionally convergent series $\sum a_n$,
there is a rearrangement $(b_n)$ of the terms of $(a_n)$ such that $\sum b_n = L$. See the last
section of Rudin, Chapter 3 for more details.
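Riemann's theorem can be illustrated with the series $\sum (-1)^n / n$. The sketch below uses the usual greedy strategy (add unused positive terms while the running total is at or below the target, otherwise add unused negative terms). This is one common way to build such a rearrangement, not necessarily the construction in Rudin; the function name, the target 0.3, and the step count are illustrative.

```python
def rearranged_partial_sums(target: float, steps: int) -> list:
    """Greedy rearrangement of the terms (-1)**n / n, n >= 1, aiming at `target`."""
    pos = 2      # next unused even index, contributing +1/pos
    neg = 1      # next unused odd index, contributing -1/neg
    total = 0.0
    sums = []
    for _ in range(steps):
        if total <= target:
            total += 1.0 / pos
            pos += 2
        else:
            total -= 1.0 / neg
            neg += 2
        sums.append(total)
    return sums

sums = rearranged_partial_sums(target=0.3, steps=50_000)
print(sums[-1])
```

Because the positive and negative parts each diverge, both branches are taken infinitely often, so every term is eventually used exactly once.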
5.4 Exercises
1. Determine if the following series converge.
(a)
$$\sum_{n=1}^{\infty} \frac{n - 3}{n^2 - 6n + 10}$$
(b)
$$\sum_{n=1}^{\infty} \frac{n!}{1 \cdot 3 \cdots (2n - 1)}$$
Hint. Use Theorem 3.42, after multiplying by $\sin(x/2)$. Use the following identity,
which is valid for all $a, b \in \mathbb{R}$:
$$\sin a \sin b = \frac12 (\cos(a - b) - \cos(a + b)) .$$
Although we have not defined $\sin x$ you can use the fact that $|\sin x|$ and $|\cos x|$ are
both bounded by 1.
3. Because $\sum 1/n$ diverges, any series $\sum x_n$ with $x_n \geq 1/n$ for all $n$ must diverge, by the
comparison test. We might think then that if $\sum x_n$ converges and $x_n \geq 0$ for all $n$
then $x_n$ is smaller than $1/n$, in the sense that
$$\lim_{n \to \infty} n x_n = 0 .$$
(a) Show that this is false; that is, there exist convergent series $\sum x_n$ with $x_n \geq 0$ for
all $n$ such that $\{n x_n\}$ does not converge to 0.
(b) Show however that if $\{x_n\}$ is monotone non-increasing and non-negative with
$\sum x_n$ convergent, then $n x_n \to 0$.
Hint. Use Theorems 3.23 and 3.27.
4. Here we give a different proof of the alternating series test. Let $\{x_n\}$ be a real sequence
that is monotonically non-increasing and $x_n \to 0$.
5. Suppose that $(a_n)$ and $(b_n)$ are real sequences such that $b_n > 0$ for all $n$ and $\sum a_n$
converges. If
$$\lim_{n \to \infty} \frac{a_n}{b_n} = L \neq 0$$
then must $\sum b_n$ converge? (Here $L$ is a finite number.)
6. (a) For $0 \leq k \leq n$, recall the definition of the binomial coefficient
$$\binom{n}{k} = \frac{n!}{k! (n - k)!} .$$
$$(1 + b_1)(1 + b_2) \cdots (1 + b_m) \geq 1 + b_1 + \cdots + b_m .$$
6 Function limits and continuity
6.1 Function limits
So far we have only talked about limits for sequences. Now we step it up to functions. Pretty
quickly, though, we will see that we can relate function limits to sequence limits. The point
at which we consider the limit does not even need to be in the domain of the function f . It
is important also to notice that x is not allowed to equal x0 below.
Definition 6.1.1. Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces and $E \subset X$ with $x_0$ a limit point
of $E$. If $f : E \to Y$ then we write
$$\lim_{x \to x_0} f(x) = L$$
if for each $\epsilon > 0$ there exists $\delta > 0$ such that whenever $x \in E$ and $0 < d_X(x, x_0) < \delta$, it
follows that $d_Y(f(x), L) < \epsilon$.
Why is it important that $x_0$ be allowed to be only a limit point of $E$ (and therefore not
necessarily in $E$)? Consider $f : (0, \infty) \to \mathbb{R}$ defined by $f(x) = \frac{\sin x}{x}$. Then $f$ is not defined
at 0 but we know from calculus that it has a limit of 1 as $x \to 0$.
Here we are using $\epsilon$ as a measure of closeness just as before. We can imagine a dialogue
similar to what occurred for sequences: you say that $f(x)$ approaches $L$ as $x$ approaches $x_0$.
I say "Well, can you get $f(x)$ within .005 of $L$ as long as $x$ is close to $x_0$?" You say yes and
produce a value of $\delta = .01$. You then qualify this by saying, "As long as $x$ is within $\delta = .01$
of $x_0$ then $f(x)$ will be within .005 of $L$." This goes on and on, and if each time I give you
an $\epsilon > 0$ you manage to produce a corresponding $\delta > 0$, then we say the limit equals $L$.
As promised, there is an equivalent formulation of limits using sequences. Note that the
statement below must hold for all sequences $(x_n)$ in $E$ with $x_n \to x_0$ but $x_n \neq x_0$ for all $n$.
Proposition 6.1.2. Let $f : E \to Y$ and $x_0$ a limit point of $E$. We have $\lim_{x \to x_0} f(x) = L$ if
and only if for each sequence $(x_n)$ in $E$ such that $x_n \to x_0$ with $x_n \neq x_0$ for all $n$, it follows
that $f(x_n) \to L$.
Proof. Suppose first that $\lim_{x \to x_0} f(x) = L$ and let $(x_n)$ be a sequence in $E$ such that $x_n \to x_0$
and $x_n \neq x_0$ for all $n$. We must show that $f(x_n) \to L$. So, let $\epsilon > 0$ and choose $\delta > 0$
such that whenever $x \in E$ and $0 < d_X(x, x_0) < \delta$, we have $d_Y(f(x), L) < \epsilon$. Now since $x_n \to x_0$ we can pick
$N \in \mathbb{N}$ such that if $n \geq N$ then $d_X(x_n, x_0) < \delta$. For this $N$, if $n \geq N$ then
$$0 < d_X(x_n, x_0) < \delta, \text{ so } d_Y(f(x_n), L) < \epsilon ,$$
and it follows that $f(x_n) \to L$.
Suppose conversely that $f(x_n) \to L$ for all sequences $(x_n)$ in $E$ such that $x_n \to x_0$ and
$x_n \neq x_0$ for all $n$. By way of contradiction, assume that $\lim_{x \to x_0} f(x) = L$ does not hold.
So there must be at least one $\epsilon > 0$ such that for any $\delta > 0$ we try to find, there is always
an $x_\delta \in E$ with $0 < d_X(x_\delta, x_0) < \delta$ but $d_Y(f(x_\delta), L) \geq \epsilon$. So create a sequence of these, using
$\delta = 1/n$. In other words, for each $n \in \mathbb{N}$, pick $x_n \in E \setminus \{x_0\}$ such that $0 < d_X(x_n, x_0) < 1/n$
but $d_Y(f(x_n), L) \geq \epsilon$. (This is possible in part because $x_0$ is a limit point of $E$.) Then clearly
$x_n \to x_0$ with $x_n \neq x_0$ for all $n$ but we cannot have $f(x_n) \to L$. This is a contradiction.
One nice thing about the sequence formulation is that it allows us to immediately bring
over theorems about convergence for sequences. For instance,
3. $\lim_{x \to x_0} f(x) g(x) = LM$.
4. If $M \neq 0$ then
$$\lim_{x \to x_0} \frac{f(x)}{g(x)} = \frac{L}{M} .$$
6.2 Continuity
We now give the definition of continuity. Note that x0 must be an element of E, since f
needs to be defined there.
Corollary 6.2.2. The function $f$ is continuous at $x_0 \in E$ if and only if for each sequence
$(x_n)$ in $E$ with $x_n \to x_0$ we have $f(x_n) \to f(x_0)$.
Proof. This is just a consequence of the sequence theorem from last section.
There is yet another equivalent definition in terms of only open sets. This one is valid for
functions continuous on all of X (although there is a more technical one for continuity at a
point, but we will not get into that). To extend the theorem to functions that are continuous
on subsets E of X, one would need to talk about sets that are open in E.
Theorem 6.2.3. If $f : X \to Y$ then $f$ is continuous on $X$ if and only if for each open set
$O \subset Y$, the preimage
$$f^{-1}(O) = \{x \in X : f(x) \in O\}$$
is open in $X$.
Proof. Suppose that $f$ is continuous on $X$ and let $O \subset Y$ be open. We want to show that
$f^{-1}(O)$ is open. So choose $x_0 \in f^{-1}(O)$. Since $f(x_0) \in O$ (by definition) and $O$ is open we
can find $\epsilon > 0$ such that $B_\epsilon(f(x_0)) \subset O$. However $f$ is continuous at $x_0$ so there exists a
corresponding $\delta > 0$ such that if $x \in X$ with $d_X(x, x_0) < \delta$ then $d_Y(f(x), f(x_0)) < \epsilon$. So if
$x \in B_\delta(x_0)$ then $f(x) \in B_\epsilon(f(x_0))$. As $B_\epsilon(f(x_0))$ was chosen to be a subset of $O$, we find
$B_\delta(x_0) \subset f^{-1}(O)$. This means $x_0$ is an interior point of $f^{-1}(O)$ and this set is open.
Suppose that for each open $O \subset Y$ the set $f^{-1}(O)$ is open in $X$. To show $f$ is continuous
on $X$ we must show that $f$ is continuous at each $x_0 \in X$. So let $x_0 \in X$ and $\epsilon > 0$. The set
$B_\epsilon(f(x_0))$ is open in $Y$, so $f^{-1}(B_\epsilon(f(x_0)))$ is open in $X$. Because $x_0$ is an element of this
set (note that $f(x_0) \in B_\epsilon(f(x_0))$) it must be an interior point, so there is a $\delta > 0$ such that
$B_\delta(x_0) \subset f^{-1}(B_\epsilon(f(x_0)))$. Now if $d_X(x, x_0) < \delta$ then $d_Y(f(x), f(x_0)) < \epsilon$, so $f$ is continuous
at $x_0$.
It is difficult to get intuition about this definition, but let us give an example to illustrate
how it may work. Consider the function $f : \mathbb{R} \to \mathbb{R}$ given by
$$f(x) = \begin{cases} 1 & \text{if } x = 0 \\ 0 & \text{if } x \neq 0 \end{cases} .$$
To see this in terms of the other definition, look at the open set $(1/2, 3/2)$. Then
$$f^{-1}((1/2, 3/2)) = \{0\} ,$$
which is not open. This only proves, however, that $f$ is not continuous everywhere.
Corollary 6.2.4. $f$ is continuous on $X$ if and only if for each closed $C \subset Y$, the set $f^{-1}(C)$
is closed in $X$.
Proof. $f$ is continuous on $X$ if and only if for each open $O \subset Y$, the set $f^{-1}(O)$ is open in
$X$. If $C \subset Y$ is closed then $C^c$ is open in $Y$. Therefore
$$f^{-1}(C) = \left( f^{-1}(C^c) \right)^c \text{ is closed in } X .$$
To check this equality, we have $x \in f^{-1}(C)$ iff $f(x) \in C$ iff $f(x) \notin C^c$ iff $x \in \left( f^{-1}(C^c) \right)^c$.
The other direction is similar. If $f^{-1}(C)$ is closed in $X$ whenever $C$ is closed in $Y$, let $O$
be an open set in $Y$. Then $f^{-1}(O^c)$ is closed in $X$, giving
$$f^{-1}(O) = \left( f^{-1}(O^c) \right)^c \text{ open in } X .$$
Examples.
1. The simplest. Take $f : X \to X$ as $f(x) = x$. Then for each open $O \subset X$, $f^{-1}(O) = O$
is open in $X$. So $f$ is continuous on $X$.
2. Let $f : \mathbb{R} \to \mathbb{R}$ be
$$f(x) = \begin{cases} 1 & \text{if } x \in \mathbb{Q} \\ 0 & \text{if } x \notin \mathbb{Q} \end{cases} .$$
This function is continuous nowhere. If $x \in \mathbb{R}$ then suppose first $x$ is rational. Choose
a sequence of irrationals $(x_n)$ converging to $x$ (this is possible by the fact that $\mathbb{R} \setminus \mathbb{Q}$
is dense in $\mathbb{R}$, from the homework). Then $\lim_{n \to \infty} f(x_n) = 0 \neq 1 = f(x)$. A similar
argument holds for irrational $x$ and gives that $f$ is continuous nowhere.
Note that this conclusion cannot be obtained by showing that some open set $O$ has
$f^{-1}(O)$ not open. (Take for instance $O = (-1/2, 1/2)$.) This would only prove that $f$ is not
continuous everywhere.
3. The last function was discontinuous at the rationals and irrationals. This one is a
nasty function that will be discontinuous only at the rationals. For any $q \in \mathbb{Q}$ write
$q \sim (m, n)$ if $m/n$ is the lowest terms representation of $q$; that is, if $m, n$ are the
unique numbers with $m \in \mathbb{Z}$, $n \in \mathbb{N}$ and $m, n$ have no common prime factors. Then
define $f : \mathbb{R} \to \mathbb{R}$ by
$$f(x) = \begin{cases} \frac{1}{n} & \text{if } x \in \mathbb{Q} \text{ and } x \sim (m, n) \text{ for some } m \in \mathbb{Z} \\ 0 & \text{if } x \notin \mathbb{Q} \end{cases} .$$
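On rational inputs, the function in Example 3 can be evaluated exactly. The sketch below relies on Python's `fractions.Fraction`, which stores every rational in lowest terms with a positive denominator; the name `lowest_terms_value` is illustrative, and irrational inputs (where $f = 0$) cannot be represented this way.

```python
from fractions import Fraction

def lowest_terms_value(q: Fraction) -> Fraction:
    """Return 1/n, where q = m/n is written in lowest terms with n >= 1."""
    return Fraction(1, q.denominator)

# Rationals approaching sqrt(2) need larger and larger denominators,
# so the function values along them shrink toward 0.
approximations = [Fraction(1, 1), Fraction(3, 2), Fraction(7, 5), Fraction(17, 12), Fraction(41, 29)]
for q in approximations:
    print(q, float(q), lowest_terms_value(q))
```

This is the mechanism behind continuity at irrational points: rationals close to an irrational number must have large denominators.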
We saw last time that f : R R given by f (x) = x is continuous everywhere. We will
use this along with the following proposition to show that polynomials are also continuous.
1. f + g, af , and f g,
2. $f/g$ as long as $g(x_0) \neq 0$.
Proof. These follow using the limit properties from before; for example,
Any polynomial function is continuous on all of $\mathbb{R}$. That is, if $f(x) = a_n x^n + \cdots + a_1 x + a_0$
then $f$ is continuous on $\mathbb{R}$.
Every rational function is continuous at each point for which the denominator is
nonzero. That is, if $f(x) = g(x)/h(x)$, where $g$ and $h$ are polynomial functions, then
$f$ is continuous at $x_0$ if and only if $h(x_0) \neq 0$.
Proof. Let $\epsilon > 0$. Since $g$ is continuous at $f(x_0)$, we can choose $\delta' > 0$ such that if
$d_Y(y, f(x_0)) < \delta'$ then $d_Z(g(y), g(f(x_0))) < \epsilon$. Since $f$ is continuous at $x_0$, we can choose
$\delta > 0$ such that if $d_X(x, x_0) < \delta$ then $d_Y(f(x), f(x_0)) < \delta'$. Putting these together, if
$d_X(x, x_0) < \delta$ then $d_Y(f(x), f(x_0)) < \delta'$, giving $d_Z(g(f(x)), g(f(x_0))) < \epsilon$. This means
$d_Z((g \circ f)(x), (g \circ f)(x_0)) < \epsilon$ and $g \circ f$ is continuous at $x_0$.
Proof. We need to show that any open cover of $f(E)$ can be reduced to a finite subcover, so
let $\mathcal{C}$ be an open cover of $f(E)$. Define a collection $\mathcal{C}'$ of sets in $X$ by
$$\mathcal{C}' = \{f^{-1}(O) : O \in \mathcal{C}\} .$$
Because $f$ is continuous, each set in $\mathcal{C}'$ is open. Furthermore $\mathcal{C}'$ covers $E$, as every point
$x \in E$ is mapped to an element in $f(E)$, which is covered by some $O \in \mathcal{C}$. This means
that $\mathcal{C}'$ is an open cover of $E$, and compactness of $E$ allows us to reduce it to a finite subcover
$\{f^{-1}(O_1), \ldots, f^{-1}(O_n)\}$. We claim that $\{O_1, \ldots, O_n\}$ is a finite subcover of $f(E)$. To show
this, let $y \in f(E)$ so that there exists some $x \in E$ with $f(x) = y$. There exists $k$ with
$1 \leq k \leq n$ such that $x \in f^{-1}(O_k)$ and therefore $y = f(x) \in O_k$.
This theorem has many consequences.
Proof. From the theorem, the set f (E) is compact and therefore bounded.
The next is for continuous functions to R.
Proof. The set $f(E)$ is closed and bounded, so it contains its supremum, $y$. Since $y \in f(E)$
there exists $x_0 \in E$ such that $f(x_0) = y$. Then $f(x_0) \geq f(x)$ for all $x \in E$.
Continuous functions on compact sets actually satisfy a property that is stronger than
continuity. To explain this, consider the function $f : (0, \infty) \to \mathbb{R}$ given by $f(x) = 1/x$.
When we study continuity, what we are really interested in is how much a small change in $x$
will change the value of $f(x)$. (Recall continuity says that if we change $x$ by at most $\delta$ then
$f(x)$ will change by at most $\epsilon$.) Consider the effect of changing $x$ by a fixed amount, say .1,
for different values of $x$. If $x$ is large, like 100, then changing $x$ by .1 can change $f$ so that
it lies anywhere in the interval $(1/100.1, 1/99.9)$. If $x$ is small, like .15, then this same
change in $x$ changes $f$ to lie in the interval $(1/.25, 1/.05) = (4, 20)$. This is a much larger
interval, meaning that $f$ is more unstable to changes when $x$ is small compared to when $x$
is large.
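The comparison between $x = 100$ and $x = .15$ is easy to reproduce. This is a small sketch with illustrative names (`image_interval`, `image_width`); it assumes $0 < h < x$ so that the interval $(x - h, x + h)$ stays inside $(0, \infty)$.

```python
def image_interval(x: float, h: float) -> tuple:
    """Return the endpoints (1/(x+h), 1/(x-h)) of the image of (x-h, x+h) under t -> 1/t."""
    return (1.0 / (x + h), 1.0 / (x - h))

def image_width(x: float, h: float) -> float:
    """Return the length of the image interval."""
    lo, hi = image_interval(x, h)
    return hi - lo

# Same perturbation h = 0.1 at a large and at a small base point.
for x in (100.0, 0.15):
    print(x, image_interval(x, 0.1), image_width(x, 0.1))
```

The same change in $x$ moves $f$ by roughly $2 \times 10^{-5}$ near 100 and by about 16 near .15, which is the non-uniformity the text describes.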
This motivates the idea of uniform continuity. For a uniformly continuous function, the
measure of stability described above is uniform on the whole set. That is, there is an upper
bound to how unstable the function is to changes. This corresponds to a uniform $\delta > 0$ over
all $x$ for a given $\epsilon > 0$:
Proof. The idea of the proof is as follows. Since $f$ is continuous at each $x$, given $\epsilon > 0$ we
can find a $\delta_x > 0$ from the definition of continuity that works at $x$. These $\delta_x$-balls cover
$X$ and by compactness we can find only finitely many $x_i$'s such that these balls still cover
$X$. Taking the minimum of these numbers will give us the required (positive) $\delta$.
Let $\epsilon > 0$. For each $x \in X$, since $f$ is continuous at $x$, we can find $\delta_x > 0$ such that if
$d_X(x, x') < \delta_x$ then $d_Y(f(x), f(x')) < \epsilon/2$. The collection
$$\{B_{\delta_x/2}(x) : x \in X\}$$
is an open cover for $X$, so since $X$ is compact, we can find $x_1, \ldots, x_n \in X$ such that
$X \subset \bigcup_i B_{\delta_{x_i}/2}(x_i)$. Let
$$\delta = \min\{\delta_{x_1}/2, \ldots, \delta_{x_n}/2\} ;$$
we claim that if $x, y \in X$ satisfy $d_X(x, y) < \delta$ then $d_Y(f(x), f(y)) < \epsilon$. To prove this, pick
such $x$ and $y$. We can then find $i$ such that $d_X(x_i, x) < \delta_{x_i}/2$. By the triangle inequality we
then have
$$d_X(x_i, y) \leq d_X(x_i, x) + d_X(x, y) < \delta_{x_i}/2 + \delta \leq \delta_{x_i} .$$
This means by definition of $\delta_{x_i}$ that
$$d_Y(f(x), f(y)) \leq d_Y(f(x), f(x_i)) + d_Y(f(y), f(x_i)) < \epsilon/2 + \epsilon/2 = \epsilon .$$
Examples.
1. Not every continuous function on a non-compact set is uniformly continuous. If $E$
is any non-closed subset of $\mathbb{R}$ then there exists a continuous function on $E$ that is
both unbounded and not uniformly continuous. Take $x_0$ to be any limit point of $E$
that is not in $E$. Then $f(x) = (x - x_0)^{-1}$ is continuous but unbounded. Further $f$
is not uniformly continuous because there is no $\delta > 0$ such that for all $x, y \in E$ with
$|x - y| < \delta$ we have $|f(x) - f(y)| < 1$. If there were, we could just choose some $y \in E$
with $|y - x_0| < \delta/2$ and then deduce that all points $z$ within distance $\delta$ of $y$ have
$f(z) \leq f(y) + 1$. But this is impossible.
2. If $E$ is an unbounded subset of $\mathbb{R}$ then there is an unbounded continuous function on
$E$: just take $f(x) = x$.
3. The only polynomials that are uniformly continuous on all of $\mathbb{R}$ are those of degree at
most 1. Indeed, take $f(x) = a_n x^n + \cdots + a_1 x + a_0$ with $a_n \neq 0$ and $n \geq 2$ and assume
that there exists $\delta > 0$ such that if $|x - y| < \delta$ then $|f(x) - f(y)| < 1$. Then consider
points $x, y$ of the form $x, x + \delta/2$: you can check that
$$\{|f(x) - f(x + \delta/2)| : x \in \mathbb{R}\}$$
is unbounded, giving a contradiction. (The problem here is that for fixed $\delta$, the quantity
$|f(x) - f(x + \delta/2)|$ grows to infinity as $x \to \infty$. This is not the case if $n = 0$ or 1.)
There are other ways that functions can fail to be uniformly continuous. We will see later,
however, that any differentiable function with bounded derivative is uniformly continuous.
6.4 Connectedness and the IVT
We would like to prove the intermediate value theorem from calculus and the simplest way to
do this is to see that it is a consequence of a certain property of intervals in $\mathbb{R}$. Specifically,
an interval is connected. The definition of connectedness is somewhat strange so we will try
to motivate it. Instead of trying to envision what connectedness is, we will try to capture
what it is not. That is, we want to call a metric space disconnected if we can write it as a
union of two sets that do not intersect. There is a problem with this attempt at a definition,
as we can see by considering $\mathbb{R}$. Certainly we can write it as $(-\infty, 1/2) \cup [1/2, \infty)$ and these
sets do not intersect, but we still want to say that $\mathbb{R}$ is connected. The issue in this example
is that the sets are not separated enough from each other. That is, one set contains limit
points of the other. This problem is actually resolved if we require that both sets are open.
(But you have to think about how this resolves the issue.)
Definition 6.4.1. A metric space $X$ is disconnected if there exist non-empty open sets $O_1$
and $O_2$ in $X$ such that $X = O_1 \cup O_2$ but $O_1 \cap O_2 = \emptyset$. If $X$ is not disconnected we say it is
connected.
Connectedness and continuity also go well with each other.
Theorem 6.4.2. Let $X, Y$ be metric spaces and $f : X \to Y$ be continuous. If $X$ is connected
then the image set $f(X)$, viewed as a metric space itself, is connected.
Proof. As stated above, we view $f(X) \subset Y$ as a metric space itself, using the metric it
inherits from $Y$. To show that $f(X)$ is a connected space we will assume it is disconnected
and obtain a contradiction. So assume that we can write $f(X) = O_1 \cup O_2$ with $O_1$ and $O_2$
nonempty, disjoint, and open (in the space $f(X)$). We will produce from this a disconnection
of $X$ and obtain a contradiction.
Now consider $U_1 = f^{-1}(O_1)$ and $U_2 = f^{-1}(O_2)$. These are open sets in $X$ since $f$ is
continuous. Further they do not intersect: if $x$ is in their intersection, then $f(x) \in O_1 \cap O_2$,
which is empty. They are nonempty because, for example, if $y \in O_1$ (which is nonempty
by assumption) then because $O_1 \subset f(X)$, there exists $x \in X$ such that $f(x) = y$. This $x$ is
in $f^{-1}(O_1)$. Last, $U_1 \cup U_2 = X$, since every $x \in X$ has $f(x) \in f(X) = O_1 \cup O_2$.
So we find that $X$ is disconnected, a contradiction. This means $f(X)$ must have been
connected.
Theorem 6.4.3 (Intermediate value theorem). Let $f : [a, b] \to \mathbb{R}$ for $a < b$ be continuous.
Suppose that for some $L \in \mathbb{R}$,
$$f(a) < L < f(b) .$$
Then there exists $c \in (a, b)$ such that $f(c) = L$.
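Proof. Since $f$ is continuous and $[a, b]$ is connected, the space $f([a, b])$ is connected. Since
$$O_1 = f([a, b]) \cap (-\infty, L) \quad \text{and} \quad O_2 = f([a, b]) \cap (L, \infty)$$
are both nonempty (because $f(a) \in O_1$ and $f(b) \in O_2$), open in $f([a, b])$, and disjoint, it
cannot be that their union is equal to $f([a, b])$. Therefore $L \in f([a, b])$ and there exists
$c \in [a, b]$ with $f(c) = L$. Since $f(a) < L < f(b)$, $c$ is neither $a$ nor $b$, so $c \in (a, b)$.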
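The proof above is non-constructive: it goes through connectedness and does not say how to find $c$. A different, computational route is bisection, sketched below for illustration only. The function name `bisect_for_value`, the tolerance, and the sample function $f(x) = x^3 + x$ are all assumptions made for this example.

```python
from typing import Callable

def bisect_for_value(f: Callable[[float], float], a: float, b: float,
                     L: float, tol: float = 1e-12) -> float:
    """Approximate c in (a, b) with f(c) = L, assuming f is continuous and f(a) < L < f(b)."""
    lo, hi = a, b
    # Invariant: f(lo) < L <= f(hi), so a solution always lies in [lo, hi].
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(mid) < L:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def cubic(x: float) -> float:
    return x ** 3 + x

c = bisect_for_value(cubic, 0.0, 2.0, L=1.0)
print(c, cubic(c))
```

Each step halves the interval while preserving the sign condition at its endpoints, so the method returns a point where $f$ is within rounding error of $L$.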
6.5 Discontinuities
Let us spend a couple of minutes on types of discontinuities for real functions. Let $E \subset \mathbb{R}$,
$f : E \to \mathbb{R}$ and $x_0 \in E$. (Draw some pictures.)
$x_0$ is a removable discontinuity of $f$ if $\lim_{x \to x_0} f(x)$ exists but is not equal to $f(x_0)$.
$x_0$ is a simple discontinuity of $f$ if $\lim_{x \to x_0^-} f(x)$ exists, as does $\lim_{x \to x_0^+} f(x)$, but they
are not equal. Here the first limit is a left limit; that is, we are considering $f$ as being
defined on the metric space $E \cap (-\infty, x_0]$ and taking the limit in this space. The second
is a right limit, and we consider the space as $E \cap [x_0, \infty)$. This corresponds to saying,
for example, that
$$\lim_{x \to x_0^-} f(x) = L$$
if for each $\epsilon > 0$ there exists $\delta > 0$ such that if $x_0 - \delta < x < x_0$ then $|f(x) - L| < \epsilon$.
$x_0$ can be a discontinuity but not captured above. Consider $f : \mathbb{R} \to \mathbb{R}$ given by
$$f(x) = \begin{cases} \sin(1/x) & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 \end{cases} .$$
Here there is not even a limit as $x \to 0$. This is because we can find a sequence $(x_n)$
converging to 0 such that $(f(x_n))$ does not have a limit. Take
$$x_n = 2/(n\pi) .$$
6.6 Exercises
1. Let $f : [a, b] \to \mathbb{R}$ be continuous with $f(x) > 0$ for all $x \in [a, b]$. Show there exists
$\delta > 0$ such that $f(x) \geq \delta$ for all $x \in [a, b]$.
2. Determine if the following functions are continuous at $x = 0$. Prove your answer. (You
may use standard facts about trigonometric functions although we have not introduced
them rigorously.)
(a)
$$f(x) = \begin{cases} x \cos\frac{1}{x} & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 \end{cases} .$$
(b)
$$g(x) = \begin{cases} \sin\frac{1}{x} & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 \end{cases} .$$
Must $F$ be continuous?
4. In this problem we will show that there is no real-valued function that is continuous
exactly at the rationals. Fix any $f : \mathbb{R} \to \mathbb{R}$.
(a) Show that for each $n \in \mathbb{N}$, the set $A_n$ is open, where
$$A_n = \left\{ x : \exists \delta > 0 \text{ such that } |f(z) - f(y)| < \frac{1}{n} \text{ for all } y, z \in (x - \delta, x + \delta) \right\} .$$
$$I_1 = \{r \geq x_1 : [x_1, r] \subset O_1\} .$$
9. Show that the function $f$ given by $f(x) = 1/x$ is uniformly continuous on $[1, \infty)$.
10. Show that the function $f$ given by $f(x) = \sqrt{x}$ is uniformly continuous on $[0, \infty)$.
Hint. Use the fact that for $a, b \geq 0$ we have $\sqrt{a + b} \leq \sqrt{a} + \sqrt{b}$.
11. Show that the function $f$ given by $f(x) = \sin(1/x)$ is not uniformly continuous on
$(0, 1)$.
12. Suppose that $f : [0, \infty) \to \mathbb{R}$ is continuous and has a finite limit $\lim_{x \to \infty} f(x)$. Show
that $f$ is uniformly continuous.
13. Give an example of functions $f, g : [0, \infty) \to \mathbb{R}$ that are uniformly continuous but the
product $fg$ is not.
14. Let $f : \mathbb{R} \to \mathbb{R}$ be continuous with $f(f(x)) = x$ for all $x$. Show there exists $c \in \mathbb{R}$ such
that $f(c) = c$.
15. Let $p$ be a polynomial with real coefficients and odd degree. That is,
16. If $E \subset \mathbb{R}$ then a function $f : E \to \mathbb{R}$ is called Lipschitz if there exists $M > 0$ such that
$$|f(x) - f(y)| \leq M |x - y| \quad \text{for all } x, y \in E .$$
The smallest number such that the above inequality holds for all $x, y \in E$ is called the
Lipschitz constant for $f$.
17. Let $I$ be a closed interval. Let $f : I \to I$ and assume that $f$ is Lipschitz with Lipschitz
constant $A < 1$.
(a) Prove that there is a unique $y \in I$ with the following property. Choose $x_1 \in I$
and define $x_{n+1} = f(x_n)$ for all $n \in \mathbb{N}$. Then $x_n \to y$. This holds independently
of the choice of $x_1$.
(b) Show by counterexample that for (a) to work, we need $I$ to be closed.
(c) Choose $a_1, a_2, \ldots, a_k \in \mathbb{Q}$ with $a_i > 0$ for all $i$ and with $a_1 \cdots a_k > 1$. Starting
from any $x_1 > 0$, define a sequence $\{x_n\}$ by the continued fraction
$$x_n = \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{\cdots + \cfrac{1}{a_k + x_{n-1}}}}} .$$
Prove that $\{x_n\}$ converges. Prove that its limit is the root of a quadratic polynomial
with coefficients in $\mathbb{Q}$. In older books this is stated: "an infinite periodic
continued fraction is a quadratic surd." "The devil is the eternal surd in the
universal mathematic." C. S. Lewis, Perelandra.
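The iteration in part (c) is easy to experiment with before attempting a proof. The sketch below is illustrative only: the names `continued_fraction_step` and `iterate`, the choice $k = 1$, $a_1 = 2$, and the starting value are assumptions made for this example. In that case the fixed point equation $x = 1/(2 + x)$ is the quadratic $x^2 + 2x - 1 = 0$, whose positive root is $\sqrt{2} - 1$.

```python
import math

def continued_fraction_step(x: float, coeffs: list) -> float:
    """Map x_{n-1} to x_n = 1/(a_1 + 1/(a_2 + ... + 1/(a_k + x_{n-1})))."""
    for a_i in reversed(coeffs):   # work from the innermost level a_k outward to a_1
        x = 1.0 / (a_i + x)
    return x

def iterate(x1: float, coeffs: list, steps: int) -> float:
    """Apply the continued-fraction map `steps` times starting from x1."""
    x = x1
    for _ in range(steps):
        x = continued_fraction_step(x, coeffs)
    return x

limit = iterate(x1=5.0, coeffs=[2.0], steps=60)
print(limit, math.sqrt(2.0) - 1.0, limit ** 2 + 2.0 * limit - 1.0)
```

Trying other starting points $x_1 > 0$ gives the same limit, in line with the uniqueness statement in part (a).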
18. Let $f : I \to \mathbb{R}$ for some (open or closed) interval $I \subset \mathbb{R}$. We say that $f$ is convex if for
all $x, y \in I$ and $\lambda \in [0, 1]$,
$$f(\lambda x + (1 - \lambda) y) \leq \lambda f(x) + (1 - \lambda) f(y) .$$
(a) Reformulate the above condition in terms of a relation between the graph of $f$
and certain line segments.
(b) Suppose that $f : \mathbb{R} \to \mathbb{R}$ is convex and let $x < z < y$. Choose $\lambda = \frac{y - z}{y - x}$ to show
that
$$\frac{f(z) - f(x)}{z - x} \leq \frac{f(y) - f(x)}{y - x} .$$
Interpret this inequality in terms of the graph of $f$. Argue similarly to show that
$$\frac{f(y) - f(x)}{y - x} \leq \frac{f(y) - f(z)}{y - z} .$$
Combine these two to get
$$\frac{f(z) - f(x)}{z - x} \leq \frac{f(y) - f(z)}{y - z}$$
and interpret this inequality in terms of the graph of $f$.
(c) Suppose that $f : [a, b] \to \mathbb{R}$ is convex. Show that $f$ is continuous on $(a, b)$.
Hint. Let $[c, d]$ be a subinterval of $(a, b)$. Use the last inequality from (b) to show
that $f$ is Lipschitz on $[c, d]$ with Lipschitz constant bounded above by
$$\max\left\{ \frac{|f(c) - f(a)|}{|c - a|}, \frac{|f(b) - f(d)|}{|b - d|} \right\} .$$
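19. Suppose that $f : \mathbb{R} \to \mathbb{R}$ is continuous and satisfies
\[ f(x + y) = f(x) + f(y) \quad \text{for all } x, y \in \mathbb{R} . \]
(a) Show that there exists $c \in \mathbb{R}$ such that for all $x \in \mathbb{Z}$, $f(x) = cx$.
(b) Show that there exists $c \in \mathbb{R}$ such that for all $x \in \mathbb{Q}$, $f(x) = cx$.
(c) Show that there exists $c \in \mathbb{R}$ such that for all $x \in \mathbb{R}$, $f(x) = cx$.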
7 Derivatives
7.1 Introduction
Continuous functions are nicer than most functions. However we have seen that they can still
be rather weird (recall the function that equals 1/q at a rational expressed in lowest terms
as p/q). So we move on to study functions that are even nicer, and for this we henceforth
restrict to functions from R to R. We could start at the very bottom, first studying constant
functions f (x) = c and then linear functions f (x) = ax + b, then quadratics, etc. But I trust
you learned about these functions earlier. Noting that constant functions are just special
cases of linear ones, we set out to study functions that are somehow close to linear functions.
The idea we will pursue is that even if a function $f$ is wild, very close to a particular point $x_0$ it may be well represented by a linear function. For a good choice of a linear function $L$, it would make sense to hope that
\[ \lim_{x \to x_0} (f(x) - L(x)) = 0 . \]
If $f$ is already continuous then this is not much of a requirement: we just need $L(x_0) = f(x_0)$. So this just means that $L(x)$ can be written as $L(x) = a(x - x_0) + f(x_0)$.
We will look for a stronger requirement on the speed at which this difference converges to zero. It should go to zero faster than $x - x_0$ does (as $x \to x_0$). In other words, we will require that
\[ \lim_{x \to x_0} \frac{f(x) - L(x)}{x - x_0} = 0 , \quad \text{or in shorthand,} \quad f(x) - L(x) = o(x - x_0) . \]
Plugging in our form of $L$, this means
\[ \lim_{x \to x_0} \left[ \frac{f(x) - f(x_0)}{x - x_0} - a \right] = 0 , \]
or
\[ \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} = a . \]
Rewriting this with the notation above, we get
\[ f(x) = f(x_0) + a(x - x_0) + o(x - x_0) , \]
or setting $x = x_0 + h$,
\[ f(x_0 + h) = f(x_0) + ah + o(h) \]
as $h \to 0$. Again, the symbol $o(h)$ represents some term such that if we divide it by $h$ and take $h \to 0$, it goes to 0.
Definition 7.1.1. Let $f : (a, b) \to \mathbb{R}$. We say that $f$ is differentiable at $x_0 \in (a, b)$ if
\[ \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} \quad \text{exists} . \]
In this case we write $f'(x_0)$ for the limit.
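The linear-approximation point of view can be checked numerically. Below is a minimal sketch, assuming the illustrative function $f(x) = x^2$ and the point $x_0 = 1.5$; the helper names are hypothetical. For this $f$ the difference quotient equals $2x_0 + h$ exactly, so with $a = 2x_0$ the remainder divided by $h$ is just $h$, which tends to 0.

```python
def difference_quotient(f, x0, h):
    """(f(x0 + h) - f(x0)) / h, the quotient whose limit as h -> 0 defines f'(x0)."""
    return (f(x0 + h) - f(x0)) / h


def remainder_ratio(f, x0, a, h):
    """(f(x0 + h) - f(x0) - a*h) / h; this tends to 0 as h -> 0 exactly when a = f'(x0)."""
    return (f(x0 + h) - f(x0) - a * h) / h


def square(x):
    return x * x


x0 = 1.5
a = 2 * x0  # candidate slope; for f(x) = x^2 the derivative at x0 is 2*x0
for h in (1e-1, 1e-2, 1e-3):
    print(h, difference_quotient(square, x0, h), remainder_ratio(square, x0, a, h))
```

Trying a wrong slope, say `a = 2.9`, makes the last column settle near $0.1$ instead of 0, which is the numerical signature of a line that is not tangent.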
7.2 Properties
Proposition 7.2.1. Let $f : (a, b) \to \mathbb{R}$ be differentiable at $x_0$. Then $f$ is continuous at $x_0$.
Proof.
\[ \lim_{x \to x_0} f(x) = f(x_0) + \lim_{x \to x_0} [f(x) - f(x_0)] = f(x_0) + \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} \cdot \lim_{x \to x_0} (x - x_0) = f(x_0) . \]
The converse is not true. Consider the function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = |x|$. Then the derivative at 0 would be
\[ \lim_{x \to 0} \frac{|x|}{x} , \]
which does not exist (it has a right limit of 1 and a left limit of $-1$).
We will now play the same game as we did for continuity, trying to find which functions
are differentiable. Here are some examples.
1. $f(x) = x$:
\[ \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} = 1 . \]
So $f'(x)$ exists for all $x$ and equals 1.
2. $f(x) = x^n$ for $n \in \mathbb{N}$:
Proof. For the first we just use properties of limits:
Theorem 7.2.3 (Chain rule). Let $f : (a, b) \to (c, d)$ be differentiable at $x_0$ and $g : (c, d) \to \mathbb{R}$ be differentiable at $f(x_0)$. Then $g \circ f$ is differentiable at $x_0$ with derivative
\[ (g \circ f)'(x_0) = g'(f(x_0)) f'(x_0) . \]
Proof. We will want to use a division by $f(y) - f(x_0)$ for $y \ne x_0$, so we must first deal with the case that this could be 0. If there exists a sequence $(x_n)$ in $(a, b)$ with $x_n \to x_0$ but $x_n \ne x_0$ for all $n$ with $f(x_n) = f(x_0)$ for infinitely many $n$, we would have
In the other case, every sequence $(x_n)$ in $(a, b)$ with $x_n \to x_0$ and $x_n \ne x_0$ has $f(x_n) = f(x_0)$ for at most finitely many $n$. Then as $f$ is continuous at $x_0$, we have $f(x_n) \to f(x_0)$ with $f(x_n) \ne f(x_0)$ for all large $n$ and so
Examples.
1. We know $f(x) = |x|$ is continuous but not differentiable. To go one level deeper, consider
\[ f(x) = \begin{cases} x^2 & x \ge 0 \\ -x^2 & x < 0 \end{cases} . \]
The derivative at 0 is
\[ \lim_{h \to 0} \frac{f(0 + h)}{h} = 0 , \]
and the derivative elsewhere is
\[ f'(x) = \begin{cases} 2x & x > 0 \\ -2x & x < 0 \end{cases} . \]
2. The function
\[ f(x) = \begin{cases} x^3 & x \ge 0 \\ -x^3 & x < 0 \end{cases} \]
is in class $C^2$, as it has two continuous derivatives. But it is not three times differentiable.
7.3 Mean value theorem
We begin by looking at local extrema.
In the case that X is R, if f is differentiable at a local extreme point, then the derivative
must be zero.
Proposition 7.3.2. Let $f : (a, b) \to \mathbb{R}$ and suppose that $c \in (a, b)$ is a local extreme point for $f$. If $f'(c)$ exists then $f'(c) = 0$.
Proof. Let $c$ be a local max such that $f'(c)$ exists. Then there exists $r > 0$ such that for all $y$ with $|y - c| < r$, we have $f(y) \le f(c)$. Therefore, looking at only right limits,
\[ \lim_{y \to c^+} \frac{f(y) - f(c)}{y - c} \le 0 . \]
Looking only at left limits,
\[ \lim_{y \to c^-} \frac{f(y) - f(c)}{y - c} \ge 0 . \]
Putting these together, we find $f'(c) = 0$. The argument for local min is similar.
Theorem 7.3.3 (Rolle's theorem). For $a < b$, let $f : [a, b] \to \mathbb{R}$ be continuous such that $f$ is differentiable on $(a, b)$. If $f(a) = f(b)$ then there exists $c \in (a, b)$ such that $f'(c) = 0$.
Proof. If $f$ is constant on the interval then clearly the statement holds. Otherwise for some $d \in (a, b)$ we have $f(d) > f(a)$ or $f(d) < f(a)$. Let us consider the first case; the second is similar. By the extreme value theorem, $f$ takes a maximum on $[a, b]$ and since $f(d) > f(a)$ this max cannot occur at $a$ or $b$. So it occurs at some $c \in (a, b)$. Then $c$ is a local max as well, so we can apply the previous proposition to find $f'(c) = 0$.
An important corollary is the following.
Corollary 7.3.4 (Mean value theorem). For $a < b$ let $f : [a, b] \to \mathbb{R}$ be continuous such that $f$ is differentiable on $(a, b)$. There exists $c \in (a, b)$ such that
\[ f'(c) = \frac{f(b) - f(a)}{b - a} . \]
Proof. Define $L(x)$ to be the line that connects the points $(a, f(a))$ and $(b, f(b))$:
\[ L(x) = \frac{f(b) - f(a)}{b - a} (x - a) + f(a) . \]
Then the function $g = f - L$ satisfies $g(a) = g(b) = 0$. It is also continuous on $[a, b]$ and differentiable on $(a, b)$. Therefore by Rolle's theorem, we can find $c \in (a, b)$ such that $g'(c) = 0$. This gives
\[ 0 = g'(c) = f'(c) - L'(c) = f'(c) - \frac{f(b) - f(a)}{b - a} , \]
implying the corollary.
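The mean value theorem only asserts that such a $c$ exists, but in concrete cases it can be located numerically. The sketch below is an illustration under extra assumptions that the theorem itself does not need: it assumes $f'$ is continuous and that $f'$ minus the secant slope changes sign on $[a, b]$, so that bisection applies. The example $f(x) = x^3$ on $[0, 2]$ and the name `mvt_point` are illustrative choices.

```python
def mvt_point(fprime, slope, lo, hi, iterations=100):
    """Bisection for a point c in [lo, hi] with fprime(c) = slope.

    Assumes fprime is continuous and fprime - slope changes sign on [lo, hi].
    """
    g_lo = fprime(lo) - slope
    if g_lo * (fprime(hi) - slope) > 0:
        raise ValueError("fprime - slope does not change sign on [lo, hi]")
    for _ in range(iterations):
        mid = (lo + hi) / 2
        g_mid = fprime(mid) - slope
        if g_lo * g_mid <= 0:
            hi = mid                 # a root lies in [lo, mid]
        else:
            lo, g_lo = mid, g_mid    # a root lies in [mid, hi]
    return (lo + hi) / 2


a, b = 0.0, 2.0
secant_slope = (b ** 3 - a ** 3) / (b - a)             # (f(b) - f(a)) / (b - a) for f(x) = x^3
c = mvt_point(lambda x: 3 * x * x, secant_slope, a, b)  # solve f'(c) = secant slope
print(c)  # should agree with 2 / sqrt(3)
```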
The mean value theorem has a lot of consequences. It is one of the central tools to
analyze derivatives.
Proof. Suppose first that $f'(x) \ge 0$ for all $x \in (a, b)$. To show $f$ is non-decreasing, let $c < d$ in $(a, b)$. By the mean value theorem, there exists $x_0 \in (c, d)$ such that
\[ f'(x_0) = \frac{f(d) - f(c)}{d - c} . \]
But this quantity is nonnegative, giving $f(d) \ge f(c)$. The second follows by considering $-f$ instead of $f$. The third follows from the previous two.
Proof. The proof is exactly the same as that of the MVT but using the function $h : [a, b] \to \mathbb{R}$ given by
\[ h(x) = (f(b) - f(a)) g(x) - (g(b) - g(a)) f(x) . \]
Indeed, $h(a) = (f(b) - f(a)) g(a) - (g(b) - g(a)) f(a) = h(b)$, so applying Rolle's theorem, we find $c \in (a, b)$ such that $h'(c) = 0$.
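Theorem 7.4.2 (L'Hôpital's rule). Suppose $f, g : (a, b) \to \mathbb{R}$ are differentiable with $g'(x) \ne 0$ for all $x$, where $a < b < \infty$. Suppose that
\[ \frac{f'(x)}{g'(x)} \to A \quad \text{as } x \to a . \]
If $f(x) \to 0$ and $g(x) \to 0$ as $x \to a$, or if $g(x) \to +\infty$ as $x \to a$, then
\[ \frac{f(x)}{g(x)} \to A \quad \text{as } x \to a . \]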
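We will now show that if $x \in (a, a + \delta)$ then also $\left| \frac{f(x)}{g(x)} - A \right| < \epsilon$. Indeed, choose such an $x$ and then pick any $y \in (a, x)$. From the generalized MVT, there exists $c \in (y, x)$ such that
\[ \frac{f(x) - f(y)}{g(x) - g(y)} = \frac{f'(c)}{g'(c)} . \]
Note that the denominator is nonzero since $g$ is injective (just use the MVT). But since $c \in (a, a + \delta)$, we have
\[ \left| \frac{f(x) - f(y)}{g(x) - g(y)} - A \right| < \epsilon/2 . \]
Let $y \to a$ and we find the result.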
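In the second case, we suppose that $g(x) \to +\infty$ as $x \to a$. Again for $\epsilon > 0$ pick $\delta_1 > 0$ such that if $x \in (a, a + \delta_1)$ then
\[ \left| \frac{f'(x)}{g'(x)} - A \right| < \epsilon/2 . \]
Fix $x_0 \in (a, a + \delta_1)$. By the generalized MVT, for all $x \in (a, x_0)$,
\[ A - \epsilon/2 < \frac{f(x) - f(x_0)}{g(x) - g(x_0)} < A + \epsilon/2 . \tag{4} \]
Since $g(x) \to +\infty$ as $x \to a$,
\[ \lim_{\substack{x \to a \\ x \in (a, x_0)}} \frac{g(x) - g(x_0)}{g(x)} = 1 . \]
Therefore using equation (4), there exists $\delta_2 < \delta_1$ such that if $x \in (a, a + \delta_2)$ then
\[ A - 3\epsilon/4 < \frac{f(x) - f(x_0)}{g(x) - g(x_0)} \cdot \frac{g(x) - g(x_0)}{g(x)} < A + 3\epsilon/4 . \tag{5} \]
Also since $g(x) \to \infty$ as $x \to a$,
\[ \lim_{\substack{x \to a \\ x \in (a, x_0)}} \frac{f(x_0)}{g(x)} = 0 . \]
Therefore using (5) we can find $\delta_3 < \delta_2$ such that if $x \in (a, a + \delta_3)$ then
\[ A - \epsilon < \frac{f(x) - f(x_0)}{g(x) - g(x_0)} \cdot \frac{g(x) - g(x_0)}{g(x)} + \frac{f(x_0)}{g(x)} < A + \epsilon . \]
But this means
\[ A - \epsilon < \frac{f(x)}{g(x)} < A + \epsilon \quad \text{for all } x \in (a, a + \delta_3) . \]
This proves that $\frac{f(x)}{g(x)} \to A$ as $x \to a$.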
Proof. We will use the definition from the homework that $\limsup_{n \to \infty} b_n$ is the supremum of all subsequential limits of $(b_n)$. Let $S$ be the set of subsequential limits of $(y_n)$ and $T$ the corresponding set for $(x_n y_n)$. We will prove the case that $S$ and $T$ are bounded above; the other case is left as an exercise.
We claim that
\[ xS = T , \quad \text{where } xS = \{xs : s \in S\} . \]
To prove this, let $a \in xS$. Then there exists a subsequence $(y_{n_k})$ such that $y_{n_k} \to a/x$. Now $x_{n_k} y_{n_k} \to x \cdot a/x = a$, giving that $a \in T$. Conversely, let $b \in T$ so that there exists a subsequence $(x_{n_k} y_{n_k})$ such that $x_{n_k} y_{n_k} \to b$. Then $y_{n_k} = x_{n_k} y_{n_k} / x_{n_k} \to b/x$. This means that $b = x \cdot b/x \in xS$.
To finish the proof we show that $\sup T = x \sup S$. First if $t \in T$ we have $t/x \in S$, so $t/x \le \sup S$. Therefore $t \le x \sup S$ and $\sup T \le x \sup S$. Conversely if $s \in S$ then $xs \in T$, so $xs \le \sup T$, giving $s \le (1/x) \sup T$. This means $\sup S \le (1/x) \sup T$ and therefore $\sup T \ge x \sup S$.
To find the radius of convergence of $\sum_{n=0}^\infty n a_n x^{n-1}$, we use the root test:
\[ \limsup_{n \to \infty} (n |a_n|)^{1/n} = \limsup_{n \to \infty} n^{1/n} |a_n|^{1/n} . \]
Since $n^{1/n} \to 1$ we can use the previous lemma to get a limsup of $1/R$, where $R$ is the radius of convergence of $\sum_{n=0}^\infty a_n x^n$. This means the radius of convergence of the new series is also $R$.
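Step 2. The function $f$ given by $f(x) = \sum_{n=0}^\infty a_n x^n$ is differentiable at $x = 0$.
To prove this, we use $0 < |x| < R/2$ and compute
\[ \frac{f(x) - f(0)}{x - 0} = \frac{\sum_{n=0}^\infty a_n x^n - a_0}{x} = \sum_{n=1}^\infty a_n x^{n-1} . \]
Pulling off the first term,
\[ \left| \frac{f(x) - f(0)}{x - 0} - a_1 \right| = \left| \sum_{n=2}^\infty a_n x^{n-1} \right| = |x| \left| \sum_{n=2}^\infty a_n x^{n-2} \right| . \]
We can use the triangle inequality for the last sum to get
\[ \left| \frac{f(x) - f(0)}{x - 0} - a_1 \right| \le |x| \sum_{n=2}^\infty |a_n| |x|^{n-2} \le |x| \sum_{n=2}^\infty |a_n| (R/2)^{n-2} . \]
By the root test, the last series converges (since $R/2 < R$), so setting $C$ equal to it, we find
\[ \left| \frac{f(x) - f(0)}{x - 0} - a_1 \right| \le C |x| . \]
Now we can take the limit as $x \to 0$ and find
\[ \lim_{x \to 0} \left| \frac{f(x) - f(0)}{x - 0} - a_1 \right| = 0 , \quad \text{or} \quad \lim_{x \to 0} \frac{f(x) - f(0)}{x - 0} = a_1 . \]
This means $f'(0) = a_1$.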
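Step 3. We will now prove that $f$ is differentiable at all $|x| < R$. So take such an $x_0$ and use the binomial theorem:
\[ f(x) = \sum_{n=0}^\infty a_n (x - x_0 + x_0)^n = \sum_{n=0}^\infty a_n \left[ \sum_{j=0}^n \binom{n}{j} x_0^{n-j} (x - x_0)^j \right] = \sum_{n=0}^\infty \sum_{j=0}^\infty \mathbf{1}_{n \ge j} \binom{n}{j} a_n x_0^{n-j} (x - x_0)^j . \tag{6} \]
We now state a lemma.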
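Lemma 7.5.2. Let $a_{m,n}$, $m, n \ge 0$ be a double sequence. If $\sum_{n=0}^\infty \left[ \sum_{m=0}^\infty |a_{m,n}| \right]$ converges then
\[ \sum_{n=0}^\infty \left[ \sum_{m=0}^\infty a_{m,n} \right] = \sum_{m=0}^\infty \left[ \sum_{n=0}^\infty a_{m,n} \right] . \]
Proof. Let $\epsilon > 0$ and write $S$ for the left side above and $T$ for the right side above. For $M, N \in \mathbb{N}$, define
\[ S_{M,N} = \sum_{n=0}^N \sum_{m=0}^M a_{m,n} \quad \text{and} \quad T_{M,N} = \sum_{m=0}^M \sum_{n=0}^N a_{m,n} . \]
Clearly $S_{M,N} = T_{M,N}$ for all $M, N \in \mathbb{N}$. We claim that there exist $M_0, N_0$ such that if $M \ge M_0$ and $N \ge N_0$ then both $|S - S_{M,N}|$ and $|T - T_{M,N}|$ are less than $\epsilon/2$. We need only verify this for $S$ because the same argument works for $T$. Once we show that, we have
$< \epsilon/2$ .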
We now want to apply the lemma to the sum in (6). To do this, we must verify that
\[ \sum_{n=0}^\infty \left[ \sum_{j=0}^\infty \mathbf{1}_{n \ge j} \binom{n}{j} |a_n| |x_0|^{n-j} |x - x_0|^j \right] \]
converges. But using the binomial theorem again, this sum equals
\[ \sum_{n=0}^\infty |a_n| (|x_0| + |x - x_0|)^n , \]
which converges as long as $|x_0| + |x - x_0| < R$. So pick such an $x$ and we can exchange the order of summation:
\[ f(x) = \sum_{j=0}^\infty \left[ \sum_{n=j}^\infty \binom{n}{j} a_n x_0^{n-j} \right] (x - x_0)^j . \]
We can view this as a power series in $x - x_0$ by setting $g(x) = f(x + x_0)$ and seeing that for $|x| < R - |x_0|$,
\[ g(x) = \sum_{j=0}^\infty b_j x^j , \quad \text{with } b_j = \sum_{n=j}^\infty \binom{n}{j} a_n x_0^{n-j} . \]
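As a sanity check on the recentred coefficients $b_j = \sum_{n \ge j} \binom{n}{j} a_n x_0^{n-j}$, here is a minimal sketch for the geometric series, an illustrative choice with $a_n = 1$ and $R = 1$, so that $f(x) = 1/(1 - x)$. Expanding $1/(1 - x)$ around $x_0$ directly gives coefficients $1/(1 - x_0)^{j+1}$, and the truncated sums should reproduce them. The function names and the truncation length are assumptions of the sketch.

```python
from math import comb


def recentered_coefficient(a, x0, j, terms=400):
    """Approximate b_j = sum_{n >= j} C(n, j) * a(n) * x0**(n - j) by a finite partial sum."""
    return sum(comb(n, j) * a(n) * x0 ** (n - j) for n in range(j, j + terms))


def geometric(n):
    """Coefficients a_n = 1 of the geometric series sum x^n = 1 / (1 - x)."""
    return 1.0


x0 = 0.3
coefficients = [recentered_coefficient(geometric, x0, j) for j in range(3)]
expected = [1 / (1 - x0) ** (j + 1) for j in range(3)]
for b_j, e_j in zip(coefficients, expected):
    print(b_j, e_j)
```

In particular $b_1$ is the derivative of $f$ at $x_0$, here $1/(1 - x_0)^2$.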
Definition 7.6.1. A function $f : (a, b) \to \mathbb{R}$ is called analytic if it equals some power series $\sum_{n=0}^\infty a_n x^n$.
The question now becomes: is every $f \in C^\infty$ actually analytic? To try to answer this question we look at the derivatives of a power series: if $f(x) = \sum_{n=0}^\infty a_n x^n$, then
We can then ask, if $f$ is twice differentiable, can we find $c_2 \in (a, b)$ such that
\[ f(b) = f(a) + f'(a)(b - a) + \frac{f''(c_2)}{2} (b - a)^2 , \]
or a $c_3 \in (a, b)$ such that
Proof. See the proof in Rudin, Thm. 5.15. It is a repeated application of the mean value
theorem.
We get from this a corollary:
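Corollary 7.6.3. Suppose that $f : [a, b] \to \mathbb{R}$ has infinitely many derivatives; that is, $f \in C^\infty([a, b])$. Set
\[ M_n = \sup_{c \in (a, b)} \left| f^{(n)}(c) \right| . \]
If $\frac{M_n}{n!} (b - a)^n \to 0$ then
\[ f(b) = \sum_{n=0}^\infty \frac{f^{(n)}(a)}{n!} (b - a)^n . \]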
We can see that in this corollary it is necessary to have this bound on $M_n$. Take for example $f : [0, \infty) \to \mathbb{R}$ given by
\[ f(x) = \begin{cases} e^{-1/x} & \text{if } x > 0 \\ 0 & \text{if } x = 0 \end{cases} . \]
In this case, you can check that $f^{(n)}(0) = 0$ for all $n$. However, if $f(x) = \sum_{n=0}^\infty a_n x^n$ this would imply that $a_n = 0$ for all $n$, giving $f(x) = 0$ for all $x$, which is false.
This means in particular that we must not have the required growth on $f^{(n)}(x)$ to apply the corollary. If you compute the $n$-th derivative, you can try to see why the corollary does not apply; that is, why $f$ is not analytic. For instance, we have
\[ f'(x) = e^{-1/x} \frac{1}{x^2} , \quad f''(x) = e^{-1/x} \left( \frac{1}{x^4} - \frac{2}{x^3} \right) \quad \text{for } x > 0 \]
and the $n$-th derivative can be written as
where $P$ is a polynomial in $1/x$ of degree $2n$. For any given $r > 0$, you can show that
\[ \sup_{x \in [0, r]} \frac{\left| f^{(n)}(x) \right|}{n!} r^n , \]
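The closed forms for the first two derivatives of $e^{-1/x}$ can be checked against finite differences, and the extreme flatness of $f$ near 0 is easy to see numerically. This is a minimal sketch; the sample point $x = 0.4$, the step size, and the helper names are illustrative assumptions.

```python
import math


def f(x):
    """f(x) = exp(-1/x) for x > 0 and f(0) = 0."""
    return math.exp(-1.0 / x) if x > 0 else 0.0


def f_prime(x):
    """Closed form of f'(x) for x > 0."""
    return math.exp(-1.0 / x) / x ** 2


def f_second(x):
    """Closed form of f''(x) for x > 0."""
    return math.exp(-1.0 / x) * (1.0 / x ** 4 - 2.0 / x ** 3)


def central_difference(g, x, h=1e-5):
    """Symmetric difference quotient approximating g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)


x = 0.4
print(f_prime(x), central_difference(f, x))
print(f_second(x), central_difference(f_prime, x))

# Flatness at 0: even after dividing by x^10 the value at x = 0.01 is tiny.
print(f(0.01) / 0.01 ** 10)
```

The last line illustrates why every Taylor coefficient at 0 vanishes even though $f(x) > 0$ for $x > 0$.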
7.7 Exercises
1. Prove that for any $c \in \mathbb{R}$, the polynomial equation $x^3 - 3x + c = 0$ does not have two distinct roots in $[0, 1]$.
2. Suppose that $f : \mathbb{R} \to \mathbb{R}$ is differentiable and there exists $C < 1$ such that $|f'(x)| \le C$ for all $x$.
(a) Show that there exists a unique fixed point; that is, an $x$ such that $f(x) = x$.
(b) Show that if $f(0) > 0$ then the fixed point is positive.
3. Let $f : \mathbb{R} \to \mathbb{R}$ be continuous. Suppose that for some $a < b$, both of the following two conditions hold:
4. Assume $f$ on $[a, b]$ is continuous, and that $f'$ exists and is everywhere continuous and positive on $(a, b)$. Let $[c, d]$ be the image of $f$. Prove that $f$ has an inverse function $f^{-1} : [c, d] \to [a, b]$ and that the derivative of $f^{-1}$ is continuous on $(c, d)$.
5. Let $f : (-a, a) \to \mathbb{R}$. Assume there is a $C \in \mathbb{R}$ such that for all $x \in (-a, a)$, we have $|f(x) - x| \le C x^2$. Does $f'(0)$ exist? If so, what is it?
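8. Read example 5.6 in Rudin. Define $f : \mathbb{R} \to \mathbb{R}$ by
\[ f(x) = \begin{cases} x^{200} \sin \frac{1}{x} & x \ne 0 \\ 0 & x = 0 \end{cases} . \]
(a) For which $n \in \mathbb{N}$ does $f^{(n)}(0)$, the $n$-th derivative of $f$ at 0, exist?
(b) For which $n \in \mathbb{N}$ does $\lim_{x \to 0^+} f^{(n)}(x)$ exist?
(c) For which $n \in \mathbb{N}$ is $f \in C^n(\mathbb{R})$?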
11. (From J. Feldman.) In this problem we will construct a function that is continuous everywhere but differentiable nowhere. Define $g : \mathbb{R} \to \mathbb{R}$ by first setting for $x \in [0, 2]$,
\[ g(x) = \begin{cases} x & x \in [0, 1] \\ 2 - x & x \in [1, 2] \end{cases} . \]
Then for $x \notin [0, 2]$, define $g(x)$ so that it is periodic of period 2; that is, set $g(x) = g(\hat{x})$ for the unique $\hat{x} \in [0, 2)$ such that $x = \hat{x} + 2m$ for some $m \in \mathbb{Z}$. (The graph of $g$ forms a sequence of identical triangles with the $x$-axis, each of height 1 and base 2. Clearly $g$ is continuous.) For each $n \in \mathbb{N}$, define $f_n : [0, 1] \to \mathbb{R}$ by $f_n(x) = \left( \frac{3}{4} \right)^n g(4^n x)$.
(a) Make a sketch of $f_1$ and $f_2$ on $[0, 1]$. (Optional: use a computer algebra package to graph $f_1$, $f_1 + f_2$, $f_1 + f_2 + f_3$, etc.)
(b) Prove that the formula $f(x) = \sum_{n=1}^\infty f_n(x)$ defines a continuous function on $[0, 1]$.
(c) Complete the following steps to show that $f$ is not differentiable at any $x$.
Let $x \in [0, 1]$ and for each $m \in \mathbb{N}$, define $h_m$ to be either number in the set $\left\{ x - \frac{1}{2} 4^{-m} , x + \frac{1}{2} 4^{-m} \right\}$ such that there is no integer strictly between $4^m x$ and $4^m h_m$.
i. Show that
\[ \text{if } n > m \text{ then } \frac{f_n(h_m) - f_n(x)}{h_m - x} = 0 . \]
ii. Show that
\[ \text{if } n = m \text{ then } \left| \frac{f_n(h_m) - f_n(x)}{h_m - x} \right| = 3^m . \]
iii. Show that
\[ \text{if } n < m \text{ then } \left| \frac{f_n(h_m) - f_n(x)}{h_m - x} \right| \le 3^n . \]
Putting these three cases together, show that
\[ \left| \frac{f(h_m) - f(x)}{h_m - x} \right| \ge \frac{1}{2} (3^m + 3) . \]
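The final bound in this problem can be explored with exact rational arithmetic before proving it. The sketch below is an experiment, not a solution: it assumes the illustrative point $x = 3/10$, truncates the series after $m + 3$ terms (the terms with $n > m$ contribute nothing to the quotient, by step i), and uses hypothetical helper names. It relies only on `fractions.Fraction` and `math.floor` from the standard library.

```python
from fractions import Fraction
from math import floor


def g(x):
    """Triangle wave of period 2: g(x) = x on [0, 1] and 2 - x on [1, 2]."""
    xhat = x % 2
    return xhat if xhat <= 1 else 2 - xhat


def f_n(n, x):
    """f_n(x) = (3/4)^n * g(4^n * x)."""
    return Fraction(3, 4) ** n * g(4 ** n * x)


def partial_sum(N, x):
    """f_1(x) + ... + f_N(x)."""
    return sum(f_n(n, x) for n in range(1, N + 1))


def choose_h(m, x):
    """Pick h_m = x +/- (1/2) 4^(-m) with no integer strictly between 4^m x and 4^m h_m."""
    step = Fraction(1, 2 * 4 ** m)
    for h in (x + step, x - step):
        lo, hi = sorted((4 ** m * x, 4 ** m * h))
        if floor(lo) + 1 >= hi:  # the smallest integer above lo is not below hi
            return h
    raise ValueError("no admissible h_m found")


def difference_quotient(m, x, extra_terms=3):
    """Difference quotient of a partial sum between h_m and x."""
    h = choose_h(m, x)
    N = m + extra_terms
    return (partial_sum(N, h) - partial_sum(N, x)) / (h - x)


x = Fraction(3, 10)
quotients = {m: difference_quotient(m, x) for m in range(1, 7)}
for m, q in quotients.items():
    print(m, float(q), float(Fraction(3 ** m + 3, 2)))
```

The second printed column grows in absolute value like $3^m$, so the quotients cannot converge as $h_m \to x$.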
(a) Show that for any $x$, both series converge absolutely and define continuous functions. Show that $\cos 0 = 1$ and $\sin 0 = 0$.
(b) Show that the derivative of $\sin x$ is $\cos x$ and the derivative of $\cos x$ is $-\sin x$.
(c) Show that for any $x$, $\sin^2 x + \cos^2 x = 1$.
Hint. Take the derivative of the left side.
(d) For a given $a \in \mathbb{R}$ find the Taylor series of both $f(x) = \sin(a + x)$ and $g(x) = \cos(a + x)$ centered at $x = 0$.
(e) Use the previous part to show the identities
\[ \sin(x + y) = \sin x \cos y + \cos x \sin y \quad \text{and} \quad \cos(x + y) = \cos x \cos y - \sin x \sin y . \]
(a) Show that $S$ is nonempty.
Hint. Assume it is empty. Since $\cos 0 = 1$, show that then $\cos x$ would be positive for all $x > 0$ and therefore $\sin x$ would be strictly increasing. As $\sin x$ is bounded, it would have a limit as $x \to \infty$. Deduce then that $\cos x$ would also have a limit $L$. Show that $L = 2L^2 - 1$ and that we must have $L = 1$. Argue that this implies $\sin x$ is unbounded.
(b) Define
\[ \pi = 2 \inf S . \]
Show that $\cos \frac{\pi}{2} = 0$, $\sin \frac{\pi}{2} = 1$. Then prove that $\sin(x + 2\pi) = \sin x$ and $\cos(x + 2\pi) = \cos x$.
(c) Define $\tan x = \frac{\sin x}{\cos x}$ for all $x$ such that $\cos x \ne 0$. Show that $\tan \frac{\pi}{4} = 1$.
14. Please continue to use only the facts about trigonometry established in problems 9 and 10.
(a) Show that the derivative of $\tan x$ is $\sec^2 x$, where we define $\sec x = 1/\cos x$.
(b) From now on, restrict the domain of $\tan x$ to $(-\pi/2, \pi/2)$. Show that $\tan x$ is strictly increasing on this domain. Show that its image is $\mathbb{R}$. Therefore $\tan x$ has an inverse function $\arctan x$ mapping $\mathbb{R} \to (-\pi/2, \pi/2)$. By problem 1, $\arctan x$ is of class $C^1$, and in particular continuous.
(c) Show that $\sec^2(\arctan x) = 1 + x^2$ for all $x \in \mathbb{R}$. (It is not rigorous to draw a little right triangle with an angle $\theta = \arctan x$ in one corner. Problems 9-10 involve no notion of angle or two-dimensional geometry.)
(d) By the definition of inverse function, $\tan(\arctan x) = x$ for all $x \in \mathbb{R}$. Use the Chain Rule to show the derivative of $\arctan x$ is $\frac{1}{1 + x^2}$.
15. Abel's limit theorem. Suppose that $f : (-1, 1] \to \mathbb{R}$ is a function such that (a) $f$ is continuous at $x = 1$ and (b) for all $x \in (-1, 1)$, $f(x) = \sum_{n=0}^\infty a_n x^n$ for some power series that converges for all $x \in (-1, 1)$. If, in addition, $\sum_{n=0}^\infty a_n$ converges, prove that
\[ \sum_{n=0}^\infty a_n = f(1) . \]
Hint. For $x \in (-1, 1)$ write $f_n(x) = \sum_{k=0}^n a_k x^k$ and $A_n = \sum_{k=0}^n a_k$. Show that
Use the representation of $f(x)$ above to bound this difference for $x$ near 1.
8 Integration
The standard motivation for integration is to find the area under the graph of a function.
There are other very important reasons to study integration and one is that integration is
a smoothing operation: the (indefinite) integral of a function has more derivatives than the
original function does. Other motivations can be seen in abstract measure theory and the
application to, for instance, probability theory.
8.1 Definitions
We will start at the bottom and try to find the area under a graph. We will place boxes
under the graph and sum the area in these boxes. The x-coordinates of the sides of these
boxes form an (ordered) partition. Although we have used this word before, it will take a
new meaning here.
Definition 8.1.1. A partition $P$ of the interval $[a, b]$ is a finite set $\{x_1, \ldots, x_n\}$ such that
\[ a = x_1 < x_2 < \cdots < x_n = b . \]
Given a partition and a bounded function $f$ we can construct an upper sum and a lower sum. To do this, we consider a subinterval $[x_i, x_{i+1}]$ and let
\[ M_i = \sup_{x \in [x_i, x_{i+1}]} f(x) \quad \text{and} \quad m_i = \inf_{x \in [x_i, x_{i+1}]} f(x) . \]
A box with base $[x_i, x_{i+1}]$ and height $M_i$ contains the entire area below $f$ in this interval, whereas the box with the same base but height $m_i$ is contained in this area. (Here we are thinking of $f \ge 0$, so these statements are slightly different otherwise.) Counting up the area of these boxes, we get the following definitions.
Definition 8.1.2. Given a partition $P = \{x_1 < \cdots < x_n\}$ of $[a, b]$ and a bounded function $f : [a, b] \to \mathbb{R}$ we define the upper and lower sums of $f$ relative to the partition $P$ as
\[ U(f, P) = \sum_{i=1}^{n-1} M_i (x_{i+1} - x_i) \quad \text{and} \quad L(f, P) = \sum_{i=1}^{n-1} m_i (x_{i+1} - x_i) . \]
There is a useful monotonicity property of upper and lower sums. To state this, we use the following term. A partition $Q$ of $[a, b]$ is said to be a refinement of $P$ if $P \subset Q$. This means that we have just thrown in extra points to $P$ to form $Q$.
Proof. By iteration (or induction) it suffices to show the inequalities in the case that $Q$ has just one more point than $P$. So take $P = \{x_1 < \cdots < x_n\}$ and $Q = \{x_1 < \cdots < x_k < t < x_{k+1} < \cdots < x_n\}$. Since most intervals are unchanged,
\begin{align*}
U(f, P) - U(f, Q) &= M_k (x_{k+1} - x_k) - \left[ \sup_{y \in [x_k, t]} f(y) \right] (t - x_k) - \left[ \sup_{z \in [t, x_{k+1}]} f(z) \right] (x_{k+1} - t) \\
&\ge M_k (x_{k+1} - x_k) - M_k (t - x_k) - M_k (x_{k+1} - t) \\
&= 0 .
\end{align*}
The argument for lower sums is similar.
The above lemma says that upper sums decrease and lower sums increase when we add more points into the partition. Since we are thinking of taking very fine partitions, we define the upper and lower integrals
\[ \overline{\int_a^b} f(x) \, dx = \inf_P U(f, P) \quad \text{and} \quad \underline{\int_a^b} f(x) \, dx = \sup_P L(f, P) \]
for bounded $f : [a, b] \to \mathbb{R}$. Note that these are defined for all bounded $f$.
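To see upper and lower sums squeeze together under refinement, here is a minimal sketch for the illustrative function $f(x) = x^2$ on $[0, 1]$ with uniform partitions. It assumes $f$ is non-decreasing, so that the supremum and infimum on each subinterval are attained at the right and left endpoints; for a general bounded $f$ they would have to be computed some other way. Exact rational arithmetic keeps the comparisons free of rounding, and the function names are hypothetical.

```python
from fractions import Fraction


def upper_lower_sums(f, partition):
    """Upper and lower sums of a NON-DECREASING f: sup at right endpoints, inf at left endpoints."""
    pairs = list(zip(partition, partition[1:]))
    upper = sum(f(right) * (right - left) for left, right in pairs)
    lower = sum(f(left) * (right - left) for left, right in pairs)
    return upper, lower


def uniform_partition(n):
    """The partition {0, 1/n, 2/n, ..., 1} of [0, 1]."""
    return [Fraction(i, n) for i in range(n + 1)]


def square(x):
    return x * x


U4, L4 = upper_lower_sums(square, uniform_partition(4))
U8, L8 = upper_lower_sums(square, uniform_partition(8))  # a refinement of the first partition
print(L4, L8, U8, U4)  # lower sums increase, upper sums decrease
print(U8 - L8)         # the gap equals 1/8 for this f
```

Both sums stay on their own side of $1/3$, the value the upper and lower integrals turn out to share.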
Definition 8.1.4. If $f : [a, b] \to \mathbb{R}$ then $f$ is integrable (written $f \in \mathcal{R}([a, b])$) if
\[ \overline{\int_a^b} f(x) \, dx = \underline{\int_a^b} f(x) \, dx . \]
In this case we write $\int_a^b f(x) \, dx$ for the common value.
Note the following property of upper and lower sums and integrals.
For any partition $P$ of $[a, b]$ and bounded function $f : [a, b] \to \mathbb{R}$,
\[ L(f, P) \le \underline{\int_a^b} f(x) \, dx \le \overline{\int_a^b} f(x) \, dx \le U(f, P) . \]
Proof. The only inequality that is not obvious is the one between the integrals. To show this, we first let $\epsilon > 0$. By definition of the upper and lower integrals, there exist partitions $P$ and $Q$ of $[a, b]$ such that
\[ L(f, P) > \underline{\int_a^b} f(x) \, dx - \epsilon/2 \quad \text{and} \quad U(f, Q) < \overline{\int_a^b} f(x) \, dx + \epsilon/2 . \]
Taking $P'$ to be the common refinement of $P$ and $Q$ (that is, their union), we can use the previous lemma to find
\begin{align*}
\underline{\int_a^b} f(x) \, dx &< L(f, P) + \epsilon/2 \le L(f, P') + \epsilon/2 \le U(f, P') + \epsilon/2 \\
&\le U(f, Q) + \epsilon/2 < \overline{\int_a^b} f(x) \, dx + \epsilon .
\end{align*}
Taking $\epsilon \to 0$ we are done.
There is an equivalent characterization of integrability. It is useful because the condition
involves only one partition, whereas when dealing with both upper and lower integrals one
would need to approximate using two partitions.
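Theorem 8.1.5. Let $f : [a, b] \to \mathbb{R}$ be bounded. $f$ is integrable if and only if for each $\epsilon > 0$ there is a partition $P$ of $[a, b]$ such that $U(f, P) - L(f, P) < \epsilon$.
Proof. Suppose first that $f$ is integrable and let $\epsilon > 0$. Then the upper and lower integrals are equal. Choose partitions $P_1$ and $P_2$ such that $L(f, P_1) > \int_a^b f(x) \, dx - \epsilon/2$ and $U(f, P_2) < \int_a^b f(x) \, dx + \epsilon/2$. Taking $P$ to be the common refinement of $P_1$ and $P_2$ we find
\[ L(f, P) \ge L(f, P_1) > \int_a^b f(x) \, dx - \epsilon/2 \]
and
\[ U(f, P) \le U(f, P_2) < \int_a^b f(x) \, dx + \epsilon/2 . \]
Subtracting gives $U(f, P) - L(f, P) < \epsilon$.
Conversely, suppose that for each $\epsilon > 0$ there is a partition $P$ with $U(f, P) - L(f, P) < \epsilon$. Then
\[ \overline{\int_a^b} f(x) \, dx \le U(f, P) < L(f, P) + \epsilon \le \underline{\int_a^b} f(x) \, dx + \epsilon . \]
Since $\epsilon > 0$ is arbitrary, we find $\overline{\int_a^b} f(x) \, dx \le \underline{\int_a^b} f(x) \, dx$. The other inequality is obvious, so the upper and lower integrals are equal. In other words, $f \in \mathcal{R}$.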
Using this we can show that all continuous functions are integrable.
Proof. Since $[a, b]$ is compact, $f$ is uniformly continuous. Then given $\epsilon > 0$ we can find $\delta > 0$ such that if $x, y \in [a, b]$ with $|x - y| < \delta$ then $|f(x) - f(y)| < \epsilon/(2(b - a))$. Now construct any partition $P$ of $[a, b]$ such that, writing $P = \{x_1 < x_2 < \cdots < x_n\}$, we have $|x_i - x_{i+1}| < \delta$ for all $i = 1, \ldots, n - 1$. Then in each subinterval $[x_i, x_{i+1}]$, we have
So we know now that all continuous functions are integrable. There are some other
questions we need to resolve.
Examples.
We will now show that $f$ is not integrable on any $[a, b]$. Indeed, let $P$ be any partition of $[a, b]$, written as $\{x_1 < x_2 < \cdots < x_n\}$. Then for each subinterval $[x_i, x_{i+1}]$, we have
Therefore $U(f, P) - L(f, P) = \sum_{i=1}^{n-1} (M_i - m_i)(x_{i+1} - x_i) = b - a$. Choosing any $\epsilon > 0$ that is less than $b - a$, we see that there is no partition $P$ such that $U(f, P) - L(f, P) < \epsilon$. Therefore $f \notin \mathcal{R}$.
All bounded functions with countably many discontinuities are integrable. One example will be in the problem set. It is actually possible to show that some functions with uncountably many discontinuities are integrable, but we will not address this.
Let us prove a simple example, the function $f : [0, 1] \to \mathbb{R}$ given by
\[ f(x) = \begin{cases} 0 & x \le 1/2 \\ 1 & x > 1/2 \end{cases} . \]
Given $\epsilon > 0$ we construct a partition containing a very small subinterval around the discontinuity. Let $P = \{0 < 1/2 - \epsilon/3 < 1/2 + \epsilon/3 < 1\}$. Then
\begin{align*}
U(f, P) - L(f, P) &= \sum_{i=1}^{3} (M_i - m_i)(x_{i+1} - x_i) \\
&= 0 \cdot (1/2 - \epsilon/3) + 1 \cdot (2\epsilon/3) + 0 \cdot (1/2 - \epsilon/3) = 2\epsilon/3 < \epsilon .
\end{align*}
In this example we did not need to care about subintervals away from the discontinuity because the function is constant there (and thus has $M_i = m_i$). In general we would have to construct a partition with somewhat more complicated parts there too (possibly using continuity).
Similarly, $L(f, P_n) \to 1/3$. This means that $\overline{\int_0^1} f(x) \, dx \le 1/3$ and $\underline{\int_0^1} f(x) \, dx \ge 1/3$, giving
\[ \int_0^1 x^2 \, dx = 1/3 . \]
Proposition 8.2.1. Let $f, g : [a, b] \to \mathbb{R}$ be integrable and $c \in \mathbb{R}$.
and
\[ \int_a^b (cf)(x) \, dx = c \int_a^b f(x) \, dx . \]
Proof. Let us show item 1 first. For $\epsilon > 0$, take $P$ and $Q$ to be partitions such that
\[ L(f, P) \le \int_a^b f(x) \, dx \le U(f, P) < L(f, P) + \epsilon/2 \]
and
\[ L(g, Q) \le \int_a^b g(x) \, dx \le U(g, Q) < L(g, Q) + \epsilon/2 \]
(Here we have used that for bounded functions $h_1$ and $h_2$ and any set $S \subset \mathbb{R}$, $\inf_{x \in S} (h_1(x) + h_2(x)) \ge \inf_{x \in S} h_1(x) + \inf_{x \in S} h_2(x)$ and the corresponding statement for suprema.) So we find both
\[ U(f + g, P') - L(f + g, P') < \epsilon \]
and
\[ \left| L(f + g, P') - \int_a^b f(x) \, dx - \int_a^b g(x) \, dx \right| < \epsilon . \]
The first statement implies that $f + g$ is integrable and $\left| L(f + g, P') - \int_a^b (f + g)(x) \, dx \right| < \epsilon$. Combining this with the second statement gives
\[ \left| \int_a^b (f + g)(x) \, dx - \int_a^b f(x) \, dx - \int_a^b g(x) \, dx \right| < 2\epsilon . \]
\[ L(cf, P) \le \int_a^b (cf)(x) \, dx \le U(cf, P) < L(cf, P) + \epsilon , \]
giving $\left| \int_a^b (cf)(x) \, dx - L(cf, P) \right| < \epsilon$. Combining these two and taking $\epsilon \to 0$ proves $\int_a^b (cf)(x) \, dx = c \int_a^b f(x) \, dx$.
If instead $c < 0$ then we first prove the case $c = -1$. Then we have for any partition $P$ of $[a, b]$ that $U(-f, P) = -L(f, P)$ and $L(-f, P) = -U(f, P)$. Thus if $U(f, P) - L(f, P) < \epsilon$ we also have $U(-f, P) - L(-f, P) < \epsilon$, proving that $-f$ is integrable. Further, as above,
\[ L(-f, P) \le \int_a^b (-f)(x) \, dx < L(-f, P) + \epsilon \]
and
\[ -U(f, P) \le -\int_a^b f(x) \, dx < -U(f, P) + \epsilon . \]
Combining these and taking $\epsilon \to 0$ gives $\int_a^b (-f)(x) \, dx = -\int_a^b f(x) \, dx$. Last, for any $c < 0$ we note that if $f$ is integrable, so is $-f$ and since $-c > 0$, so is $(-c)(-f) = cf$. Further,
\begin{align*}
\int_a^b (cf)(x) \, dx &= \int_a^b (-(-cf))(x) \, dx = -\int_a^b (-cf)(x) \, dx = -(-c) \int_a^b f(x) \, dx \\
&= c \int_a^b f(x) \, dx .
\end{align*}
For the second item, we just use the fact that for every partition $P$ of $[a, b]$, $U(f, P) \le U(g, P)$ whenever $f(x) \le g(x)$ for all $x \in [a, b]$. So given $\epsilon > 0$, choose $P$ such that $U(g, P) < \int_a^b g(x) \, dx + \epsilon$. Now
\[ \int_a^b f(x) \, dx \le U(f, P) \le U(g, P) < \int_a^b g(x) \, dx + \epsilon . \]
This is true for all $\epsilon > 0$ so we deduce that $\int_a^b f(x) \, dx \le \int_a^b g(x) \, dx$.
We move to the third item. Given $\epsilon > 0$ choose a partition $P$ of $[a, b]$ such that $U(f, P) - L(f, P) < \epsilon$. Now refine $P$ to a partition $Q$ by adding the point $d$. Call $P_1$ the partition of $[a, d]$ obtained from the points of $Q$ up to $d$ and $P_2$ the remaining points of $Q$ (including $d$) that form a partition of $[d, b]$. Then
\[ U(f, P_1) - L(f, P_1) = \sum_{i : x_i < d} (M_i - m_i)(x_{i+1} - x_i) \le U(f, P) - L(f, P) < \epsilon . \]
This means $f$ is integrable on $[a, d]$. Similarly it is integrable on $[d, b]$. Furthermore, we have
\[ L(f, P_1) \le \int_a^d f(x) \, dx \le L(f, P_1) + \epsilon , \]
and
\[ L(f, P_2) \le \int_d^b f(x) \, dx \le L(f, P_2) + \epsilon . \]
Combining these with
\[ L(f, P_1) + L(f, P_2) = L(f, Q) \le \int_a^b f(x) \, dx \le L(f, P_1) + L(f, P_2) + \epsilon , \]
we find
\[ \left| \int_a^b f(x) \, dx - \int_a^d f(x) \, dx - \int_d^b f(x) \, dx \right| < 3\epsilon . \]
Taking $\epsilon$ to zero gives the result.
Let us give one more important property of the integral.
Proposition 8.2.2 (Triangle inequality for integrals). Let $f : [a, b] \to \mathbb{R}$ be integrable. Then so is $|f|$ and
\[ \left| \int_a^b f(x) \, dx \right| \le \int_a^b |f(x)| \, dx . \]
Proof. Let $\epsilon > 0$ and choose a partition $P$ of $[a, b]$ such that $U(f, P) - L(f, P) < \epsilon$. For the proof we use the fact (which you can check using the triangle inequality) that for any set $S \subset \mathbb{R}$ and bounded function $g : S \to \mathbb{R}$,
\[ \sup_{x \in S} |g(x)| - \inf_{x \in S} |g(x)| \le \sup_{x \in S} g(x) - \inf_{x \in S} g(x) . \]
This implies that
\[ U(|f|, P) - L(|f|, P) \le U(f, P) - L(f, P) < \epsilon , \]
so $|f| \in \mathcal{R}$.
To prove the inequality in the proposition, note that $f(x) \le |f(x)|$ for all $x$, so $\int_a^b f(x) \, dx \le \int_a^b |f(x)| \, dx$. Similarly $-f(x) \le |f(x)|$, so $-\int_a^b f(x) \, dx = \int_a^b (-f(x)) \, dx \le \int_a^b |f(x)| \, dx$. Combining these gives the inequality.
In fact this is an instance of a more general theorem, stated in Rudin. We will not prove it; the proof is similar to the above (but more complicated).
Theorem 8.2.3. Suppose that $f : [a, b] \to [c, d]$ is integrable and $\phi : [c, d] \to \mathbb{R}$ is continuous. Then $\phi \circ f$ is integrable.
Theorem 8.3.1 (Fundamental theorem of calculus part I). Let $f : [a, b] \to \mathbb{R}$ be integrable and $F : [a, b] \to \mathbb{R}$ a continuous function such that $F'(x) = f(x)$ for all $x \in (a, b)$. Then
\[ F(b) - F(a) = \int_a^b f(x) \, dx . \]
Proof. Since $f$ is integrable, given $\epsilon > 0$ we can find a partition $P$ such that $U(f, P) - L(f, P) < \epsilon$. We will use the mean value theorem to relate values of $f$ in the subintervals to values of $F$. That is, writing $P = \{x_1 < \cdots < x_n\}$, we can find for each $i = 1, \ldots, n - 1$ a point $c_i \in (x_i, x_{i+1})$ such that
\[ F(x_{i+1}) - F(x_i) = f(c_i)(x_{i+1} - x_i) . \]
Then we have
\[ L(f, P) \le \sum_{i=1}^{n-1} f(c_i)(x_{i+1} - x_i) \le L(f, P) + \epsilon . \]
Furthermore
\[ L(f, P) \le \int_a^b f(x) \, dx \le L(f, P) + \epsilon . \]
Using the equation derived by the mean value theorem above,
\[ \sum_{i=1}^{n-1} f(c_i)(x_{i+1} - x_i) = \sum_{i=1}^{n-1} [F(x_{i+1}) - F(x_i)] = F(b) - F(a) . \]
and
\[ \int_a^b x^n \, dx = \frac{1}{n + 1} \left( b^{n+1} - a^{n+1} \right) . \]
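The formula for $\int_a^b x^n \, dx$ can be compared with Riemann sums. The sketch below is a numerical illustration only: it uses midpoint sample points on a uniform partition (any sample sum lies between the lower and upper sums for the same partition), and the choices $n = 3$, $[a, b] = [1, 2]$ and the function names are assumptions of the example.

```python
def midpoint_sum(f, a, b, pieces):
    """Riemann sum of f over [a, b] on a uniform partition, sampling at midpoints."""
    width = (b - a) / pieces
    return sum(f(a + (i + 0.5) * width) for i in range(pieces)) * width


def power_integral(n, a, b):
    """The value F(b) - F(a) for the antiderivative F(x) = x^(n+1) / (n+1) of x^n."""
    return (b ** (n + 1) - a ** (n + 1)) / (n + 1)


a, b, n = 1.0, 2.0, 3
approximation = midpoint_sum(lambda x: x ** n, a, b, 1000)
exact = power_integral(n, a, b)
print(approximation, exact)
```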
There is a second fundamental theorem of calculus. Whereas the first is about integrating
a derivative, the second is about differentiating an integral. Both of them say that integration
and differentiation are inverse operations. For example, in the first, when we start with F
and differentiate to get a function f , we integrate back to get F (in a sense).
Theorem 8.3.2 (Fundamental theorem of calculus part II). Let $f : [a, b] \to \mathbb{R}$ be continuous. Define $F : [a, b] \to \mathbb{R}$ by
\[ F(x) = \int_a^x f(t) \, dt . \]
Then $F$ is differentiable on $[a, b]$ with $F'(x) = f(x)$ for all $x$.
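Proof. Let $x \in [a, b)$; the case of $x = b$ is similar and is calculated as a left derivative. For $h > 0$,
\[ \frac{F(x + h) - F(x)}{h} = \frac{1}{h} \left[ \int_a^{x+h} f(t) \, dt - \int_a^x f(t) \, dt \right] = \frac{1}{h} \int_x^{x+h} f(t) \, dt . \]
Let $\epsilon > 0$. Since $f$ is continuous at $x$ we can find $\delta > 0$ such that if $|t - x| < \delta$ then $|f(t) - f(x)| < \epsilon$. This means that if $0 < h < \delta$ then
\begin{align*}
\left| \frac{F(x + h) - F(x)}{h} - f(x) \right| &= \frac{1}{h} \left| \int_x^{x+h} f(t) \, dt - \int_x^{x+h} f(x) \, dt \right| = \frac{1}{h} \left| \int_x^{x+h} (f(t) - f(x)) \, dt \right| \\
&\le \frac{1}{h} \int_x^{x+h} |f(t) - f(x)| \, dt \\
&\le (1/h) h \epsilon = \epsilon .
\end{align*}
In other words,
\[ \lim_{h \to 0^+} \left[ \frac{F(x + h) - F(x)}{h} - f(x) \right] = 0 . \]
A similar argument works for the left limit (in the case that $x \ne a$), using
\[ \frac{F(x - h) - F(x)}{-h} = \frac{1}{-h} \int_x^{x-h} f(t) \, dt , \]
and completes the proof.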
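The key quantity in this proof is the average of $f$ over $[x, x + h]$, which is the difference quotient of $F$. Here is a minimal numerical sketch, assuming the illustrative choice $f = \cos$ and $x = 1$; the integral is approximated by a midpoint rule, and the helper names are hypothetical. Since $|\cos t - \cos x| \le |t - x|$, the average should be within $h$ of $\cos x$.

```python
import math


def midpoint_integral(f, lo, hi, pieces=1000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / pieces
    return sum(f(lo + (i + 0.5) * width) for i in range(pieces)) * width


def average_value(f, x, h):
    """(1/h) * integral of f over [x, x + h]; this equals (F(x + h) - F(x)) / h."""
    return midpoint_integral(f, x, x + h) / h


x = 1.0
for h in (1e-1, 1e-2, 1e-3):
    print(h, average_value(math.cos, x, h), math.cos(x))
```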
Just as the substitution rule is related to the chain rule, integration by parts is related
to the product rule.
Theorem 8.4.2 (Integration by parts). Let $f, g : [a, b] \to \mathbb{R}$ be $C^1$. Then
\[ \int_a^b f(x) g'(x) \, dx = f(b) g(b) - f(a) g(a) - \int_a^b f'(x) g(x) \, dx . \]
Proof. This follows from the product rule $(fg)' = f'g + fg'$ and the fundamental theorem of calculus, since both of $f'g$ and $fg'$ are integrable.
8.5 Exercises
1. Let $f : [0, 1] \to \mathbb{R}$ be continuous.
(a) Suppose that $f(x) \ge 0$ for all $x$ and that $\int_0^1 f(x) \, dx = 0$. Show that $f$ is identically zero.
(b) Suppose that $f$ is not necessarily non-negative but that $\int_a^b f(x) \, dx = 0$ for all $a, b \in [0, 1]$ with $a < b$. Show that $f$ is identically zero.
4. Define $f : [0, 1] \to \mathbb{R}$ by
\[ f(x) = \begin{cases} 0 & \text{if } x \notin \mathbb{Q} \\ \frac{1}{n} & \text{if } x = \frac{m}{n} \in \mathbb{Q}, \text{ where } m \text{ and } n \text{ have no common divisor} \end{cases} . \]
5. Let $f$ and $g$ be continuous functions on $[0, 1]$ with $g(x) \ge 0$ for all $x$. Show there exists $c \in [0, 1]$ such that
\[ \int_0^1 f(x) g(x) \, dx = f(c) \int_0^1 g(x) \, dx . \]
Hint. Write the above quantity as
\[ \frac{1}{n} + \sum_{k=1}^{n-1} \frac{1}{k} - \int_1^n \frac{dx}{x} = \frac{1}{n} + \sum_{k=1}^{n-1} \int_k^{k+1} \left( \frac{1}{k} - \frac{1}{x} \right) dx . \]
7. Let $\{f_n\}$ be a sequence of continuous functions on $[0, 1]$. Suppose that $\{f_n\}$ converges uniformly to a function $f$. Recall from last problem set that this means that for any $\epsilon > 0$ there exists $N$ such that $n \ge N$ implies that $|f_n(x) - f(x)| < \epsilon$ for all $x \in [0, 1]$. Show that
\[ \lim_{n \to \infty} \int_0^1 f_n(x) \, dx = \int_0^1 f(x) \, dx . \]
Give an example to show that we cannot only assume $f_n \to f$ pointwise (meaning that for each fixed $x \in [0, 1]$, $f_n(x) \to f(x)$).
Hint. Use the inequality $\left| \int_0^1 g(x) \, dx \right| \le \int_0^1 |g(x)| \, dx$, valid for any integrable $g$.
8. Suppose that $\{f_n\}$ is a sequence of functions in $C^1([0, 1])$ and that the sequence $\{f_n'\}$ converges uniformly to some function $g$. Suppose there exists some $c \in [0, 1]$ such that the sequence $\{f_n(c)\}$ converges. By the fundamental theorem of calculus, we can write for $x \in [0, 1]$
\[ f_n(x) = f_n(c) + \int_c^x f_n'(t) \, dt . \]
one can show using the Weierstrass $M$-test that for any $r$ with $0 < r < R$, $f_n' \to g$ uniformly on $(-r, r)$. We can then conclude that $f'(x) = g(x)$.
9. You can solve either this question or the next one. In this problem we will show part of Stirling's formula. It states that
\[ \lim_{n \to \infty} \frac{n!}{n^n e^{-n} \sqrt{n}} = \sqrt{2\pi} . \]
Use a change of variable $u = x/k$ and continue to show that this equals
\[ \sum_{k=1}^{n-1} \left[ k \int_0^{1/k} [\log(1 + u) - u] \, du + \frac{1}{2k} \right] + n - 1 - \log n . \]
(b) Prove that $\frac{n!}{n^n e^{-n} \sqrt{n}}$ converges if and only if
\[ \lim_{n \to \infty} \sum_{k=1}^{n-1} \left[ k \int_0^{1/k} [\log(1 + u) - u] \, du \right] \quad \text{exists} . \]
A Real powers
The question is the following: we know what $2^2$ or $2^3$ means, or even $2^{2/3}$, the number whose cube equals $2^2$. But what does $2^{\sqrt{2}}$ mean? We will give the definition Rudin has in the exercises of Chapter 1. We will only use the following facts for $r, s > 0$, $n, m \in \mathbb{Z}$:
$r^{n+m} = r^n r^m$.
$(r^n)^m = r^{mn}$.
$(rs)^n = r^n s^n$.
If $s < r$ and $n > 0$ then $s^n < r^n$. If $s < r$ and $n < 0$ then $s^n > r^n$.
\[ S = \{x > 0 : x^n \le r\} \]
and to show that $S$ is nonempty, bounded above, and thus has a supremum. Calling $y$ this supremum, he then shows $y^n = r$. The proof of this is somewhat involved and is similar to our proof (from the first lecture) that $\{a \in \mathbb{Q} : a^2 < 2\}$ does not have a greatest element.
To show there is only one such $y$, we note that $0 < y_1 < y_2$ implies that $y_1^n < y_2^n$ and so if $y_1 \ne y_2$ are positive then $y_1^n \ne y_2^n$.
This definition extends to integer roots.
Definition A.1.2. If r > 0 and n N we define r1/n as the unique positive real number y
such that y n = 1/r.
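The supremum construction of the $n$-th root can be imitated numerically. The following is a minimal sketch, assuming floating-point arithmetic and a hypothetical helper name; bisection is our choice for locating $\sup\{x > 0 : x^n \le r\}$ and is not the argument used in the text.

```python
def nth_root_sup(r, n, iterations=200):
    """Approximate sup{x > 0 : x**n <= r} for r > 0 and an integer n >= 1."""
    lo, hi = 0.0, max(1.0, r)  # hi**n >= r, so hi is an upper bound for the set
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if mid ** n <= r:
            lo = mid   # mid belongs to the set, so the supremum is at least mid
        else:
            hi = mid   # mid is an upper bound for the set
    return lo

if __name__ == "__main__":
    print(nth_root_sup(2.0, 2))      # approximates the square root of 2
    print(nth_root_sup(1 / 4.0, 2))  # the root of 1/r, as in Definition A.1.2
```

Applying the same routine to $1/r$ gives the number $y$ with $y^n = 1/r$ from Definition A.1.2.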
Proposition A.2.2. If a positive $a \in \mathbb{Q}$ can be represented by $m/n$ and $p/q$ for $m, n, p, q \in \mathbb{N}$
then for all $r > 0$,
\[ r^{m/n} = r^{p/q} . \]
Proof. First note that $(r^{m/n})^{nq} = ((r^{m/n})^n)^q = r^{mq}$ and $(r^{p/q})^{nq} = ((r^{p/q})^q)^n = r^{pn}$. However
as $m/n = p/q$ we have $pn = mq$ and so these numbers are equal. There is a unique positive $nq$-th
root of this number, so $r^{m/n} = r^{p/q}$.
Note that the above proof applies to negative rational powers: suppose that $r > 0$ and
$a \in \mathbb{Q}$ is negative such that $a = -m/n = -p/q$. Then
Definition A.2.3 (Correct definition of rational powers). If $r > 0$ and $a > 0$ is rational
we define $r^a = r^{m/n}$ for any $m, n \in \mathbb{N}$ such that $a = m/n$. If $a < 0$ is rational we define
$r^a = (1/r)^{-a}$.
If $a = m/n$ for $m \in \mathbb{Z}$ and $n \in \mathbb{N}$ then $r^a$ is the unique positive number such that
$(r^a)^n = r^m$.
Proof. For $m \ge 0$ this is the definition. For $m < 0$, this is because $(r^a)^n = ((1/r)^{-a})^n =
(1/r)^{-m} = r^m$ and if $s$ is any other positive number satisfying $s^n = r^m$ then uniqueness
of $n$-th roots gives $s = r^a$.
$r^{a+b} = r^a r^b$.
$(r^a)^b = r^{ab}$.
Proof. Write $a = m/n$ and $b = p/q$ for $m, p \in \mathbb{Z}$ and $n, q \in \mathbb{N}$. Then $r^{ab}$ is the unique
positive number such that $(r^{ab})^{nq} = r^{mp}$. But
$(rs)^a = r^a s^a$.
Proof. Again write $a = m/n$ for $m \in \mathbb{Z}$ and $n \in \mathbb{N}$. Then $(rs)^a$ is the unique positive
number such that $((rs)^a)^n = (rs)^m$. But
Proof. Suppose first that $r > 1$ and $a \ge 0$ with $a = m/n$ for $m, n \in \mathbb{N}$. Then if $r^a < 1$,
we find $r^m < 1^n = 1$, a contradiction, as $r^m > 1$. So $r^a \ge 1$. Next if $a \ge b$ then
$a - b \ge 0$ so $r^{a-b} \ge 1$. This gives $r^a = r^{a-b} r^b \ge r^b$.
If $r < 1$ then $r^a (1/r)^a = 1^a = 1$, so $r^a = (1/r)^{-a}$. Similarly $r^b = (1/r)^{-b}$. So since
$1/r > 1$ we get $r^{-a} = (1/r)^a \ge (1/r)^b = r^{-b}$. Multiplying both sides by $r^a r^b$ we get
$r^a \le r^b$.
If $s < r$ and $a > 0$ then $s^a < r^a$. If $s < r$ and $a < 0$ then $s^a > r^a$.
\[ r^t = \sup\{r^a : a \in \mathbb{Q} \text{ and } a \le t\} . \]
Proposition A.3.2. If $a \in \mathbb{Q}$ then for $r > 0$, the definition above coincides with the rational
definition.
Proof. For this proof, we take $r^a$ to be defined as in the rational powers section.
Suppose first that $r > 1$. Clearly $r^a \in \{r^b : b \in \mathbb{Q} \text{ and } b \le a\}$. So to show it is the
supremum we need only show it is an upper bound. This follows from the fact that $b \le a$
implies $r^b \le r^a$ (proved above).
If $0 < r < 1$ then $r^a = (r^{-1})^{-a} = (1/r)^{-a}$ so the definitions coincide here as well.
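To see the supremum definition at work for $r > 1$, one can evaluate $r^a$ along rationals $a \le t$ that increase to $t$. The sketch below is an illustration only: it uses decimal truncations of $t$ as the rationals and Python's built-in floating-point power as a stand-in for the rational-power definition; both choices and all names are our assumptions.

```python
import math
from fractions import Fraction

def rational_truncations(t, digits):
    """Rationals a_k = floor(t * 10^k) / 10^k, each at most t, for k = 1..digits."""
    return [Fraction(math.floor(t * 10**k), 10**k) for k in range(1, digits + 1)]

def powers_along_rationals(r, t, digits=10):
    """Values r**a for rational a <= t; for r > 1 they increase toward r**t."""
    return [r ** float(a) for a in rational_truncations(t, digits)]

if __name__ == "__main__":
    t = math.sqrt(2)
    for a, value in zip(rational_truncations(t, 10), powers_along_rationals(2.0, t)):
        print(a, value)
    print("float power:", 2.0 ** t)
```

The printed values are nondecreasing and approach the floating-point value of $2^{\sqrt{2}}$ from below, as monotonicity in the exponent predicts.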
$r^{t+u} = r^t r^u$.
Proof. We will use the following statement, proved on the homework. If $A$ and $B$ are
nonempty subsets of $[0, \infty)$ which are bounded above then define $AB = \{ab : a \in
A, b \in B\}$. We have
\[ \sup(AB) = \sup A \, \sup B . \quad (7) \]
If either of the sets consists only of 0, then the supremum of that set is 0 and both
sides above are 0. Otherwise, both sets (and therefore also $AB$) contain positive
elements. For any element $c \in AB$ we have $c = ab$ for some $a \in A$, $b \in B$. Therefore
$c = ab \le \sup A \, \sup B$ and therefore this is an upper bound for $AB$. As $\sup(AB)$ is the
least upper bound, we get $\sup(AB) \le \sup A \, \sup B$. Assuming now for a contradiction
that we have strict inequality, because $\sup A > 0$ we also have $\sup(AB)/\sup A < \sup B$.
Thus there exists $b \in B$ such that $\sup(AB)/\sup A < b$. As $b$ must be positive, we also
have $\sup(AB)/b < \sup A$ and there exists $a \in A$ such that $\sup(AB)/b < a$, giving
$\sup(AB) < ab$. This is clearly a contradiction.
Now to prove the property, suppose first that $r > 1$. By the statement we just proved,
we need only show that
\[ \{r^b : b \in \mathbb{Q} \text{ and } b \le t + u\} = AB , \]
$(rs)^t = r^t s^t$.
$(r^t)^u = r^{tu}$.
Proof. We will first show the equality in the case $r > 1$ and $t, u > 0$. We begin with
the fact that $(r^t)^u$ is an upper bound for $\{r^a : a \in \mathbb{Q} \text{ and } a \le tu\}$. So let $a \le tu$ be
rational and assume further that $a > 0$. In this case we can write $a = bc$ for $b, c \in \mathbb{Q}$
and $b \le t$, $c \le u$. By properties of rational exponents, we have $r^a = (r^b)^c$. As $r^b \le r^t$
(by definition) we get from monotonicity that $(r^b)^c \le (r^t)^c$. But this is an element of
the set $\{(r^t)^d : d \in \mathbb{Q} \text{ and } d \le u\}$, so $(r^t)^c \le (r^t)^u$. Putting these together,
\[ r^a = (r^b)^c \le (r^t)^c \le (r^t)^u . \]
This shows that $(r^t)^u$ is an upper bound for $\{r^a : a \in \mathbb{Q} \text{ and } 0 < a \le tu\}$. For the case
that $a \le 0$ we can use monotonicity to write $r^a \le r^0 \le (r^t)^u$. Putting this together
with the case $a > 0$ gives that $(r^t)^u$ is an upper bound for $\{r^a : a \in \mathbb{Q} \text{ and } a \le tu\}$
and therefore $r^{tu} \le (r^t)^u$.
To prove that $(r^t)^u \le r^{tu}$ we must show that $r^{tu}$ is an upper bound for $\{(r^t)^a : a \in
\mathbb{Q} \text{ and } a \le u\}$. For this we observe that $r^t > 1$. This holds because $t > 0$ and therefore
we can find some rational $b$ with $0 < b < t$. Thus $r^t \ge r^b > r^0 = 1$. Now let $a$ be
rational with $0 < a \le u$; we claim that $(r^t)^a \le r^{tu}$. Proving this will suffice since if
$a \le 0$ then $(r^t)^a \le (r^t)^0 = 1 \le r^{tu}$. To show the claim, note that if we show that
$r^t \le (r^{tu})^{1/a}$ we will be done. This is by properties of rational exponents: we would
then have
\[ (r^t)^a \le \left( (r^{tu})^{1/a} \right)^a = r^{tu} . \]
So we are reduced to proving that
\[ \sup\{r^b : b \in \mathbb{Q} \text{ and } b \le t\} \le (r^{tu})^{1/a} , \]
which follows if we show that for each $b \in \mathbb{Q}$ such that $b \le t$, we have $r^b \le (r^{tu})^{1/a}$.
Again, this is true if $r^{ab} \le r^{tu}$ because then $r^b = (r^{ab})^{1/a} \le (r^{tu})^{1/a}$. But $a \le u$ and
$b \le t$ with $a, t > 0$, so $ab \le at \le tu$ and hence $r^{ab} \le r^{tu}$. This completes the proof of
$(r^t)^u = r^{tu}$ in the case $r > 1$ and $t, u > 0$.
In the case $r > 1$ but $t > 0$ and $u < 0$, we can use (8):
\[ (r^t)^u = 1/(r^t)^{-u} = 1/r^{-tu} = r^{tu} . \]
If instead $r > 1$ but $t < 0$ and $u > 0$,
\[ (r^t)^u = (1/r^{-t})^u = 1/(r^{-t})^u = 1/r^{-tu} = r^{tu} . \]
Here we have used that for $s > 0$ and $x \in \mathbb{R}$, $(1/s)^x = 1/s^x$, which can be verified as
$1 = (s(1/s))^x = s^x (1/s)^x$. Last if $r > 1$ but $t < 0$ and $u < 0$, we compute
\[ (r^t)^u = ((1/r)^{-t})^u = 1/(r^{-t})^u = 1/r^{-tu} = r^{tu} , \]
completing the proof in the case $r > 1$.
If $0 < r < 1$ then
\[ (r^t)^u = ((1/r)^{-t})^u = (1/r)^{-tu} = r^{tu} . \]
If $r > 1$ and $u \le t$ then $r^u \le r^t$. If $0 < r < 1$ and $u \le t$ then $r^u \ge r^t$.
Proof. Assume $r > 1$. If $u = 0$ and $t > 0$ then we can find a rational $b$ such that
$0 < b \le t$, giving $r^t \ge r^b > r^0 = 1$. For general $u \le t$ we note $1 \le r^{t-u}$, so multiplying
both sides by the (positive) $r^u$ we get the result.
If $0 < r < 1$ then $r^u = (1/r)^{-u} \ge (1/r)^{-t} = r^t$.
If $s < r$ and $t > 0$ then $s^t < r^t$. If $s < r$ and $t < 0$ then $s^t > r^t$.
Proof. First consider the case that $s = 1$. Then $r > 1$ and for any $t > 0$ we can find a
rational $b$ such that $0 < b < t$. Therefore $r^t \ge r^b > r^0 = 1$. For general $s < r$ we write
$r^t = s^t (r/s)^t > s^t$. If $t < 0$ then $s^t = (1/s)^{-t} > (1/r)^{-t} = r^t$.
$\log 1 = 0$.
$\log$ is $C^\infty$ on $(0, \infty)$.
\[ f'(x) = \frac{y}{xy} = \frac{1}{x} = \frac{d}{dx} \log x . \]
Therefore $f(x) - \log x$ has zero derivative and must be a constant. Taking $x = 1$, we
get
\[ f(1) - \log 1 = \log y - \log y = 0 , \]
so $f(x) = \log x$. This completes the proof.
The range of $\log$ is $\mathbb{R}$.
Proof. We first claim that $\lim_{x \to \infty} \log x = \infty$. Because $\log$ is strictly increasing, it
suffices to show that the set $\{\log x : x > 0\}$ is unbounded above. Note that
\[ \log 2 = \int_1^2 \frac{1}{t}\, dt \ge \int_1^2 \frac{1}{2}\, dt = 1/2 . \]
$e^0 = 1$.
$e^x$ is $C^\infty$ on $\mathbb{R}$.
For $x, y \in \mathbb{R}$, $e^{x+y} = e^x e^y$.
Proof. Because $e^x$ is the inverse function of $\log x$, which is defined on $(0, \infty)$, its range
is $(0, \infty)$, giving $e^x > 0$.
For any $x$,
\[ e^x = \sum_{n=0}^\infty \frac{x^n}{n!} . \]
Proof. This follows from Taylor's theorem. For any $x$, the $n$-th derivative of the
exponential function evaluated at $x$ is simply $e^x$. Therefore expanding at $x = 0$, for any
$N \ge 1$,
\[ e^x = \sum_{n=0}^{N-1} \frac{f^{(n)}(0)}{n!} x^n + \frac{f^{(N)}(c_N)}{N!} x^N = \sum_{n=0}^{N-1} \frac{x^n}{n!} + \frac{e^{c_N}}{N!} x^N , \]
with $c_N$ some number between 0 and $x$. This remainder term is bounded by
\[ \left| \frac{e^{c_N}}{N!} x^N \right| \le e^{|x|} \frac{|x|^N}{N!} \to 0 , \]
because $|x|^N/N! \to 0$ as $N \to \infty$. This follows because the ratio test gives convergence
of $\sum x^n/n!$, so the $n$-th term must go to 0. By the corollary to Taylor's theorem, we
get $e^x = \sum_{n=0}^\infty \frac{x^n}{n!}$.
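The remainder estimate in this proof is easy to test numerically. The sketch below (with hypothetical function names, and using Python's `math.exp` as the reference value of $e^x$) compares the error of the partial sum $\sum_{n=0}^{N-1} x^n/n!$ with the bound $e^{|x|} |x|^N / N!$.

```python
import math

def exp_partial_sum(x, N):
    """Sum of x**n / n! for n = 0..N-1, built term by term."""
    total, term = 0.0, 1.0
    for n in range(N):
        total += term
        term *= x / (n + 1)  # turns x^n/n! into x^(n+1)/(n+1)!
    return total

def remainder_bound(x, N):
    """The bound e^{|x|} |x|^N / N! on the Taylor remainder."""
    return math.exp(abs(x)) * abs(x) ** N / math.factorial(N)

if __name__ == "__main__":
    for x in (-3.0, 0.5, 4.0):
        for N in (5, 10, 20):
            error = abs(math.exp(x) - exp_partial_sum(x, N))
            print(x, N, error, remainder_bound(x, N))
```

For each pair $(x, N)$ the observed error should not exceed the bound, up to floating-point rounding, and the bound shrinks rapidly as $N$ grows.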
Writing $e = e^1$, the exponential function is the $x$-th power of $e$ (defined earlier in terms
of suprema).
Proof. For ease of reading, write $\exp(x)$ for the function we have defined here and $e^x$
for the $x$-th power of $e$, defined in terms of suprema. Then for $x = m/n \in \mathbb{Q}$ with
$n \in \mathbb{N}$, we have
\[ (\exp(x))^n = \exp(nx) = \exp(m) = (\exp(1))^m = e^m . \]
Because $e^{m/n}$ was defined as the unique positive number $y$ such that $y^n = e^m$, we have
$\exp(x) = e^x$. Generally for $x \in \mathbb{R}$ we defined
\[ e^x = \sup\{e^q : q \in \mathbb{Q} \text{ and } q \le x\} . \]
(This was the definition for exponents whose bases are $\ge 1$, which is true in our case
because $e^1 \ge e^0 = 1$.) Using equivalence over rationals,
\[ e^x = \sup\{\exp(q) : q \in \mathbb{Q} \text{ and } q \le x\} . \]
However $\exp$ is an increasing function, so writing $S$ for the set whose supremum we take
above, $\exp(x) \ge \sup S = e^x$. On the other hand, because $\exp$ is continuous at $x$, we can
pick any sequence $q_n$ of rationals converging up to $x$ and we have $\exp(q_n) \to \exp(x)$.
This implies that $\exp(x) - r$ is not an upper bound for $S$ for any $r > 0$ and therefore
$\exp(x) = \sup S = e^x$.
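We now show that the exponential function can be attained by the standard limit
\[ e^x = \lim_{n \to \infty} \left( 1 + \frac{x}{n} \right)^n . \]
First we use the binomial formula
\[ \left( 1 + \frac{x}{n} \right)^n = \sum_{j=0}^n \binom{n}{j} \left( \frac{x}{n} \right)^j = \sum_{j=0}^n \frac{n!}{j!(n-j)!} \left( \frac{x}{n} \right)^j = \sum_{j=0}^n \frac{n(n-1) \cdots (n-j+1)}{n^j} \frac{x^j}{j!} . \]
To show the limit, let $\epsilon > 0$. By convergence of $\sum_{j=0}^\infty \frac{|x|^j}{j!}$, we may choose $J$ such that
\[ \sum_{j=J+1}^\infty \frac{|x|^j}{j!} < \epsilon/3 . \]
Because $\sum_{j=0}^J \frac{n(n-1) \cdots (n-j+1)}{n^j} \frac{x^j}{j!}$ is a finite sum and the $j$-th term approaches $\frac{x^j}{j!}$
as $n \to \infty$, we can pick $N$ such that if $n \ge N$ then
\[ \left| \sum_{j=0}^J \frac{n(n-1) \cdots (n-j+1)}{n^j} \frac{x^j}{j!} - \sum_{j=0}^J \frac{x^j}{j!} \right| < \epsilon/3 . \]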
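A quick numerical look at the two ingredients of this argument may help: the factor $n(n-1)\cdots(n-j+1)/n^j$ tends to 1 for each fixed $j$, and $(1 + x/n)^n$ approaches $e^x$. The sketch below uses hypothetical function names and floating-point arithmetic, so it illustrates the limit rather than proving it.

```python
import math

def compound_power(x, n):
    """(1 + x/n)**n in floating point."""
    return (1.0 + x / n) ** n

def falling_factor(n, j):
    """n(n-1)...(n-j+1) / n**j, the coefficient of x**j / j! in the expansion."""
    ratio = 1.0
    for i in range(j):
        ratio *= (n - i) / n
    return ratio

if __name__ == "__main__":
    x = 2.0
    for n in (10, 1_000, 1_000_000):
        print(n, compound_power(x, n), falling_factor(n, 5))
    print("exp:", math.exp(x))
```

For fixed $j$ the factor is a product of $j$ numbers each tending to 1, which is why a finite sum of such terms can be made close to $\sum_{j=0}^J x^j/j!$.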
To prove this, we need to define the function $x^{-x}$. It is given by $x^{-x} = \exp(-x \log x)$. So
the identity reads
\[ \int_0^1 e^{-x \log x}\, dx = \sum_{n=1}^\infty n^{-n} . \]
For this integral to make sense, as the integrand is not defined at $x = 0$, we must use the
right limit. Note that by l'Hôpital's rule (which we didn't cover but it is in Rudin),
\[ \lim_{x \to 0^+} x \log x = \lim_{x \to 0^+} \frac{\log x}{1/x} = \lim_{x \to 0^+} \frac{1/x}{-1/x^2} = 0 . \]
\[ \lim_{x \to 0^+} x^{-x} = 1 , \]
which has radius of convergence $R = \infty$. Therefore by the remark after exercise 6, Chapter
8, for any $M > 0$, this series converges uniformly for $x$ in $[-M, M]$. (The proof uses the
Weierstrass $M$-test.) Because the number $|x \log x|$ is bounded by $e^{-1}$ on the interval $[0, 1]$
(do some calculus),
\[ e^{-x \log x} = \sum_{n=0}^\infty \frac{(-x \log x)^n}{n!} \text{ converges uniformly on } [0, 1] . \]
We now use exercise 5, Chapter 8, which says that if $(f_n)$ is a sequence of continuous functions
that converges uniformly on $[0, 1]$ to a function $f$ then $\int_0^1 f_n(x)\, dx \to \int_0^1 f(x)\, dx$. Noting
that an infinite series of functions is just a limit of the sequence of partial sums (which
converges uniformly in our case), we get
\[ \int_0^1 x^{-x}\, dx = \int_0^1 \sum_{n=0}^\infty \frac{(-x \log x)^n}{n!}\, dx = \sum_{n=0}^\infty \frac{1}{n!} \int_0^1 (-x \log x)^n\, dx . \]
Now we compute the integral $\int_0^1 (-x \log x)^n\, dx$ using integration by parts. We take
$u = (-\log x)^n$ and $dv = x^n\, dx$ to get $du = (-1)^n n (\log x)^{n-1}/x\, dx$ and $v = x^{n+1}/(n + 1)$:
\[ \int_0^1 (-x \log x)^n\, dx = \left. \frac{(-\log x)^n x^{n+1}}{n+1} \right|_0^1 - (-1)^n \frac{n}{n+1} \int_0^1 x^n (\log x)^{n-1}\, dx \]
\[ = \frac{n}{n+1} \int_0^1 x^n (-\log x)^{n-1}\, dx . \]
Repeating this, we find
\[ \int_0^1 (-x \log x)^n\, dx = \frac{n!}{(n + 1)^{n+1}} . \]
So plugging back in, we find
\[ \int_0^1 x^{-x}\, dx = \sum_{n=0}^\infty \frac{1}{(n + 1)^{n+1}} = \sum_{n=1}^\infty n^{-n} . \]
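Both the final identity and the intermediate formula $\int_0^1 (-x \log x)^n\, dx = n!/(n+1)^{n+1}$ can be checked numerically. The sketch below is an illustration only; the midpoint rule, the step count, and the function names are our assumptions.

```python
import math

def x_to_minus_x(x):
    """x**(-x) = exp(-x log x), extended by its right limit 1 at x = 0."""
    return 1.0 if x == 0.0 else math.exp(-x * math.log(x))

def midpoint_integral(g, steps=20_000):
    """Midpoint-rule approximation of the integral of g over [0, 1]."""
    h = 1.0 / steps
    return h * sum(g((i + 0.5) * h) for i in range(steps))

def series_sum(terms=20):
    """Partial sum of n**(-n) for n = 1..terms; the tail is far below 1e-20."""
    return sum(float(n) ** (-n) for n in range(1, terms + 1))

def power_integral(n):
    """Numerical value of the integral of (-x log x)**n over [0, 1]."""
    return midpoint_integral(lambda x: (-x * math.log(x)) ** n)

if __name__ == "__main__":
    print(midpoint_integral(x_to_minus_x), series_sum())
    for n in (1, 2, 3):
        print(n, power_integral(n), math.factorial(n) / (n + 1) ** (n + 1))
```

The midpoint rule never evaluates the integrand at $x = 0$, so the singular behaviour of $\log x$ there causes no trouble in this sketch.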
C.1 Definitions
For any set $S \subset \mathbb{R}$ write $|S|$ for the diameter of $S$:
\[ |S| = \sup\{|x - y| : x, y \in S\} . \]
For example, we have $|[0, 1]| = 1$, $|\mathbb{Q} \cap [0, 1]| = 1$ and $|(0, 1) \cup (2, 3)| = 3$.
Definition C.1.1. Let $S \subset \mathbb{R}$. A countable collection $\{C_n\}$ of subsets of $\mathbb{R}$ is called a
countable cover of $S$ if
\[ S \subset \bigcup_{n=1}^\infty C_n . \]
Note that the sets in a countable cover can be any sets whatsoever. For example, they
do not need to be open or closed.
Definition C.1.2. If $\{C_n\}$ is a countable collection of sets in $\mathbb{R}$ and $\alpha > 0$, the $\alpha$-total
length of $\{C_n\}$ is
\[ \sum_{n=1}^\infty |C_n|^\alpha . \]
If $\alpha > 1$ then it has the effect of increasing the diameter (that is, $|C_n|^\alpha > |C_n|$) when
$|C_n|$ is large (bigger than 1) and decreasing it when $|C_n|$ is small (less than 1).
Example 1. Consider the interval $[0, 1]$. Let us build a very simple cover of this set by
fixing $n$ and choosing our (finite) cover $\{C_1, \ldots, C_n\}$ by
\[ C_i = \left[ \frac{i-1}{n}, \frac{i}{n} \right] . \]
For instance, for $n = 4$ we have
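Computing the $\alpha$-total length of this cover:
\[ \sum_{i=1}^n \left| \left[ \frac{i-1}{n}, \frac{i}{n} \right] \right|^\alpha = \sum_{i=1}^n n^{-\alpha} = n^{1-\alpha} . \]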
This result gives us some hint that the dimension of a set is related to the $\alpha$-total length of
countable covers of the set. Specifically we make the following definition:
Definition C.1.3. If $S \subset \mathbb{R}$ has $|S| < \infty$ and $\alpha > 0$ we define the $\alpha$-covered length of $S$ as
\[ H_\alpha(S) = \inf\left\{ \sum_{n=1}^\infty |C_n|^\alpha : \{C_n\} \text{ is a countable cover of } S \right\} . \]
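The computation in Example 1 can be reproduced in a few lines. The sketch below (function names are ours, and intervals are represented as pairs of floating-point endpoints) evaluates the $\alpha$-total length of the uniform cover of $[0, 1]$ and compares it with $n^{1-\alpha}$. Since $H_\alpha$ is an infimum over covers, each such value is an upper bound for $H_\alpha([0, 1])$.

```python
def uniform_cover(n):
    """Endpoints of the cover C_i = [(i-1)/n, i/n], i = 1..n, of [0, 1]."""
    return [((i - 1) / n, i / n) for i in range(1, n + 1)]

def diameter(interval):
    """Diameter of an interval given as a (left, right) pair."""
    left, right = interval
    return right - left

def alpha_total_length(cover, alpha):
    """Sum of |C|**alpha over the sets C in a finite cover."""
    return sum(diameter(c) ** alpha for c in cover)

if __name__ == "__main__":
    for alpha in (0.5, 1.0, 2.0):
        for n in (4, 100, 1000):
            print(alpha, n, alpha_total_length(uniform_cover(n), alpha), n ** (1 - alpha))
```

For $\alpha > 1$ the values shrink to 0 as $n$ grows, for $\alpha = 1$ they stay at 1, and for $\alpha < 1$ they grow, matching $n^{1-\alpha}$.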
It is an exercise to show that for all $0 < \alpha < \dim_H(S)$, we have $H_\alpha(S) > 0$. Also, setting
$0^0 = 1$ then $H_0(S) > 0$ for all $S$. Thus we could define the Hausdorff dimension as
Note that example 1 shows that $\dim_H([0, 1]) \le 1$. To show the other inequality, we must
show that for all $\alpha < 1$, $H_\alpha([0, 1]) > 0$. To do this, let $\{C_n\}$ be a countable cover of $[0, 1]$.
We may replace the $C_n$'s by $D_n = C_n \cap [0, 1]$, since the $D_n$'s will still cover $[0, 1]$ and will
have smaller $\alpha$-length. For $\alpha < 1$ we then have
\[ \sum_{n=1}^\infty |D_n|^\alpha \ge \sum_{n=1}^\infty |D_n| , \]
Assuming the lemma, we have $H_\alpha([0, 1]) \ge 1$ for all $\alpha < 1$ and therefore $\dim_H([0, 1]) = 1$.
If the concept of Hausdorff dimension is to agree with our current notion of dimension
it had better be that each subset of $\mathbb{R}$ has dimension no bigger than 1. This is indeed the
case; we can argue similarly to before. If $S \subset \mathbb{R}$ has $|S| < \infty$ then we can find $M > 0$ such
that $S \subset [-M, M]$. Now for each $n$ define a cover $\{C_1, \ldots, C_n\}$ by
\[ C_i = \left[ -M + 2M \frac{i-1}{n}, \; -M + 2M \frac{i}{n} \right] . \]
As before, for $\alpha > 1$, the $\alpha$-total length of $\{C_1, \ldots, C_n\}$ is
\[ n \left( \frac{2M}{n} \right)^\alpha \to 0 \text{ as } n \to \infty . \]
Therefore $H_\alpha(S) = 0$ and $\dim_H(S) \le 1$.
Example 2. Take $S$ to be any countable set with finite diameter (for instance the rationals
in $[0, 1]$). We claim that $\dim_H(S) = 0$. To show this we must prove that for all $\alpha > 0$,
$H_\alpha(S) = 0$. Let $\epsilon > 0$ and define a countable cover of $S$ by first enumerating the elements
of $S$ as $\{s_1, s_2, \ldots\}$ and for $n \in \mathbb{N}$, letting $C_n$ be any interval containing $s_n$ of length $(\epsilon/2^n)^{1/\alpha}$
(note that this is a positive number). Then the $\alpha$-total length of the cover is
\[ \sum_{n=1}^\infty \frac{\epsilon}{2^n} = \epsilon ; \]
and this approaches zero as $k \to \infty$. Note that we have used above that, for example,
$2^k = e^{k \log 2}$. Therefore the covered length of $S$ vanishes for such exponents, and $\dim_H(S) \le \alpha$.
For the other direction (to prove $\dim_H(S) \ge \alpha$) we will show that $H_\alpha(S) > 0$. Let $\{C_n\}$
be a countable cover of $S$. We will give a bound on the $\alpha$-total length of $\{C_n\}$. As before,
we may assume that each $C_n$ is actually a subset of $[0, 1]$. By compactness one can show the
following:
Lemma C.2.2. Given $\epsilon > 0$ there exist finitely many open intervals $D_1, \ldots, D_m$ such that
$\bigcup_{n=1}^\infty C_n \subset \bigcup_{j=1}^m D_j$ and
\[ \sum_{j=1}^m |D_j|^\alpha < \sum_{n=1}^\infty |C_n|^\alpha + \epsilon . \]
Proof. The proof is an exercise. The idea is to first replace the $C_n$'s by closed intervals and
then slightly widen them, while making them open. Then use compactness.
Now choose $k$ such that
\[ \left( \frac{1}{3} \right)^k \le \min\{|D_j| : j = 1, \ldots, m\} . \]
For $l = 1, \ldots, k$ let $N_l$ be the number of sets $D_j$ such that $3^{-l} \le |D_j| < 3^{-l+1}$. Using
$\alpha = \log 2/\log 3$ and the definition of $k$, we find
\[ \sum_{j=1}^m |D_j|^\alpha \ge \sum_{l=1}^k N_l 3^{-l\alpha} = \sum_{l=1}^k N_l 2^{-l} , \quad (9) \]
so we will give a lower bound for the right side. Suppose that $D_j$ has $3^{-l} \le |D_j| < 3^{-l+1}$.
Then $D_j$ can intersect at most 2 of the intervals in $S_l$, the $l$-th step in the construction of
the Cantor set. Since each of these intervals produces $2^{k-l}$ subintervals at the $k$-th step of
the construction, we find that $D_j$ contains at most $2 \cdot 2^{k-l}$ subintervals at the $k$-th step of
the construction. But there are only $2^k$ subintervals at the $k$-th step so we find
\[ 2^k \le \sum_{l=1}^k N_l \, 2 \cdot 2^{k-l} \]
or
\[ \frac{1}{2} \le \sum_{l=1}^k N_l 2^{-l} . \]
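The role of the exponent $\alpha = \log 2/\log 3$ can be seen numerically. The sketch below builds the intervals of the $k$-th step of the middle-thirds construction (our own helper, with floating-point endpoints) and evaluates their total length for several exponents. It illustrates the identity $3^{-l\alpha} = 2^{-l}$ used in (9) and is not a substitute for the covering argument.

```python
import math

ALPHA = math.log(2) / math.log(3)

def cantor_step_intervals(k):
    """Closed intervals, as (left, right) pairs, at step k of the construction."""
    intervals = [(0.0, 1.0)]
    for _ in range(k):
        next_intervals = []
        for a, b in intervals:
            third = (b - a) / 3
            next_intervals.append((a, a + third))   # keep the left third
            next_intervals.append((b - third, b))   # keep the right third
        intervals = next_intervals
    return intervals

def total_length(intervals, exponent):
    """Sum of (length of interval)**exponent over the given intervals."""
    return sum((b - a) ** exponent for a, b in intervals)

if __name__ == "__main__":
    for k in (2, 4, 8):
        step = cantor_step_intervals(k)
        print(k, len(step), total_length(step, ALPHA), total_length(step, 0.8))
```

With exponent $\alpha$ the $2^k$ intervals of length $3^{-k}$ contribute $2^k \cdot 2^{-k} = 1$ at every step, while for a larger exponent such as 0.8 the total decreases with $k$.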
C.3 Exercises
1. Prove Lemma C.1.4.
3. Prove that if $S \subset \mathbb{R}$ with $|S| < \infty$ has nonempty interior then $\dim_H(S) = 1$.
4. What is the Hausdorff dimension of a modified Cantor set where we remove the middle
1/9-th of our intervals?
5. What is the Hausdorff dimension of the modified Cantor set from exercise 15, Chapter
3?