Notes On Luenberger's Vector Space Optimization

Convexity and Optimization
WITH
APPLICATIONS
Paul G. Bamberg Harvard University
Copyright © 2008 Paul G. Bamberg, Harvard University, Cambridge, MA 02138
This text is based on lecture notes by Paul G. Bamberg written for MATH 116: Convexity and Optimization with Applications, a course offered at Harvard University in Fall 2008. The notes were meant to complement Optimization by Vector Space Methods by David Luenberger (Wiley Interscience, 1969 [1997]).
Front cover: The image was generated by the following MATHEMATICA code:
    GraphicsGrid[
      Table[
        ReliefPlot[
          Table[Evaluate[Sum[RiemannSiegelZ[RandomReal[3, 2].{x, y}], {3}]],
            {x, 0, 10, .2}, {y, 0, 10, .2}],
          ColorFunction -> ColorData["BlueGreenYellow"], Frame -> False],
        {3}, {3}]]
CONTENTS
1 Generalizing from Two Dimensions
  1.1 Introduction
  1.2 Existence of Optimal Solutions
  1.3 Linear Programming
  1.4 Finite- vs. Infinite-Dimensional Vector Spaces
  1.5 Minimum Norm Problems
2 Preliminaries in Algebra, Topology, and Analysis
  2.1 Vector Spaces
  2.2 Convex Sets
  2.3 Linear Independence and Dimension
  2.4 Normed Vector Spaces
  2.5 Open and Closed Sets
  2.6 Convergence, Limits, and Continuity
3 Banach Spaces
  3.1 $\ell_p$ Space
  3.2 Lebesgue Integration
  3.3 $L_p$ Space
  3.4 Cauchy Sequences
  3.5 Compactness and Extrema
  3.6 Quotient Spaces
  3.7 Denseness and Separability
4 Hilbert Space
  4.1 Inner Products
  4.2 The Projection Theorem
  4.3 Orthogonal Complements
  4.4 The Gram-Schmidt Procedure
  4.5 Fourier Series
5 Dual Spaces and the Hahn-Banach Theorem
  5.1 Linear Functionals
  5.2 Common Dual Spaces
6 Applications of the Hahn-Banach Theorem
  6.1 The Dual of $C[a, b]$
  6.2 The Second Dual Space
  6.3 Alignment and Orthogonal Components
  6.4 Minimum Norm Problems
  6.5 Applications
  6.6 Hyperplanes and Linear Functionals
7 Calculus of Variations
  7.1 Review
  7.2 Gateaux and Fréchet Differentials
  7.3 Euler-Lagrange Equations
  7.4 Problems with Constraints
8 Convex Functionals
  8.1 Local to Global
  8.2 Conjugate Convex Functionals
  8.3 Conjugate Concave Functionals
  8.4 Fenchel Duality
Bibliography
CHAPTER 1
GENERALIZING FROM TWO DIMENSIONS
Reading: [1, Chapter 1]
1.1 Introduction
The general approach in [1], which has made the book a classic, is this:
- Identify techniques from algebra, elementary single-variable calculus, or elementary multivariable calculus that can be used to solve optimization problems.
- Reformulate the solution geometrically.
- Using geometry for inspiration, generalize the solution, typically to infinite-dimensional vector spaces and non-Euclidean norms, and prove (algebraically) that it is still valid.
All the finite-dimensional problems in this chapter should be familiar, though they may be valuable review for some students. The infinite-dimensional problems are just stated, not solved, and we will take quite a while to get to them.
In this chapter, we will hold off on defining some important concepts, which for now are just in SMALL CAPS. These concepts will appear in Chapters 2, 3, and 5 of [1], generally in a context where there is no mention of optimization, just some challenging mathematics. You will need to learn them before you tackle optimization. In the process, you will acquire a good background in real analysis and in the branch of mathematics called functional analysis, the theory of normed infinite-dimensional vector spaces.
My hope for these introductory notes is to convince you that optimization problems are fun and relevant, that some of the best ones can only be formulated in infinite-dimensional vector spaces, and that it is worth your while to learn quite a few new definitions and theorems in order to be able to solve them.
1.2 Existence of Optimal Solutions
We begin by recalling an EXISTENCE THEOREM from real analysis:
THEOREM 1.2.1 (extreme value theorem). If $f$ is a continuous real function on a compact metric space $X$, $M = \sup_{p \in X} f(p)$, and $m = \inf_{p \in X} f(p)$, then there exist points $q, r \in X$ such that $f(q) = M$ and $f(r) = m$.
The following examples illustrate applications of this theorem.
Example 1.2.2. You are entering a student competition to draw up a business plan for a company with $m$ scientists and $n$ other employees. Entries with $m^2 \ge 2n^2$ get rejected. You want to have the highest possible ratio of scientists to other employees. Does this optimization problem have a solution?
Solution (based on [2, Example 1.1, p. 2]): We want to solve $\max m/n$ such that $m^2 < 2n^2$ for $m, n \in \mathbb{N}$. Equivalently, we want to find the largest rational number $p = m/n$ such that $p^2 < 2$. We cannot apply Theorem 1.2.1 because the function we are optimizing (our OBJECTIVE FUNCTION) is rational-valued, not real-valued. This alone is not enough to rule out the existence of an optimal solution, but we can show that, in fact, an optimal solution does not exist. Assume that $p$ is this largest rational number such that $p^2 < 2$. Define
\[ q = p - \frac{p^2 - 2}{p + 2} = \frac{2p + 2}{p + 2}. \tag{1.2.1} \]
Then
\[ q^2 - 2 = \frac{2(p^2 - 2)}{(p + 2)^2}. \tag{1.2.2} \]
Since $p^2 < 2$, (1.2.2) shows that $q^2 < 2$. However, (1.2.1) shows that $q > p$, which contradicts our initial assumption that $p$ is the largest rational number such that $p^2 < 2$. Therefore this optimization problem has no solution.
Of course, note that as $p^2$ approaches 2, $m$ and $n$ would become larger and larger. Since the number of people is finite, we should impose constraints $m \le m_{\max}$ and $n \le n_{\max}$, in which case there would be a solution.
Example 1.2.3. As director of the state lottery, you are designing scratch tickets by assigning probabilities to the possible payoffs from \$0 through \$4. Since a ticket sells for \$5, you want to be as generous as possible. Does this optimization problem in $\mathbb{R}^5$ have a solution? Change the problem so that any nonnegative integer payoff is allowed. Does this optimization problem in an infinite-dimensional vector space have a solution?
Solution: We define "as generous as possible" as maximizing the expected value of the ticket. Therefore, in the first problem, we want to solve $\max \sum_{k=0}^{4} k p_k$, where $p_k$ is the probability of receiving a payoff of \$$k$, such that $0 \le p_k \le 1$ for all $k = 0, \dots, 4$ and $\sum_{k=0}^{4} p_k = 1$. Our objective function is continuous and real-valued, and the constraints on $\{p_k\}_{k=0}^{4}$ define a compact set, so Theorem 1.2.1 guarantees the existence of a solution. We simply set $p_k = 1$ for $k = 4$ and $p_k = 0$ for $k \ne 4$.
For the second problem, we want to solve $\max \sum_{k \ge 0} k p_k$ such that $0 \le p_k \le 1$ for all integers $k \ge 0$ and $\sum_{k \ge 0} p_k = 1$. The notion of compactness is tricky in infinite-dimensional vector spaces, so we cannot apply Theorem 1.2.1 to this problem, but we can show that, in fact, an optimal solution does not exist. Assume that we have some solution $p_k = \pi_k$ for all integers $k \ge 0$. We can always increase the expected value of the ticket and still satisfy our constraints by setting $p_0 = 0$ and $p_k = \pi_{k-1}$ for $k \in \mathbb{N}$, since
\[ \sum_{k=1}^{\infty} k \pi_{k-1} = \sum_{k=0}^{\infty} (k + 1) \pi_k > \sum_{k=0}^{\infty} k \pi_k. \]
Because the expected value is unbounded, there is no solution.
We will return to the concepts of compactness and METRIC SPACES later.
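The construction in Example 1.2.2 can be tried out with exact rational arithmetic. The following is a minimal sketch, not part of the original notes: the function name `improve` and the starting ratio 1/1 are illustrative assumptions. It applies the map $q = (2p + 2)/(p + 2)$ repeatedly and records the resulting ratios.

```python
from fractions import Fraction

def improve(p: Fraction) -> Fraction:
    """Map a rational p with p^2 < 2 to q = (2p + 2)/(p + 2), as in (1.2.1)."""
    return (2 * p + 2) / (p + 2)

p = Fraction(1)      # starting ratio m/n = 1/1 (an arbitrary, assumed choice)
ratios = [p]
for _ in range(5):
    p = improve(p)   # each step gives a strictly larger ratio that is still feasible
    ratios.append(p)

# Every entry satisfies p^2 < 2 and exceeds its predecessor,
# so no entry can be "the largest" feasible ratio.
for r in ratios:
    print(r, float(r * r))
```

Each ratio is feasible and larger than the one before it, which is exactly the contradiction used in the solution.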
1.3 Linear Programming
Example 1.3.1. Your small bakery can produce only two products: frosted cookies and cakes. A batch of frosted cookies uses up 1 pound of flour and 3 pounds of sugar. A batch of cake uses up 2 pounds of flour and 1 pound of sugar. Each day your suppliers bring you 14 pounds of flour and 17 pounds of sugar. Your optimization problem is to look at the market price of cookies and cakes and decide what to produce.
Figure 1.3.1: possible production schemes in Example 1.3.1
Solution: Let $x$ be the number of batches of cookies, $y$ be the number of batches of cake, $p_1$ be the market price of a batch of cookies, and $p_2$ be the market price of a batch of cake. Then we want to solve:
\[ \max_{x, y} \; p_1 x + p_2 y \quad \text{(revenue)} \]
such that
\[ x + 2y \le 14 \quad \text{(flour constraint)}, \qquad 3x + y \le 17 \quad \text{(sugar constraint)}, \qquad x, y \ge 0. \]
Figure 1.3.1 shows the possible production schemes. Observe that if $v$ and $w$ represent possible production schemes, then $\alpha v + (1 - \alpha) w$, with $0 \le \alpha \le 1$, is also possible. This is the definition of a CONVEX SET. To show this, let
\[ A = \begin{pmatrix} 1 & 2 \\ 3 & 1 \end{pmatrix}, \quad \mathbf{x} = \begin{pmatrix} x \\ y \end{pmatrix}, \quad b = \begin{pmatrix} 14 \\ 17 \end{pmatrix}, \]
so that we can write the constraints as $A\mathbf{x} \le b$. Then if $Av \le b$ and $Aw \le b$, we have $\alpha A v \le \alpha b$ and $(1 - \alpha) A w \le (1 - \alpha) b$; adding these inequalities gives the desired result.
Now we can consider what the optimal solutions are for various values of $p_1$ and $p_2$:
- $p_1 = 5$, $p_2 = 5$. We see from Figure 1.3.1 that the optimal solution is $x = 4$, $y = 5$. Revenue is $5 \cdot 4 + 5 \cdot 5 = 45$ dollars.
- $p_1 = 1$, $p_2 = 7$. Again, we see from Figure 1.3.1 that the optimal solution is $x = 0$, $y = 7$. Revenue is $1 \cdot 0 + 7 \cdot 7 = 49$ dollars.
- $p_1 = 6$, $p_2 = 2$. Notice in Figure 1.3.1 that the level lines of the revenue function are parallel to the sugar constraint. Therefore, we maximize revenue by choosing any feasible point along this constraint. Revenue is $6 \cdot \frac{17}{3} + 2 \cdot 0 = 34$ dollars.
The revenue function is an example of a (LINEAR) FUNCTIONAL on $\mathbb{R}^2$ that we are trying to optimize. It is an element of the DUAL SPACE. The straight line that we slide to solve the problem is an example of a HYPERPLANE. The fact that this approach works for any convex set is a simple consequence of the HAHN-BANACH THEOREM.
What we have done is to calculate, for any functional in the dual space, the largest value that this functional can achieve subject to the constraint imposed by our budget. This functional on the dual space is called the SUPPORT FUNCTIONAL.
Example 1.3.2. What is the support functional for the (closed) unit disk $D^2$? Can you reconstruct the unit disk (or any other convex set) from its support functional?
Figure 1.3.2: As shown in Example 1.3.2, on the left we see that the envelope of all functionals for which the support functional returns a constant value (i.e., 1) is the boundary of our convex set, the closed unit disk. On the right we see that the contour plots of the support functional $c(m, n) = \sqrt{m^2 + n^2}$ are concentric circles centered at the origin; the contour $c(m, n) = 1$ is the boundary of our convex set.
Solution: Let the functionals be given by $mx + ny$. The support functional is then
\[ c(m, n) = \inf \{ c : mx + ny \le c \text{ for all } (x, y) \in D^2 \} \]
($D^2$ is the closed unit disk). We can see from the left side of Figure 1.3.2 that $c(m, n) = c_0$, where $m x_0 + n y_0 = c_0$ is the tangent to the unit disk at the point $(x_0, y_0)$. The slope of the tangent is $-m/n$, so the slope of the normal is $n/m$ and the angle between the normal and the positive $x$-axis is $\theta = \tan^{-1}(n/m)$. Therefore, $x_0 = \cos\theta = m/\sqrt{m^2 + n^2}$ and $y_0 = \sin\theta = n/\sqrt{m^2 + n^2}$. Then $c(m, n) = c_0 = \sqrt{m^2 + n^2}$.
Observe that the value of the support functional evaluated for a given functional was just the given functional evaluated at some point on the boundary of the convex set. Therefore, intuitively, it seems that we can reconstruct the convex set from the support functional. We can see this in two ways. First, the boundary of the convex set can be given by a contour of the support functional, as shown in Figure 1.3.2. Equivalently, the boundary of the convex set is the envelope of all functionals for which the support functional returns some constant value, as shown in Figure 1.3.2.
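The support functional can also be evaluated numerically, both for the bakery's feasible region in Example 1.3.1 and for the unit disk in Example 1.3.2. The sketch below is illustrative only: the function names, the corner-point list, and the boundary-sampling approach for the disk are assumptions made for this example, not the notes' method. For a polygon, a linear functional attains its maximum at a corner, so it is enough to compare corners.

```python
import math

def support_polygon(vertices, m, n):
    """Largest value of m*x + n*y over a convex polygon (attained at a vertex)."""
    return max(m * x + n * y for x, y in vertices)

def support_disk(m, n, samples=10_000):
    """Approximate the support functional of the closed unit disk
    by sampling points on its boundary circle."""
    return max(
        m * math.cos(2 * math.pi * i / samples) + n * math.sin(2 * math.pi * i / samples)
        for i in range(samples)
    )

# Corner points of the feasible region: x + 2y <= 14, 3x + y <= 17, x, y >= 0
bakery = [(0, 0), (17 / 3, 0), (4, 5), (0, 7)]

for prices in [(5, 5), (1, 7), (6, 2)]:
    print(prices, support_polygon(bakery, *prices))

print(support_disk(3, 4), math.hypot(3, 4))  # both should be close to 5
```

For the three price pairs in Example 1.3.1 the maximum revenues come out as 45, 49, and 34 dollars, and the sampled value for the disk agrees with $\sqrt{m^2 + n^2}$ up to the sampling error.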
1.4 Finite- vs. Infinite-Dimensional Vector Spaces
Example 1.2.3 illustrated some of the added complexities of moving from finite- to infinite-dimensional vector spaces. The following examples further explore the differences between solving optimization problems in finite- versus infinite-dimensional vector spaces.
Example 1.4.1 (infinite-dimensional: equivalent to [1, Example 8.7.3, pp. 231-234]). You are playing a real-time strategy computer game in which you have to build up a civilization from scratch. The first phase of the game, the Age of Agriculture, lasts from time $t = 0$ to time $t = T$. In this phase, you have farms that produce at a rate $f(t)$. Your farms disappear when you move to the next age, the Age of Mining. What makes this game interesting is that you can allocate production between reinvestment $r(t)$ and storage $s(t)$. Your farm production increases at a rate proportional to your reinvestment:
\[ \dot f(t) = k r(t), \quad k > 0, \quad \dot f(t) = \frac{df}{dt}. \]
What system of allocation maximizes the amount of food available at time $T$?
Solution: Note that if we set $s(t) = 0$, then $f(t) = r(t)$ and $\dot f(t) = k r(t) = k f(t)$, so $f(t) = f_0 \exp(kt)$. In any case, we can solve this differential equation with the boundary condition $f(0) = f_0$:
\[ f(t) = \int_0^t k r(\tau)\, d\tau + f_0. \]
Then the total amount of food stored by time $T$ is
\[ \int_0^T s(t)\, dt = \int_0^T \left( f(t) - r(t) \right) dt = \int_0^T \left( \int_0^t k r(\tau)\, d\tau - r(t) \right) dt + f_0 T. \]
Reinvestment and storage must be nonnegative, so we have the constraints
\[ 0 \le r(t) \le f(t), \quad \text{that is,} \quad 0 \le r(t) \le \int_0^t k r(\tau)\, d\tau + f_0. \]
If functions $r_1(t)$ and $r_2(t)$ satisfy these inequalities, so does $\alpha r_1(t) + (1 - \alpha) r_2(t)$, where $0 \le \alpha \le 1$. The set of acceptable solutions to the problem is a CONVEX SET in an INFINITE-DIMENSIONAL vector space and the quantity to be maximized is a LINEAR FUNCTIONAL on this space. We guess that the optimal strategy is to reinvest everything during the interval $(0, t]$ and then store everything during the interval $(t, T]$:
- $\tau \in (0, t]$: $f(\tau) = r(\tau)$, so $\dot f(\tau) = k r(\tau) = k f(\tau)$, $f(\tau) = f_0 \exp(k\tau)$, and $s(\tau) = 0$.
- $\tau \in (t, T]$: $f(\tau) = s(\tau)$, so $\dot f(\tau) = k r(\tau) = 0$, $f(\tau) = f_0 \exp(kt)$, and $s(\tau) = f_0 \exp(kt)$.
Then we want to maximize
\[ \int_0^T s(\tau)\, d\tau = (T - t) f_0 \exp(kt). \]
Differentiating with respect to $t$ gives
\[ (T - t) k f_0 \exp(kt) - f_0 \exp(kt) = 0 \quad \Longrightarrow \quad t = T - \frac{1}{k}. \]
This solution holds if $T - 1/k > 0$ (if $k > 1/T$); if $k \le 1/T$, then the derivative of the objective function with respect to $t$ is never positive on $(0, T]$ and we are forced into a corner solution in which we just set $t = 0$.
Example 1.4.2 (finite-dimensional). You are growing a new genetically engineered crop as part of a 2-year biofuels experiment. Your contract requires you to deliver 5 tons at the end of each year. The government pays all costs except the cost of fencing your plot of land, so the cost of producing $x$ tons can be modeled as $c(x) = 6\sqrt{x}$. The inventory cost of storing $x$ tons of excess crop is $hx$.
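Figure 1.4.1: possible production schemes for Example 1.4.2
Solution: Let $y_i$ be the number of tons grown in year $i$ for $i = 1, 2$. We have the constraints $y_1 \ge 5$ and $y_1 + y_2 \ge 10$, since we must grow at least 5 tons in order to be able to deliver the required amount by the end of year 1, and we must grow at least 10 tons in order to be able to deliver the required amount by the end of year 2. The region corresponding to possible production schemes is shown in Figure 1.4.1. Clearly, our cost function $6\sqrt{y_1} + 6\sqrt{y_2} + h(y_1 - 5)$ is strictly increasing in both $y_1$ and $y_2$ within the feasible region, so the minimum cost will be found somewhere along the line $y_1 + y_2 = 10$ between $y_1 = 5$ and $y_1 = 10$. We substitute $y_2 = 10 - y_1$ into the cost function: now we want to minimize $6\sqrt{y_1} + 6\sqrt{10 - y_1} + h(y_1 - 5)$ with respect to $y_1$. The derivative with respect to $y_1$ is
\[ \frac{3}{\sqrt{y_1}} - \frac{3}{\sqrt{10 - y_1}} + h = \frac{3\left(\sqrt{10 - y_1} - \sqrt{y_1}\right)}{\sqrt{y_1 (10 - y_1)}} + h. \]
The first term is zero at $y_1 = 5$ and negative for $y_1 \in (5, 10)$, so the sign of the derivative depends on $h$. Because the cost is concave in $y_1$, its minimum over $[5, 10]$ is attained at an endpoint: either $(y_1, y_2) = (5, 5)$, at a cost of $12\sqrt{5} \approx 26.83$, or $(y_1, y_2) = (10, 0)$, at a cost of $6\sqrt{10} + 5h \approx 18.97 + 5h$. Therefore, when storage is expensive enough, namely $h \ge (12\sqrt{5} - 6\sqrt{10})/5 \approx 1.57$, we reach a corner solution by setting $(y_1, y_2) = (5, 5)$, at which the minimum cost is $12\sqrt{5} \approx 26.83$; for smaller $h$ it is cheaper to grow all 10 tons in the first year.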
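The one-variable cost function in Example 1.4.2 is easy to explore numerically. The following sketch is a rough check rather than a solution method from the notes: the helper names, the grid resolution, and the storage cost $h = 2$ used in the usage lines are all assumed values chosen for illustration.

```python
import math

def cost(y1, h):
    """Total cost when y1 tons are grown in year 1 and 10 - y1 tons in year 2."""
    return 6 * math.sqrt(y1) + 6 * math.sqrt(10 - y1) + h * (y1 - 5)

def best_plan(h, steps=5000):
    """Grid search over y1 in [5, 10]; returns (y1, y2, cost) for the cheapest grid point."""
    grid = [5 + 5 * i / steps for i in range(steps + 1)]
    y1 = min(grid, key=lambda y: cost(y, h))
    return y1, 10 - y1, cost(y1, h)

# h = 2 is an assumed storage cost per ton, used only for illustration.
y1, y2, c = best_plan(2.0)
print(y1, y2, round(c, 2))
```

With $h = 2$ the search lands on the corner $(y_1, y_2) = (5, 5)$. Rerunning `best_plan` with other values of `h` is a quick way to see how sensitive the corner solution is to the storage cost.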
Example 1.4.3 (infinite-dimensional: equivalent to [1, Example 1.2.2, p. 3]). Your new contract requires you to deliver at a known rate $d(t)$ during the interval $t \in (0, T]$. You produce at a rate $r(t)$. Your rate of production cost is $c(r(t))$ (if all you have to pay for is maintaining the fence, this might equal $\sqrt{r(t)}$). If you have $x(t)$ tons on hand, you pay inventory costs at a rate $h x(t)$. You start with an inventory $x(0)$.
Solution: First, note that $\dot x(t) = r(t) - d(t)$. Both inventory and production must be nonnegative, so we have the constraints
\[ x(t) = x(0) + \int_0^t \left( r(\tau) - d(\tau) \right) d\tau \ge 0, \qquad r(t) \ge 0. \]
We are trying to minimize total costs:
\[ \int_0^T \left( c(r(t)) + h x(t) \right) dt. \]
The optimal solution $r(t)$ lies in an infinite-dimensional vector space. Note that $r(t)$ is not necessarily continuous: for example, we might want to produce some constant positive quantity $r(\tau) = \bar r$ for $\tau \in (0, t]$ and then produce $r(\tau) = 0$ for $\tau \in (t, T]$. Nor is $r(t)$ necessarily bounded: if $h$ is very small, then the optimal solution will be to produce $\int_0^T d(t)\, dt$, the total amount required over the interval $(0, T]$, as rapidly as possible.
See [1, Example 8.7.4, p. 234] for a related problem.
Example 1.4.4 (finite-dimensional). You are operating a simple frictionless rocket-propelled car. You expend fuel instantaneously to increase your speed to $v$ miles/second, coast 1 mile, and use your brakes to stop instantaneously. You then expend fuel to increase your speed to $w$ miles/second, coast 4 miles, and stop instantaneously. The challenge is to minimize the travel time. The constraint imposed by your fuel tank is that $v + w = 3$.
Solution: The total time spent is $1/v + 4/w$. We use a Lagrange multiplier to solve this problem:
\[ L(v, w, \lambda) = \frac{1}{v} + \frac{4}{w} + \lambda (v + w - 3), \]
\[ \frac{\partial L}{\partial v} = -\frac{1}{v^2} + \lambda = 0, \qquad \frac{\partial L}{\partial w} = -\frac{4}{w^2} + \lambda = 0. \]
Combining the two partial derivatives gives $4v^2 = w^2$. Combining this with the constraint gives $4v^2 = (3 - v)^2$, so $v = 1$ and $w = 2$ (we discard the solution $v = -3$ since $v, w \ge 0$, as we are discussing speed, not velocity). The travel time is $1/1 + 4/2 = 3$ seconds.
If this were a physics course, we might make this problem more realistic by assuming some nonzero coefficient of friction between the car and the surface on which it travels and some function that gives the rate of expenditure of fuel, among many other things.
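The Lagrange-multiplier answer in Example 1.4.4 can be cross-checked by eliminating $w = 3 - v$ and minimizing over $v$ directly. The sketch below uses a plain ternary search, which is a swapped-in numerical technique rather than the method of the notes; the function names and the search interval are assumptions for illustration.

```python
def travel_time(v, total=3.0):
    """Time to coast 1 mile at speed v and 4 miles at speed w = total - v."""
    w = total - v
    return 1 / v + 4 / w

def minimize_unimodal(f, lo, hi, iters=200):
    """Ternary search for the minimizer of a unimodal function on (lo, hi)."""
    for _ in range(iters):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if f(a) < f(b):
            hi = b   # the minimizer cannot lie to the right of b
        else:
            lo = a   # the minimizer cannot lie to the left of a
    return (lo + hi) / 2

# travel_time is convex on (0, 3), so ternary search applies.
v_star = minimize_unimodal(travel_time, 1e-6, 3 - 1e-6)
w_star = 3 - v_star
print(v_star, w_star, travel_time(v_star))
```

The search returns speeds close to $v = 1$ and $w = 2$, for a total time of $1/1 + 4/2 = 3$ seconds.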
Example 1.4.5 (infinite-dimensional: [1, Example 1.2.5, p. 4]). Now your rocket-propelled vehicle goes straight up and is subject to gravity. You expend fuel at a rate $u(t)$. Your goal is for the vehicle to reach height $h$ at time $T$ while expending minimum fuel.
Solution (full solution in [1, Example 5.9.3, p. 125]): Assuming unit mass, massless fuel, and the absence of aerodynamic forces, the motion of the rocket is governed by the second-order differential equation $\ddot h(t) = u(t) - g$. We cannot expend negative fuel and we want to reach height $h$ at time $T$, so we have the constraints
\[ u(t) \ge 0, \qquad h(T) = \int_0^T \int_0^t \left( u(\tau) - g \right) d\tau\, dt = h, \]
where we assume in the second constraint that $h(0) = \dot h(0) = 0$; that is, we start at ground level with zero velocity. We want to minimize the total fuel expended: $\int_0^T u(t)\, dt$.
The optimal solution $u(t)$ is not necessarily continuous or bounded: for example, it might (and actually does) consist of an impulse at time $t = 0$. In that case, in order to work with a function that is at least bounded, it might be a better idea to work with $v(t) = \int_0^t u(\tau)\, d\tau$, the total amount of fuel expended during the interval $(0, t]$. This insight is closely related to the RIESZ REPRESENTATION THEOREM.
Example 1.4.6 (finite-dimensional). Your job is to invent the most exciting scratch ticket. If an outcome has probability $p$ of occurring, the excitement that results when that outcome occurs is proportional to $\log \frac{1}{p}$. For example, the excitement of rolling 6s on two dice, $\log 36 = 2 \log 6$, is precisely twice the excitement of rolling 6 on one die. The ticket can pay \$$k$ for $k = 0, 1, 2$. You must assign probabilities $p_k$ for $k = 0, 1, 2$ to these three outcomes. So that the state can make a nice profit, the expected payoff must be $4/7$ of a dollar.
Solution: Our two constraints are
\[ \sum_{k=0}^{2} p_k = 1, \qquad \sum_{k=0}^{2} k p_k = \frac{4}{7}. \]
We want to maximize the quantity
\[ H(X) = -\sum_{k=0}^{2} p_k \log p_k. \tag{1.4.1} \]
($H(X)$, the excitement function, is the information entropy or Shannon entropy of the random variable $X$, the payoff of the lottery ticket.) We can substitute the constraints into (1.4.1) to turn this into a single-variable problem, where we maximize
\[ -\left( \frac{3}{7} + p_2 \right) \log \left( \frac{3}{7} + p_2 \right) - \left( \frac{4}{7} - 2 p_2 \right) \log \left( \frac{4}{7} - 2 p_2 \right) - p_2 \log p_2 \]
over $p_2$. However, this would be messy, so we just use Lagrange multipliers:
\[ L(p_0, p_1, p_2, \lambda_1, \lambda_2) = -\sum_{k=0}^{2} p_k \log p_k - \lambda_1 \left( \sum_{k=0}^{2} p_k - 1 \right) - \lambda_2 \left( \sum_{k=0}^{2} k p_k - \frac{4}{7} \right), \]
\[ \frac{\partial L}{\partial p_k} = -1 - \log p_k - \lambda_1 - \lambda_2 k = 0, \quad k = 0, 1, 2. \]
Therefore, $p_k = e^{-1 - \lambda_1} e^{-\lambda_2 k}$. Substituting this form into the two constraints, some algebra tells us that $\lambda_1 = \log \frac{7}{4} - 1$ and $\lambda_2 = \log 2$, so $p_k = 2^{2-k}/7$; that is, $(p_0, p_1, p_2) = (4/7, 2/7, 1/7)$, which sums to 1 and has expected payoff $2/7 + 2/7 = 4/7$.
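The single-variable form of Example 1.4.6, described above as messy to handle analytically, is straightforward to scan numerically. This is a minimal sketch under stated assumptions: the helper names and the grid resolution are illustrative, and natural logarithms are used throughout. Once $p_2$ is chosen, the two constraints force $p_0 = 3/7 + p_2$ and $p_1 = 4/7 - 2p_2$.

```python
import math

def entropy(probs):
    """Shannon entropy -sum p log p (natural log), with 0 log 0 taken as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def ticket(p2):
    """Probabilities (p0, p1, p2) forced by the two constraints once p2 is chosen."""
    return (3 / 7 + p2, 4 / 7 - 2 * p2, p2)

# p1 >= 0 requires 0 <= p2 <= 2/7; scan that interval on a fine grid.
steps = 20_000
grid = [(2 / 7) * i / steps for i in range(steps + 1)]
best_p2 = max(grid, key=lambda p2: entropy(ticket(p2)))
best = ticket(best_p2)

print(best)                                       # close to (4/7, 2/7, 1/7)
print(sum(best), best[1] + 2 * best[2])           # constraints: 1 and 4/7
```

The maximizer found on the grid is close to $(4/7, 2/7, 1/7)$, a geometric sequence with ratio $1/2$, which is the shape the Lagrange conditions predict.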
Example 1.4.7 (infinite-dimensional). Your job is again to invent the most exciting scratch ticket. The ticket can have any nonnegative integer payoff. You need to assign probabilities $p_k$ for each integer $k \ge 0$ to these possible outcomes. Make the expected payoff be \$1, since the tickets will sell for \$1.50.
Solution: Now we are trying to maximize over an infinite number of variables. Obviously we cannot use substitution as we could in the previous example, since we still have an infinite number of variables after taking into account the two constraints
\[ \sum_{k=0}^{\infty} p_k = 1, \qquad \sum_{k=0}^{\infty} k p_k = 1. \]
There are two LINEAR FUNCTIONALS in the dual space acting as constraints, so the CODIMENSION of this problem is 2. We can still solve the problem using Lagrange multipliers:
\[ L(p_0, p_1, \dots, \lambda_1, \lambda_2) = -\sum_{k=0}^{\infty} p_k \log p_k - \lambda_1 \left( \sum_{k=0}^{\infty} p_k - 1 \right) - \lambda_2 \left( \sum_{k=0}^{\infty} k p_k - 1 \right), \]
\[ \frac{\partial L}{\partial p_k} = -1 - \log p_k - \lambda_1 - \lambda_2 k = 0, \quad k = 0, 1, 2, \dots \]
Therefore, $p_k = e^{-1 - \lambda_1} e^{-\lambda_2 k}$. We recognize this as having the form of a geometric distribution
\[ p_k = p (1 - p)^k \]
(note that this is a geometric distribution with support $\{0, 1, 2, \dots\}$, not $\{1, 2, \dots\}$). Since the mean of a geometric distribution is $(1 - p)/p$, we have $p = 1/2$, so $\lambda_1 = \log 2 - 1$, $\lambda_2 = \log 2$, and $p_k = 1/2^{k+1}$.
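The geometric answer to Example 1.4.7 can be sanity-checked by truncating the infinite sums. In the sketch below, the truncation point `K = 200` and the comparison distribution (uniform on payoffs 0, 1, 2, which also has mean 1) are assumptions chosen for illustration; the check shows that one alternative does worse, not that the geometric distribution is optimal.

```python
import math

def geometric(k, p=0.5):
    """P(X = k) = p (1 - p)^k on k = 0, 1, 2, ..."""
    return p * (1 - p) ** k

K = 200  # truncation point (assumed); the neglected tail is of order 2**-200
probs = [geometric(k) for k in range(K)]

total = sum(probs)                                    # should be very close to 1
mean = sum(k * pk for k, pk in enumerate(probs))      # should be very close to 1
entropy = -sum(pk * math.log(pk) for pk in probs)     # equals 2 log 2 for p = 1/2

# A competing ticket with the same mean: uniform on payoffs 0, 1, 2.
uniform3_entropy = math.log(3)

print(total, mean, entropy, uniform3_entropy)
```

The truncated sums reproduce both constraints, and the entropy $2 \log 2 \approx 1.386$ exceeds the $\log 3 \approx 1.099$ of the uniform alternative.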
1.5 Minimum Norm Problems
Minimum norm problems arise in several instances, as shown by the following examples.
Example 1.5.1. You are visiting a friend who lives at the point $(x, y) = (2, -1)$. A bus drives through town along the main highway, a subspace whose equation is $x - 2y = 0$.
- Where do you get off the bus in order to minimize your walking distance to the friend's house?
- Suppose you take a taxicab, which can travel only north-south or east-west. The cab driver charges for the driving distance. Where do you get out of the cab in order to minimize your cost?
- What if the driver only charges for the larger dimension?
Figure 1.5.1: minimizing the distance between the highway and the house for the various norms in Example 1.5.1
Solution: In the first problem, we are minimizing the 2-NORM or EUCLIDEAN NORM. We can solve the problem geometrically by expanding a circle centered at the house until it is tangent to the road, as shown in Figure 1.5.1. The point of tangency is the solution to the system
\[ \begin{cases} x - 2y = 0 & \text{(highway)} \\ y + 1 = -2(x - 2) & \text{(normal to the highway through the point } (2, -1)\text{)} \end{cases} \]
or $(x, y) = (6/5, 3/5)$. The minimum cost (assuming unit cost) is then $\sqrt{(4/5)^2 + (8/5)^2} = 4/\sqrt{5}$. Note that we were implicitly invoking the PROJECTION THEOREM ([1], Theorem 2, p. 51) in knowing that we minimize the distance by finding the point on the highway through which the normal through $(2, -1)$ passes.
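The Euclidean case of Example 1.5.1 is an orthogonal projection onto a one-dimensional subspace, which can be sketched in a few lines. This illustration assumes the house is at $(2, -1)$ and the highway is the line $x - 2y = 0$, the reading consistent with the stated answer $(6/5, 3/5)$; the function and variable names are hypothetical.

```python
def project_onto_line(point, direction):
    """Orthogonal projection of `point` onto the line through the origin
    spanned by `direction`."""
    px, py = point
    dx, dy = direction
    t = (px * dx + py * dy) / (dx * dx + dy * dy)
    return (t * dx, t * dy)

house = (2.0, -1.0)    # assumed location of the friend's house
highway = (2.0, 1.0)   # a direction vector of the line x - 2y = 0

foot = project_onto_line(house, highway)
residual = (house[0] - foot[0], house[1] - foot[1])
distance = (residual[0] ** 2 + residual[1] ** 2) ** 0.5

print(foot, distance)  # approximately (1.2, 0.6) and 4/sqrt(5)
```

The residual vector from the foot of the perpendicular to the house is orthogonal to the highway, which is the geometric content of the projection theorem.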
In the second problem, we are minimizing the 1-norm or TAXICAB NORM. We can solve the problem geometrically by expanding a diamond centered at the house until it touches the road, as shown in Figure 1.5.1. The point where the diamond touches the road is the solution to the system $x - 2y = 0$, $x = 2$, or $(x, y) = (2, 1)$. The minimum cost is then $|2 - 2| + |1 - (-1)| = 2$. In this case, the projection theorem with the ordinary concept of "perpendicular" does not apply, but we could have used [1, Theorem 5.8.1, p. 119].
In the third problem, we are minimizing a type of MINKOWSKI FUNCTIONAL (see [1, p. 131]). We can solve the problem geometrically by expanding a square around the house until it touches the road, as we did before. The factor by which you must expand the unit square in order to make it touch the road, $4/3$, is the minimum cost.
Example 1.5.2 (minimum norms and convex sets). Assume that the Island of Sodor is a convex set (it actually is not; see Figure 1.5.2) whose shore is a smooth curve. Reverend Awdry is offshore and wants to swim to the closest point on the island.
Figure 1.5.2: The Island of Sodor is actually not a convex set (Example 1.5.2)!
Solution: We solve the problem geometrically by expanding a circle centered at Reverend Awdry's location until it just touches the island. Let the radius of the circle be $d$ and the point of tangency between the circle and the convex set be $p_t$. Consider any other tangent to the circle, where the point of tangency is very close to $p_t$: this tangent will cut across the interior of the island, otherwise either it would not be a tangent or the set would not be convex. Therefore, $d$ is the minimum distance from Reverend Awdry's location to the shore. Now consider all hyperplanes (lines) that separate Reverend Awdry from every point on the island. The reverend is farthest away from the hyperplane tangent to the island that we just constructed.
This is a DUALITY theorem: the distance $d$ solves a minimization problem over points on the island and also solves a maximization problem over elements of the dual space. The theorem is proved in [1] (Theorem 1, p. 136) with no restrictions except convexity. That is:
- The shore of the island need not be smooth.
- The vector space can be infinite-dimensional.
- The norm need not be the Euclidean norm.
An important class of minimum norm problems is finding a polynomial that best approximates a function in some interval. Before taking a brief look at approximation problems, we first define a LINEAR FUNCTIONAL:
Definition 1.5.3 (linear functional). Let $V$ be a vector space over a field $F$. A linear functional is a map $f : V \to F$ such that $f(v + w) = f(v) + f(w)$ for all $v, w \in V$ and $f(av) = a f(v)$ for all $v \in V$ and all $a \in F$.
Example 1.5.4. Consider the infinite-dimensional vector space $V$ of all continuous functions $f$ on $[-1, 1]$. Let $W \subset V$ be the 4-dimensional subspace of polynomials $p$ for which $\deg p \le 3$. Which of the following are linear functionals?
- $L : f \mapsto f(-1)$
- $L : f \mapsto f(0)$
- $L : f \mapsto f(1)$
- $L : f \mapsto \int_{-1}^{1} f(x)\, dx$
Solution: The first three are linear functionals since, at the evaluation point $x$ in question,
\[ L(f + g) = (f + g)(x) = f(x) + g(x) = L(f) + L(g), \qquad L(af) = a f(x) = a L(f). \]
The last is also a linear functional since
\[ L(f + g) = \int_{-1}^{1} \left( f(x) + g(x) \right) dx = \int_{-1}^{1} f(x)\, dx + \int_{-1}^{1} g(x)\, dx = L(f) + L(g), \]
\[ L(af) = \int_{-1}^{1} a f(x)\, dx = a \int_{-1}^{1} f(x)\, dx = a L(f). \]
Note that these four functionals acting on $V$ are clearly linearly independent, as we cannot write one as a linear combination of the others for every continuous function $f$ on $[-1, 1]$. If we restrict ourselves to $W$ and write $f(x) = \sum_{k=0}^{3} a_k x^k$, then
\[ f(-1) = a_0 - a_1 + a_2 - a_3, \qquad f(0) = a_0, \qquad f(1) = a_0 + a_1 + a_2 + a_3, \]
\[ \int_{-1}^{1} f(x)\, dx = \left[ a_0 x + \frac{a_1}{2} x^2 + \frac{a_2}{3} x^3 + \frac{a_3}{4} x^4 \right]_{-1}^{1} = 2 a_0 + \frac{2}{3} a_2. \]
Since
\[ \det \begin{pmatrix} 1 & -1 & 1 & -1 \\ 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 \\ 2 & 0 & \frac{2}{3} & 0 \end{pmatrix} = 0, \]
the functionals are linearly dependent. More elegantly, we might have remembered that Simpson's rule is exact for polynomials $p$ for which $\deg p \le 3$, so we can write
\[ \int_{-1}^{1} f(x)\, dx = \frac{1}{3} \left( f(-1) + 4 f(0) + f(1) \right). \]
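The linear dependence in Example 1.5.4 can be confirmed with exact arithmetic. The sketch below is an illustration with hypothetical helper names: it checks that the row of the integration functional is the Simpson combination of the three evaluation rows, and that Simpson's rule reproduces the exact integral of a sample cubic.

```python
from fractions import Fraction

def exact_integral(a):
    """Integral over [-1, 1] of a0 + a1*x + a2*x^2 + a3*x^3, i.e. 2*a0 + (2/3)*a2."""
    a0, a1, a2, a3 = a
    return 2 * a0 + Fraction(2, 3) * a2

def simpson(a):
    """(f(-1) + 4*f(0) + f(1)) / 3 for the cubic with coefficients a."""
    def f(x):
        return sum(Fraction(c) * x**k for k, c in enumerate(a))
    return (f(-1) + 4 * f(0) + f(1)) / 3

# Rows: f(-1), f(0), f(1), and the integral, written in the basis 1, x, x^2, x^3
rows = [
    [1, -1, 1, -1],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [2, 0, Fraction(2, 3), 0],
]
# The fourth row equals (row1 + 4*row2 + row3)/3, so the determinant vanishes.
combo = [Fraction(r1 + 4 * r2 + r3, 3) for r1, r2, r3 in zip(*rows[:3])]

print(combo == rows[3])
print(simpson((1, 2, 3, 4)), exact_integral((1, 2, 3, 4)))
```

Because one row is a linear combination of the others, the determinant must be zero, which matches the Simpson's rule argument.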
Example 1.5.5. You have lost your scientific calculator and can only evaluate polynomials $p(x)$. You need to compute approximate values of $f(x) = \sin x$ for randomly chosen values $x \in [0, 1]$. Here are three norms that might be used to choose the best approximating polynomial:
- $L_1$-norm: $\int_0^1 |f(x) - p(x)|\, dx$
- $L_2$-norm: $\left( \int_0^1 (f(x) - p(x))^2\, dx \right)^{1/2}$
- $L_\infty$-norm: $\max_{0 \le x \le 1} |f(x) - p(x)|$
For which of these norms is the Taylor polynomial the right choice for $p(x)$? Under what circumstances might one or another of these norms be the appropriate choice?
Solution: The Taylor polynomial is actually not the correct choice for any of these norms. It turns out that interpolating polynomials arise in minimizing the 1-norm, Fourier series in minimizing the 2-norm, and CHEBYSHEV POLYNOMIALS in minimizing the $\infty$-norm. See [3] for a complete textbook on approximation theory.
The following theorem provides reassurance that under reasonably broad conditions, functions can be approximated by polynomials to arbitrary accuracy.
THEOREM 1.5.6 (Weierstrass Approximation Theorem). If $f$ is a continuous real-valued function on $[a, b]$, then for any $\epsilon > 0$ there exists a polynomial $p$ on $[a, b]$ such that $|f(x) - p(x)| < \epsilon$ for all $x \in [a, b]$.
Proof: By rescaling $x$, we may take $[a, b] = [0, 1]$. Define the $n + 1$ Bernstein basis polynomials of degree $n$ as
\[ b_{k,n}(x) = \binom{n}{k} x^k (1 - x)^{n-k}, \quad k = 0, \dots, n. \]
Then a linear combination of Bernstein basis polynomials
\[ B(x) = \sum_{k=0}^{n} \beta_k b_{k,n}(x) \]
is called a Bernstein polynomial, and $\beta_0, \dots, \beta_n$ are called the Bernstein or Bézier coefficients. The proof of the Weierstrass approximation theorem is constructive: we will construct a sequence of Bernstein polynomials that converges uniformly to $f$.
Define the Bernstein polynomial
\[ B_n(f)(x) = \sum_{k=0}^{n} f\!\left( \frac{k}{n} \right) \binom{n}{k} x^k (1 - x)^{n-k}, \quad x \in [0, 1]. \]
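Before continuing with the estimate, it may help to see the Bernstein polynomial $B_n(f)$ in action. The following sketch evaluates the defining sum directly; the test function $|x - 1/2|$, the degrees tried, and the evaluation grid are assumptions chosen for illustration, and the grid maximum is only a stand-in for the true supremum.

```python
import math

def bernstein(f, n, x):
    """Evaluate B_n(f)(x) = sum_k f(k/n) C(n,k) x^k (1-x)^(n-k) for x in [0, 1]."""
    return sum(
        f(k / n) * math.comb(n, k) * x**k * (1 - x) ** (n - k)
        for k in range(n + 1)
    )

def sup_error(f, n, samples=200):
    """Largest |f(x) - B_n(f)(x)| over an evenly spaced grid on [0, 1]."""
    xs = [i / samples for i in range(samples + 1)]
    return max(abs(f(x) - bernstein(f, n, x)) for x in xs)

# Continuous but not differentiable at 1/2 (an arbitrary test function).
f = lambda x: abs(x - 0.5)

errors = [sup_error(f, n) for n in (10, 40, 160)]
print(errors)  # the grid error shrinks as n grows, though only slowly
```

The slow decay of the error is typical of Bernstein polynomials: they are convenient for a constructive proof, not for efficient approximation.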
$f$ is continuous and $[0, 1]$ is compact, so $f$ is uniformly continuous on $[0, 1]$ ([2, Theorem 4.19]). Therefore, given any $\epsilon > 0$, there exists a $\delta > 0$ such that $|x - y| < \delta$ implies $|f(x) - f(y)| < \epsilon/2$. Because $f$ is continuous and $[0, 1]$ is compact, we can also conclude that $f$ is bounded ([2, Theorem 4.15]); say $\sup_{x \in [0,1]} |f(x)| = M$. Now, since the binomial weights $\binom{n}{k} x^k (1 - x)^{n-k}$ sum to 1,
\[ f(x) - B_n(f)(x) = \sum_{k=0}^{n} \left( f(x) - f\!\left( \frac{k}{n} \right) \right) \binom{n}{k} x^k (1 - x)^{n-k}, \]
so
\[ |f(x) - B_n(f)(x)| \le \underbrace{\sum_{k \in S} \left| f(x) - f\!\left( \frac{k}{n} \right) \right| \binom{n}{k} x^k (1 - x)^{n-k}}_{A} + \underbrace{\sum_{k \in T} \left| f(x) - f\!\left( \frac{k}{n} \right) \right| \binom{n}{k} x^k (1 - x)^{n-k}}_{B}, \]
where $S = \{ j : |x - j/n| \ge n^{-1/4},\ j \in [0, n] \}$ and $T = \{ j : |x - j/n| < n^{-1/4},\ j \in [0, n] \} = [0, n] \setminus S$.
Let $X \sim \mathrm{Binom}(n, x)$, so that $P(X = k) = \binom{n}{k} x^k (1 - x)^{n-k}$ and $\sigma^2 = \operatorname{var}(X/n) = x(1 - x)/n \le 1/(4n)$. Since $|f(x) - f(k/n)| \le 2M$, Chebyshev's inequality gives
\[ A \le 2M\, P\!\left( \left| \frac{X}{n} - x \right| \ge n^{-1/4} \right) \le 2M \frac{x(1 - x)}{n \cdot n^{-1/2}} \le \frac{M}{2\sqrt{n}}. \]
Equivalently, $A < \epsilon/2$ if $n > M^2/\epsilon^2$. Furthermore, if $|x - k/n| < n^{-1/4}$ and $n^{-1/4} < \delta$, then $|x - k/n| < \delta$ and $|f(x) - f(k/n)| < \epsilon/2$, since $f$ is uniformly continuous on $[0, 1]$. Because the binomial weights sum to at most 1, $B$ is then less than $\epsilon/2$. We conclude that if $n > \max(M^2/\epsilon^2, \delta^{-4})$, then $|f(x) - B_n(f)(x)| \le A + B < \epsilon$ for all $x \in [0, 1]$; that is, the Bernstein polynomials $B_n(f)$ converge uniformly to $f$.
Example 1.5.7 (Gibbs Phenomenon). Find a discontinuous real-valued function for which such a polynomial does not exist.
Solution (based on [4]): Consider the function $f(x) = \operatorname{sgn}(\sin x)$, where
\[ \operatorname{sgn} x = \begin{cases} 1 & x > 0 \\ 0 & x = 0 \\ -1 & x < 0. \end{cases} \]
Clearly $f$ is discontinuous at $x = k\pi$, for $k \in \mathbb{Z}$. Consider the partial Fourier series:
\[ S_N f(x) = \frac{4}{\pi} \sum_{k=1}^{N} \frac{\sin((2k - 1)x)}{2k - 1}. \]
Figure 1.5.3: Notice how $S_N f(x)$ overshoots $f(x)$ near $x = 0$, even as $N \to \infty$. This discrepancy is known as the Gibbs phenomenon or ringing artifacts (Example 1.5.7).
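By the Weierstrass approximation theorem, we have $\lim_{N \to \infty} |f(x) - S_N f(x)| = 0$ for all $x$ in some closed interval not containing a multiple of $\pi$.

However, we notice in Figure 1.5.3 that $S_N f(x)$ exhibits a bump whenever $x \to k\pi$. In particular, $S_N f(x)$ reaches a critical point whenever
$$(S_N f)'(x) = \frac{4}{\pi} \sum_{k=1}^{N} \cos((2k - 1)x) = 0.$$
To find the zeros of $(S_N f)'(x)$, we first show that
$$2 \sin x \sum_{k=1}^{N} \cos((2k - 1)x) = \sin(2Nx). \qquad (1.5.1)$$
The proof is by induction. For $N = 1$, Equation (1.5.1) reduces to the identity $\sin(2x) = 2 \sin x \cos x$. Now assume Equation (1.5.1) is true for general $N$ and show that this implies it is true for $N + 1$. We have
$$\begin{aligned}
2 \sin x \sum_{k=1}^{N+1} \cos((2k - 1)x) &= 2 \sin x \left[ \sum_{k=1}^{N} \cos((2k - 1)x) + \cos((2N + 1)x) \right] \\
&\overset{(a)}{=} \sin(2Nx) + 2 \sin x \cos((2N + 1)x) \\
&= \sin(2Nx) + 2 \sin x \cos(2Nx) \cos x - 2 \sin x \sin(2Nx) \sin x \\
&\overset{(b)}{=} (1 - 2 \sin^2 x) \sin(2Nx) + \sin(2x) \cos(2Nx) \\
&\overset{(c)}{=} \cos(2x) \sin(2Nx) + \sin(2x) \cos(2Nx) \\
&\overset{(d)}{=} \sin(2(N + 1)x),
\end{aligned}$$
where (a) follows from the inductive hypothesis, (b) follows from the identity $\sin(2x) = 2 \sin x \cos x$, (c) follows from the identity $\cos(2x) = \cos^2 x - \sin^2 x = (1 - \sin^2 x) - \sin^2 x = 1 - 2 \sin^2 x$, and (d) follows from the identity $\sin(a + b) = \cos a \sin b + \sin a \cos b$.

Using this fact, we see that $S_N f$ reaches a critical point whenever $\sin x \, (S_N f)'(x) = \frac{2}{\pi} \sin(2Nx) = 0$ with $\sin x \ne 0$, that is, whenever $x$ is a multiple of $\pi/2N$ but not a multiple of $\pi$. Consider the closest critical points to $x = 0$: $x = \pm \pi/2N$. Then we have
$$S_N f\!\left(\frac{\pi}{2N}\right) = \frac{4}{\pi} \sum_{k=1}^{N} \frac{\sin\!\left(\frac{(2k - 1)\pi}{2N}\right)}{2k - 1} = \frac{2}{\pi} \sum_{k=1}^{N} \frac{\sin\!\left(\frac{(2k - 1)\pi}{2N}\right)}{\frac{(2k - 1)\pi}{2N}} \cdot \frac{\pi}{N}.$$
The last sum is a Riemann sum taken over the midpoints of the partition $\left\{ \left[ \frac{2k\pi}{2N}, \frac{2(k + 1)\pi}{2N} \right],\ k = 0, \ldots, N - 1 \right\}$ of $[0, \pi]$. Therefore,
$$\lim_{N \to \infty} S_N f\!\left(\frac{\pi}{2N}\right) = \frac{2}{\pi} \int_0^{\pi} \frac{\sin x}{x} \, dx \approx 1.1790 \ne \lim_{x \to 0^+} f(x) = 1.$$
A similar analysis at $x = -\pi/2N$ gives a limit of approximately $-1.1790 \ne \lim_{x \to 0^-} f(x) = -1$. Therefore, the sequence of trigonometric polynomials $\{S_N f\}$ does not converge uniformly to $f$: the overshoot near each discontinuity never shrinks. However, we do see that the sequence $\{S_N f\}$ does converge to $f$ for the $L_1$- and $L_2$-norms, though it does not for the $L_\infty$-norm. ∎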
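The overshoot in Example 1.5.7 can be checked numerically. The sketch below evaluates the partial Fourier sum at its first critical point $x = \pi/2N$ and compares it with a midpoint-rule estimate of $\frac{2}{\pi}\int_0^\pi \frac{\sin x}{x}\,dx$; the choices of $N$ and of the number of integration steps are assumptions made for illustration.

```python
from math import sin, pi

def partial_sum(x, N):
    """S_N f(x) = (4/pi) * sum_{k=1}^N sin((2k-1)x) / (2k-1)."""
    return 4 / pi * sum(sin((2 * k - 1) * x) / (2 * k - 1)
                        for k in range(1, N + 1))

def sine_integral_limit(steps=100_000):
    """Midpoint-rule estimate of (2/pi) * integral_0^pi sin(x)/x dx."""
    h = pi / steps
    return 2 / pi * sum(sin((i + 0.5) * h) / ((i + 0.5) * h) * h
                        for i in range(steps))

# Height of the first bump to the right of x = 0 for increasing N.
peaks = [partial_sum(pi / (2 * N), N) for N in (10, 100, 1000)]
limit = sine_integral_limit()
```

The peak heights should stay near 1.179 rather than approaching 1, which is the failure of $L_\infty$ convergence described above.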
CHAPTER 2
PRELIMINARIES IN ALGEBRA, TOPOLOGY, AND ANALYSIS
Reading: [1, Sections 2.1–2.9]. Also see [5, Chapters 1–2] for more on the linear algebra covered in this chapter, [6, Chapter 2] for more on the topology, and [2, Chapters 2 and 4] for more on the topology and analysis.
2.1 Vector Spaces

Definition 2.1.1 (vector space). A vector space $V$ over a field $F$ is a set $V$ along with two operations: addition, which associates with any two vectors $u, v \in V$ a vector $u + v \in V$, and scalar multiplication, which associates with any vector $v \in V$ and any scalar $a \in F$ a vector $av \in V$. The following axioms hold:

1. commutative law for vector addition: $u + v = v + u$ for all $u, v \in V$.
2. associative law for vector addition: $(u + v) + w = u + (v + w)$ for all $u, v, w \in V$.
3. existence of an additive identity: There exists a null vector $\mathbf{0} \in V$ such that $v + \mathbf{0} = v$ for all $v \in V$.
4. distributive law for vector addition: $a(u + v) = au + av$ for all $u, v \in V$ and all $a \in F$.
5. distributive law for scalar addition: $(a + b)v = av + bv$ for all $v \in V$ and all $a, b \in F$.
6. associative law for scalar multiplication: $(ab)v = a(bv)$ for all $v \in V$ and all $a, b \in F$.
7. existence of a multiplicative identity: $1v = v$ for all $v \in V$. Also, $0v = \mathbf{0}$ for all $v \in V$.
Unless specified otherwise, we will assume we are working with real vector spaces ($F = \mathbb{R}$).

Example 2.1.2. Note that [1] replaces a standard axiom, the existence of an additive inverse, with the axiom $0v = \mathbf{0}$ for all $v \in V$. Prove the existence of an additive inverse from [1]'s axioms.

Solution: We have
$$\mathbf{0} \overset{(a)}{=} 0v = (1 + (-1))v \overset{(b)}{=} 1v + (-1)v \overset{(c)}{=} v + (-1)v,$$
where (a) follows from [1]'s axiom $0v = \mathbf{0}$ for all $v \in V$, (b) follows from the distributive law for scalar addition, and (c) follows from the definition of a multiplicative identity. Therefore, we define the additive inverse $-v = (-1)v$. ∎
Example 2.1.3. Which axiom on the list can be proved from the others?

Solution: We can prove the commutative law from the other axioms. We have
$$\mathbf{0} \overset{(a)}{=} 0(u + v) = ((-1) + 1)(u + v) \overset{(b)}{=} (-1)(u + v) + 1(u + v) \overset{(c)}{=} (-1)u + (-1)v + 1u + 1v \overset{(d)}{=} -u - v + u + v,$$
where (a) follows from the fact that $0v = \mathbf{0}$, (b) follows from the distributive law for scalar addition, (c) follows from the distributive law for vector addition, and (d) follows from the existence of additive inverses, as shown above, and the definition of the multiplicative identity. Then we add $u$ to the left of both sides to get $u = u + \mathbf{0} = u - u - v + u + v = -v + u + v$; adding $v$ to the left of both sides in the same way gives $v + u = u + v$. ∎

How do we know that we cannot prove another axiom from the remaining ones? We can try defining vector addition and scalar multiplication in different ways and seeing if the resulting operators still obey the axioms for a vector space. If not, then the axioms that are not satisfied are independent of the axioms that are.

Example 2.1.4. Let $V = \mathbb{R}^2$. Define addition as usual, but define scalar multiplication by $a(x, y) = (ax, 0)$ for $(x, y) \in V$ and $a \in \mathbb{R}$. Is $V$ a vector space? If not, what axiom is not satisfied?

Solution: No. A multiplicative identity does not exist in this structure since there exists no $a$ such that $a(x, y) = (x, y)$ for all $(x, y)$. Therefore, the existence of a multiplicative identity is independent of the other axioms. ∎

Example 2.1.5. For the NSA budget, let $V = \mathbb{R} \cup \{s\}$. The symbol $s$ denotes a secret amount. For elements in $V$ other than $s$, addition and multiplication are defined as usual. However, $s + v = v + s = s$ for all $v \in V$ and $as = s$ for all $a \in \mathbb{R}$. Is $V$ a vector space? If not, what axiom is not satisfied?

Solution: No, because the axiom $0v = \mathbf{0}$ is not satisfied. Clearly $0s = s \ne 0$. ∎

Below are some potential vector spaces:
- All $2 \times 3$ matrices with real entries: This is equivalent to $\mathbb{R}^6$, which is a 6-dimensional vector space
- All infinite sequences: infinite-dimensional vector space
- All bounded infinite sequences: infinite-dimensional vector space (since the sum of two bounded sequences is bounded and a bounded sequence times a constant is still bounded)
- All infinite sequences that converge to zero: infinite-dimensional vector space
- All infinite sequences with only finitely many nonzero terms: infinite-dimensional vector space
- All infinite sequences for which the terms form a convergent series: infinite-dimensional vector space
- All functions $f : \{A, B, C\} \to \mathbb{R}$: 3-dimensional vector space
- All polynomials of any (finite) degree: infinite-dimensional vector space
- All polynomials $p$ with $\deg p = 3$: not a vector space (consider $f(x) = x^3$ and $g(x) = -x^3$: $\deg(f + g) \ne 3$)
- All power series: infinite-dimensional vector space
- All continuous functions on $[0, 1]$: infinite-dimensional vector space
- All linear functions $f : \mathbb{R}^3 \to \mathbb{R}$: 3-dimensional vector space (since $f$ is linear, it is determined by the three values $f(1, 0, 0)$, $f(0, 1, 0)$, $f(0, 0, 1)$)

Note that we nested SUBSPACES inside each other:

Definition 2.1.6 (vector subspace). A nonempty subset $U$ of a vector space $V$ over $F$ is called a subspace of $V$ if for any $u, v \in U$ and $a, b \in F$, $au + bv \in U$. That is, $U$ is a subspace of $V$ if $U$ is closed under vector addition and scalar multiplication.

A subspace must be nonempty, so it must contain at least one element $u$. Taking $a = b = 0$ in the definition gives $0u + 0u = \mathbf{0} \in U$. Therefore, every subspace contains $\mathbf{0}$.

Proposition 2.1.7 ([1, Proposition 2.3.1, p. 15]). If $U_1$ and $U_2$ are subspaces of $V$, so is $U_1 \cap U_2$.

Proof: $\mathbf{0} \in U_1, U_2$ since $U_1$ and $U_2$ are subspaces, so $\mathbf{0} \in U_1 \cap U_2$. Therefore, $U_1 \cap U_2$ is nonempty. If $u, v \in U_1 \cap U_2$, then $u, v \in U_1$ and $u, v \in U_2$. Then $au + bv \in U_1, U_2$ since $U_1$ and $U_2$ are subspaces, so $au + bv \in U_1 \cap U_2$. ∎
While $U_1 \cap U_2$ is a subspace if $U_1$ and $U_2$ are subspaces, it is not necessarily true that $U_1 \cup U_2$ is a subspace. Consider any two distinct lines in $\mathbb{R}^2$ that pass through the origin. These lines are both subspaces, but their union is not closed under addition: the smallest subspace containing both of them is $\mathbb{R}^2$.

Definition 2.1.8 (sum of vector spaces). The sum of two subsets $U_1$ and $U_2$ of a vector space is the set of all sums $u_1 + u_2$, where $u_1 \in U_1$ and $u_2 \in U_2$.
Example 2.1.9. What are the sum and difference of two squares, one centered at a and with side length 2r and the other centered at b and with side length 2s?
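Solution: We skip ahead a little and assume knowledge of NORMED VECTOR SPACES. Let the first square be the set $A = \{x : \|x - a\|_\infty \le r\}$ and the second be $B = \{y : \|y - b\|_\infty \le s\}$. Consider any $x \in A$ and $y \in B$. We have
$$\|(x + y) - (a + b)\|_\infty \le \|x - a\|_\infty + \|y - b\|_\infty \le r + s,$$
where the first inequality follows from the TRIANGLE INEQUALITY. Conversely, any $z$ with $\|z - (a + b)\|_\infty \le r + s$ can be written as $x + y$ with $x = a + \frac{r}{r + s}(z - a - b) \in A$ and $y = b + \frac{s}{r + s}(z - a - b) \in B$. Therefore, $A + B$ is a square centered at $a + b$ with side length $2(r + s)$. Furthermore, we have
$$\|(x - y) - (a - b)\|_\infty \le \|x - a\|_\infty + \|y - b\|_\infty \le r + s,$$
where again we use the triangle inequality, as well as the property that $\|ax\| = |a| \|x\|$ for any scalar $a \in F$ and any vector $x \in X$, where $X$ is a normed vector space. By the same argument, $A - B$ is a square centered at $a - b$ with side length $2(r + s)$. ∎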
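The conclusion of Example 2.1.9 can be sanity-checked numerically. Because both squares are convex polygons, the extreme points of their sum and difference come from sums and differences of corners, so it is enough to look at the bounding box of those corner combinations. This is only a sketch: the centers and half-side lengths below are hypothetical values, and the helper names are not from the text.

```python
from itertools import product

def square_corners(center, half_side):
    """Corners of the sup-norm ball {x : ||x - center||_inf <= half_side} in R^2."""
    cx, cy = center
    return [(cx + sx * half_side, cy + sy * half_side)
            for sx, sy in product((-1, 1), repeat=2)]

def bounding_box(points):
    """Return ((min_x, max_x), (min_y, max_y)) for a list of points."""
    xs, ys = zip(*points)
    return (min(xs), max(xs)), (min(ys), max(ys))

# Hypothetical squares: A has center a and side 2r, B has center b and side 2s.
a, r = (1.0, 2.0), 0.5
b, s = (-3.0, 0.0), 2.0

A, B = square_corners(a, r), square_corners(b, s)
sum_box = bounding_box([(x[0] + y[0], x[1] + y[1]) for x in A for y in B])
diff_box = bounding_box([(x[0] - y[0], x[1] - y[1]) for x in A for y in B])
```

With these values, both boxes have side length $2(r + s) = 5$, centered at $a + b = (-2, 2)$ and $a - b = (4, 2)$ respectively.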
Note that $A - B$ does not depend on our choice of the origin $o$ since $(a - o) - (b - o) = a - b$. This construction will be crucial to the proof of the EIDELHEIT SEPARATION THEOREM ([1], Theorem 3, p. 133).

Proposition 2.1.10 ([1, Proposition 2.3.2, p. 15]). If $U_1$ and $U_2$ are subspaces of $V$, then $U_1 + U_2$ is a subspace of $V$.

Proof: $U_1$ and $U_2$ contain $\mathbf{0}$, so $U_1 + U_2$ contains $\mathbf{0}$. Suppose $u_1, u_2 \in U_1 + U_2$. Then there exist vectors $v_1, v_2 \in U_1$ and $w_1, w_2 \in U_2$ such that $u_1 = v_1 + w_1$ and $u_2 = v_2 + w_2$. For any scalars $a, b \in F$, we have
$$au_1 + bu_2 = \underbrace{(av_1 + bv_2)}_{\in U_1} + \underbrace{(aw_1 + bw_2)}_{\in U_2};$$
since $au_1 + bu_2$ can be expressed as the sum of a vector in $U_1$ and a vector in $U_2$, it is in $U_1 + U_2$. Therefore, $U_1 + U_2$ is a subspace of $V$. ∎

Given any set $S$ of vectors in $V$, there will in general be many subspaces of $V$ that contain $S$. One is $V$ itself, but there may be smaller ones. For example, if $S = \{(1, 1, 0)\}$, then some examples of subspaces that contain $S$ are $\{(x, x, 0) : x \in \mathbb{R}\}$, $\{(x, y, 0) : x, y \in \mathbb{R}\}$, and $\{(x, y, z) : x, y, z \in \mathbb{R}\}$, which are of dimensions 1, 2, and 3, respectively. The smallest such subspace is given a special name:

Definition 2.1.11 (subspace generated by a subset). Suppose $S$ is a subset of a vector space $V$. The set $[S]$, called the subspace generated by $S$, consists of all vectors in $V$ which are linear combinations of vectors in $S$.

If a subspace $U$ contains $S$, it contains $[S]$. Equivalently, $[S]$ is the intersection of all subspaces that contain $S$. The usual way to construct $[S]$ is to form all linear combinations of vectors in $S$. This is clearly a subspace since it is closed under addition and scalar multiplication.

Example 2.1.12. Construct $[S] \subseteq \mathbb{R}^2$ if

- $S = \{(1, 2), (3, 6)\}$
- $S = \{(1, 2), (3, 3)\}$
- $S$ is the line segment from $(1, 2)$ to $(3, 6)$.

Solution: We have

- $[S] = \{(x, 2x) : x \in \mathbb{R}\}$ since $(3, 6)$ is a multiple of $(1, 2)$.
- $[S] = \mathbb{R}^2$ since $(1, 2)$ and $(3, 3)$ are linearly independent.
- $[S] = \{(x, 2x) : x \in \mathbb{R}\}$ since the line segment is part of the line $y = 2x$. ∎
SUBSPACE generalizes "line or plane through the origin." The generalization of "any line or plane" is LINEAR VARIETY, a subspace plus a constant vector:

Definition 2.1.13 (linear variety). The translation of a subspace is a linear variety or affine subspace.

Analogous to the subspace generated by a subset, we have the following:

Definition 2.1.14 (linear variety generated by a subset). Suppose $S$ is a nonempty subset of a vector space $V$. The set $v(S)$, called the linear variety generated by $S$, is the intersection of all linear varieties in $V$ that contain $S$.

Example 2.1.15. If $S \subset \mathbb{R}^3$ consists of the vectors $\{(0, 0, 1), (0, 1, 1)\}$, give examples of subspaces of two different dimensions that contain $S$. Which subspace is the smallest? What is $v(S)$?

Solution: The subspaces $\{(0, x, y) : x, y \in \mathbb{R}\}$ and $\{(x, y, z) : x, y, z \in \mathbb{R}\}$ are subspaces of dimensions 2 and 3, respectively, that contain $S$. The first subspace is the smallest. $v(S)$ is $(0, 0, 1) + \{(0, x, 0) : x \in \mathbb{R}\}$. ∎

Example 2.1.16. In $\mathbb{R}^3$, what is $v(S)$ if $S$ is a circle? Under what circumstances does $v(S) = [S]$?

Solution: $v(S)$ is the plane containing the circle. If the origin lies in the plane containing the circle, then $v(S) = [S]$. ∎
2.2 Convex Sets

Definition 2.2.1 (convex set). A set $K$ in a linear vector space is said to be convex if for any $x_1, x_2 \in K$, all elements in the set $\{\alpha x_1 + (1 - \alpha) x_2 : \alpha \in [0, 1]\}$ are in $K$.
Proposition 2.2.2. The empty set $\emptyset$ is convex.

Proof: To show that $\emptyset$ is not convex, we must find two vectors $x_1, x_2 \in \emptyset$ such that there exist vectors in the set $\{\alpha x_1 + (1 - \alpha) x_2 : 0 \le \alpha \le 1\}$ not in $\emptyset$. Since $\emptyset$ contains no vectors at all, it is convex. ∎

Proposition 2.2.3 ([1, Proposition 2.4.1, p. 18]). For any convex sets $K, L$ and any scalars $a, b$, $aK + bL$ is convex.

Proof: Consider the vectors $z_1, z_2 \in aK + bL$. There exist vectors $x_1, x_2 \in K$ and $y_1, y_2 \in L$ such that $z_1 = ax_1 + by_1$ and $z_2 = ax_2 + by_2$. For any $\alpha \in [0, 1]$, we have
$$\alpha z_1 + (1 - \alpha) z_2 = \alpha (ax_1 + by_1) + (1 - \alpha)(ax_2 + by_2) = a(\alpha x_1 + (1 - \alpha) x_2) + b(\alpha y_1 + (1 - \alpha) y_2).$$
Then $\alpha x_1 + (1 - \alpha) x_2$ and $\alpha y_1 + (1 - \alpha) y_2$ are in $K$ and $L$, respectively, since $K$ and $L$ are convex. Therefore, $\alpha z_1 + (1 - \alpha) z_2$ is in $aK + bL$. ∎

Figure 2.2.1: The union of the two disks is not convex.

In general, the union of two convex sets is not convex, as shown in Figure 2.2.1. However, we do have the following:

Proposition 2.2.4 ([1, Proposition 2.4.2, p. 18]). Let $\mathcal{C}$ be an arbitrary collection of convex sets. Then $\bigcap_{K \in \mathcal{C}} K$ is convex.

Proof: Let $C = \bigcap_{K \in \mathcal{C}} K$. If $C$ is empty, then the proof reduces to that of Proposition 2.2.2. Otherwise, assume we have $x_1, x_2 \in C$ and choose any $\alpha \in [0, 1]$. Then $x_1, x_2 \in K$ for all $K \in \mathcal{C}$ and since each $K$ is convex, $\alpha x_1 + (1 - \alpha) x_2 \in K$ for all $K \in \mathcal{C}$. Therefore, $\alpha x_1 + (1 - \alpha) x_2 \in C$ and $C$ is convex. ∎

Definition 2.2.5 (convex hull). Given an arbitrary set $S$ in a linear vector space, the convex hull or convex cover, denoted $\mathrm{co}(S)$, is the smallest convex set containing $S$.

We can express $\mathrm{co}(S)$ as the intersection of all convex sets containing $S$. Alternatively, we could express $\mathrm{co}(S)$ as the set of all convex combinations of vectors in $S$, where a convex combination is a linear combination $\sum_{k=1}^{|S|} \alpha_k x_k$, where $\alpha_k \ge 0$ for $k = 1, \ldots, |S|$ and $\sum_{k=1}^{|S|} \alpha_k = 1$.
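Proposition 2.2.6. Let $K$ be the set of vectors consisting of all convex combinations of vectors in $S = \{x_1, \ldots, x_n\}$. Show that $K = \mathrm{co}(S)$.

Proof: First we show that $K \subseteq \mathrm{co}(S)$. Let $K_m$ be the set of all convex combinations of the form $\sum_{i=1}^{m} \alpha_i x_{j_i}$, where $\sum_{i=1}^{m} \alpha_i = 1$, $\alpha_i \ge 0$ for $i = 1, \ldots, m$, and $m = 1, \ldots, n$. Clearly $K = \bigcup_{m=1}^{n} K_m$. Then in order to show $K \subseteq \mathrm{co}(S)$, we need to show that $K_m \subseteq \mathrm{co}(S)$ for $m = 1, \ldots, n$. The proof is by induction. This clearly holds for $m = 1$ since in that case $\alpha_1 = 1$ and the convex combinations are therefore just the elements of $S \subseteq \mathrm{co}(S)$. Now we assume that the result is true for general $m < n$ and show that this implies it is true for $m + 1$. Say we are given a convex combination $p = \sum_{i=1}^{m+1} \alpha_i x_{j_i}$, where $\sum_{i=1}^{m+1} \alpha_i = 1$ and $\alpha_i \ge 0$ for $i = 1, \ldots, m + 1$. If $\alpha_1 = 1$, then $p = x_{j_1} \in S$ and we are done. Otherwise $\sum_{i=2}^{m+1} \alpha_i = 1 - \alpha_1 > 0$, and
$$p = \sum_{i=1}^{m+1} \alpha_i x_{j_i} = \alpha_1 x_{j_1} + \sum_{i=2}^{m+1} \alpha_i x_{j_i} = \alpha_1 \underbrace{x_{j_1}}_{q} + \left( \sum_{i=2}^{m+1} \alpha_i \right) \underbrace{\sum_{i=2}^{m+1} \frac{\alpha_i}{\sum_{l=2}^{m+1} \alpha_l} x_{j_i}}_{r}.$$
Clearly $q \in \mathrm{co}(S)$ since $q \in S$. Since
$$\sum_{i=2}^{m+1} \frac{\alpha_i}{\sum_{l=2}^{m+1} \alpha_l} = 1,$$
$r$ is a convex combination of $m$ elements of $S$ and therefore, by the inductive hypothesis, is also in $\mathrm{co}(S)$. Then $p = \alpha_1 q + (1 - \alpha_1) r$ is on the line segment connecting $q$ and $r$ with $q, r \in \mathrm{co}(S)$, so $p \in \mathrm{co}(S)$, as we wanted to show.

Remember that $\mathrm{co}(S)$ is the smallest convex set containing $S$. We know that $S = K_1 \subseteq K \subseteq \mathrm{co}(S)$, so we just need to show that $K$ is actually convex in order to show that $K = \mathrm{co}(S)$. Say we are given two elements of $K$, $q = \sum_{i=1}^{n} \alpha_i x_i$ and $r = \sum_{i=1}^{n} \beta_i x_i$, with $\sum_{i=1}^{n} \alpha_i = \sum_{i=1}^{n} \beta_i = 1$ and $\alpha_i, \beta_i \ge 0$ for $i = 1, \ldots, n$. Then any point on the line segment connecting $q$ and $r$ can be written as $p = \lambda \sum_{i=1}^{n} \alpha_i x_i + (1 - \lambda) \sum_{i=1}^{n} \beta_i x_i$ with $\lambda \in [0, 1]$. We can rewrite $p$ as $\sum_{i=1}^{n} (\lambda \alpha_i + (1 - \lambda) \beta_i) x_i$, with
$$\sum_{i=1}^{n} (\lambda \alpha_i + (1 - \lambda) \beta_i) = \lambda \sum_{i=1}^{n} \alpha_i + (1 - \lambda) \sum_{i=1}^{n} \beta_i = \lambda + (1 - \lambda) = 1,$$
so $p$ is a convex combination of $x_1, \ldots, x_n$ and is therefore in $K$. ∎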
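The second half of the proof of Proposition 2.2.6 can be illustrated with a small computation: mixing the weight vectors of two convex combinations gives weights that are again nonnegative and sum to 1, and the resulting point lies on the segment between the two original points. This is a sketch only; the set `S`, the weights, and the helper names are hypothetical.

```python
def convex_combination(weights, points):
    """Return sum_i w_i * x_i, after checking w_i >= 0 and sum_i w_i = 1."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("weights must be nonnegative and sum to 1")
    dim = len(points[0])
    return tuple(sum(w * p[d] for w, p in zip(weights, points))
                 for d in range(dim))

def mix(lam, alpha, beta):
    """Weights lam*alpha_i + (1 - lam)*beta_i for a point on the segment from q to r."""
    return [lam * a + (1 - lam) * b for a, b in zip(alpha, beta)]

# Hypothetical data: three points in the plane and two weight vectors.
S = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
alpha = [0.5, 0.25, 0.25]
beta = [0.0, 0.5, 0.5]
lam = 0.25

q = convex_combination(alpha, S)    # first element of K
r = convex_combination(beta, S)     # second element of K
gamma = mix(lam, alpha, beta)       # mixed weights
p = convex_combination(gamma, S)    # equals lam*q + (1 - lam)*r
```

Here `convex_combination` raises an error if the mixed weights were not valid, so the fact that `p` is computed at all reflects the identity $\sum_i (\lambda \alpha_i + (1 - \lambda) \beta_i) = 1$ used in the proof.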
Definition 2.2.7 (cone). A set $C$ in a linear vector space is said to be a cone with vertex at the origin if $x \in C$ implies $\alpha x \in C$ for all $\alpha \ge 0$.

Example 2.2.8. Make a cone from a line segment, from a circle, and from a disk.

Solution: Figure 2.2.2 shows a cone made from the line segment connecting $(5, 5)$ and $(5, 10)$ and a cone made from the circle of radius $1/2$ centered at $(0, 0, 1/2)$ and parallel to the $xy$-plane. A cone made from the corresponding disk would be the same as the cone made from the circle, except that the cone made from the disk would be filled in. ∎
Figure 2.2.2: cones made from the line segment and the circle in Example 2.2.8
Example 2.2.9. Make a convex cone from a non-convex set.

Solution: Consider the convex cone $C$ generated from the line segment $L$ in Figure 2.2.2. If we replace $L$ with an arc $A$ that has the same endpoints as $L$ and that is entirely contained within $C$, then we still generate $C$. However, $A$ is a non-convex set. ∎

Proposition 2.2.10. It is impossible to make a non-convex cone from a convex set.

Proof: Let $C$ be a cone generated from a convex set $K$. Pick any two vectors $y_1, y_2 \in C$ and any $\alpha \in [0, 1]$. Then there exist vectors $x_1, x_2 \in K$ and scalars $\beta_1, \beta_2 \ge 0$ such that $y_1 = \beta_1 x_1$ and $y_2 = \beta_2 x_2$. Set $\gamma = \alpha \beta_1 + (1 - \alpha) \beta_2 \ge 0$. If $\gamma = 0$, then $\alpha y_1 + (1 - \alpha) y_2 = \mathbf{0} \in C$. Otherwise, we can write $\alpha \beta_1 = \gamma \lambda$ and $(1 - \alpha) \beta_2 = \gamma (1 - \lambda)$ with $\lambda = \alpha \beta_1 / \gamma$, so that $0 \le \lambda \le 1$. Then
$$y_c = \alpha y_1 + (1 - \alpha) y_2 = \gamma (\lambda x_1 + (1 - \lambda) x_2) = \gamma x_c;$$
$x_c \in K$ since $K$ is convex and $y_c \in C$ by definition of a cone. Therefore, $C$ is convex. ∎

2.3 Linear Independence and Dimension
Definition 2.3.1 (linear independence). A vector $x$ is said to be linearly dependent upon a set of vectors $S$ if $x$ can be expressed as a linear combination of vectors in $S$. Equivalently, $x$ is linearly dependent upon $S$ if $x \in [S]$. Conversely, $x$ is said to be linearly independent of the set $S$ if it is not linearly dependent on $S$, and a set of vectors is said to be a linearly independent set if each vector in the set is linearly independent of the remainder of the set.

Note that this definition works even for infinite-dimensional vector spaces.

Example 2.3.2. Let $S = \{1, t^2, t^4, \ldots\}$. Is $f(t) = (t^8 + 4t^2)(t^6 + t^2 + 7)$ dependent upon $S$? Is $g(t) = t^5$ dependent upon $S$? Is $S$ a linearly independent set?

Solution: $f(t)$ is dependent upon $S$ since each term of $f(t)$ has an even exponent. $g(t)$ is independent of $S$ since $t^5$ has an odd exponent. $S$ is a linearly independent set. ∎

THEOREM 2.3.3 ([1, Theorem 2.5.1, p. 20]). The set $\{x_1, \ldots, x_n\}$ is linearly independent if and only if $\sum_{k=1}^{n} \alpha_k x_k = \mathbf{0}$ implies $\alpha_k = 0$ for $k = 1, \ldots, n$.

Proof: See the proof in [1]. ∎

THEOREM 2.3.4 ([1, Corollary 2.5.1, p. 20]). If $\{x_1, \ldots, x_n\}$ is a linearly independent set and $\sum_{k=1}^{n} \alpha_k x_k = \sum_{k=1}^{n} \beta_k x_k$, then $\alpha_k = \beta_k$ for $k = 1, \ldots, n$.

Proof: If $\sum_{k=1}^{n} \alpha_k x_k = \sum_{k=1}^{n} \beta_k x_k$, then $\sum_{k=1}^{n} (\alpha_k - \beta_k) x_k = \mathbf{0}$ and $\alpha_k = \beta_k$ for $k = 1, \ldots, n$ by Theorem 2.3.3. ∎
Definition 2.3.5 (basis). A finite set $B$ of linearly independent vectors is said to be a basis for the space $V$ if $[B] = V$. The number of vectors $|B|$ in $B$ is called the dimension of $V$.

Example 2.3.6. What is the dimension of the space spanned by $\{1, \cos^2 x, \sin^2 x, \cos 2x\}$?

Solution: This space has dimension 2, since we can write $1 = \cos^2 x + \sin^2 x$ and $\cos 2x = \cos^2 x - \sin^2 x$. ∎
QUIZ THEOREM 1 (from [1, Theorem 2.5.2, p. 21]). If a vector space $V$ is generated by the set of $k$ vectors $S_k = \{v_1, \ldots, v_k\}$, then any set of $k + 1$ vectors $T_{k+1} = \{w_1, \ldots, w_{k+1}\}$ in $V$ must be linearly dependent.

Proof: The proof is by induction on $k$. For the base case $k = 1$, $S_1 = \{v_1\}$ and $T_2 = \{w_1, w_2\}$. There exist scalars $\alpha_1, \alpha_2$ such that $w_1 = \alpha_1 v_1$ and $w_2 = \alpha_2 v_1$. If $\alpha_1 = \alpha_2 = 0$, then $w_1 = w_2 = \mathbf{0}$ and $T_2$ is trivially dependent; otherwise $\alpha_2 w_1 - \alpha_1 w_2 = \mathbf{0}$ is a nontrivial relation, so $T_2$ is a linearly dependent set.

Our inductive hypothesis is that any $k$ vectors in a space generated by $k - 1$ vectors are linearly dependent. We assume this is true and show that any $k + 1$ vectors in $V = [S_k]$ are linearly dependent. First, we can write $w_{k+1} = \sum_{i=1}^{k} \alpha_i v_i$ since $\{v_1, \ldots, v_k\}$ generates $V$. Choose some $j$ such that $\alpha_j \ne 0$. If this cannot be done, then $w_{k+1} = \mathbf{0}$ and $T_{k+1}$ is a linearly dependent set. Then let $S'_{k-1} = S_k \setminus \{v_j\}$. Now
$$v_j = \alpha_j^{-1} w_{k+1} + v$$
for some $v \in [S'_{k-1}]$, with $v = \sum_{i \ne j} b_i v_i$. For each $m = 1, \ldots, k$, write $w_m = \sum_{i=1}^{k} \gamma_{m,i} v_i$. Substituting the expression for $v_j$, we have
$$w_m = \sum_{i=1}^{k} \gamma_{m,i} v_i = \sum_{i \ne j} (\gamma_{m,i} + \gamma_{m,j} b_i) v_i + \gamma_{m,j} \alpha_j^{-1} w_{k+1}$$
for $m = 1, \ldots, k$. Hence the $k$ vectors $w_m - \gamma_{m,j} \alpha_j^{-1} w_{k+1}$ all lie in $[S'_{k-1}]$, which is generated by $k - 1$ vectors. By the inductive hypothesis, these $k$ vectors are linearly dependent; that is, there exist $\lambda_1, \ldots, \lambda_k$, not all equal to 0, with $\sum_{m=1}^{k} \lambda_m (w_m - \gamma_{m,j} \alpha_j^{-1} w_{k+1}) = \mathbf{0}$. Setting $\lambda_{k+1} = -\sum_{m=1}^{k} \lambda_m \gamma_{m,j} \alpha_j^{-1}$, the linear combination $\sum_{m=1}^{k+1} \lambda_m w_m$ also equals $\mathbf{0}$; again not all of $\lambda_1, \ldots, \lambda_{k+1}$ equal 0, so the set $T_{k+1}$ is linearly dependent, thus proving the theorem. ∎

We conclude that if $S_k$ is a basis for $V$, then any set of more than $k$ vectors in $V$ is linearly dependent and cannot be a basis. Therefore, any two bases for a finite-dimensional vector space contain the same number of elements.

Example 2.3.7. Show that the result does not extend to infinite-dimensional vector spaces.

Solution: The vector space $V$ of sequences with only finitely many nonzero terms is generated by the countably infinite set of linearly independent elements $S = \{(1, 0, \ldots), (1, 1, 0, \ldots), (1, 1, 1, 0, \ldots), \ldots\}$. If we remove the first element of this set, we still have a countably infinite set $S' = S \setminus \{(1, 0, \ldots)\}$, but this does not generate $V$ since not all such sequences $\{a_k\}_{k=1}^{\infty}$ have $a_1 = a_2$. ∎
2.4 Normed Vector Spaces
Definition 2.4.1 (normed vector space). A normed vector space, or normed linear space or normed linear vector space, is a vector space $X$ on which there is defined a real-valued function which maps every $x \in X$ into a real number $\|x\|$ called the norm of $x$. The norm satisfies the following axioms:

1. positive homogeneity: $\|\alpha x\| = |\alpha| \|x\|$ for all $\alpha \in \mathbb{R}$ and all $x \in X$.
2. triangle inequality: $\|x + y\| \le \|x\| + \|y\|$ for all $x, y \in X$.
3. positivity and positive definiteness: $\|x\| \ge 0$ for all $x \in X$ and $\|x\| = 0$ if and only if $x = \mathbf{0}$.

A real-valued function $p : X \to \mathbb{R}$ that satisfies just the first two axioms is called a SEMINORM.

Note the following about each axiom:

positive homogeneity

- If you take a cab from your destination back to your starting point, you should pay the same amount as when you went from your starting point to your destination.
- You cannot save money by breaking a cab ride along a single straight road into two pieces.
- The cab driver cannot offer bargain fares for extra-long rides.
- In order to convert $x_1^4 + x_2^4$ into a norm, we must take the fourth root (this is the $l_4$-norm in 2 dimensions: $\|x\|_4 = (x_1^4 + x_2^4)^{1/4}$).
- If you draw a convex set that contains the origin and decree that the cab fare from the origin to any point on the boundary of this set is \$1 and that fare scales linearly with distance, you do not necessarily have a norm. In fact, you only have a norm if the boundary is the set $\{x : \|x\| = c\}$ for some constant $c$.

triangle inequality

- You cannot save money by breaking a cab ride into two pieces.
- A cab driver can offer a 50% discount on the shorter dimension ($\|x = (u, v)\| = \frac{1}{2} \min(|u|, |v|) + \max(|u|, |v|)$). Say we have $x = (a, b)$ and $y = (c, d)$. Assume without loss of generality that $\min(|a|, |b|) = |a|$. If $\min(|c|, |d|) = |c|$, then clearly $\|x + y\| \le \|x\| + \|y\|$. If $\min(|c|, |d|) = |d|$, assume without loss of generality that $\min(|a + c|, |b + d|) = |a + c|$. Then
$$\begin{aligned}
\|x\| + \|y\| - \|x + y\| &= \left( \tfrac{1}{2} |a| + |b| \right) + \left( |c| + \tfrac{1}{2} |d| \right) - \left( \tfrac{1}{2} |a + c| + |b + d| \right) \\
&= \tfrac{1}{2} (|a| + |c| - |a + c|) + (|b| + |d| - |b + d|) + \tfrac{1}{2} (|c| - |d|) \ge 0,
\end{aligned}$$
where the inequality follows since each term in parentheses is nonnegative.
- By contrast, a cab driver cannot offer a 50% discount on the longer dimension: with fare $\min(|u|, |v|) + \frac{1}{2} \max(|u|, |v|)$, the rides to $(1, 0)$ and then on by $(0, 1)$ cost $\frac{1}{2}$ each, but the direct ride to $(1, 1)$ costs $\frac{3}{2}$, so the triangle inequality fails.
- The set of vectors $\{x : \|x\| \le 1\}$ is convex since
$$\|\alpha x_1 + (1 - \alpha) x_2\| \le \alpha \|x_1\| + (1 - \alpha) \|x_2\| \le \alpha + (1 - \alpha) = 1.$$

positivity and positive definiteness

- The "if" part of positive definiteness is not an independent requirement: positive homogeneity implies that $\|x\| = 0$ if $x = \mathbf{0}$ since we have $\|\mathbf{0}\| = \|0 \cdot \mathbf{0}\| = |0| \|\mathbf{0}\| = 0$.
- Positivity is not an independent requirement: we have
$$0 \overset{(a)}{=} \|\mathbf{0}\| = \|x - x\| \overset{(b)}{\le} \|x\| + \|{-x}\| \overset{(c)}{=} \|x\| + |-1| \|x\| = 2 \|x\|,$$
where (a) follows from positive homogeneity as shown above, (b) follows from the triangle inequality, and (c) follows from positive homogeneity. Dividing both sides by 2 proves positivity.
- If the cab driver makes the shorter dimension free, cab fare is still a norm (the $l_\infty$-norm).

In $\mathbb{R}^2$, we have seen the $l_1$-, $l_2$-, $l_4$-, and $l_\infty$-norms so far, defined by $|x_1| + |x_2|$, $(x_1^2 + x_2^2)^{1/2}$, $(x_1^4 + x_2^4)^{1/4}$, and $\max(|x_1|, |x_2|)$, respectively. Before we are tempted to conclude that $\|x\|_p = (|x_1|^p + |x_2|^p)^{1/p}$ is a norm for all $p \ge 0$, note that it is actually not a norm for $p \in [0, 1)$. The $l_0$-"norm" does not satisfy positive homogeneity, and the $l_p$-"norm" for $p \in (0, 1)$ does not satisfy the triangle inequality (the unit disks for these "norms" are not convex).
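The triangle inequality claims above are easy to probe numerically. The following sketch compares $\|e_1\| + \|e_2\|$ with $\|e_1 + e_2\|$ for the $l_{1/2}$ expression, the $l_1$-norm, and the cab fare with a 50% discount on the shorter dimension; the test vectors and function names are assumptions chosen for illustration.

```python
def lp(x, p):
    """(sum_i |x_i|^p)^(1/p); this is a norm only when p >= 1."""
    return sum(abs(xi) ** p for xi in x) ** (1 / p)

def discounted_fare(x):
    """Cab fare with a 50% discount on the shorter dimension: max + min/2."""
    u, v = abs(x[0]), abs(x[1])
    return max(u, v) + 0.5 * min(u, v)

e1, e2 = (1.0, 0.0), (0.0, 1.0)
s = (1.0, 1.0)  # s = e1 + e2

# A negative gap means the triangle inequality ||e1 + e2|| <= ||e1|| + ||e2|| fails.
gap_half = lp(e1, 0.5) + lp(e2, 0.5) - lp(s, 0.5)
gap_one = lp(e1, 1) + lp(e2, 1) - lp(s, 1)
gap_fare = discounted_fare(e1) + discounted_fare(e2) - discounted_fare(s)
```

A single pair of vectors can only disprove the triangle inequality, never prove it, so the nonnegative gaps for $p = 1$ and for the discounted fare are consistent with the text but do not replace the argument given there.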
Example 2.4.2. Consider the set of infinite sequences of real numbers that have only finitely many nonzero terms. Is this a vector space? If so, what is its dimension? Generalize the norm examples to this case. If you remove the restriction to finitely many nonzero terms but insist that the norm be finite, which example gives the largest vector space? The smallest?

Solution: Yes, this is a vector space since the sum of an infinite sequence with $M < \infty$ nonzero terms and an infinite sequence with $N < \infty$ nonzero terms will have at most $M + N < \infty$ nonzero terms, and any multiple of an infinite sequence with $M$ nonzero terms will have at most $M$ nonzero terms. The vector space is infinite-dimensional. The $l_p$-norm ($p \in [1, \infty)$) for a sequence $\{\xi_1, \xi_2, \ldots\}$ is $\left( \sum_{i=1}^{\infty} |\xi_i|^p \right)^{1/p}$. The $l_\infty$-norm is $\sup_i |\xi_i|$, which is a maximum when only finitely many terms are nonzero. If we remove the restriction to finitely many nonzero terms but insist that the norm be finite, then the $l_\infty$-norm gives the largest vector space, while the $l_1$-norm gives the smallest. ∎
Example 2.4.3. $C[a, b]$ is the space of continuous functions on $[a, b]$, with norm $\|x\| = \max_{t \in [a, b]} |x(t)|$. Confirm that this norm satisfies all the axioms.

Solution: We have

1. positive homogeneity:
$$\|\alpha x\| = \max_{a \le t \le b} |\alpha x(t)| = |\alpha| \max_{a \le t \le b} |x(t)| = |\alpha| \|x\|.$$
2. triangle inequality:
$$\|x + y\| = \max_{a \le t \le b} |x(t) + y(t)| \le \max_{a \le t \le b} (|x(t)| + |y(t)|) \le \max_{a \le t \le b} |x(t)| + \max_{a \le t \le b} |y(t)| = \|x\| + \|y\|.$$
3. positivity and positive definiteness: Clearly $\|x\| = 0$ if and only if $x(t) = 0$ for all $t \in [a, b]$. $\|x\| \ge 0$ since $|x(t)| \ge 0$. ∎

Example 2.4.4. Give examples of functions, not in $C[0, 1]$, that can be included in the space if you choose the norm $\|x\| = \int_0^1 |x(t)| \, dt$.
Figure 2.4.1: Thomae's function (Example 2.4.4). Image taken from Wikipedia.

Solution: The function $x(t) = \operatorname{sgn}(t - 0.5)$ is discontinuous at the point $t = 0.5$, but $\|x\| = 1$. A more interesting choice is Thomae's function, also known as the popcorn function or the Riemann function:
$$x(t) = \begin{cases} 1/q & t = p/q \in \mathbb{Q} \text{ in lowest terms} \\ 0 & t \in \mathbb{R} \setminus \mathbb{Q}. \end{cases}$$
We show that $\|x\| = 0$ (see Figure 2.4.1). Given any $\epsilon > 0$, choose $n$ such that $1/n < \epsilon$. In the interval $[0, 1]$, there are only finitely many rational numbers with denominator at most $n$. Say there are $d_n$ of these numbers. Surround each of these $d_n$ points with an interval of length $\epsilon/d_n$, and use the endpoints of these intervals to form a dissection of the interval $[0, 1]$. The upper Darboux sum for this dissection is less than $2\epsilon$: within each of the $d_n$ intervals, whose combined length is at most $\epsilon$, the function has an upper bound of 1, and within each of the remaining intervals, whose combined length is less than 1, the function has an upper bound of $1/n < \epsilon$. The lower Darboux sum is obviously 0 since every interval in the dissection contains an irrational number. Since we have $0 \le \|x\| < 2\epsilon$ for all $\epsilon > 0$, $\|x\| = 0$.

The norm $\|x\|$ does not exist for the Dirichlet function, defined as
$$x(t) = \mathbf{1}_{\mathbb{Q}}(t) = \begin{cases} 1 & t \in \mathbb{Q} \\ 0 & t \in \mathbb{R} \setminus \mathbb{Q}. \end{cases}$$
This function is not Darboux-integrable (and therefore not Riemann-integrable) since every interval in any dissection of $[0, 1]$ will always contain at least one rational and at least one irrational number; therefore, every upper Darboux sum equals 1 and every lower Darboux sum equals 0, so the two never meet. If we use the Lebesgue integral, however, then $\|x\|$ exists and equals 0 since $\mathbb{Q}$ is countable. ∎

Definition 2.4.5 (total variation). A function $x$ on $[a, b]$ is said to be of bounded variation if there is a constant $K$ so that for any partition $a = t_0 < t_1 < \cdots < t_n = b$ of $[a, b]$,
$$\sum_{i=1}^{n} |x(t_i) - x(t_{i-1})| \le K.$$
We then define the total variation of $x$ as
$$\mathrm{TV}(x) = \sup \sum_{i=1}^{n} |x(t_i) - x(t_{i-1})| = \int_a^b |dx(t)|,$$
where the supremum is taken with respect to all partitions of $[a, b]$.

Example 2.4.6. The space $\mathrm{BV}[a, b]$ is defined as the space of all functions of bounded variation on $[a, b]$ together with the norm $\|x\| = |x(a)| + \mathrm{TV}(x)$. Why is this norm the appropriate choice? Is the function $x(t) = \sin \frac{1}{t}$ in $\mathrm{BV}[0, 1]$?
Figure 2.4.2: The function $x(t) = \sin \frac{1}{t}$ does not have bounded variation (Example 2.4.6).

Solution: We check that $\|x\|$ satisfies the three axioms for a norm:

1. positive homogeneity:
$$\|\alpha x\| = |\alpha x(a)| + \sup \sum_{i=1}^{n} |\alpha x(t_i) - \alpha x(t_{i-1})| = |\alpha| |x(a)| + |\alpha| \sup \sum_{i=1}^{n} |x(t_i) - x(t_{i-1})| = |\alpha| \|x\|.$$
2. triangle inequality:
$$\begin{aligned}
\|x + y\| &= |x(a) + y(a)| + \sup \sum_{i=1}^{n} |(x(t_i) + y(t_i)) - (x(t_{i-1}) + y(t_{i-1}))| \\
&\le |x(a)| + |y(a)| + \sup \sum_{i=1}^{n} |x(t_i) - x(t_{i-1})| + \sup \sum_{i=1}^{n} |y(t_i) - y(t_{i-1})| = \|x\| + \|y\|,
\end{aligned}$$
where we use the triangle inequality for the absolute value function.
3. positivity and positive definiteness: Clearly $\|x\| = 0$ if $x(t) = 0$ for $t \in [a, b]$. Furthermore, because we include the term $|x(a)|$ in the norm, $\|x\| = 0$ only if $x(t) = 0$ for $t \in [a, b]$; without that term, every constant function would have norm 0. Finally, $\|x\| \ge 0$ for all functions $x \in \mathrm{BV}[a, b]$ since every term in the norm involves absolute values.

The function $x(t) = \sin \frac{1}{t}$ is not in $\mathrm{BV}[0, 1]$ since there are infinitely many points $t \in [0, 1]$ at which $x(t) = \pm 1$. Specifically, as $t \to 0$, $x(t)$ oscillates between $-1$ and $1$ infinitely often, as shown in Figure 2.4.2. Therefore, $x(t)$ does not have bounded variation. ∎

We can think of $x(t)$ as the position of a car as a function of time for $t \in [a, b]$, where the car's odometer increases even when the car is backing up. Then the space $\mathrm{BV}[a, b]$ is the set of functions for which the change in the odometer reading is finite. This space is important because it will turn out to be the dual space to $C[a, b]$.
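The unbounded "odometer reading" of $x(t) = \sin \frac{1}{t}$ can be seen numerically by summing $|x(t_i) - x(t_{i-1})|$ over partitions that pass through the points $t_k = 2/((2k + 1)\pi)$, where $x(t_k) = (-1)^k$. This is a sketch with hypothetical helper names; each sum is only a lower bound on the total variation, since the partition omits the endpoint $t = 0$, where $x$ would need a separately assigned value.

```python
from math import sin, pi

def variation(x, partition):
    """Sum of |x(t_i) - x(t_{i-1})| over consecutive points of a partition."""
    return sum(abs(x(t1) - x(t0)) for t0, t1 in zip(partition, partition[1:]))

def x(t):
    """The function sin(1/t) for t > 0."""
    return sin(1 / t)

def extremal_partition(m):
    """Increasing points t_k = 2/((2k+1) pi), k = m, ..., 0, followed by t = 1."""
    return [2 / ((2 * k + 1) * pi) for k in range(m, -1, -1)] + [1.0]

# Each consecutive pair of extremal points contributes about 2 to the sum.
sizes = (10, 100, 1000)
sums = [variation(x, extremal_partition(m)) for m in sizes]
```

The sums grow roughly like $2m$, so no constant $K$ can bound them, which matches the conclusion of Example 2.4.6.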
2.5 Open and Closed Sets

A norm introduces a topology that may be more general than what the reader is used to.

Definition 2.5.1 (topology). A topology on a set $X$ is a collection $\mathcal{T}$ of subsets of $X$ having the following properties:

1. $\emptyset$ and $X$ are in $\mathcal{T}$.
2. The union of the elements of any subcollection of $\mathcal{T}$ is in $\mathcal{T}$.
3. The intersection of the elements of any finite subcollection of $\mathcal{T}$ is in $\mathcal{T}$.

A set $X$ for which a topology $\mathcal{T}$ has been specified is called a topological space.

Definition 2.5.2 (open set). If $X$ is a topological space with topology $\mathcal{T}$, we say that a subset $U \subseteq X$ is an open set of $X$ if $U$ belongs to the collection $\mathcal{T}$.

More generally, a topological space is a set $X$ together with a collection of subsets of $X$, called open sets, such that $\emptyset$ and $X$ are both open, and such that arbitrary unions and finite intersections of open sets are open.
The most basic topology on a nonempty set $X$ is the collection $\{\emptyset, X\}$, so a topology can contain just two open sets. At the other extreme, it is permissible to have every subset of $X$ be open. Here are some other definitions that are independent of any model for topology:

Definition 2.5.3 (closed set). A subset $A$ of a topological space $X$ is said to be closed if the set $X \setminus A$ is open.

Definition 2.5.4 (interior and closure). Given a subset $A$ of a topological space $X$, the interior of $A$, denoted $A^\circ$, is defined as the union of all open sets contained in $A$ (i.e. the largest open subset of $A$). The closure of $A$, denoted $\bar{A}$, is defined as the intersection of all closed sets containing $A$ (i.e. the smallest closed set containing the subset $A$).

Clearly if $A = A^\circ$, then $A$ is open, while if $A = \bar{A}$, then $A$ is closed.

Figure 2.5.1: a Web site model that illustrates a finite topology that does not involve norms

We use a Web site model to illustrate a finite topology that does not involve norms. Figure 2.5.1 shows the links between the six Web pages in a set $X$. An open set is defined by the property that no page in the set can be reached from a page outside the set.

Example 2.5.5. For the Web site example, prove that both $\emptyset$ and $X$ are open.

Solution: We cannot find a page in $\emptyset$ that can be reached from a page outside $\emptyset$ because there are no pages in $\emptyset$. We cannot find a page outside $X$ that links to a page in $X$ because there are no pages outside $X$. Therefore, $\emptyset$ and $X$ are open. ∎

Example 2.5.6. For the Web site example, prove that the union of two open sets is open.

Solution: Let $A$ and $B$ be two open sets. By the definition of an open set, there are no links from pages in $X \setminus A$ to pages in $A$, nor are there links from pages in $X \setminus B$ to pages in $B$. Since $X \setminus (A \cup B) \subseteq X \setminus A$ and $X \setminus (A \cup B) \subseteq X \setminus B$, there are no links from pages in $X \setminus (A \cup B)$ to either pages in $A$ or pages in $B$; therefore, there are no links from pages in $X \setminus (A \cup B)$ to pages in $A \cup B$. Thus, $A \cup B$ is open. ∎
Example 2.5.7. For the Web site example, prove that the intersection of two open sets is open.
Solution: Let $A$ and $B$ be two open sets. Suppose that $A \cap B$ is not open. Then there must be some page in $X \setminus (A \cap B)$ that links to a page in $A \cap B$. But $X \setminus (A \cap B) = (X \setminus A) \cup (X \setminus B)$, so we have found either a page in $X \setminus A$ that links to a page in $A$ or a page in $X \setminus B$ that links to a page in $B$. This contradicts our assumption that $A$ and $B$ are open, so $A \cap B$ is open. ∎

For the Web site model, we note the following to illustrate some of the topological concepts we have defined:
- The nine open sets in $X$ are $\emptyset$, $\{2\}$, $\{4, 5\}$, $\{1, 2, 3\}$, $\{2, 4, 5\}$, $\{4, 5, 6\}$, $\{2, 4, 5, 6\}$, $\{1, 2, 3, 4, 5\}$, and $\{1, 2, 3, 4, 5, 6\} = X$.
- The interior of $\{2, 3\}$ is $\{2\}$ since $\{2\}$ is the largest open subset of $\{2, 3\}$. The interior of $\{1, 2, 4, 5, 6\}$ is $\{2, 4, 5, 6\}$.
- The nine closed sets are the complements of the nine open sets in $X$. Note that $\emptyset$ and $X$ are both open and closed sets (sometimes called clopen sets).
- The closure of $\{1\}$ is $\{1, 3\}$ since $\{1, 3\}$ is the smallest closed set containing the subset $\{1\}$. The closure of $\{1, 6\}$ is $\{1, 3, 6\}$ since $\{1, 3, 6\}$ is the smallest closed set containing the subset $\{1, 6\}$.

Now we define a topology using a norm.

Definition 2.5.8 (interior point). Let $P$ be a subset of a normed vector space $X$. The point $p \in P$ is an interior point of $P$ if there exists some $\epsilon > 0$ such that the ball $B(p, \epsilon) = \{x : \|x - p\| < \epsilon\}$ is a subset of $P$.

Definition 2.5.9 (closure point). A point $x \in X$ is a closure point or limit point of a set $P$ if, given any $\epsilon > 0$, there is a point $p \in P$ such that $\|x - p\| < \epsilon$.

In other words, these definitions are saying that a point $p$ is an interior point of $P$ if we can surround $p$ with some ball entirely contained within $P$, while a point $x$ is a closure point of $P$ if every ball centered at $x$ contains at least one point in $P$.

Proposition 2.5.10. The closure $\bar{P}$ of a set $P$ is a closed set; that is, the complement of $\bar{P}$ is an open set.
Proof (based on [2, Theorem 2.27, p. 35]): If $p \in X$ and $p \notin \bar{P}$, then $p$ is neither a point in $P$ nor a closure point of $P$. Therefore, there exists some ball centered at $p$ that does not contain any points in $P$. No point of this open ball can be a closure point of $P$ either, since a smaller ball around such a point still lies inside the original ball. This shows that the complement of $\bar{P}$ in $X$ is open, so $\bar{P}$ is closed. ∎
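The claims about the Web site model can be checked mechanically. The following is a minimal sketch, not part of the original notes: it takes the nine open sets listed for the six-page model as given (the link diagram itself is not reproduced), and the helper names `is_topology`, `interior`, and `closure` are illustrative choices.

```python
from itertools import combinations

X = frozenset({1, 2, 3, 4, 5, 6})
# The nine open sets quoted in the text for the Web site model.
OPEN_SETS = [frozenset(s) for s in (
    (), (2,), (4, 5), (1, 2, 3), (2, 4, 5), (4, 5, 6),
    (2, 4, 5, 6), (1, 2, 3, 4, 5), (1, 2, 3, 4, 5, 6),
)]

def is_topology(space, opens):
    """Check the axioms of Definition 2.5.1 for a finite collection."""
    family = set(opens)
    if frozenset() not in family or space not in family:
        return False
    # For a finite family, closure under pairwise unions and intersections
    # implies closure under all unions and all finite intersections.
    return all(a | b in family and a & b in family
               for a, b in combinations(family, 2))

def interior(subset, opens):
    """Union of all open sets contained in `subset`."""
    result = frozenset()
    for u in opens:
        if u <= subset:
            result |= u
    return result

def closure(subset, space, opens):
    """Intersection of all closed sets (complements of open sets) containing `subset`."""
    result = space
    for u in opens:
        closed = space - u
        if subset <= closed:
            result &= closed
    return result

print(is_topology(X, OPEN_SETS))
print(sorted(interior(frozenset({2, 3}), OPEN_SETS)))
print(sorted(closure(frozenset({1, 6}), X, OPEN_SETS)))
```

Because the model is finite, exhaustive checking is feasible; the same functions would work for any finite family of subsets supplied as `frozenset` objects.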
QUIZ THEOREM 2 (first part of [1, Proposition 2.7.4, p. 25]). Let $C$ be a convex set in a normed space. Then the interior $C^\circ$ is convex.
Proof: If $C^\circ$ is empty, it is convex (see Proposition 2.2.2). Suppose $x_0, y_0 \in C^\circ$. Fix $\alpha \in (0, 1)$: we must show that $z_0 = \alpha x_0 + (1 - \alpha) y_0$ is in $C^\circ$. Since $x_0, y_0 \in C^\circ$, there exists some $\epsilon > 0$ such that the open balls $B(x_0, \epsilon)$, $B(y_0, \epsilon)$ are contained in $C$; that is, all vectors $x_0 + w$ and $y_0 + w$ with $\|w\| < \epsilon$ are in $C$. Since $C$ is convex, all convex combinations $\alpha(x_0 + w) + (1 - \alpha)(y_0 + w)$ are in $C$. Furthermore, since $\alpha(x_0 + w) + (1 - \alpha)(y_0 + w) = z_0 + w$, it follows that all points of the form $z_0 + w$ with $\|w\| < \epsilon$ are in $C$; that is, the open ball $B(z_0, \epsilon)$ is contained in $C$. Therefore, $z_0 \in C^\circ$. ∎
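QUIZ THEOREM 3 (second part of [1, Proposition 2.7.4, p. 25]). Let $C$ be a convex set in a normed space. Then the closure $\bar{C}$ is convex.
Proof: If $\bar{C}$ is empty, it is convex (see Proposition 2.2.2). Suppose $x_0, y_0 \in \bar{C}$. Fix $\alpha \in (0, 1)$: we must show that $z_0 = \alpha x_0 + (1 - \alpha) y_0$ is in $\bar{C}$. Given any $\epsilon > 0$, select $x, y$ from $C$ such that $\|x - x_0\| < \epsilon$ and $\|y - y_0\| < \epsilon$. Since $C$ is convex, $z = \alpha x + (1 - \alpha) y$ is in $C$. Then by the triangle inequality,
\[
\|z - z_0\| = \|\alpha x + (1 - \alpha) y - \alpha x_0 - (1 - \alpha) y_0\| \le \alpha \|x - x_0\| + (1 - \alpha) \|y - y_0\| < \alpha \epsilon + (1 - \alpha) \epsilon = \epsilon,
\]
so $z_0$ is within a distance $\epsilon$ of the point $z \in C$. Since this is true for every $\epsilon > 0$, $z_0$ is a closure point of $C$ and is therefore in $\bar{C}$. ∎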
2.6 Convergence, Limits, and Continuity
Any topology is sufficient to define convergence.

Definition 2.6.1 (convergence (any topology)). The sequence $\{x_n\}$ converges to $x$ if, for every open set $P$ containing $x$, there exists an $N$ such that for all $n > N$, $x_n \in P$. We write $x_n \to x$.
Example 2.6.2. For the Web site topology, show that the sequence $6, 5, 4, 6, 5, 4, 5, 4, 5, 4, \ldots$ converges both to 4 and to 5.
Solution: The smallest open set containing 4 is $\{4, 5\}$, and the smallest open set containing 5 is also $\{4, 5\}$. Therefore, every open set $P$ containing 4 or 5 contains $\{4, 5\}$. Because the sequence eventually takes only the values 4 and 5, there exists some $N$ such that for all $n > N$, $x_n \in P$. ∎

If we define open sets by a norm, we can formulate convergence in terms of norms.

Definition 2.6.3 (convergence (normed space)). The sequence $\{x_n\}$ converges to $x$ if, for every $\epsilon > 0$, there exists an $N$ such that for all $n > N$, $\|x_n - x\| < \epsilon$. As before, we write $x_n \to x$.
QUIZ THEOREM 4 ([1, Proposition 2.8.1, p. 27]). If a sequence converges, its limit is unique.
Proof: Suppose $x_n \to x$ and $x_n \to x'$. Then for every $\epsilon/2 > 0$, there exist $N, N'$ such that $\|x_n - x\| < \epsilon/2$ for all $n > N$ and $\|x_n - x'\| < \epsilon/2$ for all $n > N'$. Then by the triangle inequality,
\[
\|x - x'\| = \|(x - x_m) - (x' - x_m)\| \le \|x - x_m\| + \|x' - x_m\| < \epsilon
\]
for $m > \max(N, N')$. Since this is true for any $\epsilon > 0$, $x = x'$. ∎
THEOREM 2.6.4 ([1, Proposition 2.8.2, p. 27]). A set $F$ is closed if and only if every convergent sequence with elements in $F$ has its limit in $F$.
Proof: For the only if direction, the limit of a sequence in $F$ is obviously a closure point of $F$ and therefore must be in $F$ if $F$ is closed. For the if direction, suppose that $F$ is not closed. Then there is a closure point $x$ of $F$ that is not in $F$. In each of the open balls $B(x, 1/n)$ we may select a point $x_n \in F$ since $x$ is a closure point. The sequence $\{x_n\}$ generated in this way converges to $x \notin F$, which contradicts our assumption that every convergent sequence with elements in $F$ has its limit in $F$. Therefore, $F$ must be closed. ∎

Definition 2.6.5 (transformation). Let $X$ and $Y$ be vector spaces and let $D$ be a subset of $X$. A rule which associates an element $y \in Y$ with every element $x \in D$ is a transformation from $X$ to $Y$ with domain $D$.

Definition 2.6.6 (injective). $T$ is injective or one-to-one if $T(x) = T(y)$ implies $x = y$.

Definition 2.6.7 (surjective). $T$ is surjective or onto if for every $y \in Y$, there exists at least one $x$ such that $T(x) = y$. In other words, the image of $T$ equals its codomain.

Definition 2.6.8 (linear). $T$ is linear if $T(ax + by) = aT(x) + bT(y)$ for any $x, y \in X$ and any scalars $a, b \in F$.
Example 2.6.9. If $X$ or $Y$ is infinite-dimensional, then a linear transformation $T$ cannot be represented by a matrix. Show that the following transformations still qualify as linear:
- Both $X$ and $Y$ are the space of polynomial functions: $T$ is differentiation.
- Both $X$ and $Y$ are the space of continuous functions on $[a, b]$: $T$ is defined by
\[
T(x) = \int_a^b k(t, \tau)\, x(\tau)\, d\tau,
\]
where $k$ is a continuous function on $[a, b] \times [a, b]$.
Solution: Differentiation and integration are linear operators: $D_x(af + bg) = a D_x f + b D_x g$ and $\int (a f(x) + b g(x))\, dx = a \int f(x)\, dx + b \int g(x)\, dx$. ∎

Definition 2.6.10 (continuity (any topology)). $T : X \to Y$ is continuous if the inverse image $U = T^{-1}(V)$ of any open subset $V \subset Y$ is an open subset of $X$.
Notice that the inverse image is defined even if $T$ is not invertible, and it is not necessarily a connected set. For a topology defined by a norm, we can formulate continuity in terms of norms.

Definition 2.6.11 (continuity (normed space)). A transformation $T$ from a normed space $X$ to a normed space $Y$ is continuous at $x_0 \in X$ if for every $\epsilon > 0$, there exists a $\delta > 0$ such that $\|x - x_0\| < \delta$ implies $\|T(x) - T(x_0)\| < \epsilon$.
Example 2.6.12 (based on [6, Theorem 18.1, p. 104]). Show that this is the same definition, specialized to a topology defined by a norm. In other words, show that the definition of continuity for any topology implies the definition of continuity for a normed space.
Solution: Suppose $T$ is continuous and, given $\epsilon > 0$, let $V = \{y : \|T(x_0) - y\| < \epsilon\}$, which is an open subset of $Y$. Then $U' = T^{-1}(V)$ is an open set in $X$, and clearly, $x_0 \in U'$. Furthermore, since $U'$ is open, there exists some $\delta > 0$ such that $U = \{x : \|x_0 - x\| < \delta\}$ is a subset of $U'$. Then $x \in U$ implies $T(x) \in V$. ∎

Note that the norm can be different in $X$ and $Y$.

THEOREM 2.6.13 ([1, Proposition 2.9.1, p. 28]). A transformation $T$ mapping a normed space $X$ into a normed space $Y$ is continuous at the point $x_0 \in X$ if and only if $x_n \to x_0$ implies $T(x_n) \to T(x_0)$.
Proof: The only if portion is straightforward: given $\epsilon > 0$, take the $\delta$ from the definition of continuity and use it as the $\epsilon$ in the definition of convergence of $\{x_n\}$. For the if portion, suppose that $T$ is not continuous at $x_0$. Then for some $\epsilon > 0$ and every $\delta > 0$, there is a point $x$ such that $\|x - x_0\| < \delta$ and $\|T(x) - T(x_0)\| \ge \epsilon$. Taking $\delta = 1/n$ for $n = 1, 2, \ldots$ yields a sequence $\{x_n\}$ with $x_n \to x_0$ but $T(x_n) \not\to T(x_0)$, which proves the if portion by contraposition. ∎

Theorem 2.6.13 suggests the following practical test for lack of continuity. If you want to show that $T$ is not continuous at $x_0$:
- Choose some suitable fixed $\epsilon$ (smaller works better).
- Construct a sequence $\{x_n\}$ with the following properties: $x_n$ converges to $x_0$, and for all $n > N$, $\|T(x_n) - T(x_0)\| \ge \epsilon$.
Having $x_n$ scale like $\frac{1}{n}$ is often convenient.
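Example 2.6.14. Use this test to show that the function
\[
f(x, y) = \frac{x^2 y}{x^4 + y^2}, \qquad f(0, 0) = 0
\]
is discontinuous at $(x, y) = (0, 0)$ and that the proof is valid for the $l_2$ norm, the $l_1$ norm, or the $l_\infty$ norm in $\mathbb{R}^2$.
Solution: Let $(x_n, y_n) = (1/n, 1/n^2)$, which converges to $(0, 0)$ in each of the three norms. Then
\[
\lim_{n \to \infty} f(x_n, y_n) = \lim_{n \to \infty} \frac{1/n^4}{1/n^4 + 1/n^4} = \frac{1}{2} \ne 0 = f(0, 0) = f\left(\lim_{n \to \infty} x_n, \lim_{n \to \infty} y_n\right),
\]
so $f$ is discontinuous at $(x, y) = (0, 0)$. ∎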
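The sequence test can be tried numerically. The sketch below is illustrative only: it evaluates $f$ along $(1/n, 1/n^2)$ and reports the $l_1$, $l_2$, and $l_\infty$ norms of the points, so one can see the points approach the origin in all three norms while the function values stay near $1/2$. The helper name `norms` is an assumption of this sketch.

```python
def f(x, y):
    """f(x, y) = x^2 y / (x^4 + y^2), with f(0, 0) = 0."""
    if x == 0 and y == 0:
        return 0.0
    return x * x * y / (x ** 4 + y * y)

def norms(x, y):
    """Return the l1, l2 and l-infinity norms of the point (x, y)."""
    return abs(x) + abs(y), (x * x + y * y) ** 0.5, max(abs(x), abs(y))

for n in (1, 10, 100, 1000):
    x, y = 1.0 / n, 1.0 / n ** 2
    # The norms shrink toward 0, but f stays at about 1/2 instead of f(0, 0) = 0.
    print(n, norms(x, y), f(x, y))
```

Any fixed $\epsilon$ below $1/2$ then serves as the $\epsilon$ of the practical test.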
CHAPTER 3
BANACH SPACES
Reading: [1, Sections 2.10-2.15]
3.1 lp Space
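Definition 3.1.1 ($l_p$ space). For $1 \le p < \infty$, the space $l_p$ consists of all sequences of scalars $\{\xi_1, \xi_2, \ldots\}$ for which
\[
\sum_{i=1}^\infty |\xi_i|^p < \infty.
\]
The norm of an element $x = \{\xi_i\} \in l_p$ is defined as
\[
\|x\|_p = \left( \sum_{i=1}^\infty |\xi_i|^p \right)^{1/p}.
\]
The space $l_\infty$ consists of bounded sequences. The norm of an element $x = \{\xi_i\} \in l_\infty$ is defined as $\|x\|_\infty = \sup_i |\xi_i|$.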
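THEOREM 3.1.2 (Cauchy-Schwarz Inequality). If $x = \{\xi_i\}$, $y = \{\eta_i\} \in l_2$, then
\[
\sum_{i=1}^\infty |\xi_i \eta_i| \le \left( \sum_{i=1}^\infty \xi_i^2 \right)^{1/2} \left( \sum_{i=1}^\infty \eta_i^2 \right)^{1/2} = \|x\|_2 \|y\|_2.
\]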
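Proof: First, we show that the geometric mean is less than or equal to the arithmetic mean. By elementary algebra, we have $0 \le (x - y)^2$ for any $x, y \in \mathbb{R}$. Then $xy \le \frac{1}{2} x^2 + \frac{1}{2} y^2$. For $x, y \ge 0$, let $a = x^2$, $b = y^2$. Then $a^{1/2} b^{1/2} \le \frac{1}{2} a + \frac{1}{2} b$. Now consider nonzero $x = \{\xi_i\}$, $y = \{\eta_i\} \in l_2$ (if either is zero, the inequality is trivial). We set $a = |\xi_i|^2 / \|x\|_2^2$, $b = |\eta_i|^2 / \|y\|_2^2$ and use the above inequality for each component $i$. Summing over all $i$ gives
\[
\sum_{i=1}^\infty \frac{|\xi_i \eta_i|}{\|x\|_2 \|y\|_2} \le \frac{1}{2} \sum_{i=1}^\infty \frac{|\xi_i|^2}{\|x\|_2^2} + \frac{1}{2} \sum_{i=1}^\infty \frac{|\eta_i|^2}{\|y\|_2^2} = \frac{1}{2} \frac{\|x\|_2^2}{\|x\|_2^2} + \frac{1}{2} \frac{\|y\|_2^2}{\|y\|_2^2} = 1.
\]
Then
\[
\sum_{i=1}^\infty |\xi_i \eta_i| \le \|x\|_2 \|y\|_2.
\]
∎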
As an aside, while Augustin Cauchy first published the inequality for sums in 1821, the corresponding inequality for integrals was first stated by Viktor Bunyakovsky in 1859, well before Hermann Schwarz's rediscovery of the inequality in 1888. This fact illustrates Stigler's Law of Eponymy: "No scientific discovery is named after its original discoverer." The law, proposed by University of Chicago statistics professor Stephen Stigler in 1980, applies to itself since long before Stigler formulated it, A. N. Whitehead noted that "Everything of importance has been said before, by someone who did not discover it." Stigler himself attributed his law to the sociologist Robert K. Merton, famous for coining the phrases "self-fulfilling prophecy" and "role model" as well as for being the father of Harvard economist Robert C. Merton, co-discoverer of the Black-Scholes(-Merton) option-pricing formula.

Before proving the Hölder inequality, a generalization of the Cauchy-Schwarz inequality, we provide a motivating example.

Example 3.1.3. Consider a specific vector $x = \{\xi_i\} \in X$, where $\xi_i = 1/i^{0.3}$. What is the smallest integer $p$ for which the norm $\|x\|_p$ is finite? If $\eta_i = 1/i^{0.3}$, does $\sum_{i=1}^\infty |\xi_i \eta_i|$ converge? If $\eta_i = 1/i^{0.9}$, does $\sum_{i=1}^\infty |\xi_i \eta_i|$ converge?
Solution: Since the series $\sum_{i=1}^\infty 1/i^s$ converges exactly when $s > 1$, we want to find the smallest integer $p$ such that $0.3p > 1$. Therefore, $p = 4$. The sum $\sum_{i=1}^\infty |\xi_i \eta_i|$ does not converge if $\eta_i = 1/i^{0.3}$ (the terms are $1/i^{0.6}$), but it does if $\eta_i = 1/i^{0.9}$ (the terms are $1/i^{1.2}$). ∎
We can view $\eta_i$ as an infinite set of coefficients that define a linear functional on $X$. If $\xi_i = 1/i^p$ and $\eta_i = 1/i^q$, then a sufficient condition for $\sum_{i=1}^\infty |\xi_i \eta_i|$ to converge is $1/p + 1/q = 1$. Cauchy-Schwarz was a special situation, because when we use the $l_2$ norm in $X$, we also want the same $l_2$ norm in the dual space of linear functionals on $X$. In general, if we use the $l_p$ norm in $X$ and want the linear functional $\sum_{i=1}^\infty |\xi_i \eta_i|$ to converge, we should use the $l_q$ norm, where $\frac{1}{p} + \frac{1}{q} = 1$, on the dual space of linear functionals. Therefore we need to prove the Hölder inequality. Before proving the main inequality, first we derive an auxiliary inequality.

Proposition 3.1.4. For $a, b \ge 0$ and $0 < \lambda < 1$,
\[
a^\lambda b^{1 - \lambda} \le \lambda a + (1 - \lambda) b.
\]
Proof: Consider the function $f(t) = t^\lambda - \lambda t + \lambda - 1$ for $t \ge 0$. Then $f'(t) = \lambda (t^{\lambda - 1} - 1)$. Since $0 < \lambda < 1$, we have $f'(t) > 0$ for $0 < t < 1$ and $f'(t) < 0$ for $t > 1$. Therefore, $f(t) \le f(1) = 0$ for $t \ge 0$, so $t^\lambda \le \lambda t + 1 - \lambda$ for $t \ge 0$. If $b \ne 0$, substituting $t = a/b$ and multiplying both sides by $b$ gives the desired inequality. If $b = 0$, the inequality is obvious. ∎
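QUIZ THEOREM 5 (Hölder Inequality ([1, Theorem 2.10.1, p. 29])). If $p$ and $q$ satisfy $\frac{1}{p} + \frac{1}{q} = 1$, $1 \le p \le \infty$, $1 \le q \le \infty$, and $x = \{\xi_i\} \in l_p$, $y = \{\eta_i\} \in l_q$, then
\[
\sum_{i=1}^\infty |\xi_i \eta_i| \le \|x\|_p \|y\|_q.
\]
Equality holds if and only if
\[
\left( \frac{|\xi_i|}{\|x\|_p} \right)^{1/q} = \left( \frac{|\eta_i|}{\|y\|_q} \right)^{1/p}
\]
for all $i$.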
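Proof: First consider the case $p = \infty$, $q = 1$. Then we have
\[
\sum_{i=1}^\infty |\xi_i \eta_i| \le \sum_{i=1}^\infty \left( \sup_i |\xi_i| \right) |\eta_i| = \sup_i |\xi_i| \sum_{i=1}^\infty |\eta_i| = \|x\|_\infty \|y\|_1.
\]
The case $p = 1$, $q = \infty$ is equivalent. Now consider the cases $1 < p < \infty$, $1 < q < \infty$. Set $a = |\xi_i|^p / \|x\|_p^p$, $b = |\eta_i|^q / \|y\|_q^q$, and $\lambda = 1/p$, and use the inequality of Proposition 3.1.4 for each component $i$. Summing over all $i$ gives
\[
\sum_{i=1}^\infty \frac{|\xi_i \eta_i|}{\|x\|_p \|y\|_q} \le \frac{1}{p} \sum_{i=1}^\infty \frac{|\xi_i|^p}{\|x\|_p^p} + \frac{1}{q} \sum_{i=1}^\infty \frac{|\eta_i|^q}{\|y\|_q^q} = \frac{1}{p} \frac{\|x\|_p^p}{\|x\|_p^p} + \frac{1}{q} \frac{\|y\|_q^q}{\|y\|_q^q} = 1.
\]
Then
\[
\sum_{i=1}^\infty |\xi_i \eta_i| \le \|x\|_p \|y\|_q.
\]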
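As a sanity check of the Hölder inequality (not a proof), one can compare both sides numerically for finite vectors, viewed as sequences with finitely many nonzero terms. The sketch below assumes $1 < p < \infty$; the function names `p_norm` and `holder_sides` are illustrative and do not come from the notes.

```python
import random

def p_norm(v, p):
    """l_p norm of a finite sequence v, for 1 <= p < infinity."""
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def holder_sides(x, y, p):
    """Return (sum |x_i y_i|, ||x||_p * ||y||_q) with 1/p + 1/q = 1 and p > 1."""
    q = p / (p - 1.0)  # conjugate exponent
    lhs = sum(abs(a * b) for a, b in zip(x, y))
    rhs = p_norm(x, p) * p_norm(y, q)
    return lhs, rhs

random.seed(0)
for p in (1.5, 2.0, 3.0, 7.0):
    x = [random.uniform(-1, 1) for _ in range(50)]
    y = [random.uniform(-1, 1) for _ in range(50)]
    lhs, rhs = holder_sides(x, y, p)
    print(p, lhs <= rhs)
```

Choosing $\eta_i = |\xi_i|^{p-1}$ makes $|\eta_i|^q$ proportional to $|\xi_i|^p$, which is the equality case of the theorem; in floating point the two sides then agree up to rounding.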
We can use the Hölder inequality to generalize the Euclidean triangle inequality to the Minkowski inequality.
QUIZ THEOREM 6 (Minkowski Inequality ([1, Theorem 2.10.2, p. 31])). If $x, y \in l_p$ with $1 \le p \le \infty$, then so is $x + y$, and $\|x + y\|_p \le \|x\|_p + \|y\|_p$. For $1 < p < \infty$, equality holds if and only if $x = ky$ for some positive constant $k$.
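Proof: For $p = 1$,
\[
\|x + y\|_1 = \sum_{i=1}^\infty |\xi_i + \eta_i| \le \sum_{i=1}^\infty |\xi_i| + \sum_{i=1}^\infty |\eta_i| = \|x\|_1 + \|y\|_1,
\]
where we apply the triangle inequality to each term in the summation. For $p = \infty$,
\[
\|x + y\|_\infty = \sup_i |\xi_i + \eta_i| \le \sup_i |\xi_i| + \sup_i |\eta_i| = \|x\|_\infty + \|y\|_\infty.
\]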
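For $1 < p < \infty$, we first consider finite sums. We have
\[
\sum_{i=1}^n |\xi_i + \eta_i|^p = \sum_{i=1}^n |\xi_i + \eta_i|^{p-1} |\xi_i + \eta_i| \le \sum_{i=1}^n |\xi_i + \eta_i|^{p-1} |\xi_i| + \sum_{i=1}^n |\xi_i + \eta_i|^{p-1} |\eta_i|.
\]
Now apply Quiz Theorem 5 to both sums, treating $\{|\xi_i + \eta_i|^{p-1}\}$ as an element of $l_q$, with $\frac{1}{p} + \frac{1}{q} = 1$. Then
\[
\sum_{i=1}^n |\xi_i + \eta_i|^p \le \left( \sum_{i=1}^n |\xi_i + \eta_i|^{(p-1)q} \right)^{1/q} \left[ \left( \sum_{i=1}^n |\xi_i|^p \right)^{1/p} + \left( \sum_{i=1}^n |\eta_i|^p \right)^{1/p} \right]
= \left( \sum_{i=1}^n |\xi_i + \eta_i|^p \right)^{1/q} \left[ \left( \sum_{i=1}^n |\xi_i|^p \right)^{1/p} + \left( \sum_{i=1}^n |\eta_i|^p \right)^{1/p} \right],
\]
since $(p - 1)q = p$. Dividing both sides of the above inequality by $\left( \sum_{i=1}^n |\xi_i + \eta_i|^p \right)^{1/q}$ and remembering that $1 - \frac{1}{q} = \frac{1}{p}$, we have
\[
\left( \sum_{i=1}^n |\xi_i + \eta_i|^p \right)^{1/p} \le \left( \sum_{i=1}^n |\xi_i|^p \right)^{1/p} + \left( \sum_{i=1}^n |\eta_i|^p \right)^{1/p}.
\]
Letting $n \to \infty$ on the right-hand side can only increase its value, so
\[
\left( \sum_{i=1}^n |\xi_i + \eta_i|^p \right)^{1/p} \le \|x\|_p + \|y\|_p.
\]
Note that this holds for all $n$; therefore, we can also let $n \to \infty$ on the left-hand side to obtain the desired result. The conditions for equality follow from the Hölder inequality. ∎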
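The Minkowski inequality can be spot-checked the same way. This is a minimal numerical sketch for finite vectors and $1 \le p < \infty$; the helper `minkowski_gap` is a hypothetical name introduced here, and a nonnegative gap on a few samples is only evidence, not a proof.

```python
def p_norm(v, p):
    """l_p norm of a finite sequence v, for 1 <= p < infinity."""
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def minkowski_gap(x, y, p):
    """Return ||x||_p + ||y||_p - ||x + y||_p, which should be nonnegative."""
    total = [a + b for a, b in zip(x, y)]
    return p_norm(x, p) + p_norm(y, p) - p_norm(total, p)

x = [1.0, 2.0, 3.0]
print(minkowski_gap(x, [0.5, -1.0, 4.0], 3.0))       # strictly positive here
print(minkowski_gap(x, [2.0 * t for t in x], 3.0))   # about 0: y is a positive multiple of x
```

The second call illustrates the equality condition $x = ky$ with $k > 0$: the gap vanishes up to rounding error.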
3.2 Lebesgue Integration
Definition 3.2.1 (volume zero). A set $S \subset \mathbb{R}$ has volume zero if, for any $\epsilon > 0$, $S$ can be contained in the union of a finite set of intervals with total length less than $\epsilon$.
Definition 3.2.2 (measure zero). A set $S \subset \mathbb{R}$ has measure zero if, for any $\epsilon > 0$, $S$ can be contained in the union of a countably infinite set of open intervals with total length less than $\epsilon$.
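Example 3.2.3 (rational numbers). Show that the set $\mathbb{Q} \cap [0, 1]$ has measure zero but does not have volume zero.
Solution: Since $\mathbb{Q} \cap [0, 1]$ is countable, we can enumerate the elements of the set as $q_1, q_2, \ldots$. For any $\epsilon > 0$, define the open interval $I_k = (q_k - \epsilon 2^{-(k+2)}, q_k + \epsilon 2^{-(k+2)})$, which has length $\epsilon 2^{-(k+1)}$. The sum of the lengths of these intervals is $\epsilon/2 < \epsilon$. Therefore, $\mathbb{Q} \cap [0, 1]$ has measure zero.
Now assume that $\mathbb{Q} \cap [0, 1]$ can be contained in the union $C = \cup_{i=1}^n C_i$ of $n < \infty$ intervals with total length less than some $\epsilon < 1$. Then $[0, 1] \setminus C$ is a finite union of intervals and isolated points with total length at least $1 - \epsilon > 0$, so it contains an interval of positive length; since $\mathbb{Q}$ is dense in $\mathbb{R}$, there must be an element of $\mathbb{Q} \cap [0, 1]$ in this interval, which is a contradiction. Therefore, $\mathbb{Q} \cap [0, 1]$ does not have volume zero. ∎

The proof that $\mathbb{Q} \cap [0, 1]$ has measure zero works for any countably infinite set. However, there also exist sets of measure zero that are uncountably infinite. The Cantor set is one famous example: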
Example 3.2.4 (Cantor set). Define the Cantor set as follows:
- Step 0: Let the Cantor set be the interval $[0, 1]$.
- Step $i$: Remove the middle third of every interval in the Cantor set at Step $i - 1$.
Show that the Cantor set has volume, and therefore measure, zero.
Solution: At Step $i$, the Cantor set contains $2^i$ intervals with total volume $(2/3)^i$. For every $\epsilon > 0$, there exists some $i$ such that $(2/3)^i < \epsilon$. Then we can contain the Cantor set in the $2^i < \infty$ intervals with total length less than $\epsilon$. ∎
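The covering argument can be made concrete with exact rational arithmetic. The following sketch builds the intervals remaining at each step and stops at the first step whose total length falls below a given $\epsilon$; the function names `cantor_step` and `cantor_cover` are illustrative assumptions, not notation from the notes.

```python
from fractions import Fraction

def cantor_step(intervals):
    """Remove the open middle third of each closed interval (lo, hi)."""
    out = []
    for lo, hi in intervals:
        third = (hi - lo) / 3
        out.append((lo, lo + third))
        out.append((hi - third, hi))
    return out

def cantor_cover(eps):
    """Return (i, intervals) for the first step i whose total length is below eps."""
    intervals = [(Fraction(0), Fraction(1))]
    i = 0
    while sum(hi - lo for lo, hi in intervals) >= eps:
        intervals = cantor_step(intervals)
        i += 1
    return i, intervals

step, cover = cantor_cover(Fraction(1, 10))
print(step, len(cover), sum(hi - lo for lo, hi in cover))
```

For $\epsilon = 1/10$ the loop stops at Step 6, where $2^6 = 64$ intervals of total length $(2/3)^6 = 64/729$ cover the Cantor set.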
Figure 3.2.1: the Cantor set (Example 3.2.4). Image taken from Wikimedia Commons.

As an interesting aside, note that the Cantor set is in one-to-one correspondence with $[0, 1]$. Consider a point $x$ in the Cantor set. That point is in $[0, 1]$, a left or right subinterval of $[0, 1]$, a left or right subinterval of that subinterval, and so on (see Figure 3.2.1). We can encode the position of $x$ by the binary expansion $0.b_1 b_2 \ldots$, where $b_i$ equals 0 if $x$ is in the left subinterval at Step $i$ and 1 if $x$ is in the right subinterval at Step $i$.

If $f$ and $g$ are Riemann (or, equivalently, Darboux) integrable, then so is $f + g$, and it follows by induction that the sum of any finite series of Riemann integrable functions is integrable. The Lebesgue integral extends this idea to a function that is the sum of an infinite series of functions.

Definition 3.2.5 (Lebesgue integral). Suppose that $f(x) = \sum_{k=1}^\infty f_k(x)$. Further suppose that the sum
\[
\sum_{k=1}^\infty \int_a^b |f_k(x)|\, dx
\]
converges. Then $f(x)$ is defined, in the sense that $\sum_{k=1}^\infty f_k(x)$ converges, except perhaps on a set of measure zero. Furthermore,
\[
\int_a^b f(x)\, dx = \sum_{k=1}^\infty \int_a^b f_k(x)\, dx
\]
is the Lebesgue integral of $f(x)$ on $[a, b]$.
Note that if $f$ is Riemann integrable, then we can write $f_1(x) = f(x)$ and $f_k(x) = 0$ for $k > 1$, so the Lebesgue integral of $f$ equals the Riemann integral. The following proposition extends the result that if functions $f$ and $g$ are equal except on a set of volume zero, their Riemann integrals are equal:
Proposition 3.2.6. If $f(x) = \sum_{k=1}^\infty f_k(x)$ and $g(x) = \sum_{k=1}^\infty g_k(x)$ are equal except on a set of measure zero, then
\[
\int_a^b f(x)\, dx = \int_a^b g(x)\, dx.
\]
Example 3.2.7 (Dirichlet function). Show that the Dirichlet function $1_{x \in \mathbb{Q} \cap [0, 1]}$ is Lebesgue integrable (with integral zero) but not Riemann integrable.
Solution: We showed in Example 3.2.3 that $\mathbb{Q} \cap [0, 1]$ has measure zero, so the Lebesgue integral of the Dirichlet function is zero. To show that the Dirichlet function is not Riemann integrable, consider any dissection of $[0, 1]$. Each interval in this dissection will contain a rational number and an irrational number since both the rationals and the irrationals are dense in $\mathbb{R}$. Therefore, the upper Darboux sum will always be 1, while the lower Darboux sum will always be zero. Therefore, the Darboux integral, and hence the Riemann integral, does not exist. ∎

The improper integral of an unbounded function cannot be defined as a Riemann integral, since the upper Darboux sum does not exist. However, Lebesgue integration solves the unbounded function problem:

Example 3.2.8 (unbounded function). Evaluate the Lebesgue integral
\[
\int_0^1 \frac{1}{\sqrt{x}}\, dx = \lim_{a \to 0} \int_a^1 \frac{1}{\sqrt{x}}\, dx.
\]
Solution: Let
\[
f_k(x) = \begin{cases} \dfrac{1}{\sqrt{x}} & \dfrac{1}{2^{k+1}} < x \le \dfrac{1}{2^k} \\ 0 & \text{otherwise.} \end{cases}
\]
Then
\[
I_k = \int_{\mathbb{R}} |f_k(x)|\, dx = 2 \left[ \left( \frac{1}{2} \right)^{k/2} - \left( \frac{1}{2} \right)^{(k+1)/2} \right];
\]
$\sum_{k=0}^\infty I_k$ is a telescoping series and evaluates to 2. Since $|f_k(x)| = f_k(x)$ for all $k$ and $x$, the given integral also equals 2. ∎
Lebesgue integration also solves the unbounded interval problem:

Example 3.2.9 (unbounded interval). Evaluate the Lebesgue integral
\[
\int_0^\infty e^{-x}\, dx.
\]
Solution: Let
\[
f_k(x) = \begin{cases} e^{-x} & k - 1 \le x < k \\ 0 & \text{otherwise.} \end{cases}
\]
Then
\[
I_k = \int_{\mathbb{R}} |f_k(x)|\, dx = e^{-(k-1)} - e^{-k};
\]
$\sum_{k=1}^\infty I_k$ is a telescoping series and evaluates to 1. Since $|f_k(x)| = f_k(x)$ for all $k$ and $x$, the given integral also equals 1. ∎
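The two telescoping sums can be checked numerically. The sketch below assumes a dyadic decomposition for Example 3.2.8 (pieces supported on $(2^{-(k+1)}, 2^{-k}]$, $k = 0, 1, \ldots$) and unit-length pieces $[k-1, k)$, $k = 1, 2, \ldots$, for Example 3.2.9; the closed forms for each piece come from the antiderivatives $2\sqrt{x}$ and $-e^{-x}$, and the function names are illustrative.

```python
import math

def sqrt_piece(k):
    """Integral of x**-0.5 over (2**-(k+1), 2**-k], via the antiderivative 2*sqrt(x)."""
    return 2.0 * (2.0 ** (-k / 2.0) - 2.0 ** (-(k + 1) / 2.0))

def exp_piece(k):
    """Integral of exp(-x) over [k-1, k), via the antiderivative -exp(-x)."""
    return math.exp(-(k - 1)) - math.exp(-k)

# Partial sums of the two series; both telescope.
sqrt_total = sum(sqrt_piece(k) for k in range(0, 200))
exp_total = sum(exp_piece(k) for k in range(1, 200))
print(sqrt_total, exp_total)
```

The partial sums approach 2 and 1 respectively, matching the values of the improper Riemann integrals $\int_0^1 x^{-1/2}\,dx$ and $\int_0^\infty e^{-x}\,dx$.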
3.3 Lp Space
Definition 3.3.1 ($L_p$ space). For $p \ge 1$, the space $L_p[a, b]$ consists of all real-valued measurable functions $x$ on $[a, b]$ for which $|x(t)|^p$ is Lebesgue integrable. The norm of an element $x \in L_p[a, b]$ is defined as
\[
\|x\|_p = \left( \int_a^b |x(t)|^p\, dt \right)^{1/p}.
\]
Of course, $\|x\|_p = 0$ does not necessarily imply that $x(t) = 0$ since $x(t)$ may be nonzero on a set of measure zero, as we saw in Example 3.2.7. Therefore, we do not distinguish between functions that are equal almost everywhere and redefine 0 as the equivalence class of all functions that equal zero except on a set of measure zero. Note that we can define the $R_p$ spaces similarly, replacing Lebesgue integrable with Riemann integrable. While most of the functions we see in these notes will be Riemann integrable, we prefer to work with the $L_p$ spaces since they are Banach spaces (as we will see shortly), allowing us to use a strong theoretical framework.

A tricky special case is $L_\infty[a, b]$. The obvious definition is $\|x\|_\infty = \sup_{t \in [a, b]} |x(t)|$, but this is ambiguous since an element $x$ of $L_\infty[a, b]$ corresponds not to a single function but to an equivalence class of functions that differ on a set of measure zero. Therefore, we define the norm as
\[
\|x\|_\infty = \inf_{y(t) = x(t) \text{ almost everywhere}}\ \sup_{t \in [a, b]} |y(t)| = \operatorname*{ess\,sup}_{t \in [a, b]} |x(t)|,
\]
where ess sup denotes the essential supremum. Results analogous to Quiz Theorems 5 and 6 hold for the $L_p$ spaces:
THEOREM 3.3.2 (Hölder Inequality ([1, Theorem 2.10.3, p. 32])). If $p$ and $q$ satisfy $\frac{1}{p} + \frac{1}{q} = 1$, $1 \le p \le \infty$, $1 \le q \le \infty$, and $x \in L_p[a, b]$, $y \in L_q[a, b]$, then
\[
\int_a^b |x(t) y(t)|\, dt \le \|x\|_p \|y\|_q.
\]
Equality holds if and only if
\[
\left( \frac{|x(t)|}{\|x\|_p} \right)^p = \left( \frac{|y(t)|}{\|y\|_q} \right)^q
\]
almost everywhere on $[a, b]$.
Proof: The proof is essentially the same as that of Quiz Theorem 5. ∎
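THEOREM 3.3.3 (Minkowski Inequality ([1, Theorem 2.10.4, p. 33])). If $x, y \in L_p[a, b]$ with $1 \le p \le \infty$, then so is $x + y$, and $\|x + y\|_p \le \|x\|_p + \|y\|_p$.
Proof: Consider the function $f(t) = t^p$. Clearly this function is convex on $[0, \infty)$ for $p \ge 1$ since $f'(t) = p t^{p-1}$ and $f''(t) = p(p - 1) t^{p-2} \ge 0$ for $p \ge 1$ and $t > 0$. Define
\[
a(t) = \frac{|x(t)|}{\|x\|_p}, \qquad b(t) = \frac{|y(t)|}{\|y\|_p}, \qquad \lambda = \frac{\|x\|_p}{\|x\|_p + \|y\|_p}.
\]
Then
\[
\frac{|x(t) + y(t)|^p}{(\|x\|_p + \|y\|_p)^p} \overset{(a)}{\le} \frac{(|x(t)| + |y(t)|)^p}{(\|x\|_p + \|y\|_p)^p} = (\lambda a(t) + (1 - \lambda) b(t))^p \overset{(b)}{\le} \lambda (a(t))^p + (1 - \lambda)(b(t))^p,
\]
where (a) follows from the triangle inequality and (b) follows from the convexity of $t^p$. Then integrating over $[a, b]$ gives
\[
\frac{\|x + y\|_p^p}{(\|x\|_p + \|y\|_p)^p} \le \lambda + (1 - \lambda) = 1,
\]
yielding the desired result. ∎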
3.4 Cauchy Sequences
Definition 3.4.1 (Cauchy sequence). A sequence $\{x_n\}$ in a normed space is a Cauchy sequence if for any $\epsilon > 0$, there exists an integer $N$ such that $\|x_m - x_n\| < \epsilon$ for all $n, m > N$.
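Example 3.4.2. Show that the partial sums
\[
s_n = 4 \sum_{i=0}^n \frac{(-1)^i}{2i + 1}
\]
over the vector space $\mathbb{Q}$ form a Cauchy sequence.
Solution: We recognize that
\[
\tan^{-1} x = \sum_{i=0}^\infty \frac{(-1)^i x^{2i+1}}{2i + 1}, \qquad |x| \le 1,
\]
so $s_n \to 4 \tan^{-1} 1 = \pi$. Without loss of generality, let $n \ge m$. Then for $n, m > N$,
\[
|s_n - s_m| \le |s_n - \pi| + |\pi - s_m| \to 0
\]
as $N \to \infty$. Note that since $\pi \in \mathbb{R} \setminus \mathbb{Q}$, the sequence does not converge in $\mathbb{Q}$. The obvious fix is to let $\{s_n\}$ be a Cauchy sequence over the vector space $\mathbb{R}$. ∎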
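To see the Cauchy property without leaving $\mathbb{Q}$, the partial sums can be computed exactly. This sketch assumes the series in question is the Leibniz series $4\sum_i (-1)^i/(2i+1)$; the bound $|s_n - s_m| \le 4/(2m+3)$ for $n \ge m$ is the standard alternating-series estimate, and the function names are illustrative.

```python
from fractions import Fraction

def s(n):
    """Partial sum s_n = 4 * sum_{i=0}^{n} (-1)^i / (2i + 1), computed exactly in Q."""
    return 4 * sum(Fraction((-1) ** i, 2 * i + 1) for i in range(n + 1))

def gap_bound(m):
    """Alternating-series bound: |s_n - s_m| <= 4 / (2m + 3) for all n >= m."""
    return Fraction(4, 2 * m + 3)

for m in (1, 10, 100):
    n = 2 * m
    print(m, float(abs(s(n) - s(m))), float(gap_bound(m)))
```

Every $s_n$ is a rational number and the gaps shrink to zero, yet the only candidate limit is $\pi$, which is not in $\mathbb{Q}$.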
QUIZ THEOREM 7 (includes [1, Lemma 2.11.1, p. 35]). Every convergent sequence is a Cauchy sequence. Furthermore, every Cauchy sequence is bounded.
Proof: If a sequence $\{x_n\}$ converges to $x$, then for every $\epsilon > 0$, there exists an $N$ such that for every $n > N$, $\|x_n - x\| < \epsilon$. Then for $m, n > N$ we have
\[
\|x_m - x_n\| = \|(x_m - x) - (x_n - x)\| \le \|x_m - x\| + \|x_n - x\| < 2\epsilon,
\]
where the inequality follows from the triangle inequality. If we set $\epsilon' = 2\epsilon$, we have shown that for every $\epsilon' > 0$, there exists an $N$ such that for every $m, n > N$, $\|x_m - x_n\| < \epsilon'$. Therefore, $\{x_n\}$ is a Cauchy sequence.
To show that every Cauchy sequence is bounded, pick an $N$ such that $\|x_n - x_N\| < 1$ for $n > N$. Then for any $n > N$,
\[
\|x_n\| = \|x_n - x_N + x_N\| \le \|x_n - x_N\| + \|x_N\| < 1 + \|x_N\|,
\]
where the first inequality follows from the triangle inequality. Since $\|x_n\| < 1 + \|x_N\|$ for any $n > N$, and only finitely many terms remain, $\{x_n\}$ is bounded. ∎
Definition 3.4.3 (Banach space). A normed linear vector space $X$ is complete if every Cauchy sequence in $X$ has a limit in $X$. A complete normed linear vector space is a Banach space.

From Example 3.4.2, we see that $\mathbb{Q}$ is not complete.

Example 3.4.4 ([1, Example 2.11.2, p. 34]). Let $X$ be the vector space of sequences of real numbers with only finitely many nonzero components with the $l_\infty$ norm and define
\[
x_n = \left\{ 1, \frac{1}{2}, \ldots, \frac{1}{n - 1}, 0, 0, \ldots \right\}.
\]
Show that $\{x_n\}$ is a Cauchy sequence. How would you complete $X$ to make this Cauchy sequence convergent in $X$?
Solution: We have $\|x_n - x_m\| = \max(1/n, 1/m) \to 0$. Clearly $\{x_n\}$ does not converge to an element of $X$, but we can complete $X$ by turning it into the vector space of all sequences of real numbers that converge to zero. ∎
Example 3.4.5. Let $X = R_1[0, 1]$ and define
\[
x_n(t) = \begin{cases} 0 & 0 \le t < \frac{1}{n} \\ \dfrac{1}{\sqrt{t}} & \frac{1}{n} \le t \le 1. \end{cases}
\]
Show that $\{x_n\}$ is a Cauchy sequence. To what function does this sequence converge? Is it in $R_1[0, 1]$, and if not, of what larger space is it an element?
Solution: The sequence converges to $x(t) = t^{-1/2}$. Because $x$ is unbounded, $x \notin R_1[0, 1]$. However, from Example 3.2.8, we see that $x \in L_1[0, 1]$. ∎
QUIZ THEOREM 8 ([1, Example 2.11.4, p. 35]). $l_p$, for $1 \le p \le \infty$, is a Banach space.
Proof: First consider the case $1 \le p < \infty$. Let $\{x_n\}$ be a Cauchy sequence in $l_p$. Then if $x_n = \{\xi_i^n\}$,
\[
|\xi_k^n - \xi_k^m| \le \left( \sum_{i=1}^\infty |\xi_i^n - \xi_i^m|^p \right)^{1/p} \to 0,
\]
so for each $k$, $\{\xi_k^n\}$ is a Cauchy sequence and therefore converges to a limit $\xi_k$. We show that $x = \{\xi_k\} \in l_p$. From Quiz Theorem 7, there exists some $M$ such that $\|x_n\| \le M$ for all $n$, so
\[
\sum_{i=1}^j |\xi_i^n|^p \le \|x_n\|^p \le M^p.
\]
Since the sum on the left is finite, the inequality holds as $n \to \infty$:
\[
\sum_{i=1}^j |\xi_i|^p \le M^p.
\]
Furthermore, since the inequality holds uniformly in $j$,
\[
\sum_{i=1}^\infty |\xi_i|^p \le M^p,
\]
so $x \in l_p$, and $\|x\| \le M$. We still must show that $x_n \to x$. For any $\epsilon > 0$, there exists an $N$ such that for $n, m > N$,
\[
\sum_{i=1}^j |\xi_i^n - \xi_i^m|^p \le \|x_n - x_m\|^p \le \epsilon
\]
since $\{x_n\}$ is a Cauchy sequence. Since the sum on the left is finite, the inequality holds as $m \to \infty$:
\[
\sum_{i=1}^j |\xi_i^n - \xi_i|^p \le \epsilon.
\]
Furthermore, since the inequality holds uniformly in $j$,
\[
\sum_{i=1}^\infty |\xi_i^n - \xi_i|^p = \|x_n - x\|^p \le \epsilon
\]
for $n > N$, so $x_n \to x$.
Now let $\{x_n\}$ be a Cauchy sequence in $l_\infty$. Then $|\xi_k^n - \xi_k^m| \le \|x_n - x_m\| \to 0$, so for each $k$, as before, $\{\xi_k^n\}$ is a Cauchy sequence and therefore converges to a limit $\xi_k$. Furthermore, this convergence is uniform in $k$. We show that $x = \{\xi_k\} \in l_\infty$. Since $\{x_n\}$ is Cauchy, from Quiz Theorem 7 there exists an $M$ such that $\|x_n\| \le M$ for all $n$. Then $|\xi_k^n| \le \|x_n\| \le M$ for all $k$ and $n$, so $x \in l_\infty$ and $\|x\| \le M$. Since $\xi_k^n \to \xi_k$ uniformly, $x_n \to x$. ∎
Example 3.4.6 ([1, Example 2.11.3, p. 35]). Show that $C[0, 1]$ is a Banach space.
Solution: Let $\{x_n\}$ be a Cauchy sequence in $C[0, 1]$. For each fixed $t \in [0, 1]$, $|x_n(t) - x_m(t)| \le \|x_n - x_m\| \to 0$, so $\{x_n(t)\}$ is a Cauchy sequence of real numbers. Since $E$ ($\mathbb{R}$ with the Euclidean norm $\sqrt{(x - y)^2} = |x - y|$) is complete, there exists a real number $x(t)$ such that $x_n(t) \to x(t)$. Therefore, the functions $x_n$ converge pointwise to $x$.
This pointwise convergence is actually uniform in $t \in [0, 1]$; that is, for any $\epsilon > 0$, there exists an $N$ such that $|x_n(t) - x(t)| < \epsilon$ for all $t \in [0, 1]$ and $n > N$. For $\epsilon > 0$, choose $N$ such that $\|x_n - x_m\| < \epsilon/2$ for $n, m > N$, which we can do since $\{x_n\}$ is a Cauchy sequence. Then for $n > N$,
\[
|x_n(t) - x(t)| \le |x_n(t) - x_m(t)| + |x_m(t) - x(t)| \le \underbrace{\|x_n - x_m\|}_{< \epsilon/2} + |x_m(t) - x(t)|.
\]
By choosing $m$ sufficiently large (depending on $t$), we can make the second term less than $\epsilon/2$ since $x_m \to x$ pointwise, so $|x_n(t) - x(t)| < \epsilon$ for $n > N$.
We must still show that $x$ is continuous and therefore in $C[0, 1]$ and that $x_n \to x$ in the norm of $C[0, 1]$. To prove continuity, choose $\epsilon > 0$. For all $t, \delta, n$,
\[
|x(t + \delta) - x(t)| \le |x(t + \delta) - x_n(t + \delta)| + |x_n(t + \delta) - x_n(t)| + |x_n(t) - x(t)|.
\]
By the uniform convergence of $\{x_n\}$, we can choose $n$ such that the first and last terms are less than $\epsilon/3$. Since $x_n$ is continuous, $\delta$ can be chosen such that the middle term is less than $\epsilon/3$. Thus, $x$ is continuous. Since $x_n(t) \to x(t)$ uniformly, $x_n \to x$ in the norm of $C[0, 1]$. ∎
Example 3.4.7 ([1, Example 2.11.1, p. 35]). Show that the space of continuous functions on $[0, 1]$, with the norm
\[
\|x\| = \int_0^1 |x(t)|\, dt,
\]
is not a Banach space.
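Figure 3.4.1: Plots of $x_2(t)$, $x_3(t)$, and $x_4(t)$ (Example 3.4.7).

Solution: We must construct a Cauchy sequence of continuous functions that converges to a discontinuous function. Consider the sequence $\{x_n\}$, where
\[
x_n(t) = \begin{cases} 0 & 0 \le t \le \frac{1}{2} - \frac{1}{n} \\ nt - \frac{n}{2} + 1 & \frac{1}{2} - \frac{1}{n} \le t \le \frac{1}{2} \\ 1 & \frac{1}{2} \le t \le 1. \end{cases}
\]
Figure 3.4.1 shows plots of $x_2(t)$, $x_3(t)$, and $x_4(t)$. Each $x_n$ differs from the step function below by a triangle of base $\frac{1}{n}$ and height 1, so
\[
\|x_n - x_m\| = \int_0^1 |x_n(t) - x_m(t)|\, dt = \left| \frac{1}{2}(1)\frac{1}{n} - \frac{1}{2}(1)\frac{1}{m} \right| = \frac{1}{2} \left| \frac{1}{m} - \frac{1}{n} \right|,
\]
which clearly goes to zero as $n, m \to \infty$. Therefore, $\{x_n\}$ is a Cauchy sequence. However, $x_n \to x$, where $x$ is the discontinuous function
\[
x(t) = \begin{cases} 0 & 0 \le t < \frac{1}{2} \\ 1 & \frac{1}{2} \le t \le 1. \end{cases}
\]
Therefore, the given space is not a Banach space. ∎

THEOREM 3.4.8 ([1, Theorem 2.12.1, p. 38]). In a Banach space $X$, a subset $S \subset X$ is complete if and only if it is closed.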
Proof: If $S$ is complete, then every convergent sequence in $S$ is Cauchy (Quiz Theorem 7) and hence converges to a point in $S$, so $S$ is closed by Theorem 2.6.4. Conversely, since $X$ is a Banach space, any Cauchy sequence in $S$ must converge to a point $x \in X$, and if $S$ is closed, then $x \in S$. Hence, if $S$ is closed, then $S$ must be complete. ∎

THEOREM 3.4.9 ([1, Theorem 2.12.2, p. 38]). In a normed linear vector space, any finite-dimensional subspace is complete.
Proof: The proof is by induction on the dimension of the subspace. A 1-dimensional subspace is complete since all elements in the subspace can be written in the form $x = \alpha e$, where $\alpha$ is a real number and $e$ is a fixed vector. A sequence $\{\alpha_n e\}$ is Cauchy exactly when the sequence of real numbers $\{\alpha_n\}$ is Cauchy, and it converges exactly when $\{\alpha_n\}$ converges; a Cauchy sequence $\{\alpha_n\}$ is convergent since $E$ is complete.
Now assume that the theorem is true for subspaces of dimension $N - 1$. Let $X$ be a normed space and let $M$ be an $N$-dimensional subspace of $X$. We show that $M$ is complete. Let $\{e_1, \ldots, e_N\}$ be a basis for $M$. Define
\[
\delta_k = \inf_{a_j} \left\| e_k - \sum_{j \ne k} a_j e_j \right\|, \qquad k = 1, \ldots, N.
\]
$\delta_k$ is the distance from $e_k$ to the subspace $M_k$ generated by the remaining $N - 1$ basis vectors. We must have $\delta_k > 0$; otherwise a sequence of vectors in the $(N - 1)$-dimensional subspace $M_k$ could be constructed converging to $e_k \notin M_k$, which contradicts the induction hypothesis that $M_k$ is complete. Define $\delta = \min_k \delta_k$ and let $\{x_n\}$ be a Cauchy sequence in $M$. Each $x_n$ has a unique representation
\[
x_n = \sum_{i=1}^N \lambda_i^n e_i.
\]
Then for arbitrary $n, m$,
\[
\|x_n - x_m\| = \left\| \sum_{i=1}^N (\lambda_i^n - \lambda_i^m) e_i \right\| \ge \delta\, |\lambda_k^n - \lambda_k^m|, \qquad k = 1, \ldots, N;
\]
since $\|x_n - x_m\| \to 0$, $|\lambda_k^n - \lambda_k^m| \to 0$ for each $k$. Then for each $k$, $\{\lambda_k^n\}$ is a Cauchy sequence and therefore converges to a limit $\lambda_k$. Let $x = \sum_{k=1}^N \lambda_k e_k \in M$. We show that $x_n \to x$. For all $n$,
\[
\|x_n - x\| = \left\| \sum_{k=1}^N (\lambda_k^n - \lambda_k) e_k \right\| \le N \max_{1 \le k \le N} |\lambda_k^n - \lambda_k| \|e_k\|,
\]
but since $|\lambda_k^n - \lambda_k| \to 0$ for all $k$, $\|x_n - x\| \to 0$. Thus, $x_n \to x$. ∎
3.5 Compactness and Extrema
Definition 3.5.1 (compact). A set $K$ in a normed space $X$ is compact if, given an arbitrary sequence $\{x_i\}$ in $K$, there is a subsequence $\{x_{i_n}\}$ that converges to an element $x \in K$.

By the Bolzano-Weierstrass theorem, compact is equivalent to closed and bounded in finite-dimensional vector spaces. Note that if $K$ is compact, then every Cauchy sequence in $K$ converges to a limit in $K$, so $K$ is also complete. We can now prove the extreme value theorem (see Theorem 1.2.1 for an equivalent statement):

Definition 3.5.2 (upper semicontinuous). A real-valued functional $f$ defined on a normed space $X$ is upper semicontinuous at $x_0$ if for any $\epsilon > 0$, there exists a $\delta > 0$ such that $\|x - x_0\| < \delta$ implies $f(x) - f(x_0) < \epsilon$ (note the absence of the usual absolute value signs).
THEOREM 3.5.3 (extreme value theorem [1, Theorem 2.13.1, p. 40]). An upper semicontinuous functional on a compact subset $K$ of a normed linear vector space $X$ achieves a maximum on $K$.
Proof: Let $M = \sup_{x \in K} f(x)$ (we allow the possibility that $M = \infty$). There is a sequence $\{x_i\}$ in $K$ such that $f(x_i) \to M$. Since $K$ is compact, there is a convergent subsequence $x_{i_n} \to x \in K$. Clearly $f(x_{i_n}) \to M$ and since $f$ is upper semicontinuous, $f(x) \ge \lim_{n \to \infty} f(x_{i_n}) = M$. Since $f(x)$ is finite, $M < \infty$, and since $f(x) \le M$ by the definition of $M$, we conclude that $f(x) = M$. ∎

Example 3.5.4 ([1, Example 2.13.1, p. 40]). Consider the linear functional
\[
f(x) = \int_0^{1/2} x(t)\, dt - \int_{1/2}^1 x(t)\, dt,
\]
defined for $x \in C[0, 1]$. Show that $f$ is continuous. What function on the unit sphere in $C[0, 1]$ maximizes $f$? Is this function an element of $C[0, 1]$?
Solution: We have
\[
|f(x) - f(x_0)| = \left| \int_0^{1/2} (x(t) - x_0(t))\, dt - \int_{1/2}^1 (x(t) - x_0(t))\, dt \right| \le \frac{1}{2} \|x - x_0\| + \frac{1}{2} \|x - x_0\| = \|x - x_0\|.
\]
Then for every $\epsilon > 0$, there exists a $\delta$ (namely $\delta = \epsilon$) such that $\|x - x_0\| < \delta$ implies $|f(x) - f(x_0)| < \epsilon$. The unit ball in $C[0, 1]$ is the set of continuous functions $x$ such that $-1 \le x(t) \le 1$ for all $t \in [0, 1]$. The supremum of $f$ over the unit ball in $C[0, 1]$ is clearly 1, but no continuous function $x$ with $\|x\| \le 1$ achieves this supremum. (If we instead considered $L_\infty[0, 1]$, then the function
\[
x(t) = \begin{cases} 1 & 0 \le t \le \frac{1}{2} \\ -1 & \frac{1}{2} < t \le 1 \end{cases}
\]
achieves the supremum.) ∎

This shows that the unit ball in $C[0, 1]$ is not compact. In fact, the unit ball is not compact in any infinite-dimensional normed linear vector space:

THEOREM 3.5.5. In any infinite-dimensional normed linear vector space $X$, the unit ball is not compact.
Proof: We construct an infinite sequence of linearly independent unit vectors $z_n$ such that $\|z_n - z_m\| > 1/2$ for all $n \ne m$. Let $\{z_1, \dots, z_{n-1}\}$ be a set of $n - 1$ such vectors, and let $Y$ be the subspace generated by these vectors. $Y$ is a proper subspace of $X$ since $X$ is not finite-dimensional. Therefore, we can choose $x \in X \setminus Y$ and let $d = \inf_{y \in Y} \|x - y\|$. Since $Y$ is finite-dimensional, it is complete by Theorem 3.4.9. If $d$ were zero, then there would exist a sequence of vectors in $Y$ that converged to $x \notin Y$, contradicting the fact that $Y$ is complete and hence closed. Therefore, $d > 0$. Choose some $y_0 \in Y$ such that $d \le \|x - y_0\| < 2d$, and let
\[ z_n = \frac{x - y_0}{\|x - y_0\|}. \]
Then by construction, $\|z_n\| = 1$. Now for any $y \in Y$,
\[ \|z_n - y\| = \frac{\bigl\| x - y_0 - \|x - y_0\|\, y \bigr\|}{\|x - y_0\|}. \]
Since $y_0 + \|x - y_0\|\, y \in Y$, the numerator is at least $d$, and the denominator is less than $2d$, so the fraction is greater than $1/2$. Since the above equation holds for all $y \in Y$, $\|z_n - z_i\| > 1/2$ for all $i < n$. By repeating this process for each $n \ge 2$, we construct an infinite sequence of vectors in the unit ball (specifically on the unit sphere) such that $\|z_n - z_m\| > 1/2$ for all $n \ne m$. Thus, this sequence has no Cauchy subsequence and hence, by Quiz Theorem 7, no convergent subsequence. Therefore, the unit ball is not compact. ∎
3.6 Quotient Spaces

Definition 3.6.1 (equivalence). Let $M$ be a subspace of a vector space $X$. Two elements $x_1, x_2 \in X$ are equivalent modulo $M$ if $x_1 - x_2 \in M$. In this case, we write $x_1 \equiv x_2$.

This equivalence relation partitions $X$ into disjoint subsets, called cosets, of equivalent elements. Geometrically, these cosets are linear varieties that are distinct translations of the subspace $M$. Each $x \in X$ belongs to a unique coset of $M$, denoted $[x]$.

Definition 3.6.2 (quotient space). Let $M$ be a subspace of a vector space $X$. The quotient space $X/M$ consists of all cosets of $M$ with addition defined by $[x_1] + [x_2] = [x_1 + x_2]$ and scalar multiplication defined by $a[x] = [ax]$.

Definition 3.6.3 (norm in a quotient space). If $X$ is a Banach space and $M$ is a closed subspace of $X$, then the norm of a coset $[x] \in X/M$ is defined as
\[ \|[x]\| = \inf_{m \in M} \|x + m\|. \]
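Example 3.6.4. Show that this definition satisfies the axioms for a norm.

Solution: Positive homogeneity follows since, for $\alpha \ne 0$,
\[ \|\alpha[x]\| = \|[\alpha x]\| = \inf_{m \in M} \|\alpha x + m\| \overset{(a)}{=} \inf_{m \in M} \|\alpha x + \alpha m\| \overset{(b)}{=} |\alpha| \inf_{m \in M} \|x + m\| = |\alpha|\,\|[x]\|, \]
where (a) follows because $M$ is a linear subspace, so finding the infimum over $\alpha m \in M$ is the same as finding the infimum over $m \in M$, and (b) follows by the positive homogeneity of the norm $\|\cdot\|$ in $X$. The case $\alpha = 0$ is immediate, since $\|[0]\| = 0$.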
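The triangle inequality follows since
\[ \|[x] + [y]\| = \|[x + y]\| = \inf_{m \in M} \|x + y + m\| \overset{(a)}{=} \inf_{m_1, m_2 \in M} \|x + m_1 + y + m_2\| \overset{(b)}{\le} \inf_{m_1, m_2 \in M} \bigl( \|x + m_1\| + \|y + m_2\| \bigr) = \|[x]\| + \|[y]\|, \]
where (a) follows because $M$ is a linear subspace, so the sums $m_1 + m_2$ with $m_1, m_2 \in M$ range over exactly the elements of $M$, and (b) follows by the fact that the norm $\|\cdot\|$ in $X$ obeys the triangle inequality.

Positivity follows from the positivity of the norm $\|\cdot\|$ in $X$. Clearly $\|[0]\| = 0$. There remains the possibility that $[x]$ does not include $0$ but that there is a sequence in $[x]$ that converges to $0$, in which case $\|[x]\| = 0$. However, this is impossible since $M$ is closed (so each coset is closed), so $\|[x]\| > 0$ if $[x] \ne [0]$. Thus, we have shown positive definiteness. ∎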
3.7 Denseness and Separability

Definition 3.7.1 (dense). A set $D$ is dense in a normed space $X$ if for each $x \in X$ and each $\epsilon > 0$, there exists a $d \in D$ such that $\|x - d\| < \epsilon$.

Note that if $D$ is dense in $X$, there are points of $D$ arbitrarily close to each $x \in X$. Therefore, given $x \in X$, we can construct a sequence of elements in $D$ that converges to $x$. Thus, an equivalent definition is: $D$ is dense in $X$ if $\overline{D} = X$; that is, if the closure of $D$ equals $X$.

One of the most common examples of a dense set is the set $\mathbb{Q}$ in $\mathbb{R}$. By the Weierstrass approximation theorem (1.5.6), the set of all polynomials with rational coefficients is dense in $C[a,b]$ (see Example 3.7.3).

Definition 3.7.2 (separable). A normed space is separable if it contains a countable dense set.
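QUIZ THEOREM 9. $l_p$, for $1 \le p < \infty$, is separable.

Proof: Let $D$ be the set of all finitely nonzero sequences with rational components. Clearly, $D$ is countable. Let $x = \{\xi_i\} \in l_p$ and $\epsilon > 0$. Since $x \in l_p$, $\sum_{i=1}^{\infty} |\xi_i|^p < \infty$, so there exists an $N$ such that
\[ \sum_{i=N+1}^{\infty} |\xi_i|^p < \frac{\epsilon}{2}. \]
For $i = 1, \dots, N$, let $r_i$ be a rational such that $|\xi_i - r_i|^p < \epsilon / 2N$, and let $d = \{r_1, \dots, r_N, 0, 0, \dots\}$. Then $d \in D$ and
\[ \|x - d\|^p = \sum_{i=1}^{N} |\xi_i - r_i|^p + \sum_{i=N+1}^{\infty} |\xi_i|^p < \epsilon, \]
so $D$ is dense in $l_p$. ∎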
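The construction in this proof can be tried numerically. The following is only a sketch under stated assumptions: the sequence $\xi_i = \sqrt{2}/2^i$, the choice $p = 2$, and the helper names `xi`, `tail_p2`, and `rational_approximant` are all illustrative and not part of the notes, and floating-point numbers stand in for real numbers.

```python
from fractions import Fraction
import math

def xi(i):
    """i-th term (1-indexed) of the illustrative sequence x = {sqrt(2) / 2**i}."""
    return math.sqrt(2) / 2 ** i

def tail_p2(N):
    """Closed form of sum_{i > N} |xi_i|**2 = 2 * 4**(-N) / 3 for this particular x."""
    return 2.0 * 4.0 ** (-N) / 3.0

def rational_approximant(eps):
    """Return (d, err): rational entries r_1..r_N and err = ||x - d||_2**2 < eps."""
    # Step 1: choose N so that the tail sum is below eps / 2.
    N = 1
    while tail_p2(N) >= eps / 2:
        N += 1
    # Step 2: choose rationals r_i with |xi_i - r_i|**2 < eps / (2N),
    # allowing larger denominators until the bound is met.
    d = []
    for i in range(1, N + 1):
        q = 1
        r = Fraction(xi(i)).limit_denominator(q)
        while abs(xi(i) - r) ** 2 >= eps / (2 * N):
            q *= 10
            r = Fraction(xi(i)).limit_denominator(q)
        d.append(r)
    head = sum(abs(xi(i) - d[i - 1]) ** 2 for i in range(1, N + 1))
    return d, head + tail_p2(N)

d, err = rational_approximant(1e-6)
print(len(d), err)
```

The returned list `d` is the finitely nonzero rational sequence from the proof, and `err` is the value of $\|x - d\|^p$ split into the same head and tail sums.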
Example 3.7.3. Show that $C[a,b]$ is separable.

Solution: Let $D$ be the set of all polynomials of finite degree with rational coefficients. Clearly, $D$ is countable since it can be put into correspondence with the set of all finitely nonzero sequences with rational components. Given $x \in C[a,b]$ and $\epsilon > 0$, by the Weierstrass approximation theorem (Theorem 1.5.6), there exists a polynomial $p$ such that $|x(t) - p(t)| < \epsilon/2$ for all $t \in [a,b]$. While $p$ does not necessarily have rational coefficients, we can construct another polynomial $r$ that does, such that $|p(t) - r(t)| \le \epsilon/2$ for all $t \in [a,b]$. We can easily construct $r$ by changing each of the coefficients of $p$ by less than $\epsilon / (2N c^N)$, where $N - 1 = \deg p$ and $c = \max(1, |a|, |b|)$ (the 1 is included so that $|t|^i \le c^N$ even when $|a|, |b| < 1$). Then
\[ |p(t) - r(t)| \le \sum_{i=0}^{N-1} \frac{\epsilon}{2N c^N}\, |t|^i \le \frac{\epsilon}{2}, \qquad a \le t \le b, \]
as desired. Thus
\[ \|x - r\| = \max_{t \in [a,b]} |x(t) - r(t)| \le \max_{t \in [a,b]} |x(t) - p(t)| + \max_{t \in [a,b]} |p(t) - r(t)| < \epsilon, \]
so $D$ is dense in $C[a,b]$. ∎

Example 3.7.4. Show that $l_\infty$ is not separable.
Solution: Suppose we have found a countable dense subset $\{x_n\}$ of $l_\infty$, where $x_n = \{\xi_i^n\}$. (We can index the set $\{x_n\}$ by $n$ since it is countable.) Construct $x = \{\xi_i\}$ by the following rule:

If $\xi_i^i < \tfrac{1}{2}$, set $\xi_i = 1$.
If $\xi_i^i \ge \tfrac{1}{2}$, set $\xi_i = 0$.

Then $\|x - x_i\| \ge \tfrac{1}{2}$ for all $i$, since $x$ and $x_i$ differ by at least $\tfrac{1}{2}$ in their $i$th component. Therefore $\{x_n\}$ is not dense, so $l_\infty$ is not separable. ∎

Example 3.7.5. Show that $L_\infty$ is not separable.

Solution: Consider the set of indicator functions $\{I_t\}_{t \in [0,1]} \subset L_\infty$, where $I_t(x) = 1$ if $x \in [0,t]$ and $I_t(x) = 0$ otherwise. Clearly this set is uncountable since the set of real numbers in the interval $[0,1]$ is uncountable. Furthermore, $\|I_s - I_t\| = 1$ if $s \ne t$. Therefore, the open balls $\{B_{1/2}(I_t)\}_{t \in [0,1]}$ do not intersect. However, any dense set of $L_\infty$ must include an element in each of these balls; since the set of balls is uncountable, any dense set of $L_\infty$ is uncountable. Therefore $L_\infty$ is not separable. ∎
CHAPTER 4
HILBERT SPACE
Reading: [1, Chapter 3].
4.1 Inner Products

Definition 4.1.1 (pre-Hilbert space). A pre-Hilbert space is a linear vector space $X$ over a field $F$ along with an inner product defined on $X \times X$, which associates with any two vectors $x, y \in X$ a scalar $(x, y)$. The inner product satisfies the following axioms:

1. conjugate symmetry: $(x, y) = \overline{(y, x)}$ for all $x, y \in X$.
2. sesquilinearity (linearity in the first variable): $(x + y, z) = (x, z) + (y, z)$ and $(\alpha x, y) = \alpha (x, y)$ for all $x, y, z \in X$ and all $\alpha \in F$.
3. positivity and positive definiteness: $(x, x) \ge 0$ and $(x, x) = 0$ if and only if $x = 0$.

Note that by conjugate symmetry and linearity in the first term,
\[ (x, y + z) = \overline{(y + z, x)} = \overline{(y, x)} + \overline{(z, x)} = (x, y) + (x, z). \]
The resulting property that $(w + y, x + z) = (w, x) + (w, z) + (y, x) + (y, z)$ is sometimes called additivity. If $F = \mathbb{R}$, then conjugate symmetry becomes regular symmetry. Therefore, if $F = \mathbb{R}$, as we will assume for the rest of this text, sesquilinearity becomes standard bilinearity.

Let $\|x\| = \sqrt{(x \mid x)}$. We verify that $\|\cdot\| = \sqrt{(\cdot \mid \cdot)}$ is indeed a norm:

1. positive homogeneity:
\[ \|\alpha x\| = \sqrt{(\alpha x, \alpha x)} \overset{(a)}{=} \sqrt{\alpha\,(x, \alpha x)} \overset{(b)}{=} \sqrt{\alpha\,\overline{(\alpha x, x)}} \overset{(c)}{=} \sqrt{\alpha \bar{\alpha}\,(x, x)} = |\alpha|\,\|x\|, \]
where (a) and (c) follow from sesquilinearity and (b) follows from conjugate symmetry.

2. triangle inequality:
\[ \|x + y\|^2 = (x + y, x + y) = (x, x) + (x, y) + (y, x) + (y, y) \le \|x\|^2 + 2|(x, y)| + \|y\|^2. \]
By the Cauchy-Schwarz inequality, (re)proved below, $|(x, y)| \le \|x\|\|y\|$; substituting this into the inequality and taking square roots gives $\|x + y\| \le \|x\| + \|y\|$.
3. positive definiteness and positivity: This follows from the positivity and positive definiteness of the inner product.

Below we prove a version of the Cauchy-Schwarz inequality equivalent to Proposition 3.1.2.

Proposition 4.1.2 (Cauchy-Schwarz Inequality ([1], Lemma 1, p. 47)). For all $x, y$ in an inner product space, $|(x, y)| \le \|x\|\|y\|$. Equality holds if and only if $x = \lambda y$ or if $y = 0$.

Proof: If $y = 0$, the inequality holds trivially. If $y \ne 0$, then for all scalars $\lambda$,
\[ 0 \le (x - \lambda y, x - \lambda y) = (x, x) - \lambda (y, x) - \bar{\lambda} (x, y) + |\lambda|^2 (y, y). \]
Then for $\lambda = (x, y)/(y, y)$,
\[ 0 \le (x, x) - \frac{|(x, y)|^2}{(y, y)} \quad\Longrightarrow\quad |(x, y)| \le \sqrt{(x, x)(y, y)} = \|x\|\|y\|. \]

The following lemma generalizes a useful result in Euclidean geometry:

Proposition 4.1.3 (parallelogram law ([1], Lemma 3, p. 48)). For any $x, y$ in a pre-Hilbert space $X$,
\[ \|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2. \]

Proof: In the general case, directly expand the norms in terms of the inner product. For $X = \mathbb{R}^2$, we can prove the parallelogram law using only the law of cosines. Define the inner product $(x \mid y)$ as the dot product $x \cdot y = \|x\|_2 \|y\|_2 \cos\theta$, where $\theta$ is the angle between the vectors $x$ and $y$. Then by the law of cosines,
\[ \|x - y\|_2^2 = \|x\|_2^2 + \|y\|_2^2 - 2\|x\|_2\|y\|_2 \cos\theta, \]
\[ \|x + y\|_2^2 = \|x\|_2^2 + \|y\|_2^2 - 2\|x\|_2\|y\|_2 \underbrace{\cos(\pi - \theta)}_{-\cos\theta}. \]
Summing these equations gives the desired result.

The Cauchy-Schwarz inequality allows us to prove the continuity of the inner product.

Proposition 4.1.4 (continuity of the inner product ([1], Lemma 4, p. 49)). Suppose that $x_n \to x$ and $y_n \to y$ in a pre-Hilbert space. Then $(x_n, y_n) \to (x, y)$.

Proof: Because the sequence $\{x_n\}$ is convergent, it is bounded; say $\|x_n\| \le M$. Then
\[ |(x_n, y_n) - (x, y)| = |(x_n, y_n) - (x_n, y) + (x_n, y) - (x, y)| \le |(x_n, y_n - y)| + |(x_n - x, y)|. \]
By the Cauchy-Schwarz inequality,
\[ |(x_n, y_n) - (x, y)| \le \|x_n\|\|y_n - y\| + \|x_n - x\|\|y\|. \]
Since $\|x_n\|$ is bounded,
\[ |(x_n, y_n) - (x, y)| \le M\|y_n - y\| + \|x_n - x\|\|y\| \to 0. \]

4.2 The Projection Theorem
Definition 4.2.1 (orthogonality). In a pre-Hilbert space, two vectors $x, y$ are said to be orthogonal, written $x \perp y$, if $(x, y) = 0$. A vector is said to be orthogonal to a set $S$, written $x \perp S$, if $x \perp s$ for all $s \in S$.

Consider the following optimization problem in a pre-Hilbert space: given a vector $x$ in a pre-Hilbert space $X$ and a subspace $M$ in $X$, find $\arg\min_{m \in M} \|x - m\|$. If $x \in M$, then the solution is trivial, but if $x \notin M$, we must answer three questions:

1. Is there a vector $m \in M$ that minimizes $\|x - m\|$, or is there no $m$ that is at least as good as the others?
2. Is the solution unique?
3. What is the solution?

The following version of the projection theorem answers these questions.

THEOREM 4.2.2 (Projection Theorem: pre-Hilbert space ([1], Theorem 1, p. 50)). Let $X$ be a pre-Hilbert space and $M$ be a subspace of $X$, and let $x \in X$. If there exists a vector $m_0 \in M$ such that $\|x - m_0\| \le \|x - m\|$ for all $m \in M$, then $m_0$ is unique. Furthermore, $m_0 = \arg\min_{m \in M} \|x - m\|$ if and only if $x - m_0 \perp M$.

Proof: First we show that if $m_0 = \arg\min_{m \in M} \|x - m\|$, then $x - m_0 \perp M$. Suppose for contradiction that there exists an $m \in M$ such that $x - m_0 \not\perp m$. Without loss of generality, assume that $\|m\| = 1$ and $(x - m_0 \mid m) = \delta \ne 0$. Define $m_1 = m_0 + \delta m$. Then
\[ \|x - m_1\|^2 = \|x - m_0 - \delta m\|^2 = \|x - m_0\|^2 - (x - m_0, \delta m) - (\delta m, x - m_0) + |\delta|^2 = \|x - m_0\|^2 - |\delta|^2 < \|x - m_0\|^2, \]
so if $x - m_0 \not\perp M$, $m_0$ is not a minimizing vector.

Now we show that if $x - m_0 \perp M$, then $m_0 = \arg\min_{m \in M} \|x - m\|$ and $m_0$ is unique. For any $m \in M$,
\[ \|x - m\|^2 = \|x - m_0 + m_0 - m\|^2 = \|x - m_0\|^2 + \|m_0 - m\|^2 \]
by the Pythagorean theorem. Then $m_0 = \arg\min_{m \in M} \|x - m\|$ since $\|x - m\| \ge \|x - m_0\|$ by positive definiteness of the norm, and $m_0$ is unique since $\|m_0 - m\|^2 = 0$, and therefore $\|x - m\| = \|x - m_0\|$, if and only if $m_0 = m$. ∎

The following counterexample will suggest a stronger version of the projection theorem.
Example 4.2.3. Let $X$ be the space of infinite sequences, with the $l_2$-norm and the inner product $(x \mid y) = \sum_{i=1}^{\infty} \xi_i \eta_i$ for $x = \{\xi_i\}$, $y = \{\eta_i\} \in X$. Let $x = \{1, \tfrac{1}{2}, \tfrac{1}{3}, \dots\}$, and let $M$ be the subspace of sequences with only finitely many nonzero terms such that all even terms equal zero. Show that the minimizing vector $m_0$ does not exist in $M$.

Solution: Let $m_0 = \{\mu_i\}$ be a candidate for the minimizing vector, where $\mu_{2k} = 0$ for $k \in \mathbb{N}$. Then
\[ \|x - m_0\|^2 = \sum_{i=1}^{\infty} \left( \frac{1}{i} - \mu_i \right)^2, \]
so the best we can do is set $\mu_{2k-1} = \frac{1}{2k-1}$ for as many $k \ge 1$ as possible. Consider the sequence $\{m_n\}$, where $m_n = \{\mu_i^n\}$ with $\mu_{2k}^n = 0$ for $k \in \mathbb{N}$, $\mu_{2k-1}^n = \frac{1}{2k-1}$ for $k < n$, and $\mu_{2k-1}^n = 0$ for $k \ge n$. Then
\[ \|x - m_n\|^2 = \sum_{k=1}^{\infty} \left( \frac{1}{2k} \right)^2 + \sum_{k=n}^{\infty} \left( \frac{1}{2k-1} \right)^2 = \frac{1}{4} \cdot \frac{\pi^2}{6} + \sum_{k=n}^{\infty} \left( \frac{1}{2k-1} \right)^2. \]
Because $m_n$ is the best we can do among sequences in $M$ with $n - 1$ nonzero terms, a minimizing vector does not exist since $\|x - m_n\| > \|x - m_{n+1}\|$ for all $n$. The problem is that $m_n \to m_0$ as $n \to \infty$, but $m_0 \notin M$ since $M$ is not complete. ∎

To fix the problem, we have the following definition.

Definition 4.2.4 (Hilbert space). A Hilbert space is a complete pre-Hilbert space. That is, a Hilbert space is a Banach space with an inner product that induces the norm.

We then prove the following stronger version of the projection theorem.

QUIZ THEOREM 10 (Projection Theorem: Hilbert space ([1], Theorem 2, p. 51)). Let $H$ be a Hilbert space and $M$ be a closed subspace of $H$. For any $x \in H$, there exists a unique vector $m_0 \in M$ such that $\|x - m_0\| \le \|x - m\|$ for all $m \in M$. Furthermore, $m_0 = \arg\min_{m \in M} \|x - m\|$ if and only if $x - m_0 \perp M$.

Proof: Uniqueness and orthogonality have already been established in the proof of Theorem 4.2.2, so it suffices to establish the existence of the minimizing vector $m_0$. If $x \in M$, then $m_0 = x$ and the theorem is trivial. If $x \notin M$, define $\delta = \inf_{m \in M} \|x - m\|$. We want to find an $m_0 \in M$ such that $\|x - m_0\| = \delta$. Let $\{m_i\}$ be a sequence of vectors in $M$ such that $\|x - m_i\| \to \delta$. By the parallelogram law (4.1.3),
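\[ \|(m_j - x) + (x - m_i)\|^2 + \|(m_j - x) - (x - m_i)\|^2 = 2\|m_j - x\|^2 + 2\|x - m_i\|^2, \]
or
\[ \|m_j - m_i\|^2 = 2\|m_j - x\|^2 + 2\|x - m_i\|^2 - 4\left\| x - \frac{m_i + m_j}{2} \right\|^2. \]
Then $\|x - (m_i + m_j)/2\| \ge \delta$ by definition of $\delta$ (since $(m_i + m_j)/2 \in M$) and $\|m_i - x\|^2 \to \delta^2$ as $i \to \infty$, so
\[ \|m_j - m_i\|^2 \le 2\|m_i - x\|^2 + 2\|x - m_j\|^2 - 4\delta^2 \to 2\delta^2 + 2\delta^2 - 4\delta^2 = 0. \]
Therefore $\{m_i\}$ is a Cauchy sequence, and since $M$ is a closed subspace of a complete space, $\{m_i\}$ has a limit $m_0$ in $M$. By continuity of the norm, which follows from continuity of the inner product (4.1.4), $\|x - m_0\| = \delta$. ∎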
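The role of closedness can be seen numerically by returning to Example 4.2.3. The sketch below is an illustration under assumptions, not part of the notes: it takes $x = \{1/i\}$ and lets $m_n$ copy the first $n - 1$ odd-indexed terms of $x$, and it uses the known values $\sum_k 1/(2k)^2 = \pi^2/24$ and $\sum_k 1/(2k-1)^2 = \pi^2/8$. The function name `dist_sq` is hypothetical.

```python
import math

def dist_sq(n):
    """||x - m_n||^2 for x = {1/i}, where m_n copies the first n-1 odd-indexed terms of x."""
    even_part = math.pi ** 2 / 24      # sum over k of (1/(2k))^2, never reducible within M
    odd_total = math.pi ** 2 / 8       # sum over k of (1/(2k-1))^2
    odd_matched = sum(1.0 / (2 * k - 1) ** 2 for k in range(1, n))
    return even_part + (odd_total - odd_matched)

distances = [dist_sq(n) for n in range(1, 8)]
for n, value in enumerate(distances, start=1):
    print(n, value)
```

The distances decrease strictly toward the infimum $\pi^2/24$ without reaching it: the minimizing sequence is Cauchy, but its limit has infinitely many nonzero terms and so lies outside the (non-closed) subspace $M$.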
4.3 Orthogonal Complements

Definition 4.3.1 (orthogonal complement). Given a subset $S$ of a pre-Hilbert space, we call $S^\perp = \{t : t \perp S\}$ the orthogonal complement of $S$.

Example 4.3.2. If $S = \{(1, 1, 1)\}$, what are $S^\perp$, $S^{\perp\perp}$, and $S^{\perp\perp\perp}$?

Solution: $S^\perp = \{(x, y, z) : x + y + z = 0\}$, the plane through the origin perpendicular to the vector $(1, 1, 1)$. Then
\[ S^{\perp\perp} = \{(a, b, c) : ax + by + cz = 0 \text{ whenever } x + y + z = 0\} = \{(a, b, c) : a = b = c\} = [\{(1, 1, 1)\}] = [S]. \]
Finally, $S^{\perp\perp\perp} = \{(x, y, z) : x + y + z = 0\} = S^\perp$. ∎

Proposition 4.3.3 (([1], Proposition 1, p. 52)). Let $S$ and $T$ be subsets of a Hilbert space. Then

1. $S^\perp$ is a closed subspace.
2. $S \subset S^{\perp\perp}$.
3. If $S \subset T$, then $T^\perp \subset S^\perp$.
4. $S^{\perp\perp\perp} = S^\perp$.

Proof: We prove each statement individually:

1. $S^\perp$ is a subspace because for any $x, y \in S^\perp$, $a, b \in F$, and $s \in S$,
\[ (ax + by, s) = a(x, s) + b(y, s) = 0, \]
so $ax + by \in S^\perp$. $S^\perp$ is closed because if $\{x_n\}$ is a sequence in $S^\perp$ converging to $x$, then continuity of the inner product implies $0 = (x_n, s) \to (x, s)$ for all $s \in S$, so $x \in S^\perp$.
2. If $x \in S$, then $x \perp y$ for all $y \in S^\perp$; therefore, $x \in S^{\perp\perp}$.
3. If $y \in T^\perp$, then $y \perp x$ for all $x \in S$ since $S \subset T$. Therefore, $y \in S^\perp$.
4. From (2), $S^\perp \subset S^{\perp\perp\perp}$ and $S \subset S^{\perp\perp}$. Combining the latter result with (3), $S^{\perp\perp\perp} \subset S^\perp$. Therefore, $S^{\perp\perp\perp} = S^\perp$. ∎
We want to show that given any closed subspace $M$ of a Hilbert space $H$, any vector $x$ can be written as the sum of a vector $m \in M$ and a vector $n \in M^\perp$. First we need the following definition.

Definition 4.3.4 (direct sum). A vector space $X$ is the direct sum of two vector spaces $M$ and $N$, written $M \oplus N$, if every vector $x \in X$ has a unique representation of the form $x = m + n$, where $m \in M$ and $n \in N$.

Example 4.3.5. Show that $\mathbb{R}^3$ is the sum of the $xy$-plane and the $xz$-plane, but not the direct sum.

Solution: In $\mathbb{R}^3$, the $xy$-plane is given by $M = \{(x, y, 0) : x, y \in \mathbb{R}\}$ and the $xz$-plane is given by $N = \{(x, 0, z) : x, z \in \mathbb{R}\}$; their sum is $\{(2x, y, z) : x, y, z \in \mathbb{R}\} = \{(x, y, z) : x, y, z \in \mathbb{R}\} = \mathbb{R}^3$. However, $\mathbb{R}^3$ is not the direct sum of the two planes since we can write $(x, y, z) = (w, y, 0) + (x - w, 0, z)$ for any $w \in \mathbb{R}$, so no $x \in \mathbb{R}^3$ can be written uniquely in the form $x = m + n$ with $m \in M$ and $n \in N$. ∎

THEOREM 4.3.6 (([1], Theorem 1, p. 53)). If $M$ is a closed linear subspace of a Hilbert space $H$, then $H = M \oplus M^\perp$ and $M = M^{\perp\perp}$.

Proof: Let $x \in H$. By the projection theorem, there is a unique vector $m_0 \in M$ such that $\|x - m_0\| \le \|x - m\|$ for all $m \in M$, and $n_0 = x - m_0 \in M^\perp$. Therefore, we can write $x = m_0 + n_0$ with $m_0 \in M$ and $n_0 \in M^\perp$. To show that this representation is unique, suppose $x = m_1 + n_1$ with $m_1 \in M$ and $n_1 \in M^\perp$. Then $0 = m_1 - m_0 + n_1 - n_0$. The vectors $m_1 - m_0$ and $n_1 - n_0$ are orthogonal, so by the Pythagorean theorem, $0 = \|m_1 - m_0\|^2 + \|n_1 - n_0\|^2$, so by positive definiteness of the norm, $m_0 = m_1$ and $n_0 = n_1$. Therefore, $H = M \oplus M^\perp$.

Now we show that $M = M^{\perp\perp}$. By Part (2) of Proposition 4.3.3, $M \subset M^{\perp\perp}$. To show the other direction, let $x \in M^{\perp\perp}$. By the above result, $x = m + n$ where $m \in M$ and $n \in M^\perp$. Since $M \subset M^{\perp\perp}$, both $x$ and $m$ lie in $M^{\perp\perp}$, so $x - m = n \in M^{\perp\perp}$. But $n \in M^\perp$ as well, so $(n, n) = \|n\|^2 = 0$, which by positive definiteness of the inner product implies $n = 0$. Thus, $x = m \in M$, so $M^{\perp\perp} \subset M$ and we conclude $M = M^{\perp\perp}$. ∎
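The decomposition $x = m + n$ can be checked on a small case. This is a minimal sketch assuming $H = \mathbb{R}^3$ with the ordinary dot product and $M = [\{(1,1,1)\}]$, the set from Example 4.3.2; the test vector and the helper names `dot` and `decompose` are illustrative choices.

```python
def dot(u, v):
    """Ordinary dot product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def decompose(x, s):
    """Split x into m in M = span{s} and n in the orthogonal complement of M."""
    c = dot(x, s) / dot(s, s)                      # coefficient of the projection onto s
    m = tuple(c * si for si in s)
    n = tuple(xi - mi for xi, mi in zip(x, m))
    return m, n

s = (1.0, 1.0, 1.0)
x = (3.0, -1.0, 4.0)
m, n = decompose(x, s)
print(m, n, dot(n, s))
```

Here `n` lies in the plane $x + y + z = 0$, and the Pythagorean identity $\|x\|^2 = \|m\|^2 + \|n\|^2$ used in the uniqueness argument can be confirmed directly from the output.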
4.4 The Gram-Schmidt Procedure

Definition 4.4.1 (orthogonal set). A set $S$ of vectors in a pre-Hilbert space is an orthogonal set if $x \perp y$ for all $x, y \in S$ such that $x \ne y$. If in addition $\|x\| = 1$ for all $x \in S$, then the set is orthonormal.

Proposition 4.4.2 (([1], Proposition 1, p. 53)). An orthogonal set of nonzero vectors is a linearly independent set.

Proof: Suppose $\{x_1, \dots, x_n\}$ is a finite subset of the given orthogonal set and that there are scalars $\alpha_1, \dots, \alpha_n$ such that $\sum_{i=1}^{n} \alpha_i x_i = 0$. Then
\[ \left( \sum_{i=1}^{n} \alpha_i x_i, x_j \right) = \alpha_j (x_j, x_j) = (0, x_j) = 0, \qquad j = 1, \dots, n. \]
Since $x_j \ne 0$, $\alpha_j = 0$ for $j = 1, \dots, n$. Then by Theorem 2.3.3, $x_1, \dots, x_n$ are linearly independent. ∎
THEOREM 4.4.3 (Gram-Schmidt Process). Let $\{x_i\}$ be a countable sequence of linearly independent vectors in a pre-Hilbert space $X$. Consider the vectors $\{e_i\}$ defined by
\[ z_i = x_i - \sum_{j=1}^{i-1} (x_i, e_j)\, e_j, \qquad e_i = \frac{z_i}{\|z_i\|} \]
for all $i$. Then $\{e_i\}$ is an orthonormal sequence and for each $n$, $[\{x_1, \dots, x_n\}] = [\{e_1, \dots, e_n\}]$.

Proof: The proof of the first part is by induction. For the case $n = 1$, $\{e_1\}$ is an orthonormal sequence. For general $n$, assume that $\{e_1, \dots, e_n\}$ is an orthonormal sequence and prove that $\{e_1, \dots, e_{n+1}\}$ is also an orthonormal sequence. Because $\{e_1, \dots, e_n\}$ is an orthonormal sequence, it suffices to show that $e_{n+1}$ is orthogonal to each of $e_1, \dots, e_n$ and that $\|e_{n+1}\| = 1$. Taking inner products,
\[ \begin{aligned} (z_{n+1} \mid z_i) &= \left( x_{n+1} - \sum_{j=1}^{n} (x_{n+1}, e_j)\, e_j,\; x_i - \sum_{k=1}^{i-1} (x_i, e_k)\, e_k \right) \\ &= (x_{n+1}, x_i) - \sum_{k=1}^{i-1} (x_{n+1}, e_k)(e_k, x_i) - \sum_{j=1}^{n} (x_{n+1}, e_j)(e_j, x_i) + \sum_{k=1}^{i-1} (x_{n+1}, e_k)(e_k, x_i) \\ &= (x_{n+1}, x_i) - \sum_{j=1}^{n} (x_{n+1}, e_j)(e_j, x_i) \\ &= (x_{n+1}, x_i) - (x_{n+1}, x_i) = 0, \end{aligned} \]
for $i = 1, \dots, n$, where the last line follows from Parseval's equality, since $x_i$ lies in the span of $e_1, \dots, e_i$. Since $z_i$ is a nonzero multiple of $e_i$ and $e_{n+1} = z_{n+1}/\|z_{n+1}\|$, we have $e_{n+1} \perp e_i$ for $i = 1, \dots, n$ and $\|e_{n+1}\| = 1$. Therefore, $\{e_i\}$ is an orthonormal sequence.

To show that $[\{x_1, \dots, x_n\}] = [\{e_1, \dots, e_n\}]$ for all $n$, observe that $e_n$ is a linear combination of $x_1, \dots, x_n$. Therefore, we can also write $x_n$ as a linear combination of $e_1, \dots, e_n$; thus, a vector can be written as a linear combination of $x_1, \dots, x_n$ if and only if it can be written as a linear combination of $e_1, \dots, e_n$. ∎
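Example 4.4.4. Let $X$ be the space of polynomials $p$ such that $\deg p \le 2$. Define the inner product
\[ (p, q) = \int_{-1}^{1} p(t)\, q(t)\, dt \]
for any $p, q \in X$. Starting with the sequence $\{1, t, t^2\}$, construct an orthonormal basis for $X$.

Solution: We have
\[ \begin{aligned} x_1 &= 1, & z_1 &= x_1 = 1, & e_1 &= \frac{z_1}{\|z_1\|} = \frac{1}{\sqrt{2}}, \\ x_2 &= t, & z_2 &= x_2 - (x_2, e_1)\, e_1 = t, & e_2 &= \frac{z_2}{\|z_2\|} = \sqrt{\frac{3}{2}}\; t, \\ x_3 &= t^2, & z_3 &= x_3 - (x_3, e_1)\, e_1 - (x_3, e_2)\, e_2 = t^2 - \frac{1}{3}, & e_3 &= \frac{z_3}{\|z_3\|} = \sqrt{\frac{5}{2}} \left( \frac{3}{2} t^2 - \frac{1}{2} \right). \end{aligned} \]
∎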
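The computation in this example can be reproduced exactly with rational arithmetic. The sketch below makes some illustrative choices that are not in the notes: polynomials are stored as coefficient lists in increasing powers of $t$, the inner product is taken over $[-1, 1]$, and the functions `inner` and `gram_schmidt` return the unnormalized vectors $z_i$ so that no square roots are needed.

```python
from fractions import Fraction

def inner(p, q):
    """(p, q) = integral over [-1, 1] of p(t) q(t) dt, for coefficient lists p and q."""
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:               # odd powers integrate to zero on [-1, 1]
                total += Fraction(a) * Fraction(b) * Fraction(2, i + j + 1)
    return total

def gram_schmidt(xs):
    """Return the orthogonal vectors z_i (before normalization) for the inputs xs."""
    zs = []
    for x in xs:
        z = [Fraction(c) for c in x]
        for w in zs:
            # (x, e) e with e = w / ||w|| equals ((x, w) / (w, w)) w.
            coeff = inner(x, w) / inner(w, w)
            z = [zc - coeff * wc for zc, wc in zip(z, w)]
        zs.append(z)
    return zs

basis = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]      # 1, t, t^2
zs = gram_schmidt(basis)
norms_sq = [inner(z, z) for z in zs]
print(zs, norms_sq)
```

Dividing each $z_i$ by the square root of the corresponding entry of `norms_sq` gives the orthonormal vectors $e_i$.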
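Suppose we have a vector $x$ in a Hilbert space $X$ and want to find the closest vector to $x$ that lies in the subspace $M = [\{y_1, \dots, y_n\}]$. We can use Quiz Theorem 10 and the Gram-Schmidt process to solve this problem. If $M$ is one-dimensional, we can set $e_1 = y_i/\|y_i\|$ for any nonzero $y_i$ to find the solution $\hat{x} = (x, e_1)\, e_1 \in M$.

Example 4.4.5. Let $X = \mathbb{R}^2$ and define the inner product
\[ (v, w) = 4 v_1 w_1 + v_2 w_2 \]
for any $v = (v_1, v_2)$, $w = (w_1, w_2) \in X$. Let $M = [\{y_1\}] = [\{(1, 4)\}]$. Find the vector in $M$ that is closest to $x = (5/2, 0)$.

Solution: First we set
\[ e_1 = \frac{y_1}{\|y_1\|} = \left( \frac{1}{2\sqrt{5}}, \frac{2}{\sqrt{5}} \right), \]
since $\|y_1\|^2 = 4 \cdot 1 + 16 = 20$. Then the minimizing vector is
\[ \hat{x} = (x, e_1)\, e_1 = \left( \left( \tfrac{5}{2}, 0 \right), \left( \frac{1}{2\sqrt{5}}, \frac{2}{\sqrt{5}} \right) \right) \left( \frac{1}{2\sqrt{5}}, \frac{2}{\sqrt{5}} \right) = \sqrt{5} \left( \frac{1}{2\sqrt{5}}, \frac{2}{\sqrt{5}} \right) = \left( \tfrac{1}{2}, 2 \right). \]
As a check, $x - \hat{x} = (2, -2)$ satisfies $(x - \hat{x}, y_1) = 4 \cdot 2 \cdot 1 + (-2) \cdot 4 = 0$. ∎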
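A short numerical check of this example follows. It is a sketch that assumes the data read as $(v, w) = 4 v_1 w_1 + v_2 w_2$, $y_1 = (1, 4)$, and target vector $x = (5/2, 0)$; the variable names are illustrative.

```python
import math

def inner(v, w):
    """Weighted inner product (v, w) = 4 v1 w1 + v2 w2 assumed for Example 4.4.5."""
    return 4 * v[0] * w[0] + v[1] * w[1]

y1 = (1.0, 4.0)
x = (2.5, 0.0)

norm_y1 = math.sqrt(inner(y1, y1))                 # ||y1|| = sqrt(20)
e1 = (y1[0] / norm_y1, y1[1] / norm_y1)            # unit vector spanning M
coef = inner(x, e1)                                # Fourier coefficient (x, e1)
x_hat = (coef * e1[0], coef * e1[1])               # closest point in M
residual = (x[0] - x_hat[0], x[1] - x_hat[1])

print(x_hat, inner(residual, y1))
```

The residual is orthogonal to $y_1$ in the weighted inner product, which is exactly the condition $x - m_0 \perp M$ from the projection theorem; it is not orthogonal in the ordinary dot product.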
This technique works equally well if $M$ is generated by more than one vector, provided that we start by converting the set $\{y_1, \dots, y_n\}$ into an orthonormal basis $\{e_1, \dots, e_m\}$. (Of course, if $y_1, \dots, y_n$ are not linearly independent, $m < n$.) Note that we can write this problem as
\[ \min_{\alpha_1, \dots, \alpha_n} \left\| x - \sum_{i=1}^{n} \alpha_i y_i \right\| \quad \text{subject to} \quad \left( x - \sum_{i=1}^{n} \alpha_i y_i,\; y_j \right) = 0, \qquad j = 1, \dots, n. \]
Note that the objective function is quadratic in $\alpha_1, \dots, \alpha_n$ and that the constraints are linear in $\alpha_1, \dots, \alpha_n$, so using Lagrange multipliers reduces this problem to solving a system of linear equations. More specifically, if we write
\[ G = G(y_1, \dots, y_n) = \begin{pmatrix} (y_1, y_1) & \cdots & (y_1, y_n) \\ \vdots & \ddots & \vdots \\ (y_n, y_1) & \cdots & (y_n, y_n) \end{pmatrix}, \qquad \alpha = \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_n \end{pmatrix}, \qquad c = \begin{pmatrix} (x, y_1) \\ \vdots \\ (x, y_n) \end{pmatrix}, \]
then we can write the constraints as $G^T \alpha = c$. These are called the normal equations. $G$ is called the Gram matrix. The normal equations have a unique solution if and only if $\det G \ne 0$.

Proposition 4.4.6 ([1], Proposition 1, p. 56). $\det G \ne 0$ if and only if $y_1, \dots, y_n$ are linearly independent.

Proof: Equivalently, we show that $\det G = 0$ if and only if $y_1, \dots, y_n$ are linearly dependent. If $y_1, \dots, y_n$ are linearly dependent, then there exist $\alpha_1, \dots, \alpha_n$ not all equal to zero such that $\sum_{i=1}^{n} \alpha_i y_i = 0$. Taking inner products,
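\[ \left( \sum_{i=1}^{n} \alpha_i y_i,\; y_j \right) = \sum_{i=1}^{n} \alpha_i (y_i, y_j) = 0, \qquad j = 1, \dots, n, \]
or equivalently,
\[ \sum_{i=1}^{n} \alpha_i G_i = 0, \qquad G_i^T = \bigl( (y_i, y_1), \dots, (y_i, y_n) \bigr), \]
where $G_i^T$ is the $i$th row of $G$. Therefore, the rows of the Gram matrix are linearly dependent, so $\det G = 0$.

If $\det G = 0$, then the rows of the Gram matrix are linearly dependent, so there exist $\alpha_1, \dots, \alpha_n$ not all equal to zero such that $\sum_{i=1}^{n} \alpha_i (y_i, y_j) = 0$ for $j = 1, \dots, n$. Therefore,
\[ \left( \sum_{i=1}^{n} \alpha_i y_i,\; y_j \right) = 0, \qquad j = 1, \dots, n, \]
so
\[ \sum_{j=1}^{n} \alpha_j \left( \sum_{i=1}^{n} \alpha_i y_i,\; y_j \right) = \left\| \sum_{i=1}^{n} \alpha_i y_i \right\|^2 = 0. \]
Thus $\sum_{i=1}^{n} \alpha_i y_i = 0$, so $y_1, \dots, y_n$ are linearly dependent. ∎

Note that if $\det G = 0$, solving the normal equations is equivalent to solving fewer than $n$ equations in $n$ unknowns, so a solution exists, though it is not unique.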
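The normal equations can be solved directly for a small case. The sketch below assumes $X = \mathbb{R}^3$ with the ordinary dot product and two generating vectors, so that $G^T \alpha = c$ is a $2 \times 2$ system handled by Cramer's rule; the vectors and the helper names `dot`, `gram`, `det2`, and `solve_normal_equations` are illustrative choices, not taken from the notes.

```python
from fractions import Fraction

def dot(u, v):
    """Ordinary dot product, computed exactly."""
    return sum(Fraction(a) * Fraction(b) for a, b in zip(u, v))

def gram(ys):
    """Gram matrix with entries G[i][j] = (y_i, y_j)."""
    return [[dot(yi, yj) for yj in ys] for yi in ys]

def det2(G):
    return G[0][0] * G[1][1] - G[0][1] * G[1][0]

def solve_normal_equations(x, y1, y2):
    """Solve G^T alpha = c for two generating vectors by Cramer's rule."""
    G = gram([y1, y2])
    c = [dot(x, y1), dot(x, y2)]
    d = det2(G)
    if d == 0:
        raise ValueError("det G = 0: y1 and y2 are linearly dependent")
    # G^T has entries (G^T)[j][i] = (y_i, y_j).
    a1 = (c[0] * G[1][1] - G[1][0] * c[1]) / d
    a2 = (G[0][0] * c[1] - G[0][1] * c[0]) / d
    return a1, a2

y1, y2, x = (1, 1, 0), (0, 1, 1), (1, 2, 3)
alpha = solve_normal_equations(x, y1, y2)
x_hat = tuple(alpha[0] * a + alpha[1] * b for a, b in zip(y1, y2))
residual = tuple(xi - hi for xi, hi in zip(x, x_hat))
print(alpha, x_hat, residual)
```

The residual $x - \hat{x}$ is orthogonal to both generators, and replacing $y_2$ by a multiple of $y_1$ makes `det2(gram(...))` vanish, in line with Proposition 4.4.6.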
4.5 Fourier Series

Now suppose that $X$ is infinite-dimensional and that $M$ is generated by a countably infinite set of linearly independent vectors. (Recall from Example 2.3.7 that $M$ does not necessarily equal $X$.) Assume that we have constructed an orthonormal basis $e_1, e_2, \dots$ for $M$ using the Gram-Schmidt process. Then
\[ \hat{x} = \sum_{i=1}^{\infty} (x, e_i)\, e_i \]
is the vector in $M$ closest to $x$. However, this is an infinite series, so we need the following necessary and sufficient condition for an infinite series of orthogonal vectors to converge in Hilbert space.

THEOREM 4.5.1 ([1], Theorem 1, p. 59). Let $\{e_i\}$ be an orthonormal sequence in a Hilbert space $H$. A series of the form $\sum_{i=1}^{\infty} \xi_i e_i$ converges to an element $x \in H$ if and only if $\sum_{i=1}^{\infty} |\xi_i|^2 < \infty$. In that case, $\xi_i = (x, e_i)$ for $i \ge 1$.

Proof: Suppose $\sum_{i=1}^{\infty} |\xi_i|^2 < \infty$ and define the partial sums $s_n = \sum_{i=1}^{n} \xi_i e_i$. Then
\[ \|s_n - s_m\|^2 = \left\| \sum_{i=n+1}^{m} \xi_i e_i \right\|^2 = \sum_{i=n+1}^{m} |\xi_i|^2 \to 0 \]
as $n, m \to \infty$, where $n < m$ without loss of generality. Therefore, $\{s_n\}$ is a Cauchy sequence; since $H$ is complete by definition of a Hilbert space, there exists an $x \in H$ such that $s_n \to x$.

If $s_n \to x \in H$, then $\{s_n\}$ is a Cauchy sequence, so $\sum_{i=n+1}^{m} |\xi_i|^2 \to 0$ as $n, m \to \infty$. Thus, $\sum_{i=n+1}^{\infty} |\xi_i|^2 \to 0$, so $\sum_{i=1}^{\infty} |\xi_i|^2 < \infty$. ∎

Clearly $(s_n, e_i) = \xi_i$ for $n \ge i \ge 1$. Since $s_n \to x$, $(s_n, e_i) \to (x, e_i)$ by Proposition 4.1.4. We call $\{\xi_i\} = \{(x, e_i)\}$ the Fourier coefficients of $x$ with respect to $\{e_i\}$.
Proposition 4.5.2 (Bessel's inequality ([1], Lemma 1, p. 59)). Let $x$ be an element in a Hilbert space $H$ and suppose $\{e_i\}$ is an orthonormal sequence in $H$. Then
\[ \sum_{i=1}^{\infty} |(x, e_i)|^2 \le \|x\|^2. \]

Proof: Let $\xi_i = (x, e_i)$ for $i \ge 1$. For all $n$,
\[ 0 \le \left\| x - \sum_{i=1}^{n} \xi_i e_i \right\|^2 = \|x\|^2 - \sum_{i=1}^{n} |\xi_i|^2. \]
Then
\[ \sum_{i=1}^{n} |\xi_i|^2 \le \|x\|^2 \]
for all $n$; letting $n \to \infty$ establishes the inequality. ∎

Example 4.5.3. Show that Bessel's inequality holds even if $\{e_i\}$ does not generate $H$.

Solution: Let $X$ be the space of polynomials on $[-1, 1]$ and define the inner product
\[ (p, q) = \int_{-1}^{1} p(t)\, q(t)\, dt \]
for any $p, q \in X$. Let $\{e_i\}$ be an orthonormal set of even polynomials and let $x(t) = t$. Since each $e_i$ is even and $x$ is odd, $(x, e_i) = 0$ for all $i$. Then
\[ 0 = \sum_{i=1}^{\infty} |(x, e_i)|^2 \le \|x\|^2 = \frac{2}{3}, \]
so Bessel's inequality holds. ∎
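Bessel's inequality can also be checked exactly for a small orthonormal family. The sketch below is an illustration under assumptions: it uses only two even polynomials, $z_1 = 1$ and $z_3 = t^2 - \tfrac{1}{3}$ (orthogonal on $[-1, 1]$), and computes $|(x, e_i)|^2$ as $(x, z_i)^2 / (z_i, z_i)$ to avoid square roots. The names `inner`, `Z`, and `bessel_sum` are hypothetical.

```python
from fractions import Fraction

def inner(p, q):
    """(p, q) = integral over [-1, 1] of p(t) q(t) dt, for coefficient lists p and q."""
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:               # odd powers integrate to zero on [-1, 1]
                total += Fraction(a) * Fraction(b) * Fraction(2, i + j + 1)
    return total

# Orthogonal even polynomials z_1 = 1 and z_3 = t^2 - 1/3 (coefficients by power of t).
Z = [[1], [Fraction(-1, 3), 0, 1]]

def bessel_sum(x):
    """Sum of |(x, e_i)|^2 with e_i = z_i / ||z_i||, computed as (x, z_i)^2 / (z_i, z_i)."""
    return sum(inner(x, z) ** 2 / inner(z, z) for z in Z)

x_odd = [0, 1]              # x(t) = t
x_even = [0, 0, 0, 0, 1]    # x(t) = t^4

print(bessel_sum(x_odd), inner(x_odd, x_odd))
print(bessel_sum(x_even), inner(x_even, x_even))
```

For $x(t) = t$ the sum is zero, as in the example; for $x(t) = t^4$ the sum is positive but still strictly below $\|x\|^2$, because $t^4$ is not in the span of the two chosen polynomials.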
Since Bessel's inequality guarantees that $\sum_{i=1}^{\infty} |(x, e_i)|^2 < \infty$, Theorem 4.5.1 guarantees that the series $\sum_{i=1}^{\infty} (x, e_i)\, e_i$ converges to some element. Since any Hilbert space $H$ is complete by definition, that element is in $H$. Let $S = \{e_i\}$ be an infinite subset of $H$; the subspace $[S]$ includes all finite linear combinations of the vectors in $S$, but since we have an infinite sum, we are interested in the closure $\overline{[S]}$. For example, if $S = \{1, t^2, t^4, \dots\}$, then $\cos t$ is in $\overline{[S]}$.

THEOREM 4.5.4 ([1], Theorem 2, p. 60). Let $x$ be an element in a Hilbert space $H$ and suppose $\{e_i\}$ is an orthonormal sequence in $H$. Then the series
\[ \sum_{i=1}^{\infty} (x, e_i)\, e_i \]
converges to an element $\hat{x} \in M = \overline{[\{e_i\}]}$. Furthermore, $x - \hat{x} \in M^\perp$.

Proof:

1. Convergence of Fourier series

Since any Hilbert space $H$ is complete, the series converges to an element of $H$. The vectors $e_i$ are an infinite subset $S$. The subspace $[S]$ includes all finite linear combinations, but since we have an infinite sum, we need the closure of this subspace.

If $S = \{1, t^2, t^4, \dots\}$, what transcendental function is in the closure of $[S]$?

Prove (Theorem 2 on page 60) that
\[ \hat{x} = \sum_{i=1}^{\infty} (x \mid e_i)\, e_i \]
converges to an element of the closed subspace $M$ generated by the $e_i$.
If the closed subspace $M$ is $H$ itself, then the orthonormal sequence $e_i$ is called complete, and Bessel's inequality becomes an equality. A useful criterion (proof left for homework): the orthonormal sequence $e_i$ is complete if and only if the only vector orthogonal to each of the $e_i$ is the null vector $\theta$.

2. Completeness: an example

The Hilbert space is $H = L_2[-1, 1]$. From the sequence $\{1, t, t^2, \dots\}$ it is straightforward to use Gram-Schmidt to construct an orthonormal basis. Luenberger, p. 61, has an explicit formula, but we do not need it. To prove completeness, we assume (for a contradiction) the existence of a function $f(t)$ that is orthogonal to each $t^n$. This element $f(t)$ of $H$ may be discontinuous or unbounded, but its antiderivative
\[ F(t) = \int_{-1}^{t} f(\tau)\, d\tau \]
is a continuous function of $t$.

Prove continuity.

Prove that $F(-1) = F(1) = 0$.

Using integration by parts, prove that
\[ \int_{-1}^{1} t^n F(t)\, dt = 0. \]

Using Weierstrass, we construct a polynomial $Q(t)$ such that $|F(t) - Q(t)| < \epsilon$ on $[-1, 1]$. Now prove (this is tricky, see Luenberger p. 62) that
\[ \int_{-1}^{1} |F(t)|^2\, dt \le 2\epsilon^2. \]

Thus the continuous function $F(t)$ is the zero function and the Lebesgue integrable function $f(t)$ is in the same equivalence class as the zero function, i.e. it is $\theta$.

3. When codimension is smaller - a finite example

The Hilbert space is $H = E^3$. Let $a = (6, 0, 0)$, let $N$ be the subspace spanned by the vectors $z_1 = (1, 0, -1)$ and $z_2 = (1, -2, 1)$, which are conveniently orthogonal to one another, and let $V$ be the linear variety $a + N$.

Find the vector $m_0 \in N$ that is closest to $a$. Find the vector $a_0 \in V$ of minimum norm.

State and prove the theorem (Theorem 1 on p. 64) that these two problems (or any two like them) are really the same.

There is an easier way to do this problem, since the dimension of $N^\perp$ is just 1. Since $y = (1, 1, 1)$ is perpendicular to $N$, $[y] = N^\perp$, and $V$ is the set of all vectors that satisfy the single equation
\[ (x \mid y) = (a \mid y) = 6. \]
Of these, the one with minimum norm is the one that lies in $N^\perp$. So set $x = \beta y$ and solve the problem.

Of course, this approach would have been spectacularly better if $H$ and $N$ were infinite-dimensional while $N^\perp$ was 1-dimensional.
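Both routes to the answer in this finite example can be compared numerically. The sketch below assumes the data read as $a = (6, 0, 0)$, $z_1 = (1, 0, -1)$, $z_2 = (1, -2, 1)$, and $y = (1, 1, 1)$ with the ordinary dot product; the helper names `dot`, `scale`, `add`, and `sub` are illustrative.

```python
def dot(u, v):
    """Ordinary dot product on E^3."""
    return sum(p * q for p, q in zip(u, v))

def scale(c, v):
    return tuple(c * vi for vi in v)

def add(u, v):
    return tuple(p + q for p, q in zip(u, v))

def sub(u, v):
    return tuple(p - q for p, q in zip(u, v))

a = (6.0, 0.0, 0.0)
z1, z2 = (1.0, 0.0, -1.0), (1.0, -2.0, 1.0)    # orthogonal basis assumed for N
y = (1.0, 1.0, 1.0)                            # spans the orthogonal complement of N

# Route 1: project a onto N using the orthogonal basis z1, z2.
m0 = add(scale(dot(a, z1) / dot(z1, z1), z1),
         scale(dot(a, z2) / dot(z2, z2), z2))

# Route 2: minimum-norm vector in V = a + N has the form beta * y with (x | y) = (a | y).
beta = dot(a, y) / dot(y, y)
a0 = scale(beta, y)

print(m0, a0, sub(a, m0))
```

The two problems give complementary pieces of the same decomposition: $a = m_0 + a_0$ with $m_0 \in N$ and $a_0 \in N^\perp$, and the second route needs only one coefficient.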
4. The dual approximation problem

Theorem 2 on p. 65 generalizes the preceding example.

Let $y_1, y_2, \dots, y_n$ be a set of linearly independent vectors. Luenberger calls the space that they span $M$, but it is more consistent with the preceding example to call it $N^\perp$. Consider all vectors that satisfy the equations
\[ (x \mid y_1) = c_1, \quad (x \mid y_2) = c_2, \quad \dots, \quad (x \mid y_n) = c_n. \]
If all the $c_i$ are zero, these vectors are in the subspace $N$. Otherwise they are in a linear variety $V = N + a$, where
\[ (a \mid y_i) = c_i \text{ for all } i. \]

How do we know that the vector in $V$ with minimum norm is of the form
\[ x_0 = \sum_{i=1}^{n} \beta_i y_i\,? \]

Write down a set of equations that can be solved for the $\beta_i$. How could we solve this problem by Gram-Schmidt?

5. A finite-dimensional example

You are doing the term project for your course "Ethnic Sculpture." You start with a log of length 21 meters. In the first week, you cut off $u$ meters, carve a totem pole, and set it vertically in Harvard Yard. In the second week, you cut off $v$ meters, carve a totem pole, and set it vertically in Harvard Yard. In the third week, you carve a totem pole from the remaining $w$ meters, and set it vertically in Harvard Yard. At the end of the fourth week, you take down the poles, having gained artistic renown equal to $3u + 2v + w$. You need renown of 46 to get an A in the course. The energy cost of erecting a totem pole is proportional to the square of its height, so you want to minimize $\|x\|^2 = u^2 + v^2 + w^2$.

Express the constraints in the form $(y \mid x) = c$.

The vectors $y_1$ and $y_2$ generate $N^\perp$. Find the linear combination $\beta_1 y_1 + \beta_2 y_2$ that satisfies both constraints. What is the solution to this optimization problem?

6. An easy economic example

To keep the numbers small, measure money in trillions of dollars and time in decades. You are Treasury secretary in the new Obama administration, and your task is to build up a fund of $e^{3/2}$ trillion dollars over the next decade to deal with the financial disaster that will be caused when the current crop of Harvard undergraduates gets busy on Wall Street. By selling off formerly toxic securities you already have a trillion dollars to start with. By making loans to banks, you could get your fund balance to grow at an annual rate of 5 percent compounded continuously, in accordance with the differential equation
\[ \dot{x}(t) = 0.5\, x(t). \]
That will increase your money to $\sqrt{e}$ trillion in a decade, but it is not enough. You will need funds from Congress at a rate $u(t)$.

Constraints: $x(0) = 1$, $x(1) = e^{3/2}$.

Optimization problem: choose $u(t)$ to minimize the Hilbert space norm $(u \mid u) = \int_0^1 u^2(t)\, dt$. The solution lies in an infinite-dimensional vector space.

Secret: the codimension is 1. If we can express the constraint in the form $(u \mid y) = c$, then the solution is that $u(t)$ is in the space generated by $y(t)$, and we just have to determine one coefficient.

First you will have to learn something about linear differential equations that Luenberger takes for granted.

What inhomogeneous differential equation does $x(t)$ satisfy?

What is a solution to the corresponding homogeneous differential equation?

Substitute $x(t) = a(t)\, x_0(t)$, where $x_0(t)$ is a solution to the homogeneous equation, and solve for $\dot{a}(t)$.

Find an expression for $a(1) - a(0)$ as an integral.

Express the constraint $x(1) = e^{3/2}$ in terms of an inner product $(u \mid y) = c$.

The optimal $u(t)$ lies in the space $N^\perp$ generated by $y$. Use the constraint to find the coefficient, and the problem is solved!

7. An example from optimal control.

Let $H = L_2[0, 1]$, which is infinite-dimensional. We have a door that is supposed, at the push of a button, to open through an angle of one radian and come back to rest in one second. The door is controlled by an electric motor. By supplying a current $u(t) \in H$, which produces a proportional torque, we can overcome a linear damping force and also cause some angular acceleration. With all physical constants set equal to 1 for simplicity, the angular velocity $\omega$ of the door satisfies the differential equation
\[ \dot{\omega}(t) + \omega(t) = u(t). \]
Now we insist that the vector $u(t)$ must lie in the linear variety $V$ of functions that cause the motor to rotate through an angle of 1 radian in one second, starting at rest and ending at rest.

What small change would make $V$ be a subspace $N$? What is the dimension of $N$ and of $V$?

The secret to solving this problem is that the codimension of $N$ is 2. To prove this, we have to find basis vectors $y_1$ and $y_2$ that are orthogonal to $N$, so that
\[ (y_1 \mid u) = \int_0^1 y_1(t)\, u(t)\, dt = \omega(1) = 0, \qquad (y_2 \mid u) = \int_0^1 y_2(t)\, u(t)\, dt = \theta(1) = 1. \]

We want to find the vector of minimum norm in $V$, the function $u(t)$ that minimizes the energy consumption $\int_0^1 u(t)^2\, dt$. This will lie in the two-dimensional space spanned by $y_1$ and $y_2$.
4.5. Fourier Series 0 8. More fun with differential equations
73
First solve the homogeneous first-order equation $\dot\omega(t) + \omega(t) = 0$.
Now multiply this solution by $a(t)$ (variation of parameters) to get a trial solution for the inhomogeneous equation $\dot\omega(t) + \omega(t) = u(t)$. Plug in the trial solution to get a formula for $\dot a(t)$. Integrate once to express $a(1)$ and $\omega(1)$ in terms of $u(t)$. Integrate the original equation, in the form $\dot\omega(t) + \dot\theta(t) = u(t)$, from 0 to 1 to get a relation among $\omega(1)$, $\theta(1)$, and $u(t)$. What are the vectors $y_1$ and $y_2$?
9. Finishing the solution
By the general theorem for the dual approximation problem, the vector $u(t)$ of minimum norm lies in the space generated by $y_1(t)$ and $y_2(t)$. We could find the minimum-norm $u(t) = \alpha_1 y_1(t) + \alpha_2 y_2(t)$ by solving two linear equations for $\alpha_1$ and $\alpha_2$. However, there is a simpler basis for the space: the functions $1$ and $e^t$. So $u(t) = \beta_1 + \beta_2 e^t$. Use the constraints $\omega(1) = 0$ and $\theta(1) = 1$ to get two equations for $\beta_1$ and $\beta_2$, and solve them to determine $u(t)$.
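The two-equation solve in items 7 to 9 can be cross-checked numerically. The sketch below assumes the constraint vectors $y_1(t) = e^{t-1}$ and $y_2(t) = 1 - e^{t-1}$, which is what variation of parameters yields for $\dot\omega + \omega = u$ with the door starting at rest at angle 0. These formulas, the helper names `integrate` and `solve_door`, and the use of Simpson's rule are illustrative assumptions, not part of the notes.

```python
import math

def integrate(f, a=0.0, b=1.0, n=2000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# Assumed constraint vectors (derived by hand, not quoted from the notes).
y1 = lambda t: math.exp(t - 1.0)        # omega(1) = (y1|u)
y2 = lambda t: 1.0 - math.exp(t - 1.0)  # theta(1) = (y2|u)

# The "simpler basis" for the span of y1 and y2: the functions 1 and e^t.
basis = [lambda t: 1.0, lambda t: math.exp(t)]

def solve_door():
    """Solve (y_i | beta1 + beta2 e^t) = c_i for c = (0, 1) by Cramer's rule."""
    A = [[integrate(lambda t, y=y, b=b: y(t) * b(t)) for b in basis]
         for y in (y1, y2)]
    c = (0.0, 1.0)
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    beta1 = (c[0] * A[1][1] - A[0][1] * c[1]) / det
    beta2 = (A[0][0] * c[1] - A[1][0] * c[0]) / det
    return beta1, beta2
```

Calling `solve_door()` returns the pair of coefficients, which can be compared against a hand calculation of the two linear equations.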
10. Minimum distance to a convex set
Replace the linear variety $V$ by a closed convex subset $K$ of a Hilbert space $H$, and let $x$ be an arbitrary vector outside of $K$.
We can prove (Theorem 1 of section 3.12) that there exists a unique vector $k_0 \in K$ of minimum distance from $x$. Orthogonality is too much to ask for, but we can also prove that
$$(x - k_0 | k - k_0) \le 0 \text{ for all } k \in K.$$
Draw a diagram, where $H$ is two-dimensional, to show how $(x - k_0 | k - k_0) = 0$ can fail to be achieved.
Existence proof: this is almost the same as in the case where we have a subspace $M$ instead of a convex set $K$. We use the parallelogram-law trick. Let
$$\delta = \inf_{k \in K} \|x - k\|$$
and construct a sequence $\{k_i\}$ such that $\|x - k_i\|$ converges to $\delta$ and so is Cauchy. Show that $\{k_i\}$ is Cauchy. Why is $\{k_i\}$ also convergent? What does this prove?
To prove uniqueness, assume that $\|x - k_0\| = \|x - k_1\| = \delta$. Luenberger just does a special case of the argument above, but for a little more fun, just repeat the parallelogram-law argument.
Now let's assume that $k_0$ has the property that
$$(x - k_0 | k - k_0) \le 0 \text{ for all } k \in K.$$
It is then easy to prove that $k_0$ is the vector that minimizes the distance $\|x - k\|$. Just expand $\|x - k\|^2 = \|x - k_0 + k_0 - k\|^2$ in terms of dot products. This proof doesn't even use the convexity of $K$.
11. A proof that requires convexity
Suppose that $k_0$ is the unique minimizing vector. The task is to show that $(x - k_0 | k_1 - k_0) \le 0$ for all $k_1 \in K$.
Write a formula for the point $k_\alpha$ that interpolates between $k_0$ and $k_1$, and draw a diagram to show why this point is in $K$ for $0 \le \alpha \le 1$. Since $\alpha = 0$ minimizes $\|x - k_\alpha\|$, the quantity $\|x - k_\alpha\|^2$ is a nondecreasing function of $\alpha$. Its derivative cannot be negative, even for $\alpha = 0$. Use this fact to complete the proof.

12. Example where the convex set is a cone
This is a three-dimensional version of Luenberger's example 1 on page 71. You are piloting a helicopter, and your mission is to deliver a sports reporter to the playing field of the world outdoor curling championships, which are being held on a large frozen lake in Saskatchewan. At noon, two snowmobiles set out from the origin along the ice, moving with velocity $y_1$ and $y_2$ respectively. The convex cone $\alpha_1 y_1 + \alpha_2 y_2$ (with $\alpha_1, \alpha_2 \ge 0$) between their tracks becomes the playing field.
Sketch the playing field, and describe the four different types of location for the position $\hat x = \alpha_1 y_1 + \alpha_2 y_2$ on the playing field that is closest to the position $x$ of your helicopter. We know that for all $k$ on the playing field, $(x - \hat x | k - \hat x) \le 0$. Whatever the solution, you can always move to a new point in the cone along a snowmobile velocity vector. What does this say about $k - \hat x = y_i$? If $\alpha_i > 0$, you can also move to a new point in the cone backwards along a snowmobile velocity vector. What does this say about $k - \hat x = -y_i$?
13. Curling concluded
We now have Luenberger's general solution to this problem: $\hat x = \alpha_1 y_1 + \alpha_2 y_2$ with $\alpha_i \ge 0$, where for all $i$
$$(x - \hat x | y_i) \le 0, \text{ with equality if } \alpha_i > 0.$$
Equivalently,
$$(x - \hat x | y_i) = -z_i, \text{ with } z_i \ge 0 \text{ and } \alpha_i z_i = 0.$$
Show how to recast this equation, for the two-dimensional case, in terms of the Gram matrix $G$. Alas, there are four unknowns even in this simple case! Give an interpretation of $z$ for each of the four types of solution to the curling-competition problem.
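The four types of solution can be explored by brute force. The sketch below tries each of the four possibilities (both coefficients positive, only one positive, or neither) and keeps the one that satisfies $\alpha_i \ge 0$, $z_i \ge 0$ and $\alpha_i z_i = 0$, where $z = G\alpha - b$ and $b_i = (x|y_i)$. It assumes $y_1$ and $y_2$ are linearly independent; the function names are hypothetical and this enumeration is only an illustration, not Luenberger's procedure.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_onto_cone(x, y1, y2, tol=1e-9):
    """Return (alpha, z) for the point alpha1*y1 + alpha2*y2 nearest to x."""
    g11, g12, g22 = dot(y1, y1), dot(y1, y2), dot(y2, y2)  # Gram matrix entries
    b1, b2 = dot(x, y1), dot(x, y2)
    det = g11 * g22 - g12 * g12  # nonzero when y1, y2 are independent
    candidates = [
        ((b1 * g22 - b2 * g12) / det, (g11 * b2 - g12 * b1) / det),  # interior of the cone
        (b1 / g11, 0.0),   # on the track of snowmobile 1
        (0.0, b2 / g22),   # on the track of snowmobile 2
        (0.0, 0.0),        # at the origin
    ]
    for a1, a2 in candidates:
        z1 = g11 * a1 + g12 * a2 - b1  # z = G alpha - b
        z2 = g12 * a1 + g22 * a2 - b2
        feasible = min(a1, a2, z1, z2) >= -tol
        complementary = abs(a1 * z1) <= tol and abs(a2 * z2) <= tol
        if feasible and complementary:
            return (a1, a2), (z1, z2)
    raise ValueError("no candidate satisfied the conditions; check independence of y1, y2")
```

With $y_1 = (1,0,0)$ and $y_2 = (0,1,0)$, a helicopter above the interior of the field gives $z = 0$, while one hovering outside the field gives a positive $z_i$ for each snowmobile whose coefficient is zero.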
CHAPTER 5
DUAL SPACES AND THE HAHN-BANACH THEOREM
Reading: [1, Sections 5.1-5.4, 5.12]

5.1 Linear Functionals
Definition 5.1.1 (functional). A transformation or function $f : X \to Y$ is a functional if $Y = \mathbb{R}$.
Definition 5.1.2 (linear functional). A linear functional is a functional $f : X \to \mathbb{R}$ such that $f(ax + by) = af(x) + bf(y)$ for all $x, y \in X$ and $a, b \in \mathbb{R}$.
In a Hilbert space $H$, any vector $y$ determines a linear functional by the rule $f_y(x) = (y, x)$.
Example 5.1.3. What linear functionals are determined by
$y_1 = (1, 1, 1)$ on $E^3$?
$y_2 = (1, \frac{1}{2}, \frac{1}{3}, \dots)$ on $l_2$?
$y_3(t) = e^{t-1}$ on $L_2[0,1]$?
Solution: We have
$$f_{y_1}(x) = (y_1, x) = x_1 + x_2 + x_3,$$
$$f_{y_2}(x) = (y_2, x) = \sum_{i=1}^{\infty} \frac{1}{i}\,\xi_i \quad \text{for } x = \{\xi_i\},$$
$$f_{y_3}(x) = (y_3, x) = \int_0^1 e^{t-1} x(t)\,dt.$$
Recall that for the case $p = 2$, $q = 2$, the Hölder inequality (Quiz Theorem 5) reduced to the Cauchy-Schwarz inequality (Propositions 3.1.2, 4.1.2), which in the inner product form was $(x, y)^2 \le (x, x)(y, y)$. The following example will motivate some of the calculations in the duality proofs in this chapter.
Example 5.1.4. If $x = \{\xi_i\} \in l_{4/3}$, does $\sum_{i=1}^{\infty} i^{-1/2}\xi_i$ converge?
Solution: Yes. By the Hölder inequality with $p = 4/3$ and $q = 4$,
$$\left|\sum_{i=1}^{\infty} i^{-1/2}\xi_i\right| \le \sum_{i=1}^{\infty} i^{-1/2}|\xi_i| \le \left(\sum_{i=1}^{\infty} \left(i^{-1/2}\right)^4\right)^{1/4} \|x\|_{4/3} = \left(\frac{\pi^2}{6}\right)^{1/4} \|x\|_{4/3} < \infty.$$
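The bound in Example 5.1.4 can be sanity-checked on truncated sequences. The sketch below compares the two sides of the Hölder inequality for a finite list of components; the helper name `holder_check` and the sample sequence in the usage note are illustrative assumptions.

```python
import math

def holder_check(xi, p=4.0 / 3.0):
    """Return (lhs, rhs, weight_norm) for the weights i^(-1/2), i = 1..len(xi)."""
    q = p / (p - 1.0)  # conjugate exponent; equals 4 (up to rounding) when p = 4/3
    n = len(xi)
    lhs = abs(sum(xi[i] / math.sqrt(i + 1) for i in range(n)))
    # l_q norm of the truncated weight sequence {i^(-1/2)}
    weight_norm = sum((i + 1) ** (-q / 2.0) for i in range(n)) ** (1.0 / q)
    # l_p norm of the truncated sequence x
    x_norm = sum(abs(t) ** p for t in xi) ** (1.0 / p)
    return lhs, weight_norm * x_norm, weight_norm
```

For instance, with `xi = [(-1) ** i / (i + 1) for i in range(1000)]` the first returned value stays below the second, and the third stays below $(\pi^2/6)^{1/4}$, as the example predicts.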
Definition 5.1.5 (norm of a linear functional). A linear functional $f$ on a normed space $X$ is bounded if there is a constant $M$ such that $|f(x)| \le M\|x\|$ for all $x \in X$. The smallest such constant $M$ is the norm of $f$, denoted $\|f\|$; that is, $\|f\| = \inf\{M : |f(x)| \le M\|x\| \text{ for all } x \in X\}$.
QUIZ THEOREM 11 ([1], Proposition 2, p. 104). A linear functional on a normed space is continuous if and only if it is bounded.
Proof: First we show that if a linear functional $f$ on a normed space $X$ is continuous at a single point, then it is continuous throughout $X$. Assume $f$ is continuous at $x_0 \in X$, and let $\{x_n\} \to x$ be a sequence in $X$. By the linearity of $f$,
$$|f(x_n) - f(x)| = |f(x_n - x + x_0) - f(x_0)|.$$
Since $x_n - x + x_0 \to x_0$ and since $f$ is continuous at $x_0$, $f(x_n - x + x_0) \to f(x_0)$. Thus, $|f(x_n) - f(x)| \to 0$, so $f$ is continuous at $x$. Since this holds for any $x \in X$, $f$ is continuous throughout $X$.
Suppose $\|f\| = M < \infty$. Then if $x_n \to 0$, $|f(x_n)| \le M\|x_n\| \to 0$, so $f$ is continuous at $x = 0$. By the above result, $f$ is continuous everywhere in $X$.
Now suppose that $f$ is continuous anywhere in $X$, specifically at $x = 0$. Then there exists a $\delta > 0$ such that $|f(x)| \le 1$ for $\|x\| \le \delta$. For any $x \ne 0$ in $X$, $\delta x/\|x\|$ has norm $\delta$, so by using the linearity of $f$,
$$|f(x)| = \left|f\!\left(\frac{\delta x}{\|x\|}\right)\right| \frac{\|x\|}{\delta} \le \frac{\|x\|}{\delta},$$
so $M = 1/\delta$ is a bound for $f$. ∎
Note that a linear functional on a finite-dimensional normed space must be bounded and is therefore continuous. To see why the extension to infinite-dimensional vector spaces poses a problem, consider the functional
$$f(x) = \sum_{k=1}^{\infty} k\,\xi_k$$
on the subspace of $l_2$ consisting of sequences $x = \{\xi_k\}$ with only finitely many nonzero components. This functional is discontinuous at $x = 0$: if $x_n$ is the sequence whose only nonzero component is $\xi_n = 1/n$, then $\|x_n\| = 1/n \to 0$ as $n \to \infty$, but $f(x_n) = 1$ for every $n$, so $f(x_n)$ does not converge to $f(0) = 0$.
The linear functionals themselves may be regarded as elements of a vector space:
Definition 5.1.6 (algebraic dual). Given a vector space $X$, not necessarily normed, the linear functionals on $X$ form a vector space defined as follows:
Given two linear functionals $f_1$ and $f_2$ on $X$, we define their sum $f_1 + f_2$ as the linear functional on $X$ given by $(f_1 + f_2)(x) = f_1(x) + f_2(x)$.
Given a linear functional $f$ and a scalar $a$, we define the scalar product $af$ as the linear functional on $X$ given by $(af)(x) = af(x)$.
The null element in the space of linear functionals is the functional $f(x) = 0$ for all $x \in X$.
This vector space is the algebraic dual of $X$.
However, not all linear functionals in the algebraic dual will be bounded, so we restrict our attention to the subspace of the algebraic dual consisting of all bounded, or equivalently continuous, functionals on a normed space $X$. The norm of a functional $f$ in this subspace is still given by Definition 5.1.5. In fact, we can write the norm of a functional in several ways:
$$\|f\| = \inf\{M : |f(x)| \le M\|x\| \text{ for all } x \in X\} = \sup_{x \ne 0} \frac{|f(x)|}{\|x\|} = \sup_{\|x\| \le 1} |f(x)| = \sup_{\|x\| = 1} |f(x)|.$$
This leads to the following definition:
Definition 5.1.7 (normed dual). Let $X$ be a normed linear vector space. The space of all bounded linear functionals on $X$ is the normed dual of $X$, denoted $X^*$. The norm of an element $f \in X^*$ is
$$\|f\| = \sup_{\|x\| \le 1} |f(x)|.$$
If we explicitly treat a functional $f$ as an element of the dual space, we can write $f = x^*$. Then $f(x) = x^*(x) = \langle x, x^* \rangle$, where the symmetric notation $\langle \cdot, \cdot \rangle$ suggests that we are generalizing the inner product $(\cdot, \cdot)$ from Hilbert space.
5.2 Common Dual Spaces
THEOREM 5.2.1 ([1, Theorem 5.2.1, p. 106]). $X^*$ is a Banach space.
Proof: By definition of the algebraic dual (Definition 5.1.6), $X^*$ already satisfies the axioms of a vector space. Therefore, we simply need to show that $X^*$ is complete. Let $\{x_n^*\}$ be a Cauchy sequence in $X^*$, so $\|x_n^* - x_m^*\| \to 0$ as $n, m \to \infty$. Then for any $x \in X$, $\{x_n^*(x)\}$ is a Cauchy sequence since $|x_n^*(x) - x_m^*(x)| \le \|x_n^* - x_m^*\|\,\|x\|$. Therefore, since $\mathbb{R}$ is complete, there exists a scalar $x^*(x)$ such that $x_n^*(x) \to x^*(x)$. We must check that $x^* \in X^*$; that is, the functional $x^*$ is linear and bounded. $x^*$ is linear since
$$x^*(ax + by) = \lim_n x_n^*(ax + by) = a \lim_n x_n^*(x) + b \lim_n x_n^*(y) = a x^*(x) + b x^*(y).$$
Furthermore, since $\{x_n^*\}$ is a Cauchy sequence, for all $\epsilon > 0$ there exists an $M$ such that $|x_n^*(x) - x_m^*(x)| < \epsilon\|x\|$ for $n, m > M$ and all $x$. Since $x_n^*(x) \to x^*(x)$, $|x^*(x) - x_m^*(x)| \le \epsilon\|x\|$ for $m > M$. Thus,
$$|x^*(x)| = |x^*(x) - x_m^*(x) + x_m^*(x)| \le |x^*(x) - x_m^*(x)| + |x_m^*(x)| \le (\epsilon + \|x_m^*\|)\|x\|,$$
so $x^*$ is bounded. The same inequality gives $\|x^* - x_m^*\| \le \epsilon$ for $m > M$, so $x_m^* \to x^*$ in the norm of $X^*$. ∎
Guided by this theorem, we derive representations of the duals of some common normed spaces. These representations will allow us to apply the theoretical framework of dual spaces to practical problems. To find the dual of a normed space $X$, we start with a space whose elements are bounded linear functionals $f$ on $X$. We can calculate $\|f\|$ from Definition 5.1.5. The challenge is to find a representation for each functional by identifying it with an element $x^*$ of a familiar normed space. For example, we can identify any linear functional on $E^2$ with a vector of coefficients $(a, b)$, combined with the evaluation formula $ax + by$ and the Euclidean norm $\|(a, b)\| = \sqrt{a^2 + b^2}$. By analogy with the inner product, we use $\langle x, x^* \rangle$ to denote application of the evaluation rule. Usually this is a "sum of products" or "integral of products" rule. The hard part is usually to show that the norm $\|f\|$ is equal to the norm $\|x^*\|$ of the representation, which was defined independently. The proofs in general have four steps:
1. evaluation: state the rule for using $x^*$ to evaluate the bounded linear functional $f$.
2. construction: state the rule for converting the bounded linear functional $f$ to $x^*$.
3. alignment: invent a vector $x \in X$ that is aligned with a given representation vector $x^*$ (written $y$ below). "Aligned" means that evaluating $f(x)$ is essentially the same calculation as evaluating $\|y\|\,\|x\|$. If $\|x\| = 1$, so much the better. Now we know that $\|f\| \ge \|y\|$.
4. The inequality step: Prove, usually by invoking something like the Hölder inequality, that $|f(x)| \le \|x\|\,\|y\|$. Applied to vectors with $\|x\| = 1$, this guarantees that $\|f\| \le \|y\|$, so we conclude $\|f\| = \|x^*\|$.

1. The dual of $E^n$ is $E^n$
This is easy, but it is a good introduction to what needs to be proved.
Evaluation: show how to use a vector $y = (\eta_1, \eta_2, \dots, \eta_n)$ to define a bounded linear functional.
Construction: give the rule for converting a bounded linear functional $f$ to its representation $y$.
Alignment: Show that the choice $x = y$ guarantees that $\|f\| \ge \|y\|$.
Inequality: show that if $y$ represents $f$, then $\|f\| \le \|y\|$.
2. The normed dual of $c_n$ is a subspace of $l_1$.
The notation $c_n$ is not standard. It means $\mathbb{R}^n$ with the norm that is used in $l_\infty$. There are several spaces that use this "maximum absolute value" norm.
$l_\infty$ includes all bounded infinite sequences.
$c$ is the subspace of sequences that converge to some limit.
$c_0$ is the subspace of sequences that converge to the limit 0.
$c_n$ is the subspace of sequences with $n$ components.
Evaluation: show how to use a vector $y = (\eta_1, \eta_2, \dots, \eta_n) \in l_1$ to define a bounded linear functional.
Construction: give the rule for converting a bounded linear functional $f$ to its representation $y$.
Alignment: Choose $x \in c_n$ with $\xi_i = \operatorname{sgn} \eta_i$. Show that this choice makes $|f(x)| = \|x\|_\infty \|y\|_1$. What goes wrong if we try to do this for $x \in l_\infty$? Show that $\|f\| \ge \|y\|_1$.
Inequality: show that if $y$ represents $f$, then $f(x) = \langle x, y \rangle \le \|x\|_\infty \|y\|_1$, so that $\|f\| \le \|y\|_1$.
Conclusion: $\|f\| = \|y\|_1$.

3. The normed dual of $l_p$ is $l_q$ (Part of this is proof 2.5)
As you would guess from the Hölder inequality, $\frac{1}{p} + \frac{1}{q} = 1$. First we do the proof for $1 < p < \infty$.
Evaluation: If $y = \{\eta_i\} \in l_q$ represents $f$ and $x = \{\xi_i\} \in l_p$, what is $f(x)$? How do we know that $f$ is bounded?
Construction: Given that $f$ is linear and bounded, define $y$ by $\eta_i = f(e_i)$. Show that if a vector $x_N$ has only its first $N$ components nonzero, then $f(x_N) = \langle x_N, y \rangle$. On what basis can we extend this result to any $x \in l_p$?
Alignment: Fix $N$ and construct the vector $x_N$ whose first $N$ components are given by $\xi_i = |\eta_i|^{q/p} \operatorname{sgn} \eta_i$ and whose other components are all 0. Since there are only a finite number of nonzero components, we need not worry about convergence. Calculate $\|x_N\|$ and $|f(x_N)|$. By the definition of the norm, $|f(x_N)| \le \|f\|\,\|x_N\|$. Prove that $\|f\| \ge \|y\|_q$. Given that $f$ is bounded, what can we conclude about $y$?
Inequality: Start with the vector $y = \{\eta_i\} \in l_q$. What linear functional $f$ does this define? Use the Hölder inequality to show that $|f(x)| \le \|x\|_p \|y\|_q$. Why can we conclude that $\|f\| \le \|y\|_q$?
Conclusion: $\|f\| = \|y\|_q$.

4. The special case $p = 1$
Evaluation: same as for $p > 1$.
Construction: same as for $p > 1$.
Alignment: Choose a vector $x_N$ with at most one nonzero entry, $\xi_N = \operatorname{sgn} \eta_N$. What is the norm $\|x_N\|$? Show that $\|f\| \ge \|y\|_\infty$. Given that $f$ is bounded, what can we conclude about $y$?
Inequality: Start with $y = \{\eta_i\} \in l_\infty$, define the obvious linear functional $f$, and use the Hölder inequality to prove that $\|f\| \le \|y\|_\infty$. You can then conclude that $\|f\| = \|y\|_\infty$.
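The alignment step for $l_p$ with $1 < p < \infty$ can be tried out on a finite vector. The sketch below builds $\xi_i = |\eta_i|^{q/p} \operatorname{sgn} \eta_i$ and lets one compare $f(x) = \sum \xi_i \eta_i$ with $\|x\|_p \|y\|_q$; the function names are illustrative assumptions rather than anything defined in the notes.

```python
import math

def norm(v, p):
    """The l_p norm of a finite vector."""
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def aligned_vector(y, p):
    """Components xi_i = |eta_i|^(q/p) * sgn(eta_i), with 1/p + 1/q = 1."""
    q = p / (p - 1.0)
    return [math.copysign(abs(eta) ** (q / p), eta) for eta in y]

def apply_functional(y, x):
    """Evaluate the functional represented by y: the sum of products."""
    return sum(eta * xi for eta, xi in zip(y, x))
```

For the aligned vector the Hölder inequality holds with equality, which is the content of $\|f\| \ge \|y\|_q$; for any other $x$ the value $|f(x)|$ can only be smaller than $\|x\|_p \|y\|_q$.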
5. The special case $p = \infty$
The obvious guess, that the dual space is $l_1$, is wrong. The problem is related to the fact that $l_\infty$ is not separable. It is OK for a separable space ($l_1$) to have a nonseparable dual, but not the other way around.
If you take the subspace $c_0 \subset l_\infty$ of infinite sequences that converge to zero, which is separable, then the dual space is $l_1$. The proof is left to the homework.
Consider the subspace $c \subset l_\infty$ of infinite sequences that converge to some limit. This space is also separable, and its dual space is $l_1$. The proof is left to the homework. The trick is to notice that once you evaluate $f$ on the constant sequence whose every entry equals that limit, you can use linearity to reduce this problem to the previous one.
Here is the problem with $l_\infty$. One element is the sequence $(0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, \dots)$ where $\xi_i = 1$ if $i$ is prime, 0 otherwise. Another is the sequence $(1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, \dots)$. Since $f$ is bounded, it has a finite value when evaluated on any such sequence. There are uncountably many such sequences. If the dual space of $l_\infty$ were $l_1$, it would be possible to identify a countable subset $S$ of these sequences, evaluate $f$ on that subset, and then use those values to evaluate $f$ on any sequence $x$. But $f$ is bounded, hence continuous. To find a $y \in S$ such that $|f(x) - f(y)| < \epsilon$, we need to make $\|x - y\| < \delta$. If that could be done, then $S$ would be dense, and $l_\infty$ would be separable, which we know is not the case.
6. Dual of Hilbert space
The obvious guess, that any Hilbert space $H$ is its own dual, is correct. This is theorem 2 on page 109.
Evaluation: any element $y \in H$ defines a bounded linear functional by the rule $f(x) = (x|y)$. How do we know that this functional is bounded?
Construction: First assume a bounded linear functional $f$. We want to find a vector $y$ such that $f(x) = (x|y)$. Consider the set $N$ of all vectors $n$ such that $f(n) = 0$. Show that this set is a closed subspace. If $N = H$, then $f$ is the zero functional and we take $y = \theta$ (the zero vector). Otherwise, write $H = N \oplus N^\perp$. Choose a vector $z \in N^\perp$ with $f(z) = 1$. Consider any $x \in H$. Show that $x - f(x)z \in N$. Use the fact that $(x - f(x)z | z) = 0$ to construct $y$. Show that $y$ is unique.
Alignment: choose $x = y/\|y\|$. Show that $\|f\| \ge \|y\|$.
Inequality: use Cauchy-Schwarz to show that $\|f\| \le \|y\|$. We conclude, as usual, that $\|f\| = \|y\|$.

7. Dual of $L_p[0,1]$ for $p > 1$
The obvious guess, that the dual space is $L_q[0,1]$, is correct. What is not entirely obvious is how to construct the representation $y(t)$ from the functional. Luenberger just mentions "arguments similar to those in Theorem 1."
Evaluation: given $y(t) \in L_q[0,1]$ and $x(t) \in L_p[0,1]$, define a functional by
$$f(x) = \int_0^1 x(t)y(t)\,dt.$$
How do we know that it is bounded?
Construction: Define the function $u_s(t) = 1$ for $0 \le t \le s$, 0 otherwise. For future use, evaluate the norm of $v(t) = u_{s+h}(t) - u_s(t)$. Given a bounded linear functional $f$, define $F(s) = f(u_s)$. Prove that $|F(s+h) - F(s)| \le M h^{1/p}$ for some $M$.
We now know that $F(s)$ is continuous. We cannot prove that it is differentiable. However, we can prove that it is absolutely continuous. There are theorems, beyond the scope of our grasp of Lebesgue integration, that say that an absolutely continuous function is differentiable except on a set of measure zero. Now, wherever $F$ is differentiable, take $y(t) = F'(t)$ as the function that represents $f$. The values of $y$ on the set of measure zero where $F$ is not differentiable are irrelevant to the value of the Lebesgue integral $f(x) = \int_0^1 x(t)y(t)\,dt$. So we have identified the bounded functional $f$ with an equivalence class of functions.
Alignment: choose $x(t) = |y(t)|^{q/p} \operatorname{sgn} y(t)$. Calculate $f(x)$ and $\|x\|_p$ by evaluating integrals. Conclude that $\|f\| \ge \|y\|_q$. Since $f$ is bounded, this will establish that $y(t) \in L_q[0,1]$.
Inequality: use Hölder to show that, if $y(t)$ represents $f$, then $f(x) \le \|x\|_p \|y\|_q$. How does this establish that $\|f\| \le \|y\|_q$?
We are done: we can convert between $f$ and $y$, and we have proved that $\|f\| = \|y\|_q$.

8. Minkowski functionals
This topic comes much later in the chapter, but it provides a geometric motivation for the sublinear functionals that appear in the Hahn-Banach theorem. Consider a convex set $K$ in a normed space $X$ that has the origin in its interior. You can think of the boundary of $K$ as the set of all points that can be reached from the origin for unit taxicab fare. Then the Minkowski functional $p(x)$ is the cab fare from the origin to $x$ or, equivalently, the factor by which $x$ must be scaled down to end up on the boundary of $K$. More formally,
$$p(x) = \inf\{r : x/r \in K,\ r > 0\}.$$
Draw a diagram showing curves where $p(x) = 1, 2, \frac{1}{2}$.
A Minkowski functional is sublinear: it has some but not all properties associated with norms. The first two are the defining properties of a sublinear functional.
$p(\alpha x) = \alpha p(x)$ for $\alpha \ge 0$. Prove this from the definition.
$p(x_1 + x_2) \le p(x_1) + p(x_2)$. Use the convexity of $K$ to prove this. (This is proof 2.6)
$p(x) \ge 0$, and $p(x)$ is finite for all $x$. This additional property gives us a convex functional. It is obvious from the definition.
$p(x)$ is continuous. Prove this first at $\theta$, then use sublinearity to extend the proof to an arbitrary $x$. We will consider only continuous sublinear functionals.
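The "cab fare" definition translates directly into a bisection search. The sketch below computes $p(x) = \inf\{r > 0 : x/r \in K\}$ for a convex set supplied as a membership test; it assumes $K$ contains the origin in its interior, and the name `minkowski` and the ellipse in the usage note are illustrative choices.

```python
def minkowski(x, in_K, iters=60):
    """Approximate p(x) = inf{r > 0 : x/r in K} by bisection.

    in_K is a membership test for a convex set K with the origin in its
    interior, so x/r lies in K for every r at or above the infimum.
    """
    if all(c == 0 for c in x):
        return 0.0
    hi = 1.0
    while not in_K([c / hi for c in x]):  # grow hi until x/hi lands in K
        hi *= 2.0
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if in_K([c / mid for c in x]):
            hi = mid   # x/mid is inside K, so the fare is at most mid
        else:
            lo = mid   # x/mid is outside K, so the fare exceeds mid
    return hi
```

For the ellipse `in_ellipse = lambda v: v[0] ** 2 / 4 + v[1] ** 2 <= 1`, the curves $p(x) = 1, 2, \frac{1}{2}$ are scaled copies of its boundary, and the two sublinearity properties can be checked on sample points.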
9. Statement of the Geometric Hahn-Banach theorem
This looks obvious in two or three dimensions, but we will need the more obscure "extension form" of the theorem in order to prove it. Let $K$ be a convex set with nonempty interior in a real normed vector space $X$. There is no loss of generality in assuming that $\theta$ is in the interior. Suppose that $V$ is a linear variety in $X$ containing no interior points of $K$. Note that $V$ can be as small as just a single point, and that it can include points in the boundary of $K$. Then there exists an element $x^* \in X^*$ and a constant $c$ such that
$\langle v, x^* \rangle = c$ for all $v \in V$ (this defines a hyperplane $H$ containing $V$)
$\langle k, x^* \rangle < c$ for all $k$ in the interior of $K$ (the hyperplane does not intersect the interior of $K$).
Draw diagrams to illustrate the following: If $K$ is not convex, the conclusion of the theorem may not hold. The hyperplane $H$ may or may not be unique. Also illustrate the feature that if $X$ is 3-dimensional, $V$ may have dimension 0, 1, or 2.

10. The theorem in two dimensions.
Suppose that $V$ is a zero-dimensional variety (just a single point $x$). Construct the subspace $M$ (a line through the origin) that contains $x$. It is easy to construct a linear function $f$ on $M$ such that $f(x) = 1$. Since $x$ is not in the interior of $K$, $p(x) \ge 1$. Draw a diagram to illustrate this setup.
Suppose $\alpha > 0$. Show that $f(\alpha x) \le p(\alpha x)$.
Now consider $\alpha < 0$. Show that $f(\alpha x) \le p(\alpha x)$.
At this point we need a way to extend the definition of $f$ to the entire space $X$ while preserving the property that $f(x) \le p(x)$. That is exactly what the extension form of the Hahn-Banach theorem will do for us.

11. Proof of the Hahn-Banach theorem (extension form) (proof 2.7)
On subspace $M$, we have a linear functional $f(m)$ and a continuous sublinear functional $p$ (defined on all of $X$) satisfying $f(m) \le p(m)$ for all $m$. Choose a vector $y$ that is not in $M$. Our immediate goal is to define a linear functional $g$ on the space $[M + y]$ such that $g(m) = f(m)$ on $M$ and $g(x) \le p(x)$ on this space that is one dimension larger.
Draw a diagram to illustrate the case where $X$ is the plane, $M$ is a line, and $p$ is the Minkowski functional of a convex set. Show that setting $g(y) = 0$ will not necessarily work, but that it appears possible to choose a vector $m_0 \in M$ such that setting $g(y + m_0) = 0$ will work. Show an example where only one value for $m_0$ is possible, and show other examples where many different choices will do the job.
Now consider arbitrary $m_1, m_2 \in M$. For fixed $y$, we require $g(m_1 + y) \le p(m_1 + y)$ and $g(m_2 - y) \le p(m_2 - y)$. Exploit the linearity of $g$ to get a pair of inequalities involving the value $g(y)$ that we need to choose.
Starting with $f(m_1 + m_2) \le p(m_1 + m_2)$, show that there exists at least one acceptable value for $g(y)$. Luenberger, on page 111, calls this value $c$.
Using the inequality $g(m + y) \le p(m + y)$, which holds for all $m \in M$ given our clever choice of $g(y)$, show that $g(m + \alpha y) \le p(m + \alpha y)$ for all $\alpha > 0$.
Using the inequality $g(m - y) \le p(m - y)$, which also holds for all $m \in M$, show that $g(m + \alpha y) \le p(m + \alpha y)$ for all $\alpha < 0$. What about $\alpha = 0$?
We have now succeeded for one more dimension, since we have chosen $g(y)$ in such a way that $g(m + \alpha y) \le p(m + \alpha y)$ for all $\alpha$ and all $m$, and any vector in $[M + y]$ is of the form $m + \alpha y$.

12. Finishing the proof
At this point, if $X$ is finite dimensional, an extension to all of $X$ follows easily by induction, since given $f$ on a $k$-dimensional $M$, we have shown how to construct an extension $g$ on a $(k+1)$-dimensional $[M + y]$. However, we want to use this theorem in the infinite-dimensional case, so there is still work to be done. Being wimps who are not familiar with Zorn's lemma (Luenberger, p. 111), we assume that $X$ is separable. From the countable dense subset $x_1, x_2, \dots, x_n, \dots$ we extract a subset $y_1, y_2, \dots, y_n, \dots$ of linearly independent vectors, none of them in $M$. Now $M$ and these $y_i$ generate $S$, a dense subspace of $X$. One $y$ at a time, we extend functional $f$ to a $g$ that is defined on $S$. By construction, $g \le p$. Furthermore, $p$ is bounded and continuous. How do we know that $g$ is continuous?
Now consider an $x$ that is not in the dense subspace $S$. How do we define $F(x)$ on all of $X$ so that $F(x) \le p(x)$?
If you are keen on Zorn's lemma, see Bachman and Narici, Functional Analysis, pp. 179-180, for a proof that does not assume separability.

13. Back to the geometric version (proof 2.8)
Review the situation with the aid of a diagram. We have a convex set $K$ with $\theta$ in its interior. $V$ is a linear variety in $X$ that includes no interior points of $K$, although it may include boundary points. We let $M$ be the subspace of $X$ generated by $V$. Then there is a linear functional $f$, defined on $M$, which has the value 1 on the variety $V$. Since $V$ contains no interior points of $K$, the Minkowski functional of $K$, $p(x)$, satisfies $f(x) = 1 \le p(x)$ on $V$.
Prove that $f(m) \le p(m)$ for all $m \in M$.
Use Hahn-Banach to extend $f$ to an $F$ defined on all of $X$. How can we be sure that $F$ is continuous?
Show that the hyperplane $H = \{x : F(x) = 1\}$ is closed and contains no interior points of $K$. Restate the conclusion in terms of an element $x^* \in X^*$.

14. The norm as an all-purpose sublinear function - Corollary 1 on page 112.
Show that any norm $\|x\|$ is a sublinear function.
Given a bounded linear functional $f$ defined on subspace $M \subset X$, with norm $\|f\|_M$, prove that it can be extended to a bounded linear functional $F$, with the same norm, defined on all of $X$. The trick is to choose $p(x) = \|f\|_M \|x\|$.
A really simple example: In $E^2$ (Euclidean norm), let $M$ be the line $u = v$ and define $f(t, t) = 2t$ on $M$. What is the extension $F(u, v)$ to all of $E^2$?
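A proposed answer to this example can be tested numerically. The sketch below takes the candidate $F(u, v) = u + v$ (an assumption to be verified, not a formula stated in the notes), checks that it agrees with $f$ on the line $u = v$, and estimates its norm by sampling the Euclidean unit circle; the helper name `sup_on_unit_circle` is hypothetical.

```python
import math

def F(u, v):
    # Candidate extension of f(t, t) = 2t; assumed here for checking only.
    return u + v

def sup_on_unit_circle(func, samples=3600):
    """Estimate sup |func(u, v)| over the unit circle by uniform sampling."""
    best = 0.0
    for k in range(samples):
        angle = 2.0 * math.pi * k / samples
        best = max(best, abs(func(math.cos(angle), math.sin(angle))))
    return best

# Norm of f on M: |f(t, t)| / ||(t, t)|| = 2|t| / (sqrt(2) |t|) = sqrt(2).
norm_f_on_M = 2.0 / math.sqrt(2.0)
```

If the sampled supremum of `F` matches `norm_f_on_M`, the candidate is an extension with the same norm, which is what Corollary 1 promises exists.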
15. Achieving alignment
Show (Corollary 2) for any $x \in X$, how to construct a nonzero bounded linear functional $F$, defined on all of $X$, such that $|F(x)| = \|F\|\,\|x\|$.
In $\mathbb{R}^2$, choose the "Somali taxicab" norm $\|x\| = \|(u, v)\| = \frac{1}{2}|u| + |v|$. Show the set $K$ for which $\|x\| \le 1$.
Extension form (corollary 1): Define $f$ on the subspace generated by $x$ as $f(2t, t) = 2t$. Extend $f$ to a function $F(u, v)$ with the same norm.
Extension form (corollary 2): Find a linear functional, defined on all of $\mathbb{R}^2$, such that $|F(x)| = \|F\|\,\|x\|$.
Geometric form: As $V$ choose the point $x = (1, \frac{1}{2})$. Find a hyperplane $H$ that includes $V$ but that does not intersect the interior of $K$.
CHAPTER 6
APPLICATIONS OF THE HAHN-BANACH THEOREM
Reading: [1, Sections 5.5-5.9, 5.11-5.13]. Also see [7, Chapter 2] for more on the Riesz representation theorem.

6.1 The Dual of $C[a,b]$
Recall that $C[a,b]$ is the Banach space of real-valued (uniformly) continuous functions $x$ on $[a,b]$ with norm $\|x\| = \sup_{t \in [a,b]} |x(t)|$. The challenge is to find a way of representing all linear functionals on $C[a,b]$ in a unified manner without using Dirac delta functions. This can be done elegantly by using the Stieltjes integral.
Definition 6.1.1. Let $f$ and $g$ be real-valued, bounded functions on $[a,b]$ that do not have a common point of discontinuity. Take a partition of the interval $a = t_0 < \dots < t_n = b$. Then if
$$I = \lim_{\max(t_i - t_{i-1}) \to 0} \sum_{i=1}^{n} f(t_{i-1})\,(g(t_i) - g(t_{i-1}))$$
exists, $I$ is called the Stieltjes integral of $f$ with respect to $g$, denoted
$$\int_a^b f(t)\,dg(t).$$
Example 6.1.2. Suppose that you use a logarithmic $t$-axis in presenting a graph of function $f$: i.e. you graph $f(\log s)$ against $s$. Now the area under the graph no longer represents the integral. Show that the Stieltjes integral gives the right formula for approximating the integral $\int_a^b f(u)\,du$ and that this formula agrees with the change-of-variable formula.
Solution: With a partition $e^a = s_0 < \dots < s_n = e^b$, the Stieltjes integral with respect to $\log s$ is
$$\int_{e^a}^{e^b} f(\log s)\,d(\log s) = \lim_{\max(s_i - s_{i-1}) \to 0} \sum_{i=1}^{n} f(\log s_{i-1})\,(\log s_i - \log s_{i-1}) = \int_{e^a}^{e^b} f(\log s)\,\frac{1}{s}\,ds = \int_a^b f(u)\,du,$$
which agrees with the change-of-variable formula for $u = \log s$. ∎

Example 6.1.3. What is $\int_0^1 f(t)\,dg(t)$ if:
$g(t) = t$?
$g$ is differentiable?
$g(t) = 0$ for $t < 1/2$ and $g(t) = 1$ for $t \ge 1/2$?
$g(t) = t$ for $t < 1/2$ and $g(t) = t + 2$ for $t \ge 1/2$?
$g(t) = P(T \le t)$ for a random variable $T$ with support $[0,1]$?
Solution: Assuming $f$ is continuous, we have, in the same order,
$$\int_0^1 f(t)\,dg(t) = \int_0^1 f(t)\,dt,$$
$$\int_0^1 f(t)\,dg(t) = \int_0^1 f(t)g'(t)\,dt,$$
$$\int_0^1 f(t)\,dg(t) = f(1/2),$$
$$\int_0^1 f(t)\,dg(t) = \int_0^1 f(t)\,dt + 2f(1/2),$$
$$\int_0^1 f(t)\,dg(t) = E[f(T)]. \quad ∎$$
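The cases in Example 6.1.3 can be explored with the defining sum from Definition 6.1.1. The sketch below evaluates $\sum f(t_{i-1})(g(t_i) - g(t_{i-1}))$ on a uniform partition; the function name `stieltjes_sum` and the choice of a uniform partition are illustrative assumptions, and the result is only an approximation to the limit.

```python
def stieltjes_sum(f, g, a, b, n=1000):
    """Left-endpoint Stieltjes sum of f with respect to g on a uniform partition."""
    total = 0.0
    for i in range(1, n + 1):
        t_prev = a + (b - a) * (i - 1) / n
        t_curr = a + (b - a) * i / n
        total += f(t_prev) * (g(t_curr) - g(t_prev))
    return total
```

For example, with `f = lambda t: t * t` and the unit step `g = lambda t: 1.0 if t >= 0.5 else 0.0`, the sum picks up a single term near $t = 1/2$ and approaches $f(1/2)$ as the partition is refined.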
The Riesz representation theorem says that any bounded linear functional $f$ on $C[a,b]$ can be expressed as a Stieltjes integral
$$f(x) = \int_a^b x(t)\,dv(t).$$
However, there is one caveat: the function $v$ must be of bounded variation. Loosely speaking, $\mathrm{TV}(v) = \int_a^b |dv(t)|$ must be finite. For example, we could choose $x(t)$ to be the sign of the increment $dv(t)$, so that $f(x) = \mathrm{TV}(v)$. However, such an $x$ is clearly not in $C[a,b]$ since it is not continuous, so we must introduce the following normed linear space.
Definition 6.1.4. Let $B[a,b]$ be the space of bounded functions $x$ on $[a,b]$ with norm $\|x\| = \sup_{t \in [a,b]} |x(t)|$.
Note that $C[a,b]$ is a subspace of $B[a,b]$. Also recall that we defined the normed linear space $\mathrm{BV}[a,b]$ of functions on $[a,b]$ with norm $\mathrm{TV}(x)$ in Example 2.4.6. Now we can state the Riesz representation theorem.
THEOREM 6.1.5 (Riesz Representation Theorem ([1], Theorem 1, p. 113)). Let $f$ be a bounded linear functional on $C[a,b]$. Then there is a function $v \in \mathrm{BV}[a,b]$ such that for all $x \in C[a,b]$,
$$f(x) = \int_a^b x(t)\,dv(t)$$
and $\|f\| = \mathrm{TV}(v)$. Conversely, every function $v \in \mathrm{BV}[a,b]$ defines a bounded linear functional in this way.
Proof: Since $C[a,b]$ is a subspace of $B[a,b]$, if $f$ is a bounded linear functional on $C[a,b]$ then, by the Hahn-Banach theorem, there exists a linear functional $F$ on $B[a,b]$ which is an extension of $f$ and has the same norm. Define the set of functions $\{u_s : s \in [a,b]\}$, where for all $t \in [a,b]$, $u_s(t) = 0$ if $s = a$, and $u_s(t) = 1$ for $t \in [a,s]$ and 0 otherwise if $s \in (a,b]$. Clearly, each $u_s$ is in $B[a,b]$. Also define $v(s) = F(u_s)$, and let $a = t_0 < \dots < t_n = b$ be a finite partition of $[a,b]$.
First we show that $\mathrm{TV}(v) \le \|f\|$. Define $\epsilon_i = \operatorname{sgn}(v(t_i) - v(t_{i-1}))$ for $i = 1, \dots, n$. Then
$$\sum_{i=1}^{n} |v(t_i) - v(t_{i-1})| = \sum_{i=1}^{n} \epsilon_i (v(t_i) - v(t_{i-1})) = \sum_{i=1}^{n} \epsilon_i (F(u_{t_i}) - F(u_{t_{i-1}})) = F\!\left(\sum_{i=1}^{n} \epsilon_i (u_{t_i} - u_{t_{i-1}})\right),$$
where the last step follows from the linearity of $F$. By the Hahn-Banach theorem, $\|F\| = \|f\|$. By definition, at each point $t$ the difference $u_{t_i}(t) - u_{t_{i-1}}(t)$ equals 1 for at most one $i \in \{1, \dots, n\}$ and is 0 for the others, so
$$\left\|\sum_{i=1}^{n} \epsilon_i (u_{t_i} - u_{t_{i-1}})\right\| \le 1.$$
Therefore, combining these results with Corollary ??,
$$\sum_{i=1}^{n} |v(t_i) - v(t_{i-1})| \le \|F\| \left\|\sum_{i=1}^{n} \epsilon_i (u_{t_i} - u_{t_{i-1}})\right\| \le \|f\|.$$
Then since $\mathrm{TV}(v)$ is the supremum of $\sum_{i=1}^{n} |v(t_i) - v(t_{i-1})|$ over all partitions of $[a,b]$ and the above inequality holds for all partitions, $\mathrm{TV}(v) \le \|f\|$.
Now we show that $\|f\| \le \mathrm{TV}(v)$. For any $x \in C[a,b]$, define
$$z(\tau) = \sum_{i=1}^{n} x(t_{i-1})\,(u_{t_i}(\tau) - u_{t_{i-1}}(\tau)).$$
Then, again since $u_{t_i}(\tau) - u_{t_{i-1}}(\tau) = 1$ for at most one $i \in \{1, \dots, n\}$,
$$\|z - x\| = \max_i \max_{\tau \in [t_{i-1}, t_i]} |x(t_{i-1}) - x(\tau)|.$$
Since $x$ is uniformly continuous, this quantity goes to zero as the partition is made arbitrarily fine. Then since $F$ is continuous, $F(z) \to F(x)$, which in turn equals $f(x)$ because $F$ is an extension of $f$. Finally,
$$F(z) = \sum_{i=1}^{n} x(t_{i-1})\,(v(t_i) - v(t_{i-1})) \to \int_a^b x(t)\,dv(t),$$
so combined with the previous result,
$$f(x) = \int_a^b x(t)\,dv(t).$$
Finally,
$$\left|\int_a^b x(t)\,dv(t)\right| \le \|x\|\,\mathrm{TV}(v);$$
dividing by $\|x\|$ gives $\|f\| \le \mathrm{TV}(v)$. Therefore, $\|f\| = \mathrm{TV}(v)$.
Conversely, if $v \in \mathrm{BV}[a,b]$, then
$$f(x) = \int_a^b x(t)\,dv(t)$$
is linear. Furthermore, $f$ is bounded since $|f(x)| \le \|x\|\,\mathrm{TV}(v)$. ∎
Note that the Riesz representation theorem does not claim that the function $v$ representing the functional $f$ is unique. For example, the functional $f(x) = x(1/2)$ could be represented by a function $v(t)$ that equals zero for $t \in [0, 1/2)$, any value $a \in [0,1]$ for $t = 1/2$, and 1 for $t \in (1/2, 1]$. In fact, we could even shift $v$ upward or downward by an arbitrary amount. Therefore, we introduce the following normed linear space, which we can use in applications.
Definition 6.1.6. Let $\mathrm{NBV}[a,b]$ be the space of functions $v$ on $[a,b]$ of bounded variation such that $v(a) = 0$ and $v$ is right-continuous on $(a,b)$. The norm is $\|v\| = \mathrm{TV}(v)$.
Note that $\mathrm{NBV}[a,b]$ is a subspace of $\mathrm{BV}[a,b]$.
6.2 The Second Dual Space
Let the functional $g(x)$ be represented by some element $x^* \in X^*$; we will use the notation $g(x) = \langle x, x^* \rangle$. Conversely, any $x \in X$ defines a functional on $X^*$ by $f(x^*) = \langle x, x^* \rangle$.
Example 6.2.1. Show that such an $f$ is linear and bounded.
Solution: $f$ is linear because
$$f(ax_1^* + bx_2^*) = \langle x, ax_1^* + bx_2^* \rangle = a\langle x, x_1^* \rangle + b\langle x, x_2^* \rangle = af(x_1^*) + bf(x_2^*).$$
Also, $|f(x^*)| = |\langle x, x^* \rangle| \le \|x\|\,\|x^*\|$, so $\|f\| \le \|x\|$. Furthermore, by Corollary ??, there exists a nonzero $x^* \in X^*$ such that $\langle x, x^* \rangle = \|x\|\,\|x^*\|$, so in fact $\|f\| = \|x\|$. ∎
Definition 6.2.2 (second dual). The space of all bounded linear functionals on $X^*$ is denoted $X^{**}$ and is called the second dual of $X$.
Definition 6.2.3 (natural mapping). The mapping $\varphi : X \to X^{**}$ defined by $\varphi(x) = x^{**}$ such that $\langle x^*, x^{**} \rangle = \langle x, x^* \rangle$ is called the natural mapping of $X$ into $X^{**}$; that is, $\varphi$ maps elements of $X$ into the functionals they generate on $X^*$.
Definition 6.2.4 (reflexivity). A normed linear space $X$ is said to be reflexive if $X = X^{**}$; that is, if the natural mapping is surjective.
Example 6.2.5. Give examples of spaces that are reflexive and not reflexive.
Solution: The $l_p$ spaces, with $1 < p < \infty$, are reflexive since $l_p^* = l_q$ with $1/p + 1/q = 1$, as shown in Quiz Theorem ??, so $l_p^{**} = l_q^* = l_p$. $l_1$ is not reflexive since the dual of $l_1$ is $l_\infty$ but the dual of $l_\infty$ is not $l_1$. By the Riesz-Fréchet theorem, any Hilbert space is reflexive. ∎
6.3 Alignment and Orthogonal Components
Recall that in a Hilbert space $H$, $(x, y) = \|x\|\,\|y\|$ for $x \ne 0$ if and only if $y = ax$ for some constant $a \ge 0$. We generalize this idea below.
Definition 6.3.1 (alignment). A vector $x^* \in X^*$ is said to be aligned with a vector $x \in X$ if $\langle x, x^* \rangle = \|x\|\,\|x^*\|$.
Example 6.3.2. Let $x = \{\xi_i\} \in l_p$ and $x^* = \{\eta_i\} \in l_q$, with $1 < p < \infty$ and $1/p + 1/q = 1$. Under what condition are $x$ and $x^*$ aligned?
Solution: The condition for alignment is simply the condition for equality in the Holder inequality:
|i|
|i|
}x}p }x}q
1 q
1 p
Example 6.3.3 ([1], Example 1, p. 117). Let $x \in L_p[a, b]$ and $x^* \in L_q[a, b]$, with $1 < p < \infty$ and $1/p + 1/q = 1$. Under what condition are $x$ and $x^*$ aligned?
Solution: As before, the condition for alignment is simply the condition for equality in the Hölder inequality: $x(t) = K (\operatorname{sgn} y(t)) |y(t)|^{q/p}$, where $y \in L_q[a, b]$ is the function representing $x^*$ and $K$ is a constant. $\square$
Example 6.3.4 ([1], Example 2, p. 117). Let $x \in C[a, b]$ and let $\Gamma = \{t \in [a, b] : |x(t)| = \|x\|\}$. Under what conditions is a bounded linear functional $x^*(x) = \int_a^b x(t)\, dv(t)$ aligned with $x(t)$?
Solution: We require that $v$ vary only on $\Gamma$ and that $v$ is nondecreasing at $t$ if $x(t) > 0$ and nonincreasing at $t$ if $x(t) < 0$. $\square$
Recall that in a Hilbert space $H$, we defined the orthogonal complement of a set $S$ as $S^\perp = \{y \in H : (s, y) = 0 \text{ for all } s \in S\}$. We generalize this idea below.
Definition 6.3.5 (orthogonal complement). Let $S$ be a subset of a normed linear space $X$. The orthogonal complement or annihilator space of $S$ is defined as $S^\perp = \{x^* \in X^* : \langle s, x^* \rangle = 0 \text{ for all } s \in S\}$.
Given a subset $U \subset X^*$, $U^\perp \subset X^{**}$. However, $U^\perp$ is not necessarily a subset of $X$ if $X$ is not reflexive, so we define the more useful concept below.
Definition 6.3.6 (orthogonal complement of $U$ in $X$). Let $U$ be a subset of the dual space $X^*$. The orthogonal complement of $U$ in $X$ is defined as ${}^\perp U = U^\perp \cap X$.
THEOREM 6.3.7 ([1], Theorem 1, p. 118). Let $M$ be a closed subspace of a normed space $X$. Then ${}^\perp[M^\perp] = M$.
Proof: Clearly, $M \subset {}^\perp[M^\perp]$, or equivalently, if $x \in M$, then $x \in {}^\perp[M^\perp]$. We show the converse: if $x \notin M$, then $x \notin {}^\perp[M^\perp]$. On the subspace $[x + M]$, define the linear functional $f(ax + m) = a$ for $m \in M$. Then
$$\|f\| = \sup_{m \in M} \frac{f(x + m)}{\|x + m\|} = \frac{1}{\inf_{m \in M} \|x + m\|}.$$
Since $x \notin M$ and $M$ is closed, $\inf_{m \in M} \|x + m\| > 0$, so $\|f\| < \infty$. Thus, by the Hahn-Banach theorem, we can extend $f$ to an $x^* \in X^*$. Since $f(m) = 0$ for $m \in M$, $x^* \in M^\perp$. But $\langle x, x^* \rangle = 1$, so $x \notin {}^\perp[M^\perp]$. Then since $M \subset {}^\perp[M^\perp]$ and $M^C \subset ({}^\perp[M^\perp])^C$, $M = {}^\perp[M^\perp]$. $\square$
6.4
Minimum Norm Problems
In general, the solutions to minimum norm problems are not unique.
Example 6.4.1 (equivalent to [1], Example 1, p. 119). Cab drivers are charging only for the larger dimension traveled in $\mathbb{R}^2$. Show that from a point $(u, v)$ with $v > 0$, there are many points $(u', 0)$ that can be reached for the same fare.
Solution: All the points $(u', 0)$ with $u' \in [u - v, u + v]$ can be reached for fare $v$. $\square$
Example 6.4.2. Now cab drivers are charging for the total distance driven. Show that from a point $(u, v)$ with $u \neq v$, there are many points $(u', u')$ that can be reached for the same fare.
Solution: Without loss of generality, assume $u < v$. Then all the points $(u', u')$ with $u' \in [u, v]$ can be reached for fare $v - u$. $\square$
The following theorem guarantees neither existence nor uniqueness of a minimizing $m$, but it is more general than the projection theorem in that it specifies a dual problem for any minimum norm problem.
THEOREM 6.4.3. Let $x$ be an element of a real normed linear space $X$ and let $M$ be a subspace of $X$. Then
$$d = \inf_{m \in M} \|x - m\| = \max_{\|x^*\| \leq 1,\ x^* \in M^\perp} \langle x, x^* \rangle,$$
where the maximum is achieved for some $x_0^* \in M^\perp$. If the infimum is achieved for some $m_0 \in M$, then $x_0^*$ is aligned with $x - m_0$.
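Proof: For $\epsilon > 0$, let $m_\epsilon \in M$ satisfy $\|x - m_\epsilon\| \leq d + \epsilon$. Then for any $x^* \in M^\perp$ such that $\|x^*\| \leq 1$,
$$\langle x, x^* \rangle = \langle x - m_\epsilon, x^* \rangle \leq \|x - m_\epsilon\| \|x^*\| \leq d + \epsilon.$$
Since $\epsilon$ was arbitrary, we conclude $\langle x, x^* \rangle \leq d$. To complete the proof of the first part of the theorem, we must find an $x_0^*$ such that $\langle x, x_0^* \rangle = d$. Let $N = [x + M]$. We can represent an element $n \in N$ uniquely as $n = ax + m$, where $m \in M$. Define the linear functional $f$ on $N$ by $f(n) = ad$. Then
$$\|f\| = \sup_{n \in N} \frac{|f(n)|}{\|n\|} = \sup_{n \in N} \frac{|a| d}{\|ax + m\|} = \sup_{n \in N} \frac{d}{\|x + m/a\|} = \frac{d}{\inf_{n \in N} \|x + m/a\|} = 1.$$
By the Hahn-Banach theorem, extend $f$ on $N$ to $x_0^*$ on $X$. Then $\|x_0^*\| = \|f\| = 1$ and $x_0^* = f$ on $N$, so by construction, $x_0^* \in M^\perp$ and $\langle x, x_0^* \rangle = f(x) = d$.
To prove the second part of the theorem, assume there exists an $m_0 \in M$ such that $\|x - m_0\| = d$. Let $x_0^*$ be any element such that $x_0^* \in M^\perp$, $\|x_0^*\| = 1$, and $\langle x, x_0^* \rangle = d$ (the $x_0^*$ constructed above is one possibility). Then
$$\langle x - m_0, x_0^* \rangle = \langle x, x_0^* \rangle = d = \|x - m_0\| \|x_0^*\|,$$
so $x_0^*$ is aligned with $x - m_0$. $\square$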
We can generalize the projection theorem as a corollary of Theorem 6.4.3.
COROLLARY 6.4.4. Let $x$ be an element of a real normed linear space $X$ and let $M$ be a subspace of $X$. A vector $m_0 \in M$ satisfies $\|x - m_0\| \leq \|x - m\|$ for all $m \in M$ if and only if there is a nonzero vector $x^* \in M^\perp$ aligned with $x - m_0$.
Proof: If $\|x - m_0\| \leq \|x - m\|$ for all $m \in M$, then we have achieved the infimum in Theorem 6.4.3, specifically at $m_0$, so the given $x_0^*$ is aligned with $x - m_0$. If there is a nonzero vector $x^* \in M^\perp$ aligned with $x - m_0$, take $\|x^*\| = 1$ without loss of generality. Then for all $m \in M$, $\langle x, x^* \rangle = \langle x - m, x^* \rangle \leq \|x - m\|$ while $\langle x, x^* \rangle = \langle x - m_0, x^* \rangle = \|x - m_0\|$. Therefore, $\|x - m_0\| \leq \|x - m\|$ for all $m \in M$. $\square$
Example 6.4.5. Let $X = \mathbb{R}^2$ with the $l_1$ norm, $M = \{(u, v) : u + 2v = 0\}$, and $x = (0, 3)$. Determine $d$, $m_0$, and $x_0^*$.
Solution: We have $M^\perp = [(1, 2)]$, so $K = M^\perp \cap \{x^* : \|x^*\|_\infty = 1\} = \{\pm(1/2, 1)\}$. Then $x_0^* = \arg\max_{x^* \in K} \langle x, x^* \rangle = (1/2, 1)$ and $d = \langle x, x_0^* \rangle = 3$. Then $x - m_0 = (0, 3)$ since $x - m_0$ and $x_0^*$ must be aligned, so $m_0 = (0, 0)$. $\square$
Example 6.4.6. Let $X = \mathbb{R}^2$ with the $l_\infty$ norm, $M = \{(u, v) : u + 2v = 0\}$, and $x = (0, 3)$. Determine $d$, $m_0$, and $x_0^*$. Note that the norm in $X^*$ is the $l_1$ norm since difficulties concerning the dual of $l_\infty$ occur only because of infinite dimensionality.
Solution: We have $M^\perp = [(1, 2)]$, so $K = M^\perp \cap \{x^* : \|x^*\|_1 = 1\} = \{\pm(1/3, 2/3)\}$. Then $x_0^* = \arg\max_{x^* \in K} \langle x, x^* \rangle = (1/3, 2/3)$ and $d = \langle x, x_0^* \rangle = 2$. Then $x - m_0 = (2, 2)$ since $x - m_0$ and $x_0^*$ must be aligned, so $m_0 = (-2, 1)$. $\square$
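The duality in Theorem 6.4.3 can be checked numerically in two dimensions. The sketch below is only an illustration under an assumed reading of Examples 6.4.5 and 6.4.6, namely $M = \{(u, v) : u + 2v = 0\}$ and $x = (0, 3)$; the helper names (`primal_distance`, `dual_value`) are hypothetical. It scans points of $M$ to approximate the infimum and compares it with the value of the dual maximum over $M^\perp$.

```python
# Brute-force check of the primal/dual equality of Theorem 6.4.3 in R^2.
# Assumption: M = {(u, v) : u + 2v = 0}, so M-perp is spanned by (1, 2).

def norm1(p):
    return abs(p[0]) + abs(p[1])

def norm_inf(p):
    return max(abs(p[0]), abs(p[1]))

def primal_distance(x, norm, steps=60001, span=6.0):
    """Approximate inf over m in M of ||x - m|| by scanning m = (-2s, s)."""
    best = float("inf")
    for k in range(steps):
        s = -span + 2.0 * span * k / (steps - 1)
        best = min(best, norm((x[0] + 2.0 * s, x[1] - s)))
    return best

def dual_value(x, dual_norm):
    """max of <x, x*> over x* in M-perp with dual norm at most 1."""
    scale = 1.0 / dual_norm((1.0, 2.0))
    return abs(scale * (x[0] * 1.0 + x[1] * 2.0))

x = (0.0, 3.0)
d_l1 = primal_distance(x, norm1)        # X carries the l1 norm
d_l1_dual = dual_value(x, norm_inf)     # so X* carries the l-infinity norm
d_linf = primal_distance(x, norm_inf)   # X carries the l-infinity norm
d_linf_dual = dual_value(x, norm1)      # so X* carries the l1 norm
print(d_l1, d_l1_dual, d_linf, d_linf_dual)
```

Under these assumptions the primal scan and the dual formula should agree in each norm, which is the content of the theorem; the scan is a crude grid search, not an exact minimizer.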
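Example 6.4.7. Let $X = \mathbb{R}^2$ with the $l_1$ norm, $M = \{(u, v) : 2u + v = 0\}$, and $x = (1, 2)$. Determine $d$, $m_0$, and $x_0^*$.
Solution: We have $M^\perp = [(2, 1)]$, so $K = M^\perp \cap \{x^* : \|x^*\|_\infty = 1\} = \{\pm(1, 1/2)\}$. Then $x_0^* = \arg\max_{x^* \in K} \langle x, x^* \rangle = (1, 1/2)$ and $d = \langle x, x_0^* \rangle = 2$. Then $x - m_0 = (2, 0)$ since $x - m_0$ and $x_0^*$ must be aligned, so $m_0 = (-1, 2)$. $\square$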
Theorem 6.4.3 simply proved the existence of a solution to the dual problem in $X^*$; the theorem below proves the existence of a solution to the minimum norm problem in $X^*$.
THEOREM 6.4.8. Let $M$ be a subspace in a real normed space $X$ and let $x^* \in X^*$. Then
$$d = \min_{m^* \in M^\perp} \|x^* - m^*\| = \sup_{x \in M,\ \|x\| \leq 1} \langle x, x^* \rangle,$$
where the minimum is achieved for some $m_0^* \in M^\perp$. If the supremum is achieved for some $x_0 \in M$, then $x^* - m_0^*$ is aligned with $x_0$.
Proof: Define
$$\|x^*\|_M = \sup_{x \in M,\ \|x\| \leq 1} \langle x, x^* \rangle,$$
since it is the norm of the functional $x^*$ restricted to the subspace $M$. Then for any $m^* \in M^\perp$,
$$\|x^* - m^*\| = \sup_{\|x\| \leq 1} \left( \langle x, x^* \rangle - \langle x, m^* \rangle \right) \geq \sup_{x \in M,\ \|x\| \leq 1} \left( \langle x, x^* \rangle - \langle x, m^* \rangle \right) = \sup_{x \in M,\ \|x\| \leq 1} \langle x, x^* \rangle = \|x^*\|_M.$$
Since $m^*$ was arbitrary, we conclude $\|x^* - m^*\| \geq \|x^*\|_M$. To complete the proof of the first part of the theorem, we must find an $m_0^*$ such that $\|x^* - m_0^*\| = \|x^*\|_M$. Consider $x^*$ restricted to $M$. The norm of $x^*$ so restricted is $\|x^*\|_M$. By the Hahn-Banach theorem, extend $x^*$ restricted on $M$ to $y^*$ on $X$. Then $\|y^*\| = \|x^*\|_M$ and $y^* = x^*$ on $M$, so $x^* - y^* = 0$ on $M$. Set $m_0^* = x^* - y^*$; then $m_0^* \in M^\perp$ and $\|x^* - m_0^*\| = \|y^*\| = \|x^*\|_M$.
To prove the second part of the theorem, assume there exists an $x_0 \in M$ such that $\langle x_0, x^* \rangle = d$. Since the supremum is achieved, clearly $\|x_0\| = 1$ and $\|x^* - m_0^*\| = \langle x_0, x^* \rangle = \langle x_0, x^* - m_0^* \rangle$, so $x_0$ is aligned with $x^* - m_0^*$. $\square$
Example 6.4.9. Let $X = \mathbb{R}^2$ with the $l_1$ norm, $M = \{(u, v) : 2u - v = 0\}$, and $x^* = (4, 1)$. Determine $d$, $m_0^*$, and $x_0$.
Solution: We have $M = [(1, 2)]$, so $K = M \cap \{x : \|x\|_1 = 1\} = \{\pm(1/3, 2/3)\}$. Then $x_0 = \arg\sup_{x \in K} \langle x, x^* \rangle = (1/3, 2/3)$ and $d = \langle x_0, x^* \rangle = 2$. Then $x^* - m_0^* = (2, 2)$ since $x^* - m_0^*$ and $x_0$ must be aligned, so $m_0^* = (2, -1)$. $\square$
We can illustrate the duality between Theorems 6.4.3 and 6.4.8 more explicitly as follows: given a normed linear space $X$ and a subspace $M$, we have two ways of making a new vector space:
• the quotient space $X/M$. An element of $X/M$ is a coset $[x]$. Recall that the norm of such a coset is $\|[x]\| = \inf_{m \in M} \|x - m\|$. Motivated by this definition, let $\|x\|_M = \|[x]\|$. This notation is reasonable if you think of it as answering the question: if I am allowed to add any element of $M$ to $x$, how small can I make the norm?
• the annihilator space $M^\perp$, a subspace of $X^*$. If we view $x$ as an element of $X^{**}$, it has norm
$$\|x\|_{M^\perp} = \sup_{x^* \in M^\perp,\ \|x^*\| \leq 1} \langle x, x^* \rangle,$$
since this is the norm of $x$ considered as a functional on $M^\perp$.
Using this notation, Theorem 6.4.3 is $\|x\|_M = \|x\|_{M^\perp}$ and Theorem 6.4.8 is $\|x^*\|_{M^\perp} = \|x^*\|_M$.
Example 6.4.10. Let $X = \mathbb{R}^3$ with the $l_1$ norm, $M = \{(u, v, w) : u + 2v + 3w = 0\}$, and $x = (-7, 2, 3)$. Note that $M = [\{(-2, 1, 0), (-3, 0, 1)\}]$. Determine $d$, $m_0$, and $x_0^*$.
Solution: We have $M^\perp = [(1, 2, 3)]$, so $K = M^\perp \cap \{x^* : \|x^*\|_\infty = 1\} = \{\pm(1/3, 2/3, 1)\}$. Then $x_0^* = \arg\max_{x^* \in K} \langle x, x^* \rangle = (1/3, 2/3, 1)$ and $d = \langle x, x_0^* \rangle = \|x\|_{M^\perp} = 2$. Then $x - m_0 = (0, 0, 2)$ since $x - m_0$ and $x_0^*$ must be aligned, so $m_0 = (-7, 2, 1)$. $\square$
We will solve minimum norm problems using roughly the following three steps:
1. In characterizing optimal solutions, use the alignment properties of the normed linear space and its dual. (Refer back to Examples 6.3.2, 6.3.3, and 6.3.4.)
2. Try to guarantee the existence of a solution by formulating the minimum norm problem in a dual space.
3. Examine the dual problem to see if it is easier; that is, if it is a lower-dimensional problem or is more transparent than the original problem.
6.5
Applications
QUIZ THEOREM 12 (Tonelli ([1], Example 1, p. 122)). If $f$ is continuous on $[a, b]$ and $p_0$ is the polynomial with $\deg p_0 \leq n$ minimizing $\max_{t \in [a, b]} |f(t) - p(t)|$, then $|f(t) - p_0(t)|$ achieves its maximum at at least $n + 2$ points on $[a, b]$.
Proof: First, we reformulate the problem as follows: in the Banach space $C[a, b]$, we want to find $p \in N = \{\text{polynomials } p : \deg p \leq n\}$ such that $\|f - p\|$ is minimized. $N$ is finite-dimensional (dimension $n + 1$), so an optimal solution $p_0$ must exist. Suppose $d = \|f - p_0\| > 0$ and let $\Gamma = \{t \in [a, b] : |f(t) - p_0(t)| = d\}$. By Theorem 6.4.3, the optimal solution $p_0$ must be such that $f - p_0$ is aligned with an element $v \in N^\perp$ with $\|v\| = 1$. Furthermore, by Theorem 6.1.5, $N^\perp$ is a subspace of $\mathrm{NBV}[a, b]$. Therefore, recall from Example 6.3.4 that if $f - p_0$ is aligned with $v$, then $v$ varies only on $\Gamma$; that is, $v$ consists of jump discontinuities at the points in $\Gamma$.
Now suppose that $\Gamma$ contains $m < n + 2$ points $a \leq t_1 < \cdots < t_m \leq b$. Then the polynomial $q(t) = \prod_{i \neq k} (t - t_i)$ with $k \in \{1, \ldots, m\}$ has degree $m - 1 \leq n$ and is therefore in $N$. The Stieltjes integral $\int_a^b q(t)\, dv(t)$ is a linear combination of $q(t_1), \ldots, q(t_m)$ with no zero coefficients since $v$ consists of jump discontinuities at the points in $\Gamma$. However, the integral cannot equal zero since $q(t_i) = 0$ for $i \neq k$ and $q(t_i) \neq 0$ for $i = k$, so $v \notin N^\perp$. Therefore, $f - p_0$ is not aligned with any nonzero element of $N^\perp$, so our assumption that $\Gamma$ contains fewer than $n + 2$ points was false. $\square$
Example 6.5.1. Let $[a, b] = [-1, 1]$, $f(t) = t^2$, and $n = 1$. Find $p_0(t)$.
Solution: Let $p_0(t) = a + bt$. By Quiz Theorem 12, $|t^2 - a - bt|$ achieves its maximum at at least 3 points on $[-1, 1]$. The maximum of $|t^2 - a - bt|$ could occur at $t = -1$ ($d = |1 - a + b|$), $t = 1$ ($d = |1 - a - b|$), or $t = b/2$ ($d = |-b^2/4 - a|$). Because the coefficient on the $t^2$ term (1) is positive, we know that $1 - a + b$, $1 - a - b$, and $b^2/4 + a$ will be positive for the optimal $p_0(t)$. Setting these quantities equal to $d$ gives $a = 1/2$, $b = 0$, and $d = 1/2$. $\square$
Example 6.5.2. Let $[a, b] = [-1, 1]$, $f(t) = t^{k+1}$, and $n = k$. Find $p_0(t)$.
Solution: Notice that the function $\cos((n + 1)\theta)$ achieves its maximum absolute value of 1 precisely $n + 2$ times on the interval $[0, \pi]$, specifically at the points $\theta \in \{k\pi/(n + 1) : k = 0, \ldots, n + 1\}$. Now $\cos\theta \in [-1, 1]$, so set $t = \cos\theta$ and express $\cos((n + 1)\theta)$ as a function of $t$ to find $p_0(t)$. For example,
$$\cos 2\theta = 2\cos^2\theta - 1 = 2t^2 - 1,$$
so the best approximation to $f(t) = t^2$ on $[-1, 1]$ is $p_0(t) = 1/2$. As another example,
$$\cos 4\theta = \cos^2 2\theta - \sin^2 2\theta = (2\cos^2\theta - 1)^2 - \left(2t\sqrt{1 - t^2}\right)^2 = (2t^2 - 1)^2 - 4t^2(1 - t^2) = 8t^4 - 8t^2 + 1,$$
so the best approximation to $f(t) = t^4$ on $[-1, 1]$ is $p_0(t) = t^2 - \frac{1}{8}$. Let $T_n(\cos\theta) = \cos n\theta$ define the Chebyshev polynomials of the first kind. Note that putting $t = \cos\theta$ gives $T_0(t) = \cos 0 = 1$ and $T_1(t) = \cos\theta = t$. We have the recurrence relation
$$\begin{aligned}
T_{n+1}(t) = \cos((n + 1)\theta) &= \cos\theta\cos n\theta - \sin\theta\sin n\theta \\
&= \cos\theta\cos n\theta - \sin\theta\left(\sin\theta\cos((n - 1)\theta) + \cos\theta\sin((n - 1)\theta)\right) \\
&= \cos\theta\cos n\theta - \sin^2\theta\cos((n - 1)\theta) - \cos^2\theta\cos((n - 1)\theta) + \cos^2\theta\cos((n - 1)\theta) - \sin\theta\cos\theta\sin((n - 1)\theta) \\
&= \cos\theta\cos n\theta - \cos((n - 1)\theta) + \cos\theta\cos n\theta \\
&= 2t\,T_n(t) - T_{n-1}(t).
\end{aligned}$$
Then to determine the best approximation $p_0(t)$ to $f(t) = t^n$ on $[-1, 1]$, find the Chebyshev polynomial $T_n(t)$, divide by the leading coefficient, subtract the $t^n$ term, and multiply by $-1$. Then $|f(t) - p_0(t)|$ reaches its maximum at the points $t \in \{\cos(k\pi/n) : k = 0, \ldots, n\}$. $\square$
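The recipe above (build $T_n$ from the recurrence, normalize, and subtract from $t^n$) can be sketched in a few lines. This is a minimal illustration, not part of the original notes; the function names are hypothetical, and polynomials are stored as coefficient lists in ascending powers of $t$.

```python
def chebyshev_coeffs(n):
    """Coefficients of T_n (ascending powers), via T_{k+1} = 2t T_k - T_{k-1}."""
    prev, curr = [1], [0, 1]                      # T_0 = 1, T_1 = t
    if n == 0:
        return prev
    for _ in range(n - 1):
        shifted = [0] + [2 * c for c in curr]     # coefficients of 2t * T_k
        padded = prev + [0] * (len(shifted) - len(prev))
        prev, curr = curr, [a - b for a, b in zip(shifted, padded)]
    return curr

def best_lower_degree_approx(n):
    """p_0 for f(t) = t^n on [-1, 1]: p_0(t) = t^n - T_n(t) / (leading coefficient)."""
    coeffs = chebyshev_coeffs(n)
    lead = coeffs[-1]
    return [-c / lead for c in coeffs[:-1]]       # ascending powers, degree < n

print(chebyshev_coeffs(4))          # coefficients of 8t^4 - 8t^2 + 1
print(best_lower_degree_approx(4))  # coefficients of t^2 - 1/8
```

For $n = 2$ and $n = 4$ this reproduces the two worked cases in Example 6.5.2.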
In the previous examples, we wanted to find the vector of minimum norm in a given linear variety instead of finding the vector closest to a given subspace. Remember that these problems are equivalent by Theorems 6.4.3 and 6.4.8. A standard problem is to find an element of minimum norm satisfying a finite number of linear constraints. Treat the solution to the minimum norm problem as a vector $x^*$ in the dual space $X^*$. Theorem 6.4.8 then guarantees the existence of a solution to the minimum norm problem. We can express the constraints in the form $\langle y_1, x^* \rangle = c_1, \ldots, \langle y_n, x^* \rangle = c_n$ with $y_i \in X$ for $i = 1, \ldots, n$. Then if $M = [\{y_i\}]$ and $\bar{x}^*$ is a vector satisfying the constraints, we have
$$d = \min_{\langle y_i, x^* \rangle = c_i} \|x^*\| = \min_{m^* \in M^\perp} \|\bar{x}^* - m^*\| = \sup_{x \in M,\ \|x\| \leq 1} \langle x, \bar{x}^* \rangle,$$
where the second equality follows from Theorem 6.4.8. Because $M$ is finite-dimensional (dimension at most $n$), we can change the supremum to a maximum. Any vector in $M$ can be written as a linear combination $x = \sum_{i=1}^n a_i y_i$ or equivalently, if we set $Y = (y_1, \ldots, y_n)$ and $a = (a_1, \ldots, a_n)'$, as $x = Ya$. Then if we set $c' = (c_1, \ldots, c_n)$,
$$d = \min_{\langle y_i, x^* \rangle = c_i} \|x^*\| = \max_{\|Ya\| \leq 1} \langle Ya, \bar{x}^* \rangle = \max_{\|Ya\| \leq 1} c'a.$$
Then by Theorem 6.4.8, the optimal $x^*$ is aligned with $x = \sum_{i=1}^n a_i y_i$ (compare to the equivalent result in Hilbert space, where the optimal solution was proportional to $x = \sum_{i=1}^n a_i y_i$).
COROLLARY 6.5.3 ([1], Corollary 1, p. 123). Let $y_1, \ldots, y_n \in X$ and suppose the system of linear equalities $\langle y_i, x^* \rangle = c_i$ for $i = 1, \ldots, n$ is consistent; that is,
$$D = \{x^* \in X^* : \langle y_i, x^* \rangle = c_i \text{ for } i = 1, \ldots, n\} \neq \emptyset.$$
Then
$$\min_{x^* \in D} \|x^*\| = \max_{\|Ya\| \leq 1} c'a$$
and the optimal $x^*$ is aligned with the optimal $Ya$.
Proof: See the discussion above. $\square$
Example 6.5.4. You are taking the new Gen Ed course Applied Cubism, whose goal is to teach you how to apply art to everyday life. As your term project, you are producing three large concrete cubes. Initially these will serve as a podium for children who are receiving medals for their athletic prowess, but they will live on as modern sculpture. The gold-medal cube has side $u$ and provides artistic renown $9u$. The silver-medal cube has side $v$ and provides artistic renown $4v$. The bronze-medal cube has side $w$ and provides artistic renown $w$. You must achieve artistic renown of 72 to earn an A in the course. How do you accomplish this while minimizing the total volume $d^3$ of concrete?
Solution: The solution $x^* = (u, v, w)$ lies in the space $X^* = l_3$. We want to minimize $\|x^*\|_3$ such that $\langle y, x^* \rangle = c$, where $y = (9, 4, 1) \in X = l_{3/2}$ and $c = 72$. We have
$$\|ay\|_{3/2} = a\left(9^{3/2} + 4^{3/2} + 1^{3/2}\right)^{2/3} = a \cdot 6^{4/3} \leq 1.$$
Then by Corollary 6.5.3,
$$d = \min_{\langle y, x^* \rangle = c} \|x^*\| = \max_{\|Ya\| \leq 1} c'a = \max_{a \cdot 6^{4/3} \leq 1} 72a = 2 \cdot 6^{2/3}$$
and $d^3 = 288$. Then because $x^*$ and $y$ must be aligned (there is only one constraint, so $Ya$ in Corollary 6.5.3 reduces to a constant times $y$), we have $x^* = (6, 4, 2)$. We can check this result by setting up a Lagrangian:
$$L(u, v, w, \lambda) = u^3 + v^3 + w^3 - \lambda(9u + 4v + w - 72),$$
$$\frac{dL}{du} = 3u^2 - 9\lambda = 0, \qquad \frac{dL}{dv} = 3v^2 - 4\lambda = 0, \qquad \frac{dL}{dw} = 3w^2 - \lambda = 0.$$
We see that $(u, v, w, \lambda) = (6, 4, 2, 12)$ satisfies the equations above. $\square$
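The numbers in Example 6.5.4 can be reproduced directly from Corollary 6.5.3 and the alignment condition of Example 6.3.2. The following is a small sketch with illustrative variable names; it assumes the alignment condition between $l_{3/2}$ and $l_3$ takes the form $|x_i^*|^3 \propto |y_i|^{3/2}$, that is, $x_i^* \propto y_i^{1/2}$.

```python
# Renown per unit side length and the required total renown (Example 6.5.4).
y = (9.0, 4.0, 1.0)
c = 72.0

# Dual problem: d = max { c * a : a * ||y||_{3/2} <= 1 } = c / ||y||_{3/2}.
norm_y = sum(v ** 1.5 for v in y) ** (2.0 / 3.0)
d = c / norm_y

# Alignment between l_{3/2} and l_3: x_i proportional to y_i^(1/2);
# the scale factor is fixed by the constraint <y, x*> = c.
shape = [v ** 0.5 for v in y]
scale = c / sum(a * b for a, b in zip(y, shape))
x_star = [scale * s for s in shape]

volume = sum(s ** 3 for s in x_star)   # total concrete, equals ||x*||_3 ** 3
print(x_star, volume, d ** 3)
```

The primal volume and the cube of the dual value coincide, as the corollary predicts.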
We will use the following scenario for the next two examples: You are setting up a company that will operate for four weeks to produce snacks for the inaugural festivities. At the start of each week, and at the end of the fourth week, you will hire or fire staff (1 unit of labor equals 100 hours per week). Each unit of labor produces a ton of snacks per week. Since you are using overworked government personnel to do your hiring and firing, your requirement is to minimize the maximum absolute value of the weekly change in the labor force. In accordance with standard practice, you look for the optimal solution in the space $X^* = l_\infty$. It has the form $x^* = \{\xi_1, \xi_2, \xi_3, \xi_4, \xi_5\}$, and it will be the vector of minimum norm in a linear variety determined by whatever constraints the candidates provide. These constraints will be vectors in $X = l_1$.
Example 6.5.5. First you go to the McCain-Palin staff. They require 14 tons of moose jerky. Since they favor small government, they want only 1 unit of labor left over to serve the snacks.
Solution: Note that at the start of Week $k$, we have $\sum_{i=1}^k \xi_i$ units of labor, for $k = 1, \ldots, 5$ (the start of Week 5 is the inauguration). Then the functional that gives the tons of snacks produced is $y_1 = (4, 3, 2, 1, 0)$ and the functional that gives the number of units of labor left for the inauguration is $y_2 = (1, 1, 1, 1, 1)$. The optimal solution $x^*$ must satisfy $\langle y_1, x^* \rangle = c_1 = 14$ and $\langle y_2, x^* \rangle = c_2 = 1$. Since the optimal $x^*$ is aligned with a vector in $l_1$, its entries must almost all be $\pm d$. The exception is that if the vector in $l_1$ has an entry of zero in any position, then $x^*$ can have any entry between $-d$ and $d$ in that position without destroying alignment. The vector $Ya$ can be written $(4a_1 + a_2, 3a_1 + a_2, 2a_1 + a_2, a_1 + a_2, a_2)$. Since we want a unit of labor left over, the first three entries of $x^*$ will be nonnegative, while the last two will be negative. Therefore, in the aligned $Ya$, the first three entries will also be nonnegative, while the last two will be nonpositive.
Therefore, we can discard the absolute values, so $\|Ya\|_1 = 8a_1 + a_2$. From Corollary 6.5.3,
$$d = \min_{x^* \in D} \|x^*\| = \max_{\|Ya\|_1 \leq 1} c'a = \max_{\|Ya\|_1 \leq 1} (14a_1 + a_2).$$
Now the third entry of $Ya$ vanishes, $\eta_3 = 2a_1 + a_2 = 0$, and we require $\|Ya\|_1 = 8a_1 + a_2 = 1$, so $a_1 = 1/6$, $a_2 = -1/3$, $Ya = (1/3, 1/6, 0, -1/6, -1/3)$, and $d = 2$. Then since $x^*$ must be aligned with $Ya$, $x^* = (2, 2, 1, -2, -2)$. $\square$
Example 6.5.6. Next you go to the Obama-Biden staff. They require 17 tons of brioche stuffed with brie. Since they favor job creation, they want 3 units of labor left over to serve the snacks.
Solution: Again we have $y_1 = (4, 3, 2, 1, 0) \in l_1$ and $y_2 = (1, 1, 1, 1, 1) \in l_1$, though the optimal solution $x^*$ must now satisfy $\langle y_1, x^* \rangle = c_1 = 17$ and $\langle y_2, x^* \rangle = c_2 = 3$. We repeat all of the steps as before, though now since we want 3 units of labor left over, either the first three entries of $x^*$ will be nonnegative while the last two will be negative, or the first four entries of $x^*$ will be nonnegative while the last one will be negative. Therefore, in the aligned $Ya$, either the first three entries will be nonnegative while the last two will be nonpositive, or the first four entries will be nonnegative while the last one will be nonpositive. In either case, we can discard the absolute values, so $\|Ya\|_1 = 8a_1 + a_2$ or $\|Ya\|_1 = 10a_1 + 3a_2$. From Corollary 6.5.3,
$$d = \min_{x^* \in D} \|x^*\| = \max_{\|Ya\|_1 \leq 1} c'a = \max_{\|Ya\|_1 \leq 1} (17a_1 + 3a_2).$$
Writing $\eta_k$ for the $k$th entry of $Ya$, now either $\eta_3 = 2a_1 + a_2 = 0$ or $\eta_4 = a_1 + a_2 = 0$, which combined with the constraint $\|Ya\|_1 = 1$ above yields either $a_1 = 1/6$, $a_2 = -1/3$, $Ya = (1/3, 1/6, 0, -1/6, -1/3)$, and $d = 11/6$, or $a_1 = 1/7$, $a_2 = -1/7$, $Ya = (3/7, 2/7, 1/7, 0, -1/7)$, and $d = 2$. $d$ is larger in the second case, so that case is optimal. Then since $x^*$ must be aligned with $Ya$, $x^* = (2, 2, 2, -1, -2)$. $\square$
Example 6.5.7 ([1], Example 2, p. 124). Consider selecting the field current $u(t)$ on $[0, 1]$ to drive a motor governed by $\ddot{\theta}(t) + \dot{\theta}(t) = u(t)$ with initial conditions $\theta(0) = 0$, $\dot{\theta}(0) = 0$, $\theta(1) = 1$, and $\dot{\theta}(1) = 0$ such that we minimize $\max_{t \in [0, 1]} |u(t)|$.
Solution: We could think of this problem as being formulated in $C[0, 1]$, but since this is not the dual of any normed space, we are not guaranteed that a solution exists in $C[0, 1]$. Therefore, we instead take $X = L_1[0, 1]$, $X^* = L_\infty[0, 1]$, and seek $u \in X^*$ of minimum norm. As in Example ??, the constraints are
$$\langle y_1, u \rangle = \int_0^1 y_1(t) u(t)\, dt = \dot{\theta}(1) = 0, \qquad \langle y_2, u \rangle = \int_0^1 y_2(t) u(t)\, dt = \theta(1) = 1,$$
where $y_1(t) = e^{t-1}$ and $y_2(t) = 1 - e^{t-1}$. From Corollary 6.5.3,
$$\min \|u\|_\infty = \max_{\|Ya\|_1 \leq 1} a_2,$$
so we want to maximize $a_2$ subject to the constraint
$$\int_0^1 \left| (a_1 - a_2) e^{t-1} + a_2 \right| dt \leq 1.$$
We require that $u(t)$ is aligned with this function:
$$\int_0^1 u(t)\left(a_1 y_1(t) + a_2 y_2(t)\right) dt = \|u\|_\infty \|a_1 y_1 + a_2 y_2\|_1 = d.$$
This can happen only if $u(t)$ is equal to $\pm d$ everywhere and always has the same sign as $a_1 y_1(t) + a_2 y_2(t)$. Furthermore, the function $a_1 y_1(t) + a_2 y_2(t)$ is the sum of a constant and an exponential term, so it changes sign on $[0, 1]$ at most once; the same must be true of $u(t)$.
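Let $t^*$ denote the time at which $u$ changes sign. From the first constraint, we have
$$\langle y_1, u \rangle = \int_0^{t^*} d\, e^{t-1}\, dt - \int_{t^*}^1 d\, e^{t-1}\, dt = d\left[\left(e^{t^*-1} - e^{-1}\right) - \left(1 - e^{t^*-1}\right)\right] = 0, \quad \text{so} \quad t^* = 1 + \log\left(\tfrac{1}{2}\left(1 + \tfrac{1}{e}\right)\right),$$
and from the second constraint, we have
$$\langle y_2, u \rangle = \int_0^{t^*} d\left(1 - e^{t-1}\right) dt - \int_{t^*}^1 d\left(1 - e^{t-1}\right) dt = d\left[\left(t^* - e^{t^*-1} + e^{-1}\right) - \left(1 - t^* - 1 + e^{t^*-1}\right)\right] = 1, \quad \text{so} \quad d = \frac{1}{\log\dfrac{(1 + e)^2}{4e}}.$$
Then $u(t) = d$ for $t \in [0, t^*]$ and $u(t) = -d$ for $t \in (t^*, 1]$. $\square$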
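As a sanity check on Example 6.5.7, the bang-bang control can be substituted back into the two integral constraints. The sketch below uses a simple midpoint rule; the helper names (`integrate`, `constraint`) are illustrative and not from the notes.

```python
import math

# Switching time and control magnitude derived in Example 6.5.7.
t_star = math.log((1.0 + math.e) / 2.0)          # equals 1 + log((1 + 1/e) / 2)
d = 1.0 / math.log((1.0 + math.e) ** 2 / (4.0 * math.e))

def u(t):
    """Bang-bang control: +d before the switch, -d after it."""
    return d if t <= t_star else -d

def integrate(g, a, b, n=20000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def constraint(y):
    """<y, u>, integrated separately on each side of the switching time."""
    left = integrate(lambda t: y(t) * u(t), 0.0, t_star)
    right = integrate(lambda t: y(t) * u(t), t_star, 1.0)
    return left + right

theta_dot_1 = constraint(lambda t: math.exp(t - 1.0))        # should be close to 0
theta_1 = constraint(lambda t: 1.0 - math.exp(t - 1.0))      # should be close to 1
print(t_star, d, theta_dot_1, theta_1)
```

Splitting the integral at $t^*$ keeps the integrand smooth on each piece, so the midpoint rule converges quickly.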
Example 6.5.8 ([1], Example 3, p. 125). Consider the problem of selecting the thrust program $u(t)$ for a vertically ascending rocket-propelled vehicle, subject only to the forces of gravity and rocket thrust, in order to reach a given altitude with minimum fuel expenditure.
Solution: Assuming fixed unit mass, unit gravity, and zero initial conditions, the altitude $x(t)$ is governed by the differential equation
$$\ddot{x}(t) = u(t) - 1, \qquad x(0) = \dot{x}(0) = 0.$$
Integrate to give
$$\dot{x}(T) = \int_0^T u(t)\, dt - T.$$
(Note this gives $\dot{x}(0) = 0$.) Now integrate again:
$$x(T) = \int_0^T \dot{x}(s)\, ds = \int_0^T \left( \int_0^s u(t)\, dt - s \right) ds = \int_0^T \int_t^T ds\; u(t)\, dt - \frac{T^2}{2} = \int_0^T (T - t) u(t)\, dt - \frac{T^2}{2}.$$
(Note this gives $x(0) = 0$.) Then for $x(T) = 1$ we have the constraint
$$\langle y, u \rangle = \int_0^T y(t) u(t)\, dt = 1 + \frac{T^2}{2},$$
where $y(t) = T - t$, and we want to minimize $\int_0^T |u(t)|\, dt$. We could think of this problem as being formulated in $L_1[0, T]$, but since this is not the dual of any normed space, we are not guaranteed that a solution exists in $L_1[0, T]$. Therefore, we instead take $X = C[0, T]$, $X^* = \mathrm{NBV}[0, T]$, and seek $v \in X^*$ minimizing
$$\int_0^T |dv(t)| = \mathrm{TV}(v) = \|v\|$$
subject to
$$\int_0^T (T - t)\, dv(t) = 1 + \frac{T^2}{2}.$$
Because we cannot expend negative fuel, the physical meaning of $v(t)$ is the total fuel consumption over the interval $[0, t]$, while $\|v\|$ is the total fuel consumption over the whole interval $[0, T]$. From Corollary 6.5.3,
$$d = \min \|v\| = \max_{\|(T - t)a\| \leq 1} a\left(1 + \frac{T^2}{2}\right).$$
Then
$$\|(T - t)a\| = \max_{t \in [0, T]} |(T - t)a| = T|a|,$$
so the optimal choice is $a = 1/T$ and $d = 1/T + T/2$. The optimal $v$ must be aligned with $(T - t)a$ and, from Example 6.3.4, can vary only at $t = 0$. Thus, $v$ is a step function. Differentiating $d$ with respect to $T$ gives $T = \sqrt{2}$ and $d = \sqrt{2}$, so $v = \sqrt{2}\, \mathbf{1}_{[t \in (0, \sqrt{2}]]}$. $\square$
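The final minimization over the burn duration $T$ in Example 6.5.8 is easy to confirm numerically. The sketch below is illustrative only: it evaluates $d(T) = 1/T + T/2$ on a grid (the grid range and spacing are arbitrary choices) and checks that a single impulse of size $d$ at $t = 0$ reaches altitude 1 at time $T$.

```python
import math

def fuel(T):
    """Minimum fuel d(T) = 1/T + T/2 for reaching altitude 1 at time T."""
    return 1.0 / T + T / 2.0

# Coarse grid search over T in (0, 5]; the grid itself is an arbitrary choice.
best_T = min((k / 1000.0 for k in range(1, 5001)), key=fuel)
best_d = fuel(best_T)

# With an impulse of size d at t = 0, the velocity is d - t,
# so the altitude at time T is d*T - T^2/2.
altitude = best_d * best_T - best_T ** 2 / 2.0
print(best_T, best_d, altitude)
```

Both the optimal time and the optimal fuel come out close to $\sqrt{2}$, in agreement with the calculation above.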
6.6
Hyperplanes and Linear Functionals
Definition 6.6.1 (hyperplane). A hyperplane $H$ in a linear vector space $X$ is a maximal proper linear variety; that is, a linear variety $H$ such that $H \neq X$ (proper), and if $V$ is any linear variety containing $H$, then either $V = X$ or $V = H$ (maximal).
The following proposition, stated without proof, gives us a more practical way to work with hyperplanes.
Proposition 6.6.2 ([1], Proposition 1, p. 129). Let $H$ be a hyperplane in a linear vector space $X$. Then there is a linear functional $f$ on $X$ and a constant $c$ such that $H = \{x : f(x) = c\}$. Conversely, if $f$ is a nonzero linear functional on $X$, then the set $\{x : f(x) = c\}$ is a hyperplane in $X$.
Definition 6.6.3 (half-space). The four half-spaces associated with a hyperplane $H = \{x : f(x) = c\}$ are
$$\{x : f(x) \leq c\}, \quad \{x : f(x) < c\}, \quad \{x : f(x) \geq c\}, \quad \{x : f(x) > c\}.$$
Definition 6.6.4 (supporting hyperplane). A closed hyperplane $H$ in a normed space $X$ is said to be a support or supporting hyperplane for the convex set $K$ if $K$ is contained in one of the closed half-spaces determined by $H$ and $H$ contains a point in $\bar{K}$.
THEOREM 6.6.5 (Support Theorem ([1], Theorem 2, p. 133)). Given $x$ in the boundary of a convex set $K$, there exists a supporting hyperplane containing $x$.
Proof: If $x$ is in the boundary of $K$, then there exists a sequence $\{x_k\}$ in $\bar{K}^C$ such that $x_k \to x$. Now for any $k$, since $x_k \notin \bar{K}$, there exists a hyperplane $H_k$ strictly separating $x_k$ from $\bar{K}$. Let $H_k$ be defined by the functional $f_k$; that is, $f_k(x_k) < f_k(y)$ for all $y \in \bar{K}$. Now we extract a subsequence so that $f_k \to f$; then in the limit, $f(x) \leq f(y)$ for all $y \in \bar{K}$. $f$ defines the desired supporting hyperplane. $\square$
THEOREM 6.6.6 (Eidelheit Separation Theorem ([1], Theorem 3, p. 133)). Let $K_1$ and $K_2$ be convex sets in $X$ such that $\mathring{K}_1 \neq \emptyset$ and $K_2 \cap \mathring{K}_1 = \emptyset$. Then there is a closed hyperplane $H$ separating $K_1$ and $K_2$, or equivalently, there exists an $x^* \in X^*$ such that $\sup_{x \in K_1} \langle x, x^* \rangle \leq \inf_{x \in K_2} \langle x, x^* \rangle$. In other words, $K_1$ and $K_2$ lie in opposite half-spaces determined by $H$.
Proof: Let $K = K_1 - K_2$. Then $K$ contains an interior point and 0 is not one of them. Recall from Proposition 2.2.3 that $K_1 - K_2$ is convex. Then by Theorem 6.6.5, there exists an $x^* \in X^*$ with $x^* \neq 0$ such that $\langle x, x^* \rangle \leq 0$ for $x \in K$. Setting $x = x_1 - x_2$ with $x_1 \in K_1$ and $x_2 \in K_2$ gives $\langle x_1, x^* \rangle \leq \langle x_2, x^* \rangle$, so there exists a real number $c$ such that
$$\sup_{x \in K_1} \langle x, x^* \rangle \leq c \leq \inf_{x \in K_2} \langle x, x^* \rangle.$$
The desired hyperplane is then $H = \{x : \langle x, x^* \rangle = c\}$. $\square$
C HAPTER 7
CALCULUS OF VARIATIONS
Reading: [1, Sections 7.1 7.5, 7.7]
7.1
Review
Example 7.1.1. Let
$$f(u, v) = \frac{u^2 v}{u^2 + v^2}$$
with $f(0) = 0$. Calculate $\partial f/\partial u$ and $\partial f/\partial v$ at 0. Calculate the directional derivative at 0 along the vector $h = (1, m)$. Introducing the Euclidean norm, show that the derivative does not exist.
Solution: We have
$$\frac{\partial f}{\partial u} = \delta f(0; (1, 0)) = \lim_{\alpha \to 0} \frac{1}{\alpha}\left(f(\alpha, 0) - f(0, 0)\right) = 0, \qquad \frac{\partial f}{\partial v} = \delta f(0; (0, 1)) = \lim_{\alpha \to 0} \frac{1}{\alpha}\left(f(0, \alpha) - f(0, 0)\right) = 0.$$
Furthermore,
$$\delta f(0; h) = \lim_{\alpha \to 0} \frac{1}{\alpha}\left(f(\alpha, m\alpha) - f(0, 0)\right) = \lim_{\alpha \to 0} \frac{1}{\alpha} \cdot \frac{\alpha^3 m}{\alpha^2 (1 + m^2)} = \frac{m}{1 + m^2}.$$
Note that the directional derivative is not a linear functional of $h$. In fact, the only linear function $\delta f(0; h)$ that could possibly work in this case, given that both partial derivatives evaluated at 0 equal 0, is $\delta f(0; h) = 0$. We now show that the derivative does not exist; that is, we show that there is no linear function $\delta f(0; h)$ of $h$ such that
$$\lim_{\|h\| \to 0} \frac{1}{\|h\|}\left(f(h) - f(0) - \delta f(0; h)\right) = 0.$$
First we convert to polar coordinates: $h = (h_1, h_2) = (r\cos\theta, r\sin\theta)$. Then $f(h) = r\cos^2\theta\sin\theta$ and $\|h\| = r$. We rewrite the above limit as
$$\lim_{r \to 0} \frac{1}{r}\left(r\cos^2\theta\sin\theta - (ar\cos\theta + br\sin\theta)\right) = \cos^2\theta\sin\theta - a\cos\theta - b\sin\theta$$
for constants $a, b$. There exist no constants $a, b$ such that the above quantity is zero for all $\theta$, so the derivative does not exist. Intuitively, we can see from Figure 7.1.1 that $f$ has a cusp at 0. $\square$
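The failure of linearity in Example 7.1.1 can be seen numerically. The sketch below (function names are illustrative) evaluates the difference quotient at 0 along several directions and shows that the value along $(1, 1)$ is not the sum of the values along $(1, 0)$ and $(0, 1)$.

```python
def f(u, v):
    """f(u, v) = u^2 v / (u^2 + v^2), with f(0, 0) = 0."""
    return 0.0 if u == 0.0 and v == 0.0 else u * u * v / (u * u + v * v)

def directional_quotient(h, alpha=1e-6):
    """Difference quotient (f(alpha * h) - f(0)) / alpha at the origin."""
    return (f(alpha * h[0], alpha * h[1]) - f(0.0, 0.0)) / alpha

along_u = directional_quotient((1.0, 0.0))    # partial derivative in u: 0
along_v = directional_quotient((0.0, 1.0))    # partial derivative in v: 0
along_diag = directional_quotient((1.0, 1.0)) # m / (1 + m^2) with m = 1

print(along_u, along_v, along_diag)
```

Because $f$ is homogeneous of degree one, the quotient does not depend on `alpha`; a linear differential would force the diagonal value to equal `along_u + along_v`, which it does not.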
Figure 7.1.1: On the left, we see that the function $f$ in Example 7.1.1 has a cusp at 0. Furthermore, the partial derivatives $\partial f/\partial u$ (middle) and $\partial f/\partial v$ (right) are clearly discontinuous at 0, so Proposition 7.1.2 does not apply.
The following proposition will be useful later. Note that it did not apply above since the partial derivatives were not continuous at 0, as shown in Figure 7.1.1.
Proposition 7.1.2 ([1, Example 7.2.4, p. 172]). Let $X = E^n$ and let $f(x)$ be a functional on $E^n$ having continuous partial derivatives with respect to each $x_i$. Then
$$\delta f(x; h) = \sum_{i=1}^n \frac{\partial f}{\partial x_i} h_i.$$
Gateaux and Frechet Differentials
The Gateaux differential generalizes the concept of a directional derivative:
Definition 7.2.1 (Gateaux differential). Let $X$ be a vector space, $Y$ be a normed space, and $T : D \to Y$ a transformation, with $D \subset X$. Let $x$ be a fixed element of $D$ and $h$ an arbitrary element of $X$. If the limit
$$\delta T(x; h) = \lim_{\alpha \to 0} \frac{1}{\alpha}\left(T(x + \alpha h) - T(x)\right)$$
exists, it is called the Gateaux differential of $T$ at $x$ with increment $h$. If the limit exists for all $h \in X$, $T$ is said to be Gateaux differentiable at $x$.
Note that while we do require $Y$ to be normed since we need to take limits in $Y$, we do not require $X$ to be normed. Furthermore, note that if $T$ is Gateaux differentiable at $x$, then $\delta T(x; h)$ is homogeneous in $h$ since, for $\beta \neq 0$,
$$\delta T(x; \beta h) = \lim_{\alpha \to 0} \frac{1}{\alpha}\left(T(x + \alpha\beta h) - T(x)\right) = \beta \lim_{\alpha \to 0} \frac{1}{\alpha\beta}\left(T(x + \alpha\beta h) - T(x)\right) = \beta\, \delta T(x; h).$$
It need not be additive in $h$, however, as the directional derivative in Example 7.1.1 shows.
The Fréchet differential generalizes the concept of a derivative:
Definition 7.2.2 (Fréchet differential). Let $X$ and $Y$ be normed spaces, and $T : D \to Y$ a transformation, with $D \subset X$. If for fixed $x \in D$ and arbitrary $h \in X$ there exists a linear, continuous function $\delta T(x; h)$ of $h$ such that
$$\lim_{\|h\| \to 0} \frac{1}{\|h\|}\left\|T(x + h) - T(x) - \delta T(x; h)\right\| = 0,$$
then $T$ is said to be Fréchet differentiable at $x$ and $\delta T(x; h)$ is called the Fréchet differential of $T$ at $x$ with increment $h$.
In Example 7.1.1, we showed that the given function was not Fréchet differentiable at the origin.
Example 7.2.3. Calculate the Gateaux differential of
$$f(x) = \int_0^T x(t)^2\, dt$$
from the definition. Then let $T = \pi/2$, $x(t) = \sin t$, and $h(t) = \cos t$. Use the differential to approximate
$$\int_0^{\pi/2} (\sin t + 0.01\cos t)^2\, dt.$$
Solution: We have
$$\delta f(x; h) = \lim_{\alpha \to 0} \frac{1}{\alpha}\left(f(x + \alpha h) - f(x)\right) = \lim_{\alpha \to 0} \frac{1}{\alpha} \int_0^T \left((x(t) + \alpha h(t))^2 - x(t)^2\right) dt = \lim_{\alpha \to 0} \frac{1}{\alpha} \int_0^T \left(2\alpha x(t) h(t) + \alpha^2 h^2(t)\right) dt = \int_0^T 2x(t) h(t)\, dt.$$
Furthermore,
$$\int_0^{\pi/2} (\sin t + 0.01\cos t)^2\, dt \approx \int_0^{\pi/2} \sin^2 t\, dt + \int_0^{\pi/2} 0.02\sin t\cos t\, dt = \int_0^{\pi/2} \sin^2 t\, dt + \int_0^{\pi/2} 0.01\sin 2t\, dt = \int_0^{\pi/2} \sin^2 t\, dt + \int_0^{\pi} 0.005\sin s\, ds = \frac{\pi}{4} + 0.01 \approx 0.7954. \qquad \square$$

1 0
C r0, 1s. Calculate the Gateaux differential
g pxptq, tq dt,
assuming that the partial derivative gx px, tq exists and is continuous with respect to x and t. Show that this differential is also a Fr echet differential. Solution: We write the Gateaux differential as f px; hq d d
1
0
g pxptq
h t , t dt
pq q
By our assumptions on g , we can interchange the order of integration and differentiation: f px; hq
1
0
gx pxptq, tqhptq dt.
Now we show that the Gateaux differential is also a Fr echet differential. We have
|f px hq f pxq f px; hq|

For xed t, we have
1 g xt
0
p p p q hptq, tq gpxptq, tq gxpxptq, tqhptqq
dt .
where |xptq x ptq| |hptq|. Because r0, 1s is compact, gx px, tq is not just continuous but also uniformly continuous with respect to x and t. Therefore, for every 0, there exists a 0 such that for }h} , |gxpx h, tq gxpx, tq| . Therefore,
g pxptq hptq, tq g pxptq, tq gx px ptq, tqhptq,
|f px hq f pxq f px; hq|

Therefore, lim
1 gx x t ,t
0
p p p q q p p q qq p q
gx x t , t h t dt
}h}.
so f px; hq is indeed a Fr echet differential.
1 |f px hq f pxq f px; hq| 0, }h}0 }h} K
Definition 7.2.5 (relative minimum and maximum). Let $f$ be a real-valued functional defined on a subset $\Omega$ of a normed space $X$. A point $x_0 \in \Omega$ is a relative or local minimum of $f$ on $\Omega$ if there is an open ball $B(x_0)$ such that $f(x_0) \leq f(x)$ for all $x \in \Omega \cap B(x_0)$. The relative or local maximum is defined similarly.
QUIZ THEOREM 13 ([1, Theorem 7.4.1, p. 178]). Let a real-valued functional $f$ have a Gateaux differential on a vector space $X$. A necessary condition for $f$ to have an extremum at $x_0 \in X$ is that $\delta f(x_0; h) = 0$ for all $h \in X$.
Proof: For every $h \in X$, the function $g(\alpha) = f(x_0 + \alpha h)$ must achieve an extremum at $\alpha = 0$. By Fermat's theorem from single-variable calculus, if 0 is an extremum, then $g'(0) = 0$; that is,
$$g'(0) = \lim_{\alpha \to 0} \frac{1}{\alpha}\left(g(\alpha) - g(0)\right) = \lim_{\alpha \to 0} \frac{1}{\alpha}\left(f(x_0 + \alpha h) - f(x_0)\right) = \delta f(x_0; h) = 0. \qquad \square$$
Note that the condition in the theorem is necessary, not sufficient. For example, the function $f(u, v) = uv$ has partial derivatives $\partial f/\partial u = v$ and $\partial f/\partial v = u$, so the Gateaux differential of $f$ at 0 with increment $h = (h_1, h_2)$ is $0 h_1 + 0 h_2 = 0$. However, 0 is not an extremum, but a saddle point.
7.3
Euler-Lagrange Equations
Definition 7.3.1 ($D[a, b]$). The normed linear space $D[a, b]$ consists of all continuous and continuously differentiable functions on $[a, b]$, with norm
$$\|x\| = \max_{t \in [a, b]} |x(t)| + \max_{t \in [a, b]} |\dot{x}(t)|.$$
Q UIZ T HEOREM 14 (Euler-Lagrange ([1, Section 7.5, p. 179 180])). Consider nding a function x P Drt1 , t2 s minimizing a functional of the form J
t2
t1
9 ptq, tq dt, f pxptq, x
where f is continuous with respect to x, x 9 , and t and has continuous partial derivatives with respect to x and x 9. Assume that xpt1 q c1 and xpt2 q c2 for constants c1 , c2 . Show that x0 arg minx J is a solution to the Euler-Lagrange equation d 9 ptq, tq 0. fx fx pxptq, x 9 ptq, tq dt 9 pxptq, x Proof: Our admissible set is the subset of Drt1 , t2 s with xpt1 q and xpt2 q xed. Given an admissible vector x, we also consider vectors of the form x h that are admissible. The set of all h such that x h is admissible is called the class of admissible variations. Clearly, hpt1 q hpt2 q 0 for such h. The Gateaux differential of J is J px; hq d d
t2
t1
9 , tq dt f px h, x 9 h
t2
t1
By our assumptions on f , we can interchange the order of integration and differentiation: J px; hq
t2
t1
9 tqhptq dt fx px, x,
9 ptq dt. 9 tqh fx 9 px, x,
If we assume that fx 9 has a continuous derivative with respect to t, then we can rewrite the second integral using integration by parts:
t2
t1
9 ptq dt fxpx, x, 9 tqh 9 tqhptq|tt fx px, x,

9 9
t2
t1
2 1
d 9 tqhptq dt, fx 9 px, x, dt
108 0 so
t2
t1
J px; hq
9 tq fx px, x,
d 9 tq hptq dt fx9 px, x, 9 tqhptq|tt2 fx . 9 px, x, 1 dt
For all admissible variations $h$, $h(t_1) = h(t_2) = 0$, so the second term equals zero. By Quiz Theorem 13, a necessary condition on $x_0 = \arg\min_x J$ is that it must satisfy
$$\delta J(x; h) = \int_{t_1}^{t_2} \left[ f_x(x, \dot{x}, t) - \frac{d}{dt} f_{\dot{x}}(x, \dot{x}, t) \right] h(t)\,dt = 0$$
for all $h \in D[t_1, t_2]$ with $h(t_1) = h(t_2) = 0$. Now we want to show that $x_0$ must satisfy the Euler-Lagrange equation
$$f_x(x, \dot{x}, t) - \frac{d}{dt} f_{\dot{x}}(x, \dot{x}, t) = 0.$$
We prove that if $g(t)$ is continuous on $[t_1, t_2]$ and $\int_{t_1}^{t_2} g(t) h(t)\,dt = 0$ for all $h \in D[t_1, t_2]$ with $h(t_1) = h(t_2) = 0$, then $g(t) = 0$ on $[t_1, t_2]$. Assume $g(t)$ is nonzero, say positive, for some $t \in [t_1, t_2]$. Then since $g$ is continuous, it is positive on some interval $[s_1, s_2] \subset [t_1, t_2]$. Let
$$h(t) = \begin{cases} (t - s_1)^2 (s_2 - t)^2 & s_1 \le t \le s_2 \\ 0 & \text{otherwise.} \end{cases}$$
This $h$ is continuously differentiable and vanishes at $t_1$ and $t_2$. But then
$$\int_{t_1}^{t_2} g(t) h(t)\,dt > 0,$$
contradicting our assumption that this integral evaluates to zero. Therefore, $g(t) = 0$ on $[t_1, t_2]$. Noting that
$$f_x(x, \dot{x}, t) - \frac{d}{dt} f_{\dot{x}}(x, \dot{x}, t)$$
is continuous, we apply this result to obtain the Euler-Lagrange equation. ∎
In fact, we do not need to assume that the derivative of $f_{\dot{x}}$ with respect to $t$ is continuous. See [1, Lemmas 7.5.1-7.5.3] for an alternative derivation.

Example 7.3.2. For a mass on a spring, if we set all coefficients equal to 1, the difference between kinetic energy and potential energy (the Lagrangian) is $f(x, \dot{x}, t) = \frac{1}{2}(\dot{x}^2 - x^2)$. Find the function $x$ that minimizes
$$J = \int_0^{\pi/2} f(x, \dot{x}, t)\,dt = \int_0^{\pi/2} \frac{1}{2}\left(\dot{x}(t)^2 - x(t)^2\right) dt$$
with $x(0) = 1$ and $x(\pi/2) = 0$.

Solution: $x$ must satisfy the Euler-Lagrange equation
$$f_x(x, \dot{x}, t) - \frac{d}{dt} f_{\dot{x}}(x, \dot{x}, t) = -x - \ddot{x} = 0.$$
The general solution to this equation is $x(t) = A \cos t + B \sin t$. From the boundary conditions, we have $A = 1$ and $B = 0$, so $x(t) = \cos t$. ∎
Example 7.3.3. The balance in your campaign chest is $x(t)$, and it grows at a rate equal to $x(t)$ by accumulating interest. If you want a different growth rate $\dot{x}(t)$, you must make contributions at a rate $u(t) = \dot{x}(t) - x(t)$. Find the function $x$ that minimizes
$$J = \int_0^1 (u(t))^2\,dt = \int_0^1 f(x, \dot{x}, t)\,dt = \int_0^1 (\dot{x}(t) - x(t))^2\,dt$$
with $x(0) = 1$ and $x(1) = e^2$.

Solution: $x$ must satisfy the Euler-Lagrange equation
$$f_x(x, \dot{x}, t) - \frac{d}{dt} f_{\dot{x}}(x, \dot{x}, t) = -2(\dot{x} - x) - 2(\ddot{x} - \dot{x}) = 2(x - \ddot{x}) = 0.$$
The general solution to this equation is $x(t) = A e^t + B e^{-t}$. From the boundary conditions, we have $A = 1 + e^2/(1 + e)$ and $B = -e^2/(1 + e)$, so
$$x(t) = \left(1 + \frac{e^2}{1 + e}\right) e^t - \frac{e^2}{1 + e}\, e^{-t}.$$
We make contributions at a rate
$$u(t) = \dot{x}(t) - x(t) = (A e^t - B e^{-t}) - (A e^t + B e^{-t}) = -2B e^{-t} = \frac{2 e^{2 - t}}{1 + e}.$$
∎
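As a sanity check on Example 7.3.3, the following sketch evaluates $J$ numerically for the candidate $x(t) = A e^t + B e^{-t}$ and for a few admissible competitors $x + \varepsilon h$ with $h(t) = \sin(\pi t)$. The helper names (`x_opt`, `cost`, `perturbed`), the midpoint rule, and the choice of $h$ are illustrative assumptions, not part of the notes.

```python
import math

E = math.e
A = 1 + E**2 / (1 + E)   # coefficient of e^t from the boundary conditions
B = -E**2 / (1 + E)      # coefficient of e^{-t}

def x_opt(t):
    """Candidate extremal x(t) = A e^t + B e^{-t}."""
    return A * math.exp(t) + B * math.exp(-t)

def dx_opt(t):
    """Derivative of the candidate extremal."""
    return A * math.exp(t) - B * math.exp(-t)

def cost(x, dx, n=2000):
    """Midpoint-rule approximation of J = int_0^1 (x'(t) - x(t))^2 dt."""
    h = 1.0 / n
    return sum((dx((k + 0.5) * h) - x((k + 0.5) * h)) ** 2 for k in range(n)) * h

def perturbed(eps):
    """Admissible competitor x + eps*h with h(t) = sin(pi t), so h(0) = h(1) = 0."""
    x = lambda t: x_opt(t) + eps * math.sin(math.pi * t)
    dx = lambda t: dx_opt(t) + eps * math.pi * math.cos(math.pi * t)
    return x, dx

j0 = cost(x_opt, dx_opt)
print("J at candidate:", j0)
print("J at competitors:", [cost(*perturbed(e)) for e in (-0.5, 0.1, 0.5)])
```

Each competitor should give a larger value of $J$ than the candidate, which is what the necessary condition leads us to expect for this convex integrand.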
Example 7.3.4 ([1, Example 7.5.2, p. 182]). You want to determine your lifetime plan of investment and expenditure that maximizes total enjoyment. Assume you have fixed savings $S$ and know you will die in $T$ years. Further assume you have no descendants and are not a charitable person, so you plan to have no savings left at time $T$. Your total capital $x(t)$ at time $t$ grows at rate $\dot{x}(t) = \alpha x(t) - r(t)$, where $\alpha$ is the interest rate you earn on your investments and $r(t)$ is your rate of expenditure. Your enjoyment is a function $U$ of your rate of expenditure $r(t)$, so you want to maximize
$$J = \int_0^T e^{-\beta t}\, U(r(t))\,dt = \int_0^T f(x, \dot{x}, t)\,dt = \int_0^T e^{-\beta t}\, U(\alpha x(t) - \dot{x}(t))\,dt.$$
Find the function $x$ that maximizes $J$ with $x(0) = S$ and $x(T) = 0$.

Solution: $x$ must satisfy the Euler-Lagrange equation
$$f_x(x, \dot{x}, t) - \frac{d}{dt} f_{\dot{x}}(x, \dot{x}, t) = \alpha e^{-\beta t}\, U'(\alpha x(t) - \dot{x}(t)) + \frac{d}{dt}\left[ e^{-\beta t}\, U'(\alpha x(t) - \dot{x}(t)) \right] = 0,$$
which gives
$$\frac{d}{dt} U'(\alpha x(t) - \dot{x}(t)) = (\beta - \alpha)\, U'(\alpha x(t) - \dot{x}(t)).$$
Integrating this equation gives
$$U'(r(t)) = U'(r(0)) \exp((\beta - \alpha) t).$$
Consider $U(r) = 2\sqrt{r}$. Then $U'(r) = r^{-1/2}$ and $(U')^{-1}(r) = r^{-2}$. From the previous equation, we have $r(t) = r(0) \exp(2(\alpha - \beta) t)$ (note $(U')^{-1}(rs) = (U')^{-1}(r)\,(U')^{-1}(s)$). We can now solve for $x(t)$: the calculations are messy and not worth repeating here. ∎
7.4 Problems with Constraints

In several optimization problems, we require the optimal vector to satisfy some constraints. In the previous section, we considered functions with fixed values at the endpoints of an interval, so that the admissible variations vanish there. More generally, we optimize a functional $f$ subject to $n$ nonlinear constraints given in the implicit form $g_1(x) = 0, \ldots, g_n(x) = 0$. We will assume throughout this section that $f$ and $g_1, \ldots, g_n$ are continuous and Fréchet differentiable on the normed space $X$.

Definition 7.4.1 (regular point). A point $x_0$ satisfying the constraints $g_1(x) = 0, \ldots, g_n(x) = 0$ is a regular point of these constraints if the $n$ linear functionals $g_1'(x_0), \ldots, g_n'(x_0)$ are linearly independent.

Recall that even though $g_1, \ldots, g_n$ are nonlinear functionals, their Gateaux differentials are linear in the increment $h$.

THEOREM 7.4.2 ([1, Theorem 7.7.1, p. 187]). If $x_0$ is an extremum of the functional $f$ subject to the constraints $g_i(x) = 0$ for $i = 1, \ldots, n$ and if $x_0$ is a regular point of these constraints, then $\delta f(x_0; h) = 0$ for all $h$ satisfying $\delta g_i(x_0; h) = 0$ for $i = 1, \ldots, n$.

Proof (single constraint): Choose $h \in X$ such that $\delta g(x_0; h) = 0$. Choose any vector $y \in X$ such that $\delta g(x_0; y) \neq 0$ (note that if $x_0$ were not a regular point, we would not be able to do this). Now set $g(x_0 + \varepsilon h + \varphi y) = 0$. This is a nonlinear function of $\varepsilon$ and $\varphi$ since $x_0$, $h$, and $y$ are fixed. The derivative of this function with respect to $\varphi$ at $(\varepsilon, \varphi) = (0, 0)$ is just $\delta g(x_0; y)$, which by assumption is nonzero. Therefore, by the implicit function theorem, we can solve for $\varphi$ as a function of $\varepsilon$ in some neighborhood of $0$. Then
$$0 = g(x_0 + \varepsilon h + \varphi(\varepsilon) y) = \underbrace{g(x_0)}_{0} + \varepsilon \underbrace{\delta g(x_0; h)}_{0} + \varphi(\varepsilon)\, \delta g(x_0; y) + o(\varepsilon) + o(\|\varphi(\varepsilon) y\|). \tag{7.4.1}$$
Since $\delta g(x_0; y) \neq 0$, there exist constants $c_1, c_2 > 0$ such that $c_1 |\varphi| \le |\varphi\, \delta g(x_0; y)| \le c_2 |\varphi|$, and since $y \neq 0$, there exist constants $d_1, d_2 > 0$ such that $d_1 \|\varphi y\| \le |\varphi| \le d_2 \|\varphi y\|$. Therefore, $c_1 d_1 \|\varphi y\| \le |\varphi\, \delta g(x_0; y)| \le c_2 d_2 \|\varphi y\|$, so from (7.4.1), $\|\varphi(\varepsilon) y\| = o(\varepsilon)$. Now $x_0 + \varepsilon h + \varphi(\varepsilon) y$ satisfies the constraint and is a function of $\varepsilon$. By Quiz Theorem 13, we must have
$$\frac{d}{d\varepsilon} f(x_0 + \varepsilon h + \varphi(\varepsilon) y) \Big|_{\varepsilon = 0} = 0,$$
which implies $\delta f(x_0; h) = 0$ since $\|\varphi(\varepsilon) y\| = o(\varepsilon)$. ∎
Proposition 7.4.3 ([1, Lemma 7.7.1, p. 188]). Let $f_0, f_1, \ldots, f_n$ be linear functionals on a vector space $X$ and suppose that $f_0(x) = 0$ for every $x \in X$ satisfying $f_i(x) = 0$ for $i = 1, \ldots, n$. Then there exist scalars $\lambda_1, \ldots, \lambda_n$ such that
$$f_0 + \sum_{i=1}^n \lambda_i f_i = 0.$$

Proof: In $E^n$, consider the subspace $M$ formed by all points of the form $(f_1(x), \ldots, f_n(x))$. Define the linear functional $g$ on $M$ by $g(m) = -f_0(x)$ for $m = (f_1(x), \ldots, f_n(x))$; this is well defined because $f_0$ vanishes wherever all of the $f_i$ vanish. By the Hahn-Banach theorem, we can extend $g$ to a linear functional $G$ defined on all of $E^n$. Furthermore, since $(E^n)^* = E^n$, we can represent $G$ as $(\lambda_1, \ldots, \lambda_n)$. Thus, we have found $\lambda_1, \ldots, \lambda_n$ such that
$$\sum_{i=1}^n \lambda_i f_i(x) = -f_0(x);$$
adding $f_0(x)$ to both sides gives the desired result. ∎
THEOREM 7.4.4 (Lagrange Multipliers ([1, Theorem 7.7.2, pp. 188-189])). If $x_0$ is an extremum of the functional $f$ subject to the constraints $g_i(x) = 0$ for $i = 1, \ldots, n$ and $x_0$ is a regular point of these constraints, then there exist scalars $\lambda_i$ for $i = 1, \ldots, n$ such that the functional
$$L(x) = f(x) + \sum_{i=1}^n \lambda_i g_i(x)$$
is stationary at $x_0$; that is,
$$\delta L(x_0; h) = \delta f(x_0; h) + \sum_{i=1}^n \lambda_i\, \delta g_i(x_0; h) = 0$$
for all $h$.

Proof: By Theorem 7.4.2, $\delta f(x_0; h) = 0$ whenever $\delta g_i(x_0; h) = 0$ for $i = 1, \ldots, n$. Applying Proposition 7.4.3 gives the desired result. ∎
Example 7.4.5 (Dido's Problem ([1, Example 7.7.1, p. 189])). Find the function $x$ that maximizes
$$\int_{-1}^{1} x(t)\,dt$$
subject to the arc length constraint
$$\int_{-1}^{1} \sqrt{\dot{x}^2 + 1}\,dt = L$$
with $x(-1) = x(1) = 0$.

Solution: We want to find a stationary point of
$$J = \int_{-1}^{1} f(x, \dot{x}, t)\,dt = \int_{-1}^{1} \left( x + \lambda \sqrt{\dot{x}^2 + 1} \right) dt.$$
$x$ must satisfy the Euler-Lagrange equation
$$f_x(x, \dot{x}, t) - \frac{d}{dt} f_{\dot{x}}(x, \dot{x}, t) = 1 - \lambda \frac{d}{dt} \frac{\dot{x}}{\sqrt{\dot{x}^2 + 1}} = 1 - \lambda \underbrace{\frac{\ddot{x}}{(1 + \dot{x}^2)^{3/2}}}_{\kappa} = 0.$$
Note that $\kappa$ is the curvature of a plane curve (see [8, Example 4.2.2, p. 90]). The above equation says that $\kappa$ is the constant $1/\lambda$, so $x$ must be the arc of a circle
$$(x - c_1)^2 + (t - c_2)^2 = r^2.$$
Recall that the curvature of a circle is the inverse of its radius, so $r = |\lambda|$ is the radius of the circle. From the boundary conditions, we have
$$(0 - c_1)^2 + (-1 - c_2)^2 = r^2, \qquad (0 - c_1)^2 + (1 - c_2)^2 = r^2,$$
so $c_2 = 0$ and $c_1^2 = r^2 - 1$. Then the arc length of the circle from $t = -1$ to $t = 1$ is $2 r \theta$, where $\theta = \tan^{-1}(1/c_1)$ is the angle between the line segments $((0, c_1), (0, 0))$ and $((0, c_1), (1, 0))$. ∎

We can also use the calculus of variations to find aligned vectors, as long as absolute values are not an issue.

Example 7.4.6. A linear functional on $L_{4/3}[0, 1]$ is represented by a function $y \in L_4[0, 1]$ that is non-negative on $[0, 1]$. Find the function $x(t)$ that maximizes $\langle x, y \rangle$ subject to the constraint $\|x\|_{4/3} = 1$.

Solution: Note that the constraint is equivalent to $\|x\|_{4/3}^{4/3} = 1$. Therefore, we want to find a stationary point of
$$J = \int_0^1 f(x, \dot{x}, t)\,dt = \int_0^1 \left( x(t) y(t) + \lambda (x(t))^{4/3} \right) dt.$$
$x$ must satisfy the Euler-Lagrange equation
$$f_x(x, \dot{x}, t) - \frac{d}{dt} f_{\dot{x}}(x, \dot{x}, t) = y + \frac{4}{3} \lambda x^{1/3} = 0,$$
so $x(t) = K (y(t))^3$, where $K$ is a constant chosen so that $\|x\|_{4/3} = 1$. For example, if $y(t) = e^t$, then
$$K^{4/3} \int_0^1 e^{4t}\,dt = 1 \quad\Longrightarrow\quad K = \left( \int_0^1 e^{4t}\,dt \right)^{-3/4} = \left( \frac{4}{e^4 - 1} \right)^{3/4}$$
and
$$x(t) = \left( \frac{4}{e^4 - 1} \right)^{3/4} e^{3t}.$$
∎
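The alignment claim in Example 7.4.6 can be checked numerically for $y(t) = e^t$: with $x(t) = K e^{3t}$ and $K = (4/(e^4 - 1))^{3/4}$, we expect $\|x\|_{4/3} = 1$ and $\langle x, y \rangle = \|y\|_4$. The sketch below uses a simple midpoint rule; the helper name `integrate` and the grid size are illustrative assumptions.

```python
import math

K = (4 / (math.e**4 - 1)) ** 0.75  # normalizing constant for y(t) = e^t

def integrate(func, n=20000):
    """Midpoint rule on [0, 1]."""
    h = 1.0 / n
    return sum(func((k + 0.5) * h) for k in range(n)) * h

x = lambda t: K * math.exp(3 * t)
y = lambda t: math.exp(t)

norm_x = integrate(lambda t: x(t) ** (4 / 3)) ** 0.75   # ||x||_{4/3}
norm_y = integrate(lambda t: y(t) ** 4) ** 0.25         # ||y||_4
pairing = integrate(lambda t: x(t) * y(t))              # <x, y>

print(norm_x, norm_y, pairing)
```

If the two printed quantities `norm_y` and `pairing` agree while `norm_x` is 1, then $x$ attains the norm of the functional, which is what alignment means here.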
Example 7.4.7. In many of the problems we solved in Hilbert space, the goal was to minimize the norm of a vector $x \in L_2[0, 1]$ subject to the constraints $(y_i, x) = c_i$ for $i = 1, \ldots, n$. Using the calculus of variations, show that $x$ is a linear combination of the constraints.

Solution: We want to find a stationary point of
$$J = \int_0^1 f(x, \dot{x}, t)\,dt = \int_0^1 \left( (x(t))^2 + \sum_{i=1}^n \lambda_i y_i(t) x(t) \right) dt.$$
Then $x$ must satisfy the Euler-Lagrange equation
$$f_x(x, \dot{x}, t) - \frac{d}{dt} f_{\dot{x}}(x, \dot{x}, t) = 2x + \sum_{i=1}^n \lambda_i y_i = 0,$$
so $x$ is a linear combination of the constraints, as we saw before. ∎

In $L_p[0, 1]$, $x$ is still aligned with the constraints, but aligned is no longer equivalent to proportional.

Example 7.4.8. Find the function $x \in L_4[0, 1]$ of minimum norm such that
$$\int_0^1 t^3 x(t)\,dt = 1.$$

Solution: We want to find a stationary point of
$$J = \int_0^1 f(x, \dot{x}, t)\,dt = \int_0^1 \left( (x(t))^4 + \lambda t^3 x(t) \right) dt.$$
Then $x$ must satisfy the Euler-Lagrange equation
$$f_x(x, \dot{x}, t) - \frac{d}{dt} f_{\dot{x}}(x, \dot{x}, t) = 4x^3 + \lambda t^3 = 0,$$
so $x(t)$ is proportional to $t$. Writing $x(t) = ct$, the constraint $\int_0^1 t^3 \cdot ct\,dt = c/5 = 1$ gives $x(t) = 5t$. ∎
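A quick numerical comparison supports the answer to Example 7.4.8. The sketch below takes the candidate $x(t) = 5t$ (the multiple of $t$ that satisfies the constraint) together with three other functions that also satisfy $\int_0^1 t^3 x(t)\,dt = 1$, and compares their $L_4$ norms. The competitor functions and the helper names are hypothetical choices made only for illustration.

```python
def integrate(func, n=20000):
    """Midpoint rule on [0, 1]."""
    h = 1.0 / n
    return sum(func((k + 0.5) * h) for k in range(n)) * h

# Each candidate satisfies the constraint int_0^1 t^3 x(t) dt = 1.
candidates = {
    "5t": lambda t: 5 * t,
    "4": lambda t: 4.0,
    "6t^2": lambda t: 6 * t**2,
    "8t^4": lambda t: 8 * t**4,
}

def constraint(x):
    """Value of int_0^1 t^3 x(t) dt."""
    return integrate(lambda t: t**3 * x(t))

def norm4(x):
    """L4 norm on [0, 1]."""
    return integrate(lambda t: abs(x(t)) ** 4) ** 0.25

for name, x in candidates.items():
    print(name, round(constraint(x), 6), round(norm4(x), 4))
```

Among these four, $5t$ should have the smallest norm, $5^{3/4}$, which equals $1/\|t^3\|_{4/3}$ as alignment predicts.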
CHAPTER 8
CONVEX FUNCTIONALS

Reading: [1, Sections 7.8, 7.10-7.12]

8.1 Local to Global

Recall from single variable calculus that if a function(al) $f$ satisfies $f''(x) \ge 0$ on $[a, b]$, then any point $x$ where $f'(x) = 0$ is not merely a local minimum but a global minimum. The following theorem helps us generalize the second derivative test by relating the second derivative to convexity.

THEOREM 8.1.1. If $f(x)$ is twice differentiable on $C = [a, b]$, then it is convex on $C$ if and only if $f''(x) \ge 0$ on $C$.

Proof: First we show that convexity implies $f'' \ge 0$: specifically, we show that if $f$ is a convex function, then $f'$ is increasing or constant on $C$. Pick $x_1, x_2 \in C$ such that $x_1 < x_2$. Then
$$f'(x_2) - f'(x_1) = \lim_{h \to 0^+} \frac{\left( f(x_2 + h) - f(x_2) \right) - \left( f(x_1 + h) - f(x_1) \right)}{h},$$
so we simply need to show that $f(x_2 + h) + f(x_1) \ge f(x_1 + h) + f(x_2)$ for $h > 0$ if $f$ is convex. Let
$$\alpha = \frac{h}{x_2 - x_1 + h}$$
so that $x_1 + h = (1 - \alpha) x_1 + \alpha (x_2 + h)$ and $x_2 = \alpha x_1 + (1 - \alpha)(x_2 + h)$. Then
$$f(x_1 + h) + f(x_2) = f((1 - \alpha) x_1 + \alpha (x_2 + h)) + f(\alpha x_1 + (1 - \alpha)(x_2 + h)) \le f(x_1) + f(x_2 + h),$$
where the inequality follows since $f$ is convex.

Next we show the converse: specifically, we show that if $f''(x) \ge 0$ for $x \in C$, then $f(\alpha x_1 + (1 - \alpha) x_2) \le \alpha f(x_1) + (1 - \alpha) f(x_2)$ for any $x_1, x_2 \in C$ with $x_1 < x_2$. The proof is by contradiction: suppose that there exists some $\alpha \in (0, 1)$ such that the previous inequality is false. Then by the mean value theorem, there exists some $z_1 \in (x_1, \alpha x_1 + (1 - \alpha) x_2)$ such that
$$f'(z_1) = \frac{f(\alpha x_1 + (1 - \alpha) x_2) - f(x_1)}{(1 - \alpha)(x_2 - x_1)} > \frac{\alpha f(x_1) + (1 - \alpha) f(x_2) - f(x_1)}{(1 - \alpha)(x_2 - x_1)} = \frac{f(x_2) - f(x_1)}{x_2 - x_1},$$
where the inequality follows from our assumption. Similarly, there exists some $z_2 \in (\alpha x_1 + (1 - \alpha) x_2, x_2)$ such that
$$f'(z_2) = \frac{f(x_2) - f(\alpha x_1 + (1 - \alpha) x_2)}{\alpha (x_2 - x_1)} < \frac{f(x_2) - \alpha f(x_1) - (1 - \alpha) f(x_2)}{\alpha (x_2 - x_1)} = \frac{f(x_2) - f(x_1)}{x_2 - x_1},$$
so $f'(z_2) < f'(z_1)$, but this is a contradiction since we chose $z_2 > z_1$ and $f'$ is increasing or constant. Therefore, $f$ must be convex. ∎
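Definition 8.1.2 (convex functional). A real-valued functional $f$ defined on a convex subset $C$ of a linear vector space is convex if
$$f(\alpha x_1 + (1 - \alpha) x_2) \le \alpha f(x_1) + (1 - \alpha) f(x_2)$$
for all $x_1, x_2 \in C$ and all $\alpha \in (0, 1)$.

Example 8.1.3. Check that on $L_2[0, 1]$, the functional
$$f(x) = \int_0^1 x^2(t)\,dt$$
is convex.

Solution: We have
$$f(\alpha x_1 + (1 - \alpha) x_2) = \int_0^1 (\alpha x_1(t) + (1 - \alpha) x_2(t))^2\,dt = \alpha \int_0^1 x_1^2(t)\,dt + (1 - \alpha) \int_0^1 x_2^2(t)\,dt - \underbrace{\alpha (1 - \alpha) \int_0^1 (x_1(t) - x_2(t))^2\,dt}_{\ge 0},$$
so $f(\alpha x_1 + (1 - \alpha) x_2) \le \alpha f(x_1) + (1 - \alpha) f(x_2)$. ∎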
QUIZ THEOREM 15 ([1, Proposition 7.8.1, p. 191]). Let $f$ be a convex functional defined on a convex set $C$ in a normed space. Let $\mu = \inf_{x \in C} f(x)$. Then
1. The subset $\Omega \subset C$ where $f(x) = \mu$ is convex.
2. If $x_0$ is a local minimum of $f$, then $f(x_0) = \mu$ and $x_0$ is therefore a global minimum of $f$.

Proof: First, suppose $x_1, x_2 \in \Omega$. Then for $x = \alpha x_1 + (1 - \alpha) x_2$ with $\alpha \in (0, 1)$, we have $f(x) \le \alpha f(x_1) + (1 - \alpha) f(x_2) = \mu$. Furthermore, for any $x \in C$, $f(x) \ge \mu$ by definition. Therefore, $f(x) = \mu$. Next, suppose $N$ is a neighborhood about $x_0$ in which $x_0$ minimizes $f$. For any $x \in C$ with $x \neq x_0$, there exists some $\alpha \in (0, 1)$ such that $x_1 = \alpha x_0 + (1 - \alpha) x \in N$. Then
$$f(x_0) \overset{(a)}{\le} f(x_1) \overset{(b)}{\le} \alpha f(x_0) + (1 - \alpha) f(x),$$
where (a) follows from the fact that $x_0$ is a local minimum and (b) follows from the convexity of $f$. Rearranging terms, we have $f(x_0) \le f(x)$ for all $x \in C$, so $x_0$ is a global minimum. ∎
Definition 8.1.4 (epigraph). The epigraph of a convex functional $f$ defined on a convex set $C$ in a vector space $X$ is the set
$$[f, C] = \{ (r, x) \in \mathbb{R} \times X : x \in C,\ f(x) \le r \}.$$

Example 8.1.5. Show that the epigraph is a convex set.

Solution: Pick $(x_1, y_1)$ and $(x_2, y_2)$ such that $f(x_1) \le y_1$ and $f(x_2) \le y_2$. Then for $\alpha \in (0, 1)$,
$$\alpha y_1 + (1 - \alpha) y_2 \ge \alpha f(x_1) + (1 - \alpha) f(x_2) \ge f(\alpha x_1 + (1 - \alpha) x_2),$$
where the last step follows since $f$ is convex. Therefore the point $(\alpha x_1 + (1 - \alpha) x_2, \alpha y_1 + (1 - \alpha) y_2)$ is on or above the graph of $f$, so the region on or above the graph of $f$ is convex. ∎
8.2 Conjugate Convex Functionals

Definition 8.2.1 (conjugate convex). Let $f$ be a convex functional defined on a convex set $C$ in a normed space $X$. The conjugate set $C^*$ is defined as
$$C^* = \left\{ x^* \in X^* : \sup_{x \in C} \left( \langle x, x^* \rangle - f(x) \right) < \infty \right\}$$
and the functional $f^*$ conjugate to $f$ is defined on $C^*$ as
$$f^*(x^*) = \sup_{x \in C} \left( \langle x, x^* \rangle - f(x) \right).$$

Example 8.2.2. Let $C = [0, 1]$ and $f(x) = x^2$. Find $C^*$, $f^*(x^*)$, $C^{**}$, and $f^{**}$.

Figure 8.2.1: The epigraphs $[f, C]$ and $[f^*, C^*]$ in Example 8.2.2. The linear functionals $mx$ for $m = 0, 1, 2$ are shown on $[f, C]$, and the linear functionals $xm$ for $x = 0, 1/2, 1$ are shown on $[f^*, C^*]$.

Solution: Any element of the dual space can be represented by a real number $m$ in the form $\langle x, x^* \rangle = mx$. Figure 8.2.1 shows the convex sets $[f, C]$ and $[f^*, C^*]$. From the graph and using elementary calculus, we see that
$$f^*(x^*) = f^*(m) = \begin{cases} 0 & m \le 0 \\ \dfrac{m^2}{4} & 0 \le m \le 2 \\ m - 1 & m \ge 2, \end{cases}$$
so $C^* = \mathbb{R}$. Furthermore, $-f^*(m)$ is the vertical coordinate of the point where the hyperplane (line) tangent to the graph of $f$ and parallel to the subspace $mx$ intersects the vertical axis. For $m \le 0$, clearly the tangent line and $mx$ are equal, so $0 = -f^*(m)$. For $m \in (0, 2)$, the tangent line is given by $y = 2x_0 (x - x_0) + x_0^2$: for $m = 2x_0$, the vertical intercept of this line is $-m^2/4 = -f^*(m)$. For $m \ge 2$, the tangent line is given by $y = m(x - 1) + 1$: the vertical intercept of this line is $-(m - 1) = -f^*(m)$. In this simple case, $X^{**} = X$, and any element of $X^{**}$ can be represented by a real number $x$ in the form $\langle x^*, x \rangle = xm$. Clearly $f^{**}(x)$ is undefined except on $[0, 1]$, so $C^{**} = [0, 1]$. Using elementary calculus, we see that $f^{**}(x) = x^2$ for $x \in (0, 1)$. ∎

Example 8.2.3. Let $C = [1, 2]$ and $f(x) = 0$. Find $C^*$, $f^*(x^*)$, $C^{**}$, and $f^{**}$.

Figure 8.2.2: The epigraphs $[f, C]$ and $[f^*, C^*]$ in Example 8.2.3. The linear functionals $mx$ for $m = -1, 0, 1$ are shown on $[f, C]$, and the linear functionals $xm$ for $x = 1, 3/2, 2$ are shown on $[f^*, C^*]$.

Solution: We construct $C^*$ and $f^*$ using the support functional approach suggested in the previous example: construct the hyperplane, parallel to the subspace $mx$, that touches $f$ at one or more points in $C$ and lies below the graph elsewhere. Then $-f^*(m)$ is the vertical intercept of the hyperplane. If no such hyperplane exists, then $m \notin C^*$. Any element of the dual space can be represented by a real number $m$ in the form $\langle x, x^* \rangle = mx$. Figure 8.2.2 shows the convex sets $[f, C]$ and $[f^*, C^*]$. Clearly,
$$f^*(x^*) = f^*(m) = \begin{cases} m & m \le 0 \\ 2m & m \ge 0 \end{cases}$$
and $C^* = \mathbb{R}$. Again, we use the support functional approach to find $f^{**}$ and $C^{**}$: from Figure 8.2.2, clearly $f^{**} = f$ and $C^{**} = C$. ∎
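The closed forms in Examples 8.2.2 and 8.2.3 can be compared against a brute-force evaluation of the defining supremum. The sketch below approximates $f^*(m) = \sup_{x \in C}(mx - f(x))$ on a uniform grid, taking $C = [0, 1]$, $f(x) = x^2$ and $C = [1, 2]$, $f(x) = 0$. The function names and grid resolution are illustrative assumptions.

```python
def conjugate(f, lo, hi, m, n=10000):
    """Approximate f*(m) = sup over x in [lo, hi] of m*x - f(x) on a uniform grid."""
    step = (hi - lo) / n
    return max(m * (lo + k * step) - f(lo + k * step) for k in range(n + 1))

def fstar_822(m):
    """Closed form for C = [0, 1], f(x) = x^2."""
    if m <= 0:
        return 0.0
    return m * m / 4 if m <= 2 else m - 1.0

def fstar_823(m):
    """Closed form for C = [1, 2], f(x) = 0."""
    return m if m <= 0 else 2 * m

for m in (-3, -1, 0, 0.5, 1, 1.5, 2, 3, 4):
    print(m,
          conjugate(lambda x: x * x, 0.0, 1.0, m), fstar_822(m),
          conjugate(lambda x: 0.0, 1.0, 2.0, m), fstar_823(m))
```

The grid values should agree with the piecewise formulas up to discretization error, including at the breakpoints $m = 0$ and $m = 2$.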
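For the next example, note that if $\frac{1}{p} + \frac{1}{q} = 1$, then
$$(1 - p)(1 - q) = 1, \qquad q = p(q - 1), \qquad pq = p + q, \qquad p = q(p - 1).$$

Example 8.2.4. Let $C = E^n$ and $f(x) = \frac{\kappa}{p} \|x\|_p^p$ for $1 < p < \infty$ and $\kappa > 0$. Find $C^*$ and $f^*$.

Solution: Let $x = (\xi_1, \ldots, \xi_n) \in C$ and represent an element $x^* \in C^* \subset C$ as $(\eta_1, \ldots, \eta_n)$. Then
$$f^*(x^*) = \sup_{x \in C} \left( \langle x, x^* \rangle - f(x) \right) = \sup_{x \in C} \sum_{i=1}^n \left( \xi_i \eta_i - \frac{\kappa}{p} |\xi_i|^p \right).$$
The supremum exists since the sum inside the supremum is finite. Differentiating with respect to each $\xi_i$ gives
$$\eta_i = \kappa |\xi_i|^{p-1} \operatorname{sgn} \xi_i \quad \text{for } i = 1, \ldots, n, \tag{8.2.1}$$
so
$$f^*(x^*) = \sum_{i=1}^n \left( \xi_i \eta_i - \frac{\kappa}{p} |\xi_i|^p \right) = \frac{\kappa}{q} \sum_{i=1}^n |\xi_i|^p.$$
Taking absolute values, raising both sides of (8.2.1) to the power $q$, and using the identity $p = q(p - 1)$ gives $|\eta_i|^q = \kappa^q |\xi_i|^p$; substituting this above gives
$$f^*(x^*) = \kappa^{1-q}\, \frac{1}{q} \sum_{i=1}^n |\eta_i|^q = \kappa^{1-q}\, \frac{1}{q} \|x^*\|_q^q.$$
∎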
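QUIZ THEOREM 16. Let $C = L_p(0, 1)$ for $1 < p < \infty$ and
$$f(u) = \frac{1}{p} \int_0^1 (u(t))^p\,dt.$$
Find $C^*$ and $f^*$.

Proof: We can represent an element $y \in C^* \subset L_q(0, 1)$, where $\frac{1}{p} + \frac{1}{q} = 1$, as
$$\langle y, u \rangle = \int_0^1 y(t) u(t)\,dt.$$
Then
$$f^*(y) = \sup_{u \in C} \left( \langle y, u \rangle - f(u) \right) = \sup_{u \in C} \int_0^1 \left( y(t) u(t) - \frac{1}{p} (u(t))^p \right) dt;$$
that is, we want to find a stationary point of
$$J = \int_0^1 f(u, \dot{u}, t)\,dt = \int_0^1 \left( y(t) u(t) - \frac{1}{p} (u(t))^p \right) dt.$$
Then $u$ must satisfy the Euler-Lagrange equation
$$f_u(u, \dot{u}, t) - \frac{d}{dt} f_{\dot{u}}(u, \dot{u}, t) = y - u^{p-1} = 0,$$
so $y = u^{p-1}$. We check that $y \in L_q(0, 1)$:
$$\int_0^1 (y(t))^q\,dt = \int_0^1 (u(t))^{(p-1)q}\,dt = \int_0^1 (u(t))^p\,dt < \infty,$$
where boundedness follows since $u \in L_p(0, 1)$. Substituting $yu = u^p = y^q$ back into the supremum gives
$$f^*(y) = \left( 1 - \frac{1}{p} \right) \int_0^1 (u(t))^p\,dt = \frac{1}{q} \int_0^1 (y(t))^q\,dt,$$
with $C^* = L_q(0, 1)$. ∎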
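QUIZ THEOREM 17 (first part of [1, Proposition 7.10.1, p. 196]). The conjugate set $C^*$ and the conjugate functional $f^*$ are convex.

Proof: For any $x_1^*, x_2^* \in C^*$ and any $\alpha \in (0, 1)$,
$$\begin{aligned}
f^*(\alpha x_1^* + (1 - \alpha) x_2^*) &= \sup_{x \in C} \left( \langle x, \alpha x_1^* + (1 - \alpha) x_2^* \rangle - f(x) \right) \\
&= \sup_{x \in C} \left( \alpha (\langle x, x_1^* \rangle - f(x)) + (1 - \alpha)(\langle x, x_2^* \rangle - f(x)) \right) \\
&\le \alpha \sup_{x \in C} \left( \langle x, x_1^* \rangle - f(x) \right) + (1 - \alpha) \sup_{x \in C} \left( \langle x, x_2^* \rangle - f(x) \right) \\
&= \alpha f^*(x_1^*) + (1 - \alpha) f^*(x_2^*).
\end{aligned}$$
Therefore, $f^*$ is convex. Since $C^*$ is the set of all $x^*$ where $f^*(x^*)$ is defined, $C^*$ is also convex. ∎

Proposition 8.2.5 (second part of [1, Proposition 7.10.1, p. 196]). $[f^*, C^*]$ is a closed convex subset of $\mathbb{R} \times X^*$.

Proof: From Quiz Theorem 17, $[f^*, C^*]$ is convex. To show that it is closed, let $\{(s_i, x_i^*)\}$ be a convergent sequence in $[f^*, C^*]$ with $(s_i, x_i^*) \to (s, x^*)$. We must show that $(s, x^*) \in [f^*, C^*]$. For every $i$ and every $x \in C$,
$$s_i \overset{(a)}{\ge} f^*(x_i^*) \overset{(b)}{\ge} \langle x, x_i^* \rangle - f(x),$$
where (a) follows by definition of the epigraph and (b) follows by definition of $f^*$. Taking the limit as $i \to \infty$ gives $s \ge \langle x, x^* \rangle - f(x)$ for all $x \in C$, hence $s \ge f^*(x^*)$, so $(s, x^*) \in [f^*, C^*]$. ∎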
The following proposition confirms our intuition about recovering $C$ and $f$ from $C^*$ and $f^*$, as we did in the previous examples:

Proposition 8.2.6 ([1, Proposition 7.10.2, p. 197]). Let $f$ be a convex functional on a convex set $C$ in a normed space $X$. If $[f, C]$ is closed, then $[f, C] = [[f, C]^*]^*$.

Example 8.2.7. Let $C = D^2$ (the closed unit disk in $E^2$) and let $f(x) = f(u, v) = 0$. Find $C^*$, $f^*$, $C^{**}$, and $f^{**}$.

Solution: We actually found $C^* = \mathbb{R}^2$ and $f^*(a, b) = \sqrt{a^2 + b^2}$ in Example 1.3.2. By Proposition 8.2.6, $C^{**} = C$ and $f^{**} = f$ since $[f, C] = [0, \infty) \times D^2 \subset \mathbb{R}^3$ is closed. ∎
8.3 Conjugate Concave Functionals

Definition 8.3.1 (concave functional). A functional $g$ defined on a convex set $D$ of a linear vector space is concave if $-g$ is convex.

Note that the set $D$ is still convex, not concave. Furthermore, a functional is linear if and only if it is both convex and concave. Just as we defined the epigraph for convex functionals, we define the hypograph for concave functionals:

Definition 8.3.2 (hypograph). The hypograph of a concave functional $g$ defined on a convex set $D$ in a vector space $X$ is the set
$$[g, D] = \{ (r, x) \in \mathbb{R} \times X : x \in D,\ g(x) \ge r \}.$$

The hypograph is a convex set: the proof is analogous to that in Example 8.1.5. We can also define conjugate functionals and sets for concave functionals, just as we did for convex functionals:

Definition 8.3.3 (conjugate concave). Let $g$ be a concave functional defined on a convex set $D$ in a normed space $X$. The conjugate set $D^*$ is defined as
$$D^* = \left\{ x^* \in X^* : \inf_{x \in D} \left( \langle x, x^* \rangle - g(x) \right) > -\infty \right\}$$
and the functional $g^*$ conjugate to $g$ is defined on $D^*$ as
$$g^*(x^*) = \inf_{x \in D} \left( \langle x, x^* \rangle - g(x) \right).$$

Example 8.3.4. Let $D = [0, 1]$ and $g(x) = -x^2$. Find $D^*$ and $g^*$.

Solution: Any element of the dual space can be represented by a real number $m$ in the form $\langle x, x^* \rangle = mx$. Figure 8.3.1 shows the convex sets $[g, D]$ and $[g^*, D^*]$. From the graph and using elementary calculus, we see that
$$g^*(x^*) = g^*(m) = \begin{cases} 0 & m \ge 0 \\ -\dfrac{m^2}{4} & -2 \le m \le 0 \\ m + 1 & m \le -2 \end{cases}$$
and $D^* = \mathbb{R}$. From Figure 8.3.1, clearly $g^{**} = g$ and $D^{**} = D$. Note that $g^*(m) = -f^*(-m)$ from Example 8.2.2. ∎

Figure 8.3.1: The hypographs $[g, D]$ and $[g^*, D^*]$ in Example 8.3.4. The linear functionals $mx$ for $m = 0, -1, -2$ are shown on $[g, D]$, and the linear functionals $xm$ for $x = 0, 1/2, 1$ are shown on $[g^*, D^*]$.

Example 8.3.5. Let $D = [1, 2]$ and $g(x) = 0$. Find $D^*$ and $g^*$.

Figure 8.3.2: The hypographs $[g, D]$ and $[g^*, D^*]$ in Example 8.3.5. The linear functionals $mx$ for $m = -1, 0, 1$ are shown on $[g, D]$, and the linear functionals $xm$ for $x = 1, 3/2, 2$ are shown on $[g^*, D^*]$.

Solution: Any element of the dual space can be represented by a real number $m$ in the form $\langle x, x^* \rangle = mx$. Figure 8.3.2 shows the convex sets $[g, D]$ and $[g^*, D^*]$. From the graph,
$$g^*(x^*) = g^*(m) = \begin{cases} 2m & m \le 0 \\ m & m \ge 0 \end{cases}$$
and $D^* = \mathbb{R}$. From Figure 8.3.2, clearly $g^{**} = g$ and $D^{**} = D$. Note that $g^*(m) = -f^*(-m)$ from Example 8.2.3. ∎
Example 8.3.6. Let $D = (0, \infty]^n$ equipped with the Euclidean norm and
$$g(x) = \frac{\kappa}{p} \sum_{i=1}^n \xi_i^p$$
for $x = \{\xi_1, \ldots, \xi_n\}$, $p < 1$ and $\kappa > 0$. Find $D^*$ and $g^*$.

Solution: The proof is nearly the same as in Example 8.2.4. If we represent an element $x^* \in D^* \subset D$ as $\{\eta_1, \ldots, \eta_n\}$, we have
$$g^*(x^*) = \kappa^{1-q}\, \frac{1}{q} \sum_{i=1}^n \eta_i^q.$$
∎
8.4 Fenchel Duality

Suppose that $f$ is convex over convex domain $C$ and $g$ is concave over convex domain $D$. A standard minimization problem is to find the minimum vertical distance
$$\inf_{x \in C \cap D} \left( f(x) - g(x) \right).$$
While $g(x)$ usually equals zero, introducing $g$ is helpful because it allows us to exploit duality: the minimum vertical separation between the convex sets $[f, C]$ and $[g, D]$ equals the maximum vertical separation between two parallel hyperplanes separating $[f, C]$ and $[g, D]$, as shown in Figure 8.4.1.

Figure 8.4.1: The minimum vertical separation between the convex sets $[f, C]$ and $[g, D]$ equals the maximum vertical separation between two parallel hyperplanes separating $[f, C]$ and $[g, D]$.
QUIZ THEOREM 18 (Fenchel Duality Theorem ([1, Theorem 7.12.1, p. 201])). Suppose that $f$ is convex over $C$ and $g$ is concave over $D$, where $C, D$ are convex sets in a normed space $X$. Assume that $C \cap D$ contains points in the relative interiors of $C$ and $D$ and that either $[f, C]$ or $[g, D]$ has nonempty interior. Further assume that
$$\mu = \inf_{x \in C \cap D} \left( f(x) - g(x) \right)$$
is finite. Then
$$\mu = \inf_{x \in C \cap D} \left( f(x) - g(x) \right) = \max_{x^* \in C^* \cap D^*} \left( g^*(x^*) - f^*(x^*) \right),$$
where the maximum on the right is achieved by some $x_0^* \in C^* \cap D^*$. If the infimum on the left is achieved by some $x_0 \in C \cap D$, then
$$\max_{x \in C} \left( \langle x, x_0^* \rangle - f(x) \right) = \langle x_0, x_0^* \rangle - f(x_0)$$
and
$$\min_{x \in D} \left( \langle x, x_0^* \rangle - g(x) \right) = \langle x_0, x_0^* \rangle - g(x_0).$$
Proof: By definition of $f^*$ and $g^*$, for all $x \in C \cap D$ and $x^* \in C^* \cap D^*$,
$$f^*(x^*) \ge \langle x, x^* \rangle - f(x), \qquad g^*(x^*) \le \langle x, x^* \rangle - g(x),$$
so $f(x) - g(x) \ge g^*(x^*) - f^*(x^*)$. Since this holds for all $x \in C \cap D$ and $x^* \in C^* \cap D^*$,
$$\inf_{x \in C \cap D} \left( f(x) - g(x) \right) \ge \sup_{x^* \in C^* \cap D^*} \left( g^*(x^*) - f^*(x^*) \right).$$
To prove equality, we must find an $x_0^* \in C^* \cap D^*$ such that $g^*(x_0^*) - f^*(x_0^*) = \mu$.

By definition of $\mu$, the sets $[f - \mu, C]$ and $[g, D]$ are arbitrarily close but have disjoint relative interiors. Since one of the sets $[f - \mu, C]$ and $[g, D]$ has nonempty relative interior, there is a closed hyperplane in $\mathbb{R} \times X$ separating the sets. The hyperplane cannot be vertical, otherwise its vertical projection onto $X$ would separate $C$ and $D$. Therefore, we can represent the hyperplane as
$$\{ (r, x) \in \mathbb{R} \times X : \langle x, x_0^* \rangle - r = c \}$$
for some $x_0^* \in X^*$ and some $c \in \mathbb{R}$. Now $[g, D]$ lies below this hyperplane but is arbitrarily close to it, so
$$c = \inf_{x \in D} \left( \langle x, x_0^* \rangle - g(x) \right) = g^*(x_0^*) \tag{8.4.1}$$
and similarly, $[f - \mu, C]$ lies above the hyperplane but is also arbitrarily close to it, so
$$c = \sup_{x \in C} \left( \langle x, x_0^* \rangle - (f(x) - \mu) \right) = f^*(x_0^*) + \mu, \tag{8.4.2}$$
so $\mu = g^*(x_0^*) - f^*(x_0^*)$. If the infimum $\mu$ is achieved by some $x_0 \in C \cap D$, then the sets $[f - \mu, C]$ and $[g, D]$ have the point $(g(x_0), x_0)$ in common and this point is in the separating hyperplane, so by (8.4.1) and (8.4.2),
$$\max_{x \in C} \left( \langle x, x_0^* \rangle - f(x) \right) = \langle x_0, x_0^* \rangle - f(x_0)$$
and
$$\min_{x \in D} \left( \langle x, x_0^* \rangle - g(x) \right) = \langle x_0, x_0^* \rangle - g(x_0).$$
∎
The general strategy is to use the sets $C$ and $D$ to impose constraints. If we want to minimize a convex functional $f$, we choose $g(x) = 0$. Then we want to minimize $f - g$, which is equivalent to maximizing $g^* - f^*$. Since we chose $g(x) = 0$,
$$\inf_{x \in C \cap D} f(x) = \max_{x^* \in C^* \cap D^*} \left( g^*(x^*) - f^*(x^*) \right)$$
is the desired minimum of $f$. If we want to maximize a concave functional $g$, we choose $f(x) = 0$. Then we want to minimize $f - g$, which is equivalent to maximizing $g^* - f^*$. Since we chose $f(x) = 0$,
$$\sup_{x \in C \cap D} g(x) = \min_{x^* \in C^* \cap D^*} \left( f^*(x^*) - g^*(x^*) \right)$$
is the desired maximum of $g$.
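A one-dimensional instance makes the first recipe concrete. As an illustrative assumption (this instance is not taken from the notes), take $f(x) = x^2$ on $C = \mathbb{R}$ and $g(x) = 0$ on $D = [1, 2]$. Then $f^*(m) = m^2/4$ and $g^*(m) = \inf_{x \in [1,2]} mx = \min(m, 2m)$, and both sides of the duality relation can be found by grid search.

```python
def primal(n=20000):
    """inf over the intersection of C and D, i.e. [1, 2], of f(x) - g(x) = x^2."""
    return min((1 + k / n) ** 2 for k in range(n + 1))

def g_star(m):
    """g*(m) = inf over x in [1, 2] of m*x, for g = 0 on D = [1, 2]."""
    return min(m, 2 * m)

def f_star(m):
    """f*(m) = sup over x of m*x - x^2 = m^2 / 4, for f(x) = x^2 on C = R."""
    return m * m / 4

def dual(lo=-10.0, hi=10.0, n=20000):
    """Maximize g*(m) - f*(m) over a grid of slopes m; returns (value, argmax)."""
    return max((g_star(m) - f_star(m), m)
               for m in (lo + k * (hi - lo) / n for k in range(n + 1)))

print(primal(), dual())
```

Both searches should return the value 1, with the dual maximum attained at the slope $m = 2$, which is the slope of $f$ at the primal minimizer $x_0 = 1$.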
Example 8.4.1 (generalization of [1, Example 7.12.1, p. 202]). Suppose you have $w$ to spend on $n$ activities, each of which costs $w_i$ per unit of work for $i = 1, \ldots, n$. Assume that the return associated with activity $i$ is $g_i(x_i)$, where $x_i$ is the number of units of work allocated to the activity, and that $g_i$ is increasing and $g_i'$ is decreasing due to diminishing marginal returns (so $g$ is concave).

Solution: We write the problem formally as
$$\begin{aligned}
\max_{x_1, \ldots, x_n} \quad & g(x) = \sum_{i=1}^n g_i(x_i) && \text{(returns)} \\
\text{s.t.} \quad & \sum_{i=1}^n w_i x_i = w && \text{(budget constraint)} \\
& x_1, \ldots, x_n \ge 0 && \text{(positivity constraints)}
\end{aligned}$$
where $x = (x_1, \ldots, x_n)$. To solve the problem using Fenchel duality, set
$$C = \left\{ x : \sum_{i=1}^n w_i x_i = w \right\}$$
to capture the budget constraint, and set $D = [0, \infty)^n$ to capture the positivity constraints. Set $f(x) = 0$ since we are maximizing a concave functional $g$. Then for $x^* = (x_1^*, \ldots, x_n^*)$,
$$f^*(x^*) = \sup_{x \in C} \sum_{i=1}^n x_i x_i^*,$$
which is finite only if $x^* = \lambda (w_1, \ldots, w_n)$, in which case $f^*(x^*) = \lambda w$. Therefore, $C^* = \{ \lambda (w_1, \ldots, w_n) \}$. Furthermore,
$$g_i^*(x_i^*) = \inf_{x_i \in [0, \infty)} \left( x_i x_i^* - g_i(x_i) \right)$$
is defined for all $x_i^* > 0$ since $g_i(x_i)$ is concave, so
$$g^*(x^*) = \inf_{x \in D} \sum_{i=1}^n \left( x_i x_i^* - g_i(x_i) \right) = \sum_{i=1}^n \inf_{x_i \in [0, \infty)} \left( x_i x_i^* - g_i(x_i) \right) = \sum_{i=1}^n g_i^*(x_i^*)$$
is defined for all $x^* \in D^* \subset D$. Therefore, the dual problem is
$$\min_{x^* \in C^* \cap D^*} \left( f^*(x^*) - g^*(x^*) \right) = \min_{\lambda} \left[ \lambda w - \sum_{i=1}^n g_i^*(\lambda w_i) \right].$$
Given the optimal $\lambda$, $x_i$ is the value for which the infimum defining $g_i^*(x_i^*)$ is achieved, with $x_i^* = \lambda w_i$, for $i = 1, \ldots, n$. ∎
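Example 8.4.2. As a concrete application of Example 8.4.1, say you have $w = 5$ to spend on two presents: $x_1$ on the first and $x_2$ on the second. Maximize the total gratitude you generate, where total gratitude is given by $g(x) = g_1(x_1) + g_2(x_2)$, with $g_1(x_1) = 2\sqrt{x_1}$ and $g_2(x_2) = \sqrt{x_2}$.

Solution: By elementary calculus, we have
$$g_1^*(x_1^*) = \inf_{x_1 \in [0, \infty)} \left( x_1 x_1^* - 2\sqrt{x_1} \right) = \frac{1}{x_1^*} - \frac{2}{x_1^*} = -\frac{1}{x_1^*},$$
$$g_2^*(x_2^*) = \inf_{x_2 \in [0, \infty)} \left( x_2 x_2^* - \sqrt{x_2} \right) = \frac{1}{4 x_2^*} - \frac{1}{2 x_2^*} = -\frac{1}{4 x_2^*},$$
where the infima are achieved at $x_1 = (1/x_1^*)^2$ and $x_2 = (1/(2 x_2^*))^2$. Here $w_1 = w_2 = 1$, so $x_1^* = x_2^* = \lambda$ and the dual problem is
$$\min_{\lambda} \left[ \lambda w + \frac{1}{\lambda} + \frac{1}{4\lambda} \right];$$
taking the derivative with respect to $\lambda$, we have
$$0 = w - \frac{1}{\lambda^2} - \frac{1}{4\lambda^2} = 5 - \frac{5}{4\lambda^2} \quad\Longrightarrow\quad \lambda = \frac{1}{2}.$$
Then $x_1^* = x_2^* = 1/2$, so $x_1 = 4$ and $x_2 = 1$. The maximum gratitude is $g(4, 1) = 2\sqrt{4} + \sqrt{1} = 5$. ∎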
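The numbers in Example 8.4.2 are easy to confirm by brute force. The sketch below searches the primal problem along the budget line $x_1 + x_2 = 5$ and the dual objective $\lambda w + 1/\lambda + 1/(4\lambda)$ over a grid of multipliers; the function names and grid ranges are illustrative assumptions.

```python
import math

W = 5.0  # total budget

def gratitude(x1, x2):
    """Total gratitude g(x) = 2*sqrt(x1) + sqrt(x2)."""
    return 2 * math.sqrt(x1) + math.sqrt(x2)

def primal(n=5000):
    """Grid search over x1 in [0, W] with x2 = W - x1; returns (value, x1)."""
    return max((gratitude(W * k / n, max(W - W * k / n, 0.0)), W * k / n)
               for k in range(n + 1))

def dual_objective(lam):
    """lam*w - g1*(lam) - g2*(lam), using g1*(s) = -1/s and g2*(s) = -1/(4s)."""
    return lam * W + 1 / lam + 1 / (4 * lam)

def dual(n=5000, lo=0.01, hi=5.0):
    """Grid search over the multiplier; returns (value, lam)."""
    return min((dual_objective(lo + k * (hi - lo) / n), lo + k * (hi - lo) / n)
               for k in range(n + 1))

print(primal(), dual())
```

The primal maximum and the dual minimum should both be close to 5, attained near $x_1 = 4$ and $\lambda = 1/2$ respectively.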
Example 8.4.3. As another concrete application of Example 8.4.1, say you have $\$w$ million to invest in oil exploration and wind power. The government gives you a 50% subsidy for your wind power investment, so your budget constraint is $x_1 + \frac{1}{2} x_2 = w$. The total return on your investment is given by $g(x) = g_1(x_1) + g_2(x_2)$, where
$$g_1(x_1) = \begin{cases} 4 x_1 & 0 \le x_1 \le 3 \\ 9 + x_1 & x_1 \ge 3 \end{cases} \qquad g_2(x_2) = \begin{cases} 3 x_2 & 0 \le x_2 \le 2 \\ 4 + x_2 & x_2 \ge 2. \end{cases}$$

Figure 8.4.2: The hypographs $[g_1, D]$ and $[g_2, D]$ in Example 8.4.3.

Solution: From the graphs in Figure 8.4.2, we see that
$$g_1^*(x_1^*) = \begin{cases} 3 x_1^* - 12 & 1 \le x_1^* \le 4 \\ 0 & x_1^* \ge 4 \end{cases} \qquad g_2^*(x_2^*) = \begin{cases} 2 x_2^* - 6 & 1 \le x_2^* \le 3 \\ 0 & x_2^* \ge 3, \end{cases}$$
where the infima are achieved at $x_1 = 3$ if $1 \le x_1^* \le 4$ and $x_1 = 0$ if $x_1^* \ge 4$, and $x_2 = 2$ if $1 \le x_2^* \le 3$ and $x_2 = 0$ if $x_2^* \ge 3$. Then $D^* = [1, \infty)^2$, and $C^* \cap D^* = \{ \lambda (1, 1/2) : \lambda \ge 2 \}$. The dual problem is
$$\min_{\lambda \ge 2} \left[ \lambda w - g_1^*(\lambda) - g_2^*\!\left( \frac{\lambda}{2} \right) \right].$$
The table below shows values of the optimal $x_1$ and $x_2$ for various values of $w$, which we can determine using the graphs in Figure 8.4.3.

w      λ    x_1*   x_2*   x_1   x_2   g(x)
1/2    6    6      3      0     1     3
2      4    4      2      1     2     10
5      2    2      1      3     4     20

∎

Figure 8.4.3: The function $\lambda w - g_1^*(\lambda) - g_2^*(\lambda/2)$ in Example 8.4.3 for $w = 1/2, 2, 5$.

Example 8.4.4 ([1, Example 7.12.2, p. 203]). As yet another concrete application of Example 8.4.1, say you have $\$w$ to bet on a horse race with $n$ horses. Assume that we know that the public has bet $s_i$ on horse $i$ and that horse $i$ has probability $p_i$ of winning for $i = 1, \ldots, n$. We bet $x_i$ on horse $i$, and we must bet all of our money on the $n$ horses. Let $P$ denote the amount the track pays out to bettors on the winning horse (typically 80% of the total amount bet). Maximize your expected fraction of the total amount $P$ paid out.

Solution: Write $P = C \left( w + \sum_{i=1}^n s_i \right)$, where $w + \sum_{i=1}^n s_i$ is the total amount bet and $C$ is the fraction of it that the track pays out. If horse $i$ wins, we receive the share $x_i/(x_i + s_i)$ of $P$, so our expected net return is
$$C \left( w + \sum_{i=1}^n s_i \right) \sum_{i=1}^n \frac{x_i}{x_i + s_i}\, p_i - w.$$
Maximizing this quantity is equivalent to maximizing
$$g(x) = \sum_{i=1}^n g_i(x_i), \qquad g_i(x_i) = \frac{p_i x_i}{x_i + s_i}.$$
Remember that the value of the conjugate functional $g_i^*$ at $x_i^*$ is obtained by finding the lowest line of slope $x_i^*$ which lies above $g_i$. Therefore, $g_i^*(x_i^*)$ is undefined for $x_i^* < 0$, and since $g_i'(0) = p_i/s_i$, $g_i^*(x_i^*) = 0$ for $x_i^* \ge p_i/s_i$. By elementary calculus, we have for $0 < x_i^* < p_i/s_i$
$$\frac{d}{dx_i} \left( x_i x_i^* - \frac{p_i x_i}{x_i + s_i} \right) = x_i^* - \frac{p_i s_i}{(x_i + s_i)^2} = 0 \quad\Longrightarrow\quad x_i = \sqrt{\frac{p_i s_i}{x_i^*}} - s_i. \tag{8.4.3}$$
Then
$$g_i^*(x_i^*) = \inf_{x_i \in [0, \infty)} \left( x_i x_i^* - \frac{p_i x_i}{x_i + s_i} \right) = \begin{cases} -\left( \sqrt{p_i} - \sqrt{s_i x_i^*} \right)^2 & 0 < x_i^* < p_i/s_i \\ 0 & x_i^* \ge p_i/s_i, \end{cases} \tag{8.4.4}$$
where $x_i$ is determined from (8.4.3). Now we know from Example 8.4.1 that $x_i^* = \lambda$ for $i = 1, \ldots, n$. Rearrange the indices so that $p_1/s_1 \ge \cdots \ge p_n/s_n$. For the given $\lambda$, define $m$ as the largest index for which $p_i/s_i > \lambda$. From (8.4.3) and (8.4.4), our solution is
$$x_i = \begin{cases} \sqrt{\dfrac{p_i s_i}{\lambda}} - s_i & i = 1, \ldots, m \\ 0 & i = m + 1, \ldots, n. \end{cases}$$
∎
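The solution of Example 8.4.4 leaves $\lambda$ implicit. One way to pin it down, sketched below, is to require the bets to exhaust the budget: summing $x_i = \sqrt{p_i s_i/\lambda} - s_i$ over the first $m$ horses and setting the total to $w$ gives $\sqrt{\lambda} = \sum_{i \le m} \sqrt{p_i s_i} \,/\, (w + \sum_{i \le m} s_i)$. The search over $m$, the function names, and the sample data are assumptions made for illustration, not part of the notes.

```python
import math

def optimal_bets(p, s, w):
    """Bets x_i = max(sqrt(p_i*s_i/lam) - s_i, 0), with lam chosen so that sum(x) = w.

    Assumes every p_i > 0, every s_i > 0, and w > 0.
    """
    # Sort horses by decreasing ratio p_i / s_i.
    order = sorted(range(len(p)), key=lambda i: p[i] / s[i], reverse=True)
    # Shrink the active set until the last active horse receives a positive bet.
    for m in range(len(order), 0, -1):
        active = order[:m]
        root_lam = (sum(math.sqrt(p[i] * s[i]) for i in active)
                    / (w + sum(s[i] for i in active)))
        last = order[m - 1]
        if p[last] / s[last] > root_lam ** 2:
            break
    x = [0.0] * len(p)
    for i in active:
        x[i] = math.sqrt(p[i] * s[i]) / root_lam - s[i]
    return x, root_lam ** 2

def expected_fraction(p, s, x):
    """Objective g(x) = sum of p_i * x_i / (x_i + s_i)."""
    return sum(pi * xi / (xi + si) for pi, si, xi in zip(p, s, x))

p = [0.5, 0.3, 0.2]
s = [10.0, 10.0, 10.0]
bets, lam = optimal_bets(p, s, 30.0)
print(bets, lam, expected_fraction(p, s, bets))
```

With a small budget relative to the public's bets, the same routine places everything on the horse with the largest ratio $p_i/s_i$, which matches the threshold rule $p_i/s_i > \lambda$.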
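Example 8.4.5 ([1, Example 7.12.3, p. 205]). Find $u \in L_2[0, 1]$ minimizing
$$f(u) = \frac{1}{2} \int_0^1 (u(t))^2\,dt$$
subject to the linear constraints
$$yu = \begin{pmatrix} (y_1, u) \\ \vdots \\ (y_n, u) \end{pmatrix} = \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix} = c.$$

Solution: Let $C = L_2[0, 1]$, $D = \{ u : yu = c \}$, and $g(u) = 0$. To find $f^*(u^*)$, we want to maximize
$$J = (u, u^*) - f(u) = \int_0^1 \left( u(t) u^*(t) - \frac{1}{2} (u(t))^2 \right) dt = \int_0^1 v(u, \dot{u}, t)\,dt,$$
so $u$ must satisfy the Euler-Lagrange equation
$$v_u(u, \dot{u}, t) - \frac{d}{dt} v_{\dot{u}}(u, \dot{u}, t) = u^* - u = 0.$$
Therefore,
$$f^*(u^*) = \sup_{u \in C} \left( (u, u^*) - f(u) \right) = \frac{1}{2} \int_0^1 (u^*(t))^2\,dt$$
and $C^* = L_2[0, 1]$. Furthermore,
$$g^*(u^*) = \inf_{u \in D} (u, u^*);$$
that is, we want to minimize $(u, u^*)$ subject to the constraints $(y_1, u) = c_1, \ldots, (y_n, u) = c_n$. Equivalently, we want to find a stationary point of
$$J = \int_0^1 \left( u(t) u^*(t) - \sum_{i=1}^n \lambda_i y_i(t) u(t) \right) dt = \int_0^1 v(u, \dot{u}, t)\,dt,$$
so $u$ must satisfy the Euler-Lagrange equation
$$v_u(u, \dot{u}, t) - \frac{d}{dt} v_{\dot{u}}(u, \dot{u}, t) = u^* - \sum_{i=1}^n \lambda_i y_i = 0.$$
Therefore, $g^*(u^*)$ is defined only if $u^* = \sum_{i=1}^n \lambda_i y_i = y^T \lambda$ for $\lambda = (\lambda_1, \ldots, \lambda_n)^T$, in which case
$$g^*(u^*) = \inf_{u \in D} \left( u, \sum_{i=1}^n \lambda_i y_i \right) = \inf_{u \in D} \sum_{i=1}^n \lambda_i c_i = (\lambda, c) = c^T \lambda$$
and $D^* = \{ u^* : u^* = y^T \lambda \}$. By Quiz Theorem 18, the dual maximization problem is then
$$\max_{u^* \in C^* \cap D^*} \left( g^*(u^*) - f^*(u^*) \right) = \max_{\lambda} \left[ c^T \lambda - \frac{1}{2} (y^T \lambda)^T (y^T \lambda) \right],$$
where by elementary matrix calculus, we achieve the maximum at $\lambda = (y y^T)^{-1} c$. Here $y y^T$ denotes the matrix with entries $(y_i, y_j)$. Then the solution to the original minimization problem is $u = y^T \lambda$. ∎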
BIBLIOGRAPHY

[1] D. G. Luenberger, Optimization by Vector Space Methods. Wiley Interscience, 1969 [1997].
[2] W. Rudin, Principles of Mathematical Analysis. McGraw-Hill Science/Engineering/Math, third ed., 1976.
[3] M. J. D. Powell, Approximation Theory and Methods. Cambridge University Press, 1981.
[4] M. A. Khamsi, Gibbs phenomenon, S.O.S. MATHematics, 2008.
[5] S. Axler, Linear Algebra Done Right. Springer, 2nd ed., 2004.
[6] J. Munkres, Topology. Prentice Hall, 2nd ed., 2000.
[7] G. Bachmann and L. Narici, Functional Analysis. Dover Publications, 2nd ed., 1998.
[8] B. V. Brunt, Calculus of Variations. Springer, 1st ed., 2003.
131

Notes On Luenberger's Vector Space Optimization

Загружено:

Сведения о документе

Оригинальное название

Авторское право

Доступные форматы

Поделиться этим документом

Поделиться или встроить документ

Параметры публикации

Этот документ был вам полезен?

Это неприемлемый материал?

Авторское право:

Доступные форматы

Notes On Luenberger's Vector Space Optimization

Загружено:

Авторское право:

Доступные форматы

Convexity and Optimization

Paul G. Bamberg Harvard University

Convexity and Optimization

Copyright c 2008 Paul G. Bamberg Harvard University Cambridge, MA 02138

Reading: [1, Chapter 1]

Existence of Optimal Solutions

1. G ENERALIZING FROM T WO D IMENSIONS

Because the expected value is unbounded, there is no solution.

(revenueq (our constraint) (sugar constraint)

1. G ENERALIZING FROM T WO D IMENSIONS

p1 qb; adding these inequalities gives the desired result.

1.4. Finite- vs. Innite-Dimensional Vector Spaces 0

Finite- vs. Innite-Dimensional Vector Spaces

0, f9ptq  df . dt  kf ptq, so f ptq  f0 exppktq. In

Then the total amount of food stored by time T is

1. G ENERALIZING FROM T WO D IMENSIONS

Then we want to maximize

$= r(t) - d(t)$. Both inventory and production must be nonnegative, so we

We are trying to minimize total costs:

$\bigl(c(r(t)) + h\,x(t)\bigr)\,dt$.

We want to maximize the quantity

. Some algebra tells us 1

 log 7 1 and 2  log 2, so pk  2k {7.

. We recognize this as in the form of a geometric distribution pk

Minimum Norm Problems

$= (2, 1)$. A bus drives through town

normal to the highway through the point $(2, 1)$

Solution: The first three are linear functionals since L(f

for all $x \in \mathbb{R}$. The last is also a linear functional since L(f

$g(x)\,dx = L(f) + L(g)$

Since

$\frac{1}{3}\bigl(f(-1) + 4f(0) + f(1)\bigr)$.
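The three-point expression above has the shape of Simpson's rule on the interval [-1, 1]. The following is a minimal sketch under that assumption; the function name `simpson_three_point` and the two test integrands are hypothetical and do not come from the notes.

```python
def simpson_three_point(f):
    """Approximate the integral of f over [-1, 1] by (1/3) * (f(-1) + 4 f(0) + f(1))."""
    return (f(-1.0) + 4.0 * f(0.0) + f(1.0)) / 3.0


def cubic(x):
    # Hypothetical integrand; its exact integral over [-1, 1] is 28/3.
    return 2.0 * x**3 - x**2 + 3.0 * x + 5.0


def quartic(x):
    # Exact integral over [-1, 1] is 2/5, so the rule is not exact here.
    return x**4


print(simpson_three_point(cubic))    # agrees with 28/3 up to rounding
print(simpson_three_point(quartic))  # differs from the exact value 2/5
```

The rule reproduces the integral of every polynomial of degree at most three, which is why the cubic is integrated exactly while the quartic is not.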

$L_1$-norm: $\int_0^1 |f(x) - p(x)|\,dx$
$L_2$-norm: $\int_0^1 (f(x) - p(x))^2\,dx$
$L_\infty$-norm: $\max_{0 \le x \le 1} |f(x) - p(x)|$
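To make the three error measures concrete, here is a rough numerical sketch that estimates each of them on [0, 1] with a midpoint rule. The helper `error_norms` and the sample pair f(x) = x^2, p(x) = x are assumptions chosen for illustration; the second measure is computed as the integral of the squared error, following the expression as it appears in the text.

```python
def error_norms(f, p, n=10000):
    """Midpoint-rule estimates of three measures of the error f - p on [0, 1]."""
    h = 1.0 / n
    l1 = 0.0    # integral of |f - p|
    l2 = 0.0    # integral of (f - p)^2, without a square root
    linf = 0.0  # largest |f - p| seen on the sample grid
    for i in range(n):
        x = (i + 0.5) * h
        e = abs(f(x) - p(x))
        l1 += e * h
        l2 += e * e * h
        linf = max(linf, e)
    return l1, l2, linf


def f(x):
    return x * x


def p(x):
    return x


l1, l2, linf = error_norms(f, p)
print(l1, l2, linf)  # close to 1/6, 1/30 and 1/4 for this pair
```

The same approximation p can rank differently against competitors depending on which of the three measures is minimized, which is the point of comparing them.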

Then a linear combination of Bernstein basis polynomials $B(X) =$

Define the Bernstein polynomial $B_n(f)(x) =$

$|f(x) - B_n(f)(x)|$
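The defining formula is cut off in this passage, so the sketch below assumes the standard definition $B_n(f)(x) = \sum_{k=0}^{n} f(k/n) \binom{n}{k} x^k (1-x)^{n-k}$ on [0, 1]. The names `bernstein`, `max_error` and `square` are hypothetical.

```python
from math import comb


def bernstein(f, n, x):
    """Evaluate the degree-n Bernstein polynomial of f at x in [0, 1]."""
    return sum(
        f(k / n) * comb(n, k) * x**k * (1.0 - x) ** (n - k)
        for k in range(n + 1)
    )


def max_error(f, n, samples=200):
    """Largest |f(x) - B_n(f)(x)| over an evenly spaced sample grid on [0, 1]."""
    return max(
        abs(f(i / samples) - bernstein(f, n, i / samples))
        for i in range(samples + 1)
    )


def square(x):
    return x * x


# For f(x) = x^2 the error is x(1 - x)/n, so it shrinks like 1/(4n).
print(max_error(square, 10))
print(max_error(square, 100))
```

The slow 1/n decay seen here is typical: Bernstein polynomials converge uniformly for continuous f, but they are rarely the most efficient approximants.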

$\in \mathbb{Z}$. Consider the partial Fourier series:

0. This discrepancy is known as Gibbs phenomenon.

for all x where x is in

To find the zeros of $(S_N f)'(x)$, we first show that

$2 \sin x \sum_{k=1}^{N} \cos((2k-1)x) = \sin(2Nx).$

$2 \sin x \sum_{k=1}^{N+1} \cos((2k-1)x) = 2 \sin x \left( \sum_{k=1}^{N} \cos((2k-1)x) + \cos((2N+1)x) \right)$
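The trigonometric identity used here to locate the zeros of the derivative can be checked numerically. The sketch below assumes the sum runs over k = 1, ..., N with odd frequencies 2k - 1; that indexing is an assumption, since the summation limits are not legible in this passage. The names `lhs`, `rhs` and `max_gap` are hypothetical.

```python
import math


def lhs(x, N):
    """2 sin(x) times the sum of cos((2k - 1) x) for k = 1..N."""
    return 2.0 * math.sin(x) * sum(
        math.cos((2 * k - 1) * x) for k in range(1, N + 1)
    )


def rhs(x, N):
    """sin(2 N x), the closed form produced by telescoping."""
    return math.sin(2 * N * x)


def max_gap(N, samples=1000):
    """Largest difference between both sides on a grid over [-pi, pi]."""
    xs = (-math.pi + 2.0 * math.pi * i / samples for i in range(samples + 1))
    return max(abs(lhs(x, N) - rhs(x, N)) for x in xs)


for N in (1, 5, 20):
    print(N, max_gap(N))  # each gap is at the level of rounding error
```

The identity follows from $2 \sin x \cos((2k-1)x) = \sin(2kx) - \sin((2k-2)x)$, so the sum telescopes. Since $\sin(2Nx)$ vanishes at multiples of $\pi/(2N)$, the first critical point to the right of the jump moves toward it as N grows.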

2. PRELIMINARIES IN ALGEBRA, TOPOLOGY, AND ANALYSIS

$= 0\,(u + v) = ((-1) + 1)(u + v) \overset{(b)}{=} (-1)(u + v) + 1(u + v) \overset{(c)}{=} (-1)u + (-1)v + 1u + 1v \overset{(d)}{=} -u - v + u + v$

All infinite sequences that converge to zero: infinite-dimensional vector space

Vector Spaces

All polynomials $p$ with $\deg p = 3$: not a vector space (consider $f(x) = x^3$ and $g(x) = -x^3$: $\deg(f + g) \neq 3$)

}px yq pa bq}8 }x a}8 }y b}8 r s,

}px yq pa bq}8 }x a}8 }y b}8 r s,

$= (a v_1 + b v_2) + (a w_1 + b w_2)$;

$= \{(1, 2), (3, 6)\}$

$\{(1, 2), (3, 3)\}$

$\alpha(a x_1 + b y_1) + (1 - \alpha)(a x_2 + b y_2) = a(\alpha x_1 + (1 - \alpha) x_2) + b(\alpha y_1 + (1 - \alpha) y_2)$.

m1 xji , i2 i i 2 looooooooomooooooooon

n n k1 pk k qxk 0 and k k for k 1, . . . , n by Theorem k1 k xk , then

$\sin^2 x$ and $\cos 2x$

1 s, with v for some v P rSk

$b_i v_i$. Substituting this into the expression below, we have

$\|\alpha x_1 + (1 - \alpha) x_2\| \le \alpha \|x_1\| + (1 - \alpha) \|x_2\| \le \alpha + (1 - \alpha) = 1$.

c) $\|0\| = \|x - x\| \le \|x\| + \|{-x}\| = \|x\| + |{-1}|\,\|x\| = 2\|x\|$,

$\|\alpha x\| = \max_{a \le t \le b} |\alpha x(t)| = |\alpha| \max_{a \le t \le b} |x(t)| = |\alpha|\,\|x\|$.

We then define the total variation of $x$ as $\mathrm{TV}(x) = \sup$

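$\|\alpha x\| = |\alpha x(a)| + \sup \sum_i |\alpha x(t_i) - \alpha x(t_{i-1})| = |\alpha|\,|x(a)| + |\alpha| \sup \sum_i |x(t_i) - x(t_{i-1})| = |\alpha|\,\|x\|.$

$\|x + y\| = |x(a) + y(a)| + \sup$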
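A small numerical sketch may help with the norm $\|x\| = |x(a)| + \mathrm{TV}(x)$ used in these steps. It replaces the supremum over all partitions by a single uniform partition, which only gives a lower bound on the total variation in general; the helper names `variation_on_partition` and `bv_norm` are hypothetical.

```python
import math


def variation_on_partition(x, a, b, n):
    """Sum of |x(t_i) - x(t_{i-1})| over a uniform partition of [a, b] into n pieces."""
    ts = [a + (b - a) * i / n for i in range(n + 1)]
    return sum(abs(x(ts[i]) - x(ts[i - 1])) for i in range(1, n + 1))


def bv_norm(x, a, b, n=1000):
    """|x(a)| plus the partition estimate of the total variation of x on [a, b]."""
    return abs(x(a)) + variation_on_partition(x, a, b, n)


def scaled_cos(t):
    # Hypothetical example of alpha * x with alpha = -3 and x = cos.
    return -3.0 * math.cos(t)


a, b = 0.0, 2.0 * math.pi
print(bv_norm(math.cos, a, b))  # close to 1 + 4 = 5
print(bv_norm(scaled_cos, a, b))  # close to 3 * 5 = 15
```

For the cosine on $[0, 2\pi]$ the partition contains the turning point $\pi$, so the estimate matches the true variation 4 up to rounding, and scaling by $-3$ multiplies the norm by 3 as homogeneity requires.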

}z z0} }x p1 qy x0 p1 qy0} }x x0} p1 q}y y0} p1 q

$\|x - x'\| = \|(x - x_m) - (x' - x_m)\| \le \|x - x_m\| + \|x' - x_m\|$

Solution: Let $(x_n, y_n) = (1/n, 1/n^2)$. Then $\lim f(x_n, y_n) = \lim$

so $f$ is discontinuous at $(x, y) = (0, 0)$.

The norm of an element $x = \{\xi_i\} \in l_p$ is defined as

$\{\xi_i\} \in l_\infty$ is defined as $\|x\|_\infty$
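The defining formulas are cut off in this passage, so the sketch below assumes the standard definitions: $\|x\|_p = (\sum_i |\xi_i|^p)^{1/p}$ for finite p and $\|x\|_\infty = \sup_i |\xi_i|$. It works on finite lists, treated as sequences that are zero from some index on; the function name `lp_norm` is hypothetical.

```python
import math


def lp_norm(xs, p):
    """l_p norm of a finitely supported sequence; p = math.inf gives the sup norm."""
    if p == math.inf:
        return max(abs(v) for v in xs)
    return sum(abs(v) ** p for v in xs) ** (1.0 / p)


x = [3.0, -4.0]
print(lp_norm(x, 1))         # sum of absolute values
print(lp_norm(x, 2))         # Euclidean length
print(lp_norm(x, math.inf))  # largest absolute entry
```

For a fixed sequence the value does not increase as p grows, and the sup norm is the limit as p tends to infinity.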

THEOREM 3.1.2 (Cauchy-Schwarz Inequality). If $x = \{\xi_i\}$, $y = \{\eta_i\} \in l_2$, then

$a/b$ and multiplying both sides by $b$ gives the desired inequality.

Proof: First consider the case $p = \infty$, $q = 1$. Then we have

$\sum_i |\xi_i \eta_i| \le \max_i |\xi_i| \sum_i |\eta_i| = \|x\|_\infty \|y\|_1$,

$\|x + y\|_\infty = \max_i |\xi_i + \eta_i| \le \max_i |\xi_i| + \max_i |\eta_i| = \|x\|_\infty + \|y\|_\infty$.
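The inequalities in this part of the chapter can be spot-checked on random finite sequences. This is only an illustrative sketch, not a proof: it assumes the Cauchy-Schwarz inequality in the form $\sum_i |\xi_i \eta_i| \le \|x\|_2 \|y\|_2$, the Hölder case with one sup norm and one $l_1$ norm, and the triangle inequality for the sup norm. The function `check_inequalities` and the tolerance `tol` are hypothetical.

```python
import math
import random


def check_inequalities(x, y, tol=1e-12):
    """Return three booleans: Hölder (sup norm, l_1), Cauchy-Schwarz, sup-norm triangle."""
    sum_abs_products = sum(abs(a * b) for a, b in zip(x, y))
    sup_x = max(abs(a) for a in x)
    sup_y = max(abs(b) for b in y)
    one_y = sum(abs(b) for b in y)
    two_x = math.sqrt(sum(a * a for a in x))
    two_y = math.sqrt(sum(b * b for b in y))
    holder = sum_abs_products <= sup_x * one_y + tol
    cauchy_schwarz = sum_abs_products <= two_x * two_y + tol
    triangle = max(abs(a + b) for a, b in zip(x, y)) <= sup_x + sup_y + tol
    return holder, cauchy_schwarz, triangle


random.seed(0)
for _ in range(5):
    x = [random.uniform(-1.0, 1.0) for _ in range(50)]
    y = [random.uniform(-1.0, 1.0) for _ in range(50)]
    print(check_inequalities(x, y))
```

The small tolerance only guards against rounding in the equality cases, for example when one sequence is a multiple of the other.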

Proposition 3.2.6. If $f(x)$