
Contents

1 Preliminary Analysis 7
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2 Notation and preliminary remarks . . . . . . . . . . . . . . . . . . . . . 10
1.2.1 The Order notation . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3 Functions of a real variable . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.3.2 Continuity and Limits . . . . . . . . . . . . . . . . . . . . . . . . 14
1.3.3 Monotonic functions and inverse functions . . . . . . . . . . . . . 17
1.3.4 The derivative . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.3.5 Mean Value Theorems . . . . . . . . . . . . . . . . . . . . . . . . 22
1.3.6 Partial Derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.3.7 Implicit functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.3.8 Taylor series for one variable . . . . . . . . . . . . . . . . . . . . 31
1.3.9 Taylor series for several variables . . . . . . . . . . . . . . . . . . 36
1.3.10 L'Hospital's rule . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
1.3.11 Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
1.4 Miscellaneous exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2 Ordinary Differential Equations 51
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
2.2 General definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
2.3 First-order equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
2.3.1 Existence and uniqueness of solutions . . . . . . . . . . . . . . . 60
2.3.2 Separable and homogeneous equations . . . . . . . . . . . . . . . 62
2.3.3 Linear first-order equations . . . . . . . . . . . . . . . . . . . . . 63
2.3.4 Bernoulli's equation . . . . . . . . . . . . . . . . . . . . . . . . . 65
2.3.5 Riccati's equation . . . . . . . . . . . . . . . . . . . . . . . . . . 66
2.4 Second-order equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
2.4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
2.4.2 General ideas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
2.4.3 The Wronskian . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
2.4.4 Second-order, constant coefficient equations . . . . . . . . . . . . 76
2.4.5 Inhomogeneous equations . . . . . . . . . . . . . . . . . . . . . . 78
2.4.6 The Euler equation . . . . . . . . . . . . . . . . . . . . . . . . . . 80
2.5 An existence and uniqueness theorem . . . . . . . . . . . . . . . . . . . 81
2.6 Envelopes of families of curves (optional) . . . . . . . . . . . . . . . . . 82
2.7 Miscellaneous exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
2.7.1 Applications of differential equations . . . . . . . . . . . . . . . . 91
3 The Calculus of Variations 93
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
3.2 The shortest distance between two points in a plane . . . . . . . . . . . 93
3.2.1 The stationary distance . . . . . . . . . . . . . . . . . . . . . . . 94
3.2.2 The shortest path: local and global minima . . . . . . . . . . . . 96
3.2.3 Gravitational Lensing . . . . . . . . . . . . . . . . . . . . . . . . 98
3.3 Two generalisations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
3.3.1 Functionals depending only upon y'(x) . . . . . . . . . . . . . . . 99
3.3.2 Functionals depending upon x and y'(x) . . . . . . . . . . . . . . 101
3.4 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
3.5 Examples of functionals . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
3.5.1 The brachistochrone . . . . . . . . . . . . . . . . . . . . . . . . . 104
3.5.2 Minimal surface of revolution . . . . . . . . . . . . . . . . . . . . 106
3.5.3 The minimum resistance problem . . . . . . . . . . . . . . . . . . 106
3.5.4 A problem in navigation . . . . . . . . . . . . . . . . . . . . . . . 110
3.5.5 The isoperimetric problem . . . . . . . . . . . . . . . . . . . . . . 110
3.5.6 The catenary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
3.5.7 Fermat's principle . . . . . . . . . . . . . . . . . . . . . . . . . . 112
3.5.8 Coordinate free formulation of Newton's equations . . . . . . . . 114
3.6 Miscellaneous exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
4 The Euler-Lagrange equation 121
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
4.2 Preliminary remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
4.2.1 Relation to differential calculus . . . . . . . . . . . . . . . . . . . 122
4.2.2 Differentiation of a functional . . . . . . . . . . . . . . . . . . . . 123
4.3 The fundamental lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
4.4 The Euler-Lagrange equations . . . . . . . . . . . . . . . . . . . . . . . . 128
4.4.1 The first-integral . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
4.5 Theorems of Bernstein and du Bois-Reymond . . . . . . . . . . . . . . . 134
4.5.1 Bernstein's theorem . . . . . . . . . . . . . . . . . . . . . . . . . 135
4.6 Strong and Weak variations . . . . . . . . . . . . . . . . . . . . . . . . . 137
4.7 Miscellaneous exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
5 Applications of the Euler-Lagrange equation 145
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
5.2 The brachistochrone . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
5.2.1 The cycloid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
5.2.2 Formulation of the problem . . . . . . . . . . . . . . . . . . . . . 149
5.2.3 A solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
5.3 Minimal surface of revolution . . . . . . . . . . . . . . . . . . . . . . . . 154
5.3.1 Derivation of the functional . . . . . . . . . . . . . . . . . . . . . 155
5.3.2 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
5.3.3 The solution in a special case . . . . . . . . . . . . . . . . . . . . 157
5.3.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
5.4 Soap Films . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
5.5 Miscellaneous exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
6 Further theoretical developments 173
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
6.2 Invariance of the Euler-Lagrange equation . . . . . . . . . . . . . . . . . 173
6.2.1 Changing the independent variable . . . . . . . . . . . . . . . . . 174
6.2.2 Changing both the dependent and independent variables . . . . . 176
6.3 Functionals with many dependent variables . . . . . . . . . . . . . . . . 181
6.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
6.3.2 Functionals with two dependent variables . . . . . . . . . . . . . 182
6.3.3 Functionals with many dependent variables . . . . . . . . . . . . 185
6.3.4 Changing dependent variables . . . . . . . . . . . . . . . . . . . . 186
6.4 The Inverse Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
6.5 Miscellaneous exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
7 Symmetries and Noether's theorem 195
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
7.2 Symmetries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
7.2.1 Invariance under translations . . . . . . . . . . . . . . . . . . . . 196
7.3 Noether's theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
7.3.1 Proof of Noether's theorem . . . . . . . . . . . . . . . . . . . . . 205
7.4 Miscellaneous exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
8 The second variation 209
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
8.2 Stationary points of functions of several variables . . . . . . . . . . . . . 210
8.2.1 Functions of one variable . . . . . . . . . . . . . . . . . . . . . . 210
8.2.2 Functions of two variables . . . . . . . . . . . . . . . . . . . . . . 211
8.2.3 Functions of n variables . . . . . . . . . . . . . . . . . . . . . . . 212
8.3 The second variation of a functional . . . . . . . . . . . . . . . . . . . . 215
8.3.1 Short intervals . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
8.3.2 Legendre's necessary condition . . . . . . . . . . . . . . . . . . . 218
8.4 Analysis of the second variation . . . . . . . . . . . . . . . . . . . . . . . 220
8.4.1 Analysis of the second variation . . . . . . . . . . . . . . . . . . . 222
8.5 The Variational Equation . . . . . . . . . . . . . . . . . . . . . . . . . . 226
8.6 The Brachistochrone problem . . . . . . . . . . . . . . . . . . . . . . . . 229
8.7 Surface of revolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
8.8 Jacobi's equation and quadratic forms . . . . . . . . . . . . . . . . . . 232
8.9 Miscellaneous exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
9 Parametric Functionals 239
9.1 Introduction: parametric equations . . . . . . . . . . . . . . . . . . . . . 239
9.1.1 Lengths and areas . . . . . . . . . . . . . . . . . . . . . . . . . . 241
9.2 The parametric variational problem . . . . . . . . . . . . . . . . . . . . 244
9.2.1 Geodesics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
9.2.2 The Brachistochrone problem . . . . . . . . . . . . . . . . . . . . 250
9.2.3 Surface of Minimum Revolution . . . . . . . . . . . . . . . . . . . 250
9.3 The parametric and the conventional formulation . . . . . . . . . . . . . 251
9.4 Miscellaneous exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
10 Variable end points 257
10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
10.2 Natural boundary conditions . . . . . . . . . . . . . . . . . . . . . . . . 259
10.2.1 Natural boundary conditions for the loaded beam . . . . . . . . . 263
10.3 Variable end points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
10.4 Parametric functionals . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268
10.5 Weierstrass-Erdmann conditions . . . . . . . . . . . . . . . . . . . . . . 271
10.5.1 A taut wire . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
10.5.2 The Weierstrass-Erdmann conditions . . . . . . . . . . . . . . . . 273
10.5.3 The parametric form of the corner conditions . . . . . . . . . . . 277
10.6 Newton's minimum resistance problem . . . . . . . . . . . . . . . . . 277
10.7 Miscellaneous exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
11 Conditional stationary points 287
11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
11.2 The Lagrange multiplier . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
11.2.1 Three variables and one constraint . . . . . . . . . . . . . . . . . 291
11.2.2 Three variables and two constraints . . . . . . . . . . . . . . . . 293
11.2.3 The general case . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
11.3 The dual problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
11.4 Miscellaneous exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
12 Constrained Variational Problems 299
12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
12.2 Conditional Stationary values of functionals . . . . . . . . . . . . . . . . 300
12.2.1 Functional constraints . . . . . . . . . . . . . . . . . . . . . . . . 300
12.2.2 The dual problem . . . . . . . . . . . . . . . . . . . . . . . . . . 304
12.2.3 The catenary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
12.3 Variable end points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309
12.4 Broken extremals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
12.5 Parametric functionals . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
12.6 The Lagrange problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
12.6.1 A single non-holonomic constraint . . . . . . . . . . . . . . . . . 317
12.6.2 An example with a single holonomic constraint . . . . . . . . . . 318
12.7 Brachistochrone in a resisting medium . . . . . . . . . . . . . . . . . . . 319
12.8 Brachistochrone with Coulomb friction . . . . . . . . . . . . . . . . . . . 329
12.9 Miscellaneous exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
13 Sturm-Liouville systems 339
13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
13.2 The origin of Sturm-Liouville systems . . . . . . . . . . . . . . . . . . . 342
13.3 Eigenvalues and functions of simple systems . . . . . . . . . . . . . . . . 348
13.3.1 Bessel functions (optional) . . . . . . . . . . . . . . . . . . . . . . 353
13.4 Sturm-Liouville systems . . . . . . . . . . . . . . . . . . . . . . . . . . . 357
13.4.1 Separation and Comparison theorems . . . . . . . . . . . . . . . 359
13.4.2 Self-adjoint operators . . . . . . . . . . . . . . . . . . . . . . . . 363
13.4.3 The oscillation theorem (optional) . . . . . . . . . . . . . . . . . 365
13.5 Miscellaneous exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373
14 The Rayleigh-Ritz Method 375
14.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375
14.2 Basic ideas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375
14.3 Eigenvalues and eigenfunctions . . . . . . . . . . . . . . . . . . . . . . . 379
14.4 The Rayleigh-Ritz method . . . . . . . . . . . . . . . . . . . . . . . . . . 384
14.5 Miscellaneous exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 388
15 Solutions to exercises 397
15.1 Solutions for chapter 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
15.2 Solutions for chapter 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424
15.3 Solutions for chapter 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461
15.4 Solutions for chapter 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471
15.5 Solutions for chapter 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489
15.6 Solutions for chapter 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 510
15.7 Solutions for chapter 7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 523
15.8 Solutions for chapter 8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 531
15.9 Solutions for chapter 9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 544
15.10 Solutions for chapter 10 . . . . . . . . . . . . . . . . . . . . . . . . . . . 554
15.11 Solutions for chapter 11 . . . . . . . . . . . . . . . . . . . . . . . . . . . 574
15.12 Solutions for chapter 12 . . . . . . . . . . . . . . . . . . . . . . . . . . . 583
15.13 Solutions for chapter 13 . . . . . . . . . . . . . . . . . . . . . . . . . . . 602
15.14 Solutions for chapter 14 . . . . . . . . . . . . . . . . . . . . . . . . . . . 622
Chapter 1
Preliminary Analysis
1.1 Introduction
This course is about two related mathematical concepts which are of use in many areas
of applied mathematics, are of immense importance in formulating the laws of theoret-
ical physics and also produce important, interesting and some unsolved mathematical
problems. These are the functional and variational principles: the theory of these
entities is named The Calculus of Variations.
A functional is a generalisation of a function of one or more real variables. A real function of a single real variable maps an interval of the real line to real numbers: for instance, the function (1 + x^2)^(-1) maps the whole real line to the interval (0, 1]; the function ln x maps the positive real axis to the whole real line. Similarly a real function of n real variables maps a domain of R^n into the real numbers.
A functional maps a given class of functions to real numbers. A simple example of a functional is

    S[y] = ∫_0^1 dx √( 1 + y'(x)^2 ),   y(0) = 0,  y(1) = 1,   (1.1)

which associates a real number with any real function y(x) which satisfies the boundary conditions and for which the integral exists. We use the square bracket notation[1] S[y] to emphasise the fact that the functional depends upon the choice of function used to evaluate the integral. In chapter 3 we shall see that a wide variety of problems can be described in terms of functionals. Notice that the boundary conditions, y(0) = 0 and y(1) = 1 in this example, are often part of the definition of the functional.

[1] In this course we use conventions common in applied mathematics and theoretical physics. A function of a real variable x will usually be represented by symbols such as f(x) or just f, often with no distinction made between the function and its value; it is often clearer to use context to provide meaning, rather than precise definitions, which initially can hinder clarity. Similarly, we use the older convention, S[y], for a functional, to emphasise that y is itself a function; this distinction is not made in modern mathematics. For an introductory course we feel that the older convention, used in most texts, is clearer and more helpful.
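To make equation (1.1) concrete, the short numerical sketch below (added for illustration and not part of the original text; the function names and number of steps are arbitrary choices) evaluates S[y] for two functions satisfying the boundary conditions, showing how the functional attaches a different number to each choice of y(x).

    # Trapezoidal estimate of S[y] = integral_0^1 sqrt(1 + y'(x)**2) dx for a
    # given derivative y'(x); both sample functions satisfy y(0) = 0, y(1) = 1.
    import math

    def S(dy, n=10000):
        h = 1.0 / n
        f = lambda x: math.sqrt(1.0 + dy(x)**2)
        return h * (0.5*f(0.0) + sum(f(k*h) for k in range(1, n)) + 0.5*f(1.0))

    print(S(lambda x: 1.0))      # y(x) = x    : S = sqrt(2) = 1.41421...
    print(S(lambda x: 2.0*x))    # y(x) = x**2 : S = 1.47894..., a longer curve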
Real functions of n real variables can have various properties; for instance they can be continuous, they may be differentiable or they may have stationary points and local and global maxima and minima: functionals share many of these properties. In particular the notion of a stationary point of a function has an important analogy in the theory of functionals and this gives rise to the idea of a variational principle, which arises when the solution to a problem is given by the function making a particular functional stationary. Variational principles are common and important in the natural sciences.
The simplest example of a variational principle is that of finding the shortest distance between two points. Suppose the two points lie in a plane, with one point at the origin, O, and the other at the point A with coordinates (1, 1); if y(x) represents a smooth curve passing through O and A, the distance between O and A along this curve is given by the functional defined in equation 1.1. The shortest path is that which minimises the value of S[y]. If the surface is curved, for instance a sphere or ellipsoid, the equivalent functional is more complicated, but the shortest path is that which minimises it.
Variational principles are important for three principal reasons. First, many prob-
lems are naturally formulated in terms of a functional and an associated variational
principle. Several of these will be described in chapter 3 and some solutions will be
obtained as the course develops.
Second, most equations of mathematical physics can be derived from variational principles. This is important partly because it suggests a unifying theme in our description of nature and partly because such formulations are independent of any particular coordinate system, so making the essential mathematical structure of the equations more transparent and easier to understand. This aspect of the subject is not considered in this course; a good discussion of these problems can be found in Yourgrau and Mandelstam (1968)[2].

Finally, variational principles provide powerful computational tools; we explore aspects of this theory in chapter 13.

[2] Yourgrau W and Mandelstam S, Variational Principles in Dynamics and Quantum Theory (Pitman).
Consider the problem of finding the shortest path between two points on a curved surface. The associated functional assigns a real number to each smooth curve joining the points. A first step to solving this problem is to find the stationary values of the functional; it is then necessary to decide which of these provides the shortest path. This is very similar to the problem of finding extreme values of a function of n variables, where we first determine the stationary points and then classify them: the important and significant difference is that the space of allowed functions is not usually finite in dimension. The infinite dimensional spaces of functions with which we shall be dealing have many properties similar to those possessed by finite dimensional spaces, and in many problems the difference is not significant. However, this generalisation does introduce some practical and technical difficulties, some of which are discussed in section 4.6. In this chapter we review ordinary calculus in order to prepare for these more general ideas.
In elementary calculus and analysis, the functions studied first are real functions, f, of one real variable, that is, functions with domain either R, or a subset of R, and codomain R. Without any other restrictions on f, this definition is too general to be useful in calculus and applied mathematics. Most functions of one real variable that are of interest in applications have smooth graphs, although sometimes they may fail to be smooth at one or more points where they have a kink (fail to be differentiable), or even a break (where they are discontinuous). This smooth behaviour is related to the fact that most important functions of one variable describe physical phenomena and often arise as solutions of ordinary differential equations. Therefore it is usual to restrict attention to functions that are differentiable or, more usually, differentiable a number of times.
The most useful generalisation of differentiability to functions defined on sets other than R requires some care. It is not too hard in the case of functions of several (real) variables, but we shall have to generalise differentiation and integration to functionals, not just to functions of several real variables.
Our presentation conceals very significant intellectual achievements made at the end of the nineteenth century and during the first half of the twentieth century. During the nineteenth century, although much work was done on particular equations, there was little systematic theory. This changed when the idea of infinite dimensional vector spaces began to emerge. Between 1900 and 1906, fundamental papers appeared by Fredholm[3], Hilbert[4], and Fréchet[5]. Fréchet's thesis gave for the first time definitions of limit and continuity that were applicable in very general sets. Previously, the concepts had been restricted to special objects such as points, curves, surfaces or functions. By introducing the concept of distance in more general sets he paved the way for rapid advances in the theory of partial differential equations. These ideas, together with the theory of Lebesgue integration introduced in 1902 by Lebesgue in his doctoral thesis[6], led to the modern theory of functional analysis. This is now the usual framework of the theoretical study of partial differential equations. They are required also for an elucidation of some of the difficulties in the Calculus of Variations. However, in this introductory course, we concentrate on basic techniques of solving practical problems, because we think this is the best way to motivate and encourage further study.

[3] I. Fredholm, On a new method for the solution of Dirichlet's problem, reprinted in Oeuvres Complètes, l'Institut Mittag-Leffler, (Malmö) 1955, pp 61-68 and 81-106.
[4] D. Hilbert published six papers between 1904 and 1906. They were republished as Grundzüge einer allgemeinen Theorie der Integralgleichungen by Teubner, (Leipzig and Berlin), 1924. The most crucial paper is the fourth.
[5] M. Fréchet, Doctoral thesis, Sur quelques points du Calcul fonctionnel, Rend. Circ. mat. Palermo 22 (1906), pp 1-74.
[6] H. Lebesgue, Doctoral thesis, Paris 1902, reprinted in Annali Mat. pura e appl., 7 (1902), pp 231-359.
This preliminary chapter, which is assessed, is about real analysis and introduces many of the ideas needed for our treatment of the Calculus of Variations. It is possible that you are already familiar with the mathematics described in this chapter, in which case you could start the course with chapter 2. You should ensure, however, that you have a good working knowledge of differentiation, both ordinary and partial, Taylor series of one and several variables and differentiation under the integral sign, all of which are necessary for the development of the theory. In addition familiarity with the theory of linear differential equations with both initial and boundary value problems is assumed.

Very many exercises are set, in the belief that mathematical ideas cannot be understood without attempting to solve problems at various levels of difficulty and that one learns most by making one's own mistakes, which is time consuming. You should not attempt all these exercises at a first reading, but they provide practice of essential mathematical techniques and in the use of a variety of ideas, so you should do as many as time permits; thinking about a problem and then looking up the solution is usually of little value until you have attempted your own solution. The exercises at the end of this chapter are examples of the types of problem that commonly occur in applications: they are provided for extra practice if time permits and it is not necessary for you to attempt them.
1.2 Notation and preliminary remarks
We start with a discussion about notation and some of the basic ideas used throughout
this course.
A real function of a single real variable, f, is a rule that maps a real number x
to a single real number y. This operation can be denoted in a variety of ways. The
approach of scientists is to write y = f(x) or just y(x), and the symbols y, y(x), f
and f(x) are all used to represent the function. Mathematics uses the more formal
and precise notation f : X → Y, where X and Y are subsets of the real line: the set X is named the domain, or the domain of definition of f, and set Y the codomain.
With this notation the symbol f denotes the function and the symbol f(x) the value
of the function at the point x. In applications this distinction is not always made and
both f and f(x) are used to denote the function. In recent years this has come to be
regarded as heresy by some: however, there are good practical reasons for using this
freer notation that do not affect pure mathematics. In this text we shall frequently use
the Leibniz notation, f(x), and its extensions, because it generally provides a clearer
picture and is helpful for algebraic manipulations, such as when changing variables and
integrating by parts.
Moreover, in the sciences the domain and codomain are frequently omitted, either
because they are obvious or because they are not known. But, perversely, the scientist,
by writing y = f(x), often distinguishes between the two variables x and y, by saying
that x is the independent variable and that y is the dependent variable because it depends
upon x. This labelling can be confusing, because the role of variables can change, but
it is also helpful because in physical problems different variables can play quite different
roles: for instance, time is normally an independent variable.
In pure mathematics the term graph is used in a slightly specialised manner. A graph
is the set of points (x, f(x)): this is normally depicted as a line in a plane using rect-
angular Cartesian coordinates. In other disciplines the whole gure is called the graph,
not the set of points, and the graph may be a less restricted shape than those dened
by functions; an example is shown in gure 1.5 (page 28).
Almost all the ideas associated with real functions of one variable generalise to functions of several real variables, but notation needs to be developed to cope with this extension. Points in R^n are represented by n-tuples of real numbers (x_1, x_2, ..., x_n). It is convenient to use bold faced symbols, x, a and so on, to denote these points, so x = (x_1, x_2, ..., x_n) and we shall write x and (x_1, x_2, ..., x_n) interchangeably. In hand-written text a bold character, x, is usually denoted by an underline, x.
A function f(x_1, x_2, ..., x_n) of n real variables, defined on R^n, is a map from R^n, or a subset, to R, written as f : R^n → R. Where we use bold face symbols like f or φ to refer to functions, it means that the image under the function, f(x) or φ(y), may be considered as a vector in R^m with m ≥ 2, so f : R^n → R^m; in this course normally m = 1 or m = n. Although the case m = 1 will not be excluded when we use a bold face symbol, we shall continue to write f and φ where the functions are known to be real valued and not vector valued. We shall also write without further comment f(x) = (f_1(x), f_2(x), ..., f_m(x)), so that the f_i are the m component functions, f_i : R^n → R, of f.
On the real line the distance between two points x and y is naturally defined by |x − y|. A point x is in the open interval (a, b) if a < x < b, and is in the closed interval [a, b] if a ≤ x ≤ b. By convention, the intervals (−∞, a), (b, ∞) and (−∞, ∞) = R are also open intervals. Here, (−∞, a) means the set of all real numbers strictly less than a. The symbol ∞ for infinity is not a number, and its use here is conventional. In the language and notation of set theory, we can write (−∞, a) = {x ∈ R : x < a}, with similar definitions for the other two types of open interval. One reason for considering open sets is that the natural domain of definition of some important functions is an open set. For example, the function ln x as a function of one real variable is defined for x ∈ (0, ∞).
The space of points R^n is an example of a linear space. Here the term linear has the normal meaning that for every x, y in R^n, and for every real α, x + y and αx are in R^n. Explicitly,

    (x_1, x_2, ..., x_n) + (y_1, y_2, ..., y_n) = (x_1 + y_1, x_2 + y_2, ..., x_n + y_n)

and

    α(x_1, x_2, ..., x_n) = (αx_1, αx_2, ..., αx_n).
Functions f : R^n → R^m may also be added and multiplied by real numbers. Therefore a function of this type may be regarded as a vector in the vector space of functions, though this space is not finite dimensional like R^n.
In the space R^n the distance |x| of a point x from the origin is defined by the natural generalisation of Pythagoras' theorem, |x| = √( x_1^2 + x_2^2 + ... + x_n^2 ). The distance between two vectors x and y is then defined by

    |x − y| = √( (x_1 − y_1)^2 + (x_2 − y_2)^2 + ... + (x_n − y_n)^2 ).   (1.2)

This is a direct generalisation of the distance along a line, to which it collapses when n = 1.
This distance has the three basic properties

    (a) |x| ≥ 0 and |x| = 0 if and only if x = 0,
    (b) |x − y| = |y − x|,
    (c) |x − y| + |y − z| ≥ |x − z|   (Triangle inequality).   (1.3)

In the more abstract spaces, such as the function spaces we need later, a similar concept of a distance between elements is needed. This is named the norm and is a map from two elements of the space to the non-negative real numbers which satisfies the above three rules. In function spaces there is no natural choice of the distance function and we shall see in chapter 4 that this flexibility can be important.
For functions of several variables, that is, for functions defined on sets of points in R^n, the direct generalization of open interval is an open ball.

Definition 1.1
The open ball B_r(a) of radius r and centre a ∈ R^n is the set of points

    B_r(a) = {x ∈ R^n : |x − a| < r}.
Thus the ball of radius 1 and centre (0, 0) in R^2 is the interior of the unit circle, not including the points on the circle itself. And in R, the ball of radius 1 and centre 0 is the open interval (−1, 1). However, for R^2 and for R^n for n > 2, open balls are not quite general enough. For example, the open square

    {(x, y) ∈ R^2 : |x| < 1, |y| < 1}

is not a ball, but in many ways is similar. (You may know for example that it may be mapped continuously to an open ball.) It turns out that the most convenient concept is that of open set[7], which we can now define.

[7] As with many other concepts in analysis, formulating clearly the concepts, in this case an open set, represents a major achievement.
Definition 1.2
Open sets. A set U in R^n is said to be open if for every x ∈ U there is an open ball B_r(a) wholly contained within U which contains x.
In other words, every point in an open set lies in an open ball contained in the set. Any open ball is in many ways like the whole of the space R^n: it has no isolated or missing points. Also, every open set is a union of open balls (obviously). Open sets are very convenient and important in the theory of functions, but we cannot study the reasons here. A full treatment of open sets can be found in books on topology[8]. Open balls are not the only type of open sets and it is not hard to show that the open square, {(x, y) ∈ R^2 : |x| < 1, |y| < 1}, is in fact an open set, according to the definition we gave; and in a similar way it can be shown that the set {(x, y) ∈ R^2 : (x/a)^2 + (y/b)^2 < 1}, which is the interior of an ellipse, is an open set.

[8] See for example W A Sutherland, Introduction to Metric and Topological Spaces, Oxford University Press.
Exercise 1.1
Show that the open square is an open set by constructing explicitly, for each (x, y) in the open square {(x, y) ∈ R^2 : |x| < 1, |y| < 1}, a ball containing (x, y) and lying in the square.
1.2.1 The Order notation
It is often useful to have a bound for the magnitude of a function that does not require exact calculation. For example, the function f(x) = √( sin(x^2 cosh x) − x^2 cos x ) tends to zero at a similar rate to x^2 as x → 0 and this information is sometimes more helpful than the detailed knowledge of the function. The order notation is designed for this purpose.

Definition 1.3
Order notation. A function f(x) is said to be of order x^n as x → 0 if there is a non-zero constant C such that |f(x)| < C|x^n| for all x in an interval around x = 0. This is written as

    f(x) = O(x^n)   as x → 0.   (1.4)

The conditional clause 'as x → 0' is often omitted when it is clear from the context.
More generally, this order notation can be used to compare the size of functions, f(x) and g(x): we say that f(x) is of the order of g(x) as x → y if there is a non-zero constant C such that |f(x)| < C|g(x)| for all x in an interval around y; more succinctly, f(x) = O(g(x)) as x → y.
When used in the form f(x) = O(g(x)) as x → ∞, this notation means that |f(x)| < C|g(x)| for all x > X, where X and C are positive numbers independent of x.
This notation is particularly useful when truncating power series: thus, the series for sin x up to O(x^3) is written

    sin x = x − x^3/3! + O(x^5),

meaning that the remainder is smaller than C|x|^5, as x → 0, for some C. Note that in this course the phrase 'up to O(x^3)' means that the x^3 term is included. The following exercises provide practice in using the O-notation and exercise 1.2 proves an important result.
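Such truncations are easy to check numerically. The sketch below (an illustration added here; the sample points are arbitrary) verifies that the remainder of the sin x series, divided by x^5, stays bounded as x → 0, consistent with a remainder of order x^5.

    # The ratio |sin x - (x - x**3/6)| / |x|**5 remains bounded as x -> 0
    # (in fact it tends to 1/120), consistent with the O(x**5) remainder.
    import math
    for x in [0.5, 0.1, 0.01, 0.001]:
        remainder = math.sin(x) - (x - x**3/6.0)
        print(x, abs(remainder) / abs(x)**5)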
Exercise 1.2
Show that if f(x) = O(x^2) as x → 0 then also f(x) = O(x).
Exercise 1.3
Use the binomial expansion to find the order of the following expressions as x → 0.

    (a) x√(1 + x^2),   (b) x/(1 + x),   (c) x^(3/2)/(1 − e^x).
Exercise 1.4
Use the binomial expansion to find the order of the following expressions as x → ∞.

    (a) x/(x − 1),   (b) √(4x^2 + x) − 2x,   (c) (x + b)^a − x^a,  a > 0.
The order notation is usefully extended to functions of n real variables, f : R^n → R, by using the distance |x|. Thus, we say that f(x) = O(|x|^n) if there is a non-zero constant C and a small number δ such that |f(x)| < C|x|^n for |x| < δ.
Exercise 1.5
(a) If f1 = x and f2 = y show that f1 = O(f) and f2 = O(f) where f(x, y) = (x^2 + y^2)^(1/2).
(b) Show that the polynomial φ(x, y) = ax^2 + bxy + cy^2 vanishes to at least the same order as the polynomial f(x, y) = x^2 + y^2 at (0, 0). What conditions are needed for φ to vanish faster than f as √(x^2 + y^2) → 0?
Another expression that is useful is

    f(x) = o(|x|),   which is shorthand for   lim_{|x|→0} f(x)/|x| = 0.

Informally this means that f(x) vanishes faster than |x| as |x| → 0. More generally f = o(g) if lim_{|x|→0} |f(x)/g(x)| = 0, meaning that f(x) vanishes faster than g(x) as |x| → 0.
1.3 Functions of a real variable
1.3.1 Introduction
In this section we introduce important ideas pertaining to real functions of a single real
variable, although some mention is made of functions of many variables. Most of the
ideas discussed should be familiar from earlier courses in elementary real analysis or
Calculus, so our discussion is brief and all exercises are optional.
The study of Real Analysis normally starts with a discussion of the real number system and its properties. Here we assume all necessary properties of this number system and refer the reader to any basic text if further details are required: adequate discussion may be found in the early chapters of the texts by Whittaker and Watson[9], Rudin[10] and by Kolmogorov and Fomin[11].

[9] A Course of Modern Analysis by E T Whittaker and G N Watson, Cambridge University Press.
[10] Principles of Mathematical Analysis by W Rudin (McGraw-Hill).
[11] Introductory Real Analysis by A N Kolmogorov and S V Fomin (Dover).
1.3.2 Continuity and Limits
A continuous function is one whose graph has no vertical breaks: otherwise, it is discontinuous. The function f_1(x), depicted by the solid line in figure 1.1, is continuous for x_1 < x < x_2. The function f_2(x), depicted by the dashed line, is discontinuous at x = c.

Figure 1.1 Figure showing examples of a continuous function, f_1(x), and a discontinuous function f_2(x).
A function f(x) is continuous at a point x = a if f(a) exists and if, given any arbitrarily small positive number, ε, we can find a neighbourhood of x = a such that in it |f(x) − f(a)| < ε. We can express this in terms of limits and, since a point a on the real line can be approached only from the left or the right, a function is continuous at a point x = a if it approaches the same value, independent of the direction. Formally we have

Definition 1.4
Continuity: a function, f, is continuous at x = a if f(a) is defined and

    lim_{x→a} f(x) = f(a).

For a function of one variable, this is equivalent to saying that f(x) is continuous at x = a if f(a) is defined and the left and right-hand limits

    lim_{x→a−} f(x)   and   lim_{x→a+} f(x)

exist and are equal to f(a).
If the left and right-hand limits exist but are not equal the function is discontinuous
at x = a and is said to have a simple discontinuity at x = a.
If they both exist and are equal, but do not equal f(a), then the function is said to
have a removable discontinuity at x = a.
Quite elementary functions exist for which neither limit exists: these are also dis-
continuous, and said to have a discontinuity of the second kind at x = a, see Rudin
(1976, page 94). An example of a function with such a discontinuity at x = 0 is
    f(x) = sin(1/x) for x ≠ 0,   f(0) = 0.
We shall have no need to consider this type of discontinuity in this course, but simple
discontinuities will arise.
A function that behaves as

    |f(x + ε) − f(x)| = O(ε)   as ε → 0

is continuous, though the converse is not true, a counter example being f(x) = √|x| at x = 0.
Most functions that occur in the sciences are either continuous or piecewise continuous, which means that the function is continuous except at a discrete set of points. The Heaviside function and the related sgn function are examples of commonly occurring piecewise continuous functions that are discontinuous. They are defined by

    H(x) = 1 for x > 0,  H(x) = 0 for x < 0,   and   sgn(x) = 1 for x > 0,  sgn(x) = −1 for x < 0,
    so that sgn(x) = −1 + 2H(x).   (1.5)

These functions are discontinuous at x = 0, where they are not normally defined. In some texts these functions are defined at x = 0; for instance H(0) may be defined to have the value 0, 1/2 or 1.
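As a small illustration (added here; the choice H(0) = 1/2 is just one of the conventions mentioned above), these definitions and the relation sgn(x) = −1 + 2H(x) can be written as follows.

    # Heaviside step function and sign function; the value at x = 0 is an
    # arbitrary convention, here taken to be H(0) = 1/2 so that sgn(0) = 0.
    def H(x):
        if x > 0:
            return 1.0
        if x < 0:
            return 0.0
        return 0.5

    def sgn(x):
        return -1.0 + 2.0*H(x)   # the relation sgn(x) = -1 + 2H(x)

    print([H(x) for x in (-2.0, 0.0, 3.0)])    # [0.0, 0.5, 1.0]
    print([sgn(x) for x in (-2.0, 0.0, 3.0)])  # [-1.0, 0.0, 1.0]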
If lim_{x→c} f(x) = A and lim_{x→c} g(x) = B, then it can be shown that the following (obvious) rules are adhered to:

    (a) lim_{x→c} ( f(x) + g(x) ) = A + B;
    (b) lim_{x→c} ( f(x)g(x) ) = AB;
    (c) lim_{x→c} f(x)/g(x) = A/B,  if B ≠ 0;
    (d) if lim_{x→B} f(x) = f_B then lim_{x→c} f(g(x)) = f_B.
The value of a limit is normally found by a combination of suitable re-arrangements and expansions. An example of an expansion is

    lim_{x→0} sinh(ax)/x = lim_{x→0} [ ax + (ax)^3/3! + O(x^5) ] / x = lim_{x→0} [ a + O(x^2) ] = a.

An example of a re-arrangement, using the above rules, is

    lim_{x→0} sinh(ax)/sinh(bx) = lim_{x→0} [ sinh(ax)/x ] [ x/sinh(bx) ]
                                = ( lim_{x→0} sinh(ax)/x ) ( lim_{x→0} x/sinh(bx) ) = a/b,   (b ≠ 0).
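This rearrangement is easily checked numerically; the sketch below (illustrative only, with a = 2 and b = 3 chosen arbitrarily) shows the ratio approaching a/b as x decreases.

    # Numerical check that sinh(a*x)/sinh(b*x) -> a/b as x -> 0, with a=2, b=3.
    import math
    a, b = 2.0, 3.0
    for x in [1.0, 0.1, 0.01, 0.001]:
        print(x, math.sinh(a*x) / math.sinh(b*x))   # tends to 2/3 = 0.666...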
Finally, we note that a function that is continuous on a closed interval is bounded above and below and attains its bounds. It is important that the interval is closed; for instance the function f(x) = x defined in the open interval 0 < x < 1 is bounded above and below, but does not attain its bounds. This example may seem trivial, but similar difficulties exist in the Calculus of Variations and are less easy to recognise.
Exercise 1.6
A function that is finite and continuous for all x is defined by

    f(x) = A/x^2 + x + B,   0 ≤ x ≤ a,  a > 0,
    f(x) = C/x^2 + Dx,      a ≤ x,

where A, B, C, D and a are real numbers: if f(0) = 1 and lim_{x→∞} f(x) = 0, find these numbers.
Exercise 1.7
Find the limits of the following functions as x → 0 and w → ∞.

    (a) sin(ax)/x,   (b) tan(ax)/x,   (c) sin(ax)/sin(bx),   (d) (3x + 4)/(4x + 2),   (e) (1 + z/w)^w.
For functions of two or more variables, the definition of continuity is essentially the same as for a function of one variable. A function f(x) is continuous at x = a if f(a) is defined and

    lim_{x→a} f(x) = f(a).   (1.6)

Alternatively, given any ε > 0 there is a δ > 0 such that whenever |x − a| < δ, |f(x) − f(a)| < ε.
It should be noted that if f(x, y) is continuous in each variable, it is not necessarily continuous in both variables. For instance, consider the function

    f(x, y) = (x + y)^2 / (x^2 + y^2)  for x^2 + y^2 ≠ 0,   f(0, 0) = 1,

and for fixed y = η ≠ 0 the related function of x,

    f(x, η) = (x + η)^2 / (x^2 + η^2) = 1 + O(x)   as x → 0,

and f(x, 0) = 1 for all x: for any η this function is a continuous function of x. On the line x + y = 0, however, f = 0 except at the origin so f(x, y) is not continuous along this line. More generally, by putting x = r cos θ and y = r sin θ, −π < θ ≤ π, r ≠ 0, we can approach the origin from any angle. In this representation f = 2 sin^2(θ + π/4), so on any circle round the origin f takes any value between 0 and 2. Therefore f(x, y) is not a continuous function of both x and y.
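The direction dependence of the limit is easy to see numerically; the following sketch (added for illustration) evaluates f at points approaching the origin along the lines y = x and y = −x.

    # f(x, y) = (x + y)**2 / (x**2 + y**2) away from the origin, f(0, 0) = 1.
    # Along y = x the values are 2, along y = -x they are 0, so no single
    # limit exists at the origin and f is not continuous there.
    def f(x, y):
        if x == 0.0 and y == 0.0:
            return 1.0
        return (x + y)**2 / (x**2 + y**2)

    for t in [0.1, 0.01, 0.001]:
        print(f(t, t), f(t, -t))    # prints 2.0 and 0.0 each time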
Exercise 1.8
Determine whether or not the following functions are continuous at the origin.

    (a) f = 2xy/(x^2 + y^2),   (b) f = (x^2 + y^2)/(x^2 − y^2),   (c) f = 2x^2 y/(x^2 + y^2).

Hint: use polar coordinates x = r cos θ, y = r sin θ and consider the limit r → 0.
1.3.3 Monotonic functions and inverse functions
A function is said to be monotonic on an interval if it is always increasing or always decreasing. Simple examples are f(x) = x and f(x) = exp(−x), which are monotonic increasing and monotonic decreasing, respectively, on the whole line: the function f(x) = sin x is monotonic increasing for −π/2 < x < π/2. More precisely, we have,

Definition 1.5
Monotonic functions: A function f(x) is monotonic increasing for a < x < b if f(x_1) ≤ f(x_2) for a < x_1 < x_2 < b.
A monotonic decreasing function is defined in a similar way.
If f(x_1) < f(x_2) for a < x_1 < x_2 < b then f(x) is said to be strictly monotonic (increasing) or strictly increasing; strictly decreasing functions are defined in the obvious manner.
The recognition of the intervals on which a given function is strictly monotonic is sometimes important because on these intervals the inverse function exists. For instance the function y = e^x is monotonic increasing on the whole real line, R, and its inverse is the well known natural logarithm, x = ln y, with y on the positive real line.
In general if f(x) is continuous and strictly monotonic on a ≤ x ≤ b and y = f(x), the inverse function, x = f^(−1)(y), is continuous for f(a) ≤ y ≤ f(b) and satisfies y = f(f^(−1)(y)). Moreover, if f(x) is strictly increasing so is f^(−1)(y).
Complications occur when a function is increasing and decreasing on neighbouring intervals, for then the inverse may have two or more values. For example the function f(x) = x^2 is monotonic increasing for x > 0 and monotonic decreasing for x < 0: hence the relation y = x^2 has the two familiar inverses x = ±√y, y ≥ 0. These two inverses are often referred to as the different branches of the inverse; this idea is important because most functions are monotonic only on part of their domain of definition.
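A tiny numerical illustration of branches (added here, using the example just given): each monotonic piece of y = x^2 supplies its own inverse.

    # The relation y = x**2 has two inverse branches, x = +sqrt(y) coming from
    # the increasing piece x >= 0 and x = -sqrt(y) from the decreasing piece
    # x <= 0; both reproduce the original y.
    import math
    y = 4.0
    x_plus, x_minus = math.sqrt(y), -math.sqrt(y)
    print(x_plus, x_minus)         # 2.0 -2.0
    print(x_plus**2, x_minus**2)   # 4.0 4.0, both equal to y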
Exercise 1.9
(a) Show that y = 3a^2 x − x^3 is strictly increasing for −a < x < a and that on this interval y increases from −2a^3 to 2a^3.
(b) By putting x = 2a sin θ and using the identity sin^3 θ = (3 sin θ − sin 3θ)/4, show that the equation becomes

    y = 2a^3 sin 3θ   and hence that   x(y) = 2a sin( (1/3) sin^(−1)( y/(2a^3) ) ).

(c) Find the inverse for x > 2a. Hint: put x = 2a cosh θ and use the relation cosh^3 θ = (cosh 3θ + 3 cosh θ)/4.
1.3.4 The derivative
The notion of the derivative of a continuous function, f(x), is closely related to the
geometric idea of the tangent to a curve and to the related concept of the rate of
change of a function, so is important in the discussion of anything that changes. This
geometric idea is illustrated in figure 1.2: here P is a point with coordinates (a, f(a))
on the graph and Q is another point on the graph with coordinates (a + h, f(a + h)),
where h may be positive or negative.
Figure 1.2 Illustration showing the chord PQ and the tangent line at P.
The gradient of the chord PQ is tan θ, where θ is the angle between PQ and the x-axis, and is given by the formula

    tan θ = ( f(a + h) − f(a) ) / h.

If the graph in the vicinity of x = a is represented by a smooth line, then it is intuitively obvious that the chord PQ becomes closer to the tangent at P as h → 0; and in the limit h = 0 the chord becomes the tangent. Hence the gradient of the tangent is given by the limit

    lim_{h→0} ( f(a + h) − f(a) ) / h.

This limit, provided it exists, is named the derivative of f(x) at x = a and is commonly denoted either by f'(a) or df/dx. Thus we have the formal definition:
Definition 1.6
The derivative: A function f(x), defined on an open interval U of the real line, is differentiable for x ∈ U and has the derivative f'(x) if

    f'(x) = df/dx = lim_{h→0} ( f(x + h) − f(x) ) / h,   (1.7)

exists.
If the derivative exists at every point in the open interval U the function f(x) is said to be differentiable in U: in this case it may be proved that f(x) is also continuous. However, a function that is continuous at a need not be differentiable at a: indeed, it is possible to construct functions that are continuous everywhere but differentiable nowhere; such functions are encountered in the mathematical description of Brownian motion.
Combining the definition of f'(x) and the definition 1.3 of the order notation shows that a differentiable function satisfies

    f(x + h) = f(x) + h f'(x) + o(h).   (1.8)

The formal definition, equation 1.7, of the derivative can be used to derive all its useful properties, but the physical interpretation, illustrated in figure 1.2, provides a more useful way to generalise it to functions of several variables.
The tangent line to the graph y = f(x) at the point a, which we shall consider to be fixed for the moment, has slope f'(a) and passes through the point (a, f(a)). These two facts determine the tangent line completely. The equation of the tangent line can be written in parametric form as p(h) = f(a) + f'(a)h. Conversely, given a point a, and the equation of the tangent line at that point, the derivative, in the classical sense of the definition 1.6, is simply the slope, f'(a), of this line. So the information that the derivative of f at a is f'(a) is equivalent to the information that the tangent line at a has equation p(h) = f(a) + f'(a)h. Although the classical derivative, equation 1.7, is usually taken to be the fundamental concept, the equivalent concept of the tangent line at a point could be considered equally fundamental, perhaps more so, since a tangent is a more intuitive idea than the numerical value of its slope. This is the key to successfully defining the derivative of functions of more than one variable.
From the definition 1.6 the following useful results follow. If f(x) and g(x) are differentiable on the same open interval and α and β are constants then

    (a)  d/dx ( αf(x) + βg(x) ) = αf'(x) + βg'(x),
    (b)  d/dx ( f(x)g(x) ) = f'(x)g(x) + f(x)g'(x),   (The product rule)
    (c)  d/dx ( f(x)/g(x) ) = ( f'(x)g(x) − f(x)g'(x) ) / g(x)^2,   g(x) ≠ 0.   (The quotient rule)

We leave the proof of these results to the reader, but note that the differential of 1/g(x) follows almost trivially from the definition 1.6, exercise 1.14, so that the third expression is a simple consequence of the second.
The other important result is the chain rule concerning the derivative of composite functions. Suppose that f(x) and g(x) are two differentiable functions and a third is formed by the composition,

    F(x) = f(g(x)),   sometimes written as F = f ∘ g,

which we assume to exist. Then the derivative of F(x) can be shown, as in exercise 1.18, to be given by

    dF/dx = (df/dg)(dg/dx)   or   F'(x) = f'(g) g'(x).   (1.9)

This formula is named the chain rule. Note how the prime-notation is used: it denotes the derivative of the function with respect to the argument shown, not necessarily the original independent variable, x. Thus f'(g) or f'(g(x)) does not mean the derivative of F(x); it means the derivative f'(x) with x replaced by g or g(x).
A simple example should make this clear: suppose f(x) = sin x and g(x) = 1/x, x > 0, so F(x) = sin(1/x). The chain rule gives

    dF/dx = [ d(sin g)/dg ] [ d(1/x)/dx ] = (cos g)(−1/x^2) = −(1/x^2) cos(1/x).
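A finite-difference check of this result (an illustrative sketch; the point x and the step h are arbitrary) compares the chain-rule formula with a central difference quotient.

    # Compare the chain-rule derivative of F(x) = sin(1/x), namely
    # F'(x) = -cos(1/x)/x**2, with a central difference approximation.
    import math
    def F(x):  return math.sin(1.0/x)
    def dF(x): return -math.cos(1.0/x) / x**2

    x, h = 0.7, 1.0e-6
    print(dF(x))                           # chain-rule value
    print((F(x + h) - F(x - h)) / (2*h))   # numerical estimate, nearly equal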
The derivatives of simple functions, polynomials and trigonometric functions for instance, can be deduced from first principles using the definition 1.6: the three rules, given above, and the chain rule can then be used to find the derivative of any function described with finite combinations of these simple functions. A few exercises will make this process clear.
Exercise 1.10
Find the derivative of the following functions

    (a) √( (a − x)(b + x) ),   (b) √( a sin^2 x + b cos^2 x ),   (c) cos(x^3) cos x,   (d) x^x.
Exercise 1.11
If y = sin x for −π/2 ≤ x ≤ π/2 show that

    dx/dy = 1/√(1 − y^2).
Exercise 1.12
(a) If y = f(x) has the inverse x = g(y), show that f'(x)g'(y) = 1, that is

    dx/dy = ( dy/dx )^(−1).

(b) Express d^2x/dy^2 in terms of dy/dx and d^2y/dx^2.
Clearly, if f'(x) is differentiable, it may be differentiated to obtain the second derivative, which is denoted by f''(x) or d^2f/dx^2. This process can be continued to obtain the functions

    f,  df/dx,  d^2f/dx^2,  d^3f/dx^3,  ...,  d^(n−1)f/dx^(n−1),  d^nf/dx^n,

where each member of the sequence is the derivative of the preceding member,

    d^pf/dx^p = d/dx ( d^(p−1)f/dx^(p−1) ),   p = 2, 3, ....

The prime notation becomes rather clumsy after the second or third derivative, so the most common alternative is

    d^pf/dx^p = f^(p)(x),   p ≥ 2,

with the conventions f^(1)(x) = f'(x) and f^(0)(x) = f(x). Care is needed to distinguish between the pth derivative, f^(p)(x), and the pth power, denoted by f(x)^p and sometimes f^p(x); the latter notation should be avoided if there is any danger of confusion.

Functions for which the nth derivative is continuous are said to be n-differentiable and to belong to class C^n: the notation C^n(U) means the first n derivatives are continuous on the interval U: the notation C^n(a, b) or C^n[a, b], with obvious meaning, may also be used. The term smooth function describes functions belonging to C^∞, that is functions, such as sin x, having all derivatives; we shall, however, use the term sufficiently smooth for functions that are sufficiently differentiable for all subsequent analysis to work, when more detail is deemed unimportant.
In the following exercises some important, but standard, results are derived.
Exercise 1.13
If f(x) is an even (odd) function, show that f'(x) is an odd (even) function.
Exercise 1.14
Show, from first principles using the limit 1.7, that

    d/dx ( 1/f(x) ) = − f'(x)/f(x)^2,

and that the product rule is true.
Exercise 1.15
Leibniz's rule
If h(x) = f(x)g(x) show that

    h''(x) = f''(x)g(x) + 2f'(x)g'(x) + f(x)g''(x),
    h^(3)(x) = f^(3)(x)g(x) + 3f''(x)g'(x) + 3f'(x)g''(x) + f(x)g^(3)(x),

and use induction to derive Leibniz's rule

    h^(n)(x) = Σ_{k=0}^{n} C(n, k) f^(n−k)(x) g^(k)(x),

where the binomial coefficients are given by

    C(n, k) = n! / ( k! (n − k)! ).
Exercise 1.16
Show that

    d/dx ln(f(x)) = f'(x)/f(x)

and hence that if p(x) = f_1(x) f_2(x) ... f_n(x) then

    p'/p = f_1'/f_1 + f_2'/f_2 + ... + f_n'/f_n,

provided p(x) ≠ 0. Note that this gives an easier method of differentiating products of three or more factors than repeated use of the product rule.
Exercise 1.17
If the elements of a determinant D(x) are differentiable functions of x,

    D(x) = | f(x)  g(x) |
           | φ(x)  ψ(x) |

show that

    D'(x) = | f'(x)  g'(x) |  +  | f(x)   g(x)  |
            | φ(x)   ψ(x)  |     | φ'(x)  ψ'(x) |

Extend this result to third-order determinants.
1.3.5 Mean Value Theorems
If a function f(x) is sufficiently smooth for all points inside the interval a < x < b, its graph is a smooth curve[12] starting at the point A = (a, f(a)) and ending at B = (b, f(b)), as shown in figure 1.3.

Figure 1.3 Diagram illustrating Cauchy's form of the mean value theorem.

[12] A smooth curve is one whose tangent changes direction continuously, without abrupt changes.
From this figure it seems plausible that the tangent to the curve must be parallel to the chord AB at least once. That is

    f'(x) = ( f(b) − f(a) ) / (b − a)   for some x in the interval a < x < b.   (1.10)

Alternatively this may be written in the form

    f(b) = f(a) + h f'(a + θh),   h = b − a,   (1.11)

where θ is a number in the interval 0 < θ < 1, and is normally unknown. This relation is used frequently throughout the course. Note that equation 1.11 shows that between zeros of a differentiable function there is at least one point at which the derivative is zero. Equation 1.10 can be proved and is enshrined in the following theorem.
Theorem 1.1
The Mean Value Theorem (Cauchy's form). If f(x) and g(x) are real and differentiable for a ≤ x ≤ b, then there is a point u inside the interval at which

    ( f(b) − f(a) ) g'(u) = ( g(b) − g(a) ) f'(u),   a < u < b.   (1.12)

By putting g(x) = x, equation 1.10 follows.
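A numerical illustration of equation 1.10 (added here, with f(x) = x^3 on [0, 2] chosen arbitrarily): the chord slope is 4, and a point where f'(x) equals this slope can be located by bisection.

    # For f(x) = x**3 on [0, 2] the chord slope is (f(2) - f(0))/2 = 4 and
    # f'(x) = 3x**2; bisection on g(x) = f'(x) - 4 finds the point promised
    # by the mean value theorem, x = 2/sqrt(3) = 1.1547...
    def g(x): return 3.0*x**2 - 4.0

    lo, hi = 0.0, 2.0
    for _ in range(60):
        mid = 0.5*(lo + hi)
        if g(lo)*g(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    print(0.5*(lo + hi))    # approximately 1.1547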
A similar idea may be applied to integrals. In figure 1.4 is shown a typical continuous function, f(x), which attains its smallest and largest values, S and L respectively, on the interval a ≤ x ≤ b.

Figure 1.4 Diagram showing the upper and lower bounds of f(x) used to bound the integral.
It is clear that the area under the curve is greater than (b − a)S and less than (b − a)L, that is

    (b − a)S ≤ ∫_a^b dx f(x) ≤ (b − a)L.

Because f(x) is continuous it follows that

    ∫_a^b dx f(x) = (b − a) f(ξ)   for some ξ ∈ [a, b].   (1.13)
This observation is made rigorous in the following theorem.
Theorem 1.2
The Mean Value theorem (integral form). If, on the closed interval a ≤ x ≤ b, f(x) is continuous and φ(x) ≥ 0, then there is a ξ satisfying a ≤ ξ ≤ b such that

    ∫_a^b dx f(x)φ(x) = f(ξ) ∫_a^b dx φ(x).   (1.14)

If φ(x) = 1 relation 1.13 is regained.
Exercise 1.18
The chain rule
In this exercise the Mean Value Theorem is used to derive the chain rule, equation 1.9, for the derivative of F(x) = f(g(x)).
Use the mean value theorem to show that

    F(x + h) − F(x) = f( g(x) + h g'(x + θh) ) − f(g(x))

and that

    f( g(x) + h g'(x + θh) ) = f(g(x)) + h g'(x + θh) f'( g + φ h g' )

where 0 < θ, φ < 1. Hence show that

    ( F(x + h) − F(x) ) / h = f'( g + φ h g' ) g'(x + θh),

and by taking the limit h → 0 derive equation 1.9.
Exercise 1.19
Use the integral form of the mean value theorem, equation 1.13, to evaluate the limits,

    (a) lim_{x→0} (1/x) ∫_0^x dt √(4 + 3t^3),   (b) lim_{x→1} 1/(x − 1)^3 ∫_1^x dt ln( 3t − 3t^2 + t^3 ).
1.3.6 Partial Derivatives
Here we consider functions of two or more variables, in order to introduce the idea of a partial derivative. If f(x, y) is a function of the two independent variables x and y, meaning that changes in one do not affect the other, then we may form the partial derivative of f(x, y) with respect to either x or y using a minor modification of the definition 1.6 (page 18).
Definition 1.7
The partial derivative of a function f(x, y) of two variables with respect to the first variable x is

    ∂f/∂x = f_x(x, y) = lim_{h→0} ( f(x + h, y) − f(x, y) ) / h.

In the computation of f_x the variable y is unchanged.
Similarly, the partial derivative with respect to the second variable y is

    ∂f/∂y = f_y(x, y) = lim_{k→0} ( f(x, y + k) − f(x, y) ) / k.

In the computation of f_y the variable x is unchanged.
We use the conventional notation, ∂f/∂x, to denote the partial derivative with respect to x, which is formed by fixing y and using the rules of ordinary calculus for the derivative with respect to x. The suffix notation, f_x(x, y), is used to denote the same function: here the suffix x shows the variable being differentiated, and it has the advantage that when necessary it can be used in the form f_x(a, b) to indicate that the partial derivative f_x is being evaluated at the point (a, b).
In practice the evaluation of partial derivatives is exactly the same as ordinary
derivatives and the same rules apply. Thus if f(x, y) = xe
y
ln(2x +3y) then the partial
derivatives with respect to x and y are, repectively
f
x
= e
y
ln(2x + 3y) +
2xe
y
2x + 3y
and
f
y
= xe
y
ln(2x + 3y) +
3xe
y
2x + 3y
.
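The two results above can be checked symbolically. The sketch below is an added illustration (assuming the SymPy library); it differentiates f with respect to each variable in turn, holding the other fixed.

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x * sp.exp(y) * sp.log(2*x + 3*y)

fx = sp.diff(f, x)   # partial derivative with respect to x, y held fixed
fy = sp.diff(f, y)   # partial derivative with respect to y, x held fixed

# Both differences simplify to zero, confirming the expressions in the text.
print(sp.simplify(fx - (sp.exp(y)*sp.log(2*x + 3*y) + 2*x*sp.exp(y)/(2*x + 3*y))))
print(sp.simplify(fy - (x*sp.exp(y)*sp.log(2*x + 3*y) + 3*x*sp.exp(y)/(2*x + 3*y))))
```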
Exercise 1.20
(a) If u = x^2 \sin(\ln y) compute u_x and u_y.
(b) If r^2 = x^2 + y^2 show that

    \frac{\partial r}{\partial x} = \frac{x}{r}   and   \frac{\partial r}{\partial y} = \frac{y}{r}.
The partial derivatives are also functions of x and y, so may be differentiated again. Thus we have

    \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right) = \frac{\partial^2 f}{\partial x^2} = f_{xx}(x, y)   and   \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial y}\right) = \frac{\partial^2 f}{\partial y^2} = f_{yy}(x, y).   (1.15)

But now we also have the mixed derivatives

    \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right)   and   \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right).   (1.16)

Except in special circumstances the order of differentiation is irrelevant, so we obtain the mixed derivative rule

    \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right) = \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right) = \frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial^2 f}{\partial y\,\partial x}.   (1.17)

Using the suffix notation the mixed derivative rule is f_{xy} = f_{yx}. A sufficient condition for this to hold is that both f_{xy} and f_{yx} are continuous functions of (x, y), see equation 1.6 (page 16).
Similarly, differentiating p times with respect to x and q times with respect to y, in any order, gives the same nth order derivative,

    \frac{\partial^n f}{\partial x^p\,\partial y^q}   where n = p + q,

provided all the nth derivatives are continuous.
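As a quick illustration of the mixed derivative rule (an added sketch, assuming SymPy), the two mixed derivatives of the earlier example f(x, y) = x e^y \ln(2x + 3y) can be compared directly:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x * sp.exp(y) * sp.log(2*x + 3*y)

f_xy = sp.diff(f, x, y)   # differentiate with respect to x, then y
f_yx = sp.diff(f, y, x)   # differentiate with respect to y, then x

print(sp.simplify(f_xy - f_yx))   # 0, as the mixed derivative rule requires
```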
Exercise 1.21
If \phi(x, y) = \exp(-x^2/y) show that \phi satisfies the equations

    \frac{\partial\phi}{\partial x} = -\frac{2x}{y}\,\phi   and   \frac{\partial^2\phi}{\partial x^2} = 4\frac{\partial\phi}{\partial y} - \frac{2\phi}{y}.
Exercise 1.22
Show that u = x^2 \sin(\ln y) satisfies the equation

    2y^2\frac{\partial^2 u}{\partial y^2} + 2y\frac{\partial u}{\partial y} + x\frac{\partial u}{\partial x} = 0.
The generalisation of these ideas to functions of the n variables x = (x_1, x_2, \ldots, x_n) is straightforward: the partial derivative of f(x) with respect to x_k is defined to be

    \frac{\partial f}{\partial x_k} = \lim_{h\to 0} \frac{f(x_1, x_2, \ldots, x_{k-1}, x_k + h, x_{k+1}, \ldots, x_n) - f(x_1, x_2, \ldots, x_n)}{h}.   (1.18)

All other properties of the derivatives are the same as in the case of two variables; in particular, for the mth derivative the order of differentiation is immaterial provided all mth derivatives are continuous.
For a function of a single variable, f(x), the existence of the derivative, f'(x), implies that f(x) is continuous. For functions of two or more variables the existence of the partial derivatives does not guarantee continuity.
The total derivative
If f(x_1, x_2, \ldots, x_n) is a function of n variables and if each of these variables is a function of the single variable t, we may form a new function of t with the formula

    F(t) = f(x_1(t), x_2(t), \ldots, x_n(t)).   (1.19)

Geometrically, F(t) represents the value of f(x) on a curve C defined parametrically by the functions (x_1(t), x_2(t), \ldots, x_n(t)). The derivative of F(t) is given by the relation

    \frac{dF}{dt} = \sum_{k=1}^{n} \frac{\partial f}{\partial x_k}\frac{dx_k}{dt},   (1.20)

so F'(t) is the rate of change of f(x) along C. Normally, we write f(t) rather than use a different symbol F(t), and the left-hand side of the above equation is written df/dt. This derivative is named the total derivative of f. The proof of this when n = 2 and x' and y' do not vanish near (x, y) is sketched below; the generalisation to larger n is straightforward. If F(t) = f(x(t), y(t)) then

    F(t + \epsilon) = f(x(t + \epsilon), y(t + \epsilon)) = f\bigl(x(t) + \epsilon x'(t + \theta\epsilon),\; y(t) + \epsilon y'(t + \phi\epsilon)\bigr),   0 < \theta, \phi < 1,

where we have used the mean value theorem, equation 1.11. Write the right-hand side in the form

    f(x + \epsilon x', y + \epsilon y') = \bigl[f(x + \epsilon x', y + \epsilon y') - f(x, y + \epsilon y')\bigr] + \bigl[f(x, y + \epsilon y') - f(x, y)\bigr] + F(t)

so that

    \frac{F(t + \epsilon) - F(t)}{\epsilon} = \frac{f(x + \epsilon x', y + \epsilon y') - f(x, y + \epsilon y')}{\epsilon x'}\,x' + \frac{f(x, y + \epsilon y') - f(x, y)}{\epsilon y'}\,y'.

Thus, on taking the limit \epsilon \to 0 we have

    \frac{dF}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}.

This result remains true if either or both of x' = 0 or y' = 0, but then more care is needed with the proof.
Equation 1.20 is used in chapter 4 to derive one of the most important results in the course: if the dependence of x upon t is linear, then F(t) has the form

    F(t) = f(x + th) = f(x_1 + th_1, x_2 + th_2, \ldots, x_n + th_n),

where the vector h is constant and the variable x_k has been replaced by x_k + th_k, for all k. Since \frac{d}{dt}(x_k + th_k) = h_k, equation 1.20 becomes

    \frac{dF}{dt} = \sum_{k=1}^{n} \frac{\partial f}{\partial x_k}\,h_k.   (1.21)

This result will also be used in section 1.3.9 to derive the Taylor series for several variables.
A variant of equation 1.19, which frequently occurs in the Calculus of Variations, is the case where f(x) depends explicitly upon the variable t, so this equation becomes

    F(t) = f(t, x_1(t), x_2(t), \ldots, x_n(t))

and then equation 1.20 acquires an additional term,

    \frac{dF}{dt} = \frac{\partial f}{\partial t} + \sum_{k=1}^{n} \frac{\partial f}{\partial x_k}\frac{dx_k}{dt}.   (1.22)

For an example we apply this formula to the function

    f(t, x, y) = x\sin(yt)   with   x = e^t and y = e^{-2t},

so

    F(t) = f\bigl(t, e^t, e^{-2t}\bigr) = e^t \sin\bigl(t e^{-2t}\bigr).

Equation 1.22 gives

    \frac{dF}{dt} = \frac{\partial f}{\partial t} + \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} = xy\cos(yt) + e^t\sin(yt) - 2xt\cos(yt)\,e^{-2t},

which can be expressed in terms of t only,

    \frac{dF}{dt} = (1 - 2t)e^{-t}\cos\bigl(te^{-2t}\bigr) + e^t\sin\bigl(te^{-2t}\bigr).

The same expression can also be obtained by direct differentiation of F(t) = e^t\sin\bigl(te^{-2t}\bigr).
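The agreement between the two routes can be confirmed symbolically. The sketch below is an added illustration (assuming SymPy); it compares equation 1.22 with direct differentiation of F(t).

```python
import sympy as sp

t, x, y = sp.symbols('t x y')
f = x * sp.sin(y*t)                    # f(t, x, y)
xt, yt = sp.exp(t), sp.exp(-2*t)       # x(t) and y(t)

# Total derivative via equation 1.22.
dF_chain = (sp.diff(f, t) + sp.diff(f, x)*sp.diff(xt, t)
            + sp.diff(f, y)*sp.diff(yt, t)).subs({x: xt, y: yt})

# Direct differentiation of F(t) = f(t, x(t), y(t)).
dF_direct = sp.diff(f.subs({x: xt, y: yt}), t)

print(sp.simplify(dF_chain - dF_direct))   # 0
```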
The right-hand sides of equations 1.20 and 1.22 depend upon both x and t, but because x depends upon t these expressions are often written in terms of t only. In the Calculus of Variations this is usually not helpful, because the dependence upon both x and t, separately, is important: for instance we often require expressions like

    \frac{d}{dt}\left(\frac{\partial F}{\partial x_1}\right)   and   \frac{\partial}{\partial x_1}\left(\frac{dF}{dt}\right).

The second of these expressions requires some clarification because dF/dt contains the derivatives x_k'. Thus

    \frac{\partial}{\partial x_1}\left(\frac{dF}{dt}\right) = \frac{\partial}{\partial x_1}\left(\frac{\partial f}{\partial t} + \sum_{k=1}^{n}\frac{\partial f}{\partial x_k}\frac{dx_k}{dt}\right).

Since x_k'(t) is independent of x_1 for all k, this becomes

    \frac{\partial}{\partial x_1}\left(\frac{dF}{dt}\right) = \frac{\partial^2 f}{\partial x_1\,\partial t} + \sum_{k=1}^{n}\frac{\partial^2 f}{\partial x_1\,\partial x_k}\frac{dx_k}{dt} = \frac{d}{dt}\left(\frac{\partial F}{\partial x_1}\right),

the last line being a consequence of the mixed derivative rule.
Exercise 1.23
If f(t, x, y) = xy - ty^2 and x = t^2, y = t^3 show that

    \frac{df}{dt} = -y^2 + y\frac{dx}{dt} + \frac{dy}{dt}(x - 2ty) = t^4(5 - 7t^2),
and that

    \frac{\partial}{\partial y}\left(\frac{df}{dt}\right) = \frac{dx}{dt} - 2y - 2t\frac{dy}{dt} = 2t\bigl(1 - 4t^2\bigr),

    \frac{d}{dt}\left(\frac{\partial f}{\partial y}\right) = \frac{d}{dt}(x - 2ty) = \frac{dx}{dt} - 2y - 2t\frac{dy}{dt} = 2t\bigl(1 - 4t^2\bigr).
Exercise 1.24
If F = \sqrt{1 + x_1 x_2}, and x_1 and x_2 are functions of t, show by direct calculation of each expression that

    \frac{\partial}{\partial x_1}\left(\frac{dF}{dt}\right) = \frac{d}{dt}\left(\frac{\partial F}{\partial x_1}\right) = \frac{x_2'}{2\sqrt{1 + x_1 x_2}} - \frac{x_2\,(x_1' x_2 + x_1 x_2')}{4(1 + x_1 x_2)^{3/2}}.
Exercise 1.25
Euler's formula for homogeneous functions
(a) A function f(x, y) is said to be homogeneous with degree p in x and y if it has the property f(\lambda x, \lambda y) = \lambda^p f(x, y), for any constant \lambda and real number p. For such a function prove Euler's formula:

    p f(x, y) = x f_x(x, y) + y f_y(x, y).

Hint: use the total derivative formula 1.20 and differentiate with respect to \lambda.
(b) Find the equivalent result for homogeneous functions of n variables that satisfy f(\lambda x) = \lambda^p f(x).
(c) Show that if f(x_1, x_2, \ldots, x_n) is a homogeneous function of degree p, then each of the partial derivatives, \partial f/\partial x_k, k = 1, 2, \ldots, n, is a homogeneous function of degree p - 1.
1.3.7 Implicit functions
An equation of the form f(x, y) = 0, where f is a suitably well behaved function of both x and y, can define a curve in the Cartesian plane, as illustrated in figure 1.5.

Figure 1.5 Diagram showing a typical curve defined by an equation of the form f(x, y) = 0.

For some values of x the equation f(x, y) = 0 can be solved to yield one or more real values of y, which will give one or more functions of x. For instance the equation x^2 + y^2 - 1 = 0 defines a circle in the plane and for each x in |x| < 1 there are two values of y, giving the two functions y(x) = \pm\sqrt{1 - x^2}. A more complicated example is the equation x - y + \sin(xy) = 0, which cannot be rearranged to express one variable in terms of the other.
Consider the smooth curve sketched in figure 1.5. On a segment in which the curve is not parallel to the y-axis the equation f(x, y) = 0 defines a function y(x). Such a function is said to be defined implicitly. The same equation will also define x(y), that is x as a function of y, provided the segment does not contain a point where the curve is parallel to the x-axis. This result, inferred from the picture, is a simple example of the implicit function theorem stated below.
Implicitly defined functions are important because they occur frequently as solutions of differential equations, see exercise 1.29, but there are few, if any, general rules that help understand them. It is, however, possible to obtain relatively simple expressions for the first derivatives, y'(x) and x'(y).
We assume that y(x) exists and is differentiable, as seems reasonable from figure 1.5, so F(x) = f(x, y(x)) is a function of x only and we may use the chain rule 1.22 to differentiate with respect to x. This gives

    \frac{dF}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y}\frac{dy}{dx}.

On the curve defined by f(x, y) = 0, F'(x) = 0 and hence

    \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y}\frac{dy}{dx} = 0   or   \frac{dy}{dx} = -\frac{\partial f/\partial x}{\partial f/\partial y}.   (1.23)
Similarly, if x(y) exists and is differentiable, a similar analysis using y as the independent variable gives

    \frac{\partial f}{\partial x}\frac{dx}{dy} + \frac{\partial f}{\partial y} = 0   or   \frac{dx}{dy} = -\frac{\partial f/\partial y}{\partial f/\partial x}.   (1.24)
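Equation 1.23 can be applied quite mechanically. The following sketch (an added illustration, assuming SymPy) computes y'(x) for the circle x^2 + y^2 - 1 = 0 and compares the result with SymPy's built-in implicit differentiation.

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 + y**2 - 1            # f(x, y) = 0 defines the unit circle

# Equation 1.23: dy/dx = -f_x / f_y.
dydx = -sp.diff(f, x) / sp.diff(f, y)
print(sp.simplify(dydx))        # -x/y

# SymPy's implicit differentiation gives the same expression.
print(sp.idiff(f, y, x))        # -x/y
```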
This result is encapsulated in the Implicit Function Theorem, which gives sufficient conditions for an equation of the form f(x, y) = 0 to have a solution y(x) satisfying f(x, y(x)) = 0. A restricted version of it is given here.

Theorem 1.3
Implicit Function Theorem. Suppose that f : U \to R is a function with continuous partial derivatives defined in an open set U \subset R^2. If there is a point (a, b) \in U for which f(a, b) = 0 and f_y(a, b) \ne 0, then there are open intervals I = (x_1, x_2) and J = (y_1, y_2) such that (a, b) lies in the rectangle I \times J and, for every x \in I, f(x, y) = 0 determines exactly one value y(x) \in J for which f(x, y(x)) = 0. The function y : I \to J is continuous and differentiable, with the derivative given by equation 1.23.
Exercise 1.26
In the case f(x, y) = y - g(x) show that equations 1.23 and 1.24 lead to the relation

    \frac{dx}{dy} = \left(\frac{dy}{dx}\right)^{-1}.

Exercise 1.27
If \ln(x^2 + y^2) = 2\tan^{-1}(y/x) find y'(x).

Exercise 1.28
If x - y + \sin(xy) = 0 determine the values of y'(0) and y''(0).

Exercise 1.29
Show that the differential equation

    \frac{dy}{dx} = \frac{y - a^2 x}{y + x},   y(1) = A > 0,

has a solution defined by the equation

    \frac{1}{2}\ln\bigl(a^2 x^2 + y^2\bigr) + \frac{1}{a}\tan^{-1}\left(\frac{y}{ax}\right) = B   where   B = \frac{1}{2}\ln\bigl(a^2 + A^2\bigr) + \frac{1}{a}\tan^{-1}\left(\frac{A}{a}\right).

Hint: the equation may be put in separable form by defining a new dependent variable v = y/x.
The implicit function theorem can be generalised to deal with the set of functions

    f_k(x, t) = 0,   k = 1, 2, \ldots, n,   (1.25)

where x = (x_1, x_2, \ldots, x_n) and t = (t_1, t_2, \ldots, t_m). These n equations have a unique solution for each x_k in terms of t, x_k = g_k(t), k = 1, 2, \ldots, n, in the neighbourhood of (x^0, t^0) provided that at this point the derivatives \partial f_j/\partial x_k exist and that the determinant

    J = \begin{vmatrix}
    \partial f_1/\partial x_1 & \partial f_1/\partial x_2 & \cdots & \partial f_1/\partial x_n \\
    \partial f_2/\partial x_1 & \partial f_2/\partial x_2 & \cdots & \partial f_2/\partial x_n \\
    \vdots & \vdots & & \vdots \\
    \partial f_n/\partial x_1 & \partial f_n/\partial x_2 & \cdots & \partial f_n/\partial x_n
    \end{vmatrix}   (1.26)

is not zero. Furthermore all the functions g_k(t) have continuous first derivatives. The determinant J is named the Jacobian determinant or, more usually, the Jacobian. It is often helpful to use either of the following notations for the Jacobian,

    J = \left|\frac{\partial f}{\partial x}\right|   or   J = \frac{\partial(f_1, f_2, \ldots, f_n)}{\partial(x_1, x_2, \ldots, x_n)}.   (1.27)
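In simple cases the Jacobian of equation 1.26 is easy to evaluate. The sketch below is an added illustration (assuming SymPy); it computes the Jacobian for the polar-coordinate map used in exercise 1.30, and the determinant is r, which is non-zero everywhere except at the origin.

```python
import sympy as sp

r, theta = sp.symbols('r theta', positive=True)
x = r * sp.cos(theta)
y = r * sp.sin(theta)

# Jacobian matrix of (x, y) with respect to (r, theta) and its determinant.
J = sp.Matrix([x, y]).jacobian(sp.Matrix([r, theta]))
print(sp.simplify(J.det()))    # r
```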
Exercise 1.30
Show that the equations x = r\cos\theta, y = r\sin\theta can be inverted to give functions r(x, y) and \theta(x, y) in every open set of the plane that does not include the origin.
1.3.8 Taylor series for one variable
The Taylor series is a method of representing a given sufficiently well behaved function in terms of an infinite power series, defined in the following theorem.

Theorem 1.4
Taylor's Theorem. If f(x) is a function defined on x_1 \le x \le x_2 such that f^{(n)}(x) is continuous for x_1 \le x \le x_2 and f^{(n+1)}(x) exists for x_1 < x < x_2, then if a \in [x_1, x_2], for every x \in [x_1, x_2],

    f(x) = f(a) + (x - a)f'(a) + \frac{(x - a)^2}{2!}f''(a) + \cdots + \frac{(x - a)^n}{n!}f^{(n)}(a) + R_{n+1}.   (1.28)

The remainder term, R_{n+1}, can be expressed in the form

    R_{n+1} = \frac{(x - a)^{n+1}}{(n + 1)!}f^{(n+1)}(a + \theta h)   for some 0 < \theta < 1 and h = x - a.   (1.29)

If all derivatives of f(x) are continuous for x_1 \le x \le x_2, and if the remainder term R_n \to 0 as n \to \infty in a suitable manner, we may take the limit to obtain the infinite series

    f(x) = \sum_{k=0}^{\infty} \frac{(x - a)^k}{k!}f^{(k)}(a).   (1.30)

The infinite series 1.30 is known as Taylor's series, and the point x = a the point of expansion. A similar series exists when x takes complex values.
Care is needed when taking the limit of 1.28 as n \to \infty, because there are cases when the infinite series on the right-hand side of equation 1.30 does not equal f(x). If, however, the Taylor series converges to f(x) at x = \xi, then for any x closer to a than \xi, that is |x - a| < |\xi - a|, the series converges to f(x). This caveat is necessary because of the strange example g(x) = \exp(-1/x^2), for which all derivatives are continuous and are zero at x = 0; for this function the Taylor series about x = 0 can be shown to exist, but for all x it converges to zero rather than to g(x). This means that for any well behaved function, f(x) say, with a Taylor series that converges to f(x), a different function, f(x) + g(x), can be formed whose Taylor series converges, but to f(x) not f(x) + g(x). This strange behaviour is not uncommon in functions arising from physical problems; however, it is ignored in this course and we shall assume that the Taylor series derived from a function converges to it in some interval.
The series 1.30 was first published by Brook Taylor (1685-1731) in 1715: the result obtained by putting a = 0 was discovered by Stirling (1692-1770) in 1717 but first published by Maclaurin (1698-1746) in 1742. With a = 0 this series is therefore often known as Maclaurin's series.
In practice, of course, it is usually impossible to sum the infinite series 1.30, so it is necessary to truncate it at some convenient point, and this requires knowledge of how, or indeed whether, the series converges to the required value. Truncation gives rise to the Taylor polynomials, with the order-n polynomial given by

    f(x) \approx \sum_{k=0}^{n} \frac{(x - a)^k}{k!}f^{(k)}(a).   (1.31)
The series 1.30 is an infinite series of the functions (x - a)^n f^{(n)}(a)/n!, and summing these requires care. A proper understanding of this process requires careful definitions of convergence, which may be found in any text book on analysis. For our purposes, however, it is sufficient to note that in most cases there is a real number, r_c, named the radius of convergence, such that if |x - a| < r_c the infinite series is well mannered and behaves rather like a finite sum: the value of r_c can be infinite, in which case the series converges for all x.
If the Taylor series of f(x) and g(x) have radii of convergence r_f and r_g respectively, then the Taylor series of \alpha f(x) + \beta g(x), for constants \alpha and \beta, and of f(x)^a g(x)^b, for positive constants a and b, exist and have radius of convergence at least \min(r_f, r_g). The Taylor series of the compositions f(g(x)) and g(f(x)) may also exist, but their radii of convergence depend upon the behaviour of g and f respectively. Also, Taylor series may be integrated and differentiated to give the Taylor series of the integral and derivative of the original function, with the same radius of convergence.
Formally, the nth Taylor polynomial of a function is formed from its first n derivatives at the point of expansion. In practice, however, the calculation of high-order derivatives is very awkward and it is often easier to proceed by other means, which rely upon ingenuity. A simple example is the Taylor series of \ln(1 + \tanh x), to fourth order; this is most easily obtained using the known Taylor expansions of \ln(1 + z) and \tanh x,

    \ln(1 + z) = z - \frac{z^2}{2} + \frac{z^3}{3} - \frac{z^4}{4} + O(z^5)   and   \tanh x = x - \frac{x^3}{3} + \frac{2x^5}{15} + O(x^7),

and then put z = \tanh x, retaining only the appropriate order of the series expansion. Thus

    \ln(1 + \tanh x) = \left(x - \frac{x^3}{3} + O(x^5)\right) - \frac{x^2}{2}\left(1 - \frac{x^2}{3} + \cdots\right)^2 + \frac{x^3}{3} - \frac{x^4}{4} + O(x^5)
                     = x - \frac{x^2}{2} + \frac{x^4}{12} + O(x^5).

This method is far easier than computing the four required derivatives of the original function.
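The fourth-order result can be confirmed in one line with a computer algebra system; the sketch below is an added aside (assuming SymPy) which expands the composite function directly.

```python
import sympy as sp

x = sp.symbols('x')
print(sp.series(sp.log(1 + sp.tanh(x)), x, 0, 5))
# x - x**2/2 + x**4/12 + O(x**5), in agreement with the expansion above
```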
For |x - a| > r_c the infinite sum 1.30 does not exist. It follows that knowledge of r_c is important. It can be shown that, in most cases of practical interest, its value is given by either of the limits

    r_c = \lim_{n\to\infty}\left|\frac{a_n}{a_{n+1}}\right|   or   r_c = \lim_{n\to\infty}|a_n|^{-1/n}   where   a_k = \frac{f^{(k)}(a)}{k!}.   (1.32)

Usually the first expression is the most useful. Typically we have, for large n,

    \left|\frac{n!}{f^{(n)}(a)}\right|^{1/n} = r_c\bigl(1 + O(1/n)\bigr)   so that   \left|\frac{n!}{f^{(n)}(a)}\right| = A\,r_c^n\bigl(1 + O(1/n)\bigr)

for some constant A. Then the nth term of the series behaves as ((x - a)/r_c)^n, and decreases rapidly with increasing n provided |x - a| < r_c and n is sufficiently large.
Superficially, the Taylor series appears to be a useful representation and a good approximation. In general this is not true unless |x - a| is small; for practical applications far more efficient approximations exist, that is, they achieve the same accuracy for far less work. The basic problem is that the Taylor expansion uses knowledge of the function at one point only, and the larger |x - a| the more terms are required for a given accuracy. More sensible approximations, on a given interval, take into account information from the whole interval: we describe some approximations of this type in chapter 13.
The first practical problem is that the remainder term, equation 1.29, depends upon \theta, the value of which is unknown. Hence R_n cannot be computed; also, it is normally difficult to estimate.
In order to understand how these series converge we need to consider the magnitude of the nth term in the Taylor series: this type of analysis is important for any numerical evaluation of power series. The nth term is a product of (x - a)^n/n! and f^{(n)}(a). Using Stirling's approximation,

    n! = \sqrt{2\pi n}\left(\frac{n}{e}\right)^n\bigl(1 + O(1/n)\bigr),   (1.33)

we can approximate the first part of this product by

    \left|\frac{(x - a)^n}{n!}\right| \approx \frac{1}{\sqrt{2\pi n}}\left(\frac{e|x - a|}{n}\right)^n = g_n.   (1.34)

The expression g_n decreases very rapidly with increasing n, provided n is large enough. Hence the term |x - a|^n/n! may be made as small as we please. But for practical applications this is not sufficient; in figure 1.6 we plot a graph of the values of \log(g_n), that is the logarithm to the base 10, for x - a = 10.

Figure 1.6 Graph showing the value of \log(g_n), equation 1.34, for x - a = 10. For clarity we have joined the points with a continuous line.

In this example the maximum of g_n is at n = 10 and has a value of about 2500, before it starts to decrease. It is fairly simple to show that g_n has a maximum at n \approx |x - a| and here its value is \max(g_n) \approx \exp(|x - a|)/\sqrt{2\pi|x - a|}.
The value of f^{(n)}(a) is also difficult to estimate, but it usually increases rapidly with n. Bizarrely, in many cases of interest, this behaviour depends upon the behaviour of f(z), where z is a complex variable. An understanding of this requires a study of Complex Variable Theory, which is beyond the scope of this chapter. Instead we illustrate the behaviour of Taylor polynomials with a simple example.
First consider the Taylor series of \sin x, about x = 0,

    \sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} + \cdots + (-1)^{n-1}\frac{x^{2n-1}}{(2n - 1)!} + \cdots,   (1.35)

which is derived in exercise 1.31.
Note that only odd powers occur, because \sin x is an odd function, and also that the radius of convergence is infinite. In figure 1.7 we show graphs of this series, truncated at x^{2n-1} with n = 1, 4, 8 and 15, for 0 < x < 4\pi.

Figure 1.7 Graph comparing the Taylor polynomials, of order n, for the sine function with the exact function, the dashed line.

These graphs show that for large x it is necessary to include many terms in the series to obtain an accurate representation of \sin x. The reason is simply that for fixed, large x, x^{2n-1}/(2n - 1)! is very large at n \approx x, as shown in figure 1.6. Because the terms of this series alternate in sign, the large terms in the early part of the series partially cancel and cause problems when approximating a function of O(1): it is worth noting that, as a consequence, with a computer having finite accuracy there is a value of x beyond which the Taylor series for \sin x gives incorrect values, despite the fact that formally it converges for all x.
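This loss of accuracy is easy to demonstrate. The following sketch (an added illustration in Python, not part of the original text) sums the series 1.35 in double precision and compares the result with the library value of \sin x.

```python
import math

def sin_taylor(x, terms=100):
    """Partial sum of the Taylor series of sin(x) about x = 0."""
    total, term = 0.0, x
    for n in range(1, terms + 1):
        total += term
        term *= -x*x / ((2*n) * (2*n + 1))   # generates the next odd-power term
    return total

for x in (1.0, 10.0, 30.0, 40.0):
    print(x, sin_taylor(x), math.sin(x))
# For small x the two values agree to machine accuracy, but the intermediate
# terms grow like e^x / sqrt(2*pi*x); their rounding errors swamp the O(1)
# answer, and by x of roughly 35 to 40 the summed series is badly wrong.
```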
Exercise 1.31
Exponential and trigonometric functions
If f(x) = \exp(ix) show that f^{(n)}(x) = i^n\exp(ix) and hence that its Taylor series is

    e^{ix} = \sum_{k=0}^{\infty} \frac{(ix)^k}{k!}.

Show that the radius of convergence of this series is infinite. Deduce that

    \cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} + \cdots + \frac{(-1)^n x^{2n}}{(2n)!} + \cdots,
    \sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} + \cdots + \frac{(-1)^n x^{2n+1}}{(2n + 1)!} + \cdots.
Exercise 1.32
Binomial expansion
Show that the Taylor series of (1 + x)^a is

    (1 + x)^a = 1 + ax + \frac{1}{2}a(a - 1)x^2 + \cdots + \frac{a(a - 1)(a - 2)\cdots(a - k + 1)}{k!}x^k + \cdots.

When a = n is an integer this series terminates at k = n and becomes the binomial expansion

    (1 + x)^n = \sum_{k=0}^{n}\binom{n}{k}x^k   where   \binom{n}{k} = \frac{n!}{k!\,(n - k)!}

are the binomial coefficients.
Exercise 1.33
If f(x) = \tan x find the first three derivatives to show that \tan x = x + \frac{1}{3}x^3 + O(x^5).

Exercise 1.34
The natural logarithm
(a) Show that \frac{1}{1 + t} = 1 - t + t^2 + \cdots + (-1)^n t^n + \cdots and use the definition of the natural logarithm, \ln(1 + x) = \int_0^x dt\,\frac{1}{1 + t}, to show that

    \ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} + \cdots + \frac{(-1)^{n-1}x^n}{n} + \cdots.

(b) For which values of x is this expression valid?
(c) Use this result to show that \ln\left(\frac{1 + x}{1 - x}\right) = 2\left(x + \frac{x^3}{3} + \cdots + \frac{x^{2n-1}}{2n - 1} + \cdots\right).

Exercise 1.35
The inverse tangent function
Use the definition \tan^{-1}x = \int_0^x dt\,\frac{1}{1 + t^2} to show that, for |x| < 1,

    \tan^{-1}x = \sum_{k=0}^{\infty} (-1)^k\frac{x^{2k+1}}{2k + 1}.

Exercise 1.36
Show that \ln(1 + \sinh x) = x - \frac{x^2}{2} + \frac{x^3}{2} - \frac{5x^4}{12} + O(x^5).

Exercise 1.37
Obtain the first five terms of the Taylor series of the function that satisfies the equation

    (1 + x)\frac{dy}{dx} = 1 + xy + y^2,   y(0) = 0.

Hint: use Leibniz's rule, given in exercise 1.15 (page 21), to differentiate the equation n times.
1.3.9 Taylor series for several variables
The Taylor series of a function f : R^m \to R is readily derived from the Taylor expansion of a function of one variable using the chain rule, equation 1.21 (page 26). The only difficulty is that the algebra very quickly becomes unwieldy with increasing order.
We require the expansion of f(x) about x = a, so we need to represent f(a + h) as some sort of power series in h. To this end, define a function of the single variable t by the relation

    F(t) = f(a + th)   so   F(0) = f(a),

and F(t) gives values of f(x) on the straight line joining a to a + h. The Taylor series of F(t) about t = 0 is, on using equation 1.28 (page 31),

    F(t) = F(0) + tF'(0) + \frac{t^2}{2!}F''(0) + \cdots + \frac{t^n}{n!}F^{(n)}(0) + R_{n+1},   (1.36)

which we assume to exist for |t| \le 1. Now we need only express the derivatives F^{(n)}(0) in terms of the partial derivatives of f(x). Equation 1.21 (page 26) gives

    F'(0) = \sum_{k=1}^{m} \frac{\partial f}{\partial x_k}(a)\,h_k.

Hence to first order the Taylor series is

    f(a + h) = f(a) + \sum_{k=1}^{m} h_k\frac{\partial f}{\partial x_k}(a) + R_2 = f(a) + h\cdot\frac{\partial f}{\partial a} + R_2,   (1.37)

where R_2 is the remainder term, which is second order in h and is given below. Here we have introduced the notation \partial f/\partial x for the vector function

    \frac{\partial f}{\partial x} = \left(\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_m}\right)   with the scalar product   h\cdot\frac{\partial f}{\partial x} = \sum_{k=1}^{m} h_k\frac{\partial f}{\partial x_k}.
For the second derivative we use equation 1.21 (page 26) again,

    F''(t) = \sum_{k=1}^{m} h_k\frac{d}{dt}\frac{\partial f}{\partial x_k}(a + th) = \sum_{k=1}^{m} h_k\left(\sum_{i=1}^{m} h_i\frac{\partial^2 f}{\partial x_k\,\partial x_i}(a + th)\right).

At t = 0 this can be written in the form

    F''(0) = \sum_{k=1}^{m} h_k\sum_{i=1}^{m} h_i\frac{\partial^2 f}{\partial x_k\,\partial x_i}(a) = \sum_{k=1}^{m} h_k^2\frac{\partial^2 f}{\partial x_k^2}(a) + 2\sum_{k=1}^{m-1}\sum_{i=k+1}^{m} h_k h_i\frac{\partial^2 f}{\partial x_k\,\partial x_i}(a),   (1.38)

where the second relation comprises fewer terms because the mixed derivative rule has been used. This gives the second order Taylor series,

    f(a + h) = f(a) + \sum_{k=1}^{m} h_k\frac{\partial f}{\partial x_k}(a) + \frac{1}{2!}\left(\sum_{k=1}^{m} h_k\sum_{i=1}^{m} h_i\frac{\partial^2 f}{\partial x_k\,\partial x_i}(a)\right) + R_3,   (1.39)

where the remainder term is given below.
The higher-order terms are derived in exactly the same manner, but the algebra quickly becomes cumbersome. It helps, however, to use the linear differential operator h\cdot\partial/\partial a to write the derivatives of F(t) at t = 0 in the more convenient form,

    F'(0) = \left(h\cdot\frac{\partial}{\partial a}\right)f(a),   F''(0) = \left(h\cdot\frac{\partial}{\partial a}\right)^2 f(a)   and   F^{(n)}(0) = \left(h\cdot\frac{\partial}{\partial a}\right)^n f(a).   (1.40)

Then we can write the Taylor series in the form

    f(a + h) = f(a) + \sum_{s=1}^{n} \frac{1}{s!}\left(h\cdot\frac{\partial}{\partial a}\right)^s f(a) + R_{n+1},   (1.41)

where the remainder term is

    R_{n+1} = \frac{1}{(n + 1)!}F^{(n+1)}(\xi)   for some 0 < \xi < 1.

Because the high order derivatives are so cumbersome, and for the practical reasons discussed in section 1.3.8, in particular figure 1.7 (page 34), Taylor series for many variables are rarely used beyond the second order term. This term, however, is important for the classification of stationary points, considered in chapter 8.
For functions of two variables, (x, y), the Taylor series is

    f(a + h, b + k) = f(a, b) + hf_x + kf_y + \frac{1}{2}\bigl(h^2 f_{xx} + 2hkf_{xy} + k^2 f_{yy}\bigr)
        + \frac{1}{6}\bigl(h^3 f_{xxx} + 3h^2 kf_{xxy} + 3hk^2 f_{xyy} + k^3 f_{yyy}\bigr) + \cdots
        + \sum_{r=0}^{s} \frac{h^{s-r}k^r}{(s - r)!\,r!}\frac{\partial^s f}{\partial x^{s-r}\,\partial y^r} + \cdots + R_{n+1},   (1.42)

where all derivatives are evaluated at (a, b). In this case the sth term is relatively easy to obtain by expanding the differential operator (h\,\partial/\partial x + k\,\partial/\partial y)^s using the binomial expansion (which works because the mixed derivative rule means that the two operators \partial/\partial x and \partial/\partial y commute).
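Expansions of this kind are conveniently produced by machine. The sketch below is an added illustration (assuming SymPy, and using a function chosen for this purpose rather than one taken from the exercises); it evaluates the second-order part of equation 1.42 for f(x, y) = e^x\cos y about (0, 0).

```python
import sympy as sp

x, y, h, k = sp.symbols('x y h k')
f = sp.exp(x) * sp.cos(y)       # an arbitrary smooth example, expanded about (0, 0)
at0 = {x: 0, y: 0}

# Second-order part of equation 1.42, with all derivatives evaluated at (0, 0).
taylor2 = (f.subs(at0)
           + h*sp.diff(f, x).subs(at0) + k*sp.diff(f, y).subs(at0)
           + sp.Rational(1, 2)*(h**2*sp.diff(f, x, 2).subs(at0)
                                + 2*h*k*sp.diff(f, x, y).subs(at0)
                                + k**2*sp.diff(f, y, 2).subs(at0)))
print(sp.expand(taylor2))
# 1 + h + h**2/2 - k**2/2, the terms of total degree <= 2 in e^h cos(k)
```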
Exercise 1.38
Find the Taylor expansions about x = y = 0, up to and including the second order terms, of the functions
(a) f(x, y) = \sin x\sin y,   (b) f(x, y) = \sin\bigl(x + e^y - 1\bigr).

Exercise 1.39
Show that the third-order Taylor series for a function, f(x, y, z), of three variables is

    f(a + h, b + k, c + l) = f(a, b, c) + hf_x + kf_y + lf_z
        + \frac{1}{2!}\bigl(h^2 f_{xx} + k^2 f_{yy} + l^2 f_{zz} + 2hkf_{xy} + 2klf_{yz} + 2lhf_{zx}\bigr)
        + \frac{1}{3!}\bigl(h^3 f_{xxx} + k^3 f_{yyy} + l^3 f_{zzz} + 6hklf_{xyz}
        + 3hk^2 f_{xyy} + 3hl^2 f_{xzz} + 3kh^2 f_{yxx} + 3kl^2 f_{yzz} + 3lh^2 f_{zxx} + 3lk^2 f_{zyy}\bigr).
1.3.10 L'Hospital's rule
Ratios of functions occur frequently and if

    R(x) = \frac{f(x)}{g(x)}   (1.43)

the value of R(x) is normally computed by dividing the value of f(x) by the value of g(x): this works provided g(x) is not zero at the point in question, x = a say. If g(x) and f(x) are simultaneously zero at x = a, the value of R(a) may be redefined as a limit. For instance if

    R(x) = \frac{\sin x}{x}   (1.44)

then the value of R(0) is not defined, though R(x) does tend to the limit R(x) \to 1 as x \to 0. Here we show how this limit may be computed using L'Hospital's rule¹³ and its extensions, discovered by the French mathematician G F A Marquis de l'Hospital (1661-1704).
Suppose that at x = a, f(a) = g(a) = 0 and that each function has a Taylor series about x = a, with finite radii of convergence: thus near x = a we have, for small, non-zero |\epsilon|,

    R(a + \epsilon) = \frac{f(a + \epsilon)}{g(a + \epsilon)} = \frac{\epsilon f'(a) + O(\epsilon^2)}{\epsilon g'(a) + O(\epsilon^2)} = \frac{f'(a)}{g'(a)} + O(\epsilon)   provided g'(a) \ne 0.

Hence, on taking the limit \epsilon \to 0, we obtain the result given by the following theorem.

Theorem 1.5
L'Hospital's rule. Suppose that f(x) and g(x) are real and differentiable in (a, b) and g'(x) \ne 0 for all x \in (a, b), where -\infty \le a < b \le \infty. If

    \lim_{x\to a} f(x) = \lim_{x\to a} g(x) = 0   or   \lim_{x\to a} g(x) = \infty,

then

    \lim_{x\to a} \frac{f(x)}{g(x)} = \lim_{x\to a} \frac{f'(x)}{g'(x)},   (1.45)

provided the right-hand limit exists. The proof of L'Hospital's rule is given in Rudin (1976, page 109)¹⁴.

More generally, if f^{(k)}(a) = g^{(k)}(a) = 0, k = 0, 1, \ldots, n - 1, and g^{(n)}(a) \ne 0, then

    \lim_{x\to a} \frac{f(x)}{g(x)} = \lim_{x\to a} \frac{f^{(n)}(x)}{g^{(n)}(x)},

provided the right-hand limit exists.
Consider the function defined by equation 1.44; at x = 0 L'Hospital's rule gives

    R(0) = \lim_{x\to 0} \frac{\sin x}{x} = \lim_{x\to 0} \frac{\cos x}{1} = 1.
¹³ Here we use the spelling of the French national bibliography, as used by L'Hospital. Some modern texts use the spelling L'Hôpital, instead of the silent s.
¹⁴ In some versions of this theorem the condition \lim_{x\to a} g(x) = \infty is replaced by \lim_{x\to a} g(x) = \lim_{x\to a} f(x) = \infty, which is unnecessarily restrictive.
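Limits of this kind are also easily checked by machine. The short sketch below is an added aside (assuming SymPy); the second limit is one that needs the rule to be applied twice.

```python
import sympy as sp

x = sp.symbols('x')
print(sp.limit(sp.sin(x)/x, x, 0))            # 1, as found above
print(sp.limit((sp.cos(x) - 1)/x**2, x, 0))   # -1/2
```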
Exercise 1.40
Find the values of the following limits:

    (a) \lim_{x\to a} \frac{\cosh x - \cosh a}{\sinh x - \sinh a},   (b) \lim_{x\to 0} \frac{\sin x - x}{x\cos x - x},   (c) \lim_{x\to 0} \frac{3^x - 3^{-x}}{2^x - 2^{-x}}.

Exercise 1.41
(a) If f(a) = g(a) = 0 and \lim_{x\to a} \frac{f'(x)}{g'(x)} = \infty, show that \lim_{x\to a} \frac{f(x)}{g(x)} = \infty.
(b) If both f(x) and g(x) are positive in a neighbourhood of x = a, tend to infinity as x \to a, and \lim_{x\to a} \frac{f'(x)}{g'(x)} = A \ne 0, show that \lim_{x\to a} \frac{f(x)}{g(x)} = A.

1.3.11 Integration
The study of integration arose from the need to compute areas and volumes. The theory of integration was developed independently from the theory of differentiation, and the Fundamental Theorem of Calculus, described in note P:I on page 40, relates these processes. It should be noted, however, that Newton knew of the relation between gradients and areas and exploited it in his development of the subject.
In this section we provide a very brief outline of the simple theory of integration and discuss some of the methods used to evaluate integrals. This section is included for reference purposes; however, although the theory of integration is not central to the main topic of this course, you should be familiar with its contents. The important idea, needed in chapter 4, is that of differentiating with respect to a parameter, or differentiating under the integral sign, described in equation 1.52 (page 43).
In this discussion of integration we use an intuitive notion of area and refer the reader to suitable texts, Apostol (1963), Rudin (1976) or Whittaker and Watson (1965) for instance, for a rigorous treatment.
If f(x) is a real, continuous function on the interval a \le x \le b, it is intuitively clear that the area between the graph and the x-axis can be approximated by the sum of the areas of a set of rectangles, as shown by the dashed lines in figure 1.8.

Figure 1.8 Diagram showing how the area under the curve y = f(x) may be approximated by a set of rectangles. The intervals x_k - x_{k-1} need not be the same length.
In general the closed interval a \le x \le b may be partitioned by a set of n - 1 distinct, ordered points

    a = x_0 < x_1 < x_2 < \cdots < x_{n-1} < x_n = b

to produce n sub-divisions: in figure 1.8, n = 6 and the spacings are equal. On each interval we construct a rectangle: on the kth interval the height is f(l_k), chosen to be the smallest value of f(x) in the interval. These rectangles are shown in the figure. Another set of rectangles of height f(h_k), chosen to be the largest value of f(x) in the interval, can also be formed. If A is the area under the graph it follows that

    \sum_{k=1}^{n} (x_k - x_{k-1})\,f(l_k) \le A \le \sum_{k=1}^{n} (x_k - x_{k-1})\,f(h_k).   (1.46)

This type of approximation underlies the simplest numerical methods of approximating integrals and, as will be seen in chapter 4, is the basis of Euler's approximations to variational problems.
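Inequality 1.46 translates directly into a short computation. The sketch below is an added illustration (plain Python with NumPy); it evaluates the lower and upper rectangle sums for f(x) = x^2 on [0, 1], which bracket the exact area 1/3.

```python
import numpy as np

f = lambda x: x**2
a, b, n = 0.0, 1.0, 100
x = np.linspace(a, b, n + 1)      # partition points x_0, ..., x_n
widths = np.diff(x)

# f is increasing on [0, 1], so its smallest (largest) value on each
# sub-interval is taken at the left (right) end point.
lower = np.sum(widths * f(x[:-1]))
upper = np.sum(widths * f(x[1:]))
print(lower, 1/3, upper)          # lower < 1/3 < upper
```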
The theory of integration developed by Riemann (1826-1866) shows that for continuous functions these two bounds approach each other, as n \to \infty, in a meaningful manner, and defines the wider class of functions for which this limit exists. When these limits exist their common value is named the integral of f(x) and is denoted by

    \int_a^b dx\,f(x)   or   \int_a^b f(x)\,dx.   (1.47)

In this context the function f(x) is named the integrand, and b and a the upper and lower integration limits, or just limits. It can be shown that the integral exists for bounded, piecewise continuous functions and also for some unbounded functions.
From this definition the following elementary properties can be derived.

P:I  If F(x) is a differentiable function and F'(x) = f(x) then F(x) = F(a) + \int_a^x dt\,f(t). This is the Fundamental Theorem of Calculus and is important because it provides one of the most useful tools for evaluating integrals.

P:II  \int_a^b dx\,f(x) = -\int_b^a dx\,f(x).

P:III  \int_a^b dx\,f(x) = \int_a^c dx\,f(x) + \int_c^b dx\,f(x), provided all integrals exist. Note, it is not necessary that c lies in the interval (a, b).

P:IV  \int_a^b dx\,\bigl(\alpha f(x) + \beta g(x)\bigr) = \alpha\int_a^b dx\,f(x) + \beta\int_a^b dx\,g(x), where \alpha and \beta are real or complex numbers.

P:V  \left|\int_a^b dx\,f(x)\right| \le \int_a^b dx\,|f(x)|. This is the analogue of the finite sum inequality \left|\sum_{k=1}^{n} a_k\right| \le \sum_{k=1}^{n} |a_k|, where a_k, k = 1, 2, \ldots, n, are a set of complex numbers or functions.
P:VI  The Cauchy-Schwarz inequality for real functions is

    \left(\int_a^b dx\,f(x)g(x)\right)^2 \le \left(\int_a^b dx\,f(x)^2\right)\left(\int_a^b dx\,g(x)^2\right)

with equality if and only if g(x) = cf(x) for some real constant c. This inequality is sometimes named the Cauchy inequality and sometimes the Schwarz inequality. It is the analogue of the finite sum inequality

    \left(\sum_{k=1}^{n} a_k b_k\right)^2 \le \left(\sum_{k=1}^{n} a_k^2\right)\left(\sum_{k=1}^{n} b_k^2\right)

with equality if and only if b_k = ca_k for all k and some real constant c.

P:VII  The Hölder inequality: if \frac{1}{p} + \frac{1}{q} = 1, p > 1 and q > 1, then, for complex functions f(x) and g(x),

    \int_a^b dx\,\bigl|f(x)g(x)\bigr| \le \left(\int_a^b dx\,|f(x)|^p\right)^{1/p}\left(\int_a^b dx\,|g(x)|^q\right)^{1/q},

with equality if and only if |f(x)|^p/|g(x)|^q and \arg(fg) are independent of x. It is the analogue of the finite sum inequality

    \sum_{k=1}^{n} |a_k b_k| \le \left(\sum_{k=1}^{n} |a_k|^p\right)^{1/p}\left(\sum_{k=1}^{n} |b_k|^q\right)^{1/q},   \frac{1}{p} + \frac{1}{q} = 1,

with equality if and only if |a_n|^p/|b_n|^q and \arg(a_n b_n) are independent of n (or a_k = 0 for all k or b_k = 0 for all k). If all a_k and b_k are positive and p = q = 2 these inequalities reduce to the Cauchy-Schwarz inequalities.
P:VIII  The Minkowski inequality, for any p > 1 and real functions f(x) and g(x), is

    \left(\int_a^b dx\,\bigl|f(x) + g(x)\bigr|^p\right)^{1/p} \le \left(\int_a^b dx\,|f(x)|^p\right)^{1/p} + \left(\int_a^b dx\,|g(x)|^p\right)^{1/p}

with equality if and only if g(x) = cf(x), with c a non-negative constant. It is the analogue of the finite sum inequality, valid for a_k, b_k > 0, for all k, and p > 1,

    \left(\sum_{k=1}^{n}\bigl(a_k + b_k\bigr)^p\right)^{1/p} \le \left(\sum_{k=1}^{n} a_k^p\right)^{1/p} + \left(\sum_{k=1}^{n} b_k^p\right)^{1/p},

with equality if and only if b_k = ca_k for all k and c a non-negative constant.

Sometimes it is convenient to ignore the integration limits, here a and b, and write \int dx\,f(x): this is named the indefinite integral; its value is undefined to within an additive constant. However, it is almost always possible to express problems in terms of definite integrals, that is, those with limits.
The theory of integration is concerned with understanding the nature of the integration process and with extending these simple ideas to deal with wider classes of functions. The sciences are largely concerned with evaluating integrals, that is, converting integrals to numbers or functions that can be understood: most of the techniques available for this activity were developed in the nineteenth century or before, and we describe them later in this section.
There are two important extensions to the integral defined above. If either or both of a and b tend to infinity we define an infinite integral as a limit of integrals: thus if b \to \infty we have

    \int_a^{\infty} dx\,f(x) = \lim_{b\to\infty}\left(\int_a^b dx\,f(x)\right),   (1.48)

assuming the limit exists. There are similar definitions for \int_{-\infty}^b dx\,f(x) and \int_{-\infty}^{\infty} dx\,f(x); however, it should be noted that the limit

    \lim_{a\to\infty}\int_{-a}^{a} dx\,f(x)   may exist, but the limit   \lim_{a\to\infty}\lim_{b\to\infty}\int_{-b}^{a} dx\,f(x)

may not. An example is f(x) = x/(1 + x^2) for which

    \int_{-b}^{a} dx\,\frac{x}{1 + x^2} = \frac{1}{2}\ln\left(\frac{1 + a^2}{1 + b^2}\right).

If a = b the right-hand side is zero for all a (because f(x) is an odd function), so the first limit is zero; the second limit, however, does not exist.
Whether or not infinite integrals exist depends upon the behaviour of f(x) as |x| \to \infty. Consider the limit 1.48. If f(x) \ge 0 for x > X > 0, the limit exists provided |f(x)| \to 0 faster than x^{-\alpha}, \alpha > 1; if f(x) decays to zero more slowly than 1/x^{1-\delta}, for any \delta > 0, the integral diverges, see however exercise 1.52 (page 45).
If the integrand is oscillatory, cancellation between the positive and negative parts of the integral gives convergence when the magnitude of the integrand tends to zero. In this case we have the following useful theorem from 1853, due to Chartier¹⁵.

Theorem 1.6
If f(x) \to 0 monotonically as x \to \infty and if \left|\int_a^x dt\,\phi(t)\right| is bounded as x \to \infty, then \int_a^{\infty} dx\,f(x)\phi(x) exists.

For instance if \phi(x) = \sin(\lambda x) and f(x) = x^{-\alpha}, 0 < \alpha < 2, this shows that

    \int_0^{\infty} dx\,\frac{\sin\lambda x}{x^{\alpha}}

exists: if \alpha = 1 its value is \pi/2, for any \lambda > 0. It should be mentioned that the very cancellation which ensures convergence may cause difficulties when evaluating such integrals numerically.
The second important extension deals with integrands that are unbounded. Suppose that f(x) is unbounded at x = a; then we define

    \int_a^b dx\,f(x) = \lim_{\delta\to 0+}\int_{a+\delta}^b dx\,f(x),   (1.49)

provided the limit exists. As a general rule, provided |f(x)| tends to infinity more slowly than |x - a|^{-\beta} for some \beta < 1, the integral exists, which is why, in the previous example, we needed \alpha < 2; note that if f(x) = O(\ln(x - a)), as x \to a, it is integrable. For functions unbounded at an interior point the natural extension to P:III is used.

¹⁵ J Chartier, Journal de Math, 1853, XVIII, pages 201-212.
The evaluation of integrals of any complexity in closed form is normally difficult, or impossible, but there are a few tools that help. The main technique is to use the Fundamental Theorem of Calculus in reverse, and simply involves recognising those F(x) whose derivative is the integrand: this requires practice and ingenuity. The main purpose of the other tools is to convert integrals into recognisable types. The first is integration by parts, derived from the product rule for differentiation:

    \int_a^b dx\,u\frac{dv}{dx} = \Bigl[uv\Bigr]_a^b - \int_a^b dx\,\frac{du}{dx}v.   (1.50)
The second method is to change variables:

    \int_a^b dx\,f(x) = \int_A^B dt\,\frac{dx}{dt}f(g(t)) = \int_A^B dt\,g'(t)f(g(t)),   (1.51)

where x = g(t), g(A) = a, g(B) = b, and g(t) is monotonic for A < t < B. In these circumstances the Leibniz notation is helpfully transparent because dx/dt can be treated like a fraction, making the equation easier to remember. The geometric significance of this formula is simply that the small element of length \delta x, at x, becomes the element of length \delta x = g'(t)\,\delta t, where x = g(t), under the variable change.
The third method involves differentiation with respect to a parameter. Consider a function f(x, u) of two variables, which is integrated with respect to x; then

    \frac{d}{du}\int_{a(u)}^{b(u)} dx\,f(x, u) = f(b, u)\frac{db}{du} - f(a, u)\frac{da}{du} + \int_{a(u)}^{b(u)} dx\,\frac{\partial f}{\partial u},   (1.52)

provided a(u) and b(u) are differentiable and f_u(x, u) is a continuous function of both variables; the derivation of this formula is considered in exercise 1.50. If neither limit depends upon u the first two terms on the right-hand side vanish. A simple example shows how this method can work. Consider the integral

    I(u) = \int_0^{\infty} dx\,e^{-xu},   u > 0.

The derivatives are

    I'(u) = -\int_0^{\infty} dx\,x e^{-xu}   and, in general,   I^{(n)}(u) = (-1)^n\int_0^{\infty} dx\,x^n e^{-xu}.

But the original integral is trivially integrated to I(u) = 1/u, so differentiation gives

    \int_0^{\infty} dx\,x^n e^{-xu} = \frac{n!}{u^{n+1}}.

This result may also be found by repeated integration by parts, but the above method involves less algebra.
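The result \int_0^{\infty} dx\,x^n e^{-xu} = n!/u^{n+1} is easily checked, both symbolically and numerically. The sketch below is an added illustration (assuming SymPy) for a particular n.

```python
import sympy as sp

x, u = sp.symbols('x u', positive=True)
n = 4

exact = sp.integrate(x**n * sp.exp(-x*u), (x, 0, sp.oo))
print(sp.simplify(exact - sp.factorial(n)/u**(n + 1)))     # 0

# Numerical spot check at u = 2: both values are 24/32 = 0.75.
print(float(exact.subs(u, 2)), float(sp.factorial(n)/2**(n + 1)))
```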
The application of these methods usually requires some skill, some trial and error
and much patience. Please do not spend too long on the following problems.
Exercise 1.42
(a) If f(x) is an odd function, f(-x) = -f(x), show that \int_{-a}^{a} dx\,f(x) = 0.
(b) If f(x) is an even function, f(-x) = f(x), show that \int_{-a}^{a} dx\,f(x) = 2\int_0^{a} dx\,f(x).

Exercise 1.43
Show that, if \lambda > 0, the value of the integral I(\lambda) = \int_0^{\infty} dx\,\frac{\sin\lambda x}{x} is independent of \lambda. How are the values of I(\lambda) and I(-\lambda) related?

Exercise 1.44
Use integration by parts to evaluate the following indefinite integrals.

    (a) \int dx\,\ln x,   (b) \int dx\,\frac{x}{\cos^2 x},   (c) \int dx\,x\ln x,   (d) \int dx\,x\sin x.

Exercise 1.45
Evaluate the following integrals

    (a) \int_0^{\pi/4} dx\,\sin x\,\ln(\cos x),   (b) \int_0^{\pi/4} dx\,x\tan^2 x,   (c) \int_0^1 dx\,x^2\sin^{-1}x.
Exercise 1.46
If I_n = \int_0^x dt\,t^n e^{at}, n \ge 0, use integration by parts to show that aI_n = x^n e^{ax} - nI_{n-1} and deduce that

    I_n = n!\,e^{ax}\sum_{k=0}^{n}\frac{(-1)^{n-k}}{a^{n-k+1}\,k!}x^k - \frac{(-1)^n n!}{a^{n+1}}.

Exercise 1.47
(a) Using the substitution u = a - x, show that \int_0^a dx\,f(x) = \int_0^a dx\,f(a - x).
(b) With the substitution \phi = \pi/2 - \theta show that

    I = \int_0^{\pi/2} d\theta\,\frac{\sin\theta}{\sin\theta + \cos\theta} = \int_0^{\pi/2} d\phi\,\frac{\cos\phi}{\cos\phi + \sin\phi}

and deduce that I = \pi/4.
Exercise 1.48
Use the substitution t = \tan(x/2) to prove that, if a > |b| > 0,

    \int_0^{\pi} dx\,\frac{1}{a + b\cos x} = \frac{\pi}{\sqrt{a^2 - b^2}}.

Why is the condition a > |b| necessary?
Use this result and the technique of differentiating the integral to determine the values of

    \int_0^{\pi} \frac{dx}{(a + b\cos x)^2},   \int_0^{\pi} \frac{dx}{(a + b\cos x)^3},   \int_0^{\pi} dx\,\frac{\cos x}{(a + b\cos x)^2},   \int_0^{\pi} dx\,\ln(a + b\cos x).
Exercise 1.49
Prove that y(t) = \frac{1}{\omega}\int_a^t dx\,f(x)\sin\omega(t - x) is the solution of the differential equation

    \frac{d^2 y}{dt^2} + \omega^2 y = f(t),   y(a) = 0,   y'(a) = 0.

Exercise 1.50
(a) Consider the integral F(u) = \int_0^{a(u)} dx\,f(x), where only the upper limit depends upon u. Using the basic definition, equation 1.7 (page 18), derive the derivative F'(u).
(b) Consider the integral F(u) = \int_a^b dx\,f(x, u), where only the integrand depends upon u. Using the basic definition derive the derivative F'(u).
Exercise 1.51
Assuming that both integrals exist, show that

    \int_{-\infty}^{\infty} dx\,f\left(x - \frac{1}{x}\right) = \int_{-\infty}^{\infty} dx\,f(x).

Hence show that

    \int_{-\infty}^{\infty} dx\,\exp\left(-x^2 - \frac{1}{x^2}\right) = \frac{\sqrt{\pi}}{e^2}.

You will need the result \int_{-\infty}^{\infty} dx\,e^{-x^2} = \sqrt{\pi}.
Exercise 1.52
Find the limits as X \to \infty of the following integrals

    \int_2^X dx\,\frac{1}{x\ln x}   and   \int_2^X dx\,\frac{1}{x(\ln x)^2}.

Hint: note that if f(x) = \ln(\ln x) then f'(x) = (x\ln x)^{-1}.

Exercise 1.53
Determine the values of the real constants a > 0 and b > 0 for which the following limit exists

    \lim_{X\to\infty}\int_2^X dx\,\frac{1}{x^a(\ln x)^b}.
1.4 Miscellaneous exercises
The following exercises can be tackled using the methods described in the corresponding section, though other methods may also be applicable.

Limits
Exercise 1.54
Find, using first principles, the following limits

    (a) \lim_{x\to 1} \frac{x^a - 1}{x - 1},   (b) \lim_{x\to 0} \frac{\sqrt{1 + x} - 1}{1 - \sqrt{1 - x}},   (c) \lim_{x\to a} \frac{x^{1/3} - a^{1/3}}{x^{1/2} - a^{1/2}},
    (d) \lim_{x\to\pi/2} (\pi - 2x)\tan x,   (e) \lim_{x\to 0^+} x^{1/x},   (f) \lim_{x\to 0} \left(\frac{1 + x}{1 - x}\right)^{1/x},

where a is a real number.
Inverse functions
Exercise 1.55
Show that the inverse functions of y = \cosh x, y = \sinh x and y = \tanh x, for x > 0, are, respectively,

    x = \ln\left(y + \sqrt{y^2 - 1}\right),   x = \ln\left(y + \sqrt{y^2 + 1}\right)   and   x = \frac{1}{2}\ln\left(\frac{1 + y}{1 - y}\right).

Exercise 1.56
The function y = \sin x may be defined to be the solution of the differential equation

    \frac{d^2 y}{dx^2} + y = 0,   y(0) = 0,   y'(0) = 1.

Show that the inverse function x(y) satisfies the differential equation

    \frac{d^2 x}{dy^2} = y\left(\frac{dx}{dy}\right)^3   which gives   x(y) = \sin^{-1}y = \int_0^y du\,\frac{1}{\sqrt{1 - u^2}}.

Hence find the Taylor series of \sin^{-1}y to O(y^5).
Hint: you may find it helpful to solve the equation by defining z = dx/dy.
Derivatives
Exercise 1.57
Find the derivative of y(x) where

    (a) y = f(x)^{g(x)},   (b) y = \sqrt{\frac{p + x}{p - x}}\,\sqrt{\frac{q + x}{q - x}},   (c) y^n = x + \sqrt{1 + x^2}.

Exercise 1.58
If y = \sin(a\sin^{-1}x) show that (1 - x^2)y'' - xy' + a^2 y = 0.
Exercise 1.59
If y(x) satisfies the equation (1 - x^2)\frac{d^2 y}{dx^2} - 2x\frac{dy}{dx} + \lambda y = 0, where \lambda is a constant and |x| \le 1, show that changing the independent variable, x, to \theta, where x = \cos\theta, changes this to

    \frac{d^2 y}{d\theta^2} + \cot\theta\frac{dy}{d\theta} + \lambda y = 0.

Exercise 1.60
The Schwarzian derivative of a function f(x) is defined to be

    Sf(x) = \frac{f'''(x)}{f'(x)} - \frac{3}{2}\left(\frac{f''(x)}{f'(x)}\right)^2 = -2\sqrt{f'(x)}\,\frac{d^2}{dx^2}\left(\frac{1}{\sqrt{f'(x)}}\right).

Show that if f(x) and g(x) both have negative Schwarzian derivatives, Sf(x) < 0 and Sg(x) < 0, then the Schwarzian derivative of the composite function h(x) = f(g(x)) also satisfies Sh(x) < 0.
Note: the Schwarzian derivative is important in the study of the fixed points of maps.
Partial derivatives
Exercise 1.61
If z = f(x + ay) + g(x - ay) - \frac{x}{2a^2}\cos(x + ay), where f(u) and g(u) are arbitrary functions of a single variable and a is a constant, prove that

    a^2\frac{\partial^2 z}{\partial x^2} - \frac{\partial^2 z}{\partial y^2} = \sin(x + ay).

Exercise 1.62
If f(x, y, z) = \exp(ax + by + cz)/(xyz), where a, b and c are constants, find the partial derivatives f_x, f_y and f_z, and solve the equations f_x = 0, f_y = 0 and f_z = 0 for (x, y, z).

Exercise 1.63
The equation f(u^2 - x^2, u^2 - y^2, u^2 - z^2) = 0 defines u as a function of x, y and z. Show that

    \frac{1}{x}\frac{\partial u}{\partial x} + \frac{1}{y}\frac{\partial u}{\partial y} + \frac{1}{z}\frac{\partial u}{\partial z} = \frac{1}{u}.
Implicit functions
Exercise 1.64
Show that the function f(x, y) = x^2 + y^2 - 1 satisfies the conditions of the Implicit Function Theorem for most values of (x, y), and that the function y(x) obtained from the theorem has derivative y'(x) = -x/y.
The equation f(x, y) = 0 can be solved explicitly to give the two functions y = \pm\sqrt{1 - x^2}. Verify that the derivatives of both these functions are the same as that obtained from the Implicit Function Theorem.

Exercise 1.65
Prove that the equation x\cos xy = 0 has a unique solution, y(x), near the point (1, \pi/2), and find its first and second derivatives.

Exercise 1.66
The folium of Descartes has equation f(x, y) = x^3 + y^3 - 3axy = 0. Show that at all points on the curve where y^2 \ne ax, the implicit function y(x) has derivative

    \frac{dy}{dx} = -\frac{x^2 - ay}{y^2 - ax}.

Show that there is a horizontal tangent to the curve at (a\,2^{1/3}, a\,4^{1/3}).
Taylor series
Exercise 1.67
By sketching the graphs of y = \tan x and y = 1/x for x > 0, show that the equation x\tan x = 1 has an infinite number of positive roots. By putting x = n\pi + z, where n is a positive integer, show that this equation becomes (n\pi + z)\tan z = 1 and use a first order Taylor expansion of this to show that the root nearest n\pi is given approximately by x_n = n\pi + \frac{1}{n\pi}.

Exercise 1.68
Determine the constants a and b such that (1 + a\cos 2x + b\cos 4x)/x^4 is finite at the origin.

Exercise 1.69
Find the Taylor series, to 4th order, of the following functions:
(a) \ln\cosh x,   (b) \ln(1 + \sin x),   (c) e^{\sin x},   (d) \sin^2 x.
Mean value theorem
Exercise 1.70
If f(x) is a function such that f'(x) increases with increasing x, use the Mean Value theorem to show that f'(x) < f(x + 1) - f(x) < f'(x + 1).

Exercise 1.71
Use the functions f_1(x) = \ln(1 + x) - x and f_2(x) = f_1(x) + x^2/2 and the Mean Value Theorem to show that, for x > 0,

    x - \frac{1}{2}x^2 < \ln(1 + x) < x.
L'Hospital's rule
Exercise 1.72
Show that \lim_{x\to 1} \frac{\sin(\ln x)}{x^5 - 7x^3 + 6} = -\frac{1}{16}.

Exercise 1.73
Determine the limits \lim_{x\to 0} (\cos x)^{1/\tan^2 x} and \lim_{x\to 0} \frac{a\sin bx - b\sin ax}{x^3}.
Integrals
Exercise 1.74
Using differentiation under the integral sign show that

    \int_0^{\infty} dx\,\frac{\tan^{-1}(ax)}{x(1 + x^2)} = \frac{1}{2}\pi\ln(1 + a).

Exercise 1.75
Prove that, if |a| < 1,

    \int_0^{\pi/2} dx\,\frac{\ln(1 + \cos a\cos x)}{\cos x} = \frac{\pi^2}{8}\left(1 - \frac{4a^2}{\pi^2}\right).

Exercise 1.76
If f(x) = (\sin x)/x, show that

    \int_0^{\pi/2} dx\,f(x)f(\pi/2 - x) = \frac{2}{\pi}\int_0^{\pi} dx\,f(x).

Exercise 1.77
Use the integral definition

    \tan^{-1}x = \int_0^x dt\,\frac{1}{1 + t^2}

to show that for x > 0, \tan^{-1}(1/x) = \int_x^{\infty} dt\,\frac{1}{1 + t^2}, and deduce that \tan^{-1}x + \tan^{-1}(1/x) = \pi/2.

Exercise 1.78
Determine the values of x that make g'(x) = 0 if g(x) = \int_x^{2x} dt\,f(t) and
(a) f(t) = e^t,   and   (b) f(t) = (\sin t)/t.
Exercise 1.79
If f(x) is integrable for a \le x \le a + h show that

    \lim_{n\to\infty}\frac{1}{n}\sum_{k=1}^{n} f\left(a + \frac{kh}{n}\right) = \frac{1}{h}\int_a^{a+h} dx\,f(x).

Hence find the following limits

    (a) \lim_{n\to\infty} n^{-6}\bigl(1 + 2^5 + 3^5 + \cdots + n^5\bigr),   (b) \lim_{n\to\infty}\left(\frac{1}{1 + n} + \frac{1}{2 + n} + \cdots + \frac{1}{3n}\right),
    (c) \lim_{n\to\infty}\frac{1}{n}\left(\sin\left(\frac{y}{n}\right) + \sin\left(\frac{2y}{n}\right) + \cdots + \sin y\right),   (d) \lim_{n\to\infty} n^{-1}\bigl((n + 1)(n + 2)\cdots(2n)\bigr)^{1/n}.
Exercise 1.80
If the functions f(x) and g(x) are differentiable, find expressions for the first derivative of the functions

    F(u) = \int_0^u dx\,\frac{f(x)}{\sqrt{u^2 - x^2}}   and   G(u) = \int_0^u dx\,\frac{g(x)}{(u - x)^a}   where 0 < a < 1.

This is a fairly difficult problem. The formula 1.52 does not work because the integrands are singular, yet by substituting simple functions for f(x) and g(x), for instance 1, x and x^2, we see that there are cases for which the functions F(u) and G(u) are differentiable. Thus we expect an equivalent to formula 1.52 to exist.
Chapter 2
Ordinary Differential Equations

2.1 Introduction
Differential equations are an important component of this course, so in this chapter we discuss relevant, elementary theory and provide practice in solving particular types of equations. You should be familiar with all the techniques discussed here, though some of the more general theorems may be new. If you already feel confident with the theory presented here, then a detailed study may not be necessary, though you should attempt some of the end of section exercises. If you wish to delve deeper into the subject there are many books available; those used in the preparation of this chapter include Birkhoff and Rota¹ (1962), Arnold² (1973), Ince³ (1956) and the older text by Piaggio⁴, which provides a different slant on the subject than modern texts.
Differential equations are important because they can be used to describe a wide variety of physical problems. One reason for this is that Newton's equations of motion usually relate the rates of change of the position and momentum of particles to their position, so physical systems ranging in size from galaxies to atoms are described by differential equations. Besides these important traditional physical problems, ordinary differential equations are used to describe some electrical circuits, population changes when populations are large, and chemical reactions. In this course we deal with the subclass of differential equations that can be derived from variational principles, a concept introduced in chapter 3. A simple example is the stationary chain hanging between two fixed supports: this assumes a shape that minimises its gravitational energy, and we can use this fact to derive a differential equation, the solution of which describes the chain's shape. This problem is dealt with in chapter 12.
The term differential equation first appeared in English in 1763 in The method by increments by William Emerson (1701-1782), but was introduced by Leibniz (1646-1716) eighty years previously⁵ in the Latin form, aequationes differentiales.

¹ Birkhoff G and Rota G-C 1962 Ordinary differential equations (Blaisdell Publishing Co.).
² Arnold V I 1973 Ordinary Differential Equations (The MIT Press), translated by R A Silverman.
³ Ince E L 1956 Ordinary differential equations (Dover).
⁴ Piaggio H T H 1968 An Elementary Treatise on Differential Equations (G Bell and Sons), first published in 1920.
The study of differential equations began with Newton (1642-1727) and Leibniz. Newton considered fluxional equations, which related a fluxion to its fluent, a fluent being a function and a fluxion its derivative: in modern terminology he considered the two types of equation dy/dx = F(x) and dy/dx = F(x, y), and derived solutions using power series in x, a method which he believed to be universally applicable. Although this work was completed in the 1670s it was not published until the early 18th century, too late to affect the development of the general theory, which had progressed rapidly in the intervening period.
Much of this progress was due to Leibniz and the two Bernoulli brothers, James (1654-1705) and his younger brother John (1667-1748), but others of this scientifically talented family also contributed; many of these are mentioned later in this chapter, so the following genealogical tree of the scientifically important members of this family is shown in figure 2.1.

Figure 2.1 A genealogical tree of the scientifically important members of the Bernoulli family, from Nicholas (1623-1708) down to John Gustav (1811-1863). Some of the posts held by members of the family are:
James: Prof of Mathematics, Basle (1687-1705);
John: Prof of Mathematics, Groningen (1695-1705), Basle (1705-1748);
Nicholas III: Prof at Petrograd;
Daniel: Prof at Petrograd and Basle (Bernoulli principle in hydrodynamics);
John II: Prof at Basle;
John III: Astronomer Royal and Director of Mathematical studies at Berlin;
James II: Prof at Basle, Verona and Petrograd.

In 1690 James Bernoulli solved the brachistochrone problem, discussed in section 5.2, which involves solving a nonlinear, first-order equation: in 1692 Leibniz discovered the method of solving first-order homogeneous and linear problems, sections 2.3.2 and 2.3.3: Bernoulli's equation, section 2.3.4, was proposed in 1695 by James Bernoulli and solved by Leibniz and John Bernoulli soon after. Thus within a few years of their discovery many of the methods now used to solve differential equations had been discovered.

⁵ Acta Eruditorum, Oct 1684.
The first treatise to provide a systematic discussion of differential equations and their solutions was published in four volumes by Euler (1707-1783), the first in 1755 and the remaining three volumes between 1768 and 1770.
This work on differential equations involved rearranging the equation, using algebraic manipulations and transformations, so that the solution could be expressed as an integral. This type of solution became known as solving the equation by quadrature, a term originally used to describe the area under a plane curve and in particular the problem of finding a square having the same area as a given circle: it was in this context that the term was first introduced into English in 1596. The term quadrature is used regardless of whether the integral can actually be evaluated in terms of known functions. Other common terms used to describe this type of solution are closed-form solution and analytic solution: none of these terms has a precise definition.
Much of this early work was concerned with the construction of solutions but raised fundamental questions concerning what is meant by a function or by a solution of a differential equation, which led to important advances in analysis. These questions broadened the scope of enquiries, and the first of these newer studies was the work of Cauchy (1789-1857), who investigated the existence and uniqueness of solutions, and in 1824 proved the first existence theorem; this is quoted on page 81. The extension of this theorem⁶, due to Picard (1856-1941), is quoted in section 2.3. These theorems, although important, deal only with a restricted class of equations, which do not include many of the quite simple equations arising in this course, or many other practical problems.
In 1836 Sturm introduced a different approach to the subject, whereby properties of solutions to certain differential equations are derived directly from the equation, without the need to find solutions. Subsequently, during the two years 1836-7, Sturm (1803-1855) and Liouville (1809-1882) developed these ideas, some of which are discussed in chapter 13. The notion of extracting information from an equation, without solving it, may seem rather strange, but you can obtain some idea of what can be achieved by doing exercises 2.57 and 2.58 at the end of this chapter.
Liouville was also responsible for starting another important strand of enquiry. He was interested in the problem of integration, the main objective of which is to decide if a given class of indefinite integrals can be integrated in terms of a finite expression involving algebraic, logarithmic or exponential functions. This work was performed between 1833 and 1841, and towards the end of this period Liouville turned his attention to similar problems involving first and second-order differential equations, a far more difficult problem. A readable history of this work is provided by Lützen⁷. This line of enquiry became of practical significance with the advent of Computer Assisted Algebra during the last quarter of the 20th century, and is now an important part of software such as Maple (used in MS325 and M833) and Mathematica: for some modern developments see Davenport et al⁸.
Applications that involve differential equations often require solutions, so the third approach to the subject involves finding approximations to those equations that cannot be solved exactly in terms of known functions. There are far too many such methods
to describe here, but one important technique is described in chapter 14.

⁶ Picard E 1893 J de Maths, 9, page 217.
⁷ Joseph Liouville 1809-1882: Master of Pure and Applied Mathematics, by J Lützen (Springer-Verlag, 1990).
⁸ Davenport J H, Siret Y and Tournier E 1989 Computer Algebra. Systems and algorithms for algebraic computation (Academic Press).

The current chapter has two principal aims. First, to give useful existence and uniqueness theorems, for circumstances where they exist. Second, to describe the various classes of differential equation which can be solved by the standard techniques known to Euler. In section 2.3 we discuss first-order equations: some aspects of second-order equations are discussed in section 2.4. In the next section some general ideas are introduced.

2.2 General definitions
An nth order differential equation is an equation that gives the nth derivative of a real function, y(x), of a real variable x, in terms of x and some or all of the lower derivatives of y,

    \frac{d^n y}{dx^n} = F\bigl(x, y, y', y'', \ldots, y^{(n-1)}\bigr),   a \le x \le b.   (2.1)

The function y is named the dependent variable, and the real variable x the independent variable; x is often limited to a given interval of the real axis, which may be the whole axis or an infinite portion of it. The function F must be single-valued and differentiable in all variables, see theorem 2.2 (page 81).
Frequently we obtain equations of the form

    G(x, y, y', y'', \ldots, y^{(n)}) = 0,   a \le x \le b,   (2.2)

and this is also referred to as an nth order differential equation. But in order to progress, it is usually necessary to rearrange 2.2 into the form of 2.1, and this usually gives more than one equation. A simple example is the first-order equation y'^2 + y^2 = c^2, c a constant, which gives the two equations y' = \pm\sqrt{c^2 - y^2}.
Another important type of system is the set of n coupled first-order equations

    \frac{dz_k}{dx} = f_k(x, z_1, z_2, \ldots, z_n),   k = 1, 2, \ldots, n,   a \le x \le b,   (2.3)

where the f_k are a set of n real-valued, single-valued functions of (x, z_1, z_2, \ldots, z_n). If all the f_k are independent of x these equations are described as autonomous; if one or more of the f_k depend explicitly upon x they are named non-autonomous⁹.
The nth order equation 2.1 can always be expressed as a set of n coupled, first-order equations. For instance, if we define

    z_1 = y,   z_2 = y',   z_3 = y'',   \ldots,   z_n = y^{(n-1)},

then equation 2.1 becomes

    z_n' = y^{(n)} = F(x, z_1, z_2, \ldots, z_n)   and   z_k' = z_{k+1},   k = 1, 2, \ldots, n - 1.

⁹ This distinction is important in dynamics, where the independent variable, x, is usually the time. The significance of this difference is that if y(x) is a solution of an autonomous equation then so is y(x + a), for any constant a: it will be seen in chapter 7, when we consider Noether's theorem, that this has an important consequence and in dynamics results in the energy being constant.
2.2. GENERAL DEFINITIONS 55
This transformation is not unique, as seen in exercise 2.4. Coupled, rst-order equations
are important in many applications, and are used in many theorems quoted later in this
chapter, which is why we mention them here.
In this course most dierential equations encountered are rst-order, n = 1, or
second-order, n = 2.
A solution of equation 2.1 is any function that satises the equation, and it is helpful
and customary to distinguish two types of solutions. The general solution of an nth
order equation is a function
f(x, y, c
1
, c
2
, , c
n
) = 0 (2.4)
involving x, y and n arbitrary constants which satises equation 2.1 for all values of
these constants in some domain: this solution is also named the complete primitive.
The most general solution of an nth order equation contains n arbitrary constants, but
this is dicult to prove for a general equation.
A particular solution or particular integral g(x, y) = 0 is a function satisfying equa-
tion 2.1, but containing no arbitrary constants: particular integrals can be obtained
from a general solution by giving the constants particular values, but some equations
have independent particular solutions that are independent of the general solution:
these are named singular solutions. For instance the equation
y

=
yy

x
has the general solution y(x) = 2c
1
tan(c
2
+c
1
ln x) 1
and also the singular solution y = c
3
, where c
3
is an arbitrary constant, which cannot
be obtained from the general solution, see exercise 2.49(c) (page 84). Another example
is given in exercise 2.6 (page 59); one origin of singular solutions of rst-order equations
is discussed in section 2.6.
If y(x) = 0 is a solution of a dierential equation it is often referred to as the trivial
solution.
The values of the n arbitrary constants are determined by n subsidiary conditions
which will be discussed later.
Linear and Nonlinear equations
An important category of dierential equation are linear equations, that is, equations
that are of rst-degree in the dependent variable and all its derivatives: the most general
nth order, linear dierential equation has the form
a
n
(x)
d
n
y
dx
n
+a
n1
(x)
d
n1
y
dx
n1
+ +a
1
(x)
dy
dx
+a
0
(x)y = h(x) (2.5)
where h(x) and a
k
(x), k = 0, 1, , n, are functions of x, but not y.
If h(x) = 0 the equation is said to be homogeneous, otherwise it is an inhomogeneous
equation.
Linear equations are important for three principal reasons. First, they often approx-
imate physical situations where the appropriate variable, here y, has small magnitude,
so terms O(y
2
) can be ignored. Second, by comparison with nonlinear equations they
are relatively easy to solve. Third, their solutions have benign properties that are
56 CHAPTER 2. ORDINARY DIFFERENTIAL EQUATIONS
well understood: some of these properties are discussed in section 2.4 and others in
chapter 13.
Dierential equations which are not linear are nonlinear equations. These equations
are usually dicult to solve and their solutions often have complicated behaviours.
Most equations encountered in this course are nonlinear.
Initial and Boundary value problems
An important distinction we need to mention is that between initial value problems and
boundary value problems, which we discuss in the context of the second-order equation
d
2
y
dx
2
= F(x, y, y

), a x b. (2.6)
The general solution of this equation contains two arbitrary constants, and in practical
problems the values of these constants are determined by conditions imposed upon the
solution.
In an initial value problem
10
the value of the solution and its rst derivative are
dened at the point x = a. Thus a typical initial value problem is
d
2
y
dx
2
+y = 0, y(a) = A, y

(a) = B. (2.7)
In a boundary value problem the value of the solution is prescribed at two distinct points,
normally the end points of the range, x = a and x = b. A typical problem is
d
2
y
dx
2
+y = 0, y(a) = A, y(b) = B. (2.8)
The distinction between initial and boundary value problems is very important. For
most initial value problems occurring in practice a unique solution exists, see theo-
rems 2.1 and 2.2, pages 61 and 81 respectively. On the contrary, for most boundary
value problems it is not known whether a solution exists and, if it does, whether it is
unique: we encounter examples that illustrate this behaviour later in the course. It is
important to be aware of this diculty when numerical methods are used.
For example the solutions of equations 2.7 and 2.8 are, respectively,
y = Acos(x a) +Bsin(x a) and y =
Asin(b x) +Bsin(x a)
sin(b a)
.
The former solution exists for all a, A and B; the latter exists only if sin(b a) = 0.
Other types of boundary conditions occur, and are important, and are introduce
later as needed.
Singularities: movable and xed
The solution of the nonlinear equation
dy
dx
= y
2
, y(0) = A, is y(x) =
A
1 Ax
.
10
It is named an initial value problem because in this type of system the independent variable, x, is
often related to the time and we require the solution subsequent to the initial time, x = a.
2.2. GENERAL DEFINITIONS 57
This solution is undened at x = 1/A, a point which depends upon the initial condition.
Thus this singularity in the solution moves as the initial condition changes, and is
therefore named a movable singularity.
On the other hand the general, non-trivial solution of the linear equation
dy
dx
+
y
x
= 0 is y =
C
x
, C = 0, (2.9)
where C is a constant. This solution is undened at x = 0, regardless of the value of
the integration constant C. This type of singularity in the solution is named a xed
singularity. The signicance of this classication is that the singularities in the solutions
of nonlinear equations are almost always movable. For the solutions of linear equations
they are always xed and their positions are at the same points as the singularities of
the coecient functions dening the equation: in equation 2.9 the coecient of y is
1/x, which is why the solution is singular at x = 0. For the linear equation 2.5 any
singularities in the solution are at points when one or more of the ratios a
k
(x)/a
n
(x),
k = 0, 1, , n 1 has a singularity.
In the above examples the singularity is the point where the solution is unbounded.
But at a singularity a function is not necessarily a point where it is unbounded; a careful
denition of a singularity can only be provided in the context of complex variable theory
and for single-valued functions there are two types of singularity, poles and essential
singularities. We cannot describe this theory here
11
but, instead we list some typical
examples of these singularities.
Functional form Name of singularity
1
(a x)
n
, n = 1 , 2 . Pole
(a x)

, a real, non-integer number. Essential singularity


Other types of essential singularities are exp(1/(x a)), exp(

x a) and ln(x a).


Functions of a real variable are less rened and can misbehave in a variety of unruly
ways: some typical examples are
_
|x|, 1/|x| and ln |x|.
Exercise 2.1
Show that the following equations have the solutions given, and state whether the
singularity in each solution is xed or movable, and whether the equation is linear
or nonlinear.
(a)
dy
dx
= xy
3
, y(0) = A, y =
A

1 A
2
x
2
.
(b)
dy
dx
+
y
x
= x, y(1) = A, y(x) =
3A 1
3x
+
1
3
x
2
.
Various types of solution
A general solution of a dierential equation can take many dierent forms, some more
useful than others. Most useful are those where y(x) is expressed as a formula involving
a nite number of familiar functions of x; this is rarely possible. This type of solution is
11
A brief summary of the relevant theory is provided in the course Glossary; for a fuller discussion
see Whittaker and Watson (1965).
58 CHAPTER 2. ORDINARY DIFFERENTIAL EQUATIONS
often named a closed-form solution or sometimes an analytic solution
12
. It is frequently
possible, however, to express solutions as an innite series of increasing powers of x:
these solutions are sometimes useful, but normally only for a limited range of x.
A solution may be obtained in the form f(x, y) = 0, which cannot be solved to
provide a formula for y(x). In such cases the equation f(x, y) = 0, for xed x, often
has many solutions, so the original dierential equation has many solutions. A simple
example is the function f(x, y) = y
2
2xy + C = 0, where C is a constant, which
is a solution of the equation y

= y/(y x), which therefore has the two solutions


y = x

x
2
C.
Another type of solution involves some form of approximation. From the beginning
of the 18
th
century to the mid 20
th
century were developed a range of techniques that
approximate solutions in terms of simple nite formulae: these approximations and the
associated techniques are important but do not feature in this course, except for the
method described in chapter 14.
Another type of approximation is obtained by solving the equation numerically,
but these methods nd only particular solutions, not general solutions, and may fail
for initial value problems on large intervals and for nonlinear boundary value problems.
Moreover, if the equations contain several parameters it is usually dicult to understand
the eect of changing the parameters.
Exercise 2.2
Which of the following dierential equations are linear and which are nonlinear?
(a) y

+x
2
y = 0, (b) y

+xy
2
= 1, (c) y

+|y| = 0,
(d) y

+xy
2
+y = 0, (e) y

+y sin x = e
x
, (f) y

+y =
_
1, y > 0,
1, y 0.
(g) y

= |x|, y(1) = 2, (h) y

= 1, y(1)
2
= 1,
(i) y

= x, y(0) +y

(0) = 1, y(1) = 2.
Exercise 2.3
Which of the following problems are initial value problems, which are boundary
value problems and which are neither?
(a) y

+y = sin x, y(0) = 0, y() = 0,


(b) y

+y = |x|, y(0) = 1, y

() = 0,
(c) y

+ 2y

+y = 0, y(0) +y(1) = 1, y

(0) = 0,
(d) y

y = cos x, y(1) = y(2), y

(1) = 0,
(e) y

+ 2y

+x
2
y
2
+|y| = 0, y(0) = y

(0) = 0, y

(1) = 1,
(f) y
(4)
+ 3y

+ 2x
2
y
2
+x
3
|y| = x, y(0) = 1, y

(0) = 2, y

(0) = 1, y

(0) = 1,
(g) y

sin x +y cos x = 0, y(/2) = 0, y

(/2) = 1,
(h) y

+y
2
= y(x
2
), y(0) = 0, y

(0) = 1.
12
Used in this context the solution need not be analytic in the sense of complex variable theory.
2.2. GENERAL DEFINITIONS 59
Exercise 2.4
Lienards equation,
d
2
x
dt
2
+f(x)
dx
dt
+g(x) = 0, 0,
where f(x) and g(x) are well behaved functions and a constant, describes certain
important dynamical systems.
(a) Show that if y = dx/dt this equation can be written as the two coupled rst-
order equations
dx
dt
= y,
dy
dt
= f(x)y g(x).
(b) If F(x) =
_
x
0
duf(u), by dening z =
1

dx
dt
+ F(x), show that an alternative
representation of Lienards equation is
dx
dt
= (z F(x)) ,
dz
dt
=
g(x)

.
This exercise demonstrates that there is no unique way of converting a second-
order equation to a pair of coupled rst-order equations. The transformation of
part (b) may seem rather articial, but if 0 it provides a basis for a good
approximation to the periodic solution of the original equations which is not easily
obtained by other means.
Exercise 2.5
Clairauts equation
An equation considered by A C Clairaut(1713 1765) is
y = px +f(p) where p =
dy
dx
.
By dierentiating with respect to x show that (x +f

(p)) p

= 0 and deduce
that one solution is p = c, a constant, and hence that the general solution is
y = cx +f(c).
Show also that the function derived by eliminating p from the equations
x = f

(p) and y = px +f(p)


is a particular solution. The geometric signicance of this solution, which is usually
a singular solution, and its connection with the general solution is discussed in
exercise 2.67 (page 90).
Exercise 2.6
Find the general and singular solutions of the dierential equation y = px e
p
,
p = y

(x).
Exercise 2.7
Consider the second-order dierential equation F(x, y

, y

) = 0 in which y(x)
is not explicitly present. Show that by introducing the new dependent variable
p = dy/dx, this equation is reduced to the rst-order equation F(x, p, p

) = 0.
60 CHAPTER 2. ORDINARY DIFFERENTIAL EQUATIONS
Exercise 2.8
Consider the second-order dierential equation F(y, y

, y

) = 0 in which the inde-


pendent variable x is not explicitly present.
Dene p = dy/dx and show that by considering p as a function of y,
d
2
y
dx
2
=
dp
dx
= p
dp
dy
,
and hence that the equation reduces to the rst-order equation F
_
y, p, p
dp
dy
_
= 0.
2.3 First-order equations
Of all ordinary dierential equations, rst-order equations are usually the easiest to
solve using conventional methods and there are ves types that are amenable to these
methods. When confronted with an arbitrary rst-order equation, the trick is to recog-
nise the type or a transformation that converts it to a given equation of one of these
types. Before describing these types we rst discuss the existence and uniqueness of
their solutions.
2.3.1 Existence and uniqueness of solutions
The rst-order equation
dy
dx
= F(x, y), a x b, (2.10)
does not have a unique solution unless the value of y is specied at some point x =
c [a, b]. Why this is so can be seen geometrically: consider the Cartesian plane Oxy,
shown in gure 2.2. Take any point (x, y) with x [a, b] and where F(x, y) is dened
and single valued, so a unique value of y

(x) is dened: this gives the gradient of the


solution passing through this point, as shown by the arrows.
x + x
y
x
y
x
x+ x 2
y(x)
Figure 2.2 Diagram showing the construction of a solu-
tion through a given point in the Oxy-plane.
At an adjacent value of x, x + x, the value of y on this solution is y(x + x) =
y(x)+xF(x, y(x))+O(x
2
), as shown. By taking the successive values of y at x+kx,
k = 1, 2, , we obtain a unique curve passing through the initial point. By letting
x 0 it can be shown that this construction gives the exact solution. This unique
2.3. FIRST-ORDER EQUATIONS 61
solution can be found only if the initial value of y is specied. Normally y(a) is dened
and this gives the initial value problem
dy
dx
= F(x, y), y(a) = A, a x b. (2.11)
If F(x, y) and its derivatives F
x
and F
y
are continuous in a suitable neighbourhood
surrounding the initial point, dened in theorem 2.2 (page 81), then it can be shown
that a unique solution satisfying the initial condition y(a) = A exists. This is essentially
the result deveoloped by Cauchy in his lectures at the

Ecole Polytechnique between 1820
and 1830, see Ince (1956, page 76). The solution may not, however, exist in the desired
interval [a, b]. The following, more useful, result was obtained by Picard
13
in 1893, and
shows how, in principle, a solution can be constructed.
Theorem 2.1
In a rectangular region D of the Oxy plane a h x a +h, AH y A+H, if
in D we can nd positive numbers M and L such that
a) |F(x, y)| < M, and
b) |F(x, y
1
) F(x, y
2
)| < L|y
1
y
2
|,
then the sequence of functions
y
n+1
(x) = A +
_
x
a
dt F(t, y
n
(t)), y
0
(x) = A, n = 0, 1, , (2.12)
converges uniformly to the exact solution. If F(x, y) is dierentiable in D conditions a)
and b) are satised.
The proof of this theorem, valid for nth order equations, can be found in Ince
14
, Pi-
aggio
15
and a more modern treatment in Arnold
16
. In general this iterative formula
results in very long expressions even after the rst few iterations, or the integrals cannot
be evaluated in closed form.
Typically, if the integrals can be evaluated, equation 2.12 gives the solution as a
series in powers of x, but from this it is usually dicult to determine the radius of
convergence of the series, see for instance exercise 2.10. Even if it converges for all x, it
may be of little practical value when |x| is large: the standard example that illustrates
these diculties is the Taylor series for sinx, which converges for all x, but for large |x|
is practically useless because of rounding errors, see section 1.3.8.
Exercise 2.9
Use the iterative formula 2.12 to nd the innite series solution of
dy
dx
= y, y(0) = 1, x 0.
For which values of x does this solution exist?
Note that you will need to use induction to construct the innite series.
13
Picard E 1893, J de Maths 9 page 217: a history of this developement is given by Ince (1956,
page 63).
14
Ince E L 1956, Ordinary Dierential Equations Dover, chapter 3.
15
Piaggio H T H 1962 An elementary Treaties on Dierential Equations and their applications, G
Bell and Sons, London.
16
Arnold V I 1973 Ordinary Dierential Equations, Translated and Edited by R A Silverman, The
MIT Press.
62 CHAPTER 2. ORDINARY DIFFERENTIAL EQUATIONS
Exercise 2.10
(a) Use the iterative formula 2.12 to show that the second iterate of the solution
to
dy
dx
= 1 +xy
2
, y(0) = A, x 0,
is
y(x) = A+x +
1
2
A
2
x
2
+
2
3
Ax
3
+
1
4
(1 +A
3
)x
4
+
1
5
A
2
x
5
+
1
24
A
4
x
6
.
(b) An alternative method of obtaining this series is by direct calculation of the
Taylor series, but for the same accuracy more work is usually required. Find
the values of y

(0), y

(0) and y

(0) directly from the dierential equation and


construct the third-order Taylor series of the solution.
(c) The dierential equation shows that for x > 0, y(x) is a monotonic increasing
function, so we expect that for suciently large x, xy(x)
2
1, and hence that
the solution will be given approximately by the equation y

= xy
2
. Use this ap-
proximation to deduce that y(x) at some nite value of x. Explain the likely
eect of this on the radius of convergence of this series. In exercise 2.21 (page 68)
it is shown that for large A the singularity is approximately at x =
_
2/A.
2.3.2 Separable and homogeneous equations
Equation 2.11 is separable if F(x, y) can be written in the form F = f(x)g(y), with
f(x) depending only upon x and g(y) depending only upon y. Then the equation can
be rearranged in the form of two integrals: the following expression also incorporates
the initial condition
_
y
A
dv
g(v)
=
_
x
a
du f(u). (2.13)
Provided these integrals can be evaluated this gives a representation of the solution,
although rarely in the convenient form y = h(x). This is named the method of separa-
tion of variables, and was used by both Leibniz and John Bernoulli at the end of the
17
th
century.
Sometimes a non-separable equation y

= F(x, y) can be made separable if new


dependent and independent variables, u and v, can be found such that it can be written
in the form
du
dv
= U(u)V (v),
with U(u) depending only upon u and V (v) depending only upon v.
A typical separable equation is
dy
dx
= cos y sin x, y(0) = A,
so its solution can be written in the form
_
y
A
dv
cos v
=
_
x
0
du sin u, that is ln
_
tan(y/2 +/4)
tan(A/2 +/4)
_
= 1 cos x,
which simplies to y =

2
+ 2 tan
1
_
exp(1 cos x) tan
_
A
2
+

4
__
.
2.3. FIRST-ORDER EQUATIONS 63
Exercise 2.11
Use the method of separation of variables to nd solutions of the following equa-
tions.
(a) (1 +x
2
)y

= x(1 y
2
), (b) (1 +x)y

xy = x,
(c) (1 +x)y

= x

1 +y, y(0) = 0, (d) y

=
1 + 2x + 2y
1 2x 2y
. Hint, dene z = x +y.
A sub-class of equations that can be transformed into separable equations are those for
which F(x, y) depends only upon the ratio y/x, rather than on x and y separately,
dy
dx
= F
_
y
x
_
, y(a) = A. (2.14)
Such equations are often named homogeneous equations. The general theory of this
type of equation is developed in the following important exercise.
Exercise 2.12
a) Show that by introducing the new dependent variable v(x) by the relation
y = xv, equation 2.14 is transformed to the separable form
dv
dx
=
F(v) v
x
, v(a) =
A
a
.
Use this transformation to nd solutions of the following equations.
(b) y

= exp
_

y
x
_
+
y
x
, (c) y

=
x + 3y
3x +y
, y(1) = 0,
(d) x(x +y)y

= x
2
+y
2
, (e) y

=
3x
2
xy + 3y
2
2x
2
+ 3xy
,
(f) y

=
4x 3y 1
3x + 4y 7
. Hint, set x = +a and y = +b, where (a, b) is the
point of intersection of the lines 4x 3y = 1 and 3x + 4y = 7.
2.3.3 Linear rst-order equations
The equation
dy
dx
+yP(x) = Q(x), y(a) = A, (2.15)
where P(x) and Q(x) are real functions of x only, is linear because y and y

occur only
to rst-order. Its solution can always be expressed as an integral, by rst nding a
function, p(x), to write the equation as
d
dx
(yp(x)) = Q(x)p(x), y(a) = A, (2.16)
which can be integrated directly. The unknown function, p(x), is found by expand-
ing 2.16, dividing by p(x) and equating the coecient of y(x) with that in the original
equation. This gives
p

p
= P(x) which integrates to p(x) = exp
__
dxP(x)
_
. (2.17)
64 CHAPTER 2. ORDINARY DIFFERENTIAL EQUATIONS
The function p(x) is named the integrating factor: rather than remembering the formula
for p(x) it is better to remember the idea behind the transformation, because similar
ideas are used in other contexts.
Equation 2.16 integrates directly to give
y(x)p(x) = C +
_
dxp(x)Q(x), (2.18)
where C is the arbitrary constant of integration, dened by the initial condition. In
this analysis there is no need to include an arbitrary constant in the evaluation of p(x).
This method produces a formula for the solution only if both integrals can be eval-
uated in terms of known functions. If this is not the case it is often convenient to write
the solution in the form
y(x)p(x) = A +
_
x
a
dt Q(t)p(t), p(t) = exp
__
t
a
du P(u)
_
, (2.19)
because this expression automatically satises the initial condition and the integrals
can be evaluated numerically.
Exercise 2.13
Use a suitable integrating factor to nd solutions of the following equations. In
each case show that the singularity in the solution is xed and relate its position
to properties of the coecient functions.
(a) (x + 2)y

+ (x + 3)y = 4 exp(x),
(b) y

cos x +y sin x = 2 cos


2
xsin x, y(0) = 0,
(c) x
2
y

+ 1 + (1 2x)y = 0.
(d) Without solving it, use the properties of the dierential equation
cos
2
x
dy
dx
y sin xcos x + (1 + cos
2
x) tanx = 0, y(0) = 2,
to show that the solution is stationary at x = 0.
Find this solution and show that y(0) is a local maximum.
Exercise 2.14
Variation of Parameters
Another method of solving equation 2.15 is to use the method of variation of
parameters which involves nding a function, f(x), which is either a solution of
part of the equation or a particular integral of the whole equation, expressing the
required solution in the form y(x) = v(x)f(x), and nding a simpler dierential
equation for the unknown function v(x).
For equation 2.15, we nd the solution of
dy
dx
+yP(x) = 0. (2.20)
(a) Show that the solution of equation 2.20 with condition y(a) = 1 is
f(x) = exp
_

_
x
a
dt P(t)
_
.
2.3. FIRST-ORDER EQUATIONS 65
(b) Now assume that the solution of
dy
dx
+yP(x) = Q(x), y(a) = A
can be written as y = v(x)f(x), v(a) = A, and show that fv

= Q, and hence that


the required solution is
y(x) = f(x)
_
A+
_
x
a
dt
Q(t)
f(t)
_
.
Relate this solution to that given by equation 2.19.
Exercise 2.15
Use the idea introduced in exercise 2.7 (page 59) to solve the dierential equation
x
d
2
y
dx
2

dy
dx
= 3x
2
, y(1) = A, y

(1) = A

.
Exercise 2.16
Use the idea introduced in exercise 2.8 (page 60) to solve the dierential equation
d
2
y
dx
2
+
2
y = 0, y(0) = A, y

(0) = 0, > 0.
2.3.4 Bernoullis equation
Two Bernoulli brothers, James and John, and Leibniz studied the nonlinear, rst-order
equation
dy
dx
+ yP(x) = y
n
Q(x), (2.21)
where n = 1 is a constant and P(x) and Q(x) are functions only of x; this equation is
now named Bernoullis equation. The method used by John Bernoulli is to set z = y
1n
,
so that
dz
dx
=
1 n
y
n
dy
dx
, and equation 2.21 becomes
dz
dx
+ (1 n)P(x)z = (1 n)Q(x), (2.22)
which is a rst-order equation of the type treated in the previous section.
An example of such an equation is
x(x
2
1)
dy
dx
y = x
3
y
2
, y(2) = A.
By dividing through by x(x
2
1) we see that P(x) =
1
x(x
2
1)
, Q(x) =
x
2
x
2
1
and
n = 2.
Thus equation 2.22 becomes
dz
dx
+
z
x(x
2
1)
=
x
2
x
2
1
, z =
1
y
, z(2) =
1
A
.
66 CHAPTER 2. ORDINARY DIFFERENTIAL EQUATIONS
The integrating factor, equation 2.17, is
p(x) = exp
__
dx
1
x(x
2
1)
_
= exp
__
dx
_
1
2(x 1)
+
1
2(x + 1)

1
x
__
=

x
2
1
x
,
hence the equation for z can be written in the form
d
dx
_
z

x
2
1
x
_
=
x

x
2
1
, z(2) =
1
A
.
Integrating this and using the condition at x = 2 gives
1
y
=

3
_
1 +
1
2A
_
x

x
2
1
x.
Exercise 2.17
Solve the following equations.
(a) y

= 2y xy
2
, y(0) = 1, (b) x(1 x
2
)y

+ (2x
2
1)y = x
2
y
3
,
(c) y

cos x y sin x = y
3
cos
2
x, y(0) = 1, (d) x
3
y

= y(x
2
+y).
2.3.5 Riccatis equation
Jacopo Francesco, Count Riccati of Venice (1676 1754), introduced an important class
of rst-order, nonlinear equations. Here we consider the most general of this type of
equation which was introduced by Euler, namely
dy
dx
= P(x) +yQ(x) +y
2
R(x), (2.23)
where P, Q and R are functions only of x. This equation is now named Riccatis
equation
17
. If R(x) = 0 Riccatis equation is a linear equation of the type already
considered, and if P(x) = 0 it reduces to Bernoullis equation, so we ignore these cases.
Riccatis studies were mainly limited to the equations
dy
dx
= ay
2
+bx

and
dy
dx
= ay
2
+bx +cx
2
,
where a, b, c and are constants. The rst of these equations was introduced in Riccatis
1724 paper
18
. It can be shown that the solution of the rst of the above equations can
be represented by known functions if = 2 or = 4k/(2k1), k = 1, 2, , and in
1841 Liouville showed that for any other values of its solution cannot be expressed as
an integral of elementary functions
19
. The more general equation 2.23 was also studied
17
It was apparently DAlembert in 1770 rst used the name Riccatis equation for this equation.
18
Acta Eruditorum, Suppl, VIII 1794, pp. 66-73.
19
Here the term elementary function has a specic meaning which is dened in the glossary.
2.3. FIRST-ORDER EQUATIONS 67
by Euler. This equation has since appeared in many contexts, indeed whole books are
devoted to it and its generalisations: we shall meet it again in chapter 8.
This type of equation arose in Riccatis investigations into plane curves with radii
of curvature solely dependent upon the ordinate. The radius of curvature, , of a curve
described by a function y(x), where x and y are Cartesian coordinates, is given by
1

=
y

(x)
(1 +y

(x)
2
)
3/2
. (2.24)
This expression is derived in exercise 2.66. Thus if depends only upon the ordinate,
y, we would have a second-order equation f(y, y

, y

) = 0, which does not depend


explicitly upon x. Such equations can be converted to rst-order equations by the
simple device of regarding y as the independent variable: dene p = dy/dx and express
y

(x) in terms of p and p

(y), using the chain rule as follows,


d
2
y
dx
2
=
dp
dx
=
dp
dy
dy
dx
= p
dp
dy
.
Thus the second-order equation f(y, y

, y

) = 0 is reduced to the rst-order equation


f(y, p(y), p

(y)) = 0. Riccati chose particular functions to give the equations quoted at


the beginning of this section, but note that the symbols have changed their meaning.
Exercise 2.18
If a function y(x) can be expressed as the ratio
y =
cg(x) +G(x)
cf(x) +F(x)
where c is a constant and g, G, f and F are dierentiable functions of x, by
eliminating the constant c from this equation and its rst derivative, show that y
satises a Riccati equation.
Later we shall see that all solutions of Riccatis equations can be expressed in this
form.
Reduction to a linear equation
Riccatis equation is an unusual nonlinear equation because it can be converted to
a linear, second-order equation by dening a new dependent variable u(x) with the
equation
y =
1
uR
du
dx
(2.25)
to give, assuming R(x) = 0 in the interval of interest,
d
2
u
dx
2

_
Q+
R

R
_
du
dx
+PRu = 0, (2.26)
which is a linear, second-order equation.
Exercise 2.19
Derive equation 2.26.
68 CHAPTER 2. ORDINARY DIFFERENTIAL EQUATIONS
Exercise 2.20
(a) Consider the equation
p2(x)
d
2
u
dx
2
+p1(x)
du
dx
+p0(x)u = 0.
By introducing a new function y(x) by u = exp
__
dx y
_
show that y satises
the Riccati equation
dy
dx
=
p0
p2

p1
p2
y y
2
.
(b) The general solution of the second-order equation for u has two arbitrary
constants. The general solution of the rst-order equation for y has one arbitrary
constant. Explain how this contradiction can be resolved.
Exercise 2.21
(a) Show that the Riccati equation considered in exercise 2.10 (page 62),
dy
dx
= 1 +xy
2
, y(0) = A,
has the associated linear equation
x
d
2
u
dx
2

du
dx
+x
2
u = 0, where y =
1
xu
du
dx
.
(b) By substituting the series u = a0 +a1x + a2x
2
+ into this equation show
that a
3k+1
= 0, k = 0, 1, , and that
(n
2
1)an+1 +an2 = 0.
By choosing (a0, a2) = (1, 0) and (a0, a2) = (0, 1) obtain the two independent
solutions
u1(x) = 1 +a3x
3
+a6x
6
+ +a
3k
x
3k
+ ,
u2(x) = x
2
+b5x
5
+b8x
8
+ +b
3k+2
x
3k+2
+ ,
where
a
3k
=
(1)
k
(2
2
1)(5
2
1) ((3k 1)
2
1)
and b
3k+2
=
(1)
k
(4
2
1)(7
2
1) ((3k + 1)
2
1)
.
Deduce that the radii of convergence of the series for u1(x) and u2(x) are innite.
(c) Show that the solution of the original Riccati equation is
y(x) =
u

1
(x) Au

2
(x)/2
x(u1(x) Au2(x)/2)
.
By considering the denominator show that for large A the singularity in y(x) is
at x =
_
2/A, approximately.
2.3. FIRST-ORDER EQUATIONS 69
Method of solution when one integral is known (optional)
Euler noted that if a particular solution v(x) is known then the substitution y = v+1/z
gives a linear equation for z(x), from which the general solution can be constructed.
Substituting for y = v + 1/z into Riccatis equation gives
v

z
2
= P +Qv +Rv
2
+
Q
z
+
2Rv
z
+
R
z
2
which simplies to
z

+P
1
z = R where P
1
= Q+ 2Rv. (2.27)
This is a linear, rst-order equation that can be solved using the methods previously
discussed. There are a number of special values for P, Q and R for which this method
yields the general solution in terms of an integral: for completeness these are listed in
table 2.1. You are not expected to remember this table.
Table 2.1: A list of the coecients for Riccatis equation, y

= P(x) + Q(x)y + R(x)y


2
, for
which a simple particular integral, v(x) can be found. In this list is a real number and n an
integer, but in some cases it may be a real number.
Cases 7 and 13 have two particular integrals and this allows the general solution to be expressed
as an integral, see equation 2.28.
Case 16 is special, because the transformation z = x
n
y makes the equation separable, see
exercise 2.26.
Case 17 has an explicit solution if n = 2 and reduces to a Bessel function if n = 2, see
exercise 2.28.
P(x) Q(x) R(x) v
1 a(a +f(x)) f(x) 1 a
2 b(a +bf(x)) a f(x) b
3 f(x) xf(x) 1 1/x
4 anx
n1
ax
n
f(x) f(x) ax
n
5 anx
n1
a
2
x
2n
f(x) 0 f(x) ax
n
6 f(x) x
n+1
f(x) (n + 1)x
n
x
n1
7 ax
2n1
f(x) n/x f(x)/x

a x
n
8 a
2
f(x) ag(x) g(x) f(x) a
9 a
2
x
2n
f(x) ax
n
g(x) +anx
n1
g(x) f(x) ax
n
10 f(x) ae
x
f(x) ae
x
e
x
/a
11 ae
x
ae
x
f(x) f(x) ae
x
12 ae
x
a
2
e
2x
f(x) 0 f(x) ae
x
13 ae
2x
f(x) f(x)

a e
x
14 f

(x) f(x)
2
0 1 f(x)
15 g

(x) f(x)g(x) f(x) g(x)


16 bf(x)/x (ax
n
f(x) n)/x x
2n1
f(x)
17 bx
n
0 a
70 CHAPTER 2. ORDINARY DIFFERENTIAL EQUATIONS
Exercise 2.22
Use the method described in this section to nd the solutions of the following
equations using the form of the particular solution, v(x), suggested, where a, b
are constants to be determined.
(a) y

= xy
2
+ (1 2x)y +x 1, y(0) = 1/2, v = a,
(b) y

= 1 +x x
3
+ 2x
2
y xy
2
, y(1) = 1, v = ax +b,
(c) 2y

= 1 + (y/x)
2
, y(1) = 1, v = ax +b,
(d) 2y

= (1 +e
x
)y +y
2
e
x
, y(0) = 1, v = ae
bx
.
Exercise 2.23
For the equation
dy
dx
= a(a +f(x)) +f(x)y +y
2
,
which is case 1 of table 2.1, show that the general solution is
y = a +
p(x)
C
_
dxp(x)
, p(x) = exp
_
2ax +
_
dxf(x)
_
.
Exercise 2.24
Decide which of the cases listed in table 2.1 corresponds to the equation
dy
dx
= 1 xy +y
2
.
Find the general solution in terms of an integral, and the solution for the condition
y(0) = a.
Method of solution when two integrals are known (optional)
If two particular integrals, v
1
(x) and v
2
(x), are known then the general solution can be
expressed as an integral of a known function. Suppose that y is the unknown, general
solution: then from the dening equations,
y

1
= (y v
1
)(Q + (y +v
1
)R) and y

2
= (y v
2
)(Q + (y +v
2
)R)
and hence
y

1
y v
1

2
y v
2
= (v
1
v
2
)R.
This equation can be integrated directly to give
ln
_
y v
1
y v
2
_
=
_
dx(v
1
v
2
)R, (2.28)
which is the general solution.
2.3. FIRST-ORDER EQUATIONS 71
Exercise 2.25
Using the trial function y = Ax
a
, where A and a are constants, for each of the
following equations nd two particular integrals and hence the general solution.
(a) x
2
dy
dx
+ 2 +x
2
y
2
= 2xy,
(b) (x
2
1)
dy
dx
+x + 1 (x
2
+ 1)y + (x 1)y
2
= 0,
(c) x
2
dy
dx
= 2 x
2
y
2
.
Exercise 2.26
Show that the equation
dy
dx
=
b
2
x(1 +x
2
)

2
x
y +
x
3
y
2
1 +x
2
is an example of case 16 of table 2.1 and hence nd its general solution.
Exercise 2.27
Use case 7 to show that the particular and general solutions of
dy
dx
= A
2
x
2n1
f(x) +
n
x
y +
f(x)
x
y
2
are y = Ax
n
and
y = Ax
n
1 +Bexp(F(x))
1 Bexp(F(x))
where F(x) = 2A
_
dxx
n1
f(x)
and B is arbitrary constant.
Exercise 2.28
This exercise is about case 17, that is, the Riccati equation
dy
dx
= bx
n
+ay
2
.
(a) Using the transformation y =
w

aw
transform this equation to the linear
equation
d
2
w
dx
2
+abx
n
w = 0.
Using the further transformation to the independent variable z = x

and of the
dependent variable w(z) = z

u(z), show that u(z) satises the equation


z
2
d
2
u
dz
2
+z
du
dz
+
_

+
ab

2
z
n+2

_
u = 0.
Choosing the coecient of zu

to be unity and such that (n + 2) = 2, show


that
z
2
d
2
u
dz
2
+z
du
dz
+
_
4ab
(n + 2)
2
z
2

1
(n + 2)
2
_
u = 0, n = 2.
72 CHAPTER 2. ORDINARY DIFFERENTIAL EQUATIONS
Deduce that the general solution is
u(z) = AJ 1
n+2
_
2

ab
n + 2
z
_
+BY 1
n+2
_
2

ab
n + 2
z
_
, z = x
n+2
2
,
where J() and Y() are the two ordinary Bessel functions satisfying Bessels
equation

2
d
2
w
d
2
+
dw
d
+ (
2

2
)w = 0
and = (n + 2)/2 and = 1/(n + 2).
(b) If n = 2 show that solutions of the equation for w(x) are w = Bx

where B
is an arbitrary constant and are the solutions of
2
+ ab = 0. Deduce that
particular solutions of the original Riccati equation are y =

ax
and hence that
its general solution is
y =
A2 1x
d
ax(x
d
A)
, d =

1 4ab, 1, 2 =
1
2
(1 d),
where A is an arbitrary constant.
2.4 Second-order equations
2.4.1 Introduction
In this section we introduce some aspects of linear, second-order equations, which fre-
quently arise in the description of physical systems. There are two themes to this
section: in section 2.4.2 we discuss some important general properties of linear equa-
tions, which are largely due to linearity and which make this type of equation much
easier to deal with than nonlinear equations: this discussion is continued in chapter 13
where we shall see that many properties of the solutions of some equations can be de-
termined without nding explicit solutions. Second, in section 2.4.4 we describe various
tricks to nd solutions for particular types of equation.
2.4.2 General ideas
In this section we describe some of the general properties of linear, second-order dier-
ential equations. The equation we consider is the inhomogeneous equation,
p
2
(x)
d
2
y
dx
2
+p
1
(x)
dy
dx
+p
0
(x)y = h(x), a x b, (2.29)
where the coecients p
k
(x), k = 0, 1, 2 are real and assumed to be continuous for
x (a, b). The interval (a, b) may be nite or innite.
The nature of the solutions depends upon p
2
(x), the coecient of y

(x). The theory


is valid in intervals for which p
2
(x) = 0 and for which p
1
/p
2
and p
0
/p
2
are continuous.
If p
2
(x) = 0 at some point x = c the equation is said to be singular at x = c, or to have
a singular point. Singular points, when they exist, always dene the ends of intervals
of denition; hence we may always choose p
2
(x) 0 for x [a, b].
2.4. SECOND-ORDER EQUATIONS 73
The homogeneous equation associated with equation 2.29 is obtained by setting
h(x) = 0,
p
2
(x)
d
2
y
dx
2
+p
1
(x)
dy
dx
+p
0
(x)y = 0, a x b. (2.30)
All homogeneous equations have the trivial solution y(x) = 0, for all x. Solutions that
do not vanish identically are called nontrivial.
Equations 2.29 and 2.30 can be transformed into other forms which are more use-
ful. The two most useful changes are dealt with in exercise 2.31; the rst of these is
important for the general theory discussed in this course and the second is particularly
useful for certain types of approximations.
The solutions of equations 2.29 and 2.30 satisfy the following properties.
P1: Solutions of the homogeneous equation satisfy the superposition principle:
that is if f(x) and g(x) are solutions of equation 2.30 then so is any linear com-
bination
y(x) = c
1
f(x) +c
2
g(x)
where c
1
and c
2
are any constants.
P2: Uniqueness of the initial value problem. If p
1
/p
2
and p
0
/p
2
are contin-
uous for x [a, b] then at most one solution of equation 2.29 can satisfy the given
initial conditions y(a) =
0
, y

(a) =
1
, theorem 2.2 (page 81).
P3: If f(x) and g(x) are solutions of the homogeneous equation 2.30 and if, for
some x = , the vectors (f(), f

()) and (g(), g

()) are linearly independent,


then every solution of equation 2.30 can be written as a linear combination of
f(x) and g(x),
y(x) = c
1
f(x) +c
2
g(x).
The two functions f(x) and g(x) are said to form a basis of the dierential equa-
tion.
P4: The general solution of the inhomogeneous equation 2.29 is given by the sum of
any particular solution and the general solution of the homogeneous equation 2.30.
Finally we observe that an ordinary point, x
0
, is where p
1
(x)/p
2
(x) and p
0
(x)/p
2
(x)
can be expanded as a Taylor series about x
0
, and that at every ordinary point the
solutions of the homogeneous equation 2.30 can also be represented by a Taylor series.
It is common, however, for either or both of p
1
(x)/p
2
(x) and p
0
(x)/p
2
(x) to be singular
at some point x
0
, and these points are named singular points, and these are divided
into two classes. If (xx
0
)p
1
(x)/p
2
(x) and (xx
0
)
2
p
0
(x)/p
2
(x) can be expanded as a
Taylor series, the singular point is regular: otherwise it is irregular. Irregular singular
points do not occur frequently in physical problems but, for the geometric reasons
discussed in chapter 13, regular singular points are common. For ordinary and regular
singular points there is a well developed and important theory of deriving the series
representation for the solutions of the homogeneous equation, but this is not relevant for
this course; good treatments can be found in Ince
20
, Piaggio
21
and Simmons
22
. There
is no equivalent theory for nonlinear equations.
20
Ince E L , 1956 Ordinary dierential equations, chapter XVI (Dover).
21
Piaggio H T H 1968, An Elementary treatise on Dierential Equations, chapter IX, G Bell and
Sons, rst published in 1920.
22
Simmons G F 1981, Dierential Equations, chapter 5, McGraw-Hill Ltd.
74 CHAPTER 2. ORDINARY DIFFERENTIAL EQUATIONS
Exercise 2.29
Use property P2 to show that if a nontrivial solution of equation 2.30 y(x) is zero
at x = , then y

() = 0, that is the zeros of the solutions are simple.


Exercise 2.30
Consider the two vectors x = (x1, x2) and y = (y1, y2) in the Cartesian plane.
Show that they are linearly independent, that is not parallel, if
x1y2 x2y1 =

x1 x2
y1 y2

= 0.
Exercise 2.31
Consider the second-order, homogeneous, linear dierential equation
p2(x)
d
2
y
dx
2
+p1(x)
dy
dx
+p0(x)y = 0.
(a) Show that it may be put in the canonical form
d
dx
_
p(x)
dy
dx
_
+q(x)y = 0 (2.31)
where p(x) = exp
__
dx
p1(x)
p2(x)
_
and q(x) =
p0(x)
p2(x)
p(x).
Equation 2.31 is known as the self-adjoint form and this transformation shows that
most linear, second-order, homogeneous dierential equation may be cast into this
form: the signicance of this transformation will become apparent in chapter 13.
(b) By putting y = uv, with a judicious choice of the function v(x), show that
equation 2.31 may be cast into the form
d
2
u
dx
2
+I(x)u = 0, u = y

p, (2.32)
and where I(x) =
1
4p
2
_
p
2
+ 4qp 2pp

_
. Equation 2.32 is sometimes known as
the normal form and I(x) the invariant of the original equation.
2.4.3 The Wronskian
In property P3 we introduced the vectors (f, f

) and (g, g

) and in exercise 2.30 it was


shown that these vectors are linearly independent if
W(f, g; x) =

f(x) f

(x)
g(x) g

(x)

= f(x)g

(x) f

(x)g(x) = 0. (2.33)
The function W(f, g; x) is named the Wronskian
23
of the functions f(x) and g(x). This
notation for the Wronskian shows which functions are used to construct it and the
23
Josef Hoene (1778 1853) was born in Poland, moved to France and become a French citizen in
1800. He moved to Paris in 1810 and adpoted the name Josef Hoene de Wronski at about that time,
just after he married.
2.4. SECOND-ORDER EQUATIONS 75
independent variable; sometimes such detail is unnecessary so either of the notations
W(x) or W(f, g) is freely used.
If W(f, g; x) = 0 for a < x < b the functions f(x) and g(x) are said to be linearly
independent in (a, b); alternatively if W(f, g; x) = 0 they are linearly dependent. These
rules apply only to suciently smooth functions.
The Wronskian of any two solutions, f and g, of equation 2.30 satises the identity
W(f, g; x) = W(f, g; a) exp
_

_
x
a
dt
p
1
(t)
p
2
(t)
_
. (2.34)
This identity is proved in exercise 2.36 by showing that W(x) satises a rst-order
dierential equation and solving it. Because the right-hand side of equation 2.34 always
has the same sign, it follows that the Wronskian of two solutions is either always positive,
always negative or always zero. Thus, if f and g are linearly independent at one point
of the interval (a, b) they are linearly independent at all points of (a, b). Conversely, if
W(f, g) vanishes anywhere it vanishes everywhere. Further, if p
1
(x) = 0 the Wronskian
is constant.
The Wronskian can be used with one known solution to construct another. Suppose
that f(x) is a known solution and let g(x) be another (unknown) solution. The equation
for W(x) can be interpreted as a rst-order equation for g,
g

f gf

= W(x),
and, because g

f gf

= f
2
d
dx
_
g
f
_
, this equation, with 2.34, can be written in the
form
d
dx
_
g
f
_
=
W(a)
f(x)
2
exp
_

_
x
a
dt
p
1
(t)
p
2
(t)
_
having the general solution
g(x) = f(x)
_
C +W(a)
_
x
a
ds
1
f(s)
2
exp
_

_
s
a
dt
p
1
(t)
p
2
(t)
__
, (2.35)
where C is an arbitrary constant.
Exercise 2.32
If F(z) is a dierentiable function and g = F(f), with f(x) a dierentiable,
non-constant function of x, show that W(f, g) = 0 only if g(x) = cf(x) for any
constant c.
Exercise 2.33
Show that the functions a1 sin x +a2 cos x and b1 sin x +b2 cos x are linearly inde-
pendent if a1b2 = a2b1.
Exercise 2.34
Use equation 2.35 to show that if f(x) is any nontrivial solution of the equation
y

+q(x)y = 0 for a < x < b, then another solution is g(x) = f(x)


_
x
a
ds
f(s)
2
.
76 CHAPTER 2. ORDINARY DIFFERENTIAL EQUATIONS
Exercise 2.35
(a) If f and g are linearly independent solutions of the homogeneous dierential
equation y

+p1(x)y

+p0(x)y = 0, show that


p1(x) =
fg

gf

W(f, g; x)
and p0(x) =
f

W(f, g; x)
.
(b) Construct three linear, homogeneous, second-order dierential equations hav-
ing the following bases of solutions:
(i) (x, sin x), (ii) (x
a
, x
a+b
), (iii) (x, e
ax
),
where a and b are real numbers. Determine any singular points of these equations,
and in case (ii) consider the limit b = 0.
Exercise 2.36
By dierentiating the Wronskian W(f, g; x), where f and g are linearly indepen-
dent solutions of equation 2.30, show that it satises the rst-order dierential
equation
dW
dx
=
p1(x)
p2(x)
W
and hence derive equation 2.34.
2.4.4 Second-order, constant coecient equations
A linear, second-order equation with constant coecients has the form
a
2
d
2
y
dx
2
+a
1
dy
dx
+a
0
y = h(x), (2.36)
where a
k
, k = 0, 1, 2 are real constants, h(x) a real function of only x and, with no loss
of generality, a
2
> 0. Normally this type of equation is solved by nding the general
solution of the homogeneous equation,
a
2
d
2
y
dx
2
+a
1
dy
dx
+a
0
y = 0, (2.37)
which contains two arbitrary constants, and adding to this any particular solution of
the original inhomogeneous equation, dened by equation 2.36.
The rst part of this process is trivial because, for any constant , the nth derivative
of exp(x) is
n
exp(x), that is, a constant multiple of the original function. Thus if
we substitute y = exp(x) into the homogeneous equation a quadratic equation for
is obtained,
a
2

2
+a
1
+a
0
= 0. (2.38)
This has two roots,
1
and
2
, and provided
1
=
2
we have two independent solutions,
giving the general solution
y
g
= c
1
exp(
1
x) +c
2
exp(
2
x), (
1
=
2
). (2.39)
If
1
and
2
are real, so are the constants c
1
and c
2
. If the roots are complex then

1
=

2
and, to obtain a real solution, we need c
1
= c

2
. The case
1
=
2
is special
and will be considered after the next exercise.
2.4. SECOND-ORDER EQUATIONS 77
Exercise 2.37
Find real solutions of the following constant coecient, dierential equations: if
no initial or boundary values are given nd the general solution. Here and k
are real.
(a) y

+ 5y

+ 6y = 0,
(b) 4y

+ 8y

+ 3y = 0,
(c) y

+y

+y = 0,
(d) y

+ 4y

+ 5y = 0, y(0) = 0, y

(0) = 2,
(e) y

+ 6y

+ 13y = 0, y(0) = 2, y

(0) = 1,
(f) y

+
2
y = 0, y(0) = a, y

(0) = b,
(g) y

2
y = 0, y(0) = a, y

(0) = b,
(h) y

+ 2ky

+ (
2
+k
2
)y = 0.
Repeated roots
If the roots of equation 2.38 are identical, that is a
2
1
= 4a
0
a
2
, then =
a
1
2a
2
and the
above method yields only one solution, y = exp(x). The other solution of 2.37 is found
using the method of variation of parameters, introduced in exercise 2.14. Assuming
that the other solution is y = v(x) exp(x), where v(x) is an unknown function, and
substituting into equation 2.37 gives
d
2
v
dx
2
= 0. (2.40)
Thus another independent solution is y = xexp(x), the general solution is
y
g
(x) = (c
1
+c
2
x) exp(x), =
a
1
2a
2
, (a
2
1
= 4a
0
a
2
). (2.41)
Exercise 2.38
Derive equation 2.40.
Exercise 2.39
Find the solutions of
(a) y

6y

+ 9y = 0, y(0) = 0, y

(0) = b,
(b) y

+ 2y

+y = 0, y(0) = a, y(X) = b.
78 CHAPTER 2. ORDINARY DIFFERENTIAL EQUATIONS
2.4.5 Inhomogeneous equations
The general solution of the inhomogeneous equation
a
2
d
2
y
dx
2
+a
1
dy
dx
+a
0
y = h(x) (2.42)
can be written as the sum of the general solution of the homogeneous equation and
any particular integral of the inhomogeneous equation. This is true whether or not the
coecients a
k
, k = 0, 1 and 2, are constant: but here we consider only the simpler
constant coecient case. Boundary or initial conditions must be applied to this sum,
not the component parts.
There are a variety of methods for attempting to nd a particular integral. The
problem can sometimes be made simpler by splitting h(x) into a sum of simpler terms,
h = h
1
+ h
2
, and nding particular y
1
and y
2
for h
1
and h
2
: because the equation is
linear the required particular integral is y
1
+y
2
.
Sometimes the integral can be found by a suitable guess. Thus if h(x) = x
n
, n
being a positive integer, we expect a particular integral to have the form

n
k=0
c
k
x
k
.
By substituting this into equation 2.42 and equating the coecients of x
k
to zero, n+1
equations for the n + 1 coecients are obtained.
Exercise 2.40
Find the general solution of
d
2
y
dx
2
+
2
y = x
2
, > 0,
and nd the solution that satises the initial conditions y(0) = a, y

(0) = b.
If h(x) = e
x
, where may be complex, then provided a
2

2
+a
1
+a
0
= 0 a particular
integral is e
x
/(a
2

2
+ a
1
+ a
0
). But if a
2

2
+ a
1
+ a
0
= 0 we can use the method
of variation of parameters by substituting y = v(x)e
x
into the equation, to form a
simpler equation for v(x). These calculations form the basis of the next exercise.
Exercise 2.41
(a) For the equation
a2
d
2
y
dx
2
+a1
dy
dx
+a0y = e
x
,
by substituting the function y = Ae
x
, A a constant, into the equation nd a
particular integral if a2
2
+a1 +a0 = 0.
(b) If a2
2
+a1+a0 = 0, put y = v(x)e
x
and show that v satises the equation
a2
d
2
v
dx
2
+ (2a2 +a1)
dv
dx
= 1,
and that this equation has the general solution
v =
x
2a2 +a1
+B
Aa2
2a2 +a1
e
x(2+a
1
/a
2
)
.
Hence show that a particular integral is y =
x
2a2 +a1
e
x
.
2.4. SECOND-ORDER EQUATIONS 79
Exercise 2.42
Find the solutions of the following inhomogeneous equations with the initial con-
ditions y(0) = a, y

(0) = b.
(a)
d
2
y
dx
2
+y = e
ix
, (b)
d
2
y
dx
2
y = sin x, (c)
d
2
y
dx
2
4y = 6,
(d)
d
2
y
dx
2
+ 9y = 1 + 2x, (e)
d
2
y
dx
2

dy
dx
6y = 14 sin 2x + 18 cos 2x.
A more systematic method of nding particular integrals is to convert the solution
to an integral. This transformation is achieved by applying the method of variation
of parameters using two linearly independent solutions of the homogeneous equation,
which we denote by f(x) and g(x). We assume that the solution of the inhomogeneous
equation can be written in the form
y = c
1
(x)f(x) +c
2
(x)g(x), (2.43)
where c
1
(x) and c
2
(x) are unknown functions, to be found. It transpires that both of
these are given by separable, rst-order equations; but the analysis to derive this result
is a bit involved.
By substituting this expression into the dierential equation, it becomes
a
2
(c

1
f + 2c

1
f

+c
1
f

) +a
2
(c

2
g + 2c

2
g

+c
2
g

)
+a
1
(c

1
f +c
1
f

) +a
1
(c

2
g +c
2
g

) +a
0
(c
1
f + c
2
g) = h(x).
We expect this expression to simplify because f and g satisfy the homogeneous equation:
some re-arranging gives
c
1
(a
2
f

+a
1
f

+a
0
f) +c
2
(a
2
g

+a
1
g

+a
0
g)
+a
2
(c

1
f + 2c

1
f

) +a
1
c

1
f
+a
2
(c

2
g + 2c

2
g

) +a
1
c

2
g = h(x).
The rst line of this expression is identically zero; the second line can be written in the
form
a
2
(c

1
f +c

1
f

) +a
2
c

1
f

+a
1
c

1
f = a
2
(c

1
f)

+a
2
(c

1
f

) +a
1
(c

1
f),
and similarly for the third line. Adding these two expressions we obtain
a
2
(c

1
f +c

2
g)

+a
2
(c

1
f

+c

2
g

) +a
1
(c

1
f +c

2
g) = h(x). (2.44)
This identity will hold if c
1
and c
2
are chosen to satisfy the two equations
c

1
f +c

2
g = 0,
c

1
f

+c

2
g

=
h(x)
a
2
.
(2.45)
Any solutions of these equations will yield a particular integral.
For each x, these are linear equations in c

1
and c

2
, and since the Wronskian
W(f, g) = 0, for any x, they have unique solutions given by
c

1
(x) =
hg
W(f, g)
, c

2
(x) =
hf
W(f, g)
. (2.46)
80 CHAPTER 2. ORDINARY DIFFERENTIAL EQUATIONS
Integrating these gives a particular integral. Notice that in this derivation, at no point
did we need to assume that the coecients a
0
, a
1
and a
2
are constant. Hence this result
is true for the general case, when these coecients are not constant, although it is then
more dicult to nd the solutions, f and g, of the homogeneous equation.
As an example we re-consider the problem of exercise 2.40, for which two linearly
independent solutions are f = cos x and g = sin x, giving W(f, g) = and equa-
tions 2.46 give
c
1
=
1

_
dxx
2
sin x, c
2
=
1

_
dxx
2
cos x,
=
_
x
2

2

2

4
_
cos x
2x

2
sin x, =
_
x
2

2

2

4
_
sin x +
2x

2
cos x.
Thus
y = c
1
cos x +c
2
sinx +
x
2

2

2

4
,
the result obtained previously, although the earlier method was far easier.
Exercise 2.43
Find the general solution of the equation
d
2
y
dx
2
+y = tan x, 0 x <

2
.
2.4.6 The Euler equation
The linear, second-order dierential equation
a
2
x
2
d
2
y
dx
2
+a
1
x
dy
dx
+a
0
y = 0, a
2
> 0, (2.47)
where the coecients a
0
, a
1
and a
2
are constants, is named a (homogeneous) Euler
equation, of second order. This equation is normally dened on an interval of the x-
axis which does not include the origin except, possibly, as an end point. It is one of the
relatively few equations with variable coecients that can be solved in terms of simple
functions.
If we introduce a new independent variable, t, by x = e
t
, then
dy
dx
=
dy
dt
dt
dx
=
1
x
dy
dt
that is x
dy
dx
=
dy
dt
. (2.48)
A second dierentiation gives
x
d
dx
_
x
dy
dx
_
=
d
2
y
dt
2
x
2
d
2
y
dx
2
=
d
2
y
dt
2

dy
dt
, (2.49)
and hence equation 2.47 becomes the constant coecient equation
a
2
d
2
y
dt
2
+ (a
1
a
2
)
dy
dt
+a
0
y = 0. (2.50)
This can be solved using the methods described in section 2.4.4.
2.5. AN EXISTENCE AND UNIQUENESS THEOREM 81
Exercise 2.44
Use the method described above to solve the equation
x
2
d
2
y
dx
2
+ 2x
dy
dx
6y = 0, y(1) = 1, y

(1) = 0, x 1.
Exercise 2.45
Find the solution of
x
d
2
y
dx
2
+
dy
dx
= 0, y(1) = A, y

(1) = A

, x 1.
Exercise 2.46
Show that if x = e
t
then
d
3
y
dt
3
= x
3
d
3
y
dx
3
+ 3x
2
d
2
y
dx
2
+x
dy
dx
, and hence that
x
3
d
3
y
dx
3
=
d
3
y
dt
3
3
d
2
y
dt
2
+ 2
dy
dy
.
Hence nd the general solution of the equation
x
3
d
3
y
dx
3
3x
2
d
2
y
dx
2
+ 6x
dy
dx
6y =

x, x 0.
2.5 An existence and uniqueness theorem
Here we quote a basic existence theorem for coupled rst-order systems, which is less
restrictive than theorem 2.1, but which does not provide a method of constructing
the solution. This proof was rst given by Cauchy in his lecture course at the

Ecole
polytechnique between 1820 and 1830.
Theorem 2.2
For the n coupled rst-order, autonomous, initial value system
dy
k
dx
= f
k
(y), y(x
0
) = A, (2.51)
where y = (y
1
, y
2
, . . . , y
n
), A = (A
1
, A
2
, . . . , A
n
) and where f
k
(y) are dierentiable
functions of y on some domain D, a
k
y
k
b
k
, a
k
< b
k
, k = 1, 2, , n,
then:
(i) for every real x
0
and A D there exists a solution satisfying the initial conditions
y(x
0
) = A, and;
(ii) this solution is unique in some neighbourhood of x containing x
0
.
A geometric understanding of this theorem comes from noting that in any region
where f
k
(y) = 0, for some k, all solutions are non-intersecting, smooth curves. More
precisely in a neighbourhood of a point y
0
where f
k
(y
0
) = 0, for some k, if all f
k
(y
0
)
have continuous second derivatives, it is possible to nd a new set of variables u such
that in the neighbourhood y
0
equation 2.51 transform to
du
1
dx
= 1 and
du
k
dx
= 0,
82 CHAPTER 2. ORDINARY DIFFERENTIAL EQUATIONS
k = 2, 3, , n. In this coordinate system the solutions are straight lines, so such a
transformation is said to rectify the system. From this it follows that a unique solution
exists. A proof of the above theorem that uses this idea is give in Arnold
24
.
There are two points to notice. First, the non-autonomous system
dy
k
dx
= f
k
(y, x), y(x
0
) = A,
can, by setting x = y
n+1
, f
n+1
= 1, be converted to an n + 1 dimensional autonomous
system.
Second, we note that dierentiability of f
k
(y), for all k, is necessary for uniqueness.
Consider, for instance, the system dy/dx = y
2/3
, y(0) = 0, which has the two solutions
y(x) = 0 and y(x) = (x/3)
3
.
2.6 Envelopes of families of curves (optional)
The equation $f(x, y) = 0$ defines a curve in the Cartesian plane $Oxy$. If the function contains a parameter $C$, the equation becomes $f(x, y, C) = 0$ and a different curve is obtained for each value of $C$. By varying $C$ over an interval we obtain a family of curves: the envelope of this family is the curve that touches every member of the family.
This envelope curve is given by eliminating $C$ between the two equations
\[ f(x, y, C) = 0 \quad\text{and}\quad \frac{\partial f}{\partial C} = 0. \tag{2.52} \]
Before proving this result we illustrate the idea with the equation
\[ x\cos\theta + y\sin\theta = r, \qquad r > 0, \quad 0 \le \theta \le 2\pi, \tag{2.53} \]
where $r$ is fixed and $\theta$ is the parameter: for a given value of $\theta$ this equation defines a straight line cutting the x and y axes at $r/\cos\theta$ and $r/\sin\theta$, respectively, and passing a minimum distance $r$ from the origin. Segments of five of these lines are shown in figure 2.3, and it is not too difficult to imagine more segments and to see that the envelope is a circle of radius $r$. For this example equations 2.52 become
\[ x\cos\theta + y\sin\theta = r \quad\text{and}\quad -x\sin\theta + y\cos\theta = 0. \]
[Figure 2.3 Diagram showing five examples of the line defined in equation 2.53, with $r = 1$ and $\theta = k\pi/14$, $k = 2, 3, \ldots, 6$.]
Squaring and adding these eliminates $\theta$ to give $x^2 + y^2 = r^2$, which is the equation of a circle with radius $r$ and centre at the origin.
$^{24}$Arnold V I 1973 Ordinary Differential Equations, section 32.6, translated and edited by R A Silverman, The MIT Press.
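The elimination prescribed by equations 2.52 can also be carried out symbolically; the SymPy sketch below (my addition) recovers the circular envelope of the family 2.53.

```python
import sympy as sp

x, y, r, th = sp.symbols('x y r theta', positive=True)
f = x*sp.cos(th) + y*sp.sin(th) - r       # the family of lines, parameter theta

# Solve f = 0 and df/dtheta = 0 simultaneously for x and y ...
sol = sp.solve([f, sp.diff(f, th)], [x, y], dict=True)[0]

# ... and eliminate theta: the envelope satisfies x^2 + y^2 = r^2
print(sp.simplify(sol[x]**2 + sol[y]**2))   # r**2
```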
The significance of envelopes in the theory of first-order differential equations is as follows. Suppose that $f(x, y, C) = 0$ is the general solution of a first-order equation, so on each member of the family of curves $f(x, y, C) = 0$ the gradient satisfies the differential equation. Where the envelope touches a member of the family, the gradient and coordinates of the point on the envelope also satisfy the differential equation. But, by definition, the envelope touches some member of the family at every point along its length. We therefore expect the envelope to satisfy the differential equation: since it does not include any arbitrary constant and is not one of the family of curves, it is a singular solution.
We prove equation 2.52 by considering neighbouring members of the family of curves $f(x, y, C + k\delta C) = 0$, $k = 1, 2, 3, \ldots$, $0 < \delta C \ll |C|$, such that the curves defined by $f(x, y, C)$ and $f(x, y, C + \delta C)$ intersect at P, those defined by $f(x, y, C + \delta C)$ and $f(x, y, C + 2\delta C)$ intersect at Q, and so on, as shown in figure 2.4. As $\delta C \to 0$ the members of this family of curves approach each other, as do the points P, Q and R. The locus of these points forms a curve, each point of which lies on successive members of the original family.
[Figure 2.4 Neighbouring members of the family, k = 1, 2, 3, 4, intersecting at the points P, Q and R.]
Consider two neighbouring members of the family,
\[ f(x, y, C) = 0 \quad\text{and}\quad f(x, y, C + \delta C) = 0; \]
as $\delta C \to 0$ we require values of $x$ and $y$ that satisfy both of these equations. The second equation can be expanded,
\[ f(x, y, C) + \delta C\,f_C(x, y, C) + O(\delta C^2) = 0 \quad\text{and hence}\quad f_C(x, y, C) = 0. \]
Thus the points on the locus of P, Q and R each satisfy both equations $f(x, y, C) = 0$ and $f_C(x, y, C) = 0$, so the equation of the envelope is obtained by eliminating $C$ from these equations.
Exercise 2.47
The equation of a straight line intersecting the x-axis at $a$ and the y-axis at $b$ is $x/a + y/b = 1$.
(a) Find the envelope, in the first quadrant, of the family of straight lines such that the sum of the intercepts is constant, $a + b = d > 0$.
(b) Find the envelope, in the first quadrant, of the family of straight lines such that the product of the intercepts is constant, $ab = d^2$.
2.7 Miscellaneous exercises
Exercise 2.48
Find the solution of each of the following differential equations: if no initial or boundary values are given, find the general solution.
(a) $\dfrac{dy}{dx} + y = y^{1/2}$, $y(1) = A > 0$.
(b) $\dfrac{dy}{dx} - y = y^{1/5}$, $y(1) = A > 0$.
(c) $\dfrac{1}{y}\dfrac{dy}{dx} - x = xy$, $y(0) = \dfrac{1}{2}$.
(d) $\dfrac{dy}{dx} = \sin(x - y)$, $y(0) = 0$.
(e) $x\dfrac{dy}{dx} = y + \sqrt{x^2 + y^2}$, $y(0) = A > 0$.
(f) $\dfrac{dy}{dx} = \dfrac{x + 2y - 1}{2x + 4y + 3}$, $y(1) = 0$.
(g) $y\dfrac{dy}{dx} - x + y = 0$, $y(1) = 1$.
(h) $\dfrac{dy}{dx} + y\sinh x = \sinh 2x$, $y(0) = 1$.
(i) $x\dfrac{dy}{dx} + 2y = x^3$, $y(1) = 0$.
(j) $\dfrac{dy}{dx} = y\tan x + y^3\tan^3 x$, $y(0) = \sqrt{2}$.
(k) $x^3\dfrac{dy}{dx} = y(x^2 + y)$.
Exercise 2.49
Find the solution of each of the following differential equations: if no initial or boundary values are given, find the general solution.
(a) $\dfrac{d^2y}{dx^2} = 2x\left(\dfrac{dy}{dx}\right)^2$, $y(0) = 0$, $y'(0) = 1$.
(b) $x^2\dfrac{d^2y}{dx^2} - x\dfrac{dy}{dx} + y = x^3\ln x$.
(c) $x\dfrac{d^2y}{dx^2} = y\dfrac{dy}{dx}$.
(d) $x\dfrac{d^2y}{dx^2} - \dfrac{dy}{dx} = 3x^2$.
(e) $x\dfrac{d^2y}{dx^2} = \left(\dfrac{dy}{dx}\right)^2$.
(f) $\dfrac{d^2y}{dx^2} + (x + a)\left(\dfrac{dy}{dx}\right)^2 = 0$, $y(0) = A$, $y'(0) = B$, $0 < Ba^2 < 2$.
(g) $(y - a)\dfrac{d^2y}{dx^2} + \left(\dfrac{dy}{dx}\right)^2 = 0$.
Exercise 2.50
For each of the following equations, show that the given function, $v(x)$, is a solution and hence find the general solution.
(a) $x(1 - x)^2\dfrac{d^2y}{dx^2} + (1 - x^2)\dfrac{dy}{dx} + (1 + x)y = 0$, $v(x) = 1 - x$.
(b) $x\dfrac{d^2y}{dx^2} + 2\dfrac{dy}{dx} + xy = 0$, $v = \dfrac{\cos x}{x}$.
(c) $\dfrac{d^2y}{dx^2} + 2\tan x\,\dfrac{dy}{dx} + 2y\tan^2 x = 0$, $v = e^{-x}\cos x$.
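As a check on the equations reconstructed above, the following SymPy fragment (my addition) confirms that each quoted $v(x)$ satisfies its equation.

```python
import sympy as sp

x = sp.symbols('x')

# Each entry is (p2, p1, p0, v) for the equation p2*y'' + p1*y' + p0*y = 0
checks = [
    (x*(1 - x)**2, 1 - x**2,     1 + x,           1 - x),
    (x,            2,            x,               sp.cos(x)/x),
    (1,            2*sp.tan(x),  2*sp.tan(x)**2,  sp.exp(-x)*sp.cos(x)),
]

for p2, p1, p0, v in checks:
    residual = p2*sp.diff(v, x, 2) + p1*sp.diff(v, x) + p0*v
    print(sp.simplify(residual))   # each prints 0
```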
Exercise 2.51
(a) Consider the Riccati equation with constant coefficients,
\[ \frac{dy}{dx} = a + by + cy^2, \qquad c \ne 0, \]
where $a$, $b$ and $c$ are constants.
Show that if $b^2 \ne 4ac$ the general solution is
\[ y(x) = -\frac{b}{2c} + \begin{cases} \dfrac{\delta}{c}\tan\delta(x + \gamma), & \delta = \tfrac{1}{2}\sqrt{4ac - b^2}, & \text{if } 4ac > b^2, \\[2ex] -\dfrac{\delta}{c}\tanh\delta(x + \gamma), & \delta = \tfrac{1}{2}\sqrt{b^2 - 4ac}, & \text{if } 4ac < b^2, \end{cases} \]
where $\gamma$ is a constant. Also find the general solution if $b^2 = 4ac$.
(b) Find the solutions of the following equations:
(i) $y' = 2 + 3y + y^2$, (ii) $y' = 9 - 4y^2$, (iii) $y' = 1 - 2y + y^2$, (iv) $y' = 1 + 4y + 5y^2$.
Exercise 2.52
(a) Show that the change of variable $v = y'/y$ reduces the second-order equation
\[ \frac{d^2y}{dx^2} + a_1(x)\frac{dy}{dx} + a_0(x)y = 0 \]
to the Riccati equation
\[ \frac{dv}{dx} + v^2 + a_1(x)v + a_0(x) = 0. \]
Hence deduce that the problem of solving the original second-order equation is equivalent to solving the coupled first-order equations
\[ \frac{dy}{dx} = vy, \qquad \frac{dv}{dx} = -v^2 - a_1(x)v - a_0(x). \]
This equation is named the associated Riccati equation.
(b) Using an appropriate solution of $y'' + \omega^2 y = 0$, where $\omega$ is a real constant, show that the general solution of $v' + v^2 + \omega^2 = 0$ is $v = -\omega\tan\omega(x + c)$.
Exercise 2.53
(a) If $x(t)$ and $y(t)$ satisfy the pair of coupled, linear equations
\[ \frac{dx}{dt} = ax + by, \qquad \frac{dy}{dt} = cx + dy, \]
where $a$, $b$, $c$ and $d$ are constants, show that the ratio $z = y/x$ satisfies the Riccati equation
\[ \frac{dz}{dt} = c + (d - a)z - bz^2. \]
(b) Hence show that the general solution of this Riccati equation is
\[ z = \frac{\lambda_1 e^{\lambda_1 t} + C\lambda_2 e^{\lambda_2 t}}{b\left(e^{\lambda_1 t} + Ce^{\lambda_2 t}\right)}, \qquad\text{where}\quad 2\lambda_{1,2} = (d - a) \pm \sqrt{(d - a)^2 + 4bc} \]
and $C$ is an arbitrary constant.
Exercise 2.54
In this and the next exercise you will show that some of the equations studied by Riccati have closed form solutions.
(a) Consider the equation
\[ x\frac{dz}{dx} = az - bz^2 + cx^n, \tag{2.54} \]
where $a$, $b$ and $c$ are constants. By putting $z = yx^a$ show that this becomes the Riccati equation
\[ \frac{dy}{dx} = -bx^{a-1}y^2 + cx^{n-a-1}, \]
and by changing the independent variable to $\xi = x^a$ transform this to
\[ \frac{dy}{d\xi} = \frac{c}{a}\,\xi^{(n-2a)/a} - \frac{b}{a}\,y^2. \]
Deduce that if $n = 2a$ the solution of the original equation can be expressed in terms of simple functions.
(b) By substituting $z = \dfrac{a}{b} + \dfrac{x^n}{u}$ into equation 2.54 show that it becomes
\[ x\frac{du}{dx} = (n + a)u - cu^2 + bx^n, \]
which is the same equation but with $(a, b, c)$ replaced by $(n + a, c, b)$. Deduce that the solution of equation 2.54 can be expressed in terms of simple functions if $n = 2a$ or $n = 2(n + a)$.
Using further, similar transformations show that the original equation has closed-form solutions if $n = 2(ns + a)$, $s = 0, 1, 2, \ldots$
Exercise 2.55
By putting $z = x^n/u$ into equation 2.54 show that $u$ satisfies the equation
\[ x\frac{du}{dx} = (n - a)u - cu^2 + bx^n \]
and deduce that $z(x)$ can be expressed in terms of simple functions if $n = 2(n - a)$.
By making further transformations of the type used in exercise 2.54, show that $z(x)$ can be expressed in terms of simple functions if $n = 2(ns - a)$, $s = 1, 2, \ldots$
Exercise 2.56
Consider the sequence of functions
\[ y_0(x) = A + A'(x - a), \qquad y_n(x) = \int_a^x dt\,(t - x)G(t)y_{n-1}(t), \quad n = 1, 2, \ldots, \]
where $A$ and $A'$ are arbitrary constants.
Show that if
\[ y(x) = \sum_{k=0}^{\infty} y_k(x), \]
and assuming that the infinite series is uniformly convergent on an interval containing $x = a$, then $y(x)$ satisfies the second-order equation
\[ \frac{d^2y}{dx^2} + G(x)y = 0, \qquad y(a) = A, \quad y'(a) = A'. \]
Exercise 2.57
It is well known that the exponential function, $E(x) = e^x$, is the solution of the first-order equation
\[ \frac{dE}{dx} = E, \qquad E(0) = 1. \tag{2.55} \]
Not so well known is the fact that many of the properties of $e^x$, for real $x$, can be deduced directly from this equation.
(a) Using theorem 2.2 (page 81) deduce that there are no real values of $x$ at which $E(x) = 0$.
(b) By defining the function $W(x) = 1/E(x)$, show that $W'(y) = W(y)$, $W(0) = 1$, where $y = -x$, and deduce that $E(x)E(-x) = 1$.
(c) If $Z(x) = E(x + y)$ show that $Z'(x) = Z(x)$, $Z(0) = E(y)$, and hence deduce that $E(x + y) = E(x)E(y)$.
(d) If $L(y)$ is the inverse function, that is if $E(x) = y$ then $L(y) = x$, show that $L'(y) = 1/y$, $L(y_1y_2) = L(y_1) + L(y_2)$ and $L(1/y) = -L(y)$.
(e) Show that the Taylor series of $E(x)$, $L(1 + z)$ and $L\!\left(\dfrac{1+z}{1-z}\right)$ are
\[ E(x) = \sum_{n=0}^{\infty}\frac{x^n}{n!}, \qquad L(1 + z) = \sum_{n=1}^{\infty}(-1)^{n-1}\frac{z^n}{n} \qquad\text{and}\qquad L\!\left(\frac{1+z}{1-z}\right) = 2\sum_{n=0}^{\infty}\frac{z^{2n+1}}{2n+1}. \]
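The spirit of this exercise can also be seen numerically; the sketch below (my addition) integrates equation 2.55 with a standard solver and tests property (b), $E(x)E(-x) = 1$, without ever calling the exponential function.

```python
import numpy as np
from scipy.integrate import solve_ivp

def rhs(x, E):
    return E                      # dE/dx = E

xs = np.linspace(0.0, 2.0, 41)
fwd = solve_ivp(rhs, (0.0, 2.0),  [1.0], t_eval=xs,  rtol=1e-10, atol=1e-12)
bwd = solve_ivp(rhs, (0.0, -2.0), [1.0], t_eval=-xs, rtol=1e-10, atol=1e-12)

# Property (b): E(x) E(-x) = 1 for every x on the grid
print(np.max(np.abs(fwd.y[0]*bwd.y[0] - 1.0)))   # close to zero
```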
Exercise 2.58
In this exercise you will derive some important properties of the sine and cosine functions directly from the differential equations that can be used to define them. Your solutions must not make use of trigonometric functions.
(a) Show that the solution of the initial value problem
\[ \frac{d^2z}{dx^2} + z = 0, \qquad z(0) = \alpha, \quad z'(0) = \beta, \]
can be written as an appropriate linear combination of the functions $C(x)$ and $S(x)$, which are defined to be the solutions of the equations
\[ \begin{pmatrix} C' \\ S' \end{pmatrix} = A\begin{pmatrix} C \\ S \end{pmatrix}, \qquad A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \qquad C(0) = 1, \quad S(0) = 0. \]
(b) Show that $C(x)^2 + S(x)^2 = 1$, which is Pythagoras's theorem.
(c) If, for any real constant $a$,
\[ f(x) = C(x + a) \quad\text{and}\quad g(x) = S(x + a), \]
show that
\[ \begin{pmatrix} f' \\ g' \end{pmatrix} = A\begin{pmatrix} f \\ g \end{pmatrix}, \qquad f(0) = C(a), \quad g(0) = S(a), \]
and deduce that
\[ C(x + a) = C(x)C(a) - S(x)S(a) \quad\text{and}\quad S(x + a) = S(x)C(a) + C(x)S(a), \]
which are the trigonometric addition formulae.
(d) Show that there is a non-negative number $X$ such that
\[ C(nX) = 1 \quad\text{and}\quad S(nX) = 0 \quad\text{for all } n = 0, 1, \ldots, \]
and hence that for all $x$
\[ C(x + X) = C(x), \qquad S(x + X) = S(x), \]
so that $C(x)$ and $S(x)$ are periodic functions with period $X$.
(e) Show that
\[ S\!\left(\frac{X}{4}\right) = 1, \quad C\!\left(\frac{X}{4}\right) = 0; \qquad S\!\left(\frac{X}{2}\right) = 0, \quad C\!\left(\frac{X}{2}\right) = -1; \qquad S\!\left(\frac{3X}{4}\right) = -1, \quad C\!\left(\frac{3X}{4}\right) = 0. \]
(f) Show that $A^2 = -I$ and hence that $A^{2n} = (-1)^n I$ and $A^{2n+1} = (-1)^n A$.
By repeated differentiation of the equations defining $C$ and $S$ show that
\[ \begin{pmatrix} C^{(n)}(0) \\ S^{(n)}(0) \end{pmatrix} = A^n\begin{pmatrix} 1 \\ 0 \end{pmatrix} \]
and deduce that the Taylor expansions of $C(x)$ and $S(x)$ are
\[ C(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots + (-1)^n\frac{x^{2n}}{(2n)!} + \cdots, \qquad S(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots + (-1)^n\frac{x^{2n+1}}{(2n+1)!} + \cdots. \]
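The defining system for $C$ and $S$ is also a convenient test bed for a numerical solver; the following sketch (my addition) integrates it and checks parts (b) and (c) on a grid, again without using any trigonometric functions.

```python
import numpy as np
from scipy.integrate import solve_ivp

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])

def rhs(x, u):
    return A @ u                  # (C', S') = A (C, S)

xs = np.linspace(0.0, 7.0, 201)   # uniform grid
sol = solve_ivp(rhs, (0.0, 7.0), [1.0, 0.0], t_eval=xs, rtol=1e-10, atol=1e-12)
C, S = sol.y

# Part (b): C^2 + S^2 = 1 everywhere
print(np.max(np.abs(C**2 + S**2 - 1.0)))

# Part (c): addition formula C(x + a) = C(x)C(a) - S(x)S(a), with a = xs[50]
ia = 50
i = np.arange(0, len(xs) - ia)
err = C[i + ia] - (C[i]*C[ia] - S[i]*S[ia])
print(np.max(np.abs(err)))        # both maxima are close to zero
```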
Exercise 2.59
Find the normal form, as defined in exercise 2.31(b), of Legendre's equation
\[ \frac{d}{dx}\!\left((1 - x^2)\frac{dy}{dx}\right) + \lambda y = 0. \]
Exercise 2.60
Show that changing to the independent variable $t = \int_a^x dx\,\sqrt{q(x)}$ converts the equation $y'' + p_1(x)y' + q(x)y = 0$, $a \le x \le b$, $q(x) > 0$, into
\[ \frac{d^2y}{dt^2} + \frac{q'(x) + 2p_1 q}{2q^{3/2}}\frac{dy}{dt} + y = 0. \]
Exercise 2.61
If $f(x)$, $g(x)$ and $h(x)$ are any solutions of the second-order equation $y'' + p_1(x)y' + q(x)y = 0$, show that the following determinant is zero:
\[ \begin{vmatrix} f & f' & f'' \\ g & g' & g'' \\ h & h' & h'' \end{vmatrix}. \]
Exercise 2.62
Use the results found in exercise 2.35 (page 76) to construct a linear, homogeneous, second-order differential equation having the solutions
(a) $(\sinh x, \sin x)$, (b) $(\tan x, 1/\tan x)$.
Exercise 2.63
Use the results found in exercise 2.35 (page 76) to show that the equation
\[ \frac{d^2y}{dx^2} - \frac{u'}{u}\frac{dy}{dx} - u^2 y = 0, \qquad u = \frac{f'}{f}, \]
has solutions $f(x)$ and $1/f(x)$.
Exercise 2.64
Let $f(x)$, $g(x)$ and $h(x)$ be three solutions of the linear, third-order differential equation
\[ \frac{d^3y}{dx^3} + p_2(x)\frac{d^2y}{dx^2} + p_1(x)\frac{dy}{dx} + p_0(x)y = 0. \]
Derive a first-order differential equation for the Wronskian
\[ W(x) = \begin{vmatrix} f & g & h \\ f' & g' & h' \\ f'' & g'' & h'' \end{vmatrix}. \]
You will need to differentiate this determinant: the derivative of an $n \times n$ determinant, $A$, whose elements depend upon $x$ is
\[ \frac{d}{dx}\det(A) = \sum_{k=1}^{n}\det(A_k), \]
where $A_k$ is the determinant formed by differentiating the $k$th row of $A$.
Exercise 2.65
The Schwarzian derivative
(a) If $f(x)$ and $g(x)$ are any two linearly independent solutions of the equation $y'' + q(x)y = 0$, show that the ratio $v = f/g$ is a solution of the third-order, nonlinear equation $S(v) = 2q(x)$, where
\[ S(v) = \frac{v'''}{v'} - \frac{3}{2}\left(\frac{v''}{v'}\right)^2. \]
(b) If $a$, $b$, $c$ and $d$ are four constants with $ad \ne bc$, deduce that
\[ S\!\left(\frac{av + b}{cv + d}\right) = S(v). \]
The function $S(v)$ is named the Schwarzian derivative and has the important property that if $S(F) < 0$ and $S(G) < 0$ in an interval, then $S(H) < 0$, where $H(x) = F(G(x))$. This result is useful in the study of bifurcations of the fixed points of one-dimensional maps.
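The invariance in part (b) can be confirmed symbolically; the SymPy check below is my own sketch of that calculation.

```python
import sympy as sp

x, a, b, c, d = sp.symbols('x a b c d')
v = sp.Function('v')(x)

def schwarzian(w):
    return sp.diff(w, x, 3)/sp.diff(w, x) \
           - sp.Rational(3, 2)*(sp.diff(w, x, 2)/sp.diff(w, x))**2

w = (a*v + b)/(c*v + d)                             # a Moebius transformation of v
print(sp.simplify(schwarzian(w) - schwarzian(v)))   # should print 0
```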
Exercise 2.66
The radius of curvature
The equation of the normal to a curve represented by the function $y = f(x)$, through the point $(\xi, \eta)$, is
\[ y - \eta = -\frac{1}{m(\xi)}(x - \xi), \qquad m(x) = \frac{df}{dx}. \]
(a) Consider the adjacent normal, through the point $(\xi + \delta\xi, \eta + \delta\eta)$, where $\delta\eta = f'(\xi)\delta\xi$, and find the point where this intersects the normal through $(\xi, \eta)$, correct to first order in $\delta\xi$.
(b) If the curve defined by $f(x)$ is a segment of a circle of radius $r$, all normals intersect at its centre, a distance $r$ from $(\xi, \eta)$. The point of intersection found in part (a) will be a distance $r(\xi, \delta\xi)$ from the point $(\xi, \eta)$ and we define the radius of curvature by the limit $\rho(\xi) = \lim_{\delta\xi \to 0} r(\xi, \delta\xi)$. Use this definition to show that
\[ \frac{1}{\rho} = \frac{f''(\xi)}{\left(1 + f'(\xi)^2\right)^{3/2}}. \]
Exercise 2.67
The tangent to a curve C intersects the x- and y-axes at $x = a$ and $y = b$, respectively. If the product $ab = 2\lambda$ is constant as the tangent moves on C, show that the differential equation for C is given by
\[ 2\lambda p = -(px - y)^2, \qquad\text{where } p = \frac{dy}{dx}. \]
Notice that $|\lambda|$ is the area of the triangle formed by the axes and the tangent.
Show that the singular solution of this equation is the hyperbola $xy = \lambda/2$, and show that the general solution is a family of straight lines.
2.7.1 Applications of differential equations
This section of exercises contains a few elementary applications giving rise to simple first-order equations. Part of each of these questions involves deriving a differential equation, so all of these exercises are optional.
Exercise 2.68
The number, $N$, of a particular species of atom that decays in a sufficiently large volume of material decreases at a rate proportional to $N$. The half-life of a substance containing only one species of decaying atom is defined to be the time for $N$ to decrease to $N/2$. The half-life of Carbon-14 is 5600 years; if initially there are $N_0$ Carbon-14 atoms, find an expression for $N(t)$, the number of atoms at time $t \ge 0$.
Exercise 2.69
A moth ball evaporates, losing mass at a rate proportional to its surface area.
Initially it has radius 10 cm and after a month this has become 5 cm. Find its
radius as a function of time and the time at which it vanishes.
Exercise 2.70
A tank contains 1000 L of pure water. At time t = 0 brine containing 1 kg of
salt/L is added at a rate of one litre a minute, with the mixture kept uniform by
constant stirring, and one litre of the mixture is run off every minute, so the total
volume remains constant. When will there be 50 kg of dissolved salt in the tank?
Exercise 2.71
Torricelli's law
Torricelli's law states that water flows out of an open tank through a small hole at the speed it would acquire falling freely from the surface to the hole.
A hemispherical bowl of radius R has a small circular hole, of radius a, drilled in
its bottom. It is initially full of water and at time t = 0 the hole is uncovered.
How long does it take for the bowl to empty?
Exercise 2.72
Water clocks
Water clocks, or clepsydra, meaning 'water thief', are devices for measuring time using the regular rate of flow of water, and were in use from the 15th century BC, in Egypt, to about 100 BC$^{25}$.
A simple version is a vessel from which water escapes through a small hole in the bottom. It was used in Greek and Roman courts to time the speeches of lawyers. Determine the shape necessary for the water level to fall at a constant rate.
$^{25}$Richards E G 1998 Mapping Time, pages 51-57, Oxford University Press.
Exercise 2.73
By winding a rope round a circular post, a rope can be used to restrain large weights with a small force. If $T(\theta)$ and $T(\theta + \delta\theta) = T(\theta) + \delta T$ are the tensions in the rope at angles $\theta$ and $\theta + \delta\theta$, then it can be shown that a normal force of approximately $T\delta\theta$ is exerted by the rope on the post in $(\theta, \theta + \delta\theta)$. If $\mu$ is the coefficient of friction between the rope and the post, then $\delta T \le \mu T\delta\theta$.
Use this to find a differential equation satisfied by $T(\theta)$ and by solving this find $T(\theta)$.
Exercise 2.74
A chain of length $L$ starts with a length $l_0$ hanging over the edge of a horizontal table. It is released from rest at time $t = 0$. Neglecting friction, determine how long it takes to fall off the table.
Exercise 2.75
Lambert's law of absorption
Lambert's law of absorption states that the percentage of incident light absorbed by a thin layer of translucent material is proportional to the thickness of the layer. If sunlight falling vertically on ocean water is reduced to one-half its initial intensity at a depth of 10 feet, find a differential equation for the intensity as a function of the depth and determine the depth at which the intensity is 1/16th of the initial intensity.
Exercise 2.76
An incompressible liquid in a U-tube, as shown in figure 2.5, will oscillate if one side of the liquid is initially higher than the other side. If the liquid is initially a height $h_0$ above the other side, use conservation of energy to show that, if friction can be ignored,
\[ \dot h^2 = \frac{2g}{L}\left(h_0^2 - h^2\right), \]
where $h(t)$ is the difference in height at time $t$, $L$ is the total length of the tube and $g$ is the acceleration due to gravity.
Use this formula to find $h(t)$ and to show that the period of the oscillations is $T = \pi\sqrt{2L/g}$.
[Figure 2.5 Diagram of the U-tube, showing the two liquid levels, $h_1$ and $h_2$, and their difference $h$.]
Exercise 2.77
It can be shown that a body inside the earth is attracted towards the centre by a
force that is directly proportional to the distance from the centre.
If a hole joining any two points on the surface is drilled through the earth and
a particle can move without friction along this tube, show that the period of
oscillation is independent of the end points. The rotation of the earth should be
ignored.
Chapter 3
The Calculus of Variations
3.1 Introduction
In this chapter we consider the particular variational principle defining the shortest distance between two points in a plane. It is well known that this shortest path is the straight line; however, it is almost always easiest to understand a new idea by applying it to a simple, familiar problem, so here we introduce the essential ideas of the Calculus of Variations by finding the equation of this line. The algebra may seem overcomplicated for this simple problem, but the same theory can be applied to far more complicated problems, and we shall see in chapter 4 that the most important equation of the Calculus of Variations, the Euler-Lagrange equation, can be derived with almost no extra effort.
The chapter ends with a description of some of the problems that can be formulated in terms of variational principles, some of which will be solved later in the course.
The approach adopted is intuitive, that is we assume that functionals behave like functions of n real variables. This is exactly the approach used by Euler (1707-1783) and Lagrange (1736-1813) in their original analysis and it can be successfully applied to many important problems. However, it masks a number of problems, all to do with the subtle differences between infinite and finite dimensional spaces, which are not considered in this course.
3.2 The shortest distance between two points in a plane
The distance between two points $P_a = (a, A)$ and $P_b = (b, B)$ in the $Oxy$-plane along a given curve, defined by the function $y(x)$, is given by the functional
\[ S[y] = \int_a^b dx\,\sqrt{1 + y'(x)^2}. \tag{3.1} \]
The curve must pass through the end points, so $y(x)$ satisfies the boundary conditions $y(a) = A$ and $y(b) = B$. We shall usually assume that $y'(x)$ is continuous on $(a, b)$.
We require the equation of the function that makes $S[y]$ stationary, that is, we need to understand how the values of the functional $S[y]$ change as the path between $P_a$ and $P_b$ varies. These ideas are introduced here, and developed in chapter 4, using analogies with the theory of functions of many real variables.
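Before setting up the stationary condition it may help to see the functional in action numerically. The sketch below (my addition) evaluates $S[y]$ by quadrature for the straight line and for a curved path between the same end points; the straight line gives the smaller value.

```python
import numpy as np
from scipy.integrate import quad

a, b = 0.0, 1.0                    # end points P_a = (0, 0), P_b = (1, 1)

def arc_length(dy):
    """S[y] = integral over [a, b] of sqrt(1 + y'(x)^2)."""
    val, _ = quad(lambda x: np.sqrt(1.0 + dy(x)**2), a, b)
    return val

# Straight line y = x, and the curved path y = x + 0.2 sin(pi x)
straight = arc_length(lambda x: 1.0)
curved   = arc_length(lambda x: 1.0 + 0.2*np.pi*np.cos(np.pi*x))

print(straight, curved)            # about 1.414 and a larger value
```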
3.2.1 The stationary distance
In the theory of functions of several real variables a stationary point is one at which the values of the function at all neighbouring points are almost the same as at the stationary point. To be precise, if $G(\mathbf{x})$ is a function of $n$ real variables, $\mathbf{x} = (x_1, x_2, \ldots, x_n)$, we compare the values of $G$ at $\mathbf{x}$ and at the nearby point $\mathbf{x} + \epsilon\boldsymbol{\xi}$, where $|\epsilon| \ll 1$ and $|\boldsymbol{\xi}| = 1$. Taylor's expansion, equation 1.37 (page 36), gives
\[ G(\mathbf{x} + \epsilon\boldsymbol{\xi}) - G(\mathbf{x}) = \epsilon\sum_{k=1}^{n}\frac{\partial G}{\partial x_k}\xi_k + O(\epsilon^2). \tag{3.2} \]
A stationary point is defined to be one for which the term $O(\epsilon)$ is zero for all $\boldsymbol{\xi}$. This gives the familiar conditions for a point to be stationary, namely $\partial G/\partial x_k = 0$ for $k = 1, 2, \ldots, n$.
For a functional we proceed in the same way. That is, we choose adjacent paths joining $P_a$ to $P_b$ and compare the values of $S$ along these paths. If a path is represented by a differentiable function $y(x)$, adjacent paths may be represented by $y(x) + \epsilon h(x)$, where $\epsilon$ is a real variable and $h(x)$ another differentiable function. Since all paths must pass through $P_a$ and $P_b$, we require $y(a) = A$, $y(b) = B$ and $h(a) = h(b) = 0$; otherwise $h(x)$ is arbitrary. The difference
\[ S[y + \epsilon h] - S[y] \]
may be considered as a function of the real variable $\epsilon$, for arbitrary $y(x)$ and $h(x)$ and for small values of $\epsilon$, $|\epsilon| \ll 1$. When $\epsilon = 0$ this difference is zero, and for small $|\epsilon|$ we expect it to be proportional to $\epsilon$; in general this is true, as seen in equation 3.3 below.
However, there may be some paths for which the difference is proportional to $\epsilon^2$, rather than $\epsilon$. These paths are special and we define them to be the stationary paths, curves or stationary functions. Thus a necessary condition for a path $y(x)$ to be a stationary path is that
\[ S[y + \epsilon h] - S[y] = O(\epsilon^2) \]
for all suitable $h(x)$. The equation for the stationary function $y(x)$ is obtained by examining this difference more carefully.
The distances along these adjacent curves are
\[ S[y] = \int_a^b dx\,\sqrt{1 + y'(x)^2} \qquad\text{and}\qquad S[y + \epsilon h] = \int_a^b dx\,\sqrt{1 + \left[y'(x) + \epsilon h'(x)\right]^2}. \]
We proceed by expanding the integrand of $S[y + \epsilon h]$ in powers of $\epsilon$, retaining only the terms proportional to $\epsilon$. One way of making this expansion is to consider the integrand as a function of $\epsilon$ and to use Taylor's series to expand in powers of $\epsilon$,
\[ \sqrt{1 + (y' + \epsilon h')^2} = \sqrt{1 + y'^2} + \epsilon\left[\frac{d}{d\epsilon}\sqrt{1 + (y' + \epsilon h')^2}\,\right]_{\epsilon=0} + O(\epsilon^2) = \sqrt{1 + y'^2} + \epsilon\frac{y'h'}{\sqrt{1 + y'^2}} + O(\epsilon^2). \]
Substituting this expansion into the integral and rearranging gives the difference between the two lengths,
\[ S[y + \epsilon h] - S[y] = \epsilon\int_a^b dx\,\frac{y'(x)}{\sqrt{1 + y'(x)^2}}\,h'(x) + O(\epsilon^2). \tag{3.3} \]
This difference depends upon both $y(x)$ and $h(x)$, just as for functions of $n$ real variables the difference $G(\mathbf{x} + \epsilon\boldsymbol{\xi}) - G(\mathbf{x})$, equation 3.2, depends upon both $\mathbf{x}$ and $\boldsymbol{\xi}$, the equivalents of $y(x)$ and $h(x)$ respectively.
Since $S[y]$ is stationary it follows, by definition, that
\[ \int_a^b dx\,\frac{y'(x)}{\sqrt{1 + y'(x)^2}}\,h'(x) = 0 \tag{3.4} \]
for all suitable functions $h(x)$.
We shall see in chapter 4 that because 3.4 holds for all those functions $h(x)$ for which $h(a) = h(b) = 0$ and $h'(x)$ is continuous, this equation is sufficient to determine $y(x)$ uniquely. Here, however, we simply show that if
\[ \frac{y'(x)}{\sqrt{1 + y'(x)^2}} = \text{constant for all } x, \tag{3.5} \]
then the integral in equation 3.4 is zero for all $h(x)$. Assuming that 3.5 is true, the integral in equation 3.4 reduces to this constant multiplied by
\[ \int_a^b dx\,h'(x) = h(b) - h(a) = 0 \quad\text{since } h(a) = h(b) = 0. \]
In section 4.3 we show that condition 3.5 is necessary as well as sufficient for equation 3.4 to hold.
Equation 3.5 shows that $y'(x) = m$, where $m$ is a constant, and integration gives the general solution
\[ y(x) = mx + c \]
for another constant $c$: this is the equation of a straight line, as expected. The constants $m$ and $c$ are determined by the conditions that the straight line passes through $P_a$ and $P_b$:
\[ y(x) = \frac{B - A}{b - a}\,x + \frac{Ab - Ba}{b - a}. \tag{3.6} \]
This analysis shows that the functional $S[y]$ defined in equation 3.1 is stationary along the straight line joining $P_a$ to $P_b$. We have not shown that this gives a minimum distance: this is proved in exercise 3.2.
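A numerical illustration of the stationarity condition 3.4 (my addition): on the straight line the first-order integral vanishes for any admissible $h$, while on a non-stationary path it does not.

```python
import numpy as np
from scipy.integrate import quad

a, b = 0.0, 1.0
dh = lambda x: np.pi*np.cos(np.pi*x)    # h(x) = sin(pi x), so h(a) = h(b) = 0

def first_order_term(dy):
    """Integral of y'/sqrt(1 + y'^2) * h'(x) over [a, b], as in equation 3.4."""
    integrand = lambda x: dy(x)/np.sqrt(1.0 + dy(x)**2)*dh(x)
    return quad(integrand, a, b)[0]

print(first_order_term(lambda x: 1.0))      # straight line y = x: essentially 0
print(first_order_term(lambda x: 2*x))      # curved path y = x^2: non-zero
```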
Exercise 3.1
Use the above method on the functional
\[ S[y] = \int_0^1 dx\,\sqrt{1 + y'(x)}, \qquad y(0) = 0, \quad y(1) = B > -1, \]
to show that the stationary function is the straight line $y(x) = Bx$, and that the value of the functional on this line is $S[y] = \sqrt{1 + B}$.
3.2.2 The shortest path: local and global minima
In this section we show that the straight line 3.6 gives the minimum distance. For practical reasons this analysis is divided into two stages. First, we show that the straight line is a local minimum of the functional, using an analysis that is generalised to functionals in chapter 8. Second, we show that, amongst the class of differentiable functions, the straight line is actually a global minimum: this analysis makes use of special features of the integrand.
The distinction between local and global extrema is illustrated in figure 3.1. Here we show a function $f(x)$, defined in the interval $a \le x \le b$, having three stationary points B, C and D, two of which are minima, the other being a maximum. It is clear from the figure that at the stationary point D, $f(x)$ takes its smallest value in the interval, so this is the global minimum. The function is largest at A, but this point is not stationary: this is the global maximum. The stationary point at B is a local minimum, because here $f(x)$ is smaller than at any point in the neighbourhood of B; likewise the points C and D are local maxima and minima, respectively. The adjective 'local' is frequently omitted. In some texts local extrema are named relative extrema.
[Figure 3.1 Diagram of a function $f(x)$ on $a \le x \le b$, with labelled points A, B, C, D and E, to illustrate the difference between local and global extrema.]
It is clear from this example that to classify a point as a local extremum requires an examination of the function values only in the neighbourhood of the point, whereas determining whether a point is a global extremum requires examining all values of the function; this type of analysis usually invokes special features of the function.
The local analysis of a stationary point of a function, $G(\mathbf{x})$, of $n$ variables proceeds by making a second order Taylor expansion about a point $\mathbf{x} = \mathbf{a}$,
\[ G(\mathbf{a} + \epsilon\boldsymbol{\xi}) = G(\mathbf{a}) + \epsilon\sum_{k=1}^{n}\frac{\partial G}{\partial x_k}\xi_k + \frac{1}{2}\epsilon^2\sum_{k=1}^{n}\sum_{j=1}^{n}\frac{\partial^2 G}{\partial x_k\,\partial x_j}\xi_k\xi_j + \cdots, \]
where all derivatives are evaluated at $\mathbf{x} = \mathbf{a}$. If $G(\mathbf{x})$ is stationary at $\mathbf{x} = \mathbf{a}$ then all first derivatives are zero. The nature of the stationary point is usually determined by the behaviour of the second order term. For a stationary point to be a local minimum it is necessary for the quadratic term to be strictly positive, that is
\[ \sum_{j=1}^{n}\sum_{k=1}^{n}\frac{\partial^2 G}{\partial x_k\,\partial x_j}\xi_k\xi_j > 0 \quad\text{for all } \boldsymbol{\xi} \text{ with } |\boldsymbol{\xi}| = 1. \]
The stationary point is a local maximum if this quadratic form is strictly negative. For large $n$ it is usually difficult to determine whether these inequalities are satisfied, although there are well defined tests, which are described in chapter 8.
For a functional we proceed in the same way: the nature of a stationary path is usually determined by the second order expansion. If $S[y]$ is stationary then, by definition,
\[ S[y + \epsilon h] - S[y] = \frac{1}{2}\epsilon^2\Delta_2[y, h] + O(\epsilon^3) \]
for some quantity $\Delta_2[y, h]$, depending upon both $y$ and $h$; special cases of this expansion are found in exercises 3.2 and 3.3. Then $S[y]$ is a local minimum if $\Delta_2[y, h] > 0$ for all $h(x)$, and a local maximum if $\Delta_2[y, h] < 0$ for all $h(x)$. Normally it is difficult to establish these inequalities, and the general theory is described in chapter 8. For the functional defined by equation 3.1, however, the proof is straightforward; the following exercise guides you through it.
Exercise 3.2
(a) Use the binomial expansion, exercise 1.32 (page 34), to obtain the following expansion in $\epsilon$,
\[ \sqrt{1 + (z + \epsilon u)^2} = \sqrt{1 + z^2} + \frac{\epsilon z u}{\sqrt{1 + z^2}} + \frac{\epsilon^2 u^2}{2(1 + z^2)^{3/2}} + O(\epsilon^3). \]
(b) Use this result to show that if $y(x)$ is the straight line defined in equation 3.6 and $S[y]$ the functional 3.1, then
\[ S[y + \epsilon h] - S[y] = \frac{\epsilon^2}{2(1 + m^2)^{3/2}}\int_a^b dx\,h'(x)^2 + O(\epsilon^3), \qquad m = \frac{B - A}{b - a}. \]
Deduce that the straight line is a local minimum for the distance between $P_a$ and $P_b$.
Exercise 3.3
In this exercise the functional defined in exercise 3.1 is considered in more detail. By expanding the integrand of $S[y + \epsilon h]$ to second order in $\epsilon$ show that, if $y(x)$ is the stationary path, then
\[ S[y + \epsilon h] = S[y] - \frac{\epsilon^2}{8(1 + B)^{3/2}}\int_0^1 dx\,h'(x)^2, \qquad B > -1. \]
Deduce that the path $y(x) = Bx$, $B > -1$, is a local maximum of this functional.
Now we show that the straight line between the points (0, 0) and (a, A) gives a global
minimum of the functional, not just a local minimum. This analysis relies on a special
property of the integrand that follows from the Cauchy-Schwarz inequality.
Exercise 3.4
Use the Cauchy-Schwarz inequality (page 41) with $\mathbf{a} = (1, z)$ and $\mathbf{b} = (1, z + u)$ to show that
\[ \sqrt{1 + (z + u)^2}\,\sqrt{1 + z^2} \ge 1 + z^2 + zu, \]
with equality only if $u = 0$. Hence show that
\[ \sqrt{1 + (z + u)^2} - \sqrt{1 + z^2} \ge \frac{zu}{\sqrt{1 + z^2}}. \]
The distance between the points $(0, 0)$ and $(a, A)$ along the path $y(x)$ is
\[ S[y] = \int_0^a dx\,\sqrt{1 + y'^2}, \qquad y(0) = 0, \quad y(a) = A. \]
On using the inequality derived in the previous exercise, with $z = y'(x)$ and $u = \epsilon h'(x)$, we see that
\[ S[y + \epsilon h] - S[y] \ge \epsilon\int_0^a dx\,\frac{y'}{\sqrt{1 + y'^2}}\,h'. \]
But on the stationary path $y'$ is a constant and since $h(0) = h(a) = 0$ we have
\[ S[y + \epsilon h] \ge S[y] \quad\text{for all } h(x). \]
This analysis did not assume that $|\epsilon h|$ is small, and since all admissible paths can be expressed in the form $y(x) + \epsilon h(x)$, with $y(x)$ the straight line, we have shown that in the class of differentiable functions the straight line gives the global minimum of the functional.
An observation
Problems involving shortest distances on surfaces other than a plane illustrate other features of variational problems. Thus if we replace the plane by the surface of a sphere then the shortest distance between two points on the surface is the arc length of a great circle joining the two points, that is, the circle created by the intersection of the spherical surface and the plane passing through the two points and the centre of the sphere; this problem is examined in exercise 5.20 (page 168). Now, for most points, there are two stationary paths corresponding to the long and the short arcs of the great circle. However, if the points are at opposite ends of a diameter, there are infinitely many shortest paths. This example shows that solutions to variational problems may be complicated.
In general, the stationary paths between two points on a surface are named geodesics$^1$. For a plane surface the only geodesics are straight lines; for a sphere, most pairs of points are joined by just two geodesics, which are the segments of the great circle through the points. For other surfaces there may be several stationary paths: an example of the consequences of such complications is described next.
3.2.3 Gravitational Lensing
The general theory of relativity, discovered by Einstein (1879-1955), shows that the path taken by light from a source to an observer is along a geodesic on a surface in a four-dimensional space. In this theory gravitational forces are represented by distortions of this surface. The theory therefore predicts that light is bent by gravitational forces, a prediction that was first observed in 1919 by Eddington (1882-1944) in his measurements of the position of stars during a total solar eclipse: these observations provided the first direct confirmation of Einstein's general theory of relativity.
The departure from a straight line path depends upon the mass of the body between the source and observer. If it is sufficiently massive, two images may be seen, as illustrated schematically in figure 3.2.
$^1$In some texts the name geodesic is used only for the shortest path.
[Figure 3.2 Diagram (labelled Earth, Galaxy, Quasar, Quasar Image, Light paths) showing how an intervening galaxy can sufficiently distort a path of light from a bright object, such as a quasar, to provide two stationary paths and hence two images. Many examples of such multiple images, and more complicated but similar optical effects, have now been observed. Usually there are more than two stationary paths.]
3.3 Two generalisations
3.3.1 Functionals depending only upon y'(x)
The functional 3.1 (page 93) depends only upon the derivative of the unknown function. Although this is a special case, it is worth considering in more detail in order to develop the notation we need.
If $F(z)$ is a differentiable function of $z$ then a general functional of the form of 3.1 is
\[ S[y] = \int_a^b dx\,F(y'), \qquad y(a) = A, \quad y(b) = B, \tag{3.7} \]
where $F(y')$ simply means that in $F(z)$ all occurrences of $z$ are replaced by $y'(x)$. Thus for the distance between two points $F(z) = \sqrt{1 + z^2}$, so $F(y') = \sqrt{1 + y'(x)^2}$. Note that the symbols $F(y')$ and $F(y'(x))$ denote the same function.
The difference between the functional evaluated along $y(x)$ and along the adjacent paths $y(x) + \epsilon h(x)$, where $|\epsilon| \ll 1$ and $h(a) = h(b) = 0$, is
\[ S[y + \epsilon h] - S[y] = \int_a^b dx\,\Big[F(y' + \epsilon h') - F(y')\Big]. \tag{3.8} \]
Now we need to express $F(y' + \epsilon h')$ as a series in $\epsilon$; assuming that $F(z)$ is differentiable, Taylor's theorem gives
\[ F(z + \epsilon u) = F(z) + \epsilon u\frac{dF}{dz} + O(\epsilon^2). \]
The expansion of $F(y' + \epsilon h')$ is obtained from this simply by the replacements $z \to y'(x)$ and $u \to h'(x)$, which gives
\[ F(y' + \epsilon h') - F(y') = \epsilon h'(x)\frac{d}{dy'}F(y') + O(\epsilon^2), \tag{3.9} \]
where the notation $dF/dy'$ means
\[ \frac{d}{dy'}F(y') = \left[\frac{dF}{dz}\right]_{z = y'(x)}. \tag{3.10} \]
For instance, if $F(z) = \sqrt{1 + z^2}$ then
\[ \frac{dF}{dz} = \frac{z}{\sqrt{1 + z^2}} \qquad\text{and}\qquad \frac{dF}{dy'} = \frac{y'(x)}{\sqrt{1 + y'(x)^2}}. \]
Exercise 3.5
Find the expressions for $dF/dy'$ when
(a) $F(y') = (1 + y'^2)^{1/4}$, (b) $F(y') = \sin y'$, (c) $F(y') = \exp(y')$.
Substituting the difference 3.9 into equation 3.8 gives
\[ S[y + \epsilon h] - S[y] = \epsilon\int_a^b dx\,h'(x)\frac{d}{dy'}F(y') + O(\epsilon^2). \tag{3.11} \]
The functional $S[y]$ is stationary if the term $O(\epsilon)$ is zero for all suitable functions $h(x)$. As before we give a sufficient condition, deferring the proof that it is also necessary. In this analysis it is important to remember that $F(z)$ is a given function and that $y(x)$ is an unknown function that we need to find. Observe that if
\[ \frac{d}{dy'}F(y') = \text{constant} \tag{3.12} \]
then
\[ S[y + \epsilon h] - S[y] = \epsilon\times\text{constant}\times\big[h(b) - h(a)\big] + O(\epsilon^2) = O(\epsilon^2) \quad\text{since } h(a) = h(b) = 0. \]
In general equation 3.12 is true only if $y'(x)$ is also constant, and hence
\[ y(x) = mx + c \quad\text{and therefore}\quad y(x) = \frac{B - A}{b - a}\,x + \frac{Ab - Ba}{b - a}, \]
the last result following from the boundary conditions $y(a) = A$ and $y(b) = B$.
This is the same solution as given in equation 3.6. Thus, for this class of functional, the stationary function is always a straight line, independent of the form of the integrand, although its nature can sometimes depend upon the boundary conditions; see for instance exercise 3.18 (page 117).
The exceptional example is when $F(z)$ is linear, in which case the value of $S[y]$ depends only upon the end points and not upon the values of $y(x)$ in between, as shown in the following exercise.
Exercise 3.6
If $F(z) = Cz + D$, where $C$ and $D$ are constants, by showing that the value of the functional $S[y] = \int_a^b dx\,F(y')$ is independent of the chosen path, deduce that equation 3.12 does not imply that $y'(x) = \text{constant}$.
What is the effect of making either, or both, of $C$ and $D$ a function of $x$?
3.3.2 Functionals depending upon x and y'(x)
Now consider the slightly more general functional
\[ S[y] = \int_a^b dx\,F(x, y'), \qquad y(a) = A, \quad y(b) = B, \tag{3.13} \]
where the integrand $F(x, y')$ depends explicitly upon the two variables $x$ and $y'$. The difference in the value of the functional along adjacent paths is
\[ S[y + \epsilon h] - S[y] = \int_a^b dx\,\Big[F(x, y' + \epsilon h') - F(x, y')\Big]. \tag{3.14} \]
In this example $F(x, z)$ is a function of two variables and we require the expansion
\[ F(x, z + \epsilon u) = F(x, z) + \epsilon u\frac{\partial F}{\partial z} + O(\epsilon^2), \]
where Taylor's series for functions of two variables is used. Comparing this with the expression in equation 3.9 we see that the only difference is that the derivative with respect to $y'$ has been replaced by a partial derivative. As before, replacing $z$ by $y'(x)$ and $u$ by $h'(x)$, equation 3.14 becomes
\[ S[y + \epsilon h] - S[y] = \epsilon\int_a^b dx\,h'(x)\frac{\partial}{\partial y'}F(x, y') + O(\epsilon^2). \tag{3.15} \]
If $y(x)$ is the stationary path it is necessary that
\[ \int_a^b dx\,h'(x)\frac{\partial}{\partial y'}F(x, y') = 0 \quad\text{for all } h(x). \]
As before, a sufficient condition for this is that $F_{y'}(x, y') = \text{constant}$, which gives the following differential equation for $y(x)$,
\[ \frac{\partial}{\partial y'}F(x, y') = c, \qquad y(a) = A, \quad y(b) = B, \tag{3.16} \]
where $c$ is a constant. This is the equivalent of equation 3.12, but now the explicit presence of $x$ in the equation means that $y'(x) = \text{constant}$ is not a solution.
Exercise 3.7
Consider the functional
\[ S[y] = \int_0^1 dx\,\sqrt{1 + x + y'^2}, \qquad y(0) = A, \quad y(1) = B. \]
Show that the function $y(x)$ defined by the relation
\[ y'(x) = c\sqrt{1 + x + y'(x)^2}, \]
where $c$ is a constant, makes $S[y]$ stationary. By expressing $y'(x)$ in terms of $x$, solve this equation to show that
\[ y(x) = A + \frac{B - A}{2^{3/2} - 1}\Big[(1 + x)^{3/2} - 1\Big]. \]
3.4 Notation
In the previous sections we used the notation $F(y')$ to denote a function of the derivative of $y(x)$, and proceeded to treat $y'$ as an independent variable, so that the expression $dF/dy'$ had the meaning defined in equation 3.10. This notation and its generalisation are very important in the subsequent analysis; it is therefore essential that you are familiar with it and can use it.
Consider a function $F(x, u, v)$ of three variables, for instance $F = x\sqrt{u^2 + v^2}$, and assume that all necessary partial derivatives of $F(x, u, v)$ exist. If $y(x)$ is a function of $x$ we may form a function of $x$ with the substitutions $u \to y(x)$, $v \to y'(x)$; thus
\[ F(x, u, v) \quad\text{becomes}\quad F(x, y, y'). \]
Depending upon circumstances $F(x, y, y')$ can be considered either as a function of a single variable $x$, as when evaluating the integral $\int_a^b dx\,F(x, y(x), y'(x))$, or as a function of three independent variables $(x, y, y')$. In the latter case the first partial derivatives with respect to $y$ and $y'$ are just
\[ \frac{\partial F}{\partial y} = \left[\frac{\partial F}{\partial u}\right]_{u=y,\,v=y'} \qquad\text{and}\qquad \frac{\partial F}{\partial y'} = \left[\frac{\partial F}{\partial v}\right]_{u=y,\,v=y'}. \]
Because $y$ depends upon $x$ we may also form the total derivative of $F(x, y, y')$ with respect to $x$ using the chain rule, equation 1.22 (page 27),
\[ \frac{dF}{dx} = \frac{\partial F}{\partial x} + \frac{\partial F}{\partial y}\,y'(x) + \frac{\partial F}{\partial y'}\,y''(x). \tag{3.17} \]
In the particular case $F(x, u, v) = x\sqrt{u^2 + v^2}$ these rules give
\[ \frac{\partial F}{\partial x} = \sqrt{y^2 + y'^2}, \qquad \frac{\partial F}{\partial y} = \frac{xy}{\sqrt{y^2 + y'^2}}, \qquad \frac{\partial F}{\partial y'} = \frac{xy'}{\sqrt{y^2 + y'^2}}. \]
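These rules are straightforward to reproduce symbolically; the SymPy fragment below (my addition) checks the partial derivatives and the total derivative 3.17 for $F = x\sqrt{u^2 + v^2}$.

```python
import sympy as sp

x, u, v = sp.symbols('x u v')
y = sp.Function('y')(x)

F = x*sp.sqrt(u**2 + v**2)
subs = {u: y, v: y.diff(x)}

# Partial derivatives with respect to u and v, then substitute u = y, v = y'
Fy  = sp.diff(F, u).subs(subs)
Fyp = sp.diff(F, v).subs(subs)
print(sp.simplify(Fy), sp.simplify(Fyp))

# Total derivative: d/dx of F(x, y(x), y'(x)) equals F_x + F_y y' + F_y' y''
lhs = sp.diff(F.subs(subs), x)
rhs = sp.diff(F, x).subs(subs) + Fy*y.diff(x) + Fyp*y.diff(x, 2)
print(sp.simplify(lhs - rhs))     # should print 0
```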
Similarly, the second order derivatives are
\[ \frac{\partial^2 F}{\partial y^2} = \left[\frac{\partial^2 F}{\partial u^2}\right]_{u=y,\,v=y'}, \qquad \frac{\partial^2 F}{\partial y'^2} = \left[\frac{\partial^2 F}{\partial v^2}\right]_{u=y,\,v=y'} \qquad\text{and}\qquad \frac{\partial^2 F}{\partial y\,\partial y'} = \left[\frac{\partial^2 F}{\partial u\,\partial v}\right]_{u=y,\,v=y'}. \]
Because you must be able to use this notation we suggest that you do all the following
exercises before proceeding.
Exercise 3.8
If $F(x, y') = \sqrt{x^2 + y'^2}$ find
\[ \frac{\partial F}{\partial x}, \quad \frac{\partial F}{\partial y}, \quad \frac{\partial F}{\partial y'}, \quad \frac{dF}{dx} \quad\text{and}\quad \frac{d}{dx}\!\left(\frac{\partial F}{\partial y'}\right). \]
Also, show that
\[ \frac{d}{dx}\!\left(\frac{\partial F}{\partial y'}\right) = \frac{\partial}{\partial y'}\!\left(\frac{dF}{dx}\right). \]
Exercise 3.9
Show that for an arbitrary differentiable function $F(x, y, y')$,
\[ \frac{d}{dx}\!\left(\frac{\partial F}{\partial y'}\right) = \frac{\partial^2 F}{\partial y'^2}\,y'' + \frac{\partial^2 F}{\partial y\,\partial y'}\,y' + \frac{\partial^2 F}{\partial x\,\partial y'}. \]
Hence show that
\[ \frac{d}{dx}\!\left(\frac{\partial F}{\partial y'}\right) \ne \frac{\partial}{\partial y'}\!\left(\frac{dF}{dx}\right), \]
with equality only if $F$ does not depend explicitly upon $y$.
Exercise 3.10
Use the first identity found in exercise 3.9 to show that the equation
\[ \frac{d}{dx}\!\left(\frac{\partial F}{\partial y'}\right) - \frac{\partial F}{\partial y} = 0 \]
is equivalent to the second-order differential equation
\[ \frac{\partial^2 F}{\partial y'^2}\,y'' + \frac{\partial^2 F}{\partial y\,\partial y'}\,y' + \frac{\partial^2 F}{\partial x\,\partial y'} - \frac{\partial F}{\partial y} = 0. \]
Note that the first equation will later be seen to be crucial to the general theory described in chapter 4. The fact that it is a second-order differential equation means that unique solutions can be obtained only if two initial or two boundary conditions are given. Note also that the coefficient of $y''(x)$, namely $\partial^2 F/\partial y'^2$, is very important in the general theory of the existence of solutions of this type of equation.
Exercise 3.11
(a) If $F(y, y') = y\sqrt{1 + y'^2}$ find
\[ \frac{\partial F}{\partial y}, \quad \frac{\partial F}{\partial y'}, \quad \frac{\partial^2 F}{\partial y'^2}, \]
and show that the equation
\[ \frac{d}{dx}\!\left(\frac{\partial F}{\partial y'}\right) - \frac{\partial F}{\partial y} = 0 \quad\text{becomes}\quad y\frac{d^2y}{dx^2} - 1 - \left(\frac{dy}{dx}\right)^2 = 0, \]
and also that
\[ \frac{d}{dx}\!\left(\frac{\partial F}{\partial y'}\right) - \frac{\partial F}{\partial y} = \left(1 + y'^2\right)^{-3/2}\left[y^2\frac{d}{dx}\!\left(\frac{y'}{y}\right) - 1\right]. \]
(b) By solving the equation $y^2(y'/y)' = 1$ show that a non-zero solution of
\[ y\frac{d^2y}{dx^2} - 1 - \left(\frac{dy}{dx}\right)^2 = 0 \quad\text{is}\quad y = \frac{1}{A}\cosh(Ax + B), \]
for some constants $A$ and $B$. Hint: let $y$ be the independent variable and define a new variable $z$ by the equation $yz(y) = dy/dx$ to obtain an expression for $dy/dx$ that can be integrated.
3.5 Examples of functionals
In this section we describe a variety of problems that can be formulated in terms of functionals, with solutions that are stationary paths of these functionals. This list is provided because it is likely that you will not be familiar with these descriptions and will be unaware of the wide variety of problems for which variational principles are useful, and sometimes essential. You should not spend long on this section if time is short; in this case you should aim at obtaining a rough overview of the examples. Indeed, you may move directly to chapter 4 and return to this section at a later date, if necessary.
In each of the following sub-sections a different problem is described and the relevant functional is written down; some of these are derived later. In compiling this list one aim has been to describe a reasonably wide range of applications: if you are unfamiliar with the underlying physical ideas behind any of these examples, do not worry, because they are not an assessed part of the course. Another aim is to show that there are subtly different types of variational problems, for instance the isoperimetric and the catenary problems, described in sections 3.5.5 and 3.5.6 respectively.
3.5.1 The brachistochrone
Given two points $P_a = (a, A)$ and $P_b = (b, B)$ in the same vertical plane, as in the diagram below, we require the shape of the smooth wire joining $P_a$ to $P_b$ such that a bead sliding on the wire under gravity, with no friction, and starting at $P_a$ with a given speed, shall reach $P_b$ in the shortest possible time.
[Figure 3.3 The curved line joining $P_a$ to $P_b$ is a segment of a cycloid. In this diagram the axes are chosen to give $a = A = 0$.]
The name given to this curve is the brachistochrone, from the Greek: brachistos, shortest, and chronos, time.
If the y-axis is vertical it can be shown that the time taken along the curve $y(x)$ is
\[ T[y] = \int_a^b dx\,\sqrt{\frac{1 + y'^2}{C - 2gy}}, \qquad y(a) = A, \quad y(b) = B, \]
where $g$ is the acceleration due to gravity and $C$ is a constant depending upon the initial speed of the particle. This expression is derived in section 5.2.
This problem was first considered by Galileo (1564-1642) in his 1638 work Two New Sciences, but lacking the necessary mathematical methods he concluded, erroneously, that the solution is the arc of a circle passing vertically through $P_a$; exercise 5.4 (page 150) gives part of the reason for this error.
It was John Bernoulli (1667-1748), however, who made the problem famous when in June 1696 he challenged the mathematical world to solve it. He followed his statement of the problem by a paragraph reassuring readers that the problem was very useful in mechanics, that it is not the straight line through $P_a$ and $P_b$, and that the curve is well known to geometers. He also stated that he would show that this is so at the end of the year, provided no one else had.
In December 1696 Bernoulli extended the time limit to Easter 1697, though by this time he was in possession of Leibniz's solution, sent in a letter dated 16th June 1696, Leibniz having received notification of the problem on 9th June. Newton also solved the problem quickly: apparently$^2$ the letter from Bernoulli arrived at Newton's house, in London, on 29th January 1697, at the time when Newton was Warden of the Mint. He returned from the Mint at 4pm, set to work on the problem and had solved it by the early hours of the next morning. The solution was returned anonymously, to no avail, with Bernoulli stating upon receipt 'The lion is recognised by his paw'. Further details of this history and of these solutions may be found in Goldstine (1980, chapter 1).
The curve giving this shortest time is a segment of a cycloid, which is the curve traced out by a point fixed on the circumference of a vertical circle rolling, without slipping, along a straight line. The parametric equations of the cycloid shown in figure 3.3 are
\[ x = a(\theta - \sin\theta), \qquad y = a(1 - \cos\theta), \]
where $a$ is the radius of the circle: these equations are derived in section 5.2.1, where other properties of the cycloid are discussed.
A tautochrone is a curve such that a particle travelling along it under gravity reaches
a xed point in a time independent of its starting point; a cycloid is a tautochrone
and a brachistochrone. Isochronal means equal times so isochronous curves and
tautochrones are the same.
There are many variations of the brachistochrone problem. Euler$^3$ considered the effect of resistance proportional to $v^{2n}$, where $v$ is the speed and $n$ an integer. The problem of a wire with friction, however, was not considered until 1975$^4$. Both these extensions require the use of Lagrange multipliers and are described in chapter 11. Another variation was introduced by Lagrange$^5$, who allowed the end point, $P_b$ in figure 3.3, to lie on a given surface; this introduces different boundary conditions that the cycloid needs to satisfy. The simpler variant in which the motion remains in the plane and one or both end points lie on given curves is treated in chapter 10.
$^2$This anecdote is from the records of Catherine Conduitt, née Barton, Newton's niece, who acted as his housekeeper in London; see Newton's Apple by P Aughton (Weidenfeld and Nicolson), page 201.
$^3$Chapter 3 of his 1744 opus, The Method of Finding Plane Curves that Show Some Property of Maximum or Minimum....
$^4$Ashby N, Brittin W E, Love W F and Wyss W, Brachistochrone with Coulomb Friction, Amer J Physics 43 902-5.
$^5$Essay on a new method..., published in Vol II of the Miscellanea Taurinensia, the memoirs of the Turin Academy.
3.5.2 Minimal surface of revolution
Here the problem is to find a curve $y(x)$ passing through two given points $P_a = (a, A)$ and $P_b = (b, B)$, with $A > 0$ and $B > 0$, as shown in the diagram, such that when rotated about the x-axis the area of the curved surface formed is a minimum.
[Figure 3.4 Diagram showing the cylindrical shape produced when a curve $y(x)$, joining $(a, A)$ to $(b, B)$, is rotated about the x-axis.]
The area of this surface is shown in section 5.3 to be
\[ S[y] = 2\pi\int_a^b dx\,y(x)\sqrt{1 + y'^2}, \]
and we shall see that this problem has solutions that can be expressed in terms of differentiable functions only for certain combinations of $A$, $B$ and $b - a$.
3.5.3 The minimum resistance problem
Newton formulated one of the first problems to involve the ideas of the Calculus of Variations. Newton's problem is to determine the shape of a solid of revolution with the least resistance to its motion along its axis through a stationary fluid.
Newton was interested in the problem of fluid resistance and performed many experiments aimed at determining its dependence on various parameters, such as the velocity through the fluid. These experiments were described in Book II of Principia (1687)$^6$; an account of Newton's ideas is given by Smith (2000)$^7$. It is to Newton that we owe the idea of the drag coefficient, $C_D$, a dimensionless number allowing the force on a body moving through a fluid to be written in the form
\[ F_R = \frac{1}{2}C_D A_f\,\rho v^2, \tag{3.18} \]
where $A_f$ is the frontal area of the body, $\rho$ the fluid density$^8$ and $v = |\mathbf{v}|$, where $\mathbf{v}$ is the relative velocity of the body and the fluid. For modern cars $C_D$ has values between about 0.30 and 0.45, with frontal areas of about 30 ft$^2$ (about 2.8 m$^2$).
$^6$The full title is Philosophiae Naturalis Principia Mathematica (Mathematical Principles of Natural Philosophy).
$^7$Smith G E, Fluid Resistance: Why Did Newton Change His Mind?, in The Foundations of Newtonian Scholarship.
$^8$Note that this suggests that the 30$^\circ$C change in temperature between summer and winter changes $F_R$ by roughly 10%. The density of dry air is about 1.29 kg m$^{-3}$.
Newton distinguished two types of forces:
a) those imposed on the front of the body, which oppose the motion, and
b) those at the back of the body, resulting from the disturbance of the fluid, which may be in either direction.
He also considered two types of fluid:
a) rarefied fluids, comprising non-interacting particles spread out in space, such as a gas, and
b) continuous fluids, comprising particles packed together so that each is in contact with its neighbours, such as a liquid.
The ideas sketched below are most relevant to rarefied fluids and ignore the second type of force. They were used by Newton in 1687 to derive a functional, equation 3.21 below, for which the stationary path yields, in theory, a surface of minimum resistance. This solution does not, however, agree with observation, largely because the physical assumptions made are too simple. Moreover, the functional has no continuously differentiable paths that can satisfy the boundary conditions, although stationary paths with one discontinuity in the derivative exist; but Weierstrass (1815-1897) showed that this path does not yield a strong minimum. These details are discussed further in section 10.6. Nevertheless, the general problem is important and Newton's approach, and the subsequent variants, are of historical and mathematical importance: we shall mention a few of these variants after describing the basic problem.
It is worth noting that the problem of fluid resistance is difficult and was not properly understood until the early part of the 20th century. In 1752 d'Alembert (1717-1783) published a paper, Essay on a New Theory of the Resistance of Fluids, in which he derived the partial differential equations describing the motion of an ideal, incompressible, inviscid fluid; the solution of these equations showed that the resisting force was zero, regardless of the shape of the body: this was in contradiction to observations and was henceforth known as d'Alembert's paradox. It was not resolved until Prandtl (1875-1953) developed the theory of boundary layers in 1904. This shows how fluids of relatively small viscosity, such as water or air, may be treated mathematically by taking account of friction only in the region where it is essential, namely in the thin layer that exists in the neighbourhood of the solid body. This concept was introduced in 1904, but many decades passed before its ramifications were understood: an account of these ideas can be found in Schlichting (1955)$^9$, and a modern account of d'Alembert's paradox can be found in Landau and Lifshitz (1959)$^{10}$. An effect of the boundary layer, and also of turbulence, is that the drag coefficient, defined in equation 3.18, becomes speed dependent; thus for a smooth sphere in air it varies between 0.07 and 0.5, approximately.
$^9$Schlichting H, Boundary Layer Theory (McGraw-Hill, New York).
$^{10}$Landau L D and Lifshitz E M, Fluid Mechanics (Pergamon).
We now return to the main problem, which is to determine a functional for the fluid resistance. In deriving this it is necessary to make some assumptions about the resistance and this, it transpires, is why the stationary path is not a minimum. The main result is given by equation 3.21, and you may ignore the derivation if you wish.
It is assumed that the resistance is proportional to the square of the velocity. To see why, consider a small plane area moving through a fluid comprising many isolated stationary particles, with density $\rho$: the area of the plane is $\delta A$ and it is moving with velocity $\mathbf{v}$ along its normal, as seen in the left-hand side of figure 3.5.
In order to derive a simple formula for the force on the area $\delta A$ it is helpful to imagine the fluid as comprising many particles, each of mass $m$ and all stationary. If there are $N$ particles per unit volume, the density is $\rho = mN$. In the small time $\delta t$ the area $\delta A$ sweeps through a volume $v\,\delta t\,\delta A$, so $Nv\,\delta t\,\delta A$ particles collide with the area, as shown schematically on the left-hand side of figure 3.5.
[Figure 3.5 Diagram showing the motion of a small area, $\delta A$, through a rarefied gas. On the left-hand side the area moves along its normal; on the right-hand side the normal is at an angle to the relative velocity. The arrows point in the direction of the gas velocity relative to the area.]
For an elastic collision between a very large mass (that of which $\delta A$ is a small surface element) with velocity $\mathbf{v}$, and a small, initially stationary mass $m$, the momentum change of the light particle is $2mv$; you may check this by doing exercise 3.23, although this is not part of the course. Thus in a time $\delta t$ the total momentum transfer, in the direction opposite to $\mathbf{v}$, is $\delta P = (2mv)\times(Nv\,\delta t\,\delta A)$. Newton's law equates force with the rate of change of momentum, so the force on the area opposing the motion is, since $\rho = mN$,
\[ \delta F = \frac{\delta P}{\delta t} = 2\rho v^2\,\delta A. \tag{3.19} \]
Equation 3.19 is a justification for the $v^2$-law. If the normal, ON, to the area $\delta A$ is at an angle $\theta$ to the velocity, as in the right-hand side of figure 3.5, where the arrows denote the fluid velocity relative to the body, then the formula 3.19 is modified in two ways. First, the significant area is the projection of $\delta A$ onto $\mathbf{v}$, so $\delta A \to \delta A\cos\theta$. Second, the fluid particles are elastically scattered through an angle $\pi - 2\theta$ (because the angle of incidence equals the angle of reflection), so the momentum transfer along the direction of travel is $v(1 + \cos 2\theta) = 2v\cos^2\theta$: hence $2v \to 2v\cos^2\theta$, and the force in the direction $(-\mathbf{v})$ is $\delta F = 2\rho v^2\cos^3\theta\,\delta A$. We now apply this formula to find the force on a surface of revolution. We define $Oy$ to be the axis: consider a segment CD of the curve in the $Oxy$-plane, with normal PN at an angle $\theta$ to $Oy$, as shown in the left-hand panel of figure 3.6.
[Figure 3.6 Diagram showing, on the left, the change in velocity of a particle colliding with the element CD of the curve and, on the right, the whole curve, which is rotated about the y-axis.]
The force on the ring formed by rotating the segment CD about $Oy$ is, because of axial symmetry, in the y-direction. The area of the ring is $2\pi x\,\delta s$, where $\delta s$ is the length of the element CD, so the magnitude of the force opposing the motion is
\[ \delta F = 2\pi x\,\delta s\left(2\rho v^2\cos^3\theta\right). \]
The total force on the curve in figure 3.6 is obtained by integrating from $x = 0$ to $x = b$, and is given by the functional
\[ F[y] = 4\pi\rho v^2\int_{x=0}^{x=b} ds\,x\cos^3\theta, \qquad y(0) = A, \quad y(b) = 0. \tag{3.20} \]
But $dy/dx = \tan\theta$ and $\cos\theta = dx/ds$, so that
\[ \frac{F[y]}{4\pi\rho v^2} = \int_0^b dx\,\frac{x}{1 + y'^2}, \qquad y(0) = A, \quad y(b) = 0. \tag{3.21} \]
For a disc of area $A_f$, $y'(x) = 0$, and this reduces to $F = 2\rho A_f v^2$, giving a drag coefficient $C_D = 4$, which compares with the measured value of about 1.3. Newton's problem is to find the path making this functional a minimum, and this is solved in section 10.6.
Exercise 3.12
Use the definition of the drag coefficient, equation 3.18, to show that, according to the theory described here,
\[ C_D = \frac{8}{b^2}\int_0^b dx\,\frac{x}{1 + y'^2}. \]
Show that for a sphere, where $x^2 + y^2 = b^2$, this gives $C_D = 2$. The experimental value of the drag coefficient for the motion of a sphere in air varies between 0.07 and 0.5, depending on its speed.
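The integral in this expression is simple enough to evaluate numerically; the sketch below (my addition) reproduces $C_D = 4$ for the flat disc and $C_D = 2$ for the hemisphere $y = \sqrt{b^2 - x^2}$.

```python
import numpy as np
from scipy.integrate import quad

b = 1.0

def drag_coefficient(dy):
    """C_D = (8/b^2) * integral of x/(1 + y'(x)^2) over [0, b]."""
    val, _ = quad(lambda x: x/(1.0 + dy(x)**2), 0.0, b)
    return 8.0*val/b**2

print(drag_coefficient(lambda x: 0.0))                        # disc: 4.0
print(drag_coefficient(lambda x: -x/np.sqrt(b**2 - x**2)))    # hemisphere: 2.0
```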
Variations of this problem were considered by Newton: one is the curve CBD, shown in figure 3.7, rotated about $Oy$.
[Figure 3.7 Diagram showing the modified geometry considered by Newton. Here the variable $a$ is an unknown, the line CB is parallel to the x-axis and the coordinates of C are $(0, A)$.]
In this problem the position D is fixed, but the position of B is not; it is merely constrained to be on the line $y = A$, parallel to $Ox$. The resisting force is now given by the functional
\[ \frac{F_1[y]}{4\pi\rho v^2} = \frac{1}{2}a^2 + \int_a^b dx\,\frac{x}{1 + y'^2}, \qquad y(a) = A, \quad y(b) = 0. \tag{3.22} \]
Now the path $y(x)$ and the number $a$ are to be chosen to make the functional stationary.
Problems such as this, where the position of one (or both) of the end points is also to be determined, are known as variable end point problems and are dealt with in chapter 10.
3.5.4 A problem in navigation
Given a river with straight, parallel banks a distance $b$ apart, and a boat that can travel with constant speed $c$ in still water, the problem is to cross the river in the shortest time, starting and landing at given points.
If the y-axis is chosen to be the left bank, the starting point to be the origin, O, and the water is assumed to be moving parallel to the banks with speed $v(x)$, a known function of the distance from the left-hand bank, then the time of passage along the path $y(x)$ is, assuming $c > \max(v(x))$,
\[ T[y] = \int_0^b dx\,\frac{\sqrt{c^2(1 + y'^2) - v(x)^2} - v(x)y'}{c^2 - v(x)^2}, \qquad y(0) = 0, \quad y(b) = B, \]
where the final destination is a distance $B$ along the right-hand bank. The derivation of this result is set in exercise 3.22, one of the harder exercises at the end of this chapter.
A variation of this problem is obtained by not defining the terminus, so there is only one boundary condition, $y(0) = 0$, and then we need to find both the path $y(x)$ and the terminal point. It transpires that this is an easier problem and that the path is the solution of $y'(x) = v(x)/c$, as is shown in exercise 10.7 (page 262).
3.5.5 The isoperimetric problem
Among all curves, $y(x)$, represented by functions with continuous derivatives, that join the two points $P_a$ and $P_b$ in the plane and have given length $L[y]$, determine that which encompasses the largest area, $S[y]$, shown in figure 3.8.
[Figure 3.8 Diagram showing the area, $S[y]$, under a curve of given length $L[y]$ joining $P_a$ to $P_b$.]
This is a classic problem discussed by Pappus of Alexandria in about 300 AD. Pappus showed, in Book V of his collection, that of two regular polygons having equal perimeters, the one with the greater number of sides has the greater area. In the same book he demonstrates that for a given perimeter the circle has a greater area than does any regular polygon. This work seems to follow closely the earlier work of Zenodorus (circa 180 BC): extant fragments of his work include a proposition that of all solid figures whose surface areas are equal, the sphere has the greatest volume.
Returning to figure 3.8, a modern analytic treatment of the problem requires a differentiable function $y(x)$, satisfying $y(a) = A$, $y(b) = B$, such that the area
\[ S[y] = \int_a^b dx\,y \]
is largest when the length of the curve,
\[ L[y] = \int_a^b dx\,\sqrt{1 + y'^2}, \]
is given. It transpires that a circular arc is the solution.
This problem differs from the first three because an additional constraint, the length of the curve, is imposed. We consider this type of problem in chapter 12.
3.5.6 The catenary
A catenary is the shape assumed by an inextensible cable, or chain, of uniform density hanging between supports at both ends. In figure 3.9 we show an example of such a curve when the points of support, (a, A) and (−a, A), are at the same height.
Figure 3.9 Diagram showing the catenary formed by a uniform chain hanging between two points at the same height.
If the lowest point of the chain is taken as the origin, the catenary equation is shown in section 12.2.3 to be
\[ y = c\left(\cosh\left(\frac{x}{c}\right) - 1\right) \tag{3.23} \]
for some constant c determined by the length of the chain and the value of a.
If a curve is described by a differentiable function y(x) it can be shown, see exercise 3.19, that the potential energy E of the chain is proportional to the functional
\[ S[y] = \int_{-a}^{a} dx\, y\sqrt{1+y'^2}. \]
The curve that minimises this functional, subject to the length of the chain, \(L[y] = \int_{-a}^{a} dx\,\sqrt{1+y'^2}\), remaining constant, is the shape assumed by the hanging chain. In common with the previous example, the catenary problem involves a constraint, again the length of the chain, and is dealt with using the methods described in chapter 12.
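The role of the constant c can be made concrete numerically: for a given half-span a and chain length L, the arc length of the catenary between −a and a is 2c sinh(a/c), which fixes c. The following Python fragment is an illustrative sketch only; the values of a and L are invented.

    import numpy as np
    from scipy.integrate import trapezoid
    from scipy.optimize import brentq

    # Minimal sketch: determine c in y = c*(cosh(x/c) - 1) from the chain length
    # and the half-span a, then evaluate the length and energy functionals.
    a, L = 1.0, 2.5                      # half-span and chain length (illustrative, L > 2a)

    c = brentq(lambda c: 2*c*np.sinh(a/c) - L, 0.1, 100.0)

    x = np.linspace(-a, a, 2001)
    y = c*(np.cosh(x/c) - 1)
    yp = np.sinh(x/c)
    print("c =", c)
    print("length check:", trapezoid(np.sqrt(1 + yp**2), x))    # close to L
    print("S[y]        :", trapezoid(y*np.sqrt(1 + yp**2), x))  # proportional to the potential energy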
3.5.7 Fermat's principle
Light and other forms of electromagnetic radiation are wave phenomena. However, in many common circumstances light may be considered to travel along lines joining the source to the observer: these lines are named rays and are often straight lines. This is why most shadows have distinct edges and why eclipses of the Sun are so spectacular. In a vacuum, and normally in air, these rays are straight lines and the speed of light in a vacuum is c ≈ 2.9 × 10¹⁰ cm/sec, independent of its colour. In other uniform media, for example water, the rays also travel in straight lines, but the speed is different: if the speed of light in a uniform medium is c_m then the refractive index is defined to be the ratio n = c/c_m. The refractive index usually depends on the wave length: thus for water it is 1.333 for red light (wave length 6.50 × 10⁻⁵ cm) and 1.343 for blue light (wave length 7.5 × 10⁻⁵ cm); this difference in the refractive index is one cause of rainbows. In non-uniform media, in which the refractive index depends upon position, light rays follow curved paths. Mirages are one consequence of a position-dependent refractive index.
A simple example of the ray description of light is the reflection of light in a plane mirror. In diagram 3.10 the source is S and the light ray is reflected from the mirror at R to the observer at O. The plane of the mirror is perpendicular to the page and it is assumed that the plane SRO is in the page.
Figure 3.10 Diagram showing light travelling from a source S to an observer O, via a reflection at R. The angles of incidence and of reflection are defined to be θ₁ and θ₂, respectively.
It is known that light travels in straight lines and is reflected from the mirror at a point R as shown in the diagram. But without further information the position of R is unknown. Observations, however, show that the angle of incidence, θ₁, and the angle of reflection, θ₂, are equal. This law of reflection was known to Euclid (circa 300 BC) and Aristotle (384–322 BC); but it was Hero of Alexandria (circa 125 BC) who showed by geometric argument that the equality of the angles of incidence and reflection is a consequence of the Aristotelean principle that nature does nothing the hard way; that is, if light is to travel from the source S to the observer O via a reflection in the mirror then it travels along the shortest path.
This result was generalised by the French mathematician Fermat (1601–1665) into what is now known as Fermat's principle, which states that the path taken by light rays is that which minimises the time of passage¹¹. For the mirror, because the speed along SR and RO is the same this means that the distance along SR plus RO is a minimum. If AB = d and AR = x, the total distance travelled by the light ray depends only upon x and is
\[ f(x) = \sqrt{x^2 + h_1^2} + \sqrt{(d-x)^2 + h_2^2}. \]
This function has a minimum when θ₁ = θ₂, that is when the angle of incidence, θ₁, equals the angle of reflection, θ₂, see exercise 3.14.
In general, for light moving in the Oxy-plane, in a medium with refractive index n(x, y), with the source at the origin and observer at (a, A), the time of passage, T, along an arbitrary path y(x) joining these points is
\[ T[y] = \frac{1}{c}\int_0^a dx\, n(x, y)\sqrt{1+y'^2}, \qquad y(0) = 0, \quad y(a) = A. \]
This follows because the time taken to travel along an element of length δs is n(x, y)δs/c and δs = √(1 + y′(x)²) δx. If the refractive index, n(x, y), is constant then this integral reduces to the integral 3.1 and the path of a ray is a straight line, as would be expected.
¹¹ Fermat's original statement was that light travelling between two points seeks a path such that the number of waves is equal, as a first approximation, to that in a neighbouring path. This formulation has the form of a variational principle, which is remarkable because Fermat announced this result in 1658, before the calculus of either Newton or Leibniz was developed.
Fermat's principle can be used to show that for light reflected at a mirror the angle of incidence equals the angle of reflection. For light crossing the boundary between two media it gives Snell's law,
\[ \frac{\sin\theta_1}{\sin\theta_2} = \frac{c_1}{c_2}, \]
where θ₁ and θ₂ are the angles between the ray and the normal to the boundary and c_k is the speed of light in the media, as shown in figure 3.11: in water the speed of light is approximately c₂ = c₁/1.3, where c₁ is the speed of light in air, so 1.3 sin θ₂ = sin θ₁.
Figure 3.11 Diagram showing the refraction of light at the surface of water. The angles of incidence and refraction are defined to be θ₂ and θ₁ respectively; these are connected by Snell's law.
In figure 3.11 the observer at O sees an object S in a pond and the light ray from S to O travels along the two straight lines SN and NO, but the observer perceives the object to be at S′, on the straight line OS′. This explains why a stick put partly into water appears bent.
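Snell's law can also be recovered numerically from Fermat's principle by minimising the travel time of a two-segment path across the boundary. The Python sketch below is illustrative only; the positions of O and S and the speeds are invented, with c₂ = c₁/1.3 as in the text.

    import numpy as np
    from scipy.optimize import minimize_scalar

    # Minimal sketch: light travels from a point in air to a point in water,
    # crossing the boundary y = 0 at (x, 0).  Minimising the travel time over x
    # should reproduce Snell's law sin(theta1)/sin(theta2) = c1/c2.
    c1, c2 = 1.0, 1.0/1.3          # speeds in air and water
    O = (0.0, 1.0)                 # observer in air, above the surface (illustrative)
    S = (1.0, -1.0)                # object in water, below the surface (illustrative)

    def travel_time(x):
        t1 = np.hypot(x - O[0], O[1]) / c1     # air segment
        t2 = np.hypot(S[0] - x, S[1]) / c2     # water segment
        return t1 + t2

    x = minimize_scalar(travel_time, bounds=(O[0], S[0]), method='bounded').x
    sin1 = (x - O[0]) / np.hypot(x - O[0], O[1])    # sine of the angle from the normal, air side
    sin2 = (S[0] - x) / np.hypot(S[0] - x, S[1])    # sine of the angle from the normal, water side
    print(sin1/sin2, c1/c2)                         # the two ratios should agree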
3.5.8 Coordinate free formulation of Newton's equations
Newton's laws of motion accurately describe a significant portion of the physical world, from the motion of large molecules to the motion of galaxies. However, Newton's original formulation is usually difficult to apply to even quite simple mechanical systems and hides the mathematical structure of the equations of motion, which is important for the advanced developments in dynamics and for finding approximate solutions. It transpires that in many important circumstances Newton's equations of motion can be expressed as a variational principle the solution of which is the equations of motion. This reformulation took some years to accomplish and was originally motivated partly by Snell's law and Fermat's principle, that minimises the time of passage, and partly by the ancient philosophical belief in the Economy of Nature; for a brief overview of these ideas the introduction of the book by Yourgrau and Mandelstam (1968) should be consulted.
The first variational principle for dynamics was formulated in 1744 by Maupertuis (1698–1759), but in the same year Euler (1707–1783) described the same principle more precisely. In 1760 Lagrange (1736–1813) clarified these ideas, by first reformulating Newton's equations of motion into a form now known as Lagrange's equations of motion: these are equivalent to Newton's equations but easier to use because the form of the equations is independent of the coordinate system used (this basic property of variational principles is discussed in chapter 6) and this allows easier use of more general coordinate systems.
The next major step was taken by Hamilton (1805–1865), in 1834, who cast Lagrange's equations as a variational principle; confusingly, we now name this Lagrange's variational principle. Hamilton also generalised this theory to lay the foundations for the development of modern physics that occurred in the early part of the 20th century. These developments are important because they provide a coordinate-free formulation of dynamics which emphasises the underlying mathematical structure of the equations of motion, which is important in helping to understand how solutions behave.
Summary
These few examples provide some idea of the significance of variational principles. In summary, they are important for three distinct reasons:
• A variational principle is often the easiest or the only method of formulating a problem.
• Often conventional boundary value problems may be re-formulated in terms of a variational principle which provides a powerful tool for approximating solutions. This technique is introduced in chapter 13.
• A variational formulation provides a coordinate free method of expressing the laws of dynamics, allowing powerful analytic techniques to be used in ordinary Newtonian dynamics. The use of variational principles also paved the way for the formulation of dynamical laws describing motion of objects moving at speeds close to that of light (special relativity), particles interacting through gravitational forces (general relativity) and the laws of the microscopic world (quantum mechanics).
3.6 Miscellaneous exercises
Exercise 3.13
Functionals do not need to have the particular form considered in this chapter. The following expressions also map functions to real numbers:
(a) \(D[y] = y'(1) + y(1)^2\);
(b) \(K[y] = \int_0^1 dx\, a(x)\bigl(y(x) + y(1)y'(x)\bigr)\);
(c) \(L[y] = \bigl[xy(x)y'(x)\bigr]_0^1 + \int_0^1 dx\,\bigl(a(x)y'(x) + b(x)y(x)\bigr)\), where a(x) and b(x) are prescribed functions;
(d) \(S[y] = \int_0^1 ds\int_0^1 dt\,\bigl(s^2 + st\bigr)y(s)y(t)\).
Find the values of these functionals for the functions y(x) = x² and y(x) = cos x when a(x) = x and b(x) = 1.
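For the single case y(x) = x² with a(x) = x and b(x) = 1 the four values can be checked by direct numerical quadrature; the following Python fragment is only an illustrative sketch of such a check (the analytic evaluation is left to the exercise).

    import numpy as np
    from scipy.integrate import quad, dblquad

    # Minimal sketch: evaluate the four functionals of exercise 3.13 for y(x) = x^2,
    # a(x) = x, b(x) = 1.
    y  = lambda x: x**2
    yp = lambda x: 2*x
    a  = lambda x: x
    b  = lambda x: 1.0

    D = yp(1) + y(1)**2
    K = quad(lambda x: a(x)*(y(x) + y(1)*yp(x)), 0, 1)[0]
    L = (1*y(1)*yp(1) - 0) + quad(lambda x: a(x)*yp(x) + b(x)*y(x), 0, 1)[0]
    S = dblquad(lambda t, s: (s**2 + s*t)*y(s)*y(t), 0, 1, 0, 1)[0]
    print(D, K, L, S)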
Exercise 3.14
Show that the function
\[ f(x) = \sqrt{x^2 + h_1^2} + \sqrt{(d-x)^2 + h_2^2}, \]
where h₁, h₂ are defined in figure 3.10 (page 113) and x and d denote the lengths AR and AB respectively, is stationary when θ₁ = θ₂, where
\[ \sin\theta_1 = \frac{x}{\sqrt{x^2 + h_1^2}}, \qquad \sin\theta_2 = \frac{d-x}{\sqrt{(d-x)^2 + h_2^2}}. \]
Show that at this stationary value f(x) has a minimum.
Exercise 3.15
Consider the functional
\[ S[y] = \int_0^1 dx\, y'\sqrt{1+y'}, \qquad y(0) = 0, \quad y(1) = B > -1. \]
(a) Show that the stationary function is the straight line y(x) = Bx and that the value of the functional on this line is S[y] = B√(1+B).
(b) By expanding the integrand of S[y+εh] to second order in ε, show that
\[ S[y+\epsilon h] = S[y] + \frac{(4+3B)\epsilon^2}{8(1+B)^{3/2}}\int_0^1 dx\, h'(x)^2, \qquad B > -1, \]
and deduce that on this path the functional has a minimum.
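The second-order coefficient in part (b) can be checked symbolically; the short sympy sketch below is illustrative only, expanding the integrand about ε = 0 on the straight line, where y′ = B and the variation contributes εh′.

    import sympy as sp

    # Minimal sketch: expand F(B + eps*hp) = (B + eps*hp)*sqrt(1 + B + eps*hp)
    # to second order in eps; on the line y = Bx the integrand depends on y' = B only.
    B, eps, hp = sp.symbols('B epsilon h_p')
    F = (B + eps*hp)*sp.sqrt(1 + B + eps*hp)
    series = sp.series(F, eps, 0, 3).removeO()
    coeff = sp.simplify(series.coeff(eps, 2))
    print(coeff)     # expect h_p**2 * (3*B + 4) / (8*(B + 1)**(3/2))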
Exercise 3.16
Using the method described in the text, show that the functionals
\[ S_1[y] = \int_a^b dx\,\bigl(1 + xy'\bigr)y' \quad\text{and}\quad S_2[y] = \int_a^b dx\, xy'^2, \]
where b > a > 0, y(b) = B and y(a) = A, are both stationary on the same curve, namely
\[ y(x) = A + (B-A)\frac{\ln(x/a)}{\ln(b/a)}. \]
Explain why the same function makes both functionals stationary.
Exercise 3.17
In this exercise the theory developed in section 3.3.1 is extended. The function F(z) has a continuous second derivative and the functional S is defined by the integral
\[ S[y] = \int_a^b dx\, F(y'). \]
(a) Show that
\[ S[y+\epsilon h] - S[y] = \epsilon\int_a^b dx\,\frac{dF}{dy'}\,h'(x) + \frac{1}{2}\epsilon^2\int_a^b dx\,\frac{d^2F}{dy'^2}\,h'(x)^2 + O(\epsilon^3), \]
where h(a) = h(b) = 0.
(b) Show that if y(x) is chosen to make dF/dy′ constant then the functional is stationary.
(c) Deduce that this stationary path makes the functional either a maximum or a minimum, provided F″(y′) ≠ 0.
Exercise 3.18
Show that the functional
\[ S[y] = \int_0^1 dx\,\bigl(1 + y'(x)^2\bigr)^{1/4}, \qquad y(0) = 0, \quad y(1) = B, \]
is stationary for the straight line y(x) = Bx.
In addition, show that this straight line gives a minimum value of the functional if |B| < √2, but if |B| > √2 it gives a maximum.
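The switch from minimum to maximum at |B| = √2 is governed by the sign of F″(B) for F(v) = (1 + v²)^{1/4}; a quick symbolic check, offered only as an illustrative sketch, is:

    import sympy as sp

    # Minimal sketch: the second derivative of F(v) = (1 + v**2)**(1/4)
    # changes sign at v**2 = 2, where the straight line y = Bx stops being
    # a minimum of the functional in exercise 3.18.
    v = sp.symbols('v', real=True)
    F = (1 + v**2)**sp.Rational(1, 4)
    Fpp = sp.simplify(sp.diff(F, v, 2))
    print(Fpp)                              # numerator proportional to (2 - v**2)
    print(sp.solve(sp.Eq(Fpp, 0), v))       # v = -sqrt(2), sqrt(2)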
Harder exercises
Exercise 3.19
If a uniform, flexible, inextensible chain of length L is suspended between two supports having the coordinates (a, A) and (b, B), with the y-axis pointing vertically upwards, show that, if the shape assumed by the chain is described by the differentiable function y(x), then its length is given by \(L[y] = \int_a^b dx\,\sqrt{1+y'^2}\) and its potential energy by
\[ E[y] = \rho g\int_a^b dx\, y\sqrt{1+y'^2}, \qquad y(a) = A, \quad y(b) = B, \]
where ρ is the line-density of the chain and g the acceleration due to gravity.
Exercise 3.20
This question is about the shortest distance between two points on the surface of a right-circular cylinder, so is a generalisation of the theory developed in section 3.2.
(a) If the cylinder axis coincides with the z-axis we may use the polar coordinates (ρ, φ, z) to label points on the cylindrical surface, where ρ is the cylinder radius. Show that the Cartesian coordinates of a point (x, y) are given by x = ρ cos φ, y = ρ sin φ and hence that the distance between two adjacent points on the cylinder, (ρ, φ, z) and (ρ, φ+δφ, z+δz), is, to first order, given by δs² = ρ²δφ² + δz².
(b) A curve on the surface may be defined by prescribing z as a function of φ. Show that the length of a curve from φ = φ₁ to φ₂ is
\[ L[z] = \int_{\phi_1}^{\phi_2} d\phi\,\sqrt{\rho^2 + z'(\phi)^2}. \]
(c) Deduce that the shortest distance on the cylinder between the two points (ρ, 0, 0) and (ρ, φ₁, z₁) is along the curve z = z₁φ/φ₁.
Exercise 3.21
An inverted cone has its apex at the origin and axis along the z-axis. Let α be the angle between this axis and the sides of the cone, and define a point on the conical surface by the coordinates (ρ, φ), where ρ is the perpendicular distance to the z-axis and φ is the polar angle measured from the x-axis.
Show that the distance on the cone between adjacent points (ρ, φ) and (ρ+δρ, φ+δφ) is, to first order,
\[ \delta s^2 = \rho^2\delta\phi^2 + \frac{\delta\rho^2}{\sin^2\alpha}. \]
Hence show that if ρ(φ), φ₁ ≤ φ ≤ φ₂, is a curve on the conical surface then its length is
\[ L[\rho] = \int_{\phi_1}^{\phi_2} d\phi\,\sqrt{\rho^2 + \frac{\rho'(\phi)^2}{\sin^2\alpha}}. \]
Exercise 3.22
A straight river of uniform width b flows with velocity (0, v(x)), where the axes are chosen so the left-hand bank is the y-axis and where v(x) > 0. A boat can travel with constant speed c > max(v(x)) relative to still water. If the starting and landing points are chosen to be the origin and (b, B), respectively, show that the path giving the shortest time of crossing is given by minimising the functional
\[ T[y] = \int_0^b dx\,\frac{\sqrt{c^2\bigl(1+y'(x)^2\bigr) - v(x)^2} - v(x)y'(x)}{c^2 - v(x)^2}, \qquad y(0) = 0, \quad y(b) = B. \]
Exercise 3.23
In this exercise the basic dynamics required for the derivation of the minimum resistance functional, equation 3.21, is derived. This exercise is optional, because it requires knowledge of elementary mechanics which is not part of, or a prerequisite of, this course.
Consider a block of mass M sliding smoothly on a plane, the cross section of which is shown in figure 3.12.
Figure 3.12 Diagram showing the velocities of the block and particle before and after the collision.
The block is moving from left to right, with speed V, towards a small particle of mass m moving with speed v, such that initially the distance between the particle and the block is decreasing. Suppose that after the inevitable collision the block is moving with speed V′, in the same direction, and the particle is moving with speed v′ to the right. Use conservation of energy and linear momentum to show that (V′, v′) are related to (V, v) by the equations
\[ MV'^2 + mv'^2 = MV^2 + mv^2 \quad\text{and}\quad MV - mv = MV' + mv'. \]
Hence show that
\[ V' = V - \frac{2m}{M+m}(V+v) \quad\text{and}\quad v' = \frac{2MV + (M-m)v}{M+m}. \]
Show that in the limit m/M → 0, V′ = V and v′ = 2V + v, and give a physical interpretation of these equations.
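The collision algebra can be verified symbolically; the sympy fragment below is an illustrative sketch that solves the conservation equations for V′ and v′, discards the trivial (no-collision) root, and takes the limit m/M → 0.

    import sympy as sp

    # Minimal sketch: solve the energy and momentum equations of exercise 3.23.
    M, m, V, v = sp.symbols('M m V v', positive=True)
    Vp, vp = sp.symbols('Vp vp')
    eqs = [sp.Eq(M*Vp**2 + m*vp**2, M*V**2 + m*v**2),     # energy
           sp.Eq(M*Vp + m*vp, M*V - m*v)]                 # momentum (particle initially moves towards the block)
    sols = sp.solve(eqs, [Vp, vp], dict=True)
    # Discard the trivial solution Vp = V, vp = -v (no collision).
    sol = [s for s in sols if sp.simplify(s[vp] + v) != 0][0]
    print(sp.simplify(sol[Vp] - (V - 2*m*(V + v)/(M + m))))       # 0
    print(sp.simplify(sol[vp] - (2*M*V + (M - m)*v)/(M + m)))     # 0
    print(sp.limit(sol[Vp], m, 0), sp.limit(sol[vp], m, 0))       # V and 2V + v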
Chapter 4
The Euler-Lagrange equation
4.1 Introduction
In this chapter we apply the methods introduced in section 3.2 to more general problems and derive the most important result of the Calculus of Variations. We show that for the functional
\[ S[y] = \int_a^b dx\, F(x, y, y'), \qquad y(a) = A, \quad y(b) = B, \tag{4.1} \]
where F(x, u, v) is a real function of three real variables, a necessary and sufficient condition for the twice differentiable function y(x) to be a stationary path is that it satisfies the equation
\[ \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right) - \frac{\partial F}{\partial y} = 0 \quad\text{and the boundary conditions}\quad y(a) = A, \quad y(b) = B. \tag{4.2} \]
This equation is known either as Euler's equation or the Euler-Lagrange equation, and is a second-order equation for y(x), exercise 3.10 (page 103). Conditions for a stationary path to give either a local maximum or a local minimum are more difficult to find and we defer a discussion of this problem to chapter 8.
In order to derive the Euler-Lagrange equation it is helpful to first discuss some preliminary ideas. We start by briefly describing Euler's original analysis, because it provides an intuitive understanding of functionals and provides a link between the calculus of functions of many variables and the Calculus of Variations. This leads directly to the idea of the rate of change of a functional, which is required to define a stationary path. This section is followed by the proof of the fundamental lemma of the Calculus of Variations, which is essential for the derivation of the Euler-Lagrange equation, which follows.
The Euler-Lagrange equation is usually a nonlinear boundary value problem: this combination causes severe difficulties, both theoretical and practical. First, solutions may not exist and if they do uniqueness is not ensured: second, if solutions do exist it is often difficult to compute them. These difficulties are in sharp contrast to initial value problems and, because the differences are so marked, in section 4.5 we compare these two types of equations in a little detail. Finally, in section 4.6, we show why the limiting process used by Euler is subtle and can lead to difficulties.
4.2 Preliminary remarks
4.2.1 Relation to differential calculus
Euler (1707–1783) was the first to make a systematic study of problems that can be described by functionals, though it was Lagrange (1736–1813) who developed the method we now use. Euler studied functionals having the form defined in equation 4.1. He related these functionals to functions of many variables using the simple device of dividing the abscissa into N + 1 equal intervals,
\[ a = x_0,\ x_1,\ x_2,\ \ldots,\ x_N,\ x_{N+1} = b, \quad\text{where}\quad x_{k+1} - x_k = \delta, \]
and replacing the curve y(x) with segments of straight lines with vertices
\[ (x_0, A),\ (x_1, y_1),\ (x_2, y_2),\ \ldots,\ (x_N, y_N),\ (x_{N+1}, B) \quad\text{where}\quad y_k = y(x_k), \]
y(a) = A and y(b) = B, as shown in the following figure.
Figure 4.1 Diagram showing the rectification of a curve by a series of six straight lines, N = 5.
Approximating the derivative at x_k by the difference (y_k − y_{k−1})/δ, the functional 4.1 is replaced by a function of the N variables (y₁, y₂, …, y_N),
\[ S(y_1, y_2, \ldots, y_N) = \delta\sum_{k=1}^{N+1} F\left(x_k, y_k, \frac{y_k - y_{k-1}}{\delta}\right) \quad\text{where}\quad \delta = \frac{b-a}{N+1}, \tag{4.3} \]
and where y₀ = A and y_{N+1} = B. This association with ordinary functions of many variables can illuminate the nature of functionals and, if all else fails, it can be used as the basis of a numerical approximation; examples of this procedure are given in exercises 4.1 and 4.21. The integral 4.1 is obtained from this sum by taking the limit N → ∞; similarly the Euler-Lagrange equation 4.2 may be derived by taking the same limit of the N algebraic equations ∂S/∂y_k = 0, k = 1, 2, …, N, see exercise 4.30 (page 141). In any mathematical analysis care is usually needed when such limits are taken and the Calculus of Variations is no exception; however, here we discuss these problems only briefly, in section 4.6.
Euler made extensive use of this method of finite differences. By replacing smooth curves by polygonal lines he reduced the problem of finding stationary paths of functionals to finding stationary points of a function of N variables: he then obtained exact solutions by taking the limit as N → ∞. In this sense functionals may be regarded as functions of infinitely many variables, that is, the values of the function y(x) at distinct points, and the Calculus of Variations may be regarded as the corresponding analogue of differential calculus.
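Euler's finite-difference idea is easy to try numerically. The Python sketch below is an illustration only; it uses the functional ∫₀¹ dx (y′² − y² − 2xy) with y(0) = y(1) = 0, which appears later in exercises 4.19, 4.21 and 4.31, minimises the discrete sum (4.3) over the vertex values y_k, and compares with the exact stationary path y = sin x/sin 1 − x quoted there.

    import numpy as np
    from scipy.optimize import minimize

    # Minimal sketch of Euler's finite-difference method for
    # S[y] = int_0^1 dx (y'^2 - y^2 - 2*x*y),  y(0) = y(1) = 0.
    N = 20
    delta = 1.0/(N + 1)
    x = np.linspace(0.0, 1.0, N + 2)                     # x_0, ..., x_{N+1}

    def S(y_inner):
        y = np.concatenate(([0.0], y_inner, [0.0]))      # impose y_0 = y_{N+1} = 0
        yp = (y[1:] - y[:-1])/delta                      # differences (y_k - y_{k-1})/delta
        return delta*np.sum(yp**2 - y[1:]**2 - 2*x[1:]*y[1:])

    y_opt = minimize(S, np.zeros(N)).x
    exact = np.sin(x[1:-1])/np.sin(1.0) - x[1:-1]
    print(np.max(np.abs(y_opt - exact)))                 # small, and decreasing as N grows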
Exercise 4.1
If the functional depends only upon y′,
\[ S[y] = \int_a^b dx\, F(y'), \qquad y(a) = A, \quad y(b) = B, \]
show that the approximation defined by equation 4.3 becomes
\[ S(y_1, y_2, \ldots, y_N) = \delta\left[F\left(\frac{y_1 - A}{\delta}\right) + F\left(\frac{y_2 - y_1}{\delta}\right) + \cdots + F\left(\frac{y_k - y_{k-1}}{\delta}\right) + \cdots + F\left(\frac{y_N - y_{N-1}}{\delta}\right) + F\left(\frac{B - y_N}{\delta}\right)\right]. \]
Hence show that a stationary point of S satisfies the equations
\[ F'\bigl((y_k - y_{k-1})/\delta\bigr) = c, \qquad k = 1, 2, \ldots, N+1, \]
where c is a constant, independent of k. Deduce that, if F(z) is sufficiently smooth, S(y₁, y₂, …, y_N) is stationary when the points (x_k, y(x_k)) lie on a straight line.
4.2.2 Differentiation of a functional
The stationary points of a function of n variables are where all n first partial derivatives vanish. The stationary paths of a functional are defined in a similar manner and the purpose of this section is to introduce the idea of the derivative of a functional and to show how it may be calculated. First, however, it is necessary to make a few preliminary remarks in order to emphasise the important differences between functionals and functions of n variables: we return to these problems later.
In the study of functions of n variables, it is convenient to use geometric language and to regard the set of n numbers (x₁, x₂, …, x_n) as a point in an n-dimensional space. Similarly, we regard each function y(x), belonging to a given class of functions, as a point in some function space.
For functions of n variables it is sufficient to consider a single space, for instance the n-dimensional Euclidean space. But there is no universal function space and the nature of the problem determines the choice of function space. For instance, when dealing with a functional of the form 4.1 it is natural to use the set of all functions with a continuous first derivative. In the case of functionals of the form \(\int_a^b dx\, F(x, y, y', y'')\) we would require functions with two continuous derivatives.
The concept of continuity of functions is important and you will recall, section 1.3.2, that a function f(x) is continuous at x = c if the values of f(x) at neighbouring values of c are close to f(c); more precisely we require that
\[ \lim_{\epsilon\to 0} f(c+\epsilon) = f(c). \]
Remember that if the usual derivative of a function exists at any point x, it is continuous at x.
The type of functional defined by equation 4.1 involves paths joining the points (a, A) and (b, B) which are differentiable or piecewise differentiable for a ≤ x ≤ b. In order to find a stationary path we need to compare values of the functional on nearby paths; this means that a careful definition of the distance between nearby paths (functions) is important. This is achieved most easily by using the notion of a norm of a function. A norm defined on a function space is a map taking elements of the space to the non-negative real numbers; it represents the distance from an element to the origin (zero function). It has the same properties as the Euclidean distance defined in equation 1.2 (page 11).
In R^n the Euclidean distance suffices for most purposes. In infinite dimensional function spaces there is no obvious choice of norm that can be used in all circumstances. Use of different norms and the corresponding concepts of distance can lead to different classifications of stationary paths, as is seen in section 4.6.
For this reason it is usual to distinguish between a function space and a normed space by using a different name whenever a specific norm on the set of functions is being considered. For example, we have introduced the space C⁰[a, b] of continuous functions on the interval [a, b]. One of the simplest norms on this space is the supremum norm¹,
\[ \|y(x)\| = \max_{a\le x\le b}|y(x)|, \]
and this norm can be shown to satisfy the conditions of equation 1.3 (page 11). The distance between two functions y and z is of course ‖y − z‖. When we wish to emphasise that we are considering this particular normed space, and not just the space of continuous functions, we shall write D⁰[a, b], by which we shall mean the space of continuous functions with the specified norm. When we write C⁰[a, b], no particular norm is implied.
¹ In analysis texts max |y(x)| is replaced by sup |y(x)|, but for continuous functions on closed finite intervals max and sup are identical.
In what follows, we shall sometimes need to restrict attention to functions which have a continuous and bounded derivative. A suitable norm for such functions is
\[ \|y(x)\|_1 = \max_{a\le x\le b}|y(x)| + \max_{a\le x\le b}|y'(x)|, \]
and we shall denote by D¹[a, b] the normed space of functions with continuous bounded derivative equipped with the norm ‖·‖₁ defined above. This space consists of the same functions as the space C¹[a, b], but as before use of the latter notation will not imply the use of any particular norm on the space.
It is usually necessary to restrict the class of functions we consider to the subset of all possible functions that satisfy the boundary conditions, if defined. Normally we shall simply refer to this restricted class of functions as the admissible functions: these are defined to be those differentiable functions that satisfy any boundary conditions and, in most circumstances, to be in D¹(a, b), because it is important to bound the variation in y′(x). Later we shall be less restrictive and allow piecewise differentiable functions.
We now come to the most important part of this section, that is, the idea of the rate of change of a functional, which is implicit in the idea of a stationary path. Recall that a
real, differentiable function of n real variables, G(x), x = (x₁, x₂, …, x_n), is stationary at a point if all its first partial derivatives are zero, ∂G/∂x_k = 0, k = 1, 2, …, n. This result follows by considering the difference between the values of G(x) at adjacent points using the first-order Taylor expansion, equation 1.39 (page 36),
\[ G(x+\epsilon\xi) - G(x) = \epsilon\sum_{k=1}^{n}\xi_k\frac{\partial G}{\partial x_k} + O(\epsilon^2), \qquad |\xi| = 1, \]
where ξ = (ξ₁, ξ₂, …, ξ_n). The rate of change of G(x) in the direction ξ is obtained by dividing by ε and taking the limit ε → 0,
\[ \Delta G(x, \xi) = \lim_{\epsilon\to 0}\frac{G(x+\epsilon\xi) - G(x)}{\epsilon} = \sum_{k=1}^{n}\xi_k\frac{\partial G}{\partial x_k}. \tag{4.4} \]
A stationary point is defined to be one at which the rate of change, ΔG(x, ξ), is zero in every direction ξ; it follows that at a stationary point all first partial derivatives must be zero.
The idea embodied in equation 4.4 may be applied to the functional
\[ S[y] = \int_a^b dx\, F(x, y, y'), \qquad y(a) = A, \quad y(b) = B, \]
which has a real value for each admissible function y(x). The rate of change of a functional S[y] is obtained by examining the difference between neighbouring admissible paths, S[y+εh] − S[y]; since both y(x) and y(x)+εh(x) are admissible functions for all real ε, it follows that h(a) = h(b) = 0. This difference is a function of the real variable ε, so we define the rate of change of S[y] by the limit,
\[ \Delta S[y, h] = \lim_{\epsilon\to 0}\frac{S[y+\epsilon h] - S[y]}{\epsilon} = \frac{d}{d\epsilon}S[y+\epsilon h]\bigg|_{\epsilon=0}, \tag{4.5} \]
which we assume exists. The functional ΔS depends upon both y(x) and h(x), just as the limit of the difference [G(x+εξ) − G(x)]/ε, of equation 4.4, depends upon x and ξ.
Definition 4.1
The functional S[y] is said to be stationary if y(x) is an admissible function and if ΔS[y, h] = 0 for all h(x) for which y(x) and y(x)+h(x) are admissible.
The functions for which S[y] is stationary are named stationary paths. The stationary path, y(x), and the varied path y(x)+εh(x) must be admissible: for most variational problems considered in this chapter both paths need to satisfy the boundary conditions, so h(a) = h(b) = 0. But in more general problems considered later, particularly in chapter 10, these conditions on h(x) are removed, but see exercises 4.12 and 4.13. If y(x) is an admissible path we name the allowed variations, h(x), to be those for which y(x)+εh(x) are admissible.
On a stationary path the functional may achieve a maximum or a minimum value, and then the path is named an extremal. The nature of stationary paths is usually determined by the term O(ε²) in the expansion of S[y+εh]: this theory is described in chapter 8.
In all our applications the limit
\[ \Delta S[y, h] = \frac{d}{d\epsilon}S[y+\epsilon h]\bigg|_{\epsilon=0} \]
is linear in h, that is, if c is any constant then ΔS[y, ch] = cΔS[y, h]; in this case it is named the Gateaux differential.
Notice that if S is an ordinary function of n variables, (y₁, y₂, …, y_n), rather than a functional, then the Gateaux differential is
\[ \Delta S = \lim_{\epsilon\to 0}\frac{d}{d\epsilon}S(y+\epsilon h) = \sum_{k=1}^{n}\frac{\partial S}{\partial y_k}h_k, \]
which is proportional to the rate of change defined in equation 4.4.
As an example, consider the functional
\[ S[y] = \int_a^b dx\,\sqrt{1+y'^2}, \qquad y(a) = A, \quad y(b) = B, \]
for the distance between (a, A) and (b, B), discussed in section 3.2.1. We have
\[ \frac{d}{d\epsilon}S[y+\epsilon h] = \frac{d}{d\epsilon}\int_a^b dx\,\sqrt{1+(y'+\epsilon h')^2} = \int_a^b dx\,\frac{d}{d\epsilon}\sqrt{1+(y'+\epsilon h')^2} = \int_a^b dx\,\frac{(y'+\epsilon h')\,h'}{\sqrt{1+(y'+\epsilon h')^2}}. \]
Note that we may change the order of differentiation with respect to ε and integration with respect to x because a and b are independent of ε and all integrands are assumed to be sufficiently well-behaved functions of x and ε. Hence, on putting ε = 0,
\[ \Delta S[y, h] = \frac{d}{d\epsilon}S[y+\epsilon h]\bigg|_{\epsilon=0} = \int_a^b dx\,\frac{y'h'}{\sqrt{1+y'^2}}, \]
which is just equation 3.4 (page 95).
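The Gateaux differential can also be estimated numerically by evaluating [S[y+εh] − S[y]]/ε for a small ε and comparing it with the integral above. The following Python fragment is an illustrative sketch only; the path y and the variation h are arbitrary choices.

    import numpy as np
    from scipy.integrate import trapezoid

    # Minimal sketch: compare the difference quotient (S[y+eps*h]-S[y])/eps with
    # the Gateaux differential  integral of y'h'/sqrt(1+y'^2)  for arclength.
    a, b = 0.0, 1.0
    x = np.linspace(a, b, 4001)

    yp = lambda x: 2*x                      # derivative of the sample path y = x^2
    hp = lambda x: np.pi*np.cos(np.pi*x)    # derivative of the variation h = sin(pi*x), h(a) = h(b) = 0

    S = lambda yprime: trapezoid(np.sqrt(1.0 + yprime**2), x)

    eps = 1e-6
    quotient = (S(yp(x) + eps*hp(x)) - S(yp(x)))/eps
    gateaux  = trapezoid(yp(x)*hp(x)/np.sqrt(1.0 + yp(x)**2), x)
    print(quotient, gateaux)                # the two numbers should agree to several digits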
For our final comment, we note the approximation defined in equation 4.3 (page 122) gives a function of N variables, so the associated differential is
\[ \Delta S[y, h] = \lim_{\epsilon\to 0}\frac{S(y+\epsilon h) - S(y)}{\epsilon}. \]
Comparing this with ΔG, equation 4.4, we can make the equivalences y ↔ x and h ↔ ξ. However, for functions of N variables there is no relation between the variables ξ_k and ξ_{k+1}, but h(x) is differentiable, so |h_k − h_{k+1}| = O(δ). This suggests that some care is required in taking the limit N → ∞ of equation 4.3 and shows why problems involving finite numbers of variables can be different from those with infinitely many variables and why the choice of norms, discussed above, is important. Nevertheless, provided caution is exercised, the analogy with functions of several variables can be helpful.
Exercise 4.2
Find the Gateaux differentials of the following functionals:
(a) \(S[y] = \int_0^{\pi/2} dx\,\bigl(y'^2 - y^2\bigr)\),  (b) \(S[y] = \int_a^b dx\,\frac{y'^2}{x^3}\), b > a > 0,
(c) \(S[y] = \int_a^b dx\,\bigl(y'^2 + y^2 + 2ye^x\bigr)\),  (d) \(S[y] = \int_0^1 dx\,\bigl(x^2 + y^2\bigr)\sqrt{1+y'^2}\).
Exercise 4.3
Show that the Gateaux differential of the functional
\[ S[y] = \int_a^b ds\int_a^b dt\, K(s,t)y(s)y(t) \]
is
\[ \Delta S[y, h] = \int_a^b ds\, h(s)\int_a^b dt\,\bigl(K(s,t) + K(t,s)\bigr)y(t). \]
4.3 The fundamental lemma
This section contains the essential result upon which the Calculus of Variations depends. Using the result obtained here we will be able to use the stationary condition that ΔS[y, h] = 0, for all suitable h(x), to form a differential equation for the unknown function y(x).
The fundamental lemma: if z(x) is a continuous function of x for a ≤ x ≤ b and if
\[ \int_a^b dx\, z(x)h(x) = 0 \]
for all functions h(x) that are continuous for a ≤ x ≤ b and are zero at x = a and x = b, then z(x) = 0 for a ≤ x ≤ b.
In order to prove this we assume on the contrary that z(ξ) ≠ 0 for some ξ satisfying a < ξ < b. Then, since z(x) is continuous there is an interval [x₁, x₂] around ξ with a < x₁ < x₂ < b in which z(x) ≠ 0. We now construct a suitable function h(x) that yields a contradiction. Define h(x) to be
\[ h(x) = \begin{cases} (x-x_1)(x_2-x), & a < x_1 \le x \le x_2 < b, \\ 0, & \text{otherwise}, \end{cases} \]
so h(x) is continuous and
\[ \int_a^b dx\, z(x)h(x) = \int_{x_1}^{x_2} dx\, z(x)(x-x_1)(x_2-x) \ne 0, \]
since the integrand is continuous and non-zero on (x₁, x₂). However, \(\int_a^b dx\, zh = 0\), so we have a contradiction.
Thus the assumptions that z(x) is continuous and z(x) ≠ 0 for some x ∈ (a, b) lead to a contradiction and we deduce that z(x) = 0 for a < x < b: because z(x) is continuous it follows that z(x) = 0 for a ≤ x ≤ b. This result is named the fundamental lemma of the Calculus of Variations.
This proof assumed only that h(x) is continuous and made no assumptions about its differentiability. In previous applications h(x) had to be differentiable for x ∈ (a, b). However, for the function h(x) defined above h′(x) does not exist at x₁ and x₂. The proof is easily modified to deal with this case. If h(x) needs to be n times differentiable then we use the function
\[ h(x) = \begin{cases} (x-x_1)^{n+1}(x_2-x)^{n+1}, & x_1 \le x \le x_2, \\ 0, & \text{otherwise}. \end{cases} \]
Exercise 4.4
In this exercise a result due to du Bois-Reymond (1831–1889), which is closely related to the fundamental lemma, will be derived. This is required later, see exercise 4.11.
If z(x) and h′(x) are continuous, h(a) = h(b) = 0 and
\[ \int_a^b dx\, z(x)h'(x) = 0 \]
for all h(x), then z(x) is constant for a ≤ x ≤ b.
Prove this result by defining a constant C and a function g(x) by the relations
\[ C = \frac{1}{b-a}\int_a^b dx\, z(x) \quad\text{and}\quad g(x) = \int_a^x dt\,\bigl(C - z(t)\bigr). \]
Show that g(a) = g(b) = 0 and
\[ \int_a^b dx\, z(x)g'(x) = \int_a^b dx\, z(x)\bigl(C - z(x)\bigr) = -\int_a^b dx\,\bigl(C - z(x)\bigr)^2. \]
Hence, deduce that z(x) = C.
4.4 The Euler-Lagrange equations
This section contains the most important result of this chapter. Namely, that if F(x, u, v) is a sufficiently differentiable function of three variables, then a necessary and sufficient condition for the functional²
\[ S[y] = \int_a^b dx\, F(x, y, y'), \qquad y(a) = A, \quad y(b) = B, \tag{4.6} \]
to be stationary on the path y(x) is that it satisfies the differential equation and boundary conditions,
\[ \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right) - \frac{\partial F}{\partial y} = 0, \qquad y(a) = A, \quad y(b) = B. \tag{4.7} \]
This is named Euler's equation or the Euler-Lagrange equation. It is a second-order differential equation, as shown in exercise 3.10, and is the analogue of the conditions ∂G/∂x_k = 0, k = 1, 2, …, n, for a function of n real variables to be stationary, as discussed in section 4.2.2. We now derive this equation.
² Many texts state that a necessary condition for y(x) to be an extremal of S[y] is that it satisfies the Euler-Lagrange equation. Here we consider stationary paths and then the condition is also sufficient.
The integral 4.6 is defined for functions y(x) that are differentiable for a ≤ x ≤ b. Using equation 4.5 we find that the rate of change of S[y] is
\[ \Delta S[y, h] = \frac{d}{d\epsilon}\int_a^b dx\, F(x, y+\epsilon h, y'+\epsilon h')\bigg|_{\epsilon=0} = \int_a^b dx\,\frac{d}{d\epsilon}F(x, y+\epsilon h, y'+\epsilon h')\bigg|_{\epsilon=0}. \tag{4.8} \]
The integration limits a and b are independent of ε and we assume that the order of integration and differentiation may be interchanged. The integrand of equation 4.8 is a total derivative with respect to ε and equation 1.21 (page 26) shows how to write this expression in terms of the partial derivatives of F. Using equation 1.21 with n = 3, t = ε and the variable changes (x₁, x₂, x₃) = (x, y, y′) and (h₁, h₂, h₃) = (0, h(x), h′(x)), so that
\[ f(x_1+th_1,\ x_2+th_2,\ x_3+th_3) \quad\text{becomes}\quad F(x, y+\epsilon h, y'+\epsilon h'), \]
we obtain
\[ \frac{d}{d\epsilon}F(x, y+\epsilon h, y'+\epsilon h') = h\frac{\partial F}{\partial y} + h'\frac{\partial F}{\partial y'}. \]
Now set ε = 0, so the partial derivatives are evaluated at (x, y, y′), to obtain
\[ \Delta S[y, h] = \int_a^b dx\left(h(x)\frac{\partial F}{\partial y} + h'(x)\frac{\partial F}{\partial y'}\right). \tag{4.9} \]
The second term in this integral can be simplified by integrating by parts,
\[ \int_a^b dx\, h'(x)\frac{\partial F}{\partial y'} = \left[h(x)\frac{\partial F}{\partial y'}\right]_a^b - \int_a^b dx\, h(x)\frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right), \]
assuming that F_{y′} is differentiable. But h(a) = h(b) = 0 so the boundary term on the right-hand side vanishes and the rate of change of the functional S[y] becomes
\[ \Delta S[y, h] = -\int_a^b dx\left(\frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right) - \frac{\partial F}{\partial y}\right)h(x). \tag{4.10} \]
If the Euler-Lagrange equation is satisfied ΔS[y, h] = 0 for all allowed h, so y(x) is a stationary path of the functional.
If S[y] is stationary then, by definition, ΔS[y, h] = 0 for all allowed h and it follows from the fundamental lemma of the Calculus of Variations that y(x) satisfies the second-order differential equation
\[ \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right) - \frac{\partial F}{\partial y} = 0, \qquad y(a) = A, \quad y(b) = B. \tag{4.11} \]
Hence a necessary and sufficient condition for a functional to be stationary on a sufficiently differentiable path, y(x), is that it satisfies the Euler-Lagrange equation 4.7.
The paths that satisfy the Euler-Lagrange equation are not necessarily extremals, that is, do not necessarily yield maxima or minima of the functional. The Euler-Lagrange equation is, in most cases, a second-order, nonlinear, boundary value problem and there may be no solutions or many. Finally, note that functionals that are equal except for multiplicative or additive constants have the same Euler-Lagrange equations.
Exercise 4.5
Show that the Euler-Lagrange equation for the functional
\[ S[y] = \int_0^X dx\,\bigl(y'^2 - y^2\bigr), \qquad y(0) = 0, \quad y(X) = 1, \quad X > 0, \]
is y″ + y = 0. Hence show that provided X ≠ nπ, n = 1, 2, …, the stationary function is y = sin x/sin X.
The significance of the point X = π will be revealed in chapter 8, in particular exercise 8.12. There it is shown that for 0 < X < π this solution is a minimum of the functional, but for X > π it is simply a stationary point. In this example at the boundary, X = π, the Euler-Lagrange equation does not have a solution.
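The boundary value character of the Euler-Lagrange equation can be seen numerically; the sketch below is illustrative only, solving y″ + y = 0 with y(0) = 0, y(X) = 1 using a standard boundary value solver and comparing with sin x/sin X.

    import numpy as np
    from scipy.integrate import solve_bvp

    # Minimal sketch: solve the Euler-Lagrange equation of exercise 4.5,
    # y'' + y = 0, y(0) = 0, y(X) = 1, and compare with sin(x)/sin(X).
    X = 2.0                                        # any X not equal to a multiple of pi

    def rhs(x, Y):                                 # Y[0] = y, Y[1] = y'
        return np.vstack((Y[1], -Y[0]))

    def bc(Ya, Yb):
        return np.array([Ya[0], Yb[0] - 1.0])

    x = np.linspace(0.0, X, 50)
    sol = solve_bvp(rhs, bc, x, np.zeros((2, x.size)))
    print(np.max(np.abs(sol.sol(x)[0] - np.sin(x)/np.sin(X))))   # small residual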
4.4.1 The first-integral
The Euler-Lagrange equation is a second-order differential equation. But if the integrand does not depend explicitly upon x, so the functional has the form
\[ S[y] = \int_a^b dx\, G(y, y'), \qquad y(a) = A, \quad y(b) = B, \tag{4.12} \]
then the Euler-Lagrange equation reduces to the first-order differential equation,
\[ y'\frac{\partial G}{\partial y'} - G = c, \qquad y(a) = A, \quad y(b) = B, \tag{4.13} \]
where c is a constant determined by the boundary conditions, see for example exercise 4.6 below. The expression on the left-hand side of this equation is often named the first-integral of the Euler-Lagrange equation. This result is important because, when applicable, it often saves a great deal of effort, because it is usually far easier to solve this lower order equation. Two proofs of equation 4.13 are provided: the first involves deriving an algebraic identity, see exercise 4.7, and it is important to do this yourself. The second proof is given in section 7.2.1 and uses the invariance properties of the integrand G(y, y′). A warning, however; in some circumstances a solution of equation 4.13 will not be a solution of the original Euler-Lagrange equation, see exercise 4.8, also section 5.3 and chapter 6.
Another important consequence is that the stationary function, the solution of 4.13, depends only upon the variables u = x − a and b − a (besides A and B), rather than x, a and b independently, as is the case when the integrand depends explicitly upon x.
An observation
You may have noticed that the original functional 4.6 is defined on the class of functions for which F(x, y(x), y′(x)) is integrable: if F(x, u, v) is differentiable in all three variables this condition is satisfied if y′(x) is piecewise continuous. However, the Euler-Lagrange equation 4.11 requires the stronger condition that y′(x) is differentiable. This extra condition is created by the derivation of the Euler-Lagrange equation, in particular the step between equations 4.9 and 4.10: a necessary condition for the functional S[y] to be stationary, that does not make this step and does not require y″ to exist, is derived in exercise 4.11.
There are important problems where y′(x) does not exist at all points on a stationary path; the minimal surface of revolution, dealt with in the next chapter, is one simple example; the general theory of this type of problem will be considered in chapter 10.
Exercise 4.6
Consider the functional
\[ S[y] = \int_0^1 dx\,\bigl(y'^2 - y\bigr), \qquad y(0) = 0, \quad y(1) = 1, \]
and show that the Euler-Lagrange equation is the linear equation,
\[ 2\frac{d^2y}{dx^2} + 1 = 0, \qquad y(0) = 0, \quad y(1) = 1, \]
and find its solution.
Show that the first-integral, equation 4.13, becomes the nonlinear equation
\[ \left(\frac{dy}{dx}\right)^2 + y = c. \]
Find the general solution of this equation and find the solution that satisfies the boundary conditions.
In this example it is easier to solve the linear second-order Euler-Lagrange equation than the first-order equation 4.13, which is nonlinear. Normally, both equations are nonlinear and then it is easier to solve the first-order equation. In the examples considered in sections 5.3 and 5.2 it is more convenient to use the first-integral.
Exercise 4.7
If G(y, y′) does not depend explicitly upon x, that is ∂G/∂x = 0, show that
\[ y'(x)\left(\frac{d}{dx}\left(\frac{\partial G}{\partial y'}\right) - \frac{\partial G}{\partial y}\right) = \frac{d}{dx}\left(y'\frac{\partial G}{\partial y'} - G\right) \]
and hence derive equation 4.13.
Hint: you will find the result derived in exercise 3.10 (page 103) helpful.
Exercise 4.8
(a) Show that provided G_y(α, 0) exists the differential equation 4.13 (without the boundary conditions) has a solution y(x) = α, where the constant α is defined implicitly by the equation G(α, 0) = −c.
(b) Under what circumstances is the solution y(x) = α also a solution of the Euler-Lagrange equation 4.11?
Exercise 4.9
Show that the Euler-Lagrange equation for the functional
\[ S[y] = \int_0^1 dx\,\bigl(y'^2 + y^2 + 2axy\bigr), \qquad y(0) = 0, \quad y(1) = B, \]
where a is a constant, is y″ − y = ax and hence that a stationary function is
\[ y(x) = (a+B)\frac{\sinh x}{\sinh 1} - ax. \]
By expanding S[y+εh] to second order in ε show that this solution makes the functional a minimum.
Exercise 4.10
In this exercise we consider a problem, due to Weierstrass (1815–1897), in which the functional achieves its minimum value of zero for a piecewise continuous function but for continuous functions the functional is always positive.
The functional is
\[ J[y] = \int_{-1}^{1} dx\, x^2 y'^2, \qquad y(-1) = -1, \quad y(1) = 1, \]
so J[y] ≥ 0 for all real functions. The function
\[ y(x) = \begin{cases} -1, & -1 \le x < 0, \\ 1, & 0 < x \le 1, \end{cases} \]
has a piecewise continuous derivative and J[y] = 0.
(a) Show that the associated Euler-Lagrange equation gives x²y′ = A for some constant A and that the solutions of this that satisfy the boundary conditions at x = −1 and x = 1 are, respectively,
\[ y(x) = \begin{cases} -1 - A - \dfrac{A}{x}, & -1 \le x < 0, \\[1ex] 1 + A - \dfrac{A}{x}, & 0 < x \le 1. \end{cases} \]
Deduce that no continuous function satisfies the Euler-Lagrange equation and the boundary conditions.
(b) Show that for the class of continuous functions defined by
\[ y(x) = \begin{cases} -1, & -1 \le x \le -\epsilon, \\ x/\epsilon, & |x| < \epsilon, \\ 1, & \epsilon \le x \le 1, \end{cases} \]
where ε is a small positive number, J[y] = 2ε/3. Deduce that for continuous functions the functional can be made arbitrarily close to the smallest possible value of J, that is zero, so there is no stationary path.
(c) A similar result can be proved for a class of continuously differentiable functions. For the functions
\[ y(x) = \frac{1}{\alpha}\tan^{-1}\left(\frac{x}{\xi}\right), \qquad \tan\alpha = \frac{1}{\xi}, \quad 0 < \xi < 1, \]
show that
\[ J[y] = \frac{2\xi}{\pi} + O(\xi^2). \]
Deduce that J[y] may take arbitrarily small values, but cannot be zero.
Hint: the relation tan⁻¹(1/z) = π/2 − tan⁻¹(z) is needed.
It may be shown that for no continuous function satisfying the boundary conditions is J[y] = 0. Thus on the class of continuous functions J[y] never equals its minimum value, but can approach it arbitrarily closely.
Exercise 4.11
The Euler-Lagrange equation 4.11 requires that y″(x) exists, yet the original functional does not. The second derivative arises when equation 4.9 is integrated by parts to replace h′(x) by h(x). In this exercise you will show that this step may be avoided and that a sufficient condition not depending upon y″(x) may be derived.
Define the function φ(x) by the integral
\[ \phi(x) = \int_a^x dt\, F_y\bigl(t, y(t), y'(t)\bigr), \]
so that φ(a) = 0 and φ′(x) = F_y(x, y, y′), and show that equation 4.9 becomes
\[ \Delta S = \int_a^b dx\, h'(x)\left(\frac{\partial F}{\partial y'} - \phi(x)\right). \]
Using the result derived in exercise 4.4 show that a necessary condition for S[y] to be stationary is that
\[ \frac{\partial F}{\partial y'} - \int_a^x dt\,\frac{\partial F}{\partial y} = C, \]
where C is a constant.
In practice, this equation is not usually as useful as the Euler-Lagrange equation.
Exercise 4.12
The boundary conditions y(a) = A, y(b) = B are not always appropriate so we need functionals that yield different conditions. In this exercise we illustrate how this can sometimes be achieved. The technique used here is important and will be used extensively in chapter 10.
Consider the functional
\[ S[y] = -G(y(b)) + \frac{1}{2}\int_a^b dx\,\bigl(y'^2 + y^2\bigr), \qquad y(a) = A, \]
with no condition being given at x = b. For this functional the variation h(x) satisfies h(a) = 0, but h(b) is not constrained.
(a) Use the fact that h(a) = 0 to show that the Gateaux differential can be written in the form
\[ \Delta S[y, h] = \bigl(y'(b) - G_y(y(b))\bigr)h(b) - \int_a^b dx\,\bigl(y'' - y\bigr)h. \]
(b) Using a subset of variations with h(b) = 0 show that the stationary paths satisfy the equation y″ − y = 0, y(a) = A, and that on this path
\[ \Delta S[y, h] = \bigl(y'(b) - G_y(y(b))\bigr)h(b). \]
Deduce that S[y] is stationary only if y(b) and y′(b) satisfy the equation y′(b) = G_y(y(b)).
(c) Deduce that the stationary path of
\[ S[y] = -By(b) + \frac{1}{2}\int_a^b dx\,\bigl(y'^2 + y^2\bigr), \qquad y(a) = A, \]
satisfies the Euler-Lagrange equation y″ − y = 0, y(a) = A, y′(b) = B.
Exercise 4.13
Use the ideas outlined in the previous exercise to show that if G(b, y, B) is defined by the integral
\[ G(b, y, B) = \int^y dz\, F_{y'}(b, z, B) \]
then the functional
\[ S[y] = -G(b, y(b), B) + \int_a^b dx\, F(x, y, y'), \qquad y(a) = A, \]
is stationary on the path satisfying the Euler-Lagrange equation
\[ \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right) - \frac{\partial F}{\partial y} = 0, \qquad y(a) = A, \quad y'(b) = B. \]
4.5 Theorems of Bernstein and du Bois-Reymond
In section 4.4 it was shown that a necessary condition for a function, y(x), to represent a stationary path of the functional \(S = \int_a^b dx\, F(x, y, y')\), y(a) = A, y(b) = B, is that it satisfies the Euler-Lagrange equation 4.11 or, in expanded form, exercise 3.10 (page 103),
\[ y''F_{y'y'} + y'F_{yy'} + F_{xy'} - F_y = 0, \qquad y(a) = A, \quad y(b) = B. \tag{4.14} \]
This is a second-order, differential equation and is usually nonlinear; even without the boundary conditions this equation cannot normally be solved in terms of known functions: the addition of the boundary values normally makes it even harder to solve. It is therefore frequently necessary to resort to approximate or numerical methods to find solutions, in which case it is helpful to know that solutions actually exist and that they are unique: indeed it is possible for black-box numerical schemes to yield solutions when none exists. In this course there is insufficient space to discuss approximate and numerical methods, but this section is devoted to a discussion of a theorem that provides some information about the existence and uniqueness of solutions for the Euler-Lagrange equation. In the last part of this section we contrast these results with those for the equivalent equation, but with initial conditions rather than boundary values.
First, however, we return to the question, discussed on page 131, of whether the second derivative of the stationary path exists, that is, whether it satisfies the Euler-Lagrange equation in the whole interval.
The following theorem due to the German mathematician du Bois-Reymond (1831–1889) gives necessary conditions for the second derivative of a stationary path to exist.
Theorem 4.1
If
(a) y(x) has a continuous first derivative,
(b) ΔS[y, h] = 0 for all allowed h(x),
(c) F(x, u, v) has continuous first and second derivatives in all variables and
(d) ∂²F/∂y′² ≠ 0 for a ≤ x ≤ b,
then y(x) has a continuous second derivative and satisfies the Euler-Lagrange equation 4.11 for all a ≤ x ≤ b.
This result is of limited practical value because its application sometimes requires knowledge of the solution, or at least some of its properties. A proof of this theorem may be found in Gelfand and Fomin (1963, page 17)³. An example in which F_{y′y′} = 0 on the stationary path and where this path does not possess a second derivative, yet satisfies the Euler-Lagrange equation almost everywhere, is given in exercise 4.28 (page 141).
³ I M Gelfand and S V Fomin, Calculus of Variations (Prentice Hall, translated from the Russian by R A Silverman), reprinted 2000 (Dover).
4.5.1 Bernstein's theorem
The theorem quoted in this section concerns the boundary value problem that can be written in the form of the second-order, nonlinear, boundary value equation,
\[ \frac{d^2y}{dx^2} = H\left(x, y, \frac{dy}{dx}\right), \qquad y(a) = A, \quad y(b) = B. \tag{4.15} \]
For such equations this is one of the few general results about the nature of the solutions and is due to the Ukrainian mathematician S N Bernstein (1880–1968). This theorem provides a sufficient condition for equation 4.15 to have a unique solution.
Theorem 4.2
If for all finite y, y′ and x in an open interval containing [a, b], that is c < a ≤ x ≤ b < d,
(a) the functions H, H_y and H_{y′} are continuous,
(b) there is a constant k > 0 such that H_y > k, and,
(c) for any Y > 0 and all |y| < Y and a ≤ x ≤ b there are positive constants α(Y) and β(Y), depending upon Y, and possibly c and d, such that
\[ |H(x, y, y')| \le \alpha(Y)y'^2 + \beta(Y), \]
then one and only one solution of equation 4.15 exists.
A proof of this theorem may be found in Akhiezer (1962, page 30)⁴. The conditions required by this theorem are far more stringent than those required by theorems 2.1 (page 61) and 2.2 (page 81), which apply to initial value problems. These theorems emphasise the significant differences between initial and boundary value problems as discussed in section 2.2.
⁴ N I Akhiezer, The Calculus of Variations (Blaisdell).
Some examples
The usefulness of Bernstein's theorem is somewhat limited because the conditions of the theorem are too stringent; it is, however, one of the rare general theorems applying to this type of problem. Here we apply it to the two problems dealt with in the next chapter, for which the integrands of the functionals are
\[ F = y\sqrt{1+y'^2} \quad\text{Minimal surface of revolution}, \qquad F = \frac{\sqrt{1+y'^2}}{\sqrt{y}} \quad\text{Brachistochrone}. \]
Substituting these into the Euler-Lagrange equation 4.14 we obtain the following expressions for H,
\[ y'' = H = \frac{1+y'^2}{y} \quad\text{Minimal surface of revolution}, \qquad y'' = H = -\frac{1+y'^2}{2y} \quad\text{Brachistochrone}. \]
In both cases H is discontinuous at y = 0, so the conditions of the theorem do not hold. In fact, the Euler-Lagrange equation for the minimal surface problem has one piecewise smooth solution and, in addition, either two or no differentiable solutions, depending upon the boundary values. The brachistochrone problem always has one, unique solution. These examples emphasise the fact that Bernstein's theorem gives sufficient, as opposed to necessary, conditions.
Exercise 4.14
Use Bernstein's theorem to show that the equation y″ − y = x, y(0) = A, y(1) = B, has a unique solution, and find this solution.
Exercise 4.15
(a) Apply Bernstein's theorem to the equation y″ + y = x, y(0) = 0, y(X) = 1, with X > 0.
(b) Show that the solution of this equation is
\[ y = x + (1-X)\frac{\sin x}{\sin X} \]
and explain why this does not contradict Bernstein's theorem.
Exercise 4.16
The integrand of the functional for the brachistochrone problem, described in section 3.5.1, is \(F = \sqrt{1+y'^2}/\sqrt{y}\). Show that the associated Euler-Lagrange equation is
\[ y'' = -\frac{1+y'^2}{2y} \]
and that this may be written as the pair of first-order equations
\[ \frac{dy_1}{dx} = y_2, \qquad \frac{dy_2}{dx} = -\frac{1+y_2^2}{2y_1} \quad\text{where}\quad y_1 = y. \]
Exercise 4.17
Consider the functional \(S[y] = \int_{-1}^{1} dx\, y^2\bigl(1-y'\bigr)^2\), y(−1) = 0, y(1) = 1, the smallest value of which is zero. Show that the solution of the Euler-Lagrange equation that minimises this functional is
\[ y(x) = \begin{cases} 0, & -1 \le x \le 0, \\ x, & 0 < x < 1, \end{cases} \]
which has a discontinuous derivative at x = 0. Show that this result is consistent with theorem 4.1 of du Bois-Reymond.
4.6 Strong and Weak variations
In section 4.2.2 we briefly discussed the idea of the norm of a function. Here we show why the choice of the norm is important.
Consider the functional for the distance between the origin and the point (1, 0), on the x-axis,
\[ S[y] = \int_0^1 dx\,\sqrt{1+y'^2}, \qquad y(0) = 0, \quad y(1) = 0. \tag{4.16} \]
It is obvious, and proved in section 3.2, that in the class of smooth functions the stationary path is the segment of the x-axis between 0 and 1, that is y(x) = 0 for 0 ≤ x ≤ 1.
Now consider the value of the functional as the path is varied about y = 0, that is S[εh], where h(x) is first restricted to D¹(0, 1) and then to D⁰(0, 1).
In the first case the norm of h(x) is taken to be
\[ \|h(x)\|_1 = \max_{0\le x\le 1}|h(x)| + \max_{0\le x\le 1}|h'(x)|, \tag{4.17} \]
and without loss of generality we may restrict h to satisfy ‖h‖₁ = 1, so that |h′(x)| ≤ H₁ < 1. On the varied path the value of the functional is
\[ S[\epsilon h] = \int_0^1 dx\,\sqrt{1+\epsilon^2 h'^2} \le \sqrt{1+(\epsilon H_1)^2} \]
and hence
\[ S[\epsilon h] - S[0] \le \sqrt{1+(\epsilon H_1)^2} - 1 = \frac{(\epsilon H_1)^2}{1+\sqrt{1+(\epsilon H_1)^2}} < (\epsilon H_1)^2 < \epsilon^2. \]
Thus if h(x) belongs to D¹(0, 1), S[y] changes by O(ε²) on the neighbouring path and since S[εh] − S[0] > 0 for all ε the straight line path is a minimum.
Now consider the less restrictive norm
\[ \|h(x)\|_0 = \max_{0\le x\le 1}|h(x)|, \tag{4.18} \]
which restricts the magnitude of h, but not the magnitude of its derivative. A suitable path close to y = 0 is given by h(x) = sin nπx, n being a positive integer. Now we have
\[ S[\epsilon h] = \int_0^1 dx\,\sqrt{1+(n\pi\epsilon)^2\cos^2 n\pi x} \ge n\pi\epsilon\int_0^1 dx\,|\cos n\pi x|. \]
But
\[ \int_0^1 dx\,|\cos n\pi x| = 2n\int_0^{1/2n} dx\,\cos n\pi x = \frac{2}{\pi}. \]
Hence S[εh] ≥ 2nε. Thus for any ε > 0 we may choose a value of n to make S[εh] as large as we please, even though the varied path is arbitrarily close to the straight-line path: hence the path y = 0 is not stationary when this norm is used.
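The effect is easy to see numerically; the following Python sketch, offered only as an illustration, evaluates S[εh] for h = sin nπx with a fixed small ε and increasing n.

    import numpy as np
    from scipy.integrate import trapezoid

    # Minimal sketch: with the D^0 norm the varied path eps*sin(n*pi*x) stays
    # uniformly close to y = 0, yet the arclength S[eps*h] grows with n.
    eps = 0.01
    x = np.linspace(0.0, 1.0, 200001)

    for n in (1, 10, 100, 1000):
        yp = eps*n*np.pi*np.cos(n*np.pi*x)              # derivative of eps*sin(n*pi*x)
        S = trapezoid(np.sqrt(1.0 + yp**2), x)
        print(n, S)   # approaches 2*n*eps for large n, although max|eps*h| = eps throughout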
This analysis shows that the definition of the distance between paths is important because different definitions can change the nature of a path; consequently two types of stationary path are defined.
The functional S[y] is said to have a weak stationary path, y_s, if there exists a δ > 0 such that S[y_s + g] − S[y_s] has the same sign for all variations g satisfying ‖g‖₁ < δ.
On the other hand, S[y] is said to have a strong stationary path, y_s, if there exists a δ > 0 such that S[y_s + g] − S[y_s] has the same sign for all variations g satisfying ‖g‖₀ < δ.
A strong stationary path is also a weak stationary path because if ‖g‖₁ < δ then ‖g‖₀ < δ. The converse is not true in general.
It is easier to find weak stationary paths and, fortunately, these are often the most important. The Gateaux differential is defined only for weak variations and, as we have seen, it leads to a differential equation for the stationary path.
Exercise 4.18
In this exercise we give another example of a path satisfying the ‖z‖₀ norm which is arbitrarily close to the line y = 0, but for which S is arbitrarily large.
Consider the isosceles triangle with base AC of length a, height h and base angle α, as shown on the left-hand side of the figure.
Figure 4.2
(a) Construct the two smaller triangles AB₁D and DB₂C by halving the height and width of ABC, as shown on the right. If AB = l and BD = h, show that AB₁ = l/2, 2l = a/cos α and h = l sin α. Hence show that the lengths of the lines AB₁DB₂C and ABC are the same and equal to 2l.
(b) Show that after n such divisions there are 2ⁿ similar triangles of height 2⁻ⁿh and that the total length of the curve is 2l. Deduce that arbitrarily close to AC, the shortest distance between A and C, we may find a continuous curve every point of which is arbitrarily close to AC, but which has any given length.
4.7 Miscellaneous exercises
Exercise 4.19
Show that the Euler-Lagrange equation for the functional
\[ S[y] = \int_0^1 dx\,\bigl(y'^2 - y^2 - 2xy\bigr), \qquad y(0) = y(1) = 0, \]
is y″ + y = −x. Hence show that the stationary function is y(x) = sin x/sin 1 − x.
Exercise 4.20
Consider the functional
\[ S[y] = \int_a^b dx\, F(y, y'), \qquad y(a) = A, \quad y(b) = B, \]
where F(y, y′) does not depend explicitly upon x. By changing the independent variable to u = x − a show that the solution of the Euler-Lagrange equation depends on the difference b − a rather than a and b separately.
Exercise 4.21
Euler's original method for finding solutions of variational problems is described in equation 4.3 (page 122). Consider approximating the functional defined in exercise 4.19 using the polygon passing through the points (0, 0), (1/2, y₁) and (1, 0), so there is one variable y₁ and two segments.
This polygon can be defined by the straight line segments
\[ y(x) = \begin{cases} 2y_1 x, & 0 \le x \le \tfrac{1}{2}, \\ 2y_1(1-x), & \tfrac{1}{2} \le x \le 1. \end{cases} \]
Show that the corresponding polygon approximation to the functional becomes
\[ S(y_1) = \frac{11}{3}y_1^2 - \frac{1}{2}y_1, \]
and hence that the stationary polygon is given by y(1/2) = y₁ = 3/44. Note that this gives y(1/2) ≈ 0.0682 by comparison to the exact value 0.0697.
Exercise 4.22
Find the stationary paths of the following functionals.
(a) \(S[y] = \int_0^1 dx\,\bigl(y'^2 + 12xy\bigr)\), y(0) = 0, y(1) = 2.
(b) \(S[y] = \int_0^1 dx\,\bigl(2y'^2 - y^2 - (1+x)y^2\bigr)\), y(0) = 1, y(1) = 2.
(c) \(S[y] = \frac{1}{2}By(2) + \int_1^2 dx\,\frac{y'^2}{x^2}\), y(1) = A.
(d) \(S[y] = \frac{y(0)^2}{A^3} + \int_0^b dx\,\frac{y}{y'^2}\), y(b) = B², B² > 2Ab > 0.
Hint: for (c) and (d) use the method described in exercise 4.12.
Exercise 4.23
What is the equivalent of the fundamental lemma of the Calculus of Variations in the theory of functions of many real variables?
Exercise 4.24
Find the general solution of the Euler-Lagrange equation corresponding to the functional \(S[y] = \int_a^b dx\, w(x)\sqrt{1+y'^2}\), and find explicit solutions in the special cases w(x) = √x and w(x) = x.
Exercise 4.25
Consider the functional \(S[y] = \int_0^1 dx\,\bigl(y'^2 - 1\bigr)^2\), y(0) = 0, y(1) = A > 0.
(a) Show that the Euler-Lagrange equation reduces to y′² = m², where m is a constant.
(b) Show that the equation y′² = m², with m > 0, has the following three solutions that fit the boundary conditions, y₁(x) = Ax,
\[ y_2(x) = \begin{cases} mx, & 0 \le x \le \dfrac{A+m}{2m}, \\[1ex] A + m(1-x), & \dfrac{A+m}{2m} \le x \le 1, \end{cases} \qquad m > A, \]
and
\[ y_3(x) = \begin{cases} -mx, & 0 \le x \le \dfrac{m-A}{2m}, \\[1ex] A - m(1-x), & \dfrac{m-A}{2m} \le x \le 1, \end{cases} \qquad m > A. \]
Show also that on these solutions the functional has the values
\[ S[y_1] = (A^2-1)^2, \qquad S[y_2] = (m^2-1)^2 \quad\text{and}\quad S[y_3] = (m^2-1)^2. \]
(c) Deduce that if A ≥ 1 the minimum value of S[y] is (A²−1)² and that this occurs on the curve y₁(x), but if A < 1 the minimum value of S[y] is zero and this occurs on the curves y₂(x) and y₃(x) with m = 1.
Exercise 4.26
Show that the following functionals do not have stationary values
(a)
_
1
0
dxy

, (b)
_
1
0
dxyy

, (c)
_
1
0
dxxyy

,
where, in all cases, y(0) = 0 and y(1) = 1.
Exercise 4.27
Show that the Euler-Lagrange equations for the functionals
S1[y] =
_
b
a
dxF(x, y, y

) and S2[y] =
_
b
a
dx
_
F(x, y, y

) +
d
dx
G(x, y)
_
are identical.
4.7. MISCELLANEOUS EXERCISES 141
Exercise 4.28
Show that the functional S[y] =
_
1
1
dxy
2
_
2x y

_
2
, y(1) = 0, y(1) = 1,
achieves its minimum value, zero, when
y(x) =
_
0, 1 x 0,
x
2
, 0 x 1,
which has no second derivative at x = 0. Show that, despite the fact that y

(x)
does not exist everywhere, the Euler-Lagrange equation is satised for x = 0.
Exercise 4.29
The functional S[y] =
_
b
a
dxF(x, y, y

), y(a) = A, y(b) = B, is stationary on


those paths satisfying the Euler-Lagrange equation
d
dx
_
F
y

F
y
= 0, y(a) = A, y(b) = B.
In this formulation of the problem we choose to express y in terms of x: however,
we could express x in terms of y, so the functional has the form
J[x] =
_
B
A
dy G(y, x, x

), x(A) = a, x(B) = b,
where x

= x

(y) = dx/dy.
(a) Show that G(y, x, x

) = x

F(x, y, 1/x

), and that the Euler-Lagrange equation


for this functional,
d
dy
_
G
x

G
x
= 0, x(A) = a, x(B) = b,
when expressed in terms of the original function F is
F
y

x
3
x

1
x

F
yy
F
xy
+Fy = 0
where, for instance, the function F
y
is the dierential of F(x, y, y

) with respect
to y

expressed in terms of x

after dierentiation.
(b) Derive the same result from the original Euler-Lagrange equations for F.
Exercise 4.30
Use the approximation 4.3 (page 122) to show that the equations for the values
of y = (y1, y2, , yn), where x
k+1
= x
k
+, that make S(y) stationary are
S
y
k
=

u
F(z
k
) +

v
F(z
k
)

v
F(z
k+1
) = 0, k = 1, 2, , n,
where z
k
= (x
k
, u, v), u = y
k
, v = (y
k
y
k1
)/ and where y0 = A and yn+1 = B.
Show also that z
k+1
= z
k
+ (1, y

k
, y

k
) +O(
2
), and hence that
S
y
k
=
_
F
u


2
F
xv
y

2
F
uv
y

n

2
F
v
2
_
+O(
2
),
=
_
d
dx
_
F
v
_

F
u
_
+O(
2
),
where F and its derivatives are evaluated at z = z
k
.
Hence derive the Euler-Lagrange equations.
142 CHAPTER 4. THE EULER-LAGRANGE EQUATION
Harder exercises
Exercise 4.31
This exercise is a continuation of exercise 4.21 and uses a set of n variables to
dene the polygon. Take a set of n + 2 equally spaced points on the x-axis,
x
k
= k/(n + 1), k = 0, 1, , n + 1 with x0 = 0 and xn+1 = 1, and a polygon
passing through the points (x
k
, y
k
). Since y(0) = y(1) = 0 we have y0 = yn+1 = 0,
leaving N unknown variables.
Show that the functional dened in exercise 4.19 approximates to
S =
1
h
n

k=0
_
(y
k+1
y
k
)
2
h
2
_
y
2
k
+
2k
n + 1
y
k
__
, h =
1
n + 1
.
(a) For n = 1, the case treated in exercise 4.21, show that this reduces to
S(y1) =
7
2
y
2
1

1
2
y1.
Explain the dierence between this and the previous expression for S(y1), given
in exercise 4.21.
(b) For n = 2 show that this becomes
S =
17
3
y
2
1
+
17
3
y
2
2
6y1y2
2
9
y1
4
9
y2,
and hence that the equations for y1 and y2 are
34y1 18y2 =
2
3
, 34y2 18y1 =
4
3
.
Solve these equations to show that y(1/3) 35/624 0.0561 and y(2/3)
43/624 0.0689. Note that these compare favourably with the exact values,
y(1/3) = 0.0555 and y(2/3) = 0.0682.
Exercise 4.32
Consider the functional S[y] =
_
b
a
dxF(y

) where F(z) is a dierentiable func-


tion and the admissible functions are at least twice dierentiable and satisfy the
boundary conditions y(a) = A1, y(b) = B1, y

(a) = A2 and y

(b) = B2.
(a) Show that the function making S[y] stationary satises the equation
F
y

= c(x a) +d
where c and d are constants.
(b) In the case that F(z) =
1
2
z
2
show that the solution is
y(x) =
1
6
c(x a)
3
+
1
2
d(x a)
2
+A2(x a) +A1,
where c and d satisfy the equations
1
6
cD
3
+
1
2
dD
2
= B1 A1 A2D where D = b a,
1
2
cD
2
+dD = B2 A2.
(c) Show that this stationary function is also a minimum of the functional.
4.7. MISCELLANEOUS EXERCISES 143
Exercise 4.33
The theory described in the text considered functionals with integrands depend-
ing only upon x, y(x) and y

(x). However, functionals depending upon higher


derivatives also exist and are important, for example in the theory of sti beams,
and the equivalent of the Euler-Lagrange equation may be derived using a direct
extension of the methods described in this chapter.
Consider the functional
S[y] =
_
b
a
dxF(x, y, y

, y

), y(a) = A1, y

(a) = A2, y(b) = B1, y

(b) = B2.
Show that the Gateaux dierential of this functional is
S[y, h] =
_
b
a
dx
_
h
F
y
+h

F
y

+h

F
y

_
.
Using integration by parts show that
_
b
a
dxh

F
y

=
_
b
a
dxh
d
2
dx
2
_
F
y

_
being careful to describe the necessary properties of h(x). Hence show that S[y]
is stationary for the functions that satisfy the fourth-order dierential equation
d
2
dx
2
_
F
y

d
dx
_
F
y

_
+
F
y
= 0,
with the boundary conditions y(a) = A1, y

(a) = A2, y(b) = B1, and y

(b) = B2.
Exercise 4.34
Using the result derived in the previous exercise, nd the stationary functions of
the functionals
(a) S[y] =
_
1
0
dx(1 +y
2
), y(0) = 0, y

(0) = y(1) = y

(1) = 1,
(b) S[y] =
_
/2
0
dx
_
y
2
y
2
+x
2
_
, y(0) = 1, y

(0) = y
_

2
_
= 0, y

2
_
= 1.
144 CHAPTER 4. THE EULER-LAGRANGE EQUATION
Chapter 5
Applications of the
Euler-Lagrange equation
5.1 Introduction
In this chapter we solve the Euler-Lagrange equations for two classic problems, the
brachistochrone, section 5.2, and the minimal surface of revolution, section 5.3. These
examples are of historic importance and special because the Euler-Lagrange equations
can be solved in terms of elementary functions. They are also important because they
are relatively simple yet provide some insight into the complexities of variational prob-
lems.
The rst example, the brachistochrone problem, is the simpler of these two prob-
lems and there is always a unique solution satisfying the Euler-Lagrange equation. The
second example is important because it is one of the simplest examples of a minimum
energy problem; but it also illustrates the complexities inherent in nonlinear boundary
value problems and we shall see that there are sometimes two and sometimes no dier-
entiable solutions, depending upon the values of the various parameters. This example
also shows that some stationary paths have discontinuous derivatives and therefore can-
not satisfy the Euler-Lagrange equations everywhere. This eect is illustrated in the
discussion of soap lms in section 5.4 and in chapter 10 is considered in more detail.
In both these cases you may nd the analysis leading to the required solutions com-
plicated. It is, however, important that you are familiar with this type of mathematics
so you should understand the text suciently well to be able to write the analysis in
your own words.
5.2 The brachistochrone
The problem, described previously in section 3.5.1 (page 104), is to nd the smooth
curve joining two given points P
a
and P
b
, lying in a vertical plane, such that a bead
sliding on the curve, without friction but under the inuence of gravity, travels from
P
a
to P
b
in the shortest possible time, the initial speed at P
a
being given. It was
pointed out in section 3.5.1 that John Bernoulli made this problem famous in 1696
145
146 CHAPTER 5. APPLICATIONS OF THE EULER-LAGRANGE EQUATION
and that several solutions were published in 1697: Newtons comprised the simple
statement that the solution was a cycloid, giving no proof. In section 5.2.3 we prove
this result algebraically, but rst we describe necessary preliminary material. In the next
section we derive the parametric equations for the cycloid after giving some historical
background. In section 5.2.2 the brachistochrone problem is formulated in terms of a
functional and the stationary path of this is found in section 5.2.3.
5.2.1 The cycloid
The cycloid is one of a class of curves formed by a point xed on a circle that rolls,
without slipping, on another curve. A cycloid is formed when the xed point is on the
circumference of the circle and the circle rolls on a straight line, as shown in gure 5.1:
other curves with similar constructions are considered in chapter 9. A related curve is
the trochoid where the point tracing out the curve is not on the circle circumference;
clearly dierent types of trochoids are produced depending whether the point is inside
or outside the circle, see exercise 9.18 (page 253).
y
x
D

O
a
C
P
B
A
Figure 5.1 Diagram showing how the cycloid OPD is traced out by a circle
rolling along a straight line.
In gure 5.1 a circle of radius a rolls along the x-axis, starting with its centre on the
y-axis. Fix attention on the point P attached to the circle, initially at the origin O. As
the circle rolls P traces out the curve OPD named the cycloid .
The cycloid has been studied by many mathematicians from the time of Galileo
(1564 1642), and was the cause of so many controversies and quarrels in the 17
th
century that it became known as the Helen of geometers. Galileo named the cycloid
but knew insucient mathematics to make progress. He tried to nd the area between
it and the x-axis, but the best he could do was to trace the curve on paper, cut out the
arc and weigh it, to conclude that its area was a little less than three times that of the
generating circle in fact it is exactly three times the area of this circle, as you can
show in exercise 5.3. He abandoned his study of the cycloid, suggesting only that the
cycloid would make an attractive arch for a bridge. This suggestion was implemented
in 1764 with the building of a bridge with three cycloidal arches over the river Cam in
the grounds of Trinity College, Cambridge, shown in gure 5.2.
The reason why cycloidal arches were used is no longer known, all records and
original drawings having been lost. However, it seems likely that the architect, James
Essex (1722 1784), chose this shape to impress Robert Smith (1689 1768), the Master
of Trinity College, who was keen to promote the study of applied mathematics.
5.2. THE BRACHISTOCHRONE 147
Figure 5.2 Essexs bridge over the Cam, in the grounds of Trinity
college, having three cycloidal arches.
The area under a cycloid was rst calculated in 1634 by Roberval (1602 1675). In
1638 he also found the tangent to the curve at any point, a problem solved at about
the same time by Fermat (1601 1665) and Descartes (1596 1650). Indeed, it was at
this time that Fermat gave the modern denition of a tangent to a curve. Later, in
1658, Wren (1632 1723), the architect of St Pauls Cathedral, determined the length
of a cycloid.
Pascals (1623 1662) last mathematical work, in 1658, was on the cycloid and,
having found certain areas, volumes and centres of gravity associated with the cycloid,
he proposed a number of such questions to the mathematicians of his day with rst and
second prizes for their solution. However, publicity and timing were so poor that only
two solutions were submitted and because these contained errors no prizes were awarded,
which caused a degree of aggravation among the two contenders A de Lalouv`ere (1600
1664) and John Wallis (1616 1703).
At about the time of this contest Huygens (1629 1695) designed the rst pendulum
clock, which was made by Salomon Closter in 1658, but was aware that the period of the
pendulum depended upon the amplitude of the swing. It occurred to him to consider the
motion of an object sliding on an inverted cycloidal arch and he found that the object
reaches the lowest point in a time independent of the starting point. The question
that remained was how to persuade a pendulum to oscillate in a cycloidal, rather than
a circular arc. Huygens now made the remarkable discovery illustrated in gure 5.3.
If one suspends from a point P at the cusp, between two inverted cycloidal arcs PQ
and PR, then a pendulum of the same length as one of the semi-arcs will swing in a
cycloidal arc QSR which has the same size and shape as the cycloidal arcs of which PQ
and PR are parts. Such a pendulum will have a period independent of the amplitude
of the swing.
148 CHAPTER 5. APPLICATIONS OF THE EULER-LAGRANGE EQUATION
P
Q
R
S
T
Figure 5.3 Diagram showing how Huygens cy-
cloidal pendulum, PT, swings between two xed,
similar cycloidal arcs PR and PQ.
Huygens made a pendulum clock with cycloidal jaws, but found that in practice it
was no more accurate than an ordinary pendulum clock: his results on the cycloid
were published in 1673 when his Horologium Oscillatorium appeared
1
. However, the
discovery illustrated in gure 5.3 was signicant in the development of the mathematical
understanding of curves in space.
The equations for the cycloid
The equation of the cycloid is obtained by nding the coordinates of P, in gure 5.1,
after the circle has rolled through an angle , so the length of the longer circular arc PA
is a. Because there is no slipping, OA = PA = a and coordinates of the circle centre
are C = (a, a). The distances PB and BC are PB = a cos and BC = a sin and
hence the coordinates of P are
x = a( sin ), y = a(1 cos ), (5.1)
which are the parametric equations of the cycloid. For || 1, x and y are related
approximately by y = (a/2)(6x/a)
2/3
, see exercise 5.2. The arc OPD is traced out as
increases from 0 to 2.
If, in gure 5.3 the y-axis is in the direction PS, that is pointing downwards, the
upper arc QPR, with the cusp at P is given by these equations with and
it can be shown, see exercise 5.28, that the lower arc is described by x = a( + sin),
y = a(3+cos ), and the same range of . The following three exercises provide practice
in the manipulation of the cycloid equations; further examples are given in exercises 5.26
5.28.
Exercise 5.1
Show that the gradient of the cycloid is given by
dy
dx
=
1
tan(/2)
. Deduce that the
cycloid intersects the x-axis perpendicularly when = 0 and 2.
1
A more detailed account of Huygens work is given in Unrolling Time by J G Yoder (Cambridge
University Press).
5.2. THE BRACHISTOCHRONE 149
Exercise 5.2
By using the Taylor series of sin and cos show that for small ||, x a
3
/6
and y a
2
/2. By eliminating from these equations show that near the origin
y (a/2)(6x/a)
2/3
.
Exercise 5.3
Show that the area under the arc OPD in gure 5.1 is 3a
2
and that the length
of the cycloidal arc OP is s() = 8a sin
2
(/4).
5.2.2 Formulation of the problem
In this section we formulate the variational principle for the brachistochrone by obtain-
ing an expression for the time of passage from given points (a, A) to (b, B) along a curve
y(x).
Dene a coordinate system Oxy with the y-axis vertically upwards and the origin
chosen to make a = B = 0, so the starting point, at (0, A), is on the y-axis and the
nal point is on the x-axis at (b, 0), as shown in gure 5.4.
A
b O
P
s(x)
y
x
Figure 5.4 Diagram showing the curve y(x) through (0, A) and
(b, 0) on which the bead slides. Here s(x) is the distance along
the curve from the starting point to P = (x, y(x)) on it.
At a point P = (x, y(x)) on this curve let s(x) be the distance along the curve from the
starting point, so the speed of the bead is dened to be v = ds/dt. The kinetic energy
of a bead having mass m at P is
1
2
mv
2
and its potential energy is mgy; because the
bead is sliding without friction, energy conservation gives
1
2
mv
2
+mgy = E, (5.2)
where the energy E is given by the initial conditions, E =
1
2
mv
2
0
+ mgA, v
0
being the
initial speed at P
a
= (0, A). Small changes in s are given by s
2
= x
2
+y
2
, and so
_
ds
dt
_
2
=
_
dx
dt
_
2
+
_
dy
dt
_
2
=
_
dx
dt
_
2 _
1 +y

(x)
2
_
. (5.3)
Thus on rearranging equation 5.2 we obtain
_
ds
dt
_
2
=
2E
m
2gy or
dx
dt
_
1 +y

(x)
2
=
_
2E
m
2gy(x). (5.4)
150 CHAPTER 5. APPLICATIONS OF THE EULER-LAGRANGE EQUATION
The time of passage from x = 0 to x = b is given by the integral
T =
_
T
0
dt =
_
b
0
dx
1
dx/dt
.
Thus on re-arranging equation 5.4 to express dx/dt in terms of y(x) we obtain the
required functional,
T[y] =
_
b
0
dx

1 +y
2
2E/m2gy
. (5.5)
This functional may be put in a slightly more convenient form by noting that the energy
and the initial conditions are related by equation 5.2, so by dening the new dependent
variable
z(x) = A +
v
2
0
2g
y(x) we obtain T[z] =
_
b
0
dx

1 +z
2
2gz
. (5.6)
Exercise 5.4
(a) Find the time, T, taken for a particle of mass m to slide down the straight
line, y = Ax, from the point (X, AX) to the origin when the initial speed is v0.
Show that if v0 = 0 this is
T =
_
2X
gA
_
1 +A
2
.
(b) Show also that if the point (X, AX) lies on the circle of radius R and with
centre at (0, R), so the equation of the circle is x
2
+(y R)
2
= R
2
, then the time
taken to slide along the straight line to the origin is independent of X and is given
by
T = 2
_
R
g
.
This surprising result was known by Galileo and seems to have been one reason
why he thought that the solution to the brachistochrone problem was a circle.
Exercise 5.5
Show that the functional dened in equation 5.6 when expressed using z as the
independent variable and if v0 = 0 becomes
T[x] =
1

2g
_
A
0
dz
_
1 +x

(z)
2
z
, x(0) = 0, x(A) = b,
and write down the Euler-Lagrange equation for this functional.
5.2. THE BRACHISTOCHRONE 151
5.2.3 A solution
The integrand of the functional 5.6 is independent of x, so we may use equation 4.13
(page 130) to write Eulers equation in the form
z

F
z

F = constant where F(z, z

) =
_
1 +z
2
z
.
Note that the external constant (2g)
1/2
can be ignored. Since
F
z

=
z

_
z(1 +z
2
)
this gives
z
2
_
z(1 +z
2
)

_
1 +z
2
z
=
1
c
for some positive constant c note that c must be positive because the left-hand side
of the above equation is negative. Rearranging the last expression gives
z
_
1 +z
2
_
= c
2
or
dz
dx
=
_
c
2
z
1. (5.7)
This rst-order dierential equation is separable and can be solved. First, however, note
that because the y-axis is vertically upwards we expect the solution y(x) to decrease
away from x = 0, that is z(x) will increase so we take the positive sign and then
integration gives,
x =
_
dz
_
z
c
2
z
.
Now substitute z = c
2
sin
2
to give
x = 2c
2
_
d sin
2
= c
2
_
d(1 cos 2)
=
1
2
c
2
(2 sin 2) +d and z =
1
2
c
2
(1 cos 2), (5.8)
where d is a constant. Both c and d are determined by the values of A, b and the
initial speed, v
0
. Comparing these equations with equation 5.1 we see that the required
stationary curve is a cycloid. It is shown in chapter 8 that, in some cases, this solution
is a global minimum of T[z].
In the case that the particle starts from rest, v
0
= 0, these solutions give
x = d +
1
2
c
2
(2 sin 2) , y = A
1
2
c
2
(1 cos 2)
where c and d are constants determined by the known end points of the curve.
At the starting point y = A so here = 0 and since x = 0 it follows that d = 0:
because (0) = 0 the particle initially falls vertically downwards. At the nal point of
the curve, x = b, y = 0, let =
b
. Then
2b
c
2
= 2
b
sin 2
b
,
2A
c
2
= 1 cos 2
b
,
giving two equations for c and
b
: we now show that these equations have a unique,
real solution. Consider the cycloid
u = 2 sin 2, v = 1 cos 2, 0 . (5.9)
152 CHAPTER 5. APPLICATIONS OF THE EULER-LAGRANGE EQUATION
The value of
b
is given by the value of where this cycloid intersects the straight line
Au = bv. The graphs of these two curves are shown in the following gure.
0 1 2 3 4 5 6
0.5
1
1.5
2
u
v
cycloid
Au=bv
Figure 5.5 Graph of the cycloid dened in equation 5.9 and
the straight line bv = Au.
Because the gradient of the cycloid at = 0, (u = v = 0), is innite this graph shows
that there is a single value of
b
for all positive values of the ratio A/b. By dividing the
rst of equations 5.9 by the second we see that
b
is given by solving the equation
2
b
sin 2
b
2 sin
2

b
=
b
A
, 0 <
b
< . (5.10)
Unless b/A is small this equation can only be solved numerically. Once
b
is known,
the value of c is given from the equation 2A/c
2
= 1 cos 2
b
, which may be put in the
more convenient form c
2
= A/ sin
2

b
.
Exercise 5.6
Show that if A b then
b
3b/2A and that y/A 1 (x/b)
2/3
.
Exercise 5.7
Use the solution dened in equation 5.8 to show that on the stationary path the
time of passage is
T[z] =
_
2A
g

b
sin
b
.
We end this section by showing a few graphs of the solution 5.8 and quoting some
formulae that help understand them; the rest of this section is not assessed.
In the following gure are depicted graphs of the stationary paths for A = 1 and
various values of b, ranging from small to large, so all curves start at (0, 1) but end at
the points (b, 0), with 0.1 b 4.
5.2. THE BRACHISTOCHRONE 153
1 2 3 4
-1
-0.5
0
0.5
1
x
y
b=0.1
b=0.5
b=/2
Figure 5.6 Graphs showing the stationary paths joining the points
(0, 1) and (b, 0) for b = 0.1, 1/2, 1, /2, 2, 3 and 4.
From gure 5.6 we see that for small b the stationary path is close to that of a straight
line, as would be expected. In this case
b
is small and it was shown in exercise 5.6
that

b
=
3b
2A

9b
3
20A
3
+O(b
5
) and that
y
A
1
_
x
b
_
2/3
.
Also the time of passage is
T =

2A
g
_
1 +
3b
2
8A
2

81b
4
640A
4
+O(b
6
)
_
.
By comparison, if a particle slides down the straight line joining (0, A) to (b, 0), that is
y/A+x/b = 1, so z = Ax/b, then the time of passage is
T
SL
=

2(A
2
+b
2
)
Ag
=
_

2A
g
_
1 +
b
2
2A
2
+O(b
4
)
_
, b A,
b
_
2
Ag
_
1 +
A
2
2b
2
+O(b
4
)
_
, b A.
Thus for, small b, the relative dierence is
T
SL
T = T
b
2
8A
2
+O(b
4
).
Returning to gure 5.6 we see for small b the stationary paths cross the x-axis at
the terminal point. At some critical value of b the stationary path is tangential to the
x-axis at the terminal point. We can see from the equation for x() that this critical
path occurs when y

() = 0, that is when
b
= /2 and, from equation 5.10, we see
that this gives b = A/2. On this path the time of passage is
T =

2

2A
g
and also T
SL
= T
_
1 +
4

2
= 1.185T.
For b > A/2 the stationary path dips below the x-axis and approaches the terminal
point from below. For b A/2 it can be shown that
b
=
_
A/b + O(b
3/2
),
154 CHAPTER 5. APPLICATIONS OF THE EULER-LAGRANGE EQUATION
and that the path is given approximately by
x
b
2
(2 sin 2), y A
b

sin
2
,
and that
T =

2b
g
_
1
_
A
b
+

6
_
A
b
_
3/2
+
_
.
Thus the time of passage increases as

b, compared with the time to slide down the


straight line, which is proportional to b, for large b. Further, the stationary path reaches
its lowest point when = /2, where y = A b/, in other words the distance it falls
below the x-axis is about 1/3 the distance it travels along it, provided b A. That
is, the particle rst accelerates to a high speed, reaching a speed v
_
2gb/, before
slowing to reach the terminal point at speed v =

2gA: on the straight line path the


particle accelerates uniformly to this speed.
Exercise 5.8
Galilieo thought that the solution to the brachistrchrone problem was given by the
circle passing through the initial and nal points, (0, A) and (b, 0), and tangential
to the y-axis at the start point.
Show that the equation of this circle is (x R)
2
+ (y A)
2
= R
2
, where R is
its radius given by 2bR = A
2
+ b
2
. Show also that if x = R(1 cos ) and
y = A Rsin , then the time of passage is
T =
_
R
2g
_

b
0
d
1

sin
where sin
b
=
A
R
=
2Ab
A
2
+b
2
.
If b A show that T
_
2A/g.
5.3 Minimal surface of revolution
The problem is to nd the non-negative, smooth function y(x), with given end points
y(a) = A and y(b) = B, such that the cylindrical surface formed by rotating the curve
y(x) about the x-axis has the smallest possible area. The left-hand side of the following
gure shows the construction of this surface: note that the end discs do not contribute
to the area considered.
x
s
x
y
x
y
(b,B)
(a,A)
Figure 5.7 Diagram showing the construction of a surface of revolution, on the left,
and, on the right, the small segment used to construct the integral 5.11.
5.3. MINIMAL SURFACE OF REVOLUTION 155
This section is divided into three parts. First, we derive the functional S[y] giving the
required area. Second, we derive the equation that a suciently dierentiable function
must satisfy to make the functional stationary. Finally we solve this equation in a
simple case and show that even this relatively simple problem has pitfalls.
5.3.1 Derivation of the functional
An expression for the area of this surface is obtained by rst nding the area of the
edge of a thin disc of width x, shown in the right-hand side of gure 5.7. The small
segment of the boundary curve may be approximated by a straight line provided x is
suciently small, so its length, s, is given by
s =
_
1 +y
2
x +O(x
2
).
The area S traced out by this segment as it rotates about the x-axis is the circumference
of the circle of radius y(x) times s; to order x this is.
S = 2y(x)s = 2y
_
1 +y
2
x.
Hence the area of the whole surface from x = a to x = b is given by the functional
S[y] = 2
_
b
a
dxy
_
1 +y
2
, y(a) = A 0, y(b) = B > 0, (5.11)
with no loss of generality we may assume that A B and hence that B > 0.
Exercise 5.9
Show that the equation of the straight line joining (a, A) to (b, B) is
y =
B A
b a
(x a) +A.
Use this together with equation 5.11 to show that the surface area of the frustum
of the cone shown in gure 5.8 is given by
S = (B +A)
_
(b a)
2
+ (B A)
2
.
Note that the frustum of a solid is that part of the solid lying between two parallel
planes which cut the solid; its area does not include the area of the parallel ends.
y
x
ba
A
B
l
Figure 5.8 Diagram showing the frustum of a cone, the unshaded area. The
slant-height is l and the radii of the circular ends are A and B.
Show further that this expression may be written in the form (A +B)l where l
is the length of the slant height and A and B are the radii of the end circles.
156 CHAPTER 5. APPLICATIONS OF THE EULER-LAGRANGE EQUATION
The following exercise may seem a non-sequitur, but it illustrates two important points.
First, it shows how a simple version of Eulers method, section 4.2, can provide a useful
approximation to a functional. Second, it shows how a very simple approximation can
capture the essential, quite complicated, behaviour of a functional: this is important
because only rarely can the Euler-Lagrange equation be solved exactly. In particular
it suggests that in the simple case A = B, with y(x) dened on |x| a, there are
stationary paths only if A/a is suciently large and then there are two stationary
paths.
Exercise 5.10
Consider the case A = B and with a x a, so the functional 5.11 becomes
S[y] = 2
_
a
a
dxy
_
1 +y
2
, y(a) = A > 0.
(a) Assume that the required stationary paths are even and use a variation of
Eulers method, described in section 4.2.1, by assuming that
y(x) = +
A
a
x, 0 x a
where is a constant, to derive an approximation, S(), for S[y].
(b) By dierentiating this expression with respect to show that S() is station-
ary if = =
_
A

A
2
2a
2
_
/2, and deduce that no such solutions exist if
A < a

2. Note that the exact calculation, described below, shows that there are
no continuous stationary paths if A < 1.51a.
(c) Show that if A > a

2 the two stationary values of S satisfy S() > S(+)


(d) If A a show that the two values of are given approximately are by
+ = A
a
2
2A
+ and =
a
2
2A
_
1 +
a
2
2A
2
+
_
,
and nd suitable approximations for the associated stationary paths. Show also
that the stationary values of S are given approximately by S() 2A
2
and
S(+) 4Aa, and give a physical interpretation of these values.
5.3.2 Applications of the Euler-Lagrange equation
The integrand of the functional 5.11 does not depend explicitly upon x, hence the rst-
integral of the Euler-Lagrange equation 4.13 (page 130) may be used. In this case we
may take the integrand to be G(y, y

) = y
_
1 +y
2
so that
G
y

=
yy

_
1 +y
2
and y

G
y

G =
y
_
1 +y
2
.
Hence the Euler-Lagrange equation integrates to
y
_
1 +y
2
= c, y(a) = A 0, y(b) = B > 0, (5.12)
5.3. MINIMAL SURFACE OF REVOLUTION 157
for some constant c; since y(b) > 0 we may assume that c is positive. By squaring and
re-arranging this equation we obtain the simpler rst-order equation
dy
dx
=
_
y
2
c
2
c
, y(a) = A 0, y(b) = B > 0. (5.13)
The solutions of equation 5.13, if they exist, ensure that the functional 5.11 is stationary.
We shall see, however, that suitable solutions do not always exist and that when they
do further work is necessary in order to determine the nature of the stationary point.
5.3.3 The solution in a special case
Here we solve the rst-order dierential equation 5.13 when the ends of the cylinder
have the same radius, that is A = B > 0: in this case it is convenient to put b = a,
so that the origin is at the centre of the cylinder which has length 2a. Now there
are two independent parameters, the lengths a and A; since there are no other length
scales we expect the solution to depend upon a single, dimensionless parameter, which
may be taken to be the ratio A/a. If B = A, there are two independent dimensionless
parameters, A/a and B/a for instance, and this makes understanding the behaviour
of the solutions more dicult. However, even the seemingly simple case A = B has
surprises in store and so provides an indication of the sort of diculties that may
be encountered with variational problems: such diculties are typical of nonlinear
boundary value problems. Because the following analysis involves several strands, you
will probably understand it more easily by re-writing it in your own words.
The ends have the same radius so it is convenient to introduce a symmetry by re-
dening a and putting the cylinder ends at x = a. This change, which is merely a
shift along the x-axis, does not aect the dierential equation 5.13 (because its right-
hand side is independent of x); but the boundary conditions are slightly dierent. If we
denote the required solution by f(x), then, from equation 5.13 we see that it satises
the dierential equation and boundary conditions,
df
dx
=
_
f
2
c
2
c
, f(a) = f(a) = A > 0. (5.14)
The identity cosh
2
z sinh
2
z = 1 suggests changing the dependent variable from f
to , where f = c cosh. This gives the simpler equation cd/dx = 1 with solution
c = x for some real constant . Hence the general solution
2
is
f(x) = c cosh
_
x
c
_
.
The boundary conditions give
A
c
= cosh
_
+a
c
_
= cosh
_
a
c
_
, that is sinh

c
sinh
a
c
= 0.
Since a = 0, the only way of satisfying this equation is to set = 0, which gives
f(x) = c cosh
_
x
c
_
with c determined by A = c cosh
_
a
c
_
. (5.15)
2
Another solution is f(x) = c in the special case that c = A; however, this solution is not a solution
of the original Euler-Lagrange equation, see the discussion in section 4.4, in particular exercise 4.8.
158 CHAPTER 5. APPLICATIONS OF THE EULER-LAGRANGE EQUATION
Notice that f(0) = c, so c is the height of the curve at the origin, where f(x) is
stationary; also, because = 0 the solution is even. The required solutions are obtained
by nding the real values of c satisfying this equation. Unfortunately, the equation
A = c cosh(a/c) cannot be inverted to express c in terms of known functions of A.
Numerical solutions may be found, but rst it is necessary to determine those values of
a and A for which real solutions exist.
A convenient way of writing this equation is to introduce a new dimensionless vari-
able = a/c so we may write the equation for c in the form
A
a
= g() where g() =
1

cosh. (5.16)
This equation shows directly that depends only upon the dimensionless ratio A/a. In
terms of and A the solution 5.15 becomes
f(x) =
a

cosh
_

x
a
_
= A
cosh (x/a)
cosh
. (5.17)
The stationary solutions are found by solving the equation A/a = g() for . The
graph of g(), depicted in gure 5.9, shows that g() has a single minimum and that for
A/a > min(g) there are two real solutions,
1
and
2
, with
1
<
2
, giving the shapes
f
1
(x) and f
2
(x) respectively.
0 1 2 3 4
2
4
6
8
10
g()

1

2
A/a

Figure 5.9 Graph of g() =


1
cosh showing the solu-
tions of the equation g() = A/a.
This graph also suggests that g() as 0 and ; this behaviour can be veried
with the simple analysis performed in exercise 5.12, which shows that
g()
1

for 1 and g()


e

2
for 1.
The minimum of g() is at the real root of tanh = 1, see exercise 5.13; this may be
found numerically, and is at
m
1.200, and here g(
m
) = 1.509. Hence if A < 1.509a
there are no real solutions of equation 5.16, meaning that there are no functions with
continuous derivatives making the area stationary. For A > 1.509a there are two
real solutions giving two stationary values of the functional 5.11; we denote these two
solutions by
1
and
2
with
1
<
2
. Because there is no upper bound on the area
neither solution can be a global maximum. Recall that in exercise 5.10 it was shown
that a simple polygon approximation to the stationary path did not exist if A < a

2
and there were two solutions if A > a

2.
5.3. MINIMAL SURFACE OF REVOLUTION 159
The following graph shows values of the dimensionless area S/a
2
for these two sta-
tionary solutions as functions of A/a when A/a g(
m
) 1.509. The area associated
with the smaller root,
1
, is denoted by S
1
, with S
2
denoting the area associated with

2
. These graphs show that S
2
> S
1
for A > ag(
m
) 1.51a.
1.5 1.75 2 2.25 2.5 2.75 3
20
30
40
50
60
A/a
S/a
2
S
2
/a
2
S
1
/a
2
Figure 5.10 Graphs showing how the dimensionless area
S/a
2
varies with A/a.
It is dicult to nd simple approximations for the area S[f] except when A a, in
which case the results obtained in exercise 5.12 and 5.13 may be used, as shown in the
following analysis. We consider the smaller and larger roots separately.
If A a the smaller root,
1
is seen from gure 5.9 to be small. The approximation
developed in exercise 5.12 gives
1
a/A so that equation 5.17 becomes
f
1
(x) Acosh(x/A) A,
since |x| a A and cosh(x/A) 1. Because f
1
(x) is approximately constant the
original functional, equation 5.11, is easily evaluated to give
S
1
= S[f
1
] = 4aA or
S
1
a
2
= 4
A
a
.
The latter expression is the equation of the approximately straight line seen in g-
ure 5.10. The area S
1
is that of the right circular cylinder formed by joining the ends
with parallel lines.
For the larger root,
2
, since cosh e

/2, for large , equation 5.16 for becomes,


see exercise 5.12
A
a
=
1
2
e

(5.18)
and
f
2
(x) Aexp
_

2
a
(a x)
_
+Aexp
_

2
a
(a +x)
_
,

2
a
1.
For positive x the second term is negligible (because
2
1) provided x
2
a. For
negative x the rst term is negligible, for the same reason. Hence an approximation for
f
2
(x) is
f
2
(x) Aexp
_

2
a
(a |x|)
_
provided |x|
2
a. (5.19)
The behaviour of this function as is discussed after equation 5.20. In exer-
cise 5.12 it is shown that the area is given by
S
2
= S[f
2
] 2A
2
or
S
2
a
2
= 2
_
A
a
_
2
,
160 CHAPTER 5. APPLICATIONS OF THE EULER-LAGRANGE EQUATION
which is the same as the area of the cylinder ends. The latter expression increases
quadratically with A/a, as seen in gure 5.10.
These approximations show directly that if A a then S
2
> S
1
, conrming the
conclusions drawn from gure 5.10. They also show that when A a the smallest area
is given when the surface of revolution approximates that of a right circular cylinder.
In the following three gures we show examples of these solutions for A = 2a,
A = 10a and A = 100a. In the rst example, on the left, the ratio A/a = 2 is only
a little larger than min(g()) 1.509, but the two solutions dier substantially, with
f
1
(x) already close to the constant value of A for all x. In the two other gures the
ratio A/a is larger and now f
1
(x) is indistinguishable from the constant A, while f
2
(x)
is relatively small for most values of x.
0 0.25 0.5 0.75 1
0.25
0.5
0.75
1
0 0.25 0.5 0.75 1
0.25
0.5
0.75
1
0 0.25 0.5 0.75 1
0.25
0.5
0.75
1
f
1
(x) /A
f
2
(x) /A
f
2
(x) /A
f
2
(x) /A
f
1
(x) /A
f
1
(x) /A
x/a
x/a x/a
A=2a
A=10a A=100a
Figure 5.11 Graphs showing the stationary solutions f(x)/A = cosh(x/a) as a function of x/a
and for various values of A/a, with a = 1.
These gures and the preceding analysis show that when the ends are relatively close,
that is A/a large, f
1
(x) A, for all x, and that as A/a , f
2
(x) tends to the
function
f
2
(x) f
G
(x) =
_
0, |x| < a,
A, |x| = a.
(5.20)
This result may be derived from the approximate solution given in equation 5.19. Con-
sider positive values of x, with x
2
a. If x = a(1 ), where is a small positive
number, then
f
2
(x) Ae
2
.
But from equation 5.18 ln(A/a) = ln(2) and if 1, ln(2) , so ln(A/a)
and the above approximation for f
2
(x) becomes
f
2
(x)
A
=
_
a
A
_

, x = a(1 ).
Hence, provided > 0, that is x = a, f
2
/A 0 as A/a .
The surface dened by the limiting function f
G
(x) comprises two discs of radius A, a
distance 2a apart, so has area S
G
= 2A
2
, independent of a. Since this limiting solution
has discontinuous derivatives at x = a it is not an admissible function. Nevertheless
it is important because if A < ag(
m
) 1.509a it can be shown that this surface gives
the global minimum of the area and, as will be seen in the next subsection, has physical
signicance. This solution to the problem was rst found by B C W Goldschmidt in
1831 and is now known as the Goldschmidt curve or Goldschmidt solution.
5.3. MINIMAL SURFACE OF REVOLUTION 161
5.3.4 Summary
We have considered the special case where the ends of the cylinder are at x = a and
each end has the same radius A; in this case the curve y = f(x) is symmetric about
x = 0 and we have obtained the following results.
1. If the radius of the ends is small by comparison to the distance between them,
A < ag(
m
) 1.509a, there are no curves described by dierentiable functions
making the traced out area stationary. In this case it can be shown that the
smallest area is given by the Goldschmidt solution, f
G
(x), dened in equation 5.20,
and that this is the global minimum.
2. If A > 1.51a there are two smooth stationary curves. One of these approaches
the Goldschmidt solution as A/a and the other approaches the constant
function f(x) A in this limit, and this gives the smaller area. This solution is
a local minimum of the functional, as will be shown in chapter 8.
The nature of the stationary solutions is not easy to determine. In the following graph
we show the areas S
1
/a
2
and S
2
/a
2
, as in gure 5.10 and also, with the dashed lines,
the areas given by the Goldschmidt solution, S
G
/a
2
= 2(A/a)
2
, curve G, and the area
of the right circular cylinder, S
c
/a
2
= 4A/a, curve c.
1.5 1.75 2 2.25 2.5 2.75 3
20
30
40
50
60
S
2
/a
2
G
c
S
1
/a
2
A/a
S/a
2
Figure 5.12 Graphs showing how the dimensionless area S/a
2
varies
with A/a. Here the curves k, k = 1, 2, denote the area S
k
/a
2
as in gure 5.10; G the scaled area of the Goldschmidt curve,
S
G
= 2(A/a)
2
and c the scaled area of the cylinder, 4A/a.
If A > ag(
m
) 1.509a it will be shown in chapter 8 that S
1
is a local minimum of
the functional. The graphs shown in gure 5.12 suggest that for large enough A/a,
S
1
< S
G
, but for smaller values of A/a, S
G
< S
1
. The value of at which S
G
= S
1
is
given by the solution of 1 +e
2
= 2, see exercise 5.14. The numerical solution of this
equation gives = 0.639 at which A = 1.8945a. Hence if A < 1.89a the Goldschmidt
curve yields a smaller area, even though S
1
is a local minimum. For A > 1.89a, S
1
gives the smallest area.
This relatively simple example of a variational problem provides some idea of the
possible complications that can arise with nonlinear boundary value problems.
Exercise 5.11
(a) If f(x) = c cosh(x/c) show that
S[f]
a
2
=
2

2
( + sinh cosh ) , =
a
c
.
(b) Show that S[f] considered as function of is stationary at the root of tanh = 1.
162 CHAPTER 5. APPLICATIONS OF THE EULER-LAGRANGE EQUATION
Exercise 5.12
(a) Use the expansion cosh = 1 +
1
2

2
+O(
4
) to show that, for small , g() =
1/ + /2 + O(
3
), where g() is dened in equation 5.16. Hence show that if
A a then a/A and hence that c A and f(x) A. Using the result
obtained in the previous exercise, or otherwise, show that S1 = 4Aa.
(b) Show that if 2 is large the equation dening it is given approximately by
A
a

1
2
e

and, using the result obtained in the previous exercise, that


S2
a
2
2
_
e

2
_
2
+
2

2
_
e

2
_
2
, ( = 2).
Exercise 5.13
(a) Show that the position of the minimum of the function g() =
1
cosh ,
> 0, is at the real root, m, of tanh = 1.
By sketching the graphs of y = 1/ and y = tanh, for > 0, show that the
equation tanh = 1 has only one real root.
(b) If a/c = m and A/a = g(m) use the result derived in exercise 5.11 to
show that the area of the cylinder formed is Sm = 2A
2
m, and that Sm/a
2
=
2
1
m
cosh
2
m.
Exercise 5.14
Use the result derived in exercise 5.11 to show that SG = S1 when satises
the equation cosh
2
= + sinh cosh . Show that this equation simplies to
1 +e
2
= 2 and that there is only one positive root, given by = 0.639232.
Exercise 5.15
(a) Show that the functional
S[y] =
_
1
1
dx
_
y (1 +y
2
), y(1) = y(1) = A > 0,
is stationary on the two paths
y(x) =
1
4c
2
_
4c
4
+x
2
_
where c
2
= c
2

=
1
2
_
A
_
A
2
1
_
.
In the following these solutions are denoted by y(x).
(b) Show that on these stationary paths
S[y] = 2c +
1
6c
3
,
and deduce that when A > 1, S[y] > S[y+], and that when A = 1, S[y] = 4

2/3.
Show also that if A 1
S[y]
4
3
A
3/2
and S[y+] 2

A.
5.4. SOAP FILMS 163
(c) Find the value of S[y] for the function
y

(x) =
_
_
_
0, 0 x < 1 , 0 < 1,
A
A

(1 x), 1 x 1.
Show that as 0, y

(x) fG(x), the Goldschmidt curve dened in equa-


tion 5.20. Show also that
lim
0
S [y

] = S[fG] =
4
3
A
3/2
.
5.4 Soap Films
An easy way of forming soap lms is to dip a loop of wire into soap solution and then to
blow on it. Almost everyone will have noticed the initial at soap lm bounded by the
wire forms a segment of a sphere when blown. It transpires that there is a very close
connection between these surfaces and problems in the Calculus of Variations. The exact
physics of soap lms is complicated, but a fairly simple and accurate approximation
shows that the shapes assumed by soap lms are such as to minimise their areas, because
the surface-tension energy is approximately proportional to the area and equilibrium
positions are given by the minimum of this energy. Thus, in some circumstances the
shapes given by the minimum surface of revolution, described above, are those assumed
by soap lms.
The study of the formation and shapes of soap lms has a very distinguished pedi-
gree: Newton, Young, Laplace, Euler, Gauss, Poisson are some of the eminent scientists
and mathematicians who have studied the subject. Here we cannot do the subject jus-
tice, but the interested reader should obtain a copy of Isenbergs fascinating book
3
.
The essential property is that a stable soap lm is formed in the shape of a surface of
minimal area that is consistent with a wire boundary.
Probably the simplest example is that of a soap lm supported by a circular loop of
wire. If we distort it by blowing on it gently to form a portion of a sphere, when we stop
blowing the surface returns to its previous shape, that is a circular disc. Essentially this
is because in each case the surface-tension energy, which is proportional to the area, is
smallest in the assumed conguration.
Imagine a framework comprising two identical circular wires of radius A, held a
distance 2a apart (like wheels on an axle), as in gure 5.13 below. What shape soap
lm can such a frame support? These gures illustrate the alternatives suggested by
the analysis of the previous section and agree qualitatively with the solutions one would
intuitively expect.
The left-hand conguration (large separation), with two distinct surfaces, is the
Goldschmidt solution, equation 5.20, and it gives an absolute minimum area if A <
1.89a. The shape on the right is a catenoid of revolution and represents the absolute
minimum if A > 1.89a. It is a local minimum if 1.51a < A < 1.89a and does not exist
if A < 1.51a. When 1.51a < A < 1.89a the catenoid is unstable and we have only
to disturb it slightly, by blowing on it for instance, and it may suddenly jump to the
Goldschmidt solution which has a smaller area, as seen in gure 5.12.
3
The Science of Soap Films and Soap Bubbles, by C Isenberg (Dover 1992).
164 CHAPTER 5. APPLICATIONS OF THE EULER-LAGRANGE EQUATION
2a
A
2a
A
Figure 5.13 Diagrams showing two congurations assumed by soap lms on two rings of radius
A and a distance 2a appart. On the left, 1.89a > A, the soap lm simply lls the two circular
wires because they are too far apart: this is the Goldschmidt solution, equation 5.20. On the right
1.51a < A the soap lm joins the two rings in the shape dened by equation 5.17 with =
1
.
The methods discussed previously provide the shape of the right-hand lm, but the
matter of determining whether these stationary positions are extrema, local or global,
is of a dierent order of diculty. The complexity of this physical problem is further
compounded when one realises that there can be minimum energy solutions of a quite
unexpected form. The following diagram illustrates a possible conguration of this kind.
We do not expect the theory described in the previous section to nd such a solution
because the mathematical formulation of the physical problem makes no allowance for
this type of behaviour.
2a
Figure 5.14 Diagram showing a possible soap lm. In this example a circular
lm, perpendicular to the axis, is formed in the centre and this is joined to
both outer rings by a catenary.
The relationship between soap lms and some problems in the Calculus of Variations
can certainly add to our intuitive understanding, but this example should provide a
salutary warning against dependence on intuition.
Examples of the complex shapes that soap lms can form, but which are dicult
to describe mathematically, are produced by dipping a wire frame into a soap solution.
Photographs of the varied shapes obtained by cubes and tetrahedrons are provided in
Isenbergs book.
Here we describe a conceptually simple problem which is dicult to deal with math-
ematically, but which helps to understand the diculties that may be encountered with
certain variational problems. Further, this example has potential practical applications.
Consider the soap lm formed between two clear, parallel planes joined by a number
of pins, of negligible diameter, perpendicular to the planes. When dipped into a soap
5.4. SOAP FILMS 165
solution the resulting lm will join the pins in such a manner as to minimise the length
of lm, because the surface tension energy is proportional to the area, which is propor-
tional to the length of lm. In gure 5.15 we show three cases, viewed from above, with
two and three pins.
In panel A there are two pins: the natural shape for the soap lms is the straight line
joining them. In panels B and C there are three pins and two dierent congurations
are shown which, it transpires, are the only two allowed; but which of the pair is actually
assumed depends upon the relative positions of the pins.
A
B C
Figure 5.15 Diagram showing possible congurations
of soap lms for two and three pins.
The reason for this follows from elementary geometry and the application of one of
Plateaus (1801 1883)
4
three geometric rules governing the shapes of soap lms, which
he inferred from his experiments. In the present context the relevant rule is that three
intersecting planes meet at equal angles of 120

: this is a consequence of the surface


tension forces in each plane being equal. Plateaus other two rules are given by Isenberg
(1992, pages 83 4).
We can see how this works, and some of the consequences for certain problems in
the Calculus of Variations, by xing two points, a and b, and allowing the position of
the third point to vary. The crucial mathematical result needed is Proposition 20 of
Euclid
5
, described next.
Euclid: proposition 20
The angle subtended by a chord AB at the centre of
the circle, at O, is twice the angle subtended at any
point C on the circumference of the circle, as shown
in the gure. This is proved using the properties of
similar triangles.
A
B
C
O
2

With this result in mind draw a circle through the points a and b such that the angle
subtended by ab on the circumference is 120

, gures 5.16 and 5.17. If L is the distance


between a and b the radius of this circle is R = L/

3. The orientation of this circle is


chosen so the third point is on the same side of the line ab as the 120

angle.
Then for any point c outside this circle the shortest set of lines is obtained by joining
c to the centre of the circle, O, and if c

is the point where this line intersects the circle,


4
Joseph Plateau was a Belgian physicist who made extensive studies of the surface properties of
uids.
5
See Euclids Elements, Book III.
166 CHAPTER 5. APPLICATIONS OF THE EULER-LAGRANGE EQUATION
see gure 5.16, the lines cc

, ac

and c

b are the shortest set of lines joining the three


points a, b and c.
o
120
O
a b
c
c
Figure 5.16 Diagram of the shortest
length for a point c outside the circle.The
point O is the centre of the circle.
>120
o
a b
c
Figure 5.17 Diagram of the shortest
length for a point c inside the circle.
If the third point c is inside this circle the shortest line joining the points comprises
the two straight line segments ac and cb, as shown in gure 5.17. This result can be
proved, see Isenberg (1992, pages 67 73) and also exercise 5.16.
As the point c moves radially from outside to inside the circle the shortest cong-
uration changes its nature: this type of behaviour is generally dicult to predict and
may cause problems in the conventional theory of the Calculus of Variations.
If more pins join the parallel planes the soap lm will form congurations making
the total length a local minimum; there are usually several dierent minimum congu-
rations, and which is found depends upon a variety of factors, such as the orientation of
the planes when extracted from the soap solution. The problem of minimising the total
length of a path joining n points in a plane was rst investigated by the Swiss mathe-
matician Steiner (1796 1863) and such problems are now known as Steiner problems.
The mathematical analysis of such problems is dicult. One physical manifestation of
this type of situation is the laying of pipes between a number of centres, where, all else
being equal, the shortest total length of pipe is desirable.
Exercise 5.16
Consider the three points, O, A and C, in the Cartesian plane with coordinates
O = (0, 0), A = (a, 0) and C = (c, d) and where the angle OAC is less than 120

.
Consider a point X, with coordinates (x, y) inside the triangle OAC. Show that
the sum of the lengths OX, AX and CX is stationary and is a minimum when
the angles between the three lines are all equal to 120

.
Exercise 5.17
Consider the case where four pins are situated at the corners of a square with side
of length L.
(a) One possible conguration of the soap lms is for them to lie along the two
diagonals, to form the cross . Show that the length of the lms is 2

2 L = 2.83L.
(b) Another conguration is the H-shape, . Show that the length of lm is 3L.
(c) Another possible conguration is, , where the angle between three intersect-
ing lines is 120

. Show that the length of lm is (1 +

3)L = 2.73L.
5.4. SOAP FILMS 167
Exercise 5.18
Consider the conguration of four pins forming a rectangle
with sides of length L and aL.
(a) For the case shown in the top panel, a > 1, show that
total line length is d1 = L(a +

3) and that for the case in the


bottom panel, a < 1, it is d2 = L(1 +a

3).
(b) Show that the minimum of these two lengths is d1 if a > 1
and d2 if a < 1.
L
L
a<1
aL, a>1
168 CHAPTER 5. APPLICATIONS OF THE EULER-LAGRANGE EQUATION
5.5 Miscellaneous exercises
Exercise 5.19
Show that the Euler-Lagrange equation for the minimal surface of revolution on
the interval 0 x a with the boundary conditions y(0) = 0, y(a) = A > 0, has
no solution.
Note that in this case the only solution is the Goldschmidt curve, equation 5.20,
page 160.
Exercise 5.20
Show that the functional giving the distance between two points on a sphere of
radius r, labelled by the spherical polar coordinates (a, a) and (
b
,
b
) can be
expressed in either of the forms
S = r
_

b
a
d
_
1 +

()
2
sin
2
or S = r
_

b
a
d
_

()
2
+ sin
2

giving rise to the two equivalent Euler-Lagrange equations, respectively,

sin
2
= c
_
1 +

()
2
sin
2
, (a) = a, (
b
) =
b
,
where c is a constant, and

sin 2
2
cos sin
2
cos = 0, (a) = a, (
b
) =
b
.
Both these equations can be solved, but this task is made easier with a sensible
choice of orientation. The two obvious choices are:
(a) put the initial point at the north pole, so a = 0 and a is undened, and
(b) put both points on the equator, so a =
b
= /2, and we may also choose
a = 0.
Using one of these choices show that the stationary paths are great circles.
Exercise 5.21
Consider the minimal surface problem with end points Pa = (0, A) and P
b
= (b, B),
where b, A and B are given and A B.
(a) Show that the general solution of the appropriate Euler-Lagrange equation is
y = c cosh
_
x
c
_
,
where and c are real constants with c > 0. Show that if c = b the boundary
conditions give the following equation for
B = f() where f(x) = Acosh(1/x)
_
A
2
x
2
sinh(1/x)
and A = A/b, B = B/b, with 0 A.
(b) Show that for x A and x A the function f(x) behaves, respectively, as
f(x)
x
2
4A
e
1/x
and f(x) Acosh(1/A)
_
2A(A x) sinh(1/A).
Deduce that f(x) has at least one minimum in the interval 0 < x < A and that
the equation B = f() has at least two roots for suciently large values of B and
none for small B.
5.5. MISCELLANEOUS EXERCISES 169
(c) If A 1 show that the minimum value of f(x) occurs near x = A and that
min(f)
1
2
Aexp
_
1
A
_
. Deduce that if A 1 there are two solutions of the
Euler-Lagrange equation if B >
1
2
Aexp
_
1
A
_
, approximately, otherwise there are
no solutions.
Exercise 5.22
(a) For the brachistochrone problem suppose that the initial and nal points of
the curve are (x, y) = (0, A) and (b, 0), respectively, as in the text, but that the
initial speed, v0, is not zero.
Show that the parametric equations for the stationary path are
x = d +
1
2
c
2
(2 sin 2), z = c
2
sin
2
, y = A+
v
2
0
2g
z,
where 0
b
, for some constants c, d, 0 and
b
. Show that these four
constants are related by the equations
sin
2
0 = k
2
sin
2

b
, k
2
=
v
2
0
v
2
0
+ 2gA
< 1,
b =
v
2
0
4gk
2
sin
2

b
_
(2
b
sin 2
b
) (20 sin 20)
_
,
c
2
sin
2

b
= A +
v
2
0
2g
.
(b) If v
2
0
Ag, show that k is small and nd an approximate solution for these
equations. Note, this last part is technically demanding.
Exercise 5.23
In this exercise you will show that the cycloid is a local minimum for the brachis-
tochrone problem using the functional found in exercise 5.5. Consider the varied
path x(z) +h(z) and show that (ignoring the irrelevant factor 1/

2g )
T[x +h] T[x] =

2
2
_
A
0
dz
h

(z)
2

z(1 +x
2
)
3/2
+O(
3
),
=
2
c
_

A
0
dh

(z)
2
cos
4
,
where z(x) is the stationary path, given parametrically by z = c
2
sin
2
, x =
1
2
c
2
(2sin 2) and where A = c
2
sin
2
A. Deduce that T[x+h] > T[x], for || > 0
and all h(x), and hence that the stationary path is actually a local minimum.
Exercise 5.24
The Oxy-plane is vertical with the Oy-axis vertically upwards. A straight line
is drawn from the origin to the point P with coordinates (x, f(x)), for some
dierentiable function f(x). Show that the time taken for a particle to slide
smoothly from P to the origin is
T(x) = 2
_
x
2
+f(x)
2
2gf(x)
.
170 CHAPTER 5. APPLICATIONS OF THE EULER-LAGRANGE EQUATION
By forming a dierential equation for f(x), and solving it, show that T(x) is
independent of x if f satises the equation x
2
+(f )
2
=
2
, for some constant .
Describe the shape of the curve dened by this equation.
Exercise 5.25
A cylindrical shell of negligible thickness is formed by rotating the curve $y(x)$, $a \le x \le b$, about the $x$-axis. If the material is uniform with density $\rho$ the moment of inertia about the $x$-axis is given by the functional
\[
I[y] = 2\pi\rho\int_a^b dx\, y^3\sqrt{1 + y'^2}, \quad y(a) = A, \quad y(b) = B,
\]
where $A$ and $B$ are the radii of the ends and are given.

(a) In the case $A = B$ and with the end points at $x = \pm a$ show that $I[y]$ is stationary on the curve $y = c\cosh\phi(x)$ where $\phi(x)$ is given implicitly by
\[
\frac{x}{c} = \int_0^{\phi}\frac{dv}{\sqrt{1 + \cosh^2 v + \cosh^4 v}},
\]
and the constant $c$ is given by $A = c\cosh\phi_a$ where $\phi_a = \phi(a)$ is given by the solution of the equation
\[
\frac{a}{A} = f(\phi_a) \quad\text{where}\quad f(z) = \frac{1}{\cosh z}\int_0^z\frac{dv}{\sqrt{1 + \cosh^2 v + \cosh^4 v}}.
\]
(b) Show that for small and large $z$
\[
f(z) \simeq \begin{cases} \dfrac{z}{\sqrt{3}} + O(z^3), \\[1.5ex] 2\gamma e^{-z}, \end{cases}
\qquad \gamma = \int_0^{\infty}\frac{dv}{\sqrt{1 + \cosh^2 v + \cosh^4 v}}.
\]
Hence show that for $a/A \ll 1$ there are two solutions. Show, also, that there is a critical value of $a/A$ above which there are no appropriate solutions of the Euler-Lagrange equation.
Problems on cycloids

Exercise 5.26
The cycloid OPD of figure 5.1 (page 146) is rotated about the $x$-axis to form a solid of revolution. Show that the surface area, $S$, and volume, $V$, of this solid are
\[
S = 2\pi\int_0^{2\pi} d\theta\, y\,\frac{ds}{d\theta} = 4\pi a^2\int_0^{2\pi} d\theta\,(1 - \cos\theta)\sin(\theta/2) = \frac{64}{3}\pi a^2,
\]
\[
V = \pi\int_0^{2\pi} d\theta\, y^2\,\frac{dx}{d\theta} = \pi a^3\int_0^{2\pi} d\theta\,(1 - \cos\theta)^3 = 5\pi^2 a^3.
\]
Exercise 5.27
The half cycloid with parametric equations $x = a(\psi - \sin\psi)$, $y = a(1 - \cos\psi)$ with $0 \le \psi \le \pi$ is rotated about the $y$-axis to form a container.

(a) Show that the surface area, $S(\psi)$, and volume, $V(\psi)$, are given by
\[
S(\psi) = 4\pi a^2\int_0^{\psi} d\phi\,(\phi - \sin\phi)\sin(\phi/2), \qquad
V(\psi) = \pi a^3\int_0^{\psi} d\phi\,(\phi - \sin\phi)^2\sin\phi.
\]
(b) Show that for small $x$ these integrals are approximated by
\[
S(x) = \frac{2\pi}{5}\,6^{2/3}a^{1/3}x^{5/3} + O(x^{7/3}) \quad\text{and}\quad V(x) = \frac{\pi}{8}\,6^{2/3}a^{1/3}x^{8/3} + O(x^{10/3}).
\]
(c) Find the general expressions for $S(\psi)$ and $V(\psi)$ and their values at $\psi = \pi$.
Exercise 5.28
This exercise shows that the arc QST in figure 5.3 (page 148) is a cycloid, a result discovered by Huygens and used in his attempt to construct a pendulum with period independent of its amplitude for use in a clock.
Consider the situation shown in figure 5.18, where the arcs ABO and OCD are cycloids defined parametrically by the equations
\[
x = a(\theta - \sin\theta), \quad y = a(1 - \cos\theta), \quad -2\pi \le \theta \le 2\pi,
\]
where B and C are at the points $\theta = \mp\pi$, respectively.

Figure 5.18: the two cycloidal arcs ABO and OCD, with the curve OQR wrapped along OQ and straight between Q and R.

The curve OQR has length $l$, is wrapped round the cycloid along OQ, is a straight line between Q and R and is tangential to the cycloid at Q.
(a) If the point Q has the coordinates
\[
x_Q = a(\theta - \sin\theta) \quad\text{and}\quad y_Q = a(1 - \cos\theta)
\]
show that the angle $\psi$ between QR and the $x$-axis is given by $\psi = (\pi - \theta)/2$.
(b) Show that the coordinates of the point R are
\[
x_R = x_Q + (l - s(\theta))\sin(\theta/2) \quad\text{and}\quad y_R = y_Q + (l - s(\theta))\cos(\theta/2),
\]
where $s(\theta)$ is the arc length OQ.
(c) If the length of OQR is the same as the length of OQC show that
\[
x_R = a(\theta + \sin\theta) \quad\text{and}\quad y_R = a(3 + \cos\theta).
\]
Chapter 6

Further theoretical developments
6.1 Introduction
In this chapter we continue the development of the general theory, first by considering the effects of changing variables and then by introducing functionals with several dependent variables. The chapter ends with a discussion of whether any second-order differential equation can be expressed as an Euler-Lagrange equation and hence whether its solutions are stationary paths of a functional.
The motivation for changing variables is simply that most problems can be simplified by a judicious choice of variables, both dependent and independent. If a set of differential equations can be derived from a functional it transpires that changing variables in the functional is easier than the equivalent change to the differential equations, because the order of differentiation is always smaller.
The introduction of two or more dependent variables is needed when stationary paths are described parametrically, an idea introduced in chapter 9. Another very important use, however, is in the reformulation of Newton's laws as a variational principle, an important topic we have no room for in this course.
6.2 Invariance of the Euler-Lagrange equation
In this section we consider the effect of changing both the dependent and independent variables and show that the form of the Euler-Lagrange equation remains unchanged, an important property first noted by Euler in 1744. This technique is useful because one of the principal methods of solving differential equations is to change variables with the aim of converting it to a standard, recognisable form. For instance the unfamiliar equation
\[
z\frac{d^2y}{dz^2} + (1 - a)\frac{dy}{dz} + a^2 z^{2a-1}y = 0 \tag{6.1}
\]
becomes, on setting $z = x^{1/a}$, $(x \ge 0)$, the familiar equation
\[
\frac{d^2y}{dx^2} + y = 0.
\]
It is rarely easy to find suitable new variables, but if the equation can be derived from a variational principle the task is usually made easier because the algebra is simpler: you will see why in exercise 6.1 where the above example is treated with $a = 2$.
We start with functionals having only one dependent variable, but the full power of this technique becomes apparent mainly in the advanced study of dynamical systems which cannot be dealt with here.
6.2.1 Changing the independent variable
The easiest way of understanding why the form of the Euler-Lagrange equation is invariant under a coordinate change is to examine the effect of changing only the independent variable $x$. Thus for the functional
\[
S[y] = \int_a^b dx\, F(x, y(x), y'(x)) \tag{6.2}
\]
we change to a new independent variable $u$, where $x = g(u)$ for a known differentiable function $g(u)$, assumed to be monotonic so the inverse exists. With this change of variable $y(x)$ becomes a function of $u$ and it is convenient to define $Y(u) = y(g(u))$. Then the chain rule gives
\[
\frac{dy}{dx} = \frac{dy}{du}\frac{du}{dx} = \frac{Y'(u)}{dx/du} = \frac{Y'(u)}{g'(u)},
\]
and the functional becomes
\[
S[Y] = \int_c^d du\, g'(u)\,F\!\left(g(u), Y(u), \frac{Y'(u)}{g'(u)}\right), \tag{6.3}
\]
with the integration limits, $c$ and $d$, defined implicitly by the equations $a = g(c)$ and $b = g(d)$.
The integrand of the original functional depends upon $x$, $y(x)$ and $y'(x)$. The integrand of the transformed functional depends upon $u$, $Y(u)$ and $Y'(u)$, so if we define
\[
\overline{F}(u, Y(u), Y'(u)) = g'(u)\,F\!\left(g(u), Y(u), \frac{Y'(u)}{g'(u)}\right), \tag{6.4}
\]
the functional can be written as
\[
S[Y] = \int_c^d du\, \overline{F}(u, Y(u), Y'(u)). \tag{6.5}
\]
The Euler-Lagrange equation in the new variable, $u$, is therefore
\[
\frac{d}{du}\left(\frac{\partial\overline{F}}{\partial Y'}\right) - \frac{\partial\overline{F}}{\partial Y} = 0 \tag{6.6}
\]
whereas the original Euler-Lagrange equation is
\[
\frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right) - \frac{\partial F}{\partial y} = 0. \tag{6.7}
\]
These two equations have the same form, in the sense that the formula 6.6 is obtained from 6.7 by replacing the explicit occurrences of $x$, $y$, $y'$ and $F$ by $u$, $Y$, $Y'$ and $\overline{F}$ respectively. The new second-order differential equation for $Y$, obtained from 6.6, is, however, normally quite different from the equation for $y$ derived from 6.7, because $F$ and $\overline{F}$ have different functional forms.
A simple example is the functional
\[
S[y] = \int_1^2 dx\,\frac{y'^2}{x^2}, \quad y(1) = 1, \quad y(2) = 2,
\]
which is similar to the example dealt with in exercise 4.22(c) (page 139). The general solution of the Euler-Lagrange equation is $y(x) = \alpha + \beta x^3$ and the boundary conditions give $\alpha = 6/7$ and $\beta = 1/7$.
Now make the transformation $x = u^a$, for some constant $a$: the chain rule gives
\[
\frac{dy}{dx} = \frac{dy}{du}\frac{du}{dx} = \frac{Y'(u)}{au^{a-1}} \quad\text{where}\quad Y(u) = y(u^a)
\]
and the functional becomes
\[
S[Y] = a\int_1^{2^{1/a}} du\, u^{a-1}\,\frac{1}{u^{2a}}\left(\frac{Y'(u)}{au^{a-1}}\right)^2 = \frac{1}{a}\int_1^{2^{1/a}} du\,\frac{Y'(u)^2}{u^{3a-1}}.
\]
Choosing $3a = 1$ simplifies this functional to
\[
S[Y] = 3\int_1^8 du\, Y'(u)^2, \quad Y(1) = 1, \quad Y(8) = 2.
\]
The Euler-Lagrange equation for this functional is $Y''(u) = 0$, having the general solution $Y = C + Du$. The boundary conditions give $C + 8D = 2$ and $C + D = 1$ and hence
\[
Y(u) = \frac{1}{7}(6 + u) \quad\text{giving}\quad y(x) = Y(u(x)) = \frac{1}{7}\left(6 + x^3\right).
\]
In this example little was gained, because the Euler-Lagrange equation is equally easily solved in either representation. This is not always the case as the next exercise shows.
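This calculation is easy to check with a computer algebra system. The following short sketch, which assumes the Python library SymPy is available and is offered only as an illustration of the change of variable, verifies that the quoted stationary path satisfies the original Euler-Lagrange equation and that the transformed problem returns the same path.

```python
import sympy as sp
from sympy.calculus.euler import euler_equations

x, u = sp.symbols('x u', positive=True)
y, Y = sp.Function('y'), sp.Function('Y')

# Original functional: F = y'(x)^2 / x^2, with y(1) = 1, y(2) = 2.
F = y(x).diff(x)**2 / x**2
el_x = euler_equations(F, y(x), x)[0]
y_exact = sp.Rational(6, 7) + x**3/7
print(sp.simplify(el_x.lhs.subs(y(x), y_exact).doit()))   # 0: y = (6 + x^3)/7 is stationary

# Transformed functional (x = u**(1/3)): G = 3*Y'(u)^2, with Y(1) = 1, Y(8) = 2.
G = 3*Y(u).diff(u)**2
el_u = euler_equations(G, Y(u), u)[0]
sol_u = sp.dsolve(el_u, Y(u), ics={Y(1): 1, Y(8): 2})
print(sol_u)                                              # Y(u) = 6/7 + u/7

# Substituting u = x^3 recovers the original stationary path.
print(sp.simplify(sol_u.rhs.subs(u, x**3) - y_exact))     # 0
```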
Exercise 6.1
The functional $S[y] = \int_0^X dx\,\left(y'^2 - \omega^2y^2\right)$, where $\omega$ is a constant, gives rise to the Euler-Lagrange equation $y'' + \omega^2y = 0$.
(a) Show that changing the independent variable to $z$ where $x = z^2$ gives the functional
\[
S[y] = \frac{1}{2}\int_0^Z dz\,\left(\frac{y'(z)^2}{z} - 4\omega^2zy^2\right), \quad Z = \sqrt{X},
\]
with the associated Euler-Lagrange equation
\[
z\frac{d^2y}{dz^2} - \frac{dy}{dz} + 4\omega^2z^3y = 0.
\]
Show that this is the same as equation 6.1 when $a = 2$ and $\omega = 1$.
(b) Show that
\[
\frac{d^2y}{dx^2} = \frac{1}{4z^3}\left(z\frac{d^2y}{dz^2} - \frac{dy}{dz}\right)
\]
and hence derive the above Euler-Lagrange equation directly.
Note that the first method requires only that we compute $dy/dx$ and avoids the need to calculate the more difficult second derivative, $d^2y/dx^2$, required by the second method. This is why it is normally easier to transform the functional rather than the differential equation.
Exercise 6.2
A simpler type of transformation involves a change of the dependent variable. Consider the functional
\[
S[y] = \int_a^b dx\, y'^2.
\]
(a) Show that the associated Euler-Lagrange equation is $y''(x) = 0$.
(b) Define a new variable $z$ related to $y$ by the differentiable monotonic function $y = G(z)$ and show that the functional becomes
\[
S[z] = \int_a^b dx\, G'(z)^2z'^2.
\]
Show also that the Euler-Lagrange equation for this functional is
\[
G'(z)z'' + G''(z)z'^2 = 0
\]
and that this is identical to the original Euler-Lagrange equation.
Exercise 6.3
Show that if $y = G(z)$, where $G(z)$ is a differentiable function, the functional $S[y] = \int_a^b dx\, F(x, y, y')$ transforms to $S[z] = \int_a^b dx\, F(x, G(z), G'(z)z')$ with associated Euler-Lagrange equation
\[
\frac{d}{dx}\left(G'(z)\frac{\partial F}{\partial y'}\right) - G'(z)\frac{\partial F}{\partial y} - G''(z)\frac{\partial F}{\partial y'}\frac{dz}{dx} = 0.
\]
6.2.2 Changing both the dependent and independent variables
In the previous section it was seen that when changing the independent variable the algebra is simpler if the transformation is made to the functional rather than the associated Euler-Lagrange equation, because changing the functional involves only first-order derivatives, recall exercise 6.1.
For the same reason it is far easier to apply more general transformations to the functional than to the Euler-Lagrange equation. The most general transformation we need to consider will be between the Cartesian coordinates $(x, y)$ and two new variables $(u, v)$: such transformations are defined by two equations
\[
x = f(u, v), \quad y = g(u, v)
\]
which we assume take each point $(u, v)$ to a unique point $(x, y)$ and vice-versa, so the Jacobian determinant of the transformation, equation 1.26 (page 30), is not zero in the relevant ranges of $u$ and $v$.
Before dealing with the general case we illustrate the technique using the particular example in which $(u, v)$ are polar coordinates, which highlights all relevant aspects of the analysis.
The transformation between Cartesian and polar coordinates
The Cartesian coordinates $(x, y)$ are defined in terms of the plane polar coordinates $(r, \theta)$ by
\[
x = r\cos\theta, \quad y = r\sin\theta, \quad r \ge 0, \quad -\pi < \theta \le \pi. \tag{6.8}
\]
The inverse transformation is (for $r \ne 0$),
\[
r^2 = x^2 + y^2, \quad \tan\theta = \frac{y}{x}, \tag{6.9}
\]
where the signs of $x$ and $y$ need to be taken into account when inverting the tan function. At the origin $r = 0$, but $\theta$ is undefined. In Cartesian coordinates we normally choose $x$ to be the independent variable, so points on the curve $C$ joining the points $(a, A)$ and $(b, B)$, figure 6.1 below, are given by the Cartesian coordinates $(x, y(x))$.

Figure 6.1: Diagram showing the relation between the Cartesian and polar representations of a curve joining $(a, A)$ and $(b, B)$.

Alternatively, we can define each point on the curve by expressing $r$ as a function of $\theta$, and then the curve is defined by the polar coordinates $(r(\theta), \theta)$.
The aim is to transform a functional
\[
S[y] = \int_a^b dx\, F(x, y(x), y'(x)) \tag{6.10}
\]
to an integral over $\theta$ in which $y(x)$ and $y'(x)$ are replaced by expressions involving $\theta$, $r(\theta)$ and $r'(\theta)$. First we change to the new independent variable $\theta$: then since $x = r\cos\theta$ and $y = r\sin\theta$ we have
\[
S[r] = \int_{\theta_a}^{\theta_b} d\theta\,\frac{dx}{d\theta}\,F(r\cos\theta, r\sin\theta, y'(x)). \tag{6.11}
\]
The differential $dx/d\theta$ is obtained from the relation $x = r\cos\theta$ using the chain rule and remembering that $r$ depends upon $\theta$,
\[
\frac{dx}{d\theta} = \frac{dr}{d\theta}\cos\theta - r\sin\theta \quad\text{and similarly}\quad \frac{dy}{d\theta} = \frac{dr}{d\theta}\sin\theta + r\cos\theta. \tag{6.12}
\]
It remains only to express $y'(x)$ in terms of $r$, $\theta$ and $r'$, and this is given by the relation
\[
\frac{dy}{dx} = \frac{dy}{d\theta}\frac{d\theta}{dx} = \frac{dy}{d\theta}\Big/\frac{dx}{d\theta} = \frac{r'\sin\theta + r\cos\theta}{r'\cos\theta - r\sin\theta}, \tag{6.13}
\]
where $r$ is assumed to depend upon $\theta$. Hence the functional transforms to
\[
S[r] = \int_{\theta_a}^{\theta_b} d\theta\,\overline{F}(\theta, r, r'), \tag{6.14}
\]
where
\[
\overline{F} = (r'\cos\theta - r\sin\theta)\,F\!\left(r\cos\theta, r\sin\theta, \frac{r'\sin\theta + r\cos\theta}{r'\cos\theta - r\sin\theta}\right). \tag{6.15}
\]
The new functional depends only upon $\theta$, $r(\theta)$ and the first derivative $r'(\theta)$, so the Euler-Lagrange equation is
\[
\frac{d}{d\theta}\left(\frac{\partial\overline{F}}{\partial r'}\right) - \frac{\partial\overline{F}}{\partial r} = 0 \tag{6.16}
\]
which is the transformed version of
\[
\frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right) - \frac{\partial F}{\partial y} = 0. \tag{6.17}
\]
This analysis shows that the transformation to polar coordinates keeps the form of Euler's equation invariant because the transformation of the functional introduces only first derivatives, via equations 6.12 and 6.13, so does not alter the derivation of the Euler-Lagrange equation. The same transformation applied to the Euler-Lagrange equation 6.17 involves finding a suitable expression for the second derivative, $y''(x)$, which is harder.
Exercise 6.4
The integrand of the functional 6.14 contains the denominator $r'\cos\theta - r\sin\theta$. Why can we assume that this is not zero?

Exercise 6.5
Show that if $r$ is taken to be the independent variable the functional 6.10 becomes
\[
S[\theta] = \int_{r_a}^{r_b} dr\,\overline{F}(r, \theta, \theta') \quad\text{where}\quad
\overline{F} = \left(\cos\theta - r\theta'\sin\theta\right)F\!\left(r\cos\theta, r\sin\theta, \frac{\sin\theta + r\theta'\cos\theta}{\cos\theta - r\theta'\sin\theta}\right).
\]
Exercise 6.6
(a) Show that the Euler-Lagrange equation for the functional
\[
S[r] = \int_{\theta_a}^{\theta_b} d\theta\,\sqrt{r^2 + r'(\theta)^2}
\]
is
\[
r\frac{d^2r}{d\theta^2} - 2\left(\frac{dr}{d\theta}\right)^2 - r^2 = 0 \quad\text{or}\quad r\frac{d}{d\theta}\left(\frac{1}{r^2}\frac{dr}{d\theta}\right) = 1.
\]
(b) Show that the general solution of this equation is $r = 1/(A\cos\theta + B\sin\theta)$, for constants $A$ and $B$.
(c) By showing that
\[
\frac{d\theta}{dx} = \frac{xy' - y}{x^2 + y^2} \quad\text{and}\quad \frac{dr}{d\theta} = \frac{(yy' + x)r}{xy' - y},
\]
where $(x, y)$ are the Cartesian coordinates, show that this functional becomes
\[
S[y] = \int_a^b dx\,\sqrt{1 + y'(x)^2}.
\]
(d) If the boundary conditions in the Cartesian plane are $(x, y) = (a, a)$ and $(b, b + \epsilon)$, $b > a$ and $\epsilon > 0$, show that in each representation the stationary path is
\[
y = \left(1 + \frac{\epsilon}{b - a}\right)x - \frac{\epsilon a}{b - a} \quad\text{and}\quad r = \frac{a\epsilon}{(b - a + \epsilon)\cos\theta - (b - a)\sin\theta}.
\]
Consider the limit $\epsilon \to 0$ and explain why the polar equation fails in this limit.

This example illustrates that simplification can sometimes occur when suitable transformations are made: the art is to find such transformations. The last part of exercise 6.6 also shows that representations that are undefined at isolated points can cause difficulties. In this case a problem is created because polar coordinates are not unique at the origin, where $\theta$ is undefined. The same problems occur when using spherical polar coordinates at the north and south poles, where the azimuthal angle is undefined.
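A quick numerical illustration of the equivalence of the two representations quoted in exercise 6.6(d) is sketched below; it assumes the Python library NumPy, and the particular values of $a$, $b$ and $\epsilon$ are chosen arbitrarily for the check.

```python
import numpy as np

a, b, eps = 1.0, 2.0, 0.3          # arbitrary sample values with b > a and eps > 0

# Polar form of the stationary path quoted in exercise 6.6(d)
def r(theta):
    return a*eps / ((b - a + eps)*np.cos(theta) - (b - a)*np.sin(theta))

# Sample the path between the angles of the two end points (a, a) and (b, b + eps)
th1, th2 = np.arctan2(a, a), np.arctan2(b + eps, b)
theta = np.linspace(th1, th2, 50)
x, y = r(theta)*np.cos(theta), r(theta)*np.sin(theta)

# Cartesian form of the same stationary path
y_line = (1 + eps/(b - a))*x - eps*a/(b - a)
print(np.max(np.abs(y - y_line)))  # essentially zero: the two forms describe one line
```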
Exercise 6.7
Show that in polar coordinates the functional
\[
S[y] = \int_a^b dx\,\sqrt{x^2 + y^2}\,\sqrt{1 + y'(x)^2} \quad\text{becomes}\quad S[r] = \int_{\theta_a}^{\theta_b} d\theta\, r\sqrt{r^2 + r'(\theta)^2}
\]
and that the resulting Euler-Lagrange equation is
\[
\frac{d^2r}{d\theta^2} - \frac{3}{r}\left(\frac{dr}{d\theta}\right)^2 - 2r = 0 \quad\text{which can be written as}\quad \frac{d^2}{d\theta^2}\left(\frac{1}{r^2}\right) + \frac{4}{r^2} = 0.
\]
Hence show that equations for the stationary paths are
\[
\frac{1}{r^2} = A\cos 2\theta + B\sin 2\theta \quad\text{or}\quad A(x^2 - y^2) + 2Bxy = 1,
\]
where $A$ and $B$ are constants and $0 \le \theta < \pi$.
The general transformation
The analysis for the general transformation
\[
x = f(u, v), \quad y = g(u, v)
\]
is very similar to the special case dealt with above and, as in that case (see exercise 6.6), it is necessary that the transformation is invertible, so that the Jacobian determinant, equation 1.26 (page 30), is not zero,
\[
\frac{\partial(f, g)}{\partial(u, v)} = \begin{vmatrix} f_u & f_v \\ g_u & g_v \end{vmatrix} \ne 0.
\]
If the admissible curves are denoted by $y(x)$ and $v(u)$ in the two representations, with $a \le x \le b$ and $c \le u \le d$, then the functional
\[
S[y] = \int_a^b dx\, F(x, y, y') \quad\text{transforms to}\quad S[v] = \int_c^d du\,\overline{F}(u, v, v'), \tag{6.18}
\]
where
\[
\overline{F} = (f_u + f_vv')\,F\!\left(f, g, \frac{g_u + g_vv'}{f_u + f_vv'}\right). \tag{6.19}
\]
This result follows because the chain rule gives
\[
\frac{dx}{du} = f_u + f_v\frac{dv}{du} \quad\text{and}\quad \frac{dy}{du} = g_u + g_v\frac{dv}{du}.
\]
In the $(u, v)$-coordinate system the stationary path is given by the Euler-Lagrange equation
\[
\frac{d}{du}\left(\frac{\partial\overline{F}}{\partial v'}\right) - \frac{\partial\overline{F}}{\partial v} = 0.
\]
Exercise 6.8
Consider the elementary functional
\[
S[y] = \int_a^b dx\, F(y'), \quad y(a) = A, \quad y(b) = B.
\]
If the roles of the dependent and independent variables are interchanged, by noting that $y'(x) = 1/x'(y)$, show that the functional becomes
\[
S[x] = \int_A^B dy\, G(x') \quad\text{where}\quad G(u) = uF(1/u).
\]
Exercise 6.9
Consider the functional
\[
S[y] = \int dx\,\left(\frac{1}{2}A(x)y'^2 + B(x, y)\right),
\]
where $A(x)$ is a non-zero function of $x$ and $B(x, y)$ a function of $x$ and $y$. Note that the boundary conditions play no role in this question, so are omitted.
(a) If a new independent variable, $u$, is defined by the relation $x = f(u)$, where $f(u)$ is a differentiable, monotonic increasing function, show that with an appropriate choice of $f$ the functional can be written in the form
\[
S[y] = \int du\,\left(\frac{1}{2}y'(u)^2 + AB\right).
\]
(b) Use this transformation to show that the equation
\[
x\frac{d^2y}{dx^2} - \frac{dy}{dx} - 4x^3y = 8x^3 \quad\text{can be converted to}\quad \frac{d^2y}{du^2} - 4y = 8,
\]
with a suitable choice of the variable $u$.
Exercise 6.10
Consider the functional $S[y] = \int_1^2 dx\,\dfrac{y'^2}{x^2}$, $y(1) = A$, $y(2) = B$, where $A$ and $B$ are both positive.
(a) Using the fact that $y'(x) = 1/x'(y)$ show that if $y$ is used as the independent variable the functional becomes
\[
S[x] = \int_A^B dy\,\frac{1}{x^2x'(y)}, \quad x(A) = 1, \quad x(B) = 2.
\]
(b) Show that the Euler-Lagrange equation for the functional $S[x]$ is
\[
\frac{d}{dy}\left(\frac{1}{x^2x'^2}\right) - \frac{2}{x^3x'} = 0 \quad\text{which can be written as}\quad \frac{d^2x}{dy^2} + \frac{2}{x}\left(\frac{dx}{dy}\right)^2 = 0.
\]
(c) By writing this last equation in the form
\[
\frac{1}{x^2}\frac{d}{dy}\left(x^2\frac{dx}{dy}\right) = 0, \quad x(A) = 1, \quad x(B) = 2,
\]
and integrating twice show that the stationary path is $x^3 = (7y + B - 8A)/(B - A)$.
6.3 Functionals with many dependent variables
6.3.1 Introduction
In chapter 4 we considered functionals of the type
\[
S[y] = \int_a^b dx\, F(x, y, y'), \quad y(a) = A, \quad y(b) = B, \tag{6.20}
\]
which involve one independent variable, $x$, and a single dependent variable, $y(x)$, and its first derivative. There are many useful and important extensions to this type of functional and in this chapter we discuss one of these, the first in the following list, which is important in the study of dynamical systems and when representing stationary paths in parametric form, an idea introduced in chapter 9. Before proceeding we list other important generalisations in order to provide you with some idea of the types of problems that can be tackled: some are treated in later chapters.
(i) The integrand of the functional 6.20 depends upon the independent variable $x$ and a single dependent variable $y(x)$, which is determined by the requirement that $S[y]$ be stationary. A simple generalisation is to integrands that depend upon several dependent variables $y_k(x)$, $k = 1, 2, \dots, n$, and their first derivatives. This type of functional is studied later in this section.
(ii) The integrand of 6.20 depends upon $y(x)$ and its first derivative. Another simple generalisation involves functionals depending upon second or higher derivatives. Some examples of this type are treated in exercises 4.32, 4.33 (page 143) and 7.12. The elastic theory of stiff beams and membranes requires functionals containing the second derivative, which represents the bending energy, and some examples are described in chapter 10.
(iii) Broken extremals: a broken extremal is a continuous solution of the Euler-Lagrange equation with a piecewise continuous first derivative. A simple example of such a solution is the Goldschmidt curve defined by equation 5.20 (page 160). That such solutions are important is clear, partly because they occur in the relatively simple case of the surface of minimum revolution and also from observations of soap films that often comprise spherical segments such that across common boundaries the normal to the surface changes direction discontinuously. We consider broken extremals in chapter 10.
(iv) In all examples so far considered the end points of the curve have been fixed. However, there are variational problems where the ends of the path are free to move on given curves: an example of this type of problem is described at the end of section 3.5.3, equation 3.22 (page 110). The general theory is considered in chapter 10.
(v) The integral defining the functional may be over a surface, $S$, rather than along a line,
\[
J[y] = \iint_S dx_1\,dx_2\,F\!\left(x_1, x_2, y, \frac{\partial y}{\partial x_1}, \frac{\partial y}{\partial x_2}\right)
\]
where $S$ is a region in the $(x_1, x_2)$-plane, so the functional depends upon two independent variables, $(x_1, x_2)$, rather than just one. In this case the Euler-Lagrange equation is a partial differential equation. Many of the standard equations of mathematical physics can be derived from such functionals. There is, of course, a natural extension to integrals over higher-dimensional spaces; such problems are not considered in this course.
6.3.2 Functionals with two dependent variables
First we find the necessary conditions for a functional depending on two functions to be stationary. We are ultimately interested in functionals depending upon any finite number of variables, so we shall often use a notation for which this further generalisation becomes almost trivial.
If the two dependent variables are $(y_1(x), y_2(x))$ and the single independent variable is $x$, the functional is
\[
S[y_1, y_2] = \int_a^b dx\, F(x, y_1, y_2, y_1', y_2') \tag{6.21}
\]
where each function satisfies given boundary conditions,
\[
y_k(a) = A_k, \quad y_k(b) = B_k, \quad k = 1, 2. \tag{6.22}
\]
We require functions $(y_1(x), y_2(x))$ that make this functional stationary and proceed in the same manner as before. Let $y_1(x)$ and $y_2(x)$ be two admissible functions, that is, functions having continuous first derivatives and satisfying the boundary conditions, and use the Gateaux differential of $S[y_1, y_2]$ to calculate its rate of change. This is
\[
\Delta S[y_1, y_2, h_1, h_2] = \frac{d}{d\epsilon}S[y_1 + \epsilon h_1, y_2 + \epsilon h_2]\bigg|_{\epsilon=0}, \tag{6.23}
\]
where $y_k(x) + \epsilon h_k(x)$, $k = 1, 2$, are also admissible functions, which means that $h_k(a) = h_k(b) = 0$, $k = 1, 2$. As in equation 4.8 (page 129) we have
\[
\Delta S = \int_a^b dx\,\frac{d}{d\epsilon}F(x, y_1 + \epsilon h_1, y_2 + \epsilon h_2, y_1' + \epsilon h_1', y_2' + \epsilon h_2')\bigg|_{\epsilon=0}
\]
and the integrand is simplified using the chain rule,
\[
\frac{dF}{d\epsilon}\bigg|_{\epsilon=0} = \frac{\partial F}{\partial y_1}h_1 + \frac{\partial F}{\partial y_1'}h_1' + \frac{\partial F}{\partial y_2}h_2 + \frac{\partial F}{\partial y_2'}h_2'.
\]
Hence the Gateaux differential is
\[
\Delta S = \int_a^b dx\,\left(\frac{\partial F}{\partial y_1}h_1 + \frac{\partial F}{\partial y_1'}h_1' + \frac{\partial F}{\partial y_2}h_2 + \frac{\partial F}{\partial y_2'}h_2'\right). \tag{6.24}
\]
For a stationary path we need, by definition (chapter 4, page 125), $\Delta S = 0$ for all $h_1(x)$ and $h_2(x)$. An allowed subset of variations is obtained by setting $h_2(x) = 0$; then the above equation becomes the same as equation 4.9 (page 129), with $y$ and $h$ replaced by $y_1$ and $h_1$ respectively. Hence we may use the same analysis to obtain the second-order differential equation
\[
\frac{d}{dx}\left(\frac{\partial F}{\partial y_1'}\right) - \frac{\partial F}{\partial y_1} = 0, \quad y_1(a) = A_1, \quad y_1(b) = B_1. \tag{6.25}
\]
This equation looks the same as equation 4.11 (page 130), but remember that here $F$ also depends upon the unknown function $y_2(x)$.
Similarly, by setting $h_1(x) = 0$, we obtain another second-order equation
\[
\frac{d}{dx}\left(\frac{\partial F}{\partial y_2'}\right) - \frac{\partial F}{\partial y_2} = 0, \quad y_2(a) = A_2, \quad y_2(b) = B_2. \tag{6.26}
\]
Equations 6.25 and 6.26 are the Euler-Lagrange equations for the functional 6.21. These two equations will normally involve both $y_1(x)$ and $y_2(x)$, so are named coupled differential equations; normally this makes them far harder to solve than the Euler-Lagrange equations of chapter 4, which contain only one dependent variable.
An example will make this clear: consider the quadratic functional
\[
S[y_1, y_2] = \int_0^{\pi/2} dx\,\left(y_1'^2 + y_2'^2 + 2y_1y_2\right) \tag{6.27}
\]
so that equation 6.25 becomes
\[
\frac{d^2y_1}{dx^2} - y_2 = 0, \tag{6.28}
\]
which involves both $y_1(x)$ and $y_2(x)$, and equation 6.26 becomes
\[
\frac{d^2y_2}{dx^2} - y_1 = 0, \tag{6.29}
\]
which also involves both $y_1(x)$ and $y_2(x)$.
Equations 6.28 and 6.29 now have to be solved. Coupled differential equations are normally very difficult to solve and their solutions can behave in bizarre ways, including chaotically; but these equations are linear, which makes the task of solving them much easier, and the solutions are generally better behaved. One method is to use the first equation to write $y_2 = y_1''$, so the second equation becomes the fourth-order linear equation
\[
\frac{d^4y_1}{dx^4} - y_1 = 0.
\]
By substituting a function of the form $y_1 = \alpha\exp(\lambda x)$, where $\alpha$ and $\lambda$ are constants, into this equation we obtain an equation for $\lambda$ that gives $\lambda^4 = 1$, showing that there are four solutions obtained by setting $\lambda = \pm 1, \pm i$, and hence that the general solution is a linear combination of these functions,
\[
y_1(x) = A\cos x + B\sin x + C\cosh x + D\sinh x
\]
where $A$, $B$, $C$ and $D$ are constants. Since $y_2 = y_1''$ we also have
\[
y_2(x) = -A\cos x - B\sin x + C\cosh x + D\sinh x.
\]
The four arbitrary constants may now be determined from the four boundary conditions, as demonstrated in the following exercise.
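The general solution just quoted can be substituted back into the coupled pair 6.28 and 6.29 as a check. The following brief sketch assumes the Python library SymPy and is offered only as an illustration of the verification.

```python
import sympy as sp

x = sp.Symbol('x')
A, B, C, D = sp.symbols('A B C D')

# General solution of y1'''' - y1 = 0, with the companion function y2 = y1''
y1 = A*sp.cos(x) + B*sp.sin(x) + C*sp.cosh(x) + D*sp.sinh(x)
y2 = sp.diff(y1, x, 2)
print(sp.simplify(y2))      # -A*cos(x) - B*sin(x) + C*cosh(x) + D*sinh(x)

# Both coupled Euler-Lagrange equations 6.28 and 6.29 hold identically
print(sp.simplify(sp.diff(y1, x, 2) - y2))   # 0
print(sp.simplify(sp.diff(y2, x, 2) - y1))   # 0
```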
Exercise 6.11
Find the values of the constants $A$, $B$, $C$ and $D$ if the functional 6.27 has the boundary conditions $y_1(0) = 0$, $y_2(0) = 0$, $y_1(\pi/2) = 1$, $y_2(\pi/2) = 1$.

Exercise 6.12
Show that the Euler-Lagrange equations for the functional
\[
S[y_1, y_2] = \int_0^1 dx\,\left(y_1'^2 + y_2'^2 + y_1'y_2'\right),
\]
with the boundary conditions $y_1(0) = 0$, $y_2(0) = 1$, $y_1(1) = 1$, $y_2(1) = 2$, integrate to $2y_1' + y_2' = a_1$ and $2y_2' + y_1' = a_2$, where $a_1$ and $a_2$ are constants. Deduce that the stationary path is given by the equations
\[
y_1(x) = x \quad\text{and}\quad y_2(x) = x + 1.
\]

Exercise 6.13
By defining a new variable $z_1 = y_1 + y_2/2$, show that the functional defined in the previous exercise becomes
\[
S[z_1, y_2] = \int_0^1 dx\,\left(z_1'^2 + \frac{3}{4}y_2'^2\right), \quad z_1(0) = \frac{1}{2}, \quad y_2(0) = 1, \quad z_1(1) = 2, \quad y_2(1) = 2,
\]
and that the corresponding Euler-Lagrange equations are
\[
\frac{d^2z_1}{dx^2} = 0 \quad\text{and}\quad \frac{d^2y_2}{dx^2} = 0.
\]
Solve these equations to derive the solution obtained in the previous exercise.
Note that by using the variables $(z_1, y_2)$ each of the new Euler-Lagrange equations depends only upon one of the dependent variables and is therefore far easier to solve. Such systems of equations are said to be uncoupled, and one of the main methods of solving coupled Euler-Lagrange equations is to find a transformation that converts them to uncoupled equations. In real problems finding such transformations is difficult and often relies upon understanding the symmetries of the problem, and then the methods described in sections 6.2 and 7.3 can be useful.
6.3.3 Functionals with many dependent variables
The extension of the above analysis to functionals involving $n$ dependent variables, their first derivatives and a single independent variable is straightforward. It is helpful, however, to use the notation $\mathbf{y}(x) = (y_1(x), y_2(x), \dots, y_n(x))$ to denote the set of $n$ functions. There is still only one independent variable, so the functional is
\[
S[\mathbf{y}] = \int_a^b dx\, F(x, \mathbf{y}, \mathbf{y}'), \quad \mathbf{y}(a) = \mathbf{A}, \quad \mathbf{y}(b) = \mathbf{B}, \tag{6.30}
\]
where $\mathbf{y}' = (y_1', y_2', \dots, y_n')$, $\mathbf{A} = (A_1, A_2, \dots, A_n)$, $\mathbf{B} = (B_1, B_2, \dots, B_n)$ and $\mathbf{h} = (h_1, h_2, \dots, h_n)$. If $\mathbf{y}(x)$ and $\mathbf{y}(x) + \epsilon\mathbf{h}(x)$ are admissible functions, so that $\mathbf{h}(a) = \mathbf{h}(b) = 0$, the Gateaux differential is given by the relation
\[
\Delta S[\mathbf{y}, \mathbf{h}] = \frac{d}{d\epsilon}S[\mathbf{y} + \epsilon\mathbf{h}]\bigg|_{\epsilon=0} = \int_a^b dx\,\frac{d}{d\epsilon}F(x, \mathbf{y} + \epsilon\mathbf{h}, \mathbf{y}' + \epsilon\mathbf{h}')\bigg|_{\epsilon=0},
\]
and for $\mathbf{y}$ to be a stationary path this must be zero for all allowed $\mathbf{h}$. Using the chain rule we have
\[
\frac{d}{d\epsilon}F(x, \mathbf{y} + \epsilon\mathbf{h}, \mathbf{y}' + \epsilon\mathbf{h}')\bigg|_{\epsilon=0} = \sum_{k=1}^n\left(h_k\frac{\partial F}{\partial y_k} + h_k'\frac{\partial F}{\partial y_k'}\right),
\]
and hence
\[
\Delta S[\mathbf{y}, \mathbf{h}] = \sum_{k=1}^n\int_a^b dx\,\left(h_k\frac{\partial F}{\partial y_k} + h_k'\frac{\partial F}{\partial y_k'}\right). \tag{6.31}
\]
Now integrate by parts to cast this in the form
\[
\Delta S[\mathbf{y}, \mathbf{h}] = \sum_{k=1}^n\left[h_k\frac{\partial F}{\partial y_k'}\right]_a^b - \sum_{k=1}^n\int_a^b dx\,\left(\frac{d}{dx}\left(\frac{\partial F}{\partial y_k'}\right) - \frac{\partial F}{\partial y_k}\right)h_k. \tag{6.32}
\]
But, since $\mathbf{h}(a) = \mathbf{h}(b) = 0$, the boundary term vanishes. Further, since $\Delta S[\mathbf{y}, \mathbf{h}] = 0$ for all allowed $\mathbf{h}$, by the same reasoning used when $n = 2$, we obtain the set of $n$ coupled equations
\[
\frac{d}{dx}\left(\frac{\partial F}{\partial y_k'}\right) - \frac{\partial F}{\partial y_k} = 0, \quad y_k(a) = A_k, \quad y_k(b) = B_k, \quad k = 1, 2, \dots, n. \tag{6.33}
\]
This set of $n$ coupled equations is usually nonlinear and difficult to solve. The one circumstance when the solution is relatively simple is when the integrand of the functional $S[\mathbf{y}]$ is a quadratic form in both $\mathbf{y}$ and $\mathbf{y}'$, and then the Euler-Lagrange equations are coupled linear equations; this is an important example because it describes small oscillations about an equilibrium position of an $n$-dimensional dynamical system.
Exercise 6.14
(a) If $A$ and $B$ are real, symmetric, positive definite, $n \times n$ matrices$^1$ consider the functional
\[
S[\mathbf{y}] = \int_a^b dx\,\sum_{i=1}^n\sum_{j=1}^n\left(y_i'A_{ij}y_j' - y_iB_{ij}y_j\right),
\]
with the integrand quadratic in $\mathbf{y}$ and $\mathbf{y}'$. Show that the $n$ Euler-Lagrange equations are the set of coupled, linear equations
\[
\sum_{j=1}^n\left(A_{kj}\frac{d^2y_j}{dx^2} + B_{kj}y_j\right) = 0, \quad 1 \le k \le n.
\]
(b) Show that if we interpret $\mathbf{y}$ as an $n$-dimensional column vector and its transpose $\mathbf{y}^{\mathrm{T}}$ as a row vector, the functional can be written in the equivalent matrix form
\[
S[\mathbf{y}] = \int_a^b dx\,\left(\mathbf{y}'^{\mathrm{T}}A\mathbf{y}' - \mathbf{y}^{\mathrm{T}}B\mathbf{y}\right),
\]
and that the Euler-Lagrange equations can be written in the matrix form
\[
A\frac{d^2\mathbf{y}}{dx^2} + B\mathbf{y} = 0.
\]
Show that this can also be written in the form
\[
\frac{d^2\mathbf{y}}{dx^2} + A^{-1}B\,\mathbf{y} = 0. \tag{6.34}
\]
(c) It can be shown that the matrix $A^{-1}B$ has non-negative eigenvalues $\omega_k^2$ and $n$ orthonormal eigenvectors $\mathbf{z}_k$, $k = 1, 2, \dots, n$, possibly complex, each satisfying $A^{-1}B\,\mathbf{z}_k = \omega_k^2\mathbf{z}_k$. By expressing $\mathbf{y}$ as the linear combination of the $\mathbf{z}_k$,
\[
\mathbf{y} = \sum_{k=1}^n a_k(x)\mathbf{z}_k, \quad\text{with}\quad \mathbf{z}_k^{\mathrm{T}}\mathbf{z}_j = \delta_{kj} \ \text{for all } k \text{ and } j,
\]
show that the coefficients $a_k(x)$ satisfy the uncoupled equations
\[
\frac{d^2a_j}{dx^2} + \omega_j^2a_j = 0, \quad j = 1, 2, \dots, n.
\]
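Part (c) is the familiar normal-mode construction. A minimal numerical sketch of the idea, assuming the Python libraries NumPy and SciPy and using arbitrarily chosen sample matrices, is given below; it uses a generalised eigenvalue solver, which is one standard way to compute the eigenpairs of $A^{-1}B$ when $A$ and $B$ are symmetric.

```python
import numpy as np
from scipy.linalg import eigh

# Arbitrary symmetric positive definite sample matrices A and B (n = 2)
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])
B = np.array([[3.0, 1.0],
              [1.0, 2.0]])

# Generalised eigenproblem B z = omega^2 A z, equivalent to A^{-1} B z = omega^2 z
omega2, Z = eigh(B, A)            # columns of Z are the eigenvectors z_k
print(omega2)                     # the non-negative eigenvalues omega_k^2

# In the coordinates a = Z^{-1} y the matrix A^{-1} B becomes diagonal, so each
# coefficient a_j obeys a_j'' + omega_j^2 a_j = 0, i.e. simple harmonic motion.
M = np.linalg.inv(Z) @ np.linalg.inv(A) @ B @ Z
print(np.round(M, 10))            # diag(omega_1^2, omega_2^2)
```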
6.3.4 Changing dependent variables
In this section we consider the effect of changing the dependent variables. A simple example of such a transformation was dealt with in exercise 6.13 where it was shown how a linear transformation uncoupled the Euler-Lagrange equations. In general the aim of changing variables is to simplify the Euler-Lagrange equations, and it is generally easier to apply the transformation to the functional rather than the Euler-Lagrange equations.
Before explaining the general theory we deal with a specific example, which highlights all salient points. The functional is
\[
S[y_1, y_2] = \int_a^b dt\,\left(\frac{1}{2}\left(y_1'^2 + y_2'^2\right) - V(r)\right), \quad r = \sqrt{y_1^2 + y_2^2}, \tag{6.35}
\]
where $V(r)$ is any suitable function: this functional occurs frequently because it arises when describing the planar motion of a particle acted upon by a force depending only on the distance from a fixed point, for example in a simplified description of the motion of the Earth round the Sun; in this case the independent variable, $t$, is the time. The functional $S[\mathbf{y}]$ is special because its integrand depends only upon the combinations $y_1^2 + y_2^2$ and $y_1'^2 + y_2'^2$, which suggests that changing to polar coordinates may lead to simplification. These are $(r, \theta)$ where $y_1 = r\cos\theta$ and $y_2 = r\sin\theta$ so that $y_1^2 + y_2^2 = r^2$ and, on using the chain rule,
\[
\frac{dy_1}{dt} = \frac{dr}{dt}\cos\theta - r\frac{d\theta}{dt}\sin\theta \quad\text{and}\quad \frac{dy_2}{dt} = \frac{dr}{dt}\sin\theta + r\frac{d\theta}{dt}\cos\theta.
\]
Squaring and adding these equations gives $y_1'^2 + y_2'^2 = r'^2 + r^2\theta'^2$. Hence the functional becomes
\[
S[r, \theta] = \int_a^b dt\,\left(\frac{1}{2}r'^2 + \frac{1}{2}r^2\theta'^2 - V(r)\right). \tag{6.36}
\]

$^1$A real symmetric matrix, $A$, has real elements satisfying $A_{ij} = A_{ji}$, for all $i$ and $j$, and it can be shown that its eigenvalues are real; a positive definite matrix has positive eigenvalues.
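The identity used to pass from 6.35 to 6.36 is easy to confirm with computer algebra. The following two-line check is a sketch only and assumes the Python library SymPy.

```python
import sympy as sp

t = sp.Symbol('t')
r, th = sp.Function('r'), sp.Function('theta')

# Polar substitution y1 = r cos(theta), y2 = r sin(theta)
y1, y2 = r(t)*sp.cos(th(t)), r(t)*sp.sin(th(t))

kinetic = sp.diff(y1, t)**2 + sp.diff(y2, t)**2
print(sp.simplify(kinetic - (sp.diff(r(t), t)**2 + r(t)**2*sp.diff(th(t), t)**2)))
# 0, confirming  y1'^2 + y2'^2 = r'^2 + r^2 theta'^2  as used in equation 6.36
```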
Exercise 6.15
(a) Show that the Euler-Lagrange equations for the functional 6.35 are
\[
\frac{d^2y_1}{dt^2} + V'(r)\frac{y_1}{r} = 0 \quad\text{and}\quad \frac{d^2y_2}{dt^2} + V'(r)\frac{y_2}{r} = 0. \tag{6.37}
\]
(b) Show that the Euler-Lagrange equations for the functional 6.36 can be written in the form
\[
\frac{d^2r}{dt^2} - \frac{L^2}{r^3} + V'(r) = 0 \quad\text{and}\quad \frac{d\theta}{dt} = \frac{L}{r^2}, \tag{6.38}
\]
where $L$ is a constant. Note that the equation for $r$ does not depend upon $\theta$ and that $\theta(t)$ is obtained from $r(t)$ by a single integration. In older texts on dynamics, see for instance Whittaker (1904), problems are said to be soluble by quadrature if their solutions can be reduced to known functions or integrals of such functions.
The general theory is not much more complicated. Suppose that $\mathbf{y} = (y_1, y_2, \dots, y_n)$ and $\mathbf{z} = (z_1, z_2, \dots, z_n)$ are two sets of dependent variables related by the equations
\[
y_k = \psi_k(\mathbf{z}), \quad k = 1, 2, \dots, n, \tag{6.39}
\]
where we assume, in order to slightly simplify the analysis, that each of the $\psi_k$ is not explicitly dependent upon the independent variable, $x$. The chain rule gives
\[
\frac{dy_k}{dx} = \sum_{i=1}^n\frac{\partial\psi_k}{\partial z_i}\frac{dz_i}{dx}
\]
showing that each of the $y_k'$ depends linearly upon the $z_i'$. These linear equations can be inverted to give $z_i'$ in terms of $y_k'$, $k = 1, 2, \dots, n$, if the $n \times n$ matrix with elements $\partial\psi_k/\partial z_i$ is nonsingular, that is if the Jacobian determinant, equation 1.26 (page 30), is non-zero. This is also the condition for the transformation between $\mathbf{y}$ and $\mathbf{z}$ to be invertible.
Under this transformation the functional
\[
S[\mathbf{y}] = \int_a^b dx\, F(x, \mathbf{y}, \mathbf{y}') \quad\text{becomes}\quad S[\mathbf{z}] = \int_a^b dx\, G(x, \mathbf{z}, \mathbf{z}') \tag{6.40}
\]
where
\[
G = F\!\left(x, \psi_1(\mathbf{z}), \psi_2(\mathbf{z}), \dots, \psi_n(\mathbf{z}), \sum_{i=1}^n\frac{\partial\psi_1}{\partial z_i}z_i', \dots, \sum_{i=1}^n\frac{\partial\psi_n}{\partial z_i}z_i'\right),
\]
that is, $G(x, \mathbf{z}, \mathbf{z}')$ is obtained from $F(x, \mathbf{y}, \mathbf{y}')$ simply by replacing $\mathbf{y}$ and $\mathbf{y}'$. In practice, of course, the transformation 6.39 is chosen to ensure that $G(x, \mathbf{z}, \mathbf{z}')$ is simpler than $F(x, \mathbf{y}, \mathbf{y}')$.

Exercise 6.16
Show that under the transformation $y_1 = \rho\cos\phi$, $y_2 = \rho\sin\phi$, $y_3 = z$, the functional
\[
S[y_1, y_2, y_3] = \int_a^b dt\,\left(\frac{1}{2}\left(y_1'^2 + y_2'^2 + y_3'^2\right) - V(\rho)\right), \quad \rho = \sqrt{y_1^2 + y_2^2},
\]
becomes
\[
S[\rho, \phi, z] = \int_a^b dt\,\left(\frac{1}{2}\left(\rho'^2 + \rho^2\phi'^2 + z'^2\right) - V(\rho)\right).
\]
Find the Euler-Lagrange equations and show that those for $\phi$ and $z$ are uncoupled.
6.4 The Inverse Problem
The ideas described in this chapter have shown that there are several advantages in formulating a system of differential equations as a variational principle. This naturally raises the question as to whether any given system of equations can be formulated in the form of the Euler-Lagrange equations and hence possesses an associated variational principle.
In this section it is shown that any second-order equation of the form
\[
\frac{d^2y}{dx^2} = f(x, y, y'), \tag{6.41}
\]
where $f(x, y, y')$ is a sufficiently well behaved function of the three variables, can, in principle, be expressed as a variational principle. When there are two or more dependent variables there is no such general result, although there are special classes of equations for which similar results hold: here, however, we do not discuss these more difficult cases.
First, consider linear, second-order equations, the most general equation of this type being
\[
a_2(x)\frac{d^2y}{dx^2} + a_1(x)\frac{dy}{dx} + a_0(x)y = b(x), \tag{6.42}
\]
where $a_k(x)$, $k = 0, 1$ and $2$, and $b(x)$ depend only upon $x$ and $a_2(x) \ne 0$ in the relevant interval of $x$. This equation may be transformed to the canonical form
\[
\frac{d}{dx}\left(p(x)\frac{dy}{dx}\right) + q(x)y = \frac{p(x)b(x)}{a_2(x)} \tag{6.43}
\]
where
\[
p(x) = \exp\left(\int dx\,\frac{a_1(x)}{a_2(x)}\right) \quad\text{and}\quad q(x) = \frac{a_0(x)}{a_2(x)}p(x). \tag{6.44}
\]
Inspection shows that the associated functional is
\[
S[y] = \int dx\,\left(p\left(\frac{dy}{dx}\right)^2 - qy^2 + \frac{2pb}{a_2}y\right). \tag{6.45}
\]
This is an important example and is discussed at length in chapter 13.
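Equations 6.43 to 6.45 are constructive: given $a_2$, $a_1$, $a_0$ and $b$ one can compute $p$ and $q$ directly. The sketch below is only an illustration; it assumes the Python library SymPy, and the sample coefficients are chosen arbitrarily (they happen to give Bessel's equation of order one, so $p = x$ and $q = x - 1/x$, which is also the integrand appearing in exercise 6.26 with $n = 1$).

```python
import sympy as sp
from sympy.calculus.euler import euler_equations

x = sp.Symbol('x', positive=True)
y = sp.Function('y')

# Sample linear equation a2 y'' + a1 y' + a0 y = b: here x^2 y'' + x y' + (x^2 - 1) y = 0
a2, a1, a0, b = x**2, x, x**2 - 1, 0

p = sp.exp(sp.integrate(a1/a2, x))     # equation 6.44: here p = x
q = (a0/a2)*p                          #                 and q = x - 1/x

# Functional 6.45 (with b = 0) and its Euler-Lagrange equation
F = p*y(x).diff(x)**2 - q*y(x)**2
el = euler_equations(F, y(x), x)[0]

# The Euler-Lagrange equation reproduces the canonical form (p y')' + q y = 0
canonical = sp.diff(p*y(x).diff(x), x) + q*y(x)
print(sp.simplify(el.lhs + 2*canonical))   # 0 (the EL left-hand side is -2 times it)
```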
Exercise 6.17
(a) Show that equations 6.42 and 6.43 are equivalent if $p(x)$ and $q(x)$ are defined as in equation 6.44.
(b) Show that the Euler-Lagrange equation associated with the functional 6.45 is equation 6.43.

Now consider the more general equation 6.41. Suppose that the equivalent Euler-Lagrange equation exists and is
\[
\frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right) - \frac{\partial F}{\partial y} = 0, \quad\text{that is}\quad y''F_{y'y'} + y'F_{yy'} + F_{xy'} - F_y = 0, \tag{6.46}
\]
where $F(x, y, y')$ is an unknown function. Using equation 6.41 we can express $y''$ in terms of $x$, $y$ and $y'$ to give
\[
fF_{y'y'} + y'F_{yy'} + F_{xy'} - F_y = 0, \tag{6.47}
\]
which is an equation relating the partial derivatives of $F$ and may therefore be regarded as a second-order partial differential equation for $F$. As it stands this equation is of limited practical value because it can rarely be solved directly. If, however, we define a new function $z = F_{y'y'}$, we shall see that $z$ satisfies a first-order equation for which the solutions are known to exist. This is seen by differentiating equation 6.47 with respect to $y'$,
\[
fF_{y'y'y'} + f_{y'}F_{y'y'} + y'F_{y'y'y} + F_{y'y'x} = 0.
\]
In terms of $z$ this becomes the first-order equation,
\[
f\frac{\partial z}{\partial y'} + y'\frac{\partial z}{\partial y} + \frac{\partial z}{\partial x} + f_{y'}z = 0. \tag{6.48}
\]
It can be shown that solutions of this partial differential equation for $z(x, y, y')$ exist, see for example Courant and Hilbert$^2$ (1937b). It follows that the function $F(x, y, y')$ exists and that equation 6.41 can be written as an Euler-Lagrange equation and that there is an associated functional.
Finding $F(x, y, y')$ explicitly, however, is not usually easy or even possible because this involves first solving the partial differential equation 6.48 and then integrating this solution twice with respect to $y'$. At either stage it may prove impossible to express the result in a useful form. Some examples illustrate this procedure in simple cases.
Consider the differential equation
\[
y'' = f(x, y)
\]

$^2$R Courant and D Hilbert, Methods of Mathematical Physics, Volume 2.
where the right-hand side is independent of $y'$. Then equation 6.48 contains only derivatives of $z$ and one solution is $z = c$, a constant. Now the equation $F_{y'y'} = z$ can be integrated directly to give
\[
F(x, y, y') = \frac{1}{2}cy'^2 + y'A(x, y) + B(x, y),
\]
where $A$ and $B$ are some functions of $x$ and $y$, but not $y'$. The derivatives of $F$ are
\[
F_y = y'A_y + B_y, \quad F_{yy'} = A_y, \quad F_{xy'} = A_x,
\]
so that the Euler-Lagrange equation 6.46 becomes
\[
cy'' = B_y - A_x.
\]
Comparing this with the original equation gives $c = 1$, and $B_y - A_x = f(x, y)$: two obvious solutions are
\[
B(x, y) = \int_{c_1}^y dv\, f(x, v), \quad A = 0 \quad\text{and}\quad A = -\int_{c_2}^x du\, f(u, y), \quad B = 0,
\]
where $c_1$ and $c_2$ are constants, so that the integrands of the required functional are
\[
F_1 = \frac{1}{2}y'^2 + \int_{c_1}^y dv\, f(x, v) \quad\text{or}\quad F_2 = \frac{1}{2}y'^2 - y'\int_{c_2}^x du\, f(u, y). \tag{6.49}
\]
It may seem strange that this procedure yields two seemingly quite different expressions for $F(x, y, y')$. But, recall that different functionals will give the same Euler-Lagrange equation if the integrands differ by a function that is the derivative with respect to $x$ of a function $g(x, y)$, see exercises 4.27 (page 140) and 6.22 (page 191). Thus, we expect that there is a function $g(x, y)$ such that $F_1 - F_2 = dg/dx$.
In the next exercise it is shown that the Euler-Lagrange equations associated with $F_1$ and $F_2$ are identical and an explicit expression is found for $g(x, y)$.
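For one particular, arbitrarily chosen right-hand side the claim that $F_1 - F_2$ is a total derivative can be checked directly; the following sketch assumes the Python library SymPy, and the function $f(x, y) = xy$ and the resulting $g$ are used purely for illustration, not as the general answer sought in the exercise below.

```python
import sympy as sp

x = sp.Symbol('x')
y = sp.Function('y')
u, v = sp.symbols('u v')

# Hypothetical sample right-hand side for y'' = f(x, y)
f = lambda a, b: a*b                    # f(x, y) = x*y, with c1 = c2 = 0

yp = y(x).diff(x)
F1 = yp**2/2 + sp.integrate(f(x, v), (v, 0, y(x)))
F2 = yp**2/2 - yp*sp.integrate(f(u, y(x)), (u, 0, x))

# For this sample the difference integrates to g(x, y) = x^2 y^2 / 4
g = x**2*y(x)**2/4
print(sp.simplify(F1 - F2 - sp.diff(g, x)))    # 0, so F1 - F2 = dg/dx here
```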
Exercise 6.18
(a) Show that $F_1$ and $F_2$, defined in equation 6.49, give the same Euler-Lagrange equations.
(b) Show that $F_1 - F_2 = \dfrac{d}{dx}g(x, y)$, and find $g(x, y)$.

Exercise 6.19
Find a functional for the equation $\dfrac{d^2y}{dx^2} + \omega\dfrac{dy}{dx} + y = 0$, where $\omega$ is a constant.
6.5 Miscellaneous exercises

Exercise 6.20
Using the functional $S[y] = \int_a^b dx\,\left(y'^2 - \omega^2y^2\right)$ and the change of variable $z = x^{1/c}$, show that the differential equation $y'' + \omega^2y = 0$ is transformed into
\[
z\frac{d^2y}{dz^2} + (1 - c)\frac{dy}{dz} + c^2\omega^2z^{2c-1}y = 0.
\]
Exercise 6.21
Show that the Euler-Lagrange equations for the functional $S[y_1, y_2] = \int_a^b dx\, F(y_1', y_2')$, which depends only upon the first derivatives of $y_1$ and $y_2$, are
\[
\frac{\partial^2F}{\partial y_1'^2}y_1'' + \frac{\partial^2F}{\partial y_1'\partial y_2'}y_2'' = 0 \quad\text{and}\quad \frac{\partial^2F}{\partial y_1'\partial y_2'}y_1'' + \frac{\partial^2F}{\partial y_2'^2}y_2'' = 0.
\]
Deduce that, provided the determinant
\[
d = \begin{vmatrix} \dfrac{\partial^2F}{\partial y_1'^2} & \dfrac{\partial^2F}{\partial y_1'\partial y_2'} \\[2.5ex] \dfrac{\partial^2F}{\partial y_1'\partial y_2'} & \dfrac{\partial^2F}{\partial y_2'^2} \end{vmatrix}
\]
is non-zero, the stationary paths are the straight lines
\[
y_1(x) = Ax + B, \quad y_2(x) = Cx + D
\]
where $A$, $B$, $C$ and $D$ are constants. Describe the solution if $d = 0$.
What is the equivalent condition if there is only one dependent variable?
Exercise 6.22
If $\Phi(x, y_1, y_2)$ is any twice differentiable function show that the functionals
\[
S_1[y_1, y_2] = \int_a^b dx\, F\!\left(x, y_1, y_2, y_1', y_2'\right) \quad\text{and}\quad
S_2[y_1, y_2] = \int_a^b dx\,\Big(F\!\left(x, y_1, y_2, y_1', y_2'\right) + \Omega\!\left(x, y_1, y_2, y_1', y_2'\right)\Big),
\]
where
\[
\Omega = \frac{d\Phi}{dx} = \frac{\partial\Phi}{\partial x} + \frac{\partial\Phi}{\partial y_1}y_1' + \frac{\partial\Phi}{\partial y_2}y_2' \quad\text{for some } \Phi(x, y_1, y_2),
\]
lead to the same Euler-Lagrange equations.
Note that this is the direct generalisation of the result derived in exercise 4.27 (page 140).
Exercise 6.23
Consider the two functionals
\[
S_1[y_1, y_2] = \int_a^b dx\,\left(\frac{1}{2}\left(y_1'^2 + y_2'^2\right) + g_1(x)y_1' + g_2(x)y_2' - V(x, y_1, y_2)\right)
\]
and
\[
S_2[y_1, y_2] = \int_a^b dx\,\left(\frac{1}{2}\left(y_1'^2 + y_2'^2\right) - \overline{V}(x, y_1, y_2)\right)
\]
where $\overline{V} = V + g_1'(x)y_1 + g_2'(x)y_2$. Use the result proved in the previous exercise to show that $S_1$ and $S_2$ give rise to identical Euler-Lagrange equations.
Exercise 6.24
(a) Show that the Euler-Lagrange equation of the functional
\[
S[y] = \int_0^{\infty} dx\, e^{3e^{-x}-x}\left(y - e^{x}y'\right)^2
\]
is
\[
\frac{d^2y}{dx^2} - \left(3e^{-x} - 1\right)\frac{dy}{dx} + 2e^{-2x}y = 0. \tag{6.50}
\]
(b) Show that the change of variables $u = e^{-x}$, with inverse $x = \ln(1/u)$, transforms this functional to
\[
S[Y] = \int_0^1 du\, e^{3u}\left(Y(u) + Y'(u)\right)^2, \quad Y(u) = y(\ln(1/u)),
\]
and that the Euler-Lagrange equation for this functional is
\[
\frac{d^2Y}{du^2} + 3\frac{dY}{du} + 2Y = 0. \tag{6.51}
\]
(c) By making the substitution $x = \ln(1/u)$, show that equation 6.50 transforms into equation 6.51.
Exercise 6.25
Show that the stationary paths of the functional $S[y, z] = \int_0^{\pi/2} dx\,\left(y'^2 + z'^2 + 2yz\right)$, with the boundary conditions $y(0) = 0$, $z(0) = 0$, $y(\pi/2) = 3/2$ and $z(\pi/2) = 1/2$, satisfy the equations
\[
\frac{d^2y}{dx^2} - z = 0, \quad \frac{d^2z}{dx^2} - y = 0.
\]
Show that the solution of these equations is
\[
y(x) = \frac{\sinh x}{\sinh(\pi/2)} + \frac{1}{2}\sin x, \quad z(x) = \frac{\sinh x}{\sinh(\pi/2)} - \frac{1}{2}\sin x.
\]
Exercise 6.26
The ordinary Bessel function, denoted by $J_n(x)$, is defined to be proportional to the solution of the second-order differential equation
\[
x^2\frac{d^2y}{dx^2} + x\frac{dy}{dx} + (x^2 - n^2)y = 0, \quad n = 0, 1, 2, \dots, \tag{6.52}
\]
that behaves as $(x/2)^n$ near the origin.
(a) Show that equation 6.52 is the Euler-Lagrange equation of the functional
\[
F[y] = \int_0^X dx\,\left(xy'(x)^2 - \left(x - \frac{n^2}{x}\right)y(x)^2\right), \quad y(X) = Y \ne 0,
\]
where the admissible functions have continuous second derivatives.
(b) Define a new independent variable $u$ by the equation $x = f(u)$, where $f(u)$ is monotonic and smooth, and set $w(u) = y(f(u))$ to cast this functional into the form
\[
F[w] = \int_{u_0}^{u_1} du\,\left(\frac{f(u)}{f'(u)}w'(u)^2 - \left(f(u)f'(u) - n^2\frac{f'(u)}{f(u)}\right)w(u)^2\right),
\]
where $f(u_0) = 0$ and $f(u_1) = X$.
(c) Hence show that if $f(u) = e^u$, $w(u)$ satisfies the equation
\[
\frac{d^2w}{du^2} + \left(e^{2u} - n^2\right)w = 0 \tag{6.53}
\]
and deduce that a solution of equation 6.53 is $w(u) = J_n(e^u)$.
Chapter 7

Symmetries and Noether's theorem

7.1 Introduction
In this chapter we show how symmetries can help solve the Euler-Lagrange equations. The simplest example of the theory presented here was introduced in section 4.4.1 where it was shown that a first-integral existed if the integrand did not depend explicitly upon the independent variable. This simplification was used to help solve the brachistochrone and the minimum surface area problems, sections 5.2 and 5.3. Here we show how the first-integral may be derived using a more general principle which can be used to derive other first-integrals.
Students knowing some dynamics will be aware of how important the conservation of energy, linear and angular momentum can be: the theory described in section 7.3 unifies all these conservation laws. In addition these ideas may be extended to deal with those partial differential equations that can be derived from a variational principle, although this theory is not included in the present course.
7.2 Symmetries
The Euler-Lagrange equations for the brachistochrone, section 5.2, and the minimal surface area, section 5.3, were solved using the fact that the integrand, $F(y, y')$, does not depend explicitly upon $x$, that is, $\partial F/\partial x = 0$. In this situation it was shown in exercise 4.7 (page 131) that
\[
y'(x)\left(\frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right) - \frac{\partial F}{\partial y}\right) = \frac{d}{dx}\left(y'\frac{\partial F}{\partial y'} - F\right).
\]
This result is important because it shows that if $y(x)$ satisfies the second-order Euler equation it also satisfies the first-order equation
\[
y'\frac{\partial F}{\partial y'} - F = \text{constant},
\]
which is often simpler: we ignore the possibility that $y'(x) = 0$ for the reasons discussed after exercise 7.8. This result is proved algebraically in exercise 4.7 (page 131), and there we relied only on the fact that $\partial F/\partial x = 0$. In the following section we re-derive this result using the equivalent but more fundamental notion that the integrand $F(y, y')$ is invariant under translations in $x$: this is a fruitful method because it is more readily generalised to other types of transformations; for instance in three-dimensional problems the integrand of the functional may be invariant under all rotations, or just rotations about a given axis. The general theory is described in section 7.3, but first we introduce the method by applying it to functionals that are invariant under translations.

7.2.1 Invariance under translations
The algebra of the following analysis is fairly complicated and requires careful thought at each stage, so you may need to read it carefully several times and complete the intermediate steps.
Consider the functional
\[
S[y] = \int_a^b dx\, F(y, y'), \quad y(a) = A, \quad y(b) = B, \tag{7.1}
\]
where the integrand does not depend explicitly upon $x$, that is $\partial F/\partial x = 0$. The stationary function, $y(x)$, describes a curve $C$ in the two-dimensional space with axes $Oxy$, so that a point, P, on the curve has coordinates $(x, y(x))$ as shown in figure 7.1.

Figure 7.1: Diagram showing the two coordinate systems $Oxy$ and $\overline{O}\overline{x}\overline{y}$, connected by a translation along the $x$-axis by a distance $\delta$.

Consider now the coordinate system $\overline{O}\overline{x}\overline{y}$ where $x = \overline{x} + \delta$ and $y = \overline{y}$, with the origin, $\overline{O}$, of this system at $x = \delta$, $y = 0$ in the original coordinate system; that is, $\overline{O}\overline{x}\overline{y}$ is translated from $Oxy$ a distance $\delta$ along the $x$-axis. In this coordinate system the curve $C$ is described by $\overline{y}(\overline{x})$, so the coordinates of a point P are $(\overline{x}, \overline{y}(\overline{x}))$ and these are related to coordinates in $Oxy$, $(x, y(x))$, by
\[
x = \overline{x} + \delta \quad\text{and}\quad \overline{y}(\overline{x}) = y(x) \quad\text{or}\quad \overline{y}(\overline{x}) = y(\overline{x} + \delta),
\]
the latter equation defining the function $\overline{y}$; differentiation, using the chain rule, gives
\[
\frac{d\overline{y}}{d\overline{x}} = \frac{dy}{dx}\frac{dx}{d\overline{x}} = \frac{dy}{dx}.
\]
The functional 7.1 can be computed in either coordinate system and, for reasons that will soon become apparent, we consider the integral in the $\overline{O}\overline{x}\overline{y}$ representation over a limited, but arbitrary, range
\[
G = \int_{\overline{c}}^{\overline{d}} du\, F\!\left(\overline{y}(u), \overline{y}'(u)\right) \quad\text{where}\quad \overline{y}' = \frac{d\overline{y}}{du},
\]
$\overline{c} = c - \delta$, $\overline{d} = d - \delta$ and where $a < c < d < b$. The integrand of $G$ depends on $u$ only through the function $\overline{y}(u)$: this means that at each value of $u$ the integrand has the same value as the integrand of $S[y]$ at the equivalent point, $x = u + \delta$. Hence
\[
\int_{\overline{c}}^{\overline{d}} du\, F\!\left(\overline{y}(u), \overline{y}'(u)\right) = \int_c^d dx\, F\!\left(y(x), y'(x)\right) \quad\text{where}\quad x = u + \delta, \tag{7.2}
\]
and this is true for all $\delta$.
Now consider small values of $\delta$ and expand to $O(\delta)$, first writing the integral in the form
\[
G = \int_{\overline{c}}^{\overline{d}} du\, F(\overline{y}(u), \overline{y}'(u))
= \int_c^d du\, F(\overline{y}, \overline{y}') + \int_{\overline{c}}^c du\, F(\overline{y}, \overline{y}') - \int_{\overline{d}}^d du\, F(\overline{y}, \overline{y}'). \tag{7.3}
\]
But, for small $\delta$
\[
\int_{z-\delta}^z du\, g(u) = \delta g(z) + O(\delta^2),
\]
and to this order
\[
\overline{y}(u) = y(u + \delta) = y(u) + \delta y'(u) + O(\delta^2), \quad\text{and}\quad \overline{y}'(u) = y'(u) + \delta y''(u) + O(\delta^2).
\]
Thus the expression 7.3 for $G$ becomes, to first order in $\delta$,
\begin{align*}
G &= \int_c^d du\, F(y + \delta y', y' + \delta y'') - \delta\Big[F(y, y')\Big]_c^d + O(\delta^2) \\
  &= \int_c^d du\,\left(F(y, y') + \delta\frac{\partial F}{\partial y}y' + \delta\frac{\partial F}{\partial y'}y''\right) - \delta\Big[F(y, y')\Big]_c^d + O(\delta^2).
\end{align*}
Because of equation 7.2 this gives
\[
0 = \delta\int_c^d du\,\left(y'\frac{\partial F}{\partial y} + y''\frac{\partial F}{\partial y'}\right) - \delta\Big[F(y, y')\Big]_c^d + O(\delta^2). \tag{7.4}
\]
Now integrate the second integral by parts,
\[
\int_c^d du\, y''\frac{\partial F}{\partial y'} = \left[y'\frac{\partial F}{\partial y'}\right]_c^d - \int_c^d du\, y'\frac{d}{du}\left(\frac{\partial F}{\partial y'}\right).
\]
Substituting this into 7.4 and dividing by $\delta$ gives
\[
0 = \left[y'\frac{\partial F}{\partial y'} - F\right]_c^d - \int_c^d du\, y'\left(\frac{d}{du}\left(\frac{\partial F}{\partial y'}\right) - \frac{\partial F}{\partial y}\right) + O(\delta). \tag{7.5}
\]
But $y(u)$ is a solution of the Euler-Lagrange equation, so the integrand is identically zero, and hence on letting $\delta \to 0$,
\[
\left[F - y'\frac{\partial F}{\partial y'}\right]_{x=c} = \left[F - y'\frac{\partial F}{\partial y'}\right]_{x=d}.
\]
Finally, recall that $c$ and $d$ are arbitrary and hence, for any $x$ in the interval $a < x < b$,
\[
y'\frac{\partial F}{\partial y'} - F = \text{constant}. \tag{7.6}
\]
Because the function on the left-hand side is continuous the equality is true in the interval $a \le x \le b$. This relation is always true if the integrand of the functional does not depend explicitly upon $x$, that is $\partial F/\partial x = 0$.
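A quick numerical illustration of equation 7.6 is sketched below; it assumes the Python library NumPy, and the integrand is chosen arbitrarily. For $F(y, y') = y'^2 + y^2$ the Euler-Lagrange equation is $y'' = y$, a stationary path is $y = \cosh x$, and along it the quantity $y'F_{y'} - F = y'^2 - y^2$ should be constant.

```python
import numpy as np

x = np.linspace(0.0, 2.0, 201)
y, yp = np.cosh(x), np.sinh(x)        # a stationary path of y'' = y

# First-integral y' F_{y'} - F for F = y'^2 + y^2
first_integral = yp*(2*yp) - (yp**2 + y**2)    # equals y'^2 - y^2
print(first_integral.min(), first_integral.max())   # both -1: constant along the path
```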
Equation 7.6 relates $y'(x)$ and $y(x)$ and by rearranging it we obtain one, or more, first-order equations for the unknown function $y(x)$. Noether's theorem, stated below, shows that solutions of the Euler-Lagrange equation also satisfy equation 7.6. In practice, because this equation is usually easier to solve than the original Euler-Lagrange equation, it is implicitly assumed that its solutions are also solutions of the Euler-Lagrange equation. In general this is true, but the examples treated in exercises 4.8 (page 132) and 7.8 show that care is sometimes needed. By differentiating equation 7.6 with respect to $x$ the Euler-Lagrange equation is regained, as shown in exercise 4.7 (page 131).
The function $y'F_{y'} - F$ is named the first-integral of the Euler-Lagrange equation, this name being suggestive of it being derived by integrating the original second-order equation once to give a first-order equation. For the same reason in dynamics, quantities that are conserved, for instance energy, linear and angular momentum, are also named first-integrals, integrals of the motion or constants of the motion, and these dynamical quantities have exactly the same mathematical origin as the first-integral defined in equation 7.6.
This proof of equation 7.6 may seem a lot more elaborate than that given in exercise 4.7 (page 131). However, there are circumstances when the algebra required to use the former method is too unwieldy to be useful and then the present method is superior. An example of such a problem is given in exercise 7.12.
In the context of Newtonian dynamics the equivalent of equation 7.6 is the conservation of energy in those circumstances when the forces are conservative and are independent of the time; that is, in Newtonian mechanics energy conservation is a consequence of the invariance of the equations of motion under translations in time. Similarly, invariance under translations in space gives rise to conservation of linear momentum and invariance under rotations in space gives rise to conservation of angular momentum.
As an example of a functional that is not invariant under translations of the independent variable, consider
\[
J[y] = \int_a^b dx\, x\,y'(x)^2, \quad y(a) = A, \quad y(b) = B.
\]
It is instructive to go through the above proof to see where and how it breaks down.
In this case equation 7.2 becomes
\[
J[\overline{y}] = \int_{\overline{c}}^{\overline{d}} du\, u\,\overline{y}'(u)^2 = \int_c^d dv\,(v - \delta)y'(v)^2 = J[y] - \delta\int_c^d dv\, y'(v)^2 \ne J[y],
\]
and we see how the explicit dependence on $x$ destroys the invariance needed for the existence of the first-integral.
7.3 Noether's theorem
In this section we treat functionals having several dependent variables. The analysis is a straightforward generalisation of that presented above but takes time to absorb. For a first reading, ensure that you understand the fundamental ideas and try to avoid getting lost in algebraic details. That is, you should try to understand the definition of an invariant functional and the meaning of Noether's theorem, rather than the proof, and should be able to do exercises 7.1 to 7.3.
There are two ingredients to Noether's theorem:
(i) functionals that are invariant under transformations of either or both dependent and independent variables;
(ii) families of transformations that depend upon one or more real parameters, though here we deal with situations where there is just one parameter.
We consider each of these in turn in relation to the functional
\[
S[\mathbf{y}] = \int_a^b dx\, F(x, \mathbf{y}, \mathbf{y}'), \quad \mathbf{y} = (y_1, y_2, \dots, y_n), \tag{7.7}
\]
which has stationary paths defined by the solutions of the Euler-Lagrange equations. We do not include boundary conditions because they play no role in this theory.
The value of the functional depends upon the path taken which, in this section, is not always restricted to stationary paths. We shall consider the change in the value of the functional when the path is changed according to a given transformation: in particular we are interested in those transformations which change the path but not the value of the functional.
Consider, for instance, the two functionals
\[
S_1[\mathbf{y}] = \int_0^1 dx\,\left(y_1'^2 + y_2'^2\right) \quad\text{and}\quad S_2[\mathbf{y}] = \int_0^1 dx\,\left(y_1'^2 + y_2'^2\right)y_1. \tag{7.8}
\]
A path can be defined by the pair of functions $(f(x), g(x))$, $0 \le x \le 1$, and on each such path the functionals have a value.
Consider the transformation
\[
\begin{aligned}
\overline{y}_1 &= y_1\cos\theta - y_2\sin\theta \\
\overline{y}_2 &= y_1\sin\theta + y_2\cos\theta
\end{aligned}
\qquad\text{with inverse}\qquad
\begin{aligned}
y_1 &= \overline{y}_1\cos\theta + \overline{y}_2\sin\theta \\
y_2 &= -\overline{y}_1\sin\theta + \overline{y}_2\cos\theta
\end{aligned}
\tag{7.9}
\]
200 CHAPTER 7. SYMMETRIES AND NOETHERS THEOREM
which can be interpreted as an anticlockwise rotation in the (y
1
, y
2
)-plane through an
angle . Hence under this transformation the curve is rotated bodily to the curve ,
as shown in gure 7.2.
Original curve
Rotated curve
y
1
y
2

Figure 7.2 Diagram showing the rotation of the curve


anticlockwise through the angle to the curve .
The points on are parametrised by (f(x), g(x)), 0 x 1, where
f = f cos g sin and g = f sin +g cos . (7.10)
Hence on the functional S
1
has the value
S
1
() =
_
1
0
dx
_
f

(x)
2
+g

(x)
2
_
and on it has the value
S
1
() =
_
1
0
dx
_
f

(x)
2
+g

(x)
2
_
.
But on using equation 7.10 we obtain f

(x)
2
+g

(x)
2
= f

(x)
2
+g

(x)
2
which gives
S
1
() =
_
1
0
dx
_
f

(x)
2
+g

(x)
2
_
= S
1
().
That is the functional S
1
has the same value on and for all and is therefore
invariant with respect to the rotation 7.9.
On the other hand the values of S
2
are
S
2
() =
_
1
0
dx
_
f

(x)
2
+g

(x)
2
_
f(x)
and
S
2
() =
_
1
0
dx
_
f

(x)
2
+g

(x)
2
_
f(x)
=
_
1
0
dx
_
f

(x)
2
+g

(x)
2
_ _
f(x) cos g(x) sin
_
= S
2
() cos sin
_
1
0
dx
_
f

(x)
2
+g

(x)
2
_
g(x).
7.3. NOETHERS THEOREM 201
In this case the functional has dierent values on and , unless is an integer multiple
of 2. That is, S
2
[y] is not invariant with respect to the rotation 7.9.
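The two calculations above are easy to repeat numerically. The following sketch assumes the Python library NumPy; the path and rotation angle are arbitrary sample choices and the integrals are approximated by the trapezium rule.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 2001)
f, g = x**2, np.sin(x)                   # an arbitrary sample path (f(x), g(x))
theta = 0.7                              # an arbitrary rotation angle

fb = f*np.cos(theta) - g*np.sin(theta)   # rotated path, equation 7.10
gb = f*np.sin(theta) + g*np.cos(theta)

def d(h):                                # numerical derivative h'(x)
    return np.gradient(h, x)

def integral(h):                         # trapezium rule on the grid x
    return np.sum(0.5*(h[1:] + h[:-1])*np.diff(x))

S1, S1b = integral(d(f)**2 + d(g)**2), integral(d(fb)**2 + d(gb)**2)
S2, S2b = integral((d(f)**2 + d(g)**2)*f), integral((d(fb)**2 + d(gb)**2)*fb)

print(S1, S1b)   # equal: S1 is invariant under the rotation
print(S2, S2b)   # different: S2 is not invariant
```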
The transformation 7.9 does not involve changes to the independent variable $x$, whereas the transformation considered in section 7.2.1 involved only a change in the independent variable, via a translation along the $x$-axis, see figure 7.1. In general it is necessary to deal with a transformation in both dependent and independent variables, which can be written as
\[
\overline{x} = \Phi(x, \mathbf{y}, \mathbf{y}') \quad\text{and}\quad \overline{y}_k = \Psi_k(x, \mathbf{y}, \mathbf{y}'), \quad k = 1, 2, \dots, n. \tag{7.11}
\]
We assume that these relations can be inverted to give $x$ and $\mathbf{y}$ in terms of $\overline{x}$ and $\overline{\mathbf{y}}$. For a curve $\Gamma$, defined by the equation $\mathbf{y} = \mathbf{f}(x)$, $a \le x \le b$, this transformation moves $\Gamma$ to another curve $\overline{\Gamma}$ defined by the transformed equation $\overline{\mathbf{y}} = \overline{\mathbf{f}}(\overline{x})$.

Definition 7.1
The functional 7.7 is said to be invariant under the transformation 7.11 if $\overline{G} = G$ where
\[
G = \int_c^d dx\, F\!\left(x, \mathbf{y}, \frac{d\mathbf{y}}{dx}\right), \qquad
\overline{G} = \int_{\overline{c}}^{\overline{d}} d\overline{x}\, F\!\left(\overline{x}, \overline{\mathbf{y}}, \frac{d\overline{\mathbf{y}}}{d\overline{x}}\right),
\]
and where $\overline{c} = \Phi(c, \mathbf{y}(c), \mathbf{y}'(c))$ and $\overline{d} = \Phi(d, \mathbf{y}(d), \mathbf{y}'(d))$, for all $c$ and $d$ satisfying $a \le c < d \le b$.

The meaning of the equality $\overline{G} = G$ is easiest to understand if $\overline{x} = x$. Then the functions $\mathbf{y}(x)$ and $\overline{\mathbf{y}}(x)$ define two curves, $\Gamma$ and $\overline{\Gamma}$, in an $n$-dimensional space, each parametrised by the independent variable $x$. The functional $G$ is the integral of $F(x, \mathbf{y}, \mathbf{y}')$ along $\Gamma$ and $\overline{G}$ is the integral of the same function along $\overline{\Gamma}$.
In the case $\overline{x} \ne x$ the parametrisation along $\Gamma$ and $\overline{\Gamma}$ is changed. An important example of a change to the independent variable, $x$, is the uniform shift $\overline{x} = x + \delta$, where $\delta$ is independent of $x$, $\mathbf{y}$ and $\mathbf{y}'$, which is the example dealt with in the previous section. The scale transformation, whereby $\overline{x} = (1 + \delta)x$, is also useful, see exercise 7.8.
A one-parameter family of transformations is the set of transformations
\[
\overline{x} = \Phi(x, \mathbf{y}, \mathbf{y}'; \epsilon) \quad\text{and}\quad \overline{y}_k = \Psi_k(x, \mathbf{y}, \mathbf{y}'; \epsilon), \quad k = 1, 2, \dots, n, \tag{7.12}
\]
depending upon the single parameter $\epsilon$, which reduces to the identity when $\epsilon = 0$, that is
\[
x = \Phi(x, \mathbf{y}, \mathbf{y}'; 0) \quad\text{and}\quad y_k = \Psi_k(x, \mathbf{y}, \mathbf{y}'; 0), \quad k = 1, 2, \dots, n,
\]
and where $\Phi$ and all the $\Psi_k$ have continuous first derivatives in all variables, including $\epsilon$. This last condition ensures that the transformation is invertible in the neighbourhood of $\epsilon = 0$, provided the Jacobian determinant is not zero. An example of a one-parameter family of transformations is defined by equation 7.9, which becomes the identity when $\theta = 0$.
Exercise 7.1
Which of the following is a one-parameter family of transformations?
(a) $\overline{x} = x - \epsilon y$, $\overline{y} = y + \epsilon x$,
(b) $\overline{x} = x\cosh\theta - y\sinh\theta$, $\overline{y} = x\sinh\theta - y\cosh\theta$,
(c) $\overline{\mathbf{y}} = \mathbf{y}\exp(\epsilon A)$ where $A$ is a square, non-singular, $n \times n$ matrix.
Note that if $B$ is a square matrix the matrix $e^B$ is defined by the sum $e^B = \sum_{k=0}^{\infty} B^k/k!$.
Exercise 7.2
Families of transformations are very common and are often generated by solutions of differential equations, as illustrated by the following example.
(a) Show that the solution of the equation
\[
\frac{dy}{dt} = y(1 - y), \quad 0 \le y(0) \le 1, \quad\text{is}\quad y = \phi(z, t) = \frac{ze^t}{1 + (e^t - 1)z}, \quad\text{where}\quad z = y(0).
\]
(b) Show that this defines a one-parameter family of transformations, $y = \phi(z, t)$, with parameter $t$, so that for each $t$, $\phi(z, t)$ transforms the initial point $z$ to the value $y(t)$.
Exercise 7.3
Show that the functional $S[\mathbf{y}] = \int_a^b dx\,\left(y_1'^2 - y_2'^2\right)$ is invariant under the transformation
\[
\overline{y}_1 = y_1\cosh\theta + y_2\sinh\theta, \quad \overline{y}_2 = y_1\sinh\theta + y_2\cosh\theta, \quad \overline{x} = x + g(x),
\]
only if $g(x)$ is a constant.
We have finally arrived at the main part of this section, the statement and proof of Noether's theorem. The theorem was published in 1918 by Emmy Noether (1882-1935), a German mathematician, considered to be one of the most creative abstract algebraists of modern times. The theorem was derived for certain variational principles, and has important applications to physics, especially relativity and quantum mechanics, besides systematising many of the known results of classical dynamics; in particular it provides a uniform description of the laws of conservation of energy, linear and angular momentum, which are, respectively, due to invariance of the equations of motion under translations in time, space and rotations in space. The theorem can also be applied to partial differential equations that can be derived from a variational principle.
The theorem deals with arbitrarily small changes in the coordinates, so in equation 7.12 we assume $|\epsilon| \ll 1$ and write the transformation in the form
\[
\begin{aligned}
\overline{x} &= x + \epsilon\varphi(x, \mathbf{y}, \mathbf{y}') + O(\epsilon^2), & \varphi &= \frac{\partial\Phi}{\partial\epsilon}\bigg|_{\epsilon=0}, \\
\overline{y}_k &= y_k + \epsilon\psi_k(x, \mathbf{y}, \mathbf{y}') + O(\epsilon^2), & \psi_k &= \frac{\partial\Psi_k}{\partial\epsilon}\bigg|_{\epsilon=0},
\end{aligned}
\tag{7.13}
\]
where we have used the fact that when $\epsilon = 0$ the transformation becomes the identity. In all subsequent analysis second-order terms in the parameter, here $\epsilon$, are ignored.

Exercise 7.4
Show that to first order in $\theta$ the rotation defined by equation 7.9 becomes
\[
\overline{y}_1 = y_1 + \theta\psi_1, \quad \overline{y}_2 = y_2 + \theta\psi_2 \quad\text{where}\quad \psi_1 = -y_2 \ \text{and}\ \psi_2 = y_1.
\]
Theorem 7.1
Noether's theorem: If the functional
\[ S[y]=\int_c^d dx\,F(x,y,y') \tag{7.14} \]
is invariant under the family of transformations 7.13, for arbitrary $c$ and $d$, then
\[ \sum_{k=1}^{n}\psi_k\frac{\partial F}{\partial y_k'}+\gamma\left(F-\sum_{k=1}^{n}y_k'\frac{\partial F}{\partial y_k'}\right)=\text{constant} \tag{7.15} \]
along each stationary path of $S[y]$.

The function defined on the left-hand side of this equation is often named a first-integral of the Euler-Lagrange equations.¹

In the one-dimensional case, $n=1$, and when $\overline{y}=y$ ($\psi=0$) and $\gamma=1$, equation 7.15 reduces to the result derived in the previous section, equation 7.6 (page 198). In general, if $n=1$, equation 7.15 furnishes a first-order differential equation which is usually easier to solve than the corresponding second-order Euler-Lagrange equation, as was seen in sections 5.2 and 5.3. Normally, solutions of this first-order equation are also solutions of the Euler-Lagrange equations: however, this is not always true, so some care is sometimes needed, see for instance exercise 7.8 and the following discussion. A proof of Noether's theorem is given after the following exercises.
Exercise 7.5
Use the fact that the functional $S[y]=\int_a^b dx\,\bigl(y_1'^{\,2}+y_2'^{\,2}\bigr)$ is invariant under the rotation defined by equation 7.9 and the result derived in exercise 7.4 to show that a first-integral, equation 7.15, is $y_1y_2'-y_2y_1'=\text{constant}$.
In the context of dynamics this first-integral is the angular momentum.
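As a quick check of this result (the script below is mine, not part of the text), the Euler-Lagrange equations for this functional are $y_1''=y_2''=0$, so the stationary paths are straight lines, and along them the first-integral is indeed independent of $x$.

```python
import sympy as sp

x, a1, b1, a2, b2 = sp.symbols('x a1 b1 a2 b2')

# General solution of the Euler-Lagrange equations y1'' = y2'' = 0
y1, y2 = a1 + b1*x, a2 + b2*x

# First-integral of exercise 7.5: the "angular momentum" y1*y2' - y2*y1'
I = y1*sp.diff(y2, x) - y2*sp.diff(y1, x)
print(sp.simplify(I))              # a1*b2 - a2*b1, independent of x
print(sp.diff(I, x).simplify())    # 0
```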
Exercise 7.6
Show that the functional $S[y]=\int_a^b dx\,\bigl(y_1'^{\,2}+y_2'^{\,2}\bigr)$ is invariant under the following three transformations
(i) $\overline{y}_1=y_1+\epsilon g(x)$, $\overline{y}_2=y_2$, $\overline{x}=x$,
(ii) $\overline{y}_1=y_1$, $\overline{y}_2=y_2+\epsilon g(x)$, $\overline{x}=x$,
(iii) $\overline{y}_1=y_1$, $\overline{y}_2=y_2$, $\overline{x}=x+\epsilon g(x)$,
only if $g(x)$ is a constant.
In the case $g=1$ show that these three invariant transformations lead to the first-integrals
(i) $y_1'=\text{constant}$, (ii) $y_2'=\text{constant}$, (iii) $y_1'^{\,2}+y_2'^{\,2}=\text{constant}$.
¹ The name first-integral comes from the time when differential equations were solved by successive integration, with $n$ integrations being necessary to find the general solution of an $n$th order equation. The term solution dates back to Lagrange, but it was Poincaré who established its use; what is now named a solution used to be called an integral or a particular integral.
Exercise 7.7
Show that the functional
\[ S[y]=\int_a^b dx\left[\frac{1}{2}\bigl(y_1'^{\,2}+y_2'^{\,2}\bigr)+V(y_1-y_2)\right], \]
where $V(z)$ is a differentiable function, is invariant under the transformations
\[ \overline{y}_1=y_1+\epsilon g(x), \quad \overline{y}_2=y_2+\epsilon g(x), \quad \overline{x}=x, \]
only if $g(x)$ is a constant, and then a first-integral is $y_1'+y_2'=\text{constant}$.
Exercise 7.8
A version of the Emden-Fowler equation can be written in the form
\[ \frac{d^2y}{dx^2}+\frac{2}{x}\frac{dy}{dx}+y^5=0. \]
(a) Show that this is the Euler-Lagrange equation associated with the functional
\[ S[y]=\int_a^b dx\,x^2\left(y'^{\,2}-\frac{1}{3}y^6\right). \]
(b) Show that this functional is invariant under the scaling transformation $\overline{x}=\lambda x$, $\overline{y}=\mu y$, where $\lambda$ and $\mu$ are constants satisfying $\lambda\mu^2=1$. Use Noether's theorem to deduce that a first-integral is
\[ x^2yy'+x^3\left(y'^{\,2}+\frac{1}{3}y^6\right)=c, \]
where $c$ is a constant.
(c) By substituting the trial function $y=Ax^{\gamma}$ into the first-integral, find a value of $\gamma$ that yields a solution of the first-integral, for any $A$. Show also that this is a solution of the original Euler-Lagrange equation, but only for particular values of $A$.
(d) By differentiating the first-integral, obtain the following equation for $y''$,
\[ \bigl(xy''+2y'+xy^5\bigr)\bigl(2x^2y'+xy\bigr)=0. \]
Show that the solutions of this equation are $y=Ax^{-1/2}$, for any constant $A$, together with the solutions of the Euler-Lagrange equation.
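The two claims in part (c) can be checked directly with a computer algebra system; the short sketch below (not part of the text) substitutes $y=Ax^{-1/2}$ and shows that the first-integral is constant for every $A$, while the Emden-Fowler equation itself is satisfied only for particular values of $A$.

```python
import sympy as sp

x, A = sp.symbols('x A', positive=True)
y = A*x**sp.Rational(-1, 2)            # trial solution from part (d)
yp = sp.diff(y, x)

# First-integral of part (b): independent of x for every A
I = x**2*y*yp + x**3*(yp**2 + y**6/3)
print(sp.simplify(I))                             # A**6/3 - A**2/4

# Emden-Fowler equation: satisfied only when this expression vanishes
EL = sp.diff(y, x, 2) + 2*yp/x + y**5
print(sp.simplify(EL*x**sp.Rational(5, 2)))       # A**5 - A/4
```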
In the previous exercise we saw that the first-order differential equation defined by the first-integral had a solution that was not a solution of the original Euler-Lagrange equation. This feature, surprising at first, is typical and a consequence of the original equation being nonlinear in $x$ and $y$. In general the Euler-Lagrange equation can be written in the form $y''=f(x,y,y')$; suppose this possesses the first-integral $\Phi(x,y,y')=c$, then differentiation of this eliminates the constant $c$ to give the second-order equation $y''\Phi_{y'}+y'\Phi_y+\Phi_x=0$, which is linear in $y''$. By definition solutions of the Euler-Lagrange equation satisfy the first-integral, so this equation factorises in the form
\[ y''\Phi_{y'}+y'\Phi_y+\Phi_x=\bigl(y''-f(x,y,y')\bigr)\,g(x,y,y')=0, \]
for some function $g(x,y,y')$, which may be a constant. This latter equation may also have another solution, given by $g(x,y,y')=0$, which may be integrated to furnish a relation between $x$ and $y$ involving the single constant $c$. Usually this function is not a solution of the original Euler-Lagrange equation.

The general solution of the Euler-Lagrange equation contains two independent constants, which are determined by the boundary conditions, and these can be varied independently. The extra solution of the first-integral, if it exists, can involve at most one constant, so does not usually satisfy both boundary conditions. A simple example of this was seen in the minimal surface area problem, equation 5.14 (page 157).
7.3.1 Proof of Noether's theorem
Noether's theorem is proved by substituting the transformation 7.13 into the functional 7.14 and expanding to first order in $\epsilon$. The algebra is messy, so we proceed in two stages.

First, we assume that $\overline{x}=x$, that is $\gamma=0$, which simplifies the algebra. It is easiest to start with the transformed functional
\[ \overline{G}=\int_c^d d\overline{x}\,F\!\left(\overline{x},\overline{y},\frac{d\overline{y}}{d\overline{x}}\right), \quad (\text{since } \overline{x}=x). \]
Now substitute for $\overline{y}$ and $\overline{y}'$ and expand to first order in $\epsilon$ to obtain
\[ \overline{G}=\int_c^d dx\,F\!\left(x,y+\epsilon\psi,y'+\epsilon\frac{d\psi}{dx}\right)
 =\int_c^d dx\,F(x,y,y')+\epsilon\int_c^d dx\sum_{k=1}^n\left(\frac{\partial F}{\partial y_k}\psi_k+\frac{\partial F}{\partial y_k'}\frac{d\psi_k}{dx}\right). \tag{7.16} \]
But the first term is merely the untransformed functional which, by definition, equals the transformed functional because it is invariant under the transformation. Also, using integration by parts,
\[ \int_c^d dx\,\frac{\partial F}{\partial y_k'}\frac{d\psi_k}{dx}=\left[\psi_k\frac{\partial F}{\partial y_k'}\right]_c^d-\int_c^d dx\,\psi_k\frac{d}{dx}\left(\frac{\partial F}{\partial y_k'}\right), \]
and hence, by substituting this result into equation 7.16, we obtain
\[ 0=\int_c^d dx\sum_{k=1}^n\psi_k\left\{\frac{\partial F}{\partial y_k}-\frac{d}{dx}\left(\frac{\partial F}{\partial y_k'}\right)\right\}+\left[\sum_{k=1}^n\psi_k\frac{\partial F}{\partial y_k'}\right]_c^d. \tag{7.17} \]
The term in curly brackets is, by virtue of the Euler-Lagrange equation, zero on a stationary path and hence it follows that
\[ \sum_{k=1}^n\psi_k\frac{\partial F}{\partial y_k'}\bigg|_{x=d}=\sum_{k=1}^n\psi_k\frac{\partial F}{\partial y_k'}\bigg|_{x=c}. \]
Since $c$ and $d$ are arbitrary we obtain equation 7.15, with $\gamma=0$.
The general case, $\gamma\neq 0$, proceeds similarly but is algebraically more complicated. As before we start with the transformed functional, which is now
\[ \overline{G}=\int_{\overline{c}}^{\overline{d}} d\overline{x}\,F\!\left(\overline{x},\overline{y},\frac{d\overline{y}}{d\overline{x}}\right), \]
where $\overline{d}=d+\epsilon\gamma(d)$, $\overline{c}=c+\epsilon\gamma(c)$, with $\gamma(c)$ denoting $\gamma(c,y(c),y'(c))$. Now we have to change the integration variable and limits, besides expanding $F$. First consider the differential; using equations 7.13 and the chain rule,
\[ \frac{d\overline{y}}{d\overline{x}}=\left(\frac{dy}{dx}+\epsilon\frac{d\psi}{dx}\right)\frac{dx}{d\overline{x}} \quad\text{but}\quad \frac{d\overline{x}}{dx}=1+\epsilon\frac{d\gamma}{dx}, \tag{7.18} \]
giving $\dfrac{dx}{d\overline{x}}=1-\epsilon\dfrac{d\gamma}{dx}+O(\epsilon^2)$ and so, to first order in $\epsilon$, we have
\[ \frac{d\overline{y}}{d\overline{x}}=\left(\frac{dy}{dx}+\epsilon\frac{d\psi}{dx}\right)\left(1-\epsilon\frac{d\gamma}{dx}\right)=\frac{dy}{dx}+\epsilon\left(\frac{d\psi}{dx}-\frac{dy}{dx}\frac{d\gamma}{dx}\right). \]
Thus the integral becomes
\[ \overline{G}=\int_c^d dx\,\frac{d\overline{x}}{dx}\,F\!\left(x+\epsilon\gamma,\;y+\epsilon\psi,\;\frac{dy}{dx}+\epsilon\left(\frac{d\psi}{dx}-\frac{dy}{dx}\frac{d\gamma}{dx}\right)\right). \]
Now expand to first order in $\epsilon$ and use the fact that the functional is invariant. After some algebra we find that
\[ 0=\int_c^d dx\left\{\left(F-\sum_{k=1}^n\frac{\partial F}{\partial y_k'}\frac{dy_k}{dx}\right)\frac{d\gamma}{dx}+\gamma\frac{\partial F}{\partial x}+\sum_{k=1}^n\left(\psi_k\frac{\partial F}{\partial y_k}+\frac{\partial F}{\partial y_k'}\frac{d\psi_k}{dx}\right)\right\}. \tag{7.19} \]
Notice that if $\gamma=0$ this is the equivalent of equation 7.17. Now integrate those terms containing $d\gamma/dx$ and $d\psi_k/dx$ by parts to cast this equation into the form
\[ 0=\left[\gamma\left(F-\sum_{k=1}^n\frac{\partial F}{\partial y_k'}\frac{dy_k}{dx}\right)+\sum_{k=1}^n\psi_k\frac{\partial F}{\partial y_k'}\right]_c^d
 +\int_c^d dx\,\gamma\left\{\frac{\partial F}{\partial x}-\frac{d}{dx}\left(F-\sum_{k=1}^n\frac{\partial F}{\partial y_k'}\frac{dy_k}{dx}\right)\right\}
 +\int_c^d dx\sum_{k=1}^n\psi_k\left\{\frac{\partial F}{\partial y_k}-\frac{d}{dx}\left(\frac{\partial F}{\partial y_k'}\right)\right\}. \tag{7.20} \]
Finally, we need to show that on stationary paths the integrals are zero. The second integral is clearly zero, by virtue of the Euler-Lagrange equations. On expanding the integrand of the first integral it becomes
\[ \frac{\partial F}{\partial x}-\left\{\frac{\partial F}{\partial x}+\sum_{k=1}^n\left(\frac{\partial F}{\partial y_k}y_k'+\frac{\partial F}{\partial y_k'}y_k''\right)\right\}+\sum_{k=1}^n\frac{\partial F}{\partial y_k'}y_k''+\sum_{k=1}^n y_k'\frac{d}{dx}\left(\frac{\partial F}{\partial y_k'}\right). \]
Using the Euler-Lagrange equations to modify the last term it is seen that this expression is zero. Hence, because $c$ and $d$ are arbitrary, we have shown that the function
\[ \gamma\left(F-\sum_{k=1}^n\frac{\partial F}{\partial y_k'}y_k'\right)+\sum_{k=1}^n\psi_k\frac{\partial F}{\partial y_k'}, \]
where $y(x)$ is evaluated along a stationary path, is independent of $x$.
Exercise 7.9
Derive equations 7.19 and 7.20.
Exercise 7.10
Consider the functional $S[y]=\int_a^b dx\,F(x,y')$, where the integrand depends only upon $x$ and $y'$. Show that Noether's theorem gives the first-integral $F_{y'}(x,y')=\text{constant}$, and that this is consistent with the Euler-Lagrange equation.
7.4 Miscellaneous exercises
Exercise 7.11
(a) Show that the Euler-Lagrange equation for the functional
\[ S[y]=\int_a^b dx\,xy\,y'^{\,2} \quad\text{is}\quad 2xy\frac{d^2y}{dx^2}+x\left(\frac{dy}{dx}\right)^2+2y\frac{dy}{dx}=0. \]
(b) Show that this functional is invariant with respect to scale changes in the independent variable, $x$, that is, under the change to the new variable $\overline{x}=(1+\epsilon)x$, where $\epsilon$ is a constant. Use Noether's theorem to show that a first-integral of the above differential equation is $x^2y\left(\dfrac{dy}{dx}\right)^2=c$, for some constant $c$.
Exercise 7.12
Consider the functional $S[y]=\int dx\,x^3y^2y'^{\,2}$.
(a) Show that $S$ is invariant under the scale transformation $\overline{x}=\lambda x$, $\overline{y}=\mu y$ if $\lambda\mu^2=1$. Hence show that a first-integral is $x^3y^3y'+x^4y^2y'^{\,2}=c=\text{constant}$.
(b) Using the function $y=Ax^{\gamma}$ find a solution of this equation; show also that this is not a solution of the associated Euler-Lagrange equation.
(c) Show that the general solution of the Euler-Lagrange equation is $y^2+Ax^{-2}=B$, where $A$ and $B$ are arbitrary constants.
(d) Using the independent variable $u$, where $x=u^a$, show that with a suitable choice of the constant $a$ the functional becomes
\[ S[y]=\frac{1}{a}\int du\left(y\frac{dy}{du}\right)^2. \]
Find the first-integral of this functional and show that the solution of the first-integral found in part (b) does not satisfy this new first-integral.
Exercise 7.13
Show that the Euler-Lagrange equation of the functional
\[ S[y]=\int_a^b dx\,F(x,y,y',y''), \quad y(a)=A_1,\ y'(a)=A_2,\ y(b)=B_1,\ y'(b)=B_2, \]
derived in exercise 4.33 (page 143), has the first-integral
\[ \frac{d}{dx}\left(\frac{\partial F}{\partial y''}\right)-\frac{\partial F}{\partial y'}=\text{constant} \]
if the integrand does not depend explicitly upon $y(x)$, and the first-integral
\[ y''\frac{\partial F}{\partial y''}-\left[\frac{d}{dx}\left(\frac{\partial F}{\partial y''}\right)-\frac{\partial F}{\partial y'}\right]y'-F=\text{constant} \]
if the integrand does not depend explicitly upon $x$.
Hint: the second part of this question is most easily done using the theory described in section 7.2.
Chapter 8
The second variation

8.1 Introduction
In this chapter we derive necessary and sufficient conditions for the functional
\[ S[y]=\int_a^b dx\,F(x,y,y'), \quad y(a)=A, \quad y(b)=B, \tag{8.1} \]
to have an actual extremum, rather than simply a stationary value. You will recall from chapter 4 that a necessary and sufficient condition for this functional to be stationary on a sufficiently differentiable curve $y(x)$ is that it satisfies the Euler-Lagrange equation,
\[ \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right)-\frac{\partial F}{\partial y}=0, \quad y(a)=A, \quad y(b)=B. \tag{8.2} \]
Necessary and sufficient conditions for a solution of the Euler-Lagrange equation to be an extremum are stated in theorems 8.3 and 8.4 (page 221) respectively. This theory is important because many variational principles require the functional to have a minimum value. But the theory is limited because it does not determine whether an extremum is local or global, section 3.2.2, and sometimes this distinction is important, as for example with geodesics. The treatment of local and global extrema is different. For a local extremum we need compare only neighbouring paths; the behaviour far away is irrelevant. For a global extremum we require information about all admissible paths, which is clearly a far more demanding, and often impossible, task. For this reason we shall concentrate on local extrema, but note that the analysis introduced in exercise 3.4 (page 97) uses a global property of the functional which can be applied to some brachistochrone problems, as shown in section 8.6. This, and other, methods are analysed in more depth by Troutman¹.

We start this chapter with a description of the standard procedure used to classify the stationary points of functions of several real variables, leading to a statement of the Morse lemma, which shows that with $n$ variables most stationary points can be categorised as one of $n+1$ types, only one of which is a local minimum and one a local maximum. In section 8.4 we derive a sufficient condition for the functional 8.1 to have an extremum. But to understand this theory it is necessary to introduce Jacobi's equation and the notion of conjugate points, so these are introduced first. Theorem 8.4 (page 221) is important and useful because it provides a test for determining whether a stationary path actually yields a local extremum; in sections 8.6 and 8.7 we shall apply it to the brachistochrone problem and the minimum surface of revolution, respectively. Finally, in section 8.8 we complete the story by showing how the classification method used for functions of $n$ variables tends to Jacobi's equation as $n\to\infty$.

¹ Troutman J L, 1983, Variational Calculus with Elementary Convexity, Springer-Verlag.
8.2 Stationary points of functions of several variables
Suppose that $x=(x_1,x_2,\dots,x_n)$ and $F(x)$ is a function in $\mathbb{R}^n$ that possesses derivatives of at least second-order. Using the Taylor expansion of $F(x+\epsilon\xi)$, equation 1.39 (page 36), with $|\xi|=1$ to guarantee that $\epsilon\xi$ tends to zero with $\epsilon$, we have
\[ F(x+\epsilon\xi)=F(x)+\epsilon\,\Delta F[x,\xi]+\frac{1}{2}\epsilon^2\Delta^2F[x,\xi]+O(\epsilon^3), \tag{8.3} \]
where $\Delta F$ is the Gateaux differential
\[ \Delta F[x,\xi]=\sum_{k=1}^n\xi_k\frac{\partial F}{\partial x_k} \quad\text{and}\quad \Delta^2F[x,\xi]=\sum_{k=1}^n\sum_{j=1}^n\xi_k\xi_j\frac{\partial^2F}{\partial x_k\partial x_j}. \tag{8.4} \]
At a stationary point, $x=a$, by definition $F(a+\epsilon\xi)-F(a)=O(\epsilon^2)$, for all $\xi$, so that all the first partial derivatives, $\partial F/\partial x_k$, must be zero at $x=a$. Provided $\Delta^2F[x,\xi]$ is not identically zero for any $\xi$, the nature of the stationary point is determined by the behaviour of $\Delta^2F[a,\xi]$. If this is positive for all $\xi$ then $F(a)$ has a local minimum at $x=a$: note that the adjective local is usually omitted. If it is negative for all $\xi$ then $F(a)$ has a local maximum at $x=a$. If the sign of $\Delta^2F[a,\xi]$ changes with $\xi$ the stationary point is said to be a saddle. In some texts the terms stationary point and critical point are synonyms.
8.2.1 Functions of one variable
For functions of one variable, $n=1$, there are three types of stationary points, as illustrated in figure 8.1. The function shown in this figure has a local maximum and a local minimum, but the global maximum and minimum are at the ends of the range and are not stationary points.

[Figure 8.1 Diagram showing the three possible types of stationary point of a function, $f(x)$, of one variable: a local maximum, a local minimum and a point of inflection.]

At a typical maximum $f'(x)=0$ and $f''(x)<0$; at a typical minimum $f'(x)=0$ and $f''(x)>0$; whereas at a stationary point which is also a point of inflection² $f'(x)=f''(x)=0$. Care is needed when classifying stationary points because there are many special cases; for instance the function $f(x)=x^4$ is stationary at the origin and $f^{(k)}(0)=0$ for $k=1$, 2 and 3. For this reason we restrict the discussion to typical stationary points, defined to be those at which the second derivative is not zero: without this restriction complications arise, for the reasons discussed in the following exercise.

Exercise 8.1
Stationary points which are also points of inflection are not typical because small, arbitrary changes to a function with such a stationary point usually change its nature.
(a) Show that the function $f(x)=x^3$ is stationary and has a point of inflection at the origin. By adding $\epsilon x$, with $0<|\epsilon|\ll 1$, so $f(x)$ becomes $x^3+\epsilon x$, show that the stationary point is removed if $\epsilon>0$ or converted to two ordinary stationary points (at which the second derivative is not zero) if $\epsilon<0$.
(b) If the function $f(x)$ is stationary and has a point of inflection at $x=a$, so $f'(a)=f''(a)=0$, but $f^{(3)}(a)\neq0$, show that the function $F(x)=f(x)+\epsilon g(x)$, where $0<|\epsilon|\ll 1$ and where $g(a)\neq0$ and $g'(a)\neq0$, is either not stationary or has ordinary stationary points in the neighbourhood of $x=a$. You may assume that all functions possess a Taylor expansion in the neighbourhood of $x=a$.

Note that a sufficiently smooth function $f(x)$ defined on an interval $[a,b]$ is often said to be structurally stable if the number and nature of its stationary points are unchanged by the addition of a small, arbitrary function; that is, for arbitrarily small $\epsilon$, $f(x)$ and $f(x)+\epsilon g(x)$, also sufficiently smooth, have the same stationary point structure on $[a,b]$. Generally, functions describing the physical world are structurally stable, provided $f$ and $g$ have the same symmetries. For functions of one real variable there are just two typical stationary points, maxima and minima.

² A general point of inflection is a point where $f''(x)$ changes sign, though $f'(x)$ need not be zero.
8.2.2 Functions of two variables
If $n=2$ there are three types of stationary points: maxima, minima and saddles, examples of which are
\[ f(x)=\begin{cases} -x_1^2-x_2^2, & \text{maximum}, \\ \ \ x_1^2+x_2^2, & \text{minimum}, \\ \ \ x_1^2-x_2^2, & \text{saddle}. \end{cases} \tag{8.5} \]
These three functions all have stationary points at the origin and their shapes are illustrated in the following figures.

[Figure 8.2 Maximum.  Figure 8.3 Minimum.  Figure 8.4 Saddle.]

In the neighbourhood of a maximum or a minimum the function is always smaller or always larger, respectively, than its value at the stationary point. In the neighbourhood of a saddle it is both larger and smaller.

The nature of a stationary point is determined by the value of the Hessian determinant at the stationary point. The Hessian determinant is the determinant of the real, symmetric Hessian matrix with elements $H_{ij}=\partial^2f/\partial x_i\partial x_j$. A stationary point is said to be non-degenerate if, at the stationary point, $\det(H)\neq0$; a degenerate stationary point is one at which $\det(H)=0$. At a degenerate stationary point a function can have the characteristics of an extremum or a saddle, but there are other cases. In this text the adjectives typical and non-degenerate, when used to describe a stationary point, are synonyms.

Exercise 8.2
Find the Hessian determinant of the functions defined in equation 8.5.

Exercise 8.3
Show that the function $f(x,y)=x^3-3xy^2$ has a degenerate stationary point at the origin.
8.2.3 Functions of n variables
For a scalar function, $f(x)$, of $n$ variables the Hessian matrix, $H$, is the $n\times n$, real symmetric matrix with elements $H_{ij}(x)=\partial^2f/\partial x_i\partial x_j$. A stationary point, at $x=a$, is said to be typical, or non-degenerate, if $\det(H(a))\neq0$, and the classification of these points depends entirely upon the second-order term of the Taylor expansion, that is $\Delta^2F[a,\xi]$, equation 8.4. Further, the following lemma, due to Morse (1892-1977), shows that there are $n+1$ types of stationary points, only two of which are extrema.

The Morse Lemma: If $a$ is a non-degenerate stationary point of a smooth function $f(x)$ then there is a local coordinate system³ $(y_1,y_2,\dots,y_n)$, where $y_k(a)=0$ for all $k$, such that
\[ f(y)=f(a)-\bigl(y_1^2+y_2^2+\cdots+y_l^2\bigr)+\bigl(y_{l+1}^2+y_{l+2}^2+\cdots+y_n^2\bigr), \]
for some $0\le l\le n$.

³ This means that in the neighbourhood of $x=a$ the transformation from $x$ to $y$ is one to one and that each coordinate $y_k(x)$ satisfies the conditions of the implicit function theorem.

Note that this representation of the function is exact in the neighbourhood of the stationary point: it is not an expansion. The integer $l$ is a topological invariant, meaning that a smooth, invertible coordinate change does not alter its value, so it is a property of the function, not of the coordinate system used to represent it.

At the extremes, $l=0$ and $l=n$, we have
\[ f(y)=\begin{cases} \displaystyle f(a)+\sum_{k=1}^n y_k^2, & (l=0), \ \text{minimum}, \\[2ex] \displaystyle f(a)-\sum_{k=1}^n y_k^2, & (l=n), \ \text{maximum}. \end{cases} \]
For $0<l<n$ the function is said to have a Morse $l$-saddle, and if $n\gg1$ there are many more types of saddles than extrema⁴.

⁴ Note that some texts use $l'=n-l$ in place of $l$, in which case the function has a minimum when $l'=n$ and a maximum when $l'=0$.
The Morse lemma is an existence theorem: it does not provide a method of determining the transformation to the coordinates $y(x)$ or the value of the index $l$: this is usually determined using the second-order term of the Taylor expansion, most conveniently written in terms of the Hessian matrix, $H$, evaluated at the stationary point $a$,
\[ \Delta^2F[a,z]=z^{\mathsf T}H(a)z=\sum_{i=1}^n\sum_{j=1}^n H_{ij}(a)z_iz_j, \quad\text{where } z=x-a, \tag{8.6} \]
and $z^{\mathsf T}$ is the transpose of the vector $z$. Thus the nature of the non-degenerate stationary point depends upon the behaviour of the quadratic form $\Delta^2$:
(a) if $\Delta^2$ is positive definite, $\Delta^2>0$ for all $|z|>0$, the stationary point is a minimum;
(b) if $\Delta^2$ is negative definite, $\Delta^2<0$ for all $|z|>0$, the stationary point is a maximum;
(c) otherwise the stationary point is a saddle.

The following two statements are equivalent and both provide necessary and sufficient conditions to determine the behaviour of $\Delta^2$ and hence the nature of the stationary point of $f(x)$.

(I) If the eigenvalues of $H(a)$ are $\lambda_k$, $k=1,2,\dots,n$, then
(a) $\Delta^2$ will be positive definite if $\lambda_k>0$ for $k=1,2,\dots,n$;
(b) $\Delta^2$ will be negative definite if $\lambda_k<0$ for $k=1,2,\dots,n$;
(c) $\Delta^2$ will be indefinite if the $\lambda_k$ are of both signs; further, the index $l$ is equal to the number of negative eigenvalues.
Since $H$ is real and symmetric all its eigenvalues are real. If one of its eigenvalues is zero the stationary point is degenerate.

(II) Let $D_r$ be the determinant derived from $H$ by retaining only the first $r$ rows and columns, so that
\[ D_r=\begin{vmatrix} H_{11} & H_{12} & \cdots & H_{1r} \\ H_{21} & H_{22} & \cdots & H_{2r} \\ \vdots & & & \vdots \\ H_{r1} & H_{r2} & \cdots & H_{rr} \end{vmatrix}, \quad r=1,2,\dots,n, \]
giving $D_1=H_{11}$ and $D_n=\det(H)$. Then
(a) $\Delta^2$ will be positive definite if $D_k>0$ for $k=1,2,\dots,n$;
(b) $\Delta^2$ will be negative definite if $(-1)^kD_k>0$ for $k=1,2,\dots,n$;
(c) $\Delta^2$ will be indefinite if neither condition (a) nor (b) is satisfied.
The proof of this statement may be found in Jeffrey⁵ (1990, page 288).

The determinants $D_k$, $k=1,2,\dots,n$, are known as the descending principal minors of $H$. In general a minor of a given element, $a_{ij}$, of an $N$th-order determinant is the $(N-1)$th-order determinant obtained by removing the row and column of the given element, and with the sign $(-1)^{i+j}$.

⁵ Linear Algebra and Ordinary Differential Equations by A Jeffrey, (Blackwell Scientific Publications).
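The two equivalent tests are easy to apply numerically. The following sketch (the function names are mine) classifies a non-degenerate stationary point from the eigenvalues of the Hessian, as in statement (I), and also computes the descending principal minors used in statement (II); it is applied to the three quadratic functions of equation 8.5.

```python
import numpy as np

def classify(H, tol=1e-12):
    eig = np.linalg.eigvalsh(H)               # H is real and symmetric
    if np.any(np.abs(eig) < tol):
        return "degenerate"
    l = int(np.sum(eig < 0))                  # index l: number of negative eigenvalues
    return {0: "minimum", len(eig): "maximum"}.get(l, "saddle (l = %d)" % l)

def descending_minors(H):
    return [np.linalg.det(H[:r, :r]) for r in range(1, len(H) + 1)]

# Hessians at the origin of the functions in equation 8.5
for H in (np.diag([-2.0, -2.0]), np.diag([2.0, 2.0]), np.diag([2.0, -2.0])):
    print(classify(H), descending_minors(H))
# maximum  [-2, 4]   (signs alternate, case (II)(b))
# minimum  [2, 4]    (all positive, case (II)(a))
# saddle   [2, -4]   (neither pattern, indefinite)
```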
Exercise 8.4
Determine the nature of the stationary points at the origin of the following quadratic functions:
(a) $f=2x^2-8xy+y^2$,
(b) $f=2x^2+4y^2+z^2+2(xy+yz+xz)$.

Exercise 8.5
The functions $f(x,y)=x^2+y^3$ and $g(x,y)=x^2+y^4$ are both stationary at the origin. Show that for both functions this is a degenerate stationary point, classify it and determine the expression for $\Delta^2$.

Exercise 8.6
Show that a non-degenerate stationary point of a function of two variables, $f(x,y)$, is a minimum if
\[ \frac{\partial^2f}{\partial x^2}>0, \quad \frac{\partial^2f}{\partial y^2}>0 \quad\text{and}\quad \frac{\partial^2f}{\partial x^2}\frac{\partial^2f}{\partial y^2}-\left(\frac{\partial^2f}{\partial x\partial y}\right)^2>0, \]
and a maximum if
\[ \frac{\partial^2f}{\partial x^2}<0, \quad \frac{\partial^2f}{\partial y^2}<0 \quad\text{and}\quad \frac{\partial^2f}{\partial x^2}\frac{\partial^2f}{\partial y^2}-\left(\frac{\partial^2f}{\partial x\partial y}\right)^2>0, \]
where all derivatives are evaluated at the stationary point. Show also that if
\[ \frac{\partial^2f}{\partial x^2}\frac{\partial^2f}{\partial y^2}-\left(\frac{\partial^2f}{\partial x\partial y}\right)^2<0 \]
the stationary point is a saddle.

Exercise 8.7
(a) Show that the function $f(x,y)=(x^3+y^3)-3(x^2+y^2+2xy)$ has stationary points at $(0,0)$ and $(4,4)$, and classify them.
(b) Find the stationary points of the function $f(x,y)=x^4+64y^4-2(x+8y)^2$, and classify them.
Exercise 8.8
The least squares fit
Given a set of $N$ pairs of data points $(x_i,y_i)$ we require a curve given by the line $y=a+bx$ with the constants $(a,b)$ chosen to minimise the function
\[ E(a,b)=\sum_{i=1}^N\bigl(a+bx_i-y_i\bigr)^2. \]
Show that $(a,b)$ are given by the solutions of the linear equations
\[ aN+b\sum x_i=\sum y_i, \qquad a\sum x_i+b\sum x_i^2=\sum x_iy_i, \]
where the sums are all from $i=1$ to $i=N$.
Hint: use the Cauchy-Schwarz inequality for sums (page 41) to show that the stationary point is a minimum.
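A direct numerical illustration of this exercise is straightforward; the sketch below (the function name and the test data are mine) solves the two normal equations for $(a,b)$ and recovers, approximately, the intercept and slope used to generate noisy data.

```python
import numpy as np

def least_squares_line(x, y):
    # Normal equations of exercise 8.8 for the straight-line fit y = a + b*x
    N = len(x)
    M = np.array([[N,       x.sum()],
                  [x.sum(), (x**2).sum()]])
    rhs = np.array([y.sum(), (x*y).sum()])
    a, b = np.linalg.solve(M, rhs)
    return a, b

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 20)
y = 2.0 + 3.0*x + 0.05*rng.standard_normal(x.size)
print(least_squares_line(x, y))   # close to (2, 3)
```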
8.3 The second variation of a functional
As with functions of $n$ real variables, the nature of a stationary functional is usually determined by considering the second-order expansion, that is the term $O(\epsilon^2)$ in the difference $\Delta S=S[y+\epsilon h]-S[y]$, where $y(x)$ is a solution of the Euler-Lagrange equation. In order to determine this we use the Taylor expansion of the integrand,
\[ F(x,y+\epsilon h,y'+\epsilon h')=F(x,y,y')+\epsilon\left(\frac{\partial F}{\partial y}h+\frac{\partial F}{\partial y'}h'\right)+\frac{\epsilon^2}{2}\left(\frac{\partial^2F}{\partial y^2}h^2+2\frac{\partial^2F}{\partial y\partial y'}hh'+\frac{\partial^2F}{\partial y'^2}h'^2\right)+O(\epsilon^3), \]
and assume that $h(x)$ belongs to $D_1(a,b)$, defined on page 124, that is, we are considering weak variations, section 4.6. It is convenient to write
\[ \Delta S=\epsilon\,\Delta[y,h]+\frac{1}{2}\epsilon^2\Delta^2[y,h]+o(\epsilon^2), \]
where $\Delta[y,h]$ is the Gateaux differential, introduced in equation 4.5 (page 125), and $\Delta^2[y,h]$ is named the second variation⁶ and is given by
\[ \Delta^2[y,h]=\int_a^b dx\left(\frac{\partial^2F}{\partial y^2}h^2+2\frac{\partial^2F}{\partial y\partial y'}hh'+\frac{\partial^2F}{\partial y'^2}h'^2\right). \]
On a stationary path, by definition, $\Delta[y,h]=0$ for all admissible $h$, and henceforth we shall assume that on this path
\[ \Delta S=\frac{1}{2}\epsilon^2\Delta^2[y,h]+\epsilon^2R(\epsilon) \quad\text{with}\quad \lim_{\epsilon\to0}R(\epsilon)=0, \quad\text{for all admissible } h. \tag{8.7} \]

⁶ Note, in some texts $\epsilon^2\Delta^2/2$ is named the second variation but, whichever convention is used, the subsequent analysis is identical.
This means that for small $\epsilon$ the first term dominates and the sign of $\Delta S$ is the same as the sign of the second variation, $\Delta^2$, which therefore determines the nature of the stationary path.

The expression for $\Delta^2$ may be simplified by integrating the term involving $hh'$ by parts, giving
\[ \int_a^b dx\,\frac{\partial^2F}{\partial y\partial y'}hh'=\frac{1}{2}\int_a^b dx\,\frac{\partial^2F}{\partial y\partial y'}\frac{d(h^2)}{dx}=\frac{1}{2}\left[h^2\frac{\partial^2F}{\partial y\partial y'}\right]_a^b-\frac{1}{2}\int_a^b dx\,h^2\frac{d}{dx}\left(\frac{\partial^2F}{\partial y\partial y'}\right). \]
Because of the boundary conditions $h(a)=h(b)=0$ the boundary term vanishes, to give
\[ 2\int_a^b dx\,\frac{\partial^2F}{\partial y\partial y'}hh'=-\int_a^b dx\,h^2\frac{d}{dx}\left(\frac{\partial^2F}{\partial y\partial y'}\right). \]
Thus the second variation becomes
\[ \Delta^2[y,h]=\int_a^b dx\,\Bigl(P(x)h'(x)^2+Q(x)h(x)^2\Bigr), \tag{8.8} \]
where
\[ P(x)=\frac{\partial^2F}{\partial y'^2} \quad\text{and}\quad Q(x)=\frac{\partial^2F}{\partial y^2}-\frac{d}{dx}\left(\frac{\partial^2F}{\partial y\partial y'}\right) \tag{8.9} \]
are known functions of $x$, because here $y(x)$ is a solution of the Euler-Lagrange equation.

The significance of $\Delta^2$ leads to the first two important results, conveniently expressed as theorems, which we shall not prove⁷.

Theorem 8.1
A sufficient condition for the functional $S[y]$ to have a minimum on a path $y(x)$ for which the first variation vanishes, $\Delta[y,h]=0$, is that $\Delta^2[y,h]>0$ for all allowed $h\neq0$. For a maximum we reverse the inequality.

Note that the condition $\Delta^2>0$ for all $h$ is often described by the statement that $\Delta^2$ is strongly positive.
If $\Delta^2[y,h]=0$ for some $h(x)$ then for these $h$ the sign of $\Delta S$ is determined by the higher-order terms in the expansion, as for the examples considered in exercise 8.5.

Theorem 8.2
A necessary condition for the functional $S[y]$ to have a minimum along the path $y(x)$ is that $\Delta^2[y,h]\ge0$ for all allowed $h$. For a maximum, the sign $\ge$ is replaced by $\le$.

The properties of $\Delta^2[y,h]$ are therefore crucial in determining the nature of a stationary path. We need to show that $\Delta^2[y,h]\neq0$ for all admissible $h$, and this is not easy; the remaining part of this theory is therefore devoted to understanding the behaviour of $\Delta^2$.

⁷ Proofs of theorems 8.1 and 8.2 are provided in I M Gelfand and S V Fomin, Calculus of Variations, (Prentice Hall), chapter 5.
8.3.1 Short intervals
For short intervals, that is sufficiently small $b-a$, there is a very simple condition for a functional to have an extremum, namely that for $a\le x\le b$, $P(x)\neq0$: if $P(x)>0$ the stationary path is a minimum and if $P(x)<0$ it is a maximum. Unfortunately, estimates of the magnitude of $b-a$ necessary for this condition to be valid are usually hard to find.

This result follows because if $h(a)=h(b)=0$ the variation of $h'(x)^2$ is larger than that of $h(x)^2$. We may prove this using Schwarz's inequality, that is
\[ \left(\int_a^b dx\,u(x)v(x)\right)^2\le\int_a^b dx\,|u(x)|^2\int_a^b dx\,|v(x)|^2, \tag{8.10} \]
provided all integrals exist. Since $h(x)=\int_a^x du\,h'(u)$, we have
\[ h(x)^2=\left(\int_a^x du\,h'(u)\right)^2\le\left(\int_a^x du\right)\left(\int_a^x du\,h'(u)^2\right)=(x-a)\int_a^x du\,h'(u)^2\le(x-a)\int_a^b du\,h'(u)^2. \]
Now integrate again to obtain
\[ \int_a^b dx\,h(x)^2\le\frac{1}{2}(b-a)^2\int_a^b dx\,h'(x)^2. \tag{8.11} \]

Exercise 8.9
As an example consider the function $g(x)=(x-a)(b-x)$ and show that
\[ I=\int_a^b dx\,g(x)^2=\frac{1}{30}(b-a)^5, \qquad I'=\int_a^b dx\,g'(x)^2=\frac{1}{3}(b-a)^3, \]
and deduce that $I<I'$ if $b-a<\sqrt{10}$.

Now all that is necessary is an application of the integral mean value theorem (page 23): consider each component of $\Delta^2$, equation 8.8, separately:
\[ \Delta^2_P=\int_a^b dx\,P(x)h'(x)^2=P(x_p)\int_a^b dx\,h'(x)^2, \qquad \Delta^2_Q=\int_a^b dx\,Q(x)h(x)^2=Q(x_q)\int_a^b dx\,h(x)^2, \]
where $x_p$ and $x_q$ are in the closed interval $[a,b]$.

If $Q(x)>0$ and $P(x)>0$ for $a<x<b$ then $\Delta^2>0$ for all admissible $h$ and the stationary path is a local minimum. However, this is neither a common nor very interesting case, and the result follows directly from equation 8.8. We need to consider the effect of $Q(x)$ being negative with $P(x)>0$.
If $Q(x)$ is negative in all or part of the interval $(a,b)$ it is necessary to show that $\Delta^2=\Delta^2_P+\Delta^2_Q>0$. We have, on using the above results,
\[ \bigl|\Delta^2_Q\bigr|=|Q(x_q)|\int_a^b dx\,h(x)^2\le\frac{1}{2}|Q(x_q)|(b-a)^2\int_a^b dx\,h'(x)^2=\frac{1}{2}\frac{|Q(x_q)|}{P(x_p)}(b-a)^2\,\Delta^2_P. \]
Since $P(x_p)>0$ it follows that for sufficiently small $(b-a)$, $\bigl|\Delta^2_Q\bigr|<\Delta^2_P$ and hence that $\Delta^2>0$. If $P(x)<0$ for $a\le x\le b$, we simply consider $-\Delta^2$.

Thus for sufficiently small $b-a$ we have
a) if $P(x)=\dfrac{\partial^2F}{\partial y'^2}>0$, $a\le x\le b$, $S[y]$ has a minimum;
b) if $P(x)=\dfrac{\partial^2F}{\partial y'^2}<0$, $a\le x\le b$, $S[y]$ has a maximum.

This analysis shows that for short intervals, provided $P(x)$ does not change sign, the functional has either a maximum or a minimum and no other type of stationary point exists. This result highlights the significance of the sign of $P(x)$ which, as we shall see, pervades the whole of this theory. In practice this result is of limited value because it is rarely clear how small the interval needs to be.
Dynamical systems
For a one-dimensional dynamical system described by a Lagrangian defined by the difference between the kinetic and potential energy, $L=\frac{1}{2}m\dot q^2-V(q,t)$, where $q$ is the generalised coordinate, the action, defined by the functional $S[q]=\int_{t_1}^{t_2}dt\,L(t,q,\dot q)$, is stationary along the orbit from $q_1=q(t_1)$ to $q_2=q(t_2)$. For short times the kinetic energy dominates the motion, which is therefore similar to rectilinear motion, and the action has an actual minimum along the orbit. A similar result holds for many-dimensional dynamical systems.

Comment
The analysis of this section emphasises the fact that for short intervals the solutions of most differential equations behave similarly and very simply, and this is the idea behind the rectification described after theorem 2.2 (page 81).
8.3.2 Legendre's necessary condition
In this section we show that a necessary condition for $\Delta^2[y,h]\ge0$ for all admissible $h(x)$ is that $P(x)=F_{y'y'}\ge0$ for $a\le x\le b$. Unlike the result derived in the previous section this does not depend upon the interval being small, though only a necessary condition is obtained. This result is due to Legendre: it is important because it is usually easier to apply than the necessary condition of theorem 8.2. Further, it is of historical significance because Legendre attempted (unsuccessfully) to show that a sufficient condition for $S[y]$ to have a weak minimum on the path $y(x)$ is that $F_{y'y'}>0$ at every point of the curve.

Recall theorem 8.2, which states that a necessary condition for $S[y]$ to have a minimum on the stationary path $y(x)$ is that $\Delta^2[y,h]\ge0$ for all allowed $h(x)$. We now show that a necessary condition for $\Delta^2\ge0$, for all $h(x)$ in $D_1(a,b)$ (defined on page 124) such that $h(a)=h(b)=0$, is that $P(x)=\partial^2F/\partial y'^2\ge0$ for $a\le x\le b$. The proof is by contradiction.
Suppose that at some point $x_0\in[a,b]$, $P(x_0)=-2p$, $(p>0)$. Then since $P(x)$ is continuous,
\[ P(x)<-p \quad\text{if}\quad a\le x_0-\delta\le x\le x_0+\delta\le b, \]
for some $\delta>0$. We now construct a suitable function $h(x)$ for which $\Delta^2<0$. Let
\[ h(x)=\begin{cases} \sin^2\!\left(\dfrac{\pi(x-x_0)}{\delta}\right), & x_0-\delta\le x\le x_0+\delta, \\ 0, & \text{otherwise}. \end{cases} \]
Then
\[ \Delta^2=\int_a^b dx\,\bigl(P(x)h'^2+Q(x)h^2\bigr)=\frac{\pi^2}{\delta^2}\int_{x_0-\delta}^{x_0+\delta}dx\,P(x)\sin^2\!\left(\frac{2\pi(x-x_0)}{\delta}\right)+\int_{x_0-\delta}^{x_0+\delta}dx\,Q(x)\sin^4\!\left(\frac{\pi(x-x_0)}{\delta}\right)<-\frac{p\pi^2}{\delta}+\frac{3}{4}\delta M, \quad M=\max_{a\le x\le b}|Q|. \]
For sufficiently small $\delta$ the last expression becomes negative and hence $\Delta^2<0$. But it is necessary that $\Delta^2\ge0$, theorem 8.2, and hence it follows that we need $P(x)\ge0$ for $x\in[a,b]$. Note that, as in the analysis leading to 8.11, it is the term depending upon $h'(x)$ that dominates the integral.

Exercise 8.10
Explain why $h(x)$ has to be in $D_1(a,b)$.

We have shown that a necessary condition for $\Delta^2[y,h]\ge0$ is that $P(x)\ge0$ for $x\in[a,b]$. Using theorem 8.2 this shows that a necessary condition for $S[y]$ to be a minimum on the stationary path $y(x)$ is that $P(x)\ge0$.

Legendre also attempted, unsuccessfully, to show that the weaker condition $P(x)>0$, $x\in[a,b]$, is also sufficient. That this cannot be true is shown by the following counterexample.

We know that the minimum distance between two points on a sphere is along the shorter arc of the great circle passing through them, assuming that the two points are not on the same diameter. Thus for the three points A, B and C, on the great circle through A and B, shown in figure 8.5, the shortest distance between A and B and between B and C is along the short arcs, and on these $P>0$, exercise 5.20 (page 168).

[Figure 8.5 A sphere showing the great circle through A and B, the diameter through A, and the points B and C on that great circle.]

Hence on the arc ABC, $P>0$, but this is not the shortest distance between A and C. Hence it is not sufficient that $P>0$ for a stationary path to give a minimum.
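The contradiction argument can also be seen numerically; in the sketch below (the example $P$ and $Q$ and all names are mine) the trial variation $h=\sin^2(\pi(x-x_0)/\delta)$ is used with a $P$ that is negative near $x_0$, and the second variation 8.8 indeed becomes negative once $\delta$ is small enough.

```python
import numpy as np
from scipy.integrate import quad

def second_variation(P, Q, x0, delta):
    # h vanishes outside [x0 - delta, x0 + delta], so integrate only over the support
    h  = lambda x: np.sin(np.pi*(x - x0)/delta)**2
    hp = lambda x: (np.pi/delta)*np.sin(2*np.pi*(x - x0)/delta)
    integrand = lambda x: P(x)*hp(x)**2 + Q(x)*h(x)**2
    return quad(integrand, x0 - delta, x0 + delta, limit=200)[0]

P = lambda x: -1.0 + 4.0*(x - 0.5)**2      # P(0.5) = -1 < 0
Q = lambda x: 100.0                        # a bounded Q
for delta in (0.4, 0.2, 0.1):
    print(delta, second_variation(P, Q, 0.5, delta))
# positive for delta = 0.4, but negative for the smaller values of delta
```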
8.4 Analysis of the second variation
In this section we continue our analysis of the second variation
\[ \Delta^2[y,h]=\int_a^b dx\,\Bigl(P(x)h'(x)^2+Q(x)h(x)^2\Bigr) \tag{8.12} \]
with $h(x)$ in $D_1(a,b)$ and $h(a)=h(b)=0$, where
\[ P(x)=\frac{\partial^2F}{\partial y'^2} \quad\text{and}\quad Q(x)=\frac{\partial^2F}{\partial y^2}-\frac{d}{dx}\left(\frac{\partial^2F}{\partial y\partial y'}\right), \]
and where $y(x)$ is a solution of the Euler-Lagrange equation.

The necessary and sufficient conditions for the functional $S[y]$, defined in equation 8.1, to have a minimum (or a maximum) on $y(x)$ for all allowed $h(x)$ are given in theorems 8.2 and 8.1 respectively, but these involve inequalities on $\Delta^2$ which are usually of little practical value.

We saw in section 8.3.1 that provided $P(x)\neq0$ for $a\le x\le b$ and if $(b-a)$ is sufficiently small the stationary path is automatically an extremum. In this section we derive necessary and sufficient conditions for the functional $S[y]$ to have an extremum on a stationary path, in terms of a condition that is usually relatively simple to apply, provided $y(x)$ is known.

The central result of this section, theorem 8.4, is quite simple, but the route to it involves a number of intermediate theorems and requires one new, important idea, that of a conjugate point. This idea is central to the main result, so we start by defining it and then quote the main result of this chapter.

Definition 8.1
Conjugate point. The point $\tilde a\neq a$ is said to be conjugate to the point $a$ if the solution of the linear, second-order equation
\[ \frac{d}{dx}\left(P(x)\frac{du}{dx}\right)-Q(x)u=0, \quad u(a)=0, \quad u'(a)=1, \tag{8.13} \]
where $P(x)$ and $Q(x)$ are known functions of $x$, vanishes at $x=\tilde a$.

Equation 8.13 is the Euler-Lagrange equation of the functional $\Delta^2[y,u]$, regarded as a quadratic functional in $u$, and is named the Jacobi equation of the original functional $S[y]$. Note that equation 8.13 is an initial value problem and the existence of a unique solution is guaranteed, see theorem 2.1 (page 61): moreover, later we see that its solution can be derived from the general solution of the associated Euler-Lagrange equation, exercise 8.18. The significance of equation 8.13 will become apparent as the analysis unfolds.

Exercise 8.11
(a) Show that if $P=1$ and $Q$ is either 0 or 1 there are no points conjugate to $x=a$.
(b) Show that if $P=1$ and $Q=-1$ the point $x=a+\pi$ is conjugate to $x=a$.
The following two theorems show why conjugate points are important. The first provides a necessary condition, and is an extension of Legendre's condition. The second is the more important because it provides a sufficient condition and a useful criterion for determining the nature of a stationary path.

Theorem 8.3
Jacobi's necessary condition. If the stationary path $y(x)$ corresponds to a minimum of the functional
\[ S[y]=\int_a^b dx\,F(x,y,y'), \quad y(a)=A, \quad y(b)=B, \tag{8.14} \]
and if $P(x)=\partial^2F/\partial y'^2>0$ along the path, then the open interval $a<x<b$ does not contain points conjugate to $a$.
For a maximum of the functional we simply replace the condition $P(x)>0$ by $P(x)<0$.

If the interval $[a,b]$ contains a conjugate point this theorem provides no information about the nature of the stationary path, which is why the next theorem is important.

Theorem 8.4
A sufficient condition. If $y(x)$ is an admissible function for the functional 8.14 and satisfies the three conditions listed below, then the functional 8.14 has a weak minimum along $y(x)$.
S-1: The function $y(x)$ satisfies the Euler-Lagrange equation, $\dfrac{d}{dx}\left(\dfrac{\partial F}{\partial y'}\right)-\dfrac{\partial F}{\partial y}=0$.
S-2: Along the curve $y(x)$, $P(x)=F_{y'y'}>0$ for $a\le x\le b$.
S-3: The closed interval $[a,b]$ contains no points conjugate to the point $x=a$.

Theorem 8.4 is the important result because it provides a relatively simple test for whether a stationary path is an actual extremum. It means that to investigate the nature of a stationary path we need only compute the solution of Jacobi's equation (either analytically or numerically) and either determine the value of the first conjugate point or determine whether or not one exists in the relevant interval. This, together with the sign of $P(x)$, provides the necessary information to classify a stationary path. In section 8.6 we apply this test to the brachistochrone problem and in section 8.7 we consider the surface of revolution.
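Such a computation is easily automated. The sketch below (all names are mine, and the zero is located only to the resolution of the sampling grid) integrates Jacobi's equation as the first-order system $u'=v/P$, $v'=Qu$ with $v=Pu'$, and reports the first conjugate point in $(a,b]$, if any; it is checked against exercise 8.11(b), where the conjugate point is at $a+\pi$.

```python
import numpy as np
from scipy.integrate import solve_ivp

def first_conjugate_point(P, Q, a, b):
    def rhs(x, z):                      # z = [u, P*u']
        u, v = z
        return [v / P(x), Q(x) * u]
    sol = solve_ivp(rhs, (a, b), [0.0, P(a) * 1.0],   # u(a) = 0, u'(a) = 1
                    dense_output=True, max_step=(b - a) / 1000)
    x = np.linspace(a + 1e-9, b, 2000)
    u = sol.sol(x)[0]
    crossings = np.where(np.diff(np.sign(u)) != 0)[0]
    return x[crossings[0]] if crossings.size else None

# Exercise 8.11(b): P = 1, Q = -1 gives u = sin(x - a), conjugate point at a + pi
print(first_conjugate_point(lambda x: 1.0, lambda x: -1.0, 0.0, 4.0))  # approximately pi
```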
Exercise 8.12
Show that the functional
\[ S[y]=\frac{1}{2}\int_0^X dx\,\bigl(y'^{\,2}-y^2\bigr), \quad y(0)=0, \quad y(X)=1, \quad X>0, \]
has a weak minimum on the path $y=\sin x/\sin X$ provided $0<X<\pi$.
Exercise 8.13
Show that the stationary paths of the functional $S=\int_a^b dx\,F(x,y')$ have no conjugate points provided that along them $F_{y'y'}\neq0$.

Exercise 8.14
If $y(x)$ is a stationary path of the functional 8.14 and the point $x=b$ is a conjugate point, so $u(a)=u(b)=0$, show that $\Delta^2[y,u]=0$. Deduce that the functional may not be an extremum on $y(x)$.
Hint: multiply equation 8.13 by $u$ and integrate the integral over $(a,b)$ by parts.
8.4.1 Analysis of the second variation
In the analysis of the second variation, $\Delta^2$, the emphasis changes from $y(x)$, the stationary path (assumed known), to the behaviour of $\Delta^2$ with $h(x)$, the variation from $y(x)$. This analysis starts by converting the integrand of $\Delta^2$ into a non-negative function, which is achieved by noting that for any differentiable function $w(x)$
\[ \int_a^b dx\,\frac{d}{dx}\bigl(wh^2\bigr)=0, \]
because $h(a)=h(b)=0$. Hence we may re-write $\Delta^2$ in the form
\[ \Delta^2=\int_a^b dx\,\Bigl(Ph'^2+(Q+w')h^2+2whh'\Bigr). \]
If $P(x)>0$ for $a\le x\le b$ the integrand may be re-arranged,
\[ Ph'^2+(Q+w')h^2+2whh'=\Bigl(h'\sqrt{P}+h\sqrt{Q+w'}\Bigr)^2+2hh'\Bigl(w-\sqrt{(Q+w')P}\Bigr). \]
Hence if $w(x)$ is defined by the nonlinear equation
\[ w^2=\left(Q(x)+\frac{dw}{dx}\right)P(x) \tag{8.15} \]
it follows that
\[ \Delta^2=\int_a^b dx\,P(x)\left(h'+\frac{w}{P}h\right)^2. \tag{8.16} \]
Provided $w(x)$ exists this shows that $\Delta^2\ge0$ when $P(x)>0$: if $P(x)<0$ for all $x\in[a,b]$ a similar analysis shows that $\Delta^2\le0$.

Further, $\Delta^2=0$ only if $h(x)$ satisfies the linear equation
\[ \frac{dh}{dx}+\frac{w}{P}h=0, \quad a\le x\le b. \tag{8.17} \]
But $h(a)=0$, so a solution of 8.17 is $h(x)=0$, and since solutions of linear equations are unique, see theorem 2.1 (page 61), this is the only solution of 8.17, which contradicts the assumption that $h(x)$ is not identically zero. Hence equation 8.17 is not true
and $\Delta^2\neq0$, that is $\Delta^2>0$, provided $w$ exists. The problem has therefore reduced to showing that $w(x)$ exists.

Equation 8.15 for $w(x)$ is Riccati's equation, see section 2.3.5, and there is a standard transformation, equation 2.25 (page 67), that converts this to a linear, second-order equation. For this example the new dependent variable, $u$, is defined by
\[ w=-\frac{u'}{u}P, \tag{8.18} \]
which casts equation 8.15 into Jacobi's equation
\[ \frac{d}{dx}\left(P\frac{du}{dx}\right)-Qu=0, \quad a\le x\le b. \tag{8.19} \]
This is just the Euler-Lagrange equation for the functional $\Delta^2[y,u]$ with $y(x)$ a stationary path of the original functional: the reason for this coincidence is discussed in section 8.5.

Exercise 8.15
Derive equation 8.19 from 8.15.

If the interval $(a,b]$ contains no points conjugate to $a$, Jacobi's equation, with the initial conditions $u(a)=0$, $u'(a)=1$, has a solution which does not vanish in $(a,b]$; then, from equation 8.18, $w(x)$ exists on the whole interval and $\Delta^2>0$.

A little care is needed in deriving this result because the definition of a conjugate point involves a solution of Jacobi's equation which is zero at $x=a$. However, if the interval $(a,b]$ contains no points conjugate to $a$ then, since the solutions of this equation depend continuously on the initial conditions, the interval $[a,b]$ contains no points conjugate to $a-\epsilon$, for some sufficiently small, positive $\epsilon$. Therefore the solution that satisfies the initial conditions $u(a-\epsilon)=0$, $u'(a-\epsilon)=1$ does not vanish anywhere in the interval $[a,b]$. Here, of course, it is implicit that $P(x)\neq0$ in $[a,b]$.

Thus we have just proved the following theorem.

Theorem 8.5
If $P(x)>0$ for $a\le x\le b$ and if the interval $[a,b]$ contains no points conjugate to $a$, then $\Delta^2[y,h]>0$ for all $h\in D_1(a,b)$ for which $h(a)=h(b)=0$.

This result indicates that conjugate points are important, but it does not describe the behaviour of $\Delta^2$ if there is a point conjugate to $a$ in $(a,b]$. It is necessary to prove the converse result that if $\Delta^2>0$ then there are no points in $(a,b]$ conjugate to $a$. It then follows that $\Delta^2>0$ if and only if there are no conjugate points and hence, using theorem 8.1, that the stationary path gives a local minimum value of the functional. Thus we now need to prove the following theorem.

Theorem 8.6
If $\Delta^2[y,h]>0$, where $P(x)>0$ for $a\le x\le b$, for all $h\in D_1(a,b)$ such that $h(a)=h(b)=0$, then the interval $(a,b]$ contains no points conjugate to $a$.

We give an outline of the proof of this theorem by assuming the existence of a conjugate point $\tilde a$ satisfying $a<\tilde a\le b$ and showing that this contradicts the assumption $\Delta^2[y,h]>0$. First consider the limiting case $\tilde a=b$.
Note that if $h(x)$ satisfies the equation
\[ \frac{d}{dx}\left(P\frac{dh}{dx}\right)-Qh=0, \quad h(a)=h(b)=0, \]
then $\Delta^2[y,h]=0$, because of the identity
\[ 0=\int_a^b dx\left\{\frac{d}{dx}\left(P\frac{dh}{dx}\right)-Qh\right\}h=-\int_a^b dx\,\bigl(Ph'^2+Qh^2\bigr)=-\Delta^2[y,h], \tag{8.20} \]
obtained using integration by parts. This contradicts the assumption that $\Delta^2[y,h]>0$.

Theorem 8.6 is proved by constructing a family of positive definite functionals, $J_\lambda[h]$, depending upon a real parameter $\lambda$, such that for $\lambda=1$ we obtain the quadratic functional
\[ J_1[h]=\Delta^2[y,h]=\int_a^b dx\,\Bigl(P(x)h'(x)^2+Q(x)h(x)^2\Bigr), \]
where $y(x)$ is a stationary path. For $\lambda=0$ the functional is chosen to give
\[ J_0[h]=\int_a^b dx\,h'(x)^2, \]
for which there are no points conjugate to $a$, see exercise 8.11. It has to be shown that as $\lambda$ is increased continuously from 0 to 1 no conjugate points appear in the interval $[a,b]$.

Thus, consider the functional
\[ J_\lambda[h]=\int_a^b dx\,\Bigl\{\lambda\bigl(P(x)h'(x)^2+Q(x)h(x)^2\bigr)+(1-\lambda)h'(x)^2\Bigr\} \tag{8.21} \]
which is positive definite for $0\le\lambda\le1$ since we are assuming $J_1[h]=\Delta^2[y,h]>0$. The Euler-Lagrange equation for this functional is the linear, second-order equation
\[ \frac{d}{dx}\left\{\bigl(\lambda P+(1-\lambda)\bigr)\frac{dh}{dx}\right\}-\lambda Qh=0. \tag{8.22} \]
Suppose that $h(x,\lambda)$ is a solution of 8.22 satisfying the initial conditions $h(a,\lambda)=0$, $h_x(a,\lambda)=1$ for all $0\le\lambda\le1$. It can be shown that this solution is a continuous function of the parameter $\lambda$⁸, and for $\lambda=1$ it reduces to the solution of the Jacobi equation 8.13 for $S[y]$; for $\lambda=0$ it reduces to the solution of the equation $h''(x)=0$, that is $h(x)=x-a$.

Now assume that a conjugate point $\tilde a$, such that $a<\tilde a<b$, exists. By considering the set of all points in the rectangle $a\le x\le b$, $0\le\lambda\le1$, in the $(x,\lambda)$-plane, such that $h(x,\lambda)=0$, we can show that this assumption leads to a contradiction.

If such a set of points exists it represents a curve in the $(x,\lambda)$-plane. This follows because where $h(x,\lambda)=0$ we must have $h_x(x,\lambda)\neq0$, for if $h(c,\lambda)=h_x(c,\lambda)=0$ for some $c$, the uniqueness theorem for linear differential equations shows that $h(x,\lambda)=0$ for all $x\in[a,b]$, which is impossible since $h_x(a,\lambda)=1$ for $0\le\lambda\le1$.

Thus, since $h_x(x,\lambda)\neq0$, the implicit function theorem allows the equation $h(x,\lambda)=0$ to be inverted to define a continuous function $x=x(\lambda)$ in the neighbourhood of each point. The point $(\tilde a,1)$ lies on this curve, as shown in figure 8.6.

⁸ In fact $h(x,\lambda)$ has as many continuous derivatives as the functions $P(x)$ and $Q(x)$.
[Figure 8.6 Figure showing the possible lines along which $h(x,\lambda)=0$ in the rectangle $a\le x\le b$, $0\le\lambda\le1$; the five curves emanating from $(\tilde a,1)$ are labelled A to E.]

This figure shows the five possible curves A to E that can emanate from $(\tilde a,1)$; we now discuss each of these curves in turn, showing that each is impossible.

Curve A terminates inside the rectangle. This is impossible because it would contradict the continuous dependence of the solution $h(x,\lambda)$ on the parameter $\lambda$.

Curve B intersects the line $x=b$ for $0\le\lambda\le1$. This is impossible because then $h(x,\lambda)$ would be a non-trivial solution of equation 8.22 satisfying the boundary conditions $h(a,\lambda)=h(b,\lambda)=0$, so that, as in equation 8.20, $J_\lambda[h]=0$, which contradicts the assumption that this functional is positive definite.

Curve C intersects the line $a\le x\le b$, $\lambda=1$. This is impossible, for then we would have $d\lambda/dx=0$ at some point $(c,\lambda)$ inside the rectangle, and since
\[ \frac{\partial h}{\partial x}+\frac{\partial h}{\partial\lambda}\frac{d\lambda}{dx}=0 \]
this would mean $h=0$ and $h_x=0$ at this point which, for the reasons discussed above, means that $h(x,\lambda)=0$ for all $a\le x\le b$.

Curve D intersects the $x$-axis between $a\le x\le b$ when $\lambda=0$. This is impossible because when $\lambda=0$ the solution of equation 8.22 reduces to $h(x,0)=x-a$, which is zero only at $x=a$.

Curve E intersects the line $x=a$ for $0\le\lambda\le1$. This implies that $h_x(a,\lambda)=0$ and hence contradicts an initial assumption, as is seen by expanding the solution of the differential equation 8.22 about $x=a$: using the initial condition $h(a,\lambda)=0$, the Taylor expansion gives
\[ h(x,\lambda)=(x-a)h_x(a,\lambda)+\frac{1}{2}(x-a)^2h_{xx}(a,\lambda)+O\bigl((x-a)^3\bigr). \]
But the curve $x(\lambda)$ is a solution of $h(x,\lambda)=0$ and hence on this curve
\[ h_x(a,\lambda)=-\frac{1}{2}(x-a)h_{xx}(a,\lambda)+O\bigl((x-a)^2\bigr). \]
As $x\to a$ the right-hand side becomes zero, so for the curve $x(\lambda)$ to intersect the line $x=a$ we must have $h_x(a,\lambda)=0$, contradicting the initial assumption that $h_x(a,\lambda)=1$.
Hence we see that if $J_\lambda[h]>0$ there is no conjugate point $\tilde a$ satisfying $a<\tilde a\le b$, which completes the proof.

Exercise 8.16
For the functional
\[ S[y]=\frac{1}{2}\int_0^X dx\,\bigl(y'^{\,2}-y^2\bigr), \quad y(0)=0, \quad y(X)=1, \quad X>0, \]
show that equation 8.22 becomes $h''+\lambda h=0$, with solution $h(x,\lambda)=\sin(x\sqrt{\lambda})/\sqrt{\lambda}$.
Show that $h(x,0)=x$ and hence that for $\lambda=0$ there are no conjugate points; show also that for $\lambda>0$ there are infinitely many conjugate points at $x_k=k\pi/\sqrt{\lambda}$, $k=1,2,\dots$.
8.5 The Variational Equation
Here we provide an alternative interpretation of Jacobi's equation. Consider the Euler-Lagrange equation
\[ \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right)-\frac{\partial F}{\partial y}=0, \quad y(a)=A, \quad y(b)=B, \tag{8.23} \]
and assume that a stationary path, $y(x)$, exists. This Euler-Lagrange equation is second-order, so requires two conditions to specify a solution uniquely. We now consider a family of solutions by ignoring the boundary condition at $x=b$ and constructing a set of solutions passing through the first point $(a,A)$, restricting attention to those solutions that are close to $y(x)$.

One such family is obtained by defining $z(x)=y(x)+\epsilon h(x)$, where $|\epsilon|$ is small and $h(x)=O(1)$. Since, by definition, $z(a)=y(a)=A$ we must have $h(a)=0$. A sketch of such a family is shown in figure 8.7.

[Figure 8.7 Diagram showing a stationary path, $y(x)$, the solid line, and some members of the family of varied paths, $z(x)$, passing through $(a,A)$.]

Taylor's theorem gives
\[ F(x,z,z')=F(x,y+\epsilon h,y'+\epsilon h')=F(x,y,y')+\epsilon\bigl(hF_y+h'F_{y'}\bigr)+O(\epsilon^2), \tag{8.24} \]
where the derivatives $F_y$ and $F_{y'}$ are evaluated on the stationary path, $y(x)$. Since $z(x)$ satisfies the equation
\[ \frac{d}{dx}\left(\frac{\partial F}{\partial z'}\right)-\frac{\partial F}{\partial z}=0, \quad z(a)=A, \]
we can substitute the Taylor expansion 8.24 into this equation to obtain a differential equation for $h(x)$. For this substitution we use the equivalent expansions of the derivatives,
\[ F_z(x,z,z')=F_y(x,y,y')+\epsilon\bigl(hF_{yy}+h'F_{yy'}\bigr)+O(\epsilon^2), \qquad F_{z'}(x,z,z')=F_{y'}(x,y,y')+\epsilon\bigl(hF_{y'y}+h'F_{y'y'}\bigr)+O(\epsilon^2), \]
and since $y(x)$ also satisfies the Euler-Lagrange equation, we obtain
\[ \frac{d}{dx}\bigl(F_{yy'}h+F_{y'y'}h'\bigr)-F_{yy}h-F_{yy'}h'=O(\epsilon). \]
On taking the limit $\epsilon\to0$, this reduces to the variational equation
\[ \frac{d}{dx}\left(P(x)\frac{dh}{dx}\right)-Q(x)h=0, \quad h(a)=0, \tag{8.25} \]
where
\[ P(x)=\frac{\partial^2F}{\partial y'^2} \quad\text{and}\quad Q(x)=\frac{\partial^2F}{\partial y^2}-\frac{d}{dx}\left(\frac{\partial^2F}{\partial y\partial y'}\right). \]
This equation for $h(x)$ is linear and homogeneous, so all solutions may be obtained from the solution with the initial conditions $h(a)=0$, $h'(a)=1$, which is the solution of Jacobi's equation.

If the end point $x=b$ is a conjugate point, that is, the solution of equation 8.25 satisfies $h(b)=0$, then, because of equation 8.20, $\Delta^2[y,h]=0$ and the stationary path is not necessarily an extremum. Also, the adjacent paths $z=y+\epsilon h$ are, to $O(\epsilon)$, solutions of the Euler-Lagrange equation, because $z(a)=A$ and $z(b)=B$ and $z(x)$ satisfies the Euler-Lagrange equation to $O(\epsilon)$. This has many important physical consequences.

One example that helps visualise this property is geodesics on a sphere, that is, the shorter arcs of the great circles through two points. We choose the north pole to coincide with one of the points, N. For any other point, M, on the sphere, excluding the south pole, there is only one geodesic joining N to M and the varied paths, that is the solutions of equation 8.25, do not pass through M. If, however, we place M at the south pole, all great circles through N pass through M and in this case all the varied paths (not just the neighbouring paths, as in the general theory) pass through M. This behaviour is shown schematically in figure 8.8.
[Figure 8.8 Diagram showing the shortest path between the north pole, N, and a point M on a sphere, the solid line, and two neighbouring paths, the dashed lines; S marks the south pole.]

In the older texts on dynamics conjugate points are often referred to as kinetic foci because of an analogy with ray optics, and this analogy also affords another helpful visualisation of the phenomenon. Consider the passage of light rays from a point source through a convex lens, as shown in figure 8.9. The lens is centred at a point O on an axis AB: a point source at S, on AB, emits light which converges to the point S$'$, also on the axis. If the distances OS and OS$'$ are $u$ and $v$ respectively, it can be shown that $1/u+1/v=1/f$, where $f$ is the magnitude of the focal length of the lens.

[Figure 8.9 Diagram showing the passage of rays from a point source, S, through a convex lens, L, to the focal point S$'$; the distances OS and OS$'$ are $u$ and $v$.]

This picture is connected with the Euler-Lagrange and the Jacobi equations by observing that the rays from S to S$'$ satisfy Fermat's principle, described in section 3.5.7. The direct ray SS$'$ is the shortest path and satisfies the associated Euler-Lagrange equation; the adjacent rays, denoted by the dashed lines, are small variations about this path, with the variation satisfying Jacobi's equation. The lens focuses rays from S at the focal, or conjugate, point S$'$. If $u$ is decreased beyond a certain point S$'$ no longer exists, giving an illustration of how the existence of conjugate points depends upon the boundary conditions.

Exercise 8.17
Derive equation 8.25.
Exercise 8.18
(a) The general solution of the Euler-Lagrange equation with a single initial condition at $x=a$ contains one arbitrary constant. We denote this constant by $c$ and the solution by $y(x,c)$, so that $y(a,c)=A$ for all $c$ in some interval. Show that $\partial y/\partial c=y_c(x,c)$ is a solution of Jacobi's equation 8.25.
(b) For the functional defined in exercise 8.12 (page 221) find the general solution satisfying the condition $y(0)=0$ and confirm that this can be used to find the solution of the variational equation 8.25.

Exercise 8.19
For short intervals both $P(x)$ and $Q(x)$ may be approximated by their values at $x=a$. By solving equation 8.25 with this approximation show that for sufficiently short intervals the stationary path is a minimum if $P(a)>0$.
8.6 The brachistochrone problem
This problem was considered in section 5.2, where it was shown that the relevant functional can be written, apart from the external factor $(2g)^{-1/2}$, in the form
\[ T[z]=\int_0^b dx\,\sqrt{\frac{1+z'^{\,2}}{z}} \quad\text{where}\quad z(x)=A+\frac{v_0^2}{2g}-y(x). \tag{8.26} \]
If the motion starts from rest at $x=0$, $y=A>0$ and terminates at $x=b$, $y=0$, so $z(0)=0$ and $z(b)=A$, it is shown in section 5.2.3 that the stationary path is a segment of a cycloid that can be written in the parametric form,
\[ x=\frac{1}{2}c^2(2\theta-\sin2\theta), \quad z=c^2\sin^2\theta, \quad 0\le\theta\le\theta_b, \]
where the constant $c(b,A)$ is defined uniquely by $b$ and $A$ (section 5.2.3).

For this example the direct application of the general theory is left as an exercise for the reader, see exercise 8.20.

An alternative method, in which the roles of $x$ and $z$ are swapped, proves far easier: provided $z'(x)\neq0$, it is shown in exercise 5.5 (page 150) that the functional may be written in the form
\[ T[x]=\int_0^A dz\,\sqrt{\frac{1+x'^{\,2}}{z}}, \quad x'=\frac{dx}{dz}. \tag{8.27} \]
Since $z'(x)=1/\tan\theta$, this representation is valid when $0\le\theta<\pi/2$, that is, the value of $b$ must not be so great that the cycloid dips below the $x$-axis.

The integrand, $F=\sqrt{1+x'^{\,2}}/\sqrt{z}$, is independent of $x(z)$ and depends only upon the independent variable, $z$, and $x'(z)$, so $Q=0$ and
\[ P(z)=\frac{\partial^2F}{\partial x'^2}=\frac{1}{\sqrt{z}\,(1+x'^{\,2})^{3/2}}>0. \]
Thus the second variation, equation 8.8, becomes
\[ \Delta^2=\int_0^A dz\,P(z)h'(z)^2. \]
The singularity at $z=0$ is integrable, see exercise 5.23 (page 169), and hence, for all $h(z)$, $\Delta^2>0$ and from theorems 8.4 and 8.6 we see that the cycloidal path is a local minimum if $\theta_b<\pi/2$. In exercise 8.20 it is shown that the cycloid is a local minimum for all $\theta_b$.

For this example we can, however, prove that $T[x]$ is a global minimum. If $x(z)$ is the stationary path (and $x'(z)\neq0$) the value of the functional along another admissible path $x+h$ is
\[ T[x+h]=\int_0^A dz\,\frac{\sqrt{1+(x'+h')^2}}{\sqrt{z}}. \]
Using the result derived in exercise 3.4 (page 97), we see that
\[ T[x+h]-T[x]\ge\int_0^A dz\,\frac{x'h'}{\sqrt{z(1+x'^{\,2})}}. \]
But the Euler-Lagrange equation for the functional 8.27 gives $\dfrac{x'}{\sqrt{z(1+x'^{\,2})}}=d$, where $d$ is a constant. Hence
\[ T[x+h]-T[x]\ge d\int_0^A dz\,h'=0. \]
Thus, provided $x'(z)\neq0$, that is $\theta_b<\pi/2$ (page 151), the cycloid is a global minimum of the functional.
Exercise 8.20
For the functional 8.26 and the stationary cycloidal path show that
\[ P=\frac{\partial^2F}{\partial z'^2}=\frac{1}{\sqrt{z}\,(1+z'^{\,2})^{3/2}}=\frac{1}{c}\sin^2\theta, \]
and that
\[ Q=\frac{3}{4z^{5/2}}\sqrt{1+z'^{\,2}}+\frac{1}{2}\frac{d}{dx}\left(\frac{z'}{z^{3/2}\sqrt{1+z'^{\,2}}}\right)
 =\frac{3}{4z^{5/2}}\sqrt{1+z'^{\,2}}+\frac{1}{4c^2\sin^2\theta}\frac{d}{d\theta}\left(\frac{z'}{z^{3/2}\sqrt{1+z'^{\,2}}}\right)
 =\frac{1}{2c^5\sin^4\theta}. \]
Deduce that Jacobi's equation is
\[ \sin^2\theta\,\frac{d^2u}{d\theta^2}-2u=0. \]
Show that the general solution of this equation is
\[ u=\frac{A}{\tan\theta}+B\left(1-\frac{\theta\cos\theta}{\sin\theta}\right), \]
where $A$ and $B$ are constants, and deduce that the stationary path is a local minimum.
Note that one solution of Jacobi's equation, $A/\tan\theta$, is singular at $\theta=0$ because the Jacobi equation is singular here; in turn, this is because the cycloid has an infinite gradient at $\theta=0$.
Exercise 8.21
If $s(x)$ is a stationary path of the functional
\[ S[y]=\int_a^b dx\,f(x)\sqrt{1+y'^{\,2}}, \quad y(a)=A, \quad y(b)=B, \]
show that $S[s+h]\ge S[s]$ for all $h(x)$ satisfying $h(a)=h(b)=0$.
8.7 Surface of revolution
This problem is considered in section 5.3. In the special case in which both ends of the curve have the same height, $A=B$, see figure 5.7 (page 154), it was shown that provided the ratio $A/a$ exceeds 1.51, that is, the ends are not too far apart, there are two stationary paths given by the two real roots of the equation $A/a=g(\beta)$, see figure 5.9 (page 158). Here we show that the solution corresponding to the smaller root, $\beta=\beta_1$, gives a local minimum.

In this example the functional integrand is, apart from an irrelevant multiplicative constant, $F=y\sqrt{1+y'^{\,2}}$, so
\[ P(x)=\frac{\partial^2F}{\partial y'^2}=\frac{y}{(1+y'^{\,2})^{3/2}} \quad\text{and}\quad Q(x)=-\frac{d}{dx}\left(\frac{\partial^2F}{\partial y\partial y'}\right)=-\frac{d}{dx}\left(\frac{y'}{\sqrt{1+y'^{\,2}}}\right). \]
For the boundary conditions $y(-a)=y(a)=A$ the stationary paths are given by
\[ y=\frac{a}{\beta}\cosh\left(\frac{\beta x}{a}\right) \quad\text{where}\quad \frac{A}{a}=g(\beta)=\frac{1}{\beta}\cosh\beta, \]
and there are two real solutions if $A/a>\min(g)\simeq1.5089$. With these solutions we have
\[ P(x)=\frac{a}{\beta\cosh^2(\beta x/a)}>0 \quad\text{and}\quad Q(x)=-\frac{\beta}{a\cosh^2(\beta x/a)}. \tag{8.28} \]
By defining $z=\beta x/a$ Jacobi's equation becomes
\[ \frac{d}{dz}\left(\frac{1}{\cosh^2z}\frac{du}{dz}\right)+\frac{u}{\cosh^2z}=0, \quad u(-\beta)=0, \quad u'(-\beta)=1, \tag{8.29} \]
or
\[ \frac{d^2u}{dz^2}-2\tanh z\,\frac{du}{dz}+u=0, \quad u(-\beta)=0, \quad u'(-\beta)=1. \tag{8.30} \]
An obvious solution is $u=\sinh z$, so putting $u=v(z)\sinh z$ gives the following equation for $v(z)$,
\[ \frac{v''}{v'}=-\frac{2}{\sinh z\cosh z}, \quad\text{which integrates to}\quad \ln v'=-2\ln|\tanh z|+\text{constant}. \]
Hence for some constant $C$,
\[ \frac{dv}{dz}=\frac{C}{\tanh^2z}, \quad\text{and integration gives}\quad v(z)=C\,\frac{z\sinh z-\cosh z}{\sinh z}+D,\]
where $D$ is another constant. Hence the general solution of Jacobi's equation is
\[ u(z)=C(z\sinh z-\cosh z)+D\sinh z, \quad z=\frac{\beta x}{a}. \]
The constants $C$ and $D$ are obtained from the initial conditions $u(-\beta)=0$ and $u'(-\beta)=1$, which give the two equations
\[ 0=C\bigl(\beta\sinh\beta-\cosh\beta\bigr)-D\sinh\beta \quad\text{and}\quad 1=-C\beta\cosh\beta+D\cosh\beta. \]
These equations can be solved to give
\[ C=-\frac{\sinh\beta}{\cosh^2\beta} \quad\text{and}\quad D=\frac{\cosh\beta-\beta\sinh\beta}{\cosh^2\beta}. \]
Hence the required solution of Jacobi's equation is
\[ u(z)=\frac{(\cosh\beta-\beta\sinh\beta)\sinh z+(\cosh z-z\sinh z)\sinh\beta}{\cosh^2\beta}, \quad z=\frac{\beta x}{a}. \tag{8.31} \]
At $z=\beta$ this solution has the value
\[ u(\beta)=\frac{2\sinh\beta}{\cosh^2\beta}\bigl(\cosh\beta-\beta\sinh\beta\bigr)=-2\beta^2\frac{\sinh\beta}{\cosh^2\beta}\,g'(\beta) \quad\text{where}\quad g(\beta)=\frac{1}{\beta}\cosh\beta. \]
Here $\beta$ is a number defined by the solution of $g(\beta)=A/a$, and for $A/a>\min(g)$ there are two solutions $\beta_1<1.200<\beta_2$; from the graph of $g(\beta)$, see figure 5.9 (page 158), we see that $g'(\beta_1)<0$ and $g'(\beta_2)>0$. Hence $u(\beta_2)<0$, so this stationary path is not an extremum. But $u(\beta_1)>0$, so either $u(z)>0$ for $-\beta_1<z\le\beta_1$ or there are an even number of real roots of $u'(z)=0$. But the equation
\[ u'(z)=\bigl(\cosh\beta-\beta\sinh\beta-z\sinh\beta\bigr)\frac{\cosh z}{\cosh^2\beta}=0 \]
has at most one real root for $-\beta\le z\le\beta$, because $\cosh z\neq0$ and the other factor is linear in $z$. Hence for $\beta=\beta_1$ there are no conjugate points in the interval $(-\beta_1,\beta_1]$, and the stationary path is a local minimum.
8.8 The connection between Jacobis equation and
quadratic forms
In section 8.2.3 we considered the n-dimensional quadratic form z

Hz and provided
two conditions on the real, symmetric, n n matrix H for the quadratic form to be
either positive or negative denite. Here we show that the test involving the descending
minors, D
k
, k = 1, 2, , n (page 213) becomes, in the limit n , the condition that
Jacobis equation should have no conjugate points. This analysis is complicated and is
not assessed: it is included because it is important to know that the nite dimensional
theory tends to the correct limit as n . The method described here is taken from
Gelfand and Fomin (1963)
9
.
9
I M Gelfand and S V Fomin Calculus of Variations, (Prentice Hall).
8.8. JACOBIS EQUATION AND QUADRATIC FORMS 233
The connection is made using Eulers original simplication whereby the interval
[a, b] along the real axis is divided into n + 1 equal length intervals,
a = x
0
< x
1
< x
2
< < x
k
< x
k+1
< < x
n
< x
n+1
= b,
with x
k+1
x
k
= =
b a
n + 1
, k = 0, 1, , n, described in section 4.2.
Consider the quadratic functional

2
[y, h] = J[h] =
_
b
a
dx
_
P(x)h
2
+Q(x)h
2
_
, (8.32)
where y(x) is a stationary path and P(x) and Q(x) are dened in equation 8.9. Then,
as in equation 4.3 (page 122) we replace the integral by a sum, so the functional J[h]
becomes a function of h = (h
1
, h
2
, . . . , h
n
) with h
k
= h(x
k
),
J(h) =
n

k=0
_
_
h
k+1
h
k

_
2
P
k
+Q
k
h
2
k
_
, (8.33)
where P
k
= P(x
k
), Q
k
= Q(x
k
), for all k: also h
0
= h(a) = 0 and h
n+1
= h(b) = 0.
The function J(h) is quadratic in all the h
k
and it is helpful to collect the squares
and the cross terms together,
J(h) =
n

k=1
__
Q
k
+
P
k1
+P
k

_
h
2
k

P
k1
h
k1
h
k
_
. (8.34)
This quadratic form may be written in the form J = h

Mh where M is the real,


symmetric, n n matrix,
M =
_
_
_
_
_
_
_
_
_
_
_
a
1
b
1
0 0 0 0 0
b
1
a
2
b
2
0 0 0 0
0 b
2
a
3
b
3
0 0 0
0 0 b
3
a
4
0 0 0
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
0 0 0 0 b
n2
a
n1
b
n1
0 0 0 0 0 b
n1
a
n
_
_
_
_
_
_
_
_
_
_
_
,
where
a
k
= Q
k
+
P
k1
+P
k

, b
k
=
P
k

, k = 1, 2, , n.
This matrix is the equivalent of H(a) in equation 8.6 (page 213). The descending minors
of det(M) are the determinants
D
i
=

a
1
b
1
0 0 0 0 0 0
b
1
a
2
b
2
0 0 0 0 0
0 b
2
a
3
b
3
0 0 0 0
0 0 b
3
a
4
0 0 0 0
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
0 0 0 0 a
i3
b
i3
0 0
0 0 0 0 b
i3
a
i2
b
i2
0
0 0 0 0 0 b
i2
a
i1
b
i1
0 0 0 0 0 0 b
i1
a
i

, i = 1, 2, , n,
234 CHAPTER 8. THE SECOND VARIATION
with D
1
= a
1
and D
n
= det(M).
We need to show that the conditions D
k
> 0, for all k, become, in the limit n ,
the same as the condition that the solution of the Jacobi equation has no conjugate
points in (a, b]. This is achieved by rst obtaining a recurrence relation for the descend-
ing principal minors, by expanding D
k
with respect to the elements of the last row; this
gives
D
k
= a
k
D
k1
b
k1
E
k1
where E
k1
is the minor of b
k1
in D
k
,
E
k1
=

a
1
b
1
0 0 0 0 0 0
b
1
a
2
b
2
0 0 0 0 0
0 b
2
a
3
b
3
0 0 0 0
0 0 b
3
a
4
0 0 0 0
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
0 0 0 0 b
k4
a
k3
b
k3
0
0 0 0 0 0 b
k3
a
k2
0
0 0 0 0 0 0 b
k2
b
k1

.
Evaluating this determinant we obtain, E
k1
= b
k1
D
k2
and hence
D
k
= a
k
D
k1
b
2
k1
D
k2
, k = 1, 2, , n,
where we dene D
1
= 0 and D
0
= 1 in order to ensure that the relation is valid for
k = 1 and 2: the rst few terms are
D
1
= a
1
, D
2
= a
1
a
2
b
2
1
and D
3
= a
1
a
2
a
3
a
3
b
2
1
a
1
b
2
2
,
which can be checked by direct calculation.
Now substitute for a
k
and b
k
to obtain
D
k
=
_
Q
k
+
P
k1
+P
k

_
D
k1

P
2
k1

2
D
k2
. (8.35)
From this we see that D
1
= O(
1
), D
2
= O(
2
) and hence that D
j
= O(
j
). Since
we are interested in the limit 0 it is necessary to re-scale the variables to remove
this dependence upon ; thus we dene new variables X
j
, j = 0, 1, , n + 1, by the
relation
D
j
=
X
j+1

j+1
, D
1
= X
0
= 0, D
0
=
X
1

= 1.
Then equation 8.35 becomes
X
k+1
=
_
Q
k

2
+P
k1
+P
k
_
X
k
P
2
k1
X
k1
. (8.36)
This is still not in an appropriate form, so we dene the variables Z
k
, by the relations
X
k
= P
1
P
2
P
k2
P
k1
Z
k
, Z
0
= 0, Z
1
= X
1
= ,
and equation 8.36 becomes, after cancelling the product (P
1
P
2
P
k1
),
(Z
k+1
Z
k
) P
k
(Z
k
Z
k1
) P
k1
Q
k
Z
k

2
= 0. (8.37)
8.8. JACOBIS EQUATION AND QUADRATIC FORMS 235
Assuming that Z
k
can be derived from a function z(x), by Z
k
= z(x
k
), we have,
(Z
k+1
Z
k
) P
k
(Z
k
Z
k1
) P
k1
= P(x) (z(x +) z(x)) P(x ) (z(x) z(x ))
=
2
_
z

(x)P(x) +P

(x)z

(z)
_
+O(
3
),
where, on the right-hand side, we have put x
k
= x and x
k1
= x . Hence, in the
limit n equation 8.36 becomes
d
dx
_
P
dz
dx
_
Q(x)z = 0,
which is Jacobis equation. The conditions D
k
> 0, become, since P(x) > 0 and > 0,
z(x) > 0. Finally we observe that the initial conditions are Z(0) = z(a) = 0 and
z

(a) = lim
0
Z
1
Z
0

= 1.
Thus the condition D
k
> 0, for all k, becomes, in the limit n , the condition that
there should be no conjugate points in the interval a < x b.
236 CHAPTER 8. THE SECOND VARIATION
8.9 Miscellaneous exercises
Exercise 8.22
Find expressions for the second variation, 2, in terms of the stationary path y(x)
and its derivatives, for the following functionals.
(a) S[y] =
_
1
0
dxy
3
, (b) S[y] =
_
1
0
dx
_
y
2
1
_
2
,
(c) S[y] =
_
1
0
dxy
2
_
y
2
1
_
2
, (d) S[y] = exp
__
1
0
dxF(x, y

)
_
.
Exercise 8.23
Find the second variation of the following functionals.
(a) S[y] =
_
b
a
dxF(x, y), (b) S[y] =
_
b
a
dxF(x, y, y

, y

).
Note that in part (b) suitable boundary conditions need to be dened.
Exercise 8.24
Show that the second variation of the linear functional S[y] =
_
b
a
dx
_
A(x) +B(x)y

_
is zero.
Exercise 8.25
(a) Show that the stationary path of the functional
S[y] =
_
1
0
dxy
2
_
y
2
1
_
2
, y(0) = 0, y(1) = 1
is y = x and that this yields a global minimum.
(b) If the boundary conditions are y(0) = 0, y(1) = A > 0, show that a stationary
path exists only if A 1.
Exercise 8.26
By examining the solution of Jacobis equation for the functional
S[f] =
_
a
0
dx
_
f

(x)
2

2
f(x)
2
_
, f(0) = f(a) = 0,
where a and are positive constants, and using theorem 8.6 determine the range
of values of a and such that S[f] > 0. Deduce that
_
a
0
dxf

(x)
2
>
k
2
a
2
_
a
0
dxf(x)
2
, k < .
Compare this inequality with equation 8.11 (page 217).
Exercise 8.27
Show that y = 1 sin 2x is a stationary path of the functional
S[y] =
_
/4
0
dx
_
4y
2
8y y
2
_
, y(0) = 1, y(/4) = 0,
and use the result of exercise 8.26 to show that this is a global maximum.
8.9. MISCELLANEOUS EXERCISES 237
Exercise 8.28
Prove that the stationary path of the functional S[y] =
_
a
0
dx y
b
, y(0) = 0,
y(a) = A = 0, where b is a real number, yields a weak but not a strong local
minimum.
Exercise 8.29
If y(x, , ) is the general solution of an Euler-Lagrange equation and and
are two constants, show that if the ratio R(x) =
y
y

has the same value at two


distinct points these points are conjugate.
Exercise 8.30
(a) Find the stationary path of the functional
S[y] =
_
a
0
dx
y
y
2
, y(0) = 1, y(a) = A
2
,
where a > 0 and 0 < A < 1.
(b) Show that the Jacobi equation can be written in the form
(1 bx)
2
d
2
u
dx
2
+ 2b(1 bx)
du
dx
+ 2b
2
u = 0 where b =
1
a
(1 A).
(c) By substituting u = (1 bx)

into Jacobis equation nd values of to obtain


a general solution, and hence the solution satisfying the conditions u(0) = 0 and
u

(0) = 1; deduce that the stationary path yields a minimum.


238 CHAPTER 8. THE SECOND VARIATION
Chapter 9
The parametric representation
of functionals
9.1 Introduction: parametric equations
The functionals considered so far have the form
S[y] =
_
b
a
dxF(x, y, y

), y(a) = A, y(b) = B, (9.1)


giving rise to the Euler-Lagrange equation
d
dx
_
F
y

F
y
= 0, y(a) = A, y(b) = B. (9.2)
With this type of functional the path in the Oxy-plane is described explicitly by the
function y(x). Such a formulation is restrictive because the admissible functions must
be single valued, so curves with the shapes shown in gure 9.1 are not allowed; in
particular vertical gradients are forbidden, except at the end points.
A
y
x
B
a b
A
b
B
y
x
a
Figure 9.1 Examples of paths that cannot be described by a function y(x).
In addition, such a formulation creates an asymmetry between the x and y variables,
which is often not present in the original problem. In chapter 1 we encountered several
problems where these restrictions may create diculties. For example the solution
to the simplest isoperimetric problem described in section 3.5.5 and solved later
in chapter 12 is a circular arc, which may involve curves such as that depicted in
239
240 CHAPTER 9. PARAMETRIC FUNCTIONALS
gure 9.1. Another example is the navigation problem, described on page 110, where it
is entirely feasible that for some water velocities the required path may at some point
be parallel to the bank.
These anomalies may be removed by considering curves or arcs dened parametri-
cally; the coordinates of points on a path in a two dimensional space can be dened by
a pair of functions (x(t), y(t)), where x(t) and y(t) are piecewise continuous functions of
the parameter t, usually conned to a nite interval t
1
t t
2
. The curve is closed if
x(t
1
) = x(t
2
) and y(t
1
) = y(t
2
). In this context a common convention is the convenient
dot notation to represent derivatives, so x = dx/dt and x = d
2
x/dt
2
.
The simplest example of a parametrically dened curve is the straight line, x = at+b,
y = ct + d, where a, b, c and d are all constants. Another example is x(t) = cos t,
y(t) = sin t, giving x
2
+ y
2
= 1. If the parameter t is restricted to the range 0 t
this is a semi-circle in the upper half plane, and as t increases the arc is traversed
anticlockwise: if t it is a circle, that is a closed curve. Many other curves
are most easily described parametrically: the cycloid has already been discussed in
section 5.2.1 and several others are described in exercises 9.2 9.5 and 9.17 9.18. Other
examples are solutions of Newtons equations which are often expressed parametrically
in terms of the time.
It is important to note that there is never a unique parameter for a given curve.
For instance, the pair of functions x(s) = cos(s
2
), y(s) = sin(s
2
), 0 s <

2, also
denes the circle x
2
+ y
2
= 1. In geometric terms this merely states the obvious, that
the shape of the curve is independent of the particular choice of parameter: the value of
the parameter simply denes a point on the curve. This simple fact aects the nature
of functionals because functionals must be independent of the parametrisation used to
label the points on a curve.
If a curve is described by the functions x = f(t), y = g(t) and also explicitly by the
function y(x), then its gradient is
dy
dx
=
dg
dt
dt
df
=
g

f
.
One reason for using a parametric representation is that both coordinates are treated
equally. An example showing why this is sometimes helpful is the problem of determin-
ing the stationary values of the distance between the origin and points on the parabola
y
2
= a
2
(1 x), x 1, a > 0. (9.3)
This parabolic shape changes with a, as shown in gure 9.2. The parabola axis coincides
with the x-axis; all parabolas are symmetric about Ox and pass through (1, 0); they
intersect the y-axis at y = a.
A parametric representation of these curves is
x = 1
2
, y = a, a > 0, < < . (9.4)
9.1. INTRODUCTION: PARAMETRIC EQUATIONS 241
-1 -0.5 0 0.5 1
-4
-2
2
4
y
x
a=3
a=1/2
a=2
Figure 9.2 Diagram showing the shape of the
parabola 9.3, with a = 1/2,

2 and 3.
Now consider the distance, d(x), between a point on this curve and the origin. Intuition
suggests that for suciently small a the shortest distance will be to a point close to Oy
and that there will be a local maximum at (1, 0). But for large a the point (1, 0) will
become a local minimum.
Pythagoras theorem gives d(x) =
_
x
2
+y(x)
2
and elementary calculus shows that
if a <

2, d(x) is stationary at x = a
2
/2, where it has a local minimum. But if a >

2,
d(x) has no stationary points.
With the parametric representation the distance is d() =
_
(
2
1)
2
+ (a)
2
, and
this function is stationary at = 0, corresponding to the point (x, y) = (1, 0), for all
a, and at
2
= 1 a
2
/2 for a <

2; the rst of these stationary points was not given


by the previous representation. Thus the parametric formulation of this problem has
circumvented the diculty caused by a stationary point being at the edge of the range,
x = 1.
In chapter 10 we shall see that the two equivalent variational principles behave in
the same manner.
Exercise 9.1
(a) Show that the function d(x) =
_
x
2
+y(x)
2
, where y(x)
2
= a
2
(1 x) and
x 1, has a single minimum at x = a
2
/2 if a <

2. If a >

2 show that d(x) is


not stationary for any x 1 and has a global minimum at x = 1.
(b) Show that the function d() =
_
(
2
1)
2
+ (a)
2
, is stationary at = 0, for
all a, and at =
_
1 a
2
/2 if a <

2; classify the stationary points.


9.1.1 Lengths and areas
Lengths and areas of curves dened parametrically are obtained using simple general-
isations of the formulae used when a curve is represented by a single function. The
length along a curve is given by a direct generalisation of equation 3.1 (page 93) and
the derivation of the relevant formula is left to the reader in exercise 9.2.
The area, however, is dierent because there is no base-line, normally taken to be
the x-axis. Consider the arc AB in the Oxy-plane, shown in gure 9.3.
242 CHAPTER 9. PARAMETRIC FUNCTIONALS
B
A
O
x
y
S
t increasing
Figure 9.3 Diagram showing the area traced out by the
arc dened parametrically by (x(t), y(t)), t
A
t t
B
.
If the arc is dened parametrically by (x(t), y(t)) with t = t
A
and t
B
at the points
A and B then it can be shown, by splitting the region into elementary triangles, see
exercise 9.20, that the area, S, of the region OAB is given by
S =
1
2
_
tB
tA
dt
_
x
dy
dt
y
dx
dt
_
, t
A
< t
B
. (9.5)
If the curve is closed it is traced out anticlockwise and S is the area enclosed. Elementary
geometry can be used to connect this area with the area represented by the usual integral
_
xB
xA
dxy(x), namely the area between the arc and the x-axis, see exercise 9.3.
If the arc AB in gure 9.3 is dened parametrically by (x
1
(), y
1
()) with increas-
ing from B to A, the sign of the above formula is reversed and the area is
S =
1
2
_
A
B
d
_
y
1
dx
1
d
x
1
dy
1
d
_
,
B
<
A
. (9.6)
If the curve is closed it is traced out clockwise.
The formula 9.5 is useful for nding the area of closed curves. Consider, for instance,
the Lemniscate of Bernoulli, having the Cartesian equation
_
x
2
+y
2
_
2
= a
2
_
x
2
y
2
_
and the shape shown in gure 9.4.
y=x y=-x
x
y
Figure 9.4 The Lemniscate of Bernoulli.
One parametric representation of this curve is
x =
a

2
sin
_
1 + sin
2
, y =
a
2

2
sin 2, .
9.1. INTRODUCTION: PARAMETRIC EQUATIONS 243
Restricting to the interval [0, ] gives the right-hand loop and as increases from 0
to the curve is traversed clockwise. The area S of each loop can be computed using
equation 9.6: we have
y
dx
d
x
dy
d
=
a
2
4
(1 + 2 sin
2
) cos sin2
_
1 + sin
2

a
2
2
sin cos 2
_
1 + sin
2

=
a
2
sin
3

_
1 + sin
2

,
so the area of each loop is
S =
a
2
2
_

0
d
sin
3

_
1 + sin
2

, and putting c = cos gives


S =
a
2
2
_
1
1
dc
1 c
2

2 c
2
= a
2
_
/4
0
d cos 2 =
a
2
2
,
where, for the penultimate integral, we have used the substitution c =

2 sin and
then used the trigonometric identity cos 2 = 1 2 sin
2
.
Exercise 9.2
Show that the length of a curve dened parametrically by functions (x(t), y(t))
between t = ta and t
b
is s =
_
t
b
ta
dt
_
x
2
+ y
2
.
Exercise 9.3
Consider the curve dened by the function y(x) and parametrically by the pair
(x(t), y(t)), ta t t
b
. Show that the area, S, under the curve between x = a,
x = b and the x-axis, is given by
S =
_
b
a
dxy(x) =
_
t
b
ta
dt xy =
1
2
_
t
b
ta
dt ( xy x y) +
1
2
(bB aA)
where (x(ta), y(ta)) = (a, A) and (x(t
b
), y(t
b
)) = (b, B).
Use gure 9.3 to provide a geometric interpretation of this formula.
Exercise 9.4
We expect the formula quoted in equation 9.5 to be independent of the choice
of parameter. If () is an increasing function,

() > 0, with (a) = ta and


(
b
) = t
b
, show that by putting t = () and using as the independent variable
the area becomes
S =
1
2
_

b
a
d
_
x()y

() x

()y()
_
,
so that the form of the expression for the area is unchanged, as would be expected.
Note that the formula for the area is invariant under parameter changes because
the integrand is homogeneous of degree one in the rst derivatives of x and y.
244 CHAPTER 9. PARAMETRIC FUNCTIONALS
Exercise 9.5
Four typical parametric curves, with their parametric representation are:
The ellipse: x = a cos , y = b sin ;
The astroid: x = a cos
3
, y = a sin
3
;
The cardioid: x = a(2 cos cos 2), y = a(2 sin sin 2);
The cycloid: x = a( sin ), y = a(1 cos ),
where 0 2 and a and b are positive numbers: in the rst three cases the
angle is polar angle and the curve is traced out anticlockwise. Examples of the
rst three curves are shown in the following gure: a cycloid is shown in gure 5.1
(page 146).
y
x
b/a=1/4
b/a=3/4
Ellipses
x
y
Astroid
x
y
3a
a
Cardioid
Find the area enclosed and the length of each of these curves.
Note the arc length of an ellipse, the rst example, is given in terms of the function
known as a complete Elliptic integral which cannot be expressed as a nite com-
bination of standard functions. This integral, and its relations, are important in
many problems and have played an important role in the development of Analysis.
9.2 The parametric variational problem
In this section we re-write the functional 9.1 in terms of the parametrically dened
curve (x(t), y(t)). One might think that such an apparently minor change would make
no substantial dierence. This, however, is not the case as will soon become apparent.
The gradient y

(x) of a curve is just the ratio y/ x and hence the parametric func-
tional which is the equivalent of the functional 9.1 is
S[x, y] =
_
t2
t1
dt (x, y, x, y), x(t
1
) = a, y(t
1
) = A, x(t
2
) = b, y(t
2
) = B, (9.7)
where
1
(x, y, x, y) = xF(x, y, y/ x). (9.8)
Now, both the functions x(t) and y(t) have to be determined, not just y(x). The two
Euler-Lagrange equations for the functional 9.7 are given by the general formulae of
1
The function used here is unrelated to the used in section 7.3.
9.2. THE PARAMETRIC VARIATIONAL PROBLEM 245
equation 6.33 (page 185), that is
d
dt
_

x
_


x
= 0 and
d
dt
_

y
_


y
= 0, (9.9)
with the boundary conditions dened in equation 9.7. These two equations replace the
original single equation 9.2 which, at rst sight, seems rather strange since they dene
the same curve. However, the change to the parametric form of the functional is more
subtle than is apparent; in particular, it will be shown that these two equations are not
independent, meaning that a solution of one is automatically a solution of the other.
It is clear that for any parametric functional,
S[x, y] =
_
t2
t1
dt G(x, y, x, y), (9.10)
to be useful the value of the integral must be independent of the particular parameteri-
sation chosen. This means that the integrand, G(x, y, x, y), must not depend explicitly
upon the parameter, t. In addition, it was proved by Weierstrass that a necessary and
sucient condition is that G(x, y, x, y) is a positive homogeneous function of degree one
in x and y, that is,
G(x, y, x, y) = G(x, y, x, y) for any real > 0, (9.11)
provided x and y are in C
1
. It is clear that the function dened in equation 9.8
satises this condition.
We require positive homogeneity because the distance
_
x
2
+ y
2
occurs frequently
and
_
( x)
2
+ ( y)
2
= ||
_
x
2
+ y
2
,
that is, this function is homogeneous only if > 0.
Changing the parameter t to , where t = (), and

() > 0, transforms equa-


tion 9.10 into
S[x, y] =
_
2
1
d

()G
_
x, y,
x

,
y

_
=
_
2
1
d G(x, y, x

, y

).
Hence, in both the t- and -representation the Euler-Lagrange equations are identical,
and have the same solutions. That is, the variational principle is invariant with respect
to a parameter transformation. Note that one consequence of this is that the range of
integration plays no role and can always be scaled to [0, 1], or some other convenient
range.
The second consequence of homogeneity is that the two Euler-Lagrange equations 9.9
are not independent. This result follows from Eulers formula, see exercise 1.25 (page 28),
which gives
G = xG
x
+ yG
y
. (9.12)
Before proving this you should do the following two exercises involving systems where
the relation between the two Euler-Lagrange equations is clear. Following these exer-
cises we derive the general result.
246 CHAPTER 9. PARAMETRIC FUNCTIONALS
Exercise 9.6
Show that the functional S[y] =
_
1
0
dxy
2
when expressed in parametric form,
becomes S[x, y] =
_
1
0
dt y
2
x
1
.
Show also that the Euler-Lagrange equations for x and y are, respectively,
d
dt
_
z
2
_
= 0 and
dz
dt
= 0 where z =
y
x
.
Exercise 9.7
Show that the functional S[y] =
_
1
0
dxF(y

) when expressed in parametric form,


becomes S[x, y] =
_
1
0
dt xF( y/ x).
Show also that the Euler-Lagrange equations for x and y are, respectively,
d
dt
_
F(z) zF

(z)
_
= 0 and
d
dt
_
F

(z)
_
= 0 where z =
y
x
,
and that these equations have the same general solution provided F

(z) = 0.
The results derived in these two exercises are particular examples of a general relation
between the two Euler-Lagrange equations. This relation is obtained by forming the
total derivative of with respect to t and rearranging the result:
d
dt
= x
x
+ y
y
+ x
x
+ y
y
= x
x
+ y
y
+
d
dt
( x
x
+ y
y
) x
d
x
dt
y
d
y
dt
.
Now use Eulers formula, that is = x
x
+ y
y
, to rearrange the latter expression in
the following manner,
_
d
dt
_

x
_


x
_
x +
_
d
dt
_

y
_


y
_
y = 0. (9.13)
This is an identity derived by assuming only that (x, y, x, y) is homogeneous of degree
one in x and y: at this stage we have not assumed that x(t) and y(t) are solutions of
the Euler-Lagrange equations. If, however, one of the Euler-Lagrange equations 9.9 is
satised, it follows from this identity that the other equation must also be satised.
The exceptional case when either y = 0 or x = 0 for all t, is ignored. Thus, in general,
the two Euler-Lagrange equations are not independent.
Furthermore, because is homogeneous of degree one in x and y, either of the Euler-
Lagrange equations 9.9 can be expressed as the second-order equation in y(x) derived
from the non-parametric form of the functional, as would be expected, see exercise 9.22.
Exercise 9.8
Using the fact that the distance between the origin and the point (a, A) of the
Oxy-plane can be expressed in terms of the parametric functional
S[x, y] =
_
1
0
dt
_
x
2
+ y
2
, x(0) = y(0) = 0, x(1) = a, y(1) = A,
9.2. THE PARAMETRIC VARIATIONAL PROBLEM 247
show that the general solution of the Euler-Lagrange equations is x + y = ,
where , and are constants, and hence that the required solution is ay = Ax.
What is the solution if a = 0 and can this particular solution be described by the
original functional of equation 3.1 (page 93)?
Exercise 9.9
Show that Noethers theorem applied to the functional 9.7 gives the rst-integral
= x x + y y +c where c is a constant.
Recall that Eulers formula gives this identity but with c = 0, so in this instance
no further information is gleaned from Noethers theorem.
Exercise 9.10
Use equation 9.8 to show that

x
= F y

F
y

and

y
=
F
y

.
9.2.1 Geodesics
In this section we assume that all curves and surfaces are dened by smooth functions.
A geodesic is a line on a curved surface joining two given points, along which the
distance
2
is stationary
3
. Here we assume that the surface is two-dimensional so the
position on it may be dened by two real variables, which we denote by (u, v). Thus
the Cartesian coordinates, (x, y, z), of every point on the surface can be described by
three functions x = f(u, v), y = g(u, v) and z = h(u, v), which are assumed to be at
least twice dierentiable.
An example of such a surface is the sphere of radius r, centred at the origin: the
natural coordinates on this sphere are the spherical polar angles (, ), where is related
to the latitude, as shown in the diagram, and is the longitude.
x y
z
P

Figure 9.5 Diagram showing the physical meaning of the


spherical polar coordinates (r, , ).
2
At this point of the discussion we use an intuitive notion of the distance; a formal denition is
given in equation 9.17.
3
In some texts the name geodesic is used only for the shortest path.
248 CHAPTER 9. PARAMETRIC FUNCTIONALS
The Cartesian coordinates of the point P are
x = r sin cos , y = r sin sin , z = r cos , (9.14)
and it is conventional to limit to the closed interval [0, ], with the north and south
poles being represented by = 0 and = respectively. Since the points with coordi-
nates (, ) and (, + 2k), k = 1, 2, , are physically identical it is sometimes
necessary to be careful about the range of . Note that both the north and south
poles are singular points of this coordinate system in the sense that at both points is
undened.
Returning to the general case, the distance, s, between the two points with coor-
dinates (x, y, z) and (x +x, y +y, z +z) is, from Pythagoras theorem,
s
2
= x
2
+y
2
+z
2
. (9.15)
If these two points are on the surface, with coordinates (u, v) and (u + u, v + v),
respectively, then if u and v are small
x +x = f(u +u, v +v)
= f(u, v) +f
u
(u, v)u +f
v
(u, v)v + higher order terms,
with similar expressions for y +y and z +z. Thus, to rst-order,
x = f
u
(u, v)u+f
v
(u, v)v, y = g
u
(u, v)u+g
v
(u, v)v, z = h
u
(u, v)u+h
v
(u, v)v.
Substituting these expressions into equation 9.15 and rearranging, gives the following
expression for the distance between neighbouring points on the surface
s
2
= E(u, v)u
2
+ 2F(u, v)uv +G(u, v)v
2
+ higher order terms, (9.16)
where
E = f
2
u
+g
2
u
+h
2
u
, G = f
2
v
+g
2
v
+h
2
v
and F = f
u
f
v
+g
u
g
v
+h
u
h
v
.
Now consider a line on the surface dened by the two dierentiable functions
(u(t), v(t)), depending on the parameter t. Then to rst-order u = ut and v = vt
and the distance 9.16 becomes, to this order,
s = t
_
E(u, v) u
2
+ 2F(u, v) u v +G(u, v) v
2
.
Finally, the distance between the points r
1
= (u(t
1
), v(t
1
)) and r
2
= (u(t
2
), v(t
2
)) on
the surface and along the line parameterised by the functions (u(t), v(t)) is dened by
the functional obtained by integrating the above expression,
S[u, v] =
_
t2
t1
dt
_
E(u, v) u
2
+ 2F(u, v) u v +G(u, v) v
2
. (9.17)
A geodesic is, by denition, given by those functions that make S[u, v] stationary. These,
therefore, satisfy the associated Euler-Lagrange equations for u(t) and v(t) which we
now derive.
9.2. THE PARAMETRIC VARIATIONAL PROBLEM 249
For this functional
2
= E u
2
+ 2F u v +G v
2
, so that

u
= E u +F v,

v
= G v +F u.
Hence the Euler-Lagrange equations for u and v are, respectively,
d
dt
_
E u +F v

u
2
E
u
+ 2 u vF
u
+ v
2
G
u
2
= 0, (9.18)
d
dt
_
G v +F u

u
2
E
v
+ 2 u vF
v
+ v
2
G
v
2
= 0. (9.19)
We illustrate this theory by nding stationary paths on a sphere, also treated in exer-
cise 5.20 (page 168). For the polar coordinates illustrated in gure 9.5 we have, from
equation 9.14, and remembering that r is a constant,
f(, ) = r sin cos , g(, ) = r sin sin , h(, ) = r cos ,
which gives,
E = r
2
, G = r
2
sin
2
and F = 0, (9.20)
so that = r
_

2
+

2
sin
2
and the functional for the distance becomes
S[, ] = r
_
1
0
dt
_

2
+

2
sin
2
. (9.21)
For any two points there is some freedom of choice in the coordinate system and it is
convenient to choose the north pole to coincide with one end of the path, by setting
(0) = 0. If the nal point is ((1), (1)) = (
1
,
1
) the Euler-Lagrange equation for
is
d
dt
_

sin
2

_
= 0 which gives

sin
2
= A
_

2
+

2
sin
2
, (9.22)
for some constant A. But at t = 0, = 0 and hence A = 0, so

= 0 for all t. Therefore
(t) =
1
. As expected the stationary path is the line of constant longitude, that is
it lies on the great circle through the two points. Notice that unless the second point
is at the south pole this gives two stationary paths, the short and the long route. If
the second path is at the south pole there are innitely many paths, all with the same
length.
Exercise 9.11
Consider a cylinder with axis along Oz, with circular cross section of radius r.
The position of a point r on this cylinder can be described using cylindrical polar
coordinates (, z), r being xed, where r = (r cos , r sin , z).
(a) Show that E = r
2
, G = 1 and F = 0.
(b) Show that the Euler-Lagrange equations for the geodesic between the points
(1, z1) and (2, z2) are
d
dt
_
_
r
2

_
r
2

2
+ z
2
_
_
= 0,
d
dt
_
_
z
_
r
2

2
+ z
2
_
_
= 0, (t
k
) =
k
, z(t
k
) = z
k
, k = 1, 2.
Hence show that the stationary path is z = z1 + (z2 z1)
1
2 1
.
250 CHAPTER 9. PARAMETRIC FUNCTIONALS
9.2.2 The Brachistochrone problem
The functional for this problem is given in equation 5.6 (page 150) and in parametric
form this becomes (on omitting the factor (2g)
1/2
)
T[x, z] =
_
1
0
dt
_
x
2
+ z
2
z
, x(0) = 0, x(1) = b, z(0) = A, z(1) = 0, (9.23)
and the Euler-Lagrange equations for x and y are, respectively,
d
dt
_
x
_
z( x
2
+ z
2
)
_
= 0 and
d
dt
_
z
_
z( x
2
+ z
2
)
_
+

x
2
+ z
2
2z
3/2
= 0. (9.24)
The rst of these equations may integrated and rearranged to give x
2
(c
2
z) = z z
2
,
for some constant c, which is just equation 5.7 (page 151). One more integration gives
x = d +
c
2
2
(2 sin 2), z = c
2
sin
2
,
for some constant d.
For this problem there is no apparent advantage in using a parametric formula-
tion: in chapter 12, however, we shall see that it enables more general brachistochrone
problems to be tackled.
9.2.3 Surface of Minimum Revolution
The functional for this problem is given in equation 5.11 (page 155); for the symmetric
version the parametric form becomes
S[x, y] =
_
1
1
dt y
_
x
2
+ y
2
, x(1) = a, y(1) = y(1) = A, (9.25)
and the Euler-Lagrange equations for x and y are, respectively,
d
dt
_
y x
_
x
2
+ y
2
_
= 0 and
d
dt
_
y y
_
x
2
+ y
2
_

_
x
2
+ y
2
= 0. (9.26)
The rst of these gives
y x
_
x
2
+ y
2
= c > 0 or x
2
=
c
2
y
2
c
2
y
2
, (9.27)
which is the same as equation 5.14 (page 157), and for c > 0 gives the solution obtained
in that section. Another solution, however, is obtained by setting c = 0, so that y x = 0,
which gives either y = 0 or x = 0; the latter is the Goldschmidt solution, equation 5.20
(page 160), which cannot be found using the original formulation of this problem. Thus
enlarging the space of admissible functions allows more solutions to be found.
Exercise 9.12
Show directly that the two equations 9.26 satisfy the identity 9.13.
9.3. THE PARAMETRIC AND THE CONVENTIONAL FORMULATION 251
Exercise 9.13
By substituting y = c cosh(t +), where and are constants, into the rst of
equations 9.27 show that the solution satisfying the boundary conditions is
x = at, y = c cosh
_
at
c
_
with
A
c
= cosh
_
a
c
_
.
9.3 The parametric and the conventional formula-
tion compared
We end this chapter with the important observation that the original variational princi-
ple, equation 9.1, and the associated, parametric problem, equation 9.7 are not precisely
equivalent, because a path that is an extremum of the rst problem is not necessarily
an extremum of the second because the class of admissible functions has been enlarged.
This feature is illustrated in the following example.
Consider the functional
S[y] =
_
1
0
dxy
2
, y(0) = 0, y(1) = 1. (9.28)
If the admissible paths are in C
1
[0, 1] the stationary path is y = x; this is a global
minimum and on this path S = 1, see exercise 3.4 (page 97). Also, if we widen the class
of admissible functions to piecewise continuously dierentiable functions, the stationary
path is the same straight line.
The associated parametric problem is
S[x, y] =
_
1
0
dt
y
2
x
, x(0) = y(0) = 0, x(1) = y(1) = 1, (9.29)
and now we let the admissible functions be piecewise dierentiable. In this case the
stationary path is the straight line, see exercise 9.14, but this no longer yields a minimum
of the functional. The reason for this is simply that in the original functional 9.28 the
integrand is never negative, so S[y] 0: in the parametric functional the integrand can
be negative, which means, in this case, that the functional is unbounded below.
Consider the path OBA, passing through the point B = (, ) and where A = (1, 1),
shown in gure 9.6.
1
1
(1,1)
(,)

x
y
O
A
B
Figure 9.6
252 CHAPTER 9. PARAMETRIC FUNCTIONALS
This path comprises two straight line segments OB and BA, dened, respectively, by
the parametric equations
x = t,
y = t,
0 t 1, and
x = (1 )(t 1) +,
y = (1 )(t 1) + ,
1 t 2.
Such a curve can be described by a single valued function y(x) only if 0 < < 1. On
this path, provided is neither 0 nor 1,
S(, ) =

2

+
(1 )
2
(1 )
,
and hence S can be made arbitrarily large by setting = 1 +
2
, for arbitrarily
small : hence the functional has no minimum. If is restricted to the interval (0, 1)
then S(, ) is always positive and has a local minimum at = , giving the straight
line y = x.
Exercise 9.14
Show that the general solution of the Euler-Lagrange equations for the func-
tional 9.29 is the line y = mx + c for some constants m and c, and use the
boundary conditions to determine their values.
Exercise 9.15
Consider the zig-zag path comprising the se-
quence of segments like AB and BC shown in
the gure. On AB, y = 0 and on BC y = 1,
x = 1.
Show that on this path S = 1. Show also that
such a path may be made arbitrarily close to the
straight line y = x.
1
1
B
C
A
x
y
9.4. MISCELLANEOUS EXERCISES 253
9.4 Miscellaneous exercises
Exercise 9.16
Show that the curve dened by the parametric equations x = at
2
, y = 2at, where
a is a positive constant, is the parabola y
2
= 4ax and that the length of the curve
between t = 1 and t = 1 is 2a
_

2 + ln(1 +

2)
_
.
Exercise 9.17
An epicycloid is a curve traced out by a point on a circle which rolls, without
slipping, on the outside of a xed circle. If the radii of the xed and rolling circles
are R and r respectively, the parametric equations of the epicycloid are
x = (R +r) cos r cos
_
R +r
r

_
, y = (R+r) sin r sin
_
R+r
r

_
,
with 0 < 2 and the curve is traced out anticlockwise as increases. It is
a worthwhile exercise to derive these equations. If R/r = n is a positive integer
these equations dene a closed curve. If n = 1 we obtain a cardioid, dened in
exercise 9.5: other examples are shown in the following gures.
Epicyloids
n=5
n=4
x
y
x
y
Show that the area enclosed by such a closed curve is S = (R +r)(R + 2r) and
that the length of the curve is 8(R+r). In the limit r R explain how this result
is related to the length of a cycloid.
Exercise 9.18
A trochoid is formed in a manner similar to the cycloid, section 5.2.1, except that
the point tracing out the curve is xed on a radius at a distance ka from the centre
of the rolling circle. The parametric equations of this curve are
x = a( k sin ), y = a(1 k cos ),
which reduce to the equations of the cycloid when k = 1.
When k < 1 the trochoid is a smooth arc above the x-axis, as shown on the left-
hand side of gure 9.7: when k > 1 there are loops below the line y = a(1k cos ),
where is the root of = k sin in the interval (0, ), as shown on the right of
the gure.
254 CHAPTER 9. PARAMETRIC FUNCTIONALS
0 1 2 3 4 5 6
0.5
1
1.5
2
0 2 4 6
0
1
2
3 y
x
x
y
k=0.6
k=1.8
Figure 9.7 Graphs showing two typical trochoids.
(a) If k < 1 show that the area under the trochoid between = 0 and 2 is
a
2
(2 +k
2
).
(b) Show that the area enclosed by the loop surrounding the origin for k > 1 is
given by
S = a
2

_
k
2
2 +k cos
_
where = k sin , 0 < < .
(c) (Harder) Show that the two loops shown in the right-hand gure touch when
k satises the equation =

k
2
1 cos
1
(1/k) (the only real solution of this
equation is k = 4.6033 . . .).
Exercise 9.19
The coordinates of a point on a curve are given by
x = a(u tanh u), y =
a
cosh u
,
where a is a positive constant and u the parameter. Show that the arc length
from u = 0 to u = v is L = a ln(cosh v).
Exercise 9.20
If the points (x
k
, y
k
), k = 1, 2, 3, in the Cartesian plane dene the vertices of the
triangle ABC, respectively, then it may be shown that the area of the triangle is
|| where
=
1
2
x1(y2 y3) +
1
2
x2(y3 y1) +
1
2
x3(y1 y2),
=
1
2
y1(x2 x3)
1
2
y2(x3 x1)
1
2
y3(x1 x2),
and that is positive if moving round the vertices A, B and C, in this order,
represents an anti-clockwise rotation. By putting C = (x3, y3) at the origin and
(x1, y1) = (x(t1), y(t1)) and (x2, y2) = (x(t2), y(t2)) where t2 t1
use the above formula to show that the quantity
S =
1
2
_
t
2
t
1
dt (x y xy)
represents the area of the region OPQ shown in the
gure, where the curve PQ is an arc of the curve de-
ned parametrically by (x(t), y(t)) for t1 t t2.
2 2
) (x ,y
1 1
) (x ,y
x
O
y
P
Q
9.4. MISCELLANEOUS EXERCISES 255
Exercise 9.21
Using the fact that (x, y, x, y) is homogeneous of degree 1 in x and y show that
(x, y, x, y) = x(x, y, 1, y

(x)) and x(x, y, x, y) = x(x, y, 1, y

(x)).
Exercise 9.22
Use the two parametric Euler-Lagrange equations dened in equation 9.9 to derive
the Euler-Lagrange equation for y(x), dened in equation 9.2.
Hint use the Euler-Lagrange equation for x and use x for the parameter t.
Exercise 9.23
Consider the functional
S[x, y] =
_
1
0
dt
_
x
2
+ y
2
+ x y, x(0) = y(0) = 0, x(1) = X, y(1) = Y.
(a) Show that the Euler-Lagrange equation for x can be integrated to give the
general solution
y

+ 2 = A
_
1 +y

+y
2
for some constant A.
(b) Hence show that the stationary path is Xy = Y x.
(c) Explain why the Euler-Lagrange equation for y is not needed to nd this
solution.
Exercise 9.24
Consider the functional
S[x, y] =
_
1
0
dt
_
x
2
+ y
2
+ x y
_
, x(0) = y(0) = 0, x(1) = X, y(1) = Y.
(a) Show that the two Euler-Lagrange equations be integrated to give the general
solution
2x +y = At +B, x + 2y = Ct +D
where A, B, C and D are constants.
(b) Hence show that the stationary path is x = Xt, y = Y t.
(c) Explain why both Euler-Lagrange equations are needed to nd this solution.
Exercise 9.25
Show that the stationary paths of the functional
S[x, y] =
_
1
0
dt
_
1
2
(x y y x)
_
x
2
+ y
2
_
are the circles (x A1)
2
+ (y A2)
2
=
2
, where A1 and A2 are constants.
256 CHAPTER 9. PARAMETRIC FUNCTIONALS
Chapter 10
Variable end points
10.1 Introduction
The functionals considered previously all involve xed end points, that is the inde-
pendent variable is dened on a given interval at the ends of which the value of the
dependent variable is known. It is not hard to nd variational problems with dierent
types of boundary conditions: in this introduction we describe a few of these problems
in order to motivate the analysis described here and in chapter 12.
The simplest generalisation is to natural boundary conditions in which the interval
of integration is given, but the value of the path at either one or both ends is not
given but needs to be determined as part of the variational principle. An example is a
stationary, loaded, sti beam, which adopts a conguration that minimises its energy.
If the unloaded beam is horizontal along the x-axis, between x = 0 and L, and y(x)
represents the displacement, assumed small, the bending energy is proportional to its
curvature, which for small |y| is proportional to y

(x)
2
; then if (x) is the density per
unit length (of the beam and the load) the energy functional can be shown to be
E[y] =
_
L
0
dx
_
1
2
y
2
g(x)y
_
, (10.1)
where is a positive constant and g the acceleration due to gravity. Note that here
y(x) is positive for displacements below the x-axis. The Euler-Lagrange equation for
this functional is a linear, fourth order equation, see section 10.2.1, so requires four
boundary conditions.
If the beam is clamped horizontally at x = 0, there are just two boundary conditions,
y(0) = y

(0) = 0, though experience shows that this problem has a unique solution.
It transpires that the other two conditions, needed to determine this solution of the
Euler-Lagrange equation, can be derived directly from the variational principle that
requires E[y] to be stationary.
Alternatively, if the beam is simply supported at both ends, giving the boundary
conditions y(0) = y(L) = 0, it can be shown that the remaining two boundary conditions
are also obtained by insisting that E[y] is stationary. We explore this problem in
section 10.2.1.
257
258 CHAPTER 10. VARIABLE END POINTS
The rst person to generalise boundary conditions was Newton in his investigations
of the motion of an axially symmetric body through a resisting medium, see equa-
tion 3.22 (page 110).
The brachistochrone problem was generalised by John Bernoulli in 1697, by allowing
the lower end of the stationary path to move on a given curve, dened by an equation
of the form (x, y) = 0. In gure 10.1 we show an example where the right end of the
brachistochrone lies on the straight line dened by (x, y) = x + y 1 = 0, and the
left end is xed at (0, A), with A < 1. In gure 10.1 are shown the brachistochrones
for various values of A when the particle starts at rest at (0, A). The equation for the
stationary paths is derived in exercise 10.13. Notice that the cycloid intersects the curve
(x, y) = 0 at right angles and at x = 0 the gradient of the cycloid is innite.
0.2 0.4 0.6 0.8 1
0
0.2
0.4
0.6
0.8
1
(x,y)=x+y-1=0
Cycloid segments
R L
x
y
Figure 10.1 Diagram showing stationary paths through the point (0, A), for
A = 0.2, 0.5 and 0.9, and (v, y(v)) where the right end is constrained to lie on the
straight line (x, y) = x + y 1 = 0, and the particle starts from rest at (0, A).
In this case the functional is, see equation 5.5 (page 150),
T[y] =
_
v
0
dx

1 +y
2
2E/m2gy
, y(0) = A, (v, y(v)) = 0, (10.2)
where A is known, but v and y(x), 0 x v, need to be determined. The actual
stationary path is clearly a cycloid, but which, of the innitely many cycloids through
these points, needs to be determined by an additional equation for v. In Bernoullis
original formulation the curve (x, y) = 0 was a vertical line through a given point,
that is the value of v is xed, but y(v) is unknown; in this case (x, y) = x v.
Exercise 10.1
Explain why the stationary curves depicted in gure 10.1 are cycloids.
Many xed end point problems can be modied in this manner. For instance a variation
of the catenary problem, described in section 3.5.6 (page 111) is given by an inelastic
10.2. NATURAL BOUNDARY CONDITIONS 259
rope hanging between two curves, dened by
1
(x, y) and
2
(x, y), on which the ends
may slide without hindrance, as shown in gure 10.2: the curve AB is a catenary, but
now we also need to determine the positions of A and B. Another example is a cable
hanging between two points A and B between which a weight of mass M is attached at
a given point C, with the distances AC and CB along the curve known, see gure 10.3.
The segments AC and CB will be catenaries but the gradient at C will be discontinuous.
Both these problems involve constraints, so are dealt with in chapter 12.
A B
(x,y)
2
(x,y)
1
Catenary
y
x
Figure 10.2 Diagram of a rope hanging between
the two curves dened by
k
(x, y) = 0, k = 1, 2,
on which it can slide freely.
A B
y
x
C
Mg
Figure 10.3 Diagram of a rope hanging between
two given points, A and B, and with a weight
rmly attached at a given point of the rope.
10.2 Natural boundary conditions
In this section we develop the theory for a particularly simple type of free boundary,
because this illustrates the method in the clearest manner. The ideas used for the more
general case are similar, but the algebra is more complicated. Here the interval [a, b]
and the value of the path at x = a are given, but the value of y(b) is to be determined.
Thus the functional is
S[y] =
_
b
a
dxF(x, y, y

), y(a) = A, (10.3)
and both y(x) and y(b) need to be chosen to ensure that S[y] is stationary, as shown
schematically in gure 10.4, where the stationary and a varied path are depicted. This
problem diers from the general case, treated later, in that the value of x at the right-
hand end is given. This type of boundary condition is known as a natural condition or
a natural boundary condition because the value of y(b) is not imposed, but is dened
by the variational principle.
The admissible paths all pass through (a, A); the right end is constrained to lie on
the line x = b, but the actual position on this line needs to be determined. If y + h
are admissible paths then h(a) = 0, but h(b) need not be zero.
260 CHAPTER 10. VARIABLE END POINTS
A
a
y(x)
y(x)+h(x)
b
y(b)
y(b) + h(b)
y
x
L R
Figure 10.4 Diagram showing the stationary path, the solid curve, and a varied path,
the dashed curve, for a problem in which the left end is xed, but the other end is free
to move along the line x = b, parallel to the y-axis, so y(b) needs to be determined.
The Gateaux dierential of the functional is given by equation 4.9 (page 129), that is
S[y, h] =
_
b
a
dx
_
h(x)
F
y
+h

(x)
F
y

_
. (10.4)
As before we integrate the second term by parts: using the fact that h(a) = 0 this
gives,
S[y, h] = h(b)
F
y

x=b

_
b
a
dx
_
d
dx
_
F
y

F
y
_
h(x). (10.5)
This is the equivalent of equation 4.10 (page 129) but now the boundary term is not
automatically zero.
For a stationary path S[y, h] = 0 for all h(x) and because the allowed variations
include those functions for which h(b) = 0 the stationary paths must satisfy the Euler-
Lagrange equation
d
dx
_
F
y

F
y
= 0, y(a) = A, (10.6)
with only one boundary condition
1
. The general solution of this equation will contain
one arbitrary constant c, so we write the solution as y(x, c). Because y(x, c) satises
equation 10.6, the Gateaux dierential becomes S = h(b)F
y
(x, y, y

)|
x=b
and because
this must be zero for all h(b), the solution of the Euler-Lagrange equation must satisfy
the boundary condition
F
y
(b, y(b, c), y

(b, c)) = 0, (10.7)


which determines possible values of c, and hence the stationary paths. Equation 10.7
is the natural boundary condition.
As an example consider the brachistochrone problem, studied in section 5.2. It is
convenient to use the dependent variable z(x) = A y(x), dened in equation 5.6
(page 150), and as before, we suppose that the initial velocity is zero, v
0
= 0. Then the
functional may be taken to be
2
T[z] =
_
b
0
dx
_
1 +z
2
z
, z(0) = 0. (10.8)
1
This derivation assumes that there exists at least a one parameter family of variations, h(x), such
that h(a) = h(b) = 0, which is always the case for the problems we consider.
2
For convenience we ignore the factor (2g)
1/2
, which does not aect the Euler-Lagrange equation.
10.2. NATURAL BOUNDARY CONDITIONS 261
The Euler-Lagrange equation is the same as in the previous discussion and, because the
functional does not depend explicitly upon x, it reduces to the rst-order equation 5.7,
having the solution, see equation 5.8 (page 151),
x =
1
2
c
2
(2 sin 2), z = c
2
sin
2
, (10.9)
where we have set d = 0, because z = 0 when x = 0; the boundary condition at x = b
determines the value of c. For future reference we note that
dz
dx
=
dz
d
_
dx
d
=
2 sincos
1 cos 2
=
1
tan
because cos 2 = 1 2 sin
2
.
At x = b this solution must satisfy the boundary condition 10.7 which, for this
problem, becomes
F
z
=
z

_
z(1 +z
2
)
= 0.
But z is bounded so the only solution is z

= 0, and since z

= 1/ tan, this gives


= /2, and means that the cycloid intersects the vertical line through x = b or-
thogonally, see gure 10.5. But at the right end x = b, so 2b = c
2
, which gives the
solution
x =
b

(2 sin 2) , z =
2b

sin
2
, 0

2
. (10.10)
The shape of this curve depends only upon b, rather than both A and b as in the
conventional problem. Here the value of A merely changes the vertical displacement of
the whole curve. It is therefore convenient to set A = 2b/, and then the dependence
upon b becomes a change of scale, seen by setting x = x/b and y = y/b to give
x = 2 sin 2, y = 2 cos
2
, 0

2
. (10.11)
The graph of this scaled solution is shown in gure 10.5.
0 0.5 1 1.5 2 2.5 3
0.5
1
1.5
2
x
y
R
L
Figure 10.5 Graph showing the cycloid dened in equation 10.11,
where x = x/b and y = y/b.
The time of passage is also independent of A, and is given by the simple formula
T(b) =
_
b/g, a result derived in exercise 10.6.
Exercise 10.2
Write down a functional for the distance between the point (0, A) and the line
x = X > 0, parallel to the y-axis. Show that the stationary path is the straight
line through (0, A) and parallel to the x-axis.
262 CHAPTER 10. VARIABLE END POINTS
Exercise 10.3
Find the stationary path of the functional S[y] =
_
/4
0
dx
_
y
2
y
2
_
, y(0) = A > 0,
where the right-hand end of the path lies on the line x = /4.
Exercise 10.4
Show that the functional S[y] =
_
b
a
dxF(x, y, y

), y(b) = B, with the left end


of the path constrained to the line x = a, is stationary on the solution of the
Euler-Lagrange equation,
d
dx
_
F
y

F
y
= 0, F
y
(x, y, y

x=a
= 0, y(b) = B.
Exercise 10.5
Find the stationary path for the functional S[y] =
_
1
0
dx
_
y
2
+y
2
_
, y(1) = B > 0,
with the left end of the path constrained to the y-axis.
Exercise 10.6
Show that the time to traverse the curve 10.10 is T(b) =
_
b/g.
Hint use equation 10.8, but remember the factor (2g)
1/2
.
Exercise 10.7
The navigation problem dened in section 3.5.4 gives rise to the functional
T[y] =
_
b
0
dxF(x, y

), F(x, y

) =
_
c
2
(1 +y
2
) v
2
vy

c
2
v
2
,
for the time to cross a river. The start point is at the origin so y(0) = 0, but the
terminus is, in this version of the problem, undened so the boundary condition
at x = b is a natural boundary condition. Assuming that v(x) 0 show that the
stationary path is given by
y(x) =
1
c
_
x
0
duv(u).
Exercise 10.8
This exercise is important because it uses the method introduced in this section
to extend the range of boundary conditions that can be described by functionals.
(a) Show that the Euler-Lagrange equation for the functional
S[y] =
_
b
a
dx
_
y

(x)
2
y(x)
2
_
, y(a) = A, y(b) = B,
is y

+y = 0, y(a) = A, y(b) = B.
10.2. NATURAL BOUNDARY CONDITIONS 263
(b) Second-order equations of the above form occur frequently, but the boundary
conditions are sometimes dierent, involving linear combinations of y and y

. Thus
a typical equation is
d
2
y
dx
2
+y = 0, gay(a) +y

(a) = 0, g
b
y(b) +y

(b) = 0. (10.12)
where ga and g
b
are constants.
Show, from rst principles, that the functional
S[y] = g
b
y(b)
2
gay(a)
2
+
_
b
a
dx
_
y

(x)
2
y(x)
2
_
is stationary on the path that satises equation 10.12, for all ga and g
b
.
10.2.1 Natural boundary conditions for the loaded beam
In this section we discuss functionals such as those for the energy of the loaded beam,
equation 10.1, which contain the second derivative, y

, so the associated Euler-Lagrange


equation is fourth-order, see equation 10.16 below. We start with the general functional
S[y] =
_
b
a
dxF(x, y, y

, y

), y(a) = A, y

(a) = A

,
with natural boundary conditions at x = b. The derivation provided is brief because it
is similar to previous analysis. The Gateaux dierential is
S[y, h] =
_
b
a
dx (hF
y
+h

F
y
+h

F
y
) . (10.13)
Integration by parts gives
_
b
a
dxh

F
y
=
_
hF
y

_
b
a

_
b
a
dxh
d
dx
_
F
y

_
,
_
b
a
dxh

F
y
=
_
h

F
y
h
d
dx
_
F
y

__
b
a
+
_
b
a
dxh
d
2
dx
2
_
F
y

_
,
so the Gateaux dierential can be cast into the form
S[y, h] =
_
h
_
F
y


d
dx
_
F
y

__
+h

F
y

_
b
a
+
_
b
a
dx
_
d
2
dx
2
_
F
y

d
dx
_
F
y

_
+
F
y
_
h. (10.14)
In this example h(a) = h

(a) = 0 but there are no conditions on h(b). Hence S


reduces to
S[y, h] = h(b)
_
F
y


d
dx
_
F
y

__

b
+h

(b)
F
y

b
+
_
b
a
dx
_
d
2
dx
2
_
F
y

d
dx
_
F
y

_
+
F
y
_
h. (10.15)
264 CHAPTER 10. VARIABLE END POINTS
On a stationary path this must be zero for all allowed h(x). A subset of varied paths
has h(b) = h

(b) = 0 and hence the stationary path must satisfy the Euler-Lagrange
equation
d
2
dx
2
_
F
y

d
dx
_
F
y

_
+
F
y
= 0, y(a) = A, y

(a) = A

. (10.16)
The solution of this equation contains two arbitrary constants. Now consider those
varied paths for which h(b) = 0 and h

(b) = 0, and those for which h(b) = 0 and


h

(b) = 0, to see that the solutions of this Euler-Lagrange equation must also satisfy
the two extra boundary conditions,
F
y
= 0 and F
y

d
dx
F
y
= 0 at x = b, (10.17)
which determine the two constants in the solution of equation 10.16.
Exercise 10.9
Derive equation 10.13.
Exercise 10.10
For the functional dened in equation 10.1 (page 257) with =constant and the
boundary conditions y(0) = y

(0) = 0, use equations 10.16 and 10.17 to derive the


associated Euler-Lagrange equation and show that its solution is
y(x) =
g
24
x
2
_
x
2
4Lx + 6L
2
_
.
Exercise 10.11
(a) Show that the stationary paths of the functional
S[y] =
_
b
a
dxF(x, y, y

, y

), y(a) = A, y(b) = B,
satisfy the Euler-Lagrange equation
d
2
dx
2
_
F
y

d
dx
_
F
y

_
+
F
y
= 0, y(a) = A, y(b) = B, F
y

a
= F
y

b
= 0.
(b) Apply the result found in part (a) to the functional dened in equation 10.1
(page 257), with =constant and the boundary conditions y(0) = y(L) = 0, to
derive the associated Euler-Lagrange equation and show that its solution is
y(x) =
g
24
x(L x)
_
L
2
+xL x
2
_
.
10.3. VARIABLE END POINTS 265
10.3 Variable end points
The theory for variable end points is similar to that described above, but is slightly
more complicated because the x-coordinate of the free end must also be determined.
Here we consider the case where the left end of the stationary path is known, and has
coordinates (a, A), but the right end is free to lie on a given curve, dened by the
equation (x, y) = 0, as shown schematically in gure 10.6: we shall assume that
x
and
y
are not simultaneously zero in the region of interest. Note that if = x b, the
equation = 0 denes the line x = b parallel to the y-axis, which is the example dealt
with in the previous section.
A
a v v+
y(x)
y(x) + h(x)
(x,y)=0
y
x
L
R
Figure 10.6 Diagram showing the stationary path, the solid line, and a varied
path, the dashed curve, for a problem in which the left-hand end is xed, but
the other end is free to move along the line dened by (x, y) = 0.
The functional is
S[y] =
_
v
a
dxF(x, y, y

), y(a) = A, (10.18)
where the path y(x) and v need to be chosen to make the functional stationary.
Let y(x)+h(x) be an admissible varied path, so h(a) = 0. If x = v is the right-hand
terminal point of y(x), the terminal point of the varied path is at x = v +, for some
, so the x and y coordinates of this point are,
x = v + and y = y(v +) +h(v +)
= y(v) +
_
y

(v) +h(v)
_
+O(
2
).
This point also lies on the constraining curve so, to rst-order in ,

_
v +, y(v) +
_
y

(v) +h(v)

_
= 0.
Expanding this to rst-order in , and remembering that (v, y(v)) = 0, gives
(
x
+y

(v)
y
) +h(v)
y
= 0, (10.19)
which provides a relation between and h(v) that is needed later.
The Gateaux dierential of the functional is computed using equation 4.5 (page 125),
in the normal manner, except that the upper limit of the integral now depends upon .
Thus on the varied path
S[y +h] =
_
z
a
dxF(x, y +h, y

+h

), z = v +, (10.20)
266 CHAPTER 10. VARIABLE END POINTS
so the derivative with respect to is given by equation 1.52, (page 43), with b = z()
so dz/d = ,
dS
d
= F
_
z, y(z) +h(z), y

(z) +h

(z)
_
+
_
z
a
dx
_
h
F
y
+h

F
y

_
.
On putting = 0, so z = v, we obtain the Gateaux dierential
S[y, h] = F(v, y(v), y

(v)) +
_
v
a
dx (hF
y
+h

F
y
) . (10.21)
Now use integration by parts and the fact that h(a) = 0 to give
_
v
a
dxh

F
y
=
_
hF
y

_
v
a

_
v
a
dxh
d
dx
(F
y
)
= hF
y

x=v

_
v
a
dxh
d
dx
(F
y
) .
Hence the Gateaux dierential, equation 10.21, becomes
S[y, h] =
_
F +hF
y

_
v
a
dx
_
d
dx
_
F
y

F
y
_
h. (10.22)
Finally we use equation 10.19 to express h(v) in terms of to arrive at the relation
S[y, h] =

y
_
F
y

x
+ (y

F
y
F)
y
_

_
v
a
dx
_
d
dx
_
F
y

F
y
_
h. (10.23)
On a stationary path S[y, h] = 0 for all allowed h. A subset of these variations will
have = 0, consequently y(x) must satisfy the Euler-Lagrange equation,
d
dx
_
F
y

F
y
= 0, y(a) = A. (10.24)
On a path satisfying this equation the Gateaux dierential reduces to
S[y, h] =

y
_

x
F
y
+ (y

F
y
F)
y
_

v
(10.25)
and this must also be zero for all . Hence, the equation

x
F
y
+
y
(y

F
y
F) = 0, x = v, (10.26)
must be satised. This equation is the required boundary condition for the right-hand
end of the path and is named a transversality condition.
In order to see how this works, consider the solution of equation 10.24, y(x, c),
which depends upon a single constant c, because there is only one boundary condition.
By substituting this into equation 10.26 we obtain an equation relating v and c. But
the right-hand end of the path satises the condition (v, y(v, c)) = 0, and this gives
another relation between v and c: if these two equations can be solved for one or more
real pairs of v and c, stationary paths are obtained.
10.3. VARIABLE END POINTS 267
The derivation of equation 10.26 implicitly assumed that
y
= 0, see equation 10.23.
Suppose that on the stationary path
y
= 0, which means that at this point the curve
(x, y) = 0 is parallel to the y-axis, then from equation 10.19 we see that = 0, since we
assumed that
x
and
y
are not simultaneously zero, the boundary term of 10.22 reduces
to hF
y
= 0, which means that F
y
= 0. Equation 10.26 also gives F
y
= 0 if
y
= 0 so it
is also valid in this exceptional case. Note that in this limit the transversality condition
reduces to the natural boundary condition of equation 10.7, which is also retrieved by
setting = x b in equation 10.26.
The transversality condition can be written in an alternative form by noting that
if the equation (x, y) = 0 denes a curve y = g
2
(x) then g

2
(x) =
x
/
y
, and equa-
tion 10.26 becomes
F + (g

2
y

)F
y
= 0, x = v. (10.27)
This form of the transversality condition is not valid when
y
= 0, that is where |g

2
(x)|
is innite.
If the left end of the path is also constrained to a prescribed curve, (x, y) = 0, then
a similar equation can be derived. In summary we have the following result.
Theorem 10.1
For the functional S[y] =
_
v
u
dxF(x, y, y

) and the smooth curves C

and C

dened by
the equations (x, y) = 0 and (x, y) = 0, the continuously dierentiable path joining
C

and C

, at x = u and x = v respectively, that makes S[y] stationary, satises the


Euler-Lagrange equation
d
dx
_
F
y

F
y
= 0 (10.28)
and the boundary conditions

x
F
y
+
y
(y

F
y
F)

x=u
= 0 and
x
F
y
+
y
(y

F
y
F)

x=v
= 0. (10.29)
Either of these boundary conditions may be replaced by conventional boundary condi-
tions.
As an example consider the functional
S[y] =
_
v
0
dxf(y)
_
1 +y
2
, y(0) = a, (10.30)
with the right end of the path terminating on the curve C dened by (x, y) = 0. For
this functional a rst-integral exists and is given by
F y

F
y
=
f(y)
_
1 +y
2
= c = constant.
The transversality condition 10.26 then gives

x
y

f(y)
_
1 +y
2
c
y
= 0 that is
x
y

(v) =
y
.
But the gradient of C is
x
/
y
and hence at the terminal point the stationary path is
perpendicular to C.
268 CHAPTER 10. VARIABLE END POINTS
Exercise 10.12
Find the stationary path of the functional S[y] =
_
v
0
dx
_
1 +y
2
y
, y(0) = 0, for
a path terminating on the line y = x a, a > 0.
Hint rst show that the solutions of the Euler-Lagrange equation are circles
through the origin and with centres on the x-axis.
Exercise 10.13
Consider the brachistochrone in which the left end is xed at (0, A) and the right
end is constrained to the curve x/a + y/b = 1, a, b > 0. Initially the particle is
stationary at (0, A).
Show that the equations of the stationary path are
x =
1
2
c
2
(2 sin 2) , y = Ac
2
sin
2
, 0
b
= tan
1
(b/a),
where c is given by the equation c
2

b
= a (1 A/b).
Graphs of this solution, for various values of A and a = b = 1, are shown in
gure 10.1 (page 258).
Exercise 10.14
Consider the ellipse and the straight line dened, respectively, by the equations
x
2
a
2
+
y
2
b
2
= 1 and
x
A
+
y
B
= 1, x > 0, y > 0,
in the rst quadrant, where a, b, A and B are positive constants.
(a) Show that these curves do not intersect if AB > , where
2
= A
2
b
2
+B
2
a
2
.
(b) Construct a functional for the distance between two points (u, v) on the ellipse,
and (, ) on the straight line, and show that the solution of the associated Euler-
Lagrange equation is the straight line y = mx + c. Show also that the values of
the six constants m and c , (u, v) and (, ) making this distance stationary satisfy
the equations
mu
a
2
=
v
b
2
, m =
A
B
,
u
2
a
2
+
v
2
b
2
= 1,

A
+

B
= 1,
together with v = mu +c and = m +c.
(c) Solve these equations to show that when the curves do not intersect the sta-
tionary distance is d =
AB

A
2
+B
2
.
10.4 Parametric functionals
It is sometimes useful to formulate a functional in terms of curves dened parametrically
using the theory described in chapter 9. For variable end point problems the derivation
of the appropriate formulae follows in a similar manner to that described above, but
the homogeneity of the integrand simplies the nal result.
10.4. PARAMETRIC FUNCTIONALS 269
Consider the parametric functional
S[x, y] =
_
1
0
dt (x, y, x, y), x(0) = a, y(0) = A, (10.31)
where the end of the path at t = 0 is xed and the end at t = 1 lies on a smooth
curve, C, dened parametrically by x = (), y = (), where both () and ()
are continuously dierentiable and such that

() and

() are not simultaneously


zero for any in the region of interest. Notice that the parameter t varies in the xed
interval [0, 1] because the integrand is homogeneous of degree one in x and y: this is
dierent from the functional 10.18 in which it was necessary to allow the upper limit
to vary. Here 0 t 1 on all paths.
By considering the varied path (x +h
1
, y +h
2
) we obtain the Gateaux dierential
in the usual manner,
S[x, y, h
1
, h
2
] =
_
1
0
dt
_
h
1

x
+

h
1

x
+h
2

y
+

h
2

y
_
. (10.32)
The left end of the path is xed at t = 0, consequently h
1
(0) = h
2
(0) = 0, and
integration by parts gives
S =
_
h
1

x
+h
2

y
_

t=1

_
1
0
dt
_
h
1
_
d
dt
_

x
_


x
_
+h
2
_
d
dt
_

y
_


y
__
.
(10.33)
If S[x, y] is stationary it is necessary that S = 0 for all allowed variations h
1
(t) and
h
2
(t). By restricting the varied paths to those on which h
1
(1) = h
2
(1) = 0 we see that
the stationary path must satisfy the Euler-Lagrange equations
d
dt
_

x
_


x
= 0,
d
dt
_

y
_


y
= 0, x(0) = a, y(0) = A. (10.34)
The general solutions of these equations satisfying the conditions at t = 0 will contain
two constants, which we denote by c and d. On these paths the Gateaux dierential
becomes
S =
_
h
1
(t)
x
+h
2
(t)
y
_

t=1
. (10.35)
Because all admissible paths terminate on C, as shown in gure 10.7, the values of h
1
(1)
and h
2
(1) are related.
Varied path
Stationary path
=
1
=
1
+
y
x
C
a
A
Figure 10.7 Diagram showing the stationary path, the terminating
curve, C, and a varied path. At the intersection of C and the sta-
tionary path =
1
; and the varied path intersects C at =
1
+.
270 CHAPTER 10. VARIABLE END POINTS
Suppose that the stationary path terminates at ((
1
), (
1
)) and a varied path at a
dierent value of ,
1
+. Hence
x(1) = (
1
) and x(1) +h
1
(1) = (
1
+).
Expanding to rst-order in gives h
1
(1) =

(
1
) and, similarly, h
2
(1) =

(
1
).
Thus equation 10.35 becomes
S =
_

(
1
)
x
+

(
1
)
y
_

t=1
. (10.36)
But S must be zero for all = 0 and hence the required boundary condition is
_

(
1
)
x
+

(
1
)
y
_

t=1
= 0. (10.37)
This is the transversality condition in parametric form and is the equivalent of equa-
tion 10.26 (page 266).
There are now three constants that need to be determined: these are (c, d) from
the solution of equations 10.34 and the value of the parameter
1
, where the stationary
path intersects C. Equation 10.37 gives one relation between these three parameters:
the other two are x(1, c, d) = (
1
) and y(1, c, d) = (
1
). In principle these equations
can be solved to give the required stationary path.
In order to see how this theory works consider the problem solved in exercise 9.1(b)
(page 241), that is the stationary values of the distance between the origin and the
parabola now dened parametrically by (1
2
, a).
The parametric form of the functional is
S[x, y] =
_
1
0
dt
_
x
2
+ y
2
, x(0) = y(0) = 0, (10.38)
and the boundary curve is () = 1
2
, () = a. Hence the boundary condi-
tion 10.37 becomes
a y = 2
1
x at t = 1. (10.39)
The Euler-Lagrange equations and the solutions that satisfy the boundary conditions
at the origin are
d
dt
_
x
_
x
2
+ y
2
_
= 0,
d
dt
_
y
_
x
2
+ y
2
_
= 0 = x = ct, y = dt, (10.40)
where c and d are constants to be determined: these solutions are the parametric
equations of a straight line through the origin, as expected. Hence equation 10.39
becomes ad = 2
1
c. But at t = 1 the solution 10.40 intersects the parabola, hence
c = 1
2
1
and d = a
1
. Substituting these into the equation ad = 2
1
c gives
a
2

1
= 2
1
(1
2
1
) that is
1
= 0 or
2
1
= 1
a
2
2
.
The rst of these solutions,
1
= 0, gives x = 1 and y = 0. The second equation,

2
1
= 1 a
2
/2, has real solutions if a <

2, which are the solutions found previously in


exercise 9.1(b).
10.5. WEIERSTRASS-ERDMANN CONDITIONS 271
Exercise 10.15
For the parametrically dened curve x = (), y = (), use the method described
above to show that the distance along the straight line y = mx from the origin to
a point on this curve is stationary if m =

()/

(). If the curve is represented


by the function y(x), show that this becomes my

(x) = 1 and give a geometric


interpretation of this formula.
Exercise 10.16
Express the functional dened in equation 10.38 in non-parametric form and nd
its stationary paths.
10.5 Broken Extremals: the Weierstrass-Erdmann
conditions
The theory so far has dealt almost entirely with continuously dierentiable solutions
of the Euler-Lagrange equations. In the construction of the minimum area of a surface
of revolution, section 5.3, it was seen that the Goldschmidt function, equation 5.20
(page 160), was the only solution if the end radii were too small: this function is
continuous, but at two points its derivatives do not exist.
Solutions of variational problems that are continuous, but have discontinuous deriva-
tives at a nite number of points are named broken extremals (though they are often
merely stationary paths rather than extremals). The points of discontinuity are named
corners. Such solutions are dealt with by dividing the path into contiguous segments
in each of which the path is continuously dierentiable and satises the Euler-Lagrange
equation; supplementing these equations are the Weierstrass-Erdmann (corner) condi-
tions which allow the paths in each segments to be joined to form a continuous path.
It transpires that the variational principle and the requirement of continuity provides
just sucient extra conditions for particular solutions to be formed.
It is quite easy to nd real problems that require broken extremals. One example is
illustrated in gure 10.3 (page 259), and we use a variant of this to introduce the basic
ideas before developing the general theory.
10.5.1 A taut wire
Consider a taut, elastic wire under tension T, xed at both ends one being at the origin,
the other at x = L, on the horizontal x-axis. We suppose the wire suciently light that
it lies along Ox. If a weight of mass M is hung from the wire at a given point x =
it will deform as shown in gure 10.8, and we assume that the deection is suciently
small that the change in tension is negligible.
If the y-axis is vertically upwards the energy due to the tension in the wire can
be shown to be
T
2
_
L
0
dxy
2
provided the displacement, y(x), is suciently small for
Hookes law to be valid. The potential energy of the mass is Mgy(), g being the
acceleration due to gravity, and, for the sake of simplicity, we assume that the wire
is suciently light that its potential energy is negligible. The functional for the total
272 CHAPTER 10. VARIABLE END POINTS
energy of the system is
E[y] = Mgy() +
1
2
T
_
L
0
dxy
2
, y(0) = y(L) = 0, 0 < < L. (10.41)
The conguration adopted by the wire is the continuous stationary path of this func-
tional.
x
y
L
y
1
y
2

Mg
Figure 10.8 Diagram of a light, taut wire of length L sup-
porting a weight at x = .
This energy functional is dierent from others considered because the point x = is
special. We deal with this by splitting the interval [0, L] into two subintervals, [0, ]
and [, L] and writing the whole path, y(x) in terms of two functions,
y(x) =
_
y
1
(x), 0 x ,
y
2
(x), x L,
(10.42)
and since y(x) is continuous at x = , we have y
1
() = y
2
(). The derivatives of y(x)
are not dened at x = , but this does not hinder the analysis because we require only
the left and right-hand derivatives. These are dened, respectively, by
lim
0+
y() y( )

= lim
0+
y
1
() y
1
( )

, (left derivative),
lim
0+
y( +) y()

= lim
0+
y
2
( +) y
2
()

, (right derivative).
In the following the derivatives at x = are to be understood in this sense.
Now evaluate the functional on the varied path, y + h, also continuous at x = ,
and where h(0) = h(L) = 0,
E[y +h] = Mg
_
y() +h()
_
+
1
2
T
_

0
dx(y

1
+h

)
2
+
1
2
T
_
L

dx(y

2
+h

)
2
,
so that the Gateaux dierential is
E[y, h] = Mgh() +T
_

0
dxy

1
h

+T
_
L

dxy

2
h

.
Integration by parts gives, on remembering that h(0) = h(L) = 0,
E[y, h] =
_
Mg +T
_
y

1
() y

2
()
__
h() T
_

0
dxy

1
h T
_
L

dxy

2
h. (10.43)
10.5. WEIERSTRASS-ERDMANN CONDITIONS 273
Now proceed in the usual manner. By choosing those h(x) for which h(x) = 0 for
x L and those for which h(x) = 0 for 0 x , we obtain the Euler-Lagrange
equations for y
1
(x) and y
2
(x),
d
2
y
1
dx
2
= 0, 0 x < , y
1
(0) = 0,
d
2
y
2
dx
2
= 0, < x L, y
2
(L) = 0.
(10.44)
On the path satisfying these equations the Gateaux dierential becomes
E[y, h] =
_
Mg +T
_
y

1
() y

2
()
__
h()
and this can be zero for all h(x) only if
y

2
() y

1
() =
Mg
T
. (10.45)
Physically, this equation represents the resolution of forces acting on the weight in the
vertical direction. Together with the continuity of y(x) this condition provides sucient
information to nd a stationary path, as we now show by solving the equations.
The solutions of the Euler-Lagrange equations 10.44 that satisfy the boundary con-
ditions at x = 0 and x = L are
y
1
(x) = x and y
2
(x) = (L x),
for some constants and . Since y
1
() = y
2
() we have ( + ) = L and equa-
tion 10.45 gives + = Mg/T. Hence the stationary path comprises the two straight
line segments,
y(x) =
_

Mg
TL
(L )x, 0 x ,

Mg
TL
(L x), x L.
(10.46)
Exercise 10.17
Find the continuous stationary paths of the functional
S[y] = Cy()
2
+
1
2
_
L
0
dxy
2
, y(0) = A, 0 < < L,
with natural boundary conditions at x = L. Explain why there cannot be a
unique, nontrivial solution if A = 0.
10.5.2 The Weierstrass-Erdmann conditions
Now consider the problem of nding stationary paths of the functional
S[y] =
_
b
a
dxF(x, y, y

), y(a) = A, y(b) = B, (10.47)


274 CHAPTER 10. VARIABLE END POINTS
that are continuously dierentiable for a x b, except possibly at a single, unknown
point c (a < c < b), where the path is continuous but its derivatives are not.
The main dierence between this and the previous special case is that the value of
c is not known in advance. However, we proceed in the same manner by splitting the
functional into two components,
S[y] =
_
c
a
dxF(x, y
1
, y

1
) +
_
b
c
dxF(x, y
2
, y

2
), (10.48)
and compute its value on the varied paths y
1
+h
1
and y
2
+h
2
, and also allowing the
point x = c to move to c

= c +, as shown diagrammatically in gure 10.9.


A
B
x
c a b
y
y
1
y
2
y
1
+ h
1

y
2
+ h
2
c
Figure 10.9 Diagram showing the stationary and a varied
path: here c

= c + .
The value of the functional on the varied path is
S[y +h] =
_
c+
a
dxF(x, y
1
+h
1
, y

1
+h

1
) +
_
b
c+
dxF(x, y
2
+h
2
, y

2
+h

2
).
Each integral is similar to that dened in equation 10.20 (page 265) and using the same
analysis that leads to equation 10.25, see exercise 10.18, we obtain,
S[y, h] =
_
F +h
1
F
y

(x,y)=(c,y1)

_
F +h
2
F
y

(x,y)=(c,y2)
, (10.49)
with y
1
(x) and y
2
(x) satisfying the Euler-Lagrange equations
d
dx
_
F
y

1
_

F
y
1
= 0, y
1
(a) = A, a x c, (10.50)
d
dx
_
F
y

2
_

F
y
2
= 0, y
2
(b) = B, c x b. (10.51)
On the stationary path the coordinates of the corner are (c, y(c)) and on a varied path
these become (c +, y(c) +), with and independent variables. In terms of y
1
and
y
2
we have
y(c) + = y
k
(c +) +h
k
(c +), k = 1 and 2,
= y
k
(c) +
_
y

k
(c) +h
k
(c)
_
+O(
2
).
10.5. WEIERSTRASS-ERDMANN CONDITIONS 275
Since y(x) is continuous, y
1
(c) = y
2
(c) = y(c), these equations allow h
1
(c) and h
2
(c)
to be expressed in terms of the independent variables and . Substituting these
expressions into equation 10.49 for S we obtain
S[y, h] =
_
(F y

F
y
) +F
y

_

(x,y)=(c,y1)

_
(F y

F
y
) +F
y

_

(x,y)=(c,y2)
.
(10.52)
Note that each term of the right-hand side of this equation is similar to the left-hand
side of equation 10.26 (page 266) with =
y
and =
x
: the important dierence is
that and are independent variables. Because of this S = 0 only if the coecients
of and are both zero, which gives the two relations
lim
xc
_
F y

F
y

_
= lim
xc+
_
F y

F
y

_
, (10.53)
lim
xc
F
y
= lim
xc+
F
y
. (10.54)
These relations between the values of y
1
and y
2
, and their rst derivative at x = c are
known as the Weierstrass-Erdmann (corner) conditions and they hold at every corner of
a stationary path. With one corner the Euler-Lagrange equations 10.50 and 10.51 may
be solved to give functions y
1
(x, ) and y
2
(x, ), each involving one arbitrary constant.
Substituting these into the corner conditions gives two equations relating , and c:
a third equation is given by the continuity equation y
1
(c, ) = y
2
(c, ). These three
equations allow, in principle, values for , and c to be found.
Exercise 10.18
Derive equations 10.4910.52.
For an example consider the functional
S[y] =
_
2
0
dxy
2
(1 y

)
2
, y(0) = 0, y(2) = 1. (10.55)
Because the integrand depends only upon y

, the solutions of the Euler-Lagrange equa-


tion are the straight lines y = mx + , for some constants m and . Therefore the
smooth solution that ts the boundary conditions is y = x/2 and on this path S = 1/8:
moreover, by considering the second-order terms in the expansion of S[y + h] we see
that this is path is a local maximum of S.
However, if y

= 0 or y

= 1 the integrand is zero, so we can imagine a broken


path comprising segments of straight lines at 45

and parallel to the x-axis on which


S[y] = 0; because the integrand is non-negative such a path gives a global minimum.
We now show that the corner conditions give such solutions.
Suppose that there is one corner at x = c. The two solutions that t the boundary
conditions either side of c are
y =
_
y
1
= m
1
x, 0 x c,
y
2
= m
2
(x 2) + 1, c x 2.
Since
F
y
= 2y

(1 y

)(1 2y

) and F y

F
y
= y
2
(1 y

)(1 3y

)
276 CHAPTER 10. VARIABLE END POINTS
the Weierstrass-Erdmann conditions become
m
2
1
(1 m
1
)(1 3m
1
) = m
2
2
(1 m
2
)(1 3m
2
) and
m
1
(1 m
1
)(1 2m
1
) = m
2
(1 m
2
)(1 2m
2
).
(10.56)
The only non-trivial solutions of these equations and the continuity condition, m
1
c =
m
2
(c2) +1 are (m
1
, m
2
, c) = (1, 0, 1) and (0, 1, 1), which give the two solutions shown
by the solid and dashed lines, respectively, in gure 10.10. On both lines the functional
has its smallest possible value of zero.
y
(1,1) (2,1)
x
Figure 10.10 Graph of some broken extremals for the functional 10.55. On
the solid line (m
1
, m
2
) = (1, 0): on the dashed line (m
1
, m
2
) = (0, 1) and in
both cases c = 1. The dotted line is a broken extremal with several corners.
In this example there are solutions with any number of corners comprising alternate
lines with unit gradient and horizontal lines; an example is depicted by the dotted line
in gure 10.10.
Exercise 10.19
(a) Show that the stationary path of the functional 10.55 without corners is y =
x/2 and that on this path S[y] = 1/8.
(b) If y = x/2 show that
S[y +h] = S[y]
1
2

2
_
2
0
dxh

(x)
2
and deduce that this path gives a local maximim of S[y].
Exercise 10.20
Show that the only solutions of equations 10.56 are those given in the text.
Exercise 10.21
Find the stationary paths of the functional 10.55 with two corners.
Exercise 10.22
Find the stationary paths of the functional S[y] =
_
4
0
dx
_
y
2
1
_
2
, y(0) = 0,
y(4) = 2, having just one corner.
10.6. NEWTONS MINIMUM RESISTANCE PROBLEM 277
10.5.3 The parametric form of the corner conditions
The Weierstrass-Erdmann corner conditions for the parametric functional
S[x, y] =
_
1
0
dt (x, y, x, y), x(0) = a, y(0) = A, x(1) = b, y(1) = B, (10.57)
can be derived directly from equations 10.53 and 10.54, by setting (x, y, x, y) =
xF(x, y, y/ x) and recalling the results of exercise 9.10 (page 247), to give
lim
tc

x
= lim
tc+

x
and lim
tc

y
= lim
tc+

y
, (10.58)
where the corner is at t = c, with 0 < c < 1. At such a corner either or both of x(t)
and y(t) are discontinuous.
10.6 Newtons minimum resistance problem
We now consider the solution of Newtons minimum resistance problem described in sec-
tion 3.5.3 where the relevant functionals are derived, equations 3.21 and 3.22 (page 110).
Although the solution of this problem is of little practical value, for the reasons discussed
in section 3.5.3, its derivation is worth pursuing because of the techniques needed. The
detailed analysis in this section is not, however, assessed. To recap we require the
stationary paths of the functionals
S
1
[y] =
_
b
0
dx
x
1 +y
2
, y(0) = A > 0, y(b) = 0, (10.59)
and
S
2
[y] =
1
2
a
2
+
_
b
a
dx
x
1 +y
2
, y(a) = A > 0, y(b) = 0, 0 < a < b. (10.60)
For S
2
[y] both the stationary path and the value of a need to be determined. Physical
considerations suggest that y

(x) is piecewise continuous; further in the derivation of


these functionals we made the implicit assumption that y

(x) 0 and without this


constraint S
1
[y] can be made arbitrarily small, as shown in exercise 10.28.
Here we show that S
1
[y] has no stationary paths that can satisfy the boundary
conditions; using this analysis we derive a stationary path for S
2
[y].
The functional S
1
[y] is of the type considered in exercise 8.13 (page 222) because
the integrand, F = x/(1 + y
2
), does not depend explicitly upon y. The conclusion of
this exercise shows that if y(x) is a stationary path and if
F
y

y
=
2x(3y
2
1)
(1 +y
2
)
3
> 0, (10.61)
it gives a minimum value of S
1
.
The Euler-Lagrange equation associated with S
1
can be integrated directly, and
assuming that y(x) decreases monotonically from y(0) = A > 0 to y(b) = 0, we obtain
xy

(1 +y
2
)
2
= c, with c > 0. (10.62)
278 CHAPTER 10. VARIABLE END POINTS
This equation can be solved by dening a new positive variable
3
,
p(x) = y

(x) giving the equation xp = c(1 +p


2
)
2
. (10.63)
Integrating the rst equation gives
y(x) = A
_
x
0
dxp(x)
= A
_
p
p0
dp p
dx
dp
= A
_
p
p0
dp
_
d
dp
(xp) x
_
,
where p
0
= p(0) is an unknown constant. The last expression gives
y(p) = A
_
xp
_
p
p0
+c
_
p
p0
dp
_
1
p
+ 2p +p
3
_
.
This equation can be integrated directly to obtain (x, y) in terms of p,
x(p) =
c(1 +p
2
)
2
p
and y(p) = B +c
_
ln p p
2

3
4
p
4
_
(10.64)
where B is a constant which has absorbed all other constants: in these equations p
may be regarded as a parameter, so we have found a solution in parametric form. The
required solution is obtained by nding the appropriate values of B, c and a range of p
that satisfy (x, y) = (0, A) and (b, 0): it transpires that this is impossible, as will now
be demonstrated.
Dene the related functions
(p) =
x
c
=
(1 +p
2
)
2
p
and (p) =
y B
c
= ln p p
2

3
4
p
4
, p > 0 (10.65)
which contain no arbitrary constants. Since, by denition, p = y

(x) it follows from


the chain rule that p

(p) =

(p) and hence for p = 0 the stationary points of (p)


and (p) coincide. The graphs of (p) and (p) are shown in gure 10.11.
0 0.2 0.4 0.6 0.8 1 1.2 1.4
2
3
4
5
6
7
0 0.2 0.4 0.6 0.8 1 1.2 1.4
-6
-5
-4
-3
-2
-1
0
p
p


Figure 10.11 Graphs of (p) and (p). Each function is stationary at p = 1/

3.
Since

(p) = (p
2
+1)(3p
2
1)/p
2
, (p) has a single minimum at p = 1/

3 and, because
p

(p) =

(p), (p) has a single maximum at p = 1/

3. The coordinates of the


3
See for instance exercise 2.7, page 59.
10.6. NEWTONS MINIMUM RESISTANCE PROBLEM 279
stationary points are (, ) = (16

3/9,
1
2
ln 3
5
12
) = (3.08, 0.97). The minimum
value of x(p) is 16

3c/9, with c > 0, hence there is no nontrivial stationary path that


can pass through x = 0, the lower boundary. However, we pursue the investigation of
this general solution because it is needed for the stationary path of S
2
.
The graphs of (p) and (p) show that there are two branches of the function (),
the solution of equation 10.62, one dened by p in the interval [1/

3, ) and the other


on the interval (0, 1/

3]. Consider each case


p

3 > 1: for p increasing from 1/

3, (p) increases monotonically from its


minimum value and (p) decreases monotonically from its maximum value. Hence
the function () remains in the fourth quadrant starting at (3.08, 0.97), where
p = 1/

3, and behaving as 3
4/3
/4 for large p. This is the curve MR in
gure 10.12. At p = 1/

3,

() = 1/

3. Since p > 1/

3, this curve is a local


minimum of S
1
[y], see equation 10.61.
p

3 < 1: for p decreasing from 1/

3, (p) increases monotonically from its


minimum value and (p) decreases monotonically from its maximum value: again
() remains in the fourth quadrant, and for small p, 1/p and () ln .
On this curve () decreases more slowly than on the previous curve. At p = 1/

3,

() = 1/

3. This is the curve MS in gure 10.12.


The equations 10.64 dene the parametric equations of a curve in the (, )-plane with
parameter p; this curve is shown in gure 10.12. In principle can be expressed in
terms of , but no simple formula for this relation exists. The two branches MR and
MS of (), shown in gure 10.12, start at (3.08, 0.97) with the same gradient.
2 3 4 5 6 7
-6
-5
-4
-3
-2
-1
0


M
S
R
p3 <1
p3 >1
Figure 10.12 Graph of the two branches of (), the so-
lution of equation 10.62.
Solid surrounding a hollow right circular cylinder
The above analysis shows that there is no smooth solution stationary path for S
1
[y].
However, suppose that the solid of revolution surrounds a hollow cylinder with axis
along Oy and with a given radius, a and height A, and through which the uid ows
unhindered, as shown in gure 10.13.
280 CHAPTER 10. VARIABLE END POINTS
A
a b
y
x
Figure 10.13 Diagram of a solid surrounding a hollow cylinder.
The functional for this problem is a variation of that dened in equation 10.59,
S
3
[y] =
_
b
a
dx
x
1 +y
2
, y(a) = A, y(b) = 0. (10.66)
The solution of the associated Euler-Lagrange equation that makes S
3
[y] a minimum is
given by equations 10.64 with 1/

3 < p
1
p p
2
, with p
1
corresponding to the point
(a, A) and p
2
to (b, 0). The four constants, (p
1
, p
2
, c, B) are given in terms of (a, b, A)
by the four boundary conditions,
A = B +c(p
1
), (from y(p
1
) = A), (10.67)
0 = B +c(p
2
), (from y(p
2
) = 0), (10.68)
a = c(p
1
), b = c(p
2
), (from x(p
1
) = a and x(p
2
) = b > a). (10.69)
For given values of b/a and A/a we now show that these equations have a unique
solution, provided A/a is larger than some minimum value that depends upon b/a, and
tends to zero as b a. The equations can be solved numerically for the constants,
(p
1
, p
2
, c, B). But this task is made easier by rst expressing p
2
and c in terms of p
1
.
This is achieved by dividing the two equations 10.69 to eliminate c and writing the
resultant expression in the form
(p
2
) =
b
a
(p
1
), p
1
>
1

3
. (10.70)
This equation can be interpreted geometrically as illustrated in gure 10.14 which shows
the graphs of (p) = (1 +p
2
)
2
/p and b(p)/a > (p), the dashed line. For a given value
of p
1
> 1/

3, we can see by following the arrows on the dotted lines that a unique
value of p
2
is obtained. For large p we have (p) p
3
giving the approximate solution
p
2
(b/a)
1/3
p
1
.
10.6. NEWTONS MINIMUM RESISTANCE PROBLEM 281
(p)
p p
1
p
2
b(p)/a
1/3
Figure 10.14 Graphs of the functions (p), the solid line, and b(p)/a, the
dashed line and the geometric interpretation of the solution of equation 10.70.
The equation 10.70 for p
2
is simplied by dening a new variable
2
, p
2
= 1/ tan
2
=
tan(/2
2
), with 0 <
2
< /3, so equation 10.70 becomes
sin
3

2
cos
2
= d
3
, d
3
=
a
b
p
1
(1 +p
2
1
)
2
<
3

3
16
a
b
, d < 0.69. (10.71)
The left-hand side is O(
3
2
) as
2
0 and in this limit the solution is approximately

2
= d, that is p
2
(b/a)
1/3
p
1
, which is why we wrote d
3
on the right-hand side. For
larger values of d the solution can be approximated by a truncated Taylor series. It
transpires that the rst few terms of this series provide sucient accuracy for graphical
representations of the solution: the rst four terms give,
p
2
(d) =
1
d
_
1
2
3
d
2

1
3
d
4

78
81
d
6
+
_
. (10.72)
The constant c is given directly in terms of p
1
by equation 10.69, so, on eliminating B
from equations 10.67 and 10.68 we obtain an equation for p
1
,
A
a
=
(p
1
) (p
2
)
(p
1
)
. (10.73)
This equation can be used to determine p
1
for a given value of A/a. Numerical inves-
tigations suggest that for a given value of b/a there is a maximum value of A above
which there are no solutions, with this critical value of A tending to zero as b a.
Alternatively, as a 0, for xed b, the minimum value of A for which solutions exist
tends to innity, see exercise 10.24. A few such solutions are shown in gure 10.15 for
A = 4a, and in this case there are no solutions if b > 3.72a.
0 0.5 1 1.5 2 2.5 3 3.5
1
2
3
4
b=3.5a
b=2a b=1.5a
b=2.5a
x/a
y
a
Figure 10.15 Examples of the stationary paths of S
3
[y] for A = 4a and various
values of b/a. For this value of A there are no solutions for b > 3.72a.
282 CHAPTER 10. VARIABLE END POINTS
Body surrounding a solid right circular cylinder
We now return to Newtons original problem. Another solution of the Euler-Lagrange
equation 10.62 is y

(x) = 0, with c = 0, so we might expect a suitable solution to be


the piecewise dierentiable function,
z(x) =
_
A, 0 x a,
y(x), a x b,
where y(x) is the solution found in equation 10.64 above, with y(a) = A, and a a
parameter to be found. This solution has a corner at (a, A), so can be treated using a
modication of the Weierstrass-Erdmann corner conditions the modication being
required because A is xed and y(a) = A, so constraining the end of the varied path to
move only in the x-direction.
However a more transparent formulation of this problem is obtained by explicitly
including the path y = A, for 0 x a, in the functional. Thus the equivalent
functional is
S
2
[y] =
1
2
a
2
+
_
b
a
dx
x
1 +y
2
, y(a) = A, y(b) = 0, (10.74)
where both the path and the variable a need to be found. The varied path is y(x)+h(x),
with h(b) = 0, and a +k. At the corner, x = a
A = y(a) and A = y(a +k) +h(a +k) (10.75)
and expanding the second equation to rst-order in we obtain a relation between k
and h(a),
ky

(a) +h(a) = 0. (10.76)


Setting F(x, y

) = x/(1 +y
2
) we obtain
S
2
[y +h] =
1
2
(a +k)
2
+
_
b
a
dxF(x, y

+h

)
_
a+k
a
dxF(x, y

+h

). (10.77)
Dierentiating with respect to , then setting to zero, integrating by parts and using
the fact that h(b) = 0 gives the Gateaux dierential, see exercise 10.26,
S
2
= ak kF
_
a, y

(a)
_
h(a)F
y

_
a, y

(a)
_

_
b
a
dxh
dF
y

dx
. (10.78)
Using the subset of variations with k = h(a) = 0 gives equation 10.62, having the
parametric solution
x = c
(1 +p
2
)
2
p
, y = B +c
_
ln p p
2

3
4
p
4
_
,
1

3
p
1
p p
2
, (10.79)
where c and B are constants and we restrict p > 1/

3 because in the previous case


only this range of p gave a minimum: this assumption is justied in exercise 10.27.
On using equation 10.76 to express h(a) in terms of k, we see that on the stationary
path the Gateaux dierential has the value
S
2
=
_
a F
_
a, y

(a)
_
+y

(a)F
y

_
a, y

(a)
_
_
k. (10.80)
10.6. NEWTONS MINIMUM RESISTANCE PROBLEM 283
This must be zero for all k and hence
a = F
_
a, y

(a)
_
y

(a)F
y

_
a, y

(a)
_
= a
1 + 3y
2
(a)
(1 +y
2
(a))
2
. (10.81)
From this it follows that y

(a) = 1 (we ignore the solution y

(a) = 0) and since y(x)


is a decreasing function y

(a) = 1. From the denition of p, equation 10.63, it follows


that p
1
= 1. Thus the solution is
x =
a
4
(1 +p
2
)
2
p
, y = A +
7a
16
+
a
4
_
ln p p
2

3
4
p
4
_
, 1 p p
2
. (10.82)
Finally the values of a and p
2
are determined from the boundary conditions x(p
2
) = b
and y(p
2
) = 0. Combining these equations we obtain
A =
bp
2
(1 +p
2
2
)
2
_
3
4
p
4
2
+p
2
2
ln p
2

7
4
_
. (10.83)
The term in curly brackets is zero at p
2
= 1 and the right-hand side of this equation
increases as p
2
for large p
2
. Also the gradient of the right-hand side is positive for p
2
> 1,
see exercise 10.25. Hence for any positive value of A this equation gives a unique value
of p
2
; and then a can be determined from either of equations 10.82. Further, this path
is a local (weak) minimum, see exercise 10.27.
In gure 10.16 are shown some solutions for the cases A = 4, b = 1, 2, , 5, 8 and
10: in this gure only the curved parts of the solutions are shown.
0 1 2 3 4 5 6 7 8 9 10
1
2
3
4
b=5
b=8 b=10
1 2 3
4
x
y
Figure 10.16 Graphs of the solutions dened in equation 10.82 for A = 4 and
b = 1, 2, , 5, 8 and 10. Here the horizontal part of each solution, from x = 0
to a, is not shown.
Exercise 10.23
Derive the rst two terms of equation 10.72.
Exercise 10.24
Show that as a 0 equation 10.73 can be written in the approximate form,
A
a

3
4
_
b
a
_
4/3
(p1)
1/3
,
and hence that for suciently small a there is no solution if A 1.09b
4/3
a
1/3
.
284 CHAPTER 10. VARIABLE END POINTS
Exercise 10.25
Denote the right-hand side of equation 10.83 by bG(p2) where
G(p) =
p
(1 +p
2
)
2
_
3
4
p
4
+p
2
ln p
7
4
_
and show that G(1) = 0, G(p) =
3
4
p + O(1/p) as p , and that G

(p) > 0 for


p > 1/

3.
Exercise 10.26
Derive the Gateaux dierential 10.78.
Exercise 10.27
Show that the second derivative of S2[y +h] evaluated at = 0 is
d
2
S2
d
2
=
1
2
k
2
+ 2
_
b
a
dx
x(3y
2
1)
(1 +y
2
)
3
h
2
,
where k is dened in equation 10.76. Deduce that the stationary path dened by
equation 10.82 gives a (weak) local minimum of S2[y], provided 3y
2
> 1.
Exercise 10.28
(a) Consider the value of S1[y], dened in equation 10.59, on the path
z(x) = Acos
_
n +
1
2
_
x
b
, 0 x b
where n is any integer, and show that S1[z] may be made arbitrarily small.
(b) Which norm does this path satisfy?
10.7. MISCELLANEOUS EXERCISES 285
10.7 Miscellaneous exercises
Exercise 10.29
Show that the stationary paths of the functional S[y] =
_
b
a
dxf(x, y)
_
1 +y
2
,
y(a) = 0 and with natural boundary conditions at x = b are parallel to the x-axis
at x = b if f(x, y) = 0.
Exercise 10.30
Show that the stationary paths of the functional S[y] =
_
a
0
dxy
_
1 +y
2
, y(0) = A > 0,
with natural boundary conditions at x = a, are given by
y = c cosh
_
a x
c
_
with A = c cosh
_
a
c
_
.
Show that there are two solutions if A > 1.509a and none for smaller A.
Exercise 10.31
Derive the Euler-Lagrange equations for the functional
S[y] = G(y(v)) +
_
v
a
dxF(x, y, y

), y(a) = A,
for each of the two boundary conditions,
(a) natural boundary conditions at x = v, and
(b) the right end of the stationary path terminating on the curve dened by
(x, y) = 0.
Exercise 10.32
Find the stationary paths for the functional S[y] =
_
v
0
dx
_
1 +y
2
y
, y(0) = 0,
where the point (v, y(v)) is constrained to the curve x
2
+ (y r)
2
= r
2
, that is a
circle of radius r with centre on the y-axis at y = r.
Exercise 10.33
Consider the functional
S[y] =
_
v
a
dxf(x, y)
_
1 +y
2
exp
_
tan
1
y

_
, y(a) = A,
with the condition that the right-hand end of the stationary path lies on the curve
C dened by (x, y) = 0. If the gradient of C and the stationary path at the point
of intersection are, respectively, tan C and tan , show that
tan( C) =
1

.
286 CHAPTER 10. VARIABLE END POINTS
Exercise 10.34
Show that the stationary path of the functional
S[y] =
_
1
0
dx(xy +y
2
), y(0) = y

(0) = y(1) = 0,
is y(x) = x
2
(1 x)(2x
2
+ 2x 7)/480.
Exercise 10.35
A weight of mass M is hung from the end, x = L, of the beam described by the
functional of equation 10.1 and the beam is clamped at x = 0. The relevant energy
functional is
E[y] = Mgy(L) +
_
L
0
dx
_
1
2
y
2
(x)gy
_
, y(0) = y

(0) = 0,
where the y-axis is pointing downwards. Find the associated Euler-Lagrange equa-
tion and boundary conditions for this problem. Solve this equation in the case
that is independent of x.
Exercise 10.36
A weight of mass M is hung from a given point, x = , 0 < < L, of the beam
described by the functional of equation 10.1 and the beam rests on supports at
x = 0 and x = L, both at the same level. The relevant energy functional is
E[y] = Mgy() +
_
L
0
dx
_
1
2
y
2
(x)gy
_
, y(0) = y(L) = 0,
where the y-axis is pointing downwards. Assuming that y(x) is continuous at
x = , nd the associated Euler-Lagrange equation and all the boundary condi-
tions for this problem.
Exercise 10.37
Prove that the functional
S[y] =
_
b
a
dx
_
y
2
+ 2yy

+y
2
_
, y(a) = A, y(b) = B,
can have no broken extremals.
Exercise 10.38
Can the functional S[y] =
_
a
0
dxy
3
, y(0) = 0, y(a) = A, have broken extremals
Exercise 10.39
Does the functional S[y] =
_
a
0
dx
_
y
4
6y
2
_
, y(0) = 0, y(a) = A > 0, have any
stationary paths with a single corner? Find any such paths.
Exercise 10.40
Find the equation for the stationary curve of the modied brachistrochrone prob-
lem in which the initial point is (0, A), A > 0, and the nal point is on a circle
with centre on the x-axis at x = b and with radius r < b. The particle starts from
rest at (0, A).
Chapter 11
Conditional stationary points
11.1 Introduction
In this chapter we introduce the method needed to treat constrained variational prob-
lems, examples of which are the isoperimetric and catenary problems, described in
sections 3.5.5 and 3.5.6. With such problems the admissible paths are constrained to
a subset of all possible paths: in the isoperimetric and catenary problems these con-
straints are the lengths of the boundary and chain, respectively.
We introduce the technique required using the simpler example of constrained sta-
tionary points of functions of two or more variables, beginning with a discussion of a few
elementary cases; the method is applied to the Calculus of Variations in the next chap-
ter. Throughout this chapter we assume that all functions are suciently dierentiable
in the region of interest.
Consider a walker on a hill but conned to a one-dimensional path, AB, as shown
in gure 11.1.
AA
BB
2
0
2
x
3
2
1
0
1
2
3
y
0
0.5
1
1.5
2
2.5
3
h
Figure 11.1 Graph showing the height h(x, y) of the hill as x
and y vary. The path x + y = 1 is depicted by the solid line.
In this example the height of the hill is represented by the function
h(x, y) = 3 exp(x
2
y
2
/2) (11.1)
287
288 CHAPTER 11. CONDITIONAL STATIONARY POINTS
and the path by the equation x +y = 1. This hill has a global maximum at x = y = 0,
but because the path does not pass through this point the maximum height attained
by the walker is less. The problem is to nd this stationary point and its position: we
should also like to classify this stationary point, but usually this is more dicult.
The maximum height of the walker may be determined by rearranging the equation
of the path to express y in terms of x, y = 1 x, and then by expressing the height in
terms of x alone,
h(x) = 3 exp
_

3
2
x
2
+x
1
2
_
. (11.2)
The maximum of this function may be found by the methods described in section 8.2,
see also exercise 11.1, and is max(h(x, y)) = 3e
1/3
. In this example the path x+y = 1
constrains the walker and is named the constraint, or the equation of constraint.
Another problem is that of inscribing a rectangle of maximum area inside a given
ellipse, such that all corners of the rectangle lie on the ellipse, as shown in gure 11.2.
y
b
(x,y)
a
x
Figure 11.2 Diagram of a rectangle inscribed in the ellipse
dened by equation 11.3.
The coordinates of the top right-hand corner of the rectangle are (x, y) and since the
equation of the ellipse is
x
2
a
2
+
y
2
b
2
= 1, (11.3)
this is the equation of constraint. The area of the rectangle is
A(x, y) = 4xy, x > 0, y > 0, (11.4)
so we need the maximum of this function subject to the constraint 11.3: this problem
is solved in exercise 11.2.
If there are two independent variables, (x, y), there can be only one constraint which
we denote by g(x, y) = 0, and we require the stationary points of f(x, y) subject to this
constraint. Geometrically the constraint equation, g(x, y) = 0, denes a curve C
g
in the
Oxy plane, see for example gure 11.3, so we are searching for the stationary points of
f(x, y) along this curve.
With two independent variables there can be only one constraint because another
constraint, (x, y) = 0, denes another curve, C

that intersects C
g
at isolated points,
if at all. Sometimes, however, the equations g(x, y) = 0 and (x, y) = 0 will dene the
same curve, despite being algebraically dissimilar: then the functions g and are said
to be dependent and it can be shown that in the region where the curves g(x, y) = 0 and
(x, y) = 0 coincide there is a dierentiable function F(u, v) of two real variables such
11.1. INTRODUCTION 289
that F(g(x, y), (x, y)) =constant: alternatively, using the implicit function theorem,
section 1.3.7, it can be shown that can be expressed in terms of g, (x, y) = G(g(x, y)),
or vice versa. It is not always obvious that two functions dene the same curve: for
instance the equations
g(x, y) = y sinh
1
(tan x) = 0 and (x, y) =
2
1 + tan(x/2)
1 e
y
= 0 (11.5)
dene the same line in the vicinity of the origin.
The equation of constraint, g(x, y) = 0, can be used to express y in terms of x,
provided g
y
(x, y) = 0, and then the function f(x, y) becomes a function, f(x, y(x)), of
the single variable x, representing the variation of f along those segments of the curve
C
g
not including tangents parallel to Oy, for instance the segments AB and BC of the
curve depicted in gure 11.3. The stationary points of f(x, y(x)) can then be found in
the usual manner. Similarly, for segments of C
g
on which g
x
(x, y) = 0, such as A

or B

in gure 11.3, we can form f(x(y), y) and treat this as a function of the single
variable y.
g
C
A
A
C
y
x
B
B
C
Figure 11.3 A typical curve dened by a constraint equation
g(x, y) = 0. The segments AB and BC may be represented by func-
tions y(x) and the segment A

and B

by functions x(y).
If there are three independent variables, (x, y, z) and we require the stationary points
of f(x, y, z) subject to the single constraint g
1
(x, y, z) = 0, we may proceed in the same
manner, by using the constraint to express z in terms of (x, y) to form the function
f(x, y, z(x, y)) of two independent variables. With two constraints g
k
(x, y, z) = 0,
k = 1, 2, the more general implicit function theorem, described on page 30, may be used
to express any two variables in terms of the third, to express f(x, y, z) as a function
of one variable. In either case there are three ways to proceed and it is rarely clear in
advance which yields the simplest algebra.
In general, with n variables x = (x
1
, x
2
, . . . , x
n
) there can be at most n 1 con-
straints. Suppose there are m constraints, m n 1, g
k
(x) = 0, k = 1, 2, , m.
Then, in principle we may use these m equations to express m of the variables in terms
of the remaining n m, hence giving a function of n m variables. In practice this is
rarely an easy task.
There are two main methods of dealing with constrained stationary problems. The
conceptually simplest method is to reduce the number of independent variables, as
described above, and in simple examples this method is usually preferable. The more
elegant method, due to Lagrange (1736 1813), is described in the next section.
There are two main disadvantages with the direct method:
290 CHAPTER 11. CONDITIONAL STATIONARY POINTS
1. The method is biased because it treats the variables asymmetrically, by expressing
some in terms of the others; it is often dicult to determine the most convenient
choice in advance.
2. The most important diculty, however, is that the method cannot easily be gen-
eralised to deal with other situations, such as functionals.
Use of the direct method is illustrated in the following exercises.
Exercise 11.1
Show that the function dened in equation 11.1 has a local maximum at x = 1/3,
where y = 2/3, and that the height of the hill here is 3e
1/3
.
Exercise 11.2
Show that the area of the rectangle inscribed in the ellipse shown in gure 11.2
can be expressed in the form
A(x) =
4b
a
x
_
a
2
x
2
, 0 x a,
and by nding the stationary point of this expression show that max(A) = 2ab.
Exercise 11.3
Geometric problems often give rise to constrained stationary problems and here
we consider a relatively simple example.
Let P be a point in the Cartesian plane with coordinates
(A, B) and D the distance from P to any point (x, y) on
the straight line with equation
x
a
+
y
b
= 1.
Show that D
2
= (x A)
2
+ (y B)
2
and deduce that
the shortest distance is
min(D) =
|ab Ab Ba|

a
2
+b
2
.
(x,y)
x
y
(A,B)
D
b
a
P
Exercise 11.4
If A, B and C are the angles of a triangle show that the function
f(A, B, C) = sin Asin Bsin C
is stationary when the triangle is equilateral.
Hint the constraint is A +B +C = .
Exercise 11.5
If z = f(x, y) and x and y satisfy the constraint g(x, y) = 0, show that at the
stationary points of z the contours of f(x, y), that is the curves dened by the
equations f(x, y) =constant, are tangential to the curve dened by g(x, y) = 0.
11.2. THE LAGRANGE MULTIPLIER 291
11.2 The Lagrange multiplier
The method for nding constrained stationary points described in the introduction
is unsatisfactory partly because it forces an arbitrary distinction between variables,
and partly because this technique cannot be applied to constrained problems in the
Calculus of Variations. The introduction of the Lagrange multiplier overcomes both
these diculties.
11.2.1 Three variables and one constraint
Lagranges method allows all variables to be treated equally, and may be illustrated
using a function f(x, y, z) of three variables and with one constraint g(x, y, z) = 0.
The problem is to nd the points at which f(x, y, z) is stationary subject to the con-
straint. Let (a, b, c) be the required stationary point and consider the neighbouring
points (a +u, b +v, c +w), where is small, which also satisfy the constraint, that
is g(a +u, b +v, c +w) = 0. Using Taylors theorem, see section 1.3.9, we have
g(a +u, b +v, c +w) = g(a, b, c) +
_
u
g
x
+v
g
y
+w
g
z
_
+O(
2
), (11.6)
where all derivatives are evaluated at (a, b, c). But both points satisfy the constraint so
we have
u
g
x
+ v
g
y
+w
g
z
= O().
The left-hand side is independent of , so taking the limit 0 gives
u
g
x
+v
g
y
+w
g
z
= 0. (11.7)
This equation can be interpreted as the equation of a plane
passing through the origin, in the Cartesian space with axes
Ou, Ov and Ow, as shown in the diagram. The normal to
this plane is parallel to the vector n = (g
x
, g
y
, g
z
), and the
plane exists provided |n| = 0: this means that the constraint
must not be stationary at (a, b, c). Any point in this plane can
be dened uniquely with just two coordinates. It follows that
(u, v, w) cannot vary independently but that usually any one
of these variables can be expressed in terms of the other two.
u
w
v
O
n
This is, of course, equivalent to using the implicit function theorem on the equation
g(x, y, z) = 0 to express one variable in terms of the other two.
If f(x, y, z) is stationary then, by denition, see section 3.2.1,
f(a +u, b +v, c +w) f(a, b, c) = O(
2
)
which means, by the same argument as before, that
u
f
x
+v
f
y
+w
f
z
= 0. (11.8)
Recall that if there were no constraint this equation must hold for independent variations
u, v and w: then by choosing v = w = 0 and u = 0 we see that f/x = 0: the other two
292 CHAPTER 11. CONDITIONAL STATIONARY POINTS
equations, f/y = f/z = 0, are obtained similarly. But because of the constraint
u, v and w cannot vary independently.
We proceed by introducing a new variable, the Lagrange multiplier , also named the
undetermined multiplier, so there are now four variables to be determined (x, y, z) and
: surprisingly this simplies the problem. Multiply equation 11.7 by and subtract
from equation 11.8 to form another equation,
_
f
x

g
x
_
u +
_
f
y

g
y
_
v +
_
f
z

g
z
_
w = 0. (11.9)
This equation is true for any value of . Because of the constraint, variations in u, v
and w are not independent but, if g/z = 0 we may choose to make the coecient
of w in equation 11.9 zero, that is
f
z

g
z
= 0. (11.10)
Then equation 11.9 reduces to
_
f
x

g
x
_
u +
_
f
y

g
y
_
v = 0.
Because u and v may be varied independently, by rst setting v = 0 and then u = 0,
we obtain the two equations
f
x

g
x
= 0,
f
y

g
y
= 0. (11.11)
The three equations 11.10 and 11.11 relate the four variables x, y, z and . Assuming
that the implicit function theorem can be applied, that is the Jacobian 1.26 (page 30)
is not zero, we can use these equations to express (x, y, z) in terms of . Then the
constraint becomes g(x(), y(), z()) = 0, which determines appropriate values of .
This procedure is equivalent to dening an auxiliary function of four variables
F(x, y, z, ) = f(x, y, z) g(x, y, z) (11.12)
and nding the stationary points of F(x, y, z, ) using the conventional theory for all
four variables, that is the solutions of
F
x
=
f
x

g
x
= 0, F
y
=
f
y

g
y
= 0, F
z
=
f
z

g
z
= 0,
and F

= g(x, y, z) = 0. Usually the rst three of these are solved rst to give
(x(), y(), z()) in terms of , and then the fourth, the equation of constraint, is
used to determine , although the order in which these equations are solved is clearly
immaterial.
Thus the introduction of the Lagrange multiplier , , gives a method of nding
stationary points that treats the three original variables equally. Before showing how
this method generalises to n variables and m n 1 constraints we apply it to the
triangle problem treated in exercise 11.4.
11.2. THE LAGRANGE MULTIPLIER 293
For this problem f(x, y, z) = sinxsin y sin z and g(x, y, z) = x + y + z , so that
the auxiliary function is
F(x, y, z, ) = sin xsin y sinz (x +y +z ),
with each of x, y and z in the interval (0, ). Equations 11.10 and 11.11 become
sin xsin y cos z = 0, sinxcos y sin z = 0, cos xsin y sinz = 0,
and x + y + z = . Three dierent equations, independent of , may be obtained by
forming pairs of dierences: thus subtracting the second equation from the rst gives
sin x(sin y cos z cos y sin z) = sin x sin(y z) = 0. (11.13)
Similarly, by subtracting the third from the second and the third from the rst we
obtain
sinz sin(x y) = 0 and sin y sin(z x) = 0.
From 11.13 either sin x = 0 or sin(y z) = 0; but for a triangle of nonzero area none
of x, y or z can be zero or , so < y z < and the only solution is y = z. The
remaining two equations give y = x and z = x and hence x = y = z and then the
constraint gives x = y = z = /3.
Exercise 11.6
Use a Lagrange multiplier to nd the stationary points of the problems set in
exercises 11.1, 11.2 and 11.3.
Exercise 11.7
Show that the stationary distance between the origin and the plane dened by the
equation ax +by +cz = d is given by the formula |d|/

a
2
+b
2
+c
2
.
Exercise 11.8
Consider a rectangle, two sides of which are along the x- and y-axes; the bot-
tom left-hand corner is at the origin and the opposite corner lies on the line
x/a +y/b = 1, where a and b are positive numbers. Show that the stationary
area of such a rectangle is A = ab/4 and that for this rectangle the top right-hand
corner is at (a/2, b/2).
11.2.2 Three variables and two constraints
If there are three variables and two constraints, g
1
(x, y, z) = 0 and g
2
(x, y, z) = 0, then
equation 11.7 must hold for both constraints so we have the two equations
u
g
1
x
+v
g
1
y
+w
g
1
z
= 0 and u
g
2
x
+v
g
2
y
+w
g
2
z
= 0, (11.14)
where all derivatives are evaluated at the stationary point. Provided neither g
1
(x, y, z)
nor g
2
(x, y, z) is stationary, and that the normals to the planes dened by the equations
294 CHAPTER 11. CONDITIONAL STATIONARY POINTS
are not parallel, so that the planes exist and are distinct, then the planes intersect along
a line and there can be only one independent variable.
Equation 11.8 remains valid and now we proceed by introducing two Lagrange mul-
tipliers,
1
and
2
, one for each constraint. Thus from equations 11.8 and 11.14 we
may form another equation,
_
f
x

1
g
1
x

2
g
2
x
_
u+
_
f
y

1
g
1
y

2
g
2
y
_
v+
_
f
z

1
g
1
z

2
g
2
z
_
w = 0.
(11.15)
Now choose
1
and
2
to make the coecients of v and w zero, that is
f
y

1
g
1
y

2
g
2
y
= 0 and
f
z

1
g
1
z

2
g
2
z
= 0. (11.16)
Then, since u may be varied independently, we have a third equation
f
x

1
g
1
x

2
g
2
x
= 0. (11.17)
The three equations 11.16 and 11.17 may, in principle, be solved to give (x, y, z) in terms
of
1
and
2
and then the constraints, g
j
(x, y, z) = 0, j = 1, 2, give two equations for

1
and
2
. Needless to say, in practice these equations are not usually easy to solve.
As in the previous case this is formally equivalent to dening an auxiliary function
F(x, y, z) = f(x, y, z)
1
g
1
(x, y, z)
2
g
2
(x, y, z), (11.18)
of ve variables and nding the stationary points of this, that is the solutions of
F
x
= 0,
F
y
= 0,
F
z
= 0,
F

1
= 0 and
F

2
= 0.
We illustrate this method by showing how to nd the stationary values of f(x, y, z) =
ax
2
+by
2
+cz
2
, subject to the variables being conned to the planes x +y +z = 1 and
x + 2y + 3z = 2. The auxiliary function is
F = ax
2
+by
2
+cz
2

1
(x +y +z 1)
2
(x + 2y + 3z 2)
so the equations to be solved are
F
x
= 2ax (
1
+
2
) = 0,
F
y
= 2by (
1
+
2
)
2
= 0,
F
z
= 2cz (
1
+
2
) 2
2
= 0.
In this case it is convenient to dene a new variable =
1
+
2
, and then these three
equations can be solved to give
x =

2a
, y =
+
2
2b
, z =
+ 2
2
2c
,
and the equations of constraint become
(ab +ac +bc)+
2
(2ab +ac) = 2abc and (3ab + 2ac +bc)+
2
(6ab + 2ac) = 4abc,
11.2. THE LAGRANGE MULTIPLIER 295
which have the solution
=
4ab
a + 4b +c
and
2
=
2b(c a)
a + 4b +c
.
Hence
x =
2b
a + 4b +c
, y =
a +c
a + 4b +c
and z = x =
2b
a + 4b +c
. (11.19)
Exercise 11.9
Derive equations 11.19 by using the constraints to express x and y in terms of z.
Note that in this example the direct method is easier, because the constraints are
linear.
Exercise 11.10
If f(x) is a function of the n variables x = (x1, x2, , xn) constrained by the
single function g(x) = 0 show that the stationary points can be found by forming
the auxiliary function F(x, ) = f(x) g(x) of n + 1 variables and nding its
stationary points.
11.2.3 The general case
The method of Lagrange multipliers is applied to the case of n variables and m n1
constraints, g
j
(x), j = 1, 2, , m, in a similar fashion, but with m multipliers, so the
auxiliary function has n +m variables,
F(x,
1
,
2
, ,
m
) = f(x)
m

j=1

j
g
j
(x) (11.20)
where f(x) is the function for which stationary points are required. The stationary
points of F are at the roots of
F
x
k
=
f
x
k

j=1

j
g
j
x
k
= 0, k = 1, 2, , n, (11.21)
F

j
= g
j
(x) = 0, j = 1, 2, , m n 1. (11.22)
This method has the advantage of treating all variables equally and hence retaining any
symmetries that might be present.
The Lagrange multiplier method determines the position of stationary points. It is
generally more dicult to determine the nature of a constrained stationary point and
normally one has to use physical or geometric considerations besides algebraic methods
to understand the problem.
296 CHAPTER 11. CONDITIONAL STATIONARY POINTS
11.3 The dual problem
We end this chapter by returning to the case of only one constraint and one Lagrange
multiplier. That is we seek the stationary points of the function f(x) subject to the
constraint g(x) = 0. The auxiliary function is F(x, ) = f(x) g(x) and, provided
= 0 this may be rewritten in the alternative form
G(x, ) = g(x) f(x) where = 1 and G(x, ) = F(x, ). (11.23)
This equation can be used to nd the stationary points of g(x) subject to the constraint
f(x) = 0, which are given by the roots of
G
x
k
=
g
x
k

f
x
k
= 0, k = 1, 2, , n,
which are the same equations as for the stationary points of the original problem. If
x() is a solution of these equations the stationary point of the new constrained problem
is given by those satisfying f(x()) = 0. Further, since = 1, the stationary points
of the original problem are x(1/) with the values of given by g(x(1/)) = 0. Thus
the Lagrange multiplier method highlights a duality between,
a) the stationary points of f(x) with the constraint g(x) = 0, and
b) the stationary points of g(x) with the constraint f(x) = 0,
which is not apparent in the conventional method.
Exercise 11.11
This exercise provides an illustration of the duality described above; compare this
problem with that considered in exercise 11.1.
Find the stationary value of the function g(x, y) = x + y 1 subject to the
constraint f(x, y) = 3 exp(x
2
y
2
/2) c where c is a positive constant.
Exercise 11.12
An open rectangular box made of thin sheet metal and sides of height z and a
rectangular base of interior dimensions x and y. The base and sides of length x
are of (small) uniform thickness d and the sides of length y are of thickness 2d. If
the volume of metal is xed prove that the volume of the box is stationary when
x = 2y = 4z.
Exercise 11.13
A vessel comprises a cylinder of radius r and height h with equal conical ends, the
semi-vertical angle of each cone being . Show that the volume V and the surface
area, S, are given by
V = r
2
h +
2r
3
3 tan
and S = 2rh +
2r
2
sin
.
If r, h and can vary, show that for a vessel of given volume the stationary surface
area occurs when cos =
2
3
. Also nd the value of h in terms of r and and r in
terms of V .
11.4. MISCELLANEOUS EXERCISES 297
11.4 Miscellaneous exercises
Exercise 11.14
Show that equations 11.5 dene the same line in the neighbourhood of the origin.
Exercise 11.15
Find the stationary value of f = x
2
+ y
2
+ z
2
+ w
2
, subject to the constraint
(xyzw)
2
= 1, and the values of the variables at which the stationary values are
attained.
Exercise 11.16
Find the stationary points of f = xyzw
9
subject to the constraint g = 4x
4
+2y
8
+
z
16
+ 9w
16
= 1 in the region where all variables are positive.
Exercise 11.17
If a, b, c and d are given positive numbers and x, y and z are positive, real variables
satisfying the equation x +y +z = d, show that the function
f(x, y, z) =
a
2
x
+
b
2
y
+
c
2
z
possesses a stationary value (a +b +c)
2
/d.
Exercise 11.18
Show that the shortest distance between the plane ax + by + cz = d in the Oxy-
plane and the point (A, B, C) is given by
D =
|Aa +Bb +Cc d|

a
2
+b
2
+c
2
.
Exercise 11.19
For a simple lens with focal length f the object distance p and the image distance
q are related by 1/p+1/q = 1/f. If p+q =constant nd the stationary value of f.
Exercise 11.20
Show that the stationary points of f = ax
2
+by
2
+ cz
2
, where the constants a, b
and c are all positive, on the line where the vertical cylinder, x
2
+y
2
= 1, intersects
the plane x +y +z = 1, are given by
x =
2
2(a 1)
, y =
2
2(b 1)
and z =
2
2c
,
where the two possible values of (1, 2) are
1 =
1
2
(a +b + 4c) , 2 =
2c(6c 2)
2c
with =
_
(a b)
2
+ 8c
2
.
298 CHAPTER 11. CONDITIONAL STATIONARY POINTS
Exercise 11.21
Show that the area, S, of canvas needed to make a tent of given volume V com-
prising a right circular cylinder of radius r, made of a single thickness of canvas,
together with conical top of height h, made of two thickness of canvas is given by
S =
2V
r
+ 2r
_
r
2
+h
2

2
3
rh.
If both r and h can vary show that the stationary value of S, for xed V , is given
by r

2 = 4h = R where V = 2R
3
/3.
Chapter 12
Constrained Variational
Problems
12.1 Introduction
In this chapter we apply the Lagrange multiplier method to functionals with constrained
admissible functions. Examples are the isoperimetric and the catenary problems, de-
scribed in sections 3.5.5 and 3.5.6, where the constraint is another functional. In these
examples the stationary path is described by a single function, y(x).
But, the most celebrated isoperimetric problem is that enshrined in the myth de-
scribing the foundation of the Phoenician city of Carthage in 814 BC: this is that Dido,
also known as Elissa, having ed from Tyre after her brother, King Pygmalion, had
killed her husband, was granted by the Libyans as much land as an ox-hide could cover.
By cutting the hide into thin strips, she was able to claim far more ground than an-
ticipated. In common with all foundation myths there is no trace of evidence for its
veracity.
Didos solution is a circle which cannot be described by a single function, the natural
representation being parametric. Thus, we need to consider the eects of constraints
on both types of functionals.
There is, however, another type of constrained problem, of equal signicance, exem-
plied by the problem of nding geodesics on surfaces. Consider a surface dened in the
three dimensional Cartesian space, which we suppose can be dened by an equation of
the form S(x, y, z) = 0. Given two points on this surface we require the shortest line, on
the surface, joining these points. Any smooth path can be represented parametrically
by three functions (x(t), y(t), z(t)) of a parameter t, with end points at t = 0 and t = 1.
The distance along this path is given by the functional
D[x, y, z] =
_
1
0
dt
_
x
2
+ y
2
+ z
2
and the constraint that forces this path to be on the surface is S(x(t), y(t), z(t)) = 0 for
0 t 1. This is a dierent type of constraint than found in the problems described
above. In the non-assessed sections 12.7 and 12.8 this theory is used to solve variants
of the brachistochrone problem.
299
300 CHAPTER 12. CONSTRAINED VARIATIONAL PROBLEMS
No fundamentally new ideas are presented in this chapter, but many ideas and
techniques introduced in previous chapters are used in a slightly dierent context, to
derive new results. As you read through this chapter you should ensure that you
thoroughly understand the previous work upon which it is based.
12.2 Conditional Stationary values of functionals
12.2.1 Functional constraints
One possible method of dealing with constrained problems is to use admissible functions
that automatically satisfy the constraint; this is the equivalent of the direct method
discussed in the introduction to the previous chapter. Unfortunately it is not always
possible to formulate satisfactory rules for dening such functions so the alternative
method, described in theorem 12.1 below, is essential.
The general theory for this type of function is a combination of the Lagrange multi-
plier method, described in chapter 11, and the derivation of the Euler-Lagrange equation
given in chapter 4; it is convenient to summarise the result as a theorem.
Theorem 12.1
Given the functional
S[y] =
_
b
a
dxF(x, y, y

), y(a) = A, y(b) = B, (12.1)


where the admissible curves must also satisfy the constraint functional
C[y] =
_
b
a
dxG(x, y, y

) = c, (12.2)
where c is a given constant, then, if y(x) is not a stationary path of C[y], there exists a
Lagrange multiplier such that y(x) is a stationary path of the auxiliary functional
S[y] =
_
b
a
dxF(x, y, y

), y(a) = A, y(b) = B, (12.3)


where F = F G. That is, the stationary path is given by the solutions of the
Euler-Lagrange equation
d
dx
_
F
y

F
y
= 0, y(a) = A, y(b) = B. (12.4)
The solution of this Euler-Lagrange equation will depend upon , the value of which is
determined by substituting the solution into the constraint functional 12.2.
The proof of this theorem requires a signicant, and not immediately obvious, change
to the proof presented in section 4.4. Thus before providing the proof it is instructive
to see what happens when the general theory of section 4.4 is applied directly to this
type of problem; this shows why a modication is required.
12.2. CONDITIONAL STATIONARY VALUES OF FUNCTIONALS 301
Suppose that y(x) is the required solution: consider the neighbouring admissible
function y(x) +h(x) where h(a) = h(b) = 0, then the Gateaux dierential is
S[y, h] =
_
b
a
dx
_
F
y

d
dx
_
F
y

__
h(x). (12.5)
But both y(x)+h(x) and y(x) are chosen to satisfy the constraint, that is C[y +h] = C[y],
so the rate of change of C[y] is zero, that is the Gateaux dierential is zero
C[y, h] =
_
b
a
dx
_
G
y

d
dx
_
G
y

__
h(x) = 0 for all h(x). (12.6)
It is assumed that C[y] is not stationary, so G(x, y, y

) does not satisfy the Euler-


Lagrange equation. But 12.6 is true for all h(x) only if G satises the Euler-Lagrange
equation. This contradiction can be resolved with a judicious choice of h(x). The
problem is that the constraint places an additional restriction on the variation h(x)
so that the theory developed in chapter 5, which placed no restriction (other than
dierentiability) on h(x), needs to be modied.
The same problem arises with functions of n real variables, s(x) and a single con-
straint c(x) = 0. In this case the equivalents of expressions 12.5 and 12.6 are
s[x, h] =
n

k=1
h
k
s
x
k
= 0 and c[x, h] =
n

k=1
h
k
c
x
k
= 0.
But the second of these equations is true for all variations satisfying the constraint, so
the h
k
cannot be varied independently, and therefore we cannot deduce that s/x
k
= 0
for all x
k
.
In order to derive the Euler-Lagrange equation 12.4 we use a special set of variations.
Recall that when rst deriving the Euler-Lagrange equation in section 4.4 we used the
fundamental Lemma, section 4.3, which involved sets of functions h(x) that isolated
small intervals of the integrand. Here we use a modication of this method that involves
picking out two, small, distinct intervals.
This is achieved by writing
h(x) =
1
g(x
1
) +
2
g(x
2
),
1
=
2
, (12.7)
where the function g(x ) is strongly peaked in a neighbourhood of x = and zero
for other x.
Such functions can be constructed from the type of function used to prove the
fundamental lemma, section 4.3; for example dene
g(x ) =
_
_
_
1

2
_

2
(x )
2
_
, a < x + < b,
0, otherwise.
(12.8)
The coecient
2
is chosen to make g = O(1). This function is zero except in the
neighbourhood of width 2 centred at x = .
For any function f(x) possessing a third derivative for a x b, we have, see
exercise 12.1
_
b
a
dxf(x)g(x ) = f() +O(
3
), =
4
3
, a + < < b . (12.9)
302 CHAPTER 12. CONSTRAINED VARIATIONAL PROBLEMS
In the following analysis we use the specic family of functions 12.8 in order to illustrate
how the proof works. Such a restriction is not necessary, but without it the more general
equivalent of equation 12.9 needs to be derived; the only signicant dierence between
equation 12.9 and the general case is that the term O(
3
) is replaced by a term O(
2
).
For convenience dene the functions
F(x) =
d
dx
_
F
y

F
y
and G(x) =
d
dx
_
G
y

G
y
which we assume are suciently well behaved for a x b. Then the integrals 12.5
and 12.6 become, respectively,
S =
_
b
a
dxF(x)
_

1
g(x
1
) +
2
g(x
2
)
_
=
_

1
F(
1
) +
2
F(
2
)
_
+O(
3
), (12.10)
C =
_
b
a
dxG(x)
_

1
g(x
1
) +
2
g(x
2
)
_
=
_

1
G(
1
) +
2
G(
2
)
_
+O(
3
). (12.11)
The functional C[y] is not stationary therefore we may choose
2
such that G(
2
) = 0,
and then equation 12.11 gives, since C = 0,

2
=
G(
1
)
G(
2
)

1
+O(
2
).
Substituting this into equation 12.10 and using the fact S[y] is stationary, so S = 0,
_
F(
1
)
G(
1
)

F(
2
)
G(
2
)
_

1
= O(
2
).
Since this equation must be true for all
1
, and the left-hand side is independent of ,
we must have
F(
1
)
G(
1
)
=
F(
2
)
G(
2
)
.
Finally, recall that
1
and
2
are arbitrary, so it follows that the ratio F(x)/G(x) is
independent of x. Setting this ratio to a constant we obtain
d
dx
_
F
y

F
y

_
d
dx
_
G
y

G
y
_
= 0 for a x b, (12.12)
which is just equation 12.4 and can be derived from the functional S[y] in the usual
manner.
This proof shows clearly why two small parameters,
1
and
2
, are necessary; we
need the exibility to isolate two distinct points,
1
and
2
, in the interval (a, b) to
show that the ratio F(x)/G(x) is independent of x. In this proof it is necessary to
assume that G(x) = 0 for almost all values of x in this interval: that is, C[y] must not
be stationary.
12.2. CONDITIONAL STATIONARY VALUES OF FUNCTIONALS 303
Exercise 12.1
Prove equation 12.9
Exercise 12.2
Use theorem 12.1 to show that the stationary path of the variational problem
S[y] =
_
1
0
dxy
2
, y(0) = y(1) = 0,
subject to the constraint that the area under the curve is xed, that is
C[y] =
_
1
0
dxy(x) = A,
is given by y = 6Ax(1 x), and that the undetermined multiplier is = 24A.
Exercise 12.3
Show that the stationary path of the functional
S[y] =
_
2
1
dxxy
2
, y(1) = y(2) = 0,
subject to the constraint
_
2
1
dxy = 1 is given by y(x) =
2 ln 2
3 ln 2 2
_
1 x +
lnx
ln2
_
.
We end this section by considering the eect of M functional constraints on a functional
of n dependent variables. The extension required is identical to that described in
the previous chapter, namely a Lagrange multiplier is added for each constraint: we
summarise the result as a theorem.
Theorem 12.2
Given the functional
S[y] =
_
b
a
dxF(x, y(x), y

(x)), y(a) = A, y(b) = B, (12.13)


where y = (y
1
, y
2
, . . . , y
n
) and where the admissible curves must also satisfy the M
constraint functionals
C
j
[y] =
_
b
a
dxG
j
(x, y(x), y

(x)) = c
j
, j = 1, 2, , M, (12.14)
where the c
j
are M given constants, then, if y(x) is not a stationary path of any of the
constraints, there exists a set of M Lagrange multipliers
j
, j = 1, 2, , M, such that
y(x) is a stationary path of the functional
S[y] =
_
b
a
dxF(x, y(x), y

(x)), y(a) = A, y(b) = B, (12.15)


304 CHAPTER 12. CONSTRAINED VARIATIONAL PROBLEMS
where F = F

M
j=1

j
G
j
. That is, the stationary path is given by the solution of the
Euler-Lagrange equations
d
dx
_
F
y

k
_

F
y
k
= 0, y
k
(a) = A
k
, y
k
(b) = B
k
, k = 1, 2, , n. (12.16)
The solution of these n Euler-Lagrange equations will depend upon M Lagrange mul-
tipliers, the values of which are determined by substituting the solution into the M
constraint functionals 12.14.
Exercise 12.4
Show that the stationary paths of the functional
S[y, z] =
_
1
0
dx
_
y
2
+z
2
2xz

4z
_
, y(0) = z(0) = 0, y(1) = z(1) = 1,
subject to the constraint
C[y, z] =
_
1
0
dx
_
y
2
xy

z
2
_
= c,
are given by
y =
(4 3)x x
2
4(1 )
and z =
(3 + 2)x x
2
2(1 +)
where is a solution of 1 +c =
24 46 + 23
2
48(1 )
2

1
12(1 +)
2
.
12.2.2 The dual problem
The form of theorem 12.1 suggests the same duality as for functions of real variables,
described in section 11.3. Thus we may change the roles of the functionals S[y] and C[y]
in theorem 12.1 and, provided = 0, the Euler-Lagrange equation of the functional
C[y] = C[y]S[y] gives the stationary paths of C[y] subject to the constraint S[y] = s,
which provides the equation for the Lagrange multiplier . The following exercise is
the dual of the problem considered in exercise 12.2.
Exercise 12.5
Show that the stationary path of the variational problem
S[y] =
_
1
0
dxy, y(0) = y(1) = 0,
subject to the constraint C[y] =
_
1
0
dx y
2
= c, is given by y(x) =

3c x(1 x),
and that the undetermined multiplier is = 1/(4

3c).
12.2. CONDITIONAL STATIONARY VALUES OF FUNCTIONALS 305
12.2.3 The catenary
Here we determine the shape of the catenary, that is, the shape assumed by an inexten-
sible cable of uniform density, , and known length, hanging between xed supports.
In gure 12.1 we show an example of such a curve with the points of support at (0, B)
and (a, A), with a > 0 and B < A.
(a,A)
x=0
x=a
B
A
x
y
Figure 12.1 The catenary formed by a uniform cable
hanging between two points at dierent heights.
If a curve is described by a dierentiable function y(x) it can be shown, see exercise 3.19
(page 117), that the potential energy E of the cable is proportional to the functional
E[y] = g
_
a
0
dxy
_
1 +y
2
, y(0) = B, y(a) = A B. (12.17)
The curve that minimises this functional, subject to the length of the cable,
L[y] =
_
a
0
dx
_
1 +y
2
, (12.18)
remaining constant is the shape assumed by the hanging cable.
Notice that the functional E[y] is identical to that giving the area of a surface of
revolution, see equation 5.11 (page 155). But, in the present case we shall see that the
existence of the constraint changes the behaviour of the solutions.
Experience leads us to expect that provided L is larger than the distance between
the supports,
_
a
2
+ (A B)
2
, the cable hangs in a specic manner; thus we expect
that there is a unique path that minimises E[y] with the constraint L[y]. Here we show
that provided L >
_
a
2
+ (A B)
2
(and the cable is strong enough) there are always
two stationary paths. But, in section 5.3 we saw that when A = B, with no constraint,
there are either two or no smooth stationary paths of E[y], depending upon the ratio
A/a. In exercise 5.19 it was shown that if B = 0 and A > 0, again with no constraint,
there is no solution. This illustrates the signicance of constraints.
A physical interpretation of the eect of the removal of this constraint is given by
considering a slight modication of the catenary, whereby the points of support are two
smooth pegs and the cable is draped over these with the surplus cable resting on the
ground, as shown in gure 12.2: the important property of a smooth peg is that around
it tension in the cable does not change. We suppose that the cable is suciently long
that there is always some cable on the ground.
306 CHAPTER 12. CONSTRAINED VARIATIONAL PROBLEMS
A
a
Figure 12.2 Diagram showing a cable hanging over two
smooth pegs, at the same height, A, above the ground, a
distance a apart. The cable is long enough to reach the
ground on both sides.
In this example the potential energy of the vertical segments is independent of the
shape of the hanging portion, so the energy is given by equation 12.17, and there is no
constraint. This is the same functional as gives the area of a surface of revolution.
The hanging portion of the cable is supported only by the weight of the vertical
portion of the cable, so we consider the eect of keeping A and B xed and changing
a, the separation between the pegs. First consider the case A = B.
If a A the weight of the hanging cable is relatively small by comparison to the
vertical portion, and we expect the portion between the pegs to be almost horizontal.
In addition there will be a solution where the hanging portion falls almost vertically
near the pegs and with a section of it resting on the oor. Figure 5.11 (page 160) shows
the two solutions, for a A; one is almost horizontal and is shown in section 8.7 to be
a local minimum.
If a A the weight of the hanging cable is relatively large and cannot be sup-
ported by the vertical portion. Now the only solution is the Goldschmidt solution,
equation 5.20, which is physically possible only for an innitely exible cable.
Notice that if B = 0 the length of the vertical portion of the cable must be less
than the hanging portion, which therefore cannot be supported, so there is no smooth
solution, as in exercise 5.19. This example demonstrates the importance of constraints.
Returning to the main problem, choose the axes so the left-hand support is at the
origin, that is B = 0, and the right-hand end has coordinates (a, A). Further we may
assume, with no loss of generality, that A 0. The energy and constraint functionals
are given in equations 12.17 and 12.18, so if g is the Lagrange multiplier the auxiliary
functional is proportional to
E[y] =
_
a
0
dx(y )
_
1 +y
2
, y(0) = 0, y(a) = A 0, (12.19)
and this can be expressed in terms of a new variable z = y
E[z] =
_
a
0
dxz
_
1 +z
2
, z(0) = , z(a) = A .
The rst-integral of this functional is
z

1 +z
2
= c,
12.2. CONDITIONAL STATIONARY VALUES OF FUNCTIONALS 307
where c is a constant. Solving this equation for z

gives the rst-order equation


dz
dx
=

z
2
c
2
c
,
which is same equation as derived in section 5.3.2. Putting z = c cosh(x) gives
c

= 1, so the general solution is


y = +c cosh
_
x +d
c
_
, (12.20)
where d is another constant. This solution contains three unknown constants, , d
and c, which are obtained from the two boundary conditions and the constraint, as
shown next.
The boundary conditions y(0) = 0 and y(a) = A give the equations
= c cosh
_
d
c
_
and A = +c cosh
_
a +d
c
_
, (12.21)
and the constraint becomes
L =
_
a
0
dx

1 + sinh
2
_
x +d
c
_
=
_
a
0
dx cosh
_
x +d
c
_
= c
_
sinh
_
a +d
c
_
sinh
_
d
c
__
= 2c cosh
_
a + 2d
2c
_
sinh
a
2c
. (12.22)
Equations 12.21 and 12.22 give three equations enabling the constants , c and d to be
determined in terms of L, B and (a, A). It is not possible to nd formulae for these
constants, but a numerical solution is made relatively easy after some rearrangements
are made. Subtracting equations 12.21 gives
A = c
_
cosh
_
a +d
c
_
cosh
_
d
c
__
= 2c sinh
_
a + 2d
2c
_
sinh
a
2c
. (12.23)
On squaring and subtracting equations 12.22 and 12.23 we obtain
L
2
A
2
= 4c
2
sinh
2
_
a
2c
_
or, with =
a
2c
, sinh =

L
2
A
2
a
. (12.24)
This equation for has two real solutions, =
0
, where
0
is the positive solution
of the second equation; so c = c
0
, c
0
= a/(2
0
) > 0. These two values of c give two
values, d

, of d which can be found by dividing 12.23 by 12.22 to give


tanh
_
a + 2d

2c
0
_
=
A
L
,
_
0 <
A
L
< 1
_
.
If D
0
is the positive solution of tanh D
0
= A/L then
d

c
0
=
0
D
0
,
0
=
a
2c
0
.
308 CHAPTER 12. CONSTRAINED VARIATIONAL PROBLEMS
Then equation 12.21 gives the following two values for ,

= c
0
cosh(
0
D
0
), giving
+
+

= A.
Hence the two solutions are
y
+
(x) = c
0
_
cosh
_
2
0
a
x (
0
D
0
)
_
cosh(
0
D
0
)
_
(12.25)
and
y

(x) = c
0
_
cosh
_
2
0
a
x (
0
+D
0
)
_
cosh(
0
+D
0
)
_
. (12.26)
The solution y
+
(x) has a local minimum at x = x
m
= a(1 D
0
/
0
)/2, and y

(x) has
a maximum at x = a x
m
. Also we note that y

(a x) = A y
+
(x). An example of
each of these solutions is shown in gure 12.3. Only y
+
(x) is physically signicant in
the present context.
0.2 0.4 0.6 0.8 1
-0.5
0
0.5
1
1.5
2
y
+
(x)
y
-
(x)
y
x
Figure 12.3 Graphs of the functions y

(x) in the case


a = A = 1 and L = 3.
We can deduce the existence of these two solutions directly from the original functional.
Suppose that y(x) satises the Euler-Lagrange equations associated with E[y], then if
w(x) = A y(a x), so w(0) = 0 and w(a) = A, then
E[y] =
_
a
0
dx
_
A w(a x)
_
_
1 +w

(a x)
2
=
_
a
0
du
_
w(u)
_
_
1 +w

(u)
2
, u = a x, + = A.
Thus if y
+
(x) is a stationary path then so is y

(x). Also,
E[y

] E[y
+
] = c
2
0
_
sinh2
0
2
0
_
> 0, (12.27)
that is the potential energy of the path y
+
(x) is less that that of the path y

(x); physical
considerations suggest that y
+
(x) gives a minimum value of E[y].
In gure 12.4 we show some examples of y
+
(x) for a = A = 1 and various values
of L.
12.3. VARIABLE END POINTS 309
0.2 0.4 0.6 0.8 1
-1
-0.5
0
0.5
1
L=1.42
L=1.5
L=3
L=2
x
y
Figure 12.4 Graphs showing catenaries y
+
(x), dened in
equation 12.25, of various lengths, L, for a = A = 1.
Exercise 12.6
Show that equation 12.24 for has a unique real solution if L is larger than the
distance between the origin and (a, A). What is the positive limiting value of c as
the stationary path y+(x) tends to the straight line between the end points?
Exercise 12.7
For given values of a and L, (L > a), show that the catenary y+(x) with zero
gradient at the left end, x = 0, has the height dierence A = Ltanh where
a sinh 2 = 2L.
Exercise 12.8
Prove the inequality 12.27.
Exercise 12.9
(a) Show that the Euler-Lagrange equation associated with the functional E[y],
dened in equation 12.19,, is
(y )y

= 1 +y
2
, y(0) = 0, y(a) = A.
(b) If y(x) is a solution of this equation and w(u) = A y(x), u = a x, show
that w(u) satises the equation
(w )w

(u) = 1 +w

(u)
2
, w(0) = 0, w(a) = A and + = A.
Hence explain why another solution for the minimum surface problem, discussed
in section 5.3, cannot be generated by this transformation.
12.3 Variable end points
Variational problems with variable end points, but without constraints, were considered
in section 10.3. The addition of one or more constraints does not alter this theory in
any signicant way, although its implementation is usually more dicult.
310 CHAPTER 12. CONSTRAINED VARIATIONAL PROBLEMS
Suppose that we require the stationary paths of the functional
S[y] =
_
v
a
dxF(x, y, y

), y(a) = A, (12.28)
where the right-hand end of the path lies on the curve dened by (x, y) = 0 and where
the constraint
C[y] =
_
v
a
dxG(x, y, y

) = c, a constant, (12.29)
also needs to be satised.
Using a similar analysis to that outlined in section 12.2.1 it can be shown that the
required stationary path is given by the stationary path of the auxiliary functional
S[y] =
_
v
a
dxF(x, y, y

), y(a) = A, F = F G, (12.30)
where is a Lagrange multiplier and at x = v the transversality condition (page 266),

x
F
y
+
y
_
y

F
y
F
_

x=v
= 0, (12.31)
is satised.
As before the solution of the associated Euler-Lagrange equation depends upon ,
the value of which is determined by the constraint.
Exercise 12.10
A curve of given length L is described by the positive function y(x) passing through
the origin and some point, (v, 0), with v > 0, to be determined. Find the shape
of the curve making the area under it stationary.
Hint in this example the boundary curve is (x, y) = y = 0.
Exercise 12.11
A curve described by the positive function y(x) passing through the origin and
some point, to be determined, x = v > 0 on the x-axis, is rotated about the x-axis
to form a solid body.
(a) Show that the volume, V [y], and the surface area, A[y], of this body are given
by
V [y] =
_
v
0
dxy
2
and A[y] = 2
_
v
0
dxy
_
1 +y
2
.
(b) If the surface area is given determine the path making the volume stationary,
and nd the volume in terms of A.
Hint in this example the boundary curve is (x, y) = y = 0.
Exercise 12.12
Show that the equation of the cable with the right-hand end xed at (a, A), where
a and A are positive, and with the left-hand end free to slide on a vertical pole
aligned along the y-axis is given by y = A +c cosh
_
x
c
_
c cosh
_
a
c
_
, where c is
given by the positive root of L/a = sinh and c = a/.
12.4. BROKEN EXTREMALS 311
Exercise 12.13
Show that the equations of a cable of length L and uniform density, with the left
end free to slide on a vertical pole aligned along the y-axis and the right end free
to slide along the straight line x/a +y/b = 1, a, b > 0, is
y = +
bL
a
cosh
_
ax
bL
_
, 0 x
bL
a
sinh
1
_
a
b
_
,
for some for which you should nd an expression in terms of a, b and L.
12.4 Broken extremals
The theory of broken extremals, section 10.5.2, remains essentially unchanged when
constraints are added. For one constraint the theory is as described in that section
except that the integrand F is replaced by F = FG, where is a Lagrange multiplier
and G the integrand of the constraint.
We illustrate this theory with the simple example requiring the stationary paths of
S[y] =
_
a
0
dxy
2
, y(0) = 0, y(a) = A (12.32)
of given length
L[y] =
_
a
0
dx
_
1 +y
2
and with a discontinuous derivative at x = c, with 0 < c < a.
The modied functional is
S[y] =
_
a
0
dx
_
y
2

_
1 +y
2
_
, y(0) = 0, y(a) = A. (12.33)
This integrand depends only upon y

, so the solutions of the associated Euler-Lagrange


equation are straight lines, y = mx +d. On the interval 0 x c, since y(0) = 0, the
appropriate solution is y = m
1
x, for some constant m
1
. On the interval c x a, the
solution through y(a) = A is y = A + m
2
(x a). The solution is continuous at x = c,
so
(m
1
m
2
)c = A m
2
a. (12.34)
The Weierstrass-Erdmann (corner) conditions connecting the two sides of the solution
at x = c are, see equations 10.53 and 10.53 (page 275),
lim
xc
_
F y

F
y

_
= lim
xc+
_
F y

F
y

_
,
lim
xc
F
y
= lim
xc+
F
y
.
Since
F
y
= y

_
2

_
1 +y
2
_
and F y

F
y
= y
2


_
1 +y
2
312 CHAPTER 12. CONSTRAINED VARIATIONAL PROBLEMS
these conditions become
m
2
1
+

_
1 +m
2
1
= m
2
2
+

_
1 +m
2
2
,
m
1
_
2

_
1 +m
2
1
_
= m
2
_
2

_
1 +m
2
2
_
.
(12.35)
A solution of the rst equation is m
1
= m, m
2
= m, for some m; then the second
equation gives
m
_
2

1 +m
2
_
= 0 giving the nontrivial solution
_
1 +m
2
=

2
.
The constraint now gives
L =
_
a
0
dx
_
1 +m
2
= a
_
1 +m
2
and hence =
2L
a
and m =
_
L
2
a
2
1.
Equation 12.34 for continuity then gives c = (A+ma)/2m. Hence the stationary paths
are
y(x) =
_
_
_
mx, 0 x c,
A +m(a x), c x a, where m =
_
L
2
a
2
1 and c =
A +ma
2m
.
Since 0 < c < a we must have |A| < ma.
With no corner conditions the dierentiable solution exists only if L =

a
2
+A
2
,
there being insucient exibility to satisfy the constraints and the Euler-Lagrange
equation for any other values of L, a and A.
Exercise 12.14
Show that the only solutions of equations 12.35 are those considered in the text.
Exercise 12.15
This is a long, dicult question which should be attempted only if time permits.
An inextensible cable with uniform density , is suspended between the points
(0, B) and (a, A), with A B, where the y-axis is vertically upwards. A weight of
mass M is rmly attached to the cable at distances L1 and L2 from the left and
right ends respectively, all distances being measured along the cable.
(a) Show that the energy functional is
E[y] = Mgy() +g
_

0
dxy
_
1 +y
2
+g
_
a

dxy
_
1 +y
2
,
where is the x-coordinate of the weight, and that the two constraints are
L1 =
_

0
dx
_
1 +y
2
and L2 =
_
a

dx
_
1 +y
2
.
12.5. PARAMETRIC FUNCTIONALS 313
(b) Derive the Euler-Lagrange equations for the cable and show that their solu-
tions are
y1(x) = 1 +c1 cosh
_
x d1
c1
_
, 0 x , y1(0) = B,
y2(x) = 2 +c2 cosh
_
x d2
c2
_
, x a, y2(a) = A,
where 1 and 2 are two Lagrange multipliers and (c1, c2, d1, d2) are constants
arising from the integration of the Euler-Lagrange equations.
(c) Show that c1 = c2 = c and that the six remaining unknown constants (1, 2, , c, d1, d2)
are determined by the following six equations.
L1 =
_

0
dx
_
1 +y
2
1
= c
_
sinh
_
d1
c
_
+ sinh
_
d1
c
__
L2 =
_
a

dx
_
1 +y
2
2
= c
_
sinh
_
a d2
c
_
sinh
_
d2
c
__
.
B = 1 +c cosh
_
d1
c
_
and A = 2 +c cosh
_
a d2
c
_
.
M = c
_
sinh
_
d2
c
_
sinh
_
d1
c
__
and
1 +c cosh
_
d1
c
_
= 2 +c cosh
_
d2
c
_
.
12.5 Parametric functionals
The general theory for a parametrically dened curve is identical to that described
in section 12.2.1, in particular theorem 12.2. Consider the case of three independent
variables, (x, y, z), depending upon a parameter t, and one constraint: the functional
will be
S[x, y, z] =
_
1
0
dt (x, y, z, x, y, z) (12.36)
with given boundary conditions and with admissible functions restricted to those paths
that satisfy the constraint,
C[x, y, z] =
_
1
0
dt G(x, y, z, x, y, z) = c (12.37)
where c is a constant. This is just the problem dealt with by theorem 12.2, so the
stationary paths satisfy the three Euler-Lagrange equations
d
dt
_

u
_


u
, u = {x, y, z}, = G, (12.38)
with the same boundary conditions as dened for the original functional, and where
is a Lagrange multiplier.
314 CHAPTER 12. CONSTRAINED VARIATIONAL PROBLEMS
We illustrate this theory by applying it to the original isoperimetric problem of
Dido, that is we require the shape of the closed curve of given length L that encloses
the largest area, though we show only that the area is stationary. A version of this
problem was considered in exercise 12.10, where only the upper half of the curve was
considered: using a parametric representation of the functions this restriction is not
necessary.
The area of a closed curve in the Oxy-plane, see equation 9.5 (page 242), is
A[x, y] =
1
2
_
2
0
dt (x y xy) , x(0) = x(2), y(0) = y(2), (12.39)
where the range of the parameter t is appropriate for it to be an angle, and the curve
is traversed anti-clockwise. The constraint is the length,
C[x, y] =
_
2
0
dt
_
x
2
+ y
2
= L. (12.40)
If is the Lagrange multiplier the modied functional is
A[x, y] =
1
2
_
2
0
dt
_
x y xy 2
_
x
2
+ y
2
_
(12.41)
and the two associated Euler-Lagrange equations for x and y, respectively, are
d
dt
_
x
_
x
2
+ y
2
+
1
2
y
_
+
1
2
y = 0,
d
dt
_
y
_
x
2
+ y
2

1
2
x
_

1
2
x = 0.
These simplify to

d
dt
_
x
_
x
2
+ y
2
_
+ y = 0 and
d
dt
_
y
_
x
2
+ y
2
_
x = 0, (12.42)
which integrate directly to

x
_
x
2
+ y
2
= y and
y
_
x
2
+ y
2
= +x, (12.43)
for some constants and . Now multiply the rst of these by y, the second by x and
subtract to give ( y) y ( +x) x = 0. Integrate this to obtain
(x +)
2
+ (y )
2
=
2
, (12.44)
where is another real constant. This is the equation of the circle with centre at
(, ) and radius . Its circumference is 2 = L, which gives the required path. In
parametric form its equations are
x = +
L
2
cos t and y = +
L
2
sin t. (12.45)
The position of the centre of this circle cannot be determined from the information
provided.
12.6. THE LAGRANGE PROBLEM 315
Exercise 12.16
An alternative method of nding the stationary path for the area from equa-
tions 12.42 is to use the arc length, s, as the independent variable, which is related
to the parameter t by the relation
ds
dt
=
_
x
2
+ y
2
.
(a) Show that with s as the independent variable equations 12.42 become

dx
ds
= y and
dy
ds
= +x.
Further, show that these equations can be converted to

2
d
2
y
ds
2
+y = having the general solutions
_
y = +a cos(s/ +)
x = a sin(s/ +),
where a and are constants.
(b) Show that 2 = L, derive equations 12.41 and deduce that 2a = L.
Exercise 12.17
What is the shape of the closed curve, enclosing a given area, for which the length
is stationary.
12.6 The Lagrange problem
A dierent type of problem, originally formulated by Lagrange (1736 1813) and since
associated with his name, consists of nding stationary paths of the functional
S[y] =
_
b
a
dxF(x, y, y

), (12.46)
where y(x) = (y
1
, y
2
, . . . , y
n
) is an n-dimensional vector function, with constraints
dened by the m < n functions
C
j
(x, y, y

) = 0, j = 1, 2, , m < n, (12.47)
and such that certain boundary conditions are satised.
There are a number of complications and variants to this type of problem, which is
one reason that boundary conditions were not specied, but is also why this introductory
treatment is not assessed.
There are two dierent types of constraints to consider. The simplest type depends
upon x and y, but not the derivatives y

. Such constraints play an important role


in dynamics and are known as holonomic constraints. Constraints that depend upon
y

, and cannot be reduced to a form independent of y

, are known as non-holonomic


constraints: both types of constraints are sometimes named nite subsidiary conditions,
or side-conditions. We consider holonomic constraints rst.
The simplest method of dealing with holonomic constraints is to use a coordinate
system that automatically satises the constraint. If possible this is usually the most
316 CHAPTER 12. CONSTRAINED VARIATIONAL PROBLEMS
convenient method and has the advantage that each constraint reduces the number
of dependent variables by unity. For example if there are three variables with the
constraint C(y
1
, y
2
, y
3
) = y
2
1
+y
2
2
+y
2
3
r
2
= 0, that forces the admissible paths to lie
on a sphere, it is usually better to use the two spherical polar angles (, ), where
y
1
= r sin cos , y
2
= r sin sin , y
3
= r cos .
The method described here is an alternative, and a specic example is considered in
section 12.6.2.
We assume that the m constraint equations C
j
(x, y) = 0 are suciently well behaved
that along the stationary path they can be used to express m of the dependent variables
in terms of the remaining nm variables, which means that boundary conditions for at
most nm variables need be specied. We shall assume that all holonomic constraints
are consistent with the boundary conditions. In the following proof we assume that
there is just one constraint, C(x, y) = 0.
Suppose that y(x) is the stationary path with the boundary conditions y(a) = A,
y(b) = B. If y +h is a neighbouring admissible path that also satises the constraint,
so h(a) = h(b) = 0, and for each j, h
j
(x) is in D
1
(a, b), then the Gateaux dierential
is
S[y, h] =
_
b
a
dx
n

k=1
_
F
y
k

d
dx
_
F
y

k
__
h
k
(x). (12.48)
But also C(x, y) = C(x, y +h) for all a x b, and hence
n

k=1
h
k
(x)
C
y
k
= 0, (12.49)
which shows that the variations, h
k
(x), are not independent.
Now integrate this expression over the range of x and choose the h
k
to be func-
tions peaked about x = , see equation 12.8 (page 301), so that for any suciently
dierentiable function f(x)
_
b
a
dxf(x)h
k
(x) =
k
f() +O(
3
k
)
as in equation 12.9 (page 301). Thus equations 12.48 and 12.49 become
n

k=1

k
C
y
k
= O(
3
) and
n

k=1

k
_
F
y
k

d
dx
_
F
y

k
__
= O(
3
),
all functions being evaluated at x = . Introduce a Lagrange multiplier, (), which is
a function of , and subtract these equations to obtain
n

k=1

k
_
F
y
k

d
dx
_
F
y

k
_
()
C
y
k
_
x=
= 0. (12.50)
Now choose () so that the coecient of
n
is zero. We have the freedom to choose
the remaining n 1 coecients
k
, k = 1, 2, , n 1, independently hence, using the
same argument as in section 11.2, we obtain the n Euler-Lagrange equations
d
dx
_
F
y

k
_

F
y
k
+(x)
C
y
k
= 0, y(a) = A, y(b) = B, k = 1, 2, , n. (12.51)
12.6. THE LAGRANGE PROBLEM 317
The derivation of this result assumed that there is a single holonomic constraint C(x, y).
This is not necessary; the addition of another holonomic constraint adds another La-
grange multiplier and in equation 12.51 the term
(x)
C
y
k
is replaced by
1
(x)
C
1
y
k
+
2
(x)
C
2
y
k
.
A common type of problem involving a single holonomic constraint is described in
section 12.6.2.
12.6.1 A single non-holonomic constraint
In order to be specic consider a single non-holonomic constraint, that is m = 1, and
n = 2 in equations 12.46 and 12.47. Assume rst that the boundary conditions,
y
1
(a) = A
1
, y
2
(a) = A
2
, y
1
(b) = B
1
, y
2
(b) = B
2
,
are prescribed. But the constraint C(x, y
1
, y
2
, y

1
, y

2
) = 0 can, provided C
y

2
= 0, be
inverted to express y

2
as a function of all the other variables. If we assume that y
1
is
known, as it would be if the stationary paths had been found, then the constraint gives
another rst-order dierential equation for y
2
: integration gives one arbitrary constant,
which may be chosen to satisfy the boundary condition y
2
(a) = A
2
, but there is no
guarantee that the other boundary condition, y
2
(b) = B
2
, will be satised. In these
circumstances it is usually necessary to impose fewer boundary conditions and rely on
natural boundary conditions to supply the rest. Because there are many combinations
of imposed and natural boundary conditions we provide a avour of the theory by
quoting a theorem valid for the restricted set of imposed conditions,
y
1
(a) = A
1
, y
2
(a) = A
2
, y
1
(b) = B
1
,
and a natural boundary condition on y
2
at x = b.
Theorem 12.3
Given the functional
S[y] =
_
b
a
dxF(x, y
1
, y
2
, y

1
, y

2
), y
1
(a) = A
1
, y
2
(a) = A
2
, y
1
(b) = B
1
, (12.52)
with the single constraint
C(x, y
1
, y
2
, y

1
, y

2
) = 0 where C
y

2
= 0, a x b, (12.53)
then if y
1
(x) and y
2
(x) are twice continuously dierentiable, stationary paths of this
system, there exists a Lagrange multiplier, (x), such that
S[y] =
_
b
a
dxF(x, y
1
, y
2
, y

1
, y

2
), y
1
(a) = A
1
, y
2
(a) = A
2
, y
1
(b) = B
1
, (12.54)
where F = F (x)C, is stationary on this path and satises the natural boundary
condition
F
y

2
= F
y

2
C
y

x=b
= 0. (12.55)
The solution of the associated Euler-Lagrange equation will depend upon (x), which
is determined by substituting the solution into the constraint equation 12.53.
318 CHAPTER 12. CONSTRAINED VARIATIONAL PROBLEMS
12.6.2 An example with a single holonomic constraint
A simple problem with a single holonomic constraint involves nding geodesics on the
surface of a right circular cylinder. Consider such a surface in Oxyz, with equation
x
2
+y
2
= a
2
: we require the geodesics on this surface through points with coordinates
(a, 0, 0) and (a cos , a sin, b). Let the paths be parameterised by a variable 0 t 1,
so the distance along a path is
S[x, y, z] =
_
1
0
dt
_
x
2
+ y
2
+ z
2
with the constraint x(t)
2
+y(t)
2
= a
2
. (12.56)
Using the Euler-Lagrange equations 12.51 we see that
d
dt
_
x

_
+ 2x = 0,
2
= x
2
+ y
2
+ z
2
,
d
dt
_
y

_
+ 2y = 0 and
d
dt
_
z

_
= 0,
It is now helpful to use s, the arc length along the curve for the independent variable,
so that s
2
=
2
. First we note that
d
dt
_
x

_
= s
d
ds
_
1

dx
ds
s
_
= s
d
2
x
ds
2
.
Since t is an arbitrary parameter we may put t = s to reduce the three Euler-Lagrange
equations (since now s = 1) to
x

(s) + 2(s)x(s) = 0, y

(s) + 2(s)y(s) = 0, z

(s) = 0. (12.57)
We now show that the Lagrange multiplier is a constant, which makes the integration
of these equations easy. Dierentiating the constraint twice with respect to s gives
xx

+yy

+x
2
+y
2
= 0.
But, from the denition of s we have x
2
+y
2
+z
2
= 1 and together with equations 12.57
we obtain 2a
2
+ 1 z
2
= 0, and hence =constant (since z

is a constant). The
solution of equations 12.57 that t the initial conditions are, with
2
= 2,
x = a cos s, y = a sins, z = s,
for some constant . If the length of the curve is S then S = +2n, for some integer
n, and S = b. Dening a new variable = s we obtain a parametric representation
of a geodesic,
x = a cos , y = a sin , z =
b
2n +
, 0 2n +. (12.58)
For this example it is far easier to use cylindrical polar coordinates, see exercise 3.20
(page 118), which automatically satisfy the constraint.
12.7. BRACHISTOCHRONE IN A RESISTING MEDIUM 319
Exercise 12.18
Consider the functional
S[y, z] =
_
b
a
dx
_
y
2
+z
2
y
2
_
, y(a) = A1, z(a) = A2, y(b) = B1,
with the constraint C(z, y

) = z y

= 0 and with a natural boundary condition


for z(b).
(a) Show that the Euler-Lagrange equations can be written in the form
d
4
y
dx
4

d
2
y
dx
2
y = 0, y(a) = A1, y

(a) = A2, y(b) = B1, y

(b) = 0,
with z = y

and = y

.
(b) Show that this equation for y(x) can be derived from the associated functional
of the single dependent variable
J[y] =
_
b
a
dx
_
y
2
+y
2
y
2
_
, y(a) = A1, y

(a) = A2, y(b) = B1


and with a natural boundary condition for y

(b).
12.7 Brachistochrone in a resisting medium
The modication of the brachistochrone problem to include a resistance is of historical
importance and was rst successfully treated by Euler (1707 1783) in chapter 3 of
his 1744 volume The method of Finding Plane Curves that Show Some Property of
Maximum or Minimum . . . . Indeed it was Euler who rst considered the problems
described here and in chapter 10. The problem considered here is dicult, requiring
many of the techniques and ideas developed earlier in the course, and is therefore good
revision even though this section is not assessed. Euler, on the other hand, developed
these techniques in order to solve this type of problem.
The analysis that follows is dicult and follows that outlined by Pars
1
(1965, chap-
ter 8). You may nd it hard to understand why certain steps are taken but, as usual
with any complicated problem, there is often no simple explanation and what is written
down is the result of trial and many errors: the blind alleys cannot be shown.
There are a variety of types of resistance that can be considered and here we follow
Euler by assuming that the resistance depends only upon the speed, v, of the particle.
This is a more dicult problem than that dealt with in chapter 5 because now energy
is not conserved, which means that there is not a simple relation between the speed
and the height of the particle, as in equation 5.2 (page 149). Instead we need to use
Newtons equation of motion, which here takes on the role of a constraint. First we
need to derive this equation in an appropriate form.
1
L A Pars, An Introduction to the Calculus of Variations, (Heinemann).
320 CHAPTER 12. CONSTRAINED VARIATIONAL PROBLEMS
Newtons equation of motion
For a particle of mass, m, sliding along a smooth, rigid wire a natural variable for the
description of its position is the distance, s, measured along the wire, from the starting
point. The Cartesian coordinates of the initial point are taken to be (x, y) = (0, A),
and here s = 0; we take the y-axis to be vertically upwards. There are two forces acting
on the particle, the downward force of gravity and the resistance, that depends upon
the speed, v = s and acts tangentially (because the wire is smooth) so as to slow the
motion.
R(v)
x
O

s
mg
P
x
s
N
b
A
y
Figure 12.5 Diagram showing the forces acting on the particle, assuming that the
distance AP is increasing with time. The line PN is the tangent to the curve at
the instantaneous position P, and makes an angle with the downward vertical.
For a particle at P, consider the tangent PN, gure 12.5, to the curve which makes an
angle with the downward vertical, and let s be the distance along the curve from the
starting point, increasing with x. The component of the vertical force of gravity along
the tangent, PN, in the direction of increasing s, is mg cos .
If the magnitude of the resistance per unit mass is R(v), where R(v) is a positive
function such that
2
R(0) = 0, then by resolving forces along the tangent at P, Newtons
equation becomes
m
d
2
s
dt
2
= mR(v) +mg cos . (12.59)
The chain rule gives
d
2
s
dt
2
=
dv
dt
=
dv
ds
ds
dt
= v
dv
ds
and since y = s cos the equation of motion can be written as the rst-order
equation
v
dv
ds
= R(v) g
dy
ds
. (12.60)
We consider only cases where initially the particle is either stationary or moving down-
wards with a speed such that R(v) is small compared with the gravitational force, g
per unit mass. Thus, v

(s) is initially increasing. Subsequently there are two possible


types of motion:
A: v(s) steadily increases until the terminal point is reached, or;
B: v(s) increases to a maximum value at which v

(s) = 0, so here gy

(s) = R(v) < 0,


2
A typically approximation to assume that R is proportional to v
2
, see section 3.5.3, but this is
poor for low speeds, when R is proportional to v, and fails near the speed of sound.
12.7. BRACHISTOCHRONE IN A RESISTING MEDIUM 321
after which v(s) decreases to its value at the terminal point.
We assume that the actual motion is either type A or type B; it will be seen that the
distinction between these two types of motion is important.
Exercise 12.19
If the wire is vertical, so s = y, and the particle starts from rest at s = 0, and
R(v) = v
2
, for some constant , show that the equation of motion 12.60 becomes
v
dv
ds
= v
2
+g and hence show that v
2
=
g

_
1 e
2s
_
, for a particle start-
ing at rest where s = 0.
Note that as s , v
_
g/ and approaches this limiting or terminal speed
monotonically.
The functional and boundary conditions
Now consider the integral for the time taken to travel between two given points (0, A)
and (b, 0), along a curve parameterised by [0, 1]. The time of passage, T, is given
by
T =
_
T
0
dt =
_
1
0
d
dt
d
. (12.61)
If the coordinates of points on the curve are (x(), y()), by denition,
v =

_
dx
dt
_
2
+
_
dy
dt
_
2
=
_
x

()
2
+y

()
2
d
dt
,
and hence the functional for the three variables x(), y() and v() is
T[x, y, v] =
_
1
0
d
_
x

()
2
+y

()
2
v
, (12.62)
where a prime denotes dierentiation with respect to . Now express the equation of
motion in terms of the independent variable , rather than s. Since
ds
d
=
_
x

()
2
+y

()
2
(12.63)
equation 12.60 becomes, on using the chain rule, v
dv
d
d
ds
= R(v) gy

d
ds
, that is,
vv

+R(v)
_
x

()
2
+y

()
2
+gy

= 0.
This constraint is satised by the three variables, so the auxiliary functional is
T[x, y, v] =
_
1
0
d F(x

, y

, v, v

), (12.64)
where
F = H(, v)
_
x

()
2
+y

()
2
vv

gy

with H(, v) =
1
v
()R(v) (12.65)
322 CHAPTER 12. CONSTRAINED VARIATIONAL PROBLEMS
and where () is the Lagrange multiplier, that depends upon the independent variable,
here .
There are ve known boundary conditions. The initial values of (x, y, v) are assumed
known, and given by (0, A, v
0
), and the nal values of (x, y) are given by (b, 0). The
nal value of v is not known, because this depends upon the path taken. For this we
use the natural boundary condition, equation 12.55 (page 317), at = 1,
F
v

= (1)v(1) = 0. (12.66)
Assuming that v(1) = 0, this gives (1) = 0. In exercise 12.20 it is shown that () < 0
for 0 < 1, and hence that H(, v) > 0.
Exercise 12.20
(a) Show that

() > 0 at = 1.
(b) If (1) = 0 for 0 < 1 < 1, show that

(1) > 0. Deduce that () < 0 for


0 < 1, and that H(, v) > 0 for 0 1.
Hint for part (a) you will need the Euler-Lagrange equation for (), given in
equation 12.71.
The Euler-Lagrange equations and their solution
The Euler-Lagrange equations for x and y are particularly simple because F does not
depend upon either x or y. Thus the equation for x is
d
d
_
x

H
_
x

()
2
+y

()
2
_
= 0 that is
x

H
_
x

()
2
+y

()
2
= , (12.67)
where is a constant. Because we expect x() to be an increasing function of ,
must be a positive constant.
The Euler-Lagrange equation for y is
d
d
_
y

H
_
x

()
2
+y

()
2
g
_
= 0 that is
y

H
_
x

()
2
+y

()
2
= g , (12.68)
where is another constant. Since (1) = 0 it follows that for type A motion, in which
y

() < 0 for all , > 0; for type B motion during which y

() changes sign we must


have, < 0. It is shown how the values of the constants and may be determined
by expressions derived at the end of this calculation.
It is now helpful to use s as the independent variable because, using equation 12.63,
1
_
x

()
2
+y

()
2
dx
d
=
dx
ds
and
1
_
x

()
2
+y

()
2
dy
d
=
dy
ds
and hence equations 12.67 and 12.68 have the simpler form
H(s, v)
dx
ds
= , (12.69)
H(s, v)
dy
ds
= g , H =
1
v
(s)R(v). (12.70)
12.7. BRACHISTOCHRONE IN A RESISTING MEDIUM 323
The third Euler-Lagrange equation, for v, is
d
d
(v)
_
x

()
2
+y

()
2
H
v
+v

= 0
and this simplies to
v
d
d
+
_
x

()
2
+y

()
2
H
v
= 0. (12.71)
Again using s for the independent variable gives the simpler equation
v
d
ds
= H
v
. (12.72)
Equations 12.69, 12.70 and 12.72 are the three Euler-Lagrange equations that we need
to solve. The remaining analysis is dicult partly because we change variables several
times and partly because it is necessary to keep in mind the expected behaviour of the
solution: in particular the two types of motion described before exercise 12.19 need to
be treated slightly dierently.
Since x

(s)
2
+y

(s)
2
= 1, squaring and adding equations 12.69 and 12.70 gives
H
2
=
2
+ (g )
2
that is
_
1
v
R
_
2
=
2
+ (g )
2
(12.73)
where we have used the denition 12.65. This is a quadratic equation for and hence
can be used to express as a function of v.
Before solving this equation consider its value at the terminal point, = 1, where
(1) = 0. If the speed at the terminus is V
t
, this equation gives
V
2
t
=
1

2
+
2
,
a result needed later.
It is helpful to concentrate rst on the type A motion, in which the speed steadily
increases. Then > 0, the maximum speed is at the terminus, max(v) = V
t
, conse-
quently during the motion v
2
>
2
+
2
. The quadratic equation 12.73 can be written
in the form

2
_
g
2
R
2
_
2
_
g
R
v
_

_
1
v
2

2

2
_
= 0. (12.74)
In general air resistance is relatively small, so we assume that g > R for the range of
speeds considered. Then this quadratic equation has the solutions
_
g
2
R
2
_
=
_
g
R
v
_

_
g
R
v
_
2
+ (g
2
R
2
)
_
1
v
2

2

2
_
. (12.75)
Since = 0 at the terminus the correct solution is given by the negative sign, and this
is conveniently written in the form
_
g
2
R
2
_

_
g
R
v
_
=
f(v)
_

2
+
2
(12.76)
324 CHAPTER 12. CONSTRAINED VARIATIONAL PROBLEMS
where f(v) is the positive function dened by
f(v)
2
=
_
(
2
+
2
)R
g
v
_
2
+
2
g
2
_
1
v
2

2

2
_
.
The rst two Euler-Lagrange equations, 12.69 and 12.70, are simplied if divided by
the third, equation 12.72, to give
dx
d
=
v
HH
v
and
dy
d
=
(g )v
HH
v
. (12.77)
These equations can be used to express (x, y) as integrals over known functions of the
speed v. First we need to express HH
v
in terms of known quantities: dierentiate H
with respect to v,
dH
dv
= H
v
+
d
dv
H

= H
v
R(v)
d
dv
,
where we have used equation 12.65 for H. Similarly, from equation 12.73
H
dH
dv
= g(g )
d
dv
,
and on combining these two results
H
_
H
v
R
d
dv
_
= g(g )
d
dv
that is HH
v
=
_
_
g
2
R
2
_

_
g
R
v
__
d
dv
.
(12.78)
Observe that the right-hand side of this equation is proportional to the left-hand side
of equation 12.76 for , and hence
HH
v
dv
d
=
f(v)
_

2
+
2
. (12.79)
But, using the chain rule equations 12.77 can be written in the form
dx
dv
=
v
HH
v
d
dv
and
dy
dv
=
(g )v
HH
v
d
dv
so that using equation 12.79 these can be written as a pair of uncoupled rst-order
dierential equations,
dx
dv
=
_

2
+
2
v
f(v)
and
dy
dv
=
_

2
+
2
( g)v
f(v)
. (12.80)
Notice that x

(v) > 0 and y

(v) < 0, since 0 and > 0 (for type A motion).


The right-hand sides of these equations are functions of v, with (v) being given
by equation 12.76. Integration, and taking account of the initial conditions, gives the
equation of the curve in the form
x(v) =
_

2
+
2
_
v
v0
dv
v
f(v)
, (12.81)
y(v) = A
_

2
+
2
_
v
v0
dv
( g)v
f(v)
. (12.82)
12.7. BRACHISTOCHRONE IN A RESISTING MEDIUM 325
Using equation 12.76 for we obtain
g =
1
g
2
R
2
_
gR
v
R
2
_
+
g
g
2
R
2
f(v)
_

2
+
2
so that the equation for y(v) becomes
y(v) = A g
_
v
v0
dv
v
g
2
R
2

_

2
+
2
_
v
v0
dv
gR vR
2
f(v)(g
2
R
2
)
. (12.83)
These expressions depend upon the unknown constants and , which are obtained
using information about the terminal point at which
v = V
t
=
1
_

2
+
2
, x(V
t
) = b and y(V
t
) = 0.
Thus the end conditions give two equations for the unknown constants and in terms
of the given parameters b and A. These equations are, however, nonlinear so are dicult
to solve: this diculty is compounded by the fact that the relations can usually only
be determined by numerically evaluating the integrals. Physical considerations suggest,
however, that for any pair of values (b, A) a solution exists.
Equations 12.81 and 12.82 dene the stationary path parametrically, with the speed
v as the parameter. They are therefore directly equivalent to equations 5.8 (page 151),
in which the parameter is the angle . Further, in the limit R(v) = 0 these equations
should reduce to those found previously: it is important that we establish that this is
true in order to check the derivation.
Exercise 12.21
In this exercise the limit R = 0 is considered and it is shown that equations 12.81
and 12.83 reduce to the conventional parametric equations of the cycloid.
(a) Show that in the limit R = 0 equation 12.83 for y(v) reduces to the energy
equation
mg(Ay) =
1
2
mv
2

1
2
mv
2
0
.
(b) Show that if R = 0,
_

2
+
2
f(v)
=
v
g

1
2
v
2
and hence that equation 12.81 for x(v) becomes
x(v) =

g
_
v
v
0
dv
v
2

1
2
v
2
.
(c) Using the substiution v = sin and setting v0 = 0, show that the equation
found in parts (a) and (b) become
x =
c
2
2
(2 sin 2), y = A c
2
sin
2
, c
2
=
1
2
2
g
, 0
b
.
(d) Show also that g = / tan, and hence that = / tan
b
. Deduce that
> 0 if 0
b
< /2 and < 0 if /2 <
b
< ; explain the signicance of the
condition
b
= /2.
326 CHAPTER 12. CONSTRAINED VARIATIONAL PROBLEMS
In general equations 12.81 and 12.82 can be dealt with only using numerical methods,
and this is not easy because it is necessary to solve two coupled nonlinear equations,
which can be evaluated only by numerical integration. However, if the resistance is
relatively small we expect the stationary path to be close to that of the cycloid of the
resistance free motion, which suggests making an expansion in powers of R(v). Such
an analysis also helps check numerical solutions.
In order to facilitate this expansion it is helpful to replace R(v) by R(v), where
is a small positive, dimensionless, quantity, and to use to keep track of the expansion.
An approximation to f(v), to order , can be written,
_

2
+
2
f(v)
=
v
g
_
1
2
v
2

2
g
vR(v)
+O(
2
).
so that equation 12.81 for x(v) becomes
x(v) =

g
_
v
v0
dv
v
2
_
1
2
v
2

2
g
vR(v)
=

g
_
v
v0
dv
v
2

1
2
v
2
_
1 +

g
vR(v)
1
2
v
2
+O(
2
)
_
and equation 12.83 for y(v) becomes, to this order
y(v) = A
1
2g
_
v
2
v
2
0
_


g
2
_
v
v0
dv
vR(v)

1
2
v
2
.
We now set v
0
= 0 and use the same substitution as used in exercise 12.21, v = sin,
to write these relations in the form
x() =
1
4
2
g
(2 sin 2) +

3
g
2
_

0
d
sin
3

cos
2

R(v) (12.84)
y() = A
1
2
2
g
sin
2

2
g
2
_

0
d sin R(v). (12.85)
At the terminal point (x, y) = (b, 0), if =
b
we have =
_

2
+
2
sin
b
, which can
be rearranged to give = / tan
b
, so the two unknown parameters are now and
b
.
It is now necessary to choose a particular function for the resistance: a natural
choice is R = v
2
, where is a constant (with the dimensions of inverse length). Then
equations 12.84 and 12.85 become
x() =
1
4g
2
(2 sin 2) +

g
2

5
G
1
() (12.86)
y() = A
sin
2

2g
2


g
2

4
G
2
() (12.87)
where v = sin and
G
1
() =
_

0
d
sin
5

cos
2

=
1
cos

8
3
+
7
4
cos
1
12
cos 3,
G
2
() =
_

0
d sin
3
=
2
3

3
4
cos +
1
12
cos 3.
12.7. BRACHISTOCHRONE IN A RESISTING MEDIUM 327
The rst task is to determine the values of and
b
from the terminal conditions. This
is facilitated by noting that the equation y(
b
) = 0 is a quadratic in 1/(g
2
),

g
2

4
G
2
(
b
) +
sin
2

b
2g
2
A = 0.
The quadratic term is proportional to so one of the roots behaves as
1
, as 0,
and since we require a root that is nite when there is no resistance, the relevant
solution is
1
g
2
=
4A
sin
2

b
+
_
sin
4

b
16AG
2
(
b
)
. (12.88)
This expression denes in terms of
b
, but numerical calculations show that it is real
only for small .
Using the equation = / tan
b
for allows us to write the equation x(
b
) = b in
the form
b =
1
4g
2
(2
b
sin2
b
) +

g
2

4
tan
b
G
1
(
b
). (12.89)
Since g
2
is given in terms of
b
by equation 12.88 this is a single equation for
b
that
can be solved numerically.
In gure 12.6 we show an example of such a solution. For the purposes of illustration
we choose g = 1 and take the end points to be (0, A), with A = 2/( 2), and (b, 0),
with b = 1, so for the cycloid
b
= /4. For these parameters it is necessary that
< 0.135 (approximately) for (
b
) to be real, so we take = 0.12. As might be
expected the resistance forces the stationary path below that of the cycloid, on to a
path that is initially steeper.
0 0.2 0.4 0.6 0.8 1
0.5
1
1.5
2
Cycloid
y
With resistance
x
L R
Figure 12.6 An example of a stationary path of a brachistochrone with resis-
tance, with end points given by A = 2/(2) and b = 1. The other parameters
used are dened in the text.
Now return briey to case B when the speed reaches a maximum value along the path,
so v(s) is not a monotonic increasing function for all s. If v

(s) = 0 at some intermediate


point where s = S
m
and v = V
m
, then the equation of motion 12.60 shows that at this
point gy

(S
m
) = R(V
m
) < 0, that is y(s) is still decreasing, so the maximum speed is
reached before the lowest point of the path; this is contrary to the case R = 0 where
energy conservation ensures that these points coincide. Substituting this value of y

into the Euler-Lagrange equation 12.70 for y

(s) gives the relation

_
g
2
R
2
_
= g
R
v
,
328 CHAPTER 12. CONSTRAINED VARIATIONAL PROBLEMS
which, on comparing with equation 12.76, gives f(V
m
) = 0. Prior to this point the speed
is increasing to its maximum, V
m
and y

(s) < 0; subsequently v decreases steadily to the


speed at the terminus. The vertical component of the velocity changes when y

(s) = 0.
This situation is summarised in gure 12.7.
x
y
A
y(s)=0 =0, g
f(v)=0, v(s)=0
Figure 12.7 Diagram showing where v

(s) and y

(s) are zero.


On the rst part of the path v

(s) > 0 and g < 0 and we have


g =
g
g
2
R
2
f(v)
_

2
+
2

gR vR
2
v(g
2
R
2
)
, < 0.
On the second part of the path g = 0 at some point and
g =
g
g
2
R
2
f(v)
_

2
+
2

gR vR
2
v(g
2
R
2
)
, < 0.
We now use the limiting case, R = 0 dealt with in exercise 12.20, to suggest how this
problem may be simplied. Assuming that at v = V
m
, f(v)
2
has a simple zero, it is
convenient to factor f(v) in the form f(v)
2
= (V
2
m
v
2
)f
1
(v), where f
1
(v) > 0 for
0 v V
m
. Now dene a new parameter [0, ] by v = V
m
sin , so v increases for
< /2 and decreases for > /2, and then f(v) = V
m
_
f
1
(v) cos . If R = 0 then
V
m
= 1/ and this is the same parameter used for the cycloid. The two expressions for
g can now both be written in the form
g =
gV
m
cos
g
2
R
2
_
f
1
(v)
_

2
+
2

gR vR
2
v(g
2
R
2
)
, v = V
m
sin .
In terms of equations 12.80 for x and y become
dx
d
=
_

2
+
2
v
_
f
1
(v)
and
dy
d
=
_

2
+
2
v(g )
_
f
1
(v)
.
Substituting for g and integrating gives
x() =
_

2
+
2
_

0
d
v
_
f
1
(v)
= V
2
m

2
+
2
_

0
d
sincos
f(v)
, (12.90)
and
y() = gV
2
m
_

0
d
sin cos
g
2
R
2

_

2
+
2
_

0
d
gR vR
2
(g
2
R
2
)
_
f
1
(v)
= A g
_
v
v0
dv
v
g
2
R
2
V
m
_

2
+
2
_

0
d
cos
f(v)
gRvR
2
g
2
R
2
. (12.91)
12.8. BRACHISTOCHRONE WITH COULOMB FRICTION 329
The rst-integral in this expression is the equivalent of the kinetic energy discussed in
part (a) of exercise 12.21, to which it reduces when R = 0. Further, for < /2 these
two equations for (x(), y()) are identical to equations 12.81 and 12.82, but now they
are valid for all . The two equations for and are obtained by integrating to
t
,
where V
t
= V
m
sin
t
and where
t
> /2 if < 0 and
t
< /2 if > 0.
Exercise 12.22
Consider the case where the initial speed, v0, is large, so that R(v0) > g, and show
that the equations for the stationary path are now
dx
dv
=
_

2
+
2
v
f(v)
and
dy
dv
= ( g)
_

2
+
2
v
f(v)
where
f(v)
2
=
_
(
2
+
2
)R
g
v
_
2

2
g
2
_

2
+
2

1
v
2
_
.
Hence show that in the limit g 0 the stationary path between the points (0, A)
and (b, 0) is the straight line y = A(1 x/b), as expected.
12.8 Brachistochrone with Coulomb friction
In this variant of the brachistochrone problem there is friction between the wire and
the bead. Coulomb friction is proportional to the normal force between the bead and
the wire and opposes the motion. Thus the force normal to the wire aects the motion,
which is not so for a smooth wire as in the conventional brachistochrone or the problem
treated in the previous section. This means that energy is not conserved, and the
simplicity of the original problem is lost, as when the bead falls through a resisting
medium. A complete solution of this problem appears to have been described only
relatively recently by Ashby et al (1975)
3
, and here we follow their analysis.
If the ratio of the horizontal to the vertical distance of the end points is large and
the initial speed is zero, the frictional forces must be small for a stationary path to
exist. As this ratio increases we expect the critical value of the friction, beyond which
there is no stationary path, to decrease: this behaviour is dicult to see in the exact
solution but is illustrated in exercises 12.23, 12.24 and 12.32.
Newtons equation of motion
The Cartesian coordinates of the end points of the wire are taken to be (x, y) = (0, A),
for the starting point, and (b, 0) for the terminus, with A > 0 and b > 0, and where
the y-axis is vertically upwards. If m is the mass of the bead this conguration and
the forces acting on the bead are shown in gure 12.8. The gradient of the wire at the
bead is tan = dy/dx, where y(x) is the required curve.
3
N Ashby, W E Brittin, W F Love and W Wyss, Amer J Phys 1975, 43 pages 902-6.
330 CHAPTER 12. CONSTRAINED VARIATIONAL PROBLEMS
N

y
x
b
N
mg
y
x

wire
mg
A
Figure 12.8 Diagram showing the wire and its terminal points, on the left, and the
forces acting on the bead on the right: here N is the force normal to the wire.
There are three forces acting on the bead, as shown on the right of gure 12.8; that due
to gravity, the force N normal to the wire, which does not directly aect the motion, and
the frictional force of magnitude N directed along the wire and opposing the motion.
Here is the constant coecient of friction and 0. For the reason discussed above
for a given value of we expect no stationary paths if b/A is too large.
The forces on the bead in the x- and y-directions are obtained directly by resolving
the forces shown in the inset of gure 12.8,
F
x
= N(sin +cos ), F
y
= N(cos sin ) mg, (12.92)
so the force in the tangential direction is
F
T
= F
x
cos +F
y
sin = N mg sin . (12.93)
Newtons equations of motion are therefore
m x = N(sin +cos ), (12.94)
m y = N(cos sin) mg, (12.95)
where we use the notation (due to Newton) x = dx/dt and x = d
2
x/dt
2
. Along the
wire, if v is the speed
m v = F
T
= N mg sin . (12.96)
Eliminating N from equations 12.94 and 12.95 gives
m( xsin y cos ) = N +mg cos . (12.97)
But also
x = v cos and y = v sin , (12.98)
and by dierentiation we see that xsin y cos = v

, so that equation 12.97 becomes


N = mv

+mg cos . (12.99)
By substituting this into equation 12.96, for the tangential motion, we obtain the equa-
tion of motion
v + (v

+g cos ) +g sin = 0. (12.100)


12.8. BRACHISTOCHRONE WITH COULOMB FRICTION 331
Using equation 12.98 this equation can be written in the alternative form
v v +v
2

+g x +g y = 0. (12.101)
In this equation ( x, y) are related to v and

, by geometry, equation 12.98; squaring
and adding these equations gives the obvious identity v
2
= x
2
+ y
2
, which is one of the
constraints on the functional. Dierentiation of equations 12.98 gives

=
y cos xsin
v
=
y x x y
v
2
.
This relation, together with the equation of motion 12.101, is the other constraint.
Exercise 12.23
A bead slides on a rough wire joining (0, A) to (b, 0) in a straight line, starting
from (0, A) with speed v0.
Show that provided v
2
0
> 2g(b A) the bead reaches the terminus at the time
t =
2

A
2
+b
2
v0 +
_
v
2
0
+ 2g(Ab)
.
Exercise 12.24
Consider a wire in the shape of the quadrant of a circle of radius R, centre at
(R, R) joining the points (0, R) and (R, 0). The coordinates of a point on this
quadrant can be expressed in terms of the angle ,
x = R(1 cos ), y = R(1 sin ), 0

2
,
with increasing from 0 at (0, R) to /2 at (R, 0).
(a) Show that = /2 where is the angle dened in gure 12.8.
(b) Show that the equation of motion of the bead on the wire is
v
dv
d
+v
2
= gR(cos sin ).
(c) By making an appropriate change of variable deduce, without solving the equa-
tion, that if v(0) = 0 the value of for which v(/2) = 0 is independent of R.
(d) By solving the dierential equation derived in part (b) with v(0) = 0 show
that v(/2) = 0 for = 1 where 1 is the solution of
2
2
+ 3e

= 1.
Deduce that if is slightly larger that 1 the bead does not reach the terminus.
332 CHAPTER 12. CONSTRAINED VARIATIONAL PROBLEMS
The functional and boundary conditions
The time of passage, T, is given by equation 12.61 (page 321),
T =
_
1
0
d
dt
d
(12.102)
where is the parameter dening the position along the path if the natural variable,
t the time, were used the required quantity T would appear as a limit in the integral,
which is inconvenient.
This functional has two constraints: the equation of motion 12.101 and the relation
between v and ( x, y), so this is a Lagrange problem with two multipliers. The constraints
need to be expressed in terms of . For v
v() =

_
dx
d
d
dt
_
2
+
_
dy
d
d
dt
_
2
=
_
x

()
2
+y

()
2
t

()
,
with a prime denoting dierentiation with respect to . For

, since tan = y/ x =
y

/x

, dierentiation gives
1
cos
2

d
d
=
y

x
2
hence
d
d
=
y

x
2
+y
2
=
y

v
2
t
2
.
Thus the equation of motion 12.101 becomes
vv

+gy

+
y

t
2
+gx

= 0.
The auxiliary functional is therefore
T[x, y, v, t] =
_
1
0
d F(x

, x

, y

, y

, v, v

, t

) (12.103)
where
F = t

+
1
_
vv

+gy

+
y

t
2
+gx

_
+
2
_
_
x
2
+y
2
vt

_
, (12.104)
with both the Lagrange multipliers,
1
and
2
, depending upon . The dependent
variables are (x, y, v, t) and the functional contains second derivatives of x and y.
The known boundary conditions at the start, = 0, are
x(0) = 0, y(0) = A > 0, v(0) = v
0
0, t(0) = 0, (12.105)
and at the terminus, = 1,
x(1) = b, y(1) = 0. (12.106)
The remaining conditions are determined by the natural boundary conditions: for x
and y,
F
x

=
1
y

t
2
= 0,
F
y

=
1
x

t
2
= 0, at = 0 and 1, (12.107)
12.8. BRACHISTOCHRONE WITH COULOMB FRICTION 333
and for v at the terminus
F
v

=
1
(1)v

(1) = 0. (12.108)
This gives
1
(1) = 0 and hence the boundary condition 12.107 at the terminus is
automatically satised.
The four Euler-Lagrange equations are obtained from the derivatives
F
t
= 0,
F
t

= 1 2
1

t
3

2
v
F
v
=
1
v

2
t

,
F
v

=
1
v,
F
x

=

1
y

t
2
+
1
g +

2
x

_
x
2
+y
2
,
F
x

1
y

t
2
,
F
y

1
x

t
2
+
1
g +

2
y

_
x
2
+y
2
,
F
y

=

1
x

t
2
.
From these expressions we obtain the four Euler-Lagrange equations in terms of , after
which we may replace by t (because the choice of parameter is arbitrary). Thus the
four following Euler-Lagrange equations are obtained

2
v + 2
1
v
2

= c
1
(for t), (12.109)
v

1
+
2
= 0 (for v), (12.110)
2
1
y + y

1
+
1
g +
2
cos = c
x
(for x), (12.111)
2
1
x + x

1
g
2
sin = c
y
(for y), (12.112)
where c
1
, c
x
and c
y
are integration constants. These four equations, together with the
constraints allow a solution to be found; remarkably these equations can be integrated
in terms of known functions, though this process is not simple.
Using equations 12.110 and 12.98 we see that x

1
=
2
cos and y

1
=
2
sin .
Equation 12.109 gives
2
in terms of
1
, and the second derivatives in equations 12.111
and 12.112, x and y, may be replaced by the rst derivatives v and

using
x = v cos v

sin , y = v sin +v

cos ,
so equations 12.111 and 12.112 become, respectively,
2
1
_
v +v

_
sin +
1
g + (cos sin )
c
1
v
= c
x
, (12.113)
2
1
_
v +v

_
cos
1
g (sin +cos )
c
1
v
= c
y
. (12.114)
Now note that the combination v + v

also occurs in the equation of motion 12.101,


which can therefore be used to obtain two algebraic equations relating v and
1
. Thus
equations 12.113 and 12.114 become

1
g
_
1 2sin cos 2 sin
2

_
+ (cos sin )
c
1
v
= c
x
, (12.115)

1
g
_
1 + 2sin cos + 2
2
cos
2

_
(sin +cos )
c
1
v
= c
y
. (12.116)
334 CHAPTER 12. CONSTRAINED VARIATIONAL PROBLEMS
These equations are linear in
1
g and c
1
/v so may be solved directly to give
v() =
cos
Bh()
where h() = 1 + 2sin cos + 2C cos
2
(12.117)
and

1
g = Bc
1
(C + tan ) where B =
c
x
c
y
c
1
(1 +
2
)
, C =
c
y
+c
x
c
x
c
y
. (12.118)
Thus both v and
1
are explicit functions of .
If v(0) = 0 the initial value of satises cos = 0, and physical considerations give
(0) = /2; that is, the stationary curve is initially vertical, as in the conventional
problem.
Because v is a function of it is possible to express x and y as rst-order dierential
equations with as the independent variable. First note that

= 1/t

(), then
x =
x

()
t

()
= v() cos that is
dx
d
= v
dt
d
cos and similarly
dy
d
= v
dt
d
sin .
An expression for t

() is obtained from the equation of motion 12.101 by dividing by

to give
v (v

+v) +gt

() (sin +cos ) = 0,
that is
g
dt
d
=
v

+v
sin +cos
.
Using equation 12.117 in this expression it becomes, after some algebra,
gB
dt
d
=
2
h
2

1
h
. (12.119)
Hence the dierential equations for x() and y() are
gB
2
dx
d
=
_
2
h
3

1
h
2
_
cos
2
and gB
2
dy
d
=
_
2
h
3

1
h
2
_
sin cos . (12.120)
At the terminus, where =
1
,
1
= 0 so equation 12.118 relates C to
1
, C = tan
1
.
Thus the equations for the stationary path are
x(, B) =
1
gB
2
_

/2
d
_
2
h
3

1
h
2
_
cos
2
, x(
1
, B) = b, (12.121)
y(, B) = A +
1
gB
2
_

/2
d
_
2
h
3

1
h
2
_
sin cos , y(
1
, B) = 0. (12.122)
The two boundary conditions give two equations for B and
1
which may be solved
(numerically) to yield the stationary path. Some examples of the solutions of these
equations are shown in gure 12.9; here the frictionless case ends tangentially to the
x-axis and if > 0 the stationary path dips below the x-axis, but too little to be seen
on this graph.
12.8. BRACHISTOCHRONE WITH COULOMB FRICTION 335
0 0.2 0.4 0.6 0.8 1 1.2 1.4
0
0.2
0.4
0.6
0.8
1
=0
x
y
=0.5
=0.3
=0.2
Figure 12.9 Graphs of the curves traced out by equations 12.121 and 12.122
for the terminal points (0, 1) and (/2, 0) for which the frictionless brachis-
tochrone, = 0, ends tangentially to the x-axis; this is depicted by the dashed
line. The cases = 0.2, 0.3 and 0.5 are shown.
In gure 12.10 is shown stationary paths with the end points (0, 1) and (5, 0) for which
the frictionless brachistochrone dips below the x-axis. In this case the distance travelled
is longer than in gure 12.9 and the value of above which there is no stationary path
is smaller, as illustrated in the problems considered in exercises 12.24 and 12.32.
1 2 3 4 5
-0.5
0
0.5
1
=0
x
y
=0.2
=0.1
=0.05
=0.15
Figure 12.10 Graphs of the curves traced out by equations 12.121 and 12.122
for the terminal points (0, 1) and (5, 0) and various values of , with the case
= 0 shown with the dashed line.
Exercise 12.25
Assuming that v0 = 0 show that at the end points h() = 1, where h() is dened
in equation 12.117, and that h() has a single minimum at = 1/2 /4.
Find the minimum value of h() and deduce that solutions exist only if
tan
_
1
2
+

4
_
< 1.
336 CHAPTER 12. CONSTRAINED VARIATIONAL PROBLEMS
Exercise 12.26
In the friction free limit, = 0, show that equations 12.121 and 12.122 give
x =
1
4gB
2
(2 sin 2) and y = A
1
4gB
2
(1 cos 2), =

2
+,
and that 1 is related to A by
b
A
=
21 sin 21
1 cos 21
, 1 =

2
+1.
12.9. MISCELLANEOUS EXERCISES 337
12.9 Miscellaneous exercises
Exercise 12.27
(a) Show that the functional
S[y] =
_

0
dxy
2
y(0) = y() = 0,
subject to the constraint
_

0
dx y
2
= 1, gives rise to the equation
d
2
y
dx
2
+y = 0, y(0) = y() = 0,
where is the Lagrange multiplier.
(b) Show that the functions y(x) =
_
2/ sin nx, with Lagrange multiplier = n
2
,
n = 1, 2, , are solutions of this equation.
Exercise 12.28
(a) Show that the functional, which is quadratic in y and y

,
S[y] =
_
b
a
dx
_
p(x)y
2
q(x)y
2
_
, y(a) = y(b) = 0,
and the constraint
_
b
a
dxw(x)y(x)
2
= 1 leads to the linear equation
d
dx
_
p(x)
dy
dx
_
+ (q(x) +w(x))y = 0, y(a) = y(b) = 0.
(b) If the constraint were not also quadratic in y(x) would the resulting Euler-
Lagrange equation be linear?
Exercise 12.29
Find the stationary value of the functional S[y] =
_
1
0
dxy
2
subject to the con-
straint
_
1
0
dx y = a.
Exercise 12.30
Find the function y(x) making the functional P[y] =
_

dx y ln y stationary
subject to the two constraints
_

dxy = 1 and
_

dxx
2
y =
2
, and where
y(x) goes to zero suciently rapidly as |x| for all integrals to exist.
You will nd the following integrals useful:
_

dxe
ax
2
=
_

a
,
_

dx x
2
e
ax
2
=

2a
3/2
where (a) > 0.
This is an important problem that occurs in statistical physics and information
theory, where y(x) is the probability distribution of a continuously distributed
random variable x and P[y] is the entropy. The rst constraint is just the normal-
isation condition, satised by all distributions, and the second is the variance.
338 CHAPTER 12. CONSTRAINED VARIATIONAL PROBLEMS
Exercise 12.31
Show that the the stationary path of the functional
S[y] =
_

0
dxy
2
, y(0) = y() = 0,
subject to the constraint
_

0
dxy sin x = a, is y(x) = (2a/) sin x.
Exercise 12.32
The points (0, a) and (b, 0), respectively on the Oy and Ox axes, are joined by
a rough wire in the shape of the quadrant of the ellipse parameterised by the
equations
x = b(1 cos ), y = a(1 sin ), 0

2
.
A bead slides down this wire under the inuence of gravity and Coulomb friction,
show that the equation of motion 12.101 can be written in the form
dz
d
+
2abz
a
2
cos
2
+b
2
sin
2

= g(acos b sin ),
where z = v
2
/2. If v(0) = 0 show that
1
2
v
2
=
g
f()
_

0
dw(a cos w b sin w)f(w),
where
f() = exp
_
2tan
1
_
b
a
tan
__
.
Deduce that if = 1 where 1 is the positive solution of
_
/2
0
dw(cos w sin w)f(w) = 0
and = b/a the bead has zero speed at the terminus.
Chapter 13
Sturm-Liouville systems
13.1 Introduction
The general theory of Sturm-Liouville systems presented in this chapter was created in
a series of articles in 1836 and 1837 by Sturm (1803 1855) and Liouville (1809 1882):
their work, later known as Sturm-Liouville theory, created a new subject in mathe-
matical analysis. The theory deals with the general linear, second-order dierential
equation
d
dx
_
p(x)
dy
dx
_
+
_
q(x) +w(x)
_
y = 0 (13.1)
where the real variable, x, is conned to an interval, a x b, which may be the whole
real line or just x 0. The functions p(x), q(x) and w(x) are real and satisfy certain,
not very restrictive, conditions that will be delineated in section 13.4; in any particular
problem these functions are known. A second-order dierential equation is said to be
in self-adjoint form when expressed as in equation 13.1: most second-order equations
can be expressed in this form, see exercise 2.31 (page 74).
In addition to the dierential equation, boundary conditions are specied with the
consequence that solutions exist for only particular values of the constant =
k
,
k = 1, 2, , which are named
1
eigenvalues: the solution y
k
(x) is named the eigenfunc-
tion for the eigenvalue
2

k
. At this stage we shall not specify any boundary conditions,
despite their importance, because dierent types of problems produce dierent types of
conditions. Equation 13.1, together with any necessary boundary conditions, is known
as a Sturm-Liouville system, or problem, which belongs to the class of problems known
as eigenvalue problems.
In physical problems it is often important to compute or estimate the values of the
eigenvalues, for instance in problems associated with waves the eigenvalue is related to
the wave length: in the next chapter we show how variational methods can be used
1
The fact that we use the same symbol for the eigenvalue and the Lagrange multiplier introduced
in chapter 12, is not a coincidence, as is seen by comparing equation 13.1 with the equation derived in
exercise 12.28, page 337.
2
There are also important examples where the eigenvalues can take any real number in an interval
(which may be innite), and there are examples in which the eigenvalues can be both discrete and
continuous. Such problems are common and important in quantum mechanics. In this course we deal
only with discrete sets of eigenvalues.
339
340 CHAPTER 13. STURM-LIOUVILLE SYSTEMS
to provide estimates using a very simple technique requiring only the evaluation of
integrals, and which is readily extended to the more dicult case of partial dierential
equations. In this chapter we concentrate on the properties of the eigenfunctions and
eigenvalues.
Sturm-Liouville problems are important partly because they arise in diverse cir-
cumstances and partly because the properties of the eigenvalues and eigenfunctions are
well understood. Moreover, the behaviour of both the eigenvalues and eigenfunctions
of a wide class of Sturm-Liouville systems are remarkably similar and is independent
of the particular form of the functions p(x), q(x) and w(x). In this class of problems
there is always a countable innity of real eigenvalues
k
, k = 1, 2, , and the set of
eigenfunctions y
k
(x), k = 1, 2, , is complete, meaning that these functions may be
used to form generalised Fourier series, as described in section 13.3. Further, there are
simple approximations for both the eigenvalues and eigenfunctions which are accurate
for large k, as shown in exercise 13.25 (page 371).
The achievements of Sturm and Liouville are more impressive when seen in the
context of early nineteenth century mathematics. Prior to 1820 work on dierential
equations was concerned with nding solutions in terms of nite formulae or power
series; but for the general equation 13.1 Sturm could not nd an expression for the
solution and instead obtained information about the properties of the solution from
the equation itself. This was the rst qualitative theory of dierential equations and
anticipated Poincares work on nonlinear dierential equations developed at the end
of that century. Today the work of Sturm and Liouville is intimately interconnected:
however, though lifelong friends who discussed their work prior to publication, this
theory emerged from a series of articles published separately by each author during
the period 1829 to 1840. More details of this history may be found in L utzen (1990,
chapter 10).
Sturm-Liouville systems are important because they arise in attempts to solve the
linear, partial dierential equations that describe a wide variety of physical problems.
In addition most of the special functions that are so useful in mathematical physics,
and the study of which led to advances in analysis in the 19
th
century, originate in
Sturm-Liouville equations. The importance of these functions should not be under-
estimated, as is frequent in this age of computing, for they furnish useful solutions to
many physical problems and can lead to a broader understanding than purely numerical
solutions. Further, the mathematics associated with these functions is elegant and its
study rewarding. There is no time in this course for any discussion of these functions,
but aspects of the important Bessel function are described in section 13.3.1.
Section 13.2 therefore briey describes how Sturm-Liouville systems occur and gives
some idea of the variety of types of Sturm-Liouville problems that need to be tackled.
This section is optional, but recommended.
In section 13.3 we consider a particularly simple, solvable, Sturm-Liouville system
and examine the properties of its eigenvalues and eigenfunctions in order to illustrate
all the relevant properties of more general systems, which normally cannot be solved in
terms of elementary functions. Some of these properties depend on elementary prop-
erties of second-order dierential equations; this theory in described in section 13.3.
Other properties are endowed on the eigenvalues and eigenfunctions because the canon-
ical form of equation 13.1 is self-adjoint, a term dened in section 13.4.2.
Equation 13.1 can be cast into a variety of other forms which are useful in the
13.1. INTRODUCTION 341
following discussion. Additionally this equation, with appropriate boundary conditions,
is the Euler-Lagrange equation of a constrained variational problem, with as the
Lagrange multiplier, and this is crucial for the later developments in chapter 14. The
following exercises lead you through this background and we recommend that you do
these exercises.
Exercise 13.1
(a) Show that the Euler-Lagrange equation for the functional and constraint
S[y] =
_
b
a
dx
_
py
2
qy
2
_
, C[y] =
_
b
a
dxw(x)y
2
= 1,
with admissible functions satisfying y(a) = y(b) = 0, is
d
dx
_
p
dy
dx
_
+ (q +w) y = 0, y(a) = y(b) = 0.
(b) Dene a new independent variable by =
_
x
a
du
p(u)
to show that this Euler-
Lagrange equation is transformed into
d
2
y
d
2
+p(q +w)y = 0.
(c) By putting y = uv and by choosing v carefully, show that the original func-
tional and constraint can be written in the form
S[u] =
_
b
a
dx
_
u
2

1
4p
2
_
p
2
+ 4pq 2pp

_
u
2
_
, C[u] =
_
b
a
dx
w
p
u
2
,
where u(a) = u(b) = 0. Hence derive the Euler-Lagrange equation for u and
compare this with equation 2.32 (page 74).
Exercise 13.2
Liouvilles normal form:
Consider the functional
S[y] =
_
b
a
dx
_
p(x)y
2
(q +w)y
2
_
.
(a) Change the independent variable to = (x) and the dependent variable to
v() where y = A()v(). With a suitable choice of (x) show that the functional
can be written in the form
S[v] =
1
2
_
p

(x)
_
A
2
_

v
2
_
d
c
+
_
d
c
d
_
_
dv
d
_
2
F()v
2
_
,
where
d
dx
=
1
pA
2
, c = (a), d = (b) and
F() = (q +w)pA
4
A
d
2
d
2
_
1
A
_
.
342 CHAPTER 13. STURM-LIOUVILLE SYSTEMS
(b) By dening A = (wp)
1/4
, show that

(x) =
_
w/p and the associated Euler-
Lagrange equation is
d
2
v
d
2
+
_
q
w
A
d
2
d
2
_
1
A
_
+
_
v = 0.
This transformation is sometimes named Liouvilles transformation , and is par-
ticularly useful for approximating the eigenvalues and eigenfunctions when is
large, see exercise 13.25 (page 371).
13.2 The origin of Sturm-Liouville systems
In this section we show how various types of Sturm-Liouville problems arise. This
material is not assessed but it is recommended that you read it and, time permitting,
that you do some of the exercises at the end of this section because it is important
background material.
The original work of Sturm appears to have been motivated by the problem of
heat conduction. One example he discussed is the temperature distribution in a one-
dimensional bar, described by the linear partial dierential equation
h(x)
u
t
=

x
_
p(x)
u
x
_
l(x)u, (13.2)
where u(x, t) denotes the temperature at a point x of the bar at time t, and h(x), p(x)
and l(x) are positive functions. If the surroundings of the bar are held at constant
temperature and the ends of the bar, at x = 0 and x = L, are in contact with large
bodies at a dierent temperature, then the boundary conditions can be shown to be
p(x)
u
x
+u(x, t) = 0, at x = 0,
p(x)
u
x
+u(x, t) = 0, at x = L,
(13.3)
for some constants and . Finally, the initial temperature of the bar needs to be
specied, so u(x, 0) = f(x) where f(x) is the known initial temperature.
Sturm attempted to solve this equation by rst substituting a function of the form
u(x, t) = X(x)e
t
, where is a constant and X(x) is independent of t. This yields
the ordinary dierential equation
d
dx
_
p(x)
dX
dx
_
+
_
h(x) l(x)
_
X = 0 (13.4)
for X(x) in terms of the unknown constant , together with the boundary conditions
p(0)X

(0) +X(0) = 0 and p(L)X

(L) +X(L) = 0. (13.5)


This is an eigenvalue problem. Assuming that there are solutions X
k
(x) with eigenvalues
=
k
, for k = 1, 2, , Sturm used the linearity of the original equation to write a
general solution as the sum
u(x, t) =

k=1
A
k
X
k
(x)e

k
t
,
13.2. THE ORIGIN OF STURM-LIOUVILLE SYSTEMS 343
where the coecients A
k
are arbitrary. This solution formally satises the dierential
equation and the boundary conditions, but not the initial condition u(x, 0) = f(x),
which will be satised only if
f(x) =

k=1
A
k
X
k
(x).
Thus the problem reduces to that of nding the values of the A
k
satisfying this equation.
Fourier (1768 1830) and Poisson (1781 1840) found expressions for the coecients
A
k
for particular functions h(x), p(x) and l(x), but Sturm and Liouville determined
the general solution.
Typically Sturm-Liouville equations occur when the method of separating variables
is used to solve the linear partial dierential equations that arise frequently in physical
problems; some common examples are

2
+k
2
= 0, (13.6)

2
k

t
= 0, heat or diusion equation, (13.7)

2

1
c
2

t
2
= 0, wave equation, (13.8)
1
(x)

x
_
(x)

x
_

1
c
2

t
2
= 0, canal or horn equation, (13.9)
where c is a constant representing the speed of propagation of small disturbances in the
medium, k is a positive constant, (x) some positive function of x and

2
=

2

x
2
+

2

y
2
+

2

z
2
.
The rst of these equations arises in the solution of Poissons equation that is,

2
= F(r) and similar equations occur when using separation of variables. The
second equation describes diusion processes and heat ow. The third equation 13.8
is the wave equation for propagation of small disturbances in an isotropic medium and
describes a variety of wave phenomena such as electromagnetic radiation, water and
air waves, waves in strings and membranes. The fourth equation is a variant of the
previous wave equation and in this form was derived by Green (1793 1841) in his
1838
3
paper describing waves on a canal of rectangular cross section but with a width
varying along its length; a similar equation describes, approximately, the air pressure
in a horn, though in many instruments the are is suciently rapid for the longitudinal
and radial modes to couple, so it is necessary to use the two-dimensional version of 13.9
in which the variation of the air pressure along the length of the pipe and in the radial
direction is included.
The many dierent forms of the Sturm-Liouville system that we discuss in the fol-
lowing sections are largely a consequence of the shapes of the regions in which the
physical system is dened and of the coordinate system that simplies the equations.
A Sturm-Liouville system arises when the method of separation of variables is used to
3
On the Motion of Waves in a variable Canal of small Depth and Width, 1838 Camb Phil Soc, Vol
VI, part III.
344 CHAPTER 13. STURM-LIOUVILLE SYSTEMS
reduce a partial dierential equation to a set of uncoupled ordinary dierential equa-
tions. Whether or not such a simplication is feasible depends upon the existence of
a suitable coordinate system and this depends upon the form of the original equation
and the shape of the boundary. Relatively few problems yield to this treatment, but
it is important because it is one of the principal means of nding solutions in terms
of known functions: the main alternatives are numerical and variational methods, the
latter being introduced in chapter 14.
In problems with two spatial dimensions separation of variables can be used with
equations 13.6 and 13.8 for rectangular, circular and elliptical boundaries but not, for
example, most triangular boundaries.
We end this section by separating variables for the equation
2
+k
2
= 0, using
the spherical polar coordinates,
x = r cos cos , y = r cos sin , z = r sin ,
where 0 , 0 2 and r 0 which are appropriate when the equation
is dened in a spherically symmetric region, for instance the interior or exterior of a
sphere of given radius or the region between two spheres of given radii and coincident
centres. The purpose of this section is to show how and why dierent Sturm-Liouville
systems occur. Although this material is not assessed, you should read it in order to
understand why some of the later mathematics is necessary.
In these coordinates it can be shown that equation 13.6 becomes

r
_
r
2

r
_
+
1
sin

_
sin

_
+
1
sin
2

2
+k
2
r
2
= 0. (13.10)
First, write (r, , ) as the product (r, , ) = R(r)S(, ) where R depends only
upon r and S only upon (, ). Equation 13.10 then can be written in the form
1
R
d
dr
_
r
2
dR
dr
_
+k
2
r
2
=
1
S
_
1
sin

_
sin
S

_
+
1
sin
2

2
S

2
_
.
The left-hand side of this equation depends only upon r and the right-hand side only
upon (, ). Because (r, , ) are independent variables this equation can be satised
only if each side is equal to the same constant, which we denote by ; constants intro-
duced for this purpose are named separation constants . Note that the constant k is also
a separation constant obtained when separating the time from the spatial coordinates,
as in passing from equations 13.2 to 13.4. Thus we obtain the two equations,
d
dr
_
r
2
dR
dr
_
+
_
k
2
r
2

_
R = 0, (13.11)
1
sin

_
sin
S

_
+
1
sin
2

2
S

2
+S = 0. (13.12)
The rst of these equations is already in the canonical form of equation 13.1, and
contains two constants k and which are determined by the boundary conditions.
The second equation for S is converted into two suitable equations in the same
manner: substitute S = ()() where and are respectively functions of and
only. Then equation 13.12 can be cast in the form,
sin

d
d
_
sin
d
d
_
+sin
2
=
1

d
2

d
2
.
13.2. THE ORIGIN OF STURM-LIOUVILLE SYSTEMS 345
The left-hand side of this equation depends only upon and the right-hand side only
upon , so each must equal the same constant. Later we shall see that the separation
constant must be positive or zero: denoting it by
2
, with 0 so that the sign of

2
is unambiguous, gives the two equations
d
2

d
2
+
2
= 0, (13.13)
1
sin
d
d
_
sin
d
d
_
+
_


2
sin
2

_
= 0. (13.14)
Finally, if we dene a new independent variable by x = cos , so
df
d
=
df
dx
sin and
1
sin
d
d
_
sin
df
d
_
=
d
dx
_
(1 x
2
)
df
dx
_
,
the equation for becomes
d
dx
_
(1 x
2
)
d
dx
_
+
_


2
1 x
2
_
= 0, 1 x 1. (13.15)
Both equation 13.13 for and the two equations 13.14 and 13.15 for are in the
canonical form of equation 13.1.
Comparison of 13.13 for with equation 13.1 shows that the separation constant
2
now plays the role of the eigenvalue; its value is determined by the boundary conditions
that needs to satisfy. Comparison of 13.15 for with equation 13.1 shows that here
plays the role of the eigenvalue.
This analysis shows that in spherical polar coordinates the equation
2
+k
2
= 0
gives rise to three Sturm-Liouville systems for R(r), () and () where = R(r)()().
These equations are summarised in table 13.1.
Table 13.1: Summary of the three Sturm-Liouville systems arising from separation of vari-
ables of equation 13.6 using spherical polar coordinates, giving the explicit form for the three
functions p, q and w, in each case.
Equation p q w Eigenvalue

+
2
= 0 1 0 1
2
_
(1 x
2
)

(x)
_

+
_


2
1 x
2
_
= 0 1 x
2


2
1 x
2
1
_
r
2
R

(r)
_

+ (k
2
r
2
)R = 0 r
2
r
2
k
2
Now consider the boundary conditions.
The equation for : the points with coordinates (r, , ) and (r, , + 2n), n =
0, 1, 2, , all label the same point in space, so in most physical problems we must
have ( + 2n) = () for all , that is () must be 2-periodic. This is why the
separation constant introduced to derive equations 13.13 and 13.14 had to be positive,
for the equation


2
= 0, with > 0, does not have periodic solutions; further,
346 CHAPTER 13. STURM-LIOUVILLE SYSTEMS
is 2-periodic only if is a non-negative integer, = m, m = 0, 1, 2, , see exer-
cise 13.9 (page 353).
The equation for , has p(x) = 1 x
2
, which is zero at the ends of the interval
(1, 1), that is at = 0 and , corresponding to the poles. The poles are singular
points of spherical polar coordinates, because at each pole is undened, and this is
why p(x) = 0 at x = 1. Further, because the coecient of

() is zero at x = 1,
the general theory of linear dierential equations shows that there are two types of
solutions, those that are bounded at x = 1 and those that are unbounded. Physical
considerations suggest that in most circumstances only bounded solutions are signi-
cant. Thus for this type of Sturm-Liouville problem the boundary conditions are simply
that () is bounded for x [1, 1]. It can be shown that with = m, this condition
gives = l(l +1), l = m, m+1, m+2,
4
; these solutions are named the associated
Legendre polynomials and are denoted by P
m
l
(x).
The radial equation for R(r) has p(r) = r
2
, so if the original space includes the origin
we nd that because p(0) = 0 the solutions are of two types, those that are bounded
and those that are unbounded at r = 0. Again, physical considerations usually suggest
that the bounded solutions are chosen. The other boundary conditions are either given
by some condition at r = a > 0, where a is the radius of the sphere in which the original
problem is dened, or that the solutions remain bounded as r .
Summary: the method of separation of variables applied to the equation
2
+k
2
= 0,
using spherical polar coordinates leads to three dierent types of Sturm-Liouville sys-
tems. In this summary we introduce the idea of regular and singular Sturm-Liouville
systems, that will be discussed further and dened in section 13.4.
(1) The equation
d
2

d
2
+
2
= 0 (13.16)
with periodic boundary conditions ( + 2) = () for all , which determines
possible values of . Note that this condition implies the conditions (0) = (2)
and

(0) =

(2).
(2) The equation
d
dx
_
(1 x
2
)
d
dx
_
+
_


2
1 x
2
_
= 0, 1 x 1. (13.17)
The condition that () is bounded for all x serves the same purpose as boundary
conditions, and determines possible values of the eigenvalue , once
2
is known.
Because p(x) = 1x
2
is zero at the ends of the interval this type of Sturm-Liouville
equation is classied as a singular Sturm-Liouville system.
(3) The equation
d
dr
_
r
2
dR
dr
_
+
_
k
2
r
2

_
R = 0. (13.18)
4
A physical reason why l m is that in some circumstances l is proportional to the magnitude of
an angular momentum and m a projection of this vector along a given axis, which can be no longer
than the original vector.
13.2. THE ORIGIN OF STURM-LIOUVILLE SYSTEMS 347
For this equation several types of conditions can specify the solution uniquely and
determine possible values of the eigenvalue k
2
.
(i) If 0 r a, since p(r) = r
2
is zero at r = 0, the solutions will normally
be required to be bounded at r = 0 and satisfy a condition of the form
A
1
y(a) +A
2
y

(a) = 0 at r = a, where A
1
and A
2
are constants. This system
is classied as a singular Sturm-Liouville system because p(r) = 0 at r = 0.
(ii) If r [0, ), since p(0) = 0 the solutions will normally be required to
be bounded at r = 0 and tend to zero as r . Again this is a singular
Sturm-Liouville system.
(iii) If 0 < a r b the solution will be required to satisfy boundary condi-
tions of the form
A
1
y(a) +A
2
y

(a) = 0 and B
1
y(b) +B
2
y

(b) = 0,
where A
1
, A
2
, B
1
and B
2
are constants. For this system p(r) = r
2
> 0 for all
r and the system is a regular Sturm-Liouville system.
The examples described in this section show how Sturm-Liouville equations arise and
why a variety of types of these equations exist. The signicance of the diering types
will become clear as the theory develops.
Exercise 13.3
Consider the system
2
+ k
2
= 0 with (x, y) = 0 on the rectangle dened
by the x- and y-axes, and the lines x = a > 0, y = b > 0. Show that inside
this rectangle separation of variables with Cartesian coordinates leads to the two
Sturm-Liouville systems
d
2
X
dx
2
+
2
1
X = 0 and
d
2
Y
dy
2
+
2
2
Y = 0,
with X(0) = X(a) = 0, Y (0) = Y (b) = 0 and where = X(x)Y (y) and

2
1
+
2
2
= k
2
.
Exercise 13.4
Consider the system
2
+ k
2
= 0 with (x, y) = 0 dened inside the circle of
radius a. Use the polar coordinates x = r cos , y = r sin , 0 r a to cast the
equation in the form

r
2
+
1
r

r
+
1
r
2

2
+k
2
= 0.
By putting = R(r)(), where R(r) depends only upon r and () only upon
, show that
d
2

d
2
+
2
= 0, with () 2-periodic,
r
2
d
2
R
dr
2
+r
dR
dr
+
_
k
2
r
2

2
_
R = 0,
where is a positive constant. Show further that the equation for R(r) can be
cast in self-adjoint form
d
dr
_
r
dR
dr
_
+
_
k
2
r

2
r
_
R = 0.
348 CHAPTER 13. STURM-LIOUVILLE SYSTEMS
13.3 Eigenvalues and functions of simple systems
The eigenvalues and eigenfunctions of most Sturm-Liouville systems are not easy to
nd; yet the theory of Sturm-Liouville systems, to be described later, shows that the
eigenfunctions for most Sturm-Liouville systems with discrete eigenvalues behave sim-
ilarly, independent of the detailed form of the three functions p, q and w and of the
boundary conditions. This important fact is one reason why the approximate method
described in section 14.3 works so well.
Thus in this section, in order to help understand this behaviour, we consider the
Sturm-Liouville system dened by the equation
d
2
y
dx
2
+y = 0, y(0) = y() = 0, (13.19)
with p(x) = w(x) = 1, q(x) = 0 and dened in the interval [0, ]. This equation
has simple solutions, found in exercise 13.5, and by studying these it is possible to
understand almost everything about the solutions of other Sturm-Liouville systems
with discrete eigenvalues. We illustrate this point in section 13.3.1 by describing the
properties of a singular Sturm-Liouville system closely related to equation 13.18, and
whose eigenfunctions are Bessel functions.
Exercise 13.5
(a) Show that equation 13.19 has no real, nontrivial solutions if 0.
(b) Find the values of > 0 for which solutions exist and nd these solutions.
In exercise 13.5 it was shown that the eigenfunctions and eigenvalues of equation 13.19
are
y
n
(x) = Bsin nx,
n
= n
2
, n = 1, 2, . (13.20)
The constant B is undetermined because the equation and boundary conditions are
homogeneous. It is often convenient to x the value of this constant by normalising the
eigenfunctions to unity, that is we set
_

0
dxy
n
(x)
2
= 1 and this gives B
2
_

0
dx sin
2
nx =
1
2
B
2
= 1. (13.21)
By choosing B to be positive this convention gives the following eigenfunctions and
eigenvalues
y
n
(x) =
_
2

sin nx,
n
= n
2
, n = 1, 2, . (13.22)
Graphs of the adjacent pairs of eigenfunctions {y
1
(x), y
2
(x)}, and {y
5
(x), y
6
(x)} are
shown in the following gure.
13.3. EIGENVALUES AND FUNCTIONS OF SIMPLE SYSTEMS 349
1 2 3
-0.5
0
0.5
1 2 3
-0.5
0
0.5
k=1
k=2
k=6
k=5
y
x
x
y
Figure 13.1 Graphs of y
k
(x) =

2/ sinkx for k = 1, 2 on the left, and k = 5, 6 on the right.


We now list the important properties of these eigenvalues and eigenfunctions and state
which are common to all Sturm-Liouville systems. It is surprising that most of these
properties are common to all Sturm-Liouville systems regardless of the precise forms of
the functions p, q and w.
In this list we rst state the specic property of the solutions of the Sturm-Liouville
system 13.19, and then state the equivalent general property of the solutions for the
general system, equation 13.1.
Real eigenvalues The eigenvalues
n
= n
2
, n = 1, 2, are real.
The eigenvalues of all Sturm-Liouville systems are real and this is a consequence of
the form of the dierential equation and the boundary conditions, which together
produce a self-adjoint operator: for an example of boundary conditions that give
complex eigenvalues, see exercise 13.10 (page 353).
Behaviour of eigenvalues The smallest eigenvalue is unity, but there is no largest
eigenvalue: further,
n
/n
2
= O(1) as n .
For the general Sturm-Liouville system there is a smallest but no largest eigenvalue
and
n
increases as n
2
for large n; this is proved in exercise 13.25 (page 371).
Uniqueness of eigenfunctions For each eigenvalue
n
there is a single eigenfunction,
y
n
sin nx, unique to within a multiplicative constant.
This is also true of regular Sturm-Liouville systems and most singular Sturm-
Liouville systems of physical interest. The important exception described in ex-
ercise 13.9 (page 353) shows that there is not always a unique eigenfunction for
periodic boundary conditions. The example of exercise 13.13 shows that some
singular Sturm-Liouville systems have no eigenfunctions.
Interlacing zeros The zeros of adjacent eigenfunctions interlace, so there is one and
only one zero of y
n+1
(x) between adjacent zeros of y
n
(x), see gure 13.1.
This is also true in the general case, and is a property of many solutions of second-
order equations, see theorem 13.2 (page 360), see also theorem 13.3.
Number of zeros of the nth eigenfunction The nth eigenfunction has n1 zeros
in 0 < x < .
For the general Sturm-Liouville problem on the interval [a, b] the nth eigenfunction
has n 1 zeros in a < x < b. This property is largely a consequence of the
interlacing of zeros.
350 CHAPTER 13. STURM-LIOUVILLE SYSTEMS
Orthogonality of eigenfunctions The integral of the product of two distinct eigen-
functions over the interval (0, ) is zero,
_

0
dxy
n
(x)y
m
(x) =
_

0
dx sin nxsin mx = 0, n = m.
For the general Sturm-Liouville system, regular and singular, dened in equa-
tion 13.1 there is a similar result. If
n
(x) and
m
(x) are eigenfunctions belonging
to two distinct eigenvalues, then they can be shown to satisfy the orthogonality
relation
_
b
a
dxw(x)
n
(x)

m
(x) = h
n

nm
, (13.23)
where h
n
is a sequence of positive numbers,
nm
is the Kronecker delta
5
and
a

denotes the complex conjugate. Note that there are two dierences between
the specic example of equation 13.19 and the general case. First, the function
w(x), the same function that multiplies the eigenvalue in the original dierential
equation 13.1, has been included in the integrand: in this context w(x) is often
named the weight function. Second, the complex conjugate of
n
(x) appears.
This is necessary because there are circumstances when it is more convenient to
use complex solutions even though the equations are real: for instance, we often
use e
inx
in place of the real trigonometric functions cos nx and sin nx.
By analogy with ordinary geometric vectors this integral is named an inner product
and it is convenient to introduce the short-hand notation
(f, g)
w
=
_
b
a
dxw(x)f(x)

g(x) (13.24)
where f(x) and g(x) are any functions, which may be complex, for which the
integral exists. Notice that (g, f)
w
= (f, g)

w
. With this notation equation 13.23
can be written in the form h
n
= (
n
,
n
)
w
. If w(x) = 1 we denote the inner
product by (f, g).
If (f, g)
w
= 0 the two functions are said to be orthogonal and if (f, f)
w
= 1 the
function f is said to be normalised.
Completeness of eigenfunctions The eigenfunctions y
n
(x) = sin nx may be used in
a Fourier series to represent any suciently well behaved function f(x) for which
_

0
dx|f(x)|
2
exists. The Fourier representation of f(x) is,
f(x) =

n=1
b
n
sin nx, 0 < x < where b
n
=
2

_

0
dxf(x) sin nx. (13.25)
The innite set of functions sin nx, n = 1, 2, , is said to be complete on the
interval (0, ) because any suciently well behaved function can be represented
in terms of such an innite series.
5
The Kronecker delta is a function of two integers, (n, m), dened as nm = 0 if n = m and 1 if
n = m.
13.3. EIGENVALUES AND FUNCTIONS OF SIMPLE SYSTEMS 351
In general if
n
(x), n = 1, 2, , are the eigenfunctions of a Sturm-Liouville
system dened on (a, b), with given boundary conditions, they are complete which
means that that any suciently well behaved function f(x) for which
_
b
a
dx|f(x)|
2
exists, can be represented by the innite series
f(x) =

n=1
a
n

n
(x), a < x < b, (13.26)
where
a
n
=
(
n
, f)
w
(
n
,
n
)
w
=
1
h
n
_
b
a
dxw(x)
n
(x)

f(x), h
n
= (
n
,
n
)
w
.
It is conventional to name the more general series 13.26 a Fourier series and the
coecients a
n
the Fourier components: the series 13.25 is often referred to as a
trigonometric series, if a distinction is necessary.
The twin properties of orthogonality and completeness of the eigenfunctions, and
hence the existence of the series 13.26, are two reasons why Sturm-Liouville sys-
tems play a signicant role in the theory of linear dierential equations. It means,
for instance, that solutions of the inhomogeneous equation
d
dx
_
p(x)
dy
dx
_
+q(x)y = F(x), (13.27)
with suitable boundary conditions, can usually be expressed as a linear combina-
tion of the eigenfunctions of the related Sturm-Liouville system,
d
dx
_
p(x)
dy
dx
_
+
_
q(x) +w(x)
_
y = 0,
with the same boundary conditions. The rigorous treatment of this theory is too
involved to be included in this course, but an outline of the theory is contained
in the next exercise.
Exercise 13.6
Suppose that the Sturm-Liouville system
d
dx
_
p
dy
dx
_
+ (q +w)y = 0, y(a) = y(b) = 0,
has an innite set of eigenvalues and eigenfunctions
k
and
k
(x), k = 1, 2, ,
with 0 < 1 < 2 < . which satisfy the orthogonality relation 13.23.
(a) Consider the innite series
y(x) =

k=1
y
k

k
(x)
where the coecients y
k
are constants. Assuming the order of summation and
dierentiation can be interchanged, show that
d
dx
_
p
dy
dx
_
+qy =

k=1
y
k

k
w(x)
k
(x).
352 CHAPTER 13. STURM-LIOUVILLE SYSTEMS
(b) Hence show that the solution of the inhomogeneous equation 13.27 can be
written in the form
y(x) =
_
b
a
duG(x, u)F(u) where G(x, u) =

k=1

k
(u)

k
(x)
h
k

k
.
Exercise 13.7
This exercise shows how the boundary conditions can aect the eigenvalues and
eigenfunctions. Find all eigenvalues and eigenfunctions of the Sturm-Liouville
systems dened by the dierential equation
d
2
y
dx
2
+y = 0,
and the three sets of boundary conditions
(a) y

(0) = y

() = 0, (b) y(0) = y

() = 0, (c) y(0) = 0, y() = y

().
In each case show that the eigenfunctions, n(x), belonging to distinct eigenvalues
are orthogonal, that is satisfy
_

0
dxn(x)

m(x) = hnnm
where hn is a sequence of positive numbers which you should nd.
Exercise 13.8
This exercise involves lengthy algebraic manipulations. In exercise 13.7 you found
the following sets of eigenfunctions, yn(x), and eigenvalues, n, for the equation
d
2
y/dx
2
+y = 0 with three dierent boundary conditions,
(a) yn(x) = cos nx, n = n
2
, n = 0, 1, , y

(0) = y

() = 0;
(b) yn(x) = sin(n + 1/2)x, n = (n + 1/2)
2
, n = 0, 1, , y(0) = y

() = 0;
(c) y0(x) = sinh 0x, 0 =
2
0
, yn(x) = sin nx, n =
2
n
, where tanh0 = 0
and tan n = n, n = 1, 2, .
The Sturm-Liouville theorem shows that each of these sets of functions is complete
on (0, ). Use equation 13.26 to show that the function x may be represented by
any of the following series on the interval (0, )
x =

2

4

k=0
cos(2k + 1)x
(2k + 1)
2
,
x =
2

k=0
(1)
k
(k + 1/2)
2
sin
_
k +
1
2
_
x,
x =
2( 1) cosh 0
0( cosh
2
0)
sinh 0x 2( 1)

k=1
cos
k
sin
k
x

k
( cos
2

k
)
.
13.3. EIGENVALUES AND FUNCTIONS OF SIMPLE SYSTEMS 353
Exercise 13.9
Periodic boundary conditions:
(a) Show that the eigenvalues of the Sturm-Liouville system
d
2
y
dx
2
+y = 0, y(0) = y(2a), y

(0) = y

(2a), a > 0,
are given by
n =
_
n
a
_
2
, n = 0, 1, 2, ,
and that there are no negative eigenvalues. Show also that for n = 0 there is
just one eigenfunction, which can be taken to be y0(x) = 1, and for n 1 each
eigenvalue has two linearly independent eigenfunctions,
yn(x) =
_
cos
_
nx
a
_
, sin
_
nx
a
__
,
or any linear combination of these.
(b) Consider the two eigenfunctions associated with the nth eigenvalue
u1(x) = A1 cos
_
nx
a
_
+B1 sin
_
nx
a
_
and u2(x) = A2 cos
_
nx
a
_
+B2 sin
_
nx
a
_
.
Show that these are orthogonal only if A1A2 +B1B2 = 0.
Exercise 13.10
Mixed boundary conditions:
The solutions of a Sturm-Liouville equation with mixed boundary conditions usu-
ally behave quite dierently from those with unmixed conditions. An example is
considered in this exercise.
Consider the system with mixed boundary conditions
d
2
y
dx
2
+y = 0, y(0) = 0, y() = ay

(0), a > 0.
Show that if 0 < a < there are a nite number of real eigenvalues given by the
real roots of the equation sin = a, (1, 2, , N), with =
2
and with
eigenfunctions y
k
(x) = sin
k
x and N 1/a.
Are these eigenfunctions orthogonal?
13.3.1 Bessel functions (optional)
Here we show that the properties described in the previous section are shared by Bessel
functions, which is one of the special functions that can be dened by a singular Sturm-
Liouville equation, given in equation 13.28.
We choose the Bessel function for this illustration because it is one of the more
important special functions of mathematical physics. It was one of the rst special
functions to be the subject of a comprehensive treatise (Watson 1966)
6
which provides
6
G N Watson 1966 A treatise on the Theory of Bessel Functions (Cambridge University Press),
rst published 1922.
354 CHAPTER 13. STURM-LIOUVILLE SYSTEMS
a thorough history of the early development and use of Bessel functions: they have oc-
curred in the work of Euler (1764, in the vibrations of a stretched membrane), Lagrange
(1770, in the theory of planetary motion), Fourier (1822, in his theory of heat ow),
Poisson (1823, in the theory of heat ow in spherical bodies) and by Bessel (1824, who
studied these functions in detail): Watson (1966) abandons his attempt to delineate the
chronological order of the study after Bessel as After the time of Bessel, investigations
on the functions become so numerous . . . .
Bessel functions are important because, unlike most other special functions, they
arise in two quite distinct types of problems. The rst is in the solution of linear partial
dierential equations where separation of variables is used to derive ordinary dierential
equations; typically problems involving cylindrical and spherical symmetry give rise to
Bessel functions, but so does the problem of the small vibrations of a chain suspended
from one end (considered by Euler in 1782).
These types of problem lead to dierential equations that can be cast into the form
x
2
d
2
y
dx
2
+x
dy
dx
+ (x
2

2
)y = 0, (13.28)
where is a real number
7
, though in the following we consider only the case = 1.
The various solutions of this equation are collectively named Bessel functions. This
equation is singular at the origin (see section 13.3) and, as a consequence, it can be
shown to possess two types of solution. Those denoted by J

(x) are bounded at the


origin: those denoted by Y

(x) are unbounded at the origin.


The second application arises because it is frequently necessary to expand the func-
tion e
iz sin t
, which is 2-periodic in t, as a Fourier series. It transpires that the Fourier
components are Bessel functions,
e
iz sin t
=

n=
J
n
(z)e
int
. (13.29)
This relation is useful in the modern problem of the interaction of periodic electric
elds, lasers for example, with atoms and molecules: but the original application of
Bessel functions in this context was the inversion of Keplers equation, which relates
the time, t, to the eccentric anomaly, u, of a planet in an elliptical orbit with the Sun
at one focus,
t = u sinu (Keplers equation). (13.30)
Here is the angular frequency of the planet and the eccentricity of the elliptical
path typically less than 0.1, the exceptions being Mercury (0.21) and Pluto (0.25).
Elementary dynamics gives the approximate position of each planet in terms of u, but
for practical applications they are needed in terms of the time. By writing t = and
u = + P(), so P() is a 2-periodic function, we nd that the Fourier components
of P() are related to Bessel functions, see exercise 13.26.
This application gives rise to the integral denition of J
n
(x),
J
n
(x) =
1
2
_

dt expi (nt xsin t) , n = 0, 1, 2, . (13.31)


7
In the general theory both x and are complex variables. The important Modied Bessel functions
are obtained by making purely imaginary.
13.3. EIGENVALUES AND FUNCTIONS OF SIMPLE SYSTEMS 355
The integral representation of J

(x), where is not an integer, is more complicated


(Whittaker and Watson, 1965, sections 17.1 and 17.231). It can be shown, by dier-
entiating equation 13.31, that the function dened in this way satises the dierential
equation 13.28, see exercise 13.27.
Exercise 13.11
(a) Show that the self-adjoint form of equation 13.28 is
d
dx
_
x
dy
dx
_
+
_
x

2
x
_
y = 0.
(b) Show that the normal form, dened in exercise 2.31 (page 74), of equa-
tion 13.28 is
d
2
u
dx
2
+
_
1

1
4
x
2
_
u = 0 where y(x) =
u(x)

x
, x > 0.
(c) Apply the Liouville transformation, dened in exercise 13.2, to equation 13.28
to give the alternative form of Bessels equation
d
2
y
d
2
+
_
e
2

2
_
y = 0 where = ln x, x > 0.
Exercise 13.12
(a) Use the Fourier series 13.29 to show that
(i) Jn(x) = (1)
n
Jn(x);
(ii) Jn(x) = (1)
n
Jn(x);
(iii) J0(x) + 2J2(x) + 2J4(x) + = 1.
(b) Use the integral denition to show that J0(0) = 1 and that Jn(0) = 0 for
n = 0.
(c) By dierentiating the integral denition 13.31 with respect to x derive the
recurrence relation
2J

n
(x) = Jn1(x) Jn+1(x).
(d) Use the integral denition 13.31 to show that
Jn1(x) +Jn+1(x) =
2n
x
Jn(x).
In the remainder of this section we describe the behaviour of the eigenvalues and eigen-
functions of the singular Sturm-Liouville system associated with Bessels equation,
x
2
d
2
y
dx
2
+x
dy
dx
+ (
2
x
2
1)y = 0, 0 x 1, y(1) = 0. (13.32)
with > 0, in particular we show that they satisfy most of the properties listed at
the beginning of section 13.3. By converting equation 13.32 to the self-adjoint form
(xy

+(x
2
1/x)y = 0, see exercise 13.11, and comparing with equation 13.1 we see
that the eigenvalue is =
2
(and p = w = x, q = 1/x). By changing the independent
variable to = x we see that this equation is the same as equation 13.28 with = 1
356 CHAPTER 13. STURM-LIOUVILLE SYSTEMS
and hence has the solutions Y
1
(x) and J
1
(x); we require the solution that is bounded,
that is J
1
(x).
The boundary condition at x = 1 then gives J
1
() = 0, that is must be one of the
zeros of the Bessel function. A graph of J
1
() is shown in gure 13.2 and this suggests
that there are an innite number of positive zeros,
k
, k = 1, 2, .
2 4 6 8 10 12 14 16 18 20
-0.4
-0.2
0
0.2
0.4
0.6

J
1
()
Figure 13.2 Graph of the Bessel function J
1
().
Using its series expansion Daniel Bernoulli (1738) rst suggested that this Bessel func-
tion has an innite set of zeros. Later we shall see how this follows from the general
theory of second-order dierential equations: the rst ve zeros are

1
= 3.832,
2
= 7.016,
3
= 10.17,
4
= 13.32,
5
= 16.47,
and these numbers can be approximated by the formula

k
=
_
k +
1
4
_

3
8(k + 1/4)
+O(k
3
), k = 1, 2, ,
which gives the rst zero to within 0.006% and progressively improves in accuracy with
increasing k.
The easiest way to understand why J
1
(x) oscillates in the manner shown in g-
ure 13.2 is to use the result derived in exercise 13.11(b). For large x this shows
that u(x) =

xJ
1
(x) is given approximately by the equation u

+ u = 0, so that
J
1
(x) (Acos x + Bsin x)/

x; this shows why J


1
(x) oscillates but does not give the
phase of the oscillations, that is the values of A and B.
The eigenfunctions of equation 13.32 are thus
y
k
(x) = J
1
(
k
x), k = 1, 2, . (13.33)
In the following two gures are shown the graphs of the eigenfunctions {y
1
(x), y
2
(x)}
and {y
5
(x), y
6
(x)}, as in gure 13.1 (page 349), with which you should compare the
present gures.
13.4. STURM-LIOUVILLE SYSTEMS 357
0.2 0.4 0.6 0.8 1
-0.4
-0.2
0
0.2
0.4
0.2 0.4 0.6 0.8 1
-0.4
-0.2
0
0.2
0.4
k=1
k=2
k=6
k=5
y
x
x
y
Figure 13.3 Graphs of y
k
(x) = J
1
(
k
x), for k = 1, 2, on the left, and k = 5, 6 on the right.
These eigenfunctions and eigenvalues all behave as previously described, namely:
the eigenvalues are real:
for large n the eigenvalues behave as
n
=
2
n
(n+1/4)
2

2
, that is
n
/n
2
= O(1)
as n :
the nth eigenfunction has n 1 zeros in the interval 0 < x < 1:
there is one and only one zero of y
n+1
(x) between adjacent zeros of y
n
(x):
the eigenfunctions are orthogonal with weight function w(x) = x. In this case it
can be shown that
_
1
0
dxxJ
1
(x
n
)J
1
(x
m
) =
nm
h
n
with h
n
=
1
2
J

1
(
n
)
2
.
The eigenfunctions are complete, which means that any suciently well behaved
real function, f(x), on the interval 0 < x < 1 can be expressed as the innite
series, equation 13.26 (page 351),
f(x) =

n=1
a
n
J
1
(x
n
) where a
n
=
2
J

1
(
n
)
2
_
1
0
dxxf(x)J
1
(x
n
).
13.4 Sturm-Liouville systems
In the previous section it was shown how the eigenvalues and eigenfunctions of a par-
ticular Sturm-Liouville system behave and it was stated that most of these systems
behave similarly. We now formally dene regular and singular systems before investi-
gating some of these properties. The distinction between regular and singular Sturm-
Liouville systems is important, because not all singular systems have eigenvalues, see
exercise 13.13; however, the regular and singular systems that arise from linear partial
dierential equations behave similarly.
A regular Sturm-Liouville system is dened to be the linear, homogeneous, second-
order dierential equation
8
d
dx
_
p(x)
dy
dx
_
+
_
q(x) +w(x)
_
y = 0 (13.34)
8
There is no agreed convention for the signs in this equation. For instance, in Courant and Hilbert
(1965) and Birkho and Rota (1962) the sign in front of q(x) is negative and in Korner (1988) the
signs in front of q(x) and are negative. Care is needed when using dierent sources.
358 CHAPTER 13. STURM-LIOUVILLE SYSTEMS
dened on a nite interval of the real axis a x b, together with the homogeneous
boundary conditions
A
1
y(a) +A
2
y

(a) = 0 and B
1
y(b) +B
2
y

(b) = 0, (13.35)
with A
1
, A
2
, B
1
and B
2
real constants, and the two cases A
1
= A
2
= 0 and B
1
= B
2
= 0
are excluded. These conditions are sometimes named separated boundary conditions.
the functions p(x), q(x) and w(x) are real and continuous for a x b;
p(x) and w(x) are strictly positive for a x b;
p

(x) exists and is continuous for a x b.


Equation 13.32, dening the Bessel function, and the radial equation 13.18 for R(r) and
equation 13.17 for (), do not satisfy the condition p > 0. Further, equation 13.16
for () has a dierent type of boundary condition than those of equation 13.35. It
follows that the scope of the theory needs to be extended if it is to be useful.
First, it needs to apply to periodic boundary conditions, that is
y(a) = y(b), y

(a) = y

(b) (13.36)
which are an important subset of the class of mixed boundary conditions, see exer-
cise 13.9. Equation 13.16 for () has this type of boundary condition. Another
common Sturm-Liouville system with periodic boundary conditions is Mathieus equa-
tion,
d
2
y
d
2
+ ( 2q cos 2) y = 0, y(0) = y(), y

(0) = y

(), (13.37)
where here q is a real variable. This equation seems to have been rst studied by the
French mathematician Mathieu (1835 1890) in his discussion of the vibrations of an
elliptic membrane and occurs when separating variables in elliptical coordinates, see
exercise 13.26 (page 373). In this example (q) is the eigenvalue and it has a fairly
complicated dependence upon the variable q.
The main dierence between periodic and separated boundary values is that some-
times, see exercise 13.9, each eigenvalue has more than one eigenfunction. In such cases
it is always possible to choose linear combinations that are orthogonal.
The second necessary extension is to those equations where p(x) = 0 at either or
both end points. In the example treated in section 13.2, the equation 13.18 for R(r) is
singular if the interval contains r = 0, as is the Bessel function example, equation 13.32:
the equation 13.17 for () is singular because p(x) = 1 x
2
is zero at both ends of
the interval. Thus singular systems are as common as regular systems.
As an aside we note that all these singular systems arise because the spherical polar
coordinates used to separate variables are singular at the poles, where x = cos = 1
and is undened, and at r = 0 where neither nor are dened. It is this geo-
metric singularity in the transformation between Cartesian and polar coordinates that
makes the Sturm-Liouville systems singular: therefore we do not expect these particular
singular systems to be much dierent from regular systems.
A Sturm-Liouville system for which p(x) is positive for a < x < b but vanishes at
one or both ends is named a singular Sturm-Liouville system. These systems comprise
the dierential equation 13.34, with w(x) and q(x) satisfying the same conditions as
for a regular system, and
13.4. STURM-LIOUVILLE SYSTEMS 359
the solution is bounded for a x b;
at an end point at which p(x) does not vanish, y(x) satises a boundary condition
of the type 13.35.
The example of equation 13.17 shows that for some singular systems q(x) is unbounded
at the interval ends. The behaviour of q(x) is not, however, so important in determining
the behaviour of the eigenfunctions.
The third necessary extension is to systems dened on innite or semi-innite inter-
vals, which arise in many applications in quantum mechanics. We shall not deal with
these problems, but note that in many cases these systems behave like regular systems.
Exercise 13.13
Consider the eigenvalue problem
d
dx
_
x
2
dy
dx
_
+y = 0, 0 x 1, y(1) = c 0,
and with y(x) bounded.
(a) Find the general solution of this equation and show that this problem has no
eigenvalues if c = 0 and innitely many if c > 0.
(b) How does this problem change if the boundary conditions become y(a) =
y(1) = 0, 0 < a < 1?
13.4.1 Separation and Comparison theorems
In this section we use the Wronskian, introduced in section 2.4.3, to derive useful
properties of the positions of the zeros of the solutions of the homogeneous equation,
p
2
(x)
d
2
y
dx
2
+p
1
(x)
dy
dx
+p
0
(x)y = 0, a x b, (13.38)
where p
2
(x) = 0 for x [a, b]. The theorems given here were rst discovered by Sturm:
the rst involves the relative positions of the zeros of two linearly independent solutions,
f(x) and g(x), of the homogeneous equation 13.38. Since W(f, g) = 0, if g(x) = 0 at
x = c, then
W(f, g; c) = f(c)g

(c) = 0.
Hence f(c) = 0 and g

(c) = 0.
Now let c and d be two successive zeros of g(x), so g(c) = g(d) = 0 then f(c) = 0
and f(d) = 0; also g

(c) and g

(d) must have dierent signs (because if g(x) is increasing


at x = c it must be decreasing at x = d, or vice-versa). Since W(f, g; x) has constant
sign and
W(c) = f(c)g

(c), W(d) = f(d)g

(d),
it follows that f(c) and f(d) must have opposite signs. Hence f(x) must have at least
one zero for c < x < d; two possible situations are shown in gure 13.4.
360 CHAPTER 13. STURM-LIOUVILLE SYSTEMS
c
d
x
y
c
d
x
y
g(x)
g(x)
f(x)
f(x)
Figure 13.4 Diagram showing the behaviour of f(x) between two adjacent zeros of g(x),
consistent with W(f, g) not changing sign. Only the behaviour on the left-hand side is
actually possible, because we assume that g(x) = 0 for c < x < d, see text.
However, there can be only one zero of f(x) between adjacent zeros of g(x). Suppose
there are more: by reversing the roles of f and g we see that between two of the zeros
of f(x), there must be at least one zero of g(x), which contradicts the assumption that
c and d are adjacent zeros. Thus we have the following theorem.
Theorem 13.1
Sturms separation theorem. If f(x) and g(x) are linearly independent solutions of
the second-order homogeneous equation 13.38, then the zeros of f(x) and g(x) alternate
in (a, b).
A well known example of this theorem is the equation y

+ y = 0, on the whole real


line, which has the independent solutions sin x and cos x with the alternating zeros n
and (n+1/2), n = 0, 1, 2, , respectively. A less obvious consequence is that the
two functions
f(x) = a
1
sinx +a
2
cos x and g(x) = b
1
sinx +b
2
cos x
have alternating zeros provided a
1
b
2
= a
2
b
1
, which ensures that the two functions are
linearly independent, see exercise 2.33 (page 75).
Note that this theorem does not prove that the zeros exist. The equation y

y = 0,
with solutions sinh x and coshx shows that zeros need not exist.
The next theorem is more useful and in some circumstances can be used to show that
zeros exist and also to give their approximate positions. This is Sturms comparison
theorem, which we rst state, then prove.
Theorem 13.2
Sturms comparison theorem. Let y
1
(x) and y
2
(x) be, respectively, nontrivial so-
lutions of the dierential equations
d
2
y
dx
2
+Q
1
(x)y = 0 and
d
2
y
dx
2
+Q
2
(x)y = 0 (13.39)
on an interval (a, b) and assume that Q
1
(x) Q
2
(x) everywhere in this interval. Then
between any two zeros of y
2
(x) there is at least one zero of y
1
(x), unless Q
1
(x) = Q
2
(x)
everywhere and y
1
is a constant multiple of y
2
.
A simple example of this theorem is the equation y

+
2
y = 0, with solution sin x
having zeros at n/, equally spaced, a distance / apart. Hence for the two equations
13.4. STURM-LIOUVILLE SYSTEMS 361
with =
2
and =
1
>
2
there must be at least one zero of sin
1
x between
adjacent zeros of sin
2
x.
Proof of the comparison theorem
The following proof depends upon the properties of the Wronskian. If x = c and x = d
are adjacent zeros of y
2
(x), with c < d, suppose that y
1
(x) = 0 for c x d. We may
assume that both y
1
(x) and y
2
(x) are positive in (c, d). Then
W(y
1
, y
2
; c) = y
1
(c)y

2
(c) > 0, since y

2
(c) > 0,
W(y
1
, y
2
; d) = y
1
(d)y

2
(d) < 0, since y

2
(d) < 0.
(13.40)
But
dW
dx
=
d
dx
(y
1
y

2
y

1
y
2
) = y
1
y

2
y

1
y
2
and, on using the dierential equations 13.39 dening y
1
and y
2
, this simplies to
dW
dx
=
_
Q
1
(x) Q
2
(x)
_
y
1
(x)y
2
(x) 0, c x d.
It follows that if Q
1
(x) > Q
2
(x), W(y
1
, y
2
; x) is a monotonic increasing function of x,
so that W(c) W(d), which contradicts equation 13.40. Thus we must have y
1
(d) < 0
and hence y
1
(x) must have at least one zero in (c, d).
Further, if Q
1
= Q
2
the separation theorem implies that there is one zero unless y
1
and y
2
are linearly dependent, that is y
2
(x) is a multiple of y
1
(x).
Applications of the comparison theorem
The equation y

+Q(x)y = 0, Q(x) 0
The rst important result that follows from this is that every nontrivial solution of
d
2
y
dx
2
+Q(x)y = 0 (13.41)
has at most one zero in any interval where Q(x) 0.
The proof is by contradiction. A solution of y

= 0 (that is, Q
1
(x) = 0) is y
1
(x) = 1.
If a solution of 13.41 has two zeros in a region where Q
2
Q
1
= 0, then y
1
(x) would
have at least one zero in between, which is a contradiction.
The elementary equation y

y = 0, with the two sets of linearly independent


solutions, {coshx, sinh x} and {e
x
, e
x
}, illustrates this result: only the second member
of the rst pair, sinh x, has a zero.
Bessel functions
The comparison theorem can sometimes be applied to obtain useful properties of solu-
tions. For instance the equation for an ordinary Bessel function of order is
d
2
y
dx
2
+
1
x
dy
dx
+
_
1

2
x
2
_
y = 0, (13.42)
which can be written in the normal form, exercise 2.31 (page 74),
d
2
u
dx
2
+
_
1 +
1 4
2
4x
2
_
u = 0 where y(x) =
u(x)

x
, x > 0. (13.43)
362 CHAPTER 13. STURM-LIOUVILLE SYSTEMS
If < 1/2 the function Q
1
(x) = 1 +
1 4
2
4x
2
> 1 so a suitable comparison equation is
v

+ v = 0, that is Q
2
= 1 < Q
1
. A solution of the comparison equation is v = sin x,
with positive zeros at x = n, n = 1, 2, . Hence u(x) has at least one zero in each
of the intervals (n, (n + 1)), n = 1, 2, .
If > 1/2 we can show that the solution has an innity of positive zeros. In this case
Q
1
(x) = 1(4
2
1)/(4x
2
) < 1, so we take the comparison equation to be v

+
2
v = 0,
with 0 < < 1: then for x > x
0
(), where Q
1
(x
0
) =
2
, Q
1
(x) > Q
2
=
2
, and
the comparison theorem shows that there is at least one zero of u(x) in each interval
(n/, (n + 1)/), with n > x
0
; as x , we may chose close to unity.
We end this section by quoting, without proof, a more general comparison theorem,
needed later to obtain approximate positions of the zeros of an eigenfunction. The proof
of this theorem may be found in Birkho and Rota (1962, chapter 10).
Theorem 13.3
Sturms comparison theorem II. For the dierential equations
d
dx
_
p
1
(x)
dy
dx
_
+Q
1
(x)y = 0 and
d
dx
_
p
2
(x)
dy
dx
_
+Q
2
(x)y = 0, a x b,
where p
2
(x) p
1
(x) and Q
2
(x) Q
1
(x) for x (a, b), then if y
1
(x) is a solution of the
rst equation and y
2
(x) any solution of the second equation, between any two adjacent
zeros of y
2
there lies at least one zero of y
1
, except if p
1
= p
2
, Q
1
= Q
2
, for all x [a, b],
and y
1
is a constant multiple of y
2
.
A shorter, approximate, easy to remember version is that as Q(x) increases and/or p(x)
decreases, the number of zeros of every solution increases.
The rst comparison theorem is a direct consequence of this theorem. These the-
orems can be used to show that for a regular Sturm-Liouville system, provided the
eigenfunctions y
n
(x) exist and the eigenvalues satisfy
1
<
2
< <
n
<
n+1
< ,
then the zeros of y
n
(x) interlace and that y
n
(x) has n 1 zeros in (a, b). We outline a
proof that these eigenfunctions exist in section 13.4.3.
Exercise 13.14
Use the Liouville normal form found in exercise 13.2 (page 341) and the comparison
theorem to show that there is a lower bound on the eigenvalues of a regular Sturm-
Liouville system with the boundary conditions y(a) = y(b) = 0.
Exercise 13.15
(a) Show that every solution of the Airy equation y

+ xy = 0 vanishes innitely
often for x > 1 and at most once for x < 0.
(b) Show that if y(x) satises Airys equation, then v(x) = y(ax) satises the
equation v

+a
3
xv = 0.
(c) Show that the Sturm-Liouville system y

+xy = 0, y(0) = y(1) = 0, has an


innite sequence of positive eigenvalues and no negative eigenvalues.
13.4. STURM-LIOUVILLE SYSTEMS 363
13.4.2 Self-adjoint operators
The eigenvalues of a Sturm-Liouville system are real and the eigenfunctions of most
systems are orthogonal. These two important properties follow directly from the form
of the real, dierential operator,
Lf =
d
dx
_
p(x)
df
dx
_
+q(x)f, a x b, (13.44)
which denes the Sturm-Liouville equation.
The rst result we need is Lagranges identity,
v(Lu)

Lv =
d
dx
_
p(x)
_
v
du

dx
u

dv
dx
__
(13.45)
where u and v are any, possibly complex, functions for which both sides of the identity
exist.
Exercise 13.16
Prove Lagranges identity, equation 13.45.
Using the the inner product notation, with unit weight function
9
, (f, g) =
_
b
a
dxf(x)

g(x),
Lagranges identity can be written in the form
(Lu, v) (u, Lv) =
_
p(x)
_
v(x)
du

dx
u(x)

dv
dx
__
b
a
. (13.46)
For some boundary conditions the right-hand side of this equation is zero and then
(Lu, v) = (u, Lv). (13.47)
In this case the operator and the boundary conditions are said to be self-adjoint. It is
important to note that a dierential operator cannot be self-adjoint without appropriate
boundary conditions.
For the homogeneous, separated boundary conditions dened in equation 13.35
(page 358) we have, since A
1
and A
2
are real, and assuming A
2
= 0,
A
1
u(a)

+ A
2
u

(a)

= 0
A
1
v(a) + A
2
v

(a) = 0
=
u

(a)

u(a)

=
v

(a)
v(a)
.
This shows that the boundary term of equation 13.46 is zero at x = a; a similar analysis
shows it to be zero at x = b. If A
2
= 0 then u(a) = v(a) = 0 and the same result follows.
For a singular system, if p(a) = 0 the boundary term at x = a is clearly zero. Thus
for regular and singular systems (Lu, v) = (u, Lv) and the operator L is self-adjoint.
Periodic boundary conditions also make the system self-adjoint, as shown in the next
exercise.
9
There is no agreed version of the inner product notation. That adopted here is normally used in
physics, particularly in quantum mechanics, but in mathematics texts the integrand is often taken to
be f(x)g(x)

. Provided one denition is used consistently the dierence is immaterial.


364 CHAPTER 13. STURM-LIOUVILLE SYSTEMS
Exercise 13.17
Prove that if the boundary conditions are periodic, y(a) = y(b) and y

(a) = y

(b)
and p(a) = p(b), then L is self-adjoint.
Note: periodic boundary conditions are examples of mixed boundary conditions
in which the values of the function, and possibly its derivative, at the two ends of
the range are non-trivially related. Normally mixed boundary conditions produce
operators that are not self-adjoint, exercise 13.20.
Exercise 13.18
In this chapter the operators considered are real but complex operators are often
useful.
Show that on the space of dierentiable functions for which
_

dx|u(x)|
2
exists
the real operator L =
d
dx
is not self-adjoint, but that the complex operator L = iL
is self-adjoint.
In this example there are no boundary conditions: the condition that the integral
_

dx |u(x)|
2
exists means that |u| 0 as x and this plays the role of
the boundary conditions.
Exercise 13.19
Show that the operator L dened by
Ly =
d
2
y
dx
2
+y = 0, y(0) = A, y

() = B,
where , A and B are nonzero constants, is not self-adjoint. This exercise shows
why the boundary conditions need to be homogeneous.
Exercise 13.20
Show that the system Ly = y

+ y = 0, with the mixed boundary conditions,


y(0) = 0, y() = ay

(0), a = 0, is not self-adjoint.


Note in exercise 13.10 (page 353) it was shown that some of the eigenvalues of this
system are complex and that the eigenfunctions are not orthogonal.
The eigenvalues of a self-adjoint operator are real
If (x) is an eigenfunction corresponding to an eigenvalue , then L = w and
(L, ) = (w, ) =

(w, ).
Also
(, L) = (, w) = (, w)
and hence, since w(x) is real,
0 = (L, ) (, L) = (

)
_
b
a
dxw(x)|(x)|
2
.
Since w(x) > 0 and (, )
w
> 0, the right-hand side can be zero only if =

, that is
the eigenvalues of a Sturm-Liouville system are real: this proof is valid for regular and
singular systems and if the boundary conditions are periodic.
13.4. STURM-LIOUVILLE SYSTEMS 365
The eigenfunctions are orthogonal
Now consider two eigenfunctions (x) and (x) corresponding to distinct eigenvalues
and , respectively, that is L = w and L = w. By the self-adjoint property
0 = (L, ) (, L) = (, )
w
+(, )
w
= ( )
_
b
a
dxw(x)(x)

(x).
Since we have assumed that = it follows that
(, )
w
=
_
b
a
dxw(x)(x)

(x) = 0. (13.48)
13.4.3 The oscillation theorem (optional)
In this section we provide a brief outline of a proof that a regular Sturm-Liouville system
possesses a countable innity of eigenfunctions. The nal result is summarised in the
following theorem, which is a consequence of the oscillation theorem, theorem 13.5. In
the remainder of this section we describe the ideas behind the proof of the oscillation
theorem: the details may be found in Birkho and Rota (1962, chapter 10).
Theorem 13.4
The regular Sturm-Liouville system
d
dx
_
p
dy
dx
_
+Q(x)y = 0, Q(x) = q(x) +w(x), a x b, (13.49)
with the separated boundary conditions
A
1
y(a) +A
2
y

(a) = 0 and B
1
y(b) +B
2
y

(b) = 0 (13.50)
has an innite sequence of real eigenvalues
1
<
2
< <
n
<
n+1
< with
lim
n

n
= . The eigenfunction y
n
(x) belonging to the eigenvalue
n
has exactly
n 1 zeros in the interval a < x < b and is determined uniquely up to a constant
multiplicative factor.
The main idea behind the proof outlined here is the Pr ufer substitution, named after
the German mathematician Heinz Pr ufer (1896 1934); this involves using polar coordi-
nates in the Cartesian plane having coordinates (py

, y) to understand how the solution


behaves. Two new dependent variables (r(x), (x)) are dened by the relations
p(x)y

= r cos and y = r sin (13.51)


so that
r
2
= y
2
+p
2
y
2
and tan =
y
py

. (13.52)
Since y and y

cannot simultaneously be zero, r > 0. Notice that y(x) = 0 when


(x) = n, where n is an integer.
First we need the dierential equations for r and . Dierentiating the equation for
tan gives
1
cos
2

d
dx
=
1
p

y(py

(py

)
2
=
1
p
+
_
y
py

_
2
Q
366 CHAPTER 13. STURM-LIOUVILLE SYSTEMS
where we have used the relation (py

= Qy. Multiplying by cos


2
gives
d
dx
= Q(x) sin
2
+
1
p(x)
cos
2
, Q = q(x) +w(x). (13.53)
This rst-order equation for is independent of r, and provided p(x) = 0, it has a
unique solution for every initial value of , that is (a). Further it can be shown that
the solution (x, ) is a continuous function of x and in the intervals a x b and
< < .
The equation for r is found by dierentiating the equation for r
2
and then using the
original equation
r
dr
dx
= y
dy
dx
+py

d(py

)
dx
=
r
2
p
sin cos Qr
2
sin cos .
Hence
dr
dx
=
1
2
_
1
p(x)
Q(x)
_
r sin 2. (13.54)
The two equations 13.53 and 13.54 are equivalent to the original dierential equation
and are named the Pr ufer system assocated with the self-adjoint equation 13.34.
The equation for r can be expressed as an integral
r(x) = r(a) exp
_
1
2
_
x
a
dt
_
1
p(t)
Q(t)
_
sin 2(t)
_
, (13.55)
which can be evaluated once (x) is known; however, we shall not need this equation.
Notice that because the original equation for y is homogeneous the magnitude of r(x)
is unimportant, and is why r(x) depends linearly upon r(a).
The solution of equation 13.53 for (x) depends only upon the initial conditions,
that is the boundary condition A
1
y(a) +A
2
y

(a) = 0, which gives


tan
a
=
A
2
A
1
p(a)
with 0
a
< , (13.56)
and with
a
= /2 if A
1
= 0 The eigenvalues are given by those values of for which

b
= (b, ), satises the equation tan
b
= B
2
/(B
1
p(b)). However, here the main
objective is not to nd the eigenvalues but to rst determine that they exist and second
to determine some of their properties, and for this only the initial condition is required.
It is necessary to understand how (x, ) behaves as a function of x and ; this
behaviour is summarised in the following theorem which is proved in Birkho and Rota
(1962, chapter 10).
Theorem 13.5
The oscillation theorem. The solution of the dierential equation 13.53 satisfying
the initial condition (a, ) =
a
< , for all , is a continuous and strictly monotonic
in for xed x on the interval a < x b. Also
lim

(x, ) = and lim

(x, ) = 0 for a < x b.


This theorem show that y(b, ) = r(b) sin (b, ) has innitely many zeros for > 0,
and hence that there are innitely many eigenfunctions.
13.4. STURM-LIOUVILLE SYSTEMS 367
In order to understand why (x, ) behaves in the manner described in theorem 13.5
we consider two specic examples. The rst is a very simple system with known eigen-
functions; the second example is suciently general to contain all the essential features
of the general case.
The rst system is
d
2
y
dx
2
+y = 0, 0 x , (13.57)
and here p = 1 and Q = , so the equation 13.53 for is
d
dx
= cos
2
+sin
2
, (0) =
0
.
This equation is particularly simple because the right-hand side is independent of x, so
it can be integrated directly, to give
x() =
_

0
d
1
cos
2
+sin
2

. (13.58)
However, this means that it is unrepresentative which is why another example is con-
sidered after the following discussion. We now deduce the qualitative behaviour of the
function (x) from this integral.
If > 0,

(x) > 0 and (x) is a monotonic increasing function of x; the larger the
greater the rate of increase of (x, ). In particular (, ) is an increasing function of
: this is clear from the integral 13.58 because the integrand is positive and for most
values of a decreasing function of . Thus for a given value of x the upper limit,
(x), must increase as increases to compensate for the decreasing magnitude of the
integrand, see exercise 13.21.
If < 0, then (x) > 0 tends to a constant value as x . To see this observe
that

(x) = 0 when =
c
and
c
where 0 <
c
= tan
1
(1/

) < /2, and thus,


if
0
=
c
then (x) =
c
for all x; this solution is stable;
if
0
=
c
then (x) =
c
for all x; this solution is unstable;
if 0
0
<
c
, then

(0) > 0 and (x) increases monotonically to


c
as x ;
if
c
<
0
<
c
, then

(0) < 0 and (x) decreases monotonically to


c
as
x ;
if
c
<
0
< then

(0) > 0 and (x) increases monotonically to


c
+
as x .
This behaviour is shown graphically in gure 13.5, where = 1/4, which gives

c
= 1.107 and graphs of (x, ) are shown for various initial conditions. Figure 13.6
shows the graphs of (x, ), with the same initial condition
0
= 0.6, but various values
368 CHAPTER 13. STURM-LIOUVILLE SYSTEMS
of . Since
c
depends upon ,

(0) > 0 for > 2.14, and

(0) < 0 for < 2.14.


0 0.2 0.4 0.6 0.8 1
0
1
2
3
4
5

c
+
c
x/
(x)
Figure 13.5 Graphs of (x) for = 1/4
and various initial conditions.
0 0.2 0.4 0.6 0.8 1
0
0.5
1
1.5
0
-0.1
-0.5
-1.0
-1.5
-5.0
-2.0

x/
(x)
Figure 13.6 Graphs of (x), for the initial
condition
0
= 0.6, and various negative .
It is clear from these graphs that there can be at most one negative eigenvalue. For the
parameters of gure 13.6,
0
= 0.6, (, ) varies between 0 and tan
1
( + tan
0
) =
1.315, as increases from to 0: if the boundary condition

, at x = , lies in this
range there will be a single eigenvalue for some negative . Otherwise there will be no
negative eigenvalue.
Now restrict attention to the case > 0, where (x, ) increases with x, for xed
, and with for xed x. Graphs of (x, ) for 0 x and various values of are
shown in gure 13.7.
0 0.2 0.4 0.6 0.8 1
0
10
20
30
40
50
60

150
120
100
50
10
1
30
40
250
x /

Figure 13.7 Some representative graphs of (x), dened by equation 13.58


with
0
= 0, for various values of . Using the integral 13.58 it can be shown
that if 1, (x) x

.
The following exercise uses the integral 13.58 to deduce some propoerties of (x, ) for
the dierential equation 13.57 with the boundary conditions y(0) = y() = 0.
Exercise 13.21
(a) For the boundary value problem
y

+y = 0, y(0) = y() = 0,
show that (0, ) = 0 and (, ) = n, for some positive integer n. Use equa-
tion 13.58 to deduce that the value of satisfying this last equation is = n
2
.
Deduce that the nth eigenvalue is n = n
2
and show that its eigenfunction has
n 1 zeros in the interval 0 < x < .
13.4. STURM-LIOUVILLE SYSTEMS 369
(b) If (, ) = () show that
d
d
=
_
cos
2
+sin
2

_
_

0
d
sin
2

(cos
2
+sin
2
)
2
> 0.
(c) Show that lim

(, ) = .
For part (a) you will need the integral
_
/2
0
d
1
a
2
cos
2
+b
2
sin
=

2ab
, a > 0, b > 0.
Now consider a slightly dierent, but more typical problem, for which there is no simple
formula for (x). Consider the eigenvalue problem
d
2
y
dx
2
+xy = 0, y(0) = y(1) = 0, (13.59)
also treated in exerise 13.15. In this example p = 1 and Q = x, so the equation for
is
d
dx
= cos
2
+xsin
2
, (0) = 0, 0 x 1. (13.60)
If > 0,

(x) > 0 and, as before, (x) is a monotonic increasing function of x, with a


greater rate of increase the larger . Further if
2
>
1
, (x,
2
) (x,
1
), as shown by
an application of the theorem for rst-order equations quoted in exercise 13.22. Thus
for > 0 there is little qualitative dierence between this and the previous simpler
example; some representative graphs of (x, ) are depicted in gure 13.8.
0 0.2 0.4 0.6 0.8 1
0
2
4
6
8
10

150
120
100
50
10
1
30
40
x

Figure 13.8 Some representative graphs of (x), dened by equation 13.60


for various values of .
If < 0 the behaviour is not so easy to understand but, nevertheless, is similar to the
simpler example. Put = , with > 0, so the equation for becomes
d
dx
= cos
2
xsin
2
, (0) = 0, 0 x 1. (13.61)
For small x, x
2
< 1 this equation is approximated by

= cos
2
1, so (x) grows
linearly with x, that is (x) = x. The two terms on the right-hand side of equation 13.61
are comparable when 1 = x
3
and near this value of x,

(x) becomes negative and for


large both and x are small, so the equation is approximately
d
dx
= 1 x
2
. (13.62)
370 CHAPTER 13. STURM-LIOUVILLE SYSTEMS
For x
3
> 1 the approximate solution of this equation is the function that makes the
derivative zero, that is x
2
= 1. To see this put x
2
=
1
+, so

= : if > 0,

decreases; if < 0,

increases. In either case the solution moves towards the line


10
x
2
= 1. A more accurate solution in the region x
3
> 1 is found in exercise 13.23.
In gure 13.9 we compare the numerically generated solution of equation 13.61 with
the linear approximation, for x <
1/3
and the approximation x
2
= 1 for larger x,
for the cases = 10 and 100. This comparison conrms the predicted behaviour.
0 0.2 0.4 0.6 0.8 1
0.1
0.2
0.3
0.4
0.5
=10

x x

0 0.2 0.4 0.6 0.8 1


0.05
0.1
0.15
0.2
0.25
=100
Figure 13.9 Graphs of the numerical solution of equation 13.61 and the approximations = x
and x
2
= 1, shown by the dashed lines, for small and larger values of x, respectively. The
boundary x =
1/3
is shown by the arrows.
These graphs and the approximations show that for 1, (1, ) 1/

, and
max
0<x<1
() ()
1/3
; hence for 1, there can be no eigenvalues for the
boundary conditions y(0) = y(1) = 0.
We now apply the same method to the general case

= (q+w) sin
2
+(1/p) cos
2
,
to show that its solutions behave similarly. If is suciently large that is q +w > 0
for a < x < b, then (x) is a monotonic increasing function of x. Further, it can be
shown, see exercise 13.24, that (b, ) increases with and that lim

(b, ) = .
Hence there are innitely many positive eigenvalues with distinct eigenfunctions.
If = 1, with the initial condition (0) = 0, we again see that for small
x, (x) (x a)/p(a). This growth continues until wsin
2
is large enough, that is
w(x)(x a)
2
p(x), and subsequently the equation is approximately
d
dx

1
p(x)
w(x)
2
.
The same reasoning as above shows that the approximate solution is p(x)w(x)
2
= 1,
giving (b, ) (p(b)w(b))
1/2
, that is the variation of (x) is too small for eigenval-
ues to exist, for the boundary conditions y(a) = y(b) = 0: for other boundary conditions
one negative eigenvalue may exist.
Exercise 13.22
In this exercise bounds on the positions of zeros and eigenvalues are obtained for
the Sturm-Liouville system dened by equation 13.49 with the boundary condi-
tions y(a) = y(b) = 0. For this the following comparison theorem for the rst-order
equations y

= F(x, y) is needed.
10
This type of analysis is useful in the study of boundary layer problems, relaxation oscillations and
certain types of limit cycles.
13.4. STURM-LIOUVILLE SYSTEMS 371
Suppose that F(x, y) and G(x, z) satisfy the Lipshitz condition
|F(x, y) G(x, z)| L|y z|, a x b,
on suitable intervals of y and z, for some constant L. If y

= F(x, y) and z

= G(x, z),
with y(a) = z(a), then if F(x, y) G(x, y) for a x b and a suitable domain
of y, it can be shown that y(x) z(x) for a < x b.
Use this theorem with equation 13.53 for (x) to show that the kth zero, x
k
lies
between the limits,
_
p1
q2 +w2

x
k
a
k

_
p2
q1 +w1
.
where p1 p(x) p2, q1 q(x) q2, and w1 w(x) w2. Deduce that n, the
nth eigenvalue satises
p1
w2
_
n
b a
_
2

q2
w2
n
p2
w1
_
n
b a
_
2

q1
w1
.
Exercise 13.23
In equation 13.62 dene a new variable = /, where = 1/

, and show that

(x) = 1 x
2
.
By writing the solution of this equation in the form
(x) = 0(x) +1(x) +
2
2(x) + ,
and equating the coecients of the powers of to zero, show that 0, 1 and 2
satisfy the equations
1 x
2
0
= 0,

0
= 2x01,

1
= x
_

2
1
+ 202
_
and hence show that
(x) =
1

x
+
1
4x
2
+
7
32
3/2
x
7/2
+O(
5/2
).
Exercise 13.24
Use the comparison theorem for rst-order equations quoted in exercise 13.22 to
show that if 2 > 1 then (b, 2) (b, 1).
Exercise 13.25
In this exercise an approximation to the eigenvalues and eigenfunctions for large n
is found. The Liouville transformation, exercise 13.2 (page 341), shows that the
equation
d
dx
_
p
dy
dx
_
+ (q +w) y = 0, a x b,
can be transformed to the equation
d
2
v
d
2
+Q(x)v = 0, Q(, ) =
q
w
A
d
2
d
2
_
1
A
_
+,
where y = A()v(),
(x) =
_
x
a
dx
_
w
p
and A() = (wp)
1/4
.
372 CHAPTER 13. STURM-LIOUVILLE SYSTEMS
(a) Dene the modied Pr ufer transformation
v() = RQ
1/4
cos , v

() = RQ
1/4
sin ,
where we assume Q() > 0 for all , and show that
d
d
=
_
Q
Q

4Q
sin 2 and
d
d
(lnR) =
Q

4Q
cos 2.
(b) Assume that Q is bounded and that max(Q) and show that ()

and R r, where and r are constants, and deduce that with the boundary
conditions y(a) = y(b) = 0 the approximate eigenvalues and eigenfunctions are
n =
_
n
(b)
_
2
and vn() =
r
Q(, n)
1/4
sin
_
n
(b)
_
.
13.5. MISCELLANEOUS EXERCISES 373
13.5 Miscellaneous exercises
Exercise 13.26
For problems dened inside an elliptical region it is sometimes convenient to use
elliptical coordinates dened by
x = cosh ucos v and y = sinh usin v
where is a positive constant, so that
x
2

2
cosh
2
u
+
y
2

2
sinh
2
u
= 1,
and when v changes by 2, with u xed, this equation denes an ellipse.
Any elliptical boundary can be dened by a particular choice of and u = u0,
and the interior of the ellipse if given by 0 u u0, v .
In these coordinates the partial dierential equation
2
+k
2
= 0 becomes
2

2
(cosh 2u cos 2v)
_

u
2
+

2

v
2
_
+k
2
= 0.
By putting = f(u)g(v) and using separation of variables form the two equations
d
2
g
dv
2
+ (a 2q cos 2v) g = 0, q =
1
2
k
2

2
, g(v + 2) = g(v) for all v,
d
2
f
du
2
(a 2q cosh 2u) f = 0.
The rst of these equations is commonly known as Mathieus equation and periodic
solutions exists only for certain values of a(q).
Exercise 13.27
Keplers equation
Show that Keplers equation = u sin u with 0 < 1 can be inverted in
terms of Bessel functions with the formula,
u = + 2

k=1
1
k
J
k
(k) sin k.
Exercise 13.28
Show that the function dened by the integral 13.31 (page 354) satises the dif-
ferential equation 13.28, with = n.
Hint, by dierentiating under the integral sign, show that Bessels equation can
be written in the form
1
2
_

dt
d
dt
_
g(t)e
i(ntxsin t)
_
with g(t) = i
_
n
x
2
+
cos t
t
_
.
Exercise 13.29
Find the eigenvalues and eigenfunctions of the Sturm-Liouville system y

+y = 0,
y(0) = 0, y() = y

() any real .
374 CHAPTER 13. STURM-LIOUVILLE SYSTEMS
Exercise 13.30
Find the self-adjoint form of the equation y

+y

tanx = 0.
Exercise 13.31
Use a comparison theorem to show that the solutions of y

+
x
1 +x
y = 0 have
innitely many zeros for x > 1.
Exercise 13.32
Show that the eigenvalues of the Sturm-Liouville system y

+ y = 0 with the
2-periodic boundary conditions y() = y() and y

() = y

() are n = n
2
,
n = 0, 1, 2, and that for each eigenvalue, except 0, there are two distinct
eigenfunctions, which can be expressed as the real or the complex functions
yn(x) = {cos nx, sin nx} or yn(x) = e
inx
, n = 0, 1, 2, .
Show, also that any linear combination of the pairs e
inx
is also an eigenfunction
with eigenvalue n = n
2
.
Exercise 13.33
(a) Using the new independent variable dened by x = e
t
, show that if B > 1/4
the equation y

(x) +By/x
2
= 0 has innitely many zeros on (1, ).
(b) Show that the equation y

(x) + q(x)y/x
2
= 0 has innitely many zeros on
(1, ) if q(x) > 1/4 for x 1.
Exercise 13.34
Consider the system
x
d
2
y
dx
2
+
dy
dx
+

x
y = 0, x 0.
(a) Show that the self-adjoint form of this equation is
d
dx
_
x
dy
dx
_
+

x
y = 0, x 0,
and determine the intervals on which it is a regular system and on which it is a
singular system.
(b) Show that the normal form of the equation is
d
2
u
dx
2
+
+
1
4
x
2
u = 0, u(x) = y(x)

x,
and determine the intervals on which it is a regular system and on which it is a
singular system.
(c) Find any eigenvalues and eigenfunctions for the boundary conditions y(0) =
y(1) = c, for any c.
(d) Find the eigenvalues and eigenfunctions for the boundary conditions y(a) = y(b) = 0,
0 < a < b.
Exercise 13.35
Use the bounds determined in exercise 13.22 (page 370) to show that the nth
eigenvalue of system y

+ (x
a
+)y = 0, y(0) = y(1) = 0, with a > 0 is bounded
by (n)
2
1 n (n)
2
.
Chapter 14
The Rayleigh-Ritz Method
14.1 Introduction
The approach adopted in this course has been to use a variational principle to obtain a
functional from which the Euler-Lagrange equation is derived. The stationary paths of
the functional are obtained by solving this equation. This approach is not always the
most practical because the Euler-Lagrange equation is usually a nonlinear boundary
value problem, and these are notoriously dicult to solve even numerically. The di-
culties of this approach are compounded if there are two or more independent variables
when the Euler-Lagrange equation becomes a partial dierential equation.
These diculties have led to the development of direct methods which avoid the
need to solve dierential equations by dealing directly with the functional. Starting
with a dierential equation the approach is to nd an associated functional and to use
this to nd approximations to the stationary paths, which are necessarily solutions of
the original dierential equation.
A further renement applies to those functionals for which the stationary paths
are actual minima. The technique, described in section 14.4, shows how to construct
a sequence of stationary paths so that the functional approaches its minimum value
from above: this idea is particularly useful for Sturm-Liouville systems because the
eigenvalues are equal to the value of the functional to be minimised, provided suitable
admissible functions are used.
14.2 Basic ideas
The direct method is very simple and was introduced by Euler before the Euler-Lagrange
equation was discovered. Suppose we require a stationary path of a functional S[y],
with y belonging to a given class of admissible functions. Rather than solving the
Euler-Lagrange equation, we use a restricted set of admissible functions z(x; a), each
member of which is named a trial function, depending upon a set of real variables
a = (a
1
, a
2
, . . . , a
n
). Substituting this into the functional gives a function, S(a) = S[z],
of the n real variables a. The stationary points of S(a) can be determined using the
methods of ordinary calculus and this provides an approximation to the exact stationary
path. An example of this procedure was described in exercise 5.10 (page 156) and there
375
376 CHAPTER 14. THE RAYLEIGH-RITZ METHOD
it was shown how a very simple trial function captured the qualitative features of the
exact solution for the minimum surface of revolution. Another example, described in
section 4.2.1, is Eulers original method whereby smooth paths are approximated by
straight line segments, with the vertex values (y
1
, y
2
, . . . , y
N
), see gure 4.1 (page 122),
playing the part of the parameters a: exercise 5.10 is an example of this method.
Generally there are no rules for choosing the trial function z(x; a), except that it
must be an admissible function, with the choice being guided by intuition and conve-
nience. The number of parameters, n, can be as small as one, or as large as one pleases;
but the larger n, the harder the algebra, though computers are particularly useful for
this type of problem. We illustrate this method with some simple problems.
Consider the functional
S[y] =
_
1
0
dx
_
y
2
y
2
+ 2xy
_
, y(0) = y(1) = 0. (14.1)
The Euler-Lagrange equation is y

+y = x and has the solution


y(x) = x
sin x
sin 1
.
A simple trial function satisfying the boundary conditions is the polynomial z(x; a) =
ax(1 x), having just one free parameter, a. Substituting this into the functional we
obtain the integrals
_
1
0
dxz
2
= a
2
_
1
0
dx(1 2x)
2
=
1
3
a
2
,
_
1
0
dxz
2
= a
2
_
1
0
dxx
2
(1 x)
2
=
1
30
a
2
,
2
_
1
0
dxxz = 2a
_
1
0
dxx
2
(1 x) =
1
6
a,
so that S(a) =
3
10
a
2
+
1
6
a. This is stationary at a = 5/18, giving the approximation
z =
5x
18
(1 x) to y = x
sin x
sin 1
. (14.2)
In the left-hand of gure 14.1 we compare the graphs of the exact and approximate
functions.
0.2 0.4 0.6 0.8 1
-0.08
-0.06
-0.04
-0.02
0
0.2 0.4 0.6 0.8 1
-1
-0.5
0
0.5
1
exact, y(x)
approximation, z(x)
x
x
y
100(y-z)
Figure 14.1 On the left we compare the exact solution of y

+ y = x with the variational


approximation, z(x), dened in equation 14.2. On the right we show the dierence, 100(y z),
between the exact solution and the variational approximation.
14.2. BASIC IDEAS 377
Further thought suggests that this trial function is a poor choice, because the actual
solution is an odd function of x. This can be deduced from the dierential equation
because its right-hand side is odd, so we expect the solution to be odd, for if y(x) were
even, so also is y

(x) and the left-hand side of the equation would be even. Thus a
more sensible trial function is
z(x; a) = ax(1 x
2
), (14.3)
which leads to a = 7/38. This estimate of the solution is very close to the exact
solution as seen in gure 14.2 where we show the graphs of 100(y z): notice that
the dierences are about 10 times smaller than those in gure 14.1, which shows that
a careful choice of trial function can lead to signicantly improved results with little
extra eort.
0.2 0.4 0.6 0.8 1
-0.1
-0.05
0
0.05
0.1
x
100(y-z)
L
R
Figure 14.2 Graph of the dierence 100(y z) between the exact solution and the trial
function dened in equation 14.3. Notice that the dierences are about 10 times smaller
than those in gure 14.1.
A more general odd trial function that satises the boundary conditions is
z(x; a) = x(1 x
2
)
_
a
0
+a
1
x
2
+a
2
x
4
+ +a
n
x
2n
_
, (14.4)
and this has n + 1 parameters.
For the second, slightly more complicated, example we nd an approximate solution
to the nonlinear boundary value problem
d
2
y
dx
2
+x
2
y
2
= x, y(0) = y(2) = 0, (14.5)
whose solution cannot be expressed in terms of elementary functions. The functional
for this equation is
S[y] =
_
2
0
dx
_
1
2
y
2

1
3
x
2
y
3
+xy
_
, y(0) = y(2) = 0. (14.6)
Now we use trial function
z(x; a) = ax(2 x).
The three integrals needed are
1
2
_
2
0
dxz
2
=
4
3
a
2
,
1
3
_
2
0
dxx
2
z
3
=
64
189
a
3
,
_
2
0
dxxz =
4
3
a,
378 CHAPTER 14. THE RAYLEIGH-RITZ METHOD
so that
S(a) =
64
189
a
3
+
4
3
a
2
+
4
3
a and S

(a) =
64
63
a
2
+
8
3
a +
4
3
.
Now there are two stationary paths given by the roots of this quadratic, which we
denote by
a

=
1
16
_

777 21
_
and a
+
=
1
16
_

777 + 21
_
,
which suggests that this nonlinear boundary value problem has two solutions. Nu-
merical calculations, guided by this approximation, conrm this and in gure 14.3 we
compare the above approximate solutions with those given by a numerical calculation.
0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2
-0.5
-0.4
-0.3
-0.2
-0.1
0
y(x)
x
y(x)
x
numerical
approximate
0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2
0
1
2
3
4
a < 0 a > 0
Figure 14.3 A graphical comparison of the numerical solutions of equation 14.5, the solid lines,
and the variational approximation, the dashed lines. On the left is the comparison for a < 0 and
on the right for a > 0.
By substituting a power series into the dierential equation, it can be seen that a better
trial function is z = ax(4 x
2
), because the coecient of the term x
2
is zero, but for
this trial function the integrals are slightly more complicated. We have
1
2
_
2
0
dxz
2
=
64
5
a
2
,
1
3
_
2
0
dxx
2
z
3
=
512
45
a
3
,
_
2
0
dxxz =
64
15
a
so that
S(a) =
512
45
a
3
+
64
5
a
2
+
64
15
a and S

(a) =
512
15
a
2
+
128
5
a +
64
15
and the two stationary paths are given by setting a to the values
a

=
3

17
8
and a
+
=
3 +

17
8
.
In gure 14.4 are compared these approximations with numerically generated solutions
of equation 14.5. For a = a

the trial solution, shown by the circles, is very close to


the exact solution. In both cases the approximations are better, which again illustrates
the value of choosing suitable trial functions.
14.3. EIGENVALUES AND EIGENFUNCTIONS 379
0 0.5 1 1.5 2
-0.5
-0.4
-0.3
-0.2
-0.1
0
y(x)
x
y(x)
x
0 0.5 1 1.5 2
0
0.5
1
1.5
2
2.5
3
approximate
exact
a=a_ < 0
a=a
+
> 0
Figure 14.4 Graphs comparing the exact numerically generated solution of equation 14.5 and
the approximations obtained using the trial function z = ax(1 x
2
), denoted by the circles in the
left panel.
It is worth noting that some black-box numerical methods for solving boundary value
problems give only the solution with a > 0, and provide no inkling that another so-
lution exists. Thus, simple variational calculations, such as described here, can avoid
embarrassing errors; but they give no guarantee that only two, or indeed any, solutions
to this problem exist.
Exercise 14.1
Using the trial function y = 1 ax(1 a)x
2
obtain an approximate solution for
the equation y

+xy = 0, y(0) = 1, y(1) = 0.


Exercise 14.2
(a) Show that the functional associated with the equation y

+y
3
= 0, y(0) = 0,
y

(X) = 0 is
S[y] =
_
X
0
dx
_
1
2
y
2

1
4
y
4
_
, y(0) = 0,
with a natural boundary condition at x = X.
(b) Use the trial function y = a sin(x/(2X)) to nd an approximate solution.
You will need the integral
_
/2
0
du sin
4
u =
3
16
.
14.3 Eigenvalues and eigenfunctions
In this section we show how the rst n eigenvalues and eigenfunctions of a Sturm-
Liouville system can be approximated by solving a set of n linear equations in n vari-
ables, using a method originally due to J W Strutt, third Baron Rayleigh (1842 1919).
The method can start either from the Euler-Lagrange equation or the associated func-
tional, and though it is normally slightly easier to use the Euler-Lagrange equation, see
exercise 14.7; we use the functional because this analysis is needed in the next section
to provide upper bounds to the eigenvalues.
We illustrate the method with the functional
S[y] =
_
1
0
dx
_
py
2
qy
2
_
, y(0) = y(1) = 0, (14.7)
380 CHAPTER 14. THE RAYLEIGH-RITZ METHOD
subject to the constraint
C[y] =
_
1
0
dxy
2
= 1. (14.8)
If is the Lagrange multiplier this leads to the Sturm-Liouville system
d
dx
_
p
dy
dx
_
+ (q +)y = 0, y(0) = y(1) = 0, (14.9)
which we assume to have an innite sequence of real eigenvalues
1
<
2
<
3
, and
associated eigenfunctions y
1
(x), y
2
(x), .
First we need the following relation between the nth eigenfunction, y
n
(x), and its
eigenvalue

n
= S[y
n
] =
_
1
0
dx
_
py
2
n
qy
2
n
_
. (14.10)
This formula is useful because we shall use it, with approximations for y
n
(x), to both
approximate and bound
n
.
Exercise 14.3
By multiplying equation 14.9 by yn and integrating over (0, 1) prove equation 14.10.
Exercise 14.4
If yn(x) is an exact eigenfunction with eigenvalue n and zn = yn + u(x), with
|| 1 and O(u) = 1, is an admissible function, show that
n = S[zn] +O(
2
).
The result derived in exercise 14.4 is important. It shows that if an eigenfunction is
known approximately, with an accuracy O(), then it can be used to approximate the
eigenvalue to an accuracy O(
2
).
For the linear system 14.9 we construct trial functions using a subset of a complete
set of functions {} = {
1
,
2
, }, each of which satises the boundary conditions.
Normally these functions are eigenfunctions of another Sturm-Liouville system and it
is clear that when choosing this system it is sensible to use a system that is similar to
that being studied.
Here we use the complete, orthogonal sequence
k
(x), k = 1, 2, , satisfying
_
1
0
dx
i
(x)
j
(x) = h
i

ij
, (14.11)
with each
i
(x) satisfying the same boundary conditions as the original Sturm-Liouville
system, in this case
i
(0) =
i
(1) = 0. At the end of this analysis we shall use a specic
set of functions by setting
k
= sinkx. A trial function is obtained using a linear
combination of the rst n of these functions
z(x; a) =
n

k=1
a
k

k
(x),
14.3. EIGENVALUES AND EIGENFUNCTIONS 381
and this will provide an approximation to the rst n of the required eigenvalues and
eigenfunctions. The trial function needs to satisfy the constraint C[z] = 1, and this
denes the function
C(a) =
_
1
0
dx
_
n

k=1
a
k

k
(x)
_
2
=
n

k=1
h
k
a
2
k
= 1, (14.12)
where we have used the orthogonal property, equation 14.11. In the space of real
variables a = (a
1
, a
2
, . . . , a
n
) this quadratic function of a, equation 14.12, denes an
n-dimensional ellipsoid. It is convenient to write this constraint in terms of the vector a,
C(a) = a

Ha = 1,
where H is the n n, diagonal matrix with H
kk
= h
k
> 0. The functional S[z] denes
another function of a,
S(a) =
_
1
0
dx
_
_
p(x)
_
n

k=1
a
k

k
(x)
_
2
q(x)
_
n

k=1
a
k

k
(x)
_
2
_
_
, (14.13)
which is also a quadratic form and can be written as
S(a) =
n

i=1
n

j=1
a
i
S
ij
a
j
= a

Sa, (14.14)
where S is a real, symmetric nn matrix, with elements S
ij
. Specically, these matrix
elements are given by
S
ij
=
_
1
0
dx
_
p(x)

i
(x)

j
(x) q(x)
i
(x)
j
(x)
_
. (14.15)
An approximation to the rst n eigenvalues and eigenfunctions of the Euler-Lagrange
equation 14.9 is given by the stationary values of S(a), subject to the constraint 14.12;
this is a conventional constrained stationary problem, dealt with in section 11.2. If
is the Lagrange multiplier for this problem the auxiliary function is
S(a) = S(a) C(a)
=
n

i=1
n

j=1
a
i
S
ij
a
j

i=1
h
i
a
2
i
= a

Sa a

Ha. (14.16)
The stationary values of S(a) are given by the solutions of S(a)/a
i
= 0, i = 1, 2, , n,
that is, the solution of the matrix eigenvalue equation
Sa = Ha or H
1
Sa = a. (14.17)
That is the stationary points are given by the eigenvectors of H
1
S. Further, since
H
1
S is a real, symmetric matrix its n eigenvalues are real and can be ordered,
1
<

2
< <
n
, and the kth eigenvalue provides an approximation to the kth eigenvalue
of the original Euler-Lagrange equation, as shown next.
382 CHAPTER 14. THE RAYLEIGH-RITZ METHOD
If a
k
is the kth eigenvector of H
1
S with eigenvalue
k
, then assuming that the
associated trial function, z(x; a
k
), is an approximation to the kth eigenfunction of the
Sturm-Liouville system we have, from the result found in exercise 14.4,

k
S[z] = S(a
k
) = a

k
Sa
k
=
k
a

k
Ha
k
=
k
.
In the next section we shall show that
1

1
, so that
1
gives an upper bound to the
lowest eigenvalue,
1
.
An example using the orthogonal set
k
(x) = sin kx
For the interval (0, 1) and boundary conditions y(0) = y(1) = 0, a convenient orthogonal
set is
k
(x) = sin kx. For this set h
k
= 1/2 for all k, and the matrix elements of H
1
S
become
(H
1
S)
ij
= 2
2
ij
_
1
0
dx
_
p(x) cos ixcos jx
_
2
_
1
0
dx
_
q(x) sin ixsin jx
_
.
Now we apply this approximation to the particular eigenvalue problem
d
2
y
dx
2
+ (x +)y = 0, y(0) = y(1) = 0, (14.18)
where p(x) = 1, q(x) = x and w(x) = 1. The associated functional and constraint are
S[y] =
_
1
0
dx
_
y
2
xy
2
_
, C[y] =
_
1
0
dxy
2
= 1, y(0) = y(1) = 0. (14.19)
We use the complete set
k
= sin kx, k = 1, 2, , to construct the trial functions,
and the simplest of these is obtained by using only the rst function,
z(x; a
1
) = a
1
sinx.
The constraint gives a
2
1
_
1
0
dx sin
2
x = 1, that is a
2
1
= 2. Thus the rst approximation
for
1
is given by

1
S(a
1
) = a
2
1
_
1
0
dx
_

2
cos
2
x xsin
2
x
_
=
2

1
2
9.3696. (14.20)
That is
1
9.3696. The exact eigenvalues are given by the real solutions of
Ai()Bi(1 ) Ai(1 )Bi() = 0,
where Ai(z) and Bi(z) are the Airy functions which are solutions of Airys equation,
y

xy = 0: to 7 signicant gures the rst eigenvalue is, 9.368507, so the approxima-


tion is larger than this by 0.01%.
Better approximations to
1
are obtained by increasing n, but quickly the algebra
becomes cumbersome, time consuming and error prone. However, because the equa-
tions for a are linear the standard methods of linear algebra may be used and so the
calculations become relatively trivial if a computer is available.
14.3. EIGENVALUES AND EIGENFUNCTIONS 383
Here we limit the calculation to n = 2, which illustrates all the relevant details: for
the sake of brevity the minor details of this calculation are omitted. The trial function
is now
z(x; a
1
, a
2
) = a
1
sin x +a
2
sin 2x
and the constraint, equation 14.12, gives a
2
1
+a
2
2
= 2. The functional is given by
S
2
(a
1
, a
2
) =
_

2
2

1
4
_
a
2
1
+
_
2
2

1
4
_
a
2
2
+
16
9
2
a
1
a
2
.
Hence the matrix eigenvalue equation 14.17 is
_
_
_

1
2
16
9
2
16
9
2
4
2

1
2
_
_
_a = a. (14.21)
Since 16/(9
2
) 1 the eigenvalues of the matrix on the left are approximately
2

1/2 and (2)


2
1/2, giving
1
9.3696 (the previous value) and
2
38.9784; the
eigenvalues of this matrix are actually (9.368509, 38.9795), which compare favourably
with the exact eigenvalues, (9.368507, 38.9787), of equation 14.18.
Exercise 14.5
Consider the eigenvalue problem
y

+ (x
2
+)y = 0, y(0) = y(1) = 0.
(a) Using the orthogonal set
k
(x) = sin kx, k = 1, 2, , show that an upper
bound to the smallest eigenvalue is
1 2
_
1
0
dx
_

2
cos
2
x x
2
sin
2
x
_
=
2

1
3
+
1
2
2
9.59.
(b) Show that the trial function z = ax(1 x) gives the bound 1 68/7 9.71.
Which of these two estimates is closer to the exact value?
Exercise 14.6
Using the complete set of functions
k
(x) = sin(k
1
2
)x, k = 1, 2, , which
satisfy the boundary conditions
k
(0) =

k
(1) = 0, nd approximations to the
rst eigenvalue of the system
d
2
y
dx
2
+ (x +)y = 0, y(0) = y

(1) = 0,
using trial functions with one and with two parameters.
The following integral will be useful
_
1
0
dx xsin
_
n
2
x
_
sin
_
m
2
x
_
=
_

_
1
4
+
1

2
, n = m = 1,
1
4
+
1
9
2
, n = m = 3,

2
, n = 1, m = 3.
384 CHAPTER 14. THE RAYLEIGH-RITZ METHOD
Exercise 14.7
(a) Determine an approximation to the eigenvalues and eigenfunctions of the equa-
tion
d
2
y
dx
2
+ (x +)y = 0, y(0) = y(1) = 0,
by substituting the series
y(x) =
n

k=1
a
k
sin kx
into the equation to form the matrix equation Ma = a where a = (a1, a2, . . . , an)
and M is an n n, real symmetric matrix, with elements
Mij =
2
i
2
ij 2
_
1
0
dxx sin ixsin jx.
(b) Show that for n = 1 and 2 this method gives the approximations 14.20
and 14.21 respectively.
(c) Show that for arbitrary n this method gives the equation 14.17 (page 381)
for a if
k
= sin kx, p(x) = 1 and q(x) = x.
14.4 The Rayleigh-Ritz method
In this section we consider the eigenvalues of Sturm-Liouville systems which are spe-
cial because the associated functionals have strict minima. This property allows the
eigenvalues to be bounded above arbitrarily accurately by a well dened and relatively
(especially with a computer) simple procedure. The method is also applicable to many
important partial dierential equations which is one reason why it is important.
Suppose that the functional S[y] has a minimum value: this means that in the
class of admissible functions, M, S[y] has a greatest lower bound, s and this bound is
achieved by a function in M. There are several technical issues behind this assertion
that we ignore.
The aim is to construct a sequence of functions $\{y_1, y_2, \dots\}$, each in M, such
that $S[y_k] \ge S[y_{k+1}]$, so that $s_k = S[y_k]$ is a decreasing infinite sequence such that
$\lim_{k\to\infty} s_k = s$.
We start with an infinite set of functions $\{\phi\} = \{\phi_1, \phi_2, \dots\}$, each in M, and with
a natural ordering. For instance if the admissible functions are defined on [0, 1] and are
zero at the end points, x = 0 and 1, two typical sequences are
\[
\sin k\pi x, \quad k = 1, 2, 3, \dots,
\]
\[
x(1-x), \quad x(1-x)(1+x), \quad x(1-x)(1+x+x^2), \quad x(1-x)(1+x+x^2+x^3), \ \dots .
\]
From the sequence $\{\phi\}$ a finite dimensional subspace $M_n$ is formed from the first n members
$\{\phi_1, \phi_2, \dots, \phi_n\}$; that is, the set of all the linear combinations
\[
z(x; a) = a_1\phi_1(x) + a_2\phi_2(x) + \cdots + a_n\phi_n(x), \tag{14.22}
\]
where the $a_k$, $k = 1, 2, \dots, n$, are any real numbers. On this subspace S[z] becomes a
function of the real numbers $a = (a_1, a_2, \dots, a_n)$,
\[
S(a) = S[z].
\]
This is exactly as in the previous section; but now we use the fact that the functional
has a minimum.
Choose $(a_1, a_2, \dots, a_n)$ to minimise S(a) and denote this minimum value by $s_n$ and
the associated element of $M_n$ by $y_n$,
\[
s_n = \min_{a}\bigl\{ S(a_1, a_2, \dots, a_n) \bigr\}.
\]
Clearly $s_n$ cannot increase with n because $M_{n+1}$ contains $M_n$, that is, any linear
combination of $\{\phi_1, \phi_2, \dots, \phi_n\}$ is also a linear combination of
$\{\phi_1, \phi_2, \dots, \phi_n, \phi_{n+1}\}$. If the sequence $\{\phi\}$ is complete, then it
can be shown that the sequence $s_n$ converges to s, the minimum value of S[y].
This method of successively approximating a functional using sequences of functions
is essentially that used by Lord Rayleigh, but in 1909 it was put on a rigorous basis by
W Ritz, and is now named the Rayleigh-Ritz method, or sometimes the Ritz method.
For Sturm-Liouville systems the significance of this result is that the smallest eigenvalue
is just the minimum value of the functional, equation 14.10 (page 380).
The smallest eigenvalue of a Sturm-Liouville system
The simplest use of this technique is to estimate the lowest eigenvalue and eigenfunction
of a Sturm-Liouville system.
Consider the functional S[y] and constraint C[y],
\[
S[y] = \int_0^1 dx\,\bigl(p y'^2 - q y^2\bigr), \quad y(0) = y(1) = 0, \qquad
C[y] = \int_0^1 dx\, y^2 = 1.
\]
A suitable subspace $M_n$ is spanned by $\{\sin \pi x, \sin 2\pi x, \dots, \sin n\pi x\}$, giving the linear combination
\[
z(x; a) = \sum_{k=1}^{n} a_k \sin k\pi x.
\]
Then the functional becomes a function S(a),
\[
S(a) = \int_0^1 dx \left\{ \pi^2 p(x)\left(\sum_{k=1}^{n} k a_k \cos k\pi x\right)^{2}
      - q(x)\left(\sum_{k=1}^{n} a_k \sin k\pi x\right)^{2} \right\},
\]
which has a minimum, because S(a) is continuous, therefore bounded above and below,
and the constraint limits each $a_k$ to a finite region, so there is some value of a that
yields the minimum value. Substituting this value for a into S(a) gives an upper bound
$\lambda_1^{(n)}$ for $\lambda_1$,
\[
\lambda_1 \le \lambda_1^{(n)} = S(a). \tag{14.23}
\]
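The minimisation of S(a) can also be carried out numerically without forming a matrix
eigenvalue problem: evaluate S(a) by quadrature and minimise it subject to the constraint.
The sketch below assumes numpy and scipy and, to be concrete, uses the problem of
exercise 14.7 (p = 1, q = x) with the sine basis above; it is an illustration of the procedure,
not part of the text.

\begin{verbatim}
import numpy as np
from scipy.integrate import trapezoid
from scipy.optimize import minimize

n = 3                                    # dimension of the subspace M_n
x = np.linspace(0.0, 1.0, 2001)          # quadrature grid on [0, 1]
k = np.arange(1, n + 1)[:, None]
sin_kx = np.sin(k * np.pi * x)           # rows are the basis functions sin(k pi x)
cos_kx = np.cos(k * np.pi * x)

def S(a):
    """S(a) = int_0^1 (z'^2 - x z^2) dx for z(x; a) = sum_k a_k sin(k pi x)."""
    z = a @ sin_kx
    dz = (a * k[:, 0] * np.pi) @ cos_kx
    return trapezoid(dz**2 - x * z**2, x)

def C(a):
    """Normalisation constraint int_0^1 z^2 dx - 1."""
    return trapezoid((a @ sin_kx)**2, x) - 1.0

a0 = np.zeros(n)
a0[0] = np.sqrt(2.0)                     # the n = 1 minimiser, a feasible starting point
res = minimize(S, a0, method="SLSQP", constraints={"type": "eq", "fun": C})
print(res.fun)                           # s_n, an upper bound for the smallest eigenvalue
\end{verbatim}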
For each $m = 1, 2, \dots$, we similarly obtain an upper bound, $\lambda_1^{(m)}$, for the lowest eigenvalue,
and by the same reasoning as used above, we see that
\[
\lambda_1^{(1)} \ge \lambda_1^{(2)} \ge \cdots \ge \lambda_1^{(m)} \ge \cdots
\quad\text{and}\quad \lim_{m\to\infty} \lambda_1^{(m)} = \lambda_1. \tag{14.24}
\]
Thus the method used in the previous section provides successively closer upper bounds
to the lowest eigenvalue.
A numerical example of this behaviour was seen in the calculation of the smallest
eigenvalue of equation 14.18 (page 382) where we used the trial function
\[
z(x; a) = \sum_{k=1}^{n} a_k \sin k\pi x.
\]
The exact value of this eigenvalue is, to 10 significant figures, 9.368507162: for n = 1, 2
and 3 the variational estimates for $\lambda_1$ are 9.3696, 9.368509 and 9.3685086. With the
trial function
\[
z(x; a) = x(1-x)\bigl(a_0 + a_1 x + a_2 x^2 + \cdots + a_{n-1}x^{n-1}\bigr)
\]
the estimates of this eigenvalue with n = 1, 2, 3 and 4 are 9.5, 9.4989, 9.3687 and
9.368513. As predicted the estimates approach the exact value from above.
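These polynomial estimates can be reproduced with the same finite-dimensional reduction:
for the non-orthogonal basis $\phi_{k+1}(x) = x(1-x)x^{k}$ both the functional and the constraint
become quadratic forms, $a^{T}Aa$ and $a^{T}Ba$, and the stationary values satisfy the
generalized eigenvalue problem $Aa = \lambda Ba$. The sketch below assumes sympy and scipy
and takes the eigenvalue problem to be that of exercise 14.7, which the chapter uses for its
numerical examples.

\begin{verbatim}
import numpy as np
import sympy as sp
from scipy.linalg import eigh

x = sp.symbols('x')

def polynomial_bound(n):
    """Smallest generalized eigenvalue of A a = lambda B a for the basis x(1-x) x^k."""
    phi = [x * (1 - x) * x**k for k in range(n)]
    dphi = [sp.diff(f, x) for f in phi]
    A = np.array([[float(sp.integrate(dphi[i]*dphi[j] - x*phi[i]*phi[j], (x, 0, 1)))
                   for j in range(n)] for i in range(n)])
    B = np.array([[float(sp.integrate(phi[i]*phi[j], (x, 0, 1)))
                   for j in range(n)] for i in range(n)])
    return eigh(A, B, eigvals_only=True)[0]

for n in (1, 2, 3, 4):
    print(n, polynomial_bound(n))   # approximately 9.5, 9.4989, 9.3687, 9.368513
\end{verbatim}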
The Rayleigh-Ritz method can be applied to any functional with a minimum value.
In particular it applies to the general Sturm-Liouville system
\[
S[y] = -\alpha p(a)y(a)^2 + \beta p(b)y(b)^2 + \int_a^b dx\,\bigl(p y'^2 - q y^2\bigr), \tag{14.25}
\]
with natural boundary conditions, and with the constraint
\[
C[y] = \int_a^b dx\, w y^2 = 1 \tag{14.26}
\]
which leads to the Euler-Lagrange equation
\[
\frac{d}{dx}\left(p\frac{dy}{dx}\right) + (q + \lambda w)y = 0, \tag{14.27}
\]
with the separated boundary conditions
\[
\alpha y(a) + y'(a) = 0, \quad\text{and}\quad \beta y(b) + y'(b) = 0,
\]
see exercise 14.8. Provided the integrals exist, the Rayleigh-Ritz method applies to
singular and regular systems. For the boundary conditions y(a) = 0 and/or y(b) = 0
the appropriate boundary term of the functional 14.25 is removed.
For this system a sequence can be found such that the smallest eigenvalue satisfies
the conditions of equation 14.24. Further, the rigorous application of this method proves
the existence of an infinite sequence of eigenvalues and eigenfunctions for both regular
and singular systems, see for instance Fomin and Gelfand (1992, chapter 8) or Courant
and Hilbert (1965, chapter 6).
By adding an additional constraint that forces the admissible functions to be orthogonal
to $y_1(x)$, the eigenfunction associated with the smallest eigenvalue, we obtain
bounds for the next eigenvalue. Thus by considering the system defined by equations
14.25 and 14.26 with the additional constraint
\[
C_1[y, y_1] = \int_a^b dx\, w\, y\, y_1 = 0, \tag{14.28}
\]
and using trial functions z satisfying the two constraints $C[z] = 1$ and $C_1[z, y_1] = 0$, we
obtain another convergent sequence
\[
\lambda_2^{(1)} \ge \lambda_2^{(2)} \ge \cdots \ge \lambda_2^{(m)} \ge \cdots
\quad\text{and}\quad \lim_{m\to\infty} \lambda_2^{(m)} = \lambda_2. \tag{14.29}
\]
By adding further constraints this process can be continued to obtain upper bounds for
any eigenvalue.
Exercise 14.8
(a) Show that the constrained functional with natural boundary conditions
\[
S[y] = -\alpha p(a)y(a)^2 + \beta p(b)y(b)^2 + \int_a^b dx\,\bigl(p y'^2 - q y^2\bigr)
\]
and the constraint
\[
C[y] = \int_a^b dx\, w y^2 = 1
\]
gives rise to the Euler-Lagrange equations
\[
\frac{d}{dx}\left(p\frac{dy}{dx}\right) + (q + \lambda w)y = 0, \quad
\alpha y(a) + y'(a) = 0, \quad \beta y(b) + y'(b) = 0,
\]
where $\lambda$ is the Lagrange multiplier.
(b) Show that if $\lambda_k$ is the eigenvalue of the eigenfunction $y_k$, then $\lambda_k = S[y_k]$.
Exercise 14.9
(a) Show that the eigenvalue problem
\[
\frac{d^2y}{dx^2} + \lambda\sqrt{x}\, y = 0, \quad y(0) = y(1) = 0,
\]
with eigenvalue $\lambda$, can be written as the constrained variational problem with
functional
\[
S[y] = \int_0^1 dx\, y'^2 \quad\text{with the constraint}\quad
C[y] = \int_0^1 dx\, \sqrt{x}\, y^2 = 1,
\]
with admissible functions satisfying y(0) = y(1) = 0.
(b) Using the trial function $z = ax(1-x)$ show that the smallest eigenvalue
satisfies $\lambda_1 \le \dfrac{231}{16}$.
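The bound in part (b) is the Rayleigh quotient $S[z]/C[z]$ and can be confirmed
symbolically; the short sketch below assumes the sympy library and serves only as a check.

\begin{verbatim}
import sympy as sp

x = sp.symbols('x', positive=True)
z = x * (1 - x)                                    # the constant a cancels in the quotient

num = sp.integrate(sp.diff(z, x)**2, (x, 0, 1))    # S[z] = int z'^2 dx
den = sp.integrate(sp.sqrt(x) * z**2, (x, 0, 1))   # C[z] = int sqrt(x) z^2 dx
bound = sp.simplify(num / den)
print(bound, sp.N(bound))                          # 231/16 = 14.4375
\end{verbatim}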
14.5 Miscellaneous exercises
Exercise 14.10
Using the trial function $z = a(1 - x^2)$ show that an upper bound to the smallest
eigenvalue of the system
\[
y'' + \bigl(x^{2p} + \lambda\bigr)y = 0, \quad y(-1) = y(1) = 0,
\]
where p is a positive integer, is given by
\[
\lambda_1 \le \frac{5}{2}\left(1 - \frac{6}{(2p+1)(2p+3)(2p+5)}\right).
\]
Exercise 14.11
(a) Find the eigenvalues and eigenfunctions of the problem
\[
\frac{d^2y}{dx^2} + \lambda y = 0, \quad y'(0) = y(1) = 0.
\]
(b) Use the first eigenfunction of this problem to show that an approximation to
the first eigenvalue of
\[
\frac{d^2y}{dx^2} + \left(b\sin\frac{\pi x}{2} + \lambda\right)y = 0, \quad y'(0) = y(1) = 0,
\quad\text{is}\quad \lambda_1 \simeq \frac{\pi^2}{4} - \frac{4b}{3\pi}.
\]
(c) Show that an approximation to the nth eigenvalue is
\[
\lambda_n \simeq \pi^2\overline{n}^{\,2} - \frac{2b}{\pi}\left(1 - \frac{1}{16\overline{n}^{\,2} - 1}\right),
\quad \overline{n} = n - \frac12.
\]
Hint: use the nth eigenfunction of the system defined in part (a) to construct a
one parameter trial function.
Exercise 14.12
(a) Show that the equation
\[
\frac{d}{dx}\left(\frac{1}{x}\frac{dy}{dx}\right) + \lambda x y = 0, \quad y(1) = 0, \quad y(2) - y'(2) = 0,
\]
is a regular Sturm-Liouville system, associated with the functional and constraint
\[
S[y] = -\frac12 y(2)^2 + \int_1^2 dx\, \frac{y'^2}{x}, \qquad
C[y] = \int_1^2 dx\, x y^2 = 1,
\]
with admissible functions satisfying the conditions y(1) = 0 and y(2) = y'(2).
(b) Using the trial function $z = (x-1)(Ax+B)$, show that the smallest eigenvalue,
$\lambda_1$, satisfies the inequality
\[
\lambda_1 \le \frac{6}{7} + \frac{12}{7}\ln 2.
\]
References
Books and articles referred to in the text
Akhiezer N I 1962 The Calculus of Variations, (Blaisdell Publishing Company, translated from the Russian by A H Frink)
Apostol T M 1963 Mathematical Analysis: A Modern Approach to Advanced Calculus, (Addison-Wesley)
Arnold V I 1973 Ordinary Differential Equations, (The MIT Press)
Ashby N, Brittin W E, Love W F and Wyss W 1975 Brachistochrone with Coulomb Friction, Amer J Physics 43 902-5
Aughton P 2001 Newton's Apple, (Weidenfeld and Nicolson)
Bernstein S N 1912 Sur les équations du calcul des variations, Ann. Sci. École Norm. Sup. 29 431-485
Birkhoff G and Rota G-C 1962 Ordinary Differential Equations, (Blaisdell Publishing Co.)
Brunt, van B 2004 The Calculus of Variations, (Springer)
Courant R and Hilbert D 1937a Methods of Mathematical Physics, Vol 1 (Interscience Publishers Inc)
Courant R and Hilbert D 1937b Methods of Mathematical Physics, Vol 2 (Interscience Publishers Inc)
Davenport J H, Siret Y and Tournier E 1989 Computer Algebra: Systems and Algorithms for Algebraic Computation, (Academic Press)
Gelfand I M and Fomin S V 1963 Calculus of Variations, (Prentice Hall, translated from the Russian by R A Silverman), reprinted 2000 (Dover)
Goldstine H H 1980 A History of the Calculus of Variations from the 17th through the 19th Century, (Springer, New York)
Ince E L 1956 Ordinary Differential Equations, (Dover)
Isenberg C 1992 The Science of Soap Films and Soap Bubbles, (Dover)
Jeffrey A 1990 Linear Algebra and Ordinary Differential Equations, (Blackwell Scientific Publications)
Kolmogorov A N and Fomin S V 1975 Introductory Real Analysis, (Dover)
Landau L D and Lifshitz E M 1959 Fluid Mechanics, (Pergamon)
Lützen J 1990 Joseph Liouville 1809-1882: Master of Pure and Applied Mathematics, (Springer-Verlag)
Piaggio H T H 1968 An Elementary Treatise on Differential Equations, (G Bell and Sons), first published in 1920
Prandtl L 1904 Über Flüssigkeitsbewegung bei sehr kleiner Reibung, Verhandlungen des III. Internationalen Mathematiker-Kongresses, Heidelberg, 1904
Richards E G 1998 Mapping Time, (Oxford University Press)
Rudin W 1976 Principles of Mathematical Analysis, (McGraw-Hill)
Schlichting H 1955 Boundary Layer Theory, (McGraw-Hill, New York)
Simmons G F 1981 Differential Equations, (McGraw-Hill Ltd)
Smith G E 2000 Fluid Resistance: Why Did Newton Change His Mind? Published in The Foundations of Newtonian Scholarship, Eds R H Dalitz and M Nauenberg, (World Scientific)
Sutherland W A 1975 Introduction to Metric and Topological Spaces, (Oxford University Press)
Troutman J L 1983 Variational Calculus with Elementary Convexity, (Springer-Verlag)
Watson G N 1965 A Treatise on the Theory of Bessel Functions, (Cambridge University Press), first published in 1922
Whittaker E T and Watson G N 1965 A Course of Modern Analysis, (Cambridge University Press)
Yoder J G 1988 Unrolling Time, (Cambridge University Press)
Yourgrau W and Mandelstam S 1968 Variational Principles in Dynamics and Quantum Theory, (Pitman)
Books on the Calculus of Variations
The following books have also been used in the preparation of these course notes and
should be consulted for a more detailed study of the subject.
Akhiezer N I 1962 The Calculus of Variations, (Blaisdell Publishing Company, translated from the Russian by A H Frink)
Forsyth A R 1926 Calculus of Variations, (Cambridge University Press), reprinted 1960 (Dover)
Fox C 1963 An Introduction to the Calculus of Variations, (Oxford University Press), reprinted 1987 (Dover)
Gelfand I M and Fomin S V 1963 Calculus of Variations, (Prentice Hall, translated from the Russian by R A Silverman), reprinted 2000 (Dover)
Pars L A 1962 An Introduction to the Calculus of Variations, (Heinemann)
Sagan H 1969 An Introduction to the Calculus of Variations, (General Publishing Company, Canada), reprinted 1992 (Dover)
Troutman J L 1983 Variational Calculus with Elementary Convexity, (Springer-Verlag)
Index
$C^\infty$, 20
$C^n(a, b)$, 20
O-notation, 12
ln x, 35
$D_0$ norm, 124, 137
$D_1$ norm, 124, 137
$f^{-1}(y)$, 17
o-notation, 13
admissible function, 124
Airy's equation, 362
allowed variations, 125
analytic solution, 53, 58
Aristotle, 113
associated Riccati equation, 85
astroid, 244
autonomous equation, 54
auxiliary
  function, 292
  functional, 300
basis functions, 73
beam, loaded, 257, 263
Bernoulli
  Daniel, 356
  James, 52
  John, 52, 62
Bernoulli John, 105, 145, 258
Bernoulli's equation, 65
Bernstein S N, 135
Bernstein's theorem, 135
Bessel F W, 354
Bessel functions, 353, 361
binomial
  coefficients, 21, 35
  expansion, 34
Bois-Reymond, P du, 128, 135
boundary conditions
  mixed, 353, 364
  periodic, 353, 364
  separated, 358
boundary layer, 107
boundary value problem, 56, 121, 135
brachistochrone, 104, 145, 250, 258, 260
  in a resisting medium, 319
  with Coulomb friction, 329
broken extremals, 271
Brownian motion, 18
bubbles, 163
Cam, bridge over, 147
canal equation, 343
cardioid, 244
catenary, 258, 305
catenary equation, 112
Cauchy A C, 53, 61
Cauchy inequality, 41
chain rule, 19
Chartier's theorem, 42
Clairaut's equation, 59
clepsydra, 91
closed interval, 11
closed-form solution, 53, 58
codomain, 10
comparison theorem
  first-order equation, 370
  second-order equation, 360, 362
complete primitive, 55
completeness, 350
conjugate point, 220
  and geodesics, 227
  and lenses, 228
conservation laws, 202
constant of the motion, 198
constraint, 288
constraint, functional, 300
continuous function, 14
corners, 271
coupled equations, 183
critical point, 210
cycloid, 105, 146, 244, 261
  area and length, 148
  pendulum, 148, 171
d'Alembert's paradox, 107
definite integrals, 41
degenerate stationary point, 212
dependent
  functions, 288
  variable, 10
dependent variable, 54
derivative, 18
  partial, 24
  total, 26
Descartes R, 147
Dido, 299
differentiable, 18
differential equation, 54
differentiation of an integral, 43
diffusion equation, 343
direct methods, 375
discontinuity
  removable, 15
  simple, 15
domain, 10
drag coefficient, 106
dual problem, 296
Eddington A S, 98
eigenfunction, 339
eigenvalue, 186, 339
Einstein A, 98
elastic wire, 271
ellipse, 244
elliptical coordinates, 373
Emden-Fowler equation, 204
Emerson W, 51
epicycloid, 253
equation of constraint, 288
Essex J, 146
Euclid, 113, 165
The Euler equation, 80
Euler L, 53, 69, 93, 114, 122, 173, 319, 354
Euler's formula, 28
Euler-Lagrange equation, 121, 129
extrema, local and global, 96
extremal, 125, 130
Fermat P de, 113, 147
Fermat's principle, 112
finite subsidiary conditions, 315
first-integral, 130, 198, 203
fixed singularity, 57
fluent, 52
folium of Descartes, 48
Fourier components, 351
Fourier J B J, 343, 354
Fourier series, 351
Fréchet M, 9
Fredholm I, 9
frustum, 155
functional, 7
  differentiation of, 123
  stationary value of, 125
Fundamental lemma of the Calculus of Variations, 128
Fundamental Theorem of Calculus, 40
Gâteaux differential, 126
Galileo G, 105
general solution, 55
general theory of relativity, 98
geodesic, 98, 247
geodesics and conjugate points, 227
global extrema, 96
Goldschmidt solution, 160
graph, 10
gravitational lensing, 98
great circle, 98, 249
Green G, 343
Hölder inequality, 41
Hamilton W R, 115
hanging cable, 111, 305
heat equation, 343
Heaviside function, 15
Hero of Alexandria, 113
Hessian matrix, 212
Hilbert D, 9
holonomic constraint, 315
homogeneous equation, 55, 63
homogeneous functions, 28
horn equation, 343
Huygens C, 147
implicit function, 29
  theorem, 29
indefinite integral, 41
independent variable, 10, 54
inflection, point of, 211
inhomogeneous equation, 55
initial value problems, 56
inner product, 350, 363
integral
  definite, 41
  differentiation of a parameter, 43
  indefinite, 41
  of oscillatory functions, 42
integral of the motion, 198
integrand, 40
integrating factor, 64
integration
  by parts, 43
  limits, 40
invariant, 74
invariant functional, 201
inverse function, 17
inverse problem, 188
irregular singular point, 73
Isochrone, 105
isoperimetric problem, 110, 299
Jacobi's equation, 220
Jacobian determinant, 30, 179
Kepler's equation, 354, 373
kinetic focus, 228
Kronecker delta, 350
L'Hospital G F A, Marquis de, 38
L'Hospital's rule, 38
Lagrange J-L, 93, 114, 122, 289, 315, 354
Lagrange multiplier, 292
Lagrange's identity, 363
Lalouvère, de A, 147
Lambert's law of absorption, 92
least squares fit, 215
Lebesgue H, 9
Legendre's condition, 218
Leibniz G W, 52, 62, 105
Leibniz's rule, 21
Lemniscate of Bernoulli, 242
lenses and conjugate points, 228
Liénard's equation, 59
linear differential equations, 55
linear independence, 75
Liouville J, 53, 339
Liouville transformation, 342, 371
Lipschitz condition, 371
loaded beam, 257, 263
local
  extrema, 96
  maximum, 210
  minimum, 210
logarithm, natural, 35
Maclaurin C, 31
Mathieu E L, 358
Mathieu's equation, 373
Maupertuis P L N de, 114
maximum point, 210
Mean Value Theorem
  one variable, Cauchy's form, 22
  one variable, integral form, 23
minimal
  moment of inertia, 170
  surface of revolution, 154
minimum point, 210
minimum resistance problem, 106, 277
Minkowski inequality, 41
minor of determinant, 214
mixed derivative rule, 25
monotonic function, 17
Morse H C M, 212
Morse Lemma, 212
movable singularity, 57
natural boundary condition, 257, 260
natural logarithm, 35
navigation problem, 110, 262
Newton I, 39, 52, 105, 109, 146, 258
Newton's problem, 106, 277
Noether E, 202
Noether's theorem, 203
non-autonomous equation, 54
nonlinear equation, 56
nontrivial solution, 73
norm, 11
  on function space, 124
normal form, 74
  Liouville's, 341
normalised functions, 350
open
  ball, 11
  interval, 11
  set, 12
order notation, 12
ordinary point, 73
orthogonal functions, 350
oscillation theorem, 366
Pappus of Alexandria, 111
parametric functional, 244, 268
partial derivative, 24
particular
  integral, 55
  solution, 55
Pascal B, 147
pendulum
  clock, 147
  cycloidal, 148, 171
periodic boundary conditions, 358
piecewise continuous, 15
Plateau J A F, 165
Poisson S D, 343, 354
positive definite matrix, 186
Prüfer system, 366
Prandtl L, 107
principal minor, 214
product rule, 19
quadratic form, 213
quadrature, 53, 187
quotient rule, 19
radius of convergence, 32
radius of curvature, 67
Rayleigh Lord, 379
Rayleigh-Ritz method, 385
rectification, 82
regular singular point, 73
regular Sturm-Liouville, 357
relative extrema, 96
Riccati J F, 66
Riccati's equation, 66, 223
Riemann G F B, 40
Ritz method, 385
Ritz W, 385
Roberval G P de, 147
saddle, 211
scale transformation, 201
Schwarz inequality, 41
Schwarzian derivative, 47, 89
second variation, 215
self-adjoint form, 74, 339
self-adjoint operator, 363
separable equations, 62
separated boundary conditions, 358
separation constant, 344
separation of variables, 62
separation theorem, 360
Sgn function, 15
shortest distance
in a plane, 98
on a cylinder, 118, 318
on a sphere, 98
side-conditions, 315
singular point, 72, 73
singular solution, 55, 59, 83
singular Sturm-Liouville system, 346
Smith R, 146
smooth function, 20
Snell's law, 114
soap films, 163
special functions, 340, 353
speed of light, 112
spherical polar coordinates, 247
stationary
  point, classification, 96
  curve, 94
  functional, 125, 128
  path, 94, 125, 129
  point, 94, 125
  point, degenerate, 212
Steiner J, 166
stiff beam, 257, 263
Stirling J, 31
Stirling's approximation, 33
strictly
  increasing, 17
  monotonic, 17
strong stationary path, 138
strongly positive, 216
structurally stable functions, 211
Strutt J W, 379
Sturm J C F, 53, 339
Sturm-Liouville system, 357
  regular, 357
  singular, 358
sufficiently smooth, 20
supremum norm, 124
surface of revolution
  minimum area, 250, 305, 309
symmetric matrix, 186
tangent line, 18
Tautochrone, 105
Taylor
  polynomials, 31
  series, 31, 37
Taylor B, 31
Torricelli's law, 91
total derivative, 26
transversality condition, 266, 270, 310
trial functions, 375
triangle inequality, 11
trigonometric series, 351
trivial solution, 55, 73
trochoid, 146, 253
uncoupled equations, 185
undetermined multiplier, 292
variable end points, 110, 265
variation of parameters, 64
variational equation, 227
Wallis J, 147
Water clock, 91
wave equation, 343
weak stationary path, 138
Weierstrass K, 107, 132, 245
Weierstrass-Erdmann conditions, 271, 275, 311
weight function, 350
Wren C, 147
Wronski J H de, 74
Wronskian, 74, 89
Zenodorus, 111