Table of Contents

Introduction ... 8

Chapter I: Manipulation of Infinite Series Expansions ... 11
  1. Geometric Series ... 11
  2. General Infinite Series ... 13
  3. The Integral Comparison Test and Integral Bounds ... 18
  4. The Ratio and Root Tests: Power Series ... 24
  5. The Alternating Series Test and Problems with Conditional Convergence ... 29
  6. Taylor Series ... 33
  7. Integration by Differentiation ... 42
  8. Summary of Chapter I ... 45

Chapter II: Infinite Product Expansions ... 47
  1. General Infinite Products ... 47
  2. Infinite Product Representations of Functions ... 52
  3. Summary of Chapter II ... 65

Chapter III: The Gamma Function (and friends) ... 67
  1. Definitions of the Gamma Function ... 67
  2. Expansions of the Gamma Function and Related Integrals ... 77
  3. The Beta Function ... 84
  4. The Riemann Zeta Function ... 96
  5. Regularization of Integrals ... 105
  6. Summary of Chapter III ... 111

Chapter IV: Asymptotic Expansions ... 113
  1. General Results ... 113
  2. The Method of Steepest Descents ... 123
  3. Summary of Chapter IV ... 133

Chapter V: Complex Analysis ... 135
  1. Functions of a Complex Variable ... 135
  2. The Complex Exponential and Logarithm ... 141
  3. Möbius Transformations ... 152
  4. Complex Limits and Differentiability ... 157
  5. Complex Integration ... 184
  6. Cauchy's Integral Formula ... 199
  7. Taylor's Theorem and Complex Taylor Expansions ... 205
  8. Laurent Series ... 216
  9. Applications to Fluid Mechanics and Electromagnetic Theory ... 230
  10. Residue Theory ... 254
  11. Improper Contour Integrals ... 269
  12. Summation of Series ... 279
  13. Rouché's Theorem and Infinite Product Expansions ... 284
  14. The Behavior of Entire Functions ... 305
  15. Summary of Chapter V ... 313

Chapter VI: Explorations in Sums and Prime Numbers ... 315
  1. Stirling's Dissection ... 315
  2. The Behavior of Prime Numbers ... 321
  3. The Prime Number Theorem ... 329
  4. The Behavior of the Nontrivial Zeros ... 343
  5. Summary of Chapter VI ... 350

Chapter VII: Mathematical Models and Periodic Motion ... 351
  1. Least Squares Fitting ... 351
  2. The First Integral and Conservation of Energy ... 360
  3. The Second Integral and Periodic Motion ... 363
  4. Constant Coefficient Differential Equations ... 371
  5. Summary of Chapter VII ... 378

Chapter VIII: Math in Space ... 379
  1. Circular Motion and the Universal Law of Gravitation ... 379
  2. Angular Momentum and the Effective Potential ... 385
  3. Kepler's Laws ... 389
  4. Natural Scales and the Orbital Equations ... 395
  5. Summary of Chapter VIII ... 403

Chapter IX: Integral Transforms ... 405
  1. The Laplace Transform ... 405
  2. Laplace Transforms of Linear Differential Equations ... 412
  3. The Inverse Laplace Transform and the Bromwich Integral ... 427
  4. The Action of Impulses on Physical Systems ... 446
  5. Convolution Integrals ... 457
  6. The Fourier Transform ... 463
  7. Mellin Transforms ... 469
  8. Summary of Chapter IX ... 479

Chapter X: Probability Distributions ... 481
  1. The Binomial Distribution ... 481
  2. Continuous Probability Distributions ... 486
  3. Multinomial Distributions ... 494
  4. Statistical Mechanics ... 496
  5. The Maxwell-Boltzmann Speed Distribution ... 511
  6. The Ideal Gas Law ... 518
  7. The Birth of Quantum Mechanics ... 522
  8. Summary of Chapter X ... 531

Chapter XI: Finite Dimensional Linear Algebra ... 533
  1. Vector Spaces ... 533
  2. Linear Independence and the Solution of Linear Systems ... 536
  3. Matrix Transforms ... 542
  4. Determinants and Invertibility ... 552
  5. General Vector Spaces: Isomorphisms ... 560
  6. Eigenvalues and Eigenvectors ... 569
  7. Inner Product Spaces ... 589
  8. Summary of Chapter XI ... 602

Chapter XII: Infinidimensional Linear Algebra ... 603
  1. Characterization of Infinidimensional Vector Spaces ... 603
  2. Function Spaces: A Specific Example of an Infinidimensional Vector Space ... 610
  3. Self-Adjoint Transformations ... 620
  4. Function Spaces Without Compact Support ... 639
  5. Summary of Chapter XII ... 649

Chapter XIII: Partial Differential Equations ... 651
  1. The Wave Equation ... 651
  2. Boundary Conditions ... 658
  3. Initial Conditions and the Solution of the Wave Equation ... 662
  4. Waves in Two Dimensions ... 674
  5. The Sturm-Liouville Problem ... 691
  6. Hypergeometric Functions ... 704
  7. Sound Waves ... 709
  8. The Diffusion Equation ... 713
  9. Summary of Chapter XIII ... 715

Chapter XIV: The Wave Equation in a Nonuniform Medium ... 717
  1. Boundaries Between Media ... 717
  2. Unbounded Regions with Different Media ... 729
  3. Summary of Chapter XIV ... 743

Chapter XV: Green's Functions ... 745
  1. Self-Adjoint Form and the Requirements of a Green's Function ... 745
  2. Eigenvalue Problems ... 753
  3. Singular Eigenvalue Problems ... 761
  4. The Refinement ... 772
  5. Numerical Solutions ... 781
  6. Summary of Chapter XV ... 784

Additional Reading ... 785
Answers to Selected Problems ... 787
Index ... 817

Introduction
The purpose of this series of notes is to elucidate many of the mathematical properties
that are important in scientific applications, especially those in physics. These notes cover a
variety of different mathematical techniques, including several special functions and their
integral representations, important approximation methods, complex analysis, mathematical
modeling, the Laplace transform, probability distributions, linear algebra (both finite and
infinidimensional), partial differential equations, and some aspects of Green's functions. I have
tried to explain why each of these techniques works and give examples of their use in all cases.
This set of notes is self-contained in the sense that a student who understands multivariable
calculus should be able to understand all of these notes without requiring a separate reference,
but separate references may be useful in putting a specific result in the context of a different
branch of mathematics that is not completely covered in these notes. Statements about
differential equations are made in several chapters, but you may not completely understand the
full scope of these statements without consulting a text on differential equations. This is,
unfortunately, by necessity, as the notes are already somewhat too long! On the other hand,
you should be able to get the gist of these techniques without having to consult an external
reference.
The first chapter of these notes concerns infinite series expansions of functions. This
topic is extremely broad, and is covered in a lot of detail. You will find more detail about some
aspects of infinite series in chapter 5, but these notes should help you understand the versatility
and importance of series expansions in mathematical analysis. The second chapter concerns
infinite product expansions, which are developed essentially in analogy to infinite series
expansions. These types of expansions are extremely useful in analytic analysis, and find
immense use in proofs of various series results. They are also very useful in proving theorems
on complex functions and their properties, but this aspect is best saved for chapter 5. The third
chapter concerns Euler's Gamma function, which is extremely useful in many branches of
mathematics. The results that I consider to be the most important are covered in detail, but
there are, as always, other results that are either absent or covered only peripherally. The fourth
chapter concerns asymptotic, or semi-convergent, expansions, which are expansions that do not
make sense in the context of the first chapter, but provide very useful approximations in certain
situations. These expansions are extremely valuable in many different fields, and often provide
useful insight into the behavior of complicated systems. Chapter 5 covers complex analysis
fairly comprehensively, and is presented in a theorem-proof format common to many higher-level mathematics texts. This format is different from that contained in all of the other chapters
except the last one, but is appropriate for complex analysis. This chapter is intended to
introduce the reader both to complex analysis and to the formal proof structure of higher
mathematics. Many discussions on this structure and why it is presented in the manner it is can
be found in this chapter; hopefully, these discussions will help the reader understand the ideas
behind rigorous proof. Chapter 6 is sort of optional, and consists of some more advanced
applications of the material in the preceding chapters. It covers a technique used in the
summation of series and the behavior of prime numbers. The topics of complex analysis
developed in chapters 3 and 5 are essential to the material covered in this chapter.
The seventh chapter begins our discussion of mathematical models. It contains a lot of
information on how the parameters of expected mathematical models can be fit to experiment
and the differential equations and integral relations that are used to determine such models.

Chapter 8 concerns a very informative application of these models to the important example of
two bodies gravitationally influencing each other. This chapter also contains a short discussion
of the fundamentals of Newtonian physics and its application to this system. It illustrates how
the analysis of chapter 7 can be applied to systems moving in more than one dimension, at least
in the case of planetary orbits. Some of this chapter is presented in a manner that does not
require calculus, but the latter sections on Kepler's laws and general relativistic modifications to
orbits definitely require calculus. The ninth chapter covers the important concept of integral
transforms, which are extensively used in the solution of differential equations. The Laplace
transform and its applications to differential equations are discussed in detail, and the Fourier
and Mellin transforms are introduced along with some of their properties and applications.
Chapter 10 concerns systems in which the initial conditions are not completely known.
The ultimate outcome of such systems is determined with a probabilistic interpretation of the
system itself. This leads us into thermodynamics and statistical mechanics, both of which are
full-fledged physical theories in their own right. Chapter 11 introduces the concepts of linear
algebra in finitely-many dimensions. This chapter contains the essentials of a one semester
course on linear algebra, and defines all of the salient concepts that would be found in such a
course, at least as far as algebraic results are concerned. Chapter 12 generalizes many of these
concepts to linear vector spaces that have infinitely many dimensions. This topic is rife with
subtleties, and the point of this chapter is really to elucidate these subtleties and indicate when
and where they will be expected to appear, as well as how to handle them when they do.
Chapter 13 illustrates one shining example of the uses of infinidimensional linear
algebra: the solution of linear partial differential equations that are expressed in terms of
variables in which they are separable. This chapter is devoted mainly to the specific example
of the wave equation, which is very important in many physical theories, but the concepts it
illustrates are applicable to a much broader set of partial differential equations. Some of these
equations are derived in this chapter, but there are many others. Chapter 14 is a sort of
supplement to chapter 13. It concerns the treatment of waves that are incident on a boundary
between media with different wave speeds. This analysis is very important in systems that
consist of multiple regions with different physical properties, and is widely applicable to the
propagation of waves both in classical and quantum systems. Chapter 15 is a supplemental
chapter containing the proofs of some of the most important theorems used in chapters 12 through 14.
These results are fundamental to many branches of physics and engineering, but are often not
described in detail in college courses on these subjects. This chapter is really a supplement to
chapter 12, but the consequences of its material are more easily understood once you have a
background in the applications of infinidimensional linear algebra.
This second edition includes two additional chapters, chapters 5 and 9, that I thought
were appropriate for this book. The importance of the material contained in these chapters to a
large number of applications would be difficult to overstate, and the material presented in
chapter 5 in particular is very useful in several other chapters. I hope that these chapters will be
useful to you throughout the course and in future courses. Much of the material contained in
this book is often assumed to be understood by many professors of engineering and physics, and
is rarely explained in detail by these professors. The reason for this is that much of it is quite
involved and these professors do not have time in their courses to elucidate the reasons why it
works. It is my intent to provide this information in the hope that it will help you understand
these treatments when they come up in other discussions.

This second edition also includes problems at the end of each section and a short
summary at the end of each chapter. The answers to selected problems can be found at the back
of the book. I have included explanations when I felt they were appropriate. Not all answers in
the back are to odd problems, and not all odd problems have answers in the back. I have
answered even problems whose solutions I felt would aid the student struggling with that
problem and the related concepts, and have not answered odd problems whose solutions should
be more-or-less obvious. Many of the problems do not include complete answers, but include
enough for you to see how the solution goes and how to determine the left-out information.
Numerical answers and graphs are prime candidates for this exception, though many numerical
answers and many graphs are included. In general, the answers provided are intended to give
you a better understanding of the material; if an answer is either redundant in this sense or
requires only a deft manipulation of a computer algebra system, then it is left out.
These notes are formatted chapter-by-chapter, and no attempt has been made to integrate
the chapters into a whole excepting the page and footnote numbers. Figures are numbered by
chapter, so figure 1 of chapter 4 is not the same as figure 1 of chapter 10. Equations are
referred to by their location in the document and the related context; there are no equation
numbers. I hope that you will find this book useful and enjoy reading about the math contained
within.

Chapter I
Manipulation of Infinite Series Expansions
The purpose of this chapter is to illustrate some of the methods used to manipulate infinite
series expansions, especially Taylor series, and efficiently determine their radii of convergence and
early terms of the expansion. These ideas are not complicated, but require practice in order to get the
technique down. We will see that the techniques involved in these manipulations are very useful in a
variety of fields.
Section I.1: Geometric Series
Let's begin with the geometric expansion, as that is almost certainly the one that is most familiar. The expansion is easy to derive, and was known to the ancients. We begin with the definition
$$S_n \equiv \sum_{k=0}^{n} r^k = 1 + r + r^2 + r^3 + \cdots + r^n.$$
Multiplying by r and subtracting gives
$$S_n - rS_n = (1 - r)S_n = 1 - r^{n+1},$$
so
$$S_n = \frac{1 - r^{n+1}}{1 - r}.$$
This series is extremely special because it can be summed in closed form. The annoying ellipsis is no longer present. We are free to compute the exact value of the sum for any choice of n we like. Taking n very large, we obtain the infinite series
$$S \equiv \sum_{k=0}^{\infty} r^k = \lim_{n\to\infty} \frac{1 - r^{n+1}}{1 - r}.$$
The limit on the right diverges (does not exist) when the modulus of the common ratio r exceeds or equals 1. If $|r| > 1$, the numerator becomes large faster than the denominator and the expression becomes unlimited (goes to infinity) as n increases. If $|r| = 1$, but $r \ne 1$, then the numerator oscillates and the limit again does not exist (but for a different reason). If r = 1, then the expression is undefined. Taking the limit of the expression as r tends to 1 gives the value n + 1 for the sum (as does simply plugging 1 into the original definition of $S_n$), which becomes unbounded as n grows without bound. If, on the other hand, the modulus of r is less than 1, the limit is easily seen to exist and be given by
$$S = \frac{1}{1 - r}.$$
Note that this expression is perfectly well-defined for all $r \ne 1$, while the original sum is only defined for $|r| < 1$. This feature is common: infinite series can often be continued to a region in which the original sum is undefined. This process is called analytic continuation. We will see this on several occasions through the course of this text.
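If you want to see all of this numerically before moving on, a few lines of Python will do; the sketch below (an illustration only, with an arbitrarily chosen ratio) compares the partial sums with the closed form $(1 - r^{n+1})/(1 - r)$ and with the limiting value $1/(1 - r)$.

```python
# Compare partial sums of the geometric series with the closed form and the limit.
r = 0.7                          # any ratio with |r| < 1
limit = 1.0 / (1.0 - r)

S = 0.0
for n in range(51):
    S += r**n                                   # running partial sum S_n
    closed = (1.0 - r**(n + 1)) / (1.0 - r)     # closed form for S_n
    if n % 10 == 0:
        print(f"n={n:2d}   S_n={S:.10f}   closed={closed:.10f}   1/(1-r)={limit:.10f}")
```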
Now that we have an infinite series, what can we do with it? Replacing r with x, we have
$$\sum_{k=0}^{\infty} x^k = \frac{1}{1 - x}; \qquad |x| < 1.$$
Now, x is just a symbol. It represents some number that is less than 1 in modulus. This allows us to
exploit the geometric series, obtaining many different series expansions:

$$\frac{1}{2 - x} = \frac{1}{1 - (x - 1)} = \sum_{n=0}^{\infty} (x - 1)^n; \qquad |x - 1| < 1$$
$$\frac{1}{1 - x^3} = \sum_{n=0}^{\infty} \left(x^3\right)^n = \sum_{n=0}^{\infty} x^{3n}; \qquad |x| < 1$$
$$\frac{1}{1 + x^2} = \frac{1}{1 - (-x^2)} = \sum_{n=0}^{\infty} \left(-x^2\right)^n = \sum_{n=0}^{\infty} (-1)^n x^{2n}; \qquad |x| < 1$$
$$\frac{1}{1 - e^{-x}} = \sum_{n=0}^{\infty} e^{-nx}; \qquad \left|e^{-x}\right| < 1$$
$$\frac{\sqrt{x}}{1 - \sqrt{x}} = \sum_{n=1}^{\infty} x^{n/2} = \sum_{k=1}^{\infty} x^k + \sum_{k=0}^{\infty} x^{k + 1/2} = \frac{x}{1 - x} + \frac{\sqrt{x}}{1 - x} = \frac{x + \sqrt{x}}{1 - x}; \qquad |x| < 1$$
$$\frac{1}{2 - x} = \frac{1}{4 - (x + 2)} = \frac{1}{4}\,\frac{1}{1 - \frac{x + 2}{4}} = \frac{1}{4} \sum_{n=0}^{\infty} \left(\frac{x + 2}{4}\right)^n; \qquad |x + 2| < 4$$
$$\frac{1}{2 - x} = \frac{1}{-3 - (x - 5)} = -\frac{1}{3}\,\frac{1}{1 + \frac{x - 5}{3}} = -\frac{1}{3} \sum_{n=0}^{\infty} (-1)^n \left(\frac{x - 5}{3}\right)^n; \qquad |x - 5| < 3.$$
These expressions are intended to illustrate the depth of use that an expansion can be coaxed to
achieve. We can shift the center easily in this expansion, as illustrated in the first, sixth and seventh
expansions. Of course, this shift is accompanied by a change in the interval on which the expansion is
valid. It is obvious from each of these three convergence intervals that the value x = 2 lies on the
curve of convergence. There is a singularity in the function that limits the convergence radius. The
second expansion is no exception, as x = 1 is a singularity for it. The third expansion is somewhat
different. There are no real values of x for which the original function is singular. Despite this fact,
the series will definitely diverge whenever the modulus of x is equal to or exceeds 1. The reason for
this behavior is that power series like these are not able to hide the consequences of the complex
number system. All of the manipulations we have performed are just as valid for complex numbers x
as they are for real numbers, so the fact that the original function diverges when $x = \pm i$ is enough to cause the series to diverge whenever $|x| \ge 1$. It is important to recognize this fact when manipulating expansions. The complex number system refuses to be ignored. The fourth expansion is somewhat different in that it is not a power series. The powers in the series are taken of $e^{-x}$ rather than x, so the domain of convergence is not a circle. Rather, this expansion converges in the right half-plane, all complex numbers x whose real part is positive. Again, this domain is limited by singularities. The singularities in this function lie at $x = 2\pi i k$, with k an integer. All of these singularities lie along the
imaginary axis, so that becomes the boundary of the convergence domain. The fifth expansion was
included just to show you what people generally do when the character of odd contributions to the sum
differs from that of even contributions. The sum is split into two pieces. In the even piece, n is
written instead as 2k, while in the odd piece n is written as 2k + 1. The resulting sums in k can sometimes then be added together. Note that the expression we are left with is simply $\frac{x + \sqrt{x}}{1 - x}$, which is obviously equal to the original expression (remember conjugates). This expansion converges for all x in the unit disk, $|x| < 1$, but we should keep in mind that the original function has an issue around
zero due to the choice of which square root to use. This original function cannot be continuous on
any domain that includes the origin. That said, the two sides of the equation are the same whenever
the same choice of branch of the square root function (the positive or negative, for lack of a better
term) is made on both sides.
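The role the complex singularities play in the third expansion is also easy to see numerically. The sketch below (just an illustration) sums $\sum_n (-1)^n x^{2n}$ for values of x inside and outside the unit disk; inside, the partial sums settle down to $1/(1 + x^2)$, while outside they grow without bound even though the function itself is perfectly finite there.

```python
# Partial sums of sum_n (-1)^n x^(2n) compared with 1/(1 + x^2).
def partial_sum(x, terms=200):
    return sum((-1)**n * x**(2 * n) for n in range(terms))

for x in (0.5, 0.9, 1.1):
    print(f"x = {x}:  200-term partial sum = {partial_sum(x):.6g},"
          f"  1/(1+x^2) = {1.0 / (1.0 + x * x):.6g}")
```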
Exercises for Section I.1

In problems 1 through 4, find the infinite series expansion of the given function in terms of powers of x.

1. $\dfrac{x + \sqrt{x}}{1 - x^2}$

2. $\dfrac{x}{1 - x^3}$

3. $\dfrac{2 + x + x^2}{1 - x^3}$

4. $\dfrac{3x}{2 - x}$

In problems 5 through 10, find the infinite series expansion of the given function in terms of powers of $x - c$.

5. $\dfrac{2}{3 - x}$;  c = 2

6. $\dfrac{4}{5 - x}$;  c = 4

7. $\dfrac{3}{2 - x}$;  c = 5

8. $\dfrac{6}{5 - x}$;  c = -1

9. $\dfrac{2}{x + 3}$;  c = 3

10. $\dfrac{5}{x + 4}$;  c = 3

Section I.2: General Infinite Series

In order to continue our discussion of infinite series, it is important to have a general idea of what one is and when it can be expected to make sense (converge). This is a fairly complicated process in general, but there are some relatively simple results. A general infinite series is defined in relation to an infinite sequence $\{a_n\}_{n=1}^{\infty}$. This sequence is a set of (in general, complex) numbers indexed by n. The sequence can be given explicitly, as in $a_n = \frac{2}{n^2 + 3}$, or recursively, as in $a_1 = 3$; $a_{n+1} = 4\sqrt{1 + a_n}$. Given the sequence $\{a_n\}_{n=1}^{\infty}$, we define the finite sum
$$S_n \equiv \sum_{k=1}^{n} a_k = a_1 + a_2 + \cdots + a_n.$$
Clearly, our finite sum $S_n$ defines another sequence $\{S_n\}_{n=1}^{\infty}$. The infinite series, $S = \sum_{k=1}^{\infty} a_k$, is said to converge if and only if the sequence $\{S_n\}_{n=1}^{\infty}$ converges. In that case, $S = \lim_{n\to\infty} S_n$. Our discussion of infinite series then reduces to the question, "What properties of the sequence $\{a_n\}_{n=1}^{\infty}$ are necessary and sufficient in order to ensure that the infinite series S converges?"
To tackle this question, we begin by reminding ourselves of the meaning of convergence. We need the limit $\lim_{n\to\infty} S_n$ to exist. Recall that $\lim_{n\to\infty} S_n$ is said to equal S if and only if, given any real number $\epsilon > 0$, one can find a whole number N for which $|S_n - S| < \epsilon$ whenever $n > N$. Simply stated, we require that terms in the sequence $\{S_n\}_{n=1}^{\infty}$ with large enough index n all fall within $\epsilon$ of S, regardless of how small $\epsilon$ is chosen to be. Choosing a smaller value of $\epsilon$ will require us to consider later terms in the expansion (N will be larger), but the statement of the limit ensures that we can always find an N that will satisfy this requirement.
One very simple result follows directly from this reasoning. Suppose that the series converges. Then, given $\epsilon > 0$, we can find a corresponding value of N that makes $|S_n - S| < \epsilon$ whenever $n > N$. Now, choosing $n > N$, we investigate the difference $S_{n+1} - S_n = a_{n+1}$:
$$|a_{n+1}| = |S_{n+1} - S_n| = |S_{n+1} - S + S - S_n| \le |S_{n+1} - S| + |S - S_n| < 2\epsilon.$$
Here, we have used the triangle inequality, which states that $|z + w| \le |z| + |w|$ for any two complex numbers z and w. The proof of this fact is very straightforward; we simply draw a triangle and use the geometric fact that the length of any one side is less than the sum of the lengths of the other two. Complex number addition is accomplished geometrically by forming triangles, so this algebraic property is a direct consequence of the geometric one.¹ This relationship is illustrated in figure 1. Since $n > N$, both of the terms on the right-hand side of the inequality are less than $\epsilon$ and the sum is less than $2\epsilon$. Since $\epsilon$ can be chosen arbitrarily small, the terms in the sequence $\{a_n\}_{n=1}^{\infty}$ must become arbitrarily small in magnitude as n increases in order for the series to converge. This is stated mathematically as, "If the infinite series $\sum_{n=1}^{\infty} a_n$ converges, then $\lim_{n\to\infty} a_n = 0$." The contrapositive of this statement is, "If $\lim_{n\to\infty} a_n \ne 0$, then the infinite series $\sum_{n=1}^{\infty} a_n$ diverges." Thus, we require our original sequence to converge to zero if we want the associated infinite series to converge. It is important to recognize that $\lim_{n\to\infty} a_n = 0$ is not enough to guarantee convergence of $\sum_{n=1}^{\infty} a_n$. There are many sequences whose individual terms go to zero, but their sum does not converge. The present result only allows us to demonstrate that a series diverges; it does not contain the structure necessary to demonstrate convergence. For this reason, it is often called the test for divergence or the nth term test. These subtleties are one reason why we work hard in mathematics to make our proofs rigorous. The present result is pretty obvious because we certainly cannot expect a sum of an infinite number of terms to converge unless the terms get smaller and smaller as we continue to add! However, there are sometimes surprising results in mathematics that do not follow our intuition. A careful study of what properties are needed and why they are needed is required in order to fully understand the implications of the mathematics.
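The harmonic series is the classic illustration of why the converse fails; a short sketch makes the point concrete by printing terms that shrink toward zero alongside partial sums that keep growing.

```python
# Terms of the harmonic series tend to zero, yet the partial sums keep growing.
S = 0.0
for n in range(1, 10**6 + 1):
    S += 1.0 / n
    if n in (10, 100, 10**4, 10**6):
        print(f"n = {n:>8d}   a_n = {1.0/n:.2e}   S_n = {S:.5f}")
```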
Figure 1: The triangle inequality illustrated in the complex plane; the numbers z, w, and z + w form a triangle, and the length of any one side cannot exceed the sum of the lengths of the other two.

¹ This fact is used extensively in chapter 5.
There is a more general result that is related to the test for divergence. Instead of considering the difference between neighboring terms in the sequence $\{S_n\}_{n=1}^{\infty}$, let us consider two terms that are separated by some natural number P. In a manner exactly analogous to that above, we find that this difference is also bounded by $2\epsilon$:
$$\left|\sum_{k=M+1}^{M+P} a_k\right| = |S_{M+P} - S_M| = |S_{M+P} - S + S - S_M| \le |S_{M+P} - S| + |S - S_M| < 2\epsilon.$$
Here, M is a natural number greater than N and P is the number of terms we keep in this tail of the expansion. This result indicates that a convergent series has the property that, starting with the term n = N + 1, we can add as many terms in the sequence as we like and never exceed $2\epsilon$. Not only do neighboring terms in the sequence have to come arbitrarily close to $S_M$, but all subsequent terms also have to come arbitrarily close. This property is known as Cauchy's convergence criterion, and is completely equivalent to our understanding of convergent sequences.²
It turns out not to be easy to develop other general results like this one that are sufficient to guarantee convergence. One usually resorts to a series of tests that can be considered for a specific sequence of terms. The easiest of these are the comparison tests, which compare the convergence of an infinite series to the convergence of a simpler expression. We will consider several of these, most of which will be familiar from calculus. The comparison tests are almost always applied to series consisting only of positive real numbers. The general series $\sum_{n=1}^{\infty} a_n$ is said to converge absolutely if and only if the series $\sum_{n=1}^{\infty} |a_n|$, consisting only of positive numbers, also converges. If this absolute value series converges, then given any real number $\epsilon > 0$ it is possible to find a natural number N for which $\sum_{n=N+1}^{N+P} |a_n| < \epsilon$ for every natural number P. This follows from the definition of convergence, in the form of Cauchy's criterion. On the other hand, the tail of the general sum S must satisfy
$$\left|\sum_{k=N+1}^{N+P} a_k\right| \le \sum_{k=N+1}^{N+P} |a_k|$$
by the triangle inequality. Therefore, the general series $\sum_{n=1}^{\infty} a_n$ will converge if $\sum_{n=1}^{\infty} |a_n|$ does. This is often stated as, "A series that is absolutely convergent is convergent." Using this result, we can often employ comparison tests to determine whether or not a series converges by considering instead the series of absolute values. If it converges, then we know for sure that the original series also does. If it does not, we need to find another approach.
One of the simplest comparison tests to employ is the limit comparison test. This test compares a series whose convergence properties are not known to another series whose convergence properties are well-established. Suppose we are given two series consisting entirely of positive terms, $\sum_{k=1}^{\infty} b_k$ and $\sum_{k=1}^{\infty} c_k$. The limit comparison test holds that if $\lim_{k\to\infty} \frac{b_k}{c_k} = L \ne 0$, then the two series either both converge or both diverge. This test can easily be used to demonstrate the convergence of $\sum_{k=1}^{\infty} \frac{1}{2^k - 1}$ (compare to the convergent geometric series with r = 1/2) or $\sum_{k=1}^{\infty} \frac{2^k + 3}{3^k + 2}$ (compare to the convergent geometric series with r = 2/3). We will also be able to use it to demonstrate divergence once we have some nontrivial divergent series to compare to. Divergent geometric series all fail the test for divergence, so there is no reason to employ a comparison test.

² In order to prove that all sequences satisfying this criterion will converge, we need to appeal to the idea of compactness that is usually taken for granted with complex numbers; for brevity, we will follow this lead.
The proof of the limit comparison test illustrates clearly both why it is true and how rigorous proof can be used to establish it. The statement that $\lim_{k\to\infty} \frac{b_k}{c_k} = L \ne 0$ means that, given any $\epsilon > 0$, we can find a natural number N large enough that $\left|\frac{b_k}{c_k} - L\right| < \epsilon$ whenever k > N. In order to accomplish the proof, we need to manipulate this expression in such a way as to bound the sum we are interested in between two sums associated with the one we know something about. In this vein, we multiply by $c_k$ and arrive at $|b_k - L c_k| < \epsilon c_k$. Now, $b_k - L c_k$ is clearly a real number.³ If its absolute value is less than $\epsilon c_k$, then it must be true that
$$-\epsilon c_k < b_k - L c_k < \epsilon c_k, \qquad\text{or}\qquad (L - \epsilon) c_k < b_k < (L + \epsilon) c_k.$$
Since $L > 0$, we can always choose $\epsilon$ small enough that the terms on both sides are positive. In this way, we have bounded the term $b_k$ between two positive numbers. Summing this inequality from k = M + 1 to k = M + P for any natural numbers P and M > N, we arrive at
$$(L - \epsilon)\sum_{k=M+1}^{M+P} c_k \;<\; \sum_{k=M+1}^{M+P} b_k \;<\; (L + \epsilon)\sum_{k=M+1}^{M+P} c_k.$$
Now, suppose that $\sum_{k=1}^{\infty} c_k$ converges. Then, given any number $\delta > 0$ ($\delta$ is totally unrelated to $\epsilon$), it must be possible to choose M large enough that $\sum_{k=M+1}^{M+P} c_k < \delta$ for all natural numbers P. In this case, we have
$$\sum_{k=M+1}^{M+P} b_k < (L + \epsilon)\,\delta.$$
Since we can choose $\delta$ as small as we wish by increasing the value of M, the tail $\sum_{k=M+1}^{M+P} b_k$ is forced to zero and the series $\sum_{k=1}^{\infty} b_k$ converges. If, on the other hand, $\sum_{k=1}^{\infty} c_k$ diverges, then there are choices of $\delta$ so small that it is not possible to find a value of M for which $\sum_{k=M+1}^{M+P} c_k < \delta$ for all P, i.e. there is a number $\delta$ so small that $\sum_{k=M+1}^{M+P} c_k \ge \delta$ for at least some values of P, no matter how large we choose M to be. In this case, the other side of the inequality gives
$$(L - \epsilon)\,\delta \le \sum_{k=M+1}^{M+P} b_k$$
for at least some values of P, no matter how large we choose M to be. Since it is clearly impossible to find M large enough for the series $\sum b_k$ to satisfy the convergence criterion, it diverges.

³ This result is also true for complex numbers, but this situation requires us to be more careful. We will save this analysis for chapter 5.
It is important to understand both the logic of this theorem and its presentation. The basic idea of why the theorem is true is fairly simple. If $\lim_{k\to\infty} \frac{b_k}{c_k} = L \ne 0$, then $b_k \approx L\, c_k$ when k is large. Convergence or divergence of a series is associated exclusively with the later terms in the sequence we are summing. Any finite number of terms we add together will converge; it is only the infinite number of terms to follow that can lead to a divergence. Since $b_k$ behaves like $c_k$ in this tail of terms, the tail of the series associated with $b_k$ is essentially just a multiple of the tail of the series associated with $c_k$. If the latter converges or diverges, so will the former. This is exactly what the proof of the theorem lays out, but it does so in "mathspeak." The complications associated with inequalities, choices of $\epsilon$, etc., are required to firmly set down what is meant by "behaves like" and "converges." In order to have a strong ground on which to build mathematics, we need to build the structure soundly using arguments based on definitions and logical reasoning. Note that we did not need to have $c_k$ in the denominator; since $L \ne 0$, we can clearly flip the ratio upside down and arrive at exactly the same conclusion. If the limit were to equal zero, the proof becomes much weaker and more complicated to state. We could, for example, conclude that $\sum b_k$ converges whenever $\sum c_k$ does, but the divergence of $\sum c_k$ tells us nothing of the behavior of $\sum b_k$. This illustrates why it is important to understand clearly not only what a theorem says but also the conditions under which it is valid.
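To see the "behaves like" idea in numbers, here is a minimal sketch for the first example above: the ratio of $b_k = 1/(2^k - 1)$ to the geometric term $c_k = (1/2)^k$ settles down to L = 1, so the two tails track each other.

```python
# Limit comparison of b_k = 1/(2^k - 1) against the geometric c_k = (1/2)^k.
b = lambda k: 1.0 / (2**k - 1)
c = lambda k: 0.5**k

for k in (5, 10, 20, 40):
    print(f"k = {k:2d}   b_k / c_k = {b(k) / c(k):.10f}")   # tends to L = 1

print("partial sum of b_k over k = 1..60:", sum(b(k) for k in range(1, 61)))
```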

Exercises for Section I.2

In problems 1 through 16, establish the convergence of the given series. Explain your reasoning clearly.

3n
n

2.

5.

3n 2n

n
5 n
n 1 7 n 4

9.

1.

3.

6.

7n

n n
7 n
n 1 5 2 n 9

7.

5n n 3 4 n

4n
n 1

2n
ln n

10.

11.

n 4 5n n 8 4 n
n
3 n
n 1 7 n 6

14.

n 1

n2

13.

5n
n3

6
n 1

n 1

5n n 5 4 n
ln n n3 5n

n 7 5n n 8 6 n
n
6 n
n 1 7 n 6

4
n 1

3n
3n

4.

n 1

7n
n
n 3 8n

8.

6 n n 8 5n

n
3 n
n 1 7 n 6

2n ln 2 n
n
n3 2n

12.

n 3 5n 4 n
n
n 3 5n
n 1

16.

n 1

15.

7 n ln 3 n n5 6n
9 n n 3 8n
n 1

n 1

n8 3n
n3 3n

17. Consider the general series $\sum_{n=1}^{\infty} \dfrac{n^a\, b^n}{n^c\, d^n + n^e}$, where the parameters a, b, c, d, and e are all non-negative real numbers. For what values of these parameters can we be sure that the series converges from the results of this section? Explain your answer, including which series you are comparing to and why it converges. What happens if a is allowed to take negative values? What about c and e?

18. Consider the general series $\sum_{n=2}^{\infty} \dfrac{a^n \ln^b n}{n^d\, c^n + e^n}$, where the parameters a, b, c, d, and e are all non-negative real numbers and $c \ne e$. For what values of these parameters can we be sure that the series converges from the results of this section? Explain your answer, including which series you are comparing to and why it converges. What happens if b is allowed to take negative values? What happens if d is allowed to take negative values?

Section I.3: The Integral Comparison Test and Integral Bounds

One of the most useful tests for convergence stems from an analogy between infinite series and
improper integrals. This test can often give definite results in cases when many of the other tests fail,
but its usefulness goes far beyond that simple result. In deriving the test, we arrive at a method that
can be used easily to bound the value of many different types of infinite series. Not only do we know
that the series converges, but we can also give an estimate, theoretically to any desired number of decimal places, of its value. In addition, this technique allows us to assess the rate at which a divergent series diverges. It allows us to approximate the value of the sum up to any specified number, no matter how large, so we can determine how the series behaves as the number of terms included
increases. This result is extremely useful in many fields.
Given the series $\sum_{n=1}^{\infty} b_n$ of positive terms $b_n$, we begin by determining a positive, monotonically decreasing function, f(x), whose value at x = n is given by $b_n$: $f(n) = b_n$. This can often be done, as series that converge must consist of contributions that tend to zero as n is increased. As long as the sequence $\{b_n\}_{n=1}^{\infty}$ decreases monotonically to zero as n increases, we can always find such a function. As an example, consider $b_n = \frac{1}{n^2}$. The function in this case is clearly $f(x) = \frac{1}{x^2}$. A graph of the discrete contributions to the sum and the function f(x) is shown in figure 2. The area of each rectangle gives a contribution to the series, so the sum of the rectangle areas is equal to the series $\sum_{n=11}^{20} \frac{1}{n^2}$. It is clear from the figure that this sum is less than the area under the curve y = f(x) from x = 10 to x = 20, so we have
$$\sum_{n=11}^{20} \frac{1}{n^2} < \int_{10}^{20} \frac{dx}{x^2} = \frac{1}{10} - \frac{1}{20} = \frac{1}{20}.$$
Shifting the rectangles over one unit, as illustrated in figure 3, we see that the same sum is greater than the area under the curve from x = 11 to x = 21:
$$\sum_{n=11}^{20} \frac{1}{n^2} > \int_{11}^{21} \frac{dx}{x^2} = \frac{1}{11} - \frac{1}{21} = \frac{10}{231}.$$
Putting this all together, we find that the sum satisfies the inequality
$$\frac{10}{231} = \int_{11}^{21} \frac{dx}{x^2} \;<\; \sum_{n=11}^{20} \frac{1}{n^2} \;<\; \int_{10}^{20} \frac{dx}{x^2} = \frac{1}{20}.$$
Numerically, this evaluates to $0.04329 < 0.0463955 < 0.05$, which is certainly true.
Figure 2: The terms $1/n^2$ for n = 11 through 20 drawn as unit-width rectangles lying below the curve $y = 1/x^2$ on the interval [10, 20].
Figure 3: The same rectangles shifted one unit to the right, now lying above the curve on the interval [11, 21].
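A few lines of Python confirm the numbers quoted above for this small piece of the series (purely a check, nothing more):

```python
# Verify the integral bounds on the sum of 1/n^2 from n = 11 to n = 20.
s = sum(1.0 / n**2 for n in range(11, 21))
lower = 1.0 / 11 - 1.0 / 21      # integral of dx/x^2 from 11 to 21
upper = 1.0 / 10 - 1.0 / 20      # integral of dx/x^2 from 10 to 20
print(f"{lower:.5f} < {s:.7f} < {upper:.5f}")
```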

In general, this analysis leads to the result
$$\int_{N+1}^{M+1} f(x)\,dx \;<\; \sum_{n=N+1}^{M} b_n \;<\; \int_{N}^{M} f(x)\,dx.$$
As in the case of the limit comparison test, this immediately allows us to conclude that the two expressions $\sum_{n=1}^{\infty} b_n$ and $\int^{\infty} f(x)\,dx$ either converge or diverge together. The sum is bounded between two integrals; if the left integral diverges then so does the sum, and if the right integral converges then so does the sum. This integral comparison test is extremely useful in determining the convergence or divergence of many, many series. One of the prime examples of this is the p-series $\sum_{n=1}^{\infty} \frac{1}{n^p}$, which is also known as the Riemann zeta function $\zeta(p)$. Since the antiderivative $\int \frac{dx}{x^p} = \frac{x^{1-p}}{1-p}$ goes to zero as x grows without bound whenever p > 1, we can conclude from the integral comparison test that the series converges whenever p > 1. Conversely, since $\int \frac{dx}{x} = \ln x$ grows without bound when x does, we can conclude that the p-series diverges when p = 1. If p < 1, then the power of x in the antiderivative is positive, so the sum will definitely diverge. These results cannot be obtained as easily from any other technique.
In conjunction with the limit comparison test, they can easily be used to establish the convergence of the series $\sum_{n=1}^{\infty} \frac{n^2 + 1}{n^4 + 1}$, $\sum_{n=1}^{\infty} \frac{3n^2 + 2n + 1}{n^4 + n + 1}$, and $\sum_{n=1}^{\infty} \frac{2n}{n^2\sqrt{n} + 1}$, as well as the divergence of the series $\sum_{n=1}^{\infty} \frac{2n + 1}{n^2 + 3}$, $\sum_{n=1}^{\infty} \frac{3}{n + 2}$, and $\sum_{n=1}^{\infty} \frac{5}{\sqrt{n + 2}}$. The simple way to look at these series is to only pay attention to what happens when n is very large. The first two series behave like $1/n^2$ when n is large, so converge. The third series behaves like $n^{-3/2}$, so converges as well. The fourth and fifth series both behave like 1/n, and the sixth behaves like $1/\sqrt{n}$, so these all diverge. We can also use this technique to assess the convergence or divergence of the series $\sum_{n=2}^{\infty} \frac{1}{n^2 \ln n}$ and $\sum_{n=2}^{\infty} \frac{\ln n}{n}$.⁴ Since the terms of the first series are all less than $1/n^2$ and that series converges, the first series also converges. We can think of this as the use of the limit comparison test when the limit gives zero. Likewise, the terms of the second series are all greater than 1/n. Since that series diverges, the second series also diverges. This can be thought of as a use of the limit comparison test in the case where the limit goes to infinity. Remember that these two limits, zero and infinity, do not allow the direct application of the limit comparison test.

⁴ The initial value of n has been modified in these sums in order to avoid a divergent contribution; this does not affect the ultimate convergence or divergence of the sum, as convergence or divergence is determined exclusively by the behavior of the infinite number of terms that follow.
They are, however, useful in some cases when treated with care. Make sure that you understand why
it is okay to make these judgments in this case.
In order to say more about the behavior of these series, we employ the full power of the integral bounds above. Consider the convergent series $\sum_{n=1}^{\infty} \frac{1}{n^2}$. It will be shown later that this series converges to $\pi^2/6 \approx 1.644934$. In order to obtain an estimate for the value of this series, we first sum a large, but finite, number of terms: $\sum_{n=1}^{300} \frac{1}{n^2} \approx 1.64161$. The remaining tail of the series is bounded,
$$0.003322 \approx \frac{1}{301} = \int_{301}^{\infty} \frac{dx}{x^2} \;<\; \sum_{n=301}^{\infty} \frac{1}{n^2} \;<\; \int_{300}^{\infty} \frac{dx}{x^2} = \frac{1}{300} \approx 0.003333,$$
so, adding the two results, we obtain
$$1.64492854 < \sum_{n=1}^{\infty} \frac{1}{n^2} < 1.64493962,$$
in agreement with the exact result. The maximum possible error associated with our approximation is given by the difference between the two bounds, or approximately $1.11 \times 10^{-5}$. This is pretty good, as the sum of the first 300 terms in the series is approximately $3.3 \times 10^{-3}$ smaller than the infinite series result. Our approximation contains an error which is of the order of the first term we left out of the direct summation, $1/301^2 \approx 1.10374 \times 10^{-5}$, rather than the sum of all of the terms we left out from 301 to infinity. This is not an accident. The slightly modified argument in many books gives the upper bound $\int_{N+1}^{M+1} f(x)\,dx + b_{N+1} - b_{M+1}$ for the sum $\sum_{n=N+1}^{M} b_n$ instead of $\int_{N}^{M} f(x)\,dx$. This upper bound is a more stringent constraint whenever f(x) is concave up (which almost always occurs for large x whenever the series is convergent), but is more tricky to work with in practice. This approach does give us an estimate for the error associated with our approximation technique, however, as the difference between the upper and lower bounds is clearly given by $b_{N+1} - b_{M+1}$. If we are considering an infinite series that converges, then $b_{M+1} \to 0$ and the error is simply $b_{N+1}$, the first term left out of our direct computation of the sum. This is an important result to remember when approximating the value of a series. The error obtained is of the order of the first term we omitted, once we add the integral.
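As a quick check of these numbers, the following sketch reproduces the bracket on $\pi^2/6$ from 300 explicitly summed terms plus the integral bounds on the tail:

```python
import math

# Bracket zeta(2) = pi^2/6 using 300 terms plus integral bounds on the tail.
N = 300
partial = sum(1.0 / n**2 for n in range(1, N + 1))
low = partial + 1.0 / (N + 1)    # add integral of dx/x^2 from N+1 to infinity
high = partial + 1.0 / N         # add integral of dx/x^2 from N to infinity
print(f"{low:.8f} < {math.pi**2 / 6:.8f} < {high:.8f}")
print("maximum possible error:", high - low)
```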

As another example, consider the series $\sum_{n=1}^{\infty} \frac{2n}{n^4 + 1}$. The sum of the first ten terms gives $\sum_{n=1}^{10} \frac{2n}{n^4 + 1} \approx 1.379296$. Our error in approximating this series by integrals is of the order of $\frac{22}{11^4 + 1} \approx 0.0015$; the sum is bounded by
$$1.379296 + \frac{\pi}{2} - \arctan\!\left(11^2\right) \approx 1.387561 \;<\; \sum_{n=1}^{\infty} \frac{2n}{n^4 + 1} \;<\; 1.379296 + \frac{\pi}{2} - \arctan\!\left(10^2\right) \approx 1.389296.$$
The error here is a little larger than the first term left out because we are not using the more stringent result quoted above. The exact value of this series is not easily expressed in closed form, but it is approximately 1.388346, in agreement with our bounds.
If a series does not converge, the techniques used to derive the integral comparison test can still be useful in determining the behavior of the series as the upper limit grows. Consider, for example, the harmonic series $H_N \equiv \sum_{n=1}^{N} \frac{1}{n}$. This series diverges as N is increased, but thinking of the terms we are adding makes this result counterintuitive. Since the terms we are adding tend to zero as n tends to infinity, it seems to the uneducated observer that this series should converge. Let's test this by computing the value of this series for N = one thousand, one million, and one billion. The sum of the first 100 terms is given by $\sum_{n=1}^{100} \frac{1}{n} \approx 5.1873775$. The error in our approximations will be of the order of the first term left out, $\frac{1}{101} \approx 0.0099$. Summing from n = 101 to n = M, we have the bounds
$$\ln\frac{M+1}{101} \;<\; \sum_{n=101}^{M} \frac{1}{n} \;<\; \ln\frac{M}{100}.$$
Adding our sum from 1 to 100 gives the values
$$7.48101 < S_{1000} < 7.48996$$
$$14.38777 < S_{10^6} < 14.39772$$
$$21.29552 < S_{10^9} < 21.30547.$$
These sums are clearly growing, and there does not seem to be an end in sight. Although our only concrete result from the actual series we are interested in contains only the first 100 terms, our bounds remain accurate to 0.01. This demonstrates the power of our approach, as the exact value of $S_{10^9} = 21.3004815\ldots$ takes even the fastest processors a long while to obtain. More sophisticated computer algebra systems, like Mathematica, have the techniques we are discussing built into them in order to speed up processing time. Mathematica, specifically, only computes the first 15 terms in a general series before resorting to approximation techniques like those described above.
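Here is the same bookkeeping in a few lines of Python, reproducing the brackets quoted above from nothing more than the first 100 terms:

```python
import math

# Bound the harmonic number H_N for huge N using H_100 plus integral bounds.
H100 = sum(1.0 / n for n in range(1, 101))
for N in (10**3, 10**6, 10**9):
    low = H100 + math.log((N + 1) / 101)
    high = H100 + math.log(N / 100)
    print(f"N = {N:>10d}:   {low:.5f} < H_N < {high:.5f}")
```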
One very prominent series in number theory is the glacially diverging series $G_N \equiv \sum_{n=2}^{N} \frac{1}{n \ln n}$. This series definitely diverges, as the integral $\int \frac{dx}{x \ln x} = \ln \ln x$ certainly grows without bound as x does. However, the sum of the first billion terms is bounded between 3.82486 and 3.82702. A computer adding one number every second from the beginning of the Universe until now would only have summed the series to approximately 4.5. Adding a trillion terms per second since the beginning of the Universe would bring the value to approximately 5.019. It would require more than $10^{10^{21}}$ terms to sum the series to a number greater than 50. Regardless of the extremely slow pace of its divergence, this series will eventually exceed any number you can think of. This certainly promotes the importance of analytic results!
A more advanced application of this technique that finds direct use in mathematics and indirect use in many different fields involves the determination of the limit of a series minus the term associated with our correction as N tends to infinity. As an example, let's consider the harmonic series $H_N = \sum_{n=1}^{N} \frac{1}{n}$. Using integral bounds and the value of $H_{100}$ given above, we can bound the value of $H_N$ by
$$H_{100} + \ln\frac{N+1}{101} \;<\; H_N \;<\; H_{100} + \ln\frac{N}{100},$$
or
$$0.572257 + \ln(N+1) \;<\; H_N \;<\; 0.582207 + \ln N.$$
Subtracting ln N from both sides gives us
$$0.572257 + \ln\frac{N+1}{N} \;<\; H_N - \ln N \;<\; 0.582207.$$
The lower bound tends to 0.572257 as N increases, so we obtain the important result
$$0.572257 \;\le\; \gamma \equiv \lim_{N\to\infty}\left(H_N - \ln N\right) \;\le\; 0.582207.$$
Here, $\gamma$ is the so-called Euler-Mascheroni constant. It plays a large role in the study of prime numbers and scientific studies of the absorption of light in deep space, and has been called the $\pi$ of number theory. Our result shows that this limit definitely exists and takes some value between 0.572 and 0.583. In fact, its value is 0.5772156649..., again in agreement with our bounds. This same argument shows that $\lim_{N\to\infty}\left(\sum_{n=2}^{N} \frac{1}{n \ln n} - \ln\ln N\right)$ exists and has some value between 0.7936 and 0.7958. The actual value, 0.794679..., is close to the average of these two, 0.7947, which is an important thing to remember. However, this use of averaging removes any of the strict bounds we had on the original series; we cannot directly compute the error in this improved approximation, except to say that it lies within 0.0011 (half the difference between the two bounds) of the exact result. There are ways to make a better approximation of this error, using the first and second derivatives of the function f(x), but their development is more involved and we will not go into them here.
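The $\gamma$ bracket quoted above takes only a couple of lines to reproduce; this sketch performs exactly the subtraction described in the text:

```python
import math

# Bracket the Euler-Mascheroni constant gamma = lim (H_N - ln N) using H_100.
H100 = sum(1.0 / n for n in range(1, 101))
low = H100 - math.log(101)    # limiting value of the lower bound
high = H100 - math.log(100)
print(f"{low:.6f} <= gamma <= {high:.6f}    (gamma = 0.5772156649...)")
```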
The integral comparison test is the most delicate of the tests we will consider. Indeed, it is
difficult to find a test whose sensitivity beats it. Its ability to determine not only the convergence
properties, but also an approximate value, of series makes it an extremely powerful tool. However, the
requirement of determining an antiderivative of the function f (x) often bars us from using it. In many
applications, its sensitivity is unnecessary to achieve the intended goal and it is relegated to treat only
those series that cause other tests to fail. Often, the integral comparison test is simply used to
determine certain base results, like the p-series result, which are then used to employ other tests.
This is certainly the case in calculus, where it definitely plays second fiddle to the test we turn to now.
Exercises for Section I.3

In exercises 1 through 12, establish whether or not the given series converges. Explicitly describe the test(s) you are using.

n 2 3n 2
3
n2 4
n0

1.

4.

n2e n

2.

5.

n 3 4n 3
2
n 1 n 2n 3 n

8.

n 1

7.

10.

n 0

22

7 n 2 3n 2
3n 2n 3
5

3.

ln n 3
2
n
n2

6.

n0

n2

n 4 3n3 n 2 1
7
n 5 2n 5

3n

n ln

n 0

2n 3
n 5
4

n 2 3n 2

n3 ln n
n2

2 n

n 1

n
n0

n 2 3n 5
2n 2 3 ln n

9.

n0

11.

n e

12.

5
n 2n 2 1
3

3n 2

n2

ln n n3 n 2 4

13. Show that the series $\sum_{n=1}^{\infty} \frac{n}{(n^2 + 1)^2}$ converges, then use the sum of the first 100 terms to bound its value. How accurate are your bounds? Give your answer as a percent error expected in your bounding terms.

14. Show that the series $\sum_{n=0}^{\infty} \frac{2n + 5}{1 + n^4}$ converges, then use the sum of the first 100 terms to bound its value. How accurate are your bounds? Give your answer as a percent error expected in your bounding terms.

15. Show that the series $\sum_{n=1}^{\infty} \frac{\cos(1/n)}{n^2}$ converges, then use the sum of the first 100 terms to bound its value. How accurate are your bounds? Give your answer as a percent error expected in your bounding terms. Does the sum of the first thousand terms lie within your bounds? Must it? Explain, and then improve your bounds using this sum instead of the first 100 terms.

16. Show that the series $\sum_{n=1}^{\infty} \frac{e^{1/n}}{n^2}$ converges, then use the sum of the first 100 terms to bound its value. How accurate are your bounds? Give your answer as a percent error expected in your bounding terms. Show that the sum of the first thousand terms lies within your bounds, then improve your bounds using this sum instead of the first 100 terms.

17. Show that the series $\sum_{n=0}^{\infty} \frac{6n + 1}{(2n + 1)^2}$ diverges, and obtain bounds on the sum of the first billion terms of this series using the sum of the first 100 terms. Approximately how many terms must be added in order to obtain a sum greater than 50?

18. Show that the series $\sum_{n=0}^{\infty} \frac{6n}{(n + 2)(5n + 3)}$ diverges, and obtain bounds on the sum of the first billion terms of this series, using the sum of the first 100 terms. Approximately how many terms must be added in order to obtain a sum greater than 50?

19. Show that the series $\sum_{n=2}^{\infty} \frac{1}{n \ln^2 n}$ converges, and use the sum of the first 100 terms to obtain bounds on the sum of the first billion terms of this series. What is the maximum relative error (maximum error divided by value) on your result? Will this relative error improve if you consider the sum of a larger number of terms? What is the relative error in your bounds on the sum of the first trillion terms? How do your errors change if you use the sum of the first thousand terms to bound the value?

20. Find a set of necessary and sufficient conditions on p in order for the series $\sum_{n=2}^{\infty} \frac{1}{n \ln^p n}$ to converge. Can you modify your conditions in such a way that the number p can be complex?

21. Show that, while the series $\sum_{n=1}^{\infty} \frac{1}{\sqrt{n}}$ is divergent, the limit
$$\lim_{N\to\infty}\left(\sum_{n=1}^{N} \frac{1}{\sqrt{n}} - 2\sqrt{N}\right)$$
converges. Use the sum of the first 100 terms in the series to bound the value of this limit.

22. Show that, while the series $\sum_{n=1}^{\infty} \frac{1}{\sqrt[3]{n}}$ is divergent, the limit
$$\lim_{N\to\infty}\left(\sum_{n=1}^{N} \frac{1}{\sqrt[3]{n}} - \frac{3}{2} N^{2/3}\right)$$
converges. Use the sum of the first 100 terms in the series to bound the value of this limit.

Section I.4: The Ratio and Root Tests: Power Series


Two of the most widely used convergence tests are the ratio and root tests. These tests are valuable for their simplicity and their direct relevance to a very important class of series: power series. These series lay the foundations for analytic complex analysis, and the results of the ratio and root tests are fundamental to their development. To begin, suppose that we are interested in summing the sequence $\{b_n\}_{n=1}^{\infty}$ of positive terms that satisfies $\lim_{n\to\infty} \frac{b_{n+1}}{b_n} = r$. This means that, given any $\epsilon > 0$, we can find a natural number N so large that
$$\left|\frac{b_{n+1}}{b_n} - r\right| < \epsilon \qquad\text{whenever } n > N.$$
This implies that the ratio $b_{n+1}/b_n$ lies between $r - \epsilon$ and $r + \epsilon$ whenever n is large enough. The ratio
$$\frac{b_{n+p}}{b_n} = \frac{b_{n+p}}{b_{n+p-1}} \cdot \frac{b_{n+p-1}}{b_{n+p-2}} \cdot \frac{b_{n+p-2}}{b_{n+p-3}} \cdots \frac{b_{n+1}}{b_n}$$
can therefore be bounded as
$$(r - \epsilon)^p < \frac{b_{n+p}}{b_n} < (r + \epsilon)^p$$
for any natural number p, as long as n > N. Multiplying by $b_n$, we obtain
$$b_n (r - \epsilon)^p < b_{n+p} < b_n (r + \epsilon)^p.$$
In this way, we have bounded an arbitrary term in the tail of the sequence. Taking n = M > N and summing from p = 1 to p = P for some natural number P gives us
$$\sum_{p=1}^{P} b_M (r - \epsilon)^p \;<\; \sum_{k=M+1}^{M+P} b_k \;<\; \sum_{p=1}^{P} b_M (r + \epsilon)^p.$$
Now for the tricky part. If r < 1, then the difference between r and 1 is a fixed nonzero number. We can certainly choose the value of $\epsilon > 0$ to be smaller than this number so the sum $r + \epsilon$ is also less than 1. This means that the sums on both sides of the above inequality will converge as P increases since they are geometric series with ratio less than 1. Since the tail of the series we are interested in is bounded between them, it must also converge. If, on the other hand, r > 1, then we can use the same argument to show that the series diverges. The one remaining case is r = 1. In this case, the series on the right diverges and the series on the left converges no matter how small $\epsilon$ is chosen to be. The tail of the series is therefore bounded between $b_M$ and infinity, and no conclusions can be made. Thus, we arrive at the ratio test: given a series $\sum_{n=1}^{N} b_n$, if $\lim_{n\to\infty} \frac{b_{n+1}}{b_n} = r$, then the series converges as N tends to infinity if r < 1 and diverges if r > 1. The test fails if r = 1, and no information is given in this case.
The most useful application of the ratio test involves power series. A power series is a series that depends on a variable, say z, in a specific way. Given a sequence of numbers $\{a_n\}_{n=0}^{\infty}$, we define the function f(z) by the series
$$f(z) = \sum_{n=0}^{\infty} a_n \left(z - z_0\right)^n.$$
This is a power series centered at the point $z_0$. Note that we have started the series at n = 0 instead of 1 as a matter of convention. In general, both z and $a_n$ can be complex numbers. In this case, the terms in the series are not necessarily positive numbers and we cannot directly apply the ratio test. Our earlier result that absolute convergence implies convergence allows us to circumvent this fact for the moment and consider instead the series of positive terms
$$g\left(|z - z_0|\right) = \sum_{n=0}^{\infty} c_n |z - z_0|^n.$$
Here, $|z - z_0|$ is the modulus of $z - z_0$, equal to $\sqrt{x^2 + y^2}$ if $z - z_0 = x + iy$, and $c_n = |a_n|$. There are several possibilities for the behavior of this function, which are made even more obvious by the use of the ratio test. The required limit is
$$\lim_{n\to\infty} \frac{c_{n+1} |z - z_0|^{n+1}}{c_n |z - z_0|^n} = |z - z_0| \lim_{n\to\infty} \frac{c_{n+1}}{c_n} = \frac{|z - z_0|}{R},$$
where I have defined $R \equiv \lim_{n\to\infty} c_n / c_{n+1}$, if it exists. The ratio test indicates that this series converges whenever $|z - z_0| < R$. This domain of convergence is a disk of radius R centered at $z_0$. The ratio test also indicates that the series will diverge whenever $|z - z_0| > R$, so we are left with the circle $|z - z_0| = R$ that the ratio test fails on.


We have just derived several fundamental results in the theory of analytic functions. A function defined by a power series for which R is defined may do one of three things. If R = 0, then the series converges nowhere and the function can only be defined at the single point $z_0$. If R is unbounded, tending toward infinity, then the series converges everywhere in the whole complex plane. Such functions are called entire; exponential functions and polynomials are some examples. Finally, if R is nonzero then the series converges only within a disk centered at $z_0$. It diverges everywhere outside the disk, but its behavior on the circle itself is not constrained. The series $\sum_{n=0}^{\infty} z^n$ converges nowhere on its circle of convergence, the series $\sum_{n=1}^{\infty} \frac{z^n}{n}$ converges at all points on its circle except for one, and the series $\sum_{n=0}^{\infty} \frac{z^n}{n^2 + 1}$ converges everywhere on its circle. We need more sensitive tests to determine the behavior on the circle of convergence, but the ratio test solves the problem for all of the other points in the complex plane. It also explains why the power series for $\frac{1}{1 + x^2}$ centered at x = 0 fails to converge outside $|x| < 1$ even though all real values of x pose no problem for the function: since it cannot converge at $\pm i$, it cannot converge anywhere farther from the center than i.
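The coefficient ratio $c_n/c_{n+1}$ is easy to inspect numerically; the sketch below (a toy illustration with the three kinds of behavior in mind) shows it growing without bound for an entire series and settling near 1 for two series whose radius is 1.

```python
from math import factorial

# Inspect c_n / c_(n+1), whose limit (when it exists) is the radius of convergence R.
cases = {
    "sum z^n / n!      (entire, R -> infinity)": lambda n: 1.0 / factorial(n),
    "sum z^n / n       (R = 1)":                 lambda n: 1.0 / n,
    "sum z^n / (n^2+1) (R = 1)":                 lambda n: 1.0 / (n**2 + 1),
}
for name, c in cases.items():
    for n in (10, 100):
        print(f"{name}:  c_n/c_(n+1) at n = {n:3d} is {c(n) / c(n + 1):.4f}")
```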
Now that we have functions defined as power series, we can ask whether or not we can differentiate or integrate them. This is an important question; many physical problems involve functions that we cannot get in closed form, but can determine a series for. The easiest thing to do is simply differentiate or integrate each term separately, arriving at
$$f'(z) = \sum_{n=1}^{\infty} n\, a_n \left(z - z_0\right)^{n-1}$$
and
$$\int_{z_0}^{z} f(w)\,dw = \sum_{n=0}^{\infty} \frac{a_n \left(z - z_0\right)^{n+1}}{n + 1}.$$
These series both converge whenever $|z - z_0| < R$ by the ratio test, as the additional factors of order n do not affect the ratio in the limit as n tends to infinity. Continuity of all of the functions then implies that they must converge to the proper derivative or integral of the function the original series represented.⁵ We can use this idea to determine series for functions that otherwise may be hard to determine. The fact that $\sum_{n=0}^{\infty} z^n = \frac{1}{1 - z}$ implies that $\sum_{n=1}^{\infty} n z^{n-1} = \frac{1}{(1 - z)^2}$ and $\sum_{n=0}^{\infty} \frac{z^{n+1}}{n + 1} = -\operatorname{Log}(1 - z)$. The second series actually converges on its boundary, except for z = 1, giving us the interesting result
$$\ln 2 = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots.$$
This series converges extremely slowly, and it would be much more efficient to compute the natural logarithm of one half and negate it, but it is still a neat result. Another interesting result comes from integrating the series for $\frac{1}{1 + x^2}$. This results in
$$\frac{\pi}{4} = \sum_{n=0}^{\infty} \frac{(-1)^n}{2n + 1} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots.$$
Try to obtain this result on your own. It is not difficult to do.
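Both of these alternating series are easy to test numerically, and doing so makes their slow convergence obvious; after ten thousand terms they only agree with the exact values to a few digits.

```python
import math

# Partial sums of the two alternating series obtained by integration above.
N = 10_000
ln2_series = sum((-1)**(n + 1) / n for n in range(1, N + 1))
pi4_series = sum((-1)**n / (2 * n + 1) for n in range(N))
print(f"ln 2 : series = {ln2_series:.6f}   exact = {math.log(2):.6f}")
print(f"pi/4 : series = {pi4_series:.6f}   exact = {math.pi / 4:.6f}")
```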
The ratio test is extremely useful for many series, especially power series, but it has its downsides. The ratio test fails for all of the p-series and, in fact, all of the other series we have discussed above except for the geometric series that was instrumental in proving the test. Try it out for yourself to make sure that you understand how this test works and what it looks like when it fails. One of the major downsides of this test, and the reason why it is not often used to prove theorems, lies in the fact that it requires the limit $\lim_{n\to\infty} b_{n+1}/b_n$ to exist. This is actually not at all necessary for the convergence of a series. The series


$$\sum_{n=1}^{\infty} c_n = 1 + 2\left(\frac{2}{3}\right)^{2} + 2\left(\frac{1}{2}\right)^{3} + 2\left(\frac{2}{3}\right)^{4} + 2\left(\frac{1}{2}\right)^{5} + \cdots, \qquad c_n = \begin{cases} 2\,(2/3)^n, & n \text{ even}, \\ 2\,(1/2)^n, & n \text{ odd}, \end{cases}$$

for example, definitely converges (to 44/15; see if you can show this). However, it fails the ratio test
because the required limit does not exist (it equals 0 half the time and infinity the other half,
depending on whether n is even or odd). The integral comparison test also fails for this strange series,
as it is not monotonic. Although this type of weird series has not come up much in applications in the
past, it is simply the sum of two convergent series with different rates of convergence. In order to be
sure that a given series found in a problem converges, we need to have a test that does not fail on these
series.

We will have a stronger argument for this in chapter 5.

26

Section I.4: The Ratio and Root Tests: Power Series

One of the reasons why the ratio test failed on the above series is that it relates two members of
the series. There is no reason to require that neighboring contributions to the series are related to each
other in any way. Essentially any test that involves comparing the behavior of one term to that of
another is doomed to failure in the end. We can always construct a series that converges yet fails the
test. In order to avoid this issue, we need a test that does not compare any two terms to each other.
Rather, it must identify the characteristics of a convergent sequence separately in each of the terms.
One very simple way to do this is to employ the root test. In calculus, this test is introduced as being
associated with the limit
lim n cn .
n

Unfortunately, this does not serve our purpose since it does not exist for many series. The very fact of
imposing a limit restriction on the sequence involves relating different terms in the series to one
another because the values have to approach the same result for all terms in the tail of the sequence if it
is to exist. What we really need is for no terms in the tail of the sequence to buck the convergence
properties of the series. In calculus, we found that the series will converge as long as the above limit
exists and is less than 1. As a matter of fact, however, we really dont care whether or not the limit
exists as long as none of the terms in the tail give results that are greater than or equal to 1. The series
above, for example, does not have a convergent lim n cn since n cn tends either to 2/3 or 1/2
n

depending on whether n is even or odd. Both of these values are less than 1, so the series converges.
In order to formalize this statement mathematically, we need to introduce a new concept of the
limit process. To get away from this requirement that the limit exists, we define the limit superior:

Given a sequence of positive numbers cn n 1 , we say that the limit superior


lim sup cn L
n

if and only if, given any 0 there are only a finite number of terms in the sequence for which
cn L and an infinite number of terms for which cn L . We can couch this in another way by
saying that there must be a choice of N for which all of the members of the sequence with index larger
than N are less than L ,
cn L n N ,
but there is no choice of N for which
cn L n N .
If there was no such choice of N in the first case, then there would have to be an infinite number of
terms that were larger, in violation of the limit superior statement. If there was such a choice of N in
the second case, then there would clearly not be an infinite number of terms for which cn L ,
again violating the limit superior statement. This limit statement is more general than the standard
limit, as it does not require all of the terms of the sequence to approach the same thing. It simply
limits the maximum value of the tail of the sequence, exactly as we require for convergence. The limit
superior of a sequence is related to the supremum of the sequence in that it is the smallest possible
value that only a finite number of terms are significantly larger than.
To complete the statement of the root test, we consider the limit superior of n cn . Suppose that
this equals r, lim sup
n

large that

cn r . This means that, given any 0 , we can find a natural number N so

cn r whenever n > N. Taking the nth power of both sides, we arrive at


cn r whenever n > N.
n

27

Advanced Mathematical Techniques

If r 1 , then there is a definite distance between r and 1. Choosing as half this distance, we are
guaranteed that r 1 . Summing the inequality from M to M + P, where M is a natural number
larger than N and P is any natural number, we arrive at
M P

nM

M P

nM

The series on the right is the tail of a convergent geometric series, so can be made as small as we like
by choosing M large enough. Since the series

M P

nM

is bounded by this, we can conclude that it must

also converge. Conversely, if r 1 , then there again is a definite difference between r and 1. Again,
we choose as half this difference. Since infinitely many terms are larger than r , which is larger
than 1, we can look only at those terms and arrive at the conclusion that the series definitely diverges.
Finally, if r = 1 then we can do no better than to bound the tail of the series in question between 0 and
infinity. As with the ratio test, the root test fails when the limit superior is equal to 1.
The root test is also tailor-made for power series expansions. It is usually more difficult to
apply than the ratio test, so it often takes a backseat to that test. This is definitely the case in
introductory calculus, where almost all of the series considered are more easily handled by the ratio
test. However, its ability to circumvent the difficulties associated with series like the one above for
which the limit does not exist makes it the prime choice of tests when one is proving a theorem on the
convergence or divergence of series. For normal series that consist of only one convergence or
divergence behavior, it is really only used when the sequence we are summing is entirely a power. As
an example, the series
n2

n 1 1 1 n
would be very difficult to analyze with the ratio test. The root test, on the other hand, immediately
indicates that the series converges. The limit superior in this case is equal to 1 e .

Exercises for Section I.4


In exercises 1 12, determine the radius of convergence of the power series. Indicate which test you
are using and why it works.
1.

k 2 xk

3
k
k 1 3 k

4.

2.

xk

3 k
k
k 1 3 k 2

5.

k 1

7.

x3k
k3

2k

2k

k 2

3k 5 x

8.

k 1

10.

28

k 2 x 3k

(2 ln k 1)
k 1

3.

k x2k

2 k
k
k 1 5 k 2

6.

11.

2k 2 3k 1
2
k 12
k 1

5k

k 1

xk
ln k k 2
3

xk

1 1 k

k 2

9.

x2k

(ln k 3)
k 1

12.

k 2 x3k
k
ln k

k 4

k 2 (2 x) k

1 3 k

k2

Section I.4: The Ratio and Root Tests: Power Series

13. Discuss the convergence properties of the series


14. Discuss the convergence properties of the series

xk

xk
.

k
k 2 ln k

ln

.
k

xk
15. Discuss the convergence properties of the series 11 k . Does it converge when x = 1?
k 1 k
What test can be used to establish this result?
k 2

k2

Section I.5: The Alternating Series Test and Problems with Conditional Convergence
Up until now, we have mainly been considering the absolute convergence of a series. General
series whose terms do not have a definite sign (or, in the case of complex numbers, do not point in the
same direction in the complex plane) are very difficult to analyze and require more sophisticated
techniques. Some of these techniques are illustrated in Arfken and Webber, but there are certainly
others. If the sequence we are summing alternates in sign, however, then there is a very easy test that
we can apply in order to determine whether or not the series converges. Consider the series

(1) n 1
, for example, that was stated earlier to converge to ln 2 . Adding the terms in order, we can

n
n 1
see that the partial sums oscillate back and forth between two points that move inexorably closer
together. The absolute value of the terms we are adding decreases monotonically to zero as n
N
(1) n 1
increases, so, defining H N
, it is clear that
n
n 1
H N 1 H N 1 H N
whenever N is odd and
H N H N 1 H N 1
whenever N is even. If this is confusing to you, try plugging in some values like N = 5, 6, 7, etc, to see
how it works in practice. This bounds the next term in the sequence of partial sums between the last
two terms. These terms only differ by 1 N , which goes to zero as N increases, so the two bounds
approach each other as we move farther down the sequence. This is enough to imply that the series
converges.
This idea is formalized in the statement of the alternating series test: Given a sequence of

positive terms cn n 1 that (1) is monotonically decreasing when n is large enough and (2) tends to
zero as n increases without bound, the series

(1)
n 1

n 1

cn converges. This test is very useful when the

(1) n1
(1) n 1 n 2 n
and 3
2
n
n 1
n 1 3n 20n 2
both converge. The second series does not consist of terms whose absolute value monotonically
decreases over the entire range of n, but its terms do eventually decrease monotonically. Remember
that any finite number of terms can do anything they want. It is only the behavior of the tail of the
series, associated with very large values of n, that determines convergence or divergence.

series is of the appropriate form, as it immediately indicates that

29

Advanced Mathematical Techniques

While this test is very useful and is definitely true as stated, there is a subtle issue with its

(1) n 1
converges by the alternating series test, but what does that
results. We say that the series
n
n 1
really mean? This series is not absolutely convergent, as the series of absolute values is the divergent
harmonic series. It is therefore the difference between two infinite series that are separately divergent:

N
(1) n 1
1
1 1
1 1 1
N 1
lim
1 .

N
2
1
2
3
5
2 4 6
n
k
k

n 1
k 1
k 0

As you remember from calculus, this is an indeterminant form and can equal anything we want it to. If
we wanted the series to sum to 7 5 , for example, we would simply add the terms in the first series
until the result exceeds 7 5 , then subtract the first term of the second series, then add terms from the
first series until the result exceeds 7 5 again, then subtract the second term from the second series.
Continuing this process, it is obvious that we will end up summing the entire series to 7 5 . This
game can be played with any number, allowing us to conclude that the original series converges to
any number we want it to. The reason why we can do this is that the two series displayed above are
separately divergent. The first series will exceed any number I want if I add its terms without
subtracting the corresponding terms of the second series. This is an interesting result, but it calls into
question the meaning of our earlier statement that the sum of the series is ln 2 .
Looking back at our proof of the alternating series test, we see that the partial sums are
considered in sequential order. We need to add the terms in order if we want to use its result. This is
not an issue for absolutely convergent series, as these do not consist of a difference between two
divergent series, but conditionally convergent series must be added in order if we want to obtain a
meaningful result. These types of conditionally convergent series surprisingly do appear in many
applications. The potential energy of a large crystal consisting of regularly spaced ions is one example
where these series come up, and there are many other examples. This is a minor point, but you should
be aware of it whenever you come across such a series.
One very important result that follows from our analysis of alternating series is that the error in
calculating the value of the infinite series (provided it is summed in the proper order) is always less

(1) n 1
than the magnitude of the first term we omitted. The sum of the first 10 terms in the series
n
n 1
is 0.645635. The first term left out of our calculation is 1 11 0.09091 , so this value is less than
0.09091 away from the exact value of ln 2 . Adding this term allows us to bound the infinite series
result between two numbers:
10
11
(1) n 1
( 1) n 1
0.645635
ln
2
0.736544
.

n
n
n 1
n 1
This bounding process is as useful as that obtained from the integral comparison test above. The
bounds are brought closer together as the number of terms included increases, exactly as in the case of
the integral comparison test. This result is often used to determine the approximate value of many
infinite series, especially those for which we do not have a closed form. It also provides us with a
straightforward method to determine the value of certain irrational constants that otherwise would be
difficult to pin down. Methods like these were used extensively in the 18th and 19th centuries to
determine decimal approximations for many important mathematical constants.
Before turning to the next topic, it will be useful for us to derive an important result on the
product of two series. The details of this result are fairly subtle, so it will be best for us to state it as a
theorem.
30

Section I.5: The Alternating Series Test and Problems with Conditional Convergence

Theorem: Mertens theorem on the product of series.

n 0

n0

Given two convergent series A an and B bn of complex numbers, if at least one of the
two is absolutely convergent, then the product AB is given by the convergent series

AB ak bn k .
n0 k 0

Proof: We will assume that series A is absolutely convergent, without loss of generality. Let
N

k 0

j 0

AN ak , BN b j , and CN ak bn k . We first recognize that the partial series C N can be


n 0 k 0

re-arranged as
N

k 0

nk

k 0

CN ak bn k ak bn k ak BN k .
n 0 k 0

Now, the difference between this finite series and the product AB is given by
N

a B

C N AB

k 0

N k

AB

a B

B B ak AB
k 0

B B AN A

N k

N k

k 0

k 0

a B

Since the sequence Bn n 0 converges to B, given any 0 we must be able to find a value of M for

which Bn B whenever n M . Re-arranging the series as


N M

a B
k 0

N k

k N M 1

ak BN k B ,

for N larger than M , we see that the first contribution is bounded by


N M

a B
k 0

N k

B ak A .
k 0

The series associated with A was assumed to be absolutely convergent, so we are assured that the value
of A is finite. The second contribution can also be bounded by recognizing that BN k B , while not
vanishingly small for these large k values, has a maximum value over all k. Calling this maximum B ,
we have
N

k N M 1

ak BN k B B

k N M 1

ak .

The series on the right is the tail of a convergent series, so can be made smaller than any 0 . Using
the same as that used above, we are guaranteed to be able to find a number M such that
M P

k M

for every natural number P. Taking N M M , therefore, we can bound the difference as

31

Advanced Mathematical Techniques

CN AB

a B
k 0

B B AN A

N M

a B
k 0

N k

N k

N M

a B
k 0

N k

k N M 1

ak BN k B B AN A

k N M 1

ak BN k B B

k N 1

ak

A B B A B B
Since the values of A , B , and B are independent of N, this difference can be made as small as we
like by choosing a smaller and consequently increasing the value of N. This proves the theorem.
This theorem was originally proved by the German mathematician Franz Mertens in the latter
half of the nineteenth century. Notice how delicate it is. We first have to re-arrange the result in order
to consider partial sums of the sequence associated with B because this series may not be absolutely
convergent. These contributions must be summed in order if we are to have any meaningful result at
all. Then, we need to dissect the resulting series in such a way that each of its contributions are
bounded. These contributions are bounded for different reasons, one having to do with the
convergence of Bk to B, and the other having to do with the absolute convergence of the series
associated with A, which is why we needed to separate them. The product of two conditionally
convergent series may not be handled in this way without additional assumptions, and there are many
examples of products of conditionally convergent series that do not satisfy the results of this theorem.
We will use this theorem many, many times in the upcoming sections. Its basic result is simply
that we can multiply infinite series out just like FOIL; the product of two infinite series consists of the
sum of all possible products containing exactly one term from each of the series. Writing this result in
the form found above allows us to organize this multiplication in a manner that will be very useful
below. This property of infinite series is often referred to as the Cauchy product, after the prominent
French mathematician Augustin-Louis Cauchy, as he derived it for the product of two absolutely
convergent series much earlier than Mertens result.
Exercises for Section I.5
In problems 1 8, establish the convergence or divergence of the given series. Explain how your
choice is made.

k 2 (1) k

3
k 1 k 3k 1

2.

(1) k
k 2 ln k

4.

1.

5.

(1)

k 2

7.

(1)
k 1

32

3k 3 4k 1

3
k 2

k2
ln k
2k 3 3k 2 2
k 5
7

2k 1 (1)k

k 1

3.

6.

(1)

(3) k
k 2 2k ln k
4

k 2

8.

(1)
k 2

k 2 3k 4

k 9 2k 3 ln k
ln k
k 1

Section I.5: The Alternating Series Test and Problems with Conditional Convergence

9. Show that the series

(1) k k

converges, and use the first 100 terms of this series to


k3 1
determine a set of bounds on its value.
k 1

10. Show that the series

(1) k k 2 ln k

k5 1
determine a set of bounds on its value.

converges, and use the first 100 terms of this series to

k 1

Section I.6: Taylor Series


The importance of power series lies mainly in their ability to model functions. Given a
function f ( x ) , we would like to know if it is possible to represent f (x) as a power series centered at
some point x x0 at which the function is infinitely many times differentiable. This is, of course, the
Taylor series you learned about in introductory calculus. It is named for the British mathematician
Brook Taylor, who derived it in 1712, though the Scottish mathematician James Gregory stated
essentially the same result in 1671 and certain series for special cases like the trigonometric functions
have been known since the fourteenth century. Taylor series for which the center lies at x0 0 are
often referred to as Maclaurin series to honor the Scottish mathematician Colin Maclaurin, who used
them extensively to derive rules for the behavior of functions in the 1720s.
One very simple way to derive the desired expansion is to consider the integral

x0

f (t ) dt f x f x0 .

Integrating by parts once, we arrive at


x

f x f x0 f x0 x x0 f (t )(t x )dt .
x0

Integrating by parts repeatedly brings us to the result


1
2
f x f x0 f x0 x x0 f x0 x x0
2
.
1 (n)
1 x
n
f x0 x x0 f ( n 1) (t )( x t ) n dt
n!
n ! x0
This expression is valid for any natural number n whenever the required derivatives exist. The
remaining integral is suppressed by n!, which becomes extremely large as n increases. Unless the
derivative is able to compensate for this factor, the integrals contribution becomes vanishingly small
as n grows and we arrive at the expression

f ( k ) x0
n
f ( x)
x x0 .
n!
k 0
This is Taylors series. It converges whenever the derivatives of the function f (x) evaluated at
x0 cannot compensate for the factorial denominator adorning the integral in the above expression. To
put it another way, the error associated with truncating the series at a specific value of n, say n = N, is
given by

1 x ( N 1)
f
(t )( x t ) N dt . This error can be bounded by
N ! x0
M N 1
1 x ( N 1)
(t )( x t ) N dt
f
x x0
( N 1)!
N ! x0

N 1

33

Advanced Mathematical Techniques

where M N 1 is the maximum absolute value attained by the (N + 1)th derivative of f (t) on the
integration interval. This result is known as the Lagrange error bound, after the French mathematician
Joseph Louis Lagrange who derived it in at the very end of the 18th century. If this error goes to zero
as N is increases without bound, then the Taylor series converges.
Obviously, Taylors expansion is a power series. As such, it must conform to the general
properties we found using the ratio test above. There are three possibilities: It may converge only at
x x0 , it may converge for all x, or it may converge only on the interval x x0 R for some positive
number R, the radius of convergence of the series. Recall that our discussion of series using the ratio
test was valid for complex numbers x, so a series that has a radius of convergence R must converge for
all complex numbers x satisfying x x0 R . It is a standard result in complex analysis that a power
series with radius of convergence R about the point x0 must have an issue somewhere on the circle
of convergence. There must be a place somewhere on the circle of convergence where the function is
not differentiable. We will cover this in detail in chapter 5, where it will be strongly established for
complex domains. There, we will find that the radius of convergence of a Taylor series about the point
x0 is the distance from x0 to the nearest singular point of f (x) in the complex plane. The factorial
function grows extremely rapidly, much more rapidly than r N for any fixed value of r, so functions
whose derivatives cannot match this growth must have Taylor series that converge everywhere in the
complex plane. These types of entire functions are very important in mathematical analysis. Some
examples are the exponential function e x (whose derivatives are constant, independent of N), the
trigonometric functions sin x and cos x (whose derivatives oscillate back and forth between two fixed
values), and polynomial functions (which do not have infinite series as their Taylor expansions; these
expansions truncate at some finite value of n). Functions with a finite radius of convergence must
have derivatives that somehow match this rapid factorial growth.
This matching with the factorial is a very delicate process. If a functions derivatives grow
appreciably faster than the factorial function, then the expansion will not converge anywhere except at
its center. If they grow appreciably slower than the factorial, then the expansion will converge
everywhere and the function will be entire. In order to have a finite radius of convergence, the
derivatives of the function must grow just as fast as the factorial function. One example is the function
f ( x) 1 x , whose derivatives are given by
f ( n ) ( x) (1) n n ! x n 1 .
Centering the expansion at x = 1, we obtain
1
(1) n ( x 1) n ,
x n0
exactly as we would have obtained above from our geometric series analysis. The radius of
convergence of this expansion is 1, the distance from the center at x = 1 to the singularity at x = 0.
Another example is the function g ( x) ln x , whose derivatives are given by
g ( n ) ( x) (1) n 1 (n 1)! x n

( x 3) n
. The
3n n
n 1
radius of convergence of this expansion is 3, the distance from the center at x = 3 to the nearest
singularity at x = 0. We could also have obtained this expansion by integrating the geometric series, as

for n > 0. Centering this expansion at x = 3, we obtain the series ln x ln 3 (1) n 1

x 3
n 1 1 x 3
ln x ln(3 x 3) ln 3 ln 1
ln 3 (1)

.
3
n 3

n 1

34

Section I.6: Taylor Series

Note that it is not necessary for the function to diverge somewhere on the circle of
convergence, only that there is a singularity on the circle. The function h( x) x ln x , for example, has
a Taylor expansion with the same radius of convergence about any choice of center as g(x), but it
approaches zero as x approaches 0. Its derivative, h( x) 1 ln x , on the other hand, does diverge at
x = 0. This gives us an indication that the expansion cannot continue beyond x = 0, but there are
functions that buck even this idea. The general treatment of such series involves the idea of
differentiability in the complex plane, so must be saved for chapter 5. Until then, try to keep in mind
that the terms singular and diverge are not synonymous.
Armed with Taylors formula, we can easily determine the expansions of many different
functions. You should already be familiar with the expansions

xn
ex
n0 n !

( 1) n x 2 n 1
n 0 (2n 1)!

sin x

(1) n x 2 n
(2n)!
n 0

cos x


!
(1 x) x n
xn
n 0 n
n 0 ( n )! n !
from introductory calculus. The first three of these expansions are entire; they converge for all x. The
last expansion, called the binomial expansion, has a radius of convergence of 1 unless is a whole
number (in which case the function is a polynomial) because the original function has issues at x = -1.
It may be strange to think of the factorial of a non-whole number, but this idea does come up a lot. For

now, you can think of the binomial expansion coefficient as
n
n terms

( 1)( 2) ( n 1)
,

n!
n
but we will be discussing the idea of continuing the factorial function to non-whole numbers very
soon. Just as the geometric series result above allowed us to determine expansions for many different
functions, these results also generalize to more complicated expansions. Some examples are

2
x2n
e x ( 1) n
n!
n 0

sin x
x2n
(1) n
x
(2n 1)!
n 0
sin x
xn
(1) n
(2n 1)!
x
n 0

1 cos x
x2n
(1) n
2
x
(2n 2)!
n0

1
1 3 5 (2n 1) 2 n
(2n)!
x (1) n 2 n 2 x 2 n .
1 (1) n
n
2
2 n!
2 n!
n 1
n0
1 x

35

Advanced Mathematical Techniques

The first four expansions are entire, and the last one has a radius of convergence of 1. See if you can
derive these expansions from the above formulas, and make sure that you understand how to determine
their radii of convergence. The last one gives the interesting result that

1
1 3 5 (2n 1)
(2n)!
1 3 5 35
1 (1) n
(1) n 2 n 2 1
.
2n n !
2 n!
2 8 16 128
2
n 1
n 0
This series does not represent an efficient way to actually compute the square root of 2, but it is
interesting nonetheless.
The above expansions were not determined directly from the Taylor series formula. It would
have been very difficult indeed to determine the value of an arbitrary derivative of the functions
associated with the any of these expansions evaluated at x = 0. Knowledge of the expansion, however,
implies knowledge of the derivatives. Just because we did not use the Taylor formula to determine the
expansion does not mean that the Taylor formula is not valid. Suppose, for example, that we wish to
sin x
. By Taylors formula, this is simply 50!
determine the 50th derivative of the function f ( x)
x
times the coefficient of x 50 in the expansion. A glance at the expansion allows us to deduce the value
50! 101! for the required derivative, essentially without taking any derivatives at all. Even the second
derivative of this function would be annoying to actually take, and deducing the value at zero would
require multiple uses of LHpitals rule. This method allows us to cut through all of the nonsense and
get right to the answer.
These expansions have historically been used to compute the value of specific functions to high
accuracy, and they are still used for that purpose today inside the circuitry of our calculators and
computers. In this age of digital technology, however, one may wonder why we care about these
expansions. We can readily compute the value of the square root of 2 or the sine of 1 to as many digits
as we like (as long as it is not too many dont be greedy!), so why would we care about these
expansions? One of the many reasons has to do with the fact that we are often concerned with an
integral of these functions over a range of values instead of just a single value. A computer system
would have to compute each of the values separately, wasting a lot of processing time. Even
sophisticated programs like Mathematica will take a while to plot the results of an integral like
x dt
0 1 t 7 . Instead of asking a program to go through the arduous task of performing this integral again
and again for each point it plots, we can use the results of our series analysis to reduce the labor we are
asking our computer to perform and consequently reduce the time it takes to produce the required
graphic. An expansion for the integral is easily obtained by integrating the series term by term. This
is allowed whenever x < 1 because the radius of convergence of the series is 1. The required integral is
given by the expansion

x dt
x 7 n 1
x8 x15 x 22
(1) n
x

.
0 1 t 7
7n 1
8 15 22
n 0
This expansion is much easier for computer processors to handle, and will produce accurate results as
long as we include enough terms and the value of x is not too close to 1 (the boundary of the
convergence interval). We can also use this technique to determine integrals with free parameters like
x dt
x 1
x 2 1

; 0 .
x
0 1 t
1 2 1
This simplified form of the integral allows us to easily plot the results for any value of and even
take derivatives with respect to in some cases. In this specific example, the resulting series is
alternating and its error is easily determined.
36

Section I.6: Taylor Series

One important example in optics of the use of power series to determine integrals is the Fresnel
integral

x
x 4 n 3
2
( 1) n
.
0 sin t dt
(4n 3)(2n 1)!
n 0
For various reasons, it is often useful to have a graph of this function around, along with several of its
values. This expansion converges for all x, so there are no difficulties with intervals. The factorial in
the denominator causes the convergence to be quite rapid for moderate values of x, so this expansion is
very useful in practice. It does not converge at infinity, but the value of
equal

sin t 2 dt can be shown to

8 from other means. Another important example is the exponential integral

1 et
x n 1
n
,
dt

1)

0 t
(n 1) (n 1)!
n0
arising in the analysis of absorption of light by interstellar nebulas. This integral is also entire, but it
does not converge as x tends to infinity. Make sure that you can determine these results on your own;
some other examples are given in the exercises, but you are certainly free to make up some your own
as well. Use Mathematica to check and see how accurate your expansions are for specific values of x.
You may be surprised at how well these expansions approximate the exact results, especially when
there is a factorial in the denominator to aid convergence.
Another technique we can use to extend the results of our few expansions involves the
determination of the product or the ratio of two functions whose expansions are known. This
technique is not as clean as the differentiation and integration techniques, and will often not lead to a
general expression for the nth term of the expansion. We can, however, use it to obtain the first few
terms of the expansion. Depending on how accurate we require our results to be, this may be good
enough. A simple example is the tangent function. We already know the expansions of the sine and
cosine functions, so the tangent function is simply the ratio of these two. All we need to do is
determine the ratio

x 2 n 1
(1)n

sin x n 0
(2n 1)!

.
tan x
2k
cos x
k x
(1)

(2k )!
k 0
This expression looks somewhat daunting at first, and no explicit formula for the coefficients of the
resulting expansion has yet been found. However, we can determine the first several terms by
employing a very useful technique. Writing out the terms in the numerator and denominator, we see
that the denominator is reminiscent of a geometric series:
x3 x5
x

x3 x5
1
3! 5!


x
tan x

2
4
2
4
x
x
x
3!
5!
1 x
1
2! 4!
2! 4!
.
3
5

x
x
1
x
2
4
3! 5!

1 x x

2! 4!

If x is small enough that the quantity in parentheses in the denominator of the last expression is less
than one, we can expand this geometric series to obtain
x

37

Advanced Mathematical Techniques


2

x2 x4
x2 x4

x3 x5
tan x x 1 .
3! 5!

2! 4!
2! 4!

This still appears daunting, until we decide that we are only interested in the first few terms of the
expansion. Suppose, for example, that we only require terms up to x 5 in the expansion. In this case,
any term associated with a power of x larger than 5 can immediately be thrown away as too small.
The expansion now becomes much easier to handle:

x3 x5 x 2 x 4 x 4
x3 5 x5 x3 x5 x5
x3 2 x5
tan x x

x
.
1 x
6 120
2 24 4
2 24 6 12 120
3 15

This series definitely is appropriate for the tangent function, as illustrated in figure 4. If we wanted to
obtain more terms, we would simply have kept more terms in our analysis. This is not the standard
approach used to introduce division of series, but it is a useful technique that is often quicker than the
others. The most important thing to keep in mind when doing such calculations is what power of x we
are going for. Any power larger than that is simply thrown away.

3.5
3.0
2.5
2.0
1.5
1.0
0.5

0.2

0.4

0.6

0.8

1.0

1.2

Figure 4
There is, of course, an infinite power series expansion centered at zero for the tangent function.
It may be difficult for us to determine its coefficients, but that certainly does not mean that the
expansion does not exist. What is the radius of convergence of this expansion? Well, we cannot use
the ratio or root tests, or any other test for that matter, because we do not have an expression for the
coefficients of arbitrarily large powers of x. However, our knowledge of the behavior of the tangent
function makes this approach unnecessary. The tangent function diverges at 2 , so the radius of
convergence cannot be larger than this value. It turns out that there are no singularities in the tangent
function throughout the complex plane except those lying on the real axis that you are already familiar
with (at x n 1 2 , for integer n), so the distance to the closest singularity is 2 and the radius
of convergence of the expansion is therefore equal to 2 . This fact allows us to determine
asymptotic results for the coefficients of the expansion even though we cannot calculate all of them
explicitly. If

tan x an x n ,
n0

then
lim sup

an

since the radius of convergence of the expansion is given by


1
R lim sup
.
n
n
an
n

38

Section I.6: Taylor Series

At least some of the coefficients therefore must approach 2 for arbitrarily large n. We will have
an expression for the Taylor series of the tangent function soon, and this expression will illustrate this
result.
n

6
5
4
3
2
1

-6

-4

-2

Figure 5
x
. The limit as x approaches zero of
e 1
this function is 1, so the function must be defined to take the value 1 when x = 0 if we want it to be
continuous there. To determine an expansion for this function valid for small x, we simply expand the
denominator:
x
x
1

.
2
3
4
x
x
x x
x x 2 x3
e 1
x 1
2! 3! 4!
2! 3! 4!
Using our geometric series trick, we obtain the expansion
x
x x2 x4
1
.
x
e 1
2 12 720
The graph illustrated in figure 5 indicates that this expansion is indeed appropriate for the function.
We can even determine for sure that all of the odd powers in the expansion are absent except for the
single term x 2 because the function

As another example, consider the function f ( x)

x
x x ex 1
x e x 1

x
e 1 2 2 e 1
2 e x 1
is even in x. The radius of convergence of this expansion is 2 because the identity e 2 i 1 causes
the function to diverge at x 2 i . We may have thought that we could obtain the expansion by
manipulating the function as
x
xe x

xe x 1 e x e 2 x e 3 x
x
e 1 1 e x
and expanding each of the exponentials about x = 0. This approach is not correct, as the value of e x
when x = 0 is 1 and the geometric series does not converge at 1. When playing these sort of
expansion games, we must always keep in mind the disk of convergence of the expansions we are
using.
We can sometimes use expansions as aids to computing the value of a limit without having to
employ LHopitals rule. The limit
x sin x 2 x 3
,
lim
7
x 0
ex 1
for example, would be quite annoying to compute using LHopital. This would require us to take
seven derivatives of both the numerator and the denominator before we arrived at a limit that we could
just plug zero into. Using expansions, we arrive at
x

39

Advanced Mathematical Techniques

x 7 x11

x3
1
x sin x x
6 120
lim
lim

14
21
x7
x 0
x 0
x
x
6
e 1

x7
2
6
very quickly, essentially without having to perform any derivatives at all (we have already taken the
required derivatives). We can also sometimes use this technique to determine the convergence or

1
, for example, resists any reasonable test that
divergence of strange series. The series 3 1 n
1
n 1 n e
2

x3

we have encountered. The ratio test would be annoying and would fail anyway, as would the root test.
The integral test would lead to an integral that we probably could not compute in any straightforward
manner. For large n, however, we can expand the exponential and show that the series converges
simply by comparison to the convergent p-series with p = 2. In this way, the expansion shows us the
path to a lucrative convergence test.

Exercises for Section I.6


In exercises 1 20, find the Taylor series centered at zero of the given function. Determine the radius
of convergence of your series.
1. ln 1 2x 2

2. ln 1 3x 2

3. ln 2 3x 4

5. Arctan 3x5

6. 2 x 2 Arctan 3 x 2

7. e 5x

2 3x 4

9.

10.
3

1 e 2 t
13.
dt
0
t
x

17.

1 2t 3 1
dt
t

14.
18.

7 2x5

11.

15.

e 3t 1
dt
t3

1 2t 5 1
dt 19.
t3

4. ln 3 5x 3

x sin 2 x3 x

8.

1 cos t
dt
t

12.

2 3t 5 dt

16.

9 5t 6 3
dt 20.
t2

sin t
dt
t3

t 2 3 5t 7 dt
4

16 3t 5 4
dt
t3

21. Use the Lagrange error bound to determine the maximum possible error associated with
calculating the numerical value of cos 2 by truncating the Taylor series for cos x at the fifth
term (so n = 8). Compare your bound to the actual error in truncating the expansion at this
term, as well as to the error obtained by viewing this series as alternating. How many terms
would you need to keep in order to be guaranteed an error less than 108 by Lagrange? How
many terms are necessary if you use the alternating series error bound?
22. Use the Lagrange error bound to determine the maximum possible error associated with
calculating the numerical value of e3 by truncating the Taylor series for e x at the eighth term
term (so n = 7). This argument is sort of circular unless you already know something about this
number, so lets assume that you know e 3 . Compare your bound to the actual error in
truncating the expansion at this term. How many terms would you need to keep in order to be
guaranteed an error less than 108 by Lagrange? Can you use the alternating series approach in
this case? Explain.
40

Section I.6: Taylor Series


4

1 e 2 x 2 x sin x 3
.
x0
sin 3 x8

23. Determine the value of lim

24. Determine the value of lim

ln 1 2 x 5 2 x 2 ln 1 x 3
1 cos 3 x 4

x 0

25. Determine the exact value of the series

5k

2
k 1

3k

2k k 2 k 1

. Hint: break it up into three smaller


2k
k 1
k 0
pieces, aimed at canceling part of the denominator.

26. Determine the exact value of the series

1 x4 1
0 x3 dx correct to three digits (so
that your error is smaller than 5 104 ). How many terms do you need in your expansion?
Compare your result to that obtained by numerical integration.
1

27. Use a Taylor expansion to determine the value of

28. Use a Taylor expansion to determine the value of

sin 2 x 3

dx correct to six digits (so that


x2
your error is smaller than 5 10 7 ). How many terms do you need in your expansion?
Compare your result to that obtained by numerical integration.

29. Use a Taylor expansion to determine the value of

1 cos 3 x 5

dx correct to six digits (so


x3
that your error is smaller than 5 107 ). How many terms do you need in your expansion?
Compare your result to that obtained by numerical integration.
0

1 x5 1
0 x2 dx correct to three digits (so
that your error is smaller than 5 10 4 ). How many terms do you need in your expansion?
Compare your result to that obtained by numerical integration.

30. Use a Taylor expansion to determine the value of

31. Given the function f ( x)

1 cos 5 x 3

, determine lim f (20) ( x) . Leave your answer in terms


x 0
x4
of factorials and powers, then give a decimal approximation to your result.
5

e3 x 1
, determine lim f (17) ( x) . Leave your answer in terms of
32. Given the function f ( x)
x 0
2 x3
factorials and powers, then give a decimal approximation to your result.

41

Advanced Mathematical Techniques

33. Determine whether the given series converges or diverges. Explain your reasoning and which
test will be successful in proving your result.
(a)

k
k 1

(d)

sin 1 k 3

11

k 1
k

k 1

1 cos 3

(c)

(f)

(b)

k 1

(e)

8 1 k 2

k 1

1 k2

k 1

k 1

1 ln 3 k

16 1 k 2 2

34. Find the first five terms in the Taylor expansion of the function e x sec x , centered at x 0 .
What is the radius of convergence of this expansion? Use your radius to give an approximation
to the coefficients of arbitrarily large powers of x in this expansion. What does your
approximation mean? Will all of the large coefficients satisfy it? Explain what the behavior of
the coefficients must do in the limit of large powers of x.
35. Find the first four terms in the Taylor expansion of the function 1 2 x 2 cos x , centered at
x 0 . What is the radius of convergence of this expansion? Use your radius to give an
approximation to the coefficients of arbitrarily large powers of x in this expansion. What does
your approximation mean? Will all of the large coefficients satisfy it?
2

36. Show that the Maclaurin expansion for the function e 1 x using Taylors formula leads to the
2

result e 1 x 0 , i.e. all of the coefficients of the expansion are zero. Does this mean that this
function equals zero? Why or why not? When thought of as a real function of x, this function
is infinitely-many times differentiable at x = 0. Explain why the values of this function for
complex x cause us to view this result with suspicion.

Section 1.7: Integration by Differentiation


One very useful technique that is vaguely linked to expansions involves the determination of
many different integrals from one base result. Consider, for example, the definite integral

dx

ax 2 b ab ,
where a and b are arbitrary positive constants. This integral can be performed fairly easily because the
antiderivative is associated with the inverse tangent function. This integral is a fertile ground for the
determination of many different integrals, as it contains the parameters a and b. Differentiating this
equation with respect to b gives

dx
1
ax 2 b 2 2 ab3 ,

and differentiating with respect to a gives

x 2 dx

ax

1
.
2 a 3b

These derivatives can be taken under the integral sign essentially whenever the result makes sense. I
will not spend much time discussing the manner in which one can prove that such manipulations are
42

Section I.7: Integration by Differentiation

justified except to say that it works whenever it doesnt lead to nonsense. A more careful treatment is
much more involved, and requires us to repeatedly use the definition of a limit. We will find a more
straightforward approach in most cases using the analysis of chapter 5. Assuming that this result is
justified, we immediately arrive at

dx

x 2 1 2 2 ,

x 2 dx

2 3

and many other interesting results. Note that there was no need to integrate anything else once the
original result is determined. The fact that the original integral contains arbitrary external parameters
implies that it also contains information about these related integrals.6 Repeatedly differentiating with
respect to a and/or b gives us many more integrals of this form. One example is

x 4 dx
3
x2 1 5 128 .

It would certainly be daunting to approach this integral in introductory calculus!
Another fertile ground for determining the value of many integrals is the integral
1
1

0 x dx 1 .
Differentiating with respect to gives
1
1

0 x ln x dx (1 )2 .
Each derivative we take with respect to generates another power of ln x inside the integral. The
derivatives of the function on the right are easy to take, so we could do this all day. As an alternative
to mindlessly taking derivatives, let us consider n for small and determine an expansion for
the original relation about 0 :

1
1
k 1 n k
1
1
1
n
n ln x
x ln x dx

0 x dx 0 x e dx

!
1
1
k 0
1
n 1 .
k

1

(1) k

n 1 k 0
n 1
Equating the coefficients of k , we arrive at
1
1
( 1) k k !
k!
n
k
n
k 1
0 x ln x dx (n 1)k 1 , or
0 x ln x dx (n 1)k 1 .
The only restrictions on this integral are that n > -1 and k is a whole number. These expressions lead
to the interesting result that

1
1
(1) k 1
1 1 1
x
x ln x
1 2 3 4 ,
k
0 x dx 0 e dx
2
3 4
k
k 1

6
This technique has been called the dirtiest trick in mathematics by a prominent mathematics professor because of its easy
access to many integrals that would be difficult to approach from any other perspective. Physicists and engineers find
nothing dirty about it, however, as its results are so easy to obtain and the treatment is so clean. The struggle between
physicists and mathematicians goes on.

43

Advanced Mathematical Techniques

which would be difficult to derive in any other way. See if you can prove this result. Another
important result that arises from these expressions is derived by making the change of variable

1
z ln in the second of the above integrals and taking n = 0. This leads to z k e z dz k ! , which
0
x
will generate many, many interesting results in the coming chapters.

Exercises for Section I.7


In exercises 1 12, determine the value of the given integral.

4.

7.

10.

1.

x4
2

dx

2.

dx

5.

8.

11.

x 2 ln 2 x dx

x2

x5 ln 3 x dx

13. Determine the value of


integral

x 2 ln x dx

6.

x5 ln 2 x dx

9.

12.

dx

ln 4 x dx

dx

x5 ln x dx

x 2 ln 3 x dx

1
2

x ln 4 x dx

x e2 x cos(3x) dx by appealing to the general expression for the

e ax cos(bx) dx for positive constants a and b.

14. Determine the value of


integral

3.

x4

x e3 x sin(2 x) dx by appealing to the general expression for the

e ax cos(bx) dx for positive constants a and b.

15. Find an infinite series for the integral

x x dx , and use your series to determine the value of

this integral to six decimal places. How many terms does this require? Compare your result to
that obtained by numerical integration.

16. Determine the value of

44

ln 4 x

x ln 2 x 1

dx . Hint: u-substitutions are very useful.

Section I.8: Summary of Chapter I

Section I.8: Summary of Chaper I


This chapter began with an extremely old result, the geometric series. Geometric series are
used in many applications, and they can be exploited to give many results. We will see that the
geometric series is used directly to establish Taylors theorem on the expansion of an arbitrary
complex function in chapter 5, and the finite series also surprisingly find use in the amortization of
loans. In order to handle more general infinite series, we need to establish a way in which to
determine whether or not they converge. This is not always obvious, as there are many series whose
individual terms tend to zero but that do not ultimately converge. The limit comparison test is a very
lucrative test for convergence or divergence, and allows us to establish the convergence properties of a
given series if we already know the convergence properties of a related series. In order to use this test,
however, we must first have a set of baseline results about the convergence properties of a set of
nontrivial series. One of the best ways to do this is by using the integral comparison test, which
essentially compares a series to an improper integral related to that series. The integral comparison
test provides us with an unlimited number of results on the convergence properties of basic series,
which can then be used in conjunction with the limit comparison test to establish the convergence
properties of an extremely broad class of series. In addition, the techniques used to derive the integral
test can be harnessed to give us bounds on the value of an infinite series. This process is not only
useful for estimating the ultimate value of a convergent series, but also for assessing the rate at which a
divergent series diverges. Using this technique, we can extract the divergence of a divergent series to
arrive at a meaningful result.
Many series naturally contain contributions that are not easily handled by the integral test, such
as the factorial function. A more lucrative test for many of these series is the ratio test, which
estrablishes convergence essentially by comparing the series of interest to a geometric series. This test
disarms the factorial function in a manner not accessible to the integral test, and leads naturally to the
treatment of power series. Power series provide a very useful way in which to parameterize functions
that we may not know very much about, and can, in many instances, be thought of as their own branch
of mathematics. We will see in chapter 5 that all more-or-less nice functions possess representations
as power series, and these representations allow us to pursue a detailed study of many functions that
are otherwise inaccessible. Power series also allow us to integrate known functions repeatedly, even
when this process is not easy or even possible using known functions. They open up an entire world
of analysis that is not available to us without them, and this analysis is very lucrative in many
applications. A somewhat sturdier, though often more unwieldy, relative of the ratio test is the root
test. This test can be modified slightly, using the limit superior, to form an extremely stringent test for
convergence that is used to prove theorems on the behavior of entire classes of functions. We will see
a lot more of this in chapter 5.
Taylor series are a technique with which we can relate the power series representation of a
function to the derivatives of that function at a given point. This analysis allows us both to determine
the power series representation of a given function and to use a power series to extract meaningful
information about the derivatives of a function that may not be known in closed form. Because they
are power series, Taylor series can only converge in a disk about the center of the expansion. They
cannot represent a function everywhere unless the function has no singularities in the complex plane.
The introduction of the complex plane is inevitable when discussing power series and Taylor series, as
singularities that exist only for complex x cannot be muted in these representations. Even functions
that are perfectly well-defined and infinitely many times differentiable for all real x can fail to have a
Taylor series that converges for all real x. The radius of convergence of their Taylor series is always
limited by the distance to the closest singularity in the complex plane whether it lies on the real axis
or not. Despite this fact, we will see later that a Taylor series that converges in any disk uniquely
45

Advanced Mathematical Techniques

determines the function throughout the complex plane. This is not at all obvious, but it is definitely
true. Thus Taylor series are as good as the real thing, regardless of their shortcomings.
Taylor series are an immensely useful aid to numerical calculations of many types. Even in
this age of computers, where numerical computation is often very easy to obtain, there are many results
that are much easier to obtain from analytic methods than from numerical calculation. Examples
include the limits given in sections 1.3 and 1.6. These limits are actually much easier to obtain
analytically than numerically. It would be very difficult to extract the divergence of a divergent
expansion on a computer, as the computer itself is not aware of why the terms are getting larger or
even that they are. Numerical investigations of the glacially divergent series introduced in section 1.3
would indicate that the series is convergent; unless we have another means with which to compute its
value, we would have to wait longer than the age of the universe to see its value exceed the number 5.
While very useful, numerical computations cannot tell us the whole story. We need to make sure that
we understand what we are doing and what we are asking a compter to do before we can be sure that
the results of such an analysis can be trusted. The most efficient calculations involve using analytic
results as an aid to simplify the numerical computations given to the computer. This represents the
best of both worlds, allowing us to avoid waiting for the computer to repeatedly approximate
Riemann sums again and again, as well as avoiding the inevitable errors such approximations will
produce.
The uniqueness of Taylor series allows us to exploit integrals containing arbitrary parameters
and determine the value of many integrals from a single result. Integrals containing an arbitrary
parameter are fertile grounds for generating many related integrals. If we can determine the definite
integral of a function that contains a parameter, then this integral represents a function of that
parameter. The derivatives of this function give us the value of other integrals related to the original
integral, and allow us to strategically compute the value of many integrals that otherwise would be
very difficult to obtain. The idea is that knowledge of an integral containing a parameter represents
knowledge of that integral over a range of values of the parameter, and contains more information than
that contained in a single integral with a given value of the parameter. It is not difficult to obtain the
value of many related integrals, once we get the hang of the technique, and this process is often much
easier than approaching the desired integral directly.
The importance of series in analytic results cannot be overstated. We will find many more
analytic uses of infinite series in the following chapters, and a much stronger result concerning the
existence of infinite series in chapter 5.

46

Section II.1: General Infinite Products

Chapter II
Infinite Product Expansions
The purpose of this chapter is to develop the formalism necessary to show that an infinite
product exists and to illustrate some of the more basic techniques used to manipulate these expansions.
A much more involved and rigorous treatment will be given in section 5.13, but there are some special
results that we will use before that treatment.

Section II.1: General Infinite Products


Given a sequence an n 1 of complex numbers, we define the related partial product sequence

Pn n 1

by
n

Pn (1 ak ) (1 a1 )(1 a2 ) (1 an ) .
k 1

If ak 1 for all k, then none of the members of this sequence is zero. Furthermore, if lim Pn P 0 ,
n

then the infinite product

(1 a )
k 1

is said to converge to P. In analogy with infinite series, the all-

important question is, what conditions must be satisfied by the original sequence an n 1 in order to
ensure the convergence of the infinite product?.
The first and most obvious result is that lim an 0 . This follows immediately from the

definition of convergence, as if lim Pn P 0 then


n

Pn 1 P
Pn 1 lim
n
1,
n P
lim Pk
P
n

lim 1 an 1 lim
n

and the result follows. Note that this does not apply if P = 0; there are examples of product sequences
that converge to zero for which lim an is not zero, and may not even exist. The sequence defined by
n

1
2
an when n is even and an when n is odd certainly has no limit as n tends to infinity
2
3
1
2
because it oscillates back and forth between and . The sequence of partial products,
2
3
k 1

1 1
1 1
P2 k 1 ;
P2 k ,
3 2
3 2
on the other hand, clearly converges to zero. It is for this reason that 0 is disallowed from the
convergent subset of partial products. Partial product sequences that converge to 0 are said to
diverge because they do not converge to a finite (non-zero, non-infinite) number. This idea is the
analogue of the nth term test or the test for divergence for infinite products: if lim an 0 , then
n

(1 a ) diverges.
k 1

47

Advanced Mathematical Techniques

To get a more stringent test for convergence, let us consider the term Pn in the partial product
sequence,
n

ln 1 ak

Pn 1 ak e k 1

k 1

It is obvious from this representation both that Pn n 1 cannot converge if

ln 1 a
k 1

diverges7 and

that

Pn n 1

ln 1 a
k

k 1

ln1 ak

must converge to e k 1

if

ln 1 a
k

k 1

converges. Therefore, the convergence of

is necessary and sufficient for Pn n 1 to converge. Note the importance of disallowing

P = 0 for the validity of this result: if P = 0 were allowed, then

ln 1 a could certainly diverge to


k 1

. A more useful result can be obtained for sequences cn n 1 that consist only of positive terms. In

this case, we can bound the quantity ln 1 ck by using the fact that lim cn must equal zero if the
n

infinite product is to converge. Since the limit is zero, it must be possible to find a natural number N
so large that cn 1 2 whenever n > N. If 0 x 1 2 , then we can use our Taylor expansion for the
function f ( x ) ln(1 x ) to write

ln(1 x) x

x x 2 x3

1
1
x 2 x3 x 4
1

x x x x 2 2 3
2 3 4
2 2 3 2 4

2 3 4

1
1
14
3
1

x x 2 2 3 x x
x

2
2
2
2
2
1
1
2
2

Similarly,

x x 2 x3

x 2 x3 x 4
x 3
x x x x .
2 3 4
4 4
2 3 4

This is illustrated in figure 1. These relations immediately imply that


3
3
cn ln 1 cn cn
4
2
ln(1 x) x

whenever n > N, so, summing from n = N + 1 to N + K for some natural number K, we have
N K
3 N K
3 N K
cn ln 1 cn cn .

4 n N 1
2 n N 1
n N 1

In other words,

ln 1 c
n 1

converges or diverges with

c
n 1

. This is essentially a limit comparison

test between the two series: the expansion of the natural logarithm function implies that ln(1 x) x
when x is small, so the tails of the two series are the same and they must either converge or diverge

We can only arrange for the product to converge while the series diverges if we allow for the logarithm to be multi-valued.
See chapter 5 for more about this possibility.

48

Section II.1: General Infinite Products

together. From this analysis, we see that the infinite product


multiplied in any order to achieve the same result) whenever

1 a converges absolutely (can be


n

n 1

a
n 1

converges absolutely.

1.0
0.8
0.6
0.4
0.2
0.1 0.2 0.3 0.4 0.5 0.6 0.7

Figure 1
Note that this result cannot directly be applied to infinite products that are not associated with

1
positive sequences cn n 1 . The infinite product 1 diverges (it goes to zero), but the infinite
n
n 1
n

(1)
product 1
actually does converge to 1.00149 . It is tempting to say that the infinite
n
n 1
product

1 an

converges or diverges with

n 1

n 1

whenever an 1 for any n, but this more

general result is also more difficult to prove.8 We can understand qualitatively why this is expected,

however, by making use of Cauchys convergence criterion: Pn n 1 converges whenever, given any

0 , we can find a natural number N so large that PM K PM for any natural number K, as long
as M > N. By the definition of the sequence Pn n 1 , this condition can be re-written as

M K

PM

1 a 1 .
k

k M 1

The product appearing in the absolute value is clearly given by


M K

M K

1 a 1

k M 1

k M 1

ak

M K

k M 1

M K

ak

j k 1

The omitted terms contain all possible products of three, four, five, etc, ak with different index k.
Since lim an 0 whenever the product sequence converges, these contributions can be thought of as
n

small whenever M is large. Subtracting 1 and ignoring all of the small contributions, we see that the
criteria for convergence is simply
PM

M K

k M 1

ak small ,

i.e., the same requirement needed to ensure the convergence of

a
n 1

. The details of this small

business actually destroy the theorem, as the infinite product


8

It is also not true. See below.

49

Advanced Mathematical Techniques

(1)k

ln ln k
k 2

diverges even though the infinite series

(1) k

ln ln k 11.477968...
k 2

converges quite well by the alternating series test. It is important to understand that results that seem
ok may not actually be true. The history of mathematics is filled with examples of theorems that
people believed to be true, but were later shown to be false. This is why rigorous proof is so important
in mathematics. If we can show that a theorem is satisfied using the definitions of the terms used in the
statement of the theorem, then we can be sure that it will be true in all cases.

Using the general results above, it is easy to see that the infinite products 1 2 ,
n
n 1
n
5

(1)
3n
n

1
9
, and

1 3
, and 1 3 2 converge, while 1 , 1 6
2
5
2n 2 4
n
n

n
n
4
3
6

n 2
n 1
n 1
n 1
n

e diverge.
1 n

One way to handle the last example, which will be important later, is to consider

n 1
1 n

1 for large n. Using the Taylor expansion for e x , we see that this is simply 1 n when n is

1
large. Since is the divergent harmonic series, the infinite product also diverges (to zero in this
n 1 n

2
1
1

case). From this analysis, it should also be clear that e 1 n , cos , and n sin
n
n
n 1
n 1
n 1
converge.
e

Exercises for Section II.1


In problems 1 12, discuss the convergence or divergence of the given infinite product. Explain your
reasoning.

k 2 2k 1
1
k

1. 1
2. 1 2
3. 1 4

k 1
k k2 2
k
k 1
k 1
k 1
4.

k 2 3k 2
1

k 3 5k 2 1
k 1

7.

5.

2k k 2
1

3k k 2
k 1

8.

k 1

10.

k
k 1

k2
1

k k
k 2k 1

6.

32 k
1

k
3k
5 k2
k 1

9.

k 1

k2 k
2
1

11.

k 3 k 1
3
2k 1

k
k 1

k 1

12.

k
2k 3

k 3 2k 2 3
3
k2 2

k
k 1

The last one is a little difficult to show, and the proof that it converges is given in the exercises.

50

Section II.1: General Infinite Products

(1)k
(1) k

.
The
associated
series
definitely
1

k
k
k 2
k 2
converges by the alternating series test, though it does so only conditionally. Group the terms
with even k in the infinite product together with the next term, and multiply these two terms out
to obtain a different representation for this infinite product that does not alternate. Does the
infinite product converge?

13. Consider the infinite product

(1)k
(1) k

.
The
associated
series
definitely
1

ln k
k 2 ln k
k 2
converges by the alternating series test, though it does so only conditionally. Group the terms
with even k in the infinite product together with the next term, and multiply these two terms out
to obtain a different representation for this infinite product that does not alternate. Does the
infinite product converge?

14. Consider the infinite product

(1) k
(1) k
1

.
The
associated
series
definitely

3
3
k
k
k 2
k 2
converges by the alternating series test, though it does so only conditionally. Group the terms
with even k in the infinite product together with the next term, and multiply these two terms out
to obtain a different representation for this infinite product that does not alternate. Does the
infinite product converge?

15. Consider the infinite product

(1) k
(1) k
. The associated series
definitely

3 2
3 2
k 2
k 2
k
k
converges by the alternating series test, though it does so only conditionally. Group the terms
with even k in the infinite product together with the next term, and multiply these two terms out
to obtain a different representation for this infinite product that does not alternate. Does the
infinite product converge?

16. Consider the infinite product

17. It may seem strange that infinite products are said to diverge if they equal zero, as zero is not
usually considered to be divergent. The reason for this restriction is that infinite products that
give zero can exhibit all sorts of behavior, so we cannot make any statements about the
behavior of the terms in a convergent infinite product if we allow convergent infinite products
to take the value zero.
(a) Give an example of an infinite product
(b) Give an example of an infinite product

1 a that gives zero even though lim a


k 1

1 a that gives zero for which lim a


k 1

0.

0 , but

ln 1 a diverges.
k 1

(c) Show that it is not possible for an infinite product to give zero unless

ln 1 a
k 1

diverges. Show further that if an infinite product gives zero and there is a value of K large
enough for which

ln 1 a converges, then a

k K 1

1 for at least one value of k K .


51

Advanced Mathematical Techniques

Section II.2: Infinite Product Representations of Functions


As with series, some of the most important results of infinite products are associated with their
use in representing specific functions. Many functions have infinite product representations, but we
will be most concerned with the sine and cosine functions. There will be a third later on, but lets save
that beast for later. To develop these infinite product representations, we appeal to the familiar idea of
writing a polynomial whose (nonzero) value at 0 is given along with all of its zeros and their
multiplicity. Suppose we want to determine the polynomial p(x) of minimum degree whose zeros lie
at 1, -2, 3, and 5, all of which have multiplicity 1, and that takes the value 7 at x = 0. We begin by
writing
p ( x ) A( x 1)( x 2)( x 3)( x 5) .
The value at zero allows us to determine the constant A as
7
,
A
(1)(2)( 3)(5)
so our polynomial can be written as
( x 1)( x 2)( x 3)( x 5)
x x x x
p ( x) 7
7 1 1 1 1 .
(1)(2)( 3)(5)
1 2 3 5
It should be clear that this process can be generalized to a polynomial with an arbitrary number of
simple (multiplicity 1) zeros as
n

x
x
x
x
p( x) p(0) 1 1 1 p(0) 1 ,
rk
k 1
r1 r2 rn
where there are n zeros that take place at the roots rk , for k running from 1 to n. We can imagine
continuing this process for larger and larger numbers of roots, leading to the infinite product expansion

x
f ( x) f (0) 1
r
k 1
k
for the function f (x) with an infinite number of roots lying at rk for k running from 1 to infinity. This
expression will define a function of x whenever the infinite product converges. The cosine function
certainly has an infinite number of roots, occurring at rk k 1 2 for all integer k. Separating
positive and negative k leads us to consider the function

x
x
f ( x) 1
1
.
k 1 2 k 1 k 1 2
k 0
Neither of these infinite products converge anywhere except at x = 0, but we can consider instead the
well-defined function
M
M 1

x
x
f M ( x) 1
1
,

1
2
1
2
k
k

k 1

k 0
for some large but finite value of M. Re-arranging, we have
M
M 1

x
x
f M ( x ) 1
1

k 1 2 k 1 k 1 2
k 0
.
M

M
M
x
x
x2
1
1

2
2
k 1 2
k 1 2
k 0
k 0
k 0
k 1 2
52

Section II.2: Infinite Product Representations of Functions

The last product actually does converge as M tends to infinity, by comparison to the p-series with
p = 2, so the function

x2
f ( x) 1

2
2
k 0
k 1 2

x2

x2

1
converges, which is all x. Thus,
2

k
1
2
k

0
k

1 2
we have an entire function whose zeros all coincide with those of the cosine function. Is it the cosine
function? The answer is yes, but this fact is difficult to establish completely without some standard
results from complex analysis. Basically, the reason why this must be the cosine function is that the
ratio f ( x) cos x is defined throughout the plane and has no zeros. In addition (this is the hard part to
prove), f (x) has the same behavior and symmetries asymptotically as cos x . These two statements are
enough to imply that the ratio f ( x) cos x is constant; since it equals 1 when x = 0, we arrive at
f ( x ) cos x .10
We can derive many, many interesting results from this infinite product expression. The most
straightforward come from simply plugging in special values of x. See if you can derive the following
expressions:

(2k 1)(2k 3)
1

1
1

2
(2k 1) 2
k 0
k 1 2 k 0
is well-defined for all x for which

k 0


1
4
1

2
2
2
k 0
k 0 9(2k 1) 2

4 9k 2 9k 2 k 2 k 2 9
1
3

32 (2k 1) 2 9 (2k 1) 2
2
2
k 0
k 0 k 1 2
k 0

1 3 k 1 2

4 1 2 4 2 3 4 3 4 4 4 5

2 2 2 .
2
3
5
7
9
4
k 1
The last of these is derived by removing the factor in the infinite product that causes cos 2 0 ,

4k (k 1)

(2k 1)

then taking the limit of the remaining expression as x 2 . The appearance of comes from the
use of LHpitals rule to determine the limit. There are obviously many other similar expressions that
can be derived from this expansion.
This infinite product expansion of the cosine function also leads to some interesting series
results. We already have an infinite series expansion for the cosine function, and these two expansions
must agree. Multiplying out the infinite product gives
x2
1
cos x 1 2
O x4 .
k 0 k 1 2 2
Here, we are using the expansion ideas covered in that set of student notes in which we ignore all
contributions consisting of higher powers of x than the ones we are interested in (the second power, in
this case). Comparing with the expansion of the cosine function gives

1
1
2
,
4

2
2
2
k 0 k 1 2
k 0 (2k 1)
or
10

See section 5.13 for more information.

53

Advanced Mathematical Techniques

(2k 1)
k 0

1 1
2

.
32 52
8

This is a surprising result, and fairly simple to derive from this perspective. The series

(2k 1)
k 0

is

a special case of the Dirichlet lambda function, defined by

1
; Re( s ) 1 .
( s)
(2
k
1) s
k 0
The restriction on s is required for the convergence of the series.
It turns out that the infinite product representation of the cosine function contains information
about the values of this function for all even values of s. One way to see this is to differentiate the
logarithm of this product expansion. Try to show that this yields
tan x
1

.
2
2
2
2x
k 0 k 1 2 x
I have skipped several steps in deriving this, but you should be able to fill them in. Rescaling x and
cleaning up, we can write this result as
tan x 2
1

.
2
2
4x
k 0 (2k 1) x
Taking the limit of both sides as x 0 , we arrive at the earlier result (2) 2 8 . If x 1 , then we
can go even further by employing the geometric expansion:
2n

1
1
x

.
2
2
2
(2k 1) x
(2k 1) n 0 2k 1
Summing from k = 0 to infinity gives
tan x 2
(2n 2) x 2 n ,
4x
n 0
or, on rescaling x and rearranging,

22 n 1 (2n) 2 n 1
x
tan x
.
2n

n 1

This is just the Taylor expansion for the tangent function centered at x = 0. Our techniques from
chapter 1 allow us to determine these coefficients from the known series for the sine and cosine
functions, so we could, with some effort, determine the exact value of the Dirichlet lambda function at
any even integer. From the result
x3 2 x5
tan x x
O x7 ,
3 15
obtained in those notes, we immediately arrive at

1
1 1 1
4
(4)

1
4
34 54 7 4
96
k 0 (2k 1)
and

1
1 1 1
6
6
(6)

1
.
6
36 56 76
26 3 5 960
k 0 (2k 1)
Although our expansion for the tangent function does not explicitly give us the values of any of the
coefficients, it still allows us to make strong statements about their behavior. First, it is clear from our
new expansion that all of the coefficients are positive. This result is not easily obtained in any other
54

Section II.2: Infinite Product Representations of Functions

way, as the technique used to actually determine the coefficients consists of many additions and
subtractions that could, in general, lead to a negative coefficient. A result such as this one allows us to
make strong statements about the behavior of the tangent function in the complex plane. Second, we
see from the obvious fact that lim (n) 1 that the radius of convergence of the tangent expansion
n

about zero is 2 . This confirms the suspicion we had in the expansion notes based on the fact that
the tangent function diverges at 2 , and also allows us to approximate the tangent function by resumming the tail of the series:
2n

x3 2 x5 2 2 x
x3 2 x5 512
x7
.

x
8
3 15 x n 4
3 15 1 2 x 2
The difference between this expression and the actual tangent function is illustrated in figure 2, where
the first three terms in the Taylor expansion are also shown as the dashed curve for comparison; it is an
extremely good approximation, remaining accurate to less than 3 104 all the way up to x = 1.57! We
could get an even better approximation by including the 1 32 n term in (2n) . Third, as it is clear
from the expansion technique used to get the coefficients that all of them are rational numbers, we can
state unequivocally that (2n) is a rational number times 2n for all positive integer n.
tan x x

0.0010
0.0005

- 1.5 - 1.0 - 0.5

0.5

1.0

1.5

- 0.0005
- 0.0010

Figure 2
These results are certainly very interesting, and reveal the depth of the relationship between the
trigonometric functions and as well as the unexpected relationship between sums of inverse powers
of integers and powers of , but it is not clear where they can be applied to physical problems. In
fact, as we will see later, the Dirichlet lambda function and certain related functions do arise in many
physical and mathematical problems. For now, however, think of this as practice with manipulating
expansions. No matter how powerful computer processors are, they can often benefit from some
human help. The techniques used to manipulate expansions are extremely useful in any field that
requires advanced mathematics.
We cannot directly apply the preceding arguments to the sine function because it has a zero at
sin x
x 0 . The modified function
, on the other hand, has all of the same zeros as the sine function
x
except x 0 , at which it must take the value 1 for continuity purposes. The same analysis as above
then leads to

x2
sin x x 1 2 2 ,
k
k 1
which again converges for all x (except at the zeros, where it equals zero). Once again, this result
directly gives us many explicit infinite products:

1 3 3

1 2
9
k 2
k 1
55

Advanced Mathematical Techniques

1 3

2

k 1

4k 2 2k
2k 2 2 4 4 6 6 8 8

1 3 3 5 5 7 7 9
2
k 1 4k 1
k 1 2k 1 2k 1
Try to derive these yourself. The last one is a very famous formula for . It is called the Wallis
product after the English mathematician John Wallis, who derived it in 1655 as part of his early
contributions to calculus and efforts to square the circle.
We could expand in the same manner as with the cosine function, but it is informative to take a
different approach. Dividing by x and taking the logarithm of both sides, we obtain

x2
1 x2n
x2n 1
sin x
ln
ln
1


2 2
2n 2n
2n 2n
x k 1 k
k 1 n 1 n k
n 1 n
k 1 k
In manipulating these series, we are assuming that they converge absolutely; this will definitely occur
whenever x , so we are justified in exchanging the order of summation whenever this condition is

1 36k

satisfied. The sum over k is the sum of inverse powers of integers from k = 1 to infinity. This series
plays a large role in number theory, and is called the Riemann zeta function,

1
( s ) s ; Re( s ) 1 ,
k
k 1
after the German mathematician Georg Friedrich Bernhard Riemann. This function is related to the
Dirichlet lambda function by

1
1 1 1
1 1 1 1 1
1 1 1

1 s s s 1 s s s s s s s s
( s)
s
3 5 7
2 3 4 5 6
2 4 6

k 0 (2 k 1)
,
s
1
1 1 1
2 1
( s ) s 1 s s s s ( s )
2 2 3 4
2

so our earlier results from the infinite product expansion of the cosine function imply that the Riemann
zeta function can also be obtained in closed form for all even integer values of s. We will find that this
result also arises from the present treatment.
According to our infinite product expansion, the Taylor series of the function ln sin x x is
given by

(2n) x
sin x
ln

2n
n
x
n 1
On taking the derivative with respect to x, we obtain

1
(2n)
cot x 2 2 n x 2 n 1 .
x
n 1
The expansion of the cotangent function can be obtained fairly easily, at least for the first few terms,
using the techniques illustrated in the last chapter. We simply expand the numerator and denominator,
then use the geometric series trick to obtain
1
x x3 2 x5
.
cot x
x
3 45 945
Note that the overall factor of 1 x forces us to modify our arguments for throwing contributions
away: if we are interested in the coefficient of x 5 , we must keep all terms up to and including x 6 in
our treatment. Equating coefficients, we find that
2n

56

Section II.2: Infinite Product Representations of Functions

1
1 1 1 1
2

1
2
22 32 42 52
6
n 1 n

(2)

1
1 1 1 1
4

1
4
24 34 44 54
90
n 1 n

(4)

1
1 1 1 1
6
1 6 6 6 6
,
6
2 3 4 5
945
n 1 n
in agreement with our earlier findings for (2 n) . These inverse power series were of great interest to
mathematicians of the 18th century, during which time the first result was referred to as the Basel
problem, and these results were first obtained by the Swiss Mathematician Leonard Euler in 1735.11
The techniques used to determine the expansion make it obvious that (2 n) is equal to a rational
number times 2n for all integer n, and even give us the techniques necessary to determine the exact
values for all even arguments, but the values of the Riemann zeta function at odd integers are
mysteriously absent from this expansion. Attempts to relate these values to odd powers of have all
come to naught, as it is easy to show that if (3) is equal to a rational number times 3 then the
denominator of the rational number in lowest terms must have more than 5 million digits.12 Such a
large denominator seems unlikely, given that the denominator of the corresponding ratio for (2) is 6.
Not 6 digits, just 6. In fact, it was only recently shown that (3) is even irrational.13 This constant
is now called Aprys constant in honor of the Greek-French mathematician Roger Apry who
developed the proof of this fact in 1978. Subsequent studies have shown that there are infinitely many
integers n for which (2n 1) is irrational, and even that at least one of the numbers
(5), (7), (9), and (11) must be irrational, but more sweeping statements about these numbers
remain elusive.
Strangely enough, the Dirichlet beta function,

(1) n
( s)
,
s
n 0 (2n 1)
has exactly the opposite behavior. This alternating sum of inverse powers of odd integers can be
exactly summed in terms of powers of only for odd s. In this case, the even powers are
mysteriously absent. It is instructive to derive this result by going through the above analysis
backward, always keeping in mind what we want. This analysis is quite involved, and you should
NOT view it as something you would, for example, need to do on a test all by yourself, but it does
illustrate the manipulations that are sometimes necessary to achieve a desired result. We begin with
the series

(1) k
2 ( s )
,
s
k (2k 1)
for odd s. Note that if s is even, the sum gives zero. This indicates partially why the even members
are absent in the final results. Splitting the sum into two pieces, one for even k 2 j and one for odd
k 2 j 1 , we have

(6)

11

His original approach was met with hesitance by many contemporary mathematicians; these critics were largely silenced
in 1741 when he gave a more rigorous argument for the results.
12
at the very least this result can be obtained from Mathematica in less than 25 minutes, without the use of any results
100

from number theory; the actual theoretical limit is currently probably much greater than 10
the literature.
13

digits, but I couldnt find it in

This would be implied by the transcendental nature of if (3) were to equal a rational number times .
3

57

Advanced Mathematical Techniques


N
N

(1)k
1
1
lim

s
s
s
N

k (2k 1)
j N (4 j 1) j N (4 j 1)
Here, we have made the limiting process explicit because the two series in the brackets do not
converge separately for all values of s unless terms with positive and negative n are summed together.
The sum converges, but we need to be careful when we separate them. Keeping in mind that the above
sums came from expanding a logarithm, we multiply this result by x s s and sum from s = 1 to
infinity. This leads us to
N

x
1

s
N
N

4 j 1

x
x
x
j N
.

2
ln
1
lim
ln
( s) lim ln 1

N
N
x

s odd 1 s
j N 4 j 1 j N 4 j 1 N
1 4m 1
m N
A few comments are in order about the character of this expression. As stated above, contributions
associated with even values of s will ultimately cancel in the sum. Therefore, the series on the left
really only takes place over odd values of s. We will find that the function on the right is odd in x, so
this is appropriate. We use two different indices for the products in the last expression in order to
avoid confusion between the two products. The dummy index m is available, so its use is
appropriate. The ratio consists of a function whose zeros occur only at x = 4j 1 and one whose zeros
occur only at x = 4j + 1. This allows us to identify
N
N

x
x
x

x
lim 1
2 cos
and lim 1

2 cos
N
N

4 j 1
4m 1
4 4
4 4
j N
m N
from the zero structure and the value at x = 0, and the ratio as
N

x
x

cos

1 4 j 1
N
ln
4 4 ln tan x ,
lim ln j

N
N
x
x

4 4
cos

4m 1
4 4
m N
since
x
x
x
cos
cos
sin

4 4
4 4 2
4 4
from the sum identity for the cosine function. Miraculously, the remaining tangent function simplifies
immensely. The half-angle formula for the tangent function,
1 cos
,
tan
2
sin
can be used in conjunction with the sum formulas for sine and cosine to obtain

xs
x
x
.
tan
2
( s) ln sec
2
2

s odd 1 s
Differentiating with respect to x, we obtain the lovely expression

x
sec
2 ( s ) x s 1 ,
2
2
s odd 1
or, rearranging and rescaling x,

(2n 1) 2 n
sec x 22 n 2
x .
2 n 1
n 0
This very nice expression allows us to determine all of the odd values of the Dirichlet beta
function simply by comparing coefficients with the expansion of the secant function. In particular,

58

Section II.2: Infinite Product Representations of Functions

(1)k
1 1 1

1
3 5 7
4
k 0 2k 1
and

(1)k
1 1 1
3

(3)

.
3
33 53 73
32
k 0 (2k 1)
The secant function is, of course, even in x, so the even values of the -function are absent. This is as
expected from the beginning of our derivation. The first even value,

(1)k
1 1 1
(2)
1 2 2 2 ,
2
k
(2
1)
3
5 7

k 0
is known as Catalans constant after the French and Belgian mathematician Eugne Catalan. It plays a
role analogous to Aprys constant in physical and mathematical problems.
We can use the technique illustrated above to derive expansions involving the more general
series

1
f ( s; a )
s
k ( k a )
and

(1)k
g ( s; a )
,
s
k ( k a )
for some complex number a that is not an integer (why do we make this restriction?). Multiplying
f ( s; a ) by x s s , summing from s = 1 to infinity, and recognizing the resulting infinite product by
identifying its zeros and behavior at x = 0, we arrive at

xs
sin( a x)
,
f ( s; a) ln

s
sin a
s 1
or

(1)

f ( s; a ) x

s 1

cot a x .

s 1

Evaluating at x = 0 gives
f (1; a )

1 2a

k a lim k a k 1 a (k a)(k 1 a) cot a .

k 0

k 0

This can be evaluated at any non-integral value of a. Choosing a = 1/2 obviously gives 0, but the
choice a 1 3 gives the new result

1
1 1 1 1 1

1
,

k
k
3
1
3
2
2
4
5
7
8
3
3

k 0
as does the choice a 1 6 :

1
1 1 1 1 1

1
.

1
6k 5
5 7 11 13 17
2 3
k 0 6k 1
You can verify for yourself that the choice a 1 4 gives the same slowly-converging expansion for
4 we got from the inverse tangent function in the expansion notes.
We can also obtain expressions for higher values of s by differentiating with respect to x before
evaluating at x = 0. The expressions for s = 2 and 3 are

1
1
f (2; a)

2 csc2 a

2
2
2
(k 1 a )
k ( k a )
k 0 (k a)
59

Advanced Mathematical Techniques

1
3 csc2 a cot a .
3
(k 1 a)
k
k 0
The choice a 1 2 obviously either gives zero or the Dirichlet lambda function, depending on whether
s is odd or even, but other choices lead to some interesting results:

1
1
1 1 1
4 2

(3k 1) 2 (3k 2) 2 1 22 42 52 27
k 0

(k a) (k a)

f (3; a )

1
1
1 1
1
2

2
(6k 5) 2
52 7 2 112
9
k 0 (6k 1)

1
1 1 1
4 3

(3k 1)3 (3k 2)3 1 23 43 53


81 3
k 0

1
1 1
1
3

.
3
(6k 5)3
53 73 113
18 3
k 0
Note that even values of s consist of adding terms, while odd values consist of subtracting terms. This
generalizes and extends our understanding of why the odd values of the zeta function are unattainable
while the even values are laid out before us: the functions we are getting all of this information from,
the sine and cosine functions, have zeros that extend to both positive and negative values of x. When
adding all of these contributions together, we end up with addition for even s and subtraction for odd s.
In order to determine the missing pieces, we will need to employ a function whose zeros lie only on
one side of the y-axis. The most promising choice of such a function is the Gamma function, which we
will meet shortly (in the next chapter).
Multiplying the function g ( s; a ) by x s s and summing from s = 1 to infinity, we arrive at

(6k 1)

g (s; a)

x
x

(1) k ln 1
ln
Nlim

s
k
a

1 2 j a

j N
N

.
x

1 2m 1 a
m N
Identifying the infinite products as before, by their zeros and behavior at x = 0, we obtain
a x

tan

xs
2
2

.
g ( s; a ) ln

a
s
s 1
tan
2
Differentiating with respect to x gives the expression

a x a x
x s 1 g ( s; a ) csc

sec
csc( a x ) .
2
2 2
2
2
s 1
Again, taking x = 0 gives

(1)k
1
1

(1) k

g (1; a)
csc a .

1
k
a
k
a
k
a

k
k 0
Evaluating this expression at a 1 2 again gives the famous formula for 4 , but other values give
new results:
s 1

60

Section II.2: Infinite Product Representations of Functions

1
1 1 1
2
1

k
k
3
1
3
2
2
4
5
3

k 0

1
1
1 1 1

k
(1)

1
3 5 7
2 2
4k 1 4k 3
k 0

1
1
1 1 1

k
(1)

1 .

k
k
6
1
6
5
5
7
11
3

k 0
Notice the difference in character between these expansions and those obtained from f (1; a ) . The
numbers we are adding are the same, but the sign pattern is different. Differentiating again and again
gives us expressions for higher values of s:

(1) k
1
g (2; a )
(1) k

2 csc a cot a
2
2
2
k ( k a )
k 0
(k a) (k 1 a)

(1)

3
(1) k
1
1
k
2
(
1)

(k a )3 (k 1 a)3 2 csc a 2 csc a 1 .


3
k ( k a )
k 0

Choosing a 1 2 clearly gives the Dirichlet beta function for odd values of s this time and causes the
even terms to vanish. Other choices of a, of course, give new results:

1
1
1 1 1
2 2

(1) k
1

2
2
2 2 4 2 52
27
k 0
(3k 1) (3k 2)

g (3; a )

1
1
1 1 1
2
(4k 1) 2 (4k 3) 2 1 32 52 7 2
8 2
k 0

1
1
1 1
1
2
( 1) k

1 2 2 2

2
2
5 7 11
6 3
k 0
(6k 1) (6k 5)

(1)

1
1
1 1 1
5 3
k

(
1)
1

(3k 1)3 (3k 2)3


23 43 53
81 3
k 0

1
1
1 1 1
3 3
(1) k

1 3 3 3

3
3
3 5 7
64 2
k 0
(4k 1) (4k 3)

1
1
1
1
1
7 3
.

( 1) k
1

3
3
53 73 113
216
k 0
(6k 1) (6k 5)
You should try working out each of these results from the expressions given, as it will give you
practice in working with these kinds of expansions. Obviously, there are many more results that can
be obtained from different choices of a. Any non-integral value of a will work, even those that do not
result in nice angles (integral values are forbidden by the definition of the original series). Irrational
values of a will, of course, not result in inverse powers of integers. They will, however, result in a
correct expression for the sum.
As a final comment in these notes, I would like to point out that we can also choose values of x
that are complex without any additional difficulty. The expressions for our infinite series are not
illuminating when evaluated at complex numbers, but the infinite product expansions do give some
simple and interesting results. In order to evaluate the trigonometric functions with complex
arguments, it is important to remember that14
eix cos x i sin x cis x
for real x. Writing this expression for x x and adding or subtracting gives

14

This expression is derived section 5.2.

61

Advanced Mathematical Techniques

eix e ix
2
and
eix e ix
.
sin x
2i
These expressions are also valid for complex x, so we have
e x e x
cos ix
cosh x
2
and
e x e x
sin ix i
i sinh x .
2
Our treatment of the convergence of infinite products is just as valid for complex numbers as it is for
real numbers, so we immediately have

x2
cosh x
1

2
k 0
k 1 2
and

x 2 sinh x
.

1 2
x
k
k 1
Clearly, neither of these functions has any real zeros. The infinite product expansions, however,
clearly converge on the whole plane. Plugging in some values, we have

1 sinh e e

1 k 2 2
k 1
cos x

1
e2 1
sinh1
2
2e
k
k 1

1
e e
cosh
1

2
2
k 0
k 1 2

1
e2 1
.

1
cosh1

2
2
2e
k 0
k 1 2
Infinite product expansions are kind of shy, and dont really appear all that much in most
physical applications mainly because they often converge quite slowly and are not really that wellsuited for precise numerical work. Infinite series are usually easier to deal with numerically, and lead
to more precise results. The products do, however, contain a wealth of interesting information about
the structure of functions and the manner in which the zeros of a function dictate the behavior of that
function. Results obtained from infinite product expansions are usually very difficult to obtain in any
other way, so their study is useful in understanding the origins of many important results that do
appear in a large variety of physical applications.

62

Section II.2: Infinite Product Representations of Functions

Exercises for Section II.2

In problems 1 8, determine the value of the given infinite product.


1.

k 1

3.

2
2

2.

k2

4.

1 k

1
k 1

7.

1
k 1

3
2

1 4
k 1

1 (2k 1)
k 0

k 1

5.

1 2k

k2

8.

1 9(2k 1)
k 0

k2

6.

6
2

12.

(2n 1)

1 k
k 1

In problems 9 20, determine the value of the given infinite series.


9.

n 1

13.

15.

(3n 1)
n 0

(1)

(1)
n 0

n 0

19.

(12n 1)
n0

17.

10.

(1) n 1
n4
n 1

1
(3n 2) 4

1
2
(12n 11)

11.

(1) n

(2n 1)
n 0

14.

16.

n 0

18.

1
1
(12n 1) 2 (12n 11) 2

20.

(6n 1)
n 0

(1)

n 0

(1)
n 0

(12n 1)
n 0

1
(3n 1) 4 (3n 2) 4

(1)n

1
(6n 5) 4

1
3
(12n 11)

1
1
(6n 1) 4 (6n 5) 4

1
1
(12n 1)3 (12n 11)3

63

Advanced Mathematical Techniques

21. Consider the function lim

1 3n 1 .

n N

(a) Separate the terms with positive n from those with negative n for the finite product, so keep
N finite, and show that neither of these separate products converge as N tends to infinity
unless x 0 . You may include the term n 0 with either of the products; it does not
matter.
(b) Change the sign of n for the terms consisting of negative n, and write the full result as a
finite product over positive values of n. Show that the limit of this product converges as N
tends to infinity.
(c) Determine the function that the infinite product converges to by recognizing its zero
structure and value at x 0 . Write your result in terms of the sine function.
(d) Re-write the function you obtained in part (c) using the standard infinite product
representation of the sine function. Is it obvious that these two products are the same? Are
these two products the same?
(e) Use your results to determine the exact value of lim

n N

.
n0

(a) Show that this product converges for all values of x except at its zeros, i.e. it is an entire
function.

22. Consider the infinite product

1 6n 2 .

x2

1 (3n 1)

(b) Can you use the zero structure of this function to identify it in terms of either the sine or
cosine function? Why or why not? What is the difference between this function and that
considered in problem 21?

23. Determine the values of x for which the infinite product

1 n
n0

x
converges. Where are
1

the zeros of this function?

24. Determine the values of x for which the infinite product

n0

the zeros of this function?

64

1 n

nx
converges. Where are
1

Section II.2: Infinite Product Representations of Functions

, where cn n 1 is a sequence of positive real numbers.


n0
n

(a) What properties must the sequence cn n 1 satisfy if this product is to converge at x 1 ,

25. Consider the infinite product

1 c

assuming cn is never equal to 1?


(b) What are the zeros of this function?
(c) Is it possible for this function to converge at one value of x and fail to converge at every
value of x in the complex plane, excluding its zeros? Explain your reasoning.
(d) Can a function of this form be defined in such a way that its zeros lie at k 3 1 for all whole
numbers k? Explain the reasoning behind your answer.

k 3 1 for all

(e) Can a function of this form be defined in such a way that its zeros lie at
whole numbers k? Explain the reasoning behind your answer.
(f) Can a function of this form be defined in such a way that its zeros lie at
whole numbers k? Explain the reasoning behind your answer.

k 3 1 for all

Section II.3: Summary of Chapter II

Infinite product expansions are closely related to infinite series in that both can be used to
represent functions and the convergence properties of infinite series can be used to derive similar
convergence properties for infinite products, but there are some important differences. First, power
series representations of functions can only converge in a disk. The power series expansion of a
function is limited by the distance from its center to the closest singularity of the function. This is not
true of infinite product representations, as these can converge throughout the plane excluding singular
points, allowing us to look at the behavior of the function everywhere in the complex plane except at
isolated singularities. The infinite product representation of the tangent function is one example, but
there are obviously many others. Second, the conditions required in order for an infinite product to
converge are more subtle than those of infinite series. If the infinite series associated with an infinite
product converges absolutely, then the two requirements are the same. Infinite products associated
with infinite series that converge conditionally are much more delicate, and may converge or diverge
depending on the rate at which the terms in the infinite series go to zero. In this case, we must
determine the convergence properties of a series of logarithms in order to treat the infinite product.
These are not as easily treated as the bare infinite series discussed in chapter 1.
Infinite products also have much to say about the values of infinite series, and can be coaxed to
give a wealth of information along these lines. A major use of infinite products in analytic number
theory involves the determination of the exact values of series involving powers of natural numbers.
These results go to the root of the relationship between natural numbers and the transcendental number
, and are quite striking. Infinite products can sometimes even be tailor-made in order to sum a given
series. Infinite series can sometimes also be used in this vein, but infinite products are readily able to
provide values that are not easily accessible to the infinite series treatment. Prime examples are the
65

Advanced Mathematical Techniques

Riemann zeta series and the Dirichlet series considered in section 2.2. We can construct functions
whose Taylor series give these results at x 1 , but the functions themselves are not easily evaluated.
It is often more useful to determine the values of these series from the infinite product approach, then
use these values to determine the value of the function associated with the infinite series approach.
Infinite product expansions and infinite series expansions of functions are both useful, and have
their own special properties not held by the other. Infinite series expansions are easily integrated and
differentiated, and allow us to quickly obtain results for many integrals involving a given function.
Infinite product representations are not easily integrated or differentiated, but they are easily multiplied
together and directly exhibit the poles15 and zeros of the function in a manner inaccessible to infinite
series. Powers of functions are also easy to treat using infinite product representations, while they can
be extremely cumbersome using an infinite series approach.
Infinite products represent a sort of peripheral view of functions that does not often arise
directly in applications, but leads to results that are widely used. We will see a much broader approach
to infinite products in chapter 5, where we will find that they can be used to represent every function
that consists of a ratio of entire functions. Such functions are referred to as meromorphic functions,
and are extremely important in analytic function theory. Their infinite product expansions allow us to
treat them generally and obtain results on their behavior based on the location of their zeros and poles.
While infinite product representations do not often come up directly in applications, they are extremely
useful in many branches of mathematics.

15

This term can be thought of as meaning a zero of the denominator of a function, so the tangent function has poles at all of
the zeros of the cosine function. It will be properly defined in section 5.8.

66

Section III.1: Definitions of the Gamma Function

Chapter III
The Gamma Function (and friends)
The purpose of this chapter is to define a very important function that appears in many
applications, most notably in physics and advanced mathematics, the Gamma function. This function
was originally defined, in one form or another, by the mathematical equivalent of a saint, the Swiss
mathematician Leonhard Euler, in 1729. We will begin by motivating the need for the Gamma
function as an extension of the factorial function from the integral

u z 1eu du ,

which converges whenever the real part of z is greater than 0. This integral often occurs in
applications and in straightforward mathematical manipulations, so its use as a starting point is
appropriate.
Section III.1: Definitions of the Gamma Function
Consider the integral

x n e x dx for whole number n. The integrand is the product of a

polynomial ( x n ) and a function that is easily integrated as many times as we like ( e x ). Therefore, it
is a prime candidate for integration by parts. Using the tabular method of integration by parts, we
can calculate the value of this integral for any whole number n by integrating e x repeatedly n + 1
times and adorning each factor with the appropriate power of x. This technique is fine if we are only
interested in small values of n, like 3 or 5, but it falls short of the expectations of a relatively short
computation time if we are interested in larger values like n = 300. I dont want to write out 301
terms, no matter how easy it is to determine each individual term! Fortunately, there is an alternative.
When n = 0, it is easy to verify that the integral gives 1. For general n, we write

d
d x
n x
n
n x
n 1 x
0 x e dx 0 x dx e dx 0 dx x e nx e dx
.

x n e x n x n 1e x dx n x n 1e x dx
The value of the integral at any value of n is therefore equal to n times the value of the integral at
n 1 . Since the integral is equal to 1 when n = 0, we can work our way up to any natural number n
simply by repeatedly using this identity. The result is the famous and very useful expression

x n e x dx n ! .

This result is extremely useful, and appears in many, many applications. Unfortunately, it is
not very general. The integral

x e x dx , for example, certainly converges, but its value cannot be

determined from our expression because we do not know how to interpret 1 2 ! . Our derivation also
fails in this case, and in the case of all non-whole numbers n. For this reason, it would be useful to
somehow extend the factorial function to non-integral arguments. Extensions of the factorial function
are definitely not unique; given any function f ( x) that satisfies f (n) n ! for every whole number n,
the function g ( x) f ( x ) A( x ) sin x also satisfies exactly the same requirement for any continuous
function A(x). We can naturally define an extension of the factorial function by demanding that the
67

Advanced Mathematical Techniques

new function also satisfies the functional recursion relation f ( x 1) ( x 1) f ( x) for all arguments x,
but even this requirement does not lead to a unique generalization of the factorial function; the
function A( x ) can be constructed to also satisfy this requirement. It turns out that the additional
requirement that the generalized factorial function be logarithmically convex, or its natural logarithm is
concave upwards, for all positive real numbers is enough to fix the function uniquely. The argument is
almost always shifted to the left one unit, so the Gamma function can be defined as the unique function
( z ) that is logarithmically convex for positive real values of z, takes the value 1 at z = 1, and satisfies
the functional recursion relation z ( z ) ( z 1) for all complex z. The logarithm of this function is
illustrated in figure 1. It is clear from the graph that it is logarithmically convex.
15

10
5

10

Figure 1
All of these arguments about logarithmic convexity came much later than the original
definition of the Gamma function. The original definition represents quite a natural extension of the
factorial function, despite the fact that it is shifted by one unit. We begin with the idea that any value
of the factorial function, no matter how large, can be stepped down by repeatedly invoking the
recursion relation. For n a very large positive integer, much larger than another integer z, we must
have that
z terms

(n z )!
n !(n 1)(n 2) (n z )
( z ) ( z 1)!

.
(n z )(n z 1) ( z 1) z (n z )(n z 1) ( z 1) z
Since n z , each of the factors in the numerator is approximately equal to n:
n! n z
( z 1)!
.
( z n) ( z 1) z
This approximation gets better and better as n becomes larger and larger than z, and becomes
arbitrarily good as n increases without bound for fixed z. This promotes the identification
n! n z
( z ) lim
,
n ( z n) ( z 1) z
which is perfectly well-defined for all z, integral or non-integral, real or complex, unless z is a negative
integer or zero (why?). We are essentially breaking the numerator into two pieces; terms that are
greater than n are approximated by n, and those that are less than or equal to n are relegated to the
already-defined factorial function. The denominator terms are left alone because no definitive
statement about their size can be applied to all of them. This was Eulers original definition for the
Gamma function,16 and remains the most basic of those used today. It is commonly referred to as the
limit definition of the Gamma function.
Using this limit definition, it is easy to show that the Gamma function satisfies the recursion
relation:
16
His limit definition does not take this form, the factorial being disguised in an infinite product, but is entirely equivalent.
Gauss is attributed with this specific form of the limit.

68

Section III.1: Definitions of the Gamma Function

n ! n z 1
zn
n! nz
lim

z ( z ) .
n ( z 1 n)( z n) ( z 1)
n z 1 n ( z n) ( z 1) z
It is also clear that (1) 1 (show this). These two facts immediately imply that ( z 1) z !
whenever z is a whole number, so this function definitely represents an extension of the factorial
function to non-integral arguments. It is still unclear why the choice was made to shift the argument
over to the left one unit. This choice is attributed to the French mathematician Adrien-Marie
Legendre, who is also credited with the use of the symbol , but his reason for making this choice is
not known. Although the shift is kind of annoying when one is making the transition from the factorial
function to the Gamma function, many of the important results associated with the Gamma function
are expressed more compactly and more elegantly using the shifted function than they would be if the
function were not shifted. The limit definition is one example, as the denominator would terminate in
z + 1 instead of z if the function was not shifted, but there are many more striking examples. This
choice has not been made universally throughout history; Gauss used the unshifted factorial function
he called ( z ) ( z 1) in much of his work, and several mathematicians since have refused to work
with the shifted function. Riemanns work is written largely in terms of ( z ) instead of ( z ) ,
including his landmark 1859 paper on the distribution of prime numbers. Despite these prominent
dissenters, the shifted factorial function is commonly accepted today throughout almost all
mathematical circles.
As an interesting historical fact, every mathematical text containing a portrait of Legendre up
to the present actually contains a portrait of the politician Louis Legendre instead of the mathematician
Adrien-Marie Legendre. This mistake was so entrenched in mathematical society that it was only
discovered in 2005, and it appears in many highly-reputable tomes on the history of mathematics. Its
use to portray the mathematician Legendre hails back at least as far as 1900, 67 years after his death.
The only surviving likeness of the mathematician Legendre is a caricature including Legendre and the
French mathematician Joseph Fourier currently contained in the library of the Institut de France in
Paris that was brought to light in 2008. This caricature is illustrated in figure 2, with Legendre on the
left. It is strikingly different from the likeness contained in the calculus texts of the last century.
( z 1) lim

Figure 2
While the limit definition is very useful in establishing certain basic properties of the Gamma
function, its form is not conducive to actually calculating the value of the Gamma function. For this,
we need another expression for the Gamma function. We can determine such a form following the
German mathematician Karl Weierstrass, the father of modern analysis. Manipulating the limit
definition, we have

69

Advanced Mathematical Techniques

1
(n z )(n 1 z ) (1 z ) z
n z n 1 z
1 z

lim
z lim e z ln n


z
n

( z )
n! n
n
n 1
1

.
n
z ln n z

z
z
z ln n
z lim e
e

1 1
1 z z lim
1
n
n
k
n n 1
k 1

It is tempting to interpret the product as an infinite product, but that infinite product will not converge
at any value of z (except z 0 , of course) by comparison to the harmonic series. This is, of course,
mirrored by the fact that the term e z ln n n z 0 as n ; the product of the divergent infinite
product and this factor tends to a finite limit as n increases without bound. In order to get a useful
result, we need to somehow separate these two contributions in a well-defined way as n . To
motivate the lucrative way in which this is accomplished, observe that the factor
z 2 z z
1 2 1 1
k
k k
would lead to a convergent infinite product. Now, we cannot just multiply and divide by 1 z k in
order to make the infinite product converge because this factor equals zero when z k . However,
convergence or divergence of an infinite product is determined exclusively by the behavior of the tail
terms with k very large. Therefore, we do not really need to multiply and divide by 1 z k . We only
need a function that behaves like 1 z k when k is very large. The function e z k works admirably for
this purpose, as it is never zero and certainly behaves like 1 z k when k | z | . With this in mind,
we multiply and divide each of the factors in our product by this term:
n

z 1 j n
n
1
z
z
z lim e z ln n e z k e z k 1 z lim e z ln n e j1 e z k 1
n
n

( z )
k
k .
k 1
k 1

z lim e
n

z H n ln n

e
k 1

z k

z
1
k

The notation H n 1 j has been introduced in the last expression.


j 1

This modified product definitely will converge as n tends to infinity, so the expression in the
exponent should also converge. The convergence of this expression can be established algebraically,
as was done in chapter 1, but it is also instructive to have a geometric interpretation of its convergence.
Figure 3 contains a graph of the function 1 x along with a set of rectangles one unit wide whose
height is given by 1 j , where the left side of the box occurs at x j . The sum of the areas of these
rectangles from x 1 to x n is therefore equal to H n , while the area under the curve from x = 1 to
x = n is equal to ln n . The difference H n ln n is given by the sum of the part of the areas of the
boxes that lies above the curve. Sliding each of these regions over into the rectangle between x = 1
and x = 2, as illustrated in figure 4, we see immediately that this difference is less than the area of this
rectangle, and therefore bounded. The sequence is also monotonically increasing, as the rectangles in
figure 3 always lie above the curve. As with all bounded monotonic sequences, this sequence
definitely converges. The number it converges to is called the Euler-Mascheroni constant, , after
Leonhard Euler and the Italian mathematician Lorenzo Mascheroni. To determine its value, we appeal
to the techniques of chapter 1. The value of H100 is 5.1873775 . The remaining part of the sum
from 101 to n can be bounded by
n
n 1
1
n
.
ln
ln
101 j 101 j
100
70

Section III.1: Definitions of the Gamma Function

Subtracting ln n and adding H100 to both sides, we obtain


5.1873775... ln101 0.57257... 0.582207... 5.1873775... ln100 .
The actual value of the constant is
0.5772156649... .
It is currently known to almost 30 billion digits,17 but the question of whether or not it is a rational
number remains open. If it is rational, the denominator of its ratio in lowest terms must have more
than 242,080 digits. This constant was first defined by Euler in 1734, and he may also have developed
the infinite product expression for the Gamma function. However, credit for this expression is usually
given to Weierstrass (dont worry about Euler he will be fine).
1.2
1.0
0.8
0.6
0.4
0.2
0.0
0

Figure 3

10

Figure 4

In light of the above analysis, we have the infinite product expansion

1
z

ze z e z k 1 .
( z )
k
k 1
This expansion converges for all z in the complex plane, so the function 1 ( z ) is entire like the sine
and cosine functions. It has zeros at all non-positive integers, so ( z ) has poles at all non-positive
integers. This is obvious from the recursion relation and the fact that (1) 1 , as
1 (1) 0 (0)
implies that (0) is ill-defined (this recursion argument also resolves the age-old pre-calculus and
algebra II question as to why 0! = 1 instead of 0; 1! must equal 1 0! by the recursion relation). The
relationship between the Gamma function and the sine function, while not at all obvious, can easily be
determined from this result. We can use our method of making the infinite product associated with the
Gamma function to rip the sine function apart:

z2
z
z
sin z z 1 2 z e z k 1 e z j 1
k
j
k j 1
k 1
k 1

.
1
1

z( z ) ( z )( z ) ( z )(1 z )
This results in the important reflection identity for the Gamma function,

.
( z )(1 z )
sin z
The most famous result of this identity comes from setting z 1 2 :
1
1
2

,
2 1
2
17

This computation is attributed to Alexander J. Yee, a computer science student at Northwestern University. It apparently
took 205 hours of processing time, and beat the previous record of almost 15 billion digits set by Yee and another student,
Raymond Chan, in January of 2009.

71

Advanced Mathematical Techniques

but there are many other useful identities that follow from this relationship. One of the most obvious
is the fact that (1 z ) can be obtained from ( z ) whenever z is not an integer. This is illustrated by
the expression
1 3
2 ,
4 4
which is certainly valid regardless of the fact that neither of the values on the left can be expressed in
closed form in terms of other well-known mathematical constants like or .
The third expression for the Gamma function arises from the integral that I began these notes
with. Consider the integral
n

1
t
z 1
z
z 1
n
0 t 1 n dt n 0 u (1 u ) du .
This integral can easily be integrated by parts repeatedly for any positive integer n whenever the real
part of z is larger than 0 to obtain
n

1
t
n! n z
z 1
z
z 1
n
0 t 1 n dt n 0 u (1 u ) du (n z )(n 1 z ) (1 z ) z .
Recognizing the expression on the right, we conclude that
n

t
( z ) lim t z 1 1 dt t z 1e t dt ; Re( z ) 0 .
0
n 0
n
This integral expression is sometimes called Legendres expression for ( z ) regardless of the fact that
Euler was almost certainly aware of all three expressions.18 Evaluating at z 1 2 , we arrive at the
important result

2
1
t 1 2 e t dt 2 e u du .
0
0
2

This integral is often referred to as Gauss integral, after the German mathematician, physicist, and
man of all trades Johann Carl Friedrich Gauss. It can be derived in many different ways, but one of
the most famous involves squaring the integral and changing to polar coordinates:

I e x dx ;
0

I 2 dx dy e x

y2

d r dr e r
0

from which the above result follows.


Of all of the results from the Gamma function we will consider in the following, Gauss
integral definitely represents the most important from the standpoint of physical applications. It will
come up again and again, in fields as varied as statistics, the average speed of a gas molecule at a welldefined temperature, models of light emission from an interstellar nebula, and the quantum
fluctuations of a molecule in a solid crystal. Interestingly, one of the reasons that this integral appears
2
so often in applications is that it is so easy to compute. Even though the function e x has no
elementary antiderivative and therefore cannot be integrated over any finite interval without the use of
special functions, Gauss result allows us to easily compute all integrals of the form

x n e x dx ,

where n is a whole number. For n odd, the integral can be computed easily via the u-substitution
u x 2 . For n even, we generalize Gauss result:
18
Euler gave an integral from 0 to 1 of a power of the logarithm function instead; the two representations are related via a
simple u-substitution. It is not clear whether or not Euler was aware of Weierstrass infinite product; Eulers infinite product
representation is actually our limit definition in disguise.

72

Section III.1: Definitions of the Gamma Function

1
.
2
This expression can be repeatedly differentiated with respect to , generating any desired power of
x 2 in the integrand. For example,

15
6 x2
0 x e dx 16 .
We will see these integrals again later on in the course of this text. The general expression for Gauss
integral can even be used for complex to determine the values of other interesting integrals. Taking
i , for example, gives

1 1 i 4

ix 2
0 e dx 2 i 2 e 2 2 (1 i) .
Taking the real and imaginary parts, we arrive at

1
2
2
0 sin x dx 0 cos x dx 2 2 .
These integrals are surprisingly important in many studies of optics, where they are referred to as
complete Fresnel integrals after the French physicist Augustin-Jean Fresnel. Compare these to the
series expansions for the incomplete Fresnel integrals introduced in chapter 1.
Aside from the Gaussian integral, many other integrals can be done easily with our integral
expression in hand. The results

1 3
2 x
3
0 x e dx 6 3 2

1 3
3 x3
0 x e dx 9

3 5
2 3 x 5
0 x e dx 5 5 27
represent just some of the many integrals that can be evaluated using the Gamma function. We can
even complete the general integral
m 1

n ; m 1 0
m ax n
m 1
0 x e dx
n
na n
using this technique.
One may question the utility of expressing such integrals in terms of the Gamma function. It is
certainly nice that we can express the first two integrals in terms of the number 1 3 , but this
doesnt really help us unless we have a way to determine the numerical value of this constant. The use
of expressing quantities in terms of these special functions is that we have other results available to us
that allow us to compute these values. We can, for example, compute 1 3 2.678938535 from the
infinite product or from another integral that is easier to approximate. We will develop a Taylor series
that will also be useful in this regard below, but the most efficient way to actually calculate these
values consists of approximating the value of the Gamma function at a much larger argument, like
30 1 3 , using the asymptotic expansion derived by the Scottish mathematician James Stirling in the

e x dx

18th century, and stepping down to 1 3 using the recursion relation. We will cover this technique
and the associated asymptotic expansion for the Gamma function in Chapter 5. The key point is that
73

Advanced Mathematical Techniques

special functions like the Gamma function allow us to determine relationships between important
integrals that may not be directly obvious, like

2 x e2 x dx 9 x3e x dx ,
3

and allow us to employ the results of centuries of work by dedicated mathematicians to the problem at
hand.
Some other integrals we can obtain from this expression for the Gamma function involve
products of exponential functions, power functions, and trigonometric functions. These integrals
obviously will employ the complex exponential expressions for the trigonometric functions, as the
integral

x3e x cos x dx

is obtained by taking the real part of the integral

(4)
3 (1i ) x
0 x e dx (1 i)4

6
2e

i 4

3
.
2

This gives 3 2 for our integral, as well as the result

x3e x sin x dx 0

for free. Fractional powers are also interesting; see if you can show that

7 2
53
15 4 2 2 i 8
(1 i ) x
2
.
x
x
e
dx
e

0
(1 i )7 2 23 2 e i 4 7 2
32

The only issue with these sorts of integrals is that they are much more efficiently taken if one uses the
polar form of complex numbers,
z a ib z ei Arg( z ) ,
where z a 2 b 2 and Arg z Arctan b a as long as a is positive. This form makes it much
easier to take powers of complex numbers, as we need only take the given power of the positive real
number z and multiply the argument function by the power. It follows from the analysis of the
French mathematician Abraham de Moivre, who originally derived it in 1707. Euler suggested the
exponential notation in his 1749 formula
ei cos i sin ,
which simplifies almost everything associated with complex numbers. Using this result, we can easily
extract the integrals

x 2 x e x cos x dx

15 4 2 2

15 2
cos
32
8
64
and

2 1

15 4 2 2
15 2
sin
2 1 .
0
32
8
64
As always, integrals obtained from this technique must first be established to converge. Just as
the integral representation gives us the incorrect result for the integral
x
e
0 x x dx 1 2 2 ,
it will also give us the result

74

x 2 x e x sin x dx

Section III.1: Definitions of the Gamma Function

x 2 sin x dx Im x 2 eix dx 2 .
0

Neither of these integrals converge, so neither value is accurate. There are, however, some
applications in which the latter of these can actually be given meaning. What this result really tells us
is that the limit

lim x 2 e x sin x dx 2 .
0 0

The parameter is introduced in order to regulate the divergence in this integral, and its value is
taken to zero at the earliest possible moment. If there is a physical reason why we should introduce
this parameter, then this value is entirely appropriate. If, on the other hand, there is no reason to
introduce this parameter, then the appearance of this integral implies either that we have done
something wrong or that the physical model we are using fails in our region of interest. This comes up
fairly often in physical models, and is usually an indication that we used an expansion technique that is
not valid in our region of interest. In such cases, it would be entirely inappropriate to use the value
given to us by analytic continuation. The appearance of the divergent integral really means that we
have done something wrong.

Exercises for Section III.1


In problems 1 18, determine the value of the given integral in terms of the Gamma function. Write
your result in terms of if possible; otherwise, just use the recursion relation to express the result in
terms of a Gamma function whose argument lies between 0 and 1.

4.
7.
10.
13.

1.

16.

x x e2 x sin x dx

x e dx
5. x e dx
8. x x 2
dx
11. x x ln 1 x dx
14. x e cos 3x dx

x e x sin 3x3 dx

17.

x3e2 x dx
2

x e3 x dx
2

x 2 2 x dx
x 2

x5

dx

2.

2 3 x 4

2 2 x5

3 x x

2 x

cos x
dx
x

6.
9.
12.
15.
3.

x x e4 x dx
x x e3 x x dx
3

x 2e 2 x dx

1
3
0

18.

x 2 ln 5 1 x dx

e x sin 2 x 2 dx
2

sin x
dx
x

19. The reciprocal of the Gamma function has zeros at all non-positive integers. Use this fact to
construct a function whose zeros occur at 3n 1 for all whole numbers n. What is the infinite
product representation of your function? Use your result to express the function

x2
f ( x) 1

(3k 1) 2
k 0
in terms of the Gamma function. Show that the exponential factors in the infinite product of
the Gamma function cancel when you consider this function, with zeros at both positive and
negative values of x.

75

Advanced Mathematical Techniques

20. The reciprocal of the Gamma function has zeros at all non-positive integers. Use this fact to
construct a function whose zeros occur at 5n 2 for all whole numbers n. What is the infinite
product representation of your function? Use your result to express the function

x2
f ( x) 1

(5k 2) 2
k 0
in terms of the Gamma function. Show that the exponential factors cancel when you consider
this function, with zeros at both positive and negative values of x.
sin x 2
0 x dx certainly converges, but its value is not easily obtained by our
methods using the Gamma function because the related integral involving the cosine function
does not converge. Determine the value of this integral by considering it as
2
2
sin x
sin x
dx
1
0 x dx lim
0
0
x
instead. Why does this help? This is another technique of regularization of integrals.

21. The integral

sin 8x 3
0 x3 dx certainly converges, but its value is not easily obtained by our
methods using the Gamma function because the related integral involving the cosine function
does not converge. Determine the value of this integral by considering it as
3
3
sin 8 x
sin 8 x
dx
3
0 x3 dx lim
0
0
x
instead. Why does this help? Note that the process you are using to find the integral is really
only valid when 2 , but the result is still valid because the integral we are interested in
converges.

22. The integral

cos x
dx .
3
x
(a) Show that this integral converges by breaking it up into a sum of pieces over which the sign
of the integrand is constant and using the alternating series test.
(b) Determine the value of this integral.

23. Consider the integral

24. Consider the integral

cos x dx .

(a) Use the u-substitution u x to show that this integral does not converge.
(b) Use the techniques presented in the text to determine the value of

for positive numbers .

(c) Determine the limit lim e


0

cos x dx

cos x dx . What does this limit mean? It will give the

same value regardless of the power of x in the exponential, as long as this power is positive.
Explain qualitatively what happens to this integral as is taken smaller and smaller.
Include an explanation of the sign of the limit in your analysis. What is the qualitative
difference between this limit and the original integral, excluding the exponential?
76

Section III.1: Definitions of the Gamma Function

25. Determine the value of the limit lim


x0

(ax )
. Hint: The recursion relation may be useful.
( x)

Section III.2: Expansions of the Gamma Function and Related Integrals


The integral expression for the Gamma function definitely originates the majority of the
appearances of Gamma function in physical applications, but in order to make use of its full potential
we need to be able to differentiate it. This act of differentiation will allow us to determine the value of
integrals such as

x 2 ln x e x dx and

x ln 2 x e x dx (why?).

Discussions of this process online will almost invariably lead to the introduction of the
d
( z )
digamma function, ( z ) ln ( z )
, but there is an easier way to determine the value of
dz
( z )
these integrals. Weierstrass infinite product provides us with a convenient means with which to
determine a Taylor expansion of the Gamma function. From this expression, it is clear that
1


(1 ) ( ) e e k 1 .
k
k 1
The natural logarithm of this expression is given by

(1) j 1 j

ln (1 ) ln 1

jkj
k
k 1 k
k 1 k
j 1
.

( 1) j j
( 1) j ( j ) j

jkj
j
k 1 j 2
j 2

This is the sought-after expansion, valid whenever 1 . This expression allows us to compute all
values of ( z ) on the disk z 1 1 , and, through the recursion relation, all values satisfying

0 z n 1 for any integer n. Note that all of the integer values of the Riemann zeta function appear
in this expression, not just the even values. The Gamma function contains information about all of
these values, in contrast to the related result with the sine function. We can re-sum part of this series
by including specific contributions to the Riemann zeta function, giving us more accurate numerical
representations of the logarithm of the Gamma function:

(1) j ( j ) 1 j

ln (1 ) ln 1
j
j 2
.
j
j
3
(1) ( j ) 1 1 2 j

ln 1 ln 1 2
j
2
j 2
A plot of the difference between these expressions and the logarithm of the Gamma function is shown
in figure 5. The solid line shows the result of re-summing the first term in the Riemann zeta function,
the short dashes the result of re-summing the first and second terms, and the long dashes the result of
re-summing the first three terms. The effect of including the first term in the series along with these
re-summations is shown in figure 6. Note that re-summing is useful, but the series still has its role to
play. The best computational techniques associated with this series include re-summing several of the
contributions in addition to including several terms in the resulting series. This is used to compute ten
or twelve values to high precision, and then polynomials are used to interpolate between these values.
77

Advanced Mathematical Techniques

The result of this analysis can give us eight or ten correct digits for all real values of z on the open
interval (-1,1).
- 1.0

- 0.5

0.5

1.0

0.02

- 0.1

0.01

- 0.2

- 1.0

- 0.5

0.5

1.0

- 0.01

- 0.3

- 0.02

- 0.4

- 0.03

Figure 5

Figure 6

This series also gives some interesting results for the Euler-Mascheroni constant . Taking
1 , for example, yields

( n) (2) (3) (4)


(1) n

.
2
3
4
n
n2
This expansion converges, but it does so at an extremely slow pace. We can improve this convergence
by re-summing some of the contributions to the Riemann zeta function, as before:

( j) 1
1 ln 2 (1) j
j
j 2

3
(1) j
ln 3
2
j
j 2

( j ) 1 2 j

(1) j 1
; n 1

j k n k j
j 2
The last expression clearly gives us a way to calculate
lim H n ln n

H n 1 ln n

without actually taking the limit; it represents the correction

( 1) j 1 1
H n ln n

j k n k j n
j 2
associated with taking a finite value of n. A somewhat faster-converging and more exotic expansion is
obtained by taking 1 2 :

(1) j ( j )
.
j 2 2 j 1 j
This series can again be partially re-summed to obtain faster convergence:
j
16 (1) ( j ) 1
1 ln

9 j 2
2 j 1 j
.

3
256
(1) j
1
ln

( j) 1 j
2
225 j 2 2 j 1 j
2

ln

The last series is alternating and behaves like 2

j6
j

when j is large, so allows us to easily estimate

the error in any given approximation. Another interesting series for the Euler-Mascheroni constant
involving the natural log of can be obtained by taking 1 2 :
78

Section III.2: Expansions of the Gamma function and Related Integrals

ln

( j)

.
2 j 1 j
This series is not alternating, though, so it is not as easy to determine the error in truncating the
expansion at a given value of j.
We can also derive a series for directly from its definition, following Euler. Since the series
n
j 1
is telescoping, its value can be determined exactly for any value of n:
ln

j
j 1
j 2

j 1
ln(n 1) .
j

ln
j 1

Manipulating the definition of , we have

1
n
k 1 k

lim H n ln n lim ln(n 1) ln


n

n 1
n 1
k 1
n 1

lim

ln
ln

n
k
n
k 1 k

,
1
1
ln 1
k
k 1 k
since the last contribution goes to 0 as n tends to infinity and the final series converges by limit
comparison to the p-series with p = 2.
We can use our new-found series for the logarithm of the Gamma function to determine the
exact value of many integrals. To get the feeling for how this works, we write

(1 ) x e x dx e ln x e x dx
n0

n!

ln n x e x dx .

The integrals of powers of logarithms times e x are therefore given as coefficients of the expansion of
(1 ) . Our above expression is the expansion of the natural logarithm of (1 ) rather than
(1 ) itself, so a little work is necessary to actually determine the values of these integrals. This
process is kind of daunting at first, but it is really not that difficult once you get used to it. Our
expansion implies that
(1 ) e

( 1)n

(n)

n 2

n 0

n ! 0

ln n x e x dx .

Suppose, for example, we are interested in determining the value of

ln 2 x e x dx . That means we

need the coefficient of 2 in the expansion. All terms in the exponent that contain powers of larger
than 2 can therefore be ignored as small, and we have

( 1)n

(n)

(2)

(2) 2 1
(2) 2



1
2
2

2!

.
2
(2) 2 2
1 2
2
1 (2)
1
2
2
2
The required integral is therefore given by
(1 ) e

n 2

ln 2 x e x dx 2 (2) 2

2
6

As an added bonus, we also obtain the integrals

e x dx 1
and
79

Advanced Mathematical Techniques

ln x e x dx

for free. Using this technique to determine

ln n x e x dx for some fixed whole number n, we

automatically also obtain the results for all of the related integrals associated with powers of the
logarithm less than n. Try using this technique to determine the value of

ln 4 x e x dx . Remember

that it is not necessary to keep all terms in the expansion; any term that contains a power of larger
than 4 can immediately be disregarded. You should obtain

ln 4 x e x dx 4 6 2 (2) 8 (3) 6 (4) 3 2 (2) ,


along with

ln x e

dx 3 3 (2) 2 (3) .

Note that each of the contributions to these expressions contains the same order, defined by taking the
order of as 1 and that of ( n) as n for n > 1, and adding the order of terms that are multiplied
together. This process continues, as you can easily see by asking Mathematica to compute integrals
for higher powers of ln x . Mathematica will simplify ( n) for all even values of n, so you can count
the order of as 1 for this purpose (why?).
To compute the value of an integral like

x3 ln 2 x e x , we begin by writing

x3 e x dx (4 ) (3 )(2 )(1 )(1 ) .

This integral is also given by the expansion

x3 e x dx
n0

n!

x3 ln n x e x dx ,

so the desired integral is contained in the coefficient of 2 in the expansion. We can determine this
coefficient in exactly the same manner as before, by ignoring all contributions associated with higher
powers of :
(3 )(2 )(1 )(1 ) 6 1 31 2 (1 ) e

( 1)n
n 2

(n)
n

(2) 2 2 2
11


6 1 2 1
6
2
2

11
11
(2) 2 2

6 1 1

6
2


6
The required integral is therefore given by

11
(2) 2
x
3
2
2
2
0 x ln x e 12 1 6 2 12 22 6 ,
and we also obtain

x3 ln x e x dx 11 6

for free. This type of calculation definitely looks daunting at first, but it really isnt that bad once
you get the hang of multiplying out the expansions and keeping only certain terms.
One other slightly modified approach involves the determination of integrals like

80

x ln x e2 x dx .

Section III.2: Expansions of the Gamma function and Related Integrals

We could do this by making a u-substitution and employing other results for

u e u du

and

u ln u e u du ,

but there is a more direct approach that will also be useful later on. The desired integral is the
coefficient of in the expansion of

x1 e 2 x dx 22 u1 e u du .
0

Keeping only terms up to the first power of , we have

1
1 ln 2
1
2 x
1
0 x e dx 4 2 (1 )(1 ) 4 e 1 (1 ) 4 1 ln 2 1 (1 )
.
1
1 1 ln 2
4
The integral therefore has the value

1
2 x
0 x ln x e dx 4 1 ln 2 .
It should be obvious that this process can be continued with higher powers of x and/or ln x , and with
different coefficients of x in the exponential function.
Our technique is extremely useful in helping us to determine the exact value of many integrals
in terms of more-or-less well-known mathematical constants like , (3), and ln 2 . Its basic ideas are
completely general, and can be applied to any integral of the form

x ln m x e x dx ,

where the only restrictions on the constants , , n, and m are those required to make the integral
converge and that m must be a whole number. However, our expansion technique does run into a
problem for certain large classes of integrals. In order to calculate the integral

x ln x e x dx ,

for example, we would begin by writing

x1 2 e x dx 3 2 1 2 1 2

and identifying the desired integral as the coefficient of in the expansion of the function on the right.
Unfortunately, our expansion for the Gamma function is not of this form. We do not have the
Maclaurin expansion for 1 2 , so it is not easy for us to determine the coefficient of in that
expansion. We can forestall this issue in the specific case of 1 2 by using a duplication
formula due to Legendre:
1 2 z 22 z 1 2 1 z z 1 2 .
This formula can be derived in many different ways, but the cleanest from my perspective involves
the Beta function defined below. For this reason, I will save the derivation of this formula for that
section. Using this result, we can write
(1 2 )
1 2 22
.
(1 )
Both of these Gamma functions have the appropriate form for the use of our expansion, so the desired
integral is easily obtained:

x ln x e x dx 1 ln 2 2 .

We can also use this technique to determine


81

Advanced Mathematical Techniques

4 ln 2 2 4 ln 2 3 (2) 2 .
8
Our luck runs out, however, when we are interested in determining an integral like
0

ln 2 x e x dx

x ln x e x dx
or

ln x e x dx .

Both of these integrals require the coefficient of in the expansion of 1 3 , and there is no
simple way to determine this expansion. There are multiplication formulas analogous to Legendres
for this, but they are more complicated and do not allow for an easy determination of the expansion.
As an alternative, and a more general means with which to determine such integrals, one turns
to the digamma function
d
( z )
.
( z ) ln ( z )
dz
( z )
This logarithmic derivative of the Gamma function has the expression

1 1
1
1
1
,
( z )
z
z k 1 k z k
z
k 1 k ( k z )
which converges for all z except at non-positive integers (where the Gamma function itself diverges).
Because of the divergence at z = 0, one usually writes the shifted expansion

1
1
1
.
( z 1)
z
k
k 1 k z
k 1 k ( k z )
In terms of this function, the coefficient of in the expansion of 1 3 is clearly given by

1 3 1 3 1 3
and the desired integrals are given by
1 1
x ln x e x dx 1 3 1
3 3
and

1 1 1
x3
0 ln x e dx 9 3 3 .
Using a more advanced technique, it is possible to actually determine the value of the digamma
function at all rational positive numbers. Basically, one sums the series associated with the digamma
function in closed form. This technique leads to logarithms of the rational number, as well as the
trigonometric functions and their logarithms evaluated at times the rational number. We develop
this technique in chapter 6; it results in the value
3 3
1 3
ln 3 ,
6
2
and allows us to express these integrals in closed form, except for the number 1 3 .

Using the digamma function, we can even express more exotic integrals like

1 1 1
x 2
0 ln x e dx 2 2 2 .
This integral is inexplicably rejected by Mathematica as divergent, even though it clearly converges.
Mathematica gives the correct value if asked to evaluate the integral numerically, and also fails to
82

Section III.2: Expansions of the Gamma function and Related Integrals

evaluate related integrals with other irrational powers of x in the exponent, so it seems to be an isolated
issue with these types of integrals. Computer-based algebra systems are very useful, but it is important
to understand that they are not infallible. One must be careful to ensure that the results of a computer
algebra system such as this make sense before one trusts them.
To determine integrals involving higher powers of the logarithm, we need higher derivatives of
the Gamma function. These derivatives are given by

d m 1
1
( m ) ( z 1) m 1 ln ( z 1) (1) m 1 m !
,
m 1
dz
k 1 ( k z )
and are called the polygamma functions. It is clear from their expansion that
( m ) (1) (1) m 1 m ! (m 1) ,
in agreement with our series expansion of ln (1 z ) (why?). The polygamma functions can also be
evaluated exactly for integral multiples of 1/2, and there are other rational values of z for which some
of these functions are available in closed form.

Exercises for Section III.2


In problems 1 24, determine the value of the integral.
mathematical constants , , and e if possible.
1.

4.

7.

10.

13.

16.

19.

22.

x 2 ln x e 3 x dx

2.

x3 ln 2 x e x dx

5.

x ln x e 3 x dx

8.

x x ln x e x dx

11.

x ln x e 2 x dx

14.

ln x e x dx

17.

ln x sin x e x dx

20.

x 2 ln x sin x e x dx

23.

x 4 ln x e 2 x dx

3.

x 2 ln 3 x e x dx

6.

9.

12.

15.

ln 2 x e 2 x dx

18.

ln x cos 2 x e x dx

21.

x ln x sin 3x e2 x dx

24.

Express your result in terms of the

x ln 2 x e x dx
2

ln x e x dx
3

ln x e3 x dx

x 2 ln 2 x e x dx
x 2 ln 4 x e x dx

x ln 3 x e x dx
2

ln 2 x e x dx
3

ln x e 2 x dx
2

x ln 2 x e x dx
x ln x sin 3x e x dx
x 2 ln x sin 2 x e3 x dx

83

Advanced Mathematical Techniques

25. Rederive the expansion

3
256
(1) j
1
j 1 ( j ) 1 j
2
225 j 2 2 j
2
for the Euler-Mascheroni constant. How many terms must be included to obtain 5 digits for
(so the error is less than 5 10 6 )? How many must be included for 20 digits? What about 30
billion digits? Obtain an expression for the Euler-Mascheroni constant that re-sums the next
two contributions to the zeta function, so the analogous term in the brackets starts with 1 5 j .
How many terms must be kept in this new expansion in order to obtain 5 digits, 20 digits, and
30 billion digits for ? Is this a significant improvement? Remember that all of the zeta
functions must be known to at least as many digits as you intend to obtain. What happens if
you re-do the analysis, starting with the expression
j
4 (1) ( j )
ln
j 1
j 2 2 j
and keeping the terms j = 2, 3, and 4 as exact while re-summing the first three contributions to
the rest? You do not have to actually do this, just think about how the result will behave.
How will the remaining series behave for large j? Will this manipulation represent a
significant improvement to the calculation of 5 digits, 20 digits, or 30 billion digits? Explain
why or why not. How many terms would need to be re-summed in order to determine the
value of to 30 billion digits by keeping only 100,000 terms in the resulting, un-re-summed,
series? Does this approach seem feasible for an actual calculation?

ln

Section III.3: The Beta Function


The Gamma function itself does not appear in applications quite as often as a related result, the
Beta function. This function allows for quick evaluation of many different integrals in terms of the
Gamma function. To develop the main formula associated with this function, we consider the product
( )( )

x 1 e x dx

du dx x 1 u x

du u 1e u
0

dt t
0

y 1 e y dy dx dy x 1 y 1e x y
1

x
e u du dx x 1 u 1 1
0
0
u

eu .

(1 t ) 1 ( ) dt t 1 (1 t ) 1
0

In the development of this function, we made two important u-substitutions. In the first, we use
u x y , and in the second we use t x u . Make sure you understand how this formula is
developed. The end expression,
1
( )( )
1
1
0 t (1 t ) dt ( ) B( , ) ,
is known as the Beta function for historical reasons.
84

Section III.3: The Beta Function

This function can be used directly to compute many different integrals that would otherwise be
very difficult to obtain. Examples include

t
0

13

1 t dt

t (1 t ) dt 8

4 3 3 2 6 1 3 3 1 3 1 6

17 6
55 5 6
55

1 7
.
9 9 14
We can also use this function to demonstrate many properties of the Gamma function. One prime
example is Legendres duplication identity. Beginning with the integral
1
2 ( z)
z 1
z 1
t
(1

t
)
dt

,
0
(2 z )
we make the u-substitution u 2t 1 . This gives
1
z 1
2 ( z)
1 1
t z 1 (1 t ) z 1 dt 2 z 1 1 u 2 du
0
1

(2 z )
2
.
z 1
( z )
1 1
1 1
2 z 2 1 u 2 du 2 z 1 s 1 2 (1 s ) z 1 ds 2 z 1
0
0
2
2
2 z 1 2
Comparing the two sides gives the identity:
22 z 1
(2 z )
( z ) z 1 2 .

1 t 7 dt

There are other ways to derive this identity, notably from the infinite product, but this one is definitely
the cleanest.19
As with the above integrals, we can extend our analysis by including logarithms in our
integrals. This treatment is fairly straightforward, but there are some new results that come from the
fact that we now have two variables that we can differentiate with respect to. The integral

x 2 (1 x)3 ln x dx ,

for example, is given by the coefficient of in the expansion of


1
(3 )(4)
1
2
3
0 x (1 x) dx (7 ) 6 (6 )(5 )(4 )(3 )
1
1

.
5 4 3 1 6 1 5 1 4 1 3

1 1 1 1 1 1 19
1 1
60 6 5 4 3 60 20

Apparently,
19
.
1200
Notice that the irrational parts of the Gamma function have canceled completely in the calculation of
this integral. It is clear that every integer power of the logarithm will give a rational result for this

x 2 (1 x )3 ln x dx

19

The infinite product derivation of this identity is notable because it can be modified fairly easily to also give results for
other multiples of z. These are not as simple, but can be useful in many different situations. The general form of this
identity is attributed to Gauss.

85

Advanced Mathematical Techniques

integral. This will not be the case if the power not associated with is not an integer, as illustrated by

x 2 1 x ln x dx :

x 2 1 x dx

(3 ) 3 2
(2 )(1 )(1 ) 3 2

9 2
7 2 5 2 3 2 1 2 1 2

.
(1 )
1 3 2
16
105 1 352 105 1 2
Using the duplication identity on the remaining ratio of Gamma functions, we see that
(1 )
2 2 2 (1 )
.

1 2
(1 2 )
If we write out the expansions for these two Gamma functions, we will immediately see that the
coefficient of cancels. Therefore, this ratio cannot contribute to the integral we are interested in (it
will, however, contribute to an integral with the logarithm squared). Pulling all of this together, we
arrive at the result
1
16
389
2
0 x 1 x ln x dx 105 2 ln 2 210 .
Many different results again come from this sort of treatment. In integrals involving square
roots, we often arrive at the same conclusion that the coefficient of associated with the final ratio of
Gamma functions cancels. In fact, it is difficult to construct an integral involving Beta functions in
which the Euler-Mascheroni constant appears in this final result. Try using these techniques to show
that
1 ln x
0 1 x dx 4 ln 2 1
and
1
4
0 1 x ln x dx 9 3ln 2 4 .
When the argument of the logarithm also appears under the square-root, the character of the
integral changes. Consider, for example, the integral
1 ln x dx
0 x(1 x) .
To evaluate this integral, we begin with the expression
1
1 2 1 2 1 2
1 2
1 2
0 x (1 x) dx (1 ) 22 2 (1 ) .
As before, the ratio of the Gamma functions does not contribute to the coefficient of in the
expansion and we have
1 ln x dx
0 x(1 x) 2 ln 2 .
The appearance of is an indication that the x was also associated with a square-root. It is important
to recognize that the ratio of the Gamma functions is not equal to 1. It does not contribute to the
coefficient of in the expansion, but this does not imply that it doesnt contribute to any of the
coefficients. The integral associated with the square of the logarithm, for example, requires the
coefficient of 2 :

86

Section III.3: The Beta Function


2

( 1)k

(k )

(2 )k

(2)
(4 2) 2
(1 2 )
e
22
22 e 2

2
(k ) k
(1 )

2 2 ( 1)k
k
k 2
e
1
2

1 2 ln 2 2 ln 2 2 1 (2) 2 .
2

k 2

1 2 ln 2 2 ln 2 2 (2) 2

The integral is therefore given by

ln 2 x dx
2
4 ln 2 2 .
0
3
x(1 x)

Try taking this one step further to show that


3
1 ln x dx
2
3
0 x(1 x) 2 ln 2 4 ln 2 6 (3) .
Another slight twist that appears in the consideration of the Beta function is the appearance of
integrals that converge, but resist direct efforts to compute them using the above formalism. One such
integral is
1 ln(1 x )
0 x dx .
This integral certainly converges, as the expansion of the natural logarithm implies that the integrand
behaves like -1 near x = 0. However, our standard approach would have us consider the divergent
integral

x 1 (1 x) dx .

To get around this, we instead consider the integral

( )(1 )
(1 )
for very small values of . The desired integral is the coefficient of in the expansion of this
integral, with the understanding that should be taken to zero at the earliest possible convenience.
This process is known as regularizing the integral in question. We are looking for a perfectly welldefined result, but have to introduce in order to avoid singularities that appear in the course of our
analysis. Multiplying and dividing by , we arrive at the expansion
1 (1 )(1 )
(1 )

x 1 (1 x) dx

(k ) k
(k )

k
exp (1) k
k ( ) ( 1) k
.

k
k

k 2
k 2

k
1
(
)

exp (1) k
k k ( )k
k

k 2

It should be clear from the appearance of this expansion that there are no terms in the exponent that do
not contain a factor of to cancel the one out front. Keeping only terms with one power of (any
more will cause the result to vanish with ), we have
1 (1 )(1 ) 1

1
exp (1) k (k ) k 1 (2) (3) 2 (4) 3 .
(1 )

k 2

The only term that is actually divergent as 0 is the first term; the others give us the integrals

87

Advanced Mathematical Techniques

ln k (1 x)
k
0 x dx (1) k ! (k 1) .
The one we are interested in is given by
1 ln(1 x )
2
0 x dx 6 .
This result could have been obtained much earlier, as the expansion for the natural logarithm gives

1 ln(1 x )
1 1 n 1
1
2
dx

x
dx

2
0 x
0
6
n 1 n
n 1 n
The other integrals, on the other hand, would have been much more difficult to obtain in this manner.
Treading on much more dangerous waters, we can also consider the integral
2
1 ln (1 x )
0 x 2 dx .
This integral is again perfectly well-defined, but resists direct efforts to compute its value.
Regularizing it by considering instead
1
(1 )(1 )
(1 )(1 )
2

0 x (1 x) dx ( ) (1 ) (1 ) ,
we have a chance at actually computing its value. In this case, the integral is actually only welldefined when 1 , so we have no direct access to the limit as 0 . Nevertheless, the above
analysis does yield the value. The reason for this is a bit involved, and has to do with the way that
functions must behave in the complex plane. Since the integral we are looking for is actually welldefined, its value has no choice but to be represented in this function. Try going through this analysis
to show that
k
1 ln (1 x )
k
0 x 2 dx (1) k ! (k ) ; k 2 .
3
1 ln (1 x )
What happens when we consider the well-defined integral
What about
dx ?
0
x3
4
1 ln (1 x )
0 x 4 dx ? Hint: it is not as easy as you might guess from the two results I showed something
happens with the -dependence that complicates matters a bit, but the results are still doable.
As a final point on the determination of integrals involving logarithms from this incarnation of
the Beta function, consider the integral
1

ln x ln(1 x) dx .
0

This integral contains two logarithms, but they are not of the same function. As such, we cannot
simply introduce a single new parameter to expand in. Instead, we need to introduce two new
parameters:
1
(1 )(1 )
1
(1 )(1 )

0 x (1 x) dx (2 ) 1 (1 ) .
We are interested in the coefficient of in this expansion. This is the same ratio of Gamma
functions we had above, so we have
1
(1 )(1 )
(k ) k

1 ( ) ( ) 2 exp (1) k
k ( )k .

1 (1 )
k
k 2

Since we are only looking for the coefficient of , we can ignore any powers of or that are
greater than 1. This immediately gives us
88

Section III.3: The Beta Function


1

ln x ln(1 x) dx 2 (2) .
0

Try to show that


1

ln
0

x ln(1 x)dx 2 (2) (3) 3


and

ln
0

x ln (1 x)dx 24 8 (2) 8 (3) 2 2 (2) 6 (4)


2

from the coefficients of 2 and 2 2 in this expansion. Remember that the coefficient gets
multiplied by the factorial of the power it is associated with when determining the integral.
We can expand the usefulness of our original Beta function result by making some changes of
variable. First, suppose that we are interested in what this integral implies about other integrals whose
1 t
.
range of integration extends from 0 to infinity. To do this, we use the change of variables u
t
This clearly equals 0 when t = 1 and approaches infinity as t 0 . Our integral becomes
1
1
u
du
( )( )
1
1
t
(1

t
)
dt

0
0 (1 u) ( ) .
This is a very useful form of the Beta function integral, and can be directly used to determine such
exotic integrals as

t dt
0 (1 t )2 2

dt
0 t (1 t )

dx
0 x(1 x)3 2

t dt
2

3
(1 t )
9 3

t dt 2
.

(1 t ) 2
4
In the last two of these expressions, I have used the reflection identity to simplify the results. The
reflection identity always applies when the power of (1 t ) in the denominator is a natural number
(why?).
The general nature of the parameters and in these expressions allows us to develop even
more general integration formulae. The integral
m 1
x
dx m n m n
,
0 x n 1
n ( )

4

for example, gives rise to many interesting results including

2 1 4
dx

0 x 4 1 4 ,

x x dx

3
,
40

and
89

Advanced Mathematical Techniques

x dx

x 1

When 1 , we have the special result

2 5 1 10
5

x m 1 dx
m
0 x n 1 n csc n .
This can be derived from the reflection identity. This is useful not only for deriving the integrals
dx
2
0 x3 1 3 3

dx
2

4
x 1
2
x dx
2
0 x 4 1 4
3 x dx

0 x8 1 4 ,
but also because it is easy to differentiate the cosecant function (at least once or twice) to obtain
m 1
x
ln x
2
m
m
0 x n 1 dx n2 csc n cot n
m 1
x
ln 2 x
3
m

2 m
dx
csc

1 .
2 csc
3
0 x n 1
n
n
n

These integrals give rise to the exotic expressions


ln x
2 2

dx
,
0 x3 1
27
ln x
2 2
,

dx
0 x 4 1
16
2
x ln x
2 2
,

dx
0 x 4 1
16
2
ln x
10 3

dx
0 x3 1 81 3 ,

x ln 2 x
3
,
dx
3
0
27
x 1
and
2
3 x ln x
5 3

dx
.
0 x 4 1
96 3
Obviously, there are many more where these came from. We could, of course, also differentiate our
earlier expression with respect to m to obtain more general results, but this requires a knowledge of the
derivative of the Gamma function and therefore does not always lead to such nice results in terms of
.
The third, and in some respects most useful, integral expression for the Beta function is easiest
to derive directly from the product ( )( ) :

( )( )
90

1 u

e du

1 v

e dv .

Section III.3: The Beta Function

Instead of integrating this directly, as we did to derive our original expression, we make the change of
variables u x 2 and v y 2 and change over to polar coordinates:

( )( ) 2 x 2 1e x dx 2 y 2 1e y dy 4 r dr r 2 2 2 e r
2

2 u 1e u du

cos 2 1 sin 2 1 d 2( )

cos 2 1 sin 2 1 d

cos 2 1 sin 2 1 d

Therefore, we have

( )( )
.
2( )
The Beta function gives us access to the integral of products of any powers of the sine and cosine
function over the interval 0, 2 . Some examples are

cos 2 1 sin 2 1 d

cos 2 sin 8 d

sin d

7
512

2 3 4
2 2

3 4
1 4

2 1 4
d

0 sin 2 2
2 3 4
cos sin d
2

and
2
4 3 4 2 2 2
2
0 cos sin d 5 1 4 5 3 4 .
This expression for the Beta function is most often used for integer powers of the trigonometric
functions, but it is interesting that the square roots can also be obtained.
One very interesting result that can also be obtained from this is the area and volume of an ndimensional hypersphere. Following the treatment of spherical coordinates in three dimensions, we
parameterize an n-dimensional sphere with radius r using the n + 1 coordinates x j

n 1
j 1

in terms of the

n 1 angles k k 1 and the single angle as


n 1

x1 r cos 1
x2 r sin 1 cos 2
x3 r sin 1 sin 2 cos 3

xn 1 r sin 1 sin 2 cos n 1


xn r sin 1 sin 2 sin n 1 cos
xn 1 r sin 1 sin 2 sin n 1 sin
The entire hypersphere is traversed by allowing all of the angles to vary from 0 to , while the
angle varies from 0 to 2 . Note that the ordinary sphere we all think of when someone says
sphere is, in this notation, a two-dimensional sphere since the surface itself is two-dimensional. The
element of hyperarea generated on the sphere by infinitesimal changes d j and d in our angles is
given by the product of the effective radii associated with these changes and the change in angle. A
91

Advanced Mathematical Techniques

change in the angle is clearly associated with the radius r sin 1 sin 2 sin n 1 (just look at xn and
xn 1 ), and a change in the angle n 1 is associated with the radius r sin 1 sin 2 sin n 2 . This
process continues, and we arrive at the area element
dS r n sin n 1 1 sin n 2 2 sin n 1d1d 2 d n 1d .
The surface area of this hypersphere is therefore given by integrating this element from 0 to for all
of the angles and 0 to 2 for the angle :

S n 1 r n sin n 1 1 d1 sin n 2 2 d 2 sin n 1 d n 1 d


n 1 n 1 1
2 1



2
2
2
2
2 2 2
rn
n 1
n
3

2
2
2
n 1

2 2
r
n 1

2
The n + 1 is kind of annoying; it is easier to use d = n + 1 for the dimension of the space in which the
hypersphere lives (d = 3 for normal spheres). The surface area of our n-dimensional hypersphere is
2 d 2 d 1
Sd
r .
d 2
This is a very useful result in many mathematical applications. We can check it by evaluating at d = 3:
2 3 2 2
S3
r 4 r 2 ,
3 2
in agreement with the known result. A three-dimensional hypersphere with radius r has the surface
hyperarea
2 2 3
S4
r 2 2 r 3 .
2
To determine the hypervolume enclosed by an n-dimensional hypersphere of radius a, we simply
integrate the surface area result from r = 0 to r = a:
2 d 2
d 2
Vd
ad
ad .
d d 2
1 d 2
4
1
This gives a 3 for the normal sphere and 2 a 4 for a three-dimensional hypersphere. These
3
2
results surprisingly do find applications in physics, most notably in general relativity, where the
universe as we know it is often modeled as the surface of an expanding three-dimensional hypersphere,
and in particle physics, where doing calculations in d-dimensions is an attractive way in which to
regularize the divergences that often occur in quantum field theory. The second of these applications
actually won the Dutch physicists Gerardus t Hooft and Martinus Veltman the 1999 Nobel prize in
physics.
As a final topic associated with the Beta function, lets consider integrals involving logarithms
of the trigonometric functions. These are fairly standard, except that they almost always require the
use of the duplication formula. As an example, consider
n

92

sin x ln sin x dx .

Section III.3: The Beta Function

As always, we begin by considering instead


2
2
1 2 1 2
1 2
2 1 2
1
sin
x
dx
.

0
2 3 2 2
1 1 2 2 1 (1 )
The ratio of Gamma functions here should be familiar. It does not contribute to the desired integral, so
the result is

sin x ln sin x dx ln 2 1 .

Bumping the power of the logarithm up by one, we obtain

ln 2 2 2 ln 2 .
12
Some other results that you should try out on your own are
2

0 ln sin x dx 2 ln 2
2

2
2
2
0 ln sin x dx 24 12 ln 2
2

2
0 cos x ln sin x dx 8 1 2 ln 2 .
2 ln sin x
The integral
dx must be regularized in order determine its value. Try to use the techniques
0
cos x
illustrated above to show that this integral has the value
2 ln sin x
2
0 cos x dx 8 .
Your analysis should also make it easy to go one step further and show that
2
2 ln sin x
7
0 cos x dx 4 (3) .
These regularization procedures do not end with simply regularizing a single integral. The integral
2 1
1
0 sin x x dx ,
for example, is definitely well-defined because the problem at x = 0 cancels in the difference. In
order to calculate its value, however, we need to separate the two contributions. This can be
accomplished by considering instead the integral
2
1
1
0 sin1 x x1 dx
for very small values of . Using this approach, try to show that this integral has the value
2 1
1
4
0 sin x x dx ln .
Your analysis should also allow you to show that
2 ln sin x
ln x
1
4 2

dx
ln
ln

0 sin x x 2
24
0

sin x ln 2 sin x dx 2

93

Advanced Mathematical Techniques

Exercises for Section III.3


In problems 1 57, determine the value of the given integral. Express your answers in terms of and
other mathematical constants if possible. Whenever k appears, it is a natural number.
1.

4.

7.

10.

x 1 x

13.

16.

19.

22.

25.

x 1

28.

x 1

31.

x 1

34.

x 1

37.

94

x3 1 x dx

2.

x x 1 x 2 dx

5.

x 1 x 2 dx

8.

ln x dx

11.

x 1 x

x ln x ln(1 x) dx

14.

x ln 2 x ln 2 (1 x) dx

17.

x ln x ln 1 x 2 dx

20.

x ln 2 x ln 2 1 x 2 dx

23.

dx

26.

x 1

dx

29.

x 1

dx

32.

x 1

dx

35.

0
1

x x

x2

x
4

x x ln x

x 4 x ln x

x3

dx

38.

x3 x(1 x) dx

3.

x(1 x) dx

6.

x 2 1 x 4 dx

9.

x 1 x

12.

x 1 x

x 2 ln x ln(1 x) dx

15.

x 2 ln 2 x ln 2 (1 x) dx

18.

ln

x 2 ln x ln 1 x 2 dx

21.

ln x ln 1 x dx

x x ln x ln(1 x) dx

24.

dx

27.

x 1

dx

30.

x 1

33.

x 1

0
1

ln 2 x dx

x2 x

x 2 ln x

x 2 x ln x

x x

x2

x 4 x(1 x) dx

ln x dx

2 3

ln 2 x dx

x 2 ln 2 x ln(1 x) dx

x ln 3 (1 x) dx

x ln x ln 2 (1 x) dx

x3x

dx

39.

dx

x ln x

36.

dx

x 3 ln x

dx

x 4 1 x 2 dx

1
4

dx

x2 x

dx

dx

x ln x

dx

Section III.3: The Beta Function

x 2 ln x

40.

43.

46.

49.

ln(1 x)
0 x x dx .

sin 6 x cos 4 x dx

44.

sin 4 x ln cos x dx

47.

50.

ln 2 (1 x)
0 x x dx

53.

56.

ln 3 1 x 2

52.

55.

ln 2 (1 x) ln x
dx .
0
x2

x 2 ln 2 x

41.

dx

x3

dx .

x3 ln 2 x

42.

sin 2 x cos8 x dx

45.

sin 4 x sin x dx

48.

51.

ln(1 x ) ln x
dx
0
x

54.

ln(1 x) ln 2 x
dx .
0
x

ln k 1 x
0 x3 dx .

57.

ln k 1 x
0 x 4 dx .

dx

dx

sin 2 x cos 2 x ln sin x dx


sin 4 x cos 2 x sin x dx

ln 2 1 x 2

x2

dx .

ln k 1 x
0 x k dx , for k a natural number, as a finite series
involving the Riemann zeta function. What happens to this integral as k increases without bound?

58. Determine the value of the integral

59. What happens to the hyper-area and hyper-volume of the unit d-ball (the (d -1)-dimensional sphere
with radius 1 and the volume contained within) as d grows without bound? Is this behavior
expected qualitatively? What dimension d boasts the hypersphere with maximum surface area?
What about maximum volume? Determine the maximal surface area and volume for integer
dimension d.

60. The integral

ddk

M2

finds a surprisingly large amount of use in quantum field

theories. Here, k is a d-dimensional vector and , , and M are all constants. The integral is taken
over the entire d-dimensional space.
(a) Argue that the direction of the vector k is unimportant to the integrand, so the integral can be rewritten as a single-variable integral involving the surface area of a sphere in d-dimensions.
(b) Re-write the integral, using the arguments you gave in part (a), and evaluate it. This integral
was part of the reason behind the arguments given by the winners of the Nobel prize in physics
in 1999, and was instrumental in demonstrating the result that won the Nobel prize in physics in
2004.
95

Advanced Mathematical Techniques

Section III.4: The Riemann Zeta Function

As we have seen above, the Riemann zeta function plays a major role in the development of the
Taylor series expansion for the Gamma function. It also appears prominently in other related
expansions, including those of tan x , cot x , ln sin x x , and ln cos x , and in many integrals
involving logarithms and exponentials. These appearances account for the majority of its use in
applications, but they are a far cry from the true importance of this function in analytic number theory.
The focus of this course is on applications of mathematics, so we will spend the majority of our effort
on the appearance of the Riemann zeta function in integrals and sums, but it is also important to see
where this function fits into number theory. Its contributions to analytic number theory arguably
represent some of the most ground-breaking mathematics of the last century and a half, so it is
appropriate to start this section with a brief treatment of some of those important results.
The zeta functions contributions to number theory begin with a simple observation. If the real
part of s > 1, then the zeta function

1
1 1 1 1 1
( s) s 1 s s s s s
2 3 4 5 6
n 1 n
converges absolutely; its terms may be added in any order we like without changing the ultimate
result. Multiplying this expression by 1 1 2 s gives
1

1 s
2

1 1 1 1 1
1 1 1 1

( s) 1 s s s s s s s s s
2 3 4 5 6
2 4 6 8

.
1 1 1 1
1
1 s s s s s
3 5 7 9 11
All of the multiples of 2 have been removed from the sum, so the remaining sum is taken only over
odd numbers. Multiplying by 1 1 3s , we have
1
1
1 1 1 1
1
1 1
1
1
1

1 s 1 s ( s) 1 s s s s s s s s s s
3 5 7 9 11
3 9 15 21 27
3 2
.
1 1
1
1
1
1 s s s s s
5 7 11 13 17
The remaining contributions are not multiples of either 2 or 3. Continuing in this manner, we see that
multiplying by 1 1 p s , with p a prime number, removes all contributions to the sum that are
associated with a multiple of p. Obviously, all natural numbers except 1 itself are multiples of some
prime number. Therefore,

1 s ( s) 1 ,
p
p prime
or
1

ps

1
( s) 1 s s .
p
p prime
p prime p 1
This can be thought of as the infinite product representation of the Riemann zeta function. The
product goes over all prime numbers p. Note that the number 1 is not prime. It plays a special role in
our number system, being the only natural number that is neither prime nor composite. This
distinction is made in order to have a unique factorization of every natural number (except 1) in terms
of primes. If 1 were considered prime, then any factorization could be modified by including an
arbitrary number of factors of 1.
96

Section III.4: The Riemann Zeta Function

There are many ways to establish that there are an infinite number of primes. One of the oldest
(and cleanest, in my view) is a proof by contradiction. Suppose that there were only a finite number N
N
of primes, pn n 1 , where the sequence is arranged in order from smallest to largest. Then, the number
N

P 1 pn 1 2 3 5 7 p N
n 1

cannot be prime, as it is certainly larger than the largest prime pN . On the other hand, this number is
definitely not divisible by any of the primes in our sequence, as the remainder of any of these divisions
is 1. It must therefore either be prime or be divisible by a prime number that is not on our list. This
contradiction implies that our original hypothesis is incorrect, i.e. our original list is incomplete and
there cannot be a finite number of primes. We can see how this works by looking at a few trial
sequences of prime numbers. If we consider 2 as the only prime number, then the argument asks us to
construct the number P 1 2 3 which cannot be divisible by 2. Therefore, it must either be prime
or divisible by a prime other than 2. Of course, 3 is a prime number. If we consider only 2 and 3 as
primes, then the argument constructs the number P 1 2 3 7 which cannot be divisible by either 2
or 3. Again, it therefore must be prime itself or divisible by a prime other than 2 or 3. Note that this
argument does not allow us to actually construct the correct sequence of primes. We have, in this case,
sort of skipped over 5. Continuing, suppose that 2, 3 and 7 are the only primes. The argument
constructs the number P 1 2 3 7 43 , which again is prime. We have skipped over many, many
primes in this leap, but we could continue to construct P 1 2 3 7 43 1807 . This number is not
prime, as it is equal to the product of the primes 13 and 139. Neither of these numbers is on our list, so
the argument still works fine.
Although the fact that there are an infinite number of primes has been known since antiquity
(the above argument goes back to Euclid circa 300 B.C.), the product representation of the zeta
function provides two interesting alternative ways of seeing the truth of this fact. If there were a finite
number of primes, then the product could certainly be evaluated at s = 1. None of the terms in the
product is divergent, and there are only a finite number of them, so the product
p

p prime p 1
is definitely finite. On the other hand, the zeta function certainly diverges at s = 1 because it becomes
the divergent harmonic series there. It must grow without bound as s approaches 1, so either the
formula for the zeta function is incorrect or our assumption that there are finitely many primes is.
Another way to see this that avoids the divergence is to evaluate the product at s = 2. If there were
only a finite number of primes, then the product
p2
2

p prime p 1
would be a rational number. Eulers result that (2) 2 6 disallows this possibility because is a
transcendental number, again forcing us to conclude that there are infinitely many primes.
Since there are an infinite number of primes, the product representation of the zeta function is
an infinite product. This basic relationship between the Riemann zeta function and the prime numbers
is the reason why this function is so important in analytic number theory. We can determine a more
direct relationship by taking the logarithm of both sides of our product representation:

1
1
1
1
sk .
ln ( s ) ln 1 s
sk
p
k
p
k
p
p prime
k 1
p prime

p prime k 1
This series is similar to the corresponding series for the Gamma function, in that we are summing sums
of inverse powers of whole numbers. In this case, however, we do not include all of the natural
97

Advanced Mathematical Techniques

numbers in our sums. Only prime numbers are included. Confining our attention to values of s near 1
(NOT s = 1 s near 1), we see that the left-hand side of this expression grows without bound as s
approaches 1. The right-hand side consists of the term when k = 1 and other terms with larger values
of k. The sum of the latter of these terms definitely does not diverge, as

1
1
1
1
1
1
1
2
1
1
k 2 3 4
2

sk
2
k
p
k
p
p
p
p
p
p
p
2
3
4
2
3
2
k 2
p prime
k 2
p prime
p prime
p prime

1 1 1
1
1
1
1 2

2
2
p p
p prime 2 p
p prime 2 p 1 1 p p prime 2 p( p 1)
whenever s > 1. These inequalities are fairly straightforward; we use the geometric expansion, which
is justified because all prime numbers are greater than 1. The final expansion converges by limit
comparison to the series associated with (2) , so the original series is bounded by a finite number.
This implies that the divergence of (1) is due entirely to the divergence of the series associated with
k = 1, 1 p .

p prime

Many important results in number theory are associated with the behavior of this series. The
Meissel-Mertens constant, named for the German astronomer Daniel Friedrich Ernst Meissel and the
German mathematician Franz Mertens, is an analogue of the Euler-Mascheroni constant, defined by
N 1

1
1
M lim ln ln N ln 1 .
N
p
p prime p

p prime p

The limit converges to the value 0.2614972 . This constant really is the gamma constant, but
including only prime numbers in the sums. Note that the logarithm in the definition of the Euler
constant has been replaced by the log of the logarithm. This will have important consequences as we
go further into the theory of prime numbers.
The fact that the series 1 p diverges implies that prime numbers are, in some sense, more
p prime

common than perfect squares since the sum of inverse squares of natural numbers converges. Suppose
that, instead of adding up the contributions from all prime numbers, we only consider those prime
numbers that are less than the specific number N. Waving our hands a bit, we arrive at the expression
N
N
1
1
ln
.
n
n 1
p prime p
The analysis of chapter 1 indicates that for large N the sum on the left is approximately given by ln N .
This brings us to the very important result20
N
1
ln ln N .
p prime p
The importance of this result lies in its prediction for the density of prime numbers as we go to larger
and larger values of N. Writing the left-hand side as an integral gives us
N
N dx
1

2 x ln x
p prime p
We start the integral at x = 2 to avoid the divergence at x = 1; this is immaterial to the final result, as
we are really only concerned with the asymptotic behavior of the integral. The interesting thing about
the integral on the right is that it implies that the density of prime numbers is given approximately by
1 ln x , i.e. the number of prime numbers on the interval x, x dx is given approximately by dx ln x .
20

The definition of the Meissel-Mertens constant gives a much more valid ground on which to base this result.

98

Section III.4: The Riemann Zeta Function

The approximation is made better and better as x increases. This statement is formalized by the
celebrated prime number theorem, which states that the number of prime numbers less than a given
large number N, represented by ( N ) , is approximately given by the expression
N dx
(N )
Li( N ) .
2 ln x
Riemann was able to show in 1859 that this expression, as well as a much stronger statement about the
exact number or primes below N, is valid if a certain condition on the behavior of the Riemann zeta
function is valid. The function Li(N) is called the logarithmic integral. It cannot be expressed in
terms of elementary functions, but has been the subject of a large amount of mathematical work over
the last century and a half.
- 50

2 106

4 106

6 106

8 106

2 1010 4 1010 6 1010 8 1010 1 1011

1 107

- 5000

- 100
- 150
- 200

- 10000

- 250
- 300

- 15000

Figure 7

Figure 8

The little version of the prime number theorem, which states that the number of primes less
than a given number N is asymptotically given by the above expression, was proved independently by
the French mathematician Jacques Hadamard and the Belgian mathematician Charles Jean de la Vallee
Poussin in 1896. For accessible numbers, it seems that the prime number function always lies below
this estimate. The difference ( x ) Li( x ) is plotted versus x in figure 7 for values of x up to ten
million and in figure 8 for values up to one hundred billion. The available data caused many
mathematicians to believe that this inequality is strict, i.e. that the actual number of prime numbers
below N is always smaller than Li(N). Surprisingly, the British mathematician John Littlewood was
able to show in 1914 that these two functions actually cross each other an infinite number of times as
N is increased. His student, the South African mathematician Stanley Skewes, was able to show that
the first such crossing must occur before the gargantuan number
e79

1034

ee 1010 ,
immortalized in history as Skewes number, the largest useful number ever discovered.21 This number
has since been dwarfed by Grahams number, from multi-dimensional geometry, that cannot be
expressed in this document because of space.22 Skewes bound has also been reduced in recent years
to the more reasonable, yet still gargantuan, number 1.397 10316 . This represents a lesson in
mathematics: even though a postulate seems reasonable from all available data, it may still be false.
Infinity is really, really, really, really big.
The strong statement of Riemann is that the number of primes less than N satisfies a certain
formula given in Riemanns landmark 1859 paper, one of the most influential mathematical papers of
all time. I will not give the formula here, as it is fairly complicated,23 but it involves the value of the
logarithmic integral evaluated at certain numbers. These numbers are associated with the so-called
non-trivial zeros of the zeta function. In order to understand what is meant by this term, we need to
understand something called analytic continuation. The Riemann zeta function, as defined above, only
21
This result assumes that the Riemann hypothesis is true. If the hypothesis fails, then the 34 is replaced by 963 in
Skewes result.
22
Even if every atom in the universe was to be used as a digit in a power expansion of this number, the amount of space in
the visible universe would still be woefully inadequate for this purpose like not even close.
23
This formula is given and derived in chapter 6.

99

Advanced Mathematical Techniques

converges when the real part of s is larger than 1. We can think of this along the same lines as the fact
that the series

converges only when the absolute value of x is less than 1. This fact is definitely

n 0

1
, certainly is finite for more values than
1 x
just those. In fact, this function is perfectly well-defined for all x 1 . This is the basic idea behind
analytic continuation: we are interested in finding a differentiable function that equals the sum of a
given series wherever that series converges, but is also defined outside the domain of convergence of
the series. Standard results in complex analysis indicate that such a function, if it exists, is unique.
There can only be one analytic continuation of a given function.
We cannot simply sum the series associated with the Riemann zeta function like we can in the
case of the simple geometric expansion, but we can get around this fact by playing games with an
associated conditionally convergent expansion. Consider the expansion

(1) k 1
1 1 1 1 1
(s)
1 s s s s s
s
k
2 3 4 5 6
k 1
.
1
1 1 1 1

( s ) 2 s s s s 1 s 1 ( s )
2 4 6 8
2
Whenever ( s ) is defined, it clearly has the value

true, but the function that the series sums to when x 1 ,

2 s 1
( s) .
2 1
The new series, ( s ) , is called the Dirichlet eta function after the German mathematician Johann
Dirichlet who developed it in the middle part of the nineteenth century. It is defined whenever the real
part of s is greater than 0, due to the alternating series test. Therefore, this last expression allows us to
define the zeta function for Re( s ) 0 . This new definition still has difficulty at s = 1, but this
difficulty is confined to the factor in front of the new series. The value of (1) is clearly ln 2 , as can
be seen from chapter 1. As s approaches 1, the zeta function behaves like
1
,
( s )

s 1
s 1
as can be seen from the expansion of the denominator near s = 1. This allows us to extend the
definition of the zeta function to all values of s satisfying Re( s ) 0 .
A somewhat more sophisticated approach that I will not go into here24 gives the reflection
identity for the zeta function,
s
( s ) 2 s s 1 sin (1 s ) (1 s ) .
2
This identity expresses ( s ) in terms of the Gamma function and the sine function, as well as
(1 s ) . It is satisfied for all complex numbers s, with the exception of s = 0 and s = 1 for which both
sides of the equation are undefined.25 This identity allows us to compute many values of the zeta
function that are not accessible from the standard series definition. Allowing s to approach 0, for
example, we see that the right-hand side of this relation approaches
s
1
1
( s )
2 0 1
(1)
.
s 0
2
2
s

( s)

s 1

24
This approach represents one of the main results of Riemanns original paper, and is considered in much more detail in
chapter 6.
25
Its value at whole numbers is given by the associated limit. The limit converges in all of these cases.

100

Section III.4: The Riemann Zeta Function

In this expression, I have simply plugged in 0 wherever it doesnt give either 0 or infinity. Where it
would give these values, I plugged in what the function approaches as s approaches 0. This should be
clear for the sine function; the result plugged in for the zeta function itself follows from the idea that
( s ) approaches 1 ( s 1) as s approaches 1. Substituting 1 s for s gives the result.
0.020
0.015
0.010
0.005

- 0.005

- 10

-8

-6

-4

-2

Figure 9
We already know how to calculate the values of the zeta function for arguments that have real
part greater than 1, so the interesting new results of this formula come from taking the real part of s
less than 1. The Gamma function has no zeros at all in the finite plane, and its poles lie at values of s
that are not in the region of interest, so it can be regarded simply as a number for this purpose. The
s
same is true of the functions 2s and s 1 , so we are left with the function sin
. This function is
2
clearly equal to zero at even integers, so we have
( 2 n ) 0
for positive integer n. These are the trivial zeros of the zeta function, in that their value is determined
without regard to the value of the zeta function on the right-hand side of the equation. These zeros
cause the zeta function to oscillate wildly along the negative real axis, as illustrated in figure 9.
Dividing the expression by s 2n and taking the limit as s approaches 2n, we obtain an interesting
expression for the derivative of the zeta function at these zeros:
(2n)!
(2n) (1) n
(2n 1) .
2n
2 2
At negative odd integers, the sine function does not supply this zero; the value of the zeta function at
these points is given by
(1) n 1 (2n 1)!
(2n 1)
(2n 2) .
22 n 1 2 n 2
Due to the representation of even values of the zeta function given in this chapter as well as chapter 2,
these values are all rational. Examples are
1
(1)
12
1
.
(3)
120
The value of the zeta function is definitely nonzero whenever the real part of s is larger than 1,
as evidenced by the series expansion. The remaining contributions to the reflection identity do not
provide zeros except where we have already discussed, so the values of the zeta function for real part
less than zero also cannot be zero, except as already indicated (the trivial zeros). Therefore, the only
remaining zeros of the zeta function are confined to the so-called critical strip lying between real part
of s = 0 and 1. The proof of the strong version of the prime number theorem depends critically on the
location of these zeros, and the Riemann hypothesis, arguably the most important unsolved problem in
contemporary mathematics, states that all of these have real part equal to 1/2.
101

Advanced Mathematical Techniques

It is, in some sense, obvious from the form of the reflection identity why Riemann would have
made this conjecture. The existence of a zero with, say, real part equal to 1/4, would imply the
existence of another zero with real part 3/4. All zeros of the zeta function must come in complex
conjugate pairs because the zeta function is complex symmetric: ( z ) z . The reflection identity
would therefore require any non-trivial zero with real part not equal to 1/2 to have three associated
zeros instead of just one. Despite the ease of understanding this fact from the reflection identity, this
result has resisted over a century and a half of attempts by dedicated mathematicians. The complexity
of the Riemann zeta function makes it difficult to prove that all of the nontrivial zeros have real part
equal to 1/2, even though it has been shown that there are infinitely many such zeros. A graph of the
trek the zeta function makes as it traverses the line with real part equal to 1/2 is shown in figure 10.
The zeros are, of course, the places where the curve goes through the origin. Changing the real part
slightly makes a huge difference in the appearance of this graph, as shown in figure 11 for real part
equal to 3/4.
1.5

1.0

1.0

0.5

0.5

-1

- 1.0 - 0.5

- 0.5
- 1.0
- 1.5

Figure 10

0.5

1.0

1.5

2.0

- 0.5
- 1.0
- 1.5
- 2.0

Figure 11

Now that we have discussed some of the pure math applications of the Riemann zeta function,
lets get down to business. The infinite product expansion of the Riemann zeta function is nice for
number theorists, but applied mathematicians find much more use in the integral expression
n 1
x
dx
0 e x 1 (n) (n) .
This expression is derived in a fairly simple way. Since e x 1 for all x > 0, we employ the geometric
expansion:
n 1

dx
1
1
n 1 x
n 1 ( k 1) x
(
)

x
e
dx

x
e
dx

n
(n) (n) .

x
n
0 e x 1 0

0
1 e
k 0
k 0 ( k 1)
The integral converges only when the real part of n is larger than 1, so there should be no nonsense
like

dx
1 1
0 x e x 1 2 2 2.58841...
This integral does not converge, so we cannot directly assign a value to it. There are situations in
which this type of manipulation is acceptable as a means to a well-defined end, but I will point these
out if and when we come across them. In the absence of a well-defined reason why expressions like
this should be accepted, these integrals should be interpreted as an indication that we did something
wrong.
The integral representation of the -function is directly applicable to many problems,
including the evaluation of

102

Section III.4: The Riemann Zeta Function

x dx 2
0 e x 1 6
2
x dx
0 e x 1 2 (3)
and
3
x dx
4
0 e x 1 15 .
This last integral comes up, surprisingly, in the evaluation of the Stefan-Boltzmann constant associated
with their law of radiative heat. We will discuss this application in chapter 10.
As always, we can get more for our money by introducing an additional constant:
n 1
x
dx 1
0 eax 1 a n (n) (n) .
Using this form, it is easy to derive the expressions

xne x
0 e x 1 2 dx (n 1) (n)

x n 1e x 1 e x

x n dx

dx ( n 2) (n)

(n 1) (n) (n 1) .

Differentiating with respect to n, we can generate logarithms like before:


n 1
x
ln x
0 e x 1 dx (n) (n) (n) (n)
n
x ln x
0 e x 1 2 dx (n 1) (n) (n 1) (n 1) (n) (n 1) .

It takes Mathematica a surprisingly long time to determine integrals like these, even the easier ones
that do not involve logarithms (you can beat it if you try!). It even refuses to do integrals that
involve rational powers of x, like
x x dx
5 3
5
0 e x 1 2 2 2 2 ,

even though it is certainly able to understand the functions that this integral is given in terms of (it will
give you any number of decimal places for the result if asked).
Even more surprisingly, Mathematica returns (or used to return) incorrect results for the
integrals involving logarithms. It thinks for quite a long time, then declares that the answer is 0. I am
not sure why it does this, as it is certainly able to manipulate derivatives of the zeta function and it is
also able to numerically determine the correct result (affirming our formulas above), but this again
goes to show that there is no substitute for human understanding of what a computer system is being
asked to do. Even if these errors in its coding are fixed (it seems some of them have been; recent
calculations of the same integrals now give correct results), there will definitely be others. This is not
a rebuke of Mathematica itself, as I, personally, like the system a great deal. Rather, it serves as an
example of why we need to be able to understand things ourselves. We can depend on technology to
do our dirty work, as long as we are certain that it will do a good job. In order to make this
103

Advanced Mathematical Techniques

distinction, we need to understand how the system is performing the task we are giving it. At least,
someone does.
A closely related integral that also finds a lot of use in physics, especially in the treatment of
electron gases and the analysis of neutrinos, is
n 1

x
dx
(1) k 1 2n 1 1

(
n
)
(
n
)

(
n
)
n 1 (n) (n) .

0 e x 1
kn
2
k 1
This integral gives rise to the results
dx
0 e x 1 ln 2
x dx
2
0 e x 1 12
2
x dx
3
0 e x 1 2 (3) .
It can also be manipulated in much the same manner as the previous integral with the minus sign in the
denominator.

Exercises for Section III.4


In problems 1 18, determine the value of the given integral. Express your answer in terms of and
other mathematical constants if possible.
x
dx
3x
e 1

2.

x3
dx
3 1

5.

8.

1.

4.

7.

2x

10.

13.

16.

104

x 3e 2 x

2x

dx

x3
dx
e4 x 1

3.

x2 x
dx
e2 x 1

6.

e3 x 1

x 2 ln x
dx
e2 x 1

x ln x
dx
e3 x 1

11.

x
dx
e 1

14.

17.

2x

x 2 e3 x

3x

dx

x2
2x

9.

12.

dx

15.

18.

x2
dx
e 1
3x

1
x

dx

x2

x 2 ln x

x2
dx
2x 1

dx

dx

x 3 ln x

xe x

dx

dx

dx

Section III.4: The Riemann Zeta Function

In problems 19 22, determine the desired value of the Riemann zeta function or its derivative. Give
your answers either in terms of and the zeta function or as a rational number.
19. ( 5)

20. ( 7)

21. ( 4)

22. ( 6)

23. Express the derivative of the Dirichlet eta function and the Dirichlet lambda function in terms of
the zeta function and its derivative.
24. Write the Riemann zeta function in terms of the Dirichlet eta function, and determine all of the
places in the complex plane where the eta function must take the value zero in order for the
Riemann zeta function to be entire except for its single pole at s 1 . Give exact values for all
of these zeros. What happens to the eta function at the zeros of the zeta function?

Section III.5: Regularization of Integrals


As a final topic in this chapter, lets consider some more integrals that will be ill-defined if we
separate them. The integral

1
e x
0 e x 1 x dx ,
for example, cannot be computed directly as the two integrals do not separately converge. The
difference, on the other hand, converges quite well. In order to regularize this integral, we need to
introduce a factor that diverts the divergence of both terms in the same way. Otherwise, the difference
will not be well-defined. The easiest way to do this is to introduce the factor x in both terms. This
allows us to write

x dx

1
e x
1
x

dx
lim
(1 ) (1 ) ( )

x
0 e 1 x 0 0 e x 1 0 x e dx lim
0
.
1

lim (1 ) (1 )
0

The limit of the first term is clearly 1, but in order to determine the limit of the expression in brackets,
we need to find out a bit more about the zeta function. What we really need is a sort of expansion for
the zeta function around s = 1. The zeta function is divergent at s = 1, so we need to approach this
carefully.
The standard way to do this involves cutting off the zeta function at a large, but finite, value
of N:
N
N
1
1 (1) k k ln k n (1) k k N ln k n
N (1 ) 1

k!
k ! n 1 n
n 1 n
n 1 n k 0
k 0
None of the series on the right converge as N tends to infinity, but we can use the techniques outlined
in chapter 1 to extract the divergence:
N
ln k n N ln k n ln k 1 N ln k 1 N
.

n
k 1 k 1
n 1
n 1 n
105

Advanced Mathematical Techniques

The quantity in parentheses converges as N tends to infinity, so the divergence is localized to the term
outside the parentheses. Re-summing the divergent parts gives us

(1) k k N ln k n ln k 1 N (1) k k ln k 1 N

N (1 )

k ! n 1 n
k 1 k 0
(k 1)!
k 0
.
k k
k
k 1

N
(1) ln n ln N N 1

k ! n 1 n
k 1
k 0
The limit can now safely be taken as N tends to infinity, yielding the desired expansion:
N ln n ln 2 N
1
(1 ) lim

.
N
2

n 1 n
The integral is clearly given by

1
e x

0 e x 1 x dx .
A related integral that really illustrates how expansions can be used to solve these problems is
2

1
e x

0 e x 1 x dx .
Mathematica wont do this one (it encounters a divergence and refuses to evaluate, but stops short of
saying that the integral itself does not converge), but we can. Regularizing as before and squaring the
expression out, we have

2
2 1
x
x

1
e x 2
dx
dx

2 x x
x 2 2 e2 x dx
x
x dx 0 x
2
0
0
e
1
x

e e 1

e 1

x 2 dx

2 x
0

2 1

1
x
2 2 2 x
e dx
e x 1 e dx 0 x

Using our integration formulae above, we are interested in determining the limit of
(1 2 ) (2 ) (1 2 ) 2(2 ) (2 ) 2(2 ) 21 2 (1 2 )
as 0 . Dividing out a factor of (1 2 ) , being careful to include the corrections for the last three
terms, gives

1
1
2 2
(1 2 ) (2 ) (1 2 ) (2 )

(1 2 )

1
1 2 22
(1 2 ) (2 ) (1 2 ) (2 )

(1 2 )

This expression definitely looks complicated, but it is not as complicated as it might seem. Remember
that we really only care what happens when is small. The factor out front goes to 1 as 0 , so
can be taken as zero anywhere inside the brackets where this does not lead to a divergence.26
Expanding the power of 2, substituting the expansion for (1 2 ) , and remembering that
(0) 1 2 brings us to the expression
1 1
1

(2 ) 2 ln 2 2 .
2 2

This would not have been the case if the factor out front itself diverged as 0 . This is the reason we chose to bring
(1 2 ) out front instead of (2 ) .

26

106

Section III.5: Regularization of Integrals

Notice that I have not substituted (0) 1 2 into the term with the 1 because cannot be safely
taken to zero in this term. The 1 cancels in the last term once we expand the power of 2, so we can
take 0 there after this cancellation. It is clear that the remaining 1 s will cancel, but a term
proportional to in (2 ) will make a finite contribution to the expression in the limit as 0 . In
order to go further, we need an expression for (2 ) that includes that contribution.
The easiest way to determine an expansion for (2 ) comes to us from the reflection identity:

(2 ) 22 2 1 sin (1 2 ) (1 2 ) .
When trying to extract an expansion from an expression like this, it is important to keep in mind what
order we are looking for in the expansion. The sine function will give an overall factor of , but this
factor will cancel with the 1 in the expansion of (1 2 ) . In order to ensure that we get the whole
coefficient of , we must keep terms up to order 2 in the factor multiplying (1 2 ) . Anything
associated with 3 or higher can safely be dropped. This leads us to
1

(2 ) 1 2 ln(2 ) 1 2
2

.
1
1
1 2 ln(2 ) 1 2 1 2 ln(2 )
2
2
Work through this calculation to make sure that you understand how it goes. As soon as we move the
inside to cancel the dangerous 1 , we can safely keep only terms of order . Note that the
contribution to (1 2 ) of order (which is not zero) cannot contribute to this expansion because it
is not multiplied by the 1 . Substituting this expansion into our earlier result, we finally obtain
2

1
e x

0 e x 1 x dx ln 8 e 5 2 .
This agrees with the numerical result given by Mathematica, which is a useful check of our algebra
even though the program refuses to give the exact result.
It is interesting to see that this calculation requires us to go deeply into the analytic
continuation of the zeta function. Splitting the above integrals up is really only possible when 1 2
since this is required to make each of the integrals separately converge. Nevertheless, the sum gives
the correct result even when is taken to zero. This is a consequence of the behavior of analytic
continuations. Note that the value of (0) as well as (0) are critical to the above analysis. Even
though 0 is not in the accepted region of definition of this function, the behavior of the analytic
continuation still impacts the value of well-defined integrals. We can go even further into the analytic
continuation of functions, considering, for example,
2
1
1
2
0 sin 2 x x 2 dx .
The evaluation of this integral is a bit tricky, and really requires one to trust the power of analytic
continuation. Once we trust this process, the result is very easily obtained. We begin by writing

1 2 2 1 2

x 1
.

sin
x
x
dx

0
1 0
2 2
These integrals can only be separated if 1 . In that case, the lower limit of the last term vanishes
and we have
2

107

Advanced Mathematical Techniques

1 2 2 1 2

sin 2 x x 2 dx

.
2 2
1
This expression has a finite limit as 0 , which gives us the result. Note that the first term does not
contribute, as its denominator becomes unbounded as 0 . One may argue that this limit process is
improper since we have already assumed that 1 in obtaining this expression. However, the limit of
the left-hand side certainly does exist as 0 . Since this expression is equal to the right-hand side
whenever 1 , the ideas of analytic continuation allow us to conclude that there will be no difficulty
in taking its limit as 0 . The fact that our original expression is well-defined is crucial to the
success of this technique. If we were, for example, to try to use this technique to evaluate
2
1
1
0 sin 4 x x 4 dx ,
we would definitely arrive at an incorrect result as this integral does not converge. We may arrive at a
finite answer through this analysis, but it would not represent this integral. The integral
2
1
1
1
0 sin 3 x x3 2 x dx ,
on the other hand, does converge. Its value should be given by the limit of
0

1 2 1 2

2 1 2 2
2
2
as 0 . We have already done away with the lower limit contribution from the integrals,
operating under the assumption that 2 . The first and last terms are divergent as 0 , so we
have to be careful with these. The middle term, on the other hand, presents no problems. The value is
determined as

1 2 2 1 2 1 2 2 2 2 1 2 2 2 1 2 2
2
1 2 2
2
1
2
1 2
1 2

1 2 1 2
2
1
1

ln 2 ln

2 1 2 2 2 2
2
4

Unfortunately, this result is incorrect. The correct value is larger by exactly 1/6,
2
1
1
1
2
1
1
0 sin 3 x x3 2 x dx 2 ln 2 2 ln 12 .
The reason for this gaffe is that, while the difference between our regularized integrand and the given
integrand vanishes as 0 ,
1
1
1 1
1
1
lim 3 3 3 3 1 0 ,
0
2 x
sin x x 2 x sin x x
the integrated difference does not vanish:
2
1
1
1 1
1
1
1
lim 3 3 3 3 1 dx 0 .
0 0
x
x
x
x
x
x
sin
2
sin
2
6

This unfortunate state of affairs is rooted in the fact that the terms in our expression are not regularized
in the same manner. Multiplying the first term by sin x is not the same as multiplying the last two
terms by x :

x2 x4

x3 x5
x2
x4
x 1

.
sin x x x 1
3!
5!
6
120
6
180

108

Section III.5: Regularization of Integrals

The difference between multiplying by sin x and multiplying by x is


x2

x4

.
sin x x x
6
180

Since this contribution is divided by sin 3 x x 3 , the integrated difference is


2 3

sin x x
x2
x4
2
2
dx
x

dx

0 sin 3 x
0
6
180
6
180 2

The problem is clearly the cancellation of in the first term. Any power of x larger than 2 will not
contribute to the integrated difference as 0 , but the x 2 contribution will. The reason why this
didnt matter before is that the difference is only of order x 2 , so cannot contribute as 0 unless it
is divided by a power of x larger than or equal to 3.27
To do this integral properly, we multiply all of the terms in the integrand by the same
regularization factor:

2
2 x
1
1 1
x x
3 dx .

3
0 sin 3 x x3 2 x dx lim

0 0
sin x x 2 x
In order to avoid having to integrate the first term, which we do not have an explicit formula for, we
simply add and subtract:

2 x sin x
sin x x x
1 2
1
1 2
1
1

3 dx 2 ln 2 ln 2 ln 2 ln .
lim
3
3
0 0
sin x x 2 x
6
2
4
2
12
sin x
The purpose of going through this example is to illustrate what can happen if we are not careful. There
are many, many different glib approaches that do lead to correct results, but we should be mindful of
the proper way to do things at the very least in order to track down difficulties should they arise. Note
that all of the work we went through was not useless. We just needed to couch the argument correctly
in order to determine the proper correction.
The expansion techniques in this set of notes are extremely useful in a variety of situations.
These situations do not always involve the determination of integrals, but that playing field is a useful
one on which to practice the techniques, and it leads to some very interesting results. There are
obviously many other examples that these techniques can be used on. Some will appear in the
exercises, but you can always make up your own if you like. Check your answers on Mathematica.
Even if it wont do the integral exactly, it will almost always give an accurate numerical estimate.

27

This is also the reason why our replacement of the denominator sin x with x doesnt affect the result; the difference is
of high enough order to make this replacement irrelevant.

109

Advanced Mathematical Techniques

Exercises for Section III.5:


In problems 1 14, determine the value of the given integral.

1
e x

2x
dx
e 1 2x

7.

10.

1.

4.

2.

1
2 x

x
dx
2 1 x ln 2

5.

1
e x

2x
dx
e 1 2x

8.

csc 2 x dx
2x

11.

13.

e x
1

dx
e x 12 x 2

1
3 x

x
dx
3 1 x ln 3

6.

9.

1
e 3 x

2x
dx
e 1 2x

1 1
3
cot x 3 dx 12.
x
x

14.

1
e 2 x

2x
dx
e 1 2x

1
e x

2 x
e 1 2x

3.

1
3 x

x
dx
3 1 x ln 3

2 cot 2x dx
x

dx

ln x

cot x ln sin x
dx
x

e x e x
1

dx
e x 12 x 2
x

a b c
5
csc x 5 3 dx , for real constants a, b, and c.
x
x
x

(a) This integral converges for a specific set of numbers a, b, and c. What are these numbers?
Can this integral converge for any other choice? Why or why not? Why are the even
powers absent in this expansion?

15. Consider the integral

(b) Determine the value of this integral, using the constants you found in part (a).

110

Section III.6: Summary of Chapter III

Section III.6: Summary of Chapter III


This chapter introduces a function that finds immense use in many fields of mathematics,
probability theory, physics, and several other disciplines. The Gamma function is introduced to
students in middle school as the factorial function, and is kept in a corner for a long while until
students are able to grasp its other properties. Its general treatment is very subtle, so the full nature of
the Gamma function is often shielded from view by instructors interested in getting the basic points
across. Despite this shielding, the Gamma function plays a major role in many of the standard results
of probability, where it hides inside the factorial function, statistics, where it is the essence of the
normal distribution, the Poisson distribution, and the binomial distribution, in mathematical physics,
where its integral representation is extraordinarily useful, and in number theory, where its information
about the Riemann zeta function is invaluable. We will use it in chapter 10 to derive one of the most
important aspects of statistical mechanics the Boltzmann factor. It is always present, though very
rarely acknowledged, and opens doors that are otherwise inaccessible to us.
The integrals accessible to us via the Gamma function are innumerable, and its properties allow
us to explore many relationships between these integrals. The integral representation of the Gamma
function itself has many uses, the normal and Poisson distributions of statistics being two prominent
examples, but we can also obtain other results by manipulating this integral representation. The beta
function consists of a combination of three Gamma functions, and has forms suitable for the
computation of integrals of products from 0 to 1, rational functions from 0 to infinity, and
trigonometric functions from 0 to 2 . These manipulations give rise to results running the spectrum
from the binomial distribution of statistics to the volume of a hypersphere in abstract geometry. The
applications of the latter may be hard to see concretely, but their use has won at least one Nobel prize
in physics. The beta function, in its several forms, represent one example of the uses of the Gamma
function, but there are others. Many mathematicians have devoted their lives to a study of these
relationships, and many important results have come from this devotion.
The Gamma function plays a key role in the theory of special functions, as it is the only
standard special function that does not fit into the mold set by the others. All of the other traditional
special functions, like the Bessel functions, the Legendre functions, the Hermite functions, and even
the Mathieu functions, satisfy a linear differential equation with coefficients that are elementary
functions involving ratios of polynomials, exponentials, and/or trigonometric functions. The Gamma
function stands alone in that it cannot satisfy any linear differential equation of any order with such
coefficients. This property disallows the hypergeometric treatment seen in sections 9.1, 9.3, and 13.5,
and even the Meijer G treatment seen in section 4.2,28 and is part of the reason why it opens so many
doors that are inaccessible to us through any other means. Its direct relationship to the EulerMascheroni constant and all of the natural values of the Riemann zeta function is unique to special
functions, and allows us to determine general results we could not determine in any other way.
The treatment of the Gamma function given in this chapter begins with the three basic
definitions of the Gamma function the limit, infinite product, and integral representations. These
representations are then used to determine a variety of different integrals, employing the
differentiation by integration technique introduced in section 1.6 as well as expanding the use of
power series to determine integrals established in the same section. The beta function is then defined
and manipulated in the same manner. The first four sections of the chapter can essentially be thought
of as an exercise in using expansions to determine the value of integrals, with the added twist that the
coefficients of these expansions are not rational numbers. The Riemann zeta function, in conjunction
with the Gamma function, allows us to determine the value of many integrals that are strikingly
different from those obtained from the Gamma function itself. These integrals play a major role in
28

This is despite the fact that the Meijer G-function is defined in terms of the Gamma function.

111

Advanced Mathematical Techniques

numerical analysis and a significant role in statistical thermodynamics. They can be manipulated in
the same manner as those associated with the Gamma and beta functions, but have a somewhat
different character. The Riemann zeta functions primary importance lies in its relationship with the
prime numbers, a topic that we will discuss in far more detail in chapter 6. The basics are laid out in
section 3.4, where we see that the Riemann zeta function can be expressed as a product over only
prime numbers. This will eventually lead, with a LOT of foresight and deft handling of integrals, to
Riemanns main formula for the number of prime numbers lying below a given natural number. This
treatment is found in some detail in chapter 6, once we have the results of complex analysis under our
belts.
The last section of this chapter considers integrals that do not allow us to use the techniques of
the rest of the chapter directly. Doing so would lead to divergent expressions, even though the
ultimate result is perfectly finite. The reason behind this is associated with separating two integrals
that separately diverge, but whose difference converges. This is also the case with integrals expressed
as the derivative of another integral with respect to an external parameter: the divergence in the
primary integral is independent of the external parameter, so cannot contribute to the derivative. Such
integrals are treated by introducing a regulator to quantitatively express the manner in which the
integrals diverge, so that the cancellation can be made explicit and the remaining contributions that do
not cancel can be extracted. This process is usually quite straightforward, but we must make sure to
regularize both contributions in the same manner in order to obtain meaningful results.
The Gamma function is extremely useful in a broad variety of applications. We will find many
uses for it in later chapters of this book. Its introduction early-on also allows us to illustrate how
analytic function theory can be used to treat new functions that we initially know very little about.
Starting from the definition of the Gamma function, we were able to derive expressions for its infinite
product representation, integral representation, and series representation essentially by sheer force of
will. Very similar treatments can be applied to many other special functions, and the character of each
of these expressions for a given function brings different information to the table. This allows us to
form a more complete understanding of the behavior of the function, making it less and less
unknown.
If a special function satisfies a differential equation, then this differential equation can also be
used to establish many of its properties. This is exploited in sections 13.4 6 for the Bessel functions,
hypergeometric functions, and many types of orthogonal polynomials. The treatment of the Gamma
function is limited by its original definition, as it cannot satisfy any differential equation of finite order
with rational coefficients. For this reason, the treatment of other special functions often appears
somewhat different from that given here for the Gamma function. Despite this limitation, the
treatment of the Gamma function found in this chapter is essential to the treatment of many other
special functions; these functions are often defined in terms of the Gamma function, so its properties
are inherited by these new functions.

112

Section IV.1: General Results

Chapter IV
Asymptotic Expansions
In all of the preceding chapters, we have been very concerned with the convergence of an
expansion. The series

is meaningless when x 1 . It can be assigned a value through the

k 0

process of analytic continuation, leading to a result like

1
1 ,

1
2
k 0
but this is not really useful unless we already know the analytic continuation (it would not be helpful to
actually try summing the terms in this expansion, for example, as that would certainly not give -1).
The symbol is to be read as is taken to mean rather than equals or approximately equals, as
neither of the latter interpretations make any sense at all. For a series like that defining the Riemann
zeta function, the analytic continuation is highly nontrivial because the function itself is not
elementary. We have no way of writing it in terms of the functions we are used to (trigonometric,
exponential, polynomial and combinations thereof), so cannot easily calculate its value. In order to
have a means of computing the value of such a function, we require a convergent expansion.

1 2 2 2 23

Section IV.1: General Results


There are, of course, many functions that cannot be expressed in terms of elementary functions.
In applications, these functions often appear as integrals that depend on one or more physical
parameters. In order to determine the value of these functions, we require a way in which they can be
systematically approximated for the values of interest in the application. One fairly simple example of
such a function is

f ( x) e xu

du .

This function appears, in one form or another, in many applications. It is directly related to the error
function
2 x u2
erf ( x)
e du ,

and its complement,


2 u 2
erfc( x)
e du ,

of statistics, and it will become apparent from our discussion of f (x) how these functions are calculated
to give the numbers on those tables statisticians hand out in class. Several facts about the function f (x)
are immediately obvious from its definition. First, the value f (0) clearly equals 1 and f (x) approaches
zero as x grows without bound. Second, f (x) is clearly defined for all positive values of x and
undefined for all negative values. Third, f (x) decreases monotonically as x increases. This is obvious
from the derivative,

f ( x) u 2e xu
0

du ; x 0 ,

which is clearly always negative. Therefore, f (x) represents a function that monotonically decreases
from 1 at x = 0 to 0 for unbounded x.
113

Advanced Mathematical Techniques

How can we determine the value of f (x) for a specific value of x? The answer to this question
depends on the value of x we are interested in. We will use a different technique to approximate f (x)
when x is very large than that used when x is very small. For large x, we write

f ( x) e xu

du e xu 1 2 x
0

1 4 x

du

.
2
1 1 4x
e
e d
1 4x
x
2x
So far, we have not made any assumptions at all on the size of x; this expression works perfectly well
for all positive values of x, and indicates the relationship between f (x) and the complement of the error
function in statistics:
1
1 1 4x
f ( x)
e erfc
.
2 x
4x
This expression, while correct, is not useful for small values of x because the factor in front quickly
becomes unbounded for small x; for x = 0.01, for example, we would have
f (0.01) 5 e25erfc 5 .

e1 4 x 1 e xv dv
2

Even a small error in the computation of erfc(5) would be greatly magnified when it is multiplied by
e 25 7.2 1010 . This only gets worse as x is made smaller. For large x, on the other hand, the
argument of the exponential is small and does not pose a problem. The lower limit on the integral is
close to zero when x is large; since we know the exact result of the integral when its lower limit equals
zero, thanks to Gauss, we write

1 4x
1 4x
2
2
1 1 4 x 2
1 1 4x
f ( x)
e e d
e d
e
e d .

0
0
0
x
x

The last integral can be evaluated by expansion, giving us

(1) n
1
f ( x) e1 4 x

2 n 1 n 1 .
2 x n 0 (2n 1) n ! 2 x
This expansion is entire, so it converges whenever 1 x , or x 0 . When x is larger than about
1/4, the expansion converges fairly quickly and it represents a useful way to approximate f (x). For
smaller x, however, we require many terms to accurately approximate the function. Even worse, small
values of x represent an extremely delicate balancing act for the series. When x = 0.1, for example,
keeping only the first two terms in the series gives
f (0.1) 23.9893 .
Remember that the exact value of f (0.1) 1 , so the remaining terms need to cancel out this immense
contribution; the only substantive contribution that these early terms give to the final result is
contained in the trailing decimal places, exactly where round-off errors cause the most trouble. This
problem is made even worse when x is smaller, as the same approximation gives
f (0.01) 2.704 1013 .
This expansion is clearly not acceptable, despite the fact that it does converge. We would be better off
using the tangent line approximation afforded us by the first derivative at zero.
This idea is a good one why dont we use the tangent line approximation? This
approximation gives
f ( x ) f (0) f (0) x 1 2 x .
This yields f (0.1) 0.8 , when the exact value is f (0.1) 0.8653925865... . For smaller x, the
approximation is much better: 0.9810943073 f (0.01) 0.98 . This approximation is certainly
114

Section IV.1: General Results

simple, and it is fairly accurate for small enough values of x, but in order to make it better we need to
have correction terms that we can add if we require more accuracy. The standard method used to do
this is to employ a Taylor expansion. Herein lies the rub: a Taylor expansion centered at x = 0 for our
function cannot converge with any finite radius of convergence because f (x) is undefined for all
negative values of x. Remember that series expansions cannot converge on only one side of the center;
they must converge in a disk or not at all. Any efforts to find a convergent series expansion for f (x)
centered at x = 0 are doomed to failure before they even begin.
Although this last statement is definitely true, it does not invalidate our results from the tangent
line approximation. We may not be able to find a series that converges to f (x), but we have already
found a truncated expansion that approximates f (x) quite well for small x. To see how we can
improve this result, lets try to find a series expansion for f (x) about x = 0. We know that this effort
will ultimately fail, but it is instructive to go through the process and see what works and what doesnt.
We can certainly replace the exponential in our integral definition for f (x) by its expansion:
n n

( 1) x
2
f ( x) e xu u du
u 2 n e u du .
0
0
n!
n 0
This expansion is entire, so converges for all values of x and u for which xu 2 . This is even true
for negative values of x, but the integral itself will not converge for these values. Unfortunately, this
restriction is not satisfied over the range of integration because u is unbounded over this range.
Ignoring this for the moment, we interchange the sum and the integral and obtain

(2n)! n
f ( x) (1) n
x .
n!
n0
The symbol is read as, has the expansion; it does not in any way imply equals. Our function
does not actually equal this expansion for any nonzero value of x because, as already stated, the
expansion can only converge at x 0 . This can easily be verified by the ratio test. It was this
exchange of the sum and the integral that caused us to lose equality, as the expansion itself does not
converge at infinity. Before we throw this expression away as garbage, though, lets look at the effect
of considering only a finite number of terms. Keeping terms up to order x 3 gives
f ( x) 1 2 x 12 x 2 120 x3 .
When x = 0.1, we obtain f (0.1) 0.8 . This is the same approximation as before, but with more
work. When x = 0.01, on the other hand, this expression gives f (0.01) 0.98108 , a much better
approximation to the exact result than we had before. Adding the next term, 1680x 4 , improves the
approximation to f (0.01) 0.9810968 . This is correct almost to 5 digits, much better than the
tangent line alone. The next term changes the estimate to f (0.01) 0.9810937760 , even closer to the
exact result, and adding another term gives the even better result f (0.01) 0.9810944413 . The
inclusion of another term brings us 7 digits of accuracy, f (0.01) 0.9810942683 . It is clear that
something about this expansion works, even though the full series does not actually converge. Adding
enough terms of the expansion will eventually cause our estimate to exceed any number, as evidenced
by the result of keeping 90 terms, f (0.01) 2.9467 1010 . However, that does not mean that the
expansion is entirely useless.
This expansion is very strange. It gives many digits of accuracy for f (0.01) when 10 terms are
kept, but fails miserably when 60 terms are kept. Exactly the same expansion that gives so much
accuracy for x = 0.01 fails miserably when x = 0.1. Ten terms give f (0.01) 0.9810943025558... ,
about 8 digits of accuracy, but the same ten terms give f (0.1) 12.853 , absolutely no accuracy at
all. Such an expansion is called a semi-convergent, or asymptotic expansion. These expansions are
strikingly different from convergent expansions in that the accuracy they contain depends on the
115

Advanced Mathematical Techniques

number of terms kept in the expansion as well as the value of x at which the expansion is evaluated.
We can get an idea of how these expansions work by investigating the difference between our exact
function f (x) and its approximation containing N terms:29
N 1

(2n)! n
x nu 2 n u
f ( x) f N ( x) f ( x) (1) n
x (1) n
e du .
0
n!
n!
n0
n N
Now, the sum in the integrand is a convergent alternating series. As such, its absolute value is less
than the absolute value of its first term:
N

x
x nu 2 n u
(2 N )! N
e du
u 2 N e u du
x .

0
0
n!
N!
N!
n N
For a fixed value of N, this error can be made as small as desired by decreasing the value of x.
Conversely, for fixed x, this error can be made as large as desired by increasing the value of N. This
summarizes the general character of asymptotic expansions: given a finite number of terms of such an
expansion, the function we are trying to obtain can be approximated as accurately as one desires by
changing the value at which we are trying to obtain the approximation. I must note here that the
analysis above is not actually valid as stated. Recall that the simple way to estimate the error in an
alternating series, used to obtain our bound above, is actually only valid when the terms of the
sequence are monotonically decreasing in absolute value. The ratio of neighboring terms in our
expansion, xu 2 (n 1) , satisfies this requirement only when xu 2 N 1 . This is obviously not true
for all of the values of u in our integration region, regardless of the value of x 0 , as this region
involves unbounded values of u. It does, however, arrive at the correct result in a simple way. I do not
know of an easy way to come to this result honestly in this specific situation, which is the only thing I
dont like about the example of f (x), but this is a minor technical point in the analysis of this specific
function. We will have much stronger results for the functions considered below, and these will imply
(indirectly) that the final result of the above analysis is correct even though the argument leading up to
it is flawed.
Our current standing on the function f (x) is illustrated in figure 1. The exact values are given
by the solid line, the large-x result (keeping three terms in the series) is given by the long dashes, and
the two asymptotic series results associated with keeping 2 and 4 terms are given by the shorter and
shortest dashes, respectively. It is clear that each of the associated expansions accurately gives the
value of the function in the region in which it is supposed to. It is also clear that the asymptotic series
with more terms stays closer to the function when x is small, but deviates by a larger amount than that
with less terms as x is increased. This is made clearer in figure 2, which shows the same graph over a
smaller interval of x. The long dashes in this plot show the result of keeping two terms in the
asymptotic series, the shorter dashes keep four terms, and the shortest dashes keep six. Again, we see
that keeping more terms brings the result closer to the exact value at the expense of deviating more
wildly for larger values of x. One may wonder why we would resort to such expansions when the
exact values are available. The reason for this is that the values we are calling exact are not actually
exact. They result from numerically integrating the definition of the function. As such, they take
much longer for a computer system to evaluate than the associated series. The exact plot in figure 1,
for example, took Mathematica 9.047 seconds to produce while the most complicated of the associated
expansions took only 0.016 seconds. This timing issue is certainly important, as it may result in the
difference between days and minutes for a more complicated function, but even more important is the
fact that these numerical integration techniques have their own difficulties. A plot of the exact graph
of f (x) from x 0 to x 0.1 took Mathematica 40.5 seconds, and was given as illustrated in figure .30

f ( x) f N ( x)

29

(1)

The exchange between the integral and the infinite sum is not a problem here, as the sum to be interchanged is finite.
The exact result plotted in figure 2 was generated by using the expression above for f (x) in terms of the complement of
the error function this is a built-in function for Mathematica, so it has highly accurate ways with which to calculate the

30

116

Section IV.1: General Results

Note the instabilities associated with very small values of x. These instabilities represent the fact that
small values of x require the consideration of very large values of u before the exponential eventually
dominates the expression. Conveniently, these small values of x are exactly the ones that the
asymptotic expansion works well for. This integral is actually a very simple example; it is far more
complicated to get an accurate result for the integral of a function that oscillates wildly like sin x 3 .
Such functions occur with surprising regularity in many applications, so it is important to have an
accurate analytic means with which to determine the actual result. These situations are shining
examples of the need for asymptotic expansions in physical applications.
1.0

1.00

0.8

0.95

1.00

0.6

0.90

0.95

0.4

0.85

0.2

0.80
0.2

0.4

0.6

0.8

1.0

0.00

1.05

0.90
0.85
0.80
0.02

Figure 1

0.04

0.06

0.08

0.02

0.10

Figure 2

0.04

0.06

0.08

0.10

Figure 3

Before moving on to another example of an asymptotic expansion, it is instructive to view the


progress of convergence to an exact value of f (x). Figure 4 shows the approximations to f (0.03)
versus the number of terms kept in the expansion (minus 1 n starts at 0). Note that the
approximations get closer to the exact value, given by the horizontal line, until the number of terms
kept in the expansion exceeds about 9. After this, adding more terms actually reduces the accuracy of
the approximation. This is indicative of the way that asymptotic expansions always work. The
number of terms that should be included to achieve optimal accuracy is determined by seeing where
the next contribution is larger than the last. In this case, that occurs when
(2 N 2)(2 N 1) 100
,

N 1
3
or at N = 7.83. Obviously, N increases as x decreases in this case. If x > 1/6, then the optimal number
of terms to keep is 1. Thats why the asymptotic expansion is not useful for large x in this case.
0.950
0.949
0.948
0.947
0.946
0

10

15

20

Figure 4
Although our example of f (x) has an expansion that is asymptotic around x 0 , most
asymptotic expansions used in mathematics and science work best when x is very large (this is why
they are called asymptotic expansions). For this reason, they often look like Taylor expansions in
which x has been replaced with 1 x . As an important example, consider the complement to the error
function,
2 u 2
erfc( x)
e du .

value of this function for essentially any argument this is yet another reason why it is useful to define and study special
functions.

117

Advanced Mathematical Techniques

This analysis will also indicate the techniques required to make the above treatment of the error
associated with the asymptotic expansions of f ( x ) honest, but without as much work. The
asymptotic expansion of this function is not easily obtained in its present form, but the substitution
u 2 brings us to exactly the right place:
erfc( x)

d .

This is the starting point of many lucrative asymptotic expansions. We have an integrand that is the
product of a function we can integrate repeatedly, e , and a function that we can differentiate
repeatedly, 1 . This form essentially begs us to use integration by parts. Doing so repeatedly
gives us the expression
2
2
2
x2

1 e x e x 3e x
e
N N 1 2 ! e
N 1 N 1 2 !
d

erfc( x)
(
1)
3 2 5 ( 1)

2
1
1
2
N

x
2x
2 x
x
x 2 N 1

The first terms are the asymptotic expansion,


2

e x 1
1
3
35
N N 1 2 !
,
AN ( x )
3 2 5 3 7 ( 1)
2 N 1
x 2x 2 x 2 x
x

and the last is the error,

(1) N 1
e
EN ( x)
N 1 2 ! 2 N 11 2 d .

The factorials are introduced to simplify the notation; the numerator is just the product of odd whole
numbers up to 2N 1, and the denominator has as many 2s as factors in the numerator (the number 1
counts as a factor for this purpose). The development of this expression is strong. The equality really
means equals, so the error term is the exact difference between the complementary error function and
the expansion. We cannot do the integral associated with the error exactly (if we could, then we would
have no need of the expansion!), but we can easily bound its magnitude by shifting the integral in the
expression:

EN ( x)

N 1 2 !

x2

2 N 11 2

d
2

e x
e
e x
N 1 2 !
d

N 1 2 !

1
1
2
N
2 N 3
0
x
x 2 N 3
1 x 2

The last inequality follows from the realization that the denominator of the integrand is always greater
than or equal to 1 and replacing that denominator by 1. Thus, the error in our series is always less than
the first term omitted. Note that it certainly does grow without bound as N is increased at fixed x, but
2
it is also quite small when N is moderate and x has appreciable size. The factor e x adorns every term
in the expansion and the remainder, so we can view it as an overall constant for this purpose.
N 3 2
, so the expansion is optimized by
Increasing N by 1 multiplies the error term by the factor
x2
taking N x 2 3 2 . A graph of the complementary error function and some of the asymptotic results
2

is shown in figure 5. The factor e x has been taken out for clarity. The long dashes represent one
term in the expansion, the shorter dashes two terms, and the shortest dashes three terms. Note that all
three are pretty good over a large range of x, but keeping more terms does increase the accuracy of the
118

Section IV.1: General Results

approximation at large x. As always, this comes at the expense of less accuracy for smaller values of
x.
0.7
0.6
0.5
0.4
0.3
0.2
0.1
1.5

2.0

2.5

3.0

Figure 5
Many examples of asymptotic expansions in physical problems come in a form that allows us
to use this integration by parts technique directly, but the first step in essentially all physical
problems involves a realistic assessment of what values of x we are actually interested in and what
values of any other parameters in the problem are actually expected to contribute significantly for
values of x in this range. This type of analysis often allows us to quickly cut to the chase and use a
technique that will be useful for our purposes. Consider a situation in which we are given the function
ux
e
du
f ( x)
2
0
1 u 3
and asked to determine its behavior for large values of x. This problem does not admit a simple exact
solution; Mathematica informs us that it is given by the Meijer G special function, which has been
defined essentially for this purpose,31 but this is not especially useful if one is looking for a simple way
in which to express the behavior of the function at large values of x. It would also not be useful to
attempt a repeated integration by parts approach, as the function 1 u 3

is not a function that I, for

one, have any desire to repeatedly differentiate. As an alternative, we look at the function as written
and try to discern what happens when x is very large. Clearly, the exponent will quickly dominate the
integrand at large x so that the only values of u that contribute substantially to the result are very small
(of order 1 x ). This guides us to use the binomial expansion of the denominator:
2
1 u 3 1 2u 3 (2)(2 3) u 6 (2)(3!3)(4) u 9 1 2u 3 3u 6 4u 9 .
This expansion does not converge unless u < 1, but our assessment is that the only values of u that
matter are those that are very small so we dont care. Substituting this expression into the integral, we
obtain
ux
e
du 1 2 3! 3 6! 4 9!
4 7 10 .
f ( x)
2
0
1 u 3 x x x x
Our result is, of course, an asymptotic expansion because of our willful disregard of the radius of
convergence of the binomial expansion. However, it does do an excellent job of approximating this
function at large x; the approximations associated with the first one, two, and three terms in the
expansion are shown along with the exact result in figure 6, using the same dash convention as
before. The errors associated with these approximations are shown in figure 7. Note that the threeterm approximation takes its time, but definitely wins out in the end. This example is especially
31

The Meijer G function was defined in 1936 by the Dutch mathematician Cornelis Simon Meijer in an effort to combine all
of the important special functions into one definition One Function to Rule them All except the Gamma function he
is very special, and will NOT be tamed!

119

Advanced Mathematical Techniques

important because even though Mathematica is able to determine the exact value of this integral in
terms of the special Meijer G function, a built-in function, it still takes almost 5 seconds to plot from
x 5 to 10. The most complicated of our asymptotic approximations takes it less than a sixtieth of a
second for the same range. Five seconds may not seem like a long time, but it definitely will if you are
trying to animate a plot or ask a computer to numerically integrate an expression that involves this
function!
0.6

0.010

0.4

0.005

0.2

6
3

- 0.2

10

- 0.005

Figure 6

Figure 7

Exercises for Section IV.1:


In problems 1 12, determine an asymptotic expansion for the given function valid for large enough x.
Use your expansion to determine the number of terms that should be kept in the expansion to find the
optimal value of the function at the indicated point. Give this best possible value. Estimate the error in
your approximation, and compare your result to that obtained from a more sophisticated computer
algebra system.

f ( x) e x

3.

f ( x)

5.

f ( x) e2 x

7.

f ( x ) e3 x

9.

f ( x)

11. f ( x)

cos t
dt ; compute f (7) .
t2

120

et
dt ; compute f (5) .
t2

1.

e 2t
dt ; compute f (3) .
t2 t

2. f ( x) e x

t3 t

4. f ( x)

e 4t

cos 2t 3

8. f ( x)

e xt
dt ; compute f (10) .
2t 3 5

10. f ( x)

e xt

t2 1

12. f ( x)

t7

e xt
dt ; compute f (6) .
1 t3

dt ; compute f (3) .

t3 3 t

e 3 t
dt ; compute f (2) .
t4

dt ; compute f (5)

sin t
dt ; compute f (7)
t3 t

6. f ( x) e 4 x

et

dt ; compute f (2) .

dt ; compute f (10) .

e 3 xt
dt ; compute f (6) .
t4 1

Section IV.1: General Results

13. Consider the function


2

et
x t 3 dt .
(a) Determine the asymptotic expansion of this function.
f ( x) e x

(b) Suppose you are interested in computing the value of this function at x 10 . What is the
optimal number of terms to use? Estimate the minimum error attained by using this
expansion to calculate its value.
(c) For what range of values of x will the first four terms of the expansion be sufficient to
guarantee an error less than 0.0001?
(d) Using the first four terms in your expansion, determine the first four terms of an asymptotic
expansion of the related function
x f (t )
g ( x)
dt .
10
t
What is the maximum error in this expression for g ( x ) ?
(e) Use the expression found in part (d) to compute g (20) . What is the maximum error in your
approximation? Compare your value to that obtained in a sophisticated computer algebra
system. How does the time required for this exact approximation compare to that required
for your approximation?
14. Consider the function

f ( x)

e xt
2

dt .

(a) Determine the asymptotic expansion of this function.


(b) Suppose you are interested in computing the value of this function at x 10 . What is the
optimal number of terms to use? Estimate the minimum error attained by using this
expansion to calculate its value.
(c) For what range of values of x will the first four terms of the expansion be sufficient to
guarantee an error less than 0.0001?
(d) Using the first four terms in your expansion, determine the first four terms of an asymptotic
expansion of the related function
x f (t )
g ( x) 2 dt .
10
t t
What is the maximum error in this expression for g ( x ) ?
(e) Use the expression found in part (d) to compute g (20) . What is the maximum error in your
approximation? Compare your value to that obtained in a sophisticated computer algebra
system. How does the time required for this exact approximation compare to that required
for your approximation?

15. The function
$$f(x) = \int_0^{x} e^{-t^3}\,dt$$
cannot directly be treated with the techniques in this section, as it is not in the proper form. If we are interested in its value over all x rather than just large x, then an asymptotic approach is not appropriate anyway. Instead, we employ a sort of hybrid approach, using a series expansion for small x and an asymptotic one for large x.
(a) Determine the Maclaurin series for f(x). What is the radius of convergence of your expansion? What is the maximum error associated with truncating this expansion at the sixth term for x = 1?

(b) Determine an asymptotic expansion for this function by first writing it as
$$f(x) = f(\infty) - \int_x^{\infty} e^{-t^3}\,dt$$
and finding the value of f(∞). How large must x be in order to obtain an error smaller than 0.0001? How many terms are required to attain this accuracy for the smallest value of x at which it is possible?
(c) Can the Maclaurin expansion you found in part (a) accommodate all values of x smaller than the value you obtained in part (b)? How many terms are required to do this with an error smaller than 0.0001?
(d) Your analysis should indicate that a small number of terms in each expansion can be used to determine high-accuracy approximations to this function for either large x or small x, but there is a window between these two regions in which an uncomfortably large number of terms is required. Find this window, assuming you don't want to keep more than six terms in either expansion and require an error smaller than 0.0001. Values of the function in this window are usually found by taking some time to approximate several values to very high accuracy, then using interpolating polynomials to smear these values over the whole window. If this is done carefully, one can arrive at a result that gives eight or ten correct digits over the entire window.


Section IV.2: The Method of Steepest Descents


One of the most important methods of determining the approximate asymptotic behavior of a
function given by an integral is the so-called method of steepest descents. This method can easily be
used to determine a simple form for the behavior without having to make all of the error estimates
associated with the above results. It is not as rigorous as the above treatment, and will not allow us to
estimate the error in our final approximation, so this analysis is usually used to estimate the behavior
of the function rather than give an accurate numerical value for the function. Nevertheless, it is
extremely useful in many situations and very accurate in most cases for large enough values of x. If
we are only interested in the asymptotic behavior of a function, its behavior for unbounded values of x,
this method is exactly what one wants: simple results in almost all cases that can readily be interpreted
in the context of the physical situation one is interested in. The celebrated WKB approximation
(named for the German physicist Gregor Wentzel, the Dutch physicist Hans Kramers, and the French physicist Léon Nicolas Brillouin, who developed the technique shortly after the discovery of quantum
mechanics in 1926) of the probability of quantum mechanical tunneling is a stellar example of an
extremely lucrative use of this technique. More careful analyses of this situation would require an
entire chapter, but the steepest descents method can be elucidated in less than a page. If one is
interested in making accurate computations of the value of a function for x large, but not
asymptotically large, one usually begins with the steepest descent method to determine what is
expected of a more involved treatment, then proceeds with the more involved treatment to determine a
more accurate result.
Figure 8
One of the most common appearances of the method of steepest descents is its use to determine
the asymptotic behavior of the factorial function. We will begin with this treatment, as it is quite
simple and clearly illustrates how the technique is used. First, however, it is instructive to present an
even simpler argument that gets most of the answer with even less work. Consider the integral

$$\int_1^n \ln x\,dx = n\ln n - n + 1 .$$
This integral may be evaluated by parts, if one likes. Using the trapezoidal approximation, we see that
$$\int_1^n \ln x\,dx \approx \frac{\ln 1 + \ln 2}{2} + \frac{\ln 2 + \ln 3}{2} + \cdots + \frac{\ln(n-1) + \ln n}{2} = \sum_{k=1}^{n} \ln k - \frac{1}{2}\ln n = \ln n! - \frac{1}{2}\ln n .$$
A comparison of the approximation and the exact result is illustrated graphically in figure 8, where n is
taken to be 10. Note that the trapezoidal approximation is expected to be quite good when n is very
large. The difference between the curve and the straight line approximation is barely visible in the
diagram! This is because the logarithm function changes quite slowly, so can locally be approximated
quite well with straight lines. The logarithm is also always concave down, which implies that the
trapezoidal approximation always underestimates the exact result. This gives
$$\ln n! - \frac{1}{2}\ln n \le n\ln n - n + 1 ,$$
or
$$\ln n! \le n\ln n - n + \frac{1}{2}\ln n + 1 ,$$
or
$$n! \le n^n e^{-n}\, e\,\sqrt{n} .$$

This approximation of the factorial function was (possibly) first obtained by the French mathematician Abraham de Moivre in 1733. He gave the asymptotic formula
$$n! \approx c\, n^n e^{-n}\sqrt{n} ,$$
where c is some constant as yet to be determined. Our result indicates that c = e, but a better value of this constant was obtained (possibly) later by the Scottish mathematician James Stirling, who was able to show
$$\lim_{n\to\infty} \frac{n!}{n^n e^{-n}\sqrt{n}} = c = \sqrt{2\pi} .$$
He was also able to generate a full asymptotic expansion for the factorial function, something that de Moivre did not attempt. For this reason, the formula
$$n! \approx n^n e^{-n}\sqrt{2\pi n}$$
is known as Stirling's approximation. As an interesting historical note, there are two distinct schools of thought that you will find online about the factorial approximation business and how it relates to the mathematicians de Moivre and Stirling. The first, presented above, has de Moivre coming up with a formula first, then Stirling determining the value of the constant. Some historians use this idea to argue that Stirling should not be given sole credit for the expression. The second scenario, which seems to be more grounded in historical fact, holds that de Moivre argued in 1730 that the factorial should have the form given above on the basis of empirical numerical evidence. Stirling then came along in 1730, proved that this result was correct, and showed that the correct constant is given by c = √(2π). De Moivre then used one of Stirling's results to give a substantive proof of his claim, giving rise to the 1733 paper referenced above. From this perspective, Stirling clearly deserves full credit for the result. The interested reader is referred to a reprint of James Stirling's 1730 calculus textbook, annotated by Ian Tweddle, which also includes an exchange between the two mathematicians in one of the appendices.
Figure 9

Figure 10

Our simple technique was able to determine the correct form of the asymptotic behavior, but it
gives the wrong value of the constant. This means that its result works fine for the logarithm of the
factorial function, but will always be significantly different from the function itself, no matter how
large the value of n becomes. As an example, 5! = 120 is approximated as 128 by our formula and 118
by Stirling. Despite this fact, the growth of the factorial function at large n is so fast that the difference
is unclear on graphs of the function itself. The relative error, on the other hand, given by the
difference between the approximation and the factorial divided by the factorial, makes the difference
between the approximations obvious. Figure 9 illustrates this, with the long dashes representing the
relative error associated with our trapezoidal approximation and the short dashes representing that

associated with Stirling's result. The fractional error associated with Stirling's result clearly approaches zero as n tends to infinity, while the trapezoidal result tends to $e/\sqrt{2\pi} - 1 = 0.08443755\ldots$. This difference is not at all clear in the relative error associated with the logarithms of these expressions, illustrated in figure 10, because it represents only a constant which is then divided by the growing logarithm.
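A few lines of Python reproduce the comparison behind figures 9 and 10 numerically. The function names here are mine, and the quoted limiting value e/√(2π) − 1 is the one from the text.

```python
import math

def trapezoid_approx(n):
    """n! ~ n^n e^{-n} e sqrt(n), from the trapezoidal bound above."""
    return n**n * math.exp(-n) * math.e * math.sqrt(n)

def stirling_approx(n):
    """n! ~ n^n e^{-n} sqrt(2 pi n), Stirling's approximation."""
    return n**n * math.exp(-n) * math.sqrt(2 * math.pi * n)

for n in (5, 10, 50, 100):
    exact = math.factorial(n)
    rel_trap = trapezoid_approx(n) / exact - 1
    rel_stir = stirling_approx(n) / exact - 1
    print(n, f"{rel_trap:+.5f}", f"{rel_stir:+.5f}")
# The trapezoidal column tends to e/sqrt(2*pi) - 1 = 0.0844..., while Stirling's tends to 0.
```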
There are several ways to extract the correct asymptotic factor of √(2π). One way is to appeal to Legendre's duplication formula for very large n, but probably the most common is the method of steepest descents. Starting with the integral representation of the factorial function,32 we write
$$n! = \int_0^{\infty} x^n e^{-x}\,dx = \int_0^{\infty} e^{\,n\ln x - x}\,dx .$$
Now, if n is very large then small values of x in the integral are suppressed by the factor of x^n in the integrand. Regardless of the value of n, the largest values of x will always be suppressed by the exponential. Therefore, there is a certain region of x from which the majority of the integral's contributions come. Since the exponential function is always increasing, we can identify this region as that associated with the maximum value of the exponent n ln x − x. As with all continuous and differentiable functions, we locate this maximum by taking the derivative and setting it equal to zero. This gives x = n, so the majority of the integral's contribution comes from x in the vicinity of n. When x lies close to n, we can approximate the exponent using a Taylor expansion:
$$n\ln x - x = n\ln\bigl(n + (x-n)\bigr) - \bigl(n + (x-n)\bigr) = n\ln n - n + n\ln\!\left(1 + \frac{x-n}{n}\right) - (x-n)$$
$$= n\ln n - n - \frac{(x-n)^2}{2n} + \frac{(x-n)^3}{3n^2} - \cdots , \qquad |x-n| \ll n .$$
The reason we are employing this expansion is our belief that the major contributions to the integral come from x close to n; the major question at this point in our analysis is, "how close?" The difference between our maximum of n ln n − n and the value of the exponent at x is given by
$$n\ln x - x - (n\ln n - n) = n\ln\frac{x}{n} - x + n .$$
This expression is obviously negative for all x ≠ n (our value is the maximum!), and it represents the exponential suppression of values of x different from n. It also must decrease monotonically as x moves farther from n in either direction, as there would otherwise be another critical point of the exponent. Writing x = n + εn for some positive constant ε ≤ 1, we have
$$n\ln x - x - (n\ln n - n) = n\ln\frac{n+\varepsilon n}{n} - \varepsilon n = n\bigl[\ln(1+\varepsilon) - \varepsilon\bigr] .$$
The number in brackets is negative, so any value of x that deviates from n by an amount proportional to n is exponentially suppressed by a number proportional to n. Taking ε = 0.1, for example, leads to suppression of order e^{−0.005n}. As n increases without bound, this exponential suppression will effectively mute these contributions.
The standard practice in assessing the importance of contributions that are exponentially suppressed is to consider only those contributions where the suppression is of order 1. Any appearance of n in the numerator of the suppression exponent effectively spells death for that contribution. With this in mind, consider values of x that deviate from n by an amount proportional to √n, x = n + α√n. The constant α can be large in this case, but it cannot be as large as √n, as this would amount to the suppression discussed above. The suppression of this contribution is
32
It is common practice to derive this result for the factorial function, n! = Γ(n+1), rather than the Gamma function itself; obviously, the two are equivalent.




$$n\ln\!\left(1 + \frac{\alpha}{\sqrt{n}}\right) - \alpha\sqrt{n} ,$$
which, on using the fact that α ≪ √n, can be written approximately as
$$n\ln x - x - (n\ln n - n) \approx -\frac{\alpha^2}{2} + \frac{\alpha^3}{3\sqrt{n}} - \frac{\alpha^4}{4n} - \cdots .$$
As n grows without bound, this suppression tends to the finite amount −α²/2. It should be clear from this analysis that deviations of x from n that are proportional to any power of n greater than 1/2 will ultimately be suppressed as n increases, and those proportional to powers of n less than 1/2 will not be. As n increases without bound, therefore, we need only consider values of x that deviate from n by an amount of order √n or less.
Ignoring contributions that deviate from n by an amount greater than √n allows us to simplify the expansion of the exponent to
$$n\ln x - x \approx n\ln n - n - \frac{(x-n)^2}{2n} .$$
Substituting this into our expression for n! gives
$$n! \approx \int_0^{\infty} e^{\,n\ln n - n - (x-n)^2/2n}\,dx = n^n e^{-n}\int_{-n}^{\infty} e^{-u^2/2n}\,du .$$
The final integral can be extended to go over all values of u from negative infinity to infinity, as the difference is suppressed by e^{−n/2}. The Gaussian integral then gives the Stirling approximation
$$n! \approx n^n e^{-n}\sqrt{2\pi n} .$$
Figure 11

Figure 12

Figure 13

The majority of our treatment consists in showing that contributions to the integral that lie far enough away from n are suppressed enough to ignore. Without this treatment, we could not have substantively argued that the expansion of the logarithm could be so quickly terminated (or even that we could use the expansion at all). This truncation of the expansion essentially consists of replacing the integrand x^n e^{−x} with n^n e^{−n} e^{−(x−n)²/2n}. The two integrands are illustrated in figure 11, where n is taken to be 3 and the short dashes indicate the approximation, figure 12, with n = 7, and figure 13, with n = 10. It is clear from the figures that, while the curves are not quite the same shape, the approximation definitely gets better as n is increased. In addition, the exact area under the solid curve is approximated better by the area under the dashed curve than would be expected from the shapes of the individual curves because the excess of area on the left is compensated by the deficiency of area on the right. This is why the Stirling approximation works better than would be expected from our above analysis at moderate values of n. This "good luck" with the factorial approximation is not characteristic of all situations, but the idea that values of x lying too far from the local maximum can be ignored because of exponential suppression works well whenever the desired function is given by an integral of the form
$$f(n) = \int g(x)\, e^{\,n h(x)}\,dx .$$
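The leading-order recipe just described (Laplace's method, for real exponents) is easy to automate. The Python sketch below is my own illustration, not code from the text: it estimates ∫ g(x) e^{n h(x)} dx by g(x₀) e^{n h(x₀)} √(2π/(n|h''(x₀)|)) at the interior maximum x₀ of h, and checks the result against direct numerical integration for a simple test case.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

def laplace_leading(g, h, n, bounds):
    """Leading-order steepest-descents (Laplace) estimate of the integral of g(x) e^{n h(x)}.

    The interior maximum of h is located numerically and h'' there is estimated with a
    central finite difference; this is a sketch, not a robust general-purpose routine."""
    x0 = minimize_scalar(lambda x: -h(x), bounds=bounds, method="bounded").x
    eps = 1e-4
    h2 = (h(x0 + eps) - 2.0 * h(x0) + h(x0 - eps)) / eps**2   # should come out negative
    return g(x0) * np.exp(n * h(x0)) * np.sqrt(2.0 * np.pi / (n * abs(h2)))

# Test case: integral over (0, inf) of e^{n(ln x - x)} dx = n!/n^{n+1}; h(x) = ln x - x peaks at x = 1.
n = 20
approx = laplace_leading(lambda x: 1.0, lambda x: np.log(x) - x, n, bounds=(0.1, 10.0))
exact, _ = quad(lambda x: x**n * np.exp(-n * x), 0.0, np.inf, epsabs=0)
print(approx, exact, approx / exact)   # the ratio approaches 1 as n grows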

We don't even need to substantively change our analysis in this case; the Taylor expansion about the maximum can essentially always be truncated at the quadratic term33 in order to determine the behavior of the function for asymptotically large values of n. If there is more than one maximum, then we sum over the contributions from the vicinity of each maximum. This technique can even be extended to cases in which the function h(x) is imaginary, as large values of n indicate that even small changes in h(x) lead to vastly oscillatory behavior in the integrand. Values of x associated with places where the derivative h′(x) is not zero34 are suppressed because their contributions cancel each other out. This approach is often called the method of stationary phase for this reason. Important contributions only come from regions in which the phase of the integrand has a critical point, or is stationary in its value.
As an example, let's consider the asymptotic behavior of the Bessel functions. These functions are extremely important in discussing systems with cylindrical symmetry, in which they play the role of the trigonometric functions sine and cosine. They were first introduced by the Dutch-Swiss mathematician Daniel Bernoulli in the 18th century, but were generalized by the German astronomer and mathematician Friedrich Wilhelm Bessel early in the 19th century. Bessel functions of the first kind, the ones that usually appear in physical problems, can be defined by
$$J_\nu(z) = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{\,i(\nu x - z\sin x)}\,dx = \frac{1}{2\pi}\int_{-\pi}^{\pi} \cos(\nu x - z\sin x)\,dx .$$
2
This integral is of the above form, so we can use the method of stationary phase to determine its
asymptotic behavior. The exponent is stationary when
z cos x .
This occurs twice on the integration interval, at x Arccos z . Since these two critical points are
the same number with opposite sign, we can consider only the positive value and add the contribution
from the other critical point by adding the complex conjugate of our result. Approximating the
exponent as

x Arccos z ,
x z sin x Arccos z z sin Arccos z z sin Arccos z
2
and doing the remaining integral, we obtain the asymptotic behavior
1 i Arccos z z sin Arccos z
2i
J ( z )
e
c.c. .
2
z sin Arccos z
Here, the "+ c.c." represents the addition of the other critical point, whose contribution is the complex conjugate (c.c.) of the one we have written. If both of the parameters z and ν are real and positive, and z > ν, then we can simplify this result to
$$J_\nu(z) \approx \frac{1}{2\pi}\sqrt{\frac{2\pi}{z\sin\!\left(\mathrm{Arccos}\frac{\nu}{z}\right)}}\; e^{\,i\left[\nu\,\mathrm{Arccos}\frac{\nu}{z} - z\sin\left(\mathrm{Arccos}\frac{\nu}{z}\right) + \frac{\pi}{4}\right]} + \mathrm{c.c.} = \sqrt{\frac{2}{\pi\sqrt{z^2-\nu^2}}}\;\cos\!\left(\nu\,\mathrm{Arccos}\frac{\nu}{z} - \sqrt{z^2-\nu^2} + \frac{\pi}{4}\right) .$$
This is an asymptotic formula, valid whenever z is very large and z > ν. If the value of z is very much larger than ν, this result can be simplified immensely to
$$J_\nu(z) \approx \sqrt{\frac{2}{\pi z}}\,\cos\!\left(z - \frac{\nu\pi}{2} - \frac{\pi}{4}\right) ; \qquad z \gg \nu .$$
2

33
As long as the coefficient of this term is not zero.
34
Really, the derivative of the actual exponent n h(x) + ln g(x); oscillations of the function h(x) can, in some sense, be countered by oscillations in the function g(x).

This is the form usually found in texts on this subject, but the above result is more accurate and far more general (though also more complicated; this is probably the reason that it is absent from most texts on the subject, even though it is an excellent approximation). Taking care with the addition of the complex conjugate,35 we can also determine asymptotic forms for complex values of z and ν, or for values of z < ν. Some comparisons for real z and ν are illustrated in figures 14, 15, and 16. Figure 14 contains the results for ν = 0, figure 15 for ν = 2, and figure 16 for ν = 5. The solid line in each figure represents the exact result, the long dashes represent the simplified asymptotic form, and the short dashes the more complicated asymptotic form (long and short dashes are identical when ν = 0). Note that, in each case, both of the asymptotic forms approximate the exact result very well when z is large. The more complicated form latches onto the exact result almost immediately after z exceeds ν, while the simplified form takes a while to catch up (it is essentially waiting for the last approximations we made, associated with taking z ≫ ν, to become more accurate).
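The comparison in figures 14-16 is easy to reproduce with SciPy's built-in Bessel function. The sketch below uses the two asymptotic forms written above; the helper names are mine.

```python
import numpy as np
from scipy.special import jv

def j_asym_full(nu, z):
    # form from the stationary-phase analysis above (requires z > nu)
    s = np.sqrt(z**2 - nu**2)
    return np.sqrt(2.0 / (np.pi * s)) * np.cos(nu * np.arccos(nu / z) - s + np.pi / 4)

def j_asym_simple(nu, z):
    # familiar large-z form, valid only when z >> nu
    return np.sqrt(2.0 / (np.pi * z)) * np.cos(z - nu * np.pi / 2 - np.pi / 4)

nu, z = 5.0, np.array([8.0, 12.0, 20.0, 40.0])
print(jv(nu, z))             # "exact" values
print(j_asym_full(nu, z))    # tracks the exact curve soon after z exceeds nu
print(j_asym_simple(nu, z))  # takes considerably longer to catch up
```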
Figure 14

Figure 15

Figure 16

Although the method of steepest descents is very useful in a variety of situations and can give
us very accurate representations of the asymptotic behavior of a wide range of functions, it is obvious
from the development above that this method is not well-suited for improvements. We could imagine
trying to improve our approximation of the factorial function by including the terms in the Taylor
expansion of degree 3 and 4 (just including degree three would cause the integral from negative
infinity to infinity to diverge), for example, but this would lead to a more complicated integral than
the one we started with. There are other techniques that can be used to obtain more accurate
asymptotic expressions for large values of n, both in the factorial case and in the Bessel case, and these
are given in more advanced texts on the subject. Stirling was able to show that the logarithm of the factorial function has the asymptotic expansion
$$\ln n! \sim n\ln n - n + \frac{1}{2}\ln(2\pi n) + \frac{1}{12n} - \frac{1}{360n^3} + \frac{1}{1260n^5} - \frac{1}{1680n^7} + \cdots$$
and that, for all positive n,
$$n^n e^{-n}\sqrt{2\pi n}\; e^{1/[12(n+1)]} < n! < n^n e^{-n}\sqrt{2\pi n}\; e^{1/(12n)} .$$
I will not present a derivation of Stirling's series here, but one is given in section 8.3 of Arfken and Weber. This series gives an excellent approximation to the factorial function. Figure 17 illustrates the
relative error in these two approximations (the negative one is the lower bound) up to n = 6. It is
clear from the figure that the upper bound is within 0.2% for all values of n greater than 1, and the
lower bound is within 1% for all values greater than 2. Even better, this series is also valid for
complex n with large modulus36 as long as the values do not lie along the negative real axis. This is
required by the fact that the factorial function has poles along this axis at all negative integers.

35
It is meant only to change the sign of the exponent and the factor of i in the square root, not the contributions from the parameters z and ν.
36
This term means "distance from the origin."



Figure 17
Unfortunately, Stirling's series is still asymptotic. As with the function f(x) I began this text with, asymptotic expansions are the only power series expansions that can even hope to approximate the factorial function for large argument. It may not seem like it from the few terms given above, but the coefficients of Stirling's expansion grow without bound as the expansion continues. Stirling was apparently unaware of this fact when he published his result, but it was pointed out by the British mathematician Thomas Bayes in a letter to the British physicist John Canton that was posthumously published in 1763. As we have seen, asymptotic expansions can be very useful but are limited in their accuracy. Given a specific value of n, it is impossible to use Stirling's series directly to obtain a result with arbitrary accuracy. For this reason, many other series have been developed over the years that do converge to the factorial function. These expansions are necessarily not power series; most of them consist of terms of the form
$$\frac{c_k}{n(n+1)(n+2)\cdots(n+k)} ,$$
reminiscent of the limit definition of the Gamma function. They do, however, converge, and can therefore be used to obtain arbitrary accuracy for any value of n if you have patience.
Efforts have been made in recent years to re-sum part of Stirling's expansion to determine a closed form that gives the factorial function to a very high degree of accuracy (though still not arbitrary accuracy). There are several results that work very well, but my favorite is the expression
$$\Gamma(z) \approx \sqrt{\frac{2\pi}{z}}\left[\frac{1}{e}\left(z + \frac{10z}{120z^2 - 1}\right)\right]^{z} ,$$
offered by the Hungarian mathematician Gergő Nemes in 2007 (!). Note that this is an expression for Γ(z) rather than n!. We could make a factorial expression by adding 1 to all the z's, but it would make the formula look more complicated. This formula is nothing short of amazing. Its relative error is illustrated in figure 18 for values of z from 0 to 3,37 and in figure 19 for z running from 10 to 20. The actual error (not divided by Γ(z)) is illustrated in figure 20 for z running from 10 to 16. Note that the formula is not magic. It does not give the exact value of the Gamma function (it can't, because the Gamma function cannot be expressed as a finite combination of elementary functions), but being off by only about 100 in the value of 14!, which is approximately 8.72 × 10^10, is pretty impressive! Stirling's estimate is off by about 500 million.
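These error claims are easy to check in Python, using the form of Nemes' formula reconstructed above (it is algebraically the same as Γ(z) ≈ √(2π/z)[(z + 1/(12z − 1/(10z)))/e]^z); the helper names are mine.

```python
import math

def nemes_gamma(z):
    """Nemes-type closed-form approximation to Gamma(z), as written above."""
    return math.sqrt(2 * math.pi / z) * ((z + 10 * z / (120 * z**2 - 1)) / math.e) ** z

def stirling_factorial(n):
    """Stirling's approximation n! ~ n^n e^{-n} sqrt(2 pi n)."""
    return n**n * math.exp(-n) * math.sqrt(2 * math.pi * n)

exact = math.factorial(14)                 # 14! ~ 8.72e10, equal to Gamma(15)
print(exact - nemes_gamma(15.0))           # off by roughly a hundred
print(exact - stirling_factorial(14))      # off by roughly five hundred million
```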
Figure 18

Figure 19

Figure 20

37
Remember that this is only 2!... z = 0 is a pole in the Gamma function, corresponding to (−1)!.


Actual computations of the Gamma and factorial functions generally consist of using one of these nice asymptotic relations to determine the value of the function at the value we want plus some large integer, then employing the recursion relation to step that value back down to what we actually are interested in. This is especially useful because the recursion relation is exact. It introduces no error whatsoever, so the relative error in the result we get is the same as the relative error in the approximation we began with. Since the relative error in Nemes' result is less than 10^{-9} for all numbers larger than about 15, this process will give us eight digits if we step the number we want up to a number larger than 15, and more if we step it up farther. As an example, we can calculate Γ(0.4) by instead approximating Γ(5.4) using Nemes' result. This gives
$$\Gamma(5.4) \approx 44.59884199 ,$$
so
$$\Gamma(0.4) = \frac{\Gamma(5.4)}{(4.4)(3.4)(2.4)(1.4)(0.4)} \approx 2.218159237669 .$$
The exact answer is Γ(0.4) = 2.218159543757688…, so we got 6 digits just by stepping up 5 units. Of course, we could have gotten the same result from the Taylor expansion for the Gamma function derived in the Gamma function notes (provided we had the patience to add up twenty-one terms and knew all of the required zeta functions to enough digits). One of the real advantages this approach has over the Taylor approximation is that it also allows us to calculate the Gamma function for imaginary values. The relative error (in absolute value) of approximating the Gamma function along the imaginary axis using Nemes' result is shown in figure 21. Again, it is very good as long as the argument is far from the origin. We can improve this by stepping the number we are interested in up by several integers in exactly the same manner as before; to calculate Γ(i), for example, we use Nemes' result to calculate
$$\Gamma(5+i) \approx 1.2178330636 + 21.470122979\,i ,$$
and determine the desired result via
$$\Gamma(i) = \frac{\Gamma(5+i)}{(4+i)(3+i)(2+i)(1+i)(i)} \approx -0.1549497366683 - 0.4980156403893\,i .$$
The exact result is Γ(i) = −0.1549498283… − 0.498015668…i, so again we have 6 digits. We can do the same thing with Stirling's result, but we will need to bump the desired result up the ladder38 more in order to get the same accuracy. Taylor will not give us Γ(i) no matter what we do, since the distance from the center of that expansion (at z = 1) to the point we are interested in (z = i) is larger than the distance to the nearest singularity (at z = 0).
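The step-up-then-recurse procedure is only a few lines of Python. The sketch below is mine (names included); it rewrites the Nemes helper with cmath so that complex arguments work.

```python
import cmath

def nemes_gamma_c(z):
    """Nemes-type approximation to Gamma(z), written with cmath so complex z is allowed."""
    return cmath.sqrt(2 * cmath.pi / z) * ((z + 10 * z / (120 * z**2 - 1)) / cmath.e) ** z

def gamma_stepped(z, steps=5):
    """Approximate Gamma(z): apply Nemes' formula at z + steps, then use the exact
    recursion Gamma(z) = Gamma(z + m) / [z (z+1) ... (z+m-1)] to step back down."""
    top = nemes_gamma_c(z + steps)
    denom = 1
    for k in range(steps):
        denom *= (z + k)
    return top / denom

print(gamma_stepped(0.4))   # ~ 2.2181592...; the exact value is 2.2181595437...
print(gamma_stepped(1j))    # ~ -0.154950 - 0.498016j
```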

Figure 21
As a final topic in these notes, I thought it would be interesting to illustrate one way in which some of this analysis all pulls together to give us an exotic result that we would not have been able to come up with before. Consider the integral
$$\int_0^1 \ln\Gamma(x)\,dx .$$

38

We should really say, over the ladder, as the recursion relation takes us farther to the right in the complex plane of z.


This is certainly an exotic integral, and we would have no way of determining its value by finding an anti-derivative. Using what we know about the Gamma function, however, we can write this integral in the more approachable form
$$\int_0^1 \ln\Gamma(x)\,dx = \lim_{n\to\infty}\int_0^1\left[\ln n! + x\ln n - \sum_{k=0}^{n}\ln(x+k)\right]dx .$$
We do not have to worry about exchanging the limit and the integral here, as there are no divergences in the integration (the divergence in the logarithm is weak); the limit either exists or it doesn't. The integral is quite easy to take in this form:
$$\int_0^1 \ln\Gamma(x)\,dx = \lim_{n\to\infty}\left[\ln n! + \frac{1}{2}\ln n - \sum_{k=1}^{n}\bigl((1+k)\ln(1+k) - k\ln k - 1\bigr) + 1\right] .$$

Looking closely at the sum, we can see that this result can also be written as
$$\int_0^1 \ln\Gamma(x)\,dx = \lim_{n\to\infty}\left[\frac{1}{2}\ln n - \ln(n+1) - \sum_{k=1}^{n} k\ln\frac{k+1}{k} + n + 1\right] .$$
The remaining sum is telescoping; by inspection,39 one can easily see that it is given by
$$\sum_{k=1}^{n} k\ln\frac{k+1}{k} = n\ln(n+1) - \ln n! .$$
This brings us to
$$\int_0^1 \ln\Gamma(x)\,dx = \lim_{n\to\infty}\left[\ln n! - (n+1)\ln(n+1) + \frac{1}{2}\ln n + n + 1\right] .$$

Now, n is very large. This means that we can freely use Stirling's approximation without error:
$$\int_0^1 \ln\Gamma(x)\,dx = \lim_{n\to\infty}\left[n\ln n - n + \frac{1}{2}\ln(2\pi n) - (n+1)\ln(n+1) + \frac{1}{2}\ln n + n + 1\right]$$
$$= \lim_{n\to\infty}\left[(n+1)\ln n + \frac{1}{2}\ln(2\pi) - (n+1)\ln(n+1) + 1\right] = \lim_{n\to\infty}\left[\frac{1}{2}\ln(2\pi) - (n+1)\ln\!\left(1+\frac{1}{n}\right) + 1\right] = \frac{1}{2}\ln(2\pi) .$$
We are therefore left with the very exotic expression
$$\int_0^1 \ln\Gamma(x)\,dx = \ln\sqrt{2\pi} .$$

The result depends only on the value of the constant c that de Moivre first introduced in his study of the asymptotic behavior of the factorial function. Everything else cancels out in one way or another because of the recurrence relation satisfied by the Gamma function and the algebraic and expansion properties of the natural logarithm. Because of this fact, the same cancellations will occur in the evaluation of the integral
$$\int_z^{z+1} \ln\Gamma(x)\,dx$$
for any value of z > 0 (it also works for complex z, but we need to be careful with the logarithms in that case). Try to use this technique to show that40
$$\int_{1/2}^{3/2} \ln\Gamma(x)\,dx = \ln\sqrt{\pi/e} .$$
We can also use this interesting result to derive the following sums for the Riemann zeta function:
39
This means looking at the series for a few, preferably small, values of n and discerning its behavior from those.
40
Hint: it may be easier to establish the general result for z first, then plug in z = 1/2.


$$\sum_{k=2}^{\infty} \frac{(-1)^k\,\zeta(k)}{k(k+1)} = \ln\sqrt{2\pi e^{\gamma}} - 1 ,$$
$$\sum_{k=2}^{\infty} \frac{(-1)^k\,\zeta(k)}{k+1} = \ln\sqrt{\frac{2\pi}{e^{\gamma}}} - 1 ,$$
$$\sum_{k=1}^{\infty} \frac{\zeta(2k)}{k(2k+1)2^{2k}} = \ln\frac{\pi}{e} ,$$
and
$$\sum_{k=2}^{\infty} \frac{\zeta(k)}{k(k+1)} = \ln\sqrt{\frac{2\pi}{e^{\gamma}}} .$$
These series appear as problems 19 - 22 in the exercises.
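These results are easy to sanity-check numerically. The sketch below verifies the boxed integral and the last series with SciPy; splitting off the elementary part of the slowly converging zeta sum is my own device to speed up convergence, and γ denotes the Euler-Mascheroni constant.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln, zeta

# Integral of ln Gamma(x) from 0 to 1 should equal ln sqrt(2*pi) ~ 0.9189385
val, _ = quad(gammaln, 0.0, 1.0)
print(val, 0.5 * np.log(2.0 * np.pi))

# Sum over k >= 2 of zeta(k)/(k(k+1)) should equal ln sqrt(2*pi*exp(-gamma)).
# Write zeta(k) = 1 + (zeta(k) - 1); the "1" part sums exactly to 1/2, and
# zeta(k) - 1 decays like 2^{-k}, so a short sum of the remainder is enough.
gamma_euler = 0.5772156649015329
ks = np.arange(2, 60)
series = 0.5 + np.sum((zeta(ks, 1) - 1.0) / (ks * (ks + 1)))
print(series, 0.5 * (np.log(2.0 * np.pi) - gamma_euler))
```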

Exercises for Section IV.2:

In problems 1 - 6, use both the Stirling approximation and Nemes' approximation to compute the given number. Bump the number up ten times using the recursion relation before applying Stirling, and five times before applying Nemes. Then compare the relative errors in your approximations using a computer algebra system that has the Gamma function built in.

1. 0.6      2. 2.3      3. 3i      4. 4.5i      5. 1 + 2i      6. 2 + 0.5i

In problems 7 - 10, use Stirling's result to determine the radius of convergence of the given expansion. Explain what test you are using. Also explain why it doesn't matter whether you use Stirling's or Nemes' result in this application, and why this treatment is more easily used than the ratio test.

7. $\displaystyle\sum_{n=1}^{\infty} \frac{(n!)^2\,(2n)!}{n^{4n}}\, x^n$      8. $\displaystyle\sum_{n=1}^{\infty} \frac{(n!)^2\, n^n}{(3n)!}\, x^n$      9. $\displaystyle\sum_{n=1}^{\infty} \frac{n!\,(3n)!}{(4n)!}\, x^n$      10. $\displaystyle\sum_{n=1}^{\infty} \frac{n!\,(3n)!\, n^n}{(5n)!}\, x^n$

In problems 11 - 16, use the method of steepest descents to determine an asymptotic form of the given integral. Evaluate your expression at n = 100, and compare your result with that obtained from a computer algebra system. Problem 13 gives results that are too large for n = 100; use 10 instead.
11.

14.


n x e x dx

12.

n x x n dx

15.

n x e x dx

x n e nx dx

13.

16.

x x e nx dx

x n n x dx


17. Determine the value of $\int_z^{z+1} \ln\Gamma(x)\,dx$, showing all of your work.

18. Show all of the steps necessary to establish the fact that
$$\ln n! + \frac{1}{2}\ln n - \sum_{k=1}^{n}\bigl((1+k)\ln(1+k) - k\ln k - 1\bigr) + 1 = \ln n! - (n+1)\ln(n+1) + \frac{1}{2}\ln n + n + 1$$
whenever n is a natural number.
In exercises 19 - 22, determine the value of the given series. Show all of your work, as the answers are given in the text. It may help to use the fact that
$$\int_z^{z+1} \ln\Gamma(x)\,dx = \int_{z-1}^{z} \ln\Gamma(1+x)\,dx .$$

19. $\displaystyle\sum_{k=2}^{\infty} \frac{(-1)^k\,\zeta(k)}{k(k+1)}$      20. $\displaystyle\sum_{k=2}^{\infty} \frac{(-1)^k\,\zeta(k)}{k+1}$      21. $\displaystyle\sum_{k=1}^{\infty} \frac{\zeta(2k)}{k(2k+1)2^{2k}}$      22. $\displaystyle\sum_{k=2}^{\infty} \frac{\zeta(k)}{k(k+1)}$

23. Explain why there cannot be an expansion of the form $\sum_{k=0}^{\infty} \frac{a_k}{z^k}$ that converges to the Gamma function in any neighborhood of the origin, regardless of the values of the constants $a_k$. To paraphrase, no series of inverse powers of z can converge to Γ(z); all series that try must be asymptotic in nature. Hint: remember the character of power series and their domains of convergence.
24. Use the result you obtained in problem 17 to determine the value of the integral $\int_0^1 \ln\sin(\pi x)\,dx$. Hint: use the reflection identity. Show that your result is consistent with that obtained from the treatment of section 3.3.

Section IV.3: Summary of Chapter IV

While Taylor series are extremely useful, they are often very difficult to use when the argument
of interest is large. We would ordinarily simply shift the center of the expansion to one closer to our
argument of interest, but this is not always easy to do. Rational functions allow us to shift the center
of the expansion to infinity, giving us a much more well-behaved expansion that can be used to
approximate the value at large arguments, but these functions can be taken more-or-less exactly with
much less work. Other functions of interest usually have a major issue at infinity, called an essential
singularity, that prevents the existence of a Taylor series centered there. In order for us to approximate
these functions for large arguments, we must settle for a different type of expansion that does not
converge in the usual sense of the word, but does give us fairly accurate results if we treat it with care.

Such asymptotic expansions have a behavior that depends critically on both the number of terms kept
and the argument we are interested in. Adding more terms in an asymptotic expansion will give us
more accuracy up to a point. After we reach that point, it is actually counter-productive to include
more terms. The optimal number of terms to keep depends on the argument of interest, and we cannot
expect to obtain arbitrary accuracy for any argument except the center of the expansion. Asymptotic
expansions are both useful and divergent, a dichotomy that is difficult to grasp once one is familiar
with Taylor series. We cannot simply add terms without thought in an asymptotic series; a global
analysis of the series must first be done in order to determine the proper number of terms to keep for
our particular argument of interest before we can determine the approximation. The accuracy of our
final result depends both on the number of terms we keep and on the argument of interest. The
argument of interest is actually more important in some sense than the number of terms we keep, as it
determines the maximum possible accuracy that can be obtained from the expansion regardless of the
number of terms kept. This character is strikingly different from that associated with Taylor series.
The first section of this chapter illustrates some standard techniques used to determine the
asymptotic expansion of a function given as an integral. These techniques are very useful in most of
the situations found in standard applications, but they cannot truly be described as general. The
treatment of asymptotic expansions given in the present text is, at best, rudimentary. There are many
other techniques associated with finding solutions to equations involving an asymptotic parameter,
finding the asymptotic expansion of a function given as a power series, and solutions to differential
equations involving an asymptotic parameter. Interested readers are referred to the excellent texts by
N. G. de Bruijn, A. Erdélyi, and E. T. Copson. These offer a much more general analysis of the
treatment behind and uses of asymptotic approaches to mathematical problems.
In the second section, we focused on finding the asymptotic behavior of a given integral instead
of finding an expansion for it. This should not be taken as a sign that there are no asymptotic
expansions for such integrals, as most certainly do admit such expansions. Rather, this treatment
should be thought of as illustrating what happens for truly asymptotic values of n. We are essentially
looking at the first term in an associated asymptotic expansion. This analysis is useful because it is
both simple and accurate for extremely large values. More sophisticated analyses often begin with this
method of steepest descents in order to determine what is expected of the function asymptotically and
where the asymptotic expansion is expected to break down before more work is done to determine
corrections to this fully asymptotic result. The method of steepest descents is amply suited for treating
the ratio test for series and physical situations in which we are interested in arguments that are truly
asymptotic. We will see some examples of this in chapter 10, but there are many others.
Asymptotic expansions are not as strong theoretically as Taylor series, and must be treated
with care if one wishes to obtain accurate results. Despite this fact, they are often preferable to Taylor
series when the desired argument is very large. Even if a Taylor series can be found that converges at
this argument, it may be very difficult to obtain the accuracy that can be had easily from an asymptotic
expansion. The choice of which expansion to use is highly dependent on both the desired accuracy
and the argument of interest, and must be made independently in each situation. This again illustrates
the need for a human to understand the underlying process even though a computer will calculate the
ultimate result.


Chapter VII
Mathematical Models and Periodic Motion
This chapter is intended as an introduction to the ideas of mathematical models of physical
behavior and some of their uses in systems that conserve energy. The ideas of least-squares fitting and
the first and second integrals of the motion of simple systems are derived.
Section VII.1: Least Squares Fitting
Suppose that we have a model predicting that the relationship between two physical quantities x and y is given by y = f(x), for some function f. For example, x could represent the height of a wall and y could represent the time it takes a ball to fall from the top of the wall to the ground after it is dropped. Simple kinematics indicates that this relationship is given by
$$y = \sqrt{2x/g} ,$$
where g is the acceleration due to gravity at the surface of the Earth. We can test this model by measuring the values of x and corresponding values of y in some range, say by dropping balls off of walls that range in height from 1 meter to 20 meters and measuring the corresponding time taken for the fall. Now, this model is certainly not exact. There will be some heights x for which the corresponding time is not given exactly by our formula. The accuracy of our formula for a given set of data points (x_k, y_k) can be characterized by the deviation these data points illustrate from the formula. The standard way to do this is to compute the chi-squared111 value for the data and the formula, the sum of the squares of the differences between the values given by the formula and those given by the data:
$$\chi^2 = \sum_{k=1}^{n}\bigl(y_k - f(x_k)\bigr)^2 .$$
In this expression, n is the total number of data points taken. If the formula were exact, then the value of χ² would be zero. In practice, we consider a formula to be accurate if the value of χ² is much smaller than the characteristic size of the y_k² in the data set. We compare to y_k² because this quantity has the same units as χ².
Our example of the falling balls has a parameter, g, that also must be determined by experiment. If we have an accepted value of g from some other source, then we can take our formula for χ² as-is and determine how well this approximation represents our data. If, on the other hand, we do not have an accepted value readily available, then we can use our data to determine which value of g gives the best approximation. This is often the case, as physical models almost always contain parameters like g that are initially unknown and must somehow be determined empirically from experiment. The underlying assumptions in the development of the model are tested by comparing the values of these parameters obtained from vastly different experiments. If these values are essentially the same, then the underlying assumptions are validated. If they are not, then the data is indicating that something is missing in the underlying analysis that must be remedied in order to obtain a coherent description of the physical problem. This illustrates the constant struggle between theory and
111

Our use of the term chi squared is different from that found in statistics. This quantity is associated with the variance
found in statistics, and the variance will be equivalent to our chi squared in all relevant respects.


experiment: the theory indicates which experiments might lead to interesting results, and the
experiment indicates how valid the underlying ideas associated with the theory are.
Given a set of data that is expected to follow a certain model function that contains some number of unknown parameters, we cannot directly compute the value of χ² until we know the values of the parameters. The function √(2x/g), for example, is useless if we do not know what value to use for g. The statement that the data should be approximated by this function indicates that this function should give a small value of χ² for some value of g, however, so it is obvious that we should search for the value of g that makes χ² the smallest. If this value of g makes χ² very small, then the model is, in some sense, verified and this value of g seems to be the one preferred by the data. In this way, we obtain an indirect measurement of the parameter g by fitting the data to the function √(2x/g). This process of data fitting is extremely useful in many branches of science. In physics and related disciplines, the parameters determined from this process often have a direct meaning in the context of the physical problem. In our example, it gives the acceleration due to gravity experienced by objects close to the surface of the Earth, and represents an indirect measurement of the Earth's mass, radius, and the universal gravitational constant G. In more complicated fields like biology, these parameters may not have a direct physical interpretation. The form of the function f(x) in these cases may be motivated by experiment itself rather than a well-defined physical theory, usually by simply graphing the data and seeing what the relationship looks like (as in, "oh, that looks like a straight line!", or, "oh, that looks like a square root graph!"). In either case, the determination of the parameters that minimize χ² will almost certainly give us more information about the physical processes underlying the experiment, and will usually give us more insight into the direction experiments should move in to increase our understanding of these underlying processes.
The process of data fitting involves finding the values of the unknown parameters that minimize the value of χ² for some given set of data and some given function f(x). This is a standard minimization process, familiar to students of multivariable calculus. In our example, suppose that we assume that the function f(x) = α√x describes our data. The value of χ² is then given by
$$\chi^2(\alpha) = \sum_{k=1}^{n}\bigl(y_k - \alpha\sqrt{x_k}\bigr)^2 .$$
This is a function of α. To find its minimum value, we do as any good BC calculus student would do: we differentiate with respect to α and find where the derivative is zero:
$$\frac{d\chi^2}{d\alpha}(\alpha) = -2\sum_{k=1}^{n}\bigl(y_k - \alpha\sqrt{x_k}\bigr)\sqrt{x_k} = 0 \quad\Longrightarrow\quad \alpha = \frac{\sum_{k=1}^{n} y_k\sqrt{x_k}}{\sum_{j=1}^{n} x_j} .$$
This is the only value of α for which the function χ²(α) exhibits an extremum, so it must be a minimum (it is obvious that there is no maximum because α could be increased well beyond the values of all of the x_k and y_k to make χ²(α) as large as one wishes), but you can use the second derivative test if you like. This gives us a best value of α, but does not reflect on the validity of the model in the first place. If even this best value of α gives a χ² value that is too large, then the model cannot be accurate for this data set; some other model function must be found in this case. If, on the other hand, the value of χ² for this choice of α is acceptably small, then we have not only validated our model but also used the data to determine the value of the parameter α.
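In code, this one-parameter fit is a single line once the data are in arrays. The sketch below uses made-up (hypothetical) measurements purely to show the formula in action; none of these numbers come from the text.

```python
import numpy as np

# Hypothetical wall heights (m) and measured fall times (s) -- illustrative only.
x = np.array([1.0, 3.0, 5.0, 10.0, 15.0, 20.0])
y = np.array([0.47, 0.80, 1.02, 1.40, 1.77, 2.01])

# Best-fit alpha for the model y = alpha * sqrt(x), from d(chi^2)/d(alpha) = 0:
alpha = np.sum(y * np.sqrt(x)) / np.sum(x)
chi2 = np.sum((y - alpha * np.sqrt(x))**2)
print(alpha, chi2)   # alpha plays the role of sqrt(2/g) in the falling-ball model
```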

The above process can, in theory, be used to determine the best value of the parameters in any model function with respect to a given set of data. This process is complicated if the parameters do not appear linearly in the model function, as in f(x) = a b^x or f(x) = a/(1 + b c^x), but this is a complication of calculation rather than one of substance. The first of these can be disarmed by considering the data (x_k, ln y_k) instead of (x_k, y_k). The second cannot be easily disarmed, which is the reason why logistic fits always take so long on calculators. If the parameters all appear linearly in the model function, then their best values can always be determined relatively easily using this method. The requirement that the derivative of χ² with respect to each parameter is zero always results in a linear equation in the parameters, so the best values of the parameters are obtained by solving the system of equations associated with these requirements. Consider the data shown in table 1, for example, expected to follow the relationship f(x) = a x + b ln x + c for suitable parameters a, b, and c. See if you can show that the best parameters a, b, and c are given by the equation
$$\begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} \sum_{k=1}^{n} x_k^2 & \sum_{k=1}^{n} x_k\ln x_k & \sum_{k=1}^{n} x_k \\ \sum_{k=1}^{n} x_k\ln x_k & \sum_{k=1}^{n} (\ln x_k)^2 & \sum_{k=1}^{n} \ln x_k \\ \sum_{k=1}^{n} x_k & \sum_{k=1}^{n} \ln x_k & \sum_{k=1}^{n} 1 \end{pmatrix}^{\!-1} \begin{pmatrix} \sum_{k=1}^{n} y_k x_k \\ \sum_{k=1}^{n} y_k\ln x_k \\ \sum_{k=1}^{n} y_k \end{pmatrix} = \begin{pmatrix} 2.02 \\ 4.156 \\ 5.609 \end{pmatrix} .$$
The agreement between this function and the data is illustrated in figure 1.
x:  0.5   1    2    3    4    5    6    7
y:   4    7   13   16   20   22   25   28
Table 1

Figure 1
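Solving the 3 × 3 system above for the Table 1 data takes only a few lines of NumPy. The values printed should land close to the (2.02, 4.156, 5.609) quoted in the text; small differences would only reflect rounding in the table.

```python
import numpy as np

# Data from Table 1.
x = np.array([0.5, 1, 2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([4, 7, 13, 16, 20, 22, 25, 28], dtype=float)

# Model f(x) = a x + b ln x + c: build the design matrix and let least squares
# solve the same normal equations written out above.
A = np.column_stack([x, np.log(x), np.ones_like(x)])
params, *_ = np.linalg.lstsq(A, y, rcond=None)
a, b, c = params
print(a, b, c)
print(np.sum((y - A @ params)**2))   # chi-squared of the fit
```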
Now, this process is perfectly well-defined mathematically. It will always give us the values of the parameters that make the value of χ² exactly the smallest. The mathematical techniques used in this procedure are above reproach, and absolutely inarguable. However, we must not confuse this mathematical certainty with physical certainty about our model. The mathematics that comes out of this analysis is only as accurate as the numbers we put into it. When there are many parameters in our model, slight changes in the data can lead to huge variations in the parameters associated with it. To

ensure that we are being honest, we must sample data in regions where the parameters we are
interested in are important. In the present situation, the parameter a is not very important when x is
small and neither of the parameters b or c are very important when x is large. Parameter b is extremely
important when x is small. These considerations require us to look closely at how our model behaves
in these regions before we can truly accept the accuracy of our results. In the present case, we would
be well-advised to ask our experimentalist friends to take more data at even smaller values of x in
order to assess the accuracy of our b coefficient. This is what I mean when I say that theory often
directs experimentalists on where to look for interesting results. The theory associated with this
example definitely indicates that something interesting is going on at small x, even though this is
definitely not obvious just from looking at the raw data.
Given enough parameters, we can force many models to fit any number of data points exactly,
despite the possibility that the model may have no physical significance whatsoever. The fact that a
polynomial of degree n − 1 can be designed to go through any n data points exemplifies this issue.
This is one reason why scientists tend to gravitate toward models with a small number of parameters
instead of more complicated models. This technique is extremely useful, but it must be used with a
degree of caution and a clear understanding of what we are doing. Once we have a model that agrees
very well with our data, we must verify that this model predicts the outcome of future experiments
with similar accuracy in order to be comfortable with its standing. Even in this case, we cannot at all
be sure that the model will be accurate in situations that are very different from those in which the
original data were taken. In the present example, we cannot be sure that our model will accurately
predict the value of y when x = 30,000. There are many physical examples of theories that work very
well for a small range of values, but fail completely outside this domain. Special relativity and
quantum mechanics are prominent examples of this violation, but there are many more.
The unfortunate reality of experimental error complicates this idea. No matter how careful an
experimenter is, there will always be errors associated with his or her measurements. This is a fact of
measurement, and cannot be eliminated. One way to evaluate the effect of errors is to re-run the data
fitting with the largest and smallest values possible to assess the effect this has on the values of the
extracted parameters. If the effect is small, we can be reasonably sure that the results are accurate. If
it is not, then we need to take a closer look at the underlying model or commission more accurate
experiments. Both of these options cost money, but the former usually costs less than the latter. For
this reason, theorists often spend years perfecting the underlying analysis before a more expensive
experiment is undertaken. This interplay between theory and experiment is fundamental to current
scientific research, and its importance cannot be overstated.
The analysis given above is appropriate for experimental data that has comparable errors. If the errors associated with the data differ greatly over the range of data points, then it is often useful to weight the contributions to χ² according to the appropriate error. This process is very similar to that used in multivariable calculus to find a weighted average of a quantity throughout a region; we simply append the function with a weight before doing the sum. In multivariable calculus, this weight is thought of as attached to the volume or area element. Recall that the volume element dV is replaced with a weighted element ρ dV in all of the relevant integrals. This process can be thought of as a discrete version of that one. Our "area element" is simply 1, since that is the size of the region we are summing (you can think of this size as the change in k from one data point to the next). We are interested in making contributions associated with a small error more important than those associated with a large error. If the errors are random, then the statistically appropriate way to do this is to weight the contribution of a given data point with the inverse square of its error. The reason for this is associated with the celebrated central limit theorem of statistical analysis, which states that the probability that a

measurement that should be $\bar y$ will be recorded as y with error $\sigma_y$ is proportional to
$$\exp\!\left[-\frac{(y-\bar y)^2}{2\sigma_y^2}\right] .$$
The 1/σ_y² in the exponent of this probability is translated into the error weighting of 1/σ_y² in the calculation of χ². The bottom line is that the correct value of χ² associated with data (x_k, y_k) for which the error in y_k is σ_k is given by
$$\chi^2 = \frac{\displaystyle\sum_{k=1}^{n}\bigl(y_k - f(x_k)\bigr)^2/\sigma_k^2}{\displaystyle\sum_{j=1}^{n} 1/\sigma_j^2} .$$
The denominator of this expression is a constant that can be ignored throughout the minimization process, so the error-corrected "best" values of the parameters are obtained simply by adorning all of the sums in our above expressions with the factor 1/σ_k² before doing the sums.
x / deg. C    y / liters    Error / liters
   -20          20.3           0.5
   -10          21.6           0.02
     0          22.4           0.02
    10          23.5           0.3
    20          24.0           0.1
    30          25.1           0.3
    40          26.2           0.5
    50          26.5           0.02
Table 2
As a simple example, consider the data given in table 2 with errors in y as shown. This data represents the volume y in liters of a sample of one mole of helium gas at standard atmospheric pressure and temperature x in degrees Celsius. The ideal gas law states that this data should follow a linear relationship. The graph shown in figure 2 indicates that this is a realistic expectation. Using an un-weighted fit, we obtain the relationship
$$y = 0.0892857\,x + 22.3607 .$$
This is shown with the data in figure 3. Note that while the model does come close to all of the data points, it does not come within the quoted errors of several of the data points. The need to come close to the data points with large errors has caused the model to stray farther from the points with small error than we should allow. Using the modified approach with error-weighted averaging leads to the relationship
$$y = 0.0817978\,x + 22.4095 ,$$
which is graphed along with the data in figure 4. Note that this new graph strays farther from the data with large errors in favor of the data with small errors. While it does not come as close to all of the data, it is expected to be more accurate and to be a better physical model in light of the large errors associated with some of the data.
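Both fits can be reproduced with a short NumPy calculation on the Table 2 data. In the weighted version each row of the system is divided by σ_k, which is equivalent to the 1/σ_k² weighting of the sums described above; the printed coefficients should land close to the two relationships quoted in the text.

```python
import numpy as np

# Data from Table 2: temperature (deg C), volume (liters), error in volume (liters).
x = np.array([-20, -10, 0, 10, 20, 30, 40, 50], dtype=float)
y = np.array([20.3, 21.6, 22.4, 23.5, 24.0, 25.1, 26.2, 26.5])
sigma = np.array([0.5, 0.02, 0.02, 0.3, 0.1, 0.3, 0.5, 0.02])

A = np.column_stack([x, np.ones_like(x)])

# Un-weighted fit: minimize the plain sum of squared residuals.
m_u, b_u = np.linalg.lstsq(A, y, rcond=None)[0]

# Error-weighted fit: minimize the sum of (residual/sigma)^2, done by dividing
# each equation by sigma_k before the least-squares solve.
w = 1.0 / sigma
m_w, b_w = np.linalg.lstsq(A * w[:, None], y * w, rcond=None)[0]

print(m_u, b_u)   # compare with 0.0893 x + 22.36
print(m_w, b_w)   # compare with 0.0818 x + 22.41
```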



Figure 2

Figure 3

Figure 4

Once we have completed the fitting process, it is important to re-visit the underlying details of
our physical model to determine what it indicates about the physical parameters of our analysis. While
our extracted slope and intercept are certainly not without errors, there is very little wiggle room in
their values that will still allow them to go through the four data points with small errors. Ignoring
these errors for the moment, we compare our model to the underlying ideal gas theory. This theory
holds that the relationship between volume and Celsius temperature is given by
$$V = \frac{nR}{P}\,T - \frac{nR}{P}\,T_A ,$$
where n is the number of moles of gas, R is the ideal gas constant, P is the pressure of the gas, and T_A is the Celsius temperature associated with absolute zero. We have said that the gas consisted of one mole of helium atoms at one atmosphere of pressure, but neither of these quotes is exact. Suppose that the number of gas atoms is known to be 1 mole with an error of 0.01 moles and the pressure is known to be one atmosphere plus or minus 0.02 atmospheres. We have obtained a slope of 0.0817978 liters/°C, so the ideal gas constant is given by
$$R = 0.0817978\ \frac{\mathrm{liters}}{^\circ\mathrm{C}}\times\frac{P}{n} = 0.0817978\times\frac{1 \pm 0.02}{1 \pm 0.01}\ \frac{\mathrm{liter\cdot atm}}{^\circ\mathrm{C\cdot mol}} .$$
The errors in number of moles and pressure have been communicated to an error in the derived value
of the ideal gas constant. The standard way to do this division is to add the relative errors associated with all quantities that are multiplied or divided. Quantities that are squared count twice, and those that appear under a square root count half. This rule comes from standard rules of calculus:
$$d(xy) = y\,dx + x\,dy \;\Longrightarrow\; \frac{d(xy)}{xy} = \frac{dx}{x} + \frac{dy}{y} , \qquad d(x^n) = n x^{n-1}\,dx \;\Longrightarrow\; \frac{d(x^n)}{x^n} = n\,\frac{dx}{x} .$$
Since dx is the variation of x, it represents the error associated with x. The ratio dx/x is then interpreted as the fractional, or relative, error in x. You can check for yourself that similar rules apply when we divide quantities; the error can be positive or negative, so we always add the absolute values of the relative errors. If, in addition, the errors are known to be independent of one another, then it is highly unlikely that two quantities would both see a maximum positive error at the same time. Statistical analysis of this effect using the normal curve (in accordance with the central limit theorem) indicates that the proper way to combine independent errors is to add them in quadrature, replacing dx/x + dy/y with112
$$\sqrt{\left(\frac{dx}{x}\right)^2 + \left(\frac{dy}{y}\right)^2} .$$
112

We will derive this result in chapter 10.


Running through the numbers gives us the value
$$R = 0.0817978 \pm 0.001829\ \frac{\mathrm{liter\cdot atm}}{^\circ\mathrm{C\cdot mol}} .$$
In light of this error, it seems silly to include so many decimal places; it is clear that the fifth decimal place in our value is meaningless, so one would quote
$$R = 0.0818 \pm 0.0018\ \frac{\mathrm{liter\cdot atm}}{^\circ\mathrm{C\cdot mol}}$$
instead. To circumvent the need for the ± and all the leading zeros in the error, scientific publications often abbreviate this result to
$$R = 0.0818(18)\ \frac{\mathrm{liter\cdot atm}}{^\circ\mathrm{C\cdot mol}} .$$
The 18 in parentheses indicates the change in the last two digits of the quoted result, plus or minus.
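The quadrature rule is simple enough to script. This snippet just redoes the arithmetic quoted above; the helper name is mine.

```python
import math

def relative_error_quadrature(*rel_errors):
    """Combine independent relative errors in quadrature."""
    return math.sqrt(sum(r * r for r in rel_errors))

slope = 0.0817978            # liters per deg C, from the weighted fit
rel_P = 0.02 / 1.0           # pressure known to 1 +/- 0.02 atm
rel_n = 0.01 / 1.0           # amount known to 1 +/- 0.01 mol
R = slope * 1.0 / 1.0        # central value of slope * P / n
dR = R * relative_error_quadrature(rel_P, rel_n)
print(R, dR)                 # about 0.0818 and 0.0018
```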
Turning our attention to the temperature associated with absolute zero, we see that the same factor adorns this value. If there were indeed no error in the slope or intercept at all, then we would arrive at the "exact" temperature
$$T_A = -\frac{22.4095}{0.0817978}\ ^\circ\mathrm{C} = -273.962136\ ^\circ\mathrm{C}$$
for absolute zero. This, of course, is absurd. There definitely are errors associated with both
for absolute zero. This, of course, is absurd. There definitely are errors associated with both
quantities. We have said that these errors are small because of the need for the line to go through the
four data points with small errors, but they are not zero. Even very small errors associated with the
slope and the intercept can lead to surprisingly large errors in the derived temperature of absolute zero
because our data points all lie far away from this temperature; using the values from the un-weighted
fit, for example, gives the value -250.4 degrees instead, about an 8% difference. This often happens
when we use a model fit to data to extrapolate to experimental values far from those at which the
measurements were made. This, of course, is quite natural. We cannot expect a series of
measurements at temperatures ranging from -20 degrees Celsius to 50 degrees to give us accurate
information about what is happening at 250 degrees lower than our lowest temperature measurement.
Even if the errors in our slope and intercept were not very large, our model cannot realistically be
expected to be valid so far away from where the measurements were taken. In our situation, helium
will actually undergo a phase transition to liquid at about -269 degrees Celsius and there will be a
discontinuous decrease in volume from the model prediction of about 0.406 liters to about 0.032 liters,
which will remain essentially constant as the temperature is further decreased at constant pressure.113
Despite this fact, the prediction of the model is actually quite close to the temperature of absolute zero
obtained from many other experiments. This requires more information to establish, though, and it
cannot really be trusted from the data we used to determine the model. The only real way to handle
extrapolation errors is to get more data that lie closer to the region of interest. This idea is often used
by experimentalists to argue the need for a future experiment. Usually, the easiest and cheapest
regions to measure are done first. Future experiments are then planned in order to get a stronger hold
on the regions of interest. This again indicates the importance of theory/experiment interplay; the
experimentalists give the theorists data, which the theorists use to tell the experimentalists where to
look next.
One very straightforward way to determine the approximate error in our slope and intercept is
to simply graph the data with errorbars on a piece of graph paper and move a ruler around in order to
determine the maximum and minimum slope and intercept. This doesn't seem very exact when
¹¹³ This decrease in temperature at constant pressure is difficult to accomplish in reality; the easiest way to decrease the
temperature of a sample of liquid helium involves decreasing the pressure by pumping the higher-temperature atoms out.
This process is called evaporative cooling.

compared to the mathematical method described above, but we must keep in mind the errors present
throughout this analysis. The method above gave us several decimal places in both the slope and the
intercept, but this accuracy is associated only with the values required to give the exact minimum of
our error-weighted chi-squared value; they are in no way a promise that the physical values
associated with the process we are interested in have the same number of digits. There are, of course,
more sophisticated ways in which one can determine the errors in the derived parameters associated
with a model for some physical data, but these are much more complicated and usually reserved for
curves more complicated than simple straight lines. If your data are expected to follow a straight line,
the "moving a ruler around" approach is much simpler and essentially just as accurate as these more
complicated approaches. This approach also allows us to easily incorporate the errors in x that have
been ignored up to this point. If the errors associated with x are independent of those associated with
y, then we do not expect both x and y to experience their maximum error simultaneously. The
quadrature technique illustrated above leads us to draw ellipses around each data point, with axes
corresponding to the maximal errors in the x and y directions. We then move our ruler around and
draw lines that go through each of the ellipses to determine the maximum/minimum slope and
intercept.
Linear data is obviously much easier to analyze than nonlinear data, for the simple reason that
all straight lines have the same shape. Data that are expected to follow more complicated curves
require a much more involved treatment. Sometimes, the data may be plotted in such a way that a
curve becomes a straight line. This process of linearization is very useful for many models, as it
allows one to consider straight lines instead of curves. If the data are expected to follow the model
y = a√x + b, we will simply consider the data set (√x_k, y_k) instead of (x_k, y_k); if they are expected
to follow y = a x^b, we consider (ln x_k, ln y_k). Each of these gives a straight line which is easier to
analyze, and each allows us to determine the parameters a and b of the real model from those
obtained in the linearized model. We must take care to determine the errors associated with our
measurements in the linearization process, as they will almost never be the same as those of the
original data. See if you can show that
√(2.345 ± 0.31) = 1.53 ± 0.10
and
ln(23.67 ± 0.72) = 3.164 ± 0.03.
It is not difficult, essentially an application of the linear approximation of introductory calculus. This
treatment is valid whenever the error is much smaller than the central value. If it is not, then more
complicated approaches must be used.
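As a concrete sketch of the linearization procedure, the following Python fragment fits the model y = a x^b by fitting a straight line to (ln x_k, ln y_k), propagating the y-errors with the linear approximation dY ≈ dy/y. It assumes numpy is available, and the data are made up purely for illustration.

    import numpy as np

    # Made-up data assumed to follow y = a * x**b, with measured errors sigma_y.
    x = np.array([0.4, 0.9, 1.7, 2.6, 3.5, 4.4])
    y = np.array([0.31, 0.74, 1.55, 2.40, 3.30, 4.25])
    sigma_y = np.array([0.05, 0.05, 0.10, 0.10, 0.15, 0.15])

    # Linearize: ln y = ln a + b ln x.  The error in ln y is approximately dy/y
    # whenever the error is much smaller than the central value.
    X, Y = np.log(x), np.log(y)
    sigma_Y = sigma_y / y

    # Error-weighted straight-line fit (numpy expects weights of 1/sigma).
    b, ln_a = np.polyfit(X, Y, 1, w=1.0 / sigma_Y)
    print("a =", np.exp(ln_a), " b =", b)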
One further comment is in order before moving on to the next topic. All of the preceding has
assumed that the errors in x and y are independent of one another. This is usually the case if the
measurements are taken separately, but there are many situations in which an external influence can
cause both measurements to run high or low in their values. Sensitive experiments involving magnetic
fields must be corrected for the contribution of the Earth's magnetic field, and those involving
radioactivity must be corrected for the natural background radioactivity present essentially
everywhere. Experimentalists try to minimize these effects, by shielding the measurement apparatus
in a superconducting foil to reduce the contribution of unwanted external magnetic fields or a lead
house to minimize background radiation, but they are always present. These types of errors are
called systematic errors, and must be dealt with carefully in a manner that is more specific to the
experiment at hand. You will often see two errors quoted for a result, that for independent errors and
that associated with systematics. It often takes teams of graduate students to tackle a careful

investigation of the systematic error associated with a given experiment, and it is a very tricky process.
I won't talk more about them here, but you should be aware of their presence.

Exercises for Section VII.1:


In problems 1–6, use error propagation to determine the required combination of data. Assume that
the errors are correlated; do not use quadrature.
1. (3.67 ± 0.12)(2.81 ± 0.08)²

2. (1.28 ± 0.07)² (3.92 ± 0.03)³

3. sin(2.82 ± 0.05)

4. tan(1.42 ± 0.05)

5. (3.02 ± 0.07) ln(6.3 ± 0.2)

6. (2.35 ± 0.08) e^(1.72 ± 0.13)

In problems 7–10, fit the given data to the given model. Use both a standard fit and a fit weighted
with the associated error (in y). Include a graph of the data, including errorbars and both of the best-fit
curves in your solutions, and comment on the difference between the two fits and the validity of the
model.
7. Fit to y = ax² + bx + c
   x     y      Error
   0.3   3.1    0.2
   0.7   3.8    0.1
   0.9   4.19   0.02
   1.1   4.90   0.06
   1.6   6.3    1.2
   2.0   12     2.1

8. Fit to y = a√x + bx + c
   x     y      Error
   0.2   2.0    0.2
   0.5   1.38   0.05
   0.8   0.94   0.03
   1.1   0.3    0.3
   1.8   0.00   0.01
   2.3   -0.5   0.2

9. Fit to y = ax² + bx + c ln x
   x     y      Error
   0.1   2.5    0.4
   0.5   2.0    0.2
   0.8   1.95   0.1
   1.3   2.4    0.4
   1.7   2.0    0.9
   2.2   3.5    0.2

10. Fit to y = ax + b sin x + c
    x     y      Error
    0.5   6.15   0.03
    0.9   8.09   0.02
    1.3   10.2   0.5
    1.5   9.9    0.4
    2.1   10.1   0.4
    2.5   8.70   0.01

In problems 11–14, linearize the data according to the given model. Include errors in your analysis.
Then, graph the linearized data on graph paper and use a ruler to determine the range of slopes and
intercepts consistent with the data. Use your results to find the values of the parameters of the original,
unlinearized, model that are consistent with the data. Include errors, and comment on the validity of the
model.
11. Fit to y = ax³ + b
    x     y     x Error   y Error
    0.2   0.7   0.1       0.1
    0.6   0.9   0.1       0.1
    0.9   1.6   0.2       0.1
    1.2   2.8   0.1       0.2
    1.6   6.3   0.3       0.2
    1.8   8.1   0.2       0.3

12. Fit to y = a√x + b
    x     y     x Error   y Error
    0.2   2.2   0.1       0.1
    0.5   3.2   0.1       0.2
    0.8   3.8   0.1       0.1
    1.1   4.2   0.1       0.2
    1.5   5.1   0.2       0.2
    1.9   5.5   0.1       0.3
13. Fit to y = a·bˣ
    x     y     x Error   y Error
    0.3   2.5   0.1       0.1
    0.5   2.8   0.1       0.1
    0.9   3.2   0.1       0.2
    1.2   4.1   0.1       0.3
    1.4   4.4   0.2       0.2
    1.9   5.7   0.1       0.1

14. Fit to y = a xⁿ
    x     y     x Error   y Error
    0.3   0.2   0.1       0.1
    0.8   0.5   0.1       0.1
    1.8   1.4   0.1       0.1
    2.5   2.4   0.1       0.2
    3.7   3.7   0.1       0.2
    4.6   5.1   0.1       0.1

Section VII.2: The First Integral and Conservation of Energy


Macroscopic systems are governed by Newton's laws of motion. As you know from physics,
the first of these essentially establishes the idea of an inertial reference frame. In such a frame, objects
with mass must be acted on by a force in order to change their velocity. The third law is a statement
that objects share force; it is not something that one object does to another, but rather something that
both objects experience. These two laws are very important theoretically, but the bulk of the work
almost always falls to the second law,
F_net = m a.
This law is the statement that, in an inertial frame, the acceleration experienced by an object is
proportional to the net force experienced by the object. When analyzing systems using this law of
motion, one often encounters second order differential equations because the force experienced by the
object often depends on its position while the acceleration is the second derivative of the position.
These equations are often very difficult to solve, even in situations that are really quite simple. For
this reason, it is useful to try to integrate this equation to get rid of one of the derivatives and end up
with a first order differential equation that is easier to solve. This can be accomplished in all cases by
taking the dot product of Newton's second law with the displacement vector experienced by the object
over a very short time interval:
F_net · dr = m a · dr = m (dv/dt) · dr = m dv · (dr/dt) = m v · dv.
Now, we can manipulate this expression easily by using the fact that
m v · dv = (m/2) d(v · v) = d(½ m v²).
This follows from the product rule and the fact that v² = v · v. Adding the contributions to this
equation from an initial time to a final time then gives
∫_C F_net · dr = W_net = Δ(½ m v²).
The path integral on the left takes place over the path taken by the object from its initial location to its
final location, and is called the net work done on the object by forces over its motion. This is a very
important result in the treatment of macroscopic systems, called the Work-Energy theorem. It states
that the net work done on an object is equal to the change in its kinetic energy K = ½ m v².
The work-energy theorem is completely general, and is satisfied by all macroscopic systems
regardless of the character of the forces acting on them. Slight modifications allow it to be applied
even to systems whose mass changes with time, like a leaking bag or a cable car that is being loaded
with produce. It is interpreted as meaning that systems must be given energy through work in order
for their kinetic energy to increase, and systems must have energy removed from them through work in
order for their kinetic energy to decrease. In its current form, however, it is not very useful unless one
has a method with which to compute the net work done on the system. This can be greatly improved if
all of the forces acting on the system are conservative in the sense that the path integral on the left is
independent of path.114 In this case, its value can be computed from the fundamental theorem of line
integrals as the change in a potential function U:
W_net = ∫_C F_net · dr = −ΔU.

The potential function U can depend only on position, as its change depends only on the initial and
final positions of the system. The introduction of the minus sign allows us to think of U as
representing the potential energy, or energy associated with position, of the system. Moving it to the
other side of the equation, we have the result
Δ(K + U) = ΔE = 0.
This is the statement of conservation of energy. The sum of the potential and kinetic energies of the
system, called its total energy E, does not change with time.
Figure 5
Focusing on a system confined to one dimension for the time being, the potential energy is a
function of the position x of the object. Given a potential energy function and the initial location and
¹¹⁴ This will be the case whenever the curl of the vector force field F is zero throughout our domain of interest, ∇ × F = 0.
There are some subtle vector fields that buck this idea, but the majority of vector fields satisfying this requirement will be
independent of path throughout our domain. The subtle ones will be obvious because their curl will be divergent at certain
places but zero everywhere else. These fields are associated with the complex function 1/z from chapter 5 and the fact that
this function represents a vortex. The additional requirement that the domain of interest be simply connected avoids these
subtle vector fields in every case.

speed of the object, we can use conservation of energy to determine its speed at any other location.
Suppose, for example, that the potential energy of a system is given by the function
U(x) = x⁴ − 3x³ − 10x² + 24x.
All quantities are given in SI units so that we can avoid having to write units. This potential is graphed
in figure 5; it has zeros at x = -3, 0, 2, and 4. If the object is found initially at rest at x = 0, then its
total energy is zero. This is represented by the solid black line in figure 5. Now, the total energy is the
sum of potential and kinetic. The potential energy is represented by the solid curve in the figure and
the total energy by the solid line. The object's kinetic energy is therefore given by the difference
between these two. Now, it is obvious from the form of the kinetic energy that it cannot be negative
(what would make it negative? Is the mass negative? Is the speed imaginary?). This implies that the
object can never make it to x = 1 or to any place farther to the right than x = 0. It can go to the left,
however, and will because the shape of the potential function indicates that there is a net force pulling
it in that direction. This can easily be seen from the fact that, for a small change dx in x,
F_net dx = −dU  ⟹  F_net = −dU/dx.
The slope of the graph is positive at x = 0, so the net force is negative. As it moves to the left, its
speed increases because its kinetic energy, given by the difference between the two graphs, gets larger
until it reaches the minimum around x = -2. Past that point, the object continues to move to the left
with decreasing speed until it reaches x = -3, at which point it pauses briefly and turns around to repeat
the process.
Potential energy versus position graphs are among the easiest I know of to interpret. They can
literally be thought of as hills, or skate-parks, or roller coasters. I like to think of them as a lake bed:
the potential energy is the bed of the lake, the total energy is the surface of the water, and the kinetic
energy is its depth. An object placed at rest at x = 0 cannot make it to x = 2.5 even though it has
enough energy to exist there because it is unable to get over the island, or potential barrier between
x = 0 and x = 2. Left alone in the absence of friction or other energy-consuming forces, the object
will continue to oscillate between x = 0 and x = -3 forever. Given its mass, we can easily determine its
speed at any point on its trek. If the object's mass is 3 kilograms, for example, its speed at x = −2 is
given by 4√2 m/s ≈ 5.65685 m/s. An object of mass 3 kilograms with total energy 20 Joules will
oscillate back and forth between x = -3.17 and x = 4.28. It will experience a maximum speed of 6.739
m/s at x = -1.935. Our energy equation involves only the position and its first derivative, so we have
successfully removed the second derivative from consideration. For this reason, the equation of
conservation of energy is often called the first integral of motion.
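These statements are easy to check numerically. The sketch below (Python, assuming numpy and scipy are available) finds the turning points and maximum speed for the potential above with m = 3 kg and E = 20 J; the bracketing intervals passed to the root finder are simply read off the graph.

    import numpy as np
    from scipy.optimize import brentq

    m = 3.0                                          # mass in kilograms, from the text
    U = lambda x: x**4 - 3*x**3 - 10*x**2 + 24*x     # potential energy from the text
    E = 20.0                                         # total energy in Joules

    def speed(x):
        """Speed at position x, from E = U(x) + (1/2) m v^2."""
        return np.sqrt(2.0 * (E - U(x)) / m)

    # Turning points are the roots of E - U(x) = 0 that bracket the motion.
    left  = brentq(lambda x: E - U(x), -4.0, -3.0)
    right = brentq(lambda x: E - U(x),  4.0,  5.0)
    print(left, right)              # approximately -3.17 and 4.28, as in the text

    # The maximum speed occurs at the minimum of U; scan for it numerically.
    xs = np.linspace(left, right, 100001)
    i = np.argmin(U(xs))
    print(xs[i], speed(xs[i]))      # about x = -1.935, v = 6.74 m/s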

Exercises for Section VII.2:

In problems 1–6, a particle of mass 3 is moving under the influence of the force given as a function of
its position. All quantities are in SI units. It is initially at position x = 0 and moving with speed 2.
Determine (a) its potential energy function, (b) its turning points, and (c) its maximum speed.
1. F(x) = 2x − 2x³

2. F(x) = 5x − 2x³

3. F(x) = 4 sin x − 2x

4. F(x) = 3 cos x − x³

5. F(x) = 16x cos(x²) − 2x

6. F(x) = 3x² − 4x³ + 32x cos(7x²)

Section VII.3: The Second Integral and Periodic Motion

While conservation of energy is indeed very useful in a variety of situations, it does not give us
the whole picture of the motion. Suppose, for example, that we wanted to know how long it takes for
the object referenced above with total energy 0 Joules to make its trek from x = 0 to x = -3. Questions
like this are not easily answered by the first integral, as time does not appear explicitly in the equation.
The answer is definitely contained in the first integral because we can easily use it to determine how
fast the object is moving at every point along the way, but the total time taken is not directly accessible
from this expression. The problem is that the first integral still contains the first derivative of the
position so still represents a differential equation. In order to determine the time taken from one
position to another, we need to solve this differential equation: we need a second integral. Happily,
first order differential equations are much easier to solve than second order ones. Essentially all we
need to do is solve for the derivative and integrate it. This is easily accomplished formally, as we can
write
dx/dt = √(2(E − U(x))/m)  ⟹  dt = dx / √(2(E − U(x))/m),
but this integral is not easily performed in practice. Numerically, however, we can always compute
the time taken from position x_i to position x_f as
∫_{x_i}^{x_f} dx / √(2(E − U(x))/m).
This procedure is often referred to as reduction to quadratures after a technique of numerical


integration called Gaussian quadratures after Gauss. I will not go into the details of this technique, as
essentially any numerical integration technique will do, but you should be aware of the term and its
origins. It is for this reason that any one-dimensional (and, by extension, many multidimensional)
system involving conserved energy is said to be able to be reduced to quadratures using the second
integral. Since we can reduce any conservative one-dimensional system to quadratures using this
approach, making essentially any question that can be asked about them answerable to as many
decimals as one likes, many mathematicians viewed these systems as trivial in the 19th century and
turned their attention to other, more complicated, systems that cannot be disarmed in this way.
Apparently, the answer to our question above about the time taken to go from one turning point
to the other¹¹⁵ is given by
T = √(3/2) ∫_{−3}^{0} dx / √(10x² + 3x³ − x⁴ − 24x).
This is certainly a daunting integral, and it is not at all obvious how to approach it in an analytic way.
It turns out that this integral can be computed exactly in terms of the so-called Elliptic integrals, which
are special functions that have been exhaustively studied partly for this purpose. The integral we are
interested in is given by
T = i√(3/10) K(7/10) + K(10/3) = 0.9387359…,
where
K(m) = ∫_0^{π/2} dt / √(1 − m sin²t)
¹¹⁵ This time is half the period of the motion; the full period is the time taken to go from one turning point to the other and
back. The symbol T is usually reserved for the whole period, but I will use it to represent an arbitrary time for now.

is the complete Elliptic integral of the first kind. Essentially any integral involving quartic
polynomials and their square roots can be taken exactly in terms of elliptic integrals, but this process is
difficult and only gives the answer in terms of special functions anyway. If the potential function
U(x) were of higher degree, then this technique wouldn't be applicable, so we will confine our
discussion to approximation techniques. The exact result in this specific case will let us know how
well we are doing.
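As a quick check, the following Python sketch (assuming numpy and scipy are available) evaluates the period both by direct numerical integration and from one reduced form of the elliptic-integral result, a single complete elliptic integral whose argument lies between 0 and 1; scipy's ellipk(m) uses the same parameter convention m as the definition of K(m) above.

    import numpy as np
    from scipy.integrate import quad
    from scipy.special import ellipk

    m = 3.0
    U = lambda x: x**4 - 3*x**3 - 10*x**2 + 24*x

    # Direct evaluation of T = sqrt(3/2) * integral_{-3}^{0} dx / sqrt(-U(x));
    # quad's adaptive scheme copes with the square-root endpoint singularities.
    T_numeric = np.sqrt(m/2.0) * quad(lambda x: 1.0/np.sqrt(-U(x)), -3.0, 0.0, limit=200)[0]

    # The same number from a single complete elliptic integral of the first kind.
    T_elliptic = np.sqrt(3.0/10.0) * ellipk(0.3)

    print(T_numeric, T_elliptic)    # both approximately 0.9387359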
In trying to compute the value of this integral numerically, we are immediately faced with a
serious issue. The integrand we are working with is divergent at both endpoints. This is not an
accident. We are integrating the displacement divided by the speed, dx/v, and the speed must
necessarily be zero at both endpoints (otherwise, it wouldn't be an endpoint!). This issue will therefore
come up in every computation of the period of every object under the influence of every potential, so is
something we should consider carefully. Numerical integration techniques are not well-suited for
divergent integrands. The integral is, of course, convergent because the divergence is contained in a
square-root function. This integral converges in the same way that ∫₀¹ dx/√x converges, but it is not

easy to determine this numerically. This problem comes up surprisingly often in physical calculations,
and usually it comes for a completely physical reason. One way to handle this is to simply truncate the
integration domain, integrating only from -2.9 to -0.1 instead of from -3 to 0. This method gives
T ≈ √(3/2) ∫_{−2.9}^{−0.1} dx / √(10x² + 3x³ − x⁴ − 24x) = 0.70522,
a result uncomfortably far from the exact one obtained above. The reason for this shortcoming is that
we have excluded the largest contribution to the time: that coming from the smallest speeds. We can
improve on our result by considering -2.99 to -0.01,
T ≈ √(3/2) ∫_{−2.99}^{−0.01} dx / √(10x² + 3x³ − x⁴ − 24x) = 0.864839,
but we are still not terribly close to the exact result. We could continue this process by pushing closer
and closer to the divergence, but in doing so we are essentially re-introducing the divergence issue that
we were trying to solve.
As a compromise, we can take the numerical result as-is and approximate the remaining
contribution ourselves. This contribution takes place only over two very short intervals, from -3 to
−2.9 and from −0.1 to 0 in the first case. This idea begs us to use a Taylor expansion to approximate the
integrand. In the vicinity of x = -3, the integrand is approximately given by
1/√(105(x + 3)).
The integral of this function can easily be handled analytically from -3 to -2.9; the result is
2√(0.1/105) = 0.0617213.
The contribution from x = -0.1 to x = 0 is similarly given by
2√(0.1/24) = 0.129099,
so the entire time is approximated by
T ≈ 0.70522 + √(3/2) (0.0617213 + 0.129099) = 0.938926.
This result is correct to three places, almost four. Try to show for yourself that the corrected result
associated with going from -2.99 to -0.01 is 0.938744, correct to almost 5 places. In general, this
result gives the value
∫_a^b dx/√(f(x)) ≈ ∫_{a+ε}^{b−ε} dx/√(f(x)) + 2√(ε/|f′(a)|) + 2√(ε/|f′(b)|).
If one side has a larger second derivative (implying that the ignored contributions are more important)
than the other, you may want to take the numerical integration farther on that side than on the other;
this is not a substantive change, and only requires you to use different values of ε on the two sides.
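A minimal implementation of this truncate-and-correct scheme for the half-period considered above might look like the following Python sketch (assuming numpy; the step count and ε values are arbitrary choices).

    import numpy as np

    m = 3.0
    U  = lambda x: x**4 - 3*x**3 - 10*x**2 + 24*x
    f  = lambda x: -U(x)                              # f vanishes at both turning points
    fp = lambda x: -(4*x**3 - 9*x**2 - 20*x + 24)     # f'(x)

    def half_period(a=-3.0, b=0.0, eps=0.1, n=200001):
        """Truncate the divergent integrand eps away from each turning point,
        integrate the middle with the midpoint rule, and add the first-order
        Taylor corrections 2*sqrt(eps/|f'|) at each end."""
        dx = (b - a - 2*eps) / n
        xm = a + eps + dx * (np.arange(n) + 0.5)
        core = np.sum(1.0 / np.sqrt(f(xm))) * dx
        corr = 2*np.sqrt(eps/abs(fp(a))) + 2*np.sqrt(eps/abs(fp(b)))
        return np.sqrt(m/2.0) * (core + corr)

    print(half_period(eps=0.1))     # about 0.9389
    print(half_period(eps=0.01))    # about 0.93874, versus the exact 0.9387359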
This problem is obviously well-known to engineers and physicists, and its resolution is often
included in computer algebra systems. Mathematica, for example, has no problem whatsoever giving
the correct result to as many places as one wishes. It is programmed to recognize the integrand as
divergent and use similar techniques to approximate the contributions from these divergent places. It
is important, however, that you are aware of these techniques if you ever have the need to compute
similar integrals quickly and want to make your own code to circumvent some of the time issues
associated with using more sophisticated software. It is well-known that one's own C++ code is
usually much faster than using a computer algebra system designed for general purposes. The reason
for this is that computer algebra systems must be ready for anything, and must therefore spend time
deciding which approximation technique to use, while a tailor-made code has the specifics of the
problem at hand already programmed into it. In many cases, tailor-made codes can be more accurate
and run faster than computer algebra systems even when the computer algebra systems are using more
sophisticated techniques to do the numerical integration. This is extremely important when running
animations or considering the period of motion for many different objects moving under the influence
of the same potential function. You can even improve the accuracy of your results by including the
contribution of the next term in the Taylor expansions. Try to show for yourself that this improvement
gives 0.938728 for the half-period, using only the integral from -2.9 to -0.1. Remember your inverse
sine integral and that
∫ dx/√(x² − a²) = ln|x + √(x² − a²)| + C.
One example of periodic motion stands above all others: simple harmonic motion. You have,
of course, learned about simple harmonic motion in physics class. It is one of the underlying tenets of
a basic physics class, and is always covered. Simple harmonic motion is the motion of an object under
the action of the potential U(x) = ½kx², so the second integral gives¹¹⁶
T = √m ∫_{x_i}^{x_f} dx / √(2E − kx²).
The endpoints of the motion are at ±√(2E/k), so the period of the motion is given by
T = 4√m ∫_0^{√(2E/k)} dx / √(2E − kx²) = 4√(m/k) ∫_0^1 du / √(1 − u²).
The disappearance of the total energy E in this expression is one of the reasons why simple harmonic
motion is called simple: the period is completely independent of the energy or amplitude of motion; it
depends only on the mass and the spring constant k. The remaining integral is easily evaluated, using
either the beta function or the inverse sine function, as π/2, so the period is T = 2π√(m/k). Only a few

¹¹⁶ Note that this T is different from the previous T's in that it represents the full period of the motion, the time taken to go
from one turning point back to the same turning point. This is twice the value we considered above.

systems exhibit simple harmonic motion, notably objects on springs, so one may wonder why this
topic is so important in introductory physics. One reason is that it is so simple and allows one to
explore many different phenomena without requiring advanced mathematics,117 but the main reason is
that simple harmonic motion is an extremely useful approximation to the behavior of objects under the
action of many different potentials.
Most objects experiencing oscillatory motion, like that considered above, oscillate about a local
minimum in their potential energy function. Looking closely at the local minimum, we see that the
Taylor expansion of the potential function about this minimum cannot contain a nonzero linear term;
the first term in its expansion other than the constant value of the minimum potential energy is the
quadratic term. Truncating the expansion at this term, we obtain simple harmonic motion. The value
of k is given by the second derivative of the potential evaluated at the equilibrium,
k = U″(x₀).
Oscillations about this minimum with small total energy are therefore approximated well by simple
harmonic motion because their energy does not allow them to stray far enough from the equilibrium
value of x for higher terms in the Taylor expansion to be important. This approximation of simple
harmonic motion is extremely useful in many applications, from the motion of a pendulum to the
vibration of molecules in a solid lattice. Even quantum systems, where a general treatment is
forbiddingly complicated, can be approximated very well by simple harmonic motion. A
determination of the second derivative of the potential energy at the stable equilibrium is enough for us
to determine many things about the motion of the system. Consider the example given above. The
minimum occurs at x = -1.93536, and the second derivative of the potential energy there is 59.7841.
Our period is therefore given by T = 2π√(m/k) = 1.4075, to be compared with a period for the full
motion from x = 0 back to x = 0 of 1.87747. This is obviously a very rough approximation. Figure 6
shows the potential energy function (solid line) along with its simple harmonic approximation
(dashed), and it is clear from this figure that the two functions do not agree very well when the total
energy is 0. The simple harmonic approximation overestimates the time to the left, where the actual
potential grows larger and the kinetic energy of the object shrinks faster, and underestimates the time
to the right, where the potential function allows for more kinetic energy than the approximation. It is,
however, clear that the underestimation is more important than the overestimation and that the true
period will be longer than this approximation. It is also clear that the approximation would be much
better if the total energy was -30 or even -40 Joules, keeping the object tightly bound to the minimum.
This is verified by the periods 1.50677 and 1.44725, respectively. Although these approximation
methods do not boast the five digit precision of those considered above, they are far more widely
applicable and give us a more complete approximate description of the motion. The simple harmonic
oscillator can be solved completely in terms of the well-known trigonometric functions sine and
cosine, which allows it to be used to approximately answer many questions about more complicated
motion. This idea is often used by physicists and engineers to get an idea of what to expect from a
detailed calculation before actually undertaking the calculation, and where to look for interesting
phenomena when more accurate models are not available. It also gives accurate information about the
dependence of the period on the parameters associated with the potential function, something that is
not as easily accessible from the previous technique.
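The simple harmonic estimate is equally easy to automate. The sketch below (Python, assuming numpy and scipy) locates the minimum of the potential used above and builds the approximate period from the second derivative there; the finite-difference step is an arbitrary choice.

    import numpy as np
    from scipy.optimize import minimize_scalar

    m = 3.0
    U = lambda x: x**4 - 3*x**3 - 10*x**2 + 24*x

    # Locate the local minimum near x = -2 and estimate k = U''(x0) with a
    # centered finite difference.
    x0 = minimize_scalar(U, bounds=(-3.0, 0.0), method="bounded").x
    h = 1e-4
    k = (U(x0 + h) - 2*U(x0) + U(x0 - h)) / h**2

    T = 2*np.pi*np.sqrt(m/k)
    print(x0, k, T)    # about -1.93536, 59.7841, 1.4075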

¹¹⁷ The period, for example, can be obtained using much more elementary techniques than those indicated here.

Figure 6
One very prominent example of the use of simple harmonic motion to approximate a more
complicated problem is that of the pendulum. Consider a mass m hanging on a string of length L from
the ceiling. There is clearly a potential minimum when the mass is hanging straight down, as the mass
will oscillate about this position if displaced. Taking the zero of potential energy at the minimum and
using the gravitational potential energy function U = mgh gives the potential function
U = mgL(1 − cos θ)
for the mass hanging at angle θ from the vertical. This function has the simple harmonic
approximation
U_S = ½ mgL θ² = (mg/2L) (Lθ)².
We write the function in this final way in order to emphasize that the angle θ cannot be interpreted
directly as x because the units are not correct. The parameter x represents the distance traversed by the
object, so must be given by Lθ. Once this minor technicality is resolved, we can immediately
identify k = mg/L and determine the approximate simple harmonic motion period as T = 2π√(L/g).
This is a very useful result, valid whenever the maximum angle of oscillation is less than about 30
degrees. Figure 7 illustrates the exact potential along with its simple harmonic approximation; it is
clear from the figure that the two functions agree quite well for small angles, but deviate substantially
at larger angles. This result is very useful in finding the length of a pendulum. According to our
above result, the length is given approximately by
L = gT²/(4π²)
in terms of the period of motion, provided that the maximum angle is small enough. This can be used
to determine the height of a ceiling with something hanging from it simply by making it swing a little
bit and timing the period of its motion. The larger the swings are, the more inaccurate the length will
be. The fact that g ≈ π² in SI units makes this approximation especially useful, as it gives L ≈ T²/4.
The length is given in meters when the period is given in seconds.
Figure 7
The deviation from simple harmonic motion exhibited by the pendulum is an interesting
exercise on its own. The exact period of a pendulum with maximum angle θ_max is given by
T = 2√(2L/g) ∫_0^{θ_max} dθ / √(cos θ − cos θ_max).
If θ_max = π/2, then this integral can be completed by using the beta function:
T = √(2π) (Γ(1/4)/Γ(3/4)) √(L/g) = 1.18034… × 2π√(L/g).
Thus, the period is larger by about 18% when the pendulum completes an entire circuit from horizontal
back to horizontal. This is a fairly small amount in comparison with the above results, mainly because
the higher order terms in the expansion of the cosine function are largely suppressed by the factorials.
If the maximum angle is not π/2, then the calculation is a bit more complicated. It can be given
exactly in terms of elliptic integrals (again!), but it will be more illustrative for us to get an expansion.
The form of the denominator of our integrand is fairly annoying, but it can be simplified a bit by
writing it in terms of half angles:
cos θ − cos θ_max = [1 − 2sin²(θ/2)] − [1 − 2sin²(θ_max/2)] = 2[sin²(θ_max/2) − sin²(θ/2)].

Using the half-angle substitution sin(θ/2) = sin(θ_max/2) sin φ, the integral can now be re-written as
(this really is a bit of magic)
T = 4√(L/g) ∫_0^{π/2} dφ / √(1 − sin²(θ_max/2) sin²φ).
Expanding the integrand and integrating term-by-term using the beta function, we arrive at
T = 4√(L/g) Σ_{k=0}^∞ (−1)^k (−1/2 choose k) sin^{2k}(θ_max/2) ∫_0^{π/2} sin^{2k}φ dφ
  = 4√(L/g) Σ_{k=0}^∞ (−1)^k [Γ(1/2)² Γ(k+1/2) / (2 Γ(1/2−k) (k!)²)] sin^{2k}(θ_max/2)
  = 2√(L/g) Σ_{k=0}^∞ (−1)^k sin[(k+1/2)π] [Γ(k+1/2)]² / (k!)² · sin^{2k}(θ_max/2)
  = 2π√(L/g) Σ_{k=0}^∞ [(2k−1)!!]² / [2^{2k} (k!)²] · sin^{2k}(θ_max/2)
  = 2π√(L/g) [ 1 + (1/2)² sin²(θ_max/2) + (3·1/(2²·2!))² sin⁴(θ_max/2) + (5·3·1/(2³·3!))² sin⁶(θ_max/2) + ⋯ ].

This is a very useful series, and clearly illustrates the deviation from the simple harmonic result. For
small maximum angles, sin max 2 is very small and the simple harmonic approximation works

extremely well. For example, a maximum angle of 30 degrees has sin²(θ_max/2) ≈ 0.067, so the
deviation is expected to be about 2%. The series diverges when sin(θ_max/2) = 1, for a maximum angle
of 180 degrees. This results from the physical fact that the angle 180 degrees represents an unstable
equilibrium for the (rigid) pendulum, so the speed and net force go to zero as this point is approached
and the period therefore becomes unbounded. Note all that went into the derivation of this series: the
half angle substitution, the binomial expansion, expression of the binomial coefficients in terms of the
Gamma function, the Gamma reflection identity, and reduction of the half-integral factorials to the
alternating factorial
(2n − 1)!! = (2n − 1)(2n − 3) ⋯ 3 · 1.
This is characteristic of many computations in physics and other sciences: many different properties
must be used in tandem to arrive at the desired result.
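The series is also convenient computationally. The sketch below (Python, assuming numpy and scipy) compares partial sums of the series with a direct numerical evaluation of the integral above it; the pendulum length and g are arbitrary sample values.

    import numpy as np
    from scipy.integrate import quad

    def period_series(theta_max, L=1.0, g=9.81, terms=30):
        """Partial sums of T = 2*pi*sqrt(L/g) * sum_k [(2k-1)!!/(2^k k!)]^2 s^(2k),
        with s = sin(theta_max/2)."""
        s2 = np.sin(theta_max/2.0)**2
        coeff, total = 1.0, 0.0
        for k in range(terms):
            if k > 0:
                coeff *= ((2*k - 1) / (2.0*k))**2   # ratio of successive coefficients
            total += coeff * s2**k
        return 2*np.pi*np.sqrt(L/g) * total

    def period_quadrature(theta_max, L=1.0, g=9.81):
        """Direct quadrature of T = 4*sqrt(L/g) * int_0^{pi/2} dphi / sqrt(1 - s^2 sin^2 phi)."""
        s2 = np.sin(theta_max/2.0)**2
        integrand = lambda phi: 1.0/np.sqrt(1.0 - s2*np.sin(phi)**2)
        return 4*np.sqrt(L/g)*quad(integrand, 0.0, np.pi/2)[0]

    for angle in (np.pi/6, np.pi/2):
        print(period_series(angle), period_quadrature(angle))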
Exercises for Section VII.3:

In problems 1–4, determine the full (this means multiply by two) period of the motion of a particle
with mass 3 moving under the influence of a force with the specified potential function. The object is
initially moving with speed 2 at the origin. All quantities are given in SI units. Do this by (a) cutting
the required integral off 0.1 units from both sides, (b) cutting the required integral off 0.01 units from
both sides, and (c) cutting the required integral off by a different amount on each side, depending on
which side is more important. Explain why you made the choice you made. Include a correction for
the part you cut off in each case, using only the first nonzero term in the Taylor expansion about each of
the turning points, and discuss the importance of this correction to your final result. Compare your
results to those given by a computer algebra system.
1. U(x) = 3x⁴ − 2x³ − 3x² + x

2. U(x) = x⁶ − 5x⁴ + 3x³ + 6x²

3. U(x) = x² − 3x + cos(x²)

4. U(x) = 2x² − 5x + sin(x²)

5. Re-do the analysis of problem 1, but include both the first and second terms in the Taylor
expansion about each of the turning points in your correction. Determine the period by cutting
the integral off 0.1 units and 0.01 units from each turning point. Explain the improvement
associated with this modified correction, comparing to the result of a computer algebra system.
Was it worth it?
6. Re-do the analysis of problem 2, but include both the first and second terms in the Taylor
expansion about each of the turning points in your correction. Determine the period by cutting
the integral off 0.1 units and 0.01 units from each turning point. Explain the improvement
associated with this modified correction, comparing to the result of a computer algebra system.
Was it worth it?
7. The Lennard-Jones, or 6-12, potential, named for the British mathematician and physicist Sir
John Edward Lennard-Jones, is used extensively to approximate the weak attraction and
repulsion experienced by molecules in a gas or, more rarely, liquid. It is given by
U(r) = A/r¹² − B/r⁶,
where A and B are constants and r is the distance between the centers of the molecules.
(a) Determine the equilibrium distance r0 between the molecules.
(b) Re-write the expression for the Lennard-Jones potential in terms of the scaled variable
ξ = r/r₀. Show that the potential can be written in the form
U = V₀ (ξ⁻¹² − 2ξ⁻⁶),

and determine the value of V0 in terms of A and B.


(c) Determine the simple harmonic approximation to the Lennard-Jones potential, and find the
associated period.
(d) Use a computer algebra system to compute the period of motion associated with the
Lennard-Jones potential for total energies E = −(1 − ε)V₀, with ε = 0.1, 0.01, and 0.001.
Compare your results to that associated with the simple harmonic approximation.
8. Consider the motion of an object with mass 3 moving under the influence of the potential
U(x) = x²(x + 2)(1 − x).
All quantities are given in SI units.
(a) Determine the simple harmonic approximation to this potential and the associated period of
motion.
(b) Suppose that the total energy of the object is 0.3. Use a computer algebra system to
determine the period of motion. Is it close to that obtained in the simple harmonic
approximation? Explain why or why not.
(c) Suppose that the total energy of the object is 0.05. Use a computer algebra system to
determine the period of motion. Is it close to that obtained in the simple harmonic
approximation? Explain why or why not.
9. This problem concerns some properties of the elliptic integral of the first kind,
K(m) = ∫_0^{π/2} dφ / √(1 − m sin²φ).
(a) Show that this integral is real whenever m < 1. Further, show that the integral satisfies the
relation
K(m/(m − 1)) = √(1 − m) K(m)
whenever m < 1. Note that the integral does not change if the sine function is replaced by
the cosine function. Explain why this relation allows us to express all elliptic integrals with
negative argument in terms of those with 0 < m < 1.
(b) Show that the elliptic integral will have both a real and an imaginary part whenever m > 1.
Further, show that these parts are given by
Re K(m) = ∫_0^{arcsin(1/√m)} dφ / √(1 − m sin²φ)  and  Im K(m) = ∫_{arcsin(1/√m)}^{π/2} dφ / √(m sin²φ − 1).
(c) Show that the real part of the elliptic integral is given by
Re K(m) = (1/√m) K(1/m)
by employing the substitution sin ψ = √m sin φ in the associated integral. What motivated
this substitution? Note that you will have to find an expression for cos φ in order to
complete the substitution; this can be determined by using a right triangle.
(d) Show that the imaginary part of the elliptic integral is given by
Im K(m) = K(1 − m)
by employing the substitution m sin²φ − 1 = (m − 1) sin²ψ in the associated integral. What
motivated this substitution? Note that you will have to find an expression for both cos φ
and sin φ in order to complete the substitution.
(e) Use your results from parts (c) and (d) in conjunction with that from part (a) to write the
elliptic integral with m > 1 in terms of elliptic integrals whose arguments lie between 0 and
1. Why would this result be useful? Use it to write the result of the integral given in the text
in terms of a single elliptic integral whose argument lies between 0 and 1.
Section VII.4: Constant Coefficient Differential Equations

While the second integral is very useful in giving us the period of motion for an object moving
effectively in one dimension, it is only applicable when the relevant forces are conservative. It also
often leads to complicated integrals which must be completed numerically, so is difficult to use for
analytic results. We motivated the need for the second integral by our desire to avoid differential
equations of the second order. This is certainly a worthwhile pursuit, as such differential equations can
be very difficult to solve especially if they are nonlinear.118 In some cases, however, this path is
actually much simpler and leads to more general results. The shining example of this is second order
differential equations with constant coefficients.
Consider a mass moving under the influence of a spring force. This force pulls the object back
to its equilibrium with strength proportional to its displacement from equilibrium. Newton's second
law indicates
ma = mẍ = −kx,
or
mẍ + kx = 0.
Here, the double dot over x indicates a second derivative with respect to time. It is similar to the
double prime common in introductory calculus representing a second derivative with respect to
position, and is used to emphasize the difference between derivatives with respect to time and
derivatives with respect to position. This is a second order differential equation of the kind we were
trying to avoid with the first and second integrals.
The coefficients of this differential equation are constant, so the equation essentially is asking
us to find a function whose second derivative is proportional to itself:
ẍ = −(k/m) x.
This is a very special requirement, and only a small class of functions can satisfy it. It turns out that
the only functions that can satisfy this requirement are sums of functions of the form ce^{rt}, with c and r
constant. This can be proved using something called the Wronskian, which is discussed in chapter 11.
The essential idea is that solutions to this equation form a linear vector space of dimension two. If we
can find two linearly independent solutions, then we can span the whole space and need look for no
more. Our thought that the function ce^{rt} has the right properties to satisfy this differential equation
leads us to try it by substituting it into the equation. In this process, we use the mathematical term
ansatz, meaning "starting point" in German. In mathematics, it is almost always used in the place of
"educated guess," as the latter is less formal. Our ansatz gives
ce^{rt}(mr² + k) = 0,
which, on recognition that e^{rt} ≠ 0, implies r = ±i√(k/m). Therefore, the general solution of our
differential equation is given by

¹¹⁸ This means that the function we are solving for appears in the differential equation as a square, like x², or in another
manner that is not linear.

x(t) = c₁ e^{i√(k/m) t} + c₂ e^{−i√(k/m) t}
or
x(t) = a₁ cos(√(k/m) t) + a₂ sin(√(k/m) t),

where the a's and c's are arbitrary constants. This second form is given for the convenience of
avoiding imaginary numbers in our solution; it is not necessary, but has many useful properties.
Among these are the facts that x(0) = a₁ and ẋ(0) = √(k/m) a₂, which are very useful in determining the
values of these arbitrary constants. The solution given above describes every situation in which a mass
moves under the influence of a force proportional to its displacement from equilibrium. The constants
distinguish between the different physical situations described by initial conditions. The motion of an
object will be different if it begins at a different place or with a different speed, so a complete solution
of the physical problem definitely requires this information. It is a general fact that a specific solution
to a differential equation of second order requires two initial conditions, the two being directly
related to the fact that the differential equation is of second order. This is also directly related to the
fact that there are two independent solutions to the differential equation, each associated with its own
constant to be determined by an initial condition. An object beginning at rest at position A will move
according to the function
x(t) = A cos(√(k/m) t)
as time goes on, while one beginning at the equilibrium position with velocity v₀ moves according to
the function
x(t) = v₀ √(m/k) sin(√(k/m) t).

These statements are also reflected in more complicated systems: there is almost always a solution that
is 1 at t = 0 and whose derivative is 0 at t = 0 and another solution that satisfies the opposite
requirements. These solutions are the most useful to employ if one is interested in determining the
combination that gives a specified initial position and velocity.
A general linear differential equation with constant coefficients can be solved in the same
manner, using the same ansatz. The rationale behind this ansatz is that the only way to cancel the
functional form of all of the derivatives is to ensure that the solution has derivatives that are
proportional to itself. In general, the differential equation
d 2x
dx
a 2 b cx 0
dt
dt
has the solution x c1e r t c2 e rt , with

b b 2 4ac
2a
from the quadratic formula. The constants c1 and c2 are determined from initial conditions. If, as is
usually the case, the constants a, b, and c are all real, then there are three distinct cases. When
b 2 4ac , both of the roots are real and the solutions will be exponentially increasing or decreasing
depending on the value of b and the signs of a and the product ac. This situation is often referred to
as overdamped, as an oscillator under its influence will approach its equilibrium exponentially without
any overshoot. Think of a jeep with bad shocks going over a speed bump too quickly: overdamped.
The form of the solutions given above is fine, but it does not have the nice structure exhibited
by the trigonometric functions above. Both of the constants contribute to the initial position and the
initial velocity, requiring us to solve a system of equations to determine their values given a set of
initial conditions. To remedy this unfortunate situation, we introduce the hyperbolic functions

sinh x = (eˣ − e⁻ˣ)/2  and  cosh x = (eˣ + e⁻ˣ)/2.
These functions are defined in a manner analogous to the circular functions
sin x = (e^{ix} − e^{−ix})/(2i)  and  cos x = (e^{ix} + e^{−ix})/2,
but without the i's. It is clear from their definition that
sinh 0 = 0,  cosh 0 = 1,
(d/dx) sinh x = cosh x,  (d/dx) cosh x = sinh x,
and
cosh²x − sinh²x = 1.
This last relation embodies the reason why these functions are referred to as hyperbolic functions:
while the circular functions sine and cosine satisfy the Pythagorean identity causing them to lie on a
circle, these functions satisfy this new identity and therefore lie on a hyperbola. In terms of these
functions, the general solution of our differential equation is given by

x(t) = e^{−bt/2a} [ a₁ cosh(√(b² − 4ac) t / (2a)) + a₂ sinh(√(b² − 4ac) t / (2a)) ].

Evaluating at t = 0, it is clear that a₁ is the initial displacement from equilibrium. The initial velocity
is determined by taking the derivative of this expression with respect to t and evaluating at t = 0. This
looks complicated, but it isn't really that bad if we proceed carefully. Since we are evaluating at t = 0,
the derivative of the hyperbolic cosine function cannot contribute. Similarly, we must take the
derivative of the hyperbolic sine function in order to obtain a contribution from a₂. This allows us to
write
ẋ(0) = (√(b² − 4ac) / (2a)) a₂ − (b / (2a)) a₁,
so the solution appropriate to a mass that has initial displacement A and initial velocity v₀ is given by
x(t) = e^{−bt/2a} [ A cosh(√(b² − 4ac) t / (2a)) + ((2a v₀ + bA) / √(b² − 4ac)) sinh(√(b² − 4ac) t / (2a)) ].

If the initial velocity is zero, then this reduces to

x(t) = A e^{−bt/2a} [ cosh(√(b² − 4ac) t / (2a)) + (b / √(b² − 4ac)) sinh(√(b² − 4ac) t / (2a)) ].

If b² < 4ac, then there will be an imaginary contribution to both roots. Imaginary exponents
should always make you think of the oscillatory functions sine and cosine. In this case, the solutions
will oscillate around the equilibrium, approaching it asymptotically if b/a > 0 and moving away from
it if b/a < 0. In this case, you should think of a jeep with weak shocks going over a speed bump: the
jeep oscillates up and down a few times after going over the speed bump before settling back to its
equilibrium. This situation is referred to as underdamped. We will again jettison our exponentials in
favor of the symmetric and antisymmetric functions sine and cosine. We could re-do the analysis as
above, replacing the hyperbolic functions with the circular ones and writing the square root as

√(4ac − b²) instead to make it real, but it will be instructive for us to determine the new solution by
simply transforming the old one. Since b² − 4ac < 0, we will write it as b² − 4ac = (4ac − b²) e^{iπ}.
Taking the square root therefore gives
√(b² − 4ac) = √(4ac − b²) e^{iπ/2} = i√(4ac − b²).
This corresponds to taking the principal value of √(b² − 4ac), with the +i rather than the −i. Looking at
our definitions for the hyperbolic functions, we see that cosh(ix) = cos x and sinh(ix) = i sin x.
Therefore, we have the general solution

x(t) = e^{−bt/2a} [ A cos(√(4ac − b²) t / (2a)) + ((2a v₀ + bA) / √(4ac − b²)) sin(√(4ac − b²) t / (2a)) ],

and
x(t) = A e^{−bt/2a} [ cos(√(4ac − b²) t / (2a)) + (b / √(4ac − b²)) sin(√(4ac − b²) t / (2a)) ]

if the object is initially at rest. This technique of using one solution to derive another allows us to
avoid re-inventing the wheel, and is very useful in many situations. It is easy to see that these
solutions reduce to those given above when b = 0.
The third case is the middle ground between these two. It is what automobile manufacturers
are aiming for when they design shocks, the Goldilocks case of critical damping. This case is
obviously characterized by b² = 4ac, and our solutions coincide. It seems that there is only one
solution for critical damping. This is unacceptable because one solution does not carry the structure
necessary to satisfy two independent initial conditions. There must be another solution in this case.
There are many ways in which we can determine this solution, but the easiest from our standpoint is to
simply take the limit of our case one (or case two; it doesn't matter) solution as b² → 4ac:
x(t) = e^{−bt/2a} [ A + ((2a v₀ + bA)/(2a)) t ]   or   x(t) = A e^{−bt/2a} [ 1 + (b/(2a)) t ].

Clearly, the second solution is given simply by multiplying the first by t: t e^{−bt/2a}. There is a much
more general reason underlying this fact, involving something called a generalized eigenvector when
the eigenvalues of an operator are degenerate, but we will not go into this here. We will have a bit
more to say about this later when we discuss linear algebra and what eigenvalues and eigenvectors are,
but a full treatment must wait until a differential equations course.
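The three cases can be collected into a single routine. The following Python sketch (assuming numpy) evaluates the solution of a ẍ + b ẋ + c x = 0 for a given initial displacement and velocity, using the cosh/sinh, cos/sin, and critically damped forms above; the sample coefficients at the bottom are arbitrary.

    import numpy as np

    def x_of_t(t, a, b, c, A, v0):
        """Solution of a x'' + b x' + c x = 0 with x(0) = A, x'(0) = v0."""
        disc = b*b - 4*a*c
        decay = np.exp(-b*t/(2*a))
        if disc > 0:                                   # overdamped
            w = np.sqrt(disc)/(2*a)
            return decay*(A*np.cosh(w*t) + (2*a*v0 + b*A)/np.sqrt(disc)*np.sinh(w*t))
        elif disc < 0:                                 # underdamped
            w = np.sqrt(-disc)/(2*a)
            return decay*(A*np.cos(w*t) + (2*a*v0 + b*A)/np.sqrt(-disc)*np.sin(w*t))
        else:                                          # critically damped
            return decay*(A + (2*a*v0 + b*A)/(2*a)*t)

    t = np.linspace(0.0, 10.0, 5)
    # Hypothetical numbers: a = 1, c = 1 and three different damping strengths.
    for b in (0.5, 2.0, 3.0):
        print(x_of_t(t, 1.0, b, 1.0, A=1.0, v0=0.0))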
In order to fully discuss the character of our solutions, it is useful to have a specific physical
problem in mind. There are several examples, but the most prominent are associated either with the
motion of a mass under both a restoring force (a spring force) and a resistive force or with a circuit that
includes a resistor, a capacitor, and an inductor. Let's begin with the former, as it can easily be
imagined and is therefore simpler to understand. Consider an object attached to a spring moving
through a viscous medium like air or water. As the object moves, it must push the viscous medium out
of the way. This force imposed on the medium induces a counter-force on the object in accordance
with Newton's third law, called a resistive or drag force. Obviously, this force is directed opposite
the velocity of the object and acts to slow the object down, whatever direction it is moving in. If the
speed of the object is very small and the medium is very viscous, then it can be shown that the
magnitude of the resistive force is approximately proportional to the speed of the object. It is
important to note that this expression is not valid for objects moving quickly through a medium or
when the motion generates streamlines. The formation of streamlines is highly dependent on the shape
of the mass moving through the medium, so it is difficult to give a standard speed of motion for which
this approximation will be valid. Motion at higher speeds results in a resistive force that is
approximately proportional to the square of the speed, which complicates the analysis immensely.
General motion of a mass through a fluid medium is dictated by the celebrated Navier-Stokes
equations, named for the French physicist Claude-Louis Navier and the British physicist and
mathematician George Gabriel Stokes. These are nonlinear differential equations, often considered to
be among the most difficult equations in physics to obtain solutions for. Even numerical analysis of
these equations often runs into difficulty.
Assuming that the resistive force is proportional to the velocity, we write Newton's second law
as
mẍ = −bẋ − kx,
or
mẍ + bẋ + kx = 0
for an object of mass m moving through a medium with resistive coefficient b. This is a second order
differential equation with constant coefficients, exactly of the type considered above. Its solutions are
therefore given by
x(t) = A e^{−bt/2m} [ cos(√(4km − b²) t / (2m)) + (b / √(4km − b²)) sin(√(4km − b²) t / (2m)) ]


in the underdamped case and related expressions in the other two cases, given that the initial velocity is
zero. To analyze the character of these solutions, it is helpful to introduce the parameters β = b/(2m)
and ω² = k/m:
x(t) = A e^{−βt} [ cos(√(ω² − β²) t) + (β/√(ω² − β²)) sin(√(ω² − β²) t) ].

The parameter β can be thought of as a mass-reduced damping coefficient, and ω is the angular
frequency in the absence of damping. The oscillations can be seen to depend only on the angular
frequency ω and the dimensionless ratio β/ω by re-writing this expression as
x(t) = A e^{−(β/ω)·ωt} [ cos(√(1 − (β/ω)²) ωt) + ((β/ω)/√(1 − (β/ω)²)) sin(√(1 − (β/ω)²) ωt) ].

Since the angular frequency always appears multiplied by the time t, we can think of x as a function
of ωt instead of just t and characterize our solutions entirely in terms of the value of the ratio β/ω. This
process of rescaling parameters is extremely useful in many branches of science, as it allows us to
boil down the dependence of a physical quantity on several parameters to a more approachable form.
It is clear from our solution, for example, that the motion has angular frequency ω when β/ω = 0 and the
angular frequency decreases continuously to 0 as β/ω → 1 (critical damping). When the ratio exceeds
1, the oscillations disappear and are replaced with exponential decay. This behavior is illustrated in
figure 8, where the solid line indicates β/ω = 0 and the dashed lines indicate β/ω = 0.4, 0.8, 1, 1.2, and 1.6,
in order of decreasing dash size. Note that the first two graphs show oscillations with decreasing
frequency. The oscillations cease at the critical damping value of the coefficient, after which the
approach to equilibrium becomes slower and slower. This behavior can be understood easily in
terms of the ratio β/ω = b/(2√(km)), as large values of this ratio imply that the velocity of the mass tends to
zero. Large damping and small spring constant give us a situation in which very little force is being

applied to accelerate the mass and a large force is exerted to prevent motion. In the limit β/ω → ∞, the
mass will simply sit at its original position, unable to move because of the overwhelming resistive
force.
Figure 8
It is impossible to have exponential increase in the situation considered above because all three
of the constants m, b, and k are necessarily positive. We can arrange a physical situation in which k is
negative for a time by, for example, considering the mass in motion about an unstable equilibrium.
One example of such a situation is that of a positive charge sitting in close vicinity to two other
positive charges arranged symmetrically about it. We can also imagine orchestrating a resistive force
that pushes in the same direction as the velocity, effectively making b negative. In either of these
situations, the position of the mass will move away from the equilibrium exponentially with time.
These are exotic physics situations, difficult to achieve for any significant amount of time, but they are
certainly not inaccessible to the mathematics. This is an important idea to understand: the mathematics
contain everything that can be possible, while physical solutions are limited to what actually is
possible.
The other prominent example of a second order differential equation with constant coefficients
is that of a circuit containing a resistor, a capacitor, and an inductor. This differential equation is
derived by using Kirchhoff's loop law, named for the German physicist Gustav Kirchhoff who
introduced it in 1845, in conjunction with the known potential difference induced across a resistor with
resistance R carrying current I, V = IR, that induced across a capacitor with capacitance C carrying
charge Q, V = Q/C, and that induced across an inductor with inductance L carrying current I whose
rate of change with time is given by İ, V = Lİ. Kirchhoff's loop law gives
Q/C − IR − Lİ = 0.
On recognizing that current represents a decrease in the charge held by the capacitor, I = −Q̇, we can
re-write this as
LQ̈ + RQ̇ + Q/C = 0.
Once again, we have a second order differential equation with constant coefficients. All of the coefficients are positive, so this situation is mathematically identical to that discussed above. All we need to do is identify the constants $\beta = R/2L$, $\omega = 1/\sqrt{LC}$, and $\gamma = \tfrac{1}{2}R\sqrt{C/L}$. The physical situation
in this case is somewhat different, but can be understood by carefully going through what is happening
in the circuit. If the capacitor initially contains some charge, it is storing energy. As the charge leaves
one side of the capacitor in favor of the other, it generates a current through the circuit. As the current
begins to run, the charge held by the capacitor decreases, reducing its effect on the circuit. The
inductor acts as a potential source (a battery) when the current running through it changes. Initially,
the current changes very quickly from 0 to a finite amount. As the current continues to flow, the
charge on the capacitor decreases until it is no longer able to drive the current. However, there is still
current flowing because the current cannot change discontinuously. This current begins to charge the
capacitor in the opposite sense (the positive charge switches sides) as the charge on the capacitor
passes through zero. This changes the "directive" issued by the capacitor on which direction the current should flow, which causes the current that is already flowing to change its magnitude, making the inductor act to keep the current as it is. This process continues until the capacitor's influence outweighs that of the inductor and the current briefly becomes zero. At this point, the capacitor takes over again and runs the current in the opposite direction in order to regain its equilibrium, and the process is repeated. The job of a resistor in a circuit is to use energy. On moving through the resistor, the
current deposits some of its energy. If the resistor takes only a small amount of energy, then the
current will oscillate in the manner indicated above. If, on the other hand, the resistor takes a large
amount of energy, then there will not be enough energy left to charge the capacitor in the other
direction. The resistor is forcing the inductor to provide a smaller potential difference, essentially
undercutting the oscillation process described above. This indicates the different cases of
underdamped and overdamped behavior, in exactly the same manner as that described above. One very important aspect of this analysis is that the oscillation of the charge in an LRC circuit actually generates electromagnetic waves, which propagate out from the circuit and can be measured.
By adjusting the values of the capacitance and inductance of the circuit, one can achieve a large range
of frequencies for these electromagnetic waves. This was the technique used in 1887 by the German
physicist Heinrich Hertz to demonstrate the existence of electromagnetic waves, measure their speed,
and demonstrate their reflective and refractive properties. It remains an important technique used to
experimentally generate electromagnetic waves.
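The correspondence between the circuit and the mechanical oscillator is easy to automate. Below is a minimal sketch (the component values are arbitrary illustrations, not taken from the text or the exercises):

```python
import math

def rlc_damping(R, L, C):
    """Map an RLC circuit onto the oscillator parameters beta = R/2L, omega = 1/sqrt(LC), gamma = beta/omega."""
    beta = R / (2.0 * L)
    omega = 1.0 / math.sqrt(L * C)
    gamma = beta / omega
    if gamma < 1.0:
        kind = "underdamped (oscillating)"
    elif gamma > 1.0:
        kind = "overdamped"
    else:
        kind = "critically damped"
    return beta, omega, gamma, kind

# Arbitrary illustrative values: R = 100 ohms, L = 1 mH, C = 1 microfarad.
print(rlc_damping(R=100.0, L=1e-3, C=1e-6))
```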

Exercises for Section VII.4:


In problems 1-4, a mass of 3 kilograms moves under the influence of a spring force whose spring
constant is 12 Newtons per meter. Determine the ensuing motion of the mass under the damping
coefficient b and initial conditions given in the problem. Tell whether the motion is underdamped,
overdamped, or critically damped, and why you came to this conclusion. All quantities are given in SI
units.
1. $b = 0$; $x(0) = 2$, $\dot x(0) = 1$.

2. $b = 3\sqrt{15}$; $x(0) = 1$, $\dot x(0) = 2$.

3. $b = 12$; $x(0) = 4$, $\dot x(0) = 3$.

4. $b = 2\sqrt{37}$; $x(0) = 3$, $\dot x(0) = 4$.

5. You are interested in repeating Hertz's experiment to demonstrate that electromagnetic waves exist. You intend to do this by using an RLC circuit driven by a capacitor that is initially charged. Your capacitor has a capacitance of 2 pF (this prefix means $10^{-12}$), and you cannot build a circuit whose resistance is smaller than 30 $\Omega$. What inductance should your inductor have in order to produce a wave with frequency $7 \times 10^{10}$ Hz? You may simplify this problem
by assuming that the inductance is not small enough to cause the resistance to significantly
impact the frequency; show that this is the case once you have obtained your solution. How
long will it take for the amplitude of the ensuing oscillations of current in your circuit to drop
below half their initial value? About how many times will the signal have oscillated in this
amount of time?
Section VII.5: Summary of Chapter VII


This chapter is focused on the uses of mathematics to determine workable models for physical
phenomena. We began with fits of data to a given mathematical function. This is a very delicate
procedure, and one that finds immense use in almost every physical application of mathematics. We
need to keep in mind both the idea of errors associated with physical measurements and the idea that the model we are using is not expected to be exact in any physical application when assessing the applicability of a given model. Our determination of parameters associated with the model will inherit these errors, and it is important for us to properly take these errors into account, especially when using the result of the model to extrapolate to physical regions not treated in the experimental results.
Deviations from the model can represent ordinary statistical fluctuations or new physics that was
unanticipated by the model, and it is important for us to be able to judge which of these two
possibilities is actually represented by a given deviation from the data. This process is very tricky, and
quite subjective at times. We must have a strong arsenal of different approaches to the data, including
error-weighted fits and fits to similar models, in order to properly make this judgement call.
Ultimately, of course, the decision falls to the data. If the data shows a strong deviation from the
model at a given place, more accurate experiments must be commissioned near this place in order to
make a better decision on whether the deviations are statistical or represent something new.
If a physical model indicates that an object of interest ought to be acting under the influence of
a conservative net force, then we can use the results of conservation of energy to completely determine
the motion whenever we are given initial conditions that make the solution unique. We can do this
with any one-dimensional potential function, even many of those with discontinuities, so it represents a
tremendous achievement in mathematical physics. One very interesting property of motion under the
influence of a given potential function is the oscillation it exhibits about a potential minimum. This technique allows us to determine the period of any one-dimensional motion about any potential
minimum as long as we are willing to take some very tough integrals. These integrals can usually be
taken only numerically, so we need to regularize the divergence associated with the zero speed at the
turning points.
In order to treat more general motion, not resulting from the influence of only conservative
forces, the first integral is no longer available and we need to solve second order differential equations.
This can be accomplished in simple cases, when the restoring force is linear in position and the
resistive force is linear in velocity, but it is very difficult to accomplish in more general cases. An
example is given in the next chapter. In the simpler case, one finds a second order differential
equation with constant coefficients. Such differential equations are notoriously easy to solve, as the
same ansatz always works. In the case of damped simple harmonic motion, this treatment leads to
three cases: underdamped, overdamped, and critically damped. The difference between these cases is
very important in many physical applications, and supplies at least qualitative information about the
behavior of more complicated systems whose resistive force is proportional to the velocity. Systems whose resistive force is proportional to a power of the speed are more complicated still. They often require numerical solutions, which must be done carefully in the vicinity of
any turning points of the motion. There are ways of determining the value of a solution to a given
differential equation in an extremely accurate manner, as long as we are not interested in asymptotic
results or results near a singularity. These situations require a more delicate analysis that is beyond the
scope of this text.
The problem is solved completely in one dimension, but there are other degrees of freedom we
need to pay attention to in order to fully establish motion in multiple dimensions. Even without a
resistive force, multidimensional motion requires a slightly different analysis. One simple illustration
of this technique is the topic of the next chapter.
Chapter VIII
Math in Space
A general study of orbits involves many concepts of physics, and provides an imaginative
introduction to the use of logic and mathematics to tackle a physical problem. The purpose of this
chapter is to introduce the major concepts and manipulations that come up in the study of gravitational orbits; it can be seen as a specific application of the material presented in chapter 7. We will discuss
the dynamics of circular motion, the concept of energy as applied to gravitating systems, conservation
of angular momentum, and effective potentials. These topics emphasize the role of vectors in physical
analysis and illustrate many important techniques required to fully interpret the meaning of equations
governing physical systems.

Section VIII.1: Circular Motion and the Universal Law of Gravitation


In popular culture, the word "accelerate" is used to mean speed up and the word "decelerate" is used to mean slow down. These meanings are inconsistent with the definition of the term acceleration in physics, leading to many misunderstandings, especially when one considers motion along a curved path. The acceleration of an object is defined as the rate of change of its velocity:
$$\vec a = \frac{d\vec v}{dt}.$$
Velocity is a vector, so changes in either its magnitude or its direction both indicate changes in the
velocity. The velocity of an object moving at constant speed around a circular path clearly changes its
direction as the circle is traversed, so such an object is obviously accelerating even though its speed is
constant.
The acceleration of an object traversing a circle of radius r at constant speed v can be determined directly from the definition. An object moving at constant speed counterclockwise around a circle of radius R that initially is located at (R, 0) whose angular speed is given by $\omega = d\theta/dt$ has the position vector
$$\vec r(t) = R\left(\hat i \cos\omega t + \hat j \sin\omega t\right).$$
The velocity and acceleration of this object can easily be obtained as functions of time simply by differentiating:
$$\vec v(t) = R\omega\left(-\hat i \sin\omega t + \hat j \cos\omega t\right)$$
$$\vec a(t) = -R\omega^2\left(\hat i \cos\omega t + \hat j \sin\omega t\right).$$
The speed of the object is clearly given by $v = R\omega$, and its acceleration is directed toward the center of the circle with magnitude $R\omega^2$ despite the fact that its speed is constant.
most misunderstood facts about acceleration: acceleration implies that the velocity is changing, not that
the speed is changing. General circular motion, with angle $\theta(t)$ that is allowed to depend on time in an arbitrary manner, yields the acceleration¹¹⁹

¹¹⁹ I am using the notation $\hat u$ to represent a vector parallel to $\vec u$ whose magnitude is 1. This notation is often seen in mathematics, and almost always seen in physics.


$$\vec a(t) = R\ddot\theta(t)\left(-\hat i \sin\theta(t) + \hat j \cos\theta(t)\right) - R\dot\theta^{\,2}(t)\left(\hat i \cos\theta(t) + \hat j \sin\theta(t)\right) = R\ddot\theta(t)\,\hat v - R\dot\theta^{\,2}(t)\,\hat r.$$
This general acceleration contains a component parallel to the velocity and another perpendicular to the velocity. The part parallel has magnitude $R\ddot\theta$, and represents speeding up or slowing down, while the part perpendicular has magnitude $R\dot\theta^{\,2}$ and represents the change in direction of the object. An
object moving in a circle must change its direction of motion in order to remain on the circle, even if
its speed remains the same. As with all changes in motion, this requires an acceleration. The
acceleration required to remain on a circular path at constant speed is popularly referred to as the
centripetal acceleration, and its magnitude is given by $v^2/r$.
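As a quick numerical check of these statements (a minimal sketch; NumPy and the particular values of R and $\omega$ are assumptions of convenience), one can differentiate the position of an object in uniform circular motion twice and confirm that the magnitude of the acceleration is $R\omega^2 = v^2/R$:

```python
import numpy as np

R, omega = 2.0, 3.0                                    # arbitrary radius (m) and angular speed (rad/s)
t = np.linspace(0.0, 2.0 * np.pi / omega, 100001)
dt = t[1] - t[0]

x, y = R * np.cos(omega * t), R * np.sin(omega * t)    # uniform circular motion
ax = np.gradient(np.gradient(x, dt), dt)               # numerical second derivatives
ay = np.gradient(np.gradient(y, dt), dt)

a_mag = np.sqrt(ax**2 + ay**2)[2:-2].mean()            # drop the one-sided endpoints
print(a_mag, R * omega**2, (R * omega)**2 / R)         # all three agree (~18 here)
```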
The orbits of planets are approximately circular, and these circular orbits are more easily
described than more general orbits. The Polish astronomer Nicolaus Copernicus formulated his theory
of planetary motion about the Sun in the early 1500s. This theory is kinematic in nature: it does not explain why the planets move around the Sun, it merely describes their motion. Subsequent detailed studies of planetary motion by the German mathematician, physicist, and astronomer Johannes Kepler in the early 1600s led to Kepler's three laws of planetary motion, but these were still based just on
observations of motion rather than a dynamic theory of why the observed motion occurred. The
understanding that uniform circular motion requires acceleration coupled with the idea that
acceleration requires force led Newton to postulate his universal law of gravity in 1687.
It has, of course, been long known that objects fall toward the Earth. Experiments performed
by the Italian mathematician Giambattista Benedetti in the 1550s showed that objects fall to Earth at
approximately the same rate, regardless of their mass. The Earth therefore puts a force on all objects at
or near its surface that is equal to the mass of the object times a constant acceleration, called the
acceleration due to gravity. Newton and other scientists wondered how this gravitational influence
varies as the distance from the Earth's surface is increased. It seemed unnatural that the acceleration
due to gravity should remain constant up to distances as far away as the Moon or the other planets. On
the other hand, it seemed equally unnatural that the acceleration is constant out to a certain distance
and drops immediately to zero beyond. The prominent British mathematician and physicist Sir Isaac
Newton postulated a gravitational acceleration that starts out at the value measured at the Earth's surface and gradually drops to zero as the distance from Earth's center increases. Specifically, Newton postulated an inverse-square law force so that the acceleration due to Earth's gravity experienced by an
object a distance r from the center of the Earth is given by
$$g(r) = \left(\frac{R_E}{r}\right)^2 g_E.$$
In this expression, $R_E \approx 6370$ km is the radius of the Earth and $g_E \approx 9.8\ \mathrm{m/s^2}$ is the acceleration due to gravity at the surface of the Earth. The small g is approximately constant and leads to a gravitational force of magnitude mg directed toward the center of the Earth on all objects of mass m lying close to the surface of the Earth, so all such objects experience the same acceleration regardless of their mass. This was established by the Italian physicist, astronomer, and father of modern science Galileo Galilei in his study of the kinematics of motion. It leads to the potential energy function $U_g = m g_E h$, where h is the height of an object above the surface of the Earth.

Newton's acceleration due to gravity has the nice property that it leads to an approximately constant result of $g_E$ for objects whose distance from the surface of the Earth is much smaller than the Earth's radius, but the acceleration experienced by objects much farther away is much smaller. The specific inverse square form was chosen so that the theory is consistent with Kepler's third law of planetary motion: the cube of the radius of a planet's orbit is proportional to the square of its period.
This relation can be derived from the inverse-square law by setting the acceleration due to gravity
experienced by a far away body equal to the acceleration required for it to move in a circular orbit:
$$\frac{v^2}{r} = \left(\frac{R_E}{r}\right)^2 g_E \quad\Longrightarrow\quad v^2 = \frac{R_E^2 g_E}{r}.$$
Since speed is just distance traveled over time taken, we can also write
$$v = \frac{2\pi r}{T}$$
for a body in a circular orbit of radius r about the Earth that takes a time T to complete one orbit.
Squaring the second equation and substituting it into the first yields
$$4\pi^2 r^3 = R_E^2 g_E T^2,$$
or
$$r^3 \propto T^2$$
since everything else is constant.
The Moon, for example, has a period of about 27.3 days $= 2.36 \times 10^6$ seconds. Plugging in the radius of the Earth $R_E = 6.38 \times 10^6$ m and $g_E = 9.8\ \mathrm{m/s^2}$ gives us the radius and speed of the Moon's orbit:
$$r \approx 3.83 \times 10^8\ \mathrm{m}; \qquad v \approx 1021\ \mathrm{m/s},$$
in excellent agreement with independent measurements of these quantities. These same relations
follow for satellites in circular orbit around the Earth.
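A minimal sketch reproducing this calculation (plain Python; the input values are the ones quoted above):

```python
import math

R_E, g_E = 6.38e6, 9.8                     # Earth's radius (m) and surface gravity (m/s^2)
T = 27.3 * 24 * 3600                       # Moon's orbital period in seconds (~2.36e6 s)

# 4*pi^2*r^3 = R_E^2 * g_E * T^2, then v = 2*pi*r/T
r = (R_E**2 * g_E * T**2 / (4.0 * math.pi**2)) ** (1.0 / 3.0)
v = 2.0 * math.pi * r / T
print(f"r = {r:.2e} m, v = {v:.0f} m/s")   # ~3.83e8 m and ~1020 m/s, as quoted in the text
```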
The two important facts about circular orbits are (1) the square of the speed is inversely
proportional to the radius of the orbit and (2) the square of the period is proportional to the cube of the
radius. Circular orbits are very simple, and we will be using them later to define a natural set of units
for a planetary system in order to simplify the equations for general orbits.
This idea of Newton's that the gravitational influence of the Earth follows an inverse square law has repercussions that may not be obvious. An object moving under no other influence than that of the Earth will accelerate at the rate
$$g(r) = \left(\frac{R_E}{r}\right)^2 g_E$$
toward the Earth, where r is the distance from the object to the center of the Earth. That means an
object of mass m a distance r away from the center of the Earth feels a force
$$mg(r) = \left(\frac{R_E}{r}\right)^2 m g_E.$$
Newton's postulate indicates that all masses in the universe are subject to this force. Thus, the Earth puts forces on all other objects in the universe. By Newton's third law, all of these objects put forces of equal magnitude on the Earth.
This is kind of a strange idea. The Earth puts forces on all other masses in the universe and all
other masses in the universe put forces on the Earth. The Earth is somehow singled out in this
approach because we know from experience that the Earth puts gravitational forces on us. On the other
hand, it is somewhat hard to believe that Jupiter singles out the Earth as being special and puts a
gravitational force only on it. In fact, observations by Galileo in 1610 of small objects orbiting Jupiter
indicate that Jupiter's gravitational influence affects more objects than just the Earth. Postulating the
same form of gravitational attraction to Jupiter as we have for the Earth leads to a gravitational force
exerted on an object with mass m lying a distance r from the center of Jupiter of magnitude


$$\left(\frac{R_J}{r}\right)^2 m g_J,$$
where $R_J$ is the radius of Jupiter and $g_J$ is the acceleration due to gravity experienced on the surface of Jupiter.
Let's apply our expression for Jupiter's gravitational force to the Earth. If the mass of the Earth is $M_E$ and it lies a distance r from the center of Jupiter, Jupiter exerts a force of magnitude
$$F_{J\,\mathrm{on}\,E} = \frac{R_J^2 g_J}{r^2}\, M_E$$
on the Earth. Similarly, the Earth exerts a force of magnitude
$$F_{E\,\mathrm{on}\,J} = \frac{R_E^2 g_E}{r^2}\, M_J$$
on Jupiter. These two expressions are equal by Newton's third law, so we have
$$R_J^2 g_J M_E = R_E^2 g_E M_J,$$
or, dividing by $M_E M_J$,
$$\frac{R_J^2 g_J}{M_J} = \frac{R_E^2 g_E}{M_E}.$$
The left-hand side of this equation contains parameters that depend only on Jupiter, while the right-hand side contains parameters that depend only on the Earth. Therefore, both sides must equal a constant independent of the Earth and Jupiter:
$$\frac{R^2 g}{M} = G.$$
This constant is the universal gravitation constant. In terms of G, we have
$$F_{E\,\mathrm{on}\,J} = F_{J\,\mathrm{on}\,E} = \frac{G M_J M_E}{r^2}.$$
The above argument can be applied to any of the objects in the universe on which the Earth
exerts a gravitational force. This leads to the result that all objects in the universe exert a gravitational
force of attraction on all other objects in the universe. The force between any two masses $m_1$ and $m_2$ separated by a distance r is given by
$$F_{1,2} = \frac{G m_1 m_2}{r^2}.$$
Obviously, this is a far more universal law than that found in our discussion of orbits. Its derivation is based on a certain degree of symmetry between masses in the universe. Newton's third law requires
that all objects that feel a gravitational force also exert one. Since all objects are able to exert
gravitational forces, they must do so on all other objects.
The gravitational force between ordinary bits of matter is hardly detectable in everyday life. I
do not feel a gravitational attraction to my car, even though both my car and I have appreciable
masses. This means that the universal constant G is immensely small when expressed in terms of
moderate masses of kilograms. Only when at least one of the two masses becomes very large, say on a
planetary scale, does the gravitational force become appreciable.
Using our newfound understanding of the gravitational force, we can develop a series of ratios
for the masses of various heavenly bodies. Our equations for circular orbits can be re-expressed in
terms of the mass M of the primary as

$$v^2 = \frac{GM}{r}; \qquad 4\pi^2 r^3 = GMT^2.$$
Comparing the radius and period of a circular orbit about the Earth to the radius and period of a circular orbit around the Sun allows us to make a determination of the ratio of the Sun's mass to the Earth's mass:
$$\frac{M_S}{M_E} = \left(\frac{r_S}{r_E}\right)^3 \left(\frac{T_E}{T_S}\right)^2.$$
Comparing the Earth's orbit around the Sun, whose radius is $1.5 \times 10^{11}$ m and whose period is one year, to the Moon's orbit around the Earth, whose radius is about $3.84 \times 10^8$ m and whose period is approximately 27.3 days, leads to the ratio $M_S/M_E \approx 333{,}000$. Incorporating data about the orbits of Jupiter's moons around Jupiter allows us to compute analogous ratios for Earth and Jupiter, etc. In this way, we can obtain the relative masses of any two objects whose gravitational effects we can observe. This gives us a kind of gravitational mass scale.
In order to connect this gravitational mass scale to the ordinary everyday mass scale of
kilograms, we need to fix one mass on both scales. We need to observe the gravitational effects of a
known mass in kilograms. This is very difficult to do, because objects that are easily weighed on Earth
all have very small gravitational influences. The first measurement of the gravitational force exerted
by moderate masses was not made until 1798, when the British scientist Henry Cavendish determined
the value of G in his famous Cavendish experiment. This experiment consisted of two masses, one of
which was fixed while the other was able to rotate freely. The original experimental set-up,
engineered by the British geologist John Michell, needed to be modified by Cavendish, because of temperature variations and air currents, in order to be sensitive enough to measure the small attractive force between the two masses. He obtained a value close to the modern value of
$$G = 6.67428(67) \times 10^{-11}\ \mathrm{N\,m^2/kg^2}.$$
This is one of the most uncertain physical constants in all of physics, as it is very difficult to devise an
experiment that can be performed on Earth because of the necessity of using relatively small masses.
From this value and the known values of $R_E$ and $g_E$, we can easily calculate the mass of the Earth in kilograms:
$$M_E = \frac{R_E^2 g_E}{G} \approx 5.97 \times 10^{24}\ \mathrm{kg}.$$
Because of this, Cavendish is said to have "weighed the Earth" with his measurement. Of course, this
measurement coupled with the ratios on the gravitational mass scale fixes the masses of all of the
heavenly bodies in kilograms. If we can observe something orbiting it, we can determine its mass in
kilograms.
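Both of these calculations are easy to reproduce. The sketch below uses the approximate values quoted in this section (nothing beyond plain Python is assumed):

```python
import math

G = 6.67428e-11                                 # universal gravitational constant, N m^2 / kg^2
R_E, g_E = 6.38e6, 9.8                          # Earth's radius (m) and surface gravity (m/s^2)

# "Weighing the Earth": M_E = R_E^2 g_E / G
M_E = R_E**2 * g_E / G
print(f"M_E ~ {M_E:.2e} kg")                    # ~5.97e24 kg

# Gravitational mass scale: Earth's orbit about the Sun vs. the Moon's orbit about the Earth.
r_S, T_S = 1.5e11, 365.25 * 24 * 3600
r_E, T_E = 3.84e8, 27.3 * 24 * 3600
print(f"M_S / M_E ~ {(r_S / r_E)**3 * (T_E / T_S)**2:.3e}")   # ~3.3e5, i.e. about 333,000
```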
In order to obtain the first integral of orbital motion, we need to find the potential function
associated with the universal law of gravitation. A small mass m in the vicinity of a large mass M feels
the force
$$\vec F = -\frac{GMm}{r^2}\,\hat r.$$
The vector $\hat r$ is a unit vector pointing away from the center of the large mass, so the force is directed toward the center of the large mass. The work done by this force as the mass m is displaced a small amount $d\vec r$ is given by
$$W_G = -\frac{GMm}{r^2}\,\hat r \cdot d\vec r.$$
This can be written as¹²⁰
$$W_G = -\frac{GMm}{r^3}\,\vec r \cdot d\vec r = -\frac{GMm}{r^3}\, r\, dr = -\frac{GMm}{r^2}\, dr.$$
Here, we have made use of the fact that for small changes the dot product of a vector with its change is equal to the ordinary product of its magnitude with the change in its magnitude.
Our new expression for the work done by gravity is the same as that for a one-dimensional force $F = -\frac{GMm}{r^2}$ acting only to change the distance r from the center of M to m. The fundamental theorem of calculus assures us that the work done by any one-dimensional force that depends only on position can always be written as the change in some function of position:
$$W = -\Delta U.$$
In our case, the potential energy function can be chosen as
$$U_G = -\frac{GMm}{r}.$$
Remember that a constant added to a potential energy function has no effect on the predicted behavior
of the system. The choice of constant here has been made in such a way that the potential energy
associated with two masses infinitely far away from each other is zero. Since the masses attract each
other, smaller separations are associated with negative values of potential energy.
Figure 1
A graph of the gravitational potential energy versus r/R is shown in figure 1. This potential is only valid outside the surface of the large mass M. The gravitational potential inside the mass depends on how the mass is distributed and can only be probed if we drill holes into the mass. The approximation of $U_g$ to this function is also shown. It is clear that this approximation works well when we are close to the surface, but fails miserably as the distance from the surface becomes comparable to the radius of the mass itself. There are no equilibria on this graph, stable or otherwise. If we place the small mass at rest anywhere, it will accelerate toward the large mass and eventually end up crashing on the surface.

Exercises for Section VIII.1:


1. Use the fact that the period of the orbit of Mars about the Sun is approximately 687 days, along with the mass $2 \times 10^{30}$ kg of the Sun, to determine the radius and speed of Mars' orbit. Assume that the orbit is circular.

2. Use the fact that the moon Europa of Jupiter has an orbital radius of 670,900 km and orbits Jupiter once every 3.55 days to determine the mass of Jupiter and the speed of Europa's orbit about the planet. Assume that the orbit is circular.
¹²⁰ Remember that $\hat r = \vec r\,/r$.

3. An object speeds up at a rate of 2 m/s² as it traverses a circle whose radius is 3 meters.


Determine the magnitude of its acceleration vector as a function of time, assuming it starts
from rest at t = 0. When will the acceleration have a magnitude of twice the rate at which it is
speeding up? How fast will it be moving then? Make some qualitative sketches of the
direction of the acceleration vector at various times, and explain what happens to its direction
as time increases.
4. The Moon orbits the Earth once every 27.3 days. Use this information to determine the
ratio between the distance from the center of the Earth to a satellite in geosynchronous orbit (so
that it always lies above the same section of the Earth) to that from the center of the Earth to
the center of the Moon. Do you need the mass of the Earth or the value of G in order to
perform this calculation? Explain.

Section VIII.2: Angular Momentum and the Effective Potential


Our result from the last section is unexpected. We know that there are configurations
associated with constant r: the circular orbits discussed above. These should somehow be present as
stable equilibria on our potential energy versus position graph, but they are not there. The difficulty
lies in the fact that we have suppressed the other dimensions. The satellite of mass m can move in three directions, with motion toward or away from the primary representing only one of these. The analysis of chapter 7 indicates that the kinetic energy is given by the difference between the total energy and the potential energy illustrated in figure 1. The kinetic energy clearly increases without
bound as m goes toward the primary, so our model seems to predict that every mass in the gravitational
influence of the primary will end up crashing on the surface of the primary.
The extra dimensions allow us to avoid this debacle because the kinetic energy referred to in
the work-energy theorem is the total kinetic energy. We do not require that this contribution is
associated only with motion toward or away from the primary, but components of the velocity
perpendicular to the vector from the primary to the satellite have been suppressed in our discussion. In
order to obtain a true representation of the motion, we need to include these components in our
evaluation.
The most efficient way to account for this orbital kinetic energy is by introducing the angular
momentum of the satellite. Angular momentum, defined via
$$\vec L = \vec r \times (m\vec v),$$
characterizes the degree to which an object is rotating about the origin. As a consequence of the
properties of the cross product and Newtons second law, it can be shown that the angular momentum
of an object cannot be changed by forces that act radially. Since the gravitational force is always
radial, directed toward the origin, the angular momentum of the satellite is a constant of the motion. It
will not change unless we force it to. The magnitude of the angular momentum vector,¹²¹
$$L = m\, r\, v_\perp,$$
is directly associated with the component of the velocity associated with motion around the primary.
Splitting the kinetic energy into its radial and orbital contributions and using our expression for
L, we have

¹²¹ The symbol $v_\perp$ in this expression represents the component of the velocity perpendicular to the vector from the primary to the satellite.

$$\frac{1}{2}mv^2 = \frac{1}{2}mv_r^2 + \frac{1}{2}mv_\perp^2 = \frac{1}{2}mv_r^2 + \frac{L^2}{2mr^2}.$$
The angular momentum is constant, so the second term here is a function of r alone. It increases as r
decreases for the same reason a spinning ice skater speeds up as she pulls her arms in: the mass must
move faster when the radius is smaller in order to conserve angular momentum. The total energy of
the satellite is therefore given by
$$E = \underbrace{\tfrac{1}{2}mv_r^2}_{\text{radial kinetic}} \;+\; \underbrace{\frac{L^2}{2mr^2} - \frac{GMm}{r}}_{\text{effective potential}},$$
where the terms can be grouped as indicated. All of the contributions associated with orbital motion
are grouped into the effective potential, which is a function only of the distance to the primary. All
excess energy above the effective potential is reflected in motion toward or away from the primary.

Figure 2
A graph of the effective potential is shown in figure 2. The orbital kinetic energy has the
effect of introducing a potential barrier at small values of r, preventing objects with finite angular
momentum from falling onto the primary. Its inclusion into the effective potential dramatically
changes the interpretation of the potential energy graph. We now see one stable equilibrium,
presumably corresponding to the circular orbit, and objects in orbit suddenly seem far less precarious.
We are almost ready to characterize the different types of orbits predicted by the effective
potential given in the last section. Before that, we recognize that the effective potential is proportional
to the satellite mass. Because this is an overall constant, it will not affect the dynamics of the system.
The mass of the satellite does affect the motion of the primary itself, but this motion can be neglected
if the satellite's mass is much smaller than that of the primary. As long as the satellite satisfies this criterion, its motion is completely independent of its mass. We will consider the specific energy of the satellite, its energy per unit mass, for this reason. The specific energy $\varepsilon$ is related to the specific angular momentum $\ell$ via
$$\varepsilon = \frac{1}{2}v_r^2 + \frac{\ell^2}{2r^2} - \frac{GM}{r}.$$
We will study the allowed orbits associated with a given fixed value of the specific angular
momentum. The effective potential associated with different values of angular momentum has the
same basic shape and features, but the stable equilibrium is pushed farther from the primary as the
specific angular momentum is increased.
The location of the stable equilibrium is given by our study of circular orbits above. It occurs at the distance
$$r_c = \frac{\ell^2}{GM}$$
from the primary. This represents the lowest possible energy for a given angular momentum,
$$\varepsilon_{\min} = -\frac{1}{2}\left(\frac{GM}{\ell}\right)^2 = -\frac{GM}{2r_c}.$$
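A minimal numerical sketch locating this equilibrium (units with $GM = 1$ and $\ell = 1$, an arbitrary but convenient choice; NumPy assumed):

```python
import numpy as np

GM, ell = 1.0, 1.0

def u_eff(r):
    # Effective potential per unit satellite mass: ell^2/(2 r^2) - GM/r
    return ell**2 / (2.0 * r**2) - GM / r

r = np.linspace(0.1, 10.0, 100001)
r_min = r[np.argmin(u_eff(r))]
print(r_min, ell**2 / GM)                          # numerical minimum vs. r_c = ell^2/GM (both ~1)
print(u_eff(r_min), -GM / (2.0 * ell**2 / GM))     # both ~ -0.5 = -GM/(2 r_c)
```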
Adding a little bit of energy without changing the angular momentum, by pushing the satellite a little
away from the primary, causes the satellite to oscillate back and forth between a maximum distance
from the primary and a minimum distance. These oscillations take place as the satellite orbits the
primary, so the maximum and minimum distances do not occur at the same part of the orbit. It turns
out that the time it takes the satellite to complete one full oscillation from maximum distance to
minimum distance and back is exactly the same as the amount of time it takes the satellite to make one
complete orbit around the primary. The orbit is closed in the sense that it returns to the same physical
place again and again as it orbits. This is an extremely special property of the effective potential we
are studying. Any small change in the effective potential will invalidate this property and cause the
orbits to precess as time goes on. This property of inverse-square law forces is one of the main
reasons why Newton introduced this form in the first place.
A detailed study of the shape of these orbits reveals that they are ellipses with one focus at the
primary. The minimum distance is called the periapsis and the maximum distance is called the
apapsis. It is easy to write the equations for these distances in terms of the standard parameters a, b
and c for an ellipse. For orbits around the Sun, we use the terms perihelion and aphelion; for orbits
around the Earth, the terms are perigee and apogee. The actual values of the distances depend, of
course, on the mass of the primary, the specific angular momentum of the satellite, and how much
energy was given to the satellite. It is clear from the graph in figure 3 that both the periapsis and the
apapsis move away from the circular radius $r_c$ as the energy added is increased, but the movement of
the apapsis is much larger than that of the periapsis.
Figure 3 (energy levels labeled circle, ellipse, parabola, and hyperbola, with the distances $r_p$, $r_c$, and $r_a$ marked)
Once the energy of the satellite reaches zero, there is no longer an apapsis; the satellite has
enough energy to actually escape the primary. Such an orbit is called an escape orbit and takes the
shape of a parabola. Escape orbits are very specific, being characterized by only one value of the
energy, so their parameters can be obtained relatively easily from our equations for specific energy and
specific angular momentum. Parabolic orbits move toward the primary until they reach their distance
of closest approach
$$r_p = \frac{\ell^2}{2GM} = \frac{r_c}{2}.$$
At this distance, they move at their maximum speed of
$$v_{\max} = \sqrt{\frac{2GM}{r_p}}.$$
They move out into space away from the primary after reaching their distance of closest approach, asymptotically slowing to zero speed. The speed of a parabolic orbit at any distance r from the primary is given by
$$v = \sqrt{\frac{2GM}{r}}.$$
This relation can be used to determine the escape speed of a mass, the minimum speed it requires to
escape the primary. The escape speed of the Earth at its surface is approximately 11.2 km/s, or about
25,000 mph.
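A quick check of this figure (plain Python; the conversion 1 m/s ≈ 2.237 mph is an added convenience):

```python
import math

G, M_E, R_E = 6.674e-11, 5.97e24, 6.38e6        # SI values quoted earlier in the chapter
v_esc = math.sqrt(2.0 * G * M_E / R_E)          # escape speed from the Earth's surface
print(f"{v_esc:.0f} m/s ~ {v_esc * 2.237:.0f} mph")   # ~11,200 m/s, ~25,000 mph
```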
Continuing to add energy past the escape orbits, we enter a region of orbits with positive
energy. These orbits have more than enough energy to escape, so they have energy to move even
when they are far away from the primary. At these asymptotic distances from the primary, satellites in
these orbits move in approximately straight lines. They come in close to the primary, closer than any
other type of orbit with a given specific angular momentum, then go back off into space asymptotically
approaching another straight line trajectory. Their straight line asymptotes give the hyperbolic nature
of these orbits away. The only lasting effect of the primary on these orbits is the deflection of the
asymptotic trajectories.

Exercises for Section VIII.2:


1. Determine the escape speed of Mars, given its mass $6.4 \times 10^{23}$ kg and its radius 3400 km.

2. Determine the escape speed of a spherical planet as a function of its average density and its
radius. What happens to the escape speed if the radius is doubled at constant density? What
happens if the density is doubled at constant radius?
3. Determine the escape speed of a spherical planet in terms of its average density and its mass.
What happens to the escape speed if the mass is doubled at constant density? What happens if
the density is doubled at constant mass?
4. Explain what it means for the specific total energy of a satellite to be independent of its mass.
What happens to an object ejected at negligible speed from a satellite in orbit? Explain the
importance of introducing the specific energy. What does this mean about the possibility of
falling onto the primary? How would your answer change if we were considering motion only
in one dimension? Explain why angular momentum is important to objects in orbit.
5. Consider the solar system given in natural units, so that all distances are measured in terms of
the average distance from the Earth to the Sun and all periods are measured in years. What does
the effective potential look like in this unit system? Where is the stable equilibrium associated
with an orbit with specific angular momentum $\ell$? What is the total specific energy of this circular orbit, given in this unit system? Suppose that a comet has a specific angular momentum of 3.2 and a specific total energy of $-0.017$ in these units. Determine the perihelion and aphelion of this comet's orbit.
6. Explain qualitatively how the speed of a satellite with positive specific energy changes as it
moves away from its distance of closest approach. Does it speed up or slow down? What
happens to the component $v_\perp$ of its velocity?
Section VIII.3: Kepler's Laws


As we have seen, the possible orbits that satellites can exhibit have the properties of conic
sections. This result was arrived at by Johannes Kepler in the early 1600s as a result of his extensive
study of the detailed astronomical observations made by his teacher and mentor, the early Danish
astronomer Tycho Brahe,¹²² in the late 16th century. Kepler's studies of Brahe's data ultimately led him to three laws of planetary motion. The first of these laws states that the planets move in elliptical
orbits with the Sun at one focus. This law can be extended to state that all objects move in orbits
described by conic sections with the Sun at one focus, when the gravitational influence of the other
planets is neglected.
Kepler's second law states that the radius vector from the Sun to a planet sweeps out equal areas in equal times. This law is illustrated in figure 4; it is much easier to see than to explain. The second law implies that planets move slower when they are farther away from the Sun, and follows from conservation of angular momentum. The rate at which the radius vector sweeps out area is equal to one half the radius times the component of the velocity perpendicular to the radius:
$$\text{rate of area sweep} = \frac{1}{2} r v_\perp = \frac{\ell}{2}.$$
This can be seen by approximating the area of a sector by a part of a circle; the rate of area sweep is given by
$$\frac{dA}{dt} = \frac{1}{2} r^2 \dot\theta = \frac{\ell}{2},$$
which is constant because the specific angular momentum is constant. The physical meaning of
these equations is that the time required for the object to sweep out area $A_1$ in figure 4 is the same as that required for it to sweep out area $A_2$. This obviously requires it to be moving faster near the origin as it sweeps out the area $A_2$ than it is far from the origin as it sweeps out $A_1$.

Figure 4
Kepler's third law states that the square of the period of a periodic orbit is proportional to the cube of its semi-major axis. We have already seen how this law follows for circular orbits from Newton's universal law of gravitation. Kepler noticed it empirically in data taken by Brahe on the elliptical orbits of the planets, especially Mars. This law also follows for these elliptic orbits from Newton's universal law of gravitation, but it is more difficult to show in the general case. Kepler's third law states
$$GMT^2 = 4\pi^2 a^3$$
in terms of Newton's constant, where a is the semi-major axis of the orbit. Of course, this law only applies to periodic orbits. The term period has no meaning for parabolic or hyperbolic orbits.
¹²² Interestingly, Brahe would not have been fond of Kepler's work. He was, by many accounts, a strong proponent of the Ptolemaic view of astronomy that the Earth is the center of the Universe. He had hoped that his extremely accurate measurements would show the truth of this view, but Kepler showed otherwise. Some historians hold that Kepler actually had to steal Brahe's work from his descendants in order to complete his analysis.

Standard studies of conic sections result in equations that refer to the origin as the center of the
conic or the vertex of the parabola. As we have seen, applications of conic sections to orbits assign a
physical meaning to the focus rather than the center. We have also seen that orbits unify conic
sections in a way that some earlier definitions lacked. There is another, more unifying, property of
conic sections that we can use to obtain a much more satisfying understanding of the relationship
between these geometric objects. A (non-degenerate, non-circular) conic section can be defined as the
locus of all points for which the ratio of the distance to a fixed point (the focus) to that to a fixed line
(the directrix) is constant. The constant ratio is called the eccentricity of the conic section.
The standard definition of a parabola indicates immediately that parabolas have an eccentricity of 1. Circles, on the other hand, are perfectly symmetric. Their nature forbids them from
being biased in any direction, so they cannot have a directrix. The only way for a circle to meet the
criteria of the definition is for the ratio to be zero. This definition forces all circles to sit together at the
focus. All of the other conic sections are well-characterized by this definition. The characteristics of
circles and parabolas have already been discussed, as they are easy to implement in the context of
Newton's second law and conservation of energy. They represent a very specific orbit, so can easily
be characterized. Elliptic and hyperbolic orbits, on the other hand, can have different shapes. These
are the orbits that we really need to focus our attention on, and they are well-characterized by their
eccentricity.

Figure 5 (a point P at distance r from the focus F; the directrix lies a distance d from the focus, so the distance from P to the directrix is $d - r\cos\theta$)

Polar coordinates are the most natural choice for working with orbits. The angle $\theta$ increases from 0 to $2\pi$ as the satellite orbits, and the distance to the primary is given as a function of $\theta$. Let us take the focus of the conic at the origin and the directrix parallel to the y-axis a distance d to the right of the focus, as illustrated in figure 5. The point $(r, \theta)$ is on the conic if the ratio of its distance to the origin to its distance to the directrix is equal to e:
$$\frac{\text{distance to focus}}{\text{distance to directrix}} = \frac{r}{d - r\cos\theta} = e.$$
Solving for r gives
$$r = \frac{ed}{1 + e\cos\theta}.$$
How are the parameters e and d related to the familiar geometric measures a, b, and c? We can
determine the relationship by demanding that the geometry of this new figure agrees with the geometry
we already know about ellipses and hyperbolas.
For ellipses, we can use the fact that the greatest distance and smallest distance to the focus sum to 2a. This implies
$$ed = a\left(1 - e^2\right) \quad\Longrightarrow\quad r = \frac{a\left(1-e^2\right)}{1 + e\cos\theta}.$$
The smallest distance to the focus is just $a - c$, so we have
$$a - c = \frac{a\left(1-e^2\right)}{1+e} = a(1-e) \quad\Longrightarrow\quad e = \frac{c}{a}.$$
From these relations, we can write
$$r = \frac{a\left(1-e^2\right)}{1 + e\cos\theta} = \frac{b^2}{a + c\cos\theta}.$$
Since c is always smaller than a for ellipses, the eccentricity of an ellipse always lies between 0 and 1.
The distance of closest approach for a hyperbola is $c - a$, so we have
$$c - a = \frac{ed}{1+e} \quad\Longrightarrow\quad r = \frac{(c-a)(1+e)}{1 + e\cos\theta}.$$
Hyperbolas do not have a farthest distance. Instead, they approach an asymptote. The angle of this asymptote satisfies
$$\cos\theta_{\mathrm{asymp}} = -\frac{1}{e}$$
because that is where r becomes undefined. The negative sign implies that this angle is in the second or third quadrant, as is obvious from the geometry. The cosine of this angle can also be obtained from a hyperbola in standard form; it is $-a/c$. Ignoring the minus sign (why is this ok?) gives
$$e = \frac{c}{a},$$
exactly as in the case of the ellipse. Since c is always larger than a for hyperbolas, hyperbolas always have e > 1. The equation for a hyperbola becomes
$$r = \frac{a\left(e^2-1\right)}{1 + e\cos\theta} = \frac{b^2}{a + c\cos\theta},$$
exactly as in the case of the ellipse. Thus we see that the standpoint of eccentricity provides a
framework in which the similarities between ellipses and hyperbolas are apparent.
We can represent the equations of all conic sections as
$$r = \frac{r_p(1+e)}{1 + e\cos(\theta - \theta_0)}.$$
The inclusion of $\theta_0$ allows us to consider orbits whose directrix is not parallel to the y-axis. Shifting the value of $\theta$ in this way provides the same service that rotation of axes provides in Cartesian coordinates. The satellite reaches its periapsis at $\theta = \theta_0$, so all other choices of $\theta$ represent places in the orbit where the satellite is farther away from the primary than it is at $\theta_0$. Other than $\theta_0$, the orbit is characterized by two parameters: the periapsis distance $r_p$ and the eccentricity e. These parameters are good choices because they are well-defined for all orbits and allow us to represent all orbits in the same way. The orbit is circular if $e = 0$, elliptic if $0 < e < 1$, parabolic if $e = 1$, and hyperbolic if $e > 1$.
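A minimal sketch of this parametrization (NumPy assumed; the values of $r_p$ and $e$ are arbitrary): it tabulates $r(\theta)$ for each type of orbit, returning NaN at angles where the denominator is not positive so that only the physical branch of a hyperbola appears.

```python
import numpy as np

def conic_r(theta, r_p, e, theta0=0.0):
    """r = r_p (1 + e) / (1 + e cos(theta - theta0)), NaN where the denominator is not positive."""
    denom = 1.0 + e * np.cos(theta - theta0)
    r = np.full_like(denom, np.nan)
    physical = denom > 1e-12
    r[physical] = r_p * (1.0 + e) / denom[physical]
    return r

theta = np.linspace(0.0, 2.0 * np.pi, 7)
for e in (0.0, 0.5, 1.0, 1.5):                 # circle, ellipse, parabola, hyperbola
    print(e, np.round(conic_r(theta, r_p=1.0, e=e), 3))   # r = r_p at theta = 0 in every case
```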
In order to show that all orbits are conic sections, we need to show that the equation of constant
specific energy only admits solutions of the form given above for conic sections. This calculation is
necessary to determine the eccentricity of the orbit in terms of the specific energy and specific angular
momentum.
The equation of constant specific energy is
$$\varepsilon = \frac{1}{2}v_r^2 + \frac{\ell^2}{2r^2} - \frac{GM}{r}.$$

We defined $v_r$ above as representing the radial part of the velocity. As such, it is equal to the rate of change of r: $v_r = \dfrac{dr}{dt}$. Taking the derivative of the specific energy equation with respect to time gives
$$0 = \frac{dr}{dt}\frac{d^2r}{dt^2} - \frac{\ell^2}{r^3}\frac{dr}{dt} + \frac{GM}{r^2}\frac{dr}{dt}$$
since the specific energy is constant. If $v_r$ is not identically zero, we can divide by it and obtain
$$\frac{d^2r}{dt^2} = \frac{\ell^2}{r^3} - \frac{GM}{r^2}.$$
This is a nonlinear differential equation of second order that can be solved in an easy way due to an astute observation. First, we recognize that we are really interested in r as a function of $\theta$ rather than t. The differential operators are related via
$$\frac{d}{dt} = \frac{d\theta}{dt}\frac{d}{d\theta}.$$
Now, $\dfrac{d\theta}{dt}$ is the angular speed of the satellite. It is directly related to $v_\perp$ as
$$v_\perp = r\frac{d\theta}{dt}.$$
The specific angular momentum, $\ell = rv_\perp$, is a constant of the motion, so we can write
$$\frac{d\theta}{dt} = \frac{\ell}{r^2}.$$
The second derivative can therefore be written as
$$\frac{d^2r}{dt^2} = \frac{d}{dt}\left(\frac{dr}{dt}\right) = \frac{\ell}{r^2}\frac{d}{d\theta}\left(\frac{\ell}{r^2}\frac{dr}{d\theta}\right) = -\frac{\ell^2}{r^2}\frac{d}{d\theta}\left(\frac{d}{d\theta}\frac{1}{r}\right) = -\frac{\ell^2}{r^2}\frac{d^2}{d\theta^2}\frac{1}{r}.$$
Defining $u \equiv 1/r$, we find that u satisfies the linear differential equation
$$\frac{d^2u}{d\theta^2} + u = \frac{GM}{\ell^2}.$$
The solutions of this linear differential equation are
$$u = \frac{GM}{\ell^2} + A\cos(\theta - \theta_0),$$
for any constant value of A. Inverting, we see that
$$r = \frac{1}{u} = \frac{1}{\dfrac{GM}{\ell^2} + A\cos(\theta-\theta_0)} = \frac{\ell^2/GM}{1 + \left(A\ell^2/GM\right)\cos(\theta-\theta_0)}.$$
The motion is therefore definitely given by a conic with one focus at the origin, the eccentricity of which is $A\ell^2/GM$, proving Kepler's first law.
To determine the eccentricity in terms of our physical parameters $\varepsilon$ and $\ell$, we substitute u back into the original specific energy relation:
$$\varepsilon = \frac{1}{2}\left(\frac{dr}{dt}\right)^2 + \frac{\ell^2 u^2}{2} - GMu = \frac{\ell^2}{2}\left(\frac{du}{d\theta}\right)^2 + \frac{\ell^2 u^2}{2} - GMu = \frac{\ell^2 A^2}{2} - \frac{(GM)^2}{2\ell^2}.$$
The value of A is therefore given by
$$A = \sqrt{\frac{(GM)^2}{\ell^4} + \frac{2\varepsilon}{\ell^2}}$$
and the eccentricity is given by
$$e = \frac{A\ell^2}{GM} = \sqrt{1 + \frac{2\varepsilon\ell^2}{(GM)^2}}.$$
The periapsis distance $r_p$ can be obtained via
$$r_p = \frac{\ell^2}{GM(1+e)}$$
once the eccentricity has been calculated. If the orbit does not have eccentricity 1, the scale parameter
$$a = \frac{GM}{2|\varepsilon|}$$
is also very useful.
The expression for e is quite revealing. The eccentricity is clearly less than 1 whenever the
specific energy is negative, leading to a bound elliptic orbit, and greater than 1 whenever the specific
energy is positive, leading to an unbound hyperbolic orbit. It is equal to 1 whenever the specific
energy is zero, an escape orbit, or when the specific angular momentum is zero. Orbits in which the
specific angular momentum is zero are really just straight lines, also known as degenerate conic
sections. The minimum value of $\varepsilon$ is the specific energy associated with a circular orbit,
$$\varepsilon_{\mathrm{circ}} = -\frac{GM}{2r_c} = -\frac{1}{2}\left(\frac{GM}{\ell}\right)^2,$$
and gives an eccentricity of 0.
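These relations translate directly into a short routine. The sketch below (a minimal illustration with $GM = 1$; the function name is hypothetical) classifies an orbit from its specific energy and specific angular momentum:

```python
import math

def orbit_parameters(eps, ell, GM=1.0):
    """Eccentricity, periapsis distance, and (when defined) scale parameter from eps and ell."""
    e = math.sqrt(1.0 + 2.0 * eps * ell**2 / GM**2)
    r_p = ell**2 / (GM * (1.0 + e))
    a = GM / (2.0 * abs(eps)) if eps != 0 else None      # undefined for parabolic orbits
    kind = "circular" if e == 0 else "elliptic" if e < 1 else "parabolic" if e == 1 else "hyperbolic"
    return e, r_p, a, kind

# The circular orbit with ell = 1 has eps = -1/2, so e comes out 0 and r_p = r_c = 1.
print(orbit_parameters(-0.5, 1.0))
# A more loosely bound orbit with the same angular momentum:
print(orbit_parameters(-0.3, 1.0))
```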


Kepler's third law in general concerns the relationship between the period of an elliptical orbit and its semi-major axis. In order to determine the period of an elliptical orbit, we relate a differential change in time to a differential change in angle:
$$\frac{d\theta}{dt} = \frac{\ell}{r^2} \quad\Longrightarrow\quad dt = \frac{r^2}{\ell}\, d\theta.$$
Integrating this expression over one whole orbit gives the period of the motion:
$$T = \int_0^{2\pi} \frac{r^2}{\ell}\, d\theta = \frac{a^2\left(1-e^2\right)^2}{\ell}\int_0^{2\pi} \frac{d\theta}{(1 + e\cos\theta)^2}.$$
The last integral is kind of a beast. The simplest way to determine its value is to first determine the value of the related integral
$$\int_0^{2\pi} \frac{d\theta}{\alpha + \beta\cos\theta} = \frac{2\pi}{\sqrt{\alpha^2 - \beta^2}},$$
using complex analysis, and recognize that the integral we are interested in is just minus the derivative of this integral with respect to $\alpha$, evaluated at $\alpha = 1$ and $\beta = e$:
$$\int_0^{2\pi} \frac{d\theta}{(1 + e\cos\theta)^2} = -\frac{\partial}{\partial\alpha}\left.\frac{2\pi}{\sqrt{\alpha^2 - \beta^2}}\right|_{\alpha = 1;\ \beta = e} = \frac{2\pi}{\left(1 - e^2\right)^{3/2}}.$$
Substitution into the expression for T gives
$$T = \frac{a^2\left(1-e^2\right)^2}{\ell}\cdot\frac{2\pi}{\left(1-e^2\right)^{3/2}} = \frac{2\pi a^2\sqrt{1-e^2}}{\ell} = \frac{2\pi a^2}{\sqrt{GMa}} = 2\pi\sqrt{\frac{a^3}{GM}}.$$
Squaring and rearranging, we have Kepler's third law
$$GMT^2 = 4\pi^2 a^3.$$
The important thing to remember about Kepler's third law is that it has exactly the same form for elliptical orbits as for circular orbits. The only difference is that the semi-major axis a replaces the radius $r_c$.
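A minimal numerical check of this chain of manipulations (NumPy assumed; $GM = 1$ with arbitrary a and e): integrate $r^2/\ell$ over one revolution and compare with $2\pi\sqrt{a^3/GM}$.

```python
import numpy as np

GM, a, e = 1.0, 2.5, 0.6
ell = np.sqrt(GM * a * (1.0 - e**2))                  # specific angular momentum of the ellipse

theta = np.linspace(0.0, 2.0 * np.pi, 200001)
r = a * (1.0 - e**2) / (1.0 + e * np.cos(theta))
integrand = r**2 / ell                                # dt = (r^2/ell) dtheta

T_numeric = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(theta))   # trapezoid rule
T_kepler = 2.0 * np.pi * np.sqrt(a**3 / GM)
print(T_numeric, T_kepler)                            # both ~24.84
```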

Exercises for Section VIII.3:


1. Show that the value of A given in the text leads to the expressions for the eccentricity and scale
parameter.
2. Determine the apapsis distance associated with an elliptic orbit in terms of the semi-major axis
and the eccentricity. Show that this result follows naturally from the expression given in the
text for the polar form of the equation of an ellipse.
3. Explain why we choose to characterize general orbits in terms of their perihelion distance
instead of the choice of semi-major axis most often seen in algebra II. Why can't we use the
scale parameter a for parabolic orbits? What does the analogue of an aphelion distance mean
mathematically for a hyperbolic orbit? Is this quantity physical?
4. Show that the angle between the two asymptotes of a hyperbola with eccentricity e is given by $180° - 2\cos^{-1}(1/e)$.

5. Find the other focus associated with the ellipse $r = \dfrac{a\left(1-e^2\right)}{1 + e\cos\theta}$ and show that the sum of the distances from any point on the ellipse to each focus is constant along the ellipse. What is this constant? Locate the center of this ellipse and show that our polar expression is equivalent to the expression
$$\frac{(x-h)^2}{a^2} + \frac{y^2}{b^2} = 1$$
usually presented in algebra II.
usually presented in algebra II.

6. Show that our integral expression for the period of an elliptic orbit leads to Keplers third law.
Include the derivation of the integral and all of the steps required to cancel the specific energy
and angular momentum.

Section VIII.4: Natural Scales and the Orbital Equations


We have finally come to the practical part of this chapter. Suppose that an astronomer has been able to measure the distance r from the satellite to the Sun, its speed v, and the component $v_\perp$ of its velocity associated with tangential motion. Using these measurements, we can calculate the specific energy
$$\varepsilon = \frac{1}{2}v^2 - \frac{GM}{r}$$
and specific angular momentum
$$\ell = rv_\perp$$
of the satellite. The eccentricity is then given by
$$e = \sqrt{1 + \frac{2\varepsilon\ell^2}{(GM)^2}},$$
and the periapsis distance is
$$r_p = \frac{\ell^2}{GM(1+e)}.$$
The problem is now solved in principle. Given the measurements, we can determine all of the
orbital parameters and sketch the orbit. The numbers themselves, however, will often be quite
unwieldy in standard unit systems due to the immense scale of the solar system in comparison to the
scales found on Earth. The specific angular momentum of the Earth in its orbit about the Sun, for
example, is approximately $4.5 \times 10^{15}\ \mathrm{m^2/s}$. The appearance of this large number is entirely due to our
choice of units. In order to make the numbers less forbidding, we can choose to use a more natural
system of units for this problem.
A simple choice is the radius, speed, and period of a given circular orbit around the same
primary. There are, of course, infinitely many such orbits we can choose from. Once we choose the
radius of the orbit, however, its speed and period are fixed by the equations of circular orbits:
$$rv^2 = GM; \qquad GMT^2 = 4\pi^2 r^3.$$
These relationships depend only on the mass of the primary. All of the satellites orbiting a given
primary can be handled with the same set of natural units. To move to a different primary, a new set
of natural units should be introduced. The same radius can be kept if convenient, but the natural speed
will then increase by the square root of the increase in mass and the natural period will decrease by the
same factor.
When expressed in terms of natural units, all of the factors of G and M disappear. If the speeds and distances are expressed in terms of natural units, the eccentricity equation becomes
$$e = \sqrt{1 + rv_\perp^2\left(rv^2 - 2\right)}$$
and Kepler's third law becomes
$$T^2 = a^3.$$
The scale factor a is given by
$$a = \frac{r}{2 - rv^2}$$
for non-escape orbits.


As an example, we can take the Sun as our primary and use the approximately circular orbit of the Earth with a radius of $1.5 \times 10^{11}$ m as our natural scale. This distance is commonly used as a length scale in our solar system, and is called the astronomical unit (AU). The mass of the Sun is $2 \times 10^{30}$ kg,
so our circular orbit equations give a speed of $3 \times 10^4$ m/s and a period of 1 year. A comet observed at a distance of $7 \times 10^{11}$ m from the Sun moving with a speed of 7800 m/s, of which 4000 m/s is directed tangentially, has the parameters $r = 4.67$; $v = 0.26$; $v_\perp = 0.133$. The eccentricity of the orbit is 0.93
and its semi-major axis is 2.77 AU. The period is 4.6 years.
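A minimal sketch reproducing this example in natural units (plain Python; GM = 1 by construction):

```python
import math

# Comet measurements in natural units (AU, Earth's orbital speed, years).
r, v, v_perp = 4.67, 0.26, 0.133

eps = 0.5 * v**2 - 1.0 / r              # specific energy
ell = r * v_perp                        # specific angular momentum
e = math.sqrt(1.0 + 2.0 * eps * ell**2)
a = r / (2.0 - r * v**2)                # scale factor for non-escape orbits
T = a**1.5                              # Kepler's third law, T^2 = a^3
print(f"e = {e:.2f}, a = {a:.2f} AU, T = {T:.1f} yr")   # ~0.93, ~2.77 AU, ~4.6 yr
```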
Once we have agreed upon a set of natural scales for the system, we can determine all of the
interesting aspects of each type of orbit. A list of the important equations for each type of orbit
follows.
Circular orbits have $rv^2 = 1$ and $v = v_\perp$. The speed and radius are constant and the eccentricity is zero. These represent the orbits with the lowest possible specific energy for a given specific angular momentum. The period and radius are related via $T^2 = r^3$.
Elliptic orbits have 0 < e < 1. These represent bound orbits with negative specific energy. The semi-major axis, a, is given by $a = \dfrac{r}{2 - rv^2}$. The period and semi-major axis are related via $T^2 = a^3$. The average distance to the primary is a. The speed varies from a maximum value of $v_{\max} = \dfrac{rv_\perp}{a(1-e)}$ at the periapsis $r_p = a(1-e)$ to a minimum of $v_{\min} = \dfrac{rv_\perp}{a(1+e)}$ at the apapsis $r_a = a(1+e)$. The specific angular momentum is given in terms of a and e as $\ell^2 = GMa\left(1-e^2\right)$ in general units.
Parabolic orbits have $rv^2 = 2$ and eccentricity 1. They have $\varepsilon = 0$, so represent the minimum energy necessary to escape the primary. The satellite attains a maximum speed of $v_{\max} = \dfrac{2}{rv_\perp}$ at its periapsis distance $r_p = \tfrac{1}{2}\left(rv_\perp\right)^2$. This distance serves as the scale factor for parabolic orbits since the standard choice a is undefined. As the satellite moves away from the primary, its speed asymptotically approaches zero.
Hyperbolic orbits have e > 1. These orbits have positive energy and are characterized by the parameter $a = \dfrac{r}{rv^2 - 2}$. They attain their maximum speed of $v_{\max} = \dfrac{rv_\perp}{a(e-1)}$ at the periapsis distance of $r_p = a(e-1)$. As the satellite moves farther away from the primary, its speed asymptotically approaches the value $v_{\mathrm{asymp}} = \sqrt{v^2 - 2/r}$. The end result of the "collision" with the primary is a deflection angle given by $180° - 2\cos^{-1}(1/e)$.
Hyperbolic orbits are hard to imagine before having seen what is meant by the term, so one is illustrated in figure 5. The primary is represented by the large black dot, and the deflection angle is shown. Obviously, only one branch of the hyperbola is present; the other is entirely unphysical. Figure 6 illustrates this same orbit, along with a parabolic orbit with the same parameters. Note the difference between the shapes of these two orbits.
Figure 5

Figure 6

The only real difficulty with these equations concerns what happens when the mass of the
primary changes. Since we have suppressed all dependence on M, it is not clear how a change in M is
reflected in our formulas. Obviously, a change in M will affect the orbit of the comet. A larger
primary mass will pull harder on the comet, effectively lowering the energy well.
The suppression of the primary mass comes from the choice of natural units, so the dependence
on primary mass is contained within these natural units. When the mass changes, the natural units of
the system change. The natural units of the system are just the parameters associated with a specific
circular orbit, so we can choose our new natural units to correspond to a circular orbit with the same
radius as the old natural scale. Once we make this choice, however, the new natural units for speed
and period are different.
Taking the solar system as an example, suppose our natural units are the radius, orbital speed, and period of the Earth's orbit. The comet in the above example has $r = 4.67$, $v = 0.26$, and $v_\perp = 0.133$ when expressed in terms of these units. If the mass of the Sun were to suddenly double, the relationship between the natural units of the system would change. We can still choose a distance of 1 AU for the natural length scale, but the natural speed must then satisfy
$r v_{new}^2 = GM_{new} = 2GM \quad\Rightarrow\quad v_{new} = \sqrt{2}\,v \approx 42{,}426 \text{ m/s}.$
A satellite must be moving faster in order to maintain a circular orbit with this radius. Since the natural speed increased by a factor of $\sqrt{2}$, the actual speed of the comet reckoned in terms of this natural speed decreases by the same factor. The new parameters of the comet are therefore $r = 4.67$, $v = 0.184$, and $v_\perp = 0.094$. The eccentricity of the orbit changes to 0.96 and the semi-major axis changes to 2.535 AU. From Kepler's third law, it is clear that the new natural period is a factor of $\sqrt{2}$ shorter than the old natural period, or about 0.7071 years. The period of the comet's orbit, equal to about 4 new natural periods, is about 2.85 years.
These changes are quite general. Whenever the mass of the primary is increased by a factor of $\lambda$, we can choose the same natural length unit. All of the speeds are then decreased by a factor of $\sqrt{\lambda}$ when reckoned in terms of the new natural units, and we use the same equations to calculate the new orbital parameters. Once we have the new semi-major axis, we can calculate the new period in terms of the new natural units. The new natural unit for $T$ is reduced by a factor of $\sqrt{\lambda}$ in comparison with the old natural period. In this way, we see that reducing the mass of the Sun by a factor of 3 will cause the eccentricity of the comet's orbit to become 0.86, its semi-major axis to become 4.435 AU, and its period to become 16.18 years.
We have seen that the effective potential associated with Newton's inverse square law force admits orbits that are closed in the sense that they return to the same place again and again. This property of the inverse square law form of the force is extremely special. It can be shown that objects influenced by a central potential123 of the form
$u(r) = k r^{\alpha},$
with $k$ and $\alpha$ constant, will admit closed orbits only for $\alpha = -1$, inverse square law forces like the one we've been studying, or $\alpha = 2$, three-dimensional analogues of simple harmonic motion. All other central potentials will see orbits precess with time, as illustrated in figure 7. Try following the path of the motion as the satellite orbits the primary in figure 7.

Figure 7
The precession is easily characterized by the change in the angle associated with the apapsis of the orbit in successive orbits, as illustrated in figure 7, but for experimental reasons it is usually measured as the precession of the periapsis. The perihelion of Mercury's orbit has long been known to exhibit a precession of 574 seconds of a degree per century.124 Why should it exhibit this precession? Our analysis indicates that it is under the gravitational influence of the Sun, and that this influence does not generate any precession; the orbits are closed. Most of the reason for this precession is the influence of the other planets. Mercury and the Sun are certainly not the only gravitational influences in the solar system; Mercury's orbit is also affected by all of the other planets. The most important influences come from its closest neighbor, Venus, and the second largest mass in the system, Jupiter. These influences are not large, as evidenced by the extremely small rate of precession observed,125 and the fact that Jupiter's mass is only a thousandth of the Sun's, but they are certainly measurable. Detailed calculations in the 19th century indicated that the influence of the other planets accounts for 531 seconds of a degree per century, so we have 43 seconds of a degree unaccounted for. This anomalous precession was first discovered by the French mathematician and astronomer Urbain Le Verrier in 1859. It is very small, but definitely present.
The explanation of this anomalous precession had to wait for the development of general relativity, presented by the German physicist Albert Einstein in 1915. The German physicist Karl Schwarzschild showed in 1916 that general relativity modifies the specific effective potential experienced by satellites in orbit about a primary of mass M to
$u_{eff}(r) = \frac{\ell^2}{2r^2} - \frac{GM}{r}\left[1 + \left(\frac{\ell}{cr}\right)^2\right],$
where c is the speed of light in a vacuum. The modification is equal to the square of the ratio of $v_\perp$, the component of the satellite velocity perpendicular to the vector from the primary to the satellite, to the speed of light. Mercury's maximum orbital speed of 59 km/s pales in comparison with the speed of light in a vacuum, 300,000 km/s, so this is not a large correction. It is, however, there, and modifies the potential to one that does not admit closed orbits.

123 This term indicates that the force pulls only toward or away from the center, so the potential function is spherically symmetric.
124 The actual observed precession is 5600 seconds per century, but the vast majority of this is attributed to the fact that all of the observers are on the Earth. As the Earth is orbiting the Sun, it does not represent an inertial reference frame and we cannot directly apply Newton's laws to observations made from it. This correction to the data reduces the precession by approximately 5026 seconds of a degree per century, leaving us with the as-yet unexplained 574.
125 574 seconds of a degree per century translates to about 1.4 seconds of an arc per orbit; the perihelion of Mercury moves approximately 310 km during each orbit, representing a change of approximately $6.7\times10^{-4}$ %.
The modification of the effective potential is illustrated in figure 8, where everything has been
greatly exaggerated for clarity. The unmodified effective potential is also shown, dashed, for
comparison. Note that the potential barrier is still there, but the modification has moved it closer to the
primary and generated a new unstable equilibrium. This unstable equilibrium is really only physical in
the case of black holes, as it is located inside any massive body that is not a black hole. Even neutron
stars have trouble being small enough to exhibit this feature outside their radii. This effective potential
is only physical outside the primary, so the unstable equilibrium is not a property of the physical
model. Despite this fact, the modifications clearly change the potential at larger radii. It is this change
that leads to the precession of the orbits.

Figure 8
In order to see how this modified effective potential changes the orbit, we again appeal to conservation of energy. The differential equation we obtain is
$\frac{d^2r}{dt^2} = \frac{\ell^2}{r^3} - \frac{GM}{r^2}\left[1 + 3\left(\frac{\ell}{cr}\right)^2\right],$
which, on changing to a differential equation in $\varphi$, leads to
$\frac{d^2u}{d\varphi^2} + u = \frac{GM}{\ell^2}\left[1 + 3\left(\frac{\ell u}{c}\right)^2\right] = \frac{GM}{\ell^2} + \frac{3GM}{c^2}\,u^2,$
where $u = 1/r$ as before.126 Our wonderful trick that worked so well in disarming the nonlinear differential equation last time does not work in this case. The equation is still nonlinear and, barring another wonderful trick, we will have to deal with it.
This differential equation isn't so bad, as the nonlinearity is small. The constant $3GM/c^2$ is given by
$3GM/c^2 \approx 4450 \text{ m}$
for the Sun, and by
$3GM/c^2 \approx 1.3 \text{ cm}$
for the Earth. These distances can be thought of as the effective orbital radius for which the speed required for circular orbits is the speed of light.127 They are well within the primary in both cases, so all physical orbits must have $3GMu/c^2 \ll 1$. The exact solution to this equation is not easily obtained,
but the first integral approach we used in the last chapter allows us to determine the precession after one complete orbit without too much work.

126 The function u is not to be confused with the specific potential energy u; there are never enough letters!
127 This is the relativistic result; the fully classical result for this orbital radius is smaller by a factor of 3. Our analysis of general relativity breaks down when the speed approaches that of light, so our modification is not correct in this limit.

To begin, we write the first integral as
$\frac{\ell^2}{2}\left(\frac{du}{d\varphi}\right)^2 = \epsilon + GM\,u - \frac{\ell^2}{2}u^2 + \frac{GM\ell^2}{c^2}u^3.$
The angle is therefore given by
$\varphi = \int \frac{du}{\sqrt{\dfrac{2\epsilon}{\ell^2} + \dfrac{2GM}{\ell^2}\,u - u^2 + \dfrac{2GM}{c^2}\,u^3}}\,.$
This change in angle after one orbit is given by the value of this integral taken over a full period of the motion, from the minimum value of u to the maximum value and back. The sign of the integrand changes when we are going back, so it is easiest for us to write the change in $\varphi$ as
$\Delta\varphi = 2\int_{u_{min}}^{u_{max}} \frac{du}{\sqrt{\dfrac{2\epsilon}{\ell^2} + \dfrac{2GM}{\ell^2}\,u - u^2 + \dfrac{2GM}{c^2}\,u^3}} = 2\int_{\xi_{min}}^{\xi_{max}} \frac{d\xi}{\sqrt{e^2 - 1 + 2\xi - \xi^2 + 2\mu^2\xi^3}}\,.$
We have re-scaled the integral to write it in terms of the dimensionless variable $\xi \equiv \ell^2 u/GM$ and the parameter $\mu \equiv GM/\ell c$, as well as re-introducing the eccentricity $e^2 = 1 + 2\epsilon\ell^2/(GM)^2$. This integral is quite a beast, probably the most complicated one in the entire text. It is made even worse by the fact that we are integrating from one root of the cubic in the denominator to another, and we have no interest in finding these roots explicitly. We will have to proceed very carefully.
We are interested in finding the value of this integral when $\mu \ll 1$. There will be three real roots to this polynomial whenever the eccentricity lies between 0 and 1, as long as $\mu$ is small enough: the two we are interested in and one other that lies at a much larger value of $\xi$. The two roots we are interested in lie close to $\xi = 1$, and the argument of the square root is positive between them. Rewriting our polynomial as
$e^2 - 1 + 2\xi - \xi^2 + 2\mu^2\xi^3 = 2\mu^2\left(\xi_{large} - \xi\right)\left(\xi - \xi_{min}\right)\left(\xi_{max} - \xi\right),$
and remembering that the sum of the roots of any polynomial whose leading coefficient is 1 is the opposite of the coefficient of the next-to-leading term,128 we have
$e^2 - 1 + 2\xi - \xi^2 + 2\mu^2\xi^3 = \left[1 - 2\mu^2\left(\xi + \xi_{max} + \xi_{min}\right)\right]\left(\xi - \xi_{min}\right)\left(\xi_{max} - \xi\right).$
The integral is therefore given by
$\Delta\varphi = 2\int_{\xi_{min}}^{\xi_{max}} \frac{d\xi}{\sqrt{\left[1 - 2\mu^2\left(\xi + \xi_{max} + \xi_{min}\right)\right]\left(\xi - \xi_{min}\right)\left(\xi_{max} - \xi\right)}}\,,$
and we can compute it via expansion:
$\Delta\varphi = 2\sum_{k=0}^{\infty}\binom{-1/2}{k}\left(-2\mu^2\right)^k \int_{\xi_{min}}^{\xi_{max}} \frac{\left(\xi + \xi_{max} + \xi_{min}\right)^k}{\sqrt{\left(\xi - \xi_{min}\right)\left(\xi_{max} - \xi\right)}}\,d\xi\,.$

128 If you don't remember this fact, you can easily re-derive it simply by multiplying the factored form out and looking for the next-to-leading coefficient.

Now, $\mu$ is very small. In the case of Mercury, its value is approximately $1.64\times10^{-4}$, so we really have no interest in calculating higher terms in this series. The term with $k = 0$ is given by
$2\int_{\xi_{min}}^{\xi_{max}} \frac{d\xi}{\sqrt{\left(\xi - \xi_{min}\right)\left(\xi_{max} - \xi\right)}} = 2\pi,$
showing no precession, of course, as this term does not contain the modification. The precession is given by the term with $k = 1$,
$2\mu^2\int_{\xi_{min}}^{\xi_{max}} \frac{\xi + \xi_{max} + \xi_{min}}{\sqrt{\left(\xi - \xi_{min}\right)\left(\xi_{max} - \xi\right)}}\,d\xi = 3\pi\mu^2\left(\xi_{max} + \xi_{min}\right).$
The easiest way to do this integral is to notice that the integral
$\int_{\xi_{min}}^{\xi_{max}} \frac{2\xi - \xi_{max} - \xi_{min}}{\sqrt{\left(\xi - \xi_{min}\right)\left(\xi_{max} - \xi\right)}}\,d\xi = 0$
by symmetry and work from there. The values of $\xi_{max}$ and $\xi_{min}$ are modified slightly by the inclusion of $\mu$, but these modifications are suppressed and we are considering a term that is already suppressed by $\mu^2$. This allows us to simply use the unmodified result and take $\xi_{max} + \xi_{min} \approx 2$, with an error that is suppressed by at least a factor of $\mu$. Therefore, we see that the perihelion of the satellite has advanced by the amount
$\Delta\varphi - 2\pi = 6\pi\mu^2 = 6\pi\left(\frac{GM}{\ell c}\right)^2 = \frac{6\pi GM}{a c^2\left(1 - e^2\right)}$
after one orbit as a result of the corrections from general relativity.
Our result indicates that the precession amount is suppressed by the semi-major axis of the orbit, so the greatest precession will be exhibited by Mercury. This precession has the value $5.017\times10^{-7}$ rad/orbit or, since the period of Mercury's orbit is approximately 88 days, $2.083\times10^{-4}$ rad/century. This translates to 42.97 seconds/century, in excellent agreement with the observations. The prediction of this precession of Mercury's perihelion represents one of the crowning achievements of general relativity. Einstein did not set out to explain the precession of Mercury's perihelion, but his theory naturally explains this observation. It led to essentially universal acceptance of the theory in 1916. This result can be used for any elliptic orbit about any primary, as long as the eccentricity is not too large; it will essentially be accurate whenever the shift itself is small. If the value of this shift is not small, then more terms need to be accounted for in the above expansion.
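The final formula is easy to evaluate. The Python sketch below is a quick numerical check of it for Mercury; the gravitational constant, solar mass, and Mercury's orbital elements are standard values that I have supplied, not numbers taken from the text, and the conversion to a rate assumes the 88-day period quoted above.

import math

G = 6.674e-11            # m^3 kg^-1 s^-2
M_sun = 1.989e30         # kg
c = 2.998e8              # m/s
a = 5.79e10              # Mercury's semi-major axis, m
e = 0.2056               # Mercury's orbital eccentricity

# Perihelion advance per orbit: 6*pi*G*M / (a c^2 (1 - e^2))
shift_per_orbit = 6 * math.pi * G * M_sun / (a * c**2 * (1 - e**2))   # radians
orbits_per_century = 100 * 365.25 / 88.0
arcsec_per_century = shift_per_orbit * orbits_per_century * (180 / math.pi) * 3600
print(f"{shift_per_orbit:.3e} rad/orbit, {arcsec_per_century:.1f} arcsec/century")
# roughly 5.0e-7 rad/orbit and 43 arcsec/century, as quoted in the text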
Note that the calculation of this integral was not really all that bad in the end because of the approach we took. It was advertised as more complicated than any of the other integrals we have discussed, but it didn't require as much work in the end as many of those in chapter 3. Those integrals, while complicated, were fairly straightforward. This present approach was far from obvious, but was necessary because we cannot simply expand the original integral in a power series about $\mu = 0$. The reason for this is that our expansion is only justified when the $\mu$ term is small compared to the terms it is being added to. This is not the case over the whole integration range of the original integral because both of the endpoints are associated with places where the argument of the square root vanishes. No matter how small $\mu$ is, it is not small in comparison to 0. In order to expand, we must first separate the "large" zero from the ones of interest. This, coupled with our use of some basic facts about polynomials, leads the way to a lucrative expansion.

Exercises for Section VIII.4:


In problems 1-6, suppose that astronomers have been able to determine the speed, $v$, its perpendicular component, $v_\perp$, and the distance from the Sun, $r$, of a new comet. These quantities are given in terms of the natural units associated with the Earth's orbit for simplicity. Determine the eccentricity, specific energy, specific angular momentum, and perihelion of its orbit, all in terms of the natural units. If the orbit is circular or elliptical, determine its period and aphelion. If it is hyperbolic, determine its asymptotic speed and deflection angle. Include a graph of the orbit in all cases, showing the natural orbit of the Earth for comparison.
1. $v = 0.2$, $v_\perp = 0.14$, $r = 4$

2. $v = 0.5$, $v_\perp = 0.5$, $r = 4$

3. $v = 2$, $v_\perp = 1$, $r = 1.5$

4. $v = 0.5$, $v_\perp = 0.4$, $r = 3$

5. $v = 2$, $v_\perp = 1.5$, $r = 0.25$

6. $v = 0.7$, $v_\perp = 0.3$, $r = 6$

7. It seems strange that the shape of the orbit of a satellite about a given primary depends only on two parameters, the specific energy and the specific angular momentum, but the determination of this shape requires three astronomical observations: the distance $r$ from the primary, the speed $v$ of the satellite, and its perpendicular component $v_\perp$.129 Is it possible to observe only two physically-measurable properties of a satellite at a given point in its orbit and use these to determine the shape and size of the orbit? What additional information does this third measurement contain?
8. Determine the perihelion shift of the orbit of the Earth about the Sun associated with the correction from general relativity to the gravitational potential of the Sun, in seconds of an arc per century. The eccentricity of the Earth's orbit is approximately 0.0167, and the orbit's semi-major axis is $1.496\times10^{11}$ m. Explain why this effect is smaller than that experienced by Mercury.
9. Determine the perihelion shift of the orbit of Halley's comet about the Sun associated with the correction from general relativity to the gravitational potential of the Sun, in seconds of an arc per century. The eccentricity of its orbit is approximately 0.967 and the orbit's semi-major axis is $28.63\times10^{11}$ m.

10. Determine the perihelion shift of the orbit of Jupiter's innermost moon Io associated with the general relativity corrections. Take its orbit to be circular, with a radius of 421,700 km and a period of 1.77 days. Explain why this calculation would not have any physical meaning if the orbit of Io actually were circular and why this does not invalidate the result itself.
129 The complete determination of an orbit actually requires several other pieces of information to fix the orientation of the orbit with respect to the Earth. Only three are required to determine the orbit without reference to another body.
11. In the book 2010: Odyssey Two, Arthur C. Clarke envisions a superior alien race that foresees evolutionary potential in the budding life-forms living below the icy surface of Jupiter's moon Europa. In an attempt to bolster this species' chances of success, the aliens increase the mass of Jupiter to the point that its core initiates fusion and it becomes a star. The minimum mass necessary to initiate fusion in the center of a gas of hydrogen is approximately 90 Jupiter masses, or $1.71\times10^{29}$ kg. Europa is currently in an approximately circular orbit about Jupiter with a period of 3.55 days. How would this increase in the mass of its primary change its orbit? Determine the new eccentricity, period, apapsis, and periapsis of the orbit. Would you expect this change to help or hinder any budding life-form on Europa? Explain.

12. Consider the integral we used to determine the precession of orbits due to the corrections of general relativity. Why didn't we just expand the integral about the stable equilibrium and integrate term-by-term? Explain your answer with a partial attempt to do this. Pay close attention to whether or not the series you obtain converges and what the limits of integration ought to be. Explain why the treatment given in the text is more applicable to this situation.

Section VIII.5: Summary of Chapter VIII


This chapter treats one important case of two-dimensional motion that can be reduced to an
effective one-dimensional problem by employing conservation of angular momentum. This technique
is very useful, and essentially always employed when the system under consideration is spherically
symmetric. The tangential component of the velocity is re-expressed in terms of the radial parameter r
in order to effectively reduce the problem to one dimension. One always arrives at an effective
potential for the analogous one-dimensional system, and then uses the results of the last chapter to
describe the motion. Using this approach, we can fairly easily classify the orbits of bodies about a
massive primary and make many detailed statements about these orbits. Keplers three laws follow
directly from this analysis, and the satisfaction of these laws represented one of the main arguments
Newton had for his specific form of the gravitational force. Keplers second law, indicating that equal
areas are swept out by the radius vector in equal times, interestingly is a property only of conservation
of angular momentum; this law, unlike the other two, will be satisfied by every conservative force
whose potential energy function depends only on the distance to a fixed point. As long as a given
force field is conservative and spherically symmetric, we can expect every object acting under its
influence to sweep out equal areas in equal times.
The other dimension has been suppressed in this analysis, but it is still there. The question of
whether or not the orbits are closed requires us to consider angular motion in addition to radial
motion and determine whether or not the periods match. We have seen that potentials deviating even
by a small amount from the inverse square law force of Newton lead to precession of the orbits. It
can be shown that only two conservative and spherically symmetric force laws, the inverse square law
and the linear force law analogous to simple harmonic motion, admit orbits that are closed and do not
precess. This allows us to attribute even a small precession to influences that deviate from one of
these two laws, and led ultimately to the ability to measure even the extremely slight deviation general
relativity brings to the potential associated with our solar system. Of course, there are gravitational
influences other than the Sun in our solar system. These influences also lead to deviations from the
simple model of a single satellite orbiting a single primary, and the deviations they entail are much
more intrusive because they do not represent spherically symmetric force fields. Rather than simply
the precession of Mercury's orbit mentioned in the text associated with the presence of the other planets, this influence leads to changes in orbital eccentricities and causes planets to violate even Kepler's second law in addition to the other two. This leads to the Milankovitch cycles in the eccentricity of orbits, causing the eccentricity of the Earth's orbit to oscillate between 0.003 and 0.058 in an approximately periodic cycle with a period of roughly 400,000 years. We are currently
in the low-eccentricity part of the cycle, with an eccentricity of approximately 0.0168. These cycles,
along with others indicating the axial tilt of planets and the orientation of their orbital plane, are named
for the Serbian mathematician Milutin Milankovitch, who exhaustively studied their properties during
World War I. They are responsible for many of the long-term climate patterns observed in fossil
fragments and ice samples taken from the Antarctic. Far from being a closed orbit, the Earth does not
even orbit in a single plane due to the effects of the other planets.
This is indicative of what happens in higher dimensions, even with situations that exhibit
spherical symmetry: the other dimensions may be suppressed, but they still are present and must be
considered in order to fully describe the situation. Situations that do not display spherical symmetry
are much more complicated; even the simple problem of three massive objects interacting
gravitationally with each other is analytically intractable and must be solved numerically. Such
numerical analyses are very useful, but often cannot be depended on to give us accurate information
about the asymptotic behavior of a system. The two-body problem avoids this unfortunate situation
in essence because of conservation of linear momentum; transforming to the center of mass frame
allows us to treat it as a one-dimensional problem. The results we obtain from this treatment often
allow us to give at least qualitative results for more complicated systems, which aids the analysis of
these more complicated systems immensely.


Chapter X
Probability Distributions
The purpose of this chapter is to develop some of the techniques used to assess the probability
of various outcomes when several are possible under the given circumstances. There are many reasons
why this may be the case, the most common of which are associated with a lack of information about the initial conditions, either because they are unknown, as in systems involving many similar or identical objects, or because they cannot be known, as is the case when the effects of quantum mechanics
become important. In either case, the techniques of probability and statistics are extremely useful and
quite accurate in many situations. They also allow us to easily analyze the behavior of many systems
for which accurate information about all of the initial conditions is inaccessible, which in turn allows
us to make broad statements about the behavior of large systems and identify the types of initial
conditions that dictate the bulk behavior of the system.
Section X.1: The Binomial Distribution
One of the simplest situations illustrating the requirements of probability involves a system in
which two different outcomes are possible, with different probabilities. You can think of flipping a
coin as a specific example, or throwing a six-sided die and considering the two outcomes "5" and "not 5". In the first case, the probability is even at 50% heads and 50% tails if the coin is fair. In the second, the probabilities are 1/6 for "5" and 5/6 for "not 5". Suppose we flip a coin 300 times. How many "heads" outcomes should we expect? How accurate is this expectation? What are the chances of flipping heads exactly 95 times and tails exactly 205 times? In order to answer these questions, we need to investigate the possible outcomes of the experiment. How many different ways are there to have all three hundred flips come up heads? Well, each and every toss must come up heads, so everything is completely determined. There is only one way for this to happen. There are 300 different ways to flip one heads and 299 tails, as the single heads could have been the first throw, the second, the one hundred and fifty-second, or the three hundredth. Each is equally likely, and the probability of each occurring is given by $p(1-p)^{299} = (0.5)^1(0.5)^{299}$, as the heads comes with a probability of 0.5 and each of the tails comes with the same probability. The number of ways to end up with 2 heads and 298 tails can easily be determined by thinking about the number of different ways that two flips can be of one type and 298 of the other. There are 300 different ways to choose the first one to come up heads, and once that one is chosen there are 299 different ways to choose the second one. The rest must be tails. When thinking about the possibilities in this way, it is unimportant which one is chosen first. There are two ways to obtain heads at the 67th and 195th flips in this analysis that do not actually represent different outcomes, so the actual number of ways to obtain two heads is given by $(300)(299)/2 = 44{,}850$. For three heads, we have $(300)(299)(298)/6 = 4{,}455{,}100$
different ways. In general, the number of different ways to obtain exactly k heads out of n flips is given
by "n choose k",
$\binom{n}{k} = \frac{\overbrace{n(n-1)(n-2)\cdots(n-k+1)}^{k\ \text{terms}}}{k!} = \frac{n!}{k!\,(n-k)!}.$
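These counting factors are built into Python's standard library, so the numbers quoted above can be verified directly. This short check is mine, not part of the text.

from math import comb

# Counting arrangements of heads among 300 tosses, as in the text:
print(comb(300, 1))   # 300 ways to place a single head
print(comb(300, 2))   # 44,850 ways to place two heads
print(comb(300, 3))   # 4,455,100 ways to place three heads

# comb(n, k) evaluates n! / (k! (n-k)!) exactly, matching the
# "n choose k" expression above without any overflow issues.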
There are clearly many more ways to have three heads than there are to have only one head, so
this outcome is more probable simply from the number of different ways in which it can be arranged,

having nothing to do with the probability of ending up with that option. The actual probability is
determined by multiplying the number of different ways in which the option can be obtained by the
probability for each,
$P_{k;n} = \binom{n}{k}\,p^k(1-p)^{n-k}.$
Using this expression, the probability of having 160 heads and 140 tails is $0.0236722 = 2.36722\%$. The probability of rolling 5 "fives" and 6 "not fives" in eleven rolls of a die is 0.0198975, or 1.98975%. Adding up the probabilities of all possible outcomes gives 1,
$\sum_{k=0}^{n}\binom{n}{k}\,p^k(1-p)^{n-k} = 1,$
by the binomial expansion theorem
$\sum_{k=0}^{n}\binom{n}{k}\,x^k y^{n-k} = (x+y)^n.$
The average number of positive outcomes, or expectation value of $k$, is defined as the weighted sum of k:
$\langle k\rangle = \sum_{k=0}^{n} k\binom{n}{k}\,p^k(1-p)^{n-k}.$
This is expected to literally give the average of the number of heads obtained by a large number of identical trials of the same experiment. Its value can be determined by differentiating the binomial expansion with respect to x:
$\frac{d}{dx}(x+y)^n = n(x+y)^{n-1} = \sum_{k=0}^{n}\binom{n}{k}\,k\,x^{k-1}y^{n-k},$
so, setting $x = p$ and $y = 1-p$ and multiplying by $p$,
$\sum_{k=0}^{n} k\binom{n}{k}\,p^k(1-p)^{n-k} = np.$
The result is fairly obvious, but the technique can be very useful in determining the average or
expected values of many observables. The expectation value of the square of k can be obtained by a
judicious use of derivatives:
$\sum_{k=0}^{n} k^2\binom{n}{k}\,x^k y^{n-k} = x\frac{d}{dx}\left[x\frac{d}{dx}\sum_{k=0}^{n}\binom{n}{k}\,x^k y^{n-k}\right] = n(n-1)x^2(x+y)^{n-2} + nx(x+y)^{n-1},$
so
$\langle k^2\rangle = n(n-1)p^2 + np.$
In this calculation, the operator $x\,\frac{d}{dx}$ acting on $(x+y)^n$ generates a factor of k in the sum. This idea and related ones are very important in Fourier analysis, which we will consider later on. The expectation value of any power of k can be obtained by using this technique, but the most important one is the square. The expectation value of the square of the number of heads is used as a measure of how accurate the expectation value is. A determination of how well the expectation value $\langle k\rangle$ predicts the actual number of heads obtained in any actual experiment is given by the expectation value of the square of the difference between k and $\langle k\rangle$:
$\sigma_k^2 = \left\langle\left(k - \langle k\rangle\right)^2\right\rangle = \langle k^2\rangle - 2\langle k\rangle\langle k\rangle + \langle k\rangle^2 = \langle k^2\rangle - \langle k\rangle^2.$
For the simple binomial distribution above, this gives
$\sigma_k = \sqrt{np(1-p)}.$
One determination of the error involved in assuming that the number of positive outcomes is given by $\langle k\rangle$ is given by the standard deviation. For example, the expected value of the number of "5"s in 300 rolls of the die is $np = 50$, and the standard deviation is 6.455. We would expect that, given three hundred rolls of the die, between 43.54 and 56.46 will turn up "5". It is important to have this estimate of error in addition to the expected value so we can have more information about how good of an estimate the average value is. It is clear that this standard deviation increases as the total number of measurements increases, so getting a larger number of data points does not improve the absolute error in the measurement. The relative error, on the other hand, decreases to zero as
$\frac{\sigma_k}{\langle k\rangle} = \sqrt{\frac{1-p}{np}}$
when n increases without bound. This is characteristic of essentially all probabilistic models: statistical techniques cannot give an accurate prediction for any single outcome, but bulk results for a series of identical experiments can often be predicted with a large degree of accuracy.
The "confidence level", or CL, can be thought of as the degree of certainty we have for the result of a specific scientific experiment. In order to determine the probability that between 43 and 57 of the die rolls come up "5", we simply add the probabilities associated with that number of rolls. In this case, the sum contains
$\binom{300}{43}\left(\frac{1}{6}\right)^{43}\left(\frac{5}{6}\right)^{257} = 0.035583\ldots$
and
$\binom{300}{50}\frac{5^{250}}{6^{300}} = 0.0616975\ldots,$
along with 13 other contributions. The sum of these 15 gives a confidence level of 75.5%, pretty good. The result would be quoted as $50 \pm 7$ rolls of "5", at 75.5% CL. To get a higher confidence level, we
need to include more possible outcomes. This is the nature of probability: we can give a result to
arbitrarily high confidence levels, but can never determine with certainty the result of a given
experiment. While this analysis can be used to analyze the standard errors associated with the
measurement process, it is important to understand that it goes much deeper than that. In a
probabilistic process, it is impossible to determine with certainty the outcome of a single trial no
matter how accurately the data is measured. The underlying process itself comes with a degree of
error. Refining this concept leads to the underpinnings of quantum mechanics and its sometimes strange interpretations.
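The confidence-level sum described above is simple to carry out with exact integer binomial coefficients. The sketch below is mine; it reproduces the 300-roll example, with the expected count, the standard deviation, and the 75.5% confidence level quoted in the text.

from math import comb, sqrt

n, p = 300, 1/6
mean = n * p                            # expected number of "5"s: 50
sigma = sqrt(n * p * (1 - p))           # standard deviation: about 6.455

def prob(k):
    """Binomial probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Confidence level for 50 +/- 7 rolls of "5", i.e. 43 through 57:
cl = sum(prob(k) for k in range(43, 58))
print(mean, round(sigma, 3), round(cl, 4))   # 50.0, 6.455, about 0.755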
One of the most misunderstood aspects of probability theory is the role of additional
information in determining probability. This aspect is misunderstood mainly because its interpretation
is very subtle and one can easily make incorrect judgments using common wisdom. Consider, for
example, the gambler who has placed bets that a die roll will be "5". On any given roll of the die, the gambler has winning odds of 1 in 6. This is definitely true if the die is fair, and there is no problem with it. The subtlety comes in when the gambler has already lost or won several rolls. Suppose that the gambler has already lost 9 games and wants to assess the probability that he will lose the next roll. The gambler reasons that the probability that no "5"s come up in 10 rolls is only 16%, so the odds of the next roll coming up "5" are 84%. It is clear in this simple situation that the gambler's reasoning is
not accurate, but this sort of thinking is extremely common in gambling circles. One imagines that it is
highly unlikely for every roll to be a losing one, so each losing roll seems to increase the probability
that the next roll will be a winner. Players of slot machines often think of each loss as somehow
bringing them closer to the elusive payoff. The fallacy in this reasoning stems from the fact that each
roll of the die gives us definite information about what has already happened. The 84% result obtained
above by the hapless gambler also contains outcomes of 2 "5"s, 3 "5"s, 4 "5"s, etc., that have already been disallowed by the results of the previous rolls. Nine rolls of the die have already occurred, and none have yielded a "5". Each roll of the die represents definite information about what has happened in this particular case. As far as the gambler is concerned, the probability that one of the first 9 rolls comes up with a "5" is zero. It did not happen. The only roll that matters is the tenth one, and the probability that this will come up "5" is 1 in 6.
The subtlety in the above reasoning lies in the importance of definite information in probability
analysis. Before the gambler begins, he can say for sure that it is unlikely for him to lose all of ten
rolls of the die. The probability that he will win at least once is 84%, as indicated above. This
probability changes with each roll, however. If he loses the first roll, then the probability that he will
win at least once in all ten rolls is reduced to the probability that at least one of the nine remaining
rolls is a winner, about 80.6%. If, on the other hand, he had won the first roll, then the probability that
he will win at least one of the ten rolls becomes 100%; he already won. It is often lucrative to think of
this process in terms of multiple parallel universes. Upon entering the casino, all possibilities are
open to the gambler. He could win all ten times, lose all ten times, or have any other outcome between
these two extremes. Each roll of the die reduces the possible outcomes, effectively putting him on a
specific path among the many possibilities that were initially open to him. Each roll of the die
represents a fork in the road. Two different outcomes are possible, leading to two different paths.
Once the die is rolled, one of the paths is picked over the other; the other option is no longer
available to the gambler. This idea is difficult to accept for the gambler who has already lost 49 rolls
and is hoping to win the next roll. It is highly unlikely that the gambler will lose all 50 rolls, about
0.01%, or one chance in 10,000, but this gambler has already lost 49 rolls. As improbable as this
outcome was at the outset, it is the path that the gambler finds himself on now. The probability that
the 50th roll will be a winner is 1/6, exactly as it was for all of the previous rolls. The significance of
the 0.01% probability is that, of 10,000 different gamblers each rolling the die 50 times, we expect
only 1 to lose all 50 rolls. It is highly unlikely that any specific gambler entering the game will turn
out to be this 1 in 10,000, but the gambler who has already lost 49 rolls is well on his way to becoming
that 1 losing gambler.
The idea of parallel universes is kind of strange in this context, but becomes extremely useful
in analyzing the behavior of quantum systems. In quantum mechanics, it is often the case that two
forks in the road recombine at a later time. This happens when a measurement is made, but the
results are not recorded and the two possible outcomes of the measurement are allowed to recombine
later on. An electron could be faced with two different paths, taking one with a probability of 1/3 and
the other with a probability of 2/3. The electron could be subjected to a magnetic field along one of
the paths and no magnetic field along the other. If the two paths then cross again and a measurement
is taken, there has been no concrete determination of which path was taken by the electron. Both paths
lead to the ultimate measurement, so both will contribute to the measurement. This phenomenon is
called quantum interference, and can be thought of as a path forking into two branches that later
recombine into a single branch.
We can continue our analysis of the effects of additional information on probability by
considering 300 rolls of the die. We saw above that we can expect $50 \pm 7$ "5"s at the 75.5% confidence level. Suppose we already know that there were 40 "5"s in the first 250 rolls and want to determine the confidence level that the total number of "5"s in the whole run of 300 rolls is $50 \pm 7$. We already have 40 "5"s, so we need to know the confidence level that the remaining 50 rolls give between 3 and 17 "5"s. This is given by
$\sum_{k=3}^{17}\binom{50}{k}\frac{5^{50-k}}{6^{50}} = 0.9926\ldots,$
so this additional information allows us to put the total number of "5"s at $50 \pm 7$ with a 99.3% confidence. Alternatively, we could simply analyze the remaining rolls and put the additional number of "5"s at 8 (the expectation value is 50/6 = 8.333) plus or minus 3 (the error estimate is $\sigma = 2.635$ in this case), or between 5 and 11 additional "5"s with an 81.8% CL. The result would be quoted as $48 \pm 3$ total "5"s at 81.8% CL. Note that the error obtained by this analysis is much smaller (with a correspondingly lower confidence level) than that obtained by considering all 300 rolls as undetermined; the 250 rolls that have already taken place reduce the effective error.

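The conditioned sums in this paragraph can also be checked directly. The helper below is mine; it evaluates the two confidence levels quoted above for the 50 rolls that remain once the first 250 are known.

from math import comb

def binom_sum(n, p, k_lo, k_hi):
    """Probability that a Binomial(n, p) count falls between k_lo and k_hi inclusive."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_lo, k_hi + 1))

p = 1/6
# 40 "5"s already seen in 250 rolls; a total of 50 +/- 7 needs 3..17 more in 50 rolls:
print(binom_sum(50, p, 3, 17))    # about 0.9926
# Tighter quote from the remaining rolls alone: 8 +/- 3, i.e. 5..11 more "5"s:
print(binom_sum(50, p, 5, 11))    # about 0.82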
Exercises for Section X.1:


1. Determine the probability that between 20 and 25 heads will come up in a trial of 70 tosses of a
fair coin.
2. Determine the probability that between 3 and 12 heads will come up in a trial of 30 tosses of a
fair coin.
3. A gambler has placed bets that he will end up with at least 30 heads in a series of 50 tosses of a
fair coin.
(a) What is the probability that his bet will pay off?
(b) Suppose that he has made a side bet that exactly 30 heads will come up. What is the
probability that he will win this bet?
Now, suppose that 30 tosses have already occurred and heads has come up 23 times.
(c) Determine the probability that he will win his bet.
(d) Determine the probability that he will win the side bet.
4. An unfair coin is tossed 120 times. The probability of heads in one toss is 0.4.
(a) How many heads are expected to come up in this run of tosses?
(b) What is the standard deviation of the number of heads?
(c) What is the expectation value of the product of the number of heads and the number of tails
in this run? Is this product greater than, less than, or equal to the product of the expected
number of heads and the expected number of tails? How is this related to the expectation
value of the number of heads and its standard deviation?

5. Determine the expectation values $\langle k^3\rangle$ and $\langle k^4\rangle$ in a trial run of n tosses of an unfair coin whose probability of coming up heads is p.


6. A gambling enthusiast is interested in tossing an unfair coin that lands on heads 30% of the time a total of 1000 times. The expected number of heads is obviously 300. What range must the enthusiast state in order to quote a 90% CL?
7. Suppose that there are 30 people in a room, randomly distributed according to birthday. What
is the probability that none of them have the same birthday? What is the probability that at
least two of them have the same birthday? Ignore leap years, and assume that the probability
that any given person has a given birth date is 1/365. How many people do you need in order
for the probability that at least two of them have the same birthday to be larger than 90%?
Hint: The easiest way to do this problem is to concentrate on the probability that all of the
birthdays are different.

Section X.2: Continuous Probability Distributions


The binomial distribution is an extremely useful example of a probability distribution that is
applicable to a large variety of different scenarios. Such examples are almost always milked to
obtain information about somewhat different situations. One useful result that can be obtained from
the binomial distribution concerns situations in which the measured quantity can have a continuous
distribution of values. The possible outcomes of 300 rolls of a die are discrete: you could get no 5s,
1 5, 10 5s, 200 5s, etc, but you cannot get 166.7 5s. If we look at the number of 5s as a
fraction of the total number of rolls, however, then the possible outcomes approach a continuous
spectrum as the total number of rolls is increased. If the fraction of 5s is given by x, then the
probability of obtaining x 5s in n rolls is given by
n
n!
Pxn;n p xn (1 p)(1 x ) n
p xn (1 p )(1 x ) n .
xn
( xn)!(n xn)!

Staying away from x = 0 or 1, we see that all of the factorials are taken of large numbers as n is
increased. This allows us to approximate the probability using Stirlings result:
p
1 p 1
ln Pxn;n xn ln (1 x )n ln
ln 2 x(1 x)n .
1 x 2
x
This is accurate provided that both of the numbers $xn$ and $(1-x)n$ are sufficiently large, which occurs
for large n whenever x is not equal either to 0 or 1. Figures 1, 2, and 3 compare the approximation
(dark curve) to the exact results (red dots) for p = 1/6 and n = 10, 20, and 50, respectively. The
approximation is not terribly accurate for n = 10, but gets significantly more accurate as n increases.
Figure 1          Figure 2          Figure 3

The total probability of attaining between 8 and 16 "5"s in a sample of 50 rolls is given by the sum
$P_{8\le k\le16} = \sum_{k=8}^{16}\binom{50}{k}\frac{5^{50-k}}{6^{50}}.$
This sum can be manipulated in the following way:
$P_{8\le k\le16} = \sum_{k=8}^{16}\binom{50}{k}\frac{5^{50-k}}{6^{50}} = \sum_{k=8}^{16}\left[50\binom{50}{k}\frac{5^{50-k}}{6^{50}}\right]\frac{1}{50} = \sum_{k=8}^{16}\left[50\binom{50}{k}\frac{5^{50-k}}{6^{50}}\right]\Delta x.$

Since $\Delta x$ is very small (0.02), we can think of the final sum as a Riemann sum. Writing $P_{xn;n} \equiv P(x)$, the probability is given approximately by
$P_{8\le k\le16} \approx 50\int_{4/25}^{8/25} P(x)\,dx.$

Now, the actual function f (x) involves factorials that can, obviously, be interpreted in terms of the
Gamma function for non-integer arguments, but will not be easy to integrate. Our approximation,

p x (1 p)
nx

P( x)

(1 x)

n (1 x )

,
2 x(1 x)n
is easier to integrate and seems to be pretty accurate when n = 50. It results in a probability of 53.4%,
when P8 k 16 60.7% . This surprisingly large deviation stems from our approximation of the Riemann
sum, which gives the exact result, by the integral. It can be clearly understood by analyzing the graph
in figure 4, which shows the exact result as the area under the rectangles and the approximate result as
the area under the curve. The result is clearly underestimated by the area under the curve even though
the curve hits each probability on the left wall of the rectangles fairly accurately. The problem lies in
the fact that this distribution is not continuous; the probabilities associated with the binomial distribution are valid over an interval $\Delta x = 1/50$, which is not of differential order.

Figure 4
Although 1/50 is not small enough to result in a good continuous approximation, the
approximation gets better and better as n increases. This is a good thing, as there is really no need for
an approximation when n is small; we can readily compute the exact value. When n is very large, a
computation of the exact value involves many terms and is not as easy to determine. It is preferable to
have an accurate integral representation of the probability. The approximation to P(x) given above is not very easy to integrate, but we can make it a bit better by making use of the Taylor approximation used to derive Stirling's result. Expanding the logarithm of P(x) about x = p, we have
$\ln P(x) = -\frac{1}{2}\ln\bigl[2\pi p(1-p)n\bigr] - \frac{1}{2}\left(\frac{1}{p} - \frac{1}{1-p}\right)(x-p) - \left[\frac{n}{2p(1-p)} - \frac{p^2 - p + \frac12}{2p^2(1-p)^2}\right](x-p)^2 + \cdots.$
As in the Stirling case, the higher order terms in this expansion can be neglected whenever the deviation of x from p is at least as small as order of $1/\sqrt{n}$. Contributions from values of x that deviate from p by amounts larger than order of $1/\sqrt{n}$ are exponentially suppressed by a power of n, so for large n the important contributions are approximated well by
$\ln P(x) \approx -\frac{1}{2}\ln\bigl[2\pi p(1-p)n\bigr] - \frac{1}{2}\left(\frac{1}{p} - \frac{1}{1-p}\right)(x-p) - \left[\frac{n}{2p(1-p)} - \frac{p^2 - p + \frac12}{2p^2(1-p)^2}\right](x-p)^2.$
Note that the first derivative term in this expansion is not zero because x = p does not represent the
maximum probability unless p = 1/2. We are expanding about the expectation value of x rather than the
maximum probability value for convenience. Completing the square, we can write our result as
$\ln P(x) \approx -\frac{1}{2}\ln\bigl[2\pi p(1-p)n\bigr] + \frac{\left(p - \frac12\right)^2}{2\left[(n+1)p(1-p) - \frac12\right]} - \frac{(n+1)p(1-p) - \frac12}{2p^2(1-p)^2}\left(x - x_{MP}\right)^2.$
This is not an obvious improvement, but it does give us the estimate
$x_{MP} = p + \frac{(2p-1)\,p(1-p)}{2\left[(n+1)p(1-p) - \frac12\right]}$
for the value of x associated with maximum probability. This approximation gives $x_{MP} = 0.15963$ for p = 1/6 and n = 50, while the exact value, obtained by using the Gamma function rather than Stirling's approximation, is $x_{MP} = 0.15992$.
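Both numbers are easy to confirm. The sketch below is mine: it evaluates the closed-form estimate just derived and then finds the maximum of the exact probability, with the factorials continued through the log-Gamma function and a simple grid search standing in for a proper maximization.

from math import lgamma, log

p, n = 1/6, 50

# Closed-form estimate for the most probable fraction, from completing the square:
x_mp = p + (2*p - 1) * p * (1 - p) / (2*(n + 1)*p*(1 - p) - 1)

# "Exact" maximum: maximize ln P(x) with the factorials written as Gamma functions.
def ln_prob(x):
    k = x * n
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))

xs = [i / 100000 for i in range(1, 100000)]
x_exact = max(xs, key=ln_prob)

print(round(x_mp, 5), round(x_exact, 5))   # about 0.15963 and 0.15992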
Although it is not pretty, our approximation to the function P(x) is Gaussian in nature and
therefore very easy to integrate numerically. As before, this approximation is not terribly accurate
either when n is not very large or when the probabilities we are interested in lie far from the region of
maximum probability. This is expected, as our approximations explicitly required large n and were
focused on the maximum probability region. The probability that between 8 and 16 "5"s will occur in a run of 50 rolls is approximated by 51.3%, while the exact value is 60.7%. The probability that between 90 and 110 "5"s will occur in a run of 600 rolls is approximated by 72.7%, while the exact
value is 75%. For even larger numbers of rolls, the accuracy improves significantly. Some
characteristic results are shown in table 1 to give you a feel for the regions in which this approximation
is expected to be accurate. The error column gives the difference between our approximation and the
exact result, divided by the exact result. Note that the approximation is not very accurate far from the
maximum around x = 1/6 even for large values of n. Looking at the probability column, however, we
see that these regions of inaccuracy are also associated with extremely small probabilities and are
therefore not terribly important anyway. All that we really need in these regions is an accurate
assessment of the order of magnitude of the probability, which is definitely accomplished by the
approximation.
n
x range
Probability
Approximation Error
100
0.1 0.2
0.8268
0.03
100
0.2 0.3
0.219
0.235
1000
0.15 0.18
0.808
0.016
1000
0.2 0.3
0.003
0.32
10,000
0.16 0.17
0.7829
0.00456
10,000
0.2 0.3
0.879
1.45 1018
Table 1
Our approximation has resulted in a complicated-looking expression that is only accurate over
a small window of x values when n is very large, so one may wonder how useful it actually is. For the
region in which the probability is appreciable, our approximation works very well when n is large and
is much easier to calculate. The probability that between 99,000 and 102,000 "5"s will appear in a run of 600,000 rolls is approximated by 0.999736, while the exact result is 0.99974. The approximation took Mathematica less than $10^{-14}$ seconds155 to compute, while the exact result took more than 7 hours. The exact result may actually agree with the approximation to more decimal places; I just don't have the will to wait another 7 hours for a better result! The exact result is very difficult to compute for large n because it requires the computation of many large factorials that are not easily handled. Mathematica has trouble even with small intervals, like an outcome of between 101,000 and 101,100 "5"s in 600,000 rolls. The approximation in this case gives 0.000195852 in a negligible amount of time, while the correct result of 0.000201625 takes more than 12 minutes. The accuracy in this case is compromised by the small probability, but we are still less than 3% off.

155 This is by its own estimation; my feeling is that it took a little longer than that!
I have been careful not to make too many assumptions about the size of n in the above work,
which complicated matters a lot more than necessary. Applications of statistics in the hard sciences
are usually associated either with the treatment of experimental errors or the behavior of systems for
which the value of n is extremely large, of order $10^{23}$. In the former case, one is only interested in approximate results. In the latter, the value of n is large enough (with room to spare!) for us to make significant simplifications without altering the results very much. For this reason, the approximation
$P(x) \approx \frac{\exp\left[-\dfrac{n(x-p)^2}{2p(1-p)}\right]}{\sqrt{2\pi p(1-p)n}} = \frac{\exp\left[-\dfrac{(k-np)^2}{2np(1-p)}\right]}{\sqrt{2\pi p(1-p)n}} = \frac{\exp\left[-\dfrac{\left(k-\langle k\rangle\right)^2}{2\sigma_k^2}\right]}{\sqrt{2\pi\sigma_k^2}}$
is often used. This approximation is much simpler than those found above, and represents an extremely
important result, the normal distribution. The last expression re-introduces the standard deviation $\sigma_k$ and the mean $\langle k\rangle$ found above. It can be interpreted as a standard result for distributions that are really continuous. Standard distributions can be characterized by their mean and their standard deviation, and are often approximated quite well by this normal curve. The mean gives the value with maximum probability and the standard deviation gives a measure of the width of this maximum. It is clear from the above approximation to the maximum that it occurs closer and closer to x = p as n increases. The standard deviation of x from its maximum at p is given by $\sqrt{p(1-p)/n}$, so the distribution congregates
more and more about x = p as the value of n is increased. This effect is illustrated in figure 5, which
shows our approximation for n = 100, 1000, and 10,000.
Figure 5
The normal, or bell, curve is obviously symmetric about the maximum, so is not appropriate for
small n if the probability p does not equal 1/2. As n increases, however, the exact binomial distribution
becomes more and more symmetric about its maximum, at least in the neighborhood of the maximum.
For the binomial distribution, the actual values of k range from 0 to n so the distribution cannot be
symmetric about the maximum unless p = 1/2. The continuous bell curve distribution, in contrast,
allows k to take all values from negative infinity to infinity. When n is very large, this difference is
unimportant; values of k that are either larger than n or smaller than 0 are suppressed by order of $e^{-n}$. The probability of finding k between -10 and -9 will, of course, be incorrectly assessed by this technique, so we can only really rely on our approximation when we are close to the maximum. This is not really a big deal, as applications rarely find interest in regions associated with probabilities as small as $10^{-40}$. As long as the probability p is not equal to 0 or 1, there will be a value of n large enough that
the maximum is essentially contained within $k \in [0, n]$ and values outside this range do not contribute
substantively to the bell curve distribution.
When a measurement has a continuous outcome, the normal curve is often used as an
approximation to the probability of various outcomes. In this case, the distribution is thought of as
representing the probability that an experiment will produce a given result. One experiment consists
of n rolls of the die in our previous notation. It is important not to confuse the idea of a single trial, or
roll of the die, with that of a single run of trials, or set of n rolls of the die. Using this idea, it is easy to
make the transition from a discrete spectrum to a continuous one. The value of k = xn replaces the
value of x as the measured quantity. The number of trials, n, no longer has any meaning. It has
formally been taken to infinity. According to our above analysis, the probability that the value of k will
be found to be between kmin and kmax is given by
P kmin k kmax n

kmax n

kmin n

P ( x) dx

kmax

kmin

f (k ) dk ,

where
2

exp k k 2 k2

f (k )
2
2 k
is the probability density. For continuous probability distributions, it no longer makes sense to ask for
the probability that a specific outcome is obtained. Instead, we ask for the probability that the value of
k will be found in some window of size dk about a specific value. If the probability distribution is
continuous (doesn't have jumps), then this probability will be linear in dk for small window sizes. The factor multiplying the window size, f(k), is the probability per unit window size, or probability density. Continuous distributions are associated with probabilities that are given by the area under the probability density between two extremes. If the normal curve is applicable, then the standard scale used to measure the window size is given by the standard deviation $\sigma$ (I will drop the k subscript from now on). Writing $z = \left(k - \langle k\rangle\right)/\sigma$, the probability is given by
$P\left(k_{min} \le k \le k_{max}\right) = \frac{1}{\sqrt{2\pi}}\int_{z_{min}}^{z_{max}} e^{-z^2/2}\,dz.$
People are usually interested in the probability that a measurement will fall within a certain distance from the mean value, so we are usually interested in the probability
$P\left(\langle k\rangle - z\sigma \le k \le \langle k\rangle + z\sigma\right) = \sqrt{\frac{2}{\pi}}\int_0^z e^{-u^2/2}\,du = \sqrt{\frac{2}{\pi}}\sum_{n=0}^{\infty}\frac{(-1)^n z^{2n+1}}{2^n\,n!\,(2n+1)}.$
The probability that a measurement will lie within one standard deviation of the mean is given by
$P_1 = \sqrt{\frac{2}{\pi}}\sum_{n=0}^{\infty}\frac{(-1)^n}{2^n\,n!\,(2n+1)} = 0.68268949\ldots,$
so measurements will lie within one standard deviation approximately 68.3% of the time. Two standard deviations give the probability
$P_2 = \sqrt{\frac{8}{\pi}}\sum_{n=0}^{\infty}\frac{(-1)^n\,2^n}{n!\,(2n+1)} = 0.9544997\ldots,$
and three give
$P_3 = \sqrt{\frac{2}{\pi}}\sum_{n=0}^{\infty}\frac{(-1)^n\,3^{2n+1}}{2^n\,n!\,(2n+1)} = 0.9973002\ldots.$
We can be approximately 95.5% certain that a measurement will lie within two standard deviations of
the mean, and 99.7% sure that it will lie within 3. These results are widely applicable and easily stated
in layman's terms, so find broad use in scientific literature. For 3000 die rolls, we can state that the number of "5"s will lie between 480 and 520 at 68.3% CL, between 459 and 541 at 95.5% CL, and
between 439 and 561 at 99.7%CL. The exact values for 3000 rolls are 68.5%CL, 95.8%CL, and
99.7%CL, respectively, so the approximation is fairly accurate for this moderately large value of n.
It is often useful to have a probability that a measurement lies outside a given range rather than
inside. For that, we use the asymptotic expansion
$P\left(\left|k - \langle k\rangle\right| \ge z\sigma\right) = \sqrt{\frac{2}{\pi}}\int_z^{\infty} e^{-u^2/2}\,du \approx \sqrt{\frac{2}{\pi}}\,e^{-z^2/2}\left[\frac{1}{z} - \frac{1}{z^3} + \frac{3}{z^5} - \cdots\right].$
For example, the probability that a measurement naturally lies outside of the $4\sigma$ range is approximately $6.35\times10^{-5}$. This is very useful in putting experimental results into perspective,
especially when one is trying to tout a new result. It is very effective to say that a result could not be
explained by the standard theory unless it was associated with a 0.00635% statistical fluctuation.
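The asymptotic expansion is easy to compare against the exact tail probability, which Python exposes through the complementary error function. The sketch below is mine; the function names and the choice of three terms are illustrative, not from the text.

from math import erfc, exp, pi, sqrt

def tail_asymptotic(z, terms=3):
    """First few terms of the asymptotic expansion for P(|k - <k>| >= z sigma)."""
    # sqrt(2/pi) e^{-z^2/2} [ 1/z - 1/z^3 + 3/z^5 - ... ]
    series, coeff = 0.0, 1.0
    for m in range(terms):
        series += coeff / z**(2 * m + 1)
        coeff *= -(2 * m + 1)
    return sqrt(2 / pi) * exp(-z**2 / 2) * series

def tail_exact(z):
    """Same probability from the complementary error function."""
    return erfc(z / sqrt(2))

for z in (2, 3, 4):
    print(z, tail_asymptotic(z), tail_exact(z))
# at z = 4 both give roughly 6.3e-5, the value quoted in the text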
Analysis of errors is quite simple using the normal curve. Suppose we have two independent
measurements x and y with average values $\bar x$ and $\bar y$ and standard deviations $\sigma_x$ and $\sigma_y$. The probability density that a measurement will return x and y is proportional to
$\exp\left[-\frac{\left(x - \bar x\right)^2}{2\sigma_x^2} - \frac{\left(y - \bar y\right)^2}{2\sigma_y^2}\right],$
so lines of constant probability density are ellipses in the xy plane with semi-major axes proportional to
the errors in x and y. These are the celebrated error ellipses that are often seen in experimental talks.
The error in the product of two measurements can also be analyzed in this manner. The expectation
value of the product xy is easily seen to be $\bar x\bar y$, as the integrals decouple. The standard deviation is, as always, given by
$\sigma_{xy}^2 = \langle x^2 y^2\rangle - \langle xy\rangle^2.$
I will leave it to you to show that this equals
$\sigma_{xy}^2 = \bar x^2\sigma_y^2 + \bar y^2\sigma_x^2 + \sigma_x^2\sigma_y^2.$
Hint: you already know the standard deviations of x and y separately. Assuming that the errors are much smaller than the central values, we ignore the last term and re-write our result as
$\frac{\sigma_{xy}^2}{\left(\bar x\bar y\right)^2} = \frac{\sigma_x^2}{\bar x^2} + \frac{\sigma_y^2}{\bar y^2}.$
Thus, the square of the relative error in the product is the sum of the squares of the relative errors in
the quantities we are multiplying. This process is referred to as adding the relative errors in
quadrature. We can only add errors in quadrature when they are independent of each other, as the
above analysis depended crucially on the fact that the integrals decoupled. Correlated errors require a
somewhat more sophisticated analysis.
Exercises for Section X.2:
1. Suppose that the outcome of a given experiment is known to follow a normal distribution for
which the average value is 5.72 seconds with a standard deviation is 0.13 seconds. What is the
probability that the outcome of this experiment will give a value larger than 5.8 seconds? What
about 5.9 seconds?
491

Advanced Mathematical Techniques

2. Suppose that the outcome of a given experiment is known to follow a normal distribution for
which the average value is 4.315 Joules and the standard deviation is 0.068 Joules. What is the
probability that the outcome of this experiment will give a value larger than 4.33 Joules? What
is the probability that it will give a value smaller than 4.21 Joules?
3. Suppose that the outcome of a given experiment is known to follow a normal distribution for
which the average value is 8.93 Newtons and the standard deviation is 0.58 Newtons. What is
the probability that the outcome of this experiment will give a value between 7.2 and 7.5
Newtons?
4. Suppose that the outcome of a given experiment is known to follow a normal distribution for
which the average value is 223.6 Kelvin and the standard deviation is 3.8 Kelvin. What is the
probability that the outcome of this experiment will give a value between 231 and 238 Kelvin?
5. We have been using the idea of a standard deviation of a probability distribution in all of the
preceding, assuming that we have access to the exact average value and standard deviation of
the relevant distribution. This information is not easily obtained, as all we have experimental
access to is a certain number of experimental results. In order to estimate the mean and standard
deviation of a given distribution, we perform some measurements and try to use these estimates
along with some results from probability theory to determine approximations to the exact
results. Suppose that n measurements of the quantity k have been made, resulting in the values

k
j

n
j 1

.
n

(a) Show that the expectation value of the sum

k
j 1

is given by n k n k .

Our

approximation to the exact mean of the distribution is therefore given by


1 n
k k j kmean ,
n j 1
where kmean is the mean value of our n approximations.
(b) Show that the square of the standard deviation of a sum of two terms k1 and k2 is given by
the sum of the squares of the standard deviation of the two terms,
k21 k2 k21 k22 .
Hint: work directly from the definition of standard deviation; different measurements are
not correlated with each other.
(c) Use your result from part (b) to show that the standard deviation of the approximation kmean
is given by

k2

k2

,
n
provided that all of the k j have the same standard deviation.
mean

492

Section X.2: Continuous Probability Distributions

(d) Use your results from parts (a) and (c) to find the expectation value of the sum
n

k
j 1

kmean

in terms of the standard deviation k of the k j . Note that while the k j are independent of
one another, they are not independent of kmean .

In order to separate this dependence

effectively, we need to write k j kmean k j k k kmean . The actual mean of the


distribution, k , is a constant independent of k j . How is your result related to the actual
standard deviation of the distribution?

6. Suppose that five different measurements of a given length are done, leading to the values
3.213 m, 3.219 m, 3.217 m, 3.221 m, and 3.215 m. Use the results of problem 5 to estimate the
mean and standard deviation of the underlying probability distribution governing the
measurements.
(a) Using these approximations, determine the probability that the next measurement will yield
a value larger than 3.25 m.
(b) Suppose that another measurement had been taken before we made our approximations to
the exact mean and standard deviation of the distribution, and it gave the value 3.258 m.
What will the new estimates of the mean and standard deviation be? How does this affect
the probability that the next measurement will yield a value larger than 3.25 m?
(c) Explain what part (b) illustrates about the nature of our approximation to the mean and
standard deviation of the underlying distribution. How is this similar to the examples
involving the hapless gambler in the last section?
7. Suppose that we are interested in a binomial distribution for which the probability of a single
positive outcome is 64%.
(a) What does the complicated approximation to P(x) give for the probability that between 63
and 66% of the outcomes in a run of 20 trials will be positive? What is the relative error in
this approximation? How do these numbers change if the number of trials is 2000?
(b) Repeat the analysis of part (a) using the normal distribution. What is the difference between
these two approximations, and how does it change when the number of trials is increased?
(c) What does the complicated approximation to P(x) give for the probability that between 70
and 75% of the outcomes in a run of 20 trials will be positive? What is the relative error in
this approximation? How do these numbers change if the number of trials is 2000?
(d) Repeat the analysis of part (c) using the normal distribution. What is the difference between
these two approximations, and how does it change when the number of trials is increased?
8. Prove that the relative error in a product of two uncorrelated quantities is properly given by the
quadrature sum of the relative errors of the two quantities, provided that both fractional errors
are small. What does the term small mean in this context? How would correlation between
the two quantities change your treatment? What part of it would fail?

493

Advanced Mathematical Techniques

Section X.3: Multinomial Distributions

Closely affiliated with the binomial distribution is a class of multinomial distributions that
come into play when one is concerned with a situation in which several different outcomes are
possible. As a simple example, we can consider the roll of a die for which we are concerned with all
of the possible outcomes instead of just the elusive 5. What is the probability that 10 rolls of the die
will result in 3 1s, 2 2s, no 3s, 3 4s, 1 5, and 1 6? This question is easily answered once we
10
understand the process. To begin with, there are different ways in which the 3 1s can be
3
distributed among the 10 rolls. Once these are chosen, there are 7 more choices. The number of ways
7
in which the 2s can be chosen is then . Continuing this process, we arrive at a total of
2
10 7 5 5 2 1

3 2 0 3 1 1
ways in which this outcome can occur. Happily, this product can be simplified immensely by writing
it in terms of factorials:
10 7 5 5 2 1
10!
420 .

3
2
0
3
1
1
3!
2!
0!3!1!1!

The probability that any given roll of the die will give a specific outcome is 1/6, so the total probability
of this outcome is
3

10!
1 1 1 1 1 1 420
6
10 6.946 10 .
3! 2! 0!3!1!1! 6 6 6 6 6 6
6
This can obviously be generalized to an arbitrary number of possible outcomes with different
probabilities as
m pnj
n ! m ni
j
.

Pn1 , n2 ,,nm ;n m
p
n
!

i
n
i 1
j 1
j!
nj !
j 1

In this expression, there are m different possible outcomes and we are running n trials. The jth
outcome has probability p j and we are looking for the probability that there will be n j results with
this outcome. The multinomial distribution is a generalization of the binomial distribution, as the
n
multinomial p1 p2 pm can be expanded as

p1 p2 pm

n1 , n2 ,, nm

pj j

j 1

nj !

n !

The sum takes place under the restriction n1 n2 nm n . It is an interesting fact of number
m

theory that the quotient n !

n ! is a whole number whenever the n


j 1

satisfy this condition.

This result must be taken with a grain of salt, as we have implicitly assumed that each of the
different possible outcomes are considered as distinct. When answering a question like, What is the
probability of obtaining a royal flush?, we need to consider the fact that there are four possible royal
flushes that are all the same as far as the question is concerned. This leads to a factor of 4 in the
probability. Similarly, a treatment of the number of ways in which 4 rolls of a die will lead to exactly
494

Section X.3: Multinomial Distributions

2 rolls that are the same will require a factor of 4 choose 2 = 6. This consideration is extremely
important, and will often lead to a large correction factor for probabilities associated with a large
number of different outcomes that are actually considered the same. Calculations of probabilities
should always start with this idea; we first fix a single different outcome and determine how many of
these different outcomes there are, then determine the probability of attaining this specific outcome.
The probability we are interested in is the product of these two.
The multinomial distribution can be manipulated in the same manner as illustrated above with
the binomial distribution. Derivatives taken of the multinomial expansion with respect to p j give
information about expectation values associated with the jth outcome. When taken by themselves,
these derivatives do not give any new information; we could just as easily have declared the rest of the
outcomes the same and proceeded with the binomial expansion, as we did above. The real difference
associated with the multinomial distribution is the possibility of correlation between the different
outcomes. We can calculate n j and n 2j from the binomial distribution, but what about n j n j
with j j ? Common wisdom would indicate that this should simply equal n j

n j , but this would

imply that the two outcomes are not correlated and behave independently of one another. This is
clearly not the case, as an increase in one of the n j means a decrease in another because the sum is
constant. The exact result can easily be determined as
n j n j n(n 1) p j p j
from the expansion.

This result is smaller than

nj

n j n 2 p j p j by np j p j because of the

correlation.

Exercises for Section X.3:


1. Determine the probability that five consecutive rolls of fair die will lead to exactly two 3s, one
4, and one 6.
2. Determine the probability that seven consecutive rolls of a fair die will lead to at least two 4s
and three 5s.
3. Determine the probability that throwing three fair die simultaneously will lead to a sum of 14.
Assume that the interaction between the die does not favor any one number over the others.
4. Determine the probability that throwing four fair die simultaneously will lead to a sum of 19.
Assume that the interaction between the die does not favor any one number over the others.
5. Determine the expectation value of the product of the number of 2s, the number of 3s, and
3
the number of 6s attained in a run of n rolls of a fair die. Why doesnt this equal n 6 ?
What is the difference between your result and this nave expectation? Explain the reason why
the nave expectation is incorrect.
495

Advanced Mathematical Techniques

6. A room contains 7 people, selected at random. You are interested in determining the probability
of various distributions of the day of the week on which the people were born. Assume that all
days of the week are equally likely.
(a) Determine the probability that all people were born on different days of the week.
(b) Determine the probability that exactly two people were born on the same day of the week.
(c) Determine the probability that exactly two pairs of people were born on the same day of the
week.
(d) Determine the probability that exactly three people were born on the same day of the week
and none of the other people share a day of the week on which they were born.
(e) Show that the sum of the results for parts (a) (d) is less than 1, and indicate what
distributions the remaining probability consists of. Pick one of these, and show that its
probability is smaller than the difference between your sum and 1.
(f) Determine the expectation value of the product of the number of people born on Tuesday
and the number of people born on Saturday, as well the standard deviation of this quantity.
Why isnt this expectation value just equal to 1? Explain.
7. Lets re-visit the birthday problem from section 10.1, using the multinomial distribution.
Consider N people in a room, and take the number of days in a year to be 365.
(a) Determine the probability that all N people have different birthdays.
(b) Determine the probability that exactly two of the people have the same birthday.
(c) Determine the probability that there are two pairs of people who have the same birthday, but
everyone elses birthday is different.
(d) Determine the probability that there are three pairs of people who have the same birthday,
but everyone elses birthday is different.
(e) Determine the probability that exactly three people in the room have the same birthday, and
everyone elses birthday is different.
(f) Take N = 30, and show that the sum of the results for parts (a) (e) is very close to 1.
Explain what possibilities the remaining probability consists of. Take one of these
possibilities and compute its probability. Show that your result is smaller than the difference
between the sum and 1.
(g) Explain how the analysis of this problem differs from that seen in section 1. How does the
multinomial distribution simplify the analysis of multiple possible outcomes? Is it efficient
to use this technique directly to answer the question, What is the probability that at least
two people have the same birthday??. Explain why or why not.

Section X.4: Statistical Mechanics


Now that we have gotten through the basics, its time to turn to applications. One of the most
important and useful results from the use of statistics to describe physical materials is the Boltzmann
factor, named for Austrian physicist Ludwig Boltzmann, who first proposed it probably in the late
1870s. Imagine a collection of identical atoms156 with many different energy states available to them.
You can think of the energy states as representing different speeds if you like, but this restriction is not
necessary. At a given energy E, the question is how many atoms or molecules occupy each available
state. Boltzmanns answer to this question begins with an assessment of how many different ways a
particular distribution of energy can be attained. This assessment is identical to that we used above to
156

This requirement of identical atoms is not fundamental, and can be removed later without making substantive changes to
the results, but it is best to begin with the simplest case.

496

Section X.4: Statistical Mechanics

determine the total number of ways in which 43 rolls of 5 can be obtained in a total of 300 rolls.
Boltzmann set the total energy constant and assigned a state variable to the number of possible ways
in which atoms could occupy the available states. A macroscopic state is a set of occupation numbers
for each of the available microscopic or quantum states. The number of ways in which a given
macroscopic state can be occupied is given the symbol , and Boltzmann assigned it the entropy
S k B ln .
The logarithm ensures that the total entropy of two independent systems is the sum of the entropies of
the two systems considered separately, making entropy an extensive variable like energy or volume.
This definition of entropy is one of the crowning achievements of Boltzmanns career, and is inscribed
in his tombstone in Vienna. The constant k B 1.38065065(24) 10 23 J K is called the Boltzmann
constant, and has fundamental meaning concerning the relationship between energy and temperature,
but its real purpose is as a conversion constant relating our unit of energy to our unit of temperature; its
value is often simply taken as 1 by physicists working in statistical mechanics.
The introduction of entropy is extremely important, as it is a state variable. State variables
depend only on the macroscopic state of a system of atoms and molecules. Some examples are
pressure, volume, temperature, and internal energy. The distinction represented by state variables is
extremely important in thermodynamics, as materials subjected to violent reactions, like gasoline
vapor in a combustion engine, go through a very complicated intermediate set of macroscopic states
that is not easily analyzed. State variables are like a rock in the storm of intermediate states: they are
indifferent to the previous state of a system and depend only on what is happening at the moment. If a
macroscopic state has a well-defined state variable, then the value of that state variable is independent
of the past. There are many quantities in thermodynamics that are not so well-defined. Work and
heat, for example, represent flows of energy from one system to another. It is tempting to think of
them as representing separate flows of energy that can be tracked independently of one another, but
this is not the case. Only the sum of the heat and work energy added to a system represents a welldefined thermodynamic quantity. Denoting the heat transfer as Q and the work done on the system as
W, we have
E Q W ;
the introduction of work and heat energy serves to change the energy held by the system. This
statement seems very obvious, but it is one of the major tenets of thermodynamics. The total energy E
is a state variable, so the sum of heat and work added to a system is independent of the manner in
which the process takes place. As long as the initial and final energies coincide, all processes yield the
same value for this sum. This statement is one form of the first law of thermodynamics. Entropy
allows us to make similar statements about heat and work, as it is also a state variable. We will see
that the idea of entropy leads directly to the idea of temperature and, in fact, represents a convenient
way to define temperature in a manner that is independent of any measurement device.
Let us suppose that there are m microscopic energy states available to our system of atoms and
molecules, with the jth state requiring energy j . The number of ways a given macroscopic state
consisting of the occupation numbers n1 for state 1, n2 for state 2, and so on, can be occupied is given
by
n!
.
m
n
!
j
j 1

In our discussion of the multinomial distribution, we assigned different probabilities for each state
occupation. Boltzmann made the fundamental assumption that all of the different states are equally
probable for occupation, so the probability of finding a given macroscopic state is proportional to .
497

Advanced Mathematical Techniques

Boltzmann then reasoned that a system left alone for a long time would eventually settle into the
macroscopic state with maximum probability, or, along with his definition, maximum entropy. This is
actually the most fundamental statement of the second law of thermodynamics: a system left on its
own will act to increase its entropy whenever possible. All of the other, more flamboyant, statements
can be derived from this one.
We can determine the state of maximum probability by maximizing under the constraint
that the total energy and number of atoms is constant. The easiest way to do this is by using Lagrange
multipliers. We consider the function
m

F n1 , n2 , , nm ; , ln j n j E n j n
j 1

j 1

m
m

ln n ! ln n j ! j n j E n j n
j 1
j 1

j 1

and search for critical points. The derivative with respect to n j gives

d
ln n j ! j 0 ,
dn j
or
d
j
ln n j ! .
dn j
The difference between two of these equations gives
d
d
j j
ln n j !
ln n j ! .
dn j
dn j
Now, we are thinking of atoms here. The occupation numbers of the vast majority of occupied
microscopic states must be very large, of order 105 or more. In this case, it is quite appropriate to use
Stirlings approximation for the factorials. The logarithmic derivative of the factorial function is
especially simple with the Stirling approximation,
d
1
.
ln x ! ln x
dx
2x
The factor of 1 2 x is unimportant when x is very large, so we have
n

j j ln n j ln n j j j ln j n j n j e j j .
n j

Substituting this back into the equation for n j gives

ln n j ! ln n j ln n j j j ln n j j n j e e j .
dn j
The final result is that the occupation number of the jth state is equal to a constant times the
exponential of j . The constant is determined by the demand that the total number of atoms is n. It
is interesting to note that the result we have obtained is actually independent of our requirement that
the total number of atoms remain constant. In the absence of , we simply have the exponential.
This will be important to us later in our discussion of photons and the energy they carry.
All that is left is a determination of the value of the Lagrange multiplier in terms of a
physical quantity. There is a very neat way to do this that also emphasizes the importance of
thermodynamic relations and state variables. Suppose that we modify the macroscopic state slightly

498

Section X.4: Statistical Mechanics

by moving one atom from state j to state j . This requires an increase of j j in energy, so we
have
dE j j .
The change in entropy is given by
dS k B d ln k B ln n j 1 ! ln n j 1 ! ln n j ! ln n j !

k B ln

nj
n j 1

k B ln n j ln n j k B j j

so we have

dS k B dE .
Now, the change in internal energy is always equal to the sum of the heat added to the system and the
work done on it. Infinitesimal amounts of mechanical work are, as always, defined as the dot product
of the force applied and the infinitesimal displacement of the walls of the system. The relation above
is between two state variables, so cannot depend on the manner in which the change took place. We
can certainly imagine this change taking place with no change in volume at any time for the system, as
only one atoms state was changed (a change in volume would result in many atoms being displaced).
Such a process is called reversible because it is not forbiddingly improbable for that single atom to
revert back to its original state. Reversible processes must consist of a series of well-defined
macroscopic states; the pressure, internal energy, and other state variables must be well-defined at each
stage in the process. For this reason, reversible processes are sometimes called quasi-static processes.
Under this reversible change, there is no work done on the system and the energy change must be
attributed exclusively to heat. Thus,
dS k B Qr .
The lowercase delta is meant to remind us that this is an infinitesimally small amount of heat energy.
We cannot use a d because heat exchange does not represent the change in any well-defined quantity.
This relation is actually quite stunning, as it implies that multiplying the heat exchange by k B
converts it to the change in a state variable. As discussed above, heat exchange does not represent the
change in a state variable and will, in general, depend on the manner in which the change took place.
In the language of multivariable calculus, heat exchange does not represent a conservative vector field.
Its path integrals depend on the path taken between the initial and final states, so it cannot be viewed as
the change in any well-defined potential function. The factor k B changes this nonconservative
vector field into a conservative one.
As an example of this phenomenon, consider the vector field
F 3 x 3 y 3 i 2 x 4 y 2 j .
This vector field is not conservative, as can easily be seen from the fact that its mixed partials do not
agree. Therefore, it cannot be the gradient of a potential function and its line integrals will be pathdependent. Dividing it by xy, however, we arrive at the conservative vector field
1
F 3x 2 y 2 i 2 x3 y j x3 y 2 .
xy
The factor 1 xy is called an integrating factor because it allows us to relate F to the gradient of a
scalar function and immediately integrate the gradient. When the factor is in the denominator, as in
this case, it is sometimes referred to as an integrating denominator. The two are obviously equivalent.
This technique is extremely useful in solving many first order differential equations and changing the
form of many second and higher order differential equations. Suppose we are interested in solving the
differential equation
499

Advanced Mathematical Techniques

3x3 y 3 2 x 4 y 2

Re-writing it as
3 x 3 y 3 dx 2 x 4 y 2 dy 5 y dx

dy
5y ;
dx

F dr 5 y dx

y (1) 1 .

xy x 3 y 2 dr 5 y dx ,

we divide by xy and integrate from (1,1) to (x, y). This gives


x3 y 2 1 5ln x ,
or
5ln x 1
,
y
x3
the solution to our equation. When solving equations like this one, it is customary to move everything
to one side and find an integrating factor for
3x3 y 3 5 y dx 2 x 4 y 2 dy
instead of just F dr . This process guarantees that we will be able to find a solution, as the 0 on the
right-hand-side will not change no matter what we multiply it by. It is a standard result of differential
equations that every well-defined first order differential equation has an infinite number of integrating
factors. Finding one, on the other hand, is not at all guaranteed to be easy.
In thermodynamics, the process is a bit more complicated because there are many state
variables that we have to consider. It is no longer obvious that there will be an integrating factor for
every inexact differential.157 The existence of an integrating factor for a differential expression
involving more than two variables is, in fact, a very special property that can only be exhibited by a
small class of differentials. The general theory of inexact differentials and the existence of an
integrating denominator is due to the German mathematician Johann Friedrich Pfaff, and general
differentials are often referred to as Pfaffian expressions. Suppose we have the Pfaffian expression
Qr and are interested in finding out whether or not there exists an integrating denominator. If there
is such an integrating denominator, call it T , then it must be the case that
Qr
dS
T
for some function S. The d in front of S implies that S is a state variable; its differential can be
expressed in terms of the gradient of a potential function.158 Consider the equation Qr 0 . We are
searching for curves in our state variable space that pass through a given point and satisfy this
equation. If Qr admits an integrating denominator, then it is obvious that these solutions satisfy
S = constant. Now, the equation S = constant is a constraint on the state variables on which S depends.
Because of this restriction, it is not possible to reach all points neighboring our initial condition while
still satisfying the equation. Think of a surface in three dimensions. Given any point on the surface,
there are certainly other points in its immediate vicinity that do not lie on the surface. The Greek
mathematician Constantin Caratheodory proved the converse of this result in 1909, showing that a
Pfaffian expression admits an integrating denominator if and only if there are points in the vicinity of
every point that cannot be reached along a curve satisfying Qr 0 beginning at that point. Thus, the
existence of an integrating denominator is implied by the existence of neighboring points that are
inaccessible to the system along Qr 0 .
157

The term inexact is used in place of nonconservative whenever we are considering processes more general than vector
fields in physical space.
158
This potential function will, in general, depend on other state variables like pressure and internal energy. You should
think of a gradient as consisting of derivatives with respect to independent state variables.

500

Section X.4: Statistical Mechanics

Processes in which Qr 0 are called adiabatic processes. The question of whether or not the
heat transfer admits an integrating denominator is equivalent to the question of whether or not there are
states inaccessible to a system through a reversible adiabatic process.159 The second law of
thermodynamics indicates that it is impossible for heat to flow from a colder reservoir to a warmer
reservoir without work being expended to accomplish this flow. Thus, there are definitely states that
are inaccessible to a system through an adiabatic quasi-static process as a consequence of the second
law; the heat flow must admit an integrating denominator. Caratheodorys theorem is quite general,
and can be applied to essentially every system. It can also be applied to a combination of two systems.
Suppose that we have two systems isolated from their surroundings, but in thermal contact with each
other. This means that heat is allowed to flow between the two systems, but no heat can flow in or out
of the combined system. If the amount Qr of heat flows from one system to the other in a quasistatic reversible process, then this heat flow must admit an integrating denominator. This denominator
cannot be different for the two systems because the heat flow between them is shared. Thus, the
integrating denominator must be the same for all systems in thermal contact that have reached
equilibrium.160 This promotes the identification of the integrating denominator with the temperature
of the system.
Going back to our consideration of the small, reversible change of one atom from microstate j
to microstate j , we see that the heat flow is given by
dS
.
Qr j j
kB
Dividing by the temperature T, we see that
Qr
dS

T
k BT
must represent the change in a state variable. The entropy is certainly a state variable, so we can take
1 k BT , where the symbol has been introduced for convenience. This allows us to write
n j ce

,
m

where c is a constant determined by the requirement that

n
j 1

n . Apparently,

m
c ne j .
j 1

It is customary to define Z e

so that the probability of occupation for state j is given by

j 1

Pj e

Z .

The function Z is known as the partition function for the system, and has a large number of
interesting properties. See if you can show that the expectation value of the energy held by a single
atom is given by
d

ln Z .
d

159
160

The subscript r serves as a reminder that we are only considering reversible processes.
If the systems have not reached equilibrium, then the heat flow will not be reversible.

501

Advanced Mathematical Techniques

The probability of occupation of a state with energy j is proportional to the Boltzmann factor,
j

e . Thus, a collection of atoms in thermal equilibrium at temperature T occupies states of energy


j according to the value of the ratio j k BT . If the energy of a microstate is much larger than kBT ,
then it is unlikely that that state will be occupied. If, on the other hand, the energy is much smaller
than k BT , the state is highly likely to be occupied. For this reason, the quantity k BT can be thought of
as the amount of energy allotted to atoms, on average, in thermal equilibrium at temperature T. A
macrostate in which a few of the atoms hoard the majority of the energy will naturally evolve in such a
way that all of the available states are occupied according to the Boltzmann factor. This statement
follows from the second law of thermodynamics, but in this form it is really a statement about
probability: just as it is unlikely that all 300 of a run of die rolls will come up 5, it is unlikely that the
majority of the energy will be hoarded by a few of the atoms. Interaction between the atoms will
ultimately lead to the spread of the energy throughout the system, bringing us closer and closer to the
Boltzmann equilibrium value.
Although the second law is a statement about probability, it is a very strong statement. While it
is unlikely that all 300 rolls will come up 5, it is still possible. The probability of this happening is
3.586 10234 , but it is not zero. In contrast, the probability that all of a collection of n atoms with m
accessible states sit in the lowest energy state except 1, which sits in whatever state it needs to in order
to hold the remaining energy is given by n m n . For a very small sample of 1010 atoms with 2
9

accessible states, this gives a probability of approximately 10310 . This effect is made worse when
there are more atoms and/or more accessible states. Statistical fluctuations do occur in atomic
systems, but these fluctuations are limited by the standard deviation of the equilibrium values of the
state variables. This standard deviation is of order 1 n j for a state with population n j , so we can
expect the statistical fluctuations about these equilibrium values to be limited to approximately 1 part
in 1000 for a state consisting of 106 atoms. We can clearly still make strong statements about the
behavior of a macroscopic system using these results.
We began with a problem concerning a system of n atoms, each of which has m accessible
states, with total energy E and ended with a statement about the probability of occupation of each of
these states that involves a new quantity, the temperature T. The temperature made no appearance in
our initial statement, so it must be determined in terms of the total energy E and the total number of
m

atoms n.

As the Lagrange multiplier responsible for enforcing the restriction

n
j 1

E , the

temperature is determined by imposing this restriction. To see how this works, consider a system of n
atoms with two accessible microstates. The first has energy 0 and the second has energy . Writing
Z 1 e ,
the average energy per atom is clearly given by
d
e

ln Z
.
1 e
d
The total energy is E, and the number of atoms is n, so this expression must equal E n . The
temperature is determined by finding the value of that enforces this equality. Apparently,
T
k B ln
502

.
E n
E n

Section X.4: Statistical Mechanics

A few comments are in order about this expression. First, the energy of the excited state is clearly
larger than E n . If it werent, then it would be impossible for the atoms to have collective energy E
since the maximum energy of the system would be n E in this case. Second, the temperature will
be negative if 2 E n . In order for the atoms to have a combined energy E in this case, the excited
state must be more populated than the ground state. It is possible to prepare systems in such a strange
manner, but they will not remain that way for long. Atoms, being composed of charged electrons and
nuclei, interact with the electromagnetic field. Energy will be released through these interactions as
the more populated excited state decays to the ground state. We can force a system to remain in this
population inverted macrostate by replacing this energy, as is done in laser technology, but we cannot
maintain it forever. Third, as the energy of the excited state increases to infinity, the temperature must
also increase without bound. This results from the fact that the ground state does not contribute at all
to the total energy, so the excited state must be populated to some extent in order for the system to
have total energy E. If the temperature did not increase with , then the Boltzmann factor would
disallow population of the excited state as its energy increased.
As a more realistic model, consider n atoms that can each access an infinite number of states
with energies j j for j = 0, 1, 2, . This is the case of the quantum harmonic oscillator, and is
often used to approximate the behavior of physical systems. Happily, we can re-sum the partition
function as

1
Z e j
.

e
1
j 0
The average energy is given by
e

E n

,

e 1
1 e
and the temperature is

.
k B ln n E 1

This temperature is never negative because of the fact that there are an infinite number of states to
occupy. We can put this expression into a more convenient form by writing it in terms of the
dimensionless ratio n E :
E ln 1
k BT .

This is a very useful expression, as it relates the average energy per atom to the characteristic energy
k BT at temperature T. As 0 , the energy spectrum becomes more and more continuous and the
average energy per atom tends to k BT . The discrete nature of the energy states becomes less and less
important, so the atoms essentially just get what is coming to them, k BT of energy per atom. As
, even the first excited state becomes harder and harder to occupy and the average energy per
atom becomes an increasingly smaller fraction of k BT . More and more atoms congregate in the low
energy ground state, and only a few can have enough energy to populate the excited states. This limit
mimics the previous situation, as only two states really need to be considered when the energy
difference between states is so large.
In all of the above analysis, we have considered states that are accessible to atoms. This term
has not been rigorously defined, and in some cases it can lead to an incorrect result. There are many
examples of states whose energies are low enough to be occupied by atoms at a given temperature, but
in reality are not occupied to any large extent. This occurs because of the initial state of the system
503

Advanced Mathematical Techniques

and an inherent difficulty in getting from this initial state to other states. One prominent example is
that of diamonds. At ordinary pressures, the diamond structure of carbon is actually not
thermodynamically stable. Despite the advertisements of jewelers, a diamond is not forever. The
stable structure of carbon at atmospheric pressure is graphite because this structure is associated with a
lower energy per atom than diamond. The necessary conversion of diamond to graphite, however,
requires the carbon atoms to re-arrange themselves on a massive scale. Although the final energy of
the graphite structure is certainly lower than that of the diamond structure, the transition between these
two structures requires a huge amount of energy. Diamonds are frozen in their state, and will not
transition for an extremely long time161 even though the end result of such a transition would be a
lower energy state. Diamond is often called a quasi-stable or meta-stable state for this reason. We can
treat it as though it is in thermal equilibrium with its surroundings at a given temperature, but we must
omit the inaccessible graphite state from our calculations if we want to get an answer that agrees with
experiment. Recall that no mention of time was made in the preceding; we have stated only that
eventually the system will reach equilibrium. The amount of time required for a given system to attain
this equilibrium depends on the details of the system and the ease with which it can transition.
Diamonds are created thermodynamically at much higher pressures and temperatures, where the
transition is easier (because of the higher temperature) and thermodynamically preferred (because of
the higher pressure; diamond has a higher density than graphite). If an earthquake occurs, diamonds
formed at these higher pressures and temperatures can abruptly be brought to the surface. These
diamonds are frozen in, and can no longer make the transition to graphite in a reasonable amount of
time. This is clearly illustrated by the existence of diamond mines in Africa in close proximity to
graphite fields. The carbon associated with the graphite fields was brought up slowly, over millions of
years, making the transition from diamond back to graphite slowly at high pressures and temperatures,
while the carbon associated with the diamond mines was brought up quickly with no time to transition.
There are, of course, many other examples of quasi-stable states. Steel and glass are some examples,
as both are formed by heating a material up to a high temperature where certain transitions can take
place, then quenching them to quickly freeze the changes in.
Once we have calculated the partition function, we can use thermodynamic relations along with
the fact that

E n
ln Z

to determine the functional form of many different quantities. Thermodynamic relations follow from
the expression of various differentials in terms of the state variables of the system. One of the most
fundamental of these is the expression
dE TdS pdV ,
which follows from the fact that, for reversible changes, Qr TdS and Wr pdV . The second of
these results from the fact that work is the dot product of force and displacement; the force applied by
the system on a small surface area element162 is pdS and the displacement of the surface element dr
lies perpendicular to the surface element. Integrating over the boundary of our system, we see that
Wby system pA dr pdV , so the work done on the system is given by W pdV . Note that this
expression for the change in energy of a system implies that the energy is properly thought of as a
function of entropy and volume. This expression allows us to get a feel for the meaning of
161

Some calculations suggest that it would take many times the age of the universe for a diamond to convert to graphite
under standard temperature and pressure.
162
The quantity dS is a vector perpendicular to the surface element whose magnitude is given by the area of the surface
element.

504

Section X.4: Statistical Mechanics

thermodynamic quantities from a different perspective than that given above. It is clear, for example,
that
E
E
T
p
and

V
V S
The little subscripts indicate what state variable is held constant through the differentiation process. It
is important to specify this when there are many variables that could be held constant, as
E S V E S p E S T . The first of these expressions indicates that the temperature of a
system is equal to the ratio of energy change to entropy change at constant volume. Keeping the
volume of the system constant means that the microstate energies j are constant, as long as the
atomic number density is not large enough to alter the state energies. A determination of this effect
requires an analysis of the quantum mechanical behavior of the system, something Boltzmann did not
have access to at the time he developed this analysis, but atoms were widely believed to behave
independently of each other as long as their number density is not too large. This belief is backed by
quantum theory, with some prominent exceptions. These exceptions are certainly important, but will
complicate the present analysis. We will consider for the present atoms that are bosons and behave
independently of one another, as long as there are not too many of them. To increase the entropy at
constant volume, we must move atoms from the more populated states with lower energy to the less
populated states at higher energy. This increase in energy with entropy gives the temperature of the
system. When the entropy is much smaller than the maximum entropy possible (associated with
having all states equally occupied), a small change in the distribution of atoms leads to a large change
in entropy and a small temperature. Conversely, macrostates that lie close to the maximum entropy
possible cannot experience a large change in entropy; these states have a large temperature, so the
energy cannot change very much with changes in entropy. Similarly, the pressure of a system is given
by the ratio of change in energy to change in volume at constant entropy. Constant entropy means that
we are not allowed to change the occupation numbers of the states; the pressure is associated only with
the dependence of the microstate energy on volume. Increasing the volume of most systems decreases
the energy of each microstate, leading to the minus sign in the above expression.
One of the most important uses of thermodynamic relations is in the re-expression of important
physical quantities in terms of other quantities that are easily measured. It is difficult to require that a
system make a change at constant entropy, as we cannot force the atoms of the system to stay in their
states. The measurement of a quantity like p T S is therefore difficult to orchestrate. Writing
S
S
dS
dT dp ,
T

p
p T

S
p
S
however, it is easy to see that
T
. The numerator of this expression is related

p p T
T S
to a physical quantity that is very easy to measure, the heat capacity at constant pressure, C p . This

quantity gives the amount of heat energy required to change the temperature of the system by a unit
amount while its pressure is held constant. Multiplying by T, we have
S Qr
T

Cp
T p T p
since the heat transfer in a reversible process is given by Q TdS . The denominator can be reexpressed by using a Maxwell relation. Defining the Gibbs free energy, G E TS pV , it is clear
that
505

Advanced Mathematical Techniques

dG SdT Vdp .
Since G is obviously a state variable, it must be true that the mixed partials agree:
S V

.
p T T p
This last quantity is again easily measured, and often expressed in terms of the thermal expansion
coefficient
1 V

,
V T p
which indicates the fractional change in volume associated with a unit increase in temperature. In
terms of these two well-known physical quantities, the required derivative is given by
Cp
p
.


T S VT
The use of thermodynamic relations has thus allowed us to express this derivative in terms of easily
determined experimental quantities. It is clearly positive for ordinary materials, as these materials
expand when heated and require more energy to increase their temperature, so we see that the pressure
of a material must increase as its temperature does if the entropy is to remain constant.
The heat capacity of a substance at constant pressure or at constant volume is very important to
the analysis of the behavior of the system. The heat capacity at constant pressure can easily be
measured, but that at constant volume is only easily accessible for gases. Thermal expansion is very
important to solids and liquids, so it takes an enormous amount of pressure to ensure that the volume
does not change as the temperature is increased. For many materials, the requirement of constant
volume translates to many thousands of atmospheres of pressure, even for fairly small temperature
increases. This is not feasible for many materials, so we require an alternate method to determine the
value of this important quantity. This can be accomplished by writing the differential of S, thinking of
it as a function of pressure and temperature:
S
S
dS
dT dp .
T p
p T
Dividing by dT and taking the volume as constant, we obtain
S p
CV C p T
.
p T T V
Neither of the remaining derivatives are easily accessed by experiment, so we are not yet finished.
The derivative S p T is equal to V T p V , as shown above. The derivative at constant

volume can be re-expressed by writing the differential of the volume, thinking of it as a function of p
and T:
V
V
V
V
p
V
dV
dp
dT
T
p V p .

p
T p
T V
T

T
p T
The final derivative, thankfully, is easy to measure. It is associated with the inverse isothermal
compressibility, or the bulk modulus
p
B V

V T
of the system. The bulk modulus gives the amount of pressure required to obtain a given fractional
change in volume at constant temperature. Using these results, we obtain the famous relation
506

Section X.4: Statistical Mechanics

CV C p 2 BVT .
This result allows us to easily determine the heat capacity at constant volume from its value at constant
pressure. The difference between the two is given by 8.31447 J K mol for an ideal gas, as we will
show below, but it is often much smaller for solids and liquids. For water, the thermal expansion
coefficient is approximately 2.07 104 K and the bulk modulus is approximately 2.1 109 Pa , so the
difference in specific heats (heat capacity divided by mass) at 300K is 27 J kg K , quite small when
compared to the constant pressure value of 4190 J kg K .
The heat capacity at constant volume of a substance has an irresistible importance, as it is
directly related to the standard deviation of the energy from its average value. A system is certainly
allowed to be in thermal contact with a reservoir if its temperature is to remain constant; as thermal
energy flows from the reservoir to the system and back, the energy of the system will fluctuate. The
standard deviation of this fluctuation per atom is obviously given by

2 2

2j e

j 1

Differentiating ln Z twice with respect to gives


j

m e
2
j
ln Z 2
2
Z2

j 1

Since E n

m e j
j
j 1 Z

Z
2

2 .

ln Z , we have immediately that

1 E

2 .
n V
The derivative is taken at constant volume in order to avoid changes in the microstate energies j . To
determine the standard deviation in the total energy E, we simply add the standard deviations in energy
per atom in quadrature over all of the atoms: E2 n 2 . Writing this result in terms of a derivative
with respect to T, we obtain
E
E2 k BT 2 CV k BT 2 .
T V
The last expression follows from the fact that the change in energy is equal to the heat transfer for
processes taking place as constant volume. The fractional error in assuming that the total energy is
given by its average value E is
CV k BT
E
.

E
kB E
The factor on the right is the ratio of k BT , the approximate energy per atom, to the total energy. It is
extremely small, of order 1 n , so even the factor of k B in the denominator of the square root is not
enough to quell its suppression. For one mole163 of helium at 300K, this ratio is approximately 1023 ,
so there is obviously no trouble there. Solids and liquids show similar small results for the ratio,
163

One mole of atoms represents 6.02214179(30) 10 atoms, the number of carbon-12 atoms required for a mass of
exactly 12 grams. This number is called Avogadros number after the Italian savant Amadeo Avogadro, though he, himself,
was not actually involved in its definition.
23

507

Advanced Mathematical Techniques

though these are harder to determine partly because of the difficulty in finding the total energy of such
configurations.
Note that many of the above relations have been derived without the use of the Boltzmann
factor. The validity of these results lies in the structure of thermodynamics and multivariable calculus,
and precedes the validity of Boltzmanns result for maximal probability. If we have thermal
equilibrium, we can also determine the entropy and pressure as functions of temperature and, through
the microstate energies, volume. The entropy is given by
m
m

S k B ln k B ln n ! ln n j ! k B n ln n n p j n ln p j n p j n ,
j 1
j 1

where we have used an abbreviated Stirling result in the approximation. It will be valid to within 5%
whenever p j n 20 , and much better when the occupation numbers are larger. Continuing, we need to

calculate two sums. The first is

p
j 1

1 , obvious from the fact that the sum of probabilities equals 1.

The second requires a bit more work, but can be taken with some manipulation:
m

p
j 1

j 1

j 1

ln p j j p j ln Z p j ln Z .

At this point, we simply substitute:

E
n
n ln Z
S nk B ln Z nk B ln Z
k BT ln Z
.
T
T
T

V
This last expression illustrates the use of another integrating factor. See if you can show the result.
We must be careful with this expression for the entropy, as the volume dependence has been swept
under the rug into the microstate energies. There are other ways to determine the entropy that do not
require us to snub the volume so rudely; one will be illustrated below in the context of the MaxwellBoltzmann speed distribution for an ideal gas.
Using the partition function, we can easily determine many other quantities. For example, the
free energy F = E TS is given by
F nk BT ln Z
and the pressure is given by
F
ln Z
p
nk BT
.
V T
V T
We must know the dependence of the microstates energy on volume before we can compute the last
derivative. This information is provided by quantum theories of an atoms motion and energy
absorption habits; we will not discuss it more here. This determination of the entropy, free energy, and
pressure in terms of the temperature and volume is extremely important, as it implies that there are
only two free variables in this aspect of thermodynamics. These relations give an equation of state
that relates the pressure, temperature, and volume to one another. This equation of state is different in
different phases, like liquid and solid, and can be reduced to an equation involving essentially any two
of the state variables of the system. This is the reason why we are allowed to jump so freely from
thinking about the entropy as a function of volume and internal energy to thinking of it as a function of
temperature and pressure. For systems in diffusive contact, the number of particles also becomes a
state variable that must be included in our derivation of the Boltzmann factor. Just as the state
variables V and S are associated with the conjugate state variables p and T in the expression for the
differential of the total energy E, the number of particles is associated with its conjugate , the
chemical potential:
508

Section X.4: Statistical Mechanics

dE TdS pdV dn .
It is obvious from its appearance in this expression that the chemical potential gives the energy
increase associated with adding exactly one atom to the system at constant entropy and volume. All of
the above analysis can be restructured to include the chemical potential if one wishes. The result is a
new Boltzmann factor, exp ( j n ) , which distinguishes those microstates whose energy is

greater than the chemical potential times the number of atoms from those whose energy is smaller.
This new Boltzmann factor is called the Gibbs factor, and is extremely important to the study of
systems in diffusive contact. If two systems with different chemical potentials but equal temperature
are placed in diffusive contact, then atoms will flow from the system with higher chemical potential to
the system with lower chemical potential in order to increase the entropy of the joint system. This can
easily be seen by considering the change in entropy at constant volume and total energy,


S
dS 1 2 dn

T
n U ,V
T T
for a transfer of dn atoms from system 1 to system 2. This is clearly positive when 1 2 .
Boltzmanns result is considered to be one of the most important achievements in the history of
physics, certainly in the field of thermodynamics. It is widely applicable, and can often be used to
make back of the envelope calculations concerning the amount of energy a given atom can be
expected to hoard. When physicists are trying to decide which processes are important at a given
temperature, they almost always compare the value of k BT to the energy required to initiate a given
process. If the energy associated with the process is much larger than k BT , it probably will not happen
at that temperature. If it is much smaller, then it almost certainly will. The value of k BT gives a
useful measure of the amount of thermal energy that is available to atoms in thermal equilibrium at
temperature T, and can be used to estimate the average speed of atoms or average height in a
gravitational field. It can also be used in the analysis of stars to determine the approximate
temperature necessary to initiate the fusion reactions required to ignite the star. The details of this
analysis depend on the specifics of the energy levels involved, but the idea of k BT as a characteristic
ambient energy at temperature T is almost always useful in analyzing a physical process.

Exercises for Section X.4:


1. Consider a system of 2 1010 atoms with four possible states. The energy associated with the
states is 0, , 3 , and 5 . Answers should include k B .
(a) Determine the entropy associated with this system if there are 0.8 1010 atoms in the first
state, 0.5 1010 in the second state, 0.4 1010 , and 0.3 1010 in the fourth.
(b) What is the maximum entropy possible for this system of four states if the total energy is not
fixed?
(c) Determine the occupation numbers of each of the states if the total energy is fixed at
3 1010 . Give your answers in terms of x e .
(d) Determine the temperature of the system under the assumptions of part (c). Give your
answer as a decimal multiple of k B .
(e) What is the maximum entropy possible for this system under the assumptions of part (c)?
(f) Determine the occupation number of each state under the assumptions of part (c).
509

Advanced Mathematical Techniques

2. A system of 4 1015 atoms with three states having energies of 0, , and 2 is maintained at
temperature 5 k B . Assume that the state has reached thermal equilibrium.
(a) Determine the occupation number of each of the states.
(b) Determine the entropy of the system.
(c) Determine the total energy held by the system.
3. A system of 4 1015 atoms with three states having energies of 0, , and 2 is maintained at
temperature 2 k B . Assume that the state has reached thermal equilibrium.
(a) Determine the occupation number of each of the states.
(b) Determine the entropy of the system.
(c) Determine the total energy held by the system.
(d) Explain the meaning of negative temperature in the context of this problem. What does it
indicate about the occupation numbers of the states? How could the system be maintained
at such a temperature? Assume that the system is not able to exchange energy with its
surroundings. Why is this assumption relevant to the problem?
4. Use the integrating factor $x^{-2} y^{-1}$ to solve the initial value problem
$$\frac{dy}{dx} = \frac{xy + 2y^2}{x^2 + 2xy}\,; \qquad y(1) = 2\,.$$

5. Use the integrating factor $e^{xy}$ to solve the initial value problem
$$\frac{dy}{dx} = \frac{1 + xy^2}{2 + x^2 y}\,; \qquad y(2) = 1\,.$$
6. Go through the steps outlined in the text to show that
$$\left(\frac{\partial p}{\partial T}\right)_S = \frac{C_p}{\alpha V T}\,,$$
where $\alpha$ is the thermal expansion coefficient.
7. Explain how the following quantities could be obtained experimentally. Your explanation can
be qualitative, but it should indicate what the experimentalist needs to do in order to determine
the quantity. Say whether the determination of this quantity is relatively easy, hard, or in between, and whether normal materials (like water or an ideal gas) will exhibit a positive or negative value for this quantity. Explain the reasons for your answers.
(a) $(\partial p/\partial T)_V$
(b) $(\partial V/\partial T)_S$
(c) $(\partial E/\partial p)_T$
(d) $(\partial p/\partial V)_S$
(e) $(\partial p/\partial T)_S$
(f) $(\partial T/\partial V)_S$

8. Use Maxwell relations to determine the value of the following quantities in terms of the heat
capacity at constant pressure, $C_p = T\,(\partial S/\partial T)_p$, the isothermal bulk modulus, $B = -V\,(\partial p/\partial V)_T$, and the thermal expansion coefficient $\alpha = (\partial V/\partial T)_p / V$. If this is not possible, explain why and what other, easily measurable, quantity is needed in order to determine this quantity.
(a) $(\partial p/\partial T)_V$
(b) $(\partial V/\partial T)_S$
(c) $(\partial E/\partial p)_T$
(d) $(\partial p/\partial V)_S$
(e) $(\partial p/\partial T)_S$
(f) $(\partial T/\partial V)_S$

9. The thermite reaction is a spectacular reaction involving rust and powdered aluminum. It
requires a large amount of energy to occur, however, as the reaction must first break the bonds
associated with the rust in order to proceed. The required activation energy is approximately 145 kJ/mol, or $2.41\times 10^{-19}$ Joules per molecule. What is the expected temperature at which
this reaction will freely take place? Note that your answer will be noticeably larger than the
actual temperature required to initiate this reaction because the reaction is exothermic; any
molecules that react will provide more energy to the system.
10. The energy of the ground state of hydrogen is approximately $-13.6$ eV, or $2.18\times 10^{-18}$ Joules.
At what temperature will the average hydrogen atom dissociate into its constituent proton and
electron?
11. A hydrogen molecule consists of two hydrogen atoms bonding with each other. The bond energy is approximately 432 kJ/mol, or $7\times 10^{-19}$ Joules per molecule. At what temperature
will the average hydrogen molecule actually be 2 hydrogen atoms?
12. In order to ignite, stars must first reach a temperature large enough for protons to come close
enough together to initiate fusion. The energy required to do this is approximately $2\ \mathrm{fJ} = 2\times 10^{-15}\ \mathrm{J}$ per proton. This requirement is greatly eased by the process of quantum tunneling,
which allows the protons to get close enough to tunnel through the remaining potential barrier
without having the energy required to actually overcome the barrier. What temperature is
required in the core of a proto-star in order for this process to be viable on average?

Section X.5: The Maxwell-Boltzmann Speed Distribution


Using the above results, we can easily determine a probability distribution for the speed of a
collection of gas atoms or molecules in thermal equilibrium at temperature T. The energy associated
with an atom of mass m moving at speed v is given by $\tfrac{1}{2} m v^2$, so the Boltzmann factor in this case is
$$e^{-m v^2 / 2 k_B T}\,.$$
Speed is a continuous thing, so we should be looking for a continuous probability distribution. We
cannot do this directly because the total speed of the atom consists of three independent pieces
associated with motion in the x, y, and z direction. In order to arrive at a consistent result, we must
treat these three contributions as independent. Focusing on the z-direction, the Boltzmann result
indicates that the probability of occupation of a state with z-component of velocity vz is proportional
to $e^{-m v_z^2 / 2 k_B T}$. Since this component of velocity is continuous, we should view this as a probability
density and multiply by the window size dvz to get the actual probability. The probability that a given
atom will have some z-component of velocity between negative infinity and infinity164 is clearly 1, so
the proportionality constant can be determined and the probability that a given gas atom will have a z-component of velocity between $v_z$ and $v_z + dv_z$ is given by
$$P(v_z; dv_z) = \sqrt{\frac{m}{2\pi k_B T}}\; e^{-m v_z^2 / 2 k_B T}\, dv_z\,.$$
See if you can show this; it requires a use of the Gaussian integral.
The velocities in the other two directions are independent and identical, so the probability that an atom will be found with velocity between $\mathbf{v}$ and $\mathbf{v} + d\mathbf{v}$ is given by
$$P(\mathbf{v}; d\mathbf{v}) = \left(\frac{m}{2\pi k_B T}\right)^{3/2} e^{-m v^2 / 2 k_B T}\; dv_x\, dv_y\, dv_z\,.$$
This probability resides in a three-dimensional space of velocities rather than positions, but vectors are
vectors so the two behave mathematically in the same way. To distinguish between this velocity
space and ordinary space, the former is often referred to as phase space. This term is extremely
general, usually used to describe a space in which the independent variables associated with the state
of a system form the axes. In our case, we are interested only in the velocity of the atom and not in its
location, so the phase space relevant to our analysis consists only of the components of velocity. To
change this velocity distribution to a speed distribution, we need to change our perspective. The
components vx , v y , and vz are Cartesian components of the velocity vector. The speed is the

magnitude of this vector, so it would be preferable for us to work in spherical coordinates if we are
interested in the speed. Velocity space is identical to ordinary space as far as mathematics goes, so
we can write our phase space volume element as
$$dv_x\, dv_y\, dv_z = v^2\, dv\, \sin\theta\, d\theta\, d\phi\,.$$
Our Boltzmann factor is independent of direction (the motion is isotropic, the same in all directions),
so we are free to integrate over the angular coordinates and arrive at the Maxwell-Boltzmann speed
distribution
$$P(v; dv) = \left(\frac{m}{2\pi k_B T}\right)^{3/2} e^{-m v^2 / 2 k_B T}\; 4\pi v^2\, dv\,.$$
This expression gives the probability that a gas atom picked at random will have a speed
between v and v + dv. One interesting feature of it is that small speeds are suppressed as well as large
ones. It is easy to understand why large speeds should be suppressed, as the Boltzmann factor forbids
any one atom from hoarding energy, but why should the probability of finding an atom moving slowly
be suppressed? The maximum probability associated with the velocity distribution is the state with a
speed of 0, so it definitely seems strange that the speed distribution should suppress this region, especially since it was derived from the velocity distribution! The reason for this suppression is
associated with the volume element
$$dv_x\, dv_y\, dv_z = v^2\, dv\, \sin\theta\, d\theta\, d\phi\,.$$

The region of phase space associated with speeds lying between v and v + dv is a spherical shell with
radius v and thickness dv. The volume of this region is $4\pi v^2\, dv$, which is suppressed at small values
of v. The reason small values of the speed are suppressed is that there is no room for a large number of
states in that region. Continuous distributions do not count states in the same way as discrete distributions, but there still is a measure of the density of states available for occupation. This density is constant for the speed distribution because there is no reason to favor one speed over another, so smaller regions of phase space contain fewer states. Small speeds are therefore said to be "suppressed by phase space." The idea of phase space is extremely important in many branches of physics, but it is often not discussed in detail until much later. Without a proper understanding of phase space, it is impossible to understand many of the calculations involving stellar structure, nuclear and particle decay, and the conduction bands of metals.

164. We are obviously ignoring special relativity in this treatment. This analysis can be performed without ignoring it, but the treatment is more complicated and the difference is extremely slight under normal conditions.
As an example of the use of this speed distribution, consider the probability that a helium atom
picked at random from a collection of helium in thermal equilibrium at a temperature of 300 K will be
found moving between 1000 m/s and 1005 m/s. Plugging the numbers in, we find that this will occur
approximately 0.36% of the time. We must be careful to use SI units when evaluating the speed, as
using inconsistent units can easily lead to a major problem with our evaluation of the probability. The
value of Boltzmann's constant is about $1.38\times 10^{-23}$ J/K, and atomic masses are usually given per mole of atoms rather than per atom, so we can combine these two effects by multiplying by Avogadro's number in the numerator and denominator of the two fractions. Avogadro's number times Boltzmann's constant gives the ideal gas constant
$$R = N_A k_B = 8.3144727(145)\ \mathrm{J\ K^{-1}\ mol^{-1}}\,.$$
We will see below why it is called this. Using R in place of k B , we are free to use the molar mass of
the atoms rather than the atomic mass. These molar masses are usually given in grams rather than
kilograms, so we must convert in order to arrive at the correct result. Because of the complication of
this expression, it is often useful to re-express it in terms of the ratio $u^2 \equiv m v^2 / 2 k_B T$. In terms of u, the probability distribution is given as
$$P(u; du) = \frac{4}{\sqrt{\pi}}\, u^2 e^{-u^2}\, du\,,$$

which is obviously a much simpler expression and only requires us to determine the value of u once. It
seems almost magical that all of the messy factors adorning the exponential vanish when we make this
substitution. Of course, it is not magical. We chose to express the speed in terms of the dimensionless
ratio u. The probability is dimensionless, so the factors present in our initial form are there only to
cancel the dimensions in $v^2\, dv$. Choosing the appropriate scale makes these factors obsolete, so they
vacate the premises. This often happens in scientific analysis: writing things in terms of appropriately
chosen dimensionless ratios allows us to boil everything down to its most basic form.
When the speed interval we are interested in is small, we need only compute the probability
density and multiply by the window size. When it is larger, the probability is determined by
integrating from the minimum value of u to the maximum value. The integral is Gaussian, so can only
be completed exactly in terms of elementary functions when the limits are 0 and infinity (in which
case, the probability is obviously equal to 1). For other limits, the integral can be completed in terms
of the well-known error function,
$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\, dt\,,$$
as
$$P(u_{\min} \le u \le u_{\max}) = \left[\operatorname{erf}(u) - \frac{2}{\sqrt{\pi}}\, u\, e^{-u^2}\right]_{u_{\min}}^{u_{\max}}\,.$$
The probability that a helium atom chosen at random from a gas in thermal equilibrium at temperature 300 K has a speed between 0 m/s and 1000 m/s is given by
$$P(0 \le u \le 0.8957) = \operatorname{erf}(0.8957) - \frac{2}{\sqrt{\pi}}\,(0.8957)\, e^{-(0.8957)^2} \approx 0.3416\,.$$
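These numbers are easy to reproduce. The sketch below (SciPy's erf supplies the error function; the helium molar mass of 4.0026 g/mol is an assumed input) evaluates u for 1000 m/s at 300 K and then the probability formula above:

```python
import numpy as np
from scipy.special import erf

kB = 1.380649e-23      # Boltzmann constant, J/K
NA = 6.02214076e23     # Avogadro's number, 1/mol
m = 4.0026e-3 / NA     # mass of one helium atom in kg (assumed molar mass)
T = 300.0              # K

def u_of_v(v):
    """Dimensionless speed u = v * sqrt(m / (2 k_B T))."""
    return v * np.sqrt(m / (2 * kB * T))

def cumulative(u):
    """Antiderivative of the scaled density: erf(u) - 2 u exp(-u^2)/sqrt(pi)."""
    return erf(u) - 2 * u * np.exp(-u**2) / np.sqrt(np.pi)

u1 = u_of_v(1000.0)
print(f"u(1000 m/s) = {u1:.4f}")                                        # about 0.8957
print(f"P(0 <= v <= 1000 m/s) = {cumulative(u1) - cumulative(0.0):.4f}")  # about 0.3416
```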

We can use the above results to compute the probability that an atom will be found in any
desired speed interval, but it is not obvious which interval will give the largest probability. As with all
probability distributions, this distribution can be characterized by its mean and standard deviation.
These quantities can be computed exactly because the required integrals run from 0 to infinity. The
average value of u is given by
$$\langle u \rangle = \frac{4}{\sqrt{\pi}} \int_0^\infty u^3 e^{-u^2}\, du = \frac{2}{\sqrt{\pi}}\,,$$
so
$$\langle v \rangle = \frac{2}{\sqrt{\pi}} \sqrt{\frac{2 k_B T}{m}} = \sqrt{\frac{8 k_B T}{\pi m}}\,.$$
In order to determine the standard deviation, we need to first determine the average value of v 2 . This
quantity actually has a special significance of its own in this particular distribution; we will see below
that, for many reasons, it is actually more important than the mean value. It is given by
$$\left\langle v^2 \right\rangle = v_{\mathrm{rms}}^2 = \frac{3 k_B T}{m}\,,$$
so the standard deviation is equal to
$$\sigma_v = \sqrt{\left(3 - \frac{8}{\pi}\right) \frac{k_B T}{m}}\,.$$
The rms subscript on the average of the square stands for root-mean-square, or the square root of the
average of the square. Root-mean-square values are prevalent in many branches of science for a
variety of reasons. One other quantity is interesting to consider, the speed with maximum probability
density. This is determined simply by differentiating our density function and setting it equal to zero:
$$v_{\mathrm{mp}} = \sqrt{\frac{2 k_B T}{m}}\,.$$
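The three characteristic speeds are simple to evaluate; the following sketch does so for helium at 300 K (molar mass 4.0026 g/mol assumed):

```python
import math

kB = 1.380649e-23      # J/K
NA = 6.02214076e23     # 1/mol
m = 4.0026e-3 / NA     # helium atomic mass in kg (assumed)
T = 300.0              # K

v_mp  = math.sqrt(2 * kB * T / m)              # most probable speed
v_avg = math.sqrt(8 * kB * T / (math.pi * m))  # mean speed
v_rms = math.sqrt(3 * kB * T / m)              # root-mean-square speed

print(f"most probable: {v_mp:7.1f} m/s")
print(f"mean:          {v_avg:7.1f} m/s")
print(f"rms:           {v_rms:7.1f} m/s")   # v_mp < v_avg < v_rms, as discussed
```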
Figure 6 contains a graph of the probability density versus u illustrating the location of the most
probable speed and the root-mean-square speed, and the region associated with a single standard
deviation or less from the mean is shaded. The mean value lies between the most probable and the
root-mean-square, and can be located by going halfway between the sides of the shaded region. Note
that the root-mean-square and average values of the speed are both larger than the most probable value.
This is because there are more speeds to the right of the most probable value so these count more in the
average than those to the left. The root-mean-square is farther to the right than the average because
squaring large numbers makes them even more important.

Figure 6
The graph illustrated in figure 6 contains all of the information about the Maxwell-Boltzmann
speed distribution, regardless of temperature or atomic mass, as it has been given in terms of the
dimensionless parameter u. Increasing the temperature represents a horizontal stretch of the


accompanying graph with speeds on the horizontal axis because the speeds now have to be larger in
order to result in the same value of u. Increasing the mass of the atom represents a horizontal shrink of
this graph because the speeds do not have to be as large to achieve the same value of u. Thus, higher
temperatures are associated with a broader distribution whose peak occurs at higher speeds and higher
masses are associated with a thinner peak lying closer to the vertical axis. Once we get used to this
kind of manipulation of the graph, graphs given in terms of scaled quantities are extremely useful.
It is often important to determine the probability that a given atom is moving faster than a given
speed. This can easily be determined from our above result, given that $\lim_{x\to\infty} \operatorname{erf}(x) = 1$ (why?). The result is
$$P(u \ge u_{\min}) = 1 - \operatorname{erf}(u_{\min}) + \frac{2}{\sqrt{\pi}}\, u_{\min}\, e^{-u_{\min}^2}\,.$$

Using this, we can see that the probability of finding a helium atom in thermal equilibrium at
temperature 300 K with a speed larger than 5000 m/s is about one part in a hundred million. Larger
speeds will see even larger suppressions.
The escape speed of the Earth is about 11.2 km/s, so this is certainly not the reason why helium
will not remain in the atmosphere of this planet of its own volition (an increase in speed by a factor of
2 results in an increase of the exponent in the probability density by a factor of four). We can obtain a
better argument for this fact of nature by considering the height of a helium atom in the atmosphere
rather than its speed. Height and speed are not correlated, so we can consider them completely
separately. The potential energy of an atom at height h above the surface of the Earth is given
approximately by mgh, as long as the height is not too large.165 Using this expression, we can write
the probability of finding an atom with mass m at a height between h and h + dh above the Earth's surface as
$$P(h; dh) = \frac{m g}{k_B T}\, e^{-m g h / k_B T}\, dh\,.$$
We really should have used $(h + R)^2\, dh$ (with a different normalization factor) instead of just dh, to account for the phase space associated with height h above a spherical surface, but this will only matter if the height above the surface becomes comparable to the radius of the Earth, in which case we have other problems. The average height is given by $\langle h \rangle = k_B T / m g$ (show this), which, for helium gas at 300 K,
is a little over 63.5 km. This is the average height; there are many atoms both above and below this
height. The standard deviation is surprisingly the same as the average, resulting from the fact that the
probability density does not have a peak around which to expand. Nevertheless, we can readily do the
required integrals. Show for yourself that
$$P(h \le \langle h \rangle) = \frac{e - 1}{e} = 0.63212\ldots$$
$$P(h \ge \langle h \rangle) = \frac{1}{e} = 0.367879\ldots$$
$$P(h \ge 2\langle h \rangle) = \frac{1}{e^2} = 0.135335\ldots$$
$$P(h \ge 5\langle h \rangle) = \frac{1}{e^5} = 0.0067379\ldots$$
(Hint: scale the distribution first.)

This last value corresponds to approximately 317.5 km above the Earth's surface, close to 5%
of the radius of the Earth. Obviously, many things are different about our atmosphere at this height.
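The same kind of quick check works for the height distribution. The sketch below (helium at 300 K, with g = 9.8 m/s² assumed) evaluates the average height k_BT/mg and the exponential tail probabilities quoted above:

```python
import math

kB = 1.380649e-23      # J/K
NA = 6.02214076e23     # 1/mol
m = 4.0026e-3 / NA     # helium atomic mass in kg (assumed)
g = 9.8                # m/s^2 (assumed)
T = 300.0              # K

h_avg = kB * T / (m * g)                        # average height of the exponential distribution
print(f"average height: {h_avg / 1000:.1f} km")  # a little over 63 km

for n in (1, 2, 5):
    # P(h >= n * <h>) = exp(-n) for an exponential density
    print(f"P(h >= {n}<h>) = {math.exp(-n):.5f}")
```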
165. It should not represent an appreciable fraction of the radius of the Earth, 6400 km.

As we ascend through the Earth's atmosphere, we initially see a decrease in temperature in the troposphere.
At about 10 km, where the temperature is roughly 215 K, this trend reverses through the stratosphere
and the temperature increases to approximately 260 K at 50 km. All of these temperatures are still
high enough to have the average height of the helium exceed 45 km, at which point the temperature
would have increased again and pushed the average height back up to 55 km. Past 50 km, the helium
enters the mesosphere and the temperature again drops with altitude. At its highest point of
approximately 85 km, the temperature of the mesosphere averages 170 K. This temperature is low
enough to drop the average height back to 36 km. The helium atoms are losing their energy to the
surrounding atoms. Even at this temperature, however, the probability that a helium atom should be
found above the mesosphere is approximately 5%. After that, the temperature increases again to
heights of 2000 K and beyond. The mesosphere is obviously the barrier that helium atoms must
overcome in order to escape the gravitational pull of the Earth, and one can expect to find a repository
of helium in the mesosphere and the upper stratosphere. On the other hand, a significant portion of the
helium atoms do, indeed, escape. As they leave, their states go unfilled and the cycle starts over again.
The constituents of the majority of our atmosphere, nitrogen and oxygen, escape this fate mainly because they form bonds with themselves. The average height is inversely proportional to mass, so a nitrogen molecule at 300 K has an average height of only 9 km, and an oxygen molecule's average is even lower.
It is important to understand that this is actually the way that temperature works. When we
derived the Boltzmann factor above, we were operating under the assumption that the energy of the
system was known and determined the temperature in terms of the energy. Here, we are studying a
system of known temperature. The roles are reversed, and the energy is determined by the
temperature. They are both state variables, so are equivalent in the eyes of thermodynamics. The
point of view we take depends on the type of information we have about the system. The actual value of
the total energy was not used in the derivation of the Boltzmann factor; it is used only to determine the
temperature of the system. The Boltzmann factor simply asserts that the total energy is finite. Less
energy means lower temperature. To say that a gas is in thermal equilibrium at temperature T means
that all of the accessible microstates are occupied to the extent predicted by the above analysis. If
some of the microstates are unoccupied, then the entropy of the state can increase through exchanges
in energy via collisions. Since entropy is a measure of probability, what we are really saying is that, left to its own devices, a system will always be most likely to be found in the most probable configuration. Not a difficult statement to swallow, especially when the relative standard deviation from this most probable configuration is of the order of $1/\sqrt{n_j}$ for state j. The states will be populated, as the
configuration with maximum probability is that given by Boltzmann. If it requires more energy to
populate these states, so be it. If the energy is not available, then the temperature of the system will
drop. In a large, open system, like our atmosphere, a drop in temperature in one section will cause
energy to flow to that region, spreading the deficit out and reducing the temperature of the entire air
mass. This process is, of course, the cause of the vast majority of our weather. The air is continually
heated by the Sun, which results in local differences in temperature that cannot be maintained. This
energy flows to the atoms through scattering and collisions, populating the previously unpopulated
states. As atoms rise through the air, they either gain energy or drop some off as the temperature
fluctuates. The resulting probability distribution for helium atom heights will not look like that for any
of the temperatures involved because the atmosphere itself is not in thermal equilibrium. Local
pockets of approximate equilibrium will, however, lead to a journey for the helium atom much like
that described in the previous paragraph.


Exercises for Section X.5:


In problems 1-6, determine the probability that a molecule of the given gas will be found with a speed
in the given range if it is maintained at the given temperature. Also determine the average, most
probable, and root-mean-square speed of these molecules at the given temperature.
1. Carbon dioxide at 300 K, speed between 200 and 400 meters per second.
2. Nitrogen at 300 K, speed between 600 and 800 meters per second.
3. Hydrogen at 50 K, speed greater than 1000 meters per second.
4. Ammonia ($\mathrm{NH_3}$) at 300 K, speed greater than 600 meters per second.
5. Nitrogen at its liquefaction point, 77 K, speed less than 100 meters per second.
6. Chlorine at its liquefaction point, 239 K, speed less than 100 meters per second.

In problems 7-10, determine the probability that a molecule of the given gas will be found between
the indicated heights above the surface of the Earth. Assume that the temperature is equal to 280 K,
independent of height. Also give the average height of these molecules.
7. Nitrogen, between 40 and 50 km.

8. Carbon dioxide, between 30 and 50 km.

9. Water vapor, between 50 and 70 km.

10. Nitrogen, between 50 and 70 km.

11. In all of the preceding, we have assumed that the temperature of the atmosphere is independent
of height. In order to correct for this fallacious assumption, we need to include a height-dependent temperature in our Boltzmann factor. This will not be a fully honest treatment, but
it will be better than that given above. Suppose that the temperature, in Kelvin, is given as a
function of height, h, in kilometers, by
$$T(h) = \begin{cases} 280 - 6.5\,h & ;\ h \le 10 \\ 215 & ;\ 10 \le h \le 30 \\ 215 + 2.25\,(h - 30) & ;\ 30 \le h \le 50 \\ 260 - 3\,(h - 50) & ;\ 50 \le h \le 80 \\ 170 + 0.76\,(h - 80) & ;\ 80 \le h \le 500 \end{cases}$$


(a) Use a computer algebra system to normalize the associated height probability distribution,
truncating your integral at 500 kilometers. Determine this normalization constant for
helium, nitrogen, carbon dioxide, and water vapor, and write the associated probability
distribution.
(b) Find the probability that each of the species of gas considered in part (a) will be found above
80 kilometers.
(c) Determine the average height of each of the species of gas considered in part (a).
(d) Compare your results for nitrogen and water to the results obtained for the average height in
problems 7 and 9. Are these results larger or smaller? Explain qualitatively why the
difference is expected.
12. Determine the probability that an atom of helium at the surface of the Earth has a speed
exceeding the escape speed of the Earth, 11.2 kilometers per second. Is this probability similar
to the probability obtained in part (b) of problem 11 that a helium atom will exist above the
mesosphere and eventually escape? Explain why your result makes sense.

Section X.6: The Ideal Gas Law


It will be useful to illustrate one of the main procedures used to determine physical laws
associated with atoms, that of thinking about it. Suppose we are interested in determining the pressure
exerted by a gas of N atoms sealed in a volume V inside a tube by a piston of area A. Pressure is force
per unit area, so we need to determine how much force is being applied to the piston. The reason that
the piston is experiencing a force is that gas molecules are continuously colliding with it and bouncing
off it. Consider an atom with velocity component vz perpendicular to the piston. The force imparted
by this atom is equal to its change in momentum divided by the amount of time the collision takes.
The collision is approximately elastic, ignoring the small exchange of energy that could take place
between the gas atom and the molecules of the piston, so the magnitude of the change in momentum
experienced by the atom is $2 m v_z$. How many atoms of this type hit the piston in time $\Delta t$? Well, an atom moving with velocity component $v_z$ toward the piston must be closer to the piston than $v_z\, \Delta t$ if it is to hit in time $\Delta t$. Therefore, all of the atoms with this component of velocity lying within the volume $A v_z\, \Delta t$ of gas adjacent to the piston will hit in time $\Delta t$. If the gas is in thermal equilibrium, then the number of atoms with this component of velocity can be determined by the Maxwell-Boltzmann velocity distribution. Suppose that $f(v_z)$ is the velocity probability density. Then, the number of atoms in that volume of space with the required velocity component lying between $v_z$ and $v_z + dv_z$ is given by
$$n(v_z; dv_z) = N\, \frac{A v_z\, \Delta t}{V}\, f(v_z)\, dv_z\,.$$
The force delivered by these atoms is
$$F_{v_z} = \frac{N}{V}\, \frac{2 m v_z \cdot A v_z\, \Delta t}{\Delta t}\, f(v_z)\, dv_z = A\, \frac{N}{V}\, 2 m v_z^2\, f(v_z)\, dv_z$$
and the pressure is
$$p_{v_z} = \frac{N}{V}\, 2 m v_z^2\, f(v_z)\, dv_z\,.$$
This result only represents the portion of the pressure coming from atoms with the right component of velocity. To obtain the whole pressure, we must integrate over $v_z$. We could use the Maxwell-Boltzmann distribution for this, but there is no need. It is obvious that the integral will simply give
$$p = \frac{N}{V}\, m \left\langle v_z^2 \right\rangle\,.$$
The factor of 2 disappears because we are only interested in atoms going toward the piston. The z-component of the velocity is not any different from the other two components, so we can also write this result as
$$p = \frac{N}{V}\, \frac{1}{3}\, m \left\langle v^2 \right\rangle\,.$$
The average here is taken of the whole speed. Re-arranging, we obtain
$$p = \frac{N}{V}\, \frac{2}{3} \left\langle K \right\rangle\,,$$
where $\langle K \rangle$ is the average kinetic energy of the atoms. This law was originally derived empirically; experimental evidence indicates that
$$p = \frac{n}{V}\, R T\,,$$
where $n = N/N_A$ is the number of moles of atoms present in the container, R is the ideal gas constant,
determined from experiment, and T is the Kelvin temperature. Relating these two results, we obtain
$$\left\langle K \right\rangle = \frac{3}{2}\, \frac{R}{N_A}\, T\,.$$
The Maxwell-Boltzmann distribution puts this expectation value at $3 k_B T / 2$, promoting the identification
$$k_B = \frac{R}{N_A}\,.$$
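A direct numerical check of the key step, $m\langle v_z^2\rangle = k_B T$, is shown below (a sketch using SciPy's quad; nitrogen at 300 K is an assumed example):

```python
import numpy as np
from scipy.integrate import quad

kB = 1.380649e-23     # J/K
NA = 6.02214076e23    # 1/mol
m = 28e-3 / NA        # nitrogen molecule mass in kg (assumed example)
T = 300.0             # K

def f(vz):
    """Maxwell-Boltzmann velocity density for one Cartesian component."""
    return np.sqrt(m / (2 * np.pi * kB * T)) * np.exp(-m * vz**2 / (2 * kB * T))

vz2_avg, _ = quad(lambda vz: vz**2 * f(vz), -np.inf, np.inf)
print(f"m <v_z^2> = {m * vz2_avg:.4e} J")
print(f"k_B T     = {kB * T:.4e} J")   # the two agree, so p = (N/V) k_B T
```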
The above does not represent a derivation of the ideal gas law as much as a statement of the
expectations of the behavior of an ideal gas. Comparison with the experimental results is what really
gives us new results, as it indicates that the average value of the kinetic energy is exactly as given by
the Maxwell-Boltzmann speed distribution even though we did not use this distribution in our analysis.
Experimental data indicates that the predicted root-mean-square speed is correct, provided the
Boltzmann constant is given as above. As always, the main importance of this result comes from its
interpretation. The 3 in $\langle K \rangle = 3 k_B T / 2$ comes directly from the fact that there are three independent
ways in which our gas atoms can use energy: that associated with motion in the x, y, and z directions.
We can use this idea to make the statement that all independent ways in which the atoms can use energy, called degrees of freedom of the system, are allotted $k_B T / 2$ worth of energy per atom. Diatomic molecules can rotate about two different axes in addition to the simple motion of the whole molecule in space, so these have $\nu = 5$. Diatomic molecules can also vibrate at high temperature, lending two more degrees of freedom (the potential and kinetic energies associated with these vibrations each count as one separate degree of freedom); diatomic molecules at high temperature are expected to have $\nu = 7$.
practice. We can approximate the amount of energy it will take to increase the temperature of a system by the small amount dT simply by multiplying $k_B\, dT / 2$ by the number of independent ways in which the atoms of the system can use the energy. Increasing the temperature requires the system to allot more energy to each of the available ways in which the system can use energy, so systems that have more degrees of freedom require more energy to change their temperature than systems with fewer degrees of freedom. A diatomic gas is expected to have a larger heat capacity than a monatomic gas for this reason, and its heat capacity is expected to increase to reflect the additional degrees of freedom as the temperature of the gas increases and the molecule begins to vibrate.
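A small sketch of the equipartition bookkeeping: with ν degrees of freedom the molar heat capacity at constant volume is (ν/2)R, and C_p exceeds it by R for an ideal gas (these are the ideal-gas predictions, not measured values):

```python
R = 8.314  # ideal gas constant, J/(K mol)

# degrees of freedom: monatomic, diatomic (rotating), diatomic (rotating + vibrating)
cases = {"monatomic (nu = 3)": 3,
         "diatomic, rigid (nu = 5)": 5,
         "diatomic, vibrating (nu = 7)": 7}

for label, nu in cases.items():
    Cv = nu / 2 * R      # molar heat capacity at constant volume
    Cp = Cv + R          # molar heat capacity at constant pressure for an ideal gas
    print(f"{label}: C_V = {Cv:5.2f} J/(K mol), C_p = {Cp:5.2f} J/(K mol)")
```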
A system with $\nu$ degrees of freedom at temperature T has total energy
$$E = \frac{\nu}{2}\, N k_B T = \frac{\nu}{2}\, p V\,,$$
which can be used to determine essentially all of the interesting features of the ideal gas. The thermal expansion coefficient and the bulk modulus can be directly calculated using the equation of state:
$$\alpha = \frac{1}{V}\left(\frac{\partial V}{\partial T}\right)_p = \frac{N k_B}{p V} = \frac{1}{T}\,, \qquad B = -V\left(\frac{\partial p}{\partial V}\right)_T = \frac{N k_B T}{V} = p\,,$$
so we immediately have the results

$$C_V = \left(\frac{\partial E}{\partial T}\right)_V = \frac{\nu}{2}\, N k_B = \frac{\nu}{2}\, n R$$
$$C_p = C_V + \frac{p V}{T} = C_V + N k_B = C_V + n R$$
$$\frac{\sigma_E}{E} = \frac{\sqrt{C_V k_B T^2}}{E} = \sqrt{\frac{k_B T}{E}} = \sqrt{\frac{2}{\nu N}}\,.$$
We can also determine the entropy with a little work. At constant volume, we have

$$dU = T\, dS \quad\Longrightarrow\quad S = \frac{\nu}{2}\, N k_B \ln\left(\frac{T}{T_0}\right) + f(V)\,,$$
where f (V) is an unknown function of V. To determine this function, we form the free energy
$$F = U - T S = \frac{\nu}{2}\, p V - T\left[\frac{\nu}{2}\, N k_B \ln\left(\frac{T}{T_0}\right) + f(V)\right]\,.$$
The pressure is equal to negative the partial derivative of the free energy with respect to volume at
constant temperature, so
$$p = -\left(\frac{\partial F}{\partial V}\right)_T = T f'(V) \quad\Longrightarrow\quad f(V) = N k_B \ln\left(\frac{V}{V_0}\right) + S_0\,.$$
The integral must be done at constant temperature because that was how the derivative was taken. The
entropy is therefore given by
$$S = N k_B \ln\left[\left(\frac{T}{T_0}\right)^{\nu/2} \frac{V}{V_0}\right] + S_0 = N k_B \ln\left[\left(\frac{p}{p_0}\right)^{\nu/2} \left(\frac{V}{V_0}\right)^{1 + \nu/2}\right] + S_0 = \frac{\nu}{2}\, N k_B \ln\left(\frac{p V^{\gamma}}{p_0 V_0^{\gamma}}\right) + S_0\,,$$
where S0 is the initial entropy of the system. In the last expression, I have introduced the ratio of
specific heats $\gamma \equiv C_p / C_V = 1 + 2/\nu$, which is extremely important in the treatment of processes involving ideal gases. Apparently, processes that take place at constant entropy also have a constant value for $p V^{\gamma}$. These adiabatic processes find much use in thermodynamics, and we will find
another use for this result later when we determine the speed of sound in an ideal gas.
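As a numerical sanity check on the last form of the entropy (a sketch with made-up reference values): compressing along pV^γ = constant leaves the entropy unchanged, while an isothermal compression to the same volume does not.

```python
import math

def entropy_per_particle(p, V, nu, p0, V0):
    """S/(N k_B) = (nu/2) ln(p V^gamma / (p0 V0^gamma)), from the expression in the text."""
    gamma = 1 + 2 / nu
    return 0.5 * nu * math.log((p * V**gamma) / (p0 * V0**gamma))

nu = 3                          # monatomic ideal gas
p0, V0 = 1.0e5, 1.0e-3          # reference state: 10^5 Pa, one liter (example values)
gamma = 1 + 2 / nu

V1 = 0.4 * V0                            # compress to 40% of the original volume
p_adiabatic = p0 * (V0 / V1) ** gamma    # keeps p V^gamma constant
p_isothermal = p0 * (V0 / V1)            # keeps p V (i.e., T) constant

print("adiabatic compression:  dS/(N k_B) =",
      f"{entropy_per_particle(p_adiabatic, V1, nu, p0, V0):+.3e}")   # essentially zero
print("isothermal compression: dS/(N k_B) =",
      f"{entropy_per_particle(p_isothermal, V1, nu, p0, V0):+.3e}")  # negative: heat flows out
```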
The ideal gas model is very useful in thermodynamics, both because it is quite accurate for
most gases at temperatures far from their liquefaction temperature and because it provides a simple
model in which we can derive and test the full results of thermodynamics. Thermodynamics is an
extremely subtle subject, with many results depending crucially on a proper understanding of what
derivatives are taken, what is held constant, and specifically what state variables something depends
on, so it is very important to have a test case to use in order to work out the kinks. The ease with
which many results in thermodynamics can be derived belies the difficulty inherent in finding the easy
way to do something. It is obvious from the above that many, many different relationships can easily
be derived between the state variables. The difficulty lies in finding the specific ones that will be
useful in bringing us closer to a given goal.

Exercises for Section X.6:


1. Determine the relative standard deviation of the energy of 2.7 moles of a diatomic ideal gas at a
temperature that allows it to rotate, but not vibrate.
2. Determine the relative standard deviation of the energy of 0.43 moles of a diatomic ideal gas at
a temperature that allows it to rotate and vibrate.
3. Determine the value of $(\partial V / \partial T)_S$ for an ideal gas with $\nu$ degrees of freedom. This quantity is
related to the adiabatic thermal expansion coefficient.
4. Show that Maxwell relations indicate that
$$\left(\frac{\partial V}{\partial T}\right)_S = -\frac{(\partial S/\partial T)_V}{(\partial S/\partial V)_T}\,,$$
and use the expression for entropy given in the text to show that this result is confirmed by the result of problem 3.

5. Determine the value of $(\partial p / \partial V)_S$ for an ideal gas with $\nu$ degrees of freedom. This quantity is
related to the adiabatic bulk modulus, and is very important in the treatment of sound waves.
6. Show that Maxwell relations indicate that

$$\left(\frac{\partial p}{\partial V}\right)_S = -\frac{(\partial S/\partial V)_p}{(\partial S/\partial p)_V}\,,$$
and use the

expression for entropy given in the text to show that this result is confirmed by the result of
problem 5.
7. Show that the Maxwell relation $(\partial p / \partial T)_V = (\partial S / \partial V)_T$ is satisfied by our expression for the
entropy of an ideal gas.
8. Show that the Maxwell relation $(\partial T / \partial p)_S = (\partial V / \partial S)_p$ is satisfied by our expression for the

entropy of an ideal gas.


9. Using the Van der Waals equation of state,

$$\left(p + \frac{n^2 a}{V^2}\right)(V - n b) = n R T\,,$$
where a and b are constants to be determined from experiment, determine the following in terms of the constants a and b along with the state variables p, T, and V. The internal energy of a Van der Waals gas with $\nu$ degrees of freedom is $U = \frac{\nu}{2}\, n R T - \frac{n^2 a}{V}$.
(a) Isothermal bulk modulus, $-V\,(\partial p / \partial V)_T$.
(b) Isobaric thermal expansion coefficient, $(\partial V / \partial T)_p / V$.
(c) Entropy.
(d) Adiabatic bulk modulus, $-V\,(\partial p / \partial V)_S$.
(e) Adiabatic thermal expansion coefficient, $(\partial V / \partial T)_S / V$.


(f) Explain why the adiabatic thermal expansion coefficient is negative for most materials.
10. Use the results of problem 9 to produce a graph of pressure versus volume for Van der Waals
isotherms at 80%, 100%, and 120% of the liquefaction temperature for the given gases. Take
the number of moles as 1. How do these graphs deviate from the ideal case? Then, make a
graph of the isothermal bulk modulus as a function of temperature at one atmosphere of
pressure. All quantities are given in SI units.
(a) Helium, with $a = 3.457\times 10^{-3}$ and $b = 2.37\times 10^{-5}$; $\nu = 3$, and liquefaction is at 4.2 K.
(b) Carbon dioxide, with $a = 0.364$ and $b = 4.267\times 10^{-5}$; $\nu = 7$, and liquefaction is at 195 K.
(c) Nitrogen, with $a = 0.14$ and $b = 3.9\times 10^{-5}$; $\nu = 5$, and liquefaction is at 77 K.
(d) Can you find a similarity in the value of the isothermal bulk modulus as the liquefaction
point is approached? How does this quantity behave as this temperature is approached?

Section X.7: The Birth of Quantum Mechanics


The advent of quantum mechanics is extremely interesting, as it relied on a great many
physicists at the close of the 19th century and the beginning of the 20th interpreting different
experimental results that at times seemed contradictory and totally out of line with the classical theory.
Probably the most important of these lies in Planck's determination of a function to model the
measured blackbody radiation spectrum. All objects give off some amount of radiation, as a result of
their temperature. This is essentially the idea of equipartition of energy at work. Since atoms and
molecules contain electric charge, they interact with the electromagnetic field. Electromagnetic fields
can support waves, as established by James Maxwell in 1865 and demonstrated experimentally by the
German physicist Heinrich Hertz in 1887. These waves represent degrees of freedom that the
electromagnetic field can use to store energy, so thermodynamics indicates that the energy of all atoms
and molecules must be shared with these degrees of freedom in order to maximize the entropy of the
system. Unfortunately, there are infinitely many of these degrees of freedom and there is no classical
way in which to limit the energy they hold. Every different frequency of the waves represents a
separate degree of freedom, and the energy associated with this degree of freedom depends on the
square of the amplitude of vibration of the electromagnetic wave. Making the amplitude smaller
means that the degree of freedom requires less energy, so all of the frequencies ought to share in the
thermodynamic energy available; none of the frequencies are classically forbidden to participate, as the
required state energies can be made as small as one likes.
This line of reasoning leads to a complete breakdown of physics. It takes an infinite amount of
energy to satisfy this infinite number of degrees of freedom, so every object seems doomed to expend
all of its energy trying to satiate the electromagnetic field. The only temperature at which an object
can be in thermal equilibrium classically seems to be absolute zero. At long wavelengths and small
frequencies, the number of degrees of freedom is limited by phase space.166 Classical accounts of the
amount of energy carried by these frequencies, culminating in the Rayleigh-Jeans law, named for the
English physicists John William Strutt (the 3rd Baron Rayleigh) and Sir James Hopwood Jeans, agree
very well with experiment for low frequencies or large wavelengths. These treatments utterly fail at
small wavelengths, though, so something is clearly wrong. The Rayleigh-Jeans law puts the amount of
energy per unit volume stored in electromagnetic waves in thermal equilibrium with matter at
temperature T with wavelength between $\lambda$ and $\lambda + d\lambda$ at
$$\rho_{EM}(\lambda)\, d\lambda = \frac{8\pi k_B T}{\lambda^4}\, d\lambda\,.$$

In terms of frequency, this is


$$\rho_{EM}(\nu)\, d\nu = \frac{8\pi k_B T\, \nu^2}{c^3}\, d\nu = k_B T\; \frac{2 \cdot 4\pi \nu^2\, d\nu}{c^3}\,,$$
where c represents the speed of light in a vacuum. The effect of phase space is obvious in the last form
given, and it is clear that the integral over frequency will not converge at large frequencies. This is the
so-called ultraviolet catastrophe, as ultraviolet light is associated with large frequencies.


Figure 7
166. Small frequencies represent small spherical shells in phase space, so have a limited number of states.

The experimental results are quite different. Figure 7 illustrates a comparison of the Rayleigh-Jeans result (dashed) with experimental results (solid), obtained by measuring the amount of energy associated with specific frequency windows emitted by an object at temperature T. The figure is plotted for a temperature of 300 K versus the frequency of the electromagnetic waves, in units of $10^{14}$ Hz, and the vertical axis gives the energy density held per frequency in SI units. The Rayleigh-Jeans law clearly tracks the data quite well until the experimental results completely change direction
and plummet toward zero at a certain frequency. This is, of course, exactly what is required in order to
have a sensible theory in which matter can attain thermal equilibrium at finite temperature, but in order
to understand what it implies about the underlying theory we have more work to do. Early analysis of
these experimental results led German physicist Wilhelm Wien to publish Wien's displacement law for
the wavelength associated with maximum energy density in 1893. This law puts the maximum, whose
appearance clearly illustrates the difference between experiment and the Rayleigh-Jeans law, at
$\lambda_{\max} T = 2.90\ \mathrm{mm\,K}$. This wavelength does not correspond directly to the peak illustrated in figure 7, as it was derived by considering the peak in energy density reckoned as a function of wavelength rather than frequency. Frequency windows are not equal to wavelength windows, and this difference contributes to the location of the maximum. The wavelength of this maximum-frequency wave satisfies $\lambda_{\max}^{(\nu)} T = 5.10\ \mathrm{mm\,K}$, so the two results differ approximately by a factor of 7/4. Using Wien's
displacement law, we can calculate the approximate temperature of the Sun by measuring the peak
wavelength featured in its spectrum. This gives a surface temperature of approximately 5800 K,
corresponding to a peak wavelength of about 500 nm. This wavelength is associated with greenish-yellow light, so the Sun should appear greenish-yellow in color. The scattering of light by our
atmosphere alters this somewhat, as the atmosphere preferentially scatters light with short
wavelengths. The light coming from the Sun is separated into its higher wavelength components,
which appear to come directly from the Sun, and its lower wavelength components, which are
scattered by the atmosphere and appear to come from all over. This is the reason why the Sun appears
yellow to us and the sky appears blue: blue light has a smaller wavelength than yellow, so is scattered
more efficiently by our atmosphere.
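Wien's law makes this estimate a one-line computation (a sketch; the 500 nm peak is the value quoted above):

```python
wien_b = 2.90e-3     # Wien's displacement constant, m*K (as quoted in the text)
lam_peak = 500e-9    # observed peak wavelength of the solar spectrum, m

T_sun = wien_b / lam_peak
print(f"estimated solar surface temperature: {T_sun:.0f} K")   # about 5800 K
```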
Wien's displacement law applies to all objects in thermal equilibrium at temperature T, not just the hot ones. The law indicates that bodies at higher temperatures see a smaller wavelength associated with maximum energy density. The wavelength associated with maximum energy density for people, with an internal temperature of approximately 300 K, is approximately given by $10\ \mu\mathrm{m}$.167
This wavelength lies in the far infrared, and is not directly observable by our eyes, but can be
measured with scientific equipment. In the far infrared, we are all little lightbulbs. This fact is
exploited in the manufacture of night goggles and thermal imaging devices, which focus on larger
wavelengths of light than those directly observable to us.
Although Wien's displacement law is very useful in determining the approximate wavelength
expected to be emitted by a body at a given temperature, real progress toward understanding the
reason behind the peak in energy density had to wait a few years. The German physicist Max Planck
was able to demonstrate in 1900 that the experimental results follow the model
$$\rho_{EM}(\nu)\, d\nu = \frac{8\pi h \nu^3}{c^3}\, \frac{1}{e^{h\nu/k_B T} - 1}\, d\nu = \frac{2 h \nu}{e^{h\nu/k_B T} - 1}\, \frac{4\pi \nu^2\, d\nu}{c^3}\,,$$
where h is Planck's constant, given experimentally by $h = 6.6260693(11)\times 10^{-34}\ \mathrm{J\,s}$. Phase space is explicit here, and the appearance of the exponential factor is quite promising to the theoretical development of this law in
terms of Boltzmann's result. In order to fully understand this expression, we need to think carefully about what we would expect the distribution of energy to look like independent of the actual probability of occupation.

167. The wavelength associated with maximum energy density, when reckoned in terms of frequency, is approximately $17\ \mu\mathrm{m}$, as can be read from figure 7.
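To see the two laws side by side numerically, the sketch below evaluates both spectral energy densities at 300 K (the sample frequencies are arbitrary choices that bracket the peak in figure 7):

```python
import math

h  = 6.626e-34      # Planck's constant, J s
kB = 1.381e-23      # Boltzmann constant, J/K
c  = 2.998e8        # speed of light, m/s
T  = 300.0          # K

def rayleigh_jeans(nu):
    """Classical energy density per unit frequency, 8 pi kB T nu^2 / c^3."""
    return 8 * math.pi * kB * T * nu**2 / c**3

def planck(nu):
    """Planck energy density per unit frequency, 8 pi h nu^3 / c^3 / (exp(h nu / kB T) - 1)."""
    return 8 * math.pi * h * nu**3 / c**3 / math.expm1(h * nu / (kB * T))

for nu in (1e12, 1e13, 5e13, 1e14):
    print(f"nu = {nu:.0e} Hz: Rayleigh-Jeans {rayleigh_jeans(nu):.3e}, "
          f"Planck {planck(nu):.3e} J s/m^3")
```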
Fundamentally, the amount of energy stored in the electromagnetic field with frequency lying
between and d is expected to be given by
energy = (energy per state) × (occupation of state) × (number of states).
As mentioned above, the number of states is given by the volume of phase space associated with the
frequency window between $\nu$ and $\nu + d\nu$. We can count this in a manner similar to that we used
above to compute the number of states within a given speed window. This derivation is somewhat
more detailed, however, as there is no normalization factor; we do not know what to use for the
number of atoms associated with the electromagnetic energy. The Lagrange multiplier is absent
from our present analysis because we have no reason to require the total number of atoms to be
constant, so the Boltzmann factor itself plays the role of probability. An electromagnetic wave with
frequency $\nu$ can move in any of the three directions. The direction of its motion is given by the vector wavenumber $\mathbf{k}$, called a wavenumber because its components are related to integers in
simple quantum theories. This fact will be important to our process of counting states. The units of
the wavenumber are inverse meters; to transform its dimension into that of the frequency, we multiply
by the speed of light:
$$2\pi \nu = c k\,.$$
The presence of $2\pi$ requires a bit of explaining. The wavenumber $\mathbf{k}$ describes the wave in terms of the trigonometric functions sine and cosine. We can write its description as $\exp(i\mathbf{k}\cdot\mathbf{r})$, including both sine and cosine together. For any function f (x), the function f (x - 3) has exactly the same shape but lies 3 units to the right of the original function. The function f (x - vt) moves to the right with speed v as time goes on, so we can generate the motion of the wave by writing
$$e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}\,.$$
Here, $\omega$ is the angular frequency; increasing the time by $2\pi/\omega$ takes us back to the same value of the function, so the period is given by $2\pi/\omega$ and the frequency by $\nu = \omega/2\pi$. The ratio $\omega/k$ is equal to the speed of the wave, the speed of light c in our case, by comparison with the example using f (x), so we have the above result.
The quantity in parentheses is equal to $\mathbf{k}\cdot\mathbf{r} - \omega t$, so the wave moves in a direction parallel to $\mathbf{k}$.
The wavenumber therefore dictates the direction of the wave and is the key to a proper determination
of the number of states. It is impossible to count the states as-is because the vector k is a continuous
variable. Each of its components can separately take any real value. To avoid this issue, we imagine
that our system is contained within a box with sides of length L. This box can be very large, as large
as we like, but in order to fit within the box the wave must repeat at each boundary. This
requirement is that of periodic boundary conditions, and stipulates that all of the interesting features of
the wave must occur within the box. Behavior outside the box must simply repeat what is going on
inside. This condition can be imposed by requiring that the wavefunction remain unchanged under any
of the replacements $x \to x + L$, $y \to y + L$, or $z \to z + L$, which is easily accomplished by taking $\mathbf{k} = 2\pi \mathbf{n}/L$, where $\mathbf{n}$ is a vector whose components are all integers. With this in mind, the volume of
phase space is easy to obtain. The components of n are all integral, so they can only change by a unit
amount. Each distinct vector n represents exactly one state, so the number of states with
$\mathbf{n} = n_x \mathbf{i} + n_y \mathbf{j} + n_z \mathbf{k}$ is given by 1, or $\Delta n_x\, \Delta n_y\, \Delta n_z$. Now, we are really interested in expressing this state

density in terms of changes in k rather than changes in n. This is easily accomplished from our
expression for k in terms of n:
$$\Delta n_x\, \Delta n_y\, \Delta n_z = \left(\frac{L}{2\pi}\right)^3 \Delta k_x\, \Delta k_y\, \Delta k_z = \frac{V\, d^3 k}{(2\pi)^3}\,.$$
This is the effective number of states associated with the window with vector wavenumber lying between $\mathbf{k}$ and $\mathbf{k} + d\mathbf{k}$. It is extremely useful to have this result on hand, as it is applicable to essentially any situation in which the relevant states are indexed by a continuous vector quantity. At present, we are really only interested in the magnitude of k, as that is what is related to the frequency, so, assuming isotropy, we write
$$\frac{V\, 4\pi k^2\, dk}{(2\pi)^3}$$
for the number of states with wavenumber lying between k and k + dk.
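A brute-force check of this counting (a sketch with arbitrary box size and cutoff): enumerate the integer vectors n with |2πn/L| ≤ k and compare with the phase-space estimate V·(4/3)πk³/(2π)³.

```python
import itertools, math

L = 1.0          # box side in meters (arbitrary)
k_max = 120.0    # cutoff wavenumber in 1/m (arbitrary, large enough for decent statistics)

n_max = int(k_max * L / (2 * math.pi)) + 1
count = 0
for nx, ny, nz in itertools.product(range(-n_max, n_max + 1), repeat=3):
    if (2 * math.pi / L) ** 2 * (nx**2 + ny**2 + nz**2) <= k_max**2:
        count += 1   # one state per allowed integer vector n

estimate = L**3 * (4 / 3) * math.pi * k_max**3 / (2 * math.pi) ** 3
print(f"direct count:         {count}")
print(f"phase-space estimate: {estimate:.0f}")
```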
Armed with this result, we are ready to write an expression for the energy expected in a given
frequency window. Writing $f(\nu)$ for the energy held per state, or the energy associated with the state times the probability of occupation per state, we have the energy
$$f(\nu)\, \frac{V\, 4\pi \nu^2\, d\nu}{c^3}$$
held in the window between $\nu$ and $\nu + d\nu$. Comparing this to the experimental result for energy stored per unit volume gives the experimental result
$$f(\nu) = \frac{2 h \nu}{e^{h\nu/k_B T} - 1} = 2 h \nu \left(e^{-h\nu/k_B T} + e^{-2 h\nu/k_B T} + e^{-3 h\nu/k_B T} + \cdots\right)\,.$$
This is a sum of Boltzmann factors associated with states of energy $n h \nu$ for natural numbers n, indicating that each of the states in our phase space is associated with an infinite number of states with energies $n h \nu$. We can interpret these states, as Planck did, in terms of a single state with different occupation numbers. The factor $e^{-h\nu/k_B T}$ is associated with a single occupation of this state, the factor $e^{-2 h\nu/k_B T}$ is associated with double occupation, and so on. The energy of the state is given by $h\nu$, and
the sum of Boltzmann factors takes the place of multiplication by the occupation number. There is still
the issue of a factor of 2 left unaccounted for, but we will leave this discussion for a little later. The
more pressing issue is that these states were supposed to be associated with electromagnetic waves.
What on Earth is occupying these states?!
Planck's discovery was completely unexpected at the time, and led to great controversy among
scientists. Essentially, it states that electromagnetic waves are populated by an as-of-yet unknown
quantity that can only appear in integral values. This result was obtained by experiment, so must either
be understood in these terms or explained in another way. It is extremely difficult, if not impossible,
to explain this quantization of energy in terms of waves, as waves are classically allowed to have any
amplitude they wish. There should not be any requirement on a minimum energy for excitation;
classical waves do not jump from one energy to another, as they have continuous energies. This
jump is reminiscent of the behavior of a particle, not of a wave. Nevertheless, the experimental
results seem to indicate that electromagnetic waves with frequency $\nu$ can only occur with energies
that are integral multiples of $h\nu$. Planck himself neither liked nor believed his own results, and has been famously quoted as saying that his analysis resulted from "a purely formal assumption," that he "did not think much about it," and further that "it was an act of despair ... I was ready to sacrifice any of my previous convictions about physics." Later on, he stated that "A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it." These comments arose from
his dislike of the new quantum theory that was developing around him during the 1920s as he aged
into his twilight years. Despite Planck's dislike of his own legacy, he was awarded the Nobel Prize in
physics in 1918 for his treatment of the blackbody radiation spectrum. Einstein, another vocal critic of
quantum theory, was awarded the Nobel Prize in 1921 for his interpretations of Planck's result in the
analysis of the photoelectric effect. This very strange duality associated with the founders of the
theory vocally criticizing it is indicative of the difficulty inherent in accepting quantum theory, and
makes the history of quantum mechanics extremely interesting. Regardless of this interesting history,
however, successive experiments have demonstrated the validity of Planck's act of desperation again
and again. This represents an important fact about science: you cannot argue with experiment and try
to get it to see your point of view. It is what it is, whether you like it or not. It is a testament to both
Planck's and Einstein's character as scientists that they were able to forward a theory that they both
disliked so much because it was the only logical way to proceed with the given experimental data.
Planck's treatment indicates that electromagnetic wave states are populated by particle-like objects now called photons. Even this name has an interesting history. It was introduced in 1926, a quarter of a century after Planck's result, by the American physical chemist Gilbert Lewis.
Planck called these objects quanta, and Einstein called them das Lichtquant (German for light
quanta). Photons of a given frequency $\nu$ can only exist with energy exactly equal to an integral multiple of $h\nu$. States of frequency $\nu$ are therefore thought of as having the ability to contain any number of photons, each of which carries energy $h\nu$. We can sometimes think in this way of photons
as particles, and this is convenient when we are trying to qualitatively understand a phenomenon like
that associated with the photoelectric effect or Planck's blackbody radiation spectrum, but we must
always remember that they represent the fluctuations in the electric and magnetic field in a given
region. These fluctuations are not particles, though through quantum effects they can behave like
particles in some ways. This was later revealed to also be a property of electrons in atoms, so we must
admit that electrons are also not particles in the qualitative sense of the word. All objects small
enough to exhibit quantum effects sit in a kind of limbo between the two classical ideas of particle
and wave; we must be ready to view them from either perspective in order to correctly account for
their behavior. The major difference between electrons and photons, and the reason why the quantum
behavior of photons (at least, the surface quantum behavior) was understood well before that of
electrons, lies in the fact that electrons have mass and photons do not. Through Einsteins famous
relation E mc 2 , this mass is associated with an energy requirement. The price of creating an
electron is 8.19 1014 J ,168 as this is the amount of energy stored in its mass. This may not seem like
much, but its ratio to k BT gives
5.93 109 K
.
T
Unless the temperature of a system lies on the order of about six billion Kelvin (again, its really 12),
the allocation of this much energy to a single degree of freedom is highly unlikely. This temperature is
not found anywhere in the natural universe, except in supernovas; it is three orders of magnitude
greater than the temperature at the core of the Sun. For this reason, the quantum effects of electrons
are mainly restricted by their interaction with protons and other nuclei. This makes the obvious effects
of quantum mechanics more difficult to discern experimentally, as the energies involved are much
lower than those associated with creating electrons. The fact that photons are massless allows them to
be created at much lower energies than electrons, so their thermodynamic behavior is vastly modified
by quantum effects. Photons cannot be thought of as classical particles, so their total number is not
conserved; the number of photons increases as the temperature increases.

168. In reality, it is twice this amount because electrons cannot be created by themselves. In order for an electron to be created out of the vacuum, its antiparticle, the positron, must also be created.

Although they are decidedly not particles, a great deal of insight into how the physics works
can be attained by thinking of photons as particles. In these terms, the number of photons occupying
states with frequency lying between $\nu$ and $\nu + d\nu$ is given by
$$\frac{2}{e^{h\nu/k_B T} - 1}\; \frac{4\pi \nu^2\, d\nu}{c^3}\; V\,.$$
The total energy stored per unit volume in the electromagnetic field in thermal equilibrium at
temperature T is therefore given by
$$\frac{E}{V} = \int_0^\infty \frac{2 h \nu}{e^{h\nu/k_B T} - 1}\, \frac{4\pi \nu^2\, d\nu}{c^3} = \frac{8\pi h}{c^3}\left(\frac{k_B T}{h}\right)^4 \int_0^\infty \frac{x^3\, dx}{e^x - 1} = \frac{8\pi^5}{15}\, k_B T \left(\frac{k_B T}{h c}\right)^3 \approx 7.55\times 10^{-16} \left(\frac{T}{1\ \mathrm{K}}\right)^4 \mathrm{J\ m^{-3}}\,.$$

This energy density increases with the fourth power of the temperature, while that of an ideal gas
increases only with the first power, so will become dominant at very high temperatures. At 300 K, the
energy density is only $6.12\ \mu\mathrm{J\ m^{-3}}$, almost eleven orders of magnitude less than that associated with one atmosphere's worth of helium, at $1.52\times 10^{5}\ \mathrm{J\ m^{-3}}$. Ordinary low temperatures are referred to as
matter dominated for this reason. At higher temperatures, the energy density of radiation can far
exceed that of matter. These regimes are known as radiation dominated. Note that the effects of
electromagnetic energy consumption are inescapable. All atoms interact with the electromagnetic
field, and therefore share energy with it. If there is not enough energy deposited in the electromagnetic
field to justify a given temperature quote, then energy will flow from the matter to the electromagnetic
field fluctuations until there is. The number of photons, while not necessarily relevant to any physical
theory, is interesting to investigate nonetheless. It is given by
$$\frac{N}{V} = \int_0^\infty \frac{2}{e^{h\nu/k_B T} - 1}\, \frac{4\pi \nu^2\, d\nu}{c^3} = 16\pi\, \zeta(3) \left(\frac{k_B T}{h c}\right)^3 \approx 2.03\times 10^{7} \left(\frac{T}{1\ \mathrm{K}}\right)^3 \mathrm{m^{-3}}\,,$$
so there are almost 550 trillion photons in each cubic meter of space in thermal equilibrium at 300 K.
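Both closed forms are easy to evaluate; the sketch below reproduces the two 300 K numbers quoted above (ζ(3) ≈ 1.2020569 is Apéry's constant):

```python
import math

h  = 6.62607015e-34   # J s
kB = 1.380649e-23     # J/K
c  = 2.99792458e8     # m/s
zeta3 = 1.2020569     # Riemann zeta(3)
T = 300.0             # K

x = kB * T / (h * c)                                      # (k_B T / h c), in 1/m
energy_density = (8 * math.pi**5 / 15) * kB * T * x**3    # J per cubic meter
number_density = 16 * math.pi * zeta3 * x**3              # photons per cubic meter

print(f"energy density at {T:.0f} K: {energy_density:.3e} J/m^3")   # about 6.1e-6
print(f"photon number density:      {number_density:.3e} per m^3")  # about 5.5e14
```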
We have accounted for all of the factors in the experimental state density for photons except for
the factor of 2. This factor comes from the fact that there are two independent ways in which an
electromagnetic field can fluctuate. Taking k in the z-direction, the electric field can fluctuate in either
the x- or y-directions. This represents two separate states, two different polarization possibilities for
each photon, which accounts for the additional factor of 2. It should properly be placed with the phase
space contribution because it represents multiple states rather than a thermodynamic contribution. In
general, the phase space volume is given by
$$\frac{V g\, d^3 k}{(2\pi)^3}\,,$$
with the degeneracy factor g representing the number of states associated with each distinct value of k.
In our case, the degeneracy of 2 had been well-known as the polarization of an electromagnetic wave
for several years by the time Planck published his analysis of blackbody radiation. It is a completely
indirect experimental confirmation of this theoretical expectation, once the ideas of quantum theory
have been accepted, and represents a tremendous triumph of Planck's result.
One of the most famous results to be derived from Planck's distribution is that of Austrian physicist Joseph Stefan, the Stefan-Boltzmann law giving the intensity of electromagnetic radiation emanating from an object in thermal equilibrium at temperature T. This law was originally proposed based on experimental evidence and the use of the general ideas associated with Boltzmann's treatment of statistical mechanics in 1879, but can be derived from Planck's distribution. Consider a
body in thermal equilibrium at temperature T. The volume of this body contains electromagnetic
energy as a result of thermal interaction between the molecules of the body and the electromagnetic
field. Electromagnetic waves are not attracted to small amounts of matter to any reasonable extent, so
essentially all of the electromagnetic waves that leave the body represent radiation emanating out of
the body, never to return. Focusing on a small surface element with area dA oriented perpendicular to
the z-axis, we can calculate the total amount of energy leaving the body through this surface in a given
amount of time. Only photons moving in the right direction and lying closer to the boundary than c dt
away will be able to make it to the boundary in time dt, so the effective phase space volume associated
with photons that escape is given by
$$\frac{2V\,d^3k}{(2\pi)^3}\;\longrightarrow\;\frac{2\,c\,dt\,dA}{(2\pi)^3}\int_0^{2\pi}\!d\phi\int_0^{\pi/2}\!\cos\theta\,\sin\theta\,d\theta\;k^2\,dk = \frac{2\pi\,k^2\,dk}{(2\pi)^3}\,c\,dt\,dA\,.$$
This is one quarter of what we had for the energy density, so we can immediately state that the amount
of energy leaving a surface of area dA in time dt is
$$dE = \frac{2\pi^5}{15}\left(\frac{k_B T}{hc}\right)^3 k_B T\,c\;dA\,dt\,.$$
Dividing by the area and the time, we obtain the power per unit area, or intensity of the radiation
$$I(T) = \frac{2\pi^5 c}{15}\left(\frac{k_B T}{hc}\right)^3 k_B T = \sigma T^4\,,\qquad \sigma = 5.670400(40)\times10^{-8}\ \frac{\mathrm{W}}{\mathrm{m^2\,K^4}}\,.$$
The constant $\sigma$ is the Stefan-Boltzmann constant. The total power radiated by an object with surface
area A in thermal equilibrium at temperature T is given by the Stefan-Boltzmann law
$$P = \sigma A e T^4\,.$$
The constant e is the thermal emissivity, characterizing the degree to which the body allows its
electromagnetic energy to escape.
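As a quick consistency check, the closed form $\sigma = 2\pi^5 k_B^4 / (15 h^3 c^2)$ implied by the integration above can be evaluated directly from the fundamental constants. A minimal Python sketch, assuming scipy.constants is available for the CODATA values:

```python
import numpy as np
from scipy.constants import h, c, k, Stefan_Boltzmann  # CODATA values

# sigma = 2 pi^5 k_B^4 / (15 h^3 c^2), obtained by integrating Planck's distribution
sigma = 2 * np.pi**5 * k**4 / (15 * h**3 * c**2)

print(f"derived sigma   = {sigma:.6e} W m^-2 K^-4")
print(f"tabulated sigma = {Stefan_Boltzmann:.6e} W m^-2 K^-4")
# Both print 5.670374e-08; the text quotes the slightly older value 5.670400(40)e-08.
```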
Using the Stefan-Boltzmann law, it is possible to derive an expression for the average
temperature of planets in orbit about a star with a given surface temperature. Assuming that the star is
a perfect blackbody, with e = 1, we reason that a planet in thermal equilibrium with its star must
receive the same amount of energy it radiates. Equating the two, we have
$$4\pi R_S^2\,\sigma T_S^4\;\frac{\pi R_p^2}{4\pi d^2} = 4\pi R_p^2\,e\,\sigma T_p^4 \quad\Longrightarrow\quad T_p = \sqrt{\frac{R_S}{2d\sqrt{e}}}\;T_S\,.$$
This expression results from the fact that the planet only soaks up its image in stellar energy,
approximately its cross-sectional area divided by the area of a sphere with radius equal to that of its
orbit, d. The result is an expression for the planet's equilibrium temperature in terms of its thermal
emissivity and average distance from the star as well as the star's radius and temperature. Note that
the temperature increases as the emissivity decreases. The emissivity measures the percent of the
energy that our planet is supposed to give off that it actually does give off. It is modified by chemical
fluctuations in the atmosphere, which is at the root of the current debate concerning global warming.
The radius of the Sun and the average distance between many planets and the Sun can be calculated
using Kepler's laws in conjunction with experimental data, and the surface temperature of the Sun,
approximately 5800 K, can be obtained by comparing the measured frequency distribution with
Planck's result, so we can obtain an estimate of the temperature of the planets using this result. The
radius of the Sun is approximately $7\times10^{8}$ m, and that of Earth's orbit is approximately $1.5\times10^{11}$ m.
Taking e = 1 for the moment, we find the equilibrium temperature of the Earth to be approximately
280 K. This value increases as the emissivity decreases; an emissivity of 0.9 leads to an equilibrium
temperature of approximately 288 K for the Earth. Planets that lie farther away from the Sun
obviously have lower equilibrium temperatures, and those closer have higher temperatures. The
equilibrium temperature of Mercury is approximately 450 K, and that of Pluto is approximately 45 K.169
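These planetary estimates are easy to reproduce from the relation $T_p = T_S\sqrt{R_S/(2d\sqrt{e})}$ derived above. The following Python sketch is only an illustration; the orbital radii entered below are rough reference values rather than data quoted in this section.

```python
import math

def equilibrium_temperature(d, T_star=5800.0, R_star=6.96e8, e=1.0):
    """Blackbody equilibrium temperature T_p = T_S * sqrt(R_S / (2 d sqrt(e)))."""
    return T_star * math.sqrt(R_star / (2.0 * d * math.sqrt(e)))

# Approximate mean orbital radii in meters (illustrative values).
orbits = {"Mercury": 5.79e10, "Venus": 1.082e11, "Earth": 1.496e11, "Pluto": 5.9e12}
for name, d in orbits.items():
    print(f"{name:8s}: {equilibrium_temperature(d):6.1f} K")
# Earth comes out near 280 K and Mercury near 450 K, as quoted in the text;
# with e = 0.9 the Earth value rises to about 288 K.
```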
These temperatures certainly do not give us an accurate account of what is actually happening
on the surface of the planet. The assumptions we made in its derivation require that the whole planet is
in thermal equilibrium, so all of the energy absorbed from the Sun must immediately be shared with
the entire mass of the planet even though it clearly was absorbed on the side that is currently facing the
Sun. The transport of thermal energy is limited by the thermal conductivity of the matter the planet
consists of, so the average temperature on the side of the planet facing the Sun (the side experiencing
daytime) can be much larger than that on the other side (the side experiencing night). These
temperature differences are moderated by the atmosphere, which acts to keep heat in and prevent it
from being radiated away into space, as well as providing a convenient means through which the
energy can be spread over the whole surface of the planet. Mercury lies very close to the Sun and its
gravity does not allow it to have much of an atmosphere, so its average nighttime temperature can
reach lows of 70 K while its average daytime temperature can exceed 700 K. This effect is
exacerbated by the fact that Mercury is almost locked in its orbit about the Sun; the cumulative
effects of the Sun's gravitational influence over billions of years have slowed its rotation to the point
where one day on Mercury lasts almost two Mercury years.170 The planet Venus, on the other
hand, has an extremely dense atmosphere consisting almost entirely of carbon dioxide gas. The
absorption spectrum of carbon dioxide makes it one of the premier greenhouse gases, as it readily
absorbs the most prominent wavelengths in the Sun's spectrum. This decreases its emissivity to the
point that the average temperature of Venus, at approximately 740 K, exceeds the maximum
temperature quoted above for Mercury, even though Venus lies twice as far from the Sun and is not
locked in its orbit to anywhere near Mercury's degree. The atmosphere of Venus also allows heat to
flow more freely over its surface, as wind currents can readily carry thermal energy from the side
facing the Sun to the side in darkness, so Venus represents a much better example of a planet in
thermal equilibrium than Mercury.
Obviously, the atmosphere of a planet affects its temperature enormously and we cannot simply
rely on the above result to accurately determine the temperature of a given planet. However, this
formula is very useful in giving us an expectation of the surface temperature. Discrepancies with this
expectation represent a fertile ground for us to probe in order to determine more about the nature of the
planet's atmosphere and other relevant effects. This is an important idea to understand: we should not
simply discount a physical theory because its predictions are wrong. If the physical theory has some
reliable basis, the question we should be asking is why the predictions are wrong. This often leads to
very interesting results, and paves the way forward to understanding the universe more completely.
Planck's result is one of the most important physical results over the last two centuries, and is
extremely useful in many different situations. Its study led to the development of quantum mechanics,
and it is still used to distinguish between two fundamentally different types of fundamental particles:
fermions and bosons. The Planck distribution implies that electromagnetic wave states can be
occupied by an arbitrary number of photons, in stark contrast to that found for electrons. The Pauli
exclusion principle, named for the Austrian physicist Wolfgang Pauli, states that only one electron is
allowed to occupy any given state. Working through the Planck analysis with this in mind, we arrive
at the factor
$$\frac{1}{e^{h\nu/k_B T} + 1}$$
instead of that associated with Planck. The minus signs in the geometric expansion forbid more than
one particle from occupying a given state. This result is very useful in the study of metals, stars, and
many other physical systems. Though it is a result with a completely different character from Planck's, its
derivation involves the same ideas. This is an excellent example of the interplay between physical
theories describing different phenomena. Advances in science often come from studying the results of
other scientists, even those who are not working in your field. The common denominator is the
mathematics underlying all physical theories. This is one reason why a study of mathematics is so
important to science. Rather than just knowing how to predict the outcome of an experiment given an
empirical result, scientists must understand what went into the empirical result and why it is expected
to be true. This allows them to modify the underlying assumptions and generalize the theory when
new experimental results come to light.

169 Pluto's orbital radius is about 100 times Mercury's, leading to a factor of 10 in the temperature.
170 This results from the fact that three complete revolutions of Mercury take the same amount of time as two full orbits;
after two years, Mercury's day/night cycle begins again. If the rotation period of Mercury were equal to its orbital period,
then Mercury would be fully locked into its orbit, with the same side facing the Sun at all times. This phenomenon is
exhibited by the Moon's orbit about the Earth, which is why we always see the same side of the Moon from Earth.

Exercises for Section X.7:


In problems 1 – 4, determine the (a) number of photons per unit volume and (b) amount of energy per
unit volume carried by photons.
1. The temperature is 500 K.

2. The temperature is 1000 K.

3. The temperature is 77 K.

4. The temperature is 15,000 K.

5. Determine the average photon energy at temperature T, the total photon energy divided by the
number of photons. How is this related to the classical result of the equipartition theorem,
$k_B T$?
6. At what temperature does a gas of helium atoms at one atmosphere of pressure become radiation
dominated?
7. At what temperature does a gas of hydrogen atoms at five hundred atmospheres of pressure
become radiation dominated?
8. At what temperature does the stability of hydrogen atoms become questionable? In other words,
which temperature sees an average photon energy that equals 13.6 eV?
9. At what temperature does the stability of deuterium nuclei become questionable? In other
words, which temperature sees an average photon energy that equals the binding energy of a
deuterium nucleus, 2.22 MeV?
10. Determine the equilibrium value of the temperature of the following planets. The average
radius of the orbit is given, and the radius of the Sun is $R_S = 6.96\times10^{8}$ m. Assume that the
emissivity of the surface is 1.
(a) Jupiter, $a = 7.785\times10^{11}$ m.
(b) Saturn, $a = 1.43345\times10^{12}$ m.
(c) Uranus, $a = 2.87668\times10^{12}$ m.
(d) Neptune, $a = 4.503\times10^{12}$ m.

11. Use Wien's displacement law to determine the wavelength of maximum energy associated with
a blackbody in thermal equilibrium at 8,000 K. What color does this wavelength appear?
Would you expect an object at 8,000 K to appear this color? Why or why not?
12. Use Wien's displacement law to determine the wavelength of maximum energy associated with
a blackbody in thermal equilibrium at 270 K. What part of the electromagnetic spectrum does
this wavelength lie in?
13. Determine the temperature required for 10% of the photons to have energy exceeding the
dissociation energy, 4.737 eV, of a hydrogen molecule. How many photons does this result in?
Compare this value to the number of hydrogen molecules per unit volume present in a sample of
hydrogen gas at this temperature and a pressure of 1 atm, 50 atm, and 300 atm. Assume that
hydrogen is an ideal gas. Do you expect hydrogen under these conditions to exist mainly as a
molecule or an atom? How do these results compare to the average number of hydrogen
molecules per unit volume with energy exceeding the dissociation energy at this temperature?
14. Determine the temperature required for 10% of the photons to have energy exceeding the
dissociation energy, 2.22 MeV, of a deuterium nucleus. How many photons does this result in?
Compare this value to the number of deuterium nuclei per unit volume present in a sample of
deuterium gas at this temperature and a pressure of 1 atm, 50 atm, and 300 atm. Assume that
the electrons do not significantly modify this calculation, and that the deuterium nuclei can be
treated as an ideal gas. How do these results compare to the average number of deuterium
nuclei per unit volume with energy exceeding the dissociation energy at this temperature?
15. Suppose that photons were fermions, so their population is dictated by the Fermi-Dirac
distribution instead of the Bose-Einstein distribution. Determine the number of photons and
energy held by these photons in thermal equilibrium at temperature T. Compare your results
to those seen in reality. How can you account for the difference?

Section X.8: Summary of Chapter X


This chapter may come across as a sort of non sequitur when compared to the chapters that
come before it, as it is mainly concerned with phenomena that we cannot predict exactly. In classical
physics, this implies a lack of information about the initial conditions. Averaging over the possible
initial conditions associated with a given system will give the results obtained above, but this often
leads to a subtle feeling that the experimentalists have somehow been asleep at the switch. It is
important to understand that many different physical phenomena do not allow us to determine exact
initial conditions and that many of these situations can also be considered to be somewhat chaotic in
the sense that small changes in the initial conditions lead to large changes in the observed behavior of a
system after some time. The ante is further increased with quantum mechanics, which prevents an
exact determination of the initial conditions needed to give an exact treatment of the future evolution
of a given state. In these situations, the only available approach is that forwarded by statistical
mechanics. In this, as well as many other classical situations where the exact positions and
momenta of each of the constituent molecules are unavailable, statistical mechanics gives us an
approach that gives results quite close to those seen in experiment without a large amount of work.
The treatment of this chapter is introduced from the standpoint of random outcomes, and it
stands properly in that light. This chapter cannot be used to determine the exact results of any
experiment, only to determine the probability that a given result will be attained. This strange
situation is appropriate for systems in which the initial conditions are not sufficient to determine the
outcome of an experiment completely. It is not always possible to determine these initial conditions
completely, and this intrusive inquiry would often lead to a less general result even when it is possible.
Systems containing a large number of degrees of freedom are often treated from this point of view, as
it is difficult to determine the exact initial conditions exhibited by a given system and more general
results are more useful for these systems anyway. This treatment is required by quantum mechanics,
but we do not need this paradigm change to understand the importance of these models. This chapter
is intended to illustrate the techniques used to determine the bulk behavior of a large system of objects
for which the initial conditions of each degree of freedom are not known.
Many of this chapters results are very useful even in systems for which quantum mechanics is
not appropriate. Two of the most important cases in which it comes up involve condensed matter
systems and ideal gases, where the relevant degrees of freedom are atoms and molecules. It is not
feasible to expect experimentalists to give us initial conditions on each and every atom and molecule,
so we must rely on the macroscopic results they are able to provide for us. These results fit neatly into
the analysis of this chapter, and lead to many important theoretical results on how the atoms and
molecules behave en masse. These include the Maxwell relations and the treatment of ideal gases. More
involved analysis along these same lines is found in the treatment of quantum systems.
The analysis of this chapter covers many important results in thermodynamics, including the
Maxwell relations, the Boltzmann factor, the treatment of entropy, and the Maxwell-Boltzmann speed
distribution. The first of these allows us to re-express quantities that are not easily obtained in
experiment in terms of quantities that are routinely measured, and finds immense use throughout the
treatment of physical materials. The second allows us to estimate the probability that a given state will
be filled at a given temperature, regardless of the details associated with the state. The third represents
the treatment of an extremely important, though often misunderstood, state variable in
thermodynamics. This quantity is directly related to the second law of thermodynamics, which can be
re-stated to say that the entropy of a closed system must increase as time goes on. The Maxwell-Boltzmann speed distribution not only allows us to determine the speeds associated with gas molecules in
thermal equilibrium, but also provides a very important playing field on which to test the various
treatments of thermodynamics. As stated above, these treatments are very subtle and can often dupe
even seasoned theoreticians. It is nice to have a more-or-less physical model on which to test our
assumptions about thermodynamics and see how they play out.
The next chapter involves a completely new and different subject called linear algebra.
Strangely enough, the topic of linear algebra is prominently featured in many applications of the
material considered in this chapter. The best example of this is quantum mechanics, which is
fundamentally based on the ideas of linear algebra. The next chapter treats the simplest case of linear
algebra, that found in systems with a finite number of degrees of freedom. Many applications require
us to consider infinidimensional linear algebra, but many others are able to be treated with linear
algebra in finitely-many dimensions. The basic ideas associated with linear algebra are best
introduced in the finite-dimensional case; we will see what happens when the dimension of a space
becomes unbounded in chapter 12.

Chapter XI
Finite Dimensional Linear Algebra

This chapter is intended to introduce you to some of the important concepts and results of an
extremely broad branch of mathematics called linear algebra. One way to think of linear algebra is as
an extension of the properties of vectors in n-dimensional space. This thought process is made explicit
by definitions of the terms dimension and vector in the field of linear algebra. Linear algebra was
developed separately in many different disciplines over more than a hundred years and finally brought
together under one big tent about a hundred years ago, so there are a lot of concepts that carry several
different names depending on what you are applying them to. I will try to introduce the most common
ones as the text goes on, but I am sure that there are some I will forget.
Section XI.1: Vector Spaces
Linear algebra properly begins with the idea of a vector space. A vector space is a collection
of objects, called vectors, that satisfy a set of properties. Given two vectors in vector space V,
$x$ and $y$, there must be a way to combine these two vectors to form another vector $x \oplus y$ that is
also in V. This combination is called addition, and the fact that a vector space must contain the sum of
any two vectors in it is referred to as the closure of V under addition. The symbol $\oplus$ is intended to
remind us that this addition operation is not necessarily the same as ordinary addition. It often is, but
does not have to be. The only properties that it must satisfy are those present in the properties of a
vector space. The addition operation must be commutative, that is, $x \oplus y = y \oplus x\ \ \forall\, x, y \in V$,
associative, that is,
$$(x \oplus y) \oplus z = x \oplus (y \oplus z) \quad \forall\, x, y, z \in V,$$
and the vector space must contain an additive identity, $0 \in V$ s.t. $x \oplus 0 = x\ \ \forall\, x \in V$.171 Further,
every element $x$ must have an additive inverse $-x \in V$ for which $x \oplus (-x) = 0$. Many
operations preserve this. Multiplication, for example, would be an appropriate addition operation over
the real numbers if the number 0 had a multiplicative inverse. The set of positive rational numbers has
multiplication playing the role of a vector space addition operation, with $0 \equiv 1$ and $-x \equiv q/p$ when
$x \equiv p/q$. These four properties of the vector space, possession of an addition operation that is
commutative and associative under which it is closed, existence of an additive identity, and existence
of an additive inverse for all elements, are fundamental to vector spaces and several other
mathematical classifications of sets of objects.

171 The use of "s.t." here for "such that" is common in several branches of mathematics; one also sometimes sees the symbol ∍,
but this is often confusing as it seems that we are trying to say backwards that something is an element of something else.
Vector spaces are always associated with another mathematical set, a field. Fields are sets of
numbers for which there exist addition and multiplication operations. The field is closed under these
operations, and they satisfy the standard properties of commutativity, associativity, and distributivity.
Fields must have two distinct identities, one for addition and one for multiplication. The additive
identity is called 0 and the multiplicative one is called 1. All elements of the field possess an
additive inverse, and all elements except for 0 possess a multiplicative inverse. The rational numbers
are a field under this definition, as are the real numbers, but the irrational numbers and the integers are
not. Irrational numbers do not possess either identity, as 0 and 1 are both decidedly rational, and this
set is also not closed under either addition or multiplication. The integers have perfectly well-defined
addition and multiplication operations with identities under which this set is closed. Further, all
elements have an integral additive inverse. The field construct breaks down, however, when we
consider multiplicative inverses. Of the integers, only two possess these. Therefore, the integers do
not form a field under standard addition and multiplication.
The vector spaces we will be considering are associated with either the field of real numbers or
that of complex numbers, with addition and multiplication standard. The definition of a vector space V
requires the space to be associated with a field F and to possess another operation, under which it is
closed, between it and the field, that of scalar multiplication. For every vector $x \in V$ and every
number $k \in F$, there must be a vector $k \cdot x \in V$. This operation is also associative:
$$a \cdot (b \cdot x) = (ab) \cdot x \qquad \forall\, a, b \in F \text{ and } x \in V.$$

Commutativity has no meaning for scalar multiplication because one object is a member of the field
and the other is a member of the vector space; their roles cannot be interchanged in any meaningful
manner.
The two operations must also satisfy some consistency relations known as distributivity:
$$(k + l) \cdot x = k \cdot x \oplus l \cdot x \quad\text{and}\quad k \cdot (x \oplus y) = k \cdot x \oplus k \cdot y \qquad \forall\, x, y \in V \text{ and } k, l \in F.$$
The distributive property is sort of a hybrid commutative property for the addition operation and the
scalar multiplication operation: you can add before you multiply or multiply before you add and you
are guaranteed to get the same answer either way. The distributive property makes it necessary for the
scalar product of the additive identity of the field and any member of the vector space to give the
additive identity of the vector space: 0 x 0 x V , since

k x 0 x (k 0) x k x k F and x V .
It also requires that the scalar product of any vector in the space with the multiplicative identity of the
field equals the original vector, $1 \cdot x = x$, and that the additive inverse of any vector in the space is
given by the scalar product of the additive inverse of the multiplicative identity and the vector,
$-x = (-1) \cdot x$. It is important to verify these properties, no matter how obvious they may seem, from
the definition of a vector space and a field and the definitions of the addition property and the scalar
multiplication property because these operations may not be the standard ones and we may not be able
to rely on our intuition. Verification from the definitions gives us a clear idea of what properties are
necessary to arrive at these results, and practice with this procedure in the obvious cases prepares us
better for less obvious ones.
These axioms describe the concept of a vector space, and anything that satisfies them can be
considered a vector space. The abstract nature of a vector space opens the door to broad applications
of linear algebra in areas as different as the ordinary vectors in three-dimensional space and the set of
solutions to a differential equation. The set of quantum states that an electron can occupy in an atom is
also part of a vector space. It is precisely the abstract nature of the vector space that leads to the
tremendous applicability of linear algebra.
All of the vector spaces we will be interested in possess obvious addition and scalar
multiplication operations, so we will forego the circle/plus notation from now on and write $x + y$
for $x \oplus y$. One of the most prominent classes of vector spaces that we will be concerned with is the
space of ordered n-tuples of real numbers, $\mathbb{R}^n$. Vectors in these spaces will be written in a more
convenient manner for easy computation; the vector
$$x = \begin{pmatrix} a \\ b \\ c \end{pmatrix} \in \mathbb{R}^3 \quad\text{whenever } a, b, c \in \mathbb{R}.$$
These are the familiar vectors you have been using since geometry. Addition and scalar multiplication
are obvious:
$$\begin{pmatrix} a \\ b \end{pmatrix} + \begin{pmatrix} c \\ d \end{pmatrix} = \begin{pmatrix} a + c \\ b + d \end{pmatrix}\,; \qquad k\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} ka \\ kb \end{pmatrix}\,,$$
for vectors in $\mathbb{R}^2$ whenever k is a real number (the real numbers are the field associated with the real
ordered n-tuple vector space). We will also be interested in function spaces, like the space of
polynomials of degree three or lower. See if you can write down the operations of addition and scalar
multiplication for this vector space, associated with the field of real numbers, and try to show that it is
a vector space by showing that all of the properties are satisfied. It is not difficult, and basically
follows from the related properties of the real numbers. Try also to show that the set of polynomials of
degree 3 is not a vector space, and neither is the set of polynomials p(x) for which p(0) = 2. The set of
polynomials of the form $ax^4 + bx$, with a and b real, is a vector space, but that of polynomials of the
form $ax^4 + x^2 + bx$ is not.
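If it helps to experiment, the coefficient representation makes these checks concrete. The following Python sketch is our own illustration, not part of the text: it stores a polynomial of degree three or lower as its coefficient array, shows that addition and scalar multiplication stay inside the set, and shows that the set of polynomials with p(0) = 2 fails closure under addition.

```python
import numpy as np

# A polynomial a0 + a1 x + a2 x^2 + a3 x^3 is stored as the array [a0, a1, a2, a3].
def add(p, q):
    return p + q          # componentwise addition of coefficients

def scale(k, p):
    return k * p          # multiply every coefficient by the scalar k

p = np.array([1.0, -2.0, 0.0, 3.0])   # 1 - 2x + 3x^3
q = np.array([2.0,  5.0, 1.0, 0.0])   # 2 + 5x + x^2

# Closure: the results are again arrays of four real coefficients.
print(add(p, q))        # [ 3.  3.  1.  3.]
print(scale(-2.0, p))   # [-2.  4. -0. -6.]

# The set {p : p(0) = 2} is the set of arrays whose first entry is 2.
# It is not closed under addition: the sum below has constant term 4, not 2.
print(add(q, np.array([2.0, 1.0, 0.0, 0.0]))[0])   # 4.0, so the sum has left the set
```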

Exercises for Section XI.1:


In problems 1 – 4, determine if the given set of objects defines a vector space under the ordinary
addition and scalar multiplication operations. If it does, prove this fact. If it does not, explain which of
the properties of a vector space is not fulfilled and why.
1. The set of all polynomials of degree 5 or less.
2. The set of all polynomials of degree 4.
3. The set of all vectors in $\mathbb{R}^4$ whose third component is 2.
4. The set of all vectors in $\mathbb{R}^4$ whose third component is zero.
5. Show that the additive inverse of any vector $x \in V$ is given by $(-1) \cdot x$. Show also that this
additive inverse is unique, i.e. that there can be no other additive inverses of $x$.

6. Prove that the distributive property implies that $0 \cdot x = 0$ for every vector in a given space.

Section XI.2: Linear Independence and the Solution of Linear Systems


Probably the most important concept in linear algebra other than that of a vector space is that of
linear independence. Suppose that $x$ and $y$ belong to a vector space V. These two vectors are said
to be linearly independent if and only if the equation
$$a\,x + b\,y = 0\,, \quad\text{with } a, b \in F,$$
requires a = b = 0 in order to be satisfied. This means that we can only form the zero vector by
independently making all of the contributions to the sum equal to the zero vector. There is no other
way to cancel out the contributions from these vectors. The statement that two vectors are linearly
independent means that they each carry their own information that is not contained in the other vector,
so each must be thought of as special or independent in its own way. We can include any number of
vectors in this idea; the set
$$\left\{ x_j \right\}_{j=1}^{n}$$
is said to be linearly independent if and only if the equation
$$\sum_{j=1}^{n} c_j\, x_j = 0$$
implies that all of the $c_j = 0$. Linear independence is usually obvious with two vectors, as it is clear
that $\begin{pmatrix} 2 \\ 3 \end{pmatrix}$ and $\begin{pmatrix} 4 \\ 6 \end{pmatrix}$ are linearly dependent (this means not linearly independent) and the vectors
$\begin{pmatrix} 2 \\ 3 \end{pmatrix}$ and $\begin{pmatrix} 4 \\ 7 \end{pmatrix}$ are linearly independent, but it is not as obvious with larger sets of vectors. Is the set
$$\left\{ \begin{pmatrix} 2 \\ 3 \\ 1 \\ 2 \end{pmatrix},\ \begin{pmatrix} -1 \\ 0 \\ 2 \\ -3 \end{pmatrix},\ \begin{pmatrix} -4 \\ 2 \\ 3 \\ -1 \end{pmatrix},\ \begin{pmatrix} -9 \\ -2 \\ -2 \\ 3 \end{pmatrix} \right\}$$
linearly independent or not?
To answer this question, we write the definition of linear independence and try to analyze the
solutions of the linear system
$$c_1\begin{pmatrix} 2 \\ 3 \\ 1 \\ 2 \end{pmatrix} + c_2\begin{pmatrix} -1 \\ 0 \\ 2 \\ -3 \end{pmatrix} + c_3\begin{pmatrix} -4 \\ 2 \\ 3 \\ -1 \end{pmatrix} + c_4\begin{pmatrix} -9 \\ -2 \\ -2 \\ 3 \end{pmatrix} = 0
\quad\Longleftrightarrow\quad
\begin{aligned}
2c_1 - c_2 - 4c_3 - 9c_4 &= 0 \\
3c_1 + 2c_3 - 2c_4 &= 0 \\
c_1 + 2c_2 + 3c_3 - 2c_4 &= 0 \\
2c_1 - 3c_2 - c_3 + 3c_4 &= 0\,.
\end{aligned}$$
This system can be analyzed in an efficient way by using the technique of Gaussian elimination. We
first recognize that only the coefficients of the $c_j$ and the 0s on the right are relevant to the system of
equations. Then, we see that the system is not changed by replacing any of the equations with a
multiple of itself plus a sum of multiples of the other equations. Three times the first equation minus
twice the second equation gives the equation $-3c_2 - 16c_3 - 23c_4 = 0$. This equation will be satisfied
whenever the original system is satisfied, and the failure of this equation to be satisfied implies the
failure of the original system to be satisfied, so the two systems are equivalent. Furthermore, the
second system is slightly more easily analyzed than the original system because c1 no longer appears
in the first equation. This gives us an idea: if we can somehow systematically remove as many of the
constants as we can from the equations, then we will have an easier time analyzing the system.
Writing the system as an augmented matrix,
$$\left(\begin{array}{cccc|c} 2 & -1 & -4 & -9 & 0 \\ 3 & 0 & 2 & -2 & 0 \\ 1 & 2 & 3 & -2 & 0 \\ 2 & -3 & -1 & 3 & 0 \end{array}\right),$$
we pick an element of each column to act as an "assassin" and use this element to "kill off" all other
members of its column. Once it is finished, we move on to pick an assassin in the next column. We
continue this process until we no longer can, and assess the form of the system once we are done. This
process is known as Gaussian elimination, or row reduction, and is extremely efficient in solving large
systems of equations. The first column is assassinated by making the replacements
$$\left(\begin{array}{cccc|c} 2 & -1 & -4 & -9 & 0 \\ 3 & 0 & 2 & -2 & 0 \\ 1 & 2 & 3 & -2 & 0 \\ 2 & -3 & -1 & 3 & 0 \end{array}\right)
\;\xrightarrow{\substack{R_2 \to 2R_2 - 3R_1 \\ R_3 \to 2R_3 - R_1 \\ R_4 \to R_4 - R_1}}\;
\left(\begin{array}{cccc|c} 2 & -1 & -4 & -9 & 0 \\ 0 & 3 & 16 & 23 & 0 \\ 0 & 5 & 10 & 5 & 0 \\ 0 & -2 & 3 & 12 & 0 \end{array}\right).$$
Now, the number $c_1$ appears only in the first equation. Note that the new third row has a common
factor of 5. Since this row really represents an equation, and this equation cannot be changed by
dividing by 5, we can divide the row by 5 with impunity. It is often easier to use a 1 as the assassin, so
let's promote the new 1 in the third row to the title of assassin for the second column:
$$\left(\begin{array}{cccc|c} 2 & -1 & -4 & -9 & 0 \\ 0 & 3 & 16 & 23 & 0 \\ 0 & 1 & 2 & 1 & 0 \\ 0 & -2 & 3 & 12 & 0 \end{array}\right)
\;\xrightarrow{\substack{R_1 \to R_1 + R_3 \\ R_2 \to R_2 - 3R_3 \\ R_4 \to R_4 + 2R_3}}\;
\left(\begin{array}{cccc|c} 2 & 0 & -2 & -8 & 0 \\ 0 & 0 & 10 & 20 & 0 \\ 0 & 1 & 2 & 1 & 0 \\ 0 & 0 & 7 & 14 & 0 \end{array}\right).$$
At this point, the first, second, and fourth rows contain a common factor. Dividing by it and choosing
the new 1 in the second row to be the assassin of the third column, we have
$$\left(\begin{array}{cccc|c} 1 & 0 & -1 & -4 & 0 \\ 0 & 0 & 1 & 2 & 0 \\ 0 & 1 & 2 & 1 & 0 \\ 0 & 0 & 1 & 2 & 0 \end{array}\right)
\;\xrightarrow{\substack{R_1 \to R_1 + R_2 \\ R_3 \to R_3 - 2R_2 \\ R_4 \to R_4 - R_2}}\;
\left(\begin{array}{cccc|c} 1 & 0 & 0 & -2 & 0 \\ 0 & 0 & 1 & 2 & 0 \\ 0 & 1 & 0 & -3 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right).$$
Unfortunately, our new assassin was a bit overzealous and killed everyone in the last row instead of
just the member in his column that was his responsibility. This results from the fact that the rows of
this matrix were not linearly independent. The difference between the fourth and second rows gives
zero, so $R_4 - R_2 = (0\ \ 0\ \ 0\ \ 0)$. This is the linear independence relation, with not all of the c's
equaling zero, so the rows are linearly dependent. What does this imply about the columns, though?
We cannot continue our row reduction process because there is no assassin in the last column. All of
the other rows with nonzero elements in the fourth column already contain an assassin, or, as the
correct mathematical term goes, a pivot. Trying to use any of these as a pivot will screw up the work
the other assassins/pivots have already done in the previous columns. The nature of pivots in Gaussian
elimination is that there can be only one in each row or column. The end result of our Gaussian
elimination process is related to the row echelon form of the matrix or system. To fully realize this
form, we must exchange rows two and three so that the matrix is upper triangular. In general, the row
echelon form of a matrix is one in which every element to the left and below each pivot is 0. The
matrix is said to be in reduced row echelon form if, in addition, all elements above each pivot are zero
and all pivots have the value 1.
Now that we have come to a dead end in our game of assassins, it is time to analyze the
resulting system. The last equation is worthless, as it simply states that 0 = 0. This is true, but not
useful. There are only three remaining equations, and each simply gives the value of one of the c's in
terms of $c_4$. The constant $c_4$ is called a free variable for this reason, as any choice of $c_4$ will yield a
solution to the system. The remaining equations are readily solved, given $c_4$:

c1 2c4
2


3
c
c
2 4 c 3 ,
c3 2c4 4 2


1
c4 c4
so there are an infinite number of solutions to the system. The columns are definitely not linearly
independent. This can also be seen from the fact that
$$2\begin{pmatrix} 2 \\ 3 \\ 1 \\ 2 \end{pmatrix} + 3\begin{pmatrix} -1 \\ 0 \\ 2 \\ -3 \end{pmatrix} - 2\begin{pmatrix} -4 \\ 2 \\ 3 \\ -1 \end{pmatrix} + \begin{pmatrix} -9 \\ -2 \\ -2 \\ 3 \end{pmatrix} = 0\,,$$
but there is no obvious way to determine this fact directly from the vectors unless we simply see it.
Gaussian elimination provides a systematic way in which we can determine if any number of vectors
of any size are linearly independent, given enough patience and good arithmetic skills. If we find that
they are not, as in this case, then there must be at least one relation of the sort given above for the set
we were interested in; note that the coefficients of this relation are given by the entries in the last
column of the row reduced matrix.
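For readers who want to check this kind of computation by machine, the row reduction and the null-space relation can be reproduced with sympy. This is only an illustrative sketch, using the matrix as written above.

```python
from sympy import Matrix

# The matrix whose columns are the four vectors of the example above.
A = Matrix([[2, -1, -4, -9],
            [3,  0,  2, -2],
            [1,  2,  3, -2],
            [2, -3, -1,  3]])

rref, pivot_cols = A.rref()
print(rref)          # reduced row echelon form
print(pivot_cols)    # (0, 1, 2): the fourth column is pivot-free

print(A.nullspace()) # one basis vector, proportional to (2, 3, -2, 1)
```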
The solution set to the above system actually forms a vector space. See if you can show that
these solutions satisfy the axioms of a vector space. This vector space is characteristic of the matrix
$$A = \begin{pmatrix} 2 & -1 & -4 & -9 \\ 3 & 0 & 2 & -2 \\ 1 & 2 & 3 & -2 \\ 2 & -3 & -1 & 3 \end{pmatrix},$$
as it gives the space of vectors for which $A\,x = 0$. It is called the null space of A, and written
Nul(A). A set of vectors in $\mathbb{R}^n$ is linearly dependent whenever the matrix formed by the set's column
vectors has a nontrivial null space. How can we characterize when this will occur? Let's go back and
analyze what happened in the example above. The reason why we could not continue the elimination
process was that the last column did not have a pivot. The number $c_4$ appears only in equations with
other pivots, so this column can have no assassin. Pivots can be thought of as associated with
variables that are completely determined by one equation. Since the coefficients of the variable in the
other rows are all zero, as a result of the assassin's handiwork, the variable appears in only one
equation and is immediately determined from the values of the other variables. If a column does not
have a pivot, then the associated variable cannot have an equation that it is given by. It is a free
variable that the other variables will be found in terms of. Thus, a pivot-free column is an immediate
indictment of linear dependence.
Every given row or column can contain only one pivot, so the absence of a pivot in one of the
columns of a square matrix implies the absence of a pivot in one of the rows. A row without a pivot
looks quite different from a column without one. Because of the nature of the row operations we used
to reduce the matrix, every pivot-containing column in our row-reduced matrix will have zero in any
row without a pivot. A nonzero entry anywhere in that row therefore must lie in a column that doesn't
have another pivot, and can therefore be used as a pivot for that column. For this reason, a row in the
reduced matrix that does not contain a pivot must consist of 0s in every column. The row reduction
process is actually quite useful to analyze from the perspective of the rows, as it clearly indicates the
meaning of linear independence and linear dependence. Each time we replace one of the rows, we are
trading it in for a linear combination of the other rows that also includes it. The assassin process
asks us to search for linear combinations that give zeros in key places, so any linear relationship
among the rows will eventually show itself. When we trade in the last row in the last row operation,
it is replaced with the zero row. This indicates that that row wasn't actually worth anything; it may as
well have been ignored from the beginning, as it is equivalent to the zero row. The row itself only
includes information already provided by the other rows, so is worthless. Note that we do not need to
use the pivot to eliminate all of the other members of a column in order to determine the location of
pivots, only those members that lie in a row that does not already have a pivot. Nonzero elements of
any row that already contains a pivot cannot be used as pivots, as their row already has one. As we go
along, therefore, it is more efficient to only assassinate members of rows which do not yet contain a
pivot.
Suppose we had been allowed to finish the process and eliminate all of the other members of
the fourth column using a pivot. This would mean that each of the c's is determined uniquely by a
single equation. This equation must be of the form $k\,c_j = 0$ for some nonzero real number k, so we
obtain $c_j = 0$ for all j and the vectors are linearly independent. The system has only one solution, so
this follows from the definition of linear independence. Thus, we arrive at the statement that the
columns of a matrix will be linearly independent if and only if they all contain a pivot.
We can use this same idea to solve systems of linear equations that are not homogeneous, that
is, the vector we are trying to get the linear combination to equal is not the zero vector. Consider the
question of whether or not the vector
$$y = \begin{pmatrix} 0 \\ 2 \\ 1 \\ 5 \end{pmatrix}$$
can be linearly expressed in terms of the set
$$\left\{ \begin{pmatrix} 2 \\ 3 \\ 1 \\ 2 \end{pmatrix},\ \begin{pmatrix} -1 \\ 0 \\ 2 \\ -3 \end{pmatrix},\ \begin{pmatrix} -4 \\ 2 \\ 3 \\ -1 \end{pmatrix},\ \begin{pmatrix} -9 \\ -2 \\ -2 \\ 3 \end{pmatrix} \right\}.$$
This question is answered by solving the system
$$\begin{aligned}
2c_1 - c_2 - 4c_3 - 9c_4 &= 0 \\
3c_1 + 2c_3 - 2c_4 &= 2 \\
c_1 + 2c_2 + 3c_3 - 2c_4 &= 1 \\
2c_1 - 3c_2 - c_3 + 3c_4 &= 5\,,
\end{aligned}$$
which we will do via the augmented system
$$\left(\begin{array}{cccc|c} 2 & -1 & -4 & -9 & 0 \\ 3 & 0 & 2 & -2 & 2 \\ 1 & 2 & 3 & -2 & 1 \\ 2 & -3 & -1 & 3 & 5 \end{array}\right).$$
Row reducing, we have


$$\left(\begin{array}{cccc|c} 2 & -1 & -4 & -9 & 0 \\ 3 & 0 & 2 & -2 & 2 \\ 1 & 2 & 3 & -2 & 1 \\ 2 & -3 & -1 & 3 & 5 \end{array}\right)
\;\xrightarrow{\substack{R_2 \to 2R_2 - 3R_1 \\ R_3 \to 2R_3 - R_1 \\ R_4 \to R_4 - R_1}}\;
\left(\begin{array}{cccc|c} 2 & -1 & -4 & -9 & 0 \\ 0 & 3 & 16 & 23 & 4 \\ 0 & 5 & 10 & 5 & 2 \\ 0 & -2 & 3 & 12 & 5 \end{array}\right)$$
$$\;\xrightarrow{\substack{R_3 \to 3R_3 - 5R_2 \\ R_4 \to 3R_4 + 2R_2}}\;
\left(\begin{array}{cccc|c} 2 & -1 & -4 & -9 & 0 \\ 0 & 3 & 16 & 23 & 4 \\ 0 & 0 & -50 & -100 & -14 \\ 0 & 0 & 41 & 82 & 11 \end{array}\right)
\;\xrightarrow{R_4 \to 50R_4 + 41R_3}\;
\left(\begin{array}{cccc|c} 2 & -1 & -4 & -9 & 0 \\ 0 & 3 & 16 & 23 & 4 \\ 0 & 0 & -50 & -100 & -14 \\ 0 & 0 & 0 & 0 & -24 \end{array}\right).$$
This row reduced matrix looks somewhat different from the one we got last time, as we went about the
row reduction process in a different way, but we still have the result that the last row does not contain
a pivot. Note that we have ignored the elements of a column lying above the last row containing a
pivot. We could have assassinated them, but there is no need. In this case, the last equation reads
$0 = -24$, which is not true. Therefore, this system of equations is inconsistent and has no solutions. It
is not possible to write the given vector in terms of our set.
The set of all vectors that can be expressed in terms of our set of vectors is the set of all vectors
of the form
$$c_1\begin{pmatrix} 2 \\ 3 \\ 1 \\ 2 \end{pmatrix} + c_2\begin{pmatrix} -1 \\ 0 \\ 2 \\ -3 \end{pmatrix} + c_3\begin{pmatrix} -4 \\ 2 \\ 3 \\ -1 \end{pmatrix} + c_4\begin{pmatrix} -9 \\ -2 \\ -2 \\ 3 \end{pmatrix} = \begin{pmatrix} 2c_1 - c_2 - 4c_3 - 9c_4 \\ 3c_1 + 2c_3 - 2c_4 \\ c_1 + 2c_2 + 3c_3 - 2c_4 \\ 2c_1 - 3c_2 - c_3 + 3c_4 \end{pmatrix}$$
for suitable constants $c_j$. You should show for yourself that this is a vector space. In fact, the set of
vectors that can be expressed in terms of a given set of vectors is always a vector space. The notation
$\mathrm{span}\{x_1, x_2, \ldots, x_n\}$ is used to represent the set of all vectors
$$c_1 x_1 + c_2 x_2 + \cdots + c_n x_n$$
for real numbers $c_j$, and is called the span of the vectors. According to the above result,
$$\begin{pmatrix} 0 \\ 2 \\ 1 \\ 5 \end{pmatrix} \notin \mathrm{span}\left\{ \begin{pmatrix} 2 \\ 3 \\ 1 \\ 2 \end{pmatrix},\ \begin{pmatrix} -1 \\ 0 \\ 2 \\ -3 \end{pmatrix},\ \begin{pmatrix} -4 \\ 2 \\ 3 \\ -1 \end{pmatrix},\ \begin{pmatrix} -9 \\ -2 \\ -2 \\ 3 \end{pmatrix} \right\}.$$
The span of the columns of a matrix A is denoted col(A), so we can abbreviate this as $y \notin \mathrm{col}(A)$.
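This membership test is also easy to automate: a vector lies in col(A) exactly when appending it to the columns does not raise the rank. The numpy sketch below is only an illustration, using the vectors as written above.

```python
import numpy as np

# Columns of A are the four vectors of the example (as written above); y is the target vector.
A = np.array([[2, -1, -4, -9],
              [3,  0,  2, -2],
              [1,  2,  3, -2],
              [2, -3, -1,  3]], dtype=float)
y = np.array([0.0, 2.0, 1.0, 5.0])

print(np.linalg.matrix_rank(A))                        # 3
print(np.linalg.matrix_rank(np.column_stack([A, y])))  # 4, so y is not in col(A)
```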

Exercises for Section XI.2:


In problems 1 – 6, find the most general solution to the given system.
3x1 2 x2 4 x3 2

2 x1 3x2 x3 2

1. 3x1 2 x2 4 x3 5

2. 3x1 2 x2 3x3 7

x1 2 x2 4 x3 12

x1 x2 3x3 12
x1 3 x2 2 x3 2
4. 2 x1 x2 5 x3 3
5 x1 11x2 3 x3 5

2 x1 3 x2 x3 2
5.

4 x1 6 x2 2 x3 4
4 x1 6 x2 4 x3 4

2 x1 3x2 x3 2
3.

x1 4 x2 2 x3 7
7 x1 6 x2 4 x3 25
2 x1 3x2 x3 5

6. 4 x1 6 x2 2 x3 10

x1 x2 3 x3 7

In problems 7 – 12, determine the reduced row-echelon form of the given matrix. How many pivots
does the matrix have?
2 3 1
7.

1 5 2

2 1

8. 2 3
1 4

3 2 2

9. 1 7 5
5 3 1

2 1 3

10. 3 1 2
5 7 1

2 4

3 6
11.
5 10

3 6

2 1 3 4

12. 1 2 3 4 .
2 3 1 5

12. Is it possible for a rectangular matrix with more rows than columns to have a pivot in every
row? Why or why not? What does this imply about systems with more equations than
variables? Can there be an infinite number of solutions to such a system? Can there be a unique
solution? Is it possible for such a system to be inconsistent? Give examples to support your
analysis.
13. Is it possible for a rectangular matrix with more columns than rows to have a pivot in every
column? Why or why not? What does this imply about systems with more variables than
equations? Can there be an infinite number of solutions to such a system? Can there be a
unique solution? Is it possible for such a system to be inconsistent? Give examples to support
your analysis.
14. Determine whether the following sets are linearly independent or not. If it is not linearly
independent, find a linear relation between the elements of the set.
3 2 7
2 1 2 5
2 3 1 8



2 3
3
3
1
1
2
2
1
3
4
(a) , , , (b) , , (c) , , ,

1
4

2
2
2
3
4
2
3
1 4

1 5 3 9
3 8 4 1
5 6 17
15. Determine the most general solution to the following systems. Use your analysis to determine a
basis for the null space of the coefficient matrix.
2x 3 y z 7
2 x 3 y z w 4
3x y z 1
(a) x y 2 z 2

(b)

3x 5 y 18

3x y 2 z w 3
x 2 y z 2w 7

(c) 6 x 2 y 2 z 2

12 x 4 y 4 z 4

16. Can a matrix without a pivot in every column be one-to-one? Can it be onto? Explain, and give
examples supporting your analysis.
17. Can a matrix without a pivot in every row be one-to-one? Can it be onto? Explain, and give
examples supporting your analysis.

a 2b 2

18. Does the set of vectors in 3 given by 2a b 1 , with a and b real, constitute a vector
ab

space? Why or why not?


a 2b

19. Does the set of vectors in given by 2a b 5 , with a and b real, constitute a vector
a b 1

space? Why or why not?


3

20. Use the results of problems 18 and 19 to analyze the situations in which a set of vectors in $\mathbb{R}^n$
whose components are given by a sum of fixed numbers times arbitrary real numbers plus a
fixed real number will constitute a vector space. There is a simple answer to this question.

Section XI.3: Matrix Transformations


Matrices are often thought of as mappings or transformations from one vector space to another.
The matrix
$$A = \begin{pmatrix} 2 & -1 & -1 \\ 1 & 3 & 2 \end{pmatrix},$$
for example, could be multiplied by any vector in $\mathbb{R}^3$ to give a vector in $\mathbb{R}^2$:
$$\begin{pmatrix} 2 & -1 & -1 \\ 1 & 3 & 2 \end{pmatrix}\begin{pmatrix} 3 \\ 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 3 \\ 11 \end{pmatrix}.$$
For this reason, the matrix A is said to map $\mathbb{R}^3$ to $\mathbb{R}^2$. This is indicated by writing $A : \mathbb{R}^3 \to \mathbb{R}^2$.
The domain of the transformation A is the space of all vectors $x$ that can be acted on by A, which is
all vectors in $\mathbb{R}^3$, and its range is the set of all vectors that can be obtained by the action of A on a
vector in its domain. One very important thing to think about when one is considering transformations
is whether or not the whole range space, or target space, $\mathbb{R}^2$ can be obtained from this mapping. We
have seen that the product $A\,x$ gives a linear combination of the columns of A. It is obvious from the
above that the only way in which a vector could not be represented by a set of vectors is if the relevant
matrix with columns given by the vectors does not have a pivot in every row. That is the only way to
get an inconsistent statement like 0 = -24, so it is the only thing we have to look for in trying to
determine whether or not a vector can be expressed in terms of other vectors. If there is a pivot in
every row, then a solution can be found for every possible vector we wish to express. Thus, the vector
equation $A\,x = y$ has a solution $x$ for every possible vector $y$. The transformation is said to be
onto in this case, as it maps $\mathbb{R}^3$ to every part of $\mathbb{R}^2$, leaving no parts neglected. The columns of A are
said to span the space $\mathbb{R}^2$, or exhaust $\mathbb{R}^2$, as their span equals all of $\mathbb{R}^2$. Row reduction in this case
is quite abrupt:
$$\begin{pmatrix} 2 & -1 & -1 \\ 1 & 3 & 2 \end{pmatrix} \;\xrightarrow{R_2 \to 2R_2 - R_1}\; \begin{pmatrix} 2 & -1 & -1 \\ 0 & 7 & 5 \end{pmatrix}.$$
Both rows have a pivot, so this transformation is definitely onto.
The last column does not contain a pivot, implying that the columns are not linearly
independent. There is a linear relation between them, so there is a nontrivial null space for the matrix.
To characterize this null space, we make $c_3$ a free variable and write the solution as
$$\mathrm{nul} = \begin{pmatrix} c_3/7 \\ -5c_3/7 \\ c_3 \end{pmatrix} = \frac{c_3}{7}\begin{pmatrix} 1 \\ -5 \\ 7 \end{pmatrix}.$$
It is immediately clear that $A\,\mathrm{nul} = 0$, which implies an important property of solutions to systems
of equations involving A. We have seen that the transformation A is onto, so there is a solution to
$A\,x = y$ for every possible vector. Given this solution, it is clear that the vector $x + \mathrm{nul}$ is also a
solution for every choice of $\mathrm{nul} \in \mathrm{Nul}(A)$. Since there are an infinite number of vectors in the null
space of A, there are infinitely many solutions to every vector equation $A\,x = y$.
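A few lines of numpy make these statements concrete. This is an illustrative sketch using the matrix and the null-space vector found above.

```python
import numpy as np

A = np.array([[2.0, -1.0, -1.0],
              [1.0,  3.0,  2.0]])

x   = np.array([3.0, 2.0, 1.0])
nul = np.array([1.0, -5.0, 7.0])     # a vector in the null space, as found above

print(A @ x)            # [ 3. 11.]
print(A @ nul)          # [ 0.  0.]  -- nul is annihilated by A
print(A @ (x - nul))    # [ 3. 11.]  -- x - nul = (2, 7, -6) maps to the same point
```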

The transformation is not one-to-one. Given a vector $x \in \mathbb{R}^3$, there is only one vector $y$ that the
mapping goes to. On the other hand, there are infinitely many vectors in $\mathbb{R}^3$ that are taken to the same
vector $y \in \mathbb{R}^2$. Evidently, the vector
$$x = \begin{pmatrix} 2 \\ 7 \\ -6 \end{pmatrix}$$
also maps to $\begin{pmatrix} 3 \\ 11 \end{pmatrix}$, as do infinitely many others. A mapping without a pivot in every column will have
infinitely many solutions for every vector in the column space of A. If a vector is not in the column
space of A, of course, then there can be no solutions at all.
Thus we come to the conclusion that matrix transformations will be onto if they contain a pivot
in every row and one-to-one if they have a pivot in every column. Matrices without a pivot in every
column are losing information about where they came from when they act. They cannot carry all of
the information about the vector space they map from, so destroy some of it during the mapping. The
information that is destroyed by a mapping is contained in its null space. Vectors that map to the zero
543

Advanced Mathematical Techniques

vector cannot be distinguished from the zero vector by the transformation, so are simply thrown
away. Similarly, matrices without a pivot in every row are mapping to a space that has too much
information for them to support. They cannot reach all of the members of this target space, so map
only to a portion of it. The first of these actions cannot be undone in a well-defined manner, as the
information jettisoned by the transformation is gone and cannot be retrieved. The second can be
undone perfectly well. There will be some information about the range space that is lost in the
undoing process, but this was information that the original space could not contain anyway, so is
somehow not as important. All of the elements of linearly independent sets carry new information, so
assigning a single variable to each of them is appropriate. Linearly dependent sets, on the other hand,
have members that are not pulling their weight. Assigning a variable to them is pretending that you
have information you really don't. The matrix cannot carry all of this information, as its columns are
not independent of each other. We could get by with fewer of them.
Every column that has a pivot represents a column that is carrying information. Columns
without a pivot can already be expressed in terms of the others anyway, so are not important to keep.
The destination space of the matrix transformation is really col(A) rather than the stated $\mathbb{R}^m$, with m
given by the number of rows (think of A as an $m \times n$ matrix and think about which space is mapped
from and which is mapped to), and col(A) is the span of the columns of A. If some of the columns are
not pulling their weight, then they can be thrown out without changing this span. We can already
make these columns, so have no need to carry them around. Columns that have pivots are a different
story. We cannot throw one of these away without changing the span, as none of these vectors can be
obtained from the others with pivots. Thus, the columns with pivots represent the minimum number of
vectors that is required to span the space col(A). Such a collection of vectors is called a basis of
col(A), and the number of vectors it contains is called the dimension of col(A). As seen above, the
dimension of col(A) represents the amount of information that the transformation A can carry. It is
called the rank of A, and is obviously equal to the number of columns with pivots in A. Since each
pivot must be unique to its column and its row, we see immediately that the number of columns that
have pivots is the same as the number of rows that have pivots. This is natural, as the rank of a matrix
represents how much information it can carry, independent of whether we are thinking about its
columns or its rows. We can also consider the space spanned by the rows of A, row(A). It will have
the same dimension as col(A), but will not be the same space. If matrix A is $m \times n$, has m rows and n
columns, then its column space will consist of vectors in $\mathbb{R}^m$ and its row space will consist of vectors
in $\mathbb{R}^n$. They cannot be the same unless m = n. The fact that their dimension is the same implies that
they can be spanned by the same number of linearly independent vectors and that they contain the
same amount of information, but they need not be the same. Neither of these vector spaces needs to
be as large as the spaces $\mathbb{R}^n$ or $\mathbb{R}^m$ either, as the mapping could be neither one-to-one nor onto.
The information that is lost by the transformation is given by the null space of A. What is its
dimension? Our analysis above indicates that there is a free variable for every pivotless column, so
there is an independent parameter for every column without a pivot. In solving the system as above,
we find that each of these independent parameters is associated with its own vector. These vectors will
automatically be linearly independent as a result of our Gaussian elimination technique. Consider the
matrix
$$\begin{pmatrix} 1 & 5 & 13 & 4 & 3 \\ 2 & -9 & -12 & -11 & -1 \\ 3 & -8 & -7 & -11 & 1 \\ 2 & -3 & 0 & -5 & 0 \end{pmatrix}.$$
Using row reduction, taking pivots wherever they come and working from top to bottom left to right,
we have
$$\begin{pmatrix} 1 & 5 & 13 & 4 & 3 \\ 2 & -9 & -12 & -11 & -1 \\ 3 & -8 & -7 & -11 & 1 \\ 2 & -3 & 0 & -5 & 0 \end{pmatrix}
\;\xrightarrow{\substack{R_2 \to R_2 - 2R_1 \\ R_3 \to R_3 - 3R_1 \\ R_4 \to R_4 - 2R_1}}\;
\begin{pmatrix} 1 & 5 & 13 & 4 & 3 \\ 0 & -19 & -38 & -19 & -7 \\ 0 & -23 & -46 & -23 & -8 \\ 0 & -13 & -26 & -13 & -6 \end{pmatrix}$$
$$\;\xrightarrow{\substack{R_3 \to 19R_3 - 23R_2 \\ R_4 \to 19R_4 - 13R_2}}\;
\begin{pmatrix} 1 & 5 & 13 & 4 & 3 \\ 0 & -19 & -38 & -19 & -7 \\ 0 & 0 & 0 & 0 & 9 \\ 0 & 0 & 0 & 0 & -23 \end{pmatrix}
\;\xrightarrow{R_4 \to 9R_4 + 23R_3}\;
\begin{pmatrix} 1 & 5 & 13 & 4 & 3 \\ 0 & -19 & -38 & -19 & -7 \\ 0 & 0 & 0 & 0 & 9 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}.$$
There are three pivots (the leading entries of the first three rows of the last matrix, lying in its first, second, and fifth columns), so the matrix has
rank 3. Since there are 5 columns, we expect that somehow 2 distinct pieces of information are lost
by the transformation; the null space is expected to have dimension 2. To verify this, we need to find
an expression for a general vector in the null space; we need to find a basis for the null space and
count the number of elements it has. In order to find the null space of this matrix, we would augment
the matrix with the zero vector appropriate to $\mathbb{R}^4$ and row reduce as before. This row reduction
process can follow exactly the same operations as above, and these row operations will not change the
values of the augmented vector, so there is no need to re-do our reduction process. The result is
$$\left(\begin{array}{ccccc|c} 1 & 5 & 13 & 4 & 3 & 0 \\ 0 & -19 & -38 & -19 & -7 & 0 \\ 0 & 0 & 0 & 0 & 9 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right).$$
As always, pivot-free columns are associated with free variables. We assign values to these
parameters and obtain the values of the other variables in terms of them. Writing out the equations, we
find that the general solution is given by
$$\mathrm{nul} = \begin{pmatrix} -3c_3 + c_4 \\ -2c_3 - c_4 \\ c_3 \\ c_4 \\ 0 \end{pmatrix} = c_3\begin{pmatrix} -3 \\ -2 \\ 1 \\ 0 \\ 0 \end{pmatrix} + c_4\begin{pmatrix} 1 \\ -1 \\ 0 \\ 1 \\ 0 \end{pmatrix}.$$
When written in this way, it is obvious that these two vectors form a basis for the null space of A, and
it follows that the dimension of this space is 2. We can verify that these two vectors are in the null
space simply by multiplying them by matrix A. The result is zero in both cases.
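The same bookkeeping can be delegated to sympy. This is only a sketch, using the matrix as written above.

```python
from sympy import Matrix

A = Matrix([[1,  5,  13,   4,  3],
            [2, -9, -12, -11, -1],
            [3, -8,  -7, -11,  1],
            [2, -3,   0,  -5,  0]])

print(A.rank())                      # 3
print(A.rref()[1])                   # pivot columns: (0, 1, 4)
for v in A.nullspace():              # two basis vectors spanning the null space
    print(v.T, (A * v).T)            # each one is annihilated by A
```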
The idea of a matrix carrying information leads directly to an important theorem in linear
algebra, the rank theorem. Consider an $m \times n$ matrix A. This matrix has n columns and m rows, so
$A : \mathbb{R}^n \to \mathbb{R}^m$. The vector space $\mathbb{R}^n$ clearly has dimension n, as it is obviously spanned by the
linearly independent basis
$$\left\{ \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix},\ \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix},\ \ldots,\ \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix} \right\}.$$
There are only n columns, so the maximum number of columns that can possibly have a pivot is n. If
the rank of A is r, then we have $r \le n$. Any column that does not have a pivot is associated with a free
variable and an independent vector in the null space of A, so the dimension of the null space is given
by $n - r$. This allows us to write
$$\dim \mathrm{Nul}(A) + \mathrm{rank}(A) = n\,,$$
or the amount of information carried by the transformation plus the amount of information destroyed
by the transformation is equal to the total amount of information that was available.
To find a basis for the column space of A, we look to the columns that have pivots. Each of
these is guaranteed to carry new information, so they are definitely linearly independent. In the above
example, the first, second, and fifth columns have pivots. Now, the columns maintained their identity
throughout the row reduction process. At no point did we switch the order of the columns, so these
pivots were also present in the original matrix. Thus, we can be sure that the first, second, and fifth
columns of the original matrix are linearly independent. A basis for col(A) is therefore given by
$$\mathrm{col}(A) = \mathrm{span}\left\{ \begin{pmatrix} 1 \\ 2 \\ 3 \\ 2 \end{pmatrix},\ \begin{pmatrix} 5 \\ -9 \\ -8 \\ -3 \end{pmatrix},\ \begin{pmatrix} 3 \\ -1 \\ 1 \\ 0 \end{pmatrix} \right\}.$$
This is not the only basis that we could have chosen for col(A), as replacing any of these vectors with
itself plus any linear combination of the others would still represent a basis, but it certainly works.
Note that we took our columns from the original matrix rather than the row reduced one. The row
reduction process does not change the order of the columns, but it certainly does alter the column
space of the matrix. The column space of our row reduced matrix obviously does not contain either of
the first two columns of the original matrix, as their last components are nonzero.
A basis for the row space of A can also be found quite easily. The row reduction process
essentially replaces each row with itself plus a linear combination of the other rows. This process
manifestly does not change the row space of the matrix, as all of these linear combinations are already
included in the span of the original rows, so the row space of the row reduced matrix is identical to that
of the original matrix. We can therefore simply take the linearly independent rows in the row reduced
matrix as our basis:
$$\mathrm{row}(A) = \mathrm{span}\left\{ \begin{pmatrix} 1 & 5 & 13 & 4 & 3 \end{pmatrix},\ \begin{pmatrix} 0 & -19 & -38 & -19 & -7 \end{pmatrix},\ \begin{pmatrix} 0 & 0 & 0 & 0 & 1 \end{pmatrix} \right\}.$$
We could also have taken the corresponding rows from our original matrix, but we must be careful
with this as the row reduction process sometimes swaps rows in order to bring the matrix to its row
echelon form so that every element to the left and below each pivot is zero. Swapping rows changes
the identity of the rows with pivots, so it is no longer clear which rows in the original matrix
correspond to the rows of the reduced matrix that have pivots. This is especially important when doing
row reduction on a calculator or computer algebra system, as we have no way of forcing the system
not to swap rows during the row reduction process (and they almost always are programmed to do so).
For this reason, it is standard procedure to always take a basis for the row space from the reduced
matrix and a basis for the column space from the original matrix.
The rows and columns play different roles, but they are related to each other in that the
dimension of the space they span must be the same. A 5 × 7 matrix with rank 3 has 5 rows and 7
columns, but the row space and column space both have dimension 3. There can only be three
columns and three rows with pivots. The columns, having 5 entries, are all elements of ℝ⁵. However,
they can only span a three-dimensional subspace of ℝ⁵. The vector space ℝ⁵ is clearly five-dimensional,
so they cannot reach all of its elements despite the fact that there are 7 column vectors.
The rank of the matrix indicates that only three of them at a time can be linearly independent of one
another; the basis for the column space contains only three vectors. Since the whole of ℝ⁵ is five-dimensional,
there are two dimensions inaccessible to the columns. These are the vectors y that
make A x = y an inconsistent system with no solutions. All of the same principles apply to the
rows: there are 5 of them, each residing in ℝ⁷, only three of which are linearly independent. There
are definitely vectors inaccessible to the rows of A, apparently 4 dimensions' worth. The operation
A x = y gives a linear combination of the columns of A, though, rather than its rows, so it is not
poised to tell us information about the row space. We can obtain a basis for this space using row
reduction, but we cannot easily tell which vectors will not be in the row space. To change rows over to
columns, we employ the transpose. The transpose Aᵀ of a matrix is the matrix obtained by swapping
rows and columns. The row space of A is the same as the column space of Aᵀ. These two matrices
are not in the same space of transformations; if A : ℝ⁷ → ℝ⁵, then Aᵀ : ℝ⁵ → ℝ⁷. The dimension of
the row space of A is the same as that of its column space, but they are not the same space. This idea
allows us to use row reduction techniques to study either space.
It will be instructive for us to characterize this determination of the vectors not contained in the
column space of matrix A, but computation makes this determination prohibitively difficult with a
5 × 7 matrix. Consider instead the 3 × 2 matrix
A = \begin{pmatrix} 1 & 3 \\ 2 & -1 \\ -1 & 2 \end{pmatrix}.
We need to determine the constraints on vectors in ℝ³ that are not in col(A), so we row reduce with an
arbitrary vector in ℝ³:
\begin{pmatrix} 1 & 3 & a \\ 2 & -1 & b \\ -1 & 2 & c \end{pmatrix}
\;\xrightarrow{\substack{R_2 \to R_2 - 2R_1 \\ R_3 \to R_3 + R_1}}\;
\begin{pmatrix} 1 & 3 & a \\ 0 & -7 & b - 2a \\ 0 & 5 & c + a \end{pmatrix}
\;\xrightarrow{R_3 \to 7R_3 + 5R_2}\;
\begin{pmatrix} 1 & 3 & a \\ 0 & -7 & b - 2a \\ 0 & 0 & 5b - 3a + 7c \end{pmatrix}.
Thus, the system is inconsistent unless 5b − 3a + 7c = 0. It is useful to think of this in terms of the dot
product we are already familiar with. This will be introduced formally in a little while, but it makes
the visualization of this process easier. Thinking of the vectors as lying in three-dimensional space, we can
write
y = (a, b, c)  and  n = (−3, 5, 7).
The requirement that y ∈ col(A) is equivalent to the requirement that n · y = 0, or being in the space
means that you are perpendicular to the vector n. We know from multivariable calculus that the
space of vectors perpendicular to a given vector forms a plane. The vectors that are not in this plane
represent those inaccessible to the columns of A, and n is clearly one of these. This set does not form a
space (why?), but the span of the vector n certainly is. The whole space can be spanned by including this
vector with the columns in a basis.
To see why this is the case, consider the question of whether or not the vector n is linearly
dependent on the columns of A. If it is, then it must already be in the column space of A. The
equation 5b − 3a + 7c = 0, characterizing those vectors in the column space, is clearly not satisfied by
n, as its own components form the coefficients of the expression. The sum on the left is therefore
the sum of the squares of its components, which will not be zero unless each of them is independently
zero. Therefore, this vector does not lie in col(A) and can be included with the basis vectors of col(A)
in a basis for ℝ³. This idea generalizes to transformations of arbitrary size and rank: if we can find
one or more relations like 5b − 3a + 7c = 0 that characterize elements of a given subspace, then the
vectors formed by their coefficients represent vectors that are linearly independent of the subspace and
can therefore be included in a basis for the entire space.
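The computation above can also be automated. The following sketch, again using SymPy and the sign choices adopted in the reconstruction of the example, row reduces the augmented matrix symbolically and also extracts the normal direction n directly as a basis vector of Nul(Aᵀ).

    import sympy as sp

    a, b, c = sp.symbols('a b c')
    A = sp.Matrix([[1, 3], [2, -1], [-1, 2]])        # the 3 x 2 matrix of the example
    aug = A.row_join(sp.Matrix([a, b, c]))           # augment with an arbitrary vector

    # The last row of the echelon form carries the consistency condition
    # (proportional to 5b - 3a + 7c).
    print(aug.echelon_form()[2, :])

    # The same direction appears as the lone basis vector of Nul(A^T).
    print(A.T.nullspace()[0].T)                      # proportional to (-3, 5, 7)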
Vectors satisfying this requirement can be found, surprisingly, in a much easier manner.
Consider the space Nul(A) of matrix A. This space lies in the same larger space as the row space of A, as
the transformation A : ℝⁿ → ℝᵐ contains m rows, each in ℝⁿ; the null space of A must lie in the
space we are mapping from. It is natural to ask about the relationship between these two subspaces of
ℝⁿ. Are vectors in one linearly independent of those in the other, or linearly dependent on them? To answer
this question, let's investigate the meaning of the null space. The null space of the matrix
A = \begin{pmatrix} 2 & 3 & 1 \\ 1 & 2 & 1 \end{pmatrix}
is the set of vectors x ∈ ℝ³ that are mapped to 0 ∈ ℝ². These vectors satisfy
\begin{pmatrix} 2 & 3 & 1 \\ 1 & 2 & 1 \end{pmatrix} \begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\qquad \text{that is,} \qquad
2a + 3b + c = 0 \quad \text{and} \quad a + 2b + c = 0.
The two relations characterizing the null space are of exactly the same form as the relation
5b − 3a + 7c = 0, but there are two of them. Using the same logic as above, we arrive at the fact that
vectors in the null space of a matrix must be linearly independent of its rows. If the null space of a
6 × 9 matrix has dimension 3, then each of the three linearly independent vectors in its basis can be
included with the vectors forming a basis for the row space to make a basis for ℝ⁹. This basis will
definitely exhaust ℝ⁹ because a 6 × 9 matrix with a three-dimensional null space has rank 6 by the
rank theorem. Its six rows are therefore all linearly independent of one another, and, when included
with the three linearly independent vectors from the null space, make 9 linearly independent vectors.
Any nine linearly independent vectors will form a basis for ℝ⁹, and no set of more than 9 vectors can
be linearly independent in ℝ⁹. Thus, the null space of matrix A completes the domain space of the
transformation A. If A : ℝⁿ → ℝᵐ, then row(A) ⊕ Nul(A) = ℝⁿ.172 We can also complete the full
target space of the transformation, ℝᵐ, by employing the transpose: col(A) ⊕ Nul(Aᵀ) = ℝᵐ. These
two properties about the domain space and the target space are very useful in helping us to assess the
properties of these spaces. Given any mapping, the null space of the transpose of A characterizes the
vectors in the target space that are inaccessible to the transformation. Every vector inaccessible to the
transformation is the sum of a vector that is accessible and a nonzero vector in the null space of Aᵀ.
172 The circle/plus notation ⊕ re-appears whenever we indicate the addition of two spaces in this manner.
This fact is very nice, as it allows us to do the same row reduction process without having to
pay attention to what happens in the augment. A basis for the target space of the matrix
A = \begin{pmatrix} 1 & 2 & 0 \\ 3 & -2 & 8 \\ 1 & -1 & 3 \\ 2 & -3 & 7 \end{pmatrix}
is given by row reduction of its transpose:
\begin{pmatrix} 1 & 3 & 1 & 2 \\ 2 & -2 & -1 & -3 \\ 0 & 8 & 3 & 7 \end{pmatrix}
\;\xrightarrow{R_2 \to R_2 - 2R_1}\;
\begin{pmatrix} 1 & 3 & 1 & 2 \\ 0 & -8 & -3 & -7 \\ 0 & 8 & 3 & 7 \end{pmatrix}
\;\xrightarrow{R_3 \to R_3 + R_2}\;
\begin{pmatrix} 1 & 3 & 1 & 2 \\ 0 & -8 & -3 & -7 \\ 0 & 0 & 0 & 0 \end{pmatrix}.
The row space of Aᵀ is the same as the column space of A, so we have
col(A) = span\left\{ \begin{pmatrix} 1 \\ 3 \\ 1 \\ 2 \end{pmatrix},\; \begin{pmatrix} 2 \\ -2 \\ -1 \\ -3 \end{pmatrix} \right\}.
To get the null space, we assign free variables to the last two columns and solve:
Nul(Aᵀ) = span\left\{ \begin{pmatrix} 1 \\ -3 \\ 8 \\ 0 \end{pmatrix},\; \begin{pmatrix} 5 \\ -7 \\ 0 \\ 8 \end{pmatrix} \right\}.
These two bases together span the entirety of ℝ⁴. This process is important because it gives us a
basis of vectors that span the whole space, but also respect the column space of the matrix A. It would
be much easier to simply use the e-basis if all we wanted was a basis, but that basis may not easily
characterize the difference between the range space of the transformation and the target space. A
basis obtained from this process does, and it is really not that difficult to perform. One has to row
reduce in order to determine the rank of A and which columns to use for the basis of the column space
anyway, and it's not that much more work to determine the null space basis as well.
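If a computer algebra system is available, the whole procedure of this example can be compressed into a few lines. The sketch below assumes SymPy and the sign choices used above; columnspace() returns the pivot columns of A, nullspace() applied to Aᵀ returns a basis of Nul(Aᵀ), and stacking all four vectors confirms that together they span ℝ⁴.

    import sympy as sp

    A = sp.Matrix([[1, 2, 0],
                   [3, -2, 8],
                   [1, -1, 3],
                   [2, -3, 7]])

    col_basis = A.columnspace()        # the pivot columns of A (a basis of col(A))
    perp_basis = A.T.nullspace()       # a basis of Nul(A^T)

    stacked = sp.Matrix.hstack(*col_basis, *perp_basis)
    print(stacked.rank())              # 4: the two bases together span all of R^4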
Given a vector space, it is clear that there are many different choices of basis vectors. Any
three linearly independent vectors in a vector space that has dimension 3 will suffice as a basis for the
vector space. It is often important to understand how to change from one basis to another. Consider
the space ℝ², and the two bases
e = \{ e_1, e_2 \} = \left\{ \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix} \right\}
\qquad \text{and} \qquad
B = \{ b_1, b_2 \} = \left\{ \begin{pmatrix} 2 \\ -1 \end{pmatrix}, \begin{pmatrix} 3 \\ 2 \end{pmatrix} \right\}.
These two sets are definitely bases for ℝ², as they each contain two linearly independent vectors in
ℝ². The first is the easiest basis to use, called the e-basis. This is the basis we have implicitly used
whenever referring to vectors in ℝ², as well as the analogous basis in ℝⁿ. The vectors in the B-basis
are naturally given in terms of this e-basis. We must choose a basis in order to explore and
study a vector space, and, in its own basis, the basis vectors look like those of the e-basis. In the
B-basis, the basis vectors of this basis are written
b_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}_B \qquad \text{and} \qquad b_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}_B.
The subscripts are an indication of the basis that the vectors are given in. The vector
v = \begin{pmatrix} 2 \\ -3 \end{pmatrix}_B = 2\,b_1 - 3\,b_2 = \begin{pmatrix} -5 \\ -8 \end{pmatrix}_e,
where I have included the e subscript to distinguish it from the B subscript. With a bit of practice,
you will be quick to note that this can also be written as
\begin{pmatrix} -5 \\ -8 \end{pmatrix}_e = \begin{pmatrix} 2 & 3 \\ -1 & 2 \end{pmatrix}_{eB} \begin{pmatrix} 2 \\ -3 \end{pmatrix}_B,
so the matrix
T = \begin{pmatrix} 2 & 3 \\ -1 & 2 \end{pmatrix}_{eB}
transforms a vector given in the B-basis to the e-basis. Since the columns of this matrix are linearly
independent, it maps ℝ² to itself in a one-to-one manner. Since the rows are linearly independent, it
also maps ℝ² onto itself. Each vector in the e-basis is mapped to by exactly one vector in the B-basis,
and every vector in the e-basis can be obtained by transforming the appropriate vector in the B-basis.
Transformations like this one that are both one-to-one and onto are called invertible because we can
trace through the transformation and do it backwards. Given a vector in the e-basis, there is a unique
vector in the B-basis that will be taken to this vector by the transformation T. The transformation T⁻¹
accomplishes this act. We will discuss some of the techniques used to determine the inverse of a
square matrix, and the properties exhibited by invertible matrices, in the next section; for now, I will
continue under the assumption that you have, at some point in the past, been made aware of the fact
that the inverse of a 2 × 2 matrix is given by
that the inverse of a 2 2 matrix is given by
a b
1 d b
A
A1

.
c
d
ad
bc c a

In our case, the inverse of this change of basis transformation T is given by


1 2 3
T 1
.
7 1 2 Be
This transformation takes us from the e-basis to the B-basis. Obviously, acting first with T on a vector
given in the B-basis and then with T⁻¹ takes the vector back to its original form in the B-basis. Acting
with these transformations in the opposite order on a vector given in the e-basis gives the same result,
so we have
T^{-1}T = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}_{BB} = I,
\qquad
TT^{-1} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}_{ee} = I.
If we are interested in changing from one basis to another, neither of which is the e-basis, it is
often simplest to consider mapping first from one of the bases to the e-basis, then transforming from
the e-basis to the new basis. It is important to think carefully about which transformation to apply
first, as the transformations do not in general commute. We need to multiply them in the right order to
get the correct result. See if you can show for yourself that the transformation from the B-basis to the
C-basis
C = \{ c_1, c_2 \} = \left\{ \begin{pmatrix} 3 \\ -1 \end{pmatrix}, \begin{pmatrix} 1 \\ 2 \end{pmatrix} \right\}
is given by
M = \frac{1}{7} \begin{pmatrix} 5 & 4 \\ -1 & 9 \end{pmatrix}_{CB}
and that from the C-basis to the B-basis is given by
M^{-1} = \frac{1}{7} \begin{pmatrix} 9 & -4 \\ 1 & 5 \end{pmatrix}_{BC}.
These matrices are obviously inverses of each other, but you should try to obtain them separately in
order to practice determining the right order to multiply the matrices in.
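A short numerical check of these change-of-basis matrices may be helpful. The sketch below uses NumPy (an assumption on my part; the arithmetic can of course be done by hand) and builds M by composing "B-basis to e-basis" with "e-basis to C-basis", which is exactly the ordering issue the previous paragraph warns about.

    import numpy as np

    B = np.array([[2.0, 3.0],          # columns are b1 and b2 in e-coordinates
                  [-1.0, 2.0]])
    C = np.array([[3.0, 1.0],          # columns are c1 and c2 in e-coordinates
                  [-1.0, 2.0]])

    v_B = np.array([2.0, -3.0])        # the vector v written in the B-basis
    print(B @ v_B)                     # its e-coordinates: (-5, -8)

    M = np.linalg.inv(C) @ B           # B-coordinates -> C-coordinates: apply T first, then C^-1
    print(7 * M)                       # [[5, 4], [-1, 9]]
    print(7 * np.linalg.inv(M))        # [[9, -4], [1, 5]]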
It may not be clear at this point why we care about transformations between different bases in a
vector space. Why would we want to use any basis other than the simple e-basis anyway?! In
applications, both mathematical and physical, certain bases are often held above the others for a
variety of reasons. The most important of these involves the idea of eigenvectors that we will discuss
below. In quantum mechanics, different bases are relevant for different physical measurements. We
need to use a different basis to consider the spin component of an electron along the x-axis than that
needed to assess the component along the z-axis. Transformations between the two bases are
extremely important in this application for that reason. These change of basis transformations are so
important in quantum mechanics that many quantum textbooks feature tables of their coefficients
inside the front cover.

Exercises for Section XI.3:


In problems 1 – 9, find a basis for the column space, row space, and null space of the matrix. Then,
determine the rank of the matrix and show that the rank theorem is satisfied.

1. \begin{pmatrix} 2 & 3 & 1 & 4 \\ 2 & 2 & 3 & 1 \\ 1 & 2 & 4 & 5 \end{pmatrix}

2. \begin{pmatrix} 2 & 1 \\ 3 & 2 \\ 5 & 3 \end{pmatrix}

2
3.
1

3 1

3 7
3 1

5 1

4. \begin{pmatrix} 2 & 3 & 1 & 2 & 4 \\ 3 & 1 & 2 & 4 & 5 \\ 9 & 8 & 1 & 2 & 17 \end{pmatrix}

5. \begin{pmatrix} 1 & 1 & 1 \\ 2 & 2 & 2 \\ 1 & 1 & 1 \end{pmatrix}

2
6.
0

2 3 4

3 4 1
1 2 7

5 7 5

7. \begin{pmatrix} 1 & 2 & 2 & 3 & 3 \\ 2 & 3 & 4 & 5 & 5 \\ 3 & 1 & 6 & 2 & 2 \end{pmatrix}

1 2 2

2 4 4
8.
2 1
3

3 3 1

9. \begin{pmatrix} 1 & 2 \\ 2 & 4 \\ 1 & 2 \end{pmatrix}

10. Is it possible for the dimension of the row space of a matrix to differ from that of the column
space? Explain.
11. A matrix of rank 3 has 15 columns and 6 rows. Determine the dimension of the row space,
column space, and null space of this matrix. How many free parameters are there in the solution
of a linear system with this matrix as the coefficient matrix, assuming that there are solutions?
Is it possible for there to be no solutions to this system? Is it possible for this system to have
exactly one solution? Explain.
12. A matrix of rank 7 has 22 columns and 7 rows. Determine the dimension of the row space,
column space, and null space of this matrix. How many free parameters are there in the solution
of a linear system with this matrix as the coefficient matrix, assuming that there are solutions?
Is it possible for there to be no solutions to this system? Is it possible for this system to have
exactly one solution? Explain.
13. A matrix of rank 16 has 16 columns and 22 rows. Determine the dimension of the row space,
column space, and null space of this matrix. How many free parameters are there in the solution
of a linear system with this matrix as the coefficient matrix, assuming that there are solutions?
Is it possible for there to be no solutions to this system? Is it possible for this system to have
exactly one solution? Explain.
14. Is the set of solutions x to the system A x = b, with b ≠ 0, a vector space? Explain why
or why not, and explain why the qualifier b ≠ 0 is relevant to this analysis.
In problems 15 – 18, find the transformation matrix that changes the basis of ℝ² from the first
given basis to the second, then use this transformation to change the vector x given in the first basis
to the second basis.

15. The e-basis to B = \left\{ \begin{pmatrix} 2 \\ 3 \end{pmatrix}, \begin{pmatrix} 3 \\ 1 \end{pmatrix} \right\}, \quad x = \begin{pmatrix} 6 \\ 3 \end{pmatrix}.

16. The e-basis to B = \left\{ \begin{pmatrix} 2 \\ 1 \end{pmatrix}, \begin{pmatrix} 5 \\ 2 \end{pmatrix} \right\}, \quad x = \begin{pmatrix} 7 \\ 2 \end{pmatrix}.

17. \left\{ \begin{pmatrix} 3 \\ 2 \end{pmatrix}, \begin{pmatrix} 1 \\ 2 \end{pmatrix} \right\} to \left\{ \begin{pmatrix} 2 \\ 3 \end{pmatrix}, \begin{pmatrix} 4 \\ 1 \end{pmatrix} \right\}, \quad x = \begin{pmatrix} 5 \\ 1 \end{pmatrix}_B.

18. \left\{ \begin{pmatrix} 2 \\ 1 \end{pmatrix}, \begin{pmatrix} 3 \\ 5 \end{pmatrix} \right\} to \left\{ \begin{pmatrix} 4 \\ 3 \end{pmatrix}, \begin{pmatrix} 3 \\ 2 \end{pmatrix} \right\}, \quad x = \begin{pmatrix} 3 \\ 7 \end{pmatrix}_B.

In problems 19 and 20, determine whether or not the given statement is true.
2 5
3 2
3
4

20. 3 span 1 , 3 .
19. 2 span 1 , 4 .

6 6
2
2


5 1

Section XI.4: Determinants and Invertibility


The question of whether or not a matrix is invertible is very important in linear algebra.
Addressing this question ideologically, we recognize that the properties of onto and one-to-one are
essential to invertibility. A mapping that is not one-to-one will take an infinite number of vectors in
the domain to the same vector in the range, so we cannot determine which of this infinite number of
vectors we started with when we are only given the result of the transformation. Obviously, there is no
direct path back to the original vector in this case: the transformation cannot be invertible. A mapping
that is not onto cannot be inverted because the inverse mapping must be able to act on every vector in
the target space instead of just those vectors in the range of the original mapping. Where do these
vectors go? No vector in the domain is mapped to them by the original transformation, so they can't
be taken to the domain space in any well-defined manner. One way to accomplish a reverse mapping
is to send many of these non-range vectors to the zero vector in the domain space, but this causes the
reverse mapping to not be one-to-one and leads to the troubles indicated above. The only
transformations that can be inverted in a well-defined manner are those that are both onto and one-to-one. In accordance with our earlier discussions, such transformations must have a pivot in every row
as well as one in every column. This implies that they must be square matrices with linearly
independent columns.
To arrive at this conclusion in a different manner, we consider the transformation A x = y
and ask how we can somehow undo this transformation. We begin by writing y in terms of a
standard basis,
y = c\,e_1 + d\,e_2.
Suppose we are able to find that the vector b_1 maps to e_1 and the vector b_2 maps to e_2 under
this transformation:
A\,b_1 = e_1 \qquad \text{and} \qquad A\,b_2 = e_2.
Clearly, we have
A\,(c\,b_1 + d\,b_2) = c\,e_1 + d\,e_2 = y,
so
x = c\,b_1 + d\,b_2.
Writing a matrix M whose columns are the vectors b_1 and b_2, we see that
x = M \begin{pmatrix} c \\ d \end{pmatrix}.
In other words, M = A^{-1}. The vectors b_1 and b_2 can be determined by row reduction:
\left( A \;\middle|\; \begin{matrix} 1 & 0 \\ 0 & 1 \end{matrix} \right)
\;\longrightarrow\;
\left( \begin{matrix} 1 & 0 \\ 0 & 1 \end{matrix} \;\middle|\; M \right),
provided that there is a pivot in every row of A. This analysis not only affirms the requirement of a
pivot in every row and, by extension, one in every column, but also gives us a realistic means with
which to find the inverse matrix. As an example, we determine the inverse of a general 2 × 2 matrix:
\left( \begin{matrix} a & b \\ c & d \end{matrix} \;\middle|\; \begin{matrix} 1 & 0 \\ 0 & 1 \end{matrix} \right)
\;\xrightarrow{R_2 \to aR_2 - cR_1}\;
\left( \begin{matrix} a & b \\ 0 & ad - bc \end{matrix} \;\middle|\; \begin{matrix} 1 & 0 \\ -c & a \end{matrix} \right)
\;\xrightarrow{R_1 \to (ad - bc)R_1 - bR_2}\;
\left( \begin{matrix} a(ad - bc) & 0 \\ 0 & ad - bc \end{matrix} \;\middle|\; \begin{matrix} ad & -ab \\ -c & a \end{matrix} \right).
Dividing the top row by a(ad − bc) and the bottom by ad − bc gives the above result.
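The same augmented row reduction can be carried out numerically. The function below is a bare-bones sketch of the [A | I] → [I | A⁻¹] procedure in Python with NumPy; the partial pivoting step is a practical safeguard and is not part of the hand calculation above.

    import numpy as np

    def inverse_by_row_reduction(A):
        """Row reduce [A | I] to [I | A^-1]; a bare-bones sketch with partial pivoting."""
        A = np.asarray(A, dtype=float)
        n = A.shape[0]
        aug = np.hstack([A, np.eye(n)])
        for col in range(n):
            pivot = col + np.argmax(np.abs(aug[col:, col]))
            if np.isclose(aug[pivot, col], 0.0):
                raise ValueError("matrix is not invertible")
            aug[[col, pivot]] = aug[[pivot, col]]      # swap rows if a better pivot is below
            aug[col] /= aug[col, col]                  # scale the pivot row
            for r in range(n):
                if r != col:
                    aug[r] -= aug[r, col] * aug[col]   # clear the rest of the column
        return aug[:, n:]

    print(inverse_by_row_reduction([[1, 2], [3, 4]]))  # [[-2, 1], [1.5, -0.5]]

The printed result agrees with the 2 × 2 formula above, since here ad − bc = −2.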
Some properties of inverse matrices follow directly from their definition and the properties of
matrix multiplication. If matrix B is invertible, then there must be a matrix A for which AB = I.
Similarly, there must be a matrix C for which BC = I. Multiplying the first of these equations by C on
the right, we obtain A = C. Therefore, the left and right inverses are the same; a matrix and its
inverse must commute. Furthermore, there can be only one inverse of a given matrix. This property of
uniqueness of matrix inverses is fundamental and always true for invertible matrices. It does contain
some subtleties, as we will see below, but it always rings true once we understand how to interpret it.
These types of broad statements about the properties of mathematical quantities are often overlooked
by students the first time they are presented, as they are not required to go through the motions and
determine the inverse of a given matrix via row reduction. Although not a lot of work is required to
determine these results, we must never forget that these properties are the foundation on which the
structure of mathematics and the techniques for determining the inverses are built. These overriding
properties of the mathematics often allow us to argue our way out of difficult situations in which the
techniques used to determine the inverse fail for one reason or another, and point the way toward a
resolution of the difficulty.
This row reduction technique for finding the inverse of a matrix is always accessible for any
finite-dimensional transformation, almost always more efficient than some of the more advanced
techniques, and can be done very quickly via computer, but it is unwieldy when considering
transformations with variable components like that shown above. The analogous operations with a
general 3 × 3 matrix are quite involved, and those for a 4 × 4 or larger matrix are essentially
prohibitive. For this and other reasons, it will be useful for us to define a simple scalar number that
characterizes whether or not a given matrix is invertible. If the matrix is invertible, then this number
will also contain information about the transformation properties of the matrix, specifically how it
treats the distance between two vectors. This scalar will ultimately be called the determinant of the
matrix, and is the same quantity you have discussed in earlier classes.
To begin, we are looking for a quantity that will automatically be zero (meaning the matrix is
not invertible) whenever two rows or columns are the same. This is accomplished by forming a
quantity that changes its sign whenever two rows or columns are interchanged:
\det \begin{pmatrix} R_1 \\ R_2 \\ R_3 \end{pmatrix} = -\det \begin{pmatrix} R_2 \\ R_1 \\ R_3 \end{pmatrix}.
This property is somewhat awkward to indicate mathematically, but it is essential to the structure of
the determinant. Suppose, for example, that a row is replaced with itself plus a linear combination of
the other rows. We do not want this addition of a linear combination of the other rows to affect our
scalar in any way, as it does not modify the space spanned by the rows and therefore cannot alter the
transformation properties of the matrix in any substantive way. In order to do this, we impose the
restriction of linearity on the determinant in all of its rows:
\det \begin{pmatrix} R_1 \\ R_2 + aR \\ R_3 \end{pmatrix}
= \det \begin{pmatrix} R_1 \\ R_2 \\ R_3 \end{pmatrix}
+ a \det \begin{pmatrix} R_1 \\ R \\ R_3 \end{pmatrix}.
Here, R represents an arbitrary row and a an arbitrary real number. Once this restriction is imposed,
the above property immediately implies that the addition of a linear combination of the other rows
cannot change the determinant:
\det \begin{pmatrix} R_1 \\ R_2 + aR_1 + bR_3 \\ R_3 \end{pmatrix}
= \det \begin{pmatrix} R_1 \\ R_2 \\ R_3 \end{pmatrix}
+ a \det \begin{pmatrix} R_1 \\ R_1 \\ R_3 \end{pmatrix}
+ b \det \begin{pmatrix} R_1 \\ R_3 \\ R_3 \end{pmatrix}
= \det \begin{pmatrix} R_1 \\ R_2 \\ R_3 \end{pmatrix}.
The last equality follows from the above property, as exchanging two rows costs a sign. Because of
this, the determinant remains unchanged whenever a row is replaced with itself plus any linear
combination of the other rows. The determinant of a matrix consisting of linearly dependent rows will
therefore definitely be zero in this construction.
In order to ensure linearity, we need to construct the determinant in such a way that all of its
contributions are linear separately in each row. To get a feel for how this works, we can turn back to
row reduction. Whenever we fully row reduce a matrix, we end up with a set of columns containing
zero in all entries save one. The same is true of the rows. At the end of the process, we have a single
nonzero element in all columns and a single nonzero element in all rows: the pivots. If we always
replace a row with itself plus any linear combination of the others, then the determinant remains
unchanged throughout this process. Thus, we can define the determinant of a matrix as the product of
the remaining elements, the pivots, provided that we account for the property that exchanging rows
costs a sign. This is accomplished by numbering the columns and rows from 1 to n. We take the
columns in order, from 1 to n, and write the row number of each pivot in order. The list of row
numbers is then changed systematically into the correct numerical order, tallying a sign for each
swap. For example, the row reduced matrix
\begin{pmatrix} 0 & -3 & 0 & 0 \\ 0 & 0 & 5 & 0 \\ -2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix}
has row permutation 3124. In changing this to the right order, we first change 1 and 3 to obtain
1324, then exchange the 2 and 3 to obtain 1234. This double exchange contains two signs, so the
product (-3)(5)(-2)(3) comes with a positive sign. The determinant is therefore +90. This rule
explicitly preserves the requirement that exchanging rows costs a sign, and can be used to determine a
general rule for the determinant that does not require us to row reduce the matrix first. Our present
definition is linear only in the rows that contain pivots, as it refers only to the pivots, so cannot be used
to define the determinant in general. The general rule requires us to consider all products consisting of
a single element from each row and a single element from each column, taken with the sign indicated
by the permutation associated with the rows. This definition is explicitly linear in both the columns
and the rows, taken one at a time, as each contribution contains exactly one factor from each of the
rows and columns. It will therefore automatically have all of the properties we require of the
determinant. The determinant of matrix A is defined as
\det A = \sum \epsilon_{k_1 k_2 \cdots k_n}\, A_{k_1 1} A_{k_2 2} \cdots A_{k_n n}.
The numbers k_1 k_2 \cdots k_n, indexing the rows of A, are summed over all values from 1 to n. Each
contribution contains exactly one element of each column. The antisymmetric tensor \epsilon_{k_1 k_2 \cdots k_n} is equal
to +1 when the set of numbers k_1 k_2 \cdots k_n is an even permutation of 1 2 \cdots n (requiring an even number
of swaps to get to the right order) and −1 when the set is an odd permutation. It imposes the swap
requirement and implies that every contribution contains exactly one element of each row as well.
Since there is exactly one element from each row and column in each term, this definition is manifestly
linear in both the rows and the columns of A.
While this definition definitely satisfies our requirements, it is not practical to implement
except in special situations. The number of terms in the sum associated with this definition is
equal to n!, as there are n ways to choose the row associated with the first column, n − 1 ways to
choose the row associated with the second, and so on. For n = 2 or 3, this is not a big deal. We can
easily account for 2 or 6 terms in the sum. Large matrices, on the other hand, are quite different. It
would take quite a while to add up the 1.55 × 10²⁵ contributions to the determinant of a 25 × 25 matrix,
and this matrix is actually not that large by modern standards. A 10,000 × 10,000 matrix has
2.846 × 10^{35,659} contributions, far beyond current computational capabilities. It is far more efficient to
use row reduction techniques first to reduce the number of nonzero elements of the matrix before using
this definition, as any term that contains a zero need not be included. Despite this inefficiency inherent
in the determinant, we can still derive a great deal of useful information from the idea that the
determinant exists and how it can be calculated.
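For small matrices, the permutation definition can be implemented directly. The sketch below (plain Python, written for clarity rather than speed) sums the n! signed products and reproduces the +90 found for the row reduced example above.

    from itertools import permutations
    from math import prod

    def parity(perm):
        """+1 for an even permutation, -1 for an odd one (count the swaps needed to sort it)."""
        perm, swaps = list(perm), 0
        for i in range(len(perm)):
            while perm[i] != i:
                j = perm[i]
                perm[i], perm[j] = perm[j], perm[i]
                swaps += 1
        return -1 if swaps % 2 else 1

    def det_by_permutations(A):
        """Leibniz sum: one entry from each row and each column, signed by the row permutation."""
        n = len(A)
        return sum(parity(p) * prod(A[p[col]][col] for col in range(n))
                   for p in permutations(range(n)))

    A = [[0, -3, 0, 0],
         [0, 0, 5, 0],
         [-2, 0, 0, 0],
         [0, 0, 0, 3]]
    print(det_by_permutations(A))   # 90, as found from the permutation 3124 above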
One of the most important results for the determinant is the cofactor expansion. You should
already be somewhat familiar with this process from algebra II and pre-calculus, but you were
probably not told why it works. Consider the determinant of matrix A. As defined above, this quantity
consists of a sum of all possible products containing a single element from each column and a single
element from each row, each contribution taken with the appropriate sign. We can exploit the fact that
each contribution must contain a single element from each column to expand the determinant across
a column of the matrix. Every contribution to the determinant must contain an element from the first
column of matrix A. Those that contain the first element of the first column cannot contain any other
element of either the first row or the first column, so the sum of all these contributions is equal to the
first element of the first column times the sum (with appropriate sign) of all possible products
including a single element of every column except the first and a single element from every row except
the first. We immediately recognize this sum as the determinant of the matrix A becomes when the
first row and first column are removed. Those contributions to the determinant of A that contain the
second element of the first column are similarly equal to the product of this element and the sum (with
appropriate sign) of all possible products containing a single element from each column except the first
and a single element from each row except the second. This sum is recognized as the determinant of
the matrix A becomes when its first column and second row are removed, with an extra sign arising
from the fact that an extra row switch must be made in A itself to bring the 2 associated with row 2 to
its proper place; the row permutation associated with A is given by 213⋯n, where the 13⋯n is in the
correct order as far as the determinant of the matrix with the missing row and missing column is
concerned. Similarly, terms containing the third element of the first column combine to give the
product of this element and the determinant of the matrix A becomes when its first column and third
row are removed. There is no extra sign here because the row permutation in this case is given by
3124⋯n, which requires two swaps to arrive at the proper order. Going through this analysis
with a 4 × 4 matrix gives
\begin{vmatrix} 3 & 2 & 1 & 4 \\ 2 & 2 & 3 & 1 \\ 2 & 1 & 3 & 2 \\ 4 & 1 & 7 & 2 \end{vmatrix}
= 3 \begin{vmatrix} 2 & 3 & 1 \\ 1 & 3 & 2 \\ 1 & 7 & 2 \end{vmatrix}
- 2 \begin{vmatrix} 2 & 1 & 4 \\ 1 & 3 & 2 \\ 1 & 7 & 2 \end{vmatrix}
+ 2 \begin{vmatrix} 2 & 1 & 4 \\ 2 & 3 & 1 \\ 1 & 7 & 2 \end{vmatrix}
- 4 \begin{vmatrix} 2 & 1 & 4 \\ 2 & 3 & 1 \\ 1 & 3 & 2 \end{vmatrix}.
This is called a cofactor expansion, and the determinants associated with each entry, along with the
appropriate sign, are called cofactors of the entry.
We can do a cofactor expansion of a matrix along any of its rows or columns. The appropriate
signs associated with each of these elements is given by (−1)^{r+c}, where r and c are the row and
column numbers, respectively, of the element. These signs alternate across the matrix in the pattern
\begin{pmatrix} + & - & + & \cdots \\ - & + & - & \cdots \\ + & - & + & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}.
As an example of a cofactor expansion, let's compute the determinant of the matrix
\begin{pmatrix} 2 & 3 & 1 & 4 \\ 1 & 3 & -2 & 1 \\ 1 & -1 & 3 & 2 \\ 2 & 2 & 2 & 3 \end{pmatrix}.
To reduce the number of nonzero elements, we first row reduce:
\begin{pmatrix} 2 & 3 & 1 & 4 \\ 1 & 3 & -2 & 1 \\ 1 & -1 & 3 & 2 \\ 2 & 2 & 2 & 3 \end{pmatrix}
\;\xrightarrow{\substack{R_3 \to R_3 - R_2 \\ R_4 \to R_4 - R_1}}\;
\begin{pmatrix} 2 & 3 & 1 & 4 \\ 1 & 3 & -2 & 1 \\ 0 & -4 & 5 & 1 \\ 0 & -1 & 1 & -1 \end{pmatrix}.
As columns are essentially equivalent to rows as far as the determinant is concerned, we can also
column reduce:
\begin{pmatrix} 2 & 3 & 1 & 4 \\ 1 & 3 & -2 & 1 \\ 0 & -4 & 5 & 1 \\ 0 & -1 & 1 & -1 \end{pmatrix}
\;\xrightarrow{\substack{C_2 \to C_2 + C_3 \\ C_4 \to C_4 + C_3}}\;
\begin{pmatrix} 2 & 4 & 1 & 5 \\ 1 & 1 & -2 & -1 \\ 0 & 1 & 5 & 6 \\ 0 & 0 & 1 & 0 \end{pmatrix}.
Expanding across the bottom row, whose only nonzero entry sits in the third column, we have
\begin{vmatrix} 2 & 3 & 1 & 4 \\ 1 & 3 & -2 & 1 \\ 1 & -1 & 3 & 2 \\ 2 & 2 & 2 & 3 \end{vmatrix}
= (-1) \begin{vmatrix} 2 & 4 & 5 \\ 1 & 1 & -1 \\ 0 & 1 & 6 \end{vmatrix}
= (-1) \begin{vmatrix} 0 & 2 & 7 \\ 1 & 1 & -1 \\ 0 & 1 & 6 \end{vmatrix}
= \begin{vmatrix} 2 & 7 \\ 1 & 6 \end{vmatrix} = 12 - 7 = 5,
where in the middle step we replaced R_1 with R_1 − 2R_2 before expanding down the first column.
The final row reductions are not necessary, but they make for easier computation of the final
determinant. This is characteristic of general determinant computations: one needs to decide how
much row/column reduction work should be done before just giving in to the cofactor expansion.
Row reduction is better organized and easier to accomplish both with computers and by hand than
the cofactor expansion, so it is often worth it to do some row reduction prior to the cofactor expansion.
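A cofactor expansion is also easy to express recursively in code. The sketch below, in plain Python, expands down the first column and skips zero entries, which is why some preliminary row or column reduction pays off; the matrix shown is the one used in the reconstruction of the example above.

    def det_cofactor(A):
        """Determinant by cofactor expansion down the first column (fine for small matrices)."""
        if len(A) == 1:
            return A[0][0]
        total = 0
        for r in range(len(A)):
            if A[r][0] == 0:
                continue                                  # zero entries contribute nothing
            minor = [row[1:] for i, row in enumerate(A) if i != r]
            total += (-1) ** r * A[r][0] * det_cofactor(minor)
        return total

    A = [[2, 3, 1, 4],
         [1, 3, -2, 1],
         [1, -1, 3, 2],
         [2, 2, 2, 3]]
    print(det_cofactor(A))   # 5, matching the hand computation above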
One very important property of determinants is the fact that the determinant of the product of
two matrices is equal to the product of the determinants of the two matrices,
\det(AB) = \det(A)\,\det(B).
We can establish this fact by first considering what happens if either matrix A or matrix B is not
invertible. In this case, the matrix AB cannot be invertible because, if it were, then there would have to
be a matrix C for which ABC = CAB = I. This implies that A (BC) = I and (CA) B = I, so both matrix
A and matrix B are invertible. If either A or B is not invertible, then AB is also not invertible; both
sides of our expression are zero and the theorem is true. If both A and B are invertible, then we can
row reduce either of them to the identity. Lets row reduce matrix A to the identity, without loss of
generality (you will see that the proof is substantively unchanged by reducing B instead). We can
write each row reduction on A as a matrix E times A, where E represents either a swap of rows, the
replacement of a row with itself plus a linear combination of the other rows, or the scaling of a row.
Thus,
A = E_1 E_2 \cdots E_r I \qquad \text{and} \qquad AB = E_1 E_2 \cdots E_r B,
where r row reductions are required to reduce A to the identity. Taking the determinant of AB, we see
that all of the row reductions can easily be compensated for in terms of our earlier results for the
determinant. If the row operation consists of a swap of rows, the determinant is multiplied by -1. If it
consists of replacing a row with itself plus a linear combination of the other rows, the determinant is
unchanged. If it consists of a scale, the determinant is multiplied by the scale factor. The scale factors
associated with reducing matrix A to the identity are just the pivots of A, so the end result of these row
operations is to make
\det(AB) = (-1)^p \left( \prod_k P_k \right) \det(B),
where p is the number of row switches necessary in the row reduction of A to the identity and the
product \prod_k P_k is the product of the pivots of A. If B is taken as the identity, then this result indicates
that the determinant of A is equal to (−1)^p times the product of the pivots of A (this was also our
primordial definition of the determinant given above). Therefore, we arrive at
\det(AB) = \det(A)\,\det(B),
as desired. Note that this result immediately implies that the determinant of a product of matrices is
the same regardless of the order in which the matrices are multiplied. The matrix AB may not equal
the matrix BA, but their determinants are the same.
The idea of determinants allows us to give an explicit representation for the inverse of a matrix.
This representation is prohibitively complicated computationally, and does not beat the row reduction
method outlined above for finding the inverse, but it is often useful to have an explicit representation
for the inverse when proving theorems and thinking about how things will ultimately go in an explicit
calculation. We begin by defining the adjugate of matrix A, whose elements are given by the cofactor
of the corresponding elements of A. For example, the adjugate of the matrix
A = \begin{pmatrix} 3 & 2 & 1 \\ 1 & -1 & 3 \\ 2 & 1 & -2 \end{pmatrix}
is given by
C = \begin{pmatrix} -1 & 8 & 3 \\ 5 & -8 & 1 \\ 7 & -8 & -5 \end{pmatrix}.
This definition implies that the dot product of a column of A with the corresponding column of C
gives the determinant of A, as it represents a cofactor expansion along that column of A. Mismatching
columns gives an entirely different result. Suppose that we calculate the dot product between the
first column of A and the second column of C. This product represents the determinant of the matrix
obtained by replacing the second column of A with its first column; the second column of the adjugate
matrix contains information about all columns of A except for the second one. Multiplying it by any
column gives the determinant of the matrix obtained by replacing the second column of A with that
column. The determinant of the matrix obtained when the second column of A is replaced with its first
column is obviously zero, as two columns are repeated. This occurs whenever a column of A is
mismatched with a column of C, so by inspection we can assert that
C^{T} A = \begin{pmatrix} \det A & 0 & \cdots & 0 \\ 0 & \det A & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \det A \end{pmatrix}
\qquad\Longrightarrow\qquad
A^{-1} = \frac{C^{T}}{\det A}.
The transpose changes the columns of C into rows, so the matrix multiplication associated with CᵀA
represents dot products between the columns of C and the columns of A. This formula for the
inverse of a matrix explicitly indicates that matrices with zero determinant are not invertible. The
product CᵀA is equal to the zero matrix if A is not invertible.
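The adjugate construction can be spelled out explicitly for the 3 × 3 example above. The sketch below builds the cofactor matrix C by brute force, recovers det A from a cofactor expansion, and forms A⁻¹ = Cᵀ / det A with exact fractions.

    from fractions import Fraction

    def det(M):
        if len(M) == 1:
            return M[0][0]
        return sum((-1) ** r * M[r][0] * det([row[1:] for i, row in enumerate(M) if i != r])
                   for r in range(len(M)))

    def cofactor_matrix(A):
        """C[r][c] = (-1)^(r+c) times the minor obtained by deleting row r and column c."""
        n = len(A)
        return [[(-1) ** (r + c) * det([row[:c] + row[c + 1:] for i, row in enumerate(A) if i != r])
                 for c in range(n)] for r in range(n)]

    A = [[3, 2, 1], [1, -1, 3], [2, 1, -2]]
    C = cofactor_matrix(A)
    d = sum(A[r][0] * C[r][0] for r in range(3))                          # det A by cofactor expansion
    A_inv = [[Fraction(C[c][r], d) for c in range(3)] for r in range(3)]  # A^-1 = C^T / det A
    print(C)      # [[-1, 8, 3], [5, -8, 1], [7, -8, -5]]
    print(d)      # 16
    print(A_inv)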
The adjugate formula for the inverse of a matrix is useful in deriving Cramer's rule for the
solution of linear systems of equations.173
173 This rule is named for the Swiss mathematician Gabriel Cramer, who published it along with a series of other algebraic results in his 1750 treatise, despite the fact that it had also been published (posthumously) two years earlier in a treatise by the Scottish mathematician Colin Maclaurin. Poor Colin: first Taylor and now this!
If there is a unique solution to the system of equations A x = y, then it is given by
x = A^{-1} y = \frac{1}{\det A}\, C^{T} y.
The product Cᵀy consists of dot products of the columns of the adjugate with the vector y. The top entry of
x is given by the product of the first column of the adjugate matrix and the vector y, which, as we
have seen, is equal to the determinant of the matrix obtained when the first column of A is replaced
with the vector y , divided by the determinant of A. The other entries of the solution x are given
by analogous determinants, consisting of the matrix A with one of its columns replaced by the vector
y , divided by the determinant of A. The nicest thing about Cramers rule is that it allows us to
calculate a given component of the solution vector without having to calculate the whole thing. This is
extremely useful when the system is very large, as only two determinants are required.
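As a sketch of Cramer's rule in practice, the function below (NumPy, with a matrix reused from the adjugate example purely for illustration) computes a single component of the solution from just two determinants and checks it against a full solve.

    import numpy as np

    def cramer_component(A, y, k):
        """k-th component of the solution of A x = y: swap column k of A for y, take two determinants."""
        A = np.asarray(A, dtype=float)
        Ak = A.copy()
        Ak[:, k] = y
        return np.linalg.det(Ak) / np.linalg.det(A)

    A = [[3, 2, 1], [1, -1, 3], [2, 1, -2]]     # reused from the adjugate example (det A = 16)
    y = [1, 2, 3]
    x0 = cramer_component(A, y, 0)
    print(np.isclose(x0, np.linalg.solve(A, y)[0]))   # True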

Exercises for Section XI.4:


In problems 1 – 6, find the determinant of the given matrix.

1. \begin{pmatrix} 2 & 3 & 1 \\ 4 & 1 & 3 \\ 2 & 1 & 5 \end{pmatrix}

2. \begin{pmatrix} 1 & 2 & 3 \\ 1 & 5 & 2 \\ 3 & 1 & 4 \end{pmatrix}

2
3.
3

2 1

3 2
4.
1 3

2 1

3
5.
0

2
6.
0

2 4

3 6
2 4

2 4

4 1 2
0 2 3

3 2 4

1 0

4 3

1 4 3
.
4 1 5

2 3 4
1

7 4 10
.
0 2
5

7 3
7
4

In problems 7 – 12, find the inverse of the given matrix. If the matrix is not invertible, show that the
product of it and its adjugate gives the zero matrix.

7. \begin{pmatrix} 2 & 3 & 1 \\ 4 & 1 & 3 \\ 2 & 1 & 5 \end{pmatrix}

8. \begin{pmatrix} 1 & 2 & 3 \\ 1 & 5 & 2 \\ 3 & 1 & 4 \end{pmatrix}

9. \begin{pmatrix} 2 & 3 & 4 \\ 1 & 5 & 3 \\ 5 & 1 & 5 \end{pmatrix}

10. \begin{pmatrix} 2 & 3 & 4 \\ 3 & 2 & 5 \\ 0 & 13 & 2 \end{pmatrix}

11. \begin{pmatrix} 1 & 4 & 2 \\ 1 & 3 & 5 \\ 2 & 7 & 3 \end{pmatrix}

12. \begin{pmatrix} 4 & 1 & 2 \\ 2 & 3 & 2 \\ 3 & 5 & 1 \end{pmatrix}

In problems 13 – 16, solve the given system using (a) row reduction, (b) the inverse matrix, and (c)
Cramer's rule. Show that the solution is the same in all three cases, and explain which of the three you
found easiest.

13. 2x + 3y + z = 4,  x + 2y + 3z = 6,  2y + z = 2

14. x + 2y + z = 3,  2x + y + 5z = 2,  x + 3y + 2z = 6

15. x + 3y + 2z = 3,  2x + y + z = 2,  3x + 2y + 2z = 5

16. 5x + 2y + 3z = 6,  2x + 3y + z = 2,  4x + z = 7

17. Given the matrices A = \begin{pmatrix} 3 & 2 & 4 \\ 1 & 2 & 1 \\ 3 & 1 & 3 \end{pmatrix} and B = \begin{pmatrix} 2 & 1 & 3 \\ 2 & 4 & 1 \\ 2 & 3 & 1 \end{pmatrix}, show that the determinant of the
product AB is equal to the determinant of the product BA. Are the two matrices the same?

18. Given the matrices A = \begin{pmatrix} 2 & 1 \\ 3 & 2 \\ 5 & 1 \end{pmatrix} and B = \begin{pmatrix} 2 & 1 & 3 \\ 4 & 2 & 1 \end{pmatrix}, show that the determinant of the product
AB is not equal to the determinant of the product BA. One of these determinants is actually 0.
Explain why this determinant is expected to be zero and why these two matrices do not conform
to the expectations of a determinant that is independent of the order of the matrices.
Section XI.5: General Vector Spaces and Isomorphisms
Our definition of the term vector space is very general, but up to now we have only used it to
describe the behavior of ℝⁿ and various subspaces. You may wonder about the point of deriving
such an intricate framework of ideas just to apply them to this small set of vector spaces. It turns out
that all of the above framework can be applied to any finite-dimensional vector space through the
important idea of isomorphisms. Two vector spaces are said to be isomorphic to one another if there
exists an invertible mapping between them that preserves all of the algebraic properties of both spaces.
Given two vector spaces with the same dimension, we can always find such a mapping simply by
choosing a basis for each of the spaces and associating each element of one basis with an element of
the other. Since the two bases have the same number of elements, this mapping will definitely be
one-to-one and onto. As long as our transformation is linear in nature, it will automatically preserve all of
the algebraic properties. As an example, consider the two vector spaces
V = \text{span}\left\{ \begin{pmatrix} 3 \\ 2 \\ -1 \end{pmatrix}, \begin{pmatrix} -1 \\ 3 \\ 2 \end{pmatrix} \right\}
and ℝ². Both of these spaces are definitely two-dimensional, so there definitely exists an
isomorphism (infinitely many, in fact). In order to find one, we associate each of the vectors in the
given basis for V with a vector in a basis for ℝ². Choosing the e-basis of ℝ² for simplicity, we map
\begin{pmatrix} 3 \\ 2 \\ -1 \end{pmatrix} \mapsto \begin{pmatrix} 1 \\ 0 \end{pmatrix}
\qquad \text{and} \qquad
\begin{pmatrix} -1 \\ 3 \\ 2 \end{pmatrix} \mapsto \begin{pmatrix} 0 \\ 1 \end{pmatrix}.


In order to be linear, this mapping must be accomplished by a pair of matrices: one that takes V to ℝ²
and one that takes ℝ² to V. The arbitrary vector
\begin{pmatrix} a \\ b \end{pmatrix} = a \begin{pmatrix} 1 \\ 0 \end{pmatrix} + b \begin{pmatrix} 0 \\ 1 \end{pmatrix} \in ℝ²
is mapped to the arbitrary vector
a \begin{pmatrix} 3 \\ 2 \\ -1 \end{pmatrix} + b \begin{pmatrix} -1 \\ 3 \\ 2 \end{pmatrix} \in V.
This mapping from ℝ² to V is accomplished by the matrix
N = \begin{pmatrix} 3 & -1 \\ 2 & 3 \\ -1 & 2 \end{pmatrix} : ℝ² \to V.
You remember from the above that this matrix is not onto when thought of, in the previous light, as a
mapping from ℝ² to ℝ³. It does map onto the vector space V, however, as shown explicitly by the
above relation. This is a two-dimensional subspace of ℝ³, so all of its vectors are accessible to the
transformation. We cannot obtain vectors in ℝ³ that do not lie in V, but this does not concern us. We
are interested only in those vectors that lie in V. The inverse mapping from V to ℝ² is a little more
difficult to obtain, as it requires us to make a choice. We require a matrix
M = \begin{pmatrix} \alpha & \beta & \gamma \\ \delta & \epsilon & \zeta \end{pmatrix}
satisfying
M \begin{pmatrix} 3 \\ 2 \\ -1 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}
\qquad \text{and} \qquad
M \begin{pmatrix} -1 \\ 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.
These are only four equations in six variables, so we have two free choices to make. Choosing
\alpha = \delta = 0 gives
M = \begin{pmatrix} 0 & 2/7 & -3/7 \\ 0 & 1/7 & 2/7 \end{pmatrix},
while choosing \beta = \zeta = 0 gives
M = \begin{pmatrix} 2/5 & 0 & 1/5 \\ -2/11 & 3/11 & 0 \end{pmatrix}.
These appear to be different transformations, but they behave in exactly the same manner when
applied to any vector in V. They are no different from our perspective.
It is strange, to say the least, to think of the matrices M and N as inverses of one another, as
neither is square. When thought of as mappings between ℝ² and ℝ³, N is not onto and M is not
one-to-one. Nevertheless, both of these transformations are perfectly invertible when thought of as
mappings between ℝ² and V. To see how this works, let us first imagine starting with a vector in ℝ²
and mapping it first to V using matrix N, then back to ℝ² using matrix M. The vector is ultimately
multiplied by the matrix
MN = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},
so is brought back to itself as expected. You can check for yourself that this result is valid regardless
of which choice you make for the matrix M. This fact seems to violate our above result that matrix
inverses are unique, as well as the fact that matrix M has a nontrivial null space. However, these
properties are not relevant when thinking of the matrices as transformations between ℝ² and V; the
matrices M are not different as far as their transformation properties between these two spaces are
concerned, and the null space of matrix M is not part of V, so it does not matter for our purposes. These
facts are made far more apparent by the effective transformation associated with taking a vector from
V to ℝ² and back. We obtain
NM = \begin{pmatrix} 0 & 5/7 & -11/7 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
when using the first matrix M and
NM = \begin{pmatrix} 76/55 & -3/11 & 3/5 \\ 14/55 & 9/11 & 2/5 \\ -42/55 & 6/11 & -1/5 \end{pmatrix}
when using the second. These matrices certainly do not look the same, but neither did the matrices M.
It is not at all obvious that they will represent the identity, as neither looks much like the identity, but
before we throw these matrices away and go home, let's consider their action on the important part
of ℝ³, the subspace V. When acting on the first basis vector of this space, we obtain
\begin{pmatrix} 0 & 5/7 & -11/7 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 3 \\ 2 \\ -1 \end{pmatrix} = \begin{pmatrix} 3 \\ 2 \\ -1 \end{pmatrix}
\qquad \text{and} \qquad
\begin{pmatrix} 76/55 & -3/11 & 3/5 \\ 14/55 & 9/11 & 2/5 \\ -42/55 & 6/11 & -1/5 \end{pmatrix} \begin{pmatrix} 3 \\ 2 \\ -1 \end{pmatrix} = \begin{pmatrix} 3 \\ 2 \\ -1 \end{pmatrix},
while it gives
\begin{pmatrix} 0 & 5/7 & -11/7 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} -1 \\ 3 \\ 2 \end{pmatrix} = \begin{pmatrix} -1 \\ 3 \\ 2 \end{pmatrix}
\qquad \text{and} \qquad
\begin{pmatrix} 76/55 & -3/11 & 3/5 \\ 14/55 & 9/11 & 2/5 \\ -42/55 & 6/11 & -1/5 \end{pmatrix} \begin{pmatrix} -1 \\ 3 \\ 2 \end{pmatrix} = \begin{pmatrix} -1 \\ 3 \\ 2 \end{pmatrix}
on the second basis vector. Hence, both of these matrices are indeed the identity when acting on the
vector space V. They do not look like the identity matrix because they are not the identity when acting
on ℝ³. Both of these matrices have nontrivial null spaces, but these null spaces do not affect the
action on the vector space V. The matrices look different because their null spaces are different, but
their action on V is identical. As long as we are only interested in V and not the whole of ℝ³, these
matrices can both be considered the identity. Is it possible to choose a form of the matrix M for which
the product NM is the identity for all of ℝ³? What do you think? What would it mean if we could?
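One systematic way to produce a matrix M of the required type, different from the two choices written above but equally valid, is the least-squares left inverse M = (NᵀN)⁻¹Nᵀ. The sketch below uses SymPy to verify that this choice also gives MN = I on ℝ² and that NM acts as the identity on V even though it is not the identity on ℝ³.

    import sympy as sp

    N = sp.Matrix([[3, -1], [2, 3], [-1, 2]])     # columns span V, as above
    M = (N.T * N).inv() * N.T                     # the least-squares left inverse, one more valid choice

    print(M * N)                                  # the 2 x 2 identity: M undoes N on R^2
    P = N * M                                     # 3 x 3 and not the identity on R^3 ...
    v1, v2 = N.col(0), N.col(1)
    print(P * v1 == v1, P * v2 == v2)             # ... yet it acts as the identity on V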
The above example illustrates the subtleties that arise when we work with a subspace, as well
as the consistency that comes up when this is done properly. It is not wrong to work with subspaces,
just tricky. Since we know that there exists an isomorphism between every space of dimension n and
ℝⁿ, we can avoid dealing with subspaces by working exclusively in ℝⁿ rather than in the subspace.
The properties of isomorphisms guarantee that every operation we perform in ℝⁿ is matched exactly
by the analogous operation in the subspace. As an example, consider the space of polynomials with
degree less than or equal to 4. This is certainly a vector space, as the sum of two polynomials with
degree 4 or less is definitely also a polynomial satisfying this condition and every multiple of a
polynomial in this space is also in the space. A standard basis for this space of polynomials is
{1, t, t², t³, t⁴}, so the space has dimension 5. Mapping this basis directly to the e-basis of ℝ⁵, we have
3t⁴ + 2t³ + 5t² + 2t + 7 \;\longmapsto\; \begin{pmatrix} 7 \\ 2 \\ 5 \\ 2 \\ 3 \end{pmatrix}.
To effect a change to the basis {t³ + t², t⁴ + 2t, t + 3, t + 2, 3t² + 4}, we consider the change from the
e-basis of ℝ⁵ to the basis
\left\{ \begin{pmatrix} 0 \\ 0 \\ 1 \\ 1 \\ 0 \end{pmatrix},
\begin{pmatrix} 0 \\ 2 \\ 0 \\ 0 \\ 1 \end{pmatrix},
\begin{pmatrix} 3 \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix},
\begin{pmatrix} 2 \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix},
\begin{pmatrix} 4 \\ 0 \\ 3 \\ 0 \\ 0 \end{pmatrix} \right\}.
Using our above analysis, the transformation from the new basis to the e-basis is given by
T = \begin{pmatrix} 0 & 0 & 3 & 2 & 4 \\ 0 & 2 & 1 & 1 & 0 \\ 1 & 0 & 0 & 0 & 3 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \end{pmatrix}_{eB}
and the transformation from the e-basis to the new basis is given by
T^{-1} = \frac{1}{15} \begin{pmatrix} 0 & 0 & 0 & 15 & 0 \\ 0 & 0 & 0 & 0 & 15 \\ 15 & -30 & -20 & 20 & 60 \\ -15 & 45 & 20 & -20 & -90 \\ 0 & 0 & 5 & -5 & 0 \end{pmatrix}_{Be}.
For example, the vector t³ + 3t − 1 has e-coordinates (−1, 3, 0, 1, 0)ᵀ, and applying T⁻¹ gives its
coordinates in the new basis:
T^{-1} \begin{pmatrix} -1 \\ 3 \\ 0 \\ 1 \\ 0 \end{pmatrix}_e = \begin{pmatrix} 1 \\ 0 \\ -17/3 \\ 26/3 \\ -1/3 \end{pmatrix}_B,
so that t³ + 3t − 1 = (t³ + t²) − \tfrac{17}{3}(t + 3) + \tfrac{26}{3}(t + 2) − \tfrac{1}{3}(3t² + 4).

This transformation allows us to determine the coefficients of the B-basis vectors for any polynomial
given in the standard basis. It is probably not clear to you at this point why we care about such
transformations; the standard basis for polynomials is simple enough, so why should we choose to
work in such a nonstandard basis? There are reasons for such a replacement, but they will not be clear
until after the next section. For now, rest assured that there are important reasons why one needs to
know how to change from one basis to another in many spaces. The isomorphisms between all vector
spaces with the same (finite) dimension allow us to employ all of the techniques learned above to
handle these sorts of transformations.
Now that we have considered a function space consisting of functions of the variable t, it is
interesting to see what our isomorphism implies about linear operators that are usually considered to
act on such functions. The derivative operator, for example, is certainly linear, as
\frac{d}{dt}\left( c_1 y_1 + c_2 y_2 \right) = c_1 \frac{dy_1}{dt} + c_2 \frac{dy_2}{dt}.
This implies that its analogue operator acting on the appropriate isomorphic vector space ℝⁿ is a
matrix. Try to show that this operator is given by
\frac{d}{dt} \;\longmapsto\; \begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 2 & 0 & 0 \\ 0 & 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 0 & 4 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}
in the vector space associated with the space of polynomials of degree four or less, with the standard
basis. This matrix has a nontrivial null space (can you explain why without using row reduction?), so
it is not invertible. It does map the space of polynomials of degree four or less to itself, though, so it can
be applied repeatedly. The square of the derivative operator is given by
\frac{d^2}{dt^2} \;\longmapsto\; \begin{pmatrix} 0 & 0 & 2 & 0 & 0 \\ 0 & 0 & 0 & 6 & 0 \\ 0 & 0 & 0 & 0 & 12 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix},
which has a null space with dimension 2 (why? Try to determine the answer both from the point of
view of the matrix and from that of calculus). The derivative operator itself does have an inverse (in
some sense), but this inverse cannot be written as a matrix mapping the space of polynomials with
degree equal to or less than 4 into itself (why?). These matrix operations acting on a function space are very
useful for computer algebra systems, as computers are not directly applicable to functions themselves
but can easily be programmed to multiply matrices together. One of the reasons why linear algebra is
so applicable to so many systems is that it allows us to change our point of view about familiar
operations and apply the ideas associated with ℝⁿ to much more general systems.
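The derivative matrix is easy to build and test. The sketch below (NumPy) constructs it for the standard basis {1, t, t², t³, t⁴}, applies it to the coefficient vector of the polynomial used earlier, and confirms that the matrix has rank 4, so its null space is the one-dimensional space of constants.

    import numpy as np

    D = np.zeros((5, 5))
    for k in range(1, 5):
        D[k - 1, k] = k                 # d/dt sends t^k to k t^(k-1)

    p = np.array([7, 2, 5, 2, 3])       # coefficients of 3t^4 + 2t^3 + 5t^2 + 2t + 7, constant term first
    print(D @ p)                        # [2, 10, 6, 12, 0]: the derivative 12t^3 + 6t^2 + 10t + 2
    print(np.linalg.matrix_rank(D))     # 4, so the null space (the constants) is one-dimensional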
One very important class of vector spaces in mathematical analysis is the space of solutions to
a given linear differential equation. Consider the differential equation
t^2 \frac{dy}{dt} + y = 3.
This differential equation is called linear because the differential operator associated with it,
L = t^2 \frac{d}{dt} + 1,
is a linear operator: L[c_1 y_1 + c_2 y_2] = c_1 L[y_1] + c_2 L[y_2] for all c_1, c_2 and differentiable functions
y1 and y2 . The set of differentiable functions definitely forms a vector space, as the sum of any two
differentiable functions is also differentiable, and so is any constant multiple of a differentiable
function. This space of functions is a little tricky, in that it cannot be finite-dimensional because it
clearly contains the space of polynomials of any degree. It also contains functions that are
differentiable only one time, so the linear differential operator does not map this space into itself. We
can consider instead the space of functions that are differentiable infinitely many times. The
differential operator does map this space of functions into itself, so it can in some sense be thought of as
a matrix. In this light, we are looking for the set of solutions to the matrix equation
L[y] = 3.
The function y = 3 certainly solves this equation, as can easily be seen by inspection, but what is the
most general solution to this equation? It turns out that our single solution y = 3, called the particular
solution y_p, paves the way to finding this most general solution in a very simple way. Suppose that we
can find two solutions to this equation, y_1 and y_2. The difference between these solutions satisfies
L[y_1 - y_2] = L[y_1] - L[y_2] = 3 - 3 = 0
by the linearity of L. Therefore, the most general solution to this equation is given by the set of all
functions of the form y = y_p + y_H, where the homogeneous solutions y_H are elements of the null
space of the operator L: L[y_H] = 0. This property is common to all linear differential equations: the
general solution is obtained by finding a single solution and analyzing the null space of the differential
operator.
How can we analyze the null space of this differential operator? Well, it is useful to first obtain
a value for its dimension. The whole space is infinidimensional, but this need not be the case for the
null space of the differential operator. Suppose we have two linearly independent elements of the null
space, y_1 and y_2. Then, the equation c_1 y_1 + c_2 y_2 = 0 can be solved only by taking c_1 = c_2 = 0.
Taking this equation and its derivative, we obtain the set of equations
c_1 y_1 + c_2 y_2 = 0
c_1 y_1' + c_2 y_2' = 0.
If this set of equations is satisfied only for c_1 = c_2 = 0, then the determinant
\begin{vmatrix} y_1 & y_2 \\ y_1' & y_2' \end{vmatrix}
must be nonzero for at least one value of t. If it were always zero, then the system of equations would
admit multiple solutions for c_1 and c_2, and the functions would not be linearly independent. This
determinant is called the Wronskian of the functions y_1 and y_2, after the Polish mathematician Józef
Hoene-Wroński, and plays a very important role in the study of linear differential equations. Since
both of the functions y_1 and y_2 are presumed to be elements of the null space of L, it must be true
that
y_1' = -\frac{y_1}{t^2} \qquad \text{and} \qquad y_2' = -\frac{y_2}{t^2}.
This means that the Wronskian can be written as
W(y_1, y_2) = \begin{vmatrix} y_1 & y_2 \\ y_1' & y_2' \end{vmatrix}
= \begin{vmatrix} y_1 & y_2 \\ -y_1/t^2 & -y_2/t^2 \end{vmatrix}
= -\frac{1}{t^2} \begin{vmatrix} y_1 & y_2 \\ y_1 & y_2 \end{vmatrix} = 0,
where the last equality follows from the fact that two rows are repeated. Therefore, any two elements
of the null space of L must be linearly dependent and the dimension of the null space is less than 2. If
there are any nontrivial elements in the null space, then it has dimension 1.
In order to find a basis for this null space, we need to solve the differential equation
t^2 \frac{dy}{dt} + y = 0.
This is a separable equation, so it can easily be solved:
\frac{dy}{dt} = -\frac{y}{t^2}
\;\;\Longrightarrow\;\;
\frac{dy}{y} = -\frac{dt}{t^2}
\;\;\Longrightarrow\;\;
\ln|y| = \frac{1}{t} + C
\;\;\Longrightarrow\;\;
y = K e^{1/t}.
Here, K is an arbitrary constant. The basis for the null space is therefore \{ e^{1/t} \}, and the general
solution to our original equation is
y = 3 + K e^{1/t}.
In order to give a single solution, we require a condition to fix the value of K. If, for example, we are
given that y(1) = 0, then the unique solution to the equation and this initial condition is given by
y(t) = 3\left( 1 - e^{1/t - 1} \right).
This solution is plotted in figure 1.
Figure 1: the solution y(t) = 3(1 − e^{1/t − 1}).
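The same solution can be checked with a computer algebra system. The sketch below uses SymPy's dsolve with the initial condition y(1) = 0; the returned expression is equivalent to y(t) = 3(1 − e^{1/t − 1}).

    import sympy as sp

    t = sp.symbols('t')
    y = sp.Function('y')

    ode = sp.Eq(t**2 * y(t).diff(t) + y(t), 3)
    sol = sp.dsolve(ode, y(t), ics={y(1): 0})
    print(sp.simplify(sol.rhs))          # equivalent to 3 - 3*exp(1/t - 1)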
The above analysis applies to essentially every linear differential equation of any degree.
Higher order linear differential equations, involving higher order derivatives, are not as easily solved
as first order differential equations because we cannot simply separate the variables. For this reason, it
is very important to clearly understand the structure of the solutions. Consider the second-order
equation
a_2(t) \frac{d^2 y}{dt^2} + a_1(t) \frac{dy}{dt} + a_0(t)\, y = f(t),
where the functions a_0(t), a_1(t), and a_2(t) are sufficiently differentiable functions of t and f(t) is any
function of t, differentiable or not. Our analysis indicates that the general solution to such an equation
is given by y = y_p + y_H, where y_p is any solution to the equation and y_H is a general element of the
null space of the operator
L = a_2(t) \frac{d^2}{dt^2} + a_1(t) \frac{d}{dt} + a_0(t).
We can easily show that the dimension of the null space of this operator must be less than 3 by asking
whether or not there are nontrivial solutions c_1, c_2, and c_3 to the equation
c_1 y_1 + c_2 y_2 + c_3 y_3 = 0
for functions y_1, y_2, y_3 ∈ Nul(L). Taking the first and second derivatives of this expression, we see
that there will be nontrivial solutions if and only if the Wronskian
W(y_1, y_2, y_3) = \begin{vmatrix} y_1 & y_2 & y_3 \\ y_1' & y_2' & y_3' \\ y_1'' & y_2'' & y_3'' \end{vmatrix} = 0.
Since the functions are in the null space of L, we can also write the Wronskian as

W(y_1, y_2, y_3) = \begin{vmatrix} y_1 & y_2 & y_3 \\ y_1' & y_2' & y_3' \\ -\dfrac{a_1(t)}{a_2(t)} y_1' - \dfrac{a_0(t)}{a_2(t)} y_1 & -\dfrac{a_1(t)}{a_2(t)} y_2' - \dfrac{a_0(t)}{a_2(t)} y_2 & -\dfrac{a_1(t)}{a_2(t)} y_3' - \dfrac{a_0(t)}{a_2(t)} y_3 \end{vmatrix}
= -\frac{a_1(t)}{a_2(t)} \begin{vmatrix} y_1 & y_2 & y_3 \\ y_1' & y_2' & y_3' \\ y_1' & y_2' & y_3' \end{vmatrix}
- \frac{a_0(t)}{a_2(t)} \begin{vmatrix} y_1 & y_2 & y_3 \\ y_1' & y_2' & y_3' \\ y_1 & y_2 & y_3 \end{vmatrix}
= 0 + 0 = 0.
We are evaluating these expressions at a value of t for which a_2(t) ≠ 0. If there is no such value of t,
then the equation is really of first order. From this, we can be sure that any three elements of the null
space of L cannot be linearly independent. This does not imply that the dimension of the null space is
2, though, as it could be either 1 or 0.
The proof that the dimension of the null space is exactly 2 is a bit more involved, and we
cannot yet establish this fact in any easy way. We will have another way to prove this a bit later on,
but there is a very nice result we can derive right now that indicates the possibility that the dimension
of the vector space is 2. The Wronskian of any two elements of the null space can be determined
explicitly in a very neat way; it is given by
W(y_1, y_2) = \begin{vmatrix} y_1 & y_2 \\ y_1' & y_2' \end{vmatrix},
and its derivative is given by
\frac{dW}{dt} = \begin{vmatrix} y_1' & y_2' \\ y_1' & y_2' \end{vmatrix} + \begin{vmatrix} y_1 & y_2 \\ y_1'' & y_2'' \end{vmatrix}
= -\frac{a_1(t)}{a_2(t)} \begin{vmatrix} y_1 & y_2 \\ y_1' & y_2' \end{vmatrix}
= -\frac{a_1(t)}{a_2(t)}\, W.
We have used the fact that every contribution to the determinant contains exactly one factor from each
row, the fact that the determinant of any matrix containing two identical rows is zero, and the product
rule for derivatives in deriving this expression. The result is a first order linear differential equation
for the Wronskian that can easily be seen to have the solution
W(t) = K \exp\left( -\int_{t_0}^{t} \frac{a_1(\tau)}{a_2(\tau)}\, d\tau \right).

This Wronskian will be zero for all t only when K = 0, unless the function a_2(t) is identically zero.
Thus, the possibility of two linearly independent solutions is left open by this result. If the dimension
of the null space is less than 2, then the value of K can have no value other than zero. An analogous
analysis can be performed for any linear differential equation of order n. The results are that the
dimension of the null space is less than or equal to n, and the Wronskian of n elements is given by
W(t) = K \exp\left( -\int_{t_0}^{t} \frac{a_{n-1}(\tau)}{a_n(\tau)}\, d\tau \right).
It will turn out that there are, in essentially all cases, elements of the null space that satisfy this result
for the Wronskian with nonzero K, so the dimension of the null space is given by n. This means that n
independent conditions are required to specify a unique solution to the equation. These conditions are
known as initial conditions, and fix the vector in the null space associated with the solution we are
interested in.
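The Wronskian formula above (often called Abel's formula) is easy to verify on a concrete equation. The sketch below uses SymPy and the constant-coefficient equation y'' + 3y' + 2y = 0, whose solutions e^{−t} and e^{−2t} give W(t) = −e^{−3t}, a nonzero constant K times exp(−∫ a₁/a₂ dt) as predicted.

    import sympy as sp

    t = sp.symbols('t')
    y1, y2 = sp.exp(-t), sp.exp(-2 * t)          # independent solutions of y'' + 3y' + 2y = 0

    W = sp.simplify(y1 * y2.diff(t) - y2 * y1.diff(t))
    print(W)                                     # -exp(-3*t)
    print(sp.simplify(W / sp.exp(-3 * t)))       # the constant K; here a1/a2 = 3, so W = K*exp(-3t)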
Exercises for Section XI.5:

In problems 1 – 4, write the given polynomial in terms of the given new basis. You may use a
computer algebra system to determine the required inverse matrix, but you should make all of your
manipulations clear.

1. The basis is {1 + t², t² + t, t³ + 1, 1}, and p(t) = t³ + t².

2. The basis is {t⁴ + t, t³ + t², t² + 1, t³ + t, 2}, and p(t) = t⁴ + 1.

3. The basis is {t⁴ + 2t², t³ + t, t² + t, t⁴ + t, 3 + 2t}, and p(t) = t⁴ + t² + 1.

4. The basis is {t³ + t, t² + t, t + 1, 2t + 1}, and p(t) = t³ + 2t² + 3.

5. Determine the inverse of the transformation matrix
T = \begin{pmatrix} 0 & 0 & 3 & 2 & 4 \\ 0 & 2 & 1 & 1 & 0 \\ 1 & 0 & 0 & 0 & 3 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \end{pmatrix}
by hand. Use any method you deem efficient.
6. Find a set of transformations that map the spaces


2 1
1 3

2
3

V span , and U span 2 , 2

2
5

1 4

1 3
from one to the other. Show that the composite matrices act in the appropriate manner on the
basis vectors.
7. Find a set of transformations that map the spaces
3 2
3 1

1 3

,
V span
and U span 1 , 3

2
5

2 2

4 1
from one to the other. Show that the composite matrices act in the appropriate manner on the
basis vectors.
8. Determine whether the given set of polynomials constitutes a vector space. Explain your
answer. The numbers a and b represent any real number.
(a) Polynomials of degree 3
(b) Polynomials of degree less than 5.
(c) Polynomials of the form at 2 bt .
(d) Polynomials of the form at 3 t 2 b .


In problems 9–12, determine the Wronskian of the given functions. Simplify your result as much as
possible. Are the functions linearly independent or not? If not, find a relation between them.
9. sin t , cos t , 1

10. e 2t , e t , sin t

11. sin 2 t , cos 2t , 1

12. et , sinh t , cosh t

In problems 13 18, determine the Wronskian of 2 linearly independent solutions to the given
differential equation.
13.
y 3 y 2 y 0

14.
y 3t 2 y 2sin t y 0

15. t 2
y 2t y 3 y 0

y 4 y 3t 2 y 0
16. t 2

17. cos t
y 2sin t y 2e 2t y 0

18. t (1 t )
y 4 y 3t 2 y 0

Section XI.6: Eigenvalues and Eigenvectors

An important generalization of the idea of a null space is the idea of an eigenspace. Vectors in
the null space of an operator A satisfy $A|x\rangle = 0 = 0\,|x\rangle$, so we can legitimately ask whether or not there
are other numbers $\lambda$ for which $A|x\rangle = \lambda|x\rangle$. The vector $|0\rangle$ obviously satisfies this equation for all
numbers $\lambda$, so is disallowed from being a viable vector $|x\rangle$ in this equation. Only nontrivial vectors
can be admitted as viable solutions to this equation. If such vectors exist, then the matrix A behaves
like the number $\lambda$ when acting in these directions. The number $\lambda$ is called an eigenvalue of the
matrix A in this case, and the associated vector x is called an eigenvector of A. The prefix eigen- is
German for characteristic, so eigenvalues of A are numbers that are characteristic of the matrix A.
They represent numbers that the matrix can behave like in certain directions. Eigenvalues allow us to
employ all of the characteristics of numbers to matrices, and give us an easy way to generalize many
of the properties of numbers to matrices. The fact that a matrix can have more than one eigenvalue
means that it can behave like different numbers in different directions, giving the matrix a richer
spectrum of possibilities than a simple number.
In order to determine whether or not the matrix A has the eigenvalue $\lambda$, we re-arrange the
eigenvalue equation to obtain
$$A|x\rangle = \lambda|x\rangle \qquad\Longrightarrow\qquad (A - \lambda I)|x\rangle = 0.$$
Thus, if matrix A has eigenvalue $\lambda$, then the matrix $A - \lambda I$ must have a nontrivial null space. If this
is so, then its determinant must equal 0. In a finite-dimensional vector space, we can use this fact to
determine all of the possible eigenvalues of the matrix A. The determinant of an $n \times n$ matrix consists
of a sum of all possible products of a single element from each row and each column, so $\det(A - \lambda I)$
can always be re-arranged into a polynomial of degree n in $\lambda$. The Fundamental Theorem of Algebra
implies that this polynomial must have at least one zero in the complex plane and cannot have more
than n distinct zeros. Therefore, we can infer that every $n \times n$ matrix has at least one and not more
than n distinct eigenvalues in the complex plane. It may not be efficient to actually compute the
eigenvalues in this manner with a large matrix, but it is an easy way to determine their existence and
some of their properties.
As an example, consider the matrix
$$A = \begin{pmatrix} 5 & -7 \\ 2 & -4 \end{pmatrix}.$$
In order to determine the eigenvalues of this matrix, we need to find places where the determinant
$\det(A - \lambda I) = 0$. This determinant is given by
$$\det(A - \lambda I) = \begin{vmatrix} 5-\lambda & -7 \\ 2 & -4-\lambda \end{vmatrix} = (\lambda - 5)(\lambda + 4) + 14 = \lambda^2 - \lambda - 6 = (\lambda - 3)(\lambda + 2),$$
so the eigenvalues are $-2$ and 3. There can be no other eigenvalues of this matrix because there are no
other values of $\lambda$ that make the determinant zero. To find the eigenvectors associated with these
eigenvalues, we search for the null space associated with the matrix $A - \lambda I$ for these two values of $\lambda$.
This matrix must have a nontrivial null space for both of these values because its determinant is equal
to zero. To find these null spaces, we simply row reduce. For the eigenvalue $-2$, we have
$$A + 2I = \begin{pmatrix} 7 & -7 \\ 2 & -2 \end{pmatrix} \longrightarrow \begin{pmatrix} 1 & -1 \\ 0 & 0 \end{pmatrix} \qquad\Longrightarrow\qquad |v_1\rangle = \begin{pmatrix} 1 \\ 1 \end{pmatrix},$$
and for the eigenvalue 3 we have
$$A - 3I = \begin{pmatrix} 2 & -7 \\ 2 & -7 \end{pmatrix} \longrightarrow \begin{pmatrix} 2 & -7 \\ 0 & 0 \end{pmatrix} \qquad\Longrightarrow\qquad |v_2\rangle = \begin{pmatrix} 7 \\ 2 \end{pmatrix}.$$
These two vectors are linearly independent, so span the space $\mathbb{R}^2$. They form a basis of $\mathbb{R}^2$, and
every vector in $\mathbb{R}^2$ can be expressed as a linear combination of them. This basis is called the
eigenbasis of matrix A, and the action of this matrix on vectors in $\mathbb{R}^2$ can be understood very clearly
in terms of this basis. Given the vector
$$|x\rangle = c_1|v_1\rangle + c_2|v_2\rangle \in \mathbb{R}^2,$$
we can easily determine the following:
$$A|x\rangle = -2c_1|v_1\rangle + 3c_2|v_2\rangle, \qquad A^2|x\rangle = 4c_1|v_1\rangle + 9c_2|v_2\rangle, \qquad A^n|x\rangle = (-2)^n c_1|v_1\rangle + 3^n c_2|v_2\rangle.$$
Each of these matrices can be determined from A simply by multiplying repeatedly, but their action on
vectors in the space is much easier to determine in the eigenbasis. We can also determine more
complicated functions of the matrix A in the same manner:
$$A^{-1}|x\rangle = -\frac{1}{2}\,c_1|v_1\rangle + \frac{1}{3}\,c_2|v_2\rangle, \qquad e^{A}|x\rangle = e^{-2}c_1|v_1\rangle + e^{3}c_2|v_2\rangle.$$
These results are not easily obtained from the matrix A itself, but the matrix behaves like simple
numbers in its eigenbasis; any function that is defined for all of the eigenvalues can also be taken of
the full matrix A using this idea. When acting on an eigenvector, the matrix is its eigenvalue. Since
the whole space can be obtained in terms of the eigenvectors, these determinations actually represent
the action of these functions of the matrix on every vector in the space and can therefore be used to
define the function of the matrix.
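The following short sketch (Python with numpy, using the matrix and eigenvectors found above; the test vector is an arbitrary choice, not one from the text) illustrates this idea by comparing $A^n|x\rangle$ computed by repeated multiplication with the eigenbasis expression:

    import numpy as np

    # The example matrix from the text and its eigenvectors.
    A = np.array([[5.0, -7.0],
                  [2.0, -4.0]])
    v1 = np.array([1.0, 1.0])   # eigenvalue -2
    v2 = np.array([7.0, 2.0])   # eigenvalue  3

    x = np.array([3.0, -5.0])   # an arbitrary vector, for illustration only

    # Expansion coefficients x = c1 v1 + c2 v2, found by solving P c = x.
    P = np.column_stack([v1, v2])
    c1, c2 = np.linalg.solve(P, x)

    # A^n x computed two ways: repeated multiplication vs. the eigenbasis.
    n = 6
    direct = np.linalg.matrix_power(A, n) @ x
    via_eigenbasis = (-2.0) ** n * c1 * v1 + 3.0 ** n * c2 * v2
    print(np.allclose(direct, via_eigenbasis))   # True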
The matrix $e^A$ cannot be obtained directly from A in any conventional manner, so you may not
actually believe these results. To make this more concrete, let's try to determine a way to find a
representation of these functions of the matrix in terms of the matrix itself instead of an abstract idea of
how it behaves on the eigenbasis. Suppose that we can define a function f(A) of the matrix A. This
function gives
$$f(A)|x\rangle = f(\lambda_1)\,c_1|v_1\rangle + f(\lambda_2)\,c_2|v_2\rangle$$
in terms of the eigenbasis. This means that the action of the function f(A) is determined exclusively
by the two numbers $f(\lambda_1)$ and $f(\lambda_2)$. All we need to do is find a simple function of the matrix A
that gives the same value as $f(\lambda_1)$ and $f(\lambda_2)$ when acting on the eigenvectors. Since there are only
two of these numbers, this can be accomplished with a simple linear function $a + bA$. To determine
the values of a and b, we solve the system
$$f(\lambda_1) = a + b\lambda_1, \qquad f(\lambda_2) = a + b\lambda_2.$$
The solution gives the representation
$$f(A) = \frac{f(\lambda_1)\,\lambda_2 - f(\lambda_2)\,\lambda_1}{\lambda_2 - \lambda_1}\, I + \frac{f(\lambda_1) - f(\lambda_2)}{\lambda_1 - \lambda_2}\, A.$$
The matrices on either side of this equation give exactly the same result when acting on either of the
eigenvectors. Since these eigenvectors span the space, this implies that they give the same result when
acting on any vector in the space and are therefore indistinguishable. Thus, we can use this expression
to define the function of the matrix, whatever it is. In our case, we have
$$f(A) = \frac{3f(-2) + 2f(3)}{5}\, I + \frac{f(3) - f(-2)}{5}\, A.$$
To verify our result, we need a function that can also be obtained in a different manner. Consider the
function $f(A) = \sqrt{A + 6I}$. According to our analysis, this function is given by
$$\sqrt{A + 6I} = \frac{3\cdot 2 + 2\cdot 3}{5}\, I + \frac{3 - 2}{5}\, A = \frac{12}{5}\, I + \frac{1}{5}\, A = \frac{1}{5}\begin{pmatrix} 17 & -7 \\ 2 & 8 \end{pmatrix}.$$
To confirm this result, we square the matrix:
$$\left[\frac{1}{5}\begin{pmatrix} 17 & -7 \\ 2 & 8 \end{pmatrix}\right]^2 = \frac{1}{25}\begin{pmatrix} 17 & -7 \\ 2 & 8 \end{pmatrix}\begin{pmatrix} 17 & -7 \\ 2 & 8 \end{pmatrix} = \frac{1}{25}\begin{pmatrix} 275 & -175 \\ 50 & 50 \end{pmatrix} = \begin{pmatrix} 11 & -7 \\ 2 & 2 \end{pmatrix} = A + 6I.$$
This matrix behaves exactly as we would expect the square root to behave, so we are justified in
believing that this technique works. We can use it to determine any function of the matrix A that is
defined at both of the eigenvalues. It is not at all necessary for the function to even be defined
anywhere else, as these are the only two values that the matrix can behave like. The inverse $A^{-1}$ is an
example, as the associated function $1/\lambda$ is not defined at $\lambda = 0$. As an interesting side note, we determined
our square root matrix by taking $\sqrt{4} = 2$ and $\sqrt{9} = 3$. What if we had chosen $-2$ instead of 2? Try to
show for yourself that this gives another viable square root of A + 6I, the matrix A itself.
The above analysis can be done for any $n \times n$ matrix whose eigenvectors span the
space. With n eigenvectors, we require a polynomial of degree $n - 1$ to fit all of the n relevant values
of $f(\lambda)$, so the process is a bit more challenging computationally. A different process that allows us
to arrive at the same result using somewhat more conventional means is that of transforming to the
eigenbasis, taking the function of A, and transforming back to the original e-basis. This process is very
simple to implement on a computer, even with fairly large matrices. To begin, consider the matrix A
in its eigenbasis. In this basis, A is represented by a diagonal matrix whose entries along the diagonal
are its eigenvalues. To distinguish this matrix from its e-basis form, we will call it D for diagonal.
The action of D on an arbitrary vector $|x\rangle_B$ given in the eigenbasis (indicated by the subscript B) is
written $D|x\rangle_B$. To determine the e-basis form of D, we transform this vector to the e-basis using the
transformation P formed by taking the eigenvectors of A as columns: $PD|x\rangle_B$. Now, the e-basis
representation of D is the matrix that acts on a vector in the e-basis and takes it to another vector in the
e-basis. The output of PD is in the e-basis, but the input is still in the eigenbasis. We can modify this
by inserting the identity in the form $P^{-1}P$ between D and the vector $|x\rangle_B$. The action of P on the
vector $|x\rangle_B$ takes this vector to the e-basis, so the remaining matrix $PDP^{-1}$ takes vectors in the e-basis
to other vectors in the e-basis and must therefore equal A. The change of matrix A from its eigenbasis
to the e-basis is a similarity transformation. Such transformations are a subset of row operations, but
they preserve the eigenvalues of the matrix. Given any matrix A, the similar matrix $PAP^{-1}$ plays the
role of A in a different basis. The matrix P can be any invertible matrix; its specific form simply
determines the identity of the new basis. Since any two bases of a vector space are entirely equivalent
to each other, this similar matrix must behave in an exactly analogous manner as A. The only
difference is its appearance.
Suppose that the matrix A has eigenvalue $\lambda$ with associated eigenvector $|x\rangle$:
$$(A - \lambda I)|x\rangle = 0.$$
Multiplying by P on the left, we can write this as
$$P(A - \lambda I)|x\rangle = P(A - \lambda I)P^{-1}P|x\rangle = \left(PAP^{-1} - \lambda I\right)P|x\rangle = 0.$$
Thus, the matrix $PAP^{-1}$ has eigenvector $P|x\rangle$ with the same associated eigenvalue. This new
eigenvector is simply the old one expressed in the new basis. Powers of A are transformed in a similar
manner, as
$$\left(PAP^{-1}\right)^2 = PAP^{-1}PAP^{-1} = PA^2P^{-1}.$$

The similarity transformation therefore preserves all operations on matrix A; to determine the effect of
a given transformation in the new basis, we simply change the transformed matrix to the new basis. If
the matrix A has an eigenbasis, a basis consisting entirely of eigenvectors, then it will be diagonal
when expressed in this basis. This is somewhat obvious, and its validity has been assumed in the
n
above, but we can show it explicitly by taking the vectors v k k 1 as the eigenvectors. In this case, the
matrix
P v1 v 2 v n
and
AP A v1 Av 2 Av n 1 v1 2 v 2 n v n .
Writing
u1

u
P 1 2 ,


un
you can see by inspection that

$$P^{-1}AP = \begin{pmatrix} \langle u_1| \\ \langle u_2| \\ \vdots \\ \langle u_n| \end{pmatrix}\bigl(\,\lambda_1|v_1\rangle \;\; \lambda_2|v_2\rangle \;\; \cdots \;\; \lambda_n|v_n\rangle\,\bigr) = \begin{pmatrix} \lambda_1\langle u_1|v_1\rangle & \lambda_2\langle u_1|v_2\rangle & \cdots & \lambda_n\langle u_1|v_n\rangle \\ \lambda_1\langle u_2|v_1\rangle & \lambda_2\langle u_2|v_2\rangle & \cdots & \lambda_n\langle u_2|v_n\rangle \\ \vdots & \vdots & & \vdots \\ \lambda_1\langle u_n|v_1\rangle & \lambda_2\langle u_n|v_2\rangle & \cdots & \lambda_n\langle u_n|v_n\rangle \end{pmatrix} = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}.$$
These dot products indicate the result of matrix multiplication; it is clear from the fact that $P^{-1}P = I$
that any matched pairs of u and v give 1, while unmatched pairs give 0.
Functions of matrices are easily understood when the matrix is diagonal,
$$f(D) = \begin{pmatrix} f(\lambda_1) & 0 & \cdots & 0 \\ 0 & f(\lambda_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & f(\lambda_n) \end{pmatrix},$$
so the matrix f(A) can be determined simply by transforming this matrix back to another basis via
similarity transformation: $f(A) = P\,f(D)\,P^{-1}$. It is important to understand why the 0s in the
diagonal matrix are not changed by the function. These 0s indicate that the different eigenvectors are
not mixed with each other under the action of the matrix D: the action of D does not change one of the
eigenvectors to another; these vectors live in different worlds, and are unaffected by each other. The
matrix is its eigenvalue when acting on one of these directions, so the function of the matrix simply
returns the function of the eigenvalue in each of these directions. A modification of the off-diagonal
zeros would imply that these worlds are mixed together, which cannot occur using a simple numerical
function $f(\lambda)$. These off-diagonal elements can only be changed in a multidimensional
transformation like a similarity transform.


As an example of this technique, let's determine the function $\sqrt{A + 6I}$ using the matrix A
above. The similarity transform matrix P is given by
$$P = \begin{pmatrix} 1 & 7 \\ 1 & 2 \end{pmatrix}.$$
The function of the diagonal matrix is easy to calculate,
$$\sqrt{D + 6I} = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix},$$
so
$$\sqrt{A + 6I} = P\sqrt{D + 6I}\,P^{-1} = -\frac{1}{5}\begin{pmatrix} 1 & 7 \\ 1 & 2 \end{pmatrix}\begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}\begin{pmatrix} 2 & -7 \\ -1 & 1 \end{pmatrix} = \frac{1}{5}\begin{pmatrix} 17 & -7 \\ 2 & 8 \end{pmatrix},$$
exactly as before. The other viable square roots can easily be determined simply by using the
diagonals -2 and 3, -2 and -3, or 2 and -3 in the eigenspace matrix. This procedure can be performed
on matrices of any finite dimension, provided that we know the eigenvalues and eigenvectors and that
the eigenvectors span the space.
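A minimal numerical sketch of the same similarity-transformation recipe, assuming numpy is available (the matrices are those of the example above):

    import numpy as np

    A = np.array([[5.0, -7.0],
                  [2.0, -4.0]])
    P = np.array([[1.0, 7.0],
                  [1.0, 2.0]])        # eigenvectors of A as columns
    D = np.diag([-2.0, 3.0])          # A expressed in its eigenbasis

    # sqrt(A + 6I) via the similarity transformation P f(D) P^{-1}.
    sqrtD = np.diag(np.sqrt(np.diag(D) + 6.0))       # diag(2, 3)
    root = P @ sqrtD @ np.linalg.inv(P)

    print(root)                                       # (1/5) [[17, -7], [2, 8]]
    print(np.allclose(root @ root, A + 6 * np.eye(2)))   # True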
How do we know whether or not the eigenvectors span the space? Given the eigenvectors, we
can obviously row reduce the matrix A to see whether or not they are linearly independent, but this is a

lot of work. It turns out that we can make a very general statement about when the eigenvectors will
definitely span the space, so we often wont need to even verify that they do. This statement is,
Eigenvectors associated with distinct eigenvalues are linearly independent. To prove this, lets first
assume that it is not true, i.e. that there is a set of eigenvectors associated with distinct eigenvalues
that is linearly dependent. For such a set of r eigenvectors $\{|v_k\rangle\}_{k=1}^{r}$ associated with the set of distinct
eigenvalues $\{\lambda_k\}_{k=1}^{r}$, we have
$$c_1|v_1\rangle + c_2|v_2\rangle + \cdots + c_r|v_r\rangle = 0$$
with at least two of the c's not zero.174 Acting on this combination with matrix A, we have
$$\lambda_1 c_1|v_1\rangle + \lambda_2 c_2|v_2\rangle + \cdots + \lambda_r c_r|v_r\rangle = 0.$$
Multiplying the first equation by $\lambda_r$ and subtracting gives
$$(\lambda_1 - \lambda_r)c_1|v_1\rangle + (\lambda_2 - \lambda_r)c_2|v_2\rangle + \cdots + (\lambda_{r-1} - \lambda_r)c_{r-1}|v_{r-1}\rangle = 0.$$
Now, none of these additional factors is zero because the eigenvalues are distinct. Since not all of the
c's could be simultaneously zero, not all of the $(\lambda_k - \lambda_r)c_k$ are simultaneously zero. Again, we must
have at least two nonzero c's in order to make this expression valid. Therefore, we now have a
nontrivial linear combination of the $r - 1$ remaining eigenvectors that equals zero and these vectors
must be linearly dependent. Continuing with this line of reasoning, we can whittle down the number
of eigenvectors to only one. Since a single eigenvector cannot be linearly dependent by itself, this
process implies that our initial assumption is not valid, i.e. eigenvectors associated with distinct
eigenvalues are linearly independent. An $n \times n$ matrix with n distinct eigenvalues therefore is
associated with n linearly independent eigenvectors which must span the space. The only way to have
eigenvectors that don't span the space is if some of the eigenvalues are degenerate, or repeated.
As an example of such a matrix, consider
$$A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.$$
The characteristic polynomial of this matrix is
$$\begin{vmatrix} 1-\lambda & 1 \\ 0 & 1-\lambda \end{vmatrix} = (\lambda - 1)^2,$$
so it has the degenerate eigenvalue $\lambda = 1$, with multiplicity 2. Searching for the eigenvectors
associated with this eigenvalue, we investigate the null space of
$$A - I = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}.$$
There is only one linearly independent eigenvector,
$$|v_1\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix},$$
so all of $\mathbb{R}^2$ is not spanned by eigenvectors of A, and the matrix is not diagonalizable. This problem
of a degenerate eigenvalue having a null space that does not have the same dimension as the
multiplicity of the eigenvalue is a deep one, and requires the introduction of a new concept called a
generalized eigenvector to resolve. We can find the remaining part of $\mathbb{R}^2$ by appealing to our earlier
result that the whole space is equal to the null space of $A - I$, consisting of our eigenspace, combined
with the column space of $(A - I)^T$; it is spanned by the vector
174

It is not possible for only one of these coefficients to be nonzero, as that would imply that one of the eigenvectors is zero.


$$|u_1\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$
Now, the matrix $A - I$ cannot take this vector to a multiple of itself, since that would constitute an additional
eigenvalue and A has only one eigenvalue. It also cannot take this vector to 0 because that would
imply that it was an eigenvector of A with eigenvalue 1, and it is not. It must go somewhere, though,
and the only place available is the eigenspace of A:
$$(A - I)|u_1\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix} = |v_1\rangle.$$
Thus, although $|u_1\rangle$ is not an eigenvector of $A - I$, it is an eigenvector of $(A - I)^2$:
$$(A - I)^2|u_1\rangle = (A - I)|v_1\rangle = 0.$$
This is a special case of the celebrated Cayley-Hamilton theorem, named for the British
mathematician Arthur Cayley and the Irish physicist and mathematician William Hamilton, which
states that every finite-dimensional matrix satisfies its own characteristic polynomial; replacing $\lambda$
with A in the characteristic polynomial $p(\lambda) \to p(A)$ always gives the zero matrix. This theorem
implies that every vector space will be spanned by a combination of the eigenvectors and the
generalized eigenvectors $|u\rangle$. Although these generalized eigenvectors are not eigenvectors of A,
they can be chosen to be in the null space of $(A - \lambda I)^{m}$ for some eigenvalue $\lambda$, where m is the

multiplicity of the eigenvalue. There are several ways to prove this theorem, many of which are quite
involved. The simplest one that I have seen asks us to think of the eigenvalues of the matrix as
functions of the entries of the matrix:
$$\lambda_k = \lambda_k\left(A_{ij}\right),$$
where k runs from 1 to n. The characteristic polynomial is clearly given by
$$p(\lambda) = \prod_{k=1}^{n}\left[\lambda - \lambda_k\left(A_{ij}\right)\right].$$
Degenerate eigenvalues occur whenever one of these eigenvalue functions happens to coincide with
another for some specific choice of the matrix entries $A_{ij}$. Whenever the eigenvalues are distinct, the
eigenvectors span the space and it is clear that the matrix
$$p(A) = \prod_{k=1}^{n}\left[A - \lambda_k\left(A_{ij}\right) I\right]$$

annihilates all of the basis vectors associated with the eigenbasis (why?). Therefore, this matrix
transforms every vector in the space to the zero vector. This implies that p ( A) is the zero matrix
whenever all of the eigenvalues are distinct. Now, suppose that for some specific set of elements Aij
the eigenvalues become degenerate. Degenerate eigenvalues represent an extremely delicate
balancing act among the entries of the matrix A. Changing one of the elements even by a small
amount will break this degeneracy and make all of the eigenvalues distinct. Therefore, we must have
$$p(A + \varepsilon) = 0$$
for some choice of the matrix $\varepsilon$, no matter how small.175 On the other hand, the matrix p(A) is a
continuous function of the elements of A. It is complicated, for sure, and may not be differentiable in
these elements, but it certainly must be continuous. The definition of continuity therefore requires
175
There are other choices of $\varepsilon$ that still lead to degenerate eigenvalues, but there is certainly at least one choice that breaks
the degeneracy.


$$p(A) = \lim_{\varepsilon \to 0}\, p(A + \varepsilon) = 0.$$

It is not possible for the matrix p(A) to jump from 0 to nonzero when the eigenvalues happen to
align with each other because this situation is very delicate and can easily be destroyed. One way to
state this result is that $p(A) = 0$ almost always; no matter what the values of the entries of A are,
there are values arbitrarily close to these for which the matrix p(A) must equal zero because the
associated eigenvalues are distinct. Continuity of this matrix in the components of A, established by
the continuity of polynomials in their arguments and the continuity in the roots of a polynomial
thought of as functions of its coefficients, then demands that the matrix $p(A) = 0$ for all elements.
As an example of degenerate eigenvalues, consider the matrix
$$A = \begin{pmatrix} 2 & 1 & 1 \\ 0 & 3 & 1 \\ 0 & 0 & 3 \end{pmatrix}.$$
The eigenvalues of this matrix are 2, degeneracy 1, and 3, degeneracy 2, as can be seen by finding the
characteristic polynomial:
$$\det(A - \lambda I) = \begin{vmatrix} 2-\lambda & 1 & 1 \\ 0 & 3-\lambda & 1 \\ 0 & 0 & 3-\lambda \end{vmatrix} = (2 - \lambda)(3 - \lambda)^2.$$
We could also have read the eigenvalues directly off of this matrix, as it is upper triangular with zeros
everywhere below the main diagonal. The eigenvector with eigenvalue 2 is found by row reducing
$A - 2I$:
$$A - 2I = \begin{pmatrix} 0 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix} \longrightarrow \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix} \qquad\Longrightarrow\qquad |v_1\rangle = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}.$$
The eigenvector with eigenvalue 3 is found by row reducing $A - 3I$:
$$A - 3I = \begin{pmatrix} -1 & 1 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix} \longrightarrow \begin{pmatrix} 1 & -1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix} \qquad\Longrightarrow\qquad |v_2\rangle = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}.$$

There is no other solution, so the dimension of the eigenspace with eigenvalue 3 is only 1. It is less
than the multiplicity of this eigenvalue, so the eigenvectors do not span the space. This matrix is not
diagonalizable.
A third linearly independent vector in $\mathbb{R}^3$ is


$$|u\rangle = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.$$

The matrix A cannot take this vector to a multiple of itself because that would represent another eigenvector and this
is not possible. In fact, the matrix takes $|u\rangle$ to
$$A|u\rangle = \begin{pmatrix} 2 & 1 & 1 \\ 0 & 3 & 1 \\ 0 & 0 & 3 \end{pmatrix}\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 3 \end{pmatrix} = |v_2\rangle + 3|u\rangle.$$
The action of $A - 3I$ takes this vector to the eigenspace, so acting with $A - 2I$ and again with $A - 3I$
takes this vector, as well as the whole space, to zero in accordance with the Cayley-Hamilton theorem:
$$(A - 2I)(A - 3I)^2|x\rangle = 0 \qquad \forall\; |x\rangle \in \mathbb{R}^3.$$
Note that the vector $|u\rangle$ is trying to be an eigenvector. Instead of just giving $3|u\rangle$, the action of A on
$|u\rangle$ gives this plus a vector that is already in the eigenspace. This is characteristic of generalized
eigenvectors: there is always a choice of generalized eigenvector for which the action of the matrix
gives the degenerate eigenvalue times the generalized eigenvector plus a contribution that is already in
the eigenspace. If the multiplicity of the eigenvalue is larger than 2, then there may be another
generalized eigenvector that the matrix transforms to the degenerate eigenvalue times this new
generalized eigenvector plus some multiple of the first generalized eigenvector plus a combination that
is in the eigenspace. Generalized eigenvectors are, in this sense, kind of like zombies in a horror
film.176 Acting with $A - \lambda I$ once may not kill the vector, but it brings it closer to death. Repeatedly
acting with this matrix will always kill the vector eventually. The Cayley-Hamilton theorem assures
us that the vector will be killed once we have applied $A - \lambda I$ a number of times equal to the
degeneracy of the eigenvalue. Each action reduces the number of linearly independent vectors
available for the vector to be taken to, setting the vector up for its eventual death.
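The "zombie" behavior and the Cayley-Hamilton statement can be checked numerically for the example above; the sketch below (Python with numpy) is illustrative only:

    import numpy as np

    A = np.array([[2.0, 1.0, 1.0],
                  [0.0, 3.0, 1.0],
                  [0.0, 0.0, 3.0]])
    I = np.eye(3)
    u = np.array([0.0, 0.0, 1.0])   # the generalized eigenvector from the text

    # One application of (A - 3I) drops u into the eigenspace of eigenvalue 3;
    # a second application kills it.
    print((A - 3 * I) @ u)                              # (1, 1, 0), in the eigenspace
    print(np.linalg.matrix_power(A - 3 * I, 2) @ u)     # the zero vector

    # Cayley-Hamilton: (A - 2I)(A - 3I)^2 is the zero matrix.
    CH = (A - 2 * I) @ np.linalg.matrix_power(A - 3 * I, 2)
    print(np.allclose(CH, np.zeros((3, 3))))            # True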
It is not always necessary to consider generalized eigenvectors. The Cayley-Hamilton theorem
states only that acting with the matrix $A - \lambda I$ a number of times equal to the degeneracy of $\lambda$ will
definitely annihilate the vector, not that we need to act this many times. Consider, for example, the
matrix
$$A = \begin{pmatrix} 14 & 0 & -4 \\ 1 & 12 & -2 \\ 4 & 0 & 4 \end{pmatrix}.$$
Its eigenvalues can be found via the characteristic polynomial:
$$\det(A - \lambda I) = \begin{vmatrix} 14-\lambda & 0 & -4 \\ 1 & 12-\lambda & -2 \\ 4 & 0 & 4-\lambda \end{vmatrix} = (12-\lambda)\left[(\lambda - 4)(\lambda - 14) + 16\right] = (12-\lambda)\left(\lambda^2 - 18\lambda + 72\right) = (12-\lambda)(\lambda - 12)(\lambda - 6).$$
Clearly, they are 6 with degeneracy 1 and 12 with degeneracy 2. The eigenvector associated with
eigenvalue 6 is

176

I must attribute this wonderful analogy to Prof. Robert Sachs at George Mason University.


$$|v_1\rangle = \begin{pmatrix} 2 \\ 1 \\ 4 \end{pmatrix},$$
while row reduction associated with eigenvalue 12 leads to
$$A - 12I = \begin{pmatrix} 2 & 0 & -4 \\ 1 & 0 & -2 \\ 4 & 0 & -8 \end{pmatrix} \longrightarrow \begin{pmatrix} 1 & 0 & -2 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$
This matrix clearly has a null space of dimension 2, so there are two linearly independent eigenvectors
associated with eigenvalue 12:
$$|v_2\rangle = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}; \qquad |v_3\rangle = \begin{pmatrix} 2 \\ 0 \\ 1 \end{pmatrix}.$$


Thus, the eigenspace of this matrix does span the whole space and there is no problem at all with the
degeneracy of this eigenvalue. Degenerate eigenvalues may lead to eigenvectors that do not span the
space, but they do not have to. We will see later that there is a broad class of matrices for which we
can be sure that the eigenvectors will span the space, regardless of whether or not some of them are
degenerate. The bottom line with degenerate eigenvalues is that all bets are off. Anything can happen,
and we need to carefully consider the eigenspace associated with any degenerate matrix to see which
class it falls into.
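In practice this check amounts to comparing the algebraic multiplicity of a degenerate eigenvalue with the dimension of its eigenspace. The sketch below (Python with numpy; the helper name geometric_multiplicity is purely illustrative, and the second matrix uses the sign conventions reconstructed above) performs this comparison for the two examples of this section:

    import numpy as np

    def geometric_multiplicity(A, lam, tol=1e-9):
        """Dimension of the eigenspace of A for eigenvalue lam, via rank-nullity."""
        n = A.shape[0]
        rank = np.linalg.matrix_rank(A - lam * np.eye(n), tol=tol)
        return n - rank

    # Degenerate eigenvalue 3 of the first example: only one eigenvector.
    A1 = np.array([[2.0, 1.0, 1.0], [0.0, 3.0, 1.0], [0.0, 0.0, 3.0]])
    print(geometric_multiplicity(A1, 3.0))    # 1  (algebraic multiplicity is 2)

    # Degenerate eigenvalue 12 of the second example: a full two-dimensional eigenspace.
    A2 = np.array([[14.0, 0.0, -4.0], [1.0, 12.0, -2.0], [4.0, 0.0, 4.0]])
    print(geometric_multiplicity(A2, 12.0))   # 2  (matches its multiplicity)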
Before moving on, it will be useful for us to consider a more complicated example of finding
eigenvalues. Consider the matrix
$$A = \begin{pmatrix} 87 & 33 & 22 \\ 10 & 68 & -20 \\ 4 & -12 & 90 \end{pmatrix},$$
and suppose that we are guaranteed that its eigenvalues are integers. How can we determine these
eigenvalues? The eigenvalues are the values of $\lambda$ that make the determinant $\det(A - \lambda I)$ zero. We
can do any row or column reduction we like without altering this property, so we write
$$\det(A - \lambda I) = \begin{vmatrix} 87-\lambda & 33 & 22 \\ 10 & 68-\lambda & -20 \\ 4 & -12 & 90-\lambda \end{vmatrix} \;\xrightarrow{\;C_3 \to 3C_3 - 2C_2\;}\; \begin{vmatrix} 87-\lambda & 33 & 0 \\ 10 & 68-\lambda & 2\lambda - 196 \\ 4 & -12 & 294 - 3\lambda \end{vmatrix}$$
$$\xrightarrow{\;R_3 \to 2R_3 + 3R_2\;}\; \begin{vmatrix} 87-\lambda & 33 & 0 \\ 10 & 68-\lambda & 2\lambda - 196 \\ 38 & 180 - 3\lambda & 0 \end{vmatrix} = -2(\lambda - 98)\left[(87-\lambda)(180 - 3\lambda) - 1254\right] = -6(\lambda - 98)\left(\lambda^2 - 147\lambda + 4802\right) = -6(\lambda - 98)^2(\lambda - 49).$$
The characteristic polynomial we obtained is larger by a factor of 6 than the original determinant, but
this does not matter as we are only interested in the zeros. The eigenvalues are clearly given by 98, 98,
and 49. The eigenvectors can be found with a lot less work once the eigenvalues are known; the result
is


$$|x_1\rangle = \begin{pmatrix} 2 \\ 0 \\ 1 \end{pmatrix}, \qquad |x_2\rangle = \begin{pmatrix} 3 \\ 1 \\ 0 \end{pmatrix}, \qquad\text{and}\qquad |x_3\rangle = \begin{pmatrix} 11 \\ -10 \\ -4 \end{pmatrix},$$

where the first two are associated with the eigenvalue 98 and the last is associated with eigenvalue 49.
These eigenvectors clearly span the space, despite the appearance of a degenerate eigenvalue. This is a
very specialized technique, and can only be applied with success to matrices whose eigenvalues have
been guaranteed to be simple. Otherwise, the fortunate cancellation in the last entry of the third
row would not have occurred. If we know that the eigenvalues will be simple, however, then obtaining
them simply requires a deft manipulation of the required determinant. Some examples are given in the
exercises.
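For readers who want to confirm the result numerically, a brief sketch (Python with numpy; the off-diagonal signs of the matrix are as reconstructed above):

    import numpy as np

    # The matrix whose eigenvalues were found above to be 98, 98, and 49.
    A = np.array([[87.0, 33.0, 22.0],
                  [10.0, 68.0, -20.0],
                  [4.0, -12.0, 90.0]])

    vals, vecs = np.linalg.eig(A)
    print(np.sort(vals))                     # approximately [49., 98., 98.]

    # Check the eigenvector quoted for the eigenvalue 49.
    x3 = np.array([11.0, -10.0, -4.0])
    print(np.allclose(A @ x3, 49.0 * x3))    # True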
One of the applications of eigenvalues concerns the steady state vectors associated with a
matrix. Consider the population of two small towns in close proximity to each other. We can define a
migration matrix which indicates the percent of each town that moves to the other town during a given
year. Suppose that two towns have the migration matrix
$$M = \begin{pmatrix} 0.9 & 0.2 \\ 0.1 & 0.8 \end{pmatrix}.$$
This matrix indicates that, in a given year, 10% of the first town will move to the second and 20% of
the second town will move to the first. The diagonal elements give the percent of the town that
remains where it is, 90% for the first town and 80% for the second. The interpretation of this matrix
requires the sum of each of the columns to give 100%, or 1, as everybody has to go somewhere. Given
an initial population distribution of 30% in the first town and 70% in the second, the initial population
is represented by
$$|x_0\rangle = \begin{pmatrix} 0.3 \\ 0.7 \end{pmatrix}.$$
The population distribution of the towns after one year is given by
$$|x_1\rangle = M|x_0\rangle = \begin{pmatrix} 0.9 & 0.2 \\ 0.1 & 0.8 \end{pmatrix}\begin{pmatrix} 0.3 \\ 0.7 \end{pmatrix} = \begin{pmatrix} 0.41 \\ 0.59 \end{pmatrix},$$
so 41% of the total population lives in the first town and the other 59% lives in the second after one
year. After two years, the population becomes 48.7% in town 1 and 51.3% in town 2. What can we
expect the populations to become after many years?
To answer this question, we need to examine the effect of acting with matrix M many times on
the initial vector. Obviously, this determination is most easily accomplished in the eigenbasis. The
eigenvalues of this matrix are determined by the characteristic polynomial
$$(0.9 - \lambda)(0.8 - \lambda) - 0.02 = (\lambda - 1)(\lambda - 0.7),$$
so one of the eigenvalues is 1 and the other is 0.7. Writing the initial population vector in terms of the
eigenvectors, we have
$$|x_0\rangle = c_1|v_1\rangle + c_2|v_2\rangle \qquad\Longrightarrow\qquad M^n|x_0\rangle = c_1|v_1\rangle + (0.7)^n c_2|v_2\rangle.$$
After many years, the contribution from the second eigenvector is suppressed immensely by its
coefficient. Unless $c_1 = 0$, the population will certainly line up with $|v_1\rangle$ after a very long time. For
this reason, the population distribution associated with the eigenvector v1 is called an asymptotic or
stable fixed point of the migration. Essentially all initial populations will eventually end up at this
distribution after a long time. The eigenvectors associated with the migration matrix are

$$|v_1\rangle = \begin{pmatrix} 2 \\ 1 \end{pmatrix} \qquad\text{and}\qquad |v_2\rangle = \begin{pmatrix} 1 \\ -1 \end{pmatrix},$$
so the eigenvector $|v_2\rangle$ isn't viable as a population distribution and all initial populations will
eventually end up at the stable fixed point associated with town 1 having twice the population of town
2. Normalizing this eigenvector so that the sum of the components equals 1, we have 66.7% in town 1
and 33.3% in town 2 after a long period of time.
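A short numerical sketch of this convergence (Python with numpy; illustrative only):

    import numpy as np

    M = np.array([[0.9, 0.2],
                  [0.1, 0.8]])
    x = np.array([0.3, 0.7])          # initial population distribution

    for year in (1, 2, 10, 50):
        print(year, np.linalg.matrix_power(M, year) @ x)

    # The iterates approach the normalized eigenvector with eigenvalue 1,
    # (2/3, 1/3), regardless of the (viable) starting distribution.
    print(np.linalg.matrix_power(M, 200) @ x)   # approximately [0.6667, 0.3333]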
The above results are not at all specific to this example. All matrices with columns whose
entries each sum to 1 will have 1 as an eigenvalue because subtracting 1 from each of the diagonal
elements gives a matrix whose rows sum to zero. The rows are therefore linearly dependent and the
matrix M I must have a nontrivial null space. If, in addition to this, the entries of matrix M are all
positive, then the other eigenvalues will all be less than 1 in magnitude and the eigenvectors associated
with these other eigenvalues will not represent viable population distributions. Therefore, the
population after a long time is given by that associated with the eigenvector with eigenvalue 1 to a
very good approximation. One way to understand why this is true in two dimensions is to use the idea
of trace. The trace of a matrix is the sum of its diagonal elements,
$$\operatorname{Tr}(A) = \sum_{k=1}^{n} A_{kk} = A_{kk}.$$

The last equality emphasizes the agreement among mathematicians and scientists that repeated indices
are understood to be summed over, called the Einstein summation convention. There is no need to
write out the sum explicitly. It is a standard result that the trace of a product of matrices is cyclic, i.e.
$$\operatorname{Tr}(ABC) = \operatorname{Tr}(CAB)$$
for all matrices A, B, and C. This fact can be established trivially using index notation:
$$\operatorname{Tr}(ABC) = A_{ij}B_{jk}C_{ki} = C_{ki}A_{ij}B_{jk} = \operatorname{Tr}(CAB).$$
We are using Einsteins summation convention in this expression; the repeated indices i, j and k are
implicitly summed over all possible values. If the matrix A is diagonalizable, or similar to a diagonal
matrix, then its trace is given by
$$\operatorname{Tr}(A) = \operatorname{Tr}\left(PDP^{-1}\right) = \operatorname{Tr}\left(P^{-1}PD\right) = \operatorname{Tr}(D).$$
Put in other words, the trace of a diagonalizable matrix is equal to the sum of its eigenvalues. Note
that matrices that are not diagonalizable also must satisfy this relation for reasons similar to those used
above to prove the Cayley-Hamilton theorem. We can also use this idea to show that the determinant
of a diagonal matrix is equal to the product of its eigenvalues. Try to show this. It presents an
alternate and more concise, though less useful, definition of the determinant. One of the eigenvalues
of a $2 \times 2$ migration matrix is 1, so the other must be given by $\lambda_2 = \operatorname{Tr}(M) - 1$. The elements of a
migration matrix are all less than or at best equal to 1, so the second eigenvalue lies between $-1$ and 1.
Its magnitude is therefore less than or at best equal to 1, so the repeated action of the migration matrix
always takes us to the steady-state eigenvector with eigenvalue 1. Subtracting the matrix $\lambda_2 I$ from M
leaves us with the diagonal element
$$M_{11} - \bigl[\operatorname{Tr}(M) - 1\bigr] = 1 - M_{22} \geq 0$$
in the top left position, and the element
$$M_{22} - \bigl[\operatorname{Tr}(M) - 1\bigr] = 1 - M_{11} \geq 0$$
in the bottom right. Both of these elements are positive, so the associated eigenvector must have
components with opposite sign. This makes it a nonviable population density, so the only accessible
density is that associated with the eigenvector associated with eigenvalue 1. The only way to have a
non-diagonalizable migration matrix is to have both of the diagonal elements equal to 1, in which case

the migration matrix is simply the identity; no one moves at all. In this case, all vectors are
eigenvectors with eigenvalue 1 and all are stable under repeated application of the migration matrix.
The general proof is accomplished in a simpler manner: if the columns of A sum to 1, then it is
easy to verify that the sum of the entries of $A|x\rangle$ is the same as that of the entries of $|x\rangle$. This
implies that the entries of eigenvectors associated with eigenvalues not equal to 1 must sum to zero, so
viable populations must be dominated by the eigenvector with eigenvalue 1; if all of the elements of
this eigenvector do not have the same sign, then there can be no viable populations for this matrix.
It is interesting to explore the repeated action of matrices on a given vector. It is clear from the
above that a migration matrix takes all vectors to the eigenvector with eigenvalue 1, but how quickly
does this happen? We can plot the results of various repeated transformations to see their result. It is
obvious that the progression of these products depends exclusively on the values of the eigenvalues of
the matrix. Thankfully, we can easily create $2 \times 2$ matrices with any set of eigenvalues using the fact
that the trace of the matrix is equal to the sum of the eigenvalues and the determinant is equal to their
product. To make a matrix whose eigenvalues are 3 and 1, we simply arrange the elements in such a
way that the trace is 4 and the determinant is 3. One example is
$$M = \begin{pmatrix} 5 & -8 \\ 1 & -1 \end{pmatrix}.$$
If both of the eigenvalues are less than 1, as is the case with the matrix $M/4$, then all solutions will
approach zero as the matrix is applied over and over again. In this case, the origin is called an
attractor, or a sink, because all of the solutions approach it as the iterations continue. This situation is
shown in figure 2, where the eigendirections are indicated with solid lines and the vector trajectories
are indicated with the dashed lines. Since one of the eigenvalues is smaller than the other in
magnitude, the solutions approach the eigenvector associated with this eigenvalue as the number of
iterations is increased. The matrix 2M has both eigenvalues greater than one, so the origin is a
repellor, or a source, and all solutions leave it behind as the number of iterations increases. The
associated diagram is exactly the same as that shown in figure 2, with the iterations reversed. If one of
the eigenvalues is less than one and the other is greater than 1, then the solutions skirt the origin as
time goes on. This situation is shown for the matrix $M/2$ in figure 3. Note that the solutions go from
one of the eigendirections to the other as the number of iterations is increased. This situation is called
a saddle point because the vectors first go toward the origin and then away from it as the time is
increased. Again, the less steep eigenvector associated with eigenvalue 3 is approached more quickly
than the other eigenvector.

[Figure 2]

[Figure 3]

In the above examples, both of the eigenvalues are real. The fact that the Fundamental
Theorem of Algebra does not guarantee the existence of a real eigenvector requires us to also consider
the case of complex eigenvalues, even if all of the elements of the matrix are real. As an example,
consider a matrix with eigenvalues $3 \pm i$. Using our trick, we can easily construct such a matrix:
$$A = \begin{pmatrix} 4 & -2 \\ 1 & 2 \end{pmatrix}.$$
While this matrix clearly has all real entries, neither its eigenvalues nor its eigenvectors can be real.
To determine the eigenvectors, we use the same procedure as that used above:
$$A - (3 - i)I = \begin{pmatrix} 1+i & -2 \\ 1 & -1+i \end{pmatrix} \;\xrightarrow{\;R_1 \to R_1 - (1+i)R_2\;}\; \begin{pmatrix} 0 & 0 \\ 1 & -1+i \end{pmatrix}.$$
Therefore, the eigenvector associated with eigenvalue $3 - i$ is given by
$$|v_1\rangle = \begin{pmatrix} 1-i \\ 1 \end{pmatrix}.$$
This change-over from real to complex eigenvalues requires us to consider the field of complex
numbers rather than real numbers in order to determine the eigenvector. It is not consistent to work
exclusively with real numbers multiplying vectors if some of the eigenvalues are complex, as can
immediately be seen directly from the definition of an eigenvector: if $A|v\rangle = \lambda|v\rangle$ with $\lambda \notin \mathbb{R}$, then
we obviously need to allow scalar multiplication with complex numbers. Otherwise, it would not at all
be clear what is meant by $\lambda|v\rangle$. With this seemingly minor change, everything works in exactly the
same manner. The algebra of complex numbers and the fact that the matrix itself contains only real
entries implies that the eigenvector associated with eigenvalue 3 + i is given by the complex conjugate
of this eigenvector:
$$|v_1^{*}\rangle = \begin{pmatrix} 1+i \\ 1 \end{pmatrix}.$$
These two eigenvectors are indeed linearly independent, as implied by the fact that the eigenvalues are
distinct, so the space $\mathbb{C}^2$ is spanned by our eigenvectors.
What happens to a real vector that is acted on repeatedly by this matrix? The elements of the
matrix are all real, so the vectors that arise from this operation must also be real. There is no migration
out of $\mathbb{R}^2$, so we can plot the results on a set of two axes exactly as was done before. In this case,
however, the steady state eigenvectors do not belong to the space we are plotting. All real vectors
must change their direction under the action of A. This type of phenomenon is associated with
rotation, as no vectors can remain unchanged under a rotation. Thus, we see that complex eigenvalues
are an indication that the matrix rotates all of the vectors in $\mathbb{R}^2$. If the modulus of the eigenvalues is
larger than 1, then the rotation is also associated with an outward migration away from the origin. The
vectors spiral out from the origin as the matrix is applied repeatedly. This effect is illustrated in figure
4. Complex eigenvalues whose modulus is smaller than 1 represent the opposite effect: vectors spiral
in toward the origin as the matrix is applied repeatedly. Eigenvalues whose modulus is exactly 1 cause
the vectors to move around the origin in an ellipse, neither moving away from nor toward the origin on
average as the matrix is repeatedly applied.



[Figure 4]
Rotation is obviously important to many physical systems. Weather, ecosystems, and
migration patterns all show some degree of cyclic variation through time, and these cyclic patterns
cannot be represented by real eigenvalues. For this reason, it is important to understand how complex
eigenvalues work and what they represent in order to treat such systems. We have already seen this
behavior, as the trigonometric functions sine and cosine are intimately related to the exponential
functions with complex argument. This is the fundamental reason why complex numbers cannot be
ignored in a serious study of physical systems. Contrary to popular belief, complex numbers are
certainly not unphysical. They play a major role in the study of many physical systems, and it would
be detrimental to the analysis of such systems to simply ignore them. Besides, why would we ignore
such great graphs?!
We can also use the idea of eigenvectors to give us insight into the reason why the dimension
of the null space of a second-order differential operator is exactly 2 instead of just less than or equal to
2. Consider the differential equation
$$a_2(t)\,\frac{d^2y}{dt^2} + a_1(t)\,\frac{dy}{dt} + a_0(t)\,y = 0.$$
We can re-write this equation as the system of first-order differential equations
$$\frac{dy}{dt} = z, \qquad \frac{dz}{dt} = -\frac{a_1(t)}{a_2(t)}\,z - \frac{a_0(t)}{a_2(t)}\,y,$$
or
$$\frac{d}{dt}\begin{pmatrix} y \\ z \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ -\dfrac{a_0(t)}{a_2(t)} & -\dfrac{a_1(t)}{a_2(t)} \end{pmatrix}\begin{pmatrix} y \\ z \end{pmatrix}.$$
If the functions a(t) are constant, then the matrix is constant and the equation can easily be solved:
$$\begin{pmatrix} y(t) \\ z(t) \end{pmatrix} = e^{tA}\begin{pmatrix} y(0) \\ z(0) \end{pmatrix}.$$
Here, the exponential of the matrix tA is most easily understood in terms of the eigenvalues of A. If
these are distinct, then the eigenvectors span the space $\mathbb{R}^2$ and it is possible to write any set of initial
conditions in terms of the eigenvectors:
$$|x(0)\rangle = \begin{pmatrix} y(0) \\ z(0) \end{pmatrix} = c_1|v_1\rangle + c_2|v_2\rangle.$$
In this case, the solution is given as
$$|x(t)\rangle = \begin{pmatrix} y(t) \\ z(t) \end{pmatrix} = e^{tA}|x(0)\rangle = c_1 e^{\lambda_1 t}|v_1\rangle + c_2 e^{\lambda_2 t}|v_2\rangle.$$
These two solutions are clearly linearly independent whenever the eigenvalues are distinct. Matrices
for which the eigenvalues are not distinct can always be considered as limits of those for which the
eigenvalues are distinct, as distinct eigenvalues are the norm. The case of degenerate eigenvalues is
very special, and requires an extremely delicate balance of the contributions to the matrix. This
process can obviously be followed for a differential equation with constant coefficients of any degree,
implying that the dimension of the null space of a linear differential operator of order n is equal to the
maximum number of degenerate eigenvalues associated with an $n \times n$ matrix, n.
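As an illustration of this reduction, the sketch below (Python with numpy; the equation $y'' + 3y' + 2y = 0$ and its initial conditions are chosen only as an example) builds $e^{tA}$ from the eigenbasis of the companion matrix and compares the result with the known closed-form solution:

    import numpy as np

    # y'' + 3 y' + 2 y = 0 rewritten as d/dt (y, z) = A (y, z) with z = y'.
    A = np.array([[0.0, 1.0],
                  [-2.0, -3.0]])
    x0 = np.array([1.0, 0.0])         # y(0) = 1, y'(0) = 0

    vals, P = np.linalg.eig(A)        # eigenvalues -1 and -2

    def propagate(t):
        # e^{tA} built from the eigenbasis: P exp(tD) P^{-1}, applied to x0.
        return P @ np.diag(np.exp(vals * t)) @ np.linalg.inv(P) @ x0

    t = 1.5
    y_numeric = propagate(t)[0]
    y_exact = 2.0 * np.exp(-t) - np.exp(-2.0 * t)   # solution with these initial conditions
    print(np.isclose(y_numeric, y_exact))            # True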
We can also understand this result from a more pragmatic perspective. A linear differential
equation of order n must, in some sense, be integrated n times to obtain the solution. Each of these
integrations comes with an arbitrary constant that must be specified in order to obtain a unique
solution. These specifications are given by the initial conditions, which are arbitrary. The general
solution must contain n independent parameters and therefore represent an n-dimensional space. It is
clear from this line of reasoning that linear differential equations with variable coefficients a(t) must
also have solution spaces that are of dimension n. The solution of these equations is not as simple as
those with constant coefficients, but we can still formally write the solution to the equation
$$\frac{d}{dt}\,|x(t)\rangle = A(t)\,|x(t)\rangle$$
as
$$|x(t)\rangle = \left[\, I + \int_0^t dt_1\,A(t_1) + \int_0^t dt_1\,A(t_1)\int_0^{t_1} dt_2\,A(t_2) + \int_0^t dt_1\,A(t_1)\int_0^{t_1} dt_2\,A(t_2)\int_0^{t_2} dt_3\,A(t_3) + \cdots \right]|x(0)\rangle$$
$$\qquad = T\exp\left(\int_0^t A(\tau)\,d\tau\right)|x(0)\rangle.$$

You can verify for yourself that this solution indeed satisfies the differential equation simply by
plugging it in. The T in the last expression is the time-ordering operator. It orders the integrations
so that the smaller values of t always lie to the right of the larger values, exactly as written in the
explicit expression above. This is required because the matrix A(t) may not commute with itself at
different values of t. You can show for yourself that this expression gives the exponential exp(t A)
when A is constant. The time-ordering restriction generates the factorial in the expansion of the
exponential. This solution is called formal because we have not shown that it converges. Whenever
it does, however, it represents n linearly independent solutions associated with the n independent
contributions to the vector x(0) . This technique of changing a linear differential equation of order n

into a system of n first-order differential equations is extremely useful, both in practice and in proving
theorems associated with the behavior of the solutions. I will not go into more detail about this here,
as the discussion closely mirrors that given above in terms of the migration matrices, but you should
be aware that this is one of the most prominent uses of linear algebra especially the use of
eigenvectors and eigenvalues.


Exercises for Section XI.6:


In problems 1–8, determine the eigenvalues and associated eigenvectors of the given matrix. Are the
sum and the product of the eigenvalues correctly represented in the trace and determinant? Do the
eigenvectors span the appropriate space? Explain.
1 2
2 3
5 2
2 3
2.
3.
4.
1.

3 6
1 0
8 3
4 3
0 1
5.

1 2

3 2
6.

1 4

3 1 0

7. 1 3 0
0 0 2

3 1 0

8. 1 3 0
0 1 4

9. Show that the trace of a product of two matrices is a commutative operation. That is, it does not
depend on the order in which the matrices are multiplied. Verify that this is true with the two
matrices
$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \qquad\text{and}\qquad B = \begin{pmatrix} e & f \\ g & h \end{pmatrix}$$
by direct multiplication. How is this verification different from that presented in the text?
Which method is simpler? Explain.
15 11
10. Consider the matrix A
.
1
3
(a) Find the eigenvalues of this matrix. Can you determine whether or not the eigenvectors will
span the space 2 without calculating them? How?
(b) Find the eigenvectors of this matrix.
(c) Use a similarity transformation to diagonalize this matrix, if possible. If it is not possible,
explain why. Show that your transformation works.
(d) Determine all possible matrices that can be considered as A 3I . Show that your
solutions indeed satisfy this requirement by squaring and comparing to A 3I .
3 4
11. Consider the matrix A
.
1 0
(a) Find the eigenvalues of this matrix. Can you determine whether or not the eigenvectors will
span the space 2 without calculating them? How?
(b) Find the eigenvectors of this matrix.
(c) Use a similarity transformation to diagonalize this matrix, if possible. If it is not possible,
explain why. Show that your transformation works.
(d) Determine all possible matrices that can be considered as $\sqrt{A + 5I}$. Show that your
solutions indeed satisfy this requirement by squaring and comparing to A + 5I.


5 2
12. Consider the matrix A
.
2 1
(a) Find the eigenvalues of this matrix. Can you determine whether or not the eigenvectors will
span the space 2 without calculating them? How?
(b) Find the eigenvectors of this matrix. Do they span the space?
(c) Use a similarity transformation to diagonalize this matrix, if possible. If it is not possible,
explain why. Show that your transformation works.
(d) Determine a matrix that can be considered as $\sqrt{A + I}$. Show that your solution satisfies this
requirement by squaring and comparing to A + I.

21 5 2

13. Consider the matrix A 3 19 2 .


6 10 20

(a) Find the eigenvalues of this matrix. Can you determine whether or not the eigenvectors will
span the space 3 without calculating them? How?
(b) Find the eigenvectors of this matrix.
(c) Use a similarity transformation to diagonalize this matrix, if possible. If it is not possible,
explain why. Show that your transformation works.
(d) Determine all possible matrices that can be considered as A 8I . Show that your
solutions indeed satisfy this requirement by squaring and comparing to A 8I .
23 11 25

14. Consider the matrix A 16 12 25 .


20 10 25

(a) Find the eigenvalues of this matrix. Can you determine whether or not the eigenvectors will
span the space 3 without calculating them? How?
(b) Find the eigenvectors of this matrix.
(c) Use a similarity transformation to diagonalize this matrix, if possible. If it is not possible,
explain why. Show that your transformation works.
(d) Determine all possible matrices that can be considered as A 5 2 I . Show that your
solutions indeed satisfy this requirement by squaring and comparing to A 5 2 I .

24 5 21

15. Consider the matrix A 8 30 24 .


18 9 37

(a) Find the eigenvalues of this matrix. Can you determine whether or not the eigenvectors will
span the space 3 without calculating them? How?
(b) Find the eigenvectors of this matrix.
(c) Use a similarity transformation to diagonalize this matrix, if possible. If it is not possible,
explain why. Show that your transformation works.
(d) Determine all possible matrices that can be considered as $\sqrt{A^2}$. Note that there are 8
different matrices that play this role, only one of which is given by A. Show that your
solutions indeed satisfy this requirement by squaring and comparing to $A^2$.

3 5
16. Consider the matrix A
.
2 4
(a) Find the eigenvalues and eigenvectors of this matrix, and show that the eigenvectors span
the space.
2
(b) Transform the vector x to the eigenbasis, and use this result to determine the value
7
5
of A x .
5 9
17. Consider the matrix A
.
2 4
(a) Find the eigenvalues and eigenvectors of this matrix, and show that the eigenvectors span
the space.
2
(b) Transform the vector x to the eigenbasis, and use this result to determine the value
7
of A5 x .

4 6 2

18. Consider the matrix A 0 11 1 .


0 9 5

(a) Show that the eigenvectors of this matrix do not span the space.
(b) Find a generalized eigenvector to include with the eigenbasis in a basis for the space 3 .
(c) Show that the action of A I , where is the degenerate eigenvalue, takes the generalized
eigenvector to the eigenspace.
(d) Verify the result of the Cayley-Hamilton theorem.
22 17 12

3 .
19. Consider the matrix A 4 8
32 31 18

(a) Show that the eigenvectors of this matrix do not span the space.
(b) Find a generalized eigenvector to include with the eigenbasis in a basis for the space 3 .
(c) Show that the action of A I , where is the degenerate eigenvalue, takes the generalized
eigenvector to the eigenspace.
(d) Verify the result of the Cayley-Hamilton theorem.
20. Given the migration matrix $M = \begin{pmatrix} 0.3 & 0.8 \\ 0.7 & 0.2 \end{pmatrix}$ and the initial population $P_0 = \begin{pmatrix} 700 \\ 2000 \end{pmatrix}$, determine
the population after 1 year, 5 years, and many years.
21. Given the migration matrix $M = \begin{pmatrix} 0.8 & 0.6 \\ 0.2 & 0.4 \end{pmatrix}$ and the initial population $P_0 = \begin{pmatrix} 300 \\ 5000 \end{pmatrix}$, determine
the population after 1 year, 5 years, and many years.

22. Solve the system


dx
4x 2 y
dt
,
dy
3 x 3 y
dt
subject to the initial conditions x (0) 1 ; y (0) 3 . Show a phase plane portrait of your
solution. What happens to the solution as t grows without bound? Can you identify the result in
terms of one of the eigenvectors?

23. Solve the system


dx
6 x 3 y
dt
,
dy
5x 2 y
dt
subject to the initial conditions x (0) 2 ; y (0) 5 . Show a phase plane portrait of your
solution. What happens to the solution as t grows without bound? Can you identify the result in
terms of one of the eigenvectors?

24. Solve the system


dx
x 9 y
dt
,
dy
x y
dt
subject to the initial conditions x(0) 2 ; y (0) 3 . Show a phase plane portrait of your
solution. What happens to the solution as t grows without bound? Can you identify the result in
terms of one of the eigenvectors?

25. Solve the system


dx
3 x 13 y
dt
,
dy
x 3y
dt
subject to the initial conditions x(0) 3 ; y (0) 5 . Show a phase plane portrait of your
solution. What happens to the solution as t grows without bound? Can you identify the result in
terms of one of the eigenvectors?
26. Show that the time-ordering treatment given in the text formally satisfies the required
differential equation, and give some indication of what is required for convergence of the
resulting series. Then, show that it leads to the series expansion for the exponential function if
the matrix is constant. Indicate specifically how and why the factorials are generated. Why
can't we simply use the exponential for matrices that are not constant? Are there any matrices
that are not constant, but that we can easily re-sum the series for? Are there other, constant,
matrices for which the infinite series is required? Explain, and include examples.


Section XI.7: Inner Product Spaces


Linear algebra is broadly applicable to many different physical theories and models, but the
spaces that find application in these models are usually also endowed with an additional operation that
is not required by the definition of a vector space, that of the inner product. There are many ways to
understand the idea of an inner product. The most rigid and rigorous of these are somewhat obscure
and not easily understood, but the basic idea is quite simple. Given a vector space V, we define an
inner product space by associating this vector space with another vector space V of the same
dimension. This space is called the adjoint, or dual, space. Since the two spaces have the same
dimension, each element of V can be associated with a unique element of V . To distinguish
elements of it from those of the original vector space, we write the element of V associated with the
element x of V as x . This allows us to simultaneously emphasize both the fact that x V and
that it is associated with x V . The inner product is defined as a bilinear (linear in both the element
of V and that of V ) operator that maps the element x V and y V to the element y x of the
field associated with V and V . In our case, this field will always be either the real numbers or the
complex numbers. In addition to its bilinear property, this operation must satisfy the property
*
y x x y 177 and the positivity requirement x x 0 for all x V , with equality obtained only
for the element x 0 .
You are already familiar with one of the most commonly used inner products, the dot product.
This product clearly satisfies all of the requirements of an inner product, as it is linear in both vectors,
exchanging the two vectors has no effect on its value,178 and the dot product of a vector with itself can
only be zero if the vector itself is zero. In analogy with this product, the two vectors $|x\rangle$ and $|y\rangle$ are
said to be orthogonal whenever $\langle y|x\rangle = 0$. One of the most useful results to come from inner product
spaces is the simplicity of determining the coefficients of an expansion of any vector in terms of an
orthogonal basis. Suppose that we are lucky enough to be given a basis $\{|v_k\rangle\}_{k=1}^{n}$ of vector space V
that satisfies
$$\langle v_k | v_j \rangle = 0 \qquad\text{whenever}\qquad k \neq j.$$
Such a basis is called orthogonal because each of the basis vectors is orthogonal to all of the others.
It is easy to express an arbitrary vector $|x\rangle \in V$ in terms of an orthogonal basis because the coefficients
can be determined simply by forming the inner product with one of the basis vectors:
$$|x\rangle = c_1|v_1\rangle + c_2|v_2\rangle + c_3|v_3\rangle + \cdots + c_n|v_n\rangle \qquad\Longrightarrow\qquad \langle v_k|x\rangle = c_k\langle v_k|v_k\rangle \qquad\Longrightarrow\qquad c_k = \frac{\langle v_k|x\rangle}{\langle v_k|v_k\rangle}.$$
As an example, consider the basis
$$\left\{ \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix},\; \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix},\; \begin{pmatrix} 1 \\ 1 \\ -2 \end{pmatrix} \right\}$$
of $\mathbb{R}^3$. To express the vector

177 The asterisk denotes complex conjugation.
178 The complex conjugate is meaningless when all quantities are real.


$$|x\rangle = \begin{pmatrix} 3 \\ 2 \\ -3 \end{pmatrix}$$
in this basis, we need only perform a few inner products:
$$|x\rangle = \frac{2}{3}\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} + \frac{1}{2}\begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix} + \frac{11}{6}\begin{pmatrix} 1 \\ 1 \\ -2 \end{pmatrix}.$$
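The same coefficients can be obtained in one line each with an orthogonal basis; a minimal sketch (Python with numpy, using the basis and vector of this example):

    import numpy as np

    # The orthogonal (but not normalized) basis and the vector from the example.
    v1 = np.array([1.0, 1.0, 1.0])
    v2 = np.array([1.0, -1.0, 0.0])
    v3 = np.array([1.0, 1.0, -2.0])
    x = np.array([3.0, 2.0, -3.0])

    # c_k = <v_k|x> / <v_k|v_k>; no matrix inversion is required.
    coeffs = [np.dot(v, x) / np.dot(v, v) for v in (v1, v2, v3)]
    print(coeffs)                     # [2/3, 1/2, 11/6]
    print(np.allclose(coeffs[0] * v1 + coeffs[1] * v2 + coeffs[2] * v3, x))   # True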




This process is much more efficient than determining the inverse of a $3 \times 3$ matrix and then
performing the multiplications required to determine the coefficients in the earlier manner. For this
reason, applications of inner product spaces usually call for an orthogonal basis. This is especially
important for infinidimensional vector spaces, as row reducing an infinite-by-infinite matrix is an
arduous task. To simplify the process further, one often opts for a basis that is not only orthogonal but
also consists of vectors that have been normalized so that the inner product of any of the basis vectors
with itself is equal to 1. Such an orthonormal basis, $\{|u_k\rangle\}_{k=1}^{n}$, satisfies $\langle u_j|u_k\rangle = \delta_{jk}$, where the
Kronecker delta symbol $\delta_{jk} = 1$ when j = k and 0 when $j \neq k$.179 The coefficients of the vector $|x\rangle$ in
an orthonormal basis are given by
$$|x\rangle = \sum_{k=1}^{n} c_k|u_k\rangle; \qquad c_k = \langle u_k|x\rangle.$$

Suppose we are given a basis for the vector space V and want to determine an orthonormal
basis for the same space. This can be accomplished in a number of ways, but the most straightforward
is the Gram-Schmidt process, named for the Danish mathematician Jorgen Gram and the German
mathematician Erhard Schmidt. The Gram-Schmidt process begins with an arbitrary basis for V, call it
$\{|b_k\rangle\}_{k=1}^{n}$. The orthonormal basis is determined by choosing one of the basis vectors $|b_k\rangle$ to start with,
then iteratively defining the orthonormal basis vectors $|u_k\rangle$ by forcing them to be orthogonal to all of
the preceding vectors. Starting with $|b_1\rangle$, we define

$$|u_1\rangle = \frac{1}{\sqrt{\langle b_1|b_1\rangle}}\,|b_1\rangle.$$
The second vector must be orthogonal to this one, so we take
$$|v_2\rangle = |b_2\rangle - |u_1\rangle\langle u_1|b_2\rangle \qquad\Longrightarrow\qquad |u_2\rangle = \frac{1}{\sqrt{\langle v_2|v_2\rangle}}\,|v_2\rangle.$$
See if you can show that these vectors are orthogonal to $|u_1\rangle$. The third vector must be orthogonal to
both $|u_1\rangle$ and $|u_2\rangle$, so we define
$$|v_3\rangle = |b_3\rangle - |u_1\rangle\langle u_1|b_3\rangle - |u_2\rangle\langle u_2|b_3\rangle \qquad\Longrightarrow\qquad |u_3\rangle = \frac{1}{\sqrt{\langle v_3|v_3\rangle}}\,|v_3\rangle.$$
In general, the kth vector in the orthonormal basis is defined via

179
This symbol is named for the German mathematician Leopold Kronecker, who introduced it in the latter half of the
nineteenth century.

590

Section XI.7: Inner Product Spaces


k 1

vk bk u j bk u j

j 1

uk

1
vk vk

vk .
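Before working through the numerical example below by hand, it may help to see the iteration as a short routine. The sketch below (Python with numpy; the function name gram_schmidt and the small three-dimensional test set are illustrative choices, not taken from the text) implements exactly this recipe and drops any vector whose remainder is numerically zero:

    import numpy as np

    def gram_schmidt(vectors, tol=1e-12):
        """Return an orthonormal basis for the span of `vectors`; dependent inputs are dropped."""
        basis = []
        for b in vectors:
            v = b.astype(float).copy()
            for u in basis:
                v -= u * np.dot(u, v)      # remove the component along each earlier direction
            norm = np.linalg.norm(v)
            if norm > tol:                  # a (near-)zero remainder signals linear dependence
                basis.append(v / norm)
        return basis

    # Illustrative set in R^3: the third vector is the sum of the first two.
    b = [np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0]), np.array([2.0, 1.0, 1.0])]
    ortho = gram_schmidt(b)
    print(len(ortho))                       # 2: the dependent vector was traded in for nothing

Projecting out of the running remainder v (rather than out of the original b) is the so-called modified Gram-Schmidt ordering, which tends to be less sensitive to the round-off problems mentioned at the end of this section.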

This process is not pretty, but it is guaranteed to work. As an example, consider the vector space
2 0 3 7

3 2 1 1
1 3 1 10
V span , , ,
.
1 4 2 12
0 1 5 8


2 2 1 9
Our first vector in the orthonormal set is
2

3
1 1
u1
.
19 1
0

2
The second is determined by first subtracting to determine v2 , then normalizing:

29
1 60
v2
;
19 73
19

44
The third is determined in the same way:
229

61
1 167
v3
;
91 132
437

171
Computing the fourth, we find

29
1 60
u2

.
7 247 73
19

44
229

61
167
1
u3

.
7 6565 132
437

171

$$|v_4\rangle = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}.$$


The Gram-Schmidt process is kind of like playing cards. Each new vector is traded in for
another that contains the same information but is orthogonal to the previous ones. This result for $|v_4\rangle$
indicates that the fourth vector isn't worth anything. Removing all of the parts of $|b_4\rangle$ that contain the
same information as the vectors u1 , u2 , and u3 leaves us with nothing, indicating that the original
vectors were linearly dependent. It is impossible to have mutually orthogonal vectors that are linearly
dependent, as can be shown directly from the definition of linear dependence (show this), so our
attempt to force these linearly dependent vectors to be mutually orthogonal leads to zero.
The Gram-Schmidt process is obviously not an easy path. Each step brings larger numbers,
despite the fact that the vectors we started with were fairly simple. This is generally the case with this
process. Remember that it never promised to be pretty; it promises only to work. The Gram-Schmidt
process is not unique, as it will give different results if we start with a different vector. It does not
guarantee the simplest orthonormal basis, only an orthonormal basis. Column reducing the vectors
spanning the space often leads to a simpler orthonormal basis, but large numbers are essentially
unavoidable when the dimension of the space is larger than three or four.
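Since the procedure is entirely mechanical, it is also straightforward to automate. The following Python/numpy sketch is one possible implementation of the iteration described above; the three vectors fed to it are a made-up illustration (not the six-dimensional example of the text), chosen so that the last vector is the sum of the first two and therefore gets traded in for the zero vector, just as $|v_4\rangle$ did above.

    import numpy as np

    def gram_schmidt(vectors, tol=1e-12):
        """Orthonormalize 'vectors' one at a time.  A vector whose remainder
        is (numerically) zero is linearly dependent on its predecessors and
        is simply dropped."""
        basis = []
        for b in vectors:
            v = np.asarray(b, dtype=float)
            for u in basis:
                v = v - u * (u @ v)      # remove the component along each earlier u
            norm = np.linalg.norm(v)
            if norm > tol:
                basis.append(v / norm)
        return basis

    # Made-up example: the third vector is the sum of the first two.
    vecs = [(1, 1, 0), (1, 0, 1), (2, 1, 1)]
    ortho = gram_schmidt(vecs)
    print(len(ortho))                                    # 2 independent directions
    print(np.round(np.array([[u @ w for w in ortho] for u in ortho]), 12))  # identity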
The rest of $\mathbb{R}^6$ that is not contained in V can be spanned by many vectors, but it is often useful to have a basis for this remaining part of the larger space that is orthogonal to V. This remaining space is called the orthogonal complement of V, and denoted $V^{\perp}$. Calling the matrix whose columns are given by the vectors spanning V matrix P, it can be seen by inspection that this space is given by $V^{\perp} = \mathrm{Nul}\,P^{T}$. This is an artifact of our earlier result that
$$\mathrm{col}\,P \oplus \mathrm{Nul}\,P^{T} = \mathbb{R}^6\,.$$
Combining our orthonormal basis of V with an orthonormal basis for $\mathrm{Nul}\,P^{T}$ gives an orthonormal basis for $\mathbb{R}^6$ that respects the space V. A basis for $\mathrm{Nul}\,P^{T}$ can be obtained by row reducing $P^{T}$:
$$P^{T} = \begin{pmatrix} 2 & 3 & 1 & 1 & 0 & 2\\ 0 & 2 & 3 & 4 & 1 & 2\\ 3 & 1 & 1 & 2 & 5 & 1\\ 7 & 1 & 10 & 12 & 8 & 9 \end{pmatrix}.$$
Carrying out the row reduction (the operations $R_3 \to 2R_3 - 3R_1$, $R_4 \to 2R_4 - 7R_1$, $R_3 \to 2R_3 - 7R_2$, $R_4 \to 2R_4 - 19R_2$, $R_4 \to R_4 - R_3$, followed by the back-substitution steps $R_1 \to 2R_1 - 3R_2$, $R_2 \to 31R_2 - 3R_3$, $R_1 \to 31R_1 - 11R_3$, and $R_2 \to R_2/2$) brings $P^{T}$ to the reduced form
$$\begin{pmatrix} 31 & 0 & 0 & 39 & 59 & 5\\ 0 & 31 & 0 & 41 & 35 & 14\\ 0 & 0 & 31 & 14 & 13 & 30\\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}.$$
A basis for the null space is therefore given by


$$\left\{\begin{pmatrix}39\\41\\14\\31\\0\\0\end{pmatrix},\ \begin{pmatrix}59\\35\\13\\0\\31\\0\end{pmatrix},\ \begin{pmatrix}5\\14\\30\\0\\0\\31\end{pmatrix}\right\}.$$
Choosing the simplest of these to start with, we obtain the orthonormal basis

$$\frac{1}{\sqrt{2082}}\begin{pmatrix}5\\14\\30\\0\\0\\31\end{pmatrix},\qquad \frac{1}{\sqrt{23{,}332{,}974}}\begin{pmatrix}3773\\1820\\264\\0\\2082\\1175\end{pmatrix},\qquad \frac{1}{\sqrt{5{,}659{,}535}}\begin{pmatrix}88\\1088\\851\\1601\\1039\\318\end{pmatrix}\quad\text{(whew!)}.$$

These vectors are all orthogonal not only to each other but to every vector in V. Again, they are not
pretty (to say the least!), but they do work. You can check for yourself that these vectors are indeed
orthogonal to each other and to each of the vectors in any basis of V.
This decomposition of $\mathbb{R}^6$ into an orthonormal basis that respects V is similar to the familiar $\{\hat{\imath}, \hat{\jmath}, \hat{k}\}$ basis of $\mathbb{R}^3$. This basis respects the xy plane in that the third direction is orthogonal to all of the vectors in this plane. Similarly, our new basis of $\mathbb{R}^6$ splits the space into vectors that lie in V and
those that are orthogonal to this space. This decomposition of the larger vector space is very useful in
many applications, as it may be that certain forces only act in the vector space V. In this case, we
need to separate V in a substantive manner from the rest of the space. The most efficient way to do
this is to obtain a basis that respects this subspace.
While simple in concept, the Gram-Schmidt process is very involved computationally. It is also quite unforgiving; a mistake in even one element of the analysis, even a small one, can result in one having to re-do the entire process. Even small numerical errors, like round-offs in the sixth digit, can ultimately lead to incorrect results if the number of vectors we are trying to orthogonalize is large. Mathematica has a command, Orthogonalize, that executes this process exactly, but even this is not viable when the number of vectors is large. There are streamlined
versions of the Gram-Schmidt process that minimize the errors associated with round-offs so that
computers can obtain correct results in many cases, but the process is still extremely complicated.
Thankfully, there is usually an alternate method that one can use to obtain an orthonormal basis for the
vector spaces that are most useful in applications. It takes a while to develop, but the ultimate result is
extremely beneficial and allows us to avoid Gram-Schmidt almost always.
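One widely used streamlined alternative is to orthogonalize through a QR factorization rather than through the step-by-step iteration; numpy exposes this as numpy.linalg.qr. The sketch below is only an illustration of that kind of numerically stable routine, not the text's own method; the random matrix simply stands in for a set of spanning vectors.

    import numpy as np

    # Columns of A span the subspace V; np.linalg.qr returns Q whose columns
    # are an orthonormal basis for the same column space.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((6, 3))

    Q, R = np.linalg.qr(A)             # reduced QR: Q is 6x3, R is 3x3
    print(np.allclose(Q.T @ Q, np.eye(3)))      # columns are orthonormal
    print(np.allclose(Q @ R, A))                # A = QR, so the span is unchanged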
To develop this new idea, we need to examine the relationship transformation matrices have
with the inner product. In order to obtain the most general results that we will need in our analysis
(and that needed in most applications), we must first understand the behavior of vector spaces that are
associated with the complex field rather than just the real numbers. Consider the inner product between the vector $|y\rangle \in V$ and the vector $|x\rangle = c_1 |v_1\rangle + c_2 |v_2\rangle \in V$, where $c_1$ and $c_2$ are arbitrary complex numbers. One of the properties of the inner product states that the complex conjugate of this inner product is given by
$$\langle y | x\rangle^{*} = \langle x | y\rangle = \bigl(c_1 \langle y | v_1\rangle + c_2 \langle y | v_2\rangle\bigr)^{*} = c_1^{*}\,\langle v_1 | y\rangle + c_2^{*}\,\langle v_2 | y\rangle\,,$$

where I have used the linearity of the inner product and its behavior under complex conjugation. This implies that the adjoint, or dual, of $|x\rangle$ is given by $\langle x| = c_1^{*}\langle v_1| + c_2^{*}\langle v_2|$, so coefficients undergo complex conjugation under the transition from V to its dual space. This change guarantees that the inner product of a vector with itself will be zero only if the vector is zero, something that would not occur if we did not conjugate the coefficients (consider the inner product of the vector $(1, i)^{T}$ with itself, for example). This is the reason why this condition was imposed on the inner
product in the first place. There is no effect on vector spaces that are associated with the real number
field, of course, so this effect can be ignored when we are interested only in real numbers. As we have
seen, however, the inclusion of complex numbers is sometimes unavoidable. For this reason, it is
important to keep the complex conjugation of coefficients in mind when thinking about dual vector
spaces.
What happens to the vector $A|x\rangle$ when it is taken through the looking glass to the adjoint space? We define the adjoint of matrix A, denoted $A^{\dagger}$, as the matrix for which
$$\langle x | A | y\rangle = \langle A^{\dagger} x \,|\, y\rangle$$
for all vectors $|y\rangle \in V$. When looking at the inner product $\langle x | A | y\rangle$, the matrix A is said to be sandwiched between $\langle x|$ and $|y\rangle$. The adjoint matrix is the matrix that gives the same value for this inner product when it is considered to be acting backwards on $\langle x|$ instead of forward on $|y\rangle$. The idea of a matrix acting backwards on a vector in V is new, and only applies to inner product spaces. Since $A|y\rangle$ is a vector in V, we can take its inner product with the adjoint of the vector $|x\rangle$. The matrix $A^{\dagger}$ takes the vector $|x\rangle \in V$ to the appropriate place in V to make the inner product unchanged. The inner product allows us to consider the dual vector $\langle x | A$. The vector in V associated with this dual vector is $A^{\dagger}|x\rangle$, defining the adjoint matrix. This general definition will be extremely useful to us when we consider infinidimensional vector spaces which cannot be mapped either to $\mathbb{R}^n$ or $\mathbb{C}^n$
for any n. When working in finite-dimensional vector spaces, we can find an even more useful way to
determine the matrix $A^{\dagger}$ from a given matrix A. Since the dimension of the vector space is finite, we can think of it as $\mathbb{C}^n$ (remember the isomorphism!). In this case, the vector $|x\rangle$ can be written in terms of the e-basis and the matrix A has components $A_{jk}$. If the vector $|y\rangle$ has components $y_k$ and the vector $|x\rangle$ has components $x_j$, then
$$\langle x | A | y\rangle = x_j^{*} A_{jk}\, y_k \qquad\text{and}\qquad \langle x | A = x_j^{*} A_{jk}\,.$$
We are using Einstein's summation convention here, which is an agreement to sum over every repeated index. In the first case, the indices j and k are summed over, while in the second case only the index j is summed over. Taking this last vector back into V by complex conjugating (note that this relationship follows from the definition of the inner product), we have
$$A^{\dagger}|x\rangle = A_{jk}^{*}\, x_j = \bigl(A^{*T}\bigr)_{kj}\, x_j\,.$$
Thus, we arrive at the conclusion that the adjoint of A is the same as the complex conjugate of its transpose. Both of these properties are easily understood from the definition of the adjoint. The transpose comes because the adjoint is associated with the matrix acting backwards: instead of

multiplying rows of A with the column $|x\rangle$, we multiply the columns of A by the row $\langle x|$. The complex conjugation is associated with the transition from V to its dual.
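This conclusion is easy to test numerically. The following sketch (Python/numpy, with a randomly generated matrix and vectors standing in for A, $|x\rangle$, and $|y\rangle$) checks that the conjugate transpose really does satisfy the defining property $\langle x | A y\rangle = \langle A^{\dagger}x | y\rangle$.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
    y = rng.standard_normal(3) + 1j * rng.standard_normal(3)

    A_dag = A.conj().T                  # adjoint: complex conjugate of the transpose

    # <x | A y>  versus  <A^dagger x | y>, with <a|b> = sum of conj(a_i) b_i
    lhs = np.vdot(x, A @ y)             # np.vdot conjugates its first argument
    rhs = np.vdot(A_dag @ x, y)
    print(np.allclose(lhs, rhs))        # True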
Suppose now that A is self-adjoint, $A^{\dagger} = A$. When all of the elements of A are real, we call such a matrix symmetric. Otherwise, the term Hermitian is often used, especially in applications associated with physics. This title honors the French mathematician Charles Hermite, who made many contributions to linear algebra, especially in infinidimensional vector spaces. As with all finite-dimensional matrices, A has eigenvalues and at least one eigenvector for each distinct one. The first nice result associated with Hermitian matrices comes immediately from the definition of the terms eigenvalue and eigenvector, as well as the properties of the inner product. If
$$A|x\rangle = \lambda |x\rangle\,,$$
then
$$\langle x | A | x\rangle = \lambda \langle x | x\rangle\,.$$
Complex conjugation of this equation gives
$$\lambda^{*} \langle x | x\rangle = \langle x | A | x\rangle^{*} = \langle x | A^{\dagger} | x\rangle = \langle x | A | x\rangle = \lambda \langle x | x\rangle\,.$$
Thus, all of the eigenvalues of self-adjoint matrices are necessarily real. Further, consider two eigenvectors of the self-adjoint matrix A. If
$$A|x_1\rangle = \lambda_1 |x_1\rangle \qquad\text{and}\qquad A|x_2\rangle = \lambda_2 |x_2\rangle\,,$$
then
$$\lambda_1 \langle x_2 | x_1\rangle = \langle x_2 | A | x_1\rangle = \langle x_1 | A | x_2\rangle^{*} = \bigl(\lambda_2 \langle x_1 | x_2\rangle\bigr)^{*} = \lambda_2^{*} \langle x_2 | x_1\rangle = \lambda_2 \langle x_2 | x_1\rangle\,.$$
Therefore,
$$\bigl(\lambda_1 - \lambda_2\bigr)\,\langle x_2 | x_1\rangle = 0\,,$$
so either $\lambda_1 = \lambda_2$ or $\langle x_2 | x_1\rangle = 0$. This implies that eigenvectors associated with distinct eigenvalues are
automatically orthogonal.
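Both conclusions, real eigenvalues and orthogonal eigenvectors, can be confirmed numerically for any particular Hermitian matrix. The sketch below uses a made-up $3 \times 3$ Hermitian matrix and numpy's eigh routine, which is specialized to Hermitian (or real symmetric) input.

    import numpy as np

    # A made-up Hermitian matrix: real diagonal, conjugate-symmetric off-diagonals.
    H = np.array([[2.0, 1 - 1j, 0.0],
                  [1 + 1j, 3.0, 2j],
                  [0.0, -2j, 1.0]])
    assert np.allclose(H, H.conj().T)

    vals, vecs = np.linalg.eigh(H)      # eigh is built for Hermitian matrices
    print(vals)                                            # all real
    print(np.allclose(vecs.conj().T @ vecs, np.eye(3)))    # orthonormal eigenvectors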
If two eigenvalues are degenerate, or equal, then the above result has no impact. However, we
can use an earlier result to give us insight into the behavior of degenerate eigenvalues. Let us suppose
that the eigenvectors of A do not span the space. We can, if we like, use the Gram-Schmidt procedure
to find an orthogonal basis for the whole space V on which A acts. Since eigenvectors associated with
distinct eigenvalues are automatically orthogonal, we can choose this basis in such a way that the first
r vectors are eigenvectors and the rest are not. The number r is the dimension of the eigenspace of A,
which must be greater than or equal to the number of distinct eigenvalues of A. We can call this
eigenspace of A the space W. Consider the action of A on the other basis vectors, which span a
subspace of V that we can call U. Given a vector $|x\rangle \in W$ and another vector $|y\rangle \in U$, it is clear that
$$\langle x | A | y\rangle = \langle A^{\dagger}x \,|\, y\rangle = \langle A x \,|\, y\rangle = \lambda \langle x | y\rangle = 0\,.$$

Therefore, A maps the space U to itself. It must do so in a one-to-one manner, as there can be no
nontrivial intersection between U and Nul(A) (why?). This means that the matrix A can be
reformulated as a one-to-one and onto (why?) mapping from U to itself. As with all such mappings in
finite dimensions, it must have at least one eigenvalue and at least one eigenvector. But this is impossible, as all of the eigenvectors of A are elements of W and not of U. U therefore contains only the zero vector, and the eigenvectors of A span the space.
This result is strikingly different from that seen in the general case. Recall that, while
eigenvectors associated with distinct eigenvalues are always linearly independent, degenerate
eigenvalues may have less eigenvectors than the multiplicity of the eigenvalue. This causes part of the
space not to be spanned by the eigenvectors. An example of this situation is the matrix

$$M = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix},$$
which has the eigenvalue 1 with multiplicity 2, but its only eigenvector is
$$|v\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}.$$
We can proceed as argued above, taking the vector
$$|u\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$$
as a basis for the part of the space U that does not belong to the eigenspace of M. This vector is clearly orthogonal to the eigenspace, so our construction is exactly as explained above. Acting with M on $|u\rangle$ gives
$$M|u\rangle = \begin{pmatrix} 1 \\ 1 \end{pmatrix} = |u\rangle + |v\rangle\,,$$
so the action of M takes us out of the space U. This destroys the remaining part of the above argument, as M does not map U to U. The presence of the vector $|v\rangle$ on the right-hand side of this equation causes the eigenspace to fail to span all of $\mathbb{R}^2$; if it were not there, then $|u\rangle$ would obviously
be another linearly independent eigenvector and all would be well. This behavior is possible because
M is not symmetric. If it were symmetric, then $M|u\rangle$ could not contain $|v\rangle$ because it would have to be orthogonal to $|v\rangle$. Thus, symmetric matrices automatically avoid all of the pitfalls associated with
degenerate eigenvalues. These results also hold for matrices with complex entries, provided that we replace the word symmetric with the word Hermitian. We can be sure that the matrix
$$\begin{pmatrix} 3 & 2+i & 2i \\ 2-i & 4 & 3-2i \\ -2i & 3+2i & 5 \end{pmatrix}$$
has eigenvectors that are orthogonal and span the space $\mathbb{C}^3$, but we cannot say the same about the matrix
$$\begin{pmatrix} 3 & 2+i & 2i \\ 2+i & 4 & 3-2i \\ 2i & 3-2i & 5 \end{pmatrix},$$
even though it is symmetric.
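The contrast between the defective matrix M above and a self-adjoint matrix with a repeated eigenvalue is also easy to see numerically. In the sketch below, the symmetric $3 \times 3$ matrix is a made-up example with a doubly degenerate eigenvalue, not a matrix taken from the text.

    import numpy as np

    # The defective matrix from the text: eigenvalue 1 twice, one eigenvector.
    M = np.array([[1.0, 1.0],
                  [0.0, 1.0]])
    vals, vecs = np.linalg.eig(M)
    print(vals)                                      # [1., 1.]
    print(abs(np.vdot(vecs[:, 0], vecs[:, 1])))      # about 1: the columns are parallel

    # A real symmetric matrix with eigenvalues 1, 1, 4 still yields a full
    # orthonormal set of eigenvectors.
    S = np.array([[2.0, 1.0, 1.0],
                  [1.0, 2.0, 1.0],
                  [1.0, 1.0, 2.0]])
    vals_s, vecs_s = np.linalg.eigh(S)
    print(vals_s)
    print(np.allclose(vecs_s.T @ vecs_s, np.eye(3)))  # True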


The major result of this section is the fact that eigenvectors of Hermitian matrices are
automatically orthogonal whenever they correspond to different eigenvalues. There is no need to
employ the Gram-Schmidt process when working with the eigenvectors of a Hermitian matrix, except
for the space of eigenvectors associated with a single degenerate eigenvalue. This is often a
tremendous benefit to working with such matrices, and it is a welcome fact that many applications
consist only of these matrices. We will see in the next set of notes that Hermitian matrices are
associated with all observables considered in quantum mechanics and that most of the important
partial differential equations (the linear ones, at least) can be solved exclusively by considering
Hermitian matrices. The study of Hermitian matrices can be thought of as its own separate branch of
mathematics, and there has been a tremendous amount of work done in this area over the last century
or two. The importance of Hermitian matrices in applications can essentially be boiled down to the

properties derived in this section: their eigenvectors are orthogonal180 and they can be depended on to span the space on which the matrix acts. These properties have direct physical implications that will be discussed in the next chapter.

180 Or, in the case of degenerate eigenvalues, can be chosen to be orthogonal.
One more topic requires consideration before we make the jump to infinidimensional linear
algebra and the next set of notes. Suppose that we are interested in changing from the e-basis to
another orthonormal basis $\{|v_k\rangle\}_{k=1}^{n}$. As discussed above, the matrix
$$P = \bigl(\,|v_1\rangle \;\; |v_2\rangle \;\; \cdots \;\; |v_n\rangle\,\bigr)$$
takes us from this new basis to the e-basis. What of the inverse transformation that takes us from the e-basis to the new basis? This is clearly given by $P^{-1}$, but this inverse matrix is difficult to find in general. This is where the orthonormal properties of this basis come in very handy. See if you can show by inspection (this means look at it) that the inverse of this matrix is simply its transpose: $P^{-1} = P^{T}$. This is obviously a great help calculationally, as no extra work is required to determine the
transpose of a matrix. A real matrix that consists of orthonormal columns is called an orthogonal
matrix. Orthogonal matrices form a mathematical set called a group. A group is a set of objects that
have a multiplication operation under which it is closed. There must be a multiplicative identity I in
the group, and all elements T of the group must be associated with another element $T^{-1}$ for which $T^{-1}T = TT^{-1} = I$. This idea of a multiplicative inverse is familiar from the properties of vector spaces
(where the role of multiplication operation is played by the addition operation) and fields, but groups
are not required to have an addition operation. The multiplication operation must be associative, but
commutativity is not required. If the multiplication operation of a group happens to be commutative,
then the group is called Abelian. The set of invertible $n \times n$ matrices forms a group, but it does not
form a vector space. The sum of two invertible matrices is not necessarily invertible, but their product
is. To show that the set of orthogonal matrices forms a group, we need to show that the product of two
orthogonal matrices is again orthogonal and that all orthogonal matrices have a multiplicative inverse
that is also orthogonal. The former is established by the identity
$$(AB)^{T}(AB) = B^{T}A^{T}AB = B^{-1}A^{-1}AB = B^{-1}B = I\,,$$
while the latter is obvious from the fact that
$$\bigl(A^{-1}\bigr)^{T} A^{-1} = \bigl(A^{T}\bigr)^{T} A^{T} = A\,A^{-1} = I\,,$$
where we are assuming that both matrices A and B are orthogonal in these computations. The fact that $(AB)^{T} = B^{T}A^{T}$ can be established easily using index notation, and the fact that $(AB)^{-1} = B^{-1}A^{-1}$ can be established directly from the definition of the term inverse. Try to show both of these properties.
One of the most important groups of orthogonal matrices is the rotation group. The set of matrices associated with rotations in two dimensions, which can be parameterized by the rotation angle $\theta$ as
$$R(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix},$$
consists entirely of orthogonal matrices. This group is called SO (2) , or the special orthogonal group
in two dimensions. The word special is included to remind us that all of these matrices have
determinant +1. The matrix
$$\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$$
is clearly orthogonal, but does not constitute a rotation. It is an element of the larger group O(2) that contains all orthogonal $2 \times 2$ matrices. Something similar happens in higher dimensions; the n-dimensional rotation group is called the special orthogonal group in n dimensions, or SO(n). It is a
subgroup of the orthogonal group in n-dimensions, or O(n). Matrices that are not real, but consist of
columns that are orthonormal, also form a group. This group is called the unitary group, denoted by
U(n) in general and SU(n) if the determinant is restricted to equal +1. Unitary groups are some of the
most important groups considered in physical applications. The groups SU(2) and SU(3) in particular
have found extensive use in the standard model representing the frontiers of fundamental physics. It is
clear from our above analysis that all real symmetric matrices can be diagonalized by an orthogonal
transformation and all Hermitian matrices can be diagonalized by unitary transformations. There is a
deep connection between the space of Hermitian matrices and the group of unitary matrices, as every
Hermitian matrix H is directly related to the unitary matrix $\exp(iH)$. The applications of Hermitian

and unitary matrices are extremely broad, and I will cover a few of their applications in the remaining
chapters.
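The connection between Hermitian matrices and unitary matrices can be checked directly. The following sketch assumes scipy is available for the matrix exponential (scipy.linalg.expm); the Hermitian matrix H is a made-up example.

    import numpy as np
    from scipy.linalg import expm

    # Any Hermitian H gives a unitary U = exp(iH).
    H = np.array([[1.0, 2 - 1j],
                  [2 + 1j, -0.5]])
    assert np.allclose(H, H.conj().T)

    U = expm(1j * H)
    print(np.allclose(U.conj().T @ U, np.eye(2)))   # U^dagger U = I
    print(abs(np.linalg.det(U)))                    # modulus 1, since det U = exp(i tr H)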
Exercises for Section XI.7:

1. Consider the scalar product $\langle x | y\rangle = x_1 y_1 + 2 x_2 y_2 + 3 x_3 y_3$ for vectors $x, y \in \mathbb{R}^3$.


(a) Show that this product is an inner product.
(b) Show that this inner product is not invariant under rotation. Hint: consider a rotation of 90
degrees about the 3-axis.
(c) Show that this inner product respects the Cauchy-Schwarz inequality, $\langle x | y\rangle^2 \le \langle x | x\rangle\langle y | y\rangle$.

2. Consider the scalar product $\langle x | y\rangle = 3 x_1 y_1 + x_2 y_2 + 4 x_3 y_3$ for vectors $x, y \in \mathbb{R}^3$.


(a) Show that this product is an inner product.
(b) Show that this inner product is not invariant under rotation. Hint: consider a rotation of 90
degrees about the 3-axis.
(c) Show that this inner product respects the Cauchy-Schwarz inequality.
3. Consider the scalar product $\langle x | y\rangle = x_1 y_1 + x_2 y_2 - x_3 y_3$ for vectors $x, y \in \mathbb{R}^3$.
(a) Show that this product is not an inner product. Give examples, and explain which properties
of the inner product are not satisfied by this product.
(b) Show that this product is not invariant under rotation.
(c) Show that this product does not respect the Cauchy-Schwarz inequality. Give examples as
part of your analysis.

4. Consider the scalar product $\langle x | y\rangle = x_3 y_1 + x_1 y_2 + x_2 y_3$ for vectors $x, y \in \mathbb{R}^3$.


(a) Show that this product is not an inner product. Give examples, and explain which properties
of the inner product are not satisfied by this product.
(b) Show that this product is not invariant under rotation.
(c) Show that this product does not respect the Cauchy-Schwarz inequality. Give examples as
part of your analysis.
5. Write the vector
$$b = \begin{pmatrix} 7 \\ 5 \\ 1 \end{pmatrix}$$
in terms of the basis
$$B = \left\{ \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix},\ \begin{pmatrix} 41 \\ 26 \\ 31 \end{pmatrix},\ \begin{pmatrix} 10 \\ 11 \\ 4 \end{pmatrix} \right\}.$$

6. Consider the matrices
$$\begin{pmatrix} 13 & 0 & 6 \\ 0 & 15 & 0 \\ 6 & 0 & 3 \end{pmatrix},\quad \begin{pmatrix} 13 & 1 & 6 \\ 0 & 15 & 0 \\ 6 & 0 & 3 \end{pmatrix},\quad \begin{pmatrix} 13 & 0 & 6 \\ 0 & 15 & 1 \\ 6 & 0 & 3 \end{pmatrix},\quad\text{and}\quad \begin{pmatrix} 13 & 0 & 4 \\ 0 & 15 & 0 \\ 9 & 0 & 3 \end{pmatrix}.$$

(a) Argue that these matrices all have the same eigenvalues.
(b) Find the eigenvalues and eigenvectors of each of these matrices.
(c) Explain the differences between the eigenvectors of these matrices. Are they orthogonal?
Do they span the space? Explain your findings in a short paragraph.
7. Use the Gram-Schmidt process to find an orthonormal basis for the vector space
$$V = \mathrm{span}\left\{ \begin{pmatrix} 2 \\ 1 \\ 1 \\ 5 \end{pmatrix},\ \begin{pmatrix} 3 \\ 3 \\ 4 \\ 2 \end{pmatrix},\ \begin{pmatrix} 1 \\ 3 \\ 2 \\ 6 \end{pmatrix} \right\}.$$
8. Use the Gram-Schmidt process to find an orthonormal basis for the vector space
$$V = \mathrm{span}\left\{ \begin{pmatrix} 3 \\ 3 \\ 2 \\ 3 \end{pmatrix},\ \begin{pmatrix} 1 \\ 1 \\ 2 \\ 1 \end{pmatrix},\ \begin{pmatrix} 1 \\ 5 \\ 6 \\ 1 \end{pmatrix} \right\}.$$
9. Use the Gram-Schmidt process to find an orthonormal basis for the vector space
$$V = \mathrm{span}\left\{ \begin{pmatrix} 1 \\ 2 \\ 4 \\ 3 \end{pmatrix},\ \begin{pmatrix} 1 \\ 2 \\ 1 \\ 1 \end{pmatrix},\ \begin{pmatrix} 3 \\ 1 \\ 4 \\ 5 \end{pmatrix} \right\}.$$

10. Argue that the inner product must contain complex conjugation if it is to satisfy the required
properties of an inner product when vectors with complex entries are considered.
11. Determine the eigenvalues and eigenvectors of the Hermitian matrix
$$A = \begin{pmatrix} 4 & i \\ -i & 4 \end{pmatrix}.$$
Show that the eigenvalues are real and that the eigenvectors are orthogonal, as long as the proper inner product including complex conjugation is used.
12. Determine the eigenvalues and eigenvectors of the symmetric matrix
$$A = \begin{pmatrix} 3 & 4i \\ 4i & 5 \end{pmatrix}.$$
Show that its eigenvalues and eigenvectors share none of the nice properties of those found for Hermitian matrices, but the eigenvectors still span the space $\mathbb{C}^2$. How can we be sure of this property? How is it possible that this symmetric matrix has complex eigenvalues and eigenvectors that are not orthogonal?

13. Consider the matrix
$$A = \begin{pmatrix} 2 & 1 & 3 \\ 0 & 1 & 4 \\ 0 & 0 & 2 \end{pmatrix}.$$
It is upper triangular, so its eigenvalues are displayed along the diagonal.


(a) Determine the eigenvectors of this matrix, and argue that they do not span the space.
(b) Find another vector that can be included with the eigenvectors of A in an orthogonal basis.
(c) Use the Gram-Schmidt procedure first to find an orthonormal basis for the eigenspace of A,
then to find an orthonormal basis for the entire space $\mathbb{R}^3$ that includes these two vectors.
(d) What does the action of matrix A do to the last vector in your basis obtained in (c)? Show
that this vector represents a generalized eigenvector of A, and that the Cayley-Hamilton
theorem is satisfied.
14. Consider the matrix
$$A = \begin{pmatrix} 4 & 1 & 2 \\ 0 & 4 & 2 \\ 0 & 0 & 3 \end{pmatrix}.$$
It is upper triangular, so its eigenvalues are displayed along the diagonal.


(a) Determine the eigenvectors of this matrix, and argue that they do not span the space.
(b) Find another vector that can be included with the eigenvectors of A in an orthogonal basis.
(c) Use the Gram-Schmidt procedure first to find an orthonormal basis for the eigenspace of A,
then to find an orthonormal basis for the entire space $\mathbb{R}^3$ that includes these two vectors.
(d) What does the action of matrix A do to the last vector in your basis obtained in (c)? Show
that this vector represents a generalized eigenvector of A, and that the Cayley-Hamilton
theorem is satisfied.


15. The purpose of this problem is to illustrate the properties of Hermitian matrices and the manner
of proof required for these properties. In the following, assume that A is a Hermitian matrix.
(a) Show that the eigenvalues of A are all real.
(b) Show that eigenvectors associated with different eigenvalues are orthogonal.
(c) Prove that the eigenvectors of A span the space on which A acts, whether or not some of the
eigenvalues are degenerate.
16. Show that the Gram-Schmidt process works, i.e. that each new vector defined in the process will
necessarily be orthogonal to all of the previous ones.
17. A group is defined as a collection of objects with exactly one operation, multiplication, under
which it is closed so that the product of any two elements is another element. This operation
must be associative, but commutativity is not required. There must be a multiplicative identity,
called I, in the group, and all elements of the group must have a multiplicative inverse that is
also in the group.
(a) Show that the set of $3 \times 3$ matrices under standard multiplication does not constitute a group.
What is missing?
(b) Show that the set of $3 \times 3$ matrices with standard addition taken as the multiplication
operation does form a group. What matrix plays the role of the multiplicative identity in
this case?
(c) Show that the set of invertible $3 \times 3$ matrices under standard multiplication forms a group.
What matrix plays the role of the multiplicative identity in this case?
(d) Show that the set of invertible $3 \times 3$ and invertible $4 \times 4$ matrices under standard
multiplication does not constitute a group. What is missing?
(e) Show that the multiplicative inverse of any element of a group is unique. That is, if $gh = I$ and $fg = I$, with f, g, and h all elements of a group, then f = h.
18. The unitary group in n dimensions is the set of all $n \times n$ matrices U with complex elements such that $UU^{\dagger} = U^{\dagger}U = I$.
(a) Show that the unitary group is, in fact, a group under the definition given in problem 17. Take the multiplication operation as standard.
(b) Show that the set of matrices $\exp(iH)$, with H any Hermitian $n \times n$ matrix, forms a group of unitary $n \times n$ matrices.
(c) Determine the number of free parameters associated with an $n \times n$ unitary matrix. Remember that there are $n^2$ elements, each of which is complex, and that the product $U^{\dagger}U = I$. How many conditions is this? Show that the number of free parameters associated with Hermitian $n \times n$ matrices is the same, and argue that this implies that every unitary $n \times n$ matrix can be written as the exponential of i times a Hermitian $n \times n$ matrix. The two groups discussed in parts (a) and (b) are the same.
(d) Show that the additional requirement that $\det U = 1$ still allows the resulting set of matrices
to form a group. How many independent parameters does this special unitary group in n
dimensions contain? Can you find a related restriction on the Hermitian matrices that allows
this group to be written as exponentials of Hermitian matrices (times i)? Hint: all Hermitian
matrices can be diagonalized; it may be useful for you to think about the diagonal form of
these matrices and what the determinant of the exponential of such a matrix would look like.
(e) Show that the exponential of the vector space of complex $n \times n$ matrices forms a group.
Groups are sometimes thought of as exponentials of vector spaces for this reason.

Section XI.8: Summary of Chapter XI

This chapter represents an introduction to the salient points of finite-dimensional linear


algebra. Linear algebra is one of the most widely-used fields of mathematics in other applications, and
these salient points only scratch the surface of its true applicability in these situations. Many other
prominent uses of linear algebra, including the factorization theorems and many applications arising
from organizational structure, are absent from this text for reasons of space. Hopefully, you will be
able to understand these other applications from the material presented here, once you are given an
introduction to the methods used. One of the most important characteristics of linear algebra lies in its
abstract language. This allows it to be used in many different applications without significant changes,
and gives great importance to the many theorems associated with linear algebra in a large variety of
applications. The material found in this chapter emphasizes the abstract identity of linear algebra
without focusing on a specific application. This is what is meant by salient points.
The most important concepts to take from this chapter are those of a vector space, linear
independence, pivots (or assassins), the determinant, the requirements of invertibility, eigenvalues and
eigenvectors, and the inner product. Many of these ideas are easily manipulated in finite-dimensional
spaces, but all of them have subtleties that come up when they are applied to vector spaces whose
dimension is not finite. We will handle some of these subtleties in the next chapter. The results that
we will find the most use of later on in this book, other than the concepts outlined above, are that
Hermitian matrices have real eigenvalues associated with eigenvectors that are orthogonal whenever
the eigenvalues are distinct and that span the space on which the matrix acts. We will find that these
properties remain even when the space under consideration does not have finite dimension, though
there are some important caveats to this statement.
Many of the most important applications of linear algebra, in some respects, are associated with
its treatment of infinidimensional vector spaces. This treatment will allow us to solve linear partial
differential equations, analyze the behavior of quantum systems, and classify the general behavior of
functions defined on a compact interval. This is the underlying subject of the rest of the book, and the
purpose of chapter 11 is to introduce you to these topics in a simple way. The case of finite-dimensional linear algebra lacks many of the subtleties associated with infinidimensional linear
algebra, but many of the core results remain. It is important to understand this basic treatment before
moving on to more advanced topics.


Chapter XII
Infinidimensional Linear Algebra
The purpose of this chapter is to introduce and elucidate some of the most important properties
of linear vector spaces with infinite dimensions. This is an extremely broad and diverse topic that
covers applications as different as Fourier series and quantum mechanics. Many of the most important
theorems are too complicated to include in this work, but the main results will be stated and plausible
arguments will be given if a rigorous proof of a result is not accessible. Proofs of some of the most
important results can be found in chapter 15. The character of infinidimensional vector spaces is quite
different from those of finite dimension for several reasons, so the majority of this text will be focused
on the more general properties of infinidimensional vector spaces rather than just looking at the spaces
$\mathbb{R}^n$, as was done in the last chapter. Examples, when they are given, will consist of one or another
class of function spaces whose elements are more widely thought of as functions rather than vectors.
Examples of such function spaces are the space of polynomials of degree less than or equal to 4,
considered in the last set of notes, and the space of differentiable functions defined on the interval
$t \in [0,1]$. The first of these spaces has dimension 5, while the second is infinidimensional.
Section XII.1: Characterization of Infinidimensional Vector Spaces
The word dimension is defined in linear algebra as the number of linearly independent
vectors required to span a given space. Whenever the dimension of a space is finite, we can count the
basis vectors to determine whether or not we have enough linearly independent vectors to span the
whole space. Infinidimensional vector spaces, on the other hand, do not have this nice property. The
statement that we have an infinite number of vectors is not enough to say that we are spanning the
entire space, as the definition of linear independence implies that jettisoning one of these vectors will
change the span of the remaining ones. Jettisoning one or more vectors in an infinite set of linearly
independent vectors still leaves us with infinitely many, so this counting argument cannot be enough
to guarantee that our vectors actually span the entire space. Some spaces are even uncountably
infinidimensional, so we cannot even hope to be able to count the vectors in any basis. In the absence
of this simple rule, we require another method to tell whether or not a given set of linearly independent
vectors actually exhausts an infinidimensional space.
There are many ways to do this, but the most common involves endowing the space with an
inner product. Inner products give us a measure of the distance between two vectors in a vector space;
the distance between the vectors $|x\rangle$ and $|y\rangle$ can be defined via
$$s^2 = \langle x - y \,|\, x - y\rangle = \langle x | x\rangle - 2\,\mathrm{Re}\,\langle x | y\rangle + \langle y | y\rangle\,,$$
since this quantity is certainly a nonnegative number by the definition of the inner product. If the field associated with the vector space is the real numbers, then we can use this result to derive an important identity. With $\lambda \in \mathbb{R}$, we have
$$s^2 = \langle x - \lambda y \,|\, x - \lambda y\rangle = \langle x | x\rangle - 2\lambda \langle x | y\rangle + \lambda^2 \langle y | y\rangle \ge 0\,.$$
The last expression is quadratic in $\lambda$. It can either have no real roots, if $|y\rangle$ is not proportional to $|x\rangle$, or a single real root with multiplicity 2 if $|y\rangle$ is proportional to $|x\rangle$. It is not possible for this

quadratic to have two distinct real roots, as this would require it to change sign. The quadratic formula then gives
$$4\langle y | x\rangle^2 - 4\langle x | x\rangle\langle y | y\rangle \le 0\,,$$
or
$$\langle x | x\rangle\langle y | y\rangle \ge \langle y | x\rangle^2\,.$$
This is the celebrated Cauchy-Schwarz inequality, named for the prominent French mathematician
Augustin-Louis Cauchy of complex analysis fame and the German mathematician Hermann Amandus
Schwarz.181 The inequality is strict whenever $|x\rangle$ and $|y\rangle$ are linearly independent. Allowing for complex $\lambda$ complicates this analysis a bit. In this case, we write
$$\lambda = \mu\, e^{i\varphi} \qquad\text{and}\qquad \langle y | x\rangle = \bigl|\langle y | x\rangle\bigr|\, e^{i\theta}\,.$$
The analogous distance is then given by
$$s^2 = \langle x - \lambda y \,|\, x - \lambda y\rangle = \langle x | x\rangle - \lambda \langle x | y\rangle - \lambda^{*} \langle y | x\rangle + |\lambda|^2 \langle y | y\rangle = \langle x | x\rangle - 2\mu \bigl|\langle y | x\rangle\bigr| \cos(\varphi - \theta) + \mu^2 \langle y | y\rangle \ge 0\,.$$
This is a quadratic expression in the real variable $\mu$, so its restriction to be nonnegative again leads to the inequality
$$4\cos^2(\varphi - \theta)\,\bigl|\langle y | x\rangle\bigr|^2 \le 4\,\langle x | x\rangle\langle y | y\rangle\,,$$
or
$$\langle x | x\rangle\langle y | y\rangle \ge \cos^2(\varphi - \theta)\,\bigl|\langle y | x\rangle\bigr|^2\,.$$
This inequality must be valid for all choices of the angle $\varphi$, so choosing $\varphi = \theta$ gives the modified Cauchy-Schwarz inequality
$$\langle x | x\rangle\langle y | y\rangle \ge \bigl|\langle y | x\rangle\bigr|^2\,.$$
Our derivation of these inequalities is perfectly valid in inner product spaces of any dimension, finite
or infinite. In finite dimensions, this is simply the familiar inequality
$$\bigl(\vec{A}\cdot\vec{B}\bigr)^2 \le A^2 B^2\,,$$
indicating that the square of the cosine of the angle between two vectors is less than or equal to 1.

181 As an interesting sidenote, this inequality was first introduced by Cauchy in 1821. An integral form of it was published in 1859 by the Russian mathematician Viktor Bunyakovsky, and Schwarz re-discovered Cauchy's result in 1888. For this reason, it is also referred to as the Cauchy-Bunyakovsky-Schwarz inequality. This inequality finds broad use in many fields of mathematics and scientific applications, from function theory to probability theory to the solution of the wave equation and quantum mechanics. It has been called one of the most important and widely-used inequalities in all of mathematics.
To go further, we first imagine taking a candidate basis for the space and making it
orthonormal. We could do this using the Gram-Schmidt process if necessary, but thankfully this will
almost never be necessary. Our orthonormal candidate basis is given by $\{|v_k\rangle\}_{k=1}^{n}$, where n is the dimension of the space. If the space is infinidimensional, then $n = \infty$. Taking the first r vectors in this candidate basis, we would like to find the coefficients $\{c_j\}_{j=1}^{r}$ that make the vector
$$\sum_{k=1}^{r} c_k |v_k\rangle$$

as close as possible to a given vector $|x\rangle$ in our space. To do this, we minimize the inner product of the difference $|x\rangle - \sum_{k=1}^{r} c_k |v_k\rangle$ with itself, with respect to the coefficients:
$$s^2 = \Bigl\langle x - \sum_{k=1}^{r} c_k v_k \,\Big|\, x - \sum_{j=1}^{r} c_j v_j \Bigr\rangle = \langle x | x\rangle - \sum_{k=1}^{r} c_k^{*} \langle v_k | x\rangle - \sum_{j=1}^{r} c_j \langle x | v_j\rangle + \sum_{j=1}^{r}\sum_{k=1}^{r} c_k^{*} c_j \langle v_k | v_j\rangle$$
$$= \langle x | x\rangle - \sum_{k=1}^{r} c_k^{*} \langle v_k | x\rangle - \sum_{j=1}^{r} c_j \langle x | v_j\rangle + \sum_{j=1}^{r}\sum_{k=1}^{r} c_k^{*} c_j\, \delta_{kj} = \langle x | x\rangle - \sum_{k=1}^{r} c_k^{*} \langle v_k | x\rangle - \sum_{j=1}^{r} c_j \langle x | v_j\rangle + \sum_{k=1}^{r} \bigl|c_k\bigr|^2\,.$$

The last equalities follow from the orthonormality of the vectors $|v_k\rangle$ and the properties of the Kronecker delta symbol $\delta_{kj}$. To extremize this with respect to the coefficients, we simply require that its derivative with respect to every parameter it depends on vanishes. We will think of $s^2$ as a function of the coefficients and their complex conjugates independently of one another, as the complex coefficients contain two degrees of freedom each, their real and imaginary parts. It is a little more rigorous to consider the real and imaginary parts as independent instead of the coefficient itself and its complex conjugate, but this process is messier, less clear, and ultimately leads to the same result anyway. The derivative with respect to $c_{\ell}$ gives
$$-\langle x | v_{\ell}\rangle + c_{\ell}^{*} = 0 \qquad\Longrightarrow\qquad c_{\ell}^{*} = \langle x | v_{\ell}\rangle\,,$$
and that with respect to $c_{\ell}^{*}$ gives
$$-\langle v_{\ell} | x\rangle + c_{\ell} = 0 \qquad\Longrightarrow\qquad c_{\ell} = \langle v_{\ell} | x\rangle\,.$$
These two results are clearly equivalent, by one of the properties of the inner product. Make sure that you clearly understand all that goes into this result. Remember that $\ell$ represents one of the numbers between 1 and r, so we get contributions from either the sum over k or that over j.
This is the only extremum of $s^2$, and it cannot be a maximum as we can obviously cause $s^2$ to grow without bound by increasing the magnitude of some or all of the coefficients. There must be a minimum value of $s^2$ because this quantity is definitely bounded below by zero due to the properties of the inner product. Therefore, this choice of coefficients definitely gives the minimum. The minimum value of $s^2$ is
$$s^2_{\min} = \langle x | x\rangle - \sum_{k=1}^{r} \langle x | v_k\rangle\langle v_k | x\rangle = \langle x | x\rangle - \sum_{k=1}^{r} \bigl|c_k\bigr|^2\,.$$
Since this quantity is definitely nonnegative, we have
$$\langle x | x\rangle \ge \sum_{k=1}^{r} \bigl|c_k\bigr|^2\,.$$


This result is called Bessel's inequality, after the German mathematician Friedrich Bessel. It is valid for all sets of orthonormal vectors, and implies that the infinite series obtained as r is taken to infinity converges whenever $|x\rangle$ is normalizable, that is, whenever the inner product of this vector with itself is finite.
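Bessel's inequality is easy to see in action for a concrete orthonormal set. The sketch below assumes the inner product $\langle f | g\rangle = \int_0^1 f(t)\,g(t)\,dt$ (approximated by a simple Riemann sum) and uses the functions $\sqrt{2}\sin(k\pi t)$, which are orthonormal under that inner product; the partial sums of $|c_k|^2$ stay below $\langle f | f\rangle = 1/3$ for the test vector $f(t) = t$.

    import numpy as np

    # Inner product <f|g> = integral_0^1 f(t) g(t) dt, via a midpoint Riemann sum.
    N = 200000
    t = (np.arange(N) + 0.5) / N
    dt = 1.0 / N
    def inner(f, g):
        return np.sum(f * g) * dt

    f = t.copy()                                                  # the vector |x>: f(t) = t
    basis = [np.sqrt(2) * np.sin(k * np.pi * t) for k in range(1, 9)]   # orthonormal functions

    coeffs = np.array([inner(u, f) for u in basis])               # c_k = <v_k | x>
    print(np.sum(coeffs**2))                                      # about 0.309
    print(inner(f, f))                                            # <x|x> = 1/3: Bessel holds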
Whenever Bessel's inequality is a true equality, the vector $|x\rangle$ can be expressed in terms of the first r vectors in our candidate basis. If the dimension of the space is n, this can only be true when r = n. For infinidimensional vector spaces, this property of Bessel's inequality replaces the idea of counting basis vectors. The set $\{|v_k\rangle\}_{k=1}^{\infty}$ is said to be a basis for the infinidimensional vector space V if and only if
$$\lim_{r\to\infty} \sum_{k=1}^{r} \bigl|c_k\bigr|^2 = \langle x | x\rangle$$
for every vector $|x\rangle \in V$. Tossing one of the vectors out still leaves infinitely many behind, but must change the value of this sum for at least some vectors $|x\rangle \in V$. If the above condition is satisfied, then we must have
$$|x\rangle = \lim_{r\to\infty} \sum_{k=1}^{r} c_k |v_k\rangle$$
by the definition of the inner product.


This counting problem with infinidimensional vector spaces can be solved for some types of
vector spaces, but not for others. Consider, for example, the space of functions defined on t 0,1 .
This is certainly a vector space under ordinary addition and over the field of either real or complex
numbers, but what is its dimension? We can easily think of a basis for this space, the set of functions
that are equal to zero everywhere except for a single value of x. This set of functions can be written as
f t ; t k t tk ,
but the question of what the index k means is left open. The set of all numbers lying on the interval
0,1 is not countable. To see the reasoning behind this statement, consider a set of numbers R that are
ordered in the sense that, for any two distinct numbers $a, b \in R$, there exists a definite and consistent inequality that either puts $a < b$ or $b < a$. If these numbers are countable, then we must be able to form a mapping from the set of natural numbers to these numbers. This is what the term countable means: we must be able to index them in the manner shown above with the index k. If the set R is countable, then there exists a sequence $\{s_n\}_{n=1}^{\infty}$ for which every element $s_n \in R$ and every number $\alpha \in R$ is contained in the sequence, that is, $\alpha = s_k$ for at least one value of k. Now, imagine that you
and a friend are going to play a game. You move first by choosing any number in R and call that
number a1 . Your friend moves next, and must choose a number in R that is strictly larger than your
choice a1 . He makes this choice and calls it b1 . Now it is your turn again. You must choose a
number in R that lies between a1 and b1 that you will call a2 . Your friend then chooses a number
b2 R that lies between a2 and b1 , and the process continues. To tie this process into the idea of
countability, we modify the rules slightly. Instead of just choosing randomly, each player must choose

the first number in the list $\{s_n\}_{n=1}^{\infty}$ that satisfies the required inequality. Since the inequalities necessarily get narrower and narrower as time goes on, this requirement moves us further and further up the sequence $\{s_n\}_{n=1}^{\infty}$ of numbers in R. The construction is such that all numbers in R that satisfy

the required inequality at any stage of the game are still accessible to the player. The player simply
must choose the next element of the list that satisfies the inequality. This choice modifies the
inequality, so the next player must go farther down the list to find an acceptable number.

The ordering of the sequence $\{s_n\}_{n=1}^{\infty}$ obviously determines the entire game when it is played in this manner. Exchanging $s_1$ and $s_2$, for example, changes the intervals of acceptability at each turn, so changes the structure of the game. Suppose first that there is a choice of ordering for $\{s_n\}_{n=1}^{\infty}$ that
causes the game to go on forever. No matter how long the game has already gone on, each player has

another acceptable choice to continue it for one more turn. In this case, both $\{a_k\}_{k=1}^{\infty}$ and $\{b_k\}_{k=1}^{\infty}$ represent monotonic and bounded sequences. This implies that each must be associated with a limit point, i.e. each must bunch up as the sequence extends to infinity. The sequence $\{a_k\}_{k=1}^{\infty}$ is monotonic increasing, so it must be the case that its limit point is larger than $a_k$ for every value of k. Similarly, the limit point of $\{b_k\}_{k=1}^{\infty}$ is smaller than $b_k$ for every value of k. If either of these limit points actually appears in the sequence $\{s_n\}_{n=1}^{\infty}$, then it must always be an acceptable choice for every interval because it necessarily is larger than all $a_k$ and smaller than all $b_k$. If one of these lies in the sequence $\{s_n\}_{n=1}^{\infty}$, then it must take a particular place in this sequence. As the game is played, we
eventually reach this place in the sequence and this element must be chosen as the next one that lies
within the acceptable bounds. On the other hand, neither of these elements can ever actually be
chosen by either player since it must be true, by construction, that both of these limit points are larger
than all of the $a_k$ and smaller than all of the $b_k$. Neither of these limit points can appear in the sequence $\{s_n\}_{n=1}^{\infty}$, so they cannot belong to R. Conversely, if these limit points do belong to the set we
are considering, then this set cannot be countable as these points are not included in our list. Thus, any
such set that contains all of its limit points must be uncountable. Sets that do not contain any of their
limit points, or that do not have any limit points, are definitely countable. A set that contains some of
its limit points may be countable or not, depending on whether or not it is possible to arrange the

sequence $\{s_n\}_{n=1}^{\infty}$ in such a way that the limit point of either $\{a_k\}_{k=1}^{\infty}$ or $\{b_k\}_{k=1}^{\infty}$ is part of the set.

The integers are definitely countable, as we can take $s_1 = 0$, $s_2 = 1$, $s_3 = -1$, $s_4 = 2$, etc. All of the integers will be included somewhere on this list, so countability is established. The rational numbers are also countable, but this is a little more difficult to establish. We start by taking $s_1 = 0$.
To obtain the rest, we make a table of rational numbers for which the row of the table is the numerator
of the number and the column is the denominator. This table is illustrated in Table 1 below. The
element in row 3, column 5 of the table is given by 3/5. It is clear that this table contains all of the
positive rational numbers, with many (actually, all) of them repeated (an infinite number of times).
To count these, we can imagine starting at the upper left entry, 1, and counting the elements of the
diagonals of the table. In this fashion, we obtain the list
$s_1 = 0,\ s_2 = 1,\ s_4 = 2,\ s_6 = 1/2,\ s_8 = 3,\ s_{10} = 1,\ s_{12} = 1/3,\ s_{14} = 4,\ s_{16} = 3/2,\ s_{18} = 2/3,\ s_{20} = 1/4\,,$
and so on. This process is illustrated in table 1; make sure you understand how it works. The odd
elements of this sequence are defined as negative of the last entry, so we also include the negative
rational numbers in our list. Using this process, it is clear that every rational number will appear at
least once in our list (actually, it will appear an infinite number of times). Therefore, we can count the
rational numbers. Using this argument in conjunction with the previous one, it is clear that the rational
numbers must have at least some limit points that are not rational. Sets that do not contain all of their

limit points are called incomplete. The completion of the set of rational numbers, defined by including
all of these irrational limit points, is called the set of real numbers. By the preceding argument, it is
clear that the real numbers are not countable.
row\column     1      2      3      4      5      6
     1         1     1/2    1/3    1/4    1/5    1/6
     2         2      1     2/3    1/2    2/5    1/3
     3         3     3/2     1     3/4    3/5    1/2
     4         4      2     4/3     1     4/5    2/3
     5         5     5/2    5/3    5/4     1     5/6
     6         6      3      2     3/2    6/5     1
Table 1
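The diagonal counting scheme of Table 1 is itself a simple algorithm, and writing it out makes the enumeration explicit. The short Python sketch below walks the diagonals, emitting each entry and then its negative; the first few terms reproduce the list given above, duplicates and all.

    from fractions import Fraction
    from itertools import islice

    def rationals():
        """Walk the diagonals of Table 1 (numerator = row, denominator = column),
        emitting each entry followed by its negative; 0 is emitted first."""
        yield Fraction(0)
        d = 2                            # row + column is constant along a diagonal
        while True:
            for row in range(d - 1, 0, -1):
                q = Fraction(row, d - row)
                yield q
                yield -q
            d += 1

    print([str(q) for q in islice(rationals(), 21)])
    # ['0', '1', '-1', '2', '-2', '1/2', '-1/2', '3', '-3', '1', '-1',
    #  '1/3', '-1/3', '4', '-4', '3/2', '-3/2', '2/3', '-2/3', '1/4', '-1/4']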
Note that the term countable in no way means finite. There are definitely an infinite
number of integers, and there are in some sense even more rational numbers than integers. We can see this easily simply by recognizing that all integers are also rational, but all rational numbers are certainly not integers. The integers are a countable subset of the rationals, just like the even numbers are a countable subset of the integers and the numbers that are divisible by 6 are a countable subset of the evens. Each of these has an infinite number of elements, yet each somehow has fewer elements than the preceding. This is an extremely subtle point, but it is very important to the analysis of infinite sets of
numbers. To analyze this topic completely, it is useful to have the following result: every union of a
countable number of countable sets is itself countable. The set of integers divisible by 3 and the set of
integers divisible by 2 are both clearly countable, so this result implies that the set of integers that is
divisible by either 2 or 3 is also countable. To establish this result, we again form a table just as was
done when we showed that the rational numbers are countable. Each row of the table consists of the
ordered set of numbers contained in one of the countable sets we would like to consider the union of.
Since the number of sets is countable, we can definitely include all of them by associating each row of
the table with a single one of these sets. Counting the elements of this table in the manner shown
above for the rationals, by starting at the upper left corner and counting along diagonals, we can

develop a new indexed set sn n 1 that includes every element we are trying to consider. Therefore,
this union of sets is again countable by definition. Using this result, we can easily establish a fact that
is otherwise very difficult to show: the set of algebraic numbers is countable. An algebraic number is
a number that can be the root of a finite-degree polynomial with integer coefficients. Integers and rationals are both countable subsets of this larger set, which also includes many irrational numbers like $\sqrt{2}$. This result implies that even the inclusion of an infinite number of irrationals is not enough to make the set uncountable. In order to establish this, we first recognize that every polynomial of degree n has at most n distinct roots. The number of polynomials of degree n with integer coefficients is definitely countable, as each of the n + 1 coefficients of such a polynomial ranges over a countable set and the collection of polynomials can therefore be built up, one coefficient at a time, by repeated countable unions of countable sets. This
makes the set of roots of such polynomials again countable, as each has a countable number of roots.
The degree of these polynomials is obviously countable, so the set of possible roots is a union of a
countable number of countable sets, which must again be countable. Approaching this problem
directly is very difficult, and can lead to very complicated expressions. Our approach, on the other
hand, does not require us to actually find any of the roots. We are simply concentrating on the
countability of the sets that define them. The character of this proof is indicative of many similar
proofs of complicated mathematical ideas: we characterize the thing we are trying to prove in such a
way that a broad mathematical proof applies, then use this broad mathematical proof to establish our

objective. This is one of the main reasons why broad mathematical proofs are so important to the
structure of mathematics.
Since the algebraic numbers are countable and the real numbers are not, there definitely exist
real numbers that cannot be the root of a finite-degree polynomial with integer coefficients. Such
numbers are called transcendental, and this set of numbers must be uncountable because the set of real
numbers is uncountable. In this sense, there are somehow more transcendental numbers than algebraic
ones. Despite the fact that transcendental numbers are sparingly represented in normal mathematical discourse, being characterized by numbers like $\pi$ and e, these numbers are far more prominent in the
real line than algebraic numbers. The most important difference between these two sets for our
purposes lies in the fact that countable sets can be summed over using sigma notation while
uncountable sets cannot. This means that the basis can be expressed as the limit of a finite sum, and
allows us to make many statements that are not so easily made in spaces which are uncountably
infinidimensional. If the basis of an infinidimensional vector space is countable, then we can write expressions like
$$\sum_{k=1}^{\infty} c_k |v_k\rangle$$

that make sense. If it is uncountable, then this cannot be done. Vector spaces with uncountable bases
have many problems, but these can be tackled in a well-defined manner in the most important cases.
Usually, the sum is replaced with an integral. The length, area, or volume element of this integral
mutes the issues of uncountability, leading to a well-defined result.
Taking these ideas back to our discussion of infinidimensional vector spaces and Bessel's inequality, we see that the inner product plays a major role in this discussion. If
$$\lim_{r\to\infty} \sum_{k=1}^{r} \bigl|c_k\bigr|^2 = \langle x | x\rangle\,,$$
then we must have $|x\rangle = \lim_{r\to\infty} \sum_{k=1}^{r} c_k |v_k\rangle$ as far as the inner product is concerned. The inner product of the difference between these two vectors with itself vanishes, so this difference must be the zero vector by the definition of the inner product. In many cases, the inner product is defined in such a way that some
differences between vectors are essentially ignored by it. One example is the case of all functions
defined on the interval $[0,1]$. This space obviously has uncountably infinite dimensions, as it can be spanned by the basis given above, which has a different element for every distinct element of the interval $[0,1]$. Being a continuous part of the real line, this interval cannot be countable as it must contain all of its limit points. We can, however, obtain a countable basis for this space by identifying the function
$$f(t) = 0 \ \text{ when } t \neq 1/3\,; \qquad f(1/3) = 4\,,$$
as well as many others along these lines which are certainly members of this space, with the zero element. These elements are definitely not the zero element when thought of in one way, but there are inner products that we can define that do not distinguish between this element and the zero element.
These types of subtle distinctions are extremely common in applications of infinidimensional vector
spaces, and are often not described explicitly. In order to make the whole structure of these arguments
make sense, we can modify our space to include only those functions that are continuous on $[0,1]$,
disallowing the function above. This has the unfortunate consequence of causing the space not to be
complete in the sense that there exist infinite sequences of continuous functions that converge to a
function that is not continuous and therefore not in the space. We have trouble with this outcome
because it implies that there are (infinite) linear combinations of basis vectors that do not converge to a

vector in the space, calling into question the idea of a basis vector. When considering infinite linear
combinations, we must automatically restrict ourselves only to those linear combinations that
converge. The idea of convergence is itself associated with the inner product, as we will see shortly,
so it is often easier to simply allow the inner product to indicate to us which vectors are actually
different and identify more than one vector with the zero vector for this purpose. This process is very
subtle, and requires some time to get used to, but it is required for a full analysis of many important
infinidimensional vector spaces.

Exercises for Section XII.1:


In problems 1 - 6, determine whether the given vector space has countable dimension. Explain the
reasoning behind your decision.
1. The span of the functions $\sin(n\pi t)$, where n is an integer.
2. The span of the functions $\sin(rt)$, where r is a real number.
3. The span of the functions $\cos(at)$, where a is an algebraic number.
4. The span of the functions $\cos(pt/q)$, where p and q are integers and $q \neq 0$.
5. The span of the functions $e^{\lambda t}$, where $\lambda$ is a transcendental number.
6. The set of all functions defined on the interval $t \in [0.3, 0.4]$.
In problems 7 - 12, determine whether or not the given set is countable. Explain the reasoning behind
your decision.
7. The set of zeros of polynomials of degree two or less with real coefficients.
8. The set of zeros of polynomials of degree two or less with algebraic coefficients.
9. The set of zeros of polynomials of any degree with coefficients that are all sums of rational numbers and integer powers of $\pi$.
10. The set of zeros of polynomials of any degree with coefficients that are rational powers of $\pi$.
11. The set of zeros of polynomials of any degree with coefficients that are all sums of rational numbers times integers, rational powers of $\pi$, rational powers of e, and rational powers of $e^{\pi}$. A sample coefficient is $\tfrac{2}{3} + 4\pi^{2/5} - 2e^{\pi}$.
12. The set of zeros of polynomials of any degree with coefficients that are all complex numbers with real part equal to a rational number and imaginary part equal to a rational number times an integer power of $\pi$.

Section XII.2: Function Spaces: A Specific Example of an Infinidimensional Vector Space


In order to fully understand the techniques required to analyze infinidimensional vector spaces,
it is useful to have a specific example. Consider the space of functions defined and continuous on the interval $[0,1]$. This is clearly a vector space over the real or complex number field, but it is not complete as it stands. There are sequences of continuous functions that converge to a function that is not continuous, as illustrated by the sequence $\{f_n(t)\}_{n=1}^{\infty}$, where



$$f_n(t) = \frac{4}{n\,(t - 1/2)^2 + 1}\,.$$
All of these functions are continuous and all are defined on the interval $t \in [0,1]$, but the limit of this sequence is a function that gives 0 whenever $t \neq 1/2$ and 4 when $t = 1/2$.
Despite this unfortunate fact, we can find a basis for this space using the Weierstrass
approximation theorem, named for the German mathematician Karl Weierstrass, who gave the first
fully rigorous definition of a limit and is thought of by many mathematicians as the father of
modern analysis. Weierstrass sought to find a polynomial approximation to every uniformly
continuous function defined on the interval $t \in [0,1]$ that was arbitrarily good. Given any $\epsilon > 0$, his goal was to find a polynomial $p(t)$ for which
$$\bigl| f(t) - p(t) \bigr| < \epsilon$$
for all $t \in [0,1]$. This is a very important statement, as the inequality holds for every $t \in [0,1]$, an

uncountable set, and Weierstrass contends that he can come arbitrarily close using a polynomial with a
countable number of coefficients. These coefficients themselves do not have to be in a countable
class, like the integers or the rationals, but there will be a countable number of them (n + 1, to be
exact) for any choice of polynomial degree n. The real property that allows us to close the deal is the
fact that Weierstrass requires the function f(t) to be uniformly continuous on the interval $t \in [0,1]$: given any $\epsilon > 0$, we must be able to find a $\delta > 0$ for which $|f(t) - f(\tau)| < \epsilon$ whenever $|t - \tau| < \delta$. The uniformity requirement indicates that this $\delta$ cannot depend on t. We must be able to find an interval size that satisfies the definition of continuity regardless of the value of t. Uniform continuity
is very important to many theorems on functional analysis, and will play a major, though possibly
unnoticed, role in the proof of Weierstrass theorem.
To prove his theorem, we first need to prove three lemmas. Lemmas are often used to make the
ultimate proof, especially its requirements, more easily understood. They are, in essence, small proofs that the full proof will depend on. Throughout this proof, we will use the notation
$$p_n(t;f) = \sum_{k=0}^{n} f\!\left(\frac{k}{n}\right)\binom{n}{k}\, t^{k}(1-t)^{n-k}.$$
We are trying to prove that, given any $\varepsilon > 0$, there is an $n$ large enough so that $\left|p_m(t;f) - f(t)\right| < \varepsilon$ whenever $m > n$. Lemma 1 states that this is true for the constant function $f(t) = 1$, and is easily seen to be true as
$$p_n(t;1) = \sum_{k=0}^{n}\binom{n}{k}\, t^{k}(1-t)^{n-k} = 1$$
for every value of $n \geq 0$. The second lemma states that it is true for the linear function $f(t) = t$, as can be seen by differentiating the general binomial expansion with respect to either of the variables:
$$(x+y)^{n} = \sum_{k=0}^{n}\binom{n}{k} x^{k} y^{n-k} \quad\Rightarrow\quad n(x+y)^{n-1} = \sum_{k=0}^{n}\binom{n}{k} k\, x^{k-1} y^{n-k}.$$
From this, we easily obtain
$$p_n(t;t) = \sum_{k=0}^{n}\binom{n}{k}\frac{k}{n}\, t^{k}(1-t)^{n-k} = \frac{t}{n}\sum_{k=0}^{n}\binom{n}{k} k\, t^{k-1}(1-t)^{n-k} = \frac{t}{n}\cdot n = t.$$
This clearly establishes the lemma. The third lemma is the most difficult. It states that the theorem is also true for the quadratic function $f(t) = t^{2}$. Note that it is obvious that this function can be expressed in terms of a polynomial; it is, in fact, a polynomial itself! The crucial question is whether or not it is approximated by the polynomials $p_n(t;t^{2})$. To establish this, we apply the operator $x\,d/dx$ twice to the binomial expansion:
$$\left(x\frac{d}{dx}\right)^{\!2}(x+y)^{n} = \sum_{k=0}^{n}\binom{n}{k} k^{2}\, x^{k} y^{n-k}$$
and
$$\left(x\frac{d}{dx}\right)^{\!2}(x+y)^{n} = x\frac{d}{dx}\Big[nx(x+y)^{n-1}\Big] = nx(x+y)^{n-1} + n(n-1)x^{2}(x+y)^{n-2}.$$
From this, we see that
$$p_n(t;t^{2}) = \sum_{k=0}^{n}\binom{n}{k}\left(\frac{k}{n}\right)^{\!2} t^{k}(1-t)^{n-k} = \frac{t}{n} + \left(1 - \frac{1}{n}\right)t^{2}.$$
This expression clearly tends to $t^{2}$ as $n \to \infty$, the difference being given by
$$p_n(t;t^{2}) - t^{2} = \frac{t(1-t)}{n} \leq \frac{1}{4n}.$$
This difference clearly tends to zero at every value of $t$ as $n$ tends to infinity, so Weierstrass will be satisfied for any given $\varepsilon > 0$.
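The three lemmas are easy to check numerically. The following sketch (ours, built only from the definition of $p_n(t;f)$ given above) constructs the polynomial for the three test functions and confirms each statement.

import numpy as np
from math import comb

def bernstein(f, n, t):
    """Evaluate p_n(t; f) = sum_k f(k/n) C(n,k) t^k (1-t)^(n-k)."""
    t = np.asarray(t, dtype=float)
    return sum(f(k / n) * comb(n, k) * t**k * (1 - t)**(n - k)
               for k in range(n + 1))

t = np.linspace(0, 1, 5)
n = 50
print(bernstein(lambda x: 1.0, n, t))            # lemma 1: identically 1
print(bernstein(lambda x: x, n, t) - t)          # lemma 2: identically 0
print(bernstein(lambda x: x**2, n, t)            # lemma 3: difference equals t(1-t)/n
      - t**2 - t * (1 - t) / n)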
The proof now comes like a whirlwind, with broad statements about the behavior of the
functions that are true, but that you may not have thought of. It goes like this:

Theorem: Weierstrass Approximation Theorem


Every function uniformly continuous on the closed interval $t \in [0,1]$ can be approximated arbitrarily well by a polynomial at every point $t \in [0,1]$.

Proof: We have stated that the function $f(t)$ is uniformly continuous, so given any $\varepsilon > 0$ we must be able to find a $\delta > 0$ for which
$$\left|f(t) - f\!\left(\frac{k}{n}\right)\right| < \frac{\varepsilon}{2} \quad\text{whenever}\quad \left|t - \frac{k}{n}\right| < \delta.$$
It also must be true that the function is bounded, so there exists a natural number $M$ for which $|f(t)| \leq M$ on the interval $t \in [0,1]$. The triangle inequality then implies
$$\left|f(t) - f\!\left(\frac{k}{n}\right)\right| \leq 2M$$
for all $k$ whenever $t$ lies in the interval. Therefore,
$$\left|p_n(t;f) - f(t)\right| = \left|\sum_{k=0}^{n}\left[f\!\left(\frac{k}{n}\right) - f(t)\right]\binom{n}{k} t^{k}(1-t)^{n-k}\right| \leq \sum_{k=0}^{n}\left|f\!\left(\frac{k}{n}\right) - f(t)\right|\binom{n}{k} t^{k}(1-t)^{n-k} \equiv S_1 + S_2$$
by lemma 1 and the triangle inequality. The term $S_1$ on the right contains the contributions for which $|t - k/n| < \delta$, and the term $S_2$ contains the rest. For $S_1$, we have
$$S_1 \leq \frac{\varepsilon}{2}\sum_{k=0}^{n}\binom{n}{k} t^{k}(1-t)^{n-k} = \frac{\varepsilon}{2}$$
by lemma 1, while for $S_2$ we have
$$S_2 = \sum_{|t-k/n|\geq\delta}\left|f\!\left(\frac{k}{n}\right) - f(t)\right|\binom{n}{k} t^{k}(1-t)^{n-k} \leq 2M\sum_{|t-k/n|\geq\delta}\binom{n}{k} t^{k}(1-t)^{n-k}$$
$$\leq \frac{2M}{\delta^{2}}\sum_{|t-k/n|\geq\delta}\left(t - \frac{k}{n}\right)^{\!2}\binom{n}{k} t^{k}(1-t)^{n-k} \leq \frac{2M}{\delta^{2}}\sum_{k=0}^{n}\left(t - \frac{k}{n}\right)^{\!2}\binom{n}{k} t^{k}(1-t)^{n-k}$$
$$= \frac{2M}{\delta^{2}}\left[t^{2} - 2t^{2} + \frac{t}{n} + \left(1 - \frac{1}{n}\right)t^{2}\right] = \frac{2M\,t(1-t)}{n\,\delta^{2}} \leq \frac{M}{2n\,\delta^{2}}.$$
The uniform continuity of $f(t)$ guarantees that $\delta$ does not depend on $t$, so choosing $n$ large enough that $M/(2n\delta^{2}) < \varepsilon/2$ gives
$$\left|p_n(t;f) - f(t)\right| < \varepsilon,$$
and the theorem is proved.

The proof of this theorem can be split into several pieces. First, we use the continuity of $f(t)$ to separate the sum into those contributions for which $k/n$ is close to $t$ and those for which it is not. The term close here refers to the distance $\delta$ for which the function $f(t)$ varies less than the amount $\varepsilon/2$. We need to use the uniform continuity of $f(t)$ to ensure that this term means the same thing for all values of $t$. The close contribution to the sum is bounded trivially by using the result from lemma 1. The not-close contribution, on the other hand, is much more delicate to bound. We cannot use the continuity of $f(t)$ because the values we are interested in here are not close to $t$. Instead, we use the fact that $f(t)$ is bounded in conjunction with the fact that the values of $k/n$ associated with this contribution lie farther from $t$ than the fixed number $\delta$ away. This gets rid of the absolute value and allows us to square out the remaining quadratic. Using the results of all three lemmas then allows us to perform the sum, once we recognize that adding the close contributions back again can only make the sum larger. Thankfully, the sum of these contributions is suppressed by a factor of $n$ in the denominator. The function $t(1-t)$ cannot get larger than $1/4$ on this interval, so we have a constant bound on this contribution. The number $\delta$ may be small, but it is certainly finite and independent of $n$. Likewise, the maximum value of $f(t)$, $M$, may be large, but it also is finite and independent of $n$. Choosing $n$ large enough so that $M/(2n\delta^{2}) < \varepsilon/2$ then guarantees that this contribution will be smaller than $\varepsilon/2$, so the full difference is less than $\varepsilon$, as was to be proved. Note that uniform continuity of $f(t)$ is required in order to have a value of $\delta$ that is independent of $t$. Continuity itself is defined only at a point, and not on an interval. Functions can certainly be continuous without being uniformly so. One very simple example is the function $1/t$ on the interval $t \in (0,1]$. This function is certainly continuous on this interval, but it is not uniformly so; the value of $\delta$ associated with a given $\varepsilon$ must be much smaller for small values of $t$ than it is for larger values. It is difficult to imagine a function that is continuous, but not uniformly so, on the closed interval $t \in [0,1]$. However, unless we can provide proof of this statement, we need to state this requirement explicitly. It is certainly needed, so no harm is done by including it, whether or not it is possible for functions to not satisfy this requirement. On the other hand, not including it in the proof will cause a problem unless proof is also given that it is automatically satisfied by the assumptions of the theorem.
The Weierstrass approximation theorem states that a polynomial can be found that comes arbitrarily close to any given uniformly continuous function $f(t)$, so it represents a proof that the space of uniformly continuous functions on the interval is spanned by the countable basis $\{t^{k}\}_{k=0}^{\infty}$. We can extend this proof quite simply to every bounded interval $t \in [a,b]$, so our result is that the power functions $t^{k}$ span the space of uniformly continuous functions on this interval. The basis is countable, so the space is countably infinidimensional.
In order to apply Bessel's inequality, we first need to endow this space with an inner product. There are many choices, but the most convenient for us at this point is the inner product
$$\langle f \mid g \rangle = \int_{0}^{1} f^{*}(t)\, g(t)\, dt.$$
We have allowed for the functions to be complex in this expression, but we will largely be concerned only with real functions for the moment. This inner product clearly satisfies all of the requirements of an inner product, as the linearity of the integral implies. The inner product of the difference between the function $f(t)$ and its Weierstrass approximation gives
$$\int_{0}^{1}\left|f(t) - p_n(t;f)\right|^{2} dt < \int_{0}^{1}\varepsilon^{2}\, dt = \varepsilon^{2}$$
whenever $n$ is large enough to make $\left|f(t) - p_n(t;f)\right| < \varepsilon$. Since we can always choose $n$ large enough to satisfy this inequality for any positive number $\varepsilon$, it must be true that
$$\lim_{n\to\infty}\int_{0}^{1}\left|f(t) - p_n(t;f)\right|^{2} dt = 0,$$
and we have
$$f(t) = \lim_{n\to\infty} p_n(t;f),$$
as far as the inner product is concerned.
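This convergence in the inner-product sense can be checked directly. The sketch below (ours; the Riemann-sum quadrature is the only assumption) computes the squared distance between $\sin \pi t$ and its Weierstrass approximation for increasing degree.

import numpy as np
from math import comb

def bernstein(f, n, t):
    """p_n(t; f) = sum_k f(k/n) C(n,k) t^k (1-t)^(n-k)."""
    return sum(f(k / n) * comb(n, k) * t**k * (1 - t)**(n - k)
               for k in range(n + 1))

f = lambda t: np.sin(np.pi * t)
t = np.linspace(0, 1, 20001)
dt = t[1] - t[0]
for n in [5, 20, 80]:
    err2 = np.sum((f(t) - bernstein(f, n, t)) ** 2) * dt   # ~ integral of |f - p_n|^2
    print(n, err2)
# The squared L^2 distance shrinks toward zero as n grows, which is exactly the
# statement f = lim p_n(t; f) "as far as the inner product is concerned".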


There is a very important subtlety in using this inner product, as the third property requires $\langle f \mid f \rangle = 0$ only when $f = 0$. As discussed above, this requirement is excessively harsh when considering functions of the continuous variable $t$. This definition of the inner product automatically discounts functions that are zero at all points except a few, as these few points do not contribute any area. As long as there is not an interval on which $f(t) \neq 0$, the function will be considered as the
same as zero by this inner product. A technical discussion of these facts requires the concept of a
measure and a technically different kind of integral called a Lebesgue integral. Essentially, Lebesgue
integrals give priority to the value of the function rather than the value of the independent variable t.
When doing a normal Riemann integral, one imagines adding up the value of the function times the
interval over which this function is constant. Functions that are continuous, but not constant are
considered by taking a limit as the interval of constancy goes to zero. This leads to the familiar idea of
a Riemann sum and its limit, the Riemann integral. In contrast, the Lebesgue integral asks first about
the value of the function. To each possible value of the function, the Lebesgue process assigns a
measure. This is essentially the width of the region of $t$ for which $f(t)$ takes this value, but it can be generalized in ways that the Riemann integral's notion of width cannot. The real line is uncountable, so any countable set of numbers has zero measure. In order for the value of a function to attain finite

measure, the function must take this value on an uncountable number of values of t. The same
limiting process then takes hold, but this time it is focused on the values of the function rather than
those of t. The advantage of the Lebesgue process is that it allows one to consider integrals of
discontinuous functions without additional disclaimers. The function f (t) that takes the value 1 if t is
irrational and 0 otherwise has Lebesgue integral
$$\int_{0}^{1} f(t)\, dt = 1$$
because the set of irrational numbers, like the real line, is uncountable, and the measure of the interval $[0,1]$ is its length, 1. The rationals can essentially be thrown out of this process because they have
zero measure. It is not possible to define a Riemann integral for this function because its
discontinuities are too vast.
Using the idea of Lebesgue integration, we can state that all functions that are nonzero only on
a set of measure zero will be identified with 0 by the inner product defined above. One may wonder
why this matters, as the function space we are interested in contains only uniformly continuous
functions. Among these functions identified with 0 by the inner product, only the zero function itself
is in this space. On the other hand, the space we are considering is not complete. There are functions
that are given by an infinite linear combination of the basis vectors that do not belong to this space.
Consider, for example, the sequence of functions
$$f_n(t) = \frac{1}{e^{\,n(t - 1/2)} + 1}.$$
Each of these functions is definitely continuous on the interval $t \in [0,1]$, so each can definitely be written as a linear combination of the basis vectors $t^{k}$ as far as the inner product is concerned. On the other hand, this sequence of functions converges as $n \to \infty$ to the discontinuous function $f(t)$ that equals 1 when $t < 1/2$, 0 when $t > 1/2$, and 1/2 when $t = 1/2$. This function does not lie in the space, despite the fact that it can be given as a convergent infinite linear combination of the basis vectors as far as the inner product is concerned. This leads to a contradiction, as the basis $\{t^{k}\}_{k=0}^{\infty}$ is supposed

to span the space, and only the space. The space it spans is actually larger than this space, and
includes every discontinuous function that can be approximated arbitrarily well on a set of finite
measure by a continuous function.
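The sketch below (an illustration of ours, using the sequence $f_n$ just defined) measures the squared inner-product distance between the continuous functions and the discontinuous step they approach.

import numpy as np

def f(n, t):
    """Continuous sequence f_n(t) = 1 / (exp(n (t - 1/2)) + 1) on [0, 1]."""
    z = np.clip(n * (t - 0.5), -700, 700)   # avoid overflow warnings in exp
    return 1.0 / (np.exp(z) + 1.0)

step = lambda t: np.where(t < 0.5, 1.0, 0.0)   # the discontinuous pointwise limit

t = np.linspace(0, 1, 200001)
dt = t[1] - t[0]
for n in [10, 100, 1000]:
    dist2 = np.sum((f(n, t) - step(t)) ** 2) * dt   # squared L^2 distance
    print(n, dist2)
# The squared distance falls roughly like 1/n: the continuous functions converge,
# in the inner-product sense, to a limit that is not itself in the space.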
This is a very important subtlety in the analysis of infinidimensional vector spaces, as we need
to add the additional requirement that a linear combination converges in order to consider it a valid
linear combination of the basis vectors. Adding the additional requirement that it converges to a
vector in the space is sort of strange, as it is not easy to characterize this requirement and it would
seem contrived if we did. A function space that has convergent linear combinations that do not
converge to a vector in the space is said to be incomplete. We can still think of an incomplete space as
a vector space as long as we add the word finite to our idea of linear combination. The space is
given by the set of all finite linear combinations of the basis vectors. Unfortunately, this requirement
does not allow the power functions t k to span the whole space of uniformly continuous functions.
The function sin t is certainly uniformly continuous, but it is not a polynomial so cannot be written as
a finite linear combination of the basis vectors t k . As an alternative, we can enlarge our space to
include all those functions that convergent linear combinations of the basis vectors converge to. This
process is called completing the space. The completion of the space of uniformly continuous functions


on the interval $t \in [0,1]$ is called $L^{2}[0,1]$. It contains every¹⁸² function defined on this interval whose norm is finite, that is, every function $f(t)$ for which
$$\int_{0}^{1}\left|f(t)\right|^{2} dt < \infty.$$
This space is often called the space of square-integrable functions defined on $t \in [0,1]$ for this reason. Our result is that the complete space of square-integrable functions is spanned by the basis $\{t^{k}\}_{k=0}^{\infty}$.
¹⁸² Well, not quite every; there will be a discussion below.
Now that we have a basis, let's see if we can find the specific linear combination associated with a given function. We can, if we like, use the construction put forth in the proof of Weierstrass's theorem. Taking $f(t) = \sin \pi t$, we see that the $n$th polynomial approximation to $f(t)$ is given by
$$p_n(t;\sin\pi t) = \sum_{k=0}^{n}\sin\!\left(\frac{\pi k}{n}\right)\binom{n}{k} t^{k}(1-t)^{n-k}.$$
This function is graphed for $n = 20$ in figure 1, where it is clear that the approximation is fairly close to the function. We can also look at the approximations to the discontinuous function $u(t) = 0$ for $t < 1/2$ and $u(t) = 1$ for $t > 1/2$. The value of $u(1/2)$ is unimportant, as it represents a set of measure zero. This function is illustrated in figure 2 along with its approximation with $n = 30$.
[Figure 1: $\sin \pi t$ together with its approximation $p_{20}(t;\sin\pi t)$.]
[Figure 2: the step function $u(t)$ together with its approximation with $n = 30$.]

The Weierstrass approximation definitely converges to these functions as $n$ grows without bound, as shown in the proof of Weierstrass's theorem, but the character of the linear combination changes with the value of $n$. It is not at all obvious how we could determine the ultimate coefficient of $t$ from this result, as this coefficient changes each time we change $n$. We cannot apply the Bessel inequality to this combination either, as the basis $\{t^{k}\}_{k=0}^{\infty}$ is not orthonormal with respect to our chosen inner

product. We can, if we like, use the Gram-Schmidt process to orthogonalize this basis. The first four
polynomials in this orthonormal basis are given by
$$b_0(t) = 1$$
$$b_1(t) = \sqrt{12}\left(t - \tfrac{1}{2}\right)$$
$$b_2(t) = \sqrt{180}\left(t^{2} - t + \tfrac{1}{6}\right)$$
$$b_3(t) = \sqrt{2800}\left(t^{3} - \tfrac{3}{2}t^{2} + \tfrac{3}{5}t - \tfrac{1}{20}\right).$$
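These polynomials can be generated directly. The sketch below (ours, done symbolically with sympy) runs Gram-Schmidt on the monomials under the inner product $\langle f \mid g\rangle = \int_0^1 f\,g\,dt$.

import sympy as sp

t = sp.symbols('t')

def inner(f, g):
    """Inner product <f|g> = integral of f*g over [0, 1]."""
    return sp.integrate(f * g, (t, 0, 1))

# Gram-Schmidt on the monomials 1, t, t^2, t^3 under this inner product.
basis = []
for k in range(4):
    v = t**k
    for b in basis:
        v -= inner(b, v) * b                      # remove the component along b
    v = sp.expand(v / sp.sqrt(inner(v, v)))       # normalize
    basis.append(v)

for b in basis:
    print(b)
# Same polynomials as listed in the text, up to how the radicals are displayed.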

The coefficients of a linear combination for any given function in terms of these orthonormal
polynomials are clearly given by
$$c_k = \int_{0}^{1} f(t)\, b_k(t)\, dt,$$

so we have the polynomial approximations
$$\sin\pi t \;\approx\; \frac{2}{\pi}\, b_0(t) + \frac{2\sqrt{5}\left(\pi^{2} - 12\right)}{\pi^{3}}\, b_2(t)$$
$$u(t) \;\approx\; \frac{1}{2}\, b_0(t) + \frac{\sqrt{3}}{4}\, b_1(t) - \frac{\sqrt{7}}{16}\, b_3(t).$$
Squaring the coefficients and adding allows us to verify Bessel's inequality:
$$\left(\frac{2}{\pi}\right)^{\!2} + \left(\frac{2\sqrt{5}\left(\pi^{2} - 12\right)}{\pi^{3}}\right)^{\!2} = 0.499701968\ldots \;\leq\; \frac{1}{2} = \int_{0}^{1}\sin^{2}\!\pi t\, dt$$
$$\left(\frac{1}{2}\right)^{\!2} + \left(\frac{\sqrt{3}}{4}\right)^{\!2} + \left(\frac{\sqrt{7}}{16}\right)^{\!2} = \frac{119}{256} = 0.46484375 \;\leq\; \frac{1}{2} = \int_{0}^{1} u^{2}(t)\, dt.$$
These results indicate that the approximation is quite good for $\sin\pi t$, but not so good for $u(t)$. This is illustrated in figures 3 and 4, which show the functions along with these approximations. Clearly,
the approximation to the sine function is extremely good despite the fact that this approximation is
only of third degree. The approximation does not fare so well for the discontinuous function u(t), but
it is clearly approaching the function. Adding more terms will cause the approximation to get better
and better, but it will never be able to reproduce the discontinuous nature of this function because all
of the approximation functions are continuous. As we include more and more terms in the expansion,
the approximation will get closer and closer to the function everywhere except in the immediate
vicinity of the discontinuity. This vicinity gets smaller and smaller as we include more and more terms
in the expansion, however, ultimately becoming a set of measure zero to be thrown away by the inner
product.
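The coefficients and the two Bessel sums quoted above are easy to reproduce by quadrature. The sketch below (ours; the sampling grid and helper names are assumptions of the sketch) uses the orthonormal polynomials $b_k$ listed earlier.

import numpy as np

# Orthonormal polynomials on [0, 1] from the text.
b = [
    lambda t: np.ones_like(t),
    lambda t: np.sqrt(12) * (t - 0.5),
    lambda t: np.sqrt(180) * (t**2 - t + 1/6),
    lambda t: np.sqrt(2800) * (t**3 - 1.5*t**2 + 0.6*t - 0.05),
]

t = np.linspace(0, 1, 400001)
dt = t[1] - t[0]

def coeffs(f):
    """c_k = integral of f(t) b_k(t) dt, by a simple Riemann sum."""
    return [np.sum(f(t) * bk(t)) * dt for bk in b]

for f, norm2 in [(lambda t: np.sin(np.pi * t), 0.5),
                 (lambda t: (t > 0.5).astype(float), 0.5)]:
    c = coeffs(f)
    print(np.round(c, 4), "Bessel sum:", round(sum(ck**2 for ck in c), 6), "<=", norm2)
# Expected output: roughly [0.6366, 0, -0.3073, 0] with sum 0.4997 for sin(pi t),
# and [0.5, 0.4330, 0, -0.1654] with sum 0.4648 for the step function.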
[Figure 3: $\sin\pi t$ with its third-degree orthonormal polynomial approximation.]
[Figure 4: the step function $u(t)$ with its third-degree orthonormal polynomial approximation.]

Something interesting happens when we continue to add terms to the expansion for u(t). The
approximation function does get closer and closer to the function, as illustrated in figure 5, but a
strange bump develops near the discontinuity that does not appear to go away as more terms are
included. The polynomials of degree 5, 10, 15, and 20 are shown along with the step function in
figures 5, 6, 7, and 8, respectively. The bump is clear, and it gets thinner as we include more terms
but does not go away. This bump is called the Gibbs phenomenon, after the American chemist,
physicist, and mathematician Josiah Willard Gibbs who studied it extensively in the late nineteenth
century, and is common to all expansions of continuous orthogonal basis vectors in function spaces
that contain discontinuous functions. We can think of this as the continuous polynomials rejecting the
idea of having to approximate a discontinuous function. It can be shown that the overshoot on either
side of the discontinuity tends to approximately 18% of the size of the discontinuity as n tends to
infinity, where the size of the discontinuity is taken as the jump from equilibrium, or half the total
jump. We will show this result later on in the context of an easier basis of continuous functions for the
space L2 . Note that the Weierstrass approximation does not exhibit this phenomenon because its
coefficients are allowed to change when n is increased. This is not the case with orthogonal basis
vectors, so our approximations cannot directly be compared with each other. When plotting only a

finite combination of basis vectors, we are implicitly ignoring all of the contributions that contain
terms like t and t 2 in the remaining basis vectors that were not included. Despite this fact, we can rest
assured that the approximation associated with a finite number of orthogonal basis vectors is the best
allowed by those vectors under the inner product we have associated with the space. This has already
been established above. The Weierstrass approximation cannot make such a claim, even though it is
definitely a good approximation when n is large. To see the difference between these two
approximation schemes, it is instructive to compare the square of the difference between the
approximation and the actual function for a given polynomial degree. This comparison is shown in
figure 9 for polynomial approximations of degree 10. Although the orthogonal polynomial
approximation contains wiggles away from the discontinuity, its approach to the function in the
vicinity of the discontinuity more than makes up for these small indiscretions. It is clear that the area
under the wiggly curve is less than that under the more uniform Weierstrass curve.
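The persistent overshoot can be estimated numerically. The sketch below (ours) uses a least-squares fit in the Legendre basis, which is equivalent to the orthogonal projection onto polynomials of the given degree, and tracks the maximum excess above the upper value of the step.

import numpy as np
from numpy.polynomial import legendre as L

# Expand the step function u(t) on [0, 1] in shifted Legendre polynomials and
# watch the Gibbs overshoot near the jump.
t = np.linspace(0, 1, 200001)
x = 2 * t - 1                        # map [0, 1] onto [-1, 1]
u = (t > 0.5).astype(float)

for deg in [5, 10, 15, 20, 40]:
    coef = L.legfit(x, u, deg)       # discrete least squares ~ L^2 projection
    approx = L.legval(x, coef)
    print(deg, round(approx.max() - 1.0, 4))
# The excess above 1 does not shrink to zero; it settles near 0.09 for this unit
# jump, i.e. roughly 18% of the half-jump, as quoted in the text.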
[Figure 5: the step function with its degree-5 orthogonal polynomial approximation. Figure 6: degree 10. Figure 7: degree 15. Figure 8: degree 20.]
[Figure 9: the squared difference between the step function and its degree-10 approximations, orthogonal polynomial versus Weierstrass.]
Before moving on to the next topic, it is instructive to look at the orthogonal polynomials
obtained above in order to discern the properties of orthogonal functions. This is important because
we have no obvious way to think about what the term orthogonal means when applied to functions. It
is easy to think of this term as meaning perpendicular when applied to vectors in $\mathbb{R}^{n}$, but what can it
possibly mean for two functions to be perpendicular? The idea of orthogonal functions is obviously
associated with the inner product used to define orthogonal. Our inner product requires that the
functions change sign at different places so that the integral of their product will be zero. This is
certainly borne out by the graph of the first five of our orthogonal polynomials illustrated in figure 10.
This alternating behavior of the zeros is not only exhibited by polynomials; we will find below that

functions that are orthogonal under this inner product must alternate their zeros, at least those that do
not coincide. There are many, many sets of orthogonal functions under this and other inner products.
The Gram-Schmidt process is a somewhat cumbersome way to make sets of orthogonal functions, as
this process can be extremely complicated even for ten vectors. Using it to orthogonalize an infinite
number of vectors obviously has feasibility issues. For this and other reasons, we usually obtain
orthogonal sets of functions from other sources. We turn to these sources now.
[Figure 10: the first five orthonormal polynomials on $[0,1]$, showing how their zeros alternate.]
Exercises for Section XII.2:

In problems 1 through 6, find the first four coefficients in the orthonormal polynomial expansion of the given function on the interval $t \in [0,1]$. Use the polynomials given in the text. You may use a computer algebra system to aid your computation of the integrals, but clearly explain what you are doing at each step. Show that Bessel's inequality is satisfied and that this orthogonal polynomial expansion gives a square integral closer to the exact result than that obtained by the Weierstrass approximation of degree 3. Use graphs to explain how good these approximations are to the actual function, and comment on the behavior of these approximations in the vicinity of a discontinuity in the function or its derivative. The function $\theta(t)$ is the Heaviside step function.
1. cos t

2.

et 1
e 1

5. t 2 1 2 t 2 t 2 t 1 2

3. t 1 2 t

4. t 1 3 2 3 t

6. t 2 2t 2 t 1 2

7. Determine the first four orthonormal polynomials on the interval $t \in [-1,1]$. These are called the Legendre polynomials, and have immense use in physical applications.
8. Find the first four terms of a Legendre polynomial expansion of the following functions. Show that Bessel's inequality is satisfied, and also that the orthogonal polynomial approximation comes closer to the exact square integral than the Weierstrass approximation. Include graphs in your analysis, and comment on how well the orthogonal polynomial expansion and the Weierstrass approximation approximate the exact graph.
(b) et
(a) cos t sin t
(c) (t )

(d) 1 2 t t 1 2


Section XII.3: Self-Adjoint Transformations

Our study of orthogonal bases for a vector space of finite dimension in the last set of notes led
us to consider self-adjoint, or symmetric, or Hermitian matrices.183 There, we found that the
eigenvectors and eigenvalues associated with such transformations had a very special set of properties.
The most important of these are that all of the eigenvalues are real, eigenvectors associated with
distinct eigenvalues are orthogonal, and the eigenvectors definitely span the space on which the
transformation acts. Several of these properties were proved in the last chapter using the abstract
notation defining the inner product and the adjoint transformation, so we expect these to carry over to
infinidimensional vector spaces without incident. Others will be a little more tricky to establish in
infinitely many dimensions, as infinidimensional vector spaces are associated with some subtleties.
For this reason, it will be instructive to go back through the derivation of these properties in the
infinidimensional case. We will begin with the easy results that can be derived fairly quickly. After
we have these results, we will go back through the analysis to establish the subtleties that are absent in
our finite-dimensional considerations but can occur in the infinidimensional case.
Given a transformation $A$ mapping an infinidimensional vector space $V$ to itself, we define the adjoint of $A$, written $A^{\dagger}$, as the transformation satisfying
$$\langle y \mid A\,x \rangle = \langle A^{\dagger} y \mid x \rangle,$$
or
$$\langle y \mid A\,x \rangle = \langle x \mid A^{\dagger} y \rangle^{*}$$
for all vectors $x, y \in V$. If the vector space we are interested in has countable dimension, then we
can always associate the transformation $A$ with an infinite matrix by considering its action on a set of orthonormal basis vectors $\{\,|v_k\rangle\,\}_{k=1}^{\infty}$:
$$A\,|v_k\rangle = a_{jk}\,|v_j\rangle,$$
where the index $j$ is implicitly summed over using Einstein's summation convention. Taking the inner product of this with $|v_\ell\rangle$ gives the matrix element
$$\langle v_\ell \mid A \mid v_k \rangle = a_{\ell k}$$
of the transformation $A$ with respect to the basis $\{\,|v_k\rangle\,\}_{k=1}^{\infty}$. Taking the complex conjugate of this expression gives us the matrix elements of the adjoint transformation:
$$\langle v_\ell \mid A \mid v_k \rangle^{*} = a_{\ell k}^{*} = \langle v_k \mid A^{\dagger} \mid v_\ell \rangle \quad\Rightarrow\quad \left(A^{\dagger}\right)_{k\ell} = a_{\ell k}^{*}.$$
Therefore, the adjoint is given by the complex conjugate of the transpose, exactly as in the finite dimensional case.
Suppose that we are able to find a transformation $A$ that is Hermitian in the sense that its adjoint is equal to itself, $A^{\dagger} = A$. If this transformation has eigenvalues and eigenvectors,
$$A\,|v_k\rangle = \lambda_k\,|v_k\rangle,$$
then we must have
$$\langle v_k \mid A \mid v_k \rangle = \lambda_k \langle v_k \mid v_k \rangle.$$
183

This last term is in honor of the French mathematician Charles Hermite, who studied Hermitian matrices and operators
extensively in the latter part of the nineteenth century. Hermitian operators later became the foundation on which quantum
mechanics is built, leading directly to the interpretation of observables and many prominent inequalities such as the
Heisenberg uncertainty principle.


Taking the complex conjugate of this result and using the properties of the inner product, we have
$$\lambda_k^{*}\langle v_k \mid v_k \rangle = \langle v_k \mid A \mid v_k \rangle^{*} = \langle v_k \mid A^{\dagger} \mid v_k \rangle = \langle v_k \mid A \mid v_k \rangle = \lambda_k\langle v_k \mid v_k \rangle.$$
Therefore, either $|v_k\rangle$ is the zero vector or $\lambda_k = \lambda_k^{*}$. Since the zero vector is disallowed from being an eigenvector, it must be that the eigenvalue is real. Thus, Hermitian operators cannot have eigenvalues that are not real. Taking the inner product of our eigenvalue equation with another eigenvector gives
$$\langle v_j \mid A \mid v_k \rangle = \lambda_k \langle v_j \mid v_k \rangle.$$
Taking the complex conjugate of this equation gives
$$\lambda_k \langle v_k \mid v_j \rangle = \langle v_j \mid A \mid v_k \rangle^{*} = \langle v_k \mid A^{\dagger} \mid v_j \rangle = \langle v_k \mid A \mid v_j \rangle = \lambda_j \langle v_k \mid v_j \rangle,$$
or
$$\left(\lambda_k - \lambda_j\right)\langle v_k \mid v_j \rangle = 0.$$
The eigenvalues are either the same or the eigenvectors are orthogonal. Thus, any two eigenvectors of
a Hermitian operator that do not have the same eigenvalue are necessarily orthogonal. These results
are exactly the same as those found in the finite-dimensional case, but there are many subtleties. Our
proof requires not only that there exist self-adjoint operators A, but also that these operators have
eigenvectors. Neither of these results is obvious in an infinidimensional space, and it will turn out that
both of them require very special restrictions that are not required in the finite-dimensional case. If, on
the other hand, we are able to find an operator that satisfies these requirements, its eigenvalues must be
real and eigenvectors associated with different eigenvalues must be orthogonal. The other nice
property we hope to find is that the eigenvectors of Hermitian operators will span the space. This
property is the most difficult of all to establish, and the additional requirements we must impose on a
Hermitian operator in order to be sure that its eigenvectors span the space are not easily stated. We
will have more to say about this point later on.
Consider the first result, that the adjoint of an operator is given by its complex conjugate transpose. We must be very careful using this result, especially in the infinidimensional case, because it has been derived assuming that we have an orthonormal basis under the inner product of the space. This assumption is easily overlooked in finitely many dimensions because there always is a trivial orthonormal basis that we can use: the e-basis. This basis is certainly orthonormal under the standard inner product in $\mathbb{R}^{n}$, but we have to be very careful in finding a basis that is orthonormal under this
new integral inner product in function spaces. Consider, for example, the three dimensional space of
polynomials of degree 2 or less. The set $\{1, t, t^{2}\}$ clearly forms a basis for this space, but it is not orthonormal under the integral inner product, as
$$\langle t \mid t^{2} \rangle = \int_{0}^{1} t \cdot t^{2}\, dt = \frac{1}{4} \neq 0.$$

Ignoring this fact for the moment, we can consider the action of the linear derivative operator on this space. Writing the element $a_2 t^{2} + a_1 t + a_0$ as
$$\begin{pmatrix} a_0 \\ a_1 \\ a_2 \end{pmatrix},$$
the action of the derivative can easily be seen to be the same as the action of the matrix
$$\frac{d}{dt} \;\to\; \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix}$$
on our vector representation. Despite the fact that this transformation certainly takes any vector in our space to the appropriate derivative of this vector, we cannot simply take this matrix as the derivative operator in this basis because we have not accounted for the inner product in our analysis. To fully represent this operator in this basis, we need to find a matrix $M$ that satisfies
$$\left\langle q \,\Big|\, \frac{d}{dt}\, p \right\rangle = \int_{0}^{1} q(t)\,\frac{d}{dt} p(t)\, dt = \begin{pmatrix} b_0 & b_1 & b_2 \end{pmatrix} M \begin{pmatrix} a_0 \\ a_1 \\ a_2 \end{pmatrix}$$
for all polynomials $p(t) = a_0 + a_1 t + a_2 t^{2}$ and $q(t) = b_0 + b_1 t + b_2 t^{2}$. This matrix can be seen to be given by
$$M = \begin{pmatrix} 0 & 1 & 1 \\ 0 & 1/2 & 2/3 \\ 0 & 1/3 & 1/2 \end{pmatrix}$$

by doing the integral and pulling out the coefficients. Note that the matrix elements of $M$ correspond exactly to the inner product of one of our basis vectors with the derivative of another. The 2,3 component, for example, is given by
$$M_{23} = \left\langle t \,\Big|\, \frac{d}{dt}\, t^{2} \right\rangle = \int_{0}^{1} t\,\frac{d}{dt}\!\left(t^{2}\right) dt = \int_{0}^{1} 2t^{2}\, dt = \frac{2}{3}.$$
This matrix works perfectly well for inner products, but it does not correspond to the matrix that
accomplishes this transformation in the space itself. These two matrices will only correspond with
each other when working in an orthonormal basis.
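As a sanity check on the matrix quoted above, here is a small symbolic sketch (ours, using sympy) that builds $M_{jk} = \langle t^{\,j-1} \mid \tfrac{d}{dt}\, t^{\,k-1}\rangle$ directly from the integral inner product.

import sympy as sp

t = sp.symbols('t')
basis = [sp.Integer(1), t, t**2]     # the basis {1, t, t^2}

# M_jk = <basis_j | d/dt basis_k> = integral over [0,1] of basis_j * (basis_k)'
M = sp.Matrix(3, 3, lambda j, k:
              sp.integrate(basis[j] * sp.diff(basis[k], t), (t, 0, 1)))
print(M)      # Matrix([[0, 1, 1], [0, 1/2, 2/3], [0, 1/3, 1/2]])
print(M.T)    # real entries, so the adjoint is just the transpose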
As the matrix $M$ acts on vectors in $\mathbb{R}^{3}$, its adjoint is given by the complex conjugate of its transpose:
$$M^{\dagger} = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 1/2 & 1/3 \\ 1 & 2/3 & 1/2 \end{pmatrix}.$$

How is this transformation related to the derivative operator itself? We can answer this question by a
judicious use of integration by parts:
$$\left\langle q \,\Big|\, \frac{d}{dt}\, p \right\rangle = \int_{0}^{1} q(t)\,\frac{d}{dt}p(t)\, dt = \int_{0}^{1}\frac{d}{dt}\big[q(t)p(t)\big]\, dt - \int_{0}^{1} p(t)\,\frac{d}{dt}q(t)\, dt = \Big[q(t)p(t)\Big]_{0}^{1} - \left\langle \frac{d}{dt}\, q \,\Big|\, p \right\rangle.$$
In order to identify the adjoint operator, we need to write this final expression in the form
$$\left\langle q \,\Big|\, \frac{d}{dt}\, p \right\rangle = \Big[q(t)p(t)\Big]_{0}^{1} - \int_{0}^{1} p(t)\,\frac{d}{dt}q(t)\, dt \equiv \int_{0}^{1}\left[\left(\frac{d}{dt}\right)^{\!\dagger} q\right]\!(t)\; p(t)\, dt.$$
Part of the derivative's adjoint is obviously given by the operator $-d/dt$, but the full adjoint operator also requires us to consider the boundary term $\big[q(t)p(t)\big]_{0}^{1}$. It is not obvious how we can write this
contribution as an integral over the polynomial p (t ) times the action of an operator on the polynomial
q(t ) , as it is associated with a set of measure zero that will be thrown out by the inner product with
any normal operator.

We can avoid this embarrassment for the time being by defining the Dirac delta function $\delta(t)$.
This function was first introduced by the British physicist Paul Dirac as a convenient notation in his
1930 book The Principles of Quantum Mechanics. We have already seen it in our treatment of
impulses acting on physical systems given in section 9.4. It is extremely useful in many fields,
especially quantum mechanics and differential equations, but took a long time for mathematicians to
rigorously justify. Mathematically, it is more properly thought of as a distribution rather than a
function. A distribution is a construct that can only be defined fully by its behavior on functions under
the action of an integral. In the case of the Dirac delta function, we define it as the distribution $\delta(t)$ that satisfies
$$\int_{a}^{b} f(t)\,\delta(t)\, dt = \begin{cases} f(0) & \text{if } 0 \in (a,b) \\ 0 & \text{if } 0 \notin [a,b]. \end{cases}$$
This is obviously a problem for Lebesgue integration, as the single point t = 0 represents a set of
measure zero. The integral cannot depend only on the value of the function at this point unless
something very strange is happening with the construct $\delta(t)$. To see this clearly, consider that the only value of the function $f(t)$ that matters is its value at $t = 0$. In order to orchestrate this property, it must be true that $\delta(t) = 0$ for all nonzero values of $t$. If it were continuous, then this would also imply that $\delta(0) = 0$ and the integral would be zero for all intervals $[a,b]$ and functions $f(t)$, in violation of its definition. Similarly, a finite value of $\delta(0)$ will cause the contribution at $t = 0$ to be thrown out as a set of measure zero by the integral. The only option left is for the value of $\delta(0)$ to be infinite in such a way that the integral of $\delta(t)$ over any interval containing 0 is exactly 1. This idea of
a function that takes the value 0 everywhere except at t = 0, where it is unbounded, flies in the face of
every well-defined set of functions. There are, however, sequences of well-defined functions that
converge, in some sense, to the delta function. The sequences
n ; t 1 2n ,1 2n
n (t )
0 ; t 1 2n ,1 2n
and

n (t )

e nt

both satisfy
b

lim n (t ) f (t ) dt f (0) ,
n a

for any function f (t) that is continuous at t = 0, so can be said to converge to the delta function in this
sense. We must be careful with this idea, however, as neither of these sequences actually converges.
The square of the norm of the difference of two elements of the first sequence is given by
$$\big\|\delta_n - \delta_m\big\|^{2} = \int\big(\delta_n(t) - \delta_m(t)\big)^{2}\, dt = n + m - 2n = m - n$$

whenever m > n and the interval contains the entire region over which either function is nonzero. This
sequence therefore does not satisfy the Cauchy convergence criterion introduced in the expansion
notes, and cannot be thought of as actually converging. Despite this fact, the integrals of these
sequences of functions do converge. As with all similar distributions, the appearance of the delta
function implies that the expression is ultimately intended to appear under an integral. Naked delta
functions that are not intended to be integrated are not well-defined. This is also made clearer by the
rigorous mathematical treatment of the delta function, which associates this object with the Lebesgue
measure of the integral rather than with a function. The delta function is understood as giving unit
623

Advanced Mathematical Techniques

measure to the single value t = 0 and zero measure to all other values of t. In this respect, it is
mathematically more proper to write
$$\int f(t)\, d\delta(t)$$
rather than
$$\int f(t)\,\delta(t)\, dt.$$

This notational difference is almost always ignored in applications, as it is more familiar to use the
latter notation, but it should always be clear that this is what is meant by our notation.
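The "convergence under an integral" just described is easy to see numerically. The sketch below (ours, using the rectangular sequence $\delta_n$ defined above) integrates $\delta_n(t)f(t)$ against a smooth test function and watches the value approach $f(0)$.

import numpy as np

def delta_n(n, t):
    """Rectangular approximation: n on (-1/(2n), 1/(2n)), zero elsewhere."""
    return np.where(np.abs(t) < 1.0 / (2 * n), float(n), 0.0)

f = np.cos                      # a smooth test function with f(0) = 1
t = np.linspace(-1, 1, 2_000_001)
dt = t[1] - t[0]

for n in [4, 16, 64, 256]:
    print(n, np.sum(delta_n(n, t) * f(t)) * dt)
# The integrals approach f(0) = 1, even though the sequence delta_n itself has
# no limit in the norm: ||delta_n - delta_m||^2 = m - n grows without bound.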
Along with his definition of the delta function, Dirac gives the following properties that can be
verified quite simply using the definition:
$$\delta(-t) = \delta(t)$$
$$\delta(at) = \frac{1}{|a|}\,\delta(t)$$
$$\delta(t - a)\, f(t) = \delta(t - a)\, f(a)$$
$$\delta\!\left(t^{2} - a^{2}\right) = \frac{1}{2|a|}\Big[\delta(t - a) + \delta(t + a)\Big]$$
$$\delta(t - a)\,\delta(t - b) = \delta(t - a)\,\delta(a - b).$$
As an example, consider the second property. In order to assess the meaning of $\delta(at)$, we consider its action under an integral with a test function $f(t)$:
$$\int_{-1}^{1}\delta(at)\, f(t)\, dt = \frac{1}{a}\int_{-a}^{a}\delta(u)\, f\!\left(\frac{u}{a}\right) du = \frac{1}{a}\, f(0)$$
if $a > 0$, while
$$\int_{-1}^{1}\delta(at)\, f(t)\, dt = \frac{1}{a}\int_{-a}^{a}\delta(u)\, f\!\left(\frac{u}{a}\right) du = \frac{1}{|a|}\int_{-|a|}^{|a|}\delta(u)\, f\!\left(\frac{u}{a}\right) du = \frac{1}{|a|}\, f(0)$$
if $a < 0$. Therefore, the action of this distribution is the same as that of the distribution $\delta(t)/|a|$. We are
integrating from -1 to 1 only to ensure that 0 lies in the interval of integration. If it does not, then the
integral is clearly zero in both cases. The second-to-last property is a bit tricky, but can easily be
determined using this one. First, we recognize that the action of $\delta\!\left(t^{2} - a^{2}\right)$ under an integral with a test function is given by
$$\int_{-2a}^{2a}\delta\!\left(t^{2} - a^{2}\right) f(t)\, dt = \int_{-2a}^{2a}\delta\big((t-a)(t+a)\big)\, f(t)\, dt.$$
Now, the properties of the delta function imply that the only values of $t$ that matter are those for which the argument of the delta function is zero. This occurs when $t = a$ or $t = -a$. Splitting the integral into two pieces, and thinking of $a$ as positive, we are led to
$$\int_{-2a}^{2a}\delta\!\left(t^{2} - a^{2}\right) f(t)\, dt = \int_{-2a}^{0}\delta\big((t-a)(t+a)\big)\, f(t)\, dt + \int_{0}^{2a}\delta\big((t-a)(t+a)\big)\, f(t)\, dt.$$
The only important contributions to the first integral lie near $t = -a$, while the only important ones to the second integral lie near $t = a$. For this reason, we can write
$$\int_{-2a}^{2a}\delta\!\left(t^{2} - a^{2}\right) f(t)\, dt = \int_{-2a}^{0}\delta\big(\!-2a(t+a)\big)\, f(t)\, dt + \int_{0}^{2a}\delta\big(2a(t-a)\big)\, f(t)\, dt = \frac{1}{2|a|}\Big[f(-a) + f(a)\Big]$$
in accordance with the stated identity. A more general statement of this identity is

$$\delta\big(g(t)\big) = \sum_{r}\frac{1}{\left|g'(r)\right|}\,\delta(t - r),$$
where the sum is taken over all roots r of the function g(t). If the derivative is zero, as in the case of
degenerate roots, then the distribution is ill-defined. The importance of the Dirac delta function in
many scientific and mathematical fields cannot be overstated. Standard results in these fields can often
only be clearly understood with a full understanding of the properties of the delta function.
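Two of the properties above can be checked numerically by standing in for $\delta$ with the Gaussian sequence defined earlier. This sketch is our own illustration; the test function and the parameter values are arbitrary choices.

import numpy as np

def delta_n(n, t):
    """Gaussian approximation to the delta function: sqrt(n/pi) exp(-n t^2)."""
    return np.sqrt(n / np.pi) * np.exp(-n * t**2)

n = 1_000_000
t = np.linspace(-3, 3, 2_000_001)
dt = t[1] - t[0]
f = lambda t: np.cos(t) + t        # arbitrary smooth test function
a = 1.3

# delta(a t) should act like delta(t)/|a|, i.e. give f(0)/|a|.
print(np.sum(delta_n(n, a * t) * f(t)) * dt, f(0) / abs(a))

# delta(t^2 - a^2) should give [f(-a) + f(a)] / (2|a|).
print(np.sum(delta_n(n, t**2 - a**2) * f(t)) * dt, (f(-a) + f(a)) / (2 * abs(a)))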
Continuing our discussion of the adjoint of the derivative operator, we see that this operator is
given by
$$\left(\frac{d}{dt}\right)^{\!\dagger} = -\frac{d}{dt} + \delta(t - 1) - \delta(t).$$

You should verify for yourself that the action of this operator on q(t) does in fact give the required
inner product with p(t). Using our finite-dimensional representation of the derivative operator and its
adjoint on the space of polynomials with degree less than or equal to 2, we find that the combination of
delta functions is given by the matrix
$$\delta(t - 1) - \delta(t) \;\leftrightarrow\; M + M^{\dagger} = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}.$$
This is certainly true, as
$$\begin{pmatrix} b_0 & b_1 & b_2 \end{pmatrix}\begin{pmatrix} 0 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}\begin{pmatrix} a_0 \\ a_1 \\ a_2 \end{pmatrix} = \big(b_0 + b_1 + b_2\big)\big(a_0 + a_1 + a_2\big) - a_0 b_0 = \Big[q(t)p(t)\Big]_{0}^{1},$$

but our derivation of this expression for the adjoint of the derivative is more general than just the
special case of polynomials with degree smaller than 3. The expression
$$\left(\frac{d}{dt}\right)^{\!\dagger} = -\frac{d}{dt} + \delta(t - 1) - \delta(t)$$
is valid for any space of differentiable functions associated with the inner product defined on the interval $t \in [0,1]$. If the functions are considered on the interval $t \in [a,b]$ instead, then this expression is replaced by
$$\left(\frac{d}{dt}\right)^{\!\dagger} = -\frac{d}{dt} + \delta(t - b) - \delta(t - a).$$

I point this out in order to emphasize the dependence an adjoint operator has on the inner product
associated with the space. In order to determine the adjoint of a given operator, we must first establish
the inner product under which this adjoint is defined. This subtlety is often overlooked in applications,
as the inner product is usually defined in some obvious manner with the interval of integration clear.
However, it is important to understand that the idea of an adjoint is deeply associated with the
properties of the inner product. The complications associated with the definition of the adjoint
operator lead to severe restrictions on whether or not a transformation even has an adjoint that maps a
given vector space to itself, and these will turn out to be exactly the restrictions necessary to yield the
obvious inner product associated with these applications.
The adjoint of the derivative operator, as determined above, does not represent a mapping of
the space of continuous functions onto itself. Its action on the polynomial
$$q(t) = 3t^{2} - 2t + 4,$$
for example, gives the expression
$$\left(\frac{d}{dt}\right)^{\!\dagger} q(t) = 5\,\delta(t - 1) - 4\,\delta(t) - 6t + 2.$$
This function is certainly not continuous, as it contains infinite values at both endpoints of the
integration. These infinite contributions come directly from the boundary contribution to the
integration by parts we used above. The only way out of this mess is to establish a condition on the
elements of our space that forces this expression to vanish, so we require
$$q(1)\,p(1) = q(0)\,p(0)$$
for all functions $q$ and $p$ in our space. This must be true for all functions in our space, so we require these functions to satisfy the additional requirement that
$$p(1) = p(0).$$

This requirement does not mean that the naked operator $(d/dt)^{\dagger}$ will map our space to itself, as the infinite contributions are still there. When appearing under an integral with a test function that is in the space, however, these contributions can be completely ignored:
$$\int_{0}^{1}\big[\delta(t - 1) - \delta(t)\big]\, q(t)\, p(t)\, dt = q(1)\,p(1) - q(0)\,p(0) = 0.$$
As far as the inner product is concerned, we can think of the adjoint of the derivative operator simply as
$$\left(\frac{d}{dt}\right)^{\!\dagger} = -\frac{d}{dt}.$$

The derivative operator is certainly not Hermitian when acting on the space of continuous functions defined on the interval $[0,1]$ under the boundary condition $f(0) = f(1)$. It does not even map this space to itself, as there are continuous functions that are not differentiable. We cannot even consider the smaller space of functions that are differentiable on the interval $[0,1]$, as there are differentiable functions whose derivatives are not differentiable. The derivative operator does map the space of functions defined on $[0,1]$ that are infinitely many times differentiable on this interval, though, and we can force this operator to be Hermitian by multiplying by $-i$:184
$$k = -i\,\frac{d}{dt}$$
is a Hermitian operator that maps the space of infinitely many times differentiable functions defined on the interval $[0,1]$ that satisfy the boundary condition $f(0) = f(1)$ onto itself.185 Thus, we have found
the interval 0,1 that satisfy the boundary condition f (0) f (1) onto itself.185 Thus, we have found
a Hermitian operator acting on an infinidimensional vector space. Does it have any eigenvalues? In
this specific case, eigenfunctions of this operator are functions satisfying the above conditions that also
satisfy
d
i f (t ) f (t ) .
dt
This is a separable differential equation that can easily be solved:
f (t ) Ceit .

184

Multiplication by i would also have worked, but there are reasons associated with quantum mechanics why this choice is
preferred.
185
Actually, there are functions satisfying this boundary condition that map to other functions that do not satisfy the
boundary condition; we will ignore this technicality for the moment as it only represents a set of measure zero.


The appearance of the imaginary unit $i$ in this analysis forces us to consider complex functions, so we will be more careful with the complex conjugation from now on. This solution in no way requires the eigenvalues to be real. The boundary condition imposes this requirement, as we need
$$f(1) = C\,e^{i\lambda} = f(0) = C,$$
so
$$\lambda = 2\pi n\;;\qquad n \in \mathbb{Z}.$$
The boundary condition not only requires the eigenvalues to be real, but also quantizes them to a countable set with a minimum difference of $2\pi$ between them. This is common to essentially all infinidimensional vector spaces: the eigenvalue requirement acting alone does not force the eigenvalues to be real. Only the imposition of the boundary conditions leads to this requirement, as it is this imposition that makes the operator Hermitian. Whenever the dimension of a vector space is countable, the imposition of the boundary condition will also result in a countable number of eigenvalues. The boundary condition's contribution is also clear when establishing that the eigenfunctions associated with distinct eigenvalues are orthogonal. The inner product of the eigenfunctions $f_n(t)$ and $f_m(t)$ is given by
$$\langle f_m \mid f_n \rangle = \int_{0}^{1} e^{-2\pi i m t}\, e^{2\pi i n t}\, dt = \left.\frac{e^{2\pi i (n-m)t}}{2\pi i (n-m)}\right|_{0}^{1} = 0$$
when $n \neq m$ because of the boundary condition. When $n = m$, we have
$$\langle f_n \mid f_n \rangle = \int_{0}^{1} e^{-2\pi i n t}\, e^{2\pi i n t}\, dt = 1.$$

Therefore, this set of eigenfunctions represents an orthonormal set.
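Both the eigenvalue equation and the orthonormality are quick to verify symbolically. The sketch below (ours, using sympy and two arbitrary integers) does exactly that.

import sympy as sp

t = sp.symbols('t', real=True)
n, m = 3, 5                           # any two distinct integers will do
f = lambda k: sp.exp(2 * sp.pi * sp.I * k * t)

# Eigenvalue check: -i d/dt acting on f_n returns 2*pi*n times f_n.
print(sp.simplify(-sp.I * sp.diff(f(n), t) / f(n)))      # 6*pi

# Orthonormality under <f|g> = integral of conj(f)*g over [0, 1].
inner = lambda a, b: sp.integrate(sp.conjugate(a) * b, (t, 0, 1))
print(sp.simplify(inner(f(n), f(n))), sp.simplify(inner(f(m), f(n))))   # 1, 0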


Do these eigenfunctions span the space? To answer this question, it is useful to first consider the question of whether or not a general Hermitian operator on an infinidimensional space admits eigenvectors. Consider a Hermitian operator $A$ mapping an inner product space $V$, finite or infinidimensional, to itself in such a way that the quadratic expression $\langle x \mid A \mid x \rangle$ attains a local extremum over all $|x\rangle \in V$ satisfying $\langle x \mid x \rangle = 1$. Let's call the vector achieving the extremum $|x_1\rangle$. The statement that $|x_1\rangle$ represents a local extremum means that the addition of the vector $\varepsilon|y\rangle$, with $\varepsilon$ small and $|y\rangle$ normalized and orthogonal to $|x_1\rangle$, must leave the quadratic expression stationary, i.e.
$$\frac{d}{d\varepsilon}\left.\frac{\langle x_1 + \varepsilon y \mid A \mid x_1 + \varepsilon y \rangle}{1 + \varepsilon^{2}}\right|_{\varepsilon=0} = 0.$$
The denominator of this expression normalizes the vector $|x_1\rangle + \varepsilon|y\rangle$. Taking the derivative easily gives $\langle y \mid A \mid x_1 \rangle = 0$, so $A|x_1\rangle$ is orthogonal to all directions orthogonal to $|x_1\rangle$. Taking $|x_1\rangle$ as an element of an orthonormal basis for $V$, this statement is equivalent to $A|x_1\rangle = \lambda_1|x_1\rangle$, i.e., $|x_1\rangle$ is an eigenvector of $A$, since none of the other elements of the basis can contribute to $A|x_1\rangle$. Therefore, any Hermitian transformation for which $\langle x \mid A \mid x \rangle$ attains a local extremum must have an eigenvector. Removing this
eigenvector from our space, we can continue this process to assert the existence of another
eigenvector. Using this procedure, we can assert the existence of eigenstates until the assumption that
the quadratic expression attains a local extremum breaks down.
The Hermiticity assumption allows us to assert that the eigenspace of A is orthogonal to the rest
of the space, as we can definitely consider eigenvectors with distinct eigenvalues as a part of an

orthonormal basis for the space. The action of $A$ on any of the other elements of this basis cannot generate elements of the eigenspace because
$$\langle v_k \mid A \mid v_j \rangle = \langle A\,v_k \mid v_j \rangle = \lambda_k\langle v_k \mid v_j \rangle = 0$$
whenever $|v_k\rangle$ is an element of the eigenspace and $|v_j\rangle$ is a vector orthogonal to this eigenspace.

Therefore, the transformation A has no choice but to map the orthogonal complement of the eigenspace
to itself. This is the reason why we can repeatedly apply the above analysis: the remaining part of the
space is mapped to itself, so satisfies the requirements of the theorem. The only way in which a
Hermitian operator can fail to have an eigenvector is for the quadratic expression $\langle x \mid A \mid x \rangle$ not to
achieve a local extremum in the orthogonal complement to its eigenspace. In the absence of this
failure, we can continue to find eigenvectors until we have run out of basis elements. This condition
therefore also represents the condition required to declare that the eigenvectors span the space.
There are, unfortunately, Hermitian operators that do not satisfy this requirement. The operator
$t$, for example, is clearly Hermitian as
$$\langle q \mid t\, p \rangle = \int_{0}^{1} q^{*}(t)\, t\, p(t)\, dt = \int_{0}^{1}\big(t\,q(t)\big)^{*} p(t)\, dt = \langle t\, q \mid p \rangle$$
for all functions $p(t)$ and $q(t)$.


The quadratic expression for normalized functions in the space $V = \mathrm{span}\left\{e^{2\pi i n t}\right\}_{n=0}^{\infty}$ gives
$$\langle x \mid t \mid x \rangle = \sum_{j}\sum_{k} c_j^{*} c_k\,\langle v_j \mid t \mid v_k \rangle = \frac{1}{2} + \frac{1}{\pi}\sum_{k > j}\frac{\operatorname{Im}\!\left(c_j^{*} c_k\right)}{k - j}.$$
Try to show this result. We write x as a linear combination of the basis elements, then do the
resulting integrals using integration by parts. The remaining sum is then broken into two pieces: those
for which $j = k$, and those for which $j \neq k$. The sum over the former of these terms gives $1/2$ because the norm of $|x\rangle$ is 1, and the sum over the latter is re-arranged so that $j$ is always smaller than $k$. The imaginary part of $c_j^{*} c_k$ is given by
$$\operatorname{Im}\!\left(c_j^{*} c_k\right) = -\frac{i}{2}\left(c_j^{*} c_k - c_k^{*} c_j\right).$$

Our expression for the quadratic is somewhat strange, as it implies that $\langle x \mid t \mid x \rangle = 1/2$ whenever all of the coefficients are real or all of the coefficients are imaginary. The only way to deviate from $1/2$ is to have fully complex coefficients. Writing $c_j = a_j + i b_j$, with $a_j$ and $b_j$ real, gives
$$\langle x \mid t \mid x \rangle = \frac{1}{2} + \frac{1}{\pi}\sum_{k > j}\frac{a_j b_k - a_k b_j}{k - j}.$$
To consider extremizing this expression with respect to the coefficients under the requirement that
$$\langle x \mid x \rangle = \sum_{k} c_k^{*} c_k = \sum_{k=0}^{\infty}\left(a_k^{2} + b_k^{2}\right) = 1,$$

we could assume that such an extremum exists and use the method of Lagrange multipliers to
characterize the possible solution set. This is not needed in this case, however, as we can show quite
generally that the operator t cannot have any eigenvectors in the space L2 .
To see this, suppose we were able to find a function $f(t;\lambda)$ such that
$$t\, f(t;\lambda) = \lambda\, f(t;\lambda).$$
This obviously implies that
$$(t - \lambda)\, f(t;\lambda) = 0 \quad\text{for all } t,$$
so our function $f(t;\lambda) = 0$ whenever $t \neq \lambda$. If $f(t;\lambda)$ is continuous, then it must be the zero function. The zero function is disallowed from eigenvector status, so this cannot be the case. If $f(t;\lambda)$ is not the zero function, then it must have a nonzero norm and must also be able to be normalized so that this norm is equal to 1, i.e.
$$f(t;\lambda) = 0 \;\text{ whenever }\; t \neq \lambda \qquad\text{and}\qquad \int_{0}^{1}\left|f(t;\lambda)\right|^{2} dt = 1.$$
These two properties are enough to associate $f(t;\lambda)$ with the delta function:
$$f(t;\lambda) = \delta(t - \lambda).$$
The delta function is not a function because it is not defined at $t = \lambda$, so it cannot be in the space of square-integrable functions $L^{2}$. Note that this analysis does not really require $(t - \lambda)\,f(t;\lambda)$ to equal 0 for every value of $t$, only that the integral of the square of this function from 0 to 1 be zero. Remember that our idea of what it means to be 0 is inextricably linked to our choice of inner product. Any vector whose norm is zero is considered to be zero. The function $(t - \lambda)\,f(t;\lambda)$ may deviate from zero, but it can only do so on a set of zero measure. These deviations are washed out by our
inner product, so do not change the above analysis.
Our analysis has been quite general. It implies that the operator t cannot have any
eigenvectors in the space of square-integrable functions L2 . The failure of the operator t to attain an
eigenvector in this space is ultimately linked to its continuous nature. The appearance of the delta function above indicates this continuity, as does our statement that $(t - \lambda)\,f(t;\lambda) = 0$ implies $f(t;\lambda) = 0$ whenever $t \neq \lambda$. There is only a single point that satisfies equality, so there are uncountably many values of $t$ that do not satisfy it. This boxes us in and sets the stage for the appearance of the delta function. We can, in some sense, say that the operator $t$ has the continuous set of eigenvalues $[0,1]$, but this statement is something of an abuse of notation because none of these
presumed eigenvalues is associated with an eigenvector in the space. For this reason, mathematicians
often refer to such operators as having a continuous spectrum.
We can understand the failure of the operator $t$ to exhibit an extremum for the quadratic expression $\langle x \mid t \mid x \rangle$ in $L^{2}$ as being related to the fact that our sum expressions for it given above
contain denominators that are unbounded. This generates the continuity of its spectrum and disallows
any eigenvalues. The problem is difficult to approach in this manner, and proving the set of conditions
an operator must satisfy in order for us to be sure that its eigenvectors span the space is notoriously
difficult; it is arguably one of the most difficult and important theorems in all of functional analysis. I
will not present the proof here, as it is extremely involved and contains a lot of notation that I would
only need to introduce for this purpose, but some of the results can be found in chapter 15. The final
result is that all Hermitian operators having a discrete spectrum will have eigenvectors that definitely
span the space. The operator t has a continuous spectrum, so does not satisfy these requirements.
The operator $-i\,d/dt$, on the other hand, has a discrete spectrum. We have found all of the allowable eigenvalues, so this result indicates that the eigenvectors must span the space. According to our above analysis, the space we refer to is the completion of the space of continuous square-integrable functions, $L^{2}$.

The general proof of the above result is very difficult, and will be accomplished from a very different perspective in chapter 15, but we can prove it more easily in the specific case of the operator $-i\,d/dt$. We will do this in a very roundabout manner, beginning with a very strange task: let us try to represent the delta function $\delta(t - \tau)$ in terms of our eigenbasis. We must, of course, ultimately fail to do so because the delta function is not a function. We cannot have a set of continuous functions actually converge to it. Nevertheless, it is instructive to try. The best approximation to the delta function available has coefficients given by
$$c_k = \int_{0}^{1} e^{-2\pi i k t}\,\delta(t - \tau)\, dt = e^{-2\pi i k \tau}.$$
Therefore, the best available approximation including values of $k$ from $-N$ to $N$ for some large integer $N$ is given by
$$\delta_N(t) = \sum_{k=-N}^{N} e^{2\pi i k (t - \tau)} = \sum_{j=0}^{2N} e^{2\pi i (j - N)(t - \tau)} = e^{-2\pi i N (t - \tau)}\,\frac{e^{2\pi i (2N+1)(t - \tau)} - 1}{e^{2\pi i (t - \tau)} - 1}$$
$$= \frac{e^{i\pi(2N+1)(t - \tau)} - e^{-i\pi(2N+1)(t - \tau)}}{e^{i\pi(t - \tau)} - e^{-i\pi(t - \tau)}} = \frac{2i\sin\!\big((2N+1)\pi(t - \tau)\big)}{2i\sin\!\big(\pi(t - \tau)\big)} = \frac{\sin\!\big((2N+1)\pi(t - \tau)\big)}{\sin\!\big(\pi(t - \tau)\big)}.$$
Work through this analysis. We first change the summation variable $k$ to $j$ so the sum will start at 0 instead of $-N$. Then, we recognize the sum as a geometric series and sum it. After that, we manipulate the exponentials to create the sine functions. The remaining phase cancels, leaving us with the result.
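The closed form can be checked against the direct sum. The sketch below (ours; it takes $\tau = 0$ so the kernel is a function of $x = t - \tau$ alone) compares the two on a few sample points.

import numpy as np

def kernel_sum(N, x):
    """Direct sum over k = -N..N of exp(2*pi*i*k*x)."""
    k = np.arange(-N, N + 1).reshape(-1, 1)
    return np.exp(2j * np.pi * k * x).sum(axis=0).real

def kernel_closed(N, x):
    """Closed form sin((2N+1)*pi*x) / sin(pi*x) derived above."""
    return np.sin((2 * N + 1) * np.pi * x) / np.sin(np.pi * x)

x = np.linspace(0.01, 0.99, 7)     # stay away from x = 0, where sin(pi x) vanishes
N = 20
print(np.max(np.abs(kernel_sum(N, x) - kernel_closed(N, x))))   # ~1e-12
print(kernel_sum(N, np.array([1e-9])))                          # ~2N + 1 near x = 0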
Our result for $\delta_N(t)$ is a very famous function. It approaches the limit $2N + 1$ as $t \to \tau$, so it becomes unbounded in this vicinity as $N$ grows without bound. The square of its norm is also $2N + 1$, so the function becomes impossible to normalize as $N$ grows without bound. Away from this
vicinity, it oscillates more and more wildly as N grows. It does not go to zero, as can be seen by
searching for its maximum values. We can do this by taking the derivative with respect to t and asking
it to be zero:
$$\frac{d}{dt}\,\frac{\sin\!\big((2N+1)\pi(t-\tau)\big)}{\sin\!\big(\pi(t-\tau)\big)} = \pi\,\frac{(2N+1)\cos\!\big((2N+1)\pi(t-\tau)\big)\sin\!\big(\pi(t-\tau)\big) - \cos\!\big(\pi(t-\tau)\big)\sin\!\big((2N+1)\pi(t-\tau)\big)}{\sin^{2}\!\big(\pi(t-\tau)\big)}$$
$$= \pi\,\frac{2N\cos\!\big((2N+1)\pi(t-\tau)\big)\sin\!\big(\pi(t-\tau)\big) - \sin\!\big(2N\pi(t-\tau)\big)}{\sin^{2}\!\big(\pi(t-\tau)\big)} \;\xrightarrow[\;N\to\infty\;]{}\; \frac{2N\pi\cos\!\big((2N+1)\pi(t-\tau)\big)}{\sin\!\big(\pi(t-\tau)\big)}.$$
In considering large values of N, we ignore the second contribution to the derivative. This expression
will be zero whenever $(2N+1)(t - \tau) = n + \tfrac{1}{2}$ for an integer $n$, and the value of our function at these locations is given by
$$\frac{\sin\!\big((2N+1)\pi(t-\tau)\big)}{\sin\!\big(\pi(t-\tau)\big)} = \frac{\sin\!\big(\pi\left(n + \tfrac{1}{2}\right)\big)}{\sin\!\Big(\pi\left(n + \tfrac{1}{2}\right)\!\big/(2N+1)\Big)} = \frac{(-1)^{n}}{\sin\!\Big(\pi\left(n + \tfrac{1}{2}\right)\!\big/(2N+1)\Big)} \approx \frac{(-1)^{n}\,(2N+1)}{\pi\left(n + \tfrac{1}{2}\right)} = \frac{(-1)^{n}}{\pi\,(t - \tau)}.$$
Thus, the maxima fall off as the inverse of the distance to $\tau$. This formula is slightly small when the number of oscillations becomes comparable to $N$, as our statement that the sine function in the

denominator can be replaced with its argument is no longer valid. Figure 11 illustrates the function $\delta_{20}(t)$ with $\tau = 1/3$, along with its two envelope functions $\pm 1/\big(\pi(t - 1/3)\big)$. It is clear from the illustration that the oscillations are quite wild away from $t = 1/3$, and that they fall off approximately as the envelope functions indicate they should. It is also clear that the maximum at $t = 1/3$ far exceeds any of the other maxima.
[Figure 11: $\delta_{20}(t)$ for $\tau = 1/3$, together with its envelope functions $\pm 1/\big(\pi(t - 1/3)\big)$.]
The function $\delta_N(t)$ does not approach our conception of the delta function as a function that equals zero everywhere except at one location, where it is unbounded, but we must remember that this is not the definition of the delta function. The real question is whether or not
$$\lim_{N\to\infty}\int_{0}^{1} f(t)\,\delta_N(t)\, dt = f(\tau)$$
for any suitable test function $f(t)$. For $N$ suitably large, we can re-write this integral by re-arranging our expression for $\delta_N(t)$. Writing $u = (2N+1)(t - \tau)$, we have
$$\delta_N(t) = \frac{\sin\!\big((2N+1)\pi(t - \tau)\big)}{\sin\!\big(\pi(t - \tau)\big)} = \frac{\sin \pi u}{\sin\!\big(\pi u/(2N+1)\big)} \;\xrightarrow[\;N\to\infty\;]{}\; (2N+1)\,\frac{\sin \pi u}{\pi u} = (2N+1)\,\mathrm{sinc}\,u.$$
In taking the limit, we are saying that the argument of the sine function in the denominator goes to zero, so the sine function can be replaced with its argument in light of the fact that
$$\lim_{x\to 0}\frac{\sin x}{x} = 1.$$
This is cheating in some sense, as $u/(2N+1) = t - \tau$ does not go to zero as $N$ is increased. However,

we can take this result quite literally whenever $t - \tau \sim O(1/N)$. If $t$ does not satisfy this requirement, then the oscillations in $\delta_N(t)$ take over. It is clear from our above analysis that every maximum is followed by a minimum at a distance of $1/(2N+1)$ away. These minima and maxima have approximately the same magnitude, so they effectively cancel each other out in the integral and do not ultimately contribute. We can construct a continuous test function that avoids this issue for any given value of $N$, but any fixed function $f(t)$ will ultimately see this result as $N$ grows without bound. For this reason, we will ignore contributions to the integral that lie farther from $\tau$ than $O(1/N)$ away and
use the approximation given above. The integral is then given by
$$\lim_{N\to\infty}\int_{0}^{1} f(t)\,\delta_N(t)\, dt = \lim_{N\to\infty}\int_{-\tau(2N+1)}^{(1-\tau)(2N+1)} (2N+1)\, f\!\left(\tau + \frac{u}{2N+1}\right)\mathrm{sinc}\,u\;\frac{du}{2N+1} = \lim_{N\to\infty}\int_{-\tau(2N+1)}^{(1-\tau)(2N+1)} f\!\left(\tau + \frac{u}{2N+1}\right)\mathrm{sinc}\,u\; du.$$

As $N$ grows without bound, we can replace this expression with
$$\lim_{N\to\infty}\int_{0}^{1} f(t)\,\delta_N(t)\, dt = \int_{-\infty}^{\infty} f(\tau)\,\mathrm{sinc}\,u\; du = f(\tau)\int_{-\infty}^{\infty}\mathrm{sinc}\,u\; du,$$
as long as the test function $f(t)$ is continuous and $\tau$ is neither 0 nor 1. The sinc function is defined via
$$\mathrm{sinc}\,u = \frac{\sin \pi u}{\pi u},$$
and its integral from negative infinity to infinity is given by 1. You should show this; there are many ways, several of which are discussed in the Gamma function notes. Thus, we arrive at the result that
$$\lim_{N\to\infty}\int_{0}^{1} f(t)\,\delta_N(t)\, dt = f(\tau)$$
for all continuous test functions f (t).
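This limiting behavior is easy to watch numerically. The sketch below (ours; the test function and the value of $\tau$ are arbitrary choices) integrates $f(t)\,\delta_N(t)$ for increasing $N$.

import numpy as np

def dirichlet(N, t, tau):
    """Kernel sin((2N+1)*pi*(t - tau)) / sin(pi*(t - tau)), with the removable
    singularity at t = tau filled in by its limiting value 2N + 1."""
    x = t - tau
    out = np.full_like(x, 2 * N + 1.0)
    nz = np.abs(x) > 1e-12
    out[nz] = np.sin((2 * N + 1) * np.pi * x[nz]) / np.sin(np.pi * x[nz])
    return out

f = lambda t: t**2 + np.cos(3 * t)     # a smooth test function
tau = 1 / 3
t = np.linspace(0, 1, 2_000_001)
dt = t[1] - t[0]

for N in [10, 100, 1000]:
    print(N, np.sum(f(t) * dirichlet(N, t, tau)) * dt, "->", f(tau))
# The integrals approach f(tau) as N grows, as the argument above predicts.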


The above is definitely not a rigorous proof, as we have not included discontinuous functions in our analysis. Discontinuous functions can be defined in such a way that their nonzero locations correspond exactly to the maxima of the function $\delta_N(t)$. However, the nature of this function is such that these can only be attained on sets of measure zero as $N$ grows without bound. The extrema of $\delta_N(t)$ form a countable set, so the support of such functions is of zero measure on the real line.
The proof of these statements is fairly involved, as it includes the definitions of countable, support
and zero measure. On the other hand, we have definitely shown that all square-integrable continuous functions satisfy this property. The discontinuous functions contained in the space $L^2$ can, by
definition, be obtained as limits of sequences of continuous functions. For any reasonably
discontinuous function, we can count the different values it takes everywhere except where it is
represented by a continuous function. This allows us to begin a process of approximating the function
better and better by a sequence of continuous functions. There are functions that are so discontinuous
that they cannot be represented by a sequence of continuous functions, but these must, by definition,
take different values on an uncountable set. We cannot easily even define such a function, as the very
act of specifying different values on an uncountable set in a manner that is not continuous almost
always makes the specified set countable. The remaining, uncountable set must be assigned a value,
and it is difficult to specify this value in such a way that it is discontinuous. Every set of rules we
make is countable, so all function spaces consisting of functions satisfying this set of rules can be
approximated arbitrarily well in terms of our inner product. In any case, the functions we will be interested in for applications are always continuous enough for us to approximate them arbitrarily well with continuous functions. Therefore, we can consider $\lim_{N\to\infty}\delta_N(t-\tau)$ as the delta function for all intents and purposes.


What does this have to do with the eigenvectors spanning the space? Consider an arbitrary
function f (t) that lies in L2 and does not display the kinds of nasty discontinuities described in the
previous paragraph. It is certainly true that
$$f(t) = \int_0^1 f(x)\,\delta(x-t)\,dx$$
everywhere except at its discontinuities. Replacing this with


$$f(t) = \lim_{N\to\infty}\int_0^1 f(x)\,\delta_N(x-t)\,dx,$$
and then with


$$f(t) = \lim_{N\to\infty}\int_0^1 f(x)\sum_{k=-N}^{N} e^{2\pi i k (t-x)}\,dx,$$
we have a wonderful result. The integral and the sum can certainly be exchanged, as the sum is finite
for fixed N. This gives
$$f(t) = \lim_{N\to\infty}\sum_{k=-N}^{N} e^{2\pi i k t}\int_0^1 f(x)\,e^{-2\pi i k x}\,dx = \lim_{N\to\infty}\sum_{k=-N}^{N}\langle v_k | f\rangle\,v_k,$$
explicitly proving that the eigenvectors span the space.


Interestingly, we can also verify the Bessel identity. The sum is given by
$$\lim_{N\to\infty}\sum_{k=-N}^{N}\left|c_k\right|^2 = \lim_{N\to\infty}\sum_{k=-N}^{N}\int_0^1 dx\,f^*(x)\,e^{2\pi i k x}\int_0^1 dy\,f(y)\,e^{-2\pi i k y} = \lim_{N\to\infty}\sum_{k=-N}^{N}\int_0^1\!\!\int_0^1 dx\,dy\,f^*(x)f(y)\,e^{2\pi i k (x-y)}.$$
Exchanging the finite sum and the integral then gives


$$\lim_{N\to\infty}\sum_{k=-N}^{N}\left|c_k\right|^2 = \lim_{N\to\infty}\int_0^1\!\!\int_0^1 dx\,dy\,f^*(x)f(y)\sum_{k=-N}^{N}e^{2\pi i k(x-y)} = \lim_{N\to\infty}\int_0^1\!\!\int_0^1 dx\,dy\,f^*(x)f(y)\,\delta_N(x-y) = \int_0^1 dx\,f^*(x)f(x) = \langle f | f\rangle.$$

This is very strong evidence, and I hope that it is clear to you how important the delta function is in this type of analysis, despite the fact that it does not belong to $L^2$ and is not a function. The idea is essentially that if a sequence of functions can approximate a function as crazy as the delta function, it must be able to approximate any function. To put it more clearly, the delta function allows one to express any given function in terms of an integral of the form we use to define our inner product. The ability to approximate the delta function therefore paves the way to the ability to approximate any of these functions, as the function can be expressed in terms of the delta function and the delta function can be expressed in terms of these basis functions.
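To make this spanning statement concrete, here is a small numerical sketch of my own (the test function $e^{t}$ is an arbitrary choice that does not satisfy the periodic boundary conditions). It computes the coefficients $c_k = \int_0^1 f(t)\,e^{-2\pi i k t}\,dt$, checks that $\sum_k |c_k|^2$ creeps up toward $\langle f | f\rangle$, and evaluates the partial sums at an interior point.

```python
import numpy as np
from scipy.integrate import quad

f = lambda t: np.exp(t)

def c(k):
    # c_k = <v_k | f> = integral_0^1 f(t) exp(-2*pi*i*k*t) dt
    re, _ = quad(lambda t: f(t) * np.cos(2 * np.pi * k * t), 0.0, 1.0)
    im, _ = quad(lambda t: -f(t) * np.sin(2 * np.pi * k * t), 0.0, 1.0)
    return complex(re, im)

norm_sq, _ = quad(lambda t: f(t) ** 2, 0.0, 1.0)      # <f|f> = (e^2 - 1)/2
t0 = 0.3                                              # interior evaluation point
for N in (4, 16, 64):
    ck = {k: c(k) for k in range(-N, N + 1)}
    bessel = sum(abs(v) ** 2 for v in ck.values())
    fN = sum(v * np.exp(2j * np.pi * k * t0) for k, v in ck.items()).real
    print(N, bessel, fN)
print("<f|f> =", norm_sq, "  f(t0) =", float(f(t0)))
```

The Bessel sum closes in on $\langle f|f\rangle$ quickly, while the pointwise value converges more slowly because $e^{t}$ violates the periodic boundary conditions, a point taken up in the examples that follow.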
Let's explore the properties of this basis in a series of examples. Consider first the function $f(t) = t$. The coefficients of its expansion are easily obtained by a single integration by parts:
$$c_k = \int_0^1 t\,e^{-2\pi i k t}\,dt = \frac{i}{2\pi k}\ \ (k \neq 0);\qquad c_0 = \frac{1}{2}.$$
Adding up the finite series, we have
$$f_N(t) = \frac{1}{2} + \sum_{k=1}^{N}\frac{i}{2\pi k}\left(e^{2\pi i k t} - e^{-2\pi i k t}\right) = \frac{1}{2} - \frac{1}{\pi}\sum_{k=1}^{N}\frac{\sin 2\pi k t}{k}.$$
This function is plotted along with the function t in figure 12 for N = 2, 5, 10, and 20. Note the
appearance of the Gibbs phenomenon both at t = 0 and at t = 1. This occurs because the functions, as
constructed, must satisfy the periodic boundary conditions at these two locations. This requirement
arose from our desire to make the derivative function self-adjoint, so is non-negotiable. Any other
boundary conditions that cause the boundary term to vanish will do, but there must be such boundary
conditions. A periodic function that equals t from t = 0 to t = 1 exhibits discontinuities of unit size at
both 0 and 1, as it needs to repeat itself beyond these points. The appearance of the Gibbs phenomenon
indicates that we are rudely trying to approximate a function that does not satisfy the boundary
conditions that we imposed on our basis functions. Our basis functions will do it, as can easily be
verified by the Bessel inequality:
$$\sum_{k=-N}^{N}\left|c_k\right|^2 = \frac{1}{4} + 2\sum_{k=1}^{N}\frac{1}{4\pi^2 k^2} \xrightarrow[\ N\to\infty\ ]{} \frac{1}{4} + \frac{\zeta(2)}{2\pi^2} = \frac{1}{4} + \frac{1}{12} = \frac{1}{3} = \int_0^1 t^2\,dt,$$
but they complain by bucking away from the discontinuity we have created.



Figure 12
We can actually sum this series, as it is logarithmic in nature:
$$\lim_{N\to\infty} f_N(t) = \frac{1}{2} + \sum_{k=1}^{\infty}\frac{i}{2\pi k}\left(e^{2\pi i k t} - e^{-2\pi i k t}\right) = \frac{1}{2} - \frac{i}{2\pi}\ln\left(1 - e^{2\pi i t}\right) + \frac{i}{2\pi}\ln\left(1 - e^{-2\pi i t}\right)$$
$$= \frac{1}{2} + \frac{i}{2\pi}\ln\frac{1 - e^{-2\pi i t}}{1 - e^{2\pi i t}} = \frac{1}{2} + \frac{i}{2\pi}\ln\left(-e^{-2\pi i t}\right) = \frac{1}{2} + \frac{i}{2\pi}\,i\pi(1 - 2t) = t,\qquad 0 < t < 1.$$
This indicates the validity of our expansion, but it is silent on the Gibbs phenomenon. To understand
this phenomenon, we take the derivative of our approximation and ask where it is zero first beyond
t = 0. This will locate the first minimum observed and allow us to assess its deviation from the function
t. The derivative is
$$f_N'(t) = -2\sum_{k=1}^{N}\cos 2\pi k t = -2\,\mathrm{Re}\sum_{k=1}^{N} e^{2\pi i k t} = -2\,\mathrm{Re}\left[e^{2\pi i t}\,\frac{e^{2\pi i N t} - 1}{e^{2\pi i t} - 1}\right] = -2\,\mathrm{Re}\left[e^{i\pi(N+1)t}\,\frac{\sin\pi N t}{\sin\pi t}\right] = -2\cos\bigl(\pi(N+1)t\bigr)\,\frac{\sin\pi N t}{\sin\pi t}.$$

The zeros of this function occur at $t = n/N$ and $t = (2n+1)/(2N+2)$ for integer $n$, so the closest lies at $t = 1/(2N+2)$. Evaluating our function at that value gives
$$\lim_{N\to\infty} f_N\!\left(\frac{1}{2N+2}\right) = \lim_{N\to\infty}\left[\frac{1}{2} - \frac{1}{\pi}\sum_{k=1}^{N}\frac{1}{k}\sin\frac{\pi k}{N+1}\right] = \frac{1}{2} - \lim_{N\to\infty}\sum_{k=1}^{N}\frac{\sin u_k}{(N+1)u_k},\qquad u_k = \frac{\pi k}{N+1},$$
$$= \frac{1}{2} - \frac{1}{\pi}\int_0^{\pi}\frac{\sin u}{u}\,du = -0.08948987\ldots
$$
Thus, we have an undershoot of approximately 8.95% of the discontinuity, or about 17.9% of the half-jump. This is common to all expansions in eigenfunctions of the operator $-i\,d/dt$, which are called Fourier series after the French mathematician Joseph Fourier, who extended much of the previous
analysis along these lines. At any jump discontinuity of f (t), we assign the name equilibrium to the
average of the values the function approaches to the left and right of the discontinuity, so the
discontinuity represents a jump up from the equilibrium on one side and a jump down of the same size
on the other. The function that the series converges to jumps the amount
$$\frac{2}{\pi}\int_0^{\pi}\frac{\sin u}{u}\,du = 1.178979744472\ldots$$
times the jump experienced by f (t) from the equilibrium, or about 17.9% more. This effect is present
in every approximation to the expansion, though the region experiencing this phenomenon moves
closer and closer to the discontinuity as the number of terms included is increased. For this reason,
Fourier series are not reliable for numerical computation in the vicinity of any discontinuities.
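The undershoot just computed is easy to confirm numerically. The sketch below is my own illustration (it uses scipy's sine-integral routine); it evaluates the limiting value $\tfrac{1}{2} - \tfrac{1}{\pi}\int_0^{\pi}\frac{\sin u}{u}\,du$ and also the partial sums $f_N$ of $f(t) = t$ at the first minimum $t = 1/(2N+2)$.

```python
import numpy as np
from scipy.special import sici

si_pi, _ = sici(np.pi)                     # Si(pi) = integral_0^pi sin(u)/u du
print("limiting undershoot:", 0.5 - si_pi / np.pi)     # about -0.0894899

def f_N(t, N):
    # partial Fourier sum of f(t) = t on [0, 1]: 1/2 - (1/pi) * sum_k sin(2*pi*k*t)/k
    k = np.arange(1, N + 1)
    return 0.5 - np.sum(np.sin(2 * np.pi * k * t) / k) / np.pi

for N in (10, 100, 1000, 10000):
    print(N, f_N(1.0 / (2 * N + 2), N))    # value at the first zero of f_N' past t = 0
```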
As a second example, consider the function
$$f(t) = \begin{cases} 1 - 2t, & t < 1/2 \\ 4t - 3, & t > 1/2 \end{cases}.$$
This function exhibits a discontinuity at $t = 1/2$, so we should expect to see the Gibbs phenomenon in
the vicinity of this point. It is periodic, however, so we should not expect to see this phenomenon at
the endpoints of the interval. The coefficients of the expansion can again be obtained easily by
integrating by parts:
$$c_k = \int_0^1 f(t)\,e^{-2\pi i k t}\,dt = \int_0^{1/2}(1-2t)\,e^{-2\pi i k t}\,dt + \int_{1/2}^{1}(4t-3)\,e^{-2\pi i k t}\,dt = (-1)^k\,\frac{i}{2\pi k} + \bigl[1-(-1)^k\bigr]\frac{3}{2\pi^2 k^2}.$$
The value of $c_0$ is given by $1/4$. Before looking at this function graphically, let's verify Bessel's inequality. The sum splits into two pieces, according to whether $k$ is even or odd:
$$\sum_{k=-(2N+1)}^{2N+1}\left|c_k\right|^2 = \frac{1}{16} + 2\sum_{j=0}^{N}\left[\frac{1}{4\pi^2(2j+1)^2} + \frac{9}{\pi^4(2j+1)^4}\right] + 2\sum_{j=1}^{N}\frac{1}{4\pi^2(2j)^2}$$
$$\xrightarrow[\ N\to\infty\ ]{} \frac{1}{16} + \frac{\zeta(2)}{2\pi^2} + \frac{18}{\pi^4}\left(1 - \frac{1}{16}\right)\zeta(4) = \frac{1}{16} + \frac{1}{12} + \frac{3}{16} = \frac{1}{3} = \int_0^1 f(t)^2\,dt = \langle f | f\rangle.$$
Equality thus indicates that this result is probably correct, barring an accident. To simplify our
approximation, we also split the sum into several pieces:
$$f_{2N+1}(t) = \frac{1}{4} + \sum_{k=1}^{2N+1}(-1)^k\,\frac{i}{2\pi k}\left(e^{2\pi i k t} - e^{-2\pi i k t}\right) + \sum_{j=0}^{N}\frac{3}{\pi^2(2j+1)^2}\left(e^{2\pi i(2j+1)t} + e^{-2\pi i(2j+1)t}\right)$$
$$= \frac{1}{4} - \frac{1}{\pi}\sum_{k=1}^{2N+1}\frac{(-1)^k\sin 2\pi k t}{k} + \frac{6}{\pi^2}\sum_{j=0}^{N}\frac{\cos 2\pi(2j+1)t}{(2j+1)^2}.$$
It is also possible to obtain the infinite sum in closed form using the dilogarithm function

$$\mathrm{Li}_2(x) = -\int_0^x\frac{\ln(1-t)}{t}\,dt = \sum_{k=1}^{\infty}\frac{x^k}{k^2},\qquad |x|\le 1,$$
but this is not as useful in the present case because the dilogarithm function is not as easily handled as
the logarithm function. The approximations with N = 1, 2, 5, and 10 are illustrated along with f (t) in
figure 13. The Gibbs phenomenon is apparent at $t = 1/2$, as expected, but there is no visible Gibbs phenomenon at the endpoints. It is also clear from the graph that all of the approximations appear to take the same value, around $-1/2$, at $t = 1/2$. This is no accident, as the value $-1/2$ is the average of the two different values the function is desperately trying to emulate on either side of the discontinuity. A continuous function that tries to emulate a discontinuous one must take this average value, at least in the limit as $N$ grows without bound. This is easily verified, as


$$\lim_{N\to\infty} f_{2N+1}\!\left(\tfrac{1}{2}\right) = \frac{1}{4} - \frac{6}{\pi^2}\sum_{j=0}^{\infty}\frac{1}{(2j+1)^2} = \frac{1}{4} - \frac{6}{\pi^2}\left(1 - \frac{1}{4}\right)\zeta(2) = \frac{1}{4} - \frac{3}{4} = -\frac{1}{2}.$$
The value of the function at $t = 1/2$ is clearly 0, so our limit function does not even come close to $f(t)$ at $t = 1/2$. However, this is a set of measure zero and is therefore discarded by the inner product. The value of the function $f(t)$ at $t = 1/2$ is completely unimportant; the only thing that matters to the inner product space is the sets of nonzero measure surrounding this point.

Figure 13
As a final example, let's consider a function that is both continuous and periodic. There should
be no Gibbs phenomenon difficulty in approximating such a function anywhere on the interval.
Consider the function
$$f(t) = \begin{cases} 2t^2, & t < 1/2 \\ 1 - t, & t > 1/2 \end{cases}.$$
This function is certainly continuous for all $t \in [0,1]$, and it also satisfies our boundary conditions. It is not differentiable at $t = 1/2$, but we do not require differentiability; the Weierstrass approximation can definitely handle this function in the strong limit case that does not require an inner product to throw out sets of measure zero. As before, the coefficients can easily be obtained via integration by parts:
$$c_k = \int_0^1 f(t)\,e^{-2\pi i k t}\,dt = \int_0^{1/2} 2t^2 e^{-2\pi i k t}\,dt + \int_{1/2}^{1}(1-t)\,e^{-2\pi i k t}\,dt = \frac{3(-1)^k - 1}{4\pi^2 k^2} + i\,\frac{1-(-1)^k}{2\pi^3 k^3}.$$
The coefficient $c_0 = 5/24$. Even coefficients are therefore given by
$$c_{\text{even}} = \frac{1}{2\pi^2 k^2},$$
and odd ones are given by
$$c_{\text{odd}} = \frac{i}{\pi^3 k^3} - \frac{1}{\pi^2 k^2}.$$
Bessel's inequality gives
$$\sum_{k=-(2N+1)}^{2N+1}\left|c_k\right|^2 = \left(\frac{5}{24}\right)^2 + 2\sum_{j=1}^{N}\frac{1}{4\pi^4(2j)^4} + 2\sum_{j=0}^{N}\left[\frac{1}{\pi^4(2j+1)^4} + \frac{1}{\pi^6(2j+1)^6}\right]$$
$$\xrightarrow[\ N\to\infty\ ]{} \frac{25}{576} + \frac{\zeta(4)}{32\pi^4} + \frac{2}{\pi^4}\left(1-\frac{1}{16}\right)\zeta(4) + \frac{2}{\pi^6}\left(1-\frac{1}{64}\right)\zeta(6) = \frac{25}{576} + \frac{1}{2880} + \frac{1}{48} + \frac{1}{480} = \frac{1}{15} = \int_0^1 f(t)^2\,dt = \langle f | f\rangle,$$
so our approximation ought to work. Re-writing it as


$$f_{2N+1}(t) = \frac{5}{24} + \frac{1}{\pi^2}\sum_{j=1}^{N}\frac{\cos 4\pi j t}{4j^2} - \frac{2}{\pi^2}\sum_{j=0}^{N}\frac{\cos 2\pi(2j+1)t}{(2j+1)^2} - \frac{2}{\pi^3}\sum_{j=0}^{N}\frac{\sin 2\pi(2j+1)t}{(2j+1)^3},$$
it is obvious that the limit function takes the appropriate value at $t = 1/2$:
$$\lim_{N\to\infty} f_{2N+1}\!\left(\tfrac{1}{2}\right) = \frac{5}{24} + \frac{\zeta(2)}{4\pi^2} + \frac{2}{\pi^2}\left(1 - \frac{1}{4}\right)\zeta(2) = \frac{5}{24} + \frac{1}{24} + \frac{1}{4} = \frac{1}{2}.$$
The function is plotted along with its approximations for N = 1, 2, and 5 in figure 14. It is clear from the figure that we have no discernible Gibbs phenomenon. The only place that the approximations are having trouble lies at the point $t = 1/2$, where the function is not differentiable. This trouble does not stay with us in the manner that the Gibbs phenomenon does, but it is certainly present, as all of our approximations are differentiable. The sharp corner in $f(t)$ is difficult to mimic, as illustrated in the blow-up of this region in figure 15, showing the approximations with N = 1, 2, 5, 10, and 20, but our functions are ultimately up to the task. The main difference between this difficulty and the Gibbs phenomenon is that this difficulty goes away as N increases and the Gibbs phenomenon does not.
Figure 14

Figure 15

The expansion coefficients of the Fourier series play something like the role of the DNA of functions defined on this interval. Altering these coefficients slightly can lead to tremendous differences in the function that is ultimately represented. This process represents a very interesting branch of functional analysis that has broad applications throughout physics, electrical engineering, and optics, as well as many other fields. The coefficients of the vectors $v_k$ and $v_{-k}$ combine to form the trigonometric functions $\cos 2\pi k t$ and $\sin 2\pi k t$ with wavelength $1/k$ and frequency $k$. Digital recordings often require a frequency analysis, and this is most easily accomplished by looking at the Fourier coefficients rather than at the function itself. The change from looking at a function to looking at its Fourier coefficients is called a discrete Fourier transform, and the importance of this transformation cannot be overstated. Just type this into Google and see what comes up.
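As a concrete illustration of this kind of frequency analysis, the sketch below samples a signal built from two frequencies and picks them back out of its discrete Fourier transform. The code is my own example using numpy's FFT routine; the particular signal and sample count are arbitrary choices, not anything from the text.

```python
import numpy as np

# Sample a signal containing frequencies 5 and 12 over one period of [0, 1).
n = 256
t = np.arange(n) / n
signal = 2.0 * np.sin(2 * np.pi * 5 * t) + 0.5 * np.cos(2 * np.pi * 12 * t)

coeffs = np.fft.rfft(signal) / n           # discrete analogue of the Fourier coefficients
for k in np.argsort(np.abs(coeffs))[::-1][:3]:
    print(k, abs(coeffs[k]))               # the large entries sit at k = 5 and k = 12
```

Modifying one of these entries and transforming back with numpy.fft.irfft produces exactly the kind of altered graphs discussed next.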
To see the effect of modifying these coefficients, let's look at the last example given above. Suppose we change the single coefficient of $\sin 10\pi t$ from $-2/125\pi^3$ to $-2/\pi^3$. The new limit function is illustrated along with the original function in figure 16. The change is obvious, and clearly indicates the deviant frequency. Analyzing graphs like these allows chemical engineers to determine the concentration of certain chemicals in a given sample, as well as identify the impurities in the sample. Changing the coefficient of $\cos 12\pi t$ from $1/36\pi^2$ to $11/36\pi^2$ leads to the graph in figure 17. Again, we see the tell-tale sign that there is an extraneous frequency. More invasive changes, such as multiplying the coefficient of $\cos 2\pi(2j+1)t$ by 2, lead to functions with a vastly different behavior, as illustrated in figure 18. Multiplying the coefficient of $\sin 2\pi(2j+1)t$ by 2 gives the graph illustrated in figure 19. Even more invasive changes, like changing the denominator of the $\cos 2\pi(2j+1)t$ terms from $2j+1$ to $2j+2$, lead to graphs like that shown in figure 20. The most invasive changes involve changing the power of the $2j+1$ or $2j$ in the denominators of the coefficients. This is especially important when the power is reduced, as the series has a more difficult time converging in this situation. The effect of changing the denominator of the third sum from $(2j+1)^3$ to $2j+1$ is illustrated in figure 21. It is clear from this analysis that the discontinuities exhibited by our earlier examples are generated by linear denominators in the coefficients. We should expect discontinuities whenever we see coefficients that are only suppressed by $O(1/k)$.
Figure 16
Figure 17

Figure 18
Figure 19

Figure 20
Figure 21

To conclude, we have shown that at least some Hermitian operators acting on infinidimensional vector spaces have eigenvectors that span the space. In order to have this property, the Hermitian operator $A$ must have a discrete spectrum, meaning that the operator $A - \lambda I$ can only be singular, or not invertible, for a countable number of values of $\lambda$. Every Hermitian operator that satisfies this requirement will definitely have eigenvectors that span the space. All of its eigenvalues will be real, and all eigenvectors associated with distinct eigenvalues will be orthogonal. These properties of Hermitian operators with discrete spectra form the basis on which quantum mechanics is built, as we will see later. Hermitian operators with continuous spectra will also play an important, though admittedly muted, role in the structure of quantum mechanics. For this reason, we now turn to the simplest case of such operators that can be treated in a systematic way.


Exercises for Section XII.3:


In problems 1 – 6, find an expression for the given function on the interval $t \in [-1,1]$ in terms of trigonometric functions, assuming that the function satisfies periodic boundary conditions with period 2. Show that Bessel's inequality is satisfied for the first ten terms of the expansion (combining positive and negative values of $k$), then show that Bessel's inequality is saturated (becomes an equality) as the value of $k$ is allowed to increase without bound.
1. $t^2$

2. $t^3 - 2t^2$

3. $e^{-2t}$

4. $e^t \sin 4\pi t$

5. $t^2 - t\,\theta(t)$

6. $t^3 - t\,\theta(t)$

7. Show that the matrix elements of the operator t are given by


1 1 b a ak b j
,
x t x k j
2 j k j 1 k j
as shown in the text. Why would an attempt to minimize this function be difficult? Why
would it ultimately not be fruitful?
8. Work through the analysis associated with the representation $\delta_N(t - t_0)$ given in the text.
Explain which parts of this analysis would be difficult to fully establish in a rigorous fashion.
9. The hardest part of the proof given in the text that Hermitian operators acting on
infinidimensional vector spaces have the same properties as those acting on finite-dimensional
spaces has to do with the statement that there are eigenvalues. Explain why this statement has
trouble in infinidimensional vector spaces even though there is no problem with it when the
dimension of the vector space is finite. Give examples of Hermitian operators acting on
infinidimensional vector spaces that have eigenvalues and examples of those that do not.
Explain the difference between these two operator classes.
10. Explain why the Dirac delta function cannot be considered as a function in the usual sense of
the word. Give at least two different sequences of actual functions that converge, in some
sense, to the Dirac delta function, but whose values do not agree on an infinite sequence of
points even in the limit as N tends to infinity. Explain why this behavior is admissible.
Would it be admissible if the delta function were an actual function?

Section XII.4: Function Spaces Without Compact Support


We have considered only functions defined on the interval $t \in [0,1]$ in all of the above. This is still extremely general, as any function defined on this interval can be transformed into an analogous function defined on any interval $t \in [a,b]$ for real numbers $a$ and $b$. The space of functions defined on $[0,1]$ is therefore isomorphic to the space of functions defined on $[a,b]$, so every result we have obtained on $[0,1]$ is equally applicable to $[a,b]$. This similarity ends when we consider the unbounded intervals $[a,\infty)$ and $(-\infty,\infty)$. The appropriate way to describe such intervals in mathematics is by use of the word compact. A compact interval, or set, is one for which every infinite
sequence of elements of the set contains at least one limit point. Finite intervals are compact by
definition, since there is not an infinite amount of room the sequence has to move around in. The
sequence elements must bunch up somewhere if there are an infinite number of them. In contrast, the
interval $[a,\infty)$ is not compact because there exist sequences contained in it that do not have any limit points. The sequence $\{a+n\}_{n=1}^{\infty}$ is one, but there are clearly an infinite number of others. The reason
why these sequences do not contain any limit points is that the interval itself is unbounded. There is an
infinite amount of space available for the sequences to spread out, so they need not bunch up
anywhere. A function that is defined on a compact interval, and equal to zero everywhere outside this
interval (except, possibly, on a set of measure zero) is said to have compact support. Function spaces
that contain all functions defined on an interval that is not compact are said to not have compact
support.
General function spaces without compact support are very involved to treat. Like function
spaces with compact support, there are inner products that allow the space to be countable in
dimension and other inner products that do not. This subtlety is even more complicated for function
spaces without compact support, as the lack of compact support gives another reason why the space
may have uncountable dimension. For function spaces with compact support, we can avoid this with
the simple inner product given above. Eigenvectors of Hermitian operators with discrete spectra span
the space, so the space is countable by definition. Function spaces without compact support, on the
other hand, do not have to be countable under this simple inner product. We require extreme cut-offs
for large t in order to force the space of square-integrable functions without compact support to have a
countable basis.
The most direct way, from our perspective, to approach a noncompact interval is to view it as a limit of compact intervals. Consider the interval $t \in [-L/2, L/2]$. The space of all square-integrable functions defined on this interval is definitely spanned by the eigenfunctions of the operator $-i\,d/dt$, as long as we impose periodic boundary conditions. This leads to the eigenfunctions
$$v_n = \frac{1}{\sqrt{L}}\,e^{2\pi i n t / L},$$
with eigenvalues
$$\lambda_n = \frac{2\pi n}{L};\qquad n \in \mathbb{Z},$$
so for any function $f(t)$ in the space we must have
$$f(t) = \lim_{N\to\infty}\sum_{n=-N}^{N}\frac{1}{L}\,e^{2\pi i n t / L}\int_{-L/2}^{L/2} e^{-2\pi i n x / L} f(x)\,dx.$$
We can perform the transformation quite easily, as the length of the old interval was 1 and the length
of the new interval is L. Any extraneous constants are removed by requiring orthonormality. Go
through this analysis yourself to see why it works. If you don't initially get it (like me), then go
through the analysis from the beginning on this interval. It is not a lot of work, and will show you
more clearly how the process works.
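If you want a quick sanity check while working through that analysis, the following sketch (my own; the value $L = 4$ and the index choices are arbitrary) verifies numerically that the functions $v_n = e^{2\pi i n t/L}/\sqrt{L}$ are orthonormal on $[-L/2, L/2]$, which is what fixes the constants in the expansion.

```python
import numpy as np
from scipy.integrate import quad

L = 4.0

def inner(n, m):
    # <v_n | v_m> = (1/L) * integral_{-L/2}^{L/2} exp(2*pi*i*(m - n)*t/L) dt.
    # The imaginary part vanishes by symmetry, so only the cosine piece is integrated.
    g = lambda t: np.cos(2 * np.pi * (m - n) * t / L) / L
    val, _ = quad(g, -L / 2, L / 2)
    return val

print(inner(3, 3))   # 1 (normalization)
print(inner(3, 5))   # 0 up to quadrature error (orthogonality)
print(inner(0, 7))   # 0 as well
```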
This process is exactly the same as that above, but its properties change when we take the limit
as $L \to \infty$. Recall that we were only able to represent those functions that, under the inner product we
used, do not deviate from continuous except possibly on a set of measure zero. This set was countable
because the number of places at which the function can change in a discontinuous way was countable.
Functions that are discontinuous on an uncountable set are disallowed from this analysis. As we
increase the value of L, we are essentially taking a limit of the countable values that can be associated
640

Section XII.4: Function Spaces Without Compact Support

with discontinuities. We are including all of our limit points, so the resulting set is not countable
essentially by definition. This limit process then makes it impossible to span the space with a
countable set. One way to think about this result is that the boundaries of the interval imposed
conditions on the behavior of functions in the space. Removing these boundaries removes the
properties that made the space countable, so naturally causes it to become uncountable. We can also
see this in the behavior of the eigenvalues of the operator $-i\,d/dt$. The distance between them goes to
zero as the length L increases. The eigenvalues themselves approach an uncountable set, so we cannot
expect the limit space to have a countable basis. This is an extremely subtle point, but it can be
understood properly by thinking about the way in which the eigenfunctions span the space when it is
of compact support. The eigenfunctions have wavelength $L/n$, with $n$ any positive integer. These
values are definitely countable, as the integers are countable. The wavelength goes to zero as n tends
to infinity, but its values are countable. As L becomes larger and larger, the values of the wavelength
also become larger and larger for fixed n, but the values of n are still unbounded. Taking the limit as
$L \to \infty$ makes the differences between these discrete values more and more important, but every
rational number will be present at some value of L so the limit must include all of the limit points of
the rational number system. The set is therefore uncountably infinidimensional in the limit, and we
cannot expect to be able to span it with any countable basis.
Despite this fact, we are sure that the eigenfunctions of $-i\,d/dt$ span the space for any value of $L$. Passing to the limit, we are implicitly viewing the space as a limit space of the eigenfunctions of $-i\,d/dt$. These functions span all of the spaces with finite $L$, so we can expect their limits to span the space with infinite $L$. Unfortunately, the eigenvalues of $-i\,d/dt$ become continuous as $L \to \infty$, so we
cannot expect the associated eigenfunctions to actually be in the space. This calls into question what is
actually meant by the terms span the space and basis, as these terms traditionally apply only to
vectors that lie in the space. This disconnect can be handled by changing the meaning of a linear
combination. Consider an arbitrary convergent linear combination of the orthonormal eigenbasis of
$-i\,d/dt$:
$$f_N(t) = \sum_{n=-N}^{N} c_n\,\frac{e^{2\pi i n t / L}}{\sqrt{L}}.$$
This function is clearly continuous and even infinitely many times differentiable for all $N$, and can approach any function in the space $L^2[-L/2, L/2]$ as $N$ grows without bound. Writing the sum as
$$f_N(t) = \sum_{n=-N}^{N} c_n\,\frac{e^{2\pi i n t / L}}{\sqrt{L}} = \sum_{n=-N}^{N} \sqrt{L}\,c_n\,\frac{e^{2\pi i n t / L}}{L}\,\Delta n,$$
we see that the quantity $k_n = 2\pi n / L$ changes by a very small amount $\Delta k = 2\pi/L$ when $L$ is very large. We can write this in terms of $k$ as
$$f_N(t) = \sum_{n=-N}^{N} \sqrt{L}\,c_n\,e^{i k_n t}\,\frac{\Delta k}{2\pi}.$$
The remaining sum can be thought of as a Riemann sum when $N$ is very large, at least in some way. The factor of $\sqrt{L}$ out front makes this process a bit delicate, as it actually cancels with the factor of $L$ needed to make the quantity $\Delta k$ go to zero, but we can formally solve this problem by considering instead the function $f_N(t)/\sqrt{L}$. Defining the function $c(k)$ in such a way that it equals the limit of the function $c_{Lk/2\pi}$ as $L$ tends to infinity, the function $f_N(t)/\sqrt{L}$ can be re-arranged as follows:
$$\frac{f_N(t)}{\sqrt{L}} = \sum_{n=-N}^{N} c_n\,e^{i k_n t}\,\frac{\Delta k}{2\pi} \;\longrightarrow\; \int_{-\infty}^{\infty}\frac{dk}{2\pi}\,c(k)\,e^{i k t}.$$

This is an extremely delicate process that requires a certain order if it is to have any meaning. Taking
the limit as L tends to infinity before that as N does would lead to zero, as the coefficients cn are all
finite and will go to zero when divided by L . This factor is actually a remnant of the eigenfunctions
themselves, which also go to zero as L increases without bound. Taking the limit as N goes to infinity
first mutes this effect, as the values of k still run from negative infinity to infinity. No matter how
large L is, there are still finite frequencies appearing in the sum. Our understanding of the function
f (t) is that it somehow maintains its identity no matter how large the value of L. As L increases, the
wavelength of specific vibrations in f (t) increases. As L grows without bound, these finite
wavelengths all tend to zero. Taking the limit as N tends to infinity first ensures that we will always
have finite wavelengths available. Every wavelength present in the original function will still be
present in the limit, but it will be associated with larger and larger values of n as L tends to infinity. In
the limit, all finite values of k are associated with infinite values of n. The finite values of n all
congregate near k = 0 as L increases without bound, so only the infinite values of n are actually
important to the analysis of functions defined on intervals that are not compact. We can use this to
justify our division of the function f (t) by L: functions that are square-integrable on the compact
interval $[-L/2, L/2]$ can congregate near the boundaries of this interval, but these same functions will
not be square-integrable if they try to pull this trick on a noncompact interval. In order to force
these functions to be square-integrable on an arbitrary domain, we divide them by $\sqrt{L}$ before taking the
limit. This essentially turns a probability distribution into a probability density, and is justified in
physical arguments by referring to a quantity per unit length or per unit volume. This process is
reminiscent of a u-substitution, or a scaling substitution, where the length L of the interval comes out
as part of the substitution.
Our eigenfunctions all oscillate nontrivially for every value of t, so cannot be square-integrable
on a noncompact interval. The eigenfunctions are divided by $\sqrt{L}$ as part of the normalization process,
so all of them tend to zero as L tends to infinity. Despite this, their norm is definitely equal to 1 for all
L. The integral of the absolute square of the eigenfunctions does not tend to zero as L tends to infinity,
making these functions more accurately described as distributions. In taking the limit, we associate
the factor of $1/\sqrt{L}$ with the coefficients rather than with the eigenfunctions for convenience. This
allows us to consider nonzero functions as our eigenfunctions, even though these functions do not lie
in the space. This whole process is extremely delicate, and we need to think very carefully about how
it is structured in order to arrive at a meaningful result. It is important to always think of L as finite
until the final limit is taken. This avoids the embarrassment of having eigenfunctions that are zero.
When the final limit is taken, the sum passes over to an integral and we can properly think of the
eigenfunctions as distributions without having to worry about these problems. If we ever run into
trouble with our interpretation of the limit expressions, we always can fall back to finite values of L to
see what should happen and use these results to see what the previously meaningless expressions
mean.
The expression we have for an arbitrary function $g(t)$ defined on the interval $t \in (-\infty,\infty)$ is
$$g(t) = \int_{-\infty}^{\infty}\frac{dk}{2\pi}\,c(k)\,e^{i k t},$$
but how can we determine the coefficients of this expansion in terms of the function $g(t)$? In the finite case, we have
$$f(t) = \sum_{n} c_n\,\frac{e^{2\pi i n t / L}}{\sqrt{L}};\qquad c_n = \int_{-L/2}^{L/2} f(t)\,\frac{e^{-2\pi i n t / L}}{\sqrt{L}}\,dt.$$
Since $c(k) = c_{L k_n / 2\pi}$, we can also write this as
$$c(k) = \int_{-L/2}^{L/2} f(t)\,\frac{e^{-i k_n t}}{\sqrt{L}}\,dt \;\longrightarrow\; \int_{-\infty}^{\infty} g(t)\,e^{-i k t}\,dt.$$
Here, we have taken $g(t) = f(t)/\sqrt{L}$. The parameter $k_n$ tends to the continuous parameter $k$ in this limit. This process justifies our inclusion of the factor $1/\sqrt{L}$ with the coefficients rather than the
eigenfunctions, as it allows the functions eikt to behave like eigenfunctions even though they are not in
the space. We can further manipulate our expression for g(t) by inserting this expression for the
coefficient function in exactly the same way used earlier in the compact case:
$$g(t) = \int_{-\infty}^{\infty}\frac{dk}{2\pi}\,c(k)\,e^{ikt} = \int_{-\infty}^{\infty}\frac{dk}{2\pi}\,e^{ikt}\int_{-\infty}^{\infty} d\tau\,g(\tau)\,e^{-ik\tau} = \int_{-\infty}^{\infty} d\tau\,g(\tau)\int_{-\infty}^{\infty}\frac{dk}{2\pi}\,e^{ik(t-\tau)}.$$
This must be true for all allowable functions g(t), so we must have
$$\int_{-\infty}^{\infty}\frac{dk}{2\pi}\,e^{ik(t-\tau)} = \delta(t-\tau).$$
This result is arguably one of the most important in the entire field of functional analysis, and is
certainly one of the most important (if not the most important) expressions in applications of
functional analysis. It is not easy to prove rigorously, mainly because of the exchange in the integrals
performed in the last step above, but expressions involving the delta function are almost always
difficult to justify with full rigor. The very meaning of the delta function took mathematicians
decades to fully establish, so we cannot expect related results to be easy. This same result was
introduced from a different perspective in sections 9.4, 9.6, and, to a lesser extent, 6.3. We can view
this expression as an indication of the orthogonality of the eigenfunctions $e^{ikt}$ with respect to the
variable t rather than k. Since both of these variables are continuous in the limit, they are essentially
interchangeable. We will see later that this statement becomes one of the underpinnings of quantum
mechanics, but for now we can content ourselves with the fact that the ordinary expression of
orthogonality for these eigenfunctions leads to essentially the same result. Writing the ordinary
orthogonality requirement
$$\int_{-L/2}^{L/2}\frac{e^{-2\pi i n t/L}}{\sqrt{L}}\,\frac{e^{2\pi i m t/L}}{\sqrt{L}}\,dt = \int_{-L/2}^{L/2}\frac{e^{i(k_m-k_n)t}}{L}\,dt = \delta_{nm}$$
on a compact interval, we see that this integral definitely equals zero whenever $n \neq m$. Put in another way, this integral is zero whenever $k_n \neq k_m$. Summing over $m$ and taking the limit as $L$ grows without
bound, we have
$$\sum_m \int_{-L/2}^{L/2}\frac{e^{i(k_m-k_n)t}}{L}\,dt = 1 = \sum_m \frac{L\,\Delta k}{2\pi}\int_{-L/2}^{L/2}\frac{e^{i(k_m-k_n)t}}{L}\,dt \;\longrightarrow\; \int_{-\infty}^{\infty}\frac{dk'}{2\pi}\int_{-\infty}^{\infty} e^{i(k'-k)t}\,dt = 1.$$
Thus, the function $\int_{-\infty}^{\infty} e^{i(k'-k)t}\,dt$ must be equal to zero whenever $k' \neq k$, and its integral with respect to $k'$ gives $2\pi$. This is enough to identify it as
$$\int_{-\infty}^{\infty} e^{i(k'-k)t}\,dt = 2\pi\,\delta(k'-k),$$

in agreement with the above result (with different variable names). There is no need to switch the
order of integration in this derivation, so it is easier to prove the result rigorously in this manner.
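A numerical caricature of this statement may also help. Truncating the $t$ integral at $\pm T$ gives $\int_{-T}^{T} e^{i(k'-k)t}\,dt = 2\sin\bigl[(k'-k)T\bigr]/(k'-k)$, which should act like $2\pi\,\delta(k'-k)$ under an integral as $T$ grows. The sketch below is my own check of exactly that; the test function of $k'$ is an arbitrary smooth choice.

```python
import math
from scipy.integrate import quad

def kernel(dk, T):
    # integral_{-T}^{T} exp(i*dk*t) dt = 2*sin(dk*T)/dk, equal to 2*T when dk = 0
    return 2.0 * T if dk == 0.0 else 2.0 * math.sin(dk * T) / dk

g = lambda kp: math.exp(-kp * kp) * math.cos(kp)      # smooth test function of k'
k = 0.7
for T in (5.0, 20.0, 80.0):
    val, _ = quad(lambda kp: g(kp) * kernel(kp - k, T), -30.0, 30.0, limit=2000)
    print(T, val / (2.0 * math.pi))
print("g(k) =", g(k))
```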

We can use this orthogonality standpoint to see another reason why special care must be taken
with eigenfunctions associated with continuous eigenvalues. If the operator is Hermitian, then its
eigenfunctions associated with distinct eigenvalues are orthogonal. When eigenvalues become
continuous, this orthogonality relation must take the form of a delta function. When $k' = k$, the
singularity in the delta function appears. This singularity can either be removed by division, making
the eigenfunctions themselves zero, or be interpreted in a different way. The standard choice in

physics is to interpret the integral $\int 1\,dt$ as $L$, the length of the interval. This definitely represents

some fancy footwork mathematically, as we have taken the limit as L tends to infinity. It is not really
clear what is meant by the length of the interval if we are using an interval that is not compact, so we
divide by this length to obtain a meaningful result for densities. This has already been done above,
which is why the above results have meaning.
As an example of the use of this technique, consider the function $g(t) = e^{-t^2/2}$. This function is certainly square-integrable on the interval $(-\infty,\infty)$, as
$$\int_{-\infty}^{\infty} e^{-t^2}\,dt = \sqrt{\pi} < \infty.$$
The coefficients of its Fourier expansion are given by
$$c(k) = \int_{-\infty}^{\infty} e^{-t^2/2}\,e^{-ikt}\,dt = \int_{-\infty}^{\infty} e^{-(t+ik)^2/2 - k^2/2}\,dt = e^{-k^2/2}\int_{-\infty}^{\infty} e^{-u^2/2}\,du = \sqrt{2\pi}\,e^{-k^2/2}.$$
Therefore, we must have
$$e^{-t^2/2} = \int_{-\infty}^{\infty}\frac{dk}{2\pi}\,\sqrt{2\pi}\,e^{-k^2/2}\,e^{ikt}.$$
This procedure is called a continuous Fourier transform, and is of the utmost importance in many, many fields. Given any function $g(t)$ that is not discontinuous at an uncountable number of places, there must be a Fourier transform function $\tilde{g}(k)$ given by
$$\tilde{g}(k) = \int_{-\infty}^{\infty} g(t)\,e^{-ikt}\,dt,$$
for which
$$g(t) = \int_{-\infty}^{\infty}\frac{dk}{2\pi}\,\tilde{g}(k)\,e^{ikt},$$
as far as the inner product is concerned. These equations are obviously very similar, differing only in the fact that a factor of $2\pi$ comes with the measure of the Fourier transformed space, parameterized by $k$, and no such factor is present in the "space space" parameterized by $t$. One often sees treatments of continuous Fourier transforms in which this factor is shared by both spaces by placing a factor of $\sqrt{2\pi}$ in both, but we will not take this route. The two options are entirely equivalent, but the one in which Fourier space takes the brunt of this factor is more easily understood directly from the compact case. The benevolent choice of sharing this factor is usually seen in mathematical treatments of Fourier transforms, and the rude choice of giving this factor entirely to Fourier space is usually seen in physics.
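As a check on both the example above and on the convention just described, with the $2\pi$ riding along with $dk$, the sketch below (my own) recomputes the transform of $g(t) = e^{-t^2/2}$ by direct quadrature and then inverts it with a finite cutoff on the $k$ integral.

```python
import numpy as np
from scipy.integrate import quad

g = lambda t: np.exp(-t * t / 2.0)

def g_tilde(k):
    # g_tilde(k) = integral g(t) exp(-i*k*t) dt; the sine part vanishes since g is even
    val, _ = quad(lambda t: g(t) * np.cos(k * t), -np.inf, np.inf)
    return val

for k in (0.0, 1.0, 2.5):
    print(k, g_tilde(k), np.sqrt(2 * np.pi) * np.exp(-k * k / 2.0))   # should agree

def g_back(t, K):
    # inverse transform with the 1/(2*pi) on the k measure, cut off at |k| = K,
    # using the closed form sqrt(2*pi)*exp(-k^2/2) for the transform
    val, _ = quad(lambda k: np.sqrt(2 * np.pi) * np.exp(-k * k / 2.0) * np.cos(k * t), 0.0, K)
    return val / np.pi            # even integrand, so 2/(2*pi) = 1/pi

print(g_back(0.8, 40.0), float(g(0.8)))   # reconstruction at an arbitrary point
```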
Truncating our expression by only integrating from $-K$ to $K$ for some positive value of $K$
allows us to see how the function is approached by its Fourier counterpart. As we include larger and
larger values of K, the approximation is allowed to oscillate more and more wildly. These oscillations
allow it to mimic whatever function we ask it to. If the function is discontinuous or not differentiable
at some point, then it cannot be approximated well by any truncation of the integral. We can see how
important an oscillation of a given frequency is by examining the value of the Fourier transformed
function $\tilde{g}(k)$ at the value of $k$ associated with that frequency. The frequency of oscillations associated with the function $e^{ikt}$ is $k/2\pi$, so large values of $k$ are associated with high oscillation
frequencies. The approximations to our function are illustrated for K = 0.2, 0.5, 1, and 2 in figure 22.
Note that the approximations definitely approach the function as K is increased, but even the best one
oscillates much more than the actual function away from the origin. These oscillations will continue
for all values of t, as illustrated in figure 23 for K = 10, so we need to include all of the values of k in
order to fully suppress them. Even large values of K still contain these oscillations for large values of t,
so K is never large enough to ensure that we have a good approximation for all t. This is not true in
the compact case, as we definitely have better and better approximations for all t as we increase the
value of N. This is yet another way to see the fundamental difference between a compact interval, no
matter how large, and an interval that is not compact.
Figure 22

Figure 23

Discontinuous functions defined on noncompact intervals will also exhibit the Gibbs
phenomenon, exactly as in the case with compact intervals. The function
$$g(t) = \begin{cases} 1, & 0 < t < 1 \\ 0, & \text{otherwise} \end{cases} = \theta(t)\,\theta(1-t),$$
where I have introduced the Heaviside function, or step function,
$$\theta(t) = \begin{cases} 1, & t > 0 \\ 0, & t < 0 \end{cases}$$
for future convenience, has the Fourier transform
$$\tilde{g}(k) = \frac{i\left(e^{-ik} - 1\right)}{k} = \frac{\sin k}{k} - \frac{2i\,\sin^2(k/2)}{k}.$$
Approximations to this function for K = 5, 10, 20, and 50 are illustrated in figure 24 along with the
function g(t). This figure is reminiscent of the earlier figures in that the Gibbs phenomenon is obvious,
but figure 25 shows that the oscillations continue for large values of t even when the cut-off value is
50. Again, this problem is associated with the lack of compactness of the interval.
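The same experiment can be run for this discontinuous example. The sketch below is my own illustration; it uses the transform just written down and evaluates the truncated inverse transform at a point inside the unit interval, at the first overshoot location just past the jump at $t = 0$, and at a point well outside the interval. The persistent wiggles at the outside point are the oscillations shown in figure 25.

```python
import numpy as np
from scipy.integrate import quad

def g_tilde(k):
    # transform of the indicator of [0, 1]: integral_0^1 exp(-i*k*t) dt
    if k == 0.0:
        return 1.0 + 0.0j
    return np.sin(k) / k - 2j * np.sin(k / 2.0) ** 2 / k

def reconstruct(t, K):
    # truncated inverse transform, (1/(2*pi)) * integral_{-K}^{K} g_tilde(k) exp(i*k*t) dk
    integrand = lambda k: (g_tilde(k) * np.exp(1j * k * t)).real
    val, _ = quad(integrand, -K, K, limit=2000)
    return val / (2.0 * np.pi)

for K in (5.0, 10.0, 20.0, 50.0):
    print(K, reconstruct(0.5, K), reconstruct(np.pi / K, K), reconstruct(3.0, K))
```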
Figure 24

Figure 25

Hermitian operators have many useful properties on infinidimensional vector spaces endowed
with any metric that satisfies the requirements given above. The most important are that they have
real eigenvalues associated with eigenvectors that are orthogonal and span the space. The most
common alterations of our chosen metric include a weight function associated with the importance of a
given value of t. The inner product in these spaces is defined by
$$\langle f | g \rangle = \int_a^b f^*(t)\,g(t)\,w(t)\,dt,$$

where the weight function w(t) must be real and positive for every value of t on the interval except
possibly a countable set on which it is zero. All of these metrics have the properties listed above, and
many different weighting functions have made substantial contributions to many different areas of
research, particularly in the theory of Sturm-Liouville differential equations, which we will discuss in
the next chapter. The three properties listed above, of real eigenvalues with orthogonal eigenvectors
that span the space, also form the basis on which the mathematics of quantum mechanics is based.
One important fact that is often overlooked in essentially every application is the exclusion of functions that are discontinuous at an uncountable number of places. The set of delta functions does
span this space in the sense discussed above, but the eigenvectors of $-i\,d/dt$ do not. This is true in
compact as well as noncompact intervals. Recall that the spanning proofs all disallowed functions that
are discontinuous on uncountable sets, so we are mute when faced with these vectors. Some Hermitian
operators, like t itself, do have eigenvectors that span the space, but these operators are plagued
with continuous spectra. The very property they need to span this part of the space disallows them
immediately from the space. These functions are perfectly admissible to the space, and have only been
disallowed for convenience on our part. It is true that most physical functions are expected to
conform, at least for the most part, to continuous functions. There are several physical phenomena that
flout this rule, but they are often ignored in introductory analyses. Essentially all of these functions are
well-behaved enough to allow our basis to access, but the idea that there are other functions there that
can be defined, but are not included in our analysis, is fairly strange. There is definitely a space of
functions defined on any nonzero continuous interval that is discontinuous on an uncountable set of
numbers within that interval. We can construct an orthogonal complement to the space of functions
spanned by the eigenvectors of Hermitian operators with discrete spectrum. This space consists only
of functions that deviate from continuity on a set of finite measure (and, of course, the zero vector), so
is not usually important in physical applications. However, it is important to remember that this space
exists and cannot be spanned by the standard eigenvectors.

Exercises for Section XII.4:


In problems 1 – 6, determine the continuous Fourier transform of the given function over all t. You
may leave your answer in terms of an integral if the integral cannot be obtained in terms of elementary
functions. Illustrate some graphs of the approximations to this function obtained by cutting the integral
off at some moderate value of k . Comment on the presence of the Gibbs phenomenon and the
character of the remaining oscillations for large t.
1. $t\,\theta(t)\,\theta(1-t)$

2. $t^2\,\theta(t)\,\theta(1-t)$

3. $t^2\,\theta(t)\,\theta(1-t) + (2-t)\,\theta(t-1)\,\theta(2-t)$

4. $t^2\,\theta(t)\,\theta(1-t) + (3-t)\,\theta(t-1)\,\theta(2-t)$

5. $e^{-t}\sin(10\pi t)\,\theta(t)$

6. $e^{-|t|}\sin(10\pi t)$


7. Go through the analysis in the text associated with the treatment of an eigenfunction expansion
as the domain of interest becomes unbounded. Explain in your own words what the difference
is between compact and non-compact intervals and why the factor L is treated in the manner it is
in order to obtain a well-defined result.
8. Go through the treatment given in the text associated with the integral representation of the
Dirac delta function. Explain in your own words what is going on at each step and why the
delta function cannot truly be considered as a function.

Many statements are made in the text about treatments of quantum mechanics, but there unfortunately
was not enough room for me to include a chapter on this application (the focus of this book is on
mathematics, not physics). Problems 9 and 10 serve to illustrate, at least qualitatively, why linear
algebra and the treatment of Hermitian operators are so important to this application.
In quantum mechanics, particles like the electron are considered to be represented by a state.
This state is considered mathematically as a vector in a vector space or, equivalently, as a wave. This
treatment causes electrons and other quantum particles to be delocalized in space, so the question of
where an electron is or what its speed is has no meaning. The square of the inner product between two
different states gives the overlap between these states, the probability that a measurement of a particle
in one state will show that it actually is in the other state. Each state is normalized so that its inner
product with itself is 1 (why?).
9. Physical observables like position, momentum, energy, etc., are considered as operators on this vector space of states. Eigenvectors of these operators therefore constitute states with well-defined position, momentum, energy, etc., and states that are not eigenvectors of these operators
cannot have a well-defined value for these quantities.
(a) Explain why the requirement that a Hermitian operator has real eigenvalues is useful in
quantum mechanics.
(b) Explain why quantum mechanics requires the inner product of two states with different well-defined values of a physical observable to be zero. How does the theory of Hermitian
operators help with this aspect of quantum mechanics?
(c) Explain why quantum mechanics requires the eigenvectors of any physical observable to
span the space of states available to a particle. What would happen if the eigenvectors did
not span the space? How do Hermitian operators help with this aspect of quantum
mechanics?
(d) Does the operator associated with position have any true eigenstates in the state space?
Explain why or why not.


10. The fact that operators do not have to commute leads to some very strange properties in
quantum mechanics. The commutator of two operators A and B is defined by
$$[A, B] = AB - BA,$$
and the operators are said to commute if this quantity is equal to zero.
(a) Show that operators that do not commute cannot have the same eigenvectors unless the
eigenvector is also an eigenvector of the commutator with eigenvalue zero.
(b) Show that operators that commute must have the same eigenvectors unless the eigenvalue is
degenerate, in which case the eigenvectors can be chosen to be the same.
(c) The momentum operator p is given in position space by
$$p = -i\hbar\,\frac{d}{dx},$$
where $\hbar = h/2\pi$ is Planck's constant divided by $2\pi$. Show that the commutator of the position operator $x$ and the momentum operator is given by $[x, p] = i\hbar$. Note that you
should consider the action of this commutator on an arbitrary function of x.
(d) Show that the operator $i[A, B]$ is Hermitian whenever $A$ and $B$ are.

(e) Suppose that the two operators A and B are Hermitian, but do not commute. Consider the
state vector $(A + i\lambda B)\,|\psi\rangle$, for an arbitrary real parameter $\lambda$. Show that the requirement that the inner product of this vector with itself is positive regardless of the value of $\lambda$ is equivalent to the requirement
$$\sqrt{\langle A^2\rangle\,\langle B^2\rangle}\;\ge\;\frac{1}{2}\left|\bigl\langle i[A,B]\bigr\rangle\right|.$$
This is the generalized Heisenberg uncertainty principle.
(f) Show that the left-hand-side of the inequality obtained in part (e) can be interpreted as the
product of the standard deviation of the measured value of A and that of the measured value
of B. Hint: replace the operator $A$ with the operator $A - \langle A\rangle I$ and make a similar replacement with $B$ in the inequality. We therefore have
$$\Delta A\,\Delta B\;\ge\;\frac{1}{2}\left|\bigl\langle i[A,B]\bigr\rangle\right|.$$
(g) Determine the minimum value of the product of the uncertainty in position and the
uncertainty in momentum. This is the celebrated Heisenberg uncertainty principle.


Section XII.5: Summary of Chapter XII


This chapter focused on the treatment of vector spaces with infinite dimension. While we
found that many of the important concepts associated with finite-dimensional linear algebra also find
use in the treatment of infinidimensional linear algebra, there are some important subtleties. We can
no longer simply count the number of basis elements in order to establish that they span the whole
space, and the properties of some of these spaces forbid us from finding any direct means of even
counting the number of elements in a basis. Despite this fact, we can often still apply the same
qualitative reasoning seen in chapter 11 as long as we emphasize some caveats and modify spaces
slightly in order to make them countably infinidimensional. The standard example of this treatment is
found in the vector space of functions defined on a compact interval, which is modified by introducing
the standard integral inner product. This modification leads to the inner product space of square-integrable functions defined on the given interval, which is countably infinidimensional. The
introduction of the inner product serves to limit the importance of discontinuities without forcing us to
consider only continuous functions. Infinitely many nonzero functions are identified with the zero
function in this analysis, and only functions that are nonzero on a set of nonzero measure are
considered as distinct from the zero function. This is not a terribly invasive restriction, and it leads to
many important results that we will use extensively in the upcoming chapters.
The treatment of Hermitian operators on infinidimensional vector spaces also has analogues to
the related treatment in vector spaces with finite dimension, but there are again several important
subtleties that need to be treated carefully in order to arrive at a meaningful result. Chief among these
is the requirement of boundary conditions in order to make the boundary term associated with the
Hermiticity requirement zero. The imposition of boundary conditions is somewhat strange from a
mathematical point of view, as functions and their derivatives are usually required to satisfy conditions
imposed at only a single point. From a physical point of view, however, this restriction is perfectly
natural. We will find in the next chapter that boundary conditions often arise when considering spaces
of functions defined on some interval. This again illustrates the relationship between mathematics and
physical applications. Boundary conditions can be seen to be necessary in order to establish the
Hermiticity of a given operator, but they are much more natural when imposed by a physical
restriction. The mathematics works in the absence of a physical interpretation, but the treatment seems
more natural when presented from the standpoint of a physical problem.
Once we have established the Hermiticity of a given operator on a given function space, the
next question is whether or not this operator has eigenfunctions that span the space of functions on
which it acts. Boundary conditions are essential to this treatment, and many operators that are
Hermitian without the requirement of boundary conditions will not satisfy the results we ultimately
end up with. It is the imposition of boundary conditions that makes the Hermitian operators we are
interested in have a countable number of eigenvalues, and this allows us to make well-defined
statements about the behavior of the space that we could not have made in their absence. We will see
in the next chapter that the boundary conditions play a fundamental role in the study of function spaces,
so much so that they outshine essentially every other property except that of linearity and
Hermiticity. If the boundary conditions required in a specific application are not easily expressed in
terms of the variables we are using, it is time to choose new variables.
Functions defined on noncompact intervals follow a slightly different set of rules. The
eigenvalues of Hermitian operators acting on this space are allowed to be continuous and
uncountable, and the associated eigenvectors no longer reside in the space itself. This treatment can
be done in a well-defined manner only by viewing noncompact intervals as limits of compact intervals
and leaning on the results seen in that case. If we do this carefully, then the linear combinations of
basis vectors seen in finite-dimensional linear algebra become integrals over an uncountable number of
649

Advanced Mathematical Techniques

eigenvalues associated with the Hermitian operator under consideration. This process is very
delicate, and must be done in a specific way in order to arrive at results that make sense. When it is
done properly, however, it leads to strong results that we can depend on. This treatment never actually
comes up in any physical application, as the size of every physical domain is finite. On the other hand,
it does allow us to find approximations to the behavior of physical states by simply pushing any
external influence off to infinity. This is how the celebrated orbitals of the hydrogen atom are found,
and leads to a successful approximation to the behavior of many physical systems.
At this point, it is time to turn to the next chapter. This chapter is focused on partial differential
equations and the manner in which linear algebra can be used to solve them. The most important
things to carry from this chapter are that (1) infinidimensional vector spaces can be either countable or
uncountable, (2) function spaces defined on compact intervals are of countable dimension when the
standard integral inner product with weight 1 is used, (3) the required properties of Hermitian operators are difficult to establish, but when they hold the operators have eigenvectors with real eigenvalues that are orthogonal and span the space,
(4) function spaces defined on noncompact intervals are not countable under the standard inner
product, but can be treated by using integrals rather than sums and living with the fact that the
eigenfunctions do not lie in the space; the eigenfunctions span the space despite the fact that they do
not belong to the space, and all other properties align with those of countable spaces, and (5) all
functions that are discontinuous on an uncountable set are excluded from all of the preceding.
Functions in this part of the space that are eigenfunctions of a Hermitian operator with distinct
eigenvalues will be orthogonal to each other, but they will not themselves lie in the space. There have
not yet been many results on this part of the space, as it is largely ignored in applications because
physical functions are not usually expected to contain such an uncountable set of discontinuities.
Although we do not expect physical phenomena to exhibit such effects, these functions are in the
larger space and can, in principle, contribute to physical measurements.


Chapter XIII
Partial Differential Equations
This chapter is intended to illustrate the manner in which the results of the last chapter can be
used to solve certain types of linear differential equations. The most important of these in physical
applications will be discussed at length, and several examples will be given. These partial differential
equations represent some of the most important applications of infinidimensional linear algebra in
physical applications, and they are found throughout physics and chemistry.
Section XIII.1: The Wave Equation
It will be clearest for us to begin with a specific example. The standard choice is the wave
equation, which is both tremendously useful and easily derived. We are interested in studying the
motion of a string held in some way under tension T that has been plucked in some known way at time
t = 0. Plucking the string moves it away from its equilibrium configuration, so the tension in the string
will act to pull the string back to equilibrium. As the tension pulls on the part of the string away from
equilibrium, the part away from equilibrium will pull back. The tension force is exerted by the
neighboring parts of the string, so they are acted on by this equal and opposite force. This causes them
to move away from equilibrium, so the disturbance is moved through the string to neighboring
locations. The physical phenomenon associated with this spread of the disturbance through the string
is called a wave. Waves result whenever an extended cohesive material, called the medium, is
disturbed from its equilibrium configuration. The properties of the wave depend on two properties of
the medium: the restorative property, which can be thought of as the force applied to parts of the
medium when they deviate from their respective equilibrium configurations, and the inertial property,
associated with the resistance of the medium to acceleration. These properties govern the behavior of
the wave, as the restorative property contains information about how hard the medium will pull to get
itself back in line and the inertial property contains information about how much the medium itself
resists this acceleration.
To see how the wave equation is derived, consider a string of length L and mass M held in
some way under tension T. In reality, the tension itself will depend on where we are along the string.
Parameterizing the location along the string with s, we write the tension as T ( s ) . We can, if we like,
superimpose a set of coordinate axes on our string. The location of every part of the string can be
given by $\bigl(x(s), y(s)\bigr)$. Taking $s$ as the arc length parameter, we must have
$$\sqrt{\left(\frac{\partial x}{\partial s}\right)^{2} + \left(\frac{\partial y}{\partial s}\right)^{2}}\;ds = ds,$$
or
$$\left(\frac{\partial x}{\partial s}\right)^{2} + \left(\frac{\partial y}{\partial s}\right)^{2} = 1.$$
In order to go further, it is convenient to discretize the string into $N$ small pieces of length $\Delta s$. The piece located at $s_k$ experiences forces from the pieces at $s_{k-1}$ and $s_{k+1}$, as well as from gravity. The tension forces must pull the piece at $s_k$ toward the neighboring pieces, so a free-body diagram of this piece looks like that shown in figure 1. The direction of the tension vectors is given by
$\bigl[x(s_{k+1}) - x(s_k)\bigr]\,\hat{\imath} + \bigl[y(s_{k+1}) - y(s_k)\bigr]\,\hat{\jmath}$ for the force exerted on this piece from the neighboring piece at $s_{k+1}$, and a similar expression for the other force. Using this in conjunction with the fact that the magnitude of both vectors is $T(s)$ allows us to write Newton's second law as
$$T\!\left(s_k + \tfrac{\Delta s}{2}\right)\frac{x(s_{k+1}) - x(s_k)}{\sqrt{\bigl[x(s_{k+1})-x(s_k)\bigr]^{2} + \bigl[y(s_{k+1})-y(s_k)\bigr]^{2}}} - T\!\left(s_k - \tfrac{\Delta s}{2}\right)\frac{x(s_k) - x(s_{k-1})}{\sqrt{\bigl[x(s_k)-x(s_{k-1})\bigr]^{2} + \bigl[y(s_k)-y(s_{k-1})\bigr]^{2}}} = m_k\,\frac{d^{2}x(s_k)}{dt^{2}}$$
horizontally and
$$T\!\left(s_k + \tfrac{\Delta s}{2}\right)\frac{y(s_{k+1}) - y(s_k)}{\sqrt{\bigl[x(s_{k+1})-x(s_k)\bigr]^{2} + \bigl[y(s_{k+1})-y(s_k)\bigr]^{2}}} - T\!\left(s_k - \tfrac{\Delta s}{2}\right)\frac{y(s_k) - y(s_{k-1})}{\sqrt{\bigl[x(s_k)-x(s_{k-1})\bigr]^{2} + \bigl[y(s_k)-y(s_{k-1})\bigr]^{2}}} - m_k g = m_k\,\frac{d^{2}y(s_k)}{dt^{2}}$$
vertically.
[Figure 1: free-body diagram of a single piece of the string, showing the two tension forces T and the weight mg.]
Taking N to be very large, we can simplify these denominators by writing
$$\sqrt{\bigl[x(s_{k+1})-x(s_k)\bigr]^2 + \bigl[y(s_{k+1})-y(s_k)\bigr]^2} \approx \sqrt{\left(\frac{\partial x}{\partial s}\right)^2 + \left(\frac{\partial y}{\partial s}\right)^2}\;\Delta s = \Delta s$$
in light of our above result. This allows us to re-write Newton's second law in the horizontal direction
as
$$m_k\frac{d^2 x(s_k)}{dt^2} = T\!\left(s_k + \tfrac{\Delta s}{2}\right)\frac{x(s_{k+1}) - x(s_k)}{\Delta s} - T\!\left(s_k - \tfrac{\Delta s}{2}\right)\frac{x(s_k) - x(s_{k-1})}{\Delta s}.$$
As N becomes very large, both ratios tend to the derivative ∂x/∂s evaluated slightly later and earlier
in s than s_k. To be fair, we will evaluate these derivatives at the point halfway between the two points
associated with each ratio. Horizontally, this gives
$$m_k\frac{d^2 x(s_k)}{dt^2} = T(s)\frac{\partial x}{\partial s}\bigg|_{s = s_k + \Delta s/2} - T(s)\frac{\partial x}{\partial s}\bigg|_{s = s_k - \Delta s/2} \approx \Delta s\,\frac{\partial}{\partial s}\!\left(T\frac{\partial x}{\partial s}\right).$$
The mass of the element associated with s_k is given by the mass density of this element times its
length, m_k = μ(s_k) Δs, allowing us to further simplify this expression to
$$\mu\frac{\partial^2 x}{\partial t^2} = \frac{\partial}{\partial s}\!\left(T\frac{\partial x}{\partial s}\right).$$
We have changed the derivative with respect to t to a partial derivative because we are now thinking of
the function x as a function of both s and t. The time is held fixed during differentiation with respect to
s, and the variable s is held fixed during differentiation with respect to t, so the partial derivative
notation is appropriate. Performing the same manipulations in the vertical direction gives
$$\mu\frac{\partial^2 y}{\partial t^2} = \frac{\partial}{\partial s}\!\left(T\frac{\partial y}{\partial s}\right) - \mu g.$$
These are the general equations governing wave motion along a string. We have allowed both the
tension T and the mass density μ to depend on where we are along the string. We will usually take
one or both of these functions to be constant, but these equations are appropriate in the general case.
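Before moving on, it may help to see the discretized picture above turned directly into numbers. The sketch below is not from the text; it is a minimal illustration (grid size, time step, tension, and pluck shape are arbitrary choices, and gravity is omitted) that integrates Newton's second law for the N pieces of a string with constant tension and mass density, using the same nearest-neighbor tension forces that produced the continuum equations.

```python
import numpy as np

# Minimal discrete-string integrator illustrating the derivation above.
# All parameter values are illustrative choices, not values from the text.
L, T, mu, N = 1.0, 10.0, 0.1, 200          # length, tension, mass density, pieces
ds = L / N
m = mu * ds                                 # mass of one piece
dt = 0.2 * ds / np.sqrt(T / mu)             # time step well below the stability limit

x = np.linspace(0.0, L, N + 1)
y = np.where(x < 0.5 * L, x, L - x) * 0.1   # small triangular pluck, zero at both ends
v = np.zeros_like(y)                        # string starts at rest

def accel(y):
    """Vertical acceleration of each interior piece from the neighboring tensions."""
    a = np.zeros_like(y)
    a[1:-1] = (T / m) * (y[2:] - 2.0 * y[1:-1] + y[:-2]) / ds
    return a                                # endpoints held fixed (a = 0 there)

for step in range(2000):                    # simple symplectic-Euler update
    v += accel(y) * dt
    y += v * dt
# y now holds the string shape after 2000 time steps
```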
Suppose that the mass density μ of the string is constant. The equilibrium configuration of
the string can then be found by taking the time derivatives as zero. The horizontal equation implies
that the quantity
$$T\frac{\partial x}{\partial s} = C$$
is constant. Since the tension T is definitely positive, this implies that the partial derivative ∂x/∂s can
never vanish. We can therefore use x to denote position along the string instead of s. This allows us to
modify the vertical equation as:
$$\mu g = \frac{\partial}{\partial s}\!\left(T\frac{\partial y}{\partial s}\right) = \frac{\partial}{\partial s}\!\left(T\frac{\partial x}{\partial s}\frac{dy}{dx}\right) = C\frac{\partial}{\partial s}\frac{dy}{dx} = C\frac{\partial x}{\partial s}\frac{d}{dx}\frac{dy}{dx} = \frac{C^2}{T}\frac{d^2 y}{dx^2}.$$
These are total derivatives, using the symbol d, because time does not change throughout our analysis.
Since
$$T = C\frac{\partial s}{\partial x} = C\frac{\sqrt{dx^2 + dy^2}}{dx} = C\sqrt{1 + \left(\frac{dy}{dx}\right)^2},$$
we have
$$\frac{d^2 y}{dx^2} = \frac{\mu g}{C^2}\,T = \frac{\mu g}{C}\sqrt{1 + \left(\frac{dy}{dx}\right)^2}.$$
This is a nonlinear differential equation of second order for the equilibrium shape y(x) of the string;
we can re-write it as a separable nonlinear differential equation of first order by thinking of it as an
equation for v ≡ dy/dx, and then solve it:
$$\frac{dv}{dx} = \frac{\mu g}{C}\sqrt{1 + v^2} \quad\Longrightarrow\quad \frac{dv}{\sqrt{1 + v^2}} = \frac{\mu g}{C}\,dx.$$
The integral can be accomplished by making the hyperbolic substitution v = sinh u. This leads to the
solution
$$v = \sinh\!\left(\frac{\mu g x}{C} + A\right),$$
or
$$y = \frac{C}{\mu g}\cosh\!\left(\frac{\mu g x}{C} + A\right) + B.$$

This is called the catenary curve. The constants C, A, and B are chosen to make the solution conform
to the physical requirements of the problem at hand. If the string is fixed at y = 0 at the two points
x = −D/2 and x = D/2, then we immediately have A = 0 and
$$B = -\frac{C}{\mu g}\cosh\frac{\mu g D}{2C}.$$
Our solution is then
$$y = \frac{C}{\mu g}\left[\cosh\frac{\mu g x}{C} - \cosh\frac{\mu g D}{2C}\right].$$
The remaining constant C is related to the tension in the string:
$$T = C\sqrt{1 + \left(\frac{dy}{dx}\right)^2} = C\sqrt{1 + \sinh^2\frac{\mu g x}{C}} = C\cosh\frac{\mu g x}{C}.$$
From this, we can identify C as the tension in the string at its lowest point, C = T₀. This is the
minimum tension found along the string, as all other points must support the weight of the string lying
below them.
The equilibrium solution of our string with constant mass density μ is given by
$$y = \frac{T_0}{\mu g}\left[\cosh\frac{\mu g x}{T_0} - \cosh\frac{\mu g D}{2T_0}\right].$$

When T₀ ≫ μgD, this solution can be written approximately as
$$y \approx \frac{\mu g}{2T_0}\left(x^2 - \frac{D^2}{4}\right).$$
Thus, strings under high tension exhibit equilibrium configurations that approximate a parabola. The
larger the value of the central tension, the flatter the parabola. This is obviously in agreement with our
expectations of the behavior of a string under high tension. In order to explore what happens under
very small tensions, we need to first find the value of D in terms of L. The length of the string is
obviously given by
$$L = \int_{-D/2}^{D/2}\sqrt{1 + \left(\frac{dy}{dx}\right)^2}\,dx = \frac{2T_0}{\mu g}\sinh\frac{\mu g D}{2T_0},$$
so
$$D = \frac{2T_0}{\mu g}\sinh^{-1}\frac{\mu g L}{2T_0}.$$
Plugging this into our expression, and writing u = 2x/D, we obtain
$$y = \frac{T_0}{\mu g}\left[\cosh\!\left(u\sinh^{-1}\frac{\mu g L}{2T_0}\right) - \sqrt{1 + \left(\frac{\mu g L}{2T_0}\right)^2}\,\right].$$
At the center of the string, where u = 0, we have a location of
$$y = \frac{T_0}{\mu g}\left[1 - \sqrt{1 + \left(\frac{\mu g L}{2T_0}\right)^2}\,\right],$$
which tends to −L/2 as T₀ tends to zero.
Since the length of the string is L, this implies that the smallest tensions require the string to hang
straight down. The horizontal displacement D becomes zero in this limit. Again, this is exactly what
we would expect.
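These equilibrium relations are also easy to explore numerically, which is essentially what exercises 1 - 4 below ask for. The following sketch is an illustration rather than part of the text: it picks arbitrary values for the length, span, and mass density, solves the transcendental relation between L, D, and T₀ for the central tension, and evaluates the sag and endpoint tension from the formulas above.

```python
import numpy as np
from scipy.optimize import brentq

# Illustrative values (not from the text): a 3 m chain, 2 kg/m, posts 2 m apart.
L, D, mu, g = 3.0, 2.0, 2.0, 9.8

# Solve  L = (2 T0 / (mu g)) sinh(mu g D / (2 T0))  for the central tension T0.
def length_mismatch(T0):
    return 2.0 * T0 / (mu * g) * np.sinh(mu * g * D / (2.0 * T0)) - L

T0 = brentq(length_mismatch, 0.05, 1e4)      # bracket chosen to enclose the root

sag = T0 / (mu * g) * (1.0 - np.cosh(mu * g * D / (2.0 * T0)))   # y at x = 0 (negative)
T_end = T0 * np.cosh(mu * g * D / (2.0 * T0))                    # tension at the posts

print(f"T0 = {T0:.2f} N, lowest point y = {sag:.3f} m, endpoint tension = {T_end:.2f} N")
```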
If the string is allowed to change its position with respect to time, there are many different
situations we can imagine. The simplest is one in which the string is only displaced a little bit from its
equilibrium. In this case, we can again use x to parameterize location along the string instead of s.
Now, it is definitely true that
$$\frac{\partial s}{\partial x} = \sqrt{1 + \left(\frac{\partial y}{\partial x}\right)^2}.$$
If we assume that the derivative of the vertical displacement with respect to the horizontal
displacement is small, then we can write this result as
$$\frac{\partial s}{\partial x} = \sqrt{1 + \left(\frac{\partial y}{\partial x}\right)^2} \approx 1 + \frac{1}{2}\left(\frac{\partial y}{\partial x}\right)^2 \approx 1.$$
This is the so-called linear wave approximation, in which we ignore all powers of the derivative larger
than the first. It is identical in many respects to the small angle approximation sin θ ≈ θ often seen in
physics. In this approximation, the parameter s is essentially the same as the coordinate x. The
horizontal equation is somewhat strange in this case, as the value of s is what we are using to
parameterize the string. It cannot change with time in our construction because we have taken x = s,
up to a constant. The two quantities t and s represent independent variables, so they cannot be correlated
without additional constraints on the system. This makes the horizontal equation read ∂T/∂x = 0, so
the approximation requires constant tension. If the tension were not constant, then the string would
tend to change its length. This is what changing x with time leads to in this approximation, so it is
disallowed by the statement that our string has constant length and mass density. We will see below
that a slight variation of this derivation allows us to include the effects of longitudinal motion back and
forth along the string, leading us to sound waves, but for now let's just think of x as fixed in such a
way that it cannot depend on time. It runs from −L/2 to L/2 or, equivalently, from 0 to L. The
vertical displacement of the string, y, is allowed to change in small ways with x in a manner that
satisfies the partial differential equation
$$T\frac{\partial^2 y}{\partial x^2} = \mu\frac{\partial^2 y}{\partial t^2} + \mu g.$$
This is the wave equation in one dimension, including gravity.
In order to include gravity systematically, we need to ensure that its effects do not violate the
linear wave approximation. Dividing our equilibrium expression for the tension experienced in the
string as a function of x by the central tension,
$$\frac{T}{T_0} = \cosh\frac{\mu g x}{T_0} \le \sqrt{1 + \left(\frac{\mu g L}{2T_0}\right)^2} \approx 1 + \frac{1}{2}\left(\frac{\mu g L}{2T_0}\right)^2,$$
we see that this approximation is equivalent to stating that the central tension is much greater than the
weight μgL = Mg of the whole string. In this approximation, we are justified in ignoring the effects
of gravity in the wave equation completely, at least as a first step. This brings us to the standard one-dimensional wave equation,
$$T\frac{\partial^2 y}{\partial x^2} = \mu\frac{\partial^2 y}{\partial t^2},$$
or
$$\frac{\partial^2 y}{\partial x^2} = \frac{\mu}{T}\frac{\partial^2 y}{\partial t^2}.$$
Before actually solving this equation in a specific situation, it is important to spend a few
minutes studying its character. The most striking result of this analysis is that any twice differentiable
function f(u) can be used to build a solution to this equation. The claim is that the functions
$$f\!\left(x - \sqrt{T/\mu}\;t\right) \qquad\text{and}\qquad f\!\left(x + \sqrt{T/\mu}\;t\right)$$
both satisfy the wave equation whenever the function f(u) is twice differentiable in u. The validity of
this statement is easily verified simply by substitution into the equation. To see what these solutions
represent, consider how their character changes as t is increased from zero. It is a standard
result from pre-calculus or algebra II that the function f(x − c) is the same as the function f(x), but
with the whole thing moved to the right by c units. Our first function therefore moves to the right by
the time-dependent amount √(T/μ) t, moving more and more to the right as time increases. This is
interpreted as saying that the wave is moving to the right at speed v = √(T/μ). The second solution
clearly moves to the left at the same speed, so our solutions represent right-moving and left-moving
waves. The speed of the wave is constant for every shape, and is given by the square root of the ratio
of the tension in the string to its mass density. More generally, the speed of any mechanical wave moving through
a medium with a restorative property and an inertial property is given by
$$v = \sqrt{\frac{\text{restorative property}}{\text{inertial property}}}.$$
This is a very useful result that often allows us to estimate the speed of waves through a given
material. All we need to do is properly identify the restorative and inertial properties of the material.
We will see this from several different perspectives below, but it is important to point it out now
because the complications associated with actually solving the wave equation under a given set of
conditions obscure this simple result. It would essentially become invisible in the final solution unless
it is already expected to be present.
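The substitution check mentioned above (and requested again in exercise 5) can be automated with a computer algebra system. The short sketch below is an illustration of this check, not part of the text; it verifies symbolically that f(x − vt) and f(x + vt) satisfy the linear wave equation for an arbitrary twice differentiable profile f.

```python
import sympy as sp

# Symbols and an arbitrary (unspecified) twice differentiable profile f.
x, t, v = sp.symbols('x t v', positive=True)
f = sp.Function('f')

for wave in (f(x - v * t), f(x + v * t)):
    # The linear wave equation: y_xx - (1/v^2) y_tt should vanish identically.
    residual = sp.diff(wave, x, 2) - sp.diff(wave, t, 2) / v**2
    assert sp.simplify(residual) == 0      # both traveling waves satisfy it
print("f(x - v t) and f(x + v t) both solve the wave equation.")
```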
Exercises for Section XIII.1:
1. A rope of length L and mass density μ is hung between two points, at x = 0 and x = D. Take
the value of y at these two points as zero.
(a) Find an expression for the vertical position y of the rope as a function of L, D, μ, x, and the
constant C. What requirement must be made about the ratio r = L/D in order for this result
to be physical? Show a graph of the equilibrium for r = 1.05, 1.2, and 3.
(b) Determine the tension in the rope at its lowest point in terms of μgD for r = 1.05, 1.2, and
3. Where is this lowest point? What happens to this tension as the mass density of the rope
increases? At what ratio r does this tension equal the weight of the rope?
(c) Determine the tension required to support the rope at its endpoints in terms of μgD for
r = 1.05, 1.2, and 3. What happens to this required tension as r increases? What is the
minimum value of this tension, and what value of D does it correspond to? Explain your
result physically.
(d) Determine the ratio of the maximum tension at the endpoints to the minimum tension at the
lowest point for r = 1.05, 1.2, and 3. What happens to this ratio as D tends to zero? What
happens as D tends to L?
2. A rope of length L and mass density μ is hung between two points, at x = 0 and x = D. The y
value at x = D is a distance H higher than that at x = 0. Take y = 0 at x = 0.
(a) Find an expression for the vertical position y of the rope as a function of L, D, H, μ, x, and
two constants given as the solution to transcendental equations. What requirements must be
made of the ratios r = L/D and s = H/D in order for your result to be physical? Show a graph
of the equilibrium for r = 3 and s = 0.5, 1, and 2.
(b) Determine the tension in the rope at its lowest point in terms of μgD for r = 3 and s = 0.5,
1, and 2. Where is this lowest point, reckoned in terms of D, for the given values of r and s?
How does it behave as L is increased at constant D and H, and as H is increased at constant
L and D? What happens to this tension as the mass density of the rope increases?
(c) Determine the tensions required to support the rope at its endpoints in terms of μgD for
r = 3 and s = 0.5, 1, and 2. What happens to these required tensions as s is increased at
constant r? What happens as r is increased at constant s? What is the minimum value of
these tensions, and what value of D does it correspond to? Explain your result physically.
(d) What happens to the tensions you found in part (c) as the distance D tends to its maximum
value? What is this maximum value?
(e) Determine the ratio between the maximum tension at the right endpoint and the minimum
tension at the lowest point for r = 3 and s = 0.5, 1, and 2. What happens to this ratio as D
tends to zero? What happens as D tends to its maximum value at constant L and H?
3. You intend to hang a chain 3 meters long whose mass density is 2 kilograms per meter from two
posts one meter above the ground.
(a) What is the minimum tension required at the endpoints in order for the chain to hang, and
not touch the ground? What is the distance between the two posts in this situation?
(b) What is the tension required at the endpoints in order for the chain's lowest point to lie half
a meter above the ground? What is the distance between the posts in this situation?
(c) The chain cannot support a tension greater than 300 Newtons without breaking. What is the
maximum distance between the posts that you can employ without breaking the chain? How
far down will it sag at its lowest point in this situation?
4. You intend to hang a chain 3 meters long whose mass density is 2 kilograms per meter from two
posts. One post is a meter above the ground, and the other is one and a half meters above the
ground.
(a) What is the minimum tension required at the endpoints in order for the chain to hang, and
not touch the ground? What is the distance between the two posts in this situation?
(b) What tensions are required at the endpoints in order for the chain's lowest point to lie half a
meter above the ground? What is the distance between the posts in this situation?
(c) The chain cannot support a tension greater than 300 Newtons without breaking. What is the
maximum distance between the posts that you can employ without breaking the chain? How
far down will it sag at its lowest point in this situation? Give your answer as a distance
below the lower post.
5. Show that f(x − vt) and f(x + vt) satisfy the linear wave equation whenever the function f(u) is
sufficiently differentiable. Explain why the constant v is called the speed of the wave.
Section XIII.2: Boundary Conditions


Suppose that we want to find a unique solution to this differential equation under a given set of
conditions. What conditions will we need? To answer this question, it is useful to step back a bit and
think carefully about the function y(x,t). At every given value of t, this is a function of x alone
defined on the interval x ∈ [0, L]. We can certainly expect this function to be well-behaved enough
that it will reside in the space L² appropriate to this interval. We can span this space with a countable
basis, so it must be true that
$$y(x,t) = \sum_n c_n u_n(x)$$
for every fixed value of t. The basis functions u_n(x) cannot depend on time, so the time dependence
must come from the coefficients of the expansion, c_n = c_n(t). Therefore, every function y(x,t) is
written as a linear combination of products of a function that depends only on t and a function that
depends only on x. This property is called separability, and is extremely important to our method for
solving these partial differential equations. The function y(x,t) will solve our differential equation
whenever each of the contributions to this linear combination also satisfies the equation. Substituting
one of these terms into the equation gives
$$c_n u_n'' = \frac{1}{v^2}\,\ddot{c}_n u_n.$$
Now, the product function c_n u_n cannot be zero everywhere or else we would simply ignore it in the
expansion. Wherever it is not zero, we can divide by it:
$$\frac{u_n''}{u_n} = \frac{1}{v^2}\frac{\ddot{c}_n}{c_n}.$$
This equation is very telling, as the left-hand side depends only on x and the right-hand side depends
only on t. The only way to satisfy this is to have both sides constant:
$$\frac{u_n''}{u_n} = \frac{1}{v^2}\frac{\ddot{c}_n}{c_n} = -k_n^2.$$
The particular form of this constant is chosen with a bit of clairvoyance, but we would eventually end
up with the same result by just writing C for this constant. Writing it this way is more refined, and will
lead to a clearer result.
The equation now separates, as it implies that
$$u_n'' = -k_n^2 u_n$$
and
$$\ddot{c}_n = -v^2 k_n^2 c_n.$$
Thus, the separability of our equation allows us to consider it as two separate ordinary differential
equations instead of a single partial differential equation. Both of these equations are actually
eigenvalue equations with the second derivative operator in the appropriate variable. They are totally
symmetric, but the fact that we intend our solution to be valid for all t and only for values of x lying
between 0 and L indicates that only the second derivative with respect to x can have the boundary
conditions required to make this operator Hermitian and guarantee that its eigenfunctions will span the
space, be orthogonal, and have all of the other nice properties we require of our basis. The second
derivative operator has an adjoint that is given by

$$\left\langle g,\frac{d^2 f}{dx^2}\right\rangle = \int_0^L g^*(x) f''(x)\,dx = \Bigl[g^*(x)f'(x)\Bigr]_0^L - \int_0^L g^{*\prime}(x)f'(x)\,dx = \Bigl[g^*(x)f'(x) - g^{*\prime}(x)f(x)\Bigr]_0^L + \int_0^L g^{*\prime\prime}(x)f(x)\,dx = \Bigl[g^*(x)f'(x) - g^{*\prime}(x)f(x)\Bigr]_0^L + \left\langle\frac{d^2 g}{dx^2}, f\right\rangle,$$
so it will be Hermitian as long as we require our functions to satisfy boundary conditions that make the
boundary term vanish.
One fruitful way to explore the necessary requirements for the boundary term to vanish is to re-write the boundary term as a determinant:
$$g^*(x)f'(x) - g^{*\prime}(x)f(x) = \begin{vmatrix} g^*(x) & f(x) \\ g^{*\prime}(x) & f'(x) \end{vmatrix} \equiv W(x).$$
The requirement that the boundary term vanishes is equivalent to the periodic requirement
W(L) = W(0). We need to have this be true regardless of the functions f(x) and g(x), so any given
condition must be stated entirely in terms of either f(x) or g(x). The boundary condition cannot
depend on what function we are taking the inner product with. The easiest, and most common, way to
do this is to require that both of these determinants are zero. This implies that the two rows are
linearly dependent, or
$$a f(0) + b f'(0) = 0$$
and
$$c f(L) + d f'(L) = 0$$
for some nontrivial constants a, b, c, and d.186 These two linear combinations do not have to be the
same for both endpoints, but they can be. This is definitely the set of boundary conditions that one
usually finds, but it is not the most general. To obtain an alternate set of conditions, we recall that
determinants are not changed by row operations that replace a row with itself plus a linear combination
of the other rows. Replacing row 1 with itself plus a multiple of row 2, then replacing row 2 with itself
plus another multiple of row 1, leads to the periodic boundary conditions
$$f(0) + a f'(0) = f(L) + b f'(L)$$
$$c f(0) + (1 + ac) f'(0) = d f(L) + (1 + bd) f'(L),$$
with constants a, b, c, and d. These constants may all be zero in this situation, as there is no choice
that makes them trivial. These types of conditions also come up often in applications, usually in the
form f(0) = f(L); f'(0) = f'(L) with all constants chosen equal to zero, but are not as common as
the previous set. Note that these two sets of boundary conditions are totally independent of one
another. Given a set of functions that satisfy the first set of boundary conditions, it is not always
possible to show that they also satisfy the second set. There are probably other sets of boundary
conditions that also guarantee that the boundary term vanishes, but we will be most interested in these
two choices.
The choice of which boundary condition to use is dictated by the physical problem we are
interested in solving. A string that is fixed at both ends, for example, will fix the boundary condition
as f (0) = f (L) = 0. These are sometimes called rigid or closed boundaries, as the string is fixed and
unable to move. A string that is fixed at x = 0, but able to move freely at x = L, will satisfy the
boundary condition f(0) = f'(L) = 0, as any nonzero derivative at L will immediately fix itself
because that end is able to move freely. The tension forces will pull it back down to the same level as
its single neighbor to the left, causing the derivative to become zero. A boundary condition like that is
186 We cannot have a = b = 0, for example, as this makes the first equation trivial.

known as free or open. Periodic boundary conditions are most often seen when one is considering a
crystal in which a region is repeated again and again. The periodic boundary conditions then imply
that the behavior in the next region is the same as that in the present region. Since the boundary term
disappears for every choice of boundary condition of the types given above, we can be sure that the
second derivative operator is Hermitian and its eigenfunctions will span the space regardless of which
choice is made.
If we are interested in fixing the function at some value other than zero, we can shift this
requirement away from the boundary conditions by shifting our function y(x,t). Suppose, for
example, that we wish to force y(0,t) = 0 and y(L,t) = 12. Considering instead the function
z(x,t) = y(x,t) − 12x/L, we see that this new function satisfies the appropriate boundary condition
z(0,t) = z(L,t) = 0 and the same differential equation. This process is a little more tricky when we
require the derivative of the solution with respect to x to have a given nonzero value at one of the
endpoints. Suppose, for example, that we need the solution to satisfy y_x(0,t) = 0 and y_x(L,t) = 4.
Considering instead the function z(x,t) = y(x,t) − 2x²/L, we see that this function does satisfy the
appropriate zero boundary conditions. However, our shift has modified the differential equation in this
situation; the function z(x,t) satisfies
$$\frac{\partial^2 z}{\partial x^2} = \frac{1}{v^2}\frac{\partial^2 z}{\partial t^2} - \frac{4}{L}.$$
This can be fixed by considering the function w(x,t) = z(x,t) − 2v²t²/L. This function satisfies the
original equation and all of the appropriate boundary conditions. Once we have found the solution for
w(x,t), we simply use our relations to determine the value of y(x,t). If we want our function to
satisfy y(0,t) = 0 and y_x(L,t) = 3, we need to shift our function without changing the boundary
condition at x = 0. We no longer need to ensure that the derivative at x = 0 remains fixed, though, so
we can accomplish this simply by subtracting 3x from our function. This makes the boundary
condition at x = L of the appropriate type without modifying either the boundary condition at x = 0 or
the differential equation. These types of shifting games are common to more general linear partial
differential equations; we can even get time-dependent boundary conditions using this process, though
this is a bit more tricky. We can think of the boundary conditions as being moved around from
boundary to differential equation to other boundary, etc., throughout these games. The essential point
is to get a function that satisfies a homogeneous differential equation (one that admits the zero function
as a solution) and boundary conditions of one of the above types. As long as the initial problem was
well-posed, i.e. has a unique solution, we will always be able to accomplish this.
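These shifting games are easy to verify mechanically. The sketch below, an illustration of the second example above rather than anything from the text, checks that the combined shift 2x²/L + 2v²t²/L is itself annihilated by the wave operator, so that w = y − 2x²/L − 2v²t²/L obeys the same equation as y, and that its x-derivative carries exactly the inhomogeneous boundary data.

```python
import sympy as sp

x, t, v, L = sp.symbols('x t v L', positive=True)

# The combined shift from the text's second example: subtracting it from y turns
# y_x(0,t) = 0, y_x(L,t) = 4 into homogeneous Neumann conditions.
shift = 2 * x**2 / L + 2 * v**2 * t**2 / L

# (1) The shift satisfies the homogeneous wave equation, so w = y - shift obeys
#     the same PDE as y.
wave_op = sp.diff(shift, x, 2) - sp.diff(shift, t, 2) / v**2
assert sp.simplify(wave_op) == 0

# (2) Its x-derivative carries exactly the inhomogeneous boundary data.
slope = sp.diff(shift, x)
assert slope.subs(x, 0) == 0 and sp.simplify(slope.subs(x, L) - 4) == 0
print("w = y - 2x^2/L - 2v^2 t^2/L satisfies the homogeneous problem.")
```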
Now that we have established that the second derivative operator will be Hermitian as long as
the functions we are interested in satisfy an appropriate set of boundary conditions, let's see what its
eigenfunctions are. The eigenvalue equation is
$$\frac{d^2 u}{dx^2} = \lambda u.$$
This equation has solutions u = Ae^{√λ x} + Be^{−√λ x} for any value of λ, but not all of these will satisfy the
appropriate boundary conditions. For boundary conditions of the first type, we require linear
combinations of these solutions to equal zero at both endpoints. These need not be the same linear
combination, but whatever linear combination we require at x = 0 must be zero for every value of λ,
and the same goes for that at x = L. This is not possible for positive values of λ because the
exponential function is one-to-one and never zero. A specific (nontrivial) linear combination of two
exponentials can only be zero at a single value of x. We can arrange the coefficients A and B in such a
way that any given linear combination will satisfy this requirement at x = 0, but this will disallow any
linear combinations from satisfying the required condition at x = L. This fact is difficult to see in
general, but very easily established with the specific requirement that u(0) = u(L) = 0. The first
requirement gives A = −B, which causes the second requirement to never be satisfied for any nonzero
value of λ because e^{2√λ L} ≠ 1. When λ is negative, on the other hand, the functions are no longer
one-to-one. They take the same values over and over again as x changes, so it is possible to satisfy
these boundary conditions. Thus, we require our eigenvalue λ to be a negative real number. This is the
clairvoyance that led to our expression with −k_n². The second type of boundary condition also
disallows positive values of λ because the exponential functions have no linear combinations that are
periodic. Again, it may be possible to choose constants a and b that are independent of λ and
guarantee one of the conditions, but there are no choices of c and d that will cause the second condition
to hold independent of λ. As before, negative values of λ do not have these difficulties, as the
resulting functions are periodic and can be made to satisfy these conditions.
In order to find the eigenfunctions appropriate to the boundary conditions u(0) = u(L) = 0, we
first apply the boundary condition at x = 0. This gives A + B = 0, so the appropriate linear
combination of the complex exponentials is given by
$$Ae^{ik_n x} + Be^{-ik_n x} = A\left(e^{ik_n x} - e^{-ik_n x}\right) = 2iA\sin k_n x.$$
Now, A is an arbitrary constant, so it can absorb the new factor of 2i. In addition, any multiple of an
eigenfunction that satisfies the boundary conditions will also be an eigenfunction that satisfies them, so
we can simply take our eigenfunctions as sin k_n x. In order to satisfy the boundary condition at
x = L, we require sin k_n L = 0. This is accomplished by taking k_n L = nπ, or
$$k_n = \frac{n\pi}{L}\;;\qquad n \in \mathbb{Z}.$$
The eigenfunctions that satisfy our boundary conditions are therefore given by sin(nπx/L). Our
result is quite similar to that found in the infinidimensional linear algebra notes. The only thing that is
missing is a factor of 2, which is associated with the difference between the boundary conditions
imposed there and those imposed here. This situation is indicative of what happens in general: the first
boundary condition fixes the linear combination of the two linearly independent eigenfunctions, and
the second quantizes the eigenvalues to make them a countable set. The process is not as simple with
more general boundary conditions, especially those that combine the derivative and the function itself,
but the end result is the same.
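Since everything that follows leans on these eigenfunctions forming an orthogonal set with norm √(L/2), a quick numerical confirmation may be reassuring. The sketch below is a simple illustration (the length and mode numbers are arbitrary choices), not part of the text.

```python
import numpy as np
from scipy.integrate import quad

L = 1.0   # illustrative length

def inner(n, m):
    """Inner product of sin(n pi x / L) and sin(m pi x / L) on [0, L]."""
    val, _ = quad(lambda x: np.sin(n * np.pi * x / L) * np.sin(m * np.pi * x / L), 0.0, L)
    return val

for n in range(1, 5):
    for m in range(1, 5):
        expected = L / 2 if n == m else 0.0
        assert abs(inner(n, m) - expected) < 1e-8
print("sin(n pi x / L) are mutually orthogonal with norm sqrt(L/2).")
```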
Exercises for Section XIII.2:

In problems 1 - 6, determine the shift required in the function y, which satisfies the given boundary
conditions and the equation
$$\frac{\partial^2 y}{\partial t^2} = v^2\frac{\partial^2 y}{\partial x^2},$$
in order to cause the new function z to satisfy the same differential equation along with homogeneous
boundary conditions. Then, determine the eigenfunctions associated with these homogeneous boundary
conditions.
1. y(0,t) = 5; y(L,t) = 7.
2. y_x(0,t) = 1; y(L,t) = 3.
3. y(0,t) = 2; y_x(L,t) = 3.
4. y_x(0,t) = 1; y_x(L,t) = 5.
5. y(0,t) = sin t; y(L,t) = 0.
6. y(0,t) = cos t; y(L,t) = e^{−t}.
7. Suppose that we require a solution to the wave equation that satisfies the periodic boundary
conditions y(0,t) = y(L,t); y_x(0,t) = y_x(L,t). Show that shifting games are not necessary
in this case and find the appropriate eigenfunctions of the second derivative ∂²/∂x².
8. Suppose that we require a solution to the wave equation that satisfies the periodic boundary
conditions y(0,t) = y_x(L,t); y_x(0,t) = y(L,t). Show that shifting games are not necessary
in this case and find the appropriate eigenfunctions of the second derivative ∂²/∂x².

Section XIII.3: Initial Conditions and the Solution of the Wave Equation

Now that we have seen that we can determine a set of eigenfunctions that satisfy the
requirements necessary to make the second derivative operator Hermitian, it is time to turn to the
temporal differential equation. This equation is of exactly the same form as that we just solved, but
there are no boundary conditions in this case. The equations are the same, but we are not thinking
about them in the same way. In solving the spatial differential equation, we were concerned about
making sure that the boundary conditions were satisfied. This was possible because the constants kn
had not yet been specified. We were essentially specifying the differential equation itself in addition to
the solutions it spawns. The temporal differential equation has no such free parameters. It is given by
$$\frac{d^2 c_n}{dt^2} = -\left(\frac{n\pi v}{L}\right)^2 c_n,$$
and has the solutions
$$c_n(t) = A\cos\frac{n\pi v t}{L} + B\sin\frac{n\pi v t}{L},$$
for arbitrary constants A and B. In order to fix these coefficients and determine a unique solution, we
need to impose two independent conditions. It is no longer possible to impose boundary conditions,
as we cannot change the differential equation itself. Instead, we view this as a proper ordinary second
order differential equation and require two initial conditions to fix the coefficients A and B. These
initial conditions must apply to each of the functions cn (t ) , but they need not be the same for all n.
The standard choice is to fix c_n(0) and ċ_n(0), and this is done by specifying the initial shape and
velocity of the string. Each of these initial functions specifies all of the coefficients of the expansion in
the manner shown in the last chapter, so each gives an initial condition on every function cn (t ) . Our
wave equation problem is then specified by asking for a function y(x,t) that satisfies both the
boundary conditions
$$y(0,t) = y(L,t) = 0$$
and the initial conditions
$$y(x,0) = f(x)\;;\qquad y_t(x,0) = g(x),$$
as well as the differential equation
$$\frac{\partial^2 y}{\partial x^2} = \frac{1}{v^2}\frac{\partial^2 y}{\partial t^2}.$$
Note that these requirements are symmetric in some way. We require two conditions on the function
that are satisfied for all values of t, and another two conditions that are satisfied for all values of x.
The first set of conditions is satisfied at fixed values of x, and the second set is satisfied at fixed values
of t. The only real difference between these two sets of conditions is that the boundary conditions are
specified at two different values of x while the initial conditions are specified at the same value of t.
This again relates to the different character of these two conditions. The boundary conditions
determine the eigenvalues, and the initial conditions determine the function under the restriction of the
specified eigenvalues. The linear independence of the two solutions for cn (t ) guarantees that we will
be able to find solutions that satisfy an arbitrary set of conditions imposed at the same time, but it does
not guarantee that we will be able to find a solution that satisfies two arbitrary conditions specified at
different times.
The fact that we require two sets of conditions on both x and t follows indirectly from the fact
that our differential equation is of the second order in both of these variables. A partial differential
equation of third order in t and second order in x would require two boundary conditions and three
initial conditions to specify uniquely. One of third order in x and second order in t would require three
boundary conditions and two initial conditions to specify uniquely. This type of situation occurs only
rarely in physical applications, but it is interesting to analyze from a mathematical standpoint. The
same is true of differential equations that contain mixed partial derivatives, especially when the highest
order derivative in one of the variables is accompanied by a derivative in the other variable. In
practice, these situations, if they arise, can usually be handled by choosing a different set of
independent variables. Physical situations are often constructed in such a way as to make these
situations impossible, basically by choosing this appropriate combination of variables at the outset.
Now that we have a set of rules that give a well-posed problem, let's illustrate the actual
technique of solving this problem in a specific example. Suppose that the string is fixed at zero at both
endpoints and that it is plucked in the middle so that its initial displacement is given by
$$y(x,0) = f(x) = \begin{cases} x & ;\; x \le L/2 \\ L - x & ;\; x \ge L/2 \end{cases}$$
and its initial velocity y_t(x,0) is zero. Our solution is given by
$$y(x,t) = \sum_{n=1}^{\infty} c_n(t)\sin\frac{n\pi x}{L}.$$
The negative values of n no longer contribute, as their eigenfunctions are linearly dependent with those
of positive n with this fixed linear combination that forces the function to be zero at x = 0. The
contribution from n = 0 is not present because the linear combination of eigenfunctions that satisfies
the boundary conditions is identically zero for this value of n. All that is left is the determination of
the functions cn (t ) . The second initial condition, that the initial velocity is everywhere zero, is the
easiest to implement. It gives
$$c_n(t) = A_n\cos\frac{n\pi v t}{L}.$$
As with the boundary conditions, the first condition imposed determines the linear combination of the
two linearly independent solutions that is appropriate to our analysis. The product functions
$$c_n(t)\sin\frac{n\pi x}{L} = A_n\cos\frac{n\pi v t}{L}\sin\frac{n\pi x}{L}$$
are called normal modes of the system, as any linear combination of them will satisfy the partial
differential equation. They all act independently of one another, and their linear combinations are
guaranteed to be able to represent any wave satisfying the boundary conditions. The only question left
is which linear combination to take. The other initial condition states that
$$f(x) = \sum_{n=1}^{\infty} A_n\sin\frac{n\pi x}{L}.$$
This condition fixes the coefficients, yielding a unique solution function y(x,t). To determine these
coefficients, we project out the coefficient of the mth eigenfunction by taking the inner product with
sin(mπx/L). This gives
$$\int_0^L f(x)\sin\frac{m\pi x}{L}\,dx = \sum_{n=1}^{\infty} A_n\int_0^L \sin\frac{n\pi x}{L}\sin\frac{m\pi x}{L}\,dx = \frac{L}{2}A_m.$$
The factor of L/2 comes because the eigenfunctions we are using are not normalized. Each of them
has norm √(L/2), so this factor comes out when we do the projection. This expression allows us to
determine the coefficients explicitly:
$$A_m = \frac{2}{L}\int_0^L f(x)\sin\frac{m\pi x}{L}\,dx = \begin{cases} 0 & ;\; m\ \text{even} \\[4pt] \dfrac{4L}{m^2\pi^2}(-1)^{(m-1)/2} & ;\; m\ \text{odd} \end{cases}.$$
Therefore, our solution is given by
$$y(x,t) = \frac{4L}{\pi^2}\sum_{j=0}^{\infty}\frac{(-1)^j}{(2j+1)^2}\cos\frac{(2j+1)\pi v t}{L}\sin\frac{(2j+1)\pi x}{L}.$$
It is obvious that this function satisfies both boundary conditions as well as the partial differential
equation. The only tricky part is the initial conditions, as it would be easy to make a mistake in the
computation of the integrals.
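One way to guard against exactly that kind of mistake is to evaluate the partial sums numerically and compare them with the initial shape. The sketch below is an illustration of this check, not part of the text (the cutoff, wave speed, and sample times are arbitrary choices); it reproduces the kind of snapshots shown in figures 2 - 5.

```python
import numpy as np

L, v = 1.0, 1.0                  # illustrative length and wave speed
jmax = 50                        # same cutoff used for the figures in the text

def y(x, t):
    """Partial sum of the plucked-string series up to j = jmax."""
    total = np.zeros_like(x)
    for j in range(jmax + 1):
        n = 2 * j + 1
        total += (-1)**j / n**2 * np.cos(n * np.pi * v * t / L) * np.sin(n * np.pi * x / L)
    return 4 * L / np.pi**2 * total

x = np.linspace(0.0, L, 400)
# At t = 0 the partial sum should reproduce the triangular pluck to a few parts in a thousand.
assert np.max(np.abs(y(x, 0.0) - np.where(x < L / 2, x, L - x))) < 5e-3
for vt in (0.0, 0.2, 0.4, 0.8):
    print(f"vt = {vt:3.1f} L : center displacement = {y(np.array([L/2]), vt * L / v)[0]: .3f}")
```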
Evaluating our solution at t = 0 and x = L/2 gives
$$y(L/2, 0) = \frac{4L}{\pi^2}\sum_{j=0}^{\infty}\frac{(-1)^j}{(2j+1)^2}\sin\frac{(2j+1)\pi}{2} = \frac{4L}{\pi^2}\sum_{j=0}^{\infty}\frac{1}{(2j+1)^2} = \frac{4L}{\pi^2}\cdot\frac{\pi^2}{8} = \frac{L}{2},$$
as required. This is a fairly stringent check, but it is not completely definitive. To make us more
confident in our results, we graph the function for a few values of t. Cutting off the sum at j = 50, we
obtain the graphs shown in figures 2, 3, 4, and 5 for vt = 0, 0.2L, 0.4L, and 0.8L, respectively. The
horizontal and vertical axes are given in units of L. These graphs indicate how the ensuing motion is
expected to go. It is best to think of them as a flip book or series of snapshots of the string at various
times. Once the string is plucked, it will approach the equilibrium and form a trapezoid in the process.
This trapezoid decreases its size until it reaches equilibrium and moves through it to reform on the
other side. This motion will continue forever under our model, as we have not included any friction.
[Figures 2 - 5: snapshots of the plucked string at vt = 0, 0.2L, 0.4L, and 0.8L, with both axes in units of L.]

We can, if we like, include friction in our analysis. This is not that invasive of a change as long
as the friction does not depend on x, as our eigenvalue equation is unchanged. The temporal equation,
on the other hand, becomes
$$\ddot{c}_n + 2b\,\dot{c}_n + k_n^2 v^2 c_n = 0.$$
The new term is the first derivative, associated with the resistive force
$$F = -2b\mu\frac{\partial y}{\partial t}$$
directed opposite the velocity. The quantity b measures the strength of the resistive force. At higher
speeds, one actually expects a force proportional to the square of the speed. This changes the nature of
the equation, making it a nonlinear partial differential equation that cannot be directly solved using
any of the techniques we have discussed. The assumption of a resistive force proportional to the
velocity leads to a second order differential equation with constant coefficients, so it can easily be
solved. The discriminant of the characteristic equation is 4b² − 4k_n²v². We are actually looking for the
complex roots because these contain the same oscillatory behavior as seen in the undamped scenario.
For negative discriminant, we have solutions
$$c_n(t) = A_n e^{-bt}\cos\!\left(\sqrt{v^2 k_n^2 - b^2}\,t\right) + B_n e^{-bt}\sin\!\left(\sqrt{v^2 k_n^2 - b^2}\,t\right).$$

The derivative at t = 0 gives
$$\dot{c}_n(0) = -bA_n + \sqrt{v^2 k_n^2 - b^2}\,B_n,$$
so strings initially at rest must use the linear combination
$$c_n(t) = A_n e^{-bt}\left[\cos\!\left(\sqrt{v^2 k_n^2 - b^2}\,t\right) + \frac{b}{\sqrt{v^2 k_n^2 - b^2}}\sin\!\left(\sqrt{v^2 k_n^2 - b^2}\,t\right)\right].$$
The value of A_n is the same as before, as the damping cannot contribute when t = 0. It is often
convenient to express results like this in such a way that comparison to the undamped case is apparent.
We do this by writing the result as
$$c_n(t) = A_n e^{-bt}\left[\cos\!\left(vk_n\sqrt{1 - \left(\tfrac{b}{vk_n}\right)^2}\,t\right) + \frac{b}{vk_n\sqrt{1 - \left(\tfrac{b}{vk_n}\right)^2}}\sin\!\left(vk_n\sqrt{1 - \left(\tfrac{b}{vk_n}\right)^2}\,t\right)\right].$$
This looks more complicated until we realize that we can write the ratio as a dimensionless parameter
$$\zeta_n \equiv \frac{b}{vk_n} = \frac{Lb}{n\pi v} = \frac{Lb}{n\pi\sqrt{T/\mu}},$$
in terms of which
$$c_n(t) = A_n e^{-bt}\left[\cos\!\left(vk_n\sqrt{1 - \zeta_n^2}\,t\right) + \frac{\zeta_n}{\sqrt{1 - \zeta_n^2}}\sin\!\left(vk_n\sqrt{1 - \zeta_n^2}\,t\right)\right].$$
It is clear from our expression for ζ_n that this parameter tends to zero for every value of b as n tends
to infinity, but there may be values of n for which this ratio is greater than 1.
When the value of b is larger than πv/L, some of the values of n will lead to positive
discriminant, making the argument of the square roots negative. As long as we choose the same
convention for all of the square roots, we will arrive at the same value. Choosing the principal value
makes the square root have positive imaginary part, so we have
$$\cos(ix) = \frac{e^{x} + e^{-x}}{2} = \cosh x \qquad\text{and}\qquad \sin(ix) = \frac{e^{-x} - e^{x}}{2i} = i\sinh x.$$
The square root in the denominator of the sine function's contribution cancels the i, so we are left with
$$c_n(t) = A_n e^{-bt}\left[\cosh\!\left(vk_n\sqrt{\zeta_n^2 - 1}\,t\right) + \frac{\zeta_n}{\sqrt{\zeta_n^2 - 1}}\sinh\!\left(vk_n\sqrt{\zeta_n^2 - 1}\,t\right)\right].$$
This change must be taken into account when doing calculations, but most computer algebra systems
are programmed to do it properly automatically, so it is usually not an issue.
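The same uniform treatment is easy to arrange in a small script by working with complex arithmetic, exactly as the remark above suggests. The sketch below is an illustration, not part of the text; the wave speed, length, and damping coefficients are the values quoted for the figures, but everything else is an arbitrary choice.

```python
import numpy as np

L, v = 1.0, 20.0          # length and wave speed used for figures 6 - 13

def damped_mode(n, t, b):
    """c_n(t) / A_n for a string released from rest with damping coefficient b.

    Complex arithmetic handles underdamped and overdamped modes uniformly: when
    b > v*k_n the 'frequency' becomes imaginary and cos/sin turn into cosh/sinh
    automatically.
    """
    k = n * np.pi / L
    omega = np.sqrt(complex(v**2 * k**2 - b**2))       # may be purely imaginary
    value = np.exp(-b * t) * (np.cos(omega * t) + b / omega * np.sin(omega * t))
    return value.real                                   # imaginary parts cancel

t = 0.05
for b in (7.0, 60.0, 100.0):                            # damping values from the text
    print(f"b = {b:5.1f} Hz : c_1(t)/A_1 at t = {t} s is {damped_mode(1, t, b): .4f}")
```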
When studying this effect graphically, it is important to choose parameters in such a way that
the interesting effects of the damping will be clear. We wouldn't want to take t large enough that the
damping exponential takes over the whole graph, for example, or so small that the damping force has
not had adequate time in which to work. The damping affects the oscillation frequency associated
with each eigenfunction n, as well as contributing the exponential factor that ultimately undoes the
wave. This causes different values of n to effectively move at different speeds, a phenomenon called
dispersion. Dispersion is given its name because causing different eigenfunctions, or modes, to move
at different speeds allows them to spread out from each other and changes the overall shape of the
wave. It disperses the frequencies of the wave. We will have much more to say about dispersion
below, but for now look at how this effect changes the shape of the wave. Figures 6, 7, 8, 9, 10, and
11 show a series of times shortly after the string is released under a fairly large damping coefficient of
7 Hertz, but with a fairly large wave speed of 20 meters per second. The length L is 1 meter. The
damped oscillations are given by the red curve (the higher one in all of the figures save the last) and
the undamped oscillations are given by the blue curve. Scales have been adjusted for clarity. In
analyzing these figures, it is important to understand that the middle section of the string must move
faster than the other parts in order to keep up, as it has the farthest to go. This increases the effect of
the damping force on this section, so the damped wave is not able to keep up as well in the center.
This generates the bump seen in figure 6, which continues on through figures 7, 8, and 9 as the string
passes through its equilibrium configuration. At the apex of the lower motion, shown in figure 10, the
damped string again takes the same shape as the undamped one, but the amplitude has been decreased
by the damping. As the motion continues in figure 11, the frictional effects generate the bump again
by acting more on the center of the string than on the other portions. These effects are present in all
damping scenarios, but the oscillations vanish when the damping becomes too large. The bump still
forms, but the entire waveform is muted to zero before the wave has a chance to pass through its
equilibrium configuration. This effect is illustrated in figure 12 for a damping coefficient of 60 Hz and
in figure 13 for a damping coefficient of 100 Hz. In the first case, none of the normal modes is
overdamped; the largest value of ζ_n is ζ_1 = 3/π < 1. Despite this, the wave has not reached its equilibrium
even after the undamped string has clearly already passed through. This effect is even more
pronounced at 100 Hz, as the undamped wave has reached its negative apex and the damped wave is
still slowly trying to make it to its equilibrium. We cannot attribute these effects entirely to the
damping exponential, as this factor affects all of the normal modes in the same way. The change in
shape of the wave and its failure to reach equilibrium are both more accurately attributed to the
modification of the oscillation frequencies of the normal modes. Smaller oscillation frequencies
translate to more time taken for the string to oscillate, which ultimately suppresses the oscillations of
the string.

[Figures 6 - 13: snapshots of the damped (red) and undamped (blue) strings shortly after release; figures 6 - 11 use a damping coefficient of 7 Hz, while figures 12 and 13 use 60 Hz and 100 Hz, respectively.]

This example illustrates many things about the behavior of waves. The initial conditions set
the stage, then the partial differential equation takes over. The initial shape is repeated again and again
in the absence of damping effects, as all of the disparate frequencies are again brought together in the
same configuration they started with. Damping alters this by affecting the parts of the string that are
moving fastest more than those that are moving slower. This distorts the shape of the wave through
the phenomenon of dispersion and decreases its overall amplitude by damping the motion
exponentially. We can see these same sorts of effects with any initial waveform, but the effects can look
strikingly different. The reason for this is that our initial waveform is composed of many different
normal modes, all of which move independently of one another. If all of the frequencies do this at the
same speed, then the shape of the wave is expected to be unaffected by the passage of time. This is
somewhat true in the above graphs, as the waves do eventually recombine to form the original shape,
but the intermediate shapes are quite different from the initial form. The reason for this is interference
between different parts of the initial waveform. The initial configuration is a standing wave in the
sense that it is composed of right-moving and left-moving waves with the same amplitude. This is the
reason why all of the i's have dropped out of our analysis. Our original analysis contained waves with
negative n as well as those with positive n, but the imposition of a real function for the basis to mimic
causes the coefficients of the positive values of n, which are usually ascribed to motion in the positive
direction, to be the complex conjugates of those associated with negative n, moving in the negative
direction. Both of these components have exactly the same amplitude. As time goes on, these right-moving and left-moving components move right and left; that is, of course, what we expect them to
do. When a wave hits a rigid boundary like those at x = 0 and 1, it will invert itself and change its
direction. The reason for this is that the boundary is pulling as hard as it needs to on the string in order
to prevent any change in y at x = 0. No energy can be transmitted through this boundary, so the wave
inverts itself as a result of this force (you can think of this as Newton's third law for waves). We will
see the reasoning behind this more clearly in the next section on boundaries between different media.
This whole phenomenon can be somewhat confusing when first analyzed, but it is really quite
simple once you get the hang of what is going on. Consider the unit pulse given by
$$f(x) = \begin{cases} 1 & ;\; x \in [L/3,\, 2L/3] \\ 0 & ;\; x \notin [L/3,\, 2L/3] \end{cases}.$$
The coefficients of this pulse are given by
$$A_n = \frac{2}{n\pi}\left[\cos\frac{n\pi}{3} - \cos\frac{2n\pi}{3}\right],$$
and its subsequent evolution is shown in figures 14 - 19. The initial pulse consists of right-moving and
left-moving waves, which split up as time begins. In some sense, half of the wave moves to the right
and the other half moves to the left. The parts where the two waves still overlap have height 1, while
those that have split from the pack have height 1/2. This is clearly illustrated in figure 15. As the
peaks continue to move, they eventually separate and continue toward their separate boundaries, as
illustrated in figure 16. The peaks in this figure are moving away from each other. When one of the
peaks hits a boundary, it reverses its height and its direction. This causes it to interfere destructively
with the rest of the peak that is still incident on the boundary, keeping the zero status next to the
boundary. The shortening of the peak indicates this interference, as shown in figure 17. Once the
peaks are no longer overlapping, the negative peak will resume its natural height of 1/2, as shown in
figure 18. The pulses in this figure are moving toward each other. Once they recombine, as shown in
figure 19, the original height of 1 is restored, in the opposite direction. This process then repeats itself.
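A partial sum of this series makes both the traveling half-pulses and the Gibbs "gremlins" discussed below easy to see for yourself. The sketch that follows is an illustration, not part of the text (the cutoff and sample time are arbitrary); it evaluates the series for the unit pulse released from rest.

```python
import numpy as np

L, v, nmax = 1.0, 1.0, 200       # illustrative choices, not from the text

def pulse(x, t):
    """Partial sum of the unit-pulse series for a string released from rest."""
    total = np.zeros_like(x, dtype=float)
    for n in range(1, nmax + 1):
        A = 2.0 / (n * np.pi) * (np.cos(n * np.pi / 3) - np.cos(2 * n * np.pi / 3))
        total += A * np.sin(n * np.pi * x / L) * np.cos(n * np.pi * v * t / L)
    return total

x = np.linspace(0.0, L, 600)
print("max at t = 0 (Gibbs overshoot above 1):", pulse(x, 0.0).max())
print("height of the separated half-pulse at vt = 0.2L:",
      pulse(np.array([0.15]), 0.2 * L / v)[0])
```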
[Figures 14 - 19: evolution of the unit pulse, showing its splitting into right- and left-moving halves, their inversion at the rigid boundaries, and their recombination.]

One interesting feature of these graphs is the progress of the gremlins associated with the
Gibbs phenomenon. These scamper through the graph, back and forth, generating large oscillations
wherever they go. This is very interesting to watch on the animation, but it is totally obvious from the
mathematical perspective. The Gibbs phenomenon is associated with discontinuities in a graph. As
these discontinuities propagate throughout the string, they must move with them. Watching these little
gremlins move back and forth across the graph is a good way to really visualize what is going on with
the larger features. Each of the initial gremlins splits immediately into two pieces when the string is
released, one moving right and the other moving left. These gremlin-spawn have half the size of the
original gremlin, as the associated discontinuity has been split in half. These small figures are
associated with such a large oscillation frequency that they are easily picked out in a graph; their
interference with the larger waveforms does not change their shape appreciably. These high frequency
gremlins just ride the larger wave back and forth in their little demesnes.
This effect can be illustrated more clearly in the case of two initial pulses of different height, but
the interference does require time and effort to keep track of. Suppose that we begin with the
waveform
$$f(x) = \begin{cases} 0 & ;\; x < 0 \\ 1 & ;\; 0 < x < 1/3 \\ 0 & ;\; 1/3 < x < 2/3 \\ 2 & ;\; 2/3 < x < 1 \\ 0 & ;\; x > 1 \end{cases},$$
taking L = 1 for convenience. This waveform is discontinuous at x = 0, 1/3, 2/3, and 1, so we expect to
see the Gibbs phenomenon at each of these points. The Fourier coefficients
$$A_n = \frac{2}{n\pi}\left[1 - \cos\frac{n\pi}{3} + 2\cos\frac{2n\pi}{3} - 2(-1)^n\right]$$
confirm this fact, as they are only suppressed by 1/n as n tends to infinity. It is certainly true that this
initial waveform violates the linear wave approximation at each of its discontinuities, but we can think
of this as an approximation. The actual displacement of a physical string is certainly not discontinuous,
but it is still interesting to see what the mathematics gives us. This is, of course, what we were doing
above in the simpler case of one pulse. If the string begins at rest, then the function y(x,t) is given by
$$y(x,t) = \sum_{n=1}^{\infty} A_n\sin(n\pi x)\cos(n\pi v t).$$
Plotting this function for t = 0 gives the initial displacement, which should be f(x). The resulting graph
is shown in figure 20, for a cut-off value of 50 for n. It is clear from this graph that the Gibbs
phenomenon is not the same at each of the four discontinuities. The discontinuity at x = 1/3 shows the
smallest effect, followed by those at x = 0 and 2/3, and the largest at x = 1. The reason for this disparity
is the size of the discontinuity. Remember that the Gibbs phenomenon results in an overshoot of
about 18% of the jump from equilibrium. This jump clearly has size 1 at x = 2/3 and 1/2 at x = 1/3, so
the relative size of these Gibbs phenomena is understood, but what of the jumps at x = 0 and 1? The
equilibrium value at both of these points is 0, so the jump from equilibrium is 1 at x = 0 and 2 at x = 1.
This can be more clearly understood by examining a graph of the actual function we are representing
by our Fourier series. This function is definitely odd and periodic with period 2, as can be seen directly
from the constituent eigenfunctions. This means that the full jump at x = 0 is 2 and that at x = 1 is 4.
The periodic function is illustrated in figure 21, again with a cut-off of 50 for n. It is clear from this
figure why the Gibbs effects show the character they show.

[Figures 20 and 21: the n ≤ 50 partial sum of the initial waveform on [0, 1] and of its odd periodic extension, showing the Gibbs overshoots at the discontinuities.]

The motion of this wave is interesting to watch. Consider the pulse of amplitude 1 on the left
side of figure 22. As its right-moving component begins to move, it interferes with the left-moving
component of the region between 1/3 and 2/3. This leads to the shorter amplitude of 0.5 shown in
figure 23. The part of the wave with amplitude 1 hasn't moved far enough to interfere with this region,
so it is unchanged. The left-moving part of the wave has a very different story. The wave is already at
the left endpoint of the region, so there is nowhere else to go. The inverted wave is now moving to the
right, so acts to cancel the remaining part of the pulse between 0 and 1/3. This is the reason for the
strange shape illustrated in figure 23. An analogous situation occurs at the boundary at x = 1. Figures
24 - 35 show the subsequent evolution of the waveform. The time intervals are not equal, but time
increases as you move through the figures. Note that the initial waveform is re-combined, but inverted,
in the last figure. This is because our treatment does not allow the different normal modes to interact
with each other. The term interference is only associated with our view of the shape of the wave; it
does not indicate that the waves are actually interfering with each other at all. All of the major
features of each of these shapes can easily be understood simply by applying this reasoning of left- and
right-moving wave components. Make sure that you understand why the shapes of the figures are what
they are. These shapes may not be what you would have expected initially, but you will get the hang of
it by examining these graphs (that's why there are so many of them). Think of rectangles moving
back and forth on the interval. The rectangles change their sign as they bounce off either wall and
change direction, which complicates the analysis somewhat. It is difficult to keep track of which block
is where, unless you track their progress using graph paper or, if you do it properly, the gremlins.
This whole process is somewhat clearer when viewing an animation, so you can also try to generate
these graphs in Mathematica. Either way, the animation will be presented in class.

[Figures 22 - 35: successive snapshots of the two-pulse waveform as its right- and left-moving components separate, reflect off the fixed ends, and eventually recombine inverted.]

We can, of course, also include damping in our analysis. The only difference mathematically
between this situation and that done above is the coefficients. The resulting behavior, illustrated in
figures 36 - 41, is difficult to understand until we think very carefully about it. As the string begins to
move, the places where its acceleration is the greatest (or, in this case, nonzero) lie at the
discontinuities. The string must suddenly jump in order to keep the shape of the wave. In the absence
of friction, it does this quite easily. When friction is present, on the other hand, it will delay the jump
in the wave. This is the reason for the diagonal lines forming in figures 36 and 37. The jumping
process requires tremendous motion of the string, and leads to the most significant instances of
damping. This damping does not cause the string to immediately return to its equilibrium
configuration, as a given part of the string is directly affected only by its neighbors, but the damped
string will eventually return to equilibrium. This process is illustrated in the figures. Try to see why
the damped string behaves as it does in comparison to the undamped string. This will be very useful in
helping you fully understand the idea of superposition and what it means to a function.
[Figures 36 - 41: evolution of the two-pulse waveform with damping included; the sudden jumps at the discontinuities are delayed into diagonal segments as the damped string relaxes toward equilibrium.]

Exercises for Section XIII.3:


In problems 1 - 6, solve the wave equation associated with speed v = 1 under the given boundary and
initial conditions. Do not include gravity in your treatment. Include graphs of your solution at four
different times, and explain how the boundary conditions and initial conditions are reflected in your
solution.
1. y(0,t) = y(L,t) = 0; y(x,0) = x(x − L)³(L − 3x)², y_t(x,0) = 0.
2. y_x(0,t) = y_x(L,t) = 0; y(x,0) = x²(x − L)²(L² − 2Lx), y_t(x,0) = 0.
3. y_x(0,t) = y(L,t) = 0; y(x,0) = Lx² − x³, y_t(x,0) = 0.
4. y(0,t) = y_x(L,t) = 0; y(x,0) = x²(L − x)², y_t(x,0) = 0.
5. y(0,t) = 0, y(L,t) = 1; y(x,0) = (x/L)³, y_t(x,0) = 0.
6. y_x(0,t) = 0, y(L,t) = 1; y(x,0) = (x/L)³, y_t(x,0) = 0.
7. Explain what happens to the wave described in the text when the damping coefficient grows. Is
there a critical damping coefficient for which the damped wave will never cross the
equilibrium? If so, what happens when the damping coefficient exceeds this critical value? If
not, why? Include some numerical examples and graphs in your solution.
8. Qualitatively explain the appearance of the graphs illustrated in figures 6 - 11. Why does the
shape change more in the middle than on the edges?
9. Qualitatively explain the evolution of the wave illustrated in figures 22 - 35. Show explicitly
the travelling waves in figures 27, 29, and 33, and how they attain the shapes shown in these
figures.
673

Advanced Mathematical Techniques

Section XIII.4: Waves in Two Dimensions


Suppose that we are interested in studying the vibrations of a two-dimensional membrane, like
a drum or a trampoline. Going through the analysis that led to the wave equation in this case leads to
the equation
1 2u 2 u 2 u

2u .
v 2 t 2 x 2 y 2
As before, the square of the speed is given by the ratio of surface tension in the membrane to its mass
density. The surface tension in a membrane is defined as the force pulling on a boundary of the
membrane per unit length of the boundary, so it has units of force per unit length. Similarly, a volume
tension would have units of force per unit area. Pressure is an example of such a quantity. We can
solve this equation in exact analogy to the above whenever the membrane has rectangular symmetry.
Consider, for example, a rectangular trampoline with length a and width b. The boundary conditions
on the displacement u(x, y, t) of the membrane are given by
u (0, y , t ) u ( a, y, t ) u ( x, 0, t ) u ( x, b, t ) 0 .
At fixed y and t, the membranes displacement is a function of x that equals zero at both x = 0 and
x = a, and at fixed x and t, the membranes displacement is a function of y that equals zero at both
y = 0 and y = b. This is enough for us to write the general solution as

n x
m y
.
sin
u ( x, y, t ) cn ,m (t ) sin
a
b
n , m 1
Substituting this expression into the differential equation and exploiting the linear independence of our
normal modes leads to the differential equation
n 2 2 m2 2
cn ,m v 2 2 2 cn ,m
b
a
for the Fourier coefficients cn ,m (t ) , so we have

n2 m2
n2 m2
n x
m y
cos 2 2 vt Bn ,m sin 2 2 vt sin
sin
.

a
b
a
b
a
b

n , m 1

The remaining coefficients are determined from the initial conditions, as always. If the membrane is
initially at rest and its displacement is given by
u ( x, y , 0) f ( x, y ) ,
then we have Bn , m 0 and
u ( x, y , t )

n,m

b
4 a
n x
m y
.
dx dy f ( x, y ) sin
sin

0
ab 0
a
b
This last expression follows directly from the orthogonality of the eigenfunctions, exactly as above.
As an example, consider trampoline whose initial displacement is given by
f ( x, y ) x a x y 2 (b y ) .

An , m

The coefficients are given by

674

Section XIII.4: Waves in Two Dimensions

An , m

b
4 a
n x
m y
dx dy f ( x, y ) sin
sin
ab 0 0
a
b
1

4a 2b3 d d 1 2 (1 ) sin n sin m


1

4a 2b3 d 1 sin n d 2 (1 ) sin m

2
2
1 (1) n 3 3 1 2(1) m
n3 3
m
The resulting oscillations are illustrated in figures 42 47, at time intervals of 0.4 a v in the case where
b = 2a. It is clear from the figures that the wave moves back and forth across the trampoline, inverting
itself as it does so, exactly as in the above one-dimensional case. Unlike the one-dimensional case,
these waves will never return to their exact original configuration. This is because the oscillation
frequencies
4a 2b3

n2 m2

a 2 b2
are not integral multiples of each other in this case. This almost always happens with two-dimensional
waves, and such waves are said to be anharmonic for this reason. The normal modes of the system do,
of course, return to their exact initial configuration an infinite number of times as time goes on, but
different normal modes almost never do this at the same time. The only way that this could possibly
occur is if the relevant frequencies of the oscillation are rational multiples of each other. This
complicates the qualitative treatment of such systems, but quantitatively we simply do the same thing
twice.

n , m 2v

Figure 42

Figure 43

Figure 44

Figure 45

Figure 46

Figure 47
675

Advanced Mathematical Techniques

The case of a rectangular membrane follows directly from our above analysis because it
essentially consists of two independent one-dimensional problems. The boundary conditions are
expressed at constant x or at constant y, so the problem separates in a nice way and allows us to simply
use our above results. This nice situation does not occur for more general shapes. A circular
membrane, like a drum, has boundary conditions imposed on its circular boundary. This boundary
cannot be expressed at constant x or y, so we cannot perform the same manipulations in this case.
Boundary conditions dictate everything about partial differential equations. If the boundary conditions
cannot easily be expressed in terms of the coordinates you are using, it is time to change the
coordinates. In this case, its polar time.
The Laplacian in polar coordinates is given by
1 u 1 2u
2u
.
r
r r r r 2 2
Assuming we can find a solution of the form u (r , , t ) R(r ) T (t ) , we plug this in and divide by
the function. This gives

1 T rR

2 .
v2 T
rR
r
The left-hand-side is only a function of t, and the right-hand-side is only a function of r and , so both
sides are constant. Choosing this constant as k 2 as before, we have
rR k 2 .
1 T
k 2
and
2
v T
rR
r 2
The second equation can be multiplied by r 2 and re-arranged to give
r rR

k 2r 2
.

R
The left-hand-side is only a function of r, and the right-hand-side is only a function of , so again
both sides are constant. It will be convenient to call this constant n 2 . The angular equation is then
very simple:
n 2 A cos n B sin n .
At this point, we need to include our boundary conditions. Since and 2 represent the same
physical location, our solution must be periodic in . Writing our solutions in exponential form
instead, we see that
ein 2 ein ei 2 n ein ei 2 n 1 n .
This is the reason we gave the constant the name we did. The solutions of the temporal equation are
also easily obtained, but the radial equation will require some work.
The radial equation can be written as
r 2 R rR k 2 r 2 n 2 R 0 .
This equation does not have constant coefficients, so it is more difficult to solve than those we have
seen previously. We have, however, discussed such equations in chapter 11. There, we found that the
solutions to such equations will be well-behaved in all places where the coefficient of the second
derivative contribution does not vanish. We are guaranteed to have two linearly independent solutions
at all such places. In our case, the point r = 0 is ill-defined. We are not guaranteed to have two
linearly independent solutions that are defined at this point. On the other hand, the point r = 0 is a
physical location in our drum. We cannot allow solutions that are not well-behaved at this point. This
becomes our first boundary condition: the solutions are well-behaved at r = 0. This boundary
676

Section XIII.4: Waves in Two Dimensions

condition is very different in character from that seen above, but this is expected in situations where
singular points of the equation at hand lie in the physical domain. We will see that the reason for this
boundary condition is contained in the requirement of Hemiticity, which was the original reason we
introduced boundary conditions in the first place. The second boundary condition is of the ordinary
type; we require that the membrane be fixed at r = a, the boundary of the drum.
Following the analysis given in chapter 12, we examine the adjoint of the operator
d2
d
d d
r 2 2 r k 2r 2 n2 r r k 2r 2 n2 .
dr
dr
dr dr
The Hermiticity of the second contribution is obvious; to check the first, we form the inner product
a
a
a

*
*
*
0 g (r )r r f (r )dr g (r )r r f (r ) 0 0 rg (r ) r f (r ) dr .
This is not going well, as there is no r accompanying f (r) before its first derivative is taken. This path
will not lead to Hermiticity. Dividing by r, however, we consider the operator
d2 d
d d
.
r
r 2
dr
dr dr dr
The inner product here gives
a
a
a
g * (r ) r f (r )dr g * (r ) r f (r ) g * (r ) r f (r ) dr

,
a
a
g * (r ) r f (r ) g * (r ) r f (r ) rg * (r ) f (r ) dr
0
0
so this operator will be Hermitian whenever the boundary term vanishes. This certainly occurs at the
right endpoint whenever the functions are required to be zero there, and will occur at r = 0 whenever
lim rg * (r ) f (r ) 0
r 0

for all functions g(r) and f (r) in the space. This can be seen as the well-behaved requirement.
We have already quantized the parameter n via the angular equation, so it is the job of the
radial equation to quantize the parameter k. It turns out that there is only one linearly independent
solution to this differential equation that is regular (or not singular) at r = 0 for any value of k or n.
The regular solution is called the Bessel function of the first kind, and written as J n (kr ) . These
functions are named after German mathematician Friedrich Bessel, who generalized earlier studies by
the Swiss mathematician Daniel Bernoulli. These functions are oscillatory for real values of k and n,
and play the role of the harmonic functions sine and cosine for the circular drum. This promotes them
to monumental importance for any system that displays circular or cylindrical symmetry, and has led
to decades of dedicated work by many mathematicians. The orthogonality of the Bessel functions can
be established by re-writing the Bessel equation as an eigenvalue equation:
d d 2
r k 2 r .
dr dr r
The operator on the left is definitely Hermitian, so has eigenfunctions that are orthogonal and span the
space. The right-hand-side, which represents the eigenvalue, however, is not a constant. We can rearrange our argument to include this function of r, though, by writing
a
a
d d 2
kn2 J * km r J kn r r dr J * km r r J kn r dr
0
0
dr dr r
.
*
a d
a

d 2
2
*
r J km r J kn r dr km J kn r J kn r r dr
0
0
dr dr r

This indicates that


677

Advanced Mathematical Techniques

2
m

kn2 J * km r J kn r rdr 0 ,
a

so either kn2 km2 or the functions are orthogonal with respect to the weight function r.
The appearance of the weight function r is really the only substantive difference between the
treatment of circular and rectangular waves. The Bessel functions behave exactly like the
trigonometric functions in essentially every other respect. All Bessel functions with common angular
index that are zero at a are orthogonal to one another with respect to the weight function r , just like all
sine functions that are zero at 0 and a are orthogonal to one another. Bessel functions with different
index, like J 0 (kr ) and J1 (kr ) , are not orthogonal with respect to r, but the angular part of the
wavefunction takes care of this. Figure 48 illustrates several of the Bessel functions. It is clear from
the figure that there are many zeros associated with each of these functions. In order to ensure that our
eigenfunctions satisfy the boundary condition at r = a, we choose the value of k in such a way that one
of these zeros coincides with the r = a. The zeros of the Bessel functions are all irrational, and not
easily expressed in terms of or other mathematical constants. For this reason, the kth zero of J n ( x)
is denoted by jn ,k . Table 1 gives the first few zeros of J 0 ( x), J1 ( x), and J 2 ( x) . Typically, only the
first couple of zeros are needed explicitly; the asymptotic expansion for the Bessel functions given in
chapter 4 indicates that large zeros can be approximated by
2n 1

k
jn ,k
.
k
4

Comparing these approximations with the values shown in the table indicates that they are fairly good
even for small k.

j0,k

j1,k

j2,k

1
2
3
4

2.40482556
5.52008
8.653728
11.79153

3.831706
7.015587
10.17347
13.32369

5.135622
8.417244
11.61984
14.79595

Table 1

1.0
0.8
0.6
0.4
0.2

- 0.2
- 0.4

10

Figure 48

The Bessel functions satisfy a large number of interesting relationships, all of which can be
derived directly from the differential equation (with malice of forethought, or clairvoyance). This
process actually requires a bit more clairvoyance than I really want to present, though, so we will
follow a different approach. Consider the integral
1 in ix sin
f n ( x)
e
d .
2
Our goal is to show that it satisfies Bessels equation. With this in mind, we act on the integral with
our differential operator:
2 d2

d
1
x 2 sin 2 ix sin x 2 n 2 ein ix sin d .
x x 2 n 2 f n ( x)
x
2

dx
dx
2

The last term is not like the others in that it contains n. We can change this by writing

678

Section XIII.4: Waves in Two Dimensions

1
2

n e
2

in ix sin

1
2

1 ix sin in
e
ine
2

ix

cos ein ix sin


2

e ix sin

1
2

2 in
e d
2

ix cos e

ix

in

ix sin

in
e d

cos e ix sin d

1
ix sin x 2 cos2 ein ix sin d
2
The boundary terms associated with the integration by parts vanish whenever n is an integer.
Substituting this expression into our above expression, we have
2 d2
d
2
2
x dx 2 x dx x n f n ( x)

1
2
2
2
2
2
in ix sin
x sin ix sin x ix sin x cos e
d 0

2
The functions f n ( x ) are clearly regular at x = 0, as
1 in
1
f n (0)
e d 0 when n 0 ; f 0 (0)
d 1 ,

2
2
so we can take these functions as the Bessel functions of order n whenever n is an integer.
We can prove many important properties of the Bessel functions from this integral
representation. First, consider the question of whether or not this function is real. Its imaginary part is
given by
1
1
sin n cos x sin cos n sin x sin d 0
Im J n ( x )
sin n x sin d

2
2
The last equality follows from the fact that the cosine function is even while the sine function is odd.
Thus, the Bessel functions are real. Taking the complex conjugate of our expression, we find that
J n ( x) J n ( x) .
Writing our integral representation as
1 in ix sin
1 2 in ix sin
e in 2 in ix sin
J n ( x)
e
d

e
d

e
d (1) n J n ( x) ,
2
2 0
2 0
we see that Bessel functions of even order are even and those with odd order are odd. The last equality
follows from the fact that the integrand is periodic with period 2 for all integral n, so it does not
matter what interval we integrate over as long as it has length 2 . Taken in conjunction with the
previous result, this implies that
J n ( x ) (1) n J n ( x ) .
We can even determine a power series expansion for the Bessel functions from this result. Writing

1 in ix sin
1 ix in
J n ( x)
e
d

e sin k d ,

2
2 k 0 k !
we see that all we need is the value of this integral. This integral can easily be done using the complex
representation of the sine function, as long as we remember that the functions eik are orthogonal on
this interval for all integral values of k. This gives
k

679

Advanced Mathematical Techniques

1
in
k
e sin d 2i

k
1
k j
in ij i ( k j )
d

(1) e e e
2i
j 0 j
k

j (1) e

k j

j 0

i ( n 2 j k )

k
k

2
(1)( k n ) 2 k !
1
(k n) 2
2

(
1)

k
2i (k n) 2
2i (k n) 2! (k n) 2!
whenever k and n are both either even or odd. If one is even and the other is odd, then the integral is
zero. The integral is also zero if k < n, so the parameter (k n) 2 starts at 0 and goes to infinity as
k increases. This allows us to write the expansion

2
n
(1) x n 2
x x 4
J n ( x) n 2

!( n)! 2 0 !(n )!
0 2
for these Bessel functions. This is not the only way to obtain this famous expansion. We could take at
least two or three other more-or-less obvious approaches to arrive at the same result. However, this
approach is fairly simple and allows me to avoid having to discuss the methods of Frobenius... .
Another immediate result we can obtain from this representation of the Bessel functions comes
from subtracting J n 1 ( x) from J n 1 ( x) :
1 i ( n 1)
e
ei ( n 1) e ix sin d
J n 1 ( x ) J n 1 ( x)
2
.
2i
2i in ix sin
in ix sin

d
i e
d 2 J n ( x)
sin e
2
2 x
We can obtain another expression by adding these two contributions:
1 i ( n 1)
e
ei ( n 1) e ix sin d
J n 1 ( x) J n 1 ( x)
2
2
2 in ix sin
cos ein ix sin d

e i
e
d

2
2
x

2i in ix sin
2n
in 2n in ix sin
e ix sin
e
e d
e
d
J n ( x)

2 x
2

x
x

This result is a three-term recurrence relation for the Bessel functions. It allows us to compute Bessel
functions of any integer order at any value of x from knowledge of two Bessel functions. Since we
have the sum and the difference of J n 1 ( x) and J n 1 ( x) , we can compute either in terms of J n ( x ) .
These expressions can be written in a stylistic manner as
n
1 d n
x J n ( x )
J n 1 ( x) J n ( x) J n ( x) n
x
x dx
and
n
d J n ( x)
.
J n 1 ( x) J n ( x ) J n ( x ) x n
x
dx x n
As a special case, we have the important result
J 0 ( x) J1 ( x) .
Starting from n = 0, we can build the whole tower of Bessel functions using these results:
d
J1 ( x ) J 0 ( x )
dx
d 1
d 1 d
J 2 ( x) x
J1 ( x ) x
J 0 ( x)
dx x
dx x dx

680

Section XIII.4: Waves in Two Dimensions

d 1
d 1 d 1 d
J 2 ( x) x 2
J 0 ( x) .
dx x 2
dx x dx x dx
This pattern clearly continues, so we have
J 3 ( x) x 2

1 d
J n ( x) (1) n x n
J 0 ( x) .
x dx
The operator d x dx therefore plays the role of a raising operator for the Bessel functions, taking

us from one value of n to the next higher one. As long as we understand the function J 0 ( x) well
enough, we can use this relation to understand all of the others. This explicit representation for J n ( x)
in terms of J 0 ( x) is very useful. It is similar to a class of Rodrigues representations for
eigenfunctions of self-adjoint differential operators that we will see below.
These expressions allow us to compute the integral
2
J n ( x) x dx

for arbitrary integer n as follows. First, we consider what happens when n = 0. In this case, we have
x2 2
x2 2
2
2

x
J
(
x
)
dx

J
(
x
)

x
J
(
x
)
J
(
x
)
dx

J 0 ( x) x 2 J 0 ( x) J1 ( x) dx ,
0
0
0
0

2
2
using integration by parts. The remaining integral can be computed quite easily by using the first of
our stylistic formulas with n = 1:
1 d
2
2
x J 0 ( x) J1 ( x) dx x J1 ( x) x dx x J1 ( x) dx x J1 ( x) x J1 ( x) dx .
x 2 J12 ( x) x J1 ( x) x J1 ( x) dx

Bringing the remaining integral to the other side of the equation immediately gives
x2 2
2
x
J
(
x
)
J
(
x
)
dx

J1 ( x ) ,
0 1
2
so we have
x2 2
2
J 0 ( x) J12 ( x) .

x
J
(
x
)
dx
0

2
I have ignored the arbitrary constant of integration. The key to completing this integral is the use of
cyclic integration by parts. We need an integrand that turns back into itself on integration by parts so
that we can indirectly determine its value. For general integer n, we have
x2 2
2
x
J
(
x
)
dx

J n ( x) x 2 J n ( x) J n ( x) dx
n
2
To complete the remaining integral, we need to somehow transform it into something that cyclic
integration by parts will work on. In its current form, the factor of x 2 is preventing this from
happening. We can remedy this issue by using the differential equation:
d d
x 2 J n ( x) n 2 J n ( x) x x J n ( x) .
dx dx
Substituting this in gives two integrals that both satisfy the requirements for a fruitful application of
cyclic integration by parts, so we have

681

Advanced Mathematical Techniques

x2 2
J n ( x) n 2 J n ( x) J n ( x) dx x J n ( x) x J n ( x) dx
2
x 2 J n2 ( x)
x 2 n2 2
.

J n ( x)
2
2
This expression is valid for every solution to Bessels equation, as long as it is well-defined throughout
the integration interval, because the differential equation was the only thing we used to derive it. If n
is an integer, then we can also write this as
x2 2
2n

2
2
x J n ( x) dx 2 J n ( x) x J n ( x) J n1 ( x) J n1 ( x)
by eliminating the derivative using the above relationships between the Bessel functions. This is the
form most commonly seen in the literature, as it is the most convenient for our work below.
Before moving on to actually using these functions to approximate a given initial waveform, it
will be useful to derive another important result about the Bessel functions: their generating function.
The idea of generating functions is very prominent in the study of special functions, as they often
allow us to easily derive results that would be extremely difficult to obtain in the absence of a
generating function. Generating functions are power series whose coefficients are the special
functions we are interested in. You will sometimes see a factorial in the definition, but it is not
convenient to include a factorial explicitly in this case.187 We define
2
x J n ( x) dx

g ( x, t )

( x) t n .

Our two recurrence relations for the Bessel functions lead to partial differential equations for g(x,t):

g
n
1

t g

J n ( x) t n J n 1 ( x ) J n ( x ) t n t J n ( x) t n J n ( x) t t n t g
x n

t
x
x
t
x

n
n
n

g
1

1
t g g
n

J n ( x) t n J n ( x ) J n 1 ( x ) t n J n ( x ) t t n J n ( x) t n

x n
t
x n
t n
x t t

n x
Adding these two equations together gives
g 1
t g g ( x, t ) f (t )e xt 1 t 2 ,
2
x t
and subtracting gives
t g 1
02
t g g ( x, t ) f ( x )e xt 1 t 2 .
x t t
The functions f (t ) and f ( x) are constants of integration. Comparing our two expressions for g(x,t)
indicates that both of these functions are indeed constant, so all we need to do is determine the
normalization of our generating function. This is accomplished by taking x = 0:
g (0, t ) J 0 (0) 1 f .
Therefore, our generating function is given by
e

x t 1 t 2

( x) t n .

Generating functions are often used to determine integrals that are otherwise difficult to find,
but the presence of negative powers of t as well as positive powers in this expansion makes this
187

The reason for this can be seen from the Maclaurin expansion for the Bessel functions obtained above. We already have
two factorials how many do you need?!?

682

Section XIII.4: Waves in Two Dimensions

process more difficult than usual. The generating function of the Bessel functions is usually used to
establish integral representations for the Bessel functions or to establish other useful properties. One
of these is the addition identity, a formula for J n ( x y ) in terms of J m ( x) and J ( y ) . We can
establish this identity easily from the generating function, as it is clear that
x
y
x y
exp
t 1 t exp t 1 t exp t 1 t .
2

Substituting the Bessel expansions gives

J n ( x y ) t n J m ( x) J ( y ) t m J m ( x) J n m ( y ) t n ,

n
m
n m

so equating coefficients gives


J n ( x y)

( x) J n m ( y ) .

This would definitely be difficult to establish from the Taylor expansion! There are many other results
we can obtain from the generating function, but its time to move on to a specific example of a Bessel
wave.
Our normal modes of vibration are the Bessel functions J n jn , k r a coupled with the angular
dependence and the time dependence T (t ) . If the surface is initially at rest, we have the
solution

r , , t An , k ein Bn ,k e in J n jn ,k r a cos jn ,k vt a .
n 0 k 1

The coefficients An ,k and Bn , k are determined from the initial disturbance f r , of the medium by
the orthogonality relation, but in order to actually find them we need to first normalize our Bessel
functions. This is easily accomplished using one of the above results:
a
a 2 jn ,k 2
2

J
j
r
a
r
dr
J n u u du

,
n
n
k
0
jn2,k 0
.
jn ,k
2n
a2 u2 2

2
2 2
2
J n (u )
J n (u ) J n 1 (u ) J n 1 (u ) a J n 1 jn , k 2
jn ,k 2
u
0
Thus, the coefficients are given by
2
a
1
An , k 2 2
d r dr J n jn , k r a e in f (r , )

0
0
a J n 1 jn ,k
.
2
1
1
in

d u du J n jn , k u e f (au , )
0
J n21 jn ,k 0
The B coefficients are the complex conjugates of these. Note that Bessel functions of different order n
are not orthogonal to each other. If the orders are different, then the required orthogonality is
established by the integration over in the angular part of the wavefunction. The orthogonality of the
Bessel functions is used only to separate different contributions of the same order. This is a general
result; when solving partial differential equations, each variable is responsible for orthogonality only
over its own functions. The eigenvalue n is associated with the angular equation, so the angular
integration must deliver orthogonality for different values of n. The radial functions are not required
to be orthogonal for different values of n because they are not associated with eigenvalue n.
683

Advanced Mathematical Techniques

The normal modes of oscillation of a drum are interesting to observe. There are two parameters
associated with these modes, exactly as we saw in the case of the rectangular trampoline. Some of the
lower frequency modes are illustrated in figures 49 57. Figure 49 shows the lowest frequency mode,
with n = 0 and k = 1. Figures 50 and 51 show the effect of increasing the radial parameter k. Note that
both of these figures show cylindrical symmetry because there is no dependence on the angle . Figure
52 illustrates the mode with n = 1 and k = 1, and figures 53 and 54 illustrate the effect of increasing the
radial mode for n = 1 (k = 2 in figure 53 and 3 in figure 54). Figures 55 57 illustrate the modes
associated with the next value of n. Note that the figures oscillate more wildly in the radial direction as
we increase k and in the angular direction as we increase n. Taking the frequency of oscillation
associated with the lowest mode illustrated in figure 49 as 1 (the frequency is actually given by
2 a j0,1v 2.61274 a v , so this represents a choice of units), the frequencies of the modes illustrated in
figures 50 57 are given by 2.29542, 3.59848, 1.59334, 2.9173, 4.23044, 2.13555, 3.50015, and
4.83189, respectively. None of these are integers, so we again expect the oscillations to be anharmonic.
Linear combinations of these modes will never return to the same shape.

Figure 49

Figure 52

Figure 55
684

Figure 50

Figure 53

Figure 56

Figure 51

Figure 54

Figure 57

Section XIII.4: Waves in Two Dimensions

We could illustrate the propagation of a wave associated with a finite value of a, but it will be
more useful to consider the limit as a tends to infinity. This will give another illustration of how this
limit is taken, as well as indicate some important properties of the Bessel function expansion. You can
think of the result as the propagation of a wave generated at the center of a very large pond by dropping
a rock into it. We will take the initial disturbance of the surface of the pond as
f r , r 2 2 r 1 e r ,

which is illustrated in figure 58. It certainly seems appropriate to a small rock dropped into a pond.
Our disturbance is independent of , so we expect the waves produced to be circular. It is important
for this function to have the property

f r, r dr 0
0

in order for it to be uniformly approximated by our normal modes. Otherwise, our normal modes will
not be able to approximate the wave. You can view this as a physical requirement that the total
amount of water in the pond remains constant if you like. Mathematically, it arises from the fact that
the constant 1 satisfies Bessels equation when k = 0 and n = 0. This constant is certainly present in
the unbounded region, as the eigenvalues become continuous in this limit, but it does not reside in the
space. Like the Bessel functions themselves, the constant function is not square-integrable over
unbounded regions. It will be more convenient for us not to include this contribution for this reason.
The coefficients of the expansion are given by
a
2
Ak 2 2
J 0 j0,k r a f (r )r dr

0
a J1 j0,k
for finite values of a. In order to take a to infinity, we must first define a parameter to take the place of
k. Defining j0,k a , we see that becomes continuous as a tends to infinity. The coefficients
become

2
a Ak

J 0 r f ( r ) r dr .
a
2

0
a J1 a
For large values of a, the function J1 a can be approximated quite well by its asymptotic
expression

2
2
2
cos a 3 4
cos j0,k 3 4
cos k .
a
a
a
In this expression, we have used the asymptotic expression for the zeros j0,k . Zeros that are not
J1 a

asymptotic will all congregate around 0 as a tends to infinity, and these contributions cannot
contribute when

f (r ) r dr 0 (why?). The square of the cosine will always give 1, so we can write

the coefficients in the limit as

a Ak
J 0 r f (r )r dr .
a
0

The expansion of the wavefunction then becomes

(r , t ) Ak J 0 r cos vt k Ak J 0 r cos vt
k 1

j0,k

k 1

a Ak

Ak J 0 r cos vt
k 1

J 0 r cos vt d d J 0 u f (u ) u du J 0 r cos vt
0

685

Advanced Mathematical Techniques


2

10

- 0.2
- 0.4
- 0.6
- 0.8
- 1.0

Figure 58
The coefficients can be obtained by using the series expansion if we like, though there are also
other avenues available. We begin by finding the more general integral
m
(1) m 2 m n 2 m br
1 (1) (2m n)! b

r
e
dr
.

2m
0
22 m m !2
m !2 0
b n 1 m 0
m 0 2
This series can be summed by recognizing it as derivatives of the algebraic expression

1 2 j
1 2
1 x
x .
j
j 0
This expression is put into a more familiar form by manipulating the binomial coefficient:
1 2 1 2 3 2 5 2 7 2 (1 2 j ) 2

j!
j
.
j (2 j 1)(2 j 3) (3)(1)
j (2 j )!
(1)
(1) 2 j 2
2 j j!
2 j!
Thus, we have

J 0 ( r ) e br r n dr

J 0 ( r ) e br r n dr

2m

(1) m (2m n)(2m n 1) (2m 1)(2m)!



b n 1 m 0
22 m m !2
b

1 dn
b n 1 dx n

(1) m (2m)! 2 m n
x
22 m m !2
m 0

x b

2m

1 2
1 dn n
x 1 x 2
n 1
n
b dx

x b

Our integral is given by

J 0 u f (u ) u du

15 2

2 1 2

72

so we have the propagating wave

(r , t )

15
3d
.

J
r
cos
v
t

0
7 2
2 0
1 2

The waves resulting from this expression take a few minutes to fully understand. The resulting
displacement is shown at times of 0, 0.5 h v , 0.7 h v , 1.0 h v , and 1.5 h v , where h is the length
dimension, in figure 59. The disturbance at r = 0 accelerates very quickly from rest after t = 0, and
heads for the equilibrium. This acceleration is so great that the r = 0 region quickly reaches and
overtakes the neighboring regions around t = 0.7 h v . To understand the reason for this, we must
visualize the whole wave. We are seeing only a cross-section in figure 59; the actual wave is obtained
by rotating this graph about the vertical axis. From this perspective, we see that the region associated
with small r has much less inertia than that associated with larger values of r. It is easier to accelerate,
and is being pulled toward the equilibrium by neighboring sections of the surface all around it. This is
the reason why it overtakes its neighbors and overshoots the equilibrium. At the same time, the r = 0
686

Section XIII.4: Waves in Two Dimensions

region is pulling downward on its neighbors. This generates the outward-propagating dip observed
around r = 2 at t = 1.5 h v . The amplitude of this dip is much smaller than that of the original dip
because the regions it is disturbing have more inertia than the center portion.
- 0.2

10

- 0.4
- 0.6
- 0.8
- 1.0

Figure 59
Once the central disturbance overshoots its neighbors, it is pulled down by the surrounding regions
and begins to slow down. It still overshoots the equilibrium because its speed was too great for the
restorative force to stop it as it reaches equilibrium. It will eventually stop at a maximum positive
displacement from equilibrium, at which point the restorative forces cause it to turn around and
approach equilibrium anew. This process generates a new hump in the wave that begins to propagate
outward from the center. Figures 60 68 show the subsequent propagation of the wave at equal time
intervals of h v , starting from t 2 h v in figure 60. The formation of the second hump is obvious in
figures 60 64, and the outward propagation is obvious in the remaining figures. All of these figures
have the same scale on the vertical axis in order to facilitate comparison between different times. Note
that the amplitude of both humps as well as the dip decreases as the wave moves out from the center.
This is again associated with the larger inertia associated with larger values of r; the amplitude must
decrease approximately as 1 r in order for the total energy of the wave to be conserved. This is, of
course, also reflected in the amplitude of the Bessel functions themselves. As normal modes of the
oscillating drum, these waves must conform to the physical requirements of these oscillations.
0.2

0.2

0.2

0.1

0.1

0.1

10

15

20

10

15

20

- 0.1

- 0.1

- 0.2

- 0.2

Figure 60

0.2

0.1

0.1

0.1

10

15

20

10

20

10

15

20

- 0.2

- 0.2

Figure 63

15

- 0.1

- 0.1

- 0.2

20

Figure 62

0.2

15

- 0.2

Figure 61

0.2

- 0.1

10

- 0.1

Figure 64

Figure 65

687

Advanced Mathematical Techniques


0.2

0.2

0.2

0.1

0.1

0.1

10

- 0.1

15

20

10

15

- 0.2

20

10

15

20

- 0.1

- 0.1

- 0.2

- 0.2

Figure 66

Figure 67

Figure 68

Not all problems involving cylindrical cells have the origin accessible. In situations where the
center is off-limits, like some coaxial cables, we must also include the other linearly independent
solution to Bessels equation. Recall that this solution was disallowed earlier because of its behavior at
r = 0. This is no longer an issue when the origin is inaccessible, so we have no reason to exclude these
solutions. The new boundary condition at the inner boundary replaces our earlier requirement that the
solutions be well-defined at the origin, fixing a linear combination of the two different kinds of Bessel
functions to be used in place of the well-behaved solutions J n ( x) . When satisfying the boundary
conditions, these functions will also be orthogonal and span the space of acceptable functions. These
Bessel functions of the second kind can be obtained in many different ways. The most direct, though
essentially useless from a calculational perspective, comes from the Wronskian
W J n ( x), Yn ( x) J n ( x)Yn( x) J n ( x)Yn ( x) .
As we saw in the modeling notes, the Wronskian can be determined directly from the differential
equation. In this case, it is given by
C
2
,
W J n ( x), Yn ( x)
x x
where the coefficient has been chosen to make things nice in other places. The Bessel Y functions can
then be determined by solving the differential equation, though there are easier ways involving integral
representations. The resulting function is plotted in figure 69 along with the associated Bessel J
function for n = 0. Note that the zeros of these functions interlace. This is one of the tenets of the
Sturm-Liouville theory of differential equations, which we will touch on below.
1.0
0.5
2

10

12

14

- 0.5
- 1.0

Figure 69
It is also not necessary to restrict the order of the contributing Bessel functions to integers. This
was established by the boundary condition on the angular coordinate, but we can alter this restriction by
considering the oscillations of a pie-slice-shaped part of a drum. The new boundary conditions will
give other restrictions on the order of the Bessel functions that contribute. Bessel functions were first
discussed in the 18th century, before Bessel, himself, was even born, and much work has been done on
them in the intervening years. The knowledge currently espoused by experts on Bessel functions could
certainly occupy an entire course. Many of the important results in Bessel function analysis are also
fundamental tenets of the theory of Sturm-Liouville differential equations, which we briefly turn to
now.

688

Section XIII.4: Waves in Two Dimensions

Exercises for Section XIII.4:

1. A rectangular trampoline fixed at its boundaries, which occur at x = 0, a and at y = 0, b, is


initially at rest displaced as u ( x, y, 0) x 2 y ( a x)(b y ) . Determine function u ( x, y , t ) in terms
of the dimensions a and b, as well as the speed, v, of the wave. Plot your solution for several
times, taking a = v = 1 and b = 2. Explain the resulting oscillations qualitatively.
2. A rectangular trampoline fixed at its boundaries, which occur at x = 0, a and at y = 0, b, is
initially at rest displaced as u ( x, y, 0) x 2 y 2 (a x)(b y ) . Determine function u ( x, y , t ) in
terms of the dimensions a and b, as well as the speed, v, of the wave. Plot your solution for
several times, taking a = b = v = 1. Explain the resulting oscillations qualitatively.
3. Show that the shape of a general rectangular wave never returns to its exact original
configuration, even if allowed to run forever with no friction. Show also how infinitely many
different waves (not simply normal modes though this can serve as a hint) can be
engineered in such a way that they do return to their original configuration infinitely many
times.
In problems 4 7, consider a membrane in the shape of a disk of radius a, fixed at its boundary, that is
initially at rest and disturbed in such a way that its shape is given by the function in the problem.
Determine the function u r , , t in terms of the Bessel functions and their zeros, and the speed v. Use
a computer algebra system to determine the values of the first four coefficients. Which Bessel
function(s) make a prominent showing in the analysis of this function? Why? Make a few plots of
your function at different times, taking the radius and speed as 1. Comment on the vibration of the
membrane. Will it ever return to exactly the same shape it started with? Why or why not?
4. u r , , 0 r (a r ) cos .

5. u r , , 0 r 2 (a r ) cos 2

6. u r , , 0 r 2 (a r ) cos3

7. u r , , 0 r 2 (a r ) 1 cos cos 2 cos 3

8. Go through the analysis presented in the text concerning the integral


2
x J n ( x) dx .

Derive both of the given forms, and explain what you are doing at each step.

9. Consider an annular membrane with inner radius a and outer radius b. Determine the normal
modes of vibration for this annulus, assuming a wave speed of v, independence of the angle ,
and that the membrane is fixed at its boundaries. Express the required roots using a symbol, and
explain what this symbol means and which equation it represents the roots of. Use a computer
algebra system to determine the first four of these roots to three decimal places, assuming that
b = 3a. Note that the second solution of Bessels equation, the Bessel Y-function, is not
excluded from this analysis (why?).
689

Advanced Mathematical Techniques

10. Derive the form of the Bessel expansion for non-compact intervals given in the text. Explain
what you are doing at each step and why you expect these steps to work.
11. Consider a pizza slice shaped membrane fixed at the edges, from 0 to 0 with radius 1.
(a) Show that this modification changes neither of the spatial differential equations. Where is
this change reflected in the solution to the differential equations?
(b) Find the normal modes of vibration of this wedge in terms of the Bessel functions. Explain
the difference between these eigenvalues and those found in the whole disk.
(c) Plot a few of your normal modes for 0 3 using a computer algebra system. Explain
how the resulting oscillations are similar to the rectangular waves considered earlier and
how they are similar to the Bessel waves. What is the qualitative difference between these
normal modes and those of the full Bessel waves?
12. Consider the Bessel equation
z 2u ( z ) zu ( z ) z 2 2 u ( z ) 0 .
(a) Show that the series solution J ( z ) z 2

k !(k )! z 2
2

is an entire function of x

k 0

for all except for the branch cut running from z = 0 to infinity when is not a positive
integer, an entire function of for all z 0 , and satisfies this differential equation for all
values of .
(b) Use this series expansion to establish the identities
2
J 1 ( z ) J 1 ( z )
J ( z ) and J 1 ( z ) J 1 ( z ) 2 J ( z )
z
for all .
(c) The Bessel function of the second kind, Y ( z ) , is defined by
J ( z ) cos( ) J ( z )
,
Y ( z )
sin( )
where a limit is used whenever is an integer. Use this definition to show that the Bessel
functions of the second kind also satisfy the relations derived in part (b).
(d) Argue that the recurrence relations derived in part (b) will be satisfied by any linear
combination of the Bessel functions of the first and second kinds, provided that the
coefficients of this linear combination are independent of z and .

690

Section XIII.5: The Sturm-Liouville Problem

Section XIII.5: The Sturm-Liouville Problem

Our discussion of circular waves is a special case of the general theory of Sturm-Liouville
problems, named for the French mathematicians Jacques Sturm and Joseph Liouville who studied their
properties in the nineteenth century. Any differential equation of the form
d
d

p ( x) y ( x) q ( x) y ( x) w( x) y ( x) ,

dx
dx

where the functions p(x) and w(x) are positive and continuous throughout the interval of interest
(except possibly at the endpoints, where they should be continuous but need not be positive) and q(x)
is continuous, is called a Sturm-Liouville equation. The operator on the left,
d
d
L
p q,
dx dx
is clearly Hermitian (show this) on the interval x a, b whenever the boundary term
b

f * ( x) p ( x) g ( x) f * ( x) p( x) g ( x)
a
vanishes. Using essentially the same argument as that given above, we can easily establish that the
eigenfunctions of this operator will be orthogonal with respect to the weight function w( x ) . More
involved analysis indicates that the eigenvalues of this operator are discrete and can therefore be
ordered from lowest to highest, with the highest usually tending to infinity. There are always an
infinite number of eigenvalues, and the eigenfunctions necessarily span the space under the inner
product taken with respect to the weight function. These are the general results of Sturm-Liouville
theory, and they are widely applicable in many different fields.
The appearance of an equation of Sturm-Liouville type in our analysis of waves on circular
disks is not an accident. It turns out that essentially every separable partial differential equation leads
to ordinary differential equations of the Sturm-Liouville type, so the analysis of these sorts of
equations is extremely important. If the function p(x) is zero at either endpoint, as is the case in the
Bessel equation, then one of the solutions may be singular at this endpoint. The orthogonality of the
eigenfunctions of the Sturm-Liouville operator is easily established, as
b

k *j ( x) k ( x) w( x) dx *j ( x) L k ( x) dx L j ( x) k ( x) dx
b

L j ( x ) k ( x) dx *j *j ( x) k ( x) w( x) dx
a
a
Since the eigenvalues are real, we must have

j *j ( x)k ( x) w( x) dx 0 ,
b

so eigenfunctions associated with distinct eigenvalues are orthogonal with respect to the weight
function w( x ) .
Many of the other properties of these functions are derived by taking a close look at the
differential equation, and I will not repeat them here. The main results are that the eigenvalues are
distinct and increase monotonically to infinity as n increases. The number of zeros associated with any
eigenfunction is given by n 1 (not counting the endpoints, if these points are also zeros), and the
zeros of consecutive eigenfunctions interlace. The solutions associated with a given set of allowed
boundary conditions span the space of square-integrable functions on the interval a, b under the
weight function w(x). These properties mirror those of certain classes of polynomials, so solutions to
Sturm-Liouville problems are very often associated with polynomials. This is certainly the case with
the well-known differential equations
691

Advanced Mathematical Techniques


2
d x2 d

e
H n ( x ) n e x H n ( x )
dx
dx

and
d
1 x 2 dxd P ( x) ( 1) P ( x) .
dx
The first of these is called the Hermite equation after the French mathematician Charles
Hermite. Mathematicians use a slightly different version of this equation, dividing the x 2 in the
exponent by two, because it is appropriate for their needs in statistics. The convention followed above
is used by physicists because of its relevance to the quantum harmonic oscillator problem. There are
no singularities in the Hermite equation, so we expect to find two linearly independent solutions for all
values of x. This is certainly true, but only one linear combination of these functions is square
integrable on x , . This boundary condition at infinity is standard in problems involving

arbitrarily long intervals. Large values of x are also suppressed by the exponential factor serving the
role of weight function, but this suppression is not enough to allow the other linearly independent
solution to this equation to contribute. One interesting fact about the Hermite equation and its
associated weight function is that it manages to keep the space of functions having countable
dimension even as the interval becomes unbounded. This is accomplished by the exponential decay of
the weight function, which makes differences in functions somehow less important at large values of
x. Note that the space we are referring to is still the space of square-integrable functions on
x , with no weight function. Including the weight function in our definition of the space
would cause it to not matter at all, as any member of L2 , can easily be seen to be a member of
the space defined with the modified inner product L(exp)
, . In addition, any member of the
2
modified space L(exp)
, can easily be mapped to a member of the original space L2 , by
2
multiplying it by the weight function. This mapping is one-to-one and onto, so the two spaces are
isomorphic to each other and can have no significant differences. We do not include the weight
function in our definition of the space in order to make the results differ from those found in the
analysis of the original space. It turns out that this is exactly what is required in many different
physical situations, so one finds a great number of uses for the Hermite polynomials.
The second equation is called the Legendre equation after the French mathematician AdrienMarie Legendre, who was mentioned in chapter 3. This equation always comes up in discussions of
spherical coordinates, and is very widely used in the scientific community. Note that the function p(x)
vanishes at the endpoints of the allowed interval x 1,1 . This indicates again the existence of a
disallowed solution and gives us the Legendre polynomials as the only viable solutions. These
polynomials satisfy recurrence relations like those of the Bessel functions and the Hermite polynomials,
and are also orthogonal on their specified domain. The weight function for these polynomials is the
constant 1, but they definitely span the space. The variable x is often thought of as the cosine of the
angle between two vectors associated with electric or magnetic field in applications, and this role can
also be extended to the shapes of general extended bodies having some sort of spherical symmetry.
Legendre polynomials are very often used to describe the manner in which an extended object deviates
from a perfect sphere, and they have been extensively used to describe the shapes of planets.
Deviations from a perfect sphere are manifest in contributions from Legendre polynomials of order
0 . The Earth, for example, has the coefficient 0.0010826 for the second order Legendre
polynomial and 2.531 106 for the third order polynomial when the zeroth polynomial associated with
a perfect sphere is taken to have coefficient 1. These contributions are determined from the
692

Section XIII.5: The Sturm-Liouville Problem

measurements of the gravitational effects the Earth has on man-made satellites, so they really represent
the gravitational shape of the Earth rather than its physical shape. The first polynomial does not
appear because it would represent a shift in the center of mass of the Earth itself. The effect of adding a
nonzero contribution from the first Legendre polynomial is illustrated in figure 70. Note that the whole
body has moved to the right. The second Legendre polynomial represents the equatorial bulge of the
Earth, and the third represents its famous pear shape. Figures 71 and 72 illustrate the effects of
adding a small admixture of Legendre polynomials of degree 2 and 3, respectively, to a dominant
spherical shape. Note the bulge associated with the second degree polynomial and the pear shape
associated with the third degree. The rotation axis of the Earth is taken as the horizontal axis in these
figures. These contributions have, of course, been dramatically increased in the figures for ease of
viewing. Legendre polynomials are also extensively used in the quantum analysis of atoms. The s
orbitals are associated with the zeroth order Legendre polynomial, the p orbitals with the first order
polynomial, the d orbitals with the second, and so on. Visualizing these orbitals then gives us a good
way to understand the shapes associated with different Legendre polynomials, but this must not be
taken as fully accurate because the shapes of the orbitals are determined from the square of the
Legendre polynomials rather than the polynomials themselves.

1.0

1.0

1.0

0.5

0.5

0.5

- 0.5

0.5

- 0.5

1.0

0.5

- 0.5

0.5

- 0.5

- 0.5

- 0.5

- 1.0

- 1.0

- 1.0

Figure 71

Figure 72

Figure 70

1.0

Like many orthogonal polynomials arising in Sturm-Liouville problems, the Hermite


polynomials and Legendre polynomials both have Rodrigues representations, named for the French
banker and mathematician Benjamin Olinde Rodrigues, that give them explicitly in terms of repeated
differentiation of a specific function. The Rodrigues representation of the Hermite polynomials is
n
2 d
2
H n ( x) (1) n e x
e x ,
dx n
using the physicists definition. This representation makes it easy to establish certain facts about the
behavior of these polynomials. The integrals we will be required to compute in order to express a
given function f (x) in terms of Hermite polynomials are given by

d n x2
n
x2
(
)
(
)

1)
(
)
f
x
H
x
e
dx
f
x
e dx
n

dx n

(1) n f ( x)

d n 1 x2
e
dx n 1

(1) n

f ( x)

d n 1 x2
e dx ,
dx n 1

f ( x) H n 1 ( x)e x dx
693

Advanced Mathematical Techniques

as the boundary conditions require the boundary term to vanish. This allows us to easily compute the
coefficients of integrals and derivatives of various functions Hermite polynomial expansions. It is
especially useful when considering integrals of exponential functions with the Hermite polynomials.
Our result makes it obvious that

e x x H n ( x) dx e

for all n, since this function does not change when assaulted with a derivative and this is clearly the
value of the integral when n = 0.
We can also easily derive one of the fundamental recursion relations associated with Hermite
polynomials,
d 2 d n x2
H n ( x) (1) n e x
e 2 xH n ( x) H n 1 ( x) .
dx
dx n
In order to satisfy this requirement, the leading coefficients of the nth Hermite polynomial must be 2n .
Otherwise, the highest power on the right would not cancel. This recursion relation can be derived
directly from the differential equation. If H n ( x) is an eigenfunction of the above operator with
eigenvalue n , then H n 1 ( x) 2 xH n ( x) H n ( x) is an eigenfunction with eigenvalue n 1 . The
functions that are not polynomials but still satisfy these recurrence relations along with the differential
equation are the Hermite functions that arise in some physical applications. They are not squareintegrable, so do not belong to the space we are considering, but they do come up at times.
The generating function of the Hermite polynomials is especially easy to calculate from the
Rodrigues representation, once we look at it in the right way. Defining

tn
g ( x, t ) H n ( x ) ,
n!
n 0
we see from the Rodrigues representation that
2
2
g ( x, t ) e x exp t x e x .
Try to show this. The exponential of a derivative generates a Taylor expansion, so the effect is a shift
to the right by the amount associated with the derivative:

f ( n ) ( x) n
exp a x f ( x)
a f ( x a) .
n!
n0
Therefore, we have
2
2
2
g ( x, t ) e x e ( x t ) e2 xt t .
This is the physicists form; the mathematicians Hermite polynomials have the generating function
2
g stat ( x, t ) e xt t 2 .
This generating function is extremely useful in calculating integrals of the Hermite polynomials, as
well as establishing many important properties. The value at zero, for example, is easily determined
via

2
(1) k t 2 k
tn
g (0, t ) e t
H n (0) .
k!
n!
k 0
n 0
Thus, H n (0) is zero when n is odd and has the value (1) k (2k )! k ! when n = 2k is even.
Orthogonality is easily established via the generating function. All we need to do is integrate the
product of two generating functions over the interval with the weight function:

2
t n sm
2 xt t 2 2 xs s 2 x 2
H n ( x) H m ( x)e x dx
.
e e e dx

n !m !
n 0 m 0
694

Section XIII.5: The Sturm-Liouville Problem

The integral is easily determined by completing the square in the exponent. The result is

2k s k t k
t n sm
2 st
( x t s )2 2 st
x2
e
e
dx
e
H
(
x
)
H
(
x
)
e
dx

n
m

k!
n !m !
k 0
n 0 m 0
Equating the coefficients of like powers of s and t immediately gives us the result

H n ( x) H m ( x)e x dx 2n n ! nm .

This not only re-establishes orthogonality, but also gives us the normalization integral for free.
Many integrals involving the Hermite polynomials can be obtained in the same manner by using the
generating function. Any integral we can do with a Gaussian function we can also do with any
combination of Hermite polynomials. Even products of three, four, or more Hermite polynomials can
be taken using this technique, simply by including more generating functions in our original integral.
The different Hermite polynomials are kept separate by associating a different parameter for t in each
of these generating functions. This process is clearly quite elegant, and requires only that we are adept
at manipulating Taylor expansions.
The Rodrigues representation of the Legendre polynomials is given by

1 d 2
P ( x)
x 1 ,

2 ! dx
where the normalization has again been chosen to make things nicer in other places. We can use this
result to aid in the computation of integrals involving the Legendre polynomials if we like. The
integral
n

1
1 d
n d

1
2
P
(
x
)
P
(
x
)
dx
x
1
x 2 1 dx

n
n

1 n

2 n ! ! dx
dx

1
n d

dn 2
1
x 1 1 x 2 1

n
n
2 n ! ! dx
dx

n 1
1
1 d
n d

1
x 2 1 1 x 2 1 dx

2 n ! ! 1 dx
dx

The boundary term vanishes, as the number of derivatives acting on x 2 1 is not enough to prevent

the function x 2 1 from annihilating both x = 1 and x = -1. We can continue to do this as many times
as we like. Choosing n then immediately gives us the orthogonality relation, as more than 2n

derivatives acting on x 2 1 will clearly give zero. If n , then the final integral is
n

1 Pn ( x)
1

dx

(1) n
22 n n !2

x
1

n
d 2n 2
(2n)!
x 1 dx 2 n 2
2n
dx
2 n!

1 x
1

2 n

dx

.
(2n)! 1 1 2
(2n)!
2
n
u
1

u
dx

22 n n !2 0
2 2 n n ! n 3 2 2n 1
The last equality follows from a use of the recurrence relation for the Gamma function and the
Legendre duplication formula. Try to show this. This integral is extremely useful, as it allows us to
normalize the Legendre polynomials and determine the coefficients of a Legendre polynomial
expansion. The same approach can be used to show that
1
1 1 ()
2
1 f ( x) P ( x) dx 2 ! 1 f ( x) 1 x dx ,
so polynomials of degree less than are necessarily orthogonal to P ( x) .
We can also derive a recurrence relation for the Legendre polynomials from this Rodrigues
representation by writing

695

Advanced Mathematical Techniques


n
n 1
1 dn 2
1 d n 1
x
1
2 xn x 2 1

n
n
n
n 1

2 n ! dx
2 n ! dx
.
n 1
n 1
n 1
d
2
1
1
d n 1 2
, x x 1 n 1
x
n 1
x 1
2 (n 1)! dx n 1
2 (n 1)! dx n 1

Pn ( x)

The commutator

d n 1 d n 1
d n 1
n 1 , x n 1 x x n 1
dx
dx
dx
is given by writing
n 1 n 1

dk
d n 1
x
k
n 1
dx
k 0 k dx

d n 1 k
d n 1
d n2
x n 1 k x n 1 (n 1) n 2 ,
dx
dx
dx

from which we obtain


n 1
n 1
d n2 2
x 1 xPn 1 ( x) .
n 1
n2
2 (n 1)! dx
Taking another derivative brings us to the recurrence relation
1 d n
x Pn 1 ( x) .
Pn( x) (n 1) Pn 1 ( x) Pn 1 ( x) xPn1 ( x ) nPn 1 ( x) xPn1 ( x) n 1
x dx
Numerous other recursion relations involving the Legendre polynomials can be developed from
Rodrigues formula; the most important is
(2n 1) xPn ( x) (n 1) Pn 1 ( x) nPn 1 ( x) .
From this and the preceding recurrence relation, we can derive an expression for the generating
function of the Legendre polynomials,

Pn ( x)

g ( x, t ) Pn ( x) t n .
n0

The first recurrence relation gives the partial differential equation


g

g
g
g
,
t tg xt
tg t 2
xt
x
t
x
t
x
and the second gives
g
g

g
2 xt
xg
t tg 1 t 2 tg .
t
t
t
t
Try to derive these from the recurrence relations. Essentially, you just multiply by t n and sum over n
then look for ways in which the additional factors of n can be generated,188 then re-create the
coefficients so that the correct Legendre polynomial is multiplied by t n . It is not terribly difficult, but
it does require some practice to get the hang of it. If you are having trouble, try substituting the
expression for g into the equations to see how it works.
In order to solve these equations, it will be helpful to isolate either the partial derivative with
respect to t or that with respect to x. The second equation immediately gives
g
x t
g,

t 1 2 xt t 2
and integration yields
C ( x)
.
g ( x, t )
1 2 xt t 2
188

Usually by derivatives with respect to t.

696

Section XIII.5: The Sturm-Liouville Problem

Here, C(x) is the arbitrary integration constant. It is allowed to be a function of x because the only
derivative we took was with respect to t. We have, as of yet, no information on the manner in which g
changes with x alone. Dividing the first equation by $t^{2}$ and multiplying by $1 - 2xt + t^{2}$, then subtracting the equations yields
$$\big(1 - xt\big)\left[\frac{1 - 2xt + t^{2}}{t^{2}}\,\frac{\partial g}{\partial x} - \frac{g}{t}\right] = 0\,,$$
or, since xt is not always 1,
$$\frac{\partial g}{\partial x} = \frac{t}{1 - 2xt + t^{2}}\,g\,.$$
Integrating this equation gives us the solution
$$g(x,t) = \frac{C(t)}{\sqrt{1 - 2xt + t^{2}}}\,,$$
where C(t) is an arbitrary function of t. Equating these two expressions for g immediately indicates that both integration constants are actually constant. The zeroth Legendre polynomial is clearly 1; this fixes the normalization to
$$g(x,t) = \frac{1}{\sqrt{1 - 2xt + t^{2}}} = \sum_{n=0}^{\infty} P_n(x)\,t^{n}\,.$$
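The closed form just obtained can be confirmed directly by Taylor expanding it in t and comparing coefficients with the Legendre polynomials; a minimal sympy sketch follows (the expansion order is an arbitrary choice).

```python
# Expand 1/sqrt(1 - 2xt + t^2) in powers of t and compare with P_n(x).
import sympy as sp

x, t = sp.symbols('x t')
g = 1 / sp.sqrt(1 - 2*x*t + t**2)
ser = sp.expand(sp.series(g, t, 0, 5).removeO())

for n in range(5):
    coeff = ser.coeff(t, n)
    print(n, sp.simplify(coeff - sp.legendre(n, x)) == 0)  # True for each n
```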
This generating function is not as easily manipulated as that for the Hermite polynomials, as
products of generating functions with different t parameters are not easily integrated. The uses of this
generating function are quite different from that of the Hermite generating function for this reason.
One often finds use for the Legendre generating function in physics problems involving
electromagnetic fields or gravitational fields, as the gravitational potential associated with the location
R in the vicinity of a mass distribution with density $\rho(\mathbf{r})$ is given by
$$V(\mathbf{R}) = -G\int_{V}\frac{\rho(\mathbf{r})}{\big|\mathbf{R}-\mathbf{r}\big|}\,dV
= -G\int_{V}\frac{\rho(\mathbf{r})}{\sqrt{R^{2} - 2Rr\cos\theta + r^{2}}}\,dV
= -\frac{G}{R}\sum_{\ell=0}^{\infty}\int_{V}\left(\frac{r}{R}\right)^{\!\ell} P_\ell(\cos\theta)\,\rho(\mathbf{r})\,dV\,.$$
In this expression, θ is the angle between the position vector R of the observation point and the position vector r of the small amount of mass considered in the integral. Working in spherical coordinates, the volume element dV allows us to make the change of variables u = cos θ and causes the integral over θ to take the required form for the Legendre polynomials. Effecting a Legendre expansion of the mass density $\rho(\mathbf{r})$,
$$\rho(\mathbf{r}) = \sum_{\ell=0}^{\infty} c_\ell(r,\phi)\,P_\ell(\cos\theta)\,,$$
then allows us to exploit the orthogonality of the Legendre polynomials and write

$$V(\mathbf{R}) = -\frac{G}{R}\sum_{\ell=0}^{\infty}\int_{V}\left(\frac{r}{R}\right)^{\!\ell} P_\ell(\cos\theta)\,\rho(\mathbf{r})\,dV
= -\frac{G}{R}\sum_{\ell=0}^{\infty}\frac{2}{2\ell+1}\int_{0}^{r_{\max}}\!\!\int_{0}^{2\pi}\left(\frac{r}{R}\right)^{\!\ell} r^{2}\,c_\ell(r,\phi)\,dr\,d\phi\,.$$
This expression is extremely useful in analyzing the gravitational effects of a given mass distribution.
The coefficients $c_\ell(r,\phi)$ are called Legendre moments of the mass distribution,189 and their integrals can readily be measured by orbiting satellites because of the dependence on R. This is the origin of the coefficients given above for the Earth. The nice thing about this representation is that the different Legendre polynomials are somewhat distinct in this analysis because of the orthogonality relation. We
189 This term is sometimes used instead to represent the integrals of these coefficients appearing in the given expansion, which makes them numbers instead of functions; it is important to check the definitions in any reference that you read.
can think of the different contributions to this series as completely independent of one another, which
is a tremendous aid in analyzing these contributions. Physically, this arises from the conserved
angular momentum of orbiting satellites. This type of analysis also plays a major role in studies of
electromagnetic waves in the brain and many other systems for which spherical coordinates are
appropriate.
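To make the role of the orthogonality relation in fixing expansion coefficients concrete, here is a small numerical sketch that projects an illustrative angular profile f(u), with u = cos θ, onto the first few Legendre polynomials; the profile, test point, and truncation order are assumptions made purely for the example.

```python
# Project a sample angular profile f(u), u = cos(theta), onto Legendre
# polynomials using c_l = (2l + 1)/2 * integral_{-1}^{1} f(u) P_l(u) du.
import numpy as np
from scipy.integrate import quad
from scipy.special import eval_legendre

f = lambda u: np.exp(-u) * (1 + 0.3 * u**2)    # illustrative profile

def legendre_coeff(l):
    integral, _ = quad(lambda u: f(u) * eval_legendre(l, u), -1, 1)
    return (2 * l + 1) / 2 * integral

coeffs = [legendre_coeff(l) for l in range(6)]
u = 0.37                                        # arbitrary test point
approx = sum(c * eval_legendre(l, u) for l, c in enumerate(coeffs))
print(approx, f(u))                             # truncated sum is close to f(u)
```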
General linear second order differential equations can be expressed in terms of the Sturm-Liouville equation as follows. Consider the equation
$$a_2(x)\,\frac{d^{2}y}{dx^{2}} + a_1(x)\,\frac{dy}{dx} + a_0(x)\,y = -\lambda\,b(x)\,y\,.$$
The differential operator in this equation is not of Sturm-Liouville type in general, and will not
necessarily be Hermitian. Dividing by a2 ( x ) allows us to write this in a more useful form:
$$\frac{d^{2}y}{dx^{2}} + \frac{a_1(x)}{a_2(x)}\,\frac{dy}{dx} + \frac{a_0(x)}{a_2(x)}\,y
= e^{-\int a_1(x)/a_2(x)\,dx}\,\frac{d}{dx}\!\left[e^{\int a_1(x)/a_2(x)\,dx}\,\frac{dy}{dx}\right] + \frac{a_0(x)}{a_2(x)}\,y
= -\lambda\,\frac{b(x)}{a_2(x)}\,y\,.$$
This is equivalent to the equation
$$\frac{d}{dx}\!\left[p(x)\,\frac{dy}{dx}\right] - q(x)\,y = -\lambda\,w(x)\,y\,,$$
with
$$p(x) = e^{\int a_1(x)/a_2(x)\,dx}\,,\qquad q(x) = -\,\frac{p(x)\,a_0(x)}{a_2(x)}\,,\qquad w(x) = \frac{p(x)\,b(x)}{a_2(x)}\,,$$
which is an eigenvalue problem of the Sturm-Liouville type, provided $a_2(x) \neq 0$ anywhere within the domain of interest. Note that the as-yet unspecified parameter λ can absorb a sign if necessary, so we can
always choose a2 ( x ) to have the same sign as b(x). If the function a2 ( x) has a finite number of zeros
within the interval of interest, then the interval can definitely be broken up into different regions lying
between the zeros that are orthogonal to each other. They can be considered as completely
independent of one another, except possibly for boundary conditions, but there is definitely no
guarantee that the eigenvalues will be the same in both regions. The two regions are orthogonal, so a
change in eigenvalues will not necessarily destroy everything. However, the standard boundary
conditions at the boundary between the two regions connect the regions in a manner dictated by the
physical requirement that any vibrating membrane cannot be discontinuous and will correct any
deviations from smoothness almost immediately.
These requirements of continuity and
differentiability of the wavefunction at the boundary relate the eigenvalues associated with one region
to those associated with the other, often modifying both in order to satisfy the boundary conditions.
This process is exactly the same as that used to treat physical boundaries between two distinct media
with different propagation speeds in the next set of notes on boundaries between media, so I will defer
to that analysis.
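The reduction to Sturm-Liouville form described above is mechanical: compute the integrating factor p(x) = exp(∫ a1/a2 dx) and read off q(x) and w(x). A short symbolic sketch follows, using the Laguerre coefficients purely as an example and the sign conventions adopted above.

```python
# Convert a2*y'' + a1*y' + a0*y = -lambda*b*y to Sturm-Liouville form by
# computing p = exp(int a1/a2 dx), q = -p*a0/a2, w = p*b/a2.
import sympy as sp

x, alpha = sp.symbols('x alpha', positive=True)
a2, a1, a0, b = x, alpha + 1 - x, sp.Integer(0), sp.Integer(1)   # Laguerre example

p = sp.exp(sp.integrate(a1 / a2, x))
q = -p * a0 / a2
w = p * b / a2

print(sp.simplify(p))   # equivalent to x**(alpha + 1) * exp(-x)
print(sp.simplify(w))   # equivalent to x**alpha * exp(-x)
```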
If the function $a_2(x) \neq 0$ anywhere within the domain of interest, except possibly at the endpoints, and the function $a_1(x)/a_2(x)$ is piecewise continuous and finite within the domain, then the function p(x) is positive and continuous throughout our domain except possibly at the endpoints. Supposing that, for some value of λ, there is a function $y_\lambda(x)$ that satisfies the differential equation and for which the normalization integral
$$\int y_\lambda^{2}(x)\,w(x)\,dx < \infty\,,$$
we can definitely normalize the eigenfunction in such a way that this integral is 1. In this case, we
have


$$\lambda = \lambda\int y_\lambda^{2}(x)\,w(x)\,dx = \int y_\lambda(x)\,\big[\lambda\,y_\lambda(x)\big]\,w(x)\,dx
= -\int y_\lambda(x)\left\{\frac{d}{dx}\!\left[p(x)\,\frac{dy_\lambda(x)}{dx}\right] - q(x)\,y_\lambda(x)\right\}dx$$
$$= -\left[p(x)\,y_\lambda(x)\,\frac{dy_\lambda(x)}{dx}\right]_{a}^{b} + \int_{a}^{b}\left\{p(x)\left[\frac{dy_\lambda(x)}{dx}\right]^{2} + q(x)\,y_\lambda^{2}(x)\right\}dx
= \int_{a}^{b}\left\{p(x)\left[\frac{dy_\lambda(x)}{dx}\right]^{2} + q(x)\,y_\lambda^{2}(x)\right\}dx\,.$$

This is clearly positive if q(x) is nonnegative, so we have the result that any differential equation of the above form with the same sign for both $a_2(x)$ and b(x) and opposite sign for $a_0(x)$ for all x on the interval of interest must have positive eigenvalues λ. This is also true if $a_0(x)$ is allowed to be zero any number of times within the interval. If the function q(x) is negative somewhere in the interval of interest, but still satisfies $q(x) \ge -M\,w(x)$ throughout the region, then we simply re-arrange the equation as
$$\frac{d}{dx}\!\left[p(x)\,\frac{dy}{dx}\right] - \big[q(x) + M\,w(x)\big]\,y = -\big(\lambda + M\big)\,w(x)\,y\,.$$
The above analysis makes it clear that $\lambda \ge -M$, since the new function q(x) + M w(x) is positive throughout the interval of interest. This can only be a problem at the endpoints, where w(x) may be zero. In this case, any function q(x) that is negative at either endpoint cannot satisfy our inequality regardless of the value of M. On the other hand, the region at the endpoints represents a set of measure zero and therefore cannot contribute in any appreciable way to the integral we are using to demonstrate the positivity of the eigenvalues λ.
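A crude way to see the discreteness and lower bound of the spectrum is to discretize a Sturm-Liouville operator and inspect its eigenvalues numerically. The sketch below uses simple finite differences for the case p = w = 1, q = 0 with Dirichlet boundary conditions, for which the eigenvalues (nπ/L)² are known; the grid size is an arbitrary choice.

```python
# Finite-difference eigenvalues of -y'' = lambda*y on [0, L], y(0) = y(L) = 0.
# The discrete spectrum should approximate (n*pi/L)**2, all positive.
import numpy as np

L, N = 1.0, 400
h = L / N
main = 2.0 / h**2 * np.ones(N - 1)
off = -1.0 / h**2 * np.ones(N - 2)
A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)

eigs = np.sort(np.linalg.eigvalsh(A))
exact = [(n * np.pi / L)**2 for n in range(1, 5)]
print(eigs[:4])    # close to the exact values below, and all positive
print(exact)
```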
To obtain further results on the eigenvalues, one generally makes arguments about how these
eigenvalues change under continuous changes in the functions p(x), w(x), and q(x). The result is that
the eigenvalues must change continuously under these variations. Since we know that the Sturm-Liouville problem with p(x) = w(x) = 1 and q(x) = 0 admits an infinite number of discrete eigenvalues $\{\lambda_n\}_{n=1}^{\infty}$ that tend to infinity as n does, this result also carries over to the general Sturm-Liouville problem. There can only be a finite number of negative eigenvalues for the general problem, as they must be discrete, unbounded above, and bounded below by −M. The associated eigenfunctions must be orthogonal and span the space according to the inner product with weight function w(x). The existence of eigenfunctions requires a separate proof, which I will omit. It follows the same line of reasoning as that given in chapter 12, and is extended by the treatment of chapter 15. It suffices to
say that there will be an infinite set of orthogonal eigenfunctions that span the space whenever p(x) and
w(x) are both positive throughout the region of interest and q(x) is well-behaved.
If the eigenfunctions of a Sturm-Liouville problem are polynomials, then these polynomials are
given by the Rodrigues representation
$$P_n(x) = \frac{1}{w(x)}\,\frac{d^{n}}{dx^{n}}\Big[w(x)\,a_2(x)^{n}\Big]\,.$$
This situation will definitely happen whenever $a_0(x) = 0$, $a_1(x)$ is linear, b(x) = 1, and $a_2(x)$ is either
quadratic with distinct real roots that flank the root of a1 ( x) and of the same sign as a1 ( x) for large x,
linear in such a way that the intersection between the lines a1 ( x) and a2 ( x) occurs to the right of the
root of a2 ( x) , or constant with the opposite sign of the leading coefficient of a1 ( x) .190 In any of these
190 The reason for these requirements will be made clearer in chapter 15.

cases, the polynomials Pn ( x) are definitely orthogonal to each other with respect to the weight
function w(x) on the interval lying between the zeros of a2 ( x) in the first case, running from the zero
of a2 ( x ) to positive infinity in the second case, and running from negative infinity to infinity in the
third case. The weight function w(x) is determined by the equation, and must be such that the integral

$$\int P^{2}(x)\,w(x)\,dx < \infty$$

for all polynomials P(x). We can verify orthogonality simply by investigating the integral

$$\int P_n(x)\,P_m(x)\,w(x)\,dx\,.$$

If m > n, then we have


$$\int_{a}^{b} P_n(x)\,P_m(x)\,w(x)\,dx
= \int_{a}^{b} P_n(x)\,\frac{d^{m}}{dx^{m}}\Big[w(x)\,a_2^{m}(x)\Big]\,dx
= (-1)^{m}\int_{a}^{b}\left[\frac{d^{m}}{dx^{m}}P_n(x)\right]w(x)\,a_2^{m}(x)\,dx = 0$$

by repeated integration by parts. The boundary terms vanish because of the boundary conditions. The
final equality comes from the fact that taking more than n derivatives of a polynomial of degree n
gives zero.
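The Rodrigues recipe and the integration-by-parts argument above are easy to test symbolically. The sketch below builds the first few polynomials from the general Rodrigues formula for the Legendre case (w = 1, a2 = 1 − x²) and checks their mutual orthogonality; normalization constants are ignored.

```python
# Generate polynomials from the general Rodrigues formula
#   P_n(x) = (1/w) d^n/dx^n [ w(x) a2(x)^n ]   (up to normalization)
# and verify orthogonality with respect to w.  Legendre case: w = 1, a2 = 1 - x^2.
import sympy as sp

x = sp.symbols('x')
w, a2 = sp.Integer(1), 1 - x**2

def rodrigues(n):
    return sp.expand(sp.diff(w * a2**n, x, n) / w)

polys = [rodrigues(n) for n in range(4)]
for n in range(4):
    for m in range(n):
        val = sp.integrate(polys[n] * polys[m] * w, (x, -1, 1))
        print(n, m, val)    # 0 for every pair with n != m
```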
Orthogonal polynomials come up very often in physical applications, so these results are very
useful. While they are not the most general we can derive for Sturm-Liouville problems, they do the
job in most cases. As an example, consider the Laguerre differential equation
$$x\,y'' + (\alpha + 1 - x)\,y' + \lambda\,y = 0\,.$$
This equation satisfies our requirements whenever $\alpha > -1$, so solutions are determined by first finding the functions p(x) and w(x). These functions are given by
$$p(x) = e^{\int a_1(x)/a_2(x)\,dx} = \exp\!\left[\int\frac{\alpha + 1 - x}{x}\,dx\right] = x^{\alpha+1}\,e^{-x}$$
and
$$w(x) = \frac{p(x)}{a_2(x)} = x^{\alpha}\,e^{-x}\,,$$
so the Laguerre polynomials are given by191
$$L_n^{(\alpha)}(x) = \frac{e^{x}}{n!\;x^{\alpha}}\,\frac{d^{n}}{dx^{n}}\Big(x^{n+\alpha}\,e^{-x}\Big)\,.$$
These polynomials are orthogonal on the interval $x \in [0,\infty)$ with respect to the weight function. Note that the weight function loses its integrability at x = 0 when the parameter α becomes less than −1, so the differential equation is ill-defined in this case. As long as $\alpha > -1$, however, these polynomials will do everything required of them on the interval of interest. They will be orthogonal and span the space of square-integrable functions defined on $[0,\infty)$. This differential equation is famous for its role in the solution of the hydrogen atom problem in quantum mechanics. The Laguerre polynomials with whole number α play the role of radial solutions to the Schrödinger equation for the hydrogen atom.
The normalization constant of n! is chosen in order to make the generating function for these
polynomials simple. This generating function is not obvious from the Rodrigues representation, but
can be derived in more-or-less the same manner as that used to derive the generating function for the
Legendre polynomials by first establishing several recurrence relations. The result is
$$g^{(\alpha)}(x,t) \equiv \sum_{n=0}^{\infty} L_n^{(\alpha)}(x)\,t^{n}
= \frac{e^{-xt/(1-t)}}{(1-t)^{\alpha+1}}
= \frac{(-1)^{\alpha}}{t^{\alpha}}\,\frac{\partial^{\alpha}}{\partial x^{\alpha}}\!\left[\frac{e^{-xt/(1-t)}}{1-t}\right].$$
The last form of this expression allows us to identify
$$L_n^{(\alpha)}(x) = (-1)^{\alpha}\,\frac{d^{\alpha}}{dx^{\alpha}}\,L_{n+\alpha}^{(0)}(x)\,,$$
at least when α is an integer. When α is not an integer, we write instead
$$L_n^{(\alpha)}(x) = (-1)^{m}\,\frac{d^{m}}{dx^{m}}\,L_{n+m}^{(\alpha-m)}(x)\,,$$
where m is an integer.
191 The term Laguerre polynomial is usually reserved for the case α = 0. Positive integer values of α are referred to as associated Laguerre polynomials. Both of these sets of polynomials play a prominent role in the quantum mechanical treatment of the hydrogen atom, with the Laguerre polynomials appropriate to spherically symmetric wavefunctions with zero angular momentum and the associated functions appropriate to states with nonzero angular momentum.
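The quoted generating function can be checked directly by expanding the closed form in powers of t and comparing with the generalized Laguerre polynomials; a short sympy sketch, with an arbitrarily chosen non-integer α and truncation order.

```python
# Check sum_n L_n^{(alpha)}(x) t^n = exp(-x*t/(1-t)) / (1-t)**(alpha+1)
# by expanding in t (alpha and the expansion order are arbitrary here).
import sympy as sp

x, t = sp.symbols('x t')
alpha = sp.Rational(3, 2)            # arbitrary non-integer choice
g = sp.exp(-x*t/(1 - t)) / (1 - t)**(alpha + 1)
ser = sp.expand(sp.series(g, t, 0, 4).removeO())

for n in range(4):
    coeff = ser.coeff(t, n)
    print(n, sp.simplify(coeff - sp.assoc_laguerre(n, alpha, x)) == 0)
```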
There are many other standard results for orthogonal polynomials associated with Sturm-Liouville type problems, as well as results for the general Sturm-Liouville problem without
polynomial solutions. These discussions come up whenever one is solving a separable partial
differential equation, as these almost always lead to Sturm-Liouville problems. Some of the results
that have been established involve the spacing of roots of eigenfunctions, the distribution of
eigenvalues, and the conditions under which the eigenvalues will be discrete (recall that this is not the
case when the weighting function is 1 on an unbounded interval). This theory is so general that
specialization to specific classes of differential equations is required in order to determine the full
range of expressions that apply. You can go online to see the results for Chebychev polynomials,
Jacobi polynomials, Gegenbauer polynomials, and Mathieu functions. Each of these satisfies a
differential equation of the Sturm-Liouville type, and each is associated with an infinite set of discrete
eigenvalues, but they are different from each other in their own ways. Each is applicable to its own set
of problems, which may or may not be related to the problems associated with the other sets of
functions. The Chebychev polynomials are often seen in numerical analysis, especially when one is
looking for an approximation to a set of functions that comes within a specified error over a given
interval, the Jacobi polynomials parameterize the possible solutions to differential equations with
$a_2(x)$ quadratic in x (the zeros of which are chosen to be $x = \pm 1$), the Gegenbauer polynomials are
extensions of the Legendre polynomials, and come up when one is considering physical problems
involving the conservation of angular momentum, and the Mathieu functions are associated with the
normal modes of vibration of an elliptical drum. Most of these functions can be expressed in terms of
the hypergeometric functions, and these functions are characterized by classifying the singularities
associated with places where $a_2(x) = 0$. They will be discussed in detail in the next section. The
Mathieu functions are not expressible in terms of hypergeometric functions because their coefficients
have an essential (irregular) singularity at infinity, but these are definitely in the minority as far as
physical applications go.


Exercises for Section XIII.5:


1. Show that the Rodrigues representation of the Hermite polynomials satisfies the Hermite equation
$$\frac{d}{dx}\!\left[e^{-x^{2}}\,H_n'(x)\right] = -\lambda_n\,e^{-x^{2}}\,H_n(x)\,.$$
What is the associated eigenvalue $\lambda_n$? Hint: Take the derivative $H_n'(x)$ first, then use commutation relations to move the factor of x inside all of the derivatives. As an example,
$$\frac{d^{n}}{dx^{n}}\big[x\,f(x)\big] = x\,\frac{d^{n}}{dx^{n}}f(x) + n\,\frac{d^{n-1}}{dx^{n-1}}f(x)
\qquad\Longrightarrow\qquad
\left[\frac{d^{n}}{dx^{n}},\,x\right] = n\,\frac{d^{n-1}}{dx^{n-1}}\,,$$
from the binomial expansion formula.
2. Show that the Rodrigues representation of the Legendre polynomials satisfies the Legendre equation
$$\frac{d}{dx}\!\left[\big(1 - x^{2}\big)\,P_n'(x)\right] = -\lambda_n\,P_n(x)\,.$$
What is the associated eigenvalue $\lambda_n$? Hint: Take the derivative $P_n'(x)$ first, then use commutation relations to move the factor of $1 - x^{2}$ inside all of the derivatives. See the hint associated with problem 1 for more information.
3. Consider the differential equation $a_2(x)\,y'' + a_1(x)\,y' = -\lambda\,y$, where $a_1(x)$ is linear in x and $a_2(x)$ is at most quadratic in x.
(a) Show that the weight function is given, up to a constant factor, by $w(x) = e^{\int a_1(x)/a_2(x)\,dx}\big/ a_2(x)$.
(b) Show that the standard form of this differential equation is $\big(w\,a_2\,y'\big)' = -\lambda\,w\,y$.
(c) Show that the Rodrigues representation $y_n = \dfrac{1}{w}\dfrac{d^{n}}{dx^{n}}\big(w\,a_2^{n}\big)$ leads to
$$w\,a_2\,y_n' = a_2\,\frac{d^{n+1}}{dx^{n+1}}\big(w\,a_2^{n}\big) + \big(a_2' - a_1\big)\,\frac{d^{n}}{dx^{n}}\big(w\,a_2^{n}\big)\,.$$
(d) Show that the derivative
$$\big(w\,a_2\,y_n'\big)' = a_2\,\frac{d^{n+2}}{dx^{n+2}}\big(w\,a_2^{n}\big) + \big(2a_2' - a_1\big)\,\frac{d^{n+1}}{dx^{n+1}}\big(w\,a_2^{n}\big) + \big(a_2'' - a_1'\big)\,\frac{d^{n}}{dx^{n}}\big(w\,a_2^{n}\big)\,.$$
(e) Show that
$$a_2\,\frac{d^{n+2}}{dx^{n+2}}\big(w\,a_2^{n}\big) = \frac{d^{n+2}}{dx^{n+2}}\big(w\,a_2^{n+1}\big) - (n+2)\,a_2'\,\frac{d^{n+1}}{dx^{n+1}}\big(w\,a_2^{n}\big) - \frac{(n+1)(n+2)}{2}\,a_2''\,\frac{d^{n}}{dx^{n}}\big(w\,a_2^{n}\big)\,.$$
(f) Complete this analysis to show that the Rodrigues representation indeed satisfies the differential equation, and determine the eigenvalue $\lambda_n$.

In problems 4–6, consider the given differential equation.


(a) Show that it satisfies the requirements listed in the text for orthogonal polynomial solutions.
What is the weight function? What is the interval of orthogonality?
(b) What is the eigenvalue associated with the nth polynomial? The results of problem 3 will be
useful in this regard; they can be found in the back if you did not do that problem.
(c) Determine the first four polynomials directly from the Rodrigues representation. You should
use a computer algebra system to simplify the algebra. Verify that these polynomials are
indeed orthogonal with respect to the weight function on the appropriate interval.
4.

1 x y (1 2 x) y y .

5. ( x 1) y (2 3 x ) y y

6. $y'' + (1 - 2x)\,y' = -\lambda\,y$. Also determine the generating function for these polynomials.


7. Consider the Laguerre equation, $x\,y'' + (\alpha + 1 - x)\,y' + \lambda\,y = 0$.
(a) Determine the eigenvalue associated with the nth polynomial.
(b) Use the Rodrigues representation of the Laguerre polynomials to derive the recurrence relations
$$(n + 1)\,L_{n+1}^{(\alpha)}(x) = (2n + \alpha + 1 - x)\,L_n^{(\alpha)}(x) - (n + \alpha)\,L_{n-1}^{(\alpha)}(x)$$
and
$$x\,\frac{d}{dx}L_n^{(\alpha)}(x) = n\,L_n^{(\alpha)}(x) - (n + \alpha)\,L_{n-1}^{(\alpha)}(x)\,.$$
(c) Use the recurrence relations derived in part (b) to establish the generating function given in the text for the Laguerre polynomials.
8. Use the generating function of the Hermite polynomials to determine the integral

$$\int_{-\infty}^{\infty} H_\ell(x)\,H_m(x)\,H_n(x)\,e^{-x^{2}}\,dx\,.$$

9. Follow the steps outlined in the text to derive the generating function for the Hermite
polynomials.
10. Use the recurrence relations of the Legendre polynomials to derive their generating function.
Explain what you are doing at each step.
11. Put the differential equation e x y 2 3e x y x 2 3 x y 0 into Sturm-Liouville form and
indicate what weight function the eigenfunctions will be orthogonal with respect to. What
orthogonality intervals will work for this problem? Will the solutions be polynomials?


Section XIII.6: Hypergeometric Functions


The confluent hypergeometric function is the solution to the differential equation
$$x\,\frac{d^{2}y}{dx^{2}} + (b - x)\,\frac{dy}{dx} - a\,y = 0\,.$$
The solutions to this are given by the confluent hypergeometric, or Kummer, series

$${}_1F_1(a;\, b;\, x) = \sum_{n=0}^{\infty}\frac{(a)_n}{(b)_n}\,\frac{x^{n}}{n!}\,.$$
These solutions were first introduced by the German mathematician Ernst Kummer in 1837. The
Pochhammer symbol (a)n is defined by

$$(a)_n = a\,(a+1)\,(a+2)\cdots(a+n-1)\,,\qquad (a)_0 = 1\,,$$
or
$$(a)_n = \frac{\Gamma(a+n)}{\Gamma(a)}\,.$$
It is also called the rising factorial, and named for the Prussian mathematician Leo August
Pochhammer. When a is a negative integer, this ratio is undefined. Replacing it with the former,
explicit representation, however, shows that it becomes zero at some finite value of n and remains zero
for the remaining values. This means that 1 F1 (a; b; x) is a polynomial whenever a is a negative
integer. The series is not well-defined if b is a negative integer, unless a is also a negative integer and
$b \le a$. In the latter case, we can define nonpolynomial solutions by allowing a and b to differ by an
integer and taking the limit as they both approach integers themselves. In this way, we obtain a string
of finite coefficients until a + n = 0, then a string of zero coefficients until b + n = 0. After that, the
singularities cancel and we have a well-defined expansion.192 The singularity associated with the
coefficient a2 ( x) is at x = 0, so this differential equation satisfies the requirements of orthogonal
polynomials whenever b > 0. Confluent hypergeometric functions can represent many different
special functions, especially the orthogonal polynomials. Some examples are given below:
$$e^{x} = {}_1F_1(a;\, a;\, x)$$
$$L_n^{(\alpha)}(x) = \frac{(\alpha+1)_n}{n!}\,{}_1F_1(-n;\, \alpha+1;\, x)$$
$$H_{2n}(x) = (-1)^{n}\,\frac{(2n)!}{n!}\,{}_1F_1\!\left(-n;\, \tfrac12;\, x^{2}\right)$$
$$H_{2n+1}(x) = (-1)^{n}\,\frac{(2n+1)!}{n!}\,2x\;{}_1F_1\!\left(-n;\, \tfrac32;\, x^{2}\right)$$
$$J_\nu(x) = \frac{e^{-ix}}{\nu!}\left(\frac{x}{2}\right)^{\!\nu}\,{}_1F_1\!\left(\nu + \tfrac12;\, 2\nu + 1;\, 2ix\right).$$
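A few of the special cases listed above are easy to verify numerically with mpmath; the arguments below are arbitrary sample points.

```python
# Spot-check some of the 1F1 special cases listed above (arbitrary arguments).
import mpmath as mp

x, n, a = mp.mpf('0.7'), 3, mp.mpf('1.3')

print(mp.hyp1f1(a, a, x), mp.exp(x))          # e^x = 1F1(a; a; x)

lhs = mp.hermite(2*n, x)
rhs = (-1)**n * mp.factorial(2*n) / mp.factorial(n) * mp.hyp1f1(-n, mp.mpf('0.5'), x**2)
print(lhs, rhs)                                # H_{2n}(x) identity

alpha = mp.mpf('0.25')
lhs = mp.laguerre(n, alpha, x)
rhs = mp.rf(alpha + 1, n) / mp.factorial(n) * mp.hyp1f1(-n, alpha + 1, x)
print(lhs, rhs)                                # L_n^{(alpha)}(x) identity
```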
As with all second order linear differential equations, there is a second solution to the confluent
hypergeometric equation. It can be seen by manipulating the differential equation that the function
$$J(x) = x^{1-b}\,{}_1F_1(1 + a - b;\; 2 - b;\; x)$$
also satisfies the hypergeometric equation unless b is an integer larger than 2, in which case it is undefined. This is called the confluent hypergeometric function of the second kind.
192 Note that this limiting process must be specified clearly in order for the result to be well-defined; the expression ${}_1F_1(-3;\,-5;\,x)$ has no meaning on its own, and is usually taken as the polynomial $\lim_{n\to -5}\,{}_1F_1(-3;\, n;\, x)$ in the absence of other instructions.
The linear combination most often used for this second solution is
$$U(a, b;\, x) = \frac{\pi}{\sin \pi b}\left[\frac{{}_1F_1(a;\, b;\, x)}{\Gamma(1 + a - b)\,\Gamma(b)} - x^{1-b}\,\frac{{}_1F_1(1 + a - b;\, 2 - b;\, x)}{\Gamma(a)\,\Gamma(2 - b)}\right],$$

where the expression is understood to represent its limit whenever a or b are integers. It has been
designed in such a way that the singularities associated with integer a and b cancel, so this solution
works for all values of a and b if treated properly. This solution may not have proper behavior at
x = 0, as this is a singular point of the differential equation. If the physical location associated with
this value of x is part of the system, then this function is automatically disallowed. This is exactly
what happened to the cosine function in our analysis of strings fixed at both endpoints and to the
Bessel function of the second kind Y in our treatment of the vibrating drum.
Standard hypergeometric functions satisfy
$$x\,(1 - x)\,\frac{d^{2}y}{dx^{2}} + \big[c - (a + b + 1)\,x\big]\,\frac{dy}{dx} - a\,b\,y = 0\,.$$
These functions are represented by the notation

$${}_2F_1(a, b;\, c;\, x) = \sum_{n=0}^{\infty}\frac{(a)_n\,(b)_n}{(c)_n}\,\frac{x^{n}}{n!}\,.$$
This is sometimes called Gauss series, as Gauss spent a great deal of time analyzing its solutions.
These functions are regular at x = 0; the linearly independent solutions that are not necessarily regular
there are given by
$$x^{1-c}\,{}_2F_1(a - c + 1,\, b - c + 1;\, 2 - c;\, x) = x^{1-c}\,(1 - x)^{c - a - b}\,{}_2F_1(1 - a,\, 1 - b;\, 2 - c;\, x)\,.$$
The zero of $a_1(x)$ falls between the two singular points associated with the zeros of $a_2(x)$ whenever
$$0 < \frac{c}{a + b + 1} < 1\,,$$
so the orthogonal polynomial restriction given above is met whenever this condition is satisfied and $a + b + 1 > 0$. The representations of the orthogonal polynomials given below are all seen to satisfy both of these requirements.
The hypergeometric series converges whenever $|x| < 1$ unless the coefficients are playing some sort of integer game. Convergence on the circle itself is contingent on the parameters a, b, and c. If $\mathrm{Re}(c - a - b) \le -1$, then the series diverges when $|x| = 1$. If $\mathrm{Re}(c - a - b) > 0$, then the series converges absolutely. If we lie in the in-between region $-1 < \mathrm{Re}(c - a - b) \le 0$, then the series will converge for some values of x on the circle and not other values. The integer game refers to negative integral values of a, b, or c. If either a or b is a negative integer and c is not, then the hypergeometric series truncates and one of the solutions is a polynomial. If c is a negative integer, then the series becomes undefined unless either a or b is also a negative integer and $c \le a, b$. These cases are familiar from the confluent hypergeometric case. In fact, the confluent hypergeometric functions can be obtained as limits of the hypergeometric functions:
$${}_1F_1(a;\, c;\, x) = \lim_{b\to\infty}\,{}_2F_1\!\left(a, b;\, c;\, \frac{x}{b}\right).$$

This can be seen directly from the series expansion, as


$$\lim_{b\to\infty}\frac{(b)_n}{b^{n}} = 1$$
for all fixed values of n. This limit gives the rationale behind the name confluent hypergeometric
functions, as the regular singular points at x = 0, b, and ∞ in ${}_2F_1(a, b;\, c;\, x/b)$ become a regular singular point at x = 0 and an irregular singular point at ∞ in the limit as b tends to ∞. It is the confluence of the two singularities at ∞ that gives rise to the name.
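The confluent limit can be watched happening numerically; in the sketch below the parameters and the sample point are arbitrary, and the difference from the 1F1 value shrinks as b grows.

```python
# Check 1F1(a; c; x) = lim_{b -> infinity} 2F1(a, b; c; x/b) at a sample point.
import mpmath as mp

a, c, x = mp.mpf('0.8'), mp.mpf('2.3'), mp.mpf('1.1')
target = mp.hyp1f1(a, c, x)
for b in [10, 100, 1000, 10000]:
    print(b, mp.hyp2f1(a, b, c, x / b) - target)   # differences shrink roughly like 1/b
```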
Many, many functions can be expressed as hypergeometric functions. Some examples
involving orthogonal polynomials are
$$T_n(1 - 2x) = {}_2F_1\!\left(-n,\, n;\, \tfrac12;\, x\right) \qquad\text{(Chebychev)}$$
$$P_n(1 - 2x) = {}_2F_1(-n,\, n + 1;\, 1;\, x) \qquad\text{(Legendre)}$$
$$C_n^{(\lambda)}(1 - 2x) = \frac{(2\lambda)_n}{n!}\,{}_2F_1\!\left(-n,\, n + 2\lambda;\, \lambda + \tfrac12;\, x\right) \qquad\text{(Gegenbauer)}$$
$$P_n^{(\alpha,\beta)}(1 - 2x) = \frac{(\alpha + 1)_n}{n!}\,{}_2F_1(-n,\, n + \alpha + \beta + 1;\, \alpha + 1;\, x) \qquad\text{(Jacobi)}\,,$$
and other examples involving elementary functions are
$$\frac{\ln(1 + x)}{x} = {}_2F_1(1, 1;\, 2;\, -x)$$
$$\frac{1}{2x}\,\ln\frac{1 + x}{1 - x} = {}_2F_1\!\left(\tfrac12, 1;\, \tfrac32;\, x^{2}\right)$$
$$\frac{\arctan x}{x} = {}_2F_1\!\left(\tfrac12, 1;\, \tfrac32;\, -x^{2}\right)$$
$$\frac{\arcsin x}{x} = {}_2F_1\!\left(\tfrac12, \tfrac12;\, \tfrac32;\, x^{2}\right) = \sqrt{1 - x^{2}}\;{}_2F_1\!\left(1, 1;\, \tfrac32;\, x^{2}\right)$$
$$\frac{\ln\!\big(x + \sqrt{1 + x^{2}}\big)}{x} = {}_2F_1\!\left(\tfrac12, \tfrac12;\, \tfrac32;\, -x^{2}\right)$$
$$(1 - x)^{-a} = {}_2F_1(a, b;\, b;\, x)$$
$$\cos 2ax = {}_2F_1\!\left(a, -a;\, \tfrac12;\, \sin^{2}x\right)$$
$$\frac{\cos\big[(2a - 1)x\big]}{\cos x} = {}_2F_1\!\left(a, 1 - a;\, \tfrac12;\, \sin^{2}x\right)$$
$$\frac{\sin\big[(2a - 1)x\big]}{(2a - 1)\sin x} = {}_2F_1\!\left(a, 1 - a;\, \tfrac32;\, \sin^{2}x\right)$$
$$\frac{\sin\big[(2a - 2)x\big]}{(a - 1)\sin 2x} = {}_2F_1\!\left(a, 2 - a;\, \tfrac32;\, \sin^{2}x\right).$$
There are many other relations like these. It has been said that the hypergeometric functions can
represent almost anything, which is a fairly accurate statement if we are only interested in those
functions that come up in physical applications. Essentially the only functions appearing in standard
mathematical physics that cannot be represented by the hypergeometric series or its confluent relative
are the Gamma function and the Mathieu functions.
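Several of the representations listed above can be spot-checked with mpmath; the sample point and the polynomial degree below are arbitrary choices.

```python
# Spot-check a few of the 2F1 representations listed above (arbitrary arguments).
import mpmath as mp

x = mp.mpf('0.4')
print(mp.hyp2f1(0.5, 1, 1.5, -x**2), mp.atan(x) / x)          # arctan
print(mp.hyp2f1(0.5, 0.5, 1.5, x**2), mp.asin(x) / x)         # arcsin
print(mp.hyp2f1(1, 1, 2, -x), mp.log(1 + x) / x)              # logarithm

n = 4
print(mp.hyp2f1(-n, n, 0.5, x), mp.chebyt(n, 1 - 2*x))        # Chebychev T_n(1 - 2x)
print(mp.hyp2f1(-n, n + 1, 1, x), mp.legendre(n, 1 - 2*x))    # Legendre P_n(1 - 2x)
```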
This fact can be seen directly from the form of the hypergeometric series, which can be forced
to conform to almost every Maclaurin expansion we have considered. The function
$$\frac{\sin x}{x} = \sum_{n=0}^{\infty}\frac{(-1)^{n}\,x^{2n}}{(2n + 1)!}\,,$$
for example, can be written in terms of the confluent hypergeometric series as follows. Re-writing the factorial in the denominator as
$$(2n + 1)! = 2^{2n}\left(n + \tfrac12\right)\left(n - \tfrac12\right)\cdots\tfrac32\;n! = 2^{2n}\left(\tfrac32\right)_n\,n!\,,$$

this function is given by


$$\frac{\sin x}{x} = {}_0F_1\!\left(\tfrac32;\; -\frac{x^{2}}{4}\right),$$
where
$${}_0F_1(b;\, x) = \lim_{a\to\infty}\,{}_1F_1\!\left(a;\, b;\, \frac{x}{a}\right) = \sum_{n=0}^{\infty}\frac{1}{(b)_n}\,\frac{x^{n}}{n!}\,.$$
These confluent confluent hypergeometric functions have no singularities at all in the finite plane unless b is a negative integer, in which case the series is undefined. They can also be written in terms of the confluent hypergeometric functions as
$${}_0F_1\!\left(a;\; -\frac{x^{2}}{4}\right) = e^{-ix}\;{}_1F_1\!\left(a - \tfrac12;\; 2a - 1;\; 2ix\right).$$

The Bessel functions have a simpler representation in terms of these functions as

$$J_\nu(x) = \frac{1}{\nu!}\left(\frac{x}{2}\right)^{\!\nu}\,{}_0F_1\!\left(\nu + 1;\; -\frac{x^{2}}{4}\right).$$

From this, we can easily conclude that

$$J_{1/2}(x) = \sqrt{\frac{2x}{\pi}}\;\frac{\sin x}{x} = \sqrt{\frac{2}{\pi x}}\,\sin x\,.$$
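Both the 0F1 representation of the Bessel functions and the J_{1/2} special case are easy to confirm numerically; the order and argument in the sketch below are arbitrary.

```python
# Check J_nu(x) = (x/2)^nu / Gamma(nu+1) * 0F1(nu+1; -x^2/4)
# and the special case J_{1/2}(x) = sqrt(2/(pi*x)) * sin(x).
import mpmath as mp

nu, x = mp.mpf('1.6'), mp.mpf('2.3')
lhs = mp.besselj(nu, x)
rhs = (x / 2)**nu / mp.gamma(nu + 1) * mp.hyp0f1(nu + 1, -x**2 / 4)
print(lhs, rhs)

print(mp.besselj(0.5, x), mp.sqrt(2 / (mp.pi * x)) * mp.sin(x))
```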
As a final example, consider the dilogarithm function

$$\mathrm{Li}_2(x) = -\int_0^{x}\frac{\ln(1 - t)}{t}\,dt = \sum_{n=1}^{\infty}\frac{x^{n}}{n^{2}}\,.$$
We cannot express this function in terms of the standard hypergeometric series, but we can express it
in terms of the generalized hypergeometric series as

$$\mathrm{Li}_2(x) = x\;{}_3F_2(1, 1, 1;\; 2, 2;\; x) = x\sum_{n=0}^{\infty}\frac{(1)_n\,(1)_n\,(1)_n}{(2)_n\,(2)_n}\,\frac{x^{n}}{n!}\,.$$
The generalization should be obvious. The utility of expressing functions in terms of the
hypergeometric set comes from the large number of analytic results concerning the integrals of
hypergeometric functions. Functions that we may not be able to integrate in terms of elementary
functions can almost always be integrated by first representing them as hypergeometric functions, then
using these results. The process is essentially identical to just integrating the series term-by-term, as it
is clear that
$$\int_0^{x}{}_pF_q\big(a_1, a_2, \ldots, a_p;\; b_1, b_2, \ldots, b_q;\; t\big)\,dt
= \sum_{n=0}^{\infty}\frac{(a_1)_n (a_2)_n \cdots (a_p)_n}{(b_1)_n (b_2)_n \cdots (b_q)_n}\,\frac{x^{n+1}}{(n+1)!}$$
$$= x\sum_{n=0}^{\infty}\frac{(a_1)_n \cdots (a_p)_n\,(1)_n}{(b_1)_n \cdots (b_q)_n\,(2)_n}\,\frac{x^{n}}{n!}
= x\;{}_{p+1}F_{q+1}\big(a_1, a_2, \ldots, a_p, 1;\; b_1, b_2, \ldots, b_q, 2;\; x\big)\,,$$

but there are also some interesting results on the analytic continuation of the hypergeometric functions
that are not at all obvious from the series expansion itself. One such result allows us to write this as
$$x\;{}_{p+1}F_{q+1}\big(a_1, \ldots, a_p, 1;\; b_1, \ldots, b_q, 2;\; x\big)
= \frac{\displaystyle\prod_{k=1}^{q}\big(b_k - 1\big)}{\displaystyle\prod_{j=1}^{p}\big(a_j - 1\big)}
\Big[{}_pF_q\big(a_1 - 1, \ldots, a_p - 1;\; b_1 - 1, \ldots, b_q - 1;\; x\big) - 1\Big]$$

as long as none of the parameters is 1. This is why we can't write the dilogarithm function in terms of
the standard hypergeometric function. We can, however, write it as the limit

$$\mathrm{Li}_2(x) = \lim_{\epsilon\to 0}\frac{1}{\epsilon^{2}}\Big[{}_2F_1(\epsilon, \epsilon;\, 1;\, x) - 1\Big]
= \lim_{\epsilon\to 0}\frac{1}{\epsilon^{2}}\sum_{n=1}^{\infty}\frac{(\epsilon)_n\,(\epsilon)_n}{(1)_n}\,\frac{x^{n}}{n!}
= \lim_{\epsilon\to 0}\sum_{n=1}^{\infty}\frac{\big[(1+\epsilon)_{n-1}\big]^{2}}{(n!)^{2}}\,x^{n}
= \sum_{n=1}^{\infty}\frac{x^{n}}{n^{2}}\,.$$
The structure of hypergeometric functions is vast, and whole courses have been devoted entirely to this
subject in the past. There is A LOT more where this came from.
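Both the 3F2 representation of the dilogarithm and the term-by-term integration rule can be verified numerically; the sample point and parameters below are arbitrary choices.

```python
# Check Li_2(x) = x * 3F2(1,1,1; 2,2; x) and the integration rule
#   int_0^x 2F1(a,b;c;t) dt = x * 3F2(a,b,1; c,2; x)   at a sample point.
import mpmath as mp

x = mp.mpf('0.35')
print(mp.polylog(2, x), x * mp.hyper([1, 1, 1], [2, 2], x))

a, b, c = mp.mpf('0.5'), mp.mpf('1.2'), mp.mpf('2.7')
lhs = mp.quad(lambda t: mp.hyp2f1(a, b, c, t), [0, x])
rhs = x * mp.hyper([a, b, 1], [c, 2], x)
print(lhs, rhs)
```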

Exercises for Section XIII.6:


In problems 1–8, write the given function in terms of the generalized hypergeometric function. Also
write the given function in terms of elementary functions or associated integrals.
1.

xk

k k!

2.

k 1

5.

k
k 1

xk
2
k!

xk

(k 2) k !

3.

k 0

6.

xk

(2k 1) k !
k 0

k 0

7.

xk

k
k 1

xk

(k 1)(k 5)

4.

k xk

(k 1)(k 3)
k 1

8.

k
k 1

xk
(2k 1)

9. Find the derivative of the generalized hypergeometric function


$${}_pF_q\big(a_1, \ldots, a_p;\; b_1, \ldots, b_q;\; x\big)$$
in terms of another generalized hypergeometric function.

10. Derive the expression given in the text for the integral of a generalized hypergeometric function
in terms of a generalized hypergeometric function with the same values of p and q.


Section XIII.7: Sound Waves


As is obvious from their name, sound waves are also waves. We can derive an equation for the
propagation of sound waves through a given medium by analyzing the reason for this propagation,
exactly as we did above for other mechanical waves. Sound waves can be thought of as displacement
waves or as pressure waves. The disturbance they are associated with is atoms or molecules of the
medium being displaced from their equilibrium locations. Consider a medium consisting of a long
cylinder of radius a whose axis coincides with the x-axis in which we have introduced the displacement wave s(x, t). Molecules that are supposed to be at x are instead located at x + s(x, t) at time t. A small section of this cylinder is illustrated in figure 73. The solid lines are at x and x + Δx, and the dashed lines are at x + s(x, t) and x + Δx + s(x + Δx, t). These dashed lines represent the location of the molecules that are supposed to be at x and x + Δx, respectively. This is obviously not the equilibrium configuration, so there must be a force that is trying to restore the medium to its equilibrium. We can think of this force as resulting from the pressure necessary to accomplish the change in volume associated with this displacement. The volume of these molecules is supposed to be AΔx, but in the displaced medium it is given by $A\big[\Delta x + s(x + \Delta x, t) - s(x, t)\big]$. The change in volume associated with the wave is given by
$$\Delta V = A\big[s(x + \Delta x, t) - s(x, t)\big] \approx A\,\frac{\partial s(x, t)}{\partial x}\,\Delta x\,.$$
We can relate this change in volume to the pressure required to produce it by using the Bulk modulus
of the material, defined by
$$B = -V\,\frac{\partial p}{\partial V}\,.$$
According to this, we have the pressure requirement
$$\Delta p = -B\,\frac{\Delta V}{V} = -B\,\frac{\partial s(x, t)}{\partial x}\,.$$
The difference in pressure exerted at $x + \Delta x/2$ and at $x - \Delta x/2$ generates the force
$$F = -A\Big[\Delta p\big(x + \tfrac{\Delta x}{2}, t\big) - \Delta p\big(x - \tfrac{\Delta x}{2}, t\big)\Big]
= BA\;\frac{\dfrac{\partial s}{\partial x}\big(x + \tfrac{\Delta x}{2}, t\big) - \dfrac{\partial s}{\partial x}\big(x - \tfrac{\Delta x}{2}, t\big)}{\Delta x}\;\Delta x
\approx BA\,\frac{\partial^{2} s(x, t)}{\partial x^{2}}\,\Delta x$$

on the material lying between these two parts of the medium. Newton's second law then gives
$$m\,\frac{\partial^{2} s(x, t)}{\partial t^{2}} = \rho\,A\,\Delta x\,\frac{\partial^{2} s(x, t)}{\partial t^{2}} = B\,A\,\frac{\partial^{2} s(x, t)}{\partial x^{2}}\,\Delta x\,,$$
or
$$\frac{\partial^{2} s(x, t)}{\partial t^{2}} = \frac{B}{\rho}\,\frac{\partial^{2} s(x, t)}{\partial x^{2}}\,.$$
This is obviously the wave equation for a wave with speed $v = \sqrt{B/\rho}$.

[Figure 73: a small section of the cylindrical medium between x and x + Δx; solid lines mark the equilibrium positions and dashed lines the displaced positions.]
The most important thing to come out of this analysis is the speed $v = \sqrt{B/\rho}$ of sound. This can be calculated easily simply from the bulk modulus B and the density ρ of the medium. It follows the same pattern we saw earlier, as the bulk modulus represents the restorative property of the medium and the density represents its inertial property. The bulk modulus defined above can be written as
$$B = -V\,\frac{\partial p}{\partial V}\,,$$
but it is unclear how this derivative should be taken. We will get different results for a derivative taken at constant temperature than for one taken at constant internal energy, so we must take care to define this quantity properly before we use it to determine the effective sound speed. For sound waves with large frequencies, there is very little time associated with the changes we are interested in. We cannot expect any heat flow between the different parts of the medium in a single period of the sound wave, so it is appropriate to use the adiabatic bulk modulus
$$B_{\text{adiabat}} = -V\left(\frac{\partial p}{\partial V}\right)_{\!S},$$
which takes the derivative at constant entropy. This distinction is not as important for solids and
liquids, but for gases it makes a huge difference. If the entropy of a gas is constant, then its pressure
and volume are related by
$$p\,V^{\gamma} = \text{constant}\,,$$
where $\gamma = c_p/c_V$ is the ratio of specific heats. Differentiating this expression gives
$$V^{\gamma}\,dp + \gamma\,p\,V^{\gamma - 1}\,dV = 0
\qquad\Longrightarrow\qquad
\left(\frac{\partial p}{\partial V}\right)_{\!S} = -\,\gamma\,\frac{p}{V}\,,$$
so the appropriate bulk modulus for a gas at pressure p is given by
$$B_{\text{adiabat}} = \gamma\,p$$
and the speed of sound in a gas at equilibrium pressure p is
$$v = \sqrt{\frac{\gamma\,p}{\rho}} = \sqrt{\frac{\gamma\,p\,V}{m}} = \sqrt{\frac{\gamma\,n\,R\,T}{m}} = \sqrt{\frac{\gamma\,R\,T}{\text{molar mass}}}\,.$$
This expression is familiar from our earlier treatment of the Maxwell-Boltzmann speed distribution; the ratio of sound speed to root-mean-square molecule speed is given by
$$\frac{v_{\text{sound}}}{v_{\text{rms}}} = \sqrt{\frac{\gamma}{3}}\,.$$
You will recall that the ratio $\gamma = 1 + 2/\nu$, where ν is the number of degrees of freedom associated with the gas molecules. This is clearly less than 3, so the speed of sound is always less than the root-mean-square speed of the molecules of the gas in which it is propagating. In air, with ν = 5 and molar mass 28.8 grams per mole, the speed of sound at 300 K is approximately 348.2 meters per second. The speed at STP (0 °C) is 332.3 meters per second, a little higher than the measured value of 331.5 meters per
second. This difference is attributed to the presence of argon at slightly less than 1% by volume in the
atmosphere. A more sophisticated treatment is required to properly include this contribution, as argon
is a monoatomic gas and has a different number of degrees of freedom than its diatomic counterparts
oxygen and nitrogen.
The treatment of solids and liquids is simpler, as the bulk modulus is largely independent of the
manner in which the derivative is taken in this case. The bulk modulus of water is 2.2 GPa at standard pressure, and its density is 1000 kg/m³, so the speed of sound in water is approximately 1500 meters per second. The bulk modulus of steel is 160 GPa and its density is about 7900 kg/m³,
so the speed of sound in steel is approximately 4500 meters per second. The bulk modulus and density
both change as the external pressure and temperature of the material changes, so we must be careful to
correct for this when treating materials at high temperature or pressure. The values given here are
appropriate for standard temperatures and pressures.
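The speed estimates quoted in this section follow directly from v = sqrt(γRT/M) for an ideal gas and v = sqrt(B/ρ) for liquids and solids; a few-line check using the figures given above:

```python
# Speed of sound estimates from v = sqrt(gamma*R*T/M) (ideal gas) and
# v = sqrt(B/rho) (liquids/solids), using the figures quoted in the text.
from math import sqrt

R = 8.314            # J/(mol K)
gamma = 7 / 5        # diatomic gas, nu = 5 degrees of freedom
M_air = 0.0288       # kg/mol, approximate molar mass of air

print(sqrt(gamma * R * 300.0 / M_air))     # ~348 m/s at 300 K
print(sqrt(gamma * R * 273.15 / M_air))    # ~332 m/s at 0 C

print(sqrt(2.2e9 / 1000.0))                # water: ~1500 m/s
print(sqrt(160e9 / 7900.0))                # steel: ~4500 m/s
```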
The pressure wave is given in terms of the displacement wave by
$$\Delta p(x, t) = -B\,\frac{\partial s(x, t)}{\partial x}\,.$$
In one-dimensional cases, this wave is 90 degrees out of phase from the displacement wave, meaning
that the maximum pressure fluctuations occur at places where the displacement is zero and that the
maximum displacement fluctuations occur where the pressure takes its equilibrium value. This can be
easily understood in terms of the physics of the situation by considering the fact that positions of
maximum displacement displace neighboring regions by the same amount, so the pressure takes its
equilibrium value; there is no change in volume associated with these regions. At places where the
displacement wave has a node, places before and after the node have opposite displacement. If the
displacement wave changes from positive to negative, then the node is associated with a congregation
of molecules: molecules to the left of the node are pushed forward, and those to the right are pushed
backward. This naturally generates large pressure at the node itself, in agreement with the expression
above. In two dimensions, the displacement wave and the pressure wave have somewhat different
forms, but the main results are the same.
Sound waves have essentially the same properties as those given above for transverse waves.
The main difference is that sound is a longitudinal wave associated with disturbances in the same
direction as that associated with propagation, while the earlier waves we considered are transverse
waves in which the displacement is perpendicular to the direction of propagation. This difference is
most apparent in the propagation speeds of the waves, and does not modify the mathematical analysis
in any fundamental way. We can consider a sound wave as either a displacement wave or as a
pressure wave. The partial differential equation satisfied in either case is the same, as can easily be
seen by differentiating our displacement wave equation with respect to x:
$$\frac{\partial^{2} s}{\partial t^{2}} = \frac{B}{\rho}\,\frac{\partial^{2} s}{\partial x^{2}}
\qquad\Longrightarrow\qquad
\frac{\partial^{2} \Delta p}{\partial t^{2}} = \frac{B}{\rho}\,\frac{\partial^{2} \Delta p}{\partial x^{2}}\,.$$
The boundary conditions and initial conditions are, of course, different in each case, but the equation is
the same. The physical situation we are considering dictates which of these points of view is the most
simple to use. There are many other results that come from this analysis, but these are either fairly
obvious or more specific to the physical problem at hand. For this reason, we turn now to another
differential equation whose solutions follow the same prescription.


Exercises for Section XIII.7:


In problems 1–4, determine the speed of sound in the given material. The bulk modulus and density
are given, in gigapascals and grams per cubic centimeter, respectively.
1. Mercury, B = 28.5, ρ = 13.546.

2. Aluminum, B = 76, ρ = 2.699.

3. Iron, B = 170, ρ = 7.874.

4. Magnesium, B = 45, ρ = 1.738.

5. This problem is meant to present a rudimentary treatment of the speed of sound in a medium
that consists of multiple atomic species with different sound speeds. The example we will work
with is (dry) air, composed of 78.09% nitrogen, 20.95% oxygen, 0.93% argon, and 0.03% other
gases, by volume.
(a) The treatment in the text considers the molar mass as the weighted average of the molar masses of nitrogen and oxygen. Show that this process is equivalent to adding the squares of the speeds in reciprocal, $1/v^{2} = 1/v_1^{2} + 1/v_2^{2}$, where each factor is weighted by the ratio of molecules in a given volume to the total number of molecules in the volume.
(b) Show that, if adding the inverses of the speeds of the constituent molecules in quadrature is
appropriate, then we should replace the molar mass with the molar mass divided by the ratio
of specific heats in order to determine the molar mass appropriate to the speed of sound in
the mixed medium.
(c) Determine the number of nitrogen, oxygen, and argon atoms in a cubic meter of air at a
pressure of 1 atm = 101,325 Pa in thermal equilibrium at 273.15 K. Use this result to
determine the speed of sound in air, ignoring the other 0.03% entirely and assuming that the
technique of adding separate speeds in inverse quadrature is appropriate.
(d) Explain why the accuracy claimed in the above results is false. What contributions can the
speed of sound have that we have not accounted for?
(e) How would the presence of other molecular species in the atmosphere change the speed of
sound if they were diatomic with masses greater than oxygen?


Section XIII.8: The Diffusion Equation


Consider a heat source at a given location in a region we are interested in. How does the heat
energy flow into the surrounding region? In particular, how does the temperature of the surrounding
regions depend on time? As always, these questions are answered by breaking the region itself into
many small regions and considering how the energy flows through these small regions. Consider a
tiny element of the region consisting of a cylinder of length Δx and cross-sectional area A. If the
temperatures at either end of this cylinder are different, then energy will flow through the region from
higher temperature to lower temperature. The rate of flow is dictated by the thermal conductivity of
the intervening medium:
$$\frac{\Delta E}{\Delta t} = \kappa\,\frac{A}{\Delta x}\,\big[T(x + \Delta x, t) - T(x, t)\big] \approx \kappa\,A\,\frac{\partial T(x, t)}{\partial x}\,.$$
x
This equation represents energy flow from the region associated with x x to the region associated
with x. If the neighboring part of the medium on the other side has temperature that differs from the
temperature at x, then there will also be energy flow in this direction:
E
A
T ( x x, t )
.

T ( x) T ( x x) A
t
x
x
The net energy flow into this part of the medium is therefore given by
$$\frac{\Delta E}{\Delta t} = \kappa\,A\,\frac{\partial T(x, t)}{\partial x} - \kappa\,A\,\frac{\partial T(x - \Delta x, t)}{\partial x} \approx \kappa\,A\,\frac{\partial^{2} T}{\partial x^{2}}\,\Delta x\,.$$

The change in temperature associated with this energy flow is determined by the specific heat of the
material, so we can write this result as
$$\frac{\Delta E}{\Delta t} = m\,c_p\,\frac{\partial T}{\partial t} = \rho\,A\,\Delta x\;c_p\,\frac{\partial T}{\partial t} = \kappa\,A\,\frac{\partial^{2} T}{\partial x^{2}}\,\Delta x
\qquad\Longrightarrow\qquad
\frac{\partial T}{\partial t} = \frac{\kappa}{\rho\,c_p}\,\frac{\partial^{2} T}{\partial x^{2}}\,.$$
This equation is similar to the wave equation in that it relates a time derivative to a spatial derivative,
but in this case the time derivative is only of first order. This last equation is called the diffusion
equation because it allows one to consider how a quantity diffuses from one region of a medium to
another. It makes sense from a physical perspective, as a positive second derivative of the temperature
with respect to position indicates that the temperature is greater on both sides, so the energy flow must
lead to an increase of temperature. We have derived this equation using the idea of temperature, but
the same equation will result from discussions of pollution diffusing into a reservoir or a new species
diffusing into an area in which it can prosper.
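Since the normal-mode treatment of the diffusion equation mirrors that of the wave equation, a short numerical sketch may help: for a rod held at temperature zero at both ends, each sine mode simply decays exponentially. The initial profile, diffusivity (playing the role of κ/ρc_p), and truncation order below are illustrative assumptions.

```python
# Fourier sine-series solution of T_t = D * T_xx on [0, L] with T(0,t)=T(L,t)=0.
# Each mode sin(n*pi*x/L) decays as exp(-D*(n*pi/L)**2 * t).
import numpy as np

L, D = 1.0, 0.1
x = np.linspace(0, L, 201)
T0 = x * (L - x)                      # illustrative initial temperature profile

def coefficient(n):
    return 2 / L * np.trapz(T0 * np.sin(n * np.pi * x / L), x)

def T(x, t, nmax=50):
    return sum(coefficient(n) * np.exp(-D * (n * np.pi / L)**2 * t)
               * np.sin(n * np.pi * x / L) for n in range(1, nmax + 1))

for t in [0.0, 0.1, 1.0]:
    print(t, T(x, t).max())           # the profile decays toward equilibrium T = 0
```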
The process of solving the diffusion equation is quite similar to that used in solving the wave
equation, except only one initial condition is required and the time dependence of solutions is
exponential rather than trigonometric in character. The diffusion equation does not lead to solutions
that are oscillatory in nature. Instead, it leads to solutions that tend directly to an equilibrium
configuration as time goes on. The energy or diffusing material will tend to spread out evenly over the
system as time goes on, in contrast to the situation found with the wave equation. The boundary
conditions associated with this equation are often the most difficult to determine, as it is not clear what
we should require at the endpoints. If the medium is not in contact with any other medium with which
it can exchange energy, then we require the derivative of the temperature with respect to position to be
zero at each endpoint. This prevents any flow of energy out of the system, and allows us to think of
the boundary as a place at which the concentration of any pollutants cannot change from any exterior
agent. For this reason, free boundary conditions are much more prevalent in diffusion equation
problems than in those concerning the wave equation.

You can practice the techniques associated with solving the diffusion equation by considering a
one-dimensional region in which the initial temperature is twice as large on one side as on the other, or
a circular region in which the temperature initially varies in some given manner throughout the disk.
These techniques are very similar to those described in chapter 5, so I will not go into details here.
Different boundary conditions can be imposed if the region of interest is in contact with a thermal
reservoir at either of its boundaries. In this case, the boundary conditions will fix the temperature at
the boundary to the temperature of the surrounding reservoir, and the solutions will be similar to those
found above for rigid boundaries. If the temperature associated with the boundaries is different, then
you may need to shift the boundary conditions around in the manner described in section 13.2 so as to
obtain homogeneous boundary conditions.
If the region we are considering has a non-constant thermal conductivity, then the appropriate
equation is given by
$$\frac{\partial T}{\partial t} = \frac{1}{\rho\,c_p}\,\nabla\cdot\big(\kappa\,\nabla T\big)\,.$$
These situations are more involved, but the standard ideas are the same. The next chapter indicates
one way of treating this situation, but there are certainly others.

Exercises for Section XIII.8:


In problems 1–8, determine the solution u(x, t) to the diffusion equation with the given boundary
conditions and initial condition. Plot your solution for several values of t, and explain what happens as
t grows without bound. Give an example of a physical situation in which this problem would arise, and
explain why your graphs and asymptotic behavior are appropriate.
3
1. u (0, t ) 0 , u ( L, t ) 1; u ( x, 0) x L .
2. ux (0, t ) 1 L , u ( L, t ) 0 ; u ( x, 0) 1 x L x L .
3. u x (0, t ) 0 , u ( L, t ) 1; u ( x, 0) x L .
2

4. u x (0, t ) 3 L , u x ( L, t ) 0 ; u ( x, 0) 3 x L x L .
3

5. u x (0, t ) 1 L , u x ( L, t ) 3 L ; u ( x, 0) 2 x L x L .
2

6. u (0, t ) 2 , u x ( L, t ) 2 L ; u ( x, 0) 2 x L .
2

7. u (0, t ) t c p L2 , u ( L, t ) 0 ; u ( x, 0) 1 x L x L .
8. u (0, t ) t c p L2 , ux ( L, t ) 0 ; u ( x, 0) 1 x L x L .
2


Section XIII.9: Summary of Chapter XIII


This chapter is focused on the use of infinidimensional linear algebra to solve partial
differential equations. The wave and diffusion equations are the only ones that are treated, but the
analysis associated with this chapter is applicable to any linear partial differential equation that is
separable in the sense that a trial solution given by a product of functions that depend on one variable
only can be separated into a set of ordinary differential equations. The constants associated with this
separation play the role of eigenvalues of the differential operators of the separate equations. These
differential equations usually take the famous Sturm-Liouville form, and the eigenfunctions of the
associated differential operators are known to span the space of functions on which they act. The
proof of this statement is not easy, but it can be found for certain cases in chapter 15.
The result of all this analysis is that the set of solutions to the partial differential equation that
satisfy the boundary conditions is given by an infinidimensional vector space whose basis is the set of
eigenfunctions of the Sturm-Liouville operator multiplied by a factor containing the appropriate time
dependence. These basis vectors are called the normal modes of the system, and the separability of the
partial differential equation ensures that the normal modes decouple completely from one another; any
linear combination of the normal modes satisfies the partial differential equation along with the
boundary conditions. The specific linear combination of these basis functions associated with a given
well-defined physical problem is determined by the initial conditions. Just as the boundary conditions
are imposed at fixed positions for all time, the initial conditions are imposed at fixed time for all
positions. Using the initial conditions to fix the linear combination at one time then solves the
problem completely, as the normal modes separately satisfy the differential equation and the boundary
conditions for all time.
This treatment is most efficiently done when the boundary conditions are homogeneous in the
sense that the zero function satisfies them. If this is not the case, then this technique is sort of
awkward when approached directly. It is far more efficient to shift inhomogeneous boundary
conditions around to the initial conditions and the partial differential equation itself, resulting in an
equivalent partial differential equation with homogeneous boundary conditions. We then solve this
modified equation and transform the solution back to the original form. These shifting games require
some practice to get used to, but they can always be accomplished if the original problem is well-posed.
The solutions to the Sturm-Liouville problem associated with the wave equation in one
dimension are the trigonometric functions familiar from algebra II and pre-calculus, so this is an
appropriate place to start. The analogous functions in two dimensions depend on the shape of the
boundary, something that is not possible in one dimension. Rectangular membranes are treated simply
as one-dimensional motion done twice, and the trigonometric functions appear again. Circular
membranes, on the other hand, require the introduction of the Bessel functions. These functions are
qualitatively very similar to the trigonometric functions, but there are some important quantitative
differences. For this reason, a fairly detailed study of the properties of Bessel functions and their more
general relatives, the standard and confluent hypergeometric functions, follows the discussion of
waves in two dimensions. Hypergeometric functions lead directly to the study of orthogonal
polynomials, which are prevalent in applications of Sturm-Liouville differential equations whether
they arise from a solution of the wave equation or not. An introduction to the study of special
functions is very important to applied mathematicians and engineers, as they come up repeatedly in
idealized physical models. The treatment of these special functions is quite different from that of more
familiar functions, as we often must turn to the differential equation itself for guidance. The treatment
given here is not intended to be completely comprehensive, but it will hopefully allow you to follow
more sophisticated treatments when they come up in the future.
715

Advanced Mathematical Techniques

The special cases of sound waves and the diffusion equation are treated in the last two sections
of this chapter, the former primarily because it is interesting to have an expression for the speed of
sound in various media and the latter primarily to give an example of a partial differential equation that
contains only the first temporal derivative. The treatment of these situations is entirely analogous to
that found in the rest of the chapter, so these sections are fairly short. Partial differential equations
found in applications usually can be understood in terms of either the wave equation or the diffusion
equation when the appropriate set of coordinates is used, at least for small fluctuations about an
equilibrium. More complicated partial differential equations, like those arising in general relativity,
often require a numerical treatment that is beyond the scope of this book.
Boundary conditions play a fundamental role in the solution of partial differential equations.
This role is easily understood from both the mathematical and physical perspectives, but it leads to
tremendous problems when we attempt to solve a partial differential equation numerically. This
process even finds difficulty in one dimension, as the standard techniques used to solve ordinary
differential equations are not easily modified to incorporate boundary conditions. One fairly simple
example, associated with waves moving through a region that is either dispersive or supports different
wave speeds in different regions, is treated in detail in the next chapter. These problems are set up in
essentially the same manner as those found in this chapter, but the resulting eigenvalues are treated in a
somewhat different manner. This can be thought of as a sort of semi-analytic solution to the
equation, as the separation takes place analytically while the resulting integrals must be obtained
numerically. Partial differential equations with more complicated boundary conditions, like those for
which the boundary is a strange shape, must be solved in a completely numerical manner by
discretizing the differential operator itself. Boundary conditions must be treated very carefully in this
approach, and there are several different techniques that can be employed to properly take these
conditions into account. The interested reader is referred to H. F. Weinberger's text A First Course in Partial Differential Equations, chapter 12, for an introduction to these techniques and Abramowitz and Stegun's AMS-55 reference Handbook of Mathematical Functions, chapter 25, for an exhaustive
treatment of different approximation schemes for the Laplacian operator and other partial differential
operators.


Answers to Selected Exercises

Answers!!!
Chapter I
Section I.1

1.

(1)

2 n 1

3.

n0

7.

(2 x) x

3 n 1

9.

n0

3 x

.
6

Section I.3
1. Div. with harmonic.
3. Conv. r = 1/2.
5. Div. with glacial
7. Conv. p = 3/2.
11. Div. 13. 0.6941725 S 0.6941735 .
15. 1.1447912 S 1.1447922 .
17. 21.948 S 21.958 . Need 1.52 10 terms.
19. For a billion, 2.06126 S 2.06172 . Cannot
improve relative error, as it is based on bounds
determined by the number of exact terms kept;
1000 will reduce the relative error by a factor of 3.
21. 1.5105 S 1.4104 .
21

Section I.4

1. R = 3; ratio

3. R

7. R 3 2 ; root

9. R ; root.

13. x 1 .

11. R = e.

2 x

; R

k 0

7.

k 0

; R.

n 1

( 1) 2 x
n

3n

; R.

(3n) n !

n 1

5 n 1
1 2 3 x
k 2 5n 1 ; R 5 2 3 .
n0

n 3n

1 2 2 x
3
17.
3n ; R 1 2 .
n 1 n
n

15.

6 n 1
1 2 5 x
6

9 6n 1 ; R 9 5 .

n 1 n
n

19. 3

21. 2 9! 1.4 10 Lagrange ;


9

2 10! 2.3 10 Alternating . Need n > 15, so


Lagrange indicates need for inclusion of n = 15.
Alternating sees that there is no n = 15. 8 terms
required in either case.
23. 2 3 .
25. ln 8 3 . 27. Need 9 terms.
10

Approx. is 0.23372 and exact is 0.233580 .


29. Need 7 terms. Approx. is 0.407199209 and exact
is 0.40719918335....
31. 20! 5 8! 2.357 10 .
33. (a) Div. with harmonic. (c) Conv. with p = 3/2.
(d) Conv. with p = 3/2. (f) Conv. with p = 2.
8

35.

19

1 2 x cos x 1
2

Coeffs tend to 2

n 2

3 x

( 1) 3
k

2 k 1

2k 1

k!

23
24

x
4

569
720

x .
6

asymptotically. Not all have

7. Conv.

Section I.7

1 2.

3. 35 2 .
17

5.

8
3

ln 2 1 3 .

7. 2 6 ln 3 9 ln 3 .
2

9. 8 3 ln 2 ln 2 1 16 9 ln 2 16 27 .

4k

2 k

( 5) x
k

k 1

5.

2k

2k

3. ln 2

13.

1. 16 .

to do this, but infinitely many must. R 1 2 .

Section I.6

k 1

5 ; ratio

15. Div. with harmonic.

Section I.5
1. Conv. 3. Conv. 5. Div.
9. 0.394 S 0.295 .

k 1

(2k )(2k )!
k 1

Section I.2
1. Conv. r = 3/4
5. Conv. r = 6/7
7. Div. r = 5/4
9. Conv. r = 2/3
11. Conv. r = 3/4
13. Conv. r = 6/7
17. If d 1 , then b > d. If d < 1, need b < 1. Results
of the next section extend this to b = d > 1 ok if
c a > 1 and b = d = 1 ok if max(c, e) 1 > a.

1.

( 1)

11.

n0

5 x

3
n0

1 2 3
k 2 x 4 k ; R 4 2 3 .
k 0

9.

; R

2 3.

10 k 5

; R

13.

11. 2 ln 2 8 ln 2 24 ln 2 48 ln 2 48 .
4

13. 5 169 . 15.

( 1)

0.7834305107 . 7 terms
n
give 0.7834305678, while 6 give 0.78342935.
n

n 1

2k

; R.


Chapter II

Section II.1
1. Div. with p = 1/2. 3. Conv. with p = 2.
5. Conv. by ratio. 7. Conv. with p = 2. 9. Div to 0.
11. Conv. with p = 3. 13. Div. with harmonic.
15. Div with p = 2/3.

1 2k . (b) 1 ln k .
17. (a)

3k 1
k
k 1
k 2
(c) The product is the exponential of the sum. The
exponential function is not zero in the finite
plane, so one of the terms must give zero; the
tail cannot supply this.

sin 2

3.

13. 8

729 .

19.

sin 2
2

5. 9 3
2

3
3 144 .

36 . 17. 7

729 .

. They do not

3 3
look the same, but are equal. Nontrival to
(e) 1

3.

23. Conv. for all x except at zeros n 1 for whole n.


2

25. (a) Need cn an

when n > N for some positive

constants N, a, and . (c) No. (e) Yes. (f) No.

13. 2

21.
23.

ln12 2 .

4 3

4 3 ln 2 4 3 .

3 1

5.

9. 945/4.

2 5

1 Arctan 2 .

sin

3 ln 2 .

3 1 ln 10 4Arctan(3)

.
50
12 1 6 ln 13 5Arctan 3 2

terms for 30 billion. Improved series

behaves like 1 8 , leading to ln 6 ln 8 0.86


times the number of terms required. Significant for
5 or 20, but insignificant for 30 billion. Would

4 3 5 10 .

11.

17.

30,000

terms to keep only


need to re-sum about 10
100,000 in the remaining sum. Not feasible.

2 27 .

Section III.3

13.
sin Arctan 1 2 .
4
2

20 5
4

1. 32/315.

2.

7.

3 3

3. 32 . 5.

9. -77/1200.

11. 115/864.

5 5 6 2 3

13. 1 12 . 15. 4 72 (3) 131 108 .


2


15.

.
2

2 ln 2 .

3.855 10

3. 3 128 .

3
ln 2 .

.
169
25. Need 4 terms for 6 digits, 21 for 20 digits, and

Section III.1

7.

3.

2 6 2 4 (3) .

10

ln 2 1 2 ln 2 1 1 .
3

19. 2 ln 2 4 8 .
17.

Chapter III

1. 315/8.

2 ln 2 2 2 3 2 4 ln 2

11.

15.

( x 1) 2
1 9k 2
k 1

3. 1 2 6 2

4 ln 2 3 2 ln 2 28 (3) 6

(b) Compare to p = 2.

(1 x )

show.

9.

6 3

15. 2

5 3

. 5. cos 3 2 .

21. (a) Compare to harmonic.


2

(c)
sin (1 x ) .
3

3
(d)

Section III.2
1 2
1.

ln 3 .
9 27

9. 9450 . 11. 5 1536 .

7. sinh(3) 3 .
4

. Note that, while


(1 x ) 3 (1 x ) 3
dependence on x cancels in the exponent, it still
makes contributions to the final result.
23. 2 3 2 . 24. Value is -2.
21. 4 .

7.

Section II.2
1.

1 3

19.



13, 285 710 6 4680 (3)
2

17.

1620

19. 1 2

5 12
2

(b)

4 107 .

1440

ln

24 .

21. 4 3 ln 2 4(4 ln 2) ln 2 7 (3) 24 .


2

23.

8
1125

328 45 ln 2

25. 3 8 .

4
27

Section IV.1
31. .

29. 1/24.

1. f ( x )

39. 2 3 3

81 .

3 4

15 2

. 49. 4 ln 2 . 51. 12 ln 2
55.

3 4 (3) .

f ( x)

2 2 d

d 2

d 2 d 2
d 2

Section III.4
2

54 . 3. 2 (3) ln 2 . 5.
3

15
64

2 7 2 .

3 2 2 ln 2 (3) 2 (3)
11. (3 2 ) (2) (3) 2 (2) (3) .
7. 3 (3) 8 . 9.

13.
21.

48 .

17. 1 1 ln 4 .

15. ln 2 .

8.

19. -1/252.

(5) . 24. Zeros at 1 2 in ln 2 for

Section III.5
1. ln 2 2 .

3.

5. ln 8 (5 2 ) 4 .

2 ln 2
7. 3 ln

x t

ix

Re

n 1

( N 1)!
N

it

N 2

n 1

( 1) (2 n 1)!!
3 2

2 n 1

n 1 1 2

( 1) (2 N 3)!!
N

2 ( x t )

f ( x)

n 1

7.

n 1

( 1) (2n 1)!!
2

3x
2

n3 2

( 1) (2 N 3)!!
N

2
3 x u

du
N 1
N 5 2
3x
2
u
0.004352656 ≤ f(2) ≤ 0.004361315, with the exact result of 0.00435698 in between. 10 or 11 terms optimal.
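The bracketing pattern quoted here, with consecutive truncations of an asymptotic series straddling the exact value and an optimal cutoff near the smallest term, can be reproduced with any standard asymptotic expansion. A minimal Python sketch, using the complementary error function as a stand-in since the exercise's own integral is not reproduced here:

```python
import math

def erfc_asymptotic(x, n_terms):
    """Partial sum of erfc(x) ~ exp(-x^2)/(x*sqrt(pi)) * sum_n (-1)^n (2n-1)!!/(2x^2)^n."""
    prefactor = math.exp(-x * x) / (x * math.sqrt(math.pi))
    total, term = 0.0, 1.0
    for n in range(n_terms):
        total += term
        term *= -(2 * n + 1) / (2.0 * x * x)   # next term: (-1)^(n+1) (2n+1)!! / (2x^2)^(n+1)
    return prefactor * total

x = 2.0
exact = math.erfc(x)
for n in range(3, 9):
    print(n, erfc_asymptotic(x, n), exact)
# Consecutive partial sums land alternately above and below erfc(2) = 0.0046777...,
# so neighboring truncations bracket the exact value, just as in the answers above;
# the tightest bracket occurs near the smallest term of the series.
```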
2


5 5

(3n)!

.
3 n 1
x
Optimal number of terms is 5 (N = 4); N = 4 and 5
give 0.0199538 ≤ f(10) ≤ 0.01995408, with the exact value of 0.01995397466 in between.

9. Integral has the expansion

n0

4
nonzero integer n. If zeta is zero, so is eta.
4

dt
2N
N 2 1 2
x
3 2
t
0.020659 ≤ f(3) ≤ 0.02602, with the exact result of 0.023335 in between. 4 or 5 terms optimal.

1.

n 1 n

volume is 16 105 for d = 5.


60. (a) Re-write the volume element as the hyper-area
of a (d 1)-dimensional sphere with radius k
times dk.

n 1

5.

surface area is 16 15 for d = 7, and maximal

(4 )

dt
x
x
i
t
-0.0091168 ≤ f(7) ≤ -0.0084577, with the exact result of -0.008499 in between. 6 terms optimal.

3.

1
1
1

k
57. ( 1) k ! ( k ) ( k 1) ( k 2) ; k 3
2
6
3

59. Maximum area is achieved near d = 7.257, and


maximum volume near d = 5.257. Maximal

( 1) ( N 1)!

n 1

53. (3) .

(b)

n!

( 1) i n ! e

3. f ( x ) Re

14

n 1

n 1

41. 16 . 43. 3 512 . 45. (1 4 ln 2) 64 .


47.

( 1)

dt .
n 1
N 2
x
x
t
0.02592 ≤ f(5) ≤ 0.0336, with the exact result of 0.0295778 in between. 5 terms optimal.

35. 2 8 .

27 .

sec 18 .

Chapter IV

5.

27. 4 9 3 .

33. 3 3 2
37.

4.

2 (3 2 ) 4 .

9. ln 2 .

11. 1 3 2 ln 2 .

13. 1 2 .

15. (a) a = -1, b = -5/6, c = -3/8. No.

11. The expansion is

( 1)

3n 1 2 !

.
( 3 n 1) 2
x
Optimal number of terms is 5 (N = 4). The bounds
are 0.349509 ≤ f(6) ≤ 0.3507687, with the exact result of 0.35013667 in between.
n 0

n 1

( 1) n !

( 1) ( N 1)!
N

x t

2 ( n 1)
x t N 2 dt .
2 n 1 x
2
For x = 10, use 98 or 99 terms. Error is smaller

13. f ( x )

45

than 4.7 10 . First four terms will give error


less than 0.0001 whenever x 3 . The four-term
expansion for g(x) is



1

g ( x)

7x

7 2

11x

11 2

Chapter V

2
5x

15 2

24

4.4612685 10
19 2
19 x
with the number arising from the other boundary,
10. The error depends slightly on x, but is
dominated by 10 as x tends to infinity. The

maximum error is 1.65 10

11

; the approximation

Section V.1
1. (a) (14 27i ) 37 . (c) ( 1 18i ) 25 .

3. (a) 3(1 i ) 2 .

(c) (17 i ) 10 .

5. Same rectangle, larger by a factor of 5 and


rotated counterclockwise about the origin by a little
more than 26.565 degrees.

gives 4.06323423 10 , and the exact result is


5

4.0632358... 10 . Mathematica took more than


52 seconds to determine the exact result, while the
approximation took 0 seconds (it probably took
longer than that!). The difference between the

two is 1.57 10

11

( 1) x
n

(3n 2)n ! .

infinite. Error is smaller than 2.0834 10 .


(b) Expansion is

5. e

Section IV.2
1. Stirling gives 1.4763, Nemes gives 1.489192077.
Exact is 1.489192249.
3. Stirling gives 0.01122 0.006346i, Nemes gives
0.011298671422 0.006430919723i. Exact is
0.011298670181 0.006430919655i.
5. Stirling gives 0.15066 + 0.019888i, Nemes gives
0.1519040017 + 0.0198048898i. Exact is
0.1519040027 + 0.0198048802i.
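For reference, the Stirling values quoted in these answers come from the leading approximation Γ(z) ≈ √(2π/z) (z/e)^z; a minimal sketch comparing it with the exact value (the arguments used in the exercises are not repeated here, so the test value below is purely illustrative):

```python
import math

def stirling_gamma(z):
    """Leading-order Stirling approximation to Gamma(z)."""
    return math.sqrt(2.0 * math.pi / z) * (z / math.e) ** z

z = 4.3  # illustrative argument only
approx, exact = stirling_gamma(z), math.gamma(z)
print(f"Stirling {approx:.6f}  exact {exact:.6f}  relative error {abs(approx - exact) / exact:.2%}")
# The leading term is off by roughly 1/(12 z); correction terms such as the Nemes
# form quoted above reduce the error by several more orders of magnitude.
```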
4

7. Radius is e 4 .

11.

ln n 4

n 1

9. Radius is 256/27.

, gives 355.762, while exact is 355.562.

13. e
2 e , gives 3.00643104688 10 .
Exact value is hard to come by, but close. I trust
the asymptotic expression more than numerical
results in this case.
e

15. n e

n 1

3521

2 , gives 2.84623596 10

15,657

exact is 2.84625968 10

15,657

17. z ln z 1 ln


, while

2 . 23. Poles around infinity.

i (1 8 k ) 6

2 3

i ln 3 2 k

7. 8 e

, k = 0, 1, 2. Principal is k = 0, or
3i

2 3

, k . Principal is k = 0, or e

( 2 4 k ) 3

1 i 3

2 e 1 1 4 28 .

3 3 x 3 x 9 x 2 27 x 3

Need x > 4.222 for this accuracy, with 5 terms


kept (including up to the 4th power of x).
(c) Yes (expansion is entire), but need 16 terms (up
to the 15th power of x).
(d) Window is determined by finding out when the
Maclaurin expansion with six terms fails to give
accuracy of 0.0001. It is 0.89575 < x < 4.222,
though the expansions are actually a little better
than this; we used the maximum error bounds.

2e

(1 i )

Radius is

n0

3.

15. (a) Series is 3

Section V.2
i ( 3 12 k ) 8
, k = 0, 1, 2, 3. Principal is k = 0, or
1. e
1
3 4
i
2 2 i 2 2 .
2

3 2 i

i ln 3

, k . Principal is k = 0, or

2 i ln 2

8 e

2 3

2 i ln 2

45 i Arctan 2 . 11. 2 i ln 17 .

9. ln

i Log 2i 1

4i 2

2 20 cos Arctan(2) 2

4
1 20 sin Arctan(2) 2
4

Arctan
13.

i
2

ln 5

20

3 4

cos Arctan(2) 2 Arctan 1 2

15. i Log z i 1 z

20

. The value is given by

i Log 2 3i i 6 12i

6 5 cos Arctan(2) 2 3

2 6 5 sin Arctan(2) 2

Arctan

i
2

ln 13 6 5

2 78 5 cos Arctan(2) 2 Arctan 2 3

Section V.3
z2
1 & 3.
. Region in problem 3 contains the
4 z
point mapped to infinity in its interior, so maps to



7. Can't use chapter 4; need an expansion for small x.

Section V.14
1. -3i, -1/2.

5. 12 e .
9. 1/2

3. -3i/4, i/2.

11. 2

7. (3i + tan z)/2.


13. 4
15. 1/3.

17. 1 ( z 2)( z 5)( z 3i )e . Infinitely many. One


occurrence is close to -3.86689 0.43416i.
z

19. e e , so the two functions are in different


places at the same value of z; they do not have to
be zero or infinity in order for the function to equal
1 or -1. The function equals 1 at ln 2 2 ik and
2z

-1 at ln 2 3 i 4 2 ik , for all integer k. This


function has no exceptional values.

Chapter VI

1 4 x
1 i 4 x
Log
i Log

4
4
4 4 x
1 x
1 i x
1 1 i 3
3
2
Log 1 3 x x

3
2
3 x
.
3.
1 i 3 3

3
i 3 Log 1
x Log 1 x
2

3
1 x

x 2
e 2e cos
x .
5.
2
3x
2
3

3 cosh x 2 sin x
8

ln
1 1 3 .
3 3 2
1

.
2

sinh x sinh x 2 cos x 3 2


3

11. (e) No. Legendre is special because it involves


only one Gamma function with a non-integral
argument.
Section VI.2
1. Gauss gives 30.29.
3. x/ln x gives 39,509, Li gives 43,128. Dusart
bounds 42,511 ≤ π(520,000) ≤ 43,083.
5. Dusart holds that the next prime after 350,377
comes before 351,452 and the next prime after
350,381 comes before 351,556. Both are satisfied
(by a wide margin). Schoenfeld indicates that the
next prime after 350,377 should come before
350,398, which holds. His result also indicates that
the next prime after 350,381 comes before
350,402. This is not correct, as n is not large
enough to apply Schoenfeld. Dusart will be better
for asymptotically large n (very large n).
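The three estimates compared in these answers are easy to reproduce directly; a minimal sketch using a sieve for the exact count and Simpson's rule for the logarithmic integral Li(x) = ∫₂ˣ dt/ln t (run at a smaller cutoff than the exercises so it finishes quickly):

```python
import math

def prime_count(n):
    """Exact pi(n) from a simple sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = bytearray(len(sieve[p * p::p]))
    return sum(sieve)

def li(x, steps=10000):
    """Offset logarithmic integral, integral from 2 to x of dt/ln t, by Simpson's rule."""
    h = (x - 2.0) / steps
    total = 1.0 / math.log(2.0) + 1.0 / math.log(x)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) / math.log(2.0 + k * h)
    return total * h / 3.0

x = 100_000
print(prime_count(x), round(x / math.log(x)), round(li(x)))
# 9592, 8686, 9629: x/ln x undershoots pi(x) while Li(x) overshoots slightly,
# the same pattern as in the answers above.
```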


( 1) x
k

2 2 x
k 0

( k 1)!(2k 1)

Section VI.3

1. ζ(1 - s) = 2^(1-s) π^(-s) cos(πs/2) Γ(s) ζ(s).
Trivial zeros occur at negative even integers, as the
cosine function supplies the zero in this case.
There are two stumbling blocks in this evaluation.
The first is associated with writing the integral
representation of the eta function in terms of a
contour integral. When taken over the same
contour as that used for the zeta function, this
s

integral gives 2i sin( s )e ( s ) ( s ) . The second


is that the evaluation of the integral does not
directly give us the eta function. It gives the
lambda function instead. This is fixed by writing

(s)
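The reflection identity quoted in this answer can be spot-checked numerically. A minimal sketch with the mpmath library (the test point is arbitrary; any non-integer s works):

```python
import mpmath as mp

def reflection_rhs(s):
    """Right-hand side of zeta(1 - s) = 2^(1-s) pi^(-s) cos(pi s / 2) Gamma(s) zeta(s)."""
    return 2 ** (1 - s) * mp.pi ** (-s) * mp.cos(mp.pi * s / 2) * mp.gamma(s) * mp.zeta(s)

s = mp.mpf("2.7")            # arbitrary test point
print(mp.zeta(1 - s))        # left-hand side
print(reflection_rhs(s))     # agrees with the left-hand side to working precision
```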

7. 3

9.

i s

Section VI.I

1.

9.

3. (1 s )

1 2

1 s

1 2

cos s 2

(s) .
1 s

1 2

( s ) ( s ) .
s
1 2
Trivial zeros at negative even integers, including
zero.
s

Section VI.4
1. x/ln x gives 159,556, Li gives 172,360, and
Riemann gives 172,220.
3. The infinite product expansion could be multiplied
by any entire function that has no zeros. If the
zeros of ( x ) were to bunch up anywhere, then it
would not admit an infinite product expansion.
4. (d) Order is 1. The zeros become more and more
regularly spaced as s grows without bound.

5. 1 3 1 3 2 2 7 (3) 8 .
3

Chapter VII
Section VII.1
1. 29.0 ± 2.6.

3. 0.316(47).

5. 5.56(22).

7. Without error, we have y 3.69 x 3.84 x 4.32 .


2

With error, we have y 2.195 x 0.96 x 3.3 .


The error-weighted fit definitely comes closer to
the data with small errorbars than the fit without
error weighting, and the model parameters change
a great deal. This is an indication that the data is
not good enough to fix the parameters of the
model; other experiments should be done with
2



3. (a) 4.50058, (b) 4.50039. The exact value is
4.500336, so the results are very good.
Interestingly, the treatment with 0.01 gives only a
single new digit.
5. 3.34598 and 3.34553. The 0.01 result is correct to
5 digits, but the care required to properly evaluate
the correction is a bit much especially concerning
the relationship between the signs of the first and
second derivatives.

better error bounds especially in the region of


large x in order to validate the model.
(Plots: Problem VII.1.7, fits without and with error weighting.)

7. (a) r0

9. Without error, y 2.06 x 0.104 x 1.05 ln x .


2

With error, y 2.282 x 0.1476 x 1.061ln x .


The coefficient of ln x is fairly well-determined by
the data, but the other two coefficients would
benefit from more data.
2

(c) T

(Plot: Problem VII.1.9, fit without error weighting.)

( 0.50 0.03) x

y (2.17 0.06)e
(2.17 0.06)(1.64 0.05)
It is remarkable that we can attain this much
accuracy for the parameters of our model with data
whose errorbars look so ominous in the graph.
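An error-weighted fit of the exponential form quoted above can be set up with scipy's curve_fit; a minimal sketch with made-up data and error bars (the exercise's actual data set is not reproduced here):

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, A, k):
    return A * np.exp(k * x)

# Hypothetical data: measured values with one-sigma error bars, not the textbook's data.
x = np.array([0.25, 0.50, 1.00, 1.50, 2.00, 2.50])
y = np.array([1.95, 1.70, 1.30, 1.05, 0.80, 0.62])
sigma = np.array([0.15, 0.12, 0.20, 0.18, 0.15, 0.20])

popt, pcov = curve_fit(model, x, y, p0=(2.0, -0.5), sigma=sigma, absolute_sigma=True)
perr = np.sqrt(np.diag(pcov))
print(f"A = {popt[0]:.2f} +/- {perr[0]:.2f},  k = {popt[1]:.2f} +/- {perr[1]:.2f}")
# The square roots of the diagonal covariance entries are the parameter uncertainties,
# which is where statements like (2.17 +/- 0.06) in the answer above come from.
```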
(Plot: Problem VII.1.11.)

mr0

2V0

2.5

(e) K ( m)

Problem VII.1.9, err

1
m

K m

K 1 m

The

3 10 K 3 10 .

integral given in the text is

(c) cos 1 sin m .

11. The relation is y = (1.5 ± 0.5)x + (0.65 ± 0.15). It


is obvious from the diagram that a function of this
form agreeing with the data must fall between
these two curves.
13. The relation is

10

(b) V0 B (4 A) .

2A B .

(d) The ratio of actual period to SHM period is


1.06891 for 0.1, 1.00637 for 0.01, and 1.000634
for 0.001.
9. (a) Write in terms of cosine, then re-express.

3
0.5

Section VII.4

1. x (t ) 2 cos(2t )
x

1
2

sin(2t ) . Underdamped.

2 t

3. x (t ) (4 5t ) e . Critical damping.
5. L = 2.6 pH. Time is 0.12 s , and a little more
than 8360 oscillations take place during this period.
The constant is of order 10 , so will have a
negligible effect on the frequency.
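The critically damped solution quoted in answer 3 can be checked symbolically. A minimal sympy sketch, assuming the underlying equation is x'' + 4x' + 4x = 0 with x(0) = 4 and x'(0) = -3 (a constant-coefficient equation with repeated root -2, consistent with the quoted solution; the exercise's actual statement is not reproduced here):

```python
import sympy as sp

t = sp.symbols('t')
x = sp.Function('x')

# Hypothetical but consistent ODE: characteristic equation (r + 2)^2 = 0.
ode = sp.Eq(x(t).diff(t, 2) + 4 * x(t).diff(t) + 4 * x(t), 0)
sol = sp.dsolve(ode, x(t), ics={x(0): 4, x(t).diff(t).subs(t, 0): -3})
print(sol)   # Eq(x(t), (4 + 5*t)*exp(-2*t)), matching the quoted critically damped answer
```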
5

Chapter VIII
(Plot: Problem VII.1.13.)

Section VIII.1

1. The radius is 2.284 10 m 1.53 AU , and the


speed is 24,180 m/s.
11

Section VII.2

1. U ( x ) x 4 2 x 3 . Turning points are -1.757


4

and 3.322. Max. speed is 2 11 3 .


3. U ( x ) x 4 cos x . Turning points are 3.6685 .
Max. speed is 2.263.
2

5. U ( x ) x 8 sin x . Turning points are 1.86 .


Max. speed is 2.8858.
2

Section VII.3
1. (a) 3.33213, (b) 3.34512. Exact is 3.3455, so
these results are fairly good. The corrections
represent an important contribution, 5.7% even for
0.01.

3. Acceleration has magnitude 2 3 9 4t , in SI.


This magnitude is twice the rate of speeding up at
about 1.61 s; its speed is 3.22 m/s. Acceleration
becomes closer and closer to perpendicular, but
always makes an acute angle with the velocity.
4

Section VIII.2

1. 5012 m/s.

3. vesc

32 G M 3 .
3

5. r . Aphelion is 53, and perihelion is 5.666.


2



Section VIII.3
1. Plug it in and square it out. The result follows
from the Pythagorean theorem.
3. The scale parameter tends to infinity as we
approach eccentricity 1. The perihelion distance is
well-defined for all orbits, so it allows us to
compare orbits to one another. The scale
parameter is only useful as a mathematical tool; it
does not represent the distance between two
physical objects.
4. We can easily see why this relation is true by
drawing a diagram. Remember that a triangle
containing vertices at the vertex of a hyperbola, its
center, and the point on the asymptote directly
above the vertex, has hypotenuse c. The deflection
angle is the angle between the two asymptotes.
5. The other focus lies 2ea from the origin. It's not
pretty, but we can show that the sum of the two
distances is constant by writing out this sum in
polar coordinates and simplifying the complicated
one by combining fractions. We then recognize the
argument of the square root as a perfect square and
use this to combine it with the more simple term.
The sum is 2a. Showing the algebra II expression
is also a bit of work. Write the requirement of
constant sum of the distance to each focus in terms
of x and y, then move one of the square-roots to the
other side and square. Cancel, isolate the square root, and square again.

Chapter IX
Section IX.1
5
1. (a) 2
.
s 25

s.

(d)


s 16
2

s 4

2. (d)

2( s 5)

3. (a) 4 3 ( s 3)

4i

4 3

. (f)

18 s 3

(h)

. (g)

3 2

(c)

( s 2) 9

(g)

s2

(e)

6
( s 1)

6
( s 2)

( s 3i ) 3 2 ( s 3i ) 3 2 .

5 4

( s 2 3i ) ( s 2 3i ) .
2
1 log( s 1)
log( s 4)
(c)
. (d)
.
2
( s 1)
s4
(b)

(e)
(f)

5 4

2( s 2)

(g)

18
( s 1) ( s 1) 36
2

log s
2

s
1 i

5.

s
s c
2

. 7. tanh s 2 s ; poles at 0,
2

(2 k 1)i ; k

( 1) 2( k 2) 3
(2k 1)! s

2( k 2) 3

(c) log 1 1 s .
( s c) 1 .
2

13. (a) 1

k 0

(c)

6 2 2(1 ) log s log s

s sin c cos

11. (a)

1 1 log( s i ) 1 log( s i )

2
2
.
2i
(s i)
(s i)
2

(g)

6 2 log s

4. (b) (1 i ) ( s 2)
(d)

5 4

2 2 ln 2 log( s 2) .

3 2

Section VIII.4
1. The eccentricity is 0.65, the aphelion is at 3.588
and the perihelion is at 0.56. The period is 3.2
years, and the specific energy and angular
momentum are -0.23 and 0.56, respectively.
3. The eccentricity is 3.16, the perihelion is at 0.81,
the asymptotic speed is 1.633 and the deflection
angle is 37 degrees. The specific energy and
angular momentum are 1.33 and 1.5, respectively.
5. The eccentricity is 0.927, the perihelion is at 0.018,
the aphelion is at 0.482, and the period is 0.125
years. The specific energy is -2, and the specific
angular momentum is 0.375.
7. These measurements contain information about
where the object is in its orbit. We could instead
measure the specific energy and angular
momentum at a certain point, but we would need to
know where that point was in the orbit in order to
correctly measure these quantities.
9. 474.6 seconds per century.
11. This transformation does not change the speed or
angular momentum of Europa, so we can use
either the general equations or the scaled equations
to determine the new orbit. The new eccentricity is
0.9889, periapsis is about 3.75 Mm, apapsis is
about 671 Mm, and period is about 3.2 hours. I
would not expect this change to help.

(b)

( 1) k !
k

(b)

(2k 2)! s

k 1

k 0

(d) log 1 1 s s .
(b) 1

( s 3) c
2

1
1

.
2
2 ( s 3i ) 2 4
( s 3i ) 4

(d) s
(g) c

s
s

3 2

3 2

.
.

(e) 1 s

s 1 .
2



Section IX.6


1.
e
e
.
2i
0

3. e

5. e

9. e

cos 0

2 4

cos t

2 4

cos t
( i )

2.

2.

( t ) e
n 1

( t )

when t . If n is a

transform has poles at negative even integers.


3. ( s ) sin s 2 . The integral is not obviously

analytic throughout the plane except when s is a


negative odd integer.
5. 2 csc s 2 . There are problems with the
s 3

integral whenever s 1 1 , but the transform is


well-defined everywhere except at even integers.
i i 2 1 2 i 4
1
2 i 4
.
7. sin ln x . 9. x
x e x
e
4
( s 1)!
n
11. (a) ( s n ) . (b) ( 1)
( s n) ; s n .
( s n 1)!

. 15. x sin( 2 ln x ) 2 . 17. J 0 ( ln x ) .


2
( n 1)
19. The difference between any two such polynomials
must have Mellin moments that are zero for all
positive integers. This is not possible, as it has a
finite degree.
3

Section X.1
1. 1.1%. 3.(a) 10.1%. (b) 4.19%. (c) 94.2%. (d) 7.4%.
4. (a) 48. (b) 5.37. (c) 3427.2. Smaller. Diff is k .
2

np 1 3( n 1) p ( n 1)( n 2) p

Remember that different measurements are not


correlated with each other.
(d) ( n 1) k . Note
2

k n k mean k .

j 1
15

(b) 6.16%.
6. (a) 8.56 10 % .
7. (a) 11.14% versus 35.14%, error is 68%. For 2000,
it's 79.44% versus 80.1%, error is 0.8%.
(b) Changes to 11.10% and 79.3%.
(c) 14.5% versus 27.9%, error is 48%. For 2000,
7

it's 1.19 10 % versus 8.42 10 % , error is


41.3%.
6

(d) Changes to 13.5% and 1.13 10 % .


Overall, the more complicated expression rarely
really buys us anything; either approximation will
be good when n is large, as long as we are not
interested in really small probabilities, and neither
will be good when n is small.

Section X.3
1. 2.32%.

2. 5.4%.

3. 5.55%.

4. 4.01%.

5. n ( n 1)( n 2) 6 .
6. (a) 0.612%. (b) 12.85%. (c) 32.13%. (d) 10.7%.
(e) 3 pairs with same day and none others is 10.7%.
1 pair and 1 triplet and none others is 21.42%.
(f) nt ns N ( N 1) pt ps 6 7 .
3

N ( N 1) pt ps 1 ( N 2) pt ps
2

2(2 N 3) pt ps 6 97 7

365 N ! 364
365 N !
(b)
365 N .
.
N
N
2(365)

N 2
N ! 365 363
(c)

.
N
4(365) 2 N 4
N ! 365 362
(d)

.
N
8(365) 3 N 6
365 N ! 364
(e)
(f) One of the other
.
N
6(365) N 3

7. (a)

Chapter X

3. 0.54%.

5. (b) Work from the definition, k k

analytic for any values of s, though it does behave


when s 1 . The transform, on the other hand, is

5. k

Section X.2
1. 26.915% and 8.31%.
2

Section IX.7
1. s 2 2 . Integral is analytic for all s > 0, and

6. The gambler will have a 90% CL on 277 to 323 heads, or 300 ± 23.
7. 70.6% probability that at least two are the same.
Need 41 people for a 90% CL.
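Answer 7 follows directly from the product form of the no-collision probability; a minimal sketch (the 70.6% figure corresponds to 30 people, which is presumably the group size specified in the exercise):

```python
def prob_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday."""
    p_distinct = 1.0
    for k in range(n):
        p_distinct *= (days - k) / days
    return 1.0 - p_distinct

print(f"{prob_shared_birthday(30):.1%}")      # about 70.6%
n = 1
while prob_shared_birthday(n) < 0.90:
    n += 1
print(n)                                       # 41 people give at least a 90% CL
```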

2 4 .

( n 1)!
negative integer or zero, the function is zero for all
t. If n is not an integer, the function can be
determined via analytic continuation. Negative
values of interchange these results.

13.

np 1 7( n 1) p 6( n 1)( n 2) p

( n 1)( n 2)( n 3) p

11. The function is zero whenever t and has the


value

2 . 7. e
2

and

possibilities is one set of two and one set of



for solids and liquids; less so for gases.
Expected to be positive.

three with the same birthday. This gives a


probability of 1.09%. There are MANY others.

8. (a) V

Section X.4
10
1. (a) 2.64 10 k B .

(b) 2.77 10 k B .
10

2 10

10

(c) n1

1 x x x
3

. n2 x n1 , n3 x n1 ,
3

(d) 4.4425 k B .

n4 x n1 .
5

(e) 2.6107 10 k B .
(f) 7.6, 6.067, 3.868, and 2.4658 billion.
10

2. (a) 1.607 10 , 1.3157 10 , and 1.077 10 .


15

(b) 4.352 10 k B .
15

15

15

(c) 3.4697 10 .
15

15

(b) 4.07 10 k B .
(c) 5.281 10 .
(d) Negative temperature indicates that states with
higher energy have higher occupation numbers.
The system has more energy than it can hold
without this population inversion; it would
donate much of this excess energy to external
systems if it was allowed.
15

2 xy


(d) 1 B .

(e) V .

(f) C p BVT

9. 17,450 K. Ignition only requires about 2500 K.


11. About 52,000 K. 12. 145 MK. Initiation is less.

Section X.5
1. 22.6%. 3. 9.31%. 5. 3.38%. 7. 0.62%. 9. 1.77%.
11. (a) He is 49km, N 2 is 6.88km, CO 2 is 4.52km,
and water is 10.6km.
5

(b) He is 20%, N 2 is 1.6 10 % , CO 2 is


9

1.8 10 % , and water is 4.9 10 % .

15

4. xy 2 .
5. xe
2.
7. (a) Measure the pressure required to keep the
volume fixed as temperature increases. Positive
for most materials, and somewhat difficult for
most solids and liquids. Easy for gases.
(b) Measure the change in volume associated with
a change in temperature without allowing any
heat exchange with the system. Very difficult
to do. Increasing the temperature without heat
flow requires work to be done on the system, so
the volume decreases and the quantity is
negative.
(c) Increase the pressure on a system in thermal
contact with a heat reservoir and measure the
energy flow in or out. Moderately difficult.
Will be zero for ideal gases, and can be
positive or negative for non-ideal systems,
depending on whether or not higher pressures
make it easier to attain a given temperature.
(d) Keep the volume constant and heat the system.
Determine the pressure required to keep the
volume constant as its temperature increases.
Easy with gases, not so easy with solids or
liquids. Positive for most materials.
(e) Add heat to the system, but make sure that its
temperature remains constant (thermal contact
with a reservoir). Somewhat difficult to
accomplish; expected to be negative.
(f) Keep the volume constant as you heat the
system, measuring the change in volume and
heat energy required. Difficult to accomplish

(b) V C p BT .

B.

(c) V p B T .

15

3. (a) 7.453 10 , 1.229 10 , and 2.026 10 .


14

(c) He is 63km, N 2 is 6.3km, CO 2 is 4km, and


water is 10.2km. (d) smaller by lower temp.

Section X.6
13
1. 5 10 .
9. (a)
(b)

pV n a V
V nb

(d)

na
V

nR
pV n a V 2 n ab V
2

(c) nR ln

5. p V .

3. V 2T .
2

(V nb )

T0 (V0 nb )

pV

nR ln

p (V nb )

p (V0 nb )
(e)

V nb

.
2 VT
V nb
(f) No heat can flow into the gas at constant
entropy, so work must be done on the gas in
order to increase its temperature.
10. The difference is much more apparent near the
liquefaction temperature, as the curve develops a
dip associated with a positive value of (∂p/∂V)_T.
This cannot be physical, as increases in pressure
must lead to decreases in volume, so it is
interpreted as representing the phase transition
from gas to liquid or solid. The negative pressures
do not bother us for this reason; they are not
physical. The graphs follow; pressure is given in
bars and volume is given in liters.
(Plots: Problem X.6.10(a) and X.6.10(c); pressure in bars versus volume in liters.)
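The unstable stretch described in answer 10 is easy to reproduce from the Van der Waals equation itself. A minimal sketch using rough textbook constants for CO2 (a ≈ 3.64 L²·bar/mol², b ≈ 0.0427 L/mol) at a temperature below the critical point; the exercise's actual gas and parameters are not reproduced here:

```python
import numpy as np

R = 0.083145             # L·bar/(mol·K)
a, b = 3.64, 0.0427      # rough Van der Waals constants for CO2
n, T = 1.0, 270.0        # one mole, below the ~304 K critical temperature

V = np.linspace(0.06, 0.60, 400)                       # volume in liters
p = n * R * T / (V - n * b) - a * n ** 2 / V ** 2      # pressure in bars
rising = np.gradient(p, V) > 0
print("(dp/dV)_T > 0 for V between", V[rising].min(), "and", V[rising].max(), "L")
# A positive isothermal slope of p(V) is mechanically unstable; the answer above reads
# it as the signature of the gas-liquid phase transition rather than physical behavior.
```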



(Further plots for Problem X.6.10(a) and X.6.10(c).)

(d) The values are not the same, but the bulk
modulus definitely drops off quickly as the
liquefaction temperature is approached. This is
because the atoms/molecules are finding that
they like each other and no longer feel the
repulsion they once felt; liquefaction is
occurring. The bulk modulus of the liquid itself
will, of course, be much larger than the value
associated with the gas, but a super-cooled gas
will have a lower bulk modulus than that seen
at higher temperatures. The Van der Waals
model does not contain the details of the phase
transition.
Section X.7
1. 47.2 J m and 1.27 10 m
3

18

14

5. 2.7 k BT . This is a little more than would be


expected, the difference being attributed to the fact
that multiple photons can occupy the same state.
7. 560 kK. 9. 9.53 GK. 10. (a) 123 K. (b) 64 K.
11. 362.5 nm. This is part of the ultraviolet spectrum,
so is not associated with a color. Object will
appear white, as all of the visible spectrum is
prominent.
13. 10,860 K. The number of photons is about one per
260,000 hydrogen molecules at 1 atm, one per 13
million at 50 atm, and one per 77 million at 300
atm. Hydrogen is expected to be mainly molecular
at this temperature.
15. The density of photons is

12 (3) hc 1.51627 10
and the energy density is
3

1 K m ,
3

7 k BT 15 hc 2.1 10 T 1 K J m .
These values are smaller by factors of 1.34 and 3.6
in comparison to the boson values; the number of
photons allowed in each state is limited.
4

16

Chapter XI
Section XI.1
1. Yes.
2. No.
Section XI.2
1. x1 x2 x3 162

3. No.

15

x2

x3 29 2t

x2

x3 1 3t

1
0
1

9. 0

1
0
11.
0

0
7.

0 .

2t

11 7

5 7

4 19

3 19 . 2 pivots.

t 11 .

12 5t

. 2 pivots.

. 1 pivot. 13. No. Cannot be unique.


0

0
5

15. (a) No solution. Null is spanned by 3 .

1

0

(b) x 13 7 5s 4t , y 18 7 s 5t , z 7 s ,

3. 2.65 nJ m and 7.14 10 m .


3

x
x

4. Yes.
139 20 .

5 4
1 5

w 7t . Null is spanned by , .

7
0

0 7
(c) x 1 3 s t , y 3s , z 3t . Null is

1 1

spanned by 3 , 0 .


0 3
17. Cannot be onto, but can be one-to-one.
18. No. Zero vector is not contained.
19. Yes. Zero vector is contained.
20. The set of vectors x span vk
space iff x span vk

m
k 1

m
k 1

is a vector

Section XI.3
row( A) span 7 0 0 5 ,
1.
0 49 0 51 , 0 0 49

27

35
51
3
.
col(A) is . Null is spanned by
27

49

3. row( A) span 1 0 2 , 0 1 1 , col(A)


is spanned by first two columns, and null is
spanned by 2

1 1 .
T


1 1

5. Null is span 1 , 0 .

0 1

7.

row( A) span 1

0 ,

0 , 0

col(A) is , and null is spanned by


2 1

1

1 , 0 .

0 1

0 0
0

2
.
1

9. Null is spanned by

Section XI.4

1 8 5
1

1. -40. 3. 68. 5. -63. 7.


7 6 5 .

20

1 2 5
22 11 11

9. Singular. Adjugate is 20 10 10 .

26 13 13

26
26
26
x


76
1
1

11.
13
7
3 . 13. y
22 .

21

52

z
2
7
13 1

x
9

15. y
38 .

7
z
42

.
9 p ( t ) 2 t t 5 t t
.
11 t t 3 3 2t

1. p (t ) t 1 1 t
3

3.


6 4
4 12 .

9 4 4 18

0 5
5
0
9 0 0

4 0 0 : V U and

10 0 0
9
0

10 0
: U V . Answers not unique.
13 0

1
0
0

15

15

9. W = -1. Indep.

11. Dimension of null space is 12. It is possible to


have no solutions, solution cannot be unique.
13. Null space is trivial, dimension 0. There can either
be no solutions or exactly one.
1 15
1 17
. 17. x
.
14. No. 15. x

7 24 B
7 33 C
19. True.
20. False.

Section XI.5

0
0
1
5.
3
15
3
0

10
1
7.
0
7
8
7

1 0

10 11

13

13. W Ke .
3t

11. W = 0. cos 2t 1 2 sin t .


2

15. W K t . 17. W K sec t .


2

Section XI.6

1
2
1. 1 3 ; v1 . 2 4 ; v2 . Span.
1

3
1

3. 1 (2) ; v . Do not span.
2
1
5. 1 (2) ; v . Do not span.
1

1
0




0
1


1
3 4 ; v3 1 . Span.

0

7. 1 2 ; v1 1 , v2 0 .

11. (a) 1 4 , 2 1 . Eigenvalues are distinct, so


eigenvectors span.
4
1
(b) v1 ; v2 .
1

1
(d)

2 4 1 14 4
;
.
1 1 5 1 11

A 5I

12. (d) Hint: Change the matrix slightly


13. (a) 1 12, 2 24, 3 24 . Could span.

1
5
2


(b) v1 1 ; v2 3 ; v3 0 .



2
0
3



21

5 5 2
1

(d)
3 19 2 ,
3 3 2 ,
2

6 10 4
6 10 20

3 125 46
1

27 27 18 , and

18

60
18 30
15 95 58
1

9
38
6 .

18

54 90 36
1

15. (a) 1 17, 2 34, 2 34 . Distinct, so span.

3
1
1






1
0
3



16 9 3

(d) A , 0 34 0 ,

6 3 35

20 27 9 12 23 15

0
34 0 , 8 30 24 .

18 9 37 6 3 35

3
3
17. (a) 1 1 ; v1 . 2 2 ; v1 .
2
1
2 823
5
(b) A
.
7
282
1
3


19. (a) 1 3 (2) ; v1 1 . 2 6 ; v2 0 .


3
4


2

(b) w 1 . (c) A 3 I w 6 v1 3 v2 .

3

1760
2024.58
, 5 yrs is
21. 1 yr is

, and many
940
675.424
2025
years gives
.
675
2t
3 t
x (t ) e 9 e 5
23.
.

2t
y (t ) 2 5 15e
x (t ) 3 cos 2t 28 sin 2t
25.

.
y (t ) 5 cos 2t 6 sin 2t
(b) v1 0 ; v2 2 ; v3 1 .

Section XI.7
1. (c) becomes 2 x1 y1 x2 y2 3 x3 y3 .
3. (a) Infinitely many vectors have norm zero.
1
1
(c) The vectors and each have zero
1

1
norm, but their inner product is 2.
41
10
3
129

5.
26
11 .
237
79

4
31

2
29
317
3
2
11
1
1
1
,
,

.
7.
39 1 3 455 57 3 16, 555 159

5
1
152
1
4
1
2
7
0
1
1
1
,
,
.
9.
30 4
70 1
14 2



3
2
3
i
i
11. 1 3 , v1 ; 2 5 , v2 .
1
1
1
1


13. (a) v1 0 ; v2 1 . There are only two.


0
0


0

(b,c) Lets include w 0 . Basis is the e-basis.

1

1
1

(d) A 2 I w 0 4 1 .

0 0

15. (c) I will answer the two whys in the treatment
given in the text. If there was a nontrivial
intersection between U and Nul(A), then this
intersection would represent eigenvectors with
eigenvalue 0. Matrix A maps every vector in U
to another vector in U, so can be understood as
a transformation between basis vectors. The
space W is thrown out in this analysis. Since
the null space of this transformation is trivial
and it is definitely a square matrix with each
row/column representing a basis vector, it is
one-to-one and onto.
17. (a) Singular matrices are not invertible.
(b) Multiplicative identity is the zero matrix.
(c) Multiplicative identity is I.


Chapter XII

5.

Section XII.1
1. Countable. 2. Uncountable. 3. Countable. 4. Countable. 5. Uncountable. 6. Uncountable.
7. Uncountable. 8. Countable. 9. Countable. 10. Countable. 11. Countable. 12. Countable.

Section XII.2

4 3

1. cos t

b1 (t )

24 7 10

b (t ) .
3

p1 (t )

2,

p 2 (t )

3 2t

45 8 t 1 3 ,
2

p3 (t ) 175 8 t 3t 5 .
3

3t

cos t sin t

8. (a)

(c) (t )

3t
4

35 15
2

35

45

32

1 3

3t 5

Section XII.3
1. t
2

2 t

1
3

( 1)

k 1

Section XII.4
1.

(1 ik ) e
k

5.

ik

3.

2i ke

2 ik

(3k 2i )e
k

ik

10

.
2
2
( k i ) 100
9. (a) Physical measurements must be real.
(b) When measured, a physical quantity gives a
definite result. If two states are associated with
definite results that don't coincide, then there can be no overlap.
(c) If a state was not spanned by the eigenstates, it
would, by definition, not be able to be
expressed in terms of states with a definite
value of the physical quantity. We HAVE TO
be able to measure a physical quantity for any
physically possible state, and this process
MUST lead to a well-defined value. This value
must be an eigenvalue of the operator, by the
structure of the theory, so consistency requires
the eigenvectors to span the space. Hermitian
operators serve this purpose very well.
(d) No. It has a continuous spectrum, so its
eigenfunctions are not actually functions.
10. (b) Write it out, and assume that you have an
eigenvector for one of them.
(e) Write it out, using the distributive property,
then use the quadratic formula.
(g) x p 2 .

Chapter XIII

where C satisfies
cos kt .

k
k 1

( 1)

k 1

sinh 2

3.


12

( 1)

k 5 1 cos k t .

Section XIII.1
C
gx gD cosh gD ,
1. (a) y
cosh

g
2C
C
2C

t 3t 5 .
3

k sin k t
9. The issue lies in the fact that infinidimensional
vector spaces have room, so don't need to
actually attain a minimum. The quadratic form
could just get smaller and smaller in one direction.


Bessel gives 0.499992, indeed close to 0.5.


1
1
t 1 2 t b0 (t )
b1 (t )
8
8 3
. Bessel
3.
5
7

b2 (t )
b3 (t )
32
32
gives 0.03255, uncomfortably far from the exact
result of 0.125. Need higher degree polynomials to
give a better approximation to this function. This
is also obvious from the graph.
5
1
f (t )
b0 (t )
b1 (t )
48
32 3
5.
.
29
7
b2 (t )
b3 (t )

192
192 5
Bessel gives 0.01593, again uncomfortably far
from the exact result of 0.016667, but not as bad as
3. Need higher degrees for a better approximation.
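The "Bessel gives ..." numbers in these answers come from summing the squares of the coefficients of a truncated orthonormal expansion, which by Bessel's inequality can only approach the exact squared norm from below. A minimal numpy sketch for a degree-3 Legendre expansion on [-1, 1]; the target function here is only an illustration, not the one in the exercises:

```python
import numpy as np
from numpy.polynomial.legendre import Legendre, leggauss

nodes, weights = leggauss(60)                    # Gauss-Legendre quadrature on [-1, 1]
integrate = lambda vals: np.sum(weights * vals)

f = lambda t: np.abs(t) ** 3                     # illustrative target function only

# Coefficients against the orthonormal polynomials sqrt((2k+1)/2) * P_k(t).
coeffs = [integrate(f(nodes) * Legendre.basis(k)(nodes)) * np.sqrt((2 * k + 1) / 2.0)
          for k in range(4)]

bessel_sum = sum(c * c for c in coeffs)
norm_sq = integrate(f(nodes) ** 2)
print(bessel_sum, norm_sq)    # the Bessel sum is bounded by the squared norm; the gap
                              # measures how much the low-degree expansion misses
```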
7. p0 ( x ) 1

t t (t )

k sin k t 2 cos k t

gL
2C

gD .

2C

sinh

(b) The lowest point is at x D 2 , and the tension


is given by C. Tensions are 0.92, 0.47, and
0.176 gD . Will equal weight when r =
1.039.




(c) T C 1 gL 2C . Minimum value is


2

gL 2 , corresponding to C = D = 0.
(d) Ratios are 1.15, 1.62, and 8.57, respectively.
Ratio tends to 1 as D tends to L and infinity as
D tends to infinity.
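The constant C in these answers is pinned down by a transcendental relation between the rope length L, the span D, and the weight per unit length λg. A minimal sketch solving λgL/(2C) = sinh(λgD/(2C)) with scipy, assuming that this is the intended relation and with purely illustrative numbers:

```python
import math
from scipy.optimize import brentq

lam_g = 9.81       # weight per unit length (N/m), illustrative
L, D = 2.5, 2.0    # rope length and horizontal span (m), illustrative; needs L > D

def residual(C):
    u = lam_g * D / (2.0 * C)
    return math.sinh(u) - lam_g * L / (2.0 * C)

C = brentq(residual, 1.0, 1.0e4)   # horizontal tension at the lowest point (N)
sag = (C / lam_g) * (math.cosh(lam_g * D / (2.0 * C)) - 1.0)
print(C, sag)
```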
C
gx A cosh A , where A
2. (a) y
cosh

g
C

and C are the solutions to the equations


gH
gD A cosh A and
cosh

C
C

gL

gD A sinh A . The
sinh

C
C

solution to this set can be reduced to the


solution of the equation
g L H 2C sinh gD 2C
for C, then substitution in either of the others
for A. These equations do not have real
2
2
solutions unless r s 1 , so this is our
requirement. The graphs are below.
(b) The lowest point lies at x AC g , which
is positive, as A must be negative whenever
2rs
2 s . The tension there is
sinh 2

2
2
r s
r s2
C (provided A < 0). This tension is again
proportional to . The minimum point is
given in units of D by 0.47, 0.437, and 0.33.
(c) This tension is given by C cosh A at the left
endpoint and
gD A gH C cosh A
C cosh

at the right. It is clear that the right endpoint


supports exactly gH more weight than the
left endpoint, as expected. If s is increased at
constant r, the tension on the left decreases and
the tension on the right increases until s gets
close enough to r, at which point both increase
without bound. If r is increased at constant s,
both tensions increase due to the heavier rope.
(d) Tensions both tend to infinity; max is
2

L H , as it tends to a straight line.


(e) The ratios are 7.603, 6.5635, and 4.4947.
Tends to infinity as D tends to zero; cannot be
computed as D tends to its maximum, as the
minimum point becomes the left endpoint. The
ratio of largest to smallest tensions tends to 1.
(Plot: Problem XIII.1.2(a).)
3. (a) The required tension is 32.5 N. This is slightly
greater than half the weight of the chain, as
expected. The distance between the posts is
2.012 m.
(b) The required tension is 50 N. The distance
between the posts is 2.77 m.
(c) The posts will be 2.995 m apart in this situation,
and the chain will sag about 7.5 cm.

Section XIII.2
1. z ( x, t ) y ( x, t ) 5 2 x L . z k sin k x L .
3. z ( x, t ) y ( x, t ) 2 3 x . z k sin k 1 2 x L .
sin ( x L ) v

5. z ( x , t ) y ( x , t )
7. e

2 inx L

8. sin

sin L v

sin t .

; n .

n x
L

( 1) n cos
n

n x
L

; n .

Section XIII.3

3L

1. y ( x, t )

n
2

n vt

cos

n 1

y ( x, t )

5.

n 1 2
.
n 1 2 vt n 1 2 x

cos

y ( x, t )

n x

L ( 1) (2 n 1) 3

n0

3.

sin

cos

x
L

2( 1)

6 n cos n vt sin n x
2

n
3

n 1

Section XIII.4

u ( x , y , t ) 16 a b
3

1.

1 2( 1)
n
3

n , m 1

( 1) 1
m

m
3

n2 m2

sin
cos
cos
2 vt
2

a
b
a b

n x

3. We would need

m y

n a

m b to be a
2

rational multiple of both n a and m b .


5.

u r, , t

n ,m

J n jn , m r a

. The

m 1 n 0

cos n cos jn , m vt a
coefficients are all zero if n is not either 0 or 2. If
n takes one of these values, the coefficients are
given by
c2 , m
a

c0, m

(1 ) J 0 j0, m d

j
(1 ) J j d .

J j
a

J1

and

2, m

Some sample

2,m

u r, , t

n ,m

J n jn , m r a

cos n cos jn , m vt a
coefficients are zero unless n = 0, 1, 2, or 3. If n is
one of these values, then we have
cn , m
a

(1 ) J n jn , m d
3

J n 1 jn , m
2

S ( kx ) Y0 ( kb ) Y0 ( ka ) J 0 ( kx )

and m is a

J 0 ( kb ) J 0 ( ka ) Y0 ( kx )
solution to
Y0 b a Y0 () J 0 () J 0 b a J 0 () Y0 ()
Some early solutions for b = 3a are 1.54846,
3.12908, 4.7038, 6.27666, 7.84873, and 9.42039.
11. (b) J n

0 , m

n 0 , m

r sin n 0 . Three modes

are illustrated below. Try to guess which ones.

1 x

4. (a) w( x )

. (b) n n( n 1) .

1 x

(c) y3 ( x ) 15 1 4 x 4 x 8 x

5. (a) w( x) x 1 e

, x 1, . (b) n 3n .

3 x

(c) y3 ( x ) 3 9 x 36 x 27 x 2 .
3

y n ( x )t

6. The generating function is


8.

n!

n 0

. These

coefficients are twice those in problem 5 for n = 0


and 2; some sample coefficients for other n are
c1, m
0.2539, 0.12288, 0.0288 and
3
a
c3, m
0.364, 0.00574, 0.044 for m = 1, 2, and 3.
3
a
9. S m r a cos m vt a , where

n n 2a1 ( n 1) a2 2 .

. The

m 1 n 0

c0, m

0.082, 0.0975, 0.0252 for


3
a
m = 1, 2, and 3, and
c2, m
0.15867, 0.0274, 0.01533 for m = 1, 2,
3
a
and 3.

n 1

d wa n 1 na a wa n ,
2 1 2
2
n 1
dx dx

which vanishes by virtue of the fact that


wa wa . The eigenvalue is given by
d

0, m

coefficients are

7.

Section XIII.5
1. n 2n .
2. 1 .
3. (c) This follows from direct substitution.
(e) This follows from the binomial expansion and its
relation to higher derivatives of products.
(f) The only contribution to the derivative that is not
proportional to wyn is

!m!n !
,

m
n
m

! n ! n m !

2
2
2

provided m n is even. If not, the integral is 0.


2

mn

2 e
11. e

3 x

2 e
y 3 xe

y x e

4 x

2 e

4 x

y . The

eigenfunctions will be orthogonal on any interval


2 e

with respect to the weight function w x e


The solutions are decidedly not polynomials.
2

Section XIII.6

1. x 2 F2 (1,1; 2, 2; x )

e 1
t

3.

25

dt .

F3 (1,1, 5, 5; 2, 6, 6; x )

1
x

dt
t

ds s ln(1 s )
3

5. x 3 F3 1,1,1; 2, 2, 2; x

dt
t

e 1

x 4 F3 (1,1,1,1; 2, 2, 2; x )
7.


t (1 2 x ) t

dt
t

ds

ln(1 w)
w

dw

ds .

4 x


9.

a
b

k
p

5. The essential point is that the functions are not

Fq a1 1, a2 1, , a p 1;

eigenfunctions of x , but rather of v x .


2

3. 4646.5 m s .

2
1

Section XIII.8

1. u ( x, t )

12( 1)
k
3

k 1

u ( x, t ) 1

3.

32

sin

( 1)

k x

k t L c p

.
e

k 1 2 t L c p
2

u ( x, t )
7.

all in units of

x x
x
x
1 2 3
c p L L 2 L 6 L 3L
2

2 3 2( 1)

sin k x L e

k
3

k 1

.
2

k t L c p

Chapter XIV

8 z 5 z 3z
4

3
i L 6 v

.
, where z e
3
3
2 5 z 3z
8 3z 5 z
The first four frequencies are 0, 10.6329, 14.9885,
and 22.7106, in units of v L . The second and
fourth are plotted below.
(Plots of the second and fourth modes.)

(c) 4.53 rad s . The wavelengths are 2.92,


2.76, 2.13, and 1.325 meters.
2
3. (b) A( ) 2 0 2 i 0 .
0

Chapter XV

Section XIV.1
1. cos(ωL/3v) = -1/5, where v is the slower speed. The lowest four frequencies are 5.31646, 13.5331, 24.166, and 32.3826, all in units of v/L. The first and fourth frequencies are plotted below.
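Reading the condition as cos(ωL/3v) = -1/5 (the value -1/5 is inferred from the quoted roots), the listed frequencies are its first few positive solutions; a minimal sketch recovering them numerically:

```python
import numpy as np
from scipy.optimize import brentq

f = lambda w: np.cos(w / 3.0) + 0.2          # assumed frequency condition, w in units of v/L

grid = np.linspace(0.1, 35.0, 3500)          # scan for sign changes, then polish with Brent
vals = f(grid)
roots = [brentq(f, a, b)
         for a, b, fa, fb in zip(grid[:-1], grid[1:], vals[:-1], vals[1:]) if fa * fb < 0]
print([round(r, 4) for r in roots])          # 5.3165, 13.5331, 24.166, 32.3826
```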
3. The required condition is

5 z 3z 2 z

g d .

(b) Shallow water needs 25.5d , and deep


water needs 2.726d .
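The shallow- and deep-water thresholds quoted above reflect the two limits of the finite-depth phase speed c = sqrt((g/k) tanh(kd)); a minimal sketch comparing the full expression with both limits (the depth and wavelengths below are illustrative only):

```python
import math

g, d = 9.81, 2.0                  # gravity and an illustrative water depth (m)

def phase_speed(wavelength, depth):
    k = 2.0 * math.pi / wavelength
    return math.sqrt(g / k * math.tanh(k * depth))

for lam in (0.5 * d, 2.726 * d, 10.0 * d, 25.5 * d, 100.0 * d):
    print(f"lambda = {lam:6.2f} m: c = {phase_speed(lam, d):5.2f} m/s, "
          f"shallow sqrt(gd) = {math.sqrt(g * d):.2f}, "
          f"deep sqrt(g*lambda/2pi) = {math.sqrt(g * lam / (2 * math.pi)):.2f}")
# Long waves (lambda much larger than d) approach the shallow-water speed sqrt(g d);
# short waves (lambda much smaller than d) approach the deep-water speed sqrt(g lambda / 2 pi).
```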

1.0

2. (a) 7.92665, 2.50662, 1.42542, and 0.591529 ,

5. u ( x, t ) x L 2 x L 4t L c p .
2

0.8

Section XIV.2
1. 1 .

0.6

Problem XIV.1.7

k 0

k 1 2 x

0.4

-2

(2k 1)

cos

0.2

-1
2

z L 18v . The lowest four frequencies are


5.08559, 10.5306, 14.6283, and 21.6277, all in
units of v L . These are all plotted below.

5. (c) 331.75 m s .
(d) Impurities like carbon dioxide and water,
fluctuations in temperature and pressure, and
other considerations we have ignored.
(e) These contributions make the speed smaller.

Section XIII.7
1

The factor of 1 v corrects for the difference.


7. We need to satisfy
3 tan(2 z ) tan(6 z )
2 tan(3 z ) , where
3
3 tan(2 z ) tan(6 z )

b1 1, b2 1, , bq 1; x

1. 1450.5 m s .


Section XV.1
1. 0 and 1 are regular singular points, and infinity is
irregular singular.
3. The points n , with n integral, are regular
singular. Infinity is an irregular singularity.
5. 0, 1, and infinity are all regular singular points.
1
G x;
2 sin 4 2
.
7.
sin 2 x sin 2 1 ; x

sin 2 sin 2(1 x ) ; x

9. G x;

1 1 1 x ; x
2

2 1 1 2 ; x

(Plots: Problem XIV.1.1 and Problem XIV.1.3.)



G x;

7. y a1 x u1 ( x ) a2 x
3

cos x sin
; x

1 x cos x sin cos sin x ; x



x

11. (b)

(c) y

cos x sin x . Solution cannot be unique.

x
.
1 x x

G x; x 1
2

n ,m

u n ( x )u m ,

94 3

44

27 3
132

, and

x cn x
3

n4

u 2 is the same with the opposite sign for all of the

mn

and cn , n

1
n
2

16

1 ( 1)
n
6

cn 1 , where r 3 for u1 ( x )

Section XV.4
1. No. w( x ) 1 x . The weight function does not
matter in this case, as 2 and 1 .
2

1 e

dt

3. No. w( x ) e
x . In this case, 2
and 1 so the self-adjoint condition is
satisfied, but the behavior of the weight function is
too singular to admit solutions; 4 2 2 .
t

1 e

dt

. The weight function

behaves like x for small x, and 3 2 2 , but


3

as x tends to zero and no more

singular than (1 x )

be square-integrable with respect to the weight


function over both x and . This implies that
the weight function can be no more singular
2

(n r ) 3

5. No. w( x ) x e

1
(d) y wy . The Greens function must
x

n r 1

1 (1)
nm
,
4 n m 1 ( 1)

nm n m

8 1 ( 1)

than x

11

u 2 ( x ) , where

and u2 ( x ) , respectively.

n , m 1

cn , m

cn

(c) G x;

(b) 1/150.
where

43 3

3 s. The coefficients satisfy

Section XV.2
3. (a)

u1 1

as x tends to 1.

the self-adjoint restriction 1 1 4 is not


2

satisfied.
sin t t

2
dt
7. Yes. w( x ) xe t
4 . The weight function
behaves like x for small x, and 1 2 2 , and
x

the self-adjoint restriction 1 1 4 is also


2

satisfied.
9. No. w( x ) x . 3 2 3 , but the self-adjoint
3

Section XV.3
1. y a1 x a2 x ln x .

restriction 1 1 4 is not satisfied.


2

11. No. w( x ) 1 . The self-adjoint requirement is


satisfied, but 0 2 2 so the weight function
is too singular to admit solutions.

3. y a1 x sin(ln x ) a2 x cos(ln x ) .
5. y a1 x u1 ( x ) a2

u1 1

2
3
3

2
5
7

x
2

x u2 ( x) , where
4

15
133

26

x cn x ,
4

135
n5

247
2
3
4
n
u2 1 x x
x
x cn x .
2
8
240
640
n 5
The coefficients satisfy
( n r 1)( n r 2) 1
cn 2
cn 1 , where r = 1 for
( n r )(2 n 2r 3) 1
the solution u1 ( x ) and r = 1/2 for u2 ( x ) .


13. No. w( x ) e x . The parameters 2 and


0 are not even good enough to satisfy the
boundary conditions.
x

15. Yes. w( x ) x e . The parameters 3 and


0 satisfy all of the necessary requirements,
and 2 2 3 .
17. This requirement follows from the behavior of the
kernel near zero. Irregular singular points for
which is defined, but is not, can satisfy this.
2



Section XV.5
2 x

1. w( x ) xe . The eigenvalues are 5.381, 17.092,


and 33.720. Graphs are illustrated below.

(Plots for Problems XV.5.1 and XV.5.3.)
3. w( x ) xe . The eigenvalues are 20.38366,
59.27611, and 117.81640. Graphs are illustrated
below.
x

(Plots for Problem XV.5.3.)


Index
Abelian
597
Abel's theorem
216
Absolute convergence
15
Acceleration
360, 379, 651
Additive property
533
identity of
136, 533
inverse of
136, 533
Adjoint
620
of the derivative operator
625
of the second derivative operator
659
of the Sturm-Liouville differential operator
747
Alternating series
29
bounds on
30
Analytic
173, 175
continuation 11, 75, 99, 107, 113, 210, 277, 338, 465
of the logarithm function
211
use of Mellin transform in
470
distinguish from differentiable
208
Angular frequency
375, 467, 524, 718
constancy of
718
Angular momentum
385
specific
386
Annulus
183
Ansatz
371
Aprys constant
57
Argument
of a complex number
141
of a function
61, 67
principal
288
Associative property
136, 533
Asymptotic expansion
error in
118
of a function defined by an integral
119, 123
of the Bessel function
127
of the error function complement
491
of the factorial function
128
Avogadros number
507
Basel problem
57
Basis
544
orthogonal
589
orthonormal
590
transformations between
549
Bernoullis principle
245
Bertrands postulate
323
Bessel function
444, 677
asymptotic expression for
127
confluent hypergeometric representation of
704
generating function of
682
growth of
202
integral representation of
127, 678
integrals of
681
Laplace transform of
408
orthogonality of
678
recurrence relations of
680
series representation of
408, 680

zeros of
678
Bessels inequality
605, 726, 753
Beta function
84
compact interval
84
integrals associated with
86, 89, 93
noncompact interval
89
trigonometric integral representation of
90
Binomial
distribution
482
expansion coefficient
35
Blackbody radiation spectrum
522
Blue sky
523
Boltzmann
496
constant
497, 519
factor
501
Boundary condition
178, 238, 649, 658
between two different media
717
circular
676
homogeneous
659, 662, 747
periodic
524, 626, 627, 659, 722
Branch cut
146, 169, 188, 212
Bromwich integral
427, 429
Bulk modulus
506, 709
adiabatic
710
Caratheodory theorem
500
Casorati-Weierstrass theorem
222
Catalans constant
59
Catenary curve
654
Cauchy
32
convergence criterion
15
integral formula
199
for derivatives
202
theorem
190
Goursat version of
175, 192
Cauchy-Riemann conditions
173
Cauchy-Schwarz inequality
598, 604, 754
Cavendish experiment
383
Cayley-Hamilton theorem
575
Centripetal acceleration
380
Chain rule
166
Chebychev polynomial
701, 706
Chemical potential
508
Cis notation
142
Clairauts theorem
186, 188
Cofactor
556
matrix
558
Commutative property
136, 533
Compact
640
Comparison test
15
integral
19
Complex
conjugation
137
derivative
163
map
135, 136
number
135




argument of
141
imaginary part of
136
modulus of
141
polar form of
74
real part of
136
number, imaginary part of
137
number, real part of
137
variable
135
Conditional convergence
30
Confidence level
483
Confluent hypergeometric function
704
hypergeometric representation of
705
representation of functions
704
Conformal
140
map
141, 250
mapping theorem
173
Conic section
389, 390
degenerate
393
Conservative vector field
361
Continuity
26
of a complex function
164
uniform
611
Contour integral
184
deformation of
200, 255, 332, 335, 435, 471
fundamental theorem of
186
path dependence of
185, 191
Convolution integral
458
for the Fourier transform
466
Corollary
191
Countability
606
Critical strip
333
Curve
139
rectifiable
185
rectilinear
185
Damping coefficient
375, 738
mass reduced
375
Data
353
fitting
351
chi-squared value of
351
least squares
351
linear
358
nonlinear
353
with error
355
linearization of
358
systematic errors in
358
Dawsons integral
408
Degeneracy factor
527
Determinant
554
cofactor expansion of
555
product property of
557
Dielectric constant
738
Differentiable
163, 175
Differential equation
371
dimension of solution space
584
Frobenius method of solution
761
homogeneous solutions of
565
initial conditions of
566
linear
564


method of undetermined coefficients


416
nonlinear
371, 392, 399, 653
ordinary point of
746
ordinary points of
423
particular solution of
565
regular singular point of
746, 760, 778
regular solutions of
760
singular points of
422
system of
583
uniqueness of solutions
749
with constant coefficients
371, 412
Diffusion equation
713
Digamma function
77
expansion of
317
series for
82
special values of
82, 318
Dilogarithm function
443, 635
hypergeometric representation of
443, 707
Dirac delta function
449, 623
Fourier transform of
630
integral representation of
453, 643
Laplace transform of
450
properties of
624
sequences associated with
623, 630
stretch property of
470
Dirichlet
beta function
57
eta function
100
lambda function
54
Dispersion
666
Distribution
477, 623, 642
Distributive property
136
Domain
135, 172, 175, 207
connected
192
of a complex function
135
of analyticity
209
of convergence
12, 25
simply connected
191
wall
208
Dyad representation
757
Eccentricity
390
Effective potential
386
Eigenvalue
569
complex
582
continuous
641
degeneracy of
753
degenerate
574
existence of
569, 754
of the derivative operator
627
Eigenvector
569
completeness of
570, 573, 595, 627, 757
existence of
754
generalized
374, 574, 577
linear independence of
574
of symmetric matrices
595
orthogonality of
595, 621
Einstein
526
summation convention
580, 594

Elliptic integral
363, 368
properties of
370
Energy
361, 496
conservation of
361
equipartition of
519
specific
386
Entire function
25, 162
exponential
168
growth of
202
infinite product representation of
285
order of
306
polynomial
167
transcendental
223, 306
Entropy
497, 507
of an ideal gas
520
Error
ellipse
358, 491
propagation of
356
weighting
355
Error function
113, 513
complement of
113, 117
Euler
57, 67, 71
equation
422, 474, 766
formula
61, 74
identity
143
Euler-Mascheroni constant
22, 70
integrals associated with
79, 107
series for
78
Expansion
25
absolutely convergent
15
conditionally convergent
30
importance of complex numbers in
12
Laurent
216
pole
291
semi-convergent
115
Taylor
33, 205
Exponential
25
complex
142
confluent hypergeometric representation of
704
integral
37, 340, 440
Laplace transform of
411
use in prime number theory
327
Mellin transform of
470
series expansion of
35
Factorial
34
alternating
369
integral representation of
67
of complex arguments
69, 130
Field
137, 533
Finite difference equation
473
Fluid flow
230
circulation of
236
incompressible
232
irrotational
232
lift of
244
sources and sinks of
237
stagnation points of
234, 239
streamlines of
233

velocity potential of
233
vortex of
237
Fourier
69
series
634
transform
463
continuous
463, 643
discrete
632
shift property of
464
stretch property of
464
Fractal
208
Free energy
508
Fresnel integral
37, 443
complete
73
Frobenius method
761
Fuchs theorem
763
Function
135
confluent hypergeometric representation of
704
entire
25, 34
A-points of
309
order of
305, 306
transcendental
306
hypergeometric representation of
706
infinite product representations of
52
meromorphic
66, 229, 286
singular point of
34
square-integrable
616, 629
Fundamental theorem of algebra
177, 202, 290
Galileo
380
Gamma function
67
definition of
68
duplication identity of
81, 85
failure to satisfy a differential equation
111
Gauss multiplication formula for
319
growth of
202
incomplete
412, 426, 443
infinite product of
71
integral representation of
72
integrals associated with
73, 79, 130
limit definition of
68
logarithmic convexity of
68
Nemes approximation of
129
recursion relation of
68
reflection identity of
71
special values of
72
Stirlings approximation of
73, 124
Taylor expansion of
77
unshifted version of
69
with complex argument
130
Gauss integral
72, 126, 512
Gaussian elimination
536
Gegenbauer polynomial
706
General relativity
398
modified effective potential of
399
precession prediction of
401
Generating function
682
of the Bessel functions
682
of the Hermite polynomials
694
of the Laguerre polynomials
700




of the Legendre polynomials
696
Geometric series
11
trick
37
use to establish Taylor series
205
Gibbs
617
factor
509
free energy
505
phenomenon
617, 635, 645, 669
analytic determination of
634
Glacially diverging series
21
Global warming
528
Goldbachs conjecture
324
Grahams number
99
Gram-Schmidt process
590
for polynomials
616
Green
748
function
748
theorem
189
Group
597, 601
Harmonic functions
178
Harmonic series
21
finite
70
Heat capacity
505
Heaviside step function
446, 732
Laplace transform of
447
Hermite
595
function
694
polynomial
691
confluent hypergeometric representation of
704
generating function of
694
orthogonality of
694
recursion relation of
694
Hermitian operator
620
eigenvalues of
627
with continuous spectrum
629
Hurwitz zeta function
408
Hyperbolic function
62, 142, 372
Laplace transform of
410
Hypergeometric function
409, 436, 701, 705
confluent
704
integral of
707
Laplace transform of
439
representation of elementary functions
706
representation of orthogonal polynomials
706
Ideal gas
511
constant
513
law
518
Impulse function
449
Index notation
580, 597
Index of refraction
737
Indicial equation
763
Inertial frame
360
Infinidimensional vector space
603
basis of
606
completeness of
610
completion of
615
countable
609
eigenvector basis of
628


inner product of
614
uncountable
606, 641
Infinite product
47
absolute convergence of
49
convergence of
48
divergence of
47
of entire functions
284
of meromorphic functions
293, 298
of the cosine function
53
of the Gamma function
71
of the Riemann xi function
344
of the sine function
55, 295
of the zeta function
96
representation of functions
52
Infinity
147
size of
99
Initial condition
372, 662
Inner product
589
weighted
645, 678, 691, 692, 721, 757
Integral
18
asymptotic expansion of
119, 123
bounds on
192
bounds on series
20
by differentiation
43
contour
184
equation
752
fundamental theorem of
186
of a power series
26, 36
regularization of
87, 93, 105
Integrating factor
423, 499, 510, 698, 745
Inverse matrix
558
uniqueness of
553, 561
Isomorphism
560
Jacobi polynomial
706
Jordans lemma
272, 435, 465
Joukowski profile
243
Julias theorem
312
Kepler
389
first law
391
second law
389
third law
393
Kernel
750
degenerate
753
square-integrable
761, 765
symmetric
750, 752
Kronecker delta
590, 605
Kutta-Joukowski Theorem
246
Lagrange error bound
34
Laguerre polynomial
700
confluent hypergeometric representation of
704
generating function of
700
Laplace
177
equation
178
transform
405
inverse of
427, 429
of a polynomial
405
of derivatives
407
of differential equations
412, 422

of exponential functions
of periodic functions
of the Bessel function
of the delta function
of the Heaviside step function
of the hyperbolic functions
of the sawtooth wave
of the sinc function
of the triangle wave
of trigonometric functions
shift property of
table of
use in determining integrals
use in determining series
Laurent series
of the cotangent function
Legendre
duplication identity
moment
polynomial
generating function of
recurrance relation of
Rodrigues representation of
Lemma
Lesbegue integral
Limit
comparison test
of a complex function
superior
Line integral
fundamental theorem of
Linear independence
Liouvilles theorem
Logarithm function
analytic continuation of
Laplace transform of
multivalued behavior of
principal value of
series expansion of
with complex argument
Logarithmic integral
asymptotic expansion of
principal value of
use in prime number theory
Maclaurin series
Mapping
conformal
of the complex plane
open
Matrix
adjoint of
adjugate of
augmented
column space of
diagonal
diagonalizable
eigenspace of
element

405
407
408
450
447
410
410
408
410
406
406
461
437
434
217
56
69
81, 85
697
692, 706
696
696
695
161, 611
614
13, 24
16
158
27
185, 188
361
536
162, 201
49
211
408
144
147
26
142
99
324
339
99, 338
33
135, 542
140, 141, 173
136
175
542
594, 620
558
537
544
571
572, 580
569
620

function of
Hermitian
invertible
migration
null space of
one-to-one
onto
orthogonal
rank of
row echelon form of
row space of
similarity transformation of
symmetric
target space of
trace of
transformation
transpose of
upper triangular
Maximum modulus theorem
Maxwell
equations
relation
Maxwell-Boltzmann speed distribution
characteristic values of
Meijer G-function
Meissel-Mertens constant
Mellin moment
inverse of
uniqueness of
Mellin transform
discrete
inverse of
modified
shift property of
stretch property of
Meromorphic function
infinite product representation of
logarithmic derivative of
order of
Mertens
limit
theorem
Mittag-Leffler theorem
Mbius
function
transformation
treatment of circles and lines
Modulus
Motion
circular
drag force of
first integral of
Newtons laws of
of a pendulum
period of
periodic
reduction to quadratures
second integral of

570
595, 620
549, 552
579
538
543
542
597
544
537, 546
544
572
595
542
580
542
546
537, 576
176
522
247
505
512
513
119
98, 328
469, 475
476
478
331, 469
475
470
475
469
469
66, 229, 286
298
287
293
98
328
31
291
152
330
152, 306
154
141
351
379
374
362
360
368
363
363
363
363




simple harmonic
365
Multinomial distribution
494
Multiplicative identity
136
Multiplicative inverse
137
Neighborhood
158
deleted
216
of infinity
223
Normal mode
663, 674, 683, 717, 729, 738
orthogonality of
721
Number
algebraic
608
complex
135
integer
607
rational
607
transcendental
65, 609
Occupation number
525
Orbit
387
characteristics of
396
closed
398
deflection of
396
eccentricity of
393, 395
Keplers laws of
380, 389
Milankovitch cycle of
404
natural units of
395
precession of
398, 401
scale parameter of
393
Order
finite
293
infinite
297
of a meromorphic function
293
of a pole
222
of a polynomial
307
of a zero
285
of an entire function
305, 306
Orthogonal polynomial
618, 699
Partial differential equation
651
nonlinear
665
separable
658, 676
well-posed
662
Partial fraction decomposition
257, 266, 413
Partition function
501
Pauli exclusion principle
529
Pfaffian expression
500
Phase space
512
of electromagnetic waves
524
suppression by
512
Photon
526
energy held by
527
polarization of
527
Picard
310
exceptional value
311
great theorem
312
little theorem
310
Pivot
537
Planck
525
blackbody radiation spectrum of
522
constant
524
Plasma frequency
738


Pochhammer symbol
436, 704
Pole
222
expansion
291
simple
224
Polygamma function
83
Population inversion
503
Potential
barrier
386
effective
386
energy
361
of gravity close to Earths surface
380
of universal gravity
384
retarded
467
Power series
25
Precursor
736
Brillouin
740
Sommerfeld
740
Prime number
96
bound on next
324
Chebychev bounds on
323
density of
98, 322
Dusart bound on
323
fluctuations of
341
infinite number of
97
theorem
99, 323
Principal value
147
of an improper integral
277
of the arcsine function
148
of the argument function
147, 171
of the logarithm function
147, 213
of the logarithmic integral
339
Probability distribution
481
binomial
481
continuous
486
multinomial
494
Product rule
165
for higher derivatives
262
Propagator
458
Quantum mechanics
522, 647
Radius of convergence
25
Ratio of specific heats
520, 710
Ratio test
24
Rayleigh-Jeans law
522
Regularization
75
dimensional
92, 759
of infinite series
105
of integrals
75, 76, 105, 364, 735
problems with
108
Renormalization
759
Rescaling
375, 513, 515, 665, 731
Residue
255, 335
at a pole
260
at an essential singularity
255, 264
theorem
254, 339
Riemann
56
counting function
347
hypothesis
342
integral
614

J-function
mapping theorem
sphere
xi function
zeta function
definition
special values of
RLC circuit
Rodrigues representation
of Legendre polynomials
of orthogonal polynomials
of the Laguerre polynomials
Root of unity
Root test
Rouch's theorem
Scalar multiplication
Schlfli integral
Series
absolute convergence of
alternating
binomial
bounds on
Cauchy convergence criterion for
Cauchy product of
conditionally convergent
convergence domain of
expansion of the secant function
expansion of the tangent function
geometric
integral comparison test
interesting results of
Laurent
limit comparison test
power
ratio test for
root test for
Taylor
test for divergence
Sheets of the complex plane
Simple harmonic motion
critically damped
overdamped
resonance
undamped
underdamped
Sine integral
Singularity
at infinity
essential
at infinity
behavior at
Fuchsian
isolated
of a differential equation
of a function
pole
simple
regular

329
179, 248
146
344
19
56
56
376
681
695
699
700
315
27
290
534
267
11
15
29
35
20
15
32
30
25
58
38
209
18
60
217
15
25
24
27
205
14
144
365
374, 418
372, 418
417
417
373, 418
444
12, 35
747
133, 222, 747
306
310
764
197, 214, 216, 220
422, 746
34, 35
222
224
778

removable
221
Skewes number
99
Sound wave
655, 709
speed of
710
Span
540
Special relativity
759
apparent violation of
739, 741
Specific heat
507
Spherical coordinates
146
in d dimensions
91
State variable
497
Statistical mechanics
See Thermodynamics
Steepest descents
123
Stefan-Boltzmann law
527
Stereographic projection
146
Stieltjes constants
345
Stirling
124
approximation
73, 124, 498
Nemes correction to
129
dissection
316
Sturm-Liouville
646, 691, 698
eigenvectors span the space
757
equation
752
infinite number of eigenvalues of
755
orthogonality of solutions
691
singular
778
Taylor series
33
complex
205
error in
34, 207
of elementary functions
35
of the Gamma function
77
of the secant function
58
of the tangent function
38, 54
Temperature
501
determination of
502
negative
502, 510
of planets
528
steady-state
178
Thermal conductivity
713
Thermal expansion coefficient
506
Thermodynamics
496
adiabatic processes of
500
equation of state
508
first law of
497
of an ideal gas
519
relations of
504
reversible processes of
499
second law of
498, 501
states of
497
quasi-stable
504
Time-ordering operator
584
Triangle inequality
14
Trigonometric function
373
exponential form of
61, 142
infinite product of
55
infinite product representation of
53
Laurent series of
56
product-to-sum identity of
460




series expansion of
series representation of
sum-to-product identity of
Taylor series of
Twin primes
Ultraviolet catasstrophe
Universal gravitation constant
Universal law of gravitation
Vector field
conservative
solenoidal
Vector space
adjoint of
basis of
closure of
dimension of
infinidimensional
of polynomials
orthogonal complement of
Viscosity
Wallis product
Wave
equation
in two dimensions
inversion
linear approximation of


35, 38
54
431
58
341
522
382
380
185
233, 499
232
533
589
544
533
544
603
562
592
231
56
651
655
674
729
655

speed in deep water


speed in general water depth
speed in shallow water
Wave inversion
Wavenumber
quantization of
Weierstrass
approximation theorem
theorem
Weight function
Wiens displacement law
Winding number
Work-energy theorem
Wronskian
of solutions to a differential equation
Zeta function
derivatives of
expansion of
infinite product representation of
integral representation of
nontrivial zeros of
reflection identity of
special values of
trivial zeros of
use in prime number theory

736
734
734
726
524, 718
718, 719
69
611
284
721, 752, 767
523
288
361
565, 761
567
56
101
345
96
333
339, 343
100, 333
107, 346
101, 333, 341
331
