
Classical Theoretical Physics

Alexander Altland
Contents

1 Electrodynamics  1
1.1 Magnetic field energy  1
1.2 Electromagnetic gauge field  3
1.2.1 Covariant formulation  4
1.3 Fourier transformation  7
1.3.1 Fourier transform (finite space)  8
1.3.2 Fourier transform (infinite space)  10
1.3.3 Fourier transform and solution of linear differential equations  12
1.3.4 Algebraic interpretation of Fourier transform  14
1.4 Electromagnetic waves in vacuum  20
1.4.1 Solution of the homogeneous wave equations  20
1.4.2 Polarization  21
1.5 Green function of electrodynamics  23
1.5.1 Physical meaning of the Green function  26
1.5.2 Electromagnetic gauge field and Green functions  27
1.6 Field energy and momentum  28
1.6.1 Field energy  28
1.6.2 Field momentum  30
1.6.3 Energy and momentum of plane electromagnetic waves  32
1.7 Electromagnetic radiation  32

2 Macroscopic electrodynamics  37
2.1 Macroscopic Maxwell equations  38
2.1.1 Averaged sources  38
2.2 Applications of macroscopic electrodynamics  47
2.2.1 Electrostatics in the presence of matter *  47
2.2.2 Magnetostatics in the presence of matter  50
2.3 Wave propagation in media  51
2.3.1 Model dielectric function  51
2.3.2 Plane waves in matter  52
2.3.3 Dispersion  54
2.3.4 Electric conductivity  57

3 Relativistic invariance  61
3.1 Introduction  61
3.1.1 Galilei invariance and its limitations  61
3.1.2 Einstein's postulates  63
3.1.3 First consequences  64
3.2 The mathematics of special relativity I: Lorentz group  66
3.2.1 Background  67
3.3 Aspects of relativistic dynamics  71
3.3.1 Proper time  71
3.3.2 Relativistic mechanics  72
3.4 Mathematics of special relativity II: Co- and Contravariance  76
3.4.1 Covariant and Contravariant vectors  76
3.4.2 The relativistic covariance of electrodynamics  78

4 Lagrangian mechanics  81
4.1 Variational principles  82
4.1.1 Definitions  83
4.1.2 Euler-Lagrange equations  84
4.1.3 Coordinate invariance of Euler-Lagrange equations  86
4.2 Lagrangian mechanics  89
4.2.1 The idea  89
4.2.2 Hamilton's principle  91
4.2.3 Lagrange mechanics and symmetries  92
4.2.4 Noether theorem  93
4.2.5 Examples  95

5 Hamiltonian mechanics  99
5.1 Hamiltonian mechanics  100
5.1.1 Legendre transform  100
5.1.2 Hamiltonian function  101
5.2 Phase space  106
5.2.1 Phase space and structure of the Hamilton equations  106
5.2.2 Hamiltonian flow  107
5.2.3 Phase space portraits  109
5.2.4 Poisson brackets  110
5.2.5 Variational principle  112
5.3 Canonical transformations  113
5.3.1 When is a coordinate transformation canonical?  113
5.3.2 Canonical transformations: why?  113
Chapter 1
Electrodynamics
1.1 Magnetic field energy
As a prelude to our discussion of the full set of Maxwell equations, let us address a question which, in principle, should have been answered in the previous chapter: What is the energy stored in a static magnetic field? In section ??, the analogous question for the electric field was answered in a constructive manner: we computed the mechanical energy required to build up a system of charges. It turned out that the answer could be formulated entirely in terms of the electric field, without explicit reference to the charge distribution creating it. By symmetry, one might expect a similar prescription to work in the magnetic case. Here, one would ask for the energy needed to build up a current distribution against the magnetic field created by those elements of the current distribution that have already been put in place. A moment's thought, however, shows that this strategy is not quite as straightforwardly implemented as in the electric case: no matter how slowly we move a current loop in a magnetic field, an electric field acting on the charge carriers maintaining the current in the loop will be induced by the induction law. Work will have to be done to maintain a constant current, and it is this work function that essentially enters the energy balance of the current distribution.
[Figure: a current loop carrying current I in a magnetic field B, assembled from small loops]

To make this picture quantitative, consider a current loop carrying a current $I$. We may think of this loop as being consecutively built up by importing small current loops (carrying current $I$) from infinity (see the figure). The currents flowing along adjacent segments of these loops will eventually cancel so that only the net current $I$ flowing around the boundary remains. Let us, then, compute the work that needs to be done to bring in one of these loops from infinity.

Consider, first, an ordinary point particle kept at a (mechanical) potential $U$. The rate at which this potential changes if the particle changes its position is $d_t U = d_t U(\mathbf{x}(t)) = \nabla U \cdot d_t\mathbf{x} = -\mathbf{F}\cdot\mathbf{v}$, where $\mathbf{F} = -\nabla U$ is the force acting on the particle. Specifically, for the charged particles moving in our prototypical current loops, $\mathbf{F} = q\mathbf{E}$, where $\mathbf{E}$ is the electric field induced by variation of the magnetic flux through the loop as it approaches from infinity.
Next consider an infinitesimal volume element $d^3x$ inside the loop. Assuming that the charge carriers move at a velocity $\mathbf{v}$, the charge of the volume element is given by $q = \rho\, d^3x$ and the rate of its potential change by $d_t U = -d^3x\, \rho\,\mathbf{v}\cdot\mathbf{E} = -d^3x\, \mathbf{j}\cdot\mathbf{E}$. We may now use the induction law to express the electric field in terms of magnetic quantities: $\oint d\mathbf{s}\cdot\mathbf{E} = -c^{-1}\int_S d\sigma\,\mathbf{n}\cdot d_t\mathbf{B} = -c^{-1}\int_S d\sigma\,\mathbf{n}\cdot d_t(\nabla\times\mathbf{A}) = -c^{-1}\oint d\mathbf{s}\cdot d_t\mathbf{A}$. Since this holds regardless of the geometry of the loop, we have $\mathbf{E} = -c^{-1} d_t\mathbf{A}$, where $d_t\mathbf{A}$ is the change in vector potential due to the movement of the loop. Thus, the rate at which the potential of a volume element changes is given by $d_t U = c^{-1}\, d^3x\, \mathbf{j}\cdot d_t\mathbf{A}$. Integrating over time and space, we find that the total potential energy of the loop due to the presence of a vector potential is given by

$$E = \frac{1}{c}\int d^3x\, \mathbf{j}\cdot\mathbf{A}. \tag{1.1}$$

Although derived for the specific case of a current loop, Eq. (1.1) holds for general current distributions subject to a magnetic field. (For example, for the current density carried by a point particle at $\mathbf{x}(t)$, $\mathbf{j} = q\,\delta(\mathbf{x}-\mathbf{x}(t))\,\mathbf{v}(t)$, we obtain $E = \frac{q}{c}\,\mathbf{v}(t)\cdot\mathbf{A}(\mathbf{x}(t))$, i.e. the familiar Lorentz-force contribution to the Lagrangian of a charged particle.)
Now, assume that we shift the loop at fixed current against the magnetic field. The change in potential energy corresponding to a small shift is given by $\delta E = c^{-1}\int d^3x\,\mathbf{j}\cdot\delta\mathbf{A}$, where $\delta(\nabla\times\mathbf{A}) = \delta\mathbf{B}$ denotes the change in the field strength. Using that $\nabla\times\mathbf{H} = 4\pi c^{-1}\,\mathbf{j}$, we represent $\delta E$ as

$$\delta E = \frac{1}{4\pi}\int d^3x\,(\nabla\times\mathbf{H})\cdot\delta\mathbf{A} = \frac{1}{4\pi}\int d^3x\,\epsilon_{ijk}(\partial_j H_k)\,\delta A_i = -\frac{1}{4\pi}\int d^3x\, H_k\,\epsilon_{ijk}\,\partial_j\,\delta A_i = \frac{1}{4\pi}\int d^3x\,\mathbf{H}\cdot\delta\mathbf{B},$$

where in the integration by parts we noted that due to the spatial decay of the fields no surface terms at infinity arise. Due to the linear relation $\mathbf{H} = \mu_0^{-1}\mathbf{B}$, we may write $\mathbf{H}\cdot\delta\mathbf{B} = \delta(\mathbf{H}\cdot\mathbf{B})/2$, i.e. $\delta E = \frac{1}{8\pi}\,\delta\int \mathbf{B}\cdot\mathbf{H}$. Finally, summing over all shifts required to bring the current loop in from infinity, we obtain

$$E = \frac{1}{8\pi}\int d^3x\, \mathbf{H}\cdot\mathbf{B} \tag{1.2}$$
for the magnetic field energy. Notice (a) that we have again managed to express the energy of the system entirely in terms of the fields, i.e. without explicit reference to the sources creating these fields, and (b) the structural similarity to the electric field energy (??). Finally, (c) the above construction shows that a separation of electromagnetism into static and dynamic phenomena, resp., is not quite as unambiguous as one might have expected: the derivation of the magnetostatic field energy relied on the dynamic law of induction. We next proceed to embed the general static theory as described by the concept of electric and magnetic potentials into the larger framework of electrodynamics.
Thus, from now on, we are up to the theory of the full set of Maxwell equations

$$\nabla\cdot\mathbf{D} = 4\pi\rho, \tag{1.3}$$
$$\nabla\times\mathbf{H} - \frac{1}{c}\,\partial_t\mathbf{D} = \frac{4\pi}{c}\,\mathbf{j}, \tag{1.4}$$
$$\nabla\times\mathbf{E} + \frac{1}{c}\,\partial_t\mathbf{B} = 0, \tag{1.5}$$
$$\nabla\cdot\mathbf{B} = 0. \tag{1.6}$$
1.2 Electromagnetic gauge field

Consider the full set of Maxwell equations in vacuum ($\mathbf{E} = \mathbf{D}$ and $\mathbf{B} = \mathbf{H}$),

$$\nabla\cdot\mathbf{E} = 4\pi\rho, \qquad \nabla\times\mathbf{B} - \frac{1}{c}\,\partial_t\mathbf{E} = \frac{4\pi}{c}\,\mathbf{j}, \qquad \nabla\times\mathbf{E} + \frac{1}{c}\,\partial_t\mathbf{B} = 0, \qquad \nabla\cdot\mathbf{B} = 0. \tag{1.7}$$
As in previous sections we will try to use constraints inherent to these equations to compactify them to a smaller set of equations. As in section ??, the equation $\nabla\cdot\mathbf{B} = 0$ implies

$$\mathbf{B} = \nabla\times\mathbf{A}. \tag{1.8}$$

(However, we may no longer expect $\mathbf{A}$ to be time independent.) Now, substitute this representation into the law of induction: $\nabla\times(\mathbf{E} + c^{-1}\partial_t\mathbf{A}) = 0$. This implies that $\mathbf{E} + c^{-1}\partial_t\mathbf{A} = -\nabla\phi$ can be written as the gradient of a scalar potential, or

$$\mathbf{E} = -\nabla\phi - \frac{1}{c}\,\partial_t\mathbf{A}. \tag{1.9}$$

We have, thus, managed to represent the electromagnetic fields in terms of derivatives of a generalized four-component potential $A = (\phi, \mathbf{A})$. (The negative sign multiplying $\nabla\phi$ has been introduced for later reference.) Substituting Eqs. (1.8) and (1.9) into the inhomogeneous Maxwell equations, we obtain

$$-\Delta\phi - \frac{1}{c}\,\partial_t\nabla\cdot\mathbf{A} = 4\pi\rho,$$
$$-\Delta\mathbf{A} + \frac{1}{c^2}\,\partial_t^2\mathbf{A} + \nabla(\nabla\cdot\mathbf{A}) + \frac{1}{c}\,\partial_t\nabla\phi = \frac{4\pi}{c}\,\mathbf{j}. \tag{1.10}$$
These equations do not look particularly inviting. However, as in section ?? we may observe that the choice of the generalized vector potential $A$ is not unique; this freedom can be used to transform Eq. (1.10) into a more manageable form: For an arbitrary function $f: \mathbb{R}^3\times\mathbb{R}\to\mathbb{R}$, $(\mathbf{x}, t)\mapsto f(\mathbf{x}, t)$, the transformation $\mathbf{A}\to\mathbf{A}+\nabla f$ leaves the magnetic field unchanged, while the electric field changes according to $\mathbf{E}\to\mathbf{E}-c^{-1}\partial_t\nabla f$. If, however, we synchronously redefine the scalar potential as $\phi\to\phi - c^{-1}\partial_t f$, the electric field, too, will not be affected by the transformation. Summarizing, the generalized gauge transformation

$$\mathbf{A}\to\mathbf{A}+\nabla f, \qquad \phi\to\phi-\frac{1}{c}\,\partial_t f, \tag{1.11}$$

leaves the electromagnetic fields (1.8) and (1.9) unchanged.
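INFO The gauge invariance just stated can also be verified by machine. The following is a minimal symbolic sketch (an illustration added here, not part of the original text; it uses the sympy library with generic placeholder functions for $\phi$, $\mathbf{A}$ and $f$) checking that the transformation (1.11) leaves the fields (1.8) and (1.9) unchanged:

```python
import sympy as sp

# Symbolic check of gauge invariance under Eq. (1.11):
#   A -> A + grad f,   phi -> phi - (1/c) d_t f.
x, y, z, t, c = sp.symbols('x y z t c')
r = (x, y, z)
A = [sp.Function(f'A{i}')(x, y, z, t) for i in range(3)]   # generic vector potential
phi = sp.Function('phi')(x, y, z, t)                       # generic scalar potential
f = sp.Function('f')(x, y, z, t)                           # arbitrary gauge function

def fields(A, phi):
    """E and B from Eqs. (1.9) and (1.8)."""
    E = [-sp.diff(phi, xi) - sp.diff(Ai, t) / c for xi, Ai in zip(r, A)]
    B = [sp.diff(A[(i + 2) % 3], r[(i + 1) % 3])
         - sp.diff(A[(i + 1) % 3], r[(i + 2) % 3]) for i in range(3)]
    return E, B

A_new = [Ai + sp.diff(f, xi) for Ai, xi in zip(A, r)]      # A -> A + grad f
phi_new = phi - sp.diff(f, t) / c                          # phi -> phi - c^{-1} d_t f

E0, B0 = fields(A, phi)
E1, B1 = fields(A_new, phi_new)
print(all(sp.simplify(a - b) == 0 for a, b in zip(E0 + B0, E1 + B1)))  # True
```

The check boils down to the symmetry of mixed partial derivatives: $\nabla\times\nabla f = 0$ removes $f$ from $\mathbf{B}$, and $\partial_t\nabla f = \nabla\partial_t f$ cancels it in $\mathbf{E}$.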
The gauge freedom may be used to transform the vector potential into one of several convenient representations. Of particular relevance to the solution of the time dependent Maxwell equations is the so-called Lorentz gauge

$$\nabla\cdot\mathbf{A} + \frac{1}{c}\,\partial_t\phi = 0. \tag{1.12}$$

Importantly, it is always possible to satisfy this condition by a suitable gauge transformation. Indeed, if $(\phi', \mathbf{A}')$ does not obey the Lorentz condition, i.e. $\nabla\cdot\mathbf{A}' + \frac{1}{c}\partial_t\phi' \neq 0$, we may define a transformed potential $(\phi, \mathbf{A})$ through $\phi = \phi' - c^{-1}\partial_t f$ and $\mathbf{A} = \mathbf{A}' + \nabla f$. We now require that the newly defined potential obey (1.12). This is equivalent to the condition

$$\Big(\Delta - \frac{1}{c^2}\,\partial_t^2\Big) f = -\Big(\nabla\cdot\mathbf{A}' + \frac{1}{c}\,\partial_t\phi'\Big),$$

i.e. a wave equation with inhomogeneity $-(\nabla\cdot\mathbf{A}' + c^{-1}\partial_t\phi')$. We will see in a moment that such equations can always be solved, i.e.

we may always choose a potential $(\phi, \mathbf{A})$ so as to obey the Lorentz gauge condition.
In the Lorentz gauge, the Maxwell equations assume the simplified form

$$\Big(\Delta - \frac{1}{c^2}\,\partial_t^2\Big)\phi = -4\pi\rho, \qquad \Big(\Delta - \frac{1}{c^2}\,\partial_t^2\Big)\mathbf{A} = -\frac{4\pi}{c}\,\mathbf{j}. \tag{1.13}$$

In combination with the gauge condition (1.12), Eqs. (1.13) are fully equivalent to the set of Maxwell equations (1.7).
1.2.1 Covariant formulation

The formulation introduced above is mathematically efficient; however, it has an unsatisfactory touch to it: the gauge transformation (1.11) couples the zeroth component $\phi$ with the vectorial components $\mathbf{A}$. While this indicates that the four components $(\phi, \mathbf{A})$ should form an entity, the equations above treat $\phi$ and $\mathbf{A}$ separately. In the following we show that there exists a much more concise and elegant covariant formulation of electrodynamics that transcends this separation. At this point, we introduce the covariant notation merely as a mnemonic aid. Later in the text, we will understand the physical principles behind the new formulation.
Definitions

Consider a vector $x \in \mathbb{R}^4$. We represent this object in terms of four components $x^\mu$, $\mu = 0, 1, 2, 3$, where $x^0$ will be called the time-like component and $x^i$, $i = 1, 2, 3$, space-like components.¹ Four-component objects of this structure will be called contravariant vectors. To a contravariant vector $x^\mu$, we associate a covariant vector $x_\mu$ (notice the positioning of the indices as superscripts and subscripts, resp.) through

$$x_0 = x^0, \qquad x_i = -x^i. \tag{1.14}$$

A co- and a contravariant vector $x_\mu$ and $x'^\mu$ can be contracted to produce a number:²

$$x_\mu x'^\mu \equiv x^0 x'^0 - x^i x'^i.$$

Here, we are employing the Einstein summation convention according to which pairs of indices are summed over, $x_\mu x'^\mu \equiv \sum_{\mu=0}^{3} x_\mu x'^\mu$, etc. Notice that $x_\mu x'^\mu = x^\mu x'_\mu$.

We next introduce a number of examples of co- and contravariant vectors that will be of importance below:
- Consider a point in space-time $(t, \mathbf{r})$ defined by a time argument and three real space components $r^i$, $i = 1, 2, 3$. We store this information in a so-called contravariant four-component vector $x^\mu$, $\mu = 0, 1, 2, 3$, as $x^0 = ct$, $x^i = r^i$, $i = 1, 2, 3$. Occasionally, we write $x \equiv x^\mu = (ct, \mathbf{x})$ for short. The associated covariant vector has components $x_\mu = (ct, -\mathbf{r})$.

- Derivatives taken w.r.t. the components of the contravariant 4-vector are denoted by $\partial_\mu = (c^{-1}\partial_t, \nabla_{\mathbf{r}})$. Notice that this is a covariant object. Its contravariant partner is given by $\partial^\mu = (c^{-1}\partial_t, -\nabla_{\mathbf{r}})$.

- The 4-current $j \equiv j^\mu$ is defined through $j = (c\rho, \mathbf{j})$, where $\rho$ and $\mathbf{j}$ are charge density and vectorial current density, resp.

- The 4-vector potential $A \equiv A^\mu$ is defined through $A = (\phi, \mathbf{A})$.

We next engage these definitions to introduce the

¹ Following a widespread convention, we denote four-component indices $\mu, \nu, \ldots = 0, 1, 2, 3$ by Greek letters (from the middle of the alphabet). Space-like indices $i, j, \ldots = 1, 2, 3$ are denoted by (mid-alphabet) Latin letters.
² Although one may formally consider sums $x_\mu x'_\mu$, objects of this type do not make sense in the present formalism.
Covariant representation of electrodynamics

Above, we saw that the freedom to represent the potentials $(\phi, \mathbf{A})$ in different gauges may be used to obtain convenient formulations of electrodynamics. Within the covariant formalism, the general gauge transformation (1.11) assumes the form

$$A^\mu \to A^\mu - \partial^\mu f. \tag{1.15}$$

(Exercise: convince yourself that this is equivalent to (1.11).) Specifically, this freedom may be used to realize the Lorentz gauge,

$$\partial_\mu A^\mu = 0. \tag{1.16}$$

Once more, we verify that any 4-potential may be transformed to one that obeys the Lorentz condition: for if $A'^\mu$ obeys $\partial_\mu A'^\mu = g$, where $g$ is some function, we may define the transformed potential $A^\mu \equiv A'^\mu - \partial^\mu f$. Imposing on $A^\mu$ the Lorentz condition,

$$0 \stackrel{!}{=} \partial_\mu(A'^\mu - \partial^\mu f) = g - \partial_\mu\partial^\mu f,$$

we obtain the equivalent condition $\Box f \stackrel{!}{=} g$. Here, we have introduced the d'Alembert operator

$$\Box \equiv \partial_\mu\partial^\mu = \frac{1}{c^2}\,\partial_t^2 - \Delta. \tag{1.17}$$
(In a sense, this is a four-dimensional generalization of the Laplace operator, hence the four-cornered square.) This is the covariant formulation of the condition discussed above. A gauge transformation $A^\mu \to A^\mu - \partial^\mu f$ of a potential $A^\mu$ in the Lorentz gauge class $\partial_\mu A^\mu = 0$ is compatible with the Lorentz condition iff $\Box f = 0$. In other words, the class of Lorentz gauge vector potentials still contains a huge gauge freedom.

The general representation of the Maxwell equations through potentials (1.10) affords the 4-representation

$$\Box A^\mu - \partial^\mu(\partial_\nu A^\nu) = \frac{4\pi}{c}\, j^\mu.$$

We may act on this equation with $\partial_\mu$ to obtain the constraint

$$\partial_\mu j^\mu = 0, \tag{1.18}$$

which we identify as the covariant formulation of charge conservation, $\partial_\mu j^\mu = \partial_t\rho + \nabla\cdot\mathbf{j} = 0$: the Maxwell equations imply current conservation. Finally, imposing the Lorentz condition, $\partial_\mu A^\mu = 0$, we obtain the simplified representation

$$\Box A^\mu = \frac{4\pi}{c}\, j^\mu, \qquad \partial_\mu A^\mu = 0. \tag{1.19}$$
Before turning to the discussion of the solution of these equations, a few general remarks are in order:

- The Lorentz condition is the prevalent gauge choice in electrodynamics because (a) it brings the Maxwell equations into a maximally simple form and (b) it will turn out below to be invariant under the most general class of coordinate transformations, the Lorentz transformations (see below). (It is worthwhile to note that the Lorentz condition does not unambiguously fix the gauge of the vector potential. Indeed, we may modify any Lorentz gauge vector potential $A^\mu$ as $A^\mu - \partial^\mu f$, where $f$ satisfies the homogeneous wave equation $\Box f = 0$. This doesn't alter the condition $\partial_\mu A^\mu = 0$.) Other gauge conditions frequently employed include the Coulomb gauge or radiation gauge $\nabla\cdot\mathbf{A} = 0$ (employed earlier in section ??). The advantage of this gauge is that the scalar Maxwell equation assumes the simple form of a Poisson equation, $\Delta\phi(\mathbf{x}, t) = -4\pi\rho(\mathbf{x}, t)$, which is solved by $\phi(\mathbf{x}, t) = \int d^3x'\,\rho(\mathbf{x}', t)/|\mathbf{x}-\mathbf{x}'|$, i.e. by an instantaneous Coulomb potential (hence the name Coulomb gauge). This gauge representation has also proven advantageous within the context of quantum electrodynamics. However, we won't discuss it any further in this text.
- The passage from the physical fields $\mathbf{E}, \mathbf{B}$ to potentials $A^\mu$ as fundamental variables implies a convenient reduction of variables. Working with fields, we have $2\times 3$ vectorial components, which, however, are not independent: equations such as $\nabla\cdot\mathbf{B} = 0$ represent constraints. This should be compared to the four components of the potential $A^\mu$. The gauge freedom implies that of the four components only three are independent (think about this point). In general, no further reduction in the number of variables is possible. To see this, notice that the electromagnetic fields are determined by the sources of the theory, i.e. the 4-current $j^\mu$. The continuity equation $\partial_\mu j^\mu = 0$ reduces the number of independent sources down to 3. We certainly need three independent field components to describe the field patterns created by the three independent sources. (The electromagnetic field in a source-free environment, $j^\mu = 0$, is described in terms of only two independent components. This can be seen, e.g., by going to the Coulomb gauge. The above equation for the time-like component of the potential is then uniquely solved by $\phi = 0$. Thus, only the two independent components of $\mathbf{A}$, constrained by $\nabla\cdot\mathbf{A} = 0$, remain.)
1.3 Fourier transformation

In this purely mathematical section, we introduce the concept of Fourier transformation. We start with a formal definition of the Fourier transform as an abstract integral transformation. Later in the section, we will discuss the application of Fourier transformation in the solution of linear differential equations, and its conceptual interpretation as a basis change in function space.
1.3.1 Fourier transform (finite space)

Consider a function $f: [0, L] \to \mathbb{C}$ on an interval $[0, L] \subset \mathbb{R}$. Conceptually, the Fourier transformation is a representation of $f$ as a superposition (a sum) of plane waves:

$$f(x) = \frac{1}{L}\sum_k e^{ikx} f_k, \tag{1.20}$$
where the sum runs over all values $k = 2\pi n/L$, $n \in \mathbb{Z}$. Series representations of this type are summarily called Fourier series representations. At this point, we haven't proven that the function $f$ can actually be represented by a series of this type. However, if it exists, the coefficients $f_k$ can be computed with the help of the identity (prove it)

$$\frac{1}{L}\int_0^L dx\, e^{-ikx}\, e^{ik'x} = \delta_{kk'}, \tag{1.21}$$

where $k = 2\pi n/L$, $k' = 2\pi n'/L$ and $\delta_{kk'} \equiv \delta_{nn'}$. We just need to multiply Eq. (1.20) by the function $e^{-ikx}$ and integrate over $x$ to obtain

$$\int_0^L dx\, e^{-ikx} f(x) \stackrel{(1.20)}{=} \frac{1}{L}\int_0^L dx\, e^{-ikx} \sum_{k'} e^{ik'x} f_{k'} = \sum_{k'} f_{k'}\,\frac{1}{L}\int_0^L dx\, e^{-ikx} e^{ik'x} \stackrel{(1.21)}{=} f_k.$$
We call the numbers $f_k \in \mathbb{C}$ the Fourier coefficients of the function $f$. The relation

$$f_k = \int_0^L dx\, e^{-ikx} f(x), \tag{1.22}$$

connecting a function with its Fourier coefficients, is called the Fourier transform of $f$. The inverse identity (1.20) expressing $f$ through its Fourier coefficients is called the inverse Fourier transform. To repeat, at this point we haven't proven the existence of the inverse transform, i.e. the possibility to reconstruct a function $f$ from its Fourier coefficients $f_k$.
A criterion for the existence of the Fourier transform is that the insertion of (1.22) into the r.h.s. of (1.20) gives back the original function $f$:

$$f(x) \stackrel{!}{=} \frac{1}{L}\sum_k e^{ikx}\int_0^L dx'\, e^{-ikx'} f(x') = \int_0^L dx'\,\frac{1}{L}\sum_k e^{ik(x-x')}\, f(x').$$

A necessary and sufficient condition for this to hold is that

$$\frac{1}{L}\sum_k e^{ik(x-x')} = \delta(x - x'), \tag{1.23}$$

where $\delta(x)$ is the Dirac $\delta$-function. For reasons that will become transparent below, Eq. (1.23) is called a completeness relation. Referring for a proof of this relation to the exercise below, we here merely conclude that Eq. (1.23) establishes Eq. (1.22) and its inverse (1.20) as an integral transform, i.e. the full information carried by $f(x)$ is contained in the set of all Fourier coefficients $f_k$. By way of example, figure 1.1 illustrates the approximation of a sawtooth-shaped function $f$ by Fourier series truncated after a finite number of terms.

Figure 1.1: Representation of a function (thick line) by Fourier series expansion. The graphs show series expansions truncated after (1, 3, 5, 7, 9) terms.
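The truncation experiment of figure 1.1 is easily reproduced numerically. The following sketch (added here as an illustration, not part of the original text) uses a sawtooth $f(x) = x/L - 1/2$, whose coefficients $f_k = iL/(2\pi n)$ for $k = 2\pi n/L$, $n \neq 0$ (and $f_0 = 0$) follow from Eq. (1.22) by a single integration by parts, and evaluates the partial sums of (1.20):

```python
import numpy as np

# Partial Fourier sums (1.20) for the sawtooth f(x) = x/L - 1/2 on [0, L).
L = 1.0
x = np.linspace(0.0, L, 400, endpoint=False)
f_exact = x / L - 0.5

def partial_sum(n_max):
    """(1/L) * sum over 0 < |n| <= n_max of exp(ikx) f_k, with f_k = i L/(2 pi n)."""
    s = np.zeros_like(x, dtype=complex)
    for n in range(1, n_max + 1):
        k = 2 * np.pi * n / L
        f_k = 1j * L / (2 * np.pi * n)
        s += (f_k * np.exp(1j * k * x) + np.conj(f_k) * np.exp(-1j * k * x)) / L
    return s.real

for n_max in (1, 3, 5, 7, 9):
    err = np.max(np.abs(partial_sum(n_max) - f_exact)[20:-20])  # away from the jump
    print(f"n_max = {n_max}: max deviation = {err:.3f}")
```

The deviation shrinks with growing n_max everywhere except in the immediate vicinity of the discontinuity, where the familiar Gibbs overshoot survives.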
EXERCISE Prove the completeness relation. To this end, add an infinitesimal convergence generating factor $\eta > 0$ to the sum:

$$\frac{1}{L}\sum_k e^{ik(x-x') - \eta|k|} = \frac{1}{L}\sum_{n=-\infty}^{\infty} e^{2\pi i n(x-x')/L - 2\pi\eta|n|/L}.$$

Next compute the sum as a convergent geometric series. Take the limit $\eta \to 0$ and convince yourself that you obtain a representation of the Dirac $\delta$-function.
It is straightforward to generalize the formulae above to functions defined in higher dimensional spaces: consider a function $f: M \to \mathbb{C}$, where $M = [0, L_1]\times[0, L_2]\times\cdots\times[0, L_d]$ is a $d$-dimensional cuboid. It is straightforward to check that the generalization of the formulae above reads

$$f(\mathbf{x}) = \prod_{l=1}^{d}\frac{1}{L_l}\,\sum_{\mathbf{k}} e^{i\mathbf{k}\cdot\mathbf{x}}\, f_{\mathbf{k}}, \qquad f_{\mathbf{k}} = \int_M d^dx\, e^{-i\mathbf{x}\cdot\mathbf{k}}\, f(\mathbf{x}), \tag{1.24}$$

where the sum runs over all vectors $\mathbf{k} = (k_1, \ldots, k_d)$, $k_l = 2\pi n_l/L_l$, $n_l \in \mathbb{Z}$. Eqs. (1.21) and (1.23) generalize to

$$\prod_{l=1}^{d}\frac{1}{L_l}\int_M d^dx\, e^{i(\mathbf{k}-\mathbf{k}')\cdot\mathbf{x}} = \delta_{\mathbf{k},\mathbf{k}'}, \qquad \prod_{l=1}^{d}\frac{1}{L_l}\,\sum_{\mathbf{k}} e^{i\mathbf{k}\cdot(\mathbf{x}-\mathbf{x}')} = \delta(\mathbf{x} - \mathbf{x}'),$$

where $\delta_{\mathbf{k},\mathbf{k}'} \equiv \prod_{l=1}^{d}\delta_{k_l, k'_l}$.
Finally, the Fourier transform of vector valued functions $f: M \to \mathbb{C}^n$ is defined through the Fourier transform of its components, i.e. $f_{\mathbf{k}}$ is an $n$-dimensional vector whose components $f^l_{\mathbf{k}}$ are given by the Fourier transforms of the component functions $f^l(\mathbf{x})$, $l = 1, \ldots, n$.
This completes the definition of Fourier transformation in finite integration domains. At this point, a few general remarks are in order:

- Why is Fourier transformation useful? There are many different answers to this question, which underpins the fact that we are dealing with an important concept: In many areas of electrical engineering, sound engineering, etc., it is customary to work with wave-like signals, i.e. signals of harmonic modulation $\mathrm{Re}\, e^{ikx} = \cos(kx)$. The (inverse) Fourier transform is but the decomposition of a general signal into harmonic constituents. Similarly, many signals encountered in nature (water or sound waves reflected from a complex pattern of obstacles, light reflected from rough surfaces, etc.) can be interpreted in terms of a superposition of planar waves. Again, this is a situation where a Fourier mode decomposition is useful. On a different note, Fourier transformation is an extremely potent tool in the solution of linear differential equations. This is a subject to which we will turn below.

- There is some freedom in the definitions of Fourier transforms. For example, we might have chosen $f(x) = \frac{c}{L}\sum_k e^{ikx} f_k$ with $f_k = \frac{1}{c}\int dx\, e^{-ikx} f(x)$ and an arbitrary constant $c$. Or, $f(x) = \frac{1}{L}\sum_k e^{-ikx} f_k$ with $f_k = \int dx\, e^{ikx} f(x)$ (inverted sign of the phases), or combinations thereof. The actual choice of convention is often a matter of convenience.
1.3.2 Fourier transform (infinite space)

In many applications, we want to consider functions $f: \mathbb{R}^d \to \mathbb{C}$ without explicit reference to some finite cuboid-shaped integration domain. Formally, the extension to infinite space may be effected by sending the extension of the integration interval, $L$, to infinity. In the limit $L \to \infty$, the spacing of consecutive Fourier modes, $\Delta k = 2\pi/L$, approaches zero, and Eq. (1.20) is identified as the Riemann sum representation of an integral,

$$\lim_{L\to\infty} f(x) = \lim_{L\to\infty}\frac{1}{L}\sum_k e^{ikx} f_k = \frac{1}{2\pi}\lim_{\Delta k\to 0}\Delta k\sum_{n\in\mathbb{Z}} e^{ik_n x}\, f_{k_n} = \frac{1}{2\pi}\int dk\, e^{ikx} f(k).$$

Following a widespread convention, we denote the Fourier representation of functions in infinite space by $f(k)$ instead of $f_k$. Explicit reference to the argument $k$ will prevent us from confusing the Fourier representation $f(k)$ with the original function $f(x)$.

The infinite space version of the transform (1.22) reads as

$$f(k) = \int_{-\infty}^{\infty} dx\, e^{-ikx} f(x). \tag{1.25}$$
While this definition is formally consistent, it has a problem: only functions decaying sufficiently fast at infinity so as to make $f(x)\exp(-ikx)$ integrable can be transformed in this way. This makes numerous function classes of applied interest (polynomials, various types of transcendental functions, etc.) non-transformable and severely restricts the usefulness of the formalism.

However, by a little trick, we may upgrade the definition to one that is far more useful. Let us modify the transform according to

$$f(k) = \lim_{\epsilon\to 0}\int dx\, e^{-ikx - \epsilon x^2} f(x). \tag{1.26}$$

Due to the inclusion of the convergence generating factor, $\epsilon$, any function that does not increase faster than exponentially can be transformed! Similarly, it will be useful to include a convergence generating factor in the reverse transform, i.e.

$$f(x) = \lim_{\epsilon\to 0}\frac{1}{2\pi}\int dk\, e^{ikx - \epsilon k^2} f(k).$$

Finally, upon straightforward extension of these formulae to multidimensional functions $f: \mathbb{R}^d \to \mathbb{C}$, we arrive at the definition of the Fourier transform in infinite space

$$f(\mathbf{x}) = \lim_{\epsilon\to 0}\int \frac{d^dk}{(2\pi)^d}\, e^{i\mathbf{x}\cdot\mathbf{k} - \epsilon k^2} f(\mathbf{k}), \qquad f(\mathbf{k}) = \lim_{\epsilon\to 0}\int d^dx\, e^{-i\mathbf{k}\cdot\mathbf{x} - \epsilon x^2} f(\mathbf{x}). \tag{1.27}$$

In these formulae, $x^2 \equiv \mathbf{x}\cdot\mathbf{x}$ is the squared norm of the vector $\mathbf{x}$, etc. A few more general remarks on these definitions:

- To keep the notation simple, we will suppress the explicit reference to the convergence generating factor, unless convergence is an issue.
- Occasionally, it will be useful to work with convergence generating factors different from the one above. For example, in one dimension, it may be convenient to effect convergence by the definition

$$f(x) = \lim_{\epsilon\to 0}\frac{1}{2\pi}\int dk\, e^{ixk - \epsilon|k|} f(k), \qquad f(k) = \lim_{\epsilon\to 0}\int dx\, e^{-ikx - \epsilon|x|} f(x).$$

- For later reference, we note that the generalization of the completeness relation to infinite space reads as

$$\int \frac{d^dk}{(2\pi)^d}\, e^{i\mathbf{k}\cdot(\mathbf{x}-\mathbf{x}')} = \delta(\mathbf{x} - \mathbf{x}'). \tag{1.28}$$
- Finally, one may wonder whether the inclusion of the convergence generating factor might not spoil the consistency of the transform. To check that this is not the case, we need to verify the infinite space variant of the completeness relation (1.23), i.e.

$$\lim_{\epsilon\to 0}\int \frac{d^dk}{(2\pi)^d}\, e^{i\mathbf{k}\cdot(\mathbf{x}-\mathbf{x}') - \epsilon k^2} = \delta(\mathbf{x} - \mathbf{x}'). \tag{1.29}$$

(For only if this relation holds do the two equations in (1.27) form a consistent pair, cf. the argument in section 1.3.1.)

EXERCISE Prove relation (1.29). Do the Gaussian integral over $\mathbf{k}$ (hint: notice that the integral factorizes into $d$ one-dimensional Gaussian integrals) to obtain a result that in the limit $\epsilon \to 0$ asymptotes to a representation of the $\delta$-function.
1.3.3 Fourier transform and solution of linear differential equations

One of the most striking features of the Fourier transform is that it turns derivative operations (complicated) into algebraic multiplications (simple): consider the function $\partial_x f(x)$.³ Without much loss of generality, we assume the function $f: [0, L] \to \mathbb{C}$ to obey periodic boundary conditions, $f(0) = f(L)$. A straightforward integration by parts shows that the Fourier coefficients of $\partial_x f$ are given by $ik f_k$. In other words, denoting the passage from a function to its Fourier transform by an arrow,

$$f(x) \longrightarrow f_k \quad \Longrightarrow \quad \partial_x f(x) \longrightarrow ik\, f_k.$$

This operation can be iterated to give

$$\partial_x^n f(x) \longrightarrow (ik)^n f_k.$$

In a higher dimensional setting these identities generalize to (think why!)

$$\partial_l f(\mathbf{x}) \longrightarrow (ik_l)\, f_{\mathbf{k}},$$

where $\partial_l \equiv \partial_{x_l}$ and $k_l$ is the $l$-component of the vector $\mathbf{k}$. Specifically, the operations of vector analysis become purely algebraic operations:

$$\nabla f(\mathbf{x}) \longrightarrow i\mathbf{k}\, f_{\mathbf{k}}, \qquad \nabla\cdot\mathbf{f}(\mathbf{x}) \longrightarrow i\mathbf{k}\cdot\mathbf{f}_{\mathbf{k}}, \qquad \nabla\times\mathbf{f}(\mathbf{x}) \longrightarrow i\mathbf{k}\times\mathbf{f}_{\mathbf{k}}, \qquad \Delta f(\mathbf{x}) \longrightarrow -(\mathbf{k}\cdot\mathbf{k})\, f_{\mathbf{k}}, \tag{1.30}$$

etc. Relations of this type are of invaluable use in the solution of linear differential equations (of which electrodynamics is crawling!)

³ Although our example is one-dimensional, we write $\partial_x$ instead of $\frac{d}{dx}$. This is in anticipation of our higher-dimensional applications below.
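Before putting these relations to work analytically, here is a quick numerical illustration of (1.30) (an added sketch, not part of the original text): differentiation of a periodic function amounts to multiplying its FFT coefficients by $ik$.

```python
import numpy as np

# Spectral differentiation: transform, multiply by ik, transform back.
L, N = 2 * np.pi, 256
x = np.linspace(0.0, L, N, endpoint=False)
f = np.exp(np.sin(x))                              # smooth L-periodic test function
k = 2 * np.pi * np.fft.fftfreq(N, d=L / N)         # wave numbers k = 2 pi n / L

df_spectral = np.fft.ifft(1j * k * np.fft.fft(f)).real
df_exact = np.cos(x) * np.exp(np.sin(x))
print(np.max(np.abs(df_spectral - df_exact)))      # machine-precision agreement
```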
To begin with, consider a linear differential equation with constant coefficients. This is a differential equation of the structure

$$\big(c^{(0)} + c^{(1)}_i \partial_i + c^{(2)}_{ij}\,\partial_i\partial_j + \ldots\big)\,\phi(\mathbf{x}) = \rho(\mathbf{x}),$$

where the $c^{(n)}_{i,j,\ldots} \in \mathbb{C}$ are constants. We may now Fourier transform both sides of the equation to obtain

$$\big(c^{(0)} + i c^{(1)}_i k_i - c^{(2)}_{ij}\, k_i k_j + \ldots\big)\,\phi_{\mathbf{k}} = \rho_{\mathbf{k}}.$$

This is an algebraic equation which is readily solved as

$$\phi_{\mathbf{k}} = \frac{\rho_{\mathbf{k}}}{c^{(0)} + i c^{(1)}_i k_i - c^{(2)}_{ij}\, k_i k_j + \ldots}.$$

The solution $\phi(\mathbf{x})$ can now be obtained by computation of the inverse Fourier transform,

$$\phi(\mathbf{x}) = \int \frac{d^dk}{(2\pi)^d}\,\frac{\rho_{\mathbf{k}}\, e^{i\mathbf{k}\cdot\mathbf{x}}}{c^{(0)} + i c^{(1)}_i k_i - c^{(2)}_{ij}\, k_i k_j + \ldots}. \tag{1.31}$$

Fourier transform methods thus reduce the solution of linear partial differential equations with constant coefficients to the computation of a definite integral. Technically, this integral may be difficult to compute; however, the conceptual task of solving the differential equation is under control.
EXAMPLE By way of example, consider the second order ordinary differential equation

$$\big(-\partial_x^2 + \lambda^2\big)\,\phi(x) = c\,\delta(x - x_0),$$

where $\lambda$, $c$ and $x_0$ are constants. Application of formula (1.31) gives a⁴ solution

$$\phi(x) = c\int \frac{dk}{2\pi}\,\frac{e^{ikx}\, e^{-ikx_0}}{k^2 + \lambda^2}.$$

The integral above can be done by the method of residues (or looked up in a table). As a result we obtain the solution

$$\phi(x) = \frac{c}{2\lambda}\, e^{-\lambda|x - x_0|}.$$

(Exercise: convince yourself that $\phi$ is a solution to the differential equation.) In the limit $\lambda \to 0$, the differential equation collapses to a one-dimensional variant of the Poisson equation, $-\partial_x^2\phi(x) = c\,\delta(x - x_0)$, for a charge $c/4\pi$ at $x_0$. Up to an additive constant, our solution asymptotes to

$$\phi(x) = -\frac{c}{2}\,|x - x_0|,$$

which we may interpret as a one-dimensional version of the Coulomb potential.

⁴ The solution is not unique: addition of a solution $\phi_0$ of the homogeneous equation $(-\partial_x^2 + \lambda^2)\,\phi_0(x) = 0$ gives another solution $\phi(x) \to \phi(x) + \phi_0(x)$ of the inhomogeneous differential equation.
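The same example also makes a good numerical test case for the recipe (1.31). The sketch below (an added illustration with arbitrarily chosen parameters; the discrete $\delta$ and the periodic box are numerical stand-ins for the continuum setting) divides the Fourier transform of the source by $k^2 + \lambda^2$ and compares with the analytic solution:

```python
import numpy as np

# Solve (-d^2/dx^2 + lam^2) phi = c delta(x - x0) spectrally on a periodic box.
L, N, lam, c, x0 = 40.0, 4096, 1.5, 2.0, 0.0
x = np.linspace(-L / 2, L / 2, N, endpoint=False)
dx = L / N
k = 2 * np.pi * np.fft.fftfreq(N, d=dx)

source = np.zeros(N)
source[np.argmin(np.abs(x - x0))] = c / dx          # discrete delta of weight c

phi = np.fft.ifft(np.fft.fft(source) / (k**2 + lam**2)).real
phi_exact = (c / (2 * lam)) * np.exp(-lam * np.abs(x - x0))
print(np.max(np.abs(phi - phi_exact)))              # small; limited by grid resolution
```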
However, the usefulness of Fourier transform methods has its limits. The point is that the Fourier transform does not only turn complicated operations (derivatives) simple; the converse is also true. The crux of the matter is expressed by the so-called convolution theorem: straightforward application of the definitions (1.27) and of the completeness relation (1.29) shows that

$$f(x)\,g(x) \longrightarrow (f * g)(k) \equiv \int \frac{d^dp}{(2\pi)^d}\, f(k - p)\, g(p). \tag{1.32}$$

In other words, the Fourier transform turns a simple product operation into a non-local integral convolution. Specifically, this means that the Fourier transform representation of (linear) differential equations with spatially varying coefficient functions will contain convolution operations. Needless to say, the solution of the ensuing integral equations can be complicated.

Finally, notice the mathematical similarity between the Fourier transform $f(x) \to f(k)$ and its reverse $f(k) \to f(x)$. Apart from a sign change in the phase and the constant prefactor $(2\pi)^{-d}$, the two operations in (1.27) are identical. This means that identities such as (1.30) or (1.32) all have a reverse version. For example, $\partial_k f(k) \longrightarrow -ix\, f(x)$, and a convolution,

$$(f * g)(x) \equiv \int d^dy\, f(\mathbf{x} - \mathbf{y})\, g(\mathbf{y}) \longrightarrow f(k)\, g(k),$$

transforms into a simple product, etc. In practice, one often needs to compromise and figure out whether the real space or the Fourier transformed variant of a problem offers the best perspectives for mathematical attack.
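A compact numerical check of the convolution theorem, in its discrete (FFT) form (an added illustration, not part of the original text):

```python
import numpy as np

# Discrete analog of (1.32): FFT turns periodic convolution into a product.
N = 128
rng = np.random.default_rng(0)
f, g = rng.standard_normal(N), rng.standard_normal(N)

# periodic convolution (f*g)_j = sum_m f_m g_{(j-m) mod N}
conv = np.array([np.sum(f * np.roll(g[::-1], j + 1)) for j in range(N)])
print(np.allclose(np.fft.fft(conv), np.fft.fft(f) * np.fft.fft(g)))  # True
```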
1.3.4 Algebraic interpretation of Fourier transform

While it is perfectly o.k. to apply the Fourier transform as a purely operational tool, new perspectives become available once we understand the conceptual meaning of the transform. The Fourier transform is but one member of an entire family of integral transforms. All these transforms have a common basis, which we introduce below. The understanding of these principles will elucidate the mathematical structure of the formulas introduced above, and be of considerable practical use.

A word of caution: this section won't contain difficult formulae. Nonetheless, it doesn't make for easy reading. Please do carefully think about the meaning of the structures introduced below.
Function spaces as vector spaces

Consider the space $V$ of $L^2$ functions $f: [0, L] \to \mathbb{C}$, i.e. the space of functions for which $\int_0^L dx\,|f|^2$ exists. We call this a space of functions, because $V$ is a vector space: for two elements $f, g \in V$ and $a, b \in \mathbb{C}$, $af + bg \in V$, the defining criterion of a complex vector space. However, unlike the finite-dimensional spaces studied in ordinary linear algebra, the dimension of $V$ is infinite. Heuristically, we may think of the dimension of a complex vector space as the number of linearly independent complex components required to uniquely specify a vector. In the case of functions, we need the infinitely many components $f(x)$, $x \in [0, L]$, for a full characterization.

[Figure: discretization of a smooth function into N = 30 data points]

In the following we aim to develop some intuition for the linear algebra on function spaces. To this end, it will be useful to think of $V$ as the limit of finite dimensional vector spaces. Let $x_i = i\Delta$, $\Delta = L/N$, be a discretization of the interval $[0, L]$ into $N$ segments. Provided $N$ is sufficiently large, the data set $f_N \equiv (f_1, \ldots, f_N)$, $f_i \equiv f(x_i)$, will be an accurate representative of a (smooth) function $f$ (see the figure for a discretization of a smooth function into $N = 30$ data points). We think of the $N$-component complex vectors $f_N$ generated in this way as elements of an $N$-dimensional complex vector space $V_N$. Heuristically it should be clear that in the limit of infinitely fine discretization, $N \to \infty$, the vectors $f_N$ carry the full information on their reference functions, and $V_N$ approaches a "representative" of the function space, $\lim_{N\to\infty} V_N$ "=" $V$.⁵

⁵ The quotes are quite appropriate here. Many objects standard in linear algebra (traces, determinants, etc.) do not afford a straightforward extrapolation to the infinite dimensional case. However, in the present context, the ensuing structures (the subject of functional analysis) do not play an essential role, and a naive approach will be sufficient.
On $V_N$, we may define a complex scalar product, $\langle\,\cdot\,,\,\cdot\,\rangle: V_N \times V_N \to \mathbb{C}$, through $\langle f, g\rangle \equiv \frac{1}{N}\sum_{i=1}^{N} \bar{f}_i\, g_i$.⁶ Here, the factor $N^{-1}$ has been included to keep the scalar product roughly constant as $N$ increases. In the limit $N \to \infty$, the operation $\langle\,,\,\rangle$ evolves into a scalar product on the function space $V$: to two functions $f, g \in V$, we assign

$$\langle f, g\rangle \equiv \lim_{N\to\infty}\frac{1}{N}\sum_i \bar{f}_i\, g_i = \int_0^L dx\, \bar{f}(x)\, g(x).$$

Corresponding to the scalar product on $V_N$, we have a natural orthonormal basis $\{e_i\,|\, i = 1, \ldots, N\}$, where $e_i = (0, \ldots, N, 0, \ldots)$ and the entry $N$ sits at the $i$-th position. With this definition, we have

$$\langle e_i, e_j\rangle = N\,\delta_{ij}.$$

The $N$-normalization of the basis vectors $e_i$ implies that the coefficients of general vectors $f$ are simply obtained as

$$f_i = \langle e_i, f\rangle. \tag{1.33}$$

⁶ If no confusion is possible, we omit the subscript $N$ in $f_N$.
What is the $N \to \infty$ limit of these expressions? At $N \to \infty$ the vectors $e_i$ become the representatives of functions that are infinitely sharply peaked around a single point (see the figure above). Increasing $N$, and choosing a sequence of indices $i$ that correspond to a fixed point $y \in [0, L]$,⁷ we denote this function by $\delta_y$. Equation (1.33) then asymptotes to

$$f(y) = \langle \delta_y, f\rangle = \int_0^L dx\, \delta_y(x)\, f(x),$$

where we noted that $\delta_y$ is real. This equation identifies $\delta_y(x) = \delta(y - x)$ as the $\delta$-function.⁸

This consideration identifies the functions $\{\delta_y\,|\, y \in [0, L]\}$ as the function space analog of the standard basis associated to the canonical scalar product. Notice that there are infinitely many basis vectors. Much like we need to distinguish between vectors $v \in V_N$ and the components of vectors, $v_i \in \mathbb{C}$, we have to distinguish between functions $f \in V$ and their coefficients or function values $f(x) = \langle \delta_x, f\rangle \in \mathbb{C}$.

⁷ For example, $i = [Ny/L]$, where $[x]$ is the largest integer smaller than $x \in \mathbb{R}$.
⁸ Strictly speaking, $\delta_y$ is not a function. Rather, it is what in mathematics is called a distribution. However, we will continue to use the loose terminology "function".
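The discretization picture is easy to make concrete on a computer. The following sketch (an added illustration; the midpoint grid is one arbitrary choice, and the measure factor $L/N$ makes the Riemann-sum character explicit, differing from the $1/N$ normalization used above by the constant $L$) shows the discrete scalar product of two plane waves, introduced as basis functions below, approaching $L\,\delta_{kk'}$:

```python
import numpy as np

# Discrete scalar product as a Riemann sum for <f,g> = int_0^L conj(f) g dx.
L = 1.0
e = lambda n, x: np.exp(2j * np.pi * n * x / L)   # plane wave e_k, k = 2 pi n / L
for N in (8, 64, 512):
    x = (np.arange(N) + 0.5) * L / N              # midpoint discretization of [0, L]
    sp = lambda f, g: (L / N) * np.sum(np.conj(f) * g)
    print(N, sp(e(2, x), e(2, x)).real, abs(sp(e(2, x), e(5, x))))
    # -> <e_k, e_k> = L and <e_k, e_k'> = 0, i.e. L * delta_{kk'}
```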
Fourier transformation is a change of basis

Why did we bother to introduce these formal analogies? The point is that much like in linear algebra we may consider bases different from the standard one. In fact, there is often good motivation to switch to another basis. Imagine one aims to mathematically describe a problem displaying a degree of symmetry. Think, for example, of the Poisson equation of a rotationally symmetric charge distribution. It will certainly be a good idea to seek solutions that reflect this symmetry, i.e. one might aim to represent the solution in terms of a basis of functions that show definite behavior under rotations. Specifically, the Fourier transformation is a change to a function basis that behaves conveniently when acted upon by derivative operations.
To start with, let us briefly recall the linear algebra of changes of basis in finite dimensional vector spaces. Let $\{w_\alpha\,|\,\alpha = 1, \ldots, N\}$ be a second basis of the space $V_N$ (besides the reference basis $\{e_i\,|\, i = 1, \ldots, N\}$). For simplicity, we presume that the new basis is orthonormal with respect to the given scalar product, i.e.

$$\langle w_\alpha, w_\beta\rangle = N\,\delta_{\alpha\beta}.$$

Under these conditions, any vector $v \in V_N$ may be expanded as

$$v = \sum_\alpha v'_\alpha\, w_\alpha. \tag{1.34}$$

Written out for individual components $v_i = \langle e_i, v\rangle$, this reads

$$v_i = \sum_\alpha v'_\alpha\, w_{\alpha, i}, \tag{1.35}$$

where $w_{\alpha, i} = \langle e_i, w_\alpha\rangle$ are the components of the new basis vectors in the old basis. The coefficients $v'_\alpha$ of the expansion of $v$ in the new basis are obtained by taking the scalar product with $w_\alpha$: $\langle w_\alpha, v\rangle = \sum_\beta v'_\beta\,\langle w_\alpha, w_\beta\rangle = N v'_\alpha$, or

$$v'_\alpha = \frac{1}{N}\sum_j \bar{w}_{\alpha, j}\, v_j. \tag{1.36}$$

Substitution of this expansion into (1.35) obtains $v_i = N^{-1}\sum_{\alpha, j} w_{\alpha, i}\,\bar{w}_{\alpha, j}\, v_j$. This relation holds for arbitrary component sets $v_i$, which is only possible if

$$\delta_{ij} = \frac{1}{N}\sum_\alpha w_{\alpha, i}\,\bar{w}_{\alpha, j}. \tag{1.37}$$

This is the finite dimensional variant of a completeness relation. The completeness relation holds only for orthonormalized bases. However, the converse is also true: any system of vectors $\{w_\alpha\,|\,\alpha = 1, \ldots, N\}$ that satisfies the completeness criterion (1.37) is orthonormal and complete in the sense that it can be used to represent arbitrary vectors as in (1.34). We may also write the completeness relation in a less index-oriented notation,

$$\langle e_i, e_j\rangle = \frac{1}{N}\sum_\alpha \langle e_i, w_\alpha\rangle\,\langle w_\alpha, e_j\rangle. \tag{1.38}$$

EXERCISE Prove that the completeness relation (1.37) represents a criterion for the orthonormality of a basis.
Let us now turn to the infinite dimensional generalization of these concepts. The (inverse) Fourier transform (1.20) states that an arbitrary element $f \in V$ of function space (a "vector") can be expanded in a basis of functions $\{e_k \in V\,|\, k \in 2\pi L^{-1}\mathbb{Z}\}$. The function values (components in the standard basis) of these candidate basis functions are given by

$$e_k(x) = \langle \delta_x, e_k\rangle = e^{ikx}.$$

The functions $e_k$ are properly orthonormalized (cf. Eq. (1.21)):⁹

$$\langle e_k, e_{k'}\rangle = \int_0^L dx\, \bar{e}_k(x)\, e_{k'}(x) = L\,\delta_{kk'}.$$

The inverse Fourier transform states that an arbitrary function can be expanded as

$$f = \frac{1}{L}\sum_k f_k\, e_k. \tag{1.39}$$

This is the analog of (1.34). In components,

$$f(x) = \frac{1}{L}\sum_k f_k\, e_k(x), \tag{1.40}$$

which is the analog of (1.35). Substitution of $e_k(x) = e^{ikx}$ gives (1.20). The coefficients can be obtained by taking the scalar product $\langle e_k,\,\cdot\,\rangle$ with (1.39), i.e.

$$f_k = \langle e_k, f\rangle = \int_0^L dx\, e^{-ikx} f(x),$$

which is the Fourier transform. Finally, completeness holds if the substitution of these coefficients into the inverse transform (1.40) reproduces the function $f(x)$. Since this must hold for arbitrary functions, we are led to the condition

$$\frac{1}{L}\sum_k e^{ikx}\, e^{-iky} = \delta(x - y).$$

The abstract representation of this relation (i.e. the analog of (1.38)) reads (check it!)

$$\langle \delta_x, \delta_y\rangle = \frac{1}{L}\sum_k \langle \delta_x, e_k\rangle\,\langle e_k, \delta_y\rangle.$$
For the convenience of the reader, the most important relations revolving around completeness and basis change are summarized in table 1.1. The column "invariant" contains abstract definitions, while "components" refers to the coefficients of vectors/function values. (For better readability, the normalization factors $N$ appearing in our specific definition of scalar products above are omitted in the table.)

⁹ However, this by itself does not yet prove their completeness as a basis!
| | vector space: invariant | components | function space: invariant | components |
| elements | vector, $v$ | components $v_i$ | function, $f$ | value, $f(x)$ |
| scalar product | $\langle v,w\rangle$ | $\bar v_i w_i$ | $\langle f,g\rangle$ | $\int dx\,\bar f(x)\,g(x)$ |
| standard basis | $e_i$ | $e_{i,j}=\delta_{ij}$ | $\delta_x$ | $\delta_x(y)=\delta(x-y)$ |
| alternate basis | $w_\alpha$ | $w_{\alpha,i}$ | $e_k$ | $e_k(x)=\exp(ikx)$ |
| orthonormality | $\langle w_\alpha,w_\beta\rangle=\delta_{\alpha\beta}$ | $\bar w_{\alpha,i}\,w_{\beta,i}=\delta_{\alpha\beta}$ | $\langle e_k,e_{k'}\rangle=L\,\delta_{kk'}$ | $\int dx\, e^{-i(k-k')x}=L\,\delta_{kk'}$ |
| expansion | $v=v'_\alpha w_\alpha$ | $v_i=v'_\alpha w_{\alpha,i}$ | $f=\frac{1}{L}\sum_k f_k e_k$ | $f(x)=\frac{1}{L}\sum_k f_k e^{ikx}$ |
| coefficients | $v'_\alpha=\langle w_\alpha,v\rangle$ | $v'_\alpha=\bar w_{\alpha,i}\,v_i$ | $f_k=\langle e_k,f\rangle$ | $f_k=\int dx\, e^{-ikx}f(x)$ |
| completeness | $\langle e_i,e_j\rangle=\langle e_i,w_\alpha\rangle\langle w_\alpha,e_j\rangle$ | $\delta_{ij}=w_{\alpha,i}\,\bar w_{\alpha,j}$ | $\delta(x-y)=\frac{1}{L}\sum_k\langle\delta_x,e_k\rangle\langle e_k,\delta_y\rangle$ | $\delta(x-y)=\frac{1}{L}\sum_k e^{ik(x-y)}$ |

Table 1.1: Summarizing the linear algebraic interpretation of basis changes in function space. (In all formulas, a double index summation convention is implied.)
With this background, we are in a position to formulate the motivation for Fourier transformation from a different perspective. Above, we had argued that Fourier transformation transforms derivatives into something simple. The derivative operation $\partial_x: V \to V$ can be viewed as a linear operator in function space: derivatives map functions onto functions, $f \mapsto \partial_x f$, and $\partial_x(af + bg) = a\,\partial_x f + b\,\partial_x g$, $a, b \in \mathbb{C}$, the defining criterion for a linear map. In linear algebra we learn that, given a linear operator $A: V_N \to V_N$, it may be a good idea to look at its eigenvectors, $w_\alpha$, and eigenvalues, $\lambda_\alpha$, i.e. $A w_\alpha = \lambda_\alpha w_\alpha$. In the specific case where $A$ is hermitean, $\langle v, Aw\rangle = \langle Av, w\rangle$ for arbitrary $v, w$, we even know that the set of eigenvectors $\{w_\alpha\}$ can be arranged to form an orthonormal basis.

These structures extend to function space. Specifically, consider the operator $-i\partial_x$, where the factor $(-i)$ has been added for convenience. A straightforward integration by parts shows that this operator is hermitean:

$$\langle f, (-i\partial_x)g\rangle = \int dx\, \bar{f}(x)\,(-i\partial_x)\,g(x) = \int dx\, \overline{(-i\partial_x f)}(x)\, g(x) = \langle (-i\partial_x)f, g\rangle.$$

Second, we have

$$(-i\partial_x\, e_k)(x) = -i\,\partial_x \exp(ikx) = k\,\exp(ikx) = k\, e_k(x),$$

or $(-i\partial_x)\, e_k = k\, e_k$ for short. This demonstrates that

the exponential functions $\exp(ikx)$ are eigenfunctions of the derivative operator, and $k$ is the corresponding eigenvalue.

This observation implies the conclusion:

Fourier transformation is an expansion of functions in the eigenbasis of the derivative operator, i.e. the basis of exponential functions $e_k$.

This expansion is appropriate whenever we need a simple representation of the derivative operator. In the sections to follow, we will explore how Fourier transformation works in practice.
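To see the eigenbasis statement "at work", one may discretize $-i\partial_x$ as a matrix and apply it to a plane wave. The sketch below (an added illustration; the centered difference is one arbitrary discretization) confirms that $\exp(ikx)$ is an eigenvector, with an eigenvalue approaching $k$ as the grid is refined:

```python
import numpy as np

# -i d/dx on the periodic grid, as an (antisymmetric) centered difference matrix.
N, L = 64, 2 * np.pi
x = np.arange(N) * L / N
dx = L / N
D = (np.roll(np.eye(N), -1, axis=1) - np.roll(np.eye(N), 1, axis=1)) / (2 * dx)
op = -1j * D                         # Hermitian: equals its own conjugate transpose

n = 3                                # mode k = 2 pi n / L = 3 here
v = np.exp(1j * 2 * np.pi * n * x / L)
print(np.allclose(op @ v, (np.sin(n * dx) / dx) * v))   # True: v is an eigenvector
# The eigenvalue sin(n dx)/dx -> k as dx -> 0, i.e. (-i d/dx) e_k = k e_k.
```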
1.4 Electromagnetic waves in vacuum

As a warmup to our discussion of the full problem posed by the solution of Eqs. (1.13), we consider the vacuum problem, i.e. a situation where no sources are present, $j^\mu = 0$. Taking the curl of the second of Eqs. (1.13) and using (1.8), we then find

$$\Big(\Delta - \frac{1}{c^2}\,\partial_t^2\Big)\,\mathbf{B} = 0.$$

Similarly, adding to the gradient of the first equation $c^{-1}$ times the time derivative of the second, and using Eq. (1.9), we obtain

$$\Big(\Delta - \frac{1}{c^2}\,\partial_t^2\Big)\,\mathbf{E} = 0,$$

i.e. in vacuum both the electric field and the magnetic field obey homogeneous wave equations.
1.4.1 Solution of the homogeneous wave equations

The homogeneous wave equations are conveniently solved by Fourier transformation. To this end, we define a four-dimensional variant of Eq. (1.27),

$$f(t, \mathbf{x}) = \int \frac{d^3k\, d\omega}{(2\pi)^4}\, f(\omega, \mathbf{k})\, e^{i\mathbf{k}\cdot\mathbf{x} - i\omega t}, \qquad f(\omega, \mathbf{k}) = \int d^3x\, dt\, f(t, \mathbf{x})\, e^{-i\mathbf{k}\cdot\mathbf{x} + i\omega t}. \tag{1.41}$$

The one difference to Eq. (1.27) is that the sign convention in the exponent of the $(\omega, t)$-sector of the transform differs from that in the $(\mathbf{k}, \mathbf{x})$-sector. We next subject the homogeneous wave equation

$$\Big(\Delta - \frac{1}{c^2}\,\partial_t^2\Big)\,\phi(t, \mathbf{x}) = 0$$

(where $\phi$ is meant to represent any of the components of the $\mathbf{E}$- or $\mathbf{B}$-field) to this transformation and obtain

$$\Big(-k^2 + \Big(\frac{\omega}{c}\Big)^2\Big)\,\phi(\omega, \mathbf{k}) = 0,$$

where $k \equiv |\mathbf{k}|$. Evidently, the solution must vanish for all values $(\omega, \mathbf{k})$, except for those for which the factor $k^2 - (\omega/c)^2 = 0$. We may thus write

$$\phi(\omega, \mathbf{k}) = 2\pi\, c_+(\mathbf{k})\,\delta(\omega - kc) + 2\pi\, c_-(\mathbf{k})\,\delta(\omega + kc),$$

where $c_\pm(\mathbf{k}) \in \mathbb{C}$ are arbitrary complex functions of the wave vector $\mathbf{k}$. Substituting this representation into the inverse transform, we obtain the general solution of the scalar homogeneous wave equation

$$\phi(t, \mathbf{x}) = \int \frac{d^3k}{(2\pi)^3}\,\Big(c_+(\mathbf{k})\, e^{i(\mathbf{k}\cdot\mathbf{x} - ckt)} + c_-(\mathbf{k})\, e^{i(\mathbf{k}\cdot\mathbf{x} + ckt)}\Big). \tag{1.42}$$
In words:

The general solution of the homogeneous wave equation is obtained by linear superposition of elementary plane waves, $e^{i(\mathbf{k}\cdot\mathbf{x} \mp ckt)}$, where each wave is weighted with an arbitrary coefficient $c_\pm(\mathbf{k})$.

The elementary constituents are called waves because for any fixed instance of space, $\mathbf{x}$, or time, $t$, they harmonically depend on the complementary argument (time, $t$, or position vector, $\mathbf{x}$, resp.). The waves are planar in the sense that for all points in the plane fixed by the condition $\mathbf{x}\cdot\mathbf{k} = \text{const.}$ the phase of the wave is identical, i.e. the set of points $\mathbf{k}\cdot\mathbf{x} = \text{const.}$ defines a wave front perpendicular to the wave vector $\mathbf{k}$. The spacing between consecutive wave fronts with the same phase $\arg(\exp(i(\mathbf{k}\cdot\mathbf{x} - ckt)))$ is given by $\Delta x = \frac{2\pi}{k} \equiv \lambda$, where $\lambda$ is the wave length of the wave and $\lambda^{-1} = k/2\pi$ its wave number. The temporal oscillation period of the wave fronts is set by $T = 2\pi/ck$.
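A one-dimensional illustration of this superposition principle (an added numerical sketch, not part of the original text): choosing the coefficients $c_\pm$ such that every Fourier mode carries the phase $k(x - ct)$, the superposition moves rigidly at speed $c$. Spectrally, this amounts to multiplying the transform of the initial profile by $e^{-ikct}$:

```python
import numpy as np

# phi(t) = IFFT( exp(-i k c t) FFT(phi_0) ): superposition of waves exp(i(kx - kct)).
L, N, c, t = 100.0, 1024, 1.0, 20.0
x = np.linspace(-L / 2, L / 2, N, endpoint=False)
k = 2 * np.pi * np.fft.fftfreq(N, d=L / N)

phi0 = np.exp(-x**2)                                     # initial wave packet
phi_t = np.fft.ifft(np.exp(-1j * k * c * t) * np.fft.fft(phi0)).real
print(np.max(np.abs(phi_t - np.exp(-(x - c * t)**2))))   # ~1e-15: rigid shift by c t
```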
[Figure: wave vector k with mutually orthogonal transverse fields E and B]

Focusing on a fixed wave vector $\mathbf{k}$, we next generalize our results to the vectorial problem posed by the homogeneous wave equations. Since every component of the fields $\mathbf{E}$ and $\mathbf{B}$ is subject to its own independent wave equation, we may write down the prototypical solutions

$$\mathbf{E}(\mathbf{x}, t) = \mathbf{E}_0\, e^{i(\mathbf{k}\cdot\mathbf{x} - \omega t)}, \qquad \mathbf{B}(\mathbf{x}, t) = \mathbf{B}_0\, e^{i(\mathbf{k}\cdot\mathbf{x} - \omega t)}, \tag{1.43}$$

where we introduced the abbreviation $\omega = ck$ and $\mathbf{E}_0, \mathbf{B}_0 \in \mathbb{C}^3$ are constant coefficient vectors. The Maxwell equations $\nabla\cdot\mathbf{B} = 0$ and (vacuum) $\nabla\cdot\mathbf{E} = 0$ imply the condition $\mathbf{k}\cdot\mathbf{E}_0 = \mathbf{k}\cdot\mathbf{B}_0 = 0$, i.e. the coefficient vectors are orthogonal to the wave vector. Waves of this type, oscillating in a direction perpendicular to the wave vector, are called transverse waves. Finally, evaluating Eqs. (1.43) on the law of induction, $\nabla\times\mathbf{E} + c^{-1}\partial_t\mathbf{B} = 0$, we obtain the additional equation $\mathbf{B}_0 = \mathbf{n}_k\times\mathbf{E}_0$, where $\mathbf{n}_k \equiv \mathbf{k}/k$, i.e. the vector $\mathbf{B}_0$ is perpendicular to both $\mathbf{k}$ and $\mathbf{E}_0$, and of equal magnitude as $\mathbf{E}_0$. Summarizing, the vectors

$$\mathbf{k} \perp \mathbf{E} \perp \mathbf{B} \perp \mathbf{k}, \qquad |\mathbf{B}| = |\mathbf{E}| \tag{1.44}$$

form an orthogonal system, and $\mathbf{B}$ is uniquely determined by $\mathbf{E}$ (and vice versa).

At first sight, it may seem that we have been too liberal in formulating the solution (1.43): while the physical electromagnetic field is a real vector field, the solutions (1.43) are manifestly complex. The simple solution to this conflict is to identify $\mathrm{Re}\,\mathbf{E}$ and $\mathrm{Re}\,\mathbf{B}$ with the physical fields.¹⁰
1.4.2 Polarization

In the following we will discuss a number of physically different realizations of plane electromagnetic waves. Since $\mathbf{B}$ is uniquely determined by $\mathbf{E}$, we will focus attention on the latter.

¹⁰ One may ask why, then, we introduced complex notation at all. The reason is that working with exponents of phases is way more convenient than explicitly having to distinguish between the sin and cos functions that arise after real parts have been taken.
Let us choose a coordinate system such that $\mathbf{e}_3 \parallel \mathbf{k}$. We may then write

$$\mathbf{E}(\mathbf{x}, t) = (E_1\mathbf{e}_1 + E_2\mathbf{e}_2)\, e^{i\mathbf{k}\cdot\mathbf{x} - i\omega t},$$

where $E_i = |E_i|\exp(i\phi_i)$. Depending on the choice of the complex coefficients $E_i$, a number of physically different wave types can be distinguished.
Linearly polarized waves

For identical phases, $\phi_1 = \phi_2 = \phi$, we obtain

$$\mathrm{Re}\,\mathbf{E} = (|E_1|\mathbf{e}_1 + |E_2|\mathbf{e}_2)\cos(\mathbf{k}\cdot\mathbf{x} - \omega t + \phi),$$

i.e. a vector field linearly polarized in the direction of the vector $|E_1|\mathbf{e}_1 + |E_2|\mathbf{e}_2$. The corresponding magnetic field is given by

$$\mathrm{Re}\,\mathbf{B} = (|E_1|\mathbf{e}_2 - |E_2|\mathbf{e}_1)\cos(\mathbf{k}\cdot\mathbf{x} - \omega t + \phi),$$

i.e. a vector perpendicular to both $\mathbf{E}$ and $\mathbf{k}$, and phase synchronous with $\mathbf{E}$.

INFO Linear polarization is a hallmark of many artificial light sources; e.g. laser light is usually linearly polarized. Likewise, the radiation emitted by many antennae shows approximately linear polarization. In nature, the light reflected from optically dense media (water surfaces, window panes, etc.) tends to exhibit a partial linear polarization.¹¹
Circularly polarized waves

Next consider the case of a phase difference $\phi_1 = \phi_2 \pm \pi/2$ and equal amplitudes $|E_1| = |E_2| \equiv E$:

$$\mathrm{Re}\,\mathbf{E}(\mathbf{x}, t) = E\,\big(\mathbf{e}_1\cos(\mathbf{k}\cdot\mathbf{x} - \omega t + \phi) \pm \mathbf{e}_2\sin(\mathbf{k}\cdot\mathbf{x} - \omega t + \phi)\big).$$

Evidently, the tip of the vector $\mathbf{E}$ moves on a circle: $\mathbf{E}$ is circularly polarized. Regarded as a function of $\mathbf{x}$ (for any fixed instance of time), $\mathbf{E}$ traces out a spiral whose characteristic period is set by the wave length $\lambda = 2\pi/k$ (see Fig. ??). The sense of orientation of this spiral is determined by the sign of the phase mismatch. For $\phi_1 = \phi_2 + \pi/2$ ($\phi_1 = \phi_2 - \pi/2$) we speak of a right (left) circularly polarized wave or a wave of positive (negative) helicity.

INFO As with classical point particles, electromagnetic fields can be subjected to a quantization procedure. In quantum electrodynamics it is shown that the quanta of the electromagnetic field, the photons, carry definite helicity, i.e. they represent the minimal quantum of circularly polarized waves.

¹¹ This phenomenon motivates the application of pole filters in photography. A pole filter blocks light of a predefined linear polarization. This direction can be adjusted so as to suppress light reflections. As a result one obtains images exhibiting less glare, more intense colouring, etc.
Elliptically polarized waves

Circular and linear polarization represent limiting cases of a more general form of polarization. Indeed, the minimal geometric structure capable of continuously interpolating between a line segment and a circle is the ellipse. To conveniently see the appearance of ellipses, consider the basis change

$$\mathbf{e}_\pm = \frac{1}{\sqrt{2}}\,(\mathbf{e}_1 \pm i\mathbf{e}_2),$$

and represent the electric field as

$$\mathbf{E}(\mathbf{x}, t) = (E_+\mathbf{e}_+ + E_-\mathbf{e}_-)\, e^{i\mathbf{k}\cdot\mathbf{x} - i\omega t}.$$

In this representation, a circularly polarized wave corresponds to the limit $E_+ = 0$ (positive helicity) or $E_- = 0$ (negative helicity). Linear polarization is obtained by setting $E_+ = E_-$. It is straightforward to verify that for generic values of the ratio $|E_-/E_+| \equiv r$ one obtains an elliptically polarized wave where the ratio between the major and the minor axis is set by $|1 + r|/|1 - r|$. The tilting angle of the ellipse w.r.t. the 1-axis is given by $\theta = \arg(E_-/E_+)$.
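These geometric statements are readily checked numerically. The sketch below (an added illustration; amplitudes and phases are arbitrary) traces $\mathrm{Re}\,\mathbf{E}$ over one period at $\mathbf{x} = 0$ and compares the ratio of the extremal field magnitudes with $|1 + r|/|1 - r|$:

```python
import numpy as np

# Trace Re E(t) at x = 0 for helicity amplitudes E_+, E_-, with
# e_pm = (e_1 pm i e_2)/sqrt(2).
w, r = 1.0, 0.4
E_p, E_m = 1.0, r * np.exp(0.7j)                  # arbitrary relative phase
t = np.linspace(0.0, 2 * np.pi / w, 2001)
ph = np.exp(-1j * w * t)
E1 = np.real((E_p + E_m) / np.sqrt(2) * ph)       # e_1 component
E2 = np.real(1j * (E_p - E_m) / np.sqrt(2) * ph)  # e_2 component
rho = np.hypot(E1, E2)
print(rho.max() / rho.min(), (1 + r) / (1 - r))   # both ~ 2.333: the axis ratio
```

The relative phase of $E_\pm$ only rotates the ellipse; the axis ratio depends on $r$ alone, as stated above.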
This concludes our discussion of the polarization of electromagnetic radiation. It is important to keep in mind that the different types of polarization discussed above represent limiting cases of what in general is only partially polarized or completely unpolarized radiation. Radiation of a given value of the frequency, $\omega$, usually involves the superposition of waves of different wave vector $\mathbf{k}$ (at fixed wave number $k = |\mathbf{k}| = c^{-1}\omega$). Only if the amplitudes of all partial waves share a definite phase/amplitude relation do we obtain a polarized signal. The degree of polarization of a wave can be determined by computing the so-called Stokes parameters. However, we will not discuss this concept in more detail in this text.
1.5 Green function of electrodynamics

To prepare our discussion of the full problem (1.19), let us consider the inhomogeneous scalar wave equation

$$\Big(\Delta - \frac{1}{c^2}\,\partial_t^2\Big)\, f = -4\pi g, \tag{1.45}$$

where the inhomogeneity $g$ and the solution $f$ are scalar functions. As with the Poisson equation (??), the weak spot of the wave equation is its linearity. We may, therefore, again employ the concept of Green functions to simplify the solution of the problem.
INFO It may be worthwhile to rephrase the concept of the solution of linear differential equations by Green function methods in the present dynamical context: think of a generic inhomogeneity $4\pi g(\mathbf{x}, t)$ as an additive superposition of inhomogeneities $\delta_{(\mathbf{y},s)}(\mathbf{x}, t) \equiv \delta(\mathbf{x} - \mathbf{y})\,\delta(t - s)$ concentrated in space and time around $\mathbf{y}$ and $s$, resp. (see the figure, where the vertical bars represent individual contributions $\delta_{(\mathbf{y},s)}$ weighted by $g(\mathbf{y}, s)$, and a finite discretization in space-time is understood):

[Figure: source amplitude g over the (x, t) plane, resolved into weighted point contributions]

$$4\pi g(\mathbf{x}, t) = \int d^3y\, ds\,\big(4\pi g(\mathbf{y}, s)\big)\,\delta_{(\mathbf{y},s)}(\mathbf{x}, t).$$

We may suppress the arguments $(\mathbf{x}, t)$ to write this identity as $4\pi g = \int d^3y\, ds\,(4\pi g(\mathbf{y}, s))\,\delta_{(\mathbf{y},s)}$. This formula emphasizes the interpretation of $g$ as a weighted sum of particular functions $\delta_{(\mathbf{y},s)}$, with constant coefficients (i.e. numbers independent of $(\mathbf{x}, t)$) $4\pi g(\mathbf{y}, s)$. Denote the solutions of the wave equation for an individual point source $\delta_{(\mathbf{y},s)}$ by $G_{(\mathbf{y},s)}$. The linearity of the wave equation means that the solution $f$ of the equation for the function $g$ is given by the sum $f = \int d^3y\, ds\,(4\pi g(\mathbf{y}, s))\, G_{(\mathbf{y},s)}$ of point source solutions, weighted by the coefficients $g(\mathbf{y}, s)$. Or, reinstating the arguments $(\mathbf{x}, t)$,¹²

$$f(\mathbf{x}, t) = 4\pi\int d^3y\, ds\, G(\mathbf{x}, t; \mathbf{y}, s)\, g(\mathbf{y}, s). \tag{1.46}$$

The functions $G_{(\mathbf{y},s)}(\mathbf{x}, t) \equiv G(\mathbf{x}, t; \mathbf{y}, s)$ are the Green functions of the wave equation. Knowledge of the Green function is equivalent to a full solution of the general equation, up to the final integration (1.46).

The Green function of the scalar wave equation (1.45), $G(\mathbf{x}, t; \mathbf{x}', t')$, is defined by the differential equation

$$\Big(\Delta - \frac{1}{c^2}\,\partial_t^2\Big)\, G(\mathbf{x}, t; \mathbf{x}', t') = -\delta(\mathbf{x} - \mathbf{x}')\,\delta(t - t'). \tag{1.47}$$

Once the solution of this equation has been obtained (which requires specification of a set of boundary conditions), the solution of (1.45) becomes a matter of a straightforward integration:

$$f(\mathbf{x}, t) = 4\pi\int d^3x'\, dt'\, G(\mathbf{x}, t; \mathbf{x}', t')\, g(\mathbf{x}', t'). \tag{1.48}$$

Assuming vanishing boundary conditions at infinity, $G(\mathbf{x}, t; \mathbf{x}', t') \to 0$ for $|\mathbf{x}-\mathbf{x}'|, |t-t'| \to \infty$, we next turn to the solution of Eq. (1.47).

¹² To check that this is a solution, act with the wave equation operator $\Delta - c^{-2}\partial_t^2$ on both sides of the equation and use that $(\Delta - c^{-2}\partial_t^2)\, G(\mathbf{x}, t; \mathbf{y}, s) = -\delta_{(\mathbf{y},s)}(\mathbf{x}, t) \equiv -\delta(\mathbf{x} - \mathbf{y})\,\delta(t - s)$.
INFO In fact, the most elegant and efficient solution strategy utilizes methods of the theory of complex functions. However, we do not assume familiarity of the reader with this piece of mathematics and will use a more elementary technique.

We first note that for the chosen set of boundary conditions, the Green function $G(\mathbf{x} - \mathbf{x}', t - t')$ will depend on the difference of its arguments only. We next Fourier transform Eq. (1.47) in the temporal variable, i.e. we act on the equation with the integral transform $f(\omega) = \frac{1}{2\pi}\int dt\,\exp(i\omega t)\, f(t)$ (whose inverse is $f(t) = \int d\omega\,\exp(-i\omega t)\, f(\omega)$). The temporally transformed equation is given by

$$\big(\Delta + k^2\big)\, G(\mathbf{x}, \omega) = -\frac{1}{2\pi}\,\delta(\mathbf{x}), \tag{1.49}$$

where we defined $k \equiv \omega/c$. If it were not for the constant $k$ (and a trivial scaling factor $(2\pi)^{-1}$ on the r.h.s.), this equation, known in the literature as the Helmholtz equation, would be equivalent to the Poisson equation of electrostatics. Indeed, it is straightforward to verify that the solution of (1.49) is given by

$$G_\pm(\mathbf{x}, \omega) = \frac{1}{8\pi^2}\,\frac{e^{\pm ik|\mathbf{x}|}}{|\mathbf{x}|}, \tag{1.50}$$

where the sign ambiguity needs to be fixed on physical grounds.
INFO To prove Eq. (1.50), we introduce polar coordinates centered around $\mathbf{x}'$ and act with the spherical representation of the Laplace operator (cf. section ??) on $G_\pm(r) = \frac{1}{8\pi^2}\, e^{\pm ikr}/r$. Noting that the radial part of the Laplace operator is given by $r^{-2}\partial_r r^2\partial_r = \partial_r^2 + (2/r)\partial_r$, and that $\Delta\,(4\pi r)^{-1} = -\delta(\mathbf{x})$ (the equation of the Green function of electrostatics), we obtain

$$\big(\Delta + k^2\big)\, G_\pm(r, \omega) = \frac{1}{8\pi^2}\,\big(\partial_r^2 + 2r^{-1}\partial_r + k^2\big)\,\frac{e^{\pm ikr}}{r} =$$
$$= \frac{e^{\pm ikr}}{8\pi^2}\,\Big[\big(\partial_r^2 + 2r^{-1}\partial_r\big)\, r^{-1} + 2\, e^{\mp ikr}\,(\partial_r r^{-1})\,\partial_r\, e^{\pm ikr} + e^{\mp ikr}\, r^{-1}\big(\partial_r^2 + k^2 + 2r^{-1}\partial_r\big)\, e^{\pm ikr}\Big] =$$
$$= -\frac{1}{2\pi}\,\delta(\mathbf{x}) \mp \frac{2ik}{8\pi^2}\,\frac{e^{\pm ikr}}{r^2} \pm \frac{2ik}{8\pi^2}\,\frac{e^{\pm ikr}}{r^2} = -\frac{1}{2\pi}\,\delta(\mathbf{x}),$$

as required.
Doing the inverse temporal Fourier transform, we obtain

G_±(x, t) = ∫ dω e^{−iωt} G_±(x, ω) = −∫ dω e^{−iωt} (1/8π²) e^{±iωc⁻¹|x|}/|x| = −(1/4π) δ(t ∓ c⁻¹|x|)/|x|,

or

G_±(x − x′, t − t′) = −(1/4π) δ(t − t′ ∓ c⁻¹|x − x′|)/|x − x′|.   (1.51)
For reasons to become clear momentarily, we call G₊ (G₋) the retarded (advanced) Green function of the wave equation.
[Figure 1.2: Wave front propagation of the pulse created by a point source at the origin, schematically plotted as a function of two-dimensional space and time; the front follows |x| = ct. The width of the front is set by δx = c δt, where δt is the duration of the time pulse. Its intensity decays as |x|⁻¹.]
1.5.1 Physical meaning of the Green function
Retarded Green function
To understand the physical significance of the retarded Green function G₊, we substitute the r.h.s. of Eq. (1.51) into (1.48) and obtain

f(x, t) = ∫ d³x′ g(x′, t − |x − x′|/c)/|x − x′|.   (1.52)
For any fixed instance of space and time, (x, t), the solution f(x, t) is affected by the sources g(x′, t′) at all points in space and fixed earlier times t′ = t − |x − x′|/c. Put differently, a time t − t′ = |x − x′|/c has to pass before the amplitude of the source g(x′, t′) may cause an effect at the observation point x at time t: the signal received at (x, t) is subject to a retardation mechanism. When the signal is received, it is so at a strength g(x′, t′)/|x − x′|, similarly to the Coulomb potential in electrostatics. (Indeed, for a time independent source g(x′, t′) = g(x′), we may enter the wave equation with a time-independent ansatz f(x), whereupon it reduces to the Poisson equation.) Summarizing, the sources act as instantaneous Coulomb charges which are (a) of strength g(x′, t′) and (b) felt at times t = t′ + |x − x′|/c.
EXAMPLE By way of example, consider a point source at the origin, which broadcasts a signal for a short instance of time at t′ ≈ 0, g(x′, t′) = δ(x′)F(t′), where the function F is sharply peaked around t′ = 0 and describes the temporal profile of the source. The signal is then given by f(x, t) = F(t − |x|/c)/|x|, i.e. we obtain a pattern of out-moving spherical waves, whose amplitude diminishes as |x|⁻¹ or, equivalently, as (ct)⁻¹ (see Fig. 1.2.)
Advanced Green function
Consider now the solution we would have obtained from the advanced Green function, f(x, t) = ∫ d³x′ g(x′, t + |x − x′|/c)/|x − x′|. Here, the signal responds to the behaviour of the source in the future.
The principle of cause-and-effect, or causality, is violated, implying that the advanced Green function, though mathematically a legitimate solution of the wave equation, does not carry physical meaning. Two more remarks are in order: (a) when solving the wave equation by techniques borrowed from the theory of complex functions, the causality principle is built in from the outset, and the retarded Green function automatically selected. (b) Although the advanced Green function does not carry immanent physical meaning, it is not a senseless object altogether. However, the utility of this object discloses itself only in quantum theories.
1.5.2 Electromagnetic gauge field and Green functions
So far we have solved the wave equation for an arbitrary scalar source. Let us now specialize to the wave equations (1.19) for the components of the electromagnetic potential, A^μ. To a first approximation, these are four independent scalar wave equations for the four sources j^μ. We may, therefore, just copy the prototypical solution to obtain the retarded potentials of electrodynamics,

φ(x, t) = ∫ d³x′ ρ(x′, t − |x − x′|/c)/|x − x′|,

A(x, t) = (1/c) ∫ d³x′ j(x′, t − |x − x′|/c)/|x − x′|.   (1.53)
INFO There is, however, one important consistency check that needs to be performed: recalling that the wave equations (1.19) hold only in the Lorentz gauge,

∂_μ A^μ = 0  ⇔  c⁻¹∂_t φ + ∇·A = 0,

we need to check that the solutions (1.53) actually meet the Lorentz gauge condition. Relatedly, we have to keep in mind that the sources are not quite independent; they obey the continuity equation

∂_μ j^μ = 0  ⇔  ∂_t ρ + ∇·j = 0.

As in section ??, it will be most convenient to probe the gauge behaviour of the vector potential in a fully developed Fourier language. Also, we will use a 4-vector notation throughout. In this notation, the Fourier transformation (1.41) assumes the form

f(k) = ∫ (d⁴x/(2π)⁴) f(x) e^{ik_μ x^μ},   (1.54)

where a factor of c⁻¹ has been absorbed in the integration measure and k^μ = (k⁰, k) with k⁰ = ω/c.[13] The Fourier transform of the scalar wave equation (1.45) becomes k_μk^μ f(k) = −4πg(k). Specifically, the Green function obeys the equation k_μk^μ G(k) = (2π)⁻⁴, which is solved by G(k) = ((2π)⁴ k_μk^μ)⁻¹. (One may check explicitly that this is the Fourier transform of our solution (1.51), however for our present purposes there is no need to do so.) The solution of the general scalar wave equation (1.48) obtains by convoluting the Green function and the source, f(x) = −4π(G ∗ g)(x) ≡ −4π ∫ d⁴x′ G(x − x′) g(x′). Using the convolution theorem, this transforms to f(k) = −(2π)⁴ 4π G(k) g(k) = −4πg(k)/(k_μk^μ). Specifically, for the vector potential, we obtain

A^μ(k) = −(4π/c) j^μ(k)/(k_ν k^ν).   (1.55)

This is all we need to check that the gauge condition is fulfilled: The Fourier transform of the Lorentz gauge relation ∂_μ A^μ = 0 is given by k_μ A^μ = 0. Probing this relation on (1.55), we obtain k_μ A^μ ∝ k_μ j^μ = 0, where we noted that k_μ j^μ = 0 is the Fourier transform of the continuity relation. We have thus shown that the solution (1.53) conforms with the gauge constraints. As an important byproduct, our proof above reveals an intimate connection between the gauge behaviour of the electromagnetic potential and current conservation.

[13] Recall that x⁰ = ct, i.e. k_μ x^μ = k⁰x⁰ − k·x = ωt − k·x.
Eqs. (1.53) generalize Eqs. (??) and (??) to the case of dynamically varying sources. In the next sections, we will explore various aspects of the physical content of these equations.
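Before moving on, it may help to see Eq. (1.53) in action. The sketch below (not from the original text) evaluates the retarded scalar potential by brute-force quadrature; the oscillating dipole-shaped charge blob, the choice of Gaussian units with c = 1, and all parameter values are assumptions made for illustration:

```python
import numpy as np

# Sketch: brute-force evaluation of the retarded potential, Eq. (1.53),
#   phi(x, t) = \int d^3x' rho(x', t - |x - x'|/c) / |x - x'|,
# for a hypothetical oscillating dipole-shaped source (Gaussian units, c = 1).
c, omega, a = 1.0, 2 * np.pi, 0.2             # frequency and blob size

def rho(xp, t):
    """Charge density ~ z' exp(-|x'|^2/2a^2) cos(omega t): zero net charge."""
    r2 = np.sum(xp**2, axis=-1)
    return xp[..., 2] * np.cos(omega * t) * np.exp(-r2 / (2 * a**2))

g = np.linspace(-3 * a, 3 * a, 21)            # grid covering the source region
XP = np.stack(np.meshgrid(g, g, g, indexing="ij"), axis=-1).reshape(-1, 3)
dV = (g[1] - g[0])**3

def phi(x, t):
    d = np.linalg.norm(XP - x, axis=1)        # |x - x'|
    return np.sum(rho(XP, t - d / c) / d) * dV  # source at the retarded time

x_obs = np.array([0.0, 0.0, 5.0])             # observation point on the z axis
for t in (0.0, 0.25, 0.5):
    print(f"t = {t:4.2f}: phi = {phi(x_obs, t):+.6f}")
```

The point of the exercise is that only the retarded time t − |x − x′|/c of each source element enters the sum, exactly as the text describes.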
1.6 Field energy and momentum
In sections ?? and 1.1 we have seen that the static electric and the magnetic field, resp., carry energy. How does the concept of field energy generalize to the dynamical case where the electric and the magnetic field form an inseparable entity? Naively, one may speculate that the energy, E, carried by an electromagnetic field is given by the sum of its electric and magnetic components, E = (1/8π) ∫ d³x (E·D + B·H). As we will show below, this expectation actually turns out to be correct. However, there is more to the electromagnetic field energy than the formula above. For example, we know from daily experience that electromagnetic energy can flow, and that it can be converted to mechanical energy. (Think of a microwave oven where radiation flows into whatever fast-food you put into it, to next deposit its energy into [mechanical] heating.)
In section 1.6.1, we explore the balance between bulk electromagnetic energy, energy flow (current), and mechanical energy. In section 1.6.2, we will see that the concept of field energy can be generalized to one of field momentum. Much like energy, the momentum carried by the electromagnetic field can flow, get converted to mechanical momentum, etc.
1.6.1 Field energy
We begin our discussion of electromagnetic field energy with a few formal operations on the Maxwell equations: multiplying Eq. (1.4) by E and Eq. (1.5) by H, and adding the results to each other, we obtain

E·(∇×H) − H·(∇×E) − (1/c)(E·∂_t D + H·∂_t B) = (4π/c) j·E.

Using that for general vector fields v and w (check!) ∇·(v×w) = w·(∇×v) − v·(∇×w), as well as E·∂_t D = ∂_t(E·D)/2 and H·∂_t B = ∂_t(H·B)/2, this equation can be rewritten as

(1/8π) ∂_t (E·D + B·H) + (c/4π) ∇·(E×H) + j·E = 0.   (1.56)
To understand the significance of Eq. (1.56), let us integrate it over a test volume V:

d_t ∫_V d³x (1/8π)(E·D + B·H) = −∮_{S(V)} dσ n·(c/4π)(E×H) − ∫_V d³x j·E,   (1.57)
where use of Gauss' theorem has been made. The first term is the sum over the electric and the magnetic field energy density, as derived earlier for static field configurations. We now interpret this term as the energy stored in general electromagnetic field configurations and

w ≡ (1/8π)(E·D + B·H)   (1.58)

as the electromagnetic energy density. What is new in electrodynamics is that the field energy may change in time. The r.h.s. of the equation tells us that there are two mechanisms whereby the electromagnetic field energy may be altered. First, there is a surface integral over the so-called Poynting vector field

S ≡ (c/4π) E × H.   (1.59)

We interpret the integral over this vector as the energy current passing through the surface S and the Poynting vector field as the energy current density. Thus, the pair (energy density, w)/(Poynting vector, S) plays a role similar to the pair (charge density, ρ)/(current density, j) for the matter field. However, unlike with matter, the energy of the electromagnetic field is not conserved. Rather, the balance equation of electromagnetic field energy above states that

∂_t w + ∇·S = −j·E.   (1.60)
The r.h.s. of this equation contains the matter field j, which suggests that it describes the conversion of electromagnetic into mechanical energy. Indeed, the temporal change of the energy of a charged point particle in an electromagnetic field (cf. section 1.1 for a similar line of reasoning) is given by d_t U(x(t)) = F·ẋ = q(E + c⁻¹v×B)·v = qE·v, where we observed that the magnetic component of the Lorentz force is perpendicular to the velocity and, therefore, does not do work on the particle. Recalling that the current density of a point particle is given by j = qδ(x − x(t))v, this expression may be rewritten as d_t U = ∫ d³x j·E. The r.h.s. of Eq. (1.60) is the generalization of this expression to arbitrary current densities. Energy conservation implies that the work done on a current of charged particles has to be taken from the electromagnetic field. This explains the appearance of the mechanical energy on the r.h.s. of the balance equation (1.60).
INFO Equations of the structure

∂_t ρ + ∇·j = σ,   (1.61)

generally represent continuity equations: suppose we have some quantity (energy, oxygen molecules, voters of some political party) that may be described by a density distribution ρ.[14]

[14] In the case of discrete quantities (the voters, say), ρ(x, t)d³x is the average number of particles present in a volume d³x, where the reference volume d³x must be chosen large enough to make this number bigger than unity (on average.)
The content of our quantity in a small reference volume d³x may change due to (i) flow of the quantity to somewhere else. Mathematically, this is described by a current density j, where j may be a function of ρ. However, (ii) changes may also occur as a consequence of a conversion of ρ into something else (the chemical reaction of oxygen to carbon dioxide, voters changing their party, etc.). In (1.61), the rate at which this happens is described by σ. If σ equals zero, we speak of a conserved quantity. Any conservation law can be represented in differential form, Eq. (1.61), or in the equivalent integral form

d_t ∫_V d³x ρ(x, t) + ∮_S dS·j = ∫_V d³x σ(x, t).   (1.62)

It is useful to memorize the form of the continuity equation. For if a mathematical equation of the form ∂_t(function no. 1) + ∇·(vector field) = function no. 2 crosses our way, we know that we have encountered a conservation law, and that the vector field must be the current associated to function no. 1. This can be non-trivial information in contexts where the identification of densities/currents from first principles is difficult. For example, the structure of (1.60) identifies w as a density and S as its current.
1.6.2 Field momentum
Unlike in previous sections on conservation laws, we here identify B = H, D = E, i.e. we consider the vacuum case.[15]
As a second example of a physical quantity that can be exchanged between matter and electromagnetic fields, we consider momentum. According to Newton's equations, the change of the total mechanical momentum P_mech carried by the particles inside a volume B is given by the integrated force density, i.e.

d_t P_mech = ∫_B d³x f = ∫_B d³x [ρE + (1/c) j × B],

where in the second equality we inserted the Lorentz force density. Using Eqs. (??) and (??) to eliminate the sources, we obtain

d_t P_mech = (1/4π) ∫_B d³x [(∇·E)E − B×(∇×B) + c⁻¹ B×∂_t E].

Now, using that B×∂_t E = d_t(B×E) − (∂_t B)×E = d_t(B×E) − cE×(∇×E), and adding 0 = B(∇·B) to the r.h.s., we obtain the symmetric expression

d_t P_mech = (1/4π) ∫_B d³x [(∇·E)E − E×(∇×E) + B(∇·B) − B×(∇×B) + c⁻¹ d_t(B×E)],
[15] The discussion of the field momentum in matter (cf. ??) turns out to be a delicate matter, wherefore we prefer to stay on the safe ground of the vacuum theory.
which may be reorganized as

d_t [P_mech − (1/4πc) ∫_B d³x B×E] = (1/4π) ∫_B d³x [(∇·E)E − E×(∇×E) + B(∇·B) − B×(∇×B)].

This equation is of the form d_t(something) = (something else). Comparing to our earlier discussion of conservation laws, we are led to interpret the 'something' on the l.h.s. as a conserved quantity. Presently, this quantity is the sum of the total mechanical momentum P_mech and the integral

P_field = ∫ d³x g,   g ≡ (1/4πc) E×B = S/c².   (1.63)
The structure of this expression suggests to interpret P_field as the momentum carried by the electromagnetic field and g as the momentum density (which happens to be given by c⁻² times the Poynting vector.) If our tentative interpretation of the equation above as a conservation law is to make sense, we must be able to identify its r.h.s. as a surface integral. This in turn requires that the components of the (vector valued) integrand be representable as X_j = ∂_i T_{ij}, i.e. as the divergence of a vector field T_j with components T_{ij}. (Here, j = 1, 2, 3 plays the role of a spectator index.) If this is the case, we may, indeed, transform the integral to a surface integral, ∫_B d³x X_j = ∫_B d³x ∂_i T_{ij} = ∮_{∂B} dσ n_i T_{ij}. Indeed, it is not difficult to verify that

[(∇·X)X − X×(∇×X)]_j = ∂_i (X_i X_j − (δ_{ij}/2) X·X).
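A quick symbolic check of this identity (a sketch, not part of the original notes) can be done with sympy for a generic smooth vector field X:

```python
import sympy as sp

# Sketch: verify [(div X) X - X x (curl X)]_j = d_i (X_i X_j - delta_ij X.X/2)
# for a generic smooth vector field X = (X0, X1, X2).
xs = sp.symbols("x1 x2 x3", real=True)
X = [sp.Function(f"X{i}")(*xs) for i in range(3)]
d = lambda f, i: sp.diff(f, xs[i])               # partial derivative shorthand

div = sum(d(X[i], i) for i in range(3))
curl = [d(X[(i + 2) % 3], (i + 1) % 3) - d(X[(i + 1) % 3], (i + 2) % 3)
        for i in range(3)]
X2 = sum(Xk**2 for Xk in X)
for j in range(3):
    lhs = div * X[j] - (X[(j + 1) % 3] * curl[(j + 2) % 3]
                        - X[(j + 2) % 3] * curl[(j + 1) % 3])
    rhs = sum(d(X[i] * X[j] - sp.Rational(1, 2) * sp.KroneckerDelta(i, j) * X2, i)
              for i in range(3))
    print(j, sp.simplify(lhs - rhs))             # -> 0 for j = 0, 1, 2
```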
Identifying X = E, B and introducing the components of the Maxwell stress tensor as

T_{ij} = (1/4π) [E_i E_j + B_i B_j − (δ_{ij}/2)(E·E + B·B)],   (1.64)
the r.h.s. of the conservation law assumes the form ∫_B d³x ∂_i T_{ij} = ∮_{∂B} dσ n_i T_{ij}, where n_i are the components of the normal vector field of the system boundary. The law of the conservation of momentum thus assumes the final form

d_t (P_mech + P_field)_j = ∮_{∂B} dσ n_i T_{ij}.   (1.65)
Physically, dt dσ n_i T_{ij} is the (jth component of the) momentum that gets pushed through the surface element dσ in the time interval dt. Thus, dσ n_i T_{ij} is the momentum per time, or force, exerted on dσ, and n_i T_{ij} the force per area, or radiation pressure, due to the change of linear momentum in the system.
It is straightforward to generalize the discussion above to the conservation of angular momentum: The angular momentum carried by a mechanical system of charged particles may be converted into angular momentum of the electromagnetic field. It is evident from Eq. (1.63) that the angular momentum density of the field is given by

l = x × g = (1/4πc) x × (E × B).

However, in this course we will not discuss angular momentum conservation any further.
1.6.3 Energy and momentum of plane electromagnetic waves
Consider a plane wave in vacuum. Assuming that the wave propagates in 3-direction, the physical electric field, E_phys, is given by

E_phys(x, t) = Re E(x, t) = Re [(E₁e₁ + E₂e₂) e^{i(kx₃ − ωt)}] = r₁u₁(x₃, t) e₁ + r₂u₂(x₃, t) e₂,

where u_i(x₃, t) = cos(φ_i + kx₃ − ωt) and we defined E_i = r_i exp(iφ_i) with real r_i. Similarly, the magnetic field B_phys is given by

B_phys(x, t) = Re (e₃ × E(x, t)) = r₁u₁(x₃, t) e₂ − r₂u₂(x₃, t) e₁.

From these relations we obtain the energy density and Poynting vector as

w(x, t) = (1/8π)(E² + B²) = (1/4π)[(r₁u₁)² + (r₂u₂)²](x₃, t),
S(x, t) = c w(x, t) e₃,

where we omitted the subscript 'phys.' for notational simplicity.
EXERCISE Check that w and S above comply with the conservation law (1.60).
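One way to do this exercise symbolically is sketched below, assuming the plane-wave parametrization of this section (sympy is used for the algebra; with j = 0 and only S₃ nonzero, the claim reduces to ∂_t w + ∂₃ S₃ = 0):

```python
import sympy as sp

# Sketch: symbolic check that the plane-wave w and S = c w e_3 satisfy the
# conservation law (1.60) with j = 0, i.e. d_t w + d_3 S_3 = 0.
x3, t, c, k, r1, r2, p1, p2 = sp.symbols("x3 t c k r1 r2 phi1 phi2", real=True)
omega = c * k                                    # vacuum dispersion relation
u1 = sp.cos(p1 + k * x3 - omega * t)
u2 = sp.cos(p2 + k * x3 - omega * t)
w = ((r1 * u1)**2 + (r2 * u2)**2) / (4 * sp.pi)  # energy density from the text
S3 = c * w                                       # only S_3 is nonzero
print(sp.simplify(sp.diff(w, t) + sp.diff(S3, x3)))  # -> 0
```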
1.7 Electromagnetic radiation
To better understand the physics of the expressions (1.53), let us consider the case where the sources are confined to within a region in space of typical extension d. Without loss of generality, we assume the time dependence of the sources to be harmonic, with a certain characteristic frequency ω, j^μ(x, t) = j^μ(x) exp(−iωt). (The signal generated by sources of more general time dependence can always be obtained by superposition of harmonic signals.) As we shall see, all other quantities of relevance to us (potentials, fields, etc.) will inherit the same time dependence, i.e. X(x, t) = X(x) exp(−iωt). Specifically, Eq. (1.53) implies that the electromagnetic potentials, too, oscillate in time, A^μ(x, t) = A^μ(x) exp(−iωt). Substituting this ansatz into Eq. (1.53), we obtain

A^μ(x) = (1/c) ∫ d³x′ j^μ(x′) e^{ik|x−x′|}/|x − x′|.
As in our earlier analyses of multipole fields, we assume that the observation point x is far away from the source, r ≡ |x| ≫ d. Under these circumstances, we may focus attention on the spatial components of the vector potential,

A(x) = (1/c) ∫ d³x′ j(x′) e^{ik|x−x′|}/|x − x′|,   (1.66)

where we have substituted the time-oscillatory ansatz for sources and potential into (1.53) and divided out the time dependent phases. From (1.66) the magnetic and electric fields are obtained as

B = ∇ × A,   E = ik⁻¹ ∇×(∇×A),   (1.67)

where in the second identity we used the source-free variant of the law of magnetic circulation, −ikE = c⁻¹∂_t E = ∇×B.
We may now use the smallness of the ratio d/r to expand |x − x′| ≃ r − n·x′ + ..., where n is the unit vector in x-direction. Substituting this expansion into (1.66) and expanding the result in powers of r⁻¹, we obtain a series similar in spirit to the electric and magnetic multipole series discussed above. For simplicity, we here focus attention on the dominant contribution to the series, obtained by approximating |x − x′| ≃ r. For reasons to become clear momentarily, this term,

A(x) ≃ (e^{ikr}/cr) ∫ d³x′ j(x′),   (1.68)

generates electric dipole radiation. In section ?? we have seen that for a static current distribution, the integral ∫ j vanishes. However, for a dynamic distribution, we may engage the Fourier transform of the continuity relation, −iωρ + ∇·j = 0, to obtain

∫ d³x′ j_i = −∫ d³x′ x_i′ (∇′·j) = −iω ∫ d³x′ x_i′ ρ.

Substituting this result into (1.68) and recalling the definition of the electric dipole moment of a charge distribution, d ≡ ∫ d³x′ x′ ρ, we conclude that

A(x) ≃ −ik d e^{ikr}/r,   (1.69)

is indeed controlled by the electric dipole moment.
is indeed controlled by the electric dipole moment.
Besides d and r another characteristic length scale in the problem is the characteristic
wave length 2/k = 2c/. For simplicity, we assume that the wave length d is
much larger than the extent of the source.
INFO This latter condition is usually met in practice. E.g. for radiation in the MHz range, ω ∼ 10⁶ s⁻¹, the wave length is given by λ ∼ 3·10⁸ m s⁻¹/10⁶ s⁻¹ = 300 m, much larger than the extension of typical antennae.
We then need to distinguish between three different regimes (see Fig. ??): the near zone, d ≪ r ≪ λ, the intermediate zone, d ≪ r ∼ λ, and the far zone, d ≪ λ ≪ r. We next discuss these regions in turn.
Near zone
For r ≪ λ, or kr ≪ 1, the exponential in (1.68) may be approximated by unity and we obtain

A(x, t) ≃ −i (k/r) d e^{−iωt},

where we have reintroduced the time dependence of the sources. Using Eq. (1.67) to compute the electric and magnetic field, we get

B(x, t) = i (k/r²) n × d e^{−iωt},
E(x, t) = [3n(n·d) − d] r⁻³ e^{−iωt}.

The electric field equals the dipole field (??) created by a dipole with time dependent moment d exp(−iωt) (cf. the figure, where the numbers on the ordinate indicate the separation from the radiation center in units of the wave length and the color coding is a measure of the field intensity.) This motivates the denotation 'electric dipole radiation'. The magnetic field is by a factor kr ≪ 1 smaller than the electric field, i.e. in the near zone the electromagnetic field is dominantly electric. In the limit k → 0 the magnetic field vanishes and the electric field reduces to that of a static dipole.
The analysis of the intermediate zone, kr ∼ 1, is complicated in as much as all powers in an expansion of the exponential in (1.68) in kr must be retained. For a discussion of the resulting field distribution, we refer to [1].
Far zone
For kr ≫ 1, it suffices to let the derivative operations act on the argument kr of the exponential function. Carrying out the derivatives, we obtain

B = k² (n × d) e^{i(kr−ωt)}/r,   E = −n × B.   (1.70)
These asymptotic field distributions have much in common with the vacuum electromagnetic fields above. Indeed, one would expect that far away from the source (i.e. many wavelengths away from the source), the electromagnetic field resembles a spherical wave (by which we mean that the surfaces of constant phase form spheres.) For observation points sufficiently far from the center of the wave, its curvature will be nearly invisible and the wave will look approximately planar. These expectations are met by the field distributions (1.70): neglecting all derivatives other than those acting on the combination kr (i.e. neglecting corrections of O((kr)⁻¹)), the components of E and B obey the wave equation (the Maxwell equations in vacuum).[16] The wave fronts are spheres, outwardly propagating at a speed c. The vectors n, E, and B form an orthogonal set, as we saw is characteristic for a vacuum plane wave.
Finally, it is instructive to compute the energy current carried by the wave. To this end, we recall that the physically realized values of the electromagnetic field obtain by taking the real part of (1.70). We may then compute the Poynting vector (1.59) as

S = (ck⁴/4πr²) n (d² − (n·d)²) cos²(kr − ωt)  →  ⟨S⟩_t = (ck⁴/8πr²) n (d² − (n·d)²),

where the last expression is the energy current temporally averaged over several oscillation periods ∼ ω⁻¹.
The energy current is maximal in the plane perpendicular to the dipole moment of the source and decays according to an inverse square law. It is also instructive to compute the total power radiated by the source, i.e. the energy current integrated over spheres of constant radius r. (Recall that the integrated energy current accounts for the change of energy inside the reference volume per time, i.e. the power radiated by the source.) Choosing the z-axis of the coordinate system to be colinear with the dipole moment, we have d² − (n·d)² = d² sin²θ and

P ≡ ∮_S dσ n·⟨S⟩_t = (ck⁴d²/8πr²) r² ∫₀^π dθ sin θ ∫₀^{2π} dϕ sin²θ = ck⁴d²/3.
Notice that the radiated power does not depend on the radius of the reference sphere, i.e. the work done by the source is entirely converted into radiation and not, say, into a steadily increasing density of vacuum field energy.
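As a numerical cross-check of this result (a sketch, not part of the original notes; the units c = k = d = 1 are a hypothetical choice), one may integrate the time-averaged angular power distribution ⟨S⟩·n r² = (ck⁴/8π) d² sin²θ over the unit sphere and compare with P = ck⁴d²/3:

```python
import numpy as np

# Sketch: integrate <S>.n r^2 = (c k^4 / 8 pi) d^2 sin^2(theta) over the
# sphere and compare with P = c k^4 d^2 / 3 (hypothetical units c = k = d = 1).
c = k = d = 1.0
theta = np.linspace(0.0, np.pi, 20001)
dtheta = theta[1] - theta[0]
# the azimuthal integral contributes 2 pi; sin(theta) is the measure
integrand = (c * k**4 * d**2 / (8 * np.pi)) * np.sin(theta)**2 \
            * 2 * np.pi * np.sin(theta)
print(np.sum(integrand) * dtheta, c * k**4 * d**2 / 3)   # both ~ 1/3
```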
INFO As a concrete example of a radiation source, let us consider a center-fed linear antenna, i.e. a piece of wire of length a carrying an AC current that is maximal at the center of the wire. We model the current flow by the distribution I(z, t) = I₀(1 − 2|z|a⁻¹) exp(−iωt), where a ≪ λ and alignment with the z-axis has been assumed. Using the continuity equation, ∂_z I(z, t) + ∂_t ρ(z, t) = 0, we obtain the charge density (charge per unit length) in the wire as ρ(z, t) = 2iI₀(aω)⁻¹ sgn(z) exp(−iωt). The dipole moment of the source is thus given by d = e_z ∫_{−a/2}^{a/2} dz z ρ(z, t) = iI₀ (a/2ω) e_z exp(−iωt), and the radiated power by

P = ((ka)²/12c) I₀².
The coefficient R_rad ≡ (ka)²/12c of the factor I₀² is called the radiation resistance of the antenna. To understand the origin of this denotation, notice that [R_rad] = [c⁻¹] indeed has the dimension of resistivity (exercise.) Next recall that the power required to drive a current through a conventional resistor R is given by P = UI = RI². Comparison with the expression above suggests to interpret R_rad as the resistance of the antenna. However, this resistance has nothing to do with dissipative energy losses inside the antenna. (I.e. those losses that are held responsible for the DC resistivity of metals.) Rather, work has to be done to feed energy into electromagnetic radiation. This work determines the radiation resistance. Also notice that R_rad ∝ k² ∝ ω², i.e. the radiation losses increase quadratically with frequency. This latter fact is of great technological importance.

[16] Indeed, one may verify (do it!) that the characteristic factors r⁻¹e^{i(kr−ωt)} are exact solutions of the wave equations; they describe the space-time profile of a spherical wave.
Our results above relied on a first order expansion in the ratio d/r between the extent of the sources and the distance of the observation point. We saw that at this order of the expansion, the source coupled to the electromagnetic field by its electric dipole moment. A more sophisticated description of the field may be obtained by driving the expansion in d/r to higher order. E.g. at next order, the electric quadrupole moment and the magnetic dipole moment enter the stage. This leads to the generation of electric quadrupole radiation and magnetic dipole radiation. For an in-depth discussion of these types of radiation we refer to [1].
Chapter 2
Macroscopic electrodynamics
Suppose we want to understand the electromagnetic fields in an environment containing extended pieces of matter. From a purist point of view, we would need to regard the O(10²³) carriers of electric and magnetic moments (electrons, protons, neutrons) comprising the medium as sources. Throughout, we will denote the fields e, ..., h created by this system of sources as microscopic fields. (Small characters are used to distinguish the microscopic fields from the effective macroscopic fields to be introduced momentarily.) The microscopic Maxwell equations read as

∇·e = 4πρ_mic,
∇×b − (1/c)∂_t e = (4π/c) j_mic,
∇×e + (1/c)∂_t b = 0,
∇·b = 0,

where ρ_mic and j_mic are the microscopic charge and current density, respectively. Now, for several reasons, it does not make much sense to consider these equations as they stand: First, it is clear that attempts to get a system of O(10²³) dynamical sources under control are bound to fail. Second, the dynamics of the microscopic sources is governed by quantum effects and cannot be described in terms of a classical vector field j. Finally, we aren't even interested in knowing the microscopic fields. Rather, (in classical electrodynamics) we want to understand the behaviour of fields on classical length scales, which generally exceed the atomic scales by far.[1]

In the following, we develop a more praxis oriented theory of electromagnetism in matter. We aim to lump the influence of the microscopic degrees of freedom of the matter-environment into a few effective modifiers of the Maxwell equations.

[1] For example, the wavelength of visible light is about 600 nm, whilst the typical extension of a molecule is 0.1 nm. Thus, classical length scales of interest are about three to four orders of magnitude larger than the microscopic scales.
Figure 2.1: Schematic on the spatially averaged system of sources. A zoom into a solid (left) reveals a pattern of ions or atoms or molecules (center left). Individual ions/atoms/molecules centered at lattice coordinate x_m may carry a net charge (q_m), and a dipole element d_m (center right). A dipole element forms if sub-atomic constituents (electrons/nuclei) are shifted asymmetrically w.r.t. the center coordinate. In an ionic system, mobile electrons (at coordinates x_j) contribute to the charge balance.
2.1 Macroscopic Maxwell equations
Let us, then, introduce macroscopic fields by averaging the microscopic fields as

E ≡ ⟨e⟩,   B ≡ ⟨b⟩,

where the averaging procedure is defined by ⟨f⟩(x) ≡ ∫ d³x′ f(x − x′) g(x′), and g is a weight function that is unit normalized, ∫ d³x′ g(x′) = 1, and decays over sufficiently large regions in space. Since the averaging procedure commutes with taking derivatives w.r.t. both space and time, ∂_{t,x} E = ⟨∂_{t,x} e⟩ (and the same for B), the averaged Maxwell equations read as

∇·E = 4π⟨ρ_mic⟩,
∇×B − (1/c)∂_t E = (4π/c)⟨j_mic⟩,
∇×E + (1/c)∂_t B = 0,
∇·B = 0.
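To get a feeling for what this averaging does, consider the following one-dimensional caricature (a sketch, not part of the original notes; the Gaussian weight and all numbers are hypothetical): a 'microscopic' comb of point charges is convolved with a unit-normalized weight g and comes out as an essentially homogeneous density.

```python
import numpy as np

# Sketch: 1d caricature of <f>(x) = \int dx' f(x - x') g(x'). A comb of
# point charges (spacing ~ 0.1) is smoothed by a unit-normalized Gaussian
# weight g of width w = 0.5; all numbers are hypothetical.
L, N, w = 10.0, 4096, 0.5
x = np.linspace(0.0, L, N, endpoint=False)
dx = x[1] - x[0]
rho_mic = np.zeros(N)
rho_mic[::40] = 1.0 / dx                        # "microscopic" delta comb
g = np.exp(-(x - L / 2)**2 / (2 * w**2))
g /= g.sum() * dx                               # enforce \int g = 1
# averaging = convolution; use FFTs (periodic boundary conditions)
rho_avg = dx * np.real(np.fft.ifft(np.fft.fft(rho_mic) * np.fft.fft(g)))
print("microscopic:", rho_mic.std() / rho_mic.mean())   # O(10): spiky
print("averaged:   ", rho_avg.std() / rho_avg.mean())   # << 1: smooth
```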
2.1.1 Averaged sources
To better understand the effect of the averaging on the sources, we need to take a (superficial) look into the atomic structure of a typical solid. Generally, two different types of charges in solids have to be distinguished: first, there are charges (nuclei and valence electrons) that are displaced by no more than a vector a_{m,j} off the center coordinate of a molecule (or atom for that matter), x_m. Second, (in metallic systems) there are free conduction electrons, which may abide at arbitrary coordinates x_i(t) in the system. Denoting the two contributions by ρ_b and ρ_f, respectively, we have ρ_mic = ρ_b + ρ_f, where ρ_b(x, t) = Σ_{m,j} q_{m,j} δ(x − x_m(t) − a_{m,j}(t)) and ρ_f = q_e Σ_i δ(x − x_i(t)). Here, q_{m,j} is the charge of the jth molecular constituent and q_e the electron charge.
Averaged charge density
The density of the bound charges, averaged over macroscopic proportions, is given by

⟨ρ_b⟩(x) = ∫ d³x′ g(x′) Σ_{m,j} q_{m,j} δ(x − x′ − x_m − a_{m,j}) = Σ_{m,j} g(x − x_m − a_{m,j}) q_{m,j} ≃
≃ Σ_m q_m g(x − x_m) − Σ_m ∇g(x − x_m) · Σ_j a_{m,j} q_{m,j} = Σ_m q_m ⟨δ(x − x_m)⟩ − ∇·P(x).
In the second line, we used the fact that the range over which the weight function g changes exceeds atomic extensions by far to Taylor expand to first order in the offsets a. The zeroth order term of the expansion is oblivious to the molecular structure, i.e. it contains only the total charge q_m = Σ_j q_{m,j} of the molecules and their center positions. The first order term contains the molecular dipole moments, d_m ≡ Σ_j a_{m,j} q_{m,j}. By the symbol[2]

P(x, t) ≡ ⟨Σ_m δ(x − x_m(t)) d_m(t)⟩,   (2.1)

we denote the average polarization of the medium. Evidently, P is a measure of the density of the dipole moments carried by the molecules in the system.
The average density of the mobile carriers in the system is given by ⟨ρ_f⟩(x, t) = q_e ⟨Σ_i δ(x − x_i(t))⟩, so that we obtain the average charge density as

⟨ρ_mic⟩(x, t) = ρ_av(x, t) − ∇·P(x, t),   (2.2)

where we defined the effective or macroscopic charge density in the system,

ρ_av(x, t) ≡ ⟨Σ_m q_m δ(x − x_m(t)) + q_e Σ_i δ(x − x_i(t))⟩.   (2.3)

Notice that the molecules/atoms present in the medium enter the quantity ρ_av(x, t) as point-like entities. Their finite polarizability is accounted for by the second term contributing to ⟨ρ_mic⟩. Also notice that most solids are electrically neutral on average, i.e. the positive charge of ions accounted for in the first term in (2.3) cancels against the negative charge of mobile electrons (the second term), meaning that ρ_av(x, t) = 0. Under these circumstances,

⟨ρ_mic⟩(x, t) = −∇·P(x, t),   (2.4)

[2] Here, we allow for dynamical atomic or molecular center coordinates, x_m = x_m(t). In this way, the definition of P becomes general enough to encompass non-crystalline, or liquid, substances in which atoms/molecules are not tied to a crystalline matrix.
where P is given by the averaged polarization (2.1). The electric field E in the medium is caused by the sum of the averaged microscopic density, ⟨ρ_mic⟩, and an external charge density,[3]

∇·E = 4π(⟨ρ_mic⟩ + ρ),   (2.5)

where ρ is the external charge density.

[3] For example, for a macroscopic system placed into a capacitor, the role of the external charges will be taken by the charged capacitor plates, etc.
Averaged current density
Averaging the current density proceeds in much the same way as the charge density procedure outlined above. As a result of a somewhat more tedious calculation (see the info block below) we obtain

⟨j_mic⟩ = j_av + ∂_t P + c∇×M,   (2.6)

where

M(x) = ⟨Σ_m δ(x − x_m) (1/2c) Σ_j q_{m,j} a_{m,j} × ȧ_{m,j}⟩   (2.7)
is the average density of magnetic dipole moments in the system. The quantity

j_av ≡ ⟨q_e Σ_i ẋ_i δ(x − x_i)⟩ + ⟨Σ_m q_m ẋ_m δ(x − x_m)⟩

is the current carried by the free charge carriers and the point-like approximation of the molecules. Much like the average charge density ρ_av, this quantity vanishes in the majority of cases (solids usually do not support non-vanishing current densities on average), i.e. j_av = 0, meaning that the average of the microscopic current density is given by

⟨j_mic⟩ = ∂_t P + c∇×M.   (2.8)
In analogy to Eq. (2.5), the magnetic induction has the sum of ⟨j_mic⟩ and an external current density, j, as its source:

∇×B − (1/c)∂_t E = (4π/c)(⟨j_mic⟩ + j).   (2.9)
INFO To compute the average of the microscopic current density, we decompose the current density j_mic = j_b + j_f into a bound and a free part. With j_b(x) = Σ_{m,j} q_{m,j}(ẋ_m + ȧ_{m,j}) δ(x − x_m − a_{m,j}), the former is averaged as

⟨j_b⟩(x) = ∫ d³x′ g(x′) Σ_{m,j} q_{m,j} (ẋ_m + ȧ_{m,j}) δ(x − x′ − x_m − a_{m,j}) =
= Σ_{m,j} g(x − x_m − a_{m,j}) (ẋ_m + ȧ_{m,j}) q_{m,j} ≃ Σ_{m,j} q_{m,j} [g(x − x_m) − ∇g(x − x_m)·a_{m,j}] (ẋ_m + ȧ_{m,j}).

We next consider the different orders in a contributing to this expansion in turn. At zeroth order, we obtain

⟨j_b⟩^{(0)}(x) = Σ_m q_m g(x − x_m) ẋ_m = ⟨Σ_m q_m ẋ_m δ(x − x_m)⟩,

i.e. the current carried by the molecules in a point-like approximation. The first order term is given by

⟨j_b⟩^{(1)}(x) = Σ_m [g(x − x_m) ḋ_m − (∇g(x − x_m)·d_m) ẋ_m].

The form of this expression suggests to compare it with the time derivative of the polarization vector,

∂_t P = d_t Σ_m g(x − x_m) d_m = Σ_m [g(x − x_m) ḋ_m − (∇g(x − x_m)·ẋ_m) d_m].

The difference of these two expressions is given by X ≡ ⟨j_b⟩^{(1)} − ∂_t P = Σ_m ∇g(x − x_m) × (d_m × ẋ_m). By a dimensional argument, one may show that this quantity is negligibly small: the magnitude |ẋ_m| ∼ v is of the order of the typical velocity of the atomic or electronic compounds inside the solid. We thus obtain the rough estimate |X(q, ω)| ∼ |v(∇P)(q, ω)| ∼ vq|P|, where in the second step we generously neglected the differences in the vectorial structure of P and X, resp. However, |(∂_t P)(q, ω)| = ω|P(q, ω)|. This means that |X|/|∂_t P| ∼ vq/ω ∼ v/c is of the order of the ratio of typical velocities of non-relativistic matter and the speed of light. This ratio is so small that it (a) overcompensates the crudeness of our estimates above by far and (b) justifies neglecting the difference X. We thus conclude that

⟨j_b⟩^{(1)} ≃ ∂_t P.

The second order term contributing to the average current density is given by

⟨j_b⟩^{(2)} = −Σ_{m,j} q_{m,j} ȧ_{m,j} (a_{m,j}·∇g(x − x_m)).

This expression is related to the density of magnetic dipole moments carried by the molecules. The latter is given by (cf. Eq. (??) and Eq. (2.7))

M = (1/2c) Σ_{m,j} q_{m,j} (a_{m,j} × ȧ_{m,j}) g(x − x_m).

As a result of a straightforward calculation, we find that the curl of this expression is given by

∇×M = (1/c) ⟨j_b⟩^{(2)} + d_t [(1/2c) Σ_{m,j} q_{m,j} a_{m,j} (a_{m,j}·∇g(x − x_m))].

The second term on the r.h.s. of this expression engages the time derivative of the electric quadrupole moments ∼ q a_a a_b, a, b = 1, 2, 3, carried by the molecules. The fields generated by these terms are arguably very weak, so that

⟨j_b⟩^{(2)} ≃ c∇×M.

Adding to the results above the contribution of the free carriers, ⟨j_f⟩(x) = ⟨q_e Σ_i ẋ_i δ(x − x_i)⟩, we arrive at Eq. (2.6).
Interpretation of the sources I: charge and current densities
Eqs. (2.4) and (2.8) describe the spatial average of the microscopic sources of electromagnetic fields in an extended medium. The expressions appearing on the r.h.s. of these equations can be interpreted in two different ways, both useful in their own right. One option is to look at the r.h.s. of, say, Eq. (2.4) as an effective charge density. This density obtains as the divergence of a vector field P describing the average polarization in the medium (cf. Eq. (2.1)).

To understand the meaning of this expression, consider the cartoon of a solid shown in Fig. 2.2. Assume that the microscopic constituents of the medium are polarized in a certain direction.[4] Formally, this polarization is described by a vector field P, as indicated by the horizontal cut through the medium in Fig. 2.2. In the bulk of the medium, the positive and negative ends of the molecules carrying the polarization mutually neutralize and no net charge density remains. (Remember that the medium is assumed to be electrically neutral on average.) However, the boundaries of the system carry layers of uncompensated positive and negative polarization, and this is where a non-vanishing charge density will form (indicated by a solid and a dashed line at the system boundaries.) Formally, the vector field P begins and ends (has its 'sources') at the boundaries, i.e. ∇·P is non-vanishing at the boundaries, where it represents the effective surface charge density. This explains the appearance of ∇·P as a source term in the Maxwell equations.

Dynamical changes of the polarization, ∂_t P ≠ 0, due to, for example, fluctuations in the orientation of polar molecules, correspond to the movement of charges parallel to P. The rate ∂_t P of charge movement is a current flow in P-direction.[5] Thus, ∂_t P must appear as a bulk current source in the Maxwell equations.

Finally, a non-vanishing magnetization M (cf. the vertical cut through the system) corresponds to the flow of circular currents in a plane perpendicular to M. For homogeneous M, the currents mutually cancel in the bulk of the system, but a non-vanishing surface current remains. Formally, this surface current density is represented by the curl of M (Exercise: compute the curl of a vector field that is perpendicular to the cut plane and homogeneous across the system.), i.e. c∇×M is a current source in the effective Maxwell equations.

[4] Such a polarization may form spontaneously, or, more frequently, be induced externally, cf. the info block below.
[5] Notice that a current flow j = ∂_t P is not in conflict with the condition of electro-neutrality.
Figure 2.2: Cartoon illustrating the origin of (surface) charge and current densities forming in a solid with non-vanishing polarization and/or magnetization.
Interpretation of the sources II: fields
Above, we have interpreted −∇·P, ∂_t P and c∇×M as charge and current sources in the Maxwell equations. But how do we know these sources? It stands to reason that the polarization, P, and magnetization, M, of a bulk material will sensitively depend on the electric field, E, and the magnetic field, B, present in the system. For in the absence of such fields, E = 0, B = 0, only few materials will spontaneously build up a non-vanishing polarization, or magnetization.[6] Usually, it takes an external electric field to orient the microscopic constituents of a solid so as to add up to a bulk polarization. Similarly, a magnetic field is usually needed to generate a bulk magnetization.

This means that the solution of the Maxwell equations poses us with a self consistency problem: An external charge distribution, ρ_ext, immersed into a solid gives rise to an electric field. This field in turn creates a bulk polarization, P[E], which in turn alters the charge distribution, ρ_e[E] ≡ ρ_ext − ∇·P[E]. (The notation indicates that the polarization depends on E.) Our task then is to self consistently solve the equation ∇·E = 4πρ_e[E], with a charge distribution that depends on the sought for electric field E. Similarly, the current distribution j = j_e[B, E] in a solid depends on the electric and the magnetic field, which means that Ampère's law, too, poses a self consistency problem.

In this section we reinterpret polarization and magnetization in a manner tailor made to the self consistent solution of the Maxwell equations. We start by substituting Eq. (2.2) into (2.5). Rearranging terms we obtain

∇·(E + 4πP) = 4πρ.

[6] Materials disobeying this rule are called ferroelectric (spontaneous formation of polarization), or ferromagnetic (spontaneous magnetization), respectively. In spite of their relative scarcity, ferroelectric and magnetic materials are of huge applied importance. Application fields include storage and display technology, sensor technology, general electronics, and, of course, magnetism.
This equation suggests to introduce a field

D ≡ E + 4πP.   (2.10)

The source of this displacement field is the external charge density,

∇·D = 4πρ,   (2.11)

(while the electric field has the spatial average of the full microscopic charge system as its source, ∇·E = 4π⟨ρ_mic⟩.) Notice the slightly different perspectives at which Eqs. (2.5) and (2.10) interpret the phenomenon of matter polarization. Eq. (2.5) places the emphasis on the intrinsic charge density 4π⟨ρ_mic⟩. The actual electric field, E, is caused by the sum of external charges and polarization charges. In Eq. (2.10), we consider a fictitious field D, whose sources are purely external. The field D differs from the actual electric field E by (4π times) the polarization, P, building up in the system. Of course, the two alternative interpretations of polarization are physically equivalent. (Think about this point.)
The magnetization, M, can be treated in similar terms: Substituting Eq. (2.6) into (2.9), we obtain

∇×(B − 4πM) − (1/c)∂_t(E + 4πP) = (4π/c) j.

The form of this equation motivates the definition of the magnetic field

H ≡ B − 4πM.   (2.12)

Expressed in terms of this quantity, the Maxwell equation assumes the form

∇×H − (1/c)∂_t D = (4π/c) j,   (2.13)

where j is the external current density. According to this equation, H plays a role similar to D: the sources of the magnetic field H are purely external. The field H differs from the induction B (i.e. the field a magnetometer would actually measure in the system) by (4π times) the magnetization M. In conceptual analogy to the electric field, E, the induction B has the sum of external and intrinsic charge and current densities as its source (cf. Eq. (2.9).)
Electric and magnetic susceptibility
To make progress with Eqs. (2.10) and (2.11), we need to say more about the polarization. Above, we have anticipated that a finite polarization is usually induced by an electric field, i.e. P = P[E]. Now, external electric fields triggering a non-vanishing polarization are usually much weaker than the intrinsic microscopic fields keeping the constituents of a solid together. In practice, this means that the polarization may be approximated by a linear functional of the electric field. The most general form of a linear functional reads as[7]

P(x, t) = (1/(2π)⁴) ∫ d³x′ ∫ dt′ χ(x, t; x′, t′) E(x′, t′),   (2.14)

where the integral kernel χ is called the electric susceptibility of the medium and the factor (2π)⁻⁴ has been introduced for convenience. The susceptibility χ(x, t; x′, t′) describes how an electric field amplitude at x′ and t′ < t affects the polarization at (x, t).[8] Notice that the polarization obtains by convolution of χ and E. In Fourier space, the equation assumes the simpler form P(q) = χ(q)E(q), where q = (ω/c, q) as usual. Accordingly, the electric field and the displacement field are connected by the linear relation

D(q) = ε(q)E(q),   ε(q) = 1 + 4πχ(q),   (2.15)

where the function ε(q) is known as the dielectric function.
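The statement that the convolution (2.14) turns into a product in Fourier space is easily illustrated numerically. The sketch below (not from the original text; the causal kernel χ(t) ∝ Θ(t)e^{−t/τ} and the Gaussian pulse E(t) are hypothetical choices) compares the direct time-domain convolution with the inverse FFT of the product of the transforms:

```python
import numpy as np

# Sketch: P = chi * E (convolution in time) equals the inverse FFT of
# chi(omega) E(omega). Causal kernel chi(t) = theta(t) exp(-t/tau) and a
# Gaussian pulse E(t) are hypothetical choices.
N, T, tau = 4096, 200.0, 3.0
t = np.linspace(0.0, T, N, endpoint=False)
dt = t[1] - t[0]
chi = np.exp(-t / tau)                       # t >= 0 grid: theta(t) built in
E = np.exp(-(t - 50.0)**2 / 8.0)             # pulse well inside the window
P_direct = dt * np.convolve(chi, E)[:N]      # direct causal convolution
P_fft = dt * np.real(np.fft.ifft(np.fft.fft(chi) * np.fft.fft(E)))
print(np.max(np.abs(P_direct - P_fft)))      # ~ 0 up to wrap-around error
```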
INFO On a microscopic level, at least two different mechanisms causing field-induced polarization need to be distinguished: Firstly, a finite electric field may cause a distortion of the electron clouds surrounding otherwise unpolarized atoms or molecules. The relative shift of the electron clouds against the nuclear centers causes a finite molecular dipole moment which integrates to a macroscopic polarization P. Second, in many substances (mostly of organic chemical origin), the molecules (a) carry a permanent intrinsic dipole moment and (b) are free to move. A finite electric field will cause spatial alignment of the otherwise disoriented molecules. This leads to a polarization of the medium as a whole.

Information on the frequency/momentum dependence of the dielectric function has to be inferred from outside electrodynamics, typically from condensed matter physics, or plasma physics.[9] In many cases of physical interest (e.g. in the physics of plasmas or the physics of metallic systems) the dependence of ε on both q and ω has to be taken seriously. Sometimes, only the frequency dependence, or the dependence on the spatial components of momentum, is of importance. However, the crudest approximation of all (often adopted in this course) is to approximate the function ε(q) by a constant. (Which amounts to approximating χ(x, t; x′, t′) ∼ δ(x − x′)δ(t − t′) by a function infinitely short ranged in space and time.) If not mentioned otherwise, we will thus assume

D = εE,

where ε is a material constant.
[7] Yet more generally, one might allow for a non-colinear dependence of the polarization on the field vector,

P_i(x, t) = (1/(2π)⁴) ∫ d³x′ ∫ dt′ χ_{ij}(x, t; x′, t′) E_j(x′, t′),

where χ = {χ_{ij}} is a 3×3-matrix field.

[8] By causality, χ(x, t; x′, t′) ∝ Θ(t − t′).

[9] A plasma is a gas of charged particles.
As with the electric sector of the theory, the magnetic fields created by externally imposed macroscopic current distributions (a) magnetize solids, i.e. generate finite fields M, and (b) are generally much weaker than the intrinsic microscopic fields inside a solid. This means that we can write B = H + 4πM[H], where M[H] may be assumed to be linear in H. In analogy to Eq. (2.14) we thus define

M(x, t) = (1/(2π)⁴) ∫ d³x′ ∫ dt′ χ_m(x, t; x′, t′) H(x′, t′),   (2.16)

where the function χ_m is called the magnetic susceptibility of the system. The magnetic susceptibility describes how a magnetic field at (x′, t′) causes magnetization at (x, t). Everything we said above about the electric susceptibility applies in a similar manner to the magnetic susceptibility. Specifically, we have the Fourier space relation M(q) = χ_m(q)H(q) and

B(q) = μ(q)H(q),   μ(q) = 1 + 4πχ_m(q),   (2.17)

where μ(q) is called the magnetic permeability. In cases where the momentum dependence of the susceptibility is negligible, we obtain the simplified relation

B = μH,   (2.18)

where μ is a constant. As in the electric case, a number of different microscopic mechanisms causing deviations of μ off unity can be distinguished. Specifically, for molecules carrying no intrinsic magnetic moment, the presence of an external magnetic field may induce molecular currents and, thus, a non-vanishing magnetization. This magnetization is generally directed opposite to the external field, i.e. μ < 1. Substances of this type are called diamagnetic. (Even for the most diamagnetic substance known, bismuth, 1 − μ = O(10⁻⁴), i.e. diamagnetism is a very weak effect.) For molecules carrying a non-vanishing magnetic moment, an external field will cause alignment of the latter and, thus, an effective amplification of the field, μ > 1. Materials of this type are called paramagnetic. Typical values of paramagnetic permeabilities are of the order of μ − 1 = O(10⁻⁵) to O(10⁻²).

Before leaving this section, it is worthwhile to take a final look at the general structure of the Maxwell equations in matter. We note that:

- The external (macroscopically averaged) charge densities and currents are sources of the electric displacement field D and the magnetic field H, respectively.
- These fields are related to the electric field E and the magnetic induction B by Eqs. (2.15) and (2.17), respectively.
- The actual fields one would measure in an experiment are E and B; these fields determine the coupling (fields to matter) as described by the Lorentz force law.
- Similarly, the (matter unrelated) homogeneous Maxwell equations contain the fields E and B.
2.2 Applications of macroscopic electrodynamics
In this section, we will explore a number of phenomena caused by the joint presence of matter and electromagnetic fields. The organization of the section parallels that of the vacuum part of the course, i.e. we will begin by studying static electric phenomena, then advance to the static magnetic case, and finally discuss a few applications in electrodynamics.
2.2.1 Electrostatics in the presence of matter *
Generalities
Consider the macroscopic Maxwell equations of electrostatics,

∇·D = 4πρ,
∇×E = 0,
D = εE,   (2.19)

where in the last equation we assumed a dielectric constant for simplicity. Now assume that we wish to explore the fields in the vicinity of a boundary separating two regions containing different types of matter. In general, the dielectric constants characterizing the two domains will be different, i.e. we have D = εE in region #1 and D = ε′E in region #2. (For ε′ = 1, we describe the limiting case of a matter/vacuum interface.) Optionally, the system boundary may carry a finite surface charge density η.

Arguing as in section ??, we find that the first and the second of the Maxwell equations imply the boundary conditions[10]

(D(x) − D′(x))·n(x) = 0,
(E(x) − E′(x)) × n(x) = 0,   (2.20)

where the unprimed/primed quantities refer to the fields in regions #1 and #2, respectively, and n(x) is a unit vector normal to the boundary at x. Thus, the tangential component of the electric field and the normal component of the displacement field are continuous at the interface.
Example: Dielectric sphere in a homogeneous electric field
To illustrate the computation of electric fields in static environments with matter, consider the example of a massive sphere of radius R and dielectric constant ε placed in a homogeneous external displacement field D. We wish to compute the electric field E in the entire medium.

[10] If the interface between #1 and #2 carries external surface charges, the first of these equations generalizes to (D(x) − D′(x))·n(x) = 4πη(x), where η is the surface charge density. In this case, the normal component of the displacement field jumps by an amount set by the surface charge density.
Since there are no charges present, the electric potential inside and outside the sphere obeys the Laplace equation Δφ = 0. Further, choosing polar coordinates such that the z-axis is aligned with the orientation of the external field, we expect the potential to be azimuthally symmetric, φ(r, θ, ϕ) = φ(r, θ). Expressed in terms of this potential, the boundary equations (2.20) assume the form

ε ∂_r φ|_{r=R−0} = ∂_r φ|_{r=R+0},   (2.21)
∂_θ φ|_{r=R−0} = ∂_θ φ|_{r=R+0},   (2.22)
φ|_{r=R−0} = φ|_{r=R+0},   (2.23)

where R ± 0 ≡ lim_{δ→0} (R ± δ). To evaluate these conditions, we expand the potential inside and outside the sphere in a series of spherical harmonics, (??). Due to the azimuthal symmetry of the problem, only the azimuthally independent functions Y_{l,0} = ((2l+1)/4π)^{1/2} P_l contribute, i.e.

φ(r, θ) = Σ_l P_l(cos θ) × { A_l r^l, r < R;   B_l r^l + C_l r^{−(l+1)}, r ≥ R },

where we have absorbed the normalization factors ((2l+1)/4π)^{1/2} in the expansion coefficients A_l, B_l, C_l.
At spatial infinity, the potential must approach the potential φ = −Dz = −Dr cos θ = −Dr P₁(cos θ) of the uniform external field D = De_z. Comparison with the series above shows that B₁ = −D and B_{l≠1} = 0. To determine the as yet unknown coefficients A_l and C_l, we consider the boundary conditions above. Specifically, Eq. (2.21) translates to the condition

Σ_l P_l(cos θ) [εlA_l R^{l−1} + (l+1)C_l R^{−(l+2)} + Dδ_{l,1}] = 0.

Now, the Legendre polynomials are a complete set of functions, i.e. the vanishing of the l.h.s. of the equation implies that all series coefficients must vanish individually:

C₀ = 0,
εA₁ + 2R⁻³C₁ + D = 0,
εlA_l R^{l−1} + (l+1)C_l R^{−(l+2)} = 0,   l > 1.

A second set of equations is obtained from Eq. (2.22): Σ_l ∂_θ P_l(cos θ)((A_l − B_l)R^l − C_l R^{−(l+1)}) = 0, or Σ_{l>0} P_l(cos θ)((A_l − B_l)R^l − C_l R^{−(l+1)}) = const. The second condition implies (think why!) (A_l − B_l)R^l − C_l R^{−(l+1)} = 0 for all l > 0. Substituting this condition into Eq. (2.23), we finally obtain A₀ = 0. To summarize, the expansion coefficients are determined by the set of equations

A₀ = 0,
(A₁ + D)R − C₁ R⁻² = 0,
A_l R^l − C_l R^{−(l+1)} = 0,   l > 1.
[Figure 2.3: Polarization of a sphere by an external electric field D; induced surface charges η > 0 and η < 0 accumulate on opposite hemispheres.]
It is straightforward to verify that these equations are equivalent to the conditions

C₁ = DR³ (ε − 1)/(ε + 2),   A₁ = −(3/(ε + 2)) D,

while C_{l>1} = A_{l>1} = 0. The potential distorted by the presence of the sphere is thus given by

φ(r, θ) = −rE₀ cos θ × { 3/(ε + 2), r < R;   1 − ((ε − 1)/(ε + 2))(R/r)³, r ≥ R }.   (2.24)
The result (2.24) affords a very intuitive physical interpretation:

- Inside the sphere, the electric field E = −∇φ = E₀ ∇z · 3/(ε + 2) = (3/(ε + 2)) E₀ e_z is parallel to the external electric field, but weaker in magnitude (as long as ε > 1, which usually is the case.) This indicates the buildup of a polarization P = ((ε − 1)/4π) E = (3/4π)((ε − 1)/(ε + 2)) E₀.

- Outside the sphere, the electric field is given by a superposition of the external field and the field generated by a dipole potential φ = d·x/r³ = d cos θ/r², where the dipole moment is given by d = VP and V = 4πR³/3 is the volume of the sphere.

- To understand the origin of this dipole moment, notice that in a polarizable medium, and in the absence of external charges, 0 = ∇·D = ∇·E + 4π∇·P, while the microscopic charge density, ∇·E = 4πρ_mic, need not necessarily vanish. In exceptional cases where ρ_mic does not vanish upon averaging, we denote ⟨ρ_mic⟩ ≡ ρ_pol, the polarization charge. The polarization charge is determined by the condition ∇·P = −ρ_pol.

- Turning back to the sphere, the equation 0 = ∇·D = ∇·(E + 4πP), together with the proportionality of P and E, tells us that there is no polarization charge in the bulk of the sphere (nor, of course, outside the sphere.)
However, arguing as in section ??, we find that the sphere (or any boundary between two media with different dielectric constants) carries a surface polarization charge

η_ind = P_n − P′_n.

Specifically, in our problem, P′_n = 0, so that η_ind = P_n = (3/4π)((ε − 1)/(ε + 2)) E₀ cos θ. Qualitatively, the origin of this charge is easy enough to understand: while in the bulk of the sphere the induced dipole moments cancel each other upon spatial averaging (see the figure), uncompensated charges remain at the surface of the medium. These surface charges induce a finite and quite macroscopic dipole moment, whose field inside the sphere is aligned opposite to E₀, and which acts as a source of the electric field (but not of the displacement field).
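For orientation, here are concrete numbers for the sphere (a sketch, not part of the original notes; ε = 4, R = 1, E₀ = 1 are hypothetical values). It also verifies that the induced dipole moment d = VP coincides with the coefficient C₁ = DR³(ε − 1)/(ε + 2) found above:

```python
import numpy as np

# Sketch: numbers for the dielectric sphere (hypothetical eps = 4, R = 1,
# E0 = 1): interior field, polarization, and induced dipole moment.
eps, R, E0 = 4.0, 1.0, 1.0
E_in = 3.0 * E0 / (eps + 2.0)                # uniform field inside the sphere
P = (eps - 1.0) / (4.0 * np.pi) * E_in       # polarization density
d = (4.0 * np.pi * R**3 / 3.0) * P           # dipole moment d = V P
print(E_in, d, (eps - 1.0) / (eps + 2.0) * R**3 * E0)  # d equals C1
```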
2.2.2 Magnetostatics in the presence of matter
Generalities
The derivation of magnetic boundary conditions in the presence of materials with a finite permeability largely parallels the electric case: the macroscopic equations of magnetostatics read as

∇×H = (4π/c) j,
∇·B = 0,
H = (1/μ) B.   (2.25)

Again, we wish to explore the behaviour of the fields in the vicinity of a boundary between two media with different permeabilities. Proceeding in the usual manner, i.e. by application of infinitesimal variants of Gauss' and Stokes' law, respectively, we derive the equations

(B(x) − B′(x))·n = 0,
(H(x) − H′(x)) × n = (4π/c) j_s,   (2.26)

where j_s is an optional surface current.
Example: paramagnetic sphere in a homogeneous magnetic field
Consider a sphere of radius R and permeability > 1 in the presence of an external magnetic
eld H. We wish to compute the magnetic induction B inside and outside the sphere. In
principle, this can be done as in the analogous problem on the dielectric sphere, i.e. by
Legendre polynomial series expansion. However, comparing the magnetic Maxwell equations
to electric Maxwell equations considered earlier,
B = 0 D = 0,
H = 0 E = 0,
B = H D = E,
P =
1
4
E M =
1
4
H,
2.3. WAVE PROPAGATION IN MEDIA 51
we notice that there is actually no need to do so: Formally identifying B D, H E,
, M P, the two sets of equation become equivalent, and we may just copy the solution
obtained above. Thus (i) the magnetic eld outside the sphere is a superposition of the
external eld and a magnetic dipole eld, where (ii) the dipole moment M =
3
4
1
+2
H is
parallel to the external eld. This magnetic moment is caused by the orientation of the
intrinsic moments along the external eld axis; consequently, the actually felt magnetic eld
(the magnetic induction) B = H+ 4M exceeds the external eld.
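As a quick numerical cross-check of this correspondence (our own illustrative sketch, not part of the original text; the function names and parameter values are ad hoc), the interior fields of both spheres follow directly from the formulas above:

```python
def dielectric_sphere_interior(E0, eps):
    """Interior field of the dielectric sphere: E_in = 3/(eps+2) E0, parallel to E0."""
    return 3.0 / (eps + 2.0) * E0

def paramagnetic_sphere_interior(H0, mu):
    """Magnetic analogue via B<->D, H<->E, mu<->eps:
    H_in = 3/(mu+2) H0 and B_in = mu H_in = H_in + 4 pi M."""
    H_in = 3.0 / (mu + 2.0) * H0
    return H_in, mu * H_in

print(dielectric_sphere_interior(1.0, eps=5.0))    # 0.43 < 1: interior E field weakened
print(paramagnetic_sphere_interior(1.0, mu=5.0))   # B_in = 2.14 > 1: induction enhanced
```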
2.3 Wave propagation in media

What happens if an electromagnetic wave impinges upon a medium characterized by a nontrivial dielectric function $\epsilon$?$^{11}$ And how does the propagation behaviour of waves relate to actual physical processes inside the medium? To prepare the discussion of these questions, we will first introduce a physically motivated model for the dependence of the dielectric function on its arguments. This modelling will establish a concrete link between the amplitude of the electromagnetic waves and the dynamics of the charge carriers inside the medium.
2.3.1 Model dielectric function

In Fourier space, the dielectric function $\epsilon(\mathbf q,\omega)$ is a function of both wave vector and frequency. It owes its frequency dependence to the fact that a wave in matter creates excitations of the molecular degrees of freedom. The feedback of these excitations to the electromagnetic wave is described by the frequency dependence of the function $\epsilon$. To get an idea of the relevant frequency scales, notice that typical excitation energies in solids are of the order $\Delta E\lesssim 1\,$eV. Noting that Planck's constant $\hbar\simeq 0.6\times 10^{-15}\,$eV$\,$s, we find that the characteristic frequencies at which solids dominantly respond to electromagnetic waves are of the order $\omega\lesssim 10^{15}\,$s$^{-1}$. However, the wave lengths corresponding to these frequencies, $\lambda = 2\pi/|\mathbf q|\simeq 2\pi c/\omega\simeq 10^{-6}\,$m, are much larger than the range of a few interatomic spacings over which we expect the nonlocal feedback of the external field into the polarization to be screened (the range of the susceptibility $\chi$). This means (think why!) that the function $\epsilon(\mathbf q,\omega)\simeq\epsilon(\omega)$ is largely momentum-independent at the momentum scales of physical relevance.

To obtain a crude model for the function $\epsilon(\omega)$, we imagine that the electrons surrounding the molecules inside the solid are harmonically bound to the nuclear center coordinates. An individual electron at spatial coordinate $\mathbf x$ (measured w.r.t. a nuclear position) will then be subject to the equation of motion

$$m\left(d_t^2 + \omega_0^2 + \gamma\,d_t\right)\mathbf x(t) = -e\,\mathbf E(t), \qquad (2.27)$$

where $\omega_0$ is the characteristic frequency of the oscillator motion of the electron, $\gamma$ a relaxation constant, and the variation of the external field over the extent of the molecule has been neglected (cf. the discussion above).

$^{11}$We emphasize dielectric function because, as shall be seen below, the magnetic permeability has a much lesser impact on electromagnetic wave propagation.
Fourier transforming this equation, we obtain the dipole moment of a single bound electron, $\mathbf d \equiv -e\,\mathbf x(\omega) = e^2\,\mathbf E(\omega)\,m^{-1}/(\omega_0^2 - \omega^2 - i\gamma\omega)$. The dipole moment $\mathbf d(\omega)$ of a molecule with $f_i$ electrons oscillating at frequency $\omega_i$ and damping rate $\gamma_i$ is given by $\mathbf d(\omega) = \sum_i f_i\,\mathbf d\big|_{(\omega_0\to\omega_i,\,\gamma\to\gamma_i)}$. Denoting the density of molecules in the medium by $n$, we thus obtain (cf. Eqs. (2.1), (2.10), and (2.15))

$$\epsilon(\omega) = 1 + \frac{4\pi ne^2}{m}\sum_i\frac{f_i}{\omega_i^2 - \omega^2 - i\gamma_i\omega}. \qquad (2.28)$$
With suitable quantum mechanical definitions of the material constants $f_i$, $\omega_i$ and $\gamma_i$, the model dielectric function (2.28) provides an accurate description of the molecular contribution to the dielectric behaviour of solids. A schematic of the typical behaviour of the real and imaginary part of the dielectric function is shown in Fig. 2.4. A few remarks on the profile of these functions:

- The material constant $\epsilon$ employed in the static sections of this chapter is given by $\epsilon = \epsilon(0)$. The imaginary part of the zero frequency dielectric function is negligible.

- For reasons to become clear later on, we call the function

$$\kappa(\omega) = \frac{2\omega}{c}\,\mathrm{Im}\sqrt{\epsilon(\omega)} \qquad (2.29)$$

the (frequency dependent) absorption coefficient and

$$n(\omega) = \mathrm{Re}\sqrt{\epsilon(\omega)} \qquad (2.30)$$

the (frequency dependent) refraction coefficient of the system.

- For most frequencies, i.e. everywhere except for the immediate vicinity of a resonant frequency $\omega_i$, the absorption coefficient is negligibly small. In these regions, $\frac{d}{d\omega}\,\mathrm{Re}\,\epsilon(\omega) > 0$. The positivity of this derivative defines a region of normal dispersion.

- In the immediate vicinity of a resonant frequency, $|\omega - \omega_i| < \gamma_i/2$, the absorption coefficient shoots up while the derivative changes sign, $\frac{d}{d\omega}\,\mathrm{Re}\,\epsilon(\omega) < 0$. As we shall discuss momentarily, the physics of these frequency windows of anomalous dispersion is governed by resonant absorption processes.
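To make the shape of these curves concrete, here is a small numerical sketch (ours; a single resonance, arbitrary illustrative parameters, with the oscillator-strength prefactor of Eq. (2.28) lumped into one constant) that evaluates the model dielectric function and extracts the coefficients (2.29) and (2.30):

```python
import numpy as np

def eps(w, w0=1.0, gamma=0.1, wp2=0.5):
    """Single-resonance version of Eq. (2.28); wp2 lumps the prefactor 4 pi n e^2 f/m."""
    return 1.0 + wp2 / (w0**2 - w**2 - 1j * gamma * w)

w = np.linspace(0.01, 2.0, 2000)
root = np.sqrt(eps(w))
n_w = root.real              # refraction coefficient, Eq. (2.30)
kappa_w = 2 * w * root.imag  # absorption coefficient, Eq. (2.29), with c = 1

# normal dispersion: d(Re eps)/dw > 0 away from the resonance; the derivative
# changes sign only in the narrow anomalous window around w0:
d_re = np.gradient(eps(w).real, w)
print(w[d_re < 0].min(), w[d_re < 0].max())
```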
Figure 2.4: Schematic of the functional profile of the function $\epsilon(\omega)$. Each resonance frequency $\omega_i$ is the center of a peak in the imaginary part of $\epsilon$ and of a resonant swing in the real part. The width of these structures is determined by the damping rate $\gamma_i$.

2.3.2 Plane waves in matter

To study the impact of the nonuniform dielectric function on the behaviour of electromagnetic waves in matter, we consider the spatiotemporal Fourier transform of the Maxwell equations in a medium free of extraneous sources, $\rho = 0$, $\mathbf j = 0$. Using that $\mathbf D(k) = \epsilon(k)\mathbf E(k)$ and $\mathbf H(k) = \frac{1}{\mu(k)}\mathbf B(k)$, where $k = (\omega/c, \mathbf k)$, we have

$$\mathbf k\cdot(\epsilon\mathbf E) = 0,\qquad \mathbf k\times(\mu^{-1}\mathbf B) + \frac{\omega}{c}\,(\epsilon\mathbf E) = 0,\qquad \mathbf k\times\mathbf E - \frac{\omega}{c}\,\mathbf B = 0,\qquad \mathbf k\cdot\mathbf B = 0,$$
where we omitted momentum arguments for notational clarity. We next compute $\mathbf k\times(\text{third equation}) + (\mu\omega/c)\,(\text{second equation})$ and $(\epsilon\omega/c)\,(\text{third equation}) - \mathbf k\times(\text{second equation})$ to obtain the Fourier representation of the wave equation,

$$\left(k^2 - \frac{\omega^2}{c(\omega)^2}\right)\begin{Bmatrix}\mathbf E(\mathbf k,\omega)\\ \mathbf B(\mathbf k,\omega)\end{Bmatrix} = 0, \qquad (2.31)$$

where

$$c(\omega) = \frac{c}{\sqrt{\epsilon(\omega)\mu(\omega)}} \qquad (2.32)$$

is the effective velocity of light in matter.
Comparing with our discussion of section 1.4, we conclude that a plane electric wave in matter is mathematically described by the function

$$\mathbf E(\mathbf x, t) = \mathbf E_0\exp(i(\mathbf k\cdot\mathbf x - \omega t)),$$

where $\mathbf k = k\mathbf n$ and $k = \omega/c(\omega)$ is a (generally complex) function of the wave frequency. Specifically, for $\mathrm{Im}(\mathbf k\cdot\mathbf x) > 0$, the wave will be exponentially damped. Splitting the wave number $k$ into its real and imaginary part, we have

$$k = \mathrm{Re}\,k + i\,\mathrm{Im}\,k = \frac{\omega}{c}\,n(\omega) + \frac{i}{2}\,\kappa(\omega),$$
where $n$ and $\kappa$ are the refraction and absorption coefficient, respectively. To better understand the meaning of the absorption coefficient, let us compute the Poynting vector of the electromagnetic wave. As in section 1.4, the Maxwell equation $0 = \nabla\cdot\mathbf D = \epsilon\,\nabla\cdot\mathbf E \Rightarrow \mathbf k\cdot\mathbf E_0 = 0$ implies transversality of the electric field. (Here we rely on the approximate $\mathbf x$-independence of the dielectric function.) Choosing coordinates such that $\mathbf n = \mathbf e_3$ and assuming planar polarization for simplicity, we set $\mathbf E_0 = E_0\mathbf e_1$ with a real coefficient $E_0$. The physically relevant real part of the electric field is thus given by

$$\mathrm{Re}\,\mathbf E = E_0\,\mathbf e_1\cos(\omega(zn/c - t))\,e^{-\kappa z/2}.$$

From the Maxwell equations $\nabla\cdot\mathbf B = 0$ and $\nabla\times\mathbf E + c^{-1}\partial_t\mathbf B = 0$, we further obtain (exercise; to leading order in the weak absorption $\kappa$) $\mathrm{Re}\,\mathbf B = n\,E_0\,\mathbf e_2\cos(\omega(zn/c - t))\,e^{-\kappa z/2}$. Assuming that the permeability is close to unity, $\mu\simeq 1$ or $\mathbf B\simeq\mathbf H$, we thus obtain for the magnitude of the Poynting vector

$$|\mathbf S| = \frac{c}{4\pi}\,|\mathbf E\times\mathbf H| = \frac{c\,n\,E_0^2}{4\pi}\,\cos^2(\omega(zn/c - t))\,e^{-\kappa z}\ \xrightarrow{\ \langle\cdots\rangle_t\ }\ \frac{c\,n\,E_0^2}{8\pi}\,e^{-\kappa z},$$

where in the last step we have averaged over several intervals of the oscillation period $2\pi/\omega$. According to this result,

the electromagnetic energy current inside a medium decays at a rate set by the absorption coefficient.

This phenomenon is easy enough to interpret: according to our semi-phenomenological model of the dielectric function above, the absorption coefficient is non-vanishing in the immediate vicinity of a resonant frequency $\omega_i$, i.e. a frequency where molecular degrees of freedom oscillate. The energy stored in an electromagnetic wave of this frequency may thus get converted into mechanical oscillator energy. This goes along with a loss of field intensity, i.e. a diminishing of the Poynting vector or, equivalently, a loss of electromagnetic energy density, $w$.
2.3.3 Dispersion

In the previous section we have focused on the imaginary part of the dielectric function and on its attenuating impact on individual plane waves propagating in matter. We next turn to the discussion of the equally important role played by the real part. To introduce the relevant physical principles in a maximally simple environment, we will focus on a one-dimensional model of wave propagation throughout. Consider, thus, the one-dimensional variant of the wave equations (2.31),

$$\left(k^2 - \frac{\omega^2}{c(\omega)^2}\right)\psi(k,\omega) = 0, \qquad (2.33)$$

where $k$ is a one-dimensional wave vector and $c(\omega) = c/\sqrt{\epsilon(\omega)}$ as before. (For simplicity, we set $\mu = 1$ from the outset.) Plane wave solutions to this equation are given by $\psi(x,t) = \psi_0\exp(ik(\omega)x - i\omega t)$, where $k(\omega) = \omega/c(\omega)$.$^{12}$ Notice that an alternative representation of the plane wave configurations reads as $\psi(x,t) = \psi_0\exp(ikx - i\omega(k)t)$, where $\omega(k)$ is defined implicitly, viz. as a solution of the equation $\omega/c(\omega) = k$.

To understand the consequences of frequency dependent variations in the real part of the dielectric function we need, however, to go beyond the level of isolated plane waves. Rather, let us consider a superposition of plane waves (cf. Eq. (1.42)),

$$\psi(x,t) = \int dk\,\psi_0(k)\,e^{i(kx - \omega(k)t)}, \qquad (2.34)$$

where $\psi_0(k)$ is an arbitrary function. (Exercise: check that this superposition solves the wave equations.)
Specifically, let us choose the function $\psi_0(k)$ such that, at initial time $t = 0$, the distribution $\psi(x, t=0)$ is localized in a finite region in space. Configurations of this type are called wave packets. While there is a lot of freedom in choosing such spatially localizing envelope functions, a convenient (and for our purposes sufficiently general) choice of the weight function $\psi_0$ is a Gaussian,

$$\psi_0(k) = \psi_0\exp\left(-\frac{(k-k_0)^2\sigma^2}{4}\right),$$

where $\psi_0\in\mathbb C$ is a fixed amplitude coefficient and $\sigma$ a coefficient of dimensionality [length] that determines the spatial extent of the wave packet (at time $t = 0$). To check the latter assertion, let us compute the spatial profile $\psi(x,0)$ of the wave packet at the initial time:

$$\psi(x,0) = \psi_0\int dk\,e^{-\frac{(k-k_0)^2\sigma^2}{4} + ikx} = \psi_0\,e^{ik_0x}\int dk\,e^{-\frac{k^2\sigma^2}{4} + ikx} = \psi_0\,e^{ik_0x}\int dk\,e^{-\frac{\sigma^2}{4}\left(k - \frac{2ix}{\sigma^2}\right)^2 - (x/\sigma)^2} = \frac{2\sqrt\pi}{\sigma}\,\psi_0\,e^{ik_0x}\,e^{-(x/\sigma)^2}.$$

The function $\psi(x,0)$ is concentrated in a volume of extension $\sigma$; it describes the profile of a small wave packet at initial time $t = 0$. What will happen to this wave packet as time goes on?

To answer this question, let us assume that the scales over which the function $\omega(k)$ varies are much larger than the extension of the wave packet in wave number space, $\Delta k\sim\sigma^{-1}$. It is, then,

$^{12}$There is also the left moving solution, $\psi(x,t) = \psi_0\exp(ik(\omega)x + i\omega t)$, but for our purposes it will be sufficient to consider right moving waves.
Figure 2.5: Left: dispersive propagation of an initially sharply focused wave packet. Right: a more shallow wave packet suffers less drastically from dispersive deformation.
a good idea to expand $\omega(k)$ around $k_0$:

$$\omega(k) = \omega_0 + v_g(k - k_0) + \frac{a}{2}(k - k_0)^2 + \ldots,$$

where we introduced the abbreviations $\omega_0 \equiv \omega(k_0)$, $v_g \equiv \omega'(k_0)$ and $a \equiv \omega''(k_0)$. Substituting this expansion into Eq. (2.34) and doing the Gaussian integral over wave numbers, we obtain

$$\psi(x,t) = \psi_0\int dk\,e^{-(k-k_0)^2\sigma^2/4 + i(kx - \omega(k)t)} \simeq \psi_0\,e^{ik_0(x - v_pt)}\int dk\,e^{-(\sigma^2 + 2iat)(k-k_0)^2/4 + i(k-k_0)(x - v_gt)} = \frac{2\sqrt\pi\,\psi_0}{\sigma(t)}\,e^{ik_0(x - v_pt)}\,e^{-((x - v_gt)/\sigma(t))^2},$$

where we introduced the function $\sigma(t) = \sqrt{\sigma^2 + 2iat}$ and the abbreviation $v_p \equiv \omega(k_0)/k_0$. Let us try to understand the main characteristics of this result:
- The center of the wave packet moves with a characteristic velocity $v_g = \partial_k\omega(k_0)$ to the right. In as much as $v_g$ determines the effective center velocity of a superposition of a continuum or "group" of plane waves, $v_g$ is called the group velocity of the system. Only in vacuum, where $c(\omega) = c$ is independent of frequency, do we have $\omega = ck$, so that the group velocity $v_g = \partial_k\omega = c$ coincides with the vacuum velocity of light.

- A customary yet far less useful notion is that of a phase velocity: an individual plane wave behaves as $\exp(i(kx - \omega(k)t)) = \exp(ik(x - (\omega(k)/k)t))$. In view of the structure of the exponent, one may call $v_p \equiv \omega(k)/k$ the velocity of the wave. Since, however, the wave extends over the entire system anyway, the phase velocity is not of direct physical relevance. Notice that the phase velocity $v_p = \omega(k)/k = c(\omega) = c/n(\omega)$, where the refraction coefficient has been introduced in (2.30). For most frequencies and in almost all substances, $n > 1 \Rightarrow v_p < c$. In principle, however, the phase velocity may become larger than the speed of light.$^{13}$ The group velocity $v_g = \partial_k\omega(k) = (\partial_\omega k(\omega))^{-1} = \left(\partial_\omega\frac{\omega\,n(\omega)}{c}\right)^{-1} = \frac{c}{n(\omega) + \omega\,\partial_\omega n(\omega)} = v_p\,(1 + n^{-1}\omega\,\partial_\omega n)^{-1}$. In regions of normal dispersion, $\partial_\omega n > 0$, implying that $v_g < v_p$. For the discussion of anomalous cases, where $v_g$ may exceed the phase velocity or even the vacuum velocity of light, see [1].

$^{13}$However, there is no need to worry that such anomalies conflict with Einstein's principle of relativity, to be discussed in chapter 3 below: the dynamics of a uniform wave train is not linked to the transport of energy or other observable physical quantities, i.e. there is no actual physical entity that is transported at a velocity larger than the speed of light.
- The width of the wave packet, $\Delta x$, changes in time. Inspection of the Gaussian factor controlling the envelope of the packet shows that $\mathrm{Re}\,\sigma(t)^{-2} = \sigma^2/(\sigma^4 + 4a^2t^2)$, i.e. the envelope decays as $\sim e^{-(x - v_gt)^2\sigma^2/(\sigma^4 + 4a^2t^2)}$, where the symbol $\sim$ indicates that only the exponential dependence of the packet is taken into account. Thus, the width of the packet is given by $\delta(t) = \sigma\,(1 + (2at/\sigma^2)^2)^{1/2}$. According to this formula, the rate at which $\delta(t)$ increases (the wave packet "flows apart") is the larger, the sharper the spatial concentration of the initial wave packet was (cf. Fig. 2.5). For large times, $t\gg\sigma^2/a$, the width of the wave packet increases as $\Delta x\sim t$. The disintegration of the wave packet is caused by a phenomenon called dispersion: the form of the packet at $t = 0$ is obtained by (Fourier) superposition of a large number of plane waves. If all these plane waves propagated with the same phase velocity (i.e. if $\omega(k)/k$ was $k$-independent), the form of the wave packet would remain unchanged. However, due to the fact that in a medium plane waves of different wave number $k$ generally propagate at different velocities, the Fourier spectrum of the packet at nonzero times differs from that at time zero, which in turn means that the integrity of the form of the wave packet gets gradually lost.
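The spreading formula can be checked directly. The following sketch (ours; arbitrary illustrative parameters) superposes the plane waves of Eq. (2.34) for a quadratic dispersion and compares the measured packet width with $\delta(t)$:

```python
import numpy as np

sigma, k0, vg, a, w0 = 1.0, 5.0, 1.0, 0.2, 5.0
k = np.linspace(k0 - 6.0, k0 + 6.0, 1200)
x = np.linspace(-10.0, 30.0, 1200)
psi0 = np.exp(-(k - k0)**2 * sigma**2 / 4)        # Gaussian weight function
omega = w0 + vg*(k - k0) + 0.5*a*(k - k0)**2      # quadratic dispersion omega(k)

def rms_width(t):
    """Superpose the plane waves of Eq. (2.34), then measure the packet's rms width."""
    psi = np.trapz(psi0 * np.exp(1j*(np.outer(x, k) - omega*t)), k, axis=1)
    p = np.abs(psi)**2
    p /= np.trapz(p, x)
    xm = np.trapz(x * p, x)
    return np.sqrt(np.trapz((x - xm)**2 * p, x))

for t in (0.0, 4.0, 10.0):
    delta = sigma * np.sqrt(1 + (2*a*t/sigma**2)**2)   # predicted width delta(t)
    print(t, rms_width(t), delta/2)   # rms of |psi|^2 ~ e^{-2(x/delta)^2} is delta/2
```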
The dispersive spreading of electromagnetic wave packets is a phenomenon of immense applied importance. For example, dispersive deformation is one of the main factors limiting the information load that can be pushed through fiber optical cables. (For too high loads, the wave packets constituting the bit stream through such fibers begin to overlap, thus losing their identity.) The construction of ever more sophisticated countermeasures optimizing the data capacity of optical fibers represents a major stream of applied research.
2.3.4 Electric conductivity

As a last phenomenon relating to macroscopic electrodynamics, we discuss the electric conduction properties of metallic systems.

Empirically, we know that in a metal a finite electric field will cause current flow. In its most general form, the relation between field and current assumes a form similar to that between field and polarization discussed above:

$$j_i(\mathbf x,t) = \int d^3x'\int_{-\infty}^{t}dt'\,\sigma_{ij}(\mathbf x - \mathbf x', t - t')\,E_j(\mathbf x',t'), \qquad (2.35)$$

where $\sigma = \{\sigma_{ij}\}(\mathbf x,t)$ is called the conductivity tensor. This equation states that a field may cause current flow at different spatial locations and at later times. Equally, a field may cause current flow in directions different from the field vector. (For example, in a system subject to a magnetic field in $z$-direction, an electric field in $x$-direction will cause current flow in both $x$- and $y$-direction. Why?) As with the electric susceptibility, the nonlocal space dependence of the conductivity may often be neglected, $\sigma(\mathbf x,t)\to\sigma(t)\,\delta(\mathbf x)$, or $\sigma(\mathbf q,\omega) = \sigma(\omega)$ in Fourier space. However, except for very low frequencies, the $\omega$-dependence is generally important. One calls the finite-frequency conductivity AC conductivity, where AC stands for "alternating current". For $\omega\to 0$, the conductivity crosses over to the DC conductivity, where DC stands for "directed current". In the DC limit, and for an isotropic medium, the field-current relation assumes the form of Ohm's law

$$\mathbf j = \sigma\mathbf E. \qquad (2.36)$$

In the following, we wish to understand how the phenomenon of electrical conduction can be explained from our previous considerations.

There are two different ways to specify the dynamical response of a metal to external fields: we may either postulate a current-field relation in the spirit of Ohm's law, or we may determine a dielectric function which (in a more microscopic way) describes the response of the mobile charge carriers in the system to the presence of a field. However, since that response generally implies current flow, the simultaneous specification of both a frequency dependent$^{14}$ dielectric function and a current-field relation would overdetermine the problem. In the following, we discuss the two alternatives in more detail.

$^{14}$As we shall see below, a dielectric constant does not lead to current flow.
The phenomenological route

Let us impose Eq. (2.36) as a condition extraneous to the Maxwell equations; we simply postulate that an electric field drives a current whose temporal and spatial profile is rigidly locked to the electric field. As indicated above, this postulate largely determines the dynamical response of the system to the external field, i.e. the dielectric function $\epsilon(\omega) = \epsilon$ has to be chosen constant. Substitution of this ansatz into the temporal Fourier transform of the Maxwell equation (??) obtains

$$\nabla\times\mathbf H(\mathbf x,\omega) = \frac{1}{c}\,(-i\omega\epsilon + 4\pi\sigma)\,\mathbf E(\mathbf x,\omega). \qquad (2.37)$$

For the moment, we leave this result as it is and turn to
The microscopic route

We want to describe the electromagnetic response of a metal in terms of a model dielectric function. The dielectric function (2.28) constructed above describes the response of charge carriers bound to molecular center coordinates by harmonic potentials $m\omega_i^2x^2/2$ and subject to an effective damping or friction mechanism. Now, the conduction electrons of a metal may be thought of as charge carriers whose confining potential is infinitely weak, $\omega_i = 0$, so that their motion is not tied to a reference coordinate. Still, the conduction electrons will be subject to friction mechanisms. (E.g., scattering off atomic imperfections will impede their ballistic motion through the solid.) We thus model the dielectric function of a metal as

$$\epsilon(\omega) = 1 - \frac{4\pi n_ee^2}{m\,\omega(\omega + \frac{i}{\tau})} + \frac{4\pi ne^2}{m}\sum_i\frac{f_i}{\omega_i^2 - \omega^2 - i\gamma_i\omega} \simeq \bar\epsilon - \frac{4\pi n_ee^2}{m\,\omega(\omega + \frac{i}{\tau})}, \qquad (2.38)$$
where $n_e \equiv Nf_0$ is a measure of the concentration of conduction electrons, the friction coefficient of the electrons has been denoted by $\tau^{-1}$,$^{15}$ and in the last step we noted that for frequencies well below the oscillator frequencies $\omega_i$ of the valence electrons, the frequency dependence of the second contribution to the dielectric function may be neglected; we thus lump that contribution into an effective material constant $\bar\epsilon$. Keep in mind that the free electron contribution to the dielectric function exhaustively describes the acceleration of charge carriers by the field (the onset of current), i.e. no extraneous current-field relations must be imposed.
Substitution of Eq. (2.38) into the Maxwell equation (??) obtains the result

$$\nabla\times\mathbf H(\mathbf x,\omega) = -\frac{i\omega\,\epsilon(\omega)}{c}\,\mathbf E(\mathbf x,\omega) = \frac{1}{c}\left(-i\omega\bar\epsilon + \frac{4\pi n_ee^2}{m(-i\omega + \frac{1}{\tau})}\right)\mathbf E(\mathbf x,\omega). \qquad (2.39)$$
Comparison to the phenomenological result (2.37) yields the identification

$$\sigma(\omega) = \frac{n_ee^2}{m(-i\omega + \frac{1}{\tau})}, \qquad (2.40)$$

i.e. a formula for the AC conductivity in terms of the electron density and mass, and the impurity collision rate.
Eq. (2.40) affords a very intuitive physical interpretation:

- For high frequencies, $\omega\gg\tau^{-1}$, we may approximate $\sigma(\omega)\simeq n_ee^2/(-i\omega m)$, or $-i\omega\,\mathbf j\simeq (n_ee^2/m)\,\mathbf E$. In time space, this means $\partial_t\mathbf j\propto\mathbf E$. This formula describes the ballistic acceleration of electrons by an electric field: on time scales $t\ll\tau$, which, in a Fourier sense, correspond to large frequencies $\omega\gg\tau^{-1}$, the motion of electrons is not hindered by impurity collisions and they accelerate as free particles.

- For low frequencies, $\omega\ll\tau^{-1}$, the conductivity may be approximated by a constant, $\sigma\simeq n_ee^2\tau/m$. In this regime, the motion of the electrons is impeded by repeated impurity collisions; as a result they diffuse with a constant drift induced by the electric field.
$^{15}$... alluding to the fact that the attenuation of the electrons is due to collisions off impurities, which take place at a rate denoted by $\tau^{-1}$.
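As a minimal illustration of these two regimes (our sketch; units with $n_e = e = m = 1$), one may simply evaluate Eq. (2.40):

```python
import numpy as np

def sigma_drude(w, tau=1.0):
    """AC conductivity, Eq. (2.40), in units with n_e = e = m = 1."""
    return 1.0 / (-1j * w + 1.0 / tau)

print(sigma_drude(1e-3))   # ~ tau = 1: the constant DC (Ohmic) regime
print(sigma_drude(1e3))    # ~ 1j/w: ballistic acceleration, d_t j ~ E
```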
Chapter 3
Relativistic invariance
3.1 Introduction
Consult an entry level physics textbook to refamiliarize yourself with the basic notions of
special relativity!
3.1.1 Galilei invariance and its limitations

The laws of mechanics are the same in two coordinate systems $K$ and $K'$ moving at a constant velocity relative to one another. Within classical Newtonian mechanics, the space-time coordinates $(t,\mathbf x)$ and $(t',\mathbf x')$ of two such systems are related by a Galilei transformation,

$$t' = t,\qquad \mathbf x' = \mathbf x - \mathbf vt.$$

Substituting this transformation into Newton's equations of motion as formulated in system $K$,

$$K:\qquad \frac{d^2\mathbf x_i}{dt^2} = -\nabla_{\mathbf x_i}\sum_j V_{ij}(\mathbf x_i - \mathbf x_j)$$

($\mathbf x_i$ are particle coordinates in system $K$, and $V_{ij}$ is a pair potential which, crucially, depends only on the differences between particle coordinates but not on any distinguished reference point in the universe), we find that

$$K':\qquad \frac{d^2\mathbf x'_i}{dt'^2} = -\nabla_{\mathbf x'_i}\sum_j V_{ij}(\mathbf x'_i - \mathbf x'_j),$$

i.e. the equations remain form invariant. This is what is meant when we say that classical mechanics is Galilei invariant.$^1$

$^1$More precisely, Galilei invariance implies invariance under all transformations of the Galilei group: the uniform motions above, rotations of space, and uniform translations of space and time.
Now, suppose a certain physical phenomenon in $K$ (the dynamics of a water surface, say) is effectively described by a wave equation,

$$K:\qquad \left(\Delta - \frac{1}{c^2}\,\partial_t^2\right)f(\mathbf x,t) = 0.$$

Substitution of the Galilei transformation into this equation obtains

$$K':\qquad \left(\Delta' - \frac{1}{c^2}\,\partial_{t'}^2 + \frac{2}{c^2}\,(\mathbf v\cdot\nabla')\,\partial_{t'} - \frac{1}{c^2}\,(\mathbf v\cdot\nabla')^2\right)f(\mathbf x',t') = 0,$$

i.e. an equation of modified form. At first sight this result may look worrisome. After all, many waves (water waves, sound waves, etc.) are of purely mechanical origin, which means that wave propagation ought to be a Galilei invariant phenomenon. The resolution to this problem lies with the fact that matter waves generally propagate in a host medium (water, say). While this medium is at rest in $K$, it isn't in $K'$. But a wave propagating in a non-stationary medium must be controlled by a different equation of motion, i.e. there is no a priori contradiction to the principle of Galilei invariance. Equivalently, one may observe that a wave emitted by a stationary point source in $K$ will propagate with velocity $c$ (in $K$). However, an observer in $K'$ will observe wave fronts propagating at different velocities. Mathematically, this distortion of the wave pattern is described by $K'$'s wave equation.
But what about electromagnetic waves? So far, we have never talked about a host medium supporting the propagation of electromagnetic radiation. The lack of Galilean invariance of Maxwell's equations then leaves us with three possibilities:

1. Nature is Galilean invariant, but Maxwell's equations are incorrect (or, to put it more mildly, incomplete).

2. An electromagnetic host medium (let's call it the ether) does, indeed, exist.

3. Nature is invariant under a group of space-time transformations different from Galilei. If so, Newton's equations of motion are in need of modification.

These questions became pressing in the second half of the 19th century, after Maxwell had brought the theory of electromagnetism to completion, and the exploration of all kinds of electromagnetic wave equations was a focal point of theoretical and experimental research. In view of the sensational success of Maxwell's theory, the first answer was not really considered an option.

Since Newtonian mechanics wasn't a concept to be sacrificed light-heartedly either, the existence of an ether was postulated. However, it was soon realized that the ether medium would have to have quite eccentric physical properties. Since no one had ever actually observed an ether, this medium would have to be infinitely light and fully non-interacting with matter. (Its presumed etheric properties actually gave it its name.) Equally problematic, electromagnetic waves would uniformly propagate with the speed of light, $c$, only in the distinguished ether rest frame. The ether postulate was tantamount to the abandonment of the idea of no preferential inertial frames in nature, a concept close at heart to the development of modern physics. Towards the end of the nineteenth century, experimental evidence was mounting that the vacuum velocity of light was, independent of the observation frame, given by $c$; the idea of an ether became increasingly difficult to sustain.

Figure 3.1: Measurement of Newton's law of gravitational attraction (any other law would be just as good) in two frames that are relatively inertial. The equations describing the force between masses will be the same in both frames.
3.1.2 Einstein's postulates

In essence, this summarizes the state of affairs before Einstein entered the debate. In view of the arguments outlined above, Einstein decided that 3. had to be the correct option. Specifically, he put forward two fundamental postulates:

- The postulate of relativity: This postulate is best formulated in a negative form: there is no such thing as absolute rest, an absolute velocity, or an absolute point in space. No absolute frame exists in which the laws of nature assume a distinguished form. To formulate this principle in a positive form, we define two coordinate frames to be inertial relative to each other if one is related to the other by a translation in space and/or time, a constant rotation, a uniform motion, or a combination of these operations. The postulate then states that the laws of nature will assume the same form (be expressed by identical fundamental equations) in both systems. As a corollary, this implies that physical laws must never make reference to absolute coordinates, times, angles, etc.

- The postulate of the constancy of the speed of light: The speed of light is independent of the motion of its source. What makes this postulate, which superficially might seem to be no more than an innocent tribute to experimental observation,$^2$ so revolutionary is that it abandons the notion of absolute time. For example, two events that are simultaneous in one inertial frame will in general no longer be simultaneous in other inertial frames. While, more than one hundred years later, we have grown used to its implications, the revolutionary character of Einstein's second postulate cannot be exaggerated.

$^2$... although the constancy of the speed of light had not yet been fully established experimentally when Einstein made his postulate.
3.1.3 First consequences

As we shall see below, the notion of an absolute presence, which makes perfect sense in Newtonian mechanics,$^3$ becomes meaningless in Einstein's theory of relativity. When we monitor physical processes, we must be careful to assign to each physical event its own space and time coordinate $x = (ct, \mathbf x)$ (where the factor of $c$ has been included for later convenience). An event is canonical in that it can be observed in all frames (no matter whether they are relatively inertial or not). However, both its space and time coordinates are non-canonical, i.e. when observed from a different frame $K'$ it will have coordinates $x' = (ct', \mathbf x')$, where $t'$ may be different from $t$.

When we talk of a body or a particle in relativity, what we actually mean is the continuous sequence of events $x = (ct, \mathbf x)$ defined by its instantaneous position $\mathbf x$ at time $t$ (both monitored in a specific frame). The assembly of these events obtains the world line of the body, a (directed) curve in a space-time coordinate frame (cf. the figure for a schematic of the $(1+1)$-dimensional world lines of a body in uniform motion (left) and an accelerated body (right)).

The most important characteristic of the relative motion of inertial frames is that no acceleration is involved. This implies, in particular, that a uniform motion in one frame will stay uniform in the other; straight world lines transform into straight world lines. Parameterizing a general uniform motion in system $K$ by $x = (ct, \mathbf wt + \mathbf b)$, where $\mathbf w$ is the velocity and $\mathbf b$ an offset vector, we must require that the coordinate representation $x'$ in any inertial system $K'$ is again a straight line. Now, the most general family of transformations mapping straight lines onto straight lines are the affine transformations

$$x' = \Lambda x + a, \qquad (3.1)$$

where $\Lambda: \mathbb R^{d+1}\to\mathbb R^{d+1}$ is a linear map from $(d+1)$-dimensional$^4$ space-time into itself and $a\in\mathbb R^{d+1}$ a vector describing the displacement of the origin of the coordinate systems in space and/or time. For $a = 0$, we speak of a homogeneous coordinate transformation. Since any transformation may be represented as a succession of a homogeneous transformation and a trivial translation, we will focus on the former class throughout.

$^3$Two events that occur simultaneously in one frame will remain simultaneous under Galilei transformations.

$^4$Although we will be foremost interested in the case $d = 3$, it is occasionally useful to generalize to other dimensions. Specifically, space-time diagrams are easiest to draw/imagine in $(1+1)$ dimensions.
INFO Above, we have set up criteria for two frames being relatively inertial. However, in the theory of relativity, one meets with the notion of (absolutely) inertial frames (no reference to another frame). But how can we tell whether a frame is inertial or not? And what, for that matter, does the attribute inertial mean, if no transformation is involved?

Newton approached this question by proposing the concept of an absolute space. Systems at rest in this space, or at constant motion relative to it, were inertial. For practical purposes, it was proposed that a reference system tied to the position of the fixed stars was (a good approximation of) an absolute system. Nowadays, we know that these concepts are problematic. The fixed stars aren't fixed (nor, for that matter, is any known constituent of the universe), i.e. we are not aware of a suitable anchor reference system. What is more, the declaration of an absolute frame is at odds with the ideas of relativity.

In the late nineteenth century, the above definition of inertial frames was superseded by a purely operational one. A modern formulation of the alternative definition might read:

An inertial frame of reference is one in which the motion of a particle not subject to forces is in a straight line at constant speed.

The meaning of this definition is best exposed by considering its negative: imagine an observer tied to a rotating disc. A particle thrown by that observer will not move in a straight line (relative to the disc). If the observer does not know that he/she is on a rotating frame, he/she will have to attribute the bending of the trajectory to some fictitious forces. The presence of such forces is proof that the disc system is not inertial.

However, in most physical contexts questions concerning the absolute status of a system do not arise.$^5$ Far more frequently, we care about what happens as we pass from one frame (laboratory, say) to another (that of a particle moving relatively to the lab). Thus, the truly important issue is whether systems are relatively inertial or not.

$^5$Exceptions include high precision measurements of gravitational forces.
What conditions will the transformation matrix $\Lambda$ have to fulfill in order to qualify as a transformation between inertial frames? Referring for a rigorous discussion of this question to Ref. [2], we here merely note that symmetry conditions implied by the principle of relativity nearly, but not completely, specify the class of permissible transformations. This class of transformations includes the Galilei transformations, which, as we saw, are incompatible with the physics of electromagnetic wave propagation.

At this point, Einstein's second postulate enters the game. Consider two inertial frames $K$ and $K'$ related by a homogeneous coordinate transformation. At time $t = 0$ the origins of the two systems coalesce. Let us assume, however, that $K'$ moves relatively to $K$ at constant velocity $\mathbf v$. Now consider the event: at time $t = 0$ a light source at $\mathbf x = 0$ emits a signal. In $K$, the wave fronts of the light signal then trace out world lines propagating at the vacuum speed of light, i.e. the spatial coordinates $\mathbf x$ of a wave front obey the relation

$$\mathbf x^2 - c^2t^2 = 0.$$

The crucial point now is that, in spite of its motion relative to the point of emanation of the light signal, an observer in $K'$, too, will observe a light signal moving with velocity $c$ in all directions. (Notice the difference to, say, a sound wave: from $K'$'s point of view, fronts moving along $\mathbf v$ would be slowed down, while those moving in the opposite direction would propagate at higher speed, $v'_{\rm sound} = v_{\rm sound}\mp v$.) This means that the light front coordinates in $x'$ obey the same relation,

$$\mathbf x'^2 - c^2t'^2 = 0.$$
This additional condition unambiguously fixes the set of permissible transformations. To formulate it in the language of linear coordinate transformations (which, in view of the linearity of transformations between inertial frames, is the adequate one), let us define the matrix

$$g = \{g^{\mu\nu}\} \equiv \begin{pmatrix} 1 & & & \\ & -1 & & \\ & & -1 & \\ & & & -1 \end{pmatrix}, \qquad (3.2)$$

where all non-diagonal entries are zero. The constancy of the speed of light may then be expressed in concise form as $x^Tgx = 0 \Leftrightarrow x'^Tgx' = 0$. This condition suggests to focus on coordinate transformations for which the bilinear form $x^Tgx$ is conserved,$^6$

$$x^Tgx = x'^Tgx'. \qquad (3.3)$$

Substituting $x' = \Lambda x$, we find that $\Lambda$ must obey the condition

$$\Lambda^Tg\Lambda = g. \qquad (3.4)$$

Linear coordinate transformations satisfying this condition are called Lorentz transformations. We thus postulate that

all laws of nature must be invariant under affine coordinate transformations (3.1) where the homogeneous part of the transformation obeys the Lorentz condition (3.4).

The statement above characterizes the set of legitimate transformations in implicit terms. In the next section, we aim for a more explicit description of the Lorentz transformations. These results will form the basis for our discussion of Lorentz invariant electrodynamics further down.

$^6$Notice the gap in the logic of the argument. Einstein's second postulate requires Eq. (3.3) only for those space-time vectors for which $x^Tgx$ vanishes; we now suggest to declare it a universal condition, i.e. one that holds irrespective of the value of $x^Tgx$. To show that the weaker condition implies the stronger, one may first prove (cf. Ref. [2]) that relatively straightforward symmetry considerations suffice to fix the class of allowed transformations up to one undetermined scalar constant. The value of that constant is then fixed by the weak condition above. Once it has been fixed, one finds that Eq. (3.3) holds in general.
3.2 The mathematics of special relativity I: Lorentz group

In this section, we will develop the mathematical framework required to describe transformations between inertial frames. We will show that these transformations form a continuous group, the Lorentz group.
Figure 3.2: Schematic of the decomposition of space-time into a time-like region and a space-like region. The two regions are separated by a light cone of light-like vectors (a cone because in space dimensions $d > 1$ it acquires a conical structure). The forward/backward part of the light cone extends to positive/negative times.
3.2.1 Background

The bilinear form introduced above defines a scalar product on $\mathbb R^4$:

$$g: \mathbb R^4\times\mathbb R^4\to\mathbb R, \qquad (3.5)$$
$$(x,y)\mapsto x^Tgy \equiv x^\mu g_{\mu\nu}y^\nu. \qquad (3.6)$$

A crucial feature of this scalar product is that it is not positive definite. For reasons to be discussed below, we call vectors whose norm is positive, $x^Tgx = (ct)^2 - \mathbf x^2 > 0$, time-like vectors. Vectors with negative norm will be called space-like, and vectors of vanishing norm light-like.

We are interested in Lorentz transformations, i.e. linear transformations $\Lambda: \mathbb R^4\to\mathbb R^4$, $x\mapsto\Lambda x$, that leave the scalar product $g$ invariant, $\Lambda^Tg\Lambda = g$. (Similarly to, say, the orthogonal transformations, which leave the standard scalar product invariant.) Before exploring the structure of these transformations in detail, let us observe a number of general properties. First notice that the set of Lorentz transformations forms a group, the Lorentz group $L$: if $\Lambda$ and $\Lambda'$ are Lorentz transformations, so is the composition $\Lambda\Lambda'$. The identity transformation obviously obeys the Lorentz condition. Finally, the equation $\Lambda^Tg\Lambda = g$ implies that $\Lambda$ is non-singular (why?), i.e. it possesses an inverse, $\Lambda^{-1}$. We have thus shown that the Lorentz transformations form a group.
Global considerations on the Lorentz group

What more can be said about this group? Taking the determinant of the invariance relation, we find that $\det(\Lambda^Tg\Lambda) = \det(\Lambda)^2\det(g) = \det(g)$, i.e. $\det(\Lambda)^2 = 1$, which means that $\det(\Lambda)\in\{1,-1\}$. ($\Lambda$ is a real matrix, i.e. $\det(\Lambda)\in\mathbb R$.) We may further observe that the absolute value of the component $\Lambda_{00}$ is always larger than or equal to unity. This is shown by inspection of the 00-element of the invariance relation: $1 = g_{00} = (\Lambda^Tg\Lambda)_{00} = \Lambda_{00}^2 - \sum_i\Lambda_{i0}^2$. Since the subtracted term is positive (or zero), we have $|\Lambda_{00}|\geq 1$. We thus conclude that $L$ contains four components defined by

$$L^\uparrow_+:\ \det\Lambda = +1,\ \Lambda_{00}\geq +1, \qquad L^\downarrow_+:\ \det\Lambda = +1,\ \Lambda_{00}\leq -1,$$
$$L^\uparrow_-:\ \det\Lambda = -1,\ \Lambda_{00}\geq +1, \qquad L^\downarrow_-:\ \det\Lambda = -1,\ \Lambda_{00}\leq -1. \qquad (3.7)$$

Transformations belonging to any one of these subsets cannot be continuously deformed into a transformation of a different subset (why?), i.e. the subsets are truly disjoint. In the following, we introduce a few distinguished transformations belonging to the different components of the Lorentz group.

Space reflection, or parity, $P: (ct,\mathbf x)\to(ct,-\mathbf x)$, is a Lorentz transformation. It belongs to the component $L^\uparrow_-$. Time inversion, $T: (ct,\mathbf x)\to(-ct,\mathbf x)$, belongs to the component $L^\downarrow_-$. The product of space reflection and time inversion, $PT: (ct,\mathbf x)\to(-ct,-\mathbf x)$, belongs to the component $L^\downarrow_+$. Generally, the product of two transformations belonging to a given component no longer belongs to that component, i.e. the subsets do not form subgroups of the Lorentz group. However, the component $L^\uparrow_+$ is exceptional in that it is a subgroup. It is called the proper orthochronous Lorentz group, or restricted Lorentz group. The attribute "proper" indicates that it does not contain exceptional transformations (such as parity and time inversion) which cannot be continuously deformed back to the identity transformation. It is called "orthochronous" because it maps the positive light cone into itself, i.e. it respects the direction of time. To see that $L^\uparrow_+$ is a group, notice that it contains the identity transformation. Further, its elements can be continuously contracted to unity; if this property holds for two elements $\Lambda,\Lambda'\in L^\uparrow_+$, then it also holds for the product $\Lambda\Lambda'$, i.e. the product is again in $L^\uparrow_+$.
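These statements are easy to probe numerically. The following sketch (ours; the classification helper is ad hoc) verifies the defining condition (3.4) and sorts a few simple transformations into the four components (3.7):

```python
import numpy as np

g = np.diag([1.0, -1.0, -1.0, -1.0])

def component(L):
    """Classify a Lorentz transformation into the four components of Eq. (3.7)."""
    assert np.allclose(L.T @ g @ L, g)      # defining condition (3.4)
    sign = "+" if np.linalg.det(L) > 0 else "-"
    arrow = "up" if L[0, 0] >= 1 else "down"
    return sign + arrow

P = np.diag([1.0, -1.0, -1.0, -1.0])        # parity
T = np.diag([-1.0, 1.0, 1.0, 1.0])          # time inversion
print([component(M) for M in (np.eye(4), P, T, P @ T)])
# -> ['+up', '-up', '-down', '+down']
```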
The restricted Lorentz group

The restricted Lorentz group contains those transformations which we foremost associate with coordinate transformations between inertial frames. For example, rotations of space,

$$\Lambda_R = \begin{pmatrix} 1 & \\ & R \end{pmatrix},$$

where $R$ is a three-dimensional rotation matrix, trivially fall into this subgroup. However, it also contains the second large family of homogeneous transformations between inertial frames, uniform motions.
3.2. THE MATHEMATICS OF SPECIAL RELATIVITY I: LORENTZ GROUP 69
x
3
x
3
x
2
x
2
x
1
x
1
x

3
x

3
x

2
x

2
x

1
x

1
vt
vt
K K

K
Consider two inertial frames $K$ and $K'$ whose origins at time $t = 0$ coalesce (which means that the affine coordinate transformation (3.1) will be homogeneous, $a = 0$). We assume that $K'$ moves at constant velocity $\mathbf v$ relative to $K$. Lorentz transformations of this type, $\Lambda_v$, are called special Lorentz transformations. Physically, they describe what is sometimes called a Lorentz boost, i.e. the passage from a stationary to a moving frame. (Although we are talking about a mere coordinate transformation, i.e. no acceleration process is described.) We wish to explore the mathematical and physical properties of these transformations.

Without loss of generality, we assume that the vector $\mathbf v$ describing the motion is parallel to the $\mathbf e_1$ direction of $K$ (and hence to the $\mathbf e'_1$ direction of $K'$): any general velocity vector $\mathbf v$ may be rotated to $\mathbf v\parallel\mathbf e_1$ by a space-like rotation $R$. This means that a generic special Lorentz transformation $\Lambda_v$ may be obtained from the prototypical one, $\Lambda_{v\mathbf e_1}$, by a spatial rotation, $\Lambda_v = \Lambda_R\Lambda_{v\mathbf e_1}\Lambda_R^{-1}$ (cf. the figure).

A special transformation in $\mathbf e_1$ direction will leave the coordinates $x_2 = x'_2$ and $x_3 = x'_3$ unchanged.$^7$ The sought-for transformation thus assumes the form

$$\Lambda_{v\mathbf e_1} = \begin{pmatrix} A_v & & \\ & 1 & \\ & & 1 \end{pmatrix},$$

where the $2\times 2$ matrix $A_v$ operates in $(ct, x_1)$-space.
There are many ways to determine the matrix $A_v$. We here use a group theory inspired method which may come across as somewhat abstract, but has the advantage of straightforward applicability to many other physical problems. Being an element of a continuous group of transformations (a Lie group), the matrix $A_v = \exp(\sum_i\theta_iT_i)$ can be written in an exponential parameterization. Here, the $\theta_i$ are real parameters and the $T_i$ two-dimensional fixed generator matrices. (Cf. with the conventional rotation group; the $T_i$'s play the role of angular momentum generator matrices, and the $\theta_i$'s are generalized real-valued angles.) To determine the group generators, we consider the transformation $A_v$ for infinitesimal angles $\theta_i$. Substituting the matrix $A_v$ into the defining equation of the Lorentz group, using that the restriction of the metric to $(ct, x_1)$ space assumes the form $\sigma_3 \equiv \mathrm{diag}(1,-1)$, and expanding to first order in $\theta_i$, we obtain

$$A_v^T\sigma_3A_v \simeq \Big(1 + \sum_i\theta_iT_i^T\Big)\,\sigma_3\,\Big(1 + \sum_i\theta_iT_i\Big) \simeq \sigma_3 + \sum_i\theta_i\big(T_i^T\sigma_3 + \sigma_3T_i\big) \stackrel{!}{=} \sigma_3.$$

$^7$To actually prove this, notice that from the point of view of an observer in $K'$ we consider a special transformation with velocity $-v\mathbf e'_1$. Then apply symmetry arguments (notably the condition of parity invariance: a mirrored universe should not differ in an observable manner in its transformation behaviour from ours) to show that any change in these coordinates would lead to a contradiction.
This relation must hold regardless of the value of $\theta_i$, i.e. the matrices $T_i$ must obey the condition $T_i^T\sigma_3 = -\sigma_3T_i$. Up to normalization, there is only one matrix that meets this condition, viz.

$$T = -\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$

The most general form of the matrix $A_v$ is thus given by

$$A_v = \exp\left(-\theta\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\right) = \begin{pmatrix} \cosh\theta & -\sinh\theta \\ -\sinh\theta & \cosh\theta \end{pmatrix},$$

where we denoted the single remaining parameter by $\theta$ and in the second equality expanded the exponential in a power series. We now must determine the parameter $\theta = \theta(v)$ in such a way that it actually describes our $v$-dependent transformation. We first substitute $A_v$ into the transformation law $x' = \Lambda_vx$, or

$$\begin{pmatrix} x^{0\prime} \\ x^{1\prime} \end{pmatrix} = A_v\begin{pmatrix} x^0 \\ x^1 \end{pmatrix}.$$

Now, the origin of $K'$ (having coordinate $x^{1\prime} = 0$) is at coordinate $x^1 = vt$. Substituting this condition into the equation above, we obtain

$$\tanh\theta = v/c.$$

Using this result and introducing the (standard) notation

$$\beta = |v|/c, \qquad \gamma = (1-\beta^2)^{-1/2}, \qquad (3.8)$$

the special Lorentz transformation assumes the form

$$\Lambda_{v\mathbf e_1} = \begin{pmatrix} \gamma & -\gamma\beta & & \\ -\gamma\beta & \gamma & & \\ & & 1 & \\ & & & 1 \end{pmatrix}. \qquad (3.9)$$
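A short numerical check (ours; $\beta = 0.6$ chosen arbitrarily) confirms both the Lorentz condition (3.4) and the equivalence of Eq. (3.9) with the exponential parameterization derived above:

```python
import numpy as np
from scipy.linalg import expm

g = np.diag([1.0, -1.0, -1.0, -1.0])
beta = 0.6
gamma = 1.0 / np.sqrt(1.0 - beta**2)

L = np.eye(4)                                   # boost along e_1, Eq. (3.9)
L[:2, :2] = [[gamma, -gamma*beta], [-gamma*beta, gamma]]
print(np.allclose(L.T @ g @ L, g))              # Lorentz condition (3.4): True

theta = np.arctanh(beta)                        # rapidity, tanh(theta) = v/c
A = expm(-theta * np.array([[0.0, 1.0], [1.0, 0.0]]))
print(np.allclose(A, L[:2, :2]))                # matches A_v = exp(-theta sigma_1): True
```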
EXERCISE Repeat the argument above for the full transformation group, i.e. not just the set of special Lorentz transformations along a certain axis. To this end, introduce an exponential representation $\Lambda = \exp(\sum_i\theta_iT_i)$ for general elements of the restricted Lorentz group. Use the defining condition of the group to identify six linearly independent matrix generators.$^8$ Among the matrices $T_i$, identify three as the generators $J_i$ of the spatial rotation group, $T_i = J_i$. Three others generate special Lorentz transformations along the three coordinate axes of $K$. (Linear combinations of these generators can be used to describe arbitrary special Lorentz transformations.)

A few more remarks on the transformations of the restricted Lorentz group:

$^8$Matrices are elements of a certain vector space (the space of linear transformations of a given vector space). A set of matrices $A_1,\ldots,A_k$ is linearly independent if no nontrivial linear combination $\sum_ix_iA_i = 0$ exists.
- In the limit of small $\beta$, the transformation $\Lambda_{v\mathbf e_1}$ asymptotes to the Galilei transformation, $t' = t$ and $x'_1 = x_1 - vt$.

- We repeat that a general special Lorentz transformation can be obtained from the transformation along $\mathbf e_1$ by a space-like rotation, $\Lambda_v = \Lambda_R\Lambda_{v\mathbf e_1}\Lambda_R^{-1}$, where $R$ is a rotation mapping $\mathbf e_1$ onto the unit vector in $\mathbf v$-direction.

- Without proof (however, the proof is by and large implied by the exercise above) we mention that a general restricted Lorentz transformation can always be represented as $\Lambda = \Lambda_R\Lambda_v$, i.e. as the product of a rotation and a special Lorentz transformation.
3.3 Aspects of relativistic dynamics

In this section we explore a few of the physical consequences deriving from Einstein's postulates. Our discussion will be embarrassingly superficial; key topics relating to the theory of relativity aren't mentioned at all. Rather, the prime objective of this section will be to provide the core background material required to discuss the relativistic invariance of electrodynamics below.

3.3.1 Proper time
Consider the world line of a particle moving in space in a (not necessarily uniform) manner. At any instance of time $t$, the particle will be at rest in an inertial system $K'$ whose origin coincides with $\mathbf x(t)$ and whose instantaneous velocity w.r.t. $K$ is given by $d_t\mathbf x(t)$. At this point, it is important to avoid confusion: of course, the coordinate system whose origin globally coincides with $\mathbf x(t)$ is not inertial w.r.t. $K$ (unless the motion is uniform). Put differently, the origin of the inertial system $K'$ defined by the instantaneous position $\mathbf x(t)$ and the instantaneous velocity $d_t\mathbf x(t)$ coincides with the particle's position only at the very instant $t$. The $t'$ axis of $K'$ is parallel to the tangent vector of the world line at $t$. Also, we know that the $t'$ axis cannot be tilted w.r.t. the $t$ axis by more than an angle $\pi/4$, i.e. it must stay within the instantaneous light cone attached to the reference coordinate $\mathbf x$.

Now, imagine an observer traveling along the world line $\mathbf x(t)$ and carrying a watch. At time $t$ (corresponding to time zero in the instantaneous rest frame) the watch is reset. We wish to identify the coordinates of the event "watch shows (infinitesimal) time $d\tau$" in both $K$ and $K'$. In $K'$, the answer to this question is formulated easily enough: the event has coordinates $(c\,d\tau, 0)$, where we noted that, due to the orthogonality of the space-like coordinate axes to the world line, the watch will remain at coordinate $\mathbf x' = 0$. To identify the $K$ coordinates, we first note that the origin of $K'$ is translated w.r.t. that of $K$ by $a = (ct, \mathbf x(t))$.
Thus, the coordinate transformation between the two frames assumes the general affine form $x = \Lambda x' + a$. In $K$, the event takes place at some (as yet undetermined) time $dt$ later than $t$. Its spatial coordinates will be $\mathbf x(t) + \mathbf v\,dt$, where $\mathbf v = d_t\mathbf x$ is the instantaneous velocity of the moving observer. We thus have the identification $(c\,d\tau, 0)\leftrightarrow(c(t + dt), \mathbf x + \mathbf v\,dt)$. Comparing this with the general form of the coordinate transformation, we realize that $(c\,d\tau, 0)$ and $(c\,dt, \mathbf v\,dt)$ are related by a homogeneous Lorentz transformation. This implies, in particular, that $(c\,d\tau)^2 = (c\,dt)^2 - (\mathbf v\,dt)^2$, or

$$d\tau = dt\,\sqrt{1 - \mathbf v^2/c^2}. \qquad (3.10)$$

The time between events of finite duration may be obtained by integration,

$$\tau = \int dt\,\sqrt{1 - \mathbf v^2/c^2}.$$

This equation expresses the famous phenomenon of the relativity of time: for a moving observer, time proceeds more slowly than for an observer at rest. The time measured by the propagating watch, $\tau$, is called proper time. The attribute "proper" indicates that $\tau$ is defined with reference to a coordinate system (the instantaneous rest frame) that can be canonically defined. I.e. while the time $t$ associated to a given point on the world line $x(t) = (ct, \mathbf x(t))$ will change from system to system (as we just saw), the proper time remains the same; the proper time can be used to parameterize the space-time curve in an invariant manner as $x(\tau) = (ct(\tau), \mathbf x(\tau))$.
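For a concrete feeling for time dilation, the following sketch (ours; an arbitrary oscillating velocity profile, units with $c = 1$) integrates Eq. (3.10) along a world line:

```python
import numpy as np

c = 1.0
t = np.linspace(0.0, 10.0, 100_000)
v = 0.8 * c * np.sin(2 * np.pi * t / 10.0)    # an oscillating, non-uniform motion

# proper time: tau = int dt sqrt(1 - v^2/c^2), Eq. (3.10)
tau = np.trapz(np.sqrt(1.0 - (v / c)**2), t)
print(tau)    # < 10: the traveling watch lags behind the lab clock
```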
3.3.2 Relativistic mechanics

Relativistic energy momentum relation

In section 3.1.2 we argued that Einstein solved the puzzle posed by the Lorentz invariance of electrodynamics by postulating that Newtonian mechanics was in need of generalization. We will begin our discussion of the physical ramifications of Lorentz invariance by developing this extended picture. Again, we start from postulates motivated by physical reasoning:

- The extended variant of Newtonian mechanics (let's call it relativistic mechanics) ought to be invariant under Lorentz transformations, i.e. the generalization of Newton's equations should assume the same form in all inertial frames.

- In the rest frame of a particle subject to mechanical forces, or in frames moving at low velocity $v/c\ll 1$ relative to the rest frame, the equations must asymptote to Newton's equations.

Newton's equations in the rest frame $K$ are of course given by

$$m\,\frac{d^2}{d\tau^2}\,\mathbf x = \mathbf f,$$

where $m$ is the particle's mass in its rest frame (as we shall see, the mass is not an invariant quantity; we had thus better speak of a particle's rest mass), and $\mathbf f$ is the force. As always in
relativity, a spatial vector (such as $\mathbf x$ or $\mathbf f$) must be interpreted as the space-like component of a four-vector. The zeroth component of $x$ is given by $x^0 = c\tau$, so that the rest frame generalization of the Newton equation reads as $m\,d_\tau^2x^\mu = f^\mu$, where $f = (0, \mathbf f)$ and we noted that for the zeroth component $m\,d_\tau^2x^0 = 0$, so that $f^0 = 0$. It is useful to define the four-momentum of the particle by $p^\mu = m\,d_\tau x^\mu$. (Notice that both $\tau$ and the rest mass $m$ are invariant quantities: they are not affected by Lorentz transformations. The transformation behaviour of $p$ is determined by that of $x$.) The zeroth component of the momentum, $m\,d_\tau x^0$, carries the dimension [energy]/[velocity]. This suggests to define

$$p = \begin{pmatrix} E/c \\ \mathbf p \end{pmatrix},$$

where $E$ is some characteristic energy. (In the instantaneous rest frame of the particle, $E = mc^2$ and $\mathbf p = 0$.)
Expressed in terms of the four-momentum, the Newton equation assumes the simple form

$$d_\tau p = f. \qquad (3.11)$$

Let us explore what happens to the four-momentum as we boost the particle from its rest frame to a frame moving with velocity vector $\mathbf v = v\mathbf e_1$. Qualitatively, one will expect that in the moving frame the particle carries a finite space-like momentum (which will be proportional to the velocity of the boost, $v$). We should also expect a renormalization of its energy (from the point of view of the moving frame, the particle has acquired kinetic energy). Focusing on the 0 and 1 components of the momentum (the others won't change), Eq. (3.9) implies

$$\begin{pmatrix} mc^2/c \\ 0 \end{pmatrix} \longrightarrow \begin{pmatrix} E'/c \\ p'_1 \end{pmatrix} = \begin{pmatrix} \gamma mc^2/c \\ (\gamma m)v \end{pmatrix}.$$
This equation may be interpreted in two different ways. Substituting the explicit formula $E = mc^2$, we find $E' = (\gamma m)c^2$ and $p'_1 = (\gamma m)v$. In Newtonian mechanics, we would have expected that a particle at rest in $K$ carries momentum $p'_1 = mv$ in $K'$. The difference to the relativistic result is the renormalization of the mass factor: a particle that has mass $m$ in its rest frame appears to carry a velocity dependent mass

$$m(v) = \gamma m = \frac{m}{\sqrt{1 - (v/c)^2}}$$

in $K'$. For $v\to c$, it becomes infinitely heavy and its further acceleration will become progressively more difficult:

Particles of finite rest frame mass cannot move faster than with the speed of light.
The energy in $K'$ is given by $(\gamma m)c^2$ and is again affected by mass renormalization. However, a more useful representation is obtained by noting that $p^Tgp$ is a conserved quantity. In $K$, $p^Tgp = (E/c)^2 = (mc)^2$. Comparing with the bilinear form in $K'$, we find $(mc)^2 = (E'/c)^2 - \mathbf p'^2$, or

$$E' = \sqrt{(mc^2)^2 + (\mathbf p'c)^2}. \qquad (3.12)$$

This relativistic energy-momentum relation determines the relation between energy and momentum of a free particle.

Its most important implication is that even a free particle at rest carries energy $E = mc^2$, Einstein's world-famous result. However, this energy is usually not observable (unless it gets released in nuclear processes of mass fragmentation, with all the known consequences). What we do observe in daily life are the changes in energy as we observe the particle in different frames. This suggests to define the kinetic energy of a particle by

$$T \equiv E - mc^2 = \sqrt{(mc^2)^2 + (pc)^2} - mc^2.$$

For velocities $v\ll c$, the kinetic energy $T\simeq p^2/2m$ indeed reduces to the familiar non-relativistic result. Deviations from this relation become sizeable at velocities comparable to the speed of light.
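The crossover between the two regimes is easily visualized numerically. The sketch below (ours; using the electron rest energy $mc^2\simeq 0.511\,$MeV as an illustrative scale) compares the relativistic kinetic energy with its non-relativistic limit:

```python
import numpy as np

mc2 = 0.511   # electron rest energy in MeV

def T_rel(pc):
    """Relativistic kinetic energy, T = sqrt((mc^2)^2 + (pc)^2) - mc^2."""
    return np.sqrt(mc2**2 + pc**2) - mc2

def T_newton(pc):
    """Non-relativistic limit, T = p^2/(2m) = (pc)^2/(2 mc^2)."""
    return pc**2 / (2 * mc2)

for pc in (0.01, 0.1, 1.0):                # momenta in MeV/c
    print(pc, T_rel(pc), T_newton(pc))     # agreement for pc << mc^2, deviation near pc ~ mc^2
```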
Particle subject to a Lorentz force

We next turn back to the discussion of the relativistically invariant Newton equation $d_\tau p^\mu = f^\mu$. Both $p^\mu$ and the components of the force, $f^\mu$, transform as vectors under Lorentz transformations. Specifically, we wish to explore this transformation behaviour on an example of obvious relevance to electrodynamics: the Newton equation of a particle of velocity $\mathbf v$ subject to an electric and a magnetic field. Unlike in much of our previous discussion, our observer frame $K$ is not the rest frame of the particle. The four-momentum of the particle is thus given by $p = \gamma(mc, m\mathbf v)$.

What we take for granted is the Lorentz force relation,

$$\frac{d}{dt}\,\mathbf p = q\,(\mathbf E + (\mathbf v/c)\times\mathbf B).$$

In view of our discussion above, it is important to appreciate the meaning of the non-invariant constituents of this equation. Specifically, $t$ is the time in the observer frame, and $\mathbf p$ is the space-like component of the actual four-momentum. This quantity is different from $m\mathbf v$; rather, it is given by the product (observed mass) $\times$ (observed velocity), where the observed mass $\gamma m$ differs from the rest mass, as discussed above.

We now want to bring this equation into the form of an invariant Newton equation (3.11). Noting that (cf. Eq. (3.10)) $\frac{d}{dt} = \gamma^{-1}\frac{d}{d\tau}$, we obtain

$$\frac{d}{d\tau}\,\mathbf p = \gamma\,\mathbf f,$$

where the force acting on the particle is given by $\mathbf f = q(\mathbf E + (\mathbf v/c)\times\mathbf B)$.
To complete our derivation of the Newton equation in $K$, we need to identify the zeroth component of the force, $f^0$. The zeroth component of the left hand side of the Newton equation is given by $(\gamma/c)\,d_tE$, i.e. $(\gamma/c)\times$ the rate at which the energy of the particle changes. However, we have seen before that the rate of potential energy change is given by $d_tU = -\mathbf f\cdot\mathbf v = -q\mathbf E\cdot\mathbf v$. Energy conservation implies that this energy gets converted into kinetic energy, $d_tT = d_tE = +q\mathbf E\cdot\mathbf v$. This implies the identification $f^0 = (\gamma q/c)\,\mathbf E\cdot\mathbf v$. The generalized form of Newton's equations is thus given by

$$m\,d_\tau\begin{pmatrix} \gamma c \\ \gamma\mathbf v \end{pmatrix} = \frac{q}{c}\begin{pmatrix} \mathbf E\cdot(\gamma\mathbf v) \\ \gamma c\,\mathbf E + (\gamma\mathbf v)\times\mathbf B \end{pmatrix}.$$

The form of these equations suggests to introduce the four-velocity vector

$$v \equiv \gamma\begin{pmatrix} c \\ \mathbf v \end{pmatrix}$$

(in terms of which the four-momentum is given by $p = mv$, i.e. by multiplication by the rest mass). The equations of motion can then be expressed as

$$m\,d_\tau v^\mu = \frac{q}{c}\,F^{\mu\nu}v_\nu, \qquad (3.13)$$

where we used the index raising and lowering convention originally introduced in chapter ??, i.e. $v_0 = v^0$ and $v_i = -v^i$, and the matrix $F$ is defined by

$$F = \{F^{\mu\nu}\} = \begin{pmatrix} 0 & -E_1 & -E_2 & -E_3 \\ E_1 & 0 & -B_3 & B_2 \\ E_2 & B_3 & 0 & -B_1 \\ E_3 & -B_2 & B_1 & 0 \end{pmatrix}. \qquad (3.14)$$

The significance of Eq. (3.13) goes much beyond that of a mere reformulation of Newton's equations: the electromagnetic field enters the equation through a matrix, the so-called field strength tensor. This signals that the transformation behaviour of the electromagnetic field will be different from that of vectors. Throughout much of the course, we treated the electric field as if it were a (space-like) vector. Naively, one might have expected that in the theory of relativity this field gets augmented by a fourth component to become a four-vector. However, a moment's thought shows that this picture cannot be correct. Consider, say, a charged particle at rest in $K$. This particle will create a Coulomb electric field. However, in a frame $K'$ moving relative to $K$, the charge will be in motion. Thus, an observer in $K'$ will see an electric current plus the corresponding magnetic field. This tells us that under inertial transformations, electric and magnetic fields get transformed into one another. We thus need to find a relativistically invariant object accommodating the six components of the electric and magnetic field vectors. Eq. (3.14) provides a tentative answer to that problem. The idea is that the fields enter the theory as a matrix and, therefore, transform in a way fundamentally different from that of vectors.
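The bookkeeping of Eqs. (3.13) and (3.14) can be verified numerically. In the following sketch (ours; arbitrary illustrative field values, units with $q = c = m = 1$) we contract the field strength tensor with the lowered four-velocity and recover the Lorentz force components used above:

```python
import numpy as np

def field_tensor(E, B):
    """Field strength tensor F^{mu nu} of Eq. (3.14)."""
    E1, E2, E3 = E
    B1, B2, B3 = B
    return np.array([[0.0, -E1, -E2, -E3],
                     [E1,  0.0, -B3,  B2],
                     [E2,  B3,  0.0, -B1],
                     [E3, -B2,  B1,  0.0]])

g = np.diag([1.0, -1.0, -1.0, -1.0])     # metric, used to lower the index of v
E = np.array([0.2, 0.0, 0.1])
B = np.array([0.0, 0.0, 0.3])
v3 = np.array([0.5, 0.0, 0.0])           # particle velocity in K (c = 1)
gamma = 1.0 / np.sqrt(1.0 - v3 @ v3)
v = gamma * np.concatenate(([1.0], v3))  # four-velocity v^mu = gamma*(c, v)

rhs = field_tensor(E, B) @ g @ v         # F^{mu nu} v_nu  (with q/c = 1)

print(rhs[1:], gamma * (E + np.cross(v3, B)))  # spatial part: gamma*q*(E + v x B/c)
print(rhs[0],  gamma * (E @ v3))               # zeroth part: (gamma*q/c) E.v
```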
3.4 Mathematics of special relativity II: Co- and Contravariance

So far, we have focused on the Lorentz transformation behaviour of vectors in $\mathbb R^4$. However, our observations made at the end of the previous section suggest to include other objects (such as matrices) into our discussion. We will begin by providing some essential mathematical background and then, finally, turn to the discussion of the Lorentz invariance of electrodynamics.

3.4.1 Covariant and contravariant vectors

Generalities

Let $x = \{x^\mu\}$ be a four component object that transforms under homogeneous Lorentz transformations as $x\to\Lambda x$, or $x^\mu\to\Lambda^\mu{}_\nu x^\nu$ in components. (By $\Lambda^\mu{}_\nu$ we denote the components of the Lorentz transformation matrix. For the up/down arrangement of indices, see the discussion below.) Quantities of this transformation behaviour are called contravariant vectors (or contravariant tensors of first rank). Now, let us define another four-component object, $x_\mu \equiv g_{\mu\nu}x^\nu$. Under a Lorentz transformation, $x_\mu\to(\Lambda^{T,-1})_\mu{}^\nu x_\nu$. Quantities transforming in this way will be denoted as covariant vectors (or covariant tensors of first rank). Defining $g^{\mu\nu}$ to be the inverse of the Lorentz metric (in a basis where $g$ is diagonal, $g = g^{-1}$), the contravariant ancestor of $x_\mu$ may be obtained back by raising the indices as $x^\mu = g^{\mu\nu}x_\nu$.

Critical readers may find these definitions unsatisfactory. Formulated in a given basis, one may actually wonder whether they are definitions at all. For a discussion of the basis invariant meaning behind the notions of co- and contravariance, we refer to the info block below. However, we here proceed in a pragmatic way and continue to explore the consequences of the definitions (which, in fact, they are) above.
INFO Consider a general real vector space $V$. Recall that the dual space $V^*$ is the linear space of all mappings $\tilde v: V \to \mathbb{R}$, $v \mapsto \tilde v(v)$. (This space is a vector space by itself, which is why we denote its elements by vector-like symbols, $\tilde v$.) For a given basis $\{e_\mu\}$ of $V$, a dual basis $\{e^\mu\}$ of $V^*$ may be defined by the condition $e^\mu(e_\nu) = \delta^\mu_\nu$. With the expansions $v = v^\mu e_\mu$ and $\tilde v = \tilde v_\mu e^\mu$, respectively, we have $\tilde v(v) = \tilde v_\mu v^\mu$. (Notice that components of objects in dual space will be indexed by subscripts throughout.) In the literature of relativity, elements of the vector space (then to be identified with spacetime, see below) are usually called contravariant vectors, while their partners in dual space are called covariant vectors.
EXERCISE Let $A: V \to V$ be a linear map defining a basis change in $V$, i.e. $e_\mu = A_\mu{}^\nu e'_\nu$, where $A_\mu{}^\nu$ are the matrix elements of $A$. Show that:
The components of a contravariant vector $v$ transform as $v^\mu \to v'^\mu = (A^T)^\mu{}_\nu v^\nu$.
The components of a covariant vector $\tilde w$ transform as $\tilde w_\mu \to \tilde w'_\mu = (A^{-1})_\mu{}^\nu \tilde w_\nu$.
The action of $\tilde w$ on $v$ remains invariant, i.e. $\tilde w(v) = \tilde w_\mu v^\mu = \tilde w'_\mu v'^\mu$ does not change. (Of course, the action of the linear map $\tilde w$ on vectors must not depend on the choice of a particular basis.)
In physics it is widespread (if somewhat tautological) practice to define co- or contravariant vectors by their transformation behaviour. For example, a set of $d$ components $w_\mu$ is denoted a covariant vector if these components change under linear transformations as $w_\mu \to (A^{-1})_\mu{}^\nu w_\nu$.
Now, let us assume that $V$ is a vector space with scalar product $g: V\times V \to \mathbb{R}$, $(v, w) \mapsto v^T g w \equiv \langle v, w\rangle$. A special feature of vector spaces with scalar product is the existence of a canonical mapping $V \to V^*$, $v \mapsto \tilde v$, i.e. a mapping that to all vectors $v$ canonically (without reference to a specific basis) assigns a dual element $\tilde v$. The vector $\tilde v$ is implicitly defined by the condition $\forall w \in V:\ \tilde v(w) \stackrel{!}{=} \langle v, w\rangle$. For a given basis $\{e_\mu\}$ of $V$, the components of $\tilde v = \tilde v_\mu e^\mu$ may be obtained as $\tilde v_\mu = \tilde v(e_\mu) = \langle v, e_\mu\rangle = v^\nu \langle e_\nu, e_\mu\rangle = g_{\mu\nu} v^\nu$, where in the last step we used the symmetry of $g$. With the so-called index lowering convention
$$ v_\mu \equiv g_{\mu\nu} v^\nu, \qquad (3.15) $$
we are led to the identification $\tilde v_\mu = v_\mu$ and $\tilde v(w) = v_\mu w^\mu$.
Before carrying on, let us introduce some more notation: by $g^{\mu\nu} \equiv (g^{-1})^{\mu\nu}$ we denote the components of the inverse of the metric (which, in the case of the Lorentz metric in its standard diagonal representation, equals the metric, but that's an exception.) Then $g^{\mu\nu} g_{\nu\rho} = \delta^\mu_\rho$, where $\delta^\mu_\rho$ is the standard Kronecker symbol. With this definition, we may introduce an index raising convention as $v^\mu = g^{\mu\nu} v_\nu$. Indeed, $g^{\mu\nu} v_\nu = g^{\mu\nu} g_{\nu\rho} v^\rho = v^\mu$. (Notice that indices that are summed over, or contracted, always appear crosswise, one upper and one lower index.)
Consider a map $A: V \to V$, $v \mapsto Av$. In general, the action of the canonical covariant vector $\widetilde{Av}$ on transformed vectors $Aw$ need not equal the original value $\tilde v(w)$. However, it is natural to focus on those transformations that are compatible with the canonical assignment, i.e. $\widetilde{Av}(Aw) = \tilde v(w)$. In a component language, this condition translates to $(A^T)_\mu{}^\rho\, g_{\rho\sigma}\, A^\sigma{}_\nu = g_{\mu\nu}$, or $A^T g A = g$. I.e. transformations compatible with the canonical identification (vector space) $\leftrightarrow$ (dual space) have to respect the metric. E.g., in the case of the Lorentz metric, the good transformations belong to the Lorentz group.
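To see these statements at work, here is a small numerical sketch (our addition, with arbitrarily chosen components): a boost matrix satisfies $\Lambda^T g \Lambda = g$, and the contraction of a lowered with a raised index is unchanged under the transformation.

```python
import numpy as np

g = np.diag([1.0, -1.0, -1.0, -1.0])          # Lorentz metric
beta = 0.6
gamma = 1.0 / np.sqrt(1.0 - beta**2)
L = np.array([[gamma, -gamma*beta, 0, 0],     # boost along the 1-direction
              [-gamma*beta, gamma, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1]])
assert np.allclose(L.T @ g @ L, g)            # the boost respects the metric

v = np.array([2.0, 1.0, 0.5, -0.3])           # contravariant components v^mu
w = np.array([1.0, -2.0, 0.0, 4.0])
s = (g @ v) @ w                               # contraction v_mu w^mu

v2, w2 = L @ v, L @ w                         # transformed contravariant components
assert np.isclose((g @ v2) @ w2, s)           # the contraction is a Lorentz scalar
```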
Summarizing, we have seen that the invariant meaning behind contra- and covariant vectors is that of vectors and dual vectors, respectively. Under Lorentz transformations these objects behave as is required by the component definitions given in the main text. A general tensor of degree $(n, m)$ is an element of $(\otimes_1^n V) \otimes (\otimes_1^m V^*)$.
The definitions above may be extended to objects of higher complexity. A two component quantity $W^{\mu\nu}$ is called a contravariant tensor of second degree if it transforms under Lorentz transformations as $W^{\mu\nu} \to \Lambda^\mu{}_\rho \Lambda^\nu{}_\sigma W^{\rho\sigma}$. Similarly, a covariant tensor of second degree transforms as $W_{\mu\nu} \to (\Lambda^{T,-1})_\mu{}^\rho (\Lambda^{T,-1})_\nu{}^\sigma W_{\rho\sigma}$. Covariant and contravariant tensors are related to each other by index raising/lowering, e.g. $W_{\mu\nu} = g_{\mu\rho}\, g_{\nu\sigma}\, W^{\rho\sigma}$. A mixed second rank tensor $W^\mu{}_\nu$ transforms as $W^\mu{}_\nu \to \Lambda^\mu{}_\rho (\Lambda^{T,-1})_\nu{}^\sigma W^\rho{}_\sigma$. The generalization of these definitions to tensors of higher degree should be obvious.
Finally, we define a contravariant vector field (as opposed to a fixed vector) as a field $v^\mu(x)$ that transforms under Lorentz transformations as $v^\mu(x) \to v'^\mu(x') = \Lambda^\mu{}_\nu v^\nu(x)$. A Lorentz scalar (field), or tensor of degree zero, is one that does not actively transform under Lorentz transformations: $\phi(x) \to \phi'(x') = \phi(x)$. Covariant vector fields, tensor fields, etc. are defined in an analogous manner.
The summation over a contravariant and a covariant index is called a contraction. For example, $v^\mu w_\mu$ is the contraction of a contravariant and a covariant vector (field). As a result, we obtain a Lorentz scalar, i.e. an object that does not change under Lorentz transformations. The contraction of two indices lowers the tensor degree of an object by two. For example, the contraction of a mixed tensor of degree two and a contravariant tensor of degree one, $A^\mu{}_\nu v^\nu$, obtains a contravariant tensor of degree one, etc.
3.4.2 The relativistic covariance of electrodynamics
We now have everything in store to prove the relativistic covariance of electrodynamics, i.e. the form-invariance of its basic equations under Lorentz transformations. Basically, what we need to do is assign a definite transformation behaviour (scalar, co-/contravariant tensor, etc.) to the building blocks of electrodynamics.
The coordinate vector $x^\mu$ is a contravariant vector. In electrodynamics we frequently differentiate w.r.t. the components $x^\mu$, i.e. the next question to ask is whether the four-dimensional gradient $\partial_{x^\mu}$ is co- or contravariant. According to the chain rule,
$$ \frac{\partial}{\partial x'^\mu} = \frac{\partial x^\nu}{\partial x'^\mu}\,\frac{\partial}{\partial x^\nu}. $$
Using that $x^\nu = (\Lambda^{-1})^\nu{}_\mu x'^\mu$, we find that
$$ \frac{\partial}{\partial x'^\mu} = (\Lambda^{-1,T})_\mu{}^\nu\,\frac{\partial}{\partial x^\nu}, $$
i.e. $\partial_{x^\mu} \equiv \partial_\mu$ transforms as a covariant vector. In components,
$$ \partial_\mu = (c^{-1}\partial_t, \nabla), \qquad \partial^\mu = (c^{-1}\partial_t, -\nabla). $$
The four-current vector
We begin by showing that the four-component object $j = (c\rho, \mathbf{j})$ is a Lorentz vector. Consider the current density carried by a point particle at coordinate $\mathbf{x}(t)$ in some coordinate system K. The four-current density carried by the particle is given by $j(x) = q\,(c, d_t \mathbf{x}(t))\,\delta(\mathbf{x} - \mathbf{x}(t))$, or $j^\mu(x) = q\, d_t x^\mu(t)\,\delta(\mathbf{x} - \mathbf{x}(t))$, where $x^\mu(t) = (ct, \mathbf{x}(t))$. To show that this is a contravariant vector, we introduce a dummy integration,
$$ j^\mu(x) = q\int d\tilde t\;\frac{dx^\mu(\tilde t)}{d\tilde t}\;\delta(\mathbf{x} - \mathbf{x}(\tilde t))\,\delta(t - \tilde t) = qc\int d\tau\;\frac{dx^\mu}{d\tau}\;\delta^4(x - x(\tau)), $$
where in the last step we switched to an integration over the proper time $\tau = \tau(\tilde t)$ uniquely assigned to the world line parameterization $x(\tilde t) = (c\tilde t, \mathbf{x}(\tilde t))$ in K. Now, the four-component distribution $\delta^4(x) \equiv \delta(\mathbf{x})\,\delta(ct)$ is a Lorentz scalar (why?). The proper time, $\tau$, also is a Lorentz scalar. Thus, the transformation behaviour of $j^\mu$ is dictated by that of $x^\mu$, i.e. $j^\mu$ defines a contravariant vector.
As an important corollary we note that the continuity equation, the contraction of the covariant vector $\partial_\mu$ and the contravariant vector $j^\mu$, is Lorentz invariant: $\partial_\mu j^\mu = 0$ is a Lorentz scalar.
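For orientation, it may be worth spelling the scalar out in components, using $\partial_\mu = (c^{-1}\partial_t, \nabla)$ and $j^\mu = (c\rho, \mathbf{j})$:
$$ \partial_\mu j^\mu = \frac{1}{c}\partial_t(c\rho) + \nabla\cdot\mathbf{j} = \partial_t\rho + \nabla\cdot\mathbf{j} = 0, $$
which is indeed the familiar continuity equation.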
Electric field strength tensor and vector potential
In section 3.3.2 we obtained Eq. (3.13) for the Lorentz invariant generalization of Newton's equations. Since $v_\mu$ and $v^\mu$ transform as covariant and contravariant vectors, respectively, the matrix $F^{\mu\nu}$ must transform as a contravariant tensor of rank two. Now, let us define the four-vector potential as $A^\mu \equiv (\phi, \mathbf{A})$. It is then a straightforward exercise to verify that the field strength tensor is obtained from the vector potential as
$$ F^{\mu\nu} = \partial^\mu A^\nu - \partial^\nu A^\mu. \qquad (3.16) $$
(Just work out the antisymmetric combination of derivatives and compare to the definitions $\mathbf{B} = \nabla\times\mathbf{A}$, $\mathbf{E} = -\nabla\phi - c^{-1}\partial_t\mathbf{A}$.) This implies that the four-component object $A^\mu$ indeed transforms as a contravariant vector.
Now, we have seen in chapter 1.2 that the Lorentz condition can be expressed as $\partial_\mu A^\mu = 0$, i.e. in a Lorentz invariant form. In the Lorentz gauge, the inhomogeneous wave equations assume the form
$$ \Box A^\mu = \frac{4\pi}{c}\, j^\mu, \qquad (3.17) $$
where $\Box = \partial_\mu\partial^\mu$ is the Lorentz invariant wave operator. Formally, this completes the proof of the Lorentz invariance of the theory. The combination Lorentz condition/wave equations, which we saw carries the same information as the Maxwell equations, has been proven to be invariant. However, to stay in closer contact with the original formulation of the theory, we next express Maxwell's equations themselves in a manifestly invariant form.
Invariant formulation of Maxwell's equations
One may verify by direct inspection that the two inhomogeneous Maxwell equations can be formulated in a covariant manner as
$$ \partial_\mu F^{\mu\nu} = \frac{4\pi}{c}\, j^\nu. \qquad (3.18) $$
To obtain the covariant formulation of the homogeneous equations, a bit of preparatory work is required: Let us define the fourth rank antisymmetric tensor as
$$ \epsilon^{\mu\nu\rho\sigma} = \begin{cases} 1, & (\mu,\nu,\rho,\sigma) = (0,1,2,3) \text{ or an even permutation thereof}, \\ -1, & \text{for an odd permutation}, \\ 0, & \text{else}. \end{cases} \qquad (3.19) $$
One may show (do it!) that $\epsilon^{\mu\nu\rho\sigma}$ transforms as a contravariant tensor of rank four (under the transformations of the unit-determinant subgroup $L_+$.) Contracting this object with the covariant field strength tensor $F_{\mu\nu}$, we obtain a contravariant tensor of rank two,
$$ T^{\mu\nu} \equiv \frac{1}{2}\,\epsilon^{\mu\nu\rho\sigma} F_{\rho\sigma}, $$
known as the dual field strength tensor. Using (3.19), it is straightforward to verify that
$$ T = \{T^{\mu\nu}\} = \begin{pmatrix} 0 & -B_1 & -B_2 & -B_3 \\ B_1 & 0 & E_3 & -E_2 \\ B_2 & -E_3 & 0 & E_1 \\ B_3 & E_2 & -E_1 & 0 \end{pmatrix}, \qquad (3.20) $$
i.e. that $T$ is obtained from $F$ by the replacement $\mathbf{E} \to \mathbf{B}$, $\mathbf{B} \to -\mathbf{E}$. One also verifies that the homogeneous equations assume the covariant form
$$ \partial_\mu T^{\mu\nu} = 0. \qquad (3.21) $$
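As a consistency check (a step not spelled out here, but immediate from $\partial_\mu = (c^{-1}\partial_t, \nabla)$ and the matrix (3.20)), consider two components of (3.21):
$$ \partial_\mu T^{\mu 0} = \partial_i T^{i0} = \nabla\cdot\mathbf{B} = 0, $$
$$ \partial_\mu T^{\mu 1} = \frac{1}{c}\partial_t T^{01} + \partial_2 T^{21} + \partial_3 T^{31} = -\frac{1}{c}\partial_t B_1 - (\nabla\times\mathbf{E})_1 = 0, $$
i.e. the $\nu = 0$ equation expresses the absence of magnetic monopoles, and the spatial components reproduce Faraday's law of induction.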
This completes the invariance proof. Critical readers may find the definition of the dual tensor somewhat unmotivated. Also, the excessive use of indices (something one normally tries to avoid in physics) does not help to make the structure of the theory transparent. Indeed, there exists a much more satisfactory, coordinate independent formulation of the theory in terms of differential forms. However, as we do not assume familiarity of the reader with the theory of forms, all we can do is encourage him/her to learn it and to advance to the truly invariant formulation of electrodynamics not discussed in this text ...
Chapter 4
Lagrangian mechanics
Newton's equations of motion provide a complete description of mechanical motion. Even from today's perspective, their scope is limited only by velocity (at high velocities $v \sim c$, Newtonian mechanics has to be replaced by its relativistic generalization) and classicality (the dynamics of small bodies is affected by quantum effects.) Otherwise, they remain fully applicable, which is remarkable for a theory that old.
Newtonian theory had its first striking successes in celestial mechanics. But how useful is this theory in a context more worldly than that of a planet in open space? To see the justification of this question, consider the system shown in the figure, a setup known as the Atwood machine: two bodies subject to gravitational force are tied to each other by an idealized massless string over an idealized frictionless pulley. What makes this problem different from those considered in celestial mechanics is that the participating bodies (the two masses) are constrained in their motion. The question then arises how this constraint can be incorporated into (Newton's) mechanical equations of motion, and how the ensuing equations can be solved. This problem is of profound applied relevance: of the hundreds of motions taking place in, say, the engine of a car, practically all are constrained. In the eighteenth century, the era initiating the age of engineering and industrialization, the availability of a formalism capable of efficient formulation of problems subject to mechanical constraints became pressing.
Early solution schemes in terms of Newtonian mechanics relied on the concept of constraining forces. The strategy there was to formulate a problem in terms of its basal unconstrained variables (e.g., the real space coordinates of the two masses in the figure). In a second step one would then introduce (infinitely strong) constraining forces serving to reduce the number of free coordinates (e.g., down to the one coordinate measuring the height of one of the masses in the figure.) However, strategies of this type soon turned out to be operationally sub-optimal. The need to find more efficient formulations was motivation for intensive research activity, which eventually culminated in the modern formulations of classical mechanics, Lagrangian and Hamiltonian mechanics.
INFO There are a number of conceptually different types of mechanical constraints: constraints that can be expressed in terms of equalities such as $f(q_1, \ldots, q_n; \dot q_1, \ldots, \dot q_n) = 0$ are called holonomic. Here, $q_1, \ldots, q_n$ are the basal coordinates of the problem, and $\dot q_1, \ldots, \dot q_n$ the corresponding velocities. (The constraint of the Atwood machine belongs to this category: $x_1 + x_2 = l$, where $l$ is a constant, and $x_i$, $i = 1, 2$, are the heights of the two bodies measured with respect to a common reference height.) This has to be contrasted to non-holonomic constraints, i.e. constraints formulated by inequalities. (Think of the molecules moving in a piston. Their coordinates obey the constraint $0 \le q_j \le L_j$, $j = 1, 2, 3$, where $L_j$ are the extensions of the piston.)
Constraints explicitly involving time, $f(q_1, \ldots, q_n; \dot q_1, \ldots, \dot q_n; t) = 0$, are called rheonomic. For example, a particle constrained to move on a moving surface is subject to a rheonomic constraint. Constraints void of explicit time dependence are called scleronomic.
In this chapter, we will introduce the concept of variational principles to derive Lagrangian (and later Hamiltonian) mechanics from their Newtonian ancestor. Our construction falls short of elucidating the beautiful and important historical developments that eventually led to the modern formulation. Also, it is difficult to motivate in advance. However, within the framework of a short introductory course, these shortcomings are outweighed by the brevity of the derivation. Still, it is highly recommended to consult a textbook on classical mechanics to learn more about the historical developments that led from Newtonian to Lagrangian mechanics.
Let us begin with a few simple and seemingly uninspired manipulations of Newton's equations $m\ddot q = F$ of a single particle (generalization to many particles will be obvious) subject to a conservative force $F$. Now, notice that the l.h.s. of the equation may be written as $m\ddot q = d_t\,\partial_{\dot q} T$, where $T = T(\dot q) = \frac{m}{2}\dot q^2$ is the particle's kinetic energy. By definition, the conservative force affords a representation $F = -\partial_q U$, where $U = U(q)$ is the potential. This means that Newton's equation may be equivalently represented as
$$ (d_t\,\partial_{\dot q} - \partial_q)\, L(q, \dot q) = 0, \qquad (4.1) $$
where we have defined the Lagrangian function,
$$ L = T - U. \qquad (4.2) $$
But what is this reformulation good for? To appreciate the meaning of the mathematical
structure of Eq. (4.1), we need to introduce the purely mathematical concept of
4.1 Variational principles
In standard calculus, one is concerned with functions $F(v)$ that take vectors $v \in \mathbb{R}^n$ as arguments. Variational calculus generalizes standard calculus, in that one considers functions $F[f]$ taking functions as arguments. Now, 'function of a function' does not sound nice. This may be the reason why the function $F$ is actually called a functional. Similarly, it is customary to indicate the argument of a functional in square brackets. The generalization $v \to f$ is not in the least mysterious: as we have seen in previous chapters, one may discretize a function (cf. the discussion in section 1.3.4), $f \to \{f_i \,|\, i = 1, \ldots, N\}$, thus interpreting it as the limiting case of an $N$-dimensional vector; in many aspects, variational calculus amounts to a straightforward generalization of standard calculus. At any rate, we will see that one may work with functionals much like with ordinary functions.

Figure 4.1: On the mathematical setting of functionals on curves (discussion, see text)
4.1.1 Definitions
In this chapter we will focus on the class of functionals relevant to classical mechanics, viz. functionals $F[\gamma]$ taking curves $\gamma$ in (subsets of) $\mathbb{R}^n$ as arguments.[1] To be specific, consider the set $M$ of all curves $\gamma: I \equiv [t_0, t_1] \to U$ mapping an interval $I$ into a subset $U \subset \mathbb{R}^n$ of $n$-dimensional space (see Fig. 4.1).[2] Now, consider a mapping
$$ \Phi: M \to \mathbb{R}, \quad \gamma \mapsto \Phi[\gamma], \qquad (4.3) $$
assigning to each curve a real number, i.e. a (real) functional on $M$.
EXAMPLE The length of a curve is defined as
$$ L[\gamma] \equiv \int_{t_0}^{t_1} dt \left( \sum_{i=1}^n \dot\gamma_i(t)\,\dot\gamma_i(t) \right)^{1/2}. \qquad (4.4) $$
[1] We have actually met with more general functionals before. For example, the electric susceptibility $\chi[E]$ is a functional of the electric field $E: \mathbb{R}^4 \to \mathbb{R}^3$.
[2] The set $U$ may actually be a lower dimensional submanifold of $\mathbb{R}^n$. For example, we might consider curves in a two-dimensional plane embedded in $\mathbb{R}^3$, etc.
It assigns to each curve its euclidean length. Some readers may question the consistency of the notation: on the l.h.s. we have the symbol $\gamma$ (no derivatives), and on the r.h.s. $\dot\gamma$ (temporal derivatives.) However, there is no contradiction here. The notation $[\gamma]$ indicates dependence on the curve as a geometric object. By definition, this contains the full information on the curve, including all derivatives. The r.h.s. indicates that the functional $L$ reads only partial information on the curve, viz. that contained in first derivatives.
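To connect with the discretization point made above, the length functional (4.4) is easily evaluated numerically. The following sketch (our illustration; the test curve and grid size are arbitrary choices) approximates $L[\gamma]$ on a fine grid:

```python
import numpy as np

def curve_length(gamma, t0, t1, n=100_000):
    """Discretized evaluation of Eq. (4.4): L[gamma] = int dt (sum_i gdot_i gdot_i)^(1/2)."""
    t = np.linspace(t0, t1, n)
    x = gamma(t)                                        # curve samples, shape (dim, n)
    speed = np.linalg.norm(np.gradient(x, t, axis=1), axis=0)
    return np.sum(0.5 * (speed[1:] + speed[:-1]) * np.diff(t))   # trapezoidal rule

# half circle of radius 2; its exact length is 2*pi ~ 6.28319
half_circle = lambda t: np.array([2 * np.cos(t), 2 * np.sin(t)])
print(curve_length(half_circle, 0.0, np.pi))
```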
Consider now two curves $\gamma, \gamma' \in M$ that lie close to each other. (For example, we may require that $|\gamma(t) - \gamma'(t)| \equiv \big(\sum_i (\gamma_i(t) - \gamma'_i(t))^2\big)^{1/2} < \epsilon$ for all $t$ and some positive $\epsilon$.) We are interested in the increment $\Phi[\gamma] - \Phi[\gamma']$. Defining $\gamma' = \gamma + h$, the functional is called differentiable iff
$$ \Phi[\gamma + h] - \Phi[\gamma] = F_\gamma[h] + O(h^2), \qquad (4.5) $$
where $F_\gamma[h]$ is a linear functional of $h$, i.e. a functional obeying $F_\gamma[c_1 h_1 + c_2 h_2] = c_1 F_\gamma[h_1] + c_2 F_\gamma[h_2]$ for $c_1, c_2 \in \mathbb{R}$ and $h_{1,2} \in M$. In (4.5), $O(h^2)$ indicates residual contributions of order $h^2$. For example, if $|h(t)| < \epsilon$ for all $t$, these terms would be of $O(\epsilon^2)$. The functional $F_\gamma[\,\cdot\,]$ is called the differential of the functional $\Phi$ at $\gamma$. Notice that $F_\gamma[\,\cdot\,]$ need not depend linearly on $\gamma$. The differential generalizes the notion of a derivative to functionals. Similarly, we may think of $\Phi[\gamma + h] = \Phi[\gamma] + F_\gamma[h] + O(h^2)$ as a generalized Taylor expansion. The linear functional $F_\gamma[\,\cdot\,]$ describes the behavior of $\Phi$ in the vicinity of the reference curve $\gamma$.
A curve $\gamma$ is called an extremal curve of $\Phi$ if $F_\gamma[\,\cdot\,] = 0$.
EXERCISE Re-familiarize yourself with the definition of the derivative $f'(x)$ of higher dimensional functions $f: \mathbb{R}^k \to \mathbb{R}$. Interpret the functional $\Phi[\gamma]$ as the limit of a function $\Phi: \mathbb{R}^N \to \mathbb{R}$, $\{\gamma_i\} \mapsto \Phi(\{\gamma_i\})$, where the vector $\{\gamma_i \,|\, i = 1, \ldots, N\}$ is a discrete approximation of the curve $\gamma$. Think how the definition (4.5) generalizes the notion of differentiability and how $F_\gamma[\,\cdot\,] \leftrightarrow f'(x)$ generalizes the definition of a derivative.
This is about as much as we need to say/define in most general terms. In the next section we will learn how to determine the extremal curves for an extremely important subfamily of functionals.
4.1.2 Euler-Lagrange equations
In the following, we will focus on functionals that afford a local representation
$$ S[\gamma] = \int_{t_0}^{t_1} dt\; L(\gamma(t), \dot\gamma(t), t), \qquad (4.6) $$
where $L: \mathbb{R}^n \times \mathbb{R}^n \times \mathbb{R} \to \mathbb{R}$ is a function. The functional $S$ is local in that the integral kernel $L$ does not depend on points on the curve at different times. Local functionals play an important role in applications. (For example, the length functional (4.4) belongs to this family.) We will consider the restriction of local functionals to the set of all curves $\gamma \in M$ that begin and end at common terminal points: $\gamma(t_0) = \gamma_0$ and $\gamma(t_1) = \gamma_1$ with fixed $\gamma_0$ and $\gamma_1$ (see the figure.) Again, this is a restriction motivated by the applications below. To keep the notation slim, we will denote the space of all curves thus restricted again by $M$.
We now prove the following important fact: the local functional $S[\gamma]$ is differentiable and its derivative is given by
$$ F_\gamma[h] = \int_{t_0}^{t_1} dt\; (\partial_\gamma L - d_t\,\partial_{\dot\gamma} L)\cdot h. \qquad (4.7) $$
Here, we are using the shorthand notation $\partial_\gamma L\cdot h \equiv \sum_{i=1}^n \partial_{x_i} L(x, \dot\gamma, t)\big|_{x = \gamma}\, h_i$, and analogously for $\partial_{\dot\gamma} L\cdot \dot h$. Eq. (4.7) is verified by straightforward Taylor series expansion:
h. Eq. (4.7) is veried by straightforward Taylor series expansion:
S[ + h] S[] =
_
t
1
t
0
dt
_
L( + h, +

h, t) L(, , t)
_
=
=
_
t
1
t
0
dt
_

L h +


h
_
+O(h
2
) =
=
_
t
1
t
0
dt [

L d
t
(

L)] h +

L h[
t
1
t
0
+O(h
2
),
where in the last step, we have integrated by parts. The surface term vanishes because
h(t
0
) = h(t
1
) = 0, on account of the condition (t
i
) = ( +h)(t
i
) =
i
, i = 0, 1. Comparison
with the denition (4.5) then readily gets us to (4.7).
Eq. (4.7) entails an important corollary: the local functional S is extremal on all curves
obeying the so-called Euler-Lagrange equations
d
dt
dL
d
i

dL
d
i
= 0, i = 1, . . . , N.
The reason is that if and only if these N conditions hold, will the linear functional (4.7) vanish
on arbitrary curves h. (Exercise: assuming that one or several of the conditions above are
violated, construct a curve h on which the functional (4.7) will not vanish.)
Let us summarize what we have got: for a given function $L: \mathbb{R}^n \times \mathbb{R}^n \times \mathbb{R} \to \mathbb{R}$, the local functional
$$ S: M \to \mathbb{R}, \qquad S[\gamma] \equiv \int_{t_0}^{t_1} dt\; L(\gamma(t), \dot\gamma(t), t), \qquad (4.8) $$
defined on the set $M = \{\gamma: [t_0, t_1] \to \mathbb{R}^n \,|\, \gamma(t_0) = \gamma_0,\ \gamma(t_1) = \gamma_1\}$, is extremal on curves obeying the conditions
$$ \frac{d}{dt}\frac{\partial L}{\partial \dot\gamma_i} - \frac{\partial L}{\partial \gamma_i} = 0, \qquad i = 1, \ldots, n. \qquad (4.9) $$
These equations are called the Euler-Lagrange equations of the functional $S$. The function $L$ is often called the Lagrangian (function) and $S$ an action functional.
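For readers who like to experiment, the equations (4.9) can also be generated symbolically. The sketch below is our minimal illustration (it assumes a reasonably recent version of sympy, which supports differentiation with respect to $q(t)$ and $\dot q(t)$), applied to the oscillator Lagrangian $L = \frac{m}{2}\dot q^2 - \frac{m\omega^2}{2}q^2$ of the form (4.2):

```python
import sympy as sp

t = sp.symbols('t')
m, w = sp.symbols('m omega', positive=True)
q = sp.Function('q')(t)

def euler_lagrange(L, q):
    """Left hand side of Eq. (4.9) for a single coordinate q(t)."""
    return sp.diff(L, sp.diff(q, t)).diff(t) - sp.diff(L, q)

# harmonic oscillator, L = T - U as in Eq. (4.2)
L = m/2 * sp.diff(q, t)**2 - m*w**2/2 * q**2
print(sp.simplify(euler_lagrange(L, q)))  # equivalent to m*qddot + m*omega**2*q
```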
At this stage, one may observe a suspicious structural similarity between the Euler-Lagrange equations and our early reformulation of Newton's equations (4.1). However, before shedding more light on this connection, it is worthwhile to illustrate the usage of Euler-Lagrange calculus on an
EXAMPLE Let us ask what curve between fixed initial and final points $\gamma_0$ and $\gamma_1$, resp., will have extremal curve length. (Here, 'extremal' means shortest; there is no longest curve.) To answer this question, we need to formulate and solve the Euler-Lagrange equations of the functional (4.4). Differentiating the Lagrangian $L(\dot\gamma) = \big( \sum_{i=1}^n \dot\gamma_i\dot\gamma_i \big)^{1/2}$, we obtain (show it)
$$ \frac{d}{dt}\frac{\partial L}{\partial \dot\gamma_i} = \frac{\ddot\gamma_i}{L(\dot\gamma)} - \frac{\dot\gamma_i\,(\dot\gamma_j\ddot\gamma_j)}{L(\dot\gamma)^3} = 0, $$
where a summation over the repeated index $j$ is implied.
These equations are solved by all curves connecting $\gamma_0$ and $\gamma_1$ as a straight line. For example, the constant velocity curve $\gamma(t) = \gamma_0\,\frac{t - t_1}{t_0 - t_1} + \gamma_1\,\frac{t - t_0}{t_1 - t_0}$ has $\ddot\gamma_i = 0$, thus solving the equation. (Exercise: the most general curve connecting $\gamma_0$ and $\gamma_1$ by a straight line is given by $\gamma(t) = \gamma_0 + f(t)(\gamma_1 - \gamma_0)$, where $f(t_0) = 0$ and $f(t_1) = 1$. In general, these curves describe accelerated motion, $\ddot\gamma \ne 0$. Still, they solve the Euler-Lagrange equations, i.e. they are of shortest geometric length. Show it!)
4.1.3 Coordinate invariance of Euler-Lagrange equations
What we actually mean when we write $\frac{\partial L}{\partial \gamma_i(t)}$ is: differentiate the function $L(\gamma, \dot\gamma, t)$ w.r.t. the $i$th coordinate of the curve at time $t$. However, the same curve can be represented in different coordinates! For example, a three dimensional curve $\gamma$ will have different representations depending on whether we work in cartesian coordinates $\gamma(t) \leftrightarrow (x_1(t), x_2(t), x_3(t))$ or spherical coordinates $\gamma(t) \leftrightarrow (r(t), \theta(t), \phi(t))$.
Figure 4.2: On the representation of curves and functionals in different coordinates
Yet, nowhere in our derivation of the Euler-Lagrange equations did we make reference to specific properties of the coordinate system. This suggests that the Euler-Lagrange equations assume the same form, Eq. (4.9), in all coordinate systems. (To appreciate the meaning of this statement, compare with the Newton equations, which assume their canonical form $m\ddot x_i = f_i(x)$ only in cartesian coordinates.) Coordinate changes play an extremely important role in analytical mechanics, especially when it comes to problems with constraints. Hence, anticipating that the Euler-Lagrange formalism is slowly revealing itself as a replacement of Newton's formulation, it is well invested time to take a close look at the role of coordinates.
Both a curve $\gamma$ and the functional $S[\gamma]$ are canonical objects; no reference to coordinates is made here. The curve is simply a map $\gamma: I \to U$, and $S[\gamma]$ assigns to that map a number. However, most of the time when we actually need to work with a curve, we do so in a system of coordinates. Mathematically, a coordinate system of $U$ is a diffeomorphic[3] map
$$ \phi: U \to V, \quad x \mapsto \phi(x) \equiv (x_1, \ldots, x_m), $$
where the coordinate domain $V$ is an open subset of $\mathbb{R}^m$ and $m$ is the dimensionality of $U \subset \mathbb{R}^n$. (The set $U$ may be of lower dimension than the embedding space $\mathbb{R}^n$.) For example, spherical coordinates $]0, \pi[\ \times\ ]0, 2\pi[\ \ni (\theta, \phi) \mapsto x(\theta, \phi) \in S^2 \subset \mathbb{R}^3$ assign to each coordinate pair $(\theta, \phi)$ a point on the two-dimensional sphere, etc.[4]
[3] Loosely speaking, this means invertible and smooth (differentiable.)
[4] We here avoid the discussion of the complications arising when $U$ cannot be covered by a single coordinate chart $\phi$. For example, the sphere $S^2$ cannot be fully covered by a single coordinate mapping. (Think why.)
Given a coordinate system $\phi$, the abstract curve $\gamma$ defines a curve $x: I \to V$, $t \mapsto x(t) = \phi(\gamma(t))$ in coordinate space (see Fig. 4.2.) What we actually meant when we referred to $\gamma_i(t)$ in the previous sections are the coordinates $x_i(t)$ of $\gamma$ in the coordinate system $\phi$. (In cases where unambiguous reference to one coordinate system is made, it is customary to avoid the introduction of extra symbols $(x_i)$ and to write $\gamma_i$ instead. As long as one knows what one is doing, no harm is done.) Similarly, the abstract functional $S[\gamma]$ defines a functional $S_c[x] \equiv S[\phi^{-1}\circ x]$ on curves in the coordinate domain. In our derivation of the Euler-Lagrange equations we have been making tacit use of a coordinate representation of this kind. In other words, the Euler-Lagrange equations were actually derived for the representation $S_c[x]$. (Again, it is customary to simply write $S[\gamma]$ if unambiguous reference to a coordinate representation is made. Other customary notations include $S[x]$ (omitting the subscript c), or just $S[\gamma]$ if $\gamma$ is tacitly identified with a specific coordinate representation. Occasionally one writes $S[\gamma]$ where $\gamma$ is meant to be the vector of coordinates of $\gamma$ in a specific system.)
Now, suppose we are given another coordinate representation of $U$, that is, a diffeomorphism $\phi': U \to V'$, $x \mapsto \phi'(x) \equiv x' \equiv (x'_1, \ldots, x'_m)$. A point on the curve, $\gamma(t)$, now has two different coordinate representations, $x(t) = \phi(\gamma(t))$ and $x'(t) = \phi'(\gamma(t))$. The coordinate transformation between these representations is mediated by the map
$$ \phi'\circ\phi^{-1}: V \to V', \quad x \mapsto x' = \phi'\circ\phi^{-1}(x). $$
By construction, this is a smooth and invertible map between open subsets of $\mathbb{R}^m$. For example, if $x = (x_1, x_2, x_3)$ are cartesian coordinates and $x' = (r, \theta, \phi)$ are spherical coordinates, we would have $x'_1(x) = (x_1^2 + x_2^2 + x_3^2)^{1/2}$, etc.
The most important point is that $x(t)$ and $x'(t)$ describe the same curve, $\gamma(t)$, only in different representations. Specifically, if the reference curve is an extremal curve, the coordinate representations $x: I \to V$ and $x': I \to V'$ will be extremal curves, too. According to our discussion in section 4.1.2, both curves must be solutions of the Euler-Lagrange equations, i.e. we can draw the conclusion:
$$ \gamma \text{ extremal} \;\Longleftrightarrow\; \frac{d}{dt}\frac{\partial L}{\partial \dot x_i} - \frac{\partial L}{\partial x_i} = 0 \;\Longleftrightarrow\; \frac{d}{dt}\frac{\partial L'}{\partial \dot x'_i} - \frac{\partial L'}{\partial x'_i} = 0, \qquad (4.10) $$
where $L'(x', \dot x', t) \equiv L(x(x'), \dot x(x', \dot x'), t)$ is the Lagrangian in the $\phi'$ coordinate representation. In other words:
The Euler-Lagrange equations are coordinate invariant.
They assume the same form in all coordinate systems.
EXERCISE Above we have shown the coordinate invariance of the Euler-Lagrange equations by conceptual reasoning. However, it must be possible to obtain the same invariance properties by brute force computation. To show that the second line in (4.10) follows from the first, use the chain rule, $\dot x'_i = \sum_j \frac{\partial x'_i}{\partial x_j}\,\dot x_j$, and its immediate consequence $\frac{\partial \dot x'_i}{\partial \dot x_j} = \frac{\partial x'_i}{\partial x_j}$.
EXAMPLE Let us illustrate the coordinate invariance of the variational formalism on the example of the functional 'curve length' discussed on p. 86. Considering the case of curves in the plane, $n = 2$, we might get the idea to attack this problem in polar coordinates, $x \mapsto (r, \varphi)$. The polar coordinate representation of the cartesian Lagrangian $L(x_1, x_2, \dot x_1, \dot x_2) = (\dot x_1^2 + \dot x_2^2)^{1/2}$ reads (verify it)
$$ L(r, \varphi, \dot r, \dot\varphi) = (\dot r^2 + r^2\dot\varphi^2)^{1/2}. $$
It is now straightforward to compute the Euler-Lagrange equations
$$ \frac{d}{dt}\frac{\partial L}{\partial \dot r} - \frac{\partial L}{\partial r} = (\ldots) \stackrel{!}{=} 0, \qquad \frac{d}{dt}\frac{\partial L}{\partial \dot\varphi} - \frac{\partial L}{\partial \varphi} = (\ldots) \stackrel{!}{=} 0. $$
Here, the notation $(\ldots)$ indicates that we are getting a lengthy list of terms which, however, are all weighted by $\dot\varphi$. Putting the initial point into the origin, $\phi(\gamma_0) = (0, 0)$, and the final point somewhere into the plane, $\phi(\gamma_1) = (r_1, \varphi_1)$, we thus conclude that the straight line connectors, $\phi(\gamma(t)) = (r(t), \varphi_0)$, are solutions of the Euler-Lagrange equations. (It is less straightforward to show that these are the only solutions.)
With these preparations in store, we are now in a very good position to apply variational
calculus to a new and powerful reformulation of Newtonian mechanics.
4.2 Lagrangian mechanics
4.2.1 The idea
Imagine a mechanical problem subject to constraints. For definiteness, we may consider the system shown on the right: a bead sliding on a string and subject to the gravitational force, $F_g$. In principle, we may describe the situation in terms of Newton's equations. These equations must contain an infinitely strong force $F_c$ whose sole function is to keep the bead on the string. About this force we do not know much (other than that it acts perpendicular to the string, and vanishes right on the string.)
According to the structures outlined above, we may now reformulate Newton's equations as (4.1), where the potential $V = V_g + V_c$ in the Lagrangian $L = T - V$ contains two contributions, one accounting for the gravitational force, $V_g$, and another, $V_c$, for the force $F_c$. We also know that the sought for solution curve $q: I \to \mathbb{R}^3$, $t \mapsto q(t)$ will extremize the action $S[q] = \int_{t_0}^{t_1} dt\, L(q, \dot q)$. (Our problem does not include explicit time dependence, i.e. $L$ does not carry a time argument.) So far, we have not gained much. But let us now play the trump card of the new formulation, its invariance under coordinate changes.
In the above formulation of the problem, we are seeking an extremum of $S[q]$ on the set of all curves $I \to \mathbb{R}^3$. However, we know that all curves in $\mathbb{R}^3$ will be subject to the infinitely strong potential of the constraint force, unless they lie right on the string $S$. The action of those generic curves will be infinitely large, and we may remove them from the set of curves entering the variational procedure from the outset. This observation suggests to represent the problem in terms of coordinates $(s, q_\perp)$, where the one-dimensional coordinate $s$ parameterizes the curve, and the $(3-1)$-dimensional coordinate vector $q_\perp$ parameterizes the space normal to the curve. Knowing that curves with non-vanishing $q_\perp(t)$ will have an infinitely large action, we may restrict the set of curves under consideration to $M = \{\gamma: I \to S,\ t \mapsto (s(t), 0)\}$, i.e. to curves in $S$. On the string, $S$, the constraint force $F_c$ vanishes. The above limitation thus entails that the constraint forces will never explicitly appear in the variational procedure; this is an enormous simplification of the problem. Also, our problem has effectively become one-dimensional. The Lagrangian evaluated on curves in $S$ is a function $L(s(t), \dot s(t))$. It is much simpler than the original Lagrangian $L(q, \dot q(t))$ with its constraint force potential.
Before generalizing these ideas to a general solution strategy of problems with constraints, let us illustrate its potential on a concrete problem.
EXAMPLE We consider the Atwood machine depicted on p. 81. The Lagrangian of this problem reads
$$ L(q_1, q_2, \dot q_1, \dot q_2) = \frac{m_1}{2}\dot q_1^2 + \frac{m_2}{2}\dot q_2^2 - m_1 g x_1 - m_2 g x_2. $$
Here, $q_i$ is the position of mass $i$, $i = 1, 2$, and $x_i$ is its height. In principle, the sought for solution $\gamma(t) = (q_1(t), q_2(t))$ is a curve in six dimensional space. We assume that the masses are released at rest at initial coordinates $(x_i(t_0), y_i(t_0), z_i(t_0))$. The coordinates $y_i$ and $z_i$ will not change in the process, i.e. effectively we are seeking solution curves in the two-dimensional space of coordinates $(x_1, x_2)$. The constraint now implies that $x \equiv x_1 = l - x_2$, or $x_2 = l - x$. Thus, our solution curves are uniquely parameterized by the generalized coordinate $x$ as $(x_1, x_2) = (x, l - x)$. We now enter with this parameterization into the Lagrangian above to obtain
$$ L(x, \dot x) = \frac{m_1 + m_2}{2}\dot x^2 - (m_1 - m_2)\, g x + \text{const.} $$
It is important to realize that this function uniquely specifies the action
$$ S[x] = \int_{t_0}^{t_1} dt\; L(x(t), \dot x(t)) $$
of physically allowed curves. The extremal curve, which then describes the actual motion of the two-body system, will be a solution of the equation
$$ d_t\frac{\partial L}{\partial \dot x} - \frac{\partial L}{\partial x} = (m_1 + m_2)\,\ddot x + (m_1 - m_2)\, g = 0. $$
For the given initial conditions $x(t_0) = x_1(t_0)$ and $\dot x(t_0) = 0$, this equation is solved by
$$ x(t) = x(t_0) - \frac{m_1 - m_2}{m_1 + m_2}\,\frac{g}{2}\,(t - t_0)^2. $$
Substitution of this solution into $q_1 = (x, y_1, z_1)$ and $q_2 = (l - x, y_2, z_2)$ solves our problem.
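A two-line numerical version of this result (our addition; the parameter values are arbitrary):

```python
import numpy as np

def atwood_x(t, x0, m1, m2, g=9.81, t0=0.0):
    """Height x(t) of mass 1 for release at rest, as derived above."""
    return x0 - (m1 - m2) / (m1 + m2) * g / 2 * (t - t0)**2

# equal masses stay put; unequal masses accelerate with a = (m1 - m2) g / (m1 + m2)
print(atwood_x(np.array([0.0, 0.5, 1.0]), x0=2.0, m1=1.2, m2=1.0))
```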
4.2.2 Hamilton's principle
After this preparation, we are in a position to formulate a new approach to solving mechanical problems. Suppose we are given an $N$-particle setup specified by the following data: (i) a Lagrangian function $L: \mathbb{R}^{6N+1} \to \mathbb{R}$, $(x_1, \ldots, x_N, \dot x_1, \ldots, \dot x_N, t) \mapsto L(x_1, \ldots, x_N, \dot x_1, \ldots, \dot x_N, t)$, defined in the $6N$-dimensional space needed to register the coordinates and velocities of $N$ particles, and (ii) a set of constraints limiting the motion of particles to an $f$-dimensional submanifold of $\mathbb{R}^{3N}$.[5] Mathematically, a (holonomic) set of constraints will be implemented through $3N - f$ equations
$$ F_j(x_1, \ldots, x_N, t) = 0, \qquad j = 1, \ldots, 3N - f. \qquad (4.11) $$
The number $f$ is called the number of degrees of freedom of the problem.
Hamilton's principle states that such a problem is to be solved by a three-step algorithm:
Resolve the constraints (4.11) in terms of $f$ parameters $q \equiv (q_1, \ldots, q_f)$, i.e. find a representation $x_l(q)$, $l = 1, \ldots, N$, such that the constraints $F_j(x_1(q), \ldots, x_N(q)) = 0$ are resolved for all $j = 1, \ldots, 3N - f$. The parameters $q_i$ are called generalized coordinates of the problem. The maximal set of parameter configurations $q \equiv (q_1, \ldots, q_f)$ compatible with the constraints defines a subset $V \subset \mathbb{R}^f$, and the map $V \to \mathbb{R}^{3N}$, $q \mapsto (x_1(q), \ldots, x_N(q))$ defines an $f$-dimensional submanifold of $\mathbb{R}^{3N}$.
Reduce the Lagrangian of the problem to an effective Lagrangian
$$ L(q, \dot q, t) \equiv L(x_1(q), \ldots, x_N(q), \dot x_1(q, \dot q), \ldots, \dot x_N(q, \dot q), t). $$
In practice, this amounts to a substitution of $x_i(t) = x_i(q(t))$ into the original Lagrangian. The effective Lagrangian is a function $L: V \times \mathbb{R}^f \times \mathbb{R} \to \mathbb{R}$, $(q, \dot q, t) \mapsto L(q, \dot q, t)$.
Finally, formulate and solve the Euler-Lagrange equations
$$ \frac{d}{dt}\frac{\partial L(q, \dot q, t)}{\partial \dot q_i} - \frac{\partial L(q, \dot q, t)}{\partial q_i} = 0, \qquad i = 1, \ldots, f. \qquad (4.12) $$
[5] Loosely speaking, a $d$-dimensional submanifold of $\mathbb{R}^n$ is a subset of $\mathbb{R}^n$ that affords a smooth parameterization in terms of $d < n$ coordinates (i.e. is locally diffeomorphic to open subsets of $\mathbb{R}^d$.) Think of a smooth surface in three-dimensional space ($n = 3$, $d = 2$), or a line ($n = 3$, $d = 1$), etc.
The prescription above is equivalent to the following statement, which is known as Hamilton's principle:
Consider a mechanical problem formulated in terms of $f$ generalized coordinates $q = (q_1, \ldots, q_f)$ and a Lagrangian $L(q, \dot q, t) = (T - U)(q, \dot q, t)$. Let $q(t)$ be a solution of the Euler-Lagrange equations
$$ (d_t\,\partial_{\dot q_i} - \partial_{q_i})\, L(q, \dot q, t) = 0, \qquad i = 1, \ldots, f, $$
at given initial and final configuration $q(t_0) = q_0$ and $q(t_1) = q_1$. This curve describes the physical motion of the system. It is an extremal curve of the action functional
$$ S[q] = \int_{t_0}^{t_1} dt\; L(q, \dot q, t). $$
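As a further miniature (not worked out in the text), the three steps applied to the plane pendulum, with gravity acting along the negative $x_2$ direction and a rigid rod of length $l$:
$$ F_1(x_1, x_2) = x_1^2 + x_2^2 - l^2 = 0 \qquad (3N - f = 1,\ f = 1), $$
$$ \text{step 1:}\quad (x_1, x_2) = l\,(\sin\theta, -\cos\theta), \qquad q \equiv \theta, $$
$$ \text{step 2:}\quad L(\theta, \dot\theta) = \frac{m}{2}\, l^2\dot\theta^2 + m g l\cos\theta, $$
$$ \text{step 3:}\quad \frac{d}{dt}\frac{\partial L}{\partial \dot\theta} - \frac{\partial L}{\partial \theta} = m l^2\ddot\theta + m g l\sin\theta = 0, $$
i.e. the familiar pendulum equation $\ddot\theta = -(g/l)\sin\theta$, obtained without ever touching the constraint force in the rod.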
To conclude this section, let us transfer a number of important physical quantities from Newtonian mechanics to the more general framework of Lagrangian mechanics: the generalized momentum associated to a generalized coordinate $q_i$ is defined as
$$ p_i = \frac{\partial L(q, \dot q, t)}{\partial \dot q_i}. \qquad (4.13) $$
Notice that for cartesian coordinates, $p_i = \partial_{\dot q_i} L = \partial_{\dot q_i} T = m\dot q_i$ reduces to the familiar momentum variable. We call a coordinate $q_i$ a cyclic variable if it does not enter the Lagrangian function, that is if $\partial_{q_i} L = 0$. These two definitions imply that
the generalized momentum corresponding to a cyclic variable is conserved, $d_t p_i = 0$.
This follows from $d_t p_i = d_t\,\partial_{\dot q_i} L = \partial_{q_i} L = 0$. In general, we call the derivative $\partial_{q_i} L \equiv F_i$ a generalized force. In the language of generalized momenta and forces, the Lagrange equations (formally) assume the form of Newton-like equations, $d_t p_i = F_i$.
For the convenience of the reader, the most important quantities revolving around the Lagrangian formalism are summarized in table 4.1.
4.2.3 Lagrange mechanics and symmetries
Above, we have emphasized the capacity of the Lagrange formalism to handle problems with constraints. However, an advantage of equal importance is its flexibility in the choice of problem adjusted coordinates: unlike the Newton equations, the Lagrange equations maintain their form in all coordinate systems.
quantity                    designation or definition
generalized coordinate      $q_i$
generalized momentum        $p_i = \partial_{\dot q_i} L$
generalized force           $F_i = \partial_{q_i} L$
Lagrangian                  $L(q, \dot q, t) = (T - U)(q, \dot q, t)$
Action (functional)         $S[q] = \int_{t_0}^{t_1} dt\, L(q, \dot q, t)$
Euler-Lagrange equations    $(d_t\,\partial_{\dot q_i} - \partial_{q_i})\, L = 0$

Table 4.1: Basic definitions of Lagrangian mechanics
The choice of good coordinates becomes instrumental in problems with symmetries. From experience we know that a symmetry (think of $z$-axis rotational invariance) entails the conservation of a physical variable (the $z$-component of angular momentum), and that it is important to work in coordinates reflecting the symmetry ($z$-axis cylindrical coordinates.) But how do we actually define the term 'symmetry'? And how can we find the ensuing conservation laws? Finally, how do we obtain a system of symmetry-adjusted coordinates? In this section, we will provide answers to these questions.
4.2.4 Noether theorem
Consider a family of mappings, $h^s$, of the coordinate manifold of a mechanical system into itself,
$$ h^s: V \to V, \quad q \mapsto h^s(q). \qquad (4.14) $$
Here, $s \in \mathbb{R}$ is a control parameter, and we require that $h^0 = \mathrm{id}$ is the identity transform. By way of example, consider the map $h^s(r, \theta, \phi) \equiv (r, \theta, \phi + s)$ describing a rotation around the $z$-axis in the language of spherical coordinates, etc. For each curve $q: I \to V$, $t \mapsto q(t)$, the map $h^s$ gives us a new curve $h^s\circ q: I \to V$, $t \mapsto h^s(q(t))$ (see the figure).
We call the transformation $h^s$ a symmetry (transformation) of a mechanical system iff
$$ S[h^s\circ q] = S[q], $$
for all $s$ and curves $q$. The action is then said to be invariant under the symmetry transformation.
Suppose we have found a symmetry and its representation in terms of a family of invariant transformations. Associated to that symmetry there is a quantity that is conserved during the dynamical evolution of the system. The correspondence between a symmetry and its conservation law is established by a famous result due to Emmy Noether:
Noether theorem (1915-18): Let the action $S[q]$ be invariant under the transformation $q \to h^s(q)$. Let $q(t)$ be a solution curve of the system (a solution of the Euler-Lagrange equations). Then, the quantity
$$ I(q, \dot q) \equiv \sum_{i=1}^f p_i\,\frac{d}{ds}\,(h^s(q))_i \qquad (4.15) $$
is dynamically conserved: $d_t I(q, \dot q) = 0$.
Here, $p_i = \partial_{\dot q_i} L\big|_{(q, \dot q) = (h^s(q),\, \dot h^s(q))}$ is the generalized momentum of the $i$th coordinate, and the quantity $I(q, \dot q)$ is known as the Noether momentum.
The proof of Noether's theorem is straightforward: the requirement of action invariance under the transformation $h^s$ is equivalent to the condition $d_s S[h^s(q)] = 0$ for all values of $s$. We now consider the action
$$ S[q] = \int_{t_0}^{t_f} dt\; L(q, \dot q, t), $$
a solution curve samples during a time interval $[t_0, t_f]$. Here, $t_f$ is a free variable, and the transformed value $h^s(q(t_f))$ may differ from $q(t_f)$. We next explore what the condition of action invariance tells us about the Lagrangian of the theory (summation over the repeated index $i$ is implied):
$$ 0 \stackrel{!}{=} d_s \int_{t_0}^{t_f} dt\; L(h^s(q), \dot h^s(q), t) = $$
$$ = \int_{t_0}^{t_f} dt\,\Big[ \partial_{q_i} L\big|_{q = h^s(q)}\, d_s (h^s(q))_i + \partial_{\dot q_i} L\big|_{\dot q = \dot h^s(q)}\, d_s (\dot h^s(q))_i \Big] = $$
$$ = \int_{t_0}^{t_f} dt\; \big( \partial_{q_i} - d_t\,\partial_{\dot q_i} \big) L\big|_{(h^s(q), \dot h^s(q))}\, d_s (h^s(q))_i \;+\; \partial_{\dot q_i} L\big|_{\dot q = \dot h^s(q)}\, d_s (h^s(q))_i\,\Big|_{t_0}^{t_f}. $$
Now, our reference curve is a solution curve, which means that the integrand in the last line vanishes. We are thus led to the conclusion that $\partial_{\dot q_i} L\big|_{\dot q = \dot h^s(q)}\, d_s (h^s(q))_i\,\big|_{t_0}^{t_f} = 0$ for all values $t_f$, and this is equivalent to the constancy of the Noether momentum.
It is often convenient to evaluate both the symmetry transformation $h^s$ and the Noether momentum $I = I(s)$ at infinitesimal values of the control parameter $s$. In practice, this means that we fix a pair $(q, \dot q)$ comprising coordinate and velocity of a solution curve. We then consider an infinitesimal transformation $h^\epsilon(q)$, where $\epsilon$ is infinitesimally small. The Noether momentum is then obtained as
$$ I(q, \dot q) = \sum_{i=1}^f p_i\,\frac{d}{d\epsilon}\Big|_{\epsilon = 0}\,(h^\epsilon(q))_i. \qquad (4.16) $$
It is usually convenient to work in symmetry adjusted coordinates, i.e. in coordinates where the transformation $h^s$ assumes its simplest possible form. These are coordinates where $h^s$ acts by translation in one coordinate direction, that is $(h^s(q))_i = q_i + s\,\delta_{ij}$, where $j$ is the affected coordinate direction. Coordinates adjusted to a symmetry are cyclic (think about this point), and the Noether momentum,
$$ I = p_j, $$
collapses to the standard momentum of the symmetry coordinate.
4.2.5 Examples
In this section, we will discuss two prominent examples of symmetries and their conservation
laws.
Translational invariance: conservation of momentum
Consider a mechanical system that is invariant under translation in some direction. Without loss of generality, we choose cartesian coordinates $q = (q_1, q_2, q_3)$ in such a way that the coordinate $q_1$ parameterizes the invariant direction. The symmetry transformation $h^s$ then translates in this direction: $h^s(q) = q + s\, e_1$, or $h^s(q) = (q_1 + s, q_2, q_3)$. With $d_s h^s(q) = e_1$, we readily obtain
$$ I(q, \dot q) = \frac{\partial L}{\partial \dot q_1} = p_1, $$
where $p_1 = m\dot q_1$ is the ordinary cartesian momentum of the particle. We are thus led to the conclusion:
Translational invariance entails the conservation of cartesian momentum.
EXERCISE Generalize the construction above to a system of $N$ particles in the absence of external potentials. The (interaction) potential of the system then depends only on coordinate differences $q_i - q_j$. Show that translation in arbitrary directions is a symmetry of the system and that this implies the conservation of the total momentum $P = \sum_{j=1}^N m_j \dot q_j$.
Rotational invariance: conservation of angular momentum
As a second example, consider a system invariant under rotations around the 3-axis. In cartesian coordinates, the corresponding symmetry transformation is described by
$$ q \mapsto h^\phi(q) \equiv R^{(3)}_\phi\, q, $$
where the angle $\phi$ serves as a control parameter (i.e. assumes the role of the parameter $s$ above), and
$$ R^{(3)}_\phi \equiv \begin{pmatrix} \cos\phi & \sin\phi & 0 \\ -\sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{pmatrix} $$
is an $O(3)$ matrix generating rotations by the angle $\phi$ around the 3-axis. The infinitesimal variant of a rotation is described by
$$ R^{(3)}_\epsilon = \begin{pmatrix} 1 & \epsilon & 0 \\ -\epsilon & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} + O(\epsilon^2). $$
From this representation, we obtain the Noether momentum as
$$ I(q, \dot q) = \frac{\partial L}{\partial \dot q_i}\,\frac{d}{d\epsilon}\Big|_{\epsilon = 0}\,(R^{(3)}_\epsilon q)_i = m(\dot q_1 q_2 - \dot q_2 q_1). $$
This is (the negative of) the 3-component of the particle's angular momentum. Since the choice of the 3-axis as a reference axis was arbitrary, we have established the result:
Rotational invariance around a symmetry axis entails the conservation of the angular momentum component along that axis.
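The conservation law is easy to observe numerically. The sketch below (our illustration; potential and initial data are arbitrary) integrates the motion in the rotationally invariant potential $U(q) = \frac{k}{2}|q|^2$ and monitors $m(q_1\dot q_2 - q_2\dot q_1)$ along the trajectory:

```python
import numpy as np
from scipy.integrate import solve_ivp

m, k = 1.0, 3.0  # mass and stiffness of a rotationally invariant 2d oscillator

def rhs(t, y):
    q, v = y[:2], y[2:]
    return np.concatenate([v, -k * q / m])    # Newton: m qddot = -grad U

sol = solve_ivp(rhs, (0.0, 10.0), [1.0, 0.0, 0.3, 1.2], rtol=1e-10, atol=1e-12)
q1, q2, v1, v2 = sol.y
l3 = m * (q1 * v2 - q2 * v1)                  # 3-component of angular momentum
print(l3.max() - l3.min())                    # ~ 0: conserved along the trajectory
```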
Now, we have argued above that in cases with symmetries, one should employ adjusted coordinates. Presently, this means coordinates that are organized around the 3-axis: spherical, or cylindrical coordinates. Choosing cylindrical coordinates for definiteness, the Lagrangian assumes the form
$$ L(r, \varphi, z, \dot r, \dot\varphi, \dot z) = \frac{m}{2}(\dot r^2 + r^2\dot\varphi^2 + \dot z^2) - U(r, z), \qquad (4.17) $$
where we noted that the problem of a rotationally invariant system does not depend on $\varphi$. The symmetry transformation now simply acts by translation, $h^s(r, \varphi, z) = (r, \varphi + s, z)$, and the Noether momentum is given by
$$ I \equiv l_3 = \frac{\partial L}{\partial \dot\varphi} = m r^2\dot\varphi. \qquad (4.18) $$
We recognize this as the cylindrical coordinate representation of the 3-component of angular momentum.
EXAMPLE Let us briefly extend the discussion above to re-derive the symmetry optimized representation of a particle in a central potential. We choose cylindrical coordinates, such that (a) the force center lies in the origin, and (b) at time $t = 0$, both $q$ and $\dot q$ lie in the $z = 0$ plane. Under these conditions, the motion will stay in the $z = 0$ plane (exercise: show this from the Euler-Lagrange equations), and the cylindrical coordinates $(r, \varphi, z)$ can be reduced to the polar coordinates $(r, \varphi)$ of the invariant $z = 0$ plane.
The reduced Lagrangian reads
$$ L(r, \dot r, \dot\varphi) = \frac{m}{2}(\dot r^2 + r^2\dot\varphi^2) - U(r). $$
From our discussion above, we know that $l \equiv l_3 = m r^2\dot\varphi$ is a constant, and this enables us to express the angular velocity $\dot\varphi = l/(m r^2)$ in terms of the radial coordinate. This leads us to the effective Lagrangian of the radial coordinate,
$$ L(r, \dot r) = \frac{m}{2}\dot r^2 - \frac{l^2}{2m r^2} - U(r). $$
(A word of caution: the naive substitution of $\dot\varphi = l/(mr^2)$ into the Lagrangian would produce the centrifugal term with the opposite sign; the safe route is to eliminate $\dot\varphi$ in the radial equation of motion, $m\ddot r = m r\dot\varphi^2 - \partial_r U$, which is reproduced by the effective Lagrangian above.) The solution of its Euler-Lagrange equations,
$$ m\ddot r = -\partial_r U + \frac{l^2}{m r^3}, $$
has been discussed in section *
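The reduction can be cross-checked numerically (our sketch; the potential $U(r) = \frac{k}{2}r^2$ and all parameters are arbitrary): the radius extracted from the full two-dimensional motion coincides with the solution of the effective radial equation.

```python
import numpy as np
from scipy.integrate import solve_ivp

m, k = 1.0, 1.0                                    # U(r) = k r^2 / 2
q0, v0 = np.array([1.0, 0.0]), np.array([0.2, 1.1])
l3 = m * (q0[0]*v0[1] - q0[1]*v0[0])               # conserved angular momentum

full = solve_ivp(lambda t, y: [*y[2:], *(-k * np.asarray(y[:2]) / m)],
                 (0.0, 5.0), [*q0, *v0], rtol=1e-10, atol=1e-12)
r_full = np.hypot(full.y[0], full.y[1])

def radial(t, y):                                  # m rddot = -dU/dr + l^2/(m r^3)
    r, rdot = y
    return [rdot, (-k*r + l3**2 / (m * r**3)) / m]

r0 = np.linalg.norm(q0)
red = solve_ivp(radial, (0.0, 5.0), [r0, (q0 @ v0) / r0],
                t_eval=full.t, rtol=1e-10, atol=1e-12)
print(np.max(np.abs(red.y[0] - r_full)))           # ~ 0: the two descriptions agree
```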
Chapter 5
Hamiltonian mechanics
The Lagrangian approach has introduced a whole new degree of flexibility into mechanical theory building. Still, there is room for further development. To see how, notice that in the last chapter we have consistently characterized the state of a mechanical system in terms of the $2f$ variables $(q_1, \ldots, q_f, \dot q_1, \ldots, \dot q_f)$, the generalized coordinates and velocities. It is clear that we need $2f$ variables to specify the state of a mechanical system. But are coordinates and velocities necessarily the best variables? The answer is: often yes, but not always. In our discussion of symmetries in section 4.2.3 above, we have argued that in problems with symmetries, one should work with variables $q_i$ that transform in the simplest possible way (i.e. additively) under symmetry transformations. If so, the corresponding momentum $p_i = \partial_{\dot q_i} L$ is conserved. Now, if it were possible to express the velocities uniquely in terms of coordinates and momenta, $\dot q = \dot q(q, p)$, we would be in possession of an alternative set of variables $(q, p)$ such that in a situation with symmetries a fraction of our variables stays constant. And that's the simplest type of dynamics a variable can show.
In this chapter, we show that the reformulation of mechanics in terms of coordinates and momenta as independent variables is an option. The resulting description is called Hamiltonian mechanics. Important advantages of this new approach include:
- It exhibits a maximal degree of flexibility in the choice of problem adjusted coordinates.
- The variables $(q, p)$ live in a mathematical space, so called phase space, that carries a high degree of mathematical structure (much more than an ordinary vector space.) This structure turns out to be of invaluable use in the solution of complex problems. Relatedly,
- Hamiltonian theory is the method of choice in the solution of advanced mechanical problems. For example, the theory of (conservative) chaotic systems is almost exclusively formulated in the Hamiltonian approach.
- Hamiltonian mechanics is the gateway into quantum mechanics. Virtually all concepts introduced below have a direct quantum mechanical extension.
However, this does not mean that Hamiltonian mechanics is 'better' than the Lagrangian theory. The Hamiltonian approach has its specific advantages. However, in some cases it may be preferable to stay on the level of the Lagrangian theory. At any rate, Lagrangian and Hamiltonian mechanics form a pair that represents the conceptual basis of many modern theories of physics, not just mechanics. For example, electrodynamics (classical and quantum) can be formulated in terms of a Lagrangian and a Hamiltonian theory, and these formulations are strikingly powerful.
5.1 Hamiltonian mechanics
The Lagrangian $L = L(q, \dot q, t)$ is a function of coordinates and velocities. What we are after is a function $H = H(q, p, t)$ of coordinates and momenta, where the momenta and velocities are related to each other as $p_i = \partial_{\dot q_i} L = p_i(q, \dot q, t)$. Once in possession of the new function $H$, there must be a way to express the key information carriers of the theory, the Euler-Lagrange equations, in the language of the new variables. In mathematics, there exist different ways of formulating variable changes of this type, and one of them is known as
5.1.1 Legendre transform
Reducing the notation to a necessary minimum, the task formulated above amounts to the following problem: given a function $f(x)$ (here, $f$ assumes the role of $L$ and $x$ represents $\dot q_i$), consider the variable $z \equiv \partial_x f(x) = z(x)$ ($z$ represents the momentum $p_i$.) If this equation is invertible, i.e. if a representation $x = x(z)$ exists, find a partner function $g(z)$ such that $g$ carries the same amount of information as $f$. The latter condition means that we are looking for a transformation, i.e. a mapping between functions $f(x) \to g(z)$ that can be inverted, $g(z) \to f(x)$, so that the function $f(x)$ can be reconstructed in unique terms from $g(z)$. If this latter condition is met, it must be possible to express any mathematical operation formulated for the function $f$ in terms of a corresponding operation on the function $g$ (cf. the analogous situation with the Fourier transform.)
Let us now try to find a transformation that meets the criteria above. The most obvious guess might be a direct variable substitution: compute $z = \partial_x f(x) = z(x)$, invert to $x = x(z)$, and substitute this into $f$, i.e. $g(z) \stackrel{?}{=} f(x(z))$. This idea goes in the right direction but is not quite good enough. The problem is that in this way information stored in the function $f$ may get lost. To exemplify the situation, consider the function
$$ f(x) = C\exp(x), $$
where $C$ is a constant. Now, $z = \partial_x f(x) = C\exp(x)$, which means $x = \ln(z/C)$. Substitution back into $f$ gets us to
$$ g(z) = C\exp(\ln(z/C)) = z. $$
The function $g$ no longer knows about $C$, so information on this constant has been irretrievably lost. (Knowledge of $g(z)$ is not sufficient to reconstruct $f(x)$.)
However, it turns out that a slight extension of the above idea does the trick. Namely, consider the so-called Legendre transform of $f(x)$,
$$ g(z) \equiv f(x(z)) - z\, x(z). \qquad (5.1) $$
We claim that $g(z)$ defines a transformation of functions. To verify this, we need to show that the function $f(x)$ can be obtained from $g(z)$ by some suitable inverse transformation. It turns out that (up to a harmless sign change) the Legendre transform is self-inverse; just apply it once more and you get back to the function $f$. Indeed, let us define the variable $y \equiv \partial_z g(z) = \partial_x f\big|_{x(z)}\,\partial_z x(z) - z\,\partial_z x(z) - x(z)$. Now, by definition of the function $z(x)$ we have $\partial_x f\big|_{x(z)} = z(x(z)) = z$. Thus, the first two terms cancel, and we have $y(z) = -x(z)$. We next compute the Legendre transform of $g(z)$:
$$ h(y) = f(x(z(y))) - x(z(y))\, z(y) - z(y)\, y. $$
Evaluating the relation $y(z) = -x(z)$ on the specific argument $z(y)$, we get $y = y(z(y)) = -x(z(y))$, i.e. $f(x(z(y))) = f(-y)$, and the last two terms in the definition of $h(y)$ are seen to cancel. We thus obtain
$$ h(y) = f(-y); $$
the Legendre transform is (almost) self-inverse, and this means that by passing from $f(x)$ to $g(z)$ no information has been lost.
The abstract definition of the Legendre transform with its nested variable dependencies can be somewhat confusing. However, when applied to concrete functions, the transform is actually easy to handle. Let us illustrate this on the example considered above: with $f(x) = C\exp(x)$ and $x = \ln(z/C)$, we get
$$ g(z) = C\exp(\ln(z/C)) - \ln(z/C)\, z = z\,(1 - \ln(z/C)). $$
(Notice that $g(z)$ looks very different from $f(x)$, but this need not worry us.) Now, let us apply the transform once more: define $y = \partial_z g(z) = -\ln(z/C)$, or $z(y) = C\exp(-y)$. This gives
$$ h(y) = C\exp(-y)(1 + y) - C\exp(-y)\, y = C\exp(-y), $$
in accordance with the general result $h(y) = f(-y)$.
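The worked example can also be checked numerically; the following sketch (our addition) implements (5.1) for $f(x) = Ce^x$ and confirms $h(y) = f(-y)$ on a grid of test points:

```python
import numpy as np

C = 2.0
f = lambda x: C * np.exp(x)
# z = f'(x) = C e^x  =>  x(z) = ln(z/C); Legendre transform, Eq. (5.1):
g = lambda z: f(np.log(z / C)) - z * np.log(z / C)   # = z (1 - ln(z/C))
# once more: y = g'(z) = -ln(z/C)  =>  z(y) = C e^{-y}
h = lambda y: g(C * np.exp(-y)) - y * C * np.exp(-y)

y = np.linspace(-1.0, 1.0, 11)
print(np.allclose(h(y), f(-y)))                      # True: h(y) = f(-y)
```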
The Legendre transform of multivariate functions $f(\{x_i\})$ is obtained by application of the rule to all variables: compute $z_i \equiv \partial_{x_i} f(x)$. Next construct the inverse, $x = x(z)$. Then define
$$ g(z) = f(x(z)) - \sum_i z_i\, x_i(z). \qquad (5.2) $$
5.1.2 Hamiltonian function
Definition
We now apply the construction above to compute the Legendre transform of the Lagrange function. We thus apply Eq. (5.2) to the function $L(q, \dot q, t)$ and identify variables as $x \leftrightarrow \dot q$ and $z \leftrightarrow p$. (Notice that the coordinates, $q$, themselves play the role of spectator variables. The Legendre transform is in the variables $\dot q$!) Let us now formulate the few steps it takes to pass to the Legendre transform of the Lagrange function:
1. Compute the $f$ variables
$$ p_i = \partial_{\dot q_i} L(q, \dot q, t). \qquad (5.3) $$
2. Invert these relations to obtain $\dot q_i = \dot q_i(q, p, t)$.
3. Define a function
$$ H(q, p, t) \equiv \sum_i p_i\,\dot q_i(q, p, t) - L(q, \dot q(q, p, t), t). \qquad (5.4) $$
Technically, this is the negative of $L$'s Legendre transform. Of course, this function carries the same information as the Legendre transform itself.
The function $H$ is known as the Hamiltonian function of the system. It is usually called 'Hamiltonian' for short, much like the Lagrangian function is called 'Lagrangian'. The notation above emphasizes the variable dependencies of the quantities $\dot q_i$, $p_i$, etc. One usually keeps the notation more compact, e.g. by writing $H = \sum_i p_i\dot q_i - L$. However, it is important to remember that in both $L$ and $\sum_i \dot q_i p_i$, the variable $\dot q_i$ has to be expressed as a function of $q$ and $p$. It doesn't make sense to write down formulae such as $H = \ldots$ if the right hand side contains $\dot q_i$'s as fundamental variables!
Hamilton equations
Now we need to do something with the Hamiltonian function. Our goal will be to transcribe the Euler-Lagrange equations to equations defined in terms of the Hamiltonian. The hope then is that these new equations contain operational advantages over the Euler-Lagrange equations.
Now, the Euler-Lagrange equations probe changes (derivatives) of the function $L$. It will therefore be a good idea to explore what happens if we ask similar questions of the function $H$. In doing so, we should keep in mind that our ultimate goal is to obtain unambiguous information on the evolution of particle trajectories. Let us then compute (summation over $j$ implied)
$$ \partial_{q_i} H(q, p, t) = p_j\,\partial_{q_i}\dot q_j(q, p) - \partial_{q_i} L(q, \dot q(q, p)) - \partial_{\dot q_j} L(q, \dot q(q, p))\,\partial_{q_i}\dot q_j(q, p). $$
The first and the third term cancel, because $\partial_{\dot q_j} L = p_j$. Now, if the curve $q(t)$ is a solution curve, the middle term equals $d_t\,\partial_{\dot q_i} L = d_t p_i$. We thus conclude that the $(q, p)$ representation of solution curves must obey the equation
$$ \dot p_i = -\partial_{q_i} H(q, p, t). $$
Similarly,
$$ \partial_{p_i} H(q, p, t) = \dot q_i(q, p) + p_j\,\partial_{p_i}\dot q_j(q, p, t) - \partial_{\dot q_j} L(q, \dot q(q, p))\,\partial_{p_i}\dot q_j(q, p, t). $$
The last two terms cancel, and we have
$$ \dot q_i = \partial_{p_i} H(q, p, t). $$
Finally, let us compute the partial time derivative $\partial_t H$:[1]
$$ \partial_t H(q, p, t) = p_i\,\partial_t\dot q_i(q, p, t) - \partial_{\dot q_i} L(q, \dot q(q, p, t), t)\,\partial_t\dot q_i(q, p, t) - \partial_t L(q, \dot q, t). $$
Again, two terms cancel and we have
$$ \partial_t H(q, p, t) = -\partial_t L(q, \dot q(q, p, t), t), \qquad (5.5) $$
where the time derivative on the r.h.s. acts only on the third argument of $L(\,\cdot\,, \,\cdot\,, t)$.
Let us summarize where we are: The solution curve $q(t)$ of a mechanical system defines a $2f$-dimensional curve $(q(t), \dot q(t))$. Eq. (5.3) then defines a $2f$-dimensional curve $(q(t), p(t))$. The invertibility of the relation $p \leftrightarrow \dot q$ implies that either representation faithfully describes the curve. The derivation above then shows that:
The solution curves $(q, p)(t)$ of mechanical problems obey the so-called Hamilton equations
$$ \dot q_i = \partial_{p_i} H, \qquad \dot p_i = -\partial_{q_i} H. \qquad (5.6) $$
Some readers may worry about the number of variables employed in the description of curves: in principle, a curve is uniquely represented by the $f$ variables $q(t)$. However, we have now decided to represent curves in terms of the $2f$ variables $(q, p)$. Is there some redundancy here? No there isn't, and the reason can be understood in different ways. First notice that in Lagrangian mechanics, solution curves are obtained from second order differential equations in time. (The Euler-Lagrange equations contain terms $\ddot q_i$.) In contrast, the Hamilton equations are first order in time.² The price to be paid for this reduction is the introduction of a second set of variables, $p$. Another way to understand the rationale behind the introduction of $2f$ variables is by noting that the solution of the (2nd order differential) Euler-Lagrange equations
¹ Here it is important to be very clear about what we are doing: in the present context, $\dot q$ is a variable in the Lagrange function. (We could also name it $v$ or $z$, or whatever.) It is considered a free variable (no time dependence), unless the transformation $\dot q_i = \dot q_i(q, p, t)$ becomes explicitly time dependent. Remembering the origin of this transformation, we see that this may happen if the function $L(q, \dot q, t)$ contains explicit time dependence. This happens, e.g., in the case of rheonomic constraints, or time dependent potentials.
² Ordinary differential equations of $n$th order can be transformed to systems of $n$ ordinary differential equations of first order. The passage $L \to H$ is an example of such an order reduction.
requires the specification of $2f$ boundary conditions. These can be the $2f$ conditions stored in the specification $q(t_0) = q_0$ and $q(t_1) = q_1$ of an initial and a final point,³ or the specification $q(t_0) = q_0$ and $\dot q(t_0) = v_0$ of an initial configuration and velocity. In contrast,⁴ the Hamilton equations are uniquely solvable once an initial configuration $(q(t_0), p(t_0)) = (q_0, p_0)$ has been specified. Either way, we need $2f$ boundary conditions, and hence $2f$ variables to uniquely specify the state of a mechanical system.
EXAMPLE To make the new approach more concrete, let us formulate the Hamilton equations of a point particle in cartesian coordinates. The Lagrangian is given by

$$L(q, \dot q, t) = \frac{m}{2} \dot q^2 - U(q, t),$$

where we allow for time dependence of the potential. Thus

$$p_i = m \dot q_i,$$

which is inverted as $\dot q_i(p) = p_i/m$. This leads to

$$H(q, p, t) = \sum_{i=1}^3 \frac{p_i p_i}{m} - L(q, p/m, t) = \frac{p^2}{2m} + U(q, t).$$

The Hamilton equations assume the form

$$\dot q = \frac{p}{m}, \qquad \dot p = -\partial_q U(q, t).$$

Further, $\partial_t H = -\partial_t L = \partial_t U$ inherits its time dependence from the time dependence of the potential. The two Hamilton equations are recognized as a reformulation of the Newton equation. (Substituting the time derivative of the first equation, $\ddot q = \dot p/m$, into the second, we obtain the Newton equation $m \ddot q = -\partial_q U$.)

Notice that the Hamiltonian function $H = \frac{p^2}{2m} + U(q, t)$ equals the energy of the particle! In the next section, we will discuss this connection in more general terms.
EXAMPLE Let us now solve these equations for the very elementary example of the one-dimensional harmonic oscillator. In this case, $V(q) = \frac{m\omega^2}{2} q^2$ and the Hamilton equations assume the form

$$\dot q = \frac{p}{m}, \qquad \dot p = -m\omega^2 q.$$

For a given initial configuration $x(0) = (q(0), p(0))$, these equations afford the unique solution

$$q(t) = q(0) \cos\omega t + \frac{p(0)}{m\omega} \sin\omega t, \qquad p(t) = p(0) \cos\omega t - m\omega\, q(0) \sin\omega t. \tag{5.7}$$
³ But notice that conditions of this type do not always uniquely specify a solution (think of examples!).
⁴ Formally, this follows from a result of the theory of ordinary differential equations: a system of $n$ first order differential equations $\dot x_i = f_i(x_1, \dots, x_n)$, $i = 1, \dots, n$, affords a unique solution once $n$ initial conditions $x_i(0)$ have been specified.
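As a numerical cross-check of Eq. (5.7), one may integrate the Hamilton equations directly. The following sketch (a standard Runge-Kutta integrator; all parameter values are arbitrary choices of ours) reproduces the closed-form solution and, anticipating the next subsection, confirms that $H$ stays constant along the trajectory:

```python
import numpy as np

m, omega = 1.0, 2.0

def X_H(x):
    """Right hand side of the Hamilton equations for x = (q, p)."""
    q, p = x
    return np.array([p / m, -m * omega**2 * q])

def rk4_step(x, dt):
    """One classical 4th-order Runge-Kutta step."""
    k1 = X_H(x)
    k2 = X_H(x + 0.5 * dt * k1)
    k3 = X_H(x + 0.5 * dt * k2)
    k4 = X_H(x + dt * k3)
    return x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0

H = lambda q, p: p**2 / (2 * m) + 0.5 * m * omega**2 * q**2

x, dt, T = np.array([1.0, 0.0]), 1e-3, 1.0     # (q(0), p(0)) = (1, 0)
for _ in range(int(T / dt)):
    x = rk4_step(x, dt)

# Compare with Eq. (5.7) at t = T, and check conservation of H:
q_ex = np.cos(omega * T)                        # q(0) cos(wt) + p(0)/(mw) sin(wt)
p_ex = -m * omega * np.sin(omega * T)           # p(0) cos(wt) - mw q(0) sin(wt)
print(np.allclose(x, [q_ex, p_ex]), abs(H(*x) - H(1.0, 0.0)) < 1e-9)  # True True
```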
Physical meaning of the Hamilton function

The Lagrangian $L = T - U$ did not carry immediate physical meaning; its essential role was that of a generator of the Euler-Lagrange equations. However, with the Hamiltonian the situation is different. To understand its physical interpretation, let us compute the full time derivative $d_t H = d_t H(q(t), p(t), t)$ of the Hamiltonian evaluated on a solution curve $(q(t), p(t))$ (i.e. a solution of the Hamilton equations (5.6)):

$$d_t H(q, p, t) = \sum_{i=1}^f \left( \frac{\partial H(q, p, t)}{\partial q_i}\, \dot q_i + \frac{\partial H(q, p, t)}{\partial p_i}\, \dot p_i \right) + \frac{\partial H(q, p, t)}{\partial t} \stackrel{(5.6)}{=} \frac{\partial H(q, p, t)}{\partial t}.$$
Now, above we have seen (cf. Eq. (5.5)) that the Hamiltonian inherits its explicit time dependence, $\partial_t H = -\partial_t L$, from the explicit time dependence of the Lagrangian. In problems without explicitly time dependent potentials (they are called autonomous problems), the Hamiltonian stays constant on solution curves.
We have thus found that the function $H(q, p)$ (the missing time argument indicates that we are now looking at an autonomous situation) defines a constant of motion of extremely general nature. What is its physical meaning? To answer this question, we consider a general $N$-body system in an arbitrary potential. (This is the most general conservative autonomous setup one may imagine.) In cartesian coordinates, its Lagrangian is given by

$$L(q, \dot q) = \sum_{j=1}^N \frac{m_j}{2} \dot q_j^2 - U(q_1, \dots, q_N),$$

where $U$ is the $N$-body potential and $m_j$ the mass of the $j$th particle. Proceeding as in the first example above, we readily find

$$H(q, p) = \sum_{j=1}^N \frac{p_j^2}{2m_j} + U(q_1, \dots, q_N). \tag{5.8}$$

The expression on the r.h.s. we recognize as the energy of the system. We are thus led to the following important conclusion:

The Hamiltonian $H(q, p)$ of an autonomous problem (a problem with time-independent potentials) is dynamically conserved: $H(q(t), p(t)) = E = \text{const.}$ on solution curves $(q(t), p(t))$.

However, our discussion above implies another very important corollary. It has shown that

The Hamiltonian $H = T + U$ is given by the sum of potential energy, $U(q, t)$, and kinetic energy, $T(q, p, t)$, expressed as a function of coordinates and momenta.
We have established this connection in cartesian coordinates. However, the coordinate invariance of the theory implies that this identification holds in general coordinate systems. Also, the identification $H = T + U$ did not rely on the time-independence of the potential; it extends to potentials $U(q, t)$.

INFO In principle, the identification $H = T + U$ provides an option to access the Hamiltonian without reference to the Lagrangian. In cases where the expression $T = T(q, p, t)$ of the kinetic energy in terms of the coordinates and momenta of the theory is known, we may simply add the two, $H(q, p, t) = T(q, p, t) + U(q, t)$. In such circumstances, there is no need to compute the Hamiltonian by the route $L(q, \dot q, t) \xrightarrow{\text{Legendre}} H(q, p, t)$. Often, however, the identification of the momenta of the theory is not so obvious, and one is better advised to proceed via the Legendre transform.
5.2 Phase space
In this section, we will introduce phase space as the basic arena of Hamiltonian mechanics. We will start from an innocent definition of phase space as a $2f$-dimensional coordinate space comprising configuration space coordinates, $q$, and momenta, $p$. However, as we go along, we will realize that this working definition is but the tip of an iceberg: the coordinate spaces of Hamiltonian mechanics are endowed with very deep mathematical structure, and this is one way of telling why Hamiltonian mechanics is so powerful.
5.2.1 Phase space and structure of the Hamilton equations

To keep the notation simple, we will suppress reference to an optional explicit time dependence of the Hamilton function throughout this section, i.e. we just write $H(\dots)$ instead of $H(\dots, t)$.

The Hamilton equations are coupled equations for coordinates $q$ and momenta $p$. As was argued above, the coordinate pair $(q, p)$ contains sufficient information to uniquely encode the state of a mechanical system. This suggests considering the $2f$-component objects

$$x \equiv \begin{pmatrix} q \\ p \end{pmatrix} \tag{5.9}$$

as the new fundamental variables of the theory. Given a coordinate system, we may think of $x$ as an element of a $2f$-dimensional vector space. However, all that has been said in section 4.1.3 about the curves of mechanical systems and their coordinate representations carries over to the present context: in abstract terms, the pair (configuration space points, momenta) defines a mathematical space known as phase space, $\Gamma$. The formulation of coordinate invariant descriptions of phase space is a subject beyond the scope of the present course. However, locally it is always possible to parameterize $\Gamma$ in terms of coordinates,⁵ and to describe

⁵ Which means that $\Gamma$ has the status of a $2f$-dimensional manifold.
its elements through $2f$-component objects such as (5.9). This means that, locally, phase space can be identified with a $2f$-dimensional vector space. However, different coordinate representations correspond to different coordinate vector spaces, and sometimes it is important to recall that the identification (phase space) $\simeq$ (vector space) may fail globally.⁶

Keeping the above words of caution in mind, we will temporarily identify phase space with a $2f$-dimensional vector space. We will soon see that this space contains a very particular sort of scalar product. This additional structure makes phase space a mathematical object far more interesting than an ordinary vector space. Let us start with a rewriting of the Hamilton equations. The identification $x_i = q_i$ and $x_{f+i} = p_i$, $i = 1, \dots, f$, enables us to rewrite Eq. (5.6) as
$$\dot x_i = \partial_{x_{i+f}} H, \qquad \dot x_{i+f} = -\partial_{x_i} H,$$

where $i = 1, \dots, f$. We can express this in a more compact form as

$$\dot x_i = I_{ij}\, \partial_{x_j} H, \qquad i = 1, \dots, 2f, \tag{5.10}$$

or, in vectorial notation,

$$\dot x = I\, \partial_x H. \tag{5.11}$$

Here, the $(2f) \times (2f)$ matrix $I$ is defined as

$$I = \begin{pmatrix} 0 & \mathbb{1}_f \\ -\mathbb{1}_f & 0 \end{pmatrix}, \tag{5.12}$$

and $\mathbb{1}_f$ is the $f$-dimensional unit matrix. The matrix $I$ is sometimes called the symplectic unity.
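In code, Eq. (5.11) is a single matrix-vector product. A small numerical sketch (numpy assumed; the oscillator Hamiltonian from above is reused as a test case of our choosing):

```python
import numpy as np

f = 1                                            # degrees of freedom
I = np.block([[np.zeros((f, f)), np.eye(f)],
              [-np.eye(f), np.zeros((f, f))]])   # symplectic unity, Eq. (5.12)

m, omega = 1.0, 2.0
def grad_H(x):
    """Phase space gradient of H = p^2/(2m) + m omega^2 q^2/2 at x = (q, p)."""
    q, p = x
    return np.array([m * omega**2 * q, p / m])

x = np.array([0.3, -1.1])
print(I @ grad_H(x))    # [p/m, -m omega^2 q]: exactly the Hamilton equations
```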
5.2.2 Hamiltonian flow

For any $x$, the quantity $I \partial_x H(x)$, with components $I_{ij}\, \partial_{x_j} H(x, t)$, is a vector in phase space. This means that the prescription

$$X_H : \Gamma \to \mathbb{R}^{2f}, \qquad x \mapsto I \partial_x H, \tag{5.13}$$

defines a vector field in phase space, the so-called Hamiltonian vector field. The form of the Hamilton equations,

$$\dot x = X_H, \tag{5.14}$$

suggests an interpretation in terms of the flow lines of the Hamiltonian vector field: at each point in phase space, the field $X_H$ defines a vector $X_H(x)$. The Hamilton equations state

⁶ For example, there are mechanical systems whose phase space is a two-sphere, and the sphere is not a vector space. (However, locally, it can be represented in terms of vector-space coordinates.)
Figure 5.1: Visualization of the Hamiltonian vector field and its flow lines.
that the solution curve $x(t)$ is tangent to that vector, $\dot x(t) = X_H(x)$. One may visualize the situation in terms of the streamlines of a fluid. Within that analogy, the value $X_H(x)$ is a measure of the local current flow. If one injected a drop of colored ink into the fluid, its trace would be a representation of the curve $x(t)$.

For a general vector field $v : U \to \mathbb{R}^N$, $x \mapsto v(x)$, where $U \subset \mathbb{R}^N$, one may define a parameter-dependent map

$$\Phi : U \times \mathbb{R} \to U, \qquad (x, t) \mapsto \Phi(x, t), \tag{5.15}$$

through the condition $\partial_t \Phi(x, t) \stackrel{!}{=} v(\Phi(x, t))$. The map $\Phi$ is called the flow of the vector field $v$. Specifically, the flow of the Hamiltonian vector field $X_H$ is defined by the prescription

$$\Phi(x, t) \equiv x(t), \tag{5.16}$$

where $x(t)$ is a curve with initial condition $x(t = 0) = x$. Equation (5.16) is a proper definition because for a given $x(0) \equiv x$, the solution $x(t)$ is uniquely defined. Further, $\partial_t \Phi(x, t) = d_t x(t) = X_H(x(t))$ satisfies the flow condition. The map $\Phi(x, t)$ is called the Hamiltonian flow.
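The flow concept can be made tangible numerically. The sketch below (scipy's initial value solver is assumed available; tolerances and parameters are our own choices) implements $\Phi$ for the oscillator field and verifies the characteristic composition property $\Phi(x, s + t) = \Phi(\Phi(x, s), t)$ of autonomous flows:

```python
import numpy as np
from scipy.integrate import solve_ivp

m, omega = 1.0, 2.0

def X_H(t, x):
    """Hamiltonian vector field of the 1d oscillator."""
    q, p = x
    return [p / m, -m * omega**2 * q]

def Phi(x, t):
    """Hamiltonian flow: integrate the vector field for time t, starting at x."""
    sol = solve_ivp(X_H, (0.0, t), x, rtol=1e-10, atol=1e-12)
    return sol.y[:, -1]

x0 = np.array([1.0, 0.0])
s, t = 0.4, 0.7
print(np.allclose(Phi(x0, s + t), Phi(Phi(x0, s), t)))   # True
```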
There is a corollary to the uniqueness of the curve x(t) emanating from a given initial
condition x(0):
Phase space curves never cross.
For if they did, the crossing point would be the initial point of the two out-going stretches
of the curve, and this would be in contradiction to the uniqueness of the solution for a given
initial conguration. There is, however, one subtle caveat to the statement above: although
phase space curves do not cross, they may actually touch each other in common terminal
points. (For an example, see the discussion in section 5.2.3 below.)
5.2.3 Phase space portraits

Graphical representations of phase flow, so-called phase space portraits, are very useful in the qualitative characterization of mechanical motion. The only problem with this is that phase flow is defined in $2f$-dimensional space. For $f > 1$, we cannot draw this. What can be drawn, however, is the projection of a curve onto any of the $2f(2f-1)/2$ coordinate planes $(x_i, x_j)$, $1 \le i < j \le 2f$. See the figure for the example of a coordinate plane projection out of 3-dimensional space. However, it takes some experience to work with those representations and we will not discuss them any further.

However, the concept of phase space portraits becomes truly useful in problems with one degree of freedom, $f = 1$. In this case, the conservation of energy on individual curves, $H(q, p) = E = \text{const.}$, enables us to compute an explicit representation $q = q(p, E)$, or $p = p(q, E)$, just by solving the relation $H(q, p) = E$ for $q$ or $p$. This does not solve the mechanical problem under consideration. (For that one would need to know the time dependence at which the curves are traversed.) Nonetheless, it provides a far-reaching characterization of the motion: for any phase space point $(q, p)$ we can construct the curve running through it without solving differential equations. The projection of these curves onto configuration space, $(q(t), p(t)) \mapsto q(t)$, tells us something about the evolution of the configuration space coordinate.
Let us illustrate these statements on an example where a closed solution is possible, the harmonic oscillator. From the Hamiltonian $H(q, p) = \frac{p^2}{2m} + \frac{m\omega^2}{2} q^2 = E$, we find

$$p = p(q, E) = \pm\sqrt{2m\left(E - \frac{m\omega^2}{2} q^2\right)}.$$

Examples of these (ellipsoidal) curves are shown in the figure. It is straightforward to show (do it!) that they are tangential to the Hamiltonian vector field

$$X_H = \begin{pmatrix} m^{-1} p \\ -m\omega^2 q \end{pmatrix}.$$

At the turning points, $p = 0$, the energy of the curve, $E = \frac{m\omega^2}{2} q^2$, equals the potential energy.
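Portraits of this kind are easy to produce numerically: one simply draws the level curves of $H$. A minimal matplotlib sketch (our own illustration; the energy values are arbitrary):

```python
import numpy as np
import matplotlib.pyplot as plt

m, omega = 1.0, 1.0
def H(q, p):
    return p**2 / (2 * m) + 0.5 * m * omega**2 * q**2

# Level curves H(q, p) = E are exactly the phase space curves of the text.
q, p = np.meshgrid(np.linspace(-2, 2, 400), np.linspace(-2, 2, 400))
plt.contour(q, p, H(q, p), levels=[0.25, 0.5, 1.0, 1.5])
plt.xlabel('q'); plt.ylabel('p')
plt.show()
```

Replacing $H$ by the Hamiltonian of any other one-degree-of-freedom problem, e.g. one with the potential of Fig. 5.2 below, produces the corresponding portrait without solving any differential equation.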
Now, the harmonic oscillator may be an example a little too basic to illustrate the usefulness of phase space portraits. Instead, consider the problem defined by the potential shown in Fig. 5.2. The Hamilton equations can now no longer be solved in closed form. However, we may still compute curves by solution of $H(p, q) = E \Rightarrow p = p(q, E)$, and the results look as shown qualitatively for three different values of $E$ in the figure. These curves give a far-reaching impression of the motion of a point particle in the potential landscape. Specifically, notice the threshold energy $E_2$ corresponding to the local potential maximum. At first sight, the corresponding phase space curve appears to violate the criterion of the non-existence of crossing points (cf. the critical point at $p = 0$ beneath the potential maximum). In fact, however, this isn't a crossing point. Rather, the two incoming curves terminate at this point: coming from the left or right, it takes infinite time for the particle to climb the potential maximum.
EXERCISE Show that a trajectory at energy $E = V^* \equiv V(q^*)$, equal to the potential maximum at $q = q^*$, needs infinite time to climb the potential hill. To this end, use that the potential maximum at $q^*$ can be locally modelled by an inverted harmonic oscillator potential, $V(q) \simeq V^* - C(q - q^*)^2$, where $C$ is a positive constant. Consider the local approximation of the Hamilton function, $H = \frac{p^2}{2m} - C(q - q^*)^2 + \text{const.}$, and formulate and solve the corresponding equations of motion (hint: compare to the standard oscillator discussed above). Compute the time it takes to reach the maximum if $E = V(q^*)$.
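As a hint towards the first step, the linearized equation of motion can be solved symbolically. A hedged sketch (sympy assumed; we set $q^* = 0$ for brevity):

```python
import sympy as sp

t = sp.symbols('t', real=True)
m, C = sp.symbols('m C', positive=True)
q = sp.Function('q')

# Inverted oscillator: m q'' = -V'(q) = 2 C q   (local approximation, q* = 0)
sol = sp.dsolve(sp.Eq(m * q(t).diff(t, 2), 2 * C * q(t)), q(t))
print(sol)    # q(t) = C1*exp(-sqrt(2C/m)*t) + C2*exp(+sqrt(2C/m)*t)
```

A trajectory with $E = V^*$ approaches the maximum along the decaying branch, $q(t) \propto e^{-\sqrt{2C/m}\, t}$, and hence reaches it only in the limit $t \to \infty$.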
Eventually it will rest at the maximum in an unstable equilibrium position. Similarly, the outgoing trajectories begin at this point. Starting at zero velocity (corresponding to zero initial momentum), it takes infinite time to accelerate and move downhill. In this sense, the curves touching (not crossing!) in the critical point represent idealizations that are never actually realized. Trajectories of this type are called separatrices. Separatrices are important in that they separate phase space into regions of qualitatively different types of motion (presently, curves that make it over the hill and those which do not).

EXERCISE Take a few minutes to familiarize yourself with the phase space portrait and to learn how to read such representations. Sketch the periodic potential $V(q) = \cos(2\pi q/a)$, $a = \text{const.}$, and its phase space portrait.
5.2.4 Poisson brackets

Consider a (differentiable) function in phase space, $g : \Gamma \to \mathbb{R}$, $x \mapsto g(x)$. A natural question to ask is how $g(x)$ will evolve as $x \equiv x(0) \to x(t)$ traces out a phase space trajectory. In other words, we ask for the time evolution of $g(x(t))$ with initial condition $g(x(0)) = g(x)$. The answer is obtained by straightforward time differentiation:
The answer is obtained by straightforward time dierentiation:
d
dt
g(x(t), t) =
d
dt
g(q(t), p(t), t) =
f

i=1
_
g
q
i
dq
i
dt
+
g
p
i
dp
i
dt
_
(q(t),p(t),t)
+
t
g(q(t), p(t), t) =
=
f

i=1
_
g
q
i
H
p
i

g
p
i
H
q
i
_
(q(t),p(t),t)
+
t
g(q(t), p(t), t),
where the terms $\partial_t g$ account for an optional explicit time dependence of $g$.

Figure 5.2: Phase space portrait of a nontrivial one-dimensional potential. Notice the critical point at the threshold energy $E_2$.

The characteristic combination of derivatives governing this expression appears frequently in Hamiltonian
mechanics. It motivates the introduction of a shorthand notation,
$$\{f, g\} \equiv \sum_{i=1}^f \left( \frac{\partial f}{\partial p_i} \frac{\partial g}{\partial q_i} - \frac{\partial f}{\partial q_i} \frac{\partial g}{\partial p_i} \right). \tag{5.17}$$
This expression is called the Poisson bracket of two phase space functions $f$ and $g$. In the invariant notation of phase space coordinates $x$, it assumes the form

$$\{f, g\} = -(\partial_x f)^T I\, \partial_x g, \tag{5.18}$$

where $I$ is the symplectic unit matrix defined above. (The sign matches the convention of Eq. (5.17).) The time evolution of functions may be concisely expressed in terms of the Poisson bracket of the two functions $H$ and $g$:

$$d_t g = \{H, g\} + \partial_t g. \tag{5.19}$$
Notice that the Poisson bracket operation bears similarity to a scalar product of functions. Let $C$ be the space of smooth (and integrable) functions in phase space. Then

$$\{\cdot\,, \cdot\} : C \times C \to \mathbb{R}, \qquad (f, g) \mapsto \{f, g\}, \tag{5.20}$$

defines a scalar product. This scalar product comes with a number of characteristic algebraic properties:

$$\begin{aligned}
&\{f, g\} = -\{g, f\} && \text{(skew symmetry)},\\
&\{cf + c'f', g\} = c\{f, g\} + c'\{f', g\} && \text{(linearity)},\\
&\{c, g\} = 0, &&\\
&\{ff', g\} = f\{f', g\} + \{f, g\}f' && \text{(product rule)},\\
&\{f, \{g, h\}\} + \{h, \{f, g\}\} + \{g, \{h, f\}\} = 0 && \text{(Jacobi identity)},
\end{aligned}$$

where $c, c' \in \mathbb{R}$ are constants. The first three of these relations are immediate consequences of the definition and the fourth follows from the product rule. The proof of the Jacobi identity amounts to a straightforward if tedious exercise in differentiation.
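The algebraic properties above lend themselves to a symbolic check. A sketch for $f = 1$ (sympy assumed; the test functions are arbitrary choices of ours), using the sign convention of Eq. (5.17):

```python
import sympy as sp

q, p = sp.symbols('q p', real=True)

def pb(f, g):
    """Poisson bracket of Eq. (5.17) for one degree of freedom."""
    return sp.diff(f, p) * sp.diff(g, q) - sp.diff(f, q) * sp.diff(g, p)

f, g, h = q**2 * p, sp.sin(q) + p**3, q * p    # arbitrary test functions

print(sp.simplify(pb(f, g) + pb(g, f)))        # 0 (skew symmetry)
print(sp.simplify(pb(f * g, h) - f * pb(g, h) - pb(f, h) * g))  # 0 (product rule)
jacobi = pb(f, pb(g, h)) + pb(h, pb(f, g)) + pb(g, pb(h, f))
print(sp.simplify(jacobi))                     # 0 (Jacobi identity)
```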
At this point, the Poisson bracket appears to be little more than convenient notation.
However, as we go along, we will see that it encapsulates important information on the
mathematical structure of phase space.
5.2.5 Variational principle

The solutions of the Hamilton equations, $x(t)$, can be interpreted as extremal curves of a certain action functional. This connection will be a gateway to the further development of the theory. Let us consider the set $M = \{x : [t_0, t_1] \to \Gamma \mid q(t_0) = q_0,\ q(t_1) = q_1\}$ of all phase space curves beginning and ending at a common configuration space point, $q_{0,1}$. We next define the local functional

$$S : M \to \mathbb{R}, \qquad x \mapsto S[x] = \int_{t_0}^{t_1} dt \left( \sum_{i=1}^f p_i \dot q_i - H(q, p, t) \right). \tag{5.21}$$

We now claim that the extremal curves of this functional are but the solutions of the Hamilton equations (5.6). To see this, define the function $F(x, \dot x, t) \equiv \sum_{i=1}^f p_i \dot q_i - H(q, p, t)$, in terms of which $S[x] = \int_{t_0}^{t_1} dt\, F(x, \dot x, t)$. Now, according to the general discussion of section 4.1.2, the extremal curves are solutions of the Euler-Lagrange equations

$$\left( d_t\, \partial_{\dot x_i} - \partial_{x_i} \right) F(x, \dot x, t) = 0, \qquad i = 1, \dots, 2f.$$

Evaluating these equations for $i = 1, \dots, f$ and $i = f+1, \dots, 2f$, respectively, we obtain the first and second set of the equations (5.6).
5.3 Canonical transformations

In the previous chapter, we have emphasized the flexibility of the Lagrangian approach in the choice of problem adjusted coordinates: any diffeomorphic transformation $q_i \to q'_i = q'_i(q)$, $i = 1, \dots, f$, leaves the Euler-Lagrange equations form-invariant. Now, in the Hamiltonian formalism, we have twice as many variables, $x_i$, $i = 1, \dots, 2f$. Of course, we may consider restricted transformations, $q_i \to q'_i(q)$, where only the configuration space coordinates are transformed (the transformation of the $p_i$'s then follows). However, things get more interesting when we set out to explore the full freedom of coordinate transformations mixing configuration space coordinates and momenta. This would include, for example, a mapping $q_i \to q'_i \equiv p_i$, $p_i \to p'_i \equiv q_i$ (a diffeomorphism!) that exchanges the roles of coordinates and momenta. Now, this newly gained freedom raises two questions: (i) what might such generalized transformations be good for, and (ii) does every diffeomorphic mapping $x_i \to x'_i$ qualify as a good coordinate transformation? Beginning with the second one, let us address these two questions in turn:

5.3.1 When is a coordinate transformation canonical?

In the literature on variable transformations in Hamiltonian dynamics it is customary to denote the new variables $X \equiv x'$ by capital letters. We will follow this convention.
Following our general paradigm of valuing form invariant equations, we deem a coordinate transformation canonical if the following condition is met:

A diffeomorphism $x \to X(x)$⁷ defines a canonical transformation if there exists a function $H'(X, t)$ such that the representation of solution curves (i.e. solutions of the equations $\dot x = I \partial_x H$) in the new coordinates $X = X(x)$ solves the transformed Hamilton equations

$$\dot X = I\, \partial_X H'(X, t). \tag{5.22}$$

Now, this definition may leave us a bit at a loss: it does not tell how the new Hamiltonian $H'$ relates to the old one, $H$, and in this sense seems to be strangely underdefined. However, before exploring the ways by which canonical transformations can be constructively obtained, let us take a look at possible motivations for switching to new coordinates.
5.3.2 Canonical transformations: why?

Why would one start thinking about generalized coordinate transformations mixing configuration space coordinates and momenta, etc.? In previous sections, we had argued that coordinates should relate to the symmetries of a problem. Symmetries, in turn, generate conservation laws (Noether), i.e. for each symmetry one obtains a quantity, $I$, that is dynamically conserved. Within the framework of phase space dynamics, this means that there is a function $I(x)$ such that $d_t I(x(t)) = 0$, where $x(t)$ is a solution curve. We may think of $I(x)$ as a function that is constant along the Hamiltonian flow lines.
Now, suppose we have $s \le f$ symmetries and, accordingly, $s$ conserved functions $I_i$, $i = 1, \dots, s$. Further assume that we find a canonical transformation to phase space coordinates $x \to X$, where $P = (I_1, \dots, I_s, P_{s+1}, \dots, P_f)$ and $Q = (\phi_1, \dots, \phi_s, Q_{s+1}, \dots, Q_f)$. In words: the first $s$ of the new momenta are the functions $I_i$. Their conjugate configuration space coordinates are called $\phi_i$.⁸ The new Hamiltonian will be some function of the new coordinates and momenta, $H'(\phi_1, \dots, \phi_s, Q_{s+1}, \dots, Q_f, I_1, \dots, I_s, P_{s+1}, \dots, P_f)$. Now let's take a look at the Hamilton equations associated to the variables $(\phi_i, I_i)$:
$$d_t \phi_i = \partial_{I_i} H', \qquad d_t I_i = -\partial_{\phi_i} H' \stackrel{!}{=} 0.$$

The second equation tells us that $H'$ must be independent of the variables $\phi_i$:

The coordinates, $\phi_i$, corresponding to conserved momenta $I_i$ are cyclic coordinates, $\partial_{\phi_i} H' = 0$.
At this point, it should have become clear that coordinate sets containing conserved momenta, $I_i$, and their coordinates, $\phi_i$, are tailored to the optimal formulation of mechanical problems. The power of these coordinates becomes fully evident in the extreme case where $s = f$, i.e. where we have as many conserved quantities as degrees of freedom. In this case

$$H' = H'(I_1, \dots, I_f),$$
and the Hamilton equations assume the form

$$d_t \phi_i = \partial_{I_i} H' \equiv \omega_i, \qquad d_t I_i = -\partial_{\phi_i} H' = 0.$$

These equations can now be trivially solved:⁹

$$\phi_i(t) = \omega_i t + \phi_i(0), \qquad I_i(t) = I_i = \text{const.}$$
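For the harmonic oscillator, such a transformation can be written down in closed form. The sketch below (sympy assumed; the substitution is the standard action-angle transformation of the oscillator, quoted here without derivation) verifies that the transformed Hamiltonian depends on the conserved momentum $I_1$ only, so that $\phi_1$ is cyclic and $\omega_1 = \partial_{I_1} H' = \omega$:

```python
import sympy as sp

m, w, I1, phi = sp.symbols('m omega I_1 phi', positive=True)

# Standard action-angle substitution for the 1d oscillator (our assumption):
q = sp.sqrt(2 * I1 / (m * w)) * sp.sin(phi)
p = sp.sqrt(2 * I1 * m * w) * sp.cos(phi)

H = p**2 / (2 * m) + m * w**2 * q**2 / 2
print(sp.simplify(H))    # I_1*omega: independent of phi, hence phi is cyclic
```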
⁸ It is natural to designate the conserved quantities as momenta: in section 4.2.4 we saw that the conserved quantity associated to the natural coordinate of a symmetry (a coordinate transforming additively under the symmetry operation) is the Noether momentum of that coordinate.
⁹ The problem remains solvable even if $H'(I_1, \dots, I_f, t)$ contains explicit time dependence. In this case, $d_t \phi_i = \partial_{I_i} H'(I_1, \dots, I_f, t) \equiv \omega_i(t)$, where $\omega_i(t)$ is a function of known time dependence. (The time dependence is known because the variables $I_i$ are constant and we are left with the externally imposed dependence on time.) These equations, ordinary first order differential equations in time, are solved by $\phi_i(t) = \int_{t_0}^t dt'\, \omega_i(t')$.
A very natural guess would be $H'(X, t) \stackrel{?}{=} H(x(X), t)$, i.e. the old Hamiltonian expressed in new coordinates. However, we will see that this ansatz can be too restrictive.

Progress with this situation is made once we remember that solution curves $x$ are extrema of an action functional $S[x]$ (cf. section 5.2.5). This functional is the $x$-representation of an abstract functional. The same object can be expressed in terms of $X$-coordinates (cf. our discussion in section 4.1.3) as $S'[X] = S[x]$. The form of the functional $S'$ follows from the condition that the $X$-representation of the curve be a solution of the equations (5.22). This is equivalent to the condition

$$S'[X] = \int_{t_0}^{t_1} dt \left( \sum_{i=1}^f P_i \dot Q_i - H'(Q, P, t) \right) + \text{const.},$$

where const. is a contribution whose significance we will discuss in a moment. Indeed, the variation of $S'[X]$ generates Euler-Lagrange equations which, as we saw in section 5.2.5, are just the Hamilton equations (5.22).