
MT2 Lectures

Dr Marcella Bona

March 28, 2018

Contents
1 Basics
1.1 Revision

2 Complex Numbers
2.1 Revision
2.2 Trigonometric Identities
2.3 Functions and equations of complex variables
2.4 Integrating Complex Functions

3 Hyperbolic functions

4 Coordinate systems
4.1 Partial Derivatives
4.2 Full Differentials
4.3 Double Integrals
4.3.1 Example
4.4 Triple Integrals

5 Multiple Integrals
5.1 Double Integrals
5.1.1 Exercises
5.2 Triple Integrals
5.2.1 Methods for Integration
5.2.2 Example
5.3 Some Examples of Integrals in Physics

6 Vectors
6.1 Operations and Properties
6.2 Products between vectors
6.2.1 Vectors with complex components
6.3 Triple Products
6.3.1 Coplanarity
6.4 Rotation of coordinate systems
6.5 Vector Equations of Lines and Planes
6.5.1 Vector Equation of a Line
6.5.2 Vector Equation of a Plane

7 Vector Calculus
7.1 Scalar and Vector fields
7.2 Gradient, Divergence and Curl
7.2.1 Gradient
7.2.2 Divergence
7.2.3 Curl
7.2.4 Nabla Operator
7.3 Properties of Gradient, Divergence and Curl
7.3.1 Properties of the Gradient
7.3.2 Properties of the Divergence
7.3.3 Properties of the Curl
7.4 Second Order Derivations

8 Line and Surface Integrals
8.1 Line Integrals
8.2 Surface Integrals
8.2.1 Vector Area of Surfaces
8.2.2 Some Physics Examples

9 Vector Calculus II
9.1 Conservative Fields
9.2 Solenoidal Fields
9.3 Divergence Theorem
9.4 Green’s Theorem
9.5 Stokes’ theorem
9.6 Physics Applications of Divergence and Stokes’ Theorems

10 Matrices
10.1 Operations on Matrices
10.1.1 Properties of Matrix Multiplication
10.2 Special Matrices
10.3 More Matrix Operations
10.4 Determinant of a Square Matrix
10.5 Trace of a Square Matrix
10.6 More on Special Matrices
10.7 Inversion of a Matrix
10.8 More Properties of Matrix Operations
10.9 More Special matrices
10.10 Properties of the Determinant

11 Systems of Simultaneous Linear Equations
11.1 Special Case: Square Matrices
11.2 Methods to Solve Simultaneous Linear Equations
11.2.1 Direct Inversions
11.2.2 Cramer’s Rule
11.2.3 Gaussian Elimination

12 Eigenvalues and Eigenvectors
12.1 Introduction
12.2 Definition of Eigenvalues and Eigenvectors
12.3 Normalised Eigenvectors
12.4 Matrix Diagonalisation
12.5 Special Cases
12.6 Matrix Diagonalisation
12.7 Special Cases

13 Differential Equations
13.1 First Order Differential Equations
13.2 Linear First Order Differential Equations
13.2.1 Homogeneous Differential Equations
13.2.2 Non-Homogeneous Differential Equations
13.3 Second Order Differential Equations
13.3.1 Direct Integration
13.3.2 Second Order Linear Differential Equation With Constant Coefficients
13.3.3 Homogeneous Cases
13.3.4 Examples for the Homogeneous Cases
13.3.5 Non-Homogeneous Cases

1 Basics
1.1 Revision
Things that we will not mention but students are assumed to know:

1. square of a binomial:

(a + b)² = a² + 2ab + b²

2. difference of squares:

(a + b)(a − b) = a² − b²

3. real and imaginary parts of complex numbers can be indicated as:

ℜ(z) = Re(z) = ℜ(x + i y) = x


ℑ(z) = Im(z) = ℑ(x + i y) = y

4. Notable angles:

(a) θ = 0° or 0 rad: cos(0) = 1, sin(0) = 0;

(b) θ = 30° or π/6 rad: cos(π/6) = √3/2, sin(π/6) = 1/2;

(c) θ = 45° or π/4 rad: cos(π/4) = √2/2, sin(π/4) = √2/2;

(d) θ = 60° or π/3 rad: cos(π/3) = 1/2, sin(π/3) = √3/2;

(e) θ = 90° or π/2 rad: cos(π/2) = 0, sin(π/2) = 1.


2 Complex Numbers
2.1 Revision
This is material already covered in MT1. Here is a small revision of the basic concepts that are considered
fundamental for MT2 as well. These notes can possibly add new points of view to the topic.
The algebraic form of a complex number is:

c = a +ib

with a and b being real numbers, also defined as:

a = Re(c)

b = Im(c)

i.e. the real and imaginary parts respectively.
Some basic operations:

1. Sum: given c_1 = a_1 + i b_1 and c_2 = a_2 + i b_2:

c_tot = c_1 + c_2 = (a_1 + a_2) + i (b_1 + b_2)

2. Difference: given c_1 = a_1 + i b_1 and c_2 = a_2 + i b_2:

c_tot = c_1 − c_2 = (a_1 − a_2) + i (b_1 − b_2)

3. Product: given c_1 = a_1 + i b_1 and c_2 = a_2 + i b_2:

c_tot = c_1 · c_2 = (a_1 + i b_1) · (a_2 + i b_2) = (a_1 a_2 − b_1 b_2) + i (b_1 a_2 + a_1 b_2)

4. Complex Conjugate:
c∗ = a − i b
We can also calculate the product between a complex number and its complex conjugate:

cc ∗ = (a + i b) · (a − i b)

This can be calculated remembering the notable expression:

(d + e)(d − e) = d 2 − e 2

so that we get:
cc ∗ = (a + i b) · (a − i b) = a 2 − (i b)2 = a 2 + b 2

5. Modulus:

|c| = √(cc*) = √(a² + b²) ≥ 0
where we can also say that:

|c| = 0 if and only if cc ∗ = 0 and thus a = b = 0

6. Division: given c_1 = a_1 + i b_1 and c_2 = a_2 + i b_2:

c_tot = c_1/c_2 = (a_1 + i b_1)/(a_2 + i b_2)

we aim at removing the imaginary part from the denominator so that we can then separate real and
imaginary parts of the result:

c_1/c_2 = (a_1 + i b_1)/(a_2 + i b_2) · (a_2 − i b_2)/(a_2 − i b_2) = [(a_1 a_2 + b_1 b_2) + i (b_1 a_2 − a_1 b_2)] / (a_2² + b_2²)

        = (a_1 a_2 + b_1 b_2)/(a_2² + b_2²) + i (b_1 a_2 − a_1 b_2)/(a_2² + b_2²)

Quadratic equations:
Complex numbers can be solutions of quadratic equations:

ax 2 + bx + c = 0

whose solutions are:

x_± = (−b ± √(b² − 4ac)) / (2a)

The discriminant ∆ = b² − 4ac can be either:

• b 2 − 4ac > 0: there are two real solutions


• b 2 − 4ac = 0: there is one real solution with multiplicity equal to 2
• b 2 − 4ac < 0: there are two complex solutions
x_± = −b/(2a) ± i √(4ac − b²)/(2a)

where we just extracted a −1 from the square root, obtaining an i, so that 4ac − b² is certainly
positive.

Remember: once we know the solutions to the equation, we can write it as:

ax² + bx + c = a (x − x_+)(x − x_−) = 0

where, as above, x_± are the two solutions. Vice versa, if we have an equation written as:

(x − a)(x − b) = 0

Figure 1: Argand diagram.

Figure 2: Sum and subtraction on an Argand diagram.

where a and b are two parameters, it is directly evident that the two solutions are x 1 = a and x 2 = b (hence
there is no need to perform the calculation and apply the formula, as it has been seen in past MT2 exams).
Argand diagrams:
Complex numbers can be represented on a two-dimensional plane and these representations are called Ar-
gand Diagrams like shown in Figure 1.
Figure 2 shows also how to represent sum and subtraction of two complex numbers in an Argand diagrams.
Another representation of a complex number is the trigonometric one: since we have:

a = r cos θ
b = r sin θ

we can define:

c = a + i b = r (cos θ + i sin θ)

where of course we can also define the inverse relations:

r = √(a² + b²)
θ = arctan(b/a)

The θ range can be chosen to be θ ∈ [0, 2π] or alternatively θ ∈ [−π, π]. We are going to consider θ ∈ [0, 2π]
consistently in these lectures. The arctangent only returns values in (−π/2, π/2), so we need to think about how to translate
from one range to the other:

Figure 3: Argument of a complex number.

 ³ ´
b
arctan if a > 0 and b > 0 [I quadrant]
³a´



arctan ba + π

if a < 0 and b > 0 [II quadrant]


θ= ³ ´
b
arctan +π if a < 0 and b < 0 [III quadrant]
³a´



b

arctan
a + 2π if a > 0 and b < 0 [IV quadrant]

See Figure 3 for the Argand diagram showing the above cases.
Let’s now get to the exponential form of the complex numbers starting from the Taylor series for e^x:

e^x = 1 + x + x²/2! + x³/3! + ... + x^n/n! + ...

If we now substitute x → iθ we get:

e^{iθ} = 1 + (iθ) + (iθ)²/2! + (iθ)³/3! + ... + (iθ)^n/n! + ...
       = (1 − θ²/2! + ...) + i (θ − θ³/3! + ...)

where we have separated the real components from the imaginary ones. The real part is:

Re(e^{iθ}) = 1 − θ²/2! + ... + (−1)^n θ^{2n}/(2n)! + ...

which corresponds to the Taylor expansion of cos θ. Similarly for the imaginary part, where:

Im(e^{iθ}) = θ − θ³/3! + ... + (−1)^n θ^{2n+1}/(2n+1)! + ...
is the Taylor expansion of sin θ. Thus we have found the alternative form which exploits: e i θ = cos θ + i sin θ,
giving
c = a + i b = r (cos θ + i sin θ) = r e i θ
Now we can reconsider the product and the division of complex numbers as they are going to be straightfor-
ward in this formalism:

• Product:

c_tot = c_1 · c_2 = r_1 r_2 e^{i(θ_1 + θ_2)}

• Division:

c_tot = c_1/c_2 = (r_1/r_2) e^{i(θ_1 − θ_2)}

De Moivre’s Theorem:

(cos θ + i sin θ)n = cos(nθ) + i sin(nθ)


which is demonstrated by rewriting the left-hand side (LHS) as:

(e^{iθ})^n = e^{inθ}

which gives the right-hand side (RHS) of the previous relation.


Two more operations:

• Powers of complex numbers: it is best to use the exponential form (and then use De Moivre’s theorem
  if needed):

c^n = r^n e^{inθ}

Example: powers of c = 1 + i:

c  = 1 + i = √2 e^{iπ/4}
c² = 2 e^{iπ/2}
c³ = 2√2 e^{i3π/4}
c⁴ = 4 e^{iπ}

• Roots of complex numbers: it is best again to use the exponential form:

c^{1/n} = r^{1/n} e^{i(θ + 2πN)/n}

where N = 0, 1, 2, . . . , (n − 1). Thus we are going to obtain n roots. So in the case n = 2,

c^{1/2} = √r e^{i(θ + 2πN)/2}

thus getting two roots (we are calling them r_1 and r_2):

r_1 = √r e^{iθ/2}                              for N = 0
r_2 = √r e^{i(θ + 2π)/2} = √r e^{iθ/2} e^{iπ}  for N = 1

where the e^{iπ} factor in r_2 rotates the previous result r_1 counterclockwise by 180°. Remember that:

c = r e^{iθ} = r e^{i(θ + 2π)}

since e^{i2π} = 1.
Example: square root of c = i = e^{iπ/2}:

r_1 = e^{iπ/4}                        for N = 0
r_2 = e^{i(π/2 + 2π)/2} = e^{i5π/4}   for N = 1
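As a quick cross-check of the powers and roots above (an illustrative addition, not part of the original notes), the same formulas can be evaluated numerically with Python’s built-in cmath module; the variable names are illustrative only:

```python
import cmath, math

c = 1j                                # c = i = e^{i pi/2}
r, theta = abs(c), cmath.phase(c)     # modulus and argument of c
n = 2
roots = [r**(1/n) * cmath.exp(1j*(theta + 2*math.pi*N)/n) for N in range(n)]
for N, w in enumerate(roots):
    # each root raised to the n-th power should reproduce c (up to rounding)
    print(N, w, w**n)
```

The two printed roots correspond to r_1 = e^{iπ/4} and r_2 = e^{i5π/4} found above.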

2.2 Trigonometric Identities


This is material already covered in MT1. Here we have again a small revision.
We can derive two trigonometric identities, exploiting what we have defined so far (and in order to avoid
having to remember them by heart):

1. cos 2θ and sin 2θ. Starting from e i 2θ , we can either develop the square:

e i 2θ = (e i θ )2 = (cos θ + i sin θ)2 = cos2 θ − sin2 θ + 2i cos θ sin θ

or we can apply directly the definition:

e i 2θ = cos 2θ + i sin 2θ

the two expressions have to be identical and we can equate the two real parts and the two imaginary
parts: (
cos 2θ = cos2 θ − sin2 θ
sin 2θ = 2 cos θ sin θ

2. cos(θ1 + θ2 ) and sin(θ1 + θ2 ). This can be addressed in a similar way: firstly

e i (θ1 +θ2 ) = e i θ1 · e i θ2
= (cos θ1 + i sin θ1 )(cos θ2 + i sin θ2 )
= cos θ1 cos θ2 − sin θ1 sin θ2 + i (cos θ1 sin θ2 + sin θ1 cos θ2 )

and then, starting again from the initial expression and applying directly the definition:

e i (θ1 +θ2 ) = cos(θ1 + θ2 ) + i sin(θ1 + θ2 )

We can then equate again the two real parts and the two imaginary parts:
(
cos(θ1 + θ2 ) = cos θ1 cos θ2 − sin θ1 sin θ2
sin(θ1 + θ2 ) = cos θ1 sin θ2 + sin θ1 cos θ2
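A minimal numerical sanity check of the two identities just derived (an illustrative addition, assuming a standard Python interpreter):

```python
import math

theta1, theta2 = 0.7, 1.9   # arbitrary test angles
cos_sum = math.cos(theta1)*math.cos(theta2) - math.sin(theta1)*math.sin(theta2)
sin_sum = math.cos(theta1)*math.sin(theta2) + math.sin(theta1)*math.cos(theta2)
# both differences should be at the level of floating-point rounding
print(abs(math.cos(theta1 + theta2) - cos_sum))
print(abs(math.sin(theta1 + theta2) - sin_sum))
```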

2.3 Functions and equations of complex variables


As we introduce functions of complex variables, we define them as z:

z = x +i y

while before we were considering c = a +i b which are complex constants. As a complex number is essentially a
two-dimensional object, if we have an equation involving a complex function, the solution to this equation will
be a set of points in the complex plane, a curve in a two-dimensional plane. The way to proceed in this case is
to substitute z = x + i y and solve the real and imaginary parts. The best way is to work some examples.
Examples:

1. From function f (z) = |z|, we define the equation:

|z| = R

where R is a constant. The set of points solving this equation is the circle of radius R:

√(x² + y²) = R

2. From the function g(z) = arg(z), where the argument defines the phase of the complex number, i.e. θ in our
   notation, we can define the equation:

arg(z) = θ_0

where θ_0 is a constant. The set of points solving this equation is a straight line of slope tan θ_0. As a matter
of fact:

arctan(y/x) = θ_0

we can apply tan to both sides:

y/x = tan θ_0

giving y = x tan θ_0, which is the equation of a straight line going through the origin.

3. Solve now the equation:

|(z − c)/(z + c)|² = 1

We can substitute the algebraic expression of z and c:

|((x + i y) − (a + i b)) / ((x + i y) + (a + i b))|² = 1

and rearrange to group the real and the imaginary parts in the numerator and in the denominator:

|((x − a) + i (y − b)) / ((x + a) + i (y + b))|² = 1

now we can calculate the moduli of both the numerator and the denominator (√(Re² + Im²)):

[ √((x − a)² + (y − b)²) / √((x + a)² + (y + b)²) ]² = 1

so doing the working:

((x − a)² + (y − b)²) / ((x + a)² + (y + b)²) = 1

(x − a)² + (y − b)² = (x + a)² + (y + b)²

x² + a² − 2ax + y² + b² − 2by = x² + a² + 2ax + y² + b² + 2by

one obtains:

xa + yb = 0

which is again the equation of a straight line passing through the origin:

y = −(a/b) x

and with slope −a/b.

4. Solve the equation:

α(z² + (z*)²) + 2βzz* = 1

with the following definitions of α and β:

α = (1/4)(a⁻² − b⁻²)
β = (1/4)(a⁻² + b⁻²)

We can substitute the algebraic expression of z without substituting α and β just yet:

α(x² − y² + 2ixy + x² − y² − 2ixy) + 2β(x² + y²) = 1

2α(x² − y²) + 2β(x² + y²) = 1

We can now substitute α and β:

(1/2)(a⁻² − b⁻²)(x² − y²) + (1/2)(a⁻² + b⁻²)(x² + y²) = 1

(1/2)(x²/a² − y²/a² − x²/b² + y²/b²) + (1/2)(x²/a² + y²/a² + x²/b² + y²/b²) = 1

So we are left with:

x²/a² + y²/b² = 1

which is the equation of an ellipse.
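The algebra of example 4 can also be checked symbolically; the short SymPy sketch below (an illustrative addition, assuming SymPy is installed) expands α(z² + (z*)²) + 2βzz* and should return the ellipse expression:

```python
import sympy as sp

x, y, a, b = sp.symbols('x y a b', real=True, positive=True)
z = x + sp.I*y
alpha = sp.Rational(1, 4)*(a**-2 - b**-2)
beta = sp.Rational(1, 4)*(a**-2 + b**-2)
lhs = alpha*(z**2 + sp.conjugate(z)**2) + 2*beta*z*sp.conjugate(z)
# expect an expression equivalent to x**2/a**2 + y**2/b**2
print(sp.simplify(sp.expand(lhs)))
```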

2.4 Integrating Complex Functions
Complex functions of real variables can be integrated very simply just by separating the real and imaginary
parts of the function and integrating them separately. Given a complex function Z (t ) of a real variable t , we
can integrate over the real variable and the complex nature of the function affects us only as we actually need
to solve two integrals:
∫ Z(t) dt = ∫ [Re(Z(t)) + i Im(Z(t))] dt = ∫ Re(Z(t)) dt + i ∫ Im(Z(t)) dt

Note that it is a different story if we need to integrate over a complex variable, as we need to go to two
dimensions and perform a line integral. We will see this later in the module. We are going to address just a few
examples which are particularly relevant in physics and statistics.
Examples:

1. solve the definite integral:

I = ∫_0^{2π} e^{ikx} dx

We start substituting the trigonometric form of the exponential:

I = ∫_0^{2π} (cos(kx) + i sin(kx)) dx
  = [ sin(kx)/k − i cos(kx)/k ]_0^{2π} = 0     for k a non-zero integer

We can write this result in a more elegant way as:

∫_0^{2π} e^{ikx} dx = 0 for k ≠ 0 and 2π for k = 0, i.e. ∫_0^{2π} e^{ikx} dx = 2π δ_{0k}

where we have used the Kronecker delta, which is defined as:

δ_{ij} = 0 for i ≠ j,  1 for i = j
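A quick numerical check of this result (an illustrative sketch, assuming NumPy and SciPy are available): integrating the real and imaginary parts separately, as described above, gives ≈ 2π for k = 0 and ≈ 0 for non-zero integer k.

```python
import numpy as np
from scipy.integrate import quad

def integral(k):
    # integrate Re and Im of e^{ikx} separately over [0, 2*pi]
    re, _ = quad(lambda x: np.cos(k*x), 0, 2*np.pi)
    im, _ = quad(lambda x: np.sin(k*x), 0, 2*np.pi)
    return re + 1j*im

for k in (0, 1, 3):
    print(k, integral(k))
```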

2. solve the definite integral:

∫_{−∞}^{+∞} e^{−x²/2 + ikx} dx

In this case we use a trick to get to an expression easier to integrate. We consider the exponent and, to get
to a nice expression, we can add and subtract the term k²/2:

−x²/2 + ikx = −(1/2)(x² − 2ikx − k² + k²) = −(1/2)[(x − ik)² + k²]

So we can go back to our integral:

I = e^{−k²/2} ∫_{−∞}^{+∞} e^{−(x − ik)²/2} dx

where we have taken out the factor that does not depend on x. Now we can substitute the variable:

u = x − ik,    du = dx

and obtain a notable integral:

I = e^{−k²/2} ∫_{−∞}^{+∞} e^{−u²/2} du

It is the integral of a Gaussian function and from MT1 we should remember that the area of a Gaussian
with unit width is √(2π). So our integral is solved:

I = e^{−k²/2} ∫_{−∞}^{+∞} e^{−u²/2} du = √(2π) e^{−k²/2}

However, as we are revising, let’s perform the calculation of the integral of the Gaussian:

G = ∫_{−∞}^{+∞} e^{−x²/2} dx

We can use the trick of starting from this other integral which will prove easier to solve:

G_{2D} = ∬_{ℝ²} e^{−(x² + y²)/2} dx dy

there are two directions we want to pursue:

(a) this G_{2D} can be rewritten as:

G_{2D} = ∫_{−∞}^{+∞} e^{−x²/2} dx · ∫_{−∞}^{+∞} e^{−y²/2} dy = ( ∫_{−∞}^{+∞} e^{−x²/2} dx )² = G²

(b) this G_{2D} can be solved using the polar coordinates:

G_{2D} = ∫_0^{2π} ∫_0^{+∞} e^{−r²/2} r dr dθ = 2π [ −e^{−r²/2} ]_0^{+∞} = 2π

Thus putting together the two results above:

∴ G² = 2π,   G = √(2π)

This √(2π) factor appears in the Gaussian normalisation if we want the distribution to have unit area, as
is useful in statistics for example:

Gaussian = (1/√(2π)) e^{−x²/2}
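Again as an illustrative numerical check (assuming SciPy is available), both the Gaussian area and the full result of example 2 can be reproduced with a one-dimensional quadrature:

```python
import numpy as np
from scipy.integrate import quad

G, _ = quad(lambda x: np.exp(-x**2/2), -np.inf, np.inf)
print(G, np.sqrt(2*np.pi))            # both ~2.5066

k = 1.3                               # arbitrary test value of k
# real part of the integrand of example 2 (the imaginary part is odd and integrates to 0)
I, _ = quad(lambda x: np.exp(-x**2/2)*np.cos(k*x), -np.inf, np.inf)
print(I, np.sqrt(2*np.pi)*np.exp(-k**2/2))
```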

3 Hyperbolic functions
This is material already covered in MT1. Here we have again a small revision.
Hyperbolic functions are similar to trigonometric functions, but while the points (cos x, sin x) form a circle
with a unit radius, the points (cosh x, sinh x) form the right half of the equilateral hyperbola.

cosh(x) = cos(i x)
sinh(x) = −i sin(i x)

We can also define them through an exponential form, by starting from the expressions:

cos(θ) + i sin(θ) = e^{iθ}
cos(θ) − i sin(θ) = e^{−iθ}

Summing or subtracting the two above expressions, we get:

cos(θ) = (1/2)(e^{iθ} + e^{−iθ})
sin(θ) = −(i/2)(e^{iθ} − e^{−iθ})

Now we can apply these definitions by substituting θ = i x:

cosh(x) = (1/2)(e^{−x} + e^{x})
sinh(x) = −(1/2)(e^{−x} − e^{x}) = (1/2)(e^{x} − e^{−x})

From these two expressions it is easy to see that cosh(x) is an even function, while sinh(x) is an odd function.
We can of course also define the hyperbolic tangent:

tanh(x) = sinh(x)/cosh(x)

Figure 4: Equilateral hyperbola.

which is again an odd function.


Properties:
From their symmetry properties, we can say the following:

cosh(−x) = cosh(x)
sinh(−x) = − sinh(x)
tanh(−x) = − tanh(x)
cosh2 (x) − sinh2 (x) = 1

where the last one derives from the equation of the equilateral hyperbola (see Figure 4):

x2 − y 2 = 1

The distributions of these functions are shown in Figure 5.

Figure 5: Hyperbolic functions as drawn in Mathematica with the following commands: Plot[Cosh[x], {x, -4, 4}],
Plot[Sinh[x], {x, -4, 4}], Plot[Tanh[x], {x, -4, 4}]
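As a small numerical complement to the plots (not part of the original notes), the defining relations and the identity cosh²x − sinh²x = 1 can be checked with the Python standard library:

```python
import math, cmath

x = 0.8   # arbitrary test value
print(math.cosh(x), cmath.cos(1j*x).real)         # cosh(x) = cos(ix)
print(math.sinh(x), (-1j*cmath.sin(1j*x)).real)   # sinh(x) = -i sin(ix)
print(math.cosh(x)**2 - math.sinh(x)**2)          # ~1
```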

The hyperbolic cosine is a very common shape in the natural world: it corresponds for example to the shape
assumed by a chain hanging from its two ends under its own weight. It could be confused with a
parabola, but it is indeed a hyperbolic cosine. It is very much used in architecture for arches (see for example
most of the work of architect Antoni Gaudí in Barcelona). The equation used is:

y = a cosh(x/a) + b

and it can be seen in Figure 6. The a parameter controls the opening of the shape while b just controls the
height with respect to the y axis.

Figure 6: Catenary.

4 Coordinate systems
Different coordinates can be appropriate in different cases depending on the symmetry properties of the prob-
lem that needs to be addressed.

1. Cartesian: it is the most common one and the one that we always use to visualise the behaviour of the
functions we are studying. It can be two-dimensional (left in Figure 7) or three-dimensional (right in
Figure 7), or N-dimensional with N any positive integer. Of course when dealing with N > 3 visualisation
is not possible.

Figure 7: Cartesian coordinate system in two dimensions (left) and in three dimensions (right).

2. Polar system: in two dimensions, see Figure 8. The Cartesian coordinates are defined as functions of the polar
   coordinates as follows:

x = r cos ϕ
y = r sin ϕ

or, inverting the relations, we can have the polar coordinates as functions of the Cartesian ones:

r = √(x² + y²)
ϕ = arctan(y/x)

Functions can also be defined with respect to polar coordinates. We can picture them in the polar coor-
dinate system itself but it is usually easier to draw them in the standard Cartesian system.
For example, consider the function:
r = f (ϕ) = 2 sin ϕ

Figure 8: Polar coordinate system in two dimensions.

Figure 9: Function f (ϕ) = 2 sin ϕ in the Cartesian plane (right). On the left, you see the point-by-point study of
the function.

Figure 10: For comparison, a reminder of the function y = 2 sin x in the Cartesian plane.

In the (r, ϕ) plane, this function is shown in Figure 9. We can study this function by considering intermediate
points on the function. We can start from the origin:

• ϕ = 0 or ϕ = π:  r = 0, so x = 0, y = 0;

• ϕ = π/4:  r = √2, so x = 1, y = 1;

• ϕ = π/2:  r = 2, so x = 0, y = 2;

• ϕ = 3π/4:  r = √2, so x = −1, y = 1.

4.1 Partial Derivatives


To study a function you need to understand how it behaves while varying the values of the variables. Deriva-
tives represent exactly this variation. However if a function is expressed with respect to one set of variables, we
might still want to study it with respect to another set.
So if I have a function f(r, ϕ) and we want to study it on the Cartesian plane, we can substitute the definition
of the polar coordinates as functions of (x, y) such as:

f(r, ϕ) → f( √(x² + y²), arctan(y/x) ) = g(x, y)

However it is not very practical because, due to the definitions of r and ϕ, g can become very complicated. So
what we can do is to study the function with respect to (x, y) going through (r, ϕ) via partial derivatives such
as:
∂f(r, ϕ)/∂x = (∂f/∂r)(∂r/∂x) + (∂f/∂ϕ)(∂ϕ/∂x)
Of course if you want to study the function you need to obtain all the partial derivatives with respect to all the
variables of the system, in this case both x and y:

∂f(r, ϕ)/∂x = (∂f/∂r)(∂r/∂x) + (∂f/∂ϕ)(∂ϕ/∂x)

∂f(r, ϕ)/∂y = (∂f/∂r)(∂r/∂y) + (∂f/∂ϕ)(∂ϕ/∂y)

or vice-versa if you have a function g (x, y) and you want to study it with respect to r and ϕ:

∂g(x, y)/∂r = (∂g/∂x)(∂x/∂r) + (∂g/∂y)(∂y/∂r)

∂g(x, y)/∂ϕ = (∂g/∂x)(∂x/∂ϕ) + (∂g/∂y)(∂y/∂ϕ)

Once you are considering two specific coordinate systems, the partial derivatives of the variables of one with
respect to the variables of the other are always the same and you can use them any time you need to do such
studies.
Let’s then consider the polar coordinate system we just defined and calculate the partial derivatives of r
and ϕ with respect to x and y. Starting from the definition:

r = √(x² + y²)
ϕ = arctan(y/x)

we derive first r with respect to both x and y:

∂r/∂x = (1/2) · 2x/√(x² + y²) = x/√(x² + y²) = x/r = (r cos ϕ)/r = cos ϕ

∂r/∂y = y/√(x² + y²) = y/r = sin ϕ

and then ϕ with respect to both x and y:

∂ϕ/∂x = (−y/x²) / (1 + (y/x)²) = −y/(x² + y²) = −y/r² = −(sin ϕ)/r

∂ϕ/∂y = (1/x) / (1 + (y/x)²) = x/(x² + y²) = x/r² = (cos ϕ)/r

Now consider the inverse relations to calculate the partial derivatives of x and y with respect to r and ϕ. Start-
ing from the definition:

x = r cos ϕ
y = r sin ϕ

we get:

∂x/∂r = cos ϕ        ∂y/∂r = sin ϕ

∂x/∂ϕ = −r sin ϕ     ∂y/∂ϕ = r cos ϕ

From the two sets of partial derivatives, we are also reminded that:

∂r/∂x ≠ 1/(∂x/∂r)

as the two systems do not have separable variables, but each variable in one system is a function of both
variables in the other system.
So if we now go back to the study of the functions f(r, ϕ) and g(x, y), we can simply substitute the derivatives calculated
above and we get:

∂f(r, ϕ)/∂x = (∂f/∂r)(∂r/∂x) + (∂f/∂ϕ)(∂ϕ/∂x)
            = (∂f/∂r) cos ϕ + (∂f/∂ϕ)(−sin ϕ/r)

∂f(r, ϕ)/∂y = (∂f/∂r)(∂r/∂y) + (∂f/∂ϕ)(∂ϕ/∂y)
            = (∂f/∂r) sin ϕ + (∂f/∂ϕ)(cos ϕ/r)

∂g(x, y)/∂r = (∂g/∂x)(∂x/∂r) + (∂g/∂y)(∂y/∂r)
            = (∂g/∂x) cos ϕ + (∂g/∂y) sin ϕ

∂g(x, y)/∂ϕ = (∂g/∂x)(∂x/∂ϕ) + (∂g/∂y)(∂y/∂ϕ)
            = (∂g/∂x)(−r sin ϕ) + (∂g/∂y)(r cos ϕ)
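These relations can be verified symbolically; the SymPy sketch below (an illustrative addition with the hypothetical test function g = xy, assuming SymPy is installed) compares the chain-rule expression for ∂g/∂x with the direct result:

```python
import sympy as sp

r, phi = sp.symbols('r phi', positive=True)
g = (r*sp.cos(phi))*(r*sp.sin(phi))      # g(x, y) = x*y written in polar coordinates
dg_dr = sp.diff(g, r)
dg_dphi = sp.diff(g, phi)
# chain rule: dg/dx = dg/dr * cos(phi) + dg/dphi * (-sin(phi)/r)
dg_dx_chain = dg_dr*sp.cos(phi) + dg_dphi*(-sp.sin(phi)/r)
# direct result: d(xy)/dx = y = r*sin(phi); the difference should simplify to 0
print(sp.simplify(dg_dx_chain - r*sp.sin(phi)))
```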

4.2 Full Differentials


The full differential (or total or exact differential) considers the complete variation of a function with respect
to all the variables considered. Thermodynamics often needs to refer to exact or partial differential. However

in our case we want to define the full differentials between the two coordinate systems:

dr = (∂r/∂x) dx + (∂r/∂y) dy
dϕ = (∂ϕ/∂x) dx + (∂ϕ/∂y) dy

dx = (∂x/∂r) dr + (∂x/∂ϕ) dϕ
dy = (∂y/∂r) dr + (∂y/∂ϕ) dϕ

4.3 Double Integrals


Mostly we are going to use the different coordinate systems to have an easier solution to integrals. Looking at
the symmetries of the problem (the integrand function or the area of integration) that you need to solve, one
system can present clear advantages with respect to another. Let’s consider double integrals for now:
∬_A f(x, y) dA = ∬_A f(x, y) dx dy

where A is the area over which we wish to integrate. Our infinitesimal area element dA in Cartesian coordinates
is just dxdy and corresponds to a small square area as seen on the left in Figure 11.

Figure 11: Infinitesimal area element dA in Cartesian coordinates (left) and in polar coordinates (right).

If we want to change our coordinate system, particular attention has to be paid on how the infinitesimal
area transforms between the systems. Looking on the right in Figure 11, you see that the description of the
differential area when we think in polar coordinates corresponds to r dr dϕ. Thus we need to transform:

dA = dxdy = r dr dϕ

thus my integral transforms as:


Ï Ï Ï
f (x, y)dA = f (x, y)dxdy = g (r, ϕ)r dr dϕ
A A A0

In general it is not practical to work out on a case-by-case basis how the differential area transforms. However
the transformation we have obtained derives from first principles when doing a coordinate change following
this:
dx dy = |∂(x, y)/∂(r, ϕ)| dr dϕ

where here the vertical bars (| |) indicate the absolute value. In this expression we introduce the concept of
the Jacobian, which is the determinant of the Jacobian matrix J, where J is defined as:

J = ∂(x, y)/∂(r, ϕ)

This writing indicates the matrix of all first-order partial derivatives of the functions (x, y) with respect to the
variables (r, ϕ). So the integral is:

∬_A f(x, y) dA = ∬_A f(x, y) dx dy = ∬_{A'} g(r, ϕ) |∂(x, y)/∂(r, ϕ)| dr dϕ

where in this case |J| = r (or |det J| = r) and we define A' as the new area in the (r, ϕ) plane.
Of course this concept can be made general and we do not need to consider specifically the polar coordi-
nate system. Given variables (u, v) that are given as functions of (x, y) (and vice-versa), we can write that:

dx dy = |∂(x, y)/∂(u, v)| du dv

Thus we will have that:

∬_A f(x, y) dx dy = ∬_{A'} h(u, v) |∂(x, y)/∂(u, v)| du dv
The Jacobian matrix is defined in general as:

J = ( ∂x/∂u   ∂x/∂v )
    ( ∂y/∂u   ∂y/∂v )

and we need to take the determinant of this matrix. The determinant will be explained later in the module, but
it can be simply thought of as a number representing the matrix. In the 2 × 2 case, we calculate it by multiplying
the terms of the first diagonal (from left to right, from up to down) minus the product of the terms left out
(diagonal from right to left, from up to down):

det J = | ∂x/∂u   ∂x/∂v | = (∂x/∂u)(∂y/∂v) − (∂x/∂v)(∂y/∂u)
        | ∂y/∂u   ∂y/∂v |

In our specific case of polar coordinates:


det J = | ∂x/∂r   ∂x/∂ϕ |
        | ∂y/∂r   ∂y/∂ϕ |
      = (∂x/∂r)(∂y/∂ϕ) − (∂x/∂ϕ)(∂y/∂r)
      = cos ϕ (r cos ϕ) − (−r sin ϕ) sin ϕ
      = r cos²ϕ + r sin²ϕ = r

The reason for taking the absolute value of the determinant of the Jacobian is that if the Jacobian is negative,
then the orientation of the region of integration gets flipped. However, for the moment we do not have to think
about the orientation of a region. We will see it later in the module when we consider surface integrals. To talk
about change of variables without having to think about orientations, we make use of the fact that:
Ï Ï
f dA = (− f ) dA
R −R

so we get the same result if we flip the orientation of the region back to the positive orientation and flip the
sign of the Jacobian. To better understand
R1 this we can think about the one-dimensional case: suppose that we
make the substitution u = −x in 0 dx. Then we get:
Z 1 Z x=1 Z u=−1 Z 0
dx
dx = du = (−1) du = 1 du
0 x=0 du u=0 −1

where the integrand (−1) comes from dx/du that we need for the change of variable. If we do not want to think
about inverting the integral limits as shown in the last passage above (easy to do in one-dimension, but more
complicated in higher dimensions), then we do need the absolute value on the derivative (or the Jacobian).
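Before the worked example, here is a short symbolic computation of the polar Jacobian (an illustrative sketch, assuming SymPy is available), reproducing |det J| = r:

```python
import sympy as sp

r, phi = sp.symbols('r phi', positive=True)
x = r*sp.cos(phi)
y = r*sp.sin(phi)
J = sp.Matrix([[sp.diff(x, r), sp.diff(x, phi)],
               [sp.diff(y, r), sp.diff(y, phi)]])
print(sp.simplify(J.det()))   # expect r
```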

4.3.1 Example

One example of integral that can benefit from polar coordinates:


I = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} √(x² + y²) e^{−λ√(x² + y²)} dx dy

where λ is a real and positive constant.

Given the integrand function depends on x² + y², it is natural to use the polar coordinates and thus:

x = r cos θ,   y = r sin θ     with ranges:  r ∈ [0, +∞],  θ ∈ [0, 2π]

where we now consider the whole plane. Our integral becomes (including also the Jacobian):

I = ∫_0^{2π} ∫_0^{+∞} r e^{−λr} r dr dθ = ∫_0^{2π} ∫_0^{+∞} r² e^{−λr} dr dθ

We can immediately integrate over θ as the integrand does not depend on it:

I = 2π ∫_0^{+∞} r² e^{−λr} dr

Then we can use the integration by parts that I recall here:

∫ u v′ = u v − ∫ u′ v

So in our case we can assign:

u = r²,        u′ = 2r
v′ = e^{−λr},  v = −(1/λ) e^{−λr}
Substituting we get:

I = 2π { [ −(1/λ) r² e^{−λr} ]_0^{+∞} + (2/λ) ∫_0^{+∞} r e^{−λr} dr }
  = 2π [0] + 2π (2/λ) ∫_0^{+∞} r e^{−λr} dr

then with by parts again:

u = r,         u′ = 1
v′ = e^{−λr},  v = −(1/λ) e^{−λr}

we get:

I = (4π/λ) { [ −(1/λ) r e^{−λr} ]_0^{+∞} + (1/λ) ∫_0^{+∞} e^{−λr} dr }
  = (4π/λ) [0] + (4π/λ)(1/λ) ∫_0^{+∞} e^{−λr} dr
  = (4π/λ²) [ −(1/λ) e^{−λr} ]_0^{+∞} = 4π/λ³
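A numerical cross-check of this result for an arbitrary λ (illustrative only, assuming SciPy is available):

```python
import numpy as np
from scipy.integrate import quad

lam = 2.5   # arbitrary positive constant
val, _ = quad(lambda r: r**2*np.exp(-lam*r), 0, np.inf)
print(2*np.pi*val, 4*np.pi/lam**3)   # the two numbers should agree
```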

3. Spherical Coordinates:
these are the equivalent of the polar coordinates but in three dimensions. In this spherical system, the
Cartesian coordinates are defined as:
x = r sin θ cos ϕ


y = r sin θ sin ϕ

z = r cos θ

Figure 12: Spherical coordinate system.

where the spherical coordinates covering the whole space need to range the following intervals:

r ∈ [0, +∞]

θ ∈ [0, π]

ϕ ∈ [0, 2π]

as it can be easily seen from Figure 12. Inverting the system above, we can obtain the (r, ϕ, θ) coordinates
as functions of the Cartesian ones:

r = √(x² + y² + z²)

θ = arccos( z / √(x² + y² + z²) )

ϕ = arctan(y/x)
All that was said and defined above for the two-dimensional case is still valid in three dimensions, just in-
cluding an extra variable. For example, the full differential is now:

dr = (∂r/∂x) dx + (∂r/∂y) dy + (∂r/∂z) dz

and so on for the others.

4.4 Triple Integrals


This three-dimensional coordinate system is useful to solve triple integrals and again you can go from one
coordinate system to another using the determinant of the Jacobian matrix:
∭_V f(x, y, z) dV = ∭_V f(x, y, z) dx dy dz
                  = ∭_V g(r, θ, ϕ) |∂(x, y, z)/∂(r, θ, ϕ)| dr dθ dϕ
                  = ∭_V g(r, θ, ϕ) r² sin θ dr dθ dϕ

where we used the result that |J| = r² sin θ. This can be obtained by explicitly calculating the 3 × 3 determinant in
this case:

J = ( ∂x/∂r   ∂x/∂θ   ∂x/∂ϕ )
    ( ∂y/∂r   ∂y/∂θ   ∂y/∂ϕ )
    ( ∂z/∂r   ∂z/∂θ   ∂z/∂ϕ )

To calculate the determinant of a 3 × 3 square matrix, consider the first row: each element is associated with the
2 × 2 matrix obtained by removing that element's row and column, and it is multiplied by the determinant of that
2 × 2 matrix, with alternating signs. For a more comprehensive explanation, see the notes on matrices. This is the calculation:

det J = (∂x/∂r)(∂y/∂θ ∂z/∂ϕ − ∂y/∂ϕ ∂z/∂θ)
      − (∂x/∂θ)(∂y/∂r ∂z/∂ϕ − ∂y/∂ϕ ∂z/∂r)
      + (∂x/∂ϕ)(∂y/∂r ∂z/∂θ − ∂y/∂θ ∂z/∂r)
      = (cos ϕ sin θ)[(r sin ϕ cos θ)(0) − (r cos ϕ sin θ)(−r sin θ)]
      − (r cos ϕ cos θ)[(sin ϕ sin θ)(0) − (r cos ϕ sin θ)(cos θ)]
      + (−r sin ϕ sin θ)[(sin ϕ sin θ)(−r sin θ) − (r sin ϕ cos θ)(cos θ)]
      = (cos ϕ sin θ)[r² cos ϕ sin²θ]
      − (r cos ϕ cos θ)[−r cos ϕ sin θ cos θ]
      + (−r sin ϕ sin θ)[−r sin ϕ sin²θ − r sin ϕ cos²θ]
      = r² cos²ϕ sin θ sin²θ + r² cos²ϕ sin θ cos²θ + (−r sin ϕ sin θ)[−r sin ϕ]
      = r² cos²ϕ sin θ sin²θ + r² cos²ϕ sin θ cos²θ + r² sin²ϕ sin θ
      = r² cos²ϕ sin θ + r² sin²ϕ sin θ
      = r² sin θ
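The same determinant can be obtained in one line with SymPy (an illustrative check, assuming SymPy is installed):

```python
import sympy as sp

r, theta, phi = sp.symbols('r theta phi', positive=True)
x = r*sp.sin(theta)*sp.cos(phi)
y = r*sp.sin(theta)*sp.sin(phi)
z = r*sp.cos(theta)
J = sp.Matrix([x, y, z]).jacobian([r, theta, phi])
print(sp.simplify(J.det()))   # expect r**2*sin(theta)
```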

4. Cylindrical coordinates:
This is another three-dimensional system useful in case of problems with cylindrical symmetries. In
this cylindrical system, the Cartesian coordinates are defined as:

x = ρ cos ϕ


y = ρ sin ϕ

z=z

where the new coordinates covering the whole space need to range the following intervals:

ρ ∈ [0, +∞]


ϕ ∈ [0, 2π]

z ∈ [−∞, +∞]

as it can be easily seen from Figure 13. Inverting the system above, we can obtain the (ρ, ϕ, z) coordinates
as functions of the Cartesian ones:

ρ = √(x² + y²)

ϕ = arctan(y/x)

z = z

Figure 13: Cylindrical coordinate system.

5 Multiple Integrals
Multiple integrals are common in physics: they are used to calculate areas, volumes, masses, moments,
and much else. They can be solved in Cartesian coordinates but sometimes they can be simpler in alternative
coordinate systems like spherical or cylindrical. It all depends on the symmetry properties of the problem.
In general if the integrand function is just unity, i.e.

f (x, y, z, . . . ) = 1

then the integral will calculate the size of the region over which it is performed. If it is in two dimensions, it will
correspond to an area. If it is in three dimensions, it will be a volume and so on.
If instead the integrand function is not just unity, i.e.

f (x, y, z, . . . ) 6= 1

then the integral will calculate the size of the region created by the integrand function and the integrand space.
If it is in two dimensions, the function f (x, y) will correspond to a surface in the three-dimensional space, thus
the integral will correspond to the volume of the solid created by the surface and its projection over the x − y
plane. If it is in three dimensions, the function f (x, y, z) will be a three-dimensional solid in a four-dimensional
space, thus the integral will correspond to some kind of four-volume of the object created by the solid and its
projection on the x − y − z space.

5.1 Double Integrals


As in one dimension, integrals in two dimensions can be defined as limits to a summation:

 N
X
S =
 f (x n , y n )∆A n
n=1
 lim ∆A n → 0


N →+∞

In general the limits of integration can be just some constants or they can be functions of the integration
variables themselves:

1. Fixed limits: if the limits are constants, we generally can choose the order of integration (even if a given
order can result in a simpler integral than another) in the two variables. By convention we should inte-

grate from inside out (i.e. we should start from the first differential and the second integral symbol):
I = ∫_{y_0}^{y_1} ∫_{x_0}^{x_1} f(x, y) dx dy

where here one integrates first over x, and then over y. See Figure 14 for a visualisation of the integration
area. In case of f(x, y) = 1, the integral above calculates the area of the rectangular region over which we are
integrating. In this case the order is not important and one would get the same result inverting the order:

I = ∫_{y_0}^{y_1} ∫_{x_0}^{x_1} f(x, y) dx dy = ∫_{x_0}^{x_1} ∫_{y_0}^{y_1} f(x, y) dy dx

Figure 14: Double integrals over a region with fixed limits (pink area). The Cartesian differential area is also
shown.

2. Limits that are functions of the integration variables: in this case, we need to integrate first over the vari-
   able whose limits are functions of the second variable, thus the order is important. For example, see
   Figure 15. In this case, the integral has to be calculated integrating first over the y variable:

I = ∫_{x_1}^{x_2} ∫_{y_1(x)}^{y_2(x)} f(x, y) dy dx = ∫_{x_1}^{x_2} g(x) dx

Figure 15: Double integrals over a region delimited in the y direction by functions of x (red area).

Integrating first over the y variable is also called integration by vertical lines: this is because we need to
fix x to any value within its global range and then establish the variation of y.
Vice-versa, if the limits on x depend on the y variable, the integral would be:

I = ∫_{y_1}^{y_2} ∫_{x_1(y)}^{x_2(y)} f(x, y) dx dy = ∫_{y_1}^{y_2} h(y) dy

where we need to integrate first over the x variable and this is also called integration by horizontal lines.
See also the visual example in Figure 16.

Figure 16: Visual example of integration by vertical (left) or horizontal (right) lines.

5.1.1 Exercises

Let’s now work on some examples.

(i) Solve the following integral:


I = ∫_{x=0}^{1} ∫_{y=0}^{1} dy dx

In this case, as the limits are fixed, the order of integration does not matter. Figure 17 shows the area
in this example. Let’s integrate by vertical lines. Then we need to fix x and integrate over y first: x
has to be fixed in the range [0, 1], and while x spans that range, the y variable can also vary in the range [0, 1]:

Figure 17: Double integral over a rectangular region. The lines refer to the integration by vertical lines.

I = ∫_{x=0}^{1} ∫_{y=0}^{1} dy dx = ∫_{x=0}^{1} dx = 1

where, as in this example f(x, y) = 1, we have now calculated the area of the region considered.

(ii) Solve the following integral:


I = ∫_{x=0}^{1} ∫_{y=x²}^{x} dy dx

I = ∫_0^1 [y]_{x²}^{x} dx = ∫_0^1 (x − x²) dx = [ x²/2 − x³/3 ]_0^1 = 1/2 − 1/3 = 1/6

In this case we have integrated by vertical lines as the problem was already given in this way. However we
can think of inverting the integral and performing it by horizontal lines adjusting the limits: this is easily
done looking at the area in Figure 18. Fixing the y variable within the range [0, 1], the x variable will have
to go from the curve y = x to the curve y = x², thus the limits on x will be [y, √y]. So the integral becomes:

Figure 18: Double integral from example (ii). The green area refers to the integration region.

I = ∫_0^1 ∫_y^{√y} dx dy = ∫_0^1 [x]_y^{√y} dy
  = ∫_0^1 (√y − y) dy = [ (2/3) y^{3/2} − y²/2 ]_0^1
  = 2/3 − 1/2 = 1/6
thus getting the same result as before, as expected.
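The result 1/6 can be reproduced numerically with SciPy's dblquad (an illustrative sketch, not part of the original notes); note that dblquad takes the inner-variable limits as functions of the outer variable, exactly as in the integration by vertical lines above:

```python
from scipy.integrate import dblquad

# f(x, y) = 1 over the region 0 <= x <= 1, x**2 <= y <= x
area, err = dblquad(lambda y, x: 1.0, 0, 1, lambda x: x**2, lambda x: x)
print(area)   # ~0.16666... = 1/6
```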

(iii) Calculate the area of the circle x² + y² = R². We can simplify our life by considering one quarter of the
circle, specifically the quarter in the first quadrant with x > 0 and y > 0. Thus our total area will be
I = 4A, where I use A for the quarter in Figure 19.

Figure 19: Double integral from example (iii). The green area refers to the integration region.

Again in this case we can choose to integrate by vertical lines or horizontal lines as the problem is invariant
by exchange of x and y. Let’s choose again to integrate by vertical lines. We need to establish the limits of
integration in the two variables: we start from x that can range in the interval [0, R] and then, fixing x, we
see that y can go from 0 until it hits the circumference. This means that it must be:

x ∈ [0, R]
y ∈ [0, √(R² − x²)]

So our integral is:

I = 4 ∫_0^R ∫_0^{√(R² − x²)} dy dx
  = 4 ∫_0^R [y]_0^{√(R² − x²)} dx = 4 ∫_0^R √(R² − x²) dx

Now to solve this last integral we can think of substituting the dummy variable:

x = R sin α
dx = R cos α dα

(you can choose x = R cos α as well of course, however this choice is simpler as it avoids problems with
minuses popping up). The limits also need to be translated in the new variable:

x = 0 → α = 0
x = R → α = π/2

So the integral becomes:

I = 4 ∫_0^{π/2} √(R² − R² sin²α) R cos α dα
  = 4 ∫_0^{π/2} R √(1 − sin²α) R cos α dα
  = 4R² ∫_0^{π/2} cos²α dα

This last integral can be solved by remembering the identity:

cos²t = (1 + cos 2t)/2

So then back to our integral:

I = 4R² ∫_0^{π/2} (1 + cos 2α)/2 dα
  = 4R² [ α/2 + (sin 2α)/4 ]_0^{π/2}
  = 4R² [ π/4 + 0 ] = πR²

In this case, we should have solved the integral in polar coordinates. However the point here was to under-
stand how to set limits as functions of the integration variables, so we had to do it in Cartesian coordinates.
Let’s solve it anyway also in polar coordinates. The limits now are fixed:

x = r cos θ,   y = r sin θ     with ranges:  r ∈ [0, R],  θ ∈ [0, π/2]

So the integral is:

I = 4 ∫_0^{π/2} ∫_0^R r dr dθ = 4 (π/2) [ r²/2 ]_0^R = πR²

where we have introduced the Jacobian of the polar coordinates that is |J| = r.

(iv) Calculate the area of the ellipse:

x²/a² + y²/b² = 1

Figure 20: Double integral from example (iv). The green area refers to the integration region.

Here again we can choose to solve the integral over a quarter of the whole region so that we can concen-
trate on the first quadrant. Again it is equivalent to integrate by vertical lines or by horizontal lines. Let’s
then choose to integrate by horizontal lines. In this case, the limits are:

y ∈ [0, b]
x ∈ [0, a √(1 − y²/b²)]

where the second limit for x comes from the equation of the ellipse. Then the integral is:

I = 4 ∫_0^b ∫_0^{(a/b)√(b² − y²)} dx dy = 4 ∫_0^b (a/b) √(b² − y²) dy

Now this integral is exactly the same we already solved in the previous exercise (iii):

∫_0^R √(R² − x²) dx = (1/4) πR²

so we just need to substitute R with b:

I = 4 (a/b) ∫_0^b √(b² − y²) dy = 4 (a/b) (π/4) b² = πab

(v) Now solve an integral with f(x, y) ≠ 1:

I = ∬_Ω x y dx dy

where:

Ω = { (x, y) ∈ ℝ² : 0 < x < 1;  x² < y < √x }

First thing to do in this case is to understand what kind of area we are given. The area Ω is shown in
Figure 21. Also in this case, it is equivalent whether we decide to integrate by vertical or horizontal lines.
However we need to decide first so that we can set the proper limits on the integration variables. Let’s con-
sider an integration by vertical lines as the problem suggests. We need to have x running along the whole
range [0, 1] while y is limited by the two curves y = x² and y = √x (the latter is a horizontal parabola, with
axis along the x axis. Its equation is x = y² but in this case we are interested only in the part of the curve

Figure 21: Double integral from example (v). The green area refers to the integration region.

in the first quadrant). Thus the integral is:

I = ∬_Ω x y dx dy = ∫_0^1 ∫_{x²}^{√x} x y dy dx
  = ∫_0^1 x [ y²/2 ]_{x²}^{√x} dx = ∫_0^1 x ( x/2 − x⁴/2 ) dx
  = ∫_0^1 ( x²/2 − x⁵/2 ) dx = (1/2) [ x³/3 − x⁶/6 ]_0^1
  = (1/2) ( 1/3 − 1/6 ) = 1/12

5.2 Triple Integrals


A general three-dimensional integral is:

∭_V f(x, y, z) dx dy dz

where the integration is done in Cartesian coordinates over the three-dimensional region V. As already mentioned
above, in case it is f(x, y, z) = 1, then the integral is calculating the volume of V.
Let’s see some examples.

(i) Calculate the volume of a sphere of radius R.


The sphere of radius R has equation x 2 + y 2 + z 2 = R 2 (see Figure 22).
Obviously the most natural way to solve this integral is using spherical coordinates 1 . However we are
going to write down the integral needed to calculate this volume in Cartesian coordinates first. This is to
understand in an easy example how it has to be done.
Being a sphere, we can choose any order of integration for the three variables as any order will be exactly
equivalent. Of course, once an order is chosen, we need to stick to it as different choices will correspond
to different limits for a given variable. Choosing an order of integration is similar to the choice between
integrating by vertical or horizontal lines in two dimensions.
First of all let’s put ourselves in an easier situation by considering that the sphere can be split into eight parts
and we can write our integral limited to one octant, the one defined by all positive coordinates: x > 0,
y > 0, z > 0. Thus:
V = 8I
Let’s now choose the standard x, y, z order: in this case we mean that we fix x first, but that as a conse-
quence we are going to integrate z first.
If we fix x first, x will be able to vary within the whole range [0, R]. Now for each x value in that range, we
have a circle on a plane parallel to the y − z plane. The radius of this circle depends only on our choice for
1 In Latin you would probably say Nomen omen, i.e. the meaning is in the name :)

Figure 22: Sphere: the triple integral in (i) is used to calculate its volume. On the right, it is shown the positive
octant that it is considered in calculating the integral.

x, and we can determine the radius of this circle by setting the third coordinate z to zero in our equation, thus
getting also the maximum value the y coordinate can have. From this we get that at fixed x, y can go at
most to √(R² − x²). So y can span over the range between 0 and √(R² − x²) (remember we chose the positive
octant so our lower limits on the variables will all be at 0). Finally, fixing both x and y, the coordinate z
will be allowed to go from 0 to the surface of the sphere, thus √(R² − x² − y²). To summarise:

x ∈ [0, R]
y ∈ [0, √(R² − x²)]
z ∈ [0, √(R² − x² − y²)]

So our volume will be:

V = 8I = 8 ∫_0^R ∫_0^{√(R² − x²)} ∫_0^{√(R² − x² − y²)} dz dy dx

where we need to integrate over z first and then over y and x.
Now we do not really want to do this integral as we know that it is so much easier in spherical coordinates,
so here it is in spherical coordinates:

x = r sin θ cos ϕ
y = r sin θ sin ϕ     with ranges:  r ∈ [0, R],  ϕ ∈ [0, π/2],  θ ∈ [0, π/2]
z = r cos θ

where we keep considering only the positive octant. Then we need to remember to introduce the Jacobian
to translate the volume element:

∭_V dz dy dx = ∭_V r² sin θ dr dθ dϕ

So our integral becomes

V = 8 ∫_0^{π/2} ∫_0^{π/2} ∫_0^R r² sin θ dr dθ dϕ

where we can separate the three variables and integrate independently. We integrate immediately over ϕ
as the integrand function does not depend on it:

V = 8 (π/2) ∫_0^R r² dr ∫_0^{π/2} sin θ dθ
  = 4π [ r³/3 ]_0^R [ −cos θ ]_0^{π/2}
  = 4π (R³/3) [0 − (−1)] = (4/3) πR³
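The Cartesian limits written above can be fed directly to SciPy's tplquad as a numerical cross-check (illustrative only, assuming SciPy is available); the octant result is multiplied by 8 as in the text:

```python
import numpy as np
from scipy.integrate import tplquad

R = 1.0
octant, err = tplquad(
    lambda z, y, x: 1.0,
    0, R,                                                   # x in [0, R]
    lambda x: 0, lambda x: np.sqrt(R**2 - x**2),            # y in [0, sqrt(R^2 - x^2)]
    lambda x, y: 0,
    lambda x, y: np.sqrt(max(R**2 - x**2 - y**2, 0.0)))     # z in [0, sqrt(R^2 - x^2 - y^2)]
print(8*octant, 4/3*np.pi*R**3)   # both ~4.18879
```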

(ii) Calculate the volume of a parabolic dish.

Figure 23: Parabolic dish: the triple integral in (ii) is used to calculate its volume.

A parabolic dish (see Figure 23) has equation

z = x2 + y 2

with z ∈ [0, R 2 ]. Thus the maximum opening of the dish corresponds to a circle x 2 + y 2 = R 2 .
Again it is useful to integrate in the positive octant only: hence our volume is:

V = 4I

Again we write down the integral in Cartesian coordinates first. In this case, because of the cylindrical
symmetry of the problem, it is better to fix z first, so we use the order z, y, x, which corresponds to inte-
grating over x first. Once we fix z within the range [0, R²], we select one circle parallel to the x − y plane.
Within this circle, y can range from 0 (from the choice of the positive octant) to a maximum that is the
radius of the circle, reached when we are on the y axis. The radius of the circle depends on z and it is √z. Then x
can span from 0 to the surface of the dish. So to summarise:

z ∈ [0, R²]
y ∈ [0, √z]
x ∈ [0, √(z − y²)]

So the integral is:

I = ∫_0^{R²} ∫_0^{√z} ∫_0^{√(z − y²)} dx dy dz

Now to actually solve this integral it is better to use cylindrical coordinates to exploit the symmetry of the
problem:

z = z
y = ρ sin ϕ     with ranges:  z ∈ [0, R²],  ρ ∈ [0, √z],  ϕ ∈ [0, π/2]
x = ρ cos ϕ

where now it is ρ that depends on z according to the same observation we made on y above, while ϕ
can span within the whole octant. Again we need to remember to translate the volume element from
Cartesian to cylindrical coordinates:

∭_V dz dy dx = ∭_V ρ dρ dϕ dz

So our integral becomes

I = ∫_0^{π/2} ∫_0^{R²} ∫_0^{√z} ρ dρ dz dϕ
  = (π/2) ∫_0^{R²} [ ρ²/2 ]_0^{√z} dz
  = (π/4) ∫_0^{R²} z dz = (π/4) [ z²/2 ]_0^{R²} = (π/8) R⁴

So finally our volume is:

V = 4I = 4 (π/8) R⁴ = (π/2) R⁴

Figure 24: Parabolic dish orientated along the y axis so that its volume can be calculated using the formula for
the rotational solids.

Alternatively, you can use what you have learned in MT1 about rotational solids (see Figure 24 where now
the parabolic dish is oriented along the y axis. Of course this has no effect on the value of its volume.)
and use the formula:
dV = 2πx ydx
and for y = x² this becomes:

V = 2π ∫_0^R x y dx = 2π ∫_0^R x³ dx = 2π [ x⁴/4 ]_0^R = (π/2) R⁴

obtaining the same result.

5.2.1 Methods for Integration

For the double integral case, we defined two possible methods to proceed with the integration: via vertical or
via horizontal lines. Similarly, for triple integrals we can define two methods. Each of them will split the triple
integral in a sequence of two integrals: either a 1D integral and then a 2D integral, or a 2D integral and then a
1D integral.

Cross section method: imagine taking a big meat cleaver and chopping the three-dimensional region into
slices perpendicular to one of the coordinate axes (in a similar manner to the way in which we take cross
sections of a surface). If we visualise the axis perpendicular to the slices as being vertical, then you could
view the region as being composed of a bunch of these cross sections stacked on top of each other. For
example, we can choose z to be the “vertical” variable, then the slices or layers would be horizontal
sections of the volume and we can write the integral as:
∭_V f(x, y, z) dx dy dz = ∫_{z_0}^{z_1} ( ∬_{C(z)} f(x, y, z) dx dy ) dz

where C is the two-dimensional cross section. In general, the internal 2D integral will give as result a
function of z, hence:

∫_{z_0}^{z_1} ( ∬_{C(z)} f(x, y, z) dx dy ) dz = ∫_{z_0}^{z_1} g(z) dz.
With this method, we perform a double integral first and then a 1D integral.

Shadow method: imagine there is a light source (e.g. the sun) positioned far away along one of the coordinate
axis (e.g. the positive z-axis). We think of this sun as being straight up in the sky and think of the chosen
coordinate axis as though it were vertical. As this sun is shining on the three-dimensional region of our
integral, it is casting a shadow onto the flat ground below the region, i.e., on a plane perpendicular to the
axis the sun is coming from. This shadow is a two-dimensional region, and we turn the triple integral into
a double integral over the shadow. Inside the double integral, we still need to include a single integral
in the third “vertical” variable, where this variable ranges from the bottom of the volume to its top. If
we had chosen z to be the “vertical” variable where the sun is coming from, the shadow method for
integrating a function f over V would be of the form:
∭_V f(x, y, z) dx dy dz = ∬_S ( ∫_{z_0(x,y)}^{z_1(x,y)} f(x, y, z) dz ) dx dy

where S is the two-dimensional shadow area, while the limits on the z integral are in general functions
of the other two variables (x and y in this case). Hence also the result of the internal integral is in general
a function of the other two variables:

∬_S ( ∫_{z_0(x,y)}^{z_1(x,y)} f(x, y, z) dz ) dx dy = ∬_S g(x, y) dx dy

With this method, we perform a 1D integral first and then a double integral.

5.2.2 Example

Calculate the volume of the following solid:

S = { (x, y, z) ∈ ℝ³ : 0 ≤ z ≤ 1,  (z − 1)² ≥ x² + y² }.

As usual, let’s first thing about our solid: on the (x, y) plane (i.e. for z = 0) the cross-section of the solid is
the unitary circle x 2 + y 2 = 1. Instead at z = 1, the the cross-section of the solid is one point on the z axis as
x 2 + y 2 = 0. Then we can consider the intersections with the other planes: with y = 0 (i.e. the (x, z) plane), we
get (z − 1)2 = x 2 , hence z − 1 = x which is the equation of a line connecting (1, 0, 0) and (0, 0, 1). Similarly for the
x = 0 (i.e. the (y, z) plane), we obtain a line as z − 1 = y. So our solid is a cone.
Now that we understood the shape of the solid, let’s write the integral: we can choose quite naturally to
integrate using the cross-section method. In this case we write the integral as:

V = ∫_0^1 ( ∬_{S_xy} dx dy ) dz

where S_xy is the circle given by the intersection of the solid with a plane parallel to the (x, y) plane at height
z: this is thus the circle centred in (0, 0, z) and with radius 1 − z. So the internal integral is just the area of this
circle:

∬_{S_xy} dx dy = π(z − 1)²

Hence we have in our volume calculation:

V = π ∫_0^1 (z − 1)² dz = π [ (z − 1)³ / 3 ]_0^1 = π/3

5.3 Some Examples of Integrals in Physics


Here are a few examples of how the multiple integrals defined in the past lectures are used in physics:
1. masses: when calculating the mass of a solid V , one needs to know its mass density (its mass per unit
volume) which in general will be a function of x, y, z. By convention the mass density is denoted by the
Greek letter ρ: in order not to get confused with the ρ sometimes used in cylindrical coordinates, we
will write the mass density as ρ_M. Thus the mass will be calculated by the integral:

M = ∭_V dM = ∭_V ρ_M(x, y, z) dx dy dz

2. moments of inertia: a moment of inertia is the mass property of a rigid body that determines the torque
needed for a desired angular acceleration about an axis of rotation. Moment of inertia depends on the
shape of the body and may be different around different axes of rotation. In general it is also defined
as the angular mass of a body. The moment of inertia is calculated considering each mass infinitesimal
element and its distance from the chosen axis of rotation:

dI = ℓ² dM

where ℓ is defined as the distance of the mass element dM from the rotation axis. Thus the total moment
of inertia about an axis becomes:

I = ∭_V ℓ² dM = ∭_V ℓ²(x, y, z) ρ_M(x, y, z) dx dy dz

As shown above, in general ℓ is also a function of the coordinates.


Below we go through a few examples of calculations of moments of inertia. Usually the first thing to address
is the derivation of the function describing the distance of the mass element from the rotation axis.

(i) Calculate the moment of inertia of a sphere of constant density about the z axis.
Using the definition above we need to use the integral:

I_z = ∭_V ℓ_z² dM = ∭_V ρ_M ℓ_z²(x, y, z) dx dy dz

where we can take the density out of the integral sign as it is constant. Also we write ℓ_z to indicate
that we need to calculate the distance from the z axis.
In this case the distance from the z axis is easily calculated: if we cut the sphere with planes parallel to
the x − y plane we still have circles, thus:

ℓ_z = √(x² + y²)
The moment of inertia will be:

I_z = ρ_M ∭_V (x² + y²) dx dy dz
Now we move immediately to spherical coordinates and thus we can integrate over the whole sphere
(rather than use the positive octant as we did in the past lecture, where we wanted to solve the problem
also in Cartesian coordinates):

x = r sin θ cos ϕ          r ∈ [0, R]
y = r sin θ sin ϕ   with ranges:   ϕ ∈ [0, 2π]
z = r cos θ                θ ∈ [0, π]

In these coordinates the distance ℓ_z is written as:

ℓ_z² = x² + y² = r² sin²θ

Then we need to remember to introduce the Jacobian to translate the volume element:

∭_V dx dy dz = ∭_V r² sin θ dr dθ dϕ

So our integral becomes:

I_z = ρ_M ∫_0^{2π} ∫_0^π ∫_0^R (r² sin²θ) r² sin θ dr dθ dϕ
    = ρ_M ∫_0^{2π} ∫_0^π ∫_0^R r⁴ sin³θ dr dθ dϕ

where we can separate the three variables:

I_z = ρ_M 2π [r⁵/5]_0^R ∫_0^π sin³θ dθ
    = ρ_M 2π (R⁵/5) ∫_0^π sin θ (1 − cos²θ) dθ
    = ρ_M 2π (R⁵/5) ∫_0^π (sin θ − sin θ cos²θ) dθ
    = ρ_M 2π (R⁵/5) [ −cos θ + cos³θ/3 ]_0^π
    = ρ_M 2π (R⁵/5) [ 2 − 2/3 ]
    = ρ_M 2π (R⁵/5) (4/3)

where we can rearrange the factors keeping in mind that the volume of the sphere is (4/3)πR³:

I_z = (2/5) R² [ (4/3) πR³ ρ_M ]

where the last two factors correspond to the mass M of the solid:

M = (4/3) πR³ ρ_M

Thus our moment of inertia can be written as:

I_z = (2/5) M R²

(ii) Calculate the moment of inertia of a cylinder along the z axis and of constant mass density rotating about
the x axis. The cylinder has radius a and goes from −b to b along the z axis (height=2b).
The integral giving us the moment of inertia about the x axis can be written as:

I_x = ∭_V ℓ_x² dM = ∭_V ρ_M ℓ_x²(x, y, z) dx dy dz

where we need to find the expression for ℓ_x, the distance of the mass element from the x axis. In this case,
if we fix x and we cut the cylinder with a plane parallel to the y − z plane, we get a rectangle and we can
see that the distance of any point on this rectangle from the x axis is simply:

ℓ_x = √(y² + z²)

Figure 25: Cylinder along the z axis and considered rotating about the x axis.

Thus our integral becomes:

I_x = ρ_M ∭_V (y² + z²) dx dy dz

We can now move to cylindrical coordinates:

x = ρ cos ϕ          ρ ∈ [0, a]
y = ρ sin ϕ   with ranges:   ϕ ∈ [0, 2π]
z = z                z ∈ [−b, b]

So ℓ_x becomes:

ℓ_x² = y² + z² = ρ² sin²ϕ + z²

and the volume element:

dx dy dz = ρ dρ dϕ dz

Thus the integral is:

I_x = ρ_M ∫_{z=−b}^{b} ∫_{ϕ=0}^{2π} ∫_{ρ=0}^{a} (ρ² sin²ϕ + z²) ρ dρ dϕ dz

As we have a sum of two terms in the integrand, we can divide the integral into the sum of two separate
integrals:

I_x = ρ_M (I_1 + I_2)

So starting from the first one:

I_1 = ∫_{z=−b}^{b} ∫_{ϕ=0}^{2π} ∫_{ρ=0}^{a} ρ³ sin²ϕ dρ dϕ dz

where we can easily integrate over z and separate the other two variables:

I_1 = 2b [ρ⁴/4]_0^a ∫_0^{2π} sin²ϕ dϕ = (1/2) a⁴ b ∫_0^{2π} sin²ϕ dϕ

We now can use the trigonometric identity:

sin²x = (1/2)(1 − cos 2x)

and so the integral becomes:

I_1 = (1/4) a⁴ b ∫_0^{2π} (1 − cos 2ϕ) dϕ
    = (1/4) a⁴ b [ ϕ − sin 2ϕ / 2 ]_0^{2π}
    = (1/4) a⁴ b [2π − 0] = (1/2) πa⁴b
Now the second integral I_2:

I_2 = ∫_{z=−b}^{b} ∫_{ϕ=0}^{2π} ∫_{ρ=0}^{a} z² ρ dρ dϕ dz
    = 2π ∫_{z=−b}^{b} z² dz ∫_{ρ=0}^{a} ρ dρ
    = 2π [z³/3]_{−b}^{b} [ρ²/2]_0^a
    = 2π (2b³/3) (a²/2)
    = (2/3) πa²b³
Now putting the two terms back together we get:

I_x = ρ_M (I_1 + I_2) = ρ_M ( (1/2) πa⁴b + (2/3) πa²b³ )

Now as in the previous exercise, we can use the volume of the solid to simplify the expression. The volume
of this cylinder is:

V = πa² (2b) = 2πa²b

so we can take this factor out of the expression of the moment of inertia:

I_x = ρ_M 2πa²b ( (1/4) a² + (1/3) b² ) = ( (1/4) a² + (1/3) b² ) M

where again we have defined the mass M of our cylinder as ρ_M V.
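Again, the result can be cross-checked symbolically; the sketch below (Python/sympy, illustrative names) repeats the cylindrical-coordinate integral:

```python
# Sketch: verify I_x = (a^2/4 + b^2/3) M for the cylinder of radius a, half-height b.
import sympy as sp

rho, phi, z, a, b, rho_M = sp.symbols('rho phi z a b rho_M', positive=True)

integrand = rho_M * (rho**2*sp.sin(phi)**2 + z**2) * rho   # (y^2 + z^2) times the Jacobian
Ix = sp.integrate(integrand, (rho, 0, a), (phi, 0, 2*sp.pi), (z, -b, b))

M = rho_M * 2*sp.pi*a**2*b                                 # mass of the cylinder
print(sp.simplify(Ix - (a**2/4 + b**2/3) * M))             # 0
```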

6 Vectors
A vector is a geometrical object which has both magnitude and direction. It exists independently of a particular
coordinate system. However usually we choose a coordinate system and represent the vector in that.
The standard choice of representation is using the three-dimensional Cartesian coordinates. In this system,
we represent a vector as an ordered triplet:

~v = (v_x, v_y, v_z)
where these numbers are called components of the vector in Cartesian coordinates.

6.1 Operations and Properties


(a) Addition: summing two vectors ~u and ~v, we obtain a new vector ~w and each component of ~w is the sum
    of the respective components of ~u and ~v. Thus:

    ~w = ~u + ~v = (u_x + v_x, u_y + v_y, u_z + v_z)

(b) Scalar multiplication: given a number λ, we multiply the vector by λ by multiplying each component by
    this factor:

    λ~v = (λv_x, λv_y, λv_z)

Figure 26: A (magenta) vector in the three-dimensional Cartesian space.

(c) Some properties of the sum and the scalar multiplication:

    (i) Commutative:
        ~u + ~v = ~v + ~u

    (ii) Associative:
        (~u + ~v) + ~w = ~u + (~v + ~w)
        thus one can sum first ~u and ~v and then ~w, or first ~v and ~w and then ~u (or any other combination)
        and the result does not change.

    (iii) Distributive over addition:
        λ(~u + ~v) = λ~u + λ~v

(d) Modulus of a vector: it represents the length or size of a vector.

    |~v| = √(v_x² + v_y² + v_z²)

(e) Null vector: we define the null vector as the vector with all zero components:

    ~0 = (0, 0, 0)

    Summing it to a non-null vector does not change the non-null vector:

    ~v + ~0 = ~v = ~0 + ~v

(f) Unit vectors: in general unit vectors are vectors of unit modulus. Thus from any vector, one can define a
    unit vector by dividing it by its modulus:

    v̂ = ~v / |~v|

    where we indicate the unit vector by a hat over its name. By construction it is:

    |v̂| = 1

    It is most useful to define the unit vectors along the three Cartesian axes:

    ı̂ = (1, 0, 0) along x
    ̂ = (0, 1, 0) along y
    k̂ = (0, 0, 1) along z

They represent a set of basis vectors of unit length:

|ı̂| = | ̂| = |k̂| = 1

and they are a right-handed set. They span the three-dimensional Cartesian space in which we represent
vectors, thus any vector can be written as a linear combination of this basis. The linear coefficients of this
combination are the vector components in Cartesian coordinates:

~v = v_x ı̂ + v_y ̂ + v_z k̂
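The operations listed above translate directly into a few lines of code; the following is a small illustrative sketch using Python's numpy (the numerical values are arbitrary):

```python
# Sketch of the basic vector operations in Cartesian components.
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, -1.0, 0.5])

w = u + v                      # addition, component by component
s = 2.5 * v                    # scalar multiplication
mod_v = np.linalg.norm(v)      # modulus |v|
v_hat = v / mod_v              # unit vector along v

print(w, s, mod_v)
print(np.linalg.norm(v_hat))   # 1.0 by construction
```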

6.2 Products between vectors


There are two ways to multiply two vectors together:

(a) scalar product or dot product: it gives a scalar s

    ~u · ~v = s

(b) vector product or cross product: it gives a vector ~w

    ~u × ~v = ~w

We now consider the properties of each of the two products.

(a) The scalar product (or dot product) corresponds to the sum of the products of the corresponding compo-
    nents of the two vectors:

    ~u · ~v = (u_x, u_y, u_z) · (v_x, v_y, v_z) = u_x v_x + u_y v_y + u_z v_z = Σ_{i=1}^{3} u_i v_i

where we numbered the (x, y, z) components as (1, 2, 3). Geometrically, it corresponds to the product of
the moduli of the two vectors and the cosine of the angle between them:

~u · ~v = |~u| |~v| cos θ

where θ ∈ [0, π] (see Figure 27). Basically, the scalar product tells us how alike two vectors are, or also how

Figure 27: The geometrical definition of the scalar product.

much of one is in the other.


If we consider the basis vectors, we get:

ı̂ · ı̂ = ̂ · ̂ = k̂ · k̂ = 1

ı̂ · ̂ = ̂ · k̂ = k̂ · ı̂ = 0

If we multiply a vector by one of the basis unit vectors, we obtain the component of the vector along the
relative axis. We are thus projecting the vector along one of the axes.

~v · ı̂ = v_x     (ı̂ projects the vector ~v along x)

~v · ̂ = v_y     (̂ projects the vector ~v along y)

~v · k̂ = v_z     (k̂ projects the vector ~v along z)

If we calculate the scalar product of a vector with itself we get the square of the modulus:

~v · ~v = |~v|² = v_x² + v_y² + v_z²

Properties of the scalar product:

(i) Commutativity:
    ~u · ~v = ~v · ~u

(ii) Compatibility with scalar multiplication:
    λ(~u · ~v) = (λ~u) · ~v = ~u · (λ~v)
    Note that the scalar product is not associative: (~u · ~v) · ~w is the scalar ~u · ~v multiplying the vector ~w,
    which is in general different from ~u (~v · ~w).

(iii) Distributivity over addition:
    ~w · (~u + ~v) = ~w · ~u + ~w · ~v

6.2.1 Vectors with complex components

If we consider vectors with complex components, we need to adjust the definition of the scalar product
if we want to keep the same properties and a coherent definition of the modulus. The scalar product
definition in this case is:

~a · ~b = a_x b_x* + a_y b_y* + a_z b_z* = Σ_{i=1}^{3} a_i b_i*

As a consequence, however, we lose the commutativity property, as:

~a · ~b = (~b · ~a)*

(b) The vector product (or cross product) corresponds to a new vector that is perpendicular to both the original
    vectors and therefore normal to the plane containing them. The three vectors ~u, ~v, and ~u × ~v form a right-
    handed set. The magnitude of the resulting vector can be obtained by the geometrical definition of the
    vector product (see the left part of Figure 28):

    |~w| = |~u × ~v| = |~u| |~v| sin θ

    The geometrical interpretation can be seen in the right part of Figure 28: the magnitude of the vector
    product gives the area of the parallelogram formed by the two vectors, while the direction is normal to the
    surface of the parallelogram.
    The vector resulting from the vector product is defined as a pseudo-vector or an axial vector: this means
    that it transforms like a vector under a rotation, but it changes sign under a reflection. In physics, there are
    a number of these pseudo-vectors, for example the magnetic field ~B and the angular momentum ~L (which
    is in fact obtained from a vector product, ~r × ~p).
Properties of the vector product:
(i) Anti-commutativity:
    ~u × ~v = −~v × ~u

(ii) Non-associativity:
    (~u × ~v) × ~w ≠ ~u × (~v × ~w)
    thus if one multiplies first ~u and ~v and then ~w, or first ~v and ~w and then ~u (or any other combination),
    the result does change.

(iii) Distributivity over addition:
    ~w × (~u + ~v) = ~w × ~u + ~w × ~v

Figure 28: The geometrical definition of the vector product magnitude.

If we consider the basis vectors, we get:

ı̂ × ı̂ = ̂ × ̂ = k̂ × k̂ = 0

ı̂ × ̂ = k̂ = − ̂ × ı̂




̂ × k̂ = ı̂ = −k̂ × ̂

k̂ × ı̂ = ̂ = −ı̂ × k̂

Now that we know the properties of the cross product and how the basis unit vectors behave under cross
product, we can calculate the cross product between two generic vectors:

~u × ~v = (u_x ı̂ + u_y ̂ + u_z k̂) × (v_x ı̂ + v_y ̂ + v_z k̂)
       = u_x v_x (ı̂ × ı̂) + u_x v_y (ı̂ × ̂) + u_x v_z (ı̂ × k̂) +
         + u_y v_x (̂ × ı̂) + u_y v_y (̂ × ̂) + u_y v_z (̂ × k̂) +
         + u_z v_x (k̂ × ı̂) + u_z v_y (k̂ × ̂) + u_z v_z (k̂ × k̂)
       = u_x v_y k̂ + u_x v_z (−̂) + u_y v_x (−k̂) + u_y v_z ı̂ + u_z v_x ̂ + u_z v_y (−ı̂)
       = (u_y v_z − u_z v_y) ı̂ + (u_z v_x − u_x v_z) ̂ + (u_x v_y − u_y v_x) k̂

where the terms with ı̂ × ı̂, ̂ × ̂ and k̂ × k̂ vanish.

The same result can be obtained through the following matrix definition:

~u × ~v = (u_x, u_y, u_z) × (v_x, v_y, v_z)
       = | ı̂    ̂    k̂  |
         | u_x  u_y  u_z |
         | v_x  v_y  v_z |
       = (u_y v_z − u_z v_y) ı̂ − (u_x v_z − u_z v_x) ̂ + (u_x v_y − u_y v_x) k̂

where in the last passage we applied again the determinant calculation for a 3 × 3 matrix.
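The expansion above is what numpy's cross product implements; a small sketch (arbitrary numbers) verifying the two key properties:

```python
# Sketch: cross product, checking perpendicularity and |u x v| = |u||v| sin(theta).
import numpy as np

u = np.array([1.0, 2.0, 0.5])
v = np.array([-1.0, 0.0, 2.0])

w = np.cross(u, v)
print(np.isclose(np.dot(w, u), 0.0), np.isclose(np.dot(w, v), 0.0))   # True True

cos_t = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
sin_t = np.sqrt(1.0 - cos_t**2)
print(np.isclose(np.linalg.norm(w),
                 np.linalg.norm(u) * np.linalg.norm(v) * sin_t))      # True
```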

6.3 Triple Products


Again there are two ways to have a triple product:

(a) scalar triple product:

    ~a · (~b × ~c)

(b) vector triple product:

    ~a × (~b × ~c)

We now consider the properties of each of these triple products.

(a) The scalar triple product gives a scalar that corresponds to the volume of the parallelepiped (warped cube
    in Figure 29) spanned by the three vectors.

    ~a · (~b × ~c) = |~a| |~b × ~c| cos ϕ = |~a| cos ϕ |~b| |~c| sin θ

    where the term:

    |~b| |~c| sin θ = area

    is the area of the parallelogram spanned by ~b and ~c. Thus, we have

    |~a| cos ϕ |~b| |~c| sin θ = |~a| cos ϕ · area = volume

    as the area multiplied by the height (|~a| cos ϕ) is giving the volume.

Figure 29: The geometrical definition of the scalar triple product.

The scalar triple product can be calculated through the determinant of the matrix:

~a · (~b × ~c) = | a_x  a_y  a_z |
                | b_x  b_y  b_z |
                | c_x  c_y  c_z |

If two of the vectors involved in the triple product are the same, we get:

~a · (~a × ~b) = 0

and this is clearly seen by remembering that the cross product ~a × ~b gives a vector perpendicular to both
the original vectors ~a and ~b; on the other hand, the dot product measures how much the two vectors are
aligned, so it gives zero when the two vectors are perpendicular, as in this case:

~a × ~b ⊥ ~a

6.3.1 Coplanarity

Non-null vectors ~a, ~b, and ~c are coplanar (i.e. there exists a geometric plane that contains them all) if and
only if:

~a · (~b × ~c) = 0

This is thus a one-to-one correspondence that can go in both directions: if the three vectors are coplanar,
then necessarily the triple product is null. If the triple product is null, then necessarily the three vectors
are coplanar.
We can prove both these statements:
(i) Let's start assuming that ~a, ~b, and ~c are coplanar. In this case we can write one of the three vectors as
    the linear combination of the other two:

    ~a = β~b + γ~c

    with some real numbers β and γ. Now we can substitute this expression for ~a into the triple product:

    ~a · (~b × ~c) = (β~b + γ~c) · (~b × ~c) = β ~b · (~b × ~c) + γ ~c · (~b × ~c) = 0

    where in the last step, we are using the fact that the triple product goes to zero when two of the three
    vectors are identical. Thus we proved that if the three vectors are coplanar, their triple product
    is zero.

(ii) Let's now assume that ~a · (~b × ~c) = 0 and write ~a as:

    ~a = β~b + γ~c + δd~

    where the vector d~ is perpendicular to both ~b and ~c, so that it takes care of the projection of ~a perpen-
    dicular to the plane of ~b and ~c. Then we can substitute this expression for ~a into the triple product
    that we know has to be null (as it is our current hypothesis):

    0 = ~a · (~b × ~c) = (β~b + γ~c + δd~) · (~b × ~c)
      = β ~b · (~b × ~c) + γ ~c · (~b × ~c) + δ d~ · (~b × ~c)

    where the first two terms vanish as before. As d~ is perpendicular to both ~b and ~c, it has to
    be parallel to ~b × ~c, so the scalar product between d~ and ~b × ~c cannot be zero: the last term would
    contradict our hypothesis unless δ = 0. Hence δ has to be null and the three vectors have to be coplanar,
    as we wanted to prove.
(b) The vector triple product ~a × (~b × ~c) gives as a result a vector that is perpendicular to ~a and lies on the plane
    defined by ~b and ~c. If we then consider for example (~a × ~b) × ~c, the resulting vector lies on the plane defined
    by ~a and ~b, thus clearly it has to be:

    ~a × (~b × ~c) ≠ (~a × ~b) × ~c

    There are a number of useful identities that can simplify the calculations when dealing with vector triple
    products:

    (1)
    ~a × (~b × ~c) = (~a · ~c) ~b − (~a · ~b) ~c

    (~a × ~b) × ~c = −~c × (~a × ~b)
                  = (~c · ~a) ~b − (~c · ~b) ~a

    (2)
    ~a × (~b × ~c) + ~b × (~c × ~a) + ~c × (~a × ~b) = 0

    (3) Lagrange's identity:

    (~a × ~b) · (~c × d~) = (~a · ~c)(~b · d~) − (~b · ~c)(~a · d~)
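These identities are easy to spot-check numerically on random vectors; here is a minimal sketch (numpy, arbitrary seed):

```python
# Sketch: numerical check of identity (1) ("BAC-CAB") and of Lagrange's identity.
import numpy as np

rng = np.random.default_rng(0)
a, b, c, d = rng.standard_normal((4, 3))

lhs1 = np.cross(a, np.cross(b, c))
rhs1 = np.dot(a, c) * b - np.dot(a, b) * c
print(np.allclose(lhs1, rhs1))                                   # True

lhs3 = np.dot(np.cross(a, b), np.cross(c, d))
rhs3 = np.dot(a, c) * np.dot(b, d) - np.dot(b, c) * np.dot(a, d)
print(np.isclose(lhs3, rhs3))                                    # True
```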

6.4 Rotation of coordinate systems

Figure 30: Rotation of coordinate systems.

Consider a two-dimensional coordinate system (x, y) and another system (x′, y′) whose axes are rotated by
an angle α with respect to (x, y).
A vector ~r can be represented in the (x, y) system with coordinates:

~r = r_x ı̂ + r_y ̂

while in the case of the (x′, y′) system, the vector ~r will have coordinates:

~r = r_x′ ı̂′ + r_y′ ̂′

The vector stays of course the same even if we are considering it in two different systems of coordinates. How-
ever, as the components (r_x, r_y) of the vector change when going from one system to another (r_x′, r_y′), we want to
find a way to calculate one set of components from the other.
Let's start from the basis unit vectors of both systems:

ı̂ · ı̂ = ̂ · ̂ = ı̂′ · ı̂′ = ̂′ · ̂′ = 1

ı̂ · ̂ = ̂ · ı̂ = ı̂′ · ̂′ = ̂′ · ı̂′ = 0

ı̂ · ı̂′ = ̂ · ̂′ = cos α

ı̂ · ̂′ = cos(α + π/2) = −sin α

ı̂′ · ̂ = cos(π/2 − α) = sin α
Now we can go back to our vector and its components: if we consider the dot product of it with one of the
basis vectors, we obtain the corresponding component. Thus:

r_x = ~r · ı̂ = (r_x′ ı̂′ + r_y′ ̂′) · ı̂
    = r_x′ ı̂′ · ı̂ + r_y′ ̂′ · ı̂
    = r_x′ cos α + r_y′ (−sin α)

r_y = ~r · ̂ = (r_x′ ı̂′ + r_y′ ̂′) · ̂
    = r_x′ ı̂′ · ̂ + r_y′ ̂′ · ̂
    = r_x′ sin α + r_y′ cos α

Thus we have obtained the general coordinate transformation as:

x = x′ cos α − y′ sin α
y = x′ sin α + y′ cos α

or equivalently, we can invert as:

x′ = x cos α + y sin α
y′ = −x sin α + y cos α
Another way of writing the transformation is using a matrix and seeing the vector as a matrix with only one
column:

( x )   ( cos α   −sin α ) ( x′ )
( y ) = ( sin α    cos α ) ( y′ )

where the 2 × 2 matrix is called the rotation matrix. Again, in an equivalent way, we can have the inverse matrix:

( x′ )   (  cos α   sin α ) ( x )
( y′ ) = ( −sin α   cos α ) ( y )

We are going to define the product between matrices and between matrices and vectors in the next lectures.
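Anticipating that matrix product, here is a short numerical sketch (numpy, arbitrary angle and components) of the rotation matrix and its inverse:

```python
# Sketch: 2D rotation matrix relating (x, y) and (x', y') components.
import numpy as np

alpha = np.deg2rad(30.0)
R = np.array([[np.cos(alpha), -np.sin(alpha)],
              [np.sin(alpha),  np.cos(alpha)]])

r_prime = np.array([2.0, 1.0])          # components in the rotated system
r = R @ r_prime                         # components in the original system

print(np.allclose(R.T @ r, r_prime))    # the inverse rotation is the transpose -> True
print(np.isclose(np.linalg.norm(r), np.linalg.norm(r_prime)))   # length unchanged -> True
```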

6.5 Vector Equations of Lines and Planes


Using vectors we can also define vector equations of lines and planes.

6.5.1 Vector Equation of a Line

Consider two points A and B : we want to find the vector equation of the line passing through the two points.
From the two points, we define their position vectors:

A ⇒ position ~a = (a_x, a_y, a_z)
B ⇒ position ~b = (b_x, b_y, b_z)

Figure 31: Drawing for obtaining the vector equation of a line.

Now consider a generic point P (with position vector ~r) that lies on the line. From the diagram in Figure 31 we
see that the vector \overrightarrow{OP} can be defined as:

\overrightarrow{OP} = \overrightarrow{OA} + \overrightarrow{AP}

which means (in terms of the position vectors defined above):

~r = ~a + \overrightarrow{AP}

Now we know that \overrightarrow{AP} is on the line and thus it has the same direction as, for example, \overrightarrow{AB}. We can express
\overrightarrow{AP} as:

\overrightarrow{AP} = t \overrightarrow{AB}

where t is a real number and each value of t will correspond to a different point P. Then we can write that:

\overrightarrow{OB} = \overrightarrow{OA} + \overrightarrow{AB}

thus we can obtain \overrightarrow{AB} as:

\overrightarrow{AB} = \overrightarrow{OB} − \overrightarrow{OA} = ~b − ~a

Now putting all the quantities derived above in the equation for ~r, we get:

~r = ~a + t (~b − ~a)

that can also be rearranged as:

~r = (1 − t)~a + t~b

where t ∈ (−∞, +∞). The last two expressions correspond to the vector equations for a line. We can now write
explicitly the vector components:

(x, y, z) = (1 − t)(a_x, a_y, a_z) + t (b_x, b_y, b_z)

and then we can consider it component by component:

x = (1 − t)a_x + t b_x = a_x + t (b_x − a_x)
y = (1 − t)a_y + t b_y = a_y + t (b_y − a_y)
z = (1 − t)a_z + t b_z = a_z + t (b_z − a_z)

and from these expressions it is easy to extract t, which has to be the same in each of the equations:

t = (x − a_x)/(b_x − a_x) = (y − a_y)/(b_y − a_y) = (z − a_z)/(b_z − a_z)

which corresponds to the Cartesian form for a line.

In general, given a line equation in Cartesian form, we can always express the equation in vector form. For
example from this Cartesian form:
t = x/3 = (y − 9)/(−1) = (z − 2)/1

we can derive the vector equation:

~r = ~a + t (~b − ~a) = (0, 9, 2) + t (3, −1, 1)

Let’s now work some examples to practice the concepts.

(a) Find the vector equation of a line ~r_1 passing through points A = (0, 1, −2) and B = (3, 4, 3).
    We simply apply the formula:

    ~r_1 = (1 − t)~a + t~b = (1 − t)(0, 1, −2) + t (3, 4, 3)
        = (0, 1 − t, 2t − 2) + (3t, 4t, 3t) = (3t, 1 + 3t, 5t − 2)

    giving the component-by-component system of relations:

    x_1 = 3t
    y_1 = 1 + 3t
    z_1 = 5t − 2

    and the Cartesian form:

    t = x_1/3 = (y_1 − 1)/3 = (z_1 + 2)/5
(b) Find the vector equation of another line ~r_2 passing through points C = (1, 1, 0) and D = (−3, −2, −7).
    Applying again the formula (and using s as parameter):

    ~r_2 = (1 − s)~c + s d~ = (1 − s)(1, 1, 0) + s(−3, −2, −7)
        = (1 − s, 1 − s, 0) + (−3s, −2s, −7s) = (1 − 4s, 1 − 3s, −7s)

    giving the component-by-component system of relations:

    x_2 = 1 − 4s
    y_2 = 1 − 3s
    z_2 = −7s

    and the Cartesian form:

    s = (x_2 − 1)/(−4) = (y_2 − 1)/(−3) = z_2/(−7)
(c) Do the two previous lines intersect?
To answer this, we just need to equate the component-by-component relations:

3t = 1 − 4s

1 + 3t = 1 − 3s

−2 + 5t = −7s

This is a system of three linear equations in two unknowns. We will go over the theory about this kind of
systems in the next weeks. For the moment we need to see if there exists a pair of (t , s) that satisfies all the
three equations. We start from the first one and derive t as a function of s:

t = (1 − 4s)/3

and we substitute this expression in the second equation:

1 + 3 · (1 − 4s)/3 = 1 − 3s   →   1 − 4s = −3s

thus giving s = 1. If s = 1, then:
t = (1 − 4)/3 = −1

We can then check that with these values the third equation is verified:

−2 − 5 = −7 ✓

Substituting the value for t in the first line equation or the value for s in the second line equation, one
finds the point in which the two lines intersect:

x 2 = 1 − 4s = −3

y 2 = 1 − 3s = −2

z 2 = −7s = −7

so they intersect at (−3, −2, −7).
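The same intersection can be found by solving the linear system numerically; a minimal sketch (numpy):

```python
# Sketch: intersection of the two lines r1(t) = A + t(B - A), r2(s) = C + s(D - C).
import numpy as np

A, B = np.array([0., 1., -2.]), np.array([3., 4., 3.])
C, D = np.array([1., 1., 0.]), np.array([-3., -2., -7.])

# A + t(B - A) = C + s(D - C)  ->  t(B - A) - s(D - C) = C - A
M = np.column_stack([B - A, -(D - C)])
(t, s), *_ = np.linalg.lstsq(M, C - A, rcond=None)
print(t, s)                                          # -1.0  1.0

P1, P2 = A + t*(B - A), C + s*(D - C)
print(P1, np.allclose(P1, P2))                       # [-3. -2. -7.]  True
```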

6.5.2 Vector Equation of a Plane

Figure 32: Drawing for obtaining the vector equation of a plane.

Consider a point A and a unit vector n̂. We want to find the equation of the plane passing through
A and having n̂ as normal.
We know that a line passing through any two points on the plane is perpendicular to the normal to the
plane. If we consider a generic point P on the plane we want to define, as shown in Figure 32, the segment
vector \overrightarrow{AP} is perpendicular to n̂. If we now define ~a as the position vector of point A and ~r as the position
vector of the generic point P, we can write:

\overrightarrow{OP} = \overrightarrow{OA} + \overrightarrow{AP}   →   \overrightarrow{AP} = \overrightarrow{OP} − \overrightarrow{OA} = ~r − ~a

We can require the perpendicularity of \overrightarrow{AP} and n̂ by writing that their scalar product has to be zero:

(~r − ~a) · n̂ = 0

or

~r · n̂ = ~a · n̂ = p

where p is a number obtained by the scalar product of ~a and n̂, which are data of the problem. The two equa-
tions above are vector equations of a plane.
We can rewrite them in component form. We define the components of the normal vector n̂ as:

~n = (α, β, γ)

Thus substituting in the vector equation we get:

αx + βy + γz = p

Let’s now work one example to practice the concept.

(a) Find the vector equation for the plane passing through the three points:

A = (3, 2, 0) B = (1, 3, −1) C = (0, −2, 3)

To apply the formula above we need to have the normal vector. To find a vector normal to the plane con-
taining the three points we are given, we can find two vectors on the plane and then calculate their cross
product, which would be perpendicular to the plane.
The first vector on the plane can be found by considering the vector \overrightarrow{AB} between the two points A and B:

\overrightarrow{AB} = ~b − ~a = (1, 3, −1) − (3, 2, 0) = (−2, 1, −1)

where ~a and ~b are the position vectors of A and B respectively. The second vector on the plane can be \overrightarrow{AC}:

\overrightarrow{AC} = ~c − ~a = (0, −2, 3) − (3, 2, 0) = (−3, −4, 3)

Now we can calculate the cross product:

~n = (~b − ~a) × (~c − ~a) = (−2, 1, −1) × (−3, −4, 3)
   = | ı̂    ̂    k̂ |
     | −2    1   −1 |
     | −3   −4    3 |
   = ı̂ (1·3 − (−1)(−4)) − ̂ ((−2)·3 − (−1)(−3)) + k̂ ((−2)(−4) − 1·(−3))
   = −1 ı̂ + 9 ̂ + 11 k̂

Using the normal vector ~n = (−1, 9, 11) (it is not a unit vector but this does not represent a problem), we
can write the vector equation of the plane:

~r · ~n = ~a · ~n = −3 + 18 + 0 = 15

where in the last step we just calculated the scalar product between ~a and ~n. Thus the plane equation is:

~r · (−1, 9, 11) = 15

and writing explicitly the components

(x, y, z) · (−1, 9, 11) = 15

we get to the Cartesian equation:

−x + 9y + 11z = 15

If we want to verify that the three points really lie on the plane, we need to substitute their coordinates into
the plane equation:

A:  −3 + 18 + 0 = 15 ✓
B:  −1 + 27 − 11 = 15 ✓
C:  0 − 18 + 33 = 15 ✓
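The whole construction (normal vector from a cross product, then the check) is compactly reproduced in this numpy sketch:

```python
# Sketch: plane through three points via the cross-product normal.
import numpy as np

A, B, C = np.array([3, 2, 0]), np.array([1, 3, -1]), np.array([0, -2, 3])

n = np.cross(B - A, C - A)      # normal vector
p = np.dot(n, A)                # right-hand side of r.n = p
print(n, p)                     # [-1  9 11] 15

for P in (A, B, C):             # every given point satisfies r.n = p
    print(np.dot(n, P) == p)    # True True True
```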

7 Vector Calculus
Before being able to introduce vector calculus, we need to introduce the concept of scalar and vector fields.

7.1 Scalar and Vector fields


(a) Scalar fields: they refer to scalar functions of the position vector ~
r = (x, y, z):

φ(x, y, z)

or φ(x, y) in two dimensions. A scalar field is thus a number (a scalar) associated to each point in the space
considered.
Examples of scalar fields are many in physics:

(i) the temperature T (~
r ) or the pressure p(~
r ): they are scalar field as to each point in the space you can
associate a number, a measurement of the temperature or the pressure.
(ii) the electrostatic or the gravitational potential V (~
r ).
(iii) the height over sea level h(~
r ) that we are going to use as example below.

(b) Vector fields: they are functions associating a vector to each point in the domain of the function. Thus now
in each point in the space considered, we have a number and a direction.
~
A(x, y, z) = (A x , A y , A z ) = (A x (x, y, z), A y (x, y, z), A z (x, y, z))

or ~
A = (A x , A y ) in two dimensions. They are vectors in which each component is a function of the position
vector.
Examples of vector fields are also in physics:

(i) the velocity ~


v (~
r ) of a gas or a fluid as a function of the position.
~ (~
(ii) the electric field, E ~ (~
r ), and the magnetic field, B r ).

7.2 Gradient, Divergence and Curl


We can define three differential operations that will allow us to study and characterise the scalar and the vector
fields. As the derivatives are necessary to understand the behaviour of a function, these operators are the tools
we can use to obtain the properties of the fields.
We will start from describing these operations and then we will derive the mathematical expressions for
them.
(1) Gradient: the gradient is applied to scalar fields and it gives a vector field. It represents the direction of the

Figure 33: Contours for a scalar field: each line is formed by points in which the field is constant, φ =constant.
On the right side, the directions of maximum change are shown.

greatest rate of increase of the scalar field and its magnitude is the slope of the function in that direction.
For example a topographic map of a mountain has contour lines which are curves along which the height
above sea level has a constant value (the scalar field “height” h(~ r ) has a constant value). The contour lines
will be closer to each others when the slope is the steepest while they will be sparser in case of smoother
slopes.
Every point can be assigned a vector describing the direction of the greatest change in height, with the
length of the vector corresponding of the actual slope. The direction of this vector in each point is perpen-
dicular to the contour line passing in that point and this vector is the gradient of the scalar field.
To summarise:

(a) the gradient of a scalar field is a vector field
(b) the gradient of the scalar field is perpendicular to its contour lines.
(c) the size of the gradient is bigger when contour lines are denser.

(2) Divergence: the divergence of a vector field is a scalar field. The divergence represent the volume density

Figure 34: Example of a vector field: the electric field.

of the outward flux of the vector field from an infinitesimal volume around a given point.
For example, consider the electric field in Figure 34 where the red lines are the vector field lines while the
purple lines correspond to the equipotential lines (along which the electric potential is constant). The field
lines exit from the positive charge (the source) and enter in the negative charge (the sink). If we consider a
closed line around the positive charge, all the field lines will be exiting the curve, hence the flux and thus
the divergence will be positive: the divergence around a source is positive. Vice-versa, the field lines would
all enter through a closed line around the sink, thus giving a negative flux or divergence. The divergence is
a quantitative measure of how much a vector field diverges (spread out) or converges at any given point.
To summarise:

(a) the divergence of a vector field is a scalar field


(b) the divergence of the vector field corresponds to a flux per unit area (or volume) of the field lines.
(c) the sign of the divergence is positive when the lines exit the area (or volume), while it is negative when
the lines enter.

(3) Curl: the curl of a vector field is a vector field. It is a vector operator that describes the infinitesimal rotation
of a vector field, its vorticity. At every point in the field, the curl of that point is represented by a vector. The
direction of this vector is the axis of rotation, as determined by the right-hand rule, and the magnitude of
the curl is the magnitude of rotation. If the rotation is anticlockwise, the curl is positive, while it is negative
when the rotation is clockwise.
If we consider the vector field that is the gradient of a scalar field discussed above, this vector cannot
rotate: as a matter of fact, you cannot walk in a circle and walk uphill at the same time (even if Escher
could imagine a way: see left Figure 35). Thus the curl of the gradient of a scalar field is always zero, as we
will see better in the next lectures.
However there are vector fields that can indeed go around in circles, like for example the velocity field of a
turntable (~ v =~
ω ×~ r ) or the magnetic field created by an electric current (see right Figure 35).
To summarise:

(a) the curl of a vector field is a vector field


(b) the curl of the vector field corresponds to a measure of the rotation of the field and its direction is the
axis of rotation, as determined by the right-hand rule,
(c) the curl is positive if the rotation is anticlockwise, while it is negative when the rotation is clockwise.
Now that we have defined them, let’s obtain their mathematical definitions.

Figure 35: Left: Escher’s impossible staircase. Right: example of a vector field, the magnetic field generated by
a wire.

7.2.1 Gradient

Figure 36: Derivation of the mathematical form of the gradient.

Consider the scalar field φ and its contour lines (φ = constant). Given a point P (with position vector ~r) on
a contour line, we move by an infinitesimal step along the contour to point P′ (with position vector ~r′). We can
write ~r′ as:

~r′ = ~r + d~r

We also know that, by construction:

φ(~r′) = φ(~r + d~r) = φ(x + dx, y + dy, z + dz) = φ(~r) = constant

Hence, we can write:

dφ = φ(x + dx, y + dy, z + dz) − φ(x, y, z) = 0

as this is exactly the definition of the full differential, so we can write:

dφ = (∂φ/∂x) dx + (∂φ/∂y) dy + (∂φ/∂z) dz = 0

If we consider the infinitesimal change in the position vector as:

d~r = (dx, dy, dz)

then the expression above looks very similar to the result of a scalar product between two vectors:

( ∂φ/∂x, ∂φ/∂y, ∂φ/∂z ) · (dx, dy, dz) = 0

Then, if the dot product is null, it means that the two vectors are perpendicular. As d~r was chosen to be
along the contour line, we have found a vector perpendicular to it, as the gradient needs to be:

grad φ · d~r = 0   ⇒   grad φ ⊥ d~r

Thus the gradient of a scalar field is:

grad φ(x, y, z) = ( ∂φ/∂x, ∂φ/∂y, ∂φ/∂z )

Some examples:

(i) Given the scalar field:

    φ(x, y, z) = x² + y² + z²

    the gradient is:

    grad φ = (2x, 2y, 2z) = 2(x, y, z) = 2~r

(ii) Given the scalar field:

    φ(x, y, z) = r = √(x² + y² + z²)

    the partial derivative with respect to x is:

    ∂φ/∂x = (1/2) · 2x / √(x² + y² + z²) = x/r

    where we have defined r = |~r|. It is similar for the other coordinates, so the gradient is:

    grad φ = ( x/√(x² + y² + z²), y/√(x² + y² + z²), z/√(x² + y² + z²) )
           = ( x/r, y/r, z/r ) = ~r / r

7.2.2 Divergence

To derive the mathematical expression of the divergence, we start from a two-dimensional case and it is going
to be easily extended to three dimensions. Consider the vector field

~A(x, y) = (A_x(x, y), A_y(x, y))

going through an infinitesimal area dA = dxdy as in Figure 37. We want to calculate the volume density of the
outward flux through this infinitesimal area, so we will consider side by side the contribution of the vector field
to the flux:

Side 1: A y (x, y)dx and no contribution from A x as it is parallel to the side.

Side 2: A x (x + dx, y)dy and no contribution from A y . This contribution can be written as:

A_x(x + dx, y) dy = ( A_x(x, y) + (∂A_x/∂x) dx ) dy

Figure 37: Derivation of the mathematical form of the divergence.

Side 3: A y (x, y + dy)dx and no contribution from A x .


A_y(x, y + dy) dx = ( A_y(x, y) + (∂A_y/∂y) dy ) dx

Side 4: A x (x, y)dy and no contribution from A y .


Putting these together, we can calculate the net exchange:

Total flux = Side 2 + Side 3 − Side 1 − Side 4

where we assume that the vector field flows from left to right, and thus it would be entering through sides 1
and 4 and it would be exiting through sides 2 and 3. Hence sides 1 and 4 will contribute negatively to the flux,
while sides 2 and 3 have a positive sign. Substituting the contributions calculated above we get:

Total flux = Side 2 + Side 3 − Side 1 − Side 4


           = ( A_x(x, y) + (∂A_x/∂x) dx ) dy + ( A_y(x, y) + (∂A_y/∂y) dy ) dx +
             − A_y(x, y) dx − A_x(x, y) dy
           = (∂A_x/∂x) dx dy + (∂A_y/∂y) dy dx
           = ( ∂A_x/∂x + ∂A_y/∂y ) dx dy

Now the flux per unit area, i.e. the divergence, is just:

div ~A(x, y) = ∂A_x/∂x + ∂A_y/∂y
This is easily extendable to the three-dimensional case:
div ~A(x, y, z) = ∂A_x/∂x + ∂A_y/∂y + ∂A_z/∂z

One example:

(i) Given the vector field obtained above as a gradient:

grad φ = (2x, 2y, 2z)

the divergence is:


div (grad φ) = div(2x, 2y, 2z) = 2 + 2 + 2 = 6

7.2.3 Curl

Figure 38: Derivation of the mathematical form of the curl.

To derive the mathematical expression of the curl, we start from a two-dimensional case and then we will
extend the result to three dimensions. Consider the vector field
~A(x, y) = (A_x(x, y), A_y(x, y))

going through an infinitesimal area dA = dx dy as in Figure 38. We need to imagine walking counter-clockwise
around this infinitesimal area, so this time we need to consider the component of the field along the path to
calculate the contribution to the rotation.
Side 1: A x (x, y)dx and no contribution from A y as it is perpendicular to the side.
Side 2: A y (x + dx, y)dy and no contribution from A x . This contribution can be written as:
A_y(x + dx, y) dy = ( A_y(x, y) + (∂A_y/∂x) dx ) dy
Side 3: −A x (x, y + dy)dx and no contribution from A y .
−A_x(x, y + dy) dx = −( A_x(x, y) + (∂A_x/∂y) dy ) dx
where the minus sign takes into account that because we are walking counter-clockwise we would go
from right to left, while the vector field goes from left to right.
Side 4: −A y (x, y)dy and no contribution from A x .
Putting these together, we can calculate the total rotation:
Total rotation = Side 1 + Side 2 + Side 3 + Side 4
Substituting the contributions calculated above we get:
Total rotation = Side 1 + Side 2 + Side 3 + Side 4
               = A_x(x, y) dx + ( A_y(x, y) + (∂A_y/∂x) dx ) dy +
                 − ( A_x(x, y) + (∂A_x/∂y) dy ) dx − A_y(x, y) dy
               = (∂A_y/∂x) dx dy − (∂A_x/∂y) dy dx
               = ( ∂A_y/∂x − ∂A_x/∂y ) dx dy

Again we want to define a quantity independent of the specific area so we divide by dA. What we obtain is the
z component of the curl as we have been working in two dimensions and the rotation on the x − y plane has a
curl along the z axis (according to the right-hand rule).
(curl ~A)_z = ∂A_y/∂x − ∂A_x/∂y

Now using the cyclic permutation of the triplet (x, y, z):

(x, y, z) → (y, z, x) → (z, x, y)

we can get the other components:

(curl ~A)_z = ∂A_y/∂x − ∂A_x/∂y
(curl ~A)_x = ∂A_z/∂y − ∂A_y/∂z
(curl ~A)_y = ∂A_x/∂z − ∂A_z/∂x

Thus the curl in three dimensions is:

curl ~A(x, y, z) = ( ∂A_z/∂y − ∂A_y/∂z, ∂A_x/∂z − ∂A_z/∂x, ∂A_y/∂x − ∂A_x/∂y )
which is a vector.
One example:
(i) Given the vectors
~ω = (0, 0, ω)
~r = (x, y, z)

the velocity field is defined as:

~v = ~ω × ~r = | ı̂   ̂   k̂ |
              | 0    0    ω |
              | x    y    z |  = (−ωy, ωx, 0)

Now we calculate the curl of the velocity field so obtained:

curl ~v = (0 − 0, 0 − 0, ω − (−ω)) = (0, 0, 2ω)

7.2.4 Nabla Operator

To summarise, we have obtained in the previous sections:


(1) gradient:

    grad φ(x, y, z) = ( ∂φ/∂x, ∂φ/∂y, ∂φ/∂z )

(2) divergence:

    div ~A(x, y, z) = ∂A_x/∂x + ∂A_y/∂y + ∂A_z/∂z

(3) curl:

    curl ~A(x, y, z) = ( ∂A_z/∂y − ∂A_y/∂z, ∂A_x/∂z − ∂A_z/∂x, ∂A_y/∂x − ∂A_x/∂y )
where it is clear that all these operators have in common the first order partial derivatives with respect to the
Cartesian coordinates. Also the divergence reminds us of a scalar product (from vectors to scalars), while the
curl reminds us of a vector product (mixed components in a resulting vector).
We can define a vector operator (which is not technically a proper vector) that is called “del” or “nabla” and
has a symbol ∇:
∇ = ( ∂/∂x, ∂/∂y, ∂/∂z )

Using this operator we can simplify the notation for the gradient, the divergence and the curl:

(1) gradient:

    grad φ(x, y, z) = ( ∂φ/∂x, ∂φ/∂y, ∂φ/∂z ) = ∇φ = ~∇φ

(2) divergence:

    div ~A(x, y, z) = ∂A_x/∂x + ∂A_y/∂y + ∂A_z/∂z = ∇ · ~A = ~∇ · ~A

(3) curl:

    curl ~A(x, y, z) = ( ∂A_z/∂y − ∂A_y/∂z, ∂A_x/∂z − ∂A_z/∂x, ∂A_y/∂x − ∂A_x/∂y )
                    = ∇ × ~A = ~∇ × ~A
                    = | ı̂      ̂      k̂    |
                      | ∂/∂x   ∂/∂y   ∂/∂z |
                      | A_x    A_y    A_z  |

Some examples:
(i) Given the scalar field:

    φ(x, y, z) = e^x sin y z³

    the gradient is:

    ∇φ = (e^x sin y z³, e^x cos y z³, 3e^x sin y z²) = ~A

    and the divergence of the gradient is:

    ∇ · (∇φ) = ∇ · ~A = e^x sin y z³ − e^x sin y z³ + 6e^x sin y z = 6e^x sin y z

(ii) Given the vector field:

    ~A = (yz, 3xz, z)

    the curl is:

    ∇ × ~A = | ı̂      ̂      k̂    |
             | ∂/∂x   ∂/∂y   ∂/∂z |
             | yz     3xz    z    |
           = (0 − 3x) ı̂ + (y − 0) ̂ + (3z − z) k̂
           = (−3x, y, 2z)
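Both examples can be verified with symbolic partial derivatives; a minimal sympy sketch:

```python
# Sketch: divergence of the gradient of phi, and curl of A, for the examples above.
import sympy as sp

x, y, z = sp.symbols('x y z')

phi = sp.exp(x) * sp.sin(y) * z**3
grad_phi = [sp.diff(phi, v) for v in (x, y, z)]
div_grad = sum(sp.diff(g, v) for g, v in zip(grad_phi, (x, y, z)))
print(sp.simplify(div_grad))                      # -> 6 e^x z sin(y)

Ax, Ay, Az = y*z, 3*x*z, z
curl_A = (sp.diff(Az, y) - sp.diff(Ay, z),
          sp.diff(Ax, z) - sp.diff(Az, x),
          sp.diff(Ay, x) - sp.diff(Ax, y))
print(curl_A)                                     # (-3*x, y, 2*z)
```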

7.3 Properties of Gradient, Divergence and Curl


In the following sections, we will list the main properties of the three differential vector operators.

7.3.1 Properties of the Gradient

(a) Linearity:
∇(αφ + βψ) = α∇φ + β∇ψ

(b) Product rule: considering the product of two scalar fields:


∇(φψ) = φ∇ψ + ψ∇φ

(c) Product rule: considering the scalar product of two vector fields:
∇(~A · ~B) = ~A × (∇ × ~B) + ~B × (∇ × ~A) + (~A · ∇)~B + (~B · ∇)~A

(d) Chain rule: when our scalar field is a function of another scalar field:
∇φ(f(~r)) = φ′(f) ∇f(~r)

7.3.2 Properties of the Divergence

(a) Linearity:
∇ · (α ~ ~ ) = α∇ · ~
A + βB ~
A + β∇ · B

(b) Product rule: considering the product of a scalar field and a vector field:

∇ · (φ ~
A) = ∇φ · ~
A + φ∇ · ~
A

(c) Product rule: considering the vector product of two vector fields:

∇ · (~A × ~B) = (∇ × ~A) · ~B − ~A · (∇ × ~B)

7.3.3 Properties of the Curl

(a) Linearity:
∇ × (α ~ ~ ) = α∇ × ~
A + βB ~
A + β∇ × B

(b) product rule: considering the product of a scalar field and a vector field:

∇ × (φ ~
A) = (∇φ) × ~
A + φ(∇ × ~
A)

(c) product rule: considering the vector product of two vector fields:

∇ × (~A × ~B) = ~A (∇ · ~B) − ~B (∇ · ~A) + (~B · ∇)~A − (~A · ∇)~B

In the latter expression we have the operator (~A · ∇) that is worth writing out in detail. Let's write explicitly the
vector and the nabla differential operator:

(~A · ∇) = (A_x, A_y, A_z) · ( ∂/∂x, ∂/∂y, ∂/∂z ) = A_x ∂/∂x + A_y ∂/∂y + A_z ∂/∂z

where we need to keep the ordering of this dot product, as the partial differentiation has to be applied to the
field to which the whole expression is applied. As a matter of fact, in the expression above the derivation would
be applied to the vector field ~B, thus:

(~A · ∇)~B = ( A_x ∂/∂x + A_y ∂/∂y + A_z ∂/∂z ) ~B
          = ( A_x ∂B_x/∂x + A_y ∂B_x/∂y + A_z ∂B_x/∂z,
              A_x ∂B_y/∂x + A_y ∂B_y/∂y + A_z ∂B_y/∂z,
              A_x ∂B_z/∂x + A_y ∂B_z/∂y + A_z ∂B_z/∂z )

7.4 Second Order Derivations


Having three operators as first order derivatives allows for nine combinations for the second order derivation
of fields. Due to the different nature (scalar or vector) of the three first order operators, not all the nine combi-
nations will be possible. Let’s consider all of them.

(a) starting from the gradient and applying the three operators to ∇φ:

    (1) gradient of a gradient: ∇(∇φ) is not possible.
    (2) divergence of a gradient: ∇ · (∇φ) exists. This can be written also as ∇²φ or ∆φ, where the latter
        symbol is called the Laplacian (see below).
    (3) curl of a gradient: ∇ × (∇φ) = 0 always. As we will see later, a vector field ~A with a null curl is called
        irrotational. Because of the fact that ∇ × (∇φ) = 0, an irrotational vector field ~A is also conservative, i.e.
        it is possible to find a scalar field φ such that ~A = ∇φ.

(b) starting from the divergence and applying the three operators to ∇ · ~A:

    (4) gradient of a divergence: ∇(∇ · ~A) exists.
    (5) divergence of a divergence: ∇ · (∇ · ~A) is not possible.
    (6) curl of a divergence: ∇ × (∇ · ~A) is not possible.

(c) starting from the curl and applying the three operators to ∇ × ~A:

    (7) gradient of a curl: ∇(∇ × ~A) is not possible.
    (8) divergence of a curl: ∇ · (∇ × ~A) = 0 always. As we will see later, a vector field ~B with a null divergence
        is called divergenceless. Because of the fact that ∇ · (∇ × ~A) = 0, a divergenceless vector field ~B is
        also solenoidal, i.e. it is possible to find another vector field ~A such that ~B = ∇ × ~A.
    (9) curl of a curl: ∇ × (∇ × ~A) exists.

To go back to the definition of the Laplacian from the divergence of the gradient, we can write explicitly the
operator as:

∆ ≡ ∇² = ∂²/∂x² + ∂²/∂y² + ∂²/∂z²

and this can be applied to both scalar and vector fields. In the case of vector fields, we can write explicitly the
resulting operator:

∇²~A = ∆~A = ( (∂²/∂x² + ∂²/∂y² + ∂²/∂z²) A_x,
              (∂²/∂x² + ∂²/∂y² + ∂²/∂z²) A_y,
              (∂²/∂x² + ∂²/∂y² + ∂²/∂z²) A_z )
           = ( ∂²A_x/∂x² + ∂²A_x/∂y² + ∂²A_x/∂z²,
               ∂²A_y/∂x² + ∂²A_y/∂y² + ∂²A_y/∂z²,
               ∂²A_z/∂x² + ∂²A_z/∂y² + ∂²A_z/∂z² )

which is again a vector field. This is for example useful to simplify the curl of a curl, exploiting the property of
the curl of a vector product ∇ × (~A × ~B) above, substituting ∇ for ~A:

∇ × (∇ × ~A) = ∇(∇ · ~A) − ∇²~A

where as a matter of fact:

∇²~A = (∇²A_x, ∇²A_y, ∇²A_z)

and where of course each of the components of the field ~A is in general a function of (x, y, z):

A_x = A_x(x, y, z)
A_y = A_y(x, y, z)
A_z = A_z(x, y, z)
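The two "always zero" results above, ∇ × (∇φ) = 0 and ∇ · (∇ × ~A) = 0, can be checked for completely generic smooth fields with symbolic derivatives; here is a sketch with sympy (function names are illustrative):

```python
# Sketch: curl(grad phi) = 0 and div(curl A) = 0 for generic smooth fields.
import sympy as sp

x, y, z = sp.symbols('x y z')
phi = sp.Function('phi')(x, y, z)
A = [sp.Function(name)(x, y, z) for name in ('A_x', 'A_y', 'A_z')]

def grad(f):
    return [sp.diff(f, v) for v in (x, y, z)]

def div(F):
    return sum(sp.diff(Fi, v) for Fi, v in zip(F, (x, y, z)))

def curl(F):
    return [sp.diff(F[2], y) - sp.diff(F[1], z),
            sp.diff(F[0], z) - sp.diff(F[2], x),
            sp.diff(F[1], x) - sp.diff(F[0], y)]

print([sp.simplify(c) for c in curl(grad(phi))])   # [0, 0, 0]
print(sp.simplify(div(curl(A))))                   # 0
```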

8 Line and Surface Integrals


8.1 Line Integrals
Having fields that have different magnitudes and directions at each point in space, we can define integrals
that depend not only on the initial and final limits, but also on the specific path taken to connect
them.

In general, we can have different types of integrals involving vector quantities:

∫_C φ d~r (vector),     ∫_C ~A · d~r (scalar),     ∫_C ~A × d~r (vector).

where C is the curve we integrate over. We will concentrate on the middle form, which returns a scalar, as it is
the most commonly used in physics, for example, to calculate the work done by a vector field:

dW = ~F · d~r   →   W = ∫_C ~F · d~r

The definition of a line integral is similar to the definition for a regular integral:

∫_C ~A · d~r = lim_{N→∞} Σ_{i=1}^{N} ~A(x_i, y_i, z_i) · ∆~r_i     with     ∆~r_i → 0 as N → ∞

where in the sum the field has to be calculated along the curve. A line integral can be calculated also on a

Figure 39: Integrals over a curve C with position vector ~R. On the right, the small intervals from the sum
definition of the integral are shown.

closed curve and in this case we can write it as:

∮_C ~A · d~r

As already mentioned before, the integral in general will depend on the path: given two curves C 1 and C 2 both
starting in point A and finishing in point B , we will have:
∫_{C_1} ~A · d~r ≠ ∫_{C_2} ~A · d~r

The main problem with this kind of integral is how to include the curve information in the calculation. In general
the curve can be given either in Cartesian coordinates or via a parameterisation.
If the curve is given in Cartesian coordinates, we can write it as y = f(x) with x ∈ [x_A, x_B] as coordinates of
the initial A and final B points. We can rewrite the integral as:

∫_C ~A(x, y) · d~r = ∫_C ( A_x(x, y), A_y(x, y) ) · (dx, dy)
                 = ∫_{x_A}^{x_B} ( A_x(x, f(x)) + A_y(x, f(x)) f′(x) ) dx

where we have now a regular one-variable integral. Depending on the problem, we can decide to write x = g (y)
thus obtaining an integral in y.
In general, however, it is easier to describe the curve through a parameterisation and in this case, we can
write:

~r(t) = (x(t), y(t), z(t))     with     A → t = t_0,   B → t = t_1

where we have expressed the initial and final points A and B in terms of the corresponding values of the parameter t.
We need to obtain the differential position vector:

d~r = dx ı̂ + dy ̂ + dz k̂

which can now be written as:

d~r(t) = ( dx/dt ı̂ + dy/dt ̂ + dz/dt k̂ ) dt

Now we can simply put everything back into our integral:

∫_C ~A · d~r = ∫_{t_0}^{t_1} ( A_x(x(t), y(t), z(t)), A_y(x(t), y(t), z(t)), A_z(x(t), y(t), z(t)) ) · ( dx/dt, dy/dt, dz/dt ) dt
            = ∫_{t_0}^{t_1} ( A_x(t) dx/dt + A_y(t) dy/dt + A_z(t) dz/dt ) dt

where in writing the integral in this form, we want to highlight the fact that the field ~
A has to be calculated on
the curve (so substituting the expressions for the variables x, y and z as a function of t ) thus it is effectively
only function of the parameter t . The line integral is thus reduced to a regular one variable integral that we
know how to solve.
Here are two properties of line integrals:

(a) If we are travelling along curve C from point A to point B , the line integral will have the opposite sign with
respect to the line integral evaluated going from B to A along the same curve C :
∫_{A (along C)}^{B} ~A · d~r = − ∫_{B (along C)}^{A} ~A · d~r

(b) If we have a curve C that can be divided into two different curves C 1 and C 2 we can write the total line
integral over C as the sum of the line integrals over the separate curves C 1 and C 2 :
∫_{A (along C)}^{B} ~A · d~r = ∫_{A (along C_1)}^{Q} ~A · d~r + ∫_{Q (along C_2)}^{B} ~A · d~r

where we have assumed that point Q is where C 1 and C 2 intersect.

Let’s now work on one example:

(i) Consider the vector field:

    ~F = (sin x, 1, 0)

    and calculate its line integral along the curve:

    C:  ~r(t) = (t, sin t, 0)     t ∈ [0, π/2]

    So we need to calculate:

    I = ∫_C ~F · d~r = ∫_0^{π/2} (sin x(t), 1, 0) · ( dx/dt, dy/dt, dz/dt ) dt
      = ∫_0^{π/2} (sin t, 1, 0) · (1, cos t, 0) dt
      = ∫_0^{π/2} (sin t + cos t) dt
      = [ −cos t + sin t ]_0^{π/2} = 1 − (−1) = 2
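The reduction to a one-variable integral is easy to reproduce symbolically; a minimal sympy sketch of the same calculation:

```python
# Sketch: the worked line integral, reduced to a 1D integral in the parameter t.
import sympy as sp

t = sp.symbols('t')

x, y, z = t, sp.sin(t), sp.Integer(0)        # the curve r(t)
F = (sp.sin(x), 1, 0)                        # the field evaluated on the curve
dr = (sp.diff(x, t), sp.diff(y, t), sp.diff(z, t))

integrand = sum(Fi * dri for Fi, dri in zip(F, dr))
print(sp.integrate(integrand, (t, 0, sp.pi/2)))   # 2
```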

8.2 Surface Integrals
If we go up of one dimension, we need to consider surface integrals where the integral is calculated over a sur-
face (technically a two-dimensional object) rather than on a curve (that is one-dimensional in the parameter
space). Again we can define three types of combinations of vectorial objects:
∫_S φ d~S (vector),     ∫_S ~F · d~S (scalar),     ∫_S ~F × d~S (vector).

As for the line integrals, we focus on the middle type that involves vectors but returns a scalar. As for the line
integral of a vector field, this surface integral is very much used in physics as it represents the flux of the vector
field ~F through the surface S.
In this case we need to work on the differential surface element to be able to solve the integral. As a surface
can be identified in each point by its normal unit vector, the differential surface element can be written as:

d~
S = n̂ dS.

In order to decide the direction of the normal unit vector (“up” or “down”) we need to establish a convention:

• if the surface S is closed, the normal unit vector n̂ points outward


• if the surface S is open, the normal unit vector n̂ will point in the direction indicated by the right-hand
rule. The latter is shown in Figure 40. If your fingers rotate counterclockwise along the contour of the
surface, then the extended thumb points in the wanted direction.

Figure 40: The right-hand rule.

The definition of a surface integral is as usual through a sum:


∫_S ~F · d~S = lim_{N→∞} Σ_{i=1}^{N} ~F(x_i, y_i, z_i) · n̂_i ∆S_i     with     ∆S_i → 0 as N → ∞

To evaluate these types of integrals, we need to write the normal to the surface and the surface element in the
specific problem. Most of the time, we'll need to think about the surface in question and analyse whether we can
write these two elements in a simple way. We'll work an example below that shows this kind of reasoning.
However, we also aim at finding a general recipe to write d~S. Let's look at Figure 41. The area element dS
is the orange part and n̂ indicates the unit vector normal to the given area element. Now in general we can
define α as the angle between n̂ and the unit vector k̂ along z. Then we can write that the projection of dS on the
(x, y) plane is given by:

dA = cos α dS

thus:

dS = dA / cos α = dA / (n̂ · k̂)

Figure 41: Evaluation of the surface integrals.

If we have the surface expressed through a Cartesian function:

f (x, y, z) = 0

then we have recently discovered that the gradient of a scalar field is a vector perpendicular to the contour
lines or surfaces of the field, in 2d or 3d respectively. So the operation of the gradient is exactly what we need
to extract the normal vector to the given surface:

n̂ = ∇f / |∇f|

so using the latter definition to write the surface element, we get:

dS = dA / (n̂ · k̂) = ( |∇f| / (∇f · k̂) ) dA

Of course we know that, if we consider k̂ = (0, 0, 1), then the projection of the gradient along the z axis is just the
partial derivative of the scalar field with respect to z:

∇f · k̂ = ∂f/∂z

So back to our surface element, we get:

dS = ( |∇f| / (∂f/∂z) ) dA
Thus our integral becomes:

I = ∫_S ~F · d~S = ∫_S ~F · n̂ dS
  = ∫_A ~F · n̂ ( |∇f| / (∂f/∂z) ) dA
  = ∫_A ~F · ( ∇f/|∇f| ) ( |∇f| / (∂f/∂z) ) dA
  = ∫_A ( ~F · ∇f / (∂f/∂z) ) dA

where A is now the projection of the surface onto the (x, y) plane. Thus we have transformed the surface integral into a
regular integral in two variables.

Let's now work an example: calculate the surface integral

I = ∫_S ~F · d~S

where:

~F = (x, 0, 0)
f : x² + y² + z² = a²,   z > 0
As usual in this case of surfaces, we should move to spherical coordinates, keeping in mind that we can only

Figure 42: Example of calculation of a surface integral: the surface considered is the positive (z > 0) hemi-
sphere.

move over the surface of the sphere so we do not have the r variable but we can fix it to a.

x = a sin θ cos φ          θ ∈ [0, π/2]
y = a sin θ sin φ   with
z = a cos θ                φ ∈ [0, 2π]

In this case, the normal vector corresponds just to the position vector, thus:

n̂ ≡ ~r / |~r|

Our integrand becomes:

~F · n̂ dS = x (ı̂ · n̂) dS = x ( ı̂ · ~r/|~r| ) dS = x (x/a) dS

where we have used in the last step:

ı̂ = (1, 0, 0),  ~r = (x, y, z)   →   ı̂ · ~r = x

Now the surface element dS of the sphere we know from when we analysed the variable transformations in
spherical coordinates:

dS = a² sin θ dθ dφ

So back to our integrand:

~F · n̂ dS = x (x/a) dS = (x²/a) a² sin θ dθ dφ = x² a sin θ dθ dφ

So substituting x, our integral is:

∫_S ~F · d~S = a³ ∫_0^{2π} ∫_0^{π/2} sin³θ cos²φ dθ dφ
            = a³ ∫_0^{π/2} sin³θ dθ ∫_0^{2π} cos²φ dφ = 2πa³/3

where we can use what was already derived in the past exercises:

∫_0^{π/2} sin³θ dθ = 2/3

∫_0^{2π} cos²φ dφ = π

Let’s now try to apply the recipe given above. We need to start from the Cartesian form of the surface written
as f (x, y, z) = 0 and this would be:
f(x, y, z) = x² + y² + z² − a² = 0

Thus now we need to evaluate the gradient and the partial derivative with respect to z:

∇f = (2x, 2y, 2z) = 2~r

|∇f| = 2|~r| = 2a

∂f/∂z = 2z = 2√(a² − x² − y²)

ı̂ · n̂ = ı̂ · ~r/|~r| = x/a

So the integrand becomes:

I = ∫_S ~F · d~S = ∫_A x · 2x / ( 2√(a² − x² − y²) ) dA = ∫_A x² / √(a² − x² − y²) dA

which is now a regular integral in two variables. To solve this it is better to go to two-dimensional polar coor-
dinates. It is left to the students to carry out this calculation and verify that the same result as above is
obtained.
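For reference, the remaining two-variable integral can be done exactly as suggested, moving to plane polar coordinates; here is a sketch of that calculation with sympy:

```python
# Sketch: the flux integral in plane polar coordinates, x = rho*cos(phi),
# dA = rho drho dphi, over the disc rho <= a; expected result 2*pi*a^3/3.
import sympy as sp

a, rho, phi = sp.symbols('a rho phi', positive=True)

integrand = (rho*sp.cos(phi))**2 / sp.sqrt(a**2 - rho**2) * rho
I = sp.integrate(integrand, (rho, 0, a), (phi, 0, 2*sp.pi))
print(sp.simplify(I - 2*sp.pi*a**3/3))       # 0
```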

8.2.1 Vector Area of Surfaces

We can define a vector area of a surface as:

~S = ∫_S d~S
This is simply the surface integral with unit integrand. Now we can just work an example to understand better.
Let’s take again the positive (z > 0) hemisphere of:
x² + y² + z² = a²,   z > 0

The vector area is:

~S = ∫_S d~S = ∫_S r̂ dS = ∫_S a² sin θ dθ dφ r̂

where in the last step we have used the surface element already used in the previous example on surface inte-
grals and:

r̂ = (sin θ cos φ, sin θ sin φ, cos θ)

Thus the above vector area integral can be decomposed into the sum of three integrals and only the last one will
count, as the result is:

~S = ∫_S d~S = ∫_S a² sin θ dθ dφ r̂ = πa² k̂

However we are not going to solve this as there is a better way. The vector area calculation can be simplified by
transforming the surface integral into a line integral, according to the argument below.
First we need to specify one property of the vector area integrals: if the surface is closed, the vector area is
always null:

~S = ∮_S d~S = 0

This implies that the vector area of an open surface only depends on its perimeter, or boundary curve C. As a
matter of fact, if S_1 and S_2 are two surfaces with the same boundary curve C, then the surface given by S_1 − S_2
is a closed surface (the sign takes into account that we need to invert the direction of the normal vector in one
case). Thus we have:

~S = ∮_{S_1 − S_2} d~S = 0 = ∫_{S_1} d~S − ∫_{S_2} d~S

obtaining:

~S_1 = ∫_{S_1} d~S = ∫_{S_2} d~S = ~S_2

Now as the surface integral in this case only depends on the boundary curve, we can find a way to express it in
terms of a line integral around the curve C .
As the vector area is independent of the actual S considered given the same curve C, we can choose any
surface having C as perimeter. It is useful to consider the cone-like surface from the origin to C, as shown in
Figure 43.

Figure 43: Vector area calculation through a line integral.

The vector area of each of these elementary triangular regions is given by half the area of the triangle made
by the position vector ~r and its differential d~r, which is tangential to the curve:

(1/2) ~r × d~r

thus the total vector area is just the line integral of the area expression above along the whole curve C:

~S = ∫_S d~S = (1/2) ∮_C ~r × d~r

Let's apply this to the example we have introduced before of the vector area of the positive (z > 0) hemisphere:

S :  x² + y² + z² = a²,   z > 0

Now we need to define the contour curve C of our surface S and in this case this is simply the circle on the (x, y) plane:

C : x² + y² = a²

that is obtained from the sphere equation by setting z = 0. So our integral is now:

S⃗ = ∫_S dS⃗ = (1/2) ∮_C r⃗ × dr⃗
and thus we need the following ingredients to evaluate it:

r⃗ = a cos θ ı̂ + a sin θ ̂
dr⃗ = (−a sin θ ı̂ + a cos θ ̂) dθ

from which we can obtain:

            |  ı̂         ̂        k̂ |
r⃗ × dr⃗ =  |  a cos θ   a sin θ   0 | dθ = a²(cos²θ + sin²θ) k̂ dθ = a² dθ k̂
            | −a sin θ   a cos θ   0 |

and the integral becomes:

S⃗ = (1/2) ∮_C r⃗ × dr⃗ = (a²/2) ∫_0^{2π} dθ k̂ = (2π/2) a² k̂ = πa² k̂
which is our vector area of half a sphere.
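As a cross-check (a sketch assuming sympy is available, not part of the original notes), the line integral (1/2)∮_C r⃗ × dr⃗ around the boundary circle can be evaluated symbolically:

import sympy as sp

a, t = sp.symbols('a t', positive=True)
# parameterise the boundary circle C: r(t) = (a cos t, a sin t, 0)
r = sp.Matrix([a*sp.cos(t), a*sp.sin(t), 0])
dr = r.diff(t)                       # dr/dt, so dr⃗ = (dr/dt) dt
integrand = r.cross(dr) / 2          # (1/2) r⃗ × dr⃗ per unit dt
S_vec = integrand.applyfunc(lambda e: sp.integrate(e, (t, 0, 2*sp.pi)))
print(sp.simplify(S_vec))            # -> (0, 0, pi*a**2), i.e. πa² k̂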

8.2.2 Some Physics Examples

We can think of some examples from physics, examples that we will explore in more details later. For example
we can calculate the flux of the electric field through a sphere S of radius a with the charge Q in the center. The
expression for the electric field is given by:
E⃗ = Q/(4πε₀ r²) r̂

In this case again we have that the normal n̂ to the surface in question corresponds to the position vector, normalised:

n̂ = r̂
The flux is calculated with the surface integral:

Figure 44: Flux of electric field through a sphere.

Φ = ∫_S E⃗ · n̂ dS = ∫_S E⃗ · r̂ dS

where S is the usual sphere x² + y² + z² = a². Substituting the expression for the electric field:

Φ = Q/(4πε₀a²) ∫_S r̂ · r̂ dS = Q/(4πε₀a²) ∫_S dS

where in the last step we have used r̂ · r̂ = 1 and where the last integral corresponds to the area of the surface of the sphere, which is equal to 4πa²:

Φ = Q/(4πε₀a²) · 4πa² = Q/ε₀
which is the Gauss' Law. This flux does not depend on the shape of the surface we have considered but only on the source(s) included in the volume enclosed by the surface. The Gauss' Law says that the flux of E⃗ through S is 1/ε₀ times the charge contained in S.
For example if we have a wire, we can think of it as a stick/small cylinder with a charge density per unit length ρ. In this case the total charge Q will depend on the length ℓ of the wire: Q = ρℓ. So the total flux will be:

Φ = ρℓ/ε₀

Let's consider in this case a cylinder with its axis along the wire to calculate the flux with a surface integral:

Φ = ∫_S E⃗ · n̂ dS = |E⃗| ∫_S dS = |E⃗| 2πaℓ

where in the last step we have used the (lateral) area of the cylinder, which is equal to 2πaℓ. Now, solving for the electric field and using the information we have on the flux from the Gauss' law, we have that:

|E⃗| = Φ/(2πaℓ) = (ρℓ/ε₀) · 1/(2πaℓ) = ρ/(2πε₀a)

The electric field is a vector field and, as we will see later, it is conservative:

E⃗ = Q/(4πε₀ r²) r̂ = Q/(4πε₀ r³) r⃗ = −∇U

where now the potential scalar field is:

U = Q/(4πε₀ r)

but all this will be reviewed later. First let's talk a little more about conservative fields.
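As a small symbolic check of the statement E⃗ = −∇U (a sympy sketch, not part of the original notes), one can differentiate the Coulomb potential directly:

import sympy as sp

x, y, z, Q, eps0 = sp.symbols('x y z Q epsilon_0', positive=True)
r = sp.sqrt(x**2 + y**2 + z**2)
U = Q / (4*sp.pi*eps0*r)                               # Coulomb potential
E = sp.Matrix([-sp.diff(U, v) for v in (x, y, z)])     # E⃗ = -∇U
expected = Q/(4*sp.pi*eps0*r**3) * sp.Matrix([x, y, z])  # Q/(4πε₀ r³) r⃗
print(sp.simplify(E - expected))                       # -> zero vector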

9 Vector Calculus II
9.1 Conservative Fields
When we talked of line integrals (calculation of the work done by the vector field):
W = ∫_C F⃗ · dr⃗

we said that in general the result of this integration depends on the actual curve C that is travelled.
In some cases though, it can happen that the line integral is independent of the path, depending only on the initial and final points of integration. This happens if the field is conservative. We can state that:

F⃗ is a conservative vector field if and only if any of the following is true:

(a) the integral

∫_C F⃗ · dr⃗

is independent of the curve C, depending only on the points A and B where the curve starts and finishes. This is equivalent to saying that the integral on a closed curve is null:

∮_C F⃗ · dr⃗ = 0

(b) ∃ (there exists) a scalar field φ (single-valued function of position) such that:

F⃗ = ∇φ

(c) The vector field F⃗ is irrotational:

∇ × F⃗ = 0

(d) F⃗ · dr⃗ is an exact differential

Let’s go through these conditions starting from the expression for the line integral:
W = ∫_C F⃗ · dr⃗ = ∫_{t₀}^{t₁} F⃗(x(t), y(t), z(t)) · (dr⃗/dt) dt

As from the second condition above we can write:

F⃗ = ∇φ

substituting we get:

W = ∫_{t₀}^{t₁} ∇φ · (dr⃗/dt) dt
Now let's consider in detail the integrand:

∇φ · dr⃗/dt = (∂φ/∂x, ∂φ/∂y, ∂φ/∂z) · (dx/dt, dy/dt, dz/dt)
            = (∂φ/∂x)(dx/dt) + (∂φ/∂y)(dy/dt) + (∂φ/∂z)(dz/dt) = dφ/dt

where in the last step we use the fact that φ, being a scalar field evaluated along the curve, is now simply a function of the single variable t. Putting this back in our integral we get:

W = ∫_{t₀}^{t₁} ∇φ · (dr⃗/dt) dt = ∫_{t₀}^{t₁} (dφ/dt) dt = ∫_{t₀}^{t₁} dφ = φ(B) − φ(A)

where the last step shows that the integral only depends on the starting and finishing points of the path. Also it shows that:

∇φ · dr⃗ = dφ

that is, F⃗ · dr⃗ is an exact differential.
Going now in the opposite direction, we can start by assuming that the integral

W = ∫_C F⃗ · dr⃗

is independent of the path: then it only depends on the starting and finishing points of the path:

W = ∫_C F⃗ · dr⃗ = φ(B) − φ(A)

Infinitesimally this means:

F⃗ · dr⃗ = dφ

and thus F⃗ · dr⃗ is an exact differential. The exact differential of a scalar field can also be written as:

dφ = ∇φ · dr⃗

as we have seen before. So putting together the last two equations we have that

F⃗ · dr⃗ = dφ = ∇φ · dr⃗

or also

(F⃗ − ∇φ) · dr⃗ = 0

and since dr⃗ is arbitrary then it must be:

F⃗ = ∇φ

Considering this last equation, we can use what we already know: the curl of a gradient is always null:

∇ × ∇φ = 0 = ∇ × F⃗

hence the curl of our conservative vector field F⃗ is also null. Thus F⃗ is irrotational.
The scalar field φ is called a scalar potential function of the conservative vector field F⃗ and it is unique, up to an additive constant.
Let's work out an example: given the vector field:

F⃗ = (2y + 1, 2x − 1, 2z)

is it conservative? If so, find the potential.


First we want to check whether the vector field is conservative, and we are going to do it by calculating the curl and seeing if it is null:

           | ı̂        ̂        k̂    |
∇ × F⃗ =  | ∂/∂x     ∂/∂y     ∂/∂z  | = ı̂(0 − 0) − ̂(0 − 0) + k̂(2 − 2) = 0⃗
           | 2y + 1   2x − 1   2z    |

So the field is conservative; now we can calculate φ from:

F⃗ = ∇φ   →   (F_x, F_y, F_z) = (∂φ/∂x, ∂φ/∂y, ∂φ/∂z)

This last equality gives us three equations. Let's start from the first one:

F_x = ∂φ/∂x = 2y + 1

and integrating we get:

φ = ∫ (∂φ/∂x) dx = ∫ (2y + 1) dx = 2xy + x + f(y, z)
where in this case the "constant" of integration can be a function of both the remaining variables. Now consider the second equation:

F_y = ∂φ/∂y = 2x + ∂f/∂y = 2x − 1

where we have substituted the expression for the potential φ obtained above. Integrating, we get the expression for the function f(y, z):

f(y, z) = ∫ (∂f/∂y) dy = ∫ (−1) dy = −y + g(z)

where again the constant can still be a function of z. The expression for the potential φ after this second step is:

φ = 2xy + x − y + g(z)

Using the latter and the third equation, we get:

F_z = ∂φ/∂z = 0 + ∂g/∂z = 2z

and integrating:

g(z) = ∫ (∂g/∂z) dz = ∫ 2z dz = z² + k

where now we can just have a pure constant k as the remaining degree of freedom. So, putting all the pieces together, our potential function is:

φ = 2xy + x − y + z² + k
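As a quick check of this worked example (a sympy sketch, not part of the original notes), one can verify both that the curl vanishes and that the candidate potential reproduces F⃗:

import sympy as sp
from sympy.vector import CoordSys3D, curl, gradient

N = CoordSys3D('N')
x, y, z = N.x, N.y, N.z
F = (2*y + 1)*N.i + (2*x - 1)*N.j + 2*z*N.k
print(curl(F))                      # -> 0 (zero vector), so F is conservative
phi = 2*x*y + x - y + z**2          # candidate potential (additive constant k dropped)
print(gradient(phi) - F)            # -> 0, confirming F = ∇φ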

9.2 Solenoidal Fields
A vector field B⃗ is solenoidal if it is divergenceless:

∇ · B⃗ = 0

This means that it is always possible to define a vector field A⃗ such that:

B⃗ = ∇ × A⃗

Also, if A⃗ is a vector field that satisfies the above, then the vector field A⃗′ defined as:

A⃗′ = A⃗ + ∇ψ + C⃗

(where ψ is any scalar function and C⃗ is a constant vector) will also satisfy the condition B⃗ = ∇ × A⃗′. As a matter of fact, it is always:

∇ × ∇ψ = 0

and of course ∇ × C⃗ = 0, so those two extra terms give no extra contribution when calculating the curl of the vector potential. Thus the vector potential A⃗ is defined up to the gradient of a scalar field and a constant vector.

9.3 Divergence Theorem


The divergence theorem is very useful to simplify some integrals relating surface integrals to more regular
volume integrals. This theorem connects the total flux of a vector field out of a closed surface S to the integral
of the divergence of the vector field itself over the enclosed volume. If you remember we defined the divergence
as a measure of the flux per unit volume, thus integrating over the whole volume gives us the total flux as from
the surface integral. The theorem is written as:
Φ = ∮_S F⃗ · dS⃗ = ∫_V ∇ · F⃗ dV

To understand this, consider a closed surface and look at infinitesimal cubes within the volume enclosed by the surface, as (somewhat) shown in Figure 45. Since ∇ · F⃗ is the flux per unit volume and considering an

Figure 45: The divergence theorem: flux going through the infinitesimal volumes inside the surface.

infinitesimal cube, ∇ · F⃗ dV is the total flux through the surface enclosing the small volume element dV that we represented as a cube.
Consider neighbouring elements (the other nearby cubes): if we add the flux through the surface elements
of two neighbouring volume elements, the contributions through the common face cancel each other since an
inward flux of one element is an outward flux for the other element.
If we now look at the whole volume, which is entirely occupied by these infinitesimal volumes, then we need to sum the fluxes to obtain the total one, and the only non-null contributions to this total flux will come from the volume elements with one side on the surface. These sides on the surface are the only ones that will not have a counterpart contributing with an opposite-sign flux, so they are the only ones contributing to the total flux

through the entire volume. The flux through a surface element is written as F⃗ · n̂ dS. If we then integrate over the whole surface we get the surface integral above and thus the divergence theorem.
Let’s immediately work an example: evaluate the following surface integral
I = ∫_S F⃗ · dS⃗

where we define:

F⃗ = (y − x, x²z, z + x²)
S : x² + y² + z² = a²,  z > 0
This integral can be calculated directly using what we learned so far but we would need a lot of algebra. So let’s
try instead with the divergence theorem. For the divergence theorem it is important to have a closed surface
that encloses a volume. So we need to define a closed surface appropriate for our problem. Let's consider the surface S′:

S′ = S + S₁

where S is our surface defined above and S₁ is the circular area on the (x, y) plane: x² + y² ≤ a². This last area allows us to close the initial surface. The new closed surface S′ encloses a volume V that is half a sphere. Now we can apply the divergence theorem:

∫_V ∇ · F⃗ dV = ∮_{S′} F⃗ · dS⃗ = ∫_S F⃗ · dS⃗ + ∫_{S₁} F⃗ · dS⃗

We can now calculate the divergence of our vector field:

∇ · F⃗ = (−1 + 0 + 1) = 0

So if we go back to our integral we get:

∫_V ∇ · F⃗ dV = 0 = ∫_S F⃗ · dS⃗ + ∫_{S₁} F⃗ · dS⃗

thus the integral we are interested in is:

I = ∫_S F⃗ · dS⃗ = − ∫_{S₁} F⃗ · dS⃗

where now we have to deal with a much easier integral, as the surface S₁ is just the disc on the (x, y) plane. The normal to the surface is simply the unit vector along the z axis but pointing down, in order to be outward with respect to the volume. The vector field has to be evaluated on the (x, y) plane (z = 0):

F⃗ = (y − x, 0, x²)
n̂ = −k̂
dS⃗ = −k̂ dxdy

Thus our integral is:

I = − ∫_{S₁} F⃗ · dS⃗ = + ∫_{S₁} (y − x, 0, x²) · (0, 0, 1) dA = ∫_{S₁} x² dxdy

where now the integral has become a regular integral in two variables. Given that we are integrating on a circle,
we move to polar coordinates (remembering to include the Jacobian):
I = ∫_0^a ∫_0^{2π} (r² cos²φ) r dr dφ = [r⁴/4]_0^a ∫_0^{2π} cos²φ dφ = πa⁴/4

where in the last step we used:

∫_0^{2π} cos²φ dφ = ∫_0^{2π} (1 + cos 2φ)/2 dφ = π
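As a numerical cross-check of this divergence-theorem shortcut (a scipy sketch, not part of the original notes, taking a = 1), the original surface integral over the hemisphere can be evaluated directly and compared with πa⁴/4:

import numpy as np
from scipy import integrate

a = 1.0
def integrand(theta, phi):
    # point on the hemisphere and outward normal n̂ = r̂
    x = a*np.sin(theta)*np.cos(phi)
    y = a*np.sin(theta)*np.sin(phi)
    z = a*np.cos(theta)
    n = np.array([x, y, z]) / a
    F = np.array([y - x, x**2 * z, z + x**2])
    return np.dot(F, n) * a**2 * np.sin(theta)   # F·n̂ dS in spherical coordinates

val, err = integrate.dblquad(integrand, 0, 2*np.pi, 0, np.pi/2)
print(val, np.pi * a**4 / 4)   # both ≈ 0.7854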

9.4 Green’s Theorem
Green’s theorem is simply the two-dimensional version of the divergence theorem. Consider a two-dimensional
planar region R bounded by some closed curve C and consider the line integral of a vector field over the curve:
∮_C F⃗ · n̂ dr

At any point of the curve the differential vector:

r = dx ı̂ + dy ̂
d~

is tangent to the curve C. We are interested in the vector perpendicular to the curve (this would correspond to

Figure 46: Vector tangent to a curve (a circle just for simplicity).

the n̂ vector perpendicular to the surface, but in this 2-dimensional case), and it can be obtained by the cross product of the vector dr⃗ tangent to the curve and the basis unit vector along the z axis, which is perpendicular to the (x, y) plane:

                  | ı̂    ̂    k̂ |
n̂ dr = dr⃗ × k̂ = | dx   dy    0 | = dy ı̂ − dx ̂
                  | 0    0     1 |
We can now rewrite the line integral as:
∮_C F⃗ · n̂ dr = ∮_C (F_x, F_y) · (dy, −dx) = ∮_C (F_x dy − F_y dx)

Applying the divergence theorem, this latter integral is equal to:

∮_C (F_x dy − F_y dx) = ∬_R (∂F_x/∂x + ∂F_y/∂y) dxdy

where the last integral is the integral of the 2-dimensional divergence over the region R enclosed by the curve
C. It is useful to rewrite the latter equation, rearranging the minus sign (as it can be found in books), with F_x → Q and F_y → −P:

∮_C (P dx + Q dy) = ∬_R (∂Q/∂x − ∂P/∂y) dxdy

Now the differential form P dx + Q dy can be seen as a 2-dimensional scalar product if we define the vector field F⃗ = (P, Q).
In addition, Green's theorem can be used for example to calculate the area of the region R. It is enough to find a vector field F⃗ such that:

∂Q/∂x − ∂P/∂y = 1

so that the right-hand side of the integral above turns out to be just the area A of the region R:

A = ∬_R dxdy

and this can also be calculated using Green's theorem in case it is easier. If we consider for example F⃗ = (0, x), where P = 0 and Q = x:

∂Q/∂x − ∂P/∂y = ∂x/∂x − ∂0/∂y = 1 − 0 = 1

then applying Green's theorem, we would have:

A = ∬_R dxdy = ∮_C Q dy = ∮_C x dy
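As an illustration (a sympy sketch, not part of the original notes), the area of a disc of radius a follows from this line integral with the usual circle parameterisation:

import sympy as sp

a, t = sp.symbols('a t', positive=True)
# boundary of a disc of radius a: x = a cos t, y = a sin t
x = a*sp.cos(t)
y = a*sp.sin(t)
area = sp.integrate(x * sp.diff(y, t), (t, 0, 2*sp.pi))   # ∮_C x dy
print(sp.simplify(area))   # -> pi*a**2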

9.5 Stokes’ theorem


The Stokes’ theorem is the curl analog of the divergence theorem: it relates the integral of the curl of a vector
field over an open surface S to the line integral of the vector field around the perimeter C bounding the surface.
Let’s start from the curl of an infinitesimal closed curve as in Figure 47 on the left. The contribution to the

Figure 47: Left: Curl representation on an infinitesimal closed curve. Right: cancellation of the contributions
to the curl from the internal elements of the surface

curl from this infinitesimal surface is:

∇ × F⃗ · dS⃗ = Σ_i F⃗_i · dr⃗_i

where it is simply dS = dxdy, but we need to consider the vectorial nature of this infinitesimal surface, while the right-hand side considers the side-by-side contributions to the curl: it takes the projections of the vector field along the sides of that surface, summing over the four contributions.
If we now map the whole surface of infinitesimal surfaces (like in Figure 47 on the right), the contribution
to the curl from a side of one infinitesimal rectangle will be cancelled by the opposite-signed contribution
to the curl coming from the same side travelled in the opposite direction when considering the neighbour
rectangle. Because of this effect, if we integrate over the whole surface, only the infinitesimal sides on the edge
of the surface (thus on the curve C ) will not have a counterpart and will contribute to the total result. The total
circulation will be the integral of the curl over the surface and is equal to the integral of the vector field over the curve bounding S:

∫_S ∇ × F⃗ · dS⃗ = ∮_C F⃗ · dr⃗

Let’s apply this in an example.


(i) Verify Stokes' theorem for the vector field F⃗ and surface S given as:

F⃗ = (y, −x, z)
S : x² + y² ≤ a²,  z = 0

Figure 48: Surface S of example (i) to verify the Stokes' theorem.

where the surface is simply a circular disk on the (x, y) plane. We want to use Stokes' theorem:

∫_S ∇ × F⃗ · dS⃗  =  ∮_C F⃗ · dr⃗
        ↑                 ↑
       LHS        =      RHS

Let's start from the left-hand side (LHS), calculating the curl of the vector field:

           | ı̂      ̂      k̂    |
∇ × F⃗ =  | ∂/∂x   ∂/∂y   ∂/∂z  | = (0, 0, −1 − 1) = (0, 0, −2)
           | y      −x     z     |

Then we have to consider the vectorial surface element:

dS⃗ = n̂ dS = k̂ dS

where in the last step we use the fact that the normal to our surface is simply the unit vector along z: k̂ = (0, 0, 1). Thus:

LHS = ∫_S (0, 0, −2) · (0, 0, 1) dS = ∫_S (−2) dS = −2 ∫_S dS = −2πa²

where in the last step we have just substituted the integral over the surface (which corresponds to the area of the given surface) with the area of the circle of radius a.
Now let's move to the right-hand side (RHS): we need to parameterise C:

r⃗(t) = (a cos t, a sin t, 0)   with t ∈ [0, 2π]
dr⃗ = (−a sin t, a cos t, 0) dt

Then F⃗ has to be evaluated on the curve:

F⃗(r⃗(t)) = (a sin t, −a cos t, 0)

and then we need F⃗ · dr⃗:

F⃗ · dr⃗ = (a sin t, −a cos t, 0) · (−a sin t, a cos t, 0) dt = (−a² sin²t − a² cos²t) dt = −a² dt

Now substituting in the RHS integral:

RHS = ∮_C F⃗ · dr⃗ = ∫_0^{2π} (−a²) dt = −2πa²

which verifies Stokes' theorem.
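As a symbolic cross-check of both sides (a sympy sketch, not part of the original notes):

import sympy as sp
from sympy.vector import CoordSys3D, curl

N = CoordSys3D('N')
F = N.y*N.i - N.x*N.j + N.z*N.k
print(curl(F))                       # -> (-2) k̂, as found above

a, t, r, p = sp.symbols('a t r phi', positive=True)
# RHS: line integral around the circle x^2 + y^2 = a^2
rhs = sp.integrate((a*sp.sin(t))*(-a*sp.sin(t)) + (-a*sp.cos(t))*(a*sp.cos(t)), (t, 0, 2*sp.pi))
# LHS: integral of (curl F)·k̂ = -2 over the disc, in polar coordinates (Jacobian r)
lhs = sp.integrate(sp.integrate(-2*r, (r, 0, a)), (p, 0, 2*sp.pi))
print(lhs, rhs)                      # both -2*pi*a**2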

9.6 Physics Applications of Divergence and Stokes’ Theorems
Exploiting the divergence and Stokes’ theorems we can obtain Maxwell’s equations.

(a) Let’s start from the electric field and Coulomb’s law: the expression for the electric field is given by:

E⃗ = Q/(4πε₀ r²) r̂

The electric field is a conservative field, thus E⃗ = −∇U and the potential U is:

U = Q/(4πε₀ r)

We want to calculate the flux of the electric field through a sphere S of radius a with the charge Q at the centre. In this case again we have that the normal n̂ to the surface in question corresponds to the position vector, normalised:

n̂ = r̂

The flux is calculated with the surface integral:

Figure 49: Flux of electric field through a sphere.

Φ = ∮_S E⃗ · n̂ dS = ∮_S E⃗ · r̂ dS

where S is the usual sphere x² + y² + z² = a². Substituting the expression for the electric field:

Φ = Q/(4πε₀a²) ∮_S r̂ · r̂ dS = Q/(4πε₀a²) ∮_S dS

where in the last step we have used r̂ · r̂ = 1 and where the last integral corresponds to the area of the surface of the sphere, which is equal to 4πa²:

Φ = Q/(4πε₀a²) · 4πa² = Q/ε₀
which is the Gauss' Law. This flux does not depend on the shape of the surface we have considered but only on the source(s) included in the volume enclosed by the surface. The Gauss' Law says that the flux of E⃗ through S is 1/ε₀ times the charge contained in S.
Starting from the expression for the flux:

Φ = ∮_S E⃗ · n̂ dS = Q/ε₀

we can apply the divergence theorem to the LHS expression:

Φ = ∮_S E⃗ · n̂ dS = ∫_V ∇ · E⃗ dV

while for the RHS expression we can introduce the charge density ρ(r⃗):

Q/ε₀ = (1/ε₀) ∫_V ρ dV

Thus we get:

∫_V ∇ · E⃗ dV = (1/ε₀) ∫_V ρ dV

where we have integrals over the same volume V so we can equate the integrands:

∇ · E⃗ = ρ/ε₀

which is the first Maxwell equation.

(b) Let's continue with the Biot-Savart law giving the magnetic field as:

dB⃗ = (µ₀ I/4π) (dl⃗ × r̂)/r²

where dl⃗ goes in the direction of the current I and again we have an inverse square law with respect to the distance from the source. If we consider the flux of the magnetic field through a closed surface, again we need to calculate the surface integral:

Φ_B = ∮_S B⃗ · n̂ dS

where again, if we consider a sphere with the source at its centre, n̂ = r̂. Thus the vectorial part of the integrand is:

dB⃗ · n̂ ∝ (dl⃗ × r̂) · r̂ = 0

where the last step comes from the fact that we have a triple product where two of the vectors are identical. Then if we again apply the divergence theorem:

Φ_B = ∮_S B⃗ · n̂ dS = ∫_V ∇ · B⃗ dV = 0

this gives:

∇ · B⃗ = 0

which is Maxwell's second equation.

(c) Next we use Faraday's law about the induced electro-motive force (EMF): we define the flux through an open surface:

Φ_B = ∫_S B⃗ · dS⃗

then the EMF is:

EMF = −dΦ_B/dt

but the EMF is also defined as the work done by the electric field E⃗:

EMF = ∮_C E⃗ · dr⃗

Putting things together:

∮_C E⃗ · dr⃗ = −dΦ_B/dt = −(d/dt) ∫_S B⃗ · dS⃗

we can apply Stokes' theorem to the line integral:

∮_C E⃗ · dr⃗ = ∫_S ∇ × E⃗ · dS⃗

while we can swap the spatial integration and the time derivative of the magnetic field:

−(d/dt) ∫_S B⃗ · dS⃗ = − ∫_S (∂B⃗/∂t) · dS⃗

Thus we get:

∫_S ∇ × E⃗ · dS⃗ = − ∫_S (∂B⃗/∂t) · dS⃗

where again we have an integral over the same surface in both terms. The two integrands have to be identical:

∇ × E⃗ = −∂B⃗/∂t

which is Maxwell's third equation.

(d) Finally we consider Ampere's law: the circulation of the magnetic field is:

∮_C B⃗ · dr⃗ = µ₀ I

where we can visualise the current and the magnetic field as in Figure 50. If we consider the line integral

Figure 50: Left: Current generating a magnetic field. Right: Volume of a disk for the calculation of the current density flux.

above and we apply Stokes' theorem, we get:

∮_C B⃗ · dr⃗ = ∫_S ∇ × B⃗ · dS⃗

while, as before, in the second part of the equation, we can introduce the electric current density j⃗, which corresponds to a charge density times velocity:

µ₀ I = µ₀ ∫_S j⃗ · dS⃗

So Ampere's law becomes:

∫_S ∇ × B⃗ · dS⃗ = µ₀ ∫_S j⃗ · dS⃗

and equating the integrands:

∇ × B⃗ = µ₀ j⃗

which is the first half of the fourth Maxwell's equation. Let's take the divergence of both sides of this last equation:

∇ · (∇ × B⃗) = µ₀ ∇ · j⃗
     ↑              ↑
   LHS = 0      RHS ≠ 0

where the LHS is always null due to the properties of vector double differentiation. But as the RHS cannot be always null, we are missing a piece of this equation. We can recover it by considering that charge cannot appear or disappear, it just flows between different spatial regions. Thus, considering a volume (in this case we can think of a disk as in Figure 50 on the right), the change of charge inside the volume V corresponds to minus the charge flowing out of the volume through its surface S due to the current density j⃗. This latter statement can be written mathematically as:

dQ/dt = − ∮_S j⃗ · dS⃗

In the LHS expression, Q can again be written as a volume integral of the charge density, while in the RHS we can apply the divergence theorem:

dQ/dt = (d/dt) ∫_V ρ dV = ∫_V (∂ρ/∂t) dV

− ∮_S j⃗ · dS⃗ = − ∫_V ∇ · j⃗ dV

We get:

∫_V (∂ρ/∂t) dV = − ∫_V ∇ · j⃗ dV

and thus:

∇ · j⃗ = −∂ρ/∂t
Now going back to our initial equation:

∇ · (∇ × B⃗) = 0 = µ₀ (∇ · j⃗ + X)

we now have a candidate expression for X:

X = −∇ · j⃗ = ∂ρ/∂t = ∂/∂t (ε₀ ∇ · E⃗) = ε₀ ∇ · (∂E⃗/∂t)

where we have substituted the first Maxwell's equation for ρ and then we have swapped the time and spatial derivatives. We now have:

∇ · (∇ × B⃗) = ∇ · (µ₀ j⃗ + µ₀ε₀ ∂E⃗/∂t)

thus:

∇ × B⃗ = µ₀ j⃗ + µ₀ε₀ ∂E⃗/∂t

which is the fourth Maxwell's equation.

To summarise, Maxwell's equations are:

∇ · E⃗ = ρ/ε₀                              ∇ · E⃗ = 0
∇ · B⃗ = 0                                  ∇ · B⃗ = 0
∇ × E⃗ = −∂B⃗/∂t                            ∇ × E⃗ = −∂B⃗/∂t
∇ × B⃗ = µ₀ j⃗ + µ₀ε₀ ∂E⃗/∂t                 ∇ × B⃗ = µ₀ε₀ ∂E⃗/∂t
where on the right, the case of free space is shown. In this latter case of no sources, we can obtain the equation for electromagnetic waves. We start from the third Maxwell's equation and apply a curl to both terms. The first term can be developed as:

∇ × (∇ × E⃗) = ∇(∇ · E⃗) − ∇²E⃗ = −∇²E⃗

where in the last step we used the first equation in free space, ∇ · E⃗ = 0. The second term is:

∇ × (−∂B⃗/∂t) = −∂/∂t (∇ × B⃗) = −∂/∂t (µ₀ε₀ ∂E⃗/∂t) = −µ₀ε₀ ∂²E⃗/∂t²

where the fourth equation is used to substitute ∇ × B⃗. Equating the two terms so modified we get:

∇²E⃗ = µ₀ε₀ ∂²E⃗/∂t² = (1/c²) ∂²E⃗/∂t²

which is the wave equation. The solutions to this equation are waves propagating at the speed of light.

10 Matrices
A matrix is an array of objects with rows and columns. A generic matrix can be written as:
    ( a11  a12  a13  ...  a1q )
    ( a21  a22  a23  ...  a2q )
A = ( a31  a32  a33  ...  a3q )
    ( ...  ...  ...  ...  ... )
    ( ap1  ap2  ap3  ...  apq )
where a_ij are called the elements of the matrix; the first index i is the row index, while j is the column index. The matrix shape is described by the number of rows × the number of columns, so in this case:

rows × columns → p × q

as we have p rows and q columns. The matrix can also be indicated as:

A = {a_ij}

Some examples of matrices are:

( 1  2 )
( 3  4 )

(  cos θ   sin θ )
( −sin θ   cos θ )

where these are 2 × 2 matrices, or in general n × n with n = 2; these are square matrices.

( 1  2 )
( 3  4 )
( 5  6 )

which is a 3 × 2 rectangular matrix (3 rows and 2 columns).

( 1  2  3  4 )
( 5  6  7  8 )

which is another rectangular 2 × 4 matrix (2 rows and 4 columns).
Vectors can also be represented as matrices: they can be written as a row vector:

v = ( 1  2  3 )

which is a 1 × 3 matrix. Or we can have column vectors:

( 1 )
( 2 )
( 3 )

which is a 3 × 1 matrix. Of course vectors can also have higher dimensions, like 1 × n or n × 1.

10.1 Operations on Matrices
We now define the fundamental basic operations on matrices:

(a) Sum of two matrices. Let’s define the two matrices:


A = {a_ij},  l × m
B = {b_ij},  p × q

We can define the sum C = A + B if and only if:

l = p  and  m = q

thus if the two matrices have the same dimensions. If this is the case, then the sum is:

C = A + B = {a_ij + b_ij} = {c_ij}

thus the sum is performed element by element:

            ( a11+b11  a12+b12  a13+b13  ...  a1q+b1q )
            ( a21+b21  a22+b22  a23+b23  ...  a2q+b2q )
C = A + B = ( a31+b31  a32+b32  a33+b33  ...  a3q+b3q )
            ( ...      ...      ...      ...  ...     )
            ( ap1+bp1  ap2+bp2  ap3+bp3  ...  apq+bpq )

One example:

(i) Sum the matrices:

A = ( 1  2 )      B = ( 2  4 )
    ( 3  4 )          ( 6  8 )

The sum is:

C = A + B = ( 3   6 )
            ( 9  12 )
(b) Matrix equality. Consider again the two matrices:
A = {a_ij},  l × m
B = {b_ij},  p × q

The equality A = B is true if and only if:

a_ij = b_ij  ∀ i, j      and      l = p, m = q

thus if the two matrices have identical elements and the same dimensions. Two examples:

(i) comparison between two sets of matrices:

A = ( 2  0 )  ≠  ( 2  2 ) = B
    ( 0  2 )     ( 0  2 )

A = ( 2  1 )  =  ( 2  1 ) = B
    ( 3  2 )     ( 3  2 )
(c) Multiplication of matrices. It is performed through what is called an inner product, as it involves a sum over an internal index.
It is similar to the scalar product of vectors, so let's remind ourselves of the definition of the scalar product:

u⃗ · v⃗ = (u_x, u_y, u_z) · (v_x, v_y, v_z) = u_x v_x + u_y v_y + u_z v_z = Σ_{k=1}^{3} u_k v_k

If we write the vectors naming the elements with the matrix convention we have:

U = ( u11  u12  u13 )

and

    ( v11 )
V = ( v21 )
    ( v31 )

Then the scalar product between vectors can be rewritten as:

W = UV = Σ_{k=1}^{3} u_1k v_k1 = w11

where now we sum over one index while the other two stay fixed. In this case the product of {u_ij} and {v_ij} gives a scalar-like result. If we extend to generic size matrices:

A = {a_ij},  l × m
B = {b_ij},  p × q

we can define the product C = AB if and only if:

m = p

thus the number of columns of A is the same as the number of rows of B. The product matrix C will have dimensions l × q and elements:

C = {c_ij} = Σ_{k=1}^{m} a_ik b_kj = a_ik b_kj
where in the last step we introduce the convention of summing over repeated indices. This product is also called "row by column" because effectively, given the indices i and j, the element i, j of the product matrix is given by the scalar product of the row i of matrix A and the column j of matrix B.
Some examples:

(i) Calculate the product AB of

( 1  2 )( 5  6 ) = ( 1×5+2×7  1×6+2×8 ) = ( 19  22 )
( 3  4 )( 7  8 )   ( 3×5+4×7  3×6+4×8 )   ( 43  50 )

Inverting now the order of the matrices in the product:

( 5  6 )( 1  2 ) = ( 5×1+6×3  5×2+6×4 ) = ( 23  34 )
( 7  8 )( 3  4 )   ( 7×1+8×3  7×2+8×4 )   ( 33  46 )

Thus we can deduce that for matrices in general AB ≠ BA (the product is non-commutative); a short numerical check of this is sketched after this list.

(ii) Now multiply a square 2 × 2 matrix with a column vector:

( 1  3 )( 2 ) = ( 1×2+3×5 ) = ( 17 )
( 7  3 )( 5 )   ( 7×2+3×5 )   ( 29 )

This (2 × 2)(2 × 1) gives a product matrix 2 × 1.

(iii) Another product:

( a  b )( 1  2 ) = ( a + 3b   2a + 4b )
        ( 3  4 )

This (1 × 2)(2 × 2) gives a product matrix 1 × 2.

(iv) Finally:

( 1  2 )( a  b ) = ( a + 2c   b + 2d  )
( 3  4 )( c  d )   ( 3a + 4c  3b + 4d )
( 5  6 )           ( 5a + 6c  5b + 6d )

where (3 × 2)(2 × 2) gives a product matrix 3 × 2.
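As a quick cross-check of these hand calculations (a numpy sketch, not part of the original notes):

import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print(A @ B)            # [[19 22] [43 50]]
print(B @ A)            # [[23 34] [33 46]]  -> AB != BA
v = np.array([[2], [5]])
print(np.array([[1, 3], [7, 3]]) @ v)   # [[17] [29]], a (2x2)(2x1) = 2x1 product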

10.1.1 Properties of Matrix Multiplication

(a) Non-commutative:
AB ≠ BA
(b) Associative:
ABC = (AB )C = A(BC )
(c) Distributive with respect to the sum:

(A + B )C = AC + BC or C (A + B ) = C A +C B

It is also valid with the multiplication by a scalar λ:

λ(AB ) = (λA)B or (AB )λ = A(B λ) = (λA)B

where the product by a scalar is defined as:

λA = λ{a i j } = {λa i j }

so each element of the matrix A is multiplied by the scalar λ.

10.2 Special Matrices


(a) Null matrix. It is a matrix with all null elements.

O = {o i j }

with o i j = 0 ∀ i , j . It is such that:


A +O = O + A = A ∀ A
For example in the case of a square 3 × 3 matrix:
 
0 0 0
 0 0 0 
0 0 0

(b) Identity matrix: it is a square matrix with non-null element only on the diagonal and unit values on the
diagonal.
I n = E = {e i j }
where n is the dimension of the square matrix (i.e. n × n) and the element e i j is defined as:
e_ij = δ_ij = 1 for i = j,  0 for i ≠ j

The identity matrix is such that:


I A = AI = A ∀ A
For example, with a 3 × 3 matrix:
 
1 0 0
I3 =  0 1 0 
0 0 1

(c) Diagonal Matrices: these are square matrices with non-null terms only on the diagonal.

D = {d i j }

where we can write:

d_ij = d_ij δ_ij = d_ij for i = j,  0 for i ≠ j
For example in a 3 × 3 matrix:
 
a 0 0
D = 0 b 0 
0 0 c

(d) Triangular matrices. We can have an upper triangular matrix:

Tu = {t_ij}  with  t_ij = 0 for i > j

or a lower triangular matrix:

Tl = {t_ij}  with  t_ij = 0 for i < j

Thus, in general we have for square matrices p × p:

     ( t11  t12  t13  ...  t1p )
     (  0   t22  t23  ...  t2p )
Tu = (  0    0   t33  ...  t3p )
     ( ...  ...  ...  ...  ... )
     (  0    0    0   ...  tpp )

     ( t11   0    0   ...   0  )
     ( t21  t22   0   ...   0  )
Tl = ( t31  t32  t33  ...   0  )
     ( ...  ...  ...  ...  ... )
     ( tp1  tp2  tp3  ...  tpp )

10.3 More Matrix Operations


(a) Transpose of a matrix A is defined as A T :

A = {a i j } → A T = {a j i }

thus consisting simply in swapping rows with columns. If we have a p × q matrix A, then the transpose A T
is q × p. Some examples:

(i)
A = ( 1  2 )   →   A^T = ( 1  3 )
    ( 3  4 )              ( 2  4 )

(ii)
A = ( 3 )   →   A^T = ( 3  1 )
    ( 1 )

(iii)
    ( 1  2 )
A = ( 3  4 )   →   A^T = ( 1  3  5 )
    ( 5  6 )              ( 2  4  6 )
(b) Complex Conjugate. Given a matrix A

A = {a i j } → A ∗ = {a i∗j }

thus the complex conjugate matrix is just the matrix with all complex conjugate elements with respect to
the original matrix. One example:

(i)
A = ( 1   2+i )   →   A* = (  1   2−i )
    ( i   3   )            ( −i   3   )
(c) Hermitian Conjugate. Given a matrix A

A = {a i j } → A † = {a ∗j i }

where A † is the Hermitian conjugate matrix and it is called A “dagger”. The Hermitian conjugate matrix is
the matrix transpose of the complex conjugate of the original matrix or, equivalently, the complex conju-
gate of the transpose of the original matrix:

A † = (A ∗ )T = (A T )∗

One example:

(i)
A = ( 1   2+i )   →   A† = ( 1     −i )
    ( i   3   )            ( 2−i    3 )

(ii)
A = (  0        e^{iθ} )
    ( e^{−iθ}    0     )

→ A* = (  0        e^{−iθ} )
       ( e^{iθ}     0      )

→ A† = (  0        e^{iθ} ) = A
       ( e^{−iθ}    0     )

where in this case the Hermitian conjugate is identical to the original matrix. In this case the matrix A is called Hermitian.

10.4 Determinant of a Square Matrix


The determinant of a matrix is defined only for square matrices, i.e. p × p matrix. The determinant is a scalar
quantity (not necessarily positive) that carries information about the elements of the matrix and its properties.
It is indicated by straight bars:

    ( a11  a12  a13  ...  a1p )                 | a11  a12  a13  ...  a1p |
    ( a21  a22  a23  ...  a2p )                 | a21  a22  a23  ...  a2p |
A = ( a31  a32  a33  ...  a3p )   →   det A =   | a31  a32  a33  ...  a3p |
    ( ...  ...  ...  ...  ... )                 | ...  ...  ...  ...  ... |
    ( ap1  ap2  ap3  ...  app )                 | ap1  ap2  ap3  ...  app |

To calculate the determinant, we need to define the concepts of minor and cofactor of an element of a
matrix:
(a) minor M i j of the element a i j of the matrix p × p A: it is the determinant of the (p − 1) × (p − 1) matrix
obtained by removing all the elements of the i th row and the j th column of A. Let’s consider a 3 × 3 matrix
A:

    ( a11  a12  a13 )
A = ( a21  a22  a23 )
    ( a31  a32  a33 )

The minor M23 of element a23, for example, is the determinant of the 2 × 2 matrix obtained by removing the 2nd row and the 3rd column:

M23 = | a11  a12 |
      | a31  a32 |

(b) cofactor C_ij of the element a_ij: it is the minor M_ij multiplied by (−1)^{i+j}:

C_ij = (−1)^{i+j} M_ij

Using the definitions above, we can now define the determinant of the matrix A as:

det A = Σ_j a_ij C_ij   ∀ i
      = Σ_i a_ij C_ij   ∀ j

This formalism means the following:

• we choose to start from a specific row, fixing i, and then the determinant is calculated by summing over the column index j from 1 to p: Σ_j a_ij C_ij.

• or we can choose to start from a specific column, fixing j this time, and then the determinant is calculated by summing over the row index i from 1 to p: Σ_i a_ij C_ij.

Let's apply this definition to a simple 2 × 2 matrix A:

A = ( a11  a12 )
    ( a21  a22 )

We can use the second column, to experience a different calculation with respect to what we have used so far: the index j is fixed to 2 and we have to sum over the row index i:

det A = Σ_i a_i2 C_i2 = a12 C12 + a22 C22
      = a12 (−1)^{1+2} M12 + a22 (−1)^{2+2} M22
      = −a12 a21 + a22 a11

where in the last step we used the fact that the minor of an element of a 2 × 2 matrix is the determinant of a 1 × 1 matrix. The determinant of a 1 × 1 matrix is the determinant of a scalar and corresponds to the scalar itself:

A = (a11) → det A = a11

Similarly we can do the calculation for a 3 × 3 matrix:

    ( a11  a12  a13 )
A = ( a21  a22  a23 )
    ( a31  a32  a33 )

We can use the second row for example: the index i is fixed to 2 and we have to sum over the column index j:

det A = Σ_j a_2j C_2j = a21 C21 + a22 C22 + a23 C23
      = a21 (−1)^{2+1} M21 + a22 (−1)^{2+2} M22 + a23 (−1)^{2+3} M23
      = −a21 | a12  a13 ; a32  a33 | + a22 | a11  a13 ; a31  a33 | − a23 | a11  a12 ; a31  a32 |
      = −a21 (a12 a33 − a13 a32) + a22 (a11 a33 − a13 a31) − a23 (a11 a32 − a12 a31)
      = −a21 a12 a33 + a21 a13 a32 + a22 a11 a33 − a22 a13 a31 − a23 a11 a32 + a23 a12 a31

Another way to define the determinant is through the Levi-Civita symbol ε_{i1 i2 i3 ... ip}, where all the i's are natural integers. This symbol assumes values as follows:

                      +1  if (i1 i2 i3 ... ip) is an even permutation of (1, 2, 3, ..., p)
ε_{i1 i2 i3 ... ip} =  −1  if (i1 i2 i3 ... ip) is an odd permutation of (1, 2, 3, ..., p)
                       0  otherwise, for example if two indices are identical, as in (1, 1, 3, ..., p)

The determinant is then defined as:

det A = Σ_{i1 i2 i3 ... ip} ε_{i1 i2 i3 ... ip} a_{1 i1} a_{2 i2} a_{3 i3} ... a_{p ip}

where in general this sum is made of p^p terms, of which p! are non-null. Let's see how to apply this formula in the two cases we already considered above, with 2 × 2 and 3 × 3 matrices. Let's start with the 2 × 2 matrix and from the Levi-Civita symbol, which will have only 2 indices, (i1 i2 i3 ... ip) → (i1 i2):

ε12 = +1       as it is an even permutation of (1, 2)
ε21 = −1       as it is an odd permutation of (1, 2)
ε11 = ε22 = 0  as there are 2 identical indices

Thus we have 2² = 4 terms of which 2 are non-null. Then we can apply the formula for the determinant:

det A = Σ_{i1 i2} ε_{i1 i2} a_{1 i1} a_{2 i2}
      = ε11 a11 a21 + ε12 a11 a22 + ε21 a12 a21 + ε22 a12 a22
      = a11 a22 − a12 a21

where in the last step we substituted the values of the Levi-Civita symbols obtained above. Similarly we can proceed for the 3 × 3 matrices, where the indices of the Levi-Civita symbol become 3:

ε123 = ε231 = ε312 = +1   as they are all even permutations of (1, 2, 3)
ε213 = ε132 = ε321 = −1   as they are all odd permutations of (1, 2, 3)
ε112 = ε223 = ... = 0     as there are identical indices

Thus we have 3³ = 27 terms of which 3! = 6 are non-null. Now we can calculate the determinant of the 3 × 3 matrix considering only the 6 non-null terms above:

det A = Σ_{i1 i2 i3} ε_{i1 i2 i3} a_{1 i1} a_{2 i2} a_{3 i3}
      = ε123 a11 a22 a33 + ε231 a12 a23 a31 + ε312 a13 a21 a32 + ε213 a12 a21 a33 + ε132 a11 a23 a32 + ε321 a13 a22 a31
      = a11 a22 a33 + a12 a23 a31 + a13 a21 a32 − a12 a21 a33 − a11 a23 a32 − a13 a22 a31

which, reordering the terms, is identical to what we found above.


Some examples follow:

(i) Calculate the determinant of matrix A:

    ( a  0  b )
A = ( 0  c  0 )
    ( d  0  e )

We can calculate the determinant using the usual method, i.e. using the first row (i = 1) and applying the definition:

det A = Σ_j a_1j C_1j

for j going from 1 to 3:

det A = a | c  0 ; 0  e | − 0 | 0  0 ; d  e | + b | 0  c ; d  0 | = ace − bcd

(ii) Calculate the determinant of matrix A:

    ( 1  2  3 )
A = ( 4  5  6 )
    ( 7  8  9 )

We can calculate the determinant using different rows or columns: let's start from the second row, i.e. fixing i = 2 and then applying

det A = Σ_j a_2j C_2j

det A = (−1)^{2+1} 4 | 2  3 ; 8  9 | + (−1)^{2+2} 5 | 1  3 ; 7  9 | + (−1)^{2+3} 6 | 1  2 ; 7  8 |
      = −4(18 − 24) + 5(9 − 21) − 6(8 − 14) = 24 − 60 + 36 = 0

where in terms of minors and cofactors we have:

C21 = (−1)^{2+1} M21 = −| 2  3 ; 8  9 | = −(18 − 24) = 6
C22 = (−1)^{2+2} M22 = +| 1  3 ; 7  9 | = (9 − 21) = −12
C23 = (−1)^{2+3} M23 = −| 1  2 ; 7  8 | = −(8 − 14) = 6

Now let's try using the third column, i.e. fixing j = 3 and then applying

det A = Σ_i a_i3 C_i3

det A = (−1)^{1+3} 3 | 4  5 ; 7  8 | + (−1)^{2+3} 6 | 1  2 ; 7  8 | + (−1)^{3+3} 9 | 1  2 ; 4  5 |
      = 3(32 − 35) − 6(8 − 14) + 9(5 − 8) = −9 + 36 − 27 = 0

where in terms of minors and cofactors we have:

C13 = (−1)^{1+3} M13 = +| 4  5 ; 7  8 | = (32 − 35) = −3
C23 = (−1)^{2+3} M23 = −| 1  2 ; 7  8 | = −(8 − 14) = 6
C33 = (−1)^{3+3} M33 = +| 1  2 ; 4  5 | = (5 − 8) = −3
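A quick numerical cross-check of these two expansions (a numpy sketch, not part of the original notes):

import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(np.linalg.det(A))   # ~ 0 (up to rounding), as found by both cofactor expansions

# the rows of this matrix are linearly dependent (row3 - row2 = row2 - row1),
# which is why the determinant vanishes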

10.5 Trace of a Square Matrix


Consider a generic p × p square matrix:
A = {a i j }.
We define the trace of the square matrix A as the sum of the diagonal terms:
Tr[A] = Σ_{k=1}^{p} a_kk

For example:

(i) Calculate the trace of matrix A:  


    ( 1  2  3 )
A = ( 4  5  6 )
    ( 7  8  9 )
and summing just the elements of the diagonal, we get:

Tr[A] = 1 + 5 + 9 = 15

10.6 More on Special Matrices
(a) Square null matrix: p × p square matrix with all null elements.

O = {o i j }

with o i j = 0 ∀ i , j . The determinant of the square null matrix is null:

detO = 0

(b) Identity matrix: it is a square matrix with non-null unit-value element only on the diagonal. The determi-
nant of the identity matrix is 1:
det I = 1
where this can be obtained from the determinant of a diagonal matrix (see below).

(c) Diagonal matrices: p × p square matrix


D = {d i j }
is diagonal if only diagonal elements d kk are non-null. The determinant of diagonal matrices is calculated
by the product of all the diagonal elements:
det D = Π_{k=1}^{p} d_kk

(d) Triangular matrices: p × p square matrix


T = {t i j }
where upper triangular matrices have t i j = 0 if i > j , and lower triangular matrices have t i j = 0 if i < j .
The determinant of triangular matrices is calculated similarly to the determinant of diagonal ones by the
product of all the diagonal elements:
det T = Π_{k=1}^{p} t_kk

(e) The transpose A T of a matrix A is obtained by swapping rows and columns: the determinant of the trans-
pose matrix is the same as the determinant of the original matrix:

det A T = det A

(f) The complex conjugate A ∗ of a matrix A is made of the complex conjugate of the elements of the original
matrix A: the determinant of the complex conjugate A ∗ matrix is the complex conjugate of the determi-
nant of the original matrix:
det A ∗ = (det A)∗

(g) The Hermitian conjugate A † of a matrix A is the complex conjugate of the transpose of the original matrix
A:
A † = (A ∗ )T = (A T )∗
The determinant of the Hermitian conjugate A † matrix is the complex conjugate of the determinant of the
original matrix:
det A † = det(A ∗ )T = det(A ∗ ) = (det A)∗
where in the second step we have applied the property of the determinant of the transpose, while in the
third step we used the property of the determinant of the complex conjugate of a matrix.

10.7 Inversion of a Matrix
We can define an inverse matrix A −1 of a square matrix as:

A A −1 = A −1 A = I

and if ∃A −1 (if the inverse A −1 exists), then it is unique. We can start from some examples, address the general
case of 2 × 2 matrices and then proceed to define a general recipe to calculate the inverse of a square matrix.

(a) The inverse of the 2 × 2 identity matrix:

I = ( 1  0 )
    ( 0  1 )

In general we can write down the definition, treating the inverse as a generic matrix:

I I^{-1} = ( 1  0 )( a  b ) = ( 1  0 ) = I
           ( 0  1 )( c  d )   ( 0  1 )

( 1  0 )( a  b ) = ( a  b )   →   a = 1, b = 0, c = 0, d = 1
( 0  1 )( c  d )   ( c  d )

thus we get that:

I^{-1} = ( 1  0 ) = I
         ( 0  1 )

(b) The inverse of the 2 × 2 diagonal matrix:

D = ( 2  0 )
    ( 0  3 )

D D^{-1} = ( 2  0 )( a  b ) = ( 1  0 ) = I
           ( 0  3 )( c  d )   ( 0  1 )

( 2  0 )( a  b ) = ( 2a  2b )   →   2a = 1, 2b = 0, 3c = 0, 3d = 1
( 0  3 )( c  d )   ( 3c  3d )

thus we have b = c = 0, a = 1/2 and d = 1/3. The inverse of the diagonal matrix is the diagonal matrix with inverse elements:

D^{-1} = ( 1/2   0  )
         (  0   1/3 )

(c) The inverse of a 2 × 2 matrix:

A = ( 4  3 )
    ( 3  2 )

A A^{-1} = ( 4  3 )( a  b ) = ( 1  0 ) = I
           ( 3  2 )( c  d )   ( 0  1 )

( 4  3 )( a  b ) = ( 4a + 3c   4b + 3d )   →   4a + 3c = 1,  4b + 3d = 0,  3a + 2c = 0,  3b + 2d = 1
( 3  2 )( c  d )   ( 3a + 2c   3b + 2d )

The latter matrix gives us two sets of equations:

3a + 2c = 0,  4a + 3c = 1   →   a = −(2/3)c,   −(8/3)c + 3c = (−8 + 9)c/3 = 1   →   c = 3,  a = −2

Similarly:

4b + 3d = 0,  3b + 2d = 1   →   b = −(3/4)d,   −(9/4)d + 2d = (−9 + 8)d/4 = 1   →   d = −4,  b = 3

Thus the inverse matrix is:

A^{-1} = ( −2   3 )
         (  3  −4 )

In general, given a generic p × p matrix A, the inverse A^{-1} is given by:

A^{-1} = (1/det A) C^T

where C^T is the transpose of the cofactor matrix C, which is the matrix having all the cofactors as elements:

C = {C_ij = (−1)^{i+j} M_ij}

The inverse A^{-1} of a matrix A is defined if and only if det A ≠ 0. Let's first consider the case of a generic 2 × 2 matrix, applying the definition: given matrix A:

A = ( a11  a12 )
    ( a21  a22 )

first we calculate the determinant:

det A = a11 a22 − a12 a21

and then we calculate the cofactor matrix:

C = ( C11  C12 ) = ( +M11  −M12 ) = (  a22  −a21 )
    ( C21  C22 )   ( −M21  +M22 )   ( −a12   a11 )

where in the last step we substituted the values of the minors in this case. Then the transpose is:

C^T = (  a22  −a12 )
      ( −a21   a11 )

Thus the inverse matrix is:

A^{-1} = 1/(a11 a22 − a12 a21) (  a22  −a12 )
                               ( −a21   a11 )
In the 2 × 2 case the inverse matrix is obtained by switching the diagonal terms and changing the sign of the
off-diagonal ones. We obtained the same in the example above but we need to consider the determinant in
that case:

det A = | 4  3 ; 3  2 | = 8 − 9 = −1

Thus:

A^{-1} = −1 (  2  −3 ) = ( −2   3 )
            ( −3   4 )   (  3  −4 )
In the case of a 3 × 3 matrix, let’s work an example:

(i) calculate the inverse of matrix A:

    ( 1  2  3 )
A = ( 0  1  0 )
    ( 3  2  1 )

First, we calculate the determinant choosing the second row:

det A = 1 · (−1)^{2+2} (1 − 9) = −8

and then we calculate all the cofactors row by row:

C11 = +| 1  0 ; 2  1 | = 1      C12 = −| 0  0 ; 3  1 | = 0      C13 = +| 0  1 ; 3  2 | = −3
C21 = −| 2  3 ; 2  1 | = 4      C22 = +| 1  3 ; 3  1 | = −8     C23 = −| 1  2 ; 3  2 | = 4
C31 = +| 2  3 ; 1  0 | = −3     C32 = −| 1  3 ; 0  0 | = 0      C33 = +| 1  2 ; 0  1 | = 1

We have obtained the cofactor matrix C:

    (  1   0  −3 )
C = (  4  −8   4 )
    ( −3   0   1 )

So the inverse of A becomes:

              (  1   4  −3 )
A^{-1} = −1/8 (  0  −8   0 )
              ( −3   4   1 )
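A quick numerical cross-check of this worked example (a numpy sketch, not part of the original notes):

import numpy as np

A = np.array([[1, 2, 3], [0, 1, 0], [3, 2, 1]])
print(np.linalg.det(A))        # -8
print(np.linalg.inv(A))        # matches -1/8 * C^T computed above
print(A @ np.linalg.inv(A))    # identity matrix (up to rounding)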

Let’s now mention some properties of the inverse of a matrix:

(1) The inverse of the inverse of a matrix is equal to the matrix itself:

(A −1 )−1 = A

(2) The inverse of the transpose is the transpose of the inverse:

(A T )−1 = (A −1 )T

(3) The inverse of the Hermitian conjugate is the Hermitian conjugate of the inverse:

(A † )−1 = (A −1 )†

(4) The inverse of a matrix product is the commuted product of the inverse matrices:

(AB )−1 = B −1 A −1

and this is true also for the product of more than two matrices:

(ABC . . . M )−1 = M −1 . . .C −1 B −1 A −1

10.8 More Properties of Matrix Operations


(1) The transpose of the product is the commuted product of the transposes:

(AB )T = B T A T

(2) The complex conjugate of the product is the product of the complex conjugates:

(AB )∗ = A ∗ B ∗

(3) The Hermitian conjugate of the product is the commuted product of the Hermitian conjugates:

(AB )† = B † A †

10.9 More Special matrices


The definitions of the operations of transposition and Hermitian conjugation of a matrix allow us to define
more special matrices:

(a) Symmetric matrices: a matrix S is symmetric if it is identical to its transpose:

S T = S = {s i j } so si j = s j i ∀ i , j

For example:
    ( 1  2  3   4 )
S = ( 2  5  6   7 )
    ( 3  6  8   9 )
    ( 4  7  9  10 )

A matrix A is defined anti-symmetric if

A^T = −A,   i.e.   a_ij = −a_ji   ∀ i, j

This definition also implies that the diagonal elements have to be null:

a_kk = −a_kk → a_kk = 0

For example:

    (  0   1   2  3 )
A = ( −1   0   4  5 )
    ( −2  −4   0  6 )
    ( −3  −5  −6  0 )
If p × p matrix A is anti-symmetric, then det A = 0 if p is odd (see below in the properties of the determi-
nant).

(b) Orthogonal matrices: a matrix O is defined orthogonal if it satisfies:

O T O = OO T = I → O T = O −1

As it will seen below the determinant of an orthogonal matrix is:

detO = ±1

For example, an orthogonal matrix we already considered is the 2-dimensional rotation matrix:

R = (  cos α   sin α )
    ( −sin α   cos α )

(c) Hermitian Matrices: a matrix H is defined Hermitian if it is identical to its Hermitian conjugate:

H † = H = {h i j } so h i j = h ∗j i

A matrix can also be anti-Hermitian if:

H † = −H so h i j = −h ∗j i

The inverse of a Hermitian or anti-Hermitian matrix is still Hermitian or anti-Hermitian:

(H −1 )† = (H † )−1 = H −1

A symmetric matrix can be seen as a special case of Hermitian matrix in case we have a real matrix (a matrix
with real elements).

(d) Unitary matrix: a matrix U is defined unitary if it satisfies:

UU † = U †U = I → U † = U −1

An orthogonal matrix can be seen as a special case of unitary matrix in case we have a real matrix (a matrix
with real elements).
A famous example of unitary matrix in particle physics is the so called CKM2 matrix representing the mix-
ing between quark families: the CKM matrix is 3 × 3 and unitary.

(e) Normal Matrix: a matrix N is defined normal if it commutes with its Hermitian conjugate:

N N † = N †N

Hermitian, unitary, symmetric and orthogonal are all normal matrices and the inverse N −1 of a normal
matrix is still normal.
2 CKM stands for Cabibbo, Kobayashi and Maskawa, the three theorists who developed the idea of quark mixing.

An example of a Hermitian and unitary matrix is:

U = 1/√2 (  0    1+i )
         ( 1−i    0  )

Let's first write the Hermitian conjugate of the matrix U:

U* = 1/√2 (  0    1−i )
          ( 1+i    0  )

→ U† = (U*)^T = 1/√2 (  0    1+i ) = U
                     ( 1−i    0  )

where in the last step we notice that the Hermitian conjugate matrix U† is identical to the original matrix U, and thus our U matrix is Hermitian. Now let's also calculate UU†:

UU† = 1/2 (  0    1+i )(  0    1+i )
          ( 1−i    0  )( 1−i    0  )

    = 1/2 ( (1+i)(1−i)       0       ) = 1/2 ( 2  0 ) = I
          (     0        (1−i)(1+i)  )       ( 0  2 )

where we have proved that this U matrix is also unitary, as the definition is verified.
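The same two properties can be checked numerically (a numpy sketch, not part of the original notes):

import numpy as np

U = np.array([[0, 1 + 1j], [1 - 1j, 0]]) / np.sqrt(2)
print(np.allclose(U, U.conj().T))              # True: U is Hermitian
print(np.allclose(U @ U.conj().T, np.eye(2)))  # True: U is unitary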

10.10 Properties of the Determinant


We are now going to list the properties of the determinant. Some of them we have already mentioned and most
of them will be extremely useful for the next topics.
(1) the determinant of the transpose A T is identical to the determinant of the original matrix A:

det A T = det A

(2) the determinant of the complex conjugate A ∗ is the complex conjugate of the determinant of the original
matrix A:
det A ∗ = (det A)∗

(3) the determinant of the Hermitian conjugate A † is the complex conjugate of the determinant of the original
matrix A:
det A† = det((A*)^T) = det A* = (det A)*

(4) interchanging two rows or columns changes the sign of the determinant while its magnitude stays the same. Given matrix A

    ( a  b  c )
A = ( d  e  f )
    ( g  h  i )

swapping the first two columns, we get:

    ( b  a  c )
B = ( e  d  f )
    ( h  g  i )

it is:

det A = − det B
The two rows or columns do not have to be adjacent. A simple proof is the following: consider a p × p matrix A written as a column of rows (shown here as a list of rows for compactness):

A = ( R₁, ..., R_i, ..., R_k, ..., R_p )

In order to swap R_i and R_k we can first add the k-th row to the i-th row:

A′ = ( R₁, ..., R_i + R_k, ..., R_k, ..., R_p )

Then subtract the i-th row of the resulting matrix from the k-th row:

A″ = ( R₁, ..., R_i + R_k, ..., R_k − (R_i + R_k) = −R_i, ..., R_p )

Then add the k-th row of the resulting matrix to the i-th row:

A‴ = ( R₁, ..., (R_i + R_k) + (−R_i) = R_k, ..., −R_i, ..., R_p )

Multiply the k-th row of the resulting matrix by −1:

A⁗ = ( R₁, ..., R_k, ..., R_i, ..., R_p )

By the property 7) below all operations of this procedure except the very last one do not change the de-
terminant. The last operation changes the sign of the determinant according to property 5) below with
λ = −1.

(5) If all the elements of a single row (or column) of matrix B are multiplied by a common factor λ, then this
factor can be taken out of the determinant calculation and the value of the determinant is given by the
product of the determinant of A and λ:

    ( a11  a12  ...  a1p )
A = ( a21  a22  ...  a2p )
    ( ...                )
    ( ap1  ap2  ...  app )

then defining:

    ( λa11  a12  ...  a1p )
B = ( λa21  a22  ...  a2p )
    ( ...                 )
    ( λap1  ap2  ...  app )

then we have:

det B = λ det A

5a) if all elements of one row (or column) are null, then:

det A = 0

(it is easy to see this by considering property (5) above with λ = 0).

5b) if every element is scaled by λ, i.e. for the matrix it is:

         ( λa11  λa12  ...  λa1p )
C = λA = ( λa21  λa22  ...  λa2p )
         ( ...                   )
         ( λap1  λap2  ...  λapp )

then the determinant becomes:

det C = det(λA) = λ^p det A

where the matrix is p × p.
5c) If the matrix A is anti-symmetric then we have that:

det A = 0   if p is odd

If we consider the property defining anti-symmetric matrices:

A^T = −A

then we have:

det A = det A^T = det(−A) = (−1)^p det A

using property 5b with λ = −1. If p is even, (−1)^p = 1:

det A = det A

which is obviously verified and does not tell us much.
If instead p is odd, then (−1)^p = −1, so:

det A = −det A ⇒ det A = 0

(6) Identical rows or columns: if any two rows or columns are identical or scaled (multiple of one another),
then:
det A = 0
This can be understood by considering property (4) as by exchanging two rows (or columns) the determi-
nant has to change sign: if the two rows (or columns) are identical, then again:

det A = − det A ⇒ det A = 0

(7) Adding a constant multiple of one row (or column) to another: the determinant is unchanged in value
by adding to the elements of one row (or column) any fixed multiple of the elements of another row (or
column). If C_j is substituted by:

C′_j = C_j + β C_k

the determinant does not change. Let's see an example: consider matrix A

    ( 2  1  2 )
A = ( 1  2  0 )
    ( 3  5  6 )

whose determinant, expanded along the second row, is:

det A = −1 | 1  2 ; 5  6 | + 2 | 2  2 ; 3  6 | = −1(6 − 10) + 2(12 − 6) = 16

Now we apply the following transformation on the second column:

C′₂ = C₂ − 2C₁

we get:

     ( 2  1 − 2·2 = −3  2 )
A′ = ( 1  2 − 2·1 =  0  0 )
     ( 3  5 − 2·3 = −1  6 )

and the determinant of A′ can now be very easily calculated using the second row:

det A′ = (−1)^{2+1} · 1 · | −3  2 ; −1  6 | = −1(−18 + 2) = 16

(8) The determinant of a matrix product AB if the two matrices A and B are square matrices of the same size
is given by:
det(AB ) = det(A) det(B ) = det(B A)

8a) the latter property can be extended to more than two matrices:

det(ABC ... M) = det(A) det(B) det(C) ··· det(M) = det(M ... BA)

(9) Instead, for the sum of two matrices:

det(A + B ) 6= det(A) + det(B )

(10) For the determinant of an orthogonal matrix, we start from the definition:

OO T = O T O = I

and using the properties of the determinants listed above, we can obtain:

det(OO T ) = det(O) det(O T ) = (det(O))2 = det I = 1

that is:
detO = det(O T ) = ±1

(11) The determinant of a unitary matrix is also obtained by using the properties above and the definition of
unitary matrix:
UU † = U †U = I
Thus:
det(UU † ) = detU detU † = detU (detU )∗ = | detU |2 = det I = 1
that gives:
detU (detU )∗ = 1 → | detU | = 1
The determinant of a unitary matrix is a complex number that has to have a unit modulus.

11 Systems of Simultaneous Linear Equations


We will now apply some of what learned about matrices and their properties to solve systems of linear equa-
tions. In general, let’s consider the following p equations in q unknowns:


a11 x1 + a12 x2 + ··· + a1q xq = b1
a21 x1 + a22 x2 + ··· + a2q xq = b2
...
ap1 x1 + ap2 x2 + ··· + apq xq = bp

If b i = 0, ∀i , then the system is called homogeneous (where ∀ means “for each”). Otherwise the system is
non-homogeneous. The system can have:

1. no solutions
2. one unique solution
3. ∞ solutions
We will use matrix analysis to distinguish between these three possibilities. As a matter of fact, the set of
equations above can be expressed as a simple matrix equation. We can define a matrix whose elements are the
coefficients of the unknowns:
    ( a11  a12  ...  a1q )
A = ( a21  a22  ...  a2q )
    ( ...                )
    ( ap1  ap2  ...  apq )

Then we define the unknowns as a column vector and the b coefficients as another column vector:

    ( x1 )          ( b1 )
x = ( x2 )      b = ( b2 )
    ( .. )          ( .. )
    ( xq )          ( bp )
where x is a q-dimensional vector, while b is a p-dimensional vector. Thus the system above can be written as
the equation:
Ax = b
The matrix A represents a transformation from x to b, thus it transforms a vector in a q-dimensional space
into a vector in a p-dimensional space. For future reference, let’s define V the q-dimensional space and W the
p-dimensional space so we have:
x ∈V b ∈W
The operator A will map any vector in the q-dimensional space V into some subspace of the p-dimensional W
space. This subspace can also be the entire space W . It is called the “range” of A and it has dimensions equal
to the rank R(A) of matrix A. If there exists some q-dimensional vector y ∈ V such that:

Ay = 0

then there exists a subspace of V that is mapped into the null p-dimensional vector 0 ∈ W . Any vector, y, that
satisfies Ay = 0 lies in the particular subspace defined as the “null” space of A. The dimension of this null space
is defined as the “nullity” of A. The nullity N (A) is such that:

R(A) + N (A) = q

thus the sum of the rank and the nullity gives back the dimension of the V space.
Let’s go back to try to classify the various types of systems. To do this we need to define the “augmented”
matrix M :
    ( a11  a12  ...  a1q  b1 )
M = ( a21  a22  ...  a2q  b2 )
    ( ...                    )
    ( ap1  ap2  ...  apq  bp )
which is a p × (q + 1) matrix and it is obtained by adding to A one column made up of the b i coefficients, thus
the b column vector. Considering now the ranks of A and M we can classify our system:
1. If the ranks of A and M are different, R(A) 6= R(M ), then there is no solution of the system
2. if A and M have the same rank r , R(A) = R(M ) = r , then the system of equations will have either one
unique solution or ∞ solutions.
(a) if the rank is equal to the dimension of the V space, r = q, then there is one unique solution to the
system.
(b) instead, if the rank of A is smaller than the dimension of the V space, r < q, then there are ∞
solutions of the system. These ∞ solutions span the (q − r ) space, thus this space of solutions
corresponds to the null space. As a matter of fact, if a vector x satisfies Ax = b and another vector y
satisfies Ay = 0 (thus y ∈ null space of A), then it is:

A(x + y) = Ax + Ay = b + 0 = b

so vector x + y is also a solution of our initial system.

If we put what was said above in the context of a homogeneous set of linear equations Ay = 0, we can consider that
this set always has the trivial solution
y 1 = y 2 = y 3 = ... = y q = 0
and if r = q then the trivial null solution is the only solution. If instead r < q, there can be also non-null
solutions as we have ∞ solutions to our system. These ∞ solutions form the null space of A with dimensions
q −r.
Of course in the case where p < q, it is necessarily r < q, so we will have ∞ solutions: this corresponds to the fact that if there are fewer equations than unknowns, we have ∞ solutions.

11.1 Special Case: Square Matrices


We are going to consider in particular the case of A being a p × p square matrix: this corresponds to a system
of p linear equations in p unknowns. In this case, we can revisit the conditions given above on the number of
expected solutions of our given system. The condition r = p, i.e. the rank of the matrices A and M is equal to
the dimension p of the V space, corresponds to requiring:

det A ≠ 0 → one unique solution

Instead, if det A = 0, we can have either no solution or ∞ solutions, depending on the ranks of A and M being
different or identical, respectively.
In the case of a homogeneous system of equations with det A ≠ 0, the one unique solution is the trivial one, x = 0, while if we have det A = 0 we always fall into the case of ∞ solutions and we can have non-null x satisfying
the system (albeit not completely determined). In the case of the homogeneous system of equations, the
augmented matrix M will differ from the matrix A by only a column of zeroes, hence the rank of A will always
be identical to the rank of M .

11.2 Methods to Solve Simultaneous Linear Equations


In the special case described above of a system of p linear equations in p unknowns having a unique solution (det A ≠ 0), we can list a number of ways to get to said unique solution. They all involve a good amount of calculation, but there are cases in which one or the other offers specific advantages, and we are free to choose whichever method is most convenient.

11.2.1 Direct Inversions

If the matrix A is square and det A ≠ 0, then the inverse A⁻¹ exists and is unique, as we saw in the matrix theory.
Then we can just invert the initial system:

Ax = b → A −1 Ax = A −1 b → x = A −1 b

where in the second step, we have multiplied both sides of the equation from left by the inverse A −1 and then
we exploit the definition that A⁻¹A = I. The column vector x found in this way is the unique solution to the system (if b = 0, then only the null solution exists).
Let’s work immediately one example that we are going to carry on through all the methods:

(i) Given the following system of linear equation:



 2x + 4y + 3z = 4

x − 2y − 2z = 0

−3x + 3y + 2z = −7

prove that it has a unique solution and find it.


We start writing the equation in matrix form:
    
$$\begin{pmatrix} 2 & 4 & 3 \\ 1 & -2 & -2 \\ -3 & 3 & 2 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 4 \\ 0 \\ -7 \end{pmatrix}$$
where we are going to call the 3 × 3 matrix A as usual. We need to check that the determinant of the matrix A is not zero to be sure that the system has a unique solution:
$$\det A = \begin{vmatrix} 2 & 4 & 3 \\ 1 & -2 & -2 \\ -3 & 3 & 2 \end{vmatrix} = 2(-4+6) - 4(2-6) + 3(3-6) = 4 + 16 - 9 = 11$$
thus the determinant is non-null and we expect one single solution. Now we need to calculate the inverse of the matrix A, using the formula we have found in the past lectures:
$$A^{-1} = \frac{1}{\det A}\, C^T$$
where C is the co-factor matrix and C^T is the transpose of C. We leave out the calculation of the inverse of the matrix and give directly the result:
$$A^{-1} = \frac{1}{11}\begin{pmatrix} 2 & 1 & -2 \\ 4 & 13 & 7 \\ -3 & -18 & -8 \end{pmatrix}$$
We can now insert this inverse matrix into the equation above and perform the matrix product x = A⁻¹b:
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \frac{1}{11}\begin{pmatrix} 2 & 1 & -2 \\ 4 & 13 & 7 \\ -3 & -18 & -8 \end{pmatrix}\begin{pmatrix} 4 \\ 0 \\ -7 \end{pmatrix} = \frac{1}{11}\begin{pmatrix} 2\cdot 4 + 1\cdot 0 - 2\cdot(-7) \\ 4\cdot 4 + 13\cdot 0 + 7\cdot(-7) \\ -3\cdot 4 - 18\cdot 0 - 8\cdot(-7) \end{pmatrix} = \frac{1}{11}\begin{pmatrix} 22 \\ -33 \\ 44 \end{pmatrix} = \begin{pmatrix} 2 \\ -3 \\ 4 \end{pmatrix}$$
thus obtaining the solution for the system:
$$x = 2, \qquad y = -3, \qquad z = 4$$
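The same inversion can be cross-checked numerically; the following is just a sketch, assuming numpy is available (the hand calculation above remains the intended method):

```python
import numpy as np

A = np.array([[2.0, 4.0, 3.0],
              [1.0, -2.0, -2.0],
              [-3.0, 3.0, 2.0]])
b = np.array([4.0, 0.0, -7.0])

A_inv = np.linalg.inv(A)      # exists because det A = 11 != 0
x = A_inv @ b                 # x = A^{-1} b
print(np.linalg.det(A))       # approximately 11
print(x)                      # [ 2. -3.  4.]
```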

11.2.2 Cramer’s Rule

An alternative way to solve this kind of system is called Cramer's Rule. We are going to derive it for a 3 × 3 matrix; however, it can be generalised to square matrices of any dimension. Let's start from our usual generic system of 3 equations in 3 unknowns:

a 11 x 1 + a 12 x 2 + a 13 x 3 = b 1

a 21 x 1 + a 22 x 2 + a 23 x 3 = b 2

a 31 x 1 + a 32 x 2 + a 33 x 3 = b 3

which corresponds to the now usual matrix equation:

Ax = b

Now we can exploit one of the properties of the determinant of a matrix: the determinant det A will not change
by adding to a column (or a row) another column (or row) multiplied by a scalar. Now we are going to apply
this property by adding to the first column the combination:
$$\frac{x_2}{x_1}\cdot(\text{second column}) \;+\; \frac{x_3}{x_1}\cdot(\text{third column})$$
Thus effectively we are going to perform the following substitution:
$$c_1 \;\to\; c_1 + \frac{x_2}{x_1}\,c_2 + \frac{x_3}{x_1}\,c_3$$
where we have defined c_j as the j-th column. If we perform this operation the determinant does not change, so we consider the determinant of the matrix A:
$$\det A = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = \begin{vmatrix} a_{11} + \frac{x_2}{x_1}a_{12} + \frac{x_3}{x_1}a_{13} & a_{12} & a_{13} \\ a_{21} + \frac{x_2}{x_1}a_{22} + \frac{x_3}{x_1}a_{23} & a_{22} & a_{23} \\ a_{31} + \frac{x_2}{x_1}a_{32} + \frac{x_3}{x_1}a_{33} & a_{32} & a_{33} \end{vmatrix}$$

Now we can notice that, if we go back to our initial system and we divide all the equations by x 1 we get:

$$\begin{cases} a_{11} + \dfrac{x_2}{x_1}a_{12} + \dfrac{x_3}{x_1}a_{13} = \dfrac{b_1}{x_1} \\[2ex] a_{21} + \dfrac{x_2}{x_1}a_{22} + \dfrac{x_3}{x_1}a_{23} = \dfrac{b_2}{x_1} \\[2ex] a_{31} + \dfrac{x_2}{x_1}a_{32} + \dfrac{x_3}{x_1}a_{33} = \dfrac{b_3}{x_1} \end{cases}$$
where all the left-hand sides are exactly the elements of the new first column in the modified matrix. We can thus substitute b_i/x_1 in the first column, with i running over the row index:
$$\det A = \begin{vmatrix} \frac{b_1}{x_1} & a_{12} & a_{13} \\ \frac{b_2}{x_1} & a_{22} & a_{23} \\ \frac{b_3}{x_1} & a_{32} & a_{33} \end{vmatrix}$$

We can now make use again of one of the properties of the determinant. Reminder: if all the elements of a
single row (or column) of matrix B are multiplied by a common factor λ, then this factor can be taken out of
the determinant calculation and the value of the determinant is given by the product of the determinant of A
and λ (det B = λ det A). In this case we have that the common factor multiplying all the elements of the first
column is:
$$\lambda = \frac{1}{x_1}$$
and we can take it out of the determinant, thus:
$$\det A = \frac{1}{x_1}\begin{vmatrix} b_1 & a_{12} & a_{13} \\ b_2 & a_{22} & a_{23} \\ b_3 & a_{32} & a_{33} \end{vmatrix} = \frac{1}{x_1}\,\Delta_1$$
where in the last step we defined Cramer's discriminant ∆₁. Inverting the relation we can obtain the unknown x₁:
$$x_1 = \frac{\Delta_1}{\det A}$$
We can obtain something similar applying the respective column substitution to the second and third column
respectively. We define the three Cramer’s discriminants:
$$\Delta_1 = \begin{vmatrix} b_1 & a_{12} & a_{13} \\ b_2 & a_{22} & a_{23} \\ b_3 & a_{32} & a_{33} \end{vmatrix} \qquad \Delta_2 = \begin{vmatrix} a_{11} & b_1 & a_{13} \\ a_{21} & b_2 & a_{23} \\ a_{31} & b_3 & a_{33} \end{vmatrix} \qquad \Delta_3 = \begin{vmatrix} a_{11} & a_{12} & b_1 \\ a_{21} & a_{22} & b_2 \\ a_{31} & a_{32} & b_3 \end{vmatrix}$$
where each ∆_j is the determinant of a matrix obtained from matrix A in which the j-th column has been replaced by the column vector of the coefficients b. The three unknowns (the elements of the column vector x) are then obtained as:
$$x_1 = \frac{\Delta_1}{\det A} \qquad x_2 = \frac{\Delta_2}{\det A} \qquad x_3 = \frac{\Delta_3}{\det A}$$
Let’s apply this method to the same example used above for the direct inversion method:
(i) Given the following system of linear equation:

 2x + 4y + 3z = 4

x − 2y − 2z = 0

−3x + 3y + 2z = −7

find the unique solution using the Cramer’s rule.


We can write again the matrix form of the system:
$$\begin{pmatrix} 2 & 4 & 3 \\ 1 & -2 & -2 \\ -3 & 3 & 2 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 4 \\ 0 \\ -7 \end{pmatrix}$$
We have already calculated the determinant of A as det A = 11. With Cramer's rule we can directly calculate all the unknowns through the three Cramer's discriminants:
$$x_1 = \frac{\Delta_1}{\det A} = \frac{1}{11}\begin{vmatrix} 4 & 4 & 3 \\ 0 & -2 & -2 \\ -7 & 3 & 2 \end{vmatrix} = \frac{1}{11}\left[4(-4+6) - 0 + (-7)(-8+6)\right] = \frac{8+14}{11} = \frac{22}{11} = 2$$
Similarly, for x₂:
$$x_2 = \frac{\Delta_2}{\det A} = \frac{1}{11}\begin{vmatrix} 2 & 4 & 3 \\ 1 & 0 & -2 \\ -3 & -7 & 2 \end{vmatrix} = \frac{-33}{11} = -3$$
and for x₃:
$$x_3 = \frac{\Delta_3}{\det A} = \frac{1}{11}\begin{vmatrix} 2 & 4 & 4 \\ 1 & -2 & 0 \\ -3 & 3 & -7 \end{vmatrix} = \frac{44}{11} = 4$$
thus obtaining again the solution:
$$x = 2, \qquad y = -3, \qquad z = 4$$
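Cramer's rule translates directly into a short routine. The following is an illustrative sketch (assuming numpy; the helper name is ours) that builds each ∆_j by replacing the j-th column of A with b:

```python
import numpy as np

def cramer_solve(A, b):
    """Solve A x = b via Cramer's rule (assumes det A != 0)."""
    det_A = np.linalg.det(A)
    x = np.empty(len(b))
    for j in range(len(b)):
        A_j = A.copy()
        A_j[:, j] = b                      # replace the j-th column with b
        x[j] = np.linalg.det(A_j) / det_A  # x_j = Delta_j / det A
    return x

A = np.array([[2.0, 4.0, 3.0],
              [1.0, -2.0, -2.0],
              [-3.0, 3.0, 2.0]])
b = np.array([4.0, 0.0, -7.0])
print(cramer_solve(A, b))                  # [ 2. -3.  4.]
```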

11.2.3 Gaussian Elimination

The last method we are going to address is called Gaussian elimination or row reduction. It involves working
on the augmented matrix M by modifying its rows. The modifications are of the type

$$R_i \;\to\; R_i + k_1 R_{i'} + k_2 R_{i''}$$
which means that each element of the i-th row R_i is replaced by the sum of the element itself and the corresponding element of the i′-th row R_{i′} scaled by a factor k₁, and so on.
The aim is to reduce the square sub-matrix A, which is part of the augmented matrix, to upper triangular form. If we obtain an upper triangular sub-matrix, it is:
$$M = \begin{pmatrix} a'_{11} & a'_{12} & a'_{13} & b'_1 \\ 0 & a'_{22} & a'_{23} & b'_2 \\ 0 & 0 & a'_{33} & b'_3 \end{pmatrix}$$

where the elements are now primed as they have been modified by the row operations. If we then write the system corresponding to the new form of the augmented matrix M, we get:
$$\begin{cases} a'_{11}x_1 + a'_{12}x_2 + a'_{13}x_3 = b'_1 \\ a'_{22}x_2 + a'_{23}x_3 = b'_2 \\ a'_{33}x_3 = b'_3 \end{cases}$$

where we can immediately extract x 3 from the last equation. Once we have x 3 we can substitute in the second
equation to obtain x 2 and so on, substituting x 2 and x 3 in the first equation, we have obtained the solution.
Let’s work again the same example already used for the other two methods:

(i) Given the following system of linear equation:



 2x + 4y + 3z = 4

x − 2y − 2z = 0

−3x + 3y + 2z = −7

find the solution by performing row operations.


As said above, we need to start from the augmented matrix M:
$$M = \begin{pmatrix} 2 & 4 & 3 & 4 \\ 1 & -2 & -2 & 0 \\ -3 & 3 & 2 & -7 \end{pmatrix}$$
and we apply the row modification:
$$R_2 \to R_3 + 3R_2 \;\Rightarrow\; \begin{pmatrix} 2 & 4 & 3 & 4 \\ 0 & -3 & -4 & -7 \\ -3 & 3 & 2 & -7 \end{pmatrix}$$
then we can do again:
$$R_3 \to R_3 + \tfrac{3}{2}R_1 \;\Rightarrow\; \begin{pmatrix} 2 & 4 & 3 & 4 \\ 0 & -3 & -4 & -7 \\ 0 & 9 & \tfrac{13}{2} & -1 \end{pmatrix}$$
and finally:
$$R_3 \to R_3 + 3R_2 \;\Rightarrow\; \begin{pmatrix} 2 & 4 & 3 & 4 \\ 0 & -3 & -4 & -7 \\ 0 & 0 & -\tfrac{11}{2} & -22 \end{pmatrix}$$
where we now have an upper triangular square sub-matrix. We can now write the system obtained with this modified matrix:
$$\begin{cases} 2x + 4y + 3z = 4 \\ -3y - 4z = -7 \\ -\tfrac{11}{2}\,z = -22 \;\to\; z = 4 \end{cases}$$
where clearly the last equation is already giving us one of the unknowns. This value can be substituted in
the second equation:
−3y − 4(4) = −7 → y = −3
and similarly, both values need to be substituted in the first equation:

2x + 4(−3) + 3(4) = 4 → x = 2

thus obtaining once again the expected solution.
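The same forward elimination and back substitution can be sketched in code (the helper below is only an illustration, assuming numpy; library solvers such as np.linalg.solve do essentially this, with pivoting added for numerical stability):

```python
import numpy as np

def gauss_solve(A, b):
    """Naive Gaussian elimination with back substitution (no pivoting)."""
    M = np.column_stack([A.astype(float), b.astype(float)])  # augmented matrix
    n = len(b)
    for i in range(n):                        # eliminate below the diagonal
        for k in range(i + 1, n):
            factor = M[k, i] / M[i, i]
            M[k, :] -= factor * M[i, :]       # R_k -> R_k - factor * R_i
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):            # back substitution
        x[i] = (M[i, -1] - M[i, i + 1:n] @ x[i + 1:n]) / M[i, i]
    return x

A = np.array([[2, 4, 3], [1, -2, -2], [-3, 3, 2]])
b = np.array([4, 0, -7])
print(gauss_solve(A, b))                      # [ 2. -3.  4.]
```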

12 Eigenvalues and Eigenvectors


We are now going to exploit some of the properties seen in the previous section to define and obtain the eigen-
values and the eigenvectors of a square matrix A.

12.1 Introduction
As a brief introduction, let's consider a diagonal matrix D:
$$D = \begin{pmatrix} 2 & 0 \\ 0 & \tfrac12 \end{pmatrix}$$
and let's apply it to a generic column vector x:
$$\begin{pmatrix} 2 & 0 \\ 0 & \tfrac12 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 2x_1 \\ \tfrac12 x_2 \end{pmatrix} = \begin{pmatrix} x'_1 \\ x'_2 \end{pmatrix}$$
If we now apply the same matrix D to the unit vectors î and ĵ:
$$D\,\hat\imath = \begin{pmatrix} 2 & 0 \\ 0 & \tfrac12 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 2 \\ 0 \end{pmatrix} = 2\begin{pmatrix} 1 \\ 0 \end{pmatrix} = 2\,\hat\imath$$
$$D\,\hat\jmath = \begin{pmatrix} 2 & 0 \\ 0 & \tfrac12 \end{pmatrix}\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ \tfrac12 \end{pmatrix} = \tfrac12\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \tfrac12\,\hat\jmath$$
it is clear that the matrix D does not change the direction of the unit vectors: it simply rescales them by a constant (2 or 1/2).
Let's now consider a matrix F:
$$F = \begin{pmatrix} \tfrac54 & \tfrac34 \\ \tfrac34 & \tfrac54 \end{pmatrix}$$
In this case, can we find a vector x whose direction is left unchanged when we apply F to it?
This question corresponds to the equation:
$$Fx = \lambda x$$
where indeed, if a vector x satisfies this equation, F does not change it apart from a rescaling by the parameter λ. We can solve the equation for the vector x:
$$\begin{pmatrix} \tfrac54 & \tfrac34 \\ \tfrac34 & \tfrac54 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \lambda\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$$
By performing the matrix product on the left side and equating to the right side, we obtain a system of equations:
$$\begin{cases} \tfrac54 x_1 + \tfrac34 x_2 = \lambda x_1 \\[1ex] \tfrac34 x_1 + \tfrac54 x_2 = \lambda x_2 \end{cases}$$
This system has three unknowns and two equations. Let's now assume a value for x₁ and obtain a new system:
$$x_1 = 1 \;\to\; \begin{cases} 5 + 3x_2 = 4\lambda \\ 3 + 5x_2 = 4\lambda x_2 \end{cases}$$
From the latter system, we can extract a value of x₂ by substituting the first equation into the second:
$$3 + 5x_2 = (5 + 3x_2)x_2 = 5x_2 + 3x_2^2 \;\to\; x_2^2 = 1 \;\to\; x_2 = \pm 1$$
Thus we have two (x₁, x₂) pairs and for each pair we can obtain the corresponding value of λ:
$$(x_1 = 1,\, x_2 = 1): \;\; \lambda_1 = \tfrac14(8) = 2 \qquad\qquad (x_1 = 1,\, x_2 = -1): \;\; \lambda_2 = \tfrac14(2) = \tfrac12$$

Thus we have found that the vector x⁽¹⁾ = (1, 1) is an eigenvector of F with eigenvalue λ₁ = 2, i.e. satisfying the following equation:
$$F x^{(1)} = \lambda_1 x^{(1)} \;\to\; F\begin{pmatrix} 1 \\ 1 \end{pmatrix} = 2\begin{pmatrix} 1 \\ 1 \end{pmatrix}$$
and similarly for the vector x⁽²⁾ = (1, −1), which is an eigenvector of F with eigenvalue λ₂ = 1/2, i.e. satisfying the following equation:
$$F x^{(2)} = \lambda_2 x^{(2)} \;\to\; F\begin{pmatrix} 1 \\ -1 \end{pmatrix} = \frac12\begin{pmatrix} 1 \\ -1 \end{pmatrix}$$
Let's now summarise what we obtained:

                          D                          F
Matrix                    (2, 0; 0, 1/2)             (5/4, 3/4; 3/4, 5/4)
Determinant               1                          1
Trace                     5/2                        5/2
Eigenvalues:   λ₁         2                          2
               λ₂         1/2                        1/2
Eigenvectors:  x⁽¹⁾        (1, 0)                     (1, 1)
               x⁽²⁾        (0, 1)                     (1, −1)

To conclude this example, we learned that the two matrices D and F correspond to the same deformation apart from the orientation with respect to the two-dimensional axes: the only difference is a clockwise rotation by 45° of the two sets of eigenvectors.

12.2 Definition of Eigenvalues and Eigenvectors


The concepts of eigenvalues and eigenvectors are fundamental in quantum physics:

• a physical system is described by a state vector


• a measurement is a matrix operator
• the outcome of a measurement (a physical observable) is an eigenvalue of the matrix
• the system is left after the measurement in an eigenstate, i.e. in a state vector which is an eigenvector of
the matrix.

Let’s now state some definitions and then find a way to solve the problem of finding eigenvalues and eigen-
vectors of a given matrix. In this section, we are going to use vectors as column vectors:

$$x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{pmatrix} \qquad y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_p \end{pmatrix}$$
and matrices as square matrices p × p:
$$A = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1p} \\ a_{21} & a_{22} & \dots & a_{2p} \\ \vdots & & & \vdots \\ a_{p1} & a_{p2} & \dots & a_{pp} \end{pmatrix}$$

We can write a matrix equation:
$$Ax = y \quad\to\quad \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1p} \\ a_{21} & a_{22} & \dots & a_{2p} \\ \vdots & & & \vdots \\ a_{p1} & a_{p2} & \dots & a_{pp} \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_p \end{pmatrix}$$
that corresponds to a system of p simultaneous linear equations obtained by performing the row × column matrix product:
$$\begin{cases} a_{11}x_1 + a_{12}x_2 + \dots + a_{1p}x_p = y_1 \\ a_{21}x_1 + a_{22}x_2 + \dots + a_{2p}x_p = y_2 \\ \quad\vdots \\ a_{p1}x_1 + a_{p2}x_2 + \dots + a_{pp}x_p = y_p \end{cases} \quad\to\quad y_i = \sum_{j=1}^{p} a_{ij}x_j$$

We now define the eigenvector equation as:


Ax = λx
where λ is some constant and x is some column vector. Solving the eigenvector equation for the matrix A means finding those vectors whose direction the matrix A leaves unchanged, rescaling them by a factor λ. These vectors are called eigenvectors and the factors λ are called eigenvalues³. The explicit matrix form of the eigenvector equation is as follows:
$$\begin{pmatrix} a_{11} & a_{12} & \dots & a_{1p} \\ a_{21} & a_{22} & \dots & a_{2p} \\ \vdots & & & \vdots \\ a_{p1} & a_{p2} & \dots & a_{pp} \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{pmatrix} = \lambda\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{pmatrix}$$

The eigenvectors are defined in direction but not in length: as a matter of fact, if x is a solution of the eigenvector equation Ax = λx, then for any scalar c the vector cx is also a solution:
$$A(cx) = c(Ax) = c(\lambda x) = \lambda(cx)$$
so cx satisfies the same eigenvector equation with the same eigenvalue. So we are going to have an infinite number of solutions out of our eigenvector equation: given one solution x, all cx are also solutions for each value of c. This means that the direction of the vector is determined, but not its size. As a convention, we might be asked to give the normalised eigenvectors, which are the unit vectors solving the eigenvector equation. Given an eigenvector x solving the equation Ax = λx, the normalised eigenvector is simply:
$$\hat x = \frac{x}{|x|}$$
where in the denominator we have the modulus of the vector, i.e. its size.
Let’s start considering an easy example for which we can solve the eigenvector equation directly, so that
we can then solve the same example once we formalise a way to solve this problem of finding eigenvalues and
eigenvectors.

(i) Given the following 2 × 2 matrix:
$$A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$
solve the eigenvector equation Ax = λx.
Let's solve this equation directly:
$$A\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \lambda\begin{pmatrix} x \\ y \end{pmatrix}$$
If we perform the matrix product, we get two simultaneous equations:
$$\begin{cases} 0x + 1y = \lambda x \\ 1x + 0y = \lambda y \end{cases}$$
3 “Eigen” is a German word meaning “self”, or “proper” or “own”, or also “particular”.

From the second equation, we get:
$$x = \lambda y$$
and then we can substitute in the first one:
$$y = \lambda^2 y \quad\to\quad \lambda^2 = 1 \quad\to\quad \lambda = \pm 1$$
So we have found two eigenvalues for matrix A. Now let's start from one eigenvalue and find the corresponding eigenvector: given λ = 1, we have:
$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = +1\begin{pmatrix} x \\ y \end{pmatrix} \quad\to\quad \begin{cases} 0x + 1y = x \\ 1x + 0y = y \end{cases}$$

where the two equations actually correspond to the same condition: x = y. This is indeed due to the fact that the eigenvector equation determines the direction and not the size of the vector solutions. Thus we can choose one value for one of the two coordinates and then use the condition above to find the other: let's put x = 1, so that y = 1. The eigenvector x⁽¹⁾ corresponding to eigenvalue λ₁ = 1 is then:
$$x^{(1)} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$$
We can then do the same for the other eigenvalue λ₂ = −1:
$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = -1\begin{pmatrix} x \\ y \end{pmatrix} \quad\to\quad \begin{cases} 0x + 1y = -x \\ 1x + 0y = -y \end{cases}$$
where again the two equations correspond to the same condition: x = −y. Again we can choose one value for x and then use the condition above to find y: let's put x = 1, giving y = −1. The eigenvector x⁽²⁾ corresponding to eigenvalue λ₂ = −1 is:
$$x^{(2)} = \begin{pmatrix} 1 \\ -1 \end{pmatrix}$$

Let’s go back to our general eigenvector equation in order to find a formal recipe to solve it: let’s rewrite λx as:

$$\lambda x = \lambda I x = \begin{pmatrix} \lambda & 0 & \dots & 0 \\ 0 & \lambda & \dots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \dots & \lambda \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{pmatrix}$$

where I is the p × p identity matrix. So now we have a matrix product also on the right side of the eigenvector
equation Ax = λx and we can now rewrite it by subtracting the two matrices:

(A − λI )x = 0

Now this latter equation can be interpreted as a homogeneous system of linear equations:

$$\begin{pmatrix} a_{11}-\lambda & a_{12} & \dots & a_{1p} \\ a_{21} & a_{22}-\lambda & \dots & a_{2p} \\ \vdots & & \ddots & \vdots \\ a_{p1} & a_{p2} & \dots & a_{pp}-\lambda \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
$$\begin{cases} (a_{11}-\lambda)x_1 + a_{12}x_2 + \dots + a_{1p}x_p = 0 \\ a_{21}x_1 + (a_{22}-\lambda)x_2 + \dots + a_{2p}x_p = 0 \\ \quad\vdots \\ a_{p1}x_1 + a_{p2}x_2 + \dots + (a_{pp}-\lambda)x_p = 0 \end{cases}$$
From what we know about homogeneous systems of simultaneous linear equations, the null solution x = 0 always exists, but this is not interesting for us. We want to find non-null eigenvectors, thus we need the system to have ∞ solutions. If the system has infinite solutions, there will be non-null eigenvectors verifying the equation above. In order for the system to have infinite solutions, the determinant of the coefficient matrix (A − λI) must vanish (see the section about systems of simultaneous linear equations). So we can define what is called the eigenvalue equation:
det(A − λI ) = 0
where now we have obtained an equation in λ only. By solving this equation, we can find the eigenvalues of the
matrix A and then we can put each of them in the eigenvector equation to find the corresponding eigenvectors.
Let’s work again the example used above.

(i) Given the following 2 × 2 matrix:
$$A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$
solve the eigenvector equation Ax = λx.
Let's solve first the eigenvalue equation:
$$\det(A - \lambda I) = \begin{vmatrix} 0-\lambda & 1 \\ 1 & 0-\lambda \end{vmatrix} = 0 \quad\to\quad \lambda^2 - 1 = 0 \quad\to\quad \lambda_{1,2} = \pm 1$$
As expected, we have found an equation in λ only, and solving it we have found two values λ₁ = +1 and λ₂ = −1. We now need to use those values in the eigenvector equation to obtain the corresponding eigenvectors: for λ₁ = +1, we have (just as above):
$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = +1\begin{pmatrix} x \\ y \end{pmatrix} \quad\to\quad y = x \quad\to\quad x^{(1)} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$$
while for λ₂ = −1, it is:
$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = -1\begin{pmatrix} x \\ y \end{pmatrix} \quad\to\quad y = -x \quad\to\quad x^{(2)} = \begin{pmatrix} 1 \\ -1 \end{pmatrix}$$
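The same recipe is implemented in numerical libraries; as a quick sketch (assuming numpy is available), numpy.linalg.eig returns the eigenvalues and the normalised eigenvectors, which for this matrix match the hand calculation up to normalisation, ordering and overall sign:

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [1.0, 0.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)        # [ 1. -1.]  (the order may differ)
print(eigenvectors)       # columns are the unit eigenvectors,
                          # i.e. (1, 1)/sqrt(2) and (1, -1)/sqrt(2) up to sign

# check A x = lambda x for each eigenvalue/eigenvector pair
for lam, v in zip(eigenvalues, eigenvectors.T):
    print(np.allclose(A @ v, lam * v))    # True, True
```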

12.3 Normalised Eigenvectors


This is just a reminder of what we already mentioned while talking about vectors in general:

• real eigenvectors. In case the vectors have all real components, then their normalisation simply requires
the calculation of their modulus. Given the generic vector a:
 
$$\vec a = (a_x, a_y, a_z) \quad\to\quad a = \begin{pmatrix} a_x \\ a_y \\ a_z \end{pmatrix}$$
the (square) modulus is calculated as:
$$|\vec a|^2 = a_x^2 + a_y^2 + a_z^2$$
To write the corresponding calculation in matrix form, we need to define the transpose of our vector, which is a row vector:
$$a^T = (a_x \;\; a_y \;\; a_z)$$
thus the modulus becomes:
$$a^T a = (a_x \;\; a_y \;\; a_z)\begin{pmatrix} a_x \\ a_y \\ a_z \end{pmatrix} = a_x^2 + a_y^2 + a_z^2$$
The normalised vector is:
$$\hat a = \frac{a}{\sqrt{a_x^2 + a_y^2 + a_z^2}}$$
• complex eigenvectors. In case the vectors have complex components, their normalisation requires the Hermitian conjugate of the original vector. Given the generic vector c:
$$\vec c = (c_x, c_y, c_z) \quad\to\quad |\vec c|^2 = c_x^* c_x + c_y^* c_y + c_z^* c_z$$
The latter operation corresponds to:
$$c^\dagger c = (c_x^* \;\; c_y^* \;\; c_z^*)\begin{pmatrix} c_x \\ c_y \\ c_z \end{pmatrix} = c_x^* c_x + c_y^* c_y + c_z^* c_z = |c_x|^2 + |c_y|^2 + |c_z|^2$$

where in the last step, we sum the square moduli of the complex components of the vector.
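A minimal sketch of both normalisations in code (assuming numpy): for a complex vector the conjugate must be used, otherwise the "modulus" can come out complex or negative.

```python
import numpy as np

a = np.array([1.0, 2.0, 2.0])           # real vector
a_hat = a / np.sqrt(a @ a)              # a^T a = 9, so a_hat = a / 3
print(a_hat)

c = np.array([1.0, 1.0j, 0.0])          # complex vector
norm_sq = np.vdot(c, c).real            # c^dagger c = |c_x|^2 + |c_y|^2 + |c_z|^2 = 2
c_hat = c / np.sqrt(norm_sq)
print(c_hat)                            # [0.707..., 0.707...j, 0]
```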

12.4 Matrix Diagonalisation


Using the knowledge of the eigenvalues and the eigenvectors of a matrix A, we can define a matrix operation
that allows us to obtain a diagonal matrix from our generic matrix A. Let’s start from a generic 2 × 2 matrix
$$\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}$$
and from its two eigenvector equations:
$$\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}\begin{pmatrix} x^{(1)} \\ y^{(1)} \end{pmatrix} = \lambda_1\begin{pmatrix} x^{(1)} \\ y^{(1)} \end{pmatrix} \qquad\qquad \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}\begin{pmatrix} x^{(2)} \\ y^{(2)} \end{pmatrix} = \lambda_2\begin{pmatrix} x^{(2)} \\ y^{(2)} \end{pmatrix}$$
We can put the two equations together by merging the two column vectors into one matrix and collecting the λ scalars into a diagonal matrix that multiplies from the right:
$$\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}\begin{pmatrix} x^{(1)} & x^{(2)} \\ y^{(1)} & y^{(2)} \end{pmatrix} = \begin{pmatrix} x^{(1)} & x^{(2)} \\ y^{(1)} & y^{(2)} \end{pmatrix}\begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} = \begin{pmatrix} \lambda_1 x^{(1)} & \lambda_2 x^{(2)} \\ \lambda_1 y^{(1)} & \lambda_2 y^{(2)} \end{pmatrix}$$
that can be written in matrix form as:
$$AX = X\Lambda$$
where we defined two new matrices:
$$X = \begin{pmatrix} x^{(1)} & x^{(2)} \\ y^{(1)} & y^{(2)} \end{pmatrix} \qquad \Lambda = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}$$

X is the matrix made up with the eigenvectors as columns, while Λ is the diagonal matrix with the eigenvalues
as diagonal elements. Starting from the equation AX = X Λ, we can multiply both sides, from the left, by the
matrix X −1 , inverse of X :
X −1 AX = X −1 X Λ → X −1 AX = Λ
where in the last step we used the definition of the inverse of a matrix: X −1 X = I . From the equation X −1 AX =
Λ, we say that X diagonalises A.
We are going to list now some more properties we can derive from the above equation. Consider a matrix A,
the matrix of its eigenvectors X and the diagonal matrix of its eigenvalues Λ:

(1) the determinant of A is the same as the determinant of the diagonal matrix Λ:
$$\det A = \det\Lambda = \lambda_1\lambda_2\cdots\lambda_p = \prod_{k=1}^{p}\lambda_k$$

The first equality can be proved by applying the definition above, exploiting the determinant property for
the matrix product (det AB = det A det B ) and using the commutativity of the product between scalars:

det Λ = det (X −1 AX ) = det X −1 det A det X


= det X −1 det X det A = det (X −1 X ) det A
= det I det A = 1 · det A = det A

(2) The trace of A is the same as the trace of the diagonal matrix Λ:
$$\mathrm{Tr}\,A = \mathrm{Tr}\,\Lambda = \lambda_1 + \lambda_2 + \cdots + \lambda_p = \sum_{k=1}^{p}\lambda_k$$

(3) The X matrix diagonalises also the inverse of A:

I = X −1 X = X −1 A A −1 X = X −1 AX X −1 A −1 X = Λ(X −1 A −1 X )

where we used just the definitions of matrix inversion X⁻¹X = X X⁻¹ = I and A A⁻¹ = I, and of matrix diagonalisation
X −1 AX = Λ. From the first and last expressions:

I = Λ(X −1 A −1 X ) → X −1 A −1 X = Λ−1

which is the equation for the diagonalisation of the A −1 matrix and the diagonal matrix Λ−1 is the inverse
of the original diagonal matrix Λ.

Let’s work one example:

(i) Consider the matrix A:
$$A = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}$$
and find the eigenvector matrix X and the eigenvalue matrix Λ.
This exercise is similar to what we have solved so far: we just need to find the eigenvalues and the corresponding eigenvectors, in order to be able to write the matrices X and Λ. We start from the eigenvalue equation:
$$\det(A - \lambda I) = \begin{vmatrix} -\lambda & -i \\ i & -\lambda \end{vmatrix} = 0 \quad\to\quad \lambda^2 - 1 = 0 \quad\to\quad \lambda_{1,2} = \pm 1$$
Then from the eigenvector equation, for λ₁ = 1 we get:
$$\begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = +1\begin{pmatrix} x \\ y \end{pmatrix} \quad\to\quad -iy = x$$
If we set x = 1, then we get y = i, thus obtaining the eigenvector:
$$x^{(1)} = \begin{pmatrix} 1 \\ i \end{pmatrix}$$
For λ₂ = −1, we have:
$$\begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = -1\begin{pmatrix} x \\ y \end{pmatrix} \quad\to\quad -iy = -x$$
and if we set x = 1, we get y = −i and the eigenvector:
$$x^{(2)} = \begin{pmatrix} 1 \\ -i \end{pmatrix}$$
If we want to normalise these eigenvectors in order to have unit vectors, we need to calculate
$$x^{(1)\dagger} x^{(1)} = (1 \;\; -i)\begin{pmatrix} 1 \\ i \end{pmatrix} = 1 - i^2 = 2 \qquad\qquad x^{(2)\dagger} x^{(2)} = (1 \;\; i)\begin{pmatrix} 1 \\ -i \end{pmatrix} = 1 - i^2 = 2$$
thus obtaining the unit vectors:
$$\hat x^{(1)} = \frac{1}{\sqrt 2}\begin{pmatrix} 1 \\ i \end{pmatrix} \qquad \hat x^{(2)} = \frac{1}{\sqrt 2}\begin{pmatrix} 1 \\ -i \end{pmatrix}$$
Now finally the two matrices are:
$$X = \begin{pmatrix} 1 & 1 \\ i & -i \end{pmatrix} \qquad \Lambda = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$$

as requested.
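As a numerical cross-check (only a sketch, assuming numpy), one can build X from the eigenvectors and verify that X⁻¹AX is indeed diagonal with the eigenvalues on the diagonal, and that the determinant and trace are preserved:

```python
import numpy as np

A = np.array([[0, -1j],
              [1j,  0]])

X = np.array([[1, 1],
              [1j, -1j]])                  # eigenvectors as columns (not normalised)

Lambda = np.linalg.inv(X) @ A @ X          # X^{-1} A X
print(np.round(Lambda, 10))                # diag(1, -1)

print(np.isclose(np.linalg.det(A), np.linalg.det(Lambda)))   # True
print(np.isclose(np.trace(A), np.trace(Lambda)))             # True
```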

12.5 Special Cases


Considering special kinds of matrices, we can reflect on the properties of their eigenvalues and eigenvectors:
(1) Hermitian matrix H = H †

(a) its eigenvalues are real


(b) its eigenvectors are orthogonal
(c) the eigenvector matrix X that satisfies X −1 H X = Λ is unitary: X † = X −1

It is worth spending some time reflecting on the two points above: if we consider a unitary matrix and we
call it X this time, we know that we can apply the definition writing:
$$X^\dagger X = \begin{pmatrix} x_{11}^* & x_{21}^* \\ x_{12}^* & x_{22}^* \end{pmatrix}\begin{pmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$
If we perform the matrix product we get four equations. Let's write down the first two, from the product of the first row of X† by the first and second columns of X:
$$\begin{cases} x_{11}^* x_{11} + x_{21}^* x_{21} = 1 \\ x_{11}^* x_{12} + x_{21}^* x_{22} = 0 \end{cases}$$
The first equation is the normalisation condition for the first column of X and, as the first column of X corresponds to the first eigenvector, this is the condition for the first eigenvector to be a unit vector. The second equation is the product of the first column of X with its second column: again, as the two columns correspond to the two eigenvectors, this condition tells us that the two eigenvectors are orthogonal. Moreover, since they are orthogonal unit vectors, they form an orthonormal basis.

(2) Symmetric matrices S T = S: this is just a specific sub-case of the Hermitian matrix when all the elements
of the matrix are real. The eigenvalues are real, the eigenvectors are orthogonal and the matrix X is orthog-
onal (X T = X −1 ).

(3) Diagonal matrix D. In general it can be written as:

$$D = \begin{pmatrix} d_{11} & 0 & 0 & 0 \\ 0 & d_{22} & 0 & 0 \\ 0 & 0 & d_{33} & 0 \\ 0 & 0 & 0 & d_{44} \end{pmatrix}$$
The eigenvector equation is:
$$Dx = \lambda x$$
giving the eigenvalue equation:
$$\det(D - \lambda I) = \begin{vmatrix} d_{11}-\lambda & 0 & 0 & 0 \\ 0 & d_{22}-\lambda & 0 & 0 \\ 0 & 0 & d_{33}-\lambda & 0 \\ 0 & 0 & 0 & d_{44}-\lambda \end{vmatrix} = (d_{11}-\lambda)(d_{22}-\lambda)(d_{33}-\lambda)(d_{44}-\lambda) = 0$$
from which we obtain that the elements on the diagonal correspond to the eigenvalues of the diagonal matrix:
$$\lambda_i = d_{ii} \qquad \forall\, i = 1, \dots, 4$$
Thus we obtain that D = Λ in this case. Regarding the eigenvectors:
$$x^{(1)} = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} \quad x^{(2)} = \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix} \quad x^{(3)} = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix} \quad x^{(4)} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}$$
which are the standard unit vectors in a p−dimensional space (where in this example p = 4).


13 Differential Equations
Differential equations (D.E.) are equations involving a function y(x) of a variable x and its derivatives with respect to x:
$$f\!\left(\frac{d^2 y(x)}{dx^2}, \frac{dy(x)}{dx}, y(x), x\right) = 0$$
where f is some function of those quantities. D.E. are classified according to the highest derivative present, which gives the order of the equation:

(a) take for example the equation:
$$\frac{dy}{dx} + \sin x \; y = 0 \qquad\to\qquad \text{first order D.E.}$$
(b) or the equation:
$$\frac{d^2 y}{dx^2} + e^x = x \qquad\to\qquad \text{second order D.E.}$$
(c) or finally the equation:
$$\frac{d^2 y}{dx^2} + \left(\frac{dy}{dx}\right)^4 = x \qquad\to\qquad \text{second order D.E.}$$

Solving a D.E. involves one or more integrations, thus the function y(x) that solves a given D.E. will be defined up to a number of integration constants. The number of integration constants corresponds to the order of the D.E. Some D.E. will have a number of "boundary conditions" (B.C.) associated with them: these allow us to determine the integration constants and thus completely define the solution function. We will start from first order D.E. and will then address some simple second order ones.
Note on notation: there are many notations out there in books. In general all notations of the kind:
$$\frac{dy}{dx} \equiv \frac{dy(x)}{dx} \equiv y' \equiv y'(x)$$
are all equivalent. In general I will use the notation dy/dx, and sometimes y′ when it is clearer.

13.1 First Order Differential Equations
In the case of first order D.E., we can start considering some easy examples that can be solved by “direct inte-
gration”, applying the technique of the “separation of the variables”.

(i) Solve the D.E.


$$\frac{dy}{dx} = 1$$
We can separate the variables, meaning that we keep all the terms depending on the function y on the left-hand side, while we put all the terms involving the variable x on the right-hand side:
$$dy = dx \quad\to\quad \int dy = \int dx$$
and then integrating:
$$y + c_y = x + c_x \quad\to\quad y = x + c_1$$
where in the last step we have put together the two integration constants into one as there is only one
degree of freedom in this case. We can then consider one boundary condition for example:

y(x = 0) = 1

which will determine the integration constant of the solution:

y(0) = 0 + c 1 = 1 → c 1 = 1 → y = x + 1

where the last one is the particular solution of our differential equation with the associated boundary
condition.

(ii) Consider this first order differential equation with associated boundary condition:

$$\frac{dy}{dx} = 2y, \qquad y(x=0) = 1$$
We can again integrate directly by separating the variables:
$$\frac{dy}{y} = 2\,dx \quad\to\quad \int\frac{dy}{y} = 2\int dx \quad\to\quad \ln y = 2x + c_1$$

Thus the solution to our differential equation is:

y = c 2 e 2x

where we have included the previous integration constant into a simpler new multiplicative constant (i.e.
c 2 = e c1 ). Now we have to consider the boundary condition y(0) = 1:

y(0) = c 2 = 1

thus the particular solution has to be:


y = e 2x

In general we can formalise the variable separation writing the generic first order differential equation:

$$\frac{dy}{dx} = \frac{f(x)}{g(y)}$$
from which we get:
$$g(y)\,dy = f(x)\,dx \quad\to\quad \int g(y)\,dy = \int f(x)\,dx$$

Let’s go back to some other examples:

(iii) Solve the D.E.
$$\frac{dy}{dx} = (1+y)(1+x)$$
where in the formalism above this would correspond to:
$$f(x) = (1+x), \qquad g(y) = \frac{1}{1+y}$$
So integrating:
$$\int\frac{dy}{1+y} = \int(1+x)\,dx \quad\to\quad \ln(1+y) = x + \frac{x^2}{2} + c_1$$
and then exponentiating both sides and renaming the new integration constant we get:
$$y = c_2\, e^{\,x + \frac{x^2}{2}} - 1$$
If we then consider a boundary condition like:
$$y(x = 0) = 1 \quad\to\quad y(0) = c_2 - 1 = 1 \quad\to\quad c_2 = 2$$
so the solution is
$$y = 2\, e^{\,x + \frac{x^2}{2}} - 1$$
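These separable equations can also be checked symbolically; the following is only a sketch, assuming sympy is available, applied to the last example with its boundary condition:

```python
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')

ode = sp.Eq(y(x).diff(x), (1 + y(x)) * (1 + x))
sol = sp.dsolve(ode, y(x), ics={y(0): 1})
print(sol)    # should print something equivalent to y(x) = 2*exp(x + x**2/2) - 1
```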

13.2 Linear First Order Differential Equations


We are now going to consider a specific kind of differential equations, i.e. linear first order differential equa-
tions like the following:
$$p(x)\frac{dy}{dx} + q(x)\,y(x) = f(x)$$
where p(x), q(x) and f (x) are generic functions of variable x. We can distinguish two main cases described in
the sections below.

13.2.1 Homogeneous Differential Equations

We define as homogeneous a differential equation in which f(x) = 0, thus:
$$p(x)\frac{dy}{dx} + q(x)\,y(x) = 0$$
Also in this case we can separate the variables and obtain the following:
$$p(x)\frac{dy}{dx} = -q(x)\,y(x) \quad\to\quad \frac{dy}{y(x)} = -\frac{q(x)}{p(x)}\,dx$$
Then we integrate:
$$\int\frac{dy}{y} = -\int\frac{q(x)}{p(x)}\,dx \quad\to\quad \ln y = -\int\frac{q(x)}{p(x)}\,dx \quad\to\quad y = C\,e^{-\int\frac{q(x)}{p(x)}dx}$$
This is a formula that we can just apply once we have a linear first order homogeneous differential equation. However, sometimes it is easier and more transparent simply to separate the variables and integrate, without having to remember this formula by heart.
Let’s work some examples of such linear equations:

(i) Solve the following D.E.:
$$\frac{dy}{dx} + xy = 0$$
Let's try to solve this by separating the variables as we have been doing so far:
$$\frac{dy}{y} = -x\,dx \quad\to\quad \int\frac{dy}{y} = -\int x\,dx \quad\to\quad \ln y = -\frac{x^2}{2} + C_1$$
and exponentiating:
$$y = C_2\, e^{-\frac{x^2}{2}}$$
If we want to apply directly the formula found above, we compare this D.E. to our generic equation, defining:
$$p(x) = 1, \qquad q(x) = x$$
So we just need to write down the formula, substituting p(x) and q(x) for this specific case:
$$y = C_2\, e^{-\int\frac{q(x)}{p(x)}dx} = C_2\, e^{-\int x\,dx} = C_2\, e^{-\frac{x^2}{2}}$$
which is identical to what we obtained above.

(ii) Solve the following D.E.:
$$x\frac{dy}{dx} + y = 0$$
If we want to apply the formula directly we need to assign:
$$p(x) = x, \qquad q(x) = 1$$
Thus:
$$y = C_2\, e^{-\int\frac{dx}{x}} = C_2\, e^{-\ln x} = C_2\, e^{\ln\frac{1}{x}} = \frac{C_2}{x}$$

13.2.2 Non-Homogeneous Differential Equations

We define as non-homogeneous a differential equation in which f(x) ≠ 0, thus:
$$p(x)\frac{dy}{dx} + q(x)\,y(x) = f(x)$$
We can solve these types of equations through three steps:

(1) Solve the corresponding homogeneous equation:
$$p(x)\frac{dy}{dx} + q(x)\,y(x) = 0$$
whose solution will be called the complementary solution: y c (x).

(2) Find a particular solution of the full differential equation: we are going to call it y p (x).

(3) Finally the general solution to the full differential equation is the sum of the complementary and the par-
ticular solutions:
y(x) = y c (x) + y p (x)

We already know how to address point (1): in the case of linear homogeneous differential equation we can
write the complementary solution as:
$$y_c(x) = C\,e^{-\int\frac{q(x)}{p(x)}dx}$$
Then the particular solution is obtained using the method of the variation of the constant: we start from the complementary solution and substitute the integration constant with an unknown function of x:
$$y_p(x) = u(x)\,e^{-\int\frac{q}{p}dx}$$

Then we differentiate this y_p(x) function once and substitute into the full differential equation, obtaining a new differential equation for the u(x) function. Solving this latter equation for u(x) gives the particular solution y_p(x).
Let’s work an example to understand better how it is practically done:

(i) Solve the differential equation:


$$\frac{dy}{dx} - 2xy = x - x^3$$
So first of all we need to find the complementary solution of the homogeneous equation:
$$\frac{dy}{dx} - 2xy = 0$$
and we can apply the formula we have found above:
$$y = C\,e^{-\int(-2x)\,dx} = C\,e^{x^2}$$
thus the complementary solution is:
$$y_c(x) = C\,e^{x^2}$$
Then we vary the constant to obtain the particular solution, and differentiate it once:
$$y_p(x) = u(x)\,e^{x^2}, \qquad \frac{d}{dx}y_p(x) = u'\,e^{x^2} + 2x\,u(x)\,e^{x^2}$$
We then substitute in the full equation:
$$\frac{d}{dx}y_p(x) - 2x\,y_p(x) = x - x^3$$
$$u'\,e^{x^2} + 2x\,u(x)\,e^{x^2} - 2x\,u(x)\,e^{x^2} = x - x^3 \quad\to\quad u' = (x - x^3)\,e^{-x^2}$$
Now we have a differential equation for the u(x) function. We can separate the variables and integrate:
$$du = (x - x^3)\,e^{-x^2}\,dx \quad\to\quad u(x) = \int x\,(1 - x^2)\,e^{-x^2}\,dx$$
The last integral can be solved by integrating by parts, setting:
$$w = 1 - x^2, \quad w' = -2x \qquad\qquad v' = x\,e^{-x^2}, \quad v = -\tfrac12\,e^{-x^2}$$
and thus obtaining:
$$u(x) = -\tfrac12(1 - x^2)\,e^{-x^2} - \tfrac12\int 2x\,e^{-x^2}\,dx = -\tfrac12 e^{-x^2} + \tfrac12 x^2 e^{-x^2} + \tfrac12 e^{-x^2} = \tfrac12\,x^2\,e^{-x^2}$$
The particular solution is therefore:
$$y_p(x) = \tfrac12\,x^2\,e^{-x^2}\,e^{x^2} = \tfrac12\,x^2$$
Finally the solution of the non-homogeneous equation is the sum of the complementary and the particular solutions:
$$y(x) = y_c(x) + y_p(x) = C\,e^{x^2} + \tfrac12\,x^2$$
(ii) Solve the differential equation:
$$\frac{dy}{dx} - \frac{y}{e^x + 1} = e^x$$
given the boundary condition: y(x = 0) = −1.
First of all we need to find the complementary solution of the homogeneous equation:
$$\frac{dy}{dx} - \frac{y}{e^x + 1} = 0$$
and we can separate the variables:
$$\frac{dy}{y} = \frac{dx}{e^x + 1}$$
and integrate:
$$\int\frac{dy}{y} = \int\frac{dx}{e^x + 1} = \int\frac{e^{-x}}{e^{-x} + 1}\,dx$$
where in the last step we just rewrote the integrand to ease the integration:
$$\ln y = -\ln(e^{-x} + 1) + c_1 = \ln\frac{1}{e^{-x} + 1} + c_1$$
thus, exponentiating both sides:
$$y_c = c_2\,\frac{1}{e^{-x} + 1}$$
which is the complementary solution.
Then we can proceed to find the particular solution to the full equation using the method of the variation of the constant. The trial function, together with its first derivative, is:
$$y_p = u(x)\,\frac{1}{e^{-x} + 1}, \qquad \frac{dy_p}{dx} = u'(x)\,\frac{1}{e^{-x} + 1} + u(x)\,\frac{e^{-x}}{(e^{-x} + 1)^2}$$
Then we can substitute this into the full equation:
$$u'(x)\,\frac{1}{e^{-x} + 1} + u(x)\,\frac{e^{-x}}{(e^{-x} + 1)^2} - u(x)\,\frac{1}{(e^x + 1)(e^{-x} + 1)} = e^x$$
where the terms containing a dependence on u(x) can be rewritten as:
$$u'(x)\,\frac{1}{e^{-x} + 1} + u(x)\,\frac{e^{-x}}{(e^{-x} + 1)^2} - u(x)\,\frac{e^{-x}}{(e^{-x} + 1)^2} = e^x$$
so that it is clear that they cancel. We are left with a simpler expression where we can separate the variables:
$$du = e^x(e^{-x} + 1)\,dx = (1 + e^x)\,dx$$
which can just be integrated:
$$u = \int(1 + e^x)\,dx = e^x + x$$
The particular solution is thus:
$$y_p = \frac{e^x + x}{e^{-x} + 1}$$
Finally the full solution is:
$$y = \frac{e^x + x + c_2}{e^{-x} + 1}$$
Applying the boundary condition, we get:
$$y(x = 0) = \frac{1 + c_2}{2} = -1$$
thus giving c₂ = −3. The specific full solution is then:
$$y_{B.C.} = \frac{e^x + x - 3}{e^{-x} + 1}$$
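A quick symbolic verification of this result (only a sketch, assuming sympy is available): we substitute the claimed solution back into the equation and check the boundary condition.

```python
import sympy as sp

x = sp.symbols('x')
y = (sp.exp(x) + x - 3) / (sp.exp(-x) + 1)     # claimed solution with the B.C.

lhs = sp.diff(y, x) - y / (sp.exp(x) + 1)      # dy/dx - y/(e^x + 1)
print(sp.simplify(lhs - sp.exp(x)))            # should simplify to 0: O.D.E. satisfied
print(y.subs(x, 0))                            # -1: boundary condition satisfied
```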

13.3 Second Order Differential Equations


We now consider second order differential equations. The most general form of second order differential equa-
tion can be written as follows:
$$F(y'', y', y, x) = 0$$
which represents a function of the second and first derivatives of the function y of the variable x. The solution of a second order differential equation depends on 2 constants of integration, so we need 2 boundary conditions in order to determine them. The 2 boundary conditions can be two conditions on the function, or one condition on the function and one on the first derivative:
$$\begin{cases} y(x = k_1) = c_1 \\ y(x = k_2) = c_2 \end{cases} \qquad\text{or}\qquad \begin{cases} y(x = k_1) = c_1 \\ y'(x = k_3) = c_3 \end{cases}$$

13.3.1 Direct Integration

We start from some easy examples that can be addressed via direct integration:

(i) Solve the second order differential equation:

$$\frac{d^2 y}{dx^2} = 0$$
This can be seen as a first order differential equation, twice, i.e.:
$$\frac{d}{dx}\left(\frac{dy}{dx}\right) = 0$$
so we can start to solve the most "external" one by setting:
$$z = \frac{dy}{dx}$$
Thus we have:
$$\frac{dz}{dx} = 0 \quad\to\quad z = c_1 \quad\to\quad \frac{dy}{dx} = c_1$$
and then separating the variables:
$$dy = c_1\,dx \quad\to\quad y = c_1 x + c_2$$
Now applying the boundary conditions, we get:
$$\begin{cases} y(0) = 1 \\ y'(0) = 2 \end{cases} \quad\to\quad \begin{cases} c_2 = 1 \\ c_1 = 2 \end{cases}$$

Thus the solution is:


y = 2x + 1

(ii) Solve the second order differential equation:

$$\frac{d^2 y}{dx^2} - \frac{dy}{dx} = 0$$
Here again, since the term in y(x) is missing, we can again substitute:
$$z = \frac{dy}{dx}$$
So our differential equation becomes:
$$\frac{dz}{dx} - z = 0$$
and again separating the variables we get:
$$\frac{dz}{z} = dx \quad\to\quad z = \frac{dy}{dx} = c_1 e^x$$
where the last equality is again a first order differential equation:
$$\frac{dy}{dx} = c_1 e^x \quad\to\quad dy = c_1 e^x\,dx$$
Thus we obtain:
$$y(x) = c_1 e^x + c_2$$
which is the solution of the second order differential equation and it depends on two integration con-
stants.

13.3.2 Second Order Linear Differential Equation With Constant Coefficients

We can again define a subset of second order differential equations as the second order linear differential
equations which in general would be as:

$$p(x)\frac{d^2 y}{dx^2} + q(x)\frac{dy}{dx} + r(x)\,y(x) = f(x)$$
As there is no general recipe for solving this class of differential equations, we are going to restrict ourselves
even more by defining a sub-subset as second order linear differential equations with constant coefficients:

p(x) = a q(x) = b r (x) = c

So we are going to consider the case:


$$a\frac{d^2 y}{dx^2} + b\frac{dy}{dx} + c\,y(x) = f(x)$$
First we are going to solve the homogeneous case and then we will consider the non-homogeneous cases with
specific f (x) functions.

13.3.3 Homogeneous Cases

Homogeneous second order linear differential equations with constant coefficients will have f (x) = 0, thus:

$$a\frac{d^2 y}{dx^2} + b\frac{dy}{dx} + c\,y(x) = 0$$
The generic solution will need to be in the form:

y(x) = c 1 y 1 (x) + c 2 y 2 (x)

and it needs to depend on two integration constants. We can try to guess what type of function could work in
this case and we can do it thinking about the case of the first order differential equations. A first order linear
differential equation with fixed coefficients will be as follows:
dy
a + b y(x) = 0
dx

122
and we already know the solution to this. It is an exponential:
b
y(x) = ce − a x

Now, moving to the case of the second order:

d2 y dy
a +b + c y(x) = 0
dx 2 dx
we can imagine that the solution will have to be something similar, so we can start assuming that the solution
can have the form:
$$y(x) = e^{\lambda x}$$
We can differentiate this possible solution and substitute into our differential equation to see if we can obtain any condition on the λ parameter:
$$y(x) = e^{\lambda x}, \qquad y'(x) = \lambda e^{\lambda x}, \qquad y''(x) = \lambda^2 e^{\lambda x}$$
Substituting:
$$a\lambda^2 e^{\lambda x} + b\lambda e^{\lambda x} + c\,e^{\lambda x} = 0$$
we get a quadratic equation in λ:
$$a\lambda^2 + b\lambda + c = 0$$
which is called the associated characteristic equation. We know how to solve quadratic equations, and the solutions λ₁,₂ are:
$$\lambda_{1,2} = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$
and we can write the solution of the homogeneous differential equation as:
$$y = c_1 e^{\lambda_1 x} + c_2 e^{\lambda_2 x}$$

Now, depending on the quantity under the square root, we are going to have different types of functions, as the exponent can be real or complex. Let's define the quantity under the square root as the discriminant ∆:
$$\Delta = b^2 - 4ac$$
and consider the three possible cases:

(a) if the discriminant is positive, ∆ > 0, the two solutions λ₁,₂ are real and then the general solution to the differential equation is made up of two exponential functions:
$$y(x) = c_1 e^{\lambda_1 x} + c_2 e^{\lambda_2 x}$$
(b) if the discriminant is negative, ∆ < 0, the two solutions λ₁,₂ are complex numbers:
$$\lambda_{1,2} = \frac{-b \pm i\sqrt{4ac - b^2}}{2a}$$
where we have taken a −1 out of the square root, obtaining a factor i and a positive quantity under the square root. We can write these solutions as
$$\alpha = \frac{-b}{2a}, \qquad \beta = \frac{\sqrt{4ac - b^2}}{2a} \qquad\to\qquad \lambda_{1,2} = \alpha \pm i\beta$$

where α and β are now real numbers. Our solution can be written as before as:
$$y(x) = c_1 y_1(x) + c_2 y_2(x)$$
where each function is by definition:
$$y_{1,2} = e^{\lambda_{1,2} x} = e^{(\alpha \pm i\beta)x} = e^{\alpha x} e^{\pm i\beta x} = e^{\alpha x}\left(\cos\beta x \pm i\sin\beta x\right)$$
Instead of writing the solution as a linear combination of these two functions, we can directly take as the two independent functions:
$$y_1 = e^{\alpha x}\cos\beta x, \qquad y_2 = e^{\alpha x}\sin\beta x$$
Thus our solution becomes:
$$y(x) = c_1 e^{\alpha x}\cos\beta x + c_2 e^{\alpha x}\sin\beta x$$

(c) If the discriminant is null, ∆ = 0, we have one unique solution with double degeneracy m = 2:
$$\lambda = -\frac{b}{2a}$$
So we get only one function contributing to our solution:
$$y_1 = e^{-\frac{b}{2a}x} = e^{\lambda x}$$

and we need to find the second function. We can find it by using the method of the variation of the constant. We start from the solution that we know: we substitute the constant with a function of x and then obtain the first and the second derivatives:
$$y = u(x)e^{\lambda x}, \qquad y' = u'(x)e^{\lambda x} + u(x)\lambda e^{\lambda x}, \qquad y'' = u''(x)e^{\lambda x} + 2u'(x)\lambda e^{\lambda x} + u(x)\lambda^2 e^{\lambda x}$$
We can substitute this in our homogeneous differential equation:
$$a\left(u''(x) + 2u'(x)\lambda + u(x)\lambda^2\right)e^{\lambda x} + b\left(u'(x) + u(x)\lambda\right)e^{\lambda x} + c\,u(x)\,e^{\lambda x} = 0$$
Now we cancel all the e^{λx} terms, as they multiply every term, and then we reorder, collecting the terms containing the same order of derivatives:
$$a\,u''(x) + \underbrace{(2a\lambda + b)}_{=\,0}\,u'(x) + \underbrace{(a\lambda^2 + b\lambda + c)}_{=\,0}\,u(x) = 0$$
where in the first parenthesis we use the definition of λ obtained from the characteristic equation (2a(−b/2a) + b = 0), while in the second we use the characteristic equation itself. We are left with:
$$a\,u''(x) = 0 \quad\to\quad u(x) = Ax + B$$

Now we can write a solution using this as:

y 2 (x) = (Ax + B )e λx = Axe λx + B e λx

where clearly the second term is the same as the first solution we already found so we do not have to
include it in the second function that becomes:

y 2 (x) = xe λx

Therefore the general solution in this case is:

y(x) = c 1 e λx + c 2 xe λx

To conclude, in order to solve these second order linear differential equations, we need to write the characteristic equation (substituting an n-th order derivative with an n-th power of the parameter λ) and then solve it for λ in order to obtain the functional forms of the solution.
There are some common mistakes that are worth some thought:

(1) given the second order differential equation:
$$y'' + y = 0$$
the characteristic equation is λ² + 1 = 0, and not λ² + λ = 0.
(2) Consider now:
$$y'' + y' + y + 1 = e^x$$
the homogeneous equation associated to this differential equation is y'' + y' + y = 0 (the constant term belongs with e^x on the right-hand side), and not y'' + y' + y + 1 = 0.

13.3.4 Examples for the Homogeneous Cases

Let’s work some examples of homogeneous second order linear differential equations with constant coeffi-
cients:

(i) Solve the differential equation:


$$y'' - y' - 6y = 0$$
We first write down the characteristic equation:

λ2 − λ − 6 = 0

that we can solve by decomposing as:


(λ − 3)(λ + 2) = 0
thus obtaining the two solutions λ1 = −2 and λ2 = 3. We now can write the solution to the differential
equation as:
y(x) = C 1 e −2x +C 2 e 3x

(ii) Solve the differential equation:


$$y'' + 6y' + 9y = 0$$
In this case the characteristic equation is:
$$\lambda^2 + 6\lambda + 9 = 0$$
which corresponds to:
$$(\lambda + 3)^2 = 0$$
giving the solution λ = −3 with multiplicity m = 2. So the solution to the differential equation is now:

y(x) = C 1 e −3x +C 2 xe −3x = e −3x (C 1 +C 2 x)

(iii) Solve the differential equation:


$$y'' + 2y' + 5y(x) = 0$$
Again we need to write the characteristic equation:
$$\lambda^2 + 2\lambda + 5 = 0$$
that we can solve by using the formula:
$$\lambda_{1,2} = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} = \frac{-2 \pm \sqrt{4 - 20}}{2} = \frac{-2 \pm i\sqrt{16}}{2} = -1 \pm 2i$$

This gives us two complex solutions: in the formalism used above we have α = −1 and β = 2. Thus the
solution to the differential equation is:

y(x) = C 1 e −x cos 2x +C 2 e −x sin 2x = e −x (C 1 cos 2x +C 2 sin 2x)

(iv) We can now address also a famous example from physics: Hooke’s Law. The force can be written as
proportional to the displacement through the spring constant k:

$$F = -kx = ma = m\frac{d^2 x}{dt^2}$$
We obtain a linear second order differential equation with fixed coefficients, exactly what we have been dealing with so far:
$$m\frac{d^2 x}{dt^2} + kx = 0$$
or we can rewrite it as:
$$\frac{d^2 x}{dt^2} + \frac{k}{m}x = 0$$
giving the characteristic equation:
$$\lambda^2 + \frac{k}{m} = 0$$
with solutions:
$$\lambda_{1,2} = \pm i\sqrt{\frac{k}{m}} = \pm i\omega$$
where we have set:
$$\omega = \sqrt{\frac{k}{m}}$$
as it is conventionally done in physics. It represents the angular frequency and it is connected to the
frequency f by the relation ω = 2π f . Going back to our general formalism, in this case α = 0 and β = ω,
so the solution of the differential equation contains just the oscillation term:

y(x) = C 1 cos ωx +C 2 sin ωx
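This oscillatory solution can be checked numerically; the sketch below (assuming scipy is available, with arbitrary values of m, k and of the initial conditions) integrates the equation of motion and compares it with C₁ cos ωt + C₂ sin ωt:

```python
import numpy as np
from scipy.integrate import solve_ivp

m, k = 1.0, 4.0                        # arbitrary example values -> omega = 2
omega = np.sqrt(k / m)
x0, v0 = 1.0, 0.0                      # initial displacement and velocity

def hooke(t, state):
    x, v = state
    return [v, -(k / m) * x]           # x' = v,  v' = -(k/m) x

t = np.linspace(0.0, 10.0, 200)
sol = solve_ivp(hooke, (0.0, 10.0), [x0, v0], t_eval=t, rtol=1e-8, atol=1e-8)

# analytic solution: C1 = x0, C2 = v0/omega
x_exact = x0 * np.cos(omega * t) + (v0 / omega) * np.sin(omega * t)
print(np.max(np.abs(sol.y[0] - x_exact)))    # small numerical error
```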

13.3.5 Non-Homogeneous Cases

We now consider some special non-homogeneous cases. Before detailing the possibilities, it is useful to state
some general properties.
The general way to solve is identical to the first-order case:

(1) Solve the homogeneous case and find the complementary solution:

y c (x)

(2) Find the particular solution:


y p (x)
that does not depend on any integration constant.
(3) The general solution is then:
y(x) = y c (x) + y p (x)

In general, the method of the variation of the constants can still be applied starting from the complementary
solution (i.e. the solution of the homogeneous equation) and substituting functions of the variable (x in our
general formalism) to the integration constants: the particular solution should be written as:

y p = u(x)y 1 (x) + v(x)y 2 (x)

where u(x) and v(x) are unknown functions of x and have to be determined by substituting in the non-
homogeneous differential equation. This method can become quite lengthy and complicated, thus we are

not going to address it.
Another useful property is the following: given a non-homogeneous differential equation that can be written as:
$$F(y'', y', y) = f_1(x) + f_2(x)$$
where F is a generic function of the y function and its derivatives (in the simplified case we are addressing it would be a y'' + b y' + c y), and f₁(x) and f₂(x) can be generic functions of x. In this case, if we have two solutions y₁ and y₂, where y₁ is a solution of the differential equation:
$$a y'' + b y' + c y = f_1(x)$$
and y₂ is a solution of the differential equation:
$$a y'' + b y' + c y = f_2(x)$$
then:
$$y = y_1 + y_2$$
is a solution of the initial differential equation:
$$a y'' + b y' + c y = f_1(x) + f_2(x)$$

Instead of considering the most generic case, we are going to consider specific types of functions as f (x) and
solve accordingly. In general we can have any function f (x) in our differential equation. However if we restrict
ourselves to a specific types of functions, we can greatly simplify the problem and we can easily write down a
recipe to find the particular solution of the non-homogeneous differential equation.
We are going to consider the following possibilities for the f (x) function:

(a) f(x) = R(x)e^{px}, where R(x) is a polynomial of order r and p is a real number;
(b) f(x) = R(x)e^{px} cos qx or f(x) = R(x)e^{px} sin qx, where again R(x) is a polynomial of order r and p and q are real numbers.

Some examples to clarify this classification:

(i) f(x) = x e^{2x} would be a function of the first type [f(x) = R(x)e^{px}] with r = 1 as order of the polynomial and p = 2 for the exponential.
(ii) f(x) = e^x is again of the first type [f(x) = R(x)e^{px}] with r = 0 as order of the polynomial and p = 1 for the exponential.
(iii) f(x) = (2x² + 4x + 1)e^{−4x} is yet again of the first type [f(x) = R(x)e^{px}] with r = 2 and p = −4.
(iv) f(x) = (x² + 2x + 1) is of the first type with r = 2 and p = 0.

Solutions for the two types of functions:

(a) for the cases f(x) = R(x)e^{px}:
1) if λ = p is not a solution of the characteristic equation, then the particular solution of the non-homogeneous differential equation can be written as:
$$y_p(x) = Q(x)\,e^{px}$$
where Q(x) is a polynomial of order r (same as R(x)).
2) if λ = p is a solution of the characteristic equation with multiplicity m, then the particular solution of the non-homogeneous differential equation can be written as:
$$y_p(x) = x^m\,Q(x)\,e^{px}$$
where Q(x) is a polynomial of order r (same as R(x)).

(b) for the cases f(x) = R(x)e^{px} cos qx or f(x) = R(x)e^{px} sin qx:
1) if λ = p + iq is not a solution of the characteristic equation, then the particular solution of the non-homogeneous differential equation can be written as:
$$y_p(x) = e^{px}\left(Q(x)\cos qx + S(x)\sin qx\right)$$
where Q(x) and S(x) are polynomials of order r (same as R(x)).
2) if λ = p + iq is a solution of the characteristic equation, then the particular solution of the non-homogeneous differential equation can be written as:
$$y_p(x) = x\,e^{px}\left(Q(x)\cos qx + S(x)\sin qx\right)$$

Note that this is equivalent to the previous case: the addition of an x^m factor here corresponds to x¹, since m = 1. In fact, in a second order equation we can have only two solutions, and if the solutions are complex they each have multiplicity m = 1. If we extend this mechanism to higher orders, the general rule applies and the factor needs to be x^m.

In all cases, the polynomial will depend on some constants that will have to be determined in the specific
case of the given non-homogeneous differential equation by deriving the solution and substituting into the
complete equation. Some examples to clarify the procedure:

(i) Solve the following non-homogeneous differential equation:

$$y'' + 4y = 4\cos 2x$$

In this case we set: the order of the polynomial is r = 0, the exponential factor is p = 0, and the cosine
factor is q = 2. We proceed into the various steps of the solution:
1) Solve the homogeneous equation:
$$y'' + 4y = 0$$
by writing the characteristic equation:
λ2 + 4 = 0
giving two imaginary solutions λ = ±i 2. Then the complementary solution (solution to the homoge-
neous equation) is in the form:
y c (x) = C 1 cos 2x +C 2 sin 2x
2) Find the particular solution: having p = 0 and q = 2 we need to check if λ = p ± qi = ±i 2 is a solution
of the characteristic equation and indeed it is as we saw in the previous point. Thus we are in the case
(b2) above and we can write the particular solution as:

y p (x) = x(A cos 2x + B sin 2x)

where we introduced two A and B constants that need to be evaluated. To do so, we calculate the 1st
and 2nd derivatives of the proposed particular solution and substitute them back into equation:

$$y_p'(x) = A\cos 2x + B\sin 2x + x(-2A\sin 2x + 2B\cos 2x)$$
$$y_p''(x) = 2(-2A\sin 2x + 2B\cos 2x) + x(-4A\cos 2x - 4B\sin 2x)$$
and now substituting in the full non-homogeneous equation:
$$-4A\sin 2x + 4B\cos 2x - 4x(A\cos 2x + B\sin 2x) + 4x(A\cos 2x + B\sin 2x) = 4\cos 2x$$
the terms proportional to x cancel and we get:
$$-A\sin 2x + B\cos 2x = \cos 2x$$
giving B = 1 and A = 0. So the particular solution becomes:

y p (x) = x sin 2x

3) Finally the general solution is:

y(x) = C 1 cos 2x +C 2 sin 2x + x sin 2x

If we add boundary conditions to the problem, for example:
$$\begin{cases} y(0) = 0 \\ y'(0) = 2 \end{cases}$$

we can determine the two integration constants still present in the general solution. Applying the first
condition:
y(x = 0) = C 1 +C 2 · 0 + 0 = 0
it gives C 1 = 0. For applying the second condition we need to calculate the first derivative of the general
solution (where we have already eliminated C 1 ):

$$y'(x) = 2C_2\cos 2x + \sin 2x + 2x\cos 2x$$

and then evaluate it at x = 0:
$$y'(x = 0) = 2C_2 + 0 + 0 = 2$$
thus obtaining C 2 = 1. So the solution including the boundary conditions is:

y(x) = sin 2x + x sin 2x = (1 + x) sin 2x
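A symbolic cross-check of this example, including the boundary conditions, can be done as follows (only a sketch, assuming sympy is available):

```python
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')

ode = sp.Eq(y(x).diff(x, 2) + 4 * y(x), 4 * sp.cos(2 * x))
sol = sp.dsolve(ode, y(x), ics={y(0): 0, y(x).diff(x).subs(x, 0): 2})
print(sp.simplify(sol.rhs))    # equivalent to (1 + x)*sin(2*x)
```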

(ii) Solve the following non-homogeneous differential equation:

$$\frac{d^2 y}{dx^2} + 2\frac{dy}{dx} + 2y(x) = x e^x$$
In this case we set: the order of the polynomial is r = 1, and the exponential factor is p = 1 (the cosine/sine factor is q = 0). We proceed through the various steps of the solution:
1) Solve the homogeneous equation:
$$\frac{d^2 y}{dx^2} + 2\frac{dy}{dx} + 2y(x) = 0$$
by writing the characteristic equation:
$$\lambda^2 + 2\lambda + 2 = 0$$
thus giving the two complex solutions:
$$\lambda_{1,2} = \frac{-2 \pm \sqrt{4 - 8}}{2} = -1 \pm i$$
where in the convention used above we can set α = −1 and β = 1. So the complementary solution is:

y c (x) = C 1 e −x cos x +C 2 e −x sin x = e −x (C 1 cos x +C 2 sin x)

2) The particular solution is obtained by considering the function f (x) = xe x and checking if λ = p = 1 is
a solution of the characteristic equation: as seen in the previous point, this is not a solution, so we are
in case (a1) with respect to the above classification: thus we can write the particular solution as:

y p (x) = (Ax + B )e x

where the two constants again need to be determined by obtaining the derivatives:

$$y_p'(x) = Ae^x + (Ax + B)e^x, \qquad y_p''(x) = Ae^x + Ae^x + (Ax + B)e^x$$
and substituting in our equation:
$$(2A + Ax + B)e^x + (2A + 2Ax + 2B)e^x + (2Ax + 2B)e^x = x e^x$$

where the common factor e x can be eliminated everywhere and then we can reorganise based on
powers of x:
(Ax + 2Ax + 2Ax − x) + (2A + B + 2A + 2B + 2B ) = 0

that simplifies to:
$$(5A - 1)x + (4A + 5B) = 0$$
giving:
$$5A - 1 = 0 \;\to\; A = \frac15 \qquad\qquad 4A + 5B = 0 \;\to\; B = -\frac{4}{25}$$
The particular solution is thus:
$$y_p(x) = \left(\frac{x}{5} - \frac{4}{25}\right)e^x$$
3) Finally the general solution is:
$$y(x) = C_1 e^{-x}\cos x + C_2 e^{-x}\sin x + \left(\frac{x}{5} - \frac{4}{25}\right)e^x$$
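The constants A and B can be double-checked by substituting the trial particular solution back into the equation and requiring the residual to vanish; a sketch with sympy (assumed available):

```python
import sympy as sp

x, A, B = sp.symbols('x A B')
y_p = (A * x + B) * sp.exp(x)

# residual of y'' + 2 y' + 2 y - x e^x, with the common factor e^x cancelled
residual = sp.cancel((y_p.diff(x, 2) + 2 * y_p.diff(x) + 2 * y_p - x * sp.exp(x)) / sp.exp(x))
eqs = [residual.coeff(x, 1), residual.coeff(x, 0)]   # coefficients of x^1 and x^0 must vanish
print(sp.solve(eqs, [A, B]))                         # {A: 1/5, B: -4/25}
```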
